Introduction to YARN

Introduction:

When someone mentions MapReduce, we immediately think of Hadoop. The MapReduce idea was proposed by Google, and it
generated huge interest in the computing world. That interest took shape in Hadoop, an open-source implementation
developed largely at Yahoo and maintained as an Apache project. Once it became generally available, Hadoop was used
to build solutions on commodity hardware. MapReduce, however, was not always a suitable algorithm for the problem
at hand. Hadoop was therefore re-architected to support distributed computing solutions in general, rather than
only supporting Map/Reduce. After this re-architecture, the main feature that differentiates Hadoop 2 from Hadoop 1
is YARN (Yet Another Resource Negotiator). YARN was developed as a component of the MapReduce project and was
created to overcome some of the performance issues of the original design and to support other solution models,
such as DAG (Directed Acyclic Graph) processing.



Definition of YARN:

Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. YARN is one of the key
features of Hadoop 2, the second-generation version of the Apache Software Foundation's open-source distributed
processing framework.



Interactive Queries on YARN:


Apache Tez is an application framework defined on YARN. It allows solutions to be developed as a Directed Acyclic
Graph (DAG) of tasks within a single job. A DAG of tasks is a more powerful tool than traditional MapReduce because
it reduces the need to execute many jobs to query Hadoop. With MapReduce, several jobs are often created to execute
a single query. Each MapReduce job has to be initialized, and intermediate data needs to be stored and swapped
between jobs, which slows down query execution. With a DAG there is a single job, and intermediate data does not
need to be stored again between steps.
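
The single-job DAG idea can be sketched in a few lines of plain Python. This is only an illustration of the execution model, not Tez's API: the `run_dag` helper, the task names and the sample data are all hypothetical. Each task runs once in dependency order, and its output stays in memory for the next task instead of being written out between separate jobs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps, source):
    """Run every task once, in dependency order, inside one 'job'.

    tasks:  task name -> function taking a list of upstream outputs
    deps:   task name -> set of upstream task names (empty set for roots)
    source: input handed to tasks that have no upstream task
    """
    results = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = [results[d] for d in sorted(deps.get(name, ()))]
        # Root tasks read the source; all others read in-memory upstream output.
        results[name] = tasks[name](upstream if upstream else [source])
    return results


# Hypothetical three-step query: filter -> square -> total.
tasks = {
    "filter": lambda ins: [x for x in ins[0] if x % 2 == 0],
    "square": lambda ins: [x * x for x in ins[0]],
    "total": lambda ins: sum(ins[0]),
}
deps = {"filter": set(), "square": {"filter"}, "total": {"square"}}

results = run_dag(tasks, deps, [1, 2, 3, 4, 5, 6])
print(results["total"])
```

With chained MapReduce jobs, the `filter` and `square` outputs would each be persisted and re-read by the next job; here they are simply passed along inside one run.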

Real-time Processing on YARN:


Apache Storm brings real-time processing of high-speed data streams using the Spout-Bolt model. A Spout is the
message source and a Bolt processes the data. YARN is expected to allow Storm to be placed close to the data, which
in turn will reduce network transfer and the cost of acquiring data. MapReduce jobs can then use the same data.
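
To make the Spout-Bolt model concrete, here is a minimal sketch using plain Python generators. It does not use Storm's real API; the function names and the word-count example are assumptions chosen for illustration. The spout is the message source, and each bolt consumes a stream and emits a new one.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the message source. Emits one sentence at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt: consumes sentences and emits individual lowercase words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream):
    """Bolt: keeps a running count per word and emits (word, count) updates."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(sentences):
    """Wire spout -> split bolt -> count bolt and collect the latest counts."""
    final = {}
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        final[word] = count
    return final


print(run_topology(["YARN runs Storm", "Storm runs bolts"]))
```

Because the stages are generators, each message flows through the whole chain as soon as it is emitted, which mirrors the streaming style of processing rather than a batch job.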


Iterative Machine Learning on YARN:


Apache Spark is an in-memory computing framework that has been ported onto Hadoop YARN. Spark is designed to make
iterative machine learning algorithms faster by keeping the data in memory. MLlib is a machine learning library
that uses Spark. It stores data in memory for efficient execution of iterative machine learning algorithms.
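
Why in-memory storage helps iterative algorithms can be shown with a small sketch. This is plain Python rather than Spark or MLlib code, and the `Dataset` class, the `fit_mean` function and their parameters are hypothetical. The algorithm makes many passes over the same data; with caching the data is loaded once, and without it the data is re-read on every pass.

```python
class Dataset:
    """Hypothetical data source that counts how often it is read."""

    def __init__(self, points):
        self._points = points
        self.reads = 0

    def load(self):
        self.reads += 1  # stands in for an expensive read from disk
        return list(self._points)


def fit_mean(dataset, iterations=20, lr=0.5, cache=True):
    """Estimate the mean of the data by gradient descent on squared error."""
    cached = dataset.load() if cache else None
    estimate = 0.0
    for _ in range(iterations):
        # Without caching, every iteration pays for a fresh read.
        points = cached if cache else dataset.load()
        gradient = sum(estimate - p for p in points) / len(points)
        estimate -= lr * gradient
    return estimate


cached_ds = Dataset([2.0, 4.0, 6.0])
uncached_ds = Dataset([2.0, 4.0, 6.0])
fit_mean(cached_ds, cache=True)
fit_mean(uncached_ds, cache=False)
print(cached_ds.reads, uncached_ds.reads)
```

Both runs compute the same estimate; the only difference is how many times the data source is read, which is the cost that keeping data in memory avoids.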


Why was YARN needed?


Before we can understand the need for YARN, we should know how cluster resource management was done in Hadoop 1.0
and what the problem with that approach was.


Graph Processing on YARN:

Apache Giraph is an iterative graph processing system built for high scalability. Giraph has been upgraded to run
on YARN. It uses YARN for Bulk Synchronous Processing (BSP) of semi-structured graph data at huge volumes. Giraph
was originally designed for Hadoop 1, but because of its iterative nature, Hadoop 1 was inefficient for it.

How everything stacks up on YARN:

The Hadoop 2 technology stack is expected to have an important effect on applications. These applications use
batch processing, interactive queries, real-time computing and in-memory computing on top of YARN and federated
HDFS. YARN has different engines, such as MapReduce and Slider. Different Hadoop components can execute on these
engines or directly on YARN. Some of these components, Slider among them, are still in the development phase. The
technology stack of the Hadoop 2 ecosystem is as follows.


1) Map/Reduce:

Map/Reduce will run on top of YARN. The code remains the same, but configuration changes will be required to move
an application to Hadoop 2.

2) Real Time-Slider:

The Slider engine will bridge the gap between existing applications and YARN applications, and allow existing
applications to use the Hadoop 2 ecosystem via YARN. With Slider, distributed applications that aren't YARN-aware
can now "slide into YARN" to run on Hadoop, usually with no code changes. Storm is planned to run on Slider.

3) Existing Products which have migrated to YARN:

Some existing products have migrated to YARN directly, without using engines like Tez or Slider.


The advantage of YARN:

YARN makes efficient use of resources. There are no more fixed map and reduce slots. YARN provides a central
resource manager. With YARN, you can now run many applications in Hadoop, all sharing a common pool of resources.
YARN can even run applications that do not follow the MapReduce model.
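
The difference between fixed slots and a shared resource pool can be illustrated with a toy scheduler. This is a simplified sketch, not YARN's actual scheduling logic: the function names, the memory figures and the task list are assumptions made for the example. In the slot model a task only fits a free slot of its own kind; in the shared-pool model any task can start as long as enough memory remains.

```python
def schedule_with_slots(map_slots, reduce_slots, tasks):
    """Fixed-slot style: a task only fits a free slot of its own kind."""
    free = {"map": map_slots, "reduce": reduce_slots}
    started = []
    for kind, _memory_mb in tasks:
        # Kinds other than map/reduce have no slot at all in this model.
        if free.get(kind, 0) > 0:
            free[kind] -= 1
            started.append(kind)
    return started


def schedule_with_shared_pool(total_memory_mb, tasks):
    """Shared-pool style: any task starts if enough memory is left."""
    free = total_memory_mb
    started = []
    for kind, memory_mb in tasks:
        if memory_mb <= free:
            free -= memory_mb
            started.append(kind)
    return started


# Hypothetical workload: three map tasks plus one non-MapReduce task.
tasks = [("map", 1024), ("map", 1024), ("map", 1024), ("other", 1024)]

print(schedule_with_slots(2, 2, tasks))
print(schedule_with_shared_pool(4096, tasks))
```

In the slot model the two reduce slots sit idle while a map task waits, and the non-MapReduce task cannot run at all; with a shared pool the same capacity serves all four tasks.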

Conclusion:

YARN makes Hadoop 2 a more powerful, scalable and extensible architecture than its previous version. YARN gives
the development and architecture community a platform with capabilities like batch processing, interactive
queries and real-time computing.
