Hadoop YARN Architecture
Last update: 24 Apr, 2023
YARN stands for "Yet Another Resource Negotiator". It was introduced in Hadoop 2.0 to remove the bottleneck on the Job Tracker that was present in Hadoop 1.0. YARN was described as a "Redesigned Resource Manager" at launch, but it has since evolved into a large-scale distributed operating system for Big Data processing.

The YARN architecture separates the resource management layer from the processing layer. With YARN, the responsibilities of the Hadoop 1.0 Job Tracker are split between the Resource Manager and the Application Master.

YARN also allows different data processing engines, such as graph processing, interactive processing, and stream processing, as well as batch processing, to run and process data stored in HDFS (Hadoop Distributed File System), making the system much more efficient. Through its various components, it can dynamically allocate resources and schedule application processing. For large-volume data processing, it is essential to manage the available resources properly so that every application can leverage them.
YARN Features: YARN gained popularity because of the following features:
- Scalability: The scheduler in Resource manager of YARN architecture allows Hadoop to extend and manage thousands of nodes and clusters.
- Compatibility: YARN supports the existing map-reduce applications without disruptions thus making it compatible with Hadoop 1.0 as well.
- Cluster Utilization: Since YARN supports dynamic utilization of the cluster in Hadoop, it enables optimized cluster utilization.
- Multi-tenancy: It allows multiple engines to access Hadoop, giving organizations the benefit of multi-tenancy.
Hadoop YARN Architecture

The main components of YARN architecture include:
- Client: It submits map-reduce jobs.
- Resource Manager: It is the master daemon of YARN and is responsible for resource assignment and management among all the applications. Whenever it receives a processing request, it forwards it to the corresponding node manager and allocates resources for the completion of the request accordingly. It has two major components:
- Scheduler: It performs scheduling based on the allocated application and available resources. It is a pure scheduler, meaning it does not perform other tasks such as monitoring or tracking and does not guarantee a restart if a task fails. The YARN scheduler supports plugins such as the Capacity Scheduler and the Fair Scheduler to partition cluster resources.
- Application manager: It is responsible for accepting the application and negotiating the first container from the resource manager. It also restarts the Application Master container if a task fails.
- Node Manager: It takes care of an individual node in a Hadoop cluster and manages the applications and workflow on that particular node. Its primary job is to keep up with the Resource Manager: it registers with the Resource Manager and sends heartbeats with the health status of the node. It monitors resource usage, performs log management, and kills containers based on directions from the Resource Manager. It is also responsible for creating container processes and starting them at the request of the Application Master.
- Application Master: An application is a single job submitted to a framework. The Application Master is responsible for negotiating resources with the Resource Manager, tracking the status, and monitoring the progress of a single application. The Application Master requests the container from the Node Manager by sending a Container Launch Context (CLC), which includes everything an application needs to run. Once the application is started, it sends health reports to the Resource Manager from time to time.
- Container: It is a collection of physical resources such as RAM, CPU cores, and disk on a single node. Containers are invoked via a Container Launch Context (CLC), which is a record containing information such as environment variables, security tokens, and dependencies.
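The interplay of these components can be illustrated with a small Python sketch. This is a simplified model, not the actual Hadoop API; all class and field names here are hypothetical stand-ins for the real daemons and the CLC.

```python
# Simplified model of YARN resource allocation (not the real Hadoop API).
# A NodeManager tracks free resources on one node; the ResourceManager's
# scheduler picks a node with enough capacity and launches a container
# described by a Container Launch Context (CLC).

from dataclasses import dataclass, field

@dataclass
class ContainerLaunchContext:          # hypothetical stand-in for the real CLC
    command: str                       # what to run inside the container
    env: dict = field(default_factory=dict)

@dataclass
class NodeManager:
    name: str
    free_mem_mb: int
    free_vcores: int

    def launch(self, clc, mem_mb, vcores):
        # Deduct the node's resources and "start" the container process.
        self.free_mem_mb -= mem_mb
        self.free_vcores -= vcores
        return f"{self.name}: running '{clc.command}'"

class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, clc, mem_mb, vcores):
        # Pure scheduling decision: find the first node with enough capacity.
        for node in self.nodes:
            if node.free_mem_mb >= mem_mb and node.free_vcores >= vcores:
                return node.launch(clc, mem_mb, vcores)
        return None  # no node can satisfy the request

rm = ResourceManager([NodeManager("node1", 4096, 2), NodeManager("node2", 8192, 4)])
result = rm.allocate(ContainerLaunchContext("run_app_master.sh"), 2048, 1)
print(result)  # node1: running 'run_app_master.sh'
```

The real scheduler is far more sophisticated (queues, locality, fairness), but the core idea is the same: the Resource Manager only decides *where* a container runs, while the Node Manager actually starts it.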
Application workflow in Hadoop YARN:
- The client submits an application
- The Resource Manager allocates a container to start the Application Master
- The Application Master registers itself with the Resource Manager
- The Application Master negotiates containers from the Resource Manager
- The Application Master notifies the Node Manager to launch containers
- Application code is executed in the container
- The client contacts the Resource Manager or Application Master to monitor the application's status
- Once the processing is complete, the Application Master unregisters with the Resource Manager
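The workflow above can be traced with a short Python sketch. This is a toy event log, not real YARN code; the event strings simply mirror the steps listed above.

```python
# Toy trace of the YARN application workflow (illustrative only).
# Each appended event mirrors one step of the workflow above.

def run_application(log):
    log.append("client: submit application")                                   # step 1
    log.append("resource manager: allocate container for application master")  # step 2
    log.append("application master: register with resource manager")           # step 3
    log.append("application master: negotiate containers")                     # step 4
    log.append("application master: notify node manager to launch containers") # step 5
    log.append("container: execute application code")                          # step 6
    log.append("client: poll resource manager for status")                     # step 7
    log.append("application master: unregister with resource manager")         # step 8
    return log

events = run_application([])
print(len(events))  # 8
```

Note the ordering constraint the trace makes explicit: the Application Master itself runs inside a container that the Resource Manager allocates first, before any application containers are negotiated.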
Advantages:
- Flexibility: YARN offers flexibility to run various types of distributed processing systems such as Apache Spark, Apache Flink, Apache Storm, and others. It allows multiple processing engines to run simultaneously on a single Hadoop cluster.
- Resource Management: YARN provides an efficient way of managing resources in the Hadoop cluster. It allows administrators to allocate and monitor the resources required by each application in a cluster, such as CPU, memory, and disk space.
- Scalability: YARN is designed to be highly scalable and can handle thousands of nodes in a cluster. It can scale up or down based on the requirements of the applications running on the cluster.
- Improved Performance: YARN offers good performance by providing a centralized resource management system. It ensures that resources are optimally utilized and that applications are efficiently scheduled on the available resources.
- Security: YARN provides robust security features such as Kerberos authentication, Secure Shell (SSH) access, and secure data transmission. It ensures that the data stored and processed on the Hadoop cluster is secure.
Disadvantages:
- Complexity: YARN adds complexity to the Hadoop ecosystem. It requires additional configurations and settings, which can be difficult for users who are not familiar with YARN.
- Overhead: YARN introduces additional overhead, which can slow down the performance of the Hadoop cluster. This overhead is required for managing resources and scheduling applications.
- Latency: YARN introduces additional latency in the Hadoop ecosystem. This latency can be caused by resource allocation, application scheduling, and communication between components.
- Single Point of Failure: The YARN Resource Manager can be a single point of failure in the Hadoop cluster. If it fails, it can cause the entire cluster to go down. To avoid this, administrators need to set up a standby Resource Manager instance for high availability.
- Limited Support: YARN has limited support for non-Java programming languages. Although it supports multiple processing engines, some engines have limited language support, which can limit the usability of YARN in certain environments.
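Regarding the single-point-of-failure concern above: Resource Manager high availability is configured in yarn-site.xml. The sketch below shows a minimal set of properties; the hostnames and the ZooKeeper quorum address are placeholders for your environment, and a real deployment needs additional settings (such as a cluster ID) per the Hadoop documentation for your version.

```xml
<!-- yarn-site.xml: minimal ResourceManager HA sketch (hostnames are placeholders) -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

With this setup, one Resource Manager is active while the other stands by, and ZooKeeper is used to coordinate failover so the cluster does not go down with a single Resource Manager.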