Hadoop MapReduce



Hadoop MapReduce is a software framework for the distributed processing of large data sets on
compute clusters of commodity hardware. It is a sub-project of the Apache Hadoop project.
According to The Apache Software Foundation, the primary goal of MapReduce is to split
the input data set into independent chunks that are processed in a completely parallel manner.
The Hadoop MapReduce framework sorts the outputs of the maps, which are then input to the
reduce tasks. Typically, both the input and the output of the job are stored in a file system.


Purpose:


This document describes all user-facing facets of the Hadoop MapReduce framework and serves
as a tutorial.


Algorithm:


1. Generally, the MapReduce paradigm is based on sending the computation to where the data resides.

2. A MapReduce program executes in three stages: the map stage, the shuffle stage, and the
   reduce stage, as the sketch below illustrates.
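
To make the three stages concrete, here is a minimal in-memory sketch that traces map, shuffle, and reduce on a tiny word-count input. It is a hypothetical illustration of the paradigm only, with no Hadoop dependency; the class name StagesDemo and the sample lines are invented for this example:

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // In-memory trace of the MapReduce data flow on a tiny word count.
    public class StagesDemo {
        public static void main(String[] args) {
            List<String> lines = Arrays.asList("deer bear river", "car car river");

            // Map stage: each line yields <word, 1> pairs.
            // Shuffle stage: pairs are grouped and sorted by key (TreeMap keeps keys sorted).
            // Reduce stage: the counts in each group are summed.
            Map<String, Integer> counts = new TreeMap<>();
            for (String line : lines) {
                for (String word : line.split("\\s+")) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
            System.out.println(counts);  // {bear=1, car=2, deer=1, river=2}
        }
    }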


Map stage:


The mapper's job is to process the input data. Generally, the input data is in the
form of a file or directory and is stored in the Hadoop Distributed File System (HDFS). The input file is
passed to the map function line by line, and the mapper processes the data.
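
As a concrete example, here is a minimal sketch of a word-count mapper written against the org.apache.hadoop.mapreduce API. The class name TokenizerMapper and the tokenize-and-emit logic are illustrative choices, not something the framework prescribes:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Receives one line of input at a time (key = byte offset, value = line)
    // and emits an intermediate <word, 1> pair for every token in the line.
    public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);  // intermediate pair: <word, 1>
            }
        }
    }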


Reduce stage:


It is the combination of the shuffle stage and the reduce stage proper. The reducer's job is to
process the data that comes from the mapper. After processing, it produces a new set of output,
which is stored in HDFS.
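
Continuing the word-count sketch, a matching reducer might look like the following; the class name IntSumReducer is, again, an illustrative choice:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // For each word, the shuffle stage delivers all of its counts together;
    // the reducer sums them and writes the final <word, total> pair.
    public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);  // final pair: <word, total count>
        }
    }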


1. During a MapReduce job, Hadoop sends the Map and Reduce tasks to suitable servers in the
   cluster.

2. The framework manages all the details of data passing, such as verifying task
   completion and copying data around the cluster between the nodes.

3. Most of the computing takes place on nodes with the data on local disks, which reduces the
   network traffic.

4. After completion of the given tasks, the cluster collects and reduces the data to form the
   appropriate result, and sends it back to the Hadoop server.


Input & Output (Java perspective):

The MapReduce framework operates on <key, value> pairs; that is, the framework views the input to
the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job,
conceivably of different types. The key and value classes have to be serializable by the framework and
hence need to implement the Writable interface. In addition, the key classes have to implement
the WritableComparable interface to facilitate sorting by the framework.
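
A minimal driver, assuming the TokenizerMapper and IntSumReducer classes sketched above, shows where these key/value classes are declared (Text implements WritableComparable, IntWritable implements Writable):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Wires the mapper and reducer together and declares the output
    // key/value classes that the framework will sort and serialize.
    public class WordCount {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);           // key: WritableComparable
            job.setOutputValueClass(IntWritable.class);  // value: Writable
            FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Once packaged into a jar, such a job is typically submitted with the hadoop jar command, passing the HDFS input and output paths as arguments.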



The terminology of MapReduce


1. PayLoad: The Map and Reduce functions implemented by the application; they form the core of the job.

2. Mapper: Maps the input key/value pairs to a set of intermediate key/value pairs.

3. NameNode: Node that manages the Hadoop Distributed File System (HDFS).

4. DataNode: Node where the data is stored in advance, before any processing takes place.

5. MasterNode: Node where the JobTracker runs and which accepts job requests from clients.

6. SlaveNode: Node where the Map and Reduce programs run.

7. JobTracker: Schedules jobs and tracks the jobs assigned to the TaskTracker.

8. TaskTracker: Tracks the tasks and reports status to the JobTracker.

9. Job: An execution of a Mapper and Reducer across a dataset.

10. Task: An execution of a Mapper or a Reducer on a slice of data.

11. Task Attempt: A particular instance of an attempt to execute a task on a SlaveNode.



Advantages of Hadoop MapReduce:


1. Hadoop is a scalable platform.

2. It comes across as a very cost-effective solution for businesses.

3. It is flexible about the kinds of data it can process.

4. Hadoop MapReduce takes minutes to process terabytes of data and hours to process petabytes of data.

5. Parallel processing.

6. A simple programming model.



Conclusion:


When it comes to processing large data sets, Hadoop's MapReduce programming model allows for the
processing of large volumes of data. Hadoop also triumphs over relational database management
systems when it comes to processing large data clusters.
