1) In-memory MapReduce: an alternative implementation of the Hadoop JobTracker and TaskTracker that accelerates job execution. It eliminates the overhead associated with the JobTracker and TaskTrackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.
2) IGFS: an alternative implementation of the Hadoop file system, named IgniteHadoopFileSystem, which stores data in memory. This in-memory file system minimises disk I/O and improves performance.
3) Secondary file system: this implementation works as a caching layer above HDFS; every read and write operation goes through this layer and can improve performance.
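To illustrate how an alternative file system implementation like IGFS is plugged in, a Hadoop client would register it in core-site.xml roughly as follows (a sketch based on the Ignite Hadoop accelerator documentation, not a complete working configuration):

```xml
<configuration>
  <!-- Map the igfs:// scheme to Ignite's in-memory file system implementation (sketch). -->
  <property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
  </property>
</configuration>
```

We won't use this mapping in this post, since here we only swap out the MapReduce engine; IGFS is covered in the next post.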
In this blog post, I'm going to configure Apache Ignite with Hadoop to execute an in-memory MapReduce job.
A high-level architecture of the Ignite Hadoop accelerator is as follows:

The sandbox used in this post:
OS: Red Hat Enterprise Linux 6
RAM: 2 GB
Hadoop version: 2.7.2
Apache Ignite version: 1.6
We are going to install a pseudo-distributed Hadoop cluster on a single machine, meaning the DataNode, NameNode, and job/task tracker will all run in one virtual machine.
First of all, we will install and configure Hadoop, then proceed to Apache Ignite. Assume that Java 1.7_4* is already installed and JAVA_HOME is set in the environment variables.

Step 1:
Unpack the Hadoop distribution and set the JAVA_HOME path in the etc/hadoop/hadoop-env.sh file as follows:
export JAVA_HOME=<path to your Java installation>

Step 2:
Add the following configuration in etc/hadoop/core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

And in etc/hadoop/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Step 3:
Set up passphraseless SSH as follows:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

Then try ssh localhost; it shouldn't ask you for a password.
Format the HDFS file system:
$ bin/hdfs namenode -format

Start the NameNode/DataNode daemons:

$ sbin/start-dfs.sh
It's also better to add HADOOP_HOME to the operating system's environment variables.
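For example (the install path below is an assumption; substitute wherever you unpacked Hadoop):

```shell
# Hypothetical install location -- adjust to your own path.
export HADOOP_HOME=/opt/hadoop-2.7.2
# Make the hadoop/hdfs scripts available without typing full paths.
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Putting these lines in ~/.bashrc makes them survive across shell sessions.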
Make a few directories in HDFS to run the MapReduce job:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /input

Put some files in the directory /input:
$ bin/hdfs dfs -put $HADOOP_HOME/etc/hadoop /input

Step 6:
Run the MapReduce word count example:
$ bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input/hadoop output

View the output files on the distributed file system:
$ bin/hdfs dfs -cat output/*

It should be a huge file; let's see a fragment of the output:
want 1
warnings. 1
when 9
where 4
which 7
while 1
who 6
will 23
window 1
window, 1
with 62
within 4
without 1
work 12
writing, 27

Now let's install and configure Apache Ignite.
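Conceptually, the word count job is just tokenize, group, and count; the same aggregation can be sketched locally with standard Unix tools (an illustration of what the job computes, not part of the Hadoop pipeline):

```shell
# Split a sample line into one word per line, then count occurrences
# of each distinct word -- the same result wordcount produces per token.
# 'with' appears twice in the sample, so it gets count 2.
printf 'with work with will\n' | tr -s ' ' '\n' | sort | uniq -c
```

The MapReduce job does the same thing at scale: the mapper emits (word, 1) pairs, and the reducer sums the counts per word.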
Unpack the Apache Ignite distribution somewhere on your system and set IGNITE_HOME to the root directory of the installation.
Add the following libraries in $IGNITE_HOME/libs:
We are going to use Ignite's default configuration, config/default-config.xml. Now we can start Ignite as follows:
$ bin/ignite.sh

Step 10:
Now we will make a few changes to use the Ignite job tracker instead of Hadoop's. Add the following to HADOOP_CLASSPATH in the environment variables:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$IGNITE_HOME/libs/ignite-core-1.6.0.jar:$IGNITE_HOME/libs/ignite-hadoop-1.6.0.jar:$IGNITE_HOME/libs/ignite-shmem-1.0.0.jar

Step 11:
Stop the Hadoop NameNode and DataNodes:
$ sbin/stop-dfs.sh

Step 12:

Add the following configuration to etc/hadoop/mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>ignite</value>
</property>
<property>
  <name>mapreduce.jobtracker.address</name>
  <value>127.0.0.1:11211</value>
</property>

Step 13:
Run the previous word count MapReduce example again:
$ bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input/hadoop output2

Here is the output:
Execution time is now faster than last time, when we used the Hadoop task tracker. Let's examine the Ignite task execution statistics through the Ignite Visor:

$ bin/ignitevisorcmd.sh
In the next blog post, we will cover IGFS and the Ignite secondary file system.
If you like this article, you might also like the book