- Download cs246.vdi.tgz at
- Download Cygwin at http://cygwin.com/install.html
- Once all the downloads complete, open Cygwin and type tar -xvf cs246.vdi.tgz. It will generate cs246.vdi. The VDI file you obtain is a Linux virtual machine with a pre-configured Hadoop environment. If it does not work, download the .vdi file directly from
- Start VirtualBox and click New. Type any name you want for your virtual machine, like "cs246". Choose Linux as the operating system to install and Ubuntu as the type of distribution. Set the memory size to at least 1024 MB. In the Hard Drive step, check the "Use an existing virtual hard drive" radio button, point to the provided cs246.vdi file, and click Create.
The virtual machine includes the following software:
- Ubuntu 12.04
- JDK 7 (1.7.0_10)
- Hadoop 1.0.4
- Eclipse 4.2.1 (Juno)
Hadoop can be run in three modes.
1. Standalone (or local) mode: There are no daemons running in this mode. Hadoop uses the local file system as a substitute for HDFS. If you run jps in your terminal, there will be no JobTracker, NameNode, or other daemons listed. Jobs run as if there were 1 mapper and 1 reducer.
2. Pseudo-distributed mode: All the daemons run on a single machine, and this setting mimics the behavior of a cluster. All the daemons run locally on your machine using the HDFS protocol. There can be multiple mappers and reducers.
3. Fully-distributed mode: This is how Hadoop runs on a real cluster.
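In Hadoop 1.x, the mode is determined by a handful of XML files in the conf directory. The VM should already come pre-configured, but as a rough sketch, a minimal pseudo-distributed setup looks something like this (port 54310 matches the hdfs://localhost:54310 URI used later in this post; the 54311 JobTracker port and the exact file locations are assumptions, so check your VM's /usr/local/hadoop/conf before relying on them):

```xml
<!-- conf/core-site.xml : URI of the default (HDFS) file system -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml : single node, so one block replica is enough -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml : address the JobTracker listens on (assumed port) -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>
```

With these properties left empty or absent, Hadoop falls back to standalone mode instead.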
To start Hadoop in pseudo-distributed mode:
$ sh /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-cs246-namenode-cs246.out
localhost: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-cs246-datanode-cs246.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-cs246-secondarynamenode-cs246.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-cs246-jobtracker-cs246.out
localhost: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-cs246-tasktracker-cs246.out
To stop Hadoop:
$ sh /usr/local/hadoop/bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
To view the files in HDFS:
$ hadoop fs -ls
Found 5 items
drwxr-xr-x   - cs246 supergroup        0 2013-02-08 16:51 /user/cs246/MaxTempDataHDFS
-rw-r--r--   1 cs246 supergroup  9582237 2013-02-14 20:14 /user/cs246/NYSEDATA_HDFS
drwxr-xr-x   - cs246 supergroup        0 2013-01-11 07:01 /user/cs246/dataset
drwxr-xr-x   - cs246 supergroup        0 2013-02-07 17:12 /user/cs246/hadoopDir
drwxr-xr-x   - cs246 supergroup        0 2013-01-11 07:04 /user/cs246/output
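The columns in this listing are permissions, replication factor (a dash for directories), owner, group, size in bytes, modification date and time, and path. Since they are whitespace-separated, scripting against the output is straightforward; a small Python sketch (the function name is mine, not part of any Hadoop tooling):

```python
def parse_hdfs_ls(line):
    """Split one `hadoop fs -ls` output line into its 8 fields.

    maxsplit=7 keeps the path intact even if it contains spaces.
    """
    perms, repl, owner, group, size, date, time, path = line.split(None, 7)
    return {
        "perms": perms,
        # Directories show "-" instead of a replication count.
        "replication": None if repl == "-" else int(repl),
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": "%s %s" % (date, time),
        "path": path,
    }
```

For example, feeding it the NYSEDATA_HDFS line above yields a size of 9582237 and a replication factor of 1.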
To view the status of the various Hadoop daemons:
$ jps
12868 TaskTracker
12555 SecondaryNameNode
13318 Jps
12116 NameNode
12332 DataNode
12649 JobTracker
Copy local data to HDFS:
$ hadoop fs -copyFromLocal /home/share/currency_dat.csv hdfs://localhost:54310/user/cs246/currency_data.csv
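Once data is in HDFS, Hadoop 1.0.4's streaming facility lets map and reduce logic be written as plain scripts that read stdin and write tab-separated pairs. As a sketch of the classic word-count pattern (this is the logic Hadoop distributes across its mappers and reducers; it uses only the Python standard library):

```python
from itertools import groupby

def map_words(lines):
    """Map phase: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_counts(pairs):
    """Reduce phase: sum the counts for each word.

    The input must be sorted by key -- which is exactly what
    Hadoop's shuffle/sort step guarantees between the two phases.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Local simulation of standalone mode: map -> sort -> reduce.
sample = ["hello hadoop", "hello hdfs"]
counts = dict(reduce_counts(sorted(map_words(sample))))
print(counts)  # {'hadoop': 1, 'hdfs': 1, 'hello': 2}
```

On the VM, scripts like these would typically be submitted through the streaming jar, roughly `hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.4.jar -input <hdfs path> -output <hdfs path> -mapper mapper.py -reducer reducer.py` (the jar path is an assumption about this VM's layout).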
Note: Ensure the Hadoop NameNode is up and running before copying.