The Snooze deployment on Grid’5000 ships with scripts for configuring a Hadoop cluster and launching benchmarks on it.
1 System deployment
Follow the deployment procedure explained in the documentation.
Once the system is deployed, launch some virtual machines. Because these virtual machines will host Hadoop services, we suggest giving each of them at least 3 vCPUs and 3 GB of RAM.
2 Configure Hadoop
Once the system is deployed, you will find the Hadoop deployment scripts on the first bootstrap:
$bootstrap) cd /tmp/snooze/experiments
You need to create a file containing the IP addresses of your virtual machines.
To do this, ask the EC2 API for the list of running instances; it returns an XML document describing every instance, including its IP address.
You then only have to parse that output to extract the IPs.
The commands below assume that you are connected to the first bootstrap and that the SnoozeEC2 service is running on that node.
$bootstrap) curl "localhost:4001?Action=DescribeInstances" > instances
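If you prefer to stay in Ruby for the whole step, the same request can be sketched with the standard Net::HTTP library. This is only an illustration: it assumes the SnoozeEC2 service answers a plain HTTP GET on localhost:4001, as the curl command does, and the helper names describe_instances_uri and fetch_instances are hypothetical, not part of the Snooze scripts.

```ruby
require 'net/http'
require 'uri'

# Build the DescribeInstances URL. Host and port default to the values
# used in the curl command (assumption: plain HTTP, no authentication).
def describe_instances_uri(host = 'localhost', port = 4001)
  query = URI.encode_www_form('Action' => 'DescribeInstances')
  URI::HTTP.build(host: host, port: port, path: '/', query: query)
end

# Send the GET request and return the response body (expected to be XML).
# Raises if the service does not answer with a 2xx status.
def fetch_instances(uri)
  response = Net::HTTP.get_response(uri)
  raise "Request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Example use on the bootstrap node:
#   File.write('instances', fetch_instances(describe_instances_uri))
```

Raising on a non-2xx status keeps an error page from being saved as the instances file and parsed later as if it were a valid answer.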
The following code prints the IP addresses of your running virtual machines, one per line. Redirect its output to the file /tmp/snooze/experiments/tmp/virtual_machine_hosts.txt.
require 'rexml/document'

# The "instances" file contains the output of the DescribeInstances request
doc = REXML::Document.new(File.read('instances'))

# Print the text of every <ipAddress> element, one IP per line
REXML::XPath.each(doc, '//ipAddress') { |item| puts item.text }
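As a variation, the parsing and the creation of the hosts file can be combined in one script, so no shell redirection is needed. The sketch below is an illustration under assumptions: the helper names extract_ips and write_hosts_file are hypothetical, and it assumes the IPs appear as the text of ipAddress elements, as in the XPath query used in this section. It also skips empty values and duplicates.

```ruby
require 'rexml/document'

# Return the text of every <ipAddress> element found in the XML string.
# Blank values are skipped so the hosts file contains no empty lines.
def extract_ips(xml)
  doc = REXML::Document.new(xml)
  ips = []
  REXML::XPath.each(doc, '//ipAddress') do |item|
    text = item.text.to_s.strip
    ips << text unless text.empty?
  end
  ips.uniq
end

# Write one IP per line to the given path.
def write_hosts_file(ips, path)
  File.write(path, ips.join("\n") + "\n")
end

# Example use on the bootstrap node (paths taken from this section):
#   ips = extract_ips(File.read('instances'))
#   write_hosts_file(ips, '/tmp/snooze/experiments/tmp/virtual_machine_hosts.txt')
```

Keeping the parsing in a function that takes a string makes it easy to check against a small hand-written XML sample before running it on a real response.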
Finally, go to /tmp/snooze/experiments/ and run:
$bootstrap) ./experiments -m configure
--
[Snooze-Experiments] Configuration mode (normal, variable_data):
normal
3 Launch a benchmark
You can get the list of available benchmarks by typing:
$bootstrap) ./experiments -m benchmark
--
[Snooze-Experiments] Benchmark name (e.g. dfsio, dfsthroughput, mrbench, nnbench, pi, teragen, terasort, teravalidate, censusdata, censusbench, wikidata, wikibench):
Type one of the listed names at the prompt to run that benchmark.