
Installing Greenplum Hadoop (GPHD) for Hadoop on VMs

Posted on February 7, 2013

GPHD is the only distribution that ships with the Hadoop Virtualization Extensions (HVE).
HVE adds another layer to the topology underneath the rack (the physical host), so you can run two VMs on the same physical host without having to worry that all the replicas of a block end up on that one machine.
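To make the extra layer concrete, here is a minimal sketch of a Hadoop topology script that reports a node group below the rack. It uses the slave hostnames from the cluster described further down. The VM-to-physical-host layout is purely hypothetical (I assume h197/h198 share one physical host and h199/h200 another), the fallback path is made up, and the exact path depth depends on how the topology is configured in your distribution.

```shell
#!/bin/sh
# Sketch of a topology script: Hadoop calls it with one or more host names
# or IPs and expects one network path per argument on stdout.
# ASSUMPTION: the VM-to-physical-host layout below is invented for illustration.
resolve_topology() {
    case "$1" in
        h197|h198) echo "/rack1/nodegroup1" ;;  # two VMs on the same physical host
        h199|h200) echo "/rack1/nodegroup2" ;;  # two VMs on another physical host
        *)         echo "/default-rack/default-nodegroup" ;;  # hypothetical fallback
    esac
}

# Print one path per argument, in the order given.
for host in "$@"; do
    resolve_topology "$host"
done
```

Two VMs that resolve to the same node group are treated as sharing a physical host, which is the information the block placement needs.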

Greenplum comes from EMC, which also owns VMware, so it is all in the same family.

Some notes about virtualization for Hadoop:

I failed to deploy Serengeti on my cluster, mainly because of my lack of knowledge of VMware.
The installation kept failing. I think this is because it requires DHCP to allocate IPs. The documentation says it can work with static IPs, and I think I gave it what it requires, but still no luck.
So I'm abandoning Serengeti. I created my own VMs instead, and now I'm going to try GPHD on them.

Installing GPHD:


The cluster has 6 servers:

  • h177 – admin
  • h196 – master
  • h197 – slave1
  • h198 – slave2
  • h199 – slave3
  • h200 – slave4
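Since these hostnames come up again in the deploy commands later on, it can help to keep them in shell variables. This is only a small sketch; the variable names are my own and not anything GPHD expects.

```shell
# Host roles from the list above; the variable names are illustrative only.
ADMIN_HOST="h177"
MASTER_HOST="h196"
SLAVE_HOSTS="h197 h198 h199 h200"

# Join the space-separated slave list with commas, the form used for
# host lists on the command line.
SLAVE_CSV=$(printf '%s' "$SLAVE_HOSTS" | tr ' ' ',')

echo "admin=$ADMIN_HOST master=$MASTER_HOST slaves=$SLAVE_CSV"
```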

Preparing the repo

cat /etc/yum.repos.d/puppetlabs.repo


yum -y install createrepo

yum -y install facter httpd mod_ssl

yum -y install postgresql-devel postgresql-server






ls -l *rpm

-rw-r--r--. 1 root root  564104 Jul 30  2011 augeas-libs-0.9.0-1.el6.rfx.x86_64.rpm
-rw-r--r--. 1 root root 1085304 Aug 22 00:55 puppet-2.7.19-1.el6.noarch.rpm
-rw-r--r--. 1 root root   25416 Aug 22 00:55 puppet-server-2.7.19-1.el6.noarch.rpm
-rw-r--r--. 1 root root   21520 Feb 14  2012 ruby-augeas-0.4.1-1.el6.x86_64.rpm
-rw-r--r--. 1 root root   11188 May  4  2010 ruby-shadow-1.4.1-13.el6.x86_64.rpm

rpm -ivh *rpm

Download GPHD_1_2_0_0_GA.all.tgz and extract it, then install the bundled Ruby packages:

rpm -ivh ./GPHD_1_2_0_0_GA.all/icm/rpm/external_rpm/rubygems-1.3.7-1.el6.noarch.rpm \
    ./GPHD_1_2_0_0_GA.all/icm/rpm/external_rpm/ruby-rdoc- \
    ./GPHD_1_2_0_0_GA.all/icm/rpm/external_rpm/rubygem-mongrel-1.1.5-3.el6.x86_64.rpm \
    ./GPHD_1_2_0_0_GA.all/icm/rpm/external_rpm/ruby-irb- \
    ./GPHD_1_2_0_0_GA.all/icm/rpm/external_rpm/rubygem-daemons-1.0.10-2.el6.noarch.rpm \
    ./GPHD_1_2_0_0_GA.all/icm/rpm/external_rpm/rubygem-gem_plugin-0.2.3-3.el6.noarch.rpm \
    ./GPHD_1_2_0_0_GA.all/icm/rpm/external_rpm/rubygem-fastthread-1.0.7-2.el6.x86_64.rpm \
    ./GPHD_1_2_0_0_GA.all/icm/rpm/external_rpm/rubygem-rake-0.8.7-2.1.el6.noarch.rpm

yum -y install libselinux-ruby.x86_64

run: GPHD_1_2_0_0_GA.all/icm/script/

I don't have Greenplum DB in the right version (4.2.3), so all I can do is skip this step.

jumping to page 15


rpm -ivh sshpass-1.05-1.el6.rf.x86_64.rpm

rpm -ivh ./GPHD_1_2_0_0_GA.all/icm/rpm/gphd_rpm/gphdmgr-webservices-1.0.0-1.noarch.rpm


# I had to run the postinstall a few times:

* It will ask to delete the existing postgres db.

* Tomcat has to be shut down manually BEFORE running the postinstall. Use /usr/lib/gphd/gphdmgr/apache-tomcat/bin/


Checking OS…[OK]

Verifying pre-requisites…

Verifying Java… [OK]

Verifying httpd… [OK]

Verifying apache-tomcat… [OK]

Verifying postgresql-server… [FAILED]

An existing postgres installation has been detected. gphdmgr cannot proceed with a pre-existing database


        1. Abort now. I’ll do the clean up myself (default)

        2. Back up the /usr/lib/pgsq/data and continue with the installation (Note: this will require stopping postgresql)

        3. Delete existing data dir and continue with the gphdmgr installation

Please enter your choice: 3

This will clean up the postgres data dir. Are you sure you want to proceed ? y/n: y

GPHD manager will remove the /usr/lib/pgsql/data and proceed with a new DB setup

Verifying other dependency packages… [OK]

Disabling SELINUX… [OK]

Setting network accessible yum repo… [OK]

Setting up postgres database server… [OK]

Initializing postgresql db…

Removing exising postgresql data dir… [OK]

Initializing puppet…

Backing up the old puppet agent report data… [OK]

Backing up the old gphdmgr icm puppetmaster ssl data… [OK]


Initializing tomcat… [OK]

Setting up gphdmgr… [OK]

Setting up Mongrel… [OK]

Creating soft links… [OK]

Setup complete

Verifying Services:

- Postgres database is running with pid:9191

- GPHDMgr web service is running with pid:9626

- Puppetmaster is running with pid:10132










SUCCESS: GPHDMgr admin node installation is successful

Deployment details can be found at /var/log/gphd/gphdmgr/installer.log

# installing

# Note: http_proxy needs to be unset before running icm_client

icm_client scanhost -o host1,host2,host3
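A sketch of the proxy cleanup before scanning. It clears the common proxy variables in the current shell (only http_proxy is mentioned above; the other variable names are my assumption) and prints the scanhost command for this cluster's hosts rather than running it. I also assume that only the master and the slaves are scanned, not the admin node.

```shell
# Clear proxy settings for this shell session only.
# ASSUMPTION: besides http_proxy, the other variants may matter as well.
unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY

# Master and slaves from the server list, comma-separated for the -o flag.
# ASSUMPTION: the admin node (h177) is left out of the scan.
HOSTS="h196,h197,h198,h199,h200"

# Print the command instead of executing it; drop the echo to run it for real.
echo icm_client scanhost -o "$HOSTS"
```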


Second installation attempt: components one by one


icm_client deploy hadoop -l testgphd -n h196 -j h196 -d h197,h198,h199,h200 -f /data/dfs -g /data/name1,/data/name2 -m /data/mapred -c /data/ckp


icm_client deploy zookeeper -l testgphd -z h196


icm_client deploy hbase -l testgphd -m h196 -r h197,h198,h199,h200

icm_client deploy pig -l testgphd -g h196

icm_client deploy mahout -l testgphd -m h196
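The component-by-component sequence can be wrapped in a small script that stops at the first failure. This is only a sketch: the `run` helper, the `deploy_all` function and the `DRY_RUN` variable are hypothetical names of mine, and the icm_client commands are exactly the ones listed above. By default it only prints the commands.

```shell
#!/bin/sh
# DRY_RUN defaults to 1 so the script only prints commands;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
CLUSTER="testgphd"

# run: hypothetical helper that prints a command and, unless in dry-run
# mode, executes it and aborts the script if it fails.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@" || { echo "FAILED: $*" >&2; exit 1; }
    fi
}

# Deploy the components in the same order as above.
deploy_all() {
    run icm_client deploy hadoop -l "$CLUSTER" -n h196 -j h196 -d h197,h198,h199,h200 \
        -f /data/dfs -g /data/name1,/data/name2 -m /data/mapred -c /data/ckp
    run icm_client deploy zookeeper -l "$CLUSTER" -z h196
    run icm_client deploy hbase -l "$CLUSTER" -m h196 -r h197,h198,h199,h200
    run icm_client deploy pig -l "$CLUSTER" -g h196
    run icm_client deploy mahout -l "$CLUSTER" -m h196
}

deploy_all
```

Running it once in dry-run mode is a cheap way to check the host lists and directories before anything is touched on the cluster.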

