
Duplicating Cloudera VM

January 27, 2015

As an alternative to the VMDK provided by Cloudera, these steps show how to create a template from a VM on ESXi, managed through vSphere, with (almost) any component.

Basic steps:

  1. Create a working cluster with all required components on a single VM
  2. Shut the cluster down and disable the CDH services
  3. Create template from VM and deploy a new VM from the template
  4. Perform hostname changes on the new VM
  5. Restart the cluster in the new VM and check everything works

I’ve checked the following components:

HDFS, Hive, Hue, Impala, Oozie, Spark, YARN (MR2 Included), ZooKeeper

And I used these instructions:
Note – the psql commands changed between v4-8-3 and v5-3-0

Step 1 – installation

Install CDH with required components and check everything works

Step 2 – shut down cluster

From CDH web console – Shut down the cluster and Cloudera Management Services

From Linux CLI:

service cloudera-scm-agent stop
service cloudera-scm-server stop
service cloudera-scm-server-db stop
chkconfig cloudera-scm-agent off
chkconfig cloudera-scm-server off
chkconfig cloudera-scm-server-db off

Note – the chkconfig commands are important: they keep the services from starting automatically when the new VM boots, before its host name has been changed

Step 3 – Create VM template

From the vSphere UI (or any other VMware tool)

  1. Select the VM and right-click to “Clone to Template…”
  2. Go to the newly created template and right-click to “Deploy virtual Machine”
  3. Start the newly created VM

Step 4 – Perform changes in new VM

Change the hostname and set a new IP (you might need the VM console for this)
A useful command for this is system-config-network

Cloudera related changes:

  1. Change the host name in the file /etc/cloudera-scm-agent/config.ini
  2. Update the host name in the following Postgres tables
    Use DbVisualizer (or any other tool)
    or the command “psql -U cloudera-scm -p 7432 -d scm”
    The password is in /var/lib/cloudera-scm-server-db/data/generated_password.txt
    Note – once the hosts table is changed, some of the changes can be made from the CM UI instead of via psql.
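For item 1, the edit to config.ini can be sketched as a small shell function. This is only a minimal sketch: the function name replace_hostname is hypothetical, and it assumes the old host name appears literally in the file and that neither name contains "/" or "&".

```shell
#!/bin/sh
# Replace every occurrence of the old host name in a config file.
# Usage: replace_hostname OLD NEW FILE
# (replace_hostname is a hypothetical helper, not part of Cloudera's tooling.)
replace_hostname() {
    old=$1 new=$2 file=$3
    # Keep a backup copy next to the original before changing anything.
    cp -p "$file" "$file.bak" || return 1
    # Escape dots so they match literally instead of acting as regex wildcards.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    # Rewrite the original file from the backup with the new name substituted.
    sed "s/$old_re/$new/g" "$file.bak" > "$file"
}
```

Example: replace_hostname OLDHOSTNAME.FQDN NEWHOSTNAME.FQDN /etc/cloudera-scm-agent/config.ini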

Table hosts

update hosts set name='NEWHOSTNAME.FQDN' where host_id=1;
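Statements like the one above can also be run without the interactive prompt. The sketch below wraps the psql command from this post in two hypothetical helpers, scm_password and scm_sql. It assumes the generated password is the first line of generated_password.txt, and it passes the password through the standard PGPASSWORD environment variable.

```shell
#!/bin/sh
# Print the embedded-database password.
# Assumption: the password is the first line of the generated file.
scm_password() {
    head -n 1 "${1:-/var/lib/cloudera-scm-server-db/data/generated_password.txt}"
}

# Run a single SQL statement against the scm database.
# Usage: scm_sql "select host_id,name from hosts;"
scm_sql() {
    # ON_ERROR_STOP makes psql return a non-zero status if the statement fails.
    PGPASSWORD=$(scm_password) psql -U cloudera-scm -p 7432 -d scm \
        -v ON_ERROR_STOP=1 -c "$1"
}
```

Example: scm_sql "select host_id,name from hosts;" shows the current rows before anything is changed.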

Table hosts_aud

select * from hosts_aud;
update hosts_aud set name='NEWHOSTNAME.FQDN' where host_id=1;

Table processes

select process_id,name,status_links from processes;

Write a separate update for each process, something like:

update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:8042/"}' where process_id=66;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:8084/"}' where process_id=29;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:8091/"}' where process_id=30;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:50090/"}' where process_id=62;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:50070/"}' where process_id=63;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:50075/"}' where process_id=64;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:25000/"}' where process_id=73;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:19888/"}' where process_id=65;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:11000/oozie"}' where process_id=74;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:25010/"}' where process_id=71;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:8088/"}' where process_id=67;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:8086/"}' where process_id=27;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:8087/"}' where process_id=28;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:25020/"}' where process_id=72;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:8888/"}' where process_id=75;
update processes set status_links='{"status":"http://NEWHOSTNAME.FQDN:18088"}' where process_id=82;

Note: the resource field needs to be checked as well
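Writing one update per process gets tedious. As an alternative (not what was used above), PostgreSQL's replace() string function can rewrite the host name in every matching row at once. The sketch below only prints the statement so it can be reviewed first. The helper name hostname_update_sql is hypothetical, and it assumes the column is a text type that holds the old host name literally.

```shell
#!/bin/sh
# Print an UPDATE that swaps OLD for NEW in one column of one table.
# Usage: hostname_update_sql TABLE COLUMN OLD NEW
# (hostname_update_sql is a hypothetical helper; it prints SQL and runs nothing.)
hostname_update_sql() {
    table=$1 column=$2 old=$3 new=$4
    # %% is a literal percent sign, giving the LIKE pattern '%OLD%'.
    printf "update %s set %s = replace(%s, '%s', '%s') where %s like '%%%s%%';\n" \
        "$table" "$column" "$column" "$old" "$new" "$column" "$old"
}
```

Example: hostname_update_sql processes status_links OLDHOSTNAME NEWHOSTNAME.FQDN prints a single statement covering all the process rows. Note that if OLDHOSTNAME is the short name and the rows hold the full name, the replacement will keep the old domain suffix, so check the select output first.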

Tables configs_aud and configs (two similar tables)

select config_id,attr,value from configs_aud where value like '%OLDHOSTNAME%';

Update according to the output, e.g.:

update configs_aud set value='NEWHOSTNAME.FQDN' where config_id=63;
update configs_aud set value='NEWHOSTNAME.FQDN:7432' where config_id=16;

Table commands

select command_id,arguments from commands where arguments like '%OLDHOSTNAME%';
update commands set arguments='{"@class":"com.cloudera.cmf.command.BasicCmdArgs","alertConfig":null,"args":["NEWHOSTNAME.FQDN","postgresql","NEWHOSTNAME.FQDN:7432","amon","amon","gybJy2O6OM"],"scheduleId":null,"scheduledTime":null}' where command_id=16;

Step 5 – Start Cluster on new VM

On new VM – Linux CLI – Restart services:

service cloudera-scm-server-db restart
service cloudera-scm-server restart
service cloudera-scm-agent restart
chkconfig cloudera-scm-agent on
chkconfig cloudera-scm-server on
chkconfig cloudera-scm-server-db on

From CDH web console – Start Cloudera Management Services and the Cluster itself

From CDH web console – Go to each component's configuration tab and search for leftovers of the old host name, e.g.:

  • in Hive – search for “Hive Metastore Database Host”
  • in Hue – search for “HDFS Web Interface Role”
  • in ZooKeeper – search for “ZooKeeper Server ID”

From CDH web console – might also need to “Deploy client configuration”

And lastly – clean up old log files:

find /var/log -type f | grep -i OLDHOSTNAME
rm -f `find /var/log -type f | grep -i OLDHOSTNAME`
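The backtick form breaks on file names containing spaces. A slightly more defensive sketch lets find do both the matching and the deletion. The helper name purge_host_logs is hypothetical, and -ipath is assumed to be available (it is in GNU find). By default it only lists; it deletes only when asked.

```shell
#!/bin/sh
# List, or delete, files under DIR whose path contains OLD (case-insensitive).
# Usage: purge_host_logs DIR OLD [delete]
# (purge_host_logs is a hypothetical helper; without "delete" it only lists.)
purge_host_logs() {
    dir=$1 old=$2 mode=${3:-list}
    if [ "$mode" = delete ]; then
        # -exec ... {} + passes names straight to rm, so spaces are safe.
        find "$dir" -type f -ipath "*$old*" -exec rm -f {} +
    else
        find "$dir" -type f -ipath "*$old*" -print
    fi
}
```

Example: run purge_host_logs /var/log OLDHOSTNAME to review the list, then purge_host_logs /var/log OLDHOSTNAME delete to remove the files.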

Next task – write a script or a Puppet manifest to do it all.
