
Something works for a change (at least partially)


We have a sudden request from the sales department:

They want to see reports on an iPad to show to potential customers.

Well…

Should be easy?

  • Monday
    • We got the request
    • As the back-end server now requires a “Mobile” component, I decide to install a new instance on another server
    • and fail
    • The server needs an Oracle server for its metadata
    • During the installation I got some warnings, which I ignored
    • Copying a project from another server resulted in space issues on the instance
    • I’ve resized the Oracle tablespaces and recreated the metadata
    • Still warnings during installation (about UTF settings and such)
    • Meanwhile the product person sends a mail saying that the installation is done
  • Tuesday
    • Failed to use the instance, so I switched to another Oracle instance
    • No errors
    • We copy the relevant projects and set up the server
    • Now – how do we test that it works on an iPad? The sales person already flew to the customer
    • So – I install the Android SDK + emulator on my desktop
    • My desktop barely survives the emulator’s memory requirements.
    • I find how to connect the emulator to the network:
emulator.exe -avd Nexus_10_API_25 -dns-server 10.232.217.1
    • and how to install the APK:
adb.exe install /path/to/Product_GA_Android.apk
    • But I cannot find how to point the report application to the right URL to see the reports.
  • Wednesday
    • The product’s support person is OOO
    • We manage to put our hands on two iPads
    • New issue – how to connect the iPads to the corporate network?
    • Finding the right person who can do it is a major task
    • I have some other tasks
  • Thursday (today)
    • The product’s support person is here
    • While showing him the emulator I do some extra clicks
    • and voila – I can see the reports in the emulator!!
    • We’re halfway there…
    • Applause
  • TODO
    • Now we need the network and security teams to allow the network connection and we can do the final reports tuning.
    • We’ll meet them on Sunday.
    • Getting an iMac so we can have an iPad emulator on it (a tough task)
    • Although I’m sure that when the sales team does a demo for the next customer, the network will not be available.

Now we can rest for the weekend

How to delete a cluster in HortonWorks (for testing purposes)


Background

During my testing of creating clusters using blueprints, I just want to recreate the cluster – not reinstall all the packages.

Unlike with the other vendor we’re using, it is impossible to delete a cluster in HortonWorks.

It is possible to remove components and hosts, but you’ll end up with a single component (usually zookeeper) on a single host, and this last component cannot be deleted.

So the steps should be:

Shutting down and cleaning cluster

  • Stop cluster
  • Remove all components in the right order of dependency (zookeeper will be last)
  • Remove all hosts till one is left
  • Stop ambari-agents on all hosts
  • Stop ambari-server
  • Stop PostgreSQL

Removing and recreating the cluster

  • Clean up the PostgreSQL data dir – /var/lib/pgsql/data
  • Initialise the DB using “postgresql-setup initdb”
  • Set up ambari if required, e.g.
    “ambari-server setup --jdbc-db=postgres --jdbc-driver=/path/to/postgresql-jdbc.jar”
  • Initialise ambari-server:
    “ambari-server setup -j /path/to/java/jdk/ -s”
  • Start ambari-server and the agents

And voila – an empty Ambari, ready for playing with blueprints.
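
For my own notes, the whole reset roughly boils down to the commands below (a sketch only – it assumes the embedded PostgreSQL with the default data dir, and the ambari-agent commands have to be run on every host):

# on every host
ambari-agent stop
# on the ambari-server host
ambari-server stop
# stop PostgreSQL (service postgresql stop / systemctl stop postgresql, depending on the OS)
rm -rf /var/lib/pgsql/data/*
postgresql-setup initdb
# re-run setup (the jdbc line only if required) and bring everything back up
ambari-server setup --jdbc-db=postgres --jdbc-driver=/path/to/postgresql-jdbc.jar
ambari-server setup -j /path/to/java/jdk/ -s
ambari-server start
# on every host
ambari-agent start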

TODO

  • Remove all components using REST APIs (a rough sketch is below)
  • Automate process with a single script
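
For the first TODO item, the Ambari REST calls should look roughly like the following (untested on my side yet; the host, cluster and component names are placeholders, the default admin:admin credentials and port 8080 are assumed, and a component has to be stopped before it can be deleted):

# delete a single component from a host
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
  http://ambari-host:8080/api/v1/clusters/mycluster/hosts/host1/host_components/ZOOKEEPER_SERVER
# delete a whole service
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
  http://ambari-host:8080/api/v1/clusters/mycluster/services/ZOOKEEPER
# remove a host from the cluster
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
  http://ambari-host:8080/api/v1/clusters/mycluster/hosts/host1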

Upgrading RedHat 6.x to 7.y


Bottom line

I did it!!

My recommendation

Don’t do it…

.

.

.

.

And the story goes like this:

Our application is based on a Hadoop cluster, installed with the HortonWorks distribution.
One of the customers is going to start with Version X and at some point upgrade to Version X+1. These versions require a change of the platform matrix: from HDP 2.2 to HDP 2.3 or 2.4, and from RHEL 6.5 to RHEL 7.1.

Checking with the Vendor, their recommendation was to upgrade the Hadoop cluster first, and then upgrade the OS.

Checking with their competitor Vendor, they specifically do not recommend an upgrade of the OS and prefer a fresh installation of RHEL 7.1.

Checking with RedHat, they say that it is doable, but they give a long list of constraints.

So I started to check it on some VMs we had.

Following the instructions from here – https://access.redhat.com/solutions/637583.

First obstacle

I had to have IT give me access to the various channels required.
It took them some time, but they did manage to give it to me.

Second obstacle

RedHat version (6.5 vs 6.7).
In the above instructions the first stage is to update all packages of the OS to the latest, i.e.

yum update -y
reboot

meaning to upgrade to RHEL 6.7

IT gave me access to local channels with 6.5 only. I did not notice at first, but it can be done afterwards – no harm done.

Third obstacle

Installing the pre-upgrade utility. Should be simple:

yum -y install preupgrade-assistant preupgrade-assistant-ui preupgrade-assistant-contents

But I failed with missing rpms – mainly openscap. It was not actually missing, but there was a mismatch between the i686 and x86_64 versions.

After a long battle, I found a site to download openscap with the lower version, and I installed the rpms locally:

openscap-1.0.8-1.el6_5.1.i686.rpm
openscap-1.0.8-1.el6_5.1.x86_64.rpm
openscap-content-1.0.8-1.el6_5.1.noarch.rpm
openscap-devel-1.0.8-1.el6_5.1.i686.rpm
openscap-devel-1.0.8-1.el6_5.1.x86_64.rpm
openscap-engine-sce-1.0.8-1.el6_5.1.x86_64.rpm
openscap-engine-sce-devel-1.0.8-1.el6_5.1.x86_64.rpm
openscap-extra-probes-1.0.8-1.el6_5.1.x86_64.rpm
openscap-python-1.0.8-1.el6_5.1.x86_64.rpm
openscap-utils-1.0.8-1.el6_5.1.x86_64.rpm

Only after installing the above rpms, with this exact version (and I tried a few others), did I manage to install the pre-upgrade tool and run it.
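
For reference, the whole sequence on my VM was roughly this (installing the downloaded rpms locally and then running the assistant itself; if I remember correctly, preupg drops its results under /root/preupgrade):

yum -y localinstall openscap-*.rpm
yum -y install preupgrade-assistant preupgrade-assistant-ui preupgrade-assistant-contents
preupg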

Fourth obstacle

Output of the pre-upgrade utility itself.

It gave many warnings that need to be checked and only a few FAILs that have to be resolved.

One of the errors was that I had the wrong OS flavor – as I did not upgrade to 6.7. I managed to do it without IT’s help, using a rhel67.iso file that I had.

Another was about the /usr directory, which cannot reside on a separate FS because of some changes made in RHEL 7.1.

Other issues related to the eth naming convention (and it’s good that this is just a VM with a single network link and not a physical server…)

And a few other issues – some look more important than others, but none are show stoppers.

One of the “funny” issues is that there are many rpms installed that are not signed by RHEL. When I look at the list, most, if not all of them, are Hadoop-related rpms…

Running the upgrade

RedHat warns in their instructions that it may take a long time, but on my small VM it was actually very fast, and even the reboot brought the VM back alive with no issues.
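
For completeness, the upgrade itself is driven by redhat-upgrade-tool; on my VM it was along these lines (the repo URL is a placeholder and the exact flags may differ between versions – check the RedHat instructions linked above):

yum -y install redhat-upgrade-tool
redhat-upgrade-tool --network 7.1 --instrepo http://your-local-repo/rhel-7-server/
reboot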

Fifth obstacle

Let the fun begin.
The VM is connected to the Hadoop cluster, meaning that ambari-agent is running OK.
But no other Hadoop process is willing to start.

To Be Continued…

Hortonworks fails to create namenode high availability


We worked on this for more than a day, until we found how to work around it.

We have Ambari 1.6.1 running HDP 2.1 (quite old, but this is what our customer has).

Issue:
When one tries to enable HA for the namenode, there is a nice wizard telling you to do this and that.
In step 4 we should run:

hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace

and then wait till the checkpoint is created.
But the checkpoint is never detected.

We even found a jira about this:

“Ambari NN HA wizard cannot detect checkpoint” – https://issues.apache.org/jira/browse/AMBARI-7220

But the fix is in Ambari 1.7.0 – which does not support HDP 2.1 :-(

Solution:

We’ve copied the URL to another tab in the browser and just changed the step number:
http://host:8080/#/main/admin/highAvailability/enable/step5

Then we could continue with the wizard.

Updating CDH configuration using python and REST APIs


I have improved the script from my previous post.

Now I’ve composed a Python script (my first serious Python script…). (And I think it could be done easily in bash…)

The script could be beautified, but I think it is quite readable as it is now.

This particular script modifies two parameters for YARN.

The parameters and their values are hard-coded, but it seems not too complicated to take everything from the input as variables.

Next step – creating CDH clusters using scripts only.

modify_yarn_config.py

#!/usr/bin/python
import urllib2
import sys, getopt
import base64
from urlparse import urlparse
import json
from pprint import pprint
import cdhRest
yarn_json = ' { "items" : [ { "name" : "yarn_nodemanager_resource_cpu_vcores", "value" : "8" }, { "name" : "yarn_nodemanager_resource_memory_mb", "value" : "8192" } ] }'
content_header = {'Content-type':'application/json', 'Accept':'application/vnd.error+json,application/json', 'Accept-Version':'1.0'}
yarnJsonObj = json.loads(yarn_json)
def main(argv):
  chost = ''
  username = ''
  password = ''
  try:
    opts, args = getopt.getopt(argv,"h:u:p:",["chost=","user=","pass="])
  except getopt.GetoptError:
    print 'test.py -h <chost> -u <username> -p <password>'
    sys.exit(2)
  for opt, arg in opts:
    if opt in ("-h", "--chost"):
      chost = arg
    elif opt in ("-u", "--user"):
      username = arg
    elif opt in ("-p", "--pass"):
      password = arg

  base64string = base64.encodestring( '%s:%s' % (username, password))[:-1]
  authheader = 'Basic %s' % base64string
  apiver = cdhRest.getversion(chost, authheader)
  baseurl = "http://" + chost + ":7180/api/" + apiver + "/clusters"
  clusterslist = cdhRest.get_cluster_names(baseurl, authheader)
  for cluster in clusterslist:
    baseurl1 = baseurl + "/" + cluster + "/services"
    service = cdhRest.get_service_name_by_type(baseurl1, "YARN", authheader)
    baseurl1 = baseurl1 + "/" + service + "/roleConfigGroups"
    confgroups = cdhRest.get_conf_groups(baseurl1, "NODEMANAGER", authheader)
    for confgroup in confgroups:
      baseurl2 = baseurl1 + "/" + confgroup + "/config?view=full"
      req = urllib2.Request(baseurl2)
      req.add_header("Authorization", authheader)
      handle = urllib2.urlopen(req)
      thepage = handle.read()
      data = json.loads(thepage)
# example taken from here http://stackoverflow.com/questions/21243834/doing-put-using-python-urllib2
      baseURL = baseurl1 + "/" + confgroup + "/config"
      request = urllib2.Request(url=baseURL, data=json.dumps(yarnJsonObj), headers=content_header)
      request.add_header("Authorization", authheader)
      request.get_method = lambda: 'PUT' #if I remove this line then the POST works fine.

      response = urllib2.urlopen(request)

if __name__ == "__main__":
  main(sys.argv[1:])

cdhRest.py

#!/usr/bin/python

import urllib2
import sys, getopt
import base64
from urlparse import urlparse
import json
from pprint import pprint

#
# function get version - get the REST api version of cm
#
def getversion(chost, authheader):
  theurl = "http://" + chost + ":7180/api/version"
  req = urllib2.Request(theurl)
  req.add_header("Authorization", authheader)
  handle = urllib2.urlopen(req)
  ret = handle.read()
  return ret

#
# get_cluster_names - get list of cluster names
#
def get_cluster_names(theurl, authheader):
  req = urllib2.Request(theurl)
  req.add_header("Authorization", authheader)
  handle = urllib2.urlopen(req)
  thepage = handle.read()
  data = json.loads(thepage)
  ret = []
  for xx in data["items"]:
    ret = ret + [xx["name"]]
  return ret

#
# get_service_name_by_type - get service name by type
#
def get_service_name_by_type(theurl, type, authheader):
  req = urllib2.Request(theurl)
  req.add_header("Authorization", authheader)
  handle = urllib2.urlopen(req)
  thepage = handle.read()
  data = json.loads(thepage)
  for xx in data["items"]:
    aa = xx["type"]
    if (aa == type):
      ret = xx["name"]
      return ret

#
# get_conf_groups - get list of configuration groups
#
def get_conf_groups(theurl, roleType, authheader):
  req = urllib2.Request(theurl)
  req.add_header("Authorization", authheader)
  handle = urllib2.urlopen(req)
  thepage = handle.read()
  data = json.loads(thepage)
  ret = []
  for xx in data["items"]:
    if (xx["roleType"] == roleType):
      ret = ret + [xx["name"]]
  return ret
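
For completeness, this is roughly how I run it (the host name and credentials below are placeholders; cdhRest.py just has to sit next to the script):

python modify_yarn_config.py -h cm-host.example.com -u admin -p admin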


Using REST APIs with Cloudera


We added a new function to Oozie, and it forces us to do some extra post-installation tasks:

  • Copy the jar to the oozie read location
  • Update a few configuration parameters

Posting a question in the Cloudera forum returned a quick answer that helped me write the script below.

Note – I could have made the script much shorter if I had hard-coded the required json file and just loaded it. But this was more fun, and it may be useful if there are changes in the oozie configuration in the next versions.

Once I got this working, my next target is to create a new full cluster with REST APIs only.

Update – the configuration has to be fetched with view=full (as the GET in the script does), otherwise you get a partial configuration and the modifications below do not work.

#!/bin/bash
#
# This script modifies configuration in oozie for the new functionality
#
# It first extracts the current configuration for oozie in json format
# Then it checks 3 parameters: xxxHadoopCounter, com.xxx.oozie.actions.WaitAction and wait-action-0.5.xsd
# If the parameter is missing it is added to the json file
#
# Then it loads the json back into the CDH
#
#
# Usage
#
function Usage {
echo "Usage: $0 "
exit 2
}
export host=$1
[[ -z "$host" ]] && Usage
#
# get_version
#
function get_version {
curl -u admin:admin "http://$host:7180/api/version" 2> /dev/null
}

export apiver=`get_version`
#
# get_cluster_name
#
function get_cluster_name {
curl -u admin:admin "http://$host:7180/api/$apiver/clusters" 2> /dev/null | awk '/name/{print $NF}'|sed 's/,//' | sed 's/"//g'
}

cname=`get_cluster_name`

tmp_json=/tmp/oozie_`date +%Y%m%d_%H%M%S`.json

#
# Extract the json
#
curl -u admin:admin "http://$host:7180/api/${apiver}/clusters/${cname}/services/oozie/roleConfigGroups/oozie-OOZIE_SERVER-BASE/config?view=full" > $tmp_json 2> /dev/null
cp    $tmp_json ${tmp_json}.ORIG

#
# 1. add the oozie_config_safety_valve
#
grep -l -s xxxHadoopCounter=com.xxx.oozie.CountersElFunctions#xxxHadoopCounter $tmp_json > /dev/null 2> /dev/null
if [ $? -ne 0 ]
then
awk '
/items/{
print $0;
print " \"name\" : \"oozie_config_safety_valve\",\n\"value\" : \"\\noozie.action.sharelib.for.java\\njava\\n\\n\\noozie.service.ELService.ext.functions.workflow\\nxxxHadoopCounter=com.xxx.oozie.CountersElFunctions#xxxHadoopCounter\\n\\n\"\n}, { " ; next ;
}
{print $0}
' $tmp_json > ${tmp_json}.new
mv ${tmp_json}.new $tmp_json
fi

#
# 2. add the oozie_executor_extension_classes
#
grep -l -s com.xxx.oozie.actions.WaitAction $tmp_json > /dev/null 2> /dev/null
if [ $? -ne 0 ]
then
awk '
/oozie_executor_extension_classes/{addvalue=",com.xxx.oozie.actions.WaitAction";}
/value" :/&&addvalue{sub(/\"$/,"");print $0 addvalue "\"" ;addvalue="";next}
{print $0}
' $tmp_json > ${tmp_json}.new
mv ${tmp_json}.new $tmp_json
fi

#
# 3. add the oozie_workflow_extension_schemas
#
grep -l -s wait-action-0.5.xsd $tmp_json > /dev/null 2> /dev/null
if [ $? -ne 0 ]
then
awk '
/oozie_workflow_extension_schemas/{addvalue=",wait-action-0.5.xsd";}
/value" :/&&addvalue{sub(/\"$/,"");print $0 addvalue "\"" ;addvalue="";next}
{print $0}
' $tmp_json > ${tmp_json}.new
mv ${tmp_json}.new $tmp_json
fi
curl -u admin:admin \
-H "Content-Type: application/json" \
-X PUT \
http://$host:7180/api/${apiver}/clusters/${cname}/services/oozie/roleConfigGroups/oozie-OOZIE_SERVER-BASE/config \
-d "`cat ${tmp_json}`" > /dev/null 2> /dev/null

Small bug when re-downloading Cloudera Parcels


I had an issue with the UID of users on different servers, so I had to delete and reinstall the cluster.

Unfortunately, I kept the old parcel files with the old UID:

ls -l /opt/cloudera/parcel-repo/CDH-5.3.2-1.cdh5.3.2.p0.10-el6.parcel*
 -rw-r----- 1 cloudera-scm cloudera-scm 1558200266 May 12 14:51 /opt/cloudera/parcel-repo/CDH-5.3.2-1.cdh5.3.2.p0.10-el6.parcel
 -rw-r----- 1 cloudera-scm cloudera-scm 848904192 May 12 14:52 /opt/cloudera/parcel-repo/CDH-5.3.2-1.cdh5.3.2.p0.10-el6.parcel.part
 -rw-r----- 1 522 522 41 Apr 7 13:33 /opt/cloudera/parcel-repo/CDH-5.3.2-1.cdh5.3.2.p0.10-el6.parcel.sha

Note that the *.parcel.sha file still has the old UID of the cloudera-scm account.

I saw that the parcel was being downloaded and redownloaded in an endless loop.

In the log file I saw:
 2015-05-12 14:07:14,322 INFO MainThread:com.cloudera.parcel.components.PeriodicParcelTasks: Set up periodic parcel tasks every 60 minutes.
 2015-05-12 14:07:14,337 INFO ParcelUpdateService:com.cloudera.parcel.components.LocalParcelManagerImpl: Found files CDH-5.3.2-1.cdh5.3.2.p0.
 10-el6.parcel under /opt/cloudera/parcel-repo
 2015-05-12 14:07:14,352 WARN ParcelUpdateService:com.cloudera.parcel.components.LocalParcelManagerImpl: Error reading hash file: CDH-5.3.2-1
 .cdh5.3.2.p0.10-el6.parcel.sha
 java.io.FileNotFoundException: /opt/cloudera/parcel-repo/CDH-5.3.2-1.cdh5.3.2.p0.10-el6.parcel.sha (Permission denied)
 at java.io.FileInputStream.open(Native Method)
 at java.io.FileInputStream.(FileInputStream.java:146)
 at com.google.common.io.Files$FileByteSource.openStream(Files.java:124)
 at com.google.common.io.Files$FileByteSource.openStream(Files.java:114)
 at com.google.common.io.ByteSource$AsCharSource.openStream(ByteSource.java:287)
 at com.google.common.io.CharSource.openBufferedStream(CharSource.java:80)
 at com.google.common.io.CharSource.readFirstLine(CharSource.java:157)
 at com.google.common.io.Files.readFirstLine(Files.java:674)
 at com.cloudera.parcel.components.LocalParcelManagerImpl.readFirstLineFromFile(LocalParcelManagerImpl.java:392)
 at com.cloudera.parcel.components.LocalParcelManagerImpl.getParcelHash(LocalParcelManagerImpl.java:348)
 at com.cloudera.parcel.components.LocalParcelManagerImpl.processParcel(LocalParcelManagerImpl.java:182)
 at com.cloudera.parcel.components.LocalParcelManagerImpl.scanRepo(LocalParcelManagerImpl.java:142)
 at com.cloudera.parcel.components.LocalParcelManagerImpl$1.run(LocalParcelManagerImpl.java:155)
 at com.cloudera.parcel.components.LocalParcelManagerImpl$1.run(LocalParcelManagerImpl.java:152)
 at com.cloudera.cmf.persist.ReadWriteDatabaseTaskCallable.call(ReadWriteDatabaseTaskCallable.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

Once I deleted the old sha file, the downloading of the parcel ended and I could continue with the installation.
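
For the record, the fix itself is a one-liner (re-owning the file to the current cloudera-scm UID should, I assume, work just as well):

rm /opt/cloudera/parcel-repo/CDH-5.3.2-1.cdh5.3.2.p0.10-el6.parcel.sha
# or: chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/*.parcel.sha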

I’d think there should be some kind of warning on the screen that something is wrong, so one won’t have to wait that long.

This happened with CDH 5.4.