
Using Falcon in HDP for backup – ongoing work

April 19, 2015

My project needs to evaluate whether Apache Falcon is suitable for backup between two clusters.

Following the only example I could find:

The main difference between the example and my tests is that I created my own clusters and did not use the HDP sandbox.

The issues I’ve encountered:

Issue #1

When you run Falcon, you'll get this error:

Error: Invalid Execute server or port:
Cannot initialize Cluster. Please check your configuration for and the correspond server addresses.

To overcome this, you need to disable the parameter "yarn.timeline-service.enabled".

Taken from here –

In the Ambari UI, click on YARN, click on Configs, and under Application Timeline Server uncheck the box next to yarn.timeline-service.enabled. Save, then restart YARN, then restart Falcon.
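For reference, the setting that the Ambari step above changes ends up in yarn-site.xml looking like this (a sketch; on an Ambari-managed cluster make the change through the UI so it isn't overwritten):

```xml
<!-- yarn-site.xml: disable the Application Timeline Server -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>
```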

Issue #2

Trying to submit the process entity showed an error:

falcon entity -type process -submit -file emailIngestProcess.xml
Error: org.apache.hadoop.ipc.RemoteException: User: falcon is not allowed to impersonate falcon

For this you'll need to change the parameter "hadoop.proxyuser.falcon.groups"
in the HDFS config to grant the right group permissions.
I just put "*" (an asterisk) so it allows all groups,
then restarted HDFS and the other affected services.
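The proxyuser change above lands in core-site.xml. A sketch of the resulting properties, using the wide-open "*" value I chose (the matching hosts property may need the same treatment):

```xml
<!-- core-site.xml: allow the falcon user to impersonate other users -->
<property>
  <name>hadoop.proxyuser.falcon.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.falcon.hosts</name>
  <value>*</value>
</property>
```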

Issue #3

If you're behind a proxy, you'll have to change the script in HDFS – this can be done from Hue using the file browser:
/user/ambari-qa/falcon/demo/apps/ingest/fs/
Edit it and add your proxy server (export http_proxy=http://proxyserver:8080 – or whatever port you're using).
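The same edit can be scripted from a shell instead of Hue. A minimal sketch, assuming the ingest script is a shell script named ingest.sh (the script name and proxy host are placeholders, not values from the demo):

```shell
# Pull the script out of HDFS (commented out here; requires a live cluster):
# hdfs dfs -get /user/ambari-qa/falcon/demo/apps/ingest/fs/ingest.sh .

# Stand-in file so the edit below can be demonstrated locally:
printf '#!/bin/bash\nwget http://example.com/data\n' > ingest.sh

# Insert the proxy export right after the shebang line (GNU sed):
sed -i '1a export http_proxy=http://proxyserver:8080' ingest.sh

# Push the edited script back:
# hdfs dfs -put -f ingest.sh /user/ambari-qa/falcon/demo/apps/ingest/fs/
```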

Issue #4
Trying to schedule rawEmailIngestProcess returns an error:

falcon entity -type process -schedule -name rawEmailIngestProcess
Error: null

There is probably a bug here – the process has to have an input feed. Taken from here –

The version I'm using probably does not have the fix yet.

I created an empty feed (copied rawEmailFeed.xml and modified it):

<?xml version="1.0" encoding="UTF-8"?>
<!-- A feed representing Hourly customer email data retained for 90 days -->
<feed description="Empty feed" name="emptyFeed" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <late-arrival cut-off="hours(4)"/>
    <clusters>
        <cluster name="primaryCluster" type="source">
            <validity start="2014-02-28T00:00Z" end="2016-03-31T00:00Z"/>
            <retention limit="days(90)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/none"/>
        <location type="stats" path="/none"/>
        <location type="meta" path="/none"/>
    </locations>
    <ACL owner="ambari-qa" group="users" permission="0777"/>
    <schema location="/none" provider="none"/>
</feed>

and then loaded it

falcon entity -type feed -submit -file emptyFeed.xml

I modified emailIngestProcess.xml and added an inputs section to it:

<inputs>
    <input name="input" feed="emptyFeed" start="now(0,0)" end="now(0,0)" />
</inputs>

and then deleted and reloaded rawEmailIngestProcess:

falcon entity -type process -delete -name rawEmailIngestProcess
falcon entity -type process -submit -file emailIngestProcess.xml

Issue #5

Because I'm installing my own clusters and not using the sandbox, everything has to be configured correctly:

Check out Chapter 19.3 in

You need to change the property oozie.service.HadoopAccessorService.hadoop.configurations to something like:


where h153 and h156 are the host names of the two clusters' NameNodes and ResourceManagers.
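The value of that property is a comma-separated list of AUTHORITY=HADOOP_CONF_DIR pairs, one per NameNode/ResourceManager endpoint. A sketch of what I mean, in oozie-site.xml (the ports and conf directories below are example values, not my actual configuration):

```xml
<!-- oozie-site.xml: map each remote cluster endpoint to a local conf dir -->
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <value>*=/etc/hadoop/conf,h153:8020=/etc/hadoop/conf,h153:8050=/etc/hadoop/conf,h156:8020=/etc/backupcluster/conf,h156:8050=/etc/backupcluster/conf</value>
</property>
```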

That's as far as I've gotten so far.

Next step: backing up Hive tables.

