YARN and logstash [FAIL] :(

Posted on January 22, 2015

I really tried, but so far I have failed to send the logs of YARN tasks to logstash (using CDH 5.3).

(And now, someone please come along and say: “hey, it’s easy: just add bla to foo” and make me happy again…)

What I did so far:

  • Installed logstash, kibana and elasticsearch
  • Played with logstash and log4j using a Java program
  • Added parameters to the YARN log4j properties and messed things up

First – install logstash and friends:

Just follow this excellent blog post: “How to install Logstash with Kibana interface on RHEL”.

Second – Java program:

I took the code from here – “Log4j Hello World Example”.
But, as I’m a CLI person, I just removed the package line – see the Java code below.

To compile – you’ll need java, javac and log4j.
On Red Hat, install them using yum, then compile using javac and run (you’ll also need the log4j properties file, see below):

yum -y install java-1.7.0-openjdk java-1.7.0-openjdk-devel log4j
javac -cp /usr/share/java/log4j.jar:. HelloExample.java
java -cp /usr/share/java/log4j.jar:. HelloExample

Third – with SocketAppender for logstash

I took it from here – “Log4j SocketAppender and socket server example”. See the file below.

Fourth – connect CDH log files to logstash

There are several ways to do it:

  • rsyslogd
    • As rsyslog is already installed on RHEL, this is easy.
    • Just add a configuration file – /etc/rsyslog.d/logstash.conf – see an example below
    • You’ll need to add each and every log of the Cloudera processes – each log is located in a different directory
    • Restart the rsyslogd service
    • The rsyslog documentation is here – “Welcome to Rsyslog” – but I did not find it too user friendly, so I just modified their examples.
  • logstash.conf
    • To use logstash – you’ll need to install logstash on each node (well, obviously…)
    • Configure /etc/logstash/conf.d/logstash.conf – their site has good documentation: “Logstash Config Language”
    • Start the logstash service
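
Since every Cloudera log needs its own block of rsyslog directives, a small generator can save some typing. Here is a minimal Java sketch, assuming rsyslog's legacy imfile directive syntax. The class name RsyslogStanza, the method stanza and the second log entry are made-up names for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RsyslogStanza {
    // Build the legacy imfile directives for one log file.
    // The state file name is derived from the tag.
    static String stanza(String logPath, String tag) {
        return "$InputFileName " + logPath + "\n"
             + "$InputFileTag " + tag + ":\n"
             + "$InputFileStateFile state-" + tag + "\n"
             + "$InputRunFileMonitor\n";
    }

    public static void main(String[] args) {
        // tag -> log file; the second entry is a hypothetical placeholder
        Map<String, String> logs = new LinkedHashMap<>();
        logs.put("cloudera-scm-server-db", "/var/log/cloudera-scm-server/db.log");
        logs.put("example-role", "/full/path/to/log/example.log");

        // The imfile module has to be loaded once, before the per-file blocks
        StringBuilder conf = new StringBuilder("$ModLoad imfile\n\n");
        for (Map.Entry<String, String> e : logs.entrySet()) {
            conf.append(stanza(e.getValue(), e.getKey())).append('\n');
        }
        System.out.print(conf);
    }
}
```

The output can be redirected into /etc/rsyslog.d/logstash.conf; the forwarding rules still have to be added by hand.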

The problem with both rsyslogd and logstash is that they accept wildcards only in the file name of the log, while the directory of the log has to be given as a full path.
This is not good for YARN processes, as their logs are located under a generated directory which includes a unique name.
So, for most logs one can use /full/path/to/log/*.log, except for YARN logs, which would need something like /full/path/to/*log*/*log.
I thought I could use the log4j properties snippet instead, as described below.
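
Another possible workaround (untested on the cluster) is to expand the wildcard path outside of rsyslogd/logstash and hand them full paths. The sketch below uses the glob matcher that ships with the JDK. The class name YarnLogGlob and the directory names in the usage line are assumptions for illustration, not real CDH locations.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

class YarnLogGlob {
    // True if the path matches the glob; '*' does not cross a directory boundary.
    static boolean matches(String glob, String path) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return m.matches(Paths.get(path));
    }

    // Walk the tree under root and collect every file whose full path matches the glob.
    static List<String> expand(String root, final String glob) throws IOException {
        final List<String> hits = new ArrayList<>();
        Files.walkFileTree(Paths.get(root), new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (matches(glob, file.toString())) {
                    hits.add(file.toString());
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java YarnLogGlob <root dir> <glob>");
            return;
        }
        for (String p : expand(args[0], args[1])) {
            System.out.println(p);
        }
    }
}
```

For example, `java YarnLogGlob /full/path/to '/full/path/to/*log*/*log'` prints one full path per matching file. Since the YARN directories keep changing, the list would have to be regenerated regularly, so this is a stopgap at best.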

  • CDH
    • The configuration has to be done via the CDH web admin (http://host:7180)
    • Go to each component and search for
    • You’ll get a list of snippets with a description like “For advanced use only, a string to be inserted into for this role only.”
    • After adding the lines listed below, the changed service has to be restarted.
    • CDH then regenerates the configuration directory and adds the lines to the conf file

Fifth – YARN fail

After doing all the previous steps, suddenly all my YARN NodeManagers appeared red.
The java process was up, but:

  • log files were not created (/var/log/hadoop-yarn/)
  • port 8042 was not open
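
A quick way to confirm whether a port such as 8042 is really listening is a plain TCP connect. Here is a minimal sketch using only the JDK. The class name PortCheck and the 2-second timeout are arbitrary choices, and HOSTNAME stands for the node being checked.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    // Try a TCP connect; true means something accepted the connection.
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host not reachable
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java PortCheck HOSTNAME PORT");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        String state = isOpen(host, port, 2000) ? "open" : "closed";
        System.out.println(host + ":" + port + " is " + state);
    }
}
```

Running `java PortCheck HOSTNAME 8042` against each node shows which ones are affected. The same check works for the logstash port before pointing log4j at it.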

After investigating, I found that my configuration was bad, and I removed the

Conclusion so far – my attempt to send the YARN logs to logstash has failed.

Last – to do next

Until now I played on our large cluster, and my logstash experiments interfered with other team members.
(Foolishly, I thought it would work.)

Now I’m going to create a small cluster on VMs and play there.

Once I succeed, I’ll post again.

Appendix – here are the files I used

import org.apache.log4j.Logger;
public class HelloExample {
	final static Logger logger = Logger.getLogger(HelloExample.class);

	public static void main(String[] args) {
		HelloExample obj = new HelloExample();
		obj.runMe("hello"); // any test string will do
	}

	// Log one message at each level; debug and info are guarded by level checks
	private void runMe(String parameter) {
		if (logger.isDebugEnabled()) {
			logger.debug("This is debug : " + parameter);
		}
		if (logger.isInfoEnabled()) {
			logger.info("This is info : " + parameter);
		}
		logger.warn("This is warn : " + parameter);
		logger.error("This is error : " + parameter);
		logger.fatal("This is fatal : " + parameter);
	}
}


#Define the log4j configuration for local application
log4j.rootLogger=ERROR, server
#We will use socket appender
log4j.appender.server=org.apache.log4j.net.SocketAppender
#Port where socket server will be listening for the log events
#Note - should be the same port as defined in logstash
log4j.appender.server.Port=4560
#Host name or IP address of socket server
#Note - should be the logstash server
log4j.appender.server.RemoteHost=HOSTNAME
#Define any connection delay before attempting to reconnect
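
A side note on comments in this file: Java properties files treat # as a comment only at the start of a line, so a note written after a value becomes part of that value, and log4j would then most likely fail to read it as a port number. The JDK-only sketch below shows the difference. The class name InlineCommentCheck is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class InlineCommentCheck {
    // Parse properties text and return the raw value stored for a key.
    static String valueOf(String propertiesText, String key) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return p.getProperty(key);
    }

    public static void main(String[] args) {
        String key = "log4j.appender.server.Port";
        String inline = key + "=4560 # Note - same port as logstash\n";
        String separate = "# Note - same port as logstash\n" + key + "=4560\n";
        // prints [4560 # Note - same port as logstash]
        System.out.println("[" + valueOf(inline, key) + "]");
        // prints [4560]
        System.out.println("[" + valueOf(separate, key) + "]");
    }
}
```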


$ModLoad imfile
$InputFileName /var/log/cloudera-scm-server/db.log
$InputFileTag cloudera-scm-server-db:
$InputFileStateFile state-cloudera-scm-server-db
$InputRunFileMonitor

$InputFilePollInterval 10

#Change the destination host and port (@@ sends over TCP, a single @ over UDP)
if $programname == 'cloudera-scm-server-db' then @@HOSTNAME:PORT
if $programname == 'cloudera-scm-server-db' then ~
