Friday, September 6, 2013

Monitoring Solr with graphite and carbon


This post assumes that Graphite, Carbon and Python are installed on your *nix machine. I'm running this on Ubuntu.

http://graphite.wikidot.com/
https://launchpad.net/graphite/+download


To set up monitoring of the RAM usage of Solr instances (shards) with Graphite, you need two components:

1. backend: carbon
2. frontend: graphite
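Carbon accepts data points over a simple plaintext protocol: one line per point in the form "<metric path> <value> <unix timestamp>", each terminated by a newline, sent over TCP (port 2003 by default, the same value as CARBON_PORT in the script in this post). The following is a minimal sketch of formatting and sending such lines, assuming Carbon listens on 127.0.0.1:2003. The helper names format_metric and send_metrics are hypothetical and are not part of Graphite:

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: "<path> <value> <timestamp>\n"."""
    if timestamp is None:
        timestamp = int(time.time())  # default to "now" in Unix seconds
    return "%s %s %d\n" % (path, value, timestamp)

def send_metrics(lines, host="127.0.0.1", port=2003, timeout=5.0):
    """Send already-formatted lines to Carbon over a single TCP connection."""
    payload = "".join(lines).encode("ascii")
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(payload)
    finally:
        sock.close()
```

For example, send_metrics([format_metric("system.SHARD_NAME", 32)]) would record the value 32 for the current time.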

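The data is pushed to Carbon with a simple Python script. The script is called from a small shell wrapper, which cron runs every five minutes.

In my local crontab I have the following entry (the wrapper script must be executable):

1,6,11,16,21,26,31,36,41,46,51,56 * * * * /home/dmitry/Downloads/graphite-web-0.9.10/examples/update_ram_usage.sh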

The shell script is a wrapper that fetches the data from the remote server and then pushes it to Carbon with a Python script:

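#!/bin/sh
# Stop at the first failing step, so stale data is not pushed
set -e

# Fetch the latest memory stats from the remote server
scp -i /home/dmitry/keys/somekey.pem \
    user@remote_server:/path/memory.csv \
    /home/dmitry/Downloads/MemoryStats.csv

# Parse the CSV file and push the data points to Carbon
python /home/dmitry/Downloads/graphite-web-0.9.10/examples/solr_ram_usage.py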

An example entry in MemoryStats.csv:

2013-09-06T07:56:02.000Z,SHARD_NAME,20756,33554432,10893512,32%,15.49%,SOLR/shard_name/tomcat
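As a worked example, this is how one such entry maps to the line Carbon receives. The following is a minimal sketch that mirrors the parsing in the script in this post. The function name csv_to_carbon and the "system" prefix parameter are hypothetical. Note that the timestamp ends in "Z" (UTC), so it is converted with calendar.timegm rather than time.mktime, which would interpret it as local time:

```python
import calendar
import datetime

def csv_to_carbon(line, prefix="system"):
    """Turn one MemoryStats.csv row into "<prefix>.<shard> <value> <unix timestamp>"."""
    fields = line.strip().split(",")
    # Trailing "Z" means UTC, so use timegm (UTC) instead of mktime (local time)
    dt = datetime.datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%S.%fZ")
    epoch = calendar.timegm(dt.timetuple())
    value = fields[5].rstrip("%")  # sixth field, e.g. "32%" -> "32"
    return "%s.%s %s %d" % (prefix, fields[1], value, epoch)

print(csv_to_carbon("2013-09-06T07:56:02.000Z,SHARD_NAME,"
                    "20756,33554432,10893512,32%,15.49%,SOLR/shard_name/tomcat"))
# -> system.SHARD_NAME 32 1378454162
```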

The command to produce the memory stats on Ubuntu (pidstat comes with the sysstat package):

ssh user@remote_server "pidstat -r -l -C java" | grep /path/to/shard
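pidstat prints a whitespace-aligned table rather than CSV, so some conversion step is needed to produce MemoryStats.csv. The following is a minimal sketch of that step. It assumes the full "pidstat -r -l -C java" output is captured without the grep, because the sketch locates the columns via pidstat's header line (the column set can differ between sysstat versions), and it filters by shard path in Python instead. The function names and the 7-column output layout (timestamp, shard, PID, VSZ, RSS, %MEM, command) are hypothetical. The layout keeps the graphed value at index 5, where the push script reads it, and is not necessarily the exact format behind the example entry:

```python
import datetime

def parse_pidstat(text, shard_path):
    """Return one dict per `pidstat -r -l` row whose command line contains
    shard_path, keyed by pidstat's own column headers (PID, VSZ, RSS, %MEM...)."""
    header = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if "PID" in tokens and "Command" in tokens:
            header = tokens  # time column(s) first, "Command" last
            continue
        # Skip the banner line, blank lines and "Average:" summary lines
        if header is None or not tokens or tokens[0].startswith("Average"):
            continue
        if len(tokens) < len(header):
            continue
        n = len(header) - 1  # tokens from index n on form the full command line
        row = dict(zip(header[:n], tokens[:n]))
        row["Command"] = " ".join(tokens[n:])
        if shard_path in row["Command"]:
            rows.append(row)
    return rows

def to_csv_line(row, shard_name, when=None):
    """Hypothetical layout: timestamp, shard, PID, VSZ, RSS, %MEM, command."""
    if when is None:
        when = datetime.datetime.now(datetime.timezone.utc)
    stamp = when.strftime("%Y-%m-%dT%H:%M:%S") + ".000Z"
    return ",".join([stamp, shard_name, row["PID"], row["VSZ"],
                     row["RSS"], row["%MEM"] + "%", row["Command"]])
```

Writing one to_csv_line result per matching row, one per line, gives a file the push script can read.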


The Python script parses the CSV file (you may want to define your own input format; this one is just an example):

import sys
import calendar
import datetime
from socket import socket, error as socket_error

CARBON_SERVER = '127.0.0.1'
CARBON_PORT = 2003  # Carbon's plaintext protocol port

filename = '/home/dmitry/Downloads/MemoryStats.csv'

sock = socket()
try:
  sock.connect((CARBON_SERVER, CARBON_PORT))
except socket_error:
  print("Couldn't connect to %(server)s on port %(port)d, is carbon-cache.py running?"
        % {'server': CARBON_SERVER, 'port': CARBON_PORT})
  sys.exit(1)

# Read the non-empty lines of the CSV file
lines = []
with open(filename, 'r') as f:
  for line in f:
    line = line.strip()
    if line:
      lines.append(line)

print(lines)

lines_to_send = []
for line in lines:
  # Skip the header row
  if line.startswith("Time stamp"):
    continue
  shard = line.split(',')
  # The timestamp is in UTC (trailing "Z"), so convert it with calendar.timegm;
  # time.mktime would treat it as local time
  dt = datetime.datetime.strptime(shard[0], "%Y-%m-%dT%H:%M:%S.%fZ")
  timestamp = calendar.timegm(dt.timetuple())
  # The sixth field (e.g. "32%") is the value to graph; drop the % sign
  value = shard[5].replace("%", "")
  # Carbon plaintext format: "<metric path> <value> <timestamp>"
  lines_to_send.append("system.%s %s %d" % (shard[1], value, timestamp))

# All lines must end in a newline
message = '\n'.join(lines_to_send) + '\n'
print("sending message")
print('-' * 80)
print(message)
sock.sendall(message.encode('ascii'))
sock.close()
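To try the script without a running Carbon, a throwaway listener that just records what it receives can stand in for it. The following is a minimal sketch using only the standard library. start_fake_carbon is a hypothetical helper. It accepts a single connection, which matches how the script sends everything over one socket:

```python
import socket
import threading

def start_fake_carbon(host="127.0.0.1"):
    """Listen on a free local port, accept one client and collect every
    line it sends. Returns (port, received_lines, thread)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = []

    def serve_one_client():
        conn, _ = server.accept()
        chunks = []
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:  # client closed the connection
                    break
                chunks.append(data)
        server.close()
        received.extend(b"".join(chunks).decode("ascii").splitlines())

    thread = threading.Thread(target=serve_one_client, daemon=True)
    thread.start()
    return port, received, thread
```

In an interactive Python session, call port, received, thread = start_fake_carbon(), set CARBON_PORT in the script to that port, and run the script. Then call thread.join() and inspect received.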

After the data has been pushed, you can view it in the Graphite web UI. The advantage of Graphite over jconsole or jvisualvm is that it persists the data points, so you can view and analyze them later.
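The stored points can also be read back for later analysis through Graphite's render API, which returns a series as JSON when called with format=json. Each point is encoded as [value, timestamp], with null where no data was recorded. The following is a minimal sketch. The host name graphite.example.com and the helper names render_url and datapoints are hypothetical:

```python
import json
from urllib.parse import urlencode

def render_url(graphite_host, target, since="-1d"):
    """Build a Graphite render API URL that returns the series as JSON."""
    query = urlencode([("target", target), ("from", since), ("format", "json")])
    return "http://%s/render?%s" % (graphite_host, query)

def datapoints(render_json, target):
    """Extract (timestamp, value) pairs for one target, skipping gaps (null)."""
    for series in json.loads(render_json):
        if series["target"] == target:
            return [(ts, value) for value, ts in series["datapoints"]
                    if value is not None]
    return []
```

To fetch the data, pass urllib.request.urlopen(render_url("graphite.example.com", "system.SHARD_NAME")).read().decode() to datapoints together with the same target.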




For Amazon users, an alternative way to view the RAM usage graphs is CloudWatch, although at the time of writing it stores only two weeks' worth of data.

1 comment:

Anonymous said...

I was trying to monitor Solr with Graphite and found your nice blog, but I also found this in the official Solr docs. Check this out: https://cwiki.apache.org/confluence/display/solr/Metrics+Reporting

Just a quick question, if you know.

Do you know where solr.xml is located if it is not in solr/home? I can't find it.