Saturday, January 29, 2011

SOLR: speed up batch posting

If you work with Apache SOLR and maintain an index of millions of documents, being able to (re-)index fast becomes quite important. There are various settings you can tweak to make indexing go faster. One simple way to speed up your (batch) indexing without changing your SOLR schema is to reduce logging.

Apparently, when deployed under Tomcat, SOLR logs each and every update request during the POSTing process. Experience shows that a heavy HTTP operation completes faster when logging is minimal.

SOLR (as of 1.4 at least) has an admin GUI which serves as a central information hub for a given SOLR core. Among other useful features, it has a page where one can set the logging levels of different SOLR components. In a default SOLR installation you can access the page via http://localhost:8983/solr/admin/logging. By default, most of the logging levels are set to INFO, which permits logging of all the select/update requests (imagine 1 million such log entries for a batch reindexing).

It would be handy to be able to automatically change the logging levels to, say, WARNING before batch POSTing and back to INFO after that. solr/admin/logging is declared as a servlet in the web.xml of the corresponding SOLR core:



<servlet>
  <servlet-name>Logging</servlet-name>
  <servlet-class>org.apache.solr.servlet.LogLevelSelection</servlet-class>
</servlet>



All the components whose logging levels can be changed are listed on the page http://localhost:8983/solr/admin/logging. Using curl we can send a POST request to the servlet and set the desired levels. It is reasonable to implement a function which takes the URL of the logging page and the logging level as parameters (choose your own favourite language; this is done in Perl):


sub setSolrLogLevel
{
    my ($url, $level) = @_;

    print "setting logging level to $level\n";
    # POST one name=level pair per logger listed on the logging page.
    my $res = system("curl --user user:pass -d \"submit=set&root=$level&fi=$level" .
        "&fi.alphasense=$level&fi.alphasense.solr=$level&fi.alphasense.solr.query=$level" .
        "&fi.alphasense.solr.query.AlphaSenseQParserPlugin=$level&httpclient=$level&httpclient.wire=$level&httpclient.wire.content=$level&httpclient.wire.header=$level&javax=$level&javax.management=$level&javax.management.mbeanserver=$level&org=$level&org.apache=$level" .
        "&org.apache.catalina=$level&org.apache.catalina.core=$level&org.apache.catalina.core.ContainerBase=$level&org.apache.catalina.core.ContainerBase.%5BCatalina%5D=$level&org.apache.catalina.core.ContainerBase.%5BCatalina%5D.%5Blocalhost%5D=$level&org.apache.catalina.core.ContainerBase.%5BCatalina%5D.%5Blocalhost%5D.%5B%2Fsolrtopic%5D=$level&org.apache.catalina.core.ContainerBase.%5BCatalina%5D.%5Blocalhost%5D.%5B%2Fsolrtopic%5D.%5BLogging%5D=$level&org.apache.catalina.core.ContainerBase.%5BCatalina%5D.%5Blocalhost%5D.%5B%2Fsolrtopic%5D.%5BSolrServer%5D=$level&org.apache.catalina.core.ContainerBase.%5BCatalina%5D.%5Blocalhost%5D.%5B%2Fsolrtopic%5D.%5BSolrUpdate%5D=$level&org.apache.catalina.core.ContainerBase.%5BCatalina%5D.%5Blocalhost%5D.%5B%2Fsolrtopic%5D.%5Bdefault%5D=$level&org.apache.catalina.core.ContainerBase.%5BCatalina%5D.%5Blocalhost%5D.%5B%2Fsolrtopic%5D.%5Bjsp%5D=$level&org.apache.catalina.core.ContainerBase.%5BCatalina%5D.%5Blocalhost%5D.%5B%2Fsolrtopic%5D.%5Bping%5D=$level&org.apache.catalina.session=$level&org.apache.catalina.session.ManagerBase=$level" .
        "&org.apache.commons=$level&org.apache.commons.digester=$level&org.apache.commons.digester.Digester=$level&org.apache.commons.digester.Digester.sax=$level&org.apache.commons.httpclient=$level&org.apache.commons.httpclient.ChunkedInputStream=$level&org.apache.commons.httpclient.HeaderElement=$level&org.apache.commons.httpclient.HttpClient=$level&org.apache.commons.httpclient.HttpConnection=$level&org.apache.commons.httpclient.HttpMethodBase=$level&org.apache.commons.httpclient.HttpMethodDirector=$level&org.apache.commons.httpclient.HttpParser=$level&org.apache.commons.httpclient.HttpState=$level&org.apache.commons.httpclient.MultiThreadedHttpConnectionManager=$level&org.apache.commons.httpclient.SimpleHttpConnectionManager=$level&org.apache.commons.httpclient.auth=$level&org.apache.commons.httpclient.auth.AuthChallengeProcessor=$level&org.apache.commons.httpclient.cookie=$level&org.apache.commons.httpclient.cookie.CookiePolicy=$level&org.apache.commons.httpclient.cookie.CookieSpec=$level&org.apache.commons.httpclient.methods=$level&org.apache.commons.httpclient.methods.EntityEnclosingMethod=$level&org.apache.commons.httpclient.methods.ExpectContinueMethod=$level&org.apache.commons.httpclient.methods.PostMethod=$level&org.apache.commons.httpclient.params=$level&org.apache.commons.httpclient.params.DefaultHttpParams=$level&org.apache.commons.httpclient.params.HttpMethodParams=$level&org.apache.commons.httpclient.util=$level&org.apache.commons.httpclient.util.EncodingUtil=$level&org.apache.commons.httpclient.util.ExceptionUtil=$level&org.apache.commons.httpclient.util.IdleConnectionHandler=$level" .
        "&org.apache.jasper=$level&org.apache.jasper.EmbeddedServletOptions=$level&org.apache.jasper.JspCompilationContext=$level&org.apache.jasper.compiler=$level&org.apache.jasper.compiler.Compiler=$level&org.apache.jasper.compiler.JspConfig=$level&org.apache.jasper.compiler.JspRuntimeContext=$level&org.apache.jasper.compiler.TldLocationsCache=$level&org.apache.jasper.servlet=$level&org.apache.jasper.servlet.JspServlet=$level&org.apache.jasper.servlet.JspServletWrapper=$level" .
        "&org.apache.solr=$level&org.apache.solr.analysis=$level&org.apache.solr.analysis.BaseTokenFilterFactory=$level&org.apache.solr.analysis.BaseTokenizerFactory=$level&org.apache.solr.client=$level&org.apache.solr.client.solrj=$level&org.apache.solr.client.solrj.impl=$level&org.apache.solr.client.solrj.impl.CommonsHttpSolrServer=$level&org.apache.solr.common=$level&org.apache.solr.common.util=$level&org.apache.solr.common.util.ConcurrentLRUCache=$level&org.apache.solr.core=$level&org.apache.solr.core.Config=$level&org.apache.solr.core.CoreContainer=$level&org.apache.solr.core.JmxMonitoredMap=$level&org.apache.solr.core.RequestHandlers=$level&org.apache.solr.core.SolrConfig=$level&org.apache.solr.core.SolrCore=$level&org.apache.solr.core.SolrResourceLoader=$level" .
        "&org.apache.solr.handler=$level&org.apache.solr.handler.AnalysisRequestHandler=$level&org.apache.solr.handler.XmlUpdateRequestHandler=$level&org.apache.solr.handler.admin=$level&org.apache.solr.handler.admin.LukeRequestHandler=$level&org.apache.solr.handler.admin.SystemInfoHandler=$level&org.apache.solr.handler.component=$level&org.apache.solr.handler.component.QueryElevationComponent=$level&org.apache.solr.handler.component.SearchHandler=$level&org.apache.solr.handler.component.SpellCheckComponent=$level&org.apache.solr.highlight=$level&org.apache.solr.highlight.SolrHighlighter=$level&org.apache.solr.request=$level&org.apache.solr.request.BinaryResponseWriter=$level&org.apache.solr.request.XSLTResponseWriter=$level&org.apache.solr.schema=$level&org.apache.solr.schema.FieldType=$level&org.apache.solr.schema.IndexSchema=$level&org.apache.solr.search=$level&org.apache.solr.search.SolrIndexSearcher=$level&org.apache.solr.servlet=$level&org.apache.solr.servlet.LogLevelSelection=$level&org.apache.solr.servlet.SolrDispatchFilter=$level&org.apache.solr.servlet.SolrRequestParsers=$level&org.apache.solr.servlet.SolrServlet=$level&org.apache.solr.servlet.SolrUpdateServlet=$level&org.apache.solr.spelling=$level&org.apache.solr.spelling.AbstractLuceneSpellChecker=$level&org.apache.solr.spelling.FileBasedSpellChecker=$level&org.apache.solr.spelling.IndexBasedSpellChecker=$level&org.apache.solr.update=$level&org.apache.solr.update.SolrIndexConfig=$level&org.apache.solr.update.UpdateHandler=$level&org.apache.solr.util=$level&org.apache.solr.util.SolrPluginUtils=$level&org.apache.solr.util.plugin=$level&org.apache.solr.util.plugin.AbstractPluginLoader=$level\" $url");
    # system() returns the raw wait status; curl's exit code is in its high byte.
    printf "Result code: %d\n", ($res == -1 ? -1 : $res >> 8);
}
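The parameter string above is simply one name=level pair per logger, so it could also be generated from a list of logger names instead of being written out by hand. The following is only a sketch of that idea: buildLogLevelBody and postLogLevels are hypothetical helpers, the logger list is shortened for illustration, and the real names (already URL-encoded where needed) should be taken from your own logging page.

```perl
use strict;
use warnings;

# Hypothetical helper: build the POST body for the logging servlet from
# a level and a list of (already URL-encoded) logger names.
sub buildLogLevelBody
{
    my ($level, @loggers) = @_;
    return join('&', 'submit=set', map { $_ . '=' . $level } @loggers);
}

# Hypothetical helper: send the body with curl. The list form of system()
# bypasses the shell, so no quote escaping is needed.
sub postLogLevels
{
    my ($url, $body) = @_;
    return system('curl', '--user', 'user:pass', '-d', $body, $url);
}

# Shortened, illustrative list; copy the real names from the logging page.
my @loggers = qw(root org org.apache org.apache.solr org.apache.solr.core.SolrCore);

print buildLogLevelBody('WARNING', @loggers), "\n";
```

With such a list in place, adding or removing a logger becomes a one-word change rather than an edit in the middle of a very long string.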


There you go. Call setSolrLogLevel("http://localhost:8983/solr/admin/logging", "WARNING"); before the batch POSTing and setSolrLogLevel("http://localhost:8983/solr/admin/logging", "INFO"); after the batch POSTing has finished.
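If the batch POSTing dies halfway, the logging level would stay at WARNING. One possible way to guard against that is a small wrapper that always restores INFO. This is only a sketch: withQuietLogging is a hypothetical name, and the two code references stand in for setSolrLogLevel and for your own batch routine.

```perl
use strict;
use warnings;

# Hypothetical wrapper: lower the logging level, run the batch, and
# restore the level afterwards even if the batch dies.
sub withQuietLogging
{
    my ($setLevel, $batch) = @_;

    $setLevel->('WARNING');
    my $ok  = eval { $batch->(); 1 };
    my $err = $@;
    $setLevel->('INFO');        # always restore, on success or failure
    die $err unless $ok;        # re-throw the batch error, if any
    return;
}

# Demo with stand-ins that only record the order of the calls.
my @calls;
withQuietLogging(sub { push @calls, $_[0] }, sub { push @calls, 'batch' });
print join(' -> ', @calls), "\n";   # prints "WARNING -> batch -> INFO"
```

In real use the first argument would be something like sub { setSolrLogLevel("http://localhost:8983/solr/admin/logging", $_[0]) } and the second would wrap the actual POSTing code.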