Wednesday, December 30, 2015

Apache Solr Enterprise Search Server -- Third edition

This year gave me a chance to be a technical reviewer of a book on search engines. The title is Apache Solr Enterprise Search Server, and it is now out in its third edition. The first edition, back in 2010, helped me start thinking the NoSQL way, even though SQL was literally everywhere (well, and still is). It does take a bit of mind warping to think beyond relational database lingo and data modelling, and in my opinion doing so is rather useful for your career as a software engineer.



Here goes my review on Amazon:

Back in 2010, the first edition of this book was the first one around that covered Apache Solr in as much detail as I needed to get into the topic quickly. This third edition includes revisions for Apache Solr 5, notably covering things like the Solr admin page, SolrCloud, scaling the search engine for large numbers of documents, text analysis, indexing, search and even map-reducing your Solr index! In particular, throwing a MapReduce task at a large-scale indexing job has been hard / unclear in the past, and now it is available to any user of Apache Solr out of the box. This makes books like this immensely important, so that you do not waste time looking around for useful bits of information scattered here and there. More importantly, the authors of the book are directly involved in the project, either as Apache Solr / Lucene committers or as active practitioners and developers of the technology. So I recommend this book to entry-level and mid-level search engineers who want to get their hands dirty with search problems and / or improve on previously untapped areas of the search engine world.

Sunday, October 11, 2015

[ANNOUNCE] Luke 5.3.0 released: naturally runs on Java 8

This release runs on Java 8 and does not run on Java 7.

This release includes a number of pull requests and GitHub issues. Worth mentioning:
#38 upgrade to 5.3.0 itself
#28 Added LUKE_PATH env variable to luke.sh
#35 Added copy, cut, paste etc. shortcuts, using Mac command key
#34 Fixed lastAnalyzer retrieval (this feature remembers the last used analyzer on the Search tab)
#31 200 stargazers on GitHub (by the time of this release the number had crossed 260). The Luke community is growing.

Everybody is welcome to contribute. If you feel like you care about search / indexing or would like to get deeper with Apache Lucene, go ahead and pick a ticket: https://github.com/DmitryKey/luke/issues
And, don't be afraid, we do not have any complaint departments:


All you need is your favourite beverage and a good debugger.

Wednesday, July 8, 2015

[ANNOUNCE] Luke 5.2.0 released

This is a major release supporting Lucene / Solr 5.2.0. Download the zip here:

It supports Elasticsearch 1.6.0 (Lucene 4.10.4).
Issues fixed:
#20 Added support for reconstructing field values of indexed but not stored fields that do not expose positions.
Pull requests:
#23 Elasticsearch support and Shade plugin for assembly
#26 added .gitignore to project
#27 Lucene 5x support
#28 Added LUKE_PATH env variable to luke.sh
#30 Luke 5.2

I'd like to highlight the contribution of Tomoko Uchida, who has recently been very active in sending pull requests, including the upgrade to Lucene 5.x and the first version of the Apache Pivot based Luke UI.

Wednesday, April 15, 2015

Luke gets support for Elasticsearch indices

That is it, really: the long-awaited proper support for Elasticsearch indices.





Luke already supported Apache Solr indices. Why not Elasticsearch? The reason was that ES uses its own SPI for the postings format. If you tried to open an Elasticsearch index with Luke before, you'd get something like:

A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene40, Lucene41]


The biggest issue with supporting a custom SPI is that you'd need to hack the Luke jar binary and add the ES SPI yourself. I bet that is not what you would want to spend your time on.

With the excellent pull request by apakulov https://github.com/DmitryKey/luke/pull/23 Luke now uses the Maven Shade plugin, which does all the magic: it updates the META-INF/services file inside the jar with the following entries:

org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat
org.elasticsearch.search.suggest.completion.Completion090PostingsFormat
org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat
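
If you want to check which postings formats a given jar actually registers, you can read its services file directly. The snippet below is only a minimal sketch: the helper name list_providers and the jar name in the usage comment are made up for illustration, and it assumes the standard Java convention that the services file is named after the SPI class (org.apache.lucene.codecs.PostingsFormat, as in the error message above).

```shell
# Sketch: print the provider class names declared in a Java SPI services file,
# skipping comments and blank lines. Reads the file content from stdin.
# list_providers is a hypothetical helper name, not part of Luke.
list_providers() {
    sed -e 's/#.*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d'
}

# Usage (the jar name is a placeholder, substitute your own build):
#   unzip -p luke-with-deps.jar \
#       META-INF/services/org.apache.lucene.codecs.PostingsFormat | list_providers
```

If the three Elasticsearch classes show up in the output, Lucene should be able to resolve names like 'es090' when opening the index.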


Currently this is available on luke master: https://github.com/DmitryKey/luke and a pre-release: https://github.com/DmitryKey/luke/releases/tag/luke-4.10.4-field-reconstruction

Saturday, March 21, 2015

Flexible run-time logging configuration in Apache Solr 4.10.x

In a multi-shard setup it is useful to be able to change the log level at runtime without going to each and every shard's admin page.

For example, we can set the logging to WARN level during massive posting sessions and switch back to INFO when serving user queries.

In Solr 4.10.2 these one-liners do the trick:

# set logging level to WARN,
# saves disk space and speeds up massive posting 
curl -s http://localhost:8983/solr/admin/info/logging \
                       --data-binary "set=root:WARN&wt=json" 
 
# set logging level to INFO,
# suitable for serving the user queries 
curl -s http://localhost:8983/solr/admin/info/logging \
                       --data-binary "set=root:INFO&wt=json"

Solr responds with a JSON document describing the current status of each configured logger.
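
For the multi-shard case, the same request can be wrapped in a small loop. This is only a sketch: the function names and the host:port values in the usage comment are placeholders of mine, and it assumes every node exposes the logging endpoint at the same path as in the one-liners above.

```shell
# Sketch: apply one log level to several Solr nodes in a single call.
# Function names and the nodes in the usage line are placeholders.

# Build the logging endpoint URL for one node given as host:port.
logging_url() {
    printf 'http://%s/solr/admin/info/logging' "$1"
}

# set_log_level LEVEL HOST:PORT [HOST:PORT ...]
set_log_level() {
    level=$1
    shift
    for node in "$@"; do
        # Same request as the one-liners above, sent to each node in turn.
        # The JSON response is discarded; a failed request is reported on stderr.
        curl -s "$(logging_url "$node")" \
             --data-binary "set=root:${level}&wt=json" > /dev/null \
            || echo "failed to set ${level} on ${node}" >&2
    done
}

# Usage (hypothetical nodes):
#   set_log_level WARN shard1:8983 shard2:8983
#   set_log_level INFO shard1:8983 shard2:8983
```

Note that curl -s only signals connection-level failures here; if you need to be sure the level was applied, inspect the returned JSON instead of discarding it.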

Monday, March 16, 2015

Luke keeps getting updates and now on Apache Pivot

Originally developed for fun and profit by Andrzej Bialecki, the Lucene toolbox Luke continues to be developed. Its releases are published at: https://github.com/DmitryKey/luke/releases


Most recently, Tomoko Uchida has contributed to the effort of porting Luke to Apache Pivot, a GUI framework that is friendly to the Apache License 2.0. A new branch has been created to host this work:

https://github.com/DmitryKey/luke/tree/pivot-luke

Currently supported Lucene: 4.10.4.

It is far from complete, but you can already:

  • open your Lucene index and check its metadata
  • page through the documents and analyze fields
  • search the index

We would appreciate it if you could test the Pivot-based Luke and give us your feedback.