Sunday, October 9, 2016

Luke 6.2.1 release and all things open source

Release

Luke 6.2.1 for Lucene 6.2.1 is out of the oven. This is a proud moment for Tomoko Uchida, my co-committer, who acted as release manager for the first time. Congrats, Tomoko!

Community

As luke gets more and more stargazers on GitHub (520 at the time of this writing), I tend to glance over the list, which sometimes makes my day. More importantly, the list maps out a community of Lucene / Solr / Elasticsearch users and developers who, hopefully, enjoy using luke too.

Big names on user list

The traffic stats of the luke repo give insight into who is talking about luke, and when. This time it is PayPal Engineering, with a nice technical write-up on indexing large volumes of data in Elasticsearch and on using luke in the field to optimize the Lucene index data structures: https://www.paypal-engineering.com/2016/08/10/powering-transactions-search-with-elastic-learnings-from-the-field/

London Lucene/Solr hackday

A hackday is an amazing way to break out of routine and think big: what can be improved in Lucene / Solr search technology and tooling? It was great to see luke picked as one of the topics at the Lucene / Solr hackday in London: https://github.com/flaxsearch/london-hackday-2016. And there it is: Marple, a browser-based explorer for Lucene indexes: https://github.com/flaxsearch/marple. Go check it out.

New contributors to luke

Tomoko and I have been actively promoting luke on various occasions, such as Lucene / Solr Revolution 2015 and ApacheCon 2015, and of course on Twitter. Recently, Florian Hopf has been actively sending pull requests to improve luke and fix various nagging issues. Welcome!

Wednesday, April 13, 2016

Luke 6.0 has been released

Luke 6.0 has been released. It is a major upgrade to the Lucene 6.0 API: https://github.com/DmitryKey/luke/releases/tag/luke-6.0.0


There are other interesting features cooking, like access to DocValues: https://github.com/DmitryKey/luke/pull/53

If you feel like contributing, either by code or documentation, feel free to join the project:


Wednesday, December 30, 2015

Apache Solr Enterprise Search Server -- Third edition

This year gave me the chance to be a technical reviewer of a book on search engines: Apache Solr Enterprise Search Server, now in its third edition. The first edition, back in 2010, helped me start thinking the NoSQL way, even though SQL was literally everywhere (well, it still is). It does take a bit of mind-warping to think beyond relational database lingo and data modelling, and in my opinion doing so is quite useful for your career as a software engineer.

Here goes my review on Amazon:

Back in 2010, the first edition of this book was the first one around that covered Apache Solr in as much detail as I needed to get into the topic quickly. This third edition is revised for Apache Solr 5, notably covering the Solr admin page, SolrCloud, scaling the search engine to large numbers of documents, text analysis, indexing, search, and even map-reducing your Solr index! In particular, throwing a MapReduce job at a large-scale indexing task used to be hard or unclear, and now it is available to any Apache Solr user out of the box. This is what makes books like this one so important: they save you from hunting for useful bits of information scattered here and there. More importantly, the authors are directly involved in the project, either as Apache Solr / Lucene committers or as active practitioners and developers of the technology. So I recommend this book to entry-level and mid-level search engineers who are looking to get their hands dirty with search problems and / or to improve on previously untapped areas of the search engine world.

Sunday, October 11, 2015

[ANNOUNCE] Luke 5.3.0 released: naturally runs on Java 8

This release runs on Java 8 and does not run on Java 7.
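Because a Java 7 runtime will not start this release, a small pre-flight check before launching luke can turn a confusing startup failure into a clear message. The following is a minimal sketch, not something shipped with luke: the function names `java_major` and `require_java8` are hypothetical, and it assumes `java -version` prints the version string in double quotes (for example `1.8.0_101` or `11.0.2`), as Oracle and OpenJDK builds do.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of luke): refuse to start on a
# JVM older than Java 8, which this luke release requires.

# Turn a version string such as "1.8.0_101" or "11.0.2" into a major number.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: 1.7 -> 7, 1.8 -> 8
  esac
  echo "${v%%[._-]*}"     # drop everything from the first '.', '_' or '-'
}

# Succeeds only if the java on PATH reports major version 8 or later.
require_java8() {
  ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  major=$(java_major "$ver")
  case "$major" in
    ''|*[!0-9]*)
      echo "could not determine the Java version" >&2
      return 1 ;;
  esac
  if [ "$major" -lt 8 ]; then
    echo "luke needs Java 8, found Java $ver" >&2
    return 1
  fi
}
```

For example, `require_java8 && ./luke.sh`. The check only rejects older JVMs; it says nothing about whether a given luke release has been tested on newer ones.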

This release includes a number of pull requests and fixes for GitHub issues. Worth mentioning:
#38 Upgrade to 5.3.0 itself
#28 Added LUKE_PATH env variable to luke.sh
#35 Added copy, cut, paste, etc. shortcuts using the Mac Command key
#34 Fixed lastAnalyzer retrieval (this feature remembers the last used analyzer on the Search tab)
#31 200 stargazers on GitHub (by the time of this release the number had crossed 260). The luke community is growing.

Everybody is welcome to contribute. If you care about search / indexing or would like to dig deeper into Apache Lucene, go ahead and pick a ticket: https://github.com/DmitryKey/luke/issues
And don't be afraid, we do not have a complaints department:


All you need is your favourite beverage and a good debugger.

Wednesday, July 8, 2015

[ANNOUNCE] Luke 5.2.0 released

This is a major release supporting Lucene / Solr 5.2.0. Download the zip here:

It supports Elasticsearch 1.6.0 (Lucene 4.10.4).
Issues fixed:
#20 Added support for reconstructing the values of fields that are indexed but not stored and do not expose positions.
Pull requests:
#23 Elasticsearch support and Shade plugin for assembly
#26 Added .gitignore to the project
#27 Lucene 5x support
#28 Added LUKE_PATH env variable to luke.sh
#30 Luke 5.2

I'd like to highlight the contribution of Tomoko Uchida, who has recently been very active in sending pull requests, including the upgrade to Lucene 5.x and the first version of an Apache Pivot-based luke UI.

Wednesday, April 15, 2015

Luke gets support for Elasticsearch indices

That's it, really: the long-awaited proper support for Elasticsearch indices.

Luke already supported Apache Solr indices. Why not Elasticsearch? The reason was that ES ships its own postings formats, which Lucene loads through its SPI (Service Provider Interface). If you tried to open an Elasticsearch index with luke before, you'd get something like:

A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene40, Lucene41]


The biggest issue with supporting a custom SPI implementation is that you'd need to hack the luke jar binary and add the ES SPI entries yourself. I bet that is not how you want to spend your time.

Thanks to the excellent pull request by apakulov (https://github.com/DmitryKey/luke/pull/23), luke now uses the Maven Shade plugin, which does all the magic: it updates the META-INF/services file inside the binary with the following entries:

org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat
org.elasticsearch.search.suggest.completion.Completion090PostingsFormat
org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat
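To verify that a given luke jar actually carries these registrations, you can read the SPI file straight out of the jar. Below is a minimal sketch, assuming `unzip` is installed; the function names and the jar name in the usage comment are hypothetical. Lucene looks up `PostingsFormat` implementations through the file `META-INF/services/org.apache.lucene.codecs.PostingsFormat`, so that is the file to inspect.

```shell
#!/bin/sh
# SPI file through which Lucene discovers PostingsFormat implementations.
SERVICES=META-INF/services/org.apache.lucene.codecs.PostingsFormat

# Print the postings formats registered in a jar (requires unzip).
list_postings_formats() {
  unzip -p "$1" "$SERVICES"
}

# Read a services file on stdin; succeed if the Elasticsearch 0.90
# postings format is listed on a line of its own.
has_es090() {
  grep -qx 'org.elasticsearch.index.codec.postingsformat.Elasticsearch090PostingsFormat'
}

# Hypothetical usage; substitute the path to your luke jar:
#   list_postings_formats luke-with-deps.jar | has_es090 && echo "es090 is registered"
```

If the check fails on a jar you built yourself, the services file was most likely overwritten rather than merged during packaging.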


Currently this is available on luke master (https://github.com/DmitryKey/luke) and as a pre-release: https://github.com/DmitryKey/luke/releases/tag/luke-4.10.4-field-reconstruction

Saturday, March 21, 2015

Flexible run-time logging configuration in Apache Solr 4.10.x

In a multi-shard setup it is useful to be able to change the log level at runtime without going to each and every shard's admin page.

For example, we can set logging to the WARN level during massive posting sessions and back to INFO when serving user queries.

In Solr 4.10.2 these one-liners do the trick:

# set logging level to WARN:
# saves disk space and speeds up massive posting
curl -s http://localhost:8983/solr/admin/info/logging \
     --data-binary "set=root:WARN&wt=json"

# set logging level to INFO:
# suitable for serving the user queries
curl -s http://localhost:8983/solr/admin/info/logging \
     --data-binary "set=root:INFO&wt=json"

In response, Solr returns JSON with the current status of each configured logger.
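Since the whole point is to avoid visiting each shard, the same request can be looped over all shards from one place. Below is a minimal sketch: the shard URLs in `SOLR_SHARDS` and the function name `set_root_log_level` are hypothetical, it sends exactly the request shown in the one-liners, and it deliberately accepts only the two levels discussed in this post. Setting `CURL=echo` prints the requests instead of sending them, which is handy for a dry run.

```shell
#!/bin/sh
# Hypothetical shard base URLs; replace with your own Solr 4.10.x nodes.
SOLR_SHARDS="${SOLR_SHARDS:-http://shard1:8983/solr http://shard2:8983/solr}"

# Set the root logger level on every shard using the same request as the
# one-liners above. CURL=echo turns this into a dry run.
set_root_log_level() {
  level="$1"
  case "$level" in
    WARN|INFO) ;;                 # the two levels used in this post
    *) echo "unsupported level: $level" >&2; return 1 ;;
  esac
  for shard in $SOLR_SHARDS; do   # split on spaces on purpose
    ${CURL:-curl} -s "$shard/admin/info/logging" \
      --data-binary "set=root:$level&wt=json" || return 1
  done
}
```

Usage: run `set_root_log_level WARN` before a massive posting session and `set_root_log_level INFO` once you are back to serving user queries. The loop stops at the first shard that curl cannot reach, so a partial update is easy to spot.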