Sunday, May 6, 2018

Automatic writing with Deep Learning: Preface


This article was also reblogged at: https://dzone.com/articles/automatic-writing-with-deep-learning-preface


Many machine learning and deep learning problems come down to building a mapping function of roughly the following form:


Input X ---> Output Y,


where:

X is some sort of object: an email text, an image, a document;

Y is either a single class label from a finite set (spam / no spam, the detected object, a cluster name for the document), or a number (next month's salary, a stock price).

While such tasks can be daunting to solve (sentiment analysis, say, or predicting stock prices in real time), they involve rather clear steps for reaching good levels of mapping accuracy. I'm not discussing here situations with insufficient training data to cover the modelled phenomenon, or with poor feature selection.
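To make the shape of such a mapping concrete, here is a minimal sketch of the spam / no spam case with scikit-learn. The toy messages and labels are invented purely for illustration; a real system would of course need far more data and careful feature work.

```python
# A minimal sketch of such an X -> Y mapping: a tiny spam / no-spam classifier.
# The example messages and labels are invented purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X = [
    "win a free prize now",                # spam
    "cheap pills, limited offer",          # spam
    "meeting moved to 3pm",                # not spam
    "please review the attached report",   # not spam
]
y = ["spam", "spam", "ham", "ham"]

# Input X (email text) -> features -> Output Y (class label)
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X, y)

print(model.predict(["free offer, click now"]))  # most likely: ['spam']
```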

In contrast, a somewhat less straightforward area of AI consists of tasks that challenge you to predict structures as fuzzy as words, sentences or complete texts. What are the examples? Machine translation for one, natural language generation for another. One may argue that transcribing audio to text is also this type of mapping, but I'd argue it is not. Audio is a "wave", and speech recognition is a reasonably well solved task (with state of the art above 90% accuracy), yet such an algorithm does not capture the meaning of the produced text, except where it is necessary for disambiguating what was said. To be clear, the audio->text problem is not at all easy and has its own intricacies, like handling speaker self-corrections, noise and so on.



Lately, the task of writing texts with a machine (e.g. here) caught my eye on Twitter. Previously, papers from Google on writing poetry and other text-producing software gave me creepy feelings. I somehow downplayed the role of such algorithms in the space of natural language processing and language understanding and saw only diminishing value of such systems to users. Granted, any challenging task may be worth solving and can even bring value to solving other challenging tasks. But who would use an automatic poetry writing system? Why would somebody, I thought, use these systems, just for fun? My practical mind battled against such "fun" algorithms. And yet, making an AI/NLProc system capable of producing anything sensible is hard. Take the task of sentiment analysis, where it is quite unclear what the agreement between experts is, not to mention non-experts.

I think this post has poured enough text onto the heads of my readers. I will use it as a self-motivating mechanism to continue the research into systems that produce text. My target is to complete the neural network training on the text of my Master's thesis and show you some examples so you can judge the usefulness of such systems.
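As a rough idea of what such a text-producing network can look like, here is a toy character-level language model sketch in PyTorch. The stand-in corpus string, the hyperparameters and the small GRU are placeholders for illustration, not the actual setup I plan to use on the thesis text.

```python
# A toy character-level language model: train on some text, then sample new text.
# The stand-in corpus, hyperparameters and model choice (a small GRU) are
# placeholders for illustration only.
import torch
import torch.nn as nn

corpus = "this is a tiny stand-in corpus for automatic writing " * 20
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
data = torch.tensor([stoi[c] for c in corpus])

class CharLM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x, h=None):
        out, h = self.rnn(self.emb(x), h)
        return self.out(out), h

model = CharLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
seq_len = 40

for step in range(200):  # a few quick steps, just to show the training loop
    i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
    x = data[i:i + seq_len].unsqueeze(0)          # input characters
    y = data[i + 1:i + seq_len + 1].unsqueeze(0)  # next characters (targets)
    logits, _ = model(x)
    loss = loss_fn(logits.view(-1, len(chars)), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: feed a seed character, then keep drawing from the predicted distribution.
idx, h, text = torch.tensor([[stoi["t"]]]), None, "t"
for _ in range(80):
    logits, h = model(idx, h)
    probs = torch.softmax(logits[0, -1], dim=-1)
    idx = torch.multinomial(probs, 1).unsqueeze(0)
    text += itos[idx.item()]
print(text)
```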

Saturday, May 5, 2018

AI for lip reading

It is exciting to push your imagination to where else you can apply AI, machine learning and, most certainly, deep learning, which is so popular these days. I came across a question on Quora that provoked me to think a bit about how one would go about training a neural network to lip read. I don't actually know what made me answer it more: that I found myself in an unusual context, sitting at an AngularJS meetup at the Google offices in New York City (after work, the usual level of tired), or the question itself. Whatever the reason, here is my answer:

Source: http://theconversation.com/our-lip-reading-technology-promises-to-make-hearing-aids-more-human-45166

I would probably start by formalizing what the lip reading process is from the point of view of a human-understandable algorithm. Maybe it is worth talking to a professional, like a spy or something. Obviously you need training data, and understanding what lip reading is from the algorithm's perspective will affect what data you need.


1. To read a word of several syllables you'd need a sequence of anchor lip positions that represent syllables, or perhaps vowels / consonants. I don't know which level is best, but you'd want to start with the lowest level possible out of which you can compose larger sequences, like letters -> syllables -> words. Let's call these states.
2. A particular lip posture (is that the right word?) will most probably map to several possible states, i.e. be ambiguous.
3. Now the interesting part is how to resolve the ambiguities. Step 2 produces several options, and out of these you can produce a multitude of words that we can call candidates.
4. Then you need to score the candidates based on local context information. Here it turns into a natural language understanding problem.
5. For the model itself, I'd start with seq2seq; a rough sketch follows below.
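Purely as an illustration of point 5, here is what the encoder-decoder shape might look like in PyTorch. It assumes each video frame has already been reduced to a fixed-size feature vector of the mouth region; all names, dimensions and the GRU choice are placeholders rather than a tested lip reading system.

```python
# A rough seq2seq shape for lip reading: encode a sequence of per-frame lip
# features, decode a sequence of characters. Dimensions are illustrative.
import torch
import torch.nn as nn

FRAME_DIM, HIDDEN, VOCAB = 128, 256, 30  # VOCAB: letters + space + markers

class LipReader(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(FRAME_DIM, HIDDEN, batch_first=True)
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.decoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, frames, prev_chars):
        # frames: (batch, n_frames, FRAME_DIM) lip-posture features ("states")
        # prev_chars: (batch, n_chars) previously emitted characters
        _, h = self.encoder(frames)           # summarize the lip sequence
        dec_in = self.embed(prev_chars)
        dec_out, _ = self.decoder(dec_in, h)  # condition the decoder on that summary
        return self.out(dec_out)              # scores over the next characters

model = LipReader()
frames = torch.randn(2, 75, FRAME_DIM)        # e.g. 75 frames of mouth features
prev_chars = torch.randint(0, VOCAB, (2, 10))
print(model(frames, prev_chars).shape)        # torch.Size([2, 10, 30])
```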

    Tuesday, January 16, 2018

    New Luke on JavaFX

    Hello and Happy New Year to my readers!

I'm happy to announce the release of a completely reimplemented Luke, now built on JavaFX. Luke is the toolbox for analyzing and maintaining your Lucene / Solr / Elasticsearch index at a low level.

    The implementation was contributed by Tomoko Uchida, who also did the honors of releasing it.

What makes this release especially exciting is that Luke is now fully compliant with the ALv2 license, which brings it very close to being contributed to the Lucene project. At this point we need lots of testing to make sure the JavaFX version is on par with the original thinlet-based one.

Here is how the load index screen looks in the new JavaFX luke:


    After navigating to the Solr 7.1 index and pressing OK, here is what luke shows:


I have loaded an index of the Finnish Wikipedia with 1,069,778 documents, and luke tells me that the index has no deletions and was not optimized. Let's go ahead and optimize it:




Notice that in this dialog you can request only expunging deleted docs, without merging (the costly part for large indices). After the optimization completes, you'll have a full log of actions in front of you to confirm the operation was successful:


You can also check the health of your index via the Tools -> Check Index menu item:



Let's move to the Search tab. It has changed slightly: the search box has moved to the right, while search settings and other knobs have moved to the left.

    Thinlet version:


    JavaFX version:



The UI is more intuitive now in terms of access to various tools like Analyzer, Similarity (now with access to the parameters of the BM25 ranking model, which has become the default in Lucene and in luke) and More Like This. There is a new Sort sub-tab that lets you choose a primary and secondary field to sort on. The Collectors tab, however, is gone: please let us know if you used it for some task; we would love to learn.

Moving on to the Analysis tab, I'd like to draw your attention to a really cool piece of functionality: loading custom jars with your implementation of a character filter, tokenizer or token filter to form your own custom analyzer. You can test these right in the luke UI without the need to reload shards in your Solr / Elasticsearch installation:



Last but not least is the Logs tab. You have probably been missing it for as long as luke has existed: it gives you a handle on what is happening behind the scenes, both during errors and in normal operation.

    In addition, this version of Luke supports the recently released Lucene 7.2.0.

    Wednesday, November 1, 2017

    Will deep learning make other machine learning algorithms obsolete?

    The fourth (fifth?) quoranswer is here! This time we'll talk a bit about deep learning and its role in making other state of the art machine learning methods obsolete.


    Will deep learning make other machine learning algorithms obsolete?


    I will try to take a look at the question from the natural language processing perspective.

There is a class of problems in NLProc that might not benefit from deep learning (DL), at least not directly. For the same reasons, machine learning (ML) in general cannot help so easily. I will give three examples, which share more or less the same property that makes them hard to model with ML or DL:

1. Identifying and analyzing sentiment polarity oriented towards a particular object: a person, a brand etc. Example: "I like phoneX, but dislike phoneY." If you monitor sentiment for phoneX, you'd expect this message to register as positive, while for phoneY it should be negative. One can argue this is easy / doable with ML / DL, but I doubt you can stay solely within that framework. Most probably you'll need a hybrid with a rule-based system, syntactic parsing etc. (see the sketch after this list), which somewhat defeats the purpose of DL: training a neural network on a large amount of data without domain (linguist) knowledge.

2. Anaphora resolution. There are systems that use ML (and hence DL could be tried?), like the BART coreference system, but most of the research I have seen so far is based on some sort of rules / syntactic parsing (this presentation is quite useful: Anaphora resolution). There is a vast application area for AR, including sentiment analysis and machine translation (also fact extraction, question answering etc.).

3. Machine translation. Disambiguation, anaphora, object relations, syntax, semantics and more, all in a single soup. Surely you can try to model all of these with ML, but commercial MT systems are still largely built with rules (plus ML more recently). I expect DL to produce advancements in MT. I'll cite one paper here that uses DL and improves on phrase-based SMT: [1409.3215] Sequence to Sequence Learning with Neural Networks. Update: a recent fun experiment with DL-based machine translation.
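To make example 1 concrete, here is a deliberately crude sketch of the kind of rule layer you end up bolting onto a sentence-level model: it splits on the contrast marker and attaches a toy sentiment lexicon to the nearest target mention. The lexicon, targets and clause splitting are placeholders; a real hybrid would rely on syntactic parsing rather than string splitting.

```python
# Toy target-oriented sentiment: attach sentiment words to target mentions
# within the same clause. Lexicon and targets are placeholders.
SENTIMENT = {"like": +1, "love": +1, "dislike": -1, "hate": -1}
TARGETS = {"phonex", "phoney"}

def target_sentiment(text):
    scores = {}
    # naive clause split on "but": a real system would use syntactic parsing here
    for clause in text.lower().replace(",", "").split("but"):
        tokens = clause.split()
        polarity = sum(SENTIMENT.get(t, 0) for t in tokens)
        for t in tokens:
            if t in TARGETS:
                scores[t] = scores.get(t, 0) + polarity
    return scores

print(target_sentiment("I like phoneX, but dislike phoneY"))
# {'phonex': 1, 'phoney': -1}
```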

    The list can be extended to knowledge bases etc, but I hope I made my point.

    Sunday, October 29, 2017

    More fun with Google machine translation

Having posted in the quoranswer tag specifically on machine translation tricks and challenges, and after looking at some fun with Mongolian->Russian translation on Google, I decided to experiment with the Mongolian->English pair. To reproduce this, you need a Cyrillic keyboard and you type only the Russian letter 'а' as input on the Mongolian side. Throughout the text I'll refer to Google Translate as the "neural network" or the "network", since Google is known to have switched its translation system over to a neural network implementation.

    So let's get going. It all starts rather sane:



    а   -> a
    аа -> ah

    And as we stack up more letters on the left, we start getting more interesting translations:

    ааа -> Well
    аааа -> ahaha
    ааааа -> sya
    аааааа -> Well
    ааааааа -> uh

    and skipping a bit:

    ааааааааа -> that's all

(At this point you'd imagine that the deep neural network has had enough of you teasing it and wants you to stop. But no.)

    аааааааааа -> that's ok
    аааааааааааааа -> that's fine

    ааааааааааааааааа -> everything is fine

    ааааааааааааааааааа -> it's a good thing


And with a few more letters stacked up, the network begs you to stop again, threatening:

    ааааааааааааааааааааааааааааааааааааа -> it's all over

Then, having had enough of statements, the network starts asking questions.

    ааааааааааааааааааааааааааааааааааааааааа -> is it a good thing?

and answers its own question:

    аааааааааааааааааааааааааааааааааааааааааа -> it's a good thing

A few comments here and there:

    ааааааааааааааааааааааааааааааааааааааааааааааааааа -> a good time

    аааааааааааааааааааааааааааааааааааааааааааааааааааа-> to have a good time

Eventually, more dictionary entries crop up:

    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> to a whirlwind

    And, unexpectedly:

    ааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> to make a date
    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> to make a living

    Then, the network starts to output:

    ааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> to make a dicision

And it begs me to put in some sane words instead of the letter nonsense:

    ааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> put your own word

    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> a whistle-blower

The latter is probably meant as an offence, to add colour to the network's plea.

    ааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> have a private time in the world

Notice how general the words are: "private", "time", "world". They are still grammatical and make sense, just unlikely as translations.

    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> a mortal year

And back to begging again:

    ааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> have a kindness in the world

Again, all my commentary is meant as fun; I'm not intending to (mis)lead you into anything here.

    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> a dead dog

ааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> put …

    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> have a deadline

    And more threats, again:

    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> a hash of you

    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> a mortal beefed up

    ааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> have a heartbroker

    A heartbroker? Really? Something new.

    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> a hash of a tree

    ааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> to put a lot of light on it

    And finally, the network gets hungry:

    ааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> to have a meal

    And positively concludes:

    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> a date auspicious

    аааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааааа -> a friend of a thousand years

    Hope you had fun reading these, and please try some for yourselves.

    Saturday, October 28, 2017

    What are some funny Google Translate tricks?

This is the third quoranswer blog post, answering the question What are some funny Google Translate tricks? I have decided to update the Google translations based on the current situation. I think they are still a lot of fun. Let me know in the comments if you come across some funny translations!


There used to be a funny politically coloured trick for Russian->English, where the sense was inverted in translation depending on which president's name was used in the positive vs negative context. I can't reproduce it right now, but GT produces this at the moment:
    Обама не при чём, виноват Путин.
    human: Obama is innocent, Putin is to blame.
    GT: Obama has nothing to do with Putin. (Previously in Aug 4, 2016: "Obama is not to blame, blame Putin.")
    Путин не при чём, виноват Обама
    human: Putin is innocent, Obama is to blame.
    GT: Putin has nothing to do with Obama's fault. (Previously in Aug 4, 2016: "Putin is not being Obama's fault.")

    Tuesday, October 24, 2017

    What grammatical challenges prevent Google Translate from being more effective?

    Here is one more Quora question on the exciting topic of machine translation and my answer to it.

    The question had some sub-questions:

    • Is there a set of broad grammatical rules which decreases its efficacy?
    • How can these challenges be overcome? Is it possible to fully automate good quality translation?

Below is my answer; I hope it will be an interesting look at machine translation across different language pairs. Note that the translations Google Translate gives today might differ from those below, as they were obtained in 2013. UPD: and they do! See the comments to this post.

Google is pretty good at modeling close-enough language pairs. By close enough I mean languages that share many vocabulary units and have similar word order, a similar level of morphological richness and other grammatical features in common.

Let's pick an example of a pair where Google Translate (GT) is good. The round-trip method (translate into the target language and back, then compare with the original) is one way to verify whether the languages are close enough, at least statistically, for GT:

(these examples use GT only, no human interpretation involved)

    English: I am in a shop.
    Dutch: Ik ben in een winkel.
back to English: I'm in a store. (quite OK)

    English: I danced into the room.
    Dutch: Ik danste in de kamer.
    back to English: I danced in the room. (preposition issues)
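If you want to run the round-trip check yourself, a minimal sketch looks like the following. The translate() function is a hypothetical placeholder for whatever MT service you have access to; the method itself is just translate forward, translate back, compare.

```python
# A minimal sketch of the round-trip check described above. translate() is a
# placeholder: plug in whichever machine translation API you have access to.
def translate(text, src, dest):
    # hypothetical stand-in for a real machine translation call
    raise NotImplementedError("wire this up to an MT service")

def round_trip(text, src, dest):
    forward = translate(text, src=src, dest=dest)
    back = translate(forward, src=dest, dest=src)
    return forward, back

# Example (English <-> Dutch), mirroring the sentences above:
# forward, back = round_trip("I am in a shop.", src="en", dest="nl")
# print(forward)  # e.g. "Ik ben in een winkel."
# print(back)     # compare against the original sentence
```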


    Let's pick a pair of more unrelated languages (by the way, when we claim the languages are unrelated grammatically, they may also be unrelated semantically or even pragmatically: different languages were created by people to suit their needs at particular moments of history). One such pair is English and Finnish:

    Finnish: Hän on kaupassa.
    English: He is in the shop.
    Finnish: Hän on myymälä. (roughly the original Finnish sentence)

This example contains the pronoun hän, which in Finnish is not gender specific. It has to be resolved based on a larger context than just one sentence: somewhere before this sentence in the text there should have been a mention of whom hän refers to.

To conclude this particular example: Google Translate translates at the sentence level, and that is a limitation in itself that makes correct pronoun resolution impossible. Pronouns matter if we want to understand the interaction between the objects in a text.


    Let's pick another example of unrelated languages: English and Russian.

    Russian: Маска бывает правдивее и выразительнее лица.
English: The mask is truthful and expressive face. (should have been: The mask can be more truthful and more expressive than the face)
back to Russian: Маска правдивым и выразительным лицом. (hard to translate, but the meaning is roughly: The mask being a truthful and expressive face)

To conclude this example: languages with rich morphology, which in the case of Russian express grammatical case through word inflection alone, require deeper grammatical analysis, and pure statistical machine translation methods lack it no matter how much data has been acquired. There exist methods of combining rules and statistics together.


    Another pair and different example:
    English: Reporters said that IBM has bought Lotus.
    Japanese: 記者は、IBMがロータスを買っていると述べた。
    back to English: The reporter said that IBM Lotus are buying.

Japanese has a "recursive" syntax that represents this English sentence roughly as:

    Reporters (IBM Lotus has bought) said that.

i.e. the verb is syntactically placed after the subject-object pair of a sentence or a sub-sentence (direct / indirect object).

To conclude this example: there should be a method for mapping syntactic structures as larger units of the language, and that has to be done in a more controlled fashion (i.e. it is hard to derive from pure statistics).