Monday, February 23, 2009

Mr. Lecturer

Last week I gave lectures on Machine Translation (in Russian) at my home university, Saint-Petersburg State University.

I let my students know beforehand that the course was experimental and that they were the pioneers it would be tested on.

After the 3.5-hour lecture I asked them how they felt about the experiment.

Their answer: the course was interesting. Had it been boring, they would have fallen asleep. I believe it is the best compliment, especially given that a person stays concentrated only for the first 40 (15?) minutes.

Saturday, February 14, 2009

Unicode in Perl

Sometimes it feels that Perl's power in string manipulation comes at the cost of awkward syntax.

When you open a file for reading without caring what encoding its contents are in, you write:

open FILE, "<".$filename or die $!;

But if you do care about the encoding, you should open the file with the following statement:

open ENC_FILE, "<:encoding(cp1251)", $enc_filename or die $!;

The key point is the comma after the encoding layer. If you put "." there instead (which, I believe, concatenates the mode string "<:encoding(cp1251)" with the filename into a single argument), the file fails to open.

Another important point: if you know in advance which encoding the file contents are in, specify it with the encoding layer shown above. That way all the string data is decoded into Perl's internal representation, which is UTF-8 by default.
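
To make the two points above concrete, here is a minimal sketch rather than a definitive recipe. It assumes a throwaway temp file holding the cp1251 bytes of the word "Привет"; the variable names are made up for illustration. It reads the file through the encoding layer, then tries the concatenated form described above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption for the demo: a temp file containing the raw cp1251 bytes
# of the six-letter Russian word "Привет".
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xCF\xF0\xE8\xE2\xE5\xF2";
close $out or die $!;

# Three-argument open: the mode with its layer and the filename
# are separate, comma-separated arguments.
open my $in, "<:encoding(cp1251)", $path or die $!;
my $text = <$in>;
close $in;

# The bytes were decoded on the way in, so length counts characters.
my $chars = length $text;

# Encode on the way out so the terminal receives UTF-8.
binmode STDOUT, ":encoding(UTF-8)";
print "$text has $chars characters\n";

# The concatenated form collapses into a two-argument open,
# which is the variant the post reports as failing.
my $opened = open my $bad, "<:encoding(cp1251)" . $path;
print "concatenated form opened: ", ($opened ? "yes" : "no"), "\n";
```

The same layer syntax works for writing, for example ">:encoding(UTF-8)", so a file can be converted from one encoding to another by reading through one layer and printing through the other.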

Tuesday, February 10, 2009

I feel like on top of the world (c)

.. when I manage to trigger an unhandled exception in a compiler or interpreter. This time it happened with the Perl Command Line Interpreter.

Perl: file or directory

To check whether a path is a file or a directory, the usual recipe is:

if (-d $file) {
   print $file." is a directory\n";
} else {
   print $file." is a file\n";
}

When this is used together with IO::Dir, which helps you enumerate the contents of a given directory, one non-obvious step should not be forgotten:

use IO::Dir;

tie my %dir, 'IO::Dir', $dir;
foreach my $entry (keys %dir) {
   next if ($entry eq '.' or $entry eq '..');
   # important part is here: concatenation with the full path,
   # otherwise -d tests the bare name relative to the current directory
   if (-d $dir."/".$entry) {
      print $entry." is a directory\n";
   }
}
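
For comparison, the same check can be sketched with IO::Dir's object interface and File::Spec for joining paths. This is only an illustration: the temp directory, the "sub" subdirectory and the "note.txt" file are made up so the snippet runs on its own.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use IO::Dir;

# Hypothetical layout: a temp directory with one subdirectory and one file.
my $dir = tempdir(CLEANUP => 1);
mkdir(File::Spec->catdir($dir, "sub")) or die $!;
open my $fh, ">", File::Spec->catfile($dir, "note.txt") or die $!;
close $fh;

my %kind;
my $d = IO::Dir->new($dir) or die $!;
while (defined(my $entry = $d->read)) {
    next if $entry eq '.' or $entry eq '..';
    # Join the directory and the entry name before testing; the bare
    # entry name would be tested relative to the current directory.
    my $path = File::Spec->catfile($dir, $entry);
    $kind{$entry} = -d $path ? "directory" : "file";
}
$d->close;

print "$_ is a $kind{$_}\n" for sort keys %kind;
```

File::Spec picks the path separator for the current platform, which avoids hard-coding "/" in the concatenation.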

Sunday, February 8, 2009

Simple Perl modules

Having already made it a rule to post technical details that took me more than 20 minutes to figure out, I decided to post this as well.

keywords: How to write perl modules

Answer: it's simple!

Create a file named StringManip.pm in a Lib/ directory, located wherever you like, with the following contents:

package Lib::StringManip;

use strict;

use base 'Exporter';
our @EXPORT = ('trim');

# Perl trim function to remove whitespace from the start and end of the string
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# a module must end with a true value
1;


Add the full path of the directory that contains Lib/ to the PERL5LIB environment variable. In my case (win32) it is: PERL5LIB=%PERL5LIB%;D:\Programming\Perl

i.e. Lib/ sits inside D:\Programming\Perl. On Linux/Unix: export PERL5LIB=some_path (where some_path is the directory that contains Lib/).

Usage snippet:

#!perl -w

use strict;
use Lib::StringManip;

print trim(" trim me! ");
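
If changing PERL5LIB is inconvenient, the search path can also be extended from inside the script. The sketch below is an assumption-laden illustration, not the setup used in this post: it writes a throwaway copy of Lib/StringManip.pm into a temp directory so that it is runnable on its own, then adds the directory that contains Lib/ to @INC at run time.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Build a throwaway Lib/StringManip.pm so the sketch is self-contained.
my $root = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($root, "Lib"));
open my $fh, ">", File::Spec->catfile($root, "Lib", "StringManip.pm") or die $!;
print {$fh} <<'END_MODULE';
package Lib::StringManip;
use strict;
use base 'Exporter';
our @EXPORT = ('trim');
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}
1;
END_MODULE
close $fh or die $!;

# Run-time counterpart of "use lib $root; use Lib::StringManip;".
# Note that @INC gets the directory containing Lib/, not Lib/ itself.
unshift @INC, $root;
require Lib::StringManip;
Lib::StringManip->import;

my $trimmed = trim(" trim me! ");
print "[$trimmed]\n";
```

In a real script with a fixed module location, the compile-time form "use lib 'some_path';" followed by "use Lib::StringManip;" does the same job in two lines.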

Thursday, February 5, 2009

Natural Language Processing and preparation of a human brain

Having read a number of articles dealing with natural language processing (NLP), cognition and linguistics, such as "Beyond Zipf's law: Modeling the structure of human language", I have come to the conclusion that NLP is, in essence, one of the most accurate and non-intrusive ways to understand how the human brain works.

Compare NLP, for example, to neuropsychology.