Friday, November 19, 2010
"OOPness" in Java
So by not supporting "pass by reference", Java makes your code even more OOP-ish. Even though the topic may be well known, this article helps put things together and refresh your memory.
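To make the point concrete, here is a minimal sketch of what "no pass by reference" means in practice. The class and method names (PassByValueDemo, Counter, reassign, increment) are made up for this illustration. Java copies the reference when calling a method, so reassigning the parameter is invisible to the caller, while changing the object's state through the copied reference is visible.

```java
public class PassByValueDemo {
    /** A tiny mutable holder, used only for this illustration. */
    public static final class Counter {
        public int value;

        public Counter(int value) {
            this.value = value;
        }
    }

    /** Reassigning the parameter changes only the method's local copy of the reference. */
    public static void reassign(Counter c) {
        c = new Counter(100);
    }

    /** Mutating the object through the copied reference is visible to the caller. */
    public static void increment(Counter c) {
        c.value++;
    }

    public static void main(String[] args) {
        Counter counter = new Counter(1);
        reassign(counter);
        System.out.println(counter.value); // prints 1: the caller's reference is untouched
        increment(counter);
        System.out.println(counter.value); // prints 2: the shared object was changed
    }
}
```

If a method has to "return" a changed object, it either mutates the object's state or returns a new object, which is what pushes the code towards the OOP style.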
Wednesday, November 3, 2010
Successive replacement in regular expressions (java)
So the task is like this: you have a text T, like "cat-1 dog-1 cat-1 elephant-1 cat-2 dog-2 cat-3".
Suppose we want to change the numerals attached to the word "cat" to their word representations: "1" to "one", "2" to "two".
One straightforward way would be to match all "cat-([0-9]+)" subsequences and then run a replace operation on T.
So the code would look something like this:
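import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String T = "cat-1 dog-1 cat-1 elephant-1 cat-2 dog-2 cat-3";
Pattern catPattern = Pattern.compile("cat-([0-9]+)");
Matcher catMatcher = catPattern.matcher(T);
Map<String, String> numToWord = new HashMap<String, String>();
numToWord.put("1", "one");
numToWord.put("2", "two");
numToWord.put("3", "three"); // ...
while (catMatcher.find())
{
    // replaces the first occurrence of the digits anywhere in T, not only after "cat-"
    T = T.replaceFirst(catMatcher.group(1), numToWord.get(catMatcher.group(1)));
}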
This code produces:
cat-one dog-one cat-1 elephant-1 cat-two dog-2 cat-three
This misses one substitution (the second cat-1 is left untouched) and wrongly rewrites dog-1. OK, let's use replaceAll instead and make sure we touch only cats:
while (catMatcher.find())
{
    T = T.replaceAll("cat-" + catMatcher.group(1), "cat-" + numToWord.get(catMatcher.group(1)));
}
which produces what we want:
cat-one dog-1 cat-one elephant-1 cat-two dog-2 cat-three
But now what happens inside the loop is logically out of sync with the loop condition: we iterate over individual matches, but call replaceAll for each of them. It is probably not efficient either, as replaceAll is attempted even when it is no longer needed, i.e. for duplicate matches.
Is there a more elegant and correct solution?
Yes! It is called Matcher.appendReplacement:
Pattern catPattern = Pattern.compile("cat-([0-9]+)");
Matcher catMatcher = catPattern.matcher(T);
Map<String, String> numToWord = new HashMap<String, String>();
numToWord.put("1", "one");
numToWord.put("2", "two");
numToWord.put("3", "three"); // ...
StringBuffer sb = new StringBuffer();
while (catMatcher.find())
{
    System.out.println("Match:" + catMatcher.group(1));
    // appends the text since the previous match, then the replacement for this match
    catMatcher.appendReplacement(sb, "cat-" + numToWord.get(catMatcher.group(1)));
}
// appends whatever follows the last match
catMatcher.appendTail(sb);
now sb.toString() contains:
cat-one dog-1 cat-one elephant-1 cat-two dog-2 cat-three
If you add System.out.println(sb.toString()); inside the while loop, you will also see that the replacements happen in step with the loop's state: each iteration handles exactly the match the loop is currently on.
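On newer JDKs the same match-by-match replacement can be written without the explicit loop. The sketch below assumes Java 9 or later, where Matcher.replaceAll accepts a function from each match to its replacement. The class name CatReplacer and the helper spellOutCats are made up for this example. Leaving unknown numbers as they are is my own choice here, not part of the original task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CatReplacer {
    private static final Pattern CAT = Pattern.compile("cat-([0-9]+)");

    /** Replaces the number after every "cat-" with its word form, one match at a time. */
    public static String spellOutCats(String text, Map<String, String> numToWord) {
        return CAT.matcher(text).replaceAll(match -> {
            // Keep the original digits when the map has no word for them.
            String word = numToWord.getOrDefault(match.group(1), match.group(1));
            // quoteReplacement protects against '$' and '\' in the replacement text.
            return Matcher.quoteReplacement("cat-" + word);
        });
    }

    public static void main(String[] args) {
        Map<String, String> numToWord = new HashMap<>();
        numToWord.put("1", "one");
        numToWord.put("2", "two");
        numToWord.put("3", "three");
        String T = "cat-1 dog-1 cat-1 elephant-1 cat-2 dog-2 cat-3";
        // prints: cat-one dog-1 cat-one elephant-1 cat-two dog-2 cat-three
        System.out.println(spellOutCats(T, numToWord));
    }
}
```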
Saturday, August 21, 2010
B2B: what is WSDL (simple explanation)
So the server declares a method (which is in fact a remote method):
Double squareRoot(Double number)
in its WSDL file. The client side then takes this file and generates the client-side code, which handles the communication protocol and the remote method invocation automatically. All the client needs to do is implement the business logic around this invocation, such as a web page with a text field for the number, or an entire hardware device with a touch-screen display.
In general, the input and return types of remote methods can be simple or complex, in fact as complex and "proprietary" as you need, because the entire WSDL file is nothing but XML, as are the messages sent over the network.
Axis and Axis2 are among the libraries used in the industry. Using such a library you can generate the Java code for your client from the WSDL (and even generate the WSDL for your server from your Java code).
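The split between generated code and business logic can be sketched as follows. This is only an illustration, not what Axis actually generates: the interface name SquareRootService, the LocalStub class and the describeRoot helper are all hypothetical. The stub here computes the result locally, whereas a real generated stub would send an XML message to the server.

```java
public class SquareRootClientDemo {
    /** Hypothetical shape of the interface a WSDL-to-Java tool could generate. */
    public interface SquareRootService {
        Double squareRoot(Double number);
    }

    /** Local stand-in for the generated stub; a real one would talk to the server. */
    public static final class LocalStub implements SquareRootService {
        @Override
        public Double squareRoot(Double number) {
            return Math.sqrt(number);
        }
    }

    /** The client's own business logic: parse the input, call the remote method, format the answer. */
    public static String describeRoot(SquareRootService service, String userInput) {
        double number = Double.parseDouble(userInput.trim());
        if (number < 0) {
            return "Cannot take the square root of a negative number";
        }
        return "sqrt(" + number + ") = " + service.squareRoot(number);
    }

    public static void main(String[] args) {
        // prints: sqrt(16.0) = 4.0
        System.out.println(describeRoot(new LocalStub(), "16"));
    }
}
```

Because describeRoot depends only on the interface, the same client logic works whether the call is local or remote.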
Tuesday, August 17, 2010
Zero-width negative lookahead group: example
name1[one_space]surname1[two_spaces]age
name2[two_spaces]surname2[two_spaces]age
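In this example it is easy to solve the problem with a zero-width negative lookahead group, because age is always supposed to be numeric. Note that \\w also matches digits, so (?!\\w) would never allow a split right before the age; the lookahead has to reject letters only, e.g. (?!\\p{L}). So the final code will be something like this:
String[] s = current_string.split("\\s{2}(?!\\p{L})");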
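Here is a small runnable sketch of the idea. It assumes the input lines look like the two samples above and that names consist of letters. Since \w also matches digits in Java, the sketch uses (?!\p{L}) so that two spaces followed by a numeric age still count as a separator. The class name LookaheadSplitDemo and the helper splitNameAndAge are made up for the example.

```java
import java.util.Arrays;

public class LookaheadSplitDemo {
    // Two whitespace characters that are NOT followed by a letter.
    private static final String SEPARATOR = "\\s{2}(?!\\p{L})";

    /** Splits a line into the name part and the age part. */
    public static String[] splitNameAndAge(String line) {
        return line.split(SEPARATOR);
    }

    public static void main(String[] args) {
        // one space between name and surname, two spaces before the age
        System.out.println(Arrays.toString(splitNameAndAge("name1 surname1  25")));
        // two spaces between name and surname: no split there, a letter follows
        System.out.println(Arrays.toString(splitNameAndAge("name2  surname2  30")));
    }
}
```

The lookahead consumes nothing, so the first digit of the age stays in the second element.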
Thursday, July 1, 2010
Parisian impressions
Tuesday, April 27, 2010
Dumper and sorting of keys
use strict;
use warnings;
use Data::Dumper;
my $hash_ref = { 10 => 'ten', 9 => 'nine', 1 => 'one' };  # sample data; generate your hash here
$Data::Dumper::Sortkeys = \&my_filter;
print Dumper($hash_ref), "\n";
sub my_filter {
    my ($hash) = @_;
    # return an array ref containing the hash keys to dump
    # in the order that you want them to be dumped
    return [
        sort { $a <=> $b } keys %$hash    # numeric order; use cmp for string keys
    ];
}
Wednesday, April 21, 2010
Convergence
All this is to say that it is useful to teach programming in a way that treats a program as a coherent mathematical scheme, and not only as a set of algorithms, techniques and design.
Tuesday, April 20, 2010
Thursday, April 8, 2010
An idea for GUI designers and developers -- a standalone app / OS level change.
This could open up many opportunities to configure your GUI world quite flexibly and save a lot of time otherwise spent on monkey-like repetition.
Saturday, March 13, 2010
giza++ under windows: episode 2
The full list of steps goes here (I assume that the source corpus is stored in corpus.ru and the target corpus in corpus.en):
1. produce vcb and dictionary files with "plain2snt.out corpus.ru corpus.en" (credits: http://vee-r.blogspot.com/2006/12/giza-guide.html)
2. produce cooc file with "snt2cooc.out corpus.ru.vcb corpus.en.vcb corpus.ru_corpus.en.snt > ru_en.cooc" (credits: myself, after analyzing the train-factored-phrase-model.perl from Moses package)
3. run GIZA++ with config file:
outputfileprefix play_giza
sourcevocabularyfile corpus.ru.vcb
targetvocabularyfile corpus.en.vcb
c corpus.ru_corpus.en.snt
CoocurrenceFile ru_en.cooc
model1dumpfrequency 1
model4smoothfactor 0.4
nsmooth 4
onlyaldumps 1
nodumps 1
p0 .999
m1 5
m2 0
m3 3
m4 3
o giza
If these options are stored in giza.config, run "GIZA++ giza.config". This produces the giza.A3.final file, a typical entry of which is:
# Sentence pair (1) source length 4 target length 7 alignment score : 2.25315e-10
there is a book on the table
NULL ({ }) на ({ }) столе ({ 5 6 7 }) лежит ({ 1 2 }) книга ({ 3 4 })
which means the following mapping:
"столе" --> "on the table"
"лежит" --> "there is"
"книга" --> "a book"
Running GIZA++ under win32 and under Linux gives the same results in terms of word mappings, except that the alignment scores may differ slightly, possibly due to different floating-point precision models.
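The way the mapping is read off a giza.A3.final entry can be sketched in code. This is a minimal illustration that assumes the entry has exactly the layout shown above: a target sentence line followed by a line of "word ({ positions })" groups with 1-based positions. The class name A3AlignmentDemo and the method readAlignment are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class A3AlignmentDemo {
    // One "word ({ i j k })" group of the alignment line.
    private static final Pattern GROUP = Pattern.compile("(\\S+) \\(\\{([0-9 ]*)\\}\\)");

    /** Maps every aligned source word to the target words it points at. */
    public static Map<String, String> readAlignment(String targetLine, String alignmentLine) {
        String[] target = targetLine.trim().split("\\s+");
        Map<String, String> mapping = new LinkedHashMap<>();
        Matcher m = GROUP.matcher(alignmentLine);
        while (m.find()) {
            String positions = m.group(2).trim();
            if (positions.isEmpty()) {
                continue; // unaligned entry, e.g. NULL
            }
            StringBuilder phrase = new StringBuilder();
            for (String position : positions.split("\\s+")) {
                if (phrase.length() > 0) {
                    phrase.append(' ');
                }
                // positions in the A3 file are 1-based
                phrase.append(target[Integer.parseInt(position) - 1]);
            }
            // a repeated source word would overwrite its earlier entry in this simple sketch
            mapping.put(m.group(1), phrase.toString());
        }
        return mapping;
    }

    public static void main(String[] args) {
        String target = "there is a book on the table";
        String alignment = "NULL ({ }) на ({ }) столе ({ 5 6 7 }) лежит ({ 1 2 }) книга ({ 3 4 })";
        readAlignment(target, alignment)
                .forEach((source, phrase) -> System.out.println("\"" + source + "\" --> \"" + phrase + "\""));
    }
}
```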
Saturday, February 27, 2010
giza++ under windows
gcc version 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
make GNU Make 3.81 (built for i686-pc-cygwin)
With
$ make
under giza-pp-v1.0.3\giza-pp I obtained two executables, GIZA++-v2/GIZA++.exe and mkcls-v2/mkcls.exe, which run perfectly under Windows XP Professional Version 2002 SP 2.
Saturday, December 19, 2009
RuSSIR'2010
4th Russian Summer School in Information Retrieval (RuSSIR 2010)
Monday September 13 - Saturday September 18, 2010
Voronezh, Russia
http://romip.ru/russir2010/eng/
FIRST CALL FOR COURSE PROPOSALS
The 4th Russian Summer School in Information Retrieval (RuSSIR 2010) will be held on September 13-18, 2010 in Voronezh, Russia, one of the major cities in south-western Russia. The mission of the school is to teach students about modern problems and methods in Information Retrieval; to stimulate scientific research in the field of Information Retrieval; and to create an opportunity for informal contacts among scientists, students and industry professionals. The Russian Conference for Young Scientists in Information Retrieval will be co-located with the school. RuSSIR 2010 will offer 4 or 5 courses and host approximately 100 participants. The working languages of the school are English (preferable) and Russian. The target audience of RuSSIR is advanced graduate and PhD students, post-doctoral researchers, academic and industrial researchers, and developers.
The RuSSIR 2010 Organizing Committee invites proposals for courses on a wide range of IR-related topics, including but not limited to:
- IR theory and models
- IR architectures
- Algorithms and data structures for IR
- Text IR
- Multimedia (including music, speech, image, video) IR
- Natural language techniques for IR tasks
- User interfaces for IR
- Web IR (including duplicate detection, hyperlink analysis, query logs)
- Text mining, information and fact extraction
- Mobile applications for IR
- Dynamic media IR (blogs, news, WIKIs)
- Social IR (collaborative filtering, tagging, recommender systems)
- IR evaluation.
Each course should consist of five 90-minute-long sessions (normally in five consecutive days). The course may include both lectures and practical exercises in computer labs.
RuSSIR 2010 organizers will cover travel expenses and accommodation at the school for one lecturer per course, but there is no additional honorarium. The RuSSIR organizers would highly appreciate it if, whenever possible, lecturers could find alternative funding to cover travel and accommodation expenses and indicate this possibility in the proposal.
Course proposals for RuSSIR 2010 must be submitted by email to Pavel Braslavski (pb@yandex-team.ru), by February 14, 2010. A course proposal should contain a brief description of the course (up to 200 words), preferred schedule, prerequisites, equipment needs, a short description of teaching/research experience and contact information of the lecturer(s). All proposals will be evaluated by the RuSSIR 2010 program committee according to the school goals, presentation clarity, lecturer’s qualifications and experience. Topics not featured at previous RuSSIRs are preferred. All submitters will be notified by March 1, 2010. Early informal inquiries about the school or the proposal evaluation process are encouraged.
About RuSSIR: The Russian Summer School in Information Retrieval is co-organized by the Russian Information Retrieval Evaluation Seminar (ROMIP) and Voronezh State University. Previous schools took place in Ekaterinburg, Taganrog, and Petrozavodsk. Previous RuSSIR courses included IR Models (by Djoerd Hiemstra), Modeling Web Searcher Behavior and Interactions (by Eugene Agichtein), Computational Advertising (by James Shanahan), Text Mining, Information and Fact Extraction (by Marie-Francine Moens), Natural Language Processing for Information Access (by Horacio Saggion), Music IR (by Andreas Rauber), and others. Ricardo Baeza-Yates, VP of Research for Europe and Latin America at Yahoo, has been confirmed as an invited lecturer for RuSSIR 2010 with the course 'Web data mining'.
About the RuSSIR 2010 location: Voronezh is a major city in southwestern Russia, spanning both sides of the Voronezh River, with a population of 850,000. Express trains from Moscow to Voronezh take about 10 hours. There are also regular flights from Moscow, Munich, Prague, Tel-Aviv, and Istanbul. The town was founded in 1586. In the 17th century, Voronezh gradually evolved into a sizeable town, especially after Tsar Peter the Great built a dockyard in Voronezh. Currently, Voronezh is an administrative, economic and cultural center of the Voronezh region. The area surrounding Voronezh has many attractions, including an archeological museum, the nature and historical reserve Divnogorie, the Kostomarovo cave monastery, and the Orlov trotter stud farm at Khrenovoe. Voronezh has a large student population: 37 institutions of higher education and 53 colleges educate over 127,000 students today. Voronezh State University was founded in 1918 and is one of the largest universities in Russia, with a total enrollment of 22,000.
Contacts
Use the e-mail address school[at]romip[dot]ru, substituting [at] with @ and [dot] with ".".
Monday, November 30, 2009
Semantic Analysis: theory, applications and use cases
Monday, October 12, 2009
Augmented Reality with Adobe Flash
As the presenters define it, AR is superimposing "graphics over real-world environments in realtime". So I checked out the exciting (and long) tutorial on how to set up the development environment and hit the road. To run a quick demo from the tutorial I printed a black and white pattern (it makes sense to make it smaller than I did, as a smaller one is much easier to handle). Before diving into details I decided to get an idea of how it is going to look and made a small video, which I share with you (please be patient with the video quality and mute your player).
It is too early to draw any conclusions about the future of AR, but it looks like a very exciting field of software development. It is where creative people get together and come up with an exciting business card and all sorts of interesting things.
Wednesday, September 16, 2009
Sting's programmer's mind
"I quite like using songs as a modular system where you can mix and match lines from different songs. It's a tradition now and people expect it. Basically, it's all one big song. You could say it was an aspect of postmodernism if you liked but you'd be called pretentious if you said that."
Perl: concise way to map one array onto another in a perl hash
my @ar1 = (...);
my @ar2 = (...);
Easy way to map ar1 (keys) onto ar2 in perl is:
my %hash;
@hash{@ar1} = (@ar2) x @ar1;
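Important assumption: the order of elements in the two arrays matters. In other words, the first element of ar1 maps to the first element of ar2, ..., the n-th element of ar1 maps to the n-th element of ar2, and there are exactly n elements in both arrays. Note that the list repetition "x @ar1" is not strictly needed: a plain @hash{@ar1} = @ar2; assigns the same pairs, because the surplus values of the repeated list are simply discarded.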
Examples
With unique keys the hash obviously preserves the correct mapping (include use Data::Dumper in your code):
sub unique_mapping
{
my @ar1 = ('a', 'b', 'c', 'd', 'e');
my @ar2 = ('1', '2', '3', '4', '5');
print Dumper(\@ar1);
print Dumper(\@ar2);
my %hash;
@hash{@ar1} = (@ar2) x @ar1;
print Dumper(\%hash);
}
Result:
$VAR1 = [
'a',
'b',
'c',
'd',
'e'
];
$VAR1 = [
'1',
'2',
'3',
'4',
'5'
];
$VAR1 = {
'e' => '5',
'c' => '3',
'a' => '1',
'b' => '2',
'd' => '4'
};
The mapping is not what you might want when the keys are not unique, because the value assigned last overwrites the earlier one for a repeated key:
sub keys_non_unique_mapping
{
my @ar1 = ('a', 'b', 'b', 'd', 'e');
my @ar2 = ('1', '2', '3', '4', '5');
print Dumper(\@ar1);
print Dumper(\@ar2);
my %hash;
@hash{@ar1} = (@ar2) x @ar1;
print Dumper(\%hash);
}
Result:
$VAR1 = [
'a',
'b',
'b',
'd',
'e'
];
$VAR1 = [
'1',
'2',
'3',
'4',
'5'
];
$VAR1 = {
'e' => '5',
'a' => '1',
'b' => '3',
'd' => '4'
};
Monday, September 14, 2009
Logging: helpful perl snippet to start with
use strict;
use warnings;
my $log_file = "file_string_here";    # path to the log file
my $LOG_HANDLE = open_log_file_for_writing($log_file)
    or die "Cannot open $log_file: $!";
log_entry($LOG_HANDLE, "Logging started");
log_entry($LOG_HANDLE, "Logging finished");
close_log_file($LOG_HANDLE);
sub open_log_file_for_writing
{
    my $log_file = shift;
    my $LOGGING_HANDLE;
    print "INFO Opening log file...\n";
    # ">>" opens the file for appending
    unless (open $LOGGING_HANDLE, ">>", $log_file) {
        return;    # undef in scalar context; the caller has to check it
    }
    my $current_time = localtime;
    print $LOGGING_HANDLE "\n".$current_time."\n";
    return $LOGGING_HANDLE;
}
sub log_entry
{
    my $LOGGING_HANDLE = shift;
    my $log_entry = shift;
    print $LOGGING_HANDLE $log_entry."\n";
}
sub close_log_file
{
    my $LOGGING_HANDLE = shift;
    print "INFO Closing log file...\n";
    close($LOGGING_HANDLE);
}
upd: the logging handle can be externalized to make logging easier to use. It comes at the cost of a global variable, but might still suit moderately sized perl scripts. The code will change a bit:
my $g_LOGGING_HANDLE = open_log_file_for_writing($log_file);
log_entry("Logging started");
sub log_entry
{
    my $log_entry = shift;
    print $g_LOGGING_HANDLE $log_entry."\n";
}
Now you can quickly call log_entry("log entry goes here") from wherever you want, for example deep inside some procedure or function, without the need to pass the logging handle down as well.
upd1: If you want to control whether to log or not, another small modification will do it for you:
my $LOG_ENABLED = 1; # put 1 to enable logging, 0 to disable logging
my $log_file = "plugin_request.log";
...
my $g_LOGGING_HANDLE;    # stays undef while logging is disabled
if ($LOG_ENABLED)
{
    $g_LOGGING_HANDLE = open_log_file_for_writing($log_file);
}
...
sub log_entry
{
    my $log_entry = shift;
    return if (!defined($g_LOGGING_HANDLE));
    print $g_LOGGING_HANDLE $log_entry."\n";
}
sub close_log_file
{
    # works on the global handle, so no argument is needed
    return if (!defined($g_LOGGING_HANDLE));
    print "INFO Closing log file...\n";
    close($g_LOGGING_HANDLE);
}
Friday, September 11, 2009
C++: if file line length exceeds array (buffer) length
// read input file line by line
// allocate 256 characters for each line
// (needs <fstream>, <iostream>, <string>, <cstdlib>; getch() comes from the Windows-only <conio.h>)
ifstream input_file("some_file.txt");
const int BUF_SIZE = 256;
char buf[BUF_SIZE];
string s, strCurString;
if (!input_file.is_open())
{
    cerr << "File some_file.txt could not be opened!" << endl;
    getch();
    exit(EXIT_FAILURE);
}
while (!input_file.eof()) {
    input_file.getline(buf, BUF_SIZE);
    strCurString = buf;
    s += strCurString;
}
cout << "File contents: " << endl << s << endl;
But what if the length of the current line exceeds what fits into the buffer of BUF_SIZE characters? In that case the while loop never ends. Why? Because a special bit (failbit) is set in the input file stream object, saying that the last getline() operation has failed (here not because the end of the file was reached, but because the line did not fit into the buffer). From then on all subsequent calls to getline() fail to read anything, so the end of the file is never reached. You can see this by calling input_file.gcount(), which keeps returning 0 (zero) after the getline() call that set the failbit.
To overcome this, we can use a trick found here:
// read input file line by line
// allocate 256 characters for each line
ifstream input_file("some_file.txt");
const int BUF_SIZE = 256;
char buf[BUF_SIZE];
string s, strCurString;
if (!input_file.is_open())
{
    cerr << "File some_file.txt could not be opened!" << endl;
    getch();
    exit(EXIT_FAILURE);
}
while (!input_file.eof()) {
    input_file.getline(buf, BUF_SIZE);
    // remember about failbit when the amount of
    // characters in the current line is
    // more than fits into the buffer
    if (input_file.fail() && !input_file.eof()) {
        // clear up the failbit and
        // continue reading the input file;
        // the rest of the long line is read by the next getline()
        input_file.clear();
    }
    strCurString = buf;
    s += strCurString;
}
cout << "File contents: " << endl << s << endl;
Monday, August 31, 2009
HackDay'09
Sunday, August 2, 2009
Porvoo is a unique old city, but old town is disgustful
Surprised with the topic formulation? Read on for the details.
Visiting Porvoo
The unique old city is the old city of Porvoo [1], which we visited today. It was a pleasure, though it rained almost all the time. Some photos are here [2]; the comments are in Russian, though (well, as was once said on the humorous Russian show KVN, "learn Russian in order to understand humour", and vice versa). Porvoo probably became widely known thanks to the fact that Finland was declared an autonomous Grand Duchy of Russia in the local cathedral.
Related work and criticism
I usually try to avoid posting any anti-ads on the blog, but this time it is unavoidable. It is about the second part: Old Town, a restaurant with reportedly refined taste and special service. Well, it was about 16:00 already, but should that make any difference to the service quality? First, we waited far too long to place an order, going from being *very* hungry to the point of thinking about leaving with nothing for our pains. When the waitress finally came, she got the soup order completely wrong, so two of us had soup (delicious, nothing to say!) instead of one. But the adjective in the title starts with the pasta. Have you ever eaten fast-food spaghetti? Maybe somewhere in a student dormitory or at a camp. So take that spaghetti, add some seafood (which was nice, however) and put it into the microwave. The taste was something like that. The culmination was a dessert that we ordered and that never arrived. As some random passers-by had apparently taken all the cakes we had ordered along with black tea, the waitress wasn't too embarrassed to inform us that only one cake was left. Why did that happen? Because the restaurant had another side feature: a cafe. Sounds like a multitier technology, right? In practice it averages out to a (restaurant + cafe) / 2 quality level. The place managed to blur our impression of Porvoo, as we left the city right after that, but thinking about it now, about 5 hours later, I generally feel the journey was pleasant. We decided not to throw our wrathful feelings in their faces, but to publish this post.
Discussion
We thought about why the restaurant quality was *that* bad and came to the conclusion that its target audience is tourists. That means mostly occasional visitors, who will never remember it (as they usually don't come back) even if it wasn't worthwhile. Also, if you do care about your clients, think twice before coupling a restaurant with a cafe in the same cramped premises.
Conclusion
Instead of blurring, use sharpening: go for a homemade pizza or maybe visit some really expensive place to eat, and polish the impressions collected during your day in Porvoo.
Bibliography
[1] http://en.wikipedia.org/wiki/Porvoo_Cathedral
[2] http://picasaweb.google.ru/dmitry.kan/Porvoo#