Optimising my code led to a refactoring in which several methods with return values were merged into one. Since Java offers no "pass by reference" for my parameters (primitives as well as objects such as collections), I had to introduce an inner class to hold all the return values.
So, by not supporting "pass by reference", Java pushes your code to be even more OOP-ish. Even though the topic is well known, this article helps to put things together and refresh the memory.
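The holder-class idea can be sketched roughly as follows. This is only an illustration, not the code from the refactoring: the names ParseDemo, ParseResult and parse are hypothetical, and a static nested class stands in for the inner class mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class ParseDemo {
    // Hypothetical holder for everything the merged method has to hand back.
    static final class ParseResult {
        final int tokenCount;       // stands in for a primitive "out parameter"
        final List<String> tokens;  // stands in for a collection "out parameter"

        ParseResult(int tokenCount, List<String> tokens) {
            this.tokenCount = tokenCount;
            this.tokens = tokens;
        }
    }

    // One method returns both values instead of two separate methods.
    static ParseResult parse(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return new ParseResult(tokens.size(), tokens);
    }

    public static void main(String[] args) {
        ParseResult result = parse("cat-1 dog-1 cat-2");
        System.out.println(result.tokenCount + " " + result.tokens);
    }
}
```

Making the fields final keeps the holder a plain, immutable bundle of return values.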
Friday, November 19, 2010
Wednesday, November 3, 2010
Successive replacement in regular expressions (Java)
I'm not sure how often people need to do successive replacement in a target text using a regular expression pattern, but Java has a rather neat solution for it. I'm publishing it here because I know that younger developers in particular can end up reinventing the wheel and sitting through longer debugging sessions.
So the task is this: you have a text T, such as "cat-1 dog-1 cat-1 elephant-1 cat-2 dog-2 cat-3".
Suppose we want to change the numerals attached to the word "cat" to their word representations: "1" to "one", "2" to "two", and so on.
One straightforward way would be to match all "cat-([0-9]+)" subsequences and then run a replace operation on T.
So the code would look something like this:
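import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String T = "cat-1 dog-1 cat-1 elephant-1 cat-2 dog-2 cat-3";
Pattern catPattern = Pattern.compile("cat-([0-9]+)");
Matcher catMatcher = catPattern.matcher(T); // the matcher keeps scanning this original string
Map<String, String> numToWord = new HashMap<String, String>();
numToWord.put("1", "one");
numToWord.put("2", "two");
numToWord.put("3", "three"); // ...
while (catMatcher.find())
{
    // replaces the first occurrence of the digit anywhere in T, not necessarily the one just matched
    T = T.replaceFirst(catMatcher.group(1), numToWord.get(catMatcher.group(1)));
}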
This code produces:
cat-one dog-one cat-1 elephant-1 cat-two dog-2 cat-three
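This misses one cat substitution (the second cat-1) and wrongly turns dog-1 into dog-one, because replaceFirst replaces the first "1" it finds anywhere in T. OK, let's use replaceAll instead and make sure we touch only cats: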
while (catMatcher.find())
{
    T = T.replaceAll("cat-" + catMatcher.group(1), "cat-" + numToWord.get(catMatcher.group(1)));
}
which produces what we want:
cat-one dog-1 cat-one elephant-1 cat-two dog-2 cat-three
But now what happens inside the loop is logically out of sync with the loop condition: we iterate over individual matches, but each replaceAll call rewrites every occurrence at once. It is probably not efficient either, since replaceAll is still attempted for duplicate matches that have already been replaced.
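To make the mismatch visible, here is a small sketch that records, for each loop iteration, whether the replaceAll call actually changed the text. The class name ReplaceAllTrace and the method trace are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllTrace {
    // Returns one entry per loop iteration: true if that replaceAll call changed the text.
    static List<Boolean> trace(String text, Map<String, String> numToWord) {
        List<Boolean> changed = new ArrayList<>();
        // The matcher scans the original string; reassigning `text` below does not affect it.
        Matcher m = Pattern.compile("cat-([0-9]+)").matcher(text);
        while (m.find()) {
            String before = text;
            text = text.replaceAll("cat-" + m.group(1), "cat-" + numToWord.get(m.group(1)));
            changed.add(!before.equals(text));
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> numToWord = new HashMap<>();
        numToWord.put("1", "one");
        numToWord.put("2", "two");
        numToWord.put("3", "three");
        // The second iteration (the duplicate cat-1) finds nothing left to replace.
        System.out.println(trace("cat-1 dog-1 cat-1 elephant-1 cat-2 dog-2 cat-3", numToWord));
    }
}
```

For the sample text the loop runs four times, and the second call is wasted work: the first call had already replaced both occurrences of cat-1.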
Is there a more elegant and correct solution?
Yes! It is called Matcher.appendReplacement:
// T is the original, unmodified text from the first example
Pattern catPattern = Pattern.compile("cat-([0-9]+)");
Matcher catMatcher = catPattern.matcher(T);
Map<String, String> numToWord = new HashMap<String, String>();
numToWord.put("1", "one");
numToWord.put("2", "two");
numToWord.put("3", "three"); // ...
StringBuffer sb = new StringBuffer();
while (catMatcher.find())
{
    System.out.println("Match:" + catMatcher.group(1));
    // appends the text since the previous match, then the replacement for this match
    catMatcher.appendReplacement(sb, "cat-" + numToWord.get(catMatcher.group(1)));
}
// appends whatever follows the last match
catMatcher.appendTail(sb);
Now sb.toString() contains:
cat-one dog-1 cat-one elephant-1 cat-two dog-2 cat-three
If you add System.out.println(sb.toString()); inside the while loop, you will also see that the replacements happen in step with the loop: each iteration appends the text up to and including the current match, so the loop body and the loop condition stay in sync.
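For completeness, here is the same approach wrapped into a self-contained class that can be compiled and run as is. It is a sketch under two assumptions that are not part of the snippet above: a numeral without a mapping is left unchanged, and the replacement is passed through Matcher.quoteReplacement so that a "$" or "\" in a mapped word is taken literally. The names CatNumerals and spellOut are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CatNumerals {
    private static final Pattern CAT = Pattern.compile("cat-([0-9]+)");

    static String spellOut(String text, Map<String, String> numToWord) {
        Matcher m = CAT.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String word = numToWord.get(m.group(1));
            // Assumption: keep the match as it is when the numeral has no mapping.
            String replacement = (word == null) ? m.group() : "cat-" + word;
            // quoteReplacement escapes '$' and '\', which appendReplacement treats specially.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> numToWord = new HashMap<>();
        numToWord.put("1", "one");
        numToWord.put("2", "two");
        numToWord.put("3", "three");
        System.out.println(spellOut("cat-1 dog-1 cat-1 elephant-1 cat-2 dog-2 cat-3", numToWord));
    }
}
```

On Java 9 and later, Matcher.replaceAll also accepts a Function<MatchResult, String>, which covers the same use case without the explicit loop.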