A function that is useful is a comparator of strings (for sorting purpose, in general) that respect the natural order of numbers.
So when we have a list of strings, like file names, for examples, with numbers in them, it is better to respect the numerical order when strings are equal.
A common case is numbered names: a trivial, lexicographical order would be:
foo-1
foo-10
foo-11
foo-2
...
foo-9
where one expects:
foo-1
foo-2
...
foo-9
foo-10
foo-11
I show here how to do such comparator.
At work, in our client-server application, we had an implementation of such comparator, but it appeared it was a major cause of the slowdown of the display of a large list of items in a Swing table (JTable).
A quick look at the method shown the implementation was sub-optimal... It started by a useless String sw = new String(sp);
(wrapping of a parameter), and at some point, it tries to convert a string to integer by creating a BigInteger from it (!), and catching an exception if it failed to fall back on a simple string comparison.
All this object creation, exception catching and stuff was very costly, CPU and memory-wise (object creation is then paid in GC activity). When sorting 6 millions of entries, it took several minutes on a fast machine!
Since such task is something I like to do, back at home I thought about an algorithm for this task. I made a prototype, tested it, refined and fixed it, and finally I had a functional comparator.
I brought it back at work, replaced the old implementation with mine, and ran the little main() testing the functionality, comparing the time to do a simple string compare (with case-insensitive on) against the time of the special comparator.
Before, it was 19 ms against almost 1000 ms.
After, it was 19 ms against... 18 ms!
Looks like the case-insensitive algorithm is (relatively) costly, perhaps because it operates on Unicode chars.
The six million entry table was then sorted in a few seconds (instead of minutes).
The base idea of my algorithm is to avoid entirely to create new objects. Just a loop, some tests, some intermediary variables, simplistic math. Close of what String.compare() does, just a little more complex.
The base idea is simple: we do a loop over the characters of the two strings to compare, advancing in parallel.
As long as we are comparing non-digits, we continue if they are equal. If we find two different chars, we stop and return the comparison of those.
If we find digits, we then switch in number comparison mode: we convert the sequence of digits to longs, and compare them. If they are equal, and if we haven't still reached the end of the strings, we continue in non-digit more. If then are not equal, we just return the comparison of the two numbers.
There are some subtleties in handling the end of strings, but the essence of the algorithm is simple.
I show below the code, with abundant comments, for you, and for myself when I will read back the code some time later...
package org.philhosoft.util;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
/**
* Compare two strings by respecting the natural order of numbers.
* Ie. foo2 < foo10, a5b4 < a15b4 and so on.
*
* @author PhiLho
*/
public class NaturalOrderComparator implements Comparator<String>
{
@Override
public int compare(String s1, String s2)
{
if (s1.isEmpty() && s2.isEmpty())
return 0; // Identical...
if (s1.isEmpty())
return -1; // Empty is smaller than anything else
if (s2.isEmpty())
return 1;
char c1 = s1.charAt(0);
char c2 = s2.charAt(0);
boolean b1 = Character.isDigit(c1);
boolean b2 = Character.isDigit(c2);
// Fast exit
if (b1 && !b2 || !b1 && b2)
return s1.compareTo(s2); // One is a digit, the other isn't: do regular comparison
int len1 = s1.length();
int len2 = s2.length();
int pos1 = 0, pos2 = 0;
boolean bIsDigit = b1;
while (true)
{
if (bIsDigit)
{
// Do number comparison
long n1 = 0;
long n2 = 0;
// Convert the sequence of digits to a long number
do
{
c1 = s1.charAt(pos1);
b1 = Character.isDigit(c1);
if (b1)
{
n1 = n1 * 10 + (c1 - '0');
}
// Stop if one non-digit is found or if we reached the end of one string
} while (b1 && ++pos1 < len1);
// Idem, in the second string
do
{
c2 = s2.charAt(pos2);
b2 = Character.isDigit(c2);
if (b2)
{
n2 = n2 * 10 + (c2 - '0');
}
// Stop if one non-digit is found or if we reached the end of one string
} while (b2 && ++pos2 < len2);
// Compare the numbers
if (n1 < n2)
return -1;
if (n1 > n2)
return 1;
// Here, the numbers are equal. If we reached the end of the strings,
// we say they are equal, otherwise we continue on comparing strings
if (pos1 == len1 && pos2 == len2)
return 0;
}
else
{
// Do string comparison, character by character
do
{
c1 = s1.charAt(pos1);
c2 = s2.charAt(pos2);
b1 = !Character.isDigit(c1);
b2 = !Character.isDigit(c2);
// Two non-digits, different
if (b1 && b2 && c1 != c2)
return c1 - c2;
// One is digit, and the other isn't one
if (b1 && !b2 || !b1 && b2)
return c1 - c2; // Just compare these different chars
// Next chars
++pos1; ++pos2;
// Stop if one digit is found or if we reached the end of one string
} while (b1 && b2 && pos1 < len1 && pos2 < len2);
if (b1 && pos1 == len1 && pos2 == len2)
return 0; // At the end with non-digits without finding differences
}
// Have we reached one end?
if (pos1 == len1 && len1 < len2)
return -1; // s1 is shorter, so smaller (all chars were equal so far)
if (pos2 == len2 && len2 < len1)
return 1; // s2 is shorter, so smaller
// Not at the end, we stopped on different kind of char (digit vs. non-digits), let's process them
if (!bIsDigit) // Compared chars, we went one char too far, into digits
{
// Put back current chars into the comparison
--pos1; --pos2;
}
// Switch the comparion mode
bIsDigit = !bIsDigit;
}
}
}
You can find the full code, including the main() that tests the comparator (converted to a JUnit test at work...), at my Launchpad repository: NaturalOrderComparator.java
I hope it can be useful to somebody.
As usual, my code is released with the liberal zlib/libpng license.