avinash09 <avinash.it09@gmail.com> wrote:
> regex="^(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),
> (.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)$"
A better solution seems to have been presented, but for the record I would like to note that
the regexp above is quite an effective performance bomb: For each group, the evaluation time
roughly doubles. Not a problem for 10 groups, but you have 28.
I ran a little test: matching a single sample line took 120 ms with 20 groups, 2 seconds with
24 groups and 30 seconds with 28 groups on my machine. Extrapolating the doubling, a single match
with 50 groups would take about 4 years.
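To reproduce the effect without waiting minutes, a small timing harness along these lines may help. It is a minimal sketch, assuming a hypothetical class `GreedyBenchmark` and a made-up sample line of the form `value0,value1,...`; the group counts are deliberately kept low, and the absolute timings will differ from machine to machine.

```java
import java.util.Collections;
import java.util.regex.Pattern;

public class GreedyBenchmark {

    // Builds "^(.*),(.*),...,(.*)$" with the given number of groups.
    static Pattern greedyPattern(int groups) {
        String body = String.join(",", Collections.nCopies(groups, "(.*)"));
        return Pattern.compile("^" + body + "$");
    }

    // Hypothetical sample line: `fields` short values separated by commas.
    static String sampleLine(int fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("value").append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Keep the group counts small: each extra group roughly doubles the time.
        for (int groups = 10; groups <= 18; groups += 2) {
            Pattern p = greedyPattern(groups);
            String line = sampleLine(groups);
            long start = System.nanoTime();
            boolean matched = p.matcher(line).matches();
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("%d groups: matched=%b, %d us%n", groups, matched, micros);
        }
    }
}
```

Each step adds two groups, so the printed times should grow by roughly a factor of four per line if the doubling holds.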
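The explanation is that (.*) is greedy and Java's regexp engine backtracks: every one of your
groups starts by matching to the end of the line, then the regexp requires a comma and the engine
has to backtrack. As each group can end at any of the commas, the number of combinations tried
grows exponentially with the number of groups. The solution is fortunately both simple and
applicable to many other regexps: Make your matches terminate as soon as possible.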
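The greedy behaviour is easy to see on a tiny input. This is a minimal sketch with a hypothetical helper class `GreedyDemo`; it only shows where the first (.*) group ends up after backtracking, not the timing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {

    // Applies `regex` to the whole of `input` and returns all captured groups,
    // or an empty array if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount()];
        for (int i = 0; i < result.length; i++) {
            result[i] = m.group(i + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // The first greedy (.*) initially swallows the whole line, then gives characters
        // back one at a time until the rest of the regexp can match, so it keeps
        // as many commas as it can afford.
        String[] g = groups("^(.*),(.*),(.*)$", "a,b,c,d");
        System.out.println(String.join(" | ", g)); // a,b | c | d
    }
}
```

The first group captures "a,b" rather than "a": it only stopped where the remaining two groups could still find their commas.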
In this case, instead of having groups with (.*), use ([^,]*), which means that each group
matches everything except commas. The combined regexp then looks like this:
regex="^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),...([^,]*)$"
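Rather than typing out all 28 groups, the pattern can be generated. This is a minimal sketch, assuming a hypothetical class `FieldPattern` and a made-up line `f0,f1,...,f27`; it builds the comma-excluding regexp for any group count and returns the captured fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldPattern {

    // Builds "^([^,]*),([^,]*),...,([^,]*)$" with the given number of groups.
    static Pattern commaFreePattern(int groups) {
        String body = String.join(",", Collections.nCopies(groups, "([^,]*)"));
        return Pattern.compile("^" + body + "$");
    }

    // Returns the captured fields, or an empty list if the line has the wrong field count.
    static List<String> fields(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        List<String> result = new ArrayList<>();
        if (!m.matches()) {
            return result;
        }
        for (int i = 1; i <= m.groupCount(); i++) {
            result.add(m.group(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern p = commaFreePattern(28);
        // Hypothetical 28-field line: "f0,f1,...,f27".
        StringBuilder line = new StringBuilder("f0");
        for (int i = 1; i < 28; i++) {
            line.append(",f").append(i);
        }
        List<String> f = fields(p, line.toString());
        System.out.println(f.size() + " fields, last = " + f.get(27)); // 28 fields, last = f27
    }
}
```

Because no group can swallow a comma, each group has exactly one place to stop, and a line with the wrong number of commas fails quickly instead of triggering a long search.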
The match time for 28 groups with that regexp was about 0.002 ms (average over 1000 matches).
 Toke Eskildsen
