Tom,
I did a Google search and found the following:
http://www.arglist.com/regex/
It states that Tcl uses the same library from Henry. Maybe someone involved with that project could help explain the library? I also noticed that the URL above lists a few ports people did of Henry's code. I didn't download and analyze their code, but maybe they have made some comments that could help, or have some improvements to the code.
Just a thought.. :)
Billy Earney
On Sun, Feb 19, 2012 at 5:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Brendan Jurd <direvus@gmail.com> writes:
> Are you far enough into the backrefs bug that you'd prefer to see it
> through, or would you like me to pick it up?
Actually, what I've been doing today is a brain dump. This code is
never going to be maintainable by anybody except its original author
without some internals documentation, so I've been trying to write
some based on what I've managed to reverse-engineer so far. It's
not very complete, but I do have some words about the DFA/NFA stuff,
which I will probably revise and fill in some more as I work on the
backref fix, because that's where that bug lives. I have also got
a bunch of text about the colormap management code, which I think
is interesting right now because that is what we are going to have
to fix if we want decent performance for Unicode \w and related
classes (cf the other current -hackers thread about regexes).
I was hoping to prevail on you to pick that part up as your first
project. I will commit what I've got in a few minutes --- look
for src/backend/regex/README in that commit. I encourage you to
add to that file as you figure stuff out. We could stand to upgrade
a lot of the code comments too, of course, but I think a narrative
description is pretty useful before diving into code.
regards, tom lane