I gave up on my DFA parser. I tinkered with it for a couple of days, but the new-token effect (where the trailing character of one pattern may also be the leading character of another) caused too much non-determinism in this application. I kept adding Ragel priorities, but eventually realized it would have been less work to build a recursive descent tokenizer by hand. Instead, I switched to Ragel's longest-match scanner. It works like Lex but generates code suitable for embedding in a reentrant function. The only trick is buffer handling, since you can encounter arbitrarily large tokens such as programming-language comments. I handle that by allocating a heap buffer for all I/O and doubling its size whenever the scanner cannot match a complete token.
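To make the buffer handling concrete, here is a minimal sketch of the doubling idea in C. It is not the code from this project. The hand-written `match_token` is a hypothetical stand-in for the Ragel-generated scanner (it only knows one-byte tokens and `/* ... */` comments), and `scan_buf`, `scan_stats`, and `scan_all` are names invented for the illustration. The sketch also assumes that any unmatched tail is shifted to the front of the buffer before more input is read, and that the buffer is doubled only when it is completely full and still holds no complete token.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable I/O buffer; all names here are illustrative. */
typedef struct {
    char  *data;
    size_t cap;   /* bytes allocated */
    size_t len;   /* bytes currently held */
} scan_buf;

typedef struct {
    size_t tokens;     /* number of tokens matched */
    size_t final_cap;  /* buffer capacity when scanning finished */
} scan_stats;

/* Stand-in for the generated scanner. Returns the length of the complete
 * token at the start of p, or 0 if more input is needed to decide. */
size_t match_token(const char *p, size_t n, int eof)
{
    if (n == 0) return 0;
    if (p[0] != '/') return 1;            /* any other byte: 1-char token */
    if (n < 2) return eof ? 1 : 0;        /* might be the start of a comment */
    if (p[1] != '*') return 1;            /* a lone '/' */
    for (size_t i = 2; i + 1 < n; i++)
        if (p[i] == '*' && p[i + 1] == '/')
            return i + 2;                 /* complete comment */
    return eof ? n : 0;                   /* unterminated: consume rest at EOF */
}

/* Scans src through a buffer that starts at initial_cap bytes.
 * Returns 0 on success, -1 if an allocation fails. */
int scan_all(const char *src, size_t src_len, size_t initial_cap,
             scan_stats *out)
{
    assert(initial_cap > 0);
    scan_buf b = { malloc(initial_cap), initial_cap, 0 };
    if (!b.data) return -1;
    size_t off = 0, count = 0;

    for (;;) {
        /* Fill whatever free space the buffer has from the input. */
        size_t room = b.cap - b.len;
        size_t take = src_len - off < room ? src_len - off : room;
        memcpy(b.data + b.len, src + off, take);
        b.len += take;
        off += take;
        int eof = (off == src_len);

        /* Consume every complete token currently in the buffer. */
        size_t pos = 0, n;
        while ((n = match_token(b.data + pos, b.len - pos, eof)) > 0) {
            pos += n;
            count++;
        }

        /* Shift the unmatched tail (a partial token) to the front. */
        memmove(b.data, b.data + pos, b.len - pos);
        b.len -= pos;

        if (eof && b.len == 0) break;

        /* Buffer is full and holds no complete token: double it. */
        if (pos == 0 && b.len == b.cap) {
            size_t new_cap = b.cap * 2;
            char *grown = realloc(b.data, new_cap);
            if (!grown) { free(b.data); return -1; }
            b.data = grown;
            b.cap = new_cap;
        }
    }

    out->tokens = count;
    out->final_cap = b.cap;
    free(b.data);
    return 0;
}
```

In a real Ragel scanner the same decision would presumably hinge on the token-start pointer `ts` when the input pointer reaches the end of the buffer: if a token is still open, move it to the front and read more, and if it already fills the whole buffer, grow the buffer first. Doubling keeps the number of reallocations logarithmic in the size of the largest token.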