DGrok Delphi parser

What is DGrok?

The grammar: The "DGrok Delphi grammar" is the Delphi grammar that I've reverse-engineered (since CodeGear doesn't publish an official Delphi grammar — or at least, not an accurate one).

The parser: The "DGrok Delphi parser" is an open-source parser that can read Delphi code and build a syntax tree from it. The parser itself is written in C#; the reasons are covered under "Why isn't DGrok written in Delphi?" below.

The tools: The "DGrok tools" are a set of open-source tools, currently under development, for parsing Delphi source code and doing cool stuff with it: smart and lightning-fast searches, tracking down weird code constructs, refactoring, etc. See the "List of DGrok tools" below.

Downloads

Current DGrok Delphi grammar (online version; may correspond to unreleased code)

DGrok downloads (grammar, parser, tools, and all source code)

List of DGrok tools

DGrok comes with a demo app, which you can use to parse one or more directory trees, and then analyze the code looking for patterns.

Here's the current list of patterns it can look for. It's no FxCop, but it's a start.

You can also add code to look for patterns of your own. See the classes in the DGrok.Framework\Visitors directory for examples.
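
To give a feel for what a pattern-finding visitor looks like, here is a minimal sketch in Python. It is not DGrok's actual API (DGrok's visitors are C# classes, and their real names and signatures are not shown here); `Node`, `Visitor`, and `FindWithStatements` are hypothetical names made up for illustration. The idea is simply that a visitor walks the syntax tree and records every node that matches the pattern it cares about.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A hypothetical syntax-tree node: a kind, some source text, and children."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class Visitor:
    """Walks a tree depth-first and calls visit_<kind> when such a method exists."""

    def visit(self, node: Node) -> None:
        handler = getattr(self, "visit_" + node.kind, None)
        if handler is not None:
            handler(node)
        for child in node.children:
            self.visit(child)


class FindWithStatements(Visitor):
    """Example pattern (hypothetical): collect every 'with' statement."""

    def __init__(self) -> None:
        self.hits: List[Node] = []

    def visit_with_statement(self, node: Node) -> None:
        self.hits.append(node)


# A tiny hand-built tree standing in for real parser output.
tree = Node("unit", children=[
    Node("method", "DoStuff", [
        Node("with_statement", "with Foo do"),
        Node("assignment", "X := 1"),
    ]),
    Node("method", "DoMore", [
        Node("with_statement", "with Bar do"),
    ]),
])

finder = FindWithStatements()
finder.visit(tree)
print([hit.text for hit in finder.hits])
```

Adding a new pattern in this sketch means writing one more subclass with a `visit_<kind>` method; the tree-walking logic stays in the base class.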

Project status

Currently the parser is fully capable of parsing Delphi 2007 source code, but can't read code that uses new Delphi 2009 features like string locales or generics. (The DGrok grammar doesn't document these new features either.) There's also no symbol table support yet, so the tools can't do refactorings or Find References.

More information is available in the DGrok posts on my blog.

Delphi grammar and project status

Frequently Asked Questions

Why isn't DGrok written in Delphi?

DGrok is written in C#, not Delphi. Sometimes people ask why.

When I first started the project that was to become DGrok, it was just a fancy Find tool, and I wrote it in .NET because .NET came with a regular-expression library. Later I tried using a parser generator, and there really aren't any good ones that produce Delphi code, so I stuck with C#. And when I eventually switched to a hand-coded recursive descent parser, well, I already had all these unit tests written in C#.

Besides that, C# had a lot of language niceties that Delphi lacked. DGrok uses things like generics, anonymous methods, and iterators, none of which existed in Delphi back in 2004-2007 when I was writing DGrok.

I also like working in a garbage-collected environment; writing this in Delphi for Win32 would have required adding a lot of memory-management code that would just clutter things up. There was a Delphi for .NET in 2007, but they hadn't been giving it a lot of love; I don't think it even supported .NET 2.0 yet. They finally gave up on it and started reselling Oxygene instead, but that wasn't until later.

Plus, you can get a C# compiler for free. That's a bonus. I'm all in favor of free-as-in-speech tools that only have free-as-in-beer dependencies. (I suppose I could have looked into FreePascal, but I didn't have a strong desire to keep my code compilable in two different environments.)

The Ruby dependencies are mainly because it was easier and quicker to write those parts in Ruby. I'm all for using the right tool for the job, and interpreted languages are a great choice for codegen, because you can run them during your build process without needing to compile the code generator first. It's also easy and quick to hack on, which was really nice when I was bootstrapping the grammar and using Ruby to build my HTML documentation that also served as my what-to-do-next list.

Anyway, there's no technical reason you couldn't write DGrok in Delphi. If anyone wants to translate the code (or to otherwise enhance it or build on it, for that matter), I'd be happy to link to you. Or I could post the code on GitHub so you could fork it, if that would be useful. Let me know.

Why doesn't DGrok use a parser generator like ANTLR?

ANTLR is a fine tool, but it has problems with ambiguous grammars. It wants to be able to read from left to right, one token at a time, and always know what type of construct it's dealing with based only on what it's seen so far. (There's support for lookahead, but it's extremely limited.)

That isn't good enough for the Delphi grammar. Delphi is full of ambiguity.

For example, take the humble semicolon. Most of the time, it's an unambiguous statement separator. That is, until you see a semicolon in the middle of a variable declaration:

var
  Foo: procedure; stdcall = nil;

So when you see the first semicolon, you don't know whether you're done with the variable declaration or not. ANTLR doesn't take well to that sort of thing.

Once you start digging into the grammar, it becomes obvious that the Delphi grammar grew organically over time, rather than being designed from the beginning to be easy to write tools for.

DGrok uses a hand-coded recursive-descent parser. It's hard to tell a tool how to deal with grammar ambiguity if the tool wasn't designed for it, but it's easy to write code by hand that deals with the ambiguity.
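
Here is a rough sketch, in Python, of how hand-written recursive-descent code can cope with the semicolon example above. It is not DGrok's code (that's C#), and it is heavily simplified: it only handles plain identifiers and parameterless `procedure` types, and the `DIRECTIVES` set, `VarDeclParser`, and `parse_var_section` are names I've assumed for illustration. The trick is to peek one token past the semicolon: if a calling-convention directive follows, the declaration isn't finished yet.

```python
import re
from typing import Dict, List, Optional

# Assumed, incomplete set of calling-convention directives (illustration only).
DIRECTIVES = {"cdecl", "pascal", "register", "safecall", "stdcall"}


def tokenize(source: str) -> List[str]:
    """Split source into identifiers and the few symbols this sketch needs."""
    return re.findall(r"[A-Za-z_]\w*|[:;=]", source)


class VarDeclParser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self, offset: int = 0) -> Optional[str]:
        index = self.pos + offset
        return self.tokens[index] if index < len(self.tokens) else None

    def expect(self, token: Optional[str] = None) -> str:
        """Consume the next token, optionally requiring a specific one."""
        current = self.peek()
        if current is None or (token is not None and current.lower() != token):
            raise SyntaxError(f"expected {token!r}, got {current!r}")
        self.pos += 1
        return current

    def parse_var_decl(self) -> Dict[str, object]:
        name = self.expect()
        self.expect(":")
        type_name = self.expect()
        directives: List[str] = []
        if type_name.lower() == "procedure":
            # The ambiguity: a semicolon followed by a directive does NOT end
            # the declaration, so look one token past the semicolon.
            while self.peek() == ";" and (self.peek(1) or "").lower() in DIRECTIVES:
                self.expect(";")
                directives.append(self.expect())
        value = None
        if self.peek() == "=":
            self.expect("=")
            value = self.expect()
        self.expect(";")  # this semicolon really does end the declaration
        return {"name": name, "type": type_name,
                "directives": directives, "value": value}


def parse_var_section(source: str) -> List[Dict[str, object]]:
    """Parse the declarations that follow a 'var' keyword (keyword not included)."""
    parser = VarDeclParser(tokenize(source))
    decls = []
    while parser.peek() is not None:
        decls.append(parser.parse_var_decl())
    return decls


decls = parse_var_section("Foo: procedure; stdcall = nil; Bar: Integer;")
for decl in decls:
    print(decl)
```

Even this sketch isn't the whole story: directive names like `stdcall` aren't reserved words, so a real parser has to look further ahead to rule out a variable that happens to be named `stdcall`. That kind of special-casing is easy to bolt onto hand-written code and awkward to express in a generator's grammar file.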

Contacting me

DGrok was written by Joe White. If you have any comments, corrections, questions, suggestions, etc., please feel free to use my contact form to get in touch with me.