This was a rough problem. I’m using Python’s SAX modules for parsing JMdict, and ran into trouble with how it uses XML entities. JMdict marks things like irregular kanji usage with entities such as &iK;, and the parser expands each one into its full description. Nothing wrong with JMdict, but the expansion is rather verbose and does not lend itself well to what I’m doing. I don’t want “word containing irregular kanji usage”, but rather, I want the “iK” code.
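Unfortunately, the default ExpatParser doesn’t have a clear way to disable entity expansion. But if you read the docs closely enough, you can find out about setting a “default handler” on the underlying expat parser, which has the side effect of inhibiting the expansion of internal entities: the raw entity reference gets passed to the default handler instead.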
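To see that side effect in isolation, here is a minimal sketch that talks to `xml.parsers.expat` directly rather than going through SAX. It is not the class I ended up with; the `SAMPLE` document, `parse_text` and the `keep_entity_codes` flag are made up for illustration, and the check for a leading `&` and trailing `;` is my own assumption about how to pick entity references out of everything else the default handler receives.

```python
import xml.parsers.expat

# Tiny stand-in for a JMdict entry (hypothetical, not real JMdict data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE entry [
<!ENTITY iK "word containing irregular kanji usage">
]>
<entry><ke_inf>&iK;</ke_inf></entry>
"""


def parse_text(data, keep_entity_codes):
    """Return the character data of the document as one string."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def on_default(text):
        # The default handler also receives the XML declaration, DOCTYPE
        # tokens and so on; keep only things shaped like "&name;".
        if text.startswith("&") and text.endswith(";"):
            chunks.append(text[1:-1])

    if keep_entity_codes:
        # Setting DefaultHandler (as opposed to DefaultHandlerExpand)
        # stops expat from expanding internal entities.
        parser.DefaultHandler = on_default

    parser.Parse(data, True)
    return "".join(chunks)


if __name__ == "__main__":
    print(parse_text(SAMPLE, keep_entity_codes=False))
    print(parse_text(SAMPLE, keep_entity_codes=True))
```

With the flag off, the parser hands back the expanded description; with it on, only the `iK` code comes through. Note that entity references inside attribute values are still expanded either way.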
This isn’t a perfect fix, but here’s the class I used to get this done: