An infrequent visitor joined our wiki hangout a couple of weeks ago. Like your author, our visitor also lives in Boulder, Colorado and enjoys programming. He recommended a program called Bookish. At the time we opened a tab in our browser. Today we return and discover a moment of delightful meander among programming, language, and typesetting, grateful for having followed up on the recommendation of a peer. github
Bookish is a tool for books and articles that begin in an XML-ish format with some Markdown mixed in and are converted to HTML and LaTeX.
Its origin story reads very much like the origins of TeX. The authors were programmers working on an article who, dissatisfied with the typesetting, solved the problem with programming.
Browse the article a moment, just to see their resulting typesetting for yourself. It is beautiful even without taking in the content. See The Matrix Calculus You Need For Deep Learning article.
One of the authors of Bookish is "The ANTLR Guy." For non-programmers... never mind, ANTLR is a tool for programmers; maybe even more for language designers.
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees. See www.antlr.org.
LaTeX is a software system for document preparation. TeX is a typesetting system, released in 1978, which was designed and written by Donald Knuth. TeX is especially good at typesetting complex mathematical formulae and is one of the most sophisticated digital typographical systems. See History of TeX site and Wikipedia.
Back to Bookish... "As the ANTLR guy, I ain't afeared of building a language translator and so, following my motto 'Why program by hand in five days what you can spend five years of your life automating', I decided to simply solve this problem by building my own markdown translator."
The really tricky bit is the vertical alignment of equations within a line of HTML text. Check out this sentence with embedded equations:

(I had to take a snapshot and show that instead of giving raw HTML plus equations; github's markdown processor didn't handle it properly. haha.)
What does it mean to properly align an equation's image? It's painful. We need to convince latex to give us metrics on how far the typeset image drops below the baseline. (Latex calls this the depth.) It took a while, but I figured out how to not only compute the depth below baseline but also how to get it back into this Java program via the latex log file. You can see how all of this is done here: Translator.visitEqn().
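To make the idea concrete for ourselves, here is a minimal sketch of the round trip described above. It is not Bookish's actual code. We assume the LaTeX source writes a marker line such as `eqn-depth=2.5pt` into the log (for example with `\typeout` and the box's `\dp`), and that the Java side finds that line and shifts the equation image down by the same amount using CSS `vertical-align`. The marker text, the class name `EquationDepth`, and both method names are our own inventions for illustration.

```java
import java.util.Locale;
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EquationDepth {
    // Hypothetical marker: assumes the LaTeX run emitted a log line like
    // "eqn-depth=2.5pt". This is an assumption, not Bookish's real format.
    private static final Pattern DEPTH_LINE =
        Pattern.compile("eqn-depth=([0-9]*\\.?[0-9]+)pt");

    // Scan the text of a LaTeX log for the first depth marker, in points.
    static OptionalDouble parseDepthPt(String logText) {
        Matcher m = DEPTH_LINE.matcher(logText);
        return m.find()
            ? OptionalDouble.of(Double.parseDouble(m.group(1)))
            : OptionalDouble.empty();
    }

    // Build an <img> tag lowered by the depth so the equation's baseline
    // lines up with the baseline of the surrounding HTML text.
    static String imgTag(String src, double depthPt) {
        return String.format(Locale.ROOT,
            "<img src=\"%s\" style=\"vertical-align: -%.2fpt;\">", src, depthPt);
    }

    public static void main(String[] args) {
        // Made-up log contents, only to exercise the sketch.
        String log = "some log output\neqn-depth=2.5pt\nmore log output\n";
        OptionalDouble depth = parseDepthPt(log);
        if (depth.isPresent()) {
            System.out.println(imgTag("eqn.svg", depth.getAsDouble()));
        } else {
            System.out.println("no depth marker found in log");
        }
    }
}
```

The negative `vertical-align` is what drops the image below the text baseline by the equation's depth; without it, the bottom edge of the image would sit on the baseline and descending parts of the formula would float too high.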