As an experiment, we've taken an export of production artifacts and written JavaScript that transforms it into federated wiki hypertext. We describe our pilot process.
This work started with a simple scrape of Amazon's Look Inside preview. See Cities Alive Inside.
We've encoded most of our logic in a state machine that steps through <p> tag class names in linear order. github
# Chronology
We settled on injecting JavaScript by adding a script tag to the end of the HTML document we had in our possession. The script ran with each refresh.
<script src="... resources/js/index.js"></script>
We switched on the p tag class name, adding cases for tags of interest and leaving the default case to capture the tag and text in a plain paragraph. We used section and chapter headings to begin new pages, which were exported on completion. github
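A minimal sketch of this kind of dispatch, not the code from the repository. It assumes the paragraphs have already been read into plain objects with `className` and `text` fields (in the browser these might come from `document.querySelectorAll('p')`). The class names `chapter`, `section` and `body`, the `Front Matter` fallback title, and the simplified page shape are all hypothetical.

```javascript
// Walk the paragraphs in order, starting a new page at each heading.
// Input shape, class names and page shape are assumptions for illustration.
function convert(paragraphs) {
  const pages = [];   // pages in document order, ready to export at the end
  let page = null;    // the page currently receiving items

  const startPage = (title) => {
    page = { title, story: [] };
    pages.push(page);
  };

  for (const { className, text } of paragraphs) {
    switch (className) {
      case 'chapter':
      case 'section':
        // Headings begin a new page.
        startPage(text);
        break;
      case 'body':
        if (!page) startPage('Front Matter');
        page.story.push({ type: 'paragraph', text });
        break;
      default:
        // Unhandled class: keep the class name and text visible in a plain
        // paragraph so the case can be spotted and handled later.
        if (!page) startPage('Front Matter');
        page.story.push({ type: 'paragraph', text: `${className}: ${text}` });
    }
  }
  return pages;
}
```

Leaving the default case visible in the output makes each refresh a quick survey of which class names still need a handler.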
We shortened page titles by recognizing patterns based on quotes and colons. We recorded these in our own table of contents. With this and specific conversions for block quotes, images and references, we could get a sense of what browsing as hypertext would be like. github
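One way such title shortening could look, as a sketch only. The two rules here are assumptions, since the text does not spell out the actual patterns: prefer a quoted phrase when the heading contains one, otherwise keep what precedes the first colon. The function names `shortTitle` and `tableOfContents` are hypothetical.

```javascript
// Shorten a heading to a page title using two assumed rules.
function shortTitle(heading) {
  // Rule 1: a phrase in straight or curly double quotes becomes the title.
  const quoted = heading.match(/["“]([^"”]+)["”]/);
  if (quoted) return quoted[1].trim();

  // Rule 2: otherwise keep only the text before the first colon.
  const colon = heading.indexOf(':');
  if (colon > 0) return heading.slice(0, colon).trim();

  // No pattern recognized: use the heading as it stands.
  return heading.trim();
}

// Record each shortened title next to its original heading.
function tableOfContents(headings) {
  return headings.map((heading) => ({ title: shortTitle(heading), heading }));
}
```

Keeping the original heading beside the short title makes it easy to review where a heuristic picked a poor name.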
We separated class names at the first hyphen to find a selector independent of formatting variations specific to the book versions. With this simplification and handlers for the remaining cases we had a rough but complete translation. github
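The split at the first hyphen can be sketched in a few lines. The function name `baseClass` and the sample class names in the comment are hypothetical.

```javascript
// Return the part of a class name before its first hyphen,
// e.g. 'quote-print' and 'quote-ebook' both yield 'quote'.
function baseClass(className) {
  const hyphen = className.indexOf('-');
  return hyphen === -1 ? className : className.slice(0, hyphen);
}
```

Switching on `baseClass(className)` rather than on the full class name lets one case cover every formatting variant of the same kind of paragraph.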
We reached back into previously processed items when a subsequent p tag added an attribution to a block quote or a caption to an image. github
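A sketch of reaching back, under the simplifying assumption that the item to amend is always the most recent one on the page. The class names `attribution` and `caption`, the item types `quote` and `image`, and the field names are hypothetical.

```javascript
// Add a paragraph to a page's story, amending the previous item when the
// paragraph is really an attribution or caption for it.
function attach(story, className, text) {
  const previous = story[story.length - 1];

  switch (className) {
    case 'attribution':
      if (previous && previous.type === 'quote') {
        previous.attribution = text;   // amend the quote in place
        return;
      }
      break;
    case 'caption':
      if (previous && previous.type === 'image') {
        previous.caption = text;       // amend the image in place
        return;
      }
      break;
  }

  // Nothing suitable to amend: keep the text as an ordinary paragraph.
  story.push({ type: 'paragraph', text });
}
```

Falling back to a plain paragraph means a stray caption or attribution is never lost, only left unattached.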
# Future
The work was complete enough to review with the author/publisher and to consider how it could become a useful part of this and all future publications.
We will meet to consider our next steps.
1. fine tune heuristics used to adjust text
2. improve wiki’s handling of photo albums
3. improve wiki’s handling of citations and attributions
4. editorial decisions regarding hypertext modularity
There is work in progress elsewhere addressing 2 and 3. Number 4 is a subject that should be considered in the context of other business and social objectives.
I am about to embark on a digital archeology expedition into my computer graveyard. Aged hardware holds projects from my youth which I should like to recover and carry forward into the future. The software in which they were authored long ago reached end-of-life and probably only runs on the old machines.
My wife encourages me to let go of the old machines, let go of the emotional attachment, make space in my life for new projects. There is wisdom here.
Quite timely to witness Ward's example here: using state machines to transform one data structure into wiki.
In a reply in the Matrix chat room, Ward pointed to his own story of reconstructing old projects: wiki