2021-01-08
Divided the old RailRoad project into a public gem named Punk! together with a
template repository named Let’s Punk!
Working to get these to a certain point of doneness before creating a new
project from the template repository which I will use to re-create the 2010
prototype of HackTile.
2021-01-04
First day of a new way of working.
Spent the weekend rearranging the home office, so I’ll be in a different mental
space. It feels good.
Kids are on holidays. Dropped Eliza in to an all-day drama course, then went
shopping with Jack (bought some new coffee… important) and finished off with a
bike ride.
Finishing the Big Sur upgrade. PostgreSQL had died. Played around with OBS for
recording screencasts. Works better than iShowU. Plan to create a HelloWorld app
that smashes RailRoad together with PixiJS and Howler. Need to separate RR out
into a gem first, so cracking on with that.
Someone logged a MegaHAL bug noting that it doesn’t work with Ruby 3, which was
released at Christmas, so fixing that too.
2020-09-05
Starting to muck about with text generation in anticipation of NaNoGenMo this
year. I trained a second-order Markov model on over 200 million words of data
(three thousand or so texts from Project Gutenberg). Written in Ruby, using my
native Sooth library, the entire process took 28 hours and resulted in a 674 MB
model file. Because Sooth uses a 32-bit context, I used a 16-bit dictionary of
words, which I generated by stripping punctuation and upper-casing words and
then selecting the most frequent 64822 words (I wrote a script to count word
frequencies and select words that occurred at least n times such that the
result would contain fewer than 65536 words; I think n ended up being 27 or
something like that).
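A minimal sketch of that thresholding step, assuming the word counts fit in a plain Hash. The names count_words, select_vocabulary and pack_context are hypothetical and not taken from the original script. Packing two 16-bit word ids into one 32-bit context is an assumption about how a second-order model could use the context, not a description of the Sooth API.

```ruby
# Hypothetical helper names; this is not the original script.
MAX_WORDS = 65_536 # a 16-bit word id can address at most 2**16 entries

# Count how often each normalised word occurs.
def count_words(words)
  words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# Find the smallest n such that keeping only words seen at least n times
# leaves fewer than `limit` words. Returns n and the sorted kept words.
def select_vocabulary(counts, limit: MAX_WORDS)
  n = 1
  loop do
    kept = counts.select { |_word, count| count >= n }.keys
    return [n, kept.sort] if kept.size < limit
    n += 1
  end
end

# Assumed layout: two 16-bit word ids packed into one 32-bit context.
def pack_context(first_id, second_id)
  (first_id << 16) | second_id
end

counts = count_words(%w[THE CAT SAT ON THE MAT THE CAT])
n, vocabulary = select_vocabulary(counts, limit: 3)
puts "n=#{n} vocabulary=#{vocabulary.inspect}"
```

With the toy limit of 3, the loop stops at the first n that leaves fewer than three words. On the real corpus the same loop would run with the default limit of 65536.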
I want to use the Markov model to generate sentences, but at the moment it does
a rather poor job. Here are some examples:
<SENTENCE> SOME GENERATION OF ARCHITECT OF GREATNE WOULD COME FOR THE TERM OF ADORATION <SENTENCE>
<SENTENCE> YES SIR HE MADE HER A MAGICIAN HE EXCLAIMED <SENTENCE>
<SENTENCE> AND THIS THOU BUT <BLANK> WAS A PRAYER OVER THAT ALIEN ELEMENT <SENTENCE>
I should also note that <SENTENCE> and <BLANK> are special words, as are
<ERROR>, <PARAGRAPH>, <CHAPTER> and <BOOK>. And that I strip S from the ends of
words, so that GREATNESS becomes GREATNE, in an effort to reduce the number of
unique words (as removing an S will often turn a word from plural to singular).
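A rough sketch of that normalisation, under two assumptions that the notes above do not confirm. First, that words outside the dictionary are replaced with <BLANK>. Second, that the S is only stripped from words of five or more letters, which is a guess based on THIS and IS keeping their S in the samples. The function names and the min_length parameter are hypothetical.

```ruby
require 'set'

# Upper-case a word, drop anything that is not a letter, and strip
# trailing S characters from longer words (min_length is a guess).
def normalise(word, min_length: 5)
  token = word.upcase.gsub(/[^A-Z]/, '')
  token = token.sub(/S+\z/, '') if token.length >= min_length
  token
end

# Encode one sentence, replacing out-of-dictionary words with <BLANK>
# (an assumption) and wrapping the result in <SENTENCE> markers.
def encode_sentence(text, vocabulary)
  words = text.split.map { |word| normalise(word) }.reject(&:empty?)
  tokens = words.map { |word| vocabulary.include?(word) ? word : '<BLANK>' }
  ['<SENTENCE>', *tokens, '<SENTENCE>'].join(' ')
end

vocabulary = Set.new(%w[THESE IDEA ARE OFTEN CLEVER])
encoded = encode_sentence('These ideas are often clever.', vocabulary)
puts encoded
puts normalise('greatness')
```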
As an example, here is the opening chapter of “The Emerald City of Oz” by L.
Frank Baum as presented to the inference algorithm, once parsed into the 16-bit
dictionary:
<BOOK>
<SENTENCE> PERHAP I SHOULD ADMIT ON THE TITLE PAGE THAT THIS BOOK IS BY L FRANK BAUM AND HIS CORRESPONDENT FOR I HAVE USED MANY SUGGESTION CONVEYED TO ME IN LETTER FROM CHILDREN <SENTENCE>
<SENTENCE> ONCE ON A TIME I REALLY IMAGINED MYSELF AN AUTHOR OF FAIRY TALE BUT NOW I AM MERELY AN EDITOR OR PRIVATE SECRETARY FOR A HOST OF YOUNGSTER WHOSE IDEA I AM <BLANK> TO WEAVE INTO THE THREAD OF MY STORIE <SENTENCE>
<PARAGRAPH>
<SENTENCE> THESE IDEA ARE OFTEN CLEVER <SENTENCE>
<SENTENCE> THEY ARE ALSO LOGICAL AND INTERESTING <SENTENCE>
<SENTENCE> SO I HAVE USED THEM WHENEVER I COULD FIND AN OPPORTUNITY AND IT IS BUT JUST THAT I ACKNOWLEDGE MY INDEBTEDNE TO MY LITTLE FRIEND <SENTENCE>
<PARAGRAPH>
<SENTENCE> MY WHAT IMAGINATION THESE CHILDREN HAVE DEVELOPED <SENTENCE>
<SENTENCE> SOMETIME I AM FAIRLY ASTOUNDED BY THEIR DARING AND GENIU <SENTENCE>
<SENTENCE> THERE WILL BE NO LACK OF FAIRY TALE AUTHOR IN THE FUTURE I AM SURE <SENTENCE>
<SENTENCE> MY READER HAVE TOLD ME WHAT TO DO WITH DOROTHY AND AUNT EM AND UNCLE HENRY AND I HAVE OBEYED THEIR MANDATE <SENTENCE>
<SENTENCE> THEY HAVE ALSO GIVEN ME A VARIETY OF SUBJECT TO WRITE ABOUT IN THE FUTURE ENOUGH IN FACT TO KEEP ME BUSY FOR SOME TIME <SENTENCE>
<SENTENCE> I AM VERY PROUD OF THIS ALLIANCE <SENTENCE>
<SENTENCE> CHILDREN LOVE THESE STORIE BECAUSE CHILDREN HAVE HELPED TO CREATE THEM <SENTENCE>
<SENTENCE> MY READER KNOW WHAT THEY WANT AND REALIZE THAT I TRY TO PLEASE THEM <SENTENCE>
<SENTENCE> THE RESULT IS VERY SATISFACTORY TO THE PUBLISHER TO ME AND I AM QUITE SURE TO THE CHILDREN <SENTENCE>
<PARAGRAPH>
<SENTENCE> I HOPE MY DEAR IT WILL BE A LONG TIME BEFORE WE ARE OBLIGED TO DISSOLVE PARTNERSHIP <SENTENCE>
<CHAPTER>
So, how to generate a novel novel from this mess? Here are my thoughts:
- Generate a prototype sentence, which consists of a certain number of empty slots for words, with the length of the sentence statistically consistent with what has been observed in the past.
- Populate the slots with some candidate keywords that have high mutual information according to the previous two sentences.
- For the remainder of the slots, determine a list of words that could fill those slots, as constrained by the other known words in the prototype sentence.
- Fill the empty slots with the candidate words, preferring to fixate on a relevant keyword, and provided the choice is legal according to the Markov model.
- Generate hundreds of candidate sentences, and select the best according to some heuristic.
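The steps above can be sketched on a toy corpus. This is a simplification under stated assumptions, not a worked-out design. The model is a plain Hash instead of Sooth, and keywords are pinned to random slots instead of being chosen by mutual information. Illegal candidates are simply rejected instead of constraining each slot by its neighbours, and the scoring lambda is a placeholder. All names are hypothetical.

```ruby
# Toy second-order model: [w1, w2] => { w3 => count }.
# Sentences are padded with two <SENTENCE> markers (an assumption).
def train(sentences)
  model = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  sentences.each do |words|
    padded = ['<SENTENCE>', '<SENTENCE>', *words, '<SENTENCE>']
    padded.each_cons(3) { |a, b, c| model[[a, b]][c] += 1 }
  end
  model
end

# Build one candidate: pin keywords to random slots, then fill the rest
# left to right. Returns nil when the Markov model forbids the result.
def fill_slots(model, length, keywords, rng)
  slots = Array.new(length)
  keywords.each do |keyword|
    free = slots.each_index.select { |i| slots[i].nil? }
    break if free.empty?
    slots[free.sample(random: rng)] = keyword
  end
  context = ['<SENTENCE>', '<SENTENCE>']
  slots.each_index do |i|
    legal = model.fetch(context, {}).keys - ['<SENTENCE>']
    if slots[i]
      return nil unless legal.include?(slots[i])
    else
      return nil if legal.empty?
      slots[i] = legal.sample(random: rng) # uniform choice, ignoring counts
    end
    context = [context[1], slots[i]]
  end
  # The model must also allow the sentence to end after the last slot.
  model.fetch(context, {}).key?('<SENTENCE>') ? slots : nil
end

# Generate many candidates and keep the one the heuristic scores highest.
def generate(model, lengths, keywords, score, rng, attempts: 500)
  candidates = Array.new(attempts) {
    fill_slots(model, lengths.sample(random: rng), keywords, rng)
  }.compact.uniq
  candidates.max_by { |words| score.call(words) }
end

corpus = [
  %w[THE CAT SAT ON THE MAT],
  %w[THE DOG SAT ON THE RUG],
  %w[THE CAT SAW THE DOG]
]
model = train(corpus)
lengths = corpus.map(&:size) # sentence lengths observed in the past
placeholder_score = ->(words) { words.uniq.size }
best = generate(model, lengths, ['CAT'], placeholder_score, Random.new(42))
puts best.join(' ')
```

Rejection sampling wastes most attempts even on this tiny corpus, which is one reason to constrain each slot by the known words around it as the list describes.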
The heuristic for selecting the best generation will be a function of two
factors: the average information of the generated sentence as measured by the
Markov model, and the average mutual information of the generated sentence, as
measured by a model that takes the previous few sentences into account, and
possibly also the fixation words.
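One possible reading of those two factors, as a sketch. Average information is taken to be the mean surprisal in bits per word under the Markov model. Average mutual information is taken to be the mean pointwise mutual information between words of the previous sentence and words of the candidate. Both definitions, the Hash-based model and all function names are assumptions. How the two numbers are combined is left open, since the notes above do not say.

```ruby
# Mean surprisal (bits per word) under a [w1, w2] => { w3 => count } model.
# Returns infinity if the sentence contains a transition never observed.
def average_information(model, words)
  padded = ['<SENTENCE>', '<SENTENCE>', *words, '<SENTENCE>']
  bits = padded.each_cons(3).map do |a, b, c|
    followers = model.fetch([a, b], {})
    count = followers.fetch(c, 0)
    return Float::INFINITY if count.zero?
    -Math.log2(count.to_f / followers.values.sum)
  end
  bits.sum / bits.size
end

# Count how often word y appears in the sentence following one containing x.
def cooccurrence_counts(sentences)
  pairs = Hash.new(0)
  sentences.each_cons(2) do |previous, current|
    previous.uniq.product(current.uniq).each { |pair| pairs[pair] += 1 }
  end
  pairs
end

# Pointwise mutual information in bits; unseen pairs score 0 (a shortcut).
def pmi(pairs, x, y)
  joint = pairs.fetch([x, y], 0)
  return 0.0 if joint.zero?
  total = pairs.values.sum.to_f
  p_x = pairs.sum { |(left, _), count| left == x ? count : 0 } / total
  p_y = pairs.sum { |(_, right), count| right == y ? count : 0 } / total
  Math.log2((joint / total) / (p_x * p_y))
end

# Mean PMI between the previous sentence's words and the candidate's words.
def average_mutual_information(pairs, previous_words, words)
  combos = previous_words.uniq.product(words.uniq)
  return 0.0 if combos.empty?
  combos.sum { |x, y| pmi(pairs, x, y) } / combos.size
end

model = {
  ['<SENTENCE>', '<SENTENCE>'] => { 'THE' => 2 },
  ['<SENTENCE>', 'THE'] => { 'CAT' => 1, 'DOG' => 1 },
  ['THE', 'CAT'] => { 'SAT' => 1 },
  ['THE', 'DOG'] => { 'SAT' => 1 },
  ['CAT', 'SAT'] => { '<SENTENCE>' => 1 },
  ['DOG', 'SAT'] => { '<SENTENCE>' => 1 }
}
pairs = cooccurrence_counts([%w[A B], %w[C D], %w[A B], %w[C D]])
puts average_information(model, %w[THE CAT SAT])
puts average_mutual_information(pairs, %w[A B], %w[C D])
```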
These fixation words should also be determined stochastically from data, by
observing words that tend to occur in clusters. I am trying here to identify
character names, locations, objects and so on that are pertinent to the story.
If the model generates a sentence containing the word SHERLOCK, for instance,
then the mere presence of this word in the story should make it much more likely
to occur in the future. This is something to be figured out.
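One candidate approach, offered only as a sketch, is a cache-style blend: mix the Markov model's follower probabilities with the frequencies of words already used in the story. This technique is swapped in for illustration and is not something the notes above settle on. The function name, the Hash inputs and the weight of 0.3 are all hypothetical.

```ruby
# Blend Markov follower counts with counts of words already in the story.
# The cache is restricted to legal followers so the result sums to 1.
def boosted_distribution(followers, story_counts, weight: 0.3)
  follower_total = followers.values.sum.to_f
  cache_total = followers.keys.sum { |word| story_counts.fetch(word, 0) }.to_f
  followers.to_h do |word, count|
    markov = count / follower_total
    # With nothing relevant in the cache, fall back to the Markov estimate.
    cache = cache_total.zero? ? markov : story_counts.fetch(word, 0) / cache_total
    [word, (1 - weight) * markov + weight * cache]
  end
end

followers = { 'HOLMES' => 1, 'WATSON' => 1, 'SHERLOCK' => 1 }
story_counts = { 'SHERLOCK' => 2 } # words generated so far in the story
distribution = boosted_distribution(followers, story_counts)
distribution.each { |word, probability| puts format('%-8s %.3f', word, probability) }
```

This only covers the boost once a word has appeared. Discovering which words deserve fixation, by observing clustering in the training data, is a separate problem.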
2020-06-03
Sunny today after a couple of weeks of wind and rain and destructive storms. So
out for a mid-morning ride. Conditions were good and I zoomed along. Had a fun
moment when I, on the bike path, ringing my bell and weaving between elderly
pedestrians, overtook a young couple, resplendent in their spandex (or is it
lycra?) and pedalling their expensive racing bikes along the road beside me.
They took one look at my K-Mart bike and I heard the boy whisper something to
the girl. Soon, on an uphill section, they zoomed past me, just before I veered
off into the back streets to take a short-cut through the nature reserve around
the river that I usually favour.
I started riding hard, much harder than usual, to beat them to the spot where
our routes would converge. And I prevailed, arriving a good half-minute before
them (the shortcut saves at least a few hundred metres of distance) where I
dismounted to take a swig of my drink bottle and leisurely finger my phone
before nodding as they rode past with stunned expressions which turned to
laughter in their wake.
2020-05-03
We’ve been in lockdown for over 50 days now, and yesterday was the first time in
seven weeks that I’ve really felt bored. I had an overdue task to work on for my
day job, which I was finding difficult to get into; it was a gorgeous day, but
we had nowhere to go and nobody to visit; and all other forms of entertainment
seemed trite. I guess I was suffering from ennui.
I feel better now; I’m making great progress on my overdue task, the weather is
worsening, and I’m looking forward to playing some games later today :P
2020-04-27
Western Australia has done a great job containing the spread of COVID-19, with
just ten new cases over the past ten days. As a result some restrictions are
being lifted, which raises the possibility of family brunch next weekend, and
dinner with my parents. Fingers crossed that numbers continue to look good; the
risk is the delay between restrictions being lifted and numbers rising as a
result (it could be up to two weeks for us to notice).
Watched “Music from the Home Front”, a great live concert on TV this Anzac Day.
We stood on the driveway at 6am on Saturday the 25th with the kids for dawn
service. At least half the street did the same. Kids sang “Song of Australia”.
2020-04-24
Seen on my morning bike ride. Idiots live amongst us.
