In the future, everyone will be world-famous for 15...people.
Garkov is a webcomic that generates its text with Markov chains, using old Garfield strips as its input corpus. Using the browser's Inspect Element tool reveals that the text is produced using Markov chains on individual letters.
What is a Markov chain?
A stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.
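As a rough illustration of that definition, here is a minimal Python sketch of a two-state chain. The states and probabilities are invented for the example; the point is only that `next_state` looks at the current state and nothing earlier.

```python
import random

# Hypothetical two-state weather model; the probabilities are made up.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(state, rng):
    """Pick the next state using only the current state (the Markov property)."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]

def simulate(start, steps, seed=None):
    """Return a list of steps + 1 states, beginning at `start`."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(steps):
        states.append(next_state(states[-1], rng))
    return states

print(simulate("sunny", 10, seed=1))
```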
To use Markov chains to generate text, you read the input corpus and, for each word, record which words follow it and how often. You then output text by making each word a random function of its immediate predecessor. The number of previous words considered is the “order”. Order-1 text is generated by producing the next word as a random function of the current word. Order-2 generation produces the next word as a random function of the previous two-word grouping, and so on. Text generated using higher orders will more closely resemble natural speech.
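The bookkeeping described above might look something like this in Python. It is only a sketch: `build_table` is a name I made up, and the toy corpus is invented so the counts are easy to check by eye.

```python
from collections import Counter, defaultdict

def build_table(words, order=1):
    """Map each `order`-word prefix (a tuple) to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        table[prefix][words[i + order]] += 1
    return table

# Toy corpus, made up for illustration.
words = "i hate mondays i love lasagna i hate diets".split()

order1 = build_table(words, order=1)
order2 = build_table(words, order=2)

print(order1[("i",)])         # counts of words seen after "i"
print(order2[("i", "hate")])  # counts of words seen after "i hate"
```

In this corpus "i" is followed by "hate" twice and "love" once, so an order-1 generator would pick "hate" about two times out of three. The order-2 table is keyed on pairs, so it has more distinct prefixes and fewer choices per prefix.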
Illustrated Markov Chain
Output quality depends on the size of your input corpus: the larger the training dataset the better. The order matters too: set it too high for the size of the corpus and most prefixes will have only one possible suffix, so your output will be exactly your input.
Markov chains can be used to generate text that bypasses spam filters, to model games of chance, and to perform certain statistical analyses; they even underlie Google’s PageRank algorithm:
The formula uses a model of a random surfer who gets bored after several clicks and switches to a random page. The PageRank value of a page reflects the chance that the random surfer will land on that page by clicking on a link. It can be understood as a Markov chain in which the states are pages, and the transitions are all equally probable and are the links between pages.
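To make the random-surfer idea concrete, here is a small Python sketch that repeatedly applies the surfer's transition until the scores settle. The four-page link graph is invented, the `pagerank` function is my own illustration rather than Google's implementation, and the damping value of 0.85 (the chance the surfer keeps clicking instead of jumping to a random page) is an assumption based on the commonly cited figure.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeatedly applying the random-surfer transition."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no links is treated as linking to every page.
            targets = outgoing or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: keys are pages, values are the pages they link to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page "c" comes out on top because three pages link to it, while "d" has no inbound links and only ever gets visited by the random jump.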
To write my own Markov chain text generator, these are the likely steps:
- Read input corpus.
- Break input corpus into individual words using some kind of regex.
- Allow a choice of order, say one to four, and build a hash(?) that maps each n-word prefix to the word(s) that follow it (its suffixes).
- Select and output a prefix followed by its suffix.
- In case a prefix has multiple possible suffixes, select one from that set randomly or weighted by statistical frequency.
- Create new prefix-suffix pairs until output of predetermined length is completed.
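
The steps above could be sketched in Python roughly as follows. This is a first pass under some assumptions, not a finished program: the function names are made up, the "regex" just splits on whitespace so punctuation stays attached to words, and the corpus is a stand-in string rather than a file.

```python
import random
import re
from collections import defaultdict

def tokenize(text):
    """Split text into whitespace-separated words, punctuation attached."""
    return re.findall(r"\S+", text)

def build_chain(words, order):
    """Map each `order`-word prefix to a list of every suffix seen after it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, order, length, seed=None):
    """Emit up to `length` words, starting from a randomly chosen prefix."""
    if not chain:
        raise ValueError("corpus is too short for this order")
    rng = random.Random(seed)
    prefix = rng.choice(sorted(chain))
    output = list(prefix)
    while len(output) < length:
        suffixes = chain.get(tuple(output[-order:]))
        if not suffixes:
            break  # dead end: this prefix only appeared at the end of the corpus
        # Repeated suffixes stay in the list, so this choice is frequency-weighted.
        output.append(rng.choice(suffixes))
    return " ".join(output[:length])

# Stand-in corpus, made up for illustration.
corpus = "I hate Mondays. I love lasagna. I hate diets. I love naps."
words = tokenize(corpus)
chain = build_chain(words, order=1)
print(generate(chain, order=1, length=12, seed=3))
```

Storing every suffix occurrence in a plain list is the lazy way to get selection weighted by statistical frequency; a dictionary of counts would use less memory on a large corpus.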
Idea: input corpus is some kind of art theory book, output is your artist statement.