"A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects." (Robert A. Heinlein)

Thursday, 28 July 2011

Random blog post title generator with Polygen

After so many “Test drive:” blog posts, I sometimes feel that my writing has become too repetitive. So I decided to take a little break and, with a bit of self-irony, try writing a random blog post generator using Polygen. Writing a full post generator is a huge task: as soon as you start decomposing a sentence's structure, the number of options literally explodes. So let's start by generating just the post title.

A top down approach

The top-down approach is the one I find most natural when writing a Polygen grammar, so let's start by defining the basic structure of a blog post:
S ::= Title "\n"^
Body "\n"^
Tags;
The “\n” string is the newline escape sequence, while the “^” operator tells Polygen to concatenate without inserting a space (so that a new line does not start with a space).
We'll set the “Body” symbol to an empty production (the “_” symbol) and forget about it, at least for the moment:
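To make the effect of “^” concrete, here is a toy Python model, an assumption for illustration rather than how Polygen is implemented: generated words are joined with single spaces, and a hypothetical GLUE marker, standing in for “^”, suppresses the space before the next word.

```python
# Toy model of Polygen's "^" operator: words are joined with a single space
# unless a GLUE marker sits between them. GLUE and render() are hypothetical
# names used only for this illustration.
GLUE = object()


def render(tokens):
    """Join tokens with spaces, skipping the space right after a GLUE marker."""
    out = []
    glue_next = False
    for tok in tokens:
        if tok is GLUE:
            glue_next = True
            continue
        if out and not glue_next:
            out.append(" ")
        out.append(tok)
        glue_next = False
    return "".join(out)


# Title "\n" ^ Tags: the second line starts right after the newline.
print(repr(render(["Title", "\n", GLUE, "Tags"])))  # 'Title \nTags'
# Without "^" the second line would start with a space.
print(repr(render(["Title", "\n", "Tags"])))        # 'Title \n Tags'
```

In this model the first line keeps a trailing space; only the leading space of the following line is removed.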
Body ::= _;
Titles for “review style” posts usually look like “installed some-program on some-system”; in the Polygen grammar language this can be written as follows:
Title ::= Action (Program ("on the" Computer | "on" Os | "on the" Computer "and" Os) | Os "on the" Computer);



The “Action” symbol can be defined like this (more actions can, of course, be added):
Action ::= "Installing" | "Test drive:" | "Upgrading" | "Quick Test:";
In the same way, the “Computer” symbol chooses among my own computers:
Computer := "EEEPC 900" | "Sempron 2400" | "PIII 550";
Since we want to reuse the generated “Computer” value later (in the tags section and maybe in a future body), we defined it with the “:=” operator. This operator tells Polygen to generate the production only once and to reuse that same result every time the symbol appears. We could, of course, continue the easy way by writing plain lists of programs and operating systems, but it is more interesting, and more fun, to write a more complex generator for program and OS names. Let's build program names from a prefix and a suffix (like Firefox), followed by a version number with an optional release number and an optional beta status. In the grammar language this translates to:
Program := Prefix ^ Suffix ["V." ^] Version [^ "." ^ Release ] [Beta];
Prefix ::= "Fire" | "Jar" | "Thunder" | "Hot" | "Spicy" | "Sea" | "K" | "G" | "Gnu";
Suffix ::= "bird" | "fox" | "dog" | "monkey" | "write" | "calc" | "show";
Expressions enclosed in square brackets are optional (each has a 50% chance of appearing). The “Version” and “Release” symbols are numbers, while the “Beta” symbol is a little more complex:
Version ::= Number;
Release ::= Number;
Beta ::= "Alpha" [^ SmallNumber] | "Beta" [^ SmallNumber] | "M" ^ SmallNumber | "Milestone" SmallNumber;
Since beta release numbers usually have a smaller range than regular version or release numbers, I also defined a “SmallNumber” symbol:
Number ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12;
SmallNumber ::= 1 | 2 | 3;
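The Program, Beta, Number and SmallNumber rules can be mirrored in a short Python sketch. It assumes, as stated above, that each bracketed part appears with 50% probability, and that alternatives are picked uniformly; the helpers maybe(), number(), small_number(), beta() and program() are hypothetical names. The “:=” binding of Program is not modeled here, so each call draws a fresh name.

```python
import random

# Word lists copied from the Prefix and Suffix rules.
PREFIXES = ["Fire", "Jar", "Thunder", "Hot", "Spicy", "Sea", "K", "G", "Gnu"]
SUFFIXES = ["bird", "fox", "dog", "monkey", "write", "calc", "show"]


def maybe(rng, text):
    """Toy stand-in for Polygen's [ ... ]: keep text with 50% probability."""
    return text if rng.random() < 0.5 else ""


def number(rng):
    # Number ::= 1 | 2 | ... | 12;  (also used for Version and Release)
    return str(rng.randint(1, 12))


def small_number(rng):
    # SmallNumber ::= 1 | 2 | 3;
    return str(rng.randint(1, 3))


def beta(rng):
    # Beta ::= "Alpha" [^ SmallNumber] | "Beta" [^ SmallNumber]
    #        | "M" ^ SmallNumber | "Milestone" SmallNumber;
    choice = rng.randrange(4)
    if choice == 0:
        return "Alpha" + maybe(rng, small_number(rng))
    if choice == 1:
        return "Beta" + maybe(rng, small_number(rng))
    if choice == 2:
        return "M" + small_number(rng)
    return "Milestone " + small_number(rng)


def program(rng):
    # Program := Prefix ^ Suffix ["V." ^] Version [^ "." ^ Release] [Beta];
    name = rng.choice(PREFIXES) + rng.choice(SUFFIXES)
    version = maybe(rng, "V.") + number(rng)
    version += maybe(rng, "." + number(rng))
    words = [name, version, maybe(rng, beta(rng))]
    return " ".join(word for word in words if word)


rng = random.Random(2011)
for _ in range(3):
    print(program(rng))
```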
Continuing in a similar fashion, we can define an operating system name made of a name followed by either a version code name (in the familiar form of an adjective plus an animal) or a version number, optionally followed by the word “Linux” (which itself has an optional “GNU/” prefix):
Os := OsName (OsVersion | OsNumber) [["GNU/" ^] "Linux"];
OsName ::= ("U" | "Ku" | "Xu" | "Edu" | "Any" | "Some") ^ "buntu" | "Debian" | "Puppy";
OsVersion ::= Adjective Animal;
Adjective ::= "Alien" | "Breezy" | "Dodgy" | "Easy" | "Filthy" | "Gritty" | "Healthy";
Animal ::= "Albatross"| "Bear" | "Cat" | "Dog" ;
OsNumber ::= Number [^ "." ^ Number];
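The OS rules follow the same pattern. Here is a Python sketch under the same assumptions (uniform alternatives, 50% optional parts; all function names are hypothetical), where the nested optional [["GNU/" ^] "Linux"] becomes one maybe() call inside another:

```python
import random

OS_PREFIXES = ["U", "Ku", "Xu", "Edu", "Any", "Some"]
ADJECTIVES = ["Alien", "Breezy", "Dodgy", "Easy", "Filthy", "Gritty", "Healthy"]
ANIMALS = ["Albatross", "Bear", "Cat", "Dog"]


def maybe(rng, text):
    """Toy stand-in for Polygen's [ ... ]: keep text with 50% probability."""
    return text if rng.random() < 0.5 else ""


def number(rng):
    # Number ::= 1 | 2 | ... | 12;
    return str(rng.randint(1, 12))


def os_name(rng):
    # OsName ::= ("U" | "Ku" | ... | "Some") ^ "buntu" | "Debian" | "Puppy";
    choice = rng.randrange(3)
    if choice == 0:
        return rng.choice(OS_PREFIXES) + "buntu"
    return "Debian" if choice == 1 else "Puppy"


def os_version(rng):
    # OsVersion ::= Adjective Animal;
    return rng.choice(ADJECTIVES) + " " + rng.choice(ANIMALS)


def os_number(rng):
    # OsNumber ::= Number [^ "." ^ Number];  each Number is drawn afresh ("::=")
    return number(rng) + maybe(rng, "." + number(rng))


def operating_system(rng):
    # Os := OsName (OsVersion | OsNumber) [["GNU/" ^] "Linux"];
    name = os_name(rng)
    release = os_version(rng) if rng.random() < 0.5 else os_number(rng)
    linux = maybe(rng, maybe(rng, "GNU/") + "Linux")
    return " ".join(part for part in (name, release, linux) if part)


rng = random.Random(7)
for _ in range(3):
    print(operating_system(rng))
```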
Non-terminal symbols can, of course, be reused, like “Number” here: since it has been defined with the “::=” operator, a new value is generated every time the “Number” symbol is used. One last thing: when combining space-less concatenation (the “^” operator) with optional productions (inside square brackets), the concatenation operator should usually be placed inside the square brackets. That way the space-less concatenation is applied only when the optional production is actually selected.
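Where “^” sits can be checked with a small toy model (render(), expand() and GLUE are hypothetical helpers, defined here so the snippet runs on its own). It assumes, as the paragraph above implies, that a “^” left outside the brackets still glues to whatever word comes next when the optional part is dropped.

```python
GLUE = object()  # stands in for Polygen's "^" (illustrative only)


def render(tokens):
    """Join tokens with spaces, except right after a GLUE marker."""
    out, glue_next = [], False
    for tok in tokens:
        if tok is GLUE:
            glue_next = True
            continue
        if out and not glue_next:
            out.append(" ")
        out.append(tok)
        glue_next = False
    return "".join(out)


def expand(sequence, keep):
    """Flatten sequence; each nested list is an optional [ ... ] group,
    kept only if the matching flag in keep (consumed in order) is True."""
    flags = iter(keep)
    out = []
    for item in sequence:
        if isinstance(item, list):
            if next(flags):
                out.extend(item)
        else:
            out.append(item)
    return out


# Version [^ "." ^ Release] [Beta]   <- "^" inside the brackets (recommended)
inside = ["3", [GLUE, ".", GLUE, "1"], ["Beta"]]
# Version ^ ["." ^ Release] [Beta]   <- "^" left outside the brackets
outside = ["3", GLUE, [".", GLUE, "1"], ["Beta"]]

# Release dropped, Beta kept:
print(render(expand(inside, (False, True))))   # 3 Beta
print(render(expand(outside, (False, True))))  # 3Beta
```

When the release is selected both placements give the same result; they differ only when the optional part is dropped.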
Here is the full grammar definition:

 S ::= Title "\n"^
      Body  "\n"^
      Tags;

Title ::= Action (Program ("on the" Computer | "on" Os | "on the" Computer "and" Os) | Os "on the" Computer);

Action ::= "Installing" | "Test drive:" | "Upgrading" | "Quick Test:";

Program := Prefix ^ Suffix ["V." ^] Version [^ "." ^ Release ] [Beta];
Prefix ::= "Fire" | "Jar" | "Thunder" | "Hot" | "Spicy" | "Sea" | "K" | "G" | "Gnu";
Suffix ::= "bird" | "fox" | "dog" | "monkey" | "write" | "calc" | "show";

Version ::= Number;
Release ::= Number;
Beta ::= "Alpha" [^ SmallNumber] | "Beta" [^ SmallNumber] | "M" ^ SmallNumber | "Milestone" SmallNumber;

Number ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12;
SmallNumber ::= 1 | 2 | 3;

Computer := "EEEPC 900" | "Sempron 2400" | "PIII 550";

Os := OsName (OsVersion | OsNumber) [["GNU/" ^] "Linux"];
OsName ::= ("U" | "Ku" | "Xu" | "Edu" | "Any" | "Some") ^ "buntu" | "Debian" | "Puppy";
OsVersion ::= Adjective Animal;
Adjective ::= "Alien" | "Breezy" | "Dodgy" | "Easy" | "Filthy" | "Gritty" | "Healthy";
Animal ::= "Albatross"| "Bear" | "Cat" | "Dog" ;
OsNumber ::= Number [^ "." ^ Number];

Tags ::= ["Linux," ] (Os ^ "," | OsName ^ ",") Computer ^ "," Program;

Body ::= _;

I ::= "musante.i.ph post title generator by MM";
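The reason Computer, Os and Program are defined with “:=” shows up when the whole post is assembled: the same value has to appear both in the title and in the tags. The Python sketch below models this with a per-post cache. To keep it short, it replaces the Program and Os sub-grammars with a few hypothetical fixed samples and leaves out the OsName alternative of Tags.

```python
import random

ACTIONS = ["Installing", "Test drive:", "Upgrading", "Quick Test:"]
COMPUTERS = ["EEEPC 900", "Sempron 2400", "PIII 550"]
# Hypothetical fixed samples standing in for the full Program and Os rules.
PROGRAMS = ["Firefox 3", "Thunderwrite 2.1", "Gnucalc V.7 Beta2"]
OSES = ["Kubuntu Healthy Bear", "Debian 6 GNU/Linux", "Puppy 5.2"]


def make_post(rng):
    """Toy model of the S rule: title, empty body and tags on separate lines."""
    bound = {}  # values of ":=" symbols, generated once per post

    def assigned(name, samples):
        # ":=" stand-in: the first use picks a value, later uses reuse it.
        if name not in bound:
            bound[name] = rng.choice(samples)
        return bound[name]

    def computer():
        return assigned("Computer", COMPUTERS)

    def program():
        return assigned("Program", PROGRAMS)

    def os_():
        return assigned("Os", OSES)

    # Title ::= Action (Program (...) | Os "on the" Computer);
    action = rng.choice(ACTIONS)
    if rng.random() < 0.5:
        target = rng.choice([
            lambda: "on the " + computer(),
            lambda: "on " + os_(),
            lambda: "on the " + computer() + " and " + os_(),
        ])()
        title = f"{action} {program()} {target}"
    else:
        title = f"{action} {os_()} on the {computer()}"

    # Tags ::= ["Linux,"] (Os ^ "," | ...) Computer ^ "," Program;
    # (the OsName alternative is left out of this sketch)
    linux = "Linux, " if rng.random() < 0.5 else ""
    tags = f"{linux}{os_()}, {computer()}, {program()}"
    body = ""  # Body ::= _;
    return "\n".join([title, body, tags])


rng = random.Random(2011)
print(make_post(rng))
```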

Conclusions

Will my next post be “Installing Thunderwrite 2.1 on the EEEPC 900 and Kubuntu Healthy Bear”? I hope not! Polygen is just a toy, an uncommon way to have some fun with a computer (without needing the latest graphics card). I don't know if I'll ever complete this grammar: writing a full post generator is a big task, and even the title isn't complete yet (the “adjective + animal” pair, for example, should produce names with matching initials, like “Alien Albatross”). I just hope you enjoyed the break in the “Test drive:” chain!
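As for the missing alliteration, one possible approach, sketched here in Python as an assumption rather than anything Polygen provides, is to pair each adjective only with animals sharing its initial. With the current word lists only three pairs qualify, so the lists would need to grow for more variety. Inside the grammar, the same effect could be obtained by listing the matching pairs as explicit OsVersion alternatives.

```python
import random

ADJECTIVES = ["Alien", "Breezy", "Dodgy", "Easy", "Filthy", "Gritty", "Healthy"]
ANIMALS = ["Albatross", "Bear", "Cat", "Dog"]

# Only adjective/animal pairs that share the same initial letter.
PAIRS = [(adj, ani) for adj in ADJECTIVES for ani in ANIMALS if adj[0] == ani[0]]


def alliterative_codename(rng):
    """Return an 'Adjective Animal' code name with matching initials."""
    adjective, animal = rng.choice(PAIRS)
    return f"{adjective} {animal}"


rng = random.Random(28)
print(sorted(" ".join(p) for p in PAIRS))  # ['Alien Albatross', 'Breezy Bear', 'Dodgy Dog']
print(alliterative_codename(rng))
```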