"A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects." (Robert A. Heinlein)

Friday, 31 October 2008

Populating a table with random people


It happened I had  to  fill a table with realistic but not real people data.Here how did I managed it (ported to MySQL sintax).
First lets create a people table like this :
CREATE TABLE PEOPLE(NAME VARCHAR(20),
                                     SURNAME VARCHAR(20),
                                     SEX VARCHAR(1),
                                     DOB DATE);
Then create tables with sample names and surnames:
CREATE TABLE NAMES(NAME VARCHAR(20), SEX VARCHAR(1));
CREATE TABLE SURNAMES(SURNAME VARCHAR(20));
And fill them with some values:
INSERT INTO NAMES(NAME,SEX) VALUES('Jhon','M');
INSERT INTO NAMES(NAME,SEX) VALUES('Jack','M');
INSERT INTO NAMES(NAME,SEX) VALUES('Jeff','M');
INSERT INTO NAMES(NAME,SEX) VALUES('Judy','F');
INSERT INTO NAMES(NAME,SEX) VALUES('Jade','F');
INSERT INTO NAMES(NAME,SEX) VALUES('Jane','F');
INSERT INTO SURNAMES(SURNAME) VALUES('Red');
INSERT INTO SURNAMES(SURNAME) VALUES('White');
INSERT INTO SURNAMES(SURNAME) VALUES('Green');
Now we'll be able to get a single random person with a query like this:
SELECT NAME, SEX,
           CURDATE() - INTERVAL RAND()*365.25*100 DAY AS DOB,
                 (SELECT SURNAME FROM SURNAMES ORDER BY RAND() LIMIT 1) AS SURNAME
FROM NAMES ORDER BY RAND() LIMIT 1;
We can write a simple procedure to populate the table with the desired number of rows:
CREATE PROCEDURE FILL(IN NUM INT)
   BEGIN
       WHILE NUM > 0 DO
          INSERT INTO PEOPLE(NAME,SEX,DOB,SURNAME)
          SELECT NAME, SEX,
                            CURDATE() - INTERVAL RAND()*365.25*100 DAY AS DOB,
                         (SELECT SURNAME FROM SURNAMES ORDER BY RAND() LIMIT 1) AS SURNAME
         FROM NAMES ORDER BY RAND() LIMIT 1;
         SET NUM = NUM-1;
      END WHILE;
END;
We can call this procedure like this:
mysql> call FILL(10000);
Query OK, 1 row affected (40.73 sec)
A ten thousand people population in forty seconds (on PIII at 550Mz) is not bad.
Of course many improvements can be done like adding a random place of birth (based on a table of places) or making date of birth distribution more realistic.

No comments :

Post a Comment