Thursday, August 18, 2011

Converting the data from Postgres to MongoDB

I couldn't find the time to get down to the No:sql(eu) Conference in London this week, but I did want to learn more about NoSQL databases, so I decided the best way to learn would be to move one of my existing websites from a traditional SQL database, to a nosql one.

I picked MongoDB almost at random, and my longwords.org website seemed to be the best option to switch, since I wrote it a couple of years ago and haven't looked at it since, so it would be good to get to know it again. The site gets about 2500 unique visitors a month, so the traffic isn't insignificant.

I split the migration process into 3 phases:

  1. Converting the data from Postgres to MongoDB
  2. Converting SQL queries into MongoDB Javascript
  3. Implementing MongoDB Javascript in MongoDB PHP statements

Converting the data from Postgres to MongoDB

Data conversion turned out to be the easiest part of the process.

Exporting data in Postgres is very easy, and longwords is based around 1 single table, so this command ran in psql dumped the data out into a CSV file:

\copy words to '/tmp/outputfile.csv' delimiters ',' with null as '0'

The next step was to import that data, which again took just 1 command

/usr/local/mongodb/bin/mongoimport -d wordsdb -c words -f word,number,votes,score —file /tmp/longwordslist.txt —type csv

Easy!

Creating the same indexes as had been in Postgres was simple too, from the mongo console I ran these commands in the new wordsdb database.

db.words.ensureIndex({score:-1});

db.words.ensureIndex({number:1}, {unique: true});

Converting SQL queries into MongoDB Javascript

This part took the longest, simply because I didn't know the MongoDB syntax beforehand. The longwords site used 3 main select statements, one which pulled out the next word to display, one to return the top 10 list of most popular words, and one to return the count of total votes.

MongoDB query to return single word:

db.words.find({number:1000});

MongoDB query to return top 10 words:

db.words.find().sort( { score : -1 }).limit(10);

MongoDB query to return sum of votes:

db.words.group( { reduce: function(obj,prev) { prev.votes += obj.votes; }, initial: { votes: 0 } } );

Notice the last query makes use of the group function in MongoDB, which is a simplified interface to the MapReduce functionality, and can be used to produce the same result as the "sum(value)" function in SQL.

There were also 2 update statements for when people vote yes or no to a word. These queries needed to increment the number of votes that word has received, and to increment or decrement the score of that word, depending on if the person clicked yes or no.

MongoDB query to increase score and increase votes values:

db.words.update( { word:"ascosporous" }, { $inc: { score : 1, votes : 1 } } );

MongoDB query to decrease score and increase votes values:

db.words.update( { word:"ascosporous" }, { $inc: { score : -1, votes : 1 } } );

With these statements in place, I was ready to implement them in the MongoDB PHP module.

Implementing MongoDB Javascript in MongoDB PHP statements

This took a little bit of time, but really the format changes are pretty obvious once you get used to it.

MongoDB PHP code to return single word:

$totalwords=$words->count();

$randomlength=rand(1,$totalwords);

$result=$words->find(array('number' => $randomlength));

MongoDB PHP code to return top 10 words:

$toprated = $words->find()->sort(array("score" => -1))->limit(10);

$count=0;

while ($count<10)

{

$row = $toprated->getNext();

$rowword = ucfirst($row[word]);

echo $rowword;

$count++;

}

MongoDB PHP code to return sum of votes:

$keys = array();

$reduce = "function(obj,prev) { prev.votes += obj.votes; }";

$initial = array("votes" => 0);

$g = $words->group($keys,$initial,$reduce);

$votecount = $g[retval][0][votes];

MongoDB PHP code to increase score and increase votes values:

$words->update(array("word" => $longword), array('$inc' => array("score" => 1,"votes" => 1)));

$words->update(array("word" => $longword), array('$inc' => array("score" => -1,"votes" => 1)));

Results

There was really only 1 issue with the conversion, and it's one that I still haven't overcome – the query to return the sum of votes causes significant CPU usage, unlike the original SQL statement which was a simple "select sum(votes) from words" query.

Until I come up with a solution, I've disabled that small section of the longwords page, but hopefully I'll find a suitable replacement statement. If you've got any suggestions, I'd love to hear them!

Other than that query, CPU and memory usage is minimal, as is disk I/O – there's certainly nothing which would make me think that MongoDB isn't a practical replacement for MySQL or Postgres for many websites.



No comments:

Post a Comment

Please feel free to contact or comment the article

Search This Blog