Tuesday, October 16, 2012

PHP vs Node.js vs Native Speed Comparison

A quick speed comparison between PHP, Node.js and native code. This is obviously an artificial benchmark. The code generates four random numbers between 0 and 100 and uses them as coordinates of two points on a plane, and then calculates the distance between them. This is repeated 10 million times. Only the loop is timed, so process startup time is not part of the timing. The results are:

PHP - 12.692 seconds
Node.js - 0.328 seconds
Native - 0.308 seconds

All tests were run on a MacBook Pro. PHP is almost 40 times slower than Node.js. Node.js is almost as fast as native code, which makes sense since it does just-in-time compilation.

PHP code:
<?php

function distance($x1, $y1, $x2, $y2) {
  $dx = $x2 - $x1;
  $dy = $y2 - $y1;
  return sqrt($dx * $dx + $dy * $dy);
}

$start = microtime(TRUE);
for($i = 0; $i < 10000000; $i++) {
  $x1 = rand(0, 100);
  $x2 = rand(0, 100);
  $y1 = rand(0, 100);
  $y2 = rand(0, 100);
  distance($x1, $y1, $x2, $y2);
}
$stop = microtime(TRUE);

echo "Elapsed time: " . ($stop - $start) . " s\n";

Node.js code:
function distance(x1, y1, x2, y2) {
  var dx = x2 - x1;
  var dy = y2 - y1;
  return Math.sqrt(dx * dx + dy * dy);
}

var start = Date.now()/1000;
for(var i = 0; i < 10000000; i++) {
  x1 = Math.random() * 100;
  x2 = Math.random() * 100;
  y1 = Math.random() * 100;
  y2 = Math.random() * 100;
  distance(x1, y1, x2, y2);
}
var stop = Date.now()/1000;

console.log("Elapsed time: " + (stop - start) + " s\n");

C code, compiled with gcc -Wall:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <math.h>

double distance(double x1, double y1, double x2, double y2) {
  double dx, dy;
  dx = x2 - x1;
  dy = y2 - y1;
  return sqrt(dx * dx + dy * dy);
}

int main(void) {
  int i;
  double x1, x2, y1, y2;
  struct timeval start, stop;

  srand(time(NULL));
  gettimeofday(&start, NULL);
  for(i = 0; i < 10000000; i++) {
    x1 = rand() * 100;
    x2 = rand() * 100;
    y1 = rand() * 100;
    y2 = rand() * 100;
    distance(x1, y1, x2, y2);
  }
  gettimeofday(&stop, NULL);
  printf("Elapsed time: %.3f s\n", (stop.tv_sec-start.tv_sec) + (stop.tv_usec-start.tv_usec) / 1000000.0);
  return 0;
}

Sunday, October 7, 2012

MongoDB 2.2 Aggregation And MapReduce Performance

TL;DR - 1 million documents with readings across 20 sensors (1 reading per document), find the number of readings and their average per sensor. Aggregation framework does it in 4.1 seconds, map/reduce in 67.2 seconds, map/reduce with jsMode=true in 45.5 seconds, and reading all documents into PHP and doing it there takes 3.3 seconds. Conclusion - MongoDB's map/reduce has horrible performance for some reason.

Load the initial data with
time mongo load_initial.js
which on my laptop took 1m32.271s.

// load_initial.js
//
// Create a collection of documents where each document is
// { "sensor_id": 4, "ts": 123456.78, "reading": 2.3 }

for(var i=0; i < 1000000; i++) {
  sensor_id = Math.floor(Math.random() * 20 + 1); // Random int 1-20.
  ts = Math.random() * 3000000; // Random float from 0-2999999.9999
  reading = Math.random() * 100; // Random float 0.0-99.9999
  obj = {"sensor_id": sensor_id, "ts": ts, "reading": reading};
  db.sensors.save(obj);
}
Run
mongo --eval 'db.sensors.count()'
as a sanity check, which on my laptop printed
MongoDB shell version: 2.2.0
connecting to: localhost:27017/test
1000000
For each sensor, let's count the number of readings and find the average reading. First let's do it using the aggregation framework:
// count_avg_by_sensor_aggregate.js
//
// For each sensor, find the number of readings, and their average value.

res = db.sensors.aggregate(
  {
    $group: {
      _id: "$sensor_id",
      "readings": {$sum: 1},
      "average": {$avg: "$reading"}
    }
  }
).result.sort(
  function(a, b) {
    return a._id - b._id;
  }
);

printjson(res);
Run it with
time mongo count_avg_by_sensor_aggregate.js
I omitted the output, but this took 0m4.111s. Now let's do this with a map/reduce:
// count_avg_by_sensor_mr.js
//
// For each sensor, find the number of readings, and their average value.

if(typeof jsMode === "undefined") {
  jsMode = false;
}

res = db.sensors.mapReduce(
  function() {
    emit(this.sensor_id, { "readings": 1, "total": this.reading });
  },
  function(key, values) {
    var result = { "readings": 0, "total": 0 };
    for(var i = 0; i < values.length; i++) {
      result.readings += values[i].readings;
      result.total += values[i].total;
    }
    return result;
  },
  {
    "out": { "inline": 1 },
    "finalize": function(key, value) {
      return {
        "readings": value.readings,
        "average": value.total / value.readings
      };
    },
    "jsMode": jsMode
  }
).results;

newres = [];
for(var i = 0; i < res.length; i++) {
  newres.push(
    {
      // When jsMode is true, _id becomes a string for some reason.
      "_id": parseInt(res[i]._id),
      "readings": res[i].value.readings,
      "average": res[i].value.average
    }
  );
}

newres = newres.sort(
  function(a, b) {
    return a._id - b._id;
  }
);

printjson(newres);
Run it with
time mongo count_avg_by_sensor_mr.js
This took 1m7.171s . Now do the same thing, but use jsMode=true:
time mongo --eval jsMode=true count_avg_by_sensor_mr.js
This took 0m45.459s . And finally, let's do the same calculation in PHP:
<?php

// count_avg_by_sensor.php
//
// For each sensor, find the number of readings, and their average value.

$mongo = new Mongo('mongodb://localhost:27017', array(
  'db' => 'test'
));
$db = $mongo->selectDB('test');
$sensors = $db->selectCollection('sensors');

$cursor = $sensors->find();
$result = array();
foreach($cursor as $doc) {
  $_id = (int)$doc['sensor_id'];
  if(!array_key_exists($_id, $result)) {
    $result[$_id] = array("readings" => 0, "total" => 0);
  }
  $result[$_id]["readings"] += 1;
  $result[$_id]["total"] += $doc['reading'];
}
ksort($result);

$newresult = array();
foreach($result as $_id => $value) {
  $newresult[] = array(
    "_id" => $_id,
    "readings" => $value["readings"],
    "average" => $value["total"] / $value["readings"]
  );
}

echo json_encode($newresult, JSON_PRETTY_PRINT);
Run it:
time php count_avg_by_sensor.php
This took 0m3.313s .