A reliable way to serialize/unserialize objects in PHP

An experienced PHP developer might wonder why this topic deserves a blog post when PHP already has universal and almost transparent tools for the job, namely the built-in serialize and unserialize functions.

The key phrase here is “almost transparent”: you have to include all class definitions before invoking unserialize, or use some __autoload scheme. Otherwise you are going to face the dreaded “The script tried to execute a method or access a property of an incomplete object” fatal error.

The whole problem stems from the fact that a serialized object knows nothing about its class definition except the class name (and the reason behind that is perfectly valid).

To illustrate the problem, have a look at the following trivial code, which echoes the serialized form of a Foo object.

<?php
class Foo {
  function say(){
    echo 'Foo';
  }
}

$foo = new Foo();
echo serialize($foo);

If you execute the snippet above it should print the following:

O:3:"Foo":0:{}
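
To make it clearer what this format does and does not carry, here is a small sketch with a hypothetical class Bar (my own illustration, not part of the example above) that has one property and one method. Only the class name and the property values end up in the string; nothing about the method is stored.

```php
<?php
// Bar is a hypothetical class used only for this illustration.
class Bar {
  public $n = 5;

  function hi() {
    echo 'hi';
  }
}

$serialized = serialize(new Bar());
echo $serialized, "\n";
// prints: O:3:"Bar":1:{s:1:"n";i:5;}
```

The header reads as: O (object), 3 (length of the class name), "Bar" (the class name), 1 (number of properties), followed by the name/value pairs in braces.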

Now let’s try unserializing this string in order to experience the “incomplete object” error:

<?php
$str = 'O:3:"Foo":0:{}';
$foo = unserialize($str);
$foo->say();

Run this code and you should see the following:

“Fatal error: main(): The script tried to execute a method or access a property of an incomplete object. Please ensure that the class definition "Foo" of the object you are trying to operate on was loaded _before_ unserialize() gets called or provide a __autoload() function to load the class definition in….”

Tada ;)
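
If you want to detect this situation before it blows up, one option is to check what unserialize actually handed back. This is a minimal sketch, assuming no autoloader is registered; UndefinedDemoClass is a made-up name that is deliberately never declared.

```php
<?php
// UndefinedDemoClass is intentionally never declared anywhere.
$obj = unserialize('O:18:"UndefinedDemoClass":0:{}');

// Instead of failing right away, PHP returns a placeholder object.
$is_incomplete = $obj instanceof __PHP_Incomplete_Class;

var_dump($is_incomplete);   // true
var_dump(get_class($obj));  // "__PHP_Incomplete_Class"
```

The fatal error only shows up later, when a method is called on the placeholder, which is exactly why it tends to surface far away from the unserialize call.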

As the error message clearly states, you should include all classes before unserializing or provide a proper __autoload function. In my opinion neither of these options is perfect or universal. Here is why:

  • If you are using some generic serialization subsystem (e.g. a universal cache or a session persistence handler), you simply can’t predict which classes you actually need to include. And of course including all of them is very impractical.
  • __autoload may be your savior, but then you need to decide how to map class names to the actual files containing them on disk:
    • Some folks (e.g. Zend Framework) encode the path to the file in the class name using “_” as a delimiter (e.g. “My_Db_Mysql_Row”), then simply replace “_” with “/” in __autoload, append “.php” and include the file.
    • Others (e.g. eZ Components, IIRC) maintain static autoload maps, where the keys are class names and the values are the PHP files containing these classes.

    I believe both of these options put quite severe constraints on a developer, which is generally a “bad thing”. For example, some developers find __autoload way too magical and hate encoding a file path in a class name.
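
For reference, the two mapping strategies can be sketched roughly as follows. All function names, the map contents and the file paths here are made up for illustration, and I register the loader with spl_autoload_register instead of defining a bare __autoload function, since that lets several loaders coexist.

```php
<?php
// Strategy 1: derive the path from the class name ("_" becomes "/").
function demo_class_to_path($class) {
  return str_replace('_', '/', $class) . '.php';
}

// Strategy 2: look the class up in a static map (the entry is made up).
$demo_autoload_map = array(
  'DemoCache' => 'lib/cache/DemoCache.class.php',
);

function demo_map_lookup($class, $map) {
  return isset($map[$class]) ? $map[$class] : null;
}

// Try the map first, then fall back to the naming convention.
spl_autoload_register(function ($class) use ($demo_autoload_map) {
  $path = demo_map_lookup($class, $demo_autoload_map);
  if ($path === null)
    $path = demo_class_to_path($class);
  if (is_file($path))
    require_once($path);
});

echo demo_class_to_path('My_Db_Mysql_Row'), "\n"; // My/Db/Mysql/Row.php
```

Either way, the knowledge of where classes live sits in the loader, so every library you pull in has to play by the same rules.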

That’s why I decided to hack up what is, hopefully, a more universal solution to this problem ;)

The main idea is to use a special serialization container class (SerializableContainer) which takes care of including class definitions automatically. The best way to show how it works is with some sample code. Serialization first:

<?php
...
$foo = new Foo();
$serializable = new SerializableContainer($foo);
file_put_contents('./data', serialize($serializable));

…and here comes unserialization:

<?php
...
$serializable = unserialize(file_get_contents('./data'));
$foo = $serializable->getSubject();
$foo->say(); //should work just fine

No PHP fatal error is raised because SerializableContainer automagically includes all required class definitions during unserialization. Before I show its code, here’s how it works step by step:

  1. Remember what the serialized string looks like for the $foo object? Here it is once again: O:3:"Foo":0:{}. The most interesting part is the beginning, O:N…, which tells the unserialize function that object data follows. With a simple regex it’s possible to extract all class names from the serialized string!
  2. Now that we have all the class names, we need to find the files they were loaded from. That’s a no-brainer with PHP5 reflection: $reflect = new ReflectionClass($class); $reflect->getFileName();
  3. The resulting list of class files should be stored in SerializableContainer as an attribute and serialized along with everything else, since we’ll use it to include the class files upon unserialization.
  4. Finally, on unserialization all we have to do is include the required class files.
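
Steps 1 and 2 can be tried in isolation with a sketch like the one below. DemoShape and DemoPoint are made-up classes, and the regex is just one way of writing the “object header” pattern; treat it as an approximation, not a full parser of the serialization format.

```php
<?php
// Two made-up classes, one nested inside the other.
class DemoPoint {
  public $x = 1;
}

class DemoShape {
  public $origin;

  function __construct() {
    $this->origin = new DemoPoint();
  }
}

$data = serialize(new DemoShape());
// $data is: O:9:"DemoShape":1:{s:6:"origin";O:9:"DemoPoint":1:{s:1:"x";i:1;}}

// Step 1: object headers appear at the very start or right after a ";".
preg_match_all('~(?:^|;)O:\d+:"([^"]+)":\d+:\{~', $data, $m);
$classes = array_values(array_unique($m[1]));

// Step 2: ask reflection where each class was declared.
$map = array();
foreach ($classes as $class) {
  $reflect = new ReflectionClass($class);
  $map[$class] = $reflect->getFileName();
}

print_r($map);
```

Both classes are found even though DemoPoint only appears as a nested property, which is what makes this work for arbitrary object graphs.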

Ok, here’s the code:

<?php

class SerializableContainer {
  protected $subject;
  protected $serialized;
  protected $class_paths = array();

  function __construct($subject) {
    $this->subject = $subject;
  }

  function getSubject() {
    if($this->serialized) {
      $this->_includeFiles();
      $this->subject = unserialize($this->serialized);
      // reset instead of unset() so repeated calls never touch an undefined property
      $this->serialized = null;
    }
    return $this->subject;
  }

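  function __sleep() {
    // a container that was unserialized but never unwrapped has no subject yet;
    // keep its existing payload instead of overwriting it with serialize(null)
    if($this->subject !== null || empty($this->serialized)) {
      $this->serialized = serialize($this->subject);
      $this->_fillClassPathInfo($this->serialized);
    }
    return array('serialized', 'class_paths');
  }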

  function _includeFiles() {
    foreach($this->class_paths as $path)
      require_once($path);
  }

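  function _fillClassPathInfo($serialized) {
    $classes = $this->_extractSerializedClasses($serialized);
    $this->class_paths = array();

    foreach($classes as $class) {
      // a plain string value may merely look like an object header
      if(!class_exists($class))
        continue;
      $reflect = new ReflectionClass($class);
      // getFileName() returns false for built-in classes such as stdClass
      if($file = $reflect->getFileName())
        $this->class_paths[] = $file;
    }
  }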

  function _extractSerializedClasses($str) {
    // matches object headers such as O:3:"Foo":0:{ at the start or after a delimiter
    if(preg_match_all('~([||;]O|^O):\d+:"([^"]+)":\d+:{~', $str, $m))
      return array_unique($m[2]);
    else
      return array();
  }
}

A couple of ideas where SerializableContainer could be quite handy:

a) Some transparent session persistence wrapper

<?php

class Session {
...
  function get($name) {
    if(!isset($_SESSION[$name]))
      return null;

    if(is_object($_SESSION[$name]) && $_SESSION[$name] instanceof SerializableContainer)
      return $_SESSION[$name]->getSubject();
    else
      return $_SESSION[$name];
  }

  function set($name, $value) {
    if(is_object($value)) {
      $serializable = new SerializableContainer($value);
      $_SESSION[$name] = $serializable;
    }
    else
      $_SESSION[$name] = $value;
  }
...
}

b) Some generic file(database, etc) caching subsystem

<?php
class FileCache {
  function set($key, $value, $params = array()) {
    $file = $this->getCacheDir() . '/' . $this->_getCacheFileName($key, $params);
    $container = new SerializableContainer($value);
    file_put_contents($file, serialize($container));
  }

  function get($key, $params = array()) {
    if(!$file = $this->_findCacheFile($key))
      return false;
    $container = unserialize(file_get_contents($file));
    return $container->getSubject();
  }
...
}

Well, I think, you get the idea ;)

Like everything in our world, SerializableContainer is not perfect:

  • It depends on PHP’s internal serialization format. Should the PHP core developers ever change it (word on the street is that this may happen in PHP6), there’s a good chance SerializableContainer will break. Hopefully this can be handled by adding proper phpversion and version_compare checks.
  • SerializableContainer uses a regular expression to extract class names from the raw serialized data, which may become a bottleneck for large amounts of data. I have not profiled it, but I believe that if this turns out to be the case, the regex can be replaced with a number of faster strpos/substr calls (or rewritten as a PHP extension in the worst case).
  • You should think twice before relocating class files whose objects were serialized using this approach, since you may face a “file not found” error (once again, SerializableContainer stores full paths to the files containing the classes).
  • If this approach is used in a cluster, all servers must store the PHP code in an identical file system layout, because the stored paths are absolute.
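
The relocation problem can at least be made less painful by checking the stored paths before including them. The helper below is a hypothetical sketch and not part of SerializableContainer: it includes the files that still exist and reports the ones that are gone, so the calling code can decide what to do instead of dying on require_once.

```php
<?php
// Hypothetical helper: include what still exists, report what is missing.
function demo_include_existing(array $paths) {
  $missing = array();
  foreach ($paths as $path) {
    if (is_file($path))
      require_once($path);
    else
      $missing[] = $path;
  }
  return $missing;
}

// Demo: one real (temporary) file and one stale path.
$real = tempnam(sys_get_temp_dir(), 'cls');
file_put_contents($real, "<?php // stand-in for a class file\n");
$stale = $real . '.moved';

$missing = demo_include_existing(array($real, $stale));
unlink($real);

print_r($missing); // contains only the stale path
```

A cache can treat a non-empty result as a cache miss, while a session handler would probably want to log it and drop the value.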

What do you think about this approach?

P.S. Actually, such a SerializableContainer is implemented in the limb/core package as the lmbSerializable class (here’s a unit test). Feel free to use it.

15 Responses to “A reliable way to serialize/unserialize objects in PHP”

  1. Edward Z. Yang Says:

    Your approach amounts to parsing the serialized string twice. I personally don’t see anything wrong with autoloading the classes (I should be able to look at a class name and figure out its location), but…

    Your regex might be buggy, for example:

    serialize('"') == 's:1:""";'

    Notice how the quote is un-escaped; unserialize uses the number in order to determine how long the following string is.

    The serialize() format has changed in the past in a way that was backwards-incompatible. Your wrapper may want to check for that.

  2. pachanga Says:

    Edward, thanks for comments.

    As for possible drawbacks of this approach, I listed some of them at the bottom of my post, and the ones you mentioned (extra parsing and serialize format changes) are among them too ;)

    As for the regex being buggy…well, it seems pretty valid to me. Maybe I’m just missing something?

  3. James Dempster Says:

    I like your idea and it’s well implemented.

    Although I’m not in favour of this approach. I think that a well structured application layout is very important. If you have that then you know where class structures are located and can find them easily at runtime by use of __autoload. Also as this approach stores the filename of the class structure with the serialised object, if the file was ever to be moved to a new location it would break un-serialization of the object. But if __autoload is used this would allow you to adjust to this relocation or restructuring of your filesystem.

  4. pachanga Says:

    James, you are right about possible problems due to file relocation, I’ll update the drawbacks list, thanks. Fortunately, this doesn’t happen often enough to be a real issue (at least in my practice). However, if this approach is used in a cluster, all servers _must_ have all PHP code stored using an identical file system layout for obvious reasons.

  5. sorccu Says:

    var_dump(preg_match('~([||;]O|^O):\d+:"([^"]+)":\d+:{~', serialize('user;O:3:"Foo":0:{}input')));

    So, depending on what you are serializing, some evil madman could be able to force you to load a class file that isn’t necessarily needed. Given the need for the class to already exist at the time of serializing, this is hardly a serious risk but it is nevertheless quite scary.

  6. pachanga Says:

    sorccu, I believe this should be a concern of the code using SerializableContainer. SerializableContainer is a pretty generic solution and it simply can’t predict all possible misuses. The same security concerns are valid for serialize/unserialize calls too. Anyway thanks for raising this issue.

  7. pachanga Says:

    >Although I’m not in favour of this approach.
    >I think that a well structured application layout is very important…
    >If you have that then you know where class structures are located
    >and can find them easily at runtime by use of __autoload

    Unfortunately, this is often not the case when dealing with misc. external libraries which have different file system layout, naming conventions or simply don’t use __autoload at all.

  8. Alexey Zakhlestin Says:

    At some point I thought you would go the “java way” and put the code of the class into serialized stream ;)

    Solution is an interesting one. Though, I doubt I will have a use-case for that. My projects usually have “ezc-style” autoload maps (and I patch external libs which I use to do the same)

  9. PHPDeveloper.org Says:

    Pavel Shevaev’s Blog: A reliable way to serialize/unserialize objects in PHP…

    Pavel Shevaev has posted his method (a reliable way) for serializing ……

  10. developercast.com » Pavel Shevaev’s Blog: A reliable way to serialize/unserialize objects in PHP Says:

    […] Shevaev has posted his method (a reliable way) for serializing and unserializing objects in your […]

  11. Adam Says:

    Serialize for the win!

  12. Web review #10 - business, business, a little PHP, a tiny bit of PostgreSQL, and the Flickr and Google architectures in Russian. | Alpha-Beta-Release Blog Says:

    […] A reliable way to serialize/unserialize objects in PHP - another, more thorough piece for PHP developers, covering a seemingly trivial process: serializing and deserializing objects. In reality it is not that simple and is fraught with pitfalls, especially for beginner programmers, so it is highly recommended reading. […]

  13. kate Says:

    good post man thx

  14. Autoload Magic | Straylight Run Says:

    […] Use a special container class that all other classes inherit from. This method is suitable mostly when attempting to unserialize objects (when unserializing objects, PHP must have the class definition to recover the object). This special container will automatically know the class file for the contained object. Then, the only class file that needs to be located is the container’s class file. (Autoloading in the context of object un/serialization is a special case of class loading.) […]
