require(), require_once() and Dynamic Autoloading in PHP

Introduction

I program in PHP a lot, but I’ve avoided using autoloaders except when working in frameworks, such as symfony, that include one. Last month I started working on a system that’s designed to be part of a software product line. Many scripts, for instance, are going to need to deserialize objects whose classes didn’t exist when the script was written, so autoloading went from a convenience to a necessity.

The majority of autoloaders use a fixed mapping between class names and PHP file names.  Although that’s fine if you obey a strict “one class,  one file” policy,  that’s a policy that I don’t follow 100% of the time.  An additional problem is that today’s PHP applications often reuse code from multiple frameworks and libraries that use different naming conventions:  often applications end up registering multiple autoloaders.  I was looking for an autoloader that “just works” with a minimum of convention and configuration — and I found that in a recent autoloader developed by A.J. Brown.

After presenting the way that I integrated Brown’s autoloader into my in-house framework, this article considers the growing controversy over require(), require_once() and autoloading performance. To make a long story short, people are experiencing very different results in different environments, and the growing popularity of autoloading is going to lead to changes in PHP and the ecosystem around it.

History:  PHP 4 And The Bad Old Days

In the past, PHP programmers have included class definitions in their programs with the following four built-in statements:

  • include
  • require
  • include_once
  • require_once

The difference between include and require is what happens on failure: if include can’t load its file, PHP issues a warning and execution continues, while a failed require stops the program with a fatal error. require and require_once are recommended for general use, since you’d probably rather have an application fail if libraries are missing than barrel on with unpredictable results. (Particularly if the missing library is responsible for authentication.)
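Here is a minimal sketch of that difference. The file name is made up for the example and is assumed not to exist anywhere on the include_path:

```php
<?php
// Hypothetical file name that is assumed not to exist on the include_path.
$missing = 'missing-library-' . uniqid() . '.php';

// include emits a warning (silenced here with @) and evaluates to false;
// the script keeps running.
$result = @include $missing;
$continued = true;

echo $result === false ? "include failed, still running\n" : "included\n";

// require on the same file would stop the script with a fatal error:
// require $missing;
```

Uncommenting the last line is the quickest way to see the fatal error for yourself.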

If you write

require "filename.php";

PHP will scan the PHP include_path for a directory that contains a file called “filename.php”;  it then executes the content of “filename.php” right where the function is called.  You can do some pretty funky things this way,  for instance,  you can write

for ($i = 0; $i < 10; $i++) {
   require "template-$i.php";
}

to cause the sequential execution of a set of PHP files named “template-0.php” through “template-9.php.” A required file has access to local variables in the scope where require is called, so require is particularly useful for web templating and for situations that need dynamic dispatch (when you compute the filename). A cool but slightly obscure feature is that an included file can return a value. If a source file, say “compute-value.php”, uses the return statement,

return $some_value;

the value $some_value will be returned by require:

$value = require "compute-value.php";

These features can be used to create an MVC-like framework where views and controllers are implemented as source files rather than objects.
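Both features, scope sharing and return values, fit in a few lines. The sketch below writes a throwaway template to the temp directory so that it is self-contained; the render() helper and the template contents are made up for illustration:

```php
<?php
// Write a throwaway "template" to the temp directory so the sketch is self-contained.
$file = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($file, '<?php return "Hello, " . $name . "!";');

// Hypothetical helper: runs a template file and hands back its return value.
function render($file, $name)
{
    // The required file sees $name because it runs in this function's scope,
    // and whatever it returns becomes the value of the require expression.
    return require $file;
}

$greeting = render($file, 'world');
echo $greeting, "\n";

unlink($file);
```

A controller implemented as a source file works the same way: the dispatcher computes the filename, requires it, and uses the returned value.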

require isn’t so appropriate, however, when you’re requiring a file that contains class definitions. Imagine we have a hierarchy of classes like Entity -> Picture -> PictureOfACar, PictureOfAnAnimal. It’s quite tempting for PictureOfACar.php and PictureOfAnAnimal.php to both

require "Picture.php";

This works fine if an application only uses PictureOfACar.php and requires it once by writing

require "PictureOfACar.php";

It fails,  however,  if an application requires both PictureOfACar and PictureOfAnAnimal since PHP only allows a class to be defined once.

require_once neatly solves this problem by keeping a list of files that have been required and doing nothing if a file has already been required. You can use require_once in any place where you’d like to guarantee that a class is available, and expect that things “just work”.
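A small sketch of that behavior, using a throwaway class file in the temp directory (the class name SketchPicture is invented for the example):

```php
<?php
// Create a throwaway class file; the class name is made up for this sketch.
$file = tempnam(sys_get_temp_dir(), 'cls');
file_put_contents($file, '<?php class SketchPicture {}');

$first  = require_once $file;  // loads the file and defines the class
$second = require_once $file;  // already loaded: nothing is executed again

var_dump(class_exists('SketchPicture', false));
var_dump($second);

// A plain "require $file;" at this point would die with a fatal error,
// because the class would be declared a second time.

unlink($file);
```

The second require_once simply evaluates to true, which is how PHP signals that the file had already been loaded.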

__autoload

Well, things don’t always “just work”. Although large PHP applications can be maintained with require_once, it becomes an increasing burden to keep track of required files as applications get larger. require_once also breaks down once frameworks start to dynamically instantiate classes that are specified in configuration files. The developers of PHP found a creative solution in the __autoload function, a “magic” function that you can define. __autoload($class_name) gets called whenever PHP code references an undefined class. A very simple __autoload implementation can work well: for instance, the PHP manual page for __autoload has the following code snippet:

function __autoload($class_name) {
   require_once $class_name . '.php';
}

If you write

$instance = new MyUndefinedClass();

this autoloader will search the PHP include path for “MyUndefinedClass.php.” A real autoloader should be a little more complex than this: the above autoloader could be persuaded to load from an unsafe filename if user input is used to instantiate a class dynamically, for example:

$instance = new $derived_class_name();
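One way to guard against unsafe filenames is to accept only class names that look like plain identifiers before building a path. The following is a sketch under the assumption that all classes live in a single directory; the function name and the directory are hypothetical, and it uses a closure, which needs PHP 5.3 or later:

```php
<?php
// Hypothetical directory; in a real application this would be your library root.
$classDir = sys_get_temp_dir();

// Hypothetical helper: map a class name to a file path, or null if the name is unsafe.
function sketch_safe_autoload_path($class_name, $dir)
{
    // Accept only names that look like plain PHP identifiers, so a value such
    // as "../../etc/passwd" can never be turned into a path.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $class_name)) {
        return null;
    }
    $path = $dir . DIRECTORY_SEPARATOR . $class_name . '.php';
    return is_file($path) ? $path : null;
}

spl_autoload_register(function ($class_name) use ($classDir) {
    $path = sketch_safe_autoload_path($class_name, $classDir);
    if ($path !== null) {
        require_once $path;
    }
});
```

Checking is_file() before requiring also means a miss falls through quietly to any other registered autoloader instead of raising an error.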

Static autoloading and autoloader proliferation

Unlike Java,  PHP does not have a standard to relate source file names with class names.  Some PHP developers imitate the Java convention to define one file per class and name their files something like

ClassName.php,  or
ClassName.class.php

A typical project that uses code from several sources will probably have sections that are written with different conventions. For instance, the Zend Framework turns “_” into “/” when it creates paths, so the definition of “Zend_Loader” would be found underneath “Zend/Loader.php.”

A single autoloader could try a few different conventions, but the answer that’s become most widespread is for each PHP framework or library to contain its own autoloader. PHP 5.1 introduced the spl_autoload_register() function as a more flexible alternative to __autoload(). spl_autoload_register() allows us to register multiple autoloaders, instead of just one. This is ugly, but it works.
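As a sketch of what autoloader proliferation looks like in practice, the code below registers two loaders with different naming conventions. The directory layout and the class names (Sketch_Loader, SketchWidget) are invented for the example, and the closures assume PHP 5.3 or later:

```php
<?php
// Hypothetical library root, created in the temp directory for the sketch.
$root = sys_get_temp_dir() . '/autoload-sketch-' . uniqid();
mkdir($root . '/Sketch', 0777, true);
file_put_contents($root . '/Sketch/Loader.php', '<?php class Sketch_Loader {}');
file_put_contents($root . '/SketchWidget.class.php', '<?php class SketchWidget {}');

// Convention 1: "_" becomes "/", so Sketch_Loader lives in Sketch/Loader.php.
spl_autoload_register(function ($class) use ($root) {
    $path = $root . '/' . str_replace('_', '/', $class) . '.php';
    if (is_file($path)) {
        require_once $path;
    }
});

// Convention 2: ClassName.class.php in the root directory.
spl_autoload_register(function ($class) use ($root) {
    $path = $root . '/' . $class . '.class.php';
    if (is_file($path)) {
        require_once $path;
    }
});

// PHP tries the registered loaders in order until the class is defined.
$loader = new Sketch_Loader();
$widget = new SketchWidget();
```

Each loader quietly does nothing when its convention doesn’t match, which is what lets several of them coexist.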

One Class Per File?

A final critique of static autoloading is that it’s not universally held that “one class, one file” is the best practice for PHP development. One of the advantages of OO scripting languages such as PHP and Python is that you can start with a simple procedural script and evolve it into an OO program by gradual refactoring. A convention that requires developers to create a new file for each class tends to:

  1. Discourage developers from creating classes
  2. Discourage developers from renaming classes
  3. Discourage developers from deleting classes

These can cumulatively lead programmers to make decisions based on what’s convenient to do with their development tools,  not based on what’s good for the software in the long term.  These considerations need to be balanced against:

  1. The ease of finding classes when they are organized “one class per file”,
  2. The difficulty of navigating huge source code files that contain a large number of classes,  and
  3. Toolset simplification and support.

The last of these is particularly important when we compare PHP with Java. Since the Java compiler enforces a particular convention, that convention is supported by Java IDEs. The problems that I mention above are greatly reduced if you use an IDE such as Eclipse, which is smart enough to rename files when you rename a class. PHP developers don’t benefit from IDEs that are so advanced, because it’s much more difficult for IDEs to understand a dynamic language. Java also supports inner classes, which allow captive classes (that are only accessed from within an enclosing class) to be defined inside the same file as the enclosing class. Forcing captive classes to be defined in separate files can cause a bewildering number of files to appear, which, in turn, can discourage developers from using captive classes, and that can lead to big mistakes.

Dynamic Autoloading

A. J. Brown has developed an autoloader that uses PHP’s tokenizer extension to scan a directory full of PHP files, search the files for classes, and create a mapping from class names to PHP source files. The tokenizer, exposed through token_get_all(), is a remarkable metaprogramming facility that makes it easy to write PHP programs that interpret PHP source. In 298 lines of code, Brown defines three classes. To make his autoloader fit into my in-house framework, I copied his classes into two files:

  • lib/nails_core/autoloader.php: ClassFileMap, ClassFileMapAutoloader
  • lib/nails_core/autoloader_initialize.php: ClassFileMapFactory
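To give a feel for how tokenizer-based class discovery can work, here is a minimal sketch built on token_get_all(). It is not Brown’s code: both function names are hypothetical, and it ignores namespaces and interfaces.

```php
<?php
// Hypothetical helper, not Brown's code: list the classes declared in a PHP source string.
function sketch_find_classes($source)
{
    $classes = array();
    $tokens = token_get_all($source);
    $count = count($tokens);
    for ($i = 0; $i < $count; $i++) {
        // Tokens are either single characters or arrays of (id, text, line).
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_CLASS) {
            continue;
        }
        // Skip whitespace after the "class" keyword.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        // A declaration is followed by a name; "Foo::class" and anonymous classes are not.
        if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
            $classes[] = $tokens[$j][1];
        }
    }
    return $classes;
}

// Hypothetical helper: build a class-name => file map for every .php file under a directory.
function sketch_build_class_map($dir)
{
    $map = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            foreach (sketch_find_classes(file_get_contents($file->getPathname())) as $class) {
                $map[$class] = $file->getPathname();
            }
        }
    }
    return $map;
}

print_r(sketch_find_classes('<?php class Picture {} class PictureOfACar extends Picture {}'));
```

Because the map is built from what the files actually declare, a file can hold any number of classes under any name.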

I’m concerned about the overhead of repeatedly traversing PHP library directories and parsing the files,  so I run the following program to create the ClassFileMap,  serialize it,  and store it in a file:

bin/create_class_map.php:
<?php

// Tell _config.php not to load the serialized autoloader, which may not exist yet.
$SUPPRESS_AUTOLOADER = true;
require_once(dirname(__FILE__)."/../_config.php");
require_once "nails_core/autoloader_initialize.php";
$lib_class_map = ClassFileMapFactory::generate($APP_BASE."/lib");
$_autoloader = new ClassFileMapAutoloader();
$_autoloader->addClassFileMap($lib_class_map);
$data = serialize($_autoloader);
file_put_contents("$APP_BASE/var/classmap.ser", $data);

Note that I’m serializing the ClassFileMapAutoloader rather than the ClassFileMap, since I’d like to have the option of specifying more than one search directory. To follow the “convention over configuration” philosophy, a future version will probably traverse all of the directories in the PHP include_path.

All of the PHP pages,  controllers and command-line scripts in my framework have the line

require_once(dirname(__FILE__)."/../_config.php");

which includes a file that is responsible for configuring the application and the PHP environment.   I added a bit of code to the _config.php to support the autoloader:

_config.php:
<?php
$APP_BASE = "/where/app/is/in/the/filesystem";
   ...
if (!isset($SUPPRESS_AUTOLOADER)) {
   require_once "nails_core/autoloader.php";
   $_autoloader = unserialize(file_get_contents($APP_BASE."/var/classmap.ser"));
   $_autoloader->registerAutoload();
}

Pretty simple.

Autoloading And Performance

Although there’s plenty of controversy about issues of software maintainability,  I’ve learned the hard way that it’s hard to make blanket statements about performance — results can differ based on your workload and the exact environment you’re working in.  Although Brown makes the statement that “We do add a slight overhead to the application,”  many programmers are discovering that autoloading improves performance over require_once:

Zend_Loader Performance Analysis
Autoloading Classes To Reduce CPU Usage

There seem to be two issues here. First of all, most systems that use require_once are going to err on the side of including more files than they need rather than fewer: it’s better to make a system slow and bloated than to make it incorrect. A system that uses autoloading will spend less time loading classes, and, just as important, less memory storing them. Second, PHP programmers appear to be experiencing variable results with require() and require_once():

Wikia Developer Finds require_once() Slower Than require()
Another Developer Finds Little Difference
Yet Another Developer Finds It Depends On His Cache Configuration
Rumor has it,  PHP 5.3 improves require_once() performance

One major issue is that require_once() calls the realpath() C function, which in turn calls the lstat() system call. The cost of system calls can vary quite radically on different operating systems and even different filesystems. The use of an opcode cache such as XCache or APC can also change the situation.
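Since the numbers depend so heavily on the environment, the only reliable approach is to measure on your own system. The following is a rough timing sketch, not a rigorous benchmark: the iteration count is arbitrary, the file is a throwaway, and the two loops don’t do the same work, since require re-executes the file on every pass while require_once only checks whether it was already loaded.

```php
<?php
// Throwaway file that does almost nothing, so we mostly measure the include machinery.
$file = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($file, '<?php return 1;');

$iterations = 10000; // arbitrary; raise it if the timings are too noisy

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    require $file;          // executes the file every time
}
$requireTime = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    require_once $file;     // file is already loaded, so only the lookup runs
}
$requireOnceTime = microtime(true) - $start;

printf("require:      %.4f s\n", $requireTime);
printf("require_once: %.4f s\n", $requireOnceTime);

unlink($file);
```

Running it with and without an opcode cache, and on different filesystems, is a cheap way to see how much of the variation reported above applies to you.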

It appears that current opcode caches (as of Jan 2008) don’t efficiently support autoloading:

Mike Willbanks Experiences Slowdown With Zend_Loader
APC Developer States That Autoloading is Incompatible With Caching
Rambling Discussion of the state of autoloading with XCache

The issue is that they don’t, at compile time, know what files are going to be required by the application. Opcode caches also reduce the overhead of loading superfluous classes, so they don’t get the benefits experienced with plain PHP.

It all reminds me of the situation with synchronized in Java. In early implementations of Java, synchronized method calls had an execution time nearly ten times longer than ordinary method calls. Many developers designed systems (such as the Swing windowing toolkit) around this performance problem. Modern VMs have greatly accelerated the synchronization mechanism and can often optimize superfluous synchronizations away, so the performance advice of a decade ago is bunk.

Languages such as Java are able to treat individual classes as compilation units, and I’d imagine that, with certain restrictions, a PHP bytecode cache should be able to do just that. This may involve some changes in the implementation of PHP.

Conclusion

Autoloading is an increasingly popular practice among PHP developers.  Autoloading improves development productivity in two ways:

  1. It frees developers from thinking about loading the source files needed by an application,  and
  2. It enables dynamic dispatch, that is, situations where a script doesn’t know about all the classes it will interact with when it’s written.

Since PHP allows developers to create their own autoloaders, a number of autoloaders exist. Many frameworks, such as the Zend Framework, symfony, and CodeIgniter, come with autoloaders; as a result, some PHP applications might contain more than one autoloader. Most autoloaders require that classes be stored in files with specific names, but Brown’s autoloader can scan directory trees to automatically locate PHP classes and map them to filenames. Because it eliminates the need for both convention and configuration, I think it’s a significant advance: in many cases I think it could replace the proliferation of autoloaders that we’re seeing today.

You’ll hear very different stories about the performance of autoload,  require_once() and other class loading mechanisms from different people.  The precise workload,  operating system,  PHP version,  and the use of an opcode cache appear to be important factors.  Widespread use of autoloading will probably result in optimization of autoloading throughout the PHP ecosystem.