Fibers In PHP 8.1

PHP 8.1 comes with a few new additions to the language, and one that I have seen get the most attention is fibers. This is a feature that allows PHP code to be executed in a concurrent thread, with the ability to pause and resume that code at will.

Using fibers doesn't mean that the code runs in parallel, rather that the code is executed away from the main thread in a virtual or green thread. These are threads created and executed by the PHP VM, rather than being scheduled and managed by the underlying operating system. This lightweight thread of execution is also called a coroutine, and coroutines are executed in sequence rather than in parallel.

Fibers are intended to eliminate the distinction between synchronous and asynchronous functions in PHP and provide a better mechanism to manage blocking code.

Before digging into how fibers work it's a good idea to look at these different functions so that we can set them apart.

Synchronous And Asynchronous Functions In PHP

Synchronous functions in PHP are just normal functions. The code executes in sequence, processing one line after the other.

Taking an example of downloading the contents of a web page we could do something like this.

function get_web_page($url) {
  $ch = curl_init();
  curl_setopt ($ch, CURLOPT_URL, $url);
  curl_setopt ($ch, CURLOPT_HEADER, 0);
  curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
  curl_setopt ($ch, CURLOPT_TIMEOUT, 120);
  $contents = curl_exec($ch);
  curl_close($ch);
  return $contents;
}

echo get_web_page('https://www.hashbangcode.com/');

There are multiple ways of grabbing the contents of a web page. In the example above I am using the PHP curl library to make the request.

One important thing to note about this example is that it is blocking. This means that when the function is called the program is prevented from running any more code until the request has been completed and the response has been returned. The request essentially "blocks" the rest of the program while it waits. You can think of any request that leaves the codebase as blocking; so all file, database, network or even memory access is classed as blocking.

To get around this limitation asynchronous functions were devised. This is where the code and the result of the call are separated so that they are run at different times or in different places in code. There are a few different ways to implement asynchronous functions, but the one that tends to be mentioned the most is "promises". A promise represents the eventual result of an operation, with the code that handles that result not being run until later.

A standard exists for promises (Promises/A+) that was originally devised for JavaScript and has made its way to other languages. As tends to happen with standards there are different versions or flavours, so this isn't the standard in use for all promise implementations, but it seems to be the most common one.

At their core, promises rely on closures, and since PHP has closures there is nothing to prevent PHP also implementing promises. According to the standard, a promise object must implement a method called "then()" that accepts two closures on what to do if the promise succeeds or fails.

If we take a look at the GuzzleHttp library implementation of promises then there are some examples like this.

use GuzzleHttp\Promise\Promise;

$promise = new Promise();
$promise
    ->then(function ($value) {
        // Return a value and don't break the chain
        return "Hello, " . $value;
    })
    // This then is executed after the first then and receives the value
    // returned from the first then.
    ->then(function ($value) {
        echo $value;
    });

// Resolving the promise triggers the $onFulfilled callbacks and outputs
// "Hello, reader."
$promise->resolve('reader.');

In the above example we are setting up two "then()" method calls that will be called in sequence after the promise is resolved. The call to resolve() would be placed elsewhere in the code and only called when it was explicitly needed. There is a little bit of code running behind the scenes here, but we are essentially creating a situation and then executing code if that situation is seen as correct.
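To make the two-closure form of then() more concrete, here is a minimal sketch of a promise written from scratch. The TinyPromise class is hypothetical and is not how Guzzle implements promises: it does not return a new promise from then(), does not chain values and has no event loop. It only shows that the success and failure closures are stored first and called later, once the outcome is known.

```php
<?php
// Hypothetical, deliberately tiny promise. Real libraries return a new
// promise from then() and defer their callbacks; this sketch does neither.
final class TinyPromise
{
    private array $handlers = [];

    // Store the success and failure closures for later.
    public function then(?callable $onFulfilled = null, ?callable $onRejected = null): self
    {
        $this->handlers[] = [$onFulfilled, $onRejected];
        return $this;
    }

    // Call every stored success closure with the value.
    public function resolve(mixed $value): void
    {
        foreach ($this->handlers as [$onFulfilled, $onRejected]) {
            if ($onFulfilled !== null) {
                $onFulfilled($value);
            }
        }
    }

    // Call every stored failure closure with the reason.
    public function reject(mixed $reason): void
    {
        foreach ($this->handlers as [$onFulfilled, $onRejected]) {
            if ($onRejected !== null) {
                $onRejected($reason);
            }
        }
    }
}

$log = [];
$promise = new TinyPromise();
$promise->then(
    function ($value) use (&$log) { $log[] = "fulfilled: $value"; },
    function ($reason) use (&$log) { $log[] = "rejected: $reason"; }
);

// Nothing has run yet; the closures only fire once the outcome is known.
$promise->reject('connection timed out');

print implode(PHP_EOL, $log) . PHP_EOL;
```

Only the second closure runs here because the promise was rejected rather than resolved.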

Asynchronous functions tend to be run in an event loop. This is a loop that doesn't end unless there are no more events to process, and it can also be set to run forever. Asynchronous functions are added as events to the loop so that the promises can be set and resolved in different iterations of the loop. There is a lot more to event loops than I've described here, but those are the basics.
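As a rough illustration of that idea, the following sketch uses an SplQueue as the list of pending events and keeps looping until the queue is empty. The variable names are my own, and a real event loop (such as those in amphp or ReactPHP) also deals with timers, streams and signals, which this does not attempt.

```php
<?php
$queue = new SplQueue();
$log = [];

// An event can add further events to the loop while it runs.
$queue->enqueue(function () use ($queue, &$log) {
    $log[] = 'first event';
    $queue->enqueue(function () use (&$log) {
        $log[] = 'follow-up event';
    });
});

$queue->enqueue(function () use (&$log) {
    $log[] = 'second event';
});

// The loop ends only when there are no more events to process.
while (!$queue->isEmpty()) {
    $event = $queue->dequeue();
    $event();
}

print implode(PHP_EOL, $log) . PHP_EOL;
```

The follow-up event is processed last because it joins the back of the queue, after the second event that was already waiting.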

There are a few problems with asynchronous functions.

  • If you want to go asynchronous then you need to rewrite a lot of code to handle the asynchronous calls. This ends up with applications written around the asynchronous functions, rather than those functions being a part of the application. If you are writing asynchronous functions then you can't suddenly drop into synchronous functions as it will break the model and cause blocking.
  • Asynchronous functions can be difficult to understand and debug as they essentially decouple code. The action at a distance anti-pattern becomes quite commonplace in code like this and can create some very difficult bugs.
  • Different libraries implement promises in different ways, and although there is some interoperability between libraries it is a little rare.

Some people will say that asynchronous functions run in parallel, which isn't necessarily the case, especially with PHP. There are some functions and packages that will implement parallel processing into PHP so that asynchronous functions can be run in parallel, but this "promise" system is normally run in sequence. In fact, parallel processing can get quite complex and is only really suited to certain situations where concurrency isn't a key factor.

As asynchronous functions and promises aren't built into PHP there needs to be a small amount of code to get everything running correctly. In recent years, PHP based projects like amphp, ReactPHP and Guzzle have all implemented asynchronous frameworks to make life easier when writing asynchronous code. They also provide their own promise implementations.

Fibers

Fibers in PHP are a way of creating asynchronous code that doesn't require a ground up re-implementation of your codebase. As I mentioned before, fibers are executed in a separate thread, but this does not mean that they are processed in parallel. Instead, fibers will use what are known as green threads or virtual threads that are threads scheduled by the PHP VM, rather than the underlying operating system. This gives fibers the ability to be fully controlled by PHP itself.

As the threads that fibers run in are controlled by PHP this means that we can create full-stack, interruptible functions. Essentially, we can pause and resume fiber code from anywhere in the call-stack. This makes it possible to create functions that can be paused without having to create your own promises framework or build your application in a different way.

All fibers in PHP are created and controlled through the new Fiber class.

The example in the PHP documentation for Fibers has the following code.

$fiber = new Fiber(function (): void {
  $value = Fiber::suspend('fiber');
  echo "Value used to resume fiber: ", $value, "\n";
});

$value = $fiber->start();

echo "Value from fiber suspending: ", $value, "\n";

$fiber->resume('test');

This produces the following result.

Value from fiber suspending: fiber
Value used to resume fiber: test

When I first saw this example I thought "so what?", but there is more going on here than it first seems. Let's go through each part in detail.

The first part of this code sets up a new Fiber object. The Fiber constructor accepts a callable parameter that will be called when we start the fiber. In this example we are passing a closure that will be executed when the fiber starts.

$fiber = new Fiber(function (): void {
  $value = Fiber::suspend('fiber');
  echo "Value used to resume fiber: ", $value, "\n";
});

The key part of the above example is the call to Fiber::suspend(). This is a static method that can only be called from code running inside a fiber, and it is used to pause the execution of that fiber.
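Running inside a fiber covers any function called while the fiber is executing, not just the closure passed to the constructor. The following sketch shows this with two hypothetical helper functions, inner_step() and outer_step(), where the suspend happens two levels down the call stack. It also uses the getReturn() method of the Fiber class to read the closure's return value once the fiber has finished.

```php
<?php
// Hypothetical helper: suspends from deep inside the call stack.
function inner_step(): string
{
    return Fiber::suspend('paused inside inner_step()');
}

// Hypothetical helper: knows nothing about fibers, it just calls inner_step().
function outer_step(): string
{
    return 'outer_step() received: ' . inner_step();
}

$fiber = new Fiber(function (): string {
    return outer_step();
});

// Runs until the suspend() call inside inner_step().
$message = $fiber->start();
print $message . PHP_EOL;

// Execution continues inside inner_step(), then unwinds through outer_step().
$fiber->resume('hello');

$result = $fiber->getReturn();
print $result . PHP_EOL;
```

Note that outer_step() did not need to be written in any special way for the pause to pass through it.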

As a side note, if you do attempt to add a call to Fiber::suspend() outside of a fiber then you'll get the following error.

PHP Fatal error:  Uncaught FiberError: Cannot suspend outside of a fiber in fibers.php
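FiberError can be caught like any other error, so the fatal error is avoidable. The sketch below catches it, and also shows Fiber::getCurrent(), which returns null when no fiber is running and so can be used to check before suspending. The variable names are my own.

```php
<?php
// Fiber::getCurrent() returns null when the code is not running in a fiber.
$insideFiber = Fiber::getCurrent() !== null;
print 'Inside a fiber? ' . ($insideFiber ? 'yes' : 'no') . PHP_EOL;

$caught = null;
try {
    // We are in the main script here, so this call is not allowed.
    Fiber::suspend('not inside a fiber');
} catch (FiberError $error) {
    $caught = $error;
    print 'Caught ' . get_class($error) . ': ' . $error->getMessage() . PHP_EOL;
}
```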

The next step here is to start the Fiber. This is done using the start() method of the Fiber object.

$value = $fiber->start();

Note that we are storing the return value of the start() method. This comes from the argument we passed to the Fiber::suspend() call and will only have anything in it once the fiber has been paused. In this line the $value variable will now contain the string "fiber" since that is what we passed into the Fiber::suspend() call.

The next line attempts to demonstrate this by printing out that value.

echo "Value from fiber suspending: ", $value, "\n";

Finally, we resume the fiber. This line picks up execution where we paused it using the Fiber::suspend() call.

$fiber->resume('test');

The value we pass into the resume() method is then made available in the closure as it is returned from the result of the Fiber::suspend() call. This essentially means that the string "test" is then available within the closure after the code is resumed. The final line of code to print out the value of the $value variable is then executed, which prints out the final string.

What we have done here is to start a PHP function and then pause it half way through execution to go and do something else. Then, at a later time we press play on the function and it resumes. Not only that, but we can pass values to and from the inside of the fiber as the code executes. Powerful stuff!
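A fiber is not limited to a single pause. As a small sketch of values flowing in both directions more than once, the fiber below keeps a running total: each resume() passes a number in and each suspend() passes the current total back out. Using null as the signal to stop is my own convention for this example.

```php
<?php
$fiber = new Fiber(function (): int {
    $total = 0;
    // Hand the current total out, then wait for the next number to come in.
    while (($number = Fiber::suspend($total)) !== null) {
        $total += $number;
    }
    return $total;
});

$first = $fiber->start();      // Runs to the first suspend().
$second = $fiber->resume(5);   // Adds 5, suspends again.
$third = $fiber->resume(10);   // Adds 10, suspends again.
$fiber->resume(null);          // Ends the loop, so the fiber finishes.

printf("Totals seen: %d, %d, %d" . PHP_EOL, $first, $second, $third);
printf("Final total: %d" . PHP_EOL, $fiber->getReturn());
```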

As a way of demonstrating the threads in this application I have put together this diagram.

main         fiber
start() -----┐
             |
  ┌------ suspend()
  |
  |
resume() ----┐
             |
             |
  ┌------ terminates
  |
program
continues

It is perfectly possible to create multiple fibers by creating new instances of the Fiber object. Each new instance of the Fiber class creates another fiber that acts in its own virtual thread.
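Here is a minimal sketch of two fibers taking turns. The names A and B and the step counts are arbitrary, and the loop that resumes them is about the simplest round-robin approach possible rather than anything a real library would use.

```php
<?php
$log = [];
$fibers = [];

// Create one fiber per worker; each records a step and then pauses.
foreach (['A' => 2, 'B' => 3] as $name => $steps) {
    $fibers[$name] = new Fiber(function () use ($name, $steps, &$log): void {
        for ($i = 1; $i <= $steps; $i++) {
            $log[] = "$name step $i";
            Fiber::suspend();
        }
    });
}

// Start every fiber so that each runs up to its first suspend().
foreach ($fibers as $fiber) {
    $fiber->start();
}

// Keep giving each unfinished fiber a turn until none are left.
while ($fibers) {
    foreach ($fibers as $name => $fiber) {
        if ($fiber->isTerminated()) {
            unset($fibers[$name]);
            continue;
        }
        $fiber->resume();
    }
}

print implode(PHP_EOL, $log) . PHP_EOL;
```

The log shows the steps interleaved, but only one fiber is ever running at a time.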

Fiber States

A PHP fiber might have a number of different states that affect how you interact with it. These are as follows.

  • Started - This is a fiber that has been started using the start() method. Any fiber that has been suspended or terminated will be seen as started. You can detect this using the isStarted() method of the Fiber class.
  • Suspended - Once a fiber has been suspended using the Fiber::suspend() call it will be in the suspended state. Fibers in this state have been started (and so will return true from isStarted()) but are not considered to be running or terminated. The method isSuspended() allows you to detect a fiber in this state.
  • Running - A fiber that has been started and is currently running. The fiber in this state is not suspended or terminated, but will report as having been started. The method isRunning() allows you to detect a fiber in this state.
  • Terminated - Once the execution of the fiber has reached the end, either by finishing execution or throwing an error, then it can be said to be terminated. Terminated fibers still return true from an isStarted() check, but since they have finished they will return false from isRunning() and isSuspended(). The method isTerminated() can be used to detect this state.
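These states can be watched changing with a short sketch. The fiber_state() helper is hypothetical and just collects the four checks into an array; Fiber::getCurrent() is used to get hold of the fiber object from inside its own closure.

```php
<?php
// Hypothetical helper that snapshots all four state checks.
function fiber_state(Fiber $fiber): array
{
    return [
        'started'    => $fiber->isStarted(),
        'suspended'  => $fiber->isSuspended(),
        'running'    => $fiber->isRunning(),
        'terminated' => $fiber->isTerminated(),
    ];
}

$states = [];

$fiber = new Fiber(function () use (&$states): void {
    // Fiber::getCurrent() returns the fiber that is executing this code.
    $states['while running'] = fiber_state(Fiber::getCurrent());
    Fiber::suspend();
});

$states['before start'] = fiber_state($fiber);
$fiber->start();
$states['after suspend'] = fiber_state($fiber);
$fiber->resume();
$states['after finish'] = fiber_state($fiber);

foreach ($states as $label => $state) {
    printf("%-14s %s" . PHP_EOL, $label, json_encode($state));
}
```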

Real World Usage Of Fibers

Whilst looking into the theory is nice, I always try to think about how I would use this code in a real application. It took me a little time to get my head into why this would be useful, so I'll add some examples here.

Stream A Webpage

The simplest fiber example I could think of (that does useful work) would be streaming a web page. The following example will create a file handle and then use a fiber to do the actual work of streaming the data from the handle. In this case we are pulling in the contents of a website, 50 bytes at a time.

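$fiber = new Fiber(function($stream): void {
  // Hand each chunk back to the caller, then wait to be resumed.
  while (!feof($stream)) {
    $contents = fread($stream, 50);
    Fiber::suspend($contents);
  }
});

$stream = fopen('https://www.hashbangcode.com/', 'r');
stream_set_blocking($stream, false);

// start() runs the fiber up to the first suspend() and returns the first chunk.
$contents = $fiber->start($stream);

while (!$fiber->isTerminated()) {
  // The final resume() returns null as the fiber ends, which appends nothing.
  $contents .= $fiber->resume();
}

fclose($stream);

echo $contents;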

This example shows how data we create inside the fiber can be transmitted outside of the fiber using the Fiber::suspend() method. Using fibers here is useful as it allows us to nibble away at the contents of the stream if, for example, we were trying to find the presence of a string. We would otherwise need to download everything and inspect it afterwards.
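The string search idea might look something like the sketch below. To keep it runnable without a network connection it reads from an in-memory stream (php://memory) filled with made-up text, and the needle and variable names are my own. The caller stops resuming the fiber as soon as the string is found, so the rest of the stream is never read.

```php
<?php
// Stand-in for a remote page: an in-memory stream with made-up content.
$body = str_repeat('lorem ipsum ', 20) . 'NEEDLE' . str_repeat(' dolor sit', 20);
$stream = fopen('php://memory', 'w+');
fwrite($stream, $body);
rewind($stream);

$fiber = new Fiber(function ($stream): void {
    while (!feof($stream)) {
        Fiber::suspend(fread($stream, 50));
    }
});

$needle = 'NEEDLE';
$buffer = '';
$bytesRead = 0;
$found = false;

$chunk = $fiber->start($stream);
while (!$fiber->isTerminated()) {
    $bytesRead += strlen($chunk);
    // Keep the tail of the previous chunk in case the needle spans two chunks.
    $buffer = substr($buffer, -(strlen($needle) - 1)) . $chunk;
    if (str_contains($buffer, $needle)) {
        $found = true;
        break; // Stop early; the fiber is simply never resumed again.
    }
    $chunk = $fiber->resume();
}

fclose($stream);

printf("Found: %s after reading %d of %d bytes." . PHP_EOL, $found ? 'yes' : 'no', $bytesRead, strlen($body));
```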

Multiple File Delete

File deletions, or any file operation, can be abstracted away from the current code using a fiber. The following example will take a set of files and delete them one at a time.

$fiber = new Fiber(function(array $files): void {
  foreach ($files as $file) {
    unlink($file);
    Fiber::suspend($file);
  }
});

$files = [
  'test1.txt',
  'test2.txt',
  'test3.txt',
];

print "Deleting Files" . PHP_EOL;

$last_file_deleted = $fiber->start($files);
$files_deleted = 1;
$total_files = count($files);

while (!$fiber->isTerminated()) {
  $percentage = round($files_deleted / $total_files, 2) * 100;
  printf("Deleted %s (%s%% done)." . PHP_EOL, $last_file_deleted, $percentage);
  $last_file_deleted = $fiber->resume();
  $files_deleted++;
}
print "Completed" . PHP_EOL;

This prints out the following.

Deleting Files
Deleted test1.txt (33% done).
Deleted test2.txt (67% done).
Deleted test3.txt (100% done).
Completed

The benefit of using fibers here is that we pause the execution of the file deletion to present some useful feedback to the user. This allows us to generate a percentage complete report and let the user know which file was deleted.
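The same approach can report failures as well as progress. This variation is a sketch under a few assumptions of my own: it creates throwaway files in the system temporary directory so that it is safe to run, adds one path that does not exist, and has the fiber suspend with both the file name and whether the delete worked.

```php
<?php
// Create three throwaway files, plus one path that does not exist.
$files = [];
for ($i = 0; $i < 3; $i++) {
    $files[] = tempnam(sys_get_temp_dir(), 'fiber');
}
$files[] = sys_get_temp_dir() . '/fiber-demo-missing-' . uniqid();

$fiber = new Fiber(function (array $files): void {
    foreach ($files as $file) {
        // Only attempt the delete if the file is really there.
        $deleted = is_file($file) && unlink($file);
        Fiber::suspend([$file, $deleted]);
    }
});

$results = [];
$report = $fiber->start($files);

while (!$fiber->isTerminated()) {
    [$file, $deleted] = $report;
    $results[] = $deleted;
    printf("%s %s" . PHP_EOL, $deleted ? 'Deleted' : 'Could not delete', $file);
    $report = $fiber->resume();
}

printf("%d of %d files deleted." . PHP_EOL, count(array_filter($results)), count($files));
```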

Conclusion

When I first saw them, I honestly struggled to see why fibers were useful. After reading through a lot of code and articles involving asynchronous code, promises and other mechanisms I can see that they have a use case. Most confusingly, I have seen some articles referring to them as being parallel, which they most certainly are not. They do run in another thread, but they are not executed in parallel.

They are good for splitting blocking tasks like network reads and file operations down and abstracting them away from the normal (synchronous) codebase. In reality, I think most users will not need to use PHP Fibers directly. I have seen a few libraries incorporate fibers into them and so they will probably be used without people realising it.

Fibers are simple enough to understand and can be found in other languages in a similar form. They will help with making asynchronous code less dependent on a ground-up implementation. Fibers are a low level construct though. They don't manage their own scheduler tasks or event loops as that is up to the developer to implement, which is where the third party libraries will come in.

This is one step closer to a multi-threaded environment, but without the pitfalls and issues that are involved in multi-threaded environments. Creating fibers also takes fewer resources than full threads and as they are controlled through PHP they can be started and suspended easily.

Historically, most users have implemented PHP code that runs in a synchronous fashion. The code starts from the top, executes through the code, and finishes. The introduction of fibers is a different approach that allows for some interesting possibilities. I'll be interested to see how fibers are used in the PHP community and what benefits they can bring to PHP applications.
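To give a feel for the kind of work that is left to the developer or a library, here is a very small scheduler sketch. The delay() function is hypothetical: instead of sleeping, it suspends the fiber and passes out the time at which it wants to be woken. The loop underneath plays the part of the scheduler by resuming each fiber once its time has come. Real libraries do this far more efficiently and also wait on streams, not just timers.

```php
<?php
// Hypothetical helper: ask the scheduler to wake this fiber after $seconds.
function delay(float $seconds): void
{
    Fiber::suspend(microtime(true) + $seconds);
}

$log = [];
$tasks = [
    'slow' => new Fiber(function () use (&$log): void {
        delay(0.2);
        $log[] = 'slow finished';
    }),
    'fast' => new Fiber(function () use (&$log): void {
        delay(0.05);
        $log[] = 'fast finished';
    }),
];

// Start every fiber and remember when each one wants to be woken.
$wakeAt = [];
foreach ($tasks as $name => $fiber) {
    $time = $fiber->start();
    if (!$fiber->isTerminated()) {
        $wakeAt[$name] = $time;
    }
}

// Minimal scheduler: resume whichever fibers are due.
while ($wakeAt) {
    foreach ($wakeAt as $name => $time) {
        if (microtime(true) < $time) {
            continue; // Not due yet, check the next fiber.
        }
        $next = $tasks[$name]->resume();
        if ($tasks[$name]->isTerminated()) {
            unset($wakeAt[$name]);
        } else {
            $wakeAt[$name] = $next;
        }
    }
    usleep(1000); // Short pause so the loop does not spin at full speed.
}

print implode(PHP_EOL, $log) . PHP_EOL;
```

Both waits overlap, so the whole script takes roughly as long as the slowest task rather than the sum of the two, even though nothing runs in parallel.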

Comments

Hi, I just want to throw a few things out there: first, google the difference between concurrent vs parallel. Once you think you have a good idea of this difference, sit back and consider the implications of those two things on a system that is intended to deal with thousands of users at the same time, or if you're lucky and you have a good service, many thousands of users at the same time. Regardless of a service's prospective user count, it is basically guaranteed that user count will surpass the amount of available processors by a significant amount. Yes, it’s true that you can have servers with hundreds of processors, but if a service will only have 128 active users at once, you're not likely to foot the bill for such hardware. With that low of a consumer count a service will often have only 8-16 cores.

With these expectations, which are very real, the question isn't a matter of true parallelism because there is simply no such thing. The question becomes how well can a system multi-task. From these optics, PHP Fibers are a big deal.

Secondly, Fibers are an actual primitive feature within the PHP core now, which means that native extensions are able to exploit the feature within C as well. This is a major thing, especially since many PHP extensions use threads outside of PHP userspace. When these native fibers are also bridged to userspace Fibers it allows actual userspace parallelism. But again, it’s short sighted to put much weight in that within any system where activity outstrips available CPU cores at such a level.
