Understanding JIT in PHP 8
The Just In Time (JIT) compiler in PHP 8 is implemented as part of the Opcache extension and aims to compile opcodes into processor instructions at runtime.
This means that, with the JIT, some opcodes no longer have to be interpreted by the Zend VM; they are executed directly as processor-level instructions.
JIT in PHP 8
One of the most talked-about features of PHP 8 is the Just In Time (JIT) compiler. It is discussed in many blogs and communities, and there is a lot of noise around it, but so far I have found very few details about how the JIT actually works.
After repeated and frustrating attempts to find useful information, I decided to study the PHP source code. Combining my limited knowledge of C with all the scattered information I had been able to collect, I managed to put together this article, and I hope it helps you better understand PHP's JIT.
Simplifying things: when the JIT is working properly, your code is not executed through the Zend VM; instead, it is executed directly as a set of processor-level instructions.
That is the whole idea.
But to understand this better, we need to think about how PHP works internally. It is not very difficult, but it requires some introduction.
I have already written an article with a brief overview of how PHP works. If this article seems overly complicated, read that one first and come back; it should make things a bit easier.
How is PHP code executed?
We all know that PHP is an interpreted language. But what does this really mean?
Whenever you want to execute PHP code, be it a snippet or an entire web application, you have to go through a PHP interpreter. The most commonly used ones are PHP-FPM and the CLI interpreter. Their job is very simple: take the PHP code, interpret it, and return the result.
This is the usual picture for any interpreted language. Some steps may vary, but the general idea is the same. In PHP, it goes like this:
- PHP code is read and converted into a set of keywords known as tokens. This lets the interpreter tell what role each piece of code plays in the program. This first step is called lexing or tokenizing.
- With the tokens in hand, the PHP interpreter analyzes this collection of tokens and tries to find meaning in them. As a result, an Abstract Syntax Tree (AST) is generated through a process called parsing. The AST is a set of nodes indicating which operations should be performed. For example, “echo 1 + 1” should actually mean “print the result of 1 + 1” or, more precisely, “print an operation, and the operation is 1 + 1”.
- Having an AST makes it much easier to understand operations and their precedence. Converting this tree into something that can be executed requires an intermediate representation (IR), which in PHP we call opcodes. The process of converting the AST into opcodes is called compilation.
- With the opcodes in hand, we get to the fun part: executing the code! PHP has an engine called the Zend VM, which takes a list of opcodes and executes them. Once all the opcodes have been executed, the program ends.
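To make the four steps above concrete, here is a toy pipeline in C for arithmetic expressions like the `1 + 1` in `echo 1 + 1`. It is a minimal sketch, not PHP's implementation: the token kinds, AST node layout, and opcode names (`T_NUM`, `OP_ADD`, `run`, and so on) are all hypothetical and far simpler than PHP's real tokenizer, AST, and Zend opcodes. It only shows the shape of the process: lexing, parsing with operator precedence, compiling to opcodes, and executing them in a small stack-based VM.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_ITEMS 64

/* 1. Lexing: hypothetical token kinds for a tiny "integers, + and *" language. */
typedef enum { T_NUM, T_PLUS, T_STAR, T_END } TokKind;
typedef struct { TokKind kind; long value; } Token;

static int lex(const char *src, Token *out, int max) {
    int n = 0;
    while (*src != '\0') {
        if (isspace((unsigned char)*src)) { src++; continue; }
        if (n >= max - 1) return -1;                 /* keep room for T_END */
        if (isdigit((unsigned char)*src)) {
            char *end;
            out[n] = (Token){T_NUM, strtol(src, &end, 10)};
            src = end;
        } else if (*src == '+' || *src == '*') {
            out[n] = (Token){*src == '+' ? T_PLUS : T_STAR, 0};
            src++;
        } else {
            return -1;                               /* unknown character */
        }
        n++;
    }
    out[n] = (Token){T_END, 0};
    return n;
}

/* 2. Parsing: build an AST where '*' binds tighter than '+'. */
typedef struct Node { TokKind kind; long value; struct Node *lhs, *rhs; } Node;
static Node pool[MAX_ITEMS];
static int pool_used;
static const Token *cur;

static Node *new_node(TokKind kind, long value, Node *lhs, Node *rhs) {
    assert(pool_used < MAX_ITEMS);
    pool[pool_used] = (Node){kind, value, lhs, rhs};
    return &pool[pool_used++];
}

static Node *parse_number(void) {
    assert(cur->kind == T_NUM);                      /* syntax error otherwise */
    Node *n = new_node(T_NUM, cur->value, NULL, NULL);
    cur++;
    return n;
}

static Node *parse_term(void) {                      /* term := NUM ('*' NUM)* */
    Node *n = parse_number();
    while (cur->kind == T_STAR) { cur++; n = new_node(T_STAR, 0, n, parse_number()); }
    return n;
}

static Node *parse_expr(void) {                      /* expr := term ('+' term)* */
    Node *n = parse_term();
    while (cur->kind == T_PLUS) { cur++; n = new_node(T_PLUS, 0, n, parse_term()); }
    return n;
}

/* 3. Compilation: flatten the AST into stack-machine opcodes. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_RETURN } OpCode;
typedef struct { OpCode op; long operand; } Instr;

static int compile(const Node *n, Instr *code, int pc) {
    if (n->kind == T_NUM) { code[pc] = (Instr){OP_PUSH, n->value}; return pc + 1; }
    pc = compile(n->lhs, code, pc);
    pc = compile(n->rhs, code, pc);
    code[pc] = (Instr){n->kind == T_PLUS ? OP_ADD : OP_MUL, 0};
    return pc + 1;
}

/* 4. Execution: a tiny VM loop that dispatches on each opcode. */
static long execute(const Instr *code) {
    long stack[MAX_ITEMS];
    int sp = 0;
    for (int pc = 0;; pc++) {
        switch (code[pc].op) {
        case OP_PUSH:   stack[sp++] = code[pc].operand; break;
        case OP_ADD:    sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:    sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_RETURN: return stack[sp - 1];
        }
    }
}

long run(const char *src) {
    Token tokens[MAX_ITEMS];
    Instr code[MAX_ITEMS];
    int ntok = lex(src, tokens, MAX_ITEMS);
    assert(ntok > 0);                                /* lexing failed or empty input */
    pool_used = 0;
    cur = tokens;
    Node *ast = parse_expr();
    assert(cur->kind == T_END);                      /* trailing garbage */
    int pc = compile(ast, code, 0);
    code[pc] = (Instr){OP_RETURN, 0};
    return execute(code);
}
```

For example, `run("2 + 3 * 4")` compiles to PUSH 2, PUSH 3, PUSH 4, MUL, ADD and returns 14. The `switch` inside `execute` plays, in miniature, the role the Zend VM plays for real opcodes.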
To make this a little clearer, I made a diagram:
A simplified diagram of the PHP interpretation process.
Straightforward enough, as you can see. But there is also a bottleneck: what is the point of lexing and parsing the code every time you execute it, if your PHP code may not even change that often?
In the end, we are only interested in the opcodes, right? Right! That is why the Opcache extension exists.
The Opcache extension ships with PHP, and there is usually no particular reason to disable it. If you are using PHP, you should probably enable Opcache.
What it does is add a shared-memory cache layer for opcodes. Its job is to take the opcodes freshly generated from our AST and cache them, so that future runs can skip the lexing and parsing phases.
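The caching idea can be sketched in a few lines of C. This is a hypothetical, single-process model: the real Opcache keeps opcodes in shared memory and has its own invalidation rules (whether it checks file timestamps, for instance, depends on configuration). Here `compile_file` is a made-up stand-in for lexing, parsing, and compiling, and each cache entry is keyed by file path and modification time.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SLOTS 16

/* One cached compilation result, keyed by file path and modification time. */
typedef struct {
    int used;
    char path[64];
    long mtime;
    int opcodes_id;   /* hypothetical handle to the compiled opcodes */
} CacheEntry;

static CacheEntry cache[CACHE_SLOTS];
static int compile_count;  /* how many times the expensive pipeline actually ran */

/* Hypothetical stand-in for lexing + parsing + compiling one file. */
static int compile_file(const char *path) {
    (void)path;
    return ++compile_count;
}

int get_opcodes(const char *path, long mtime) {
    assert(strlen(path) < sizeof cache[0].path);
    CacheEntry *free_slot = NULL;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        CacheEntry *e = &cache[i];
        if (!e->used) {
            if (free_slot == NULL) free_slot = e;
            continue;
        }
        if (strcmp(e->path, path) != 0) continue;
        if (e->mtime == mtime) return e->opcodes_id;  /* hit: lexing, parsing, compiling skipped */
        e->mtime = mtime;                            /* file changed: recompile and refresh */
        e->opcodes_id = compile_file(path);
        return e->opcodes_id;
    }
    int id = compile_file(path);                     /* miss: run the full pipeline */
    if (free_slot != NULL) {                         /* store the result if there is room */
        free_slot->used = 1;
        strcpy(free_slot->path, path);
        free_slot->mtime = mtime;
        free_slot->opcodes_id = id;
    }
    return id;
}
```

Calling `get_opcodes("index.php", 100)` twice runs the expensive pipeline only once; calling it again with a newer modification time triggers a recompilation.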
Here is a diagram of the same process with the Opcache extension:
PHP interpretation flow with Opcache. If the file has already been parsed, PHP retrieves its cached opcodes instead of parsing it again.
It is delightful to see how neatly the lexing, parsing, and compilation steps are skipped.
Note: this is where PHP 7.4's preloading feature shines! It lets you tell PHP-FPM to parse your code base, convert it to opcodes, and cache them before any request is served.
You may already be wondering where the JIT fits into this, right?! At least I hope so, since that is why I am writing this article...
What does the Just In Time compiler do?
After listening to Zeev's explanation in the PHP and JIT episode of the PHP Internals News podcast, I got some idea of what the JIT actually has to do...
If Opcache makes it faster to obtain opcodes so they can go straight to the Zend VM, the JIT is meant to let them run without the Zend VM at all.
The Zend VM is a C program that acts as a layer between the opcodes and the processor itself. The JIT generates compiled code at runtime, so PHP can skip the Zend VM and go straight to the processor. In theory, we should benefit from this.
It sounded strange at first, because to compile machine code you need to write a very specific implementation for each type of architecture. But it is actually quite feasible.
PHP's JIT implementation uses the DynASM (Dynamic Assembler) library, which maps CPU instructions written in a specific format to machine code for many different types of CPUs. So the Just In Time compiler converts opcodes into architecture-specific machine code using DynASM.
Still, one thought kept haunting me...
If preloading can turn PHP code into opcodes before execution, and DynASM can compile opcodes into machine code (Just In Time compilation), why on earth don't we just compile PHP up front using Ahead of Time (AOT) compilation?!
One of the insights the podcast episode gave me is that PHP is weakly typed; that is, PHP often does not know a variable's type until the Zend VM tries to execute a specific opcode.
You can see this by looking at the zend_value union type, which has many pointers to different type representations for a variable. Whenever the Zend VM tries to extract a value from a zend_value, it uses macros like ZSTR_VAL to get at the string data behind one of the union's pointers.
For example, this Zend VM handler processes the "less than or equal" expression (<=). Notice how it forks into many different code paths to handle the possible operand types.
Duplicating this type-inference logic in machine code is not feasible and could even make execution slower.
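To see why runtime types matter, here is a heavily simplified C sketch of a value container and a `<=` handler. `Value` is a hypothetical stand-in for PHP's `zval` (a type tag next to a `zend_value` union); the real structures hold many more types plus reference counting. `is_smaller_or_equal` only loosely mirrors what the real `ZEND_IS_SMALLER_OR_EQUAL` handler does, and its comparison rules are simplified: PHP's actual rules (for example, numeric strings compared numerically) are more involved. The point is the branching: which code path runs is only known once the operand types are known.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a zval: a type tag next to a value union. */
typedef enum { IS_LONG, IS_DOUBLE, IS_STRING } ValType;

typedef struct {
    ValType type;
    union {
        long lval;
        double dval;
        const char *str;
    } value;
} Value;

/* Simplified numeric conversion; real PHP has stricter numeric-string rules. */
static double to_double(const Value *v) {
    switch (v->type) {
    case IS_LONG:   return (double)v->value.lval;
    case IS_DOUBLE: return v->value.dval;
    case IS_STRING: return strtod(v->value.str, NULL);
    }
    return 0.0;
}

/* Sketch of a "<=" handler: which branch runs is only known at runtime. */
int is_smaller_or_equal(const Value *a, const Value *b) {
    assert(a != NULL && b != NULL);
    if (a->type == IS_LONG && b->type == IS_LONG)       /* fast path: two integers */
        return a->value.lval <= b->value.lval;
    if (a->type == IS_DOUBLE && b->type == IS_DOUBLE)   /* two floats */
        return a->value.dval <= b->value.dval;
    if (a->type == IS_STRING && b->type == IS_STRING)   /* two strings: byte-wise (simplified) */
        return strcmp(a->value.str, b->value.str) <= 0;
    return to_double(a) <= to_double(b);                /* mixed types: compare as doubles */
}
```

An ahead-of-time compiler looking at `$a <= $b` would have to emit all of these branches, because nothing tells it in advance which pair of types will show up.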
Compiling everything after the types have been resolved is not a good option either, because compiling to machine code is a CPU-intensive task. So compiling EVERYTHING at runtime is a bad idea.
How does the Just In Time compiler behave?
Now we know that we cannot infer types well enough to generate good ahead-of-time compilation. We also know that compiling at runtime is expensive. So how can the JIT be useful for PHP?
To balance this equation, PHP's JIT only tries to compile the few opcodes it considers worth it. To do this, it profiles the opcodes executed by the Zend VM and checks which ones make sense to compile (depending on your configuration).
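The profiling idea can be modeled with a per-function call counter. This is a hypothetical sketch, not PHP's implementation: `HOT_THRESHOLD`, `Function`, `call`, and the two `*_square` functions are made up, and "compiling" is simulated by swapping in a function pointer. PHP's JIT exposes its own counters and thresholds through ini settings such as `opcache.jit_hot_func` and `opcache.jit_hot_loop`, whose exact behavior depends on the JIT mode you configure.

```c
#include <assert.h>
#include <stddef.h>

#define HOT_THRESHOLD 3  /* hypothetical; the real thresholds are ini settings */

typedef long (*NativeFn)(long);

typedef struct {
    int call_count;      /* profiling counter, bumped on every interpreted call */
    NativeFn compiled;   /* non-NULL once the function is considered "compiled" */
} Function;

static int interpreted_calls, native_calls;

/* Stand-in for running the function's opcodes through the VM. */
static long interpret_square(long x) { interpreted_calls++; return x * x; }
/* Stand-in for machine code produced by the JIT. */
static long native_square(long x) { native_calls++; return x * x; }

long call(Function *f, long arg) {
    assert(f != NULL);
    if (f->compiled != NULL)
        return f->compiled(arg);        /* hot and compiled: bypass the interpreter */
    if (++f->call_count >= HOT_THRESHOLD)
        f->compiled = native_square;    /* threshold reached: "compile" for next time */
    return interpret_square(arg);       /* this call still runs interpreted */
}
```

With `HOT_THRESHOLD` set to 3, five calls run the first three through the interpreter stand-in and the last two through the "native" one. Cold code never pays the compilation cost.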
Once a particular opcode is compiled, execution is delegated to the compiled code instead of to the Zend VM. It looks like the diagram below:
PHP interpretation flow with JIT. Opcodes that have already been compiled do not execute through the Zend VM.
So the Opcache extension contains logic that decides whether a particular opcode should be compiled. If so, the compiler converts it to machine code using DynASM and executes the newly generated machine code.
Interestingly, since the current implementation caps the size of compiled code (a configurable limit measured in megabytes), execution must be able to switch seamlessly between JIT-compiled and interpreted code.
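The size limit is what forces this mixed mode: once the compiled-code buffer is full, anything not yet compiled has to keep running in the VM. The following C sketch models that with a hypothetical 100-byte buffer and made-up code sizes (`Func`, `try_compile`, and `run_function` are illustrative names). It does not model real machine code, the real buffer (configured with `opcache.jit_buffer_size`), or exactly what PHP does when that buffer runs out.

```c
#include <assert.h>
#include <stddef.h>

#define JIT_BUFFER_SIZE 100  /* hypothetical size in bytes, tiny on purpose */

typedef enum { RAN_INTERPRETED, RAN_NATIVE } ExecPath;

typedef struct {
    size_t code_size;   /* made-up size of this function's machine code */
    int is_compiled;
} Func;

static size_t buffer_used;  /* bytes of the code buffer already taken */

/* Reserve buffer space for f's machine code; fail if it would not fit. */
static int try_compile(Func *f) {
    assert(f != NULL);
    if (f->is_compiled) return 1;
    if (buffer_used + f->code_size > JIT_BUFFER_SIZE) return 0;  /* buffer full */
    buffer_used += f->code_size;
    f->is_compiled = 1;
    return 1;
}

/* Pick an execution path: compiled code if available, the VM otherwise. */
ExecPath run_function(Func *f) {
    return try_compile(f) ? RAN_NATIVE : RAN_INTERPRETED;
}
```

With two 60-byte functions, the first one fits and runs natively, while the second no longer fits and keeps running interpreted. Both paths have to produce the same results, which is why the switch between them must be seamless.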
By the way, this talk by Benoit Jacquemont on PHP's JIT helped me a LOT to figure all of this out.
I am still not sure in which specific cases compilation takes place, but I don't think I really want to know just yet.
So your performance gains will probably not be huge
I hope it is much clearer now WHY everyone says that most PHP applications will not get much of a performance benefit from the Just In Time compiler, and why Zeev's recommendation to profile and experiment with different JIT configurations for your application is the best way to go.
If you use PHP-FPM, compiled opcodes will usually be shared across multiple requests, but this still does not change the rules of the game.
This is because the JIT optimizes CPU work, and most PHP applications today depend on I/O more than on anything else. It hardly matters whether the processing is compiled if you still have to access the disk or the network; the timings will be very similar.
That changes if you are doing something unrelated to I/O, such as image processing or machine learning: CPU-bound code like that is where the Just In Time compiler can help. This is also why some people now say they lean toward implementing more of PHP's native functions in PHP itself rather than in C: the overhead will not be dramatically different if such functions get compiled anyway.
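The I/O argument is essentially Amdahl's law. As an illustration with made-up numbers (not measurements), suppose a request spends $T_{\text{I/O}}$ waiting on the disk or network and $T_{\text{CPU}}$ executing PHP code, and the JIT makes only the CPU part $s$ times faster:

```latex
\begin{aligned}
T_{\text{new}} &= T_{\text{I/O}} + \frac{T_{\text{CPU}}}{s},
  \qquad \text{speedup} = \frac{T_{\text{I/O}} + T_{\text{CPU}}}{T_{\text{I/O}} + T_{\text{CPU}}/s} \\
\text{I/O-bound } (90\,\text{ms} + 10\,\text{ms},\ s = 2)&: \quad \frac{100}{90 + 5} = \frac{100}{95} \approx 1.05 \\
\text{CPU-bound } (10\,\text{ms} + 90\,\text{ms},\ s = 2)&: \quad \frac{100}{10 + 45} = \frac{100}{55} \approx 1.82
\end{aligned}
```

Even doubling CPU speed barely moves the I/O-heavy request, while the CPU-heavy one (think image processing) almost halves its total time.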
An interesting time to be a PHP programmer...
I hope this article was useful and that you now better understand what the JIT in PHP 8 is. Feel free to contact me on Twitter if you want to add something I may have forgotten, and don't forget to share this with your fellow developers; it will undoubtedly add a little value to your conversations!