PDFF: PHP Document File Format details
In the first episode, we presented the origin of the PDFF: how it emerged as a convenient format to describe PHP components, with a good level of detail, some versioning, and dual readability for humans and machines.
In this second episode, we’ll introduce the format and the content of such a file. At the end, there is a link to a repository with a copious number of PHP components and extensions, ready to be tested. You can grab a few of them there to illustrate this document.
In the meantime, let’s take an in-depth look at the PDFF.
How: machine and human readable
To keep things simple and processable, the underlying format is JSON.
This format is flexible enough for the varied structures that will be stored. Parsers and encoders are widely available. And the PRETTY and COMPACT presentations serve different usages: the first one for human readers, the second one for machines.
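As a small illustration of the two presentations, the sketch below serialises the same data both ways. It uses Python's standard json module rather than PHP, and the document is a made-up miniature: only the key names (name, vcs, handle, versions) come from this article, the values are assumptions.

```python
import json

# Made-up miniature document: the keys follow the article, the values are invented.
pdff = {
    "name": "zttp",
    "vcs": "git",
    "handle": "https://example.com/zttp.git",
    "versions": {"0.3.0": {}},
}

# Pretty presentation: indented, one entry per line, meant for human readers.
pretty = json.dumps(pdff, indent=4)

# Compact presentation: no optional whitespace, meant for storage and transfer.
compact = json.dumps(pdff, separators=(",", ":"))

# Both presentations decode to exactly the same data.
assert json.loads(pretty) == json.loads(compact) == pdff

print(pretty)
print(compact)
```

Whichever presentation is stored, a reader gets the same tree back after decoding, so the choice only affects file size and readability.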
Inside the PDFF: main branches
The dataset is represented as a tree, and it is going to be a big one. Let’s climb it. Here are the first levels:
- name
- vcs
- handle
- versions
    - 0.3.0
        - \
            - constants
            - functions
            - traits
            - classes
            - interfaces
            - enums
        - \Zttp\
            - constants
            - functions
            - traits
            - classes
            - interfaces
            - enums
At the top, there is some administrative information: name is the name of the component, vcs is the way it was cloned, and handle is the actual URI to clone it. handle may be a URL for git or a component identifier for composer, for example.
Then, the largest field is the versions field, which contains the code details. This field contains one object per version, with the version name as property name. Here, it is 0.3.0. This hash structure allows for several versions in the same file, although only one is displayed here.
Inside a specific version, the next level holds the namespaces. The global namespace \ is always available, then all the namespaces from the component are detailed, one after the other. The namespaces are not nested, so \A\, \A\B\ and \A\B\C\ are all distinct entries, at the same level. This is close to PHP’s handling of them, and different from the storage of files in a file system.
For each namespace, all the declared structures are listed, by category. Namely, constants, functions, classes, enumerations, interfaces and traits.
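To make the three levels (version, namespace, category) concrete, here is a minimal sketch that walks a trimmed, hypothetical document and counts the structures per namespace. The layout follows the outline above; the entries themselves are invented for illustration, and each category is assumed to be a hash indexed by name.

```python
def empty_namespace():
    """One namespace entry with the six categories listed in the article."""
    return {"constants": {}, "functions": {}, "traits": {},
            "classes": {}, "interfaces": {}, "enums": {}}

# Hypothetical, heavily trimmed document.
pdff = {"name": "zttp", "versions": {"0.3.0": {
    "\\": empty_namespace(),
    "\\Zttp\\": {**empty_namespace(), "classes": {"zttp": {"name": "Zttp"}}},
}}}

def count_structures(document, version):
    """Return {namespace: {category: number of entries}} for one version."""
    namespaces = document["versions"][version]
    return {
        namespace: {category: len(entries) for category, entries in categories.items()}
        for namespace, categories in namespaces.items()
    }

summary = count_structures(pdff, "0.3.0")
for namespace, counts in summary.items():
    print(namespace, counts)

# Namespaces are flat: a nested-looking name is looked up directly, not walked.
print("\\Zttp\\" in pdff["versions"]["0.3.0"])
```

Because the namespaces sit side by side, finding one is a single hash lookup, whatever its depth in the PHP code.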
This is already quite a large tree. For a small component, there might be a few namespaces, but for large frameworks, there may be over a thousand: Akeneo, Symfony and Shopware all clock in at over 1300 namespaces.
Now, let’s review the different elements. We’ll go gradually, introducing the general and specific elements of each category. By the end, there will be some repetition, which will allow us to speed up.
Global constants
Constants are a hash, based on the name of the constant. They have a name and a value property, as expected. They also have an array, phpdoc, for all the phpdoc comments. There are no attributes, as they are not supported by PHP.
The expression property is a boolean: it is true when the definition of the constant is a static constant expression, or false when it is a literal value. For example, const A = B + 1; has expression set to true, and a piece of PHP code for value.
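As a sketch of how a consumer might use the expression flag, the snippet below separates constants with literal values from those defined by a static constant expression. The first two entries are shaped like WordPress constants; the third one is hypothetical and modelled on const A = B + 1;.

```python
constants = {
    "WP_DEFAULT_THEME": {"name": "WP_DEFAULT_THEME", "phpdoc": [],
                         "expression": False, "value": "'twentytwentytwo'"},
    "WP_DEBUG": {"name": "WP_DEBUG", "phpdoc": [],
                 "expression": False, "value": "false"},
    # Hypothetical entry: its value is PHP code, not a literal.
    "A": {"name": "A", "phpdoc": [], "expression": True, "value": "B + 1"},
}

def split_constants(constants):
    """Separate constants with literal values from static constant expressions."""
    literals, expressions = [], []
    for entry in constants.values():
        target = expressions if entry["expression"] else literals
        target.append(entry["name"])
    return literals, expressions

literals, expressions = split_constants(constants)
print("literal values:", literals)
print("constant expressions:", expressions)
```

In both cases the value is stored as a string of PHP code; the flag only tells whether that string can be read as a plain literal or needs evaluating.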
- constants
    - NAME :
        - name
        - phpdoc []
        - expression
        - value

"constants": {
    "WP_DEFAULT_THEME": {
        "name": "WP_DEFAULT_THEME",
        "phpdoc": [],
        "expression": false,
        "value": "'twentytwentytwo'"
    },
    "WP_DEBUG": {
        "name": "WP_DEBUG",
        "phpdoc": [],
        "expression": false,
        "value": "false"
    },
    ...
Functions
The description of functions is a bit more complex than that of constants. In particular, there is a second layer, with parameters.
- functions
    - name :
        - name
        - returntype
        - reference
        - returntypes []
        - parameters []
        - totalParameters
        - optionalParameters
        - variadic
        - attributes []
        - phpdoc []
The name used as the index is in lowercase: it makes it easier to look up functions that way. The actual name, with its casing, is stored in the name property.
The list of types returned by the function is stored as an array (returntypes in the outline above, returntypehints in the sample below): the types are provided as fully qualified names, all in lower case. This array may be empty, when no return type is provided. The kind of the return typehint is stored in the returntype property. It may be “one” (single or no type), “or” (union type) or “and” (intersection type).
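To show how the kind marker and the list of types fit together, here is a minimal sketch that rebuilds a PHP type declaration from them. The function name render_type and the sample type names are hypothetical; the mapping from or and and to PHP's | and & separators follows PHP's own syntax for union and intersection types.

```python
# Mapping from the kind marker to the separator used in PHP code.
SEPARATORS = {"one": "", "or": "|", "and": "&"}

def render_type(kind, types):
    """Rebuild a PHP type declaration from a kind marker and a list of type names."""
    if kind not in SEPARATORS:
        raise ValueError(f"unknown kind: {kind!r}")
    if kind == "one" and len(types) > 1:
        raise ValueError("'one' expects at most one type")
    return SEPARATORS[kind].join(types)

# Hypothetical type names, written as lowercase fully qualified names.
print(repr(render_type("one", [])))
print(render_type("one", ["\\app\\client"]))
print(render_type("or", ["\\app\\client", "\\app\\server"]))
print(render_type("and", ["\\countable", "\\traversable"]))
```

Storing the kind separately means a reader never has to parse a type string to tell a union from an intersection.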
Parameters are stored in an array of objects, with another level of details. We’ll see them in the next section. That array is complemented with the number of totalParameters and the number of optionalParameters.
Functions are also augmented with a variadic property: this one is not explicitly expressed at the function level in PHP code. It means that one of the parameters (necessarily the last one) is variadic, making the whole function callable with an arbitrary number of arguments.
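The lowercase index and the three counting properties can be combined as in the sketch below. The helper names find_function and arity are hypothetical, the entry only carries the properties needed here, and the code assumes that a variadic parameter is counted among the optional ones, which this article does not state.

```python
# Hypothetical entry, limited to the properties used in this sketch.
functions = {
    "tap": {"name": "tap", "totalParameters": 2,
            "optionalParameters": 0, "variadic": False},
}

def find_function(functions, name):
    """Look a function up regardless of the casing used by the caller."""
    return functions.get(name.lower())

def arity(function):
    """Return (minimum, maximum) argument counts; maximum is None when unbounded.

    Assumption: a variadic parameter is counted among the optional ones.
    """
    minimum = function["totalParameters"] - function["optionalParameters"]
    maximum = None if function["variadic"] else function["totalParameters"]
    return minimum, maximum

found = find_function(functions, "TAP")
print(found["name"], arity(found))
```

PHP function names are case-insensitive, so normalising the lookup key to lowercase mirrors how PHP itself resolves a call.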
Finally, phpdoc and attributes collect the corresponding structures from the source code. The attributes are actual PHP code.

"functions": {
    "tap": {
        "name": "tap",
        "returntype": "one",
        "reference": false,
        "returntypehints": [],
        "parameters": [... ...],
        "totalParameters": 2,
        "optionalParameters": 0,
        "variadic": false,
        "attributes": [],
        "phpdoc": []
    }
Parameters
Parameters are an extra level of description. They have their own options and descriptions.
Parameters are stored as an array. The positions are the actual ranks in the function signature, unlike constants and functions, which use their name as index.
The actual description of each parameter has obvious options: name, rank, reference, variadic, phpdoc, attributes, typehinttype and typehints. Typehints follow the same organisation as the function return typehints, except for the property names themselves.
Default values for parameters are built around three entries. hasDefault tells whether there is actually a default value or not: that prevents confusion between null (no default value) and null (the default value is null). As for constants, there is an expression entry to identify static constant expressions in default values. Lastly, default is the default value.
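The sketch below shows one way these entries could be turned back into a PHP-like signature. It is only an illustration: render_parameter, render_signature and make_parameter are hypothetical helpers, and the one, or and and values of typehinttype are assumed to map to no separator, | and & respectively.

```python
SEPARATORS = {"one": "", "or": "|", "and": "&"}  # assumed kind-to-separator mapping

def render_parameter(parameter):
    """Rebuild one parameter declaration from its description."""
    type_text = SEPARATORS[parameter["typehinttype"]].join(parameter["typehints"])
    text = type_text + " " if type_text else ""
    if parameter["reference"]:
        text += "&"
    if parameter["variadic"]:
        text += "..."
    text += parameter["name"]
    # hasDefault decides: the content of "default" alone would be ambiguous.
    if parameter["hasDefault"]:
        text += " = " + parameter["default"]
    return text

def render_signature(name, parameters):
    """Rebuild a function signature, ordering the parameters by rank."""
    ordered = sorted(parameters, key=lambda parameter: parameter["rank"])
    return "function " + name + "(" + ", ".join(render_parameter(p) for p in ordered) + ")"

def make_parameter(name, rank, **overrides):
    """Build a parameter description with every option present."""
    parameter = {"name": name, "rank": rank, "variadic": False, "reference": False,
                 "hasDefault": False, "default": "", "expression": False,
                 "typehinttype": "one", "typehints": [], "phpdoc": [], "attributes": []}
    parameter.update(overrides)
    return parameter

# Deliberately listed out of order: the rank restores the signature order.
parameters = [make_parameter("$callback", 1), make_parameter("$value", 0)]
print(render_signature("tap", parameters))
print(render_parameter(make_parameter("$limit", 0, hasDefault=True, default="null")))
```

The second call shows why hasDefault exists: the parameter is rendered with = null only because the flag says a default was written in the source.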
- parameters
    - name
    - rank
    - variadic
    - reference
    - hasDefault
    - default
    - expression
    - typehinttype
    - typehints []
    - phpdoc []
    - attributes []

"parameters": [
    {
        "name": "$value",
        "rank": 0,
        "variadic": false,
        "reference": false,
        "hasDefault": false,
        "default": "",
        "expression": false,
        "typehinttype": "one",
        "phpdoc": [],
        "typehints": [],
        "attributes": []
    },
    {
        "name": "$callback",
        "rank": 1,
        "variadic": false,
        "reference": false,
        "hasDefault": false,
        "default": "",
        "expression": false,
        "typehinttype": "one",
        "phpdoc": [],
        "typehints": [],
        "attributes": []
    }
],
Classes
Classes hold the largest amount of data. Some of it has already been described in the previous structures: we will mention those parts and skip them.
- classes
    - name
        - name
        - final
        - abstract
        - readonly
        - extends
        - implements []
        - traits []
        - attributes []
        - phpdoc []
        - constants []
        - properties []
        - methods []
Classes are indexed by their name, in lowercase, for easy lookup. Their actual name and case are stored in the name property. Each class has abstract, readonly and final as boolean flags; phpdoc and attributes are similar to the ones in functions or parameters.
Then come extends, as a single fully qualified name, and implements and uses, as arrays of fully qualified names. All of those are the dependencies of the class. uses includes the conflict resolution details (not described here).
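As an illustration, the dependencies of a class can be gathered from these three entries. The sketch below assumes that uses is a plain array of fully qualified names and that a class without a parent has an empty string in extends; both class entries are hypothetical.

```python
def class_dependencies(class_entry):
    """List every fully qualified name a class depends on, without duplicates."""
    names = []
    if class_entry["extends"]:  # an empty string means there is no parent class
        names.append(class_entry["extends"])
    names.extend(class_entry["implements"])
    names.extend(class_entry["uses"])
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return list(dict.fromkeys(names))

# Hypothetical class entries, limited to the dependency-related properties.
client = {
    "name": "Client",
    "extends": "\\app\\base",
    "implements": ["\\countable", "\\jsonserializable"],
    "uses": ["\\app\\logs"],
}
standalone = {"name": "Zttp", "extends": "", "implements": [], "uses": []}

print(class_dependencies(client))
print(class_dependencies(standalone))
```

Since all names are fully qualified, the result can be matched directly against the class, interface and trait indexes of the other namespaces.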
Then, a class holds arrays of constants, properties and methods.

"classes": {
    "zttp": {
        "name": "Zttp",
        "abstract": false,
        "final": false,
        "extends": "",
        "implements": [],
        "uses": [],
        "usesOptions": [],
        "attributes": [],
        "phpdoc": [],
        "constants": [...],
        "properties": [...],
        "methods": [...]
The constants array is very similar to the one for global constants, except for the final and visibility properties. The latter is a string, with one of the values private, protected, public or none.
The methods array is similar to the functions one, except for the static and visibility properties.
Properties
The property array is indexed by the property name. There are booleans for static and readonly; the couple typehints and typehinttype for typehints; the visibility string; the triplet init, hasDefault and expression for the initialisation value; and finally the phpdoc and attributes entries.

"$request_type": {
    "name": "$request_type",
    "visibility": "protected",
    "init": "",
    "static": false,
    "readonly": false,
    "hasDefault": true,
    "expression": false,
    "typehinttype": "one",
    "typehints": [],
    "phpdoc": [
        {
            "phpdoc": "\\/**\n\t * Action name for the requests this table will work with.\n\t *\n\t * @since 4.9.6\n\t *\n\t * @var string $request_type Name of action.\n\t *\\/"
        }
    ],
    "attributes": []
Traits
Traits are a simpler version of classes. The uses entry lists the other traits that are used by the current one, as an array of fully qualified names.
- traits
    - name
        - name
        - uses
        - properties
        - methods
        - phpdoc
Interfaces
Interfaces are a simpler version of classes. The extends entry lists the other interface that is extended by the current one, as a fully qualified name.
- interfaces
    - name
        - name
        - extends
        - constants
        - methods
        - phpdoc
Enums
Enumerations are similar to classes, except for the cases and the typehint. The typehint is either string, int or empty. Cases are built similarly to constants.
- enums
    - name
        - name
        - typehint
        - constants
        - methods
        - cases
Conclusion
This quick presentation of the PDFF format introduced the organisation and the different levels of information stored there. Most of the entries come naturally from the source code, with two exceptions. First, some extra entries are needed to keep the description accurate, like typehinttype, which would be written as |, & or nothing at all in PHP code. Second, all options are always present, while PHP code would simply skip them to keep the source uncluttered.
To take a look at this format in more detail, go to the public repository exakat/pdff. In the vcs folder, there are frameworks and libraries; in the packagist folder, there are components; and in the ext folder, there are PHP extensions. Each is detailed per version. You can download them, and use them as you like.
In the next part, we’ll review where the PDFF format can help, both for machines and humans.
Until then, keep auditing your code!