Semantic typing
Semantic typing is an old practice, where the name of a parameter also tells what its type is. It is typing, because a $string is supposed to contain a string, and it is semantic, because only the human reader actually uses that meaning: PHP doesn’t really care.
The interesting part is that the practice of semantic typing still exists nowadays. Obviously, it has taken a back seat to actual typing, for one good reason: naming a parameter after its type is akin to passive documentation, and those who read documentation are too rare.
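To make the difference concrete, here is a minimal sketch with two hypothetical functions, describeLegacy() and describe(). The first relies on semantic typing only; the second declares the type (scalar type declarations exist since PHP 7.0). Assuming the default, non-strict mode, PHP coerces a compatible scalar but rejects an array.

```php
<?php

// Semantic typing only: the name promises a string, but PHP does not check it.
function describeLegacy($string)
{
    return gettype($string);
}

// Actual typing: PHP enforces the declared type. Without
// declare(strict_types=1), compatible scalars are coerced.
function describe(string $string): string
{
    return gettype($string);
}

echo describeLegacy([1, 2]), PHP_EOL; // array
echo describe(42), PHP_EOL;           // string: 42 is coerced to '42'

try {
    describe([1, 2]);
} catch (TypeError $e) {
    // An array cannot be coerced to a string.
    echo 'TypeError', PHP_EOL;
}
```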
Funny typing
Funny typing happens when a parameter is named after a type, but is declared with a different one. For example:
<?php
function foo(string $array) { /**/ }
?>
While this is perfectly legitimate PHP code, it is also quite weird. Who would name their parameter after one type, yet declare it with another one?
As usual, when there is a means, there is a will. So I ran an audit over 2700+ open source PHP projects, collected stats about parameters with a scalar name ($string, $int…), and checked their declared type: not the PHPDoc type, but the actual type declaration.
For example, $string is typed string 98.3% of the time, but sometimes, just sometimes, it is also typed array (1.6%) or bool (0.1%). Interestingly, a $string is never typed float.
All other scalar names behave the same way: they usually bear their eponymous type ($array is most often an array), but they sometimes carry a different type, such as bool or string.
Interestingly, a $float is never typed array, and neither is a $bool (the opposite is not true: an $array is sometimes typed bool). Apparently, there are subjective limits to stretching types.
This has been turned into an Exakat rule, extended to properties. If you are afraid your code might sport such a typo, you can run an audit.
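For a single function, such a mismatch can also be spotted at runtime with PHP's Reflection API. The sketch below is only an illustration, not how Exakat implements its rule (Exakat statically analyzes source code); the helper findFunnyTypedParameters() and its list of type-like names are assumptions. It only inspects simple named types, skipping untyped parameters and union types.

```php
<?php

// Hypothetical helper: reports parameters whose name is a scalar type
// name (e.g. $array) but whose declared type is something else.
function findFunnyTypedParameters(callable $callable): array
{
    // Assumed list of "type-like" parameter names; extend as needed.
    $scalarNames = ['string', 'int', 'float', 'bool', 'array'];

    $reflection = new ReflectionFunction(Closure::fromCallable($callable));
    $findings = [];

    foreach ($reflection->getParameters() as $parameter) {
        $name = $parameter->getName();
        $type = $parameter->getType();

        // Skip names that are not type names, untyped parameters,
        // and union or intersection types.
        if (!in_array($name, $scalarNames, true) || !($type instanceof ReflectionNamedType)) {
            continue;
        }

        // Record the parameter when its declared type differs from its name.
        if ($type->getName() !== $name) {
            $findings[$name] = $type->getName();
        }
    }

    return $findings;
}

function demo(string $array, int $int, bool $string) { /**/ }

// Returns ['array' => 'string', 'string' => 'bool']; $int is consistent.
print_r(findFunnyTypedParameters('demo'));
```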
Common parameter names
While reviewing those funny-typed parameters, I also extracted the most common names for typed parameters. Here are the first 100:
- $postBody
- $event
- $request
- $command
- $node
- $config
- $query
- $subject
- $parent
- $item
- $requestBody
- $result
- $context
- $value
- $entity
- $object
- $a
- $type
- $user
- $factory
- $model
- $b
- $data
- $message
- $source
- $options
- $e
- $repository
- $response
- $client
- $field
- $other
- $manager
- $filter
- $warning
- $container
- $collection
- $provider
- $service
- $cache
- $extensionAttributes
- $file
- $action
- $element
- $dao
- $c
- $handler
- $image
- $parser
- $validator
- $configuration
- $resource
- $params
- $builder
- $exception
- $services
- $target
- $metadata
- $storage
- $connection
- $id
- $component
- $form
- $child
- $req
- $loader
- $status
- $logger
- $entry
- $token
- $page
- $group
- $inst
- $definition
- $document
- $input
- $template
- $table
- $error
- $rule
- $settings
- $generator
- $registry
- $class
- $stmt
- $repo
- $from
- $key
- $formatter
- $reader
- $resolver
- $category
- $controller
- $instance
- $property
- $expected
- $helper
- $n
- $session
- $name
$a, $b, $c, $n and $e are the most common one-letter names. $postBody is the most common name in this list, though being typed does help its ranking: it is not the most common parameter name overall. Note that $requestBody ranks high too.
Further down, $repository and $repo are both used quite often, and represent the same reality: the latter is simply shorter than the former. Also, quite a few vague names, such as $params, $value, $data, $message or $source, are used and typed.
Common varied types
Once a type is added to a parameter, there is a new couple in town: the parameter name, with its meaning, and the type itself. As such, it is interesting to look at two populations of typed parameters: those that get many different types, and those that always get the same type.
Always typed the same
When a parameter keeps the same name and type across 100 method definitions or more, you can expect semantic typing to be at the root of the behavior: everyone recognizes that value, and how it should be represented.
Take a look at the list below, which shows the name of a method parameter and its expected type: can you guess that type simply by reading the variable name?
Is it obvious that $weak should be a boolean?
- $allWords (\bool)
- $sqlWalker (\doctrine\orm\query\sqlwalker)
- $isRoot (\bool)
- $replaceextrasymbols (\bool)
- $weak (\bool)
- $isLower (\bool)
- $scriptProperties (\array)
- $pathinfo (\string)
- $use_transliterate (\bool)
- $isVariadic (\bool)
- $altNumbers (\bool)
- $fkConstraint (\doctrine\dbal\schema\foreignkeyconstraint)
- $asOrigReplaceArray (\bool)
- $prenormalizeds (\array)
- $savePath (\string)
- $sessionName (\string)
- $httpsPort (\int)
- $useAttachment (\bool)
- $useShortAttachment (\bool)
- $showArguments (\bool)
- $httpPort (\int)
- $rdata (\array)
- $internalErrors (\bool)
- $emailLexer (\egulias\emailvalidator\emaillexer)
- $arrayAdapter (\symfony\component\cache\adapter\arrayadapter)
- $other_keys (\array)
- $other_members (\array)
- $codePaths (\array)
- $enableIfStandalone (\callable)
- $extra_args (\array)
- $localVault (\symfony\bundle\frameworkbundle\secrets\abstractvault)
- $other_values (\array)
- $other_args (\array)
- $reverseContainer (\symfony\component\dependencyinjection\reversecontainer)
- $testMethod (\string)
- $wrappedDumper (\symfony\component\vardumper\dumper\datadumperinterface)
- $storageKey (\string)
- $transportName (\string)
- $preloaded (\array)
- $hasChild (\bool)
- $inlineServices (\array)
- $invalidBehavior (\int)
- $isNested (\bool)
- $byConstructor (\bool)
- $lille (\symfony\component\dependencyinjection\tests\compiler\lille)
- $maxlifetime (\int)
- $maxItemsPerDepth (\int)
- $metaBag (\symfony\component\httpfoundation\session\storage\metadatabag)
- $cloneArguments (\bool)
- $realInstantiator (\callable)
- $callAutoload (\bool)
- $dumpKeys (\bool)
- $callOriginalConstructor (\bool)
- $callOriginalClone (\bool)
- $hashedPassword (\string)
- $utf8 (\bool)
- $srcContext (\int)
- $sessionOptions (\array)
- $keepArgs (\bool)
- $autoEtag (\bool)
- $autoLastModified (\bool)
- $endOfValue (\bool)
- $returnResult (\bool)
- $isConstructorArgument (\bool)
- $includeContextAndExtra (\bool)
- $isMatch (\bool)
- $pathSeparator (\string)
- $noBuiltin (\bool)
- $remoteAddr (\string)
- $requestUid (\string)
- $rightTrimString (\bool)
- $refs (\array)
- $convertEmptyStringToNull (\bool)
- $vault (\symfony\bundle\frameworkbundle\secrets\abstractvault)
- $getEnv (\closure)
- $dbi (\phpmyadmin\databaseinterface)
- $dbForProject (\utopia\database\database)
- $willBeAvailable (\callable)
- $joinPoint (\neos\flow\aop\joinpointinterface)
- $watcherId (\string)
- $baseApiUri (\oauth\common\http\uri\uriinterface)
- $bookSlug (\string)
- $betterNodeFinder (\rector\core\phpparser\node\betternodefinder)
- $nodeNameResolver (\rector\nodenameresolver\nodenameresolver)
- $nodeTypeResolver (\rector\nodetyperesolver\nodetyperesolver)
- $phpDocInfo (\rector\betterphpdocparser\phpdocinfo\phpdocinfo)
- $phpDocInfoFactory (\rector\betterphpdocparser\phpdocinfo\phpdocinfofactory)
- $reflectionResolver (\rector\core\reflection\reflectionresolver)
- $typeKind (\string)
- $aliased_classes (\array)
- $authComponent (\authcomponent)
- $suppressed_issues (\array)
- $uniqueName (\string)
- $handler_id (\string)
- $a_adt (\iladt)
- $default_renderer (\ilias\ui\renderer)
- $coreRegistry (\magento\framework\registry)
- $fetchStrategy (\magento\framework\data\collection\db\fetchstrategyinterface)
- $moduleDataSetup (\magento\framework\setup\moduledatasetupinterface)
- $telemetryInfo (\phpunit\event\telemetry\info)
When the parameter name is a noun, a trailing ‘s’ usually leads to an array ($aliased_classes, $sessionOptions). When the parameter name includes a verb, it is a boolean ($cloneArguments, $dumpKeys).
Booleans express intent, using small words: $isAbsolute, $forConstructor, $noBuiltin. In that light, $willBeAvailable stands out as an exception, being a callable.
The string type covers a lot of nouns: $handler_id, $uniqueName, $bookSlug, $requestUid, $hashedPassword (for that last one, both password and hashed also hint at a string).
A total of 258 such parameter names were detected; the list above shows 100 of them.
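These observations can be turned into a deliberately naive guesser. The sketch below is hypothetical: guessTypeFromName() and its word lists are assumptions drawn from the names above, not an Exakat feature, and it requires PHP 8 for str_ends_with(). It shows how far names alone can go, and where they fail.

```php
<?php

// Hypothetical heuristic: guess a parameter type from its name alone,
// using the patterns observed above. Returns null when no rule applies.
function guessTypeFromName(string $name): ?string
{
    // Split camelCase and snake_case into lowercase words:
    // 'isConstructorArgument' becomes ['is', 'constructor', 'argument'].
    $parts = preg_split('/(?<=[a-z0-9])(?=[A-Z])|_/', $name);
    $words = array_values(array_map('strtolower', array_filter($parts, 'strlen')));
    if ($words === []) {
        return null;
    }

    // A leading small word or verb expresses intent: assume a boolean.
    // This word list is an assumption, not an exhaustive rule.
    $intentWords = ['is', 'has', 'for', 'no', 'by', 'use', 'keep', 'call',
                    'clone', 'dump', 'show', 'return', 'include'];
    if (count($words) > 1 && in_array($words[0], $intentWords, true)) {
        return 'bool';
    }

    $last = $words[count($words) - 1];

    // A trailing plural noun suggests a collection: assume an array.
    if (strlen($last) > 2 && str_ends_with($last, 's') && !str_ends_with($last, 'ss')) {
        return 'array';
    }

    // Identifier-like nouns suggest a string.
    if (in_array($last, ['id', 'uid', 'name', 'slug', 'password', 'path'], true)) {
        return 'string';
    }

    return null;
}

foreach (['isRoot', 'dumpKeys', 'aliased_classes', 'bookSlug', 'willBeAvailable'] as $name) {
    // Prints bool, bool, array, string, then unknown for $willBeAvailable.
    echo $name, ' => ', guessTypeFromName($name) ?? 'unknown', PHP_EOL;
}
```

As expected from such heuristics, it misfires on names from the same list: $internalErrors and $altNumbers are booleans, yet their plural ending points to array.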
You never know what is in there
On the other end of the spectrum, there are parameters which may be, well, basically anything. Some of them have been detected with over a thousand different types across all their usages. Here is their ranking, by number of different types detected.
- $postBody (2352)
- $event (1654)
- $request (853)
- $command (717)
- $node (550)
- $config (462)
- $query (395)
- $subject (347)
- $parent (302)
- $item (300)
- $requestBody (300)
- $result (299)
- $context (294)
- $value (290)
- $entity (283)
- $object (270)
- $a (261)
- $type (256)
- $user (251)
- $factory (248)
- $model (247)
- $b (234)
- $data (232)
- $message (231)
- $source (231)
- $options (229)
- $e (212)
- $repository (211)
- $response (207)
- $client (198)
- $field (197)
- $other (191)
- $manager (186)
- $filter (172)
- $warning (171)
- $container (169)
- $provider (168)
- $collection (168)
- $service (163)
- $cache (162)
- $extensionAttributes (161)
- $file (154)
- $action (153)
- $element (152)
- $dao (152)
- $handler (150)
- $c (150)
- $image (149)
- $parser (146)
- $validator (145)
- $configuration (144)
- $resource (143)
- $params (141)
- $builder (140)
- $exception (137)
- $services (135)
- $target (134)
- $metadata (131)
- $storage (130)
- $connection (128)
- $id (124)
- $form (117)
- $component (117)
- $child (116)
- $req (113)
- $logger (111)
- $status (111)
- $loader (111)
- $token (110)
- $page (110)
- $entry (110)
- $group (108)
- $inst (108)
- $definition (107)
- $document (107)
- $input (106)
- $template (106)
- $table (103)
- $error (102)
- $rule (102)
- $settings (100)
- $generator (97)
- $registry (96)
- $class (95)
- $stmt (95)
- $repo (93)
- $from (93)
- $key (92)
- $reader (92)
- $resolver (92)
- $category (92)
- $formatter (92)
- $controller (88)
- $instance (88)
- $expected (88)
- $n (88)
- $helper (88)
- $property (88)
- $session (88)
- $name (85)
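Behind such a ranking is a simple aggregation: for each parameter name, count the distinct types it is declared with. A minimal sketch, assuming the (name, type) pairs have already been extracted from the code; the helper rankByTypeVariety() and the sample pairs are made up for illustration, not taken from the audit. Type names are lowercased before counting, since PHP class names are case-insensitive.

```php
<?php

// Hypothetical aggregation: given (parameter name, declared type) pairs,
// count how many distinct types each name carries, most varied first.
function rankByTypeVariety(array $pairs): array
{
    $typesByName = [];
    foreach ($pairs as [$name, $type]) {
        // Using the lowercased type as a key deduplicates repeated types.
        $typesByName[$name][strtolower($type)] = true;
    }

    $counts = array_map('count', $typesByName);
    arsort($counts); // highest count first, keys preserved

    return $counts;
}

// Made-up sample, for illustration only.
$pairs = [
    ['name', 'string'],
    ['name', 'App\\Model\\PersonName'],
    ['name', 'app\\model\\personname'], // same class, different case
    ['name', 'App\\Model\\BrandName'],
    ['weak', 'bool'],
    ['weak', 'bool'],
];

// Returns ['name' => 3, 'weak' => 1].
print_r(rankByTypeVariety($pairs));
```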
To reach 85 types, $name had to use more than scalar types: string is expected (at least, by me), but many other classes and interfaces are used to encapsulate what a name is. Since names are a very common concept, used to distinguish people, services, brands, models, Debian versions and more, it is common that $name requires disambiguation (dixit Wikipedia).
Such parameter names should be avoided: they are quite generic, and raise questions such as “how do I concatenate this name into a string?”, or about other common, seemingly obvious usages.
Indeed, some parameter names are very generic, and naturally lead to many types: $name, $param (sic), $instance, $collection, $factory, $entity, $other. $a, $b, $c and $n are back in the list, just like $e: this last one is used for catching exceptions, which leads to many different types when it is passed on to other methods as a parameter.
Naming, types and semantics
Semantic typing establishes a direct relation between types and the words used to build a parameter name. This is an old practice, dating back to the early days of PHP: to keep the code readable, the type was ingrained in the variable name. That was an era with fewer features than today; for instance, scalar type declarations only arrived with PHP 7.0.
Nowadays, types have a life of their own, and yet this antiquated behavior is still alive. It is possible to guess the type of a variable simply by reading its name aloud. It is also possible to recognize that another variable might have a very wide range of types, and should not be expected to be one or the other. And after discovering which type is actually used, one still has to read the rest of the documentation to know how to display a simple $path, when it turns out not to be a string.
The most common types are definitely the scalar ones, which are native to every recent PHP version. This analysis covers a vast array of PHP projects with various backgrounds, including their underlying frameworks. Each of them introduces specific classes to support specific concepts, such as URLs or names. It would be interesting to see how semantic typing applies within each community.