We’ve all been there. Just about anyone who writes PHP code has, at some point, encountered some that was written several years ago (or, worse, recently) that, instead of some more appropriate data structure, uses arrays or arrays-of-arrays (and, if you’re really unlucky, arrays-of-arrays-of-arrays) to represent just about anything under the sun. Got a row from the database? That’s an array. Got a configuration item? That’s an array. Got something that can’t be represented as an array? Well, someone’s probably going to try and stuff it into one anyways.
Now, before going any further, I’m going to caveat this quite heavily. PHP arrays are not, in and of themselves, a Bad Thing™. The problem arises from the PHP community’s historical and, in many ways, ongoing abuse of arrays as its sole data structure. In a language that got decent support for object-oriented programming comparatively late in its lifecycle, and had low community adoption of it for many years after that, the instinct to see arrays as just about the only tool in the toolbox is very understandable. Problem is, times have changed: virtually nobody uses PHP 4 anymore, and even PHP 5, which has proper objects, is hitting end-of-life. And, in the modern day, writing or maintaining an application that uses arrays of arrays of primitives as its core datatype is an exercise roughly akin to pulling teeth sans anaesthetic.
I’m also going to leave a short note for the theoretical purists in the room (of whom I, myself, am one). Though I am well aware of the difference between arrays and maps, PHP swirls them together into a single primitive type, so I use the term “array” exclusively here to avoid any ambiguity.
So How the Heck Did We Get Here, Anyways?
While anyone who’s been in the PHP game long enough probably knows the answer to this anyways, I’m gonna give a short history of arrays and complex datatypes in PHP for those who may have come in later and are wondering why their particular legacy app sucks so much.
Well, the short version is that, prior to PHP 5 (initial release: 2004), objects (and object-oriented code) were essentially crudely-fashioned bolt-on features to the language. There were no interfaces, no way to declare anything as being private or protected, and basically none of the more powerful object-oriented abstractions that we take for granted in languages like Hack, Java, and C++. This, ultimately, made PHP 4 objects little better than repositories for procedural code, and frameworks and tools from this era tended to reflect this. This was true even after the release of PHP 5, as PHP 4 continued to have a dominant effect on how the community thought and approached problems (and even outlived a couple of minor versions of PHP 5).
But what does all this have to do with arrays, you may ask? The answer’s actually pretty simple — if your objects are basically glorified arrays anyways, you might as well dispense with the overhead and just write old-school procedural code using language primitives as your main datatype, maybe using objects for some minor attempt at code organization. Many frameworks (and many programmers) from the mid-to-late 2000s took this approach, in part inspired by Ruby on Rails and its ActiveRecord-style design. Among the most famous of these is of course the ubiquitous CakePHP, which, in its early years, essentially enforced the use of associative arrays as a core data structure for projects using it.
But I Still Don’t See the Problem — What’s Wrong With Arrays, Exactly?
Like I’ve said previously, there’s nothing wrong with arrays when used judiciously and in the correct circumstances. If you need to provide a collection of objects (or primitives) of the same type, then an array is an appropriate data structure to use. But those are essentially the only scenarios in which arrays are the right choice. Using them in other places almost inevitably makes for incredibly painful maintenance, especially a year or two down the line. Here’s a few (annotated) examples of programming-with-arrays code smells:
- Checking if an array contains a certain key or keys with isset(): if you’re doing this, odds are that your “array” has a complex enough set of behaviours around it that it should actually be an object with instance methods.
- Nested associative arrays: Using a map is fine, if you’re trying to refer to items of the same type by keys. But, if you’ve got nested arrays in PHP, your values are AUTOMATICALLY and IRREVOCABLY no longer of the same type, and a whole lot of additional developer effort must be expended on defensive programming to ensure they’re at least similar enough.
- Parallel arrays: This one’s almost inevitably a dead giveaway you’re doing something wrong. If you’re passing multiple arrays into a function, each of which contains corresponding keys and values, then you really ought to create some proper objects to manage those values (and the behaviours that affect them).
Starting to see the connection yet? The problem with arrays in PHP, ultimately, is loose typing: the inability to guarantee that one member of the array will be the same as any other (at least in a programmatic sense), forcing all sorts of defensive-programming shenanigans in order to ensure you won’t get an int or null or nonexistent key where there really shouldn’t be one. In code that uses “naked” arrays as its core data structure, I’ve often seen 50–75% of the code flow end up “defensive”, i.e. the developer trying to ensure that the array does, in fact, match their preconception of what it should contain. All this extra validation code, to do something that can be done by the language runtime, produces huge maintainability headaches and code-readability issues, no matter how well you structure it.
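To make that concrete, here’s a rough sketch of the kind of defensive code this tends to produce. The `orderTotal()` function and the shape of `$order` are hypothetical, invented purely for illustration:

```php
<?php
// Hypothetical example: $order is a "naked" nested array, so every access
// has to be guarded by hand before it can be trusted.
function orderTotal(array $order)
{
    if (!isset($order['items']) || !is_array($order['items'])) {
        return 0.0;
    }

    $total = 0.0;
    foreach ($order['items'] as $item) {
        // Each row could be missing keys or hold values of the wrong type.
        if (!is_array($item)
            || !isset($item['price'], $item['quantity'])
            || !is_numeric($item['price'])
            || !is_int($item['quantity'])
        ) {
            continue; // bad rows are silently skipped
        }
        $total += (float) $item['price'] * $item['quantity'];
    }

    return $total;
}

$order = [
    'items' => [
        ['price' => 9.99, 'quantity' => 2],
        ['price' => null, 'quantity' => 1], // fails the isset() check
        'oops',                             // not even an array
    ],
];

echo orderTotal($order), PHP_EOL;
```

Note that only one line of the loop body does actual work; the rest exists to check that the array looks the way the author hoped it would.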
OK, I See Your Point. But What Can I Do About It?
Fortunately, there are multiple ways to deal with this when writing code. The 0th step, however, is both required and straightforward: stop using arrays unless they’re used as collections of a non-array type. That’s what arrays are typically for, and using them as collections of arrays is probably how you got into this mess in the first place. Obviously there are exceptions to this, but the general rule is that you should think twice before typing array() or [].
OK, but what next?
Good question. The first step is to conceive (or, in the case of existing projects, re-evaluate) your data structures, and the ways they interact with the application, in terms of objects. And no, not just objects where every value is public and can be modified by every method it passes through. Instead, at the very minimum, provide as few getters and setters as possible for the behaviours your object needs to provide. Above all, though, encapsulation is king — providing a single, defined type for a given data structure, with a defined set of behaviours that follow it around (rather than being globally defined somewhere).
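As a minimal sketch of what that might look like, here’s a small value object standing in for something like `['amount' => 1999, 'currency' => 'USD']`. The `Money` class and its methods are hypothetical, not taken from any particular library, and `intdiv()` assumes PHP 7 or later:

```php
<?php
// Hypothetical value object: the data and the behaviour that goes with it
// live in one place, and nothing outside the class can modify its state.
final class Money
{
    private $amount;   // integer minor units (e.g. cents)
    private $currency; // three-letter currency code, e.g. "USD"

    public function __construct($amount, $currency)
    {
        if (!is_int($amount) || $amount < 0) {
            throw new InvalidArgumentException('Amount must be a non-negative integer.');
        }
        if (!is_string($currency) || strlen($currency) !== 3) {
            throw new InvalidArgumentException('Currency must be a three-letter code.');
        }
        $this->amount = $amount;
        $this->currency = strtoupper($currency);
    }

    // Returns a new instance instead of modifying this one.
    public function add(Money $other)
    {
        if ($other->currency !== $this->currency) {
            throw new InvalidArgumentException('Cannot add different currencies.');
        }
        return new Money($this->amount + $other->amount, $this->currency);
    }

    public function format()
    {
        return sprintf('%s %d.%02d', $this->currency, intdiv($this->amount, 100), $this->amount % 100);
    }
}

$total = (new Money(1999, 'usd'))->add(new Money(501, 'USD'));
echo $total->format(), PHP_EOL;
```

The validation happens once, in the constructor, so any code that receives a `Money` can rely on it being well-formed without re-checking.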
After that, it’s time to review how your data structures are actually passed around, and where you can enforce some language-level constraints. PHP 5.0 added what were then known as type hints for just this reason — to allow the language runtime itself to verify that, yes, it was getting the correct input. PHP 7.0 took this a step further, adding what are known as scalar type hints, which do the same thing for primitive types (e.g. ints and strings). So, what’s to be done? Simple: type hints, everywhere. Not only does it make your code more self-documenting, it prevents whole classes of bugs and eliminates a lot of the requirement for defensive programming, through the structure of the language itself.
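A short sketch of the idea, assuming PHP 7.0 or later. The `Customer` class and `greet()` function are hypothetical names used only for illustration:

```php
<?php
// Hypothetical class and function, used only to show type declarations.
final class Customer
{
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function name(): string
    {
        return $this->name;
    }
}

// The runtime checks both arguments and the return value for us.
function greet(Customer $customer, int $visits): string
{
    return sprintf('Hello %s, this is visit #%d', $customer->name(), $visits);
}

echo greet(new Customer('Ada'), 3), PHP_EOL;

try {
    // An array that merely looks like a customer is rejected outright.
    greet(['name' => 'Ada'], 3);
} catch (TypeError $e) {
    echo 'Rejected by the runtime: ', get_class($e), PHP_EOL;
}
```

One caveat: class type hints are always enforced strictly, but scalar hints will coerce compatible values (say, the string '3' to an int) unless the file opts in with `declare(strict_types=1);`.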
Now, I’ll admit that there’s one particularly annoying bit about doing this: collections. It’s often a good idea to enforce that a function should take a collection of a certain type, but as of PHP 7.2, the strongest type hint you can give for such an argument is simply that it should be an array. This, in and of itself, is not sufficient for many purposes (see previous). Fortunately, there’s a few different ways to deal with this, depending on your precise use cases.
Variadic Function Arguments and the “Splat” operator
First off, if your function only needs to accept a single collection of a given type, in PHP 5.6 and better you can employ variadic (variable-length) function arguments and the “splat” (array unpacking) operator:
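A minimal sketch of what that might look like follows. The `LineItem` class and `sumItems()` function are hypothetical, and the scalar and return type declarations assume PHP 7.0 or later:

```php
<?php
// Hypothetical item type, used only for illustration.
final class LineItem
{
    private $cents;

    public function __construct(int $cents)
    {
        $this->cents = $cents;
    }

    public function cents(): int
    {
        return $this->cents;
    }
}

// Every argument collected into $items must be a LineItem.
function sumItems(LineItem ...$items): int
{
    $total = 0;
    foreach ($items as $item) {
        $total += $item->cents();
    }
    return $total;
}

$items = [new LineItem(250), new LineItem(1000)];

// The splat operator unpacks the array into individual arguments.
echo sumItems(...$items), PHP_EOL;
```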
This has the upside that it’s easy and, in PHP 7.0 and greater, can also deal with scalar types. The downside is that you can only employ one of these in a given function’s argument list, and it has to be the last thing in that list. If it’s not, the result is ambiguous (and, in PHP, gives you a compile-time fatal error). It also requires you to remember to use the splat operator in any invocations of your function, though this is only a small minus as, in PHP 7 and later, you’ll get a TypeError pretty quickly if you’re just passing a naked array.
Custom Collection Types
The alternative is, of course, custom-defined collection types. These classes will typically implement ArrayAccess, and provide a strongly-typed collection of a given type. They can then, of course, accept just the type they’re meant to contain, and nothing else. This method, unfortunately, has the downside of requiring, in many cases, a fair amount of boilerplate code. To give a sense of it, here’s a simple example of a strongly-typed collection in PHP:
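As a minimal sketch (the `LineItem` and `LineItemCollection` names are hypothetical, and the `void` return type assumes PHP 7.1 or later), such a class might look something like this:

```php
<?php
// Hypothetical item type, used only for illustration.
final class LineItem
{
    private $cents;

    public function __construct(int $cents)
    {
        $this->cents = $cents;
    }

    public function cents(): int
    {
        return $this->cents;
    }
}

// A collection that can only ever hold LineItem instances.
final class LineItemCollection implements ArrayAccess, IteratorAggregate
{
    /** @var LineItem[] */
    private $values = [];

    public function __construct(LineItem ...$values)
    {
        $this->values = $values;
    }

    public function offsetExists($offset): bool
    {
        return isset($this->values[$offset]);
    }

    public function offsetGet($offset): LineItem
    {
        if (!isset($this->values[$offset])) {
            throw new OutOfBoundsException("No item at offset {$offset}.");
        }
        return $this->values[$offset];
    }

    public function offsetSet($offset, $value): void
    {
        // The interface fixes this signature, so the type is checked by hand.
        if (!$value instanceof LineItem) {
            throw new InvalidArgumentException('Only LineItem instances are allowed.');
        }
        if ($offset === null) {
            $this->values[] = $value; // handles $collection[] = $item
        } else {
            $this->values[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->values[$offset]);
    }

    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->values);
    }
}

$cart = new LineItemCollection(new LineItem(250));
$cart[] = new LineItem(1000);

$total = 0;
foreach ($cart as $item) {
    $total += $item->cents();
}
echo $total, PHP_EOL;
```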
This class can, because it implements ArrayAccess, be accessed just like an array can. Implementing IteratorAggregate then allows you to plug it into a foreach() loop without any issues, and you’ll get every item from $values in turn. But let’s face it, writing custom collections is kind of a pain in the behind, and even though I prefer this pattern when writing strongly-typed PHP code I still don’t like using it. One way to make this kind of boilerplate less painful is to abstract it out into a base class, and I highly recommend doing so, as this pattern provides the strongest type guarantees for collections.
There is also, for some use cases, a third option: generators. Since PHP 5.5, the language has included support for constructs like this:
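A minimal sketch, using a hypothetical `evenNumbers()` function (generators themselves work from PHP 5.5, but the type declarations here assume PHP 7.0 or later):

```php
<?php
// Hypothetical generator: yields the first $limit even numbers, one at a time.
function evenNumbers(int $limit): Generator
{
    for ($i = 0; $i < $limit; $i++) {
        // Execution pauses here until the caller asks for the next value.
        yield $i * 2;
    }
}

foreach (evenNumbers(5) as $n) {
    echo $n, ' ';
}
echo PHP_EOL;
```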
Now, a generator isn’t like an array. It can be plugged into a foreach() like an array can, but that’s about where the similarity ends. What a generator is, ultimately, is a single stack frame (like a function call), but frozen in time. Every time the next value is retrieved, the stack frame is retrieved from memory and advanced until it hits the next yield. What this means in practical terms is that generators are PHP’s model of lazy evaluation — you can provide a theoretically infinite number of elements at a constant memory cost. And, because a generator is immutable after being created, you can provide much stronger guarantees about what it’s returning, without requiring type hints. The big downside to generators is that they’re not “rewindable” the same way arrays and custom collections are — once you’ve iterated over the generator, it’s gone for good, and can’t be iterated over again.
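Both properties can be seen in a short sketch. The `naturals()` and `letters()` functions are hypothetical, and the return type declaration assumes PHP 7.0 or later:

```php
<?php
// Hypothetical infinite generator: only one value exists at any given time.
function naturals(): Generator
{
    $n = 1;
    while (true) {
        yield $n++;
    }
}

// Lazy evaluation: take what is needed, then stop asking.
$firstThree = [];
foreach (naturals() as $n) {
    if ($n > 3) {
        break;
    }
    $firstThree[] = $n;
}

// Hypothetical finite generator, to show what happens on a second pass.
function letters()
{
    yield 'a';
    yield 'b';
}

$gen = letters();
$firstPass = [];
foreach ($gen as $letter) {
    $firstPass[] = $letter;
}

try {
    foreach ($gen as $letter) {
        // never reached: the generator has already run to completion
    }
    $secondPass = 'allowed';
} catch (Exception $e) {
    $secondPass = 'refused';
}

echo implode(',', $firstThree), ' | ', implode(',', $firstPass), ' | ', $secondPass, PHP_EOL;
```

If the values need to be iterated more than once, the usual workaround is to call the generator function again for a fresh generator, or to copy the values out with `iterator_to_array()` and accept the memory cost.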
OK, So Where Does That Leave Me?
Well, that’s really it. There’s no “best way” to deal with a legacy app that uses arrays everywhere. Sorry. What you can do, though, is to avoid using arrays-of-arrays wherever possible, and consider other data structures instead. I’ve presented a few options that have served me well, but it’s really up to you and whatever corporate overlords you serve to decide which ones are best for you and your project. Good luck!