Performance optimizations with Doctrine

Antonio Turdo
3 min readAug 10, 2021

In Symfony (or plain PHP) projects Doctrine allows you to focus on domain concepts and implement application use cases quickly, letting the library generate both write and read SQL queries.

Lazy loading

One of the mechanisms used by Doctrine is lazy loading. When we access, for the first time, an association present in one of our entities, Doctrine performs, behind the scenes, the db query necessary to retrieve the data and constructs the objects referred by the association.

This approach involves a large number of queries, especially for complex domains, where entities have many associations. The problem is magnified for use cases that operate on a collection of objects, such as a list page.

Processing time can increase due to the large number of queries, but despite this, for many applications this mechanism works reasonably well. In some cases, however, it might be appropriate to perform optimizations, because the processing time become unacceptable or simply to improve the user experience.

Scenario

To make an example, let’s assume that our domain concerns books, and that these are modeled by an entity we show an excerpt.

Optimized query

Our use case requires to retrieve a list of books, and for each of them, we will need both the author and all identifiers. We could use the findBy method of the Doctrine base repository, but to reduce the processing time we write an ad-hoc method:

In this implementation, where we omitted any filtering and paging for simplicity, we added a join to the *-to-one “author” association, and asked Doctrine to hydrate these objects as well, adding the alias of the BookAuthor entity in the from clause.

In this way we reduced the number of queries to only one, while with lazy loading we would have carried out n + 1 queries, where n is the number of books returned by the main query.

Adding also the join towards the *-to-many “identifiers” association would have meant, due to the nature of the relational db and the SQL language, duplication of the data of books and their authors, with a consequent increase in the transfer time from the DB to the application, so it is generally not a good solution.

If it is known that the association has a very low average multiplicity, it is however a hypothesis to be taken into consideration, but unfortunately it is not possible to set a limit value of the average multiplicity below which this solution is more efficient, because it depends on several factors.

In our example, however, we have chosen not to add the join towards the *-to-many association, but in the method presented above we call this other one instead:

The use of “partials” allows us to minimize the amount of duplicate data transferred from the database, while the inclusion of both books and identifiers in the “select” clause ensures that the latter are hydrated and linked to the instances of the books already in memory. It is not even necessary for this method to return results.

Filtered associations

If, for each instance of the main entity, we are interested in a single element (or a very small number of elements) of the *-to-many association, we can rewrite the entire functionality as follows:

In this case we have selected only the ISBN identifiers, eliminating the problem inherent the *-to-many associations highlighted above, since in our domain for each book there is only one ISBN.

The condition that filters the *-to-many association must be included in the join clause. In fact, if we had added it in the where clause, we would have excluded from the query results the books that do not have an ISBN identifier.

It is important to point out that Book instances in memory will be different from those stored in DB, in general a situation to avoid, so it is recommended to adopt this solution only when performance is a critical element of the application. Furthermore, it seems appropriate in these cases to highlight this choice and its consequences with comments on the code.

Partial objects deprecated

Partial objects have recently been deprecated, because usually such objects break invariants, but the use shown here does not present this problem, so Doctrine will probably provide a different syntax to achieve the same result.

Conclusions

When we need to improve performances of read queries with Doctrine, we can replace lazy loading with ad hoc queries that perform joins towards low multiplicity associations, while for higher multiplicity associations, through the use of partial, we can still make it possible to perform, in an efficient way, a single query for each table involved, regardless of the number of objects fetched.

--

--