Committing The Composer Vendor Directory

13th September 2020

When installing composer dependencies those dependencies are downloaded and stored in the 'vendor' directory. There are options available to install this into a different location than the composer.json file, but it's generally found in the same directory.

The vendor directory contains quite a bit of code, which is especially the case if your project contains quite a lot of dependencies. More often than not though, this directory will not contain any code that you have actually written. It will contain the necessary third party libraries that allow your code to work correctly.

I have always been told not to commit the vendor directory, and I have followed this advice for a number of years. This makes sense when building open source libraries as it can quickly create confusion if your library contains a popular library and that library is included again by other libraries. Having a codebase with different versions of libraries can quickly lead to confusion and even disaster.

But what about projects? Recent discussions with some other developers have led me to question whether committing the vendor directory is always a bad idea with projects, so I thought I would go through the pros and cons of doing this. Whilst committing the vendor directory to your codebase is a terrible idea for libraries, it is not so clear cut when building a project using those libraries. In this instance there would be only one vendor directory in the codebase, and the end goal (be it a website or other tool) needs to contain those libraries in order to function.

Committing The Vendor Directory

According to the composer website, best practice for composer based projects is to not commit the vendor directory. The site states that committing the vendor directory can lead to the following problems.

  • Large VCS repository size and diffs when you update code.
  • Duplication of the history of all your dependencies in your own VCS.
  • Adding dependencies installed via git to a git repo will show them as submodules. This is problematic because they are not real submodules, and you will run into issues.

Let's visit each of these items in turn.

Large VCS repository size and diffs when you update code.

A large repository size isn't really the end of the world as storage is fairly cheap, but does it really bloat things that much? To test this I looked at my own (Drupal 9) site codebase, which doesn't contain any vendor directory or third party code (other than some configuration items). If I check out my site project into an empty directory the entire project is around 90MB. After running a composer install this becomes 150MB. Whilst this is an increase, it's still not causing many problems as none of the individual files are larger than 2MB (the largest file being a font file). It is still bad practice to commit things like database backups to your repo as that tends to make git and sites like GitHub complain quite a bit.

The diffs though, that is a problem for me. Let's say that you are running a typical Drupal site with 20 to 30 contributed modules or a Symfony project with a few dependencies. Any updates to those modules would constitute changes in your repo, and your commits would be the change to the composer.lock file and a whole load of code that you didn't write and don't maintain. The value of seeing all this code, for me at least, is somewhat minimal. Whilst it is a good idea to keep an eye on what your dependencies are doing, you should be far more interested in the code you and your team are adding to the system. Custom code is where most of your bugs or security flaws will come from.

Duplication of the history of all your dependencies in your own VCS.

This isn't a problem in itself, but ties in closely with the first point. I think they are talking about libraries rather than projects. Committing vendor does make it difficult to look at the history of the project and see what is going on with your custom code. Comparing one version with another will introduce a lot of different changes that you didn't write.

Adding dependencies installed via git to a git repo will show them as submodules. This is problematic because they are not real submodules, and you will run into issues.

This can be a real problem. The issue they are talking about here is when composer downloads the dependency as a git repository, which means that your vendor directory contains a secondary git repository. Git tends to treat this as a submodule and gets a little confused. What tends to happen is that if you check out your git repository elsewhere the vendor directory will contain a broken git submodule. This has stung me a few times when sharing code with colleagues or even deploying the code. Before I found out about this problem I spent a long time scratching my head as to why my local contained the code, but the development site didn't.

Whilst this is painful, and has been a stumbling block for many developers, there are a couple of ways around this. Firstly, you can use the --prefer-dist composer flag when requiring or installing packages. This tells composer to download an archived version of the dependency where one is available, rather than cloning the source repository.

Also, by adding a post-update-cmd and post-install-cmd to your composer.json file you can force delete any downloaded git repositories before you attempt to commit them.

"scripts": {
    "post-update-cmd": ["echo Delete all .git directories in vendor.", "rm -rf vendor/**/**/.git"],
    "post-install-cmd": ["echo Delete all .git directories in vendor.", "rm -rf vendor/**/**/.git"]
},
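
The glob in the scripts above relies on how the shell expands `**`; in shells without recursive globbing it usually behaves like `*`, so only .git directories exactly two levels below vendor are matched. As a hedged alternative, here is a minimal sketch that uses find to remove nested .git directories at any depth. The function name `remove_nested_git_dirs` is hypothetical and not part of composer.

```shell
# Hypothetical helper: delete every nested .git directory under a vendor directory.
remove_nested_git_dirs() {
    vendor_dir="${1:-vendor}"
    # Nothing to do if the directory does not exist.
    [ -d "$vendor_dir" ] || return 0
    # -prune stops find descending into a .git directory it is about to delete.
    find "$vendor_dir" -type d -name .git -prune -exec rm -rf {} +
}
```

This could be saved in a script and called from the post-install-cmd and post-update-cmd entries in place of the rm command.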

Aside from those points, I think another problem is the committing of machine generated code. For example, let's say that a developer starts working on a project and needs to include another dependency. They run composer require to install the dependency and find that, as well as the new package, a load of other files have changed. Composer has a few internal files that it uses to map where things are within the vendor directory, and any updates or additions to the packages update these files. The developer would naturally commit these files as they were part of the change they just introduced. These changes would then form part of any pull request made to the project. Are these changes to composer generated files useful to inspect? Should they be included in code analysis tools? I don't think so.

There is also the problem of merging code. With lots of dependencies committed to your repo, merging code into branches becomes a lot more tricky if a merge conflict happens. Without the vendor folder the only conflicts you'll find are in your composer.json and composer.lock files, which are relatively easy to solve. With the source code of your packages present it can make for a more complex merge problem. I have often found that developers will see this merge conflict and attempt to solve it, but end up committing a broken or even a different version of the dependency to the source code. This can lead to your composer.lock file being out of sync with your vendor directory, which is quite dangerous really.

Finally, there is a difference between your dependencies and your development dependencies. When you commit your vendor directory it means that it is locked in. Because you are probably working locally on the project, you will likely install all dependencies, including the development ones. It is quite normal to run a deployment using composer install with the --no-dev flag.

composer install --no-dev

Doing this prevents packages that aren't meant to be deployed to production being deployed into the live site. If, however, you have committed your vendor directory then you have no choice. All of your code must be deployed at the same time. Whilst this might not seem like a problem, there have been a few major security flaws over the years that were caused by code in development modules. Packages like phpunit and the Drupal Coder module had flaws that meant just being present on the server gave attackers a way of compromising the server and as they were development packages they shouldn't really have been deployed in the first place.

There are some positives to this approach though. It is easier to swap between versions as all of your dependencies are included in the codebase. If you have a problem you can revert to a previous version of the code easily without having to run composer install to get your dependencies back. This makes operations like git bisect a little easier to perform.

To sum up:

❌  Commits are messy and filled with third party code.

❌  Messy pull requests.

❌  Machine generated composer code that changes between versions and platforms.

❌  Merge conflicts are harder to solve.

❌  Composer.lock file doesn't always show what's in your system.

❌  No difference between dependencies and development dependencies.

✅  Easier to swap between versions.

✅  All code is in the repository and can be inspected.

✅  Easy to deploy.

 

Not Committing The Vendor Directory

The alternative to committing the vendor directory is, funnily enough, to not commit the vendor directory. This is considered the best practice approach to using composer and essentially means creating your project with the following .gitignore file.

vendor

If you are running a Drupal or WordPress site you'll likely have some other entries in there, but the core idea is that no third party dependency code is committed to your repository. When you require a new dependency the only things that change are the composer.json and composer.lock files. When you update a dependency the only thing to change is the composer.lock file.
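
A quick way to confirm that an ignore rule like this is doing its job is git's check-ignore command. The following is a minimal sketch; the helper name `is_ignored` is hypothetical, and it assumes the path being checked is not already tracked by git.

```shell
# Hypothetical helper: succeed only if git ignores the given path in the given repository.
is_ignored() {
    repo_dir="$1"
    path="$2"
    # check-ignore exits 0 when the path matches an ignore rule, 1 when it does not.
    git -C "$repo_dir" check-ignore -q "$path"
}
```

For example, `is_ignored . vendor/autoload.php` should succeed in a project that uses the .gitignore file above.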

This lack of third party code means that your repository is smaller, but it does mean that when you want to work on the project you need to run composer install to download and install all of the dependencies. The same applies if you are installing locally or deploying to production. Installing dependencies this way is fine, although occasionally I have found that composer install will fail for random reasons. Things like GitHub being down or just random network issues can cause composer installs to fail. When this happens locally it's annoying; when this happens during a deployment, the deployment will fail. You will need a robust deployment process in order to allow for things like composer failing to install things.
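
One simple way to soften transient failures like these is to retry the install a few times before giving up. This is only a sketch of that idea, not a feature of composer itself; the `retry` function and its argument order (maximum attempts, delay in seconds, then the command) are assumptions made for illustration.

```shell
# Hypothetical helper: run a command up to max_attempts times, pausing between failures.
retry() {
    max_attempts="$1"
    delay_seconds="$2"
    shift 2
    attempt=1
    while true; do
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt failed, retrying in ${delay_seconds}s..." >&2
        sleep "$delay_seconds"
        attempt=$((attempt + 1))
    done
}
```

A deployment script could then call something like `retry 3 10 composer install --no-dev` and only fail the deployment if every attempt fails.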

I have also heard that installing composer dependencies can annoy sysadmins as they then have to deal with the bandwidth that this uses up. I see little difference between downloading composer dependencies from source or downloading them as part of the repo. You would still need to download the full size of the repository. In my opinion you shouldn't be using your production server to install composer dependencies; this should be done on a build server and an artefact deployed to the production environment. The extra strain that running composer install puts on a production environment is absolutely not worth it.

Since you need to create a deployment process you can create one that doesn't deploy your development dependencies to production. Not only does this reduce the overall size of the project but it can prevent some security problems caused by third party testing and development code being present on production servers. If you want to install just the production composer packages make sure you run composer install with the --no-dev flag.

composer install --no-dev

Additionally, you can add the --prefer-dist flag in order to further speed up the install process. Using --prefer-dist means that composer will try to download and unzip archives of the dependencies using GitHub or another API when available. This is faster than cloning in most cases. This method does not clone the whole repository, so you don't get the whole history of the project when you just need a specific version. Also, archives tend to be prepared in such a way that they don't contain files that aren't needed in dependencies, like .gitattributes for example.

This all goes towards having a robust deployment process that will handle any situation or problems that are present in the composer landscape.
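
To make the build server idea a little more concrete, here is a minimal sketch of a build step that installs only production dependencies and packages the result as an artefact. The function name `build_artifact`, the use of tar, and the extra --no-interaction and --optimize-autoloader flags are my own assumptions rather than anything required by composer; the artefact path should be an absolute path outside the project directory.

```shell
# Hypothetical build step: install production dependencies and package the result.
build_artifact() {
    project_dir="$1"
    artifact="$2"
    (
        cd "$project_dir" || exit 1
        # --no-dev skips require-dev packages; --prefer-dist prefers archives over git clones.
        composer install --no-dev --prefer-dist --no-interaction --optimize-autoloader || exit 1
        # Package everything except the project's own .git directory for deployment.
        tar --exclude=./.git -czf "$artifact" .
    )
}
```

The resulting archive can then be copied to the production environment and unpacked there, so production never needs to run composer at all.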

What about reviewing all of the code in third party dependencies? Whilst this is a laudable act, I feel sorry for the developer who needs to review a pull request with thousands of lines of code containing a new third party dependency. It would be very easy to miss the security problem introduced by a single line of custom code buried in this mess. With the vendor directory not in the pull request it would be very easy to review the custom code and spot the problem.

It is actually possible to prevent security problems in composer packages without having to manually inspect every single line of code thanks to the security-advisories project. This is a packagist.org project that doesn't actually contain any code. It adds a composer dependency to your project that conflicts with any version of a package that has a known security problem, which prevents you from installing it. Just include it in your project like this:

composer require --dev roave/security-advisories:dev-latest

Now, if you try to install a dependency with a known security issue you would get a conflict and composer would simply refuse to install that package.

If you are using Drupal you should know that Drush also has a check you can use to spot modules that have a security problem, just run the following command to see if there are any security updates.

drush pm:security

You can tie this into your build process as it will return a fail status if there are security releases available.
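
As a rough illustration of tying the check into a build, the sketch below fails the build when the drush command exits with a non-zero status. The function name `security_gate` and the messages are hypothetical; the only assumption about Drush is the behaviour described above.

```shell
# Hypothetical CI gate: fail the build when Drush reports outstanding security releases.
security_gate() {
    if drush pm:security; then
        echo "No security updates reported."
    else
        echo "Security updates are available; failing the build." >&2
        return 1
    fi
}
```

Calling `security_gate || exit 1` early in a build script stops the deployment before any artefact is produced.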

I have also heard developers talk about third party code 'going missing', and what safeguards are in place for this. For example, let's say you were using a Drupal module and one day the maintainer of that module decided that they didn't want to maintain it and removed the module from the site. Now, when you run a composer install you will get a failure because of that missing module. This could be a Symfony component or other dependency, but the point stands that it's a possibility that a developer might remove their code from the internet and break your site. With the vendor directory not committed the code in that dependency is simply gone. In my opinion, however, the chances of this actually happening are very small. If this does happen then the code in that module or component essentially becomes custom code and you should treat it as such. Find a backup of your site (you are taking backups, right?) and move that code from the vendor directory into your src directory or wherever you are storing custom code. With no third party maintainer to maintain that project you are now responsible for that code as if it was your own.

To sum up:

❌  Less easy to see what third party code is actually in your project.

❌  Composer install can fail if any of the dependency sites are down.

❌  Dependencies can go missing.

❌  Need to set up managed deploys.

✅  Managed deploys are more robust.

✅  Merge conflicts are easier to solve.

✅  Cleaner commits.

✅  Cleaner pull requests.

✅  No development dependencies on production.

 

Conclusion

When I started thinking about this I realised that the reality of the situation isn't as clear cut as you would expect. Both methods have their pros and cons but in my experience committing the vendor directory has led to more problems.

I have been involved in projects in the past that have used both systems. When I first started using Drupal 8 I didn't have a deployment system set up, so I opted to commit the vendor directory so that deployments could mimic the Drupal 7 deployment style that I already had. Having worked with both styles, I have found that committing the vendor directory leads to way more problems than not committing it. Aside from the potential security risk in committing development dependencies there is also the hassle of just managing the code.

Updates or additions to dependencies can cause a large amount of change, and with the vendor directory included in the repository this makes viewing commits or pull requests quite painful. I completely agree that it is important to understand the code contained in the system, but in reality the code your own team writes is more important and should undergo way more scrutiny. Whilst you should definitely keep up to date with the development of your dependencies and their updates and security releases, most development teams don't have the time to analyse every line of code in their dependencies.

Once I started using a more robust deployment mechanism I removed the vendor directory from the codebase. To my mind, committing the vendor directory feels like a lazy approach that could be solved by using a decent deployment tool (like Deployer for example).

I have also realised that not committing the vendor directory means that your composer files will not lie to you. You can use commands like composer show to see what packages are installed, but I have often found that these reports are just wrong when the vendor directory has been committed. As a test I will delete the vendor directory and run composer install just to make sure that nothing has changed, and I often find that there are significant changes throughout the codebase. These changes can stem from differences in the auto generated files that composer produces, but more often (and more dangerously) the change is in the code of the dependencies. The reasons for these changes are numerous but might be due to improperly managed merge conflicts or a developer committing the vendor directory but not the composer.lock file (which happens!). Try it for yourself. Delete your vendor directory from your repo and run composer install. If nothing changes I will buy you a beer.
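
The test described above can be scripted. This is a minimal sketch, assuming the vendor directory is committed to a git repository and that you run it on a clean working copy, since it deletes the vendor directory. The function name `check_vendor_drift` and its exit codes (1 for drift, 2 for a failed install) are hypothetical.

```shell
# Hypothetical check: reinstall from composer.lock and report whether the committed vendor directory changed.
check_vendor_drift() {
    repo_dir="${1:-.}"
    (
        cd "$repo_dir" || exit 2
        rm -rf vendor
        composer install --no-interaction || exit 2
        # Any output from git status means the reinstalled files differ from what was committed.
        changes="$(git status --porcelain -- vendor)"
        if [ -n "$changes" ]; then
            echo "$changes"
            exit 1
        fi
    )
}
```

If the function prints a list of changed files, the committed vendor directory no longer matches what composer.lock describes.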

I would love some more opinions on this, especially from industry professionals. Please leave a comment and let me know what you think. Do you commit the vendor directory? If so, why? What problems are you trying to solve? Is your team working more efficiently or do you find pull requests full of noise? Please let me know in the comments.

Comments

@Grimreaper,

That is awesome. Thank you for that! \o/

philipnorton42 (Mon, 09/14/2020 - 09:44)
