Rebuilder: How .CO Manages our Website Media Assets

Here at .CO, we’re constantly seeking ways to improve the overall user experience. I believe one of the core components of a good user experience is page load time, both perceived and real, which is why I set out to create an asset management system to fit our needs. When we first began developing the .CO Membership Program website, we knew we were likely to be hit with short bursts of traffic from upcoming blog posts at launch. To minimize page load time, I followed the general best practice of loading all JS dependencies in the footer. We’ve also been adopting the practice of widgetizing all of our JS into small, namespaced components. While this ultimately helps reduce code duplication and ensures we create reusable code, it leads to upwards of 10 JS dependencies on certain pages. We do the same thing with our CSS in an attempt to maintain the smallest set of reusable styling via page components.

Problems with Serving Small Components

The core problem we set out to solve was how to efficiently serve combinations of media assets, which we call bundles, to our users in order to reduce page load time. An easy way to accomplish this is to combine, minify, and compress the files. Serving the fewest number of static files possible has several benefits:

  • Reducing the number of HTTP requests means fewer round trips, less bandwidth, and ultimately faster page loads
  • Reducing the filesize of our assets via minification and gzip compression; smaller files mean faster downloads
  • Browsers are generally configured to cache recently downloaded assets, so on future page requests our cached assets are served quickly

These benefits come at the cost of added complexity in our build process. Several questions come to mind when we consider the requirements for an asset management system.

  • How do we safely version control the media assets across our different environments (i.e. local, development, staging, and production)?
  • How do we develop locally against unminified, uncompressed assets, yet serve minified, compressed versions in the other environments?
  • How do we deal with the possibility of different absolute paths to the assets within each environment?
  • How do we seamlessly manage our assets in the cloud (S3/CloudFront) and switch between serving local and cloud copies with the flip of a flag?

With each of these questions in mind, I started an iterative process of building out a tool to satisfy our requirements. This tool became known as Rebuilder.

Rebuilder to the Rescue

Rebuilder is what I consider to be an asset management library, but it could be so much more depending on your requirements. At its core, it’s a pipelined module queueing system that ships with a default set of modules for managing your website media assets: CSSTidy, JSMin, Gzip, S3, and Bundler. CSSTidy is a very basic implementation of CSS minification that removes extraneous characters and lines. JSMin is even more rudimentary and accomplishes the same thing for JS files; it’s worth noting that our implementation of JSMin was modified to support combining unminified and minified files into a single file, so we can, for example, combine a minified version of jQuery with an unminified jQuery plugin. Gzip compresses our assets so they can be served via Amazon S3 and CloudFront more efficiently. S3 is our Amazon S3 library with a few tweaks to support creates, updates, custom headers, cache expiries, and our unique bucket pathing scheme (I’ll touch on this later). Lastly, there’s Bundler, which we’ll discuss in depth since it wraps the functionality of all of the other built-in modules.
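
To make the “pipelined module queue” idea more concrete, here’s a rough sketch of the pattern; the class and method names below are illustrative, not Rebuilder’s actual API:

```php
<?php
// Illustrative sketch only: these class and method names are hypothetical,
// not Rebuilder's actual API. The idea is that each module implements a
// common interface and the queue pipes one module's output into the next.

interface RebuilderModule
{
    /**
     * Receives the current list of asset file paths, does its work
     * (bundle, minify, gzip, upload, etc.), and returns the resulting paths.
     */
    public function process(array $files);
}

class ModuleQueue
{
    /** @var RebuilderModule[] */
    private $modules = array();

    public function push(RebuilderModule $module)
    {
        $this->modules[] = $module;
    }

    public function run(array $files)
    {
        // Pipeline: each module operates on the previous module's output.
        foreach ($this->modules as $module) {
            $files = $module->process($files);
        }
        return $files;
    }
}

// Hypothetical usage: bundle, minify, gzip, then upload to S3.
// $queue = new ModuleQueue();
// $queue->push(new Bundler($config));
// $queue->push(new CSSTidy($config));
// $queue->push(new JSMin($config));
// $queue->push(new Gzip($config));
// $queue->push(new S3($config));
// $queue->run($assetFiles);
```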

What is Bundler?

Bundler is the meat and potatoes of Rebuilder. It’s a wrapper around all of the other modules and handles the creation and management of what we refer to as asset “bundles”, as well as all other static media resources. Bundler works on both the server side and the client side, as it comes packaged with a command line tool and a client side library. The command line tool wraps the other default modules to handle any combination of the following scenarios (a rough sketch of a bundle configuration follows the list):

  • Combining user-defined media assets into bundles; applies to both CSS and JS
  • Creating minified CSS and JS files for the entirety of your media assets directory
  • Creating gzipped CSS and JS files for the entirety of your media assets directory
  • Automatically finding and replacing strings in your CSS and JS as the new files are created
  • Uploading new or modified media assets to Amazon S3
  • Setting custom gzip headers for gzipped files when uploading to Amazon S3
  • Setting custom file prefixes for your S3 bucket files to simulate your existing directory structure
  • Serving up assets via the Amazon CloudFront CDN once they’ve been uploaded to S3
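
As a rough illustration, a bundle configuration for the command line tool might look something like the following; the key names and structure here are hypothetical, not necessarily Rebuilder’s real schema:

```php
<?php
// Hypothetical bundle configuration -- the key names are illustrative,
// not necessarily Rebuilder's actual schema.
return array(
    'bundles' => array(
        // A sitewide JS bundle built from several widgets.
        'global.js' => array(
            'media/js/vendor/jquery.min.js',    // already minified
            'media/js/widgets/nav.js',          // unminified widgets
            'media/js/widgets/signup-form.js',
        ),
        // A page-level CSS bundle.
        'home.css' => array(
            'media/css/reset.css',
            'media/css/global.css',
            'media/css/pages/home.css',
        ),
    ),
    's3' => array(
        'enabled'      => true,
        'bucket'       => 'bucketName',
        'gzip_headers' => true,          // set Content-Encoding: gzip on upload
        'key_prefix'   => 'production/', // simulate the existing directory structure
    ),
);
```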

While all of the above pertain to the creation and management of our optimized assets, what do we do about serving them on the frontend? Bundler uses naming conventions for all bundled, minified, and gzipped files. Using these conventions, along with a global user-defined configuration file shared between the command line script and the client side script, we can serve these assets. The configuration file contains environment-specific configuration options for each of the default modules: you may enable or disable minification, gzipping, serving from S3, serving from CloudFront, etc. The frontend Bundler class parses this configuration file and can sanely generate both CSS and JS includes with proper pathing. Here’s a quick example:

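The snippet below is a simplified sketch; the class, method, and file names are illustrative rather than Rebuilder’s exact frontend API.

```php
<?php
// Simplified sketch -- class, method, and config names are illustrative.
$bundler = new Bundler(require 'config/rebuilder.php');

// In the <head>: depending on the environment flags in the shared config,
// this emits either individual <link> tags pointing at the local, unminified
// files, or a single <link> tag pointing at the minified, gzipped bundle on
// S3/CloudFront.
echo $bundler->css('home.css');

// Just before </body>: the same idea for the JS bundle.
echo $bundler->js('global.js');
```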

Using Bundler Across Environments with S3

One interesting problem was how to easily switch between serving files locally and via S3 without changing our asset path structure. How could we maintain the same relative filepaths to assets in our CSS and JS files? The solution I devised was to simply mirror the local directory structure as a bucket path. Although S3 doesn’t support directories within a bucket, it does support forward slashes in its filenames. Using this, we were able to devise a method for mirroring the relative path of our assets in our filenames as we upload to S3. This solution works great, but it does come at a small cost: it means we also need to serve our assets locally from a seemingly odd directory, e.g. /bucketName/css/global.css. Since most MVC frameworks and existing applications already have a structure to their public directory, we couldn’t impose such restrictions. The solution I settled on was to create a sub-directory off the document root matching the bucket name and then create a symlink (or symlinks) to our /media/ folder. If we didn’t have a media folder, we would have had to create several symlinks to /css/, /js/, etc. With this symlink in place, we now have a method of serving assets that matches our Amazon S3 bucket naming.

The remaining problem is that the assets themselves contain the old paths without the bucket name, e.g. /media/css/ as opposed to /bucketName/css/. I added functionality to the CSSTidy and JSMin modules to support a find_replace key/val array. When Bundler is minifying and compressing these files, it will also perform a find and replace on any user-defined values. Thanks to relative paths, we can now safely serve any combination of assets on any of our environments.
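
For example, a find_replace mapping might look something along these lines; the surrounding structure is hypothetical, only the find_replace idea comes from the description above:

```php
<?php
// Sketch of a find_replace mapping for the CSSTidy and JSMin modules.
// The surrounding key names are hypothetical; the idea is simply that any
// occurrence of the local path prefix gets rewritten to the bucket-style
// prefix as files are minified, so relative references keep working on
// S3/CloudFront and behind the local /bucketName/ symlink alike.
return array(
    'csstidy' => array(
        'find_replace' => array(
            '/media/' => '/bucketName/',   // e.g. url(/media/img/logo.png)
        ),
    ),
    'jsmin' => array(
        'find_replace' => array(
            '/media/' => '/bucketName/',   // e.g. paths assembled in JS widgets
        ),
    ),
);
```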

On the topic of serving files in different environments via S3, we also soon realized we needed to separate all of our assets by environment to avoid pushing breaking changes to production while we were modifying other environments. In other words, if we make a file change locally and need to test it on S3, we can’t use a globally shared asset, as that would affect every environment using it. The convention we settled on was to add the environment name to the asset’s path component. While this does mean we’ll have some duplication of assets at times, it ensures environments don’t share assets and lets us test each one independently.
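
To illustrate the convention, the same relative asset ends up under a distinct S3 key per environment; the helper below is hypothetical, not part of Rebuilder:

```php
<?php
// Hypothetical helper illustrating the environment-per-path convention.
// The same relative asset maps to a distinct S3 key in each environment,
// so changes being tested on staging never overwrite what production serves.
function s3KeyFor($environment, $relativePath)
{
    return 'bucketName/' . $environment . '/' . ltrim($relativePath, '/');
}

// s3KeyFor('staging', 'css/global.min.css')
//   => 'bucketName/staging/css/global.min.css'
// s3KeyFor('production', 'css/global.min.css')
//   => 'bucketName/production/css/global.min.css'
```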

Caveats of Rebuilder

Rebuilder does come with caveats. Because the bundling, compression, minification, and Amazon S3 uploading all take place synchronously, there’s a small window of time during which some website assets may be out of sync if you were to run it against a live environment. For instance, a newly uploaded JS file could reference a param or method that doesn’t yet exist in a second, older JS file that has yet to be updated. Similarly, you could have a CSS file that references images or fonts that have yet to be uploaded. These reasons alone lend themselves to coupling Rebuilder with a continuous (or manual) deployment system. This could be something as simple as a bash script with rsync, Capistrano, Fabric, or Phing. Whatever you choose, completing all transfers successfully before symlinking the web directory over to your repository is crucial to preventing these edge cases.

We strive to make Rebuilder useful not only to ourselves, but to the PHP community as a whole. We’d greatly appreciate pull requests, feedback, or additional modules. If you create a module for Rebuilder, let us know and we’ll link to it directly from the wiki.

Still here? Check out Rebuilder!

Rebuilder has been open sourced with an MIT license. We <3 forks and pull requests.

