This is about how we deliver thousands of dynamically resized images per second from our datacenter. We have outbound traffic of about 2-3 Gbit per second, and around 60% of that consists of image requests. After years of running all the data through a commercial hardware storage solution, we decided it was time for a less cost-intensive and more flexible approach to resizing our images. We built a completely new infrastructure consisting of several shards, each running several synchronized instances of Apache httpd and nginx. The resulting benchmarks showed that we can handle well beyond 30,000 requests per second.
The Starting Point: High Volume, Low Flexibility
Figure 1: The old infrastructure
AutoScout24 has around 2.5 million individual listings with up to 10 images each. That in itself is quite a challenge, but we also currently offer eight different sizes for every single image. You do the math: 2.5 × 10⁶ listings × 10 images × 8 sizes. That results in potentially storing and delivering 200 million images in an efficient and seamless manner. Our commercial hardware storage solution was quite performant, with around 500 GB of RAM and a lot of SSD storage. We had the entire thing running as a metro cluster across two locations, kept synchronized through a dark fiber connection. The downsides were the expense, the difficulty of scaling up and the lack of flexibility with respect to image sizes. In the old setup, the services responsible for adding new listings to the platform did the scaling before storing the images on the commercial hardware storage system. We had no flexibility in which sizes and ratios we could use on our platform; a hindrance we didn’t want at a time when retina displays are appearing and screen sizes come and go.
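The arithmetic above as a quick sanity check, using only the numbers from this paragraph:

```shell
# Back-of-the-envelope count of potentially stored image variants.
listings=2500000        # ~2.5 million individual listings
images_per_listing=10   # up to 10 images each
sizes=8                 # formats offered per image

total=$((listings * images_per_listing * sizes))
echo "$total"   # 200000000
```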
The Goal: High Volume, High Flexibility
Figure 2: The new infrastructure
The new infrastructure is based on five shards running in a VMware cluster. The images to be stored are distributed evenly among these shards. Each shard consists of several Linux machines whose data volumes are replicated via GlusterFS, a scalable network file system with synchronous replication between multiple so-called bricks; in our case one node holds one brick. Apache httpd instances installed on these nodes fetch the reference images from disk. An nginx instance, compiled with the image filter module, is responsible for getting the image from the Apache httpd running on the same machine and resizing it to the appropriate size. The need to resize an image during import is therefore obsolete, and we can easily add a new format to the list without reprocessing all of the existing images. This allows our applications to select image sizes depending on their use case, the client’s device or other criteria. Currently we don’t support fully dynamic sizes (for example via width and height URL parameters).

Image requests directed to a shard are cached by a number of Varnish machines. With the Varnish cache we achieve approximately a 40% hit rate for image requests. How does this work in a standard use case? A customer enters some search criteria in the browser and receives a list of automobiles with thumbnail images. The larger images for the detail view are loaded into the cache along with the thumbnails, because we first try to find the reference image in the cache. Once the customer selects one of the offers in the list, the larger images are served directly from the cache rather than going back to the drive.
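As an illustration of the resizing tier, a minimal nginx server block along these lines could do the job. This is a hedged sketch with made-up ports, paths, format names and sizes, not our actual configuration; it assumes nginx was built with `--with-http_image_filter_module`, and it maps a fixed set of named formats rather than accepting arbitrary dimensions:

```nginx
# Map a fixed set of named formats to concrete dimensions; anything
# not in the list falls through to "0" and is rejected below.
map $format $resize_width  { thumbnail 160; detail 640; default 0; }
map $format $resize_height { thumbnail 120; detail 480; default 0; }

server {
    listen 80;

    # e.g. /images/thumbnail/012/345/car.jpg
    location ~ ^/images/(?<format>[a-z]+)/(?<path>.+)$ {
        if ($resize_width = 0) { return 404; }   # unknown format

        proxy_pass http://127.0.0.1:8080/$path;  # Apache httpd serves the reference image
        image_filter resize $resize_width $resize_height;
        image_filter_jpeg_quality 85;
        image_filter_buffer 10M;                 # max accepted source image size
    }
}
```

Adding a new format then means adding one entry to each map, with no reprocessing of stored images.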
Making it look like the commercial hardware storage solution
We didn’t want to change the interface for the applications uploading files to the share, and at that point in time the file share providing a central entry point to our image infrastructure had not yet been decoupled into a separate service. We needed a solution to make five separate GlusterFS shards look like one Windows share. To accomplish this we use a combination of Samba and symlinks to create a single entry point for requests: the GlusterFS bricks are mounted into five mount points on a single machine and combined into one directory using symlinks. This is possible because the folder structures of the shards don’t overlap and consist of two levels of fixed 3-digit subdirectories. As an example:
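A sketch of the idea, with purely hypothetical mount-point and directory names (the real layout and ranges differ):

```shell
# Simulate the combined view: shard mounts with disjoint top-level
# 3-digit directories, merged into one tree via symlinks.
base=$(mktemp -d)

mkdir -p "$base/gluster1/001/002" "$base/gluster2/201/002"  # two sample shards
mkdir -p "$base/combined"

# One symlink per top-level 3-digit directory; no collisions occur
# because the shards' directory ranges don't overlap.
for mount in "$base"/gluster*; do
    for dir in "$mount"/*/; do
        ln -s "$dir" "$base/combined/$(basename "$dir")"
    done
done

ls "$base/combined"   # lists 001 and 201
```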
On top of that, Samba exposes the combined directory as a Windows share. This way we were able to change our image infrastructure without changing any of the individual application interfaces. The only change necessary was a DNS change to point the applications at the new infrastructure.
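The corresponding share definition in smb.conf could look roughly like this (share name, path and option choices are illustrative, not our actual configuration). The symlink-related options matter because the links point outside the share, into the separate GlusterFS mounts, and Samba refuses to follow such "wide" links by default:

```ini
[images]
    path = /mnt/combined
    read only = no
    # allow following symlinks that leave the share,
    # since they point into the GlusterFS mounts
    follow symlinks = yes
    wide links = yes
    unix extensions = no
```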
Transitioning under load
Prior to switching to the new image infrastructure, the commercial hardware storage and the Samba share were kept synchronized with rsync jobs. Making the rsync jobs work was quite a challenge due to the sheer number of changes they had to cope with. After the configuration of the new image infrastructure was complete, and after lots of synthetic upfront testing by replaying traffic from the old infrastructure's log files, it was time to allow some live traffic. Using our feature toggle tool FeatureBee we were able to control the distribution of traffic to the new system. We started off in countries with less traffic, but quickly realized that we could put more load onto the system. In the end it was necessary to route all traffic to the new infrastructure, as we wanted to ensure it would perform under real conditions, especially when it comes to caching with LRU evictions and real user behavior.
If something doesn’t work, timebox your attempt
We believe we hit a performance bug in nginx. It manifested itself when nginx loaded images directly from disk, which was considerably slower than loading them from an Apache httpd backend on the same machine. Even after testing different storage configurations and other system tweaks we didn’t manage to find the root cause of this performance problem. Only when trying something completely different after some time did we identify nginx as the bottleneck. So if something is not working as expected, try another approach after a reasonable amount of time and see whether that changes the results.
Benchmarking limited by laws of nature
While testing with the well-known siege HTTP benchmark tool, we got pretty disappointing results in the beginning. After thinking hard about our benchmark setup we realized that the results are tied to the laws of nature. To give you an example: we tested our resizing webservers with a concurrency of 20 requests and got a result of 250 requests per second. As you can see, one request takes about 80 ms (1000 ms × 20 / 250). When testing with a concurrency of 40 the figures look quite different: about 500 requests per second (1000 ms × 40 / 80). Of course this scales neither infinitely nor linearly as in this example. The point is that you should keep these natural limitations in mind and, for example, test with different concurrencies.
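This relationship is Little's law: throughput = concurrency / average latency. Checking it against the siege numbers above:

```shell
# throughput = concurrency / latency; latency stays ~80 ms in both runs.
r20=$(awk 'BEGIN { printf "%.0f", 20 / 0.080 }')
r40=$(awk 'BEGIN { printf "%.0f", 40 / 0.080 }')
echo "concurrency 20: $r20 req/s"   # 250 req/s
echo "concurrency 40: $r40 req/s"   # 500 req/s
```

At a fixed per-request latency, the only way the benchmark can report more throughput is more concurrency, which is why a low-concurrency run looks "disappointing" no matter how fast the server is.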
Be aware of all the caches
The different cache layers, all the way down to the disk, made benchmarking more complicated: the Linux file system cache, VMware disk caching and the storage system itself, which can move hot objects into faster regions if required. Naturally, benchmarks are skewed by all these caches, so running the same test multiple times yields increasing speeds. Be aware of where caches exist, and don’t party too early!
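For the Linux layer at least, there is a well-known switch to start a benchmark run from a cold file system cache (requires root; the VMware and storage-tier caches are untouched by this and need their own handling):

```shell
sync                                # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes
```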
Pairing while testing
When you are testing a system with so many different parts, it is crucial to communicate and coordinate well with everyone involved. For example, you should be able to tell whether somebody is using or clearing one of the caches, as that could influence your test results dramatically.
Don’t bury your files too deep
We are using a very detailed and therefore deep file structure. Initially this was necessary for performance reasons; today this is most likely no longer the case. On our specific system we observed real performance differences when comparing against a folder structure with less depth. Especially image types of which we don’t have many (e.g. dealer logos) are buried in a folder structure that adds a lot of unneeded complexity.
Why no CDN?
Of course there is another option for delivering all these image requests: a content delivery network. We considered it, compared costs and benefits, and came to the conclusion that it did not fit our needs at the time. Since we have good peering and a technically sound solution, we can still deliver images to our visitors fast. Plus, it was a nice challenge to build the whole infrastructure ourselves. As technology and our company are constantly changing, we never know where we will be a year from now.