Preparing for Impact
While the LCROSS team prepares for the October 9th impact on the lunar south pole, the Nebula team is busy polishing and preparing for the public launch of the LCROSS Citizen Science site. Over the next two weeks, we will be making a number of enhancements to the site, including a new home page, a new site template that complements the newly redesigned NASA Nebula site, and more.
Additionally, we will be optimizing the site to accommodate the multiple terabytes of lunar impact pictures that amateur astronomers worldwide will upload in the days following the impact. Because the site offers unique, community-based content, we must also be prepared for traffic peaks whenever it is mentioned in the press.
Performance Tuning for the Cloud
Even in the space cloud, it is no small task for a web server to process a hundred 12-gigabyte images at a time. So as each image is uploaded, we place a reference to it in an enterprise-class queue powered by RabbitMQ instead of processing it on the web server. We can then create as many image processing nodes as we need in the cloud and have them all listen to the same queue. This is a great example of how cloud computing lets us scale elegantly on an as-needed basis.
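The queue-and-workers pattern described above can be sketched in a few lines. This is only an in-process stand-in built on Python's standard library, not RabbitMQ itself and not the site's actual code: `process_image`, `run_pipeline`, and the file names are hypothetical, and threads play the role of the image processing nodes.

```python
import queue
import threading


def process_image(image_ref: str) -> str:
    # Hypothetical placeholder for the real image-processing work.
    return f"processed:{image_ref}"


def worker(jobs: queue.Queue, results: list, lock: threading.Lock) -> None:
    # Each worker listens to the same queue and takes whatever job is next.
    while True:
        image_ref = jobs.get()
        if image_ref is None:  # sentinel: no more work for this worker
            break
        output = process_image(image_ref)
        with lock:
            results.append(output)


def run_pipeline(image_refs, num_workers: int = 4) -> list:
    jobs: queue.Queue = queue.Queue()
    results: list = []
    lock = threading.Lock()

    # "Scaling out" here just means starting more workers on the same queue.
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()

    # The web tier only enqueues a reference to each upload, not the image.
    for ref in image_refs:
        jobs.put(ref)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker

    for t in threads:
        t.join()
    return sorted(results)
```

Because every worker pulls from the same queue, adding capacity is a matter of raising `num_workers` (or, in the real deployment, starting more nodes); the code that enqueues uploads does not change.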
The next issue is one of the oldest problems in the book: optimizing for page hit throughput. What happens if an article referring to the site lands on the home page of a popular social news site, and all of a sudden there is a huge spike in requests for our home page? Without a solution in place ahead of time, the site would come to a screeching halt. There are many approaches, but we will focus on one area for now: graceful caching.
Grace Under Fire
There is a relative newcomer called Varnish. Officially labeled as an HTTP accelerator, it also provides us with the ability to implement graceful caching for all of our content that doesn't require authentication. In our case, this is perfect since the vast majority of page hits will be for unauthenticated content such as the home page, impact observations, articles, etc.
Varnish can be configured in a number of ways, but our biggest payoff comes from implementing graceful caching. We can configure Varnish to cache unauthenticated content for, say, 5 minutes. During those 5 minutes, Varnish serves cache hits directly from memory. Only when the cached copy expires does Varnish pass a request along to Apache for the real content. While that request is being processed, Varnish continues to serve the expired copy to all other requests, for a grace period of up to 5 minutes (or whatever we specify).
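To make the grace behavior concrete, here is a toy model of it in Python. It is a sketch of the idea only, not Varnish's implementation or configuration: the `GraceCache` class, its method names, and the explicit `now` timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass
class Entry:
    body: str
    fetched_at: float


class GraceCache:
    """Toy model of graceful caching (illustrative, not Varnish itself)."""

    def __init__(self, backend: Callable[[str], str],
                 ttl: float = 300, grace: float = 300) -> None:
        self.backend = backend          # stands in for Apache
        self.ttl = ttl                  # seconds an entry counts as fresh
        self.grace = grace              # extra seconds a stale entry may be served
        self.backend_calls = 0
        self._entries: Dict[str, Entry] = {}
        self._in_flight: Set[str] = set()

    def lookup(self, url: str, now: float) -> Tuple[str, Optional[str]]:
        """Return (status, body); status is 'hit', 'grace', or 'miss'."""
        entry = self._entries.get(url)
        if entry is None:
            return "miss", None
        age = now - entry.fetched_at
        if age <= self.ttl:
            return "hit", entry.body
        # Expired: serve the stale copy only if a refresh is already
        # under way and we are still inside the grace window.
        if age <= self.ttl + self.grace and url in self._in_flight:
            return "grace", entry.body
        return "miss", None

    def begin_fetch(self, url: str) -> None:
        # The first request after expiry marks the URL as being refreshed.
        self._in_flight.add(url)

    def finish_fetch(self, url: str, now: float) -> str:
        # The backend responds; store the new copy and clear the marker.
        body = self.backend(url)
        self.backend_calls += 1
        self._entries[url] = Entry(body, now)
        self._in_flight.discard(url)
        return body
```

In this model, the first request to arrive after expiry gets a `miss` and calls `begin_fetch`; every request that arrives before `finish_fetch` completes gets the status `grace` and the old body, so the backend sees one request instead of a pile-up.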
This means we can have tens of thousands of users requesting the home page, yet Apache would have to process a request for it only once every 5 minutes. Everything else is served by Varnish from memory, which easily allows us to serve thousands upon thousands of requests per second.
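A quick back-of-the-envelope calculation shows why this matters. The helper names and the traffic figure in the usage note are hypothetical, chosen only to illustrate the arithmetic; they are not measurements from the site.

```python
import math


def backend_requests(duration_s: float, ttl_s: float) -> int:
    """Backend fetches needed for one cached URL: one per TTL window."""
    return math.ceil(duration_s / ttl_s)


def cache_hit_ratio(client_rps: float, duration_s: float, ttl_s: float) -> float:
    """Fraction of client requests answered from cache for one URL."""
    total_requests = client_rps * duration_s
    return 1 - backend_requests(duration_s, ttl_s) / total_requests
```

With a 5-minute cache, `backend_requests(3600, 300)` is 12, so Apache renders the home page 12 times an hour no matter how many visitors arrive. At an assumed 1,000 requests per second, that is 12 backend requests out of 3.6 million.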
By using tried-and-true practices for performance tuning and taking advantage of the cloud's opportunities for scalability, we are able to provide a high-performance, enterprise-class application.