Preparing for Impact: Performance Tuning for LCROSS Citizen Science
Posted on Sept. 22, 2009 by Devcamcar
Tags: lcross performance rabbit varnish

Preparing for Impact

While the LCROSS team prepares for the October 9th impact on the lunar south pole, the Nebula team is busy polishing and preparing for the public launch of the LCROSS Citizen Science site. Over the next two weeks, we will be making a number of enhancements to the site, including a new home page, a new site template that complements the newly redesigned NASA Nebula site, and more.

Additionally, we will be optimizing the site to accommodate the multiple terabytes of lunar impact pictures that will be uploaded by amateur astronomers worldwide in the days following the impact. Because the site provides unique, community-based content, we must also be prepared for traffic spikes whenever it is mentioned in the press.

Performance Tuning for the Cloud

Even while running in the space cloud, it is a non-trivial endeavor for a web server to process a hundred 12-gigabyte images at a time. So as each image is uploaded, we place a reference to it in an enterprise-class queue powered by RabbitMQ. We can then create as many image processing nodes as we need in the cloud and have them all listen to the same queue, so each queued image is picked up by whichever node is free. This is a great example of how cloud computing can allow us to elegantly scale on an as-needed basis.
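
To make the pattern concrete, here is a minimal sketch of several workers pulling image references from one shared queue. It uses Python's standard-library queue and threads purely as a stand-in for RabbitMQ and cloud nodes; the names process_image, worker, and run_pipeline are hypothetical and are not taken from the site's actual code.

```python
import queue
import threading


def process_image(ref):
    """Hypothetical placeholder for the real image-processing step."""
    return f"processed:{ref}"


def worker(jobs, results):
    """Pull image references off the shared queue until a stop sentinel arrives."""
    while True:
        ref = jobs.get()
        try:
            if ref is None:  # sentinel: no more work for this worker
                return
            results.put(process_image(ref))
        finally:
            jobs.task_done()


def run_pipeline(image_refs, num_workers=3):
    """Queue every reference, then let num_workers workers drain the queue."""
    jobs = queue.Queue()      # stand-in for the RabbitMQ queue
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for ref in image_refs:
        jobs.put(ref)         # the upload handler only enqueues a reference
    for _ in threads:
        jobs.put(None)        # one stop sentinel per worker
    for t in threads:
        t.join()
    processed = []
    while not results.empty():
        processed.append(results.get())
    return sorted(processed)


if __name__ == "__main__":
    print(run_pipeline(["moon-001.fits", "moon-002.fits", "moon-003.fits"]))
```

Adding capacity in this sketch means raising num_workers; in the setup described above, the equivalent step would be starting more processing nodes that listen to the same queue.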

The next issue is one of the oldest problems in the book: optimizing for page hit throughput. What happens if an article referring to the site lands on the home page of a popular social news site, and all of a sudden there is a huge spike in requests for the home page? Without a solution in place ahead of time, the site would come to a screeching halt. There are many approaches, but we will focus on one area for now: graceful caching.

Grace Under Fire

There is a relative newcomer called Varnish. Officially labeled as an HTTP accelerator, it also provides us with the ability to implement graceful caching for all of our content that doesn't require authentication. In our case, this is perfect since the vast majority of page hits will be for unauthenticated content such as the home page, impact observations, articles, etc.

Varnish can be configured in a number of ways, but our biggest payoff comes from implementing graceful caching. We can configure Varnish to cache unauthenticated content for, say, 5 minutes. During this time, Varnish will serve cache hits directly from memory. Only when a cached page expires does Varnish pass the next request along to Apache for the real content. While Apache is working on that request, Varnish will continue to serve the expired copy to other requests, for a grace period of up to 5 minutes (or whatever we specify).
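
The behavior described above can be modeled in a few lines of Python. This is only a toy illustration of the time-to-live plus grace idea, not Varnish's implementation or configuration; the GraceCache class, its parameters, and the injected clock are all assumptions made for the example.

```python
import threading
import time


class GraceCache:
    """Toy model of TTL-plus-grace caching (an illustration, not Varnish itself)."""

    def __init__(self, fetch, ttl=300.0, grace=300.0, clock=time.monotonic):
        self._fetch = fetch          # callable that asks the backend for a page
        self._ttl = ttl              # seconds a cached page counts as fresh
        self._grace = grace          # extra seconds a stale page may be served
        self._clock = clock          # injectable so the demo can control time
        self._lock = threading.Lock()
        self._entries = {}           # key -> (value, time it was fetched)
        self._refreshing = set()     # keys with a backend request in flight

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, fetched_at = entry
                age = self._clock() - fetched_at
                if age < self._ttl:
                    return value     # fresh hit, served from memory
                if age < self._ttl + self._grace and key in self._refreshing:
                    return value     # stale, but a refresh is already running
            self._refreshing.add(key)
        # The backend call happens outside the lock so that other requests
        # can keep being answered from the cache in the meantime.
        # Simplification: concurrent requests for a page that was never cached
        # each go to the backend; this sketch does not coalesce them.
        try:
            value = self._fetch(key)
            with self._lock:
                self._entries[key] = (value, self._clock())
            return value
        finally:
            with self._lock:
                self._refreshing.discard(key)


# Demo with a fake clock: the backend is only called when the page has expired.
now = [0.0]
backend_calls = []


def backend(key):
    backend_calls.append(key)
    return f"{key} v{len(backend_calls)}"


cache = GraceCache(backend, ttl=300, grace=300, clock=lambda: now[0])
first = cache.get("/")    # nothing cached yet: goes to the backend
now[0] = 100.0
second = cache.get("/")   # still fresh: served from memory
now[0] = 301.0
third = cache.get("/")    # expired: exactly one new backend request
```

In this run the backend is called twice for three requests: once for the initial miss and once after the 5-minute lifetime has passed.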

This means we can have tens of thousands of users requesting the home page, but Apache would only have to process a request for that page about once every 5 minutes. Because nearly every request is answered from Varnish's memory, this solution easily allows us to serve thousands upon thousands of requests per second.


By using tried-and-true practices for performance tuning and taking advantage of the cloud's opportunities for scalability, we are able to provide a high-performance, enterprise-class application.

Comments

Congratulations to all the designers and pioneers developing opportunities of cloud computing!!

Look forward to both learning myself and supporting the public understandings of cloud computing and its implications for education and economic development in America and around the world.

Dr Ronnie Lowenstein, President, Lowenstein & Associates, Education Technology Think Tank; Special Advisor to US Congresswoman Diane E Watson
