Archive for the ‘architecture’ Category

Ruby on Rails: A look back

Friday, May 30th, 2008

It has been a little over a year since we started rewriting MedHelp’s software and had to answer a very simple question: which platform should we use?

After much exploration and deliberation, I decided that Ruby on Rails was the way to go. At that time, the debate on whether RoR was scalable or mature enough was raging (and still is), with a few high-profile stories adding to the drama (a Twitter dev dissing RoR for what seemed to be architecture failures was a classic).

Just like anyone making an investment decision, I followed the various blogs talking about why RoR is such a terrible platform, why it couldn’t scale and how it is obviously a bad choice, starting with Twitter of course and going through the various people arguing for and against.

To my surprise (or not), the issues people faced as they scaled RoR were not specific to RoR. In fact, they were issues I had seen people grapple with for years: bottlenecked (and sometimes not truly stateless) app servers, expensive database queries, single points of failure, centralized databases.

For some reason many people in the debate assumed that there are platforms that scale and others that don’t. And that by picking the right platform you will be able to serve millions of users. Unfortunately, it is never that simple. Scaling is a continuous exercise of understanding the bottlenecks in your system and the limitations of your architecture and finding ways to gracefully get beyond them.

Another argument against Ruby on Rails was that Ruby is a slow language or that it consumed too much memory. But wasn’t this the argument against Java when the world was dominated by C++ fanatics? Wait, wasn’t this also the argument against C when Assembly developers were the coolest kids on the block? What about machine code.. you get the picture!

The answer to this argument is twofold. The first part is economic: developers are way more expensive than hardware. This statement has held true for years, and is truer every minute than the minute before. The other part of the answer is that today’s architecture (thanks to the 90’s) puts completely stateless software at the heart of your system, allowing you to scale horizontally. So it is not really that important how fast each machine is (as long as the slowness is not noticeable to the end user); you can always add another piece of hardware and double your capacity.

So not finding any challenges with RoR that I didn’t expect to face with any other platform, and having been sold on its design philosophy (long live conventions), the elegance of its architecture and the elasticity of the Ruby language, I decided that MedHelp is going to be a Ruby on Rails shop.

Fast forward one year, and MedHelp is up and running. We were able to rewrite the entire application in RoR in about four weeks. We transformed the site from a simple forum application to a vibrant community. We added tons of features, some of which are complex Ajax applications such as trackers. We swapped out the site’s interface in favor of better flow and aesthetics. And we did all that while growing from 2 million to 5.5 million unique visitors.

Our average team size during this year was 3.5 people (we are 6 now). And while all of them are seasoned engineers with a lot of experience in building and scaling server software (people I knew or worked with prior to MedHelp, and am proud to continue working with today), all of them learned Ruby on Rails on the job.

After all this, I am now taking a deep breath and asking myself again. Have I made the right choice? The answer for me is clearly yes.

The ride was not an easy one. And we had our share of emergencies, head scratching and nervous moments. But none of the mistakes made or the bugs found were caused by Ruby on Rails except in the sense that the platform’s flexibility made it easy to make some mistakes. But the mistakes were ours. When made, they often showed a misunderstanding of how a certain feature worked, a flaw in our database schema or how our components are distributed across our servers.

Now that we’ve gone through those pains to grow the site, I think I am ready to share many of the things that we learned or had to re-learn as we grew MedHelp. Each week or two I will share one of the big pitfalls that we managed to fall into, and what lessons we learned as we climbed out of it and started marching for the next pitfall.

The fake order of id

Monday, November 5th, 2007

A couple of days ago, I gave Ajay really bad advice. He was changing a piece of code to order the results by created_at instead of by id, and my code review boldly stated:

99.9999% of the time the ids will be chronologically ordered.

While this might be true right now, it is really suboptimal. I realized that when I was thinking about the best way to set up our new database server. It occurred to me that we might want to set up a master-master replication scheme for MySQL. Once we do that, each MySQL master will have its own range of ids to use when assigning a new automatically generated id to a record being inserted. For example, server A will be able to use ids within the range 1 to 99999 and server B 100000 to 199999.

Once that happens, ids will no longer follow insertion order and might look like: 324, 100231, 325, 326, 100232, etc.

So the lesson I admit to learning is: never sort by IDs when you mean to sort by date. In fact, I can’t think of any reason why you would ever want to sort by ID.
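To make the point concrete, here is a minimal sketch in plain Ruby. The `Record` struct, the timestamps and the alternating inserts are assumptions made for illustration; only the id values come from the example above.

```ruby
# Hypothetical record type standing in for an ActiveRecord model.
Record = Struct.new(:id, :created_at)

# Assumed setup from the example: server A hands out ids 1..99999,
# server B hands out ids 100000..199999, and inserts land on either server.
def sample_records
  t0 = Time.utc(2007, 11, 5, 12, 0, 0)
  [
    Record.new(324,    t0),      # inserted on server A
    Record.new(100231, t0 + 1),  # inserted on server B one second later
    Record.new(325,    t0 + 2),
    Record.new(326,    t0 + 3),
    Record.new(100232, t0 + 4)
  ]
end

# Sorting by id groups records by server, not by time.
def ids_sorted_by_id(records)
  records.sort_by(&:id).map(&:id)
end

# Sorting by created_at recovers the real insertion order.
def ids_sorted_by_date(records)
  records.sort_by(&:created_at).map(&:id)
end

p ids_sorted_by_id(sample_records)    # => [324, 325, 326, 100231, 100232]
p ids_sorted_by_date(sample_records)  # => [324, 100231, 325, 326, 100232]
```

Sorted by id, everything written on server B appears to be newer than everything written on server A, which is exactly the bug the code review would have let through.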

It’s all about the cache money

Thursday, November 1st, 2007

Cache is money to MedHelp, as it helps us scale by reducing CPU, I/O, round-trip times, etc. This is critical as we continue to grow from millions of users to tens of millions of users and beyond.

There are five popular ways to cache:

  1. page caching
  2. action caching
  3. fragment caching
  4. model caching
  5. in-memory computational cache

Page Cache

Page caching stores whole pages as static files so the request does not need to hit your Rails server. I imagine every page on Wikipedia is page cached and invalidated on update. This has huge performance benefits, as we can configure the web server in front of Rails to return the cached HTML page. We could use page caching for our medical dictionary pages. Almost all our other pages have or will soon have dynamic content, so the value of page caching appears low.

However, an interesting use of page caching is to leverage the fact that a page looks the same to every non-logged-in user. Since the majority of our traffic is new users coming from Google, this will have a large impact. The trick here is that our front-end web server has to identify that a request is from a non-logged-in user by inspecting the cookie.

Methods: caches_page and expire_page or sweepers
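
The cookie check itself would normally live in the front-end web server's configuration, but the decision logic is simple enough to sketch in plain Ruby. This is only an illustration: the cookie name `logged_in` is hypothetical, and it assumes the application sets that cookie only when a user logs in.

```ruby
# Hypothetical cookie name; assumed to be set only at login.
LOGIN_COOKIE = "logged_in"

# Parse a raw Cookie header ("a=1; b=2") into a Hash of name => value.
def parse_cookies(header)
  return {} if header.nil? || header.strip.empty?

  header.split(";").each_with_object({}) do |pair, cookies|
    name, value = pair.strip.split("=", 2)
    cookies[name] = value.to_s unless name.nil? || name.empty?
  end
end

# Anonymous visitors (no login cookie, or an empty one) can be served
# the page-cached HTML; everyone else goes through to Rails.
def serve_cached_page?(cookie_header)
  value = parse_cookies(cookie_header)[LOGIN_COOKIE]
  value.nil? || value.empty?
end

p serve_cached_page?(nil)                          # => true
p serve_cached_page?("utm=google; logged_in=abc")  # => false
```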

Action Caching

In action caching, the request hits the rails server and passes through all the filters. This is useful when you require auth to differentiate logged in and out and have an authorization filter. I cannot think of a case where we would use this form of caching since we use dynamic content on almost every page.

Methods: caches_action, expire_action

Fragment Caching

Fragment caching saves rendering of portions of your view. This method is interesting as we could cache the middle div (user journals) in the people page and invalidate using a time to live.
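
Time-to-live invalidation can be sketched as follows. `TtlFragmentCache` is a hypothetical class written for this illustration, not part of Rails; it keeps fragments in a Hash inside one process, whereas a real deployment would more likely use a shared store.

```ruby
# Minimal in-process fragment store with a time to live (TTL).
# Hypothetical class for illustration only.
class TtlFragmentCache
  Entry = Struct.new(:html, :expires_at)

  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}
  end

  # Return the cached fragment for key if it has not expired yet.
  # Otherwise run the block to render it, store the result and return it.
  def fetch(key, now = Time.now)
    entry = @entries[key]
    return entry.html if entry && now < entry.expires_at

    html = yield
    @entries[key] = Entry.new(html, now + @ttl)
    html
  end
end

# Usage: cache the journals div for five minutes (key name is made up).
JOURNALS_CACHE = TtlFragmentCache.new(300)
html = JOURNALS_CACHE.fetch("people/42/journals") { "<div>journals</div>" }
puts html
```

The trade-off of a TTL is that readers may see a fragment that is up to five minutes stale, in exchange for never having to track which writes should expire it.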

Another interesting technique is to use identifiers on the cache fragments so that the controller can skip the code related to a fragment that is already cached (i.e., save db calls and computations). This does poke a hole in the MVC model, as it creates an additional coupling between view and controller. Here is a code example:

View:
<% cache(:controller => :post, :action => :show, :subject_id => @post.subject_id) do %>
  <%# beautifully written medhelp code %>
<% end %>
Controller:
def show
  # @post is assumed to be loaded already (e.g. in a before_filter)
  # skip the expensive work when the fragment is already cached
  unless read_fragment(:controller => :post, :action => :show, :subject_id => @post.subject_id)
    # lots of db calls
  end
  # code you need for other fragments on the page
end
def edit
  ...
  # drop the cached fragment so the next show re-renders it
  expire_fragment(:controller => :post, :action => :show, :subject_id => @post.subject_id)
end

Methods: <% cache do %> in the view; read_fragment and expire_fragment in the controller

Model Caching

Model caching is when we store ActiveRecord objects and db results to save db calls, which are typically the bottleneck of our web application. We currently use this everywhere, and we are very strict about reviewing it in our bottom-up design reviews and code reviews prior to check-in. For example, we cache all users. Soon, we will be caching all posts, subjects, and forums. All hail memcache.

Methods: CACHE get/set, act_as_cachable
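
The get/set pattern behind model caching is usually a read-through cache with invalidation on write. The sketch below is an assumption about the general shape, not MedHelp's actual code: `UserStore` is hypothetical, and plain Hashes stand in for both the users table and the memcached client.

```ruby
# Read-through cache sketch. UserStore is hypothetical; one Hash stands in
# for the users table and another for a memcached client.
class UserStore
  attr_reader :db_reads

  def initialize(db_rows, cache = {})
    @db = db_rows
    @cache = cache
    @db_reads = 0   # counts how often we had to go to the "database"
  end

  # Try the cache first; on a miss, read the database and fill the cache.
  def find(id)
    key = "user:#{id}"
    cached = @cache[key]
    return cached if cached

    @db_reads += 1
    row = @db[id]
    @cache[key] = row if row
    row
  end

  # Write to the database, then invalidate so the next read reloads.
  def update(id, attrs)
    @db[id] = @db.fetch(id).merge(attrs)
    @cache.delete("user:#{id}")
    @db[id]
  end
end

store = UserStore.new({ 1 => { name: "ajay" } })
store.find(1)
store.find(1)
p store.db_reads  # => 1
```

Deleting the key on update, rather than overwriting it, keeps the write path simple; the cost is one extra database read the next time that user is requested.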

In-memory computational cache

This is where we save our computation in a data structure. I almost forgot this since it is second nature. An example of this is caching the key words -> links data structure that is used to link-ify user generated text. See our medical terms highlighting in user journals and forum posts.

Methods: CACHE get/set, session set/get, class variables
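
As a rough illustration of the key words -> links idea, the sketch below builds the lookup structure once per process and reuses it for every piece of text. The `Linkifier` module, the two terms and the `/dictionary/...` URLs are all made up for the example, and it assumes the input text has already been HTML-escaped.

```ruby
module Linkifier
  # Hypothetical term list; the real medical dictionary is not shown here.
  TERMS = {
    "asthma"   => "/dictionary/asthma",
    "diabetes" => "/dictionary/diabetes"
  }.freeze

  # Build the regex once and keep it in a module-level instance variable,
  # so later calls reuse it instead of recomputing it.
  def self.pattern
    @pattern ||= begin
      # Longest terms first so a longer term wins over a shorter prefix.
      words = TERMS.keys.sort_by { |t| -t.length }.map { |t| Regexp.escape(t) }
      Regexp.new("\\b(" + words.join("|") + ")\\b", Regexp::IGNORECASE)
    end
  end

  # Wrap every known term in a link, keeping the user's original casing.
  # Assumes text is already HTML-escaped.
  def self.linkify(text)
    text.gsub(pattern) do |match|
      %(<a href="#{TERMS[match.downcase]}">#{match}</a>)
    end
  end
end

puts Linkifier.linkify("I have Asthma.")
# prints: I have <a href="/dictionary/asthma">Asthma</a>.
```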