Hints on the nginx fair load balancer
September 11th, 2009
The default load balancing for nginx is simple round robin. If you are proxying to a set of peers, and one request for whatever reason is very slow, nginx will continue to give the busy peer additional requests. The result is that requests that might ordinarily be quickly fulfilled will stack up behind the slow request.
There is an add-on module for nginx called the fair load balancer. There isn’t much documentation available for it, and hopefully this post will fill some of that gap. This is of necessity a summary: the fair load balancer has three weight modes and two scan modes, and can fit a large number of site requirements.
The fair load balancer is initialized in the upstream declaration:
upstream mongrel {
    fair;
    server 192.168.1.77:4000;
    server 192.168.1.77:4001;
    server 192.168.1.77:4002;
}
This declaration uses fair in its default mode. In this mode, fair assigns requests to idle peers first. When all peers are busy, it assigns requests using a score: the number of requests already assigned to a peer matters most, and when those counts are equal the request goes to the peer that had the earliest assignment.
Peers may be assigned weights. A weight is an integer value related to the number of requests that can be assigned to each peer, although the interpretation is a little different for each weight mode.
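The ordering described above can be sketched in a few lines of C. This is only an illustration of the rule as stated in this post, not the module's actual code: the peer_t struct, its fields, and pick_peer are hypothetical names, and applying the same tie-break to idle peers is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-peer state; the real module tracks more than this. */
typedef struct {
    unsigned  active;         /* requests currently assigned to the peer */
    uint64_t  last_assigned;  /* logical time of the most recent assignment */
} peer_t;

/*
 * Return the index of the peer that should receive the next request.
 * Fewer active requests wins; on a tie, the peer whose last assignment
 * is oldest wins. An idle peer (active == 0) therefore always beats a
 * busy one. The caller must pass n > 0.
 */
size_t pick_peer(const peer_t *peers, size_t n)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++) {
        if (peers[i].active < peers[best].active
            || (peers[i].active == peers[best].active
                && peers[i].last_assigned < peers[best].last_assigned))
        {
            best = i;
        }
    }

    return best;
}
```

With three peers holding 2, 1 and 1 requests, the sketch picks whichever of the last two was assigned a request longer ago.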
Fair also keeps track of peer failures. A failure is defined as an I/O error connecting to the peer, but not an HTTP error and, most importantly, not a timeout. If a peer fails once, the load balancer will only consider the peer for a request if all others are busy. Moreover, when a peer reaches max_fails failures, the software will not assign it a request for fail_timeout seconds.
Here is an example of a configuration that takes advantage of these features. It uses weight_mode=peak to prevent upstream overload, and it pauses assignment to a peer for 60 seconds after a single backend error.
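The failure rules can be read as a three-way classification of each peer. The sketch below is an interpretation of the paragraph above, not code from the module: fail_state_t, classify_peer and the status names are hypothetical, and treating a peer as "last resort" once its timeout has expired (until something else clears the failure count) is an assumption.

```c
/* Hypothetical failure bookkeeping for one peer. */
typedef struct {
    unsigned fails;         /* connection failures recorded so far */
    unsigned max_fails;     /* threshold from the server line */
    long     last_fail;     /* time of the latest failure, in seconds */
    long     fail_timeout;  /* pause length from the server line, in seconds */
} fail_state_t;

typedef enum {
    PEER_HEALTHY,      /* no recorded failures: normal candidate */
    PEER_LAST_RESORT,  /* has failed: use only if all others are busy */
    PEER_SUSPENDED     /* reached max_fails: skip until the timeout expires */
} peer_status_t;

/* Classify a peer at time `now` (seconds, same clock as last_fail). */
peer_status_t classify_peer(const fail_state_t *p, long now)
{
    if (p->fails == 0) {
        return PEER_HEALTHY;
    }

    if (p->fails >= p->max_fails && now - p->last_fail < p->fail_timeout) {
        return PEER_SUSPENDED;
    }

    return PEER_LAST_RESORT;
}
```

Note that a connect timeout would never increment fails under the definition given above, so a hung backend is not caught by this mechanism.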
upstream mongrel {
    fair weight_mode=peak;
    server 192.168.1.77:4000 max_fails=1 fail_timeout=60 weight=6;
    server 192.168.1.78:4000 max_fails=1 fail_timeout=60 weight=6;
    server 192.168.1.79:4000 max_fails=1 fail_timeout=60 weight=6;
}
The weight_mode=peak option limits each peer to at most weight requests at a time, in this case 6. If all peers already have 6 requests, the load balancer returns busy (nginx will display the “site is temporarily unavailable” page). Obviously, this sort of configuration only makes sense if you have a large number of servers and would rather have an error page than an overloaded backend.
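As a rough illustration of the peak behaviour, the following C sketch treats weight as a hard cap and reports "busy" when every peer is at its cap. The names peak_peer_t, pick_peer_peak and PEER_BUSY are hypothetical, and choosing the least loaded peer among those under the cap is an assumption rather than the module's exact scoring.

```c
#include <stddef.h>

/* Sentinel meaning "every peer is at its limit". Hypothetical name. */
#define PEER_BUSY ((size_t) -1)

/* Hypothetical per-peer state for peak mode. */
typedef struct {
    unsigned active;  /* requests currently assigned to the peer */
    unsigned weight;  /* cap on simultaneous requests */
} peak_peer_t;

/*
 * Return the index of the least loaded peer that is still below its
 * weight, or PEER_BUSY if all peers have reached their cap.
 */
size_t pick_peer_peak(const peak_peer_t *peers, size_t n)
{
    size_t best = PEER_BUSY;

    for (size_t i = 0; i < n; i++) {
        if (peers[i].active >= peers[i].weight) {
            continue;  /* at the cap: not eligible */
        }

        if (best == PEER_BUSY || peers[i].active < peers[best].active) {
            best = i;
        }
    }

    return best;
}
```

With the configuration above, three peers each holding 6 requests would make this function return PEER_BUSY, which corresponds to the error page case.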
Bug: There is an important bug in the fair load balancer that affects 64-bit systems. The function ngx_http_upstream_choose_fair_peer_busy contains an incorrect initialization. The problem line reads:
ngx_uint_t best_sched_score = ~0U;
This sets only the low 32 bits of a 64-bit integer to ones; all 64 bits need to be ones. One simple way to fix the problem is:
ngx_uint_t best_sched_score = ULONG_MAX;
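The difference is easy to check in isolation. In the sketch below, the typedef mirrors how nginx defines ngx_uint_t (as uintptr_t); the two helper functions are hypothetical and exist only to show the values. It also shows an alternative spelling, ~(ngx_uint_t) 0, which does not depend on the width of unsigned long (on 64-bit Windows, for instance, unsigned long is still 32 bits).

```c
#include <stdint.h>

/* Stand-in for the nginx type, which is a typedef of uintptr_t. */
typedef uintptr_t ngx_uint_t;

/*
 * ~0U has type unsigned int, so it is 32 bits of ones. Converting it
 * to a 64-bit ngx_uint_t zero-extends it: 0x00000000FFFFFFFF.
 */
ngx_uint_t buggy_init(void)
{
    return ~0U;
}

/*
 * Casting before complementing flips every bit of the wider type,
 * giving all ones whatever the platform's pointer width is.
 */
ngx_uint_t fixed_init(void)
{
    return ~(ngx_uint_t) 0;
}
```

On a 64-bit build, buggy_init() yields 4294967295 while fixed_init() yields 18446744073709551615. On a 32-bit build the two are equal, which is why the bug only shows up on 64-bit systems.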


