Load Balancing

This blog always talks about things at a very basic level, and this post on the basics of load balancing is no exception. During my college days, I wondered how sites run 24x7 without any downtime. I also wondered how sites like Google and Yahoo handle so much traffic (add Facebook and Quora to that list now).
One of the key contributors to this awesomeness is the load balancer. In simple terms, there are many hosts ready to serve an incoming request. So the questions are:
1) How does one route traffic to one of the hosts, such that the load is not unfairly skewed?
2) How do we keep end users unaware of server crashes in our datacenter?
3) How do we make cookie-based transactions possible if each request hits a different server?
4) How do we make HTTPS transactions possible, given that they are tightly bound to the hostname?
Let's see the different kinds of solutions available and how they solve the above problems.
1) DNS Load Balancers:
In this method we associate two or more A records with the same domain name. The DNS server rotates the order of the IPs in each response, so successive lookups are spread across the servers in a roughly round-robin fashion. But clients and resolvers may cache an IP address for a host name and keep contacting it even if that server is down. This makes downtime visible to end users. DNS changes also take time to propagate, so a host cannot be pulled out of rotation quickly.
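The rotation of A records can be sketched with a toy model. This is only an illustration: the `RoundRobinDNS` class and the 10.0.0.x addresses are made up for the example, and a real DNS server (and the caches in front of it) behaves in more complicated ways.

```python
from collections import deque


class RoundRobinDNS:
    """Toy model of a DNS server that rotates its A records on every query."""

    def __init__(self, records):
        self._records = deque(records)

    def resolve(self):
        # Return the records in their current order, then rotate so the
        # next query sees a different IP in first position.
        answer = list(self._records)
        self._records.rotate(-1)
        return answer


# Hypothetical addresses for three web servers behind one domain name.
dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])

# Most clients simply try the first IP in the answer.
first_choices = [dns.resolve()[0] for _ in range(4)]
print(first_choices)
```

Note that nothing here checks whether a server is alive, which is exactly the weakness of this method.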
Say a client starts a transaction and its request reaches server A. A does the work and caches some data for faster access on the next request. If the next request hits B, B may have to do the job all over again, and may not be able to reproduce the state at all. In general the session is lost, because there is no broker in the middle to route a client back to the same server. There are ways to counter this. This question sheds light on them: http://serverfault.com/questions/32421/how-is-session-stickiness-achieved-across-multiple-web-servers
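One common way to counter the lost-session problem (not necessarily the one recommended in the linked question) is to keep session data in a store that every server can reach, so it no longer matters which server a request lands on. Here is a minimal sketch; `SharedSessionStore` stands in for an external database or cache, and all names are hypothetical.

```python
class SharedSessionStore:
    """Stands in for an external store (database or cache) shared by all servers."""

    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        # Create an empty session the first time an id is seen.
        return self._sessions.setdefault(session_id, {"cart": []})


class WebServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        # State lives in the shared store, not in this server's memory.
        session = self.store.load(session_id)
        session["cart"].append(item)
        return list(session["cart"])


store = SharedSessionStore()
server_a = WebServer("A", store)
server_b = WebServer("B", store)

server_a.add_to_cart("sess-42", "book")                  # first request hits A
cart_seen_by_b = server_b.add_to_cart("sess-42", "pen")  # next request hits B
print(cart_seen_by_b)
```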

An SSL connection requires the certificate to match the hostname the client asked for. For example, say I have two A records for server.eskratch.com, which actually point to node1.eskratch.com and node2.eskratch.com. If each node presents a certificate issued for its own name (node1 or node2), it will not match the hostname in the request https://server.eskratch.com. So a certificate issued for server.eskratch.com, along with its private key, has to be installed on both node1 and node2, so that whichever node answers presents a certificate that names "server" when the client verifies it.

2) Software Load Balancers:
There are a lot of software load balancers available (open source :P). HAProxy is one of the load balancers I have worked on (I am not adept at it, though). Say we have 10 servers. HAProxy will run on one of those 10 servers or on a separate box. A request first reaches HAProxy, which schedules requests across the different web servers behind it (round robin and weighted round robin are commonly used). If a backend web server is down, HAProxy should not send requests to that box. Health checks are done periodically, a bit like a heartbeat: HAProxy probes every backend server at regular intervals, and a server that stops answering "I am alive" is no longer considered healthy, so requests won't be forwarded to it. Regarding session maintenance, the load balancer can add a cookie to each response as part of the HTTP headers, so that (if necessary) it sends incoming requests carrying that cookie to the same backend server.
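To make the scheduling, health check and cookie ideas concrete, here is a minimal sketch in Python. It is not how HAProxy is implemented: the class names, the `SERVERID` cookie name and the naive weight expansion are assumptions for illustration, and health is flipped by hand instead of by a real periodic probe.

```python
import itertools


class Backend:
    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight
        self.healthy = True


class LoadBalancer:
    COOKIE = "SERVERID"  # hypothetical cookie name used for stickiness

    def __init__(self, backends):
        self.backends = {b.name: b for b in backends}
        # Naive weighted round robin: a backend with weight 2 appears
        # twice in the repeating schedule.
        schedule = [b for b in backends for _ in range(b.weight)]
        self._cycle = itertools.cycle(schedule)
        self._schedule_len = len(schedule)

    def mark(self, name, healthy):
        # In real life a periodic health check would call this.
        self.backends[name].healthy = healthy

    def _next_healthy(self):
        # Walk at most one full schedule looking for a healthy backend.
        for _ in range(self._schedule_len):
            backend = next(self._cycle)
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backend available")

    def route(self, cookies):
        # Honour the sticky cookie only if that backend is still healthy.
        sticky = self.backends.get(cookies.get(self.COOKIE))
        backend = sticky if sticky and sticky.healthy else self._next_healthy()
        # Return the chosen backend and the cookie to set on the response.
        return backend.name, {self.COOKIE: backend.name}


lb = LoadBalancer([Backend("web1", weight=2), Backend("web2")])
picks = [lb.route({})[0] for _ in range(3)]

lb.mark("web1", False)  # the health check noticed web1 is down
rerouted, new_cookie = lb.route({"SERVERID": "web1"})
print(picks, rerouted, new_cookie)
```

A client whose cookie points at a dead server is quietly moved to a healthy one and handed a new cookie, which is how the crash stays hidden from the end user.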

The TCP connection usually terminates at the load balancer, and the load balancer forwards requests to the backend servers with the source IP being its own. So the backend server sends its response to the load balancer, which in turn uses its TCP connection with the client to send the response back. Since the TCP connection ends at the load balancer, SSL is usually stripped (terminated) there as well; the backend servers sit on private IPs, so security on that leg is not a big issue. So one need not sync certs across nodes as in DNS load balancing.

Some use cases required one more change. People wanted servers to send responses directly to the client without going back through the load balancer. This is called Direct Server Return (DSR), also known as direct routing. Here, when a request hits the load balancer, the load balancer does not terminate TCP. It just forwards the packet to one of the servers at layer 2, by rewriting the destination MAC address (which it learns via ARP) and leaving the IP header untouched. Now a server will accept a packet only if the destination IP in the packet is one of its own IPs. So each server has the load balancer's IP (the destination IP) configured on an interface like lo (loopback). The server can then process the packet and send the response directly to the client (the source IP in the packet).
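The packet flow in DSR can be sketched as a small simulation. Everything here is made up for illustration (the `Packet` fields, the MAC strings, the documentation-range IPs); it only models which header fields change, not real networking.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_mac: str


VIP = "203.0.113.10"  # hypothetical load balancer (virtual) IP


class Server:
    def __init__(self, mac, real_ip):
        self.mac = mac
        # The VIP is also configured on the loopback interface.
        self.local_ips = {real_ip, VIP}

    def receive(self, pkt):
        # Accept only frames addressed to our MAC whose destination IP is ours.
        if pkt.dst_mac != self.mac or pkt.dst_ip not in self.local_ips:
            return None
        # Reply straight to the client, sourced from the VIP, bypassing the LB.
        return Packet(src_ip=pkt.dst_ip, dst_ip=pkt.src_ip, dst_mac="gateway-mac")


def lb_forward(pkt, server):
    # DSR: rewrite only the destination MAC; the IP header stays untouched.
    return replace(pkt, dst_mac=server.mac)


server = Server(mac="server1-mac", real_ip="10.0.0.11")
request = Packet(src_ip="198.51.100.7", dst_ip=VIP, dst_mac="lb-mac")

forwarded = lb_forward(request, server)
reply = server.receive(forwarded)
print(forwarded, reply)
```

Because the reply carries the VIP as its source IP, the client sees an answer from the address it originally contacted, even though the load balancer never touched the response.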

A team in my company uses Varnish, which cannot exactly be called a load balancer. Apart from load balancing, it caches responses to frequent requests, which suits static content.

3) Hardware Load Balancers:
I have my doubts about the name "hardware load balancers"; "efficient and costly load balancers" would suit them better. These load balancers usually come preinstalled on the vendor's own hardware. They provide UIs to create multiple virtual servers and the nodes behind those servers. To explain what a virtual server is: say a company maps its mail service to IP A and its web servers to IP B. Requests to A and B hit the same load balancer box but different virtual servers (a virtual server being a process that accepts requests based on IP). Each virtual server distributes requests to the real nodes behind it.
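The virtual server idea boils down to a lookup from destination IP to a pool of nodes. Below is a minimal sketch, with hypothetical class names, IPs and node names, and plain round robin standing in for whatever scheduling a real appliance offers.

```python
import itertools


class VirtualServer:
    """Accepts requests for one IP and spreads them over its own nodes."""

    def __init__(self, vip, nodes):
        self.vip = vip
        self._nodes = itertools.cycle(nodes)

    def pick_node(self):
        return next(self._nodes)


class LoadBalancerBox:
    """One physical box hosting several virtual servers, keyed by IP."""

    def __init__(self, virtual_servers):
        self._by_vip = {vs.vip: vs for vs in virtual_servers}

    def dispatch(self, dst_ip):
        # The destination IP decides which virtual server handles the request.
        return self._by_vip[dst_ip].pick_node()


box = LoadBalancerBox([
    VirtualServer("192.0.2.1", ["mail1", "mail2"]),        # IP A: mail
    VirtualServer("192.0.2.2", ["web1", "web2", "web3"]),  # IP B: web
])

mail_picks = [box.dispatch("192.0.2.1") for _ in range(3)]
web_picks = [box.dispatch("192.0.2.2") for _ in range(2)]
print(mail_picks, web_picks)
```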

Less manpower is needed, as configuration changes are smooth, and since we pay for the product, one can pull in the vendor if something goes wrong. This is useful for an architecture that handles very heavy incoming traffic.

BIG-IP from F5 Networks is one such load balancer.

PS: There may be multiple ways to implement these things. What I have described here is based on the implementations at my workplace.
