Load Balancing

This blog always talks about things at a very basic level. This is one such post, on the basics of load balancing. During my college days, I wondered how sites run 24*7 without any downtime. I also wondered how sites like Google and Yahoo handle so much traffic (add Facebook and Quora to that list now).
One of the key contributors to this awesomeness is the load balancer. In simple terms, there are many hosts ready to serve an incoming request. So the questions are:
1)How does one route traffic to any one of the hosts (such that the load is not unfairly skewed)?
2)How do we keep end users unaware of server crashes in our datacenter?
3)How do we make cookie-based transactions possible if each request hits a different server?
4)How do we make HTTPS transactions possible (as they are tightly bound to the hostname)?
Let's see the different kinds of solutions available and how they solve the above problems.
1)DNS Load Balancers:
In this method we associate two or more A records with the same domain name. Traffic is routed more or less randomly to one of the servers, because the order of the IPs is rotated for each DNS request. But hosts may cache an IP address for a hostname and keep trying to contact it even if that server is down. This makes downtime visible to end users. DNS changes also take time to propagate, so a host cannot be removed quickly.
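The rotation of A records described above can be sketched as a toy model. This is only an illustration, not how any particular DNS server is implemented; the class name `RoundRobinZone`, the hostname `www.example.test`, and the IPs are all made up for the example.

```python
from collections import deque


class RoundRobinZone:
    """Toy model of a DNS zone that rotates its A records on every query."""

    def __init__(self, records):
        # records: hostname -> list of IP addresses (the A records)
        self._records = {name: deque(ips) for name, ips in records.items()}

    def resolve(self, name):
        ips = self._records[name]
        answer = list(ips)  # order returned for this query
        ips.rotate(-1)      # the next query starts with the next IP
        return answer


# Hypothetical hostname and addresses, used only for illustration.
zone = RoundRobinZone({"www.example.test": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]})

# Clients typically try the first address in the answer.
first_choices = [zone.resolve("www.example.test")[0] for _ in range(6)]
print(first_choices)

# A client that cached its first answer keeps using it, even if that host dies.
cached_ip = first_choices[0]
print("cached client still talks to", cached_ip)
```

The last two lines show the weakness mentioned above: nothing in this scheme tells a caching client that its chosen server has gone down.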
Suppose a client requests a transaction: server A is reached, performs the work, and caches some data for faster access on the next request. If the next request hits B, B may have to do the job all over again and may not be able to reproduce the work. In general the session is lost, as there is no broker in the middle to route traffic. There are ways to counter this. This question sheds light on the same.

An SSL connection requires the certificate to match the hostname. For example, if I have two A records for 'server' which actually correspond to node1 and node2, the SSL certificate returned by either of the real nodes will not match the hostname 'server' (it will be for node1 or node2). So the private key and certificate have to be synced across node1 and node2, so that the certificate the client verifies names the host as 'server'.
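To make the hostname mismatch concrete, here is a simplified sketch of the name check a client performs. It is not a real TLS implementation: `cert_matches` is a hypothetical helper, the `.example.test` domain is an assumption, and real clients apply stricter rules than this.

```python
def cert_matches(requested_host, cert_names):
    """Return True if the requested hostname is covered by the certificate names."""
    requested = requested_host.lower()
    for name in cert_names:
        name = name.lower()
        if name == requested:
            return True
        # Simplified wildcard rule: "*.example.test" covers exactly one extra label.
        if name.startswith("*.") and "." in requested:
            if requested.split(".", 1)[1] == name[2:]:
                return True
    return False


# The client asked for "server" (hypothetical domain added for illustration)
# and DNS happened to send it to node1.
requested = "server.example.test"
per_node_cert = ["node1.example.test"]  # certificate issued for the node's own name
shared_cert = ["server.example.test"]   # same certificate and key copied to node1 and node2

print(cert_matches(requested, per_node_cert))
print(cert_matches(requested, shared_cert))
```

With per-node certificates the check fails on whichever node answers; with one certificate for 'server' installed on both nodes it passes on either.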

2)Software Load Balancers:
There are a lot of software load balancers available (open source :P). HAProxy is one load balancer I have worked on (though I am not adept at it). Say we have 10 servers. HAProxy will run on any one of the 10 servers or on a separate box. A request first reaches HAProxy, which schedules requests to the different web servers behind it (round robin and weighted round robin are commonly used). If a backend web server is down, HAProxy should not send requests to that box. Health checks are done periodically using a technique called 'heartbeat'. It is basically like every backend server periodically telling HAProxy "I am alive"; otherwise it is not considered healthy and requests won't be forwarded to it. Regarding session maintenance, the load balancer adds a session cookie to each response as part of the HTTP headers so that (if necessary) it can send incoming requests carrying that cookie to the same backend server.
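The three ideas in this paragraph (weighted round robin, heartbeat-based health, and cookie stickiness) can be put together in a small toy model. This is a sketch of the concepts as described here, not HAProxy's actual implementation; the class names, the cookie name `SERVERID`, the 5 second timeout, and the fake clock are all assumptions made for the example.

```python
import itertools
import time


class Backend:
    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight
        self.last_heartbeat = None  # time of the last "I am alive" message


class LoadBalancer:
    def __init__(self, backends, heartbeat_timeout=5.0, clock=time.monotonic):
        self.backends = {b.name: b for b in backends}
        self.timeout = heartbeat_timeout
        self.clock = clock
        # Weighted round robin: a backend with weight w appears w times per cycle.
        self._order = [b.name for b in backends for _ in range(b.weight)]
        self._cycle = itertools.cycle(self._order)

    def heartbeat(self, name):
        self.backends[name].last_heartbeat = self.clock()

    def is_healthy(self, name):
        seen = self.backends[name].last_heartbeat
        return seen is not None and self.clock() - seen <= self.timeout

    def route(self, cookies):
        """Return (backend name, cookies to set on the response)."""
        sticky = cookies.get("SERVERID")  # hypothetical cookie name
        if sticky in self.backends and self.is_healthy(sticky):
            return sticky, {}
        # Otherwise walk the cycle, skipping backends without a recent heartbeat.
        for _ in range(len(self._order)):
            name = next(self._cycle)
            if self.is_healthy(name):
                return name, {"SERVERID": name}
        raise RuntimeError("no healthy backend available")


now = [0.0]  # fake clock so the example is deterministic
lb = LoadBalancer([Backend("web1"), Backend("web2", weight=2)], clock=lambda: now[0])
lb.heartbeat("web1")
lb.heartbeat("web2")

picks = [lb.route({})[0] for _ in range(6)]
print(picks)  # web2 gets twice as many requests as web1

sticky_pick = lb.route({"SERVERID": "web1"})[0]
print(sticky_pick)  # the cookie pins the client to web1

now[0] = 10.0         # ten seconds later, only web2 has reported in
lb.heartbeat("web2")
after_failure = lb.route({"SERVERID": "web1"})
print(after_failure)  # web1 missed its heartbeat, so the client is moved to web2
```

Note that when the sticky backend is unhealthy the client is silently moved to another server, so whatever session data lived only on the old backend is still lost.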

The TCP connection usually terminates at the load balancer, and the load balancer forwards requests to the backend servers with its own IP as the source IP. So the backend server sends its response to the load balancer, which in turn uses its TCP connection with the client to send the response back. Since the TCP connection ends at the load balancer, SSL is also stripped at the load balancer; the backend servers are on private IPs, so security between them won't be a big issue. So one need not sync certs across nodes as in DNS load balancing.

Some use cases required one more change. People wanted servers to send responses directly, without forwarding them via the load balancer. This is called Direct Server Return (DSR). Here, when a request hits the load balancer, the load balancer does not terminate TCP; it just forwards the packet at layer 2 to one of the servers, addressing it by the server's MAC address (learned via ARP). Now the server will accept a packet only if one of its IPs matches the destination IP in the packet. So each server has an interface such as lo (loopback) that carries the IP of the load balancer (the destination IP). Now the server can process the packet and send the response directly to the client (the source IP in the packet).
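The packet flow in DSR can be modelled in a few lines. This is a conceptual sketch only: the `Packet` fields, the addresses, and the placeholder MAC strings are invented for illustration, and the only point it tries to make is that the load balancer changes the destination MAC while the IPs stay untouched.

```python
from dataclasses import dataclass, replace

VIP = "203.0.113.10"  # hypothetical IP of the load balancer that clients connect to


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_mac: str
    payload: str = ""


class RealServer:
    def __init__(self, own_ip, mac):
        self.mac = mac
        # The VIP sits on the loopback interface next to the server's own IP.
        self.local_ips = {own_ip, VIP}

    def handle(self, packet):
        if packet.dst_mac != self.mac or packet.dst_ip not in self.local_ips:
            return None  # not addressed to this server: drop
        # The reply goes straight to the client, with the VIP as its source.
        return Packet(
            src_ip=packet.dst_ip,
            dst_ip=packet.src_ip,
            dst_mac="gateway-mac",  # placeholder for the next hop toward the client
            payload="response to " + packet.payload,
        )


def dsr_forward(packet, server):
    """Load balancer step: only the destination MAC changes, the IPs do not."""
    return replace(packet, dst_mac=server.mac)


server = RealServer("10.0.0.11", "aa:bb:cc:00:00:11")
request = Packet(src_ip="198.51.100.7", dst_ip=VIP, dst_mac="lb-mac", payload="GET /")

forwarded = dsr_forward(request, server)
reply = server.handle(forwarded)
print(forwarded.dst_ip, reply.src_ip, reply.dst_ip)
```

Because the reply carries the VIP as its source address, the client sees an answer from the same IP it contacted, even though the load balancer never touched the response.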

A team in my company uses a load balancer (it cannot exactly be called a load balancer) called Varnish. Apart from load balancing, it caches frequent requests and responses, which suits static content.

3)Hardware Load Balancers:
I doubt the name Hardware Load Balancers; it would be more apt to call them efficient and costly load balancers. These load balancers usually come installed on the vendor's own hardware. They provide UIs to create multiple virtual servers and the nodes behind those servers. To explain what a virtual server is: say a company maps its mail service to IP A and its other web servers to IP B. Requests to A and B will hit the same load balancer box but different virtual servers (a virtual server is a process that accepts requests based on IP). The virtual server distributes requests to the real nodes behind it.
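The virtual server idea can be sketched as a lookup from destination IP to a pool of nodes. This is a toy model of the concept, not how any vendor's product works; the class names, the IPs standing in for A and B, and the node names are assumptions for the example.

```python
import itertools


class VirtualServer:
    """Accepts requests for one IP and spreads them over its own nodes."""

    def __init__(self, name, nodes):
        self.name = name
        self._nodes = itertools.cycle(nodes)  # plain round robin per virtual server

    def pick_node(self):
        return next(self._nodes)


class LoadBalancerBox:
    """One physical box hosting several virtual servers, keyed by destination IP."""

    def __init__(self):
        self._by_ip = {}

    def add_virtual_server(self, ip, virtual_server):
        self._by_ip[ip] = virtual_server

    def dispatch(self, dst_ip):
        virtual_server = self._by_ip.get(dst_ip)
        if virtual_server is None:
            raise LookupError("no virtual server listens on " + dst_ip)
        return virtual_server.name, virtual_server.pick_node()


IP_A = "192.0.2.1"  # hypothetical address for mail
IP_B = "192.0.2.2"  # hypothetical address for the web servers

box = LoadBalancerBox()
box.add_virtual_server(IP_A, VirtualServer("mail", ["mail-node1", "mail-node2"]))
box.add_virtual_server(IP_B, VirtualServer("web", ["web-node1", "web-node2", "web-node3"]))

results = [box.dispatch(ip) for ip in (IP_A, IP_B, IP_A, IP_B)]
print(results)
```

Each virtual server keeps its own rotation, so traffic for mail never influences which web node is chosen next.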

 Less workforce is needed, as making configuration changes is smooth, and one can drag in the vendor if something goes wrong since we pay for it. This is useful for an architecture experiencing very heavy incoming traffic.

 BIG-IP from F5 Networks is one such load balancer.

PS: There might be multiple ways to implement things. The things mentioned here are based on the implementations at my workplace.

