eskratch

Posts

Showing posts from 2012

REDIS

When somebody asks us to create a web application, the software requirements will usually be Apache, MySQL and PHP (LAMP, where L stands for Linux). Imagine an application which stores data in main memory instead of on disk; this can increase your application's performance tremendously. One such application is Redis. Redis stores data in memory instead of on disk and periodically syncs to disk (if necessary). Why Redis? Redis is faster because it keeps data in memory. I read somewhere that for Redis, memory is like disk and disk is like tape. Redis offers a lot of data structures. Basically it's a NoSQL database ( http://en.wikipedia.org/wiki/NoSQL ). They don't support the tables or database schemas we traditionally use. For them everything is a key-value pair, as in a hashmap. Redis allows a key to have values of these types: string, set, sorted set, list, hashmap. To understand each data structure and the commands it supports, have a look at http://simonwillison.net/static/2010/redis-tutorial/ Lot of ...
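To get a feel for the value types listed above, here is a rough analogy in plain Python. It does not talk to a Redis server and it is not the Redis API; the key names, the sample values and the helper `sorted_set_range` are all made up for illustration.

```python
# Plain-Python analogy only: this does not connect to Redis.
# Every key name and value below is hypothetical.
store = {}

store["page:title"] = "REDIS"                      # string
store["tags"] = {"nosql", "cache"}                 # set: unique members, no order
store["recent_posts"] = ["crontab", "lxc"]         # list: ordered, duplicates allowed
store["user:1"] = {"name": "alice", "lang": "en"}  # hashmap: field -> value
store["scores"] = {"alice": 30.0, "bob": 12.5}     # sorted set: member -> score


def sorted_set_range(key):
    """Return the members of a sorted-set-like dict, lowest score first."""
    members = store[key]
    return sorted(members, key=lambda m: (members[m], m))


print(sorted_set_range("scores"))
```

The point is only that one key maps to one value, and the value itself can be a richer structure than a string.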

Creating own Hosting Space

Requirement: A computer connected to a DSL modem, and access rights to the DSL modem's admin interface. In this setup, if I connect to a website, the request reaches the modem. All systems behind the modem will have private IPs (say 192.168.x.x). The modem performs NAT and sends the request to the ISP's upstream router. The modem has a different IP on the ISP's upstream subnet, so it puts in its own IP and does some port translation so that it can map the reply back to the same computer. A very, very rough sketch is at http://drawit.eskratch.com/fetch?q=464041803 where A and B are clients sending requests to web servers on port numbers 8140 and 8141. Modem m receives them, translates the requests to m:80 and m:81 respectively, and sends them to the ISP modem. The ISP modem associates a public IP pub with modem m and sends the request to the web server. On return the packet reaches the ISP, which based on the IP mapping sends the data to modem m. Modem m, based on its port mapping, sends the data to the corresponding computer. Now my r...
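The port mapping the modem keeps can be sketched as a small lookup table. This is a toy model, assuming the modem just hands out the next free port for each new private (ip, port) pair; the class name `Nat`, the address 203.0.113.7 and the starting port 40000 are all hypothetical, and a real modem tracks more state than this.

```python
class Nat:
    """Toy NAT table: private (ip, port) <-> port on the modem's own IP."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port   # hypothetical allocation scheme
        self.out = {}                 # (private_ip, private_port) -> public_port
        self.back = {}                # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet, remembering the mapping."""
        key = (private_ip, private_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out[key])

    def inbound(self, public_port):
        """Find which private machine a returning packet belongs to."""
        return self.back.get(public_port)


nat = Nat("203.0.113.7")
print(nat.outbound("192.168.1.10", 8140))  # client A
print(nat.outbound("192.168.1.11", 8141))  # client B
print(nat.inbound(40001))                  # reply for client B
```

A reply arriving on a port with no entry in the table returns None here, which is why a machine behind the modem is not reachable from outside unless a mapping exists.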

Summarize-II

I talked about my summarization tool in the previous post. I have pushed the source code to GitHub: https://github.com/kalyanceg/summarize . It's completely undocumented. I will probably add javadoc for all methods and classes in a couple of weeks. If somebody wishes to contribute to the codebase or build on the existing utility, feel free to check out the repository, and it would be good if you pushed the changes (the productive ones) back to the repo.

SSL Certs, Keys and Trusts

Recently faced some weird issues with SSL while building a new infrastructure. So this will be a post on SSL basics, which are actually sufficient to fix even crazy errors like "SSLHandshakeException". For guys wondering what that is, I too have no idea about it. It's the computer's way of saying, go and read the basics before bugging me. Browser-based handshakes (one-way cert validation): Say I open gmail.com. The browser sends a request to gmail.com, and gmail.com gives me a certificate that has the hostname (gmail), the validity of the certificate, the public key of gmail.com and a hash to make sure the data is not tampered with. The browser uses the public key to encrypt data to gmail, which can be decrypted only with gmail's private key. Similarly gmail sends data encrypted with its private key, which can be decrypted with its public key, which the browser has. SSL Session: But everybody will have gmail's public key. What if any culprit decrypts data from gmail and reads it? So at the sta...

Load Balancing

This blog always talks about things at a very basic level. This is one such post, on the basics of load balancing. During my college days, I wondered how sites run 24*7 without any downtime. I also wondered how sites like Google and Yahoo handle so much traffic (add Facebook and Quora to that list now). One of the key contributors to this awesomeness is the load balancer. In simple terms, there are many hosts ready to serve the incoming request. So the questions are: 1) how does one route traffic to any one of the hosts (such that the load is not unfairly skewed)? 2) how to keep end users unaware of server crashes in our datacenter? 3) how to make cookie-based transactions possible if each request hits a different server? 4) how to make https transactions possible (as they are tightly bound to the hostname)? Let's see the different kinds of solutions available and the way they solve the above problems. 1) DNS Load Balancers: ( http://en.wikipedia.org/wiki/Round-robin_DNS ) In this method we have to...
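Questions 1 and 2 can be sketched with a plain round-robin picker that skips hosts marked as down. This is only a generic illustration, not how round-robin DNS itself behaves (DNS knows nothing about crashed hosts); the class `RoundRobin`, its method names and the 10.0.0.x addresses are hypothetical.

```python
from itertools import cycle


class RoundRobin:
    """Hand out hosts in rotation, skipping any that are marked down."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.down = set()
        self._ring = cycle(self.hosts)

    def mark_down(self, host):
        self.down.add(host)

    def mark_up(self, host):
        self.down.discard(host)

    def pick(self):
        # Try each host at most once per call so we never loop forever.
        for _ in range(len(self.hosts)):
            host = next(self._ring)
            if host not in self.down:
                return host
        raise RuntimeError("no healthy hosts")


lb = RoundRobin(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.pick() for _ in range(4)])
lb.mark_down("10.0.0.2")
print([lb.pick() for _ in range(2)])
```

Each host gets every third request, so the load is not skewed, and a crashed host simply stops being picked. Questions 3 and 4 need more than this sketch offers.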

Summarize

Had been trying this summarizing system for a while. So I refactored all my existing code and created a runnable jar file. Download the jar from here. 1) Click and run the file using Java SE. 2) Select the input file (PDF only). 3) Specify the amount of text to be extracted. The output file will be in the same folder as the input PDF file, with the name being "eskratch"+input_file_name. Errors are still not properly handled; I will try handling them and release an update soon.

Networking In LXC

Had been trying to create multiple VM containers and make them reachable from the existing infrastructure switches. So first I will explain my host system. My host system has two interfaces, em1 and em2. em1 is attached to two VLAN switch ports, 211 and 103, so my interfaces look like em1, em1.211, em1.103 and em2. em2 is reachable through my network. 1) Networking via veth: I created a bridge interface br0 and attached it to em1.211 and em1.103. The containers then connect to br0 using a veth pair. The usual flow of data will be, from outer world to container: em1.x (picks up packets in promisc mode) -> br0 -> host veth pair -> container; from outer world to host: em2 (picks up the packet as em1 has no IP) -> host. 2) Networking via macvlan: Macvlan is a kernel feature which allows an interface to have multiple hw addresses and IPs. So by default macvlan creates a new interface (a virtual one on top of an existing interface) with its own hw and IP address pair and then moves th...

Creating VM-II

Yeah, if somebody tried this ( http://blogs.eskratch.com/2012/10/create-your-own-vms-i.html ), you would probably have faced problems in network connectivity, either to the host or to the VM. Documentation for LXC sucks. So let's quickly go through a series of steps to make the host and container accessible. Requirements: The system should be connected to the internet via ethernet. Use an Ubuntu host. Let's use DHCP to assign IPs to our containers instead of static IPs: remove lxc.network.ipv4 from the conf file. It's enough for the conf file to have
    lxc.utsname = beta
    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = br0
Now create a bridge
    brctl addbr br0
Attach the bridge to the interface eth0
    brctl addif br0 eth0
You will lose your internet connectivity, because during bridging eth0 enters promiscuous mode (where it reads all packets on the network and forwards them to the bridge; if the bridge knows the IP, it forwards the packet, else drops it) if...

Create your own VMs - I

LXC (Linux Containers: http://en.wikipedia.org/wiki/LXC ) is a virtualization technique which provides OS and network stack level isolation. It lets you run multiple OS distros on a host machine with isolated process listings, user groups and IP addressing. The only constraint is that all these containers and the host machine must use the same kernel ( http://en.wikipedia.org/wiki/Linux_kernel ). It makes use of cgroups, a new feature added to the latest Linux kernels. What is a cgroup? A cgroup associates each task with a subsystem, and the subsystem allocates resources shared from the host machine. One can even control the amount of resources by passing them as parameters. There can be a hierarchy of cgroups, i.e. cgroups within cgroups (to be precise, VM within VM). Have a look at http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt to get a taste of cgroups. Create our VM: This works without any issue on an Ubuntu host. 1) install lxc 2) create a mount point for vms mkdir -p /cgroup ...

Data Link Layer

Came across this question on Quora: https://www.quora.com/TCP-IP/What-is-the-main-purpose-of-Data-link-layer-in-TCP-IP-or-OSI-layer . So what does the data link layer (DLL) do when I already know the IP and port from the IP and TCP layers? Actually the IP layer just specifies the destination IP. Say I need to route a packet from network A to network C via network B. The IP address of the host in network C will be in the destination IP field of the IP header. The IP layer in the sending host finds the route from its routing table (use route -n to see the available routes) and sends an ARP request to find the MAC address of the next hop matching the route, then sends the packet to that hop by placing its MAC address in the frame header. Now the packet reaches the router of network A. The router finds the route for the host in network C from its own routing table, places that next hop's MAC address in the frame and sends the packet to a router in network B, which does the same to push the packet to the router in network C and finally to the destination host. Rout...
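The "find the route, then find the MAC" step on a single host can be sketched like this. It is a toy model, assuming a two-entry routing table and an already-filled ARP cache; the tables `ROUTES` and `ARP`, every address in them and the helpers `next_hop` and `frame_for` are hypothetical.

```python
import ipaddress

# Hypothetical routing table: (destination network, gateway).
# A gateway of None means the network is directly connected.
ROUTES = [
    ("192.168.1.0/24", None),
    ("0.0.0.0/0", "192.168.1.1"),
]

# Hypothetical ARP cache: IP on the local network -> MAC address.
ARP = {
    "192.168.1.1": "aa:aa:aa:aa:aa:01",
    "192.168.1.20": "aa:aa:aa:aa:aa:14",
}


def next_hop(dst_ip):
    """Pick the most specific matching route and return the IP to hand off to."""
    dst = ipaddress.ip_address(dst_ip)
    matches = []
    for net, gateway in ROUTES:
        network = ipaddress.ip_network(net)
        if dst in network:
            matches.append((network.prefixlen, gateway))
    if not matches:
        raise LookupError("no route to " + dst_ip)
    _, gateway = max(matches, key=lambda m: m[0])
    return gateway or dst_ip


def frame_for(dst_ip):
    """The destination IP stays the same; only the MAC is that of the next hop."""
    return {"dst_mac": ARP[next_hop(dst_ip)], "dst_ip": dst_ip}


print(frame_for("10.3.0.5"))      # remote host: MAC of the gateway
print(frame_for("192.168.1.20"))  # local host: its own MAC
```

Every router on the path repeats the same lookup with its own tables, which is why the MAC address changes at each hop while the destination IP does not.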

Crontab

Cron is the task scheduler in Linux-based systems. Any user can add their job by editing their crontab. The job runs at the time specified by the user. To add a job: 1) type crontab -e in the shell 2) specify minute, hour, day of month, month, day of the week (as required) 3) specify the command (e.g. echo "happi Bday"|sendmail "kalyanceg@eskratch.com") 4) save with :wq and exit. Cron will get the job done for you at the time you scheduled. Algo behind Cron: Originally cron slept for 1 minute, then checked for any job in the queue; if there was one, it ran the job and then slept for another minute. This naive method is not scalable. So the jobs were then put in an event list. The head is the job due soonest in the future. Cron sleeps till the head is scheduled. Once it discharges the head, its sleeping time becomes (next head's time - current time). It can't sleep without taking new entries into account. So a SIGHUP interrupt will wake the sleeping cron t...
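The event list idea can be sketched with a heap. This is a minimal sketch of the technique described above, not cron's actual source; the class `EventList`, its method names and the use of plain integers as timestamps are assumptions made for illustration.

```python
import heapq


class EventList:
    """Jobs ordered by due time; the head is always the job due soonest."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so jobs with equal times keep insertion order

    def add(self, run_at, command):
        heapq.heappush(self._heap, (run_at, self._seq, command))
        self._seq += 1

    def sleep_time(self, now):
        """How long to sleep: head's due time minus the current time."""
        if not self._heap:
            return None
        return max(0, self._heap[0][0] - now)

    def pop_due(self, now):
        """Remove and return every job whose time has come."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


events = EventList()
events.add(300, "backup")
events.add(120, "mail")
print(events.sleep_time(100))  # sleep until "mail" is due
events.add(110, "new")         # a new entry arrives while sleeping
print(events.sleep_time(100))  # the sleep has to be recomputed
print(events.pop_due(120))
```

The second `sleep_time` call is the reason a sleeping scheduler has to be woken when the list changes: the new job is due before the old head, so the earlier sleep would overshoot it.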