It’s been a while since I last posted in the “hardcores or masochists” series. That is down to lack of time, since an article like this one takes 4-5 hours of research and coding. But here I am again with a simple one. You must have heard of a “proxy server”. It is exactly what the name says: it stands between you and the rest of the world for various protocols (HTTP, FTP, SSL etc). In this small tutorial we will see how an HTTP proxy works, along with a small example program that does just that.

But let’s see in detail how the HTTP proxy should work. Below is the architecture of a proxy system.

The browser first has to be configured to send all its traffic through an HTTP proxy. Once that is done, the browser makes almost the same request it would make to the end server, with a couple of small changes. Below is the dump of a Firefox request to google.gr.

GET http://google.gr/ HTTP/1.1
Host: google.gr
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Cookie: PREF=ID=6d527a9e2768fc73:TM=1213705662:LM=1213705662:S=pSyveN4XTqCHejq_

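As you can see, the request is almost the same as the one we saw in a previous article, with two small differences: the request line carries the full URL (“http://google.gr/”) instead of just the path, and there is an extra header, “Proxy-Connection: keep-alive”. Following is an example of an IE request.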

GET http://google.gr/ HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel,application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: el
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)
Host: google.gr
Proxy-Connection: Keep-Alive

It is basically the same. The “Proxy-Connection: Keep-Alive” header is still there, and the request line again carries the full URL. These are the signs that tell us a request is going through a proxy and not directly to the server.
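The full URL in the request line (“GET http://google.gr/ ...” in both dumps) is easy to check for in code. Below is a rough sketch that does so and uses java.net.URI to pull the destination out of it. The class and method names (RequestLine, isProxyForm, targetOf) are made up for this example, and it only covers plain http URLs.

```java
import java.net.URI;

class RequestLine {
    // True when the request target is an absolute http URL,
    // which is what a browser sends when it talks to a proxy.
    static boolean isProxyForm(String requestLine) {
        String[] parts = requestLine.trim().split(" ");
        return parts.length == 3 && parts[1].toLowerCase().startsWith("http://");
    }

    // Returns the request target as a URI, so host, port and path can be read from it.
    static URI targetOf(String requestLine) {
        String[] parts = requestLine.trim().split(" ");
        return URI.create(parts[1]);
    }

    public static void main(String[] args) {
        String viaProxy = "GET http://google.gr/ HTTP/1.1";
        String direct = "GET / HTTP/1.1";
        System.out.println("via proxy? " + isProxyForm(viaProxy));
        System.out.println("via proxy? " + isProxyForm(direct));
        URI target = targetOf(viaProxy);
        System.out.println("host=" + target.getHost() + " path=" + target.getPath());
    }
}
```

Checking the request line rather than the “Proxy-Connection” header is a choice made for this sketch; a client is free to leave that header out.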

So if we wanted to create a small proxy server the basic procedure would be this:

  1. Create a server listening on a specific port, the proxy port.
  2. When a client connects we should get a request like the one above. Read all the headers in.
  3. Parse the headers and find the “Host:” header. This way we will know where the request is heading.
  4. Make a socket connection to the destination and forward the whole request, except for the “Proxy-Connection: keep-alive” header, which is basically useless to the destination server.
  5. Read the response from the destination server.
  6. Probably parse it and see if we allow it through our Intranet or not. Maybe also cache it. (Notice the parenthesis on the proxy reply that says filtered. This is what I mean.)
  7. Send the response back to the client browser as is.
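Steps 3 and 4 boil down to a bit of header parsing. Here is one way it could look, as a minimal sketch: the headers go into a map, “Host” is looked up, and “Proxy-Connection” is dropped. The class name HeaderMap and its parse method are made up for this example. It also assumes every header appears only once, which a real proxy cannot rely on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderMap {
    // Parses the "Name: value" lines that follow the request line into an ordered map.
    // Names are lower-cased because HTTP header names are case-insensitive.
    // Simplification: a repeated header overwrites the earlier value.
    static Map<String, String> parse(String request) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = request.split("\r\n");
        for (int i = 1; i < lines.length; i++) {   // index 0 is the request line
            if (lines[i].isEmpty()) break;          // blank line ends the headers
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                String name = lines[i].substring(0, colon).trim().toLowerCase();
                String value = lines[i].substring(colon + 1).trim();
                headers.put(name, value);
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        String request = "GET http://google.gr/ HTTP/1.1\r\n"
                + "Host: google.gr\r\n"
                + "Proxy-Connection: keep-alive\r\n\r\n";
        Map<String, String> headers = parse(request);
        System.out.println("Destination: " + headers.get("host"));    // step 3
        headers.remove("proxy-connection");                           // step 4
        System.out.println("Headers to forward: " + headers.keySet());
    }
}
```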

A small Java example playing the role of a simple proxy server would be like the one below:

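import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;

public class SimpleProxy {
	public static void main(String[] args) throws IOException {
		ServerSocket server = new ServerSocket(2020);
		System.out.println("Waiting for clients...");
		Socket s = server.accept();
		System.out.println("New client...");
		BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
		OutputStream out = s.getOutputStream();
		String read = in.readLine();
		String totalRead = "";
		//readLine() strips the line terminator, so the blank line that ends the headers arrives as ""
		while(read != null && !read.isEmpty()){
			totalRead += read + "\r\n";
			read = in.readLine();
		}
		System.out.println("Read client...");
		totalRead += "\r\n"; //adding the last empty line that ends the request
		Socket client = new Socket(getHost(totalRead), 80);
		InputStream cIn = client.getInputStream();
		BufferedWriter cOut = new BufferedWriter(new OutputStreamWriter(client.getOutputStream()));

		cOut.write(totalRead, 0, totalRead.length());
		cOut.flush();
		System.out.println("Sent headers...");
		//copying raw bytes, so line endings and binary or gzip data arrive untouched
		byte[] buffer = new byte[8192];
		int count = cIn.read(buffer);
		while(count != -1){
			out.write(buffer, 0, count);//forwarding the read data
			out.flush();//flushing the buffer
			count = cIn.read(buffer);
		}//the loop ends only when the destination closes the connection

		client.close();
		s.close();
		server.close();
		System.out.println("done!");
	}

	//one possible getHost: returns the value of the "Host:" header (a ":port" suffix is not handled)
	static String getHost(String headers){
		for(String line : headers.split("\r\n")){
			if(line.toLowerCase().startsWith("host:")){
				return line.substring(5).trim();
			}
		}
		throw new IllegalArgumentException("No Host header found");
	}
}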

A few things need to be noted about the above code. First of all, the “Proxy-Connection: keep-alive” header is not removed. This is because the code is a rough sketch that just demonstrates how a proxy works. If you want to make a good proxy you need to remove it. Even better, construct a new request from scratch. You can do that by parsing the headers and reconstructing the request. The second thing to notice is that step 6 from the above list is completely missing. This is for the same reason: the code is merely an example. A good proxy server, for instance Squid, should filter the incoming data and most probably cache it.
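Reconstructing the request could look something like the sketch below. It turns the full URL back into a plain path, skips “Proxy-Connection”, and adds “Connection: close”. That last part is my own assumption, not something the steps above require: it asks the destination to close the socket when it is done, so a loop that reads until end of stream can finish. The class name RequestRebuilder is made up for this example.

```java
import java.net.URI;

class RequestRebuilder {
    // Rebuilds a proxy-style request for the destination server:
    // full URL -> path only, Proxy-Connection dropped, "Connection: close" added.
    static String rebuild(String proxyRequest) {
        String[] lines = proxyRequest.split("\r\n");
        String[] requestLine = lines[0].split(" ");   // method, URL, version
        URI url = URI.create(requestLine[1]);
        String path = url.getRawPath();
        if (path == null || path.isEmpty()) path = "/";
        if (url.getRawQuery() != null) path += "?" + url.getRawQuery();

        StringBuilder sb = new StringBuilder();
        sb.append(requestLine[0]).append(' ').append(path).append(' ')
          .append(requestLine[2]).append("\r\n");
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) break;            // blank line ends the headers
            String lower = lines[i].toLowerCase();
            if (lower.startsWith("proxy-connection:") || lower.startsWith("connection:")) continue;
            sb.append(lines[i]).append("\r\n");
        }
        sb.append("Connection: close\r\n\r\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String fromBrowser = "GET http://google.gr/ HTTP/1.1\r\n"
                + "Host: google.gr\r\n"
                + "Proxy-Connection: keep-alive\r\n\r\n";
        System.out.print(rebuild(fromBrowser));
    }
}
```

This only covers requests without a body, like the GET requests shown earlier.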

One small pointer for all the adventurous ones who will try to code a small proxy: beware of what the “Accept-Encoding:” header contains, because if it lists “gzip” and the server supports gzip encoding, the data you get back will be compressed. So do not try to echo it out on the console, because you will get a lot of strange symbols 😉
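If you do want to look at a compressed body, Java's java.util.zip can unpack it. The sketch below is only an illustration under a few assumptions: the body has already been separated from the response headers, the response said “Content-Encoding: gzip”, and the text is UTF-8. The class GzipBody and its methods are made up for this example, and the gzip method is there just to produce some test data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBody {
    // gzip data always starts with the two magic bytes 0x1f 0x8b.
    static boolean looksGzipped(byte[] body) {
        return body.length >= 2 && (body[0] & 0xff) == 0x1f && (body[1] & 0xff) == 0x8b;
    }

    // Compresses text; used here only to create sample data.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decompresses a gzip body back into text (UTF-8 assumed).
    static String gunzip(byte[] body) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(body))) {
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int count;
            while ((count = gz.read(buffer)) != -1) {
                plain.write(buffer, 0, count);
            }
            return new String(plain.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = gzip("<html>hello</html>");
        System.out.println("compressed? " + looksGzipped(body));
        System.out.println(gunzip(body));
    }
}
```

A lazier alternative is to drop the “Accept-Encoding” header from the request before forwarding it, so the server has no reason to compress anything.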

All in all, a proxy server is just a piece of middle software that forwards requests. Here we discussed the HTTP proxy, but proxies exist for FTP, SSL etc. I hope this small tutorial made things clear and gave you a good idea of what a proxy is.