Continuing my previous post “HTTP for hardcores or… masochists!”, I would like to write a few words about downloading and uploading files over the HTTP protocol. It is often useful to be able to download, or even upload, a file from PHP without human intervention.

Downloading a file is no different from downloading any other page or content. First, let’s see the dump of the server's response when requesting a plain text file called foo.txt:

HTTP/1.1 200 OK
Date: Fri, 29 Feb 2008 21:29:53 GMT
Server: Apache/2.2.3 (Debian) mod_python/3.2.10 Python/2.4.4 PHP/5.2.0-8+etch10 mod_ssl/2.2.3 OpenSSL/0.9.8c mod_perl/2.0.2 Perl/v5.8.8
Last-Modified: Fri, 29 Feb 2008 21:28:15 GMT
ETag: "3c6e-7-b74cd9c0"
Accept-Ranges: bytes
Content-Length: 7
Connection: close
Content-Type: text/plain; charset=UTF-8

foo bar

As you can see, all the web server says is that the content is plain text and that its length is 7 bytes. Plain and easy. But what if we ask for a zip file? Let’s see the dump once more.

HTTP/1.1 200 OK
Date: Fri, 29 Feb 2008 21:35:19 GMT
Server: Apache/2.2.3 (Debian) mod_python/3.2.10 Python/2.4.4 PHP/5.2.0-8+etch10 mod_ssl/2.2.3 OpenSSL/0.9.8c mod_perl/2.0.2 Perl/v5.8.8
Last-Modified: Fri, 29 Feb 2008 21:35:10 GMT
ETag: "25799-77-d0093f80"
Accept-Ranges: bytes
Content-Length: 119
Connection: close
Content-Type: application/zip

See the difference? The Content-Type is application/zip. That’s it. After the headers, the content of the zip follows. Be careful though. The content is binary data, so be sure to read and write it in a binary-safe way. For instance, if you use a PHP script that opens a local file to save the download, be sure to open it with the mode “wb” and not just plain “w”. That’s it! After the two consecutive CRLFs that divide the headers from the content, read Content-Length bytes and save them to the file.
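The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a full HTTP client: the function names split_response and save_body are made up for this example, and it assumes the whole response is already in memory and that the server sent a Content-Length header (no chunked encoding).

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status line, headers, body)."""
    # The first CRLF CRLF separates the headers from the content.
    head, separator, rest = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line between headers and content")

    lines = head.decode("latin-1").split("\r\n")
    status_line = lines[0]

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Read exactly Content-Length bytes; fall back to everything we have
    # if the header is missing (an assumption made for this sketch).
    length = int(headers.get("content-length", len(rest)))
    return status_line, headers, rest[:length]


def save_body(body: bytes, path: str) -> None:
    """Write the content to disk in binary mode ("wb"), never plain "w"."""
    with open(path, "wb") as out:
        out.write(body)
```

Because the split happens at the first blank line only, any CRLF pairs inside a zip file are left untouched.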

The file upload procedure is a bit trickier. Here is the HTML form that I used.

<form action="http://localhost:2020/" method="post" enctype="multipart/form-data">
<input type="file" name="uploadedfile">
<input type="submit">
</form>

So I submit a file called foo.txt through the file upload input called uploadedfile. The file contains “foo bar bara”. I know, a lot of foos, but I’ll write about those stories another time. Anyway, here is the dump from the web server I had on localhost port 2020 when Firefox submitted the form.

POST / HTTP/1.1
Host: localhost:2020
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; el; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: el-gr,el;q=0.7,en-us.;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-7,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------66201975630356
Content-Length: 207

-----------------------------66201975630356
Content-Disposition: form-data; name="uploadedfile"; filename="foo.txt"
Content-Type: text/plain

foo bar bara
-----------------------------66201975630356--

First of all, as you can see, the content type is multipart/form-data, which means exactly what it says: a form with multiple parts has been submitted. Secondly, a boundary is defined. This boundary is used to delimit the data that the form submits. You can see what I mean when you look at the data. Before the file contents there is the boundary (prefixed with two extra dashes), a few more headers after the boundary, two CRLFs, the content of the file and then the boundary again, this time with two trailing dashes because it is the last one. If we had one more file it would look the same: boundary, headers, two CRLFs, data and then the boundary again.

A small note about this boundary. The number is randomly generated. In the dump above, the boundary Firefox submits is 41 bytes long: 27 dashes followed by a 14-digit number. In IE it’s a whole different story: the boundary looks very different but is almost as long. The point is for it to be unique, so that one part can be told apart from another.
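To make the layout concrete, here is a small Python sketch that assembles a multipart/form-data body for a single file. The function name build_multipart_body is hypothetical, and the sketch assumes one file part only and a boundary chosen by the caller.

```python
def build_multipart_body(boundary: str, field: str, filename: str,
                         content: bytes,
                         content_type: str = "text/plain") -> bytes:
    """Build a multipart/form-data body holding a single file part."""
    # Inside the body, every boundary line starts with two extra dashes.
    delimiter = b"--" + boundary.encode("ascii")

    # Headers of the part, ended by an empty line (two CRLFs in a row).
    part_headers = (
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")

    return (
        delimiter + b"\r\n"
        + part_headers
        + content + b"\r\n"
        + delimiter + b"--\r\n"   # the last boundary gets two trailing dashes
    )


# Values taken from the dump: 27 dashes followed by the random number.
boundary = "-" * 27 + "66201975630356"
body = build_multipart_body(boundary, "uploadedfile", "foo.txt", b"foo bar bara")
print(len(body))
```

With these values the body comes to 207 bytes, the same number as the Content-Length header in the dump. The request's Content-Type header carries the boundary without the two leading dashes.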

Using the above techniques you can download and upload files from any program that can make a simple socket connection.

I have made two simple programs written in Java. One simulates a server, opening a socket on port 2020 (to change it you have to recompile). Then, when a request arrives, it outputs everything from the input stream. This way you can see what a browser sends to the web server. Just point your browser to http://localhost:2020. The client makes an HTTP request and is run like this: “java Client host port script”. You can download them from my site from here.
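If you do not want to bother with Java, the idea behind the dump server fits in a few lines of Python. This is not the code of my Java programs, just a rough sketch of the same idea: the names dump_request and serve_once are made up, and it reads only the first chunk the client sends, so a large upload would be cut short.

```python
import socket


def dump_request(conn: socket.socket, max_bytes: int = 65536) -> str:
    """Return the first chunk a client sent and answer with an empty 200."""
    data = conn.recv(max_bytes)
    # Reply with something minimal so the browser does not wait forever.
    conn.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 0\r\n"
        b"Connection: close\r\n"
        b"\r\n"
    )
    # latin-1 maps every byte to a character, so binary data cannot fail here.
    return data.decode("latin-1")


def serve_once(port: int = 2020) -> str:
    """Listen on localhost, accept one connection and return its request."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("127.0.0.1", port))
        listener.listen(1)
        conn, _address = listener.accept()
        with conn:
            return dump_request(conn)
```

Calling print(serve_once()) and then pointing the browser to http://localhost:2020 should print whatever the browser sent.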

But there is a much easier way to simulate a client. You can just telnet to an HTTP server. How? Well, here is the dump from my telnet session (I request the same file foo.txt as above).

> telnet 127.0.0.1 80
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
GET /public_html/foo.txt HTTP/1.1
Host: 127.0.0.1
Connection: close

HTTP/1.1 200 OK
Date: Fri, 29 Feb 2008 23:07:42 GMT
Server: Apache/2.2.3 (Debian) mod_python/3.2.10 Python/2.4.4 PHP/5.2.0-8+etch10 mod_ssl/2.2.3 OpenSSL/0.9.8c mod_perl/2.0.2 Perl/v5.8.8
Last-Modified: Fri, 29 Feb 2008 21:28:15 GMT
ETag: "3c6e-7-b74cd9c0"
Accept-Ranges: bytes
Content-Length: 7
Connection: close
Content-Type: text/plain; charset=UTF-8

foo bar

As you can see the response is the same, but telnet is much easier. Anyway, whatever suits you. If you need my programs, be my guest!
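What I typed into telnet can also be sent from a script over a plain socket. Here is a minimal Python sketch, assuming a server that closes the connection once it has answered (that is what the Connection: close header asks for). The names build_get_request and fetch are hypothetical.

```python
import socket


def build_get_request(host: str, path: str) -> bytes:
    """Build the same three request lines typed into telnet above."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",   # the empty line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:   # an empty read means the server hung up
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Something like fetch("127.0.0.1", "/public_html/foo.txt") would return the raw response, headers and content together, just as telnet shows it.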

If you find this article useful, or not, or if you have anything to add, leave a comment. It will give me a motive to keep posting tutorials like this. Have fun!


Clarifying elijah’s comment: when a server opens a session with the client, as mentioned in the previous article, it sends the following extra headers:
Set-Cookie: PHPSESSID=13c1635cb39e550a2c49381ddc87730e; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
These headers set a cookie at the client that actually gives it the session id. What I did not tell you is what the client does. To resume a session when a request is made, we have to send this cookie back to the server. This happens with the following extra header:

Cookie: PHPSESSID=13c1635cb39e550a2c49381ddc87730e

So, the browser just sends the cookie back. If you want to make a connection with a session, all you have to do is make a first request that initiates the session, which means you will get the PHPSESSID, and then add it to the headers of every later request for the same session. I hope this makes things clearer.
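As a rough illustration, here is how a script could pick the session id out of the response headers and build the header to send back. The function name session_cookie_header is made up for this sketch, and it only handles the simple case shown above, where the cookie name and value come first and attributes such as path follow after a semicolon.

```python
from typing import List, Optional


def session_cookie_header(response_headers: List[str],
                          name: str = "PHPSESSID") -> Optional[str]:
    """Return the Cookie header that resumes the session, or None."""
    for line in response_headers:
        field, _, value = line.partition(":")
        if field.strip().lower() != "set-cookie":
            continue
        # Keep only "PHPSESSID=..." and drop attributes such as "path=/".
        pair = value.split(";", 1)[0].strip()
        key, _, session_id = pair.partition("=")
        if key == name:
            return f"Cookie: {key}={session_id}"
    return None


headers = [
    "Set-Cookie: PHPSESSID=13c1635cb39e550a2c49381ddc87730e; path=/",
    "Expires: Thu, 19 Nov 1981 08:52:00 GMT",
    "Pragma: no-cache",
]
print(session_cookie_header(headers))
```

The returned line is then added to the headers of each later request in the same session.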