Understanding HTTP using Perl
What is HTTP?
HTTP is a client-server protocol by which two machines communicate over a TCP/IP
connection. An HTTP server is a program that listens on one of its machine's ports for HTTP
requests. An HTTP client (we will use the terms HTTP client and web client interchangeably)
opens a TCP/IP connection to the server via a socket, transmits a request for a document, then
waits for a reply from the server. In HTTP/1.0, the version used in this article's examples, the
socket is closed once the request-reply sequence is completed. HTTP is therefore a transactional
protocol: the lifetime of a connection corresponds to a single request-reply sequence (a transaction).
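To make the server's half of a transaction concrete, here is a minimal Python sketch of a program that accepts one connection, reads one request, sends one reply and closes the socket. It is an illustration under simplifying assumptions, not a real web server: the name serve_one and the fixed HTML body are invented for this example, and the request and response text it handles is explained later in this article.

```python
import socket


def serve_one(listener, body=b"<HTML>hello</HTML>"):
    """Handle exactly one HTTP/1.0 transaction on a listening socket.

    Hypothetical helper for illustration only; it ignores the requested
    path and always answers 200 OK with the same body.
    """
    conn, _addr = listener.accept()          # wait for a client to connect
    with conn:                                # socket is closed on exit
        data = b""
        # Read until the blank line that ends the request headers.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        request_line = data.split(b"\r\n", 1)[0].decode("ascii", "replace")
        head = (
            "HTTP/1.0 200 OK\r\n"
            "Content-type: text/html\r\n"
            f"Content-length: {len(body)}\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(head + body)
    return request_line                       # e.g. "GET / HTTP/1.0"
```

A listening socket for it could be created with socket.socket(), bind(("127.0.0.1", 8080)) and listen(1); the address and port here are arbitrary choices.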
HTTP is the protocol used for document exchange on the World Wide Web. Everything that
happens on the web happens over HTTP transactions. TCP/IP networking and HTTP are the two
essential components that make the web work. To write software that accesses the web
(a web browser or a custom web client, for example) you need a basic understanding of both. In this
article we cover HTTP: how it works and how to use it for simple transactions. We plan to
add more articles to this site covering basic network programming issues relating
to TCP/IP and HTTP.
The client side: HTTP requests
When we open a URL with the browser, the browser works out from the URL
the HTTP server's host machine and port, as well as the path of
the document we are requesting from the server. For example,
http://www.perlfect.com/articles/index.shtml refers to the document /articles/index.shtml on the
server at www.perlfect.com, port 80 (no port is specified in the URL, so the default, 80, is used).
Next, an HTTP request is composed for that document and a TCP/IP connection
is made to the server. Then the client (the browser, that is) sends the
request and waits for the server to reply with an HTTP response and, hopefully, the requested
document. If all goes well, the browser displays the document in a window on our
desktop, rendering the HTML code into a visual layout and making additional requests for any
images or other files that are embedded in the HTML document.
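The first step described above, working out host, port and document path from a URL, can be sketched in Python with the standard urllib.parse module. The function name split_url is made up for this example, and the sketch handles only plain http:// URLs.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, port, path) for an http:// URL (illustrative helper)."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch handles only http:// URLs")
    host = parts.hostname
    port = parts.port or 80       # no port in the URL: use the default, 80
    path = parts.path or "/"      # no path in the URL: ask for the root document
    if parts.query:
        path += "?" + parts.query
    return host, port, path


host, port, path = split_url("http://www.perlfect.com/articles/index.shtml")
# host is "www.perlfect.com", port is 80, path is "/articles/index.shtml"
```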
Now, let's have a look under the hood to see what those HTTP requests look like. Suppose you
type the URL of the previous example, http://www.perlfect.com/articles/index.shtml, into
Netscape's location box. Here's what the request will look like. (For the sake of clarity, the
following request contains only as many headers as are needed to demonstrate the general
form and function of an HTTP request. Netscape will send a more elaborate request, but
its essential parts are those shown below.)
GET /articles/index.shtml HTTP/1.0
User-Agent: Mozilla 4.0 (X; I; Linux-2.0.35i586)
Host: www.perlfect.com
Accept: image/gif, image/jpeg, */*
The first line contains three important pieces of information: the request method (GET), the
requested document (/articles/index.shtml) and the HTTP protocol version that the client uses
(1.0). You might wonder what the request method is, but you don't need to worry about
it at this point. There are a few different request methods, the most common ones being:
- GET asks to retrieve a document
- POST passes form data to the server for use as input to some CGI program
- HEAD asks to retrieve only the HTTP response header for a document but not the
document itself.
There are others, too, that are much less frequently used, and we won't discuss them. The general
structure of a request applies to all methods, so we will stick to GET for now to demonstrate how
requests work in general.
Following that, there are a number of lines called request headers. They are all of the form
Header-name: Header Value, and they specify information and parameters that help the
server provide a suitable response. In this example the headers indicate the client software
name and version (User-Agent), the server hostname for which the request is meant (Host) and
the MIME types that the client is willing to accept (Accept). The Host header is needed because
a single HTTP server might serve documents under different names, each name corresponding
to a different directory tree, so the server needs to be told which name to look up
the document under.
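The request shown above can be assembled programmatically. The following Python sketch builds the same request line and headers; the function name build_request is invented for this example. One detail the listing does not show is that, on the wire, each line ends with a carriage return and line feed (CRLF) and an empty line ends the header block.

```python
def build_request(method, path, headers, version="HTTP/1.0"):
    """Assemble a request: request line, header lines, then an empty line.

    Illustrative helper only; it does no validation of its arguments.
    """
    lines = [f"{method} {path} {version}"]
    for name, value in headers.items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF; the extra CRLF is the empty line
    # that tells the server the headers are finished.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "GET",
    "/articles/index.shtml",
    {
        "User-Agent": "Mozilla 4.0 (X; I; Linux-2.0.35i586)",
        "Host": "www.perlfect.com",
        "Accept": "image/gif, image/jpeg, */*",
    },
)
```

Changing the first argument to "HEAD" produces a HEAD request with the same structure, which is why one function is enough for the methods listed earlier that carry no body.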
The server side: HTTP responses
Now let's look at the server's response:
HTTP/1.0 200 OK
Date: Thu, 08 Oct 1998 16:17:52 GMT
Server: Apache/1.1.1
Content-type: text/html
Content-length: 1538
Last-modified: Mon, 05 Oct 1998 01:23:50 GMT

<HTML>
<HEAD>
<TITLE>Perlfect Solutions</TITLE>
...
The first line contains the version of HTTP used in the response, and the response status as both
a numerical code (200) and a human-readable string (OK). There are a number of such response
codes. To give two common examples: 200 OK means that the document has been found and
that it follows the response headers, and 404 Not Found means that the document path does
not exist.
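Taking the status line apart is straightforward. This Python sketch splits it into the three fields described above; parse_status_line is a name made up for the example.

```python
def parse_status_line(line):
    """Split 'HTTP/1.0 200 OK' into (version, numeric code, reason string).

    Illustrative helper; a malformed line raises ValueError.
    """
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError("not an HTTP status line: %r" % line)
    version, code = parts[0], int(parts[1])
    # The human-readable string may contain spaces, or be absent.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


version, code, reason = parse_status_line("HTTP/1.0 200 OK")
```

A client should base its decisions on the numeric code; the text after it is there for human readers.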
Like requests, responses have headers, which pass information
about the document in transit and the status of the server and the request. In the example above
the headers provide the server software and version, the date and time the
response was issued, and finally the MIME type, length and last modification date of the document
in transit.
A blank line marks the end of the head of the response, and then the document follows. After the
browser has finished receiving the HTML document in question and the TCP/IP connection has
been dropped, it goes on to request any additional embedded documents (in-line images, for
example) and renders the page's layout on screen. Clicking on a link causes the browser to
issue a new request for the page pointed to by the link, and so on.
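Because the blank line separates the head from the document, a client can split a raw response with very little code. The Python sketch below does so under simplifying assumptions: the whole response is already in memory, and split_response and the shortened sample response are invented for illustration.

```python
def split_response(raw):
    """Split raw response bytes into (status line, headers dict, body bytes).

    Illustrative helper; repeated headers simply overwrite each other.
    """
    # The first empty line (CRLF CRLF) ends the head of the response.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# A shortened, made-up response in the same shape as the example above.
sample = (
    b"HTTP/1.0 200 OK\r\n"
    b"Server: Apache/1.1.1\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 13\r\n"
    b"\r\n"
    b"<HTML></HTML>"
)
status_line, headers, body = split_response(sample)
```

Here headers["content-type"] tells the client how to interpret the body, and headers["content-length"] says how many bytes of document to expect.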
Playing around
As mentioned earlier, the examples shown here, while perfectly correct and
working, are merely indicative of the HTTP protocol. The reader is encouraged to play around and
experiment with the HTTP requests and responses of real clients and servers. For example, if
you telnet to port 80 of a host with an active web server and type in a simple
request like the example we gave, followed by an empty line, you can have fun watching the
server's response come streaming in live. Try non-existent documents, images, or whatever else to
see real examples of responses. On the other end, check whether your web server provides
diagnostic facilities that let you inspect the incoming requests from web browsers. As with
anything in computing, there's a lot to learn from such playing around.
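The telnet experiment can also be scripted. This Python sketch opens a TCP connection, sends a bare-bones GET request and collects everything the server sends until it closes the connection. The names exchange and fetch_raw, the default timeout and the choice of headers are assumptions made for the example; real servers may also answer with a newer HTTP version than the one requested.

```python
import socket


def exchange(sock, request):
    """Send request bytes on a connected socket; return all bytes until EOF."""
    sock.sendall(request)
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:             # empty read: the server closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


def fetch_raw(host, path="/", port=80, timeout=10.0):
    """Perform one HTTP/1.0 GET and return the raw response (head and body)."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return exchange(sock, request)


# Example use (needs network access):
#   raw = fetch_raw("www.perlfect.com", "/articles/index.shtml")
#   print(raw.decode("iso-8859-1"))
```

Asking for a path that does not exist is a quick way to see a 404 response in full.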
Suggested Reading
Perl and LWP is an excellent book to get you started with using sockets
and HTTP to write your own web clients in Perl. It covers many issues relevant to web clients, and while it does
not go into much depth on some of them, by the time you have absorbed the techniques it describes you will
no longer need a book to walk you through more complex problems.
Advanced Perl Programming, among various other very interesting subjects, dedicates a chapter to socket
programming, not in the context of web clients but still in a very clear and to-the-point manner. It is also a good book
to have if you're seriously interested in Perl programming, in my opinion.