CSE 582 Data Structures and Algorithms
HTTP Background
In this section, we provide a brief overview of how a simple web server works and the HTTP protocol. Our goal in providing you with the web server is that you should be shielded from all of the details of network connections and the HTTP protocol. The code that we give you already handles everything that we describe in this section. If you are really interested in the full details of the HTTP protocol, you can read the HTTP specification, but we do not recommend it for this project.
Most web browsers and web servers interact using a text-based protocol called HTTP (Hypertext Transfer Protocol). A web browser opens an Internet connection to a web server and requests some
content with HTTP. The web server responds with the requested content and closes the connection.
The browser reads the content and displays it on the screen.
Each piece of content on the server is associated with a file. If a client requests a specific disk file, then this is referred to as static content. If a client requests an executable file to be run and its output returned, then this is called dynamic content. In this lab, our web server will only handle static content.
Each file requested from the server has a unique name known as a URL (Uniform Resource Locator). A URL can include a port number after the host name; the port number is optional and defaults to the well-known HTTP port of 80.
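To make the default-port rule concrete, here is a minimal sketch in C that splits a name of the form host[:port]/path into its parts. The function name split_url and its signature are our own illustration and are not part of the provided server; the sketch also assumes there is no "http://" prefix.

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_HTTP_PORT 80

/* Hypothetical helper, not part of the provided server.
 * Splits "host[:port]/path" into host, port, and path.
 * Returns 0 on success, or -1 if the input is malformed or a buffer is too small. */
int split_url(const char *url, char *host, size_t host_len,
              int *port, char *path, size_t path_len)
{
    const char *slash = strchr(url, '/');
    size_t authority_len = slash ? (size_t)(slash - url) : strlen(url);
    const char *colon = memchr(url, ':', authority_len);
    size_t name_len = colon ? (size_t)(colon - url) : authority_len;

    if (name_len == 0 || name_len >= host_len)
        return -1;
    memcpy(host, url, name_len);
    host[name_len] = '\0';

    /* The port is optional and defaults to 80. */
    *port = DEFAULT_HTTP_PORT;
    if (colon) {
        char *end;
        long value = strtol(colon + 1, &end, 10);
        /* The digits must run exactly up to the '/' (or the end of the string). */
        if (end == colon + 1 || end != url + authority_len ||
            value < 1 || value > 65535)
            return -1;
        *port = (int)value;
    }

    /* A missing path is treated as "/". */
    const char *p = slash ? slash : "/";
    if (strlen(p) >= path_len)
        return -1;
    strcpy(path, p);
    return 0;
}
```

Given ug205.eecg.utoronto.ca:2003/favorite.html, this sketch yields the host ug205.eecg.utoronto.ca, the port 2003, and the path /favorite.html. Without the ":2003" part, the port falls back to 80.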
An HTTP request (from the web browser to the server) consists of a request line, followed by zero or more request header lines, and finally an empty text line. A request line has the form: method uri version. The method is usually GET (but may be other things, such as POST, OPTIONS, or PUT).
The uri is the file name and any optional arguments (for dynamic content). The version indicates the version of the HTTP protocol that the web client is using (e.g., HTTP/1.0 or HTTP/1.1). The request headers define various parameters of the request such as the type of browser (user agent) making the request. Each header is a colon-separated name-value pair in clear-text string format.
The request line and other header fields must each end with <CR> <LF> (that is, a carriage return character followed by a line feed character). The empty line must consist of only <CR> <LF>.
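As an illustration of the request line format described above, the following sketch parses a line of the form "method uri version" that ends in <CR><LF>. The struct request_line type, the function name, and the field sizes are assumptions made for this example; the provided server has its own parsing code, which you do not need to change.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three fields of a request line. */
struct request_line {
    char method[16];
    char uri[256];
    char version[16];
};

/* Parses "method uri version\r\n".
 * Returns 0 on success, or -1 if the line is malformed. */
int parse_request_line(const char *line, struct request_line *out)
{
    /* Every request line must end with <CR><LF>. */
    const char *crlf = strstr(line, "\r\n");
    if (crlf == NULL)
        return -1;

    /* %n records how many characters were consumed, so we can check
     * that the third field ends exactly where the <CR><LF> begins. */
    int consumed = 0;
    int fields = sscanf(line, "%15s %255s %15s%n",
                        out->method, out->uri, out->version, &consumed);
    if (fields != 3 || line + consumed != crlf)
        return -1;
    return 0;
}
```

For example, parsing "GET /favorite.html HTTP/1.1\r\n" fills in the method GET, the uri /favorite.html, and the version HTTP/1.1. A line that ends in a bare line feed is rejected.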
An HTTP response (from the server to the browser) is similar; it consists of a response line, zero or more response header lines, an empty text line, and finally the interesting part, the response body.
A response line has the form version status message. The status is a three-digit positive integer that indicates the state of the request; some common states and the corresponding messages are 200 for OK, 403 for Forbidden, and 404 for Not found. Two important lines in the response header are Content-Type, which tells the client the MIME type of the content in the response body
(e.g., html or gif) and Content-Length, which indicates the size of the response body in bytes. The server can add any custom header line.
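The response format can be sketched in the same way. The helper below writes a response line, the Content-Type and Content-Length headers, and the terminating empty line into a buffer; the response body would follow it on the socket. The function name and the choice of HTTP/1.0 as the version are assumptions for this example, not the code in the provided server.

```c
#include <stdio.h>

/* Hypothetical helper: formats the response line, two header lines, and
 * the empty line that precedes the response body.
 * Returns the number of bytes written, or -1 if buf is too small. */
int format_response_header(char *buf, size_t buf_len,
                           int status, const char *message,
                           const char *content_type, size_t content_length)
{
    int n = snprintf(buf, buf_len,
                     "HTTP/1.0 %d %s\r\n"        /* version status message */
                     "Content-Type: %s\r\n"      /* MIME type of the body */
                     "Content-Length: %zu\r\n"   /* body size in bytes */
                     "\r\n",                     /* empty line ends the headers */
                     status, message, content_type, content_length);
    if (n < 0 || (size_t)n >= buf_len)
        return -1;
    return n;
}
```

Calling it with status 200, message "OK", content type "text/html", and a length of 120 produces three lines followed by an empty line, each terminated by <CR><LF>.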
Again, you don’t need to know this information about HTTP unless you want to understand the details of the code we have given you. You will not need to modify any of the procedures in the web server that deal with the HTTP protocol or network connections.
The multi-threaded web server we have provided enables scaling performance by using multiple threads. Using multiple threads has two benefits. First, when an HTTP request is for a file that is resident only on disk (i.e., it is not in memory), then instead of waiting for the file to be loaded into memory, the web server serves another concurrent request. Second, the web server serves multiple requests in parallel using the multiple cores on a machine.
The multi-threaded web server that we have provided uses a fixed-size pool of worker threads to serve web requests. A master thread initially creates this pool of worker threads. Then the master
thread waits for incoming HTTP network connections. It accepts an incoming connection and places the socket descriptor for this connection into a fixed-size request buffer. Then, it continues to wait for more connections.
A worker thread dequeues an HTTP request (socket descriptor) from the request buffer and then processes the request. It performs a read on the socket descriptor to obtain the file name that is requested, reads the file from disk, and then returns the file contents to the client by writing to the socket descriptor. Then, the worker thread waits for another HTTP request.
The master thread and the worker threads synchronize with each other using the fixed-size request buffer. The master thread is the single producer that adds requests to this buffer. If the buffer is full, the master thread waits. The worker threads are consumers that remove requests from the buffer. When the buffer is empty, the workers wait.
When there are multiple HTTP requests available, the requests are handled in FIFO order. Hence, when a worker thread wakes up, it handles the first request (i.e., the oldest request) in the buffer.
Note that the HTTP requests will not necessarily finish in FIFO order. The order in which the requests complete will depend upon how long it takes to process the request and also on how the OS schedules the active threads.
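The producer-consumer synchronization described above can be sketched with a mutex and two condition variables. This is a minimal illustration under our own assumptions: the names struct request_buffer, buffer_put, and buffer_get, and the size of 8 slots, are hypothetical and may not match the provided code.

```c
#include <pthread.h>

#define BUFFER_SLOTS 8   /* assumed size, chosen only for this example */

/* Hypothetical circular buffer of socket descriptors. */
struct request_buffer {
    int slots[BUFFER_SLOTS];
    int head;                  /* index of the oldest request */
    int count;                 /* number of queued requests */
    pthread_mutex_t lock;
    pthread_cond_t not_full;   /* signaled when a slot frees up */
    pthread_cond_t not_empty;  /* signaled when a request arrives */
};

void buffer_init(struct request_buffer *b)
{
    b->head = 0;
    b->count = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->not_full, NULL);
    pthread_cond_init(&b->not_empty, NULL);
}

/* Called by the master thread: waits while the buffer is full. */
void buffer_put(struct request_buffer *b, int connfd)
{
    pthread_mutex_lock(&b->lock);
    while (b->count == BUFFER_SLOTS)
        pthread_cond_wait(&b->not_full, &b->lock);
    b->slots[(b->head + b->count) % BUFFER_SLOTS] = connfd;
    b->count++;
    pthread_cond_signal(&b->not_empty);
    pthread_mutex_unlock(&b->lock);
}

/* Called by a worker thread: waits while the buffer is empty,
 * then returns the oldest request (FIFO order). */
int buffer_get(struct request_buffer *b)
{
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)
        pthread_cond_wait(&b->not_empty, &b->lock);
    int connfd = b->slots[b->head];
    b->head = (b->head + 1) % BUFFER_SLOTS;
    b->count--;
    pthread_cond_signal(&b->not_full);
    pthread_mutex_unlock(&b->lock);
    return connfd;
}
```

The waits sit inside while loops rather than if statements because a thread that wakes up must recheck the condition: another thread may have taken the slot or the request first. In this sketch the master thread would call buffer_put after accepting a connection, and each worker would loop on buffer_get.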
The figure below evaluates the performance of the multi-threaded web server. The Y axis shows the time it takes for the multi-threaded web server to serve a fixed number of concurrent requests and the X axis shows the number of worker threads. Each data point is an average of 5 runs and the standard deviation across the runs is shown as an error bar (the "I" symbol). Note that when the number of worker threads is 0, the master thread serves all requests. Thus the run time for 0 and 1 worker threads is similar. Beyond that, the web server performance increases (run time decreases) with an increasing number of threads. The machine on which this experiment is run has 8 hyperthreaded cores (for a total of 16 hardware threads), so performance does not increase beyond 16 worker threads.
You will be generating a similar figure to evaluate your caching server using the plot-cache-experiment program described in the testing your code section of this document.
Add the source files for this lab, available in webserver.tar, to your repository, and run make in the
newly created webserver directory.
cd ~/ece344
tar -xf /cad2/ece344f/src/webserver.tar
git status # should say that the "webserver/" directory is untracked
git add webserver
git commit -m "Initial code for Lab 5"
git tag Lab5-start
cd webserver
make
The make command will create four executables called server, client_simple, client and fileset. We will describe the server program here, and then describe the other three programs later.
The server program we have provided you is a basic, multi-threaded server. When you run it, you need to specify the port number on which it will listen for connections (more details about running the server program are described below). You should specify port numbers that are greater
than roughly 2000 to avoid ports that are already in use. Then, when you connect your web browser to this server, make sure that you specify this same port. For example, assume that you are running the server on ug205.eecg and using port number 2003. Copy your favorite HTML file, called favorite.html, to the webserver directory. Then, you can view this file from a web browser (running on the same or a different machine) by using the URL: ug205.eecg.utoronto.ca:2003/favorite.html. Note that
you will need to run the web browser on one of the lab machines.
We are providing you with a minimal web server. For example, the web server does not handle any HTTP requests other than GET. Also, it does not support running CGI programs. This web server is also not very robust; for example, if a web client closes its connection to the server, the server may crash.
We do not expect you to fix these problems!
We provide various helper functions that are simply wrappers for system calls. Each wrapper checks the error code of its system call and immediately terminates the program if an error occurs. One should always check error codes! However, many programmers don't like to do it because they believe that it makes their code less readable; the solution is to use these wrapper functions.
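To show the idea, here is a minimal sketch of what such wrappers can look like. The names checked_malloc, checked_open, and checked_close are hypothetical and chosen for this example; the wrappers in the provided code may use different names and cover different calls.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical wrappers; the names in the provided code may differ. */

/* Allocates memory, or prints the error and terminates on failure. */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    /* malloc(0) is allowed to return NULL, so only treat NULL as an
     * error when a non-zero size was requested. */
    if (p == NULL && size != 0) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Opens an existing file, or prints the error and terminates on failure. */
int checked_open(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd < 0) {
        perror("open");
        exit(EXIT_FAILURE);
    }
    return fd;
}

/* Closes a descriptor, or prints the error and terminates on failure. */
void checked_close(int fd)
{
    if (close(fd) < 0) {
        perror("close");
        exit(EXIT_FAILURE);
    }
}
```

With wrappers like these, a caller can write int fd = checked_open(name, O_RDONLY); and use fd right away, since the error check lives in one place instead of being repeated at every call site.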