Graduate Program KB

HTTP and DNS

How the internet works

How the internet works in 5 minutes
youtube: 7_LPdttKXPc
How does the internet work?
youtube: x3c1ih2NJEg

A Simple Network

2 Computer Network

5 Computer Network

5 Computer Network With Router

A Network of Networks

Connecting Routers to Routers

Connecting Routers to Routers to Routers

Connecting a Network to the Telephone Infrastructure

Connecting Networks using Telephone Infrastructure

Locating Devices in a Network

To send messages to a device, we need a way to identify the various devices in our network. The Internet Protocol, or I.P. for short, is part of the Internet Protocol Suite and defines and enables internetworking. It uses a logical addressing system and performs routing.

Routing is the act of forwarding packets from a source host to the next router that is one 'hop' closer to the intended destination host on another network.

IPv4 uses 32-bit addresses. This limits the total number of available IPv4 addresses to 2^32 (4,294,967,296). IPv4 addresses are typically represented in dot-decimal notation, which consists of the 4 octets of the address expressed in decimal form and separated by periods, e.g. 192.168.9.1. The I.P. address 192.168.9.1 is equivalent to the 32-bit number 3232237825 in decimal and 0xc0a80901 in hexadecimal.
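The conversion between dot-decimal notation and the 32-bit number can be sketched in a few lines of JavaScript. The helper names ipv4ToInt and intToIpv4 are made up for this illustration, and the sketch assumes well-formed input rather than validating it.

```javascript
// Hypothetical helpers for illustration; the input is assumed to be a well-formed IPv4 address.
const ipv4ToInt = (address) =>
  address
    .split(".")
    .map(Number)
    // Shift the running total left by one octet (multiply by 256), then add the next octet.
    .reduce((total, octet) => total * 256 + octet, 0);

const intToIpv4 = (value) =>
  // Extract each octet, most significant first, by shifting and masking to 8 bits.
  [24, 16, 8, 0].map((shift) => (value >>> shift) & 0xff).join(".");

const asInt = ipv4ToInt("192.168.9.1");
console.log(asInt); // 3232237825
console.log("0x" + asInt.toString(16)); // 0xc0a80901
console.log(intToIpv4(asInt)); // 192.168.9.1
```

Multiplying by 256 is used instead of the << operator because JavaScript's bitwise shifts work on signed 32-bit values, which would turn addresses above 127.255.255.255 into negative numbers.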

IPv4 addresses are not very human-friendly because they are not easy to memorize. To solve this problem, an IP address can be aliased with a human-readable name known as a domain name. You can look up the registration data for a domain or IP address by visiting the ICANN Lookup page.

Relationship between the Internet and the Web.

The internet and the web are not the same thing! The internet is the infrastructure connecting billions of computers. The web is a service built on top of the internet.

Intranets and Extranets

Intranets are private networks. Organizations use these to allow secure access to shared resources (e.g. file servers, printers, wikis) and to facilitate intra-organization communication (e.g. I.P. phones, chat servers).

Extranets are similar to intranets, except they allow other networks access to some or all of their resources. These types of networks are typically used when organizations wish to share resources with stakeholders or customers.

Different Types of Networks

Websites and Web Pages

A web page is a simple document that can be displayed by a browser. Typically, a web page is written in HTML and can embed different types of resources:

  • styles - used to control the look and feel of the page
  • scripts - used to allow users to interact with our web page
  • media - images, sound, video

All web pages on the internet are reachable through a unique address, known as a URL. We'll talk more about these later.

A website is a collection of related web pages that are accessed under a unique domain name.

A web server is a computer that hosts one or more websites. A web server is responsible for serving a website's content (HTML documents, images, videos etc.) to a requesting client (e.g. a user's browser).

In 1989, Tim Berners-Lee, inventor of the World Wide Web, described the Web's three pillars:

  1. URLs - a way to uniquely address any resource on the internet
  2. HTTP - a transfer protocol, allowing users to retrieve HTML documents at a given URL.
  3. HTML - a document format allowing for embedded hyperlinks.

The early goal of the web was to make it easy to find, read and navigate through text documents. Though much has changed since then, including support for video, images and binary data, the fundamental pillars still hold.

URLs, being human readable, made it much easier to navigate the web. Having said that, typing in very long URLs became very cumbersome, and that's where hyperlinks came to the rescue. Hyperlinks, colloquially referred to as links, allowed any text string to be associated with a URL. Users could visit the document residing at the underlying URL by simply clicking a link.

There are three types of links:

  1. Internal: These are links between web pages belonging to the same website.
  2. External: A link from our website to another website.
  3. Incoming: A link from another website back to our website.

Anchors

Where links tie two pages together, anchors tie two sections of a document together. Following a link that points to an anchor will cause the browser to jump to another section of the current document.

URLs

URL stands for Uniform Resource Locator. A URL is the address of a unique resource on the Web, with the only exceptions being URLs that point to resources that have been moved, or URLs that point to resources that no longer exist. The resource represented by the URL and the URL itself are handled by the Web server.

A URL is composed of multiple parts, with some parts being optional. Let's take a look at the most important parts of a URL:

Anatomy of a URL

Scheme

The scheme indicates the protocol to be used when requesting the resource.

Authority

The authority is separated from the scheme by a colon followed by two forward slashes (://). If present, the authority comprises the domain (e.g. www.mydomain.com) followed by the port (e.g. 80), separated by a colon (:).

  • The domain indicates the Web server that the resource is being requested from. This can be a domain name or an IP address.
  • The port is a number used to uniquely identify a connection endpoint. Port numbers are used by the operating system to direct data to a specific service. Specific port numbers are reserved to identify specific services. Ports below 1024 are reserved for well-known, historically common services, with HTTP assigned port 80 and HTTPS port 443.

Note: The scheme and the authority are separated by ://. The colon (:) separates the scheme from the rest of the URL, while the // indicates that the next part of the URL is the authority. Not all URLs contain an authority, one example being mailto:donkeykong. The URL contains a scheme (mailto) but no authority. In this case we need only use a colon to separate the scheme from the rest of the URL.
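The scheme and authority can be inspected with the URL class, which is available as a global in modern browsers and in Node.js. This is a small sketch rather than a full tour of the API; the addresses are the made-up examples used above, with port 8080 chosen as an assumption so that the port is visible in the output.

```javascript
// A URL with a scheme and an authority (domain plus port).
const withAuthority = new URL("http://www.mydomain.com:8080/index.html");
console.log(withAuthority.protocol); // "http:" (the scheme, including its trailing colon)
console.log(withAuthority.hostname); // "www.mydomain.com"
console.log(withAuthority.port); // "8080"
console.log(withAuthority.host); // "www.mydomain.com:8080"

// The default port for a scheme is dropped during parsing, so port reads as an empty string.
const defaultPort = new URL("http://www.mydomain.com:80/index.html");
console.log(defaultPort.port); // ""

// A mailto URL has a scheme but no authority.
const noAuthority = new URL("mailto:donkeykong");
console.log(noAuthority.protocol); // "mailto:"
console.log(noAuthority.host); // ""
console.log(noAuthority.pathname); // "donkeykong"
```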

Path to the resource

Back in the early days of the Web, this path would represent a physical file on the Web server. These days, it is typically an abstraction, handled by the Web server without necessarily mapping to any physical file.

Parameters

?key1=value1&key2=value2 are extra parameters that are passed to the Web server. The ? denotes the start of the parameters, with each parameter's key/value pair separated with an & symbol.
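Parameters rarely need to be parsed by hand. As a brief sketch, the searchParams property of a URL object and the URLSearchParams class read and build them; the domain, the /search path and the q and page keys below are assumptions made for the example.

```javascript
// Reading parameters from an existing URL.
const url = new URL("https://www.mydomain.com/search?key1=value1&key2=value2");
console.log(url.search); // "?key1=value1&key2=value2"
console.log(url.searchParams.get("key1")); // "value1"
console.log([...url.searchParams.keys()]); // [ 'key1', 'key2' ]

// Building parameters: characters with a special meaning (such as & and spaces) are encoded.
const params = new URLSearchParams({ q: "tom & jerry", page: "2" });
console.log(params.toString()); // "q=tom+%26+jerry&page=2"
```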

Anchors

The anchor #ALocationInDocument instructs the browser to navigate to the location in the document where the anchor is defined. For audio or video media, the browser will attempt to go to the time code defined in the anchor.

Note: The section after the #, known as the fragment identifier, is not sent to the Web server with the request.
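A short sketch of that split, assuming a made-up URL: the fragment is available to the browser through the hash property, while only the path and the parameters form the target that is sent to the Web server.

```javascript
const url = new URL(
  "https://www.mydomain.com/docs/guide.html?key1=value1#ALocationInDocument"
);

// Kept by the browser and used to scroll to the anchor.
console.log(url.hash); // "#ALocationInDocument"

// Sent to the server on the request line: the path and the parameters, without the fragment.
const requestTarget = url.pathname + url.search;
console.log(requestTarget); // "/docs/guide.html?key1=value1"
```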

How are URLs used?

A URL is typically typed into the address bar in a browser to fetch the resource it represents on a particular Web server.

HTML makes extensive use of URLs:

  • To display HTML documents within an <iframe> element
  • To link resources to documents that rely on them using <link> and <script> elements
  • To link one document to another using the <a> element
  • To render media, such as:
    • images using the <img> element
    • video using the <video> element
    • audio using the <audio> element

URLs are used extensively in CSS and JavaScript too. Together, HTML, CSS and JavaScript are the core technologies that make up the modern Web.

What's the difference between absolute and relative URLs?

Referring to the URL specification, we can see that in addition to absolute URL strings, the specification describes relative URL strings.

Note: The specification uses the term URL objects to describe the in-memory representation of URL strings.

When using the browser to fetch a resource using its URL, we must use an absolute URL. This is because the URL is provided without any additional context. The port (80, 443) and protocol (e.g. http or https) are not mandatory. If we omit them, the browser has traditionally defaulted to the HTTP protocol on port 80, although some modern browsers now try HTTPS on port 443 first.

Within a document, such as an HTML page, we can use relative URLs. This is because the browser knows the absolute URL that was used to fetch the HTML document. You can tell if a URL is relative or absolute by looking at the URL's path. If the path portion of the URL begins with a / character, then the browser will fetch the requested resource from the root of the Web server, without any consideration for the context provided by the current document.

Let's look at some examples of absolute URLs:

Absolute URLs

  1. Full URL: this is what we need to use to fetch a resource for the first time using our browser
  2. Implicit protocol: the browser will resolve this URL using the same protocol as the one used to fetch the document that contains this URL
  3. Implicit domain name: you're most likely to encounter absolute URLs of this form. The browser will use the same protocol and domain name as the one used to fetch the document containing this URL. (Note: you cannot omit the domain name without omitting the protocol.)
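The three forms listed above can be tried out with the two-argument URL constructor, which resolves a URL string against the address of the containing document. This is only a sketch: the containing document's address and the example.com URLs are assumptions chosen for illustration.

```javascript
// Assumed address of the document that contains the URLs below.
const documentUrl = "https://gradprogram.productfactory.io/courses/brain-school/";

// 1. Full URL: it needs no context, so the containing document is ignored.
const full = new URL("https://example.com/about", documentUrl);
console.log(full.href); // "https://example.com/about"

// 2. Implicit protocol: a leading // keeps only the protocol of the containing document.
const implicitProtocol = new URL("//example.com/about", documentUrl);
console.log(implicitProtocol.href); // "https://example.com/about"

// 3. Implicit domain name: a leading / keeps the protocol and domain and starts at the root.
const implicitDomain = new URL("/courses", documentUrl);
console.log(implicitDomain.href); // "https://gradprogram.productfactory.io/courses"
```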

Now let's turn our attention to some examples of relative URLs:

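Assume that we are viewing a document that was fetched using the following URL (note the trailing slash, which makes brain-school behave like a directory when relative URLs are resolved): https://gradprogram.productfactory.io/courses/brain-school/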

Relative URLs

  1. Sub-resources: Because the URL does not start with a /, the browser will attempt to find the resource in a sub-directory of the one that contains the current resource. This amounts to requesting the following resource: https://gradprogram.productfactory.io/courses/brain-school/03-http-and-dns
  2. Navigating relative to current location: In this instance, we want a resource located one directory up (.. is borrowed from Unix file systems and means 'the directory above') from the current resource. In effect, we are requesting the following resource: https://gradprogram.productfactory.io/courses/brain-school/03-http-and-dns/../01-js-bundling-part-1-of-2. This can be simplified to https://gradprogram.productfactory.io/courses/brain-school/01-js-bundling-part-1-of-2
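Relative resolution can be checked with the two-argument URL constructor. The original relative URLs are not shown here, so the strings below are assumptions reconstructed from the resolved addresses in the list. For the second case, the current document is assumed to be the 03-http-and-dns/ page. Note that a trailing slash on the current document's address changes the result.

```javascript
const root = "https://gradprogram.productfactory.io";

// A relative path is resolved against the "directory" of the current document,
// which is everything up to and including the last / in its path.
const withSlash = new URL("03-http-and-dns", `${root}/courses/brain-school/`);
console.log(withSlash.href);
// "https://gradprogram.productfactory.io/courses/brain-school/03-http-and-dns"

// Without the trailing slash, the last segment (brain-school) is replaced instead.
const withoutSlash = new URL("03-http-and-dns", `${root}/courses/brain-school`);
console.log(withoutSlash.href);
// "https://gradprogram.productfactory.io/courses/03-http-and-dns"

// .. moves up one directory before the rest of the path is appended.
const oneUp = new URL(
  "../01-js-bundling-part-1-of-2",
  `${root}/courses/brain-school/03-http-and-dns/`
);
console.log(oneUp.href);
// "https://gradprogram.productfactory.io/courses/brain-school/01-js-bundling-part-1-of-2"
```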

Semantic URLs

A Semantic URL (also known as a Clean URL) is a URL that aims to improve the usability and accessibility of a website, web application, or web service. In practice, this means designing our URLs to have intuitive names, that clearly communicate the conceptual structure of a collection of information without being coupled to the server's internal representation of the underlying information.

Building our own Web server

What is a Web server?

Before we begin, let's define what a Web server is. When talking about a Web server, context is important; we may be referring to physical hardware, software or a combination of the two:

  1. Physical hardware: A Web server describes a computer that hosts a Web server software application, as well as the physical files that the hosted web site is composed of (e.g. CSS, JavaScript, HTML and other media files). A Web server should be connected to the internet and capable of exchanging data with other online devices.

  2. Software: A Web server can describe the software components needed to control access to hosted files. The most basic Web server is an HTTP server. An HTTP server understands URLs and the HTTP protocol. To access a Web server, one uses the domain names of the web sites it hosts. The Web server then serves the content of the requested resources to the requesting clients.

Whenever a client wants to access a resource hosted on the Web server, it sends a request using the HTTP protocol. When the request arrives at the intended physical Web server, the software HTTP server application accepts the request, locates the requested resource and responds to the client using the HTTP protocol. If the resource could not be located, the HTTP server should return a 404 response.

Web Server Flow

A Web server must provide support for HTTP (Hypertext Transfer Protocol). HTTP is a textual, stateless protocol. Let's examine what each of these terms means:

  • textual: commands are in (human readable) plain text. This holds for HTTP/1.x; HTTP/2 and later versions use a binary framing.
  • stateless: web server and client do not store information about previous communications. To support stateful communication, one needs an application server.
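To make the textual point concrete, here is a sketch of what a minimal HTTP/1.1 GET request looks like on the wire. The buildGetRequest helper is hypothetical and exists only for this illustration; real clients such as browsers send additional headers.

```javascript
// Hypothetical helper: builds the plain-text form of a minimal HTTP/1.1 GET request.
const buildGetRequest = (host, path) =>
  [
    `GET ${path} HTTP/1.1`, // request line: method, target and protocol version
    `Host: ${host}`, // the one header that every HTTP/1.1 request must carry
    "", // an empty line marks the end of the headers
    "",
  ].join("\r\n"); // lines are terminated with a carriage return and a line feed

console.log(buildGetRequest("localhost:8080", "/"));
```

Because the protocol is stateless, each request like this one has to carry everything the server needs in order to answer it.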

The following rules apply to HTTP communication:

  1. In the majority of cases, clients only ever make HTTP requests to HTTP servers, and servers respond to those requests. A server may also push data into a client cache through a mechanism known as a server push.
  2. When requesting a resource from an HTTP server, a client must provide the resource's URL.
  3. The HTTP server responds to every client request, even if the response is an error.

Upon receiving a request, an HTTP server will:

  1. Verify that the URL corresponds to a known resource.
  2. If the resource exists (static content), the HTTP server will send the content back to the client. In some cases, the content may need to be generated (dynamic content) before it can be sent to the client.
  3. If the content does not exist and it cannot be generated, then the HTTP server will respond to the client with an error message.
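The steps above can be sketched as a plain function that maps a URL path to a response. The paths, the two maps and the handle function are all hypothetical, and a real HTTP server would read files or call application code rather than look things up in memory.

```javascript
// Hypothetical static content, keyed by URL path.
const staticResources = new Map([
  ["/", "<h1>Home</h1>"],
  ["/about", "<h1>About</h1>"],
]);

// Hypothetical dynamic content: generated each time it is requested.
const dynamicResources = new Map([
  ["/time", () => `<p>Server time: ${new Date().toISOString()}</p>`],
]);

const handle = (path) => {
  // Steps 1 and 2: a known static resource is sent back as it is.
  if (staticResources.has(path)) {
    return { statusCode: 200, body: staticResources.get(path) };
  }
  // Step 2, dynamic case: the content is generated before it is sent.
  const generate = dynamicResources.get(path);
  if (generate) {
    return { statusCode: 200, body: generate() };
  }
  // Step 3: nothing exists at this path and nothing can be generated for it.
  return { statusCode: 404, body: "Not Found" };
};

console.log(handle("/about")); // { statusCode: 200, body: '<h1>About</h1>' }
console.log(handle("/missing")); // { statusCode: 404, body: 'Not Found' }
```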

Node.js HTTP API

You can run an HTTP server using Node.js. Let's create a new Node.js application:

  1. Create a directory named 03-http-and-dns and change into the directory
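  2. Run npm init to create a package.json file:
    • Set the value of main to server.js
    • Once the file has been created, add "type": "module" to it so that Node.js accepts the import syntax used below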
  3. Create a file called server.js in the root of your project's directory:
    • touch server.js

To begin, import the http built-in module:

import http from "http";

Next, we'll invoke the http.createServer method to create a new Server instance:

const server = http.createServer();

Next we'll need to write a function to respond to incoming requests. This function is called a request listener:

https://nodejs.org/api/events.html#emitteroneventname-listener

const requestListener = (request, response) => {
  // Respond to the incoming request
};

We now need to register our requestListener with the server. Before we can do this, we need to know a little bit more about the Server class:

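http.Server class hierarchy: http.Server extends net.Server, which in turn extends EventEmitter. This is why a Server instance has the on method used to register event listeners.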

So with that in mind, we can now register our requestListener so it is invoked when a request event is fired:

server.on("request", requestListener);
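To show what on does behind the scenes, here is a deliberately simplified stand-in for an event emitter. TinyEmitter is a made-up class for illustration only, not Node's EventEmitter, which has many more features; the request and response objects are plain placeholders.

```javascript
// Simplified stand-in for an event emitter, for illustration only.
class TinyEmitter {
  #listeners = new Map();

  // Register a listener for a named event; returns this so calls can be chained.
  on(eventName, listener) {
    const existing = this.#listeners.get(eventName) ?? [];
    this.#listeners.set(eventName, [...existing, listener]);
    return this;
  }

  // Call every listener registered for the event, in the order they were registered.
  emit(eventName, ...args) {
    const listeners = this.#listeners.get(eventName) ?? [];
    listeners.forEach((listener) => listener(...args));
    return listeners.length > 0;
  }
}

const fakeServer = new TinyEmitter();
const seen = [];

// Registering a listener does not run it; it is stored until the event fires.
fakeServer.on("request", (request, response) => {
  seen.push(`${request.method} ${request.url}`);
});

// A real server fires the request event itself whenever an HTTP request arrives.
fakeServer.emit("request", { method: "GET", url: "/" }, {});
console.log(seen); // [ 'GET /' ]
```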

Now let's update our requestListener to respond to any request with some JSON. We'll need to set the statusCode property on the response object, and use its setHeader, write and end methods to set a header, write the response body and finish the response:

const requestListener = (request, response) => {
  const jsonData = JSON.stringify({
    studentName: "Gerson Jepzir",
    hobbies: ["Yelling", "Building miniature skyscrapers out of pop tarts"],
  });
  response.statusCode = 200;
  response.setHeader("Content-Type", "application/json");
  response.write(jsonData);
  response.end();
};

Finally, we need to start the server listening to requests on a port of our choosing. In this case, we'll make the server listen for requests on port 8080:

server.listen(8080);

Next let's run our HTTP server process:

npm start

> 03-http-and-dns@1.0.0 start
> node server.js

Open a second terminal. We'll be using the cURL command line utility to send requests to our HTTP server and view the response, passing the following parameters:

  • --get to send a GET request
  • -i to include the HTTP response headers in the output

curl --get http://localhost:8080 -i

HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 24 Mar 2023 02:09:32 GMT
Connection: keep-alive
Keep-Alive: timeout=5
Transfer-Encoding: chunked

{"studentName":"Gerson Jepzir","hobbies":["Yelling","Building miniature skyscrapers out of pop tarts"]}

In the response above, we can see the response status code 200, indicating that the resource we requested was successfully fetched and returned to us in the response message body. The Content-Type header is also present in the response, with a MIME type of application/json. A MIME type has the following structure: type/subtype. The type represents the general category that the data falls under, and the subtype specifies the exact kind of data within that category. A MIME type can also have an optional parameter that provides additional details: type/subtype;parameter=value.
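The type/subtype;parameter=value structure can be illustrated with a small parser. parseMimeType is a hypothetical helper that assumes a well-formed value with simple key=value parameters; it is a sketch, not a complete implementation of the MIME type grammar.

```javascript
// Hypothetical helper; assumes a well-formed value such as "text/html; charset=utf-8".
const parseMimeType = (value) => {
  // The first part is type/subtype; any remaining parts are parameters.
  const [essence, ...rawParameters] = value.split(";").map((part) => part.trim());
  const [type, subtype] = essence.split("/");
  const parameters = Object.fromEntries(
    rawParameters.map((parameter) => parameter.split("="))
  );
  return { type, subtype, parameters };
};

console.log(parseMimeType("application/json"));
// { type: 'application', subtype: 'json', parameters: {} }
console.log(parseMimeType("text/html; charset=utf-8"));
// { type: 'text', subtype: 'html', parameters: { charset: 'utf-8' } }
```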

Our requestListener responds to all HTTP requests in the same manner. To handle different types of HTTP requests, we can inspect the request.method property. Let's update the requestListener function to return a 405 status code when we receive an HTTP request where the method is not GET:

const requestListener = (request, response) => {
  const requestMethod = request.method;
  if (requestMethod === "GET") {
    const jsonData = JSON.stringify({
      studentName: "Gerson Jepzir",
      hobbies: ["Yelling", "Building miniature skyscrapers out of pop tarts"],
    });
    response.statusCode = 200;
    response.setHeader("Content-Type", "application/json");
    response.write(jsonData);
    response.end();
  } else {
    response.statusCode = 405;
    response.end();
  }
};

Restart your HTTP server process. Now let's send an HTTP request, specifying the HEAD method. To do this we'll use cURL, passing the following parameters:

  • --head to only fetch HTTP headers
  • -i to include the HTTP response headers in the output

curl --head http://localhost:8080 -i

HTTP/1.1 405 Method Not Allowed
Date: Fri, 24 Mar 2023 03:54:08 GMT
Connection: keep-alive
Keep-Alive: timeout=5

Note the HTTP response status code 405 in the HTTP response headers. This indicates that the HTTP server received our request, recognized the request method in the HTTP request header, but the target resource does not support the request method.
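The HTTP specification also expects a 405 response to carry an Allow header listing the methods that the resource does support. The listener above omits it for brevity. As a sketch, the decision could be factored into a small function; allowedMethods and checkMethod are hypothetical names introduced for this example.

```javascript
// Hypothetical list of the methods our resource supports.
const allowedMethods = ["GET"];

// Returns the status code and headers a request listener could send for a given method.
const checkMethod = (method) =>
  allowedMethods.includes(method)
    ? { statusCode: 200, headers: {} }
    : { statusCode: 405, headers: { Allow: allowedMethods.join(", ") } };

console.log(checkMethod("GET")); // { statusCode: 200, headers: {} }
console.log(checkMethod("HEAD")); // { statusCode: 405, headers: { Allow: 'GET' } }
```

Inside a request listener, each entry in headers would be passed to response.setHeader before calling response.end.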

Homework

Write an HTTP server to allow users to create, delete, update and retrieve todos. Some of the things you'll need to consider:

  • How can you test the behaviours of your Web server?
  • What URLs would clients use to interact with todo resources?
  • How would you handle invalid requests?
  • How would you manage the state of todos? What happens if your HTTP server goes down?
  • Are there any libraries you could use to reduce boilerplate code?