This is a quick tour of the internet and how it works. The aim is to provide a short but reasonably comprehensive overview. It is for people who are interested in how the internet works but don’t want to spend hours researching all the details.
We’ll start by looking at the low-level design of the internet and gradually move up to browsing a website.
What is the internet?
Contrary to popular belief, the internet is neither a single physical entity nor a large amorphous network. Rather, it is a network of networks.
These sub-networks are called Autonomous Systems (AS). At the beginning of 2013, just over 43,000 ASs were connected to this network. Each AS is itself a network of connected nodes (such as routers), typically run by a single organisation such as an ISP, a university or a large company.
Where am I in this network?
Within these, there are often further sub-networks. For example, when you connect to the internet, your device is most likely connected directly to a local network. This local network is often provided by a router in the same building, over either a wired or a wireless (Wi-Fi) connection.
That router may in turn be connected directly to a node in your ISP’s (Internet Service Provider’s) AS.
How does data reach other devices?
To communicate with someone on the other side of the world, data is first routed through the user’s local network to the AS it is connected to. It is then routed across that AS, and on through any other ASs in between, until it reaches the AS the recipient is connected to. Finally, it is routed through that AS to the recipient’s local network, where it is delivered to their device.
IP (Internet Protocol) is used to work out where data should be routed. This is achieved by assigning an IP address to each device connected to the network. In our example, the user’s device will have been assigned a local IP address by the router, which only needs to be unique within the local network, while the router (connected to the AS) will have a global IP address that is unique across the whole internet.
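The difference between local and global addresses can be checked with Python’s standard `ipaddress` module. This is a minimal sketch: the function name `describe` is made up for illustration, 192.168.1.23 is an arbitrary address from a range reserved for local networks (192.168.0.0/16), and 8.8.8.8 is simply a well-known public address (Google’s public DNS resolver).

```python
import ipaddress

def describe(address: str) -> str:
    """Say whether an address is private (local) or globally routable."""
    ip = ipaddress.ip_address(address)
    if ip.is_private:
        return f"{ip} is a private (local) address"
    if ip.is_global:
        return f"{ip} is a global address"
    return f"{ip} is a special-purpose address"

# Arbitrary example addresses: a device on a home network and a public server.
print(describe("192.168.1.23"))  # 192.168.1.23 is a private (local) address
print(describe("8.8.8.8"))       # 8.8.8.8 is a global address
```

Many home networks reuse the same private ranges, which is why a local address only means something inside its own network.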
The internet uses the first part of the address (the network prefix) to work out which AS the data should be sent to. The AS then uses the rest of the address to work out which node on its network to route it to.
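The prefix idea can be sketched with the `ipaddress` module. The routing table below is entirely hypothetical (real routers learn their routes from protocols such as BGP), and `route` and `host_part` are illustrative names. When several prefixes match, routers prefer the most specific one, known as longest-prefix matching.

```python
import ipaddress

# Hypothetical routing table: network prefix -> where to send matching data.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default route",
    ipaddress.ip_network("198.0.0.0/8"): "AS-A",
    ipaddress.ip_network("198.51.100.0/24"): "AS-B",
}

def route(address: str) -> str:
    """Pick the most specific (longest) prefix that contains the address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in ROUTES if ip in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]

def host_part(address: str, prefix: str) -> str:
    """Keep only the bits after the prefix, which the AS uses to find the node."""
    net = ipaddress.ip_network(prefix)
    ip = ipaddress.ip_address(address)
    return str(ipaddress.ip_address(int(ip) & int(net.hostmask)))

print(route("198.51.100.42"))  # AS-B (matches /8 and /24; /24 is more specific)
print(route("198.7.1.1"))      # AS-A
print(route("1.1.1.1"))        # default route
print(host_part("198.51.100.42", "198.51.100.0/24"))  # 0.0.0.42
```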
What if something goes wrong?
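The network is designed to update its routes if a node disappears, becomes congested and so on. Even so, data can still fail to reach its destination, for example when a congested router drops it or a link fails before the routes have been updated.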
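Rerouting around a missing node can be illustrated with a breadth-first search over a small, made-up network. This is a simplified stand-in: real networks use routing protocols (such as OSPF inside an AS and BGP between ASs) rather than this exact search, and the node names and links here are hypothetical.

```python
from collections import deque

# Hypothetical network: each node lists the nodes it is directly linked to.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}

def shortest_path(links, start, goal, down=frozenset()):
    """Breadth-first search for the fewest-hop path, skipping nodes that are down."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in links[node]:
            if neighbour not in seen and neighbour not in down:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no route available

print(shortest_path(LINKS, "A", "F"))              # ['A', 'B', 'D', 'F']
print(shortest_path(LINKS, "A", "F", down={"D"}))  # ['A', 'C', 'E', 'F']
```

When node D disappears, the same search simply finds the alternative route through C and E.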
To work around this, TCP (Transmission Control Protocol) is often used on top of IP; the combination is known as TCP/IP. TCP splits data into small numbered packets. A packet is a chunk of data with additional information (a header) attached, including its destination, among other things.
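Splitting data into numbered packets, and putting it back together at the other end, might look like the following sketch. The `Packet` class, its fields and the destination address are hypothetical simplifications: in real TCP/IP the destination lives in the IP header, and TCP’s sequence numbers count bytes rather than packets.

```python
import random
from dataclasses import dataclass

@dataclass
class Packet:
    destination: str  # where the packet is going (hypothetical address)
    seq: int          # position of this chunk in the original data
    payload: bytes    # the chunk of data itself

def split_into_packets(data: bytes, destination: str, size: int) -> list[Packet]:
    """Cut the data into chunks of at most `size` bytes, numbering each one."""
    return [Packet(destination, i, data[start:start + size])
            for i, start in enumerate(range(0, len(data), size))]

def reassemble(packets: list[Packet]) -> bytes:
    """Put the chunks back in order using their sequence numbers."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))

packets = split_into_packets(b"Hello, internet!", "203.0.113.7", size=5)
random.shuffle(packets)     # packets may take different routes and arrive out of order
print(reassemble(packets))  # b'Hello, internet!'
```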
When data is sent to a recipient using TCP, the sender first establishes a connection with the recipient. This process is called ‘handshaking’: the sender sends a SYN packet, the recipient replies with SYN-ACK, and the sender confirms with ACK. It is simply a way for both sides to confirm there is a connection between them and to initialise the data transfer.
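The three messages of the handshake can be written out as a short simulation. Each side picks an initial sequence number (ISN) and the other side acknowledges it by replying with that number plus one. Real TCP stacks choose ISNs unpredictably; the fixed values here are assumptions so the output is repeatable, and `three_way_handshake` is a made-up name.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list[str]:
    """Return the three messages exchanged when a TCP connection is opened."""
    return [
        # 1. Client asks to connect and announces its starting number.
        f"client -> server: SYN seq={client_isn}",
        # 2. Server acknowledges the client's number and announces its own.
        f"server -> client: SYN-ACK seq={server_isn} ack={client_isn + 1}",
        # 3. Client acknowledges the server's number; data can now flow.
        f"client -> server: ACK seq={client_isn + 1} ack={server_isn + 1}",
    ]

for message in three_way_handshake(client_isn=100, server_isn=300):
    print(message)
# client -> server: SYN seq=100
# server -> client: SYN-ACK seq=300 ack=101
# client -> server: ACK seq=101 ack=301
```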
After receiving a series of packets, the recipient sends a reply packet to acknowledge them. If this acknowledgement does not arrive within a certain time, the sender assumes the packets failed to arrive and resends them.
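Resending on a missing acknowledgement can be sketched as a ‘stop-and-wait’ loop. This is a deliberate simplification: it acknowledges every packet individually, whereas real TCP can acknowledge many packets with one reply. The `lost_attempts` set is a hypothetical stand-in for an unreliable network, so that the example is repeatable.

```python
def send_reliably(packets, lost_attempts, max_tries=5):
    """Send each packet, wait for an ACK, and resend if none arrives.

    `lost_attempts` holds (packet_index, attempt) pairs that the network drops.
    Returns the packets delivered and the total number of transmissions.
    """
    delivered = []
    transmissions = 0
    for index, packet in enumerate(packets):
        for attempt in range(1, max_tries + 1):
            transmissions += 1
            if (index, attempt) in lost_attempts:
                continue  # dropped: no ACK arrives before the timeout, so resend
            delivered.append(packet)  # recipient got it and acknowledged it
            break
        else:
            raise TimeoutError(f"packet {index} was never acknowledged")
    return delivered, transmissions

# Packet 1 is lost twice before getting through on the third attempt.
print(send_reliably(["p0", "p1", "p2"], lost_attempts={(1, 1), (1, 2)}))
# (['p0', 'p1', 'p2'], 5)
```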
How does a website work?
How are domain names looked up?
When a user enters a URL (e.g. http://google.com) into a browser, the first thing the browser must do is translate the domain name into an IP address. The domain name is the part of the URL that names the server, made up of labels separated by ‘.’ (google.com), and it is translated by querying a DNS (Domain Name System) server.
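Python’s standard `urllib.parse.urlsplit` can pull the domain name out of a URL, which is roughly the first step a browser takes before asking DNS. This is a minimal sketch that assumes the URL includes its scheme (`http://`); `domain_of` and `top_level_domain` are illustrative names.

```python
from urllib.parse import urlsplit

def domain_of(url: str) -> str:
    """Return the domain name (host) part of a URL: the part DNS translates."""
    return urlsplit(url).hostname

def top_level_domain(domain: str) -> str:
    """The TLD is the last ‘.’-separated label of the domain name."""
    return domain.rsplit(".", 1)[-1]

print(domain_of("http://google.com"))                    # google.com
print(domain_of("http://example.com/foo/bar/baz.html"))  # example.com
print(top_level_domain("google.com"))                    # com
```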
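The user’s machine (or, more commonly, a DNS resolver run by their ISP on its behalf) may need to query a root DNS server to find the DNS server for the given TLD (top-level domain, e.g. .com); if the answer is already cached, this step is skipped. It will then query that server, which points it to the DNS server responsible for google.com, and that server returns the IP address.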
A domain name may consist of multiple sub-domains, each of which may require looking up another DNS server to reach the final IP address. For example, resolving foo.bar.gov.uk would start by looking up the server for the TLD (.uk). That server would be asked about ‘gov’, the next about ‘bar’, and the last DNS server would return the IP address for ‘foo’.
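The chain of lookups for foo.bar.gov.uk can be simulated with a small table of made-up servers, where each server either refers the resolver to the next one or, at the end, returns an address. Everything in `SERVERS` is hypothetical (192.0.2.10 comes from a range reserved for documentation), and real DNS zones do not have to split at every label as they do here.

```python
# Hypothetical DNS data: each server knows either a referral to the next
# server to ask, or (at the end of the chain) the final IP address.
SERVERS = {
    "root":              {"uk": ("refer", "uk-server")},
    "uk-server":         {"gov": ("refer", "gov.uk-server")},
    "gov.uk-server":     {"bar": ("refer", "bar.gov.uk-server")},
    "bar.gov.uk-server": {"foo": ("address", "192.0.2.10")},
}

def resolve(domain: str) -> tuple[str, list[str]]:
    """Follow referrals from the root, one label at a time, right to left."""
    labels = domain.split(".")[::-1]  # foo.bar.gov.uk -> ['uk', 'gov', 'bar', 'foo']
    server, asked = "root", []
    for label in labels:
        asked.append(server)
        entry = SERVERS[server].get(label)
        if entry is None:
            break  # this server knows nothing about the label
        kind, value = entry
        if kind == "address":
            return value, asked
        server = value  # follow the referral to the next server
    raise LookupError(f"no address found for {domain}")

print(resolve("foo.bar.gov.uk"))
# ('192.0.2.10', ['root', 'uk-server', 'gov.uk-server', 'bar.gov.uk-server'])
```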
How does the browser get the web page?
Once the browser has the server’s IP address, it opens a connection to the server and sends it a request for a web page. This request is made using HTTP (Hypertext Transfer Protocol), the ‘http’ at the start of the URL.
The page is given by the path: the part of the URL after the domain name, starting with ‘/’ (e.g. /something.html in example.com/something.html). If no page is given, the server chooses a default page (typically called something like index.html).
If the path has several parts separated by ‘/’, such as example.com/foo/bar/baz.html, these can be thought of as folders and sub-folders on the server.
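A simple static web server might map the path onto files in a folder like this. The folder `/var/www/site`, the constant names and `file_for` are all hypothetical, and a real server would also have to reject tricks such as `..` in the path, which this sketch does not do.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical folder on the server where the site's files are stored.
DOCUMENT_ROOT = PurePosixPath("/var/www/site")
DEFAULT_PAGE = "index.html"  # used when the URL names a folder rather than a file

def file_for(url: str) -> PurePosixPath:
    """Map the path part of a URL onto a file under DOCUMENT_ROOT (simplified)."""
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += DEFAULT_PAGE
    return DOCUMENT_ROOT / path.lstrip("/")

print(file_for("http://example.com/something.html"))    # /var/www/site/something.html
print(file_for("http://example.com/foo/bar/baz.html"))  # /var/www/site/foo/bar/baz.html
print(file_for("http://example.com"))                   # /var/www/site/index.html
```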
The server then sends a reply containing the requested page, or an error such as ‘404 Not Found’ if the page does not exist.
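The request and reply are plain text, so both can be sketched with ordinary strings. The helper names below are made up, the reply is an invented example in the shape of an HTTP/1.1 response, and the parser is deliberately minimal (for instance, it treats header names as case-sensitive, which real HTTP does not).

```python
def build_request(host: str, path: str) -> str:
    """Build a minimal HTTP/1.1 GET request, as a browser sends it over TCP."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n")  # a blank line ends the request headers

def parse_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Split a raw HTTP response into status code, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status_code, headers, body

print(build_request("example.com", "/something.html"))

# An invented reply, in the shape a server would send back.
reply = ("HTTP/1.1 200 OK\r\n"
         "Content-Type: text/html\r\n"
         "\r\n"
         "<html><body><h1>Hello</h1></body></html>")
status, headers, page = parse_response(reply)
print(status, headers["Content-Type"])  # 200 text/html
```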
How is a page displayed?
The returned page is an HTML document. HTML is a markup language, which means that it describes what each part of the content is (e.g. a heading or a table). The browser reads this file and uses this information to decide how to display it.
The HTML document is a single text file, but it can contain links to images and other content, which the browser downloads separately and displays as part of the page. It can also contain links to other web pages and forms for sending data back to the server.
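Finding the extra content a page refers to can be sketched with Python’s standard `html.parser` module. The sample page and the `ResourceFinder` class are made up for illustration; a real browser handles far more tags (stylesheets, scripts, video and so on).

```python
from html.parser import HTMLParser

class ResourceFinder(HTMLParser):
    """Collect images, links and form targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.images = []  # downloaded separately and shown inside the page
        self.links = []   # other pages the user can navigate to
        self.forms = []   # where form data is sent back to the server

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form" and "action" in attrs:
            self.forms.append(attrs["action"])

page = """
<html><body>
  <h1>Welcome</h1>
  <img src="/logo.png" alt="Logo">
  <p>Read the <a href="/about.html">about page</a>.</p>
  <form action="/search" method="get"><input name="q"></form>
</body></html>
"""
finder = ResourceFinder()
finder.feed(page)
print(finder.images, finder.links, finder.forms)
# ['/logo.png'] ['/about.html'] ['/search']
```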
Further Reading
We’ve taken a brief look at many different parts of the internet. If you’d like to learn more about any of them, the links below may give you a good start:
- Autonomous System
- Routing
- Internet protocols
- DNS
- HTML
- Unix and Internet Fundamentals – A similar essay explaining how a computer works (particularly a Linux system). A little dated, but still useful.