Networks and the Internet
Network connections
You can identify a TCP connection uniquely by five parameters:
- The source IP address.
- The source port number. These two parameters are needed so that the other end of the connection can send replies back.
- The destination IP address.
- The destination port number.
- The protocol (TCP).
When you set up a connection, you specify the destination IP address and port number, and implicitly also the protocol. Your system supplies the source IP address; that's obvious enough. But where does the source port number come from? The system more or less picks one out of a hat: it chooses an unused port number somewhere above the "magic" value 1024. You can look at this information with netstat:
$ netstat
Proto Recv-Q Send-Q  Local Address      Foreign Address         (state)
tcp4       0      0  presto.smtp        203.130.236.50.1825     ESTABLISHED
tcp4       0      0  presto.3312        andante.ssh             ESTABLISHED
tcp4       0      0  presto.2593        hub.freebsd.org.ssh     ESTABLISHED
tcp4       0      0  presto.smtp        www.auug.org.au.3691    ESTABLISHED
As you can see, this is the view on a system called presto. We'll see presto again in our sample network below. Normally you'll see a lot more connections here. For each connection, the protocol is tcp4 (TCP on IPv4). The first line shows a connection to the port smtp on presto from port 1825 on a machine with the IP address 203.130.236.50.
netstat shows the IP address in this case because the machine in question does not have reverse DNS mapping. This machine is sending a mail message to presto. The second and third lines show outgoing connections from presto to port ssh on the systems andante and hub.freebsd.org. The last is another incoming mail message from www.auug.org.au. Graphically, you could display the connection between presto and www.auug.org.au like this:
Note that the port number for smtp is 25.
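To see the five parameters for yourself, here's a minimal sketch that opens a TCP connection over the loopback interface and reports both ends. The function name connection_five_tuple is made up for this example, and the listener on 127.0.0.1 only stands in for a real server such as the smtp daemon on presto.

```python
import socket


def connection_five_tuple():
    """Open a loopback TCP connection and return its five identifying parameters."""
    # A listening socket stands in for a server. Binding to port 0 asks the
    # system for any free port, so the sketch needs no special privileges.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    dest_ip, dest_port = server.getsockname()

    # The client names only the destination. The system supplies the source
    # address and picks an unused source port.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((dest_ip, dest_port))
    src_ip, src_port = client.getsockname()

    # The server sees the same connection from the other end, which is how
    # it knows where to send replies.
    accepted, peer = server.accept()
    assert peer == (src_ip, src_port)

    for sock in (accepted, client, server):
        sock.close()
    return ("tcp", src_ip, src_port, dest_ip, dest_port)


if __name__ == "__main__":
    print(connection_five_tuple())
```

Run it a few times and you should see the source port change from one run to the next, while the protocol and the two addresses stay the same.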
For various reasons, it's not always possible to connect directly in this manner:
- The Internet standards define a number of IP address blocks as non-routable. In these cases, we'll have to translate at least the IP addresses to establish a connection. This technique is accordingly called Network Address Translation or NAT, and we'll look at it in "Firewalls, IP aliasing and proxies", on page 393.
- For security reasons, it may not be advisable to make direct connections to servers via the Internet. Instead, the only access may be via an encrypted session on a different port. This technique is called tunneling, and we'll look at it in "Basic network access: clients", on page 424.
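As a rough illustration of the first point, the sketch below checks whether an address falls into one of the private blocks that RFC 1918 sets aside. The text above doesn't name the blocks, so treating "non-routable" as "RFC 1918" is an assumption of this example, and the name needs_translation is made up for it.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918 (an assumption of this
# sketch: the text above does not list the non-routable blocks itself).
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def needs_translation(address):
    """Return True if address lies in one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)


if __name__ == "__main__":
    for addr in ("192.168.1.10", "10.0.0.1", "172.31.255.254", "203.130.236.50"):
        print(addr, needs_translation(addr))
```

An address for which this returns True can't be reached directly from the Internet, so a gateway doing NAT has to substitute a routable address on the way out.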
The physical network connection
The most obvious thing about your network connection is what it looks like. It usually involves some kind of cable going out of your computer, but there the similarity ends. (Maybe it won't involve a cable at all: for example, you might use wireless Ethernet, which broadcasts in the microwave radio spectrum.) FreeBSD supports most modern network interfaces:
- The most popular choice for Local Area Networks is Ethernet, which transfers data between a number of computers at speeds of 10 Mb/s, 100 Mb/s or 1000 Mb/s (1 Gb/s). We'll look at it in the following section.
- An increasingly popular alternative to Ethernet is wireless networking, specifically local networks based on the IEEE 802.11 standard. We'll look at them on page 291.
- FDDI stands for Fiber Distributed Data Interface, and was originally run over glass fibres. In contrast to Ethernet, it ran at 100 Mb/s instead of 10 Mb/s. Nowadays Ethernet runs at 100 Mb/s as well, and FDDI runs over copper wire, so the biggest difference is the protocol. FreeBSD does support FDDI, but we won't look at it here.
- Token Ring is yet another variety of LAN, introduced by IBM. It has never been very popular in the UNIX world. FreeBSD does have some support for it, but it's a little patchy, and we won't look at it in this book.
- Probably the most common connection to a Wide-Area Network is via a telephone with a modem or with DSL. Modems have the advantage that you can also use them for non-IP connections such as UUCP and direct dial up (see page 338), but they're much slower than DSL. If you use a modem to connect to the Internet, you'll almost certainly use the Point to Point Protocol, PPP, which we look at on page 339. In some obscure cases you may need to use the Serial Line Internet Protocol, SLIP, but it's really obsolete.
- An alternative to ADSL or modem lines is cable networking, which uses TV cable services to supply Internet connectivity. In many ways, it looks like Ethernet.
- In some areas, Integrated Services Digital Networks (ISDNs) are an attractive alternative to modems. They are much faster than modems, both in call setup time and in data transmission capability, and they are also much more reliable. FreeBSD includes the isdn4bsd package, which was developed in Germany and allows the direct connection of low-cost German ISDN boards to FreeBSD. In other parts of the world, ISDN is not cost effective, and it's also much slower than ADSL and cable.
- In some parts of the world, satellite links are of interest. In most cases, they are unidirectional: they transfer data from the Internet to your system (the downlink) and require some other connection to get data back to the Internet (the uplink).
- If you have a large Internet requirement, you may find it suitable to connect to the Internet via a Leased Line, a telephone line that is permanently connected. This is a relatively expensive option, of course, and we won't discuss it here, particularly as the options vary greatly from country to country and from region to region.
The decision on which WAN connection you use depends primarily on the system you are connecting to, in many cases an Internet Service Provider or ISP. We'll look at ISPs in "Connecting to the Internet".
Ethernet
In the early 1970s, the Xerox Company chartered a group of researchers at its Palo Alto Research Center (PARC) to brainstorm the Office of the Future. This innovative group created the mouse, the window interface metaphor and an integrated, object-oriented programming environment called Smalltalk. In addition, a young MIT engineer in the group named Bob Metcalfe came up with the concept that is the basis of modern local area networking, the Ethernet. The Ethernet protocol is a low-level broadcast packet-delivery system that employed the revolutionary idea that it was easier to resend packets that didn't arrive than it was to make sure all packets arrived. There are other network hardware systems out there, IBM's Token Ring architecture and Fiber Channel, for example, but Ethernet, in its various hardware incarnations, is by far the most common local area network medium. There are three types:
- Originally, Ethernet ran at 10 Mb/s over a single thick coaxial cable, usually bright yellow in colour. This kind of Ethernet is often referred to as thick Ethernet, also called 10B5, and the line interface is called AUI. You may also hear the term yellow string (for tying computers together), though this term is not limited to thick Ethernet. Thick Ethernet is now obsolete: it is expensive, difficult to lay, and relatively unreliable. It requires 50 ohm resistors at each end of the cable to transmit signals correctly. If you leave these out, you won't get degraded performance: the network Will Not Work at all.
- As the name suggests, thin Ethernet is thin coaxial cable, and otherwise quite like thick Ethernet. It is significantly cheaper (thus the term Cheapernet), and the only disadvantage compared with thick Ethernet is that the cables can't be quite as long. The cable is called RG58, and the cable connectors are called BNC. Both terms are frequently used to refer to this kind of connection, as is 10Base2. You'll still see thin Ethernet around, but it's effectively obsolete: performance is poor, and it's no cheaper than 100 Mb/s Ethernet. Like thick Ethernet, all machines are connected by a single cable with terminators at each end.
- Modern Ethernets run at up to 1000 Mb/s over multi-pair cables called UTP, for Unshielded Twisted Pair. Twisted pair means that each pair of wires is twisted to minimize external electrical influence. After all, the frequencies on a 1000 Mb/s Ethernet are way up in the UHF range. Unlike coaxial connections, where all machines are connected to a single cable, UTP connects individual machines to a hub or a switch, a box that distributes the signals. We'll discuss the difference between a hub and a switch on page 288. You'll also hear the terms 10BaseTP, 100BaseTP and 1000BaseTP.
Compared to coaxial Ethernet, UTP cables are much cheaper, and they are more reliable. If you damage or disconnect a coaxial cable, the whole network goes down. If you damage a UTP cable, you only lose the one machine connected to it. On the down side, UTP requires switches or hubs, which cost money, though the price has decreased to the point where a cheap switch and UTP cables together cost less than the RG58 cable alone. UTP systems employ a star architecture rather than the string of coaxial stations with terminators. You can connect many switches together simply by reversing the connections at one end of a switch-to-switch link. In addition, UTP is the only medium currently available that supports 100 Mb/s Ethernet.
How Ethernet works
A large number of systems can be connected to a single Ethernet. Each system has a 48-bit address, the so-called Ethernet address. Ethernet addresses are usually written in bytes separated by colons (:), for example 0:a0:24:37:0d:2b. All data sent over the Ethernet contains two addresses: the Ethernet address of the sender and the Ethernet address of the receiver. Normally, each system responds only to messages sent to it or to a special broadcast address.
You'll also frequently hear the term MAC address. MAC stands for Media Access Control and thus means the address used to access the network link layer. For Ethernets I prefer to use the more exact term Ethernet address.
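Here's a small sketch of how the colon-separated notation maps to the six bytes on the wire, and of the rule for deciding whether a system responds to a message. The function names are made up for this example. The broadcast value ff:ff:ff:ff:ff:ff (all 48 bits set) isn't given in the text above, but it is the standard Ethernet broadcast address.

```python
# The Ethernet broadcast address: all 48 bits set.
BROADCAST = bytes([0xFF] * 6)


def parse_ethernet_address(text):
    """Convert colon-separated hex notation, e.g. '0:a0:24:37:0d:2b', to 6 bytes."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError("an Ethernet address has exactly six bytes: %r" % text)
    # int(..., 16) accepts both '0' and '00'; bytes() rejects values above 255.
    return bytes(int(part, 16) for part in parts)


def format_ethernet_address(octets):
    """Convert 6 bytes back to colon-separated notation with two digits per byte."""
    return ":".join("%02x" % octet for octet in octets)


def accepts(own_address, destination):
    """A system normally responds only to its own address or to the broadcast address."""
    return destination == own_address or destination == BROADCAST


if __name__ == "__main__":
    me = parse_ethernet_address("0:a0:24:37:0d:2b")
    print(len(me) * 8, "bits:", format_ethernet_address(me))
    print(accepts(me, me), accepts(me, BROADCAST))
```

Note that the notation allows leading zeros to be dropped, so 0:a0:24:37:0d:2b and 00:a0:24:37:0d:2b are the same address.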
The fact that multiple machines are on the same network gives rise to a problem: obviously only one system can transmit at any one time, or the data will be garbled. But how do you synchronize the systems? In traditional Ethernets, the answer is simple, but possibly surprising: trial and error. Before any interface transmits, it checks that the network is idle. In the Ethernet specification, this is called Carrier Sense. Unfortunately, this isn't enough: two systems might start sending at the same time. To solve this problem, while it sends, each system checks that it can still recognize what it is sending. If it can't, it assumes that another system has started sending at the same time. This is called a collision. When a collision occurs, both systems stop sending, wait a random amount of time, and try again. You'll see this method referred to as CSMA/CD (Carrier Sense Multiple Access/Collision Detect).
There are a number of problems with this approach:
- The interface needs to listen while sending, so it can't receive anything while it's sending: it's running in half-duplex mode. If it could send and receive at the same time (full-duplex mode), the network throughput could be doubled.
- The more active the network, the more likely collisions will be. This slows things down too, sometimes to a point where the network hardly transmits any traffic.
- The more systems on the network, the less bandwidth is available for each system.
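To get a feel for how the "wait a random amount of time and try again" rule behaves as the network gets busier, here's a toy simulation. It's a sketch under simplifying assumptions, not the Ethernet specification: time is divided into slots, every frame takes exactly one slot, and every station has a single frame to send. The random wait uses a window that doubles after each collision, a simplified form of the binary exponential backoff that Ethernet uses. The function name and parameters are made up for this example.

```python
import random


def simulate_csma_cd(stations, seed=0, max_slots=100000):
    """Slotted toy model: each station has one frame to send.

    Returns (slots_used, collisions).
    """
    rng = random.Random(seed)
    next_attempt = {i: 0 for i in range(stations)}  # slot of each station's next try
    failures = {i: 0 for i in range(stations)}      # collisions each station has suffered
    collisions = 0
    slot = 0
    while next_attempt and slot < max_slots:
        ready = [i for i, when in next_attempt.items() if when <= slot]
        if len(ready) == 1:
            # Exactly one sender in this slot: the frame gets through.
            del next_attempt[ready[0]]
        elif len(ready) > 1:
            # Two or more senders: collision. Each one backs off for a random
            # number of slots, drawn from a window that doubles per failure.
            collisions += 1
            for i in ready:
                failures[i] += 1
                window = 2 ** min(failures[i], 10)
                next_attempt[i] = slot + 1 + rng.randrange(window)
        slot += 1
    if next_attempt:
        raise RuntimeError("simulation did not finish within max_slots")
    return slot, collisions


if __name__ == "__main__":
    for n in (1, 2, 10, 50):
        slots, collisions = simulate_csma_cd(n)
        print("%3d stations: %4d slots, %3d collisions" % (n, slots, collisions))
```

With one station there's nothing to collide with. As you add stations that all want to send at once, more and more slots are lost to collisions and idle backoff time, which is the effect described in the list above.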
With the point-to-point connections on a UTP-based network, you would think it would be possible to change some of this. After all, the connections look pretty much like the same wire that joins two modems together, and modems don't have collisions, and they do run in full-duplex mode. The problem is the hub: if you send a packet out to a hub, it doesn't know which connector to send it down, so it sends it down all of them, thus imitating the old Ethernet. To send it just to the destination, it would need to analyze the Ethernet address in every packet and know where to send it.
This is what a switch does: it learns the Ethernet addresses of each interface on the network and uses this information to send packets to only the line to which that interface is connected. There could be more than one if switches are cascaded. This also means that the line can run in full-duplex mode.
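The difference between the two boxes can be sketched in a few lines. This is an illustrative model only: the class and function names are made up, ports are just numbers, and real switches also age out old table entries, which this sketch doesn't attempt.

```python
def hub_forward(ports, in_port):
    """A hub repeats every packet on all ports except the one it arrived on."""
    return [port for port in ports if port != in_port]


class LearningSwitch:
    """Minimal model of a switch that learns which address lives on which port."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # Ethernet address -> port where it was last seen

    def forward(self, in_port, src, dst):
        """Return the list of ports the packet is sent out on."""
        # Learn: the sender must be reachable via the port the packet came in on.
        self.table[src] = in_port
        out_port = self.table.get(dst)
        if out_port is None:
            # Unknown (or broadcast) destination: fall back to hub behaviour.
            return hub_forward(self.ports, in_port)
        if out_port == in_port:
            # Destination is on the same line as the sender: nothing to do.
            return []
        return [out_port]


if __name__ == "__main__":
    switch = LearningSwitch(ports=[1, 2, 3, 4])
    a, b = "0:a0:24:37:0d:2b", "0:a0:24:37:0d:2c"
    print(switch.forward(1, a, b))  # b not yet known: sent down every other port
    print(switch.forward(2, b, a))  # a was learned on port 1
    print(switch.forward(1, a, b))  # b was learned on port 2
```

Once the switch has seen a packet from each interface, traffic between two machines no longer appears on the other lines at all.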
Nowadays the price differential between switches and hubs is very small; go into a computer market and you'll see that the prices overlap. If at all possible, buy a switch.
Transmitting Internet data across an Ethernet has another problem. Ethernet evolved independently of the Internet standards. As a result, Ethernets can carry different kinds of traffic. In particular, Microsoft uses a protocol called NetBIOS, and Novell uses a protocol called IPX. In addition, Internet addresses are only 32 bits, and it would be impossible to map them to Ethernet addresses even if they were the same length. The result? You guessed it, another header. Figure 16-6 shows an Ethernet packet carrying an IP datagram.
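As a rough sketch of what that extra header contains, the code below builds and takes apart a 14-byte header in the common Ethernet II layout: destination address, source address, and a 16-bit type field that says what kind of traffic follows (0x0800 for IPv4). The function names are made up for this example, the payload is just placeholder bytes rather than a real IP datagram, and the preamble, padding and checksum that the hardware adds are left out.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # type field value for an IPv4 datagram
HEADER_FORMAT = "!6s6sH"  # destination, source, type, in network byte order
HEADER_LENGTH = struct.calcsize(HEADER_FORMAT)  # 14 bytes


def build_frame(dst, src, payload, ethertype=ETHERTYPE_IPV4):
    """Prepend an Ethernet II header to payload."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("Ethernet addresses must be exactly 6 bytes long")
    return struct.pack(HEADER_FORMAT, dst, src, ethertype) + payload


def parse_frame(frame):
    """Split a frame into (destination, source, ethertype, payload)."""
    dst, src, ethertype = struct.unpack(HEADER_FORMAT, frame[:HEADER_LENGTH])
    return dst, src, ethertype, frame[HEADER_LENGTH:]


if __name__ == "__main__":
    dst = bytes.fromhex("ffffffffffff")
    src = bytes.fromhex("00a024370d2b")
    frame = build_frame(dst, src, b"placeholder for an IP datagram")
    print(HEADER_LENGTH, parse_frame(frame))
```

The type field is what lets IP, NetBIOS and IPX traffic share the same wire: the receiving system looks at it to decide which protocol gets the payload.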