The tools of the trade
Files and file names
Both UNIX and Microsoft environments store disk data in files, which in turn are placed in directories. A file may be a directory: that is, it may contain other files. The differences between UNIX and Microsoft start with file names. Traditional Microsoft file names are rigid: a file name consists of up to eight characters, possibly followed by a period and up to three more characters (the so-called file name extension). There are significant restrictions on which characters may be used to form a file name, and upper and lower case letters have the same meaning (internally, Microsoft converts the names to UPPER CASE). Directory members are selected with a backslash (\), which conflicts with other meanings in the C programming language; see page 138 for more details.
FreeBSD has a very flexible method of naming files. File names can contain any character except /, and they can be up to 255 characters long. They are case-sensitive: the names FOO, Foo and foo are three different names. This may seem silly at first, but any alternative means that the names must be associated with a specific character set. How do you upshift a German name? What if the same characters appear in a Russian name? Do they still shift the same?
The / character is the exception because it separates directories. For example, the name /home/fred/longtext-with-a-long-name represents:
- The first character is a /, representing the root file system.
- home is the name of a directory in the root file system.
- fred is the name of a directory in /home.
- longtext-with-a-long-name is the name of an entry in /home/fred. The name suggests that it is probably a file, not a directory, though you can't tell for certain from the name.
As a result, you can't use / in a file name. In addition, binary 0s (the ASCII NUL character) can confuse a lot of programs. It's almost impossible to get a binary 0 into a file name anyway: that character is used to represent the end of a string in the C programming language, and it's difficult to input it from the keyboard.
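The way / divides a path name into components can be checked from the shell. The following is a minimal sketch that takes the example name from the text and splits it with the standard dirname and basename utilities. Nothing needs to exist on disk for this to work, since both utilities operate only on the text of the name.

```shell
# Split the example path from the text at its last / character.
path=/home/fred/longtext-with-a-long-name

dir=$(dirname "$path")     # everything before the last /
name=$(basename "$path")   # the final component

echo "directory: $dir"
echo "name:      $name"
```

Because the split is purely textual, it can't tell you whether the final component is a file or a directory, which matches the caveat above.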
Case sensitivity no longer seems as strange as it once did: web browsers have made UNIX file names more popular with Uniform Resource Identifiers or URIs, which are derived from UNIX names.
File names and extensions
The Microsoft naming convention (name, period and extension) seems similar to that of UNIX. UNIX also uses extensions to represent specific kinds of files. The difference is that these extensions (and their lengths) are implemented by convention, not by the file system. In Microsoft, the period between the name and the extension is a typographical feature that only exists at the display level: it's not part of the name. In UNIX, the period is part of the name, and names like foo.bar.bazzot are perfectly valid file names. The system doesn't assign any particular meaning to file name extensions; instead, it looks for magic numbers, specific values in specific places in the file.
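Since the period is just another character in the name, taking a name apart at its periods is ordinary string handling. As a small sketch, the standard shell parameter expansions below split the example name foo.bar.bazzot. The variable names are chosen only for illustration.

```shell
# The period is an ordinary character, so a name may contain several.
name=foo.bar.bazzot

ext=${name##*.}    # strip the longest prefix ending in a period
stem=${name%.*}    # strip the shortest suffix starting with a period

echo "extension: $ext"
echo "stem:      $stem"
```

Nothing in the system depends on this split. To see what a file really contains, the file(1) utility examines magic numbers rather than the name.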
Relative paths
Every directory contains two directory entries, . and .. (one and two periods). These are relative directory entries: . is an alternative way to refer to the current directory, and .. refers to the parent directory. For example, in /home/fred, . refers to /home/fred, and .. refers to /home. The root directory doesn't have a parent directory, so in this directory only, .. refers to the same directory. We'll see a number of cases where this is useful. (Interestingly, the Microsoft file systems also have this feature.)
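You can try out the relative entries without touching your real home directory. The sketch below builds a scratch directory tree with mktemp (the scratch location and the variable names are assumptions made for the example) and then looks at . and .. from two places.

```shell
# Scratch tree standing in for /home/fred; its location is arbitrary.
base=$(cd "$(mktemp -d)" && pwd)
mkdir -p "$base/home/fred"

cd "$base/home/fred"
here=$(cd . && pwd)            # . is the current directory
parent=$(cd .. && pwd)         # .. is the parent directory

cd /
root_parent=$(cd .. && pwd)    # in the root directory, .. is the root itself

echo "here:           $here"
echo "parent:         $parent"
echo "parent of root: $root_parent"

rm -rf "$base"                 # clean up the scratch tree
```

Note that most shells track the current directory themselves when you use cd, so this shows the behaviour you see at the prompt rather than the directory entries on disk. The command ls -a lists the entries themselves.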
Globbing characters
Most systems have a method of representing groups of file names and other names, usually by using special characters for representing an abstraction. The most common in UNIX are the characters *, ? and the square brackets []. UNIX calls these characters globbing characters. The Microsoft usage comes from UNIX, but the underlying file name representation makes for big differences. Table 7-2 gives some examples.
Name | Microsoft meaning | UNIX meaning |
---|---|---|
CONFIG.* | All files with the name CONFIG, no matter what their extension. | All files whose name starts with CONFIG., no matter what the rest is. Note that the name contains a period. |
CONFIG.BA? | All files with the name CONFIG and an extension that starts with BA, no matter what the last character. | All files that start with CONFIG.BA and have one more character in their name. |
* | Depending on the Microsoft version, all files without an extension, or all files. | All files. |
*.* | All files with an extension. | All files that have a period in the middle of their name. |
foo[127] | In older versions, invalid. In newer versions with long file name support, the file with the name foo[127]. | The three files foo1, foo2 and foo7. |
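The UNIX column of the table can be reproduced in a scratch directory. The following sketch creates a handful of empty files (the file names beyond those in the table, and the scratch directory itself, are made up for the example) and lets the shell expand each pattern.

```shell
LC_ALL=C; export LC_ALL        # fixed ordering of the expanded names
dir=$(mktemp -d)               # scratch directory, removed at the end
cd "$dir"
touch CONFIG.SYS CONFIG.BAK CONFIG.BAT README foo1 foo2 foo3 foo7

config_any=$(echo CONFIG.*)    # names starting with "CONFIG."
config_ba=$(echo CONFIG.BA?)   # "CONFIG.BA" plus exactly one character
with_period=$(echo *.*)        # names containing a period
everything=$(echo *)           # all names not starting with a period
foo_set=$(echo foo[127])       # foo1, foo2 and foo7, but not foo3

echo "CONFIG.*   -> $config_any"
echo "CONFIG.BA? -> $config_ba"
echo "*.*        -> $with_period"
echo "*          -> $everything"
echo "foo[127]   -> $foo_set"

cd /
rm -rf "$dir"
```

The shell expands the patterns before the program sees them: echo never learns that a globbing character was involved.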
Input and output
Most programs either read input data or write output data. To make it easier, the shell usually starts programs with at least three open files:
- Standard input, often abbreviated to stdin, is the file that most programs read to get input data.
- Standard output, or stdout, is the normal place for programs to write output data.
- Standard error output, or stderr, is a separate file for programs to write error messages.
With an interactive shell (one that works on a terminal screen, like we're seeing here), all three files are the same device, in this case the terminal you're working on.
Why two output files? Well, you may be collecting something important, like a backup of all the files on your system. If something goes wrong, you want to know about it, but you don't want to mess up the backup with the message.
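The separation is easy to demonstrate. The sketch below uses a made-up shell function, fake_backup, as a stand-in for a real backup program: it writes one line of data to stdout and one message to stderr. The two streams are then sent to different temporary files with > and 2>, the standard Bourne shell notations for redirecting stdout and stderr.

```shell
# Hypothetical stand-in for a backup program.
fake_backup() {
    echo "backup data"                           # goes to stdout
    echo "warning: could not read one file" >&2  # goes to stderr
}

out=$(mktemp)    # receives stdout
err=$(mktemp)    # receives stderr
fake_backup > "$out" 2> "$err"

data=$(cat "$out")
messages=$(cat "$err")
echo "stdout held: $data"
echo "stderr held: $messages"

rm -f "$out" "$err"
```

The warning never reaches the file holding the data, which is exactly what you want for a backup.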
Redirecting input and output
But of course, even if you're running an interactive shell, you don't want to back up your system to the screen. You need to change stdout to be a file. Many programs can do this themselves; for example, you might make a backup of your home directory like this:
$ tar -cf /var/tmp/backup ~
This creates (option c) a file (option f) called /var/tmp/backup, and includes all the files in your home directory (~). Any error messages still appear on the terminal, as stderr hasn't been changed.
This syntax is specific to tar. The shell provides a more general syntax for redirecting input and output streams. For example, if you want to create a list of the files in your current directory, you might enter:
$ ls -l
drwxr-xr-x 2 root wheel 512 Dec 20 14:36 CVS
-rw-r--r-- 1 root wheel 7928 Oct 23 12:01 Makefile
-rw-r--r-- 5 root wheel 209 Jul 26 07:11 amd.map
-rw-r--r-- 5 root wheel 1163 Jan 31 2002 apmd.conf
-rw-r--r-- 5 root wheel 271 Jan 31 2002 auth.conf
-rw-r--r-- 1 root wheel 741 Feb 19 2001 crontab
-rw-r--r-- 5 root wheel 108 Jan 31 2002 csh.cshrc
-rw-r--r-- 5 root wheel 482 Jan 31 2002 csh.login
(etc)
You can redirect this output to a file with the command:
$ ls -l > /var/tmp/etclist
This puts the list in the file /var/tmp/etclist. The symbol > tells the shell to redirect stdout to the file whose name follows. Similarly, you could use < to redirect stdin to that file, for example when using grep to look for specific text in the file:
$ grep csh < /var/tmp/etclist
-rw-r--r-- 5 root wheel 108 Jan 31 2002 csh.cshrc
-rw-r--r-- 5 root wheel 482 Jan 31 2002 csh.login
-rw-r--r-- 5 grog lemis 110 Jan 31 2002 csh.logout
In fact, though, there's a better way to do that: what we're doing here is feeding the output of a program into the input of another program. That happens so often that there's a special method of doing it, called pipes:
$ ls -l | grep csh
-rw-r--r-- 5 root wheel 108 Jan 31 2002 csh.cshrc
-rw-r--r-- 5 root wheel 482 Jan 31 2002 csh.login
-rw-r--r-- 5 grog lemis 110 Jan 31 2002 csh.logout
The | symbol causes the shell to start two programs. The first has a special file, a pipe, as the output, and the second has the same pipe as input. Nothing gets written to disk, and the result is much faster.
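To convince yourself that the two approaches give the same answer, you can run both against the same directory. This sketch uses a scratch directory with three empty files whose names are borrowed from the listing above. The scratch directory and the intermediate file are assumptions for the example, not part of the original commands.

```shell
dir=$(mktemp -d)     # scratch directory to list
list=$(mktemp)       # intermediate file for the two-step version
touch "$dir/csh.cshrc" "$dir/csh.login" "$dir/crontab"

# Two steps: write the list to a file, then read it back on stdin.
ls "$dir" > "$list"
via_file=$(grep csh < "$list")

# One step: connect the two programs with a pipe.
via_pipe=$(ls "$dir" | grep csh)

echo "via file:"; echo "$via_file"
echo "via pipe:"; echo "$via_pipe"

rm -rf "$dir" "$list"
```

Both versions select the same two names. The pipe version simply has nothing left over to clean up afterwards.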
A typical use of pipes is to handle quantities of output data in excess of a screenful. You can pipe to the less program, which enables you to page backward and forward. (Why less? Originally there was a program called more, but it isn't as powerful. Less is a newer program with additional features, which proves beyond doubt that less is more.) For example:
$ ls -l | less
Another use is to sort arbitrary data:
$ ps aux | sort -n +1
This command takes the output of the ps command and sorts it by the numerical (-n) value of its second column (+1). The first column is numbered 0.
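The +1 notation is the historical way of naming a sort key. POSIX writes the same key as -k 2, with columns numbered from 1 instead of 0, and some current versions of sort may not accept the older form. The sketch below applies the -k form to three made-up lines shaped like ps output (user, process ID, command), so that the result doesn't depend on what happens to be running on your system.

```shell
# Hypothetical sample in the shape of ps output: user, PID, command.
sample='root 120 cron
grog 7 sh
root 45 syslogd'

# Sort numerically (-n) on the key that starts at the second column (-k 2).
sorted=$(printf '%s\n' "$sample" | sort -n -k 2)
echo "$sorted"
```

With real ps output the same idea applies: ps aux | sort -n -k 2 orders the processes by the value in the second column.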