Published: 06.08.2012 | Level: specialist | Access: paid
Lecture 7:

The tools of the trade

Files and file names

Both UNIX and Microsoft environments store disk data in files, which in turn are placed in directories. A file may be a directory: that is, it may contain other files. The differences between UNIX and Microsoft start with file names. Traditional Microsoft file names are rigid: a file name consists of up to eight characters, possibly followed by a period and up to three more characters (the so-called file name extension). There are significant restrictions on which characters may be used to form a file name, and upper and lower case letters have the same meaning (internally, Microsoft converts the names to UPPER CASE). Directory members are selected with a backslash (\), which conflicts with other meanings in the C programming language; see page 138 for more details.

FreeBSD has a very flexible method of naming files. File names can contain any character except /, and they can be up to 255 characters long. They are case-sensitive: the names FOO, Foo and foo are three different names. This may seem silly at first, but any alternative means that the names must be associated with a specific character set. How do you upshift the German name ungleichmäßig? What if the same characters appear in a Russian name? Do they still shift the same? The / character is excluded because it separates directories. For example, the name /home/fred/longtext-with-a-long-name represents:

The first character is a /, representing the root file system.

home is the name of a directory in the root file system.

fred is the name of a directory in /home.

The name suggests that longtext-with-a-long-name is probably a file, not a directory, though you can't tell from the name.

As a result, you can't use / in a file name. In addition, binary 0s (the ASCII NUL character) can confuse a lot of programs. It's almost impossible to get a binary 0 into a file name anyway: that character is used to represent the end of a string in the C programming language, and it's difficult to input it from the keyboard.
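
The way the example path breaks into components can be checked from the shell. This is a minimal sketch, assuming the standard dirname and basename utilities; they only parse the text of the path, so /home/fred does not need to exist:

```sh
# Split the example path into its directory part and its final component.
path=/home/fred/longtext-with-a-long-name
dir=$(dirname "$path")      # everything before the last /
name=$(basename "$path")    # the final component
echo "directory: $dir"
echo "name:      $name"
```

Neither utility can say whether the final component is a file or a directory, which matches the point above: you can't tell from the name.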

Case sensitivity no longer seems as strange as it once did: web browsers have made UNIX file names more popular with Uniform Resource Identifiers or URIs, which are derived from UNIX names.

File names and extensions

The Microsoft naming convention (name, period and extension) seems similar to that of UNIX. UNIX also uses extensions to represent specific kinds of files. The difference is that these extensions (and their lengths) are implemented by convention, not by the file system. In Microsoft, the period between the name and the extension is a typographical feature that only exists at the display level: it's not part of the name. In UNIX, the period is part of the name, and names like foo.bar.bazzot are perfectly valid file names. The system doesn't assign any particular meaning to file name extensions; instead, it looks for magic numbers, specific values in specific places in the file.
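
The following sketch illustrates both points: the extension is simply the text after the last period in the name, and the contents of a file identify it regardless of what it is called. The file name notes.txt is hypothetical, and the four bytes %PDF (the start of a PDF file) serve only as one example of a magic number:

```sh
# The extension is just the text after the last period in the name.
name=foo.bar.bazzot
ext=${name##*.}             # strip the longest prefix ending in "."

# The contents, not the name, carry the magic number.
tmp=$(mktemp -d)
printf '%%PDF-1.4\n' > "$tmp/notes.txt"     # deliberately misleading .txt name
magic=$(dd if="$tmp/notes.txt" bs=4 count=1 2>/dev/null)   # first four bytes

echo "extension: $ext"
echo "first four bytes: $magic"
rm -rf "$tmp"
```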

Relative paths

Every directory contains two directory entries, . and .. (one and two periods). These are relative directory entries: . is an alternative way to refer to the current directory, and .. refers to the parent directory. For example, in /home/fred, . refers to /home/fred, and .. refers to /home. The root directory doesn't have a parent directory, so in this directory only, .. refers to the same directory. We'll see a number of cases where this is useful. Interestingly, the Microsoft file systems also have this feature.
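
A quick way to see these entries at work is to change directory through them. The sketch below builds a private stand-in for /home/fred under a temporary directory (an assumption made so that nothing on the real system is touched) and records where . and .. lead:

```sh
# Build a private stand-in for /home/fred.
tmp=$(mktemp -d)
mkdir -p "$tmp/home/fred"

here=$(cd "$tmp/home/fred" && cd . && pwd)      # . stays in the same directory
parent=$(cd "$tmp/home/fred" && cd .. && pwd)   # .. moves up to home
root_parent=$(cd / && cd .. && pwd)             # in /, .. is / again

echo "$(basename "$here") $(basename "$parent") $root_parent"
rm -rf "$tmp"
```

Each cd runs inside a command substitution, so the working directory of the calling shell does not change.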

Globbing characters

Most systems have a method of representing groups of file names and other names, usually by using special characters for representing an abstraction. The most common in UNIX are the characters *, ? and the square brackets []. UNIX calls these characters globbing characters. The Microsoft usage comes from UNIX, but the underlying file name representation makes for big differences. Table 7.2 gives some examples.

Table 7.2. Globbing examples

Name | Microsoft meaning | UNIX meaning
CONFIG.* | All files with the name CONFIG, no matter what their extension. | All files whose name starts with CONFIG., no matter what the rest is. Note that the name contains a period.
CONFIG.BA? | All files with the name CONFIG and an extension that starts with BA, no matter what the last character. | All files that start with CONFIG.BA and have one more character in their name.
* | Depending on the Microsoft version, all files without an extension, or all files. | All files.
*.* | All files with an extension. | All files that have a period in the middle of their name.
foo[127] | In older versions, invalid. In newer versions with long file name support, the file with the name foo[127]. | The three files foo1, foo2 and foo7.
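
The UNIX column can be tried out directly. The sketch below creates a few empty files in a temporary directory and lets the shell expand each pattern; the file names are invented for the illustration, and LC_ALL=C is set only to make the order of the expanded names predictable:

```sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
LC_ALL=C; export LC_ALL     # fixed collation, so the expansion order is stable
touch CONFIG CONFIG.BAK CONFIG.BAT CONFIG.SYS foo1 foo2 foo3 foo7

config_any=$(echo CONFIG.*)      # needs the literal period, so CONFIG is excluded
config_ba=$(echo CONFIG.BA?)     # exactly one character after CONFIG.BA
foo_set=$(echo foo[127])         # one character from the set 1, 2, 7
with_period=$(echo *.*)          # any name containing a period

echo "$config_any"
echo "$config_ba"
echo "$foo_set"
echo "$with_period"

cd / && rm -rf "$tmp"
```

Note that the shell expands the patterns before echo runs: echo only ever sees the resulting list of names.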

Input and output

Most programs either read input data or write output data. To make it easier, the shell usually starts programs with at least three open files:

  • Standard input, often abbreviated to stdin, is the file that most programs read to get input data.
  • Standard output, or stdout, is the normal place for programs to write output data.
  • Standard error output, or stderr, is a separate file for programs to write error messages.

With an interactive shell (one that works on a terminal screen, like we're seeing here), all three files are the same device, in this case the terminal you're working on.

Why two output files? Well, you may be collecting something important, like a backup of all the files on your system. If something goes wrong, you want to know about it, but you don't want to mess up the backup with the message.
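
The separation can be demonstrated with a small stand-in for a backup program. The function fake_backup and its messages are hypothetical; the notations 2> and >&2, which redirect or select stderr, are standard shell syntax but go slightly beyond what this section covers:

```sh
# Hypothetical stand-in for a backup program: data goes to stdout,
# the error message goes to stderr.
fake_backup() {
    echo "backup data"
    echo "warning: could not read a file" >&2
}

data=$(fake_backup 2>/dev/null)        # keep stdout, discard stderr
errors=$(fake_backup 2>&1 >/dev/null)  # keep stderr, discard stdout

echo "stdout carried: $data"
echo "stderr carried: $errors"
```

Because the two streams are separate, the captured data contains no trace of the warning.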

Redirecting input and output

But of course, even if you're running an interactive shell, you don't want to back up your system to the screen. You need to change stdout to be a file. Many programs can do this themselves; for example, you might make a backup of your home directory like this:

$ tar -cf /var/tmp/backup ~

This creates (option c) a file (option f) called /var/tmp/backup, and includes all the files in your home directory (~). Any error messages still appear on the terminal, as stderr hasn't been changed.
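
To try the same options without archiving a real home directory, the sketch below backs up a small throwaway directory instead. The directory and file names are invented; the -C option (change directory before archiving) and the t option (list an archive's contents) are assumptions that hold for the common tar implementations:

```sh
tmp=$(mktemp -d)
mkdir "$tmp/home"
echo "some text" > "$tmp/home/notes.txt"

# c creates an archive, f names the archive file; -C changes directory first.
tar -cf "$tmp/backup" -C "$tmp" home

# t lists the contents of the archive instead of creating one.
listing=$(tar -tf "$tmp/backup")
echo "$listing"
rm -rf "$tmp"
```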

This syntax is specific to tar. The shell provides a more general syntax for redirecting input and output streams. For example, if you want to create a list of the files in your current directory, you might enter:

$ ls -l
drwxr-xr-x  2 root  wheel  512   Dec  20  14:36  CVS
-rw-r--r--    1 root  wheel  7928  Oct  23  12:01  Makefile
-rw-r--r--    5 root  wheel  209   Jul  26  07:11  amd.map
-rw-r--r--    5 root  wheel  1163  Jan  31  2002  apmd.conf
-rw-r--r--    5 root  wheel  271   Jan  31  2002  auth.conf
-rw-r--r--    1 root  wheel  741   Feb  19  2001  crontab
-rw-r--r--    5 root  wheel  108   Jan  31  2002  csh.cshrc
-rw-r--r--    5 root  wheel  482   Jan  31  2002  csh.login
(etc)

You can redirect this output to a file with the command:

$ ls -l > /var/tmp/etclist

This puts the list in the file /var/tmp/etclist. The symbol > tells the shell to redirect stdout to the file whose name follows. Similarly, you could use < to make a program read its stdin from that file, for example when using grep to look for specific texts in the file:

$ grep csh < /var/tmp/etclist
  -rw-r--r--     5 root   wheel  108 Jan 31  2002 csh.cshrc
  -rw-r--r--     5 root   wheel  482 Jan 31  2002 csh.login
  -rw-r--r--     5 grog   lemis  110 Jan 31  2002 csh.logout
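
The two redirections can be repeated safely in a scratch directory. This is a sketch with a made-up miniature etc directory rather than the real /etc; the -c option of grep, which prints the number of matching lines, is used only to summarize the result:

```sh
tmp=$(mktemp -d)
mkdir "$tmp/etc"
touch "$tmp/etc/crontab" "$tmp/etc/csh.cshrc" "$tmp/etc/csh.login"

ls "$tmp/etc" > "$tmp/etclist"            # > sends stdout to the named file
matches=$(grep csh < "$tmp/etclist")      # < makes the file grep's stdin
count=$(grep -c csh < "$tmp/etclist")     # number of matching lines

echo "$matches"
echo "matching lines: $count"
rm -rf "$tmp"
```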

In fact, though, there's a better way to do that: what we're doing here is feeding the output of a program into the input of another program. That happens so often that there's a special method of doing it, called pipes:

$ ls -l | grep csh
  -rw-r--r--     5 root  wheel  108 Jan 31  2002 csh.cshrc
  -rw-r--r--     5 root  wheel  482 Jan 31  2002 csh.login
  -rw-r--r--     5 grog  lemis  110 Jan 31  2002 csh.logout

The | symbol causes the shell to start two programs. The first has a special file, a pipe, as the output, and the second has the same pipe as input. Nothing gets written to disk, and the result is much faster.
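
A short sketch of the same search done with a pipe, again using an invented miniature etc directory. Listing the scratch directory afterwards shows that no intermediate file was created:

```sh
tmp=$(mktemp -d)
mkdir "$tmp/etc"
touch "$tmp/etc/crontab" "$tmp/etc/csh.cshrc" "$tmp/etc/csh.login"

# ls writes into the pipe, grep reads from it; no file sits in between.
piped=$(ls "$tmp/etc" | grep csh)
left_behind=$(ls "$tmp")                  # only the etc directory itself

echo "$piped"
echo "files in scratch directory: $left_behind"
rm -rf "$tmp"
```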

A typical use of pipes is to handle quantities of output data in excess of a screenful. You can pipe to the less program, which enables you to page backward and forward. (Why less? Originally there was a program called more, but it isn't as powerful. less is a newer program with additional features, which proves beyond doubt that less is more.) You pipe to it like this:

$ ls -l | less

Another use is to sort arbitrary data:

$ ps aux | sort -n +1

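This command takes the output of the ps command and sorts it by the numerical (-n) value of its second column (+1). In this notation the first column is numbered 0. The POSIX key syntax expresses the same sort as sort -n -k 2, counting columns from 1; the older +1 form may not be accepted by newer versions of sort.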
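
Because the output of ps differs on every system, the sketch below sorts three fixed lines instead. The names and numbers are invented, and the POSIX key syntax -k 2 (second column, counting from 1) is assumed here in place of the older +1 notation:

```sh
# Sort by the numeric value of the second column.
sorted=$(printf '%s\n' 'alice 30' 'bob 4' 'carol 100' | sort -n -k 2)
first=$(echo "$sorted" | head -n 1)

echo "$sorted"
echo "smallest: $first"
```

The -n option matters: without it the second column would be compared as text rather than as numbers.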
