diff options
Diffstat (limited to 'usr.bin/file/file.1')
-rw-r--r-- | usr.bin/file/file.1 | 351 |
1 files changed, 351 insertions, 0 deletions
diff --git a/usr.bin/file/file.1 b/usr.bin/file/file.1 new file mode 100644 index 000000000000..c10b221c26ca --- /dev/null +++ b/usr.bin/file/file.1 @@ -0,0 +1,351 @@ +.TH FILE 1 "Copyright but distributable" +.\# $Id: file.man,v 1.23 1993/09/24 18:50:48 christos Exp $ +.SH NAME +.I file +\- determine file type +.SH SYNOPSIS +.B file +[ +.B \-c +] +[ +.B \-z +] +[ +.B \-L +] +[ +.B \-f +namefile ] +[ +.B \-m +magicfile ] +file ... +.SH DESCRIPTION +.I File +tests each argument in an attempt to classify it. +There are three sets of tests, performed in this order: +filesystem tests, magic number tests, and language tests. +The +.I first +test that succeeds causes the file type to be printed. +.PP +The type printed will usually contain one of the words +.B text +(the file contains only ASCII characters and is +probably safe to read on an ASCII terminal), +.B executable +(the file contains the result of compiling a program +in a form understandable to some \s-1UNIX\s0 kernel or another), +or +.B data +meaning anything else (data is usually `binary' or non-printable). +Exceptions are well-known file formats (core files, tar archives) +that are known to contain binary data. +When modifying the file +.I __MAGIC__ +or the program itself, +.B "preserve these keywords" . +People depend on knowing that all the readable files in a directory +have the word ``text'' printed. +Don't do as Berkeley did \- change ``shell commands text'' +to ``shell script''. +.PP +The filesystem tests are based on examining the return from a +.IR stat (2) +system call. +The program checks to see if the file is empty, +or if it's some sort of special file. +Any known file types appropriate to the system you are running on +(sockets, symbolic links, or named pipes (FIFOs) on those systems that +implement them) +are intuited if they are defined in +the system header file +.BR sys/stat.h . +.PP +The magic number tests are used to check for files with data in +particular fixed formats. +The canonical example of this is a binary executable (compiled program) +.B a.out +file, whose format is defined in +.B a.out.h +and possibly +.B exec.h +in the standard include directory. +These files have a `magic number' stored in a particular place +near the beginning of the file that tells the \s-1UNIX\s0 operating system +that the file is a binary executable, and which of several types thereof. +The concept of `magic number' has been applied by extension to data files. +Any file with some invariant identifier at a small fixed +offset into the file can usually be described in this way. +The information in these files is read from the magic file +.I __MAGIC__. +.PP +If an argument appears to be an +.SM ASCII +file, +.I file +attempts to guess its language. +The language tests look for particular strings (cf \fInames.h\fP) +that can appear anywhere in the first few blocks of a file. +For example, the keyword +.B .br +indicates that the file is most likely a troff input file, +just as the keyword +.B struct +indicates a C program. +These tests are less reliable than the previous +two groups, so they are performed last. +The language test routines also test for some miscellany +(such as +.I tar +archives) and determine whether an unknown file should be +labelled as `ascii text' or `data'. +.PP +Use +.B \-m +.I file +to specify an alternate file of magic numbers. +.PP +The +.B \-z +tries to look inside compressed files. +.PP +The +.B \-c +option causes a checking printout of the parsed form of the magic file. +This is usually used in conjunction with +.B \-m +to debug a new magic file before installing it. +.PP +The +.B \-f +.I namefile +option specifies that the names of the files to be examined +are to be read (one per line) from +.I namefile +before the argument list. +Either +.I namefile +or at least one filename argument must be present; +to test the standard input, use ``-'' as a filename argument. +.PP +The +.B \-L +option causes symlinks to be followed, as the like-named option in +.IR ls (1). +.SH FILES +.I __MAGIC__ +\- default list of magic numbers +.SH SEE ALSO +.IR magic (__SECTION__) +\- description of magic file format. +.br +.IR Strings (1), " od" (1) +\- tools for examining non-textfiles. +.SH STANDARDS CONFORMANCE +This program is believed to exceed the System V Interface Definition +of FILE(CMD), as near as one can determine from the vague language +contained therein. +Its behaviour is mostly compatible with the System V program of the same name. +This version knows more magic, however, so it will produce +different (albeit more accurate) output in many cases. +.PP +The one significant difference +between this version and System V +is that this version treats any white space +as a delimiter, so that spaces in pattern strings must be escaped. +For example, +.br +>10 string language impress\ (imPRESS data) +.br +in an existing magic file would have to be changed to +.br +>10 string language\e impress (imPRESS data) +.br +In addition, in this version, if a pattern string contains a backslash, +it must be escaped. For example +.br +0 string \ebegindata Andrew Toolkit document +.br +in an existing magic file would have to be changed to +.br +0 string \e\ebegindata Andrew Toolkit document +.br +.PP +SunOS releases 3.2 and later from Sun Microsystems include a +.IR file (1) +command derived from the System V one, but with some extensions. +My version differs from Sun's only in minor ways. +It includes the extension of the `&' operator, used as, +for example, +.br +>16 long&0x7fffffff >0 not stripped +.SH MAGIC DIRECTORY +The magic file entries have been collected from various sources, +mainly USENET, and contributed by various authors. +Christos Zoulas (address below) will collect additional +or corrected magic file entries. +A consolidation of magic file entries +will be distributed periodically. +.PP +The order of entries in the magic file is significant. +Depending on what system you are using, the order that +they are put together may be incorrect. +If your old +.I file +command uses a magic file, +keep the old magic file around for comparison purposes +(rename it to +.IR __MAGIC__.orig ). +.SH HISTORY +There has been a +.I file +command in every UNIX since at least Research Version 6 +(man page dated January, 1975). +The System V version introduced one significant major change: +the external list of magic number types. +This slowed the program down slightly but made it a lot more flexible. +.PP +This program, based on the System V version, +was written by Ian Darwin without looking at anybody else's source code. +.PP +John Gilmore revised the code extensively, making it better than +the first version. +Geoff Collyer found several inadequacies +and provided some magic file entries. +The program has undergone continued evolution since. +.SH AUTHOR +Written by Ian F. Darwin, UUCP address {utzoo | ihnp4}!darwin!ian, +Internet address ian@sq.com, +postal address: P.O. Box 603, Station F, Toronto, Ontario, CANADA M4Y 2L8. +.PP +Altered by Rob McMahon, cudcv@warwick.ac.uk, 1989, to extend the `&' operator +from simple `x&y != 0' to `x&y op z'. +.PP +Altered by Guy Harris, guy@auspex.com, 1993, to: +.RS +.PP +put the ``old-style'' `&' +operator back the way it was, because 1) Rob McMahon's change broke the +previous style of usage, 2) the SunOS ``new-style'' `&' operator, +which this version of +.I file +supports, also handles `x&y op z', and 3) Rob's change wasn't documented +in any case; +.PP +put in multiple levels of `>'; +.PP +put in ``beshort'', ``leshort'', etc. keywords to look at numbers in the +file in a specific byte order, rather than in the native byte order of +the process running +.IR file . +.RE +.PP +Changes by Ian Darwin and various authors including +Christos Zoulas (christos@ee.cornell.edu), 1990-1992. +.SH LEGAL NOTICE +Copyright (c) Ian F. Darwin, Toronto, Canada, +1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993. +.PP +This software is not subject to and may not be made subject to any +license of the American Telephone and Telegraph Company, Sun +Microsystems Inc., Digital Equipment Inc., Lotus Development Inc., the +Regents of the University of California, The X Consortium or MIT, or +The Free Software Foundation. +.PP +This software is not subject to any export provision of the United States +Department of Commerce, and may be exported to any country or planet. +.PP +Permission is granted to anyone to use this software for any purpose on +any computer system, and to alter it and redistribute it freely, subject +to the following restrictions: +.PP +1. The author is not responsible for the consequences of use of this +software, no matter how awful, even if they arise from flaws in it. +.PP +2. The origin of this software must not be misrepresented, either by +explicit claim or by omission. Since few users ever read sources, +credits must appear in the documentation. +.PP +3. Altered versions must be plainly marked as such, and must not be +misrepresented as being the original software. Since few users +ever read sources, credits must appear in the documentation. +.PP +4. This notice may not be removed or altered. +.PP +A few support files (\fIgetopt\fP, \fIstrtok\fP) +distributed with this package +are by Henry Spencer and are subject to the same terms as above. +.PP +A few simple support files (\fIstrtol\fP, \fIstrchr\fP) +distributed with this package +are in the public domain; they are so marked. +.PP +The files +.I tar.h +and +.I is_tar.c +were written by John Gilmore from his public-domain +.I tar +program, and are not covered by the above restrictions. +.SH BUGS +There must be a better way to automate the construction of the Magic +file from all the glop in Magdir. What is it? +Better yet, the magic file should be compiled into binary (say, +.IR ndbm (3) +or, better yet, fixed-length ASCII strings +for use in heterogenous network environments) for faster startup. +Then the program would run as fast as the Version 7 program of the same name, +with the flexibility of the System V version. +.PP +.I File +uses several algorithms that favor speed over accuracy, +thus it can be misled about the contents of ASCII files. +.PP +The support for ASCII files (primarily for programming languages) +is simplistic, inefficient and requires recompilation to update. +.PP +There should be an ``else'' clause to follow a series of continuation lines. +.PP +The magic file and keywords should have regular expression support. +Their use of ASCII TAB as a field delimiter is ugly and makes +it hard to edit the files, but is entrenched. +.PP +It might be advisable to allow upper-case letters in keywords +for e.g., troff commands vs man page macros. +Regular expression support would make this easy. +.PP +The program doesn't grok \s-2FORTRAN\s0. +It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which +appear indented at the start of line. +Regular expression support would make this easy. +.PP +The list of keywords in +.I ascmagic +probably belongs in the Magic file. +This could be done by using some keyword like `*' for the offset value. +.PP +Another optimisation would be to sort +the magic file so that we can just run down all the +tests for the first byte, first word, first long, etc, once we +have fetched it. Complain about conflicts in the magic file entries. +Make a rule that the magic entries sort based on file offset rather +than position within the magic file? +.PP +The program should provide a way to give an estimate +of ``how good'' a guess is. +We end up removing guesses (e.g. ``From '' as first 5 chars of file) because +they are not as good as other guesses (e.g. ``Newsgroups:'' versus +"Return-Path:"). Still, if the others don't pan out, it should be +possible to use the first guess. +.PP +This program is slower than some vendors' file commands. +.PP +This manual page, and particularly this section, is too long. +.SH AVAILABILITY +You can obtain the original author's latest version by anonymous FTP +on +.B tesla.ee.cornell.edu +in the directory +.BR /pub/file-X.YY.tar.gz |