Diffstat (limited to 'sysutils/uniutils/pkg-descr')
-rw-r--r-- | sysutils/uniutils/pkg-descr | 23 |
1 files changed, 23 insertions, 0 deletions
diff --git a/sysutils/uniutils/pkg-descr b/sysutils/uniutils/pkg-descr
new file mode 100644
index 000000000000..1144e261299f
--- /dev/null
+++ b/sysutils/uniutils/pkg-descr
@@ -0,0 +1,23 @@
+Unidesc consists of four programs for finding out what is in a Unicode file.
+They are useful when working with Unicode files when one doesn't know the
+writing system, doesn't have the necessary font, needs to inspect invisible
+characters, needs to find out whether characters have been combined or in what
+order they occur, or needs statistics on which characters occur.
+
+uniname defaults to printing the character offset of each character, its byte
+offset, its hex code value, its encoding, the glyph itself, and its name.
+
+unidesc reports the character ranges to which different portions of the text
+belong. It can also be used to identify Unicode encodings (e.g. UTF-16be)
+flagged by magic numbers.
+
+unihist generates a histogram of the characters in its input, which must be
+encoded in UTF-8 Unicode. By default, for each character it prints the
+frequency of the character as a percentage of the total, the absolute number of
+tokens in the input, the UTF-32 code in hexadecimal, and, if the character is
+displayable, the glyph itself as UTF-8 Unicode.
+
+ExplicateUTF8 is intended for debugging or for learning about Unicode. It
+determines and explains the validity of a sequence of bytes as a UTF8 encoding.
+
+WWW: http://www.cis.upenn.edu/~wjposer/unidesc.html
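
For readers unfamiliar with the kind of report the description above attributes to uniname and unihist, the following is a rough Python sketch of the same idea. It is not the uniutils implementation and does not reproduce its output format or command-line options; the function names `describe` and `histogram`, the six-digit hex formatting, and the `<unnamed>` placeholder are all assumptions made for illustration.

```python
import unicodedata
from collections import Counter


def describe(text):
    """Return one row per character: character offset, UTF-8 byte offset,
    hex code point, UTF-8 bytes, the glyph, and the Unicode name.

    Hypothetical helper, loosely modelled on the uniname description.
    """
    rows = []
    byte_offset = 0
    for char_offset, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        rows.append((
            char_offset,
            byte_offset,
            f"{ord(ch):06X}",                   # code point in hexadecimal
            encoded.hex(" ").upper(),           # UTF-8 bytes, space separated
            ch,
            unicodedata.name(ch, "<unnamed>"),  # placeholder if no name exists
        ))
        byte_offset += len(encoded)
    return rows


def histogram(text):
    """Return (percentage, count, hex code point, glyph) per distinct
    character, most frequent first.

    Hypothetical helper, loosely modelled on the unihist description.
    """
    counts = Counter(text)
    total = sum(counts.values())
    return [
        (100.0 * n / total, n, f"{ord(ch):06X}", ch)
        for ch, n in counts.most_common()
    ]


if __name__ == "__main__":
    for row in describe("na\u00efve"):
        print(*row, sep="\t")
    for pct, n, code, glyph in histogram("banana"):
        print(f"{pct:6.2f}%\t{n}\t{code}\t{glyph}")
```

The sketch works on an already decoded string, so it sidesteps the encoding detection and UTF-8 validity checks that the description assigns to unidesc and ExplicateUTF8.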