Diffstat (limited to 'doc/field-formatting.rst')
-rw-r--r-- | doc/field-formatting.rst | 370 |
1 files changed, 370 insertions, 0 deletions
diff --git a/doc/field-formatting.rst b/doc/field-formatting.rst
new file mode 100644
index 0000000000000..2e2bd75dd1ca1
--- /dev/null
+++ b/doc/field-formatting.rst
@@ -0,0 +1,370 @@
+
+.. index:: Field Formatting
+
+Field Formatting
+----------------
+
+The field format is similar to the format string for printf(3).  Its
+use varies based on the role of the field, but it is generally used to
+format the field's contents.
+
+If the format string is not provided for a value field, it defaults to
+"%s".
+
+Note that a field definition can contain zero or more printf-style
+'directives', which are sequences that start with a '%' and end with
+one of the following characters: "diouxXDOUeEfFgGaAcCsSp".  Each directive
+is matched by one or more arguments to the xo_emit function.
+
+The format string has the form::
+
+    '%' format-modifier * format-character
+
+The format-modifier can be:
+
+- a '#' character, indicating the output value should be prefixed
+  with '0x', typically to indicate a base 16 (hex) value.
+- a minus sign ('-'), indicating the output value should be padded on
+  the right instead of the left.
+- a leading zero ('0') indicating the output value should be padded on the
+  left with zeroes instead of spaces (' ').
+- one or more digits ('0' - '9') indicating the minimum width of the
+  argument.  If the width in columns of the output value is less than
+  the minimum width, the value will be padded to reach the minimum.
+- a period followed by one or more digits indicating the maximum
+  number of bytes which will be examined for a string argument, or the maximum
+  width for a non-string argument.  When handling ASCII strings this
+  functions as the field width but for multi-byte characters, a single
+  character may be composed of multiple bytes.
+  xo_emit will never dereference memory beyond the given number of bytes.
+- a second period followed by one or more digits indicating the maximum
+  width for a string argument.  This modifier cannot be given for non-string
+  arguments.
+- one or more 'h' characters, indicating shorter input data.
+- one or more 'l' characters, indicating longer input data.
+- a 'z' character, indicating a 'size_t' argument.
+- a 't' character, indicating a 'ptrdiff_t' argument.
+- a ' ' character, indicating a space should be emitted before
+  positive numbers.
+- a '+' character, indicating a sign should be emitted before any number.
+
+Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
+removed eventually.
+
+The format character is described in the following table:
+
+===== ================= ======================
+ Ltr   Argument Type     Format
+===== ================= ======================
+ d     int               base 10 (decimal)
+ i     int               base 10 (decimal)
+ o     int               base 8 (octal)
+ u     unsigned          base 10 (decimal)
+ x     unsigned          base 16 (hex)
+ X     unsigned          base 16 (hex)
+ D     long              base 10 (decimal)
+ O     unsigned long     base 8 (octal)
+ U     unsigned long     base 10 (decimal)
+ e     double            [-]d.ddde+-dd
+ E     double            [-]d.dddE+-dd
+ f     double            [-]ddd.ddd
+ F     double            [-]ddd.ddd
+ g     double            as 'e' or 'f'
+ G     double            as 'E' or 'F'
+ a     double            [-]0xh.hhhp[+-]d
+ A     double            [-]0Xh.hhhp[+-]d
+ c     unsigned char     a character
+ C     wint_t            a character
+ s     char \*           a UTF-8 string
+ S     wchar_t \*        a unicode/WCS string
+ p     void \*           '%#lx'
+===== ================= ======================
+
+The 'h' and 'l' modifiers affect the size and treatment of the
+argument:
+
+===== ============= ====================
+ Mod   d, i          o, u, x, X
+===== ============= ====================
+ hh    signed char   unsigned char
+ h     short         unsigned short
+ l     long          unsigned long
+ ll    long long     unsigned long long
+ j     intmax_t      uintmax_t
+ t     ptrdiff_t     ptrdiff_t
+ z     size_t        size_t
+ q     quad_t        u_quad_t
+===== ============= ====================
+
+.. index:: UTF-8
+.. index:: Locale
+
+.. _utf-8:
+
+UTF-8 and Locale Strings
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+For strings, the 'h' and 'l' modifiers affect the interpretation of
+the bytes pointed to by the argument.  The default '%s' string is a 'char \*'
+pointer to a string encoded as UTF-8.  Since UTF-8 is compatible with
+ASCII data, a normal 7-bit ASCII string can be used.  '%ls' expects a
+'wchar_t \*' pointer to a wide-character string, encoded as 32-bit
+Unicode values.  '%hs' expects a 'char \*' pointer to a multi-byte
+string encoded with the current locale, as given by the LC_CTYPE,
+LANG, or LC_ALL environment variables.  The first variable in this list
+that is set is used; if none of the variables are set, the locale
+defaults to "UTF-8".
+
+libxo will convert these arguments as needed to either UTF-8 (for XML,
+JSON, and HTML styles) or locale-based strings for display in text
+style::
+
+    xo_emit("All strings are utf-8 content {:tag/%ls}",
+            L"except for wide strings");
+
+======== ================== ===============================
+ Format   Argument Type      Argument Contents
+======== ================== ===============================
+ %s       const char \*      UTF-8 string
+ %S       const wchar_t \*   Wide string (alias for '%ls')
+ %ls      const wchar_t \*   Wide character UNICODE string
+ %hs      const char \*      locale-based string
+======== ================== ===============================
+
+.. admonition:: "Long", not "locale"
+
+  The "*l*" in "%ls" is for "*long*", following the convention of "%ld".
+  It is not "*locale*", a common mis-mnemonic.  "%S" is equivalent to
+  "%ls".
+
+For example, the following function is passed a locale-based name, a
+hat size, and a time value.  The hat size is formatted in a UTF-8
+(ASCII) string, and the time value is formatted into a wchar_t
+string::
+
+    void print_order (const char *name, int size,
+                      struct tm *timep) {
+        char buf[32];
+        const char *size_val = "unknown";
+
+        if (size > 0) {
+            snprintf(buf, sizeof(buf), "%d", size);
+            size_val = buf;
+        }
+
+        wchar_t when[32];
+        wcsftime(when, sizeof(when) / sizeof(when[0]), L"%d%b%y", timep);
+
+        xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
+                name, size_val);
+        xo_emit("It was ordered on {:order-time/%ls}.\n",
+                when);
+    }
+
+It is important to note that xo_emit will perform the conversion
+required to make appropriate output.  Text style output uses the
+current locale (as described above), while XML, JSON, and HTML use
+UTF-8.
+
+UTF-8 and locale-encoded strings can use multiple bytes to encode one
+column of data.  The traditional "precision" (aka "max-width") value
+for "%s" printf formatting becomes overloaded since it specifies both
+the number of bytes that can be safely referenced and the maximum
+number of columns to emit.  xo_emit uses the precision as the former,
+and adds a third value for specifying the maximum number of columns.
+
+In this example, the name field is printed with a minimum of 3 columns
+and a maximum of 6.  Up to ten bytes of data at the location given by
+'name' are used in filling those columns::
+
+    xo_emit("{:name/%3.10.6s}", name);
+
+Characters Outside of Field Definitions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Characters in the format string that are not part of a field
+definition are copied to the output for the TEXT style, and are
+ignored for the JSON and XML styles.  For HTML, these characters are
+placed in a <div> with class "text"::
+
+  EXAMPLE:
+      xo_emit("The hat is {:size/%s}.\n", size_val);
+  TEXT:
+      The hat is extra small.
+  XML:
+      <size>extra small</size>
+  JSON:
+      "size": "extra small"
+  HTML:
+      <div class="text">The hat is </div>
+      <div class="data" data-tag="size">extra small</div>
+      <div class="text">.</div>
+
+.. index:: errno
+
+"%m" Is Supported
+~~~~~~~~~~~~~~~~~
+
+libxo supports the '%m' directive, which formats the error message
+associated with the current value of "errno".  It is the equivalent
+of "%s" with the argument strerror(errno)::
+
+    xo_emit("{:filename} cannot be opened: {:error/%m}", filename);
+    xo_emit("{:filename} cannot be opened: {:error/%s}",
+            filename, strerror(errno));
+
+"%n" Is Not Supported
+~~~~~~~~~~~~~~~~~~~~~
+
+libxo does not support the '%n' directive.  It's a bad idea and we
+just don't do it.
+
+The Encoding Format (eformat)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The "eformat" string is the format string used when encoding the field
+for JSON and XML.  If not provided, it defaults to the primary format
+with any minimum width removed.  If the primary is not given, both
+default to "%s".
+
+Content Strings
+~~~~~~~~~~~~~~~
+
+For padding and labels, the content string is considered the content,
+unless a format is given.
+
+.. index:: printf-like
+
+Argument Validation
+~~~~~~~~~~~~~~~~~~~
+
+Many compilers and tool chains support validation of printf-like
+arguments.  When the format string fails to match the argument list,
+a warning is generated.  This is a valuable feature, and while the
+formatting strings for libxo differ considerably from printf, many of
+these checks can still provide build-time protection against bugs.
+
+libxo provides variants of functions that provide this ability, if the
+"--enable-printflike" option is passed to the "configure" script.
+These functions use the "_p" suffix, like "xo_emit_p()",
+"xo_emit_hp()", etc.
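
The build-time checking mentioned above generally rests on a compiler
attribute.  The following is only a minimal sketch, assuming GCC or
Clang; it is not taken from libxo, and the names PRINTFLIKE,
emit_checked, and emit_checked_demo are hypothetical, chosen for
illustration:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro: expands to the GCC/Clang "format" attribute,
 * or to nothing on compilers that lack it. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTFLIKE(fmt_idx, arg_idx) \
    __attribute__((__format__(__printf__, fmt_idx, arg_idx)))
#else
#define PRINTFLIKE(fmt_idx, arg_idx)
#endif

/* Hypothetical wrapper (not a libxo function).  Parameter 3 is the
 * format string and the variadic arguments start at parameter 4. */
static int emit_checked(char *buf, size_t bufsiz, const char *fmt, ...)
    PRINTFLIKE(3, 4);

static int
emit_checked(char *buf, size_t bufsiz, const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = vsnprintf(buf, bufsiz, fmt, ap);
    va_end(ap);
    return rc;
}

/* Returns 0 when the formatted text matches what was expected. */
static int
emit_checked_demo(void)
{
    char buf[64];
    int rc = emit_checked(buf, sizeof(buf), "%s has %d items", "cart", 3);

    assert(rc > 0);
    return strcmp(buf, "cart has 3 items");
}
```

With the attribute in place, a mismatched call such as
emit_checked(buf, sizeof(buf), "%d items", "three") typically draws a
format warning when options like -Wall are enabled, which is the kind
of protection the "_p" variants aim to provide.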
+
+The following are features of libxo formatting strings that are
+incompatible with printf-like testing:
+
+- implicit formats, where "{:tag}" has an implicit "%s";
+- the "max" parameter for strings, where "{:tag/%4.10.6s}" means up to
+  ten bytes of data can be inspected to fill a minimum of 4 columns and
+  a maximum of 6;
+- percent signs in strings, where "{:filled}%" makes a single,
+  trailing percent sign;
+- the "l" and "h" modifiers for strings, where "{:tag/%hs}" means
+  locale-based string and "{:tag/%ls}" means a wide character string;
+- distinct encoding formats, where "{:tag/#%s/%s}" means the display
+  styles (text and HTML) will use "#%s" where other styles use "%s".
+
+If none of these features are in use by your code, then using the "_p"
+variants might be wise:
+
+================== ========================
+ Function           printf-like Equivalent
+================== ========================
+ xo_emit_hv         xo_emit_hvp
+ xo_emit_h          xo_emit_hp
+ xo_emit            xo_emit_p
+ xo_emit_warn_hcv   xo_emit_warn_hcvp
+ xo_emit_warn_hc    xo_emit_warn_hcp
+ xo_emit_warn_c     xo_emit_warn_cp
+ xo_emit_warn       xo_emit_warn_p
+ xo_emit_warnx      xo_emit_warnx_p
+ xo_emit_err        xo_emit_err_p
+ xo_emit_errx       xo_emit_errx_p
+ xo_emit_errc       xo_emit_errc_p
+================== ========================
+
+.. index:: performance
+.. index:: XOEF_RETAIN
+
+.. _retain:
+
+Retaining Parsed Format Information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+libxo can retain the parsed internal information related to the given
+format string, so that on subsequent xo_emit calls the retained
+information is used, avoiding repetitive parsing of the format string::
+
+    SYNTAX:
+      int xo_emit_f(xo_emit_flags_t flags, const char *fmt, ...);
+    EXAMPLE:
+      xo_emit_f(XOEF_RETAIN, "{:some/%02d}{:thing/%-6s}{:fancy}\n",
+                     some, thing, fancy);
+
+To retain parsed format information, use the XOEF_RETAIN flag to the
+xo_emit_f() function.  A complete set of xo_emit_f functions exists to
+match all the xo_emit function signatures (with handles, variadic
+arguments, and printf-like flags):
+
+================== ========================
+ Function           Flags Equivalent
+================== ========================
+ xo_emit_hv         xo_emit_hvf
+ xo_emit_h          xo_emit_hf
+ xo_emit            xo_emit_f
+ xo_emit_hvp        xo_emit_hvfp
+ xo_emit_hp         xo_emit_hfp
+ xo_emit_p          xo_emit_fp
+================== ========================
+
+The format string must be immutable across multiple calls to xo_emit_f(),
+since the library retains the string.  Typically this is done by using
+static constant strings, such as string literals.  If the string is not
+immutable, the XOEF_RETAIN flag must not be used.
+
+The functions xo_retain_clear() and xo_retain_clear_all() release
+internal information on either a single format string or all format
+strings, respectively.  Neither is required, but the library will
+retain this information until it is cleared or the process exits::
+
+    const char *fmt = "{:name} {:count/%d}\n";
+    for (i = 0; i < 1000; i++) {
+        xo_open_instance("item");
+        xo_emit_f(XOEF_RETAIN, fmt, name[i], count[i]);
+    }
+    xo_retain_clear(fmt);
+
+The retained information is kept as thread-specific data.
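
The immutability requirement can be illustrated with a toy model.  This
sketch assumes, purely for illustration, that retained data is looked
up by the address of the format string; it does not reproduce libxo's
implementation, and parsed_fmt, count_fields, fields_retained, and
retained_clear are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-entry cache keyed by the format string's address. */
struct parsed_fmt {
    const char *key;   /* address of the string that was parsed */
    int field_count;   /* number of '{' fields seen during the parse */
    int parse_calls;   /* how many times a full parse was needed */
};

/* Stand-in for real parsing: count the '{' characters. */
static int
count_fields(const char *fmt)
{
    int n = 0;

    for (; *fmt != '\0'; fmt++)
        if (*fmt == '{')
            n++;
    return n;
}

/* Return the field count, parsing only when the address is new. */
static int
fields_retained(struct parsed_fmt *cache, const char *fmt)
{
    if (cache->key != fmt) {
        cache->key = fmt;
        cache->field_count = count_fields(fmt);
        cache->parse_calls++;
    }
    return cache->field_count;
}

/* Forget the retained entry so the next call parses again. */
static void
retained_clear(struct parsed_fmt *cache)
{
    cache->key = NULL;
    cache->field_count = 0;
}
```

In this model, rewriting the buffer in place after the first call
leaves the address unchanged, so the stale field count is returned
until the entry is cleared.  That is the kind of hazard the
immutability rule guards against.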
+
+Example
+~~~~~~~
+
+In this example, the value for the number of items in stock is emitted::
+
+    xo_emit("{P: }{Lwc:In stock}{:in-stock/%u}\n",
+            instock);
+
+This call will generate the following output::
+
+  TEXT:
+      In stock: 144
+  XML:
+      <in-stock>144</in-stock>
+  JSON:
+      "in-stock": 144,
+  HTML:
+      <div class="line">
+        <div class="padding"> </div>
+        <div class="label">In stock</div>
+        <div class="decoration">:</div>
+        <div class="padding"> </div>
+        <div class="data" data-tag="in-stock">144</div>
+      </div>
+
+Clearly HTML wins the verbosity award, and this output does
+not include XOF_XPATH or XOF_INFO data, which would expand the
+penultimate line to::
+
+      <div class="data" data-tag="in-stock"
+         data-xpath="/top/data/item/in-stock"
+         data-type="number"
+         data-help="Number of items in stock">144</div>