path: root/doc/field-formatting.rst
Diffstat (limited to 'doc/field-formatting.rst')
-rw-r--r-- doc/field-formatting.rst 370
1 files changed, 370 insertions, 0 deletions
diff --git a/doc/field-formatting.rst b/doc/field-formatting.rst
new file mode 100644
index 0000000000000..2e2bd75dd1ca1
--- /dev/null
+++ b/doc/field-formatting.rst
@@ -0,0 +1,370 @@
+
+.. index:: Field Formatting
+
+Field Formatting
+----------------
+
+The field format is similar to the format string for printf(3). Its
+use varies based on the role of the field, but it is generally used to
+format the field's contents.
+
+If the format string is not provided for a value field, it defaults to
+"%s".
+
+Note that a field definition can contain zero or more printf-style
+'directives', which are sequences that start with a '%' and end with
+one of the following characters: "diouxXDOUeEfFgGaAcCsSp". Each directive
+is matched by one or more arguments to the xo_emit function.
+
+The format string has the form::
+
+ '%' format-modifier * format-character
+
+The format-modifier can be:
+
+- a '#' character, indicating the output value should be prefixed
+  with '0x', typically to indicate a base 16 (hex) value.
+- a minus sign ('-'), indicating the output value should be padded on
+  the right instead of the left.
+- a leading zero ('0'), indicating the output value should be padded on the
+  left with zeroes instead of spaces (' ').
+- one or more digits ('0' - '9'), indicating the minimum width of the
+  argument. If the width in columns of the output value is less than
+  the minimum width, the value will be padded to reach the minimum.
+- a period followed by one or more digits, indicating the maximum
+  number of bytes which will be examined for a string argument, or the maximum
+  width for a non-string argument. When handling ASCII strings, this
+  functions as the field width, but for multi-byte characters, a single
+  character may be composed of multiple bytes.
+  xo_emit will never dereference memory beyond the given number of bytes.
+- a second period followed by one or more digits, indicating the maximum
+  width for a string argument. This modifier cannot be given for non-string
+  arguments.
+- one or more 'h' characters, indicating shorter input data.
+- one or more 'l' characters, indicating longer input data.
+- a 'z' character, indicating a 'size_t' argument.
+- a 't' character, indicating a 'ptrdiff_t' argument.
+- a ' ' character, indicating a space should be emitted before
+  positive numbers.
+- a '+' character, indicating a sign should be emitted before any number.
+
+Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
+removed eventually.
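+
+Because the modifiers listed above follow printf(3) conventions, their
+effect can be previewed with plain snprintf(3). The following is a
+minimal sketch and not libxo code: show_modifiers is a hypothetical
+helper name, and the second-period (maximum column) modifier is
+specific to xo_emit, so it is not shown here:
+
```c
#include <stdio.h>

/*
 * Render the same values with several printf(3) modifiers, one per line.
 * snprintf is used as a stand-in for xo_emit; this is an illustration only.
 */
static int show_modifiers(char *out, size_t len)
{
    return snprintf(out, len,
                    "[%5d]\n"    /* minimum width 5, padded on the left   */
                    "[%-5d]\n"   /* '-' pads on the right instead         */
                    "[%05d]\n"   /* '0' pads with zeroes                  */
                    "[%#x]\n"    /* '#' adds the 0x prefix                */
                    "[%+d]\n"    /* '+' always emits a sign               */
                    "[% d]\n"    /* ' ' emits a space before positives    */
                    "[%.3s]\n",  /* precision limits the bytes examined   */
                    42, 42, 42, 255u, 42, 42, "abcdef");
}
```
+
+Calling show_modifiers() with a 128-byte buffer yields seven bracketed
+lines, one per modifier, which makes the padding and prefix behavior
+easy to compare side by side.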
+
+The format character is described in the following table:
+
+===== ================= ======================
+ Ltr  Argument Type     Format
+===== ================= ======================
+ d    int               base 10 (decimal)
+ i    int               base 10 (decimal)
+ o    unsigned          base 8 (octal)
+ u    unsigned          base 10 (decimal)
+ x    unsigned          base 16 (hex)
+ X    unsigned          base 16 (hex)
+ D    long              base 10 (decimal)
+ O    unsigned long     base 8 (octal)
+ U    unsigned long     base 10 (decimal)
+ e    double            [-]d.ddde+-dd
+ E    double            [-]d.dddE+-dd
+ f    double            [-]ddd.ddd
+ F    double            [-]ddd.ddd
+ g    double            as 'e' or 'f'
+ G    double            as 'E' or 'F'
+ a    double            [-]0xh.hhhp[+-]d
+ A    double            [-]0Xh.hhhp[+-]d
+ c    unsigned char     a character
+ C    wint_t            a character
+ s    char \*           a UTF-8 string
+ S    wchar_t \*        a unicode/WCS string
+ p    void \*           '%#lx'
+===== ================= ======================
+
+The 'h' and 'l' modifiers affect the size and treatment of the
+argument:
+
+===== ============= ====================
+ Mod  d, i          o, u, x, X
+===== ============= ====================
+ hh   signed char   unsigned char
+ h    short         unsigned short
+ l    long          unsigned long
+ ll   long long     unsigned long long
+ j    intmax_t      uintmax_t
+ t    ptrdiff_t     ptrdiff_t
+ z    size_t        size_t
+ q    quad_t        u_quad_t
+===== ============= ====================
+
+.. index:: UTF-8
+.. index:: Locale
+
+.. _utf-8:
+
+UTF-8 and Locale Strings
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+For strings, the 'h' and 'l' modifiers affect the interpretation of
+the bytes pointed to by the argument. The default '%s' string is a 'char \*'
+pointer to a string encoded as UTF-8. Since UTF-8 is compatible with
+ASCII data, a normal 7-bit ASCII string can be used. '%ls' expects a
+'wchar_t \*' pointer to a wide-character string, encoded as 32-bit
+Unicode values. '%hs' expects a 'char \*' pointer to a multi-byte
+string encoded with the current locale, as given by the LC_CTYPE,
+LANG, or LC_ALL environment variables. The first variable in this list
+that is set is used; if none of the variables are set, the locale
+defaults to "UTF-8".
+
+libxo will convert these arguments as needed to either UTF-8 (for XML,
+JSON, and HTML styles) or locale-based strings for display in text
+style::
+
+ xo_emit("All strings are utf-8 content {:tag/%ls}",
+ L"except for wide strings");
+
+======== ================== ===============================
+ Format  Argument Type      Argument Contents
+======== ================== ===============================
+ %s      const char \*      UTF-8 string
+ %S      const wchar_t \*   Wide character UNICODE string (alias for '%ls')
+ %ls     const wchar_t \*   Wide character UNICODE string
+ %hs     const char \*      locale-based string
+======== ================== ===============================
+
+.. admonition:: "Long", not "locale"
+
+ The "*l*" in "%ls" is for "*long*", following the convention of "%ld".
+ It is not "*locale*", a common mis-mnemonic. "%S" is equivalent to
+ "%ls".
+
+For example, the following function is passed a locale-based name, a
+hat size, and a time value. The hat size is formatted in a UTF-8
+(ASCII) string, and the time value is formatted into a wchar_t
+string::
+
+    void print_order (const char *name, int size,
+                      struct tm *timep) {
+        char buf[32];
+        const char *size_val = "unknown";
+
+        if (size > 0) {
+            snprintf(buf, sizeof(buf), "%d", size);
+            size_val = buf;
+        }
+
+        wchar_t when[32];
+        /* wcsftime takes the buffer size in wide characters, not bytes */
+        wcsftime(when, sizeof(when) / sizeof(when[0]), L"%d%b%y", timep);
+
+        xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
+                name, size_val);
+        xo_emit("It was ordered on {:order-time/%ls}.\n",
+                when);
+    }
+
+It is important to note that xo_emit will perform the conversion
+required to make appropriate output. Text style output uses the
+current locale (as described above), while XML, JSON, and HTML use
+UTF-8.
+
+UTF-8 and locale-encoded strings can use multiple bytes to encode one
+column of data. The traditional "precision" (aka "max-width") value
+for "%s" printf formatting becomes overloaded since it specifies both
+the number of bytes that can be safely referenced and the maximum
+number of columns to emit. xo_emit uses the precision as the former,
+and adds a third value for specifying the maximum number of columns.
+
+In this example, the name field is printed with a minimum of 3 columns
+and a maximum of 6. Up to ten bytes of data at the location given by
+'name' are used in filling those columns::
+
+ xo_emit("{:name/%3.10.6s}", name);
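+
+The gap between bytes and columns can be illustrated without libxo.
+The sketch below counts the characters found in at most a given number
+of bytes of a UTF-8 string. utf8_chars is a hypothetical helper, not a
+libxo function, and it only approximates column counting since it
+ignores double-width and combining characters:
+
```c
#include <stddef.h>

/*
 * Count the characters (code points) that start within the first
 * max_bytes bytes of a UTF-8 string. Continuation bytes have the bit
 * pattern 10xxxxxx and never start a new character, so they are skipped.
 */
static size_t utf8_chars(const char *s, size_t max_bytes)
{
    size_t n = 0;

    for (size_t i = 0; i < max_bytes && s[i] != '\0'; i++) {
        if (((unsigned char) s[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```
+
+For the string "caf\\xc3\\xa9" (the word "café" in UTF-8), strlen()
+reports 5 bytes while utf8_chars() reports 4 characters, which is why a
+single precision value cannot serve as both a byte limit and a column
+limit.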
+
+Characters Outside of Field Definitions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Characters in the format string that are not part of a field
+definition are copied to the output for the TEXT style, and are
+ignored for the JSON and XML styles. For HTML, these characters are
+placed in a <div> with class "text"::
+
+    EXAMPLE:
+        xo_emit("The hat is {:size/%s}.\n", size_val);
+    TEXT:
+        The hat is extra small.
+    XML:
+        <size>extra small</size>
+    JSON:
+        "size": "extra small"
+    HTML:
+        <div class="text">The hat is </div>
+        <div class="data" data-tag="size">extra small</div>
+        <div class="text">.</div>
+
+.. index:: errno
+
+"%m" Is Supported
+~~~~~~~~~~~~~~~~~
+
+libxo supports the '%m' directive, which formats the error message
+associated with the current value of "errno". It is the equivalent
+of "%s" with the argument strerror(errno)::
+
+ xo_emit("{:filename} cannot be opened: {:error/%m}", filename);
+ xo_emit("{:filename} cannot be opened: {:error/%s}",
+ filename, strerror(errno));
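+
+The equivalence can be checked with standard C alone. The sketch below
+builds the text that the "%s" form would carry, using strerror(3) on
+the current errno. format_open_error is a hypothetical helper and the
+code does not call libxo:
+
```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<filename> cannot be opened: <error text>" into out.
 * errno is saved first so that later calls cannot disturb the value
 * being reported.
 */
static int format_open_error(char *out, size_t len, const char *filename)
{
    int saved = errno;

    return snprintf(out, len, "%s cannot be opened: %s",
                    filename, strerror(saved));
}
```
+
+Saving errno before formatting is the main point of care with either
+form: any library call made between the failure and the message can
+overwrite it.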
+
+"%n" Is Not Supported
+~~~~~~~~~~~~~~~~~~~~~
+
+libxo does not support the '%n' directive. It's a bad idea and we
+just don't do it.
+
+The Encoding Format (eformat)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The "eformat" string is the format string used when encoding the field
+for JSON and XML. If not provided, it defaults to the primary format
+with any minimum width removed. If the primary is not given, both
+default to "%s".
+
+Content Strings
+~~~~~~~~~~~~~~~
+
+For padding and labels, the content string is considered the content,
+unless a format is given.
+
+.. index:: printf-like
+
+Argument Validation
+~~~~~~~~~~~~~~~~~~~
+
+Many compilers and tool chains support validation of printf-like
+arguments. When the format string fails to match the argument list,
+a warning is generated. This is a valuable feature and while the
+formatting strings for libxo differ considerably from printf, many of
+these checks can still provide build-time protection against bugs.
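+
+As an illustration of how such validation is commonly requested, GCC
+and Clang accept a "format" function attribute. The sketch below
+applies it to a hypothetical wrapper named my_format; the macro name
+MY_PRINTFLIKE is also an assumption and neither is part of libxo:
+
```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Tell GCC-compatible compilers that argument `f` is a printf-style
 * format string and that the variadic arguments start at argument `a`.
 * Other compilers see an empty macro and skip the check.
 */
#if defined(__GNUC__)
#define MY_PRINTFLIKE(f, a) __attribute__((format(printf, f, a)))
#else
#define MY_PRINTFLIKE(f, a)
#endif

static int my_format(char *out, size_t len, const char *fmt, ...)
    MY_PRINTFLIKE(3, 4);

/* Thin wrapper around vsnprintf; the attribute above enables checking. */
static int my_format(char *out, size_t len, const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = vsnprintf(out, len, fmt, ap);
    va_end(ap);
    return rc;
}
```
+
+With the attribute in place, a call that passes a string where the
+format asks for "%d" is typically reported when building with -Wall,
+rather than misbehaving at run time.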
+
+libxo provides variants of functions that provide this ability, if the
+"--enable-printflike" option is passed to the "configure" script.
+These functions use the "_p" suffix, like "xo_emit_p()",
+"xo_emit_hp()", etc.
+
+The following are features of libxo formatting strings that are
+incompatible with printf-like testing:
+
+- implicit formats, where "{:tag}" has an implicit "%s";
+- the "max" parameter for strings, where "{:tag/%4.10.6s}" means up to
+ ten bytes of data can be inspected to fill a minimum of 4 columns and
+ a maximum of 6;
+- percent signs in strings, where "{:filled}%" makes a single,
+ trailing percent sign;
+- the "l" and "h" modifiers for strings, where "{:tag/%hs}" means
+ locale-based string and "{:tag/%ls}" means a wide character string;
+- distinct encoding formats, where "{:tag/#%s/%s}" means the display
+  styles (text and HTML) will use "#%s" where other styles use "%s".
+
+If none of these features are in use by your code, then using the "_p"
+variants might be wise:
+
+================== ========================
+ Function          printf-like Equivalent
+================== ========================
+ xo_emit_hv        xo_emit_hvp
+ xo_emit_h         xo_emit_hp
+ xo_emit           xo_emit_p
+ xo_emit_warn_hcv  xo_emit_warn_hcvp
+ xo_emit_warn_hc   xo_emit_warn_hcp
+ xo_emit_warn_c    xo_emit_warn_cp
+ xo_emit_warn      xo_emit_warn_p
+ xo_emit_warnx     xo_emit_warnx_p
+ xo_emit_err       xo_emit_err_p
+ xo_emit_errx      xo_emit_errx_p
+ xo_emit_errc      xo_emit_errc_p
+================== ========================
+
+.. index:: performance
+.. index:: XOEF_RETAIN
+
+.. _retain:
+
+Retaining Parsed Format Information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+libxo can retain the parsed internal information related to the given
+format string, allowing subsequent xo_emit calls to reuse the retained
+information and avoid repeatedly parsing the format string::
+
+    SYNTAX:
+        int xo_emit_f(xo_emit_flags_t flags, const char *fmt, ...);
+    EXAMPLE:
+        xo_emit_f(XOEF_RETAIN, "{:some/%02d}{:thing/%-6s}{:fancy}\n",
+                  some, thing, fancy);
+
+To retain parsed format information, use the XOEF_RETAIN flag to the
+xo_emit_f() function. A complete set of xo_emit_f functions exists to
+match all the xo_emit function signatures (with handles, variadic
+arguments, and printf-like flags):
+
+================== ========================
+ Function          Flags Equivalent
+================== ========================
+ xo_emit_hv        xo_emit_hvf
+ xo_emit_h         xo_emit_hf
+ xo_emit           xo_emit_f
+ xo_emit_hvp       xo_emit_hvfp
+ xo_emit_hp        xo_emit_hfp
+ xo_emit_p         xo_emit_fp
+================== ========================
+
+The format string must be immutable across multiple calls to xo_emit_f(),
+since the library retains the string. Typically this is done by using
+static constant strings, such as string literals. If the string is not
+immutable, the XOEF_RETAIN flag must not be used.
+
+The functions xo_retain_clear() and xo_retain_clear_all() release
+internal information on either a single format string or all format
+strings, respectively. Neither is required, but the library will
+retain this information until it is cleared or the process exits::
+
+    const char *fmt = "{:name} {:count/%d}\n";
+    for (i = 0; i < 1000; i++) {
+        xo_open_instance("item");
+        xo_emit_f(XOEF_RETAIN, fmt, name[i], count[i]);
+        xo_close_instance("item");
+    }
+    xo_retain_clear(fmt);
+
+The retained information is kept as thread-specific data.
+
+Example
+~~~~~~~
+
+In this example, the value for the number of items in stock is emitted::
+
+ xo_emit("{P: }{Lwc:In stock}{:in-stock/%u}\n",
+ instock);
+
+This call will generate the following output::
+
+    TEXT:
+        In stock: 144
+    XML:
+        <in-stock>144</in-stock>
+    JSON:
+        "in-stock": 144,
+    HTML:
+        <div class="line">
+          <div class="padding"> </div>
+          <div class="label">In stock</div>
+          <div class="decoration">:</div>
+          <div class="padding"> </div>
+          <div class="data" data-tag="in-stock">144</div>
+        </div>
+
+Clearly HTML wins the verbosity award, and this output does
+not include XOF_XPATH or XOF_INFO data, which would expand the
+penultimate line to::
+
+ <div class="data" data-tag="in-stock"
+ data-xpath="/top/data/item/in-stock"
+ data-type="number"
+ data-help="Number of items in stock">144</div>