1 files changed, 370 insertions, 0 deletions
diff --git a/doc/field-formatting.rst b/doc/field-formatting.rst
new file mode 100644
index 0000000000000..2e2bd75dd1ca1
--- /dev/null
+++ b/doc/field-formatting.rst
@@ -0,0 +1,370 @@
+
+.. index:: Field Formatting
+
+Field Formatting
+----------------
+
+The field format is similar to the format string for printf(3).  Its
+use varies based on the role of the field, but generally is used to
+format the field's contents.
+
+If the format string is not provided for a value field, it defaults to
+"%s".
+
+Note a field definition can contain zero or more printf-style
+'directives', which are sequences that start with a '%' and end with
+one of following characters: "diouxXDOUeEfFgGaAcCsSp".  Each directive
+is matched by one of more arguments to the xo_emit function.
+
+The format string has the form::
+
+  '%' format-modifier * format-character
+
+The format-modifier can be:
+
+- a '#' character, indicating the output value should be prefixed
+  with '0x', typically to indicate a base 16 (hex) value.
+- a minus sign ('-'), indicating the output value should be padded on
+  the right instead of the left.
+- a leading zero ('0') indicating the output value should be padded on the
+  left with zeroes instead of spaces (' ').
+- one or more digits ('0' - '9') indicating the minimum width of the
+  argument.  If the width in columns of the output value is less than
+  the minimum width, the value will be padded to reach the minimum.
+- a period followed by one or more digits indicating the maximum
+  number of bytes which will be examined for a string argument, or the maximum
+  width for a non-string argument.  When handling ASCII strings this
+  functions as the field width but for multi-byte characters, a single
+  character may be composed of multiple bytes.
+  xo_emit will never dereference memory beyond the given number of bytes.
+- a second period followed by one or more digits indicating the maximum
+  width for a string argument.  This modifier cannot be given for non-string
+  arguments.
+- one or more 'h' characters, indicating shorter input data.
+- one or more 'l' characters, indicating longer input data.
+- a 'z' character, indicating a 'size_t' argument.
+- a 't' character, indicating a 'ptrdiff_t' argument.
+- a ' ' character, indicating a space should be emitted before
+  positive numbers.
+- a '+' character, indicating sign should emitted before any number.
+
+Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
+removed eventually.
+
+The format character is described in the following table:
+
+===== ================= ======================
+ Ltr   Argument Type     Format
+===== ================= ======================
+ d     int               base 10 (decimal)
+ i     int               base 10 (decimal)
+ o     int               base 8 (octal)
+ u     unsigned          base 10 (decimal)
+ x     unsigned          base 16 (hex)
+ X     unsigned long     base 16 (hex)
+ D     long              base 10 (decimal)
+ O     unsigned long     base 8 (octal)
+ U     unsigned long     base 10 (decimal)
+ e     double            [-]d.ddde+-dd
+ E     double            [-]d.dddE+-dd
+ f     double            [-]ddd.ddd
+ F     double            [-]ddd.ddd
+ g     double            as 'e' or 'f'
+ G     double            as 'E' or 'F'
+ a     double            [-]0xh.hhhp[+-]d
+ A     double            [-]0Xh.hhhp[+-]d
+ c     unsigned char     a character
+ C     wint_t            a character
+ s     char \*           a UTF-8 string
+ S     wchar_t \*        a unicode/WCS string
+ p     void \*           '%#lx'
+===== ================= ======================
+
+The 'h' and 'l' modifiers affect the size and treatment of the
+argument:
+
+===== ============= ====================
+ Mod   d, i          o, u, x, X
+===== ============= ====================
+ hh    signed char   unsigned char
+ h     short         unsigned short
+ l     long          unsigned long
+ ll    long long     unsigned long long
+ j     intmax_t      uintmax_t
+ t     ptrdiff_t     ptrdiff_t
+ z     size_t        size_t
+ q     quad_t        u_quad_t
+===== ============= ====================
+
+.. index:: UTF-8
+.. index:: Locale
+
+.. _utf-8:
+
+UTF-8 and Locale Strings
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+For strings, the 'h' and 'l' modifiers affect the interpretation of
+the bytes pointed to argument.  The default '%s' string is a 'char \*'
+pointer to a string encoded as UTF-8.  Since UTF-8 is compatible with
+ASCII data, a normal 7-bit ASCII string can be used.  '%ls' expects a
+'wchar_t \*' pointer to a wide-character string, encoded as a 32-bit
+Unicode values.  '%hs' expects a 'char \*' pointer to a multi-byte
+string encoded with the current locale, as given by the LC_CTYPE,
+LANG, or LC_ALL environment varibles.  The first of this list of
+variables is used and if none of the variables are set, the locale
+defaults to "UTF-8".
+
+libxo will convert these arguments as needed to either UTF-8 (for XML,
+JSON, and HTML styles) or locale-based strings for display in text
+style::
+
+   xo_emit("All strings are utf-8 content {:tag/%ls}",
+           L"except for wide strings");
+
+======== ================== ===============================
+ Format   Argument Type      Argument Contents
+======== ================== ===============================
+ %s       const char \*      UTF-8 string
+ %S       const char \*      UTF-8 string (alias for '%ls')
+ %ls      const wchar_t \*   Wide character UNICODE string
+ %hs      const char *       locale-based string
+======== ================== ===============================
+
+.. admonition:: "Long", not "locale"
+
+  The "*l*" in "%ls" is for "*long*", following the convention of "%ld".
+  It is not "*locale*", a common mis-mnemonic.  "%S" is equivalent to
+  "%ls".
+
+For example, the following function is passed a locale-base name, a
+hat size, and a time value.  The hat size is formatted in a UTF-8
+(ASCII) string, and the time value is formatted into a wchar_t
+string::
+
+    void print_order (const char *name, int size,
+                      struct tm *timep) {
+        char buf[32];
+        const char *size_val = "unknown";
+
+	if (size > 0)
+            snprintf(buf, sizeof(buf), "%d", size);
+            size_val = buf;
+        }
+
+        wchar_t when[32];
+        wcsftime(when, sizeof(when), L"%d%b%y", timep);
+
+        xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
+                name, size_val);
+        xo_emit("It was ordered on {:order-time/%ls}.\n",
+                when);
+    }
+
+It is important to note that xo_emit will perform the conversion
+required to make appropriate output.  Text style output uses the
+current locale (as described above), while XML, JSON, and HTML use
+UTF-8.
+
+UTF-8 and locale-encoded strings can use multiple bytes to encode one
+column of data.  The traditional "precision'" (aka "max-width") value
+for "%s" printf formatting becomes overloaded since it specifies both
+the number of bytes that can be safely referenced and the maximum
+number of columns to emit.  xo_emit uses the precision as the former,
+and adds a third value for specifying the maximum number of columns.
+
+In this example, the name field is printed with a minimum of 3 columns
+and a maximum of 6.  Up to ten bytes of data at the location given by
+'name' are in used in filling those columns::
+
+    xo_emit("{:name/%3.10.6s}", name);
+
+Characters Outside of Field Definitions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Characters in the format string that are not part of a field
+definition are copied to the output for the TEXT style, and are
+ignored for the JSON and XML styles.  For HTML, these characters are
+placed in a <div> with class "text"::
+
+  EXAMPLE:
+      xo_emit("The hat is {:size/%s}.\n", size_val);
+  TEXT:
+      The hat is extra small.
+  XML:
+      <size>extra small</size>
+  JSON:
+      "size": "extra small"
+  HTML:
+      <div class="text">The hat is </div>
+      <div class="data" data-tag="size">extra small</div>
+      <div class="text">.</div>
+
+.. index:: errno
+
+"%m" Is Supported
+~~~~~~~~~~~~~~~~~
+
+libxo supports the '%m' directive, which formats the error message
+associated with the current value of "errno".  It is the equivalent
+of "%s" with the argument strerror(errno)::
+
+    xo_emit("{:filename} cannot be opened: {:error/%m}", filename);
+    xo_emit("{:filename} cannot be opened: {:error/%s}",
+            filename, strerror(errno));
+
+"%n" Is Not Supported
+~~~~~~~~~~~~~~~~~~~~~
+
+libxo does not support the '%n' directive.  It's a bad idea and we
+just don't do it.
+
+The Encoding Format (eformat)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The "eformat" string is the format string used when encoding the field
+for JSON and XML.  If not provided, it defaults to the primary format
+with any minimum width removed.  If the primary is not given, both
+default to "%s".
+
+Content Strings
+~~~~~~~~~~~~~~~
+
+For padding and labels, the content string is considered the content,
+unless a format is given.
+
+.. index:: printf-like
+
+Argument Validation
+~~~~~~~~~~~~~~~~~~~
+
+Many compilers and tool chains support validation of printf-like
+arguments.  When the format string fails to match the argument list,
+a warning is generated.  This is a valuable feature and while the
+formatting strings for libxo differ considerably from printf, many of
+these checks can still provide build-time protection against bugs.
+
+libxo provide variants of functions that provide this ability, if the
+"--enable-printflike" option is passed to the "configure" script.
+These functions use the "_p" suffix, like "xo_emit_p()",
+xo_emit_hp()", etc.
+
+The following are features of libxo formatting strings that are
+incompatible with printf-like testing:
+
+- implicit formats, where "{:tag}" has an implicit "%s";
+- the "max" parameter for strings, where "{:tag/%4.10.6s}" means up to
+  ten bytes of data can be inspected to fill a minimum of 4 columns and
+  a maximum of 6;
+- percent signs in strings, where "{:filled}%" makes a single,
+  trailing percent sign;
+- the "l" and "h" modifiers for strings, where "{:tag/%hs}" means
+  locale-based string and "{:tag/%ls}" means a wide character string;
+- distinct encoding formats, where "{:tag/#%s/%s}" means the display
+  styles (text and HTML) will use "#%s" where other styles use "%s";
+
+If none of these features are in use by your code, then using the "_p"
+variants might be wise:
+
+================== ========================
+ Function           printf-like Equivalent
+================== ========================
+ xo_emit_hv         xo_emit_hvp
+ xo_emit_h          xo_emit_hp
+ xo_emit            xo_emit_p
+ xo_emit_warn_hcv   xo_emit_warn_hcvp
+ xo_emit_warn_hc    xo_emit_warn_hcp
+ xo_emit_warn_c     xo_emit_warn_cp
+ xo_emit_warn       xo_emit_warn_p
+ xo_emit_warnx      xo_emit_warnx_p
+ xo_emit_err        xo_emit_err_p
+ xo_emit_errx       xo_emit_errx_p
+ xo_emit_errc       xo_emit_errc_p
+================== ========================
+
+.. index:: performance
+.. index:: XOEF_RETAIN
+
+.. _retain:
+
+Retaining Parsed Format Information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+libxo can retain the parsed internal information related to the given
+format string, allowing subsequent xo_emit calls, the retained
+information is used, avoiding repetitive parsing of the format string::
+
+    SYNTAX:
+      int xo_emit_f(xo_emit_flags_t flags, const char fmt, ...);
+    EXAMPLE:
+      xo_emit_f(XOEF_RETAIN, "{:some/%02d}{:thing/%-6s}{:fancy}\n",
+                     some, thing, fancy);
+
+To retain parsed format information, use the XOEF_RETAIN flag to the
+xo_emit_f() function.  A complete set of xo_emit_f functions exist to
+match all the xo_emit function signatures (with handles, varadic
+argument, and printf-like flags):
+
+================== ========================
+ Function           Flags Equivalent
+================== ========================
+ xo_emit_hv         xo_emit_hvf
+ xo_emit_h          xo_emit_hf
+ xo_emit            xo_emit_f
+ xo_emit_hvp        xo_emit_hvfp
+ xo_emit_hp         xo_emit_hfp
+ xo_emit_p          xo_emit_fp
+================== ========================
+
+The format string must be immutable across multiple calls to xo_emit_f(),
+since the library retains the string.  Typically this is done by using
+static constant strings, such as string literals. If the string is not
+immutable, the XOEF_RETAIN flag must not be used.
+
+The functions xo_retain_clear() and xo_retain_clear_all() release
+internal information on either a single format string or all format
+strings, respectively.  Neither is required, but the library will
+retain this information until it is cleared or the process exits::
+
+    const char *fmt = "{:name}  {:count/%d}\n";
+    for (i = 0; i < 1000; i++) {
+        xo_open_instance("item");
+        xo_emit_f(XOEF_RETAIN, fmt, name[i], count[i]);
+    }
+    xo_retain_clear(fmt);
+
+The retained information is kept as thread-specific data.
+
+Example
+~~~~~~~
+
+In this example, the value for the number of items in stock is emitted::
+
+        xo_emit("{P:   }{Lwc:In stock}{:in-stock/%u}\n",
+                instock);
+
+This call will generate the following output::
+
+  TEXT:
+       In stock: 144
+  XML:
+      <in-stock>144</in-stock>
+  JSON:
+      "in-stock": 144,
+  HTML:
+      <div class="line">
+        <div class="padding">   </div>
+        <div class="label">In stock</div>
+        <div class="decoration">:</div>
+        <div class="padding"> </div>
+        <div class="data" data-tag="in-stock">144</div>
+      </div>
+
+Clearly HTML wins the verbosity award, and this output does
+not include XOF_XPATH or XOF_INFO data, which would expand the
+penultimate line to::
+
+       <div class="data" data-tag="in-stock"
+          data-xpath="/top/data/item/in-stock"
+          data-type="number"
+          data-help="Number of items in stock">144</div>