path: root/lib/libc/locale/utf8.c
Commit message  [Author, Age, Files, Lines]
* Remove $FreeBSD$: one-line .c pattern  [Warner Losh, 2023-08-16, 1 file, -2/+0]
|   Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
* spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD  [Warner Losh, 2023-05-12, 1 file, -1/+1]
|   The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
|   up to that fact and revert to their recommended match of BSD-2-Clause.
|
|   Discussed with: pfg
|   MFC After: 3 days
|   Sponsored by: Netflix
* libc: drop "All rights reserved" from Foundation copyrights  [Ed Maste, 2022-08-04, 1 file, -1/+1]
|   This has already been done for most files that have the Foundation as
|   the only listed copyright holder. Do it now for files that list
|   multiple copyright holders, but have the Foundation copyright in its
|   own section.
|
|   Sponsored by: The FreeBSD Foundation
* libc: further adoption of SPDX licensing ID tags.  [Pedro F. Giffuni, 2017-11-25, 1 file, -0/+2]
|   Mainly focus on files that use BSD 2-Clause license, however the tool I
|   was using mis-identified many licenses so this was mostly a manual -
|   error prone - task.
|
|   The Software Package Data Exchange (SPDX) group provides a
|   specification to make it easier for automated tools to detect and
|   summarize well known open source licenses. We are gradually adopting
|   the specification, noting that the tags are considered only advisory
|   and do not, in any way, supersede or replace the license texts.
|
|   Notes: svn path=/head/; revision=326193
* Merge from HEAD  [Baptiste Daroussin, 2015-08-25, 1 file, -1/+5]
|\
| |   Notes: svn path=/projects/collation/; revision=287142
| * Make UTF-8 parsing and generation more strict.  [Ed Schouten, 2015-08-25, 1 file, -1/+5]
| |   - In mbrtowc() we need to disallow codepoints above 0x10ffff.
| |   - In wcrtomb() we need to disallow codepoints between 0xd800 and 0xdfff.
| |
| |   Reviewed by: bapt
| |   Differential Revision: https://reviews.freebsd.org/D3399
| |
| |   Notes: svn path=/head/; revision=287125
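The two range checks named in this commit can be sketched on the encoding side as follows. This is a minimal illustration, not the code in utf8.c: the helper names `utf8_valid_scalar` and `utf8_encode` are hypothetical, and the real wcrtomb() reports errors through errno and its return value rather than returning 0.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if cp may be carried by UTF-8,
 * 0 if it is above 0x10ffff or inside the surrogate range.
 */
static int
utf8_valid_scalar(uint32_t cp)
{
	if (cp > 0x10ffff)
		return (0);
	if (cp >= 0xd800 && cp <= 0xdfff)
		return (0);
	return (1);
}

/*
 * Hypothetical encoder: writes cp into buf (at least 4 bytes) and
 * returns the number of bytes written, or 0 if cp is rejected.
 */
static size_t
utf8_encode(uint32_t cp, unsigned char *buf)
{
	if (!utf8_valid_scalar(cp))
		return (0);
	if (cp < 0x80) {
		buf[0] = (unsigned char)cp;
		return (1);
	}
	if (cp < 0x800) {
		buf[0] = (unsigned char)(0xc0 | (cp >> 6));
		buf[1] = (unsigned char)(0x80 | (cp & 0x3f));
		return (2);
	}
	if (cp < 0x10000) {
		buf[0] = (unsigned char)(0xe0 | (cp >> 12));
		buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		buf[2] = (unsigned char)(0x80 | (cp & 0x3f));
		return (3);
	}
	buf[0] = (unsigned char)(0xf0 | (cp >> 18));
	buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
	buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
	buf[3] = (unsigned char)(0x80 | (cp & 0x3f));
	return (4);
}
```

Under these assumptions, `utf8_encode(0xd800, buf)` and `utf8_encode(0x110000, buf)` both return 0, while U+20AC encodes to the three bytes e2 82 ac.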
* | Readd checking utf16 surrogates that are invalid in utf8  [Baptiste Daroussin, 2015-08-09, 1 file, -0/+7]
| |   Notes: svn path=/projects/collation/; revision=286518
* | Merge from HEAD  [Baptiste Daroussin, 2015-08-09, 1 file, -22/+1]
|\|
| |   Notes: svn path=/projects/collation/; revision=286492
| * Remove 5 and 6 bytes sequences which are illegal in UTF-8 space. (part2)  [Baptiste Daroussin, 2015-08-09, 1 file, -7/+1]
| |   Per RFC 3629, values greater than 0x10ffff should be rejected.
| |
| |   Suggested by: jilles
| |
| |   Notes: svn path=/head/; revision=286491
| * Remove 5 and 6 bytes sequences which are illegal in UTF-8 space.  [Baptiste Daroussin, 2015-08-08, 1 file, -8/+0]
| |   Per RFC 3629, values greater than 0x10ffff should be rejected.
| |
| |   Suggested by: jilles
| |
| |   Notes: svn path=/head/; revision=286490
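Dropping 5- and 6-byte sequences amounts to refusing lead bytes 0xf8 and above. A rough sketch of that classification, assuming a hypothetical helper `utf8_seq_len` (the real decoder is structured differently and also tracks a lower bound per length):

```c
/*
 * Hypothetical helper: map a lead byte to the total length of the
 * sequence it introduces, or -1 if the byte cannot start a sequence.
 * Continuation bytes (0x80..0xbf) and the former 5- and 6-byte lead
 * bytes (0xf8..0xfd), as well as 0xfe and 0xff, are all rejected.
 *
 * Note: 0xf5..0xf7 still classify as 4-byte leads here; the values
 * they decode to exceed 0x10ffff and need a separate range check.
 * Overlong forms (for example lead bytes 0xc0 and 0xc1) are likewise
 * left to a later check.
 */
static int
utf8_seq_len(unsigned char lead)
{
	if ((lead & 0x80) == 0x00)
		return (1);
	if ((lead & 0xe0) == 0xc0)
		return (2);
	if ((lead & 0xf0) == 0xe0)
		return (3);
	if ((lead & 0xf8) == 0xf0)
		return (4);
	return (-1);
}
```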
* | Revamp CTYPE support (from Illumos & Dragonfly)  [Baptiste Daroussin, 2015-08-08, 1 file, -8/+10]
|/
|   Obtained from: Dragonfly
|
|   Notes: svn path=/projects/collation/; revision=286459
* minor perf enhancement for UTF-8  [Pedro F. Giffuni, 2014-07-04, 1 file, -19/+10]
|   Reduce some duplicate code.
|
|   Reference: https://www.illumos.org/issues/628
|
|   Obtained from: Illumos
|   MFC after: 1 week
|
|   Notes: svn path=/head/; revision=268272
* citrus: Avoid invalid code points.  [Pedro F. Giffuni, 2014-05-01, 1 file, -2/+1]
|   From the OpenBSD log:
|   The UTF-8 decoder should not accept byte sequences which decode to
|   unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE,
|   and U+FFFF.
|
|   http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
|   http://unicode.org/faq/utf_bom.html#utf8-4
|
|   Reported by: Stefan Sperling
|   Obtained from: OpenBSD
|   MFC after: 5 days
|
|   Notes: svn path=/head/; revision=265167
* citrus: Avoid invalid code points.  [Pedro F. Giffuni, 2014-04-29, 1 file, -0/+8]
|   From the OpenBSD log:
|   The UTF-8 decoder should not accept byte sequences which decode to
|   unicode code positions U+D800 to U+DFFF (UTF-16 surrogates), U+FFFE,
|   and U+FFFF.
|
|   http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
|   http://unicode.org/faq/utf_bom.html#utf8-4
|
|   Reported by: Stefan Sperling
|   Obtained from: OpenBSD
|   MFC after: 5 days
|
|   Notes: svn path=/head/; revision=265095
* Implement xlocale APIs from Darwin, mainly for use by libc++. This adds a  [David Chisnall, 2011-11-20, 1 file, -9/+14]
|   load of _l suffixed versions of various standard library functions that
|   use the global locale, making them take an explicit locale parameter.
|   Also adds support for per-thread locales. This work was funded by the
|   FreeBSD Foundation.
|
|   Please test any code you have that uses the C standard locale functions!
|
|   Reviewed by: das (gdtoa changes)
|   Approved by: dim (mentor)
|
|   Notes: svn path=/head/; revision=227753
* Add comment explaining __mb_sb_limit trick here.  [Andrey A. Chernov, 2007-10-15, 1 file, -0/+5]
|   Notes: svn path=/head/; revision=172661
* The problem is: currently our single byte ctype(3) functions are broken  [Andrey A. Chernov, 2007-10-13, 1 file, -0/+3]
|   for wide character locales in the argument range >= 0x80 - they may
|   return false positives.
|
|   Example 1: for UTF-8 locale we currently have:
|       iswspace(0xA0)==1 and isspace(0xA0)==1
|   (because iswspace() and isspace() are the same code)
|   but must have
|       iswspace(0xA0)==1 and isspace(0xA0)==0
|   (because there is no such character and all others in the range
|   0x80..0xff for the UTF-8 locale, it keeps ASCII only in the single byte
|   range because our internal wchar_t representation for UTF-8 is UCS-4).
|
|   Example 2: for all wide character locales isalpha(arg) when arg > 0xFF
|   may return false positives (must be 0).
|   (because iswalpha() and isalpha() are the same code)
|
|   This change addresses this issue by separating single byte and wide
|   ctype, and also fixes iswascii() (currently iswascii() is broken for
|   arguments > 0xFF).
|
|   This change is 100% binary compatible with old binaries.
|
|   Reviewed by: i18n@
|
|   Notes: svn path=/head/; revision=172619
* Fix a bug where, for 6-byte sequences, the top 6 bits get compared to  [Tom Rhodes, 2006-03-30, 1 file, -1/+1]
|   111111 rather than the top 7 bits being compared against 1111110 causing
|   illegal bytes fe and ff being treated the same as legal bytes fc and fd.
|
|   Notes: svn path=/head/; revision=157289
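The difference between the two masks is easy to see in isolation. The functions below are a reconstruction from the commit message only, with hypothetical names; the actual one-line diff is not shown in this log. Six-byte sequences were removed entirely by the later 2015 commits, so this is of historical interest.

```c
/*
 * Buggy form as described: compares the top 6 bits to 111111, which
 * matches 0xfc, 0xfd, 0xfe and 0xff alike.
 */
static int
six_byte_lead_buggy(unsigned char ch)
{
	return ((ch & 0xfc) == 0xfc);
}

/*
 * Fixed form as described: compares the top 7 bits to 1111110, which
 * matches only 0xfc and 0xfd.
 */
static int
six_byte_lead_fixed(unsigned char ch)
{
	return ((ch & 0xfe) == 0xfc);
}
```

With the buggy mask, 0xfe and 0xff are accepted as lead bytes; with the fixed mask they are not.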
* . Static'ize functions exported via function reference variables only.  [Alexey Zelkin, 2005-02-27, 1 file, -13/+15]
|   . Replace inclusion of sys/param.h with sys/cdefs.h and sys/types.h
|     where appropriate.
|   . move _*_init() prototypes to mblocal.h, and remove these prototypes
|     from .c files
|   . use _none_init() in __setrunelocale() instead of duplicating code
|   . move __mb* variables from table.c to none.c allowing us not to
|     export _none_*() externs, and appropriately remove them from
|     mblocal.h
|
|   Ok'ed by: tjr
|
|   Notes: svn path=/head/; revision=142654
* Fix comparisons that test if an unsigned value is < 0.  [Stefan Farfeleder, 2005-02-12, 1 file, -2/+2]
|   Reviewed by: tjr
|
|   Notes: svn path=/head/; revision=141716
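The log does not show which comparisons were changed, so the following only illustrates the general class of bug, using a hypothetical helper `conversion_failed`. The multibyte conversion functions return a size_t and signal failure with (size_t)-1; because size_t is unsigned, a test of the form `r < 0` can never be true, and the error must be detected by comparing against the sentinel instead.

```c
#include <stddef.h>

/*
 * Hypothetical helper: detect the (size_t)-1 error return used by
 * functions such as mbrtowc() and wcrtomb(). Writing "r < 0" here
 * would always evaluate to false, because r is unsigned.
 */
static int
conversion_failed(size_t r)
{
	return (r == (size_t)-1);
}
```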
* Add UTF-8-specific implementations of mbsnrtowcs() and wcsnrtombs().  [Tim J. Robbins, 2004-07-27, 1 file, -0/+163]
|   These convert plain ASCII characters in-line, making them only slightly
|   slower than the single-byte ("NONE" encoding) version when processing
|   ASCII strings.
|
|   Notes: svn path=/head/; revision=132687
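The idea of converting ASCII in-line can be sketched as a loop that copies bytes below 0x80 directly and stops at the first byte that needs the general decoder. The function name `ascii_prefix_to_wide` is hypothetical, and this is not the libc implementation, which interleaves the fast path with full multibyte decoding and state handling.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: copy the leading run of plain ASCII bytes from
 * src (at most n bytes) into dst as wide characters. Stops at the
 * first NUL or non-ASCII byte and returns the number of characters
 * converted; the caller would hand the remainder to a full decoder.
 */
static size_t
ascii_prefix_to_wide(const char *src, size_t n, wchar_t *dst)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned char b = (unsigned char)src[i];

		if (b == 0 || b >= 0x80)
			break;
		dst[i] = (wchar_t)b;
	}
	return (i);
}
```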
* Add fast paths for conversion of plain ASCII characters.  [Tim J. Robbins, 2004-07-09, 1 file, -0/+13]
|   Notes: svn path=/head/; revision=131881
* Use conversion state objects to store the accumulated wide character,  [Tim J. Robbins, 2004-05-17, 1 file, -63/+67]
|   low bound, and the number of bytes remaining instead of storing the raw
|   byte sequence and deriving them every time mbrtowc() is called. This is
|   much faster -- about twice as fast in some crude benchmarks.
|
|   Notes: svn path=/head/; revision=129336
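A state object holding those three fields might look like the sketch below. The type `sketch_state` and function `sketch_feed` are hypothetical and feed one byte at a time for clarity; the real mbrtowc() consumes a buffer and uses its own internal state layout. The validity checks reflect the later, stricter rules (no overlong forms, no surrogates, nothing above 0x10ffff).

```c
#include <stdint.h>

/* Hypothetical conversion state: what has been decoded so far. */
typedef struct {
	uint32_t ch;      /* accumulated code point bits */
	uint32_t lobound; /* smallest value legal for this sequence length */
	int      want;    /* continuation bytes still expected; 0 = initial */
} sketch_state;

/*
 * Feed one byte. Returns 1 when *out holds a complete code point,
 * 0 when more bytes are needed, and -1 on malformed input.
 */
static int
sketch_feed(sketch_state *st, unsigned char b, uint32_t *out)
{
	if (st->want == 0) {
		if (b < 0x80) {
			*out = b;
			return (1);
		}
		if ((b & 0xe0) == 0xc0) {
			st->ch = b & 0x1f; st->want = 1; st->lobound = 0x80;
		} else if ((b & 0xf0) == 0xe0) {
			st->ch = b & 0x0f; st->want = 2; st->lobound = 0x800;
		} else if ((b & 0xf8) == 0xf0) {
			st->ch = b & 0x07; st->want = 3; st->lobound = 0x10000;
		} else
			return (-1);
		return (0);
	}
	if ((b & 0xc0) != 0x80) {
		st->want = 0;	/* reset so the caller can resynchronize */
		return (-1);
	}
	st->ch = (st->ch << 6) | (b & 0x3f);
	if (--st->want > 0)
		return (0);
	/* Sequence complete: reject overlong forms and illegal values. */
	if (st->ch < st->lobound || st->ch > 0x10ffff ||
	    (st->ch >= 0xd800 && st->ch <= 0xdfff))
		return (-1);
	*out = st->ch;
	return (1);
}
```

Because the partially decoded value lives in the state, nothing has to be re-derived from raw bytes on each call, which is the saving the commit message describes.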
* Move prototypes of various encoding-related functions into a new header  [Tim J. Robbins, 2004-05-12, 1 file, -5/+1]
|   file to avoid extern'ing them all over the place.
|
|   Notes: svn path=/head/; revision=129153
* Perform some basic validation of multibyte conversion state objects.  [Tim J. Robbins, 2004-04-12, 1 file, -2/+14]
|   Notes: svn path=/head/; revision=128155
* Don't cast away const qualifiers.  [Tim J. Robbins, 2004-04-10, 1 file, -1/+1]
|   Spotted by: bde
|
|   Notes: svn path=/head/; revision=128081
* Allow partial multibyte characters to accumulate in conversion state  [Tim J. Robbins, 2004-04-07, 1 file, -12/+41]
|   objects passed to mbrtowc(), mbsrtowcs(), and mbrlen(), as required by
|   C99.
|
|   Notes: svn path=/head/; revision=128004
* Fix a typo that caused mbrtowc() to always return 0.  [Tim J. Robbins, 2003-11-11, 1 file, -1/+1]
|   Notes: svn path=/head/; revision=122467
* Convert the Big5, EUC, MSKanji and UTF-8 encoding methods to implement  [Tim J. Robbins, 2003-11-02, 1 file, -71/+69]
|   mbrtowc() and wcrtomb() directly. GB18030, GBK and UTF2 are left
|   unconverted; GB18030 will be done eventually, but GBK and UTF2 may just
|   be removed, as they are subsets of GB18030 and UTF-8 respectively.
|
|   Notes: svn path=/head/; revision=121893
* Whack 28 unused variables.  [Jacques Vidrine, 2003-02-18, 1 file, -1/+1]
|   Notes: svn path=/head/; revision=111082
* Add a UTF-8 encoding method, which will eventually replace the antique  [Tim J. Robbins, 2002-10-10, 1 file, -0/+204]
    "UTF2" method. Although UTF-8 and the old UTF2 encoding are compatible
    for 16-bit characters, the new UTF-8 implementation is much more strict
    about rejecting malformed input and also handles the full 31 bit range
    of characters.

    Notes: svn path=/head/; revision=104828