ctype, chrtbl, rune tables and POSIX2008
Joerg Sonnenberger
2011-03-16 18:51:28 UTC
Hi all,
I'm looking into adding at least part of the POSIX2008 locale support
and wondering about a bunch of legacy support in libc and the rest.

Is anyone still using chrtbl(8) and resulting locale files? Searching
for BSDCTYPE in /usr/share/locale answers the second question. I'm not
sure which NetBSD versions actually used the BSDCTYPE format. NetBSD 1.6
uses the rune format already. NetBSD 1.5 doesn't include any files in
that format either. This looks to me like there is no good reason for
keeping it.

Are there objections against making the runetype classification macros
actually part of the public API for ctype.h? The current _ctype_ would
still be kept for legacy purposes, but anything else could just use a
new _runctype_ pointer going directly to the corresponding runetype
table. New APIs like the POSIX2008 locale support wouldn't have to care
about it.
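
As a rough illustration of the idea (not NetBSD's actual table layout), a classification macro can index a table of bit masks through a single pointer. Every name below (`sk_runetype_tab`, the `SK_*` masks, `sk_isalpha` and friends) is hypothetical, and the sketch assumes an ASCII-compatible character set.

```c
#include <stdint.h>

/* Hypothetical class bits; a real runetype table has many more. */
#define SK_ALPHA 0x01u
#define SK_DIGIT 0x02u
#define SK_SPACE 0x04u

/* One mask per unsigned char value, standing in for the "C" locale table. */
static uint32_t sk_c_table[256];

/* The pointer that inlined macros would go through. */
static const uint32_t *sk_runetype_tab = sk_c_table;

/* Fill the table for the "C" locale; must run before the macros are used. */
static void sk_init_c_table(void)
{
    static const char spaces[] = " \t\n\v\f\r";
    int c;

    for (c = 'a'; c <= 'z'; c++)      /* contiguous only in ASCII-like sets */
        sk_c_table[c] |= SK_ALPHA;
    for (c = 'A'; c <= 'Z'; c++)
        sk_c_table[c] |= SK_ALPHA;
    for (c = '0'; c <= '9'; c++)      /* contiguous in every C character set */
        sk_c_table[c] |= SK_DIGIT;
    for (c = 0; spaces[c] != '\0'; c++)
        sk_c_table[(unsigned char)spaces[c]] |= SK_SPACE;
}

/* Classification is one table load and one mask test. */
#define sk_isalpha(c) ((sk_runetype_tab[(unsigned char)(c)] & SK_ALPHA) != 0)
#define sk_isdigit(c) ((sk_runetype_tab[(unsigned char)(c)] & SK_DIGIT) != 0)
#define sk_isspace(c) ((sk_runetype_tab[(unsigned char)(c)] & SK_SPACE) != 0)
```

Because such macros are expanded into the calling program, the name and layout of the table they reference effectively become part of the ABI.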

Joerg

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Christos Zoulas
2011-03-16 20:39:12 UTC
Post by Joerg Sonnenberger
Hi all,
I'm looking into adding at least part of the POSIX2008 locale support
and wondering about a bunch of legacy support in libc and the rest.
Is anyone still using chrtbl(8) and resulting locale files? Searching
for BSDCTYPE in /usr/share/locale answers the second question. I'm not
sure which NetBSD versions actually used the BSDCTYPE format. NetBSD 1.6
uses the rune format already. NetBSD 1.5 doesn't include any files in
that format either. This looks to me like there is no good reason for
keeping it.
Nuke it.
Post by Joerg Sonnenberger
Are there objections against making the runetype classification macros
actually part of the public API for ctype.h? The current _ctype_ would
still be kept for legacy purposes, but anything else could just use a
new _runctype_ pointer going directly to the corresponding runetype
table. New APIs like the POSIX2008 locale support wouldn't have to care
about it.
Are you planning to add all the _l functions too?

christos


Joerg Sonnenberger
2011-03-16 20:44:39 UTC
Post by Christos Zoulas
Post by Joerg Sonnenberger
Are there objections against making the runetype classification macros
actually part of the public API for ctype.h? The current _ctype_ would
still be kept for legacy purposes, but anything else could just use a
new _runctype_ pointer going directly to the corresponding runetype
table. New APIs like the POSIX2008 locale support wouldn't have to care
about it.
Are you planning to add all the _l functions too?
Not promising to do all of them, but at least a good part, yes.
The one thing I currently do *not* plan to add is thread-local locales.

Joerg

Takehiko NOZAKI
2011-03-21 13:32:19 UTC
hi, all.

An early-stage implementation of POSIX2008's multi-locale support, such as the *_l
functions, is here.
(I wrote it more than 2 years ago; it needs to catch up with -current.)

ftp://ftp.netbsd.org/pub/NetBSD/misc/tnozaki/multi-locale-snapshot-20090102.tar.gz

For more information, see the following tech-userlevel discussion:
http://permalink.gmane.org/gmane.os.netbsd.devel.userlevel/10401

very truly yours.
--
Takehiko NOZAKI<***@gmail.com>

Joerg Sonnenberger
2011-05-05 21:44:54 UTC
Post by Takehiko NOZAKI
hi, all.
An early-stage implementation of POSIX2008's multi-locale support, such as the *_l
functions, is here.
(I wrote it more than 2 years ago; it needs to catch up with -current.)
ftp://ftp.netbsd.org/pub/NetBSD/misc/tnozaki/multi-locale-snapshot-20090102.tar.gz
http://permalink.gmane.org/gmane.os.netbsd.devel.userlevel/10401
Your snapshot includes two different interfaces and I would like to get
a consensus on what we want to support.

One part is the POSIX2008 explicit locale interface. Essentially, all
functions operating on locale-sensitive data get a version which has
the locale as explicit argument. E.g. isalpha(ch) -> isalpha_l(ch, l).
This is highly desirable for multi-thread applications or programs
dealing with input in different languages at the same time.
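
A minimal sketch of the explicit-locale style, using the POSIX2008 calls `newlocale`, `isalpha_l` and `freelocale`. The helper names `count_alpha_l` and `count_alpha_in` are made up for this example; on some systems the `_l` declarations live in `<xlocale.h>` rather than `<ctype.h>`.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Count alphabetic bytes in s according to the given locale object.
 * No global or per-thread state is consulted or changed. */
static size_t count_alpha_l(const char *s, locale_t loc)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (isalpha_l((unsigned char)*s, loc))
            n++;
    return n;
}

/* Convenience wrapper: create a locale by name, use it, release it.
 * Returns -1 if the locale cannot be created. */
static long count_alpha_in(const char *s, const char *locale_name)
{
    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
    long n;

    if (loc == (locale_t)0)
        return -1;
    n = (long)count_alpha_l(s, loc);
    freelocale(loc);
    return n;
}
```

Two threads can call `count_alpha_l` with different locale objects at the same time without interfering with each other, which is the property the explicit interface is meant to provide.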

The second part is the Linux/Darwin Thread Local Locale interface.
Basically, this allows setting a locale for the current thread without
modifying the global (fall back) locale. IMO this is just insane and
repeating the mistakes of the original locale interface. I am strongly
in favour of *not* implementing this.

Joerg

YAMAMOTO Takashi
2012-01-17 00:11:57 UTC
hi,
Post by Joerg Sonnenberger
Post by Takehiko NOZAKI
hi, all.
An early-stage implementation of POSIX2008's multi-locale support, such as the *_l
functions, is here.
(I wrote it more than 2 years ago; it needs to catch up with -current.)
ftp://ftp.netbsd.org/pub/NetBSD/misc/tnozaki/multi-locale-snapshot-20090102.tar.gz
http://permalink.gmane.org/gmane.os.netbsd.devel.userlevel/10401
Your snapshot includes two different interfaces and I would like to get
a consensus on what we want to support.
One part is the POSIX2008 explicit locale interface. Essentially, all
functions operating on locale-sensitive data get a version which has
the locale as explicit argument. E.g. isalpha(ch) -> isalpha_l(ch, l).
This is highly desirable for multi-thread applications or programs
dealing with input in different languages at the same time.
The second part is the Linux/Darwin Thread Local Locale interface.
Basically, this allows setting a locale for the current thread without
modifying the global (fall back) locale. IMO this is just insane and
repeating the mistakes of the original locale interface. I am strongly
in favour of *not* implementing this.
your suggestion is to add xlocale without uselocale, right?

is the rest of his patch ok for you?

YAMAMOTO Takashi
Post by Joerg Sonnenberger
Joerg
Joerg Sonnenberger
2012-01-18 14:32:10 UTC
Post by YAMAMOTO Takashi
your suggestion is to add xlocale without uselocale, right?
Yes.
Post by YAMAMOTO Takashi
is the rest of his patch ok for you?
Looking at the latest version, I don't agree with using _ all over the
place. I think it would help to clean up dead code first -- is there a
good reason for keeping CITRUS optional? Getting rid of that helps for
all the patches in the area (ctype, multi-locale etc). After having only
a single implementation, we can easily go over each function set and
merge it individually.

Joerg

YAMAMOTO Takashi
2012-01-18 21:49:20 UTC
hi,
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
your suggestion is to add xlocale without uselocale, right?
Yes.
i'm not sure if it's a good idea because

- uselocale is actually being used by third party code in the wild.

- uselocale is in the standard.

- sometimes uselocale is the only choice,
  eg. when calling a library which internally uses locale-sensitive functions.
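
The last case can be sketched as follows. `legacy_parse` is a hypothetical stand-in for third-party code that calls a locale-sensitive function (`strtod` here) and cannot be changed to take a `locale_t`; `parse_in_c_locale` is likewise an invented name. The POSIX2008 calls used are `newlocale`, `uselocale` and `freelocale`.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>

/* Stand-in for library code we do not control: it relies on whatever
 * locale is current when it is called. */
static double legacy_parse(const char *s)
{
    return strtod(s, NULL);
}

/* Run legacy_parse() with the "C" locale installed for this thread only,
 * then restore the previous per-thread setting. Returns 0 on success. */
static int parse_in_c_locale(const char *s, double *out)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    locale_t prev;

    if (c_loc == (locale_t)0)
        return -1;
    prev = uselocale(c_loc);   /* returns the locale that was in effect */
    *out = legacy_parse(s);
    uselocale(prev);           /* put the old setting back before freeing */
    freelocale(c_loc);
    return 0;
}
```

Without uselocale, the only alternative in this situation is setlocale, which changes the locale for every thread in the process.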
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
is the rest of his patch ok for you?
Looking at the latest version, I don't agree with using _ all over the
place. I think it would help to clean up dead code first -- is there a
good reason for keeping CITRUS optional? Getting rid of that helps for
all the patches in the area (ctype, multi-locale etc). After having only
a single implementation, we can easily go over each function set and
merge it individually.
the size of libc is the only reason to have it optional that i'm aware of.
but i don't even know how much smaller/bigger it makes libc.

YAMAMOTO Takashi
Post by Joerg Sonnenberger
Joerg
Joerg Sonnenberger
2012-01-18 22:02:39 UTC
Post by YAMAMOTO Takashi
hi,
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
your suggestion is to add xlocale without uselocale, right?
Yes.
i'm not sure if it's a good idea because
- uselocale is actually being used by third party code in the wild.
- uselocale is in standard.
- sometimes uselocale is the only choice.
eg. when calling a library which internally uses locale-sensitive functions.
The problem is that it adds the TLS access overhead for all the
locale-sensitive functions. Further complications are added by VAX and
Sun2 still not supporting TLS. It seems like a heavyweight workaround for
broken code.
Post by YAMAMOTO Takashi
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
is the rest of his patch ok for you?
Looking at the latest version, I don't agree with using _ all over the
place. I think it would help to clean up dead code first -- is there a
good reason for keeping CITRUS optional? Getting rid of that helps for
all the patches in the area (ctype, multi-locale etc). After having only
a single implementation, we can easily go over each function set and
merge it individually.
the size of libc is the only reason to have it optional i'm aware of.
but i even don't know how much it makes libc smaller/bigger.
***@britannica:/tmp/with-citrus$ size libc.so.12.179
   text    data     bss     dec     hex filename
1136427   47232   65752 1249411  131083 libc.so.12.179

***@britannica:/tmp/without-citrus$ size libc.so.12.179
   text    data     bss     dec     hex filename
1101773   41984   63832 1207589  126d25 libc.so.12.179

That's AMD64 with Clang. Does the small difference really justify the
complexity?

Joerg

YAMAMOTO Takashi
2012-01-18 22:42:23 UTC
hi,
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
hi,
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
your suggestion is to add xlocale without uselocale, right?
Yes.
i'm not sure if it's a good idea because
- uselocale is actually being used by third party code in the wild.
- uselocale is in standard.
- sometimes uselocale is the only choice.
eg. when calling a library which internally uses locale-sensitive functions.
The problem is that it adds the TLS access overhead for all the
locale-sensitive functions.
isn't it avoidable with a global
uselocale_has_ever_been_used_for_this_process variable?
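
One possible shape for such a flag, as a sketch only: all names (`fl_locale`, `fl_current_locale`, `fl_uselocale`, `fl_uselocale_ever_used`) are hypothetical, the locale structure is a placeholder, and C11 `_Thread_local` and `<stdatomic.h>` are assumed to be available.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder locale object; a real one carries the actual tables. */
struct fl_locale { const char *name; };

static struct fl_locale fl_global_locale = { "C" };

/* Set once, never cleared: has any thread ever called fl_uselocale()? */
static atomic_bool fl_uselocale_ever_used;

/* Per-thread override; NULL means "use the global locale". */
static _Thread_local struct fl_locale *fl_thread_locale;

/* Fast path: programs that never call fl_uselocale() skip the TLS read. */
static struct fl_locale *fl_current_locale(void)
{
    struct fl_locale *l;

    if (!atomic_load_explicit(&fl_uselocale_ever_used, memory_order_acquire))
        return &fl_global_locale;
    l = fl_thread_locale;
    return l != NULL ? l : &fl_global_locale;
}

/* Install a per-thread locale and return the previous one.
 * Passing NULL only queries the current setting. */
static struct fl_locale *fl_uselocale(struct fl_locale *l)
{
    struct fl_locale *old = fl_current_locale();

    if (l != NULL) {
        fl_thread_locale = (l == &fl_global_locale) ? NULL : l;
        atomic_store_explicit(&fl_uselocale_ever_used, true,
                              memory_order_release);
    }
    return old;
}
```

The flag check is still an extra load and branch on every lookup, and it only helps functions that go through something like `fl_current_locale()`.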
Post by Joerg Sonnenberger
Further complications are added by VAX and
Sun2 still not supporting TLS. It seems like a strong hack around for
broken code.
can't they use a plain setspecific?
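
A sketch of what a pthread_setspecific-based fallback might look like on a platform without compiler-level TLS. The names (`ks_locale`, `ks_current_locale`, `ks_uselocale`) are hypothetical and error handling is reduced to a comment; only `pthread_once`, `pthread_key_create`, `pthread_getspecific` and `pthread_setspecific` are real interfaces.

```c
#include <pthread.h>
#include <stddef.h>

/* Placeholder locale object. */
struct ks_locale { const char *name; };

static struct ks_locale ks_global_locale = { "C" };

static pthread_key_t ks_locale_key;
static pthread_once_t ks_locale_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once. A real implementation must cope with
 * pthread_key_create() failing; this sketch ignores the result. */
static void ks_locale_key_init(void)
{
    (void)pthread_key_create(&ks_locale_key, NULL);
}

/* Per-thread value if one was set, otherwise the global locale. */
static struct ks_locale *ks_current_locale(void)
{
    struct ks_locale *l;

    pthread_once(&ks_locale_once, ks_locale_key_init);
    l = pthread_getspecific(ks_locale_key);
    return l != NULL ? l : &ks_global_locale;
}

/* Install a per-thread locale and return the previous one.
 * Passing NULL only queries the current setting. */
static struct ks_locale *ks_uselocale(struct ks_locale *l)
{
    struct ks_locale *old = ks_current_locale();

    if (l != NULL)
        (void)pthread_setspecific(ks_locale_key,
                                  l == &ks_global_locale ? NULL : l);
    return old;
}
```

Every lookup now costs a `pthread_once` check plus a function call into the threading library, so this path is slower than native TLS.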
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
is the rest of his patch ok for you?
Looking at the latest version, I don't agree with using _ all over the
place. I think it would help to clean up dead code first -- is there a
good reason for keeping CITRUS optional? Getting rid of that helps for
all the patches in the area (ctype, multi-locale etc). After having only
a single implementation, we can easily go over each function set and
merge it individually.
the size of libc is the only reason to have it optional i'm aware of.
but i even don't know how much it makes libc smaller/bigger.
text data bss dec hex filename
1136427 47232 65752 1249411 131083 libc.so.12.179
text data bss dec hex filename
1101773 41984 63832 1207589 126d25 libc.so.12.179
That's AMD64 with Clang. Does the small difference really justify the
complexity?
thanks for providing numbers.
it seems not worth the complexity to me.
but i guess an x86-only guy like me is not the right person to judge. :-)

i remembered another possible problem: the size of statically linked binaries.
eg. a single printf pulls in a lot of locale stuff.

YAMAMOTO Takashi
Post by Joerg Sonnenberger
Joerg
Joerg Sonnenberger
2012-01-18 23:07:50 UTC
Post by YAMAMOTO Takashi
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
i'm not sure if it's a good idea because
- uselocale is actually being used by third party code in the wild.
- uselocale is in standard.
- sometimes uselocale is the only choice.
eg. when calling a library which internally uses locale-sensitive functions.
The problem is that it adds the TLS access overhead for all the
locale-sensitive functions.
isn't it avoidable with a global
uselocale_has_ever_been_used_for_this_process variable?
I don't know. I just saw a report on the FreeBSD lists that the
uselocale() stuff adds a significant overhead (78%?) to some common
operations. It requires a lot more work e.g. for ctype(3) and in fact,
it breaks the ABI (silently). That sounds like a good reason to not
support it.
Post by YAMAMOTO Takashi
Post by Joerg Sonnenberger
Further complications are added by VAX and
Sun2 still not supporting TLS. It seems like a strong hack around for
broken code.
can't they use a plain setspecific?
That makes the code even more ugly.
Post by YAMAMOTO Takashi
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
is the rest of his patch ok for you?
Looking at the latest version, I don't agree with using _ all over the
place. I think it would help to clean up dead code first -- is there a
good reason for keeping CITRUS optional? Getting rid of that helps for
all the patches in the area (ctype, multi-locale etc). After having only
a single implementation, we can easily go over each function set and
merge it individually.
the size of libc is the only reason to have it optional i'm aware of.
but i even don't know how much it makes libc smaller/bigger.
text data bss dec hex filename
1136427 47232 65752 1249411 131083 libc.so.12.179
text data bss dec hex filename
1101773 41984 63832 1207589 126d25 libc.so.12.179
That's AMD64 with Clang. Does the small difference really justify the
complexity?
thanks for providing numbers.
it seems not worth the complexity to me.
but i guess an x86-only guy like me is not a right person to judge. :-)
i remembered another possible problem; the size of statically linked binaries.
eg. a single printf pulled in a lot of locale stuff.
Let's take bin/cat as example:

   text    data     bss     dec     hex filename
 163956    4396   19912  188264   2df68 with-citrus/cat
 147756    4136   19912  171804   29f1c without-citrus/cat
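
To put the totals in perspective, the differences in the `dec` column work out to 41,822 bytes (about 3.5%) for libc.so.12.179 and 16,460 bytes (about 9.6%) for the statically linked cat. The small helper below reproduces that arithmetic; the struct and function names are invented for this example, and the numbers are the `dec` totals from the size(1) output in this thread.

```c
#include <stdio.h>

/* Total size ("dec" column of size(1)) with and without CITRUS. */
struct size_pair {
    const char *what;
    long with_citrus;
    long without_citrus;
};

/* Figures quoted in this thread (AMD64, Clang). */
static const struct size_pair thread_sizes[] = {
    { "libc.so.12.179", 1249411L, 1207589L },
    { "cat (static)",    188264L,  171804L },
};

/* Extra bytes caused by enabling CITRUS. */
static long growth_bytes(struct size_pair p)
{
    return p.with_citrus - p.without_citrus;
}

/* Growth relative to the build without CITRUS, in percent. */
static double growth_percent(struct size_pair p)
{
    return 100.0 * (double)growth_bytes(p) / (double)p.without_citrus;
}

/* Print one line per measured binary. */
static void print_growth(void)
{
    size_t i;

    for (i = 0; i < sizeof(thread_sizes) / sizeof(thread_sizes[0]); i++)
        printf("%-16s +%ld bytes (%.1f%%)\n", thread_sizes[i].what,
               growth_bytes(thread_sizes[i]),
               growth_percent(thread_sizes[i]));
}
```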

Joerg

YAMAMOTO Takashi
2012-01-18 23:29:19 UTC
hi,
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
i'm not sure if it's a good idea because
- uselocale is actually being used by third party code in the wild.
- uselocale is in standard.
- sometimes uselocale is the only choice.
eg. when calling a library which internally uses locale-sensitive functions.
The problem is that it adds the TLS access overhead for all the
locale-sensitive functions.
isn't it avoidable with a global
uselocale_has_ever_been_used_for_this_process variable?
I don't know. I just saw a report on the FreeBSD lists that the
uselocale() stuff adds a significant overhead (78%?) to some common
operations. It requires a lot more work e.g. for ctype(3) and in fact,
it breaks the ABI (silently). That sounds like a good reason to not
support it.
hm, i forgot the inlined ctype mess. good point.
i don't think there's a sane way to make uselocale work for them
without changing the ABI.
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
Post by Joerg Sonnenberger
Further complications are added by VAX and
Sun2 still not supporting TLS. It seems like a strong hack around for
broken code.
can't they use a plain setspecific?
That makes the code even more ugly.
i consider a few #ifdefs acceptable for now.
in the long term, they need TLS anyway.
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
Post by Joerg Sonnenberger
Post by YAMAMOTO Takashi
is the rest of his patch ok for you?
Looking at the latest version, I don't agree with using _ all over the
place. I think it would help to clean up dead code first -- is there a
good reason for keeping CITRUS optional? Getting rid of that helps for
all the patches in the area (ctype, multi-locale etc). After having only
a single implementation, we can easily go over each function set and
merge it individually.
the size of libc is the only reason to have it optional i'm aware of.
but i even don't know how much it makes libc smaller/bigger.
text data bss dec hex filename
1136427 47232 65752 1249411 131083 libc.so.12.179
text data bss dec hex filename
1101773 41984 63832 1207589 126d25 libc.so.12.179
That's AMD64 with Clang. Does the small difference really justify the
complexity?
thanks for providing numbers.
it seems not worth the complexity to me.
but i guess an x86-only guy like me is not a right person to judge. :-)
i remembered another possible problem; the size of statically linked binaries.
eg. a single printf pulled in a lot of locale stuff.
text data bss dec hex filename
163956 4396 19912 188264 2df68 with-citrus/cat
147756 4136 19912 171804 29f1c without-citrus/cat
thanks.

my feeling is that the bloat is acceptable.

YAMAMOTO Takashi
Post by Joerg Sonnenberger
Joerg
Takehiko NOZAKI
2012-01-21 12:08:21 UTC
hi,

I have no time to spare for these ctype and multi-locale issues before the NetBSD 6 branch.
Please go on as you like.

The latest ctype and multi-locale patch is here:
ftp://ftp.netbsd.org/pub/NetBSD/misc/tnozaki/

The code may not compile, or the patches may be rejected, because of the recent
removal of the CITRUS=no knob.

One more thing: the funopen/fpos_t issue should be done before the libc major bump.
http://old.nabble.com/Proposal%3A-fpos_t-and-funopen(3)-API-change-td31203605.html


very truly yours.
--
Takehiko NOZAKI<***@NetBSD.org>

Joerg Sonnenberger
2012-01-22 01:47:37 UTC
Post by Takehiko NOZAKI
hi,
I have no time to spare for these ctype and multi-locale issues before the NetBSD 6 branch.
Please go on as you like.
OK, I'll work on getting them into the tree. Thanks for all you have
done on this.
Post by Takehiko NOZAKI
ftp://ftp.netbsd.org/pub/NetBSD/misc/tnozaki/
The code may not compile, or the patches may be rejected, because of the recent
removal of the CITRUS=no knob.
I expect some rejects due to missing files, but nothing major.
Post by Takehiko NOZAKI
One more thing: the funopen/fpos_t issue should be done before the libc major bump.
http://old.nabble.com/Proposal%3A-fpos_t-and-funopen(3)-API-change-td31203605.html
It's listed in src/lib/libc/shlib_version already.

Joerg

Christos Zoulas
2012-01-22 06:26:02 UTC
Post by Takehiko NOZAKI
Post by Takehiko NOZAKI
One more thing: the funopen/fpos_t issue should be done before the libc major bump.
http://old.nabble.com/Proposal%3A-fpos_t-and-funopen(3)-API-change-td31203605.html
I've done this. I am waiting for releng's permission to commit.

christos


Christos Zoulas
2011-05-05 22:52:53 UTC
Post by Takehiko NOZAKI
Post by Takehiko NOZAKI
hi, all.
An early-stage implementation of POSIX2008's multi-locale support, such as the *_l
functions, is here.
(I wrote it more than 2 years ago; it needs to catch up with -current.)
ftp://ftp.netbsd.org/pub/NetBSD/misc/tnozaki/multi-locale-snapshot-20090102.tar.gz
Post by Takehiko NOZAKI
http://permalink.gmane.org/gmane.os.netbsd.devel.userlevel/10401
Your snapshot includes two different interfaces and I would like to get
a consensus on what we want to support.
One part is the POSIX2008 explicit locale interface. Essentially, all
functions operating on locale-sensitive data get a version which has
the locale as explicit argument. E.g. isalpha(ch) -> isalpha_l(ch, l).
This is highly desirable for multi-thread applications or programs
dealing with input in different languages at the same time.
The second part is the Linux/Darwin Thread Local Locale interface.
Basically, this allows setting a locale for the current thread without
modifying the global (fall back) locale. IMO this is just insane and
repeating the mistakes of the original locale interface. I am strongly
in favour of *not* implementing this.
I am also in favor of not implementing the Thread Local Locale madness
unless it is required to get libstdc++ working.

christos


Matthew Mondor
2011-05-07 21:42:01 UTC
On Thu, 5 May 2011 23:44:54 +0200
Post by Joerg Sonnenberger
The second part is the Linux/Darwin Thread Local Locale interface.
Basically, this allows setting a locale for the current thread without
modifying the global (fall back) locale. IMO this is just insane and
repeating the mistakes of the original locale interface. I am strongly
in favour of *not* implementing this.
I totally agree.
--
Matt
