Opened 14 years ago

Closed 13 years ago

Last modified 13 years ago

#227 closed defect (invalid)

wcrtomb() broken

Reported by: Silvan Scherrer Owned by: bird
Priority: normal Milestone: libc-0.6.5
Component: libc Version: 0.6
Severity: normal Keywords:
Cc:

Description

This test case shows the broken behaviour:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	char buf[256] = {0};
	int rc = wctomb(buf, 0xE4);
	printf("rc %d char %02x (%c)\n", rc, (unsigned char)buf[0], buf[0]);
	return 0;
}

Change History (9)

comment:1 by Silvan Scherrer, 14 years ago

I get the following output: rc 1 char e4 (õ)

LANG is de_CH.

I expected to get an ä.

comment:2 by Silvan Scherrer, 14 years ago

Owner: changed from bird to Silvan Scherrer

comment:3 by Silvan Scherrer, 14 years ago

Owner: changed from Silvan Scherrer to bird

I hate Trac :) Code snippet now better formatted.

#include <stdlib.h>
#include <stdio.h>

int main()
{
	char buf[256] = {0};
	int rc = wctomb(buf, 0xE4);
	printf("rc %d char %02x (%c)\n", rc, (unsigned char)buf[0], buf[0]);
	return 0;
}

comment:4 by bird, 14 years ago

I'm a bit surprised you get anything, since you're not calling setlocale(LC_ALL, "")... Could you do that, and also supply the hex value of the LATIN SMALL LETTER A WITH DIAERESIS character you're expecting?
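A minimal sketch of the test case with the setlocale() call from comment:4 added. The helper name show_wctomb and its -2 "locale unavailable" return value are hypothetical, chosen only for this illustration; passing "" selects the locale from LC_ALL / LC_CTYPE / LANG as the C standard specifies.

```c
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: switch to the given locale ("" = take it from the
   environment), encode one wide character with wctomb() and print the
   result in the format of the original test case.
   Returns wctomb()'s result (bytes stored, or -1 if the character has no
   representation in the locale's codeset), or -2 if setlocale() failed. */
static int show_wctomb(const char *locale_name, wchar_t wc)
{
    const char *loc = setlocale(LC_ALL, locale_name);
    if (loc == NULL) {
        fprintf(stderr, "setlocale(LC_ALL, \"%s\") failed\n", locale_name);
        return -2;
    }

    char buf[MB_LEN_MAX] = {0};
    wctomb(NULL, 0);                /* reset any shift state */
    int rc = wctomb(buf, wc);
    printf("locale %s: rc %d char %02x\n", loc, rc, (unsigned char)buf[0]);
    return rc;
}
```

In the reporter's program, main() would call show_wctomb("", 0xE4), so the locale is taken from LANG=de_CH instead of staying at the default "C" locale.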

comment:5 by bird, 14 years ago

Component: changed from baselayout to libc

comment:6 by Silvan Scherrer, 14 years ago

OK, did the test. Even with setlocale(LC_ALL, "") I get the same output. The hex value of LATIN SMALL LETTER A WITH DIAERESIS (ä) that I expect is 0x84.

comment:7 by bird, 13 years ago

Resolution: invalid
Status: changed from new to closed

What happens is that when LC_ALL, LC_CTYPE or LANG is set to 'de_CH', you get the 'ISO8859-1' code page, and in that code page LATIN SMALL LETTER A WITH DIAERESIS has the same value as its Unicode code point (0xE4). See http://en.wikipedia.org/wiki/ISO/IEC_8859-1 for details.
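To make the point concrete: ISO 8859-1 maps every byte 0x00..0xFF to the Unicode code point with the same value, so encoding U+00E4 yields the byte 0xE4, which is exactly the "rc 1 char e4" the reporter saw. The functions below are a hypothetical illustration of that mapping, not libc's implementation.

```c
/* Hypothetical helpers illustrating ISO 8859-1 (Latin-1).
   Encoding is the identity on U+0000..U+00FF; anything above has no
   Latin-1 representation and yields -1. */
static int latin1_encode(unsigned long cp)
{
    return cp <= 0xFF ? (int)cp : -1;
}

/* Decoding is the identity as well: byte value == code point. */
static unsigned long latin1_decode(unsigned char byte)
{
    return byte;
}
```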

The code page you're expecting is probably 850 (http://en.wikipedia.org/wiki/Code_page_850). To get output in that code page, you need to set LC_ALL, LC_CTYPE or LANG to 'de_CH.IBM850'.
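The mismatch also explains the "õ" in the reporter's output: in code page 850 the byte 0xE4 is õ (U+00F5), while ä is 0x84. That is presumably what happens when a Latin-1 byte is shown on a CP850 console. The sketch below uses a deliberately partial CP850 table covering only the German letters plus õ; the names cp850_partial, cp850_encode and cp850_decode are hypothetical, and a real program would rely on the locale (or a conversion library) rather than a hand-written table.

```c
#include <stddef.h>

/* Partial CP850 table: German letters plus U+00F5, enough for this ticket.
   Not a complete code page 850 converter. */
struct cp850_entry {
    unsigned short ucs;   /* Unicode code point */
    unsigned char  byte;  /* CP850 byte */
};

static const struct cp850_entry cp850_partial[] = {
    { 0x00C4, 0x8E },  /* A with diaeresis */
    { 0x00D6, 0x99 },  /* O with diaeresis */
    { 0x00DC, 0x9A },  /* U with diaeresis */
    { 0x00DF, 0xE1 },  /* sharp s */
    { 0x00E4, 0x84 },  /* a with diaeresis */
    { 0x00F5, 0xE4 },  /* o with tilde */
    { 0x00F6, 0x94 },  /* o with diaeresis */
    { 0x00FC, 0x81 },  /* u with diaeresis */
};

#define CP850_PARTIAL_LEN (sizeof cp850_partial / sizeof cp850_partial[0])

/* Code point -> CP850 byte; ASCII passes through, -1 if not in the table. */
static int cp850_encode(unsigned long cp)
{
    if (cp < 0x80)
        return (int)cp;
    for (size_t i = 0; i < CP850_PARTIAL_LEN; i++)
        if (cp850_partial[i].ucs == cp)
            return cp850_partial[i].byte;
    return -1;
}

/* CP850 byte -> code point; ASCII passes through, -1 if not in the table. */
static long cp850_decode(unsigned char byte)
{
    if (byte < 0x80)
        return byte;
    for (size_t i = 0; i < CP850_PARTIAL_LEN; i++)
        if (cp850_partial[i].byte == byte)
            return cp850_partial[i].ucs;
    return -1;
}
```

With this table, cp850_encode(0xE4) gives the 0x84 the reporter expected, and cp850_decode(0xE4) gives U+00F5, the õ that was actually displayed.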

I've added a testcase in r3756 which shows very clearly what setlocale() does and that the mbtowc <-> wctomb round-trip works correctly. There is no bug here, I think.
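The test case in r3756 is not reproduced here. What follows is only a sketch, under that caveat, of the kind of round-trip check it describes: every single byte the current LC_CTYPE locale accepts is decoded with mbtowc() and re-encoded with wctomb(), and any byte that does not come back unchanged counts as a failure. The function name roundtrip_failures is hypothetical.

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical round-trip check for single-byte characters.
   Sets LC_CTYPE to locale_name, then for each byte 0x01..0xFF:
   decode with mbtowc(), skip it if the locale does not accept it as a
   complete one-byte character, otherwise re-encode with wctomb() and
   compare.  *accepted receives the number of bytes that were accepted.
   Returns the number of accepted bytes that did not round-trip, or -1
   if the locale is not available. */
static int roundtrip_failures(const char *locale_name, int *accepted)
{
    int failures = 0;

    *accepted = 0;
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    for (int b = 1; b <= UCHAR_MAX; b++) {
        char in = (char)b;
        char out[MB_LEN_MAX];
        wchar_t wc;

        mbtowc(NULL, NULL, 0);            /* reset conversion state */
        if (mbtowc(&wc, &in, 1) != 1)
            continue;                     /* not a complete character here */
        ++*accepted;

        wctomb(NULL, 0);                  /* reset conversion state */
        if (wctomb(out, wc) != 1 || (unsigned char)out[0] != (unsigned char)b)
            failures++;
    }
    return failures;
}
```

On kLIBC, following comment:7, one could compare roundtrip_failures("de_CH", &n) with roundtrip_failures("de_CH.IBM850", &n); both should report zero failures even though they map ä to different bytes.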

comment:8 by bird, 13 years ago

Milestone: libc-0.6.5
Version: 0.6

comment:9 by bird, 13 years ago

r3788 changes the setlocale() behaviour when no codeset is specified: it now defaults to the process code page.
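A locale name such as 'de_CH.IBM850' follows the usual language[_territory][.codeset][@modifier] form, and the change in r3788 concerns names like 'de_CH' that lack the '.codeset' part. The helper below is a hypothetical sketch of that decision, not kLIBC's code: it extracts the codeset when present and otherwise uses a fallback, which stands in for the process code page (how r3788 actually obtains it is not shown in this ticket).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the codeset part of a locale name of the form
   language[_territory][.codeset][@modifier] into out.  If there is no
   ".codeset" before the optional "@modifier", copy fallback instead
   (standing in for the process code page).  The result is truncated to
   outsz - 1 characters; out is returned. */
static char *locale_codeset(const char *name, const char *fallback,
                            char *out, size_t outsz)
{
    const char *at = strchr(name, '@');
    size_t head = at != NULL ? (size_t)(at - name) : strlen(name);
    const char *dot = memchr(name, '.', head);  /* only look before '@' */
    const char *src;
    size_t len;

    if (dot != NULL) {
        src = dot + 1;
        len = (size_t)(name + head - src);
    } else {
        src = fallback;
        len = strlen(fallback);
    }

    if (outsz == 0)
        return out;              /* no room for anything, not even '\0' */
    if (len >= outsz)
        len = outsz - 1;         /* truncate to fit */
    memcpy(out, src, len);
    out[len] = '\0';
    return out;
}
```

For example, 'de_CH.ISO8859-1' yields ISO8859-1, while plain 'de_CH' yields the fallback, which is where the post-r3788 behaviour would pick the process code page (IBM850 on a Swiss German system, presumably).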
