Opened 14 years ago

Closed 14 years ago

#12 closed defect (fixed)

Use a single encoding in UI API

Reported by: dmik Owned by:
Priority: blocker Milestone:
Component: odin Version:
Severity: Keywords:
Cc:

Description

Windows has two system encodings: the ANSI encoding (used throughout the GUI APIs) and the OEM encoding (used for compatibility with DOS applications, in particular in command-line sessions for file and console output). For God knows what reason, these two encodings differ in some locales. For example, in .ru we have cp1251 (windows-1251) for ANSI and IBM-866 for OEM (the latter is the same as in OS/2 and MS/PC DOS).

I found out that at least one UI function, MessageBox(), expects two different encodings at once for two of its arguments: the window title must be in IBM866 while the message text itself must be in windows-1251. I guess this happens because the message text is drawn by Odin itself (hence the ANSI encoding is expected) while the title bar is drawn by the system (using the OS/2 encoding, which is the same as OEM). This is completely wrong and needs to be fixed. According to the Win32 docs, ANSI should be supplied in both cases.
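To make the title/text discrepancy concrete, here is a minimal sketch of how the same byte decodes under the two code pages. The helper names decode_cp1251() and decode_cp866() are hypothetical (they are not Odin or Win32 functions), and the sketch covers only the basic Russian letters А..я, returning 0 for every other byte.

```c
#include <stdint.h>

/* Hypothetical helper: decode one windows-1251 byte to a Unicode code point.
   Only the basic Russian letters are covered; 0 means "not handled here". */
uint32_t decode_cp1251(unsigned char b)
{
    if (b >= 0xC0)                  /* 0xC0..0xFF -> U+0410..U+044F (А..я) */
        return 0x0410u + (uint32_t)(b - 0xC0);
    return 0;
}

/* Hypothetical helper: decode one IBM866 byte to a Unicode code point.
   The Russian letters live in two separate ranges in this code page. */
uint32_t decode_cp866(unsigned char b)
{
    if (b >= 0x80 && b <= 0xAF)     /* 0x80..0xAF -> U+0410..U+043F (А..п) */
        return 0x0410u + (uint32_t)(b - 0x80);
    if (b >= 0xE0 && b <= 0xEF)     /* 0xE0..0xEF -> U+0440..U+044F (р..я) */
        return 0x0440u + (uint32_t)(b - 0xE0);
    return 0;                       /* e.g. 0xB0..0xDF are box-drawing characters */
}
```

For instance, byte 0xCF is the letter П in windows-1251, but in IBM866 it falls into the box-drawing range, which is the kind of garbage one gets when an ANSI string is rendered as OEM.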

Attachments (1)

encoding_hell.png (42.6 KB ) - added by dmik 14 years ago.


Change History (7)

comment:1 by dmik, 14 years ago

A clarification is needed regarding the ANSI/OEM stuff. There are in fact several issues:

  1. ANSI isn't used everywhere in the APIs as it should be. Where it isn't, the OS/2 encoding seems to be assumed. This is just a bug that needs to be fixed.
  2. There are locales where the OEM and ANSI encodings differ on native Windows systems. This causes well-known problems between the Windows console and GUI applications. For example, Russian text passed to a GUI program on the command line from a console will appear garbled in the GUI application because it is entered in the IBM866 (OEM) encoding but interpreted by the application as if it were windows-1251 (ANSI). And the opposite: a GUI application prints windows-1251 text to the console, which interprets it as IBM866, so the text becomes unreadable.
  3. It seems that for many (or all?) such locales, the OS/2 system encoding matches the Windows OEM encoding (for historical reasons, due to the common ground of DOS, OS/2 and early Windows development). Since Odin internally uses the ANSI encoding (except for the bug mentioned in 1.), this brings the same ANSI/OEM mismatch problem to OS/2 with respect to Odin-based applications. For other locales, it seems that OEM = ANSI = OS/2 encoding, so there are no such problems (just as there aren't on native Windows).
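Avoiding the mismatch described in item 2 comes down to an explicit conversion at the GUI/console boundary. The following is only a sketch of what such an ANSI-to-OEM conversion does for the .ru locale, not Odin's actual code: cp1251_to_cp866_byte() and cp1251_to_cp866() are hypothetical names, only the Russian letters are translated, and all other bytes are passed through unchanged (which is an assumption made for brevity, not a correct mapping for every character). On native Windows, the CharToOemA() API performs this kind of conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map one windows-1251 byte to IBM866.
   Only Russian letters are translated; other bytes are passed through. */
unsigned char cp1251_to_cp866_byte(unsigned char b)
{
    if (b >= 0xC0 && b <= 0xEF) return (unsigned char)(b - 0x40); /* А..п */
    if (b >= 0xF0)              return (unsigned char)(b - 0x10); /* р..я */
    if (b == 0xA8)              return 0xF0;                      /* Ё */
    if (b == 0xB8)              return 0xF1;                      /* ё */
    return b;                                                     /* ASCII etc. */
}

/* Convert a NUL-terminated windows-1251 string to IBM866 in place.
   Both encodings are single-byte, so the length never changes. */
void cp1251_to_cp866(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        *s = (char)cp1251_to_cp866_byte((unsigned char)*s);
}
```

A GUI application holding ANSI text would apply such a conversion right before writing to the console; the reverse mapping would be applied to text arriving from the console.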

The ideal solution to 3. (the ANSI/OEM mismatch) would be to make the ANSI encoding in Odin equal to the OS/2 system encoding (e.g. IBM866 for .ru). However, this would break binary compatibility with proprietary Windows applications (such as Flash and Opera), which expect ANSI to be what it is in Windows terms (e.g. windows-1251 for .ru).

I will research more.

by dmik, 14 years ago

Attachment: encoding_hell.png added

comment:2 by dmik, 14 years ago

On the screenshot, you can see that things go crazy now. The window text is fine (proper ANSI), the window title is wrong (misinterpreted as OEM) and the window list entry is correct again (though it's the same string as the title). I bet they use a direct WinSwitchEntry API call to set the window list entry and apply a different code page conversion policy there.

comment:3 by dmik, 14 years ago

Issue 1 is fixed in r21461.

comment:4 by dmik, 14 years ago

In r21462, I also fixed CreateProcess() that had the same issue. Note that for consistency, all OSLib*() helpers in Odin expect strings to be in ANSI as well (they will internally convert them to OEM).

comment:5 by dmik, 14 years ago

Hmm, actually it is not correct to interpret strings as ANSI in the involved OSLib* calls because all other OSLib* calls seem to interpret them as OEM (e.g. they pass them directly to the OS/2 APIs), and changing that would be a whole lot of work. I reverted these changes (only the part related to OSLib*; the rest of the changes are still valid) and replaced them with a different solution.

comment:6 by dmik, 14 years ago

Resolution: fixed
Status: new → closed

In r21464, I solved issue 3 by providing special versions of argc and argv, called __argcA and __argvA, which are exported from KERNEL32.DLL (and available at the source level if you include windows.h). These variables represent the command line arguments in the ANSI encoding (as opposed to argc and argv which are always in OEM from the Windows point of view).

Note that these variables are valid only after WinMain() is entered and must not be used if WinMain() is not used. Win32 applications that don't parse the lpCommandLine parameter of WinMain() but use the standard argc and argv variables instead can get the Windows behavior with the following common technique:

#include <windows.h>

#ifndef _MSC_VER

#include <odinlx.h>

int _main(int argc, char **argv);

int WIN32API WinMain(HANDLE hInstance,
                     HANDLE hPrevInstance,
                     LPSTR  lpCmdLine,
                     int    nCmdShow)
{
    return _main(__argcA, __argvA);
}

int main(int argc, char **argv)
{
    EnableSEH();
    RegisterLxExe(WinMain, NULL);
}

#else
#define _main main
#endif

int _main(int argc, char **argv)
{
    // do something with argv which is already in ANSI here
    return 0;
}

This is basically the usual RegisterLxExe() usage pattern; the only difference is that WinMain() passes __argcA and __argvA instead of the real argc and argv.

Note that __argcA is in fact the same value as argc (the number of arguments doesn't change after the code page conversion); it is there just for clarity.
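To illustrate why the argument count is unaffected, here is a sketch of an OEM-to-ANSI conversion over an argument vector for the .ru locale. This is not how KERNEL32.DLL in Odin builds __argvA (the ticket does not show that code); cp866_to_cp1251_byte() and args_oem_to_ansi() are hypothetical names, and only the Russian letters are translated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map one IBM866 byte to windows-1251.
   Only Russian letters are translated; other bytes are passed through. */
unsigned char cp866_to_cp1251_byte(unsigned char b)
{
    if (b >= 0x80 && b <= 0xAF) return (unsigned char)(b + 0x40); /* А..п */
    if (b >= 0xE0 && b <= 0xEF) return (unsigned char)(b + 0x10); /* р..я */
    if (b == 0xF0)              return 0xA8;                      /* Ё */
    if (b == 0xF1)              return 0xB8;                      /* ё */
    return b;
}

/* Convert every argument in place from OEM to ANSI.
   The conversion is byte-for-byte, so neither the number of arguments
   nor their lengths change; argc is returned as-is. */
int args_oem_to_ansi(int argc, char **argv)
{
    assert(argc >= 0 && argv != NULL);
    for (int i = 0; i < argc; ++i)
        for (char *p = argv[i]; *p != '\0'; ++p)
            *p = (char)cp866_to_cp1251_byte((unsigned char)*p);
    return argc;
}
```

A real implementation would more likely convert into a separate copy so that the original OEM argv stays available, which matches the ticket's description of argv and __argvA existing side by side.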

The WinMain() pattern above can also be found in the testapp/encodings test case. This test case now behaves the same on a native Windows machine and on OS/2 under Odin, so I consider this ticket done.
