Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#296 closed defect (wontfix)

Add SIGFPE handler to all threads

Reported by: dmik Owned by:
Priority: normal Milestone: libc-0.6.6
Component: libc Version: 0.6.5
Severity: normal Keywords:
Cc:

Description

It is known that some OS/2 system DLLs and some third party DLLs unexpectedly reset the FPU control word to a value that causes FPU exceptions to be thrown by the CPU. This may happen to any application, no matter what compiler and runtime it uses. Even if the application doesn't load the "viral" DLL directly, the DLL may be injected into the process by the system (e.g. via PM DLL hooks).

However, according to IEEE 754, the default action for FPU exceptions (e.g. divide by zero) in a conforming C runtime is to return a special value (e.g. Inf, infinity) rather than to trap.

This often leads to a situation where the program does something like float f = 1.0 / .0; and expects to get an appropriate result (Inf in this case) but instead gets a SIGFPE and unexpectedly terminates (crashes). There are dozens of user bug reports like that and they continue to come in from time to time.

Some applications work around this issue by manually calling _control87() at appropriate times to restore the default IEEE FPU CW value that masks off all exceptions. However, this doesn't always work. First, there may be many places that can indirectly cause an exception, and it's very hard to track them all down and insert a _control87() call at each. Second, even if that is done, it's still theoretically possible for the viral DLL to kick in between the _control87() call and the floating-point operation. The only complete solution for the application is to install a system exception handler for each of its threads, intercept all FPU exceptions, reset the control word and retry.

In fact, virtually any application needs this fix, for any of its threads (including threads created by third party library functions). Doing this over and over in each application is a pain. Functionally this definitely belongs to the C runtime. Normal programs (including third party libraries) are expected to start new threads using beginthread(). And this is what most applications do.

So the FPU exception handler must be installed by LIBC both for main() and for all threads started with beginthread().

PS. GCC on Mac shows the correct behavior: no SIGFPE when I do things like float f = 1.0 / .0; This is just to show that this is the expected behavior of a modern compiler.

Change History (5)

comment:1 Changed 10 years ago by dmik

The exception handler is very simple per se. One example is the NSPR runtime from Mozilla: https://github.com/bitwiseworks/mozilla-os2/blob/8378058e9423e35b316bab4fa3a9bc7e265cc517/nsprpub/pr/src/md/os2/os2thred.c#L66.

Perhaps this can be inserted into one of the existing exception handlers LIBC installs (like __libc_Back_exceptionHandler in source:/branches/libc-0.6/src/emx/src/lib/sys/exceptions.c).

Perhaps the runtime should notice when the application calls _control87() to change the exception masking flags and enable/disable handling of the respective exceptions accordingly. When the application unmasks exceptions programmatically, it expects them to be thrown, so the runtime should not catch them and reset the control word.

comment:2 Changed 10 years ago by KO Myung-Hun

How about introducing a wrapper function for DosLoadModule()?

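ULONG APIENTRY DosLoadModuleCW(PSZ pszObject, ULONG uObjectLen, PCSZ pszModule, PHMODULE phmod)
{
    ULONG rc;

    /* Save the caller's FPU control word before loading the module. */
    FSCW_VAR();
    FSCW_SAVE();

    /* Parenthesized name: a function-like DosLoadModule macro will not
       expand here, so this cannot recurse into the wrapper. */
    rc = (DosLoadModule)(pszObject, uObjectLen, pszModule, phmod);

    /* Restore it in case the module's initialization changed it. */
    FSCW_RESTORE();

    return rc;
}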

A define is also needed somewhere, maybe in os2emx.h:

#define DosLoadModule(a, b, c, d) DosLoadModuleCW(a, b, c, d)

And libc should set the FPU CW correctly, according to the IEEE specification, before entering main() and _DLL_InitTerm(), to prevent imported DLLs from corrupting the CW.

comment:3 Changed 10 years ago by KO Myung-Hun

APIs which change the FPU CW but do not restore it:

WinCreateMsgQueue()

comment:4 Changed 9 years ago by bird

Resolution: wontfix
Status: new → closed

Dmitry, I don't think ignoring SIGFPE is a good idea, as applications may actually wish to get them. Applications are free to change the FPU control word as they like, either using the 3+ interfaces provided by kLibC or by changing the register directly. At least two of the kLibC "interfaces" are realized as inline assembly in header files, which means that kLibC cannot track FPU CW manipulation even if we wanted to (at least not for 0.6.x).

I like the idea of having "safe" wrappers, similar to the ones we've got for high memory, to help your application work around the issue. I'll create a defect for that instead. This is also the approach kLibC itself takes to the problem. I'm afraid this may have to wait for 0.7 though. Safe FPU CW ticket: #317

comment:5 Changed 9 years ago by dmik

Knut, I'm not suggesting that SIGFPE be ignored completely. I only suggest using the exception handler to make sure the LIBC setting for the FPU CW is obeyed (at least in the most common default case, with FPU exceptions masked off). This will require tracking the current LIBC setting, right, but that's another story.

Safe wrappers are good but they can't be a universal solution. AFAIR there are some DLLs that change the FPU CW on the fly (some video driver DLLs IIRC), and there may be custom DLLs that do that too at arbitrary times (at least in theory). Also, some DLLs inserted into the process by PM may do so (but that's probably covered by your fix from #312 suggested by komh).
