Opened 12 years ago

Closed 10 years ago

#256 closed defect (fixed)

LIBC stream fmutex panic

Reported by: dmik Owned by: Silvan Scherrer
Priority: normal Milestone: libc-0.6.6
Component: libc Version: 0.6.4
Severity: normal Keywords:
Cc:

Description

An attempt to terminate a multi-threaded console program with Ctrl-C may often cause LIBC to abort with a message like this:

LIBC PANIC!!
fmutex deadlock: Owner died!
0x18b0902c: Owner=0x01490002 Self=0x01490001 fs=0x3 flags=0x0 hev=0x0001004a
            Desc="LIBC stream"
pid=0x0149 ppid=0x00ba tid=0x0001 slot=0x00ba pri=0x0200 mc=0x0000
D:\CODING\TESTS\OS2\DOS\SEH\SEH.EXE
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.

This happens when a thread gets killed (as a result of the process termination procedure, initiated by the default Ctrl-C handler) while it is inside e.g. printf() and then another thread attempts to use printf() before it is terminated.

The same effect may be also seen when killing process with DosKillProcess?.

This is very annoying (especially if there are many threads each of which panics) and it can also cause a hard deadlock (e.g. if printf() is used in an exception handler) that won't ever let thread 1 to exit.

Attachments (4)

stream_panic.diff (17.6 KB) - added by dmik 12 years ago.
stream_panic.2.diff (18.1 KB) - added by dmik 12 years ago.
Updated version, see comments
stream_panic.3.diff (20.7 KB) - added by dmik 12 years ago.
Fixed assertions in kmk in -j >=2 mode plus some optiimizations
stream_panic.4.diff (20.8 KB) - added by dmik 12 years ago.
Fixes occasional "stream lock count 0" assertions

Download all attachments as: .zip

Change History (15)

comment:1 Changed 12 years ago by dmik

Attached is the patch that fixes this by installing an OS/2 exception handler and unlocking the stream mutex on thread termination.

PS. BTW, I didn't set the priorities that high, they are like that by default.

PPS. I really hope that it will appear in the upcoming 065.

Changed 12 years ago by dmik

Attachment: stream_panic.diff added

comment:2 Changed 12 years ago by bird

Priority: highestnormal
Severity: blockernormal

The proposed fix may leave the stream in a bad state, so it's not very helpful per se. I would have to look at how glibc and the freebsd libc handles these kinds of issues, but I'm kind of fearing that they just blame the application and say you-should-not-do-that.

comment:3 Changed 12 years ago by dmik

I actually checked glibc already. Although they define a cleanup routine that removes the lock, it's not intended for signal handlers it seems. However I don't think you can fully use Linux as a reference here since it doesn't have the thread 1 exitlist hangup problem at all...

Regarding the bad state. I don't see how my patch changes the situation here. It only comes into play when the thread gets killed, but when it happens the state of everything (including streams) is potentially messed up anyway, regardless of what I changed. My change is only to prevent a loud annoying crash or a much more annoying hang leading to an unkillable zombie.

Also, in 99 out of 100 cases, a dying thread means the whole application is being unexpectedly terminated. In this case, the question of the application state isn't relevant at all (as opposed to the crash delay or a hang which requires you to reboot the system).

I understand the "fix the application" attitude, but in this particular case it isn't possible to fix the application; I can't put an exception handler around each printf() call.

Last edited 12 years ago by dmik (previous) (diff)

Changed 12 years ago by dmik

Attachment: stream_panic.2.diff added

Updated version, see comments

comment:4 Changed 12 years ago by dmik

Added preserving and restoring FS (vital for Odin apps).

Changed 12 years ago by dmik

Attachment: stream_panic.3.diff added

Fixed assertions in kmk in -j >=2 mode plus some optiimizations

comment:5 Changed 12 years ago by dmik

The stream_abort() assertions in kmk were caused by the exception handler that could unexpectedly unlock the stream.

Note also that the panic (w/o the above patch applied, I mean) may also happen w/o any user interaction -- simply if one thread programmtically kills another thread while it is doing some IO (e.g. prints to the console). There are such apps over there. Just another argument on why we should care about that.

Last edited 12 years ago by dmik (previous) (diff)

Changed 12 years ago by dmik

Attachment: stream_panic.4.diff added

Fixes occasional "stream lock count 0" assertions

comment:6 Changed 12 years ago by dmik

One more tiny fix which checks in the exception handler if the stream is already unlocked (there is a hole between unlocking and resetting the field containing the FD to unlock).

comment:7 Changed 10 years ago by Yuri Dario

Milestone: libc-0.6.6

comment:8 Changed 10 years ago by Silvan Scherrer

Priority: normalhigh

comment:9 Changed 10 years ago by Silvan Scherrer

Owner: changed from bird to Silvan Scherrer
Status: newaccepted

comment:10 Changed 10 years ago by bird

Priority: highnormal

I think a better and less invasive solution here is to detect the process-is-being-kill situation (there should be code for that already) and shut up the panic.

comment:11 Changed 10 years ago by bird

Resolution: fixed
Status: acceptedclosed

r3849 relaxes the fmutex code when the process is being terminated. May of course just change some panic scenarios to crashes elsewhere if a thread was killed/interrupted in the middle of critical update.

Note: See TracTickets for help on using tickets.