Opened 6 years ago

Closed 4 years ago

#256 closed defect (fixed)

LIBC stream fmutex panic

Reported by: dmik Owned by: diver
Priority: normal Milestone: libc-0.6.6
Component: libc Version: 0.6.4
Severity: normal Keywords:
Cc:

Description

An attempt to terminate a multi-threaded console program with Ctrl-C may often cause LIBC to abort with a message like this:

LIBC PANIC!!
fmutex deadlock: Owner died!
0x18b0902c: Owner=0x01490002 Self=0x01490001 fs=0x3 flags=0x0 hev=0x0001004a
            Desc="LIBC stream"
pid=0x0149 ppid=0x00ba tid=0x0001 slot=0x00ba pri=0x0200 mc=0x0000
D:\CODING\TESTS\OS2\DOS\SEH\SEH.EXE
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.

This happens when a thread gets killed (as a result of the process termination procedure, initiated by the default Ctrl-C handler) while it is inside e.g. printf() and then another thread attempts to use printf() before it is terminated.

The same effect may be also seen when killing process with DosKillProcess?.

This is very annoying (especially if there are many threads each of which panics) and it can also cause a hard deadlock (e.g. if printf() is used in an exception handler) that won't ever let thread 1 to exit.

Attachments (4)

stream_panic.diff (17.6 KB) - added by dmik 6 years ago.
stream_panic.2.diff (18.1 KB) - added by dmik 6 years ago.
Updated version, see comments
stream_panic.3.diff (20.7 KB) - added by dmik 6 years ago.
Fixed assertions in kmk in -j >=2 mode plus some optiimizations
stream_panic.4.diff (20.8 KB) - added by dmik 6 years ago.
Fixes occasional "stream lock count 0" assertions

Download all attachments as: .zip

Change History (15)

comment:1 Changed 6 years ago by dmik

Attached is the patch that fixes this by installing an OS/2 exception handler and unlocking the stream mutex on thread termination.

PS. BTW, I didn't set the priorities that high, they are like that by default.

PPS. I really hope that it will appear in the upcoming 065.

Changed 6 years ago by dmik

comment:2 Changed 6 years ago by bird

  • Priority changed from highest to normal
  • Severity changed from blocker to normal

The proposed fix may leave the stream in a bad state, so it's not very helpful per se. I would have to look at how glibc and the freebsd libc handles these kinds of issues, but I'm kind of fearing that they just blame the application and say you-should-not-do-that.

comment:3 Changed 6 years ago by dmik

I actually checked glibc already. Although they define a cleanup routine that removes the lock, it's not intended for signal handlers it seems. However I don't think you can fully use Linux as a reference here since it doesn't have the thread 1 exitlist hangup problem at all...

Regarding the bad state. I don't see how my patch changes the situation here. It only comes into play when the thread gets killed, but when it happens the state of everything (including streams) is potentially messed up anyway, regardless of what I changed. My change is only to prevent a loud annoying crash or a much more annoying hang leading to an unkillable zombie.

Also, in 99 out of 100 cases, a dying thread means the whole application is being unexpectedly terminated. In this case, the question of the application state isn't relevant at all (as opposed to the crash delay or a hang which requires you to reboot the system).

I understand the "fix the application" attitude, but in this particular case it isn't possible to fix the application; I can't put an exception handler around each printf() call.

Last edited 6 years ago by dmik (previous) (diff)

Changed 6 years ago by dmik

Updated version, see comments

comment:4 Changed 6 years ago by dmik

Added preserving and restoring FS (vital for Odin apps).

Changed 6 years ago by dmik

Fixed assertions in kmk in -j >=2 mode plus some optiimizations

comment:5 Changed 6 years ago by dmik

The stream_abort() assertions in kmk were caused by the exception handler that could unexpectedly unlock the stream.

Note also that the panic (w/o the above patch applied, I mean) may also happen w/o any user interaction -- simply if one thread programmtically kills another thread while it is doing some IO (e.g. prints to the console). There are such apps over there. Just another argument on why we should care about that.

Last edited 6 years ago by dmik (previous) (diff)

Changed 6 years ago by dmik

Fixes occasional "stream lock count 0" assertions

comment:6 Changed 6 years ago by dmik

One more tiny fix which checks in the exception handler if the stream is already unlocked (there is a hole between unlocking and resetting the field containing the FD to unlock).

comment:7 Changed 4 years ago by ydario

  • Milestone set to libc-0.6.6

comment:8 Changed 4 years ago by diver

  • Priority changed from normal to high

comment:9 Changed 4 years ago by diver

  • Owner changed from bird to diver
  • Status changed from new to accepted

comment:10 Changed 4 years ago by bird

  • Priority changed from high to normal

I think a better and less invasive solution here is to detect the process-is-being-kill situation (there should be code for that already) and shut up the panic.

comment:11 Changed 4 years ago by bird

  • Resolution set to fixed
  • Status changed from accepted to closed

r3849 relaxes the fmutex code when the process is being terminated. May of course just change some panic scenarios to crashes elsewhere if a thread was killed/interrupted in the middle of critical update.

Note: See TracTickets for help on using tickets.