Opened 4 years ago

Last modified 3 years ago

#366 new defect

No fork callback to safely use LIBC heap

Reported by: dmik Owned by:
Priority: normal Milestone: new
Component: libc Version: 0.6.6
Severity: normal Keywords:
Cc: lewisr, dryeo


It turns out that even the callbacks registered with __LIBC_PFORKHANDLE::pfnCompletionCallback, which are called last, right before fork() returns to user code, are not safe places to use LIBC itself. At the very least, an attempt to create a new log instance with __libc_LogInit aborts the forked child with:

fmutex deadlock: Owner died!
0x2003013c: Owner=0x25930001 Self=0x25940001 fs=0x3 flags=0x0 hev=0x00010004
            Desc="LIBC Heap"
pid=0x2594 ppid=0x2592 tid=0x0001 slot=0x00ab pri=0x0200 mc=0x0000 ps=0x0010
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.

_fmutex operation failed:  LIBC Heap request

It looks to me like LIBC can't use its own heap at that point.

A fork callback that runs when it's safe to use LIBC is very desirable. In particular, it would make it possible to solve the infamous _DLL_InitTerm problem in forked children, as well as similar 3rd-party DLL initialization issues where the only way to initialize the DLL in the forked child is to do it lazily, i.e. when a function requiring initialization is called for the first time. This is very annoying, as it basically requires every function to start with a check that routes execution to the initialization routine if it has not been called yet.
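To illustrate why the lazy approach is so annoying, here is a minimal sketch of the guard pattern it forces on a DLL. All names (mydll_*) are made up for illustration and are not part of LIBC or LIBCx, and the fork is only simulated by resetting a flag; the point is just that every exported function has to start with the same check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical DLL state; these names are illustrative only. */
static bool g_initialized = false;
static int  g_init_count  = 0;

/* Stand-in for the work _DLL_InitTerm would normally do once per process. */
static void mydll_init(void)
{
    assert(!g_initialized);
    g_init_count++;
    g_initialized = true;
}

/* The check every public entry point has to repeat. */
static void mydll_ensure_init(void)
{
    if (!g_initialized)
        mydll_init();
}

/* Simulates a forked child: the per-process initialization is not redone
   for it, so the DLL must treat itself as uninitialized again. */
void mydll_simulate_fork_child(void)
{
    g_initialized = false;
}

int mydll_add(int a, int b)
{
    mydll_ensure_init();   /* guard needed here... */
    return a + b;
}

int mydll_negate(int a)
{
    mydll_ensure_init();   /* ...and here, and in every other export */
    return -a;
}

/* Number of times initialization has run so far. */
int mydll_init_count(void)
{
    return g_init_count;
}
```

With a fork callback that runs once LIBC is usable, the re-initialization could be done in one place and the per-function guards would not be needed.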

Attachments (1)

fork_completion_callback.diff (3.8 KB) - added by dmik 3 years ago.


Change History (8)

comment:1 Changed 4 years ago by dmik

To be more exact, it fails in _hmalloc. Here is an excerpt from the LIBC log:

000477a1 01 02 0002 Entr 0000 _hmalloc: cb=272
000477a1 01 03 000d Entr 0000 __fmutex_request_internal: sem=0x2003013c{.pszDesc=LIBC Heap} flags=0x1 fs=0x2
0004f8ed 01 04 0000 Entr 0000 __libc_Back_panicV: fFlags=0x0 pvCtx=0x00000000 pszFormat=0x1f6808d8:{fmutex deadlock: %s
%x: Owner=%x Self=%x fs=%x flags=%x hev=%x

and here is the console output:

fmutex deadlock: Owner died!
0x2003013c: Owner=0x00510001 Self=0x00520001 fs=0x3 flags=0x0 hev=0x00010005
            Desc="LIBC Heap"
pid=0x0052 ppid=0x0050 tid=0x0001 slot=0x0093 pri=0x0200 mc=0x0000 ps=0x0010
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.

So it's clear that the LIBC Heap fmutex is owned by the parent for the duration of fork(). Perhaps this can be fixed in a way similar to #363.

comment:2 Changed 4 years ago by dmik

Hmm, there is a umForkCompletion callback that does exactly that: it releases the LIBC heaps in the child. However, it seems to be called after the callback I add in response to __LIBC_FORK_OP_FORK_PARENT. That is because completion callbacks are called in LIFO order and my callback is installed much later than umForkCompletion (which is added in response to __LIBC_FORK_OP_EXEC_PARENT, handled with priority 0xffffff01). I need to try using __LIBC_FORK_OP_EXEC_PARENT as well, together with a higher priority for my parent callback, something like 0xfffffff (the highest one).


comment:3 Changed 4 years ago by dmik

Well, using the highest priority doesn't help for some reason: my callback still gets called first. I need to enable LIBC logging again to see what's going on.


comment:4 Changed 4 years ago by dmik

Okay, it turns out that fork callbacks (including their priorities) are processed per module, and the module processing order matches the link order: LIBC callbacks are processed first, then LIBCX callbacks and, always last, EXE callbacks. This makes it impossible to register a completion callback that would be called *after* all LIBC callbacks, and hence impossible to use the LIBC Heap from 3rd-party fork callbacks. This looks like a missing feature to me.
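The ordering described above can be modeled with a small toy program. This is only a sketch of the observed behavior, not LIBC's actual code: the toy_* types and names are invented, and it assumes that a higher priority value runs earlier within a module. It shows that a priority only competes with other callbacks of the same module, so even the maximum value in the EXE cannot get ahead of any LIBC callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only; these types are not the real LIBC structures. */
typedef struct {
    const char *name;
    unsigned    priority;
} toy_callback;

typedef struct {
    const char   *module;
    toy_callback *callbacks;
    size_t        count;
} toy_module;

/* Assumption: within a module, a higher priority value runs earlier. */
static int by_priority_desc(const void *a, const void *b)
{
    unsigned pa = ((const toy_callback *)a)->priority;
    unsigned pb = ((const toy_callback *)b)->priority;
    return (pa < pb) - (pa > pb);
}

/* Walks the modules in the given (link) order and sorts only inside each
   module. Stores callback names in call order and returns how many ran. */
size_t toy_run_fork_callbacks(toy_module *mods, size_t nmods,
                              const char **out, size_t cap)
{
    size_t n = 0;
    for (size_t m = 0; m < nmods; m++) {
        qsort(mods[m].callbacks, mods[m].count,
              sizeof(toy_callback), by_priority_desc);
        for (size_t i = 0; i < mods[m].count; i++) {
            assert(n < cap);
            out[n++] = mods[m].callbacks[i].name;
        }
    }
    return n;
}

/* Demo: position of the named callback in the call order, or -1. */
int toy_demo_position(const char *name)
{
    toy_callback libc[]  = { { "libc_low",  0x00000001u },
                             { "libc_heap", 0xffffff01u } };
    toy_callback libcx[] = { { "libcx_cb",  0x80000000u } };
    toy_callback exe[]   = { { "exe_top",   0xffffffffu } };
    toy_module mods[] = {
        { "LIBC",  libc,  2 },
        { "LIBCX", libcx, 1 },
        { "EXE",   exe,   1 },
    };
    const char *order[4];
    size_t n = toy_run_fork_callbacks(mods, 3, order, 4);

    for (size_t i = 0; i < n; i++)
        if (strcmp(order[i], name) == 0)
            return (int)i;
    return -1;
}
```

In this model exe_top has the highest priority of all, yet it still runs last, after libc_low with priority 1.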

A backward-compatible solution would be to add pfnCompletionCallbackEx, which takes a boolean flag indicating whether the callback should be added to the end of the list (executed last) or to the front of the list (executed first). Or, even more compatible, the enmContext argument of the existing pfnCompletionCallback could accept a __LIBC_FORK_CTX_APPEND flag (added to the __LIBC_FORKCTX enum) that instructs LIBC to add the callback to the end of the list so that it is executed last.
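A rough model of the flag variant, to show the intended effect. Everything here is hypothetical: the toy_* names, the TOY_CTX_APPEND constant and the list layout are illustrations and not the LIBC API. The default path inserts at the front (LIFO, as completion callbacks behave today), while the append flag puts the callback at the end so it runs after the one that releases the heap.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_COMPLETIONS 16

typedef void (*toy_completion_fn)(void *arg);

typedef struct {
    toy_completion_fn fn;
    void             *arg;
} toy_completion;

typedef struct {
    toy_completion items[TOY_MAX_COMPLETIONS];
    size_t         count;
} toy_completion_list;

/* Hypothetical flags modeled on the __LIBC_FORK_CTX_APPEND idea. */
enum { TOY_CTX_DEFAULT = 0, TOY_CTX_APPEND = 1 };

/* items[0] runs first. Default: insert at the front (LIFO).
   With TOY_CTX_APPEND: add at the end, so the callback runs last. */
int toy_register_completion(toy_completion_list *list,
                            toy_completion_fn fn, void *arg, int flags)
{
    toy_completion entry = { fn, arg };

    if (list->count >= TOY_MAX_COMPLETIONS)
        return -1;
    if (flags & TOY_CTX_APPEND) {
        list->items[list->count] = entry;
    } else {
        for (size_t i = list->count; i > 0; i--)
            list->items[i] = list->items[i - 1];
        list->items[0] = entry;
    }
    list->count++;
    return 0;
}

void toy_run_completions(toy_completion_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        list->items[i].fn(list->items[i].arg);
}

typedef struct {
    int heap_locked;        /* 1 while the "LIBC Heap" is still owned by the parent */
    int saw_heap_unlocked;  /* what the 3rd-party callback observed */
} toy_state;

/* Plays the role of umForkCompletion: releases the heap in the child. */
static void toy_release_heap(void *arg)
{
    ((toy_state *)arg)->heap_locked = 0;
}

/* Plays the role of a 3rd-party completion callback that wants the heap. */
static void toy_third_party(void *arg)
{
    toy_state *st = (toy_state *)arg;
    st->saw_heap_unlocked = !st->heap_locked;
}

/* Returns 1 if the 3rd-party callback ran with the heap already released. */
int toy_demo(int flags)
{
    toy_state st = { 1, 0 };
    toy_completion_list list = { .count = 0 };
    int rc;

    /* LIBC registers first, the 3rd-party module registers later. */
    rc = toy_register_completion(&list, toy_release_heap, &st, TOY_CTX_DEFAULT);
    assert(rc == 0);
    rc = toy_register_completion(&list, toy_third_party, &st, flags);
    assert(rc == 0);

    toy_run_completions(&list);
    return st.saw_heap_unlocked;
}
```

In this model toy_demo(TOY_CTX_DEFAULT) returns 0 because the later registration runs first and still sees the heap locked, while toy_demo(TOY_CTX_APPEND) returns 1. Existing callers that pass no flag keep the current behavior, which is what makes the flag approach backward compatible.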


comment:5 Changed 4 years ago by lewisr

Cc: lewisr added

Changed 3 years ago by dmik

comment:6 Changed 3 years ago by dmik

Attached a patch that solves this problem by letting the callback be called last when a new __LIBC_FORK_CTX_FLAGS_LAST flag is passed. This serves the needs of LIBCx well: I can finally perform post-fork tasks in the child with LIBC itself fully operational (its Heap is not locked, all file handles are properly inherited, etc.). Of course, this may also be useful for other DLLs that want to support forking.

The only disadvantage is that the forked child cannot fail gracefully if something goes wrong, because the completion callback is called when the communication between the parent and the child is mostly over. It would be better to have a fully functioning LIBC at __LIBC_FORK_OP_FORK_PARENT / __LIBC_FORK_OP_FORK_CHILD time. However, given that LIBC finalizes itself from the completion callback as well, this would require either changing the way LIBC uses callbacks for its own needs or calling completion callbacks per module too (rather than keeping them in a global list that is processed at once at the end, after all per-module callbacks, as is done now). Both approaches seem non-trivial to implement. Given that we can still fail in the child by simply calling exit(1) or similar inside the completion callback (which results in EINTR from fork() on the parent side and a LIBC PANIC complaint in the child), my patch seems fine.


comment:7 Changed 3 years ago by dryeo

Cc: dryeo added