Opened 7 years ago
Closed 7 years ago
#276 closed defect (fixed)
python: thread start hangs
Reported by: | dmik | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | python | Version: | |
Severity: | low | Keywords: | |
Cc: |
Description
Sometimes python hangs when attempting to start helper threads. See http://trac.netlabs.org/rpm/ticket/275#comment:6 for details.
Change History (4)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
I overlooked the Thread 2 state in the debugger, which was Critical. This means that the thread was blocked by some other thread that entered the critical section with DosEnterCritSec. Looking more closely at the pthread code made it obvious: _pthread_detach on Thread 1 calls DosEnterCritSec and then requests the LIBC heap's fast mutex in calloc(). However, if this mutex happens to be owned by some other thread at that time (Thread 2 in this case, which took the same heap mutex through calloc() too), it's a guaranteed deadlock, as the other thread won't get any chance to run and release the held lock.
Note that deadlocks involving DosEnterCritSec are known for causing system freezes and producing unkillable zombies. Given that pthread is used in many ports we maintain, this dangerous code might be called in many cases, so I wonder how many freezes/halts will be fixed when we fix this... Anyway, this deserves a separate ticket in Ports.
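The lock-ordering problem described above can be modeled in a few lines of Python. This is only a sketch under stated assumptions, not the pthread or LIBC code: `simulate_deadlock`, `heap_mutex` and the two events are hypothetical names, Python cannot actually suspend other threads the way DosEnterCritSec does, so the "frozen" worker is modeled with an event, and the timeout exists only so that the sketch terminates instead of hanging like the real program.

```python
import threading


def simulate_deadlock(timeout=0.2):
    """Model: thread 1 freezes all other threads, then needs a lock one of them holds."""
    heap_mutex = threading.Lock()              # stands in for the LIBC heap's fast mutex
    worker_holds_mutex = threading.Event()     # set once "Thread 2" owns the mutex
    critical_section_left = threading.Event()  # set when "Thread 1" leaves the critical section

    def worker():
        # "Thread 2": inside calloc(), holding the heap mutex.
        with heap_mutex:
            worker_holds_mutex.set()
            # While another thread is inside the critical section this thread
            # is never scheduled; modeled here as waiting for it to be left.
            critical_section_left.wait()

    thread2 = threading.Thread(target=worker)
    thread2.start()
    worker_holds_mutex.wait()

    # "Thread 1": the critical section is entered here (model of DosEnterCritSec),
    # then calloc() asks for the heap mutex. The real call would block forever;
    # the timeout is an assumption of this sketch.
    acquired = heap_mutex.acquire(timeout=timeout)
    if acquired:
        heap_mutex.release()

    # Model of leaving the critical section, which the real code never reaches.
    critical_section_left.set()
    thread2.join()
    return acquired


if __name__ == "__main__":
    print("heap mutex acquired inside critical section:", simulate_deadlock())
```

In this model the mutex can never be acquired while the owner is frozen, which is why taking any lock after entering such a critical section is unsafe unless no other thread can own it.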
comment:3 by , 7 years ago
I created http://trac.netlabs.org/ports/ticket/175 for the pthread problem.
comment:4 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
The problem has completely gone away with http://trac.netlabs.org/ports/ticket/175, closing.
The test case is this application
Popen.communicate() basically creates two threads to read from the child process output and then waits until these threads end. I can connect to this application when it hangs and this is what I see:
Thread 1 (which calls communicate()):
Thread 2 (one of worker threads created by python, it reads the child stdout):
It looks like the worker thread gets stuck in __std_bzero while allocating zero-filled memory from the heap. Since this allocation holds the heap mutex lock, Thread 1 is unable to acquire it when doing its own allocation and gets stuck too. It's unclear so far why __std_bzero does not return.
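For readers unfamiliar with the call pattern discussed in this comment, here is a minimal illustration of Popen.communicate() reading both output pipes of a child. It is not the original test case from this ticket, only a hypothetical stand-in: `run_child` and the child's one-line script are assumptions. Whether communicate() uses helper threads internally depends on the platform and the Python build.

```python
import subprocess
import sys


def run_child():
    """Spawn a child that writes to stdout and stderr, then collect both streams."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; print('out'); sys.stderr.write('err')"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() reads both pipes until EOF and waits for the child to exit.
    # On builds that use helper threads for this, the thread start happens here.
    out, err = child.communicate()
    return child.returncode, out, err


if __name__ == "__main__":
    print(run_child())
```

Running such a script in a loop is one plausible way to exercise the thread start path repeatedly, since the hang is reported to occur only sometimes.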