Opened 3 years ago

Closed 3 years ago

Last modified 2 years ago

#159 closed defect (fixed)

Occasional hangs on Ctrl-C

Reported by: dmik Owned by:
Priority: major Milestone: GA2
Component: general Version: 1.6.0-b22 GA
Severity: medium Keywords:
Cc: java6@…

Description

Sometimes the Java process hangs in the exit list on thread 1 when terminating the Java application by sending it a Ctrl-C signal.

This is rare, so it is easier to catch with the debug build of Odin/Java. In either case it's not nice and needs to be fixed.

Change History (8)

comment:1 Changed 3 years ago by dmik

This is from comment 12 in #133:

I tried Ctrl-C several times with the new OpenFire version and can't make it hang any more...

What I can see in Odin though, since we switched it to GCC, is that pressing Ctrl-C in debug builds may lead to the infamous "LIBC panic" message. My quick tests showed that this happens if the thread which is currently writing to the log file is terminated (by the Ctrl-C handler) while holding the fmutex used to serialize file access in LIBC, which is why it complains so loudly. In some cases this could lead to the exit thread hanging. The next time I get this, I will try to fix it in LIBC. Clearly, it should not panic in such cases.

This problem may be the reason for the OpenFire hang as well. I will test some more.

comment:2 Changed 3 years ago by yoda

As I mentioned in #133:
The hang with control-C with OpenFire has been there all the time - also
with the VAC builds of Odin. The best way to avoid it is to let OF be idle
for some time with no clients connected - then there is the biggest chance
that control-C will work.

The hangs are very 'hard' - you can often not even shut down the PC.

comment:3 Changed 3 years ago by yoda

  • Cc java6@… added

comment:4 Changed 3 years ago by dmik

Ok, I think I now more or less know why we sometimes hang at Ctrl-C. The Odin code internally uses a class called VMutex to serialize access to many resources. This class in turn uses a set of DosEnterCriticalSection() functions which use a couple of interlocked exchange operations paired with an event semaphore to implement the mutex functionality.

However, if a thread that currently holds such a VMutex instance is unexpectedly terminated (e.g. by the Ctrl-C handler telling the process to exit and terminate all threads), the mutex remains in a held state. If any other thread requests this mutex after it has been issued a terminate request (e.g. in its exception handler), it will never get it (since the owning thread is already dead) and will hang forever. This in turn causes thread 1 (which is responsible for handling process termination and killing other threads) to hang forever, leaving the well-known unkillable zombie with the "exit thread 1" status.
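The orphaned-lock scenario described above can be sketched roughly as follows. This is only an illustration, not Odin's code: `SketchMutex`, `tryEnter`, `leave` and `exitPathCanLog` are hypothetical names, thread IDs are plain integers, and a non-blocking `tryEnter` stands in for the real blocking wait so that the example returns instead of hanging.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a VMutex-like lock (not the real Odin class).
// 0 means "free"; any other value is the ID of the owning thread.
class SketchMutex {
public:
    // Try to take the lock with an interlocked compare-and-exchange.
    // Real code would block on an event semaphore instead of returning false.
    bool tryEnter(int threadId) {
        int expected = 0;
        return owner_.compare_exchange_strong(expected, threadId);
    }

    // Release the lock, but only if threadId is the current owner.
    void leave(int threadId) {
        int expected = threadId;
        owner_.compare_exchange_strong(expected, 0);
    }

    int owner() const { return owner_.load(); }

private:
    std::atomic<int> owner_{0};
};

// Models the hang: thread 2 is terminated while holding the lock, then
// thread 3 asks for the same lock from its exception handler.
bool exitPathCanLog() {
    SketchMutex logMutex;
    bool taken = logMutex.tryEnter(2);  // thread 2 enters to write a log line
    assert(taken);
    // Thread 2 is killed here by process termination: leave(2) never runs.
    return logMutex.tryEnter(3);        // fails: the dead thread still owns the lock
}
```

With a blocking wait in place of `tryEnter`, the second request would never return, which is the state thread 1 then waits on forever.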

This condition is easy to catch in the debug build because it does heavy logging (also in the exception handler) through WriteLog(), which uses VMutex. But it's not only about logging: there may be other functions called from the exception handlers which use VMutex (or a similar mutex implementation). I need to check that.

comment:5 Changed 3 years ago by dmik

I feel like this has something to do with #33.

comment:6 Changed 3 years ago by dmik

I fixed the Ctrl-C hang in release builds within #33 (r341).

Now I'm going to fix the Ctrl-C hang in debug builds that happens in WriteLog (which has nothing to do with #33) by installing an exception handler and releasing the log mutex upon exception. This should help.
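One way to picture this approach, as a hedged sketch rather than the actual Odin change: a per-thread termination hook releases the log lock if the dying thread still owns it. `SketchMutex`, `onThreadTermination` and `exitPathCanLogWithFix` are hypothetical names, and the hook merely stands in for the exception handler mentioned above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the log mutex (not the real Odin class).
// 0 means "free"; any other value is the ID of the owning thread.
class SketchMutex {
public:
    // Non-blocking acquire via interlocked compare-and-exchange.
    bool tryEnter(int threadId) {
        int expected = 0;
        return owner_.compare_exchange_strong(expected, threadId);
    }

    // Release the lock, but only if threadId is the current owner.
    void leave(int threadId) {
        int expected = threadId;
        owner_.compare_exchange_strong(expected, 0);
    }

    int owner() const { return owner_.load(); }

private:
    std::atomic<int> owner_{0};
};

// Hypothetical hook playing the role of the exception handler: called for a
// thread that is being terminated, it drops the log lock if that thread owns it.
void onThreadTermination(SketchMutex& logMutex, int dyingThreadId) {
    logMutex.leave(dyingThreadId);  // no-op when another thread is the owner
}

// Same scenario as the hang, but with the hook run for the dying thread.
bool exitPathCanLogWithFix() {
    SketchMutex logMutex;
    bool taken = logMutex.tryEnter(2);  // thread 2 enters to write a log line
    assert(taken);
    onThreadTermination(logMutex, 2);   // thread 2 is killed; its handler releases the lock
    return logMutex.tryEnter(3);        // succeeds: the lock is free again
}
```

Because `leave` checks the owner, running the hook for a thread that does not hold the lock leaves the lock untouched.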

comment:7 Changed 3 years ago by dmik

  • Resolution set to fixed
  • Status changed from new to closed

I fixed the problem with WriteLog in Odin, r 21979. I also had to disable one assertion in JVM (which is not correct on OS/2) in r342. And finally I had to fix LIBC itself, see http://svn.netlabs.org/libc/ticket/256 for details.

Now the debug version of Java doesn't hang if I terminate it with Ctrl-C.

Note that the LIBC fix can also affect release builds, since it fixes a generic file I/O problem.

comment:8 Changed 2 years ago by yoda

This problem seems to have come back with some of the latest Odin builds,
as OpenFire hangs almost every time I try it.

Could you please recheck this?
