Opened 14 years ago
Last modified 13 years ago
#118 new task
SMP: mprotect-based membars don't work well
Reported by: | dmik | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | Next |
Component: | general | Version: | 1.6.0-b22 WSE |
Severity: | highest | Keywords: | |
Cc: |
Description
Investigation within #96 has shown that the newer mprotect-based membar technique used in recent Java versions does not work reliably on OS/2 under the SMP kernel. The cause is not yet known.
The new technique was added to Java to reduce the overhead of executing membar instructions after each thread state transition. See the Java bug record at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5075546 for more information.
The main idea of the new technique is to use a special page (the serialize page) in which each thread has its own word that it writes to whenever it changes its state. When the VM dispatcher thread needs to be sure about the thread states after some change, it sets the memory protection of this page to READONLY and then immediately switches it back to READWRITE. As I understand it, this should cause the CPU to flush caches and make sure all cores see the same values of the thread state variables. Because the protection change can happen while one of the threads is writing to the page, that write may raise an access violation exception. The exception must be handled gracefully: the handler waits until the VM thread restores READWRITE protection and then retries the operation. This is how it is done in Java.
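For illustration, the page toggle described above can be sketched with POSIX calls. This is a minimal sketch, not the HotSpot code: all function and variable names are hypothetical, it uses `mmap`/`mprotect` rather than the OS/2 memory API, and it leaves out the access violation handling that a multi-threaded version needs.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names throughout; this only mirrors the scheme described
   in this ticket and is not taken from the Java sources. */
static volatile int *serialize_page;   /* one state word per thread */
static size_t serialize_page_size;

/* Map one private page to hold the per-thread state words.
   Returns 0 on success, -1 on failure. */
int serialize_page_init(void)
{
    serialize_page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, serialize_page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    serialize_page = p;
    return 0;
}

/* Mutator thread: publish a state change with a plain store, no membar. */
void thread_set_state(size_t slot, int state)
{
    serialize_page[slot] = state;
}

/* VM thread: read back the state word of one thread. */
int thread_get_state(size_t slot)
{
    return serialize_page[slot];
}

/* VM thread: flip the page to READONLY and immediately back to READWRITE.
   The protection change is what is expected to serialize the other CPUs. */
int serialize_all_threads(void)
{
    if (mprotect((void *)serialize_page, serialize_page_size, PROT_READ) != 0)
        return -1;
    if (mprotect((void *)serialize_page, serialize_page_size,
                 PROT_READ | PROT_WRITE) != 0)
        return -1;
    return 0;
}
```

In this sketch the VM thread would call `serialize_all_threads()` before reading any slot with `thread_get_state()`. A store that lands between the two `mprotect` calls faults, which is where the wait-and-retry handling comes in.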
When running under the SMP kernel on OS/2, this technique usually works. Sometimes, however, retrying the write to the serialize page after READWRITE mode has been restored corrupts some registers (which are normally saved when an exception occurs and should be restored when the exception handler continues execution). As a result, random memory locations get written afterwards and/or random functions get called, which eventually crashes the application with another memory access violation, a bad stack condition and so on.
This ticket tracks the attempt to solve the issues described above.
Note that r297 applies the workaround from #96, which forces the -XX:+UseMembar option on all JVM invocations.
Attachments (2)
Change History (8)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Description: | modified (diff) |
---|
by , 14 years ago
Attachment: | No_FS_abuse_Odin.diff added |
---|
by , 14 years ago
Attachment: | No_FS_abuse_Java.diff added |
---|
comment:3 by , 14 years ago
Just for the record: I added some patches I created for Odin/Java that remove FS switching. I wrote them while trying to prove that this switching was the reason why the mprotect scheme fails. However, I couldn't prove that: similar failures happen even when no FS switching takes place and we only maintain a single OS/2 exception chain.
comment:4 by , 14 years ago
Milestone: | Enhanced → GA2 |
---|
comment:5 by , 13 years ago
This ticket contains information about some other problems related to memory corruption on SMP during exception processing: http://svn.netlabs.org/odin32/ticket/37.
comment:6 by , 13 years ago
Milestone: | GA2 → Next |
---|
I don't have any valid guess about the cause of these problems (or about why they happen in SMP mode only). To get one, we first need to write a simple test case that does not involve Odin at all and just implements the described membar technique.
This test case should emulate more or less what goes on in Java when the deprecated -XX:+UseMembar option is not used.
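As a starting point, such a test case might look like the following on a POSIX system. This is only a sketch under several assumptions: the names, the thread count and the spin-wait in the handler are all hypothetical, and it uses `mprotect` and signals rather than the OS/2 memory and exception APIs, so it would have to be ported before it can reproduce the problem reported here.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NUM_MUTATORS 4                  /* hypothetical thread count */

static volatile unsigned *test_page;    /* one state word per mutator thread */
static size_t test_page_size;
static atomic_int page_readonly;        /* 1 while the VM thread holds the page READONLY */
static atomic_int stop_mutators;
static atomic_long fault_count;

/* Fault handler: wait until READWRITE is restored, then return so that
   the faulting store is re-executed. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;
    char *base = (char *)test_page;
    if (addr < base || addr >= base + test_page_size)
        _exit(2);                       /* not our page: a genuine crash */
    atomic_fetch_add(&fault_count, 1);
    while (atomic_load(&page_readonly))
        ;                               /* spin until the VM thread restores access */
}

/* Mutator thread: keep writing new state values into its own word. */
static void *mutator(void *arg)
{
    size_t slot = (size_t)(uintptr_t)arg;
    unsigned state = 0;
    while (!atomic_load(&stop_mutators))
        test_page[slot] = ++state;
    return NULL;
}

/* Plays the VM thread: toggles the page protection `rounds` times while the
   mutators run. Returns the number of handled faults, or -1 on setup error. */
long run_membar_test(int rounds)
{
    pthread_t threads[NUM_MUTATORS];
    struct sigaction sa;

    test_page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, test_page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    test_page = p;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Some systems report a protection fault as SIGBUS instead of SIGSEGV. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    atomic_store(&stop_mutators, 0);
    for (size_t i = 0; i < NUM_MUTATORS; i++)
        if (pthread_create(&threads[i], NULL, mutator, (void *)(uintptr_t)i) != 0)
            return -1;

    for (int r = 0; r < rounds; r++) {
        atomic_store(&page_readonly, 1);
        mprotect(p, test_page_size, PROT_READ);
        mprotect(p, test_page_size, PROT_READ | PROT_WRITE);
        atomic_store(&page_readonly, 0);
    }

    atomic_store(&stop_mutators, 1);
    for (size_t i = 0; i < NUM_MUTATORS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&fault_count);
}
```

The returned fault count varies from run to run. To catch the register corruption described in this ticket, each mutator would also need to verify its own registers and state word after every retried store; this sketch does not do that yet. Building it may require the compiler's thread option (for example `-pthread`).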