Opened 13 years ago

Last modified 11 years ago

#127 new defect

-Xmx400m and above crashes SMP machine when concurrently running VPC

Reported by: Andreas Buchinger
Owned by:
Priority: Feedback Pending
Milestone:
Component: general
Version: 1.6.0-b22 WSE
Severity: highest
Keywords:
Cc: Steven Levine, Andreas Buchinger

Description

When running VPC and starting SmartSynchronize with -Xmx400m and above, all sorts of weird system crashes happen on an SMP machine.

-Xmx512m -- immediate reboot
-Xmx400m(?) -- trap000e, pressing any key triggers trap0008....
-Xmx350m -- seems to work well

without VPC running -
P:\java6\bin\java.exe -mx800m -jar ..\lib\smartsynchronize.jar
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
which is what I expect with the current VIRTUALADDRESSLIMIT=1536 setting. But it never crashes. Works nicely with -mx750m.

On a single-core ThinkPad it never crashes, even with VPC running. Starting with -Xmx450m, only the Java app hangs at startup, and it can be killed with CTRL-C. No crash.

Odin version ftp://ftp.dmik.org/tmp/j/ 23.06.11
Java version ftp://ftp.dmik.org/tmp/j/ 10.06.11

Attachments (1)

DRGTXT38.INI.NOK (1.4 KB) - added by Andreas Buchinger 13 years ago.


Change History (23)

Changed 13 years ago by Andreas Buchinger

Attachment: DRGTXT38.INI.NOK added

comment:1 Changed 13 years ago by Andreas Buchinger

To show that this is a serious bug, I attached what I found in the P:\Util\DragText38\DRGTXT38.INI file after such a crash. It seems to be the Odin log. I don't want to imagine which other files were overwritten too.

comment:2 Changed 13 years ago by dmik

Priority: major → Feedback Pending

Thank you for the report.

The only thing the -Xmx option does is set the size of the Java heap -- a contiguous block of linear memory in the private arena reserved at JVM startup (ultimately, just a single DosAllocMem() call, so there is no room for multi-threaded race conditions).

So my wild guess is that attempts to allocate a block of size close to or above the limit screw up the SMP kernel on your system.

Please give us more information about your system: which OS/2 version, which kernel, etc.

Note that OS/2 is known to be unstable when VIRTUALADDRESSLIMIT is set to a value higher than 1024. See this part of README.OS2 for one example. So you may be hitting a similar issue with one of your device drivers, for instance. I therefore suggest you test it with no VIRTUALADDRESSLIMIT set in config.sys to see if it makes any difference.

Also, please tell us what version of VPC you use and where to get it. Also, describe the minimal set of steps in order to reproduce the problem. And try something very simple like 'java -version' instead of starting a real Java application.

All in all, we need more information and a good way to reproduce.

comment:3 Changed 13 years ago by Andreas Buchinger

eCS2.0rc6 with updates (JFS...)

Kernel -
[d:\] bldlevel OS2KRNL
Build Level Display Facility Version 6.12.675 Sep 25 2001
(C) Copyright IBM Corporation 1993-2001
Signature: @#IBM:14.104a#@_SMP IBM OS/2 Kernel
Vendor: IBM
Revision: 14.104
File Version: 14.104
Description: _SMP IBM OS/2 Kernel

VPC - Innotek VPC 5.1 (Build 370, 2002-10-15) - no clue if vpcos2.zip is still available at the old innotek sites but if necessary I can find it somewhere.

I've been using VIRTUALADDRESSLIMIT=1536 for years with JFS cache sizes from /CACHE:256000 to 800000 (all tests with 256000) and HPFS386 cache size=30000.

{0}[d:\] java -version
openjdk version "1.6.0-wse"
OpenJDK Runtime Environment (build 1.6.0-wse-b22)
OpenJDK Client VM (build 19.0-b09, mixed mode)

java -Xmx900m -version --> ..Could not reserve enough space for object heap..
java -Xmx800m -version --> ..Could not reserve enough space for object heap..
but java -Xmx700m -version --> trap 000e

Test with VIRTUALADDRESSLIMIT=1024 will follow


comment:4 Changed 13 years ago by dmik

So, you cannot trap it with VIRTUALADDRESSLIMIT=1024 no matter what -Xmx option you use?

Second question: by "VPC start afterwards gives a warning", do you mean that you close the Java application and then start VPC, or that you keep Java running and start VPC in parallel?

comment:5 Changed 13 years ago by Andreas Buchinger

It seems that with VIRTUALADDRESSLIMIT=1024, VPC (at least the W2k guest with which I've made all my tests so far) is not able to start anymore. WPS does not respond, with XCenter showing 99.9% for one CPU. It takes some time with all these reboots and chkdsks (eCS and W2k guest) to get reproducible results. Even with no Java running, VPC does not start anymore.

I have to test even more. I sent the above only because I wanted to have it saved before the next trap... So let me test again tomorrow to give more input. Sorry, no time left today.

comment:6 Changed 13 years ago by Andreas Buchinger

All tests from now on are made with the GA version:
{0}[p:\smartsync] java -Xmx542m -version
openjdk version "1.6.0-ga"
OpenJDK Runtime Environment (build 1.6.0-ga-b22)
OpenJDK Client VM (build 19.0-b09, mixed mode)

Traps can be reproduced with a VPC guest memory setting of 512MB or more. I had no traps with guests with 256MB or less memory. Giving a guest 800MB of memory (which I have used for years) needs a Virtualaddresslimit setting of more than 1024 (I had 1536). With Virtualaddresslimit=1024 I cannot produce traps. But maybe this is because in this case the max. Java heap is about 310MB (-Xmx310m). Higher values give '...Could not reserve enough space for object heap...'

With Virtualaddresslimit=1280:
Java heap max. is now 542MB (-Xmx542m).

  • no problem as long as VPC is not running.
  • no problem with VPC guest memory setting 96MB/128MB/256MB (java max. heap 542MB all the time)
  • immediate reboot with VPC guest memory setting 512MB and starting 'java -Xmx440m -version'
  • trap000e with VPC guest memory setting 512MB and starting 'java -Xmx510m..' or -Xmx500m
  • but '...Could not reserve enough space for object heap...' with VPC guest memory setting 512MB and starting 'java -Xmx540m' or -Xmx530m or -Xmx520m

The problem seems to occur only when VPC uses a lot of memory (512MB) and Java uses a lot of memory too. This is only possible with Virtualaddresslimit>1024. As VPC is Odin-based too, maybe there's a problem with more than one Odin-based application using a lot of memory. I'm currently testing with multiple Java apps to check this, but so far with no traps. But Java apps seem to allocate memory only when they really need it, unlike VPC, which immediately allocates all of its memory. So it's not that easy to get comparable results memory-wise. I'm still not sure if this is only a VPC problem.

The weird thing is that the problem occurs only with some combinations of memory settings (VPC 512MB and Java -Xmx540m gives no trap but '...Could not reserve...' as stated above, while e.g. -Xmx510m traps).

One indication of a Java memory problem is that I cannot start smartsynchronize with -Xmx512m, let it do real work, and at the same time start smartsvn with -Xmx512m (smartsvn starts but hangs at the logo screen). This is without VPC started.

All in all, this needs even more testing. But maybe we should ask Knut if he has a clue about this, as he probably knows more about VPC and Odin than most of us.


comment:7 Changed 13 years ago by Andreas Buchinger

Related to this, I once got a popuplog.os2 entry while running smartsynchronize with java6 -Xmx512m plus the hanging smartsvn with java6 -Xmx512m, and additionally trying 'java -Xmx542m -version', which gave return code 5 and this popuplog.os2 entry:


06-29-2011 21:34:44 SYS3175 PID 00fa TID 0003 Slot 0127
P:\JAVA6\BIN\JAVA.EXE
c0000005
1cc3accf
P1=00000001 P2=00000098 P3=XXXXXXXX P4=XXXXXXXX
EAX=00000000 EBX=00000001 ECX=159b070c EDX=159b070c
ESI=0521fd40 EDI=0521fcc4
DS=0053 DSACC=f0f3 DSLIM=ffffffff
ES=0053 ESACC=f0f3 ESLIM=ffffffff
FS=288f FSACC=00f3 FSLIM=00000fff
GS=0000 GSACC= GSLIM=
CS:EIP=005b:1cc3accf CSACC=f0df CSLIM=ffffffff
SS:ESP=0053:0521fb18 SSACC=f0f3 SSLIM=ffffffff
EBP=0521fc6c FLG=00010246

JVM.DLL 0001:0049accf

As said, this happened only once. All other times it returned the usual Java version.

comment:8 Changed 13 years ago by dmik

Thank you for the feedback.

If you read README.OS2 carefully, there is a detailed explanation of the dependency between the maximum Java heap size and the VIRTUALADDRESSLIMIT setting. This more or less relates to any application using high memory, not only Java (you may try e.g. Open Office etc).

Yes, Java only reserves memory at startup. It commits it as needed. I don't know about VPC. What is the amount of real physical memory on your machine? Are you sure it is enough for both Java and VPC, given what you request for them?
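The difference between reserved and committed heap can be observed from inside the JVM. The following is only a minimal sketch, not something used in this ticket: the class name HeapReport and its helper methods are made up for illustration, and it relies solely on the standard java.lang.Runtime methods maxMemory(), totalMemory() and freeMemory(). The "max" figure corresponds to what -Xmx reserves (it may be reported slightly below the -Xmx value), while "committed" is what the JVM has actually committed so far.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapReport {
    /** Converts a byte count to whole mebibytes (rounded down). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Formats the reserved (max), committed and free heap on one line. */
    public static String describe(Runtime rt) {
        return "max=" + toMiB(rt.maxMemory()) + "m"
             + " committed=" + toMiB(rt.totalMemory()) + "m"
             + " free=" + toMiB(rt.freeMemory()) + "m";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("at startup:  " + describe(rt));

        // Keep 32 blocks of 1 MiB reachable so the heap really has to hold them.
        List<byte[]> blocks = new ArrayList<byte[]>();
        for (int i = 0; i < 32; i++) {
            blocks.add(new byte[1024 * 1024]);
        }
        System.out.println("after alloc: " + describe(rt)
                + " (" + blocks.size() + " blocks held)");
    }
}
```

Run with, for example, 'java -Xmx350m HeapReport' (a value reported as working above) and compare the two lines: "max" should stay the same, while "committed" may grow after the allocation. Run without any -Xmx option, the "max" figure shows the default the JVM picked.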

Anyway, it looks like a kernel bug related to managing high memory to me. The absence of traps in case of VIRTUALADDRESSLIMIT=1024 only proves that. I don't think we can do a lot here, but if you give me a link to a complete VPC package that I need to reproduce the problem (including any VHD/VM configurations this may require), I will try to reproduce it locally at least.

No promise of a quick fix for that. For now, either avoid using VPC and Java together or live with VIRTUALADDRESSLIMIT=1024.

comment:9 Changed 13 years ago by Andreas Buchinger

Thanks for looking into this. Now that I know the problem and how to work around it, I can live with it. But I'm still not sure if it's only a problem in conjunction with VPC or if it may appear with other Odin/Java apps too.

I have 4GB installed and eCS sees about 3.4GB. And yes, I've checked multiple times with memtest86. So there should be no problem with real available memory. Though I cannot remember ever seeing eCS use more than 1.5GB (according to the XCenter memory display).

No traps with VIRTUALADDRESSLIMIT=1024 can mean that

  • the kernel/OdinMemoryManagement/OdinExceptionHandler/....? traps only when using more than 1GB or memory beyond 1GB
  • or two concurrently running Odin/Java apps have problems only with more than 512MB per app (which is only possible with Virtualaddresslimit>1024)
  • or ???

If we were certain this is only a VPC problem, I wouldn't spend another minute on it. But I have the bad feeling this problem will hit eCS again sooner or later...

VPC/VHDs - I have to try with stripped-down images first, because I really don't want to upload >8GB somewhere. It would take days with my upload bandwidth ;-) I have to test with the 14.105 kernel first too. Does it make sense to test with Pasha's hacked kernels?

Is the popuplog.os2 entry above of any use?

comment:10 Changed 13 years ago by Andreas Buchinger

One additional note - VPC doc says it will use max. 1GB per guest. Maybe the VPC developers hit a similar limit too. Unfortunately Knut didn't respond on #netlabs yesterday...

comment:11 Changed 13 years ago by dmik

The POPUPLOG.OS2 entry just tells us it crashed in the fatal error report printing code in JVM (see #97) which is totally unrelated to your problem.

In order to find out if it's a VPC only problem, you could try to run Java in parallel with another greedy application that uses high memory such as OOo.

Java does nothing but allocate high memory with DosAllocMem(OBJ_ANY), so Java is not the cause of the trap on its own. The same goes for Odin. So I can't see how this can be fixed on the application side except by removing OBJ_ANY usage (which would make it impossible to allocate more than 512M and therefore be equivalent to using -Xmx values < 500 => pointless).

comment:12 Changed 13 years ago by Andreas Buchinger

Another test without VPC - OpenOffice3.2 loading a 130MB file, which took ages and never completed but allocated a few hundred MB of memory, 2 Seamonkey instances running and

  • java -Xmx510m -version --> worked
  • java -Xmx500m -version --> trap 000e

I'm working on a simple, reproducible test scenario....

comment:13 Changed 13 years ago by Andreas Buchinger

If you want to test, I've uploaded the files to my server at -
ftp://buchinger.selfip.net/vpc/
Log in as user guest with your email address as password. Unfortunately my connection is not very stable, especially in wet weather. But I think Peter Moylan's FTP server supports reget. The server should be online for the next few days.

The *_VHD is an eCS guest which resides on R:\ on my machine but can be anywhere, of course. VIRTUALPC.zip is an installed VPC from P:\. I'm not sure if it works in another location, but I have transferred it to P:\ on other machines successfully in the past without a reinstall. When you configure a guest, set the path for the first hard disk to the unzipped *.vhd and the memory to 512MB.

I use this REXX script to step through various Java memory settings, but it should trap immediately with -Xmx500m, no matter if VPC is already running or is started afterwards.

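/* Test java memory problem: step -Xmx up to 600m in 10m steps, then wrap */
call RxFuncAdd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'
call SysLoadFuncs                /* registers SysSleep */

'call envjava6.cmd'              /* set up the Java 6 environment */

i = 300
DO FOREVER
  i = i + 10
  IF i > 600 THEN i = 300        /* start over after 600m */

  cmdstring = "java -Xmx" || i || "m -version"
  'call 'cmdstring

  call SysSleep 1                /* wait one second between runs */
END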

For the network connection from within the guest OS I use 'Virtuelle Umschaltung' (virtual switching). This needs the VPC switch driver, which can be installed with vpcsetup.exe. Be aware that MPTS cannot process protocol.ini afterwards. You need to uninstall switch.os2 with vpcsetup first. But most probably you do not need switch.os2 anyway.

comment:14 Changed 13 years ago by Andreas Buchinger

No difference with os2krnl 14.105 and odin 2011-07-10

Virtualaddresslimit=1280
VPC guest memory setting 512MB
java -version --> reboot (without any -Xmx value; I don't know what Java uses as the default heap size)

comment:15 Changed 13 years ago by Andreas Buchinger

Another reproducible crash scenario without VPC but OpenOffice3.2 and java6

  • fresh reboot (Virtualaddresslimit=1280)
  • start OO3.2 writer and load .bmp image with 240MB
  • open another writer instance and load another .bmp with 60MB (I cannot load a 240MB .bmp a second time without OO silently terminating)
  • again open another writer instance and load again a .bmp with 60MB
  • 'java -version' --> trap 000e

Sometimes, with other image loading scenarios, I get hard crashes without a trap screen but with full hangs (no mouse movement, no <CTRL> <ALT> <DEL>).

I've tested with a simple test application making different DosAllocMem calls, up to 3GB total memory utilization (different processes with up to 600MB each), together with OO, but had no crashes.

With java -Xmx* and the simple DosAllocMem application I cannot produce traps either. It needs a more complex app like OO or VPC AND Java to crash the machine. Maybe OO frees memory blocks at the time Java tries to allocate?

I've pointed Steven to this ticket. Maybe he has ideas on how to narrow down these trap000e (?) problems.

comment:16 Changed 13 years ago by Steven Levine

Hi guys,

You are going to need a dump file to have any chance of figuring out what is going on. If you can make jvm.dll generate a ring3 exception, we can get started with a process dump file.

The system traps are going to require a system dump or quality time spent with the kernel debugger.

Andi, can you do com port or icat debugging on the MUT? This is probably what you need to track down the hard hangs.

comment:17 Changed 13 years ago by Steven Levine

Cc: Steven Levine added

comment:18 Changed 13 years ago by Andreas Buchinger

Cc: Andreas Buchinger added

comment:19 Changed 13 years ago by Andreas Buchinger

In theory yes, but in practice I have never worked with KDB or ICAT. A COM port and other eCS systems are available. I'm not sure if I have a network card which works with ICAT, but I think it works over the COM port too. I should search the Internet for a 'Setup of ICAT for dummies' first.

Do you think these are really page faults (trap 000e), and what does that mean? I mean, if the page is not in memory (which I doubt, because I can reproduce it with more than 2GB of free memory), then why the hell does the kernel not load it from the swap file? Obviously I lack a basic understanding ...

comment:20 Changed 12 years ago by dmik

andib, how does the OpenJDK GA3 build behave under these conditions?

comment:21 Changed 12 years ago by Andreas Buchinger

Unfortunately the same - http://members.aon.at/buchinger/Download/priv/IMG_2163.jpg

comment:22 Changed 11 years ago by jep

I have the same problem on an eCS 2.2 Beta installation, and it's impossible to run any Java-related app.

This is a physical AMD Phenom 9600, 2.3 GHz, 4-core machine.
It reboots as soon as I try to run a jar file, with or without VPC or VBox.
