Attachments (1)
Change History (5)
by , 6 years ago
Attachment: | 0315_01.TRP.txt added |
---|
comment:2 by , 6 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
Not able to reproduce this with the checked in kmk binary, though I'm not able to build all the sources in the suggested sample as it relies on non-canonical libc features I don't have here.
From the looks of the stack dump, it almost looks as if someone was executing
C:/usr/bin/sh.exe -c g++.exe\ -E\ -x\ c++\ -\ 2\>\&1\ \<\ /dev/null\ \|\ D:/Coding/kbuild/master-install/@unixroot/usr/bin/kmk_sed.exe\ -e\ \"/search\ starts\ here/,/\[Ee\]nd\of\ search\ list/!d\"\ -e\ \"/\^\ /!d\"
Though I cannot find any traces of this in the repository pointed to. Putting that string directly into a makefile works fine here.
Eyeballing the code, I cannot see how it may happen either. If shellflags == 0, it is assigned "-ec" or "-c" on line 2911, and there are no other assignments to this variable in the function. So, it smells like stack corruption, compiler issues, or maybe outdated sources. The compiler, btw, seems to not use cdecl for construct_command_argv_internal(), if that is indeed the function that is on the top of the stack.
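For illustration, the defaulting described above can be sketched roughly as follows. This is a minimal sketch, not the actual kmk source: the helper names default_shellflags and shellflags_length and the posix_mode parameter are hypothetical, and the condition that selects "-ec" over "-c" is an assumption. It only shows why, once such a default has been applied, the pointer should never be NULL by the time it is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the kmk sources: pick the flags to
   pass to the shell when the caller supplied none.  The posix_mode switch
   is an assumption made for this sketch. */
const char *
default_shellflags (const char *shellflags, int posix_mode)
{
  if (shellflags == NULL)
    shellflags = posix_mode ? "-ec" : "-c";

  /* After the default is applied the pointer can no longer be NULL. */
  assert (shellflags != NULL);
  return shellflags;
}

/* Length of the flags string.  Calling strlen() on a NULL pointer is
   undefined behaviour, so the default is applied first. */
size_t
shellflags_length (const char *shellflags, int posix_mode)
{
  return strlen (default_shellflags (shellflags, posix_mode));
}
```

If a code path skips this kind of defaulting, or something overwrites the pointer afterwards, any later strlen() or copy on it would fault with a NULL access.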
Now, if you want me to spend more time on this, please reduce it to something I can work with (shouldn't be hard). And please, check if it works with the checked in binaries, because whatever patched stuff you're using isn't something I can support or debug for you.
comment:3 by , 6 years ago
Yes, you are right in that the attached trap happens under our own kmk & libc builds.
If I try the checked in binary, it doesn't crash indeed but it doesn't work well either. It spits out a lot of "fcntl(): Bad file number" messages and leaves a lot of xxxxxxxx.tmp files in the current directory (including when run under the "canonical" libc066). Also, the LIBCx build doesn't finish here because the canonical kbuild lacks patches for automatic import library creation under OS/2 for Gxx3OMF tools (but that's another story of course).
Your analysis, however, suggests that it might be a compiler issue. Note that we use GCC4 for our own kmk (and libc) builds as opposed to your GCC3-based builds, and there have been a number of similar issues in the past regarding the calling convention and such (mostly because the GCC4 port was started off by blindly applying GCC3 patches to places where a lot had been changed in the GCC sources).
I will dig into that direction and report here if I find anything useful.
comment:4 by , 6 years ago
No, it's not the compiler: I've built kmk from our github tree with GCC3 — the crash is essentially the same. Here's what I found though: building kmk from the canonical (this) SVN repo at r3171 crashes too, also regardless of the compiler used.
The GCC4 crash:
Killed by SIGSEGV pid=0x2fd4 ppid=0x2fd2 tid=0x0001 slot=0x008d pri=0x0200 mc=0x0001 ps=0x0010 D:\CODING\KBUILD\TRUNK\OUT\OS2.X86\RELEASE\DIST\KBUILD\BIN\OS2.X86\KMK.EXE KMK 0:0002b0fb cs:eip=005b:0003b0fb ss:esp=0053:001bd360 ebp=001bd598 ds=0053 es=0053 fs=150b gs=0000 efl=00010246 eax=00000000 ebx=0000004e ecx=ffffffff edx=ffffffff edi=00000000 esi=000000e8 Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
The GCC3 crash:
Killed by SIGSEGV pid=0x3360 ppid=0x335e tid=0x0001 slot=0x008d pri=0x0200 mc=0x0001 ps=0x0010 D:\CODING\KBUILD\TRUNK\OUT\OS2.X86\RELEASE\DIST\KBUILD\BIN\OS2.X86\KMK.EXE KMK 0:00022c50 cs:eip=005b:00032c50 ss:esp=0053:0017d2d0 ebp=0017d378 ds=0053 es=0053 fs=150b gs=0000 efl=00010206 eax=00000000 ebx=0000004e ecx=ffffffff edx=00000001 edi=00000000 esi=ffffffff Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
I'm pretty sure if you build it at r3171, you will get this crash too. Which means that something between r3052 and r3171 blows it up and something that follows somehow fixes it.
r3171 is what our current github branch is based upon. So no surprise it also crashes. r3171 was selected as a base because it's the last revision prior to merging GNU make 4 sources in. With these make 4 updates, the OS/2 version behaves very badly as I already mentioned in my mail to you and above: "fcntl(): bad file number" errors, *.tmp files and eventual build failures (something related to job control and/or these bad file number errors, I suppose). I.e. not practically usable. Regardless of LIBC used. So I really wonder how you could build LIBCx with your checked-in binary.
So we have to kinda stick with r3171 on OS/2 (or fix the make 4 glitches, which is beyond our resources ATM).
Basically, it crashes because shellflags in construct_command_argv_internal turns out to be NULL. Reverting the commit fixes the issue. I also noticed that changing
to a string from the EMX codepath
(i.e. removing the exclamation mark) fixes the crash as well. I just noticed that it's missing in the EMX codepath and tried it out. Didn't dig deeper.