Opened 18 months ago

Closed 16 months ago

Last modified 16 months ago

#136 closed defect (worksforme)

Commit r3052 makes kmk crash on OS/2

Reported by: dmik
Owned by:
Priority: major
Milestone:
Component: kBuild
Version: 0.1.5
Keywords:
Cc:


To reproduce: git clone and kmk in the root. The crash log is attached.

Attachments (1)

0315_01.TRP.txt (117.4 KB) - added by dmik 18 months ago.


Change History (5)

Changed 18 months ago by dmik

Attachment: 0315_01.TRP.txt added

comment:1 Changed 18 months ago by dmik

Basically, it crashes because shellflags in construct_command_argv_internal turns out to be NULL.

Reverting the commit fixes the issue. I also noticed that changing

  static char sh_chars_kash[] = "#;*?[]&|<>(){}$`^~!";                          /* note: no \" - good idea? */

to a string from the EMX codepath

  static char sh_chars_kash[] = "#;*?[]&|<>(){}$`^~";

(i.e. removing the exclamation mark) fixes the crash as well. I just noticed that it's missing in the EMX codepath and tried it out. Didn't dig deeper.
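For context, character lists such as sh_chars_kash are normally used to decide whether a command line can be executed directly or has to be handed to the shell, and the shell path is where shellflags comes into play. The following is only a minimal sketch of that kind of check, not the actual kmk code: needs_shell and the two array names are hypothetical, and only the two string literals are taken from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The two character sets quoted above (array names are made up here). */
static const char sh_chars_with_bang[]    = "#;*?[]&|<>(){}$`^~!";
static const char sh_chars_without_bang[] = "#;*?[]&|<>(){}$`^~";

/* Hypothetical helper: returns 1 if `line` contains at least one character
   from `sh_chars`, i.e. if the command would be routed to the shell. */
static int needs_shell(const char *line, const char *sh_chars)
{
    assert(line != NULL && sh_chars != NULL);
    /* strpbrk returns a pointer to the first matching character or NULL. */
    return strpbrk(line, sh_chars) != NULL;
}
```

Under this sketch, a command containing a sed expression like /x/!d goes to the shell only with the first set, which would be consistent with the crash disappearing once the exclamation mark is removed. Whether that is what really happens in kmk has not been verified here.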

Last edited 18 months ago by dmik

comment:2 Changed 16 months ago by bird

Resolution: worksforme
Status: new → closed

I'm not able to reproduce this with the checked-in kmk binary, though I'm also not able to build all the sources in the suggested sample, as it relies on non-canonical libc features I don't have here.

From the looks of the stack dump, it almost looks as if someone was executing

C:/usr/bin/sh.exe -c g++.exe\ -E\ -x\ c++\ -\ 2\>\&1\ \<\ /dev/null\ \|\ D:/Coding/kbuild/master-install/@unixroot/usr/bin/kmk_sed.exe\ -e\ \"/search\ starts\ here/,/\[Ee\]nd\of\ search\ list/!d\"\ -e\ \"/\^\ /!d\"

Though I cannot find any traces of this in the repository pointed to. Putting that string directly into a makefile works fine here.

Eyeballing the code, I cannot see how it may happen either. If shellflags == 0, it is assigned "-ec" or "-c" on line 2911, and there are no other assignments to this variable in the function. So it smells like stack corruption, compiler issues, or maybe outdated sources. The compiler, btw, seems to not use cdecl for construct_command_argv_internal(), if that is indeed the function that is on the top of the stack.
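The defaulting described above can be sketched roughly as follows. This is an illustration only, not the code at line 2911: default_shellflags is a hypothetical helper, and using a posix_pedantic flag to choose between "-ec" and "-c" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the described behaviour: a null
   `shellflags` is replaced by "-ec" or "-c". The `posix_pedantic`
   selector is an assumption made for this sketch. */
static const char *default_shellflags(const char *shellflags, int posix_pedantic)
{
    if (shellflags == NULL)
        shellflags = posix_pedantic ? "-ec" : "-c";

    /* After the defaulting step the pointer can no longer be null, so a
       null value seen later must come from somewhere else. */
    assert(shellflags != NULL);
    return shellflags;
}
```

With logic of this shape, a NULL shellflags observed further down in the same function cannot be explained by the function's own assignments, which is why corruption or mismatched sources look like the more plausible causes.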

Now, if you want me to spend more time on this, please reduce it to something I can work with (shouldn't be hard). And please, check if it works with the checked in binaries, because whatever patched stuff you're using isn't something I can support or debug for you.

comment:3 Changed 16 months ago by dmik

Yes, you are right in that the attached trap happens under our own kmk & libc builds.

If I try the checked-in binary, it indeed doesn't crash, but it doesn't work well either. It spits out a lot of fcntl(): Bad file number errors and leaves a lot of xxxxxxxx.tmp files in the current directory (including when run under the "canonical" libc066). Also, the LIBCx build doesn't finish here because the canonical kbuild lacks patches for automatic import library creation under OS/2 for Gxx3OMF tools (but that's another story of course).

Your analysis, however, suggests that it might be a compiler issue. Note that we use GCC4 for our own kmk (and libc) builds as opposed to your GCC3-based builds, and there have been a number of similar issues in the past regarding the calling convention and such (mostly because the GCC4 port was started off by blindly applying GCC3 patches to places where a lot had been changed in the GCC sources).

I will dig into that direction and report here if I find anything useful.

Last edited 16 months ago by dmik

comment:4 Changed 16 months ago by dmik

No, it's not the compiler: I've built kmk from our github tree with GCC3 — the crash is essentially the same. Here's what I found though: building kmk from the canonical (this) SVN repo at r3171 crashes too, also regardless of the compiler used.

The GCC4 crash:

Killed by SIGSEGV
pid=0x2fd4 ppid=0x2fd2 tid=0x0001 slot=0x008d pri=0x0200 mc=0x0001 ps=0x0010
KMK 0:0002b0fb
cs:eip=005b:0003b0fb      ss:esp=0053:001bd360      ebp=001bd598
 ds=0053      es=0053      fs=150b      gs=0000     efl=00010246
eax=00000000 ebx=0000004e ecx=ffffffff edx=ffffffff edi=00000000 esi=000000e8
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.

The GCC3 crash:

Killed by SIGSEGV
pid=0x3360 ppid=0x335e tid=0x0001 slot=0x008d pri=0x0200 mc=0x0001 ps=0x0010
KMK 0:00022c50
cs:eip=005b:00032c50      ss:esp=0053:0017d2d0      ebp=0017d378
 ds=0053      es=0053      fs=150b      gs=0000     efl=00010206
eax=00000000 ebx=0000004e ecx=ffffffff edx=00000001 edi=00000000 esi=ffffffff
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.

I'm pretty sure if you build it at r3171, you will get this crash too. Which means that something between r3052 and r3171 blows it up and something that follows somehow fixes it.
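As a side note for anyone mapping these addresses to a map file: in both dumps the value after "KMK 0:" and the eip value differ by the same amount. The sketch below just does that subtraction. It assumes that "KMK 0:xxxxxxxx" is an offset into object 0 of the kmk executable, which is an interpretation of the dump format and not something stated in this ticket; load_base is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given the faulting eip and the module-relative
   offset from the dump, return the address the object appears to be
   loaded at (assuming the offset is relative to that object). */
static uint32_t load_base(uint32_t eip, uint32_t object_offset)
{
    assert(eip >= object_offset);
    return eip - object_offset;
}
```

For the GCC4 dump this is load_base(0x0003b0fb, 0x0002b0fb) and for the GCC3 dump load_base(0x00032c50, 0x00022c50); both give 0x10000, so the offsets 0x2b0fb and 0x22c50 would be the ones to look up in the respective map files.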

r3171 is what our current github branch is based upon. So no surprise it also crashes. r3171 was selected as a base because it's the last revision prior to merging GNU make 4 sources in. With these make 4 updates, the OS/2 version behaves very badly as I already mentioned in my mail to you and above: fcntl(): bad file number errors, *.tmp files and eventual build failures (something related to job control and/or these bad file number errors, I suppose). I.e. not practically usable. Regardless of LIBC used. So I really wonder how you could build LIBCx with your checked-in binary.

So we have to kinda stick with r3171 on OS/2 (or fix the make 4 glitches, which is beyond our resources ATM).

Last edited 16 months ago by dmik