Opened 14 years ago

Closed 14 years ago

#168 closed defect (fixed)

Arora web browser crashes in LIBC

Reported by: rudi Owned by:
Priority: major Milestone: Qt 4.6.3
Component: QtCore Version: 4.6.2
Severity: low Keywords:
Cc:

Description

When surfing around, Arora often crashes with the following register dump:

Killed by SIGSEGV
pid=0x9128 ppid=0x0017 tid=0x0001 slot=0x007e pri=0x0200 mc=0x0001
C:\PROG\ARORA\ARORA.EXE
LIBC063 0:00053f6a
cs:eip=005b:1dc63f6a      ss:esp=0053:0214e388      ebp=0214e3c8
 ds=0053      es=0053      fs=150b      gs=0000     efl=00212216
eax=00000000 ebx=20392fd4 ecx=00010000 edx=00000000 edi=00000000 esi=25300000
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.

It's an attempt to "setmem" a NULL pointer. I tried to analyze the problem, and it appears to me that it comes from /webkit/JavaScriptCore/runtime/Collector.cpp, line 273. After digging a bit into the implementation of posix_memalign() in kLIBC, I tend to think that this function does not work correctly when the requested alignment is larger than the page size, which is exactly what happens on line 272.

I suggest modifying JavaScriptCore\wtf\platform.h so that HAVE_POSIX_MEMALIGN is zero on OS/2. Note that there are two copies of JavaScriptCore in the 3rdparty subtree: one that goes into QtScript and one that ends up in QtWebKit.

At the time of writing, my finding is just a hypothesis. I will try to prove it when my other builds are complete...

Change History (9)

comment:1 Changed 14 years ago by rudi

OK, that really seems to help. The crashes are gone!

Besides setting HAVE_POSIX_MEMALIGN to 0 for OS/2, I also added OBJ_ANY to JavaScriptCore/runtime/Collector.cpp, line 259. I think we are doing high-memory (himem) builds anyway, so we don't need to care about systems that don't support OBJ_ANY.

P.S.: Posting this comment using Arora ;-).

comment:2 Changed 14 years ago by Dmitry A. Kuminov

Milestone: Qt Enhanced → Qt 4.6.3

comment:3 Changed 14 years ago by Dmitry A. Kuminov

Great :)

How did you find out that posix_memalign() doesn't work for alignments greater than 64K? Here (eCS 2.0rc7, libc063) it does work:

#include <stdio.h>
#include <stdlib.h>

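/* Request 16-byte, 64K and 256K aligned blocks and print what we get.
   Nothing is freed, so the second call runs against a larger heap. */
void foo(void)
{
    void *p1 = NULL, *p2 = NULL, *p3 = NULL;  /* defined even if a call fails */
    int rc1 = posix_memalign(&p1, 16, 32);
    int rc2 = posix_memalign(&p2, 64 * 1024, 64 * 1024 * 2);
    int rc3 = posix_memalign(&p3, 256 * 1024, 256 * 1024 * 2);

    printf("rc1 %d p1 %p\n", rc1, p1);
    printf("rc2 %d p2 %p\n", rc2, p2);
    printf("rc3 %d p3 %p\n", rc3, p3);
}

int main(void)
{
    foo();
    foo();
    return 0;
}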

gives us

rc1 0 p1 0x500200
rc2 0 p2 0x510000
rc3 0 p3 0x540000
rc1 0 p1 0x530020
rc2 0 p2 0x5d0000
rc3 0 p3 0x600000
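A stricter variant of the test case above could check the return code and the alignment instead of printing the pointers, so a bad result is flagged automatically. The sketch below assumes a POSIX posix_memalign(); checked_memalign() is a hypothetical helper, not part of kLIBC or WebKit.

```c
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign() in strict modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: calls posix_memalign() and verifies the result.
   Returns the block on success, or NULL if the call failed, returned NULL,
   or returned a pointer that is not a multiple of `alignment`.
   `alignment` must be a power of two and a multiple of sizeof(void *). */
static void *checked_memalign(size_t alignment, size_t size)
{
    void *p = NULL;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (posix_memalign(&p, alignment, size) != 0 || p == NULL)
        return NULL;
    if ((uintptr_t)p % alignment != 0) {
        free(p);  /* misaligned block: report it as a failure */
        return NULL;
    }
    return p;
}
```

Calling it with the same alignment/size pairs as foo(), and not freeing the blocks between calls, keeps the heap history close to that of the original test.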

And I also don't get your point about OBJ_ANY. -Zhigh-mem is only a suggestion: kLIBC will operate in low memory if high memory is not available (or, IIRC the relevant code, if some other kLIBC user votes against -Zhigh-mem). However, I think it makes sense to use a trial-and-error approach (i.e. first try with OBJ_ANY and then, if the allocation fails, without it) whenever we call DosAllocMem() manually in webkit/jsc.
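The trial-and-error approach could look roughly like the sketch below. To keep it testable off OS/2, the allocator is passed in as a callback shaped like DosAllocMem() (pointer out-parameter, size, flags, zero on success). The names alloc_prefer_high(), fake_low_mem_alloc() and HYPOTHETICAL_OBJ_ANY are illustrative assumptions, and the flag value is a placeholder, not taken from the OS/2 toolkit headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type modeled on DosAllocMem(): returns 0 on
   success and stores the new block in *out; nonzero means failure. */
typedef unsigned long (*raw_alloc_fn)(void **out, unsigned long size,
                                      unsigned long flags);

/* Placeholder for OBJ_ANY; real code would use the toolkit constant. */
#define HYPOTHETICAL_OBJ_ANY 0x400UL

/* Trial and error: first ask for high memory; if that fails (e.g. no high
   memory arena is available), retry the same request without it. */
static unsigned long alloc_prefer_high(raw_alloc_fn alloc, void **out,
                                       unsigned long size,
                                       unsigned long base_flags)
{
    unsigned long rc;
    assert(alloc != NULL && out != NULL);
    *out = NULL;
    rc = alloc(out, size, base_flags | HYPOTHETICAL_OBJ_ANY);
    if (rc != 0)
        rc = alloc(out, size, base_flags);
    return rc;
}

/* Fake allocator simulating a system without high memory: it rejects any
   request for high memory and serves small requests from a static buffer. */
static unsigned long fake_low_mem_alloc(void **out, unsigned long size,
                                        unsigned long flags)
{
    static char arena[4096];
    if ((flags & HYPOTHETICAL_OBJ_ANY) != 0 || size > sizeof arena)
        return 1;  /* any nonzero value signals failure */
    *out = arena;
    return 0;
}
```

In real code the callback would wrap DosAllocMem() itself, with PAG_READ | PAG_WRITE | PAG_COMMIT as the base flags and OBJ_ANY added only on the first attempt.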

comment:4 Changed 14 years ago by rudi

> How did you find out

I came to that conclusion by looking at the source code. I think the problem occurs when posix_memalign() is called while the internal heap pool is too low to satisfy the request. In that situation a new chunk has to be obtained from the OS, and the extra bytes needed to guarantee the alignment are not correctly taken into account.
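The arithmetic behind this can be sketched generically. This is not kLIBC's code; align_up(), chunk_size_for() and aligned_block_fits() are hypothetical helpers showing what any aligned allocator must account for: if a fresh chunk starts at an arbitrary address, the aligned block may begin up to alignment - 1 bytes into it, so the chunk must be at least size + alignment - 1 bytes long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds addr up to the next multiple of `alignment` (a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
}

/* Chunk size to request from the OS so that an aligned block of `size`
   bytes fits wherever the chunk starts: in the worst case alignment - 1
   bytes are skipped before the aligned address. Overflow is not checked. */
static size_t chunk_size_for(size_t size, size_t alignment)
{
    return size + alignment - 1;
}

/* Returns 1 if an aligned block of `size` bytes fits inside the chunk
   [base, base + chunk_size), 0 otherwise. */
static int aligned_block_fits(uintptr_t base, size_t chunk_size,
                              size_t size, size_t alignment)
{
    uintptr_t start = align_up(base, alignment);
    return start - base + size <= chunk_size;
}
```

For example, a chunk at 0x510000 (64K-aligned, like the addresses in the test output) is 0x30000 bytes short of the next 256K boundary at 0x540000, so a 256K block with 256K alignment does not fit in a 256K chunk, only in one of at least size + alignment - 1 bytes.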

With the 4.6.3 build, the described crashes are gone as well, but probably due to the changes in the memory consumption.

> And I also don't get your point about OBJ_ANY

I didn't know that -Zhigh-mem is only a suggestion. So "trial and error" would be the way to go.

comment:5 Changed 14 years ago by rudi

> With the 4.6.3 build, the described crashes are gone as well, but probably due to the changes in the memory consumption.

The crash is back after adding SSL support.

comment:6 Changed 14 years ago by Dmitry A. Kuminov

It's really funny that the test case above happens to reproduce one of the very few execution sequences that actually work :) On the other hand, there are many sequences where posix_memalign() doesn't work as expected, and that seems to be the case with Arora. Indeed, some bugs are really afraid of certain developers and don't show up right away.

I created http://svn.netlabs.org/libc/ticket/223 for this issue; please see the updated test case there, which shows all the fun. I didn't look more closely at the posix_memalign() implementation because fixing it in LIBC is rather pointless (there is no acceptable way to deliver a fixed LIBC.DLL to end users, and I don't want to create DLL hell by shipping a private copy of this DLL with Qt). I also don't have time for that right now. So let's set HAVE_POSIX_MEMALIGN back to zero on OS/2 and use DosAllocMem() with its 64K alignment instead.

comment:7 Changed 14 years ago by Dmitry A. Kuminov

Removed HAVE_POSIX_MEMALIGN in r779. Added the trial-and-error approach for DosAllocMem() in r780. Please check this in your setup.

comment:8 Changed 14 years ago by rudi

Works here.

comment:9 Changed 14 years ago by Dmitry A. Kuminov

Resolution: fixed
Status: new → closed

Good.
