Opened 3 years ago

Closed 3 years ago

#187 closed defect (fixed)

Crash when clicking on a list's header

Reported by: diver Owned by:
Priority: blocker Milestone: Qt 4.6.3
Component: QtGui Version: 4.6.2
Severity: highest Keywords:
Cc:

Description

open the file open dialoge in details view and click on the headers. this will crash qt in qtgui4.dll

Change History (10)

comment:1 Changed 3 years ago by dmik

  • Component changed from General to QtGui
  • Priority changed from major to blocker
  • Summary changed from listheader crash to Crash when clicking on a list's header

The problem can be easily reproduced in the debug build if we enable the code that turns on MMX/SSE/SSE2 drawing routines (see qInitDrawHelperAsm() in qdrawhelper.cpp). After that, opening any file dialog causes a crash.

The crash is in qt_blend_argb32_on_argb32_sse2() when doing some SSE2 machinery for blending pixels (for example, when drawing a transparent pixmap on another pixmap). Disabling SSE2 fixes the problem.

SSE2 is enabled for quite some time already (since March, see r617). And I'm sure I used the file open dialog in release Qt applications many times but I have never seen this crash. We must find what's going on.

It's difficult to debug because neither the VAC3 debugger nor idbug understand the SSE2 instruction set. I have to disassemble the byte flow on my own to see what GCC generated for us...

comment:2 Changed 3 years ago by rudi

It looks like a MOVDQA to an unaligned target address:

Opcode is: 66 0f 7f 45 B8 = MOVDQA [ebp-72], xmm0

In my case ebp is 0x0093bcd0. Thus [ebp-72] evaluates to 0x0093BC88, which is not on a 16 byte boundary.

comment:3 Changed 3 years ago by dmik

I thought that as well but I see no MOVDQA here, it's MOVDQU (0xF3)...

comment:4 Changed 3 years ago by dmik

Checked once again. It's clear now. In my case, I have a different instruction where it traps (was too sleepy yesterday to see that): PADDW xmm0, xmm1 (66 0F FD C1). According to this, for 128-bit ops (which ones involving xmmN are) the operands must also be 16-byte aligned and they are not.

This alignment also explains why it traps in different places depending on the build and why there are builds when it may not trap at all.

I hope GCC has an option to make the code involving xmm registers 16-byte aligned. I will check this now.

comment:5 Changed 3 years ago by rudi

Hmm, to me this pretty much looks like a compiler bug. Here ebp is perfectly aligned (as it is the default for GCC4.x). The crash happens when accessing a local variable of type m128i. The compiler should have taken care of that automatically.

comment:6 Changed 3 years ago by dmik

Yes, I agree -- the compiler has all information it needs to perform the proper alignment automatically. Probably the OS/2 port misses some config define or so. We need to talk to Paul about this.

But fortunately I found an option that helped -- -mstackrealign. It's intended for a bit different task (mixing old 4-byte aligned code with SSE) but it seems to do the job in our case: at prolog, it corrects ESP so that addressing m128i variables gives the proper alignment (I compared Asm output to check that in a few places that crashed before). I tested this workaround on two machines and it worked well on both. Please test it too and if works, we will accept this for now.

The change is in r810. In order to get it applied, you *must* manually touch qdrawhelper_sse.cpp and qdrawhelper_sse2.cpp in gui/paining (or delete the respective .obj files) and then run make in gui.

comment:7 Changed 3 years ago by rudi

Yes, it works here. Cool !

However, I'm not sure if it is necessary to apply the option to qdrawhelper_sse.cpp as well.

1.) Standard SSE doesn't know about m128i.
2.) m64 is not required to be aligned (even though for performance reasons it should be)

So the option makes only sense in case the compiler also fails to align the "m64" type to 8 bytes and the performance penalty of the stack re-align is less than the loss caused by a possible misalignment.

comment:8 Changed 3 years ago by diver

i can confirm, that i don't see the crash anymore

comment:9 Changed 3 years ago by dmik

Yes, right, but the alignment caused by -mstackrealign only adds two extra instructions here: andl -16, %esp in prolog and movl %ebp%, %esp% in epilog. The penalty for doing this should be worth the speed-up aligning 64-bit variables potentially gives. Let's leave it as is just in case unless you have a proof it degrades performance. More over, most CPUs have SSE2 anyway.

To further convince you (and myself :) I made a series of tests using the tests/blit test on the real machine (T400 Core 2 Duo 2.4G, Radeon HD3470, Panorama). The test is a release build run several times (to stabilize counting) with the mouse pointer far away from the test window (to avoid extra time for syncing wit it). Here are the results:

mode, func solid 20% transparent
SSE no align, qt_memfill32_sse 60K 55K
SSE -mstackrealign, qt_memfill32_sse 65K 57K
plain, qt_memfill_template<uint32, uin32> 74K 64K
SSE2 no align, qt_memfill32_sse2 75K 64K
SSE2 -mstackrealign, qt_memfill32_sse2 75K 64K

(Note that in the transparent case, the blend function is used instead of memfill, which is expectedly slower).

Even taking into account that the test's time measurements are inaccurate (big level of error, about 1-3%), we can see that -mstackrealign gives some 5% boost or so -- this is definitely not a harm.

Note that the speed of the plain function (no MMX-like technology used at all, just a normal MOV mem <- reg) beats SSE easily (probably due to better pipelining and caching mechanisms in modern processors) and that explains why qt_memfill32_sse is not used in the original Qt in SSE mode (it's commented out around #8032 in qdrawhelper.cpp which makes the plain templated version to be used instead).

Note also that in case of SSE2, the alignment doesn't give any noticeable speed boost in fill/blend operations and in both cases it's generally almost the same as the plain MOV (looks like modern CPUs are too cool). But it at least does not crash. And I think on older CPUs (which is a frequent case in the OS/2 world) the difference should be more noticeable.

Okay, enough with this for now.

comment:10 Changed 3 years ago by dmik

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.