Opened 6 years ago

Closed 5 years ago

#46 closed defect (fixed)

XWLAN widget produces a ring 3 trap D on startup

Reported by: Lewis Rosenthal Owned by:
Priority: major Milestone: v3.14
Component: xcenter widget Version: 3.13
Keywords: Cc: steve53@…

Description

As reported in Arca Noae ticket #1160, the above condition occurs with the 14.104a debug kernel at address 5b:1db84c7c on startup.

Trap 13 (0DH) - General Protection Fault 0000
eax=666b7477 ebx=666b7477 ecx=666b7477 edx=00000008 esi=00000000 edi=00000008
eip=1db84c7c esp=00e656a0 ebp=00e656bc iopl=0 rf -- -- nv up ei pl nz na po nc
cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=025a10e0 cr3=7fe2c000 p=01
005b:1db84c7c 8a19 mov bl,byte ptr [ecx] ds:666b7477=invalid

The condition does not occur when xwlan.dll has been removed.

Change History (14)

comment:1 Changed 6 years ago by andib

It's a pity that Trac notification still doesn't work reliably, so I didn't see this earlier :-(

I guess your trap is reproducible on your machine with your profiles and settings, correct?

I'm not good at deciphering trap information, but I fear this is not related to #37. Anyway, I'll upload a test version this evening or tomorrow so you can test it. Stay tuned if you want to give it a try.

By the way, it seems I do not have access to Mantis ticket #1160 ('Access Denied').

comment:2 Changed 6 years ago by Lewis Rosenthal

I guess your trap is reproducible on your machine with your profiles and settings, correct?

This is actually David's trap, but I believe it is reproducible, yes. I don't have

By the way, it seems I do not have access to Mantis ticket #1160 ('Access Denied').

Should be fixed now, Andi. Sorry for the oversight.

comment:3 Changed 6 years ago by andib

I couldn't reproduce it on my standard MUT, which I've set up with the kernel debugger and ICAT, probably because it has no WLAN NIC installed, no genmac, and no profiles.

So I tried to get ICAT running on my T60 with its BlueLion installation. There I saw a Trap 13 with pmdf (and ZOC) that looks very similar, though at a different address, but I'm running a newer xwlan version there. I'm pretty sure this is the same as #37, and the same as what others described as 'xcenter automatically closes after a few seconds'. It was usually seen only on new installations, which makes sense now.

The root of the problem is that xwlan tries to connect to (or search for) a network that is not defined by any profile. To be more precise, it starts scanning when no profile is preselected. In this case, some code dealing with wpa_supplicant runs. I don't know why yet, because this should not happen as long as no link to a network is established. Maybe it's a regression introduced with the wpa_reassociate code. This needs additional investigation.

I tracked the problem down in idebug and added some checks to fix this trap. Unfortunately, the problem no longer occurs once xwlan/wlanstat has run with a working profile; at least, that's what I observed. So it isn't that easy to reproduce. Anyway, I'm confident enough that this fix will solve all these 'xcenter crashes on startup' reports to make a full 3.13 package available on netlabs for testing.
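The actual checks aren't shown in this ticket. As a rough illustration only, a guard matching the reasoning above (wpa_supplicant code should not run while scanning without a preselected profile or before a link is up) might look like this in C. The struct, its fields, and the function name are hypothetical and not taken from the xwlan sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection state; the real xwlan structures are not shown here. */
typedef struct {
    const char *selected_profile;  /* NULL or "" when no profile is preselected */
    bool        link_up;           /* true once associated with a network */
} wlan_state;

/* Returns true only when it should be safe to run the wpa_supplicant code:
   a profile must be selected AND a link must already be established. */
bool may_query_wpa_supplicant(const wlan_state *st)
{
    if (st == NULL)
        return false;
    if (st->selected_profile == NULL || st->selected_profile[0] == '\0')
        return false;              /* scanning without a profile: skip */
    return st->link_up;            /* no link yet: skip as well */
}
```

Checking the pointer before dereferencing the profile name matters here, since the traps in this ticket are reads through invalid pointers.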

comment:4 Changed 6 years ago by andib

Resolution: fixed
Status: new → closed

Pretty sure your trap is fixed now, so I'm closing this ticket.

Feel free to reopen it if 3.13 still does not work for you. But please do the following -

  • remove the widget DLL, because in such cases it's more convenient to use the standalone wlanstat.exe
  • open pmprintf with a logfile
  • turn on driver and TCP/IP debugging (deb.cmd 1 on, deb.cmd 2 on)
  • start wlanstat
  • attach the pmprintf logfile

comment:5 Changed 6 years ago by andib

Milestone: v3.20 → v3.14
Resolution: fixed
Status: closed → reopened
Version: 3.12 → 3.13

I'm reopening this because I have seen similar behavior even with the latest code. At least I can reproduce a hang when starting pmshell, although I still cannot debug it.

With the debug kernel, it seems to loop tightly in doscall1 or the kernel, without the trap described above.

Starting pmshell within idebug does not produce an exception, does not start the WPS, and cannot be stopped with the debugger's stop button. So I'm a bit lost with this at the moment. My guess is that something bad happens in the widget's init code called by xcenter/xwp, but I'm still looking for a way to debug it. Is there something like _interrupt3() in the VAC library?

comment:6 Changed 6 years ago by andib

#include <builtin.h>   /* VAC intrinsics */
_interrupt3();         /* hard-coded breakpoint (INT 3): the attached debugger stops here */

Works with VAC 3.65. Now ICAT stops in the (x)wlan code. But ICAT gives me another pile of mysteries to solve...
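A hard-coded INT 3 left in a build typically raises a breakpoint exception that ends the process when no debugger is attached, so a common pattern is to hide it behind a macro that compiles away unless explicitly enabled. A minimal sketch, assuming a hypothetical ENABLE_DEBUG_BREAK define and a hypothetical widget_init() entry point; `__IBMC__`/`__IBMCPP__` are the macros VisualAge C++ predefines, and the GCC branch uses inline `int3` on x86.

```c
/* ENABLE_DEBUG_BREAK is a hypothetical switch, defined for debug builds only. */
#if defined(ENABLE_DEBUG_BREAK) && (defined(__IBMC__) || defined(__IBMCPP__))
#  include <builtin.h>                 /* VAC: provides _interrupt3() */
#  define DEBUG_BREAK() _interrupt3()
#elif defined(ENABLE_DEBUG_BREAK) && defined(__GNUC__) && \
      (defined(__i386__) || defined(__x86_64__))
#  define DEBUG_BREAK() __asm__ volatile ("int3")
#else
#  define DEBUG_BREAK() ((void)0)       /* default: no breakpoint at all */
#endif

/* Hypothetical widget init routine standing in for the real one. */
int widget_init(void)
{
    DEBUG_BREAK();   /* with a debugger attached and the switch on, execution stops here */
    /* ... real initialization would follow ... */
    return 0;
}
```

Building with the switch off leaves no stray breakpoint in release DLLs such as xwlan.dll.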

comment:7 Changed 6 years ago by stevenhl

Cc: steve53@… added

comment:8 Changed 6 years ago by andib

v3.10 did not trap in this way. Unfortunately, the svn v3.10 code does not build here.

But in the source code, the very same strcpy line hasn't changed since then. Currently, my best guess is that something in VAC 3.65 (library or compiler) behaves differently than 3.08 did. I tested different alignment settings and compiler/linker options, but without success. What puzzles me is why this problem only shows up with the debug kernel.

Does anyone know of a defect in the 14.104 kernel (compared to 14.106) that could explain such an effect?

comment:9 Changed 6 years ago by stevenhl

A couple of points. An exception reported by the kernel debugger is only an issue if it is fatal to the application. Exceptions occur all the time, and the vast majority are handled by an exception handler and are not fatal to the process or the kernel. A simple example is a stack guard page fault. These occur all the time: the kernel commits the guard page, creates a new guard page, and life goes on.

The VAC 3.65 library is different than the 3.08 library. For example, some of the functions shown in your stack trace do not exist in 3.08. The function names imply to me that you have enabled the heap memory debug option. Did you turn on /Tm perhaps?

If we had access to the 14.106 debug kernel, we would see exactly the same exceptions as you are seeing with the 14.104a debug kernel. What you are seeing is not a defect.

I agree that we need to spend some quality time on IRC. Now that we are into the New Year, we should find a time to do this. I am in PST8 this time of year, so it will probably have to be afternoon your time.

comment:10 Changed 6 years ago by andib

/Tm has always been enabled for debug builds. A test with /Tm- gives -

##r  
Trap 13 (0DH) - General Protection Fault 0000 
eax=61767264 ebx=61767264 ecx=61767264 edx=00000008 esi=00000000 edi=00000008 
eip=000628fc esp=000d6dc0 ebp=000d6ddc iopl=0 rf -- -- nv up ei pl nz na po nc 
cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=414c5758 cr3=00225000 p=00 
005b:000628fc 8a19           mov       bl,byte ptr [ecx]    ds:61767264=invalid 
##k  
005b:0006296f 01010101 01010101 01010101 01010101 __mktime + 9af 
005b:0005e60e 00460000 004770e4 000d6e34 00064c44 __udump_allocated + 12e 
005b:0005e31d 00460000 00000000 00081e4c 000001ab _debug_ucalloc + 6d 
005b:0006068b 00460000 00081e4c 000001ab 00000001 _tm_msg_print + 64b 
005b:00058a97 00463cd8 000d71fc 00081e4c 000001ab UniStrFromUcs + 47 
005b:73656363 00632e73 6d75645f 6e6f4370 7463656e  
##u  
005b:000628fc 8a19           mov       bl,byte ptr [ecx] 
005b:000628fe 4a             dec       edx 
005b:000628ff 41             inc       ecx 
005b:00062900 881dc4660b00   mov       byte ptr [000b66c4],bl 
005b:00062906 85d2           test      edx,edx 
005b:00062908 75f2           jnz       000628fc 
005b:0006290a 8bf0           mov       esi,eax 
005b:0006290c 8bc6           mov       eax,esi 
005b:0006290e ebb9           jmp       000628c9 
005b:00062910 55             push      ebp 
005b:00062911 8bec           mov       ebp,esp 
005b:00062913 53             push      ebx 
##

Bringing MUT back to life with -

##r eip=62910  
##g  
DelayHardError SYS3171: 4 string(s): 
 Pid 005a  Tid 0001  Slot 0087  HobMte 0fda 
 E:\_WORK\XWLAN\TRUNK\DEBUG\WLANSTAT.EXE 
 
Symbols unlinked (wlanstat) 
Symbols unlinked (genprism) 
Symbols unlinked (genmac) 

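One detail in the dump above is easier to see when the bad values are decoded as bytes. x86 stores a 32-bit value little-endian, so ecx=61767264 is the byte sequence 64 72 76 61 in memory, which is the ASCII text "drva". A small helper for doing this by hand (the function name is illustrative):

```c
#include <stdint.h>

/* Decode a 32-bit value as the four bytes it occupies in little-endian
   (x86) memory order. Non-printable bytes become '.'. out needs 5 chars. */
void reg_to_ascii(uint32_t value, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(value >> (8 * i));  /* lowest byte first */
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

Applied to the traps here, ecx decodes to "drva" (and to "wtkf" for 666b7477 in the original report), and the bogus return address and words on the last `##k` line (73656363 00632e73 6d75645f 6e6f4370 7463656e) decode to "cces", "s.c" plus a NUL, "_dum", "pCon", "nect", i.e. fragments like "...ccess.c" and "_dumpConnect". Text sitting where pointers and return addresses should be is consistent with a string overwriting the stack or being used as a pointer, which fits the debug-message problem found later; on its own it does not identify the faulting call.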

For completeness, here is a link to a thread that deals with the same problem, although it started with a question about ICAT: https://groups.google.com/forum/#!topic/comp.os.os2.programmer.misc/w5HxpMh_bVw

I have to sum up what I understand about this problem. Having already spent the best part of my Christmas/New Year holidays on it, I'm unsure whether I should invest even more. Against doing so -

  • Steven just stated that these exceptions are not a problem
  • the same code has worked for a long time and fills the right memory with the correct string
  • no problem with the retail kernel; Exceptq has now been added to the latest code base
  • even more time, which could be used for RL too

Reasons for going further -

  • wlanstat, and even worse the xwlan xcenter widget, does not work with the debug kernel. You need to reboot with xwlan.dll moved out of the way, and in the case of the widget you do not know why the machine hangs
  • v3.10 works with the debug kernel
  • although I never changed this part of the code, I changed compilers and added code in other places, including trace/debug messages. There is a slight chance that one of my debug messages in a totally unrelated place triggers this behavior (see Paul's comment in the thread). I know I'm a bit sloppy with string checks in debug/pmprintf messages. This 'slight chance' is what drives me crazy.

Maybe I should check in all my current sources with the additional debug messages and comments as a branch and hope someone else will do the hard work.

comment:11 Changed 6 years ago by andib

OK, found wpacli to be problematic. I need to do some code cleanup and find the exact place, but chances are high that the problem is there. My current crippled version starts without a trap. Hopefully I'll have a fix within the next few days.

comment:12 Changed 6 years ago by andib

My current working branch does not trigger this problem. But I somehow lost debugging via pmprintf completely; I need to fix that. Still, my assumption that this is related to the debugging code seems to be true.

comment:13 Changed 6 years ago by andib

With my safer _strncpy_s function and the reworked (pm)printf debugging, I'm pretty sure this is solved now.
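The ticket doesn't show the author's _strncpy_s. As a hedged sketch of what a "safer" copy usually means, here is a hypothetical bounded_strcpy that never writes past the destination, always NUL-terminates, and reports truncation; the name and the 0/-1 return convention are illustrative assumptions, not the xwlan API.

```c
#include <stddef.h>

/* Hypothetical bounded copy (not the author's _strncpy_s).
   Copies at most dstsz-1 chars and always NUL-terminates dst.
   Returns 0 on success, -1 on invalid arguments or truncation. */
int bounded_strcpy(char *dst, size_t dstsz, const char *src)
{
    size_t i = 0;

    if (dst == NULL || dstsz == 0)
        return -1;                 /* nowhere to write safely */
    if (src == NULL) {
        dst[0] = '\0';             /* leave an empty, terminated string */
        return -1;
    }
    while (i + 1 < dstsz && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return src[i] == '\0' ? 0 : -1;  /* -1: source did not fit */
}
```

Unlike `strncpy`, this never leaves the destination unterminated, so a later `%s` in a debug message cannot run off the end of the buffer.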

Some pmprintf calls didn't get their needed va_args. The (pm)printf calls were wrapped in #define DPRINF_xxx(p) macros, so there were no compiler warnings.
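One way to get such calls checked, assuming a compiler that supports GCC's `format` attribute (GCC, Clang; VAC 3.65 is not assumed to), is to route the wrapper macro to a declared, attributed variadic function. The sketch below assumes the usual double-parenthesis idiom for a one-parameter macro; `debug_format`, `DPRINTF`, and `CHECK_PRINTF` are hypothetical stand-ins for the real pmprintf wrappers.

```c
#include <stdarg.h>
#include <stdio.h>

/* Mark argument 3 as a printf format whose variadic args start at 4,
   so -Wformat can flag missing or mismatched arguments. */
#if defined(__GNUC__)
#  define CHECK_PRINTF(fmt_idx, first_arg) \
       __attribute__((format(printf, fmt_idx, first_arg)))
#else
#  define CHECK_PRINTF(fmt_idx, first_arg)   /* unsupported: no checking */
#endif

/* Hypothetical formatter standing in for the pmprintf wrapper. */
int debug_format(char *buf, size_t size, const char *fmt, ...) CHECK_PRINTF(3, 4);

int debug_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);   /* bounded, always terminated if size > 0 */
    va_end(ap);
    return n;
}

/* Double-parenthesis wrapper: DPRINTF((buf, sizeof buf, "x=%d", x)) */
#define DPRINTF(p) debug_format p
```

With `-Wall` (which enables `-Wformat`), a call such as `DPRINTF((buf, sizeof buf, "%s: %d", name))` is now reported for the missing `%d` argument instead of compiling silently.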

Some more code cleanup and testing are needed.

comment:14 Changed 5 years ago by andib

Resolution: fixed
Status: reopened → closed

As the corresponding ticket in AN Mantis is now closed, I'm closing this one too.
