Version 1 (modified by Markus Thielen, 13 years ago)

Kernel Debugging Tips

OS/2's Toolkit 3.0 contained a file named KDEBUG.INF that provides concise but very helpful information about kernel debugging.

The following is an excerpt about setting breakpoints:

Setting useful breakpoints

There are many actions that can be invoked when you use a combination of breakpoint commands. You can easily set up the debugger to stop at a certain label, print the current system time in milliseconds, the current process, and the executing file, and then continue. This would be very useful, for example, in analyzing the performance of a subsystem .DLL. Another use would be to set a breakpoint on a file-read label in the file system and dump information about which file is being read, where in the file it is being read from, how much of it is being read, and what is reading it.

In order to do this, you must know some internal information about the kernel and other system components, such as labels of various application programming interface (API) functions or worker routines.

One way to keep some release independence is to base breakpoints on function labels plus an offset. Labels change much less often than the linear addresses of the functions. An example is as follows:

       BP _LDRGetMte +63 "DA %EDX L EAX +1; G"

This command places a breakpoint at the symbol _LDRGetMte, offset by hex 63. Whenever that breakpoint is hit, the debugger dumps (in ASCII) the memory at the linear address stored in register EDX, for a length contained in register EAX plus one byte. The program then continues executing.

Of course, this offset within the function may change, but if it does, it is easier to find the new offset than to maintain breakpoints based on linear addresses.

The effect of this breakpoint is to display the name of any executable file (such as one with a file type of .EXE, .DLL, or .SYS) that is started with DosExecPgm, DosStartSession, or DosLoadModule, or imported from any other executable file. This is very useful when debugging problems in applications.

The following series of actions can help you profile how often your code takes page faults:

  1. Unassemble at the label _ldrgetpage for approximately 30 instructions.
  2. Look for a call to _ldrvalidatemtehandle.
  3. Set a breakpoint on the instruction immediately following that call (where execution resumes when the call returns) as follows:

       BP %address ".lm %eax; G"

where address is the linear address of that instruction.

This breakpoint causes the loader to display the name of the code module (DLL or EXE) that code is being read from. All code that is ever executed is read at least once (the first time it is loaded), so some hits are expected. However, if you are seeing excessively long code load times and what looks like a large amount of thrashing, you may want to use this breakpoint. Also, if you have code residing on remote drives, you can use this breakpoint to see how often code is read from the remote drive, and possibly reconfigure it.

A breakpoint you can use to profile your code is:

   BP xxxxxx "DD SISData+4 L1; G"

where xxxxxx is the address or label you wish to profile.

This breakpoint gives you a running millisecond counter in hexadecimal. To measure the time between two points in the code, add this command to the breakpoints at both points; the time is displayed each time either breakpoint is hit. This helps narrow down the slow area of code. Be careful when analyzing the results, however: although the timer is reported in milliseconds, it is only updated every 16 milliseconds.
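The arithmetic for comparing two counter readings can be sketched in Python. The sketch assumes you copy the two hexadecimal values from the debugger output by hand. The helper name `elapsed_ms` is hypothetical. Treating the counter as a 32-bit value that can wrap around is also an assumption. Each reading can lag the true time by up to one 16 ms update, so the difference is only accurate to about plus or minus 16 ms.

```python
TICK_MS = 16  # update interval of the counter, per the text above


def elapsed_ms(first_hex: str, second_hex: str) -> tuple[int, int]:
    """Return (elapsed milliseconds, worst-case error in ms) between two
    counter readings. Hypothetical helper; assumes a 32-bit counter."""
    first = int(first_hex, 16)
    second = int(second_hex, 16)
    # Reduce modulo 2**32 so a wraparound between the readings still works.
    delta = (second - first) % (1 << 32)
    return delta, TICK_MS


# Example with made-up readings taken at two breakpoints:
print(elapsed_ms("0001a2b0", "0001a4f0"))  # (576, 16)
```

A result of (576, 16) means that roughly 560 to 592 ms elapsed between the two breakpoints.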

When the Presentation Manager Interface sets error information, you can query for it through the debugger by executing the following breakpoint:

   BP _winseterrorinfo

When this breakpoint is hit, the first parameter after the return address on the stack is the error code being set.

If you happen to press Ctrl+C while the speaker is sounding a tone, that tone will continue until the speaker is shut off. The command:

 O 61 41

writes the value 41 (hex) to I/O port 61, the speaker control port, which turns the speaker off.

Using the debugger, you can also see which keystrokes are being input to the Presentation Manager single input queue. The breakpoint:

 BP putinsqp "dw si l7; G"

will show you the keyboard scan codes before PMWIN gets them. This is helpful when tracking the direction of keystrokes.

You can also look at the processor's registers before they are affected by the OS/2 trap handler. By executing:

 BP trap$systemfatalfault

you can stop the system before the trap handler is invoked.

To look at the device driver header chain, you can use the following two commands:

 .D dev devhead 

This command will display the device driver information starting at the first device driver in the chain.

 .D dev 'devnext'

This will query the device driver chain, showing the installed device drivers.

If you ever need to break in just before the system reboots, use this:

 BP w_shutdown

To see which slots (threads) are being dispatched, you can look inside the scheduler by:

 BP _schgetnextrunner "dw tasknumber l1; G"

This gives you the slot number of each thread being dispatched. It is valuable for tracking performance issues, to see whether any one process is getting a disproportionately large share of the processor.
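To spot a slot that is getting a disproportionate share, you can tally the slot numbers the breakpoint prints. The Python sketch below assumes the hexadecimal words have already been extracted from the debugger log. The helper name `dispatch_share` is hypothetical. Counting dispatches only approximates processor share, because it ignores how long each thread runs once it is dispatched.

```python
from collections import Counter


def dispatch_share(slot_words: list[str]) -> dict[int, float]:
    """Map each slot number to its fraction of all observed dispatches.

    slot_words holds the hexadecimal words printed by the breakpoint,
    already extracted from the debugger output (an assumption)."""
    counts = Counter(int(word, 16) for word in slot_words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {slot: n / total for slot, n in counts.most_common()}


# Made-up sample: slot 0x12 was dispatched three times, slot 5 once.
print(dispatch_share(["0012", "0012", "0012", "0005"]))  # {18: 0.75, 5: 0.25}
```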

To interrupt a device driver just before it initializes, you can use:

 BP syiInitializeDeviceDriver 

This will stop just before the driver initializes so you can trace through the initialization routine.

Another useful breakpoint is one that tells you what kernel calls you are executing. There is a routine called "sci" that is used in all OS/2 kernel calls. By setting the following breakpoint you can have it report what kernel calls you are making.

 BP sci "ln wo(ss:esp)"

This breakpoint breaks on sci (system call interface) and lists the near symbol representing the name of the kernel call. The kernel call is determined from the return address on the stack. This is more effective than just stopping and listing near symbols because it displays just the name of the routine, which makes tracing a series of kernel calls much easier.

If you need to look into the functions dealing with the graphics engine (especially if you are writing a presentation driver such as a printer or display driver) you can set the following breakpoint:

   BP dispatch32 

This breakpoint is in PMGRE.DLL. When it is hit, look at the return address, then unassemble starting approximately 20 bytes before that address. You are looking for the first doubleword pushed onto the stack before the call. This doubleword holds the command flags in its high-order word and the engine function number in its low-order word. You could also find it in the stack frame: because it is the first doubleword pushed, the 32-bit calling conventions place it last in the frame. However, DDI stack frames are variable-sized, so you may have to search to find the end of the frame. Unassembling before the call works just as well and may be easier.
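Once you have found the pushed doubleword, splitting it into its two halves is a mask-and-shift operation. The Python sketch below does this. The helper name and the sample value are hypothetical; only the layout (flags in the high-order word, function number in the low-order word) comes from the text above.

```python
def split_ddi_dword(value: int) -> tuple[int, int]:
    """Split the first doubleword pushed before the call into
    (command flags, engine function number)."""
    flags = (value >> 16) & 0xFFFF  # high-order word: command flags
    function = value & 0xFFFF       # low-order word: function number
    return flags, function


# Hypothetical value read from a PUSH instruction in the unassembly:
flags, function = split_ddi_dword(0x00014045)
print(f"flags={flags:04x} function={function:04x}")  # flags=0001 function=4045
```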

For 16-bit presentation drivers, you would want to set the breakpoint as follows:

  BP dispatch16 

Since the 16-bit calling conventions put the function number first, you can simply look at the stack frame on this one.

Another useful breakpoint is as follows:

   BP _LDRNewExe "G %(DW(SS:ESP)); DD _pgSwappablePages L 3; G"

This command places a breakpoint at the symbol _LDRNewExe. When that breakpoint is reached, the program continues executing until control returns to the caller. The debugger then displays the number of pages of swappable memory in the system, followed by the number of fixed (resident) pages, followed by the number of pages of discardable memory (read-only or execute-only pages). The program then continues executing.

The effect of this breakpoint is to display, in hexadecimal, the number of pages in use by the whole system after every executable program is started.
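The three page counts are easier to compare once they are converted to kilobytes. The Python sketch below assumes 4 KB pages, which is the x86 page size. It also assumes you pass in the three hexadecimal doublewords in the order the text describes. The helper name and the sample values are made up.

```python
PAGE_SIZE_KB = 4  # x86 pages are 4 KB


def page_summary(dwords: list[str]) -> dict[str, int]:
    """Convert the three hex doublewords shown by "DD _pgSwappablePages L 3"
    into kilobytes: swappable, fixed (resident), discardable."""
    swappable, fixed, discardable = (int(d, 16) for d in dwords)
    return {
        "swappable_kb": swappable * PAGE_SIZE_KB,
        "fixed_kb": fixed * PAGE_SIZE_KB,
        "discardable_kb": discardable * PAGE_SIZE_KB,
    }


print(page_summary(["00000c00", "00000200", "00000100"]))
# {'swappable_kb': 12288, 'fixed_kb': 2048, 'discardable_kb': 1024}
```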

The term "G %(DW(SS:ESP))" means "Go (G) to the linear address (%) that is the doubleword (DW) at the top of the stack (SS:ESP) and stop". The operators "WO" and "BY" are similar to "DW". "WO" means "word," and "BY" means "byte." This type of setup can be useful when displaying parameters for a function call as well, for example:

       BP _LdrOpenNewExe "DA %(DW(SS:ESP+4)); G"

When this breakpoint is executed, the debugger dumps ASCII (DA) starting at the linear address (%) given in the doubleword (DW) at SS:ESP plus 4 bytes (the doubleword at the top of the stack is the address for _LdrOpenNewExe to return to). The program then continues executing.

The effect of this breakpoint is to display (in ASCII) the name of every executable module that is actually opened from a file. The earlier _LDRGetMte breakpoint displays names even when the module was attached to instead of being newly opened and loaded. This breakpoint gives only the names of newly opened modules. By setting both breakpoints and subtracting the second list of names from the first, you can determine the imported modules.
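Subtracting the two lists of names can be sketched in Python as an ordered set difference. This sketch rests on three assumptions. The names have already been collected from the debugger output. One breakpoint may print full paths while the other prints bare module names, so both are reduced to an upper-case module name. File names compare case-insensitively. The helper names are hypothetical.

```python
import ntpath


def module_key(name: str) -> str:
    """Reduce a path or bare module name to an upper-case module name
    (an assumption made so the two lists can be compared)."""
    base = ntpath.basename(name.strip())
    stem, _ext = ntpath.splitext(base)
    return stem.upper()


def not_newly_opened(getmte_names: list[str], opened_names: list[str]) -> list[str]:
    """Names seen at _LDRGetMte +63 that never appeared at _LdrOpenNewExe,
    in first-seen order and without duplicates."""
    opened = {module_key(n) for n in opened_names}
    result: list[str] = []
    for name in getmte_names:
        key = module_key(name)
        if key not in opened and key not in result:
            result.append(key)
    return result


print(not_newly_opened(
    ["BVSCALLS", r"C:\OS2\DLL\PMATM.DLL", "DISPLAY", "bvscalls"],
    [r"C:\OS2\DLL\PMATM.DLL"],
))  # ['BVSCALLS', 'DISPLAY']
```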

A series of commands that can be useful in determining the amount of memory an application requires is as follows:

 .PU   (pick a thread that is in the process you wish to examine)
 .SS n (where 'n' is the thread or slot number you picked)
 DL    (list all valid LDT selectors (all 16-bit segments))

If the DL command listing stops with an address not present or invalid, issue the ".ID x" command (where 'x' is the address) and then the G (Go) command. When the system stops again, the LDT page will be present. Reissue the ".SS n" and DL commands, and you will get a more complete listing of the LDT selectors. An example of the output follows:

     # .PU
       ... (generates a lot of information like...)
       Slot Pid  Ord  pPTDA    Name    pstkframe  CS:EIP        SS:ESP     cbargs
       0012 0006 0012 7d31ccb0 FOO.EXE 7d102f50 d02f:00002501 0b3f:00007eee 0008
       ...  ...  ...  ...      ...     ...      ...           ...           ...

     # .SS 12
     # DL
       ... (try it and stop at %7b176000...)
     # .ID %7b176000
       task|addr 0012|7b176000 queued, g to continue
     # G
       ... (stops at INT3 in tasking...)
       0170:fff629f4 cc             int     3
     # .SS 12
     # DL
       0007  Data    Bas=7b175000 Lim=0000ffff DPL=3 P  RO
       000f  Data    Bas=00010000 Lim=00001677 DPL=3 P  RW    A
       0017  Code    Bas=00020000 Lim=00003197 DPL=3 P  RE    A
       001f  Data    Bas=00030000 Lim=00001fff DPL=3 P  RW    A
       ...   ...     ...          ...          ...   .  ..    .

The first entry (0007 Data ...) points to the process' LDT itself, and should not be counted when adding up the amount of memory the application itself is using. However, it is part of the system overhead that is used for the process, and should be attributed as such.
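Adding up the DL listing can be sketched in Python, separating the LDT's own entry from the application's segments as the text describes. The sketch makes these assumptions:

- The lines follow the layout shown in the example above.
- Each segment's limit is byte-granular, so its size is limit + 1.
- The first matching line is the LDT itself.

The helper name is hypothetical. The result is only a rough measure: segment limits describe addressable size, not necessarily memory that is actually committed.

```python
import re

# Matches DL lines such as:
#   000f  Data    Bas=00010000 Lim=00001677 DPL=3 P  RW    A
DL_LINE = re.compile(
    r"^\s*([0-9a-fA-F]{4})\s+\w+\s+Bas=([0-9a-fA-F]+)\s+Lim=([0-9a-fA-F]+)"
)


def ldt_usage(dl_lines: list[str]) -> tuple[int, int]:
    """Return (application bytes, LDT overhead bytes) from DL output."""
    entries = [m for m in map(DL_LINE.match, dl_lines) if m]
    if not entries:
        return 0, 0
    # Segment size is limit + 1, assuming byte-granular limits.
    sizes = [int(m.group(3), 16) + 1 for m in entries]
    # The first entry is the LDT itself: system overhead, not application memory.
    return sum(sizes[1:]), sizes[0]


sample = [
    "0007  Data    Bas=7b175000 Lim=0000ffff DPL=3 P  RO",
    "000f  Data    Bas=00010000 Lim=00001677 DPL=3 P  RW    A",
    "0017  Code    Bas=00020000 Lim=00003197 DPL=3 P  RE    A",
    "001f  Data    Bas=00030000 Lim=00001fff DPL=3 P  RW    A",
    "...   ...     ...          ...          ...   .  ..    .",
]
print(ldt_usage(sample))  # (26640, 65536)
```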

The following breakpoints give you a great deal of information about the startup of an application:

BP g_tkExecPgm "? 'ExecPgm'; G" 
BP g_w_loadmodule "? 'LoadModule'; G" 
BP h_w_QAppType "? 'QryApplType'; DA DS:DX; G" 
BP _ldrGetMte +63 "DA %EDX ; G" 
BP _ldrOpenNewExe "DA %(DW(SS:ESP+4)); G" 
BP _load_error "DD SS:ESP L 3; G" 
BP h_w_getprocaddr "? 'GetProcAddr'; G" 
BP h_w_getprocaddr +23 "? 'by ordinal'; ? DI; G" 
BP h_w_getprocaddr +25 "? 'by name'; DA AX:DI; G" 

Together, these nine breakpoints show how a program loads and starts executing. They can be placed in a KDB.INI file on the system being tested so that you can determine which binary files are loaded and executed. Note again that these offsets may change in subsequent releases of the OS/2 kernel and loader. Some of the possible types of files are the following:

  .DLL  Dynamic link library
  .EXE  Executable program
  .FON  Static font file
  .PSF  Dynamic font file
  .QPR  Queue printer driver
  .PDR  Printer driver
  .SYS  Physical device driver
  .VDD  Virtual device driver
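When reading a long trace, it can help to label each printed name with its file type. The Python sketch below uses the extension table above. Names printed without an extension, such as bare module names, are reported separately. The helper name is hypothetical.

```python
import ntpath

# Extension table taken from the list above.
FILE_TYPES = {
    ".DLL": "Dynamic link library",
    ".EXE": "Executable program",
    ".FON": "Static font file",
    ".PSF": "Dynamic font file",
    ".QPR": "Queue printer driver",
    ".PDR": "Printer driver",
    ".SYS": "Physical device driver",
    ".VDD": "Virtual device driver",
}


def file_type(name: str) -> str:
    """Classify a name printed by the loader breakpoints by its extension."""
    ext = ntpath.splitext(name.strip())[1].upper()
    if not ext:
        return "no extension (bare module name)"
    return FILE_TYPES.get(ext, "other")


for name in [r"C:\OS2\DLL\TIMESNRM.PSF", "BVSCALLS", r"C:\OS2\PMSHELL.EXE"]:
    print(name, "->", file_type(name))
```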

The next two breakpoints print a message showing that the specified label was passed. This allows you to interpret the breakpoints that follow accurately.

  1. BP g_tkExecPgm "? 'DosExecPgm'; G"
  2. BP g_w_loadmodule "? 'DosLoadModule'; G"

The next breakpoint prints the name of the routine being executed, and prints the name of the file being queried.

  3. BP h_w_QAppType "? 'DosQueryAppType'; DA DS:DX; G"

A program issues DosQueryAppType (QAT) when it has to determine the application type of an executable file. For example, when you type 'FREDDY' at a command prompt and press Enter, the system program CMD.EXE has to find out whether the file 'FREDDY' is a .EXE file, a .CMD file, a .COM file, or some other type of file.

Some information can be derived from the file's extension. The DOS command processor works this way. However, you may still need to determine whether a .EXE file is a DOS application, a 16-bit OS/2 Version 1.3 application, a 32-bit OS/2 application, or a Windows application. DosQueryAppType (QAT) attempts to determine this in a way that is very similar to what happens when the program is actually loaded and executed.

Sometimes, DosQueryAppType (QAT) is issued to verify that the file exists. For example, SysInit (the thread that loads all the device drivers) has to determine whether a device driver exists, and whether the file contains a valid device driver. SysInit issues DosQueryAppType (QAT) for all of the BASEDEV= and DEVICE= statements in the CONFIG.SYS file.

Sometimes QAT does not need to open the file. If the Shell queries the type of a file several times in quick succession, the first QAT call determines the type of the file and saves it for a few cycles. If another QAT call is made immediately for the same file, the previously determined type is returned. This saved value is cleared when any other file-system access is made, such as when a file is deleted.
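The caching behavior described above can be modeled in Python as a single-entry cache that any other file-system access invalidates. This is an illustrative model only, not the OS/2 code. The class, its methods, and the stand-in `fake_determine` function are all hypothetical. The real cache's lifetime ("a few cycles") is not specified here.

```python
from typing import Callable, Optional, Tuple


class AppTypeCache:
    """Illustrative single-entry cache modeling the QAT behavior described
    above. This is not the OS/2 code; all names here are hypothetical."""

    def __init__(self, determine_type: Callable[[str], str]) -> None:
        self._determine_type = determine_type  # opens and inspects the file
        self._last: Optional[Tuple[str, str]] = None  # (path, type) of last query

    def query(self, path: str) -> str:
        key = path.upper()  # OS/2 file names are case-insensitive
        if self._last is not None and self._last[0] == key:
            return self._last[1]  # repeated query: the file is not opened
        app_type = self._determine_type(path)
        self._last = (key, app_type)
        return app_type

    def other_file_system_access(self) -> None:
        """Any other file-system access (e.g. a delete) clears the saved type."""
        self._last = None


opened: list = []


def fake_determine(path: str) -> str:
    opened.append(path)  # stands in for opening and reading the file
    return "32-bit OS/2 application"


cache = AppTypeCache(fake_determine)
cache.query(r"C:\OS2\PMSHELL.EXE")
cache.query(r"c:\os2\pmshell.exe")  # same file: answered from the saved type
cache.other_file_system_access()
cache.query(r"C:\OS2\PMSHELL.EXE")  # the file is examined again
print(len(opened))  # 2
```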

To see whether a file was actually opened for a loader or tasking call, see breakpoint number 5.

Breakpoint number 4 prints the name of any file being loaded, read, attached to, or otherwise referred to (for example, imported) in the loader.

  4. BP _ldrGetMte +63 "DA %EDX ; G"

If breakpoint number 1, 2, or 3 immediately precedes breakpoint number 4, then breakpoint number 4 indicates the name of the primary module being executed by DosExecPgm, or loaded by DosLoadModule (DLM), or queried by DosQueryAppType (QAT).

Breakpoint number 5 displays the name of any file being opened for execution by the loader. This breakpoint, along with breakpoint number 6, lets you track down a large number of commonly observed problems.

  5. BP _ldrOpenNewExe "DA %(DW(SS:ESP+4)) ; G"
  6. BP _load_error "DD SS:ESP L 3; G"

Breakpoint number 6 is the error handler for this area. Many error codes are not passed back up through a chain of base-pointer stack frames across several levels of function calls. Instead, a special stack frame is created when the loader is entered, and the stack offset to this frame is preserved. When an error occurs, _load_error() is called with two parameters: an error code and an optional MTE handle. In breakpoint number 6, the debugger displays three values:

- the return address of the location where the error was detected
- the error code for what happened
- either 0 or the module table entry (MTE) handle for where the error was found

File BSEERR.H contains the Control Program error codes. Common error codes are 2 (ERROR_FILE_NOT_FOUND) and 193 (ERROR_BAD_EXE_FORMAT). If an MTE handle is supplied, issue the '.lm[o] handle' command to obtain more information.
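A line printed by breakpoint number 6 can be decoded mechanically. The Python sketch below assumes the line has the layout shown at the end of the example output on this page: an address, followed by the return address, the error code, and the MTE handle, all in hexadecimal. The helper name is hypothetical. Only the two error codes named above are mapped to names; BSEERR.H lists the rest.

```python
# Error codes named in the text (values from BSEERR.H).
ERROR_NAMES = {2: "ERROR_FILE_NOT_FOUND", 193: "ERROR_BAD_EXE_FORMAT"}


def decode_load_error(dd_line: str) -> dict:
    """Decode one line of "DD SS:ESP L 3" output at _load_error.

    Assumed layout: address, then return address, error code, MTE handle,
    all in hexadecimal. Any trailing text on the line is ignored."""
    fields = dd_line.split()
    ret_addr, code, mte = (int(f, 16) for f in fields[1:4])
    return {
        "return_address": ret_addr,
        "error": ERROR_NAMES.get(code, f"error {code}"),
        "mte": mte if mte else None,  # 0 means no MTE handle was supplied
    }


info = decode_load_error("0030:00006724  fff8b6c3 00000002 00000000")
print(f"{info['return_address']:08x}", info["error"], info["mte"])
```

Note that the debugger prints the error code in hexadecimal, so ERROR_BAD_EXE_FORMAT (193) appears as 000000c1.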

After _load_error() finishes processing, it clears the stack of everything above the special stack frame. It then returns directly to the original caller, without executing any intervening code. This is efficient, and it avoids a large amount of error-handling code in each function.
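The control flow described above is a non-local exit. Control jumps from the point where the error is detected straight back to a frame saved on entry, skipping every intervening function. In C, this pattern is roughly comparable to setjmp/longjmp. The Python sketch below approximates it with an exception instead, which is a swapped-in technique. It is not the loader's implementation, and every function name in it, along with the extension check, is a hypothetical stand-in.

```python
from typing import Optional, Tuple


class LoadError(Exception):
    """Carries the two _load_error() parameters: error code and MTE handle."""

    def __init__(self, code: int, mte: int = 0) -> None:
        super().__init__(code, mte)
        self.code = code
        self.mte = mte


def read_header(name: str) -> str:
    if not name.upper().endswith((".EXE", ".DLL")):
        # Deep in the call chain: report the error and unwind immediately.
        raise LoadError(193)  # ERROR_BAD_EXE_FORMAT
    return name.upper()


def resolve_imports(name: str) -> str:
    # Intermediate level: no error-checking code is needed here.
    return read_header(name)


def load_module(name: str) -> Tuple[int, Optional[str]]:
    """Entry point; plays the role of the special frame saved on entry."""
    try:
        return 0, resolve_imports(name)
    except LoadError as err:
        # Control arrives here directly, skipping the intervening functions.
        return err.code, None


print(load_module("foo.dll"), load_module("readme.txt"))
```

The benefit is the same one the text describes: the intermediate function needs no error-handling code of its own.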

The next three breakpoints are used to determine what is being retrieved from a .DLL after a successful call to DosLoadModule (DLM). Breakpoint number 7 displays the label 'DosGetProcAddr', followed by information about the .DLL being queried, including its name.

Breakpoint number 8 displays the ordinal number of the code or data being sought in the .DLL. Breakpoint number 9 displays the name of the code or data being sought in the .DLL. Breakpoints number 8 and 9 are mutually exclusive.

 7. BP h_w_getprocaddr  "? 'DosGetProcAddr'; .LM EAX; G"
 8. BP h_w_getprocaddr +23 "? 'by ordinal'; ? DI; G"
 9. BP h_w_getprocaddr +25 "? 'by name'; DA AX:DI; G"

Breakpoints number 7, 8, and 9 do not show what is being imported from a .DLL. Determining that requires more information about the internal operation of the loader.

The following is an example that uses these breakpoints:

     LoadModule
     C:\OS2\DLL\TIMESNRM.PSF
     43 3a 5c 4f 53 32 5c 44-4c 4c 5c 54 49 4d 45 53 C:\OS2\DLL\TIMESNRM.PSF.
     QryApplType
     C:\OS2\PMSHELL.EXE
     43 3a 5c 4f 53 32 5c 50-4d 53 48 45 4c 4c 2e 45 C:\OS2\PMSHELL.EXE
     QryApplType
     04a8:00000002 C:\OS2\PMSHELL.EXE
     QryApplType
     04a8:00000002 C:\OS2\PMSHELL.EXE
     ExecPgm
     C:\OS2\PMSHELL.EXE
     LoadModule
     BVSCALLS
     GetProcAddr
     by ordinal
     0003H 3T 3Q 0000000000000011Y '.' TRUE
     LoadModule
     PMSDMRI
     LoadModule
     C:\OS2\DLL\PMATM.DLL
     GetProcAddr
     by name
     04a8:00000002 FONT_DRIVER_DISPATCH_TABLE
     LoadModule
     DISPLAY
     LoadModule
     C:\OS2\DLL\DISPLAY.DLL
     LoadModule
     PMCTLS
     LoadModule
     SPL1B
     LoadModule
     SPL1D
     53 50 4c 31 44                                  SPL1D
     0030:00006724  fff8b6c3 00000002 00000000 (this is what _load_error() displays)