Opened 8 years ago

Closed 8 years ago

#24 closed defect (fixed)

Trap in FS_FSINFO because of NULL "pBootFSInfo" pointer in pVolInfo

Reported by: erdmann
Owned by:
Priority: critical
Milestone: GA
Component: IFS
Version:
Severity: highest
Keywords:
Cc:

Description

Recently I had a trap in FAT32.IFS and I tracked it down to routine "FS_FSINFO" in file "fat32.c". The trap occurred in version 0.9.13, but the code is still unchanged in the current version 0.10.x. The trap occurred in this line of code: pAlloc->cUnitAvail = pVolInfo->pBootFSInfo->ulFreeClusters;

The trap occurs because at this point pBootFSInfo is NULL. It happened because OS2DASD/OS2LVM got confused about whether a (removable) USB stick was present or not; the USB stick in question was formatted with the FAT32 filesystem.
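
A minimal sketch of the kind of guard that would avoid dereferencing a NULL pBootFSInfo. The structure layouts, the function name and the error value are simplified, hypothetical stand-ins, not the real definitions from fat32.c:

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the driver's structures. */
typedef struct { unsigned long ulFreeClusters; } BOOTFSINFO;
typedef struct { BOOTFSINFO *pBootFSInfo; } VOLINFO;
typedef struct { unsigned long cUnitAvail; } ALLOC_SKETCH;

#define SKETCH_OK            0
#define SKETCH_VOLUME_GONE   1   /* placeholder, not a real OS/2 error code */

/* Fill in the free-unit count only if both pointers are usable. */
int fill_free_units(const VOLINFO *pVolInfo, ALLOC_SKETCH *pAlloc)
{
    if (pVolInfo == NULL || pVolInfo->pBootFSInfo == NULL)
        return SKETCH_VOLUME_GONE;   /* fail the request instead of trapping */

    pAlloc->cUnitAvail = pVolInfo->pBootFSInfo->ulFreeClusters;
    return SKETCH_OK;
}
```

The point is only that the request fails with an error code when the volume data is incomplete; which error code the real driver should return is left open here.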

Change History (52)

comment:1 by erdmann, 8 years ago

Just to add,

there are a whole lot of other places where pBootFSInfo pointer is used. They might suffer from the same problem.
The "pBootFSInfo" pointer should be properly set in FS_MOUNT in file "ifsmount.c" but apparently there are conditions where the volume is no longer mounted but the filesystem (FAT32.IFS in this case) is not properly notified of this fact (or the other way around: the IFS is notified of an unmount but never properly receives a remount request).
While this really is a problem of the kernel and/or the DMDs, the IFS still has to protect itself against this problem.


comment:2 by Lewis Rosenthal, 8 years ago

Priority: major → critical
Severity: medium → highest

If we hope to include a finished version of the new FAT32 driver in Blue Lion, we need to address issues like these ASAP.

Thanks for investigating this, Lars.

comment:3 by diver, 8 years ago

Milestone: GA
Type: task → defect

comment:4 by Valery V. Sedletski, 8 years ago

I have hopefully fixed this with r132 (binaries are available from the same link, ftp://osfree.org/upload/fat32/fat32-0.10-alpha6.zip). I added a check for the availability of the removable medium: if it is not available (for example, I pulled the flash stick out of the USB port without ejecting it), the volume is deleted, and it is re-added when you insert the stick again. So if I pull out the USB stick, the drive letter automatically disappears, and it reappears when I insert the stick again. I did this several times and saw no traps. This should be tested and discussed, of course, so please test it.

comment:5 by erdmann, 8 years ago

Hm,
at the point of trap pVolInfo was valid (a valid pointer) but the contained pointer pBootFSInfo was not.
I don't see how your change would fix the trap.

comment:6 by Valery V. Sedletski, 8 years ago

If the volume is not valid (uncertain), the driver now deletes the volume and does not try to use the fields of pVolInfo (which is itself invalid). So there is no reason to use pVolInfo->pBootFSInfo if pVolInfo is invalid (free(pVolInfo) should actually be called on it).

comment:7 by Valery V. Sedletski, 8 years ago

PS: pVolInfo is obtained by GetVolInfo(), which takes pVolInfo from pvpfsd. When the volume was unmounted (case MPINT_VOL_REMOVED of FS_MOUNT), the code did not zero out that pointer (I now do that, and I use free() instead of freeseg()). So what you actually saw was the pVolInfo of a deleted hVPB, which was still in pvpfsd and had not been freed. That is why you did not trap on pVolInfo == NULL but trapped on pVolInfo->pBootFSInfo == NULL.
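
The stale-pointer problem described above can be sketched as follows. The types and function names are hypothetical simplifications; the real FS_MOUNT and GetVolInfo() code differs:

```c
#include <stdlib.h>

typedef struct { int placeholder; } VOLINFO;           /* simplified stand-in */
typedef struct { VOLINFO *pVolInfo; } VPFSD_SKETCH;    /* stand-in for pvpfsd */

/* Lookup used by the other entry points: returns NULL once the volume is gone. */
VOLINFO *get_vol_info(const VPFSD_SKETCH *pvpfsd)
{
    return pvpfsd != NULL ? pvpfsd->pVolInfo : NULL;
}

/* What the "volume removed" case should do: free the data AND clear the
 * stored pointer, so later lookups cannot return freed memory. */
void on_volume_removed(VPFSD_SKETCH *pvpfsd)
{
    free(pvpfsd->pVolInfo);      /* free(NULL) is a no-op, so this is repeatable */
    pvpfsd->pVolInfo = NULL;
}
```

With the pointer cleared, callers see pVolInfo == NULL and can fail cleanly instead of reading a half-torn-down structure.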


comment:8 by erdmann, 8 years ago

Tested: ftp://osfree.org/upload/fat32/fat32-0.10-alpha6.zip

1) I cannot reproduce the trap. However it is of course difficult to reproduce and only time will show. I'll let you know if I run into the very same problem again.

2) I did a "chkdsk g: /F" on a USB stick (partition): that worked ok.

3) I did a "format g: /FS:FAT32" on a USB stick (partition): format did not do anything, the existing contents are still there. I also tried "fat32fmt.exe g:" with the same effect. Running format seems to leave the system in an inconsistent state: the commandline hangs, a subsequent "CHKDSK G: /F" does not finish. Eventually a Ctrl-C will work.

comment:9 by Valery V. Sedletski, 8 years ago

Did you use alpha6 for that? Format should finish. There was a problem in previous builds where format did not finish after formatting into FAT32, but a subsequent attempt succeeded (see the end of http://trac.netlabs.org/fat32/ticket/14, the bug with failing to set the volume label). This was fixed in alpha6. Also, I did not understand: did the format succeed for you, or was there an error? Does opening the disk in FC/2 afterwards succeed, or does it report an error accessing the disk? Does a subsequent format attempt succeed?

PS: Also note that format currently modifies the system area only, i.e. the boot block and the FATs, but not the data area. So I had a situation where I reformatted the partition to HPFS, then back to FAT32, and saw the same data. If you see this, you can try wiping the partition with dfsee, like this: 1) set current partition 2) wipe "" 0 <amount of bytes in that partition>


comment:10 by Valery V. Sedletski, 8 years ago

PPS: Now I see the same directory listing after formatting too. So, I acknowledge this problem.

comment:11 by Valery V. Sedletski, 8 years ago

Uploaded the new version, ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip. I merged the changes with trunk, so Lars' changes in FS_MOUNT are in. KO Myung Hun's cache changes are also in, but I disabled the cache init for now, as I observed a hang on boot in FS_INIT when InitCache() is called. This is temporary, of course. Also, I disabled a piece of code in FORMAT which I introduced in alpha6, so the directory listing is now cleared by FORMAT and no longer preserved. Please use this version.

comment:12 by Lewis Rosenthal, 8 years ago

A couple observations before I reboot to confirm some bad behavior I've just seen (could be entirely my fault, as I just noticed that CACHEF32 was locked and did not get updated when I copied over alpha 7 to my boot volume):

You missed updating the bldlevel strings in alpha7, so it reports itself as alpha6 (the date is newer, however). ;-)

There is a country.kor in \OS2\BOOT which I'm guessing came over with KO's patches. That isn't required for non-Korean systems, is it?

Is os2dasd.f32 required to replace OS2DASD.DMD, or...?

Sorry for all of the seemingly OT questions in this ticket. I am seeing files left after format, and want to eliminate what I can (I suspect it's the backleveled CACHEF32).

How should we be testing right now? I have:

IFS=C:\OS2\BOOT\FAT32.IFS /CACHE:2048 /h /ac:h
CALL=C:\OS2\CACHEF32.EXE /f /p:2 /m:50000 /b:250 /d:5000

You mention that the cache init is currently disabled, so I'm not sure whether I need to make a change in CONFIG.SYS for any of this.

comment:13 by erdmann, 8 years ago

Tested: ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip

1) "format g: /FS:FAT32" still does absolutely nothing. It returns right away without any error but also without any apparent effect. I can run it multiple times and it always returns without error but also without any apparent effect. If it does not actually wipe anything (meaning FAT at least) then it's useless for me. Have you considered using the appropriate IOCTLs to indicate start of formatting and at the end of formatting to redetermine media ? It also does not properly check if the correct volume label is entered or not. If I enter nothing for the volume label it will go ahead and attempt a format even though a valid volume label has been set in the past.

2) "chkdsk g: /F" works. It'll take its time but it eventually returns.

comment:14 by erdmann, 8 years ago

@ Lewis:

I only replace these files:
\os2\cachef32.exe|cachef32.sym
\os2\diskdump.exe|diskdump.sym
\os2\f32mon.exe|f32mon.sym
\os2\f32stat.exe|f32stat.sym
\os2\fat32chk.exe|fat32chk.sym
\os2\fat32fmt.exe|fat32fmt.sym
\os2\dll\ufat32.dll|ufat32.sym
\os2\boot\fat32.ifs|fat32.sym

I am fairly confident that this is all that needs to be replaced on an LVM enabled system.

comment:15 by Valery V. Sedletski, 8 years ago

2lewisr: Yes, I forgot to update the bldlevel strings. Country.kor is needed only on Korean systems (it is the Korean version of country.sys, if I understand correctly). Os2dasd.f32 is optional too (the purpose of both is described in the older fat32 documentation); it is needed as a replacement for os2dasd.dmd on Merlin-based systems, so that they understand FAT32 volumes. There is no need to change config.sys; everything should work out of the box. Cachef32 is needed, not only for the cache but also, for example, for codepage conversion to Unicode and to start CHKDSK on boot. However, CHKDSK should now start from the fat32.ifs init; starting it from cachef32.exe is now disabled. Also, please do "unlock cachef32.exe" before replacing it with a newer version.

2erdmann: 1) Strange, because everything works with alpha7 on my side; it reformats volumes successfully. It behaved as you describe with alpha6, but that was repaired in alpha7 for me. Please supply more details so that I can reproduce it on my side, as I see it working here. And yes, I issue all the required IOCTLs (BEGINFORMAT before formatting and REDETERMINEMEDIA after it; you can look into the sources if you are interested). It checks the volume label. If you enter no volume label, it will use "NO NAME". And if there was a volume label, it will ask for it. So I don't understand what you are talking about. More details, please.

comment:16 by Valery V. Sedletski, 8 years ago

PS: I looked into the sources again. Yes, in some cases the return code is not checked. The most notable case is the function that zeroes sectors. So there are cases where actions are performed "blindly". Also, many functions in the original Fat32Format have a "void" return type but call the "die" function in case of errors. So I need to check the sources for such cases... But it is strange that it works on three of my machines.
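
As an illustration of checking return codes rather than acting "blindly", here is a sketch of a sector-zeroing loop that propagates the first failure to its caller. The callback type, the function names and the demo writers are all hypothetical; this is not the Fat32Format code:

```c
#include <stdio.h>

/* Hypothetical low-level writer: returns 0 on success, non-zero on failure. */
typedef int (*sector_writer)(unsigned long sector, unsigned long count);

/* Zero `count` sectors starting at `start`, `chunk` sectors per call.
 * Stops at the first failing write and returns its error code. */
int zero_sectors(sector_writer write_fn, unsigned long start,
                 unsigned long count, unsigned long chunk)
{
    unsigned long done = 0;

    if (chunk == 0)
        return -1;                       /* invalid argument */

    while (done < count) {
        unsigned long n = (count - done < chunk) ? (count - done) : chunk;
        int rc = write_fn(start + done, n);
        if (rc != 0) {
            fprintf(stderr, "zeroing failed at sector %lu, rc = %d\n",
                    start + done, rc);
            return rc;                   /* let the caller decide what to do */
        }
        done += n;
    }
    return 0;
}

/* Demo writers for trying the loop out. */
int demo_writer_ok(unsigned long sector, unsigned long count)
{
    (void)sector; (void)count;
    return 0;
}

int demo_writer_fail_past_64(unsigned long sector, unsigned long count)
{
    return (sector + count > 64) ? 5 : 0;   /* pretend the device ends at sector 64 */
}
```

Returning the code instead of calling a "die" routine deep inside a helper lets FORMAT report a failure and still run its cleanup.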


comment:17 by Valery V. Sedletski, 8 years ago

Tested it on another machine, and it failed there. I have now fixed one error and everything works on both. The write_file() function returned an error because I specified the write size in bytes, but it needed to be converted into sectors. Also, the return code check was commented out. Uploaded the binaries again to ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip. The version strings are updated. Please try this version.
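
The bytes-versus-sectors mix-up can be illustrated with a small conversion helper. The helper name and the sentinel value are hypothetical; the sample numbers are taken from the format logs quoted elsewhere in this ticket (512-byte sectors, "Wrote 4221952 bytes" next to "Clearing out 8246 sectors"):

```c
/* Sentinel for "size is not a whole number of sectors" (hypothetical). */
#define SKETCH_BAD_SIZE ((unsigned long)-1)

/* Convert a byte count into a sector count, rejecting sizes that are not
 * an exact multiple of the sector size. */
unsigned long bytes_to_sectors(unsigned long cbBytes, unsigned long cbSector)
{
    if (cbSector == 0 || cbBytes % cbSector != 0)
        return SKETCH_BAD_SIZE;
    return cbBytes / cbSector;
}
```

For example, bytes_to_sectors(4221952, 512) yields 8246, matching the sector count in that log. Passing the byte count where a sector count is expected would request 512 times too much.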


comment:18 by Valery V. Sedletski, 8 years ago

2Lewis: Regarding the minimum set of files, in case you replace them only for testing and intend to restore the old ones afterwards: the set of fat32.ifs, ufat32.dll and cachef32.exe is sufficient. The other files are optional. I use my version of fat32 on my own systems with no problems, though it is called alpha. And format worked successfully on several machines, until I changed something.

comment:19 by Lewis Rosenthal, 8 years ago

Thanks for this.

Formatting appears to be successful, now, though I have managed to make a mess of my test stick (64GB unpartitioned) by formatting it HPFS and copying a bunch of files. The intent was to format it FAT32 after that, however, chkdsk reports it as (still) FAT32, and formatting FAT32 now fails (as does formatting it HPFS). With the stick inserted, DFSee even exits immediately. It will be interesting to see what is so bad about it.

Anyway, while the bldlevel strings are now correct, the version string displayed by FAT32.IFS at boot still shows:

FAT32.IFS version 0.10a6 Oct 14 2016

Not complaining, just want to ensure that if you have build time scripts to update that you get those in order.

More news when I get some more tests done (partitioned and unpartitioned).

Thanks!

PS - Thanks to you and Lars for confirming which files are necessary. I have updated my installation script to ensure that cachef32.exe is unlocked and to warn if any files cannot be copied (makes life easier). Please advise about the cache settings for current testing.

comment:20 by erdmann, 8 years ago

I reloaded: ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip

I made sure that all files are in the correct place and have the correct file dates. I also made sure that LIBPATH and PATH load the DLLs and EXEs in preference to where I have the 0.9.13 version of FAT32 installed.

I still cannot get format to work:

[d:\]format j: /FS:FAT32
The type of file system for the disk is FAT32.
The new type of file system is FAT32.
Enter the current volume label for drive j:
Warning!  All data on hard disk j: will be lost!
Proceed with FORMAT (Y/N)? Y
Enter up to 11 characters for the volume label
or press Enter for no volume label. FAT32 VOLUM
Size: 2048 MB 4208967 sectors
512 Bytes Per Sector, Cluster size 4096 bytes
Volume Serial No. is 1a0a:1129
Volume label is FAT32 VOLUM
32 Reserved Sectors, 4103 Sectors per FAT, 2 fats
525091 Total clusters
525090 Free Clusters
Formatting drive j:
Wrote 4221952 bytes in 0.02 seconds, 190.74 Megabytes/sec
Clearing out 8246 sectors for
Reserved sectors, fats and root cluster...
Initialising reserved sectors and FATs...
Done.

The format is blindingly fast but again it does not seem to do anything, all existing files are still there untouched. I am trying to format a FAT32 partition on a USB stick (removable). Maybe that is the difference (are you testing with fixed media only ?)

About volume label: as a confirmation the format asks for the existing label before formatting. But if I enter nothing or enter a wrong label (even though it has a valid label) it does the format anyway and overwrites the label with "NO NAME" instead of aborting the format.

By the way, with the initial version of ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip, I got this in POPUPLOG.OS2:

10-13-2016  08:12:04  SYS3175  PID 004b  TID 0001  Slot 00b9
D:\OS2\CHKDSK.COM
c0000005
1c7868d6
P1=00000002  P2=0001fffc  P3=XXXXXXXX  P4=XXXXXXXX  
EAX=0fffffff  EBX=00000008  ECX=00000001  EDX=00000020
ESI=00000017  EDI=1c780010  
DS=0053  DSACC=d0f3  DSLIM=7fffffff  
ES=0053  ESACC=d0f3  ESLIM=7fffffff  
FS=150b  FSACC=00f3  FSLIM=00000030
GS=0053  GSACC=d0f3  GSLIM=7fffffff
CS:EIP=005b:1c7868d6  CSACC=d0df  CSLIM=7fffffff
SS:ESP=0053:00020000  SSACC=d0f3  SSLIM=7fffffff
EBP=00020024  FLG=00010212

UFAT32.DLL 0002:000068d6



comment:21 by Valery V. Sedletski, 8 years ago

2Lars: No, I tried two fixed disk partitions, one 16 GB flash disk and a PAE ramdisk; everything formats successfully. The root directory listing changes: I copy some files to the disk, they appear in the directory, then I format the disk and they disappear.

Also, if it can't write to the disk, it should report an error, because I restored the error checking in write_file().

Just tried to enter an incorrect volume label:

[d:\]format e: /fs:fat32
The type of file system for the disk is FAT32.
The new type of file system is FAT32.
Enter the current volume label for drive e: ert
SYS0636: An incorrect volume label was entered for this drive.

So it responds with an error if an incorrect volume label is entered. If I enter the correct volume label, I see the following:

[d:\]format e: /fs:fat32
The type of file system for the disk is FAT32.
The new type of file system is FAT32.
Enter the current volume label for drive e: test
Warning!  All data on hard disk e: will be lost!
Proceed with FORMAT (Y/N)? y
Enter up to 11 characters for the volume label
or press Enter for no volume label. sdf
Size: 1024 MB 2650662 sectors
512 Bytes Per Sector, Cluster size 4096 bytes
Volume Serial No. is 1c18:282f
Volume label is SDF
32 Reserved Sectors, 2584 Sectors per FAT, 2 fats
330682 Total clusters
330681 Free Clusters
Formatting drive e:
Wrote 2666496 bytes in 2.06 seconds, 1.23 Megabytes/sec
Clearing out 5208 sectors for
Reserved sectors, fats and root cluster...
Initialising reserved sectors and FATs...
Done.

So, it accepts the previous volume label "test", formats the disk and the label is changed to "SDF" as intended. So, I wonder what is wrong on your machine.

Hm, but I acknowledge that if I simply press "Enter" and don't type any volume label, it formats the disk instead of quitting, right. It appears I missed this case; I usually use "format e: /fs:fat32 /v:test".
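
A sketch of a label confirmation check that also rejects an empty answer when the volume has a label. The function name is hypothetical, and the case-insensitive comparison is an assumption made for this sketch (the log above shows "test" being accepted, but not how the stored label is cased):

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `entered` matches `current` (ignoring case), 0 otherwise.
 * An empty `entered` only matches an empty `current`, so pressing Enter
 * on a labelled volume is treated as a mismatch. */
int label_confirmed(const char *current, const char *entered)
{
    size_t i;

    if (current == NULL || entered == NULL)
        return 0;
    if (strlen(current) != strlen(entered))
        return 0;

    for (i = 0; current[i] != '\0'; i++) {
        if (toupper((unsigned char)current[i]) !=
            toupper((unsigned char)entered[i]))
            return 0;
    }
    return 1;
}
```

FORMAT would abort with the SYS0636 message whenever this returns 0, instead of proceeding and overwriting the label with "NO NAME".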

sysinstx also writes correct bootblock now (no "os2boot" for now, only bootblock and microfsd files)

[d:\]sysinstx e:
The type of file system for the disk is FAT32.
The system files have been transferred.

And files are there. You have the minimal set of FreeLDR files and a bootblock installed now.

Regarding the trap: does it happen only with the initial version, or with the latest version too?

2Lewis: DFSee exits immediately? Maybe the disk is locked? I had such a problem with locking: when one of my disks becomes locked, trying to format it to FAT32, HPFS or any other FS fails with an error. "CHKDSK e:" works, but "CHKDSK e: /f" fails because the disk is locked. Are you seeing similar symptoms?

It is strange that FAT32.IFS shows an incorrect version on boot, because I changed both the bldlevel in the makefiles and the version in the headers (the latter is what fat32.ifs shows). But OK, I'll double-check it.

comment:22 by erdmann, 8 years ago

I now tried this:
1) run "chkdsk j: /F" on a FAT32 formatted partition on USB stick: it went ok, no trap in POPUPLOG.OS2
2) run "format j: /FS:FAT32": the format would not work
3) run "chkdsk j: /F" again: this time it would trap. I have TRAPDUMP=R0,F: in config.sys in order to catch Ring 0 traps and was able to capture a dump which shows this:

                        TRAP SCREEN INFORMATION
OS/2 Kernel Revision 14.106_SMP 
Exception in module:         
 TRAP 0008       ERRCD=0000  ERACC=****  ERLIM=******** CPU=00 
EAX=00000000  EBX=f8f30000  ECX=00000001  EDX=0000001d 
ESI=f8e646f0  EDI=0000001d  EBP=000046ca  FLG=00010046 
CS:EIP=0878:00005ff7  CSACC=009b  CSLIM=0000a697 
SS:ESP=1530:00004690  SSACC=1097  SSLIM=000046b7 
DS=0870  DSACC=0093  DSLIM=0000ba2a  CR0=8001003b 
ES=09a8  ESACC=0093  ESLIM=0000ff17  CR2=00a00000 
FS=0000  FSACC=****  FSLIM=******** 
GS=0000  GSACC=****  GSLIM=********
 
No Symbols Found 


    ****  3175 or 3171 trap encountered  ****
Unknown error code in 3175 trap info: 
 at 005b:1d556915 
P1=00000002  P2=0001fff0  P3=XXXXXXXX  P4=XXXXXXXX   
CS:EIP=005b:1d556915  CSACC=d0df  CSLIM=7fffffff 
SS:ESP=0053:0001fff0  SSACC=d0f3  SSLIM=7fffffff 
EBP=0002000c  FLG=00010206 
EAX=00000004  EBX=00000008  ECX=00000001  EDX=00000020 
ESI=00000017  EDI=1d550010   
DS=0053  DSACC=d0f3  DSLIM=7fffffff   
ES=0053  ESACC=d0f3  ESLIM=7fffffff   
FS=150b  FSACC=00f3  FSLIM=00000030 
GS=0053  GSACC=d0f3  GSLIM=7fffffff 

 
No Symbols Found 


    ****  3175 or 3171 trap encountered  ****
System Encountered an access violation
 in UFAT32.DLL 0002:00006915  at 005b:1d556915
 
The cs:eip values in the 317x screen do not match the
current cs:eip.  Therefore, unless this trap was handled
by an exception handler, TRAPDUMP=ON was not in config.sys

P1=00000002  P2=0001fff0  P3=XXXXXXXX  P4=XXXXXXXX   
EAX=00000004  EBX=00000008  ECX=00000001  EDX=00000020 
ESI=00000017  EDI=1d550010   
DS=0053  DSACC=d0f3  DSLIM=7fffffff   
ES=0053  ESACC=d0f3  ESLIM=7fffffff   
FS=150b  FSACC=00f3  FSLIM=00000030 
GS=0053  GSACC=d0f3  GSLIM=7fffffff 
CS:EIP=005b:1d556915  CSACC=d0df  CSLIM=7fffffff 
SS:ESP=0053:0001fff0  SSACC=d0f3  SSLIM=7fffffff 
EBP=0002000c  FLG=00010206 

I then checked the stick under Windows and it would find errors with the filesystem which it corrected (apart from others, "\WP ROOT. SF" was broken).

Now you have the place in UFAT32.DLL to look at ...

I am using the QSINIT loader. I suggest you try formatting a USB stick and see if you also get a trap. I am also running ACPI Version 3.23.04 which currently is a test version. I don't know if that has a bearing on the problem.

comment:23 by erdmann, 8 years ago

Just to be on the safe side, I reverted ACPI.PSD back to 3.23.01 and booted with the original OS2LDR.

I redid the test sequence I stated above. This time it would not trap, but the format still would not work. I also assume the trap will not happen every time. I suspect that the register state is somehow messed up under circumstances yet to be determined.

comment:24 by Valery V. Sedletski, 8 years ago

2Lars: I did chkdsk->format->chkdsk as you said, on flash stick. It checked and formatted successfully, no trap. (I never had ring0 traps). Trap 0008 is very bad, it is very difficult to track. Not sure how subsequent trap screens are related to final trap 0008. I looked into ufat32.dll at address 2:00006915, I see the instructions:

.00026902: 55                           push        ebp
.00026903: 89E5                         mov         ebp,esp
.00026905: 81EC1C000000                 sub         esp,00000001C
.0002690B: C745FC00000000               mov         d,[ebp][-04],000000000
.00026912: 8B4518                       mov         eax,[ebp][18]
.00026915: 8945E4                       mov         [ebp][-1C],eax
.00026918: 8B451C                       mov         eax,[ebp][1C]
.0002691B: 8945E8                       mov         [ebp][-18],eax

the instruction

   mov [ebp][-1C],eax

is likely to trap because of insufficient stack, since the previous instruction did not trap; it also deals with the stack, but at a different offset relative to ebp. Anyway, to be sure why it trapped, you would do better to attach a COM port terminal, install the OS/4 kernel (it shows more info than the standard kernel), and write the following commands into kdb.ini:

.b 115200t 3f8
g
ln
u
k

so it will show the disassembly and debug symbols. Also, you can use the "dg" command to check the stack segment limit, to be sure whether the stack is insufficient. I looked into the ufat32.dll disassembly and see that it trapped somewhere in CHKDSK, in static routines with no debug symbols. But I see that, according to the map file, 2:000068f5 is SetNextCluster().

.0002690B: C745FC00000000               mov         d,[ebp][-04],000000000

is

ULONG  dummy = 0;

and

.00026912: 8B4518                       mov         eax,[ebp][18]
.00026915: 8945E4                       mov         [ebp][-1C],eax

is

SetCluster.ulCluster = ulCluster;

SetCluster is a structure on the stack. It is already allocated, so I see no reason for this instruction to trap. SSLIM=0x7fffffff, so it is a 32-bit stack. According to the makefile, the stack size is 81920 bytes == 0x20000 bytes. ebp == 0x2000c, so the stack size is really too small and it trapped because of an exhausted stack. So we need to increase it. Why it led to a ring0 trap, I have no idea. Maybe you have TRAPDUMP set up so that it catches ring3 traps?
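
The register values in the trap screen from comment 22 allow a quick arithmetic check of where the trapping instruction writes. The sketch below only does that arithmetic, with a hypothetical helper name; it is not a diagnosis of the stack layout:

```c
/* Values copied from the trap screen in comment 22. */
#define TRAP_EBP   0x0002000cUL   /* EBP=0002000c */
#define TRAP_P2    0x0001fff0UL   /* P2=0001fff0, the faulting address */
#define TRAP_ESP   0x0001fff0UL   /* SS:ESP=0053:0001fff0 */

/* Stack size quoted from the ufat32.dll makefile, in decimal. */
#define MAKEFILE_STACK_BYTES 81920UL   /* equals 0x14000 in hex */

/* Address of a local variable located `below` bytes under the frame pointer,
 * i.e. the target of "mov [ebp][-1C],eax" when below == 0x1C. */
unsigned long frame_slot_address(unsigned long ebp, unsigned long below)
{
    return ebp - below;
}
```

frame_slot_address(TRAP_EBP, 0x1C) gives 0x1fff0, which equals both P2 and ESP in that trap screen, so the faulting write is to the lowest slot of the function's own frame. Separately, 81920 decimal is 0x14000, not 0x20000; the 0x20000 figure is retracted in comment 26.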

PS: But I see no relation of QSINIT and ACPI with FAT32. I have QSINIT too, but no Arca Noae ACPI installed.

comment:25 by Valery V. Sedletski, 8 years ago

PPS: 81920 bytes is the stack size specified in the ufat32.dll makefile, but in principle the stack in use should be the chkdsk.com stack. And chkdsk.com is a 16-bit program with a 16-bit stack, which should be less than 64k. So I wonder why CHKDSK uses the stack size specified in the DLL makefile. For SYS, I needed to switch stacks because sysinstx.com trapped because of insufficient stack. But with CHKDSK we see a different situation. Strange.

comment:26 by Valery V. Sedletski, 8 years ago

I was wrong about the stack size being 0x20000, and ebp=0x2000c has nothing to do with the stack limit, but we surely have too little stack. Also, I looked into popuplog.os2 and see many traps like the following on boot:

13.10.2016  17:54: 1  SYS3175  PID 0007  TID 0001  Slot 000B
D:\OS2\FAT32CHK.EXE
 c0000005
1ff368c7
P1=00000002  P2=0001fff8  P3=XXXXXXXX  P4=XXXXXXXX
EAX=00000004  EBX=00000008  ECX=00000001  EDX=00000020

ESI=00000017  EDI=1ff30010
DS=0053  DSACC=d0f3  DSLIM=7fffffff
ES=0053  ESACC=d0f3  ESLIM=7fffffff
FS=150b  FSACC=00f3  FSLIM=00000030
GS=0053  GSACC=d0f3  GSLIM=7fffffff
CS:EIP=005b:1ff368c7  CSACC=d0df  CSLIM=7fffffff
SS:ESP=0053:0001fff8  SSACC=d0f3  SSLIM=7fffffff
EBP=00020014  FLG=00010212

UFAT32.DLL 0002:000068c7

Here we have a trap in almost the same area and, it seems, for the same reason. But I see no ring0 traps here.

comment:27 by erdmann, 8 years ago

The only thing I can add to the Ring 0 trap screen contents I already posted:

CS=878 was pointing to OS2AHCI.ADD (first) code segment
DS=870 was pointing to OS2AHCI.ADD (first) data segment
ES=9A8 had a linear mapping (exists in the GDT) and is a 16-bit data segment but that's all I can say about it

I suspect that OS2AHCI (which I am using) trapped on dumping the trap into the trap dump partition. Nonetheless it succeeded.

And no, I don't have TRAPDUMP enabled for Ring 3 traps. As I already stated, I have it set to TRAPDUMP=R0,F: which only dumps in case of Ring 0 trap.

comment:28 by Valery V. Sedletski, 8 years ago

And now I see that the 2nd trap screen shows the same registers as the 3rd one. Did you find this info about os2ahci.add with the PMDF utility by analyzing the dump file? BTW, I tried to set up a DUMPFS partition on my machine, and it does not work. I press Ctrl-Alt-Num-Num and it tries to dump, but without success: os2dump is called and shows a prompt, but it stops at that stage. (I have 4 GB of RAM, so I created a 4 GB partition, formatted it with "format i: /fs:dumpfs /v:SADUMP", added TRAPDUMP=R0,I:, copied os2dump.hd to os2dump and then added ifs=dumpfs.ifs to config.sys.) I used the standard os2dump with a FAT partition when I had 1 GB of RAM, and it even worked, but now it won't. Do you use DUMPFS now? Any tips you can suggest?

Btw, cf. the DUMPFS ticket -- I tried to trap with this setup, but no trap so far.

comment:29 by erdmann, 8 years ago

Yes, I looked at the dump file's contents to find out where the selectors were pointing at. I load the dump file in PMDF and then for example a ".d DEV 870:0" will format the device driver header if selector 870 points to the first device driver data segment, a "dg 9A8" will display the GDT descriptor for selector 9A8 etc.
You get help about the supported command set by entering "?" and also ".?"

# .d DEV 870:0
 
          DevNext: 0858:0000 
          DevAttr: 8d80 
         DevStrat: 0000 
           DevInt: 000d 
          DevName: OS2AHCI$ 
        DevProtCS: 0878 
        DevProtDS: 0870 
        DevRealCS: 0000 
        DevRealDS: 0000 
# 

For properly setting up a dump partition:
1) read this: http://home.earthlink.net/~steve53/os2diags/dumpfs-user-guide.txt
Pay attention to what it says about your dump partition size
2) I am using QSINIT loader. It might also work with the original OS2LDR but I know for sure it works with QSINIT.
3) I think there was a corrupted DUMPFS package but the one from here:
http://www.os2site.com/sw/drivers/filesystem/dumpfs.zip
should be ok
4) I don't know how well the OS/4 kernel works with dumping. I always use the standard OS/2 kernel.
5) I vaguely remember that DUMPFS.IFS has to be the last loaded IFS but I might be mistaken. Nonetheless that's how I set it up in CONFIG.SYS.
6) Your formatting commandline looks ok to me. But you should add DUMPFS.IFS to the CONFIG.SYS with full path info:
IFS=C:\OS2\BOOT\DUMPFS.IFS
but maybe you did that already.


comment:30 by dazarewicz, 8 years ago

FWIW, OS2AHCI is not involved in dumping a trap dump. In fact, no driver is involved. Dumping is done completely by using the BIOS.

comment:31 by Valery V. Sedletski, 8 years ago

2Lars: yes, I downloaded it from os2site, and read the doc from Steven Levine site. Did everything as you described, size of partition is 4000 MB, and when I press Ctrl-Alt-Num_Num, the os2dump is called, it says

The memory dump will try to dump to partition: G
The amount of memory it expects to dump is:  3668928K

and it does not proceed any further.

OS/4 kernel should work with memory dumps. And I tried IBM's kernels too, with the same result.

2dazarewicz: Yes, right, os2dump works via the BIOS. That is, dumpfs.ifs may work via the disk driver, but os2dump uses direct BIOS disk access.

comment:32 by Valery V. Sedletski, 8 years ago

Fixed the volume label bug, added stack switching for CHKDSK, and increased the stack size to 128 KB; I hope it will be sufficient. The link is the same as before, ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip.

comment:33 by Valery V. Sedletski, 8 years ago

Upd: Repaired the display of the percentage formatted and added error code checks in most routines. So if it fails somewhere, it must panic with an error. Also, it should show the % formatted; please say if it does not. The link is the same as before, ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip.

comment:34 by erdmann, 8 years ago

Things have not improved:

after a "format i: /FS:FAT32" the disk is no longer accessible even though a drive letter will still be allocated. I ejected the USB stick and reinserted it, and a drive letter is again allocated but the partition cannot be accessed at all. No writing to the partition is possible.
I also tried to create a file on the formatted partition but no file can be saved to the partition.
There is clearly something wrong with the formatting code.

A "chkdsk i: /F" run directly after the formatting will now find errors:

The volume label is FAT32 VOLUM.
The Volume Serial Number is 1A09-3119.
CHKDSK is checking fats :Ok.
CHKDSK is checking files and directories...
CHKDSK is searching for lost data.
CHKDSK has searched 100% of the disk.
The stored free disk space is incorrect.
(524696 free allocation units are reported,
while 525090 free units are detected.)
The correct free space is set to 525090 allocation units.

  2150772736 bytes total disk space.
           0 bytes in 0 hidden files.
        4096 bytes in 1 directories.
           0 bytes in extended attributes.
           0 bytes in 1 user files.
  2150768640 bytes available on disk.

        4096 bytes in each allocation unit.
      525091 total allocation units.
      525090 available allocation units on disk.

0% of the files and directories are fragmented.

but it is not able to correct them. I can rerun the chkdsk command over and over again and it will continue to show the very same error.


comment:35 by erdmann, 8 years ago

Something to add:

I now used DFSee to format the partition with FAT32 and that works just fine.
DFSee reports that it cannot lock the complete disk (I suppose that's DosPhysicalDisk and DosDevIOCtl with the physical disk category), I answer with "go ahead anyway" and it continues and formats the partition. Please note that USBMSD.ADD is not called into when the DosPhysicalDisk/DosDevIOCtl calls are issued and therefore is also not in the loop on this failure. After formatting with DFSee I still cannot write a file to disk even after I ejected and reinserted it.

If I reboot with FAT32 0.9.13 in use, a subsequent chkdsk finds a problem. I run a chkdsk under Windows and reinsert the USB stick in the OS/2 box. I can now write a file to the partition.


comment:36 by Valery V. Sedletski, 8 years ago

The partition cannot be accessed because it is not mounted: the drive letter is allocated, but the disk is not mounted. But an error should be written in this case, and I wonder why you don't see one. DFSee probably cannot lock the logical disk, not the physical one (it formats a partition, not a whole disk). DosPhysicalDisk is not used in my case. The "(524696 free allocation units are reported, while 525090 free units are detected.)" error is usually shown on almost all disks under normal circumstances, and it is fixed neither by our CHKDSK nor by the Windows one. This is usual and is not a result of FORMAT. Your disk formatting did not finish with REDETERMINEMEDIA, but I don't see an error from it in your case. It should be

      "WARNING: Failed to do final remount, rc = %lu!\n"
      "WARNING: probably, you need to reboot for changes to take in effect.\n"

if the REDETERMINEMEDIA ioctl had returned a non-zero code, but that warning is not there.

I did not understand the part about usbmsd.add -- how can it not be called when a flash disk is formatted? It should be. The ADD is called by os2dasd.dmd, and os2dasd.dmd is called by fat32.ifs in this case.

comment:37 by erdmann, 8 years ago

What I meant to say about USBMSD.ADD: it is not called when the API function "DosPhysicalDisk" is called. Of course it is called when formatting a USB stick.

comment:38 by Valery V. Sedletski, 8 years ago

It seems I have reproduced your FORMAT bug on my side. Lewis has the following cachef32.exe command line:

CALL=C:\OS2\CACHEF32.EXE /f /p:2 /m:50000 /b:250 /d:5000

whereas, my line is:

call=D:\os2\cachef32.exe /l:off /p:1 /m:50000 /b:250 /d:5000

It seems the "/f" parameter is what breaks everything. When I set the command line the way Lewis has it, I get:

D:\>format g: /fs:fat32 /v:sadump
The type of file system for the disk is FAT32.
The new type of file system is FAT32.
Warning!  All data on hard disk g: will be lost!
Proceed with FORMAT (Y/N)? y
Size: 3584 MB 8193150 sectors
512 Bytes Per Sector, Cluster size 4096 bytes
Volume Serial No. is 19f8:291e
Volume label is SADUMP
32 Reserved Sectors, 7986 Sectors per FAT, 2 fats
1022143 Total clusters
1022142 Free Clusters
Formatting drive g:
100% percent of disk formatted ...
Wrote 8198144 bytes in 3.86 seconds, 2.03 Megabytes/sec
Clearing out 16012 sectors for
Reserved sectors, fats and root cluster...
Initialising reserved sectors and FATs...
Done.

and then

D:\>sysinstx g:
The type of file system for the disk is FAT32.
Error 2 writing to g:\boot\loader\preldr0.mdl file.
SYS0002: The system cannot find the file specified.

EXPLANATION: The file named in the command does not exist in the
current directory or search path specified or the file name was
entered incorrectly.

ACTION: Retry the command using the correct file name.

and creating a file fails. So, your case seems to be reproduced on my side. Please try my command line and see whether FORMAT and SYS succeed for you (and creating files too).
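The figures FORMAT prints above can be cross-checked by hand. Here is a minimal sketch of that arithmetic, assuming the usual FAT32 layout (reserved sectors first, then the FAT copies, then the data area) and 8 sectors per cluster (4096 / 512). The function names are made up for illustration and are not taken from Fat32Format:

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; they are not part of
   fat32.ifs or Fat32Format. */

/* Number of whole clusters that fit into the data area, i.e. what is
   left after the reserved sectors and all FAT copies. */
unsigned long fat32_total_clusters(unsigned long total_sectors,
                                   unsigned long reserved_sectors,
                                   unsigned long sectors_per_fat,
                                   unsigned long num_fats,
                                   unsigned long sectors_per_cluster)
{
    unsigned long system_sectors = reserved_sectors + num_fats * sectors_per_fat;

    assert(sectors_per_cluster != 0);
    assert(total_sectors >= system_sectors);
    return (total_sectors - system_sectors) / sectors_per_cluster;
}

/* Sectors to zero before writing the new structures: the reserved
   area, every FAT copy, and one cluster for the root directory. */
unsigned long fat32_sectors_to_clear(unsigned long reserved_sectors,
                                     unsigned long sectors_per_fat,
                                     unsigned long num_fats,
                                     unsigned long sectors_per_cluster)
{
    return reserved_sectors + num_fats * sectors_per_fat + sectors_per_cluster;
}
```

With the values printed for drive g: (8193150 sectors, 32 reserved sectors, 7986 sectors per FAT, 2 FATs) this gives 1022143 clusters and 16012 sectors to clear, which matches the output. One cluster goes to the root directory, which leaves the 1022142 free clusters reported.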

comment:39 by Valery V. Sedletski, 8 years ago

PS: you can try this without a reboot: turn lazy write off:

cachef32 /l:off

and format works, turn it on:

cachef32 /l:on

and it fails. So, it seems to be a lazy write bug. But I always switch lazy write off. And I doubt that we can fix lazy write soon, as it was always buggy :(.

comment:40 by Valery V. Sedletski, 8 years ago

PPS: I added the FAT32_STOPLW ioctl before FORMAT and SYS and FAT32_STARTLW after them. Now after FORMAT I see an empty disk and FC/2 shows its volume label, so it is accessible. Then I run SYSINSTX and new files and dirs appear on the disk; I can walk the dirs, so the disk is accessible. But I cannot create files on it until I disable lazy write. I need to work around this somehow.
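The stop/restart bracket described above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual FORMAT code: how FAT32_STOPLW and FAT32_STARTLW are really issued is not shown in this ticket, so the two callbacks, the struct and all function names here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation on a volume (e.g. FORMAT or SYS); returns 0 on success. */
typedef int (*volume_op_fn)(void *ctx);

/* Hypothetical wrapper around the two ioctls named in the comment above. */
struct lw_control {
    int (*stop_lw)(void *ctx);   /* would issue FAT32_STOPLW  */
    int (*start_lw)(void *ctx);  /* would issue FAT32_STARTLW */
    void *ctx;
};

/* Run op with lazy write stopped. Lazy write is restarted even when op
   fails; the first non-zero return code is passed back to the caller. */
int run_with_lazy_write_stopped(const struct lw_control *lw,
                                volume_op_fn op, void *op_ctx)
{
    int rc, rc_restart;

    assert(lw != NULL && lw->stop_lw != NULL && lw->start_lw != NULL && op != NULL);
    rc = lw->stop_lw(lw->ctx);
    if (rc != 0)
        return rc;               /* could not stop lazy write: do not touch the disk */
    rc = op(op_ctx);
    rc_restart = lw->start_lw(lw->ctx);
    return rc != 0 ? rc : rc_restart;
}

/* Demo with fake callbacks that only record the order of the calls. */
struct demo_log { char calls[4]; int n; int op_rc; };

static void demo_note(struct demo_log *log, char c)
{
    if (log->n < 3)
        log->calls[log->n++] = c;
}
static int demo_stop(void *ctx)  { demo_note(ctx, 'S'); return 0; }
static int demo_start(void *ctx) { demo_note(ctx, 'R'); return 0; }
static int demo_format(void *ctx)
{
    struct demo_log *log = ctx;
    demo_note(log, 'F');
    return log->op_rc;           /* simulated result of the format step */
}

/* Runs the bracket once and copies the recorded call order to calls_out. */
int demo_run(int op_rc, char calls_out[4])
{
    struct demo_log log;
    struct lw_control lw;
    int rc;

    memset(&log, 0, sizeof log);
    log.op_rc = op_rc;
    lw.stop_lw = demo_stop;
    lw.start_lw = demo_start;
    lw.ctx = &log;
    rc = run_with_lazy_write_stopped(&lw, demo_format, &log);
    memcpy(calls_out, log.calls, sizeof log.calls);
    return rc;
}
```

The point of the wrapper is that lazy write is restarted on the failure path as well, so a failed FORMAT does not leave the cache helper stopped.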

comment:41 by Valery V. Sedletski, 8 years ago

Haha, DosClose fails when I create files with LW enabled. So, I create a file, write some info into it and save it successfully, but when I close it, it fails. It is easy to verify this: create a file in FC/2 with Shift-F4, write some text into it, and save with F2. If you then press Esc to close it, you see the message "There are unsaved changes". You press F2 again, and it goes on over and over. I even wrote a test program that tries to create a file on the disk, and confirmed that I get ERROR_FILE_NOT_FOUND when calling DosClose.
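A test of this kind can be sketched in portable C. This is not the program mentioned above (which is not shown in the ticket and called DosClose directly); it uses standard stdio as a stand-in, and the function name is made up. The idea is the same: check the result of every step, including the close, because a deferred write error may only surface there.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: create a file, write text to it and close it.
   Returns 0 on success, or the number of the failing step:
   1 = open, 2 = write, 3 = close. */
int create_write_close(const char *path, const char *text)
{
    FILE *f;
    size_t len;

    assert(path != NULL && text != NULL);

    f = fopen(path, "wb");
    if (f == NULL)
        return 1;

    len = strlen(text);
    if (fwrite(text, 1, len, f) != len) {
        fclose(f);
        return 2;
    }

    /* fclose flushes buffered data, so its return value must be checked too. */
    if (fclose(f) != 0)
        return 3;

    return 0;
}
```

Calling it with a path on the affected drive and getting a non-zero result from the close step would correspond to the behaviour described above.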

comment:42 by Valery V. Sedletski, 8 years ago

Now I see that FS_CLOSE calls FS_COMMIT, which fails with an ERROR_FILE_NOT_FOUND error. The failing function is FindPathCluster, which returns FAT_EOF (it does not find the file's first cluster).

Last edited 8 years ago by Valery V. Sedletski

comment:43 by erdmann, 8 years ago

Yes, this problem is related to lazy writing: if I turn off lazy writing, the format procedure completes and the newly formatted partition is also writable, even though the format procedure reports error 26 (ERROR_NOT_DOS_DISK -> Unknown media type):

[d:\]format h: /FS:FAT32
The type of file system for the disk is FAT32.
The new type of file system is FAT32.
Enter the current volume label for drive h: FAT32 VOLUM
Warning!  All data on hard disk h: will be lost!
Proceed with FORMAT (Y/N)? Y
Enter up to 11 characters for the volume label
or press Enter for no volume label. FAT32 VOLUM
Size: 2048 MB 4208967 sectors
512 Bytes Per Sector, Cluster size 4096 bytes
Volume Serial No. is efc:2f65
Volume label is FAT32 VOLUM
32 Reserved Sectors, 4103 Sectors per FAT, 2 fats
525091 Total clusters
525090 Free Clusters
Formatting drive h:
100% percent of disk formatted ...
Wrote 4221952 bytes in 2.06 seconds, 1.95 Megabytes/sec
Clearing out 8246 sectors for
Reserved sectors, fats and root cluster...
Initialising reserved sectors and FATs...
WARNING: failed to set the volume label, rc=26
Done.

Under the given circumstances I think it would be a good idea to remove the /L switch from cachef32.exe until lazy writing is fixed (I assume that the /F switch does not enable lazy writing by itself; if it does, it should also be removed).
That will avoid a lot of problem reports that merely stem from a known problem.

Last edited 8 years ago by erdmann

comment:44 by Valery V. Sedletski, 8 years ago

To Lars: The error "WARNING: failed to set the volume label, rc=26" should not be there. Since I changed it to use sector mode I don't see it anymore; it only occurred with previous builds. Anyway, your setup differs from mine somehow, but I wonder how. Could you give more details about your system? Regarding the "/F" switch, I agree. We need to always start the cache helper, but should probably disable lazy writing until it gets fixed. And it seems lazy writing is enabled by default when "/F" is set, because I need to run "cachef32 /l:off" before it works.

PS: Previously I got ERROR_NOT_DOS_DISK == 26 when I formatted to fat32 over fat32. But if I formatted over hpfs, or any other FS, it completed successfully. So, do you see the same? This symptom went away when I added the FAT32_SECTORIO ioctl after BEGINFORMAT. Maybe you forgot to update something? Or is some older DLL left on the LIBPATH?

Last edited 8 years ago by Valery V. Sedletski

comment:45 by erdmann, 8 years ago

I am absolutely sure I unpacked all files from ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip that I listed above. I checked that file dates were correct (they are 15.10.2016 and 15.10.2016 for the various files, I didn't bother with documentation files) and I checked that PATH and LIBPATH would pick up the executables and UFAT32.DLL from the correct directories.

No, there still must be something. But you are right, the partition formatted as FAT32 already had FAT32 in use. I suspect it has something to do with FS_MOUNT: the partition to format is already mounted as FAT32 whereas it would not be if it were a different filesystem before the format. I can try and format a partition that has something else than FAT32 and report results.

comment:46 by Valery V. Sedletski, 8 years ago

Theoretically, the BEGINFORMAT ioctl should remount the partition to the FAT32 IFS, so that it is formatted using fat32.ifs regardless of the initial filesystem. Sorry, I cannot check whether 15.10.16 is the latest, because I have already overwritten the files at the same link: ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip. (I uploaded a newer version with lazy write disabled in the IFS.) Please check whether something changed. Unfortunately, I need to reproduce your situation on my machine to fix your problems, but I don't have the required details, so it's bad.

comment:47 by Valery V. Sedletski, 8 years ago

PS: The 15 Oct files seem to be older than the previous version I uploaded: today is Oct 20, and I uploaded it one or two days ago -- well after Oct 15.

comment:48 by erdmann, 8 years ago

Yes, with the ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip and latest file date of 20.10.2016 it now works fine. I can format with FAT32 no matter what filesystem there was initially. After formatting I can write files to it.

There is one odd thing in conjunction with FAT16: if I format a FAT16 formatted drive to FAT32, write a file to it and then reformat with FAT16, then I still see the file and a chkdsk (on that now FAT16 formatted drive) gives this error:

[d:\]chkdsk i: /F
The current hard disk drive is: I:
The type of file system for the disk is FAT.
The volume label is FAT16 VOLUM.
The Volume Serial Number is 9C39-A015.
SYS1336: Disk error reading the File Allocation Table 1.
Processing cannot continue.
SYS1374: File Allocation Table (FAT) is bad on drive I.

But that might also be related to an error in the FAT16 implementation of the OS/2 kernel. You can close this bug.

comment:49 by Valery V. Sedletski, 8 years ago

Good. I still see a file after reformatting fat32->fat too, but that is no surprise, as Fat32Format does not seem to overwrite the root dir. Maybe I need to clear it too. And yes, I see the same error with FAT CHKDSK here. But this seems to be a bug in the FAT implementation, not in FAT32. (FAT is older and knows nothing about FAT32, although they are similar).

PS: But there's still one bug in FAT32 -- it sometimes locks the very first FAT32 drive letter, so it then fails to reformat to any file system (with an error opening or locking the disk).

PPS: And yes, I needed to disable lazy writing in fat32.ifs for now, otherwise it does not work. I suspect that after FORMAT, lazy write from the previous volume is still active, tries to flush, and corrupts the volume.

Last edited 8 years ago by Valery V. Sedletski

comment:50 by KO Myung-Hun, 8 years ago (in reply to: 12)

Replying to lewisr:

A couple observations before I reboot to confirm some bad behavior I've just seen (could be entirely my fault, as I just noticed that CACHEF32 was locked and did not get updated when I copied over alpha 7 to my boot volume):

You missed updating the bldlevel strings in alpha7, so it reports itself as alpha6 (the date is newer, however). ;-)

There is a country.kor in \OS2\BOOT which I'm guessing came over with KO's patches. That isn't required for non-Korean systems, is it?

country.kor is required to use Korean characters (Hangul) correctly on non-Korean systems. In other words, to use CP949 on non-Korean systems. So it should be shipped as well, as an option.

comment:51 by Valery V. Sedletski, 8 years ago

This doesn't repeat anymore, so, closing it too.

comment:52 by Valery V. Sedletski, 8 years ago

Resolution: fixed
Status: new → closed