Opened 8 years ago
Closed 8 years ago
#24 closed defect (fixed)
Trap in FS_FSINFO because of NULL "pBootFSInfo" pointer in pVolInfo
Reported by: | erdmann | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | GA |
Component: | IFS | Version: | |
Severity: | highest | Keywords: | |
Cc: |
Description
Recently I had a trap in FAT32.IFS and I tracked it down to routine "FS_FSINFO" in file "fat32.c". The trap occurred in version 0.9.13, but the code is still unchanged in the current version 0.10.x. The trap occurred in this line of code: pAlloc->cUnitAvail = pVolInfo->pBootFSInfo->ulFreeClusters;
The trap occurs because at this point, pBootFSInfo is NULL. It happened because OS2DASD/OS2LVM got confused about whether a (removable) USB stick was present or not, and the said USB stick was formatted with the FAT32 filesystem.
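A defensive check at the trapping line could look like the following minimal sketch. The structure layouts, the function name `fill_alloc_info` and the error constants are simplified stand-ins assumed for illustration, not the actual definitions from fat32.c; only the field names `pBootFSInfo`, `ulFreeClusters` and `cUnitAvail` come from the line quoted above.

```c
#include <stddef.h>

/* Simplified stand-ins for the driver's structures (assumed layouts). */
typedef struct { unsigned long ulFreeClusters; } BOOTFSINFO;
typedef struct { BOOTFSINFO *pBootFSInfo; } VOLINFO;
typedef struct { unsigned long cUnitAvail; } FSALLOCATE_SKETCH;

/* Hypothetical return codes used only in this sketch. */
#define SKETCH_NO_ERROR            0
#define SKETCH_ERROR_INVALID_DRIVE 15

/* Copy the free-cluster count only if both pointers are valid. */
int fill_alloc_info(const VOLINFO *pVolInfo, FSALLOCATE_SKETCH *pAlloc)
{
    if (pVolInfo == NULL || pVolInfo->pBootFSInfo == NULL)
        return SKETCH_ERROR_INVALID_DRIVE;   /* refuse instead of trapping */

    pAlloc->cUnitAvail = pVolInfo->pBootFSInfo->ulFreeClusters;
    return SKETCH_NO_ERROR;
}
```

Such a guard only turns the trap into an error return; it does not explain why the pointer was NULL in the first place.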
Change History (52)
comment:2 by , 8 years ago
Priority: | major → critical |
---|---|
Severity: | medium → highest |
If we hope to include a finished version of the new FAT32 driver in Blue Lion, we need to address issues like these ASAP.
Thanks for investigating this, Lars.
comment:3 by , 8 years ago
Milestone: | → GA |
---|---|
Type: | task → defect |
comment:4 by , 8 years ago
I have hopefully fixed this with r132 (binaries are available from the same link, ftp://osfree.org/upload/fat32/fat32-0.10-alpha6.zip). I added a check for whether the removable medium is available; if it is not (for example, I pulled the flash stick out of the USB port without ejecting it), the volume is deleted, and it is re-added when you insert the stick again. So, if I pull out the USB stick, the drive letter automatically disappears, and it reappears when I insert the stick again. I did this several times and saw no traps. This should be tested and discussed, of course, so please test it.
comment:5 by , 8 years ago
Hm,
at the point of trap pVolInfo was valid (a valid pointer) but the contained pointer pBootFSInfo was not.
I don't see how your change would fix the trap.
comment:6 by , 8 years ago
If the volume is not valid (uncertain), it will delete the volume and will not try to use the fields of pVolInfo (which is itself invalid). So there is no reason to use pVolInfo->pBootFSInfo if pVolInfo is invalid (it should actually have been released with free(pVolInfo)).
comment:7 by , 8 years ago
PS: pVolInfo is obtained by GetVolInfo(), which takes it from pvpfsd. When the volume was unmounted (case MOUNT_VOL_REMOVED of FS_MOUNT), that pointer was not zeroed out (I now zero it, and I use free() instead of freeseg()). So what you actually saw was the pVolInfo of a deleted hVPB, which was still in pvpfsd and had not been freed. That is why you did not trap on pVolInfo == NULL but on pVolInfo->pBootFSInfo == NULL.
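The stale-pointer situation described here can be sketched as follows. This is an illustration under stated assumptions, not the driver code: `VPFSD_SKETCH`, `mount_volume`, `remove_volume` and `get_vol_info` are hypothetical names, and plain `calloc`/`free` stand in for the driver's allocator.

```c
#include <stdlib.h>

/* Simplified stand-in for the volume info structure. */
typedef struct { void *pBootFSInfo; } VOLINFO;

/* Stand-in for the per-volume slot that stores the pVolInfo pointer. */
typedef struct { VOLINFO *pVolInfo; } VPFSD_SKETCH;

/* Lookup: returns whatever the slot currently holds. */
VOLINFO *get_vol_info(const VPFSD_SKETCH *slot)
{
    return slot->pVolInfo;
}

/* Mount: allocate a zeroed volume info block; 0 on success, -1 on failure. */
int mount_volume(VPFSD_SKETCH *slot)
{
    slot->pVolInfo = calloc(1, sizeof(VOLINFO));
    return slot->pVolInfo != NULL ? 0 : -1;
}

/* Removal: free the block AND clear the slot, so that later lookups
   see NULL instead of a pointer to released memory. */
void remove_volume(VPFSD_SKETCH *slot)
{
    free(slot->pVolInfo);      /* free(NULL) is a no-op, so this is repeatable */
    slot->pVolInfo = NULL;
}
```

With the slot cleared, a caller that checks the lookup result for NULL can fail cleanly rather than dereference fields of a volume that no longer exists.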
comment:8 by , 8 years ago
Tested: ftp://osfree.org/upload/fat32/fat32-0.10-alpha6.zip
1) I cannot reproduce the trap. However it is of course difficult to reproduce and only time will show. I'll let you know if I run into the very same problem again.
2) I did a "chkdsk g: /F" on a USB stick (partition): that worked ok.
3) I did a "format g: /FS:FAT32" on a USB stick (partition): format did not do anything, the existing contents are still there. I also tried "fat32fmt.exe g:" with the same effect. Running format seems to leave the system in an inconsistent state: the commandline hangs, a subsequent "CHKDSK G: /F" does not finish. Eventually a Ctrl-C will work.
comment:9 by , 8 years ago
Did you try that with alpha6? Format should finish. There was a problem in previous builds where format did not finish when formatting into FAT32, but a subsequent attempt succeeded (see the end of http://trac.netlabs.org/fat32/ticket/14, the bug with failing to set the volume label). This was fixed with alpha6. Also, I did not understand: did the format succeed for you, or was there an error? Does opening the disk in FC/2 after that succeed, or does it report an error accessing the disk? Does a subsequent format attempt succeed?
PS: Also note that format atm modifies the system area only, i.e., the boot block and the FATs, but not the data area. So I had a situation where I reformatted the partition to HPFS, then back to FAT32, and saw the same data. If you see this, you can try wiping the partition with DFSee, like this: 1) set the current partition 2) wipe "" 0 <amount of bytes in that partition>
comment:10 by , 8 years ago
PPS: Now I see the same directory listing after formatting too. So, I acknowledge this problem.
comment:11 by , 8 years ago
Uploaded the new ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip version. I merged changes with trunk, so Lars' changes in FS_MOUNT are in. KO Myung Hun's changes in cache are also in, but I disabled the cache init for now, as I observed a hang on boot in FS_INIT when InitCache() is called. This is temporary, of course. Also, I disabled a code piece in FORMAT which I introduced in alpha6, so the directory listing is now cleared by FORMAT and is no longer preserved. Please use this version.
follow-up: 50 comment:12 by , 8 years ago
A couple observations before I reboot to confirm some bad behavior I've just seen (could be entirely my fault, as I just noticed that CACHEF32 was locked and did not get updated when I copied over alpha 7 to my boot volume):
You missed updating the bldlevel strings in alpha7, so it reports itself as alpha6 (the date is newer, however). ;-)
There is a country.kor in \OS2\BOOT which I'm guessing came over with KO's patches. That isn't required for non-Korean systems, is it?
Is os2dasd.f32 required to replace OS2DASD.DMD, or...?
Sorry for all of the seemingly OT questions in this ticket. I am seeing files left after format, and want to eliminate what I can (I suspect it's the backleveled CACHEF32).
How should we be testing right now? I have:
IFS=C:\OS2\BOOT\FAT32.IFS /CACHE:2048 /h /ac:h
CALL=C:\OS2\CACHEF32.EXE /f /p:2 /m:50000 /b:250 /d:5000
You mention that the cache init is currently disabled, so I'm not sure whether I need to make a change in CONFIG.SYS for any of this.
comment:13 by , 8 years ago
Tested: ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip
1) "format g: /FS:FAT32" still does absolutely nothing. It returns right away without any error but also without any apparent effect. I can run it multiple times and it always returns without error but also without any apparent effect. If it does not actually wipe anything (meaning FAT at least) then it's useless for me. Have you considered using the appropriate IOCTLs to indicate start of formatting and at the end of formatting to redetermine media ? It also does not properly check if the correct volume label is entered or not. If I enter nothing for the volume label it will go ahead and attempt a format even though a valid volume label has been set in the past.
2) "chkdsk g: /F" works. It'll take its time but it eventually returns.
comment:14 by , 8 years ago
@ Lewis:
I only replace these files:
\os2\cachef32.exe|cachef32.sym
\os2\diskdump.exe|diskdump.sym
\os2\f32mon.exe|f32mon.sym
\os2\f32stat.exe|f32stat.sym
\os2\fat32chk.exe|fat32chk.sym
\os2\fat32fmt.exe|fat32fmt.sym
\os2\dll\ufat32.dll|ufat32.sym
\os2\boot\fat32.ifs|fat32.sym
I am fairly confident that this is all that needs to be replaced on an LVM enabled system.
comment:15 by , 8 years ago
2lewisr: Yes, I forgot to update the bldlevel strings. Country.kor is needed only for Korean systems (it's the Korean version of country.sys, if I understand correctly). Os2dasd.f32 is optional too (the purpose of both is described in the older fat32 documentation). It is needed as a replacement for os2dasd.dmd on Merlin-based systems, so that they understand FAT32 volumes. No need to change config.sys. Everything should work out of the box. Cachef32 is needed, not only for the cache but also, for example, for codepage conversion to Unicode and to start CHKDSK on boot. But CHKDSK should now start from fat32.ifs init. The start from cachef32.exe is now disabled. Also, please do "unlock cachef32.exe" before replacing it with a newer version.
2erdmann: 1) Strange, because everything works with alpha7 on my side. It reformats volumes successfully. It behaved as you describe with alpha6, but that was repaired with alpha7 for me. Please supply more details so that I can reproduce it on my side, as I see it working here. And yes, I do all the required IOCTLs when formatting (including BEGINFORMAT before formatting and REDETERMINEMEDIA after it. You can look into the sources if you are interested.). It checks the volume label. If you enter no volume label, it will use "NO NAME". And if there was a volume label, it will ask for the volume label. So I don't understand what you are talking about. More details, please.
comment:16 by , 8 years ago
PS: Looked into the sources again -- yes, in some cases the return code is not checked. The most notable case is the function that zeroes sectors. So there are cases where actions are performed "blindly". Also, many functions in the original Fat32Format have a "void" return type, but call the "die" function in case of errors. So the sources need to be checked for such cases... But it is strange that it works on three of my machines.
comment:17 by , 8 years ago
Tested it on another machine, and it failed there. I have now fixed one error and everything works on both. The write_file() function returned an error because I specified the write size in bytes, but it needed to be converted into sectors. Also, the return code check was commented out. Uploaded the binaries again to ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip. The version strings are updated. Please try this version.
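The bytes-versus-sectors mix-up comes down to a unit conversion like the one below. This is a minimal sketch; the helper name `bytes_to_sectors` is hypothetical and is not taken from the FORMAT sources.

```c
#include <stdint.h>

/* Convert a byte count to a sector count, rounding up to whole sectors.
   Returns 0 if bytes_per_sector is 0 (invalid geometry). */
uint32_t bytes_to_sectors(uint32_t bytes, uint32_t bytes_per_sector)
{
    if (bytes_per_sector == 0)
        return 0;
    return bytes / bytes_per_sector + (bytes % bytes_per_sector != 0);
}
```

With the figures reported elsewhere in this ticket, 4221952 bytes at 512 bytes per sector is 8246 sectors, which is the sector count FORMAT prints for the same volume.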
comment:18 by , 8 years ago
2Lewis: Regarding the minimum set of files, in case you replace them only for testing and intend to revert afterwards -- the set of fat32.ifs, ufat32.dll and cachef32.exe is sufficient. Other files are optional. I use my version of fat32 on my systems with no problems, though it is called alpha. And format worked successfully on several machines, until I changed something.
comment:19 by , 8 years ago
Thanks for this.
Formatting appears to be successful, now, though I have managed to make a mess of my test stick (64GB unpartitioned) by formatting it HPFS and copying a bunch of files. The intent was to format it FAT32 after that, however, chkdsk reports it as (still) FAT32, and formatting FAT32 now fails (as does formatting it HPFS). With the stick inserted, DFSee even exits immediately. It will be interesting to see what is so bad about it.
Anyway, while the bldlevel strings are now correct, the version string displayed by FAT32.IFS at boot still shows:
FAT32.IFS version 0.10a6 Oct 14 2016
Not complaining, just want to ensure that if you have build time scripts to update that you get those in order.
More news when I get some more tests done (partitioned and unpartitioned).
Thanks!
PS - Thanks to you and Lars for confirming which files are necessary. I have updated my installation script to ensure that cachef32.exe is unlocked and to warn if any files cannot be copied (makes life easier). Please advise about the cache settings for current testing.
comment:20 by , 8 years ago
I reloaded: ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip
I made sure that all files are in the correct place and have the correct file dates. I also made sure that LIBPATH and PATH load the DLLs and EXEs in preference to where I have the 0.9.13 version of FAT32 installed.
I still cannot get format to work:
[d:\]format j: /FS:FAT32
The type of file system for the disk is FAT32.
The new type of file system is FAT32.
Enter the current volume label for drive j:
Warning! All data on hard disk j: will be lost!
Proceed with FORMAT (Y/N)? Y
Enter up to 11 characters for the volume label
or press Enter for no volume label. FAT32 VOLUM
Size: 2048 MB 4208967 sectors
512 Bytes Per Sector, Cluster size 4096 bytes
Volume Serial No. is 1a0a:1129
Volume label is FAT32 VOLUM
32 Reserved Sectors, 4103 Sectors per FAT, 2 fats
525091 Total clusters
525090 Free Clusters
Formatting drive j:
Wrote 4221952 bytes in 0.02 seconds, 190.74 Megabytes/sec
Clearing out 8246 sectors for Reserved sectors, fats and root cluster...
Initialising reserved sectors and FATs...
Done.
The format is blindingly fast but again it does not seem to do anything, all existing files are still there untouched. I am trying to format a FAT32 partition on a USB stick (removable). Maybe that is the difference (are you testing with fixed media only ?)
About volume label: as a confirmation the format asks for the existing label before formatting. But if I enter nothing or enter a wrong label (even though it has a valid label) it does the format anyway and overwrites the label with "NO NAME" instead of aborting the format.
By the way, with the initial version of ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip, I got this in POPUPLOG.OS2:
10-13-2016 08:12:04 SYS3175 PID 004b TID 0001 Slot 00b9
D:\OS2\CHKDSK.COM
c0000005
1c7868d6
P1=00000002 P2=0001fffc P3=XXXXXXXX P4=XXXXXXXX
EAX=0fffffff EBX=00000008 ECX=00000001 EDX=00000020
ESI=00000017 EDI=1c780010
DS=0053 DSACC=d0f3 DSLIM=7fffffff
ES=0053 ESACC=d0f3 ESLIM=7fffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0053 GSACC=d0f3 GSLIM=7fffffff
CS:EIP=005b:1c7868d6 CSACC=d0df CSLIM=7fffffff
SS:ESP=0053:00020000 SSACC=d0f3 SSLIM=7fffffff
EBP=00020024 FLG=00010212
UFAT32.DLL 0002:000068d6
comment:21 by , 8 years ago
2Lars: No, I tried two fixed disk partitions, one 16 GB flash disk and a PAE ramdisk -- everything formats successfully. The root directory listing changes. So, I copy some files to the disk, they appear in the directory, then I format it, and they disappear.
Also, if it can't write to the disk, it should report an error, because I restored the error checking in write_file().
Just tried to enter an incorrect volume label:
[d:\]format e: /fs:fat32
The type of file system for the disk is FAT32.
The new type of file system is FAT32.
Enter the current volume label for drive e: ert
SYS0636: An incorrect volume label was entered for this drive.
So, it answers with an error if an incorrect volume label is entered. If I enter the correct volume label, I see the following:
[d:\]format e: /fs:fat32
The type of file system for the disk is FAT32.
The new type of file system is FAT32.
Enter the current volume label for drive e: test
Warning! All data on hard disk e: will be lost!
Proceed with FORMAT (Y/N)? y
Enter up to 11 characters for the volume label
or press Enter for no volume label. sdf
Size: 1024 MB 2650662 sectors
512 Bytes Per Sector, Cluster size 4096 bytes
Volume Serial No. is 1c18:282f
Volume label is SDF
32 Reserved Sectors, 2584 Sectors per FAT, 2 fats
330682 Total clusters
330681 Free Clusters
Formatting drive e:
Wrote 2666496 bytes in 2.06 seconds, 1.23 Megabytes/sec
Clearing out 5208 sectors for Reserved sectors, fats and root cluster...
Initialising reserved sectors and FATs...
Done.
So, it accepts the previous volume label "test", formats the disk, and the label is changed to "SDF" as intended. So, I wonder what is wrong on your machine.
Hm, but I acknowledge that if I simply press "Enter" and don't type any volume label, it formats the disk instead of quitting, right. It appears I missed this case; I usually use "format e: /fs:fat32 /v:test".
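The missed case (empty input while a label exists) can be covered by a check along these lines. This is only a sketch: the function names are hypothetical, and the case-insensitive comparison is an assumption made for the example, not something confirmed from the FORMAT sources.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two NUL-terminated labels (assumed rule). */
static int labels_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns 1 if FORMAT may proceed, 0 if it should abort. */
int label_confirmed(const char *current_label, const char *entered_label)
{
    if (current_label == NULL || current_label[0] == '\0')
        return 1;                       /* volume has no label: nothing to confirm */
    if (entered_label == NULL || entered_label[0] == '\0')
        return 0;                       /* empty input must not pass */
    return labels_equal_nocase(current_label, entered_label);
}
```

Under this rule, pressing Enter on a labelled volume aborts the format, the same as entering a wrong label does.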
sysinstx also writes a correct bootblock now (no "os2boot" for now, only bootblock and microfsd files)
[d:\]sysinstx e:
The type of file system for the disk is FAT32.
The system files have been transferred.
And files are there. You have the minimal set of FreeLDR files and a bootblock installed now.
Regarding the trap, is it only with the initial version, or with the latest version too?
2Lewis: DFSee exits immediately? Maybe the disk is locked? I had such a locking problem, where one of my disks became locked, so trying to format it, either to FAT32 or HPFS or any other FS, failed -- it reported an error. "CHKDSK e:" works, but "CHKDSK e: /f" fails, because the disk is locked. Are you having similar symptoms?
It is strange that FAT32.IFS shows an incorrect version on boot, because I changed both the bldlevel in the makefiles and the version in the headers (the latter is shown by fat32.ifs). But OK, I'll double check it.
comment:22 by , 8 years ago
I now tried this:
1) run "chkdsk j: /F" on a FAT32 formatted partition on USB stick: it went ok, no trap in POPUPLOG.OS2
2) run "format j: /FS:FAT32": the format would not work
3) run "chkdsk j: /F" again: this time it would trap. I have TRAPDUMP=R0,F: in config.sys in order to catch Ring 0 traps and was able to capture a dump which shows this:
TRAP SCREEN INFORMATION
OS/2 Kernel Revision 14.106_SMP
Exception in module:
TRAP 0008 ERRCD=0000 ERACC=**** ERLIM=******** CPU=00
EAX=00000000 EBX=f8f30000 ECX=00000001 EDX=0000001d
ESI=f8e646f0 EDI=0000001d EBP=000046ca FLG=00010046
CS:EIP=0878:00005ff7 CSACC=009b CSLIM=0000a697
SS:ESP=1530:00004690 SSACC=1097 SSLIM=000046b7
DS=0870 DSACC=0093 DSLIM=0000ba2a CR0=8001003b
ES=09a8 ESACC=0093 ESLIM=0000ff17 CR2=00a00000
FS=0000 FSACC=**** FSLIM=********
GS=0000 GSACC=**** GSLIM=********
No Symbols Found
**** 3175 or 3171 trap encountered ****
Unknown error code in 3175 trap info: at 005b:1d556915
P1=00000002 P2=0001fff0 P3=XXXXXXXX P4=XXXXXXXX
CS:EIP=005b:1d556915 CSACC=d0df CSLIM=7fffffff
SS:ESP=0053:0001fff0 SSACC=d0f3 SSLIM=7fffffff
EBP=0002000c FLG=00010206
EAX=00000004 EBX=00000008 ECX=00000001 EDX=00000020
ESI=00000017 EDI=1d550010
DS=0053 DSACC=d0f3 DSLIM=7fffffff
ES=0053 ESACC=d0f3 ESLIM=7fffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0053 GSACC=d0f3 GSLIM=7fffffff
No Symbols Found
**** 3175 or 3171 trap encountered ****
System Encountered an access violation in UFAT32.DLL 0002:00006915 at 005b:1d556915
The cs:eip values in the 317x screen do not match the current cs:eip.
Therefore, unless this trap was handled by an exception handler, TRAPDUMP=ON was not in config.sys
P1=00000002 P2=0001fff0 P3=XXXXXXXX P4=XXXXXXXX
EAX=00000004 EBX=00000008 ECX=00000001 EDX=00000020
ESI=00000017 EDI=1d550010
DS=0053 DSACC=d0f3 DSLIM=7fffffff
ES=0053 ESACC=d0f3 ESLIM=7fffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0053 GSACC=d0f3 GSLIM=7fffffff
CS:EIP=005b:1d556915 CSACC=d0df CSLIM=7fffffff
SS:ESP=0053:0001fff0 SSACC=d0f3 SSLIM=7fffffff
EBP=0002000c FLG=00010206
I then checked the stick under Windows and it would find errors with the filesystem which it corrected (apart from others, "\WP ROOT. SF" was broken).
Now you have the place in UFAT32.DLL to look at ...
I am using the QSINIT loader. I suggest you try formatting a USB stick and see if you also get a trap. I am also running ACPI Version 3.23.04 which currently is a test version. I don't know if that has a bearing on the problem.
comment:23 by , 8 years ago
Just to be on the safe side, I reverted ACPI.PSD back to 3.23.01 and booted with the original OS2LDR.
I redid the test sequence I stated above. This time it would not trap but the format still would not work. I also assume the trap will not happen every time. I suspect that the register state is somehow messed up under circumstances yet to find out.
comment:24 by , 8 years ago
2Lars: I did chkdsk->format->chkdsk as you said, on a flash stick. It checked and formatted successfully, no trap. (I never had ring0 traps.) Trap 0008 is very bad; it is very difficult to track down. I am not sure how the subsequent trap screens are related to the final trap 0008. I looked into ufat32.dll at address 2:00006915 and see these instructions:
.00026902: 55 push ebp
.00026903: 89E5 mov ebp,esp
.00026905: 81EC1C000000 sub esp,00000001C ;" "
.0002690B: C745FC00000000 mov d,[ebp][-04],000000000 ;"
.00026912: 8B4518 mov eax,[ebp][18]
.00026915: 8945E4 mov [ebp][-1C],eax
.00026918: 8B451C mov eax,[ebp][1C]
.0002691B: 8945E8 mov [ebp][-18],eax
the instruction
mov [ebp][-1C],eax
is likely to trap because of insufficient stack: the previous instruction did not trap, and it also accesses the stack, but at another offset relative to ebp. Anyway, to be sure why it trapped, it would be better to attach a COM port terminal, install the OS/4 kernel (it shows more info than the standard kernel), and write the following commands into kdb.ini:
.b 115200t 3f8 g ln u k
so that it shows the disassembly and debug symbols. Also, you can use the "dg" command to check the stack segment limit, to be sure whether the stack is insufficient. I looked into the ufat32.dll disassembly and see that it trapped somewhere in CHKDSK, in static routines with no debug symbols. But I see that, according to the map file, 2:000068f5 is SetNextCluster().
.0002690B: C745FC00000000 mov d,[ebp][-04],000000000 ;"
is
ULONG dummy = 0;
and
.00026912: 8B4518 mov eax,[ebp][18]
.00026915: 8945E4 mov [ebp][-1C],eax
is
SetCluster.ulCluster = ulCluster;
SetCluster is a structure on the stack. It is already allocated, so I see no reason for this instruction to trap. SSLIM=0x7fffffff, so it is a 32-bit stack. According to the makefile, the stack size is 81920 bytes == 0x20000 bytes. ebp == 0x2000c, so the stack size is really too small and it trapped because of an exhausted stack. So, we need to increase it. Why it led to a ring0 trap, I have no idea. Maybe you have TRAPDUMP set up so that it catches ring3 traps?
PS: But I see no relation between QSINIT or ACPI and FAT32. I have QSINIT too, but no Arca Noae ACPI installed.
comment:25 by , 8 years ago
PPS: 81920 bytes is the stack size as specified in the ufat32.dll makefile, but by design it should be the chkdsk.com stack that is used. And chkdsk.com is a 16-bit program with a 16-bit stack, which should be less than 64k. So I wonder why CHKDSK uses the stack specified in the DLL makefile. For SYS, I needed to switch stacks because sysinstx.com trapped because of insufficient stack. But with CHKDSK, we see a different situation. Strange.
comment:26 by , 8 years ago
I was wrong about the stack size being 0x20000, and ebp=0x2000c has nothing to do with the stack limit, but we surely have too little stack. Also, I looked into popuplog.os2 and see many traps like the following on boot:
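For reference, the arithmetic behind this correction can be checked with a few lines of C. This is just a sketch with hypothetical helper names; the input values are the ones quoted in the comments above.

```c
/* Sizes discussed in this ticket, in bytes. */
#define STACK_FROM_MAKEFILE 81920UL    /* value quoted from the ufat32.dll makefile */
#define STACK_HEX_20000     0x20000UL  /* the figure it was mistakenly equated with */

/* Convert a byte count to whole KiB (1 KiB = 1024 bytes). */
unsigned long bytes_to_kib(unsigned long bytes)
{
    return bytes / 1024UL;
}

/* Address of a local variable stored offset_below bytes under the frame pointer. */
unsigned long frame_slot_address(unsigned long ebp, unsigned long offset_below)
{
    return ebp - offset_below;
}
```

81920 bytes is 0x14000 (80 KiB), while 0x20000 is 131072 bytes (128 KiB). Separately, with EBP=0002000c the store to [ebp][-1C] targets 0x1fff0, which matches the P2 value in the trap record posted earlier.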
13.10.2016 17:54: 1 SYS3175 PID 0007 TID 0001 Slot 000B
D:\OS2\FAT32CHK.EXE
c0000005
1ff368c7
P1=00000002 P2=0001fff8 P3=XXXXXXXX P4=XXXXXXXX
EAX=00000004 EBX=00000008 ECX=00000001 EDX=00000020
ESI=00000017 EDI=1ff30010
DS=0053 DSACC=d0f3 DSLIM=7fffffff
ES=0053 ESACC=d0f3 ESLIM=7fffffff
FS=150b FSACC=00f3 FSLIM=00000030
GS=0053 GSACC=d0f3 GSLIM=7fffffff
CS:EIP=005b:1ff368c7 CSACC=d0df CSLIM=7fffffff
SS:ESP=0053:0001fff8 SSACC=d0f3 SSLIM=7fffffff
EBP=00020014 FLG=00010212
UFAT32.DLL 0002:000068c7
Here we have a trap in almost the same area and, it seems, for the same reason. But I see no ring0 traps here.
comment:27 by , 8 years ago
The only thing I can add to the Ring 0 trap screen contents I already posted:
CS=878 was pointing to OS2AHCI.ADD (first) code segment
DS=870 was pointing to OS2AHCI.ADD (first) data segment
ES=9A8 had a linear mapping (exists in the GDT) and is a 16-bit data segment but that's all I can say about it
I suspect that OS2AHCI (which I am using) trapped on dumping the trap into the trap dump partition. Nonetheless it succeeded.
And no, I don't have TRAPDUMP enabled for Ring 3 traps. As I already stated, I have it set to TRAPDUMP=R0,F: which only dumps in case of Ring 0 trap.
comment:28 by , 8 years ago
And now I see that the 2nd trap screen has the same registers as the 3rd one. Did you find this info about os2ahci.add with the PMDF utility by analyzing the dump file? BTW, I tried to set up a DUMPFS partition on my machine, and it does not work. I press Ctrl-Alt-Num-Num, and it tries to dump, but with no success -- os2dump is called and shows a prompt, but it stops at that stage. (I have 4 GB of RAM, so I created a 4 GB partition, formatted it with "format i: /fs:dumpfs /v:SADUMP", added TRAPDUMP=R0,I:, copied os2dump.hd to os2dump and then added ifs=dumpfs.ifs to config.sys.) I used the standard os2dump with a FAT partition when I had 1 GB of RAM, and it even worked, but now it won't. Do you use DUMPFS now? Any tips you can suggest?
Btw, cf. the DUMPFS ticket -- I tried to trap with this setup, but no trap so far.
comment:29 by , 8 years ago
Yes, I looked at the dump file's contents to find out where the selectors were pointing at. I load the dump file in PMDF and then for example a ".d DEV 870:0" will format the device driver header if selector 870 points to the first device driver data segment, a "dg 9A8" will display the GDT descriptor for selector 9A8 etc.
You get help about the supported command set by entering "?" and also ".?"
# .d DEV 870:0
DevNext: 0858:0000
DevAttr: 8d80
DevStrat: 0000
DevInt: 000d
DevName: OS2AHCI$
DevProtCS: 0878
DevProtDS: 0870
DevRealCS: 0000
DevRealDS: 0000
#
For properly setting up a dump partition:
1) read this: http://home.earthlink.net/~steve53/os2diags/dumpfs-user-guide.txt
Pay attention to what it says about your dump partition size
2) I am using QSINIT loader. It might also work with the original OS2LDR but I know for sure it works with QSINIT.
3) I think there was a corrupted DUMPFS package but the one from here:
http://www.os2site.com/sw/drivers/filesystem/dumpfs.zip
should be ok
4) I don't know how well the OS/4 kernel works with dumping. I always use the standard OS/2 kernel.
5) I vaguely remember that DUMPFS.IFS has to be the last loaded IFS but I might be mistaken. Nonetheless that's how I set it up in CONFIG.SYS.
6) Your formatting commandline looks ok to me. But you should add DUMPFS.IFS to the CONFIG.SYS with full path info:
IFS=C:\OS2\BOOT\DUMPFS.IFS
but maybe you did that already.
comment:30 by , 8 years ago
FWIW, OS2AHCI is not involved in dumping a trap dump. In fact, no driver is involved. Dumping is done completely by using the BIOS.
comment:31 by , 8 years ago
2Lars: Yes, I downloaded it from os2site and read the doc from Steven Levine's site. I did everything as you described; the partition size is 4000 MB, and when I press Ctrl-Alt-Num-Num, os2dump is called and says
The memory dump will try to dump to partition: G
The amount of memory it expects to dump is: 3668928K
and it does not proceed any further.
OS/4 kernel should work with memory dumps. And I tried IBM's kernels too, with the same result.
2dazarevicz: Yes, right, os2dump works via the BIOS. That is, dumpfs.ifs may work via the disk driver, but os2dump uses direct BIOS disk access.
comment:32 by , 8 years ago
Fixed the volume label bug, added stack switching for CHKDSK, and increased the stack size to 128 KB; I hope that will be sufficient. The link is the same as before, ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip.
comment:33 by , 8 years ago
Upd: Repaired the display of the percentage formatted and added error code checks in most routines. So, if it fails somewhere, it must panic with an error. Also, it should show the % formatted; please say if it does not. The link is the same as before, ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip.
comment:34 by , 8 years ago
Things have not improved:
after a "format i: /FS:FAT32" the disk is no longer accessible even though a drive letter will still be allocated. I ejected the USB stick and reinserted it, and a drive letter is again allocated but the partition cannot be accessed at all. No writing to the partition is possible.
I also tried to create a file on the formatted partition but no file can be saved to the partition.
There is clearly something wrong with the formatting code.
A "chkdsk i: /F" run directly after the formatting will now find errors:
The volume label is FAT32 VOLUM.
The Volume Serial Number is 1A09-3119.
CHKDSK is checking fats :Ok.
CHKDSK is checking files and directories...
CHKDSK is searching for lost data.
CHKDSK has searched 100% of the disk.
The stored free disk space is incorrect.
(524696 free allocation units are reported, while 525090 free units are detected.)
The correct free space is set to 525090 allocation units.
2150772736 bytes total disk space.
0 bytes in 0 hidden files.
4096 bytes in 1 directories.
0 bytes in extended attributes.
0 bytes in 1 user files.
2150768640 bytes available on disk.
4096 bytes in each allocation unit.
525091 total allocation units.
525090 available allocation units on disk.
0% of the files and directories are fragmented.
but it is not able to correct them. I can rerun the chkdsk command over and over again and it will continue to show the very same error.
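The "stored" versus "detected" free-space figures in this output correspond to a cached counter and a recount of the FAT. A recount can be sketched as below, relying on the FAT32 format rules that a cluster is free when the low 28 bits of its FAT entry are zero and that the first two FAT entries are reserved. The function names are hypothetical and this is not the CHKDSK code from ufat32.dll.

```c
#include <stdint.h>

#define FAT32_ENTRY_MASK 0x0FFFFFFFu   /* only the low 28 bits carry the value */

/* Count free clusters; fat[] holds the two reserved entries first,
   followed by one entry per data cluster. */
uint32_t count_free_clusters(const uint32_t *fat, uint32_t total_clusters)
{
    uint32_t free_count = 0;
    for (uint32_t i = 0; i < total_clusters; i++) {
        if ((fat[i + 2] & FAT32_ENTRY_MASK) == 0)
            free_count++;
    }
    return free_count;
}

/* Returns 1 if the stored counter agrees with the recount, else 0. */
int free_count_matches(uint32_t stored_free, const uint32_t *fat,
                       uint32_t total_clusters)
{
    return stored_free == count_free_clusters(fat, total_clusters);
}
```

On a freshly formatted volume only the root directory cluster is in use, which is consistent with the 525091 total and 525090 available allocation units shown above.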
comment:35 by , 8 years ago
Something to add:
I now used DFSee to format the partition with FAT32 and that works just fine.
DFSee reports that it cannot lock the complete disk (I suppose that's DosPhysicalDisk and DosDevIOCtl with the physical disk category), I answer with "go ahead anyway" and it continues and formats the partition. Please note that USBMSD.ADD is not called into when the DosPhysicalDisk/DosDevIOCtl calls are issued and therefore is also not in the loop on this failure. After formatting with DFSee I still cannot write a file to disk even after I ejected and reinserted it.
If I reboot with FAT32 0.9.13 in use, a subsequent chkdsk finds a problem. I run a chkdsk under Windows and reinsert the USB stick in the OS/2 box. I can now write a file to the partition.
comment:36 by , 8 years ago
The partition cannot be accessed because it is not mounted. So, the drive letter is allocated, but the disk is not mounted. But it should report an error in this case. I wonder why you don't see an error. DFSee probably cannot lock the logical disk, not the physical one (it formats a partition, not a whole disk). DosPhysicalDisk is not used in my case. The "(524696 free allocation units are reported, while 525090 free units are detected.)" error is usually shown on almost all disks under normal circumstances, and it is fixed neither by our CHKDSK nor by the Windows one. This is usual and is not a result of FORMAT. Formatting of your disk did not finish with REDETERMINEMEDIA, but I don't see an error from it in your case. It should be
"WARNING: Failed to do final remount, rc = %lu!\n"
"WARNING: probably, you need to reboot for changes to take in effect.\n"
if the REDETERMINEMEDIA ioctl returned a non-zero code, but it isn't shown.
I did not understand about usbmsd.add -- how can it not be called when a flash disk is formatted? It should be. The ADD is called by os2dasd.dmd, and os2dasd.dmd is called by fat32.ifs in this case.
comment:37 by , 8 years ago
What I meant to say for USBMSD.ADD: it is not called when API function "DosPhysicalDisk" is called. Of course it is called when formatting a USB stick.
comment:38 by , 8 years ago
It seems I have reproduced on my side the FORMAT bug you are seeing: Lewis has the following cachef32.exe command line:
CALL=C:\OS2\CACHEF32.EXE /f /p:2 /m:50000 /b:250 /d:5000
whereas, my line is:
call=D:\os2\cachef32.exe /l:off /p:1 /m:50000 /b:250 /d:5000
It seems the "/f" parameter is what breaks everything. When I set the command line like Lewis has, I get:
D:\>format g: /fs:fat32 /v:sadump
The type of file system for the disk is FAT32.
The new type of file system is FAT32.
Warning! All data on hard disk g: will be lost!
Proceed with FORMAT (Y/N)? y
Size: 3584 MB 8193150 sectors
512 Bytes Per Sector, Cluster size 4096 bytes
Volume Serial No. is 19f8:291e
Volume label is SADUMP
32 Reserved Sectors, 7986 Sectors per FAT, 2 fats
1022143 Total clusters
1022142 Free Clusters
Formatting drive g:
100% percent of disk formatted ...
Wrote 8198144 bytes in 3.86 seconds, 2.03 Megabytes/sec
Clearing out 16012 sectors for Reserved sectors, fats and root cluster...
Initialising reserved sectors and FATs...
Done.
and then
D:\>sysinstx g:
The type of file system for the disk is FAT32.
Error 2 writing to g:\boot\loader\preldr0.mdl file.
SYS0002: The system cannot find the file specified.
EXPLANATION: The file named in the command does not exist in the current directory or search path specified or the file name was entered incorrectly.
ACTION: Retry the command using the correct file name.
and creating a file fails. So, your case seems to be reproduced on my side. Please try my command line and see whether FORMAT and SYS succeed for you (and creating files too).
comment:39 by , 8 years ago
PS: you can try this without a reboot: turn lazy write off:
cachef32 /l:off
and format works, turn it on:
cachef32 /l:on
and it fails. So, it seems to be a lazy write bug. But I always switch lazy write off. And I doubt that we can fix lazy write soon, as it was always buggy :(.
comment:40 by , 8 years ago
PPS: I added a FAT32_STOPLW ioctl before FORMAT and SYS and a FAT32_STARTLW ioctl after them. Now, after FORMAT, I see an empty disk and FC/2 shows its volume label, so it is accessible. Then I run SYSINSTX and new files and dirs appear on the disk; I can walk the dirs, so the disk is accessible. But I cannot create files on it until I disable lazy write. I need to work around this somehow.
comment:41 by , 8 years ago
Haha, DosClose fails when I create files with LW enabled. I create a file, write some info into it and save it successfully, but closing it fails. This is easy to verify: create a file in FC/2 with Shift-F4, write some text into it, and save it with F2. If you then press Esc to close it, you see the message "There are unsaved changes". You press F2 again, and the same thing happens over and over. I even wrote a test program that tries to create a file on the disk, and confirmed that I get ERROR_FILE_NOT_FOUND when calling DosClose.
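For anyone who wants to repeat the check, here is a minimal sketch of such a test. It is not the program mentioned above: it goes through the portable C runtime (fopen/fwrite/fclose) rather than calling DosOpen/DosWrite/DosClose directly, and the names create_write_close and enum step are made up for illustration. Because stdio buffers writes, a write error can also surface only at fclose, so a STEP_CLOSE result narrows the failure down but does not by itself prove that the close call is at fault.

```c
#include <stdio.h>
#include <string.h>

/* Which step of the create/write/close sequence failed (hypothetical names). */
enum step { STEP_OK = 0, STEP_OPEN, STEP_WRITE, STEP_CLOSE };

/*
 * Create `path`, write `text` into it and close it again.
 * Returns STEP_OK on success, otherwise the step that reported the failure.
 */
static enum step create_write_close(const char *path, const char *text)
{
    size_t len = strlen(text);
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return STEP_OPEN;

    if (fwrite(text, 1, len, fp) != len) {
        fclose(fp);             /* best effort; the write already failed */
        return STEP_WRITE;
    }

    /* Buffered data is flushed here, so late errors are reported by fclose. */
    if (fclose(fp) != 0)
        return STEP_CLOSE;

    return STEP_OK;
}
```

Call it with a path on the FAT32 volume, for example create_write_close("g:\\lwtest.txt", "test"), once with lazy write on and once with it off, and compare the returned step.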
comment:42 by , 8 years ago
Now I see that FS_CLOSE calls FS_COMMIT, which fails with an ERROR_FILE_NOT_FOUND error. The failed function is FindPathCluster, which returns FAT_EOF (it finds no first file cluster).
comment:43 by , 8 years ago
Yes, this problem is related to lazy writing: if I turn off lazy writing, the format procedure runs to completion, although it still reports error 26 (ERROR_NOT_DOS_DISK -> Unknown media type):
[d:\]format h: /FS:FAT32
The type of file system for the disk is FAT32.
The new type of file system is FAT32.
Enter the current volume label for drive h: FAT32 VOLUM
Warning! All data on hard disk h: will be lost!
Proceed with FORMAT (Y/N)? Y
Enter up to 11 characters for the volume label or press Enter for no volume label. FAT32 VOLUM
Size: 2048 MB 4208967 sectors
512 Bytes Per Sector, Cluster size 4096 bytes
Volume Serial No. is efc:2f65
Volume label is FAT32 VOLUM
32 Reserved Sectors, 4103 Sectors per FAT, 2 fats
525091 Total clusters
525090 Free Clusters
Formatting drive h:
100% percent of disk formatted ...
Wrote 4221952 bytes in 2.06 seconds, 1.95 Megabytes/sec
Clearing out 8246 sectors for Reserved sectors, fats and root cluster...
Initialising reserved sectors and FATs...
WARNING: failed to set the volume label, rc=26
Done.
Under the given circumstances I think it would be a good idea to remove the /L switch from cachef32.exe until lazy writing is fixed (if ever ...). I assume that the /F switch does not enable lazy writing by itself; if it does, it should also be removed.
That will avoid a lot of problem reports that stem from a known problem.
comment:44 by , 8 years ago
To Lars: The error "WARNING: failed to set the volume label, rc=26" should not be there. It appeared with previous builds, but since I changed the code to use sector mode I don't see it anymore. Your setup must differ from mine somehow, but I wonder how. Could you give more details about your system? Regarding the "/F" switch, I agree. We need to always start up the cache helper, but should probably disable lazy writing until it gets fixed. And "/F" does seem to enable lazy writing by default, because I need to run "cachef32 /l:off" before things work.
PS: I previously got ERROR_NOT_DOS_DISK == 26 when I formatted to FAT32 a volume that was already FAT32. But if I formatted after HPFS, or any other FS, it completed successfully. Do you see the same? This symptom went away when I added the FAT32_SECTORIO ioctl after BEGINFORMAT. Maybe you forgot to update something? Or is some older DLL left on the LIBPATH?
comment:45 by , 8 years ago
I am absolutely sure I unpacked all files from ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip that I listed above. I checked that file dates were correct (they are 15.10.2016 and 15.10.2016 for the various files, I didn't bother with documentation files) and I checked that PATH and LIBPATH would pick up the executables and UFAT32.DLL from the correct directories.
No, there still must be something. But you are right, the partition I formatted as FAT32 was already formatted with FAT32. I suspect it has something to do with FS_MOUNT: the partition to format is already mounted as FAT32, whereas it would not be if it had held a different filesystem before the format. I can try to format a partition that has something other than FAT32 on it and report the results.
comment:46 by , 8 years ago
Theoretically, the BEGINFORMAT ioctl should remount a partition to the FAT32 IFS, so that when it is formatted, it is formatted using fat32.ifs regardless of the initial filesystem. Sorry, I cannot check whether 15.10.16 is the latest because I have already overwritten the files at the same link: ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip. (I uploaded a newer version with lazy write disabled in the IFS.) Please check whether anything has changed. Unfortunately, I need to reproduce your situation on my machine to fix your problems, and I don't have the details needed to do that.
comment:47 by , 8 years ago
PS: 15 Oct seems to be older than the previous version I uploaded: today is Oct 20, and I uploaded that one a day or two ago, well after Oct 15.
comment:48 by , 8 years ago
Yes, with the ftp://osfree.org/upload/fat32/fat32-0.10-alpha7.zip and latest file date of 20.10.2016 it now works fine. I can format with FAT32 no matter what filesystem there was initially. After formatting I can write files to it.
There is one odd thing in conjunction with FAT16: if I format a FAT16 formatted drive to FAT32, write a file to it and then reformat with FAT16, then I still see the file and a chkdsk (on that now FAT16 formatted drive) gives this error:
[d:\]chkdsk i: /F
The current hard disk drive is: I:
The type of file system for the disk is FAT.
The volume label is FAT16 VOLUM.
The Volume Serial Number is 9C39-A015.
SYS1336: Disk error reading the File Allocation Table 1. Processing cannot continue.
SYS1374: File Allocation Table (FAT) is bad on drive I.
But that might also be related to an error in the FAT16 implementation of the OS/2 kernel. You can close this bug.
comment:49 by , 8 years ago
Good. I still see a file after reformatting fat32->fat too, but that is no surprise, as Fat32Format seems not to overwrite the root dir. Maybe I need to clear it too. And yes, I see the same error with FAT CHKDSK here. But this seems to be a FAT implementation bug, not a FAT32 one. (FAT is older and knows nothing about FAT32, although the two are similar.)
PS: But there's still one bug in FAT32: it sometimes locks the very first FAT32 drive letter, so reformatting it to any file system then fails (with an error opening or locking the disk).
PPS: And yes, I needed to disable lazy writing in fat32.ifs for now, otherwise it will not work. I suspect that after FORMAT, lazy write from the previous volume is still active, tries to flush its data, and corrupts the volume.
comment:50 by , 8 years ago
Replying to lewisr:
A couple observations before I reboot to confirm some bad behavior I've just seen (could be entirely my fault, as I just noticed that CACHEF32 was locked and did not get updated when I copied over alpha 7 to my boot volume):
You missed updating the bldlevel strings in alpha7, so it reports itself as alpha6 (the date is newer, however). ;-)
There is a country.kor in \OS2\BOOT which I'm guessing came over with KO's patches. That isn't required for non-Korean systems, is it?
country.kor is required to use Korean characters (Hangul) correctly on non-Korean systems. In other words, it is needed to use CP949 on non-Korean systems. So it should be shipped as well, as an option.
comment:52 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Just to add,
there are a whole lot of other places where pBootFSInfo pointer is used. They might suffer from the same problem.
The "pBootFSInfo" pointer should be properly set in FS_MOUNT in file "ifsmount.c" but apparently there are conditions where the volume is no longer mounted but the filesystem (FAT32.IFS in this case) is not properly notified of this fact (or the other way around: the IFS is notified of an unmount but never properly receives a remount request).
While this really is a problem of the kernel and/or the DMDs, the IFS still has to protect itself against this problem.
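A minimal sketch of the kind of self-protection meant here, assuming simplified stand-ins for the driver's structures. Only the field names pBootFSInfo and ulFreeClusters come from the ticket; the type layouts, the helper name get_free_clusters and the choice of return code are assumptions for illustration, not the actual FAT32.IFS code.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the driver's real structures. */
typedef struct {
    unsigned long ulFreeClusters;
} BOOTFSINFO;

typedef struct {
    BOOTFSINFO *pBootFSInfo;    /* NULL if the volume was never (re)mounted */
} VOLINFO;

#define RC_OK                0
/* Same value as OS/2 ERROR_NOT_READY; which code to return is an assumption. */
#define RC_VOLUME_NOT_READY 21

/*
 * Read the free cluster count without trusting pBootFSInfo.
 * On failure *pcFree is left untouched, so the caller never sees garbage.
 */
static int get_free_clusters(const VOLINFO *pVolInfo, unsigned long *pcFree)
{
    if (pVolInfo == NULL || pVolInfo->pBootFSInfo == NULL)
        return RC_VOLUME_NOT_READY;

    *pcFree = pVolInfo->pBootFSInfo->ulFreeClusters;
    return RC_OK;
}
```

Routing every access through one helper like this means the check exists in a single place, instead of being repeated (or forgotten) at each of the many sites that dereference pBootFSInfo.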