Opened 8 years ago
Closed 7 years ago
#46 closed defect (fixed)
FAT corruption and stuck system (ARCAOS/DOS full screen/R249)
Reported by: | Barry Landy | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | Future |
Component: | IFS | Version: | |
Severity: | high | Keywords: | |
Cc: | bl10@… |
Description
I have a newly made ARCAOS system.
My previous system was ECS22 beta, presumably using version 913 (2008) and no ill symptoms.
I don't know which release is part of ARCAOS, but I found that obeying a CMD file from a FAT32 formatted disk could cause the contents of that file to vanish (not the file itself). There appeared to be other corruption associated with that problem.
Not unreasonably, Lewis Rosenthal recommended that I try R249, which I installed using the WPI version.
The above problem did not occur. However, running a DOS in a full screen window (which I need to do for other reasons) and obeying a BAT command (calling Word Perfect), all seemed well until I exited Word Perfect, when the system froze.
On the forced reboot the partition showed lots of errors to the extent that it said "use SCANDISC under Windows".
Reverting to the "stable release" 913 these problems do not occur.
I arranged for full backup of the partitions that are vulnerable and tried again, with very similar results so the effect is repeatable.
I don't know what diagnostics would help.
Attachments (46)
Change History (278)
comment:2 by , 8 years ago
Answering some questions: Why it froze: working on it! Each time: no
What FAT errors: I have no output as it did the automatic chkdsk on system restart and the output gets thrown away (I think).
All my FAT partitions are FAT32.
I have been testing with r249 and alternatively 913 (partly to try to distinguish FAT32 problems and ARCAOS problems). I have the following as part of a BAT which seems to reliably freeze the system with r249 and NOT with 913.
==================
::@echo off
set popdir=%1
if "%popdir%"=="" set popdir=popdir
set popdrive=%1
if "%popdrive%"=="" set popdrive=popdrive
cd > %disk2%:\temp\currdir
copy %diskroot%\bat\cd.aux + %disk2%:\temp\currdir %disk2%:\temp\%popdir%.bat > nul
:: To restore the pushed directory write %disk2%:\TEMP\%popdir%
.. and more lines follow
===================
disk2 is set to D diskroot to D:\d
With r249 the copy command is executed but none of the following lines are echoed. The system freezes at that point and I have to reboot. With 913 this does not happen.
I will try with r257 and add to this report.
comment:4 by , 8 years ago
2landb: created the file test.bat like this:
rem :: @echo off
set diskroot=w
set disk2=w
set popdir=%1
if "%popdir%"=="" set popdir=popdir
set popdrive=%1
if "%popdrive%"=="" set popdrive=popdrive
cd > %disk2%:\temp\currdir
copy %diskroot%\bat\cd.aux + %disk2%:\temp\currdir %disk2%:\temp\%popdir%.bat > nul
rem :: To restore the pushed directory write %disk2%:\TEMP\%popdir%
I have a w: FAT32 disk with
set disk2=w
set diskroot=w
Created a file w:\w\bat\cd.aux with a "test" string in it. I see no freeze and no FS corruption. Files w:\temp\currdir and w:\temp\popdir.bat are successfully created.
comment:5 by , 8 years ago
Doesn't really surprise me!
In case it matters cd.aux contains
cd
(ie cd followed by space and NO CR)
so that concatenating the two files gives one line cd data to change directory (back).
(but I can't see why it matters)
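The concatenation trick described in this comment can be sketched outside DOS. The following is a minimal Python illustration, not the reporter's actual tooling: the function name `build_popdir_bat` and the sample directory are hypothetical, and it assumes that `cd > currdir` writes the current directory followed by CR/LF. A real DOS `COPY a + b` may also append an end-of-file marker in text mode, which is ignored here.

```python
def build_popdir_bat(cd_aux: bytes, currdir: bytes) -> bytes:
    """Mimic `copy cd.aux + currdir popdir.bat` as plain byte concatenation."""
    return cd_aux + currdir


# cd.aux: the word "cd" followed by one space and NO CR/LF, as described above.
cd_aux = b"cd "
# Assumed output of `cd > currdir`: the current directory plus CR/LF
# (the directory name is a made-up example).
currdir = b"D:\\D\\WORK\r\n"

# The result is a one-line batch file that changes back to the saved directory.
popdir_bat = build_popdir_bat(cd_aux, currdir)
print(popdir_bat.decode("ascii"), end="")
```

Because cd.aux has no line ending, the two files join into a single `cd <directory>` command, which is what the generated popdir.bat relies on.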
comment:6 by , 8 years ago
What doesn't surprise you? That it does not freeze and there is no FS corruption? Maybe ArcaOS is at fault, not fat32? Or am I missing something? I have no 14.200 kernel, for example.
PS: Did you test the same fat32 version under eCS? Does fat32 0.9.13 have this problem when tested with your ArcaOS?
comment:7 by , 8 years ago
I was not surprised that you could not replicate the effect
I have now tested with ECS22 and both version 913 and 257 on both systems
I get the same "stuck system" effect with version 257 on both ECS and ARCAOS and the effect does NOT occur with version 913 on either system.
comment:8 by , 8 years ago
I continue to see various stuck states. All using DOS. All using software you may not have.
Can I send you a dos editing program and a small file to experiment with?
comment:10 by , 8 years ago
new comment by the reporter in ticket #47. But as this ticket might have a lot more information, I add the comment from #47 in here and close #47
Since experiencing the stuck state (as I thought on ArcaOS) I have experimented with both ECS and ARCAOS. The stuck states occur on both systems in all releases later than 913. I now have a file on a FAT32 formatted volume that causes a stuck state on either system when attempting to delete it (DEL command) in a DOS window. This effect is (so far) 100% consistent and persisted despite remaking the volume by copying the files and directories. Please let me know what diagnostic information would be useful.
comment:11 by , 8 years ago
now have a file on a FAT32 formatted volume that causes a stuck state on either system when attempting to delete it (DEL command) in a DOS window. This effect is (so far) 100% consistent and persisted despite remaking the volume by copying the files and directories.
Please let me know what diagnostic information would be useful.
You'd better describe how I could reproduce this on my machine. And please, try the latest version from ftp://osfree.org/upload/fat32/ again, and test if the problem is present in the latest version. (It could be gone on newer versions. I tried to delete a file from FAT32 partition from a DOS prompt -- it deletes it as it should, no stuck state.). And, you can give me a link to your DOS program, so, I could try to test it on my machine.
comment:12 by , 8 years ago
Will try the new version. I am fairly baffled at how to test on your machine. There is nothing special about the file that will not delete.
by , 8 years ago
by , 8 years ago
Attachment: | POPDIR.BAT added |
---|
comment:13 by , 8 years ago
No difference with R269 (sorry!)
I have attached two files ne.exe (the editor) and a small file (popdir.bat)
instructions to follow
comment:14 by , 8 years ago
ne is a free standing context editor (very old!)
the following causes a stuck state on my ECS22 system (with any version of FAT32 other than 913).
POPDIR.BAT is the small file I have attached. It is on a FAT32 volume (and in case it matters, so is NE.EXE)
call NE by
NE popdir.bat
a command window opens showing the one line of data
END to go to the end of the line;
backspace to delete the last character
F3 to exit
in response to the prompt to update the file or do other things type y (for yes)
this is where for me the system gets into a stuck state
comment:15 by , 8 years ago
I start it as
ne popdir.bat
add an empty line, press F3, "y", Enter, it saves and quits. No stuck.
Do you mean by "stuck" that system hangs, or only a VDM hangs? Mouse pointer moving? Ctrl-Alt-Del works or only Reset?
comment:16 by , 8 years ago
I mean the entire system hangs and only reset will recover it.
(as I said before I am not surprised your system does not show this effect which is very hard on my system and which I can provoke in various ways in both a DOS window and a DOS full screen)
comment:17 by , 8 years ago
Then you need to give me enough info to have problem reproduced on my machine. Or try taking a COM port log. Do you have a COM port and a null-modem cable?
comment:18 by , 8 years ago
yes I have a com port (not activated). I have an old cable that was once used to transfer data between systems which I seem to recall is a null cable, or of course I could buy one.
Please point me at some info about how I would use that.
comment:19 by , 8 years ago
2landb: Ah, nice. You need another machine with a COM port too, connected to the first with the cable, plus a terminal emulator program on the other end. It can run any OS. If the second machine has no COM port, you can use a USB2Serial adapter (you plug the null-modem cable into it). For OS/2, I use Prolific pl2303-compatible adapters. For Linux, the choice is significantly wider. On the debuggee side, you should specify the /monitor switch on the fat32.ifs command line in config.sys. It permanently enables the debug messages. Then you should see the messages in the terminal. You can save the log and send it to me. I need a log taken when the system begins hanging. It is very good that you have a COM port and a cable.
comment:20 by , 8 years ago
OK good. I guessed I would need a second computer :-) and I have a laptop T61 which does have a com port (and anyway I have a docking station which certainly does). I can run ECS21 or windows 7; does it matter which?
What do I run in the monitoring system to see the messages?
comment:21 by , 8 years ago
2landb: Yes, very good. I have a COM port on my ThinkPad X61T docking station, too. This is how I usually debug fat32.ifs. It does not matter which OS you use, only whether the terminal emulator available on it is good enough.
What do I run in the monitoring system to see the messages?
Terminal emulator. I use DTerm on OS/2: https://sites.google.com/site/os2acw/dterm_b10.zip. On windows, you can use HyperTerminal or Putty. On Linux it is usually minicom or cu.
comment:22 by , 8 years ago
Nothing is ever simple. What I thought was a serial connector (blue) is in fact a VGA connector. Documentation says there is an internal serial header so a bigger project to connect to it
I assume it cannot be done via a D25 connection?
comment:23 by , 8 years ago
Nothing is ever simple. What I thought was a serial connector (blue) is in fact a VGA connector. Documentation says there is an internal serial header so a bigger project to connect to it
This is on a laptop itself, I assume. But you said, you have a COM port header on your docking station? I have the same.
I assume it cannot be done via a D25 connection?
Usually a PC has a DB9 connector. Modems usually have DB25. My null-modem cable is 9-to-25, so I needed an additional 25-to-9 converter.
comment:24 by , 8 years ago
Sorry: I wasn't precise enough.
My PC (where I generate the stuck states) does not have an external serial connector but just an internal one.
The PC mobo does have an external DB25 connector (aka parallel port) so I wondered if I could use that?
Though I assume your /monitor switch will send data out through COM1?
comment:26 by , 8 years ago
The PC mobo does have an external DB25 connector (aka parallel port) so I wondered if I could use that?
LPT port? No, of course not. You need a COM port. An LPT port is fundamentally different.
comment:27 by , 8 years ago
OK I now have a connector plugged into the serial header with a DB9 port on the other end and what I hope is a null cable connected to the T61.
As I suspected there are no com.sys statements in my config.sys.
I added device=<drive>:\os2\boot\com.sys in config and also added /monitor on the fat32.ifs statement but nothing happens (ie hyperlink hears nothing)
Do I need anything else in config.sys ?
(Of course the header connector could be the wrong way round, or the cable - laplink - not be the right sort, or or..
comment:28 by , 8 years ago
I added device=<drive>:\os2\boot\com.sys in config and also added /monitor on the fat32.ifs statement but nothing happens (ie hyperlink hears nothing)
If your machine is SMP, you need pscom.sys instead of com.sys, because com.sys is not SMP-aware. Also, did you install a debug kernel first?
(Of course the header connector could be the wrong way round, or the cable - laplink - not be the right sort, or or..
laplink cable is 25-pin. Isn't your cable 9-pin one?
comment:29 by , 8 years ago
The cable is a dual cable with DS9 and DS25 connectors on both ends.
No I have not installed a debug kernel; I did not realise it is needed. Where do I get it from?
comment:31 by , 8 years ago
No I have not installed a debug kernel; I did not realise it is needed. Where do I get it from?
On OS/2 installation disks, of course. \os2image\debug\smp\os2krnlb is a half-strict kernel, which you need. And please install the symbol file, os2krnlb.sym. But rename them to os2krnl.* first, of course.
I meant DB9/25 of course!
so, you have a 25-to-9 converter too?
comment:33 by , 8 years ago
I meant DB9/25 of course!
no; no 25-9 converter. It has DB9 on both ends
very strange
comment:34 by , 8 years ago
Please talk me through this as to an idiot.
I have disks for ECS2.0 and ECS2.1 and ARCAOS
What I am actually running at present and am using as the testing system for FAT32 is ECS2.2 beta, but this came with an ISO download and I don't know if the generated DVD has an alternate debug kernel (though I suppose it does). At present I cannot see the DVD but I have the iso file.
The install disks for ECS21 and ARCAOS both have a half strict kernel available so I think it probably best if I change to using the ARCA system.
Do I just copy the kernel into the system (and the symbol table)?
(I used to do this a lot when kernel updates happened regularly but have lost the technique)
And re the cable: I have in the past only used the 25-25 connectors so I don't know if the 9-9 will work.
comment:35 by , 8 years ago
I found the ECS2.2 beta II DVD. It does have a debug kernel so I could use that also.
comment:36 by , 8 years ago
What I am actually running at present and am using as the testing system for FAT32 is ECS2.2 beta, but this came with an ISO download and I don't know if the generated DVD has an alternate debug kernel (though I suppose it does). At present I cannot see the DVD but I have the iso file.
I know nothing about ArcaOS, but I suspect it is the same as eCS. On IBM's OS/2 and on eCS it is the \os2image subdirectory in the root. If you have only an ISO image, then you could burn it to a CD/DVD first (otherwise, how are you going to install it on real hardware? If it is not real hardware, then it's possible, of course). Also, you could extract it with 7zip or mount it via the ISOFS NetDrive plugin. 7zip was available on the eCS boot partition after installation. I suppose you have one, or ArcaOS has 7zip too (maybe; I only suspect, as I don't have ArcaOS).
Do I just copy the kernel into the system (and the symbol table)?
Yes, you need to copy the kernel, or insert it in the QSINIT(/ArcaLoader?) menu. In case of QSINIT, the older kernel is available there too.
PS: Back up your kernel, of course, if you're not using the QSINIT menu. In case of QSINIT, you can copy your new kernel under another name, and add both kernels to the menu.
comment:37 by , 8 years ago
I am using QSINIT so that's OK.
I have set the ECS22 system up to boot with QSINIT from either the default kernel or OS2KRNLB, and that works OK.
However, during system init (reading config.sys) OS2 reports that COM.SYS (or PSCOM.SYS) is ignored as if the COM1 port is not present.
I have tried 2 different ways PSD=ACPI with no parameters and PSCOM.SYS PSD=ACPI /maxcpu=1 and COM.SYS with the same error.
I have checked the BIOS and it says serial port active. I have looked on Windows and it says COM1 is working.
Any ideas?
comment:38 by , 8 years ago
However, during system init (reading config.sys) OS2 reports that COM.SYS (or PSCOM.SYS) is ignored as if the COM1 port is not present.
Did you enable that COM port in BIOS first?
I have checked the BIOS and it says serial port active. I have looked on Windows and it says COM1 is working.
What are COM port settings in Windows then? I mean, I/O port address, port number?
comment:39 by , 8 years ago
However, during system init (reading config.sys) OS2 reports that COM.SYS (or PSCOM.SYS) is ignored as if the COM1 port is not present.
On which machine? On the terminal or on the debuggee? If the debuggee, then it's normal, because the kernel debugger takes this port and com.sys/pscom.sys should be disabled. On the terminal machine they should be enabled, of course.
comment:40 by , 8 years ago
On the MUT (ecs system being tested); my terminal setup is using windows on T61
So do I actually need the (PS)COM.SYS? I "cured" the "line ignored" problem by moving the (PS)COM.SYS above the USB section. (to stop USBCOM.SYS grabbing it)
from what you say this is wrong? Does the debug kernel grab the serial line before config.sys?
comment:42 by , 8 years ago
Debug kernel grabs the COM port itself, so COM.SYS/PSCOM.SYS should be disabled during testing. And USBCOM.SYS does nothing with physical COM port too. It can handle USB COM ports only. Yes, the kernel grabs the COM port before config.sys.
comment:43 by , 8 years ago
Windows says com1 is 3f8 and irq 4
So, it is normal COM port with normal settings. And you should add the settings to os2ldr.ini, like this:
dbflags=0
; com port address (0 for vio)
dbport=0x3F8
comment:44 by , 8 years ago
Good news and bad news
Having finally (I hope) got it set up correctly (I added the dbflags and port to os2ldr.ini and corrected the kernel line) the system would not complete the boot process: at the end (maybe) of the config.sys work it stopped on a black screen.
I tried changing to VGA but that didn't work very well (for either mode)
comment:45 by , 8 years ago
It stopped at the debugger prompt at the beginning of the boot process. Did you try the "g" (go) command from the debugger prompt? You can also create the file "kdb.ini" in the root directory of the boot drive with these contents:
.b 115200t 3f8
g
ln
u
dd ss:esp
k
It will set up the com port speed and address, and continue with "g" command. Then execute the rest of commands in case of a trap.
comment:46 by , 8 years ago
Sorry but how am I supposed to know this stuff? I have zero documentation for the debugger and the qsinit "documentation" is totally weird - I think I got there at the 3rd attempt.
If I create that file are there any more things I ought to know as a pre-requisite?
Anyway - going to bed now.
comment:47 by , 8 years ago
You just ask me if something is not clear. These things are needed for each one working with the kernel debugger, anyway. If not clear, just ask. The debugger stops at very early stage, to have very easy debugging. If you need not stop there, you just create a debugger ini file with "g" line (at least). But at most, I suggested you more lines for my convenience :)
comment:48 by , 8 years ago
I put that file kdb.ini in the root (is that a misprint for kbd.ini?)
No change: the system ends up in a black screen AND it is not responsive to my keyboard
NB the keyboard/mouse are USB attached to a USB hub. I switched them to a root hub and that made no difference. If it is critical I have a PS2 keyboard I could switch to.
comment:49 by , 8 years ago
I put that file kdb.ini in the root (is that a misprint for kbd.ini?)
No. KDB.INI. Kernel DeBugger INI.
No change: the system ends up in a black screen AND it is not responsive to my keyboard
Where do you type? In terminal? Or on test computer keyboard?
NB the keyboard/mouse are USB attached to a USB hub. I switched them to a root hub and that made no difference. If it is critical I have a PS2 keyboard I could switch to.
You type in the terminal, not your computer keyboard. So keyboard and mouse don't matter.
comment:50 by , 8 years ago
so
1) kdb.ini made no difference - still a black screen. It should have had some effect?
2) I have never seen anything at the terminal end, and using the terminal keyboard had no effect.
I need to test the serial port separately (? I may look for a serial keyboard) and then the null cable - I may need to buy one. Will come back when I have done that
comment:51 by , 8 years ago
I found a serial mouse. The serial connection was not working (my attempt to set it on the header was bad) and it may take a little while to fix it (my fingers are too big!)
comment:52 by , 8 years ago
landb: You could try connecting two terminals from both sides and see if it works (prior to connecting the real debugger), and with what settings. You'll need to reenable pscom.sys, of course. And see if your cable is defective. Did you set up all terminal settings correctly? They should be speed 9600, 8 data bits, 1 stop bit, no parity (aka "8N1").
comment:53 by , 8 years ago
kdb.ini made no difference - still a black screen It should have had some effect?
It should boot further, at least, and not stop at the beginning. Do you see the black screen in the terminal or on the test machine? Did you copy kdb.ini to the root of the boot drive? Not another drive?
comment:54 by , 8 years ago
1) I will get the serial fitted professionally tomorrow so no need for extra fuss until then.
2)
===============
It should boot further, at least, and not to stop in the beginning. Do you see black screen in terminal or on the test machine? Did you copied kdb.ini to the root of boot drive? Not another drive?
==================
Black screen was on MUT. KDB.INI in MUT in root of boot drive. Might the lack of COM1 confuse it? Let's wait and see....
3) terminal settings were OK I think (standard defaults)
comment:55 by , 8 years ago
Black screen was on MUT
very strange, because it should boot further. Are you sure you copied the debug kernel? Copied to the root? Renamed to os2krnl?
comment:56 by , 8 years ago
Good take! I have not renamed the kernel because QSINIT indicates that the system will boot without. I assume then that something in the debug kernel requires the name to be right?
I will test that
comment:57 by , 8 years ago
kernels are now
18-07-09 3:26a 1,004,533 0 a--- os2krnl 30-05-13 1:44a 870,881 0 a--- os2krnl.106 11-06-09 6:07a 192,148 0 a--- os2krnl.sym 17-03-05 8:20p 171,076 0 a--- os2krnls.106
The two 106 files are the original (and the non debug system boots perfectly well under these names)
I tried the debug system again (now os2krnl) and it stops with a black screen again. kdb.ini contains .b 115200t 3f8 g ln u dd ss:esp k
and os2ldr.ini
[config] ; Kernel number from kernel list section to be loaded by default default=1
; Waiting time (in seconds) after which kernel mentioned in "default" ; will be loaded timeout=10
dbflags=0 ; com port address dbport=0x3f8
[kernel] os2krnl.106=Default, memlimit=512 os2krnl=debug
comment:58 by , 8 years ago
I have not renamed the kernel because QSINIT indicates that the system will boot without
Ah, I forgot that you are using QSINIT... then no need to rename.
[config] ; Kernel number from kernel list section to be loaded by default default=1 ; Waiting time (in seconds) after which kernel mentioned in "default" ; will be loaded timeout=10 dbflags=0 ; com port address dbport=0x3f8 [kernel] os2krnl.106=Default, memlimit=512 os2krnl=debug
In one line?
then not correct. It should be like this:
[config]
; Kernel number from kernel list section to be loaded by default
default=1
; Waiting time (in seconds) after which kernel mentioned in "default"
; will be loaded
timeout=10
dbflags=0
; com port address
dbport=0x3f8
[kernel]
os2krnl=Default boot
os2krnl.dbg=Debug kernel
And the kernel is named as "os2krnl.dbg". The file should be renamed the same way.
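The point made in this comment, that every name in the [kernel] section has to match a kernel file on the boot drive, can be checked mechanically. The sketch below is only an illustration: it assumes os2ldr.ini is close enough to ordinary INI syntax for Python's `configparser` to read (QSINIT's own parser may differ), and the helper name `missing_kernels` and the temporary directory standing in for the boot drive root are hypothetical.

```python
import configparser
import tempfile
from pathlib import Path

# The corrected os2ldr.ini from this comment.
SAMPLE_INI = """\
[config]
; Kernel number from kernel list section to be loaded by default
default=1
; Waiting time (in seconds) after which kernel mentioned in "default"
; will be loaded
timeout=10
dbflags=0
; com port address
dbport=0x3f8
[kernel]
os2krnl=Default boot
os2krnl.dbg=Debug kernel
"""


def missing_kernels(ini_text: str, boot_root: Path) -> list[str]:
    """Return the [kernel] entries that have no matching file in boot_root."""
    parser = configparser.ConfigParser(comment_prefixes=(";",), interpolation=None)
    parser.optionxform = str  # keep the file names exactly as written
    parser.read_string(ini_text)
    return [name for name in parser["kernel"] if not (boot_root / name).is_file()]


# Stand-in for the boot drive root: only the default kernel is present.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "os2krnl").write_bytes(b"")
    result = missing_kernels(SAMPLE_INI, root)

print(result)
```

Any name reported by such a check is a menu entry that points at a kernel file that has not been copied or renamed yet.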
comment:59 by , 8 years ago
No - it wasn't in one line. That was caused by the reporting ignoring my linefeeds.
comment:60 by , 8 years ago
So, your QSINIT boots the original kernel, and your debug kernel is called os2krnl.dbg on the disk, with the same name in os2ldr.ini? And it began to boot? You can try adding "loglevel=3" to the [config] section of os2ldr.ini, then QSINIT should write its debug info to the debug terminal. Is this debug info there, or not?
comment:61 by , 8 years ago
1) Since I do not yet have a monitoring terminal (waiting for the serial port) I would not see the QSINIT log.
2) for your information (and I should have said this before)
a) my boot volume is formatted JFS
b) I switch off lazywrite for the FAT32 cache (ie I am not using the cache).
comment:62 by , 8 years ago
Progress (of sorts!)
1) the serial port is now working (tested with a serial mouse)
2) using my old cable I get information on the terminal emulator. However what appears is junk (as below)
ç€Ó!Ú%B¹%[oo“[)„{ š[)b÷)«.jÚ!Ú•¤õŠó
7K¯pjÚ!Ú•¤õLSQ˜)!ß1ÝŠ`³B@%k¹«J!UŠC¼„
õÖfÖÛõ¹«J!ûßÚMR¼ƒKb÷)ªÏœgûç‘oç )‘1‘Æç17!+ÆÂÇ)™´ç™™Â)Ú)ç™gÂ
)Ñbâ bÎ!ý]ýÔâý!ûÖVѳø!ü'KûãŠKBBªKûãò
I assume this means the cable is not a null cable? In which case I will go and purchase one.
comment:63 by , 8 years ago
How did you test it with the serial mouse? Did you attach it to the other end of the null-modem cable? If not, then it only indicates that the port seems to be ok. The cable is in question.
comment:64 by , 8 years ago
No. I plugged the serial mouse into the serial socket on the computer. That worked OK. I am pretty sure my old cable is straight through and I am getting a null modem cable tomorrow.
(I cannot connect the mouse to the end of the old cable as both ends are female and I don't have a gender changer)
comment:65 by , 8 years ago
Yes, maybe, your cable is a modem cable (straight one). Null-modem cable is a crossed one. Ah, yes, it is other gender...
comment:66 by , 8 years ago
Still progress of sorts. I now have a null modem cable (sold as such!).
However I am still getting rubbish on the terminal screen. It is set to see COM1; set to 9600 8N1
Any other settings needed: I left it set to emulation auto detect but there are lots of other possibilities.
You also suggested I could use PUTTY, which I also have but which doesn't seem designed to connect to COM1.
All advice welcomed!
comment:67 by , 8 years ago
I also downloaded an up to date putty which has a serial option; I used that and still get garbage on the screen.
comment:68 by , 8 years ago
AH, good. Do you have ".b 9600t 3f8" at the beginning of a kdb.ini (kdb.ini should be placed on a test machine! As it is the kernel debugger .ini file and kernel debugger runs on the test machine, not the terminal) too? BTW, we can try setting speed to 115200 on both end (should be ".b 115200t 3f8") for more quick debugging. The rest of settings seems to be ok.
comment:69 by , 8 years ago
I looked at kdb.ini on the test machine. It had 115200 whereas the terminal machine had 9600. I corrected kdb.ini to 9600.
Made no apparent difference to the output, though I don't know what I am expecting. Meanwhile running the debug kernel still stops with a black screen.
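A speed mismatch like the one found here (115200 in kdb.ini against 9600 on the terminal) is one common way to get garbage on a serial terminal, although in this case fixing it did not cure the problem. The following Python sketch is a deliberately simplified model of 8N1 framing, not a description of how any particular UART or terminal program works; the function names `transmit` and `receive` and the sample message are made up for illustration.

```python
CLOCK_HZ = 1_843_200  # simulation tick rate: 16 ticks per bit at 115200 baud


def transmit(data: bytes, baud: int) -> list[int]:
    """Return the line level for every tick, using 8N1 framing (idle = 1)."""
    ticks = CLOCK_HZ // baud
    bits = [1, 1]  # short idle period before the first frame
    for byte in data:
        bits.append(0)                                   # start bit
        bits.extend((byte >> i) & 1 for i in range(8))   # data bits, LSB first
        bits.append(1)                                   # stop bit
    bits.extend([1] * 20)  # trailing idle so the receiver can finish
    return [level for bit in bits for level in [bit] * ticks]


def receive(line: list[int], baud: int) -> bytes:
    """Decode the line by sampling each data bit in the middle of its slot."""
    ticks = CLOCK_HZ // baud
    out = bytearray()
    pos = 0
    while pos < len(line):
        if line[pos] == 1:          # still idle, keep looking for a start bit
            pos += 1
            continue
        stop_mid = pos + 9 * ticks + ticks // 2   # middle of the stop bit
        if stop_mid >= len(line):
            break                    # not enough samples left for a full frame
        byte = 0
        for i in range(8):
            byte |= line[pos + ticks + ticks // 2 + i * ticks] << i
        out.append(byte)
        pos = stop_mid               # resume the search from the stop bit
    return bytes(out)


msg = b"OS/2 debug"
matched = receive(transmit(msg, 9600), 9600)       # both ends at 9600
mismatched = receive(transmit(msg, 115200), 9600)  # sender fast, receiver slow
print(matched, mismatched)
```

With equal speeds the message comes back unchanged; with the sender at 115200 and the receiver at 9600 the receiver samples across several transmitted bits at once and produces unrelated bytes.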
comment:70 by , 8 years ago
I decided to also try monitoring with an OS2 system on my T61 (ECS21 as it happens)
I downloaded dterm (from your earlier post) and got it running.
I then booted the debug terminal on the MUT and absolutely nothing appeared on the Dterm screen. (And yes, the null modem cable is still connected).
comment:71 by , 8 years ago
It occurred to me to try the latest version of FAT32 (r257) on my T61. It creates the same stuck state, so it is something to do with the way the system is set up rather than any oddity in the hardware. I will try to monitor in the other direction, using the Thinkpad as the MUT.
comment:72 by , 8 years ago
I could recommend to install a terminal program on both machines and experiment with their settings first. Try to set up the same settings and see if typing symbols on one terminal outputs the same symbols on another. But only enable pscom.sys first on that machine. Otherwise, the COM port will not work.
comment:73 by , 8 years ago
I discovered that the T61 serial port was disabled, but it still makes no difference.
Independently of the com output shouldn't the debug boot be getting further?
comment:74 by , 8 years ago
Of course, you should not load the debug kernel, if you want to use a COM port for another purpose, than kernel debugger, and enable the COM port driver instead. Otherwise the kernel debugger will take over.
comment:75 by , 8 years ago
Of course. My point is that when loading the debug kernel to try to debug why my system gets into a stuck state the system gets into a different stuck state prior to loading the normal OS2 screen. You suggested the kdb.ini file, which I put in place but that made no difference. So why isn't the system getting further with the debug kernel?
comment:76 by , 8 years ago
It probably trapped somewhere. You need to get your terminal working first, to see why and where. (Maybe it traps on mount. Disk mounts occur just before pmshell starts.) When a debug kernel is installed, all traps go to the debug terminal and are not shown on the screen.
comment:77 by , 8 years ago
I recommend starting with two terminals attached to both COM ports first, without the debug kernel active, and with a COM port driver enabled on both sides. Then, if you set all options correctly, typing a string on one terminal should reproduce the same string on the other side (you will see it on the terminal). If, while you type on one side, only incorrect characters are repeated (not the same ones you typed on the other side), then the COM port settings are wrong somewhere, but the connection is ready. Also, if, while you type on side 1, it repeats correct chars on side 2, but not vice versa, then side 1 has correct COM port settings (number of data bits, stop bits and parity bits, 8N1, for example), terminal emulation settings etc. 8N1 on both sides should be sufficient. There is also a setting for whether to react to the CD (Carrier Detect) and TR (Terminal Ready) signals. Your cable could be 2- or 3-wire (which is common with today's cables), and hence have no wires for the CD or TR signals. So some terminal emulators could work incorrectly, and you should look in your terminal program settings for options to disable the CD and/or TR signals. The debug kernel doesn't need these signals, but some terminal emulation programs may.
When you set up terminal programs ok, you may change one side to a debug kernel instead of a terminal program, and all should work (you need to disable a COM port driver on that side, and enable a debug kernel first).
comment:78 by , 8 years ago
pleased to report that when I run Windows on both systems and open hyperterminal on both what I type on one appears on the other in both directions.
The same is true with PUTTY (both ends or one end PUTTY the other hyperterminal)
So the link is good and the cable good.
I have not tried the OS2 to OS2 route yet
comment:79 by , 8 years ago
Ok, so you already can use your terminal under Putty or HyperTerminal (or OS/2 with HyperAccess Lite, as it is very similar to HyperTerminal, and produced by the same company).
PS: You said you tested r257, but there's newer build, r276 already, you could try it (look at ftp://osfree.org/upload/fat32/ for newer builds.)
comment:80 by , 8 years ago
And it works with Dterm on OS2 (both sides). I did not know about hyperaccess.
Why do I need to disable the com port driver? Wouldn't the debug kernel take it first?
comment:81 by , 8 years ago
I tried leaving the COM port driver enabled. As a result, the COM port driver takes the COM port over, and from the moment the COM port driver loads, messages stop coming to the terminal. So it appears that the COM port driver prevents the debug kernel from working.
comment:82 by , 8 years ago
1) I installed r276. Both with 269 and 276 I find that there are more frequent stuck states and I don't do anything odd; just, for example, opening an OS2 window (startup commands come from a FAT32 partition)
2) I have dterm on the terminal, and it showed the usual junk characters. I typed some text on the screen and it suddenly started outputting readable text as a result of the /monitor.
3) just before the stuck state it showed the following
P:30 T:c D:0 T:10:386 GetBuf3Access: FindPathCluster
P:30 T:c D:0 T:10:388 GetFatAccess: ReadSector
P:30 T:c D:0 T:10:439 GetReadAccess: ReadSector
P:30 T:c D:0 T:10:447 ReleaseReadBuf
P:30 T:c D:0 T:10:486 ReleaseFat
P:30 T:c D:0 T:10:521 ReleaseBuf3
P:30 T:c D:0 T:10:521 GetFatAccess: GetNextCluster
P:30 T:c D:0 T:10:523 ReleaseFat
P:30 T:c D:0 T:10:558 FS_OPENCREATE returned 0 (Action = 1, OI=411808c8)
P:30 T:c D:0 T:10:562 FS_FILEINFO for D:\D\cmd\AUTOEXEC.CMD, usFlag = 0, level 4
P:30 T:c D:0 T:10:646 FS_FILEINFO returned 0
P:30 T:c D:0 T:10:646 FS_FILEINFO returned 0
P:4b T:81 D:0 T:10:695 FS_PROCESSNAME returned filename: D:\D\CMD\AUTOEXEC.CMD
which I doubt is helpful
comment:83 by , 8 years ago
That's good. But the piece of log you've inserted here is too small, of course. In the dir you started DTerm from, there should be a DTerm.log file. Please attach it here (please set the Log option to "replace" to clear the log on startup. There is a status line at the bottom; you can choose a set of options with the "arrows" at top left, and options can be chosen by mouse or keyboard)
comment:84 by , 8 years ago
Sorry, no log. It's not set up by default. I found the control-l option which gave a suboption to name a log. will try again!
by , 8 years ago
Attachment: | dtermlog.001 added |
---|
comment:85 by , 8 years ago
I have attached a log. again trying to open an OS2 window and finishing in a stuck state
To get the dterm window to see characters instead of rubbish I started by doing
echo xxx > com1
on the MUT. why that should help is a mystery!
comment:86 by , 8 years ago
This log contains no hang. All IFS functions complete normally and it does not hang in any of them. Are you sure this log was taken after the system hung? Or it hung not in the IFS, but somewhere else.
> To get the dterm window to see characters instead of rubbish I started by doing
> echo xxx > com1
> on the MUT. why that should help is a mystery!
Are you sure you have disabled the com port driver?
echo xxx > com1
should not work, as there should be no "com1" device in the system, because you disabled the COM port driver!
PS: what does the following command say:
mode com1
Does it say "SYS0021: The drive is not ready"?
comment:87 by , 8 years ago
You misunderstand. This is NOT using the debug kernel but the ordinary kernel and your monitoring data is going to COM1 via PSCOM.SYS (as is the ECHO).
All I can say about the hang is that the OS2 window never came to the command prompt. Mouse and keyboard dead.
The Log was closed AFTER the hang. Some interaction with something no doubt. I am pretty sure that if I put back the old version (913) there will be no hangs.
by , 8 years ago
Attachment: | putty1.txt added |
---|
comment:88 by , 8 years ago
Here is another one: this time captured by PUTTY under Windows. Still needed an ECHO xx to stop the rubbish.
I don't know if this is more indicative but it does not seem to end tidily.
comment:89 by , 8 years ago
Ah, you load a non-debug kernel, output a string to COM1 via the echo command, and the log becomes OK on the other side? If so, then you can use a non-debug kernel, and everything should work OK. It appears that some program accesses the COM port too, so it reinitializes the COM port. When you issue the "echo" command, the port is reinitialized again and debug messages pass through.
From which window do you issue the "echo" command, if it hangs after that?
And where does it "not end tidily"? I see it ending on the FindPathCluster call... Or did you cut the log? Could you please leave a number of the repeating lines at the end?
comment:90 by , 8 years ago
Ah, "GetBuf3Access: FindPathCluster", so it is waiting on the semaphore. Then it is not a hang. Only a program accessing the FAT disk should "freeze". The rest of the system should be responsive. What program are you using to access the FAT32 disk? WPS?
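One way to check a monitor log for an access that was requested but never released is to pair up the Get*/Release* messages per process and thread. The following is a minimal sketch, not part of fat32.ifs or f32mon: the line layout (`P:<pid> T:<tid> D:<n> T:<time> <message>`) and the pairing of message names are assumptions inferred from the excerpt posted in this ticket. Note that a "Get...Access" line may be logged before the semaphore is actually obtained, so an unmatched entry can mean either "held" or "still waiting".

```python
import re
from collections import Counter

# Assumed line layout, inferred from the excerpt in this ticket:
#   P:<pid> T:<tid> D:<n> T:<time> <message>
LINE_RE = re.compile(r"^P:(\w+) T:(\w+) D:\w+ T:[\d:]+ (.*)$")

# Acquire message -> matching release message (names as they appear in the log;
# the pairing itself is an assumption).
PAIRS = {
    "GetBuf3Access": "ReleaseBuf3",
    "GetFatAccess": "ReleaseFat",
    "GetReadAccess": "ReleaseReadBuf",
}
RELEASES = {release: acquire for acquire, release in PAIRS.items()}


def unreleased(lines):
    """Return {(pid, tid, acquire_name): count} for acquires with no later release."""
    held = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip junk characters and wrapped fragments
        pid, tid, message = match.groups()
        name = message.split(":")[0].strip()
        if name in PAIRS:
            held[(pid, tid, name)] += 1
        elif name in RELEASES:
            key = (pid, tid, RELEASES[name])
            if held[key] > 0:
                held[key] -= 1
    return {key: count for key, count in held.items() if count > 0}


if __name__ == "__main__":
    sample = [
        "P:30 T:c D:0 T:10:386 GetBuf3Access: FindPathCluster",
        "P:30 T:c D:0 T:10:388 GetFatAccess: ReadSector",
        "P:30 T:c D:0 T:10:486 ReleaseFat",
    ]
    print(unreleased(sample))
```

Run against a captured file (for example `unreleased(open("putty1.txt", errors="replace"))`), a non-empty result at the end of the log points at the access that was outstanding when the capture stopped.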
comment:91 by , 8 years ago
1) as expected version 913 runs clean. it also runs much faster presumably because there is no monitoring.
2) to issue the ECHO command I have to start an OS2 window, so it doesn't always hang....
3) PUTTY has a "copy all to clipboard" option which I used so what I attached was ALL the logged data.
4) the program accessing the FAT32 disk was the OS2 window processor. The system was completely hung - mouse dead keyboard dead no way to access anything else.
comment:92 by , 8 years ago
Hm, strange. The Buf3 semaphore should be free at that moment, so there should be no lockup. Or are there some repeating messages after the line "GetBuf3Access: FindPathCluster"?
comment:94 by , 8 years ago
Sorry to confuse.
I simply meant that I am opening an OS2 window and it is processing some commands. In config.sys I have
os2shell /k d\cmd\autoexec.cmd
so that whenever an os2 window is opened it will obey that CMD file and whatever it calls.
Strange - yes I agree. But that's why I am trying to debug this....
(no more tonight. What time zone are you in? Wouldn't we be better off communicating by email?)
comment:95 by , 8 years ago
OK, you meant the "OS/2 Window". It is a standard PM window, and I use OS/2 windows without any problems on my machines. I suspect that your autoexec.cmd is causing some problems. Could you share your autoexec.cmd? (BTW, why not use 4OS/2 for that purpose? It can have a file executed on startup of each OS/2 window.) And could you try disabling your autoexec.cmd for testing purposes? (Just restore the standard values of the OS2_SHELL and COMSPEC variables in config.sys and reboot.) I am in the GMT+3 (Moscow) timezone, but I usually work closer to the evening.
Maybe email would be better, but I prefer IRC (if you agree). I hang around the #osFree channel on the EFnet IRC network 24 hours a day (see www.efnet.org for the list of servers).
comment:96 by , 8 years ago
I don't use IRC.... I don't use 4OS2 (and don't want to start now)
I could remove the automatic use of the startup command file but actually I want to get to the bottom of this and not find ways around it. All that autoexec does is set up things I want set in my window such as the path, colour of prompt, larger window for scrolling, etc. What do you mean by "share the autoexec.cmd"?
I have not tried to recreate the DOS window hang since getting this monitoring working, as I assumed the other hang was "just as good"....
comment:97 by , 8 years ago
Ok, you could use email, if you wish, but it isn't much better than writing here (my email is _valerius (dog) mail (dot) ru).
Removing autoexec is just for testing purposes -- I want to see if it is your autoexec that is causing the trouble. If yes, you could just share your .cmd's contents with me and maybe I will be able to reproduce the problem on my machine. It could speed things up a lot.
comment:98 by , 8 years ago
PS: And please, attach your config.sys here. Maybe, you have some references to your FAT32 drive in it, or something like this.
PPS: What is searching for ex.[bat|cmd|com|exe]? Is this "ex.*" accessed from your autoexec.cmd?
by , 8 years ago
Attachment: | config.sys added |
---|
comment:99 by , 8 years ago
1) I have attached config.sys. The FAT32 partition is D:
2) I have a tiny file which converts ex to exit (both ex.bat and ex.cmd are present)
3) ex is not referenced from autoexec or any of the cmd files it calls (which I will attach in an email)
by , 8 years ago
by , 8 years ago
Attachment: | putty2.txt added |
---|
by , 8 years ago
Attachment: | putty3.txt added |
---|
comment:101 by , 8 years ago
Attached 2 files
lots of fun this morning.
1) removing the /K from the OS2_SHELL line stopped my system completing the boot process (the PM screen came up but remained empty). I totally do not understand this!
So instead I changed my references to d:\d\cmd (d: is FAT32) in config.sys to F:\D\CMD (F: is HPFS)
2) with that setup I installed 276 again; added /monitor and booted. the first instance of an OS2 window worked fine. After exiting and starting another (clicking on icon) it was referencing the FAT32 volume and became stuck again. See putty2.txt
3) reboot: this time I started a DOS window and it quickly became stuck. See putty3.txt
comment:102 by , 8 years ago
Hm, it looks like there really is a hang in FindPathCluster. I added some debug messages and uploaded r276 there: ftp://osfree.org/upload/fat32/test/fat32-0.10-r276-os2.zip. Please take a log.
PS: Yes, you have a very strange setup. Why do you need .cmd files on FAT32 drive and PATH referencing it? Anyway, it helps to catch glitches ;) But I am still unable to reproduce this on my machine :)
PPS: Are you sure that your FAT32 disk does not contain errors? Either, those fixable with CHKDSK, or hardware ones? Bad sectors, or maybe, a faulty drive?
comment:103 by , 8 years ago
1) I have done extensive and repeated error checking (my first thought!). In any event the errors also occur on my Thinkpad so I think hardware error is ruled out
2) my setup did not used to be like this. Current setup is all to do with data backup, migration and recovery.
I used to have cmd files with extensive lists of things to be backed up. I moved to using file synch on one (maybe two) tree. This tree is on the FAT32 partition so that windows files also get backed up at the same time.
comment:104 by , 8 years ago
Running the new 276
first boot attempt failed. After a long wait there is a message "failure to get semaphore" (exact text has gone). Then ENTER to proceed; screen clears. New message: FAT32: Semrequest getFatAccess Failed rc=121! Press Enter to continue
but it does not continue.
force reboot.....
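For reference, return code 121 in the OS/2 error tables is ERROR_SEM_TIMEOUT, which fits a semaphore request that gave up waiting. A small sketch of a lookup helper follows; the table holds only the two codes that appear in this ticket (21 is the code behind "SYS0021: The drive is not ready"), and the helper name `explain_rc` is made up for illustration.

```python
import re

# Two OS/2 return codes that come up in this ticket (not a complete table).
OS2_ERRORS = {
    21: "ERROR_NOT_READY",
    121: "ERROR_SEM_TIMEOUT",
}


def explain_rc(message):
    """Extract 'rc=<number>' from a message and return (rc, symbolic name)."""
    match = re.search(r"rc\s*=\s*(\d+)", message)
    if match is None:
        return None
    rc = int(match.group(1))
    return rc, OS2_ERRORS.get(rc, "unknown")


if __name__ == "__main__":
    print(explain_rc("FAT32: Semrequest getFatAccess Failed rc=121!"))
```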
comment:105 by , 8 years ago
2nd reboot:
Long wait at the same point but then the boot continued.
open OS2 window:
echo starting > com1
exit
open window again: stuck after echoing "call cprompt"
putty trace in putty4
by , 8 years ago
Attachment: | putty4.txt added |
---|
comment:106 by , 8 years ago
Are there any other logs you need?
I realise that as well as the CMD files which are normally on FAT32 (but now moved) there is also an APP dataset (\apps) which is still on FAT32.
I could move that but on the other hand it is causing this problem quite reliably.
comment:107 by , 8 years ago
Are you sure putty4.txt is with ftp://osfree.org/upload/fat32/test/fat32-0.10-r276-os2.zip, and not ftp://osfree.org/upload/fat32/fat32-0.10-r276-os2.zip ?
I suspect, you downloaded the 2nd one, because I don't see any additional debug messages.
Regarding the semaphore problem: the current binary was compiled with an experimental option that works around another problem. That is an unrelated problem, so I have now built the binaries without this option enabled. (That problem needs to be addressed separately, and I know about it.) The binary is at the 1st link above (note the "test" subdirectory).
comment:108 by , 8 years ago
I downloaded directly from the link in the post above (comment 102). Looking at the 2 links in 107 the one I downloaded and the one you quote above as correct (\test\) have the same length (compressed 799kb) and the non test one has a different length (compressed 801kb).
comment:109 by , 8 years ago
I know what happened. The wpi version puts the FAT32.IFS into \ecs\boot; your zip puts it into \os2\boot.
I will correct and try again.
comment:110 by , 8 years ago
Every boot with that version of FAT32.IFS hits the semaphore timeout. 4 tries so far
comment:111 by , 8 years ago
Keeps on happening. I assume all the timings are changed by the presence of the extra debug code? It certainly runs very slowly...
comment:112 by , 8 years ago
Apologies. it took me a very long time to realise you meant you had changed the test version and I needed to download it again. trying that now (with the \ecs -> \os2 change)
by , 8 years ago
Attachment: | putty5.txt added |
---|
comment:114 by , 8 years ago
Maybe the unofficial .WPI from Arca Noae installs it to \ecs\boot, but both my .WPI and .ZIP have it packed into \os2\boot. And I didn't upload any .WPIs. It is presumed that you copy each file into its intended location manually, not just "unzip file.zip" to root. Some files may be unnecessary for you, such as .sym files or docs.
> Every boot with that version of FAT32.IFS hits the semaphore timeout. 4 tries so far
With the last reuploaded one? It should not, as I disabled the corresponding code.
> Apologies. it took me a very long time to realise you meant you had changed the test version and I needed to download it again. trying that now (with the \ecs -> \os2 change)
Yes, so you had not re-downloaded my test version.
PS: With your latest log (putty5.txt), I see no hang at all. Does it still look hung to you? As far as I can see, all IFS functions completed and there is no more hang in FindPathCluster.
PPS: Yes, timings/delays are because of added debug code. You can remove the /monitor switch and there will be no delays. BTW, you can start f32mon.exe before testing, this will temporarily enable debug messages. When you close f32mon, debug messages are switched off. This is more convenient method to use instead of permanently enabling it with the /monitor switch.
comment:115 by , 8 years ago
All earlier versions are available as WPI and I don't know who made them then.
For Ecomstation a lot of stuff, including FAT32.IFS, goes into \ecs\boot and the WPI version puts it there. Obviously NOT for ARCAOS :-)
So long as the config is updated each time by the WPI process it obviously doesn't matter. It did matter with a manual download, and I used the same instructions as I received when testing the USB suite: unzip everything to root. I decided the extra files wouldn't matter.
Yes, putty5 refers to a stuck state. It is entirely conceivable that there is some interaction somewhere with something. I will try to generate something more obvious. Thanks for the tip about f32mon - that would be very useful.
by , 8 years ago
Attachment: | putty6.txt added |
---|
by , 8 years ago
Attachment: | putty7.txt added |
---|
comment:116 by , 8 years ago
Two stuck states
1) putty6.txt
with monitor off: boot, open OS2 window, open DOS window
In terminal start PUTTY
In OS2 window obey
echo starting >com1
f32mon
In DOS window obey
ne \\temp\popdir.bat
and it sticks at that point
2) putty7.txt
with monitor off: boot, open OS2 window, open DOS window
In DOS window obey
ne \\temp\popdir.bat
In terminal start PUTTY
In OS2 window obey
echo starting >com1
f32mon
in DOS window (in the NE processor) make a change to the dataset and type F3 (close)
system is stuck there!
I hope these help.
Timing obviously makes a big difference; things that get stuck running in monitor mode work with monitoring off.
by , 8 years ago
Attachment: | putty8.txt added |
---|
comment:117 by , 8 years ago
Bizarre!
Open Os2 window
open Dos window
in os2 window
echo starting >com1
f32mon
then system stuck!
the F32mon command is the last thing on the OS2 window (no monitor trace). putty8.txt is the result of issuing the command f32mon in the OS2 window
comment:118 by , 8 years ago
putty9
(this is what I was trying to do when putty8 happened)
Open Os2 window
open Dos window
in os2 window
echo starting >com1
f32mon
In dos window (in D: a FAT32 volume)
NE temp\popdir.bat
then system stuck!
by , 8 years ago
Attachment: | putty9.txt added |
---|
comment:119 by , 8 years ago
> For Ecomstation a lot of stuff, including FAT32.IFS, goes into \ecs\boot and the WPI version puts it there. Obviously NOT for ARCAOS :-)
Yes, my main machine is MCP2, and even on eCS I keep all my IFSes in \os2\boot. And ArcaOS, I suspect, has no \ecs subdirectory. So it would not be very clever of me to install binaries to \ecs\ :) And having two different .WPIs, one for eCS and one for other systems, is not a clever idea either :)
It looks like it hangs when reading sectors. I added some more debug messages. All these logs seem to include calls to ReadSector from different places, and the hang is in ReadSector. I suspect that you have hit bad sectors and the DOVOLIO call hangs. This needs checking. Please rerun one of your test cases where it hangs.
comment:120 by , 8 years ago
PS: the link to the test version is: ftp://osfree.org/upload/fat32/test/fat32-0.10-r278-os2.zip
comment:121 by , 8 years ago
BTW I don't know if I told you that both on my desktop and the T61 the drives are SSD. So bad sectors are less likely.
by , 8 years ago
Attachment: | putty10.txt added |
---|
comment:123 by , 8 years ago
Ok, now please, test ftp://osfree.org/upload/fat32/test/fat32-0.10-r278-os2.zip (reuploaded). The same testcase as with putty10.txt
by , 8 years ago
Attachment: | putty11.txt added |
---|
comment:124 by , 8 years ago
putty11: call of F32MON from OS2 window resulted in stuck state with the attached log (loop?)
by , 8 years ago
Attachment: | putty12.txt added |
---|
comment:126 by , 8 years ago
What's the difference between 11 and 12? Both logs look incomplete, or the hang was not in fat32.ifs. The 12th log ends with
fpc008: GetNextCluster: ulCluster=fffffff
so, the loop should end and reach the mark fpc009, but I don't see fpc009 in the log. No places between fpc008 and fpc009, where it could hang. So, it looks like it hanged not in fat32.ifs, but somewhere else. Are you sure it hanged at fpc008 and there were no lines after
fpc008: GetNextCluster: ulCluster=fffffff
?
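For readers unfamiliar with the value in that last log line: on FAT32 only the low 28 bits of a FAT entry are significant, and values from 0x0FFFFFF8 to 0x0FFFFFFF mark the end of a cluster chain, so `ulCluster=fffffff` is the normal "no more clusters" result. The sketch below walks a chain in a small in-memory FAT to show where such a loop terminates. It is an illustration only, not the FindPathCluster code from fat32.ifs; the function name and the toy table are made up.

```python
FAT32_MASK = 0x0FFFFFFF      # only the low 28 bits of an entry are used
FAT32_BAD = 0x0FFFFFF7       # marks a bad cluster
FAT32_EOC_MIN = 0x0FFFFFF8   # 0x0FFFFFF8..0x0FFFFFFF mean end of chain


def cluster_chain(fat, start):
    """Follow a cluster chain through an in-memory FAT (a list of 32-bit entries)."""
    chain = []
    seen = set()
    cluster = start
    while True:
        if cluster < 2 or cluster >= len(fat):
            raise ValueError(f"cluster {cluster:#x} is outside the FAT")
        if cluster in seen:
            raise ValueError(f"loop in chain at cluster {cluster:#x}")
        seen.add(cluster)
        chain.append(cluster)
        nxt = fat[cluster] & FAT32_MASK
        if nxt >= FAT32_EOC_MIN:
            return chain  # normal termination, e.g. next == 0x0FFFFFFF
        if nxt == FAT32_BAD:
            raise ValueError(f"bad-cluster mark after cluster {cluster:#x}")
        cluster = nxt


if __name__ == "__main__":
    # Toy FAT: entries 0 and 1 are reserved; a file occupies clusters 2 -> 3 -> 4.
    toy_fat = [0x0FFFFFF8, 0x0FFFFFFF, 3, 4, 0x0FFFFFFF]
    print(cluster_chain(toy_fat, 2))
```

The `seen` set is there because a corrupted FAT can contain a cycle, in which case a naive walk would never reach the end-of-chain mark.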
comment:127 by , 8 years ago
i captured all there was. The only command issued was F32MON on the OS2 Window (in both cases).
I will try again
comment:128 by , 8 years ago
Hehe, "D:\FAT32.LOG". So the log is on your FAT32 drive; that's why it hangs when you start f32mon.exe. Try starting it with a different current drive. I still don't understand why it hangs when accessing the FAT32 drive, because the hang is somewhere else.
> BTW I dont know if I told you that both on my desktop and the T61 the drives are SSD. So bad sectors less likely.
Why do you think so? SSDs are indeed a special kind of flash memory, and it can be corrupted. In theory, bad sectors should be transparently remapped to spare sectors, but the number of spare sectors is limited (as it is with hard disks, too). So when the limit is exhausted, bad sectors cannot be replaced, hence errors. And there can simply be faulty drives.
by , 8 years ago
Attachment: | putty13.txt added |
---|
comment:129 by , 8 years ago
> i captured all there was.
So it looks like it hung outside fat32.ifs. I have no idea why it could hang, except for a hardware error.
comment:130 by , 8 years ago
OK did not see your last comment before trying again.
putty13 is
open os2 window
echo
F32MON
stuck
and I checked that putty13 has all the lines that were on PUTTY. How do I change the log that FAT32MON is using?
comment:131 by , 8 years ago
And in log #13 it hung on calling the kernel function for reading sectors -- here bad sectors are very likely.
> open os2 window
> echo
> F32MON
> stuck
> and I checked that putty13 has all the lines that were on PUTTY. How do I change the log that FAT32MON is using?
You cannot change the log name. It is always called fat32.log and is created in the current directory. Just change the current drive and current directory.
comment:132 by , 8 years ago
OK easy. Just to play safe I am running a bad sector scan (DFSEE) which has showed nothing so far. I will kill that and retry the monitor run
comment:133 by , 8 years ago
putty14
open os2 window
ensure NOT d: (logging to R: JFS formatted)
F32MON
open DOS window
stuck
NB the autoexec for dos window accesses d:\temp
by , 8 years ago
Attachment: | putty14.txt added |
---|
comment:134 by , 8 years ago
I have restarted the bad sector scan on partition D to get that out of the equation. Will take about an hour to complete.
(As I said in an earlier post, I can generate the stuck state on my T61, which does, I think, rule out bad sectors except by an odd coincidence)
comment:135 by , 8 years ago
(and again as in an earlier post, I see no problems with version 913, also tending to rule out a hardware problem)
comment:136 by , 8 years ago
Please test ftp://osfree.org/upload/fat32/test/t1.zip (only the IFS and .sym file; install it over the previous IFS). And then please use the same testcase as in log #13 (f32mon log with current disk d:)
And please, test two times (two logs), to get stable results
comment:137 by , 8 years ago
a) dfsee completed showing no errors
b) just an observation: without the monitoring it is much rarer to get a stuck state. With the monitoring almost everything seems to stick -- timing problem?
comment:138 by , 8 years ago
putty 15a and 15b
reboot, open os2 window
echo starting >com1
F32mon
system sticks; after that monitor output continues for some time (improve baud rate?)
by , 8 years ago
Attachment: | putty15a.txt added |
---|
by , 8 years ago
Attachment: | putty15b.txt added |
---|
by , 8 years ago
Attachment: | putty16a.txt added |
---|
comment:140 by , 8 years ago
Putty16a
same scenario. This time the output looked as though it was looping (not stopping!) and I had to terminate it (at the PUTTY end)
comment:141 by , 8 years ago
And now, please, t3.zip from the same place! No need to terminate! It is executed in a loop normally. The log may be big.
by , 8 years ago
Attachment: | putty17.txt added |
---|
comment:143 by , 8 years ago
So, nothing new. It hung in a place where it normally cannot hang -- on exit from ReadSector, after the sectors were read OK, where there is nothing it can hang on. It looks like it hung somewhere else, not in fat32.ifs. So, no ideas yet. Probably you're right that it is some timing issue -- it hung earlier than in #15a/#15b. Probably that was caused by the delay added by the additional debug code.
comment:144 by , 8 years ago
So what now? I could repeat the process on a different "stuck" perhaps?
Could the stuck state be a control loop?
comment:145 by , 8 years ago
I have no idea, except for testing without an SSD, to see whether the error only occurs with an SSD. The loop terminates normally. It hangs somewhere outside fat32.ifs.
comment:146 by , 8 years ago
Please also try ftp://osfree.org/upload/fat32/test/t4.zip and check if it hangs with it.
comment:147 by , 8 years ago
It looks like it hangs some time after accessing the fat32 disk. The more debug messages I add, the earlier in the code it hangs. And it hangs not immediately after accessing the media, but after some delay -- no longer at the place of access.
I added the code for checking for bad sectors, which was commented out, since version 0.9.13. Maybe, it will help.
by , 8 years ago
Attachment: | putty20.txt added |
---|
comment:148 by , 8 years ago
Running with T4 logged to putty20
open os2 window
echo starting in D:(FAT32) > com1
.... DID NOT STICK ...
open DOS window
... stuck
comment:149 by , 8 years ago
A second attempt: putty21
open os2 window (JFS volume)
echo starting in R:(JFS) > com1
.... DID NOT STICK ...
open DOS window
.... DID NOT STICK ...
open second OS2 window
... open did not complete
by , 8 years ago
Attachment: | putty21.txt added |
---|
comment:150 by , 8 years ago
No need for logs. Actually, I removed all the debug messages, as they are not needed. As I said, it hangs somewhere else for an unknown reason. No idea why.
comment:151 by , 8 years ago
But version 913 gives me no hangs.... (I know, timing issues can affect anything!)
Thanks for help so far
comment:152 by , 8 years ago
One more idea. You have the following parameters in config.sys:
IFS=D:\OS2\BOOT\FAT32.IFS /CACHE:2048 /H /Q
CALL=D:\OS2\CACHEF32.EXE /S /F
Can you try with these parameters:
IFS=D:\OS2\BOOT\FAT32.IFS /CACHE:2048 /H /AC:* /LARGEFILES /FAT /EXFAT /EAS
CALL=D:\OS2\CACHEF32.EXE /F /P:2 /M:50000 /B:250 /D:5000
Will it change something?
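To see at a glance what differs between the two sets of CONFIG.SYS statements, the switches can be compared as sets. This is only a convenience sketch; it treats every token starting with "/" as a switch and says nothing about what each switch does. The helper names are made up.

```python
def switches(statement):
    """Return the set of /switches in one CONFIG.SYS statement (upper-cased)."""
    return {token.upper() for token in statement.split()[1:] if token.startswith("/")}


def switch_diff(old, new):
    """Return (removed, added) switch lists when going from old to new."""
    before, after = switches(old), switches(new)
    return sorted(before - after), sorted(after - before)


if __name__ == "__main__":
    pairs = [
        (r"IFS=D:\OS2\BOOT\FAT32.IFS /CACHE:2048 /H /Q",
         r"IFS=D:\OS2\BOOT\FAT32.IFS /CACHE:2048 /H /AC:* /LARGEFILES /FAT /EXFAT /EAS"),
        (r"CALL=D:\OS2\CACHEF32.EXE /S /F",
         r"CALL=D:\OS2\CACHEF32.EXE /F /P:2 /M:50000 /B:250 /D:5000"),
    ]
    for old, new in pairs:
        removed, added = switch_diff(old, new)
        print("removed:", removed, "added:", added)
```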
comment:153 by , 8 years ago
Do you think /EAS is important? I don't really want to generate EAs in those partitions.
comment:155 by , 8 years ago
Without the /EAS
essentially no difference
I tried the two things that reliably stuck the system before the debug version testing:
a) delete a particular file on a FAT32 volume
b) edit a particular dataset on the FAT32 volume
The first try at delete generated a forced reboot (!!!) -- which is a change, but after that I just got stuck states
comment:156 by , 8 years ago
Maybe, /eas are unimportant, but I don't know. It's just parameters I have on my system. Ok. Then no ideas so far.
comment:158 by , 8 years ago
It occurred to me to wonder if there may be instances in which you are writing outside a buffer and corrupting some other process's data, so the IFS is exiting correctly but then the calling process fails.
Just trying to make sense of this puzzle.
comment:159 by , 8 years ago
More evidence for corruption effects comes from two things I didn't mention earlier while testing with the debug versions:
1) Once the system spontaneously rebooted
2) a few times there were system traps but not in a specific place; once in OS2KRNL, at least once in ACPI
If it would be helpful I could experiment with running debug versions with the system set to take a dump
comment:160 by , 8 years ago
Why are you so sure that traps in kernel or ACPI are related to fat32? I'd suspect that is another problem. If fat32 could corrupt memory of kernel or ACPI, it would trap on my machine too, but I have no such experience. And rather than creating dumps, you'd better get a trap screen with symbols loaded and /monitor option enabled, to see what was going on in fat32.ifs that time.
comment:161 by , 8 years ago
Oh, I am not sure at all. It is simply that these traps etc. did not happen at other times.
I can easily try to get a trap screen. It probably needs a debug version (just to change the timing).
I assume the "Tn" versions are still around?
comment:162 by , 8 years ago
> I can easily try to get a trap screen. It probably needs a debug version (just to change the timing).
Yes, a debug kernel, plus a debug kernel symbol file and fat32 symbol files. Which timing?
> I assume the "Tn" versions are still around?
What do you mean?
comment:163 by , 8 years ago
Timing: if you recall, as you increased the amount of debug code more and more things wouldn't work, so I assume there is some timing effect.
Debug Kernel: if you recall I have never got the debug kernel beyond a black screen even with a kdb.ini file.
Tn versions: these were versions you made available on the ftp site for download with extra trace information.
comment:164 by , 8 years ago
What is the timing for? Debug version of kernel, not the IFS.
I wonder why are you getting the black screen, because the debug kernel is the same as non-debug kernel and it does not need something special. You just should copy the kernel (renamed to os2krnl) plus a *.sym file to the root directory as usual, plus create kdb.ini file in the root.
Why do you need t*.zip? They are not needed anymore. You should use latest svn revision.
comment:165 by , 8 years ago
Debug kernel. As you recall I am using QSINIT so I don't need to rename the kernel. I did create kdb.ini and it did not help. After processing config.sys the system did not initialise the PM screen but remained in a black empty screen not responding to anything. I even tried "g" on the terminal to no effect.
Anyway I don't need the debug kernel to cause these problems; why are you suggesting I use it?
T*.zip.. the only reason I suggested using them was to change the timing (within FAT32.IFS) as that seemed to result in more and different things failing, and I only got the traps in that state.
comment:166 by , 8 years ago
> Debug kernel. As you recall I am using QSINIT so I don't need to rename the kernel.
OK, it does not matter. Then don't rename it.
> I did create kdb.ini and it did not help. Post processing config.sys the system did not initialise the PM screen but remained in a black empty screen not responding to anything. I even tried "g" on the terminal to no effect.
Are you using the SMP or the UNI kernel? Are you sure you have the same type of debug kernel? Because if you change the kernel from UNI to SMP you should change doscall1.dll correspondingly. If you don't change doscall1.dll, many usermode processes, including pmshell.exe, will trap.
> Anyway I don't need the debug kernel to cause these problems; why are you suggesting I use it?
I need to see the last fat32 debug messages, plus a trap screen on the terminal. For that, I need the debug kernel. A non-debug kernel only shows the trap screen on the screen.
comment:167 by , 8 years ago
I will check but believe I am using SMP kernel and the corresponding debug kernel.
I will try using the debug kernel again, but if that doesn't work I will set up SYSDUMP which has the trap screen recorded.
comment:168 by , 8 years ago
So, you are sure you are using the debug SMP kernel and an SMP version of doscall1.dll? If so, then probably there is a trap somewhere, and that's why you have a black screen. The trap screen should be on the debug terminal.
> I will try using the debug kernel again, but if that doesn't work I will set up SYSDUMP which has the trap screen recorded.
SYSDUMP doesn't show what happened in fat32.ifs just before the trap. I need both the fat32.ifs debug messages and a trap screen, with all debug symbols added.
PS: how do you use system dumps on newer systems? They have > 2 GB of RAM. Here, dumpfs.ifs could help, but its os2dump version does not understand disks > 512 GB with 127 or 255 sectors per track. (I have 2 TB disks and I am unable to get dumpfs working.)
comment:169 by , 8 years ago
> PS: how do you use system dumps on newer systems? They have > 2 GB of ram. Here, dumpfs.ifs could help, but its os2dump version does not understand disks > 512 GB with 127 or 255 sectors per track. (I have a 2 TB disks and I am unable to get dumpfs working).
I haven't done it for a while but the first thing is to use QSINIT to restrict the memory to (say) 512Mb, then sysdump should be fine.
Also my SSDs are all 512Gb so should be fine.
> SYSDUMP doesn't show what's happenned in fat32.ifs just before trap. I need both fat32.ifs debug messages and a trap screen, with all debug symbols added.
Noted. If you recall I never solved the problem of getting intelligible output without sending "echo xxx >com1" from the MUT. So I would not have seen the trap dump info on the terminal. Everything seemed set up correctly. Needs more thought
comment:170 by , 8 years ago
I have fixed the problem of garbage characters on the terminal. Mismatch of baud rate. Apparently the debug kernel default is 115200.
I have been pointed to a feature of QSINIT to set baudrate, and set it to 9600 to match the default of PUTTY. I then booted the debug kernel and indeed it ended on a trap (putty30)
For future work I will set PUTTY to 115200 also
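A rough illustration of why this mismatch produces garbage rather than merely slow output: with the sender at 115200 and the receiver at 9600, one receiver bit period spans twelve sender bit periods, so an entire transmitted character is shorter than a single bit as the receiver sees it. The sketch below just does that arithmetic; the 10-bit frame assumes 8N1 framing (start bit, 8 data bits, stop bit), which is an assumption and not something stated in this ticket.

```python
def bit_time_us(baud):
    """Duration of one bit on the wire, in microseconds."""
    return 1_000_000 / baud


def frame_time_us(baud, bits_per_frame=10):
    """Duration of one character frame; 10 bits assumes 8N1 framing."""
    return bits_per_frame * bit_time_us(baud)


if __name__ == "__main__":
    sender, receiver = 115200, 9600
    print(f"sender frame at {sender}: {frame_time_us(sender):.1f} us")
    print(f"receiver bit at {receiver}: {bit_time_us(receiver):.1f} us")
    print(f"sender bits per receiver bit: {sender / receiver:.0f}")
```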
by , 8 years ago
Attachment: | putty30.txt added |
---|
comment:171 by , 8 years ago
Ok, good progress. But I see nothing useful here :( Null pointer dereference somewhere. But no info where.
Also, please add /monitor switch in the future.
comment:172 by , 8 years ago
yes, that was "proof of concept" but it does support your theory that the black screen was caused by a trap.
Any suggestions as to the cause of the trap?
comment:173 by , 8 years ago
> Any suggestions as to the cause of the trap?
I don't know. Maybe a kernel problem; maybe we can try another kernel. Or some driver. No idea which.
comment:174 by , 8 years ago
Could you try the "k" command in the terminal (after it has trapped), to see the call stack? (Did you add it to kdb.ini?)
by , 8 years ago
Attachment: | putty31.txt added |
---|
comment:175 by , 8 years ago
booted with the debug kernel that came with ARCAOS, and with /monitor for the FAT32.IFS
Still a trap 1 (missing page)
Putty31
by , 8 years ago
Attachment: | putty31a.txt added |
---|
comment:176 by , 8 years ago
I ran it again and remembered to type k; the output is in putty31a
kdb.ini is there and does have a k at the end.
by , 8 years ago
Attachment: | putty40.txt added |
---|
comment:177 by , 8 years ago
OK. Steven Levine (kernel expert) has helped.
the trap is something the kernel debugger has noticed, and typing gt gets beyond it (g for go and t to ignore kernel traps)
So after that I did various things and eventually went to a DOS window, used the ne processor to open a file, changed it a little, exited, and it "stuck" during write back. The log seems to indicate that FAT32 processing is incomplete.
Very strangely the next thing that happened was a spontaneous reboot.....
the log is in putty40
comment:178 by , 8 years ago
To putty31a.txt: Nothing useful here. Too little data.
To putty40.txt: It rebooted when reading sectors. (When moving file into DELDIR). Can you reproduce that?
comment:179 by , 8 years ago
putty41
Booted as before (with gt); system stuck while still processing startup commands.
Putty42
After clearing the log the apparently stuck system suddenly generated this
putty43
Another "stuck during startup"
putty44
stuck AFTER startup process (apparently) terminated. (I have no idea why a call was made to open WP51. seems to happen every startup...)
putty45
(this is the effect I was trying to replicate)
After completing the startup process
open dos window (in d:)
obey "ne temp\popdir.bat"
alter it a bit
F3 (close)
try to save and the system sticks
no forced reboot this time
by , 8 years ago
Attachment: | putty41.txt added |
---|
by , 8 years ago
Attachment: | putty42.txt added |
---|
by , 8 years ago
Attachment: | putty43.txt added |
---|
by , 8 years ago
Attachment: | putty44.txt added |
---|
by , 8 years ago
Attachment: | putty45.txt added |
---|
comment:180 by , 8 years ago
putty41
Booted as before (with gt); system stuck while still processing startup commands.
Why with "gt"? Use "g". Which startup? What do you mean? The scripts executed on cmd.exe startup? It hit a breakpoint somewhere in the kernel. That should not happen normally. It is probably because you used "gt". Don't use "gt", use "g"!
putty42
The log output from fat32.ifs is being entered into the kernel debugger. Why is that?
putty43
Trap 8, probably because of using "gt". Never use it!
putty44
Nothing understood. What happens every startup? Your startup scripts execute WordPerfect for DOS?
putty45
Here I need a more detailed log. Please take a log with ftp://osfree.org/upload/fat32/test/fat32-0.10-r283-os2.zip.
comment:181 by , 8 years ago
Answering some queries.
1) I am not an expert in the use of the debug kernel, on the contrary. So I asked an expert and got the advice to use gt
Quote:
BIOS and DOS emulation components of a DOS VDM use hlt instructions to make OS/2 kernel requests. Every now and then the debug kernel notices them. Usually a gt is sufficient to continue execution.
Certainly, if you wish me to use just g I will. But it doesn't sound like t should cause traps.
2) Startup commands
I mean all those things that have been set up (by system or by user) that should complete before user activity
In my case this means
startup.cmd in root and whatever is called from there
Startup folder
Xworkplace startup folders
3) Startup scripts
Your startup scripts execute WordPerfect for DOS?
I saw that and it was a total surprise to me. I cannot think of any place where this should be happening (though I agree the trace shows it). I have looked quite hard and not found it. Will have to look harder. I cannot even think of any reason for it. (Though I do use wordperfect for DOS)
4) I will download the test version and try again (later)
comment:182 by , 8 years ago
Comment about putty44
What I meant was that all the commands called directly or indirectly during startup (as listed above) had apparently completed properly BUT the mouse and keyboard were inoperative and the system appeared stuck.
Before making a test which I hope will show FAT32.IFS problems I wait for the system to go quiet to avoid confusion. In this case it stuck at that point.
comment:183 by , 8 years ago
You can't normally get beyond a trap. (Maybe it will help in the case of a breakpoint trap, but not a page fault or a protection violation.) If you have trapped and try to continue, it will necessarily trap again. I don't need you to continue after the trap, I need you to stop and see the trap screen.
Is there something suspicious in autoexec.bat, startup.cmd, a script executed on cmd.exe start, other command files executed automatically or by a scheduler (if you have one), or the startup folders?
comment:184 by , 8 years ago
Can you share a 14.200 kernel with me? (Together with a .sym file, if possible)? Better, the debug one. (You can try email it to me). I want to check editing with your ne.exe on this kernel. Maybe, this kernel has a broken VDM?
comment:185 by , 8 years ago
No point in sharing a kernel. I am doing these tests on ECS22. The debug kernel says it is 103a (I think), certainly not 200. The debug kernel that comes with ARCAOS is also an old kernel.
I don't see anything suspicious in the startup chain.
I think the point about gt was that the system is indeed stopping on a trap but far too early in the process. Still, I will try using g by itself and see.
by , 8 years ago
Attachment: | putty46.txt added |
---|
comment:186 by , 8 years ago
Putty46
I don't know if this will be any use but...
Using R283 as requested, and moving beyond the initial trap by typing g
The system stops during the process of processing the startup commands/folders; I have typed nothing to the ECS system.
NB the debug kernel is 104a
comment:187 by , 8 years ago
I asked you to test a situation like the one in putty45.txt, where popdir.bat is deleted into DELDIR (after closing a file in NE.EXE). So putty46.txt is the wrong log file. It hits an int 3 (a breakpoint), so it is a problem with the kernel. Please use the testcase from putty45.txt.
Using R283 as requested, and moving beyond the initial trap by typing g
Which initial trap? int 3? I only see int 3 in the log, there is nothing after it.
comment:188 by , 8 years ago
Maybe the initial trap is the right trap, and the int 3 occurred after you pressed "g"? Then why don't I see that very first trap in the log?
comment:189 by , 8 years ago
Doing my best! I cannot repeat the situation in putty45 unless the system initialises itself to the point where I can open a DOS window and use the keyboard. I had two tries and both stopped on that trap 3 at which point I can do nothing.
I don't know why the earlier trap 1 isn't in the log; maybe putty has a fixed size buffer?
If you want to see the trap 1 I can capture it but it is so early in the process it seems unlikely to be relevant (it is before the PM screen is initialised).
I took a quick look at putty46. VERY strangely there is a reference to d:\d\qw98.exe and I know of no reason why that should be accessed during system startup (but that is not your problem, I think!) which is very similar to the strange appearance of WP51 in an earlier log.
comment:190 by , 8 years ago
I found this and will change it!
In the PuTTY configuration window, select Window in the category tree on the left. Change the Lines of scrollback to whatever value you wish. The default is 200.
by , 8 years ago
Attachment: | putty47.txt added |
---|
comment:192 by , 8 years ago
putty47
I did the same again and it stopped in the same place (trap 3). This makes it look to me as if, once again, extra tracing changes the timing so things go wrong at a different place.
I tried to upload the 10000 line putty47 but TRAC doesn't allow files >200K; in any event the initial trap 1 was more than 10000 lines back.
If you want to see the trap 1 I will catch it for you.
comment:193 by , 8 years ago
I have changed putty to keep 100,000 lines and a little bit of experimentation shows that all the trace up to the first trap 1 is 6000 lines (202Kb) and the trace up to the int 3 (which does keep repeating) is 10000 lines (377Kb).
So if you want the longer traces it will have to be by email.
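If emailing is inconvenient, a long capture could also be cut into pieces that each fit under the attachment limit. The following is only a sketch: the function name split_log is made up for illustration, and the 200 * 1024 byte limit is an assumption based on the ">200K" figure mentioned above, not a documented Trac setting.

```python
def split_log(lines, limit_bytes=200 * 1024):
    """Group log lines into parts whose encoded size stays within limit_bytes.

    Splits only at line boundaries. A single line longer than the limit
    ends up alone in its own (oversized) part.
    """
    parts, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        # Start a new part if adding this line would exceed the limit.
        if current and size + n > limit_bytes:
            parts.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        parts.append(current)
    return parts


if __name__ == "__main__":
    # Hypothetical usage: ten 100-byte lines with a 250-byte limit.
    demo = ["x" * 99 + "\n"] * 10
    for i, part in enumerate(split_log(demo, limit_bytes=250), start=1):
        print(f"part {i}: {len(part)} lines")
```

Each part can then be written to its own file (for example putty47-1.txt, putty47-2.txt, names chosen here only as an example) and attached separately.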
comment:194 by , 8 years ago
I also booted the ECS system with the non-debug kernel and the system initialised correctly. I then opened a DOS window as in putty45, and got the same freeze at the same place. The putty script is very long - I could email it.
I realise why there are mysterious references to various programs in D: in the startup process: OS2 is verifying the desktop icons. This does NOT explain the D:\wp51 as I don't have an icon for that, but it does explain all the others.
Please advise if you wish me to email the logs
by , 8 years ago
Attachment: | putty53.txt added |
---|
by , 8 years ago
Attachment: | putty54.txt added |
---|
comment:195 by , 8 years ago
Putty53 and putty54 both with Putty buffer set to 2000 lines
remove /monitor from fat32.ifs
1) boot with normal kernel
wait for system to complete initialisation
open os2 window: command F32MON
open DOS window: ne temp\popdir.bat
make a change; F3; y to file back --- system stuck
record as putty53
2) boot with debug kernel
same as (1) BUT after the y to file back the system did a spontaneous reboot
recorded as putty54
comment:196 by , 8 years ago
(see also #53)
following Steven Levine's analysis of that problem I tried using the OS4 kernel and the above commands do not freeze the system.
so the preliminary analysis suggesting kernel stack overflow may be right. It would certainly account for random-ish effects (traps, reboots, freezes).
If so, what to do about it?
comment:197 by , 8 years ago
This looks like good news. So it seems that the issue is indeed insufficient kernel stack. Please do more stress tests with your usual tasks, and see if it crashes or hangs.
If so, what to do about it?
I'll probably need to minimise the number of nested function calls in the IFS. So I probably need to do some optimizations to reduce stack usage.
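As a rough illustration of the idea (not code from FAT32.IFS, which is written in C), the sketch below walks a FAT32 cluster chain in two ways: once with one nested call per cluster, and once with a loop that uses a constant amount of stack. The function names and the toy in-memory FAT list are hypothetical; the only format facts relied on are that a FAT32 entry uses its low 28 bits and that values of 0x0FFFFFF8 and above mark the end of a chain.

```python
END_OF_CHAIN = 0x0FFFFFF8  # FAT32 entries at or above this value end a chain
ENTRY_MASK = 0x0FFFFFFF    # only the low 28 bits of an entry are significant


def chain_length_recursive(fat, cluster):
    """Count clusters in a chain using one nested call per cluster."""
    next_cluster = fat[cluster] & ENTRY_MASK
    if next_cluster >= END_OF_CHAIN:
        return 1
    return 1 + chain_length_recursive(fat, next_cluster)


def chain_length_iterative(fat, cluster):
    """Count clusters in a chain with a loop, so call depth stays constant."""
    length = 1
    next_cluster = fat[cluster] & ENTRY_MASK
    while next_cluster < END_OF_CHAIN:
        length += 1
        next_cluster = fat[next_cluster] & ENTRY_MASK
    return length


def build_linear_fat(n):
    """Build a toy FAT where clusters 2..n+1 form a single chain of n clusters."""
    fat = [0] * (n + 2)
    for c in range(2, n + 1):
        fat[c] = c + 1
    fat[n + 1] = ENTRY_MASK  # end-of-chain marker
    return fat


if __name__ == "__main__":
    fat = build_linear_fat(50000)
    # The loop handles a long chain; the recursive version would exhaust
    # Python's default call depth on a chain this long.
    print(chain_length_iterative(fat, 2))
```

In a kernel driver the same reasoning applies with a much smaller budget: every nested call and every large local buffer consumes part of a fixed kernel stack, so flattening call chains and moving big buffers off the stack are the usual remedies.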
comment:198 by , 8 years ago
... and when you make the changes please test them against an OS2 kernel.
comment:199 by , 8 years ago
See also issue #53
R285 tested and shows no problems in either ECS22 or ARCAOS, or with the debug kernel (in earlier testing this showed different problems).
comment:200 by , 8 years ago
Possible trap/stuck in FAT32.
The only reason I have to support this is that I get the trap using r285 but not when using 913. Using the retail kernel the system sticks; using the debug kernel I get the following (and nothing else) on the terminal:
Trap 13 (0DH) - General Protection Fault 0000
eax=00004648 ebx=0000fff0 ecx=fe0f4050 edx=f9304648 esi=fd2eafd6 edi=f7460000
eip=000007e6 esp=00005bb4 ebp=00005bce iopl=0 rf -- -- nv up ei ng nz na pe nc
cs=0be8 ss=1530 ds=0bf0 es=4050 fs=0000 gs=0000 cr2=00b70000 cr3=bf0b7000 p=03
0be8:000007e6 268a1f mov bl,byte ptr es:[bx] es:fff0=invalid
================
I am trying to set up a trap dump
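For comparing trap output across several terminal logs, the register values can be pulled out mechanically. This is a minimal sketch with a hypothetical helper name (parse_registers); it only relies on the name=hexvalue layout visible in the dump above. The faulting instruction reads a byte at es:[bx], and bx is the low 16 bits of ebx, which matches the es:fff0=invalid note the debugger printed.

```python
import re

# The trap output quoted above, one debugger line per row.
TRAP_DUMP = """\
Trap 13 (0DH) - General Protection Fault 0000
eax=00004648 ebx=0000fff0 ecx=fe0f4050 edx=f9304648 esi=fd2eafd6 edi=f7460000
eip=000007e6 esp=00005bb4 ebp=00005bce iopl=0 rf -- -- nv up ei ng nz na pe nc
cs=0be8 ss=1530 ds=0bf0 es=4050 fs=0000 gs=0000 cr2=00b70000 cr3=bf0b7000 p=03
0be8:000007e6 268a1f mov bl,byte ptr es:[bx] es:fff0=invalid
"""


def parse_registers(dump):
    """Return a dict of register name -> integer for every name=hex pair.

    Tokens whose value is not hexadecimal (such as "es:fff0=invalid")
    are skipped because the pattern requires hex digits after "=".
    """
    pairs = re.findall(r"\b([a-z][a-z0-9]*)=([0-9a-fA-F]+)\b", dump)
    return {name: int(value, 16) for name, value in pairs}


if __name__ == "__main__":
    regs = parse_registers(TRAP_DUMP)
    bx = regs["ebx"] & 0xFFFF  # 16-bit code addresses memory through bx
    print(f"trap at {regs['cs']:04x}:{regs['eip']:08x}")
    print(f"faulting access at {regs['es']:04x}:{bx:04x}")
```

Running the same parser over each attached log makes it easy to check whether successive traps stop at the same cs:eip.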
comment:201 by , 8 years ago
The activity is to click a "file" pulldown in an application (on a FAT32 volume). The trap happens 100% of the time.
I get the same trap in ARCAOS : the only difference is that in ARCAOS I get a proper trap screen saying trap D in FAT32 (so matching the above trap 13).
I have taken a sysdump and compressed it with 7Z and will send by email to valerius and Steven Levine.
comment:202 by , 8 years ago
please also note that this trap does *not* happen with the OS4 nucleus, so I guess there is another stack problem overlooked (?).
comment:203 by , 8 years ago
A trapdump over email? Isn't it too big? Please try this build first: ftp://osfree.org/upload/fat32/test/fat32-0.10-r285-os2.zip (as I fixed one trap 000d since r285).
comment:205 by , 8 years ago
Ok, committed the changes to the repository. They are now available as r286. For those who need a version without exFAT, it is available too (see ftp://osfree.org/upload/fat32/).
comment:206 by , 8 years ago
A further trap 13 using release 288
Trying to use the debug kernel to investigate this, I find that despite BLDlevel showing both FAT32.IFS and CACHEF32.exe as R288, when using the debug kernel, when CACHEF32.EXE is called in config.sys it reports that FAT32.IFS is R285.
The boot will not proceed beyond that point.
I will check whether I have done something wrong, but is it self-identifying correctly?
by , 8 years ago
Attachment: | putty60.txt added |
---|
comment:207 by , 8 years ago
Re the above: FAT32.IFS was coming from \ecs\bin and not \os2\boot. The perils of using WPI.....
I now have sorted that out and back to the trap 13
version r288: obeying
xcopy d:\dfs f:\dfs
where d: is FAT32 and F: is JFS gets a trap 13 (as shown in the terminal log in PUTTY60)
comment:208 by , 8 years ago
I did xcopy /h /o /t /s /e /r /v w:\dfsee l:\dfsee\, where w: is a FAT32 drive and l: is a JFS drive -- everything copied well. Could you, please, show me the log with "/monitor" enabled?
comment:209 by , 8 years ago
Re the above: FAT32.IFS was coming from \ecs\bin and not \os2\boot. The perils of using WPI.....
WPI should not install into \ecs. Or it is not my .WPI.
by , 8 years ago
Attachment: | putty61.txt added |
---|
comment:211 by , 8 years ago
Please test ftp://osfree.org/upload/fat32/test/fat32-0.10-r289-os2.zip (with f32mon log enabled, of course)
by , 8 years ago
Attachment: | putty62.txt added |
---|
comment:212 by , 8 years ago
Putty62:
using R289 - same trap 13
NB1 booted without /monitor and obeyed F32MON before starting the xcopy
NB2 if helpful I can get a dump
comment:213 by , 8 years ago
Ok, thanks. Now t1.zip at the same place (install it over the previous archive).
by , 8 years ago
Attachment: | putty63.txt added |
---|
comment:215 by , 8 years ago
In case it makes a difference I should also point out that the XCOPY command is being called from a CMD file on a FAT32 formatted partition.
by , 8 years ago
Attachment: | putty64.txt added |
---|
comment:218 by , 8 years ago
Damn, now it crashes somewhere other than FS_READ. Please get a trap in FS_READ. I need stable results. Otherwise I'll need to reproduce it on my machine.
comment:219 by , 8 years ago
ERROR:MakeChain:Cluster 3c2 is not free!
Do you have exFAT? I never had this error on FAT32.
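For readers unfamiliar with what such a message implies: when a driver links newly allocated clusters into a chain, each target cluster is expected to be free, which in FAT32 means the low 28 bits of its FAT entry are zero. The sketch below illustrates that check on a toy in-memory table. It is not the driver's MakeChain routine; the function name, the list-based FAT and the error text are assumptions made for the example.

```python
ENTRY_MASK = 0x0FFFFFFF     # low 28 bits hold the FAT32 entry value
END_OF_CHAIN = 0x0FFFFFFF   # a common end-of-chain marker value
FREE = 0x00000000           # a free cluster has a zero entry


def make_chain(fat, clusters):
    """Link the given clusters in order, refusing any that is already in use.

    All clusters are checked before anything is written, so a failure
    leaves the table unchanged.
    """
    for c in clusters:
        if fat[c] & ENTRY_MASK != FREE:
            raise ValueError(f"Cluster {c:x} is not free")
    for c, nxt in zip(clusters, clusters[1:] + [END_OF_CHAIN]):
        # Preserve the reserved upper 4 bits of the existing entry.
        fat[c] = (fat[c] & 0xF0000000) | (nxt & ENTRY_MASK)


if __name__ == "__main__":
    fat = [0] * 0x400
    fat[0x3C2] = END_OF_CHAIN  # pretend cluster 3c2 already belongs to a file
    try:
        make_chain(fat, [0x3C1, 0x3C2])
    except ValueError as err:
        print("ERROR:", err)
```

Seeing this kind of error on a real volume suggests that the free-cluster bookkeeping and the FAT contents disagree, which would be consistent with on-disk corruption or a stale cached view of the table.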
comment:220 by , 8 years ago
There is no info on where the FS_WRITE call came from. The log looks incomplete, so it is useless. Please try taking the log one more time.
comment:221 by , 8 years ago
1) I do not have exfat
2) I carried out exactly the same procedure as the previous time so cannot explain differences unless there are timing issues.
Will try again.
by , 8 years ago
Attachment: | putty65.txt added |
---|
comment:223 by , 8 years ago
sorry: one (maybe) minor difference.
I switched to the boot volume (R:) before calling F32MON (so the log is going to a JFS volume) instead of to a FAT32 volume.
comment:224 by , 8 years ago
Ok, thanks. Now much better. If you put log to fat32 disk, you make things harder to understand and introduce additional problems. Please don't put log on the fat32 disk anymore.
Now make the same test with t3.zip from the same directory.
by , 8 years ago
Attachment: | putty66.txt added |
---|
comment:227 by , 8 years ago
T4 worked. The (2) xcopies completed normally. I have attached putty67 in case it is useful.
by , 8 years ago
Attachment: | putty67.txt added |
---|
comment:229 by , 8 years ago
Committed the new r290 revision, which corrects several errors, including the buffer overrun in FS_READ/FS_WRITE that we were talking about.
PS: build is available at the usual place (ftp://osfree.org/upload/fat32/ and http://trac.netlabs.org/fat32/)
comment:231 by , 7 years ago
Ok, this looks like a problem with a corrupted file system, which caused the hang. It was fixed when you reformatted the stick. So I'll close this.
comment:232 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Why did the system freeze? So, it occurs each time you start a .bat file with a DOS program? I have no WordPerfect. I created a .bat file starting Volkov Commander -- seems to work fine and no FS corruption. How could I reproduce this on my machine? What FAT errors were they? Trash in directories, FAT table corruption, or something else? What is the CHKDSK output? Could you take an f32mon log while starting WordPerfect? Is it a FAT16 or FAT32 partition?
PS: Sometimes I get trash in directories (while the FS is accessible and files can be opened fine, so you can always back up data and reformat the FS). But I cannot reproduce this yet.
PPS: I don't know which version is included in ArcaOS as I have no ArcaOS. You need the latest version r257 from here: ftp://osfree.org/upload/fat32/.