Opened 7 years ago

Closed 7 years ago

#46 closed defect (fixed)

FAT corruption and stuck system (ARCAOS/DOS full screen/R249)

Reported by: Barry Landy
Owned by:
Priority: minor
Milestone: Future
Component: IFS
Version:
Severity: high
Keywords:
Cc: bl10@…

Description

I have a newly made ARCAOS system.

My previous system was ECS22 beta, presumably using version 913 (2008) and no ill symptoms.

I don't know which release is part of ARCAOS, but I found that obeying a CMD file from a FAT32 formatted disk could cause the contents of that file to vanish (not the file itself). There appeared to be other corruption associated with that problem.

Not unreasonably Lewis Rosenthal recommended that I try R249, which I installed using the WPI version.

The above problem did not occur. However, running a DOS in a full screen window (which I need to do for other reasons) and obeying a BAT command (calling WordPerfect) all seemed well until I exited WordPerfect, when the system froze.

On the forced reboot the partition showed lots of errors to the extent that it said "use SCANDISC under Windows".

Reverting to the "stable release" 913 these problems do not occur.

I arranged for full backup of the partitions that are vulnerable and tried again, with very similar results so the effect is repeatable.

I don't know what diagnostics would help.

Attachments (46)

NE.EXE (167.5 KB) - added by Barry Landy 7 years ago.
POPDIR.BAT (9 bytes) - added by Barry Landy 7 years ago.
dtermlog.001 (6.0 KB) - added by Barry Landy 7 years ago.
putty1.txt (4.0 KB) - added by Barry Landy 7 years ago.
config.sys (9.4 KB) - added by Barry Landy 7 years ago.
cmd.zip (1.0 KB) - added by Barry Landy 7 years ago.
putty2.txt (36.5 KB) - added by Barry Landy 7 years ago.
putty3.txt (1.1 KB) - added by Barry Landy 7 years ago.
putty4.txt (3.9 KB) - added by Barry Landy 7 years ago.
putty5.txt (3.7 KB) - added by Barry Landy 7 years ago.
putty6.txt (1.9 KB) - added by Barry Landy 7 years ago.
putty7.txt (1.1 KB) - added by Barry Landy 7 years ago.
putty8.txt (798 bytes) - added by Barry Landy 7 years ago.
putty9.txt (1.7 KB) - added by Barry Landy 7 years ago.
putty10.txt (4.0 KB) - added by Barry Landy 7 years ago.
putty11.txt (6.9 KB) - added by Barry Landy 7 years ago.
putty12.txt (9.7 KB) - added by Barry Landy 7 years ago.
putty13.txt (9.7 KB) - added by Barry Landy 7 years ago.
putty14.txt (3.1 KB) - added by Barry Landy 7 years ago.
putty15a.txt (7.2 KB) - added by Barry Landy 7 years ago.
putty15b.txt (7.2 KB) - added by Barry Landy 7 years ago.
putty16a.txt (40.2 KB) - added by Barry Landy 7 years ago.
putty17.txt (42.7 KB) - added by Barry Landy 7 years ago.
putty20.txt (1.2 KB) - added by Barry Landy 7 years ago.
putty21.txt (1.2 KB) - added by Barry Landy 7 years ago.
putty30.txt (18.0 KB) - added by Barry Landy 7 years ago.
putty31.txt (79.4 KB) - added by Barry Landy 7 years ago.
putty31a.txt (513 bytes) - added by Barry Landy 7 years ago.
putty40.txt (96.0 KB) - added by Barry Landy 7 years ago.
putty41.txt (86.8 KB) - added by Barry Landy 7 years ago.
putty42.txt (54.4 KB) - added by Barry Landy 7 years ago.
putty43.txt (92.0 KB) - added by Barry Landy 7 years ago.
putty44.txt (91.2 KB) - added by Barry Landy 7 years ago.
putty45.txt (92.5 KB) - added by Barry Landy 7 years ago.
putty46.txt (83.3 KB) - added by Barry Landy 7 years ago.
putty47.txt (193.5 KB) - added by Barry Landy 7 years ago.
putty53.txt (31.8 KB) - added by Barry Landy 7 years ago.
putty54.txt (77.5 KB) - added by Barry Landy 7 years ago.
putty60.txt (83.8 KB) - added by Barry Landy 7 years ago.
putty61.txt (101.3 KB) - added by Barry Landy 7 years ago.
putty62.txt (83.4 KB) - added by Barry Landy 7 years ago.
putty63.txt (85.7 KB) - added by Barry Landy 7 years ago.
putty64.txt (83.8 KB) - added by Barry Landy 7 years ago.
putty65.txt (30.2 KB) - added by Barry Landy 7 years ago.
putty66.txt (40.5 KB) - added by Barry Landy 7 years ago.
putty67.txt (82.4 KB) - added by Barry Landy 7 years ago.

Change History (278)

comment:1 Changed 7 years ago by Valery V. Sedletski

Why did the system freeze? So, it occurs each time you start a .bat file with a DOS program? I have no WordPerfect. I created a .bat file starting Volkov Commander -- seems to work fine and no FS corruption. How could I reproduce this on my machine? What FAT errors were they? Trash in directories, or FAT table corruption, or something else? What is the CHKDSK output? Could you take an f32mon log while starting WordPerfect? Is it a FAT16 or FAT32 partition?

PS: Sometimes I get trash in directories (while the FS is accessible and files can be opened fine, so you can always back up data and reformat the FS). But I cannot reproduce this yet.

PPS: I don't know which version is included in ArcaOS as I have no ArcaOS. You need the latest version r257 from here: ftp://osfree.org/upload/fat32/.

comment:2 Changed 7 years ago by Barry Landy

Answering some questions: Why it froze: working on it! Each time: no

What FAT errors: I have no output as it did the automatic chkdsk on system restart and the output gets thrown away (I think).

All my FAT partitions are FAT32.

I have been testing with r249 and alternatively 913 (partly to try to distinguish FAT32 problems and ARCAOS problems). I have the following as part of a BAT which seems to reliably freeze the system with r249 and NOT with 913.

==================
::@echo off
set popdir=%1
if "%popdir%"=="" set popdir=popdir
set popdrive=%1
if "%popdrive%"=="" set popdrive=popdrive

cd > %disk2%:\temp\currdir
copy %diskroot%\bat\cd.aux + %disk2%:\temp\currdir %disk2%:\temp\%popdir%.bat > nul
:: To restore the pushed directory write %disk2%:\TEMP\%popdir%

.. and more lines follow

===================

disk2 is set to D diskroot to D:\d

With r249 the copy command is executed but none of the following lines are echoed. The system freezes at that point and I have to reboot. With 913 this does not happen.

I will try with r257 and add to this report.

comment:3 Changed 7 years ago by Barry Landy

I see exactly the same system freeze with r257. (but no corruption)

comment:4 Changed 7 years ago by Valery V. Sedletski

2landb: created the file test.bat like this:

rem :: @echo off
set diskroot=w
set disk2=w
set popdir=%1
if "%popdir%"=="" set popdir=popdir
set popdrive=%1
if "%popdrive%"=="" set popdrive=popdrive
cd > %disk2%:\temp\currdir
copy %diskroot%\bat\cd.aux + %disk2%:\temp\currdir %disk2%:\temp\%popdir%.bat > nul
rem :: To restore the pushed directory write %disk2%:\TEMP\%popdir%

I have a w: FAT32 disk with

set disk2=w
set diskroot=w

Created a file w:\w\bat\cd.aux with a "test" string in it. I see no freeze and no FS corruption. Files w:\temp\currdir and w:\temp\popdir.bat are successfully created.

comment:5 Changed 7 years ago by Barry Landy

Doesn't really surprise me!

In case it matters cd.aux contains

cd

(ie cd followed by space and NO CR)

so that concatenating the two files gives one line, cd followed by the saved directory, which changes directory (back).

(but I can't see why it matters)
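
For readers unfamiliar with the trick, here is a minimal Python sketch of what the concatenation is meant to produce. It is only an illustration, not part of the reporter's batch file: the function name and the sample directory are made up, and it ignores any end-of-file marker that COPY may append when concatenating in ASCII mode.

```python
def build_popdir(cd_aux: bytes, currdir: bytes) -> bytes:
    """Mimic `copy cd.aux + currdir popdir.bat` as plain byte concatenation."""
    return cd_aux + currdir


# cd.aux holds "cd " with no trailing CR/LF, as described in the comment.
cd_aux = b"cd "
# `cd > currdir` writes the current directory followed by CR/LF.
# The directory below is a hypothetical sample value.
currdir = b"D:\\WORK\\DOCS\r\n"

popdir_bat = build_popdir(cd_aux, currdir)
# The result is a one-line batch file that changes back to the saved directory.
print(popdir_bat.decode("ascii"), end="")
```

Because cd.aux has no line ending, the two pieces join into a single command line rather than two separate lines.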

comment:6 Changed 7 years ago by Valery V. Sedletski

What doesn't surprise you? That it does not freeze and there is no FS corruption? Maybe it is ArcaOS that is guilty, not fat32? Or am I missing something? I have no 14.200 kernel, for example.

PS: Did you test the same fat32 version under eCS? Does fat32 0.9.13 have this problem if tested with your ArcaOS?

comment:7 Changed 7 years ago by Barry Landy

I was not surprised that you could not replicate the effect

I have now tested with ECS22 and both version 913 and 257 on both systems

I get the same "stuck system" effect with version 257 on both ECS and ARCAOS and the effect does NOT occur with version 913 on either system.

comment:8 Changed 7 years ago by Barry Landy

I continue to see various stuck states. All using DOS. All using software you may not have.

Can I send you a dos editing program and a small file to experiment with?

comment:9 Changed 7 years ago by Barry Landy

please close (see ticket #47)

comment:10 Changed 7 years ago by diver

new comment by the reporter in ticket #47. But as this ticket might have a lot more information, I add the comment from #47 in here and close #47

Since experiencing the stuck state (as I thought on ArcaOS) I have experimented with both ECS and ARCAOS. The stuck states occur on both systems in all releases later than 913.

I now have a file on a FAT32 formatted volume that causes a stuck state on either system when attempting to delete it (DEL command) in a DOS window. This effect is (so far) 100% consistent and persisted despite remaking the volume by copying the files and directories.

Please let me know what diagnostic information would be useful.

comment:11 Changed 7 years ago by Valery V. Sedletski

I now have a file on a FAT32 formatted volume that causes a stuck state on either system when attempting to delete it (DEL command) in a DOS window. This effect is (so far) 100% consistent and persisted despite remaking the volume by copying the files and directories.

Please let me know what diagnostic information would be useful.

You'd better describe how I could reproduce this on my machine. And please, try the latest version from ftp://osfree.org/upload/fat32/ again, and test if the problem is present in the latest version. (It could be gone on newer versions. I tried to delete a file from FAT32 partition from a DOS prompt -- it deletes it as it should, no stuck state.). And, you can give me a link to your DOS program, so, I could try to test it on my machine.

comment:12 Changed 7 years ago by Barry Landy

Will try the new version. I am fairly baffled at how to test on your machine. There is nothing special about the file that will not delete.

Changed 7 years ago by Barry Landy

Attachment: NE.EXE added

Changed 7 years ago by Barry Landy

Attachment: POPDIR.BAT added

comment:13 Changed 7 years ago by Barry Landy

No difference with R269 (sorry!)

I have attached two files ne.exe (the editor) and a small file (popdir.bat)

instructions to follow

comment:14 Changed 7 years ago by Barry Landy

ne is a free standing context editor (very old!)

the following causes a stuck state on my ECS22 system (with any version of FAT32 other than 913).

POPDIR.BAT is the small file I have attached. It is on a FAT32 volume (and in case it matters, so is NE.EXE)

call NE by

NE popdir.bat
a command window opens showing the one line of data
END to go to the end of the line
backspace to delete the last character
F3 to exit
in response to the prompt to update the file or do other things, type y (for yes)

this is where for me the system gets into a stuck state

comment:15 Changed 7 years ago by Valery V. Sedletski

I start it as

ne popdir.bat

add an empty line, press F3, "y", Enter, it saves and quits. No stuck.

Do you mean by "stuck" that system hangs, or only a VDM hangs? Mouse pointer moving? Ctrl-Alt-Del works or only Reset?

comment:16 Changed 7 years ago by Barry Landy

I mean the entire system hangs and only reset will recover it.

(as I said before I am not surprised your system does not show this effect which is very hard on my system and which I can provoke in various ways in both a DOS window and a DOS full screen)

comment:17 Changed 7 years ago by Valery V. Sedletski

Then you need to give me enough info to reproduce the problem on my machine. Or try taking a COM port log. Do you have a COM port and a null-modem cable?

comment:18 Changed 7 years ago by Barry Landy

yes I have a com port (not activated). I have an old cable that was once used to transfer data between systems which I seem to recall is a null cable, or of course I could buy one.

Please point me at some info about how I would use that.

comment:19 Changed 7 years ago by Valery V. Sedletski

2landb: Ah, nice. You need another machine with a COM port too, connected to the first with the cable, plus a terminal emulator program on the other end. It can be any OS. If the second machine has no COM port, you can use a USB2Serial adapter (you plug a null-modem cable into it). For OS/2, I use Prolific pl2303-compatible adapters. For Linux, the choice could be significantly wider. And on the debuggee side, you should specify the /monitor switch on the fat32.ifs command line in config.sys. It permanently enables the debug messages. Then you should see the messages in the terminal. You can save the log and send it to me. I need a log taken when the system begins hanging. It is very good that you have a COM port and a cable.

comment:20 Changed 7 years ago by Barry Landy

OK good. I guessed I would need a second computer :-) and I have a laptop T61 which does have a com port (and anyway I have a docking station which certainly does). I can run ECS21 or windows 7; does it matter which?

What do I run in the monitoring system to see the messages?

comment:21 Changed 7 years ago by Valery V. Sedletski

2landb: Yes, very good. I have a COM port on my ThinkPad X61T docking station, too. This is how I usually debug fat32.ifs. It does not matter which OS you use -- only whether your available terminal emulator is good enough.

What do I run in the monitoring system to see the messages?

Terminal emulator. I use DTerm on OS/2: https://sites.google.com/site/os2acw/dterm_b10.zip. On Windows, you can use HyperTerminal or PuTTY. On Linux it is usually minicom or cu.

comment:22 Changed 7 years ago by Barry Landy

Nothing is ever simple. What I thought was a serial connector (blue) is in fact a VGA connector. Documentation says there is an internal serial header so a bigger project to connect to it

I assume it cannot be done via a D25 connection?

comment:23 Changed 7 years ago by Valery V. Sedletski

Nothing is ever simple. What I thought was a serial connector (blue) is in fact a VGA connector. Documentation says there is an internal serial header so a bigger project to connect to it

This is on a laptop itself, I assume. But you said, you have a COM port header on your docking station? I have the same.

I assume it cannot be done via a D25 connection?

Usually a PC has a DB9 connector. Modems usually have DB25. My null-modem cable is 9-to-25, so I needed an additional 25-to-9 converter.

comment:24 Changed 7 years ago by Barry Landy

Sorry: I wasn't precise enough.

My PC (where I generate the stuck states) does not have an external serial connector but just an internal one.

The PC mobo does have an external DB25 connector (aka parallel port) so I wondered if I could use that?

Though I assume your /monitor switch will send data out through COM1?

comment:25 Changed 7 years ago by Barry Landy

just ordered a mobo header to DB9 connector.

comment:26 Changed 7 years ago by Valery V. Sedletski

The PC mobo does have an external DB25 connector (aka parallel port) so I wondered if I could use that?

LPT port? No, of course not. You need a COM port. An LPT port is fundamentally different.

comment:27 Changed 7 years ago by Barry Landy

OK I now have a connector plugged into the serial header with a DB9 port on the other end and what I hope is a null cable connected to the T61.

As I suspected there are no com.sys statements in my config.sys.

I added device=<drive>:\os2\boot\com.sys in config and also added /monitor on the fat32.ifs statement but nothing happens (ie hyperlink hears nothing)

Do I need anything else in config.sys ?

(Of course the header connector could be the wrong way round, or the cable - laplink - not be the right sort, or or..

comment:28 Changed 7 years ago by Valery V. Sedletski

I added device=<drive>:\os2\boot\com.sys in config and also added /monitor on the fat32.ifs statement but nothing happens (ie hyperlink hears nothing)

If your machine is SMP, you need pscom.sys instead of com.sys, because com.sys is not SMP-aware. Also, did you install a debug kernel first?

(Of course the header connector could be the wrong way round, or the cable - laplink - not be the right sort, or or..

A laplink cable is 25-pin. Isn't your cable a 9-pin one?

comment:29 Changed 7 years ago by Barry Landy

The cable is a dual cable with DS9 and DS25 connectors on both ends.

No I have not installed a debug kernel; I did not realise it is needed. Where do I get it from?

comment:30 Changed 7 years ago by Barry Landy

I meant DB9/25 of course!

comment:31 Changed 7 years ago by Valery V. Sedletski

No I have not installed a debug kernel; I did not realise it is needed. Where do I get it from?

On OS/2 installation disks, of course. \os2image\debug\smp\os2krnlb is a half-strict kernel, which you need. And please install the symbol file, os2krnlb.sym. But rename them to os2krnl.* first, of course.

I meant DB9/25 of course!

so, you have a 25-to-9 converter too?

comment:32 Changed 7 years ago by Barry Landy

no; no 25-9 converter. It has DB9 on both ends

comment:33 Changed 7 years ago by Valery V. Sedletski

I meant DB9/25 of course!

no; no 25-9 converter. It has DB9 on both ends

very strange

comment:34 Changed 7 years ago by Barry Landy

Please talk me through this as to an idiot.

I have disks for ECS2.0 and ECS2.1 and ARCAOS

What I am actually running at present and am using as testing system for FAT32 is ECS2.2 beta but this came with an ISO download and I don't know if the generated DVD has an alternate debug kernel (though I suppose it does). At present I cannot see the DVD but I have the iso file.

The install disks for ECS21 and ARCAOS both have a half strict kernel available so I think it probably best if I change to using the ARCA system.

Do I just copy the kernel into the system (and the symbol table)?

(I used to do this a lot when kernel updates happened regularly but have lost the technique)

And re the cable: I have in the past only used the 25-25 connectors so I don't know if the 9-9 will work.

comment:35 Changed 7 years ago by Barry Landy

I found the ECS2.2 beta II DVD. It does have a debug kernel so I could use that also.

comment:36 Changed 7 years ago by Valery V. Sedletski

What I am actually running at present and am using as testing system for FAT32 is ECS2.2 beta but this came with an ISO download and I don't know if the generated DVD has an alternate debug kernel (though I suppose it does). At present I cannot see the DVD but I have the iso file.

I know nothing about ArcaOS, but I suspect it is the same as eCS. On IBM's OS/2 and on eCS it is the \os2image subdirectory in the root. If you have only an ISO image, then you could burn it to a CD/DVD disk first (otherwise, how are you going to install it on real hardware? If it is not real hardware, then it's possible, of course). Also, you could extract it with 7zip or mount it via an ISOFS Netdrive plugin. 7zip was available on the eCS boot partition after installation. I suppose you have one, or ArcaOS has 7zip too (maybe; I only suspect, as I don't have ArcaOS).

Do I just copy the kernel into the system (and the symbol table)?

Yes, you need to copy the kernel, or insert it in the QSINIT (or ArcaLoader?) menu. In case of QSINIT, the older kernel is available there too.

PS: Back up your kernel, of course, if you're not using the QSINIT menu. In case of QSINIT, you can copy your new kernel under another name, and add both kernels to the menu.

comment:37 Changed 7 years ago by Barry Landy

I am using QSINIT so that's OK.

I have set the ECS22 system up to boot with QSINIT from either the default kernel or OS2KRNLB, and that works OK.

However, during system init (reading config.sys) OS2 reports that COM.SYS (or PSCOM.SYS) is ignored as if the COM1 port is not present.

I have tried 2 different ways PSD=ACPI with no parameters and PSCOM.SYS PSD=ACPI /maxcpu=1 and COM.SYS with the same error.

I have checked the BIOS and it says serial port active. I have looked on Windows and it says COM1 is working.

Any ideas?

comment:38 Changed 7 years ago by Valery V. Sedletski

However, during system init (reading config.sys) OS2 reports that COM.SYS (or PSCOM.SYS) is ignored as if the COM1 port is not present.

Did you enable that COM port in BIOS first?

I have checked the BIOS and it says serial port active. I have looked on Windows and it says COM1 is working.

What are COM port settings in Windows then? I mean, I/O port address, port number?

comment:39 Changed 7 years ago by Valery V. Sedletski

However, during system init (reading config.sys) OS2 reports that COM.SYS (or PSCOM.SYS) is ignored as if the COM1 port is not present.

On which machine? On the terminal or on the debuggee? If the debuggee, then it's normal, because the kernel debugger takes this port and com.sys/pscom.sys should be disabled. On the terminal machine they should be enabled, of course.

comment:40 Changed 7 years ago by Barry Landy

On the MUT (ecs system being tested); my terminal setup is using windows on T61

So do I actually need the (PS)COM.SYS? I "cured" the "line ignored" problem by moving the (PS)COM.SYS above the USB section. (to stop USBCOM.SYS grabbing it)

from what you say this is wrong? Does the debug kernel grab the serial line before config.sys?

comment:41 Changed 7 years ago by Barry Landy

Windows says com1 is 3f8 and irq 4

comment:42 Changed 7 years ago by Valery V. Sedletski

Debug kernel grabs the COM port itself, so COM.SYS/PSCOM.SYS should be disabled during testing. And USBCOM.SYS does nothing with physical COM port too. It can handle USB COM ports only. Yes, the kernel grabs the COM port before config.sys.

comment:43 Changed 7 years ago by Valery V. Sedletski

Windows says com1 is 3f8 and irq 4

So, it is normal COM port with normal settings. And you should add the settings to os2ldr.ini, like this:

dbflags=0
; com port address (0 for vio)
dbport=0x3F8 

comment:44 Changed 7 years ago by Barry Landy

Good news and bad news

Having finally (I hope) got it set up correctly (I added the dbflags and port to os2ldr.ini and corrected the kernel line) the system would not complete the boot process: at the end (maybe) of the config.sys work it stopped on a black screen.

I tried changing to VGA but that didn't work very well (for either mode)

comment:45 Changed 7 years ago by Valery V. Sedletski

It stopped at the debugger prompt at the beginning of the boot process. Did you try the "g" (go) command from the debugger prompt? You can also create the file "kdb.ini" in the root directory of the boot drive with these contents:

.b 115200t 3f8
g
ln
u
dd ss:esp
k

It will set up the com port speed and address, and continue with "g" command. Then execute the rest of commands in case of a trap.

comment:46 Changed 7 years ago by Barry Landy

Sorry but how am I supposed to know this stuff? I have zero documentation for the debugger and the qsinit "documentation" is totally weird - I think I got there at the 3rd attempt.

If I create that file are there any more things I ought to know as a pre-requisite?

Anyway - going to bed now.

comment:47 Changed 7 years ago by Valery V. Sedletski

Just ask me if something is not clear. These things are needed by everyone working with the kernel debugger, anyway. The debugger stops at a very early stage, to make debugging easier. If you don't need it to stop there, just create a debugger ini file with (at least) a "g" line. I suggested more lines for my convenience :)

comment:48 Changed 7 years ago by Barry Landy

I put that file kdb.ini in the root (is that a misprint for kbd.ini?)

No change: the system ends up in a black screen AND it is not responsive to my keyboard

NB the keyboard/mouse are USB attached to a USB hub. I switched them to a root hub and that made no difference. If it is critical I have a PS2 keyboard I could switch to.

comment:49 Changed 7 years ago by Valery V. Sedletski

I put that file kdb.ini in the root (is that a misprint for kbd.ini?)

No. KDB.INI. Kernel DeBugger INI.

No change: the system ends up in a black screen AND it is not responsive to my keyboard

Where do you type? In terminal? Or on test computer keyboard?

NB the keyboard/mouse are USB attached to a USB hub. I switched them to a root hub and that made no difference. If it is critical I have a PS2 keyboard I could switch to.

You type in the terminal, not your computer keyboard. So keyboard and mouse don't matter.

comment:50 Changed 7 years ago by Barry Landy

so

1) kdb.ini made no difference - still a black screen It should have had some effect?

2) I have never seen anything at the terminal end, and using the terminal keyboard had no effect.

I need to test the serial port separately (? I may look for a serial keyboard) and then the null cable - I may need to buy one. Will come back when I have done that

comment:51 Changed 7 years ago by Barry Landy

I found a serial mouse. The serial connection was not working (my attempt to set it on the header was bad) and it may take a little while to fix it (my fingers are too big!)

comment:52 Changed 7 years ago by Valery V. Sedletski

landb: You could try connecting two terminals from both sides and see if it works (prior to connecting the real debugger), and with what settings. You'll need to reenable pscom.sys, of course. And see if your cable is defective. Did you set up all terminal settings correctly? They should be speed 9600, 8 data bits, 1 stop bit, no parity (aka "8N1").

comment:53 Changed 7 years ago by Valery V. Sedletski

kdb.ini made no difference - still a black screen It should have had some effect?

It should boot further, at least, and not stop at the beginning. Do you see the black screen in the terminal or on the test machine? Did you copy kdb.ini to the root of the boot drive? Not another drive?

comment:54 Changed 7 years ago by Barry Landy

1) I will get the serial fitted professionally tomorrow so no need for extra fuss until then.

2)
===============
It should boot further, at least, and not stop at the beginning. Do you see the black screen in the terminal or on the test machine? Did you copy kdb.ini to the root of the boot drive? Not another drive?
==================

Black screen was on MUT. KDB.INI in MUT in root of boot drive. Might the lack of COM1 confuse it? Let's wait and see....

3) terminal settings were OK I think (standard defaults)

comment:55 Changed 7 years ago by Valery V. Sedletski

Black screen was on MUT

very strange, because it should boot further. Are you sure you copied the debug kernel? Copied to the root? Renamed to os2krnl?

comment:56 Changed 7 years ago by Barry Landy

Good take! I have not renamed the kernel because QSINIT indicates that the system will boot without. I assume then that something in the debug kernel requires the name to be right?

I will test that

comment:57 Changed 7 years ago by Barry Landy

kernels are now

18-07-09  3:26a  1,004,533  0  a---  os2krnl
30-05-13  1:44a    870,881  0  a---  os2krnl.106
11-06-09  6:07a    192,148  0  a---  os2krnl.sym
17-03-05  8:20p    171,076  0  a---  os2krnls.106

The two 106 files are the original (and the non debug system boots perfectly well under these names)

I tried the debug system again (now os2krnl) and it stops with a black screen again. kdb.ini contains .b 115200t 3f8 g ln u dd ss:esp k

and os2ldr.ini

[config] ; Kernel number from kernel list section to be loaded by default default=1

; Waiting time (in seconds) after which kernel mentioned in "default" ; will be loaded timeout=10

dbflags=0 ; com port address dbport=0x3f8

[kernel] os2krnl.106=Default, memlimit=512 os2krnl=debug

comment:58 Changed 7 years ago by Valery V. Sedletski

I have not renamed the kernel because QSINIT indicates that the system will boot without

Ah, I forgot that you are using QSINIT... then no need to rename.

 [config] ; Kernel number from kernel list section to be loaded by default default=1

; Waiting time (in seconds) after which kernel mentioned in "default" ; will be loaded timeout=10

dbflags=0 ; com port address dbport=0x3f8

[kernel] os2krnl.106=Default, memlimit=512 os2krnl=debug

In one line?

then not correct. It should be like this:

    [config]
    ; Kernel number from kernel list section to be loaded by default
    default=1

    ; Waiting time (in seconds) after which kernel mentioned in "default"
    ; will be loaded
    timeout=10

    dbflags=0 
    ; com port address 
    dbport=0x3f8

    [kernel]
    os2krnl=Default boot
    os2krnl.dbg=Debug kernel

And the kernel is named as "os2krnl.dbg". The file should be renamed the same way.

comment:59 Changed 7 years ago by Barry Landy

No - it wasn't in one line. That was caused by the reporting ignoring my linefeeds.

comment:60 Changed 7 years ago by Valery V. Sedletski

So, your QSINIT boots an original kernel, and your debug kernel is called os2krnl.dbg on the disk, with the same name in os2ldr.ini? And did it begin to boot? You can try adding "loglevel=3" to the [config] section of os2ldr.ini; then QSINIT should write its debug info to the debug terminal. Is this debug info there, or not?

comment:61 Changed 7 years ago by Barry Landy

1) Since I do not yet have a monitoring terminal (waiting for the serial port) I would not see the QSINIT log.

2) for your information (and I should have said this before)

a) my boot volume is formatted JFS

b) I switch off lazywrite for the FAT32 cache (ie I am not using the cache).

comment:62 Changed 7 years ago by Barry Landy

Progress (of sorts!)

1) the serial port is now working (tested with a serial mouse)

2) using my old cable I get information on the terminal emulator. However what appears is junk (as below)

ç€Ó!Ú%B¹%[oo“[)„{ š[)b÷)«.jÚ!Ú•¤õŠó7K¯pjÚ!Ú•¤õLSQ˜)!ß1ÝŠ`³B@%k¹«J!­UŠC¼„ õÖfÖÛõ¹«J!ûßÚMR¼ƒKb÷)ªÏœgûç‘oç )‘1‘Æç17!+ÆÂÇ)™´ç™™Â)Ú)ç™g )Ñbâ bÎ!ý]ýÔâý!ûÖVѳø!ü'KûãŠKBBªKûãò

I assume this means the cable is not a null cable? In which case I will go and purchase one.

comment:63 Changed 7 years ago by Valery V. Sedletski

How did you test it with the serial mouse? Attached it to the other end of the null-modem cable? If not, then it only indicates that the port seems to be ok. The cable is in question.

comment:64 Changed 7 years ago by Barry Landy

No. I plugged the serial mouse into the serial socket on the computer. That worked OK. I am pretty sure my old cable is straight through and I am getting a null modem cable tomorrow.

(I cannot connect the mouse to the end of the old cable as both ends are female and I don't have a gender changer)

comment:65 Changed 7 years ago by Valery V. Sedletski

Yes, maybe, your cable is a modem cable (straight one). Null-modem cable is a crossed one. Ah, yes, it is other gender...
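
The straight versus crossed distinction can be made concrete with a small Python sketch. It assumes the usual 9-pin PC serial pinout (pin 2 receive, pin 3 transmit, pin 5 ground) and models only a minimal three-wire link; a full null-modem cable also crosses the handshake lines, which are left out here. The names are illustrative only.

```python
# 9-pin serial connector pin numbers for the three signals that matter
# in a minimal link between two PCs.
RXD, TXD, GND = 2, 3, 5

# Straight-through cable: each pin goes to the same pin on the other end.
STRAIGHT = {RXD: RXD, TXD: TXD, GND: GND}
# Minimal null-modem cable: transmit and receive are crossed, ground is shared.
NULL_MODEM = {RXD: TXD, TXD: RXD, GND: GND}


def can_talk(cable: dict) -> bool:
    """True when each side's TXD lands on the other side's RXD."""
    return cable[TXD] == RXD and cable[RXD] == TXD and cable[GND] == GND


print("straight:", can_talk(STRAIGHT))
print("null-modem:", can_talk(NULL_MODEM))
```

With a straight cable between two PCs, both transmit pins are wired together, so neither side receives valid data.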

comment:66 Changed 7 years ago by Barry Landy

Still progress of sorts. I now have a null modem cable (sold as such!).

However I am still getting rubbish on the terminal screen. It is set to see COM1; set to 9600 8N1

Any other settings needed: I left it set to emulation auto detect but there are lots of other possibilities.

You also suggested I could use PUTTY, which I also have but which doesn't seem designed to connect to COM1.

All advice welcomed!

comment:67 Changed 7 years ago by Barry Landy

I also downloaded an up to date putty which has a serial option; I used that and still get garbage on the screen.

comment:68 Changed 7 years ago by Valery V. Sedletski

Ah, good. Do you have ".b 9600t 3f8" at the beginning of kdb.ini too? (kdb.ini should be placed on the test machine, as it is the kernel debugger .ini file and the kernel debugger runs on the test machine, not the terminal.) BTW, we can try setting the speed to 115200 on both ends (should be ".b 115200t 3f8") for quicker debugging. The rest of the settings seem to be ok.

comment:69 Changed 7 years ago by Barry Landy

I looked at kdb.ini on the test machine. It had 115200 whereas the terminal machine had 9600. I corrected kdb.ini to 9600.

Made no apparent difference to the output though I dont know what I am expecting. Meanwhile running the debug kernel still stops with a black screen.

comment:70 Changed 7 years ago by Barry Landy

I decided to also try monitoring with an OS2 system on my T61 (ECS21 as it happens)

I downloaded dterm (from your earlier post) and got it running.

I then booted the debug terminal on the MUT and absolutely nothing appeared on the Dterm screen (and yes, the null modem cable is still connected).

comment:71 Changed 7 years ago by Barry Landy

It occurred to me to try the latest version of FAT32 (r257) on my T61. It creates the same stuck state, so it is something to do with the way the system is set up rather than any oddity in the hardware. I will try to monitor in the other direction, using the Thinkpad as the MUT.

comment:72 Changed 7 years ago by Valery V. Sedletski

I would recommend installing a terminal program on both machines and experimenting with their settings first. Try to set up the same settings and see whether typing symbols on one terminal outputs the same symbols on the other. But enable pscom.sys on that machine first. Otherwise, the COM port will not work.

comment:73 Changed 7 years ago by Barry Landy

I discovered that the T61 serial port was disabled, but it still makes no difference.

Independently of the com output, shouldn't the debug boot be getting further?

comment:74 Changed 7 years ago by Valery V. Sedletski

Of course, you should not load the debug kernel, if you want to use a COM port for another purpose, than kernel debugger, and enable the COM port driver instead. Otherwise the kernel debugger will take over.

comment:75 Changed 7 years ago by Barry Landy

Of course. My point is that when loading the debug kernel to try to debug why my system gets into a stuck state, the system gets into a different stuck state prior to loading the normal OS2 screen. You suggested the kdb.ini file, which I put in place, but that made no difference. So why isn't the system getting further with the debug kernel?

comment:76 Changed 7 years ago by Valery V. Sedletski

It probably trapped somewhere. You need to get your terminal working first, to see why and where. (Maybe it traps on mount. Disk mounts occur shortly before pmshell starts.) When a debug kernel is installed, all traps go to the debug terminal and are not shown on the screen.

comment:77 Changed 7 years ago by Valery V. Sedletski

I recommend starting with two terminals attached to both COM ports first, without the debug kernel active and with a COM port driver enabled on both sides. If all options are set correctly, typing a string on one terminal should reproduce the same string on the other side (you will see it on the terminal). If, while you type on one side, only incorrect characters appear on the other (not the ones you typed), then the connection is working but the COM port settings are wrong somewhere. Also, if typing on side 1 repeats the correct characters on side 2, but not vice versa, then side 1 has the correct COM port settings (number of data bits, stop bits and parity bits, 8N1 for example), terminal emulation settings, etc. 8N1 on both sides should be sufficient. There is also a setting for whether to react to the CD (Carrier Detect) and TR (Terminal Ready) signals. Your cable could be 2- or 3-wire (which is common with today's cables), and hence have no wires for the CD or TR signals, so some terminal emulators could work incorrectly. You should look in your terminal program's settings to see whether options to disable the CD and/or TR signals exist. The debug kernel doesn't need these signals, but some terminal emulation programs may.

Once the terminal programs are set up OK, you can switch one side to a debug kernel instead of a terminal program, and all should work (you need to disable the COM port driver on that side and enable the debug kernel first).
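The first step of that check is that both ends agree on the line parameters. A minimal sketch in Python of comparing two settings strings of the form "9600 8N1"; the string format and the function names are assumptions made for illustration and are not part of any terminal program or of the kernel debugger:

```python
from typing import NamedTuple


class SerialSettings(NamedTuple):
    baud: int
    data_bits: int
    parity: str
    stop_bits: int


def parse_settings(text: str) -> SerialSettings:
    """Parse a string such as '9600 8N1' (hypothetical notation)."""
    baud_text, frame = text.split()
    if len(frame) != 3:
        raise ValueError(f"expected a frame like 8N1, got {frame!r}")
    return SerialSettings(int(baud_text), int(frame[0]), frame[1].upper(), int(frame[2]))


def mismatches(a: SerialSettings, b: SerialSettings) -> list[str]:
    """Return the names of the fields on which the two ends disagree."""
    return [field for field in SerialSettings._fields if getattr(a, field) != getattr(b, field)]


# Example: one end left at 115200 while the other is set to 9600.
test_machine = parse_settings("115200 8N1")
terminal = parse_settings("9600 8N1")
print(mismatches(test_machine, terminal))
```

An empty list means the two ends agree on speed and framing; it says nothing about the cable or about CD/TR handling, which still have to be checked by typing on both terminals.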

comment:78 Changed 7 years ago by Barry Landy

Pleased to report that when I run Windows on both systems and open hyperterminal on both, what I type on one appears on the other in both directions.

The same is true with PUTTY (both ends or one end PUTTY the other hyperterminal)

So the link is good and the cable good.

I have not tried the OS2 to OS2 route yet

comment:79 Changed 7 years ago by Valery V. Sedletski

Ok, so you can already use your terminal under Putty or HyperTerminal? (or OS/2 with HyperAccess Lite, as it is very similar to HyperTerminal, and produced by the same company).

PS: You said you tested r257, but there's newer build, r276 already, you could try it (look at ftp://osfree.org/upload/fat32/ for newer builds.)

comment:80 Changed 7 years ago by Barry Landy

And it works with Dterm on OS2 (both sides). I did not know about hyperaccess.

Why do I need to disable the com port driver? Wouldn't the debug kernel take it first?

comment:81 Changed 7 years ago by Valery V. Sedletski

I tried leaving the COM port driver enabled. As a result, the COM port driver takes over the COM port, and from the moment the COM port driver loads, messages stop coming to the terminal. So it appears that the COM port driver prevents the debug kernel from working.

comment:82 Changed 7 years ago by Barry Landy

1) I installed r276. Both with 269 and 276 I find that there are more frequent stuck states and I don't do anything odd; just, for example, opening an OS2 window (startup commands come from a FAT32 partition)

2) I have dterm on the terminal, and it showed the usual junk characters. I typed some text on the screen and it suddenly started outputting readable text as a result of the /monitor.

3) just before the stuck state it showed the following

P:30 T:c D:0 T:10:386 GetBuf3Access: FindPathCluster
P:30 T:c D:0 T:10:388 GetFatAccess: ReadSector
P:30 T:c D:0 T:10:439 GetReadAccess: ReadSector
P:30 T:c D:0 T:10:447 ReleaseReadBuf
P:30 T:c D:0 T:10:486 ReleaseFat
P:30 T:c D:0 T:10:521 ReleaseBuf3
P:30 T:c D:0 T:10:521 GetFatAccess: GetNextCluster
P:30 T:c D:0 T:10:523 ReleaseFat
P:30 T:c D:0 T:10:558 FS_OPENCREATE returned 0 (Action = 1, OI=411808c8)
P:30 T:c D:0 T:10:562 FS_FILEINFO for D:\D\cmd\AUTOEXEC.CMD, usFlag = 0, level 4
P:30 T:c D:0 T:10:646 FS_FILEINFO returned 0
P:30 T:c D:0 T:10:646 FS_FILEINFO returned 0
P:4b T:81 D:0 T:10:695  FS_PROCESSNAME returned filename: D:\D\CMD\AUTOEXEC.CMD

which I doubt is helpful

comment:83 Changed 7 years ago by Valery V. Sedletski

That's good. But the piece of log you've inserted here is too small, of course. In the directory you started DTerm from, there should be a DTerm.log file. Please attach it here. (Please set the Log option to "replace" to clear the log on startup. There is a status line at the bottom; you can choose a set of options with the "arrows" at top left, and options can be chosen by mouse or keyboard.)

comment:84 Changed 7 years ago by Barry Landy

Sorry, no log. It's not set up by default. I found the control-l option, which gave a suboption to name a log. Will try again!

Changed 7 years ago by Barry Landy

Attachment: dtermlog.001 added

comment:85 Changed 7 years ago by Barry Landy

I have attached a log, again trying to open an OS2 window and finishing in a stuck state.

To get the dterm window to see characters instead of rubbish I started by doing

echo xxx > com1

on the MUT. why that should help is a mystery!

comment:86 Changed 7 years ago by Valery V. Sedletski

This log contains no hang. All IFS functions complete normally and it does not hang in any of them. Are you sure this log was taken after the system hung? Or it hung not in the IFS, but somewhere else.

 To get the dterm window to see characters instead of rubbish I started by doing

echo xxx > com1

on the MUT. why that should help is a mystery!

Are you sure you have disabled the com port driver?

echo xxx > com1

should not work, as there should be no "com1" device in the system, because you disabled the COM port driver!

PS: what does the following command say:

mode com1

Does it say "SYS0021: The drive is not ready"?

Last edited 7 years ago by Valery V. Sedletski (previous) (diff)

comment:87 Changed 7 years ago by Barry Landy

You misunderstand. This is NOT using the debug kernel but the ordinary kernel and your monitoring data is going to COM1 via PSCOM.SYS (as is the ECHO).

All I can say about the hang is that the OS2 window never came to the command prompt. Mouse and keyboard dead.

The Log was closed AFTER the hang. Some interaction with something no doubt. I am pretty sure that if I put back the old version (913) there will be no hangs.

Changed 7 years ago by Barry Landy

Attachment: putty1.txt added

comment:88 Changed 7 years ago by Barry Landy

Here is another one: this time captured by PUTTY under windows. Still needed an ECHO xx to stop the rubbish.

I don't know if this is more indicative but it does not seem to end tidily.

comment:89 Changed 7 years ago by Valery V. Sedletski

Ah, you load a non-debug kernel, output a string to COM1 via the echo command, and the log becomes OK on the other side? If so, then you can use a non-debug kernel, and everything should work OK. It appears that some program accesses the COM port too, so it reinitialises the COM port. When you issue the "echo" command, the port is reinitialised again and debug messages pass through.

From which window do you issue the "echo" command, if it hangs after that?

And where does it "not end tidily"? I see it ending on the FindPathCluster call... Or did you cut the log? Could you please leave some number of repeating lines at the end?

comment:90 Changed 7 years ago by Valery V. Sedletski

Ah, "GetBuf3Access: FindPathCluster", so it is waiting on the semaphore. Then it is not a hang. Only a program accessing the FAT disk should "freeze". The rest of the system should be responsive. What program are you using to access the FAT32 disk? WPS?
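One way to spot this kind of state in a longer trace is to pair up the access and release messages per process and thread. A rough sketch in Python; the line layout and the pairing of names (GetBuf3Access with ReleaseBuf3, GetFatAccess with ReleaseFat, GetReadAccess with ReleaseReadBuf) are inferred only from the excerpt in comment 82 and are assumptions, not a description of how fat32.ifs actually names its semaphores:

```python
import re

# Assumed pairing, guessed from the message names in the posted excerpt.
ACQUIRE = {"GetBuf3Access": "Buf3", "GetFatAccess": "Fat", "GetReadAccess": "ReadBuf"}
RELEASE = {"ReleaseBuf3": "Buf3", "ReleaseFat": "Fat", "ReleaseReadBuf": "ReadBuf"}

# Matches e.g. "P:30 T:c D:0 T:10:386 GetBuf3Access: FindPathCluster"
LINE = re.compile(r"^P:(\w+) T:(\w+) D:\w+ T:[\d:]+\s+(\w+)")


def unreleased(lines):
    """Return {(pid, tid): [resources requested or held but not yet released]}."""
    held = {}
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # not a monitor line, e.g. serial garbage
        pid, tid, name = match.groups()
        stack = held.setdefault((pid, tid), [])
        if name in ACQUIRE:
            stack.append(ACQUIRE[name])
        elif name in RELEASE and RELEASE[name] in stack:
            stack.remove(RELEASE[name])
    return {key: value for key, value in held.items() if value}


trace = [
    "P:30 T:c D:0 T:10:521 GetFatAccess: GetNextCluster",
    "P:30 T:c D:0 T:10:523 ReleaseFat",
    "P:30 T:c D:0 T:10:600 GetBuf3Access: FindPathCluster",
]
print(unreleased(trace))
```

A trace that ends with an entry still listed only shows that the request was logged without a matching release; it cannot distinguish a thread waiting on the semaphore from one that holds it and stopped elsewhere.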

comment:91 Changed 7 years ago by Barry Landy

1) As expected, version 913 runs clean. It also runs much faster, presumably because there is no monitoring.

2) To issue the ECHO command I have to start an OS2 window, so it doesn't always hang....

3) PUTTY has a "copy all to clipboard" option which I used, so what I attached was ALL the logged data.

4) The program accessing the FAT32 disk was the OS2 window processor. The system was completely hung: mouse dead, keyboard dead, no way to access anything else.

comment:92 Changed 7 years ago by Valery V. Sedletski

Hm, strange. The Buf3 semaphore should be free at that moment, so there should be no lockup. Or are there some repeating messages after the line "GetBuf3Access: FindPathCluster"?

comment:93 Changed 7 years ago by Valery V. Sedletski

What is "OS2 window processor"? Could you share it with me?

comment:94 Changed 7 years ago by Barry Landy

Sorry to confuse.

I simply meant that I am opening an OS2 window and it is processing some commands. In config.sys I have

os2shell /k d\cmd\autoexec.cmd

so that whenever an os2 window is opened it will obey that CMD file and whatever it calls.

Strange - yes I agree. But that's why I am trying to debug this....

(No more tonight. What time zone are you in? Wouldn't we be better off communicating by email?)

comment:95 Changed 7 years ago by Valery V. Sedletski

Ok, you meant the "OS/2 Window". It is a standard PM window. And I use OS/2 windows without any problems on my machines. I suspect that your autoexec.cmd is causing some problems. Could you share your autoexec.cmd instead? (BTW, why not use 4OS/2 for that purpose? It can have a file executed on startup of each OS/2 window.) And could you try to disable your autoexec.cmd for testing purposes? (Just restore the standard values of the OS2SHELL and COMSPEC variables in config.sys and reboot.) I am in the GMT+3 (Moscow) timezone, but I usually work closer to the evening.

Maybe email would be better, but I prefer IRC (if you agree). I hang around on the #osFree IRC channel on the EFnet IRC network 24 hours a day (see www.efnet.org for the list of servers).

Last edited 7 years ago by Valery V. Sedletski (previous) (diff)

comment:96 Changed 7 years ago by Barry Landy

I don't use irc.... I don't use 4OS2 (and don't want to start now)

I could remove the automatic use of start up command file but actually I want to get to the bottom of this and not find ways around it. All that autoexec does is set up things I want set in my window such as the path, colour of prompt, larger window for scrolling, etc. What do you mean by "share the autoexec.cmd"?

I have not tried to recreate the DOS window hang since getting this monitoring working, as I assumed the other hang was "just as good"....

comment:97 Changed 7 years ago by Valery V. Sedletski

Ok, you could use email, if you wish, but it isn't much better than writing here (my email is _valerius (dog) mail (dot) ru).

Removing autoexec is just for testing purposes -- I want to see if it is your autoexec that is causing troubles. If yes, you could just share with me your .cmd's contents and I maybe, will reproduce the problem on my machine. It could speed up things a lot.

comment:98 Changed 7 years ago by Valery V. Sedletski

PS: And please, attach your config.sys here. Maybe, you have some references to your FAT32 drive in it, or something like this.

PPS: What is searching for ex.[bat|cmd|com|exe]? Is this "ex.*" accessed from your autoexec.cmd?

Last edited 7 years ago by Valery V. Sedletski (previous) (diff)

Changed 7 years ago by Barry Landy

Attachment: config.sys added

comment:99 Changed 7 years ago by Barry Landy

1) I have attached config.sys. The FAT32 partition is D:

2) I have a tiny file which converts ex to exit (both ex.bat and ex.cmd are present)

3) ex is not referenced from autoexec or any of the cmd files it calls (which I will attach in an email)

Changed 7 years ago by Barry Landy

Attachment: cmd.zip added

comment:100 Changed 7 years ago by Barry Landy

I have attached cmd.zip with 4 cmd files

Changed 7 years ago by Barry Landy

Attachment: putty2.txt added

Changed 7 years ago by Barry Landy

Attachment: putty3.txt added

comment:101 Changed 7 years ago by Barry Landy

Attached 2 files

lots of fun this morning.

1) removing the /K from the OS2_SHELL line stopped my system completing the boot process (the PM screen came up but remained empty). I totally do not understand this!

So instead I changed my references to d:\d\cmd (d: is FAT32) in config.sys to F:\D\CMD (F: is HPFS)

2) with that setup I installed 276 again; added /monitor and booted. the first instance of an OS2 window worked fine. After exiting and starting another (clicking on icon) it was referencing the FAT32 volume and became stuck again. See putty2.txt

3) reboot: this time I started a DOS window and it quickly became stuck. See putty3.txt

comment:102 Changed 7 years ago by Valery V. Sedletski

Hm, it looks like there really is a hang in FindPathCluster. I added some debug messages and uploaded r276 there: ftp://osfree.org/upload/fat32/test/fat32-0.10-r276-os2.zip. Please take a log.

PS: Yes, you have a very strange setup. Why do you need .cmd files on FAT32 drive and PATH referencing it? Anyway, it helps to catch glitches ;) But I am still unable to reproduce this on my machine :)

PPS: Are you sure that your FAT32 disk does not contain errors? Either, those fixable with CHKDSK, or hardware ones? Bad sectors, or maybe, a faulty drive?

Last edited 7 years ago by Valery V. Sedletski (previous) (diff)

comment:103 Changed 7 years ago by Barry Landy

1) I have done extensive and repeated error checking (my first thought!). In any event the errors also occur on my Thinkpad so I think hardware error is ruled out

2) my setup did not used to be like this. Current setup is all to do with data backup, migration and recovery.

I used to have cmd files with extensive lists of things to be backed up. I moved to using file synch on one (maybe two) tree. This tree is on the FAT32 partition so that windows files also get backed up at the same time.

comment:104 Changed 7 years ago by Barry Landy

Running the new 276

First boot attempt failed. After a long wait there is a message "failure to get semaphore" (exact text has gone). Then ENTER to proceed; screen clears. New message: FAT32: Semrequest getFatAccess Failed rc=121! Press Enter to continue

but it does not continue.

force reboot.....

comment:105 Changed 7 years ago by Barry Landy

2nd reboot:

Long wait at the same point but then the boot continued.

open OS2 window: echo starting > com1, exit; open window again: stuck after echoing "call cprompt"

putty trace in putty4

Changed 7 years ago by Barry Landy

Attachment: putty4.txt added

comment:106 Changed 7 years ago by Barry Landy

Are there any other logs you need?

I realise that as well as the CMD files which are normally on FAT32 (but now moved) there is also an APP dataset (\apps) which is still on FAT32.

I could move that but on the other hand it is causing this problem quite reliably.

comment:107 Changed 7 years ago by Valery V. Sedletski

Are you sure putty4.txt is with ftp://osfree.org/upload/fat32/test/fat32-0.10-r276-os2.zip, and not ftp://osfree.org/upload/fat32/fat32-0.10-r276-os2.zip ?

I suspect, you downloaded the 2nd one, because I don't see any additional debug messages.

Regarding the problem with getting the semaphore: the current binary is compiled with an experimental option for working around another problem. This is an unrelated problem, so I have now built the binaries without this option enabled. (That problem needs to be addressed separately, and I know about it.) The binary is at the 1st link, as above (note the "test" subdirectory).

Last edited 7 years ago by Valery V. Sedletski (previous) (diff)

comment:108 Changed 7 years ago by Barry Landy

I downloaded directly from the link in the post above (comment 102). Looking at the 2 links in 107 the one I downloaded and the one you quote above as correct (\test\) have the same length (compressed 799kb) and the non test one has a different length (compressed 801kb).

comment:109 Changed 7 years ago by Barry Landy

I know what happened. The wpi version puts the FAT32.IFS into \ecs\boot; your zip puts it into \os2\boot.

I will correct and try again.
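When two copies of the same driver can end up in different directories, comparing content hashes shows quickly whether they are the same build. A minimal sketch in Python, assuming a Python interpreter is available on the machine; the function name is made up for this example and the paths in the usage line are simply the two locations mentioned above:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_digest(paths):
    """Group existing files by the SHA-256 digest of their contents."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        if path.is_file():  # silently skip locations that do not exist
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(str(path))
    return dict(groups)


# Hypothetical usage with the two install locations from this comment.
for digest, files in group_by_digest([r"C:\ecs\boot\FAT32.IFS", r"C:\os2\boot\FAT32.IFS"]).items():
    print(digest[:12], files)
```

More than one group in the result means the copies differ, so the IFS= line in config.sys decides which build is actually loaded.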

comment:110 Changed 7 years ago by Barry Landy

Every boot with that version of FAT32.IFS hits the semaphore timeout. 4 tries so far

comment:111 Changed 7 years ago by Barry Landy

Keeps on happening. I assume all the timings are changed by the presence of the extra debug code? It certainly runs very slowly...

comment:112 Changed 7 years ago by Barry Landy

Apologies. It took me a very long time to realise you meant you had changed the test version and I needed to download it again. Trying that now (with the \ecs -> \os2 change)

Changed 7 years ago by Barry Landy

Attachment: putty5.txt added

comment:113 Changed 7 years ago by Barry Landy

Another stuck state with the trace information in putty5.txt

comment:114 Changed 7 years ago by Valery V. Sedletski

That unofficial .WPI from ArcaNoae may install it to \ecs\boot, but both my .WPI and .ZIP have it packed into \os2\boot. And I didn't upload any .WPIs. It is presumed that you copy each file into its intended location manually, not just "unzip file.zip" to root. Some files may be unnecessary for you, such as .sym files or docs.

Every boot with that version of FAT32.IFS hits the semaphore timeout. 4 tries so far

With last reuploaded? Should not be... As I disabled corresponding code.

Apologies. it took me a very long time to realise you meant you had changed the test version and I needed to download it again. trying that now (with the \ecs -> \os2 change)

Yes, so you had not re-downloaded my test version.

PS: With your latest log (putty5.txt), I see no hang at all. Does it still look hung to you? As far as I can see, all IFS functions completed and there is no more hang in FindPathCluster.

PPS: Yes, timings/delays are because of added debug code. You can remove the /monitor switch and there will be no delays. BTW, you can start f32mon.exe before testing, this will temporarily enable debug messages. When you close f32mon, debug messages are switched off. This is more convenient method to use instead of permanently enabling it with the /monitor switch.

Last edited 7 years ago by Valery V. Sedletski (previous) (diff)

comment:115 Changed 7 years ago by Barry Landy

All earlier versions are available as WPI and I don't know who made them then.

For Ecomstation a lot of stuff, including FAT32.IFS, goes into \ecs\boot and the WPI version puts it there. Obviously NOT for ARCAOS :-)

So long as the config is updated each time by the WPI process it obviously doesn't matter. It did matter with a manual download, and I used the same instructions as I received when testing the USB suite: unzip everything to root. I decided the extra files wouldn't matter.

Yes, putty5 refers to a stuck state. It is entirely conceivable that there is some interaction somewhere with something. I will try to generate something more obvious. Thanks for the tip about f32mon - that would be very useful.

Version 0, edited 7 years ago by Barry Landy (next)

Changed 7 years ago by Barry Landy

Attachment: putty6.txt added

Changed 7 years ago by Barry Landy

Attachment: putty7.txt added

comment:116 Changed 7 years ago by Barry Landy

Two stuck states

1) putty6.txt

with monitor off: boot, open OS2 window, open DOS window

In terminal start PUTTY
In OS2 window obey 
echo starting >com1
f32mon

In DOS window obey
ne \\temp\popdir.bat

and it sticks at that point

2) putty7.txt

with monitor off: boot, open OS2 window, open DOS window
In DOS window obey
ne \\temp\popdir.bat

In terminal start PUTTY
In OS2 window obey 
echo starting >com1
f32mon
in DOS window (in the NE processor) make a change to the dataset and type F3 (close)

system is stuck there!

I hope these help.

Timing obviously makes a big difference; things that get stuck running in monitor mode work with monitoring off.

Changed 7 years ago by Barry Landy

Attachment: putty8.txt added

comment:117 Changed 7 years ago by Barry Landy

Bizarre!

Open  Os2 window
open Dos window
in os2 window echo starting >com1
f32mon
then system stuck!

the F32mon command is the last thing on the OS2 window (no monitor trace) putty8.txt is the result of issuing the command f32mon in the OS2 window

comment:118 Changed 7 years ago by Barry Landy

putty9

(this is what I was trying to do when putty8 happened)

Open  Os2 window
open Dos window
in os2 window echo starting >com1
f32mon
In dos window (in D: a FAT32 volume)
NE temp\popdir.bat
====system stuck

then system stuck!

Changed 7 years ago by Barry Landy

Attachment: putty9.txt added

comment:119 Changed 7 years ago by Valery V. Sedletski

For Ecomstation a lot of stuff, including FAT32.IFS, goes into \ecs\boot and the WPI version puts it there. Obviously NOT for ARCAOS :-)

Yes, my main machine is MCP2 and even on eCS I keep all my IFS'es in \os2\boot. And ArcaOS, I suspect, have no \ecs subdirectory. So, it would be not very clever from my side to install binaries to \ecs\ :) And to have two different .WPI's for eCS and for other systems is not a clever idea :)

It looks like it hangs when reading sectors. I added some more debug messages. It looks like all these logs include calls to ReadSector from different places, and the hang is in ReadSector. I suspect that you have hit bad sectors and the DOVOLIO call hangs. This needs checking. Please test one of your testcases where it hangs.

comment:120 Changed 7 years ago by Valery V. Sedletski

comment:121 Changed 7 years ago by Barry Landy

BTW I dont know if I told you that both on my desktop and the T61 the drives are SSD. So bad sectors less likely.

comment:122 Changed 7 years ago by Barry Landy

putty10: same as putty9 with new version

Changed 7 years ago by Barry Landy

Attachment: putty10.txt added

comment:123 Changed 7 years ago by Valery V. Sedletski

Ok, now please, test ftp://osfree.org/upload/fat32/test/fat32-0.10-r278-os2.zip (reuploaded). The same testcase as with putty10.txt

Last edited 7 years ago by Valery V. Sedletski (previous) (diff)

Changed 7 years ago by Barry Landy

Attachment: putty11.txt added

comment:124 Changed 7 years ago by Barry Landy

putty11: call of F32MON from OS2 window resulted in stuck state with the attached log (loop?)

Changed 7 years ago by Barry Landy

Attachment: putty12.txt added

comment:125 Changed 7 years ago by Barry Landy

putty12 same again

comment:126 Changed 7 years ago by Valery V. Sedletski

What's the difference between 11 and 12? Both logs look incomplete, or it hanged not in fat32.ifs. 12th log ends with

fpc008: GetNextCluster: ulCluster=fffffff

so, the loop should end and reach the mark fpc009, but I don't see fpc009 in the log. No places between fpc008 and fpc009, where it could hang. So, it looks like it hanged not in fat32.ifs, but somewhere else. Are you sure it hanged at fpc008 and there were no lines after

fpc008: GetNextCluster: ulCluster=fffffff

?
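The same question can be asked of any captured log mechanically: which checkpoint marker was the last one seen, and was the expected next one ever reached? A small sketch in Python, assuming the markers always look like "fpc" followed by three digits and a colon, as in the two names quoted above; the helper names are invented for this example:

```python
import re

# Assumed marker shape, based only on "fpc008" and "fpc009" in this comment.
MARKER = re.compile(r"\b(fpc\d{3}):")


def markers_seen(lines):
    """Return all checkpoint markers in order of appearance."""
    return [match.group(1) for line in lines for match in MARKER.finditer(line)]


def last_marker(lines):
    """Return the final checkpoint marker in the log, or None if there is none."""
    seen = markers_seen(lines)
    return seen[-1] if seen else None


log = [
    "fpc007: some earlier step",  # hypothetical line, for illustration only
    "fpc008: GetNextCluster: ulCluster=fffffff",
]
print(last_marker(log), "fpc009" in markers_seen(log))
```

If the last marker is fpc008 and fpc009 never appears, the capture either stopped early or execution stopped between the two marks, which is exactly the ambiguity raised here.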

Last edited 7 years ago by Valery V. Sedletski (previous) (diff)

comment:127 Changed 7 years ago by Barry Landy

I captured all there was. The only command issued was F32MON in the OS2 Window (in both cases).

I will try again

comment:128 Changed 7 years ago by Valery V. Sedletski

Hehe, "D:\FAT32.LOG". So the log is on your FAT32 drive; that's why it hangs when you start f32mon.exe. Try starting it with another current drive. I still don't understand why it hangs when accessing the FAT32 drive, because the hang is somewhere else.

BTW I dont know if I told you that both on my desktop and the T61 the drives are SSD. So bad sectors less likely.

Why do you think so? SSDs are indeed a special kind of flash memory, and it can be corrupted. In theory, bad sectors should be transparently remapped to spare sectors, but there is a limit on the number of spare sectors (as with hard disks, too). So when the limit is exhausted, bad sectors cannot be replaced, hence errors. And there could simply be faulty drives.

Changed 7 years ago by Barry Landy

Attachment: putty13.txt added

comment:129 Changed 7 years ago by Valery V. Sedletski

i captured all there was.

So, it looks that it hanged not in fat32.ifs. No idea why it could be hanged, except for a hardware error.

comment:130 Changed 7 years ago by Barry Landy

OK did not see your last comment before trying again.

putty13 is

open os2 window
echo
F32MON
stuck and I checked that putty13 has all the lines that were on PUTTY.

How do I change the log that FAT32MON is using?
Last edited 7 years ago by Barry Landy (previous) (diff)

comment:131 Changed 7 years ago by Valery V. Sedletski

And in the log #13 it hanged on calling the kernel function for reading sectors -- here bad sectors are very likely.

> open os2 window
> echo
> F32MON

stuck and I checked that putty13 has all the lines that were on PUTTY.

How do I change the log that FAT32MON is using?

You cannot change the log name. It is always called fat32.log and is created in the current directory. Just change the current drive and current directory.

comment:132 Changed 7 years ago by Barry Landy

OK easy. Just to play safe I am running a bad sector scan (DFSEE) which has showed nothing so far. I will kill that and retry the monitor run

comment:133 Changed 7 years ago by Barry Landy

putty14

open os2 window
ensure NOT d: (logging to R: JFS formatted)
F32MON
open DOS window 
stuck

NB the autoexec for dos window accesses d:\temp

Changed 7 years ago by Barry Landy

Attachment: putty14.txt added

comment:134 Changed 7 years ago by Barry Landy

I have restarted the bad sector scan on partition D to get that out of the equation. Will take about an hour to complete.

(As I said in an earlier post, I can generate the stuck state on my T61 which does i think rule out bad sectors except an odd coincidence)

comment:135 Changed 7 years ago by Barry Landy

(and again as in an earlier post, I see no problems with version 913, also tending to rule out a hardware problem)

comment:136 Changed 7 years ago by Valery V. Sedletski

Please test ftp://osfree.org/upload/fat32/test/t1.zip (only the IFS and .sym file; install it over the previous IFS). And then please use the same testcase as in log #13 (f32mon log with current disk d:)

And please, test two times (two logs), to get stable results

Last edited 7 years ago by Valery V. Sedletski (previous) (diff)

comment:137 Changed 7 years ago by Barry Landy

a) dfsee completed showing no errors

b) just an observation: without the monitoring it is much rarer to get a stuck state. With the monitoring almost everything seems to stick -- timing problem?

comment:138 Changed 7 years ago by Barry Landy

putty 15a and 15b

reboot, 
open os2 window
echo starting >com1
F32mon

system sticks; after that monitor output continues for some time (improve baud rate?)

Changed 7 years ago by Barry Landy

Attachment: putty15a.txt added

Changed 7 years ago by Barry Landy

Attachment: putty15b.txt added

comment:139 Changed 7 years ago by Valery V. Sedletski

Ok, thanks. Now please test t2.zip in the same place.

Changed 7 years ago by Barry Landy

Attachment: putty16a.txt added

comment:140 Changed 7 years ago by Barry Landy

Putty16a

Same scenario. This time the output looked as though it was looping (not stopping!) and I had to terminate it (at the PUTTY end)

comment:141 Changed 7 years ago by Valery V. Sedletski

And now, please, t3.zip from the same place! No need to terminate! It is executed in a loop normally. The log may be big.

Changed 7 years ago by Barry Landy

Attachment: putty17.txt added

comment:142 Changed 7 years ago by Barry Landy

putty17 attached; same scenario. output stopped of its own accord

comment:143 Changed 7 years ago by Valery V. Sedletski

So, nothing new. It hung in a place where it normally cannot hang: on exit from ReadSector, after the sectors were read OK, where there is nowhere it can hang. It looks like it hung somewhere else, not in fat32.ifs. So, no ideas yet. Probably you're right that it's some timing issue: it hung earlier than in #15a/#15b, probably because of the delay added by the additional debug code.

comment:144 Changed 7 years ago by Barry Landy

So what now? I could repeat the process on a different "stuck" perhaps?

Could the stuck state be a control loop?

comment:145 Changed 7 years ago by Valery V. Sedletski

I have no idea, except for testing without an SSD to see whether the error only occurs with an SSD. The loop terminates normally. It hangs somewhere outside fat32.ifs.

comment:146 Changed 7 years ago by Valery V. Sedletski

Please also try ftp://osfree.org/upload/fat32/test/t4.zip and check if it hangs with it.

comment:147 Changed 7 years ago by Valery V. Sedletski

It looks like it hangs after some delay after accessing the fat32 disk. If I add more debug messages, the earlier in the code it will hang. And hang not immediately after accessing the media, but after some delay -- already not in the place of access.

I added the code for checking for bad sectors, which was commented out, since version 0.9.13. Maybe, it will help.

Changed 7 years ago by Barry Landy

Attachment: putty20.txt added

comment:148 Changed 7 years ago by Barry Landy

Running with T4 logged to putty20

open os2 window
echo starting in D:(FAT32) > com1
....
DID NOT STICK
...
open DOS window
... stuck

comment:149 Changed 7 years ago by Barry Landy

A second attempt: putty21

open os2 window (JFS volume)
echo starting in R:(JFS) > com1
....
DID NOT STICK
...
open DOS window
....
DID NOT STICK
...
open second OS2 window
...
open did not complete

Changed 7 years ago by Barry Landy

Attachment: putty21.txt added

comment:150 Changed 7 years ago by Valery V. Sedletski

No need for logs. Actually, I removed all debug messages, as they are not needed. As I said, it hangs somewhere else for an unknown reason. No idea why.

comment:151 Changed 7 years ago by Barry Landy

But version 913 gives me no hangs.... (I know, timing issues can affect anything!)

Thanks for help so far

comment:152 Changed 7 years ago by Valery V. Sedletski

One more idea. You have the following parameters in config.sys:

IFS=D:\OS2\BOOT\FAT32.IFS /CACHE:2048 /H /Q
CALL=D:\OS2\CACHEF32.EXE /S /F

Can you try with these parameters:

IFS=D:\OS2\BOOT\FAT32.IFS /CACHE:2048 /H /AC:* /LARGEFILES /FAT /EXFAT /EAS
CALL=D:\OS2\CACHEF32.EXE /F /P:2 /M:50000 /B:250 /D:5000

Will it change something?

comment:153 Changed 7 years ago by Barry Landy

Do you think /EAS is important? I don't really want to generate EAs in those partitions.

comment:154 Changed 7 years ago by Barry Landy

the CACHEF32 line is as I already have it.

comment:155 Changed 7 years ago by Barry Landy

Without the /EAS

essentially no difference

I tried the two things that reliably stuck the system before the debug version testing:

a) delete a particular file on a FAT32 volume

b) edit a particular dataset on the FAT32 volume

The first try at delete generated a forced reboot (!!!) -- which is a change, but after that I just got stuck states

comment:156 Changed 7 years ago by Valery V. Sedletski

Maybe /EAS is unimportant, but I don't know. Those are just the parameters I have on my system. OK, then no ideas so far.

comment:157 Changed 7 years ago by Barry Landy

/EAS makes no difference

comment:158 Changed 7 years ago by Barry Landy

It occurred to me to wonder whether there may be instances in which you are writing outside a buffer and corrupting some other process's data, so the IFS is exiting correctly but then the calling process fails.

Just trying to make sense of this puzzle.

comment:159 Changed 7 years ago by Barry Landy

Further evidence of corruption effects: two things I didn't mention earlier while testing with the debug versions:

1) Once the system spontaneously rebooted

2) a few times there were system traps but not in a specific place; once in OS2KRNL, at least once in ACPI

If it would be helpful I could experiment with running debug versions with the system set to take a dump

comment:160 Changed 7 years ago by Valery V. Sedletski

Why are you so sure that traps in the kernel or ACPI are related to fat32? I'd suspect that is another problem. If fat32 could corrupt the memory of the kernel or ACPI, it would trap on my machine too, but I have no such experience. And rather than creating dumps, you'd better get a trap screen with symbols loaded and the /monitor option enabled, to see what was going on in fat32.ifs at that time.

comment:161 Changed 7 years ago by Barry Landy

Oh, I am not sure at all. It is simply that these traps etc. did not happen at other times.

I can easily try to get a trap screen. It probably needs a debug version (just to change the timing).

I assume the "Tn" versions are still around?

comment:162 Changed 7 years ago by Valery V. Sedletski

I can easily try to get a trap screen. It probably needs a debug version (just to change the timing).

Yes, a debug kernel, plus a debug kernel symbol file and fat32 symbol files. Which timing?

I assume the "Tn" versions are still around?

What do you mean?

comment:163 Changed 7 years ago by Barry Landy

Timing: if you recall, as you increased the amount of debug code, more and more things wouldn't work, so I assume there is some timing effect.

Debug Kernel: if you recall I have never got the debug kernel beyond a black screen even with a kdb.ini file.

Tn versions: these were versions you made available on the ftp site for download with extra trace information.

comment:164 Changed 7 years ago by Valery V. Sedletski

What is the timing for? I mean a debug version of the kernel, not of the IFS.

I wonder why you are getting the black screen, because the debug kernel is the same as the non-debug kernel and does not need anything special. You should just copy the kernel (renamed to os2krnl) plus a *.sym file to the root directory as usual, and create a kdb.ini file in the root.

Why do you need t*.zip? They are not needed anymore. You should use the latest svn revision.

comment:165 Changed 7 years ago by Barry Landy

Debug kernel. As you recall I am using QSINIT so I don't need to rename the kernel. I did create kdb.ini and it did not help. After processing config.sys the system did not initialise the PM screen but remained in a black empty screen not responding to anything. I even tried "g" on the terminal to no effect.

Anyway I don't need the debug kernel to cause these problems; why are you suggesting I use it?

T*.zip: the only reason I suggested using them was to change the timing (within FAT32.IFS) as that seemed to result in more and different things failing, and I only got the traps in that state.

comment:166 Changed 7 years ago by Valery V. Sedletski

Debug kernel. As you recall I am using QSINIT so I don't need to rename the kernel.

OK, it does not matter. Then don't rename it.

I did create kdb.ini and it did not help. After processing config.sys the system did not initialise the PM screen but remained in a black empty screen not responding to anything. I even tried "g" on the terminal to no effect.

Are you using the SMP or the UNI kernel? Are you sure your debug kernel is of the same type? If you change the kernel from UNI to SMP, you must change doscall1.dll correspondingly. If you don't change doscall1.dll, many user-mode processes, including pmshell.exe, will trap.

Anyway I don't need the debug kernel to cause these problems; why are you suggesting I use it?

I need to see the last fat32 debug messages, plus a trap screen on the terminal. For that, I need the debug kernel. The non-debug kernel only shows the trap screen on the display.

comment:167 Changed 7 years ago by Barry Landy

I will check, but I believe I am using the SMP kernel and the corresponding debug kernel.

I will try using the debug kernel again, but if that doesn't work I will set up SYSDUMP which has the trap screen recorded.

comment:168 Changed 7 years ago by Valery V. Sedletski

So, you are sure you are using the debug SMP kernel and an SMP version of doscall1.dll? If so, then there is probably a trap somewhere, and that is why you get a black screen. The trap screen should appear on the debug terminal.

I will try using the debug kernel again, but if that doesn't work I will set up SYSDUMP which has the trap screen recorded.

SYSDUMP doesn't show what happened in fat32.ifs just before the trap. I need both fat32.ifs debug messages and a trap screen, with all debug symbols added.

PS: how do you use system dumps on newer systems? They have > 2 GB of RAM. Here, dumpfs.ifs could help, but its os2dump version does not understand disks > 512 GB with 127 or 255 sectors per track. (I have 2 TB disks and I am unable to get dumpfs working).

comment:169 Changed 7 years ago by Barry Landy

PS: how do you use system dumps on newer systems? They have > 2 GB of RAM. Here, dumpfs.ifs could help, but its os2dump version does not understand disks > 512 GB with 127 or 255 sectors per track. (I have 2 TB disks and I am unable to get dumpfs working).

I haven't done it for a while, but the first thing is to use QSINIT to restrict the memory to (say) 512 MB; then sysdump should be fine.

Also my SSDs are all 512 GB so should be fine.

SYSDUMP doesn't show what happened in fat32.ifs just before the trap. I need both fat32.ifs debug messages and a trap screen, with all debug symbols added.

Noted. If you recall, I never solved the problem of getting intelligible output without sending "echo xxx >com1" from the MUT. So I would not have seen the trap dump info on the terminal. Everything seemed set up correctly. Needs more thought.

comment:170 Changed 7 years ago by Barry Landy

I have fixed the problem of garbage characters on the terminal. Mismatch of baud rate. Apparently the debug kernel default is 115200.

I have been pointed to a feature of QSINIT to set the baud rate, and set it to 9600 to match the default of PuTTY. I then booted the debug kernel and indeed it ended on a trap (putty30).

For future work I will set PuTTY to 115200 also.

Changed 7 years ago by Barry Landy

Attachment: putty30.txt added

comment:171 Changed 7 years ago by Valery V. Sedletski

Ok, good progress. But I see nothing useful here :( Null pointer dereference somewhere. But no info where.

Also, please add the /monitor switch in the future.

Last edited 7 years ago by Valery V. Sedletski

comment:172 Changed 7 years ago by Barry Landy

yes, that was "proof of concept" but it does support your theory that the black screen was caused by a trap.

Any suggestions as to the cause of the trap?

comment:173 Changed 7 years ago by Valery V. Sedletski

Any suggestions as to the cause of the trap?

I don't know. Maybe, the kernel problem. Maybe, we can try another kernel. Or some driver. No idea, which.

Last edited 7 years ago by Valery V. Sedletski

comment:174 Changed 7 years ago by Valery V. Sedletski

Could you try the "k" command in the terminal (after it has trapped), to see the call stack? (Did you add it to kdb.ini?)

Changed 7 years ago by Barry Landy

Attachment: putty31.txt added

comment:175 Changed 7 years ago by Barry Landy

Booted with the debug kernel that came with ARCAOS, and with /monitor for the FAT32.IFS

Still a trap 1 (missing page)

Putty31

Changed 7 years ago by Barry Landy

Attachment: putty31a.txt added

comment:176 Changed 7 years ago by Barry Landy

I ran it again and remembered to type k; the output is in putty31a.

kdb.ini is there and does have a k at the end.

Changed 7 years ago by Barry Landy

Attachment: putty40.txt added

comment:177 Changed 7 years ago by Barry Landy

OK. Steven Levine (kernel expert) has helped.

The trap is something the kernel debugger has noticed, and typing gt gets beyond it (g for go and t to ignore kernel traps).

So after that I did various things and eventually went to a DOS window, used the ne processor to open a file, changed it a little, exited, and it "stuck" during write-back. The log seems to indicate that FAT32 processing is incomplete.

Very strangely the next thing that happened was a spontaneous reboot.....

The log is in putty40.

comment:178 Changed 7 years ago by Valery V. Sedletski

Re putty31a.txt: Nothing useful here. Too little data.

Re putty40.txt: It rebooted while reading sectors (when moving the file into DELDIR). Can you reproduce that?

comment:179 Changed 7 years ago by Barry Landy

putty41

Booted as before (with gt); system stuck while still processing startup commands.

Putty42

After clearing the log the apparently stuck system suddenly generated this

putty43

Another "stuck during startup"

putty44

stuck AFTER startup process (apparently) terminated. (I have no idea why a call was made to open WP51. seems to happen every startup...)

putty45

(this is the effect I was trying to replicate)

After completing the startup process

open dos window (in d:)
obey "ne temp\popdir.bat"
alter it a bit
F3 (close)
try to save and the system sticks
no forced reboot this time

Changed 7 years ago by Barry Landy

Attachment: putty41.txt added

Changed 7 years ago by Barry Landy

Attachment: putty42.txt added

Changed 7 years ago by Barry Landy

Attachment: putty43.txt added

Changed 7 years ago by Barry Landy

Attachment: putty44.txt added

Changed 7 years ago by Barry Landy

Attachment: putty45.txt added

comment:180 Changed 7 years ago by Valery V. Sedletski

putty41

Booted as before (with gt); system stuck while still processing startup commands.

Why with "gt"? Use "g". Which startup? What do you mean? The scripts executed on cmd.exe startup? It hit a breakpoint somewhere in the kernel. That should not happen normally. It is probably because you used "gt". Don't use "gt", use "g"!

putty42

The log output from fat32.ifs is being entered into the kernel debugger. Why is that?

putty43

Trap 8, probably because of using "gt". Never use it!

putty44

I don't understand this. What happens at every startup? Do your startup scripts execute WordPerfect for DOS?

putty45

Here, please, I need a more detailed log. Please, take a log with ftp://osfree.org/upload/fat32/test/fat32-0.10-r283-os2.zip.

comment:181 Changed 7 years ago by Barry Landy

Answering some queries.

1) I am not an expert in the use of the debug kernel; quite the contrary. So I asked an expert and was advised to use gt.

Quote:

BIOS and DOS emulation components of a DOS VDM use hlt instructions to make OS/2 kernel
requests.  Every now and then the debug kernel notices them.  Usually a

  gt

is sufficient to continue execution.

Certainly if you wish me to use just g I will. But it doesn't sound like t should cause traps.

2) Startup commands

I mean all those things that have been set up (by system or by user) that should complete before user activity

In my case this means

startup.cmd in root and whatever is called from there
Startup folder
Xworkplace startup folders

3) Startup scripts

Your startup scripts execute WordPerfect for DOS?

I saw that and it was a total surprise to me. I cannot think of any place where this should be happening (though I agree the trace shows it). I have looked quite hard and not found it. Will have to look harder. I cannot even think of any reason for it. (Though I do use WordPerfect for DOS.)

4) I will download the test version and try again (later)

comment:182 Changed 7 years ago by Barry Landy

Comment about putty44

What I meant was that all the commands called directly or indirectly during startup (as listed above) had apparently completed properly, BUT the mouse and keyboard were inoperative and the system appeared stuck.

Before making a test which I hope will show FAT32.IFS problems I wait for the system to go quiet to avoid confusion. In this case it stuck at that point.

comment:183 Changed 7 years ago by Valery V. Sedletski

You can't normally get beyond a trap. (Maybe it will help in the case of a breakpoint trap, but not a page fault or a protection violation.) If you trapped and try to continue, it will necessarily trap again. I don't need to continue after the trap; I need to stop and see the trap screen.

Is there anything suspicious in autoexec.bat, startup.cmd, the script executed on cmd.exe start, other command files executed automatically or by a scheduler (if you have one), or the startup folders?

comment:184 Changed 7 years ago by Valery V. Sedletski

Can you share a 14.200 kernel with me (together with a .sym file, if possible)? Preferably the debug one. (You can try emailing it to me.) I want to check editing with your ne.exe on this kernel. Maybe this kernel has a broken VDM?

comment:185 Changed 7 years ago by Barry Landy

No point in sharing a kernel. I am doing these tests on ECS22. The debug kernel says it is 103a (I think), certainly not 200. The debug kernel that comes with ARCAOS is also an old kernel.

I don't see anything suspicious in the startup chain.

I think the point about gt was that the system is indeed stopping on a trap but far too early in the process. Still, I will try using g by itself and see.

Changed 7 years ago by Barry Landy

Attachment: putty46.txt added

comment:186 Changed 7 years ago by Barry Landy

Putty46

I don't know if this will be any use but...

Using R283 as requested, and moving beyond the initial trap by typing g

The system stops during the process of processing the startup commands/folders; I have typed nothing to the ECS system.

NB the debug kernel is 104a

comment:187 Changed 7 years ago by Valery V. Sedletski

I asked you to test a situation like the one in putty45.txt, where popdir.bat is deleted into DELDIR (after closing the file in NE.EXE). So putty46.txt is the wrong log file. It hits an int 3 (a breakpoint), so it is a problem with the kernel. Please use the test case from putty45.txt.

Using R283 as requested, and moving beyond the initial trap by typing g

Which initial trap? int 3? I only see int 3 in the log, there is nothing after it.

comment:188 Changed 7 years ago by Valery V. Sedletski

Maybe the initial trap is the right trap, and the int 3 occurred after you pressed "g"? Then why don't I see that very first trap in the log?

comment:189 Changed 7 years ago by Barry Landy

Doing my best! I cannot repeat the situation in putty45 unless the system initialises itself to the point where I can open a DOS window and use the keyboard. I had two tries and both stopped on that trap 3 at which point I can do nothing.

I don't know why the earlier trap 1 isn't in the log; maybe PuTTY has a fixed-size buffer?

If you want to see the trap 1 I can capture it but it is so early in the process it seems unlikely to be relevant (it is before the PM screen is initialised).

I took a quick look at putty46. VERY strangely there is a reference to d:\d\qw98.exe and I know of no reason why that should be accessed during system startup (but that is not your problem, I think!) which is very similar to the strange appearance of WP51 in an earlier log.

comment:190 Changed 7 years ago by Barry Landy

I found this and will change it!

In the PuTTY configuration window, select Window in the category tree on the left. Change the Lines of scrollback to whatever value you wish. The default is 200. (8 May 2013)

comment:191 Changed 7 years ago by Barry Landy

In fact it was already set to 2000 and I have made it 10000.

Changed 7 years ago by Barry Landy

Attachment: putty47.txt added

comment:192 Changed 7 years ago by Barry Landy

putty47

I did the same again and it stopped in the same place (trap3). This makes it look to me that once again extra tracing changes the timing so things go wrong at a different place.

I tried to upload the 10000-line putty47 but Trac doesn't allow files >200K; in any event the initial trap 1 was more than 10000 lines back.

If you want to see the trap 1 I will catch it for you.

comment:193 Changed 7 years ago by Barry Landy

I have changed putty to keep 100,000 lines and a little bit of experimentation shows that all the trace up to the first trap 1 is 6000 lines (202Kb) and the trace up to the int 3 (which does keep repeating) is 10000 lines (377Kb).

So if you want the longer traces it will have to be by email.
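As an alternative to email, a long capture could be cut into several attachments that each stay under the upload limit. The following is only a minimal sketch: the function name `split_log`, the `.partN` naming, and the reading of "200K" as 200,000 bytes are all assumptions, not anything Trac or PuTTY provides.

```python
from pathlib import Path


def split_log(path, limit_bytes=200 * 1000):
    """Split a log into numbered parts of at most limit_bytes each.

    Splits only on line boundaries, so a single line longer than
    limit_bytes would still produce an oversized part.
    limit_bytes is an assumed reading of the "200K" attachment limit.
    """
    src = Path(path)
    parts, chunk, size = [], [], 0
    with src.open("rb") as fh:
        for line in fh:
            # Start a new part when adding this line would exceed the limit.
            if chunk and size + len(line) > limit_bytes:
                parts.append(b"".join(chunk))
                chunk, size = [], 0
            chunk.append(line)
            size += len(line)
    if chunk:
        parts.append(b"".join(chunk))

    written = []
    for i, data in enumerate(parts, start=1):
        # Hypothetical naming scheme: putty47.part1.txt, putty47.part2.txt, ...
        dst = src.with_name(f"{src.stem}.part{i}{src.suffix}")
        dst.write_bytes(data)
        written.append(dst)
    return written
```

Calling `split_log("putty47.txt")` would write the parts next to the original file and return their paths, in order, so they can be attached one by one.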

comment:194 Changed 7 years ago by Barry Landy

I also booted the ECS system with the non-debug kernel and the system initialised correctly. I then opened a DOS window as in putty45, and got the same freeze at the same place. The putty script is very long; I could email it.

I realise why there are mysterious references to various programs in D: in the startup process: OS2 is verifying the desktop icons. This does NOT explain the D:\wp51 as I don't have an icon for that, but does explain all the others.

Please advise if you wish me to email the logs

Changed 7 years ago by Barry Landy

Attachment: putty53.txt added

Changed 7 years ago by Barry Landy

Attachment: putty54.txt added

comment:195 Changed 7 years ago by Barry Landy

Putty53 and putty54 both with Putty buffer set to 2000 lines

remove /monitor from fat32.ifs

1) boot with normal kernel
wait for system to complete initialisation
open os2 window: command F32MON
open DOS window: ne temp\popdir.bat
make a change; F3; y to file back
--- system stuck
record as putty53
2) boot with debug kernel
same as (1)
BUT 
after the y to file back the system did a spontaneous reboot
recorded as putty54

comment:196 Changed 7 years ago by Barry Landy

(see also #53)

Following Steven Levine's analysis of that problem I tried using the OS4 kernel, and the above commands do not freeze the system.

So the preliminary analysis suggesting kernel stack overflow may be right. It would certainly account for random-ish effects (traps, reboots, freezes).

If so, what to do about it?

Last edited 7 years ago by Barry Landy

comment:197 Changed 7 years ago by Valery V. Sedletski

This looks like good news. So it seems that the issue is indeed insufficient kernel stack. Please do more stress tests with your usual tasks, and see whether it still crashes or hangs.

If so, what to do about it?

I'll probably need to minimise the number of nested function calls in the IFS. So I probably need to do some optimizations to reduce stack usage.
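To illustrate why nesting depth matters when the stack is small, here is a toy sketch that is not taken from fat32.ifs. It walks a made-up cluster chain two ways: recursively, where each cluster costs one more stack frame, and iteratively, where the depth stays constant. The names `make_chain`, `chain_length_recursive`, `chain_length_iterative` and the `END_OF_CHAIN` marker are all hypothetical.

```python
END_OF_CHAIN = -1  # illustrative marker only, not the real FAT32 end-of-chain value


def make_chain(n):
    """Build a toy FAT (n >= 1) where cluster i points to i + 1 and the last ends the chain."""
    fat = list(range(1, n + 1))
    fat[-1] = END_OF_CHAIN
    return fat


def chain_length_recursive(fat, cluster):
    """Count clusters using one nested call per cluster: stack use grows with chain length."""
    if cluster == END_OF_CHAIN:
        return 0
    return 1 + chain_length_recursive(fat, fat[cluster])


def chain_length_iterative(fat, cluster):
    """Count clusters in a loop: stack use is the same for any chain length."""
    count = 0
    while cluster != END_OF_CHAIN:
        count += 1
        cluster = fat[cluster]
    return count
```

On a short chain both functions agree. On a very long chain the recursive version exhausts Python's call-depth limit (a `RecursionError`), which is only a loose analogue of a fixed-size kernel stack, while the loop version completes.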

comment:198 Changed 7 years ago by Barry Landy

... and when you make the changes please test them against an OS2 kernel.

comment:199 Changed 7 years ago by Barry Landy

See also issue #53

R285 tested and shows no problems in either ECS22 or ARCAOS, or with the debug kernel (in earlier testing this showed different problems).

Last edited 7 years ago by Barry Landy

comment:200 Changed 7 years ago by Barry Landy

Possible trap/stuck in FAT32.

The only reason I have to support this is that I get the trap using r285 but not when using 913. Using the retail kernel the system sticks; using the debug kernel I get the following (and nothing else) on the terminal:

Trap 13 (0DH) - General Protection Fault 0000
eax=00004648 ebx=0000fff0 ecx=fe0f4050 edx=f9304648 esi=fd2eafd6 edi=f7460000
eip=000007e6 esp=00005bb4 ebp=00005bce iopl=0 rf -- -- nv up ei ng nz na pe nc
cs=0be8 ss=1530 ds=0bf0 es=4050 fs=0000 gs=0000 cr2=00b70000 cr3=bf0b7000 p=03
0be8:000007e6 268a1f         mov       bl,byte ptr es:[bx]      es:fff0=invalid

================

I am trying to set up a trap dump
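For anyone reading such dumps by hand, the register lines can be pulled apart mechanically. The sketch below is an illustration only: `parse_trap` and the `TRAP_DUMP` constant are hypothetical names, and the text is copied from the dump above (without the final disassembly line).

```python
import re

# Register dump copied from the terminal output above (disassembly line omitted).
TRAP_DUMP = """\
Trap 13 (0DH) - General Protection Fault 0000
eax=00004648 ebx=0000fff0 ecx=fe0f4050 edx=f9304648 esi=fd2eafd6 edi=f7460000
eip=000007e6 esp=00005bb4 ebp=00005bce iopl=0 rf -- -- nv up ei ng nz na pe nc
cs=0be8 ss=1530 ds=0bf0 es=4050 fs=0000 gs=0000 cr2=00b70000 cr3=bf0b7000 p=03
"""


def parse_trap(text):
    """Return (trap_number, registers) from a trap dump in the format shown above."""
    header = re.search(r"Trap (\d+) \(([0-9A-Fa-f]+)H\)", text)
    if header is None:
        raise ValueError("no trap header found")
    number = int(header.group(1))
    # The decimal and hex forms in the header should agree (13 == 0x0D).
    if number != int(header.group(2), 16):
        raise ValueError("decimal and hex trap numbers disagree")
    # Every name=value pair is treated as hexadecimal.
    regs = {
        name: int(value, 16)
        for name, value in re.findall(r"\b([a-z0-9]+)=([0-9a-fA-F]+)\b", text)
    }
    return number, regs


number, regs = parse_trap(TRAP_DUMP)
bx = regs["ebx"] & 0xFFFF  # low 16 bits of ebx: the offset used by es:[bx]
```

With this dump, `bx` is `0xfff0` and `regs["es"]` is `0x4050`, which lines up with the `es:fff0=invalid` note on the faulting `mov bl,byte ptr es:[bx]` instruction.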

comment:201 Changed 7 years ago by Barry Landy

The activity is to click a "file" pulldown in an application (on a FAT32 volume). The trap happens 100% of the time.

I get the same trap in ARCAOS : the only difference is that in ARCAOS I get a proper trap screen saying trap D in FAT32 (so matching the above trap 13).

I have taken a sysdump and compressed it with 7Z and will send by email to valerius and Steven Levine.

comment:202 Changed 7 years ago by Barry Landy

Please also note that this trap does *not* happen with the OS4 nucleus, so I guess there is another stack problem that was overlooked (?).

comment:203 Changed 7 years ago by Valery V. Sedletski

A trapdump over email? Isn't it too big? Please try this build first: ftp://osfree.org/upload/fat32/test/fat32-0.10-r285-os2.zip (as I fixed one trap 000d since r285).

comment:204 Changed 7 years ago by Barry Landy

OK that test version has fixed the trap I found this morning.

comment:205 Changed 7 years ago by Valery V. Sedletski

OK, committed the changes to the repository. They are now available as r286. For those who need a version without exFAT, it is available too (see ftp://osfree.org/upload/fat32/).

comment:206 Changed 7 years ago by Barry Landy

A further trap 13 using release 288

Trying to use the debug kernel to investigate this, I find that although BLDLEVEL shows both FAT32.IFS and CACHEF32.EXE as R288, when CACHEF32.EXE is called in config.sys under the debug kernel it reports that FAT32.IFS is R285.

The boot will not proceed beyond that point.

I will check whether I have done something wrong, but is it identifying itself correctly?

Changed 7 years ago by Barry Landy

Attachment: putty60.txt added

comment:207 Changed 7 years ago by Barry Landy

Re the above: FAT32.IFS was coming from \ecs\bin and not \os2\boot. The perils of using WPI.....

I have now sorted that out and am back to the trap 13.

version r288: obeying

xcopy d:\dfs f:\dfs

where d: is FAT32 and F: is JFS gets a trap 13 (as shown in the terminal log in PUTTY60)

comment:208 Changed 7 years ago by Valery V. Sedletski

I did xcopy /h /o /t /s /e /r /v w:\dfsee l:\dfsee\, where w: is a FAT32 drive and l: is a JFS drive -- everything copied well. Could you, please, show me the log with "/monitor" enabled?

comment:209 Changed 7 years ago by Valery V. Sedletski

Re the above: FAT32.IFS was coming from \ecs\bin and not \os2\boot. The perils of using WPI.....

WPI should not install into \ecs. Or it is not my .WPI.

Changed 7 years ago by Barry Landy

Attachment: putty61.txt added

comment:210 Changed 7 years ago by Barry Landy

Yes, sorry. Putty61 is the same trap with /monitor.

comment:211 Changed 7 years ago by Valery V. Sedletski

Please test ftp://osfree.org/upload/fat32/test/fat32-0.10-r289-os2.zip (with f32mon log enabled, of course)

Changed 7 years ago by Barry Landy

Attachment: putty62.txt added

comment:212 Changed 7 years ago by Barry Landy

Putty62:

using R289 - same trap 13

NB1 booted without /monitor and obeyed F32MON before starting the xcopy

NB2 if helpful I can get a dump

comment:213 Changed 7 years ago by Valery V. Sedletski

Ok, thanks. Now t1.zip at the same place (install it over the previous archive).

Changed 7 years ago by Barry Landy

Attachment: putty63.txt added

comment:214 Changed 7 years ago by Barry Landy

Putty63: using t1, same as previous

comment:215 Changed 7 years ago by Barry Landy

In case it makes a difference I should also point out that the XCOPY command is being called from a CMD file on a FAT32 formatted partition.

comment:216 Changed 7 years ago by Valery V. Sedletski

ok, now please test t2.zip from the same place.

Changed 7 years ago by Barry Landy

Attachment: putty64.txt added

comment:217 Changed 7 years ago by Barry Landy

Putty64: using t2, same as previous

comment:218 Changed 7 years ago by Valery V. Sedletski

Damn, now it crashes somewhere other than FS_READ. Please get a trap in FS_READ. I need stable results. Otherwise I'll need to reproduce it on my machine.

Last edited 7 years ago by Valery V. Sedletski

comment:219 Changed 7 years ago by Valery V. Sedletski

ERROR:MakeChain:Cluster 3c2 is not free!

Do you have exFAT? I never had this error on FAT32.

comment:220 Changed 7 years ago by Valery V. Sedletski

There is no info about where the FS_WRITE call came from. The log looks incomplete, so it is useless. Please try taking the log one more time.

comment:221 Changed 7 years ago by Barry Landy

1) I do not have exFAT

2) I carried out exactly the same procedure as the previous time so cannot explain differences unless there are timing issues.

Will try again.

Changed 7 years ago by Barry Landy

Attachment: putty65.txt added

comment:222 Changed 7 years ago by Barry Landy

putty65. Exactly the same test.

comment:223 Changed 7 years ago by Barry Landy

sorry: one (maybe) minor difference.

I switched to the boot volume (R:) before calling F32MON (so the log is going to a JFS volume) instead of to a FAT32 volume.

comment:224 Changed 7 years ago by Valery V. Sedletski

OK, thanks. Now much better. If you put the log on the fat32 disk, you make things harder to understand and introduce additional problems. Please don't put the log on the fat32 disk anymore.

Now make the same test with t3.zip from the same directory.

comment:225 Changed 7 years ago by Barry Landy

Putty66: same as previous with t3

Changed 7 years ago by Barry Landy

Attachment: putty66.txt added

comment:226 Changed 7 years ago by Valery V. Sedletski

Now t4.zip at the same place. Will it help?

Last edited 7 years ago by Valery V. Sedletski

comment:227 Changed 7 years ago by Barry Landy

T4 worked. The (2) xcopies completed normally. I have attached putty67 in case it is useful.

Changed 7 years ago by Barry Landy

Attachment: putty67.txt added

comment:228 Changed 7 years ago by Valery V. Sedletski

OK, so the bug seems to be fixed. I will release a new fixed version soon.

comment:229 Changed 7 years ago by Valery V. Sedletski

Committed the new r290 revision, which corrects several errors, including the buffer overrun in FS_READ/FS_WRITE that we were talking about.

PS: build is available at the usual place (ftp://osfree.org/upload/fat32/ and http://trac.netlabs.org/fat32/)
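For readers unfamiliar with the term, a buffer overrun is a write that runs past the end of its destination buffer. The sketch below shows the general shape of the guard against it. It is a generic illustration with a hypothetical `copy_into` helper, not the actual r290 change, whose details are not given in this ticket.

```python
def copy_into(dest, offset, data):
    """Copy data into the bytearray dest at offset, refusing to write past its end."""
    if offset < 0 or offset + len(data) > len(dest):
        raise ValueError("write would overrun the destination buffer")
    # A memoryview slice never resizes dest; sizes match because of the check above.
    memoryview(dest)[offset:offset + len(data)] = data


buf = bytearray(8)
copy_into(buf, 4, b"abcd")  # fits exactly in the last four bytes
```

A request such as `copy_into(buf, 6, b"abcd")` is rejected instead of spilling two bytes beyond the buffer.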

Last edited 7 years ago by Valery V. Sedletski

comment:230 Changed 7 years ago by Barry Landy

That seems to work. Thank you!

Last edited 7 years ago by Barry Landy

comment:231 Changed 7 years ago by Valery V. Sedletski

OK, this looks like a problem with a corrupted file system, which caused the hang. It was fixed when you reformatted the stick. So I'll close this.

comment:232 Changed 7 years ago by Valery V. Sedletski

Resolution: fixed
Status: new → closed