Opened 8 years ago
Closed 7 years ago
#53 closed defect (fixed)
Writing large eas or copying a file with large eas traps the system r282
Reported by: | Gregg Young | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | Future |
Component: | IFS | Version: | |
Severity: | medium | Keywords: | |
Cc: | stevenhl, Barry Landy |
Description
Writing large eas to a VFAT drive causes a system trap. The eas are a thumbnail created by Lucide. The system also traps in fat32 when you try to copy a file with large eas attached (again, a thumbnail created by Lucide). Steps to reproduce:
Place a pdf file without a thumbnail on the VFAT drive. Open it with Lucide set to create thumbnails. When you close the file or Lucide (which is when the eas get written), the system traps.
The same thing occurs on a FAT32 drive if you copy a pdf that already has a thumbnail as part of its eas.
Note this was not happening with r280 but was with r276 and r277.
The following is the trap screen from the VFAT trap described above. I have a system dump of both traps above. I also have a system dump from just repeatedly accessing the drive from the drives folder which was also fixed by turning off eas. All these problems go away with eas turned off.
Change History (33)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
How can I make Lucide create a *large* thumbnail? My PDF file has a 6766-byte thumbnail and it does not trap. Another file is 12 MB in size and its thumbnail is 30 KB. How many bytes should the large thumbnail be? 60 KB? Maybe this could be achieved more easily with PMView? As I heard, it can create large thumbnails too. But my PMView version does not create thumbnails. Maybe because it is unregistered? (I see nothing about thumbnails in its settings.)
PS: The Trap 0008 screen is useless, because it shows the second trap, which occurred while the first trap was being processed. I need the first trap, not the second. It looks like a page fault occurred and the system then trapped in its handler.
comment:3 by , 8 years ago
follow-up: 5 comment:4 by , 8 years ago
Could you upload a zipped file with large EAs somewhere? Hm, I tried a 50 MB .pdf file, but its thumbnail is only 9 KB...
comment:5 by , 8 years ago
Replying to valerius:
Could you upload a zipped file with large EAs somewhere? Hm, I tried a 50 MB .pdf file, but its thumbnail is only 9 KB...
Just take a huge REXX file. REXX.TOKENSIMAGE can get larger.
comment:6 by , 8 years ago
I have only PPWizard, which is too large (300 KB) and does not fit into EAs. I also have an install.cmd of 40 KB in size. Its REXX.TOKENSIMAGE is only 37 KB, which is not much. The others are small. So, it doesn't help me.
follow-up: 11 comment:7 by , 8 years ago
I generated a REXX file with 1842 "say Line #XXX" lines. A REXX.TOKENSIMAGE of 44 KB in size is created. But if I add a 1843rd line, no EAs are created at all. So, I cannot create EAs > 44 KB.
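For anyone who wants to repeat this test, a generator along the following lines should produce a comparable script. This is only a sketch: the function name `write_rexx_say_lines` and the exact text of the leading comment are assumptions made for illustration, and the statement format simply follows the "say Line #XXX" description above.

```c
#include <stdio.h>

/*
 * Write a REXX script to `out`: one leading comment followed by `lines`
 * statements of the form "say Line #N".
 * Returns the number of say statements written, or -1 on error.
 * Hypothetical helper for illustration only; not part of fat32.ifs.
 */
int write_rexx_say_lines(FILE *out, int lines)
{
    int i;

    if (lines < 0)
        return -1;

    /* OS/2 treats a .cmd file as REXX only if it starts with a comment. */
    if (fputs("/* generated REXX test script */\n", out) == EOF)
        return -1;

    for (i = 1; i <= lines; i++) {
        if (fprintf(out, "say Line #%d\n", i) < 0)
            return -1;
    }
    return lines;
}
```

Writing 1842 and then 1843 lines into two .cmd files should give the two cases described here. As far as I know, the REXX EAs are only attached once the script has actually been run.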
comment:8 by , 8 years ago
Looking at it more closely, I see that besides the 44 KB REXX.TOKENSIMAGE there is also a 21 KB REXX.LITERALPOOL, 65 KB in total, which is close to the maximum. I copied it to a FAT32 partition -- no trap. The same goes for FAT16 and exFAT partitions.
comment:9 by , 8 years ago
The eas that cause this here are in the 3-5 KB range. The actual files I used are too large to attach to the ticket. Based on the pattern I am seeing I think this is a timing issue as some recent revisions trap consistently on access (276), other revisions allow you to access the drive and trap when eas are written (282) and others don't trap at least in limited use (280). I have a quad core processor so it might be a multiprocessor issue. I will try turning eas on again but run only one processor.
comment:11 by , 8 years ago
Replying to valerius:
I generated a REXX file with 1842 "say Line #XXX" lines. A REXX.TOKENSIMAGE of 44 KB in size is created. But if I add a 1843rd line, no EAs are created at all. So, I cannot create EAs > 44 KB.
Interesting and thanks for testing and reporting. When I checked my REXX EAs I couldn't find a larger value either, but I thought I had made a mistake. OT: Apparently there is a system limit at 44 KB, maybe to leave enough space for other EAs and keep everything below 64 KB.
comment:12 by , 8 years ago
I have done some preliminary analysis of the dump file provided by Gregg. The issue has nothing to do with the thumbnail size. The kernel stack is overflowing. It's not yet clear whether there is some form of indirect recursion in the IFS or if this is an unexpected page fault that is consuming extra stack space.
At the ring3 level we have the expected DosSetPathInfo call and the parameters are valid and the buffers appear to be committed.
I am getting some odd results from pmdf in the sense that I'm not seeing any references to code owned by the fat32 driver, but I'm not done looking.
comment:13 by , 8 years ago
Cc: | added |
---|
comment:14 by , 8 years ago
2stevenhl: Thanks for the analysis. I'm not getting those traps because I have the OS/4 kernel, which has twice as much kernel stack as IBM's kernel. I'll look at it more closely. I need to check the IBM kernel's behaviour (I checked it briefly and found no problems so far).
comment:15 by , 8 years ago
PS: Barry Landy has another problem in ticket #46. His system hangs spontaneously or hits an int 3 somewhere in KernelFaultEntry. According to the fat32.ifs trace messages, no operation that could lead to a hang or a trap is in progress at that time. I suspect that he has the same problem. We need to check whether the problems disappear with an OS/4 kernel.
comment:16 by , 8 years ago
2aschn: The REXX.TOKENSIMAGE is 44 KB in size. But there is also REXX.LITERALPOOL, which is 21 KB, 64 KB total. So, the limit is reached.
comment:17 by , 8 years ago
Tried to copy a test file with 21 KB of REXX.LITERALPOOL and 44 KB of REXX.TOKENSIMAGE with the 14.106_SMP kernel loaded. No trap so far. So, some extra circumstances are needed to exhaust the kernel stack.
comment:18 by , 8 years ago
@valerius: I find it somewhat surprising that you expected to be able to develop a stable IFS for folks that use the OS/2 kernel when you know how different it is compared to the OS4 kernel.
Regarding Barry's issue, the int3 occurs because the fault handler cannot decode the instructions at what is supposed to be the cs:eip. This too could be a side effect of stack overrun, but I'd need to see a system dump to say for sure.
comment:19 by , 8 years ago
@valerius: I find it somewhat surprising that you expected to be able to develop a stable IFS for folks that use the OS/2 kernel when you know how different it is compared to the OS4 kernel.
What is surprising? Expected by whom? Nothing understood.
I am trying to be compatible with IBM kernels, of course. Though I generally use the OS/4 kernel on all my machines, so I might not have noticed something happening on IBM's kernels.
comment:20 by , 8 years ago
Cc: | removed |
---|
comment:21 by , 8 years ago
Cc: | added |
---|
comment:22 by , 8 years ago
Ah, different... (I reread your phrase several times and only now got it) -- It is not very different from IBM's kernel. There are some useful enhancements. But it is backwards compatible, so it should not be a problem. The stack overrun was simply overlooked. I tested with the IBM kernel, but only briefly, and I haven't experienced stack overflows so far. So, I need to test more thoroughly.
comment:23 by , 8 years ago
Reproduced a trap 0008 on a 14.104_UNI dbg. It looks like it overflows the stack when trying to move a file to DELDIR:
P:82 T:9b D:0 T:43:617 FS_PROCESSNAME for W:\delete
P:82 T:9b D:0 T:43:621 FS_PROCESSNAME returned filename: W:\delete
P:82 T:9b D:0 T:43:626 FS_MKDIR - W:\delete
P:82 T:9b D:0 T:43:630 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:638 ReleaseFat
P:82 T:9b D:0 T:43:642 FS_MKDIR returned 5
P:82 T:9b D:0 T:43:646 FS_PROCESSNAME for W:\delete\29012621.53
P:82 T:9b D:0 T:43:652 FS_PROCESSNAME returned filename: W:\delete\29012621.53
P:82 T:9b D:0 T:43:660 FS_PROCESSNAME for W:\Temp\MD98\1STBOOT.BMP
P:82 T:9b D:0 T:43:666 FS_PROCESSNAME returned filename: W:\Temp\MD98\1STBOOT.BMP
P:82 T:9b D:0 T:43:671 FS_MOVE W:\Temp\MD98\1STBOOT.BMP to W:\delete\29012621.53
P:82 T:9b D:0 T:43:679 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:685 ReleaseFat
P:82 T:9b D:0 T:43:689 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:695 ReleaseFat
P:82 T:9b D:0 T:43:699 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:705 ReleaseFat
P:82 T:9b D:0 T:43:708 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:714 ReleaseFat
P:82 T:9b D:0 T:43:720 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:751 ReleaseFat
P:82 T:9b D:0 T:43:761 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:777 ReleaseFat
P:82 T:9b D:0 T:43:787 GetFatAccess: GetNextCluster
P:82 T:9b D:0 T:43:791 ReleaseFat
Trap 8 - Double Exception Detected 0000
eax=a03c1689 ebx=00000092 ecx=00000092 edx=00800092 esi=00000010 edi=00000010
eip=fff4763b esp=000001f4 ebp=00004f04 iopl=2 rf -- -- nv up ei pl nz ac pe nc
cs=0178 ss=0090 ds=0170 es=0170 fs=0000 gs=0170 cr2=00ac0000 cr3=00212000
0178:fff4763b 53 push ebx
No symbols found
0178:fff4763b 53 push ebx
0178:fff4763c 33db xor ebx,ebx
0178:fff4763e 57 push edi
0178:fff4763f 8b7d0c mov edi,dword ptr [ebp+0c]
0178:fff47642 56 push esi
0178:fff47643 895df8 mov dword ptr [ebp-08],ebx
0178:fff47646 8b7508 mov esi,dword ptr [ebp+08]
0178:fff47649 f7e6 mul esi
0178:fff4764b 8bc6 mov eax,esi
0178:fff4764d c1ea08 shr edx,08
0178:fff47650 c1e808 shr eax,08
0178:fff47653 8955d4 mov dword ptr [ebp-2c],edx
0090:000001f4 00004ecc 00000030 00000000
Past end of segment: 0090:00000200
##
2stevenhl: The current stack is a special stack for trap handlers (selector 0x90). How can I find the ring0 stack for drivers (the one that has overflowed)? (Is it selector 0x30?) Does it have the same selector each time?
comment:24 by , 8 years ago
##r
Trap 8 - Double Exception Detected 0000
eax=a03c1689 ebx=00000092 ecx=00000092 edx=00800092 esi=00000010 edi=00000010
eip=fff4763b esp=000001f4 ebp=00004f04 iopl=2 rf -- -- nv up ei pl nz ac pe nc
cs=0178 ss=0090 ds=0170 es=0170 fs=0000 gs=0170 cr2=00ac0000 cr3=00212000
0178:fff4763b 53 push ebx
##ln
No symbols found
##k
Past end of segment: 0090:00004f04
##dg 90
0090 Data Bas=ffdba8ac Lim=000001ff DPL=0 P RW A
##
So, the trap handler stack (0x90) appears to have overflowed (the stack limit is reached), which is why trap 0008 occurred. Not sure about the original stack. -- I thought it was the original stack that was insufficient.
comment:25 by , 8 years ago
I'm now in the process of removing big buffers located on the stack (for strings, directory entries, etc.) from the code and changing them to dynamically allocated buffers. So far, I no longer see traps when copying or deleting files (they mostly occurred when moving a file or its EAs to DELDIR). I need some time to remove them everywhere. So, please wait for a while.
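The kind of change described here can be sketched as follows. This is not code from fat32.ifs: `join_path`, `SCRATCH_SIZE` and the use of `malloc`/`free` are placeholders chosen for illustration, and a real IFS would use whatever allocator the driver provides.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 2048  /* hypothetical size of one temporary path buffer */

/*
 * Join a directory and a file name with a backslash into `out`.
 * Returns the length of the result, or -1 if it does not fit or
 * the allocation fails.
 */
int join_path(const char *dir, const char *name, char *out, size_t out_size)
{
    /* Before: `char scratch[SCRATCH_SIZE];` cost 2 KB of stack per call.   */
    /* After: the buffer is on the heap; only a pointer stays on the stack. */
    char *scratch = malloc(SCRATCH_SIZE);
    int len;

    if (scratch == NULL)
        return -1;

    len = snprintf(scratch, SCRATCH_SIZE, "%s\\%s", dir, name);
    if (len < 0 || (size_t)len >= SCRATCH_SIZE || (size_t)len >= out_size) {
        free(scratch);
        return -1;
    }

    memcpy(out, scratch, (size_t)len + 1);  /* copy including the NUL */
    free(scratch);
    return len;
}
```

The trade-off is an allocation and an extra failure path per call in exchange for a much smaller stack frame, which matters most in call chains that nest several such functions.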
comment:27 by , 8 years ago
Cc: | added |
---|
comment:29 by , 8 years ago
2all: I uploaded a new revision, r285, with reduced stack usage. I did some tests, including booting from a fat32 file system as a stress test -- no traps so far. Please test. The binaries are at the usual place (ftp://osfree.org/upload/fat32/).
comment:30 by , 8 years ago
I have tried this with most of the things that caused this and so far can't reproduce it. Thanks
comment:31 by , 8 years ago
I have also tested this in both ECS22 and ARCAOS, also using the debug kernel, and have not made it fail. (See also ticket #46.)
comment:32 by , 7 years ago
So, it looks like the problem was heavy stack usage. Now we use much less memory on the stack, so the error no longer occurs. So, I am closing this ticket.
comment:33 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
This is the correct trap screen info. The one in ticket 52 was not, and I can't figure out how to edit the description. Sorry. Thanks.
#
OS/2 Kernel Revision 14.200_SMP
Exception in module:
EAX=ff78cf28 EBX=ff78cf28 ECX=ff78cf28 EDX=00001001
ESI=ff67f258 EDI=fed20000 EBP=00004f78 FLG=00010202
CS:EIP=0168:fff43cae CSACC=c09b CSLIM=ffffffff
SS:ESP=1530:00004f50 SSACC=1097 SSLIM=00004f4f
DS=0160 DSACC=c093 DSLIM=ffffffff CR0=8001003b
ES=0160 ESACC=c093 ESLIM=ffffffff CR2=fc318ff8
FS=0000 FSACC= FSLIM=
GS=0000 GSACC= GSLIM=
%fff43900 OS2KRNL _pgGetPF + 3ae
#
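One way to read the SS fields on this trap screen is sketched below. It assumes that the low byte of SSACC is the x86 access-rights byte of a data segment, where bit 2 marks an expand-down segment; the function name `stack_bytes_free` is made up for this example.

```c
#include <stdint.h>

/* Bit 2 of a data segment's access-rights byte is the expand-down flag. */
#define SEG_EXPAND_DOWN 0x04u

/*
 * Bytes still available below `esp` before a push would leave the segment.
 * `acc` is the access word as printed on the trap screen (low byte assumed
 * to be the x86 access-rights byte); `limit` is the segment limit.
 */
uint32_t stack_bytes_free(uint32_t acc, uint32_t limit, uint32_t esp)
{
    uint32_t lowest_valid;

    if (acc & SEG_EXPAND_DOWN)
        lowest_valid = limit + 1;  /* expand-down: valid offsets start above the limit */
    else
        lowest_valid = 0;          /* expand-up: valid offsets start at 0 */

    return esp > lowest_valid ? esp - lowest_valid : 0;
}
```

With SSACC=1097, SSLIM=00004f4f and ESP=00004f50 this returns 0, which would be consistent with the kernel stack having no room left at the time of the trap.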