Opened 2 years ago

Closed 2 years ago

#53 closed defect (fixed)

Writing large EAs or copying a file with large EAs traps the system (r282)

Reported by: gyoung
Owned by:
Priority: major
Milestone: Future
Component: IFS
Version:
Severity: medium
Keywords:
Cc: stevenhl, landb

Description

Writing large EAs to a VFAT drive causes a system trap. The EAs are a thumbnail created by Lucide. The system also traps in FAT32 when you try to copy a file with large EAs attached (again, a thumbnail created by Lucide). Steps to reproduce:

1. Place a PDF file without a thumbnail on the VFAT drive.
2. Open it with Lucide set to create thumbnails.
3. Close the file or Lucide (which is when the EAs get written). The system traps.

The same thing occurs on a FAT32 drive if you copy a PDF that already has a thumbnail as part of its EAs.

Note this was not happening with r280 but was with r276 and r277.

The following is the trap screen from the VFAT trap described above. I have a system dump of both traps above. I also have a system dump from just repeatedly accessing the drive from the Drives folder, which was also fixed by turning off EAs. All these problems go away with EAs turned off.

Change History (33)

comment:1 Changed 2 years ago by gyoung

This is the correct trap screen info. The one in ticket 52 was not, and I can't figure out how to edit the description. Sorry, and thanks.

TRAP SCREEN INFORMATION

OS/2 Kernel Revision 14.200_SMP Exception in module:

TRAP 0008 ERRCD=0000 ERACC= ERLIM= CPU=00
EAX=ff78cf28 EBX=ff78cf28 ECX=ff78cf28 EDX=00001001
ESI=ff67f258 EDI=fed20000 EBP=00004f78 FLG=00010202
CS:EIP=0168:fff43cae CSACC=c09b CSLIM=ffffffff
SS:ESP=1530:00004f50 SSACC=1097 SSLIM=00004f4f
DS=0160 DSACC=c093 DSLIM=ffffffff CR0=8001003b
ES=0160 ESACC=c093 ESLIM=ffffffff CR2=fc318ff8
FS=0000 FSACC= FSLIM=
GS=0000 GSACC= GSLIM=
%fff43900 OS2KRNL _pgGetPF + 3ae

comment:2 Changed 2 years ago by valerius

How can I get Lucide to create a *large* thumbnail? My PDF file has a 6766-byte thumbnail and it does not trap. Another file is 12 MB in size and its thumbnail is 30 KB. How many bytes should a large thumbnail be? 60 KB? Maybe this could be achieved more easily with PMView? As I heard, it can create large thumbnails too, but my PMView version does not create thumbnails. Maybe because it is unregistered? (I see nothing about thumbnails in its settings.)

PS: The Trap 0008 screen is useless, because it shows the second trap, which occurred while processing the first trap. I need the first trap, not the second. It looks like a page fault occurred, and the system then trapped in the page fault handler.

comment:3 Changed 2 years ago by valerius

PPS: Since r280, I haven't changed anything related to EAs. So everything should behave the same on r282 as on r280 (and on r276/r277 too).

comment:4 follow-up: Changed 2 years ago by valerius

Could you upload a zipped file with large EAs somewhere? Hm, I tried a 50 MB .pdf file, but its thumbnail is only 9 KB...

comment:5 in reply to: ↑ 4 Changed 2 years ago by aschn

Replying to valerius:

> Could you upload a zipped file with large EAs somewhere? Hm, I tried a 50 MB .pdf file, but its thumbnail is only 9 KB...

Just take a huge REXX file. REXX.TOKENSIMAGE can get larger.

comment:6 Changed 2 years ago by valerius

I only have PPWizard, which is too large (300 KB) to fit into EAs. I also have an install.cmd of 40 KB in size. Its REXX.TOKENSIMAGE is only 37 KB, which is not much. The others are small. So that doesn't help me.

comment:7 follow-up: Changed 2 years ago by valerius

I generated a REXX file with 1842 "say Line #XXX" lines. A REXX.TOKENSIMAGE of 44 KB in size is created. But if I add a 1843rd line, no EAs are created at all. So I cannot create EAs > 44 KB.
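
For anyone who wants to repeat this test, here is a minimal sketch in C of a generator for such a script. The function name `write_rexx_lines`, the header comment text and the quoted form of each `say` line are assumptions made for illustration, not the exact script used above, so the resulting EA sizes may differ from the figures in this comment.

```c
#include <assert.h>
#include <stdio.h>

/* Write a REXX script with `count` SAY statements to `out`.
 * Hypothetical helper for illustration; returns 0 on success, -1 on error. */
static int write_rexx_lines(FILE *out, unsigned count)
{
    unsigned i;

    assert(out != NULL);

    /* OS/2 only treats a .cmd file as REXX if it starts with a comment. */
    if (fputs("/* generated test script for REXX EA sizes */\n", out) == EOF)
        return -1;

    /* One SAY statement per line, numbered from 1 to count. */
    for (i = 1; i <= count; i++) {
        if (fprintf(out, "say \"Line #%u\"\n", i) < 0)
            return -1;
    }
    return 0;
}
```

A caller would open a file such as `eatest.cmd` (hypothetical name) with `fopen(..., "w")`, pass it with a count of 1842 or 1843, close it, and then run the script once on OS/2, which is presumably when the interpreter stores REXX.TOKENSIMAGE and REXX.LITERALPOOL in the file's EAs.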

comment:8 Changed 2 years ago by valerius

Looking at it more closely, I see that besides the 44 KB REXX.TOKENSIMAGE there is also a 21 KB REXX.LITERALPOOL, 65 KB in total, which is close to the maximum. I copied the file to a FAT32 partition: no trap. The same goes for FAT16 and exFAT partitions.

comment:9 Changed 2 years ago by gyoung

The EAs that cause this here are in the 3-5 KB range. The actual files I used are too large to attach to the ticket. Based on the pattern I am seeing, I think this is a timing issue: some recent revisions trap consistently on access (276), other revisions let you access the drive and trap when EAs are written (282), and others don't trap, at least in limited use (280). I have a quad-core processor, so it might be a multiprocessor issue. I will try turning EAs on again but running only one processor.

comment:10 Changed 2 years ago by gyoung

It isn't an SMP issue. I get the trap with only 1 processor active.

comment:11 in reply to: ↑ 7 Changed 2 years ago by aschn

Replying to valerius:

> I generated a REXX file with 1842 "say Line #XXX" lines. A REXX.TOKENSIMAGE of 44 KB in size is created. But if I add a 1843rd line, no EAs are created at all. So I cannot create EAs > 44 KB.

Interesting, and thanks for testing and reporting. When I checked my REXX EAs I couldn't find a larger value either, but I thought I had made a mistake. OT: Apparently there is a system limit at 44 KB, maybe to leave enough space for other EAs and keep everything below 64 KB.

comment:12 Changed 2 years ago by stevenhl

I have done some preliminary analysis of the dump file provided by Gregg. The issue has nothing to do with the thumbnail size. The kernel stack is overflowing. It's not yet clear whether there is some form of indirect recursion in the IFS or whether this is an unexpected page fault that is consuming extra stack space.

At the ring3 level we have the expected DosSetPathInfo call, the parameters are valid, and the buffers appear to be committed.

I am getting some odd results from pmdf in the sense that I'm not seeing any references to code owned by the fat32 driver, but I'm not done looking.

comment:13 Changed 2 years ago by stevenhl

  • Cc stevenhl added

comment:14 Changed 2 years ago by valerius

2stevenhl: Thanks for the analysis. I'm not getting those traps because I have the OS/4 kernel, which has twice as much kernel stack as IBM's kernel. I'll look at it more closely. I need to check the IBM kernel's behaviour (I checked it briefly and found no problems so far).

comment:15 Changed 2 years ago by valerius

PS: Barry Landy has another problem in ticket #46. He gets spontaneous hangs or hits an int 3 somewhere in KernelFaultEntry. According to the fat32.ifs trace messages, no operation is in progress at that time which could lead to a hang or a trap. I suspect that he has the same problem. We need to check whether the problems disappear with an OS/4 kernel.

comment:16 Changed 2 years ago by valerius

2aschn: REXX.TOKENSIMAGE is 44 KB in size, but there is also REXX.LITERALPOOL, which is 21 KB, roughly 64 KB in total. So the limit is reached.

comment:17 Changed 2 years ago by valerius

Tried to copy a test file with 21 KB of REXX.LITERALPOOL and 44 KB of REXX.TOKENSIMAGE having loaded the 14.106_SMP kernel. No trap so far. So, some extra circumstances are needed to get the kernel stack exhausted.

comment:18 Changed 2 years ago by stevenhl

@valerius: I find it somewhat surprising that you expected to be able to develop a stable IFS for folks that use the OS/2 kernel when you know how different it is compared to the OS4 kernel.

Regarding Barry's issue, the int3 occurs because the fault handler cannot decode the instructions at what is supposed to be the cs:eip. This too could be a side effect of stack overrun, but I'd need to see a system dump to say for sure.

comment:19 Changed 2 years ago by valerius

> @valerius: I find it somewhat surprising that you expected to be able to develop a stable IFS for folks that use the OS/2 kernel when you know how different it is compared to the OS4 kernel.

What is surprising? Expected by whom? I don't understand.

I am trying to be compatible with IBM kernels, of course. However, I generally use the OS/4 kernel on all my machines, so I might not have noticed something that happens on IBM's kernels.

comment:20 Changed 2 years ago by stevenhl

  • Cc stevenhl removed

comment:21 Changed 2 years ago by stevenhl

  • Cc stevenhl added

comment:22 Changed 2 years ago by valerius

Ah, different... (I reread your phrase several times and only now got it.) It is not very different from IBM's kernel. There are some useful enhancements, but it is backwards compatible, so that should not be a problem. The stack overrun was simply overlooked. I tested with the IBM kernel, but only briefly, and I didn't experience stack overflows so far. So I need to test more thoroughly.

comment:23 Changed 2 years ago by valerius

Reproduced a trap 0008 on a 14.104_UNI debug kernel. It looks like it overflows the stack when trying to move a file to DELDIR:

P:82 T:9b D:0 T:43:617 FS_PROCESSNAME for W:\delete
P:82 T:9b D:0 T:43:621  FS_PROCESSNAME returned filename: W:\delete
P:82 T:9b D:0 T:43:626 FS_MKDIR - W:\delete
P:82 T:9b D:0 T:43:630 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:638 ReleaseFat
P:82 T:9b D:0 T:43:642 FS_MKDIR returned 5
P:82 T:9b D:0 T:43:646 FS_PROCESSNAME for W:\delete\29012621.53
P:82 T:9b D:0 T:43:652  FS_PROCESSNAME returned filename: W:\delete\29012621.53
P:82 T:9b D:0 T:43:660 FS_PROCESSNAME for W:\Temp\MD98\1STBOOT.BMP
P:82 T:9b D:0 T:43:666  FS_PROCESSNAME returned filename: W:\Temp\MD98\1STBOOT.BMP
P:82 T:9b D:0 T:43:671 FS_MOVE W:\Temp\MD98\1STBOOT.BMP to W:\delete\29012621.53
P:82 T:9b D:0 T:43:679 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:685 ReleaseFat
P:82 T:9b D:0 T:43:689 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:695 ReleaseFat
P:82 T:9b D:0 T:43:699 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:705 ReleaseFat
P:82 T:9b D:0 T:43:708 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:714 ReleaseFat
P:82 T:9b D:0 T:43:720 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:751 ReleaseFat
P:82 T:9b D:0 T:43:761 GetFatAccess: ReadSector
P:82 T:9b D:0 T:43:777 ReleaseFat
P:82 T:9b D:0 T:43:787 GetFatAccess: GetNextCluster
P:82 T:9b D:0 T:43:791 ReleaseFat
Trap 8 - Double Exception Detected 0000
eax=a03c1689 ebx=00000092 ecx=00000092 edx=00800092 esi=00000010 edi=00000010
eip=fff4763b esp=000001f4 ebp=00004f04 iopl=2 rf -- -- nv up ei pl nz ac pe nc
cs=0178 ss=0090 ds=0170 es=0170 fs=0000 gs=0170  cr2=00ac0000  cr3=00212000
0178:fff4763b 53             push      ebx
No symbols found
0178:fff4763b 53             push      ebx
0178:fff4763c 33db           xor       ebx,ebx
0178:fff4763e 57             push      edi
0178:fff4763f 8b7d0c         mov       edi,dword ptr [ebp+0c]
0178:fff47642 56             push      esi
0178:fff47643 895df8         mov       dword ptr [ebp-08],ebx
0178:fff47646 8b7508         mov       esi,dword ptr [ebp+08]
0178:fff47649 f7e6           mul       esi
0178:fff4764b 8bc6           mov       eax,esi
0178:fff4764d c1ea08         shr       edx,08
0178:fff47650 c1e808         shr       eax,08
0178:fff47653 8955d4         mov       dword ptr [ebp-2c],edx
0090:000001f4  00004ecc 00000030 00000000 
Past end of segment: 0090:00000200
##

2stevenhl: The current stack is a special stack for trap handlers (selector 0x90). How can I find the ring0 stack for drivers (the one that has overflowed)? (Is it selector 0x30?) Does it have the same selector each time?

comment:24 Changed 2 years ago by valerius

##r                                                                          
Trap 8 - Double Exception Detected 0000                                      
eax=a03c1689 ebx=00000092 ecx=00000092 edx=00800092 esi=00000010 edi=00000010
eip=fff4763b esp=000001f4 ebp=00004f04 iopl=2 rf -- -- nv up ei pl nz ac pe nc
cs=0178 ss=0090 ds=0170 es=0170 fs=0000 gs=0170  cr2=00ac0000  cr3=00212000
0178:fff4763b 53             push      ebx        
##ln                                              
No symbols found                                  
##k                                               
Past end of segment: 0090:00004f04                
##dg 90
0090  Data    Bas=ffdba8ac Lim=000001ff DPL=0 P  RW    A  
##

So the trap handler stack (0x90) appears to have overflowed (the stack limit is reached), which is why trap 0008 occurred. I'm not sure about the original stack. I thought the original stack was what ran out.

comment:25 Changed 2 years ago by valerius

I'm now in the process of removing big buffers located on the stack (for strings, directory entries, etc.) from the code and changing them to dynamically allocated buffers. So far, I no longer see traps when copying or deleting files (mostly when moving a file or its EAs to DELDIR). I need some time to remove them everywhere, so please be patient.
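
The kind of change described here can be sketched in plain C as follows. This is only an illustration under stated assumptions, not code from fat32.ifs: the function names and `SCRATCH_SIZE` are hypothetical, and `malloc`/`free` stand in for whatever kernel-side allocator the driver actually uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 4096   /* hypothetical scratch-buffer size */

/* Before: the scratch buffer lives in this function's stack frame, so
 * every nested call built this way costs another SCRATCH_SIZE bytes of
 * stack. */
static int join_path_on_stack(const char *dir, const char *name,
                              char *out, size_t out_size)
{
    char scratch[SCRATCH_SIZE];
    int n = snprintf(scratch, sizeof scratch, "%s\\%s", dir, name);

    if (n < 0 || (size_t)n >= sizeof scratch || (size_t)n >= out_size)
        return -1;
    memcpy(out, scratch, (size_t)n + 1);
    return 0;
}

/* After: only a pointer stays on the stack; the buffer comes from the
 * heap and is released on every exit path. */
static int join_path_on_heap(const char *dir, const char *name,
                             char *out, size_t out_size)
{
    char *scratch;
    int n;
    int rc = -1;

    assert(dir != NULL && name != NULL && out != NULL);

    scratch = malloc(SCRATCH_SIZE);
    if (scratch == NULL)
        return -1;          /* allocation failure is a new error case */

    n = snprintf(scratch, SCRATCH_SIZE, "%s\\%s", dir, name);
    if (n >= 0 && (size_t)n < SCRATCH_SIZE && (size_t)n < out_size) {
        memcpy(out, scratch, (size_t)n + 1);
        rc = 0;
    }
    free(scratch);
    return rc;
}
```

The trade-off is that each call now pays for an allocation and has to handle allocation failure, in exchange for a stack frame that no longer grows with the buffer size.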

comment:26 Changed 2 years ago by landb

OK. Happy to test my system again when you are ready.

comment:27 Changed 2 years ago by landb

  • Cc landb added

comment:28 Changed 2 years ago by gyoung

I am also happy to test when you are ready. Thanks

comment:29 Changed 2 years ago by valerius

2all: I uploaded a new revision, r285, with reduced stack usage. I did some tests, including booting from a FAT32 file system as a stress test, with no traps so far. Please test. The binaries are at the usual place (ftp://osfree.org/upload/fat32/).

comment:30 Changed 2 years ago by gyoung

I have tried this with most of the things that caused this and so far can't reproduce it. Thanks

comment:31 Changed 2 years ago by landb

I have also tested this both in ECS22 and ARCAOS and also using debug kernel and have not made it fail. (see also ticket #46)

comment:32 Changed 2 years ago by valerius

So it looks like the problem was heavy stack usage. We now use much less stack memory, so the error no longer occurs. I am closing this ticket.

comment:33 Changed 2 years ago by valerius

  • Resolution set to fixed
  • Status changed from new to closed