Opened 12 years ago

Closed 12 years ago

#14 closed defect (fixed)

AHCI 1.22 crash

Reported by: Doug Bissett
Owned by: Markus Thielen
Priority: major
Milestone:
Component: driver
Version:
Keywords:
Cc:

Description

I am using OS2AHCI 1.22, from the eCS betazone, on my Asus M3A78-EM motherboard, with an AMD quad-core Phenom processor and a new Seagate 500 GB SATA III drive. I had problems with my old Maxtor 250 GB SATA drive and the old AHCI version, so I wasn't using AHCI previously.

With OS2AHCI 1.22, I ran okay for about 3 hours while I did my backups using BackAgain/2000, putting the backup files on the same disk. That all worked well (roughly 4.1 GB total output, in 3 files). The next step was to burn those files to a SATA DVD drive, using DVDDAO. As soon as I started DVDDAO, the system trapped (trap 000d). I got a picture of that in AHCITRAP.jpg, but something went wrong when I tried Ctrl-Alt-NumLock-NumLock, and I didn't get anything else (it looks like my diskette drive died). After I rebooted, I did the DVD burn with no trouble.

Attachments (3)

AHCITRAP.jpg (233.0 KB) - added by Doug Bissett 12 years ago.
AHCITRAP2.jpg (61.7 KB) - added by Doug Bissett 12 years ago.
os2ahci-1.24.zip (32.6 KB) - added by Markus Thielen 12 years ago.
OS2AHCI version 1.24

Change History (31)

Changed 12 years ago by Doug Bissett

Attachment: AHCITRAP.jpg added

comment:1 Changed 12 years ago by Markus Thielen

Thanks for the bug report!

Which driver interface of DVDDAO did you use, aspinkk, aspirout or OS2CDROM.DMD?
Did you use the same configuration when it worked after reboot?

Thanks
Markus

comment:2 Changed 12 years ago by Markus Thielen

Owner: changed from somebody to Markus Thielen
Status: changed from new to accepted

Apparently, email notifications for ticket comments work now ;-)

@dgbisse: please have a look at the questions in my previous comment, thanks.

comment:3 Changed 12 years ago by Markus Thielen

I tried to reproduce the problem on an ICH8 desktop machine; I had no problems using either of the available interfaces.

Can you still reproduce the problem?

Thanks
Markus

comment:4 in reply to:  1 Changed 12 years ago by Doug Bissett

Replying to markus.thi:

Thanks for the bug report!

Which driver interface of DVDDAO did you use, aspinkk, aspirout or OS2CDROM.DMD?
Did you use the same configuration when it worked after reboot?

Thanks
Markus

Sorry about the delayed response. I was away from home for a month.

DVDDAO was version 2.0.6, using aspirout from aspirb10.zip. Yes. I used exactly the same configuration after reboot.

comment:5 in reply to:  3 Changed 12 years ago by Doug Bissett

Replying to markus.thi:

I tried to reproduce the problem on an ICH8 desktop machine; I had no problems using either of the available interfaces.

Can you still reproduce the problem?

Thanks
Markus

Since little else has changed, I suspect that I can still cause AHCI to crash. Whether it will crash with exactly the same fault, I am not sure. I will give it a try, when I find some time. FWIW, I now have a dump partition set up, so I may be able to get a system dump.

This is not an ICH8 adapter. PCI.EXE identifies it (in IDE mode) as:

Bus 0 (PCI), Device Number 17, Device Function 0
Vendor 1002h ATI Technologies Inc
Device 4390h SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode]
Command 0107h (I/O Access, Memory Access, BusMaster, System Errors)
Status 0230h (Has Capabilities List, Supports 66MHz, Medium Timing)
Revision 00h, Header Type 00h, Bus Latency Timer 40h
Self test 00h (Self test not supported)
Cache line size 64 Bytes (16 DWords)
PCI Class Storage, type IDE (ATA)
PCI EIDE Controller Features :

BusMaster EIDE is supported
Primary Channel is in native mode at Addresses 0 & 1
Secondary Channel is in native mode at Addresses 2 & 3

Subsystem ID 82EF1043h M3A78-EH Motherboard
Subsystem Vendor 1043h ASUSTeK Computer Inc.
Address 0 is an I/O Port : C000h..C007h
Address 1 is an I/O Port : B000h
Address 2 is an I/O Port : A000h
Address 3 is an I/O Port : 9000h
Address 4 is an I/O Port : 8000h..8007h
Address 5 is a Memory Address (0-4GiB) : FBCFF800h..FBCFFBFFh
System IRQ 11, INT# A
New Capabilities List Present:

Power Management Capability, Version 1.1

Does not support low power State D1 or D2
Does not support PME# signalling
Current Power State : D0 (Device operational, no power saving)

Unknown Capability (Code 12h)!!
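
For reference, the Command and Status values in this listing can be decoded by hand. The following is only a minimal Python sketch, assuming the conventional PCI configuration-space bit layout; the function and table names are illustrative and are not part of PCI.EXE.

```python
# Sketch: decode the PCI Command and Status words shown in the listing above.
# Bit positions assume the conventional PCI configuration-space layout;
# all names here are hypothetical helpers, not taken from PCI.EXE.

COMMAND_BITS = {
    0: "I/O Access",
    1: "Memory Access",
    2: "BusMaster",
    3: "Special Cycles",
    4: "Memory Write and Invalidate",
    5: "VGA Palette Snoop",
    6: "Parity Error Response",
    8: "System Errors",
    9: "Fast Back-to-Back",
}

# Status bits 10:9 encode the DEVSEL timing.
DEVSEL_TIMING = {0: "Fast Timing", 1: "Medium Timing", 2: "Slow Timing"}


def decode_command(value):
    """Return the names of all Command register bits that are set."""
    return [name for bit, name in sorted(COMMAND_BITS.items())
            if value & (1 << bit)]


def decode_status(value):
    """Return the Status register flags that appear in the listing."""
    flags = []
    if value & (1 << 4):
        flags.append("Has Capabilities List")
    if value & (1 << 5):
        flags.append("Supports 66MHz")
    timing = (value >> 9) & 0x3
    flags.append(DEVSEL_TIMING.get(timing, "Reserved Timing"))
    return flags


print(decode_command(0x0107))
print(decode_status(0x0230))
```

With the values from the listing (0107h and 0230h), the decoded flags match the text that PCI.EXE printed in parentheses.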

comment:6 Changed 12 years ago by Doug Bissett

Okay. I did not get a trap using the DVD drive, but I did get a trap (three times) when I ejected my 1 TB USB hard drive. The first time, I saw the trap screen flash, just before it tried to do a trap dump. Then, that complained about something, and the screen turned off. All I could do was reset the machine. After rebooting, it was obvious that the drive had been ejected successfully, and it remounted. I got the trap again when I tried to eject it.

After doing a bit of searching, I found something that indicates that DUMPFS doesn't work with AHCI, which probably explains the complaint. I REMed the TRAPDUMP line in CONFIG.SYS, rebooted, mounted the USB drive, and ejected it. AHCITRAP2.JPG shows the results.

I have gone back to using the Dani driver, which seems to work well.

Changed 12 years ago by Doug Bissett

Attachment: AHCITRAP2.jpg added

comment:7 Changed 12 years ago by Markus Thielen

I'll try to reproduce the problem next week. What file system is in use on the external disk?

comment:8 Changed 12 years ago by Markus Thielen

I have an Asus M4A78-EM mainboard here, but I cannot reproduce the trap. I used a JFS formatted USB disk.

Your mainboard has a COM port, so a serial dump would be possible. Do you happen to have a 2nd machine with a COM port? You'd need a slot panel connector for the mainboard's COM port and a NULL modem cable, too. It's a bit involved, I know, but would be of great help. Here is a detailed description of the process.

comment:9 in reply to:  8 Changed 12 years ago by Doug Bissett

Replying to markus.thi:

I have an Asus M4A78-EM mainboard here, but I cannot reproduce the trap. I used a JFS formatted USB disk.

Your mainboard has a COM port, so a serial dump would be possible. Do you happen to have a 2nd machine with a COM port? You'd need a slot panel connector for the mainboard's COM port and a NULL modem cable, too. It's a bit involved, I know, but would be of great help. Here is a detailed description of the process.

My disk (a Samsung StoryStation, 1 TB drive) has 2 partitions. The first is NTFS, with no drive letter for eCS, and no NTFS driver loaded. The second partition is JFS.

I will look into trying to get a com port to com port setup. I had a NULL modem cable at one time, but I have no idea where it is now.

BTW, it seems that I get no notification of updates to this incident. I only stumbled upon this when I was reminded of other problem reports.

comment:10 Changed 12 years ago by Markus Thielen

Thanks for your reply. The email notification problem should be fixed by now; I just received an email about your update.
If it helps, I can send you a null modem cable. They are rather cheap these days ;-)

comment:11 Changed 12 years ago by Markus Thielen

Have you configured your email address? Check under "Preferences" at the top of the page.

comment:12 in reply to: 11 Changed 12 years ago by Doug Bissett

Replying to markus.thi:

Have you configured your email address? Check under "Preferences" at the top of the page.

Apparently not. I have now.

I found my NULL modem cable, but it has 25 pins on both sides, and my machines have 9 pin connectors. I haven't found time to follow up on this anyway. Hopefully, I can do something soon.

comment:13 in reply to: 12 Changed 12 years ago by Markus Thielen

Replying to dgbisse:

Apparently not. I have now.

If it works now, you'll get notified about this reply.

I found my NULL modem cable, but it has 25 pins on both sides, and my machines have 9 pin connectors. I haven't found time to follow up on this anyway. Hopefully, I can do something soon.

Would be great - thanks!

comment:14 in reply to:  13 Changed 12 years ago by Doug Bissett

Replying to markus.thi:

Replying to dgbisse:

Apparently not. I have now.

If it works now, you'll get notified about this reply.

I did.

I found my NULL modem cable, but it has 25 pins on both sides, and my machines have 9 pin connectors. I haven't found time to follow up on this anyway. Hopefully, I can do something soon.

Would be great - thanks!

I couldn't find any way to connect my old null modem, so I got a new null modem cable. Then, I spent about 3 hours trying to figure out why I can't boot with either one of the debug kernels. They just stop, before the boot blob goes away, and before the logo appears. According to what I see on the screen, OS2KRNL loaded, then it says OS2DBCS.FNT (which doesn't exist, but I see that, briefly, with the retail kernel too) and it has the JFS line at the bottom of the screen, until I use the reset switch. If I have ZOC set up on the other system, it beeps at me, but does nothing else. I think that indicates that the link was successful. I went through the whole procedure a couple of times, including reading what SHL posted, but it just won't boot with a debug kernel (and that is long before AHCI would be involved). I have also tried PMDF, as suggested.

FWIW, I can get ZOC (both ends) to communicate okay, so the setup should be good.

One other thing, that could be involved: I leave the BIOS setting at SATA, not AHCI, because I also have WinXP installed, and it can't use AHCI, without a lot of messing around. I do know that the OS2AHCI driver loads okay, with the Dani driver REMed, so I assume that it is capable of switching the adapter to AHCI mode. If I set the BIOS to AHCI, that is the only mode that will work (meaning that WinXP won't work until I switch it back to SATA mode).

Any suggestions???

comment:15 Changed 12 years ago by Doug Bissett

I found the boot problem. The kdb.ini file that downloaded was more than 6K of XML stuff. When I click on the link, I get what I expected to see, so I copied it, and pasted it into a new file. Now, I can boot (VERY slowly).

More, if I can get some results.

comment:16 Changed 12 years ago by Doug Bissett

Okay, I got a log (almost 9 meg), using ZOC as the terminal. I got one trap, fairly early, and I stopped the trace to try using PMDF as a terminal. That did nothing, so I went back to ZOC. Search for "ON MUT" (the second one), for the start of the second trace. It stops at another trap, which happened before the boot was done. I will attempt to upload ZOC1209.LOG.

comment:17 Changed 12 years ago by Doug Bissett

No, that won't work. A file size limit of 256K is far too small. The file is at:

http://www3.telus.net/public/bissett1/ZOC1209.LOG

Please let me know when you have it, so I can remove it from that location.

Good luck...

comment:18 Changed 12 years ago by Markus Thielen

Thank you so much for your efforts! We will dig through this as soon as possible.
I got the log file and uploaded it here for future reference:
http://www.thiguten.de/projects/ahci/uploads/ZOC1209.LOG.gz

Thanks again!
-- Markus

comment:19 in reply to: 18 Changed 12 years ago by Doug Bissett

Replying to markus.thi:

Thank you so much for your efforts! We will dig through this as soon as possible.
I got the log file and uploaded it here for future reference:
http://www.thiguten.de/projects/ahci/uploads/ZOC1209.LOG.gz

Thanks again!
-- Markus

I hope it has the information that you need. Let me know if you need anything else.

FWIW, I looked into what needs to be done to WinXP, and found that AMD has now made an AHCI driver available. I will give it a try, eventually.

You may also be interested to know that I used my antique IBM ThinkPad A22e (900 MHz P3 with 256 meg of memory, running eCS 2.0) to run ZOC on the "other" end of the null modem. It seems to have done the job.

comment:20 in reply to:  19 Changed 12 years ago by Markus Thielen

Replying to dgbisse:

You may also be interested to know that I used my antique IBM ThinkPad A22e (900 MHz P3 with 256 meg of memory, running eCS 2.0) to run ZOC on the "other" end of the null modem. It seems to have done the job.

Never throw away a running (old) machine... ;-)

Changed 12 years ago by Markus Thielen

Attachment: os2ahci-1.24.zip added

OS2AHCI version 1.24

comment:21 Changed 12 years ago by Markus Thielen

Your debug log was very informative... thanks again!

I attached version 1.24 of OS2AHCI to this ticket. Could you give it a try?

Thanks!
--Markus

comment:22 in reply to:  21 Changed 12 years ago by Doug Bissett

Replying to markus.thi:

Your debug log was very informative... thanks again!

Glad to hear that...

I attached version 1.24 of OS2AHCI to this ticket. Could you give it a try?

Thanks!
--Markus

I installed the 1.24 version, and I have now booted 3 times, with no trouble. The first boot was done with the debug kernel, it got to the desktop, and I was able to do a shut down. Of course the system is not really usable that way, so I put the retail kernel back in, and booted twice more. I also tried the USB drive, with no trouble.

I have been able to run for a few hours in the past, so I am not able to say that you have actually fixed the problem, but it looks good so far. I will run this way, to see what happens.

THANKS!

comment:23 Changed 12 years ago by Doug Bissett

Okay, I ran for over 4 hours in my eCS 2.0 boot system. I am pretty sure that I was never able to do that before, using AHCI. I burned a DVD, and I used my USB disk drive, with no troubles.

Now, I installed AHCI 1.24 onto my eCS 2.1 system. I will continue to use AHCI 1.24, until I see a problem, or you update it.

I also installed AHCI 1.24 onto my Lenovo ThinkPad T510, which has always worked with AHCI. I haven't had much activity yet, but it seems to be okay.

Well done...

comment:24 Changed 12 years ago by Markus Thielen

This is very nice to hear!

The next release will target #13, which will take a little while. If you do not encounter any problems until then, I think we will be able to close this ticket.

Cheers, and another warm thank you for spending so much time on this issue!

--Markus

comment:25 in reply to:  24 Changed 12 years ago by Doug Bissett

Replying to markus.thi:

This is very nice to hear!

The next release will target #13, which will take a little while. If you do not encounter any problems until then, I think we will be able to close this ticket.

Cheers, and another warm thank you for spending so much time on this issue!

--Markus

Too bad I didn't get to it earlier, but life goes on.

I am now 4 hours into using v 1.24 on my main machine (where I had the problem), in eCS 2.1. It is still good. I am also more than 3 hours running on the T510 (with little activity), and it is also still good.

I think you fixed it.

comment:26 Changed 12 years ago by Doug Bissett

Just a quick update: Still no problems on either system.

THANKS!

comment:27 Changed 12 years ago by Doug Bissett

Just to confirm the fix. I still have no problems with AHCI.

Thank you very much...

comment:28 Changed 12 years ago by Markus Thielen

Resolution: fixed
Status: changed from accepted to closed

Thank *you*!
