Opened 12 years ago

Closed 11 years ago

#56 closed defect (fixed)

changeset 780 breaks USB hub (in port replicator)

Reported by: abwillis Owned by: somebody
Priority: major Component: basedrv
Version: Keywords:
Cc: lars.erdmann@…

Description

I did an svn pull which took me to revision 781 and my trackball plugged into my port replicator quit working. Did an svn up -r 780 and it still did not work. svn up -r 779 (which I had believe was the last one I had built previously) and it works. Will gather more information.

Attachments (1)

willis.zip (75.6 KB ) - added by Lars Erdmann 12 years ago.

Download all attachments as: .zip

Change History (19)

comment:1 by abwillis, 12 years ago

Saw that there is now a 782, pulled it to see if it was the NULL pointer issue it mentions but still doesn't work. USBDock and USB Resource Manager both show no USB devices attached.

comment:2 by Lars Erdmann, 12 years ago

Can you work on this on your own ? I have zero time.

comment:3 by abwillis, 12 years ago

Mind you, I have no idea what the code is doing but it seemed the most likely candidate to try backing out first and it worked:

Index: ehcaccio.c
===================================================================
--- ehcaccio.c	(revision 782)
+++ ehcaccio.c	(working copy)
@@ -1048,6 +1048,12 @@
       } /* endif */
    }
 
+
+   if (g_pCurrUSBRB == pUSBRB)
+   {
+      g_pCurrUSBRB = g_pCurrUSBRB->next;
+   }
+
    pUSBRB->status = USBRB_DATA_EMPTY;
    pUSBRB->next   = NULL;
    pUSBRB->prev   = NULL;

comment:4 by abwillis, 12 years ago

It is in the:

SUBROUTINE NAME: UnallocateUsbrbData

a little more context:

   if (g_pEndUSBRBList == pUSBRB)
   {
      g_pEndUSBRBList = pUSBRB->prev;
      if (g_pEndUSBRBList) {
         g_pEndUSBRBList->next = NULL;
      } /* endif */
   }


   if (g_pCurrUSBRB == pUSBRB)
   {
      g_pCurrUSBRB = g_pCurrUSBRB->next;
   }

   pUSBRB->status = USBRB_DATA_EMPTY;
   pUSBRB->next   = NULL;
   pUSBRB->prev   = NULL;

   Trace(0x0050,sizeof(pUSBRB),&pUSBRB,FALSE)

comment:5 by Lars Erdmann, 12 years ago

Very strange. What this code is supposed to do is to advance the read pointer (g_pCurrUSBRB) used in "GetNextUSBRBData" to the next element if it happens to point to the element that is to be freed (pUSBRB).
I'll just remove it and let's see what happens. I don't pretend I understand it.

comment:6 by Lars Erdmann, 12 years ago

Maybe I misunderstood: you put it BACK IN ? That makes more sense ...

comment:7 by abwillis, 12 years ago

Correct, I put it back in. In changeset 780 g_pCurrUSBRB = g_pCurrUSBRB->next; was initially changed to g_pCurrUSBRB = pUSBRB->next; which caused the initial breakage here and then in 782 it was completely removed. Once that part of the changeset 780 is backed out (and compared to the current trunk it requires adding in the entire section) I no longer experience problems.

comment:8 by Lars Erdmann, 12 years ago

This is even more strange: if (g_pCurrUSBRB == pUSBRB) how can it then matter if I use g_pCurrUSBRB = g_pCurrUSBRB->next or g_pCurrUSBRB = pUSBRB->next ? This looks like a serialization issue.
Something disturbes the "UnallocateUsbrbData" routine and modifies g_pCurrUSBRB. That's what needs to be fixed. Most likely it's the interaction between "UnallocateUsbrbData" and "GetNextUsbrbData".
I tried to serialize these 2 routines with a mutex but seemingly something goes wrong.

comment:9 by Lars Erdmann, 12 years ago

Damn, now that we are talking about it: now I see that (for whatever reason, I don't remember) I am using GetNextUsbrbData without mutex protection on every call. Need to change that ...

comment:10 by Lars Erdmann, 12 years ago

It turns out that "chkdsk" is a heavy operation:

Can you do this: format a USB stick with HPFS and do a chkdsk x: /F:3 on it ? That operation will read every single sector of the USB stick (it'll take a long time). Can you check if that will hang your system ? It does on my system, at least sometimes. Report back your results. You might also want to try and format a USB stick with JFS. However for JFS you cannot specify /F:3.

comment:11 by abwillis, 12 years ago

I put in the disk that I had had as JFS and was going to check it first (and am) it turns out that I had reformatted as HPFS (I think I was not wanting the journalling on my flash drive and was originally going to try hpfs386 but that doesn't like other boot options I have). It is performing a chkdsk /f:3 now. Currently plugged directly into my laptop. Will try in the dock later. So far it is at 3% on an 8G drive. I have the usbehcd.sys /s:1 set in config.sys

comment:12 by Lars Erdmann, 12 years ago

If you don't want to wait for ages it would help if you tried say a 1 Gig stick. Note that /F:3 would also take for ages when run against a hard disk. What I observed is that for each sector USBMSD.ADD makes 3 requests to USBEHCD.SYS: the first 2 are presumably to address the sector on the USB stick (it's the USB mass data protocol to address a specific sector, it's UFI/CBI) and the third request is the actual data transmission request (it passes a 512 byte buffer).

UFI/CBI:
http://www.usb.org/developers/devclass_docs/usbmass-ufi10.pdf http://www.usb.org/developers/devclass_docs/usb_msc_cbi_1.1.pdf

Version 0, edited 12 years ago by Lars Erdmann (next)

comment:13 by abwillis, 12 years ago

I stopped it at 32% so that I could go home... started it again, this time plugged into the dock. It is at 63% and gave a beep, that corresponded to the xCenter CPU meter dropping off (had been running around 20% - 3-5% in IRQ and is now at 3% with .7% IRQ). It beeps every so often (haven't timed them). System is and has been responsive.

comment:14 by Lars Erdmann, 12 years ago

The beeps stem from USBEHCD.SYS. I had placed them in order to check if some code path is entered and forgot to remove them. See "FreeClaimed" in file ehcaccio.c.

1) one beep (1000 Hz) is when the QH that is to be freed is still active (which means it is cached internally by the USB HW). In that case the driver has to wait "for the next round"

2) one beep (2000 Hz) is when an interrupt QH is freed. This only happens when for example you remove a USB mouse or a USB keyboard from a USB 2.0 hub (or if the USB mouse/keyboard is a true USB 2.0 device which most of them are not).

In short: most likely you hit code path 1)

comment:15 by Lars Erdmann, 12 years ago

It is helpful if you have a USB stick that blinks on data transfer. That will tell you when finally the transfer will hang. However it is not a complete system hang. You can unplug the stick and the system will continue to work.

comment:16 by abwillis, 12 years ago

OK, the beep was probably related to a USB headset I had plugged in then (don't know which beep Hz). I had plugged it in several minutes before and the IRQ drop off had coincided with it. However, after waiting 40 min or so I had stopped it and started chkdsk again and it stopped at 40%, I removed it from the dock resulting in a trap in ibmss214 (not sure how that driver is related to the dock but...). Anyhow, I have started it again and it hung at 20%. In all cases, the light on the USB drive shows activity but CPU/IRQ activity does not bear it out and the chkdsk appears to just be halted (percentage-wise). Chkdsk /f:2 got this far very quickly: [d:\]chkdsk g: /f:2 The current hard disk drive is: G: The type of file system for the disk is HPFS. The HPFS file system program has been started. CHKDSK is searching for lost data. CHKDSK has searched 100% of the disk. CHKDSK corrected a disk space allocation error.

8297540 kilobytes total disk space.

but then took a very long time (quit watching it after 10 min) to give the rest:

8297540 kilobytes total disk space.

38 kilobytes are in 13 directories.

52619 kilobytes are in 265 user files.

110 kilobytes are in extended attributes.

12160 kilobytes are reserved for system use.

8232613 kilobytes are available for use.

again with no cpu/irq activity so it is possible that eventually the /f:3 will go somewhere but certainly with no sense of urgency.

by Lars Erdmann, 12 years ago

Attachment: willis.zip added

comment:17 by Lars Erdmann, 12 years ago

Can you try willis.zip ? If it does not work, add the /S: switch to USBEHCD.SYS.
Start with /S:8 then go down /S:4, /S:2, /S:1. Tell me when it starts failing.
It looks like with /S:1 (which is the default if you don't specify /S:) the driver handles requests from USBMSD.ADD too fast. I think that USBMSD.ADD finally does not respond any more.

Note: higher values for /S: means "slower", lower values means "faster".

comment:18 by Lars Erdmann, 11 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.