Opened 12 years ago

Closed 11 years ago

#52 closed defect (worksforme)

Some build setups generate a Uniaud32.sys which fails to produce sound on some hardware.

Reported by: gentux2 Owned by: gentux2
Priority: normal Milestone: Stabilization
Component: Building Version:
Severity: blocker Keywords:
Cc:

Description

Using the same code, Mike builds a Uniaud32.sys which works, while Brendan builds one which does not, on certain hardware.

Complete 'PCI' output for specific failing hardware attached, as well as alsahlp$ output for "mike" and "brendan" drivers.

Attachments (5)

pci-P-B-D-R (69.1 KB) - added by gentux2 12 years ago.
Output of 'pci -P -B -D -R'
pci-S (5.7 KB) - added by gentux2 12 years ago.
Output of 'pci -S'
alsahlp.mike (19.1 KB) - added by gentux2 12 years ago.
alsahlp.brendan (2.1 KB) - added by gentux2 12 years ago.
div64_32.lst (5.4 KB) - added by gentux2 11 years ago.
disassembly of div64_32 in OW1.6

Download all attachments as: .zip

Change History (20)

Changed 12 years ago by gentux2

Output of 'pci -P -B -D -R'

Changed 12 years ago by gentux2

Output of 'pci -S'

Changed 12 years ago by gentux2

Changed 12 years ago by gentux2

comment:1 Changed 12 years ago by gentux2

In a "bad" build, the function snd_pcm_hw_refine, in alsa-kernel/core/pcm_native.c, fails with error -22 (EINVAL), in the for loop for (k = 0; k < constrs->rules_num; k++) when k == 11, at the line that reads changed = r->func(params, r); (line 289 at time of this writing). A "good" build does not, but instead returns 0. There is more debugging/tracing that can be done to further isolate this fault.

comment:2 Changed 12 years ago by gentux2

In this case, r->func is a pointer to snd_pcm_hw_rule_mulkdiv(), defined in pcm_native.c. It calls snd_interval_refine() which is defined in pcm_lib.c. The error -22 (EINVAL) is generated at the end of snd_interval_refine() where snd_interval_checkempty(i) returns true.

comment:3 Changed 12 years ago by gentux2

  • Milestone set to 1.1.5

comment:4 Changed 12 years ago by yoda

  • Milestone changed from 1.1.4GA to milestone1
  • Owner set to gentux2

Are we seeing this anymore at all ?

comment:5 Changed 12 years ago by gentux2

  • Milestone changed from milestone1 to 1.1.4GA
  • Severity changed from normal to blocker

Yes. In addition to my OW 1.6 build setup I have a system with OpenWatcom 1.7a which also produces bad builds. I insist it be fixed for 1.1.4GA. I have a plan for working on it.

comment:6 Changed 12 years ago by yoda

I changed it, because Roadmap says buildsystem is supposed to be fixed in Milestone 1.

If you insist to do it before, I can live with it ;-)

comment:7 Changed 12 years ago by gentux2

From milestone:1.1.4GA:

"1. Provide a clean and predictable build process"

I wrote that with this ticket in mind. Other tickets on the "Building" component may more appropriately belong to milestone:milestone1, although it appears there is some ambiguity we might do well to clarify.

comment:8 Changed 12 years ago by gentux2

I have now re-worded milestone:1.1.4GA to be more precise. It is not the "build system" makefiles I had in mind, but the predictability of the resulting outcome of the build.

comment:9 follow-up: Changed 11 years ago by gentux2

I traced this variation to div64_32() located in source:GPL/trunk/alsa-kernel/include/sound/pcm.h . A "good" build and a "bad" build end up with different results when this function is called with the same input values.

There are 3 versions of this function; Uniaud uses the third one, after the #else statement, i.e. BITS_PER_LONG < 64, (i386) is not #defined.

There is nothing external called here, so the bug must be occurring in this function itself, or in divl(), which is defined immediately above it. I suppose there are 3 possibilities:

  1. A compiler bug ("Bad" build was with OpenWatcom 1.6).
  2. This code does not port correctly from GCC to OpenWatcom, and is therefore buggy.
  3. The size of the variables passed to this function are not what they should be.

In any case, the correctness of this function, even in a "good" build setup, needs to be verified.

While I was in the process of creating a testcase .exe, the hard drive in my "bad" build system failed, so for the moment I no longer have that setup for comparison.

comment:10 in reply to: ↑ 9 Changed 11 years ago by gentux2

Replying to gentux2:

While I was in the process of creating a testcase .exe, the hard drive in my "bad" build system failed, so for the moment I no longer have that setup for comparison.

That also means I no longer have the logs to demonstrate my assertions, but I have limited my statements to those things which I am certain I observed.

comment:11 Changed 11 years ago by gentux2

  • Status changed from new to assigned

Steven Levine demonstrated that the printk statements I used to isolate div64_32() were flawed and giving erroneous output. With this information, I can no longer assert with confidence that the bug is in div64_32(). I am certain, however, there was variation in the output of muldiv32(), which calls it. I can no longer reproduce this variation when building with OW1.7 on different setups.

There are 2 things left to do here:

  1. Build again with OW1.6 to verify whether this was a flaw in OW that is fixed in 1.7.
  2. Trace the variation that occurs between different installations of OW1.7. It must be something different.

comment:12 Changed 11 years ago by gentux2

I built again with OpenWatcom 1.6, and I do get an odd variation:

OpenWatcom 1.7:

muldiv32( 4294967295, 48000, 1000000)
n = ffff4480 bb7f
calling div64_32(ffff4480 bb7f,1000000,0)
results in c49ba5e 0, 160000

OpenWatcom 1.6:

muldiv32( 4294967295, 48000, 1000000)
n = ffff4480 bb7f
calling div64_32(ffff4480 bb7f,1000000,0)
results in 10c6 0, 919296

comment:13 Changed 11 years ago by stevenhl

The 1.6 results indicate that it is not using 64-bit math on the multiply.

This is what the 1.7 implementation looks like in REXX

[D:\TMP]rexxtry numeric digits 15 ; say d2x(trunc(x2d(ffffffff) * 48000 / 1000000)) C49BA5E

Is what the the 1.6 implementation looks like in REXX

[D:\TMP]rexxtry numeric digits 15 ; say d2x(trunc((x2d(ffffffff) * 48000) ( x2d(ffffffff) + 1) / 1000000)) 10C6

The modulo math is the equivalent of what happens when a 32x32 multiply is generates a 32-bit result.

If you do a wdis -l -s pcm_lib.obj, it should be possible to spot the bad code. Attach pcm_lib.lst to the ticket or send it to me directly if you can not interpret what you are looking at.

I did a quick search for the OpenWatcom? changeset to corrected this defect, but I have not found it yet.

Changed 11 years ago by gentux2

disassembly of div64_32 in OW1.6

comment:14 Changed 11 years ago by gentux2

Very logical, thanks. I attached just the portion relating to div64_32. I cannot interpret it.

comment:15 Changed 11 years ago by gentux2

  • Resolution set to worksforme
  • Status changed from assigned to closed

The particular variation described in this ticket is fixed with an upgrade to OW1.7a. I suspect there could be another similar problem in 1.7a because of two reports of silent builds with it that worked when the same source was built with the pre-release snapshot of OW1.7 I have been using. This is the part "2." in my earlier post. However, I do not have enough solid evidence to keep this ticket open.

I am closing this ticket, and will use OW1.7a for 1.1.4RC7, and see if any new defect is reported. Then we will go from there.

Note: See TracTickets for help on using tickets.