Opened 17 years ago
Closed 16 years ago
#52 closed defect (worksforme)
Some build setups generate a Uniaud32.sys which fails to produce sound on some hardware.
Reported by: | Brendan Oakley | Owned by: | Brendan Oakley |
---|---|---|---|
Priority: | normal | Milestone: | Stabilization |
Component: | Building | Version: | |
Severity: | blocker | Keywords: | |
Cc: |
Description
Using the same code, Mike builds a Uniaud32.sys which works, while Brendan builds one which does not, on certain hardware.
Complete 'PCI' output for specific failing hardware attached, as well as alsahlp$ output for "mike" and "brendan" drivers.
Attachments (5)
Change History (20)
by , 17 years ago
Attachment: | pci-P-B-D-R added |
---|
by , 17 years ago
Attachment: | alsahlp.mike added |
---|
by , 17 years ago
Attachment: | alsahlp.brendan added |
---|
comment:1 by , 17 years ago
In a "bad" build, the function snd_pcm_hw_refine, in alsa-kernel/core/pcm_native.c, fails with error -22 (EINVAL), in the for loop for (k = 0; k < constrs->rules_num; k++)
when k == 11, at the line that reads changed = r->func(params, r);
(line 289 at time of this writing). A "good" build does not, but instead returns 0. There is more debugging/tracing that can be done to further isolate this fault.
comment:2 by , 17 years ago
In this case, r->func is a pointer to snd_pcm_hw_rule_mulkdiv(), defined in pcm_native.c. It calls snd_interval_refine() which is defined in pcm_lib.c. The error -22 (EINVAL) is generated at the end of snd_interval_refine() where snd_interval_checkempty(i) returns true.
comment:3 by , 17 years ago
Milestone: | → 1.1.5 |
---|
comment:4 by , 17 years ago
Milestone: | 1.1.4GA → milestone1 |
---|---|
Owner: | set to |
Are we seeing this anymore at all ?
comment:5 by , 17 years ago
Milestone: | milestone1 → 1.1.4GA |
---|---|
Severity: | normal → blocker |
Yes. In addition to my OW 1.6 build setup I have a system with OpenWatcom 1.7a which also produces bad builds. I insist it be fixed for 1.1.4GA. I have a plan for working on it.
comment:6 by , 17 years ago
I changed it, because Roadmap says buildsystem is supposed to be fixed in Milestone 1.
If you insist to do it before, I can live with it ;-)
comment:7 by , 17 years ago
From milestone:1.1.4GA:
"1. Provide a clean and predictable build process"
I wrote that with this ticket in mind. Other tickets on the "Building" component may more appropriately belong to milestone:milestone1, although it appears there is some ambiguity we might do well to clarify.
comment:8 by , 17 years ago
I have now re-worded milestone:1.1.4GA to be more precise. It is not the "build system" makefiles I had in mind, but the predictability of the resulting outcome of the build.
follow-up: 10 comment:9 by , 16 years ago
I traced this variation to div64_32() located in source:GPL/trunk/alsa-kernel/include/sound/pcm.h . A "good" build and a "bad" build end up with different results when this function is called with the same input values.
There are 3 versions of this function; Uniaud uses the third one, after the #else statement, i.e. BITS_PER_LONG < 64, (i386) is not #defined.
There is nothing external called here, so the bug must be occurring in this function itself, or in divl(), which is defined immediately above it. I suppose there are 3 possibilities:
- A compiler bug ("Bad" build was with OpenWatcom 1.6).
- This code does not port correctly from GCC to OpenWatcom, and is therefore buggy.
- The size of the variables passed to this function are not what they should be.
In any case, the correctness of this function, even in a "good" build setup, needs to be verified.
While I was in the process of creating a testcase .exe, the hard drive in my "bad" build system failed, so for the moment I no longer have that setup for comparison.
comment:10 by , 16 years ago
Replying to gentux2:
While I was in the process of creating a testcase .exe, the hard drive in my "bad" build system failed, so for the moment I no longer have that setup for comparison.
That also means I no longer have the logs to demonstrate my assertions, but I have limited my statements to those things which I am certain I observed.
comment:11 by , 16 years ago
Status: | new → assigned |
---|
Steven Levine demonstrated that the printk statements I used to isolate div64_32() were flawed and giving erroneous output. With this information, I can no longer assert with confidence that the bug is in div64_32(). I am certain, however, there was variation in the output of muldiv32(), which calls it. I can no longer reproduce this variation when building with OW1.7 on different setups.
There are 2 things left to do here:
- Build again with OW1.6 to verify whether this was a flaw in OW that is fixed in 1.7.
- Trace the variation that occurs between different installations of OW1.7. It must be something different.
comment:12 by , 16 years ago
I built again with OpenWatcom 1.6, and I do get an odd variation:
OpenWatcom 1.7:
muldiv32( 4294967295, 48000, 1000000) n = ffff4480 bb7f calling div64_32(ffff4480 bb7f,1000000,0) results in c49ba5e 0, 160000
OpenWatcom 1.6:
muldiv32( 4294967295, 48000, 1000000) n = ffff4480 bb7f calling div64_32(ffff4480 bb7f,1000000,0) results in 10c6 0, 919296
comment:13 by , 16 years ago
The 1.6 results indicate that it is not using 64-bit math on the multiply.
This is what the 1.7 implementation looks like in REXX
[D:\TMP]rexxtry numeric digits 15 ; say d2x(trunc(x2d(ffffffff) * 48000 / 1000000)) C49BA5E
Is what the the 1.6 implementation looks like in REXX
[D:\TMP]rexxtry numeric digits 15 ; say d2x(trunc((x2d(ffffffff) * 48000) ( x2d(ffffffff) + 1) / 1000000)) 10C6
The modulo math is the equivalent of what happens when a 32x32 multiply is generates a 32-bit result.
If you do a wdis -l -s pcm_lib.obj, it should be possible to spot the bad code. Attach pcm_lib.lst to the ticket or send it to me directly if you can not interpret what you are looking at.
I did a quick search for the OpenWatcom changeset to corrected this defect, but I have not found it yet.
comment:14 by , 16 years ago
Very logical, thanks. I attached just the portion relating to div64_32. I cannot interpret it.
comment:15 by , 16 years ago
Resolution: | → worksforme |
---|---|
Status: | assigned → closed |
The particular variation described in this ticket is fixed with an upgrade to OW1.7a. I suspect there could be another similar problem in 1.7a because of two reports of silent builds with it that worked when the same source was built with the pre-release snapshot of OW1.7 I have been using. This is the part "2." in my earlier post. However, I do not have enough solid evidence to keep this ticket open.
I am closing this ticket, and will use OW1.7a for 1.1.4RC7, and see if any new defect is reported. Then we will go from there.
Output of 'pci -P -B -D -R'