Opened 9 years ago
Closed 9 years ago
#266 closed defect (fixed)
Investigate tdb_rec_read and tdb_oob failures
Reported by: | dmik | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | Unknown | Version: | Server 3.6.x |
Keywords: | | Cc: | |
Description
Change History (4)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
In the failing cases, tdb_oob is most likely called from within tdb_rec_read to validate the offset to the next record that is stored in the record just read (tdb_record.next). Apparently, an incorrect value is read for this offset, which makes the tdb_oob call barf. Looking at tdb_rec_read shows that it eventually uses pread, and its counterpart tdb_rec_write uses pwrite. These functions differ from the usual read and write in that they don't change the file pointer. By design, this should allow different threads to read from and write to different places in the file simultaneously (and this is what TDB seems to expect).
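To make the "doesn't change the file pointer" property concrete, here is a minimal sketch for a POSIX system with a native pread. The function name pread_keeps_offset and the scratch file path are made up for illustration and are not part of TDB or kLIBC.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if pread() left the file offset untouched,
 * -1 on any failure. */
int pread_keeps_offset(void)
{
    char path[] = "/tmp/pread_demoXXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                            /* file goes away on close */

    int rc = -1;
    char buf[3] = {0};
    if (write(fd, "0123456789", 10) == 10 &&   /* offset is now 10 */
        lseek(fd, 2, SEEK_SET) == 2 &&         /* park the offset at 2 */
        pread(fd, buf, 3, 6) == 3 &&           /* read 3 bytes at offset 6 */
        buf[0] == '6' && buf[1] == '7' && buf[2] == '8' &&
        lseek(fd, 0, SEEK_CUR) == 2)           /* offset must still be 2 */
        rc = 0;

    close(fd);
    return rc;
}
```

On a system with a correct pread this returns 0: the data comes from offset 6 while the descriptor's own position stays at 2.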
The pread/pwrite calls are implemented in the kernel on modern Linux. In kLIBC, however, they are emulated with lseek and read/write, and there is the following note:
    * This API will not work if the file is non-blocking or another
    * thread tries operating on the file while it's executing.
IIRC the file pointer is a global shared resource on OS/2 (moreover, it's the same one for both reading and writing), which means that if several threads or processes manipulate it at the same time, they will screw each other up.
If I'm right, this is a perfect explanation of why TDB records get corrupted (including wrong magic values and weird next offset values).
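The kind of emulation described above can be sketched roughly as follows. This is not kLIBC's actual source; emulated_pread and emulated_pread_demo are hypothetical names, and the code only illustrates where the race window sits when the file pointer is shared.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lseek+read emulation of pread (NOT kLIBC's real code). */
ssize_t emulated_pread(int fd, void *buf, size_t count, off_t offset)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);          /* 1. remember position */
    if (saved == (off_t)-1)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)  /* 2. move shared pointer */
        return -1;
    /* RACE WINDOW: if anyone else moves the same file pointer here,
     * the read below silently returns bytes from the wrong place. */
    ssize_t n = read(fd, buf, count);              /* 3. read at the pointer */
    int saved_errno = errno;
    lseek(fd, saved, SEEK_SET);                    /* 4. restore position */
    errno = saved_errno;
    return n;
}

/* Single-threaded usage check: returns 0 when the emulation behaves. */
int emulated_pread_demo(void)
{
    char path[] = "/tmp/emu_preadXXXXXX";          /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    char buf[2] = {0};
    int ok = write(fd, "abcdef", 6) == 6 &&
             emulated_pread(fd, buf, 2, 1) == 2 &&
             buf[0] == 'b' && buf[1] == 'c' &&
             lseek(fd, 0, SEEK_CUR) == 6;          /* position was restored */
    close(fd);
    return ok ? 0 : -1;
}
```

With a single caller the emulation is indistinguishable from a real pread. The trouble only starts when a second thread or process touches the same file pointer between steps 2 and 3, which would match the corrupted magic and next-offset values seen in the logs.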
I will provide Herwig with a libtdb.a that has more logging to get some more info. I also need to check what happens with file locking at that time, though advisory file locking per se doesn't solve this problem, because two processes may hold non-conflicting locks on non-overlapping regions and still screw each other up with simultaneous reads/writes...
comment:3 by , 9 years ago
I wrote a pread/pwrite test that checks their integrity, see https://github.com/bitwiseworks/libcx/commit/a9766b3093131bf7c409eacea1bec5737ebca6da. My testing shows that we do indeed have broken pread and pwrite implementations. The test case works perfectly on Linux and OS X but fails on OS/2. The reason for the failure is, as already mentioned, that OS/2 maintains a system-global file pointer for read/write operations and does not provide a read/write function that operates at a specific offset in a file (the global file pointer has to be moved to that location first). I'm 90% sure that the remaining TDB problems we see are because of that.
I will provide a pread/pwrite implementation that uses a mutex to protect against other threads moving the file pointer on the same file. It will greatly reduce parallelism, but given that OS/2 doesn't support it in the first place, this doesn't look like a problem. And if I'm right in my diagnosis, this will solve the TDB problems.
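As a rough sketch of the idea (not the actual LIBCx code), the seek/read/restore sequence can be wrapped in a lock so that it becomes atomic with respect to other callers of the same wrappers. The names locked_pread, locked_pwrite and locked_io_selfcheck are hypothetical, and a single process-wide pthread mutex is assumed here for brevity; a real implementation may well lock per file and has to deal with other processes too.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: one process-wide lock guards every file pointer move. */
static pthread_mutex_t fileptr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Seek, transfer, restore the old position, all under the lock. */
static ssize_t locked_io(int fd, void *buf, size_t count, off_t offset,
                         int do_write)
{
    ssize_t n = -1;
    pthread_mutex_lock(&fileptr_lock);
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved != (off_t)-1 && lseek(fd, offset, SEEK_SET) != (off_t)-1) {
        n = do_write ? write(fd, buf, count) : read(fd, buf, count);
        int saved_errno = errno;
        lseek(fd, saved, SEEK_SET);
        errno = saved_errno;
    }
    pthread_mutex_unlock(&fileptr_lock);
    return n;
}

ssize_t locked_pread(int fd, void *buf, size_t count, off_t offset)
{
    return locked_io(fd, buf, count, offset, 0);
}

ssize_t locked_pwrite(int fd, const void *buf, size_t count, off_t offset)
{
    return locked_io(fd, (void *)buf, count, offset, 1);
}

/* Integrity check: each thread rewrites and rereads its own slot. */
enum { THREADS = 4, ROUNDS = 200, SLOT = 64 };
static int check_fd;

static void *worker(void *arg)
{
    int id = *(int *)arg;
    char out[SLOT], in[SLOT];
    memset(out, 'A' + id, sizeof out);
    for (int i = 0; i < ROUNDS; i++) {
        off_t pos = (off_t)id * SLOT;
        if (locked_pwrite(check_fd, out, SLOT, pos) != SLOT ||
            locked_pread(check_fd, in, SLOT, pos) != SLOT ||
            memcmp(in, out, SLOT) != 0)
            return arg;                       /* non-NULL signals failure */
    }
    return NULL;
}

/* Returns 0 if no thread ever read back anything but its own data. */
int locked_io_selfcheck(void)
{
    char path[] = "/tmp/locked_ioXXXXXX";     /* hypothetical scratch file */
    check_fd = mkstemp(path);
    if (check_fd < 0)
        return -1;
    unlink(path);

    pthread_t tid[THREADS];
    int ids[THREADS], started = 0, failures = 0;
    for (; started < THREADS; started++) {
        ids[started] = started;
        if (pthread_create(&tid[started], NULL, worker, &ids[started]) != 0) {
            failures++;
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        void *res = NULL;
        if (pthread_join(tid[i], &res) != 0 || res != NULL)
            failures++;
    }
    close(check_fd);
    return failures ? -1 : 0;
}
```

The lock only helps if every file pointer move on the file goes through these wrappers, and a plain pthread mutex covers only the threads of one process. That is the price in parallelism mentioned above: all positioned I/O is serialized.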
comment:4 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
According to many test results from Herwig, the problem with corrupt TDB files has completely gone away. There are sometimes ENOLCK errors, but this is expected: the memory buffer for managing locks in LIBCx currently has a fixed size, and it gets exhausted quite soon when serving many real-life clients (within several days even for buffers as big as 1M). There are some optimizations pending that will solve this, see https://github.com/bitwiseworks/libcx/issues/2 etc.
The first comment should be in the description, but I have no rights to change the descriptions here, apparently. So I'm putting it here.
After fixing fcntl advisory locking in libcx0.dll we basically have two problems left leading to broken service according to Herwig (and except for these two we have a pretty much well-functioning server):
1. tdb_rec_read like this: tdb(D:\MPTN\ETC\samba\lock/wins.tdb): tdb_rec_read bad magic 0x425f5457 at offset=6408.
2. tdb_oob like this: tdb(D:\MPTN\ETC\samba\lock/wins.tdb): tdb_oob len -675900163 beyond eof at 8192.
You may also see these errors that are most likely related to the previous two (at least tdb_oob can always be seen right before them):