Opened 8 years ago

Closed 8 years ago

#266 closed defect (fixed)

Investigate tdb_rec_read and tdb_oob failures

Reported by: dmik Owned by:
Priority: minor Milestone:
Component: Unknown Version: Server 3.6.x
Keywords: Cc:

Description


Change History (4)

comment:1 by dmik, 8 years ago

The first comment should be in the description, but I have no rights to change the descriptions here, apparently. So I'm putting it here.

After fixing fcntl advisory locking in libcx0.dll, we basically have two problems left that lead to broken service according to Herwig (apart from these two, the server functions pretty well):

  1. Failures in tdb_rec_read like this: tdb(D:\MPTN\ETC\samba\lock/wins.tdb): tdb_rec_read bad magic 0x425f5457 at offset=6408.
  2. Failures in tdb_oob like this: tdb(D:\MPTN\ETC\samba\lock/wins.tdb): tdb_oob len -675900163 beyond eof at 8192.

You may also see these errors, which are most likely related to the previous two (at least the tdb_oob failure can always be seen right before them):

tdb(D:\MPTN\ETC\samba\lock/wins.tdb): tdb_transaction_recover: failed to read recovery record
tdb(D:\MPTN\ETC\samba\lock/wins.tdb): tdb_lock failed on list 65 ltype=1 (Error 0)

comment:2 by dmik, 8 years ago

In failed cases tdb_oob is most likely called from within tdb_rec_read to validate the offset to the next record that the just-read record contains (tdb_record.next). Apparently, an incorrect value is read for this offset, which makes the tdb_oob call fail. Looking at tdb_rec_read shows that it eventually uses pread, and its counterpart tdb_rec_write uses pwrite. These functions differ from the usual read and write in that they don't change the file pointer. By design, this should allow reading and writing different places of the file from different threads simultaneously (and this is what TDB seems to expect).
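To make the two failure modes concrete, here is a minimal sketch of the kind of check described above: validate the record magic, then make sure the next-record offset stays inside the file. The struct, the constant and the function names are hypothetical and only illustrate the idea; they are not the actual TDB definitions.

```c
#include <stdint.h>

/* Hypothetical record header: only the two fields discussed in this ticket. */
struct sketch_rec {
    uint32_t next;   /* file offset of the next record in the chain */
    uint32_t magic;  /* marker identifying a valid record */
};

#define SKETCH_MAGIC 0x12345678u  /* placeholder, NOT the real TDB constant */

enum sketch_status { SKETCH_OK = 0, SKETCH_BAD_MAGIC = -1, SKETCH_OOB = -2 };

/* True if the range [off, off + len) lies inside a file of file_size bytes. */
static int sketch_in_bounds(uint64_t off, uint64_t len, uint64_t file_size)
{
    return off <= file_size && len <= file_size - off;
}

/* Validate a header that was just read from disk. */
int sketch_validate(const struct sketch_rec *rec, uint64_t file_size)
{
    if (rec->magic != SKETCH_MAGIC)
        return SKETCH_BAD_MAGIC;
    /* A non-zero next offset must leave room for a whole header. */
    if (rec->next != 0 && !sketch_in_bounds(rec->next, sizeof *rec, file_size))
        return SKETCH_OOB;
    return SKETCH_OK;
}
```

If a read lands at the wrong file position, the bytes in the header are arbitrary record data, so either check can trip: the magic does not match, or the next offset points far beyond the 8192-byte file.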

The pread/pwrite calls are implemented in the kernel in modern Linux. However, in kLIBC they are emulated with lseek and read/write and there is the following note:

 * This API will not work if the file is non-blocking or another
 * thread tries operating on the file while it's executing.

IIRC the file pointer is a global shared resource on OS/2 (and moreover, it's the same for both reading and writing), which means that if several threads or processes manipulate it at the same time, they will interfere with each other.

If I'm right this is a perfect explanation of why TDB records get corrupted (including wrong magic values and weird next offset values).
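A rough sketch of what an lseek-based pread emulation looks like, to show where the race is. This is not the kLIBC source; the function name emulated_pread is made up for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical emulation of pread on top of lseek + read. */
ssize_t emulated_pread(int fd, void *buf, size_t len, off_t offset)
{
    /* Remember where the shared file pointer currently is. */
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved == (off_t)-1)
        return -1;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;

    /*
     * Race window: if another thread seeks, reads or writes on the same
     * file right here, the read below starts at the wrong position and
     * returns bytes from some other part of the file.
     */
    ssize_t n = read(fd, buf, len);

    /* Put the file pointer back, preserving errno from read(). */
    int saved_errno = errno;
    lseek(fd, saved, SEEK_SET);
    errno = saved_errno;
    return n;
}
```

With a single thread this behaves like pread. With two threads on the same file, nothing prevents one thread's lseek from landing between the other thread's lseek and read, which would match the bad magic and garbage next offset symptoms.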

I will provide libtdb.a with more logging to Herwig to get some more info. I also need to check what happens with file locking at that time, though advisory file locking per se doesn't solve this problem, because two processes may hold non-conflicting locks on non-overlapping locations and still disrupt each other with simultaneous reads and writes.

comment:3 by dmik, 8 years ago

I wrote a pread/pwrite test that checks their integrity, see https://github.com/bitwiseworks/libcx/commit/a9766b3093131bf7c409eacea1bec5737ebca6da. My testing shows that we indeed have a broken pread and pwrite implementation. The given test case works perfectly on Linux and OS X but fails on OS/2. The reason for the failure is, as already mentioned, that OS/2 maintains a system-global file pointer for read/write operations and does not provide a read/write function that operates at a specific offset in a file (the global file pointer has to be moved to that location first). I'm 90% sure that the remaining TDB problems we see are because of that.

I will provide a pread/pwrite implementation that uses a mutex to protect from other threads moving the file pointer on the same file. It will greatly reduce parallelism but given that OS/2 doesn't support it in the first place, it doesn't look like a problem. And if I'm right in my diagnosis, this will solve the TDB problems.
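A minimal sketch of the mutex idea, assuming POSIX threads. The names locked_pread, locked_pwrite and file_pos_lock are hypothetical, and the single process-wide mutex is a simplification: it only serializes callers that go through these two wrappers in the same process.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* One lock for all files, for simplicity (assumption, not the LIBCx design). */
static pthread_mutex_t file_pos_lock = PTHREAD_MUTEX_INITIALIZER;

/* Put the file pointer back without clobbering errno from read/write. */
static void restore_pos(int fd, off_t saved)
{
    int saved_errno = errno;
    lseek(fd, saved, SEEK_SET);
    errno = saved_errno;
}

ssize_t locked_pread(int fd, void *buf, size_t len, off_t offset)
{
    ssize_t n = -1;
    pthread_mutex_lock(&file_pos_lock);
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved != (off_t)-1 && lseek(fd, offset, SEEK_SET) != (off_t)-1) {
        n = read(fd, buf, len);   /* nobody else can seek while we hold the lock */
        restore_pos(fd, saved);
    }
    pthread_mutex_unlock(&file_pos_lock);
    return n;
}

ssize_t locked_pwrite(int fd, const void *buf, size_t len, off_t offset)
{
    ssize_t n = -1;
    pthread_mutex_lock(&file_pos_lock);
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved != (off_t)-1 && lseek(fd, offset, SEEK_SET) != (off_t)-1) {
        n = write(fd, buf, len);
        restore_pos(fd, saved);
    }
    pthread_mutex_unlock(&file_pos_lock);
    return n;
}
```

The seek, the transfer and the restore become one critical section, so the file pointer can no longer move between the lseek and the read or write. The cost is that all positioned I/O is serialized, which is the reduced parallelism mentioned above.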

comment:4 by dmik, 8 years ago

Resolution: fixed
Status: new → closed

According to many test results from Herwig, the problem with corrupt TDB files has completely gone away. There are ENOLCK errors sometimes, but this is expected because the memory buffer for managing locks in LIBCx currently has a fixed size and gets exhausted quite soon when serving many real-life clients (within several days even for buffers as big as 1M). There are some optimizations pending that will solve this, see https://github.com/bitwiseworks/libcx/issues/2 etc.
