Opened 13 years ago

Closed 13 years ago

#199 closed defect (fixed)

Piping lage amounts of data from/to a child process fails

Reported by: rudi Owned by:
Priority: major Milestone: Qt 4.7
Component: QtCore Version: 4.6.3
Severity: medium Keywords:
Cc:

Description

Something seems to be broking with the QIODevice interface to QProcess. This shows up, when the amount of data passed through the pipe is relatively large (100KB+). In "debug" configuration, the program exits with the following message:

ASSERT: "pipeData[InPipe].bytesLeft >= written" in file io\qprocess_os2.cpp, line 1390

In "release" configuration, some data is lost. Piping in the opposite direction is unreliable as well.

Attachments (4)

qprocess_failure.zip (17.6 KB) - added by rudi 13 years ago.
outpipe.zip (27.5 KB) - added by rudi 13 years ago.
testcase for output
qprocess_logs.zip (23.0 KB) - added by rudi 13 years ago.
Log files
qprocess_os2.diff (1.6 KB) - added by rudi 13 years ago.
Attempt to fix blocking on DosRead?

Download all attachments as: .zip

Change History (43)

Changed 13 years ago by rudi

Attachment: qprocess_failure.zip added

comment:1 Changed 13 years ago by Silvan Scherrer

Milestone: Qt EnhancedQt 4.7

comment:2 Changed 13 years ago by rudi

Here is one more example for the problem. This code is pretty much the same as used by QtCreator?. It produces 100% CPU load until QProcess::waitForFinished() times out. Be sure that "test.ui" has unix-style line endings. But that's another story...

#include <QFile>
#include <QDebug>
#include <QProcess>

int main( int argc, char *argv[] )
{
    qint64 bytes;
    QProcess uic;
    QByteArray data;

    QFile f( "test.ui" );
    if( f.open( QIODevice::ReadOnly ) ) {
        data = f.readAll();
        f.close();
    }
    qDebug() << data.size() << "bytes read from file.";

    uic.start("UIC.EXE", QStringList(), QIODevice::ReadWrite);

    uic.waitForStarted();
    qDebug() << "started";

    bytes = uic.write(data);
    qDebug() << bytes << "bytes written to child process.";

    uic.closeWriteChannel();
    qDebug() << "waiting...";

    if (uic.waitForFinished()) {
        qDebug() << "finished";
        data = uic.readAllStandardOutput();
        qDebug() << "done.";
    } else {
        qDebug() << "failed:" << uic.errorString();
        uic.kill();
    }

    return 0;
}

comment:3 Changed 13 years ago by Dmitry A. Kuminov

Rudi, thank you for the report and for the test cases.

I see two problems:

  1. The assertion at line 1390 of qprocess_os2.cpp is not really correct. It is fairly possible that OS/2 reports there is not enough free bytes in the pipe buffer but DosWrite?() of a higher amount of bytes succeeds (simply because more free space became available after the last check and before DosWrite?()).
  1. It turns out that if one application closes its end of the pipe after writing a chunk of data to it, there is no guarantee that another application will be given a chance to read this chunk before the pipe is actually destroyed.

The last problem is the reason of the truncated data on the child end: calling QProcess::closeWriteChannel() will close the parent end of the pipe regardless of whether the child has actually read the last chunk written to it.

Last edited 13 years ago by Dmitry A. Kuminov (previous) (diff)

comment:4 Changed 13 years ago by Dmitry A. Kuminov

I found two more problems of the current implementation. They turned to be easy to catch on SMP, especially when the application reads/writes data in large chunks close to the pipe buffer size (64K).

  1. When working in sync mode (waitForBytesWritten(), waitForReadyRead() etc), it is possible that the application enters the sync mode right after the worker thread sends an async signal for the next block of data available in the pipe. Since the worker notifies only once for each block, noone will notify the application waiting inside of waitFor() and the async message is of no use because the event loop is not run. This results in a hang.
  1. When working in async mode, it is possible that the worker thread is signaled about the changes in the pipe right after counter of bytes available for reading form the pipe is reset to zero and before the actual data is read. As a result, the worker will issue another async signal about the (same) amount of bytes being available, but when the application processes this signal, there is actually no more data (it was read by the previous signal) so it blocks. Hang again.

Problem 4 is REALLY strange -- reading/writing to the pipe must never block since we create it with the NP_NOWAIT flag (non-blocking mode). I have no idea why it blocks. All in all, the way how pipe notification is implemented in OS/2 is a complete crap. The main problem is that in notifications you only get the number of data available at the given point in time, not the amount of *new* bytes since the last call. This makes implementing cumulative asynchronous reading/writing very complex without any need.

Last edited 13 years ago by Dmitry A. Kuminov (previous) (diff)

comment:5 Changed 13 years ago by Dmitry A. Kuminov

Problem 2 is fixed in r911. The other three will be fixed at once since they are all related and intermixed.

comment:6 Changed 13 years ago by Dmitry A. Kuminov

The pipe semaphore implementation is just insane in OS/2. It keeps being repeatedly posted for the same event (availability of the new block of data or space in the buffer) until there is another event. And there is no way to separate these repeated events from unique ones. This creates a potential hole for missed notifications and either hangs or truncated data if the semaphore is waited on one thread and data is processed (read/written) on another thread (of course, how else would you expect it to be used, guys?!).

I will have to workaround it. It is possible to sort out duplicates if sorting happens on the same thread that actually reads/writes data.

comment:7 Changed 13 years ago by Dmitry A. Kuminov

I implemented the new scheme in r913. It seems to fix all the problems described above. In particular, I performed some load testing on SMP (see process-sync-test.cmd in tests) and it behaves well. I also played around with process-async to check that I didn't break something obvious.

I will test it a bit more tomorrow before closing.

comment:8 Changed 13 years ago by rudi

I did a quick test and I'm not 100% convinced that it's the "gold" revsion of the fix. I have a somewhat esotheric example which uses a QXmlStreamReader feed by a QProcess that extracts an XML file from a 7-zip archive. The reader will return a "Premature end of document" error at the first line in the file. So I guess, soemthing is not quiet right.

comment:9 Changed 13 years ago by Dmitry A. Kuminov

Thanks for testing! Could you please give that example to me so that I save some time trying to reproduce it here?

comment:10 Changed 13 years ago by Dmitry A. Kuminov

Or could you at least test it with the process-async.exe test case and give me the debug log (search for QPROCESS_DEBUG in qprocess*.*)?

I tried extracting a text file from a zip archive to stdin unzip -p file.zip file.txt in that test case and the text file correctly appeared in the text window where it was read from QProcess.

comment:11 Changed 13 years ago by Dmitry A. Kuminov

However, the above UIC example doesn't work here (100% CPU load and no output). Interesting that when run as shown above, it also creates a QXmlStreamReader on an QIODevice constructed from stdin.

When QXmlStreamReader is created on a regular QFile device, UIC works as expected. It also works well if stdin is fed by cmd.exe (i.e. uic < test.ui). Strange.

So far, only UIC behaves this way. Other tools that read from stdin and write to stdout work well with QProcess. Also, the following simple program (that just puts to stdout all it gets on stdin) works well:

    while( (n = fread(buffer, 1, sizeof(buffer), stdin)) > 0 )
    {
        fwrite(buffer, 1, n, stdout);
    }

comment:12 Changed 13 years ago by rudi

Could you please give that example to me

I have to extract the code from a (GUI-) program I did for a client. Maybe this weekend... However, I noticed another oddity: Qt Creator stops during the build process of Converseen. It's not blocking. It just sits there and waits for something. I'm quite positive that this is related to the capturing of the compiler output into the IDE. What's kinda unusual with Converseen is, that there are quite a bunch of stderr messages generated and I cannot remember having seen only a single one of them before Creator enters idle state.

comment:13 Changed 13 years ago by Dmitry A. Kuminov

The following simple Qt-based equivalent of the test case from comment:11

#include <QFile>
#include <QDebug>

int main (int argc, char *argv [])
{
    QFile in;
    in.open (stdin, QIODevice::ReadOnly);

    QFile out;
    out.open (stdout, QIODevice::WriteOnly);

    while (!in.atEnd())
    {
        QByteArray line = in.readLine();
        out.write (line);
    }
}

does NOT work and shows the very same problem as UIC itself. Probably all your failing cases are just consequences.

Apparently, when reading from a QFile constructed on stdin, the child process doesn't notice the closure of the stream (which is performed by the parent when it successfully writes all data to child's stdin) and hence it just hangs waiting for more data. Digging in.

comment:14 Changed 13 years ago by Dmitry A. Kuminov

It not only hangs but spins in the loop. This may explain 100% CPU load of UIC (#213).

comment:15 Changed 13 years ago by Dmitry A. Kuminov

I found the culprit: it's the feof() LIBC call. It never returns true when stdin is a pipe and the other end of the pipe gets closed. Since feof() is what gets used by QFile::atEnd() on sequential streams (which stdin is as well), loops like while (!file.atEnd()) never end.

Of course, while (!reader.atEnd()) is what is used by UIC as well when it parses the input .UI file.

Last edited 13 years ago by Dmitry A. Kuminov (previous) (diff)

comment:16 Changed 13 years ago by Dmitry A. Kuminov

An obvious fix is to change line 892 of qfsfileengine.cpp from

        return feof(d->fh);

to

        return feof(d->fh) || ferror(d->fh);

but I first want to make some more tests (in particular, on Linux) to determine if it is really correct and if it should be an OS/2 only change or the cross-platform one.

Rudi, please try this change locally to see if it fixes your problems.

comment:17 Changed 13 years ago by rudi

O.K., the above code fixes the UIC hang on close. It does not help with #213. So that is probably caused by something else.

Regarding to my QXmlStreamReader issue, I've found one thing this is not strictly a QProcess problem: As I said, I used 7-zip as (de-)compression tool. This programm contains some code that is supposed to find out if stdout and stderr refer to the same terminal and return an error if this is the case and decompression to stdout is requested. It does this by calling isatty() on the file identifiers. While this is not correct anyway (isatty() returns "1" on any character device, including "nul"), it shows an interesting aspect of our QProcess implementation: isatty(fileno(stdXXX)) called from a spawned process will return "1". I didn't expect that as these handles refer to pipes. I have modified the offending code part in 7-zip (it's my own port anyway) and feeding the QXmlStreamReader from a QProcess works just fine now. Unfortunately, there is also a problem with the opposite direction. Find an example attached. The program is supposed to be equivalent to:

7za a test.7z -simyfile.ui < test.ui

With the file "test.ui" appended 10 times. Find the latest 7-zip version here: http://telefon.s-t.de/rudi/p7zip-os2-test.exe

Changed 13 years ago by rudi

Attachment: outpipe.zip added

testcase for output

comment:18 Changed 13 years ago by Dmitry A. Kuminov

Okay. #213 looks different indeed. The 100% CPU load in my case was caused by the atEnd() loop.

Regarding isatty(), I'd say it has nothing to do with our QProcess implementation, but rather with LIBC. QProcess does nothing but creates piped duplicates for stdin/stderr/stdout and then calls LIBC's spawn() (or DosStartSession?() when session types differ).

comment:19 Changed 13 years ago by rudi

O.K., I think I found out what causes #213. The problem is that we open a QFile with an existing file handle (stdin). Unfortunately, stdin is in text mode, while the QFile implementation expects it to be binary. I'm wondering, if we should enforce binary mode in QFSFileEnginePrivate::openFh()...

In the notes to bool QFile::open ( FILE * fh, OpenMode? mode ) they write:

\a fh must be opened in binary mode (i.e., the mode string must contain
'b', as in "rb" or "wb") when accessing files and other random-access
devices. Qt will translate the end-of-line characters if you pass
QIODevice::Text to \a mode. Sequential devices, such as stdin and stdout,
are unaffected by this limitation.

The last sentence doesn't really make sense to me...

comment:20 Changed 13 years ago by Dmitry A. Kuminov

Did some digging into LIBC. Created http://svn.netlabs.org/libc/ticket/246 as this seems like a LIBC issue to me indeed. As a workaround, I will try to disable inheritance of handles 0,1,2 in LIBC with fcntl().

comment:21 Changed 13 years ago by Dmitry A. Kuminov

Yes, this comment puzzles me as well. But given that they workaround this issue in UIC (and we added it for OS/2 as well in r917) they don't change the stdin/out/err mode on Windows. So I wonder why there is no hogging there. Did you actually find the exact line of code (or loop) in Qt sources that creates this hogging?

comment:22 Changed 13 years ago by Dmitry A. Kuminov

In r918, I did a much better thing than disabling handle inheritance: I used dup()/dup2() instead of the direct DosDupHandle?() call and this expectedly fixed the handle states in the child process. This actually draws ticket 246 in LIBC a no-problem, so I closed it. Rudi, please check if it now works with your 7zip machinery w/o the workaround you applied to it.

Last edited 13 years ago by Dmitry A. Kuminov (previous) (diff)

comment:23 Changed 13 years ago by Dmitry A. Kuminov

Applied the QFile::atEnd() fix from comment:16 in r923. Also created http://svn.netlabs.org/libc/ticket/247 for discussing the problem within LIBC.

comment:24 Changed 13 years ago by Dmitry A. Kuminov

Resolution: fixed
Status: newclosed

I've played with Qt Creator a bit by making it build and run applications including our Qt4 tests bundle, http://svn.netlabs.org/repos/qt4/tests (which involves running qmake, make, gcc and friends), as well as performing SVN commands right from Qt Creator and building a simple project that involves .ui compilation -- all works.

I think this issue may be closed. Regarding #213, let's continue in the appropriate defect.

comment:25 Changed 13 years ago by rudi

Resolution: fixed
Status: closedreopened

I still have trouble with the new version. When building Converseen, Qt Creator will stop in the middle without any activity. See comment:12. When attempting to close it, the window disappears, but the Qt Creator process will not vanish. Needs to be killed. It does not do that with the old QProcess and I'm quite prositive that is has to do with the fact that this build produces quit a bunch of stderr messages. Find the sources here: http://telefon.s-t.de/rudi/converseen_test.zip. I did a shadow build.

Also I still can't pipe large amounts of data to a child. Did you test the attachment from comment:17 ? The unmodified version of 7-zip can is here: http://hobbes.nmsu.edu/download/pub/os2/util/archiver/p7zip-9.20.1-os2.zip.

Forgot to mention: r918 fixes the input issue.

Last edited 13 years ago by rudi (previous) (diff)

comment:26 Changed 13 years ago by Dmitry A. Kuminov

I've checked your 7z testcase. It was tricky but I found the reason: it must be using ferror() to detect if reading the input data fails and aborts the process if ferror() returns non-zero (as opposed to e.g. zip or child.exe from tests/process that accept such a case).

The reason why ferror() returns non-zero is that when the server process issues DosDisconnectNPipe() on its end, DosRead?() on the client process aborts with ERROR_PIPE_NOT_CONNECTED which is then translated by LIBC to an ferror() condition (and to errno 57, ENOTCONN). See the updated http://svn.netlabs.org/libc/ticket/247/ for more details.

Turns out that the simplest way to do so is to get rid of the DosDisconnectNPipe() call in QProcess and just issue DosClose?(). In this case, client's DosRead?() simply returns 0 which is translated to EOF. Surprise, this also fixes the inverted feof() value and hence the correct QFile::atEnd() behavior without the fix applied in r923.

So, I applied this workaround in r928. I also reverted r923 since the new fix makes it unnecessary.

Version 0, edited 13 years ago by Dmitry A. Kuminov (next)

comment:27 Changed 13 years ago by Dmitry A. Kuminov

Resolution: fixed
Status: reopenedclosed

Regarding comment:12. Tried it. Stupid typo, fixed in r931. The problem would only appear in async mode and only when the amount of data written to stderr became bigger than the pipe buffer size (65k). So no surprise that process-sync-test didn't find that out. Works now.

PS. Regarding the hang of Qt Creator, it looks like its own bug. They seem to call QFuture::waitForFinished() on the GUI thread when canceling the build process which kind of blocks the thread.

comment:28 Changed 13 years ago by rudi

The output stuff seems to work nicely now. Also the build stoppage in Creator is fixed. My gut feeling that this could be related to stderr was not so wrong...

However, even though all the tests are running fine now, I still have a problem in my app. It's again with the input side and it used to work yesterday. Under some circumstances the app gets stuck on QProcess::waitForReadyRead(). Even though a timeout is specified. Could it be, that the change in the QProcess logic had some side effect on the signal/ready state generation ?

Last edited 13 years ago by rudi (previous) (diff)

comment:29 Changed 13 years ago by Dmitry A. Kuminov

Well, yesterday the logic was exactly the same as today. So if you used the up-to-date builds, nothing has changed in this regard since then. However, the signal/ready logic did change significantly since I started working on this defect 12 days ago.

It's hard to say without the exact example. The only possible place I see for that is DosRead?() waiting for more data. This is absurd (since all pipes we create are NP_NOWAIT) but I saw it once. And fixed this by only reading the amount of data that is actually in the pipe (according to DosPeekNPipe()), so this may get out of sync only if there is somebody else reading from the same handle.

In either case please try to at least collect logs (QPROCESS_DEBUG) to see what's going in. Or give me your example.

comment:30 Changed 13 years ago by rudi

I tried to isolate the problem but it's pretty weird. I extracted the code in question into a small console app. Unfortunately, there it works ! It only fails in the real world case (GUI). Furthermore, when QPROCESS_DEBUG is enabled, the failure scenario is a different one. Find the logs as well as the console app source attached.

Changed 13 years ago by rudi

Attachment: qprocess_logs.zip added

Log files

comment:31 Changed 13 years ago by Dmitry A. Kuminov

Rudi, I looked at the logs. In the GUI case the child process just dies right after startup, after putting 1004 bytes to stdout (they begin with "7-Zip (A) 9.20..." so probably it's an error message). I see nothing suspicious so far.

Please check what these 1004 bytes are about.

JFYI, the only way QProcess can detect the death of the started process is receiving SIGCHLD signal. So if we see the death in the logs it means this signal was definitely sent to us by LIBC.

comment:32 Changed 13 years ago by rudi

No, it's not an error message. The command lists the content of an archive. Depending on the caching, this can be very quick. The 1004 bytes are expected, but never arrive in my application. I.e. the sequence:

QProcess::start()
QProcess::waitForStarted()
if( QProcess::waitForReadyRead() )
    QProcess::readAll()

doesn't seem to work in this case.

Last edited 13 years ago by rudi (previous) (diff)

comment:33 Changed 13 years ago by Dmitry A. Kuminov

Okay, it's clear now. The above code is not fully correct.

What happens is that the death notification gets delivered before the "data available" one. The cross-platform death handler confirms that this may also happen on other platforms as well which is illustrated by the following code at line 1039 in qprocess.cpp:

    // in case there is data in the pipe line and this slot by chance
    // got called before the read notifications, call these two slots
    // so the data is made available before the process dies.
    _q_canReadStandardOutput();
    _q_canReadStandardError();

Such a case will cause waitForReadyRead() to return false even if there was some data before the death because it all has been read inside _q_processDied() and nothing is left for waitForReadyRead() to pick up.

While I could fix that behavior on OS/2 (so that waitForReadyRead() returns true in the above scenario), I tend not to do that since 1) this is not an explicit requirement of QProcess::waitForReadyRead() and 2) I don't want to introduce inconsistent cross-platform behavior (it turns out that life is easier if different platforms have also identical "bugs" in addition to identical features).

The reason why you don't see it in the console program is because the console program doesn't deliver notifications asynchronously and therefore it's impossible that the death signal comes before the read/write signal.

The cross-platform solution for your problem may look like this:

start();
if (waitForStarted())
{
    waitForFinished();
    QProcess::readAll();
}

This way, you will reliably get everything the process wrote to the pipe before its death.

comment:34 Changed 13 years ago by rudi

Well, that is only a part of the problem. I have to think about this a bit more...

But there's another problem. On comment:28 I talked about occasional hangs in QProcess::waitForReadyRead(). I traced that down and found the following: The process hangs in readfromStderr() called via:

  QProcessPrivate::waitFor()
  QCoreApplication::sendPostedEvents(q, QEvent::MetaCall);
    QProcessPrivate::_q_notified()
    QProcessPrivate::_q_canReadStandardError()
    QProcessPrivate::readfromStderr()

The QProcess log shows the following:

QProcess::bytesAvailable() == 0 (stdout)
QProcessPrivate::waitFor(ReadyRead, 10000)
------------------------------------------------- before sendPostedEvents() 


QProcessPrivate::_q_notified: flags 2
QProcessPrivate::bytesAvailableFromStderr() == 220
------------------------------------------------- entering _q_canReadStandardError() 
QProcessPrivate::bytesAvailableFromStderr() == 220 (reused)
------------------------------------------------- before readFromStderr 220 requested
------------------------------------------------- after readFromStderr 220 read
QProcessPrivate::readFromStderr(0x204a4a90 "\r\n7-Zip (A) 9.20", 220) == 220
------------------------------------------------- _qNotified exit 


QProcessPrivate::_q_notified: flags 4
QProcessPrivate::canWrite(), not writing anything (empty write buffer).
------------------------------------------------- _qNotified exit 


QProcessPrivate::_q_notified: flags 2
QProcessPrivate::bytesAvailableFromStderr() == 220 (reused)
------------------------------------------------- entering _q_canReadStandardError() 
QProcessPrivate::bytesAvailableFromStderr() == 220 (reused)
------------------------------------------------- before readFromStderr 220 requested

Here we hang.

As you can see, we have three invocations of QProcessPrivate::_q_notified. Two for stderr (read) and one for stdin (write). What appears wrong to me is the second invocation for stderr. When we get called for the first time, we have 220 bytes available. That's O.K. They are read successfully and we see the first few bytes of content. But why do we get the second call that tries to read them again ? Obviously the pipe is empty now and DosRead? will block. Looks like we need to subtract the number of bytes consumed by the read from the number stored in pipeData.

comment:35 Changed 13 years ago by rudi

Resolution: fixed
Status: closedreopened

Changed 13 years ago by rudi

Attachment: qprocess_os2.diff added

Attempt to fix blocking on DosRead?

comment:36 Changed 13 years ago by rudi

We might also want to exchange lines 1472 and 1473 in qprocess_os2.cpp so that waitMode is set to true after we made sure that the semaphore exists.

comment:37 Changed 13 years ago by Dmitry A. Kuminov

Ooops, that's a typo. Fixed in r964. Could theoretically lead to a hang.

Regarding comment:34. Tracing really helps, as usual. Now I can see that waitFor() did not expect that two messages of the same type were in the queue when it called sendPostedEvents(). I fixed it in r969.

Note that the approach in your patch is not fully correct since it is possible that the second notification represents a new payload (which would be lost). Also, locking is not necessary there since all manipulation with the PipeData? fields for sdterr/stdout happens on the same thread.

Note also that I couldn't test it locally since I was not able to reproduce your hanging scenario (after changing the process-sync test case so that it also does synchronous stuff on the GUI thread). I hope it works for you now.

BTW, using the synchronous QProcess data functions on a GUI thread is not a good idea anyway (just in case if you do it there in your program).

comment:38 Changed 13 years ago by rudi

O.K. works for me as well. I'm aware that using QProcess synchronously in a GUI it's not the best solution. However, the program in question is one of those where I get paid for something that does a job. Not for something that is fancy. In other words: Of course it would be nice to do the data import in a separate thread combined with a progress indicator. But my customer doesn't care about these type of things. All he wants is that his employees can do a certain task with the program. And the less time I invest in creating it, the better for me. The OS/2 version was just something I did out of curiosity. The only platform the program is required to run on is Windows.

comment:39 Changed 13 years ago by Dmitry A. Kuminov

Resolution: fixed
Status: reopenedclosed

Great. I get your point and anyway this is what helped us to find the bugs and finally trace them (it's not unusual that bugs hide in the rare use scenarios treated as never possible by the devs), so I don't complain -)

Note: See TracTickets for help on using tickets.