Opened 9 years ago

Closed 9 years ago

Last modified 6 years ago

#176 closed defect (fixed)

Extremly poor performance of QDirIterator

Reported by: rudi Owned by:
Priority: major Milestone: Qt 4.6.3
Component: QtCore Version: 4.6.2
Severity: low Keywords:
Cc:

Description

Certain file operations (like populating a file dialog from a directory containing a lot of files or expiring a QNetworkDiskCache) take *AGES*. Especially when the underlying file system doesn't have good caching capabilities.

The following code:

    QDirIterator it("some_path");

    while (it.hasNext()) {
        QString path = it.next();
    }

takes on a directory (HPFS) containing 3800 files about 50 seconds (!) to complete, while an opendir/readdir - loop finishes in 2..3 secs. On JFS the performance is better, but not great as well.

Investigating...

Attachments (1)

iterator_speedup.zip (2.6 KB) - added by rudi 9 years ago.
Patches for qfsfileengine_iterator.cpp, qfsfileengine_os2.cpp and qfsfileengine_iterator_os2.cpp

Download all attachments as: .zip

Change History (12)

comment:1 Changed 9 years ago by rudi

One thing I've found so far:

If QFSFileEngine::fileFlags() gets called with type == Refresh (i.e. FlagsMask?, TypesMask? and PermsMask? == 0), we should not call doStat(). See src/corelib/io/qfsfileengine_os2.cpp, line 681 ff..

comment:2 Changed 9 years ago by rudi

I think I have a solution, but not sure if you like it...

The huge slowdown seems to be caused by the way the whole processing is implemented. Currently opendir()/readdir() is called to obtain a list of file names which then are individually stat()ed to obtain a QFileInfo. This is pretty wasteful. Especially considering that readdir() internally does a file enumeration loop that retrieves all the information needed but throws it away...

A better way would be to have QFSFileEngineIterator construct a QFIleInfo for the current file out of the information taken from the enumeration process. Unfortunately KLIBCs dirent structure does not contain all the fields neccessary. Thus we have to switch to the native API. Find patches for this approach attached. The main change is, that QFSFileEngineIterator::advance() now builds a QFileInfo structure with all the information already cached. This avoids the need for internally calling stat() when this when currentFileInfo() is requested.

The performance gain in my testcase is quite impressive: 2 seconds vs. 50 seconds on that 3800-files-directory. The file dialog boxes appear to be snappier as well. The time to open the mentioned directory is cut by half. But it still takes awfully long 55 secs. The reason why the gain is not so great here is probably the icon fetching...

One problem, however does arise out of the new implementation: Since now the native API is used, the code will not handle LIBCs symlink emulation (does that work at all ?).

Changed 9 years ago by rudi

Patches for qfsfileengine_iterator.cpp, qfsfileengine_os2.cpp and qfsfileengine_iterator_os2.cpp

comment:3 Changed 9 years ago by dmik

  • Milestone changed from Qt Enhanced to Qt 4.6.3

comment:4 Changed 9 years ago by dmik

Rudi, thank you for the patch! I have some comments.

  • First, I don't mind against using the Dos API -- actually I was thinking of this way when I worked on these classes back then in case if the performance goes not really good or if there are some other issues. What about support for symlinks that kLIBC provides, I don't think we should rely on that functionality at all in Qt because the rest of the system is not aware of that.
  • However, your patch introduces a bit of an inconsistency: QFSFileEngine still continues to use stat() and friends which gives us two (different) sources of information about the same files which may potentially lead to some unexpected side effects. I would really like to have QFSFileEngine moved to the native API as well if we go with your patch. What do you think?

comment:5 Changed 9 years ago by rudi

Hmm, I agree with you about the inconsistency. Let's check how much effort such a change in QFSFileEngine would require...

comment:6 Changed 9 years ago by dmik

The modified version of the patch for the directory iterator is committed in r794. Note that I preserved the old readdir() code by separating the Dos API approach with the QT_OS2_USE_DOSFINDFIRST define (which is on by default). See below why. Note that QT_OS2_USE_DOSFINDFIRST also disables symlink detection for regular file queries outside the directory iteration, to try to be at least logically consistent.

What about the rest, for now, I give up the idea of using The Dos API instead of stat()/fstat() everywhere else. In particular, I couldn't find a libc call to convert a libc file descriptro to an OS/2 file handle (we need this because Qt needs also to stat() a raw file descriptor w/o knowing the file name).

Another reason is that using the Dos API in these other places would not give a significant performance advantage because there is no double file access as in the directory iterator case; stat() internally uses the Dos API too so we would only remove this slight overhead.

And finally, it's more preferable to fix LIBC by adding the missing file statistics (in particular, creation time and access time) to its dirent implementation. This way, we will be able to use stat() for the directory iterator as well which will bring the consistency and symlik support back. I created this libc ticket for that, http://svn.netlabs.org/libc/ticket/229. When it's complete, we may restore the old approach (by fetching necessary statistics directly from dirent, of course) and remove QT_OS2_USE_DOSFINDFIRST and the code it guards.

Rudi, please check if it works for you.

comment:7 Changed 9 years ago by dmik

Note that here walking through a directory with 3000 files also works 3-4 times faster (I use HPFS386, but it's a VM so the absolute time numbers are actually quite abstract).

comment:8 Changed 9 years ago by dmik

Rudi, are you satisfied with the current implementation?

comment:9 Changed 9 years ago by rudi

Yes, it even seems to be a little bit faster than mine. So I guess this ticket can be closed.

comment:10 Changed 9 years ago by dmik

  • Resolution set to fixed
  • Status changed from new to closed

Thanks.

comment:11 Changed 6 years ago by dmik

Rudi, just in case if you still follow our Qt work, could you run your performance tests against the current SVN? There were some important changes to the iterating code (related to symlink processing).

Note: See TracTickets for help on using tickets.