Opened 7 years ago

Closed 7 years ago

#275 closed enhancement (fixed)

python: Make subprocess use spawn instead of fork+exec

Reported by: dmik Owned by:
Priority: major Milestone:
Component: python Version:
Severity: low Keywords:
Cc:

Description

OS/2 uses the Posix codepath in subprocess.Popen that results in fork+exec to be used to start a new child. This is very inefficient on OS/2 given that parent data pages are deeply copied (Linux uses copy-on-write which is not available on OS/2 on the kernel level). Given that the subprocess interface clearly defines a child as having no any relation with the parent (other than regular file i/o inheritance and termination notification) this is a big overkill.

The WIndows code uses the StartProcess WINAPI (which is exposed to Python). We may simply use the spawn C api (which is internally DosExecPgm) and this looks like the right thing to do because fork on Posix there is a clear misuse (I guess there was no portable spawn implementation in Posix when this code was written).

Change History (10)

comment:1 by dmik, 7 years ago

The primary reason for this change is waf (a python based make framework) which is used in Samba 4. And in the current implementation Python fails to start a child process after creating a certain number of internal build task objects (here it starts failing with 1500 of such objects). Note that fork() itself works well, ie a new child process gets created as a copy of the parent Python process. However an attempt to start a right executable (gcc.exe in this case) within this forked process with exec results in DosExecPgm returning ERROR_BAD_ENVIRONMENT. I have never seen this error and don’t know what it means exactly but my guess is that cloning the parent Python process data pages in the forked child somehow exhausts some system resources and/or breaks some segment layout needed for DosExecPgm when there is too many data (too many Python objects) to clone.

Note that changing it to spawn() will let other use cases benefit as well, both in terms of speed (cloning process data requires some time) and memory footprint (a clean process will use much less memory than a fully cloned Python process with a lot of objects). Mozilla, which now uses Python as a make framework, is one example. Currently, it’s quite slow and memory consuming.

Last edited 7 years ago by dmik (previous) (diff)

comment:2 by dmik, 7 years ago

To do some research of memory consumption in Python, I used https://pypi.python.org/pypi/guppy/ (as per https://www.pluralsight.com/blog/tutorials/how-to-profile-memory-usage-in-python recommendation). Here is what I get for a case when DosExecPgm still works:

Partition of a set of 313111 objects. Total size = 23743552 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 195102  62  9221720  39   9221720  39 str
     1  42341  14  3862848  16  13084568  55 list
     2   1581   1  2519016  11  15603584  66 dict of 0x2049ed8c
     3   4574   1  2246960   9  17850544  75 dict (no owner)
     4   2422   1  1259440   5  19109984  80 dict of 0x204a936c
     5  15945   5   651724   3  19761708  83 tuple
     6   3942   1   441504   2  20203212  85 __builtin__.set
     7    209   0   341768   1  20544980  87 dict of module
     8   7620   2   335280   1  20880260  88 Node.Nodu
     9   4063   1   292536   1  21172796  89 types.CodeType
<915 more rows. Type e.g. '_.more' to view.>
Data from unreachable objects.
Partition of a set of 1439 objects. Total size = 178964 bytes.
 Index  Count   %     Size   % Cumulative  % Class
     0    165  11    82344  46     82344  46 dict
     1    123   9    54852  31    137196  77 type
     2    490  34    15228   9    152424  85 str
     3    304  21    10048   6    162472  91 tuple
     4    103   7     7456   4    169928  95 list
     5    102   7     4080   2    174008  97 __builtin__.weakref
     6     58   4     2088   1    176096  98 __builtin__.wrapper_descriptor
     7     32   2     1024   1    177120  99 types.GetSetDescriptorType
     8     13   1      416   0    177536  99 __builtin__.method_descriptor
     9      9   1      288   0    177824  99 types.BuiltinFunctionType
<38 more rows. Type e.g. '_.more' to view.>

Here is what I get for a failing case:

Partition of a set of 321835 objects. Total size = 24400496 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 201729  63  9499144  39   9499144  39 str
     1  43812  14  4082816  17  13581960  56 list
     2   1581   0  2519016  10  16100976  66 dict of 0x203b326c
     3   4578   1  2278224   9  18379200  75 dict (no owner)
     4   2566   1  1334320   5  19713520  81 dict of 0x20464d8c
     5  15972   5   652624   3  20366144  83 tuple
     6   4101   1   459312   2  20825456  85 __builtin__.set
     7    210   0   342288   1  21167744  87 dict of module
     8   7620   2   335280   1  21503024  88 Node.Nodu
     9   4066   1   292752   1  21795776  89 types.CodeType
<921 more rows. Type e.g. '_.more' to view.>
Data from unreachable objects.
Partition of a set of 1440 objects. Total size = 178996 bytes.
 Index  Count   %     Size   % Cumulative  % Class
     0    165  11    82344  46     82344  46 dict
     1    123   9    54852  31    137196  77 type
     2    490  34    15228   9    152424  85 str
     3    306  21    10108   6    162532  91 tuple
     4    103   7     7456   4    169988  95 list
     5    102   7     4080   2    174068  97 __builtin__.weakref
     6     58   4     2088   1    176156  98 __builtin__.wrapper_descriptor
     7     32   2     1024   1    177180  99 types.GetSetDescriptorType
     8     13   1      416   0    177596  99 __builtin__.method_descriptor
     9      9   1      288   0    177884  99 types.BuiltinFunctionType
<37 more rows. Type e.g. '_.more' to view.>

As you see, memory usage by Python in the first case is around 23 MB (313 K objects), and 24 MB (321 K objects). Frankly, it doesn't look impressive, 24 MB is just nothing. So I have to suspect the kLIBC fork() implementation then. As a wild guess, Python can create fragmented memory and fork() data page replication code somehow freaks out corrupting some area needed for DosExecPgm. Still, it looks easier to switch subprocess.Popen to spawn() than to fix fork().

comment:3 by dmik, 7 years ago

The good news is that I ported subprocess.Popen to use spawn() instead of fork() and it works great. The bad news is that it somehow doesn't solve the Waf problem. It looks like Python does something which breaks DosExecPgm execution in some way. The child process starts per se, but a subsequent DosWaitChild returns status 127 from the child (regardless of whether it's a kLIBC exe or an old-school OS/2 exe like CMD.EXE). I have no idea why and I don't know how to debug such a case, frankly saying.

comment:4 by dmik, 7 years ago

It turned out that DosExecPgm error was freaking out because samba scripts for waf would incorrectly handle path separators in a PATH-like variable which resulted in an environment variable too large for OS/2 to handle it (more than 32k?), hence ERROR_BAD_ENVIRONMENT. Fixing it in Samba (https://github.com/bitwiseworks/samba-os2/commit/aa479ad8809460e13948e791b598aefbb23909ad) made the problem go away.

However, using spawn() instead of fork() is still a good thing so I will commit it (after some more local testing).

comment:5 by dmik, 7 years ago

Found interesting thing. Despite that I rewrote subprocess to use spawn(), currently it still uses fork() + exec(). I was surprised when I found it out (but good that I checked). The reason is that os.py provides a os.spawn emulation on systems where no native spawn function is exported. And while posixmodule.c provides a spawn implementation, it was not compiled due to the lack of the HAVE_SPAWNV define (which got lost when Yuri built the new Python using configure and against kLIBC (so called os2knix) as opposed to the previous build which identified itself as os2emx as it was built against EMX and used a special pyconfig.h with all the defines).

I need to resurrect this native spawn code and make it work (a simple rebuild with HAVE_SPAWNV makes the test scripts hang so far).

comment:6 by dmik, 7 years ago

When implementing the spawn version of Popen.communicate() I had to use two threads that read out stdout and stderr, respectively (just like Windows does). And this introduces some unpleasant problem with threads in Python. Sometimes I get the test app stuck. As I have a suspicion that this is a generic problem not specific to the subprocess module, I will create a new ticket for that.

comment:7 by dmik, 7 years ago

The stuck python app was solved within #276. Now everything works well, I only have to sort out file descriptor inheritance: the Popen constructor has a close_fds parameter which, if True, should prevent file handles from being inherited by the child. In fork mode the Unix code was used on OS/2 (which would close all file descriptors except 0, 1 and 2 in the forked child before calling exec with the right program). However, in spawn mode we have more things to do: first, we need to disable inheritance for each file descriptor in the parent before calling spawn, then we need to re-enable inheritance for those descriptors which had it enabled (so that we have to memorize its original state).

comment:8 by dmik, 7 years ago

BTW, using spawn instead of fork does indeed provide some noticeable performance improvement which is seen well when configuring Firefox 52 (which is heavily based on python). Compare the fork mode times:

[9 Nov 2017 00:55:34, python mach configure]

 0:04.22 C:/USR/BIN/make.EXE -f client.mk -s MOZ_PARALLEL_BUILD=4 -s configure
 0:11.09 client.mk:204: D:/Coding/mozilla/master-build/.mozconfig.mk: No such file or directory
 0:17.30 Clobber not needed.
 0:29.85 cd D:/Coding/mozilla/master-build
 0:29.85 D:/Coding/mozilla/master/configure
 0:31.13 Creating Python environment
 0:40.13 New python executable in D:/Coding/mozilla/master-build/_virtualenv/bin/python
 0:40.13 Installing setuptools, pip, wheel...done.
 0:42.33 platform os2knix is not supported
 0:42.33
 0:42.33 Error processing command. Ignoring because optional. (optional:setup.py:python/psutil:build_ext:--inplace)
 0:42.33 Reexecuting in the virtualenv

and the spawn mode times:

[9 Nov 2017 00:58:18, python mach configure]

 0:03.13 C:/USR/BIN/make.EXE -f client.mk -s MOZ_PARALLEL_BUILD=4 -s configure
 0:08.77 client.mk:204: D:/Coding/mozilla/master-build/.mozconfig.mk: No such file or directory
 0:14.00 Clobber not needed.
 0:24.61 cd D:/Coding/mozilla/master-build
 0:24.65 D:/Coding/mozilla/master/configure
 0:25.77 Creating Python environment
 0:33.69 New python executable in D:/Coding/mozilla/master-build/_virtualenv/bin/python
 0:33.69 Installing setuptools, pip, wheel...done.
 0:35.10 platform os2knix is not supported
 0:35.10
 0:35.10 Error processing command. Ignoring because optional. (optional:setup.py:python/psutil:build_ext:--inplace)
 0:35.10 Reexecuting in the virtualenv

The initial configure stage does A LOT of python (a lot of external command executions as well I suppose) and you see a 7 second difference which is quite a bit for 40 seconds of runtime (i.e. the speed goes up by aprox. 12%).

Last edited 7 years ago by dmik (previous) (diff)

comment:9 by dmik, 7 years ago

I'm also considering if we should preserve the fork mode on OS/2 via a special parameter Popen just in case if we need 100% Unix compatibility here when porting some app. This is relatively easy to do as all the Posix code is still there (we will only have to use select.socketpair directly instead of os.pipe as the later now becomes a call to LIBC pipe() instead of socketpair() on OS/2).

comment:10 by dmik, 7 years ago

Resolution: fixed
Status: newclosed

Spawn code is committed in r1253 and r1254. Providing fork mode is postponed via #278.

Note: See TracTickets for help on using tickets.