Opened 7 years ago
Closed 7 years ago
#275 closed enhancement (fixed)
python: Make subprocess use spawn instead of fork+exec
Reported by: | dmik | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | python | Version: | |
Severity: | low | Keywords: | |
Cc: |
Description
OS/2 uses the Posix codepath in subprocess.Popen that results in fork+exec to be used to start a new child. This is very inefficient on OS/2 given that parent data pages are deeply copied (Linux uses copy-on-write which is not available on OS/2 on the kernel level). Given that the subprocess interface clearly defines a child as having no any relation with the parent (other than regular file i/o inheritance and termination notification) this is a big overkill.
The WIndows code uses the StartProcess WINAPI (which is exposed to Python). We may simply use the spawn C api (which is internally DosExecPgm) and this looks like the right thing to do because fork on Posix there is a clear misuse (I guess there was no portable spawn implementation in Posix when this code was written).
Change History (10)
comment:2 by , 7 years ago
To do some research of memory consumption in Python, I used https://pypi.python.org/pypi/guppy/ (as per https://www.pluralsight.com/blog/tutorials/how-to-profile-memory-usage-in-python recommendation). Here is what I get for a case when DosExecPgm
still works:
Partition of a set of 313111 objects. Total size = 23743552 bytes. Index Count % Size % Cumulative % Kind (class / dict of class) 0 195102 62 9221720 39 9221720 39 str 1 42341 14 3862848 16 13084568 55 list 2 1581 1 2519016 11 15603584 66 dict of 0x2049ed8c 3 4574 1 2246960 9 17850544 75 dict (no owner) 4 2422 1 1259440 5 19109984 80 dict of 0x204a936c 5 15945 5 651724 3 19761708 83 tuple 6 3942 1 441504 2 20203212 85 __builtin__.set 7 209 0 341768 1 20544980 87 dict of module 8 7620 2 335280 1 20880260 88 Node.Nodu 9 4063 1 292536 1 21172796 89 types.CodeType <915 more rows. Type e.g. '_.more' to view.> Data from unreachable objects. Partition of a set of 1439 objects. Total size = 178964 bytes. Index Count % Size % Cumulative % Class 0 165 11 82344 46 82344 46 dict 1 123 9 54852 31 137196 77 type 2 490 34 15228 9 152424 85 str 3 304 21 10048 6 162472 91 tuple 4 103 7 7456 4 169928 95 list 5 102 7 4080 2 174008 97 __builtin__.weakref 6 58 4 2088 1 176096 98 __builtin__.wrapper_descriptor 7 32 2 1024 1 177120 99 types.GetSetDescriptorType 8 13 1 416 0 177536 99 __builtin__.method_descriptor 9 9 1 288 0 177824 99 types.BuiltinFunctionType <38 more rows. Type e.g. '_.more' to view.>
Here is what I get for a failing case:
Partition of a set of 321835 objects. Total size = 24400496 bytes. Index Count % Size % Cumulative % Kind (class / dict of class) 0 201729 63 9499144 39 9499144 39 str 1 43812 14 4082816 17 13581960 56 list 2 1581 0 2519016 10 16100976 66 dict of 0x203b326c 3 4578 1 2278224 9 18379200 75 dict (no owner) 4 2566 1 1334320 5 19713520 81 dict of 0x20464d8c 5 15972 5 652624 3 20366144 83 tuple 6 4101 1 459312 2 20825456 85 __builtin__.set 7 210 0 342288 1 21167744 87 dict of module 8 7620 2 335280 1 21503024 88 Node.Nodu 9 4066 1 292752 1 21795776 89 types.CodeType <921 more rows. Type e.g. '_.more' to view.> Data from unreachable objects. Partition of a set of 1440 objects. Total size = 178996 bytes. Index Count % Size % Cumulative % Class 0 165 11 82344 46 82344 46 dict 1 123 9 54852 31 137196 77 type 2 490 34 15228 9 152424 85 str 3 306 21 10108 6 162532 91 tuple 4 103 7 7456 4 169988 95 list 5 102 7 4080 2 174068 97 __builtin__.weakref 6 58 4 2088 1 176156 98 __builtin__.wrapper_descriptor 7 32 2 1024 1 177180 99 types.GetSetDescriptorType 8 13 1 416 0 177596 99 __builtin__.method_descriptor 9 9 1 288 0 177884 99 types.BuiltinFunctionType <37 more rows. Type e.g. '_.more' to view.>
As you see, memory usage by Python in the first case is around 23 MB (313 K objects), and 24 MB (321 K objects). Frankly, it doesn't look impressive, 24 MB is just nothing. So I have to suspect the kLIBC fork()
implementation then. As a wild guess, Python can create fragmented memory and fork()
data page replication code somehow freaks out corrupting some area needed for DosExecPgm
. Still, it looks easier to switch subprocess.Popen
to spawn()
than to fix fork()
.
comment:3 by , 7 years ago
The good news is that I ported subprocess.Popen
to use spawn()
instead of fork()
and it works great. The bad news is that it somehow doesn't solve the Waf problem. It looks like Python does something which breaks DosExecPgm execution in some way. The child process starts per se, but a subsequent DosWaitChild returns status 127 from the child (regardless of whether it's a kLIBC exe or an old-school OS/2 exe like CMD.EXE). I have no idea why and I don't know how to debug such a case, frankly saying.
comment:4 by , 7 years ago
It turned out that DosExecPgm error was freaking out because samba scripts for waf would incorrectly handle path separators in a PATH-like variable which resulted in an environment variable too large for OS/2 to handle it (more than 32k?), hence ERROR_BAD_ENVIRONMENT. Fixing it in Samba (https://github.com/bitwiseworks/samba-os2/commit/aa479ad8809460e13948e791b598aefbb23909ad) made the problem go away.
However, using spawn()
instead of fork()
is still a good thing so I will commit it (after some more local testing).
comment:5 by , 7 years ago
Found interesting thing. Despite that I rewrote subprocess
to use spawn()
, currently it still uses fork() + exec()
. I was surprised when I found it out (but good that I checked). The reason is that os.py
provides a os.spawn
emulation on systems where no native spawn
function is exported. And while posixmodule.c
provides a spawn
implementation, it was not compiled due to the lack of the HAVE_SPAWNV define (which got lost when Yuri built the new Python using configure and against kLIBC (so called os2knix
) as opposed to the previous build which identified itself as os2emx
as it was built against EMX and used a special pyconfig.h
with all the defines).
I need to resurrect this native spawn
code and make it work (a simple rebuild with HAVE_SPAWNV makes the test scripts hang so far).
comment:6 by , 7 years ago
When implementing the spawn
version of Popen.communicate()
I had to use two threads that read out stdout and stderr, respectively (just like Windows does). And this introduces some unpleasant problem with threads in Python. Sometimes I get the test app stuck. As I have a suspicion that this is a generic problem not specific to the subprocess
module, I will create a new ticket for that.
comment:7 by , 7 years ago
The stuck python app was solved within #276. Now everything works well, I only have to sort out file descriptor inheritance: the Popen
constructor has a close_fds
parameter which, if True
, should prevent file handles from being inherited by the child. In fork
mode the Unix code was used on OS/2 (which would close all file descriptors except 0, 1 and 2 in the forked child before calling exec
with the right program). However, in spawn
mode we have more things to do: first, we need to disable inheritance for each file descriptor in the parent before calling spawn
, then we need to re-enable inheritance for those descriptors which had it enabled (so that we have to memorize its original state).
comment:8 by , 7 years ago
BTW, using spawn
instead of fork
does indeed provide some noticeable performance improvement which is seen well when configuring Firefox 52 (which is heavily based on python). Compare the fork
mode times:
[9 Nov 2017 00:55:34, python mach configure] 0:04.22 C:/USR/BIN/make.EXE -f client.mk -s MOZ_PARALLEL_BUILD=4 -s configure 0:11.09 client.mk:204: D:/Coding/mozilla/master-build/.mozconfig.mk: No such file or directory 0:17.30 Clobber not needed. 0:29.85 cd D:/Coding/mozilla/master-build 0:29.85 D:/Coding/mozilla/master/configure 0:31.13 Creating Python environment 0:40.13 New python executable in D:/Coding/mozilla/master-build/_virtualenv/bin/python 0:40.13 Installing setuptools, pip, wheel...done. 0:42.33 platform os2knix is not supported 0:42.33 0:42.33 Error processing command. Ignoring because optional. (optional:setup.py:python/psutil:build_ext:--inplace) 0:42.33 Reexecuting in the virtualenv
and the spawn
mode times:
[9 Nov 2017 00:58:18, python mach configure] 0:03.13 C:/USR/BIN/make.EXE -f client.mk -s MOZ_PARALLEL_BUILD=4 -s configure 0:08.77 client.mk:204: D:/Coding/mozilla/master-build/.mozconfig.mk: No such file or directory 0:14.00 Clobber not needed. 0:24.61 cd D:/Coding/mozilla/master-build 0:24.65 D:/Coding/mozilla/master/configure 0:25.77 Creating Python environment 0:33.69 New python executable in D:/Coding/mozilla/master-build/_virtualenv/bin/python 0:33.69 Installing setuptools, pip, wheel...done. 0:35.10 platform os2knix is not supported 0:35.10 0:35.10 Error processing command. Ignoring because optional. (optional:setup.py:python/psutil:build_ext:--inplace) 0:35.10 Reexecuting in the virtualenv
The initial configure stage does A LOT of python (a lot of external command executions as well I suppose) and you see a 7 second difference which is quite a bit for 40 seconds of runtime (i.e. the speed goes up by aprox. 12%).
comment:9 by , 7 years ago
I'm also considering if we should preserve the fork
mode on OS/2 via a special parameter Popen
just in case if we need 100% Unix compatibility here when porting some app. This is relatively easy to do as all the Posix code is still there (we will only have to use select.socketpair
directly instead of os.pipe
as the later now becomes a call to LIBC pipe()
instead of socketpair()
on OS/2).
comment:10 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
The primary reason for this change is waf (a python based make framework) which is used in Samba 4. And in the current implementation Python fails to start a child process after creating a certain number of internal build task objects (here it starts failing with 1500 of such objects). Note that fork() itself works well, ie a new child process gets created as a copy of the parent Python process. However an attempt to start a right executable (gcc.exe in this case) within this forked process with exec results in DosExecPgm returning ERROR_BAD_ENVIRONMENT. I have never seen this error and don’t know what it means exactly but my guess is that cloning the parent Python process data pages in the forked child somehow exhausts some system resources and/or breaks some segment layout needed for DosExecPgm when there is too many data (too many Python objects) to clone.
Note that changing it to spawn() will let other use cases benefit as well, both in terms of speed (cloning process data requires some time) and memory footprint (a clean process will use much less memory than a fully cloned Python process with a lot of objects). Mozilla, which now uses Python as a make framework, is one example. Currently, it’s quite slow and memory consuming.