Threading using less RAM than Popen, why? - python

I have a question about the RAM usage of my scripts. The setup is simple: a startup script opens four copies of a worker script and keeps them running forever.
I tested two versions: one that just calls Popen for each script, and one that uses threading, and I found a huge difference in RAM usage between them.
Since both do essentially the same thing, why does the threading version use so much less RAM than the one that opens the scripts with Popen? What are the advantages and disadvantages of threading vs. Popen?
The three files are below: the worker script (test2.py), the threading version, and the Popen version.
test2.py:
import time

def testing():
    while True:
        test = "Helllo world"
        print(test)
        time.sleep(1)

if __name__ == '__main__':
    testing()
threading.py:
import threading
from test2 import testing

# Pass the function object itself; writing target=testing() calls the function
# immediately in the main thread and blocks before any thread is created.
# (Note: naming the file threading.py also shadows the standard-library module.)
threading.Thread(target=testing).start()
threading.Thread(target=testing).start()
threading.Thread(target=testing).start()
Popen:
from subprocess import Popen

for _ in range(4):
    Popen("py test2.py", shell=True).communicate()

The Popen version creates a whole new process for each script, and every one of those processes runs its own, distinct Python interpreter, so each pays the full memory cost of an interpreter.
The threading version runs its threads inside the current process, sharing its memory space and the single, original Python interpreter, so each extra thread adds comparatively little RAM.
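If you want to see the difference yourself, here is a minimal measurement sketch (my own addition, assuming the third-party psutil package is installed) that compares the extra resident memory used by threads in the current process with the memory of separately spawned interpreters:
# Rough comparison sketch, not from the original question. Requires psutil.
import subprocess
import sys
import threading
import time

import psutil

def idle():
    time.sleep(5)

this_proc = psutil.Process()
before = this_proc.memory_info().rss

# Threads live inside the current process and share its interpreter.
threads = [threading.Thread(target=idle) for _ in range(3)]
for t in threads:
    t.start()
time.sleep(1)
print("extra RSS for 3 threads:", this_proc.memory_info().rss - before)

# Each child process carries a complete interpreter of its own.
children = [subprocess.Popen([sys.executable, "-c", "import time; time.sleep(5)"])
            for _ in range(3)]
time.sleep(1)
print("combined RSS of 3 child interpreters:",
      sum(psutil.Process(c.pid).memory_info().rss for c in children))

for t in threads:
    t.join()
for c in children:
    c.wait()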

Related

How to run multiple servers with a python script? [duplicate]


Python doctest hangs using ProcessPoolExecutor

This code runs fine under regular CPython 3.5:
import concurrent.futures

def job(text):
    print(text)

with concurrent.futures.ProcessPoolExecutor(1) as pool:
    pool.submit(job, "hello")
But if you run it as python -m doctest myfile.py, it hangs. Changing submit(job, ...) to submit(print, ...) makes it not hang, as does using ThreadPoolExecutor instead of ProcessPoolExecutor.
Why does it hang when run under doctest?
I think the issue is your with statement. When you have
with concurrent.futures.ProcessPoolExecutor(1) as pool:
    pool.submit(job, "hello")
it forces the submitted work to be executed and the pool to be shut down then and there. When you run this as the main process it works, because the worker gets time to execute the job. But when the file is imported as a module, the background worker never gets a chance to run: the shutdown triggered by the with block waits for the work to finish, and hence a deadlock.
So a workaround you can use is the following:
import concurrent.futures

def job(text):
    print(text)

pool = concurrent.futures.ProcessPoolExecutor(1)
pool.submit(job, "hello")

if __name__ == "__main__":
    pool.shutdown(True)
This prevents the deadlock and lets you run doctest as well as import the module if you want.
The problem is that importing a module acquires a lock (exactly which lock depends on your Python version); see the docs for imp.lock_held.
The lock is shared with the multiprocessing workers, so the deadlock occurs because your main process, while importing your module, starts and waits for a subprocess that also tries to import your module, but cannot acquire the import lock because the main process is still holding it.
In step form:
Main process acquires lock to import myfile.py
Main process starts importing myfile.py (it has to import myfile.py because that is where your job() function is defined, which is why it didn't deadlock for print()).
Main process starts and blocks on subprocess.
Subprocess tries to acquire lock to import myfile.py
=> Deadlock.
doctest imports your module in order to process it. Try adding this to prevent execution on import:
if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(1) as pool:
        pool.submit(job, "hello")
This should actually be a comment, but it's too long to be one.
Your code also fails if it's imported as a module, with the same error as under doctest. I get _pickle.PicklingError: Can't pickle <function job at 0x7f28cb0d2378>: import of module 'a' failed (I named the file a.py).
Your lack of if __name__ == "__main__": violates the programming guidelines for multiprocessing:
https://docs.python.org/3.6/library/multiprocessing.html#the-spawn-and-forkserver-start-methods
I guess that the child processes will also try to import the module, which then tries to start another child process (because the pool unconditionally executes). But I'm not 100% sure about this.
I'm also not sure why the error you get is can't pickle <function>.
The issue here seems to be that you want the module to auto start a process on import. I'm not sure if this is possible.

How to delay the execution of a script in Python?

I'm working on a Python script and I have a problem delaying the execution of a Bash script.
My script.py lets the user choose a script.sh and, after optionally modifying it, run it with various options.
One of these options is the ability to delay the execution of the script by N seconds. I used time.sleep(N), but then script.py stops completely for N seconds; I only want to delay script.sh by N seconds while the user keeps using script.py.
I searched for answers without success. Any ideas?
You can start the script in a new thread that sleeps before running it.
Minimal example:
import subprocess as sp
import time
from threading import Thread

def start_delayed(args, delay):
    time.sleep(delay)
    sp.run(args)

t = Thread(target=start_delayed, kwargs={'args': ['ls'], 'delay': 5})
t.start()
Consider using a Timer object from the threading module:
import subprocess, threading
t = threading.Timer(10.0, subprocess.call, args=(['script.sh'],))
t.start()
...the above running script.sh after a 10-second delay.
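One small addition from me: a pending Timer can also be cancelled before it fires if the user changes their mind about running the script:
t = threading.Timer(10.0, subprocess.call, args=(['script.sh'],))
t.start()
t.cancel()  # stops the timer if it hasn't fired yet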
Alternately, if you want to be able to run an arbitrary number of scheduled tasks efficiently with only a single thread controlling them, consider using a scheduler from the standard-library sched module:
import sched, subprocess, time

s = sched.scheduler(time.time, time.sleep)
s.enter(10, 1, subprocess.call, (['script.sh'],))
s.run()
This will run script.sh after 10 seconds have passed -- though if you want it to run in the background, you'll want to put it in a thread (or such) yourself.
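Building on that last point, here is a small sketch (my own addition, reusing the script.sh name from the question) that runs the scheduler itself on a background thread so the main program never blocks:
import sched
import subprocess
import threading
import time

s = sched.scheduler(time.time, time.sleep)
s.enter(10, 1, subprocess.call, (['./script.sh'],))

# Run the scheduler on a daemon thread so s.run() doesn't block the caller.
# Drop daemon=True if the delayed command must still run even when the main
# program exits before the 10 seconds are up.
threading.Thread(target=s.run, daemon=True).start()

# ... the main program continues here ...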
You should run sleep using subprocess.Popen before calling script.sh.

input() blocks other python processes in Windows 8 (python 3.3)

While working on a multi-threaded, cross-platform Python 3.3 application I came across some weird behavior that I was not expecting and am not sure is intended. The issue is that on Windows 8, calling input() in one thread blocks other threads until it completes. I have tested the example script below on three Linux, two Windows 7 and one Windows 8 computers, and this behavior is only observed on the Windows 8 computer. Is this expected behavior for Windows 8?
test.py:
import subprocess, threading, time

def ui():
    i = input("-->")
    print(i)

def loop():
    i = 0
    f = 'sky.{}'.format(i)
    p = subprocess.Popen(['python', 'copy.py', 'sky1', f])
    t = time.time()
    while time.time() < t+15:
        if p.poll() != None:
            print(i)
            time.sleep(3)
            i += 1
            f = 'sky.{}'.format(i)
            p = subprocess.Popen(['python', 'copy.py', 'sky1', f])
    p.terminate()
    p.wait()

def start():
    t1 = threading.Thread(target=ui)
    t2 = threading.Thread(target=loop)
    t1.start()
    t2.start()
    return t2

t2 = start()
t2.join()
print('done')
copy.py:
import shutil
import sys
src = sys.argv[1]
dst = sys.argv[2]
print('Copying \'{0}\' to \'{1}\''.format(src, dst))
shutil.copy(src, dst)
Update:
While trying out one of the suggestions I realized that I rushed to a conclusion and missed something obvious. I apologize for the false start.
As Schollii suggested, just using threads (no subprocess or extra Python files) results in all threads making forward progress, so the real problem is that using input() in one Python process causes other Python processes to block or not run (I do not know exactly what is going on). Furthermore, it appears that only Python processes are affected. If I use the same code shown above (with some modifications) to execute non-Python executables with subprocess.Popen, they run as expected.
To summarize:
Using subprocess to execute a non-Python executable: works as expected, with and without any calls to input().
Using subprocess to execute a Python executable: the created processes appear not to run if a call to input() is made in the original process.
Using subprocess to create Python processes with a call to input() in a new process and not the original process: the call to input() blocks all Python processes spawned by the 'main' process.
Side note: I do not have a Windows 8 machine of my own, so debugging/tests can be a little slow.
Because there were several problems with input() in Python 3.0-3.2, the method has seen a few changes, and it's possible that we have a new bug again.
Can you try the following variant, which emulates the old Python 2.x input() behaviour (an eval() applied to what is read from the console):
...
i = eval(input("-->"))
...
This is a good problem to work through. Since you depend on the input() method, which needs the console, and since you have several threads, all of the threads end up competing for that one console.
So I advise you to either use the producer-consumer pattern for console input, or put all your inputs in a text file and pass the text file to the program. A sketch of the producer-consumer approach follows.
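As a rough illustration of the producer-consumer suggestion (my own sketch, not from the original answer): one dedicated thread owns the console and pushes each line into a queue.Queue, while the other threads take their "input" from the queue instead of calling input() themselves:
import queue
import threading

commands = queue.Queue()

def console_producer():
    # The only thread that ever touches input().
    while True:
        line = input("--> ")
        commands.put(line)
        if line == "quit":
            break

def worker(name):
    # Consumers block on the queue, never on the console.
    while True:
        line = commands.get()
        if line == "quit":
            commands.put(line)  # let the other workers see it too
            break
        print('{} handling: {}'.format(name, line))

threading.Thread(target=console_producer, daemon=True).start()
workers = [threading.Thread(target=worker, args=('worker-{}'.format(i),)) for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()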

How to start a background process in Python?

I'm trying to port a shell script to a much more readable Python version. The original shell script starts several processes (utilities, monitors, etc.) in the background with "&". How can I achieve the same effect in Python? I'd like these processes not to die when the Python script completes. I am sure it's related to the concept of a daemon somehow, but I couldn't find an easy way to do this.
While jkp's solution works, the newer way of doing things (and the way the documentation recommends) is to use the subprocess module. For simple commands it's equivalent, but it offers more options if you want to do something complicated.
Example for your case:
import subprocess
subprocess.Popen(["rm","-r","some.file"])
This will run rm -r some.file in the background. Note that calling .communicate() on the object returned from Popen will block until it completes, so don't do that if you want it to run in the background:
import subprocess
ls_output=subprocess.Popen(["sleep", "30"])
ls_output.communicate() # Will block for 30 seconds
See the subprocess documentation.
Also, a point of clarification: "Background" as you use it here is purely a shell concept; technically, what you mean is that you want to spawn a process without blocking while you wait for it to complete. However, I've used "background" here to refer to shell-background-like behavior.
Note: this answer is less current than it was when posted in 2009. Using the subprocess module shown in other answers is now what the docs recommend:
(Note that the subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using these functions.)
If you want your process to start in the background you can either use system() and call it in the same way your shell script did, or you can spawn it:
import os
os.spawnl(os.P_DETACH, 'some_long_running_command')
(or, alternatively, you may try the less portable os.P_NOWAIT flag).
See the os module documentation.
You probably want the answer to "How to call an external command in Python".
The simplest approach is to use the os.system function, e.g.:
import os
os.system("some_command &")
Basically, whatever you pass to the system function will be executed the same as if you'd passed it to the shell in a script.
I found the following explanation:
On Windows (Win XP), the parent process will not finish until longtask.py has finished its work. That is not what you want in a CGI script. The problem is not specific to Python; in the PHP community the problems are the same.
The solution is to pass the DETACHED_PROCESS process creation flag to the underlying CreateProcess function in the Win API. If you happen to have pywin32 installed you can import the flag from the win32process module; otherwise you should define it yourself:
DETACHED_PROCESS = 0x00000008
pid = subprocess.Popen([sys.executable, "longtask.py"],
                       creationflags=DETACHED_PROCESS).pid
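As a side note (my own addition): on Python 3.7+ the same flag is exposed as subprocess.DETACHED_PROCESS, so a variant of the snippet above, reusing the longtask.py name, can avoid the hand-defined constant; redirecting the standard streams keeps the detached child from holding on to the parent's console:
import subprocess
import sys

# Windows-only sketch: DETACHED_PROCESS is provided by the subprocess module
# on Python 3.7 and later.
proc = subprocess.Popen(
    [sys.executable, "longtask.py"],
    creationflags=subprocess.DETACHED_PROCESS,
    stdin=subprocess.DEVNULL,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
print("detached child pid:", proc.pid)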
Use subprocess.Popen() with the close_fds=True parameter, which will allow the spawned subprocess to be detached from the Python process itself and continue running even after Python exits.
https://gist.github.com/yinjimmy/d6ad0742d03d54518e9f
import os, time, sys, subprocess

if len(sys.argv) == 2:
    time.sleep(5)
    print('track end')
    if sys.platform == 'darwin':
        subprocess.Popen(['say', 'hello'])
else:
    print('main begin')
    subprocess.Popen(['python', os.path.realpath(__file__), '0'], close_fds=True)
    print('main end')
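A related note from me (not in the original answer): on POSIX systems a frequently used combination is close_fds=True together with start_new_session=True, which runs the child in its own session so it keeps running after the parent exits; the command name below is only a placeholder:
import subprocess

# POSIX sketch: start_new_session=True makes the child call setsid(),
# detaching it from the parent's session and controlling terminal.
proc = subprocess.Popen(
    ['./long_running_tool'],          # placeholder command
    close_fds=True,
    start_new_session=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
print("detached pid:", proc.pid)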
Both capture output and run in the background with threading
As mentioned in this answer, if you capture the output with stdout= and then try to read(), the process can block.
However, there are cases where you need this. For example, I wanted to launch two processes that talk to each other over a port, and to save their stdout to a log file as well as print it to stdout.
The threading module allows us to do that.
First, have a look at how to do the output-redirection part on its own in this question: Python Popen: Write to stdout AND log file simultaneously
Then:
main.py
#!/usr/bin/env python3
import os
import subprocess
import sys
import threading

def output_reader(proc, file):
    while True:
        byte = proc.stdout.read(1)
        if byte:
            sys.stdout.buffer.write(byte)
            sys.stdout.flush()
            file.buffer.write(byte)
        else:
            break

with subprocess.Popen(['./sleep.py', '0'], stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc1, \
     subprocess.Popen(['./sleep.py', '10'], stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc2, \
     open('log1.log', 'w') as file1, \
     open('log2.log', 'w') as file2:
    t1 = threading.Thread(target=output_reader, args=(proc1, file1))
    t2 = threading.Thread(target=output_reader, args=(proc2, file2))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
sleep.py
#!/usr/bin/env python3
import sys
import time

for i in range(4):
    print(i + int(sys.argv[1]))
    sys.stdout.flush()
    time.sleep(0.5)
After running:
./main.py
stdout gets updated every 0.5 seconds, two lines at a time, eventually containing:
0
10
1
11
2
12
3
13
and each log file contains the respective log for a given process.
Inspired by: https://eli.thegreenplace.net/2017/interacting-with-a-long-running-child-process-in-python/
Tested on Ubuntu 18.04, Python 3.6.7.
You probably want to start by investigating the os module for forking off separate processes (open an interactive session and issue help(os)). The relevant functions are fork and the exec family. To give you an idea of how to start, put something like this in a function that performs the fork (the function needs to take a list or tuple 'args' as an argument containing the program's name and its parameters; you may also want to define stdin, stdout and stderr for the new process):
try:
    pid = os.fork()
except OSError as e:
    ## some debug output
    sys.exit(1)
if pid == 0:
    ## optionally use os.putenv(..) to set environment variables
    ## args must include the program name as args[0]; os.execv passes it
    ## through as the new process's argv[0]
    os.execv(args[0], args)
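Putting that advice together, here is a rough, POSIX-only sketch of such a helper (my own wrapper following the description above; the name spawn_background and the stream redirection are my assumptions, not part of the original answer):
import os
import sys

def spawn_background(args):
    """Fork and exec args (e.g. ['/bin/sleep', '30']) without waiting for it."""
    try:
        pid = os.fork()
    except OSError as e:
        print("fork failed:", e, file=sys.stderr)
        sys.exit(1)
    if pid == 0:
        # Child: start a new session and redirect the standard streams so the
        # parent can exit without the child losing its terminal.
        os.setsid()
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        try:
            os.execv(args[0], args)  # never returns on success
        except OSError:
            os._exit(127)
    return pid  # parent: child's pid, no waiting

if __name__ == "__main__":
    child = spawn_background(['/bin/sleep', '30'])
    print("started background pid", child)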
You can use
import os

pid = os.fork()
if pid == 0:
    # continue with the rest of your code in the child ...
This makes the forked Python process run in the background.
I haven't tried this yet, but using .pyw files instead of .py files should help. .pyw files don't get a console, so in theory the script should not show a window and should behave like a background process.
