Even a very simple piece of code never returns anything; it just keeps running:
import multiprocessing as mp

pool = mp.Pool(processes=10)

def add1(x):
    return x + 1

for x in pool.imap(add1, [1, 2, 3]):
    print(x)

pool.close()
And no other operation can be performed while it hangs, not even shutting down the kernel!
Multiprocessing only works reliably when you're doing it from a file. On macOS and Windows, multiprocessing starts a brand-new Python process. That process re-imports the file from which it was started in order to find its own code, but it does not execute the __main__ block; it then runs only the target it was given.
This doesn't work in Jupyter because there is no Python file for the child process to read.
Multithreading should work fine.
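The usual workaround is to move the worker function into a real .py file and import it into the notebook; the spawned children can then find the code by importing that module. A minimal sketch, assuming a file worker.py next to the notebook (the module name is illustrative):

worker.py:
def add1(x):
    return x + 1

In the notebook:
import multiprocessing as mp
from worker import add1

with mp.Pool(processes=10) as pool:
    for x in pool.imap(add1, [1, 2, 3]):
        print(x)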
Related
I am in a bit of a pickle here. I have a Python script (gather.py) that gathers information from an .xml file and uploads it into a database in an infinite loop that sleeps for 60 seconds; all of this is local. I am using Flask to run a webpage that will later pull information from the database, but at the moment all it does is display a sample page (main.py). I want to run main.py and have it start gather.py as a background process that won't prevent Flask from starting. I tried importing gather.py, but the import blocks indefinitely and Flask never starts.

After Googling for a while, it seems that the best option is to use a task queue (Celery) and a message broker (RabbitMQ) to take care of this. That would be fine if the application had to do a lot of stuff in the background, but I only need it to do one or two things. So I did more digging and found posts stating that subprocess.Popen() could do the job. I tried using it, and I don't think it failed, since it didn't raise any errors, but the database is empty. I confirmed that both gather.py and main.py work independently. I tried running the following code in IDLE:
subprocess.Popen([sys.executable, r'path\to\gather.py'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
and got this in return:
<subprocess.Popen object at 0x049A1CF0>
Now, I don't know what this means. I tried using .value and .attrib, but understandably I got:
AttributeError: 'Popen' object has no attribute 'value'
and
AttributeError: 'Popen' object has no attribute 'attrib'
Then I read in a StackOverflow post that stdout=subprocess.PIPE could cause the program to hang, so, in a "just in case" moment, I ran:
subprocess.Popen([sys.executable, r'path\to\gather.py'], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
and got this in return:
<subprocess.Popen object at 0x034A77D0>
Through all of this, the database tables have remained empty. I am new to the subprocess module, but everything above checks out and I can't figure out why it is not running gather.py. Is it because gather.py has an infinite loop? If there is a better option, please let me know.
Python version: 3.4.4
PS. I don't know if it matters, but I am running a portable version of Python (PortableApps) on a Windows 10 PC. This is why I included sys.executable inside subprocess.Popen().
Solution 1 (all in a Python script):
Try using a Thread and a Queue.
I do it like this:
from flask import Flask
from queue import Queue  # 'Queue' in Python 2, 'queue' in Python 3
from threading import Thread
import time

def task(q):
    # Background task: report elapsed seconds once per second.
    q.put(0)
    t = time.time()
    while True:
        time.sleep(1)
        q.put(time.time() - t)

queue = Queue()
worker = Thread(target=task, args=(queue,))
worker.start()

app = Flask(__name__)

@app.route('/')
def message_from_queue():
    msg = "Running: calculate %f seconds" % queue.get()
    return msg

if __name__ == '__main__':
    app.run(host='0.0.0.0')
If you run this code, each access to '/' gets a value calculated by the background task. You may need to block until the task produces a value, but there isn't enough information in the question to say. Of course, you will need to refactor your gather.py to accept a queue in the same way.
Solution 2 (using a system script):
For Windows, create a .bat file and run both scripts from there:
@echo off
start python "path\to\gather.py"
set FLASK_APP=app.py
flask run
This will run gather.py first and then start the Flask server. If you use start /min python "path\to\gather.py", gather.py will run in a minimized window.
subprocess.Popen will not open a .py file by itself, because Windows treats a Python script as a document rather than an executable; Popen needs an actual executable to run, or the interpreter passed explicitly as the first argument (e.g. [sys.executable, 'script.py']).
You can use:
os.system(r'python path\to\gather.py')
but os.system blocks until the command finishes, so by itself it won't run as a background process (it depends on your script).
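For completeness, subprocess.Popen does launch a .py script if you pass the interpreter explicitly, which is what the question attempted; the usual culprits are a relative path or an unread PIPE. A minimal sketch, assuming gather.py sits next to main.py:

import os
import subprocess
import sys

# Resolve the script relative to this file so the child does not depend
# on the current working directory (a common reason the database stays empty).
script = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'gather.py')

# No stdout=PIPE: nothing reads the pipe, so a chatty child could fill it
# and stall. DEVNULL discards output; the child keeps running in the
# background while Flask starts.
subprocess.Popen([sys.executable, script],
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL)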
Intro
Hi, I'm trying to run a Windows OS command in a loop using Python 3 multiprocessing, but when the loop gets too big (thousands of commands) I'm getting memory errors and the process exits or never completes.
Why?
I need to run 65,000 commands as fast as possible, and one by one seems inefficient. These commands are normal Windows commands (dir, for example).
-- I do not need the results of the commands! I just need them to run.
Code
import multiprocessing
import subprocess

def worker(num):
    print("worker:", num)
    # 'dir' is a cmd.exe builtin, so it needs shell=True on Windows
    subprocess.Popen('dir', shell=True)  # or os.system('dir'), for example
    return

def main():
    jobs = []
    for i in range(1, 65535):
        i = str(i)
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()

if __name__ == '__main__':
    main()
Question
What am I doing wrong here? What's the correct way to run a Windows OS command many times from Python while keeping the work parallel?
You should limit the number of workers running at the same time.
You can use p.is_alive() to check how many of them are currently running.
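A simpler way to cap concurrency is multiprocessing.Pool, which keeps a fixed number of worker processes alive and feeds them all 65,000 tasks in turn. A minimal sketch along those lines (the pool size of 10 is an arbitrary choice):

import multiprocessing
import subprocess

def worker(num):
    # 'dir' is a cmd.exe builtin, so it needs shell=True on Windows.
    subprocess.call('dir', shell=True)
    return num

if __name__ == '__main__':
    # Only 10 processes exist at any moment instead of 65,000.
    with multiprocessing.Pool(processes=10) as pool:
        for _ in pool.imap_unordered(worker, range(1, 65535)):
            pass  # results are ignored; we only need the commands to run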
I have made a C# app which calls a Python script.
The C# app uses a Process object to run the Python script.
I have also redirected the sub-process's standard output so I can process the output from the Python script.
But the problem is:
The output (via the print function) from Python always arrives all at once, when the script terminates.
I want the output to arrive in real time while the script is running.
I have tried almost every method I could find on Google, like flushing sys.stdout, redirecting stdout in Python, event-driven message receiving in C#, or just using a while loop to wait for messages, etc.
I also wonder how Python IDEs like PyCharm run a script and print its output line by line without modifying the original script. How do they do that?
The Python version is 2.7.
Any advice is appreciated.
Thank you!
I ended up using a stupid but working method to resolve it:
a thread that periodically flushes sys.stdout. The code looks like this:
import sys
import threading
import time

run_thread = False

def flush_print():
    # Periodically flush stdout so buffered prints reach the parent process.
    while run_thread:
        sys.stdout.flush()
        time.sleep(1)
In the main function:
if __name__ == '__main__':
    thread = threading.Thread(target=flush_print)
    run_thread = True
    thread.start()
    # my big functions with some prints; they block until completed
    run_thread = False
    thread.join()
Apparently this is ugly, but I haven't found a better way to get the job done.
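For what it's worth, a less hacky alternative (assuming you control how the C# side launches the script) is to run the interpreter unbuffered with python -u, or to reopen stdout unbuffered at the top of the script. A minimal Python 2.7 sketch:

import os
import sys

# Replace stdout with an unbuffered file object (buffer size 0) so every
# print reaches the parent process immediately; no flush thread needed.
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)

print 'this line arrives in real time'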
If I have the following code:
def f():
    print 'ok!'
    import sys
    sys.exit()

if __name__ == '__main__':
    import billiard
    billiard.forking_enable(0)

    p = billiard.Process(target=f)
    p.start()

    while p.is_alive():
        pass
The script behaves as expected, printing "ok!" and ending. But if I omit the if __name__=='__main__': line and de-indent the following lines, my machine (OS X) goes crazy, continually spawning tons of Python processes until I killall Python. Any idea what's going on here?
(To those marking this as a duplicate, note that while the other question asks the purpose of if __name__=='__main__' generally, I'm specifically asking why failure to use it here causes dramatically unexpected behaviour)
You're disabling fork support with the line:
billiard.forking_enable(0)
That means that the library will need to spawn (instead of fork) your child process, and have it re-import the __main__ module to run f, just like Windows does. Without the if __name__ ... guard, re-importing the __main__ module in the children will also mean re-running your code that creates the billiard.Process, which creates an infinite loop.
If you leave fork enabled, the re-import in the child process isn't necessary, so everything works fine with or without the if __name__ ... guard.
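The same behavior can be reproduced with the standard library under the spawn start method; a minimal Python 3 sketch (stdlib multiprocessing rather than billiard):

import multiprocessing as mp

def f():
    print('ok!')

if __name__ == '__main__':
    # 'spawn' starts a fresh interpreter that re-imports this module,
    # just like billiard with forking disabled; without the guard above,
    # the re-import would start another Process, and so on forever.
    mp.set_start_method('spawn')
    p = mp.Process(target=f)
    p.start()
    p.join()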
Working on a multi-threaded cross-platform Python 3.3 application, I came across some weird behavior I was not expecting and am not sure is expected. The issue is that on Windows 8, calling the input() method in one thread blocks other threads until it completes. I have tested the example script below on three Linux, two Windows 7, and one Windows 8 computers, and this behavior is only observed on the Windows 8 computer. Is this expected behavior for Windows 8?
test.py:
import subprocess, threading, time

def ui():
    i = input("-->")
    print(i)

def loop():
    i = 0
    f = 'sky.{}'.format(i)
    p = subprocess.Popen(['python', 'copy.py', 'sky1', f])
    t = time.time()
    while time.time() < t + 15:
        if p.poll() is not None:
            print(i)
            time.sleep(3)
            i += 1
            f = 'sky.{}'.format(i)
            p = subprocess.Popen(['python', 'copy.py', 'sky1', f])
    p.terminate()
    p.wait()

def start():
    t1 = threading.Thread(target=ui)
    t2 = threading.Thread(target=loop)
    t1.start()
    t2.start()
    return t2

t2 = start()
t2.join()
print('done')
copy.py:
import shutil
import sys
src = sys.argv[1]
dst = sys.argv[2]
print('Copying \'{0}\' to \'{1}\''.format(src, dst))
shutil.copy(src, dst)
Update:
While trying out one of the suggestions, I realized that I had rushed to a conclusion and missed something obvious. I apologize for the false start.
As Schollii suggested, just using threads (no subprocess or Python files) results in all threads making forward progress, so the problem is actually that using input() in one Python process causes other Python processes to block or not run (I do not know exactly what is going on). Furthermore, it appears that only Python processes are affected. If I use the same code shown above (with some modifications) to execute non-Python executables with subprocess.Popen, they run as expected.
To summarize:
Using subprocess to execute a non-Python executable: works as expected, with and without any calls to input().
Using subprocess to execute a Python script: the created processes appear not to run if a call to input() is made in the original process.
Using subprocess to create Python processes with the call to input() in a new process rather than the original one: the call to input() blocks all Python processes spawned by the 'main' process.
Side note: I do not have a Windows 8 machine, so debugging and testing can be a little slow.
There were several problems with input() in Python 3.0-3.2, and the method has been changed a few times as a result.
It's possible that there is a new bug again.
Can you try the following variant, which "back ports" Python 2.x's input() behavior (in Python 2.x, input() evaluated what raw_input() read):
...
i = eval(input("-->"))
...
This is a very good problem to work on.
Since you depend on the input() method, which needs the console, and you have several threads, all of the threads end up competing for the console.
So I advise you to either use the producer-consumer pattern (see the sketch below) or define all your inputs in a text file and pass the text file to the program.
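A minimal sketch of the producer-consumer idea, with the main thread as the only reader of the console and worker threads fed through a queue (all names here are illustrative, not from the question):

import queue
import threading

tasks = queue.Queue()

def consumer():
    # Workers never call input(); they only take lines from the queue.
    while True:
        line = tasks.get()
        if line is None:  # sentinel: no more input
            break
        print('processing:', line)

workers = [threading.Thread(target=consumer) for _ in range(3)]
for w in workers:
    w.start()

# Only the main thread touches the console (the single producer).
while True:
    line = input('-->')
    if line == 'quit':
        break
    tasks.put(line)

for _ in workers:
    tasks.put(None)  # one sentinel per worker
for w in workers:
    w.join()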