I'm trying to run the following in Eclipse (using PyDev) and I keep getting the error:
q = queue.Queue(maxsize=0)
NameError: global name 'queue' is not defined
I've checked the documentation and it appears that this is how it's supposed to be used. Am I missing something here? Is it how PyDev works, or is something missing in the code? Thanks for all the help.
from queue import *

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

def main():
    q = queue.Queue(maxsize=0)
    for i in range(num_worker_threads):
        t = Thread(target=worker)
        t.daemon = True
        t.start()
    for item in source():
        q.put(item)
    q.join()       # block until all tasks are done

main()
Using:
Eclipse SDK
Version: 3.8.1
Build id: M20120914-1540
and Python 3.3
You do
from queue import *
This imports all the classes from the queue module already. Change that line to
q = Queue(maxsize=0)
CAREFUL: "Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools." (Python PEP 8)
As an alternative, one could use:
from queue import Queue
q = Queue(maxsize=0)
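For completeness, here is a minimal sketch of the corrected snippet from the question under that import style; do_work(), source(), and num_worker_threads are placeholders standing in for the question's missing pieces, and q is moved to module level so that worker() can see it:

from queue import Queue
from threading import Thread

num_worker_threads = 4          # placeholder value

def do_work(item):
    print(item)                 # placeholder for the real work

def source():
    return range(10)            # placeholder for the real item source

q = Queue(maxsize=0)            # no module prefix needed after "from queue import Queue"

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

def main():
    for i in range(num_worker_threads):
        t = Thread(target=worker)
        t.daemon = True
        t.start()
    for item in source():
        q.put(item)
    q.join()                    # block until all tasks are done

main()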
That's because you're using:
from queue import *
and then you're trying to use:
queue.Queue(maxsize=0)
Remove the queue part, because from queue import * imports all of the module's attributes into the current namespace:
Queue(maxsize=0)
Or use import queue instead of from queue import *.
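If you would rather keep the module prefix, a two-line sketch of that style:

import queue

q = queue.Queue(maxsize=0)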
If you write from queue import *, all of the module's classes and functions are imported directly into your namespace, so you must not prefix them with the module name; just write q = Queue(maxsize=100). If you want to use the classes with the module name, as in q = queue.Queue(maxsize=100), you must write a different import statement: import queue. That imports the module itself, so everything is accessed through the queue. prefix rather than being dumped into your namespace as in the first case.
Make sure your own file is not named queue.py; rename it to something else.
If your file is named queue.py, the import will find your own file instead of the standard library module.
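A quick diagnostic sketch to confirm which file is actually being imported (the path in the comment is only illustrative):

import queue

# Should point into the standard library, e.g. .../lib/python3.3/queue.py,
# not into your own project directory.
print(queue.__file__)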
You can install kombu with
pip install kombu
and then import Queue just like this:
from kombu import Queue
Related
I am trying to implement a clock process to run in a Heroku dyno. I am using Python 3.6. The clock process will run every 3 hours. This is the code:
import os
import sys
import requests
from apscheduler.schedulers import asyncio
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.interval import IntervalTrigger
from webdriverdownloader import GeckoDriverDownloader
from scraper.common import main

def get_driver():
    return True

def notify(str):
    return True

if __name__ == '__main__':
    scheduler = AsyncIOScheduler()
    get_driver()
    scheduler.add_job(main, trigger=IntervalTrigger(hours=3))
    scheduler.start()
    # Execution will block here until Ctrl+C (Ctrl+Break on Windows) is pressed.
    try:
        loop = asyncio.get_event_loop()
        loop.run_until_complete(asyncio.wait())
    except (KeyboardInterrupt, SystemExit):
        pass
At first I tried with
asyncio.get_event_loop().run_forever()
However, I read that this is not supported in Python 3.6, so I changed this statement to run_until_complete.
If I run this example, the code prints out:
AttributeError: module 'apscheduler.schedulers.asyncio' has no attribute 'get_event_loop'
Does anyone know why this error occurs? Any help would be much appreciated. Thanks in advance!
You're not importing the asyncio module from the standard library but the asyncio module inside the apscheduler library; you can see that by looking at the source of apscheduler.schedulers.asyncio.
There are only two things you can import from that namespace:
run_in_event_loop
AsyncIOScheduler
If you need to use the low-level asyncio API, just import asyncio directly from the standard library.
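A minimal sketch of what the fixed main block could look like, reusing the scheduler setup from the question (scraper.common.main is the job function from the original snippet); note that asyncio.get_event_loop().run_forever() works fine on Python 3.6 once the standard-library asyncio is the one being imported:

import asyncio  # standard-library asyncio, not apscheduler.schedulers.asyncio

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.interval import IntervalTrigger

from scraper.common import main  # job function from the question

if __name__ == '__main__':
    scheduler = AsyncIOScheduler()
    scheduler.add_job(main, trigger=IntervalTrigger(hours=3))
    scheduler.start()
    # Execution will block here until Ctrl+C (Ctrl+Break on Windows) is pressed.
    try:
        asyncio.get_event_loop().run_forever()
    except (KeyboardInterrupt, SystemExit):
        pass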
I'm having issues using most or all of the cores to process the files faster; it could be reading multiple files at a time or using multiple cores to read a single file.
I would prefer using multiple cores to read a single file before moving on to the next.
I tried the code below but can't seem to get all the cores used.
The following code basically retrieves the *.txt files in the directory, each of which contains HTML in JSON format.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests
import json
import urlparse
import os
from bs4 import BeautifulSoup
from multiprocessing.dummy import Pool  # This is a thread-based Pool
from multiprocessing import cpu_count

def crawlTheHtml(htmlsource):
    htmlArray = json.loads(htmlsource)
    for eachHtml in htmlArray:
        soup = BeautifulSoup(eachHtml['result'], 'html.parser')
        if all(['another text to search' not in str(soup),
                'text to search' not in str(soup)]):
            try:
                gd_no = ''
                try:
                    gd_no = soup.find('input', {'id': 'GD_NO'})['value']
                except:
                    pass
                r = requests.post('domain api address', data={
                    'gd_no': gd_no,
                })
            except:
                pass

if __name__ == '__main__':
    pool = Pool(cpu_count() * 2)
    print(cpu_count())
    fileArray = []
    for filename in os.listdir(os.getcwd()):
        if filename.endswith('.txt'):
            fileArray.append(filename)
    for file in fileArray:
        with open(file, 'r') as myfile:
            htmlsource = myfile.read()
            results = pool.map(crawlTheHtml(htmlsource), f)
On top of that, I'm not sure what the ,f represents.
Question 1:
What did I not do properly to fully utilize all the cores/threads?
Question 2:
Is there a better way to use try/except? Sometimes the value is not in the page, which would cause the script to stop, and when dealing with multiple variables I end up with a lot of try/except statements.
Answer to question 1: your problem is this line:
from multiprocessing.dummy import Pool # This is a thread-based Pool
Answer taken from: multiprocessing.dummy in Python is not utilising 100% cpu
When you use multiprocessing.dummy, you're using threads, not processes:
multiprocessing.dummy replicates the API of multiprocessing but is no
more than a wrapper around the threading module.
That means you're restricted by the Global Interpreter Lock (GIL), and only one thread can actually execute CPU-bound operations at a time. That's going to keep you from fully utilizing your CPUs. If you want to get full parallelism across all available cores, you're going to need to address the pickling issue you're hitting with multiprocessing.Pool.
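As a rough sketch under those constraints, the main block could use a real process pool and let pool.map() call crawlTheHtml() on each file's contents, instead of calling the function yourself inside the loop (crawlTheHtml and the file handling come from the question; keeping it a module-level function lets it be pickled):

import os
from multiprocessing import Pool, cpu_count

if __name__ == '__main__':
    file_array = [f for f in os.listdir(os.getcwd()) if f.endswith('.txt')]

    # Read each file's JSON contents up front; each element is then handled
    # by crawlTheHtml() in a separate worker process.
    sources = []
    for filename in file_array:
        with open(filename, 'r') as myfile:
            sources.append(myfile.read())

    pool = Pool(processes=cpu_count())
    results = pool.map(crawlTheHtml, sources)   # pass the function itself, not its result
    pool.close()
    pool.join()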
I had this problem. You need to do:
from multiprocessing import Pool
from multiprocessing import freeze_support
and at the end you need to do:
if __name__ == '__main__':
    freeze_support()
and then you can continue your script.
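A minimal sketch of that layout (the work() function and its input are placeholders, not part of the original answer):

from multiprocessing import Pool, cpu_count, freeze_support

def work(item):
    return item * 2                 # placeholder for the real per-item work

if __name__ == '__main__':
    freeze_support()                # needed for frozen Windows executables, harmless otherwise
    pool = Pool(processes=cpu_count())
    print(pool.map(work, range(10)))
    pool.close()
    pool.join()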
from multiprocessing import Pool, Queue
from os import getpid
from time import sleep
from random import random

MAX_WORKERS = 10

class Testing_mp(object):

    def __init__(self):
        """
        Initiates a queue, a pool and a temporary buffer, used only
        when the queue is full.
        """
        self.q = Queue()
        self.pool = Pool(processes=MAX_WORKERS, initializer=self.worker_main)
        self.temp_buffer = []

    def add_to_queue(self, msg):
        """
        If the queue is full, put the message in a temporary buffer.
        If the queue is not full, add the message to the queue.
        If the buffer is not empty and the queue is not full,
        put messages from the buffer back into the queue.
        """
        if self.q.full():
            self.temp_buffer.append(msg)
        else:
            self.q.put(msg)
            if len(self.temp_buffer) > 0:
                self.add_to_queue(self.temp_buffer.pop())

    def write_to_queue(self):
        """
        This function writes some messages to the queue.
        """
        for i in range(50):
            self.add_to_queue("First item for loop %d" % i)
            # Not really needed, just to show that some elements can be added
            # to the queue whenever you want!
            sleep(random() * 2)
            self.add_to_queue("Second item for loop %d" % i)
            # Not really needed, just to show that some elements can be added
            # to the queue whenever you want!
            sleep(random() * 2)

    def worker_main(self):
        """
        Waits indefinitely for an item to be written to the queue.
        Finishes when the parent process terminates.
        """
        print "Process {0} started".format(getpid())
        while True:
            # If the queue is not empty, pop the next element and do the work.
            # If the queue is empty, wait indefinitely until an element gets into the queue.
            item = self.q.get(block=True, timeout=None)
            print "{0} retrieved: {1}".format(getpid(), item)
            # simulate some random-length operations
            sleep(random())

# Warning from the Python documentation:
# Functionality within this package requires that the __main__ module be
# importable by the children. This means that some examples, such as the
# multiprocessing.Pool examples, will not work in the interactive interpreter.
if __name__ == '__main__':
    mp_class = Testing_mp()
    mp_class.write_to_queue()
    # Wait a bit for the child processes to do some work,
    # because when the parent exits, children are terminated.
    sleep(5)
I am using Python 2.7 multiprocessing on Windows 7:
import multiprocessing as mp
from Queue import Queue
from multiprocessing.managers import AutoProxy

if __name__ == '__main__':
    manager = mp.Manager()
    myqueue = manager.Queue()
    print myqueue
    print type(myqueue)
    print isinstance(myqueue, Queue)
    print isinstance(myqueue, AutoProxy)
Output:
<Queue.Queue instance at 0x0000000002956B08>
<class 'multiprocessing.managers.AutoProxy[Queue]'>
False
Traceback (most recent call last):
  File "C:/Users/User/TryHere.py", line 12, in <module>
    print isinstance(myqueue, AutoProxy)
TypeError: isinstance() arg 2 must be a class, type, or tuple of classes and types
My question is: I would like to check if a variable is an instance of a multiprocessing queue; how should I go about checking?
I have referred to:
Check for instance of Python multiprocessing.Connection?
Accessing an attribute of a multiprocessing Proxy of a class
but they don't seem to address my issue. Thanks in advance!
Question: I would like to check if a variable is an instance of a multiprocessing queue, how should I go about checking?
It's a Proxy object, so multiprocessing.managers.BaseProxy does match:
from multiprocessing.managers import BaseProxy
print(isinstance(myqueue, BaseProxy))
>>>True
Tested with Python: 3.4.2 and 2.7.9
For Python 3.6, the equivalent would be
import multiprocessing
test_queue = multiprocessing.Queue()
type(test_queue) == multiprocessing.queues.Queue
>>> True
Rather than referencing multiprocessing.queues.Queue directly, as proposed by @mikeye, we can compare against the type of a newly created Queue object.
Here's what I do:
import multiprocessing as mp
my_queue = mp.Queue()
print(type(my_queue) == type(mp.Queue()))
>>>True
For Python 3.10, we can use the Queue under the queues namespace, while still using isinstance().
import multiprocessing as mp
my_q = mp.Queue()
isinstance(my_q, mp.queues.Queue)
>>> True
import asyncio
isinstance(my_q, asyncio.Queue)
>>> False
import queue
isinstance(my_q, queue.Queue)
>>> False
I know this doesn't exactly cover the original question, but since using mp.Queue instead of mp.queues.Queue produces much the same error, I thought I'd add this.
I'm a bit new to coding/scripting in general and need some help implementing multiprocessing.
I currently have two functions that I'll concentrate on here. The first, getting_routes(router), logs into all my routers (the list of routers comes from a previous function) and runs a command. The second, parse_paths(routes), parses the results of that command.
def get_list_of_routers():
    <some code>
    return routers

def getting_routes(router):
    routes = sh.ssh(router, "show ip route")
    return routes

def parse_paths(routes):
    l = routes.split("\n")
    ...... <more code>.....
    return parsed_list
My list is roughly 50 routers long and, along with the subsequent parsing, takes quite a bit of time, so I'd like to use the multiprocessing module to run the sshing into routers, command execution, and subsequent parsing in parallel across all routers.
I wrote:
#!/usr/bin/env python

import multiprocessing.dummy import Pool as ThreadPool

def get_list_of_routers():          # (***this part does not need to be threaded)
    <some code>
    return routers

def getting_routes(router):
    routes = sh.ssh(router, "show ip route")
    return routes

def parse_paths(routes):
    l = routes.split("\n")
    ...... <more code>.....
    return parsed_list

if __name__ == '__main__':
    worker_1 = multiprocessing.Process(target=getting_routes)
    worker_2 = multiprocessing.Process(target=parse_paths)
    worker_1.start()
    worker_2.start()
What I'd like is for parallel sshing into a router, running the command, and returning the parsed output. I've been reading http://kmdouglass.github.io/posts/learning-pythons-multiprocessing-module.html and the multiprocessing module but am still not getting the results I need and keep getting undefined errors. Any help with what I might be missing in the multiprocessing module? Thanks in advance!
Looks like you're not sending the router parameter to the getting_routes function.
Also, I think using threads will be sufficient; you don't need to create new processes.
What you need to do is create a loop in your main block that starts a new thread for each router returned from the get_list_of_routers function. Then you have two options: either call the parse_paths function from within the thread, or get the return results from the threads and then call parse_paths.
For example:
import Queue
from threading import Thread

que = Queue.Queue()
threads = []

for router in get_list_of_routers():
    t = Thread(target=lambda q, arg1: q.put(getting_routes(arg1)), args=(que, router))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

results = []
while not que.empty():
    results.append(que.get())

parse_paths(results)
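An alternative sketch of the same idea using a thread pool, which avoids the manual queue bookkeeping; get_list_of_routers, getting_routes, and parse_paths are the functions from the question, and here parse_paths is applied to each router's output individually:

from multiprocessing.dummy import Pool as ThreadPool   # thread-based pool

if __name__ == '__main__':
    routers = get_list_of_routers()
    pool = ThreadPool(len(routers))                     # one thread per router; tune as needed
    raw_routes = pool.map(getting_routes, routers)      # ssh into all routers in parallel
    pool.close()
    pool.join()
    parsed = [parse_paths(routes) for routes in raw_routes]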
I am facing a problem with collecting logs from the following script.
Once I set SLEEP_TIME to too "small" a value, the LoggingThread
threads somehow block the logging module. The script freezes on a logging request
in the action function. If SLEEP_TIME is about 0.1, the script collects
all log messages as I expect.
I tried to follow this answer but it does not solve my problem.
import multiprocessing
import threading
import logging
import time

SLEEP_TIME = 0.000001

logger = logging.getLogger()
ch = logging.StreamHandler()
ch.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(funcName)s(): %(message)s'))
ch.setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)
logger.addHandler(ch)

class LoggingThread(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        while True:
            logger.debug('LoggingThread: {}'.format(self))
            time.sleep(SLEEP_TIME)

def action(i):
    logger.debug('action: {}'.format(i))

def do_parallel_job():
    processes = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=processes)
    for i in range(20):
        pool.apply_async(action, args=(i,))
    pool.close()
    pool.join()

if __name__ == '__main__':
    logger.debug('START')
    #
    # multithread part
    #
    for _ in range(10):
        lt = LoggingThread()
        lt.setDaemon(True)
        lt.start()
    #
    # multiprocess part
    #
    do_parallel_job()
    logger.debug('FINISH')
How can I use the logging module in multiprocess and multithread scripts?
This is probably bug 6721.
The problem is common in any situation where you have locks, threads, and forks. If thread 1 holds a lock while thread 2 calls fork, then in the forked process there will only be thread 2, and the lock will be held forever. In your case, that is logging.StreamHandler.lock.
A fix can be found here (permalink) for the logging module. Note that you need to take care of any other locks, too.
I've run into a similar issue just recently while using the logging module together with the Pathos multiprocessing library. I'm still not 100% sure, but it seems that in my case the problem may have been caused by the logging handler trying to reuse a lock object from within different processes.
I was able to fix it with a simple wrapper around the default logging handler:
import threading
from collections import defaultdict
from multiprocessing import current_process

import colorlog

class ProcessSafeHandler(colorlog.StreamHandler):

    def __init__(self):
        super().__init__()
        self._locks = defaultdict(lambda: threading.RLock())

    def acquire(self):
        current_process_id = current_process().pid
        self._locks[current_process_id].acquire()

    def release(self):
        current_process_id = current_process().pid
        self._locks[current_process_id].release()
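A short usage sketch, wiring the wrapper into the logging setup from the question (colorlog is a third-party package; a plain logging.StreamHandler base class would work the same way):

import logging

logger = logging.getLogger()
handler = ProcessSafeHandler()
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(funcName)s(): %(message)s'))
handler.setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)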
By default, multiprocessing will fork() the process in the pool when running on Linux. The resulting subprocess will lose all running threads except for the main one. So if you're on Linux, that's the problem.
First action item: You shouldn't ever use the fork()-based pool; see https://pythonspeed.com/articles/python-multiprocessing/ and https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods.
On Windows, and I think newer versions of Python on macOS, the "spawn"-based pool is used. This is also what you ought to use on Linux. In this setup, a new Python process is started. As you would expect, the new process doesn't have any of the threads from the parent process, because it's a new process.
Second action item: you'll want to have logging setup done in each subprocess in the pool; the logging setup for the parent process isn't sufficient to get logs from the worker processes. You do this with the initializer keyword argument to Pool, e.g. write a function called setup_logging() and then do pool = multiprocessing.Pool(initializer=setup_logging) (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool).
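A sketch that combines both action items for the script in the question, assuming a setup_logging() helper along the lines the answer describes (the handler and format are just examples):

import logging
import multiprocessing

def setup_logging():
    # Runs once in each worker process via the Pool initializer (and once in the parent).
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(processName)s: %(message)s'))
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    root.addHandler(handler)

def action(i):
    logging.getLogger().debug('action: {}'.format(i))

if __name__ == '__main__':
    setup_logging()                                     # logging for the parent process
    ctx = multiprocessing.get_context('spawn')          # spawn-based pool, no inherited locks
    pool = ctx.Pool(processes=multiprocessing.cpu_count(),
                    initializer=setup_logging)
    for i in range(20):
        pool.apply_async(action, args=(i,))
    pool.close()
    pool.join()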