I have to write a Python program that spawns 3 threads, passing an ID (the numbers 1 through 3) to each thread as a parameter. In each thread, call the following JSON endpoint, substituting {ID} with the ID passed to the thread:
https://jsonplaceholder.typicode.com/posts/{ID}
Parse the JSON string into a dict in each thread before returning it to the main thread, and combine the results of all child threads into a list in the main thread. My code is below:
import threading
import requests
import json
import queue

q = queue.Queue()

def main():
    th_list = []
    for i in range(1, 4):
        t = threading.Thread(target=call_url, args=(i, q))
        th_list.append(t)
    print(th_list)
    for thread in th_list:
        thread.start()
    #for thread in th_list:
        #print(thread.join())
    print(q.get())

def call_url(i, q):
    url = 'https://jsonplaceholder.typicode.com/posts/' + str(i)
    response = requests.get(url)
    o_dict = json.loads(response.content)
    q.put(o_dict)

if __name__ == '__main__':
    main()
But that gets me None as the result. Any help will be appreciated.
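For reference, here is a hedged sketch of one way to get all three results into a list in the main thread: start the threads, join them, then drain the queue (this is an assumed fix, not part of the original post). Note that Thread.join() always returns None, so printing its return value will show None; worker results have to come back through the queue.

import threading
import requests
import queue

q = queue.Queue()

def call_url(i, q):
    url = 'https://jsonplaceholder.typicode.com/posts/' + str(i)
    response = requests.get(url)
    q.put(response.json())  # response.json() parses the JSON body into a dict

def main():
    threads = [threading.Thread(target=call_url, args=(i, q)) for i in range(1, 4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has put its result into the queue
    results = [q.get() for _ in threads]  # one item per thread
    print(results)

if __name__ == '__main__':
    main()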
I am new to queues & threads; kindly help with the below code. Here I am trying to execute the function hd. I need to run the function multiple times, but only after a single run has been completed.
import queue
import threading
import time

fifo_queue = queue.Queue()

def hd():
    print("hi")
    time.sleep(1)
    print("done")

for i in range(3):
    cc = threading.Thread(target=hd)
    fifo_queue.put(cc)
    cc.start()
Current Output
hi
hi
hi
donedonedone
Expected Output
hi
done
hi
done
hi
done
You can use a Semaphore for your purposes:
A semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some other thread calls release().
The default value of Semaphore is 1,
class threading.Semaphore(value=1)
so only one thread would be active at a time:
import queue
import threading
import time

fifo_queue = queue.Queue()
semaphore = threading.Semaphore()

def hd():
    with semaphore:
        print("hi")
        time.sleep(1)
        print("done")

for i in range(3):
    cc = threading.Thread(target=hd)
    fifo_queue.put(cc)
    cc.start()
hi
done
hi
done
hi
done
As @user2357112supportsMonica mentioned in the comments, an RLock would be a safer option:
class threading.RLock
This class implements reentrant lock objects. A reentrant lock must be released by the thread that acquired it. Once a thread has acquired a reentrant lock, the same thread may acquire it again without blocking; the thread must release it once for each time it has acquired it.
import queue
import threading
import time

fifo_queue = queue.Queue()
lock = threading.RLock()

def hd():
    with lock:
        print("hi")
        time.sleep(1)
        print("done")

for i in range(3):
    cc = threading.Thread(target=hd)
    fifo_queue.put(cc)
    cc.start()
Please put the print("done") before the sleep; it will work fine.
Reason:
Your program currently does this in each thread:
thread1:
    print
    sleep
    print
but while the thread is sleeping, the other threads keep working and print their first command.
This way, the thread will write the first line, write the second line, and then go to sleep and wait for the other threads to show up.
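For reference, a hedged sketch of the original loop with this suggested reordering applied (everything else unchanged):

import threading
import time

def hd():
    print("hi")
    print("done")  # moved before the sleep, per the suggestion above
    time.sleep(1)

for i in range(3):
    threading.Thread(target=hd).start()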
Here is a simple example:
from collections import deque
from multiprocessing import Process

global_dequeue = deque([])

def push():
    global_dequeue.append('message')

p = Process(target=push)
p.start()

def pull():
    print(global_dequeue)

pull()
The output is deque([]).
If I were to call the push function directly, not as a separate process, the output would be deque(['message']).
How can I get the message into the deque, but still run the push function in a separate process?
You can share data by using a multiprocessing Queue object, which is designed to share data between processes:
from multiprocessing import Process, Queue
import time

def push(q):  # the Queue is passed to the function as an argument
    for i in range(10):
        q.put(str(i))  # put an element into the Queue
        time.sleep(0.2)
    q.put("STOP")  # put a poison pill to stop taking elements from the Queue in the master

if __name__ == "__main__":
    q = Queue()  # create a Queue instance
    p = Process(target=push, args=(q,),)  # create the Process
    p.start()  # start it
    while True:
        x = q.get()
        if x == "STOP":
            break
        print(x)
    p.join()  # join the process to our master process and continue the master run
    print("Finish")
Let me know if it helped, feel free to ask questions.
You can also use Managers to achieve this.
Python 2: https://docs.python.org/2/library/multiprocessing.html#managers
Python 3: https://docs.python.org/3.8/library/multiprocessing.html#managers
Example of usage:
https://pymotw.com/2/multiprocessing/communication.html#managing-shared-state
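For illustration, a hedged sketch of the Manager approach (assuming Python 3): a manager.list() proxy can be appended to in the child process and read back in the parent:

from multiprocessing import Process, Manager

def push(shared):
    shared.append('message')  # appended in the child; the proxy forwards it to the manager process

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.list()  # proxy to a list living in the manager process
        p = Process(target=push, args=(shared,))
        p.start()
        p.join()
        print(list(shared))  # ['message']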
I'm having issues using most or all of the cores to process the files faster; it could mean reading multiple files at a time or using multiple cores to read a single file.
I would prefer using multiple cores to read a single file before moving on to the next one.
I tried the code below but can't seem to get all the cores used.
The following code basically retrieves the *.txt files in the directory, which contain HTML in JSON format.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests
import json
import urlparse
import os
from bs4 import BeautifulSoup
from multiprocessing.dummy import Pool  # This is a thread-based Pool
from multiprocessing import cpu_count

def crawlTheHtml(htmlsource):
    htmlArray = json.loads(htmlsource)
    for eachHtml in htmlArray:
        soup = BeautifulSoup(eachHtml['result'], 'html.parser')
        if all(['another text to search' not in str(soup),
                'text to search' not in str(soup)]):
            try:
                gd_no = ''
                try:
                    gd_no = soup.find('input', {'id': 'GD_NO'})['value']
                except:
                    pass
                r = requests.post('domain api address', data={
                    'gd_no': gd_no,
                })
            except:
                pass

if __name__ == '__main__':
    pool = Pool(cpu_count() * 2)
    print(cpu_count())
    fileArray = []
    for filename in os.listdir(os.getcwd()):
        if filename.endswith('.txt'):
            fileArray.append(filename)
    for file in fileArray:
        with open(file, 'r') as myfile:
            htmlsource = myfile.read()
            results = pool.map(crawlTheHtml(htmlsource), f)
On top of that, I'm not sure what the ,f represents.
Question 1:
What did I not do properly to fully utilize all the cores/threads?
Question 2:
Is there a better way to use try/except? Sometimes the value is not on the page and that causes the script to stop; when dealing with multiple variables, I end up with a lot of try & except statements.
Answer to question 1: your problem is this line:
from multiprocessing.dummy import Pool # This is a thread-based Pool
Answer taken from: multiprocessing.dummy in Python is not utilising 100% cpu
When you use multiprocessing.dummy, you're using threads, not processes:
multiprocessing.dummy replicates the API of multiprocessing but is no
more than a wrapper around the threading module.
That means you're restricted by the Global Interpreter Lock (GIL), and only one thread can actually execute CPU-bound operations at a time. That's going to keep you from fully utilizing your CPUs. If you want to get full parallelism across all available cores, you're going to need to address the pickling issue you're hitting with multiprocessing.Pool.
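For illustration, here is a hedged sketch of how a process-based Pool is usually fed: pool.map takes the function itself plus an iterable of arguments, so each item is handled in its own worker process (process_file here is a hypothetical stand-in for the real crawl function):

import os
from multiprocessing import Pool, cpu_count

def process_file(filename):
    # Placeholder for the real per-file work (e.g. parsing the HTML inside the file).
    with open(filename, 'r') as myfile:
        return filename, len(myfile.read())

if __name__ == '__main__':
    files = [name for name in os.listdir(os.getcwd()) if name.endswith('.txt')]
    pool = Pool(cpu_count())
    results = pool.map(process_file, files)  # pass the function and the iterable, not function(arg)
    pool.close()
    pool.join()
    print(results)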
I had this problem. You need to do:
from multiprocessing import Pool
from multiprocessing import freeze_support
and at the end you need:
if __name__ == '__main__':
    freeze_support()
and then you can continue your script.
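For illustration, a hedged sketch of where freeze_support() is usually placed (square is a hypothetical work function; freeze_support matters mainly when the script is frozen into a Windows executable):

from multiprocessing import Pool, freeze_support

def square(x):
    return x * x

if __name__ == '__main__':
    freeze_support()  # first statement under the __main__ guard
    pool = Pool()
    print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
    pool.close()
    pool.join()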
from multiprocessing import Pool, Queue
from os import getpid
from time import sleep
from random import random

MAX_WORKERS = 10

class Testing_mp(object):

    def __init__(self):
        """
        Initiates a queue, a pool and a temporary buffer, used only
        when the queue is full.
        """
        self.q = Queue()
        self.pool = Pool(processes=MAX_WORKERS, initializer=self.worker_main,)
        self.temp_buffer = []

    def add_to_queue(self, msg):
        """
        If the queue is full, put the message in a temporary buffer.
        If the queue is not full, add the message to the queue.
        If the buffer is not empty and the message queue is not full,
        put messages from the buffer back into the queue.
        """
        if self.q.full():
            self.temp_buffer.append(msg)
        else:
            self.q.put(msg)
            if len(self.temp_buffer) > 0:
                self.add_to_queue(self.temp_buffer.pop())

    def write_to_queue(self):
        """
        This function writes some messages to the queue.
        """
        for i in range(50):
            self.add_to_queue("First item for loop %d" % i)
            # Not really needed, just to show that some elements can be added
            # to the queue whenever you want!
            sleep(random() * 2)
            self.add_to_queue("Second item for loop %d" % i)
            # Not really needed, just to show that some elements can be added
            # to the queue whenever you want!
            sleep(random() * 2)

    def worker_main(self):
        """
        Waits indefinitely for an item to be written in the queue.
        Finishes when the parent process terminates.
        """
        print "Process {0} started".format(getpid())
        while True:
            # If the queue is not empty, pop the next element and do the work.
            # If the queue is empty, wait indefinitely until an element gets into the queue.
            item = self.q.get(block=True, timeout=None)
            print "{0} retrieved: {1}".format(getpid(), item)
            # simulate some random-length operation
            sleep(random())

# Warning from the Python documentation:
# Functionality within this package requires that the __main__ module be
# importable by the children. This means that some examples, such as the
# multiprocessing.Pool examples, will not work in the interactive interpreter.
if __name__ == '__main__':
    mp_class = Testing_mp()
    mp_class.write_to_queue()
    # Wait a bit for the child processes to do some work,
    # because when the parent exits, the children are terminated.
    sleep(5)
I have an issue reading a multiprocessing queue; the function for reading the queue is being called from another module.
Below is the class containing the function to start a thread which runs function_to_get_data. The class resides in its own file, which I will call one.py. function_to_get_data is in another file, two.py, and is an infinite loop which puts data into the queue (a code snippet for this is further down). The class also contains the function to read the queue. The Queue q is defined globally at the beginning.
import multiprocessing
from two import function_to_get_data

q = multiprocessing.Queue()

class Poller:

    def startPoller(self):
        pollerThread = multiprocessing.Process(target=module_to_get_data, args=(q,))
        pollerThread.start()

    def getPoller(self):
        if q.empty():
            print "queue is empty"
        else:
            pollResQueue = q.get()
            q.put(pollResQueue)
            return pollResQueue

if __name__ == "__main__":
    startpoll = Poller()
    startpoll.startPoller()
Below is a snippet from function_to_get_data:
def module_to_get_data(q):
    while 1:
        # performs actions #
        q.put(data_from_actions)
I have another module, three.py, which requires the data from the queue and requests it by calling the function from the initial class:
from one import Poller
externalPoller = Poller()
data_this_module_needs = externalPoller.getPoller()
The issue is that the Queue is always empty.
I should add that the function in three.py is also called as a thread in one.py by a post from a web page:
def POST(data):
    data = web.input()
    if data == 'Start':
        thread_two = multiprocessing.Process(target=function_in_three_py, args=(q,))
        thread_two.start()
If I use the python command line and enter the two Poller functions and call them, I get data from the queue no problem.
I'm trying to write a program in which one function adds information to a queue while another function reads from the queue in the meantime and does some miscellaneous task with it. The program has to put info into the queue and read from it at the same time.
Example:
from multiprocessing import Process
from time import clock, sleep
import Queue

class HotMess(object):

    def __init__(self):
        self.market_id = ['1', '2', '3']
        self.q2 = Queue.Queue()
        self.session = True

    def run(self):
        if __name__ == '__main__':
            t = Process(target=self.get_data)
            k = Process(target=self.main_code)
            t.run()
            k.run()

    def main_code(self):
        while self.session:
            result = self.q2.get()
            print(result)

    def get_data(self):
        #while self.session:
        for market in self.market_id:
            self.q2.put(market)

mess = HotMess()
mess.run()
Now this produces output of 1 2 3. So far so good. But I actually want get_data to be a while loop and basically run indefinitely. If you uncomment while self.session: in the get_data function and fix the indentation, it doesn't produce any output, and I think this is because the get_data Process doesn't finish as long as self.session is True.
My question is: how can I make main_code() not wait for get_data() and just start working the Queue, so they both interact with the Queue (q2)? I tried looking at process/threading/Popen but I'm quite far out of my comfort zone and at a bit of a loss.
Look here: you should use multiprocessing.Queue; it's for communication between different processes.
I also changed run to start and join.
from multiprocessing import Process, Queue
from time import clock, sleep

class HotMess(object):

    def __init__(self):
        self.market_id = ['1', '2', '3']
        self.q2 = Queue()
        self.session = True

    def run(self):
        if __name__ == '__main__':
            t = Process(target=self.get_data)
            k = Process(target=self.main_code)
            t.start()
            k.start()
            t.join()
            k.join()

    def main_code(self):
        while self.session:
            result = self.q2.get()
            print(result)

    def get_data(self):
        while self.session:
            for market in self.market_id:
                self.q2.put(market)

mess = HotMess()
mess.run()