How to call some functions at the same time? - python

I'm wondering what is the best way to run some functions at the same time.
I wrote a Python module that runs 3 instances of Firefox with the Selenium webdriver; each of them should load the same page.
My code looks like:
url = "http://google.com"
firefox1 = webdriver.Firefox()
firefox2 = webdriver.Firefox()
firefox3 = webdriver.Firefox()
firefox1.get(url)
firefox2.get(url)
firefox3.get(url)
Selenium is very(!) slow, and each page load takes about 30-60 seconds.
I want to run all of the firefox*.get(url) calls in parallel.
What is the best way to do that?

1) If it's not that big a process, you can use threads (that wouldn't be perfectly parallel due to Python's GIL, but it would still do your job to some extent; a threading sketch follows the asyncio example below)
2) You can use asynchronous programming for this purpose. If it's Python 3, you can use the built-in library asyncio.
Here is a sample program (I've not tested it, but it should give you an idea about asyncio):
import asyncio

async def func1():
    print('func1')

async def func2():
    print('func2')

async def func3():
    print('func3')

loop = asyncio.get_event_loop()
flist = [func1(), func2(), func3()]
w = asyncio.wait(flist)
loop.run_until_complete(w)
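For the original question's three Firefox instances, plain threads are usually the simplest fit, since page loading is network I/O during which CPython releases the GIL. A minimal sketch (untested, assuming the selenium setup from the question):

import threading
from selenium import webdriver

url = "http://google.com"
drivers = [webdriver.Firefox() for _ in range(3)]

# get() blocks on network I/O, during which the GIL is released,
# so the three page loads overlap instead of running back to back
threads = [threading.Thread(target=d.get, args=(url,)) for d in drivers]
for t in threads:
    t.start()
for t in threads:
    t.join()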

Related

How can I run two (or more) Selenium webdrivers at the same time in Python? [duplicate]

I'm trying to run two (or more) Selenium webdrivers with Python at the same time.
So far I have tried using Python's multiprocessing module, this way:
def _():
    sets = list()
    pool = Pool()
    for i in range(len(master)):
        driver = setProxy(proxy, f'Thread#{i+1}')
        sets.append(
            [f'Thread#{i+1}',
             driver,
             master[i]]
        )
    for i in range(len(sets)):
        pool.apply_async(enterPoint, args=(sets[i][0], sets[i][1], sets[i][2]))
    pool.close()
    pool.join()
The function above calls setProxy() to get a driver instance with a proxy set on it; that part works perfectly and opens a chromedriver len(master) times, accessing a link to check the IP. The sets list is a list of lists, each consisting of 3 objects: the thread number, the driver that will run, and a list of the data the driver will use. Pool's apply_async() should run enterPoint() len(sets) times, with the thread number, driver and data as args.
Here's enterPoint code:
def enterPoint(thread, driver, accounts):
    print('I exist!')
    for account in accounts:
        cEEPE(thread, driver, account)
But the 'I exist!' statement never gets printed in the CLI I'm running the application from.
cEEPE() is where the magic happens. I've tested my code without multiprocessing and it works as it should.
I suspect there's a problem in Pool's apply_async() method, which I might have used the wrong way.
The code provided in the question is in isolation, so it's harder to comment on, but given the problem described I would set about it using this process:
import multiprocessing & selenium
use the start & join methods.
This produces the two (or more) processes that you asked for.
import multiprocessing
from selenium import webdriver

def open_browser(name):
    driver = webdriver.Firefox()
    driver.get("http://www.google.com")
    print(name, driver.title)
    driver.quit()

if __name__ == '__main__':
    process1 = multiprocessing.Process(target=open_browser, args=("Process-1",))
    process2 = multiprocessing.Process(target=open_browser, args=("Process-2",))
    process1.start()
    process2.start()
    process1.join()
    process2.join()
So, I got the code above to work; here's how I fixed it.
Instead of writing the apply_async() call like this:
pool.apply_async(enterPoint, args=(sets[i][0], sets[i][1], sets[i][2]))
I wrote it like this:
pool.apply_async(enterPoint(sets[i][0], sets[i][1], sets[i][2]))
But this still doesn't fix my issue: the second form calls enterPoint immediately in the main process and hands its return value to apply_async, so nothing actually runs in the pool, and enterPoint won't run twice at the same time.
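The original form was the right one; the likely catch is that apply_async() fails silently unless you inspect its result. Calling .get() on each AsyncResult re-raises any worker exception (for example, a failure to pickle a webdriver argument). A minimal sketch of that pattern, with a simplified stand-in for enterPoint:

from multiprocessing import Pool

def enterPoint(thread, driver, accounts):
    # simplified stand-in for the question's function
    print(f'{thread}: I exist!')

if __name__ == '__main__':
    pool = Pool()
    # pass the function and its arguments separately; the pool then
    # runs the call in a worker process instead of the main process
    results = [pool.apply_async(enterPoint, args=(f'Thread#{i+1}', None, []))
               for i in range(2)]
    pool.close()
    pool.join()
    for r in results:
        r.get()  # re-raises any worker exception instead of failing silently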
It can be done easily with SeleniumBase, which can multi-thread tests (e.g. -n=3 for 3 threads) and can even set a proxy server (--proxy=USER:PASS@SERVER:PORT).
pip install seleniumbase, then run with python:
from parameterized import parameterized
from seleniumbase import BaseCase
BaseCase.main(__name__, __file__, "-n=3")

class GoogleTests(BaseCase):
    @parameterized.expand(
        [
            ["Download Python", "Download Python", "img.python-logo"],
            ["Wikipedia", "www.wikipedia.org", "img.central-featured-logo"],
            ["SeleniumBase.io Docs", "SeleniumBase", 'img[alt*="SeleniumB"]'],
        ]
    )
    def test_parameterized_google_search(self, search_key, expected_text, img):
        self.open("https://google.com/ncr")
        self.hide_elements("iframe")
        self.type('input[title="Search"]', search_key + "\n")
        self.assert_text(expected_text, "#search")
        self.click('a:contains("%s")' % expected_text)
        self.assert_element(img)
(This example uses parameterized to turn one test into three different ones.) You can also apply the multi-threading to multiple files, etc.

Creating threads in Python iterations [duplicate]

I have done some research, and the consensus appears to be that this is impossible without a lot of knowledge and work. However:
Would it be possible to run the same test in different tabs simultaneously?
If so, how would I go about that? I'm using python and attempting to run 3-5 of the same test at once.
This is not a generic test, hence I do not care if it interrupts a clean testing environment.
I think you can do that, but I feel the better or easier way is to use different windows. Having said that, we can use the multithreading, multiprocessing or subprocess modules to trigger the tasks in parallel (or near-parallel); a multiprocessing variant is sketched after the logs below.
Multithreading example
Let me show you a simple example of how to spawn multiple tests using the threading module.
from selenium import webdriver
import threading
import time

def test_logic():
    driver = webdriver.Firefox()
    url = 'https://www.google.co.in'
    driver.get(url)
    # Implement your test logic
    time.sleep(2)
    driver.quit()

N = 5  # Number of browsers to spawn
thread_list = list()

# Start test
for i in range(N):
    t = threading.Thread(name='Test {}'.format(i), target=test_logic)
    t.start()
    time.sleep(1)
    print(t.name + ' started!')
    thread_list.append(t)

# Wait for all threads to complete
for thread in thread_list:
    thread.join()

print('Test completed!')
Here I am spawning 5 browsers to run test cases at the same time. Instead of implementing the test logic I have put in a sleep of 2 seconds for demonstration purposes. The code fires up 5 Firefox browsers (tested with Python 2.7), opens Google and waits for 2 seconds before quitting.
Logs:
Test 0 started!
Test 1 started!
Test 2 started!
Test 3 started!
Test 4 started!
Test completed!
Process finished with exit code 0
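As mentioned above, the multiprocessing module works the same way if you want real OS processes instead of threads. A minimal sketch of the same test with processes (untested, same assumptions as the threading example):

from selenium import webdriver
import multiprocessing
import time

def test_logic():
    driver = webdriver.Firefox()
    driver.get('https://www.google.co.in')
    time.sleep(2)  # stand-in for real test logic
    driver.quit()

if __name__ == '__main__':
    processes = [multiprocessing.Process(name='Test {}'.format(i), target=test_logic)
                 for i in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()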
Python 3.2+
Threads with their own webdriver instances (different windows)
Threads can solve your problem with a good performance boost (some explanation here), each in a different window. Threads are also lighter than processes.
You should use a concurrent.futures.ThreadPoolExecutor, with each thread using its own webdriver.
Also consider adding the headless option for your webdriver.
The example below uses the Chrome webdriver. To illustrate, it passes an integer as the test_url argument to the test function selenium_test, running it 6 times.
from concurrent import futures
from selenium import webdriver

def selenium_test(test_url):
    chromeOptions = webdriver.ChromeOptions()
    # chromeOptions.add_argument("--headless")  # make it not visible
    driver = webdriver.Chrome(options=chromeOptions)
    print("testing url {:0} started".format(test_url))
    driver.get("https://www.google.com")  # replace this with driver.get(test_url)
    # <actual work that needs to be done by selenium>
    driver.quit()

# The default number of threads is optimized for the CPU cores,
# but you can set it with `max_workers`, e.g. `futures.ThreadPoolExecutor(max_workers=...)`
with futures.ThreadPoolExecutor() as executor:
    future_test_results = [executor.submit(selenium_test, i)
                           for i in range(6)]  # running the same test 6 times, using the test number as url
    for future_test_result in future_test_results:
        try:
            test_result = future_test_result.result()  # can use `timeout` to wait a max number of seconds per thread
            # ... do something with the test_result
        except Exception as exc:  # a worker may raise an exception
            print('thread generated an exception: {:0}'.format(exc))
Outputs:
testing url 1 started
testing url 5 started
testing url 3 started
testing url 4 started
testing url 0 started
testing url 2 started
Look at TestNG; you should be able to find frameworks that achieve this.
I did a brief check, and here are a couple of links to get you started:
Parallel Execution & Session Handling in Selenium
Parallel Execution using Selenium Webdriver and TestNG
If you want a reliable, robust framework that can do parallel execution as well as load testing at scale, then look at TurboSelenium: https://butlerthing.io/products#demovideo. Drop us a message and we will be happy to discuss this with you.

How to run code in parallel with ThreadPoolExecutor?

Hi, I'm really new to threading and it's making me confused. How can I run this code in parallel?
import requests
from concurrent.futures import ThreadPoolExecutor

def search_posts(page):
    page_url = f'https://jsonplaceholder.typicode.com/posts/{page}'
    req = requests.get(page_url)
    res = req.json()
    title = res['title']
    return title

page = 1
while True:
    with ThreadPoolExecutor() as executer:
        t = executer.submit(search_posts, page)
        title = t.result()
        print(title)
    if page == 20:
        break
    page += 1
Another question: do I need to learn about operating systems in order to understand how threading works?
The problem here is that you are creating a new ThreadPoolExecutor for every page. To do things in parallel, create only one ThreadPoolExecutor and use its map method:
import concurrent.futures as cf
import requests

def search_posts(page):
    page_url = f'https://jsonplaceholder.typicode.com/posts/{page}'
    res = requests.get(page_url).json()
    return res['title']

if __name__ == '__main__':
    with cf.ThreadPoolExecutor() as ex:
        results = ex.map(search_posts, range(1, 21))
        for r in results:
            print(r)
Note that using the if __name__ == '__main__' wrapper is a good habit in making your code more portable.
One thing to keep in mind when using threads: if you are using CPython (the Python implementation from python.org, which is the most common one), threads don't actually run in parallel.
To make memory management less complicated, only one thread at a time can be executing Python bytecode in CPython. This is enforced by the Global Interpreter Lock ("GIL") in CPython.
The good news is that using requests to get a web page will spend most of its time using network I/O. And in general, the GIL is released during I/O.
But if you are doing calculations in your worker functions (i.e. executing Python bytecode), you should use a ProcessPoolExecutor instead.
If you use a ProcessPoolExecutor and you are running on ms-windows, then using the if __name__ == '__main__' wrapper is required, because Python has to be able to import your main program without side effects in that case.
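For completeness, a minimal sketch of the same map pattern with a ProcessPoolExecutor, using a hypothetical CPU-bound function (crunch is just an illustration, not from the question):

import concurrent.futures as cf

def crunch(n):
    # hypothetical CPU-bound work that would hold the GIL in a thread
    return sum(i * i for i in range(n))

if __name__ == '__main__':  # required for ProcessPoolExecutor on ms-windows
    with cf.ProcessPoolExecutor() as ex:
        for result in ex.map(crunch, [10**6, 10**7, 10**7]):
            print(result)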

How to execute requests.get without attachment Python

Right now I am trying to execute asynchronous requests without any tie-in to each other, similar to how FTP can upload/download more than one file at once.
I am using the following code:
rec = requests.get("https://url", stream=True)
with
rec.raw.read()
to get responses.
But I wish to execute this same piece of code much faster, without waiting for the server to respond, which takes about 2 seconds each time.
The easiest way to do something like that is to use threads.
Here is a rough example of one of the ways you might do this.
import requests
from multiprocessing.dummy import Pool  # the exact import depends on your python version

pool = Pool(4)  # the number represents how many jobs you want to run in parallel

def get_url(url):
    rec = requests.get(url, stream=True)
    return rec.raw.read()

for result in pool.map(get_url, ["http://url/1", "http://url/2"]):
    do_things(result)

Python multiprocessing continuous processing with await

I am using an event-based system built on the new Python 3.5 coroutines and await. I register events, and these events are called by the system.
@event
async def handleevent(args):
    # handle the event
I need to initialize some classes to handle the work (time-consuming), then call instance methods that are also time-consuming (they actually use selenium to browse certain sites).
Ideally I would want something like the following code:
# supposedly, since this is multiprocessing, this is a different driver per process
driver = None

def init():
    # do the heavy initialization here
    global driver
    driver = webdriver.Chrome()

def longworkmethod():
    ## need to return some data
    return driver.dolongwork()

class Drivers:
    """A class to handle async and multiprocessing"""
    def __init__(self, numberOfDrivers):
        self.pool = multiprocessing.Pool(processes=numberOfDrivers, initializer=init)

    async def dowork(self, args):
        return self.pool.apply_async(longworkmethod, args=args)

### my main python class
drivers = Drivers(5)

@event
async def handleevent(args):
    await drivers.dowork(args)

@event
async def quit(args):
    ## do cleanup on drivers
    sys.exit(0)
This code doesn't work, but I have tried many different ways and none seem to be able to do what I want.
It doesn't have to be this exact form, but how do I go about mixing await and coroutines with a program that needs multiprocessing?
While there is nothing, technically speaking, that would prevent you from mixing asyncio and multiprocessing, I would suggest avoiding it. It's going to add a lot of complexity, as you'll end up needing an event loop per thread, and passing information back and forth will be tricky. Just use one or the other.
asyncio supplies functions for running tasks in another thread, such as AbstractEventLoop.run_in_executor (sketched below). Take a look at these answers:
https://stackoverflow.com/a/33025287/66349 (calling selenium within a coroutine)
https://stackoverflow.com/a/28492261/66349
Alternatively you could just use multiprocessing, as selenium has a blocking (non-asyncio) interface; however, it sounds like some of your code is already using asyncio, so maybe stick with the above.
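A minimal sketch of the run_in_executor approach (untested; handleevent and the URL are placeholders, and a thread pool is used so the webdriver never has to be pickled):

import asyncio
from concurrent.futures import ThreadPoolExecutor
from selenium import webdriver

def long_work(url):
    # blocking selenium work; one driver per call for simplicity
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()

executor = ThreadPoolExecutor(max_workers=5)

async def handleevent(url):
    loop = asyncio.get_event_loop()
    # offload the blocking call so the event loop stays responsive
    return await loop.run_in_executor(executor, long_work, url)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(handleevent('https://www.google.com')))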
