I am currently learning multithreading and have just learned about concurrent.futures and ThreadPoolExecutor. I tried to implement an example, but for some reason it does not print the multiple print statements. Where did I go wrong?
import requests
import random
import string
import concurrent.futures

result = open(r"workingGhosts.txt", "w")
length = 5
url = "https://ghostbin.co/paste/"

def get_random_string(length):
    letters = string.ascii_lowercase
    result_str = ''.join(random.choice(letters) for i in range(length))
    return result_str

times = int(input("how many times do you want to check?"))
list_urls = []
counter = 0

for x in range(times):
    stringA = url + get_random_string(length)
    list_urls.append(stringA)

def lol(finalUrl):
    r = requests.get(finalUrl)
    counter = counter + 1
    print("printing")
    if r.status_code != 404:
        result.write(finalUrl)

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(lol, list_urls)
It's not printing because there is an exception. Exceptions raised inside workers aren't shown to you unless you check for them yourself while using an executor.
The exception you are getting is:
UnboundLocalError: local variable 'counter' referenced before assignment
because of this line:
counter = counter + 1
Change lol like this:
def lol(finalUrl):
    global counter  # add this line
    r = requests.get(finalUrl)
    counter = counter + 1
    print("printing")
    if r.status_code != 404:
        result.write(finalUrl)
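As a side note, a minimal way to surface such exceptions yourself is to iterate over the results of executor.map(), since iterating re-raises anything a worker raised:
with concurrent.futures.ThreadPoolExecutor() as executor:
    # Iterating over map()'s results re-raises any exception
    # that occurred inside a worker thread.
    for _ in executor.map(lol, list_urls):
        pass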
I am trying to terminate a ThreadPool based on the values returned from long-running requests. I wish to terminate the ThreadPool once the sum of the returned values reaches MIN_REQUIRED_VALUE.
I am sure the problem is that I am creating a full list of futures which will always have to be resolved. I am not sure how to perform the requests without creating such a list with ThreadPoolExecutor.
I know there have been a couple of questions related to terminating a thread pool. I have found similar questions, but the answers don't seem to handle the return value.
Similar questions:
Python ThreadPoolExecutor terminate all threads
asyncio: Is it possible to cancel a future been run by an Executor?
If there is a better way to do this with another module, that would be fine.
Any assistance would be much appreciated.
from time import sleep
from concurrent.futures import ThreadPoolExecutor, as_completed

NUM_REQUESTS = 50
MIN_REQUIRED_VALUE = 30

def long_request(id):
    sleep(3)
    return {"data": {"value": 10}}

def check_results(results):
    total = 0
    for result in results:
        total += result["data"]["value"]
    return total

def main():
    futures = []
    responses = []
    with ThreadPoolExecutor(max_workers=10) as executor:
        for request_index in range(NUM_REQUESTS):
            future = executor.submit(long_request, request_index)
            # Create futures list
            futures.append(future)
        for future in as_completed(futures):
            responses.append(future.result())
            # Check whether the minimum value has been reached
            total = check_results(responses)
            if total > MIN_REQUIRED_VALUE:
                executor.shutdown(wait=False)

if __name__ == "__main__":
    main()
I changed the code to append results only while MIN_REQUIRED_VALUE has not been reached, and to loop through all pending futures and cancel them once it is reached.
You can see I added num_requests to track the number of requests actually handled, and it turns out to be exactly 6 in this case, which is expected.
If anyone has a better way to do this, it would be good to see.
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep

NUM_REQUESTS = 1000
MIN_REQUIRED_VALUE = 50

def long_request(id):
    sleep(1)
    return {"data": {"value": 10}}

def check_results(results):
    total = 0
    for result in results:
        total += result["data"]["value"]
    return total

def main():
    futures = []
    responses = []
    num_requests = 0
    with ThreadPoolExecutor(max_workers=10) as executor:
        for request_index in range(NUM_REQUESTS):
            future = executor.submit(long_request, request_index)
            # Future list
            futures.append(future)
        for future in as_completed(futures):
            # --- Changed logic below ---
            total = check_results(responses)
            if total > MIN_REQUIRED_VALUE:
                for pending_future in futures:
                    pending_future.cancel()
            else:
                num_requests += 1
                responses.append(future.result())
    return num_requests

if __name__ == "__main__":
    requests = main()
    print("Num Requests: ", requests)
I want to build a tool that scans a website for subdomains. I know how to do this, but my function is slow. I looked at the gobuster usage and saw that gobuster can use many concurrent threads. How can I implement this too?
I have asked Google many times, but I can't find anything about this. Can someone give me an example?
gobuster usage: -t Number of concurrent threads (default 10)
My current program:
def subdomaines(url, wordlist):
    checks(url, wordlist)  # just checking for valid args
    num_lines = get_line_count(wordlist)  # number of lines in a file
    count = 0
    for line in open(wordlist).readlines():
        resp = requests.get(url + line)  # resp
        if resp.status_code in (301, 200):
            print(f'Valid - {line}')
        print(f'{count} / {num_lines}')
        count += 1
Note: gobuster is a very fast tool for finding subdomains of websites.
If you're trying to use threading in Python, you should start from the basics and learn what's available. Here's a simple example taken from https://pymotw.com/2/threading/:
import threading

def worker():
    """thread worker function"""
    print('Worker')

threads = []
for i in range(5):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()
To apply this to your task, a simple approach would be to spawn a thread for each request, something like the code below. Note: if your wordlist is long, this might be very expensive. Look into some of the thread pool libraries in Python for better thread management that you won't need to control explicitly yourself.
import threading
import requests

def subdomains(url, wordlist):
    checks(url, wordlist)  # just checking for valid args
    num_lines = get_line_count(wordlist)  # number of lines in a file
    count = 0
    threads = []
    for line in open(wordlist).readlines():
        t = threading.Thread(target=checkUrl, args=(url, line))
        threads.append(t)
        t.start()
    for thread in threads:  # wait for all threads to complete
        thread.join()

def checkUrl(url, line):
    resp = requests.get(url + line)
    if resp.status_code in (301, 200):
        print(f'Valid - {line}')
To implement the counter you'll need to control shared access between threads to prevent race conditions (two threads accessing the variable at the same time resulting in... problems). A counter object with protected access is provided in the link above:
class Counter(object):
    def __init__(self, start=0):
        self.lock = threading.Lock()
        self.value = start

    def increment(self):
        # Wait for the lock
        self.lock.acquire()
        try:
            # Lock acquired
            self.value = self.value + 1
        finally:
            # Release the lock so other threads can count
            self.lock.release()

# usage:
# in subdomains()...
#     counter = Counter()
#     for line in open(wordlist).readlines():
#         t = threading.Thread(target=checkUrl, args=(url, line, counter))
# in checkUrl()...
#     c.increment()
Final note: I have not compiled or tested any of this code.
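For completeness, here is a sketch of the same approach using a thread pool from concurrent.futures instead of managing threads by hand (also untested; checkUrl is the function above, and the wordlist handling is simplified):
import requests
from concurrent.futures import ThreadPoolExecutor

def checkUrl(url, line):
    resp = requests.get(url + line)
    if resp.status_code in (301, 200):
        print(f'Valid - {line}')

def subdomains(url, wordlist):
    with open(wordlist) as f:
        lines = [line.strip() for line in f]
    # The pool caps the number of live threads, so a long wordlist
    # no longer means one thread per entry.
    with ThreadPoolExecutor(max_workers=10) as executor:
        for line in lines:
            executor.submit(checkUrl, url, line)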
Python has a threading module.
The simplest way to use a Thread is to instantiate it with a target function and call start() to let it begin working.
import threading
import requests

def subdomains(url, wordlist):
    checks(url, wordlist)  # just checking for valid args
    num_lines = get_line_count(wordlist)  # number of lines in a file
    count = 0
    for line in open(wordlist).readlines():
        resp = requests.get(url + line)  # resp
        if resp.status_code in (301, 200):
            print(f'Valid - {line}')
        print(f'{count} / {num_lines}')
        count += 1

threads = []
for i in range(10):
    # target needs its arguments passed via args; note that, as written,
    # every thread runs the full scan over the same wordlist.
    t = threading.Thread(target=subdomains, args=(url, wordlist))
    threads.append(t)
    t.start()
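Since the ten threads above all repeat the same work, here is a sketch that splits the wordlist across the threads instead, so each line is requested exactly once (untested; checks and get_line_count from the question are omitted):
import threading
import requests

def scan_chunk(url, lines):
    # Each thread handles only its own slice of the wordlist.
    for line in lines:
        resp = requests.get(url + line)
        if resp.status_code in (301, 200):
            print(f'Valid - {line}')

def subdomains(url, wordlist, num_threads=10):
    with open(wordlist) as f:
        lines = [line.strip() for line in f]
    chunks = [lines[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=scan_chunk, args=(url, chunk))
               for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()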
I am aware that, using the traditional multiprocessing library, I can declare a Value and share its state between processes.
https://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing#sharing-state-between-processes
When using the newer concurrent.futures library how can I share state between my processes?
import concurrent.futures
import time

def get_user_object(batch):
    # do some work
    counter = counter + 1
    print(counter)

def do_multithreading(batches):
    with concurrent.futures.ThreadPoolExecutor(max_workers=25) as executor:
        threadingResult = executor.map(get_user_object, batches)

def run():
    data_pools = get_data()
    start = time.time()
    with concurrent.futures.ProcessPoolExecutor(max_workers=PROCESSES) as executor:
        processResult = executor.map(do_multithreading, data_pools)
    end = time.time()
    print("TIME TAKEN:", end - start)

if __name__ == '__main__':
    run()
I want to keep a synchronized value of this counter.
In the previous library I might have used multiprocessing.Value and a Lock.
You can pass an initializer and initargs to ProcessPoolExecutor just as you would to multiprocessing.Pool. Here's an example:
import concurrent.futures
import multiprocessing as mp

def get_user_object(batch):
    with _COUNTER.get_lock():
        _COUNTER.value += 1
    print(_COUNTER.value, end=' ')

def init_globals(counter):
    global _COUNTER
    _COUNTER = counter

def main():
    counter = mp.Value('i', 0)
    with concurrent.futures.ProcessPoolExecutor(
        initializer=init_globals, initargs=(counter,)
    ) as executor:
        for _ in executor.map(get_user_object, range(10)):
            pass
    print()

if __name__ == "__main__":
    import sys
    sys.exit(main())
Usage:
$ python3 glob_counter.py
1 2 4 3 5 6 7 8 10 9
Where:
for _ in executor.map(get_user_object, range(10)): lets you iterate over each result. In this case get_user_object() returns None, so there is nothing to process; you just pass and take no further action.
The last print() call gives you a trailing newline, because the print() inside the worker does not emit one (it uses end=' ').
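As an aside, the inner ThreadPoolExecutor in the question does not need multiprocessing.Value at all: threads in one process share memory, so a plain threading.Lock around an ordinary integer is enough. A minimal sketch:
import threading
import concurrent.futures

_lock = threading.Lock()
_counter = 0

def get_user_object(batch):
    global _counter
    with _lock:
        _counter += 1
        print(_counter)

with concurrent.futures.ThreadPoolExecutor(max_workers=25) as executor:
    # Consume the iterator so worker exceptions are re-raised.
    list(executor.map(get_user_object, range(10)))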
I wrote sample code for my problem again. I have a test case where the highest time delay seen so far is carried forward along with its random string.
E.g. [1,2,7,2,1,2,8,1] will become [1,2,7,7,7,7,8,8]
It works, but the output does not come out in the right order: the random strings for the 7's are printed in a random order. I have used wait(), but I am confused as to why it does not handle each greenlet in the order it was spawned. I have tried both tasks = ThreadPool() and tasks = [].
#!/usr/bin/python
import random
import string
from gevent import sleep, spawn, wait
from gevent.threadpool import ThreadPool
import time

def random_string():
    digits = "".join([random.choice(string.digits) for i in xrange(8)])
    chars = "".join([random.choice(string.letters) for i in xrange(10)])
    return chars

def shuffle_message(message, delay):
    sleep(delay)
    print("Shuffled message: {} and time: {}".format(message, delay))

def main():
    tasks = ThreadPool(25)
    prev = 0
    a = [1, 2, 7, 2, 1, 2, 8, 1]  # test case, but it is randomised normally
    for i, elem in enumerate(a):
        delay = a[i]
        string = random_string()
        print("Message: {} and time: {}".format(string, delay))
        if prev < delay:
            prev = delay
        else:
            delay = prev
            print("New message: {} and time: {}".format(string, delay))
        tasks.spawn(shuffle_message, string, delay)
    wait()

if __name__ == "__main__":
    main()
I wrote a simple script to check the availability of some domains, but I can't understand why it starts with abns instead of aaaa.
Here is the code :
import whois
import eventlet
from itertools import product
from string import ascii_lowercase

f = open('4-letter.txt', 'w')
k = (''.join(x) for x in product(ascii_lowercase, repeat=4))

def fetch(url):
    for x in k:
        if whois.whois(x + ".ro").status == "OK":
            print(x + " bad")
        else:
            f.write(x + ".ro\n")

pool = eventlet.GreenPool()
for status in pool.imap(fetch, k):
    print(status)

f.close()
You access the global generator k in this function:
def fetch(url):
    for x in k:
        if whois.whois(x + ".ro").status == "OK":
            print(x + " bad")
        else:
            f.write(x + ".ro\n")
But you also hand k to pool.imap(fetch, k), so imap pulls items from k to use as the url arguments for fetch(). By the time the first fetch() call starts iterating over k, several items (aaaa, aaab, ...) have already been consumed.
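One way to fix it (a sketch, assuming each candidate should be checked exactly once): let imap own the generator and make fetch() handle a single name instead of looping over k:
import whois
import eventlet
from itertools import product
from string import ascii_lowercase

k = (''.join(x) for x in product(ascii_lowercase, repeat=4))

def fetch(name):
    # Check one candidate only; never touch the shared generator here.
    taken = whois.whois(name + ".ro").status == "OK"
    return name, taken

pool = eventlet.GreenPool()
with open('4-letter.txt', 'w') as f:
    for name, taken in pool.imap(fetch, k):
        if taken:
            print(name + " bad")
        else:
            f.write(name + ".ro\n")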