This is not a duplicate of this question
I am trying to understand how Django handles multiple requests. According to this answer, Django is supposed to block parallel requests. But I have found this is not exactly true, at least for Django 3.1. I am using the Django built-in server.
So, in my code (view.py) I have a blocking code block that is only triggered in a particular situation. It takes a very long time to complete the request in this case. This is the code for view.py:
from django.shortcuts import render
import numpy as np

def insertionSort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key

def home(request):
    a = request.user.username
    print(a)
    id = int(request.GET.get('id', ''))
    if id == 1:
        arr = np.arange(100000)
        arr = arr[::-1]
        insertionSort(arr)
        # print("Sorted array is:")
        # for i in range(len(arr)):
        #     print("%d" % arr[i])
    return render(request, 'home/home.html')
So the blocking code block runs only for id=1; for other ids the view should respond normally.
Now, what I found is that if I make two parallel requests, one with id=1 and another with id=2, the second request is not really blocked but does take longer to get data from Django: ~2.5s to complete when a parallel blocking request is running, versus ~0.02s otherwise.
These are the Python scripts I use to make the requests.
Malicious request:
import time
from concurrent.futures import as_completed
from pprint import pprint
from requests_futures.sessions import FuturesSession

session = FuturesSession()
futures = [session.get('http://127.0.0.1:8000/?id=1') for i in range(3)]
start = time.time()
for future in as_completed(futures):
    resp = future.result()
    # pprint({
    #     'url': resp.request.url,
    #     'content': resp.json(),
    # })
    roundtrip = time.time() - start
    print(roundtrip)
Normal request:
import logging
import threading
import time

import requests

if __name__ == "__main__":
    # start = time.time()
    while True:
        print(requests.get("http://127.0.0.1:8000/?id=2").elapsed.total_seconds())
        time.sleep(2)
I will be grateful if anyone can explain how Django is serving the parallel requests in this case.
There is a --nothreading option you can pass when you start the development server; by default, runserver handles each request in its own thread, which is why the second request is not fully blocked. From what you described, it's also possible the blocking task simply finished in about 2 seconds. An easier way to test is to use time.sleep(10) in the view for testing purposes.
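For example, a minimal sketch of such a test (slow_view and its URL route are hypothetical names, not from the question):

# views.py -- a throwaway view for testing; slow_view is a hypothetical name
import time
from django.http import HttpResponse

def slow_view(request):
    time.sleep(10)  # simulate a request that blocks for a long time
    return HttpResponse("done")

Hit this view from two clients at once: with the default threaded runserver both requests finish after roughly 10 seconds, while with python manage.py runserver --nothreading the second request waits about 20 seconds because requests are served one at a time.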
I am using the requests module to download the content of many websites, which looks something like this:
import requests

for i in range(1000):
    url = base_url + f"{i}.anything"
    r = requests.get(url)
Of course this is simplified; the base URL is always the same, and I only want to download an image, for example.
This takes very long due to the number of iterations. The internet connection is not the problem; it is rather the overhead of starting each request.
So I was thinking about something like multiprocessing, because this task is basically always the same, and I imagine it would be a lot faster when parallelized.
Is this somehow doable?
Thanks in advance!
I would suggest that in this case lightweight threads are the better choice. When I ran the request against a certain URL 5 times, the results were:
Threads: Finished in 0.24 second(s)
MultiProcess: Finished in 0.77 second(s)
Your implementation can be something like this:
import concurrent.futures
import time

import requests
from bs4 import BeautifulSoup

def access_url(url, No):
    print(f"{No}:==> {url}")
    response = requests.get(url)
    soup = BeautifulSoup(response.text, features='lxml')
    return "{} : {}".format(No, str(soup.title)[7:50])

if __name__ == "__main__":
    test_url = "http://bla bla.com/"
    base_url = test_url
    THREAD_MULTI_PROCESSING = True
    start = time.perf_counter()  # start the timer
    url_list = [base_url for i in range(5)]  # parameters as lists so map() can be used
    url_counter = [i for i in range(5)]      # parameters as lists so map() can be used
    if THREAD_MULTI_PROCESSING:
        with concurrent.futures.ThreadPoolExecutor() as executor:  # in this case threads are the better fit
            results = executor.map(access_url, url_list, url_counter)
            for result in results:
                print(result)
    end = time.perf_counter()  # stop the timer
    print(f'Threads: Finished in {round(end - start, 2)} second(s)')

    start = time.perf_counter()
    PROCESS_MULTI_PROCESSING = True
    if PROCESS_MULTI_PROCESSING:
        with concurrent.futures.ProcessPoolExecutor() as executor:
            results = executor.map(access_url, url_list, url_counter)
            for result in results:
                print(result)
    end = time.perf_counter()
    print(f'MultiProcess: Finished in {round(end - start, 2)} second(s)')
I think you will see better performance in your case.
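Applied to the loop in your question, a rough sketch could look like this (base_url and the numbered .anything files are taken from your example; the pool size of 20 is just a guess):

import concurrent.futures
import requests

base_url = "http://example.com/"  # stand-in for your real base URL

def download(i):
    # fetch one numbered file and return its index plus the raw bytes
    url = base_url + f"{i}.anything"
    return i, requests.get(url).content

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    for i, content in executor.map(download, range(1000)):
        with open(f"{i}.anything", "wb") as f:  # e.g. save each image to disk
            f.write(content)

If you want to shave off even more overhead, giving each thread its own requests.Session is a common next step.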
I am working on a personal project: a brute-forcing program in Python. I already have it working, but now I want to make it faster by adding threads. The problem is that the program has a for loop that iterates over every user/password pair, so if I simply create some threads and join the main process to them, they all just repeat the same user/password pairs. That is not what I want; I want every thread to brute-force a different user/password pair. Is there any way to tell the threads to grab this user/password and not that one, because that one is being used by another thread?
Thanks.
Here is the code:
import requests as r

user_list = ['a', 'b', 'c', 'd']
pass_list = ['e', 'f', 'g', 'h']

def main_part():
    for user, pwd in zip(user_list, pass_list):
        action_url = 'https:example.com'
        payload = {'user_email': user, 'password': pwd}
        req = r.post(action_url, data=payload)
        print(req.content)
You can use multiprocessing to do what you want. You just need to define a function which handles a single user:
def brute_force_user(user, pwd):
    action_url = 'https:example.com'
    payload = {'user_email': user, 'password': pwd}
    req = r.post(action_url, data=payload)
    print(req.content)
then run it like this:
import multiprocessing
import os

pool = multiprocessing.Pool(os.cpu_count() - 1)
results = pool.starmap(brute_force_user, zip(user_list, pass_list))
pool.close()
pool.join()
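Since the requests are network-bound rather than CPU-bound, a thread-backed pool would likely work just as well here; a small sketch using multiprocessing.dummy, which exposes the same Pool API on top of threads:

from multiprocessing.dummy import Pool as ThreadPool  # same Pool API, backed by threads

with ThreadPool(8) as pool:
    # each worker receives its own (user, pwd) pair, so no pair is tried twice
    results = pool.starmap(brute_force_user, zip(user_list, pass_list))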
I have a script that uses Python mechanize to brute-force an HTML form. This for loop checks every password from "PassList" and runs until it matches the current password by checking the redirect URL. How can I implement multiprocessing here?
for x in PasswordList:
    br.form['password'] = ''.join(x)
    print "Bruteforce in progress.. checking : ", br.form['password']
    response = br.submit()
    if response.geturl() == "http://192.168.1.106/success.html":
        # url to which the page is redirected after login
        print "\n Correct password is ", ''.join(x)
        break
I do hope this is not for malicious purposes.
I've never used python mechanize, but seeing as you have no answers I can share what I know, and you can modify it accordingly.
In general, the work needs to be in its own function, which you then map a pool over. I don't know about your br object, but I would probably recommend having a separate one per password to prevent any clashing. (You can try with the same br object, though; modify the code accordingly.)
from multiprocessing import Pool
from multiprocessing import cpu_count

list_of_br_and_passwords = [[br_obj, 'password1'], [br_obj, 'password2'] ...]

def crackPassword(lst):
    br_obj = lst[0]
    password = lst[1]
    br_obj.form['password'] = ''.join(password)
    print "Bruteforce in progress.. checking : ", br_obj.form['password']
    response = br_obj.submit()
    return response.geturl()  # return the redirect URL so the caller can check it

pool = Pool(cpu_count() * 2)
crack_password = pool.map(crackPassword, list_of_br_and_passwords)
pool.close()
Once again, this is not a full answer, just a general guideline on how to do multiprocessing
from multiprocessing import Pool

def process_bruteforce(password):
    <process>

if __name__ == '__main__':
    pool = Pool(processes=4)  # one process per core
    is_connected = pool.map(process_bruteforce, PasswordList)
I would try something like that.
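A more concrete sketch of that idea is below. It is not the original poster's code: the login URL and form index are placeholders, the success URL is taken from the question, PasswordList is assumed to already exist, and each worker builds its own mechanize browser so the objects do not clash:

import mechanize
from multiprocessing import Pool

LOGIN_URL = "http://192.168.1.106/login.html"      # placeholder
SUCCESS_URL = "http://192.168.1.106/success.html"  # taken from the question

def process_bruteforce(password):
    br = mechanize.Browser()   # separate browser per worker to avoid clashes
    br.open(LOGIN_URL)
    br.select_form(nr=0)       # assumes the login form is the first form on the page
    br.form['password'] = ''.join(password)
    response = br.submit()
    return password, response.geturl() == SUCCESS_URL

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        for password, ok in pool.imap_unordered(process_bruteforce, PasswordList):
            if ok:
                print("Correct password is", ''.join(password))
                pool.terminate()  # stop the remaining workers early
                break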
I have this simple code which fetches pages via urllib2:
import urllib2
from urlparse import urljoin

browser_list = ['Chrome', 'Mozilla', 'Safari', 'Internet Explorer', 'Opera']
user_string_url = "http://www.useragentstring.com/pages/"

for eachBrowser in browser_list:
    result = urllib2.urlopen(urljoin(user_string_url, eachBrowser))
Now I can read the result via result.read(), but I was wondering whether all of this can be done outside the for loop, because otherwise each subsequent URL has to wait until the previous result has been processed.
I want to process result outside the for loop. Can this be done?
One way to do this may be to store the results in a dictionary. What you can do is:
result = {}
for eachBrowser in browser_list:
    result[eachBrowser] = urllib2.urlopen(urljoin(user_string_url, eachBrowser))
and use result[BrowserName] outside the loop.
Hope this helps.
If you simply want to access all results outside the loop, just append them to a list or dictionary as in the answer above.
Or, if you are trying to speed up the task, try multithreading:
import threading

class myThread(threading.Thread):
    def __init__(self, result):
        threading.Thread.__init__(self)
        self.result = result

    def run(self):
        # process your result (as self.result) here
        pass

browser_list = ['Chrome', 'Mozilla', 'Safari', 'Internet Explorer', 'Opera']
user_string_url = "http://www.useragentstring.com/pages/"

for eachBrowser in browser_list:
    result = urllib2.urlopen(urljoin(user_string_url, eachBrowser))
    myThread(result).start()  # starts processing the result on another thread and continues the loop without waiting
It's a simple way of multithreading. It may break depending on how you process the result; consider reading the documentation and some examples before you try it.
You can use threads for this:
import threading
import urllib2
from urlparse import urljoin

def worker(url):
    res = urllib2.urlopen(url)
    data = res.read()
    res.close()

browser_list = ['Chrome', 'Mozilla', 'Safari', 'Internet Explorer', 'Opera']
user_string_url = 'http://www.useragentstring.com/'

for browser in browser_list:
    url = urljoin(user_string_url, browser)
    threading.Thread(target=worker, args=[url]).start()

# wait for everyone to complete
for thread in threading.enumerate():
    if thread == threading.current_thread():
        continue
    thread.join()
Are you using Python 3? If so, you can use futures for this task:
from urllib.request import urlopen
from urllib.parse import urljoin
from concurrent.futures import ThreadPoolExecutor

browser_list = ['Chrome', 'Mozilla', 'Safari', 'Internet+Explorer', 'Opera']
user_string_url = "http://www.useragentstring.com/pages/"

def process_request(url, future):
    print("Processing:", url)
    print("Reading data")
    print(future.result().read())

with ThreadPoolExecutor(max_workers=10) as executor:
    submit = executor.submit
    for browser in browser_list:
        url = urljoin(user_string_url, browser) + '/'
        submit(process_request, url, submit(urlopen, url))
You could also do this with yield:
def collect_browsers():
    browser_list = ['Chrome', 'Mozilla', 'Safari', 'Internet Explorer', 'Opera']
    user_string_url = "http://www.useragentstring.com/pages/"
    for eachBrowser in browser_list:
        yield eachBrowser, urllib2.urlopen(urljoin(user_string_url, eachBrowser))

def process_browsers():
    for browser, result in collect_browsers():
        do_something(result)
This is still a synchronous call (browser 2 will not fire until browser 1 is processed), but it lets you keep the logic for dealing with the results separate from the logic managing the connections. You could of course also use threads to handle the processing asynchronously, with or without yield.
Edit: just re-read the OP; I should repeat that yield does not provide multi-threaded, asynchronous execution, in case that was not clear in my first answer!
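Combining the two ideas is also possible. Here is a rough sketch (Python 3, reusing urlopen/urljoin and the '+'-escaped browser names from the futures answer above, with do_something standing in for your real processing):

from urllib.request import urlopen
from urllib.parse import urljoin
from concurrent.futures import ThreadPoolExecutor

def collect_urls():
    browser_list = ['Chrome', 'Mozilla', 'Safari', 'Internet+Explorer', 'Opera']
    user_string_url = "http://www.useragentstring.com/pages/"
    for eachBrowser in browser_list:
        # yield lazily; the actual fetching happens in the worker threads
        yield eachBrowser, urljoin(user_string_url, eachBrowser) + '/'

def fetch(item):
    browser, url = item
    return browser, urlopen(url).read()

def do_something(data):
    print(len(data))  # placeholder for your real processing

with ThreadPoolExecutor(max_workers=5) as executor:
    # the downloads overlap inside the pool, while the processing stays in this loop
    for browser, data in executor.map(fetch, collect_urls()):
        do_something(data)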
What is the best way to remove all of the blobs from the blobstore? I'm using Python.
I have quite a lot of blobs and I'd like to delete them all. I'm
currently doing the following:
class deleteBlobs(webapp.RequestHandler):
    def get(self):
        all = blobstore.BlobInfo.all()
        more = (all.count() > 0)
        blobstore.delete(all)
        if more:
            taskqueue.add(url='/deleteBlobs', method='GET')
Which seems to be using tons of CPU and (as far as I can tell) doing
nothing useful.
I use this approach:
import datetime
import logging
import re
import urllib

from google.appengine.ext import blobstore
from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.ext.webapp import blobstore_handlers
from google.appengine.ext.webapp import util
from google.appengine.ext.webapp import template
from google.appengine.api import taskqueue
from google.appengine.api import users

class IndexHandler(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello. Blobstore is being purged.\n\n')
        try:
            query = blobstore.BlobInfo.all()
            index = 0
            to_delete = []
            blobs = query.fetch(400)
            if len(blobs) > 0:
                for blob in blobs:
                    blob.delete()
                    index += 1
                hour = datetime.datetime.now().time().hour
                minute = datetime.datetime.now().time().minute
                second = datetime.datetime.now().time().second
                self.response.out.write(str(index) + ' items deleted at ' + str(hour) + ':' + str(minute) + ':' + str(second))
                if index == 400:
                    self.redirect("/purge")
        except Exception, e:
            self.response.out.write('Error is: ' + repr(e) + '\n')
            pass

APP = webapp.WSGIApplication(
    [
        ('/purge', IndexHandler),
    ],
    debug=True)

def main():
    util.run_wsgi_app(APP)

if __name__ == '__main__':
    main()
In my experience, deleting more than 400 blobs at once fails, so I let it reload after every 400. I tried blobstore.delete(query.fetch(400)), but I think there's a bug right now; nothing happened at all, and nothing was deleted.
You're passing the query object to the delete method, which will iterate over it fetching it in batches, then submit a single enormous delete. This is inefficient because it requires multiple fetches, and won't work if you have more results than you can fetch in the available time or with the available memory. The task will either complete once and not require chaining at all, or more likely, fail repeatedly, since it can't fetch every blob at once.
Also, calling count executes the query just to determine the count, which is a waste of time since you're going to try fetching the results anyway.
Instead, you should fetch results in batches using fetch, and delete each batch in a single call. Use cursors to start each new batch where the previous one left off, avoiding the need for the query to skip over all the 'tombstoned' records before finding the first live one, and ideally delete multiple batches per task, using a timer to determine when you should stop and chain the next task.
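A rough sketch of that pattern, reusing the webapp and taskqueue setup from the question; the handler name, batch size, and 20-second budget are illustrative choices, and it assumes BlobInfo.all() supports the standard db.Query cursor methods:

import time
from google.appengine.ext import blobstore
from google.appengine.ext import webapp
from google.appengine.api import taskqueue

BATCH_SIZE = 400
TIME_BUDGET = 20  # seconds to spend deleting before chaining the next task

class PurgeBlobsHandler(webapp.RequestHandler):
    def get(self):
        start = time.time()
        query = blobstore.BlobInfo.all()
        while time.time() - start < TIME_BUDGET:
            blobs = query.fetch(BATCH_SIZE)
            if not blobs:
                return  # nothing left, so no need to chain another task
            # one delete call per batch instead of one RPC per blob
            blobstore.delete([b.key() for b in blobs])
            # resume the next fetch just past the last record we saw
            query.with_cursor(query.cursor())
        # out of time: chain another task to continue the purge
        taskqueue.add(url='/purge', method='GET')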