While loop makes the browser freeze with Brython - python

I’m trying to get the response of an API request made with ajax.ajax(); the response is stored into ['apiResponse'] in HTML5 Local Storage (but the rest of the Python function carries on without waiting for it to be put into localStorage).
Because of this I need to wait for the response before reading it, and I thought I could do what I did below to make the program wait before it proceeds.
Unfortunately the browser seems to freeze every time I use a while loop...
If someone knows how to stop Brython and the browser from freezing, or another method to do what I want to do...
(It would really help me, as it’s the only step left before I can successfully get responses from the Spotify API.)
from browser import ajax  # to make requests
from browser.local_storage import storage as localStorage  # to access HTML5 Local Storage
import json  # to convert a JSON string into a Python dict

# Request to the API
def on_complete(req):
    if req.status == 200 or req.status == 0:
        localStorage['apiResponse'] = req.text
    else:
        print("An error occurred while asking Spotify for data")

def apiRequest(requestUrl, requestMethod):
    req = ajax.ajax()
    req.bind('complete', on_complete)
    req.open(requestMethod, requestUrl, True)
    req.set_header('Authorization', localStorage['header'])
    req.send()

def response():
    while localStorage['apiResponse'] == '':
        continue
    print('done')
    return json.loads(localStorage['apiResponse'])
Thanks in advance!
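For reference: the browser runs Brython's Python on the single UI thread, so a busy while loop never yields control back, the ajax 'complete' callback can never fire, and the page freezes. The usual fix is to make the flow event-driven and continue the program from inside the callback instead of polling. A minimal sketch of that pattern (handle_response is a hypothetical continuation function):

from browser import ajax
from browser.local_storage import storage as localStorage
import json

def handle_response(data):
    # hypothetical continuation: whatever should happen once the data arrives
    print('done', data)

def on_complete(req):
    if req.status == 200 or req.status == 0:
        localStorage['apiResponse'] = req.text
        handle_response(json.loads(req.text))  # resume the program here
    else:
        print("An error occurred while asking Spotify for data")

def apiRequest(requestUrl, requestMethod):
    req = ajax.ajax()
    req.bind('complete', on_complete)
    req.open(requestMethod, requestUrl, True)
    req.set_header('Authorization', localStorage['header'])
    req.send()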

Related

Get requests from one site in parallel Python

I was trying to parse one site; the idea is simple:
I make a GET request to the user page www.link/username. Depending on whether the response (HTML text) contains a certain element, I perform an action.
But I need to check a large number of usernames (~3000) in parallel, and as often as possible.
I have a list of proxies (good proxies, not public ones), and I set the headers with a user agent and referer.
What I do:
I create a thread for each username; 50 usernames share one proxy.
Each thread checks its username once per randomly chosen sleep interval.
At the beginning everything is okay and I get correct responses, but after a few iterations the responses start coming back wrong and my program no longer does what it should.
Can you please help me figure out how to make that many requests at the same time using Python requests?
Some code:
def check_username(username, proxy=None):
    responses[username] = None
    try:
        # proxies must be a dict, e.g. {"https": "http://host:port"}
        responses[username] = requests.get(URL + username, headers=HEADERS, proxies=proxy)
    except Exception as e:
        print(e)
        time.sleep(7)
        return
    if "tgme_page_extra" not in responses[username].text:  # username seems unclaimable
        pass  # action
    else:
        pass  # another action

def username_monit(username, proxy):
    while True:  # check username, then sleep for an interval
        check_username(username, proxy)
        time.sleep(random.choice(config.CHECK_INTERVAL))
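As an aside, a bounded thread pool with one shared session is usually easier to keep healthy than ~3000 free-running threads. A minimal sketch of that approach; URL and HEADERS play the same role as in the question, and the PROXIES and USERNAMES values are hypothetical placeholders:

import concurrent.futures

import requests

URL = "https://www.link/"                    # hypothetical base URL
HEADERS = {"User-Agent": "Mozilla/5.0"}      # hypothetical headers
PROXIES = [{"https": "http://host:port"}]    # hypothetical proxy list
USERNAMES = ["alice", "bob"]                 # hypothetical username list

def check(session, username, proxy):
    # one GET per username; 50 consecutive usernames share one proxy
    resp = session.get(URL + username, headers=HEADERS, proxies=proxy, timeout=10)
    return username, "tgme_page_extra" not in resp.text

session = requests.Session()  # reuses TCP connections across requests

with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    futures = [pool.submit(check, session, name, PROXIES[i // 50 % len(PROXIES)])
               for i, name in enumerate(USERNAMES)]
    for future in concurrent.futures.as_completed(futures):
        try:
            username, unclaimable = future.result()
            print(username, unclaimable)
        except requests.RequestException as e:
            print(e)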

Send POST request with Python that generates a download and download the file

There's a website that has a button which downloads an Excel file. After I click, it takes around 20 seconds for the server API to generate the file and send it back to my browser for download.
If I monitor the communication after I click the button, I can see how the browser sends a POST request to a server with a series of headers and form values.
Is there a way that I can simulate a similar POST request programmatically using Python, and retrieve the Excel file after the server sends it over?
Thank you in advance
The requests module can be used for sending all kinds of HTTP requests.
requests.post sends a POST request synchronously.
The payload data can be set with the data= keyword argument.
The response body can be accessed via .content.
Be sure to check the .status_code and only save the file on a successful response code.
Also note the use of "wb" inside open, because we want to save the file as binary rather than text.
Example:
import requests

payload = {"dao": "SampleDAO",
           "configId": 1,
           # ... the remaining form fields captured from the browser
           }
r = requests.post("http://url.com/api", data=payload)
if r.status_code == 200:
    with open("file.save", "wb") as f:
        f.write(r.content)
Requests Documentation
I guess you could similarly do this:
file_info = requests.get(url)
with open('file_name.extension', 'wb') as file:
    file.write(file_info.content)
Honestly, I can't explain this in much depth, since I only have a loose understanding of how it works.
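If the generated file is large, it may also be worth streaming it to disk instead of buffering the whole body in memory. A minimal sketch using requests' stream mode, reusing the payload from the example above (the URL and output filename are arbitrary):

import requests

# payload as defined in the example above
with requests.post("http://url.com/api", data=payload, stream=True, timeout=60) as r:
    r.raise_for_status()  # raise an error on a non-2xx response
    with open("report.xlsx", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)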

How to send cookie (header) using Python requests library

Hi, I am new to Python requests and would like some help.
When I use python requests to get the session cookie, I run the following:
session_req = requests.session()
result = session_req.get(get_url)
After executing the GET with requests, I read the '.cookies' property and take the key I want to send in the POST header. I get the value successfully, but the POST action does not work.
session_req.cookies['IFCSHOPSESSID']
But when I send the same request to the API via Postman and look at the cookie header (exporting the code as Python requests), I find some differences, and if I use the cookie exported from Postman, it works.
POSTMAN EXAMPLE
'cookie': 'IFCSHOPSESSID=hrthhiqdeg0dvf4ecooc83lui3; nikega=GA1.4.831513767.1599354095; nikega_gid=GA1.4.1839484382.1599354095; _ga=GA1.3.831513767.1599354095; _gid=GA1.3.733956911.1599354099; chaordic_browserId=0-fv_3j6NdVlbNFFwPRzUGQVse7e1bbqga-3OS1599354098234702; chaordic_anonymousUserId=anon-0-fv_3j6NdVlbNFFwPRzUGQVse7e1bbqga-3OS1599354098234702; chaordic_testGroup=%7B%22experiment%22%3Anull%2C%22group%22%3Anull%2C%22testCode%22%3Anull%2C%22code%22%3Anull%2C%22session%22%3Anull%7D; user_unic_ac_id=bec863cf-4e06-0ab1-d881-b566595d3e8f; _gcl_au=1.1.1305519862.1599354100; _fbp=fb.2.1599354100232.504934336; smeventsclear_16df2784b41e46129645c2417f131191=true; smViewOnSite=true; __pr.cvh=4ftsyf8x16; _gaexp=GAX1.3.tupm6REJTMeD-piAakRDMA.18557.0; blueID=75a502b6-e7c2-4eb3-8442-75aea5d95fdc; _cm_ads_activation_retry=false; sback_client=5816989a58791059954e4c52; sback_partner=false; sb_days=1599356617672; sback_refresh_wp=no; smClickOnSite=true; smClickOnSite_652c0aaee02549a3a6ea89988778d3fc=true; _rtbhouse_source_=socialminer; RKT=false; dedup=socialminer; lmd_cj=socialminer; advcake_url=https%3A%2F%2Fwww.nike.com.br%2Flancamentos%3Futm_source%3Dsocialminer%26utm_medium%3Dsocialminer_onsitedesktop%26utm_campaign%3Dsocialminer_onsitedesktop_lancamentos_desk%26smid%3D3-17; advcake_trackid=dd7e2ef0-dd50-889a-aeea-559a0d8bcd22; advcake_utm_content=socialminer_onsitedesktop_lancamentos_desk; advcake_utm_campaign=socialminer; Campanha=; Parceiro=; Midia=; AMCVS_F0935E09512D2C270A490D4D%40AdobeOrg=1; s_cc=true; lmd_orig=direct; SIZEBAY_SESSION_ID=0AC1A70CB19F4f03610665d04bb088ef3b9af0942fc8; sback_customer_w=true; sback_browser=0-87718800-1599408894bff13e290b9fee5fc2b430382f639b87dd9cf25112334287575f550afed62983-14051381-17920887216,13017640152-1599408894; sback_access_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJhcGkuc2JhY2sudGVjaCIsImlhdCI6MTU5OTQwODg5NSwiZXhwIjoxNTk5NDk1Mjk1LCJhcGkiOiJ2MiIsImRhdGEiOnsiY2xpZW50X2lkIjoiNTgxNjk4OWE1ODc5MTA1OTk1NGU0YzUyIiwiY2xpZW50X2RvbWFpbiI6Im5pa2UuY29tLmJyIiwiY3VzdG9tZXJfaWQiOiI1ZjU0M2VjODA5ZjFkMDkzMmQzMjQ2OTUiLCJjdXN0b21lcl9hbm9ueW1vdXMiOmZhbHNlLCJjb25uZWN0aW9uX2lkIjoiNWY1NDNlYzgwOWYxZDA5MzJkMzI0Njk2IiwiYWNjZXNzX2xldmVsIjoiY3VzdG9tZXIifX0.K6FYVBasHjMg_PLbT1yZfrnIp97USqijoMObF4eUSms.WrWrDrHeHezRqBiYiYHeDr; sback_customer=$2gSxATWYdVYOVGMI10bUdkW2pWeoZERU1kc1YWWhd1SNR0aMJ0QUVzTHpHdJZERnpVS6FTSkRUTOBjMys2bUdnT2$12; sback_pageview=false; ak_bmsc=B6177778CB59637165F7EC43342C1559C9063147DA220000234E555F8D78F831~plACNrc4cNxoHZNcO7aF4o+U0KQNKjzPECGSfb42NdayPvdNkBWwUT9QOhGjuLJJ3vStuFIRkiI/35wsHEyUE3/h2guphhaEy71BnfekvDtb/6F84hS+fWhPxxVG5RAlph8WzGpYMn6NZESNVcgnZYfH4HoZ/IzBPR6AMG9UGn6W4xm/j/j9kOfef8v/fZf2pXw4mxJuiN5Cxc7g2sV4nCdoEW98Q4AgqplzxWZjpamZk=; bm_sz=6586256DDAFC895D740341E4214D0D40~YAAQRzEGybYT7yN0AQAAfDw5ZQnXjJtKI2SxkwQFV9vLZpF5mACXNUtUFDSkidKuYM2fac5sQgRozU9fA3+017dht/PUtH+wtibATtTmoVOlpKnW+V76+1rySk3HK6q83Q9rtQc/LaaQ8VYtK/tDi0VOc7/0wLyKy/+Z4OLtgUpySYZZcEX4k8/46no8rFD6OQ==; AMCV_F0935E09512D2C270A490D4D%40AdobeOrg=359503849%7CMCIDTS%7C18512%7CMCMID%7C56897587165425478193529762442649463163%7CMCAAMLH-1600030892%7C4%7CMCAAMB-1600030892%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT-1599433292s%7CNONE%7CMCSYNCSOP%7C411-18519%7CvVersion%7C5.0.1; sback_total_sessions=3; sback_session=5f554e3c73a63da56d739d87; lmd_traf=direct-1599402359608&direct-1599408890286&direct-1599414284313&direct-1599427194077; chaordic_realUserId=2962653; chaordic_session=1599429266491-0.4343169041143473; _st_ses=49222273669791505; _st_cart_script=helper_nike.js; 
_st_cart_url=/; _sptid=1592; _spcid=1592; _st_id=cnVkc29ucmFtb25AZ21haWwuY29t; _st_idb=cnVkc29ucmFtb25AZ21haWwuY29t; lx_sales_channel=%5B%222%22%5D; sback_cart=5f555ba24f507d767721c387; CSRFtoken=1ac8a198f88ac1ccc1f8555ab41c8a95; gpv_v70=nikecombr%3Echeckout%3Eaddress; pv_templateName=CHECKOUT; gptype_v60=checkout%3Aaddress; stc119288=env:1599429270%7C20201007215430%7C20200906222939%7C5%7C1088071:20210906215939|uid:1599354102799.1149977977.6705985.119288.1871143352:20210906215939|srchist:1088071%3A1599429270%3A20201007215430:20210906215939|tsa:1599429270805.1898407973.364911.7635034620790062.2:20200906222939; bm_sv=C9C3A8C6B2F6CB232317BB794ADC0497~ZnoksXquh4Yrh4uN87gycXdh+ixzU+xMFsb94sO9uE5JMLyZz9eJPp5odX7vx944KIXG1nvOxuq8pdrQUDjBrchRJLC4yiD1yWX0h4BjWhZwbfHPtnzaT3ASbIZnf2Ts1TRt+ZAescJJwrNPs4oV2If7vyiWi2AYILFvCstCTS8=; _uetsid=a9a0bfd4fe4e4db52bcd4ca66850a785; _uetvid=9ba47ed116a48f496f6b1a9844e21c95; __udf_j=f08aeb668454efbf6ddc83dd9d4b7a8385abde9f9fbd92526f1de0441da2126ec40330dfc36d0b9c3eae98557c94447d; _spl_pv=40; s_sq=lojanike-new-production%252Clojanike-nikebr%3D%2526c.%2526a.%2526activitymap.%2526page%253Dnikecombr%25253Echeckout%25253Eaddress%2526link%253DSeguir%252520para%252520pagamento%2526region%253Didentificacao-form%2526pageIDType%253D1%2526.activitymap%2526.a%2526.c%2526pid%253Dnikecombr%25253Echeckout%25253Eaddress%2526pidt%253D1%2526oid%253DSeguir%252520para%252520pagamento%2526oidt%253D3%2526ot%253DSUBMIT; RT="z=1&dm=nike.com.br&si=92b42534-25ee-4155-aa1a-e7d127581869&ss=kermvxyl&sl=9&tt=17e8&bcn=%2F%2F173e2544.akstat.io%2F"; _abck=F6E1C280C3F9D735A2B1AB62443DB479~-1~YAAQVjEGycno+iJ0AQAAmtRxZQT8kxLFalTup4dkYT5+cq/PavPcY4/0zAeJv4GoSQQwYVj4EWydkfxbJR3Rgaa4k6ma+5O72J/lsiajATrx0oaZJuB5b/FIP6RymanPRVGlb3kLJXpBQDkCmVv62kkxLKxySrlAYDCg0ORCpSXlTCbFBVEchC9ih5t094egSeVdM6VjfQSO9uDKISBoP4923qkJMTpbk9B1nOoiylKK+y+FGFu8pzEpQqZYj7tIMTJVpqe0OpXaQ8m8nPyp0K+PmBcAndIHcBMTZUEqma9/72Enx8yvGbKXrYbAzNDw6ZtKY9OAbNuVeqprza/Af0aUkinm0l3JqxjTH1LpglNxNN4=~-1~-1~-1; CSRFtoken=20a208bad599aa3ead0bbe944b27a368; bm_sv=C9C3A8C6B2F6CB232317BB794ADC0497~ZnoksXquh4Yrh4uN87gycXdh+ixzU+xMFsb94sO9uE5JMLyZz9eJPp5odX7vx944KIXG1nvOxuq8pdrQUDjBrchRJLC4yiD1yWX0h4BjWhbSXhHWWrgkUsOTt9033P5Wxu1qmo5M6w0VAWeAzBaCN7yZC2Ll7DiGq0CwpjxlOW4=; _abck=F6E1C280C3F9D735A2B1AB62443DB479~-1~YAAQVjEGyRKO+iJ0AQAA+4U9ZQSNIWTEz/60Uk5gz2tnzVtbMbX0hpaMbkbeJxSYSMD1xo7TTedXnJ0UuTLxxcHhLVrRRCrZfSjZ+yH00Ld6FLIajmYFefKPehzA6GgwjnLyucI1O6nDw2ZU1CV0WJLeWGgcmX7sinsLr3DVtmoGJyNR1Q9EWpvq71/W1Ys4Bqhq1628YKEz/0Z1Ic1bWMujcG03064ZZYYXTSTz9jrkxHKaEoJQNQgyUg9NXQhv4EFoMSESy/AIKRy+hVCULLJscbkpH8WakuvYQ1raghVfheks/Xra9AmiUoOqAbWAPXOij1nWQ9PSV2hxQZfkibD0+YP14pTXPoCAUA9jCQHRJIw=~0~-1~-1'
session_req.cookies['IFCSHOPSESSID'] EXAMPLE
qnabtagl4pu7gm2jg3sij03cu6
Another curious thing: when I use the '.cookies' property, my POST call returns success even though it does not update the cart where it should be inserting a new record.
As I am trying to develop a bot for this site, I would like to generate this same cookie via Python requests. Can anyone help me with it?
This is an example with Python 3. You can customize it.
import requests

data = "param_1=value_1&param_2=value_2&.....&param_n=value_n"  # your request parameters
cookies = {"cookie_name": "xxxxxxxx"}  # cookies must be passed as a dict, not a string
url_endpoint = "https://........."  # your URL endpoint

# send the cookies along with the request
resp = requests.get(url_endpoint, data=data.encode('utf-8'), cookies=cookies)
if resp.status_code == 200:
    print("success")
else:
    print("error")

Asynchronous JSON Requests in Python

I'm using an API for doing HTTP requests that return JSON. Calling the API, however, requires a start page and an end page to be indicated, such as this:
import requests

def API_request(URL):
    while True:
        try:
            Response = requests.get(URL)
            Data = Response.json()
            return Data['data']
        except Exception as APIError:
            print(APIError)
            continue

def build_orglist(start_page, end_page):
    APILink = ("http://sc-api.com/?api_source=live&system=organizations&action="
               "all_organizations&source=rsi&start_page={0}&end_page={1}&items_"
               "per_page=500&sort_method=&sort_direction=ascending&expedite=1&f"
               "ormat=json".format(start_page, end_page))
    return API_request(APILink)
The only way to know that you're no longer on an existing page is when the JSON comes back null.
If I wanted to run multiple build_orglist calls going over every page asynchronously until I reach the end (null JSON), how could I do so?
I went with a mix of @LukasGraf's suggestion of using a session to unify all of my HTTP connections, and grequests for making the group of HTTP requests in parallel.
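For reference, a minimal sketch of that combination, assuming the same endpoint as build_orglist and that a page past the end yields null data (grequests must be installed separately, and the batch size of 10 is arbitrary):

import grequests
import requests

session = requests.Session()

def org_url(page):
    # same endpoint as build_orglist, one page per request
    return ("http://sc-api.com/?api_source=live&system=organizations&action="
            "all_organizations&source=rsi&start_page={0}&end_page={0}&items_"
            "per_page=500&sort_method=&sort_direction=ascending&expedite=1&f"
            "ormat=json".format(page))

orgs, page = [], 1
while True:
    batch = [grequests.get(org_url(p), session=session) for p in range(page, page + 10)]
    results = [(r.json() or {}).get('data') for r in grequests.map(batch) if r is not None]
    full_pages = [data for data in results if data]  # drop null pages
    for data in full_pages:
        orgs.extend(data)
    if len(full_pages) < len(batch):  # a null page means we ran past the end
        break
    page += 10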

detect if a web page is changed

In my Python application I have to read many web pages to collect data. To decrease the number of HTTP calls I would like to fetch only changed pages. My problem is that my code always tells me that the pages have changed (code 200) when in reality they have not.
This is my code:
from models import mytab
import re
import urllib2
from wsgiref.handlers import format_date_time
from datetime import datetime
from time import mktime

def url_change():
    urls = mytab.objects.all()
    # this is some urls:
    # http://www.venere.com/it/pensioni/venezia/pensione-palazzo-guardi/#reviews
    # http://www.zoover.it/italia/sardegna/cala-gonone/san-francisco/hotel
    # http://www.orbitz.com/hotel/Italy/Venice/Palazzo_Guardi.h161844/#reviews
    # http://it.hotels.com/ho292636/casa-del-miele-susegana-italia/
    # http://www.expedia.it/Venezia-Hotel-Palazzo-Guardi.h1040663.Hotel-Information#reviews
    # ...
    for url in urls:
        request = urllib2.Request(url.url)
        if url.last_date == None:
            now = datetime.now()
            stamp = mktime(now.timetuple())
            url.last_date = format_date_time(stamp)
            url.save()
        request.add_header("If-Modified-Since", url.last_date)
        try:
            response = urllib2.urlopen(request)  # make the request
            # some actions
            now = datetime.now()
            stamp = mktime(now.timetuple())
            url.last_date = format_date_time(stamp)
            url.save()
        except urllib2.HTTPError, err:
            if err.code == 304:
                print "nothing...."
            else:
                print "Error code:", err.code
I do not understand what has gone wrong. Can anyone help me?
Web servers aren't required to respond with a 304 when you send an 'If-Modified-Since' header. They're free to send an HTTP 200 and send the entire page again.
Sending an 'If-Modified-Since' or 'If-None-Match' header alerts the server that you'd like a cached response if one is available. It's like sending an 'Accept-Encoding: gzip, deflate' header: you're telling the server what you'll accept, not requiring it.
A good way to check whether a site returns 304 is to use Google Chrome's dev tools. For example, load the BLS website with the Network panel open: keep refreshing and you will see that the server keeps returning 304; if you force refresh with Ctrl+F5 (Windows), it returns status code 200 instead.
You can use this technique on your example to find out whether the server never returns 304, or whether you have somehow formatted your request headers incorrectly. Sometimes a web page imports a resource that does not respect the If- headers, so that resource returns 200 whatever you do (if any resource on the page does not return 304, the whole page will return 200). If you are only interested in a specific part of a website, you can sometimes cheat by loading that resource directly and bypassing the whole document.
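For reference, a more robust variant is to echo back the validators the server itself sent (Last-Modified and/or ETag) rather than a locally generated timestamp. A minimal sketch with the requests library; the url, stored_last_modified/stored_etag values, and the final print stand in for whatever persistence and handling you use:

import requests

url = "http://www.example.com/page"  # hypothetical page being monitored
stored_last_modified = None          # validators saved from a previous response
stored_etag = None

headers = {}
if stored_last_modified:
    headers['If-Modified-Since'] = stored_last_modified
if stored_etag:
    headers['If-None-Match'] = stored_etag

resp = requests.get(url, headers=headers)
if resp.status_code == 304:
    print('not modified')
else:
    # save the server's own validators to send with the next request
    stored_last_modified = resp.headers.get('Last-Modified')
    stored_etag = resp.headers.get('ETag')
    print('changed, %d bytes' % len(resp.content))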
