I have a simple HTML page where I am trying to post form data using requests.post(); however, I keep getting "400 Bad Request: CSRF token missing or incorrect", even though I am passing the token URL-encoded.
Please help.
url = "https://recruitment.advarisk.com/tests/scraping"
res = requests.get(url)
tree = etree.HTML(res.content)
csrf = tree.xpath('//input[#name="csrf_token"]/#value')[0]
postData = dict(csrf_token=csrf, ward=wardName)
print(postData)
postUrl = urllib.parse.quote(csrf)
formData = dict(csrf_token=postUrl, ward=wardName)
print(formData)
headers = {'referer': url, 'content-type': 'application/x-www-form-urlencoded', 'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
page = requests.post(url, data=formData, headers=headers)
return page.content
You have to make the requests in one session, so that the csrf_token will match:
import sys
import requests
from lxml import etree
wardName = "DHANLAXMICOMPLEX"
url = 'https://recruitment.advarisk.com/tests/scraping'
#make the requests in one session
client = requests.session()
# Retrieve the CSRF token first
tree = etree.HTML(client.get(url).content)
csrf = tree.xpath('//input[@name="csrf_token"]/@value')[0]
#form data
formData = dict(csrf_token=csrf, ward=wardName)
headers = {'referer': url, 'content-type': 'application/x-www-form-urlencoded', 'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
#use same session client
r = client.post(url, data=formData, headers=headers)
print(r.content)
It will give you the HTML with the result data table.
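If you then want the rows out of that table, here is a minimal parsing sketch with lxml; the XPath is an assumption about the markup, so adjust it to the real table structure:
from lxml import etree
tree = etree.HTML(r.content)
# walk every row of the table and print its cell texts
for row in tree.xpath('//table//tr'):
    cells = [''.join(cell.itertext()).strip() for cell in row.xpath('./td')]
    if cells:
        print(cells)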
I'm getting the same response from these 2 URLs:
First URL
Second URL
This is the code I'm using:
import requests
url = "https://www.amazon.it/blackfriday"
querystring = {"ref_":"nav_cs_gb_td_bf_dt_cr","deals-widget":"{\"version\":1,\"viewIndex\":60,\"presetId\":\"deals-collection-all-deals\",\"sorting\":\"BY_SCORE\"}"}
payload = ""
headers = {"cookie": "session-id=260-4643637-2647537; session-id-time=2082787201l; i18n-prefs=EUR; ubid-acbit=258-7747562-7485655; session-token=%22aZB70z2dnXHbhJ9e02ESp7q6xO23IGnDFT2iBCiPXZFoBTTEguAJ%2FBSnV7ud6bjAca64nh3bMF1bwDykOBf9BV%2BVjbx4tUQCyBkrg8tyR8PLZ8cjzpCz%2FzQSAmjiL6mSBcspkF8xuV0bxqLeRX7JQCMrHVBFf%2BsUhxV%2FMBLCH8UPk2o5aNL7OyAFCODBdRqm72RK5DAoKeMUymlVEOtqzvZSJbP%2Fut0gobiXJblRM2c%3D%22"}
response = requests.request("GET", url, data=payload, headers=headers, params=querystring)
I would like to get the same response that I get in the browser.
How can I do it? Why does this happen?
You have to trick the server into thinking you are a browser. You can accomplish this by setting the user agent header.
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36',"cookie": "session-id=260-4643637-2647537; session-id-time=2082787201l; i18n-prefs=EUR; ubid-acbit=258-7747562-7485655; session-token=%22aZB70z2dnXHbhJ9e02ESp7q6xO23IGnDFT2iBCiPXZFoBTTEguAJ%2FBSnV7ud6bjAca64nh3bMF1bwDykOBf9BV%2BVjbx4tUQCyBkrg8tyR8PLZ8cjzpCz%2FzQSAmjiL6mSBcspkF8xuV0bxqLeRX7JQCMrHVBFf%2BsUhxV%2FMBLCH8UPk2o5aNL7OyAFCODBdRqm72RK5DAoKeMUymlVEOtqzvZSJbP%2Fut0gobiXJblRM2c%3D%22"}
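With that headers dict in place, the request itself stays the same; a minimal sketch using the same URL as above:
import requests
url = "https://www.amazon.it/blackfriday"
# 'headers' is the dict defined just above; the user-agent entry is what
# makes the server treat you as a browser
response = requests.get(url, headers=headers)
print(response.status_code)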
I just found the Best Buy Canada API by inspecting the XHR requests.
aboveurl = 'https://www.bestbuy.ca/ecomm-api/availability/products?accept=application%2Fvnd.bestbuy.simpleproduct.v1%2Bjson&accept-language=en-CA&skus=14962185'
I've tried:
response= requests.get(aboveurl)
print(response.text)
and also:
r = requests.get(aboveurl).json()
print(r)
When I run my code in VS Code, it starts and keeps running, but it will not display anything.
You have to add headers to your request to get a result:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1","DNT": "1","Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language": "en-US,en;q=0.5","Accept-Encoding": "gzip, deflate"}
aboveurl = "https://www.bestbuy.ca/ecomm-api/availability/products?accept=application%2Fvnd.bestbuy.simpleproduct.v1%2Bjson&accept-language=en-CA&skus=14962185"
html = requests.get(aboveurl,headers=headers)
print(f'request code: {html.status_code}\n request text: {html.text}')
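Since that endpoint serves JSON, you can also let requests decode it; a small sketch continuing the snippet above (assuming the call succeeded):
# .json() parses the response body into Python objects
data = html.json()
print(data)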
I'm trying to get data from a JSON link, but I'm getting this error: TypeError: can't concat str to bytes
This is my code:
l = "https://www.off---white.com/en/IT/men/products/omch016f18d471431088s"
url = (l+".json"+"?porcoiddio")
req = urllib.request.Request(url, headers)
response = urllib.request.urlopen(req)
size_opts = json.loads(response.decode('utf-8'))['available_sizes']
How can I solve this error?
The answer to your question is to change your code to:
size_opts = json.loads(response.read().decode('utf-8'))['available_sizes']
Edit at 2018-10-02 22:55: I viewed your source code and found Response 503. The reason why you got 503 is that the request did not contain cookies:
req = urllib.request.Request(url, headers=headers)
You have to update your headers:
headers.update({"Cookie":cookie_value})
req = urllib.request.Request(url, headers=headers) # !!!! you need a headers include cookies !!!!
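Putting it together, a minimal sketch; the cookie string is a placeholder you would copy from your browser's dev tools, not a real value:
import urllib.request
import json
url = "https://www.off---white.com/en/IT/men/products/omch016f18d471431088s.json"
headers = {
    "User-Agent": "Mozilla/5.0",
    "Cookie": "session=PASTE_COOKIE_VALUE_HERE",  # placeholder cookie
}
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req) as response:
    data = json.loads(response.read().decode('utf-8'))
print(data.get('available_sizes'))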
You are providing the data argument by mistake …
You'll have to use a keyword argument for headers, as otherwise the second positional argument fills the data parameter. Try this:
req = urllib.request.Request(url, headers=headers)
See https://docs.python.org/3/library/urllib.request.html#urllib.request.Request for documentation of the Request signature.
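To see the mistake in isolation: the signature is Request(url, data=None, headers={}, ...), so a positional second argument lands in data and silently turns the request into a POST:
import urllib.request
headers = {'User-Agent': 'Mozilla/5.0'}
# wrong: the dict is consumed as the 'data' parameter
bad = urllib.request.Request('https://example.com', headers)
# right: bind it explicitly to the 'headers' keyword
good = urllib.request.Request('https://example.com', headers=headers)
print(bad.get_method(), good.get_method())  # POST GET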
You could have a go using requests instead?
import requests, json
l = "https://www.off---white.com/en/IT/men/products/omch016f18d471431088s"
url = (l+".json"+"?porcoiddio")
session = requests.Session()
session.mount('http://', requests.adapters.HTTPAdapter(max_retries=10))
headers = {
    'Referer': 'off---white.com/it/IT/login',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
size_opts = session.get(url, headers=headers).json()['available_sizes']
To check the response:
response = session.get(url, headers=headers)
print(response)
Gives
<Response [503]>
This response means: "503 Service Unavailable. The server is currently unable to handle the request due to a temporary overload or scheduled maintenance"
I would suggest the problem isn't the code but the server?
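If the 503s are intermittent, retrying with backoff at least rules that out; a sketch using urllib3's Retry, with illustrative (not tuned) parameters:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# retry up to 5 times on 503, waiting longer between attempts
retries = Retry(total=5, backoff_factor=1, status_forcelist=[503])
session.mount('https://', HTTPAdapter(max_retries=retries))
response = session.get(url)  # 'url' as defined in the snippet above
print(response.status_code)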
I am fairly new to Python and I'm trying to extract production data from the Alabama state website (https://www.gsa.state.al.us/ogb/production). I was wondering if someone could guide me on starting this? This is what I have so far. I was trying to extract production for permit number 8132-C.
headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36',
}
payload = '8132-C'
session = requests.Session()
r = requests.get('https://www.gsa.state.al.us/ogb/production', params=payload)
print(r.url)
Instead of r.url, you should use r.text to see the data.
import requests
payload = '8132-C'
session = requests.Session()
r = requests.get('https://www.gsa.state.al.us/ogb/production', params=payload)
print(r.text)
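If the page actually expects the permit number as a form field, a POST may be closer to what the site wants; in this sketch the field name 'PermitNumber' is a guess, so inspect the real form inputs in your browser's dev tools first:
import requests
session = requests.Session()
# hypothetical field name; replace with the form's actual input name
form = {'PermitNumber': '8132-C'}
r = session.post('https://www.gsa.state.al.us/ogb/production', data=form)
print(r.status_code)
print(r.text[:500])  # first part of the response body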
For a university project I am currently trying to log in to a website and scrape a little detail (a list of news articles) from my user profile.
I am new to Python, but I did this before on another website. My first two approaches produce different HTTP errors. I have considered problems with the headers my request sends; however, my understanding of this site's login process appears to be insufficient.
This is the login page: http://seekingalpha.com/account/login
My first approach looks like this:
import requests

with requests.Session() as c:
    requestUrl = 'http://seekingalpha.com/account/orthodox_login'
    USERNAME = 'XXX'
    PASSWORD = 'XXX'
    userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
    login_data = {
        "slugs[]": None,
        "rt": None,
        "user[url_source]": None,
        "user[location_source]": "orthodox_login",
        "user[email]": USERNAME,
        "user[password]": PASSWORD
    }
    c.post(requestUrl, data=login_data, headers={"referer": "http://seekingalpha.com/account/login", 'user-agent': userAgent})
    page = c.get("http://seekingalpha.com/account/email_preferences")
    print(page.content)
This results in "403 Forbidden"
My second approach looks like this:
from requests import Request, Session
requestUrl ='http://seekingalpha.com/account/orthodox_login'
USERNAME = 'XXX'
PASSWORD = 'XXX'
userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
# c.get(requestUrl)
login_data = {
    "slugs[]": None,
    "rt": None,
    "user[url_source]": None,
    "user[location_source]": "orthodox_login",
    "user[email]": USERNAME,
    "user[password]": PASSWORD
}
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4",
    "origin": "http://seekingalpha.com",
    "referer": "http://seekingalpha.com/account/login",
    "Cache-Control": "max-age=0",
    "Upgrade-Insecure-Requests": "1",  # header values must be strings, not ints
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
}
s = Session()
req = Request('POST', requestUrl, data=login_data, headers=headers)
prepped = s.prepare_request(req)
prepped.body ="slugs%5B%5D=&rt=&user%5Burl_source%5D=&user%5Blocation_source%5D=orthodox_login&user%5Bemail%5D=XXX%40XXX.com&user%5Bpassword%5D=XXX"
resp = s.send(prepped)
print(resp.status_code)
In this approach I was trying to prepare the headers exactly as my browser would. Sorry for the redundancy. This results in HTTP error 400.
Does someone have an idea what went wrong? Probably a lot.
Instead of spending a lot of energy on manually logging in and playing with Session, I suggest you just scrape the pages right away using your cookie.
When you log in, usually a cookie is added to your request to identify you. Please see this for example:
Your code will be like this:
import requests
response = requests.get("www.example.com", cookies={
"c_user":"my_cookie_part",
"xs":"my_other_cookie_part"
})
print response.content
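The same idea works with a Session, so the cookies ride along on every request; the cookie names and the profile URL below are taken from the question, and the values are placeholders:
import requests
session = requests.Session()
# placeholders: copy the real values from your browser's dev tools
session.cookies.update({'c_user': 'my_cookie_part', 'xs': 'my_other_cookie_part'})
page = session.get('http://seekingalpha.com/account/email_preferences')
print(page.content)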