I am trying to log in to the member area of the following website:
https://trader.degiro.nl/
Unfortunately, I have tried many ways without success.
The POST form seems to expect JSON, which is why I sent JSON instead of regular POST data:
import requests
session = requests.Session()
data = {"username":"test", "password":"test", "isRedirectToMobile": "false", "loginButtonUniversal": ""}
url = "https://trader.degiro.nl/login/#/login"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36'}
r = session.post(url, headers=headers, json={'json_payload': data})
Does anyone have an idea why it doesn't work?
Looking at the request my browser sends, the code should be:
url = "https://trader.degiro.nl/login/secure/login"
...
r = session.post(url, headers=headers, json=data)
That is, there's no need to wrap the data in json_payload, and the URL is slightly different from the one used for viewing the login page.
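Putting this together, a minimal sketch of the corrected login request (the field names are copied from your question and the endpoint from the browser's network tab, so they may have changed since):
import requests

session = requests.Session()
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36'}
# Credentials and extra fields exactly as observed in the browser's login request
data = {"username": "test", "password": "test", "isRedirectToMobile": "false", "loginButtonUniversal": ""}
# POST the JSON body directly to the login endpoint, not to the page that renders the form
url = "https://trader.degiro.nl/login/secure/login"
r = session.post(url, headers=headers, json=data)
print(r.status_code)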
Requests.get() does not seem to be returning the expected bytes for Wikipedia image URLs, such as https://upload.wikimedia.org/wikipedia/commons/0/05/20100726_Kalamitsi_Beach_Ionian_Sea_Lefkada_island_Greece.jpg:
import wikipedia
import requests
page = wikipedia.page("beach")
first_image_link = page.images[0]
req = requests.get(first_image_link)
req.content
b'<!DOCTYPE html>\n<html lang="en">\n<meta charset="utf-8">\n<title>Wikimedia Error</title>\n<style>\n*...
Most websites block requests that come in without a valid browser User-Agent; Wikimedia is one of them.
import requests
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
res = requests.get('https://upload.wikimedia.org/wikipedia/commons/0/05/20100726_Kalamitsi_Beach_Ionian_Sea_Lefkada_island_Greece.jpg', headers=headers)
res.content
which will give you the expected output.
I ran your code and got an "Error: 403, Forbidden." response. Wikipedia requires a User-Agent header in the request.
import wikipedia
import requests
headers = {
'User-Agent': 'My User Agent 1.0'
}
page = wikipedia.page("beach")
first_image_link = page.images[0]
req = requests.get(first_image_link, headers=headers, stream=True)
req.content
For the user agent, you should probably supply something a bit more descriptive than the placeholder I use in my example. Maybe the name of your script, or just the word "script" or something like that.
I tested it and it works fine. You will get back the image as you are expecting.
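If you then want to keep the image, a small follow-up sketch (the filename here is just an example) is to write the downloaded bytes to disk:
# Save the response body as a local file; 'beach.jpg' is an arbitrary name
with open('beach.jpg', 'wb') as f:
    f.write(req.content)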
I am trying to scrape this website: https://ssweb.seap.minhap.es/portalEELL/consulta_alcaldes
When you choose Alicante from the first menu and then Ayuntamiento de Abengibre from the second, you will see a table with results. This is what I want.
I saw in the Chrome console that choosing values in the drop-downs generates a POST request, so I thought it would be straightforward to obtain the data with requests.post:
params = {
"consulta_alcalde[_csrf_token]":"dd1546dd35bf0f1af4a1f3aac165a1b5",
"consulta_alcalde[id_provincia]":"2",
"consulta_alcalde[id_entidad]":"17926"
}
r = requests.post("https://ssweb.seap.minhap.es/portalEELL/consulta_alcaldes", params)
But when I check what r.text contains, I get a 200 response but can't see the data from the table. What am I doing wrong?
I am aware it can be done with Selenium but I am trying to avoid it as it's very slow.
EDIT:
As per Brian's suggestion, I have modified my code:
params = {
"consulta_alcalde[_csrf_token]":"dd1546dd35bf0f1af4a1f3aac165a1b5",
"consulta_alcalde[id_provincia]":"2",
"consulta_alcalde[id_entidad]":"17951",
"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
}
with requests.Session() as s:
    s.get("https://ssweb.seap.minhap.es/portalEELL/consulta_alcaldes")
    r = s.post("https://ssweb.seap.minhap.es/portalEELL/consulta_alcaldes", data=params)
But still no luck...
The "csrf_token" is not static, you'll have to parse the page with bs4 to get it.
Also the site provides content via xhr request, so you need to have "XMLHttpRequest" in the headers. Code:
import requests
from bs4 import BeautifulSoup

url = 'https://ssweb.seap.minhap.es/portalEELL/consulta_alcaldes'
s = requests.Session()
# Load the form page first to obtain both the session cookie and the CSRF token
r = s.get(url, verify=False)
soup = BeautifulSoup(r.content, 'html.parser')
csrf_token = soup.find('input', id="consulta_alcalde__csrf_token")['value']
data = {
    "consulta_alcalde[_csrf_token]": csrf_token,
    "consulta_alcalde[id_provincia]": "2",
    "consulta_alcalde[id_entidad]": "17951"
}
# The site only answers this form for AJAX requests
headers = {"X-Requested-With": "XMLHttpRequest"}
r = s.post(url, data=data, headers=headers, verify=False)
print(r.content)
With a POST request, the payload should be in the body of the request. To do this, pass the params using the data keyword argument:
requests.post(url, data=payload)
If the endpoint expects JSON, you can either serialize the payload yourself with json.dumps or simply pass it to the json keyword argument instead:
requests.post(url, json=payload)
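A quick way to see the difference between the two, assuming you just want to inspect what the server receives, is to send the same payload to an echo service such as httpbin:
import requests

payload = {"key": "value"}

# Form-encoded body: httpbin echoes it back under "form"
r1 = requests.post('https://httpbin.org/post', data=payload)
print(r1.json()['form'])

# JSON body: httpbin echoes it back under "json"
r2 = requests.post('https://httpbin.org/post', json=payload)
print(r2.json()['json'])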
I would like to get the response data from a specific website.
I have this site:
https://enjoy.eni.com/it/milano/map/
and if I open the browser debugger console I can see a POST request that gives a JSON response.
How can I get this response in Python by scraping the website?
Thanks
Apparently the web service validates a PHPSESSID cookie, so we need to get it first using a proper user agent:
import requests
import json

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
}
# First GET the map page to obtain the PHPSESSID cookie
r = requests.get('https://enjoy.eni.com/it/milano/map/', headers=headers)
session_id = r.cookies['PHPSESSID']
# Re-send the cookie with the POST that returns the vehicle data
headers['Cookie'] = 'PHPSESSID={};'.format(session_id)
res = requests.post('https://enjoy.eni.com/ajax/retrieve_vehicles', headers=headers, allow_redirects=False)
json_obj = json.loads(res.content)
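A slightly tidier variant, assuming the same endpoints, is to let a Session carry the PHPSESSID cookie automatically instead of copying it into the headers by hand:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
}

with requests.Session() as s:
    s.headers.update(headers)
    # The GET sets PHPSESSID on the session; it is re-sent automatically with the POST
    s.get('https://enjoy.eni.com/it/milano/map/')
    res = s.post('https://enjoy.eni.com/ajax/retrieve_vehicles', allow_redirects=False)
    json_obj = res.json()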
I have tried logging into GitHub using the following code:
url = 'https://github.com/login'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
'login':'username',
'password':'password',
'authenticity_token':'Token that keeps changing',
'commit':'Sign in',
'utf8':'%E2%9C%93'
}
res = requests.post(url)
print(res.text)
Now, res.text prints the code of the login page. I understand that this may be because the token keeps changing continuously. I have also tried setting the URL to https://github.com/session, but that does not work either.
Can anyone tell me a way to generate the token? I am looking for a way to log in without using the API. I had asked another question where I mentioned that I was unable to log in. One comment said that I was not doing it right and that it is possible to log in just by using the requests module, without the help of the GitHub API.
ME:
So, can I log in to Facebook or Github using the POST method? I have tried that and it did not work.
THE USER:
Well, presumably you did something wrong
Can anyone please tell me what I did wrong?
After the suggestion about using sessions, I have updated my code:
s = requests.Session()
headers = {Same as above}
s.put('https://github.com/session', headers=headers)
r = s.get('https://github.com/')
print(r.text)
I still can't get past the login page.
I think you get back to the login page because you are redirected, and since your code doesn't send back your cookies, you can't keep a session.
You are looking for session persistence, and requests provides it:
Session Objects: The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
http://docs.python-requests.org/en/master/user/advanced/
Actually, in a POST request the parameters should be in the request body, not in the headers, so the login data should go in the data parameter.
For GitHub, the authenticity token is present in the value attribute of an input tag; here it is extracted using the BeautifulSoup library.
This code works fine:
import requests
from getpass import getpass
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
login_data = {
    'commit': 'Sign in',
    'utf8': '%E2%9C%93',
    'login': input('Username: '),
    'password': getpass()
}
url = 'https://github.com/session'
session = requests.Session()
# Load the login page so the session picks up GitHub's cookies
response = session.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html5lib')
# The authenticity token changes on every page load, so scrape it fresh
login_data['authenticity_token'] = soup.find(
    'input', attrs={'name': 'authenticity_token'})['value']
response = session.post(url, data=login_data, headers=headers)
print(response.status_code)
response = session.get('https://github.com', headers=headers)
print(response.text)
This code works perfectly:
import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
login_data = {
    'commit': 'Sign in',
    'utf8': '%E2%9C%93',
    'login': 'your-username',
    'password': 'your-password'
}
with requests.Session() as s:
    url = "https://github.com/session"
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html5lib')
    # Scrape the per-request CSRF token before posting the credentials
    login_data['authenticity_token'] = soup.find('input', attrs={'name': 'authenticity_token'})['value']
    r = s.post(url, data=login_data, headers=headers)
You can also try the PyGithub library, which wraps the GitHub API, to perform common GitHub tasks.
Check the link below:
https://github.com/PyGithub/PyGithub
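For example, a minimal sketch with PyGithub (assuming you have generated a personal access token) could look like this:
from github import Github

# Authenticate with a personal access token instead of scraping the login form
g = Github("your-personal-access-token")

user = g.get_user()
print(user.login)

# List the names of the repositories you have access to
for repo in user.get_repos():
    print(repo.name)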
import requests
from bs4 import BeautifulSoup

headers = {
"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Encoding":"gzip, deflate",
"Accept-Language":"en-US,en;q=0.5",
"Connection":"keep-alive",
"Host":"mcfbd.com",
"Referer":"https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx",
"User-Agent":"Mozilla/5.0(Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"}
a = requests.session()
soup = BeautifulSoup(a.get("https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx").content)
payload = {"ctl00$ContentPlaceHolder1$txtSearchHouse":"",
"ctl00$ContentPlaceHolder1$txtSearchSector":"",
"ctl00$ContentPlaceHolder1$txtPropertyID":"",
"ctl00$ContentPlaceHolder1$txtownername":"",
"ctl00$ContentPlaceHolder1$ddlZone":"1",
"ctl00$ContentPlaceHolder1$ddlSector":"2",
"ctl00$ContentPlaceHolder1$ddlBlock":"2",
"ctl00$ContentPlaceHolder1$btnFind":"Search",
"__VIEWSTATE":soup.find('input',{'id':'__VIEWSTATE'})["value"],
"__VIEWSTATEGENERATOR":"14039419",
"__EVENTVALIDATION":soup.find("input",{"name":"__EVENTVALIDATION"})["value"],
"__SCROLLPOSITIONX":"0",
"__SCROLLPOSITIONY":"0"}
b = a.post("https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx",headers = headers,data = payload).text
print(b)
Above is my code for this website:
https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx
I checked Firebug, and these are the values of the form data.
However, doing this:
b = requests.post("https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx",headers = headers,data = payload).text
print(b)
throws this error:
[ArgumentException]: Invalid postback or callback argument
Is my understanding of submitting forms via requests correct?
1. Open Firebug
2. Submit the form
3. Go to the NET tab
4. On the NET tab, choose the POST tab
5. Copy the form data, as in the code above
I've always wanted to know how to do this. I could use Selenium, but I thought I'd try something new and use requests.
The error you are receiving is expected, because fields like __VIEWSTATE (and others as well) are not static or hardcoded. The proper way to do this is as follows:
Create a requests Session object. It is also advisable to update it with headers containing a User-Agent string -
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"}
s = requests.session()
s.headers.update(headers)
Navigate to the specified url -
r = s.get(url)
Use BeautifulSoup4 to parse the html returned -
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.content, 'html5lib')
Populate formdata with the hardcoded values and dynamic values -
formdata = {
'__VIEWSTATE': soup.find('input', attrs={'name': '__VIEWSTATE'})['value'],
'field1': 'value1'
}
Then send the POST request using the session object itself -
s.post(url, data=formdata)
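Putting those steps together for the page in question, a rough sketch (the field values are the ones from your question and may need adjusting) would be:
import requests
from bs4 import BeautifulSoup

url = "https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"}

with requests.Session() as s:
    s.headers.update(headers)
    # GET the page first so the hidden ASP.NET fields belong to this session
    r = s.get(url)
    soup = BeautifulSoup(r.content, 'html5lib')
    formdata = {
        "__VIEWSTATE": soup.find('input', attrs={'name': '__VIEWSTATE'})['value'],
        "__EVENTVALIDATION": soup.find('input', attrs={'name': '__EVENTVALIDATION'})['value'],
        "ctl00$ContentPlaceHolder1$ddlZone": "1",
        "ctl00$ContentPlaceHolder1$ddlSector": "2",
        "ctl00$ContentPlaceHolder1$ddlBlock": "2",
        "ctl00$ContentPlaceHolder1$btnFind": "Search",
    }
    # POST back to the same URL with the freshly scraped hidden fields
    result = s.post(url, data=formdata)
    print(result.status_code)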