POST request table Python3

I am trying to scrape a table from https://www.domeinquarantaine.nl/, but for some reason the response does not contain the table.
import requests

#The parameters
baseURL = "https://www.domeinquarantaine.nl/tabel.php"
PARAMS = {"qdate": "2019-04-21", "pagina": "2", "order": "karakter"}
DATA = {"qdate=2019-04-21&pagina=3&order="}  # note: this is a set, not a dict, and is never used below
HEADERS = {"Host": "www.domeinquarantaine.nl",
           "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0",
           "Accept": "*/*",
           "Accept-Language": "en-US,en;q=0.5",
           "Accept-Encoding": "gzip, deflate, br",
           "Referer": "https://www.domeinquarantaine.nl/",
           "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
           "X-Requested-With": "XMLHttpRequest",
           "Content-Length": "41",
           "Connection": "keep-alive",
           "Cookie": "_ga=GA1.2.1612813080.1548179877; PHPSESSID=5694f8e2e4f0b10e53ec2b54310c02cb; _gid=GA1.2.1715527396.1555747200"}

#POST request
r = requests.post(baseURL, headers=HEADERS, data=PARAMS)

#Checking the response
r.text
The response consists of strange tokens and question marks.
So my question is: why is it returning this response, and how do I fix it so I eventually end up with the scraped table?

Open your web browser, turn off JavaScript, and you will see what requests can get.
Using DevTools in Chrome/Firefox (Network tab, filtered on XHR requests) you should see a POST request to the URL https://www.domeinquarantaine.nl/tabel.php which sends back HTML with the table.
If you open this URL in the browser you see the table - so you can get it even with GET, but with POST you can probably filter the data.
After writing this explanation I saw you already have this URL in your code - you didn't mention it in the description.
You have a different problem - you set
"Accept-Encoding": "gzip, deflate, br"
so the server may answer with a Brotli-compressed ("br") response, which requests cannot decompress unless the optional brotli package is installed. Either decompress it yourself, or use
"Accept-Encoding": "gzip, deflate"
and the server will send data that requests decompresses automatically, and you will see the HTML with the table.
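A minimal, offline sketch of that fix (header values copied from the question): strip "br" from a browser-copied Accept-Encoding, because requests decodes gzip and deflate on its own but Brotli only when the optional brotli package is installed.

```python
# Browser-copied headers advertise Brotli ("br"). Advertise only the
# encodings requests can decode by itself (gzip, deflate) so the server
# never sends a Brotli-compressed body.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0",
    "Accept-Encoding": "gzip, deflate, br",
}
encodings = [e.strip() for e in headers["Accept-Encoding"].split(",")]
headers["Accept-Encoding"] = ", ".join(e for e in encodings if e != "br")
print(headers["Accept-Encoding"])  # gzip, deflate
```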

So there are a couple of reasons why you're getting what you're getting:
Your headers don't look correct.
The data you are sending contains some extra variables.
The website requires cookies in order to display the table.
This can easily be fixed by changing the data and headers variables and using requests.Session() in your code (which will automatically collect and send cookies).
All in all your code should look like this:
import requests

session = requests.Session()
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0",
           "Accept": "*/*",
           "Accept-Language": "en-US,en;q=0.5",
           "Accept-Encoding": "gzip, deflate",
           "Referer": "https://www.domeinquarantaine.nl/",
           "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
           "X-Requested-With": "XMLHttpRequest",
           "DNT": "1",
           "Connection": "close"}
data = {"qdate": "2019-04-20"}
session.get("https://www.domeinquarantaine.nl", headers=headers)
r = session.post("https://www.domeinquarantaine.nl/tabel.php", headers=headers, data=data)
r.text
Hope this helps!

Related

How do I log in to a site with a double request?

I'm trying to log in to the site, but I have a problem!
Here is my code:
import requests
from requests_ntlm import HttpNtlmAuth
from main import username, password

data = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
    "Authorization": "NTLM TlRMTVNT.......",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Cookie": "_ym_uid=1654686701790358885; _ym_d=1654686701; _ym_isad=2",
    "Host": "...",
    "Pragma": "no-cache",
    "Referer": "https://...",
    "sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="104", "Opera GX";v="90"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "Windows",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/104.0.5112.102 Safari/537.36 OPR/90.0.4480.117"
}
auth = HttpNtlmAuth(username, password)
with requests.Session() as session:
    q1 = session.get("https://...", auth=auth, headers=data)
    data['Authorization'] = q1.headers.get("WWW-Authenticate")
    q2 = session.get("https://...", auth=auth, headers=data)
    q2.raise_for_status()  # raise_for_status() returns None, so printing it is useless
You need to log in to the site. I used to use HttpBasicAuth, but after searching through the site's files I saw that it does a strange thing using NTLM.
It makes a GET request using my headers, receives a 401 and a "WWW-Authenticate" header in the response, and resends the request with the "Authorization" header changed to the value of that "WWW-Authenticate" header. The "Authorization" header in the very first request is always the same, its value does not change (unfortunately I can't post it here), but if you send it yourself the response is still 401, and the header cannot be seen via response.headers.get.
What should I do? I can't log in to the site.
If you log in manually in the browser, it makes a GET request, receives the "WWW-Authenticate" header in response, and makes the GET request again, but with this header.
When I try to do the same thing through Python, I get a 401 error.
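A sketch of the intended flow (assuming the third-party requests_ntlm package is installed): let HttpNtlmAuth run the 401 challenge-response itself instead of copying Authorization or WWW-Authenticate values by hand. The helper name and structure here are hypothetical.

```python
def ntlm_login(url, username, password):
    """Hypothetical helper (assumes the third-party packages requests and
    requests_ntlm are installed): log in once and return the session."""
    import requests
    from requests_ntlm import HttpNtlmAuth

    session = requests.Session()
    # HttpNtlmAuth replays the 401 -> WWW-Authenticate -> Authorization
    # handshake on a single keep-alive connection. NTLM tokens are bound to
    # that connection, so copying them between hand-made requests cannot work.
    response = session.get(url, auth=HttpNtlmAuth(username, password))
    response.raise_for_status()
    return session
```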

Unable to send a file with a correct POST request to the website using Python

I'm trying to submit a file (test.exe) to a website using a POST request, but instead of a normal 302 response it keeps responding with 500. I don't know what to change in my request: maybe the headers or the files format, or maybe I need to pass the data parameter somehow?
I would appreciate any advice on this!
import requests

url = "https://cuckoo.cert.ee/submit/api/presubmit"
files = {"test.exe": open("test.exe", "rb")}
headers = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "Content-Length": "199",
    "Content-Type": "multipart/form-data; boundary=----WebKitFormBoundarymoUA16cLBrh9JNGC",
    "Cookie": "csrftoken=O9tFpNhZuZrj7DsEnBAcj0wmV00z8qE3; theme=cyborg; csrftoken=O9tFpNhZuZrj7DsEnBAcj0wmV00z8qE3",
    "Host": "cuckoo.cert.ee",
    "Origin": "https://cuckoo.cert.ee",
    "Referer": "https://cuckoo.cert.ee/submit/",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "Windows",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
    "X-CSRFToken": "O9tFpNhZuZrj7DsEnBAcj0wmV00z8qE3"
}
response = requests.post(url, headers=headers, files=files, verify=False)
print(response)
Possibly try changing the content type to application/octet-stream.
A 500 error indicates that the website may not be able to handle the file you are trying to upload. It could simply be that the website is malfunctioning or having a temporary failure.
If you have access to the back-end logs, I would recommend looking at those, or contacting the website to see if they have any suggestions.
EDIT:
Verify that your content matches the length you are declaring: you have a Content-Length header hard-coded in your request. Try taking it out to see if that helps. The same goes for the hard-coded Content-Type: when you pass files=, requests generates a fresh multipart boundary for the body, which will not match the boundary you copied from the browser, so the server cannot parse the upload.
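A hedged sketch of that fix (URL and CSRF token copied from the question; the field name "file" and the dummy bytes are assumptions, since the real form field name is unknown): drop the hand-copied Content-Type and Content-Length entirely and let requests build the multipart body with a matching boundary. Preparing the request offline shows the headers requests would actually send.

```python
import requests

# Let requests build the multipart body; the boundary it generates is
# placed into the Content-Type header automatically, so the two always match.
files = {"file": ("test.exe", b"dummy bytes")}  # stand-in for open("test.exe", "rb")
request = requests.Request(
    "POST",
    "https://cuckoo.cert.ee/submit/api/presubmit",
    files=files,
    headers={"X-CSRFToken": "O9tFpNhZuZrj7DsEnBAcj0wmV00z8qE3"},
)
prepared = request.prepare()
print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
```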

How to generate Dynamic request header in Python?

I am new to Python. I am sending a POST request using this line of code:
response = requests.post(url=API_ENDPOINT, headers=headers, data=payload)
The problem is that the header values are dynamic (they are different every time in the browser).
These are the headers in browser:
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "keep-alive",
    "Content-Length": "276",
    "Content-Type": "application/x-www-form-urlencoded",
    "Cookie": "acceptedCookie=%7B%22type%22%3A%22all%22%7D; TS01a14d32=01f893c9654ba8a49f70366efc3464fd76d4a461343cf44a7f074a5071b9818b6b196051effd669b784f691c8fab79bdc5a7efada418db04fc3cf8c3e43224fe186e64941eab43b5d9500201644abda7c0f5914ebb9ab95046ee2cb83c43f259ab0ed0e538fee3db50b2aa541ee5646d70634cea4cec54352547d3366c51e2ae5270756ee57bf78d915dcb8209c9c5771956c715bd75fb761bf42da6ba5cfa34ffbfee670e871ed33f8e25c09fdfc882953efd981f; ASLBSA=85b54f44c65f329c72b20a3ee7a9fc9a63d44001bc2c4e2c2b2f26fdaba7e0e3; ASLBSACORS=85b54f44c65f329c72b20a3ee7a9fc9a63d44001bc2c4e2c2b2f26fdaba7e0e3; utag_main=v_id:0179ddda773b0020fa6584d13ce40004e024f00d00978$_sn:2$_ss:1$_st:1623016566769$_pn:1%3Bexp-session$ses_id:1623014766769%3Bexp-session; s_cc=true; s_fid=3BE425806C624053-0396695F1870C86E; s_sq=luxmyluxottica%3D%2526pid%253DSite%25253APreLogin%25253ALogin%2526pidt%253D1%2526oid%253DLOGIN%2526oidt%253D3%2526ot%253DSUBMIT; todayVisit=true",
    "Host": "mywebsite.com",
    "Origin": "https://mywebsite.com",
    "Referer": "https://mywebsite.com",
    "TE": "Trailers",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
The values of Content-Length, Cookie and the Accept headers are different every time I hit the API in the browser, so I cannot just copy-paste the header values into my POST request. How do I generate these dynamic headers (Content-Length, cookies, etc.)? Please help.
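One possible approach, sketched with a hypothetical login endpoint and form fields: don't copy Content-Length or Cookie from the browser at all. requests computes Content-Length from the body it actually sends, and a requests.Session stores cookies from earlier responses and attaches them automatically. Preparing the request shows this without hitting the network:

```python
import requests

session = requests.Session()  # collects Set-Cookie headers from responses automatically

# Hypothetical endpoint and form fields - replace with the real ones.
payload = {"username": "alice", "password": "secret"}
request = requests.Request("POST", "https://mywebsite.com/login", data=payload)
prepared = session.prepare_request(request)

# Content-Length is derived from the encoded body, never hard-coded.
print(prepared.body)                       # username=alice&password=secret
print(prepared.headers["Content-Length"])  # 30
```

In a real script you would first session.get() the login page (so the session picks up its cookies) and then session.post() the form with only the stable headers, such as User-Agent.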

Incomplete response for XHR request in python with requests.get()

I am trying to scrape German zip codes (PLZ) for a given street in a given city using Python's requests on this server. I am trying to apply what I learned here.
I want to return the PLZ of
Schanzäckerstr. in Nürnberg.
import requests

url = 'https://www.11880.com/ajax/getsuggestedcities/schanz%C3%A4ckerstra%C3%9Fe%20n%C3%BCrnberg?searchString=schanz%25C3%25A4ckerstra%25C3%259Fe%2520n%25C3%25BCrnberg'
data = 'searchString=schanz%25C3%25A4ckerstra%25C3%259Fe%2520n%25C3%25BCrnberg'
headers = {"Authority": "wwww.11880.com",
           "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0",
           "Accept": "application/json, text/javascript, */*; q=0.01",
           "Accept-Language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7",
           "Accept-Encoding": "gzip, deflate, br",
           "X-Requested-With": "XMLHttpRequest",
           "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
           "Content-Length": "400",
           "Origin": "https://www.postleitzahlen.de",
           "Sec-Fetch-Site": "cross-site",
           "Fetch-Mode": "cors",
           "DNT": "1",
           "Connection": "keep-alive",
           "Referer": "https://www.postleitzahlen.de",
           }
multipart_data = {(None, data,)}  # note: a set containing one tuple
session = requests.Session()
response = session.get(url, files=multipart_data, headers=headers)
print(response.text)
The above code yields a 200 response with an empty body. I want it to return:
'90443'
I was able to solve this problem using the Nominatim OpenStreetMap API. One can also add street numbers:
import requests

city = 'Nürnberg'
street = 'Schanzäckerstr. 2'
response = requests.get(
    'https://nominatim.openstreetmap.org/search',
    headers={'User-Agent': 'PLZ_scrape'},
    params={'city': city, 'street': street, 'format': 'json', 'addressdetails': '1'},
)
print(street, ',', [i.get('address').get('postcode') for i in response.json()][0])
Make sure to send only one request per second.

Cannot get the right response using python requests' post method

I'm trying to automate the recovery of data from this website (the one I want is "BVBG.086.01 PriceReport"). Checking with Firefox, I found out that the URL the POST request is made to is "http://www.bmf.com.br/arquivos1/lum-download_ipn.asp", and the parameters are:
hdnStatus: "ativo"
chkArquivoDownload_ativo "28"
txtDataDownload_ativo "09/02/2018"
imgSubmeter "Download"
txtDataDownload_externo_ativo [3]
0 "25/08/2017"
1 "25/08/2017"
2 "25/08/2017"
So, if I use hurl.it to make the request, the response is the correct 302 redirect, pointing to an FTP URL where the requested files are, something like "Location: /FTP/Temp/10981738/Download.ex_". (Example of the request here.)
So I've tried doing the same with the following code (using Python's "requests" library; I have tried both versions of request_body, passing each to the "data" parameter of the post method):
from bs4 import BeautifulSoup
from requests import post

request_url = "http://www.bmf.com.br/arquivos1/lum-download_ipn.asp"
request_headers = {
    "Host": "www.bmf.com.br",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Referer": "http://www.bmf.com.br/arquivos1/lum-arquivos_ipn.asp?idioma=pt-BR&status=ativo",
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": "236",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
}
# request_body = "hdnStatus=ativo&chkArquivoDownload_ativo=28&txtDataDownload_ativo=09/02/2018&imgSubmeter=Download&txtDataDownload_externo_ativo=25/08/2017&txtDataDownload_externo_ativo=25/08/2017&txtDataDownload_externo_ativo=25/08/2017"
request_body = {
    "hdnStatus": "ativo",
    "chkArquivoDownload_ativo": "28",
    "txtDataDownload_ativo": "09/02/2018",
    "imgSubmeter": "Download",
    "txtDataDownload_externo_ativo": ["25/08/2017", "25/08/2017", "25/08/2017"]
}
result_query = post(request_url, request_body, headers=request_headers)
# result_query = post(request_url, data=request_body, headers=request_headers)
for red in result_query.history:
    print(BeautifulSoup(red.content, "lxml"))
    print()
print(result_query.url)
And what I get is the following response:
<html><head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found here.</body>
</html>
<html><head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found here.</body>
</html>
<html><head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found here.</body>
</html>
http://www.bmf.com.br/arquivos1/lum-arquivos_ipn.asp?idioma=pt-BR&status=ativo
And not the one I wanted, which should point to the location of the file. What am I doing wrong here?
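For debugging redirect chains like this, one option (a generic sketch, not specific to this site) is to pass allow_redirects=False so requests returns the 302 itself, then read response.headers["Location"]. A relative Location, like the one this server sends, has to be resolved against the request URL before it can be downloaded:

```python
from urllib.parse import urljoin

# With allow_redirects=False, the raw Location header of the 302 is
# available as response.headers["Location"]. Resolving a relative
# Location against the request URL gives the absolute download URL.
request_url = "http://www.bmf.com.br/arquivos1/lum-download_ipn.asp"
location = "/FTP/Temp/10981738/Download.ex_"  # example value from the question
print(urljoin(request_url, location))  # http://www.bmf.com.br/FTP/Temp/10981738/Download.ex_
```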
