I am new to Python. I am sending a POST request using this line of code:
response = requests.post(url=API_ENDPOINT, headers=headers, data=payload)
The problem is that the header values are dynamic (they are different every time in the browser).
These are the headers in the browser:
headers = {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.5",
"Connection": "keep-alive",
"Content-Length": "276",
"Content-Type": "application/x-www-form-urlencoded",
"Cookie": "acceptedCookie=%7B%22type%22%3A%22all%22%7D; TS01a14d32=01f893c9654ba8a49f70366efc3464fd76d4a461343cf44a7f074a5071b9818b6b196051effd669b784f691c8fab79bdc5a7efada418db04fc3cf8c3e43224fe186e64941eab43b5d9500201644abda7c0f5914ebb9ab95046ee2cb83c43f259ab0ed0e538fee3db50b2aa541ee5646d70634cea4cec54352547d3366c51e2ae5270756ee57bf78d915dcb8209c9c5771956c715bd75fb761bf42da6ba5cfa34ffbfee670e871ed33f8e25c09fdfc882953efd981f; ASLBSA=85b54f44c65f329c72b20a3ee7a9fc9a63d44001bc2c4e2c2b2f26fdaba7e0e3; ASLBSACORS=85b54f44c65f329c72b20a3ee7a9fc9a63d44001bc2c4e2c2b2f26fdaba7e0e3; utag_main=v_id:0179ddda773b0020fa6584d13ce40004e024f00d00978$_sn:2$_ss:1$_st:1623016566769$_pn:1%3Bexp-session$ses_id:1623014766769%3Bexp-session; s_cc=true; s_fid=3BE425806C624053-0396695F1870C86E; s_sq=luxmyluxottica%3D%2526pid%253DSite%25253APreLogin%25253ALogin%2526pidt%253D1%2526oid%253DLOGIN%2526oidt%253D3%2526ot%253DSUBMIT; todayVisit=true",
"Host": "mywebsite.com",
"Origin": "https://mywebsite.com",
"Referer": "https://mywebsite.com",
"TE": "Trailers",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
The values of Content-Length, Cookie, and Accept are different every time I hit the API from the browser, so I cannot just copy and paste the header values and send them in the POST request. How can I generate these dynamic headers (Content-Length, cookies, etc.)? Please help.
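A rough sketch of one possible approach, under stated assumptions. Content-Length is just the byte length of the encoded body, and Requests fills it in on its own when it prepares the request; a requests.Session keeps the cookies the server sets and sends them back on later requests. The helper names form_body_length and post_with_fresh_cookies are hypothetical, and so is the idea that a single GET to login_page_url is enough for the site to hand out its cookies.

```python
from urllib.parse import urlencode


def form_body_length(payload):
    """Return the Content-Length a form-encoded body would have.

    Requests computes this value itself; the function only shows
    where the number comes from.
    """
    body = urlencode(payload).encode("utf-8")
    return len(body)


def post_with_fresh_cookies(session, login_page_url, api_endpoint, payload):
    """POST `payload` after letting the session collect cookies.

    `session` is assumed to be a requests.Session; both URLs are
    placeholders supplied by the caller.
    """
    # The GET gives the server a chance to set cookies; the session stores them.
    session.get(login_page_url)
    # No Cookie or Content-Length header is passed here: the session sends
    # the stored cookies and the body length is derived from `payload`.
    return session.post(api_endpoint, data=payload)
```

With Requests installed, this would be called as post_with_fresh_cookies(requests.Session(), page_url, API_ENDPOINT, payload). Headers that really are static, such as User-Agent, can still be set once on session.headers.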
Related
I'm trying to log in to the site, but I have a problem!
Here is my code:
from requests_ntlm import HttpNtlmAuth
import requests
from main import username, password
data = {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
"Authorization": "NTLM TlRMTVNT.......",
"Cache-Control": "no-cache",
"Connection": "keep-alive",
"Cookie": "_ym_uid=1654686701790358885; _ym_d=1654686701; _ym_isad=2",
"Host": "...",
"Pragma": "no-cache",
"Referer": "https://...",
"sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="104", "Opera GX";v="90"',
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "Windows",
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "same-origin",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/104.0.5112.102 Safari/537.36 OPR/90.0.4480.117"
}
auth = HttpNtlmAuth(username, password)
with requests.Session() as session:
    q1 = session.get("https://...", auth=auth, headers=data)
    data['Authorization'] = q1.headers.get("WWW-Authenticate")
    q2 = session.get("https://...", auth=auth, headers=data)
    print(q2.raise_for_status())
I need to log in to the site. I used to use HttpBaseAuth, but after looking through the site's files I saw that it does something unusual with NTLM.
The browser makes a GET request with my headers, receives a 401 with a "WWW-Authenticate" header in the response, and resends the request with the "Authorization" header changed to the value of that "WWW-Authenticate" header. The "Authorization" header in the very first request is always the same; its value does not change (unfortunately I can't post it here). But if I send it myself, the response is still 401 and the header cannot be seen via response.headers.get.
What should I do?
I can't log in to the site.
When I log in manually in the browser, it makes a GET request, receives the "WWW-Authenticate" header in the response, and makes the GET request again with this header.
When I try to do the same thing through python, I get a 401 error.
When I try to print the text of a GET response, I get incomprehensible characters. I would like it to produce normal text.
Code:
headers = {
"authority": "www.ozon.ru",
"method": "GET",
"path": "/product/playstation-5-digital-edition-339866183/?asb=ZdbNZjh%252BgUCDpV0uw5ZLJUkaSn2wNH%252FSaAKJ%252BAxhX2M%253D&asb2=ayxVVx0ddcEtoLM3AwfnfVSDeZSpnVMgJu1dkk3rkjo&keywords=playstation+5&sh=0OBU25Oz",
"scheme": "https",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"accept-encoding": "gzip, deflate, br",
"accept-language": "ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
"cache-control": "max-age=0",
"cookie": "__Secure-ext_xcid=704b112d0788105d7206457724d88846; _gcl_au=1.1.954892941.1622892064; visid_incap_2251426=KU5k7V3nRbi5hIuCe4vdFn1su2AAAAAAQUIPAAAAAADSheykofd2VQeH1V6sdne6; visid_incap_1101384=ng7gdJKQTtaTXDhwXw0BERteu2AAAAAAQkIPAAAAAACAxLmcAby/fnf6k3vP36Lor9I9dWxx3mbg; __Secure-ab-group=30; _abck=AD1EC4FEF31E5162861A4C24D2CA7963~-1~YAAQVxndWMOJ1Bl6AQAAzuyMdwYmY8viAq7FQPAI0crJs+Y7Tol5pA9DDuINFgy8m+dW33GrxDi2sthCys8Q8xdFoZ5b/+cj885D7t6jQxlVTWRyFksPOyCfG+aPZcNjWLG4gtLYGhig4GmVY2IhbziLiACrJVZ9tvvQe+bPDscWtCGH5oFB2KDTmr+/5anJzP52dInIJRinf0G36Uv6LmTBvJ5oqmtHns+wdvWHV2/XtFBwUrKukPL/yB4I534FenLEKBs/go7uQS0q8XCAoeXQuHxE+XXEHteC3ViGCfdsi83AQgjjXemaeBg6rIcc6GOo4HS+NPR/o20jeZaNPOw21BZoSvhmSzk3WWAoOxqjayhWTKVE/Uu0k/2n3yS2XuFjUsw9nmMtOslXKyWPHYWcAvAFzw==~-1~-1~-1; _gcl_dc=GCL.1631556671.CjwKCAjwyvaJBhBpEiwA8d38vJ0cwc6gyFvDIalkCBIdbC18GmVXVD0XxIgwZoz--ClC6ErOz2uVmRoCw6oQAvD_BwE; visid_incap_1285159=FmeU/lKJQvaWmkgD2CjChIi8QGEAAAAAQUIPAAAAAADss4s71kpz7hK78LrQuFFa; __Secure-user-id=0; _gcl_aw=GCL.1633529190.CjwKCAjwkvWKBhB4EiwA-GHjFrQIvq2PTJYu-I5iwQo9hU06pvsUVvjj37nTH7ACBQDGNCG1NvNHlBoCIvEQAvD_BwE; nlbi_1101384=wNPJODVEXFvoYA/LK8plmQAAAACQSMQv10AIEBv9M1qG+ZgE; incap_ses_584_1101384=8gq4QeKdPVbcmBCIpMkaCMfCZWEAAAAAxQKn2KtmDQeSxEYbjPxM4Q==; xcid=b0d40e42f20927f1b6ef2f74056069fb; incap_ses_633_1101384=NoBfaWnlixsskJIVP97ICA2dZmEAAAAARs7zxjAA1ruartIDr0d2SA==; __Secure-access-token=3.0.BrCcd3kzR2aUcKDrHyfVvw.30.l8cMBQAAAABhQLuyDO5qoKN3ZWKgAICQoA..20211013104710.f77kLnpPyPCZUz33bipJ1qSFh7n4QIBACd22xU-M_sE; __Secure-refresh-token=3.0.BrCcd3kzR2aUcKDrHyfVvw.30.l8cMBQAAAABhQLuyDO5qoKN3ZWKgAICQoA..20211013104710.CAQWWNrTHcPBdzYVN9iOE7QB4LwfW4rmHjDJEszki5A; incap_ses_585_1101384=y3r7Vz0DllfLWjWKF1ceCA2dZmEAAAAAne5pnSZ7U0xGgsv7j+fBMA==",
"referer": "https://www.ozon.ru/cart",
"sec-ch-ua": '"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"',
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "Windows",
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "same-origin",
"sec-fetch-user": "?1",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36",
}
r = requests.get('https://www.ozon.ru/product/playstation-5-digital-edition-339866183/?asb=ZdbNZjh%252BgUCDpV0uw5ZLJUkaSn2wNH%252FSaAKJ%252BAxhX2M%253D&asb2=ayxVVx0ddcEtoLM3AwfnfVSDeZSpnVMgJu1dkk3rkjo&keywords=playstation+5&sh=0OBU25Oz', headers=headers).text
print(r)
response
How do I get normal text output?
It looks like you're getting compressed output. (You could verify that by printing r.headers and looking at Content-Encoding.)
Remove the
"accept-encoding": "gzip, deflate, br",
request header because that claims you can accept brotli-compressed content, which Requests by default doesn't handle.
Since you specified accept-encoding, the server performed compression before sending the data to you. ~~It's a Zip compression (begins with PK)~~
Just don't specify accept-encoding at all and the server will likely send you an uncompressed data stream.
More about accept-encoding
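If dropping the header entirely is not an option, a variation on the answers above is to keep accept-encoding but remove only the br token. The sketch below assumes a plain dict of headers like the one in the question; without_brotli is a hypothetical helper name, not part of Requests.

```python
def without_brotli(headers):
    """Return a copy of `headers` whose Accept-Encoding no longer lists br."""
    cleaned = {}
    for name, value in headers.items():
        if name.lower() != "accept-encoding":
            cleaned[name] = value
            continue
        # Split the comma-separated codings and drop empty entries.
        codings = [c.strip() for c in value.split(",") if c.strip()]
        # Compare only the coding name, ignoring any ";q=..." weight.
        kept = [c for c in codings if c.split(";")[0].strip().lower() != "br"]
        if kept:
            cleaned[name] = ", ".join(kept)
        # If nothing is left, the header is omitted altogether.
    return cleaned
```

Passing headers=without_brotli(headers) to requests.get would then advertise only gzip and deflate, which Requests decodes transparently.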
I have been trying to figure this out for a couple of hours and none of the answers I've found here or elsewhere have worked. I am using requests version 2.26.0 and trying to grab the cookie to add to my headers for later use. I did notice on the page that the cookie starts with php-session, so perhaps I need a different way to grab it besides requests. Anyway, here are the headers I am using and the code I used to try to get the cookie. All it ever outputs is <RequestsCookieJar[]>, no matter what I try.
import requests
headers = {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.5",
"Connection": "keep-alive",
"Content-Length": "565",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"Cookie": "ig_cb=1",
"DNT": "1",
"Host": "www.numlookup.com",
"Origin": "https://www.numlookup.com",
"Referer": "https://www.numlookup.com",
"Sec-GPC": "1",
"TE": "Trailers",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0",
"X-Requested-With": "XMLHttpRequest"
}
s = requests.Session()
s2 = s.get("https://www.numlookup.com/", headers=headers).cookies
print(s2)
I am trying to extract information for all the jobs on this website:
https://www.americanmobile.com/travel-nursing-jobs/search/
Looking at the network activity tab, it looks like all the data I need comes from a POST request made here:
https://jobs.amnhealthcare.com/api/jobs//search. I have attached an image that may help confirm exactly what I am referencing.
I wrote the following code in Google Colab to try to at least get the first 10 results. Referencing python requests POST with header and parameters, I know a lot of the headers may not even be necessary. I have tried sending this request without any headers at all.
Is what I'm trying to do even possible? I have only gotten a 400 response code so far.
If it is possible to accomplish this, is it possible to extract this information for all 4k + jobs?
import requests
import re
payload = {
"PageNumber": "1",
"JobsPerPage": "10",
"Filters": {
"Locations": "[]",
"Skillset": "[]",
"StartDates": "[]",
"Shifts": "[]",
"Durations": "[]",
"IsCovid19Job": "false",
"Exclusive": "false",
"Skillsets": "[]",
"DivisionCompanyId": "2",
"LocationSearch": "",
"KeywordSearch": "",
"DaxtraJobIds": "[]"
},
"SortOrder": {
"Header": "MaxPayRate",
"SortDirection": "Descending"
}
}
headers = {
"Host": "jobs.amnhealthcare.com",
"Connection": "keep-alive",
"Content-Length": "315",
"Accept": "application/json, text/plain, */*",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
"Content-Type": "application/json;charset=UTF-8",
"Origin": "https://www.americanmobile.com",
"Sec-Fetch-Site": "cross-site",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Dest": "empty",
"Referer": "https://www.americanmobile.com/",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9"
}
response = requests.post('https://jobs.amnhealthcare.com/api/jobs//search', data=payload, headers=headers)
Thank you
The formatting of your data wasn't entirely correct. This should work:
import requests
headers = {
"Host": "jobs.amnhealthcare.com",
"Connection": "keep-alive",
"Content-Length": "315",
"Accept": "application/json, text/plain, */*",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
"Content-Type": "application/json;charset=UTF-8",
"Origin": "https://www.americanmobile.com",
"Sec-Fetch-Site": "cross-site",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Dest": "empty",
"Referer": "https://www.americanmobile.com/",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9"
}
data = '{"PageNumber":1,"JobsPerPage":10,"Filters":{"Locations":[],"Skillset":[],"StartDates":[],"Shifts":[],"Durations":[],"IsCovid19Job":false,"Exclusive":false,"Skillsets":[],"DivisionCompanyId":2,"LocationSearch":"","KeywordSearch":"","DaxtraJobIds":[]},"SortOrder":{"Header":"MaxPayRate","SortDirection":"Descending"}}'
response = requests.post('https://jobs.amnhealthcare.com/api/jobs//search', headers=headers, data=data)
You can now adapt JobsPerPage and PageNumber to retrieve all the posts you need in a for-loop.
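A minimal sketch of that loop, assuming the endpoint keeps accepting the same JSON shape as above. Building the body with json.dumps from a dict keeps the lists, booleans, and numbers as real JSON values instead of strings. build_search_body and fetch_pages are hypothetical names, session is assumed to be a requests.Session, and the response is assumed to be JSON.

```python
import json

SEARCH_URL = "https://jobs.amnhealthcare.com/api/jobs//search"


def build_search_body(page_number, jobs_per_page=10):
    """Serialize one search request as JSON text."""
    payload = {
        "PageNumber": page_number,
        "JobsPerPage": jobs_per_page,
        "Filters": {
            "Locations": [],
            "Skillset": [],
            "StartDates": [],
            "Shifts": [],
            "Durations": [],
            "IsCovid19Job": False,
            "Exclusive": False,
            "Skillsets": [],
            "DivisionCompanyId": 2,
            "LocationSearch": "",
            "KeywordSearch": "",
            "DaxtraJobIds": [],
        },
        "SortOrder": {"Header": "MaxPayRate", "SortDirection": "Descending"},
    }
    return json.dumps(payload)


def fetch_pages(session, first_page, last_page, jobs_per_page=10):
    """POST one search per page and collect the decoded JSON responses."""
    headers = {"Content-Type": "application/json;charset=UTF-8"}
    results = []
    for page in range(first_page, last_page + 1):
        body = build_search_body(page, jobs_per_page)
        response = session.post(SEARCH_URL, headers=headers, data=body)
        response.raise_for_status()  # stop at the first failing page
        results.append(response.json())
    return results
```

Content-Length is left out on purpose, since it changes with the page number and is computed from the body when the request is sent.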
I am trying to scrape German zip codes (PLZ) for a given street in a given city using Python's requests on this server. I am trying to apply what I learned here.
I want to return the PLZ of
Schanzäckerstr. in Nürnberg.
import requests
url = 'https://www.11880.com/ajax/getsuggestedcities/schanz%C3%A4ckerstra%C3%9Fe%20n%C3%BCrnberg?searchString=schanz%25C3%25A4ckerstra%25C3%259Fe%2520n%25C3%25BCrnberg'
data = 'searchString=schanz%25C3%25A4ckerstra%25C3%259Fe%2520n%25C3%25BCrnberg'
headers = {"Authority": "wwww.11880.com",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0",
"Accept": "application/json, text/javascript, */*; q=0.01",
"Accept-Language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7",
"Accept-Encoding": "gzip, deflate, br",
"X-Requested-With": "XMLHttpRequest",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"Content-Length": "400",
"Origin": "https://www.postleitzahlen.de",
"Sec-Fetch-Site": "cross-site",
"Fetch-Mode": "cors",
"DNT": "1",
"Connection": "keep-alive",
"Referer": "https://www.postleitzahlen.de",
}
multipart_data = {(None, data,)}
session = requests.Session()
response = session.get(url, files=multipart_data, headers=headers)
print(response.text)
The above code yields an empty response with status 200. I want it to return:
'90443'
I was able to solve this problem using the Nominatim OpenStreetMap API. One can also add street numbers:
import requests
city = 'Nürnberg'
street = 'Schanzäckerstr. 2'
response = requests.get('https://nominatim.openstreetmap.org/search', headers={'User-Agent': 'PLZ_scrape'}, params={'city': city, 'street': street, 'format': 'json', 'addressdetails': '1'})
print(street, ',', [i.get('address').get('postcode') for i in response.json()][0])
Make sure to only send one request per second.
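One way to respect that limit is to pause between calls. A rough sketch using only the standard library; Throttle is a hypothetical helper, and the clock and sleep parameters exist only so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Keep successive wait() calls at least `interval` seconds apart."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                # Too soon since the previous call: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Create one throttle = Throttle(1.0) and call throttle.wait() right before each requests.get when looking up several streets in a loop.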