I have been trying to figure this out for a couple of hours, and none of the answers I've found here or elsewhere have worked. I am using requests version 2.26.0 and trying to grab the cookie so I can add it to my headers for later use. I did notice on the page that the cookie starts with php-session, so perhaps I need a different way to grab it besides requests. Anyway, here are the headers I am using and the code I used to try to get the cookie. All it ever outputs is <RequestsCookieJar[]>, no matter what I try.
import requests
headers = {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.5",
"Connection": "keep-alive",
"Content-Length": "565",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"Cookie": "ig_cb=1",
"DNT": "1",
"Host": "www.numlookup.com",
"Origin": "https://www.numlookup.com",
"Referer": "https://www.numlookup.com",
"Sec-GPC": "1",
"TE": "Trailers",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0",
"X-Requested-With": "XMLHttpRequest"
}
s = requests.Session()
s2 = s.get("https://www.numlookup.com/", headers=headers).cookies
print(s2)
I want to send a POST request to a website to get a download link, but when I do, it responds with:
{"status":"ok","c_status":"FAILED","mess":"An error has occurred. Please try again! "}
When I resend the request from the inspect tab of Firefox it responds with:
{"status":"ok","mess":"","c_status":"CONVERTED","vid":"F_HoMkkRHv8","title":"Cake - The Distance (Official Video)","ftype":"mp3","fquality":"128","dlink":"https:\/\/dl143.y2mate.com\/?file=M3R4SUNiN3JsOHJ6WWQ3aTdPRFA4NW1rRVJIOGxQSXZuWjV4OHhnbVNvOUZ0Wmt1MmVlbGFPWkpLSzRNeEl1dVd1aGQ4VHZYUDkyYlkwbVB2NVFqZldPQTQ5NWcvRzNwNm9FMVRkeHpVMU9xdmV1enhYUWtyMUt3TFA3cktwQlpRSHh3a1doMWkyaWUzS0tTdmhEMzdsNk00VWliZkMwWXR5OENNUENOb01rZWpUdVNOcWV4ZzlZV3UzdWI0TTg9"}
That is what I want to get with my script.
I'm not sure whether it needs headers, so I've tried both with and without them, but neither changed anything.
If anyone can tell me what I'm doing wrong or misunderstanding, that would be greatly appreciated.
import requests
url = "http://yt1s.com/api/ajaxConvert/convert"
header = {
"Host": "yt1s.com",
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:106.0) Gecko/20100101 Firefox/106.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate, br",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"Content-Length": "94",
"Origin": "null",
"Upgrade-Insecure-Requests": "1",
"Sec-Fetch-Dest": "document",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "none",
"Sec-Fetch-User": "?1",
"Connection": "keep-alive",
"Cookie": "_ga=GA1.2.1571820535.1667494818; _gid=GA1.2.1382638961.1668295256; prefetchAd_3897490=true; prefetchAd_4425332=true; AdskeeperStorage=%7B%220%22%3A%7B%22svspr%22%3A%22https%3A%2F%2Fyt1s.com%2Fen422%22%2C%22svsds%22%3A1%7D%2C%22C1351947%22%3A%7B%22page%22%3A1%2C%22time%22%3A1668295270300%7D%7D",
}
data = 'vid=F_HoMkkRHv8&k=0%2Be9UUPRJ7ryRtXVXps4nRRiPRF1HVSf5l7v3sVU68OOkTWDWvLn%2FHeNOCGBh7Z%2B3Xo%3D'
r = requests.post(url, headers= header, data = data)
print (r.text)
with open('output.txt', 'w') as output:
output.write(str(r.text))
When I print the text of a GET response, I get unreadable characters. I would like it to produce normal text.
Code:
headers = {
"authority": "www.ozon.ru",
"method": "GET",
"path": "/product/playstation-5-digital-edition-339866183/?asb=ZdbNZjh%252BgUCDpV0uw5ZLJUkaSn2wNH%252FSaAKJ%252BAxhX2M%253D&asb2=ayxVVx0ddcEtoLM3AwfnfVSDeZSpnVMgJu1dkk3rkjo&keywords=playstation+5&sh=0OBU25Oz",
"scheme": "https",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"accept-encoding": "gzip, deflate, br",
"accept-language": "ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
"cache-control": "max-age=0",
"cookie": "__Secure-ext_xcid=704b112d0788105d7206457724d88846; _gcl_au=1.1.954892941.1622892064; visid_incap_2251426=KU5k7V3nRbi5hIuCe4vdFn1su2AAAAAAQUIPAAAAAADSheykofd2VQeH1V6sdne6; visid_incap_1101384=ng7gdJKQTtaTXDhwXw0BERteu2AAAAAAQkIPAAAAAACAxLmcAby/fnf6k3vP36Lor9I9dWxx3mbg; __Secure-ab-group=30; _abck=AD1EC4FEF31E5162861A4C24D2CA7963~-1~YAAQVxndWMOJ1Bl6AQAAzuyMdwYmY8viAq7FQPAI0crJs+Y7Tol5pA9DDuINFgy8m+dW33GrxDi2sthCys8Q8xdFoZ5b/+cj885D7t6jQxlVTWRyFksPOyCfG+aPZcNjWLG4gtLYGhig4GmVY2IhbziLiACrJVZ9tvvQe+bPDscWtCGH5oFB2KDTmr+/5anJzP52dInIJRinf0G36Uv6LmTBvJ5oqmtHns+wdvWHV2/XtFBwUrKukPL/yB4I534FenLEKBs/go7uQS0q8XCAoeXQuHxE+XXEHteC3ViGCfdsi83AQgjjXemaeBg6rIcc6GOo4HS+NPR/o20jeZaNPOw21BZoSvhmSzk3WWAoOxqjayhWTKVE/Uu0k/2n3yS2XuFjUsw9nmMtOslXKyWPHYWcAvAFzw==~-1~-1~-1; _gcl_dc=GCL.1631556671.CjwKCAjwyvaJBhBpEiwA8d38vJ0cwc6gyFvDIalkCBIdbC18GmVXVD0XxIgwZoz--ClC6ErOz2uVmRoCw6oQAvD_BwE; visid_incap_1285159=FmeU/lKJQvaWmkgD2CjChIi8QGEAAAAAQUIPAAAAAADss4s71kpz7hK78LrQuFFa; __Secure-user-id=0; _gcl_aw=GCL.1633529190.CjwKCAjwkvWKBhB4EiwA-GHjFrQIvq2PTJYu-I5iwQo9hU06pvsUVvjj37nTH7ACBQDGNCG1NvNHlBoCIvEQAvD_BwE; nlbi_1101384=wNPJODVEXFvoYA/LK8plmQAAAACQSMQv10AIEBv9M1qG+ZgE; incap_ses_584_1101384=8gq4QeKdPVbcmBCIpMkaCMfCZWEAAAAAxQKn2KtmDQeSxEYbjPxM4Q==; xcid=b0d40e42f20927f1b6ef2f74056069fb; incap_ses_633_1101384=NoBfaWnlixsskJIVP97ICA2dZmEAAAAARs7zxjAA1ruartIDr0d2SA==; __Secure-access-token=3.0.BrCcd3kzR2aUcKDrHyfVvw.30.l8cMBQAAAABhQLuyDO5qoKN3ZWKgAICQoA..20211013104710.f77kLnpPyPCZUz33bipJ1qSFh7n4QIBACd22xU-M_sE; __Secure-refresh-token=3.0.BrCcd3kzR2aUcKDrHyfVvw.30.l8cMBQAAAABhQLuyDO5qoKN3ZWKgAICQoA..20211013104710.CAQWWNrTHcPBdzYVN9iOE7QB4LwfW4rmHjDJEszki5A; incap_ses_585_1101384=y3r7Vz0DllfLWjWKF1ceCA2dZmEAAAAAne5pnSZ7U0xGgsv7j+fBMA==",
"referer": "https://www.ozon.ru/cart",
"sec-ch-ua": '"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"',
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "Windows",
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "same-origin",
"sec-fetch-user": "?1",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36",
}
r = requests.get('https://www.ozon.ru/product/playstation-5-digital-edition-339866183/?asb=ZdbNZjh%252BgUCDpV0uw5ZLJUkaSn2wNH%252FSaAKJ%252BAxhX2M%253D&asb2=ayxVVx0ddcEtoLM3AwfnfVSDeZSpnVMgJu1dkk3rkjo&keywords=playstation+5&sh=0OBU25Oz', headers=headers).text
print(r)
How can I get normal text output?
It looks like you're getting compressed output. (You could verify that by printing r.headers and looking at Content-Encoding.)
Remove the
"accept-encoding": "gzip, deflate, br",
request header because that claims you can accept brotli-compressed content, which Requests by default doesn't handle.
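A minimal sketch of that advice, assuming the headers were copied from the browser as in the question. The names without_accept_encoding and fetch_text are made up for illustration. As far as I know, once the copied value is gone Requests falls back to its own default Accept-Encoding, which only lists encodings it can decode.

```python
def without_accept_encoding(headers):
    """Return a copy of headers with any Accept-Encoding entry removed.

    HTTP header names are case-insensitive, so the comparison ignores case.
    """
    return {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}


def fetch_text(url, headers):
    """Hypothetical helper: GET the page without the copied Accept-Encoding."""
    import requests  # imported lazily; only this function touches the network

    r = requests.get(url, headers=without_accept_encoding(headers))
    # Shows which compression, if any, the server applied to the response.
    print(r.headers.get("Content-Encoding"))
    return r.text


# Shortened stand-in for the browser headers from the question.
browser_headers = {
    "accept": "text/html",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "ru-RU,ru;q=0.9",
}
cleaned = without_accept_encoding(browser_headers)
print(cleaned)
```

Calling fetch_text(url, headers) with the original URL and headers should then give readable text, provided the compression was the only problem.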
Since you specified accept-encoding, the server compressed the data before sending it to you.
Just don't specify accept-encoding at all and the server will likely send you an uncompressed data stream.
More about accept-encoding
I am new to Python. I am sending a POST request using this line of code:
response = requests.post(url=API_ENDPOINT, headers=headers, data=payload)
The problem is that the header values are dynamic (they are different every time in the browser).
These are the headers in the browser:
headers = {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.5",
"Connection": "keep-alive",
"Content-Length": "276",
"Content-Type": "application/x-www-form-urlencoded",
"Cookie": "acceptedCookie=%7B%22type%22%3A%22all%22%7D; TS01a14d32=01f893c9654ba8a49f70366efc3464fd76d4a461343cf44a7f074a5071b9818b6b196051effd669b784f691c8fab79bdc5a7efada418db04fc3cf8c3e43224fe186e64941eab43b5d9500201644abda7c0f5914ebb9ab95046ee2cb83c43f259ab0ed0e538fee3db50b2aa541ee5646d70634cea4cec54352547d3366c51e2ae5270756ee57bf78d915dcb8209c9c5771956c715bd75fb761bf42da6ba5cfa34ffbfee670e871ed33f8e25c09fdfc882953efd981f; ASLBSA=85b54f44c65f329c72b20a3ee7a9fc9a63d44001bc2c4e2c2b2f26fdaba7e0e3; ASLBSACORS=85b54f44c65f329c72b20a3ee7a9fc9a63d44001bc2c4e2c2b2f26fdaba7e0e3; utag_main=v_id:0179ddda773b0020fa6584d13ce40004e024f00d00978$_sn:2$_ss:1$_st:1623016566769$_pn:1%3Bexp-session$ses_id:1623014766769%3Bexp-session; s_cc=true; s_fid=3BE425806C624053-0396695F1870C86E; s_sq=luxmyluxottica%3D%2526pid%253DSite%25253APreLogin%25253ALogin%2526pidt%253D1%2526oid%253DLOGIN%2526oidt%253D3%2526ot%253DSUBMIT; todayVisit=true",
"Host": "mywebsite.com",
"Origin": "https://mywebsite.com",
"Referer": "https://mywebsite.com",
"TE": "Trailers",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
The values of Content-Length, Cookie, and Accept are different every time I hit the API in the browser, so I cannot just copy and paste the header values into the POST request. How can I generate these dynamic headers (Content-Length, cookies, etc.)? Please help.
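A minimal sketch of one possible approach, assuming the server only needs the cookies it sets itself. Requests computes Content-Length from the body, and a requests.Session stores cookies from earlier responses and sends them back, so neither usually has to be copied from the browser. The helper names, the CLIENT_MANAGED list, and the idea of loading a page first to collect cookies are assumptions for illustration.

```python
from urllib.parse import urlencode

# Headers that the HTTP client normally fills in on its own (assumed list).
CLIENT_MANAGED = {"content-length", "cookie", "host"}


def static_headers(browser_headers):
    """Keep only the headers that do not change from request to request."""
    return {
        name: value
        for name, value in browser_headers.items()
        if name.lower() not in CLIENT_MANAGED
    }


def form_body_length(form):
    """Byte length of the URL-encoded form, i.e. what Content-Length reports."""
    return len(urlencode(form).encode("utf-8"))


def post_with_session(page_url, api_url, form, browser_headers):
    """Hypothetical flow: load a page to collect cookies, then POST the form."""
    import requests  # imported lazily; only this function touches the network

    with requests.Session() as session:
        session.get(page_url)  # cookies set by this response stay on the session
        return session.post(
            api_url, headers=static_headers(browser_headers), data=form
        )
```

This will not help if the cookies are generated by JavaScript in the browser, because Requests does not execute scripts.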
I am trying to extract information for all the jobs on this website:
https://www.americanmobile.com/travel-nursing-jobs/search/
Looking at the network activity tab, it looks like all the data I need comes from a POST request made here:
https://jobs.amnhealthcare.com/api/jobs//search. I have attached an image that may help confirm exactly what I am referencing.
example_1
I wrote the following code in Google Colab to try to at least get the first 10 results. Referencing python requests POST with header and parameters, I know a lot of the headers may not even be necessary. I have tried sending this request without any headers at all.
Is what I'm trying to do even possible? I have only gotten a 400 response code so far.
If it is possible to accomplish this, is it possible to extract this information for all 4k + jobs?
import requests
import re
payload = {
"PageNumber": "1",
"JobsPerPage": "10",
"Filters": {
"Locations": "[]",
"Skillset": "[]",
"StartDates": "[]",
"Shifts": "[]",
"Durations": "[]",
"IsCovid19Job": "false",
"Exclusive": "false",
"Skillsets": "[]",
"DivisionCompanyId": "2",
"LocationSearch": "",
"KeywordSearch": "",
"DaxtraJobIds": "[]"
},
"SortOrder": {
"Header": "MaxPayRate",
"SortDirection": "Descending"
}
}
headers = {
"Host": "jobs.amnhealthcare.com",
"Connection": "keep-alive",
"Content-Length": "315",
"Accept": "application/json, text/plain, */*",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
"Content-Type": "application/json;charset=UTF-8",
"Origin": "https://www.americanmobile.com",
"Sec-Fetch-Site": "cross-site",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Dest": "empty",
"Referer": "https://www.americanmobile.com/",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9"
}
response = requests.post('https://jobs.amnhealthcare.com/api/jobs//search', data=payload, headers=headers)
Thank you
The formatting of your data wasn't entirely correct. This should work:
import requests
headers = {
"Host": "jobs.amnhealthcare.com",
"Connection": "keep-alive",
"Content-Length": "315",
"Accept": "application/json, text/plain, */*",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
"Content-Type": "application/json;charset=UTF-8",
"Origin": "https://www.americanmobile.com",
"Sec-Fetch-Site": "cross-site",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Dest": "empty",
"Referer": "https://www.americanmobile.com/",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9"
}
data = '{"PageNumber":1,"JobsPerPage":10,"Filters":{"Locations":[],"Skillset":[],"StartDates":[],"Shifts":[],"Durations":[],"IsCovid19Job":false,"Exclusive":false,"Skillsets":[],"DivisionCompanyId":2,"LocationSearch":"","KeywordSearch":"","DaxtraJobIds":[]},"SortOrder":{"Header":"MaxPayRate","SortDirection":"Descending"}}'
response = requests.post('https://jobs.amnhealthcare.com/api/jobs//search', headers=headers, data=data)
You can now adapt JobsPerPage and PageNumber to retrieve all the posts you need in a for-loop.
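A minimal sketch of such a loop, assuming the endpoint accepts the same body with only PageNumber and JobsPerPage changed. build_body and fetch_pages are hypothetical names. The headers argument is assumed to be the dict from the answer, ideally without the hard-coded Content-Length, since the body length changes with the page number and Requests computes it from the body anyway.

```python
import json

SEARCH_URL = "https://jobs.amnhealthcare.com/api/jobs//search"


def build_body(page_number, jobs_per_page=10):
    """Serialize the search request using real JSON types, not quoted strings."""
    payload = {
        "PageNumber": page_number,
        "JobsPerPage": jobs_per_page,
        "Filters": {
            "Locations": [],
            "Skillset": [],
            "StartDates": [],
            "Shifts": [],
            "Durations": [],
            "IsCovid19Job": False,
            "Exclusive": False,
            "Skillsets": [],
            "DivisionCompanyId": 2,
            "LocationSearch": "",
            "KeywordSearch": "",
            "DaxtraJobIds": [],
        },
        "SortOrder": {"Header": "MaxPayRate", "SortDirection": "Descending"},
    }
    # Compact separators give the same style of string as the working request.
    return json.dumps(payload, separators=(",", ":"))


def fetch_pages(first_page, last_page, headers, jobs_per_page=10):
    """Yield one response per page number in the inclusive range."""
    import requests  # imported lazily; only this function touches the network

    for page in range(first_page, last_page + 1):
        yield requests.post(
            SEARCH_URL, headers=headers, data=build_body(page, jobs_per_page)
        )
```

The key difference from the question's code is that the lists, booleans, and numbers are sent as JSON values rather than as strings such as "[]" or "false".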
I am trying to scrape German zip codes (PLZ) for a given street in a given city using Python's requests on this server. I am trying to apply what I learned here.
I want to return the PLZ of
Schanzäckerstr. in Nürnberg.
import requests
url = 'https://www.11880.com/ajax/getsuggestedcities/schanz%C3%A4ckerstra%C3%9Fe%20n%C3%BCrnberg?searchString=schanz%25C3%25A4ckerstra%25C3%259Fe%2520n%25C3%25BCrnberg'
data = 'searchString=schanz%25C3%25A4ckerstra%25C3%259Fe%2520n%25C3%25BCrnberg'
headers = {"Authority": "www.11880.com",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0",
"Accept": "application/json, text/javascript, */*; q=0.01",
"Accept-Language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7",
"Accept-Encoding": "gzip, deflate, br",
"X-Requested-With": "XMLHttpRequest",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"Content-Length": "400",
"Origin": "https://www.postleitzahlen.de",
"Sec-Fetch-Site": "cross-site",
"Fetch-Mode": "cors",
"DNT": "1",
"Connection": "keep-alive",
"Referer": "https://www.postleitzahlen.de",
}
multipart_data = {(None, data,)}
session = requests.Session()
response = session.get(url, files=multipart_data, headers=headers)
print(response.text)
The above code returns an empty response with status 200. I want to return:
'90443'
I was able to solve this problem using the Nominatim OpenStreetMap API. One can also add street numbers:
import requests
city = 'Nürnberg'
street = 'Schanzäckerstr. 2'
response = requests.get(
    'https://nominatim.openstreetmap.org/search',
    headers={'User-Agent': 'PLZ_scrape'},
    # Pass the whole street string; street[1] would send only its second character.
    params={'city': city, 'street': street, 'format': 'json', 'addressdetails': '1'},
)
print(street, ',', [i.get('address').get('postcode') for i in response.json()][0])
Make sure to only send one request per second.
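One way to respect that limit when looking up several addresses is a small throttle. This is only a sketch: the Throttle class and lookup_all are hypothetical names, and lookup stands for whatever function performs the actual request.

```python
import time


class Throttle:
    """Block so that successive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def lookup_all(addresses, lookup, throttle):
    """Call lookup(city, street) for each address, pausing between calls."""
    results = []
    for city, street in addresses:
        throttle.wait()
        results.append(lookup(city, street))
    return results
```

The clock and sleep parameters exist only so the pacing can be checked without really waiting; in normal use Throttle(1.0) is enough.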