I'm trying to run this code:
import requests
import json
print(requests.__version__)
print(json.__version__)
headers = {
'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate, br',
'Host': 'www.soraredata.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
'Accept-Language': 'en-GB,en;q=0.9',
'Referer': 'https://www.soraredata.com/player/1751286890093453768002188628460549415375229654518317941780411003457747672993',
'Connection': 'keep-alive',
}
req = requests.Request('GET',
'https://www.soraredata.com/api/players/info/1751286890093453768002188628460549415375229654518317941780411003457747672993',
headers=headers)
resp = requests.Session().send(req.prepare())
print(resp.status_code)
On programiz.com it works fine and gives 200 as the status code.
But it does not work on my PC, even though the code and even the package versions are the same. I also tried different Python versions, but that did not help.
I can't understand why it does not return 200. I hope someone can enlighten me.
I appreciate any help you can provide.
This happens because, when you run the program on programiz.com, that environment is able to load the request URL (the player info endpoint).
When you run the same program on your own machine, it cannot load that URL (check your system; it may be offline).
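One way to narrow this down is to separate "the request never got a response" from "the server answered with a status other than 200". The following is only a minimal sketch: `probe` and its `send` parameter are illustrative names (not part of requests), and the 10-second timeout is an arbitrary choice.

```python
def probe(url, headers=None, send=None, timeout=10):
    """Return a short description of what happened when fetching url.

    `send` is a hypothetical hook so the logic can be exercised without
    network access; by default it is requests.get.
    """
    if send is None:
        import requests  # imported lazily so the helper can be tried offline
        send = requests.get
    try:
        resp = send(url, headers=headers, timeout=timeout)
    except OSError as exc:
        # requests.exceptions.RequestException derives from IOError/OSError,
        # so connection, DNS, proxy and timeout failures all land here.
        return f"no response: {type(exc).__name__}: {exc}"
    # A response arrived: report the status and the start of the body,
    # which often explains a 403 or a 5xx.
    return f"HTTP {resp.status_code}: {resp.text[:200]!r}"
```

Calling `probe(url, headers)` on both machines and comparing the two strings should show whether the PC is failing to connect at all or is being answered differently by the server.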
Related
I'm trying to log in to a website through a Python script that I've created using the requests module. I've issued a POST request with the appropriate parameters and headers to the server, but for some reason I get a different response from that site compared to what I see in dev tools. The status is always 200, though. The script also contains a GET request that should fetch the credentials once the login is successful. Currently, it throws a JSONDecodeError on the last line.
import requests
link = 'https://propwire.com/login'
check_url = 'https://propwire.com/search'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
'x-requested-with': 'XMLHttpRequest',
'referer': 'https://propwire.com/login',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
'origin': 'https://propwire.com',
}
payload = {"email":"some-email","password":"password","remember":"true"}
with requests.Session() as s:
    r = s.get(link)
    headers['x-xsrf-token'] = r.cookies['XSRF-TOKEN'].rstrip('%3D')
    s.headers.update(headers)
    s.post(link,json=payload)
    res = s.get(check_url)
    print(res.json()['props']['auth'])
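One detail in the script above that may be worth checking is `.rstrip('%3D')`: `str.rstrip` treats its argument as a set of characters rather than a suffix, so it can remove more of the token than intended. A minimal sketch of URL-decoding the cookie value instead; the `xsrf_header_value` helper and the sample token are made up for illustration, and whether this particular site expects the decoded value is an assumption.

```python
from urllib.parse import unquote


def xsrf_header_value(cookie_value):
    """URL-decode an XSRF-TOKEN cookie value for use as a header value."""
    return unquote(cookie_value)


# Made-up token, chosen so that it ends in characters from the set "%3D".
sample = "tokenD3%3D"

# rstrip removes every trailing '%', '3' and 'D', including the real "D3".
stripped = sample.rstrip("%3D")

# unquote keeps the whole token and turns the trailing "%3D" into "=".
decoded = xsrf_header_value(sample)
```

In the script this would correspond to `headers['x-xsrf-token'] = unquote(r.cookies['XSRF-TOKEN'])`, assuming the server compares the header against the decoded cookie.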
import requests
url = 'https://cmoffice.kenes.com/cmsearchableprogrammev15/conferencemanager/CM_W3_SearchableProgram/api/persionid/anonymous/type/normal/getfilteredsessions/conference/igcs19'
headers = {'accept': '*/*',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
'content-type': 'application/json; charset=UTF-8',
'cookie': '_ga=GA1.2.471841928.1549896884; _gid=GA1.2.1479150813.1563120868; __RequestVerificationToken_L2NtU2VhcmNoYWJsZVByb2dyYW1tZVYxNQ2=t57HyXHVNBIm0HZ33v1WyG8hRa4j4RlDEOvFtEfPakPgH5AutBjAN5pSRHnBx_BpBhbMnH6R-tIhSdop_VMtLF-aY7XcXTRFt7vg5X46zgE1; _gat=1',
'origin': 'https://cmoffice.kenes.com',
'referer': 'https://cmoffice.kenes.com/cmsearchableprogrammeV15/conferencemanager/programme/personid/anonymous/igcs19/normal/b833d15f547f3cf698a5e922754684fa334885ed',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
'x-requested-with': 'XMLHttpRequest'}
response = requests.post(url, headers = headers)
print(response)
This gives Response [500].
However, the browser is able to get a JSON response with status code 200.
Can anyone shed some light on why this happens and how to solve it?
Something appears to be wrong in the backend. It returns a 500 when you POST to it, which could mean almost anything, for example missing configuration or a programming error.
If I open the given URL in a browser, I actually get a 405 'method not allowed' error.
I am scraping a number of websites for data. Many websites I have no problem scraping at all, but a couple return encrypted data. I have created a basic demo below of what is going on. Is there a way to decrypt the returned results?
import requests
headers_Get = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'DNT': '1',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1'
}
q = 'www.nike.com'
s = requests.Session()
url = 'http://' + q
r = s.get(url, headers=headers_Get)
r.text
The above code returns the expected HTML from nike.com.
However, if we run the same code with q = 'www.vanityfair.com' instead of q = 'www.nike.com', we receive output that looks like the following:
\x1bX�U?�(J�\x1a��|=;�:���N�\x01��J�.��$�D[����1�\x11[T2/����rq}�\x00ʁ�\x06(��J,�ܳR�\'Gs�я�l�\n���)�Qf��\x11�\x15�\x80��\r\x1d�o �<�o�??>}�������\x07��\n�\x1dE\ti�\x19\x01D�)�z\x06\x00p�\x18�e\n(�s&��\x1c��ga$e\n�PGd\x07琚\x17I�8�ީ�A�\x1f�c^�C�zh�Ǵ�t��#�X��wbl\x18�|}[��o���g\x02;����8+��:6\x039���-\x19\x1b��Q���\t\x1aJJ\x1b�\x11��\rq\x0c\x11��p�Q\x10\x18����\x14͋��\x0bus��e3X�w�狔�\x1d��6�nwen�\x02\x08�J�O�߯ףQ�T\x0c�P����0���]]��bI��5��Em/n��������ze�n.Wx��(\x05���+}���^�.qa����E�V�e���}w}�\x16�U]/�]-�d͋$ਡ�aėup��m���o\x06'
I'm guessing this is the site upgrading the insecure request, but how can I decrypt these results to receive the expected HTML, as with Nike?
Note: I get the same results with post and get.
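Make the request without the Accept-Encoding header. That header advertises br (Brotli), which requests can only decode when a Brotli package is installed, so the body arrives still compressed. Without the header, requests sends its own default encodings and decompresses the response for you.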
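If you would rather keep the rest of the browser-like headers, a gentler variant is to drop only `br` from Accept-Encoding, since the unreadable output is consistent with a Brotli-compressed body that requests could not decode. This is a minimal sketch; `drop_brotli` and `fetch_html` are illustrative helper names, not part of requests.

```python
def drop_brotli(headers):
    """Return a copy of headers whose Accept-Encoding no longer lists br."""
    cleaned = dict(headers)
    for name in list(cleaned):
        if name.lower() == "accept-encoding":
            kept = [
                token.strip()
                for token in cleaned[name].split(",")
                # ignore any ";q=..." weight when comparing the coding name
                if token.strip().split(";")[0].strip().lower() != "br"
            ]
            if kept:
                cleaned[name] = ", ".join(kept)
            else:
                # nothing left to advertise, so let requests use its default
                del cleaned[name]
    return cleaned


def fetch_html(url, headers):
    """Fetch url with Brotli removed from the advertised encodings."""
    import requests  # lazy import keeps drop_brotli usable on its own
    return requests.get(url, headers=drop_brotli(headers), timeout=10).text
```

With the headers from the question, `drop_brotli(headers_Get)` leaves `'Accept-Encoding': 'gzip, deflate'`, both of which requests decompresses transparently. Installing a Brotli package instead should also work, but that has not been tried against this site.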
I would like to monitor a particular URL and wait until it internally redirects me by using python requests. The website will randomly redirect me after a period of time. However, I am having some issues right now. The strategy I have employed so far is something like this:
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'Pragma': 'no-cache',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
session = requests.Session()
while success is False:
    r = session.get(url, headers=headers, allow_redirects=True)
    if keyword in r.text:
        success = True
    time.sleep(30)
print("Success.")
It seems as though the timer is reset every time I make a GET request, so I am never redirected. I thought a session would fix this, but perhaps not. How am I meant to check for changes to the page without sending a new request every 30 seconds? Looking at the network tab in Chrome, the status code appears to be 307.
If anyone knows how to resolve this issue it would be very helpful, thanks.
Selenium is the quick and ugly answer:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36")
browser = webdriver.Firefox(profile)
browser.get(url)
while success is False:
    text = browser.page_source
    if keyword in text:
        success = True
    time.sleep(30)
print("Success.")
As far as using requests goes, I'd hazard a guess that your web browser is the one requesting the reload. Does that request in the network tab differ in any way from the initial request? browsermob-proxy is a great tool for deep diving into these sorts of issues; it's effectively the network tab on steroids.
Apologies for the vagueness of the last half, but it's difficult to say more without having seen the website.
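If the redirect really is a plain HTTP 307, as the network tab suggests, one thing to try with requests is polling with `allow_redirects=False`, so the 3xx response itself is visible instead of being followed silently. This is a rough sketch under that assumption; `wait_for_redirect` and its parameters are illustrative names, and it will not help if the page triggers the redirect from JavaScript or if each poll resets a server-side timer.

```python
import time


def wait_for_redirect(session, url, headers=None, interval=30,
                      max_attempts=20, sleep=time.sleep):
    """Poll url until the server answers with a 3xx status.

    Returns the Location header of the redirect, or None if no redirect
    was seen within max_attempts polls.
    """
    for attempt in range(max_attempts):
        # allow_redirects=False exposes the 3xx response instead of
        # following it automatically.
        resp = session.get(url, headers=headers, allow_redirects=False)
        if 300 <= resp.status_code < 400:
            return resp.headers.get("Location")
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    return None
```

Usage would look like `wait_for_redirect(requests.Session(), url, headers=headers)`; the `sleep` parameter exists only so the wait can be swapped out when trying the function without real delays.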
I have written some code for scraping.
The program uses requests.get(url, headers=headers),
with headers exactly the same as my Chrome browser's, except for the cookie.
Initially it works fine, but later it gets a 403 error.
My Chrome browser still gets the data without any error,
but my Python requests code doesn't work. I don't know what the problem is.
import requests
url = 'http://www.matchesfashion.com/en-kr/products/1171735'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Whale/0.10.36.11 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Language': 'ko-KR,ko;q=0.8,en-US;q=0.6,en;q=0.4',
'Host': 'www.matchesfashion.com',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'max-age=0',
'Accept-Encoding':'gzip, deflate'}
r = requests.get(url, headers=headers)