import requests
url = 'https://cmoffice.kenes.com/cmsearchableprogrammev15/conferencemanager/CM_W3_SearchableProgram/api/persionid/anonymous/type/normal/getfilteredsessions/conference/igcs19'
headers = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    'content-type': 'application/json; charset=UTF-8',
    'cookie': '_ga=GA1.2.471841928.1549896884; _gid=GA1.2.1479150813.1563120868; __RequestVerificationToken_L2NtU2VhcmNoYWJsZVByb2dyYW1tZVYxNQ2=t57HyXHVNBIm0HZ33v1WyG8hRa4j4RlDEOvFtEfPakPgH5AutBjAN5pSRHnBx_BpBhbMnH6R-tIhSdop_VMtLF-aY7XcXTRFt7vg5X46zgE1; _gat=1',
    'origin': 'https://cmoffice.kenes.com',
    'referer': 'https://cmoffice.kenes.com/cmsearchableprogrammeV15/conferencemanager/programme/personid/anonymous/igcs19/normal/b833d15f547f3cf698a5e922754684fa334885ed',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
response = requests.post(url, headers=headers)
print(response)
This gives <Response [500]>.
However, the browser gets a JSON response with status code 200 from the same endpoint.
Can anyone shed some light on why this happens and how to solve it?
Something appears to be wrong in the backend. It returns a 500 when you POST to it, which can mean almost anything on the server side, for example missing configuration or a programming error.
If I open the given URL in a browser (a plain GET), I actually get a 405 'Method Not Allowed' error.
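One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the request above declares a JSON content type but sends no body, whereas a browser's XHR to an endpoint like this usually carries a JSON payload. The sketch below builds (without sending) a POST that includes a body. The helper name, the example URL and the payload are hypothetical placeholders; the real payload would have to be copied from the browser's network tab.

```python
import json
import urllib.request


def build_json_post(url, payload, extra_headers=None):
    """Build (but do not send) a POST request carrying a JSON body."""
    headers = {'Content-Type': 'application/json; charset=UTF-8'}
    headers.update(extra_headers or {})
    body = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(url, data=body, headers=headers, method='POST')


# Hypothetical URL and payload, for illustration only.
request = build_json_post('https://example.com/api', {'filter': 'placeholder'})
print(request.get_method(), request.data)
```

With requests, the equivalent call would be requests.post(url, headers=headers, json=payload). Printing response.text for the 500 reply may also show the server's own error message.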
Why does this function fail to read XML from "https://www.seattletimes.com/feed/"?
I can visit the URL from my browser just fine. It also reads XML from other websites without a problem ("https://news.ycombinator.com/rss").
import urllib.request

def get_url(url):
    header = {'User-Agent': 'Mozilla/5.0'}
    request = urllib.request.Request(url=url, headers=header)
    response = urllib.request.urlopen(request)
    return response.read().decode('utf-8')

url = 'https://www.seattletimes.com/feed/'
feed = get_url(url)
print(feed)
The program times out every time.
Ideas:
Maybe the header needs more info (Accept, etc.)?
EDIT1:
I replaced the request header in the script with my browser's header. Still no-go.
header = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
}
I am not quite sure why, but the User-Agent header was confusing the website. If you remove it, your code works just fine. I've tried different header arguments without issues; the User-Agent seems to be what causes that behaviour.
import urllib.request

def get_url(url):
    # No custom headers: urllib sends its default User-Agent.
    request = urllib.request.Request(url=url)
    response = urllib.request.urlopen(request)
    return response.read().decode('utf-8')

url = 'https://www.seattletimes.com/feed/'
feed = get_url(url)
print(feed)
After some debugging I found a header combination that works (keep in mind I consider this a bug on their end):
header = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'PHPSESSID=kfdkdofsdj99g36l443862qeq2',
    'Accept-Language': "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7",
}
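To compare header combinations like these without the script hanging, it can help to make the headers optional and pass a timeout to urlopen. The following is a minimal sketch under those assumptions; the 10-second timeout and the `opener` parameter are illustrative choices, not part of the original code.

```python
import urllib.request


def get_url(url, headers=None, timeout=10, opener=urllib.request.urlopen):
    """Fetch a URL as text, giving up after `timeout` seconds.

    `opener` is a hypothetical hook that lets the network call be
    replaced (for example in tests); by default it is urlopen.
    """
    request = urllib.request.Request(url=url, headers=headers or {})
    with opener(request, timeout=timeout) as response:
        return response.read().decode('utf-8')


# Usage (performs a real network request, so it is left commented out):
# feed = get_url('https://www.seattletimes.com/feed/')
# feed = get_url('https://www.seattletimes.com/feed/',
#                headers={'User-Agent': 'Mozilla/5.0'})
```

Running it once with and once without a given header shows which one the server objects to, and a stalled request raises an error instead of blocking indefinitely.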
I am using python-requests to get some data.
I get a response with status 200, but it is not complete. I think it's due to the unusual characters in the response, because the same call works correctly in Postman.
This is my call:
headers = {
    'Connection': 'keep-alive',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en',
    'x-access-token': token,
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
}
r = requests.get(url, headers=headers, cert=(ca_cert, ca_key))
This is the response in Python:
b'{"entities":[],"pagination":{"limit":1000,"offset":0,"count":0},"sort":{"orderDirection":"ASC","orderFieldName":"name"}}'
This is the response in Postman:
{"entities":[{"id":"ff80808172c6601d0172ddc6a4f04947","name":{"ar":"2019 الانتخابات الفرعية للبلاكتاون ، كوتاموندرا وموراي","tw":"2019年布莱克敦,库塔曼德拉和默里的州补选","vi":"Cuộc bầu cử quốc gia năm 2019","el":"2019 Δημόσιες βουλευτικές εκλογές για τους","en":"NSW State General Election 2019","it":"Elezioni suppletive dello stato del 2019","cn":"2019年布萊克敦,庫塔曼德拉和默里的州補選"},"alias":"SG1901","welcomeText":{"ar":"الرسالة الافتراضية","tw":"默認消息","vi":"Thông báo mặc định","el":"Προεπιλεγμένο μήνυμα","en":"Default Message","it":"Messaggio predefinito","cn":"默认消息"},"startDate":1552251600000,"endDate":1616482800000,"boardConfiguration":"SECURITY_CERTIFICATES_PREDEFINED_CERTS","securityModel":"VERIFIABLE_MIXING","electoralBoardCreated":true,"adminBoardCreated":true,"bothBoardsCreated":false,"locales":["en","it","el","ar","tw","cn","vi"],"numElections":1}],"pagination":{"limit":1000,"offset":0,"count":1},"sort":{"orderDirection":"ASC","orderFieldName":"name"}}
How can I get the full response in python?
The problem was that I was requesting the information too quickly (it is generated on the server after I upload a CSV file).
So I resolved it with a sleep at the end.
Thanks anyway for your attention.
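As an alternative to a fixed sleep, the request can be repeated until the server has finished generating the data. A minimal sketch, assuming the response keeps the "entities" key shown above; `wait_for_entities`, its `fetch` argument, and the attempt and delay values are hypothetical and not part of the original code.

```python
import time


def wait_for_entities(fetch, attempts=10, delay=2.0, sleep=time.sleep):
    """Call `fetch()` until its result has a non-empty "entities" list.

    `fetch` is a hypothetical callable returning the decoded JSON dict,
    for example: lambda: requests.get(url, headers=headers).json()
    Returns the last result, even if it is still empty after `attempts` calls.
    """
    result = fetch()
    for _ in range(attempts - 1):
        if result.get('entities'):
            break
        sleep(delay)  # give the server time to process the upload
        result = fetch()
    return result
```

This returns as soon as the data is available and stops after a bounded number of tries, so a slow server costs a few extra requests rather than a guess at the right sleep length.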
I'm trying to log in to this site with a python requests POST. At first the request works 3-4 times, but by the 5th attempt I get a 403 error from the server.
I already tried setting headers (including referer, origin and user-agent) and using a proxy, but that did not help much.
import json
import requests

response = requests.Session()
url = 'https://www.saksfifthavenue.com/account/login?_k=%2Faccount%2Fsummary'
while True:
    try:
        headers = {
            'sec-fetch-mode': 'cors',
            'origin': 'https://www.saksfifthavenue.com',
            'accept-encoding': 'gzip, deflate',
            'accept-language': 'en-US',
            'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36',
            'content-type': 'application/json;charset=UTF-8',
            'accept': 'application/json, text/plain, */*',
            'referer': url,
            'authority': 'www.saksfifthavenue.com',
            'sec-fetch-site': 'same-origin',
            'dnt': '1',
        }
        data = {"username": "demo#gmail.com", "password": "Thisisatest"}
        login = response.post(
            'https://www.saksfifthavenue.com/v1/account-service/accounts/sign-in',
            headers=headers, data=json.dumps(data)).content
        loginCheck = login.decode()
        print(loginCheck)
        if "Sorry, this does not match our records. Please try again." in loginCheck:
            print('Login failed!!!')
            break
        elif """Your Account""" in loginCheck:
            print('Login success!!!')
        else:
            print('403 Error. Login Failed')
            break
    except:
        pass
It looks like the server detected your spider's requests because you are sending them too fast. You could try adding an interval between requests.
But why do you need to post the login in a while loop (and without logging out)?
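The interval idea can be sketched as a retry helper that pauses for progressively longer periods while the server keeps answering 403. This is only an illustration: `post_with_backoff`, its `send` argument and the delay values are hypothetical, and a site that blocks automated logins may still refuse the requests.

```python
import time


def post_with_backoff(send, max_attempts=4, base_delay=5.0, sleep=time.sleep):
    """Call `send()` and retry with growing pauses while it returns 403.

    `send` is a hypothetical callable that performs one request and returns
    the HTTP status code, e.g. lambda: session.post(url, ...).status_code
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status != 403:
            break
        # Pause doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

Unlike the while True loop, this makes a bounded number of attempts and returns the last status code, so the caller can decide what to do when the server keeps refusing.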
I am scraping a number of websites for data. Many websites I have no problem scraping at all, but a couple return encrypted data. I have created a basic demo below of what is going on. Is there a way to decrypt the returned results?
import requests

headers_Get = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
}
q = 'www.nike.com'
s = requests.Session()
url = 'http://' + q
r = s.get(url, headers=headers_Get)
r.text
The above code returns the expected HTML from nike.com.
However, if we run the same code and replace q = 'www.nike.com', with q = 'www.vanityfair.com' we receive code that looks like the following:
\x1bX�U?�(J�\x1a��|=;�:���N�\x01��J�.��$�D[����1�\x11[T2/����rq}�\x00ʁ�\x06(��J,�ܳR�\'Gs�я�l�\n���)�Qf��\x11�\x15�\x80��\r\x1d�o �<�o�??>}�������\x07��\n�\x1dE\ti�\x19\x01D�)�z\x06\x00p�\x18�e\n(�s&��\x1c��ga$e\n�PGd\x07琚\x17I�8�ީ�A�\x1f�c^�C�zh�Ǵ�t��#�X��wbl\x18�|}[��o���g\x02;����8+��:6\x039���-\x19\x1b��Q���\t\x1aJJ\x1b�\x11��\rq\x0c\x11��p�Q\x10\x18����\x14͋��\x0bus��e3X�w�狔�\x1d��6�nwen�\x02\x08�J�O�߯ףQ�T\x0c�P����0���]]��bI��5��Em/n��������ze�n.Wx��(\x05���+}���^�.qa����E�V�e���}w}�\x16�U]/�]-�d͋$ਡ�aėup��m���o\x06'
I'm guessing this is the site upgrading the insecure request, but how can I decrypt these results to get the expected HTML, as with Nike?
Note: I get the same results with POST and GET.
Make the request without the Accept-Encoding header. The response is compressed, not encrypted, and without that header the server no longer sends it in an encoding that requests cannot decode.
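A likely explanation, stated as an assumption about this particular case: the header advertises 'br' (Brotli), and requests only decodes Brotli transparently when a Brotli package is installed, whereas gzip and deflate are handled out of the box. The sketch below drops 'br' from a header dict before the request is made; `without_brotli` is a hypothetical helper, not part of requests.

```python
def without_brotli(headers):
    """Return a copy of `headers` whose Accept-Encoding no longer offers 'br'."""
    cleaned = dict(headers)
    for name in list(cleaned):
        if name.lower() == 'accept-encoding':
            codings = [c.strip() for c in cleaned[name].split(',')]
            # Compare only the coding name, ignoring any ";q=..." weight.
            kept = [c for c in codings if c.split(';')[0].strip().lower() != 'br']
            if kept:
                cleaned[name] = ', '.join(kept)
            else:
                del cleaned[name]  # nothing left to advertise
    return cleaned


headers_get = {'Accept-Encoding': 'gzip, deflate, br', 'DNT': '1'}
print(without_brotli(headers_get))
```

The cleaned dict can then be passed as headers= to s.get(...). Removing the Accept-Encoding key entirely, as suggested above, has a similar effect because requests then falls back to its own default value.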
I have written some code for scraping.
The program uses requests.get(url, headers=headers),
with headers exactly the same as my Chrome browser's, except for the cookie.
Initially it works fine, but later it gets a 403 error.
My Chrome browser still gets the data without any error,
but my python requests code doesn't work. What is the problem?
import requests

url = 'http://www.matchesfashion.com/en-kr/products/1171735'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Whale/0.10.36.11 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'ko-KR,ko;q=0.8,en-US;q=0.6,en;q=0.4',
    'Host': 'www.matchesfashion.com',
    'Upgrade-Insecure-Requests': '1',
    'Cache-Control': 'max-age=0',
    'Accept-Encoding': 'gzip, deflate',
}
r = requests.get(url, headers=headers)