When I visit example.com, I can see in Firefox's developer tools (Network tab) that the page fetches JSON metadata from URLs like https://example.com/server/getdata?cmd=showResults in order to render the site.
So my question is: I can open that URL in a new tab in the same Firefox window and get the expected JSON data, but opening the same URL in another Firefox window returns empty JSON. The site is maintaining some kind of session (maybe with cookies?). I copied the exact HTTP header values from the developer tools into a Python script using requests to test this at that moment, but the script also returns empty JSON.
(Example screenshot omitted.)
Python Code
import requests

parameter = {
"Accept": "application/json, text/javascript, */*; q=0.01",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.5",
"Cache-Control": "no-store, max-age=0",
"Connection": "keep-alive",
"Content-Length": "13175",
"Content-Type": "application/x-www-form-urlencoded",
"DNT": "1",
"Host": "in.example.com",
"Cookie": '__cfduid=xxxxxxxxxxxxxxxxx; __cfruid=xxxxxxx-1520022406; mqttuid=1.361660689',
"Referer": "https://in.example.com/page1/page2",
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0",
"X-Requested-With": "XMLHttpRequest"
}
response = requests.post(url="https://in.example.com/serv/getData?cmd=XXXX&type=XX&XXXX=1&_=1520022652009", data=parameter)
#print(dir(response))
print(response.headers)
print(response.json())
How can I simulate the session and hit the URL directly, without hitting the root website first?
PS: The site is a static website.
UPDATE1
Changed to passing the dict as headers=parameter:
response = requests.get(url="https://in.example.com/server1/getallData?cmd=xxxx&_=1520097652234", headers=parameter)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='in.example.com', port=443): Max retries exceeded with url: /server1/getallData?cmd=xxxxx&_=1520097652934 (Caused by <class 'ConnectionResetError'>: [Errno 104] Connection reset by peer)
I am getting a Connection Reset exception. It looks like Cloudflare is doing something? Any ideas?
You are passing the headers as data. You should use the headers parameter instead:
import requests

headers = {'User-Agent': 'Bot'}
requests.get('https://example.com/params', headers=headers)
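For the original question, a hedged sketch of simulating the session with requests.Session: visit the page once so the server (and Cloudflare) can set its cookies, then call the JSON endpoint from the same session. The URLs and parameters below are the placeholders from the question, not real endpoints, and the hand-copied Cookie and Content-Length headers are deliberately left out because a Session manages cookies itself.

import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "https://in.example.com/page1/page2",
})

# Visiting the page lets the server set its session cookies on the
# Session object; copying yesterday's cookie values tends to fail
# because they expire.
session.get("https://in.example.com/page1/page2")

# The XHR endpoint now sees the same cookies the browser would send.
response = session.get("https://in.example.com/server/getdata",
                       params={"cmd": "showResults"})
print(response.json())

This does hit the site once before the data URL, but that is usually unavoidable when a session cookie is what authorizes the XHR.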
I'm trying to submit a file (test.exe) to a website using a POST request, but instead of the normal 302 response it keeps responding with 500. I don't know what to change in my request: maybe the headers or the files format, or maybe I need to pass the data parameter somehow?
I would appreciate any advice on this!
import requests
url = "https://cuckoo.cert.ee/submit/api/presubmit"
files = {"test.exe": open("test.exe", "rb")}
headers = {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9",
"Connection": "keep-alive",
"Content-Length": "199",
"Content-Type": "multipart/form-data; boundary=----WebKitFormBoundarymoUA16cLBrh9JNGC",
"Cookie": "csrftoken=O9tFpNhZuZrj7DsEnBAcj0wmV00z8qE3; theme=cyborg; csrftoken=O9tFpNhZuZrj7DsEnBAcj0wmV00z8qE3",
"Host": "cuckoo.cert.ee",
"Origin": "https://cuckoo.cert.ee",
"Referer": "https://cuckoo.cert.ee/submit/",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "Windows",
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "same-origin",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
"X-CSRFToken": "O9tFpNhZuZrj7DsEnBAcj0wmV00z8qE3"
}
response = requests.post(url, headers=headers, files=files, verify=False)
print(response)
Possibly try changing content type to application/octet-stream.
A 500 error indicates that the website may not be able to handle the file you are trying to upload. It could simply be that the website is malfunctioning or having a temporary failure.
If you have access to the back-end logs, I would recommend looking at those, or contacting the website to see if they have any suggestions.
EDIT:
Also verify that your content matches the length you are declaring: you have a Content-Length header set in your request. Try taking it out to see if that helps, since requests computes the correct length itself.
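Building on that, a hedged sketch of the upload with the hand-copied Content-Type, Content-Length, and boundary removed: requests generates its own multipart boundary when you pass files=, so the browser's copied values corrupt the body. The field name "test.exe" is taken from the question and may need to match whatever name the site's form actually posts.

import requests

url = "https://cuckoo.cert.ee/submit/api/presubmit"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
    "Referer": "https://cuckoo.cert.ee/submit/",
    "X-CSRFToken": "O9tFpNhZuZrj7DsEnBAcj0wmV00z8qE3",
}
cookies = {"csrftoken": "O9tFpNhZuZrj7DsEnBAcj0wmV00z8qE3", "theme": "cyborg"}

# Let requests compute Content-Type (with its own boundary) and
# Content-Length from the file it actually sends.
with open("test.exe", "rb") as f:
    response = requests.post(url, headers=headers, cookies=cookies,
                             files={"test.exe": f})
print(response.status_code)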
I want to check the login status, so I made a program to check it:
import requests
import json
import datetime
headers = {
"Accept": "application/json, text/plain, */*",
"Accept-Encoding": "gzip, deflate",
"Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7",
"Connection": "keep-alive",
"Content-Length": "50",
"Content-Type": "application/json;charset=UTF-8",
"Cookie": "_ga=GA1.2.290443894.1570500092; _gid=GA1.2.963761342.1579153496; JSESSIONID=A4B3165F23FBEA34B4BBE429D00F12DF",
"Host": "marke.ai",
"Origin": "http://marke",
"Referer": "http://marke/event2/login",
"User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Mobile Safari/537.36",
}
url = "http://mark/api/users/login"
va = {"username": "seg", "password": "egkegn"}
c = requests.post(url, data=json.dumps(va), headers=headers)
if c.status_code != 200:
    print("error")
This works fine locally on Windows with PyCharm, but when I ran the code on Linux I got an error like this:
requests.exceptions.ProxyError: HTTPConnectionPool(host='marke', port=80):
Max retries exceeded with url: http://marke.ai/api/users/login (
Caused by ProxyError('Cannot connect to proxy.',
NewConnectionError('<urllib3.connection.HTTPConnection>: Failed to establish a new connection: [Errno 110] Connection timed out',)
)
)
So, what is the problem? If you know the solution, please teach me. Thank you!
According to your error, it seems you are behind a proxy.
So you have to specify your proxy parameters when building your request.
Build your proxies as a dict following this format
proxies = {
"http": "http://my_proxy:my_port",
"https": "https://my_proxy:my_port"
}
If you don't know your proxy parameters, you can get them using the urllib module:
import urllib.request

proxies = urllib.request.getproxies()
There's a proxy server configured on that Linux host, and it can't connect to it.
Judging by the documentation, you may have proxy environment variables (such as HTTP_PROXY or HTTPS_PROXY) set.
Modifying @Arkenys' answer, please try this:
import urllib.request
proxies = urllib.request.getproxies()
# all other things
c = requests.post(url, data=json.dumps(va), headers=headers, proxies=proxies)
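If the proxy on the Linux host is stale rather than required, another hedged option is to tell requests to ignore the proxy environment variables entirely via trust_env (url, va, and headers as defined in the question):

import json
import requests

session = requests.Session()
session.trust_env = False  # ignore HTTP_PROXY/HTTPS_PROXY on this host
c = session.post(url, data=json.dumps(va), headers=headers)
print(c.status_code)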
I am trying to scrape a table from https://www.domeinquarantaine.nl/; however, for some reason the response does not contain the table.
#The parameters
baseURL = "https://www.domeinquarantaine.nl/tabel.php"
PARAMS = {"qdate": "2019-04-21", "pagina": "2", "order": "karakter"}
DATA = {"qdate=2019-04-21&pagina=3&order="}
HEADERS = {"Host": "www.domeinquarantaine.nl",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0",
"Accept": "*/*",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate, br",
"Referer": "https://www.domeinquarantaine.nl/",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"X-Requested-With": "XMLHttpRequest",
"Content-Length": "41",
"Connection": "keep-alive",
"Cookie": "_ga=GA1.2.1612813080.1548179877; PHPSESSID=5694f8e2e4f0b10e53ec2b54310c02cb; _gid=GA1.2.1715527396.1555747200"}
#POST request
r = requests.post(baseURL, headers=HEADERS, data=PARAMS)
#Checking the response
r.text
The response consists of strange tokens and question marks.
So my question is: why is it returning this response, and how do I fix it so that I end up with the scraped table?
Open a web browser, turn off JavaScript, and you will see what requests can get.
But using DevTools in Chrome/Firefox (Network tab, filtered to XHR requests) you should see a POST request to the URL https://www.domeinquarantaine.nl/tabel.php which sends back HTML with the table.
If you open this URL in a browser you see the table - so you could get it even with GET, but using POST you can probably filter the data.
After writing this explanation I saw you already have this URL in your code - you didn't mention it in the description.
You have a different problem - you set
"Accept-Encoding": "gzip, deflate, br"
so the server may answer with a brotli-compressed (br) response, which requests does not decompress automatically unless a brotli package is installed; gzip and deflate are handled transparently.
Use
"Accept-Encoding": "gzip, deflate"
instead and requests will decode the response for you, and you will see the HTML with the table.
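A small sketch of verifying and fixing this: check the Content-Encoding the server actually returned, and stop advertising br.

import requests

headers = {"Accept-Encoding": "gzip, deflate"}  # no "br"
r = requests.post("https://www.domeinquarantaine.nl/tabel.php",
                  headers=headers,
                  data={"qdate": "2019-04-21", "pagina": "2", "order": "karakter"})
print(r.headers.get("Content-Encoding"))  # should no longer be "br"
print(r.text[:200])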
So there are a couple of reasons why you're getting what you're getting:
Your headers don't look correct
The data that you are sending contains some extra variables
The website requires cookies in order to display the table
This can be easily fixed by changing the data and headers variables and adding requests.session() to your code (which will automatically collect and inject cookies)
All in all your code should look like this:
import requests
session = requests.session()
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0", "Accept": "*/*", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer": "https://www.domeinquarantaine.nl/", "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8", "X-Requested-With": "XMLHttpRequest", "DNT": "1", "Connection": "close"}
data={"qdate": "2019-04-20"}
session.get("https://www.domeinquarantaine.nl", headers=headers)
r = session.post("https://www.domeinquarantaine.nl/tabel.php", headers=headers, data=data)
r.text
Hope this helps!
I'm trying to automate the retrieval of data from this website (the file I want is "BVBG.086.01 PriceReport"). Checking with Firefox, I found that the request URL the POST is made to is "http://www.bmf.com.br/arquivos1/lum-download_ipn.asp", and the parameters are:
hdnStatus: "ativo"
chkArquivoDownload_ativo: "28"
txtDataDownload_ativo: "09/02/2018"
imgSubmeter: "Download"
txtDataDownload_externo_ativo: ["25/08/2017", "25/08/2017", "25/08/2017"]
So, if I use hurl.it to make the request, the response is the correct 302 redirect (pointing to an FTP URL where the requested files are, something like "Location: /FTP/Temp/10981738/Download.ex_"). (Example of the request here.)
So I've tried doing the same with the following code, using Python's requests library (I have tried both versions of request_body, passing each to the data parameter of the post method):
from bs4 import BeautifulSoup
from requests import post

request_url = "http://www.bmf.com.br/arquivos1/lum-download_ipn.asp"
request_headers = {
"Host": "www.bmf.com.br",
"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate",
"Referer": "http://www.bmf.com.br/arquivos1/lum-arquivos_ipn.asp?idioma=pt-BR&status=ativo",
"Content-Type": "application/x-www-form-urlencoded",
"Content-Length": "236",
"Connection": "keep-alive",
"Upgrade-Insecure-Requests": "1"
}
# request_body = "hdnStatus=ativo&chkArquivoDownload_ativo=28&txtDataDownload_ativo=09/02/2018&imgSubmeter=Download&txtDataDownload_externo_ativo=25/08/2017&txtDataDownload_externo_ativo=25/08/2017&txtDataDownload_externo_ativo=25/08/2017"
request_body = {
"hdnStatus" : "ativo",
"chkArquivoDownload_ativo": "28",
"txtDataDownload_ativo": "09/02/2018",
"imgSubmeter": "Download",
"txtDataDownload_externo_ativo": ["25/08/2017", "25/08/2017", "25/08/2017"]
}
result_query = post(request_url, request_body, headers=request_headers)
# result_query = post(request_url, data=request_body, headers=request_headers)
for red in result_query.history:
print(BeautifulSoup(red.content, "lxml"))
print()
print(result_query.url)
And what I get is the following response:
<html><head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found here.</body>
</html>
<html><head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found here.</body>
</html>
<html><head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found here.</body>
</html>
http://www.bmf.com.br/arquivos1/lum-arquivos_ipn.asp?idioma=pt-BR&status=ativo
And not the one I wanted (which should point to the location of the file). What am I doing wrong here?
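One debugging sketch, assuming the form data is otherwise right: requests follows the redirect chain for you, so what you print is the landing pages, not the 302 targets. Disable redirects and read the Location header directly (request_url, request_body, and request_headers as defined above):

from requests import post

result_query = post(request_url, data=request_body,
                    headers=request_headers, allow_redirects=False)
print(result_query.status_code)              # expect 302
print(result_query.headers.get("Location"))  # where the redirect points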
I'm having trouble understanding requests.
Let's say I have this request:
POST /user/follow HTTP/1.1
Host: www.website.com
User-Agent: some user agent
Accept: application/json, text/plain, */*
Accept-Language: pl,en-US;q=0.7,en;q=0.3
Referer: https://www.website.com/users/12345/profile
Content-Type: application/json;charset=utf-8
X-CSRF-TOKEN: Ab1/2cde3fGH
Content-Length: 27
Cookie: some-cookie=;
DNT: 1
Connection: close
{"targetUser":"12345"}
How am I supposed to use this information to send a valid request using Python?
What I found is not really helpful. I need someone to show me an example with the data I gave you.
I would do something like this.
import requests
headers = {
    "User-Agent": "some user agent"
    # you get the point - but leave out Content-Length; requests
    # computes it, and header values must be strings anyway
}

data = {
    "targetUser": "12345"
}

url = "https://www.website.com/user/follow"

# the captured request has Content-Type: application/json,
# so send the body with json= rather than data=
r = requests.post(url, headers=headers, json=data)
Yes, you would use cookies to log in. Cookies are a part of the headers.
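As a small aside, a sketch showing that requests can also take the cookies and JSON body as their own arguments instead of raw Cookie/Content-Type headers (values are the placeholders from the quoted request):

import requests

r = requests.post("https://www.website.com/user/follow",
                  headers={"User-Agent": "some user agent",
                           "X-CSRF-TOKEN": "Ab1/2cde3fGH"},
                  cookies={"some-cookie": ""},
                  json={"targetUser": "12345"})
print(r.status_code)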
I will not write poems, I'll just give you some example code:
import requests
headers = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Referer": "SOMETHING",
"Cookie": "SOMETHING",
"Connection": "close",
"Content-Type": "application/x-www-form-urlencoded"
}
data = "SOME DATA"
url = "https://example.com/something"
request = requests.post(url, headers=headers, data=data)
In headers you set the needed header values, etc. You get it, I think ;)
This Burp extension may help: Copy As Python-Requests
It can copy selected request(s) as Python-Requests invocations.
In your case, after copying as Python-Requests, you get:
import requests
burp0_url = "http://www.website.com:80/user/follow"
burp0_cookies = {"some-cookie": ""}
burp0_headers = {"User-Agent": "some user agent", "Accept": "application/json, text/plain, */*", "Accept-Language": "pl,en-US;q=0.7,en;q=0.3", "Referer": "https://www.website.com/users/12345/profile", "Content-Type": "application/json;charset=utf-8", "X-CSRF-TOKEN": "Ab1/2cde3fGH", "DNT": "1", "Connection": "close"}
burp0_json={"targetUser": "12345"}
requests.post(burp0_url, headers=burp0_headers, cookies=burp0_cookies, json=burp0_json)