The website exists but requests.head/get times out - Python

I have written a Python script to check whether a website exists. Everything works fine, except when checking http://www.dhl.com: the request times out. I have tried both the GET and HEAD methods. I also checked the DHL website with https://httpstatus.io/ and https://app.urlcheckr.com/, and both report an error. The DHL website DOES exist! Here is my code:
import requests
a = 'http://www.dhl.com'
def check(url):
    try:
        header = {'User-Agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36'}
        request = requests.head(url, headers=header, timeout=60)
        code = request.status_code
        if code < 400:
            return "Exist", str(code)
        else:
            return "Not exist", str(code)
    except Exception as e:
        return "Not Exist", str(type(e).__name__)
print(check(a))
How can I resolve this error?

Testing with curl shows that the DHL site needs a couple of additional headers, Accept-Encoding and Accept-Language, before it responds:
import requests
url = 'http://www.dhl.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,fil;q=0.8',
}
request = requests.head(url, headers=headers, timeout=60, allow_redirects=True)
print(request.status_code, request.reason)
print(request.history)
Without these headers, curl never gets a response.
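The answer's headers can be folded back into the question's `check()` function. A minimal sketch, assuming those headers are still what the site requires: it sends HEAD with `allow_redirects=True`, catches only requests' own exceptions instead of every `Exception`, and reports network failures as "Unreachable" rather than "Not exist", since a timeout does not prove that a site is missing. The `session` parameter and the "Unreachable" label are hypothetical additions for illustration and testing, not part of the original code.

```python
import requests

# Headers taken from the answer; what a given site insists on may change over time.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"),
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9,fil;q=0.8",
}


def check(url, session=None, timeout=10):
    """Return (verdict, detail) for url.

    verdict is "Exist" / "Not exist" when the server answered, or
    "Unreachable" when no HTTP response arrived (timeout, DNS error, ...).
    session: any object with a requests-style .head(); defaults to a new
    requests.Session (hypothetical parameter, added for testing).
    """
    session = session or requests.Session()
    try:
        resp = session.head(url, headers=BROWSER_HEADERS,
                            timeout=timeout, allow_redirects=True)
    except requests.RequestException as exc:
        # No status code to judge by, so don't claim the site is missing.
        return "Unreachable", type(exc).__name__
    verdict = "Exist" if resp.status_code < 400 else "Not exist"
    return verdict, str(resp.status_code)


if __name__ == "__main__":
    print(check("http://www.dhl.com"))
```

Keeping "Unreachable" separate from "Not exist" makes it easier to spot sites like this one, which exist but only answer clients that look enough like a browser.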

Related

Failed to log in to a website using the requests module

I'm trying to log in to a website with a Python script that uses the requests module. I send a POST request with the appropriate parameters and headers to the server, but for some reason I get a different response from the site than what I see in the browser's dev tools, although the status is always 200. The script also makes a GET request that should fetch the credentials once the login succeeds. Currently, it throws a JSONDecodeError on the last line.
import requests
link = 'https://propwire.com/login'
check_url = 'https://propwire.com/search'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    'referer': 'https://propwire.com/login',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
    'origin': 'https://propwire.com',
}
payload = {"email": "some-email", "password": "password", "remember": "true"}
with requests.Session() as s:
    r = s.get(link)
    headers['x-xsrf-token'] = r.cookies['XSRF-TOKEN'].rstrip('%3D')
    s.headers.update(headers)
    s.post(link, json=payload)
    res = s.get(check_url)
    print(res.json()['props']['auth'])
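One likely problem in the script above is the `rstrip('%3D')` call: `str.rstrip` removes any trailing run of the characters `%`, `3` and `D`, not the literal suffix `%3D`, so a token whose last real characters are `3` or `D` gets truncated. `%3D` is simply a URL-encoded `=`, so decoding the cookie with `urllib.parse.unquote` is a safer way to build the header value. A minimal sketch; the token below is made up for illustration, and `xsrf_header_value` is a hypothetical helper name.

```python
from urllib.parse import unquote


def xsrf_header_value(cookie_value):
    """Decode a URL-encoded XSRF-TOKEN cookie for use as the X-XSRF-TOKEN header."""
    return unquote(cookie_value)


token = "abc3D%3D"  # hypothetical cookie value ending in an encoded '='
print(repr(token.rstrip("%3D")))       # 'abc'    (strips a character set, not a suffix)
print(repr(xsrf_header_value(token)))  # 'abc3D=' (only the encoding is undone)
```

In the script this would become `headers['x-xsrf-token'] = unquote(r.cookies['XSRF-TOKEN'])`. Separately, printing `res.headers.get('Content-Type')` before calling `res.json()` shows whether the server sent back HTML (for example, a login page) instead of JSON, which is what the JSONDecodeError usually means.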

Why can't it print the response after making a request to an API/URL?

I found the Best Buy Canada API by inspecting the XHR requests in the browser:
aboveurl = 'https://www.bestbuy.ca/ecomm-api/availability/products?accept=application%2Fvnd.bestbuy.simpleproduct.v1%2Bjson&accept-language=en-CA&skus=14962185'
I've tried:
import requests
response = requests.get(aboveurl)
print(response.text)
and:
r = requests.get(aboveurl).json()
print(r)
When I run my code in VS Code, it starts and keeps running, but it never displays anything.
You have to add browser-like headers to your request to get a response:
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36',
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
}
aboveurl = "https://www.bestbuy.ca/ecomm-api/availability/products?accept=application%2Fvnd.bestbuy.simpleproduct.v1%2Bjson&accept-language=en-CA&skus=14962185"
response = requests.get(aboveurl, headers=headers)
print(f'request code: {response.status_code}\nrequest text: {response.text}')
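requests has no default timeout, so if the server never answers, `requests.get()` waits indefinitely, which matches the "keeps running but displays nothing" symptom. A minimal sketch, assuming the answer's headers are still what the endpoint expects: pass an explicit `(connect, read)` timeout so a blocked request fails fast instead of hanging. `fetch_availability` and its `session` parameter are hypothetical names added for illustration and testing.

```python
import requests

HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36"),
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}
URL = ("https://www.bestbuy.ca/ecomm-api/availability/products"
       "?accept=application%2Fvnd.bestbuy.simpleproduct.v1%2Bjson"
       "&accept-language=en-CA&skus=14962185")


def fetch_availability(url=URL, session=None, timeout=(5, 15)):
    """GET url with browser-like headers; return parsed JSON, or None on timeout.

    timeout=(connect, read) in seconds; without it requests can block forever.
    """
    session = session or requests.Session()
    try:
        resp = session.get(url, headers=HEADERS, timeout=timeout)
    except requests.Timeout:
        return None
    resp.raise_for_status()  # turn 4xx/5xx responses into an exception
    return resp.json()


if __name__ == "__main__":
    print(fetch_availability())
```

Returning `None` on a timeout is one design choice; re-raising would work just as well if the caller should decide how to handle it.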

unable to fetch json data - JSONDecodeError: Expecting value

I'm new to Python and struggling with the following.
The web page URL is https://www.nseindia.com/market-data/equity-derivatives-watch. When I select "Nifty 50 Futures" and inspect the network requests, I see that the API URL is https://www.nseindia.com/api/liveEquity-derivatives?index=nse50_fut.
The issue is that this JSON opens in the browser, but from Python the request fails with a JSONDecodeError. I have included the right headers, but it still fails.
One more observation: when I load this API directly in the browser, the Python code gets the JSON data once, but it does not work after that. I also noticed that new cookies are set on every page refresh.
Can anyone tell me what I'm missing?
Code:
import requests
header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
          "accept-language": "en-US,en;q=0.9", "accept-encoding": "gzip, deflate, br", "accept": "*/*"}
URL = "https://www.nseindia.com/api/liveEquity-derivatives?index=nse50_fut"
fut_json = requests.get(URL, headers=header).json()
print(fut_json)
  File "C:\ProgramData\Anaconda3\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
JSONDecodeError: Expecting value
You need the cookies that the site sets when you visit its home page to get the response as JSON; without them, the API returns "Resource not found".
Here's how:
import requests
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
}
with requests.Session() as s:
    # Visiting the home page first stores the site's cookies on the session.
    r = s.get("https://www.nseindia.com", headers=headers)
    api_url = "https://www.nseindia.com/api/liveEquity-derivatives?index=nse50_fut"
    response = s.get(api_url, headers=headers).json()
    print(response["marketStatus"]["marketStatusMessage"])
Output:
Market is Closed
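The question also notes that the site sets new cookies on every refresh, so cookies obtained once can stop working. A minimal sketch, assuming the home-page request is what sets the cookies the API needs (as the answer indicates): reuse one Session, and if the API response is not JSON, reload the home page for fresh cookies and retry. `fetch_json` and its `retries` parameter are hypothetical names.

```python
import requests

HOME = "https://www.nseindia.com"
API = "https://www.nseindia.com/api/liveEquity-derivatives?index=nse50_fut"
HEADERS = {
    "user-agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"),
}


def fetch_json(session, url=API, retries=1, timeout=10):
    """Fetch url as JSON, loading HOME first to obtain cookies.

    The home page is loaded when the session has no cookies yet, and again
    before every retry, since the site rotates its cookies.
    """
    for attempt in range(retries + 1):
        if attempt > 0 or not session.cookies:
            session.get(HOME, headers=HEADERS, timeout=timeout)  # sets cookies
        resp = session.get(url, headers=HEADERS, timeout=timeout)
        try:
            return resp.json()
        except ValueError:  # json, simplejson and requests JSONDecodeError all subclass ValueError
            if attempt == retries:
                raise


if __name__ == "__main__":
    with requests.Session() as s:
        data = fetch_json(s)
        print(data["marketStatus"]["marketStatusMessage"])
```

Reusing the same `Session` for later calls keeps whatever cookies the last home-page visit set, so the home page is only reloaded when a response stops being JSON.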

Why is Python requests.get() retrieving outdated data from an API?

Context:
I'm making GET requests to an API, and the API sometimes returns data that is up to 5 minutes old. However, when I make the same request in Chrome, the data is always up to date. The server is nginx.
This is the API request made when the page is loaded in Chrome:
https://buff.163.com/api/market/goods/sell_order?game=csgo&goods_id=781660&_=1604808126524
Relevant Code:
import random
from datetime import datetime
import requests
# 'proxies' is a list of proxy addresses defined elsewhere in the original script.
# socks5h:// proxies require the requests[socks] extra (PySocks).
def epochTimestamp():
    return int(round(datetime.now().timestamp() * 1000))
def getProxies():
    proxy = random.choice(proxies)
    return {'http': f'socks5h://{proxy}', 'https': f'socks5h://{proxy}'}
get_purchase_headers = {
    'Host': 'buff.163.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'Cache-Control': 'max-age=0'
}
url = f"https://buff.163.com/api/market/goods/sell_order?game=csgo&goods_id=781660&_={epochTimestamp()}"
source = requests.get(url, timeout=10, proxies=getProxies(), headers=get_purchase_headers)
What I have tried:
Including User-Agent headers
Adding 'Cache-Control': 'max-age=0'
Including a timestamp in the URL
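A diagnostic sketch, not a confirmed fix: one way to test whether a cache between the script and the origin is serving the old data is to send `Cache-Control: no-cache` (plus `Pragma: no-cache` for older HTTP/1.0 caches), which asks caches to revalidate with the origin, and then inspect the response's `Age` header. Per RFC 9111, the presence of `Age` means the response was not produced or validated by the origin for this request, and its value is roughly how many seconds old that copy is; its absence does not prove the origin was contacted. Caches are not guaranteed to honor the request directive, and a cache inside the application behind nginx would not show up this way. `NO_CACHE_HEADERS`, `with_no_cache` and `cache_age_seconds` are hypothetical helper names.

```python
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache",  # ask caches to revalidate with the origin first
    "Pragma": "no-cache",         # same request for HTTP/1.0 caches
}


def with_no_cache(headers):
    """Return a copy of headers with the no-cache directives added (overriding max-age=0)."""
    return {**headers, **NO_CACHE_HEADERS}


def cache_age_seconds(response_headers):
    """Return the Age header as an int, or None if it is absent or not a number.

    An Age header indicates that a cache, not the origin, answered the request.
    """
    age = response_headers.get("Age")
    if age is None or not age.strip().isdigit():
        return None
    return int(age)


if __name__ == "__main__":
    import requests  # only needed for the live check below

    url = "https://buff.163.com/api/market/goods/sell_order?game=csgo&goods_id=781660"
    base = {"User-Agent": "Mozilla/5.0", "X-Requested-With": "XMLHttpRequest"}
    resp = requests.get(url, headers=with_no_cache(base), timeout=10)
    print("Age:", cache_age_seconds(resp.headers), "Date:", resp.headers.get("Date"))
```

Since the question rotates SOCKS proxies, comparing `Age` and `Date` across several requests may also show whether only some requests receive the stale copies.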

Website visit using Python requests doesn't count in Google Analytics

A website visit made with Python requests doesn't show up in Google Analytics real time.
I am using the Python requests module. Google counts the visit, but it does not appear in Google Analytics real time (active users).
My code is below:
import requests
import time
agent_android = 'Mozilla/5.0 (Linux; Android 5.1.1; Nexus 5 Build/LMY48B; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/43.0.2357.65 Mobile Safari/537.36'
headers = {'User-Agent': agent_android}
response = requests.get(url, headers=headers)
print(response.content)
time.sleep(300)
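I suggest you have a look at Selenium. requests only downloads the page; it does not run the page's JavaScript, and Google Analytics records visits through JavaScript executed in the browser. Selenium drives a real browser, so the tracking script runs.
It's well suited to this purpose. Here is an example: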
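from selenium import webdriver
import time
def main():
    url = "https://yourURL.com"  # placeholder: replace with the page to visit
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        time.sleep(300)  # stay on the page for 5 minutes, as in the question
    finally:
        driver.quit()  # close the browser even if loading the page fails
if __name__ == '__main__':
    main()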
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    # No 'Host' header: requests derives it from the URL, and a hard-coded
    # 'www.google.com' would be wrong for any other site.
    'Referer': 'https://www.google.com',
    'Cookie': 'my_cookie=1234567890; another_cookie=abcdefghij'  # placeholder values
}
try:
    # Pass the headers (the original call never used them) and a timeout,
    # so the Timeout branch below has something to catch.
    response = requests.get(url, headers=headers, allow_redirects=False, timeout=10)
    if response.status_code == 200:
        print('Success')
    else:
        print('Failed')
except requests.exceptions.ConnectionError:
    print('Connection error')
except requests.exceptions.Timeout:
    print('Timeout error')
