I'm using requests in Python 3.8 to connect to an Amazon web page.
I'm also using Tor in order to connect via SOCKS5.
This is the relevant piece of code:
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/44.0.2403.157 Safari/537.36'})
anon = {'http': "socks5://localhost:9050", 'https': "socks5://localhost:9050"}
r = session.get("myurl", proxies=anon)
print(r.content)
However, it doesn't work: Amazon returns a 503 error page. What I need to know is whether there is some way to get around this problem, or whether it comes down to some sort of IP blocking.
Thank you
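One way to check whether the proxy setup itself is working, before assuming IP blocking, is to ask an echo service which IP your requests leave from. A minimal sketch (httpbin.org/ip is just an example echo endpoint; socks5h:// also routes DNS resolution through Tor, unlike plain socks5://):
import requests
# Requires the SOCKS extra: pip install requests[socks]
proxies = {'http': 'socks5h://localhost:9050',    # socks5h = resolve DNS via Tor too
           'https': 'socks5h://localhost:9050'}
session = requests.Session()
session.headers['User-Agent'] = ('Mozilla/5.0 (X11; Linux x86_64) '
                                 'AppleWebKit/537.36 (KHTML, like Gecko) '
                                 'Chrome/44.0.2403.157 Safari/537.36')
# If this prints a Tor exit IP rather than your own, the proxy is fine and the
# 503 most likely comes from Amazon refusing that exit node, i.e. IP blocking.
print(session.get('https://httpbin.org/ip', proxies=proxies, timeout=30).json())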
I am using the requests library to parse the website but it returns the mobile version of the site. How can I get the HTML page of the desktop version?
import requests
sess = requests.Session()
sess.get("https://google.com/")
Try this:
from requests_html import HTMLSession
session = HTMLSession()
# Use a desktop User-Agent so the site serves the desktop version of the page.
session.headers['user-agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
url = "https://google.com/"  # the page from the question
response = session.get(url, headers=session.headers)
From the Network panel of my Firefox's developer tools I can see that Google receives a User-Agent header identifying the browser, something like:
User-Agent: Mozilla/5.0 etc..
Maybe you can try sending such a header yourself, so Google thinks it's being accessed from a desktop browser.
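If you would rather stay with plain requests, the same idea applies; a short sketch (the header value and URL are just examples):
import requests
# Sending a desktop User-Agent so Google serves the desktop page instead of the mobile one.
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/78.0.3904.97 Safari/537.36')
}
response = requests.get('https://google.com/', headers=headers)
print(response.status_code, len(response.text))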
I am trying to access https://www.nseindia.com/api/equity-stockIndices?index=NIFTY%2050. It works fine from my localhost (code run in VS Code), but when I deploy it on the server I get an HTTP 499 error.
Has anybody got past this and managed to fetch the data using this approach?
It looks like NSE is blocking the request somehow, but then how is it working from localhost?
P.S. - I am a paid user of pythonAnywhere (Hacker) subscription
import requests
import time

def marketDatafn(query):
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'}
    main_url = "https://www.nseindia.com/"
    session = requests.Session()
    # Warm-up request to the home page so NSE sets the cookies it expects.
    response = session.get(main_url, headers=headers)
    cookies = response.cookies
    url = "https://www.nseindia.com/api/equity-stockIndices?index=NIFTY%2050"
    nifty50DataReq = session.get(url, headers=headers, cookies=cookies, timeout=15)
    nifty50DataJson = nifty50DataReq.json()  # was nifty100DataReq, a typo
    return nifty50DataJson['data']
Actually "Pythonanywhere" only supports those website which are in this whitelist.
And I have found that there are only two subdomain available under "nseindia.com", which is not that you are trying to request.
bricsonline.nseindia.com
bricsonlinereguat.nseindia.com
So PythonAnywhere is blocking you from sending requests to that website.
Here's the link to read more about how to request that your website be added to the whitelist.
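If you want to confirm where the request is being stopped, a quick check like the sketch below can help: when a host is blocked by the proxy, the failure typically surfaces as a proxy or connection error rather than a normal HTTP response from nseindia.com (this is an assumption about how the block manifests, not something from the whitelist docs).
import requests
# Rough check: does the request die at the proxy, or does nseindia.com answer?
try:
    r = requests.get("https://www.nseindia.com/", timeout=15)
    print("Reached the site, status:", r.status_code)
except (requests.exceptions.ProxyError, requests.exceptions.ConnectionError) as exc:
    print("Blocked before reaching the site:", exc)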
My Python code:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
xml_data1 = "hello"
hostxml = "example.com/xmlrpc.php"
check_method = requests.post(hostxml,data='1', verify=False,headers=headers).text
print(check_method)
Output:
Not Acceptable!An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.
Mod_Security can be configured to block POST requests to certain endpoints.
You would need to find out which version of Mod_Security the server is running and look for a known weakness that lets you get past the check.
Note that this is illegal unless you have permission to test the server.
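Before going down that road, it is worth checking whether Mod_Security is rejecting the endpoint itself or just the malformed body: data='1' is not valid XML-RPC. A sketch that sends a well-formed system.listMethods call with the proper Content-Type (assuming xmlrpc.php here is a standard XML-RPC endpoint such as WordPress's):
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
    'Content-Type': 'text/xml',
}
# A minimal, well-formed XML-RPC request body.
payload = """<?xml version="1.0"?>
<methodCall>
  <methodName>system.listMethods</methodName>
  <params></params>
</methodCall>"""

r = requests.post('http://example.com/xmlrpc.php', data=payload, headers=headers)
print(r.status_code)
print(r.text[:500])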
I'm trying to automatically fetch the MAC addresses for some vendors in Python. I found a website that is really helpful, but I'm not able to access its information from Python. When I run this:
import grequests
rs = (grequests.get(u) for u in ['https://aruljohn.com/mac/000000'])
requests = grequests.map(rs)
for response in requests:
    print(response)
It prints None. Does anyone know how to solve this?
Looks like the issue is just that no user-agent is set in the headers. I was able to request the website without any issues once I added one. I used plain python requests here, but it should work fine with grequests. I do think you might want to find a more active library, though; you could check out aiohttp. It is very active and I have had a wonderful experience using it.
import requests
from lxml import html  # tree.cssselect() also needs the cssselect package installed

def request_website(mac):
    url = 'https://aruljohn.com/mac/' + mac
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    }
    r = requests.get(url, headers=headers)
    return r.text

response = request_website('000000')
tree = html.fromstring(response)
results = tree.cssselect('.results p')[0].text
print(results)
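For completeness, the same fix applied to grequests (an untested sketch; grequests passes keyword arguments such as headers straight through to requests):
import grequests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
}
urls = ['https://aruljohn.com/mac/000000']
reqs = (grequests.get(u, headers=headers) for u in urls)
for response in grequests.map(reqs):
    print(response)  # should now print a response object instead of None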
I am using Python requests to get the HTML page.
I am using the latest version of Chrome in the user agent, but the response says "Please update your browser".
Here is my sample code.
import requests
url = 'https://www.choicehotels.com/alabama/mobile/quality-inn-hotels/al045/hotel-reviews/4'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36', 'content-type': 'application/xhtml+xml', 'referer': url}
s = requests.Session()  # the session object was missing from the snippet
url_response = s.get(url, headers=headers, timeout=15)
print url_response.text
I am using Python 2.7 on a Windows server.
But when I ran the same code on my local machine, I got the required output.
"Please update your browser" is the answer.
You cannot do HTTPS with an old browser (and requests under Python 2.7 can behave like an old browser). There have been a lot of security problems in the HTTPS/TLS protocols, so it seems the server no longer allows connections that use insecure encryption and connection standards.
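If you want to see what your client actually offers during the TLS handshake, a third-party diagnostic such as howsmyssl.com can report it (a sketch; the service and its field names are an assumption on my part, not something the hotel site exposes):
import requests

# Prints the TLS version your requests/OpenSSL stack negotiates; on an old
# Python 2.7 build this is often TLS 1.0, which many sites now reject.
info = requests.get('https://www.howsmyssl.com/a/check', timeout=15).json()
print(info['tls_version'])
print(info['rating'])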