Web scraping request stopped working, showing "Response [401]" in Python?

import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'}
url = 'https://www.nseindia.com/api/chart-databyindex?index=ACCEQN'
r = requests.get(url, headers=headers)
data = r.json()
print(data)
prices = data['grapthData']
print(prices)
It was working fine, but now it returns "Response [401]".

A 401 status means the server rejected the request as unauthorized. The site now requires some form of authorization before it serves this endpoint, so a request that sends only a User-Agent header is no longer enough.
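One possible cause of a 401 like this is that the API expects cookies that a browser picks up from an ordinary page on the same site. The thread does not confirm that this is what happened here, so treat the following as a sketch under that assumption: it loads a regular page first with a session so any cookies are stored, then calls the API with the same session. The helper name `fetch_json_with_warmup`, the choice of the home page as the warm-up URL, and the shortened user-agent string are all illustrative.

```python
def fetch_json_with_warmup(session, warmup_url, api_url, headers, timeout=10):
    """Load warmup_url so the session stores any cookies, then fetch api_url as JSON.

    `session` is expected to behave like requests.Session.
    """
    # First request: made only so the session can collect cookies
    # (assumption: the site sets the cookies the API needs on this page).
    session.get(warmup_url, headers=headers, timeout=timeout)

    # Second request: same session, so stored cookies are sent automatically.
    response = session.get(api_url, headers=headers, timeout=timeout)
    response.raise_for_status()  # surface 401/403/5xx instead of failing in .json()
    return response.json()


def demo():
    """Hypothetical usage with the URL from the question (needs network access)."""
    import requests  # imported here so the helper works with any session-like object

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}  # placeholder
    with requests.Session() as session:
        data = fetch_json_with_warmup(
            session,
            "https://www.nseindia.com",
            "https://www.nseindia.com/api/chart-databyindex?index=ACCEQN",
            headers,
        )
    print(data["grapthData"])
```

Calling `demo()` runs the two requests against the live site. If the 401 persists, the site is probably checking something other than cookies.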

Related

Web scraping just returns None

I'm trying to make a pop-up program that shows the MIR4 Draco price, but the price comes back as None:
import requests
from bs4 import BeautifulSoup
urll = 'https://www.xdraco.com/coin/price/'
headers = {
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/86.0.4240.198 Safari/537.36"}
site = requests.get(urll, headers=headers)
soup = BeautifulSoup(site.content, 'html5lib')
price = soup.find('span', class_="amount")
print(price)
As @jabbson mentioned, you won't be able to parse a page whose content is loaded dynamically with JavaScript, because requests only receives the initial HTML.
This might be a way to get the data you want.
If you check the network requests made by the page, you will find that it calls a few different APIs. I found one that might have the info you're looking for. You can make POST requests to this API as shown below:
import requests
import json
headers = {'accept':'application/json, text/plain, */*','user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
html = requests.post('https://api.mir4global.com/wallet/prices/hydra/daily', headers=headers)
output = json.loads(html.text)
# 'output' is a dictionary. If we index the last element, we can get the latest data entry
print(output['Data'][-1])
OUTPUT:
{'CreatedDT': '2022-08-04 21:55:00', 'HydraPrice': '2.1301000000000001', 'HydraAmount': '13434', 'HydraPricePrev': '2.3336000000000001', 'HydraAmountPrev': '5972', 'HydraUSDWemixRate': '2.9401340627166839', 'HydraUSDKLAYRate': '0.29840511595654395', 'USDHydraRate': '6.2627795669928084'}
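Every value in that entry is a string, so it needs converting before it can be compared or displayed as a number. A minimal sketch, using the entry printed above as sample data: `parse_entry` is a hypothetical helper, and it assumes that every field other than `CreatedDT` is numeric and that the timestamp format matches the sample.

```python
from datetime import datetime

# Sample entry copied from the output above.
latest = {
    "CreatedDT": "2022-08-04 21:55:00",
    "HydraPrice": "2.1301000000000001",
    "HydraAmount": "13434",
    "HydraPricePrev": "2.3336000000000001",
    "HydraAmountPrev": "5972",
    "HydraUSDWemixRate": "2.9401340627166839",
    "HydraUSDKLAYRate": "0.29840511595654395",
    "USDHydraRate": "6.2627795669928084",
}


def parse_entry(entry):
    """Convert the timestamp to a datetime and every other field to a float."""
    parsed = {}
    for key, value in entry.items():
        if key == "CreatedDT":
            parsed[key] = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        else:
            parsed[key] = float(value)  # assumption: all remaining fields are numeric
    return parsed


parsed = parse_entry(latest)
print(parsed["CreatedDT"], parsed["HydraPrice"])
```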

Issue using python request module

Good morning,
Since yesterday, my requests to the eBay website have been timing out. The code is simple:
import requests
headers = {
"User-Agent":
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
htlm=requests.get("https://www.ebay.es",headers=headers).text
The same code works against Google. This is the response I receive from eBay:
'\nGateway Timeout - In read \n\nGateway Timeout\nThe proxy server did not receive a timely response from the upstream server.\nReference #1.477f1602.1645295618.7675ccad\n\n'
What happened or changed? How could I solve it?
Removing the headers should work. Perhaps they don't like that user agent for some reason.
import requests
# headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
headers = {}
url = "https://www.ebay.es"
response = requests.get(url, headers=headers)
html_text = response.text
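If it is unclear which headers a site objects to, one way to narrow it down is to try a few header sets in turn and check the status code of each, rather than reading `.text` directly (a gateway timeout page is still text). The sketch below is only an illustration: `first_working_headers` is a hypothetical helper, `get` is expected to behave like `requests.get`, and the candidate order and shortened user-agent string are arbitrary.

```python
def first_working_headers(get, url, candidates, timeout=10):
    """Return (headers, response) for the first candidate with a non-error status.

    `get` is expected to behave like requests.get.
    Returns (None, None) if every candidate gets a 4xx/5xx status.
    """
    for headers in candidates:
        response = get(url, headers=headers, timeout=timeout)
        if response.status_code < 400:
            return headers, response
        print(f"{headers!r}: HTTP {response.status_code}")
    return None, None


def demo():
    """Hypothetical usage with the URL from the question (needs network access)."""
    import requests

    candidates = [
        {},  # let requests send its default headers
        {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},  # placeholder
    ]
    headers, response = first_working_headers(
        requests.get, "https://www.ebay.es", candidates
    )
    print("working headers:", headers)
```

Network errors such as client-side timeouts are not caught here and propagate to the caller.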

How to scrape website that comes up with 403 error?

I am trying to scrape the following web page
https://jamanetwork.com/journals/jamaneurology/article-abstract/2696970
but getting an error.
import requests
from bs4 import BeautifulSoup
url = 'https://jamanetwork.com/journals/jamaneurology/article-abstract/2696970'
result = requests.get(url)
soup = BeautifulSoup(result.content, 'html.parser')
print(soup.prettify())
Result:
403 Forbidden Request forbidden by
administrative rules.
You can access the web page with no credentials, so I'm not sure why I get a 'Request forbidden' error while scraping.
As mentioned you should add a user-agent to your request:
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
You can check the headers your own browser sends by opening dev tools and looking at a request under the Network tab.
Example
import requests
from bs4 import BeautifulSoup
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://jamanetwork.com/journals/jamaneurology/article-abstract/2696970'
result = requests.get(url, headers=headers)
soup = BeautifulSoup(result.content, 'html.parser')
print(soup.prettify())
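To see which browser headers a script is not sending, the two header sets can be compared directly. A minimal sketch: `missing_headers` is a hypothetical helper, and both dictionaries below are made-up sample values standing in for what dev tools and the script would actually show. Names are compared case-insensitively because HTTP header names are case-insensitive.

```python
def missing_headers(browser_headers, script_headers):
    """Return the browser headers (name -> value) that the script does not send."""
    sent = {name.lower() for name in script_headers}
    return {
        name: value
        for name, value in browser_headers.items()
        if name.lower() not in sent
    }


# Sample values only: copy the real ones from the browser's Network tab.
browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html",
    "Accept-Language": "en-US,en;q=0.9",
}
script_headers = {"accept": "*/*"}

for name, value in missing_headers(browser_headers, script_headers).items():
    print(f"{name}: {value}")
```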

No response from `requests.get()` for NASDAQ webpage

I am able to open this url via a browser and see the response in json format. However, when I use the requests module, there is no response from the method.
import requests
response = requests.get('https://api.nasdaq.com/api/calendar/earnings?date=2021-02-23')
What is wrong here?
This worked for me:
import requests
url = 'https://api.nasdaq.com/api/calendar/earnings?date=2021-02-23'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'}
response = requests.get(url, headers=headers)
Explanation
The site appears to block requests that identify themselves as coming from Python, which is what requests does by default when no user-agent header is set.
Adding the request headers that Chrome's dev tools show for this call also makes the request work in Python:
import requests
headers = {
    "authority": "api.nasdaq.com",
    "scheme": "https",
    "path": "/api/calendar/earnings?date=2021-02-23",
    "pragma": "no-cache",
    "cache-control": "no-cache",
    "accept": "application/json, text/plain, */*",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36",
    "origin": "https://www.nasdaq.com",
    "sec-fetch-site": "same-site",
    "sec-fetch-mode": "cors",
    "sec-fetch-dest": "empty",
    "referer": "https://www.nasdaq.com/",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9,es;q=0.8,nl;q=0.7",
}
response = requests.get('https://api.nasdaq.com/api/calendar/earnings?date=2021-02-23', headers=headers)
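Copying every header from dev tools works, but it hides which ones the server actually checks. One way to trim the set, sketched below, is to drop one header at a time and leave it out if the request still succeeds. `minimal_headers` and the `works` callback are hypothetical names: `works` is assumed to send the request with the given headers and return True on success. This makes one request per header, so it should be used sparingly.

```python
def minimal_headers(headers, works):
    """Greedily drop headers that are not needed for works(headers) to stay True."""
    needed = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in needed.items() if k != name}
        if works(trial):
            needed = trial  # the request still succeeds without this header
    return needed


def demo():
    """Hypothetical usage with the NASDAQ URL from the question (needs network access)."""
    import requests

    url = "https://api.nasdaq.com/api/calendar/earnings?date=2021-02-23"
    full = {
        "accept": "application/json, text/plain, */*",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # placeholder
        "referer": "https://www.nasdaq.com/",
    }

    def works(headers):
        try:
            return requests.get(url, headers=headers, timeout=10).ok
        except requests.exceptions.RequestException:
            return False

    print(minimal_headers(full, works))
```

The result depends on the order in which headers are tried, so it is a small working set rather than a guaranteed minimum.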

Python requests html 403 response

I'm using the requests module in Python to try to run a search on the website http://musicpleer.audio/. However, the site appears to be blocking me: it returns nothing but a 403 when I attempt to access it. I've tried sending the user agent of my web browser (Chrome), and it still returns a 403. Any suggestions on how I could get around this? An example of downloading a song from the site would be very helpful. Thanks in advance.
My code:
import requests, os

def funGetList():
    start_path = 'C:/Users/Jordan/Music/'  # current directory
    list = []
    for path, dirs, files in os.walk(start_path):
        for filename in files:
            temp = (os.path.join(path, filename))
            tempLen = len(temp)
            "print(tempLen)"
            iterate = 0
            list.append(temp[22:(len(temp))-4])

def funDownloadMP3():
    for i in list:
        print(i)

payload = {'searchQuery': 'meme', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
url = 'http://musicpleer.audio/'
print(requests.post(url, data=payload))
Putting the User-Agent in the headers seems to work:
In []:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
url = 'http://musicpleer.audio/'
r = requests.get('{}#!{}'.format(url, 'meme'), headers=headers)
r.status_code
Out[]:
200
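Note: It looks like the search URL is simply '#!<search-term>'. Keep in mind that the part of a URL after # is not sent to the server, so the response here is for the site's main page.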
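The reason the code in the question did not help is that a dictionary passed as `data=` is form-encoded into the request body, so a 'User-Agent' key there becomes an ordinary form field rather than a header. The sketch below uses the standard library's `urllib.parse.urlencode` to show roughly what such a body looks like. It sends nothing over the network, and the shortened user-agent string is a placeholder.

```python
from urllib.parse import urlencode, parse_qs

payload = {
    "searchQuery": "meme",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # placeholder value
}

# Roughly the body that a form-encoded POST of this dict would carry.
body = urlencode(payload)
print(body)

# The user agent ends up as a form field in the body, not as a request header.
fields = parse_qs(body)
print(sorted(fields))
```

Moving the user agent into a separate dictionary passed as `headers=` is what makes the server see it as a header.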
A 403 is the HTTP Forbidden status code.
The server might be expecting additional request headers such as Host or Cookie.
You might want to use Postman to debug it more easily.
