Python requests html 403 response - python

Im using the requests module in python to try and make a search on the following webiste http://musicpleer.audio/, however this website appears to be blocking me as it issues nothing but a 403 when i attempt to access it, im wondering how i can get around this, ive tried sending it the user agent of my web browser(chrome) and it still returns error 403. any suggestions on how i could get around this an example of downloading a song from the site would be very helpful. Thanks in advance
My code:
import requests, os
def funGetList:
start_path = 'C:/Users/Jordan/Music/' # current directory
list = []
for path,dirs,files in os.walk(start_path):
for filename in files:
temp = (os.path.join(path,filename))
tempLen = len(temp)
"print(tempLen)"
iterate = 0
list.append(temp[22:(len(temp))-4])
def funDownloadMP3:
for i in list:
print(i)
payload = {'searchQuery': 'meme', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
url = 'http://musicpleer.audio/'
print(requests.post(url, data=payload))

Putting the User-Agent in the headers seems to work:
In []:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
url = 'http://musicpleer.audio/'
r = requests.get('{}#!{}'.format(url, 'meme'), headers=headers)
r.status_code
Out[]:
200
Note: It looks like the search url is simple '#!<search-term>'

HTML 403 Forbidden error code.
The server might be expecting some more request headers like Host or Cookies etc.
You might want to use Postman to debug it with ease

Related

Web Scrapping just return None

I'm trying to make a pop-up program with mir4 draco price. But the price return None :
import requests
from bs4 import BeautifulSoup
urll = 'https://www.xdraco.com/coin/price/'
headers = {
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/86.0.4240.198 Safari/537.36"}
site = requests.get(urll, headers=headers)
soup = BeautifulSoup(site.content, 'html5lib')
price = soup.find('span', class_="amount")
print(price)
You won't be able to parse a site that is dynamically loaded using JS as #jabbson mentioned.
This might be a way to get the data you want.
If you check the network requests being made by the page, you will find that it makes calls to a few different APIs. I found one that might have the info you're looking for. You can make POST requests to this API as shown below...
import requests
import json
headers = {'accept':'application/json, text/plain, */*','user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
html = requests.post('https://api.mir4global.com/wallet/prices/hydra/daily', headers=headers)
output = json.loads(html.text)
# 'output' is a dictionary. If we index the last element, we can get the latest data entry
print(output['Data'][-1])
OUTPUT:
{'CreatedDT': '2022-08-04 21:55:00', 'HydraPrice': '2.1301000000000001', 'HydraAmount': '13434', 'HydraPricePrev': '2.3336000000000001', 'HydraAmountPrev': '5972', 'HydraUSDWemixRate': '2.9401340627166839', 'HydraUSDKLAYRate': '0.29840511595654395', 'USDHydraRate': '6.2627795669928084'}

python web scraping IP blocked

I am trying to extract the source code of the html page. It was working fine before. but now the source web server wanted more evidence that I am NOT a bot. This is the error: your IP is blocked. My IP is NOT blocked for sure as I can still open the page manually via any browser. Do I need to change any parameters before making the request. Thanks.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
req = requests.get(url, headers=headers)
url_content = req.content
url_content = url_content.replace(b'data-imgid', b'\ndata-imgid')
output_file = open('downloaded.txt', 'wb')
output_file.write(url_content)
output_file.close()

Web scraping request stopped working, showing "Response [401]" in python?

import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'}
url = 'https://www.nseindia.com/api/chart-databyindex?index=ACCEQN'
r = requests.get(url, headers=headers)
data = r.json()
print(data)
prices = data['grapthData']
print(prices)
It was working fine but now it showing error "Response [401]"
Well, it's all about the site's authentication requirements. It requires a certain level of authorization to access like this.

URL redirect returning a 403 rather than 302

import requests
def extractlink():
with open('extractlink.txt', 'r') as g:
print("opened extractlink.txt for reading")
contents = g.read()
headers = {'userAgent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
r = requests.get(contents, headers=headers)
print(("Links to " + r.url))
time.sleep (2)
Currently, r.url is just linking to the url found in 'extractlink.txt'
I'm looking to fix this script to find the final redirected url and print the result. It appears the issue lies somewhere in the request for the URL, despite trying many alternatives and troubleshooting steps, my issue doesn't seem to be solved like the rest.
When debugging, r.history reads as [] and r.status_code reads as 403 even though the link redirects as a 302 in browser.
Any ideas?
(extractlink.txt is just a one line file with a link to http://butterup.teechip.icu/, enter with your own caution, spam website)
Once again, this is not a duplicate, I'd appreciate if you stop marking it as such. The information and code has changed, as well as the error/goals.
You've just misnamed the User-Agent header:
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
r = requests.get(contents, headers=headers)
Despite many attempts at troubleshooting, It appears the answer lies in the host used for the script returning a 403 on sites that aren't whitelisted.

Login to website via Python Requests

for a university project I am currently trying to login to a website, and scrap a little detail (a list of news articles) from my user profile.
I am new to Python, but I did this before to some other website. My first two approaches deliver different HTTP errors. I have considered problems with the header my request is sending, however my understanding of this sites login process appears to be insufficient.
This is the login page: http://seekingalpha.com/account/login
My first approach looks like this:
import requests
with requests.Session() as c:
requestUrl ='http://seekingalpha.com/account/orthodox_login'
USERNAME = 'XXX'
PASSWORD = 'XXX'
userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
login_data = {
"slugs[]":None,
"rt":None,
"user[url_source]":None,
"user[location_source]":"orthodox_login",
"user[email]":USERNAME,
"user[password]":PASSWORD
}
c.post(requestUrl, data=login_data, headers = {"referer": "http://seekingalpha.com/account/login", 'user-agent': userAgent})
page = c.get("http://seekingalpha.com/account/email_preferences")
print(page.content)
This results in "403 Forbidden"
My second approach looks like this:
from requests import Request, Session
requestUrl ='http://seekingalpha.com/account/orthodox_login'
USERNAME = 'XXX'
PASSWORD = 'XXX'
userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
# c.get(requestUrl)
login_data = {
"slugs[]":None,
"rt":None,
"user[url_source]":None,
"user[location_source]":"orthodox_login",
"user[email]":USERNAME,
"user[password]":PASSWORD
}
headers = {
"accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Language":"de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4",
"origin":"http://seekingalpha.com",
"referer":"http://seekingalpha.com/account/login",
"Cache-Control":"max-age=0",
"Upgrade-Insecure-Requests":1,
"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
}
s = Session()
req = Request('POST', requestUrl, data=login_data, headers=headers)
prepped = s.prepare_request(req)
prepped.body ="slugs%5B%5D=&rt=&user%5Burl_source%5D=&user%5Blocation_source%5D=orthodox_login&user%5Bemail%5D=XXX%40XXX.com&user%5Bpassword%5D=XXX"
resp = s.send(prepped)
print(resp.status_code)
In this approach I was trying to prepare the header exactly as my browser would do it. Sorry for redundancy. This results in HTTP error 400.
Does someone have an idea, what went wrong? Probably a lot.
Instead of spending a lot of energy on manually logging in and playing with Session, I suggest you just scrape the pages right away using your cookie.
When you log in, usually there is a cookie added to your request to identify your identity. Please see this for example:
Your code will be like this:
import requests
response = requests.get("www.example.com", cookies={
"c_user":"my_cookie_part",
"xs":"my_other_cookie_part"
})
print response.content

Categories

Resources