I want to scrape data from this page: https://raritysniffer.com/viewcollection/primeapeplanet
The API request works in the browser but returns a 403 error when I use Python's requests module.
import requests

requests.get("https://raritysniffer.com/api/index.php?query=fetch&collection=0x6632a9d63e142f17a668064d41a21193b49b41a0&taskId=any&norm=true&partial=true&traitCount=true")
I understand that I may have to pass specific headers to make it work, but as a Python novice, I have no idea how. Please advise. Thanks!
If you check the response, you can see that the website uses Cloudflare, which is indeed what returns the 403. To bypass this, try cloudscraper (and be mindful that such workarounds can break when Cloudflare updates its checks).
import cloudscraper

url = 'https://raritysniffer.com/api/index.php?query=fetch&collection=0x6632a9d63e142f17a668064d41a21193b49b41a0&taskId=any&norm=true&partial=true&traitCount=true'

# create_scraper returns a requests.Session subclass that solves
# Cloudflare's JavaScript challenge before making the request
scraper = cloudscraper.create_scraper(browser='firefox')
print(scraper.get(url).text)
I'm trying to make one simple request:
import requests
from fake_useragent import UserAgent

ua = UserAgent()
req = requests.get('https://www.casasbahia.com.br/', headers={'User-Agent': ua.random})
I would understand if I received <Response [403]> or something like that, but instead I receive nothing; the code keeps running with no response.
Using logging, I see:
I know I could use a timeout to stop the code from hanging, but I just want to understand why I don't get a response.
Thanks in advance!
I have never used this API before, but from what I researched just now, some sites block requests that come from fake user agents.
To reproduce this example on my PC, I installed the fake_useragent and requests modules on Python 3.10 and tried to execute your script. It turns out that with my authentic User-Agent string, the request succeeds. When printed to the console, req.text shows the entire HTML file received from the request.
But if I try again with a fake user agent, using ua.random, it fails. The site was probably built to detect and reject requests from fake agents (or bots).
Then again, this is just a theory; I have no way to access the site's server files to confirm it.
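For illustration, here is a minimal sketch of that workaround, sending a real browser's User-Agent string instead of ua.random (the exact string below is only an example copied from a desktop Firefox; substitute whatever your own browser currently sends):

import requests

# Send a realistic, static User-Agent instead of a randomly generated one.
# The string below is only an example; copy the one your own browser uses.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0'
}
req = requests.get('https://www.casasbahia.com.br/', headers=headers, timeout=30)
print(req.status_code)
print(req.text[:200])  # beginning of the HTML returned by the site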
Hello, I want to ask if there is a way to get my current URL printed every second in Python without the Selenium library. Selenium would probably be the easier way, I know, but it is not what I am after. Thanks!
What are you trying to do, exactly? If you just want to make a request to the URL you are talking about, you can use the requests library.
To make a request, simply do:
import requests

with requests.get('https://url.com') as response:
    print(response)
If the output is <Response [200]>, you're good.
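If the goal is the "every second" part of the question, a simple polling loop along these lines could work (a sketch only; note that requests can report the final URL after redirects, but it cannot see what a browser tab is currently showing):

import time
import requests

# Poll the address once per second and print the status and final URL.
# 'https://url.com' is a placeholder; replace it with your target.
while True:
    with requests.get('https://url.com') as response:
        print(response.status_code, response.url)  # response.url reflects any redirects
    time.sleep(1)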
I am trying to download a torrent file with this code:
import requests

url = "https://itorrents.org/torrent/0BB4C10F777A15409A351E58F6BF37E8FFF53CDB.torrent"
r = requests.get(url, allow_redirects=True)

# write the response body to disk; the with-block ensures the file is closed
with open('test123.torrent', 'wb') as f:
    f.write(r.content)
It downloads a torrent file, but when I load it into BitTorrent an error occurs.
It says: Unable to Load, Torrent Is Not Valid Bencoding.
Can anybody please help me resolve this problem? Thanks in advance!
This page uses Cloudflare to prevent scraping. I am sorry to say that bypassing Cloudflare is very hard if you only use requests; the measures Cloudflare takes are updated frequently. The page checks whether your browser supports JavaScript, and if it does not, you never receive the bytes of the file. That is why your download is invalid: if you look at r.text, you will see the response is an HTML challenge page, not a file.
Under these circumstances, I think you should consider using Selenium.
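A minimal Selenium sketch of that idea is below. It only demonstrates loading the page in a real browser so Cloudflare's JavaScript check can run; actually saving the .torrent file would additionally require configuring the browser's download behaviour, which is omitted here:

from selenium import webdriver

url = 'https://itorrents.org/torrent/0BB4C10F777A15409A351E58F6BF37E8FFF53CDB.torrent'

driver = webdriver.Firefox()  # requires geckodriver; webdriver.Chrome() works too
try:
    driver.get(url)                  # the JS challenge runs inside a real browser
    print(driver.page_source[:200])  # inspect what the site actually served
finally:
    driver.quit()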
Bypassing Cloudflare can be a pain, so I suggest using a library that handles it. Keep in mind that your code may still break in the future, because Cloudflare changes its techniques periodically; with a library, you will at least only need to update the library (or so you should hope).
I have used a similar library only in NodeJS, but I see Python also has something like that: cloudscraper.
Example:
import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
# Or: scraper = cloudscraper.CloudScraper()  # CloudScraper inherits from requests.Session
print(scraper.get("http://somesite.com").text)  # => "<!DOCTYPE html><html><head>..."
Depending on your usage, you may also need to consider proxies: Cloudflare can still block you if you send too many requests.
Also, if you are working with video torrents, you may be interested in Torrent Stream Server. It is a server that downloads and streams video at the same time, so you can watch the video without fully downloading it.
You can make this work by adding the cookies from your browser to the request headers, but those cookies expire after some time.
In the long run, the only reliable solution is to download the file by opening a real browser.
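As a sketch of the cookie approach (both header values are hypothetical placeholders: copy the real Cloudflare clearance cookie and the matching User-Agent from your browser's developer tools after visiting the site, since Cloudflare ties the two together):

import requests

url = 'https://itorrents.org/torrent/0BB4C10F777A15409A351E58F6BF37E8FFF53CDB.torrent'
headers = {
    # Both values below are placeholders to be copied from your own browser session.
    'User-Agent': 'PASTE_YOUR_BROWSERS_USER_AGENT',
    'Cookie': 'cf_clearance=PASTE_VALUE_FROM_BROWSER',
}
r = requests.get(url, headers=headers, allow_redirects=True)
with open('test123.torrent', 'wb') as f:
    f.write(r.content)

Remember that the clearance cookie expires, so it has to be refreshed by hand.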
I am new to using the TomTom API, but I got the example working in the browser without a problem. The call:
https://api.tomtom.com/routing/1/calculateReachableRange/50.97452,5.86605/json/?key=[MYKEY]&timeBudgetInSec=3600
In the browser I get my JSON response with my polygon points. But in Python I just get an error stating:
"Invalid request: should contain one of the following elements 'avoidVignette' or 'allowVignette'"
Does anybody have any idea why it works in the browser but gives an error when I use it from Python code?
My code:

import requests

request_post = requests.post('https://api.tomtom.com/routing/1/calculateReachableRange/50.97452,5.86605/json/?key=[MYKEY]&timeBudgetInSec=3600')

Thanks in advance!
I figured it out with the help of @ForceBru's comment.
I used Postman to find out what the problem was. It turns out that if you do not use the link directly in the browser but send it as a real POST request, you need to give it an XML or JSON body in which you specify:
{"avoidVignette": []}
if you are using JSON.
If you put this in your POST request as the body, it should work like a charm.
Working code:
import requests

requests.post('https://api.tomtom.com/routing/1/calculateReachableRange/50.97452,5.86605/json/?key=[MYKEY]&timeBudgetInSec=3600', json={"avoidVignette": []})
Hope this helps people who run into the same error.
If you are not providing any POST parameters, then you can use the GET method.
Here is the link to Online Routing API Explorer - link
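Following that suggestion, a GET sketch of the same call might look like this (untested; [MYKEY] is the placeholder from the question):

import requests

url = ('https://api.tomtom.com/routing/1/calculateReachableRange/'
       '50.97452,5.86605/json/?key=[MYKEY]&timeBudgetInSec=3600')
response = requests.get(url)  # no body is needed, so a plain GET suffices
print(response.json())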
I recently wanted to extract data from a website that seems to use cookies to grant access. I do not know much about these mechanisms, but apparently they interfere with my method of getting the HTML content of the website via Python and its requests module.
The code I am running to extract the information contains the following lines:
import requests

# ...
response = requests.get(url, proxies=proxies)
content = response.text
The website I am referring to is http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6675630&tag=1, and proxies is a valid dict of my proxy servers (I tested those settings on websites that worked fine). However, instead of the content of the article on that page, I receive the HTML of the page you get when you do not accept cookies in your browser.
As I am not really aware of what the website is doing, and I lack real web development experience, I have not found a solution so far, even though a similar question may have been asked before. Is there any way to access the content of this website via Python?
import requests

# fetch the login page first so the site's cookies are received...
startr = requests.get('https://viennaairport.com/login/')
# ...then pass those cookies along with the follow-up request
secondr = requests.post('http://xxx/', cookies=startr.cookies)
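Equivalently, requests.Session carries cookies across requests automatically, which may read more cleanly (a sketch; 'http://xxx/' is the placeholder from the snippet above):

import requests

with requests.Session() as session:
    session.get('https://viennaairport.com/login/')  # session stores the cookies it receives
    secondr = session.post('http://xxx/')            # cookies are sent along automatically
    print(secondr.status_code)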