Get Current Browser URL without Selenium in Python

Hello, I want to ask if there is a way to print my current URL every second without using the Selenium library in Python. I know Selenium would probably be the easier way, but it is not what I am interested in. Thanks!

What are you trying to do, exactly? If you just want to make a request to the URL you are talking about, you can use the requests library.
To make a request, simply do:
import requests

with requests.get('https://url.com') as response:
    print(response)
If the output is <Response [200]>, you're good.
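If the goal is to have something printed every second, a minimal polling sketch could look like the one below. Note that this re-requests the page rather than reading a browser's address bar, and the URL is just a placeholder:
import time
import requests

url = 'https://url.com'  # placeholder; replace with the page you care about

while True:
    response = requests.get(url)
    # response.url is the final URL after any redirects
    print(response.url, response.status_code)
    time.sleep(1)  # wait one second between checks (stop with Ctrl+C)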

Related

Python web requests: Accessing json data from web response

I have a question with probably a well-known answer; however, I couldn't articulate it well enough to find answers on Google.
Let's say you are using the developer tools of the Chrome browser (press F12). If you click on the Network tab and go to any website, a lot of files will be listed there, for example images, stylesheets and JSON responses.
I now want to parse these JSON responses using Python.
Thanks in advance!
You can save the network requests to a .har file (JSON format) and analyze that.
In your network tools panel, there is a download button to export as HAR format.
import json

with open('myrequests.har') as f:
    network_data = json.load(f)

print(network_data)
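As a rough sketch of drilling further into the HAR structure (assuming the file follows the standard HAR layout, where captured requests live under log.entries), you could filter for just the JSON responses like this:
import json

with open('myrequests.har') as f:
    har = json.load(f)

# each entry describes one request/response pair captured by the browser
for entry in har['log']['entries']:
    content = entry['response']['content']
    if 'json' in content.get('mimeType', ''):
        print(entry['request']['url'])
        body = content.get('text')
        # bodies can be missing or base64-encoded; that case is not handled here
        if body and content.get('encoding') != 'base64':
            print(json.loads(body))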
Or, as Jack Deeth answered, you can make the requests using Python instead of your browser and get the response JSON data that way.
Though this can sometimes be difficult depending on the website and the nature of the request(s) (for example, needing to log in and/or figuring out how to pass all the proper arguments to make the request).
I use requests to get the data, and it comes back as a Python dictionary:
import requests
r = requests.get("url/spotted/with/devtools")
r.json()["keys_observed_in_devtools"]
Perhaps you can try using Selenium.
Maybe the answers on this question can help you.

Cannot scrape using Python requests but works when loading in browser

I want to scrape data from this page: https://raritysniffer.com/viewcollection/primeapeplanet
The API request works in the browser but returns a 403 error when I use Python requests.
requests.get("https://raritysniffer.com/api/index.php?query=fetch&collection=0x6632a9d63e142f17a668064d41a21193b49b41a0&taskId=any&norm=true&partial=true&traitCount=true")
I understand it is possible that I have to pass specific headers to make it work, but as a Python novice, I have no idea how to do that. Please advise. Thanks!
If you check the response, you can see that the website uses Cloudflare, which is what returns the 403. To bypass this, try cloudscraper (and be mindful when using it).
import cloudscraper

url = 'https://raritysniffer.com/api/index.php?query=fetch&collection=0x6632a9d63e142f17a668064d41a21193b49b41a0&taskId=any&norm=true&partial=true&traitCount=true'
scraper = cloudscraper.create_scraper(browser='firefox')
print(scraper.get(url).text)
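If you first want to try plain requests with browser-like headers, as the question suggests, a minimal sketch is below; note that a User-Agent alone is often not enough to get past Cloudflare's checks, so this may still return 403 (the header values are just examples):
import requests

url = 'https://raritysniffer.com/api/index.php?query=fetch&collection=0x6632a9d63e142f17a668064d41a21193b49b41a0&taskId=any&norm=true&partial=true&traitCount=true'
headers = {
    # pretend to be a regular browser; adjust as needed
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept': 'application/json',
}
response = requests.get(url, headers=headers)
print(response.status_code)
if response.ok:
    print(response.json())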

How to get a live stream URL from a script

I need to get the live stream URL using a scripting language such as Python or shell,
e.g. for http://rt.com/on-air/
I can get the URL by using a tool such as the network monitor in Firefox, but I need to be able to get it via a script.
After a quick look at the requests documentation:
import requests
from contextlib import closing

with closing(requests.get('http://rt.com/on-air/', stream=True)) as r:
    # do things with the response here
    pass
If that doesn't help, please try another way:
import requests

r = requests.get('http://rt.com/on-air/', stream=True)
for line in r.iter_lines():
    # filter out keep-alive new lines
    if line:
        # do whatever you need with each line of the streamed response
        print(line)
You need to identify it in the source of the page; it is pretty much the same as using the network tool in Firefox.
In Python you can use BeautifulSoup to parse the page and get more info out of it... or a simple regex, as sketched below.
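A minimal regex-based sketch, assuming the stream URL is an .m3u8 (HLS) link embedded somewhere in the page source, which may not hold for every site:
import re
import requests

page = requests.get('http://rt.com/on-air/').text

# look for anything that resembles an HLS playlist URL in the raw HTML
matches = re.findall(r'https?://[^\s"]+\.m3u8[^\s"]*', page)
for url in matches:
    print(url)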

How to get a redirected URL without using Selenium in Python

I'm trying to get a redirected URL from another URL without using a Selenium object. I have a URL like:
http://registry.theknot.com/track/View?lt=RetailerGVR&r=325404419&rt=12160&a=994&st=RegistryProfile&ss=LinkedRegistries&sp=Logo
and it gets redirected to:
http://www.target.com/RegistryGiftGiverCmd?isPreview=false&status=completePageLink&registryType=WD&isAjax=false&listId=NjPO_i-DoIafZPZSFhaBRw&clkid=2gTTqGRwsXS4x%3AexW%3ATGBxiqUkWXSi0It0P5VM0&lnm=Online+Tracking+Link&afid=The+Knot%2C+Inc.+and+Subsidiaries&ref=tgt_adv_xasd0002
when it is opened in a browser.
I want to avoid instantiating a Selenium object and spawning a Firefox/Chrome process just to get the redirected URL. Is there any better way?
Thanks!
If this is just an HTTP redirect, urllib.request/urllib2 in the standard library can follow redirects just fine, as can third-party HTTP client libraries like requests and PycURL. In fact, in the simplest use cases, they do so automatically.
So, just:
>>> import urllib.request
>>> original_url = 'http://registry.theknot.com/track/View?lt=RetailerGVR&r=325404419&rt=12160&a=994&st=RegistryProfile&ss=LinkedRegistries&sp=Logo'
>>> u = urllib.request.urlopen(original_url)
>>> print(u.url)
http://www.target.com/RegistryGiftGiverCmd?isPreview=false&status=completePageLink&registryType=WD&isAjax=false&listId=NjPO_i-DoIafZPZSFhaBRw&clkid=0b5XTmU%3A5WbqRETSYD20AQKOUkWXSGQgQSquVU0&lnm=Online+Tracking+Link&afid=The+Knot%2C+Inc.+and+Subsidiaries&ref=tgt_adv_xasd0002
But if you just want the data, you don't even need that:
>>> data = u.read()
That's the content of the redirected response.
(For Python 2.x, just replace urllib.request with urllib2 and it works the same.)
The only reason you'd need to use Selenium (or another browser automation and/or JS-environment library) is if the redirect is done through in-page JavaScript. Which it usually isn't, and isn't in this case. There's no reason to go outside the standard library, talk to another app, etc. for simple things like this.
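For completeness, here is a quick sketch of the same thing with the third-party requests library, which also follows redirects by default; r.history holds the intermediate responses and r.url is the final URL:
import requests

original_url = ('http://registry.theknot.com/track/View?lt=RetailerGVR&r=325404419'
                '&rt=12160&a=994&st=RegistryProfile&ss=LinkedRegistries&sp=Logo')
r = requests.get(original_url)

# each hop of the redirect chain, in order
for hop in r.history:
    print(hop.status_code, hop.url)

# the final, redirected URL and its body
print(r.url)
data = r.content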

Unable to get page source code in Python

I'm trying to get the source code of a page by using:
import urllib2
url="http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560"
page = urllib2.urlopen(url)
data = page.read()
print data
and also by setting a user agent in the headers,
but I did not succeed in getting the source code of the page.
Do you have any ideas what can be done?
Thanks in Advance
I tried it and the request works, but the content you receive says (in French) that your browser must accept cookies. You could probably get around that with urllib2, but I think the easiest way would be to use the requests library (if you don't mind having an additional dependency).
To install requests:
pip install requests
And then in your script:
import requests
url = 'http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560'
response = requests.get(url)
print(response.content)
I'm pretty sure the source code of the page will be what you expect then.
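If the site keeps insisting on cookies, a sketch using a requests.Session, which stores cookies across requests, may help; whether this particular page needs it is an assumption:
import requests

url = 'http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560'

with requests.Session() as session:
    # the session keeps any cookies set by the first response
    # and sends them back on the second request
    session.get(url)
    response = session.get(url)
    print(response.status_code)
    print(response.content[:500])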
The requests library worked for me, as Martin Maillard showed.
Also, in another thread I noticed this note by leoluk:
Edit: It's 2014 now, and most of the important libraries have been
ported and you should definitely use Python 3 if you can.
python-requests is a very nice high-level library which is easier to
use than urllib2.
So I wrote this get_page procedure:
import requests

def get_page(website_url):
    response = requests.get(website_url)
    return response.content

print(get_page('http://example.com'))
Cheers!
I tried a lot of things, urllib, urllib2 and many others, but one thing worked for me for everything I needed and solved any problem I faced: Mechanize. This library simulates using a real browser, so it handles a lot of issues in that area.
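A minimal sketch with mechanize, assuming its classic Browser API; the user agent string is only an example:
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # ignore robots.txt for this fetch
br.addheaders = [('User-agent', 'Mozilla/5.0')]  # present a browser-like user agent

response = br.open('http://france.meteofrance.com/france/meteo'
                   '?PREVISIONS_PORTLET.path=previsionsville/750560')
print(response.read()[:500])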
