Python - Resolve remote URL from JSON response to the actual page URL

I am trying to get the actual page URL using the remote URL from a JSON response.
Below is the remote URL I get in a JSON API response:
https://somesite.com/mainlink/1eb-68a8-40be-a3-5679e/utilities/927-40-b958-3b5?pagePath=teststaff
When I click the link, it resolves to the actual page when opened in a browser, which in this case is:
https://somesite.com/mainlink/recruitement/utilities/salesteam?pagePath=teststaff
How can I programmatically get this resolved URL without opening it in a browser?

IIUC, use requests.head with allow_redirects=True to get the final URL:

import requests

url = "https://en.wikipedia.org/"
infos = requests.head(url, allow_redirects=True)
print(infos.url)

Output:

https://en.wikipedia.org/wiki/Main_Page

Related

Can't get the html of a page python

So I have been trying to solve this for the past 3 days and can't figure out why it fails.
I'm trying to access the HTML of a site that requires logging in first.
Everything I tried returns the same result.
Here is what I tried:
response = requests.get('https://de-legalization.tlscontact.com/eg/CAI/myapp.php', headers=headers, params=params, cookies=cookies)
print(response.content)

payload = {
    '_token': 'TOKEN HERE',
    'email': 'EMAIL HERE',
    'pwd': 'PASSWORD HERE',
    'client_token': 'CLIENT_TOKEN HERE'
}
with requests.session() as s:
    r = s.post(login_url, data=payload)
    print(r.text)
I also tried using urllib, but every attempt returns this:

<script>window.location="https://de-legalization.tlscontact.com/eg/CAI/index.php";</script>

Does anyone know why this is happening?
Also, here is the URL of the page I want the HTML of:
https://de-legalization.tlscontact.com/eg/CAI/myapp.php
You see this particular output because it is, in fact, the content of the page you are downloading.
You can verify this in Chrome by opening the following URL:
view-source:https://de-legalization.tlscontact.com/eg/CAI/myapp.php
This happens because you are being redirected by the JavaScript code on the page.
Since the page you are trying to access requires login, you cannot reach it just by sending an HTTP request to the internal page.
You either need to extract all the cookies from a logged-in browser session and add them to your Python script,
or use a tool like Selenium that lets you control a browser from your Python code.
Here you can find how to extract all the cookies from the browser session:
How to copy cookies in Google Chrome?
Here you can find how to add cookies to the http request in Python:
import requests
cookies = {'enwiki_session': '17ab96bd8ffbe8ca58a78657a918558'}
r = requests.post('http://wikipedia.org', cookies=cookies)

Get cookies from selenium to requests

I can log in to a website with Selenium and retrieve all its cookies.
But then I have to submit a request to the site quickly, and Selenium is very slow.
That's why I want to grab the cookies with Selenium and send the requests via the requests module.
My Selenium code (first I log in to the website and collect all cookies):

browser.get('https://www.example.com/login')
cookiem1 = browser.get_cookies()
print(cookiem1)

In the second stage, I go to another page of the website and make a request:

s = requests.Session()
for cookie in cookiem1:
    s.cookies.set(cookie['name'], cookie['value'])
r = s.get("https://example.com/postcomment")
print(r.content)

I pass the cookies this way, but when I send the request via the requests module, the site does not authorize my user.
My error:

"errorMessage": "Unauthorized user",\r\n "errorDetails": "No cookie"

The site is probably not recognizing my session with this code.
Thanks in advance
Try this:

import requests as re

ck = browser.get_cookies()
s = re.Session()
for c in ck:
    s.cookies.set(c['name'], c['value'])
response = s.get("https://example.com/postcomment")
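A common follow-up problem is that some sites scope their cookies by domain and path, and a bare s.cookies.set(name, value) drops that information. A sketch of a small helper that keeps it (the example cookie values in the test are made up):

```python
import requests

def session_from_selenium(selenium_cookies):
    """Build a requests.Session from the list of cookie dicts
    returned by Selenium's driver.get_cookies()."""
    s = requests.Session()
    for c in selenium_cookies:
        s.cookies.set(c['name'], c['value'],
                      domain=c.get('domain'), path=c.get('path', '/'))
    return s
```

If the site still answers "No cookie", compare the full request headers requests sends with what the browser sends; a mismatched User-Agent is another frequent culprit.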

Getting the redirected url in urllib2

I have a URL, and as soon as I click on it, it redirects me to another webpage. I want to get that redirected URL in my code with urllib2.
Sample code:

link = 'mywebpage.com'
html = urllib2.urlopen(link).read()

Any help is much appreciated.
Use the requests library; by default, Requests performs location redirection for all verbs except HEAD:

r = requests.get('https://mywebpage.com')
print(r.url)  # the final, redirected URL

Or turn off redirects:

r = requests.get('https://mywebpage.com', allow_redirects=False)

Can't emulate browser behavior with requests

I'm trying to send a POST request to a website to get a JSON response. I can see the JSON response in Chrome Inspector when I click on a link, but I can't get it using requests.
First, I tried using a requests Session to get the cookies and then use them in the POST request, to no avail:

session = requests.Session()
session.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
response = session.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997')
print(response.text)
Secondly, I used Selenium + PhantomJS to get the cookies and passed them to requests, with no results:

browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')

all_cookie = {}
for cookie in browser.get_cookies():
    all_cookie[cookie['name']] = cookie['value']

rep = requests.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997', cookies=all_cookie)

It only works when I manually take the cookies from Chrome.
I can't see what the problem is!
session = requests.Session()
session.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
response = session.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997')
print(response.json())

Calling the json() method parses the JSON body of the response. Using a persistent requests Session also means the cookies from the first GET are sent automatically with the POST.

response.cookies  # the cookies attribute of a single response

Not able to save cookies from a website using Python requests?

For the link http://www.ibnlive.com/videos/world/, using a web browser I can easily see cookies being set when the page loads.
But if I try to load the same cookies using python requests, the cookie jar shows up empty:

import requests

s = requests.session()
connection = s.get('http://www.ibnlive.com/videos/world/')
print(s.cookies)  # produces an empty jar

My question is: how can I get those cookies using a Python script?
Because the cookie is not being set by http://www.ibnlive.com/videos/world/ itself, but by some other resource that the page loads. Look at the response headers for that URL: you won't see any Set-Cookie header.
