As a newbie, I wonder whether there is a method to get the HTTP response status code so I can detect certain exceptions, like the remote server being down, a broken URL, a URL redirect, etc.
In Selenium it's Not Possible!
For more info click here.
You can accomplish it with requests:
import requests
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("url")  # Selenium itself exposes no status code
r = requests.get("url")  # repeat the request with requests to read it
print(r.status_code)
Update:
It actually is possible using the Chrome DevTools Protocol with event listeners.
See example script at https://stackoverflow.com/a/75067388/20443541
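For reference, here is a minimal sketch of the Chrome route. Note it is not the linked script: it reads Chrome's performance log rather than registering explicit CDP event listeners, and it assumes Chrome and chromedriver are installed.

```python
import json

def extract_status_codes(performance_logs):
    """Pull (url, status) pairs out of Chrome performance-log entries."""
    results = []
    for entry in performance_logs:
        message = json.loads(entry["message"])["message"]
        if message.get("method") == "Network.responseReceived":
            response = message["params"]["response"]
            results.append((response["url"], response["status"]))
    return results

def status_codes_for(url):
    """Drive Chrome with performance logging enabled and return the codes."""
    # Imported here so the parser above also works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # 'goog:loggingPrefs' asks Chrome to record network events in its performance log.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return extract_status_codes(driver.get_log("performance"))
    finally:
        driver.quit()
```

Each performance-log entry is a JSON string, so the status code of the main document and of every sub-resource shows up as a `Network.responseReceived` event.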
Related
I was wondering if I can get the current URL after a redirect from the starting page, done with requests.
For example:
I send the request to "google.com", which instantly sends me to "google.com/page-123456"; the page number changes every time. Can I get the "google.com/page-123456" in my script?
With selenium it can be made like this:
import selenium
import time
driver = (...)
driver.get('google.com')
time.sleep(2)
url = driver.current_url
Can this be done with requests / BeautifulSoup? How?
Thanks
Try the url property of the Request object, which you can access through response.request:
import requests
response = requests.get("https://google.com")
url = response.request.url
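requests also exposes the final URL directly as response.url and the intermediate hops as response.history, which makes the redirect visible. A self-contained sketch (a local stdlib server stands in for google.com and its per-visit page number):

```python
import http.server
import threading

import requests

class RedirectHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/start":
            # Redirect to a "random" page, like the google.com/page-123456 example.
            self.send_response(302)
            self.send_header("Location", "/page-123456")
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"landed")

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

response = requests.get(f"{base}/start")          # redirects are followed by default
print(response.url)                                # final URL after the redirect
print([r.status_code for r in response.history])   # the 302 hop
server.shutdown()
```

No time.sleep is needed here: unlike Selenium, requests has already followed the redirect by the time get() returns.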
When you go to the site, https://www.jimmyjazz.com/search?keywords=11468285, you are redirected to https://www.jimmyjazz.com/mens/footwear/adidas-solar-hu-nmd/BB9528.
I would like to use requests to enter that search link, then return the url that it is redirected to.
Here is my code to do that:
import requests
from bs4 import BeautifulSoup
sitename = "https://www.jimmyjazz.com/search?keywords=11468285"
response = requests.get(sitename, allow_redirects=True)
print(response.url)
But it still returns the original url:
PS C:\Users\jokzc\Desktop\python\learning requests> py test2.py
https://www.jimmyjazz.com/search?keywords=11468285
How would I fix my code to handle that? Thanks :)
The site doesn't actually send a 302 redirect code back. I made the same HTTP GET call in Postman, and it returns a 200 OK response:
The same goes for Chrome's dev tools, looking at the network traffic:
I think that somewhere in their JavaScript code they are setting the location.href to be a new url. I didn't go through the whole JS stack trace to prove it, but that is my best guess.
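If that guess is right, one workaround is to pull the JS-assigned URL out of the page source yourself. The pattern below is hypothetical; the real jimmyjazz.com markup may use a different mechanism entirely, so treat this as a sketch:

```python
import re

def find_js_redirect(html):
    """Look for a location.href = '...' assignment in the page source.
    The exact pattern is a guess; adjust it to the site's real markup."""
    match = re.search(r'''location\.href\s*=\s*["']([^"']+)["']''', html)
    return match.group(1) if match else None

# A made-up page snippet standing in for the site's real JavaScript:
sample = "<script>window.location.href = '/mens/footwear/adidas-solar-hu-nmd/BB9528';</script>"
print(find_js_redirect(sample))  # → /mens/footwear/adidas-solar-hu-nmd/BB9528
```

If the URL is computed dynamically rather than embedded in the HTML, scraping the source won't work and a real browser (Selenium) is the reliable fallback.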
Is it possible to receive the status code of a URL with headless Chrome (python)?
As stated in this answer, it's not possible using Python and Selenium (I assume you are using Selenium?).
A working alternative to Selenium and Chrome is requests. With requests, an example would look like this:
import requests
response = requests.get('https://api.github.com')
print(response.status_code)
I use requests to scrape a webpage for some content.
When I use
import requests
requests.get('http://example.org')
I get a different page from the one I get in my browser, or when using
import urllib.request
urllib.request.urlopen('http://example.org')
I tried using urllib, but it was really slow.
In a comparison test I did, it was 50% slower than requests!
How do you solve this?
After a lot of investigation I found that the site sets a cookie in the headers on a visitor's first request only.
So the solution is to get the cookies with a HEAD request, then resend them with your GET request:
import requests
# get the cookies with head(); this doesn't fetch the body, so it's FAST
cookies = requests.head('http://example.com').cookies
# send the GET request with those cookies
result = requests.get('http://example.com', cookies=cookies)
Now it's faster than urllib, with the same result :)
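An alternative sketch: a requests.Session stores and resends cookies automatically, so the HEAD-then-GET dance needs no manual cookies= argument. The local stdlib server below is only a stand-in for the site's first-visit cookie behavior:

```python
import http.server
import threading

import requests

class CookieHandler(http.server.BaseHTTPRequestHandler):
    def do_HEAD(self):
        # Hand out the cookie on the first (HEAD) request.
        self.send_response(200)
        self.send_header("Set-Cookie", "token=abc123")
        self.end_headers()

    def do_GET(self):
        # Serve the real page only if the cookie came back.
        self.send_response(200)
        self.end_headers()
        if "token=abc123" in self.headers.get("Cookie", ""):
            self.wfile.write(b"full page")
        else:
            self.wfile.write(b"stripped page")

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

session = requests.Session()
session.head(url)           # the cookie is stored on the session
result = session.get(url)   # and sent back automatically
print(result.text)          # → full page
server.shutdown()
```

This is the same trick as above with less bookkeeping, and it keeps working if the site later adds more cookies.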
I'm trying to get contest data from the url: "https://www.draftkings.com/contest/gamecenter/32947401"
If you go to this URL and aren't logged in, it'll just re-direct you to the lobby. If you're logged in, it'll actually show you the contest results.
Here's some things I tried:
-First, I used the Network tab in Chrome's dev tools to watch requests while I manually logged in
-I then tried copying the cookie that I thought contained the authentication info; it was of the form:
'ajs_anonymous_id=%123123123123123, mlc=true; optimizelyEndUserId'
-I then stored that cookie as an environment variable and ran this code:
HEADERS= {'cookie': os.environ['MY_COOKIE'] }
requests.get(draft_kings_url, headers= HEADERS)
No luck, this just gave me the lobby.
I then tried request's built in:
HTTPBasicAuth
HTTPDigestAuth
No luck here either.
I'm far from a Python expert, and I've pretty much exhausted what I know and the search results I've found. Any ideas?
The tool that you want is selenium. Something along the lines of:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get(r"https://www.draftkings.com/contest/gamecenter/32947401" )
username = browser.find_element_by_id("user")
username.send_keys("username")
password = browser.find_element_by_id("password")
password.send_keys("top_secret")
login = browser.find_element_by_name("login")
login.click()
Use Fiddler to see the exact request the site makes when you try to log in. Then use the Session class in the requests package:
import requests
session = requests.Session()
session.get('YOUR_URL_LOGIN_PAGE')
This will save all the cookies from your URL in your session variable (like when you use a browser).
Then make a POST request to the login URL with the appropriate form data.
You don't have to pass cookie data manually, as it's stored automatically when you first visit a website. However, you can set some headers explicitly, like User-Agent, with:
session.headers.update({'header_name':'header_value'})
HTTPBasicAuth & HTTPDigestAuth might not work based on the website.
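Putting those steps together as a sketch (the /login path, field names, and cookie value are placeholders for whatever Fiddler shows you; a local stdlib server stands in for DraftKings):

```python
import http.server
import threading
import urllib.parse

import requests

class LoginHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Logged-out visitors see the lobby; logged-in ones see the contest.
        self.send_response(200)
        self.end_headers()
        if "session=ok" in self.headers.get("Cookie", ""):
            self.wfile.write(b"contest results")
        else:
            self.wfile.write(b"lobby")

    def do_POST(self):
        # Accept a form login and hand back a session cookie.
        length = int(self.headers["Content-Length"])
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        if form.get("username") == ["me"] and form.get("password") == ["secret"]:
            self.send_response(200)
            self.send_header("Set-Cookie", "session=ok")
        else:
            self.send_response(403)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), LoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})  # explicit header, as above
session.post(f"{base}/login", data={"username": "me", "password": "secret"})
page = session.get(f"{base}/contest")  # the session cookie rides along
print(page.text)                       # → contest results
server.shutdown()
```

The key point: once the POST succeeds, every later session.get() carries the authentication cookie without you ever touching it.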