What is the "tag name" (or other way to identify the element) for the URL field in Firefox?
For example, when you want to open a new tab in Firefox, you can select the body with:
body = driver.find_element("tag name", "body")
and, for example, open a new tab:
body.send_keys(Keys.CONTROL + 't')
Is there a simple "tag name" that anyone knows of for the URL bar?
Normally, you can right-click on elements in Firefox and press Q to inspect them, but that doesn't work for the URL bar.
EDIT: I am not trying to find the URL bar element so that I can navigate to a new web page. I'd like to find it so I can send the "return" key to it as a workaround to refresh the page.
I believe it's 'urlbar'. At least that's what it's called when you code autocomplete features...
The URL bar that you see in the browser is not a part of the page.
If you want to get the current url, use .current_url:
driver.current_url
If you want to change the url, use .get():
driver.get("new_url_here")
You can also navigate through the browsing history using back() and forward().
FYI, the Navigating page of the documentation has a lot of relevant information.
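Since the stated goal was to refresh the page, it may be worth noting that WebDriver exposes this directly, so no key press on the URL bar should be needed. A minimal sketch, assuming `driver` is an already started WebDriver instance; the helper name `reload_and_report` is made up for illustration:

```python
def reload_and_report(driver):
    """Reload the current page and return its URL.

    `driver` is assumed to be a running Selenium WebDriver instance.
    """
    # refresh() reloads the page, which is what pressing Return
    # in the URL bar was meant to achieve.
    driver.refresh()
    # current_url is a property, not a method.
    return driver.current_url
```

With a real browser this would be called after a `driver.get(...)`; `driver.back()` and `driver.forward()` can be used the same way for history navigation.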
Related
I'm a beginner learning web scraping with Selenium. Recently I ran into the problem that some button elements do not have an "href" attribute with the link they lead to. To obtain the link, or useful information from it, I need to click the button and read the URL of the new window using the "current_url" property. However, this doesn't always work when the new URL is not valid. I'm asking for help with a solution.
To give you an example, say one wants to obtain the Spotify link to the song listed on https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712. After clicking on the Spotify button, instead of being directed to spotify web player, I see a new window popping up with this url "spotify:track:6ta5yavnnEfCE4faU0jebM". It's not valid probably due to some errors made by the website, but the identifier "6ta5yavnnEfCE4faU0jebM" is still useful so I want to obtain it.
However, when I try using the "current_url" property, it gives me the original link "https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712" instead of the invalid URL. My code is attached below. Note that I already have a time.sleep.
Specs: MacOS 12.6, chrome and webdriver version 106.something, Python 3.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
s = Service('/web_scraping/chromedriver')
driver = webdriver.Chrome(service=s)
wait = WebDriverWait(driver, 3)
driver.get('https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712')
spotify_button_element = driver.find_element("xpath",'/html/body/div/div[2]/main/div[2]/div/div[1]/div[5]/div[1]/div[2]/div/div/div[2]/div/div[1]/button[3]')
driver.execute_script("arguments[0].click();", spotify_button_element)
time.sleep(3)
print(driver.current_url)
Any idea why this happens and how to fix it? Huge thanks in advance!
What you could do instead of finding the button to click and opening a new tab is to do the following:
import json
# The page keeps its data as JSON in a script tag with id '__NEXT_DATA__'.
spotify_data_request = driver.find_element("id", '__NEXT_DATA__')
# Convert the JSON string into a dict-like object.
temp = json.loads(spotify_data_request.get_attribute('innerHTML'))
# Read the ID directly instead of clicking the Spotify button and taking it from the URL.
print(temp['props']['pageProps']['episode']['songs'][0]['song']['spotifyId'])
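To collect the IDs of all songs on the page rather than only the first one, the same lookup can be wrapped in a small function. This is only a sketch: it assumes every entry in `songs` has the shape used in the answer above (the key path is taken from there), and the function names are invented for illustration. The second helper handles the case where you already have a `spotify:track:...` string, as in the question.

```python
import json


def spotify_ids_from_next_data(next_data_json):
    """Return the spotifyId of every song found in the __NEXT_DATA__ JSON text."""
    data = json.loads(next_data_json)
    # Key path taken from the answer; assumed to be stable for this site.
    songs = data["props"]["pageProps"]["episode"]["songs"]
    ids = []
    for entry in songs:
        # Skip songs without a Spotify ID instead of raising KeyError.
        spotify_id = (entry.get("song") or {}).get("spotifyId")
        if spotify_id:
            ids.append(spotify_id)
    return ids


def track_id_from_uri(uri):
    """Extract the ID from a string shaped like 'spotify:track:<id>', else None."""
    parts = uri.split(":")
    if len(parts) == 3 and parts[0] == "spotify" and parts[1] == "track":
        return parts[2]
    return None
```

With Selenium, the input for the first function would be the `innerHTML` of the `__NEXT_DATA__` element, exactly as obtained in the answer.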
I know there are plenty of ways to get the HTML source of a page given its URL.
But is there a way to get the current HTML of a page if it displays data after some action?
For example: a simple HTML page with a button (that's the source HTML) that displays random data when you click it.
Thanks
I believe you're looking for the class of tools known as "headless browsers". The only one I've used from Python (and can vouch for) is Selenium WebDriver, but there are plenty to choose from if you search for headless browsers for Python.
https://pypi.org/project/selenium
With this you should be able to programmatically load a web page, look up and click the button in the virtually rendered DOM, then look up the innerHTML property of the targeted element.
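Those steps could look roughly like the sketch below. It assumes `driver` is a running WebDriver instance and that both the button and the element showing the data can be found by CSS selectors; the function name, parameters, and the simple polling loop are my own choices, not part of Selenium.

```python
import time


def click_and_read_html(driver, button_css, target_css, timeout=5.0, poll=0.2):
    """Click a button and return the target element's innerHTML once it changes."""
    before = driver.find_element("css selector", target_css).get_attribute("innerHTML")
    driver.find_element("css selector", button_css).click()

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Re-locate the element each time in case the page replaced it.
        after = driver.find_element("css selector", target_css).get_attribute("innerHTML")
        if after != before:
            return after
        time.sleep(poll)
    # Nothing changed within the timeout; return whatever is there now.
    return driver.find_element("css selector", target_css).get_attribute("innerHTML")
```

With Chrome, passing a `webdriver.ChromeOptions()` object with `add_argument("--headless")` to `webdriver.Chrome(options=...)` should run the browser without a visible window.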
I'm using Selenium with the Python API and Chrome to do the following:
Collect the performance log;
Click some <a target='_blank'> tags to open other pages.
For example, I click a link in page 'A', which makes the browser open a new window to load another URL, 'B'.
But when I use driver.get_log('performance') to get the performance log, I can only get the log of Page 'A'. Even though I switch to the window of 'B' as soon as I click the href, some log entries of the page 'B' will be lost.
So how can I get the whole performance log of another page without setting the target of <a> to '_top'?
I had the same problem, and I think it is because the driver does not switch to the new window immediately.
I switched to page "B", reloaded it, then called get_log, and it worked.
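In code, that workaround might look like the sketch below. It assumes Chrome was started with performance logging enabled (the `goog:loggingPrefs` capability), that `original_handle` was saved from `driver.current_window_handle` before the click, and that an extra reload of page "B" is acceptable. The function name is made up.

```python
def performance_log_of_new_window(driver, original_handle):
    """Switch to the window opened by a click, reload it, and return its log."""
    new_handles = [h for h in driver.window_handles if h != original_handle]
    if not new_handles:
        raise RuntimeError("no new window was opened")
    driver.switch_to.window(new_handles[-1])
    # Reload while this window is the active one, as suggested above.
    driver.refresh()
    return driver.get_log("performance")
```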
I am trying to scrape information off websites like this:
https://www.glassdoor.com/Overview/Working-at-7-Eleven-EI_IE3581.11,19.htm
using python + beautifulsoup + mechanize.
Accessing anything on the main site is no problem. However, I also need the information in the overlay window that appears when one clicks on the "Rating Trends" button next to the bar with stars.
This overlay-window can also be accessed directly by using the url:
https://www.glassdoor.com/Reviews/7-Eleven-Reviews-E3581.htm#trends-overallRating
The html associated with this page is a modification of the original site's html.
However, regardless of what element I try to find (via findAll) on that overlay page, BeautifulSoup returns zero hits.
How can I fix this? I tried adding a sleep time between accessing the website and reading anything in, to no avail.
Thanks!
If you're using Chrome, right-click the background of that page (before the additional information is displayed) and select 'Inspect' from the context menu (on Windows, anyway), then open the 'Network' tab so that you can see network traffic. Now click on 'Rating Trends'. The entry marked 'xhr' should be https://www.glassdoor.ca/api/employer/3581-rating.htm?locationStr=&jobTitleStr=&filterCurrentEmployee=false&filterEmploymentStatus=REGULAR&filterEmploymentStatus=PART_TIME and its contents should look like the following.
{"employerId":3581,"ratings":[{"hasRating":true,"type":"overallRating","value":2.9},{"hasRating":true,"type":"ceoRating","value":0.54},{"hasRating":true,"type":"bizOutlook","value":0.35},{"hasRating":true,"type":"recommend","value":0.4},{"hasRating":true,"type":"compAndBenefits","value":2.4},{"hasRating":true,"type":"cultureAndValues","value":2.5},{"hasRating":true,"type":"careerOpportunities","value":2.5},{"hasRating":true,"type":"workLife","value":2.4},{"hasRating":true,"type":"seniorManagement","value":2.3}],"week":0,"year":0}
Whether this URL can be altered for use in obtaining information for other employers, I regret, I cannot tell you.
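Because that response is plain JSON, it can be parsed without BeautifulSoup. A minimal sketch, assuming the response body has the shape shown above; fetching it is left out because I don't know which headers or cookies the site requires, and the function name is my own.

```python
import json


def ratings_by_type(payload_text):
    """Map each rating type to its value, skipping entries without a rating."""
    payload = json.loads(payload_text)
    return {
        item["type"]: item["value"]
        for item in payload.get("ratings", [])
        if item.get("hasRating")
    }
```

Applied to the response above, `ratings_by_type(text)["overallRating"]` would give the overall rating.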
I need help with python...
Do you know how I can check the browser's response after clicking the login button (here: submit)?
I want to compare the HTML and return True if my login is successful, but unfortunately I don't know how. :/ Any hint would be priceless. :)
That's my code from selenium:
driver.find_element("css selector", "input.username").send_keys("margie")
driver.find_element("css selector", "input.password").clear()
driver.find_element("css selector", "input.password").send_keys("margie")
driver.find_element("css selector", "div.btn.submit").click()
Can I use "if"?
Thank you for your time guys!
You can just continue using driver after clicking on the submit button. For example, driver.page_source would contain the HTML of the page displayed after the login.
There is no universal silver bullet to check if the login was successful or not. It depends on the web-site you are testing against: it may redirect to a particular url, have certain elements on the web page, certain title etc.
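As a rough illustration of such site-specific checks, here is a sketch that tests for a URL fragment and/or a marker element. Both expectations are placeholders you would have to choose for your site, and the function name is invented.

```python
def login_succeeded(driver, url_fragment=None, marker_css=None):
    """Return True if the page after login matches the given expectations."""
    checks = []
    if url_fragment is not None:
        checks.append(url_fragment in driver.current_url)
    if marker_css is not None:
        # find_elements returns an empty list instead of raising when nothing matches.
        checks.append(len(driver.find_elements("css selector", marker_css)) > 0)
    # With no expectations supplied there is nothing to verify.
    return bool(checks) and all(checks)
```

So yes, you can use "if": for example `if login_succeeded(driver, url_fragment="/home"):`, where "/home" is only an assumed example of where the site redirects.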
Just a side note.
If you follow the Page Object pattern, you would have a separate object for the Login Page and a separate object for the page displayed after the login, aka the Home Page, like in this example. The check that the expected page is shown after the login would be encapsulated inside the Home Page object's implementation, which would make things clearer and better organized.
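A bare-bones sketch of that idea, reusing the selectors from the question. The `HomePage.MARKER` selector is hypothetical and stands for any element that only exists after a successful login; the class and method names are my own.

```python
class HomePage:
    """Page shown after a successful login."""

    # Hypothetical selector; replace with an element only the home page has.
    MARKER = "div.user-menu"

    def __init__(self, driver):
        self.driver = driver

    def is_displayed(self):
        return len(self.driver.find_elements("css selector", self.MARKER)) > 0


class LoginPage:
    """Login form, using the selectors from the question."""

    def __init__(self, driver):
        self.driver = driver

    def login(self, username, password):
        self.driver.find_element("css selector", "input.username").send_keys(username)
        password_field = self.driver.find_element("css selector", "input.password")
        password_field.clear()
        password_field.send_keys(password)
        self.driver.find_element("css selector", "div.btn.submit").click()
        # Hand back the page object for the page expected after login.
        return HomePage(self.driver)
```

The test itself then reduces to something like `assert LoginPage(driver).login("margie", "margie").is_displayed()`.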
See also:
How to assert in selenium test case for login success?
Page Objects in Python