Get the current URL when it's not valid with Selenium Python

I'm a beginner learning web scraping with Selenium. Recently I ran into the problem that sometimes a button element has no "href" attribute pointing to the page it leads to. To obtain the link, or the useful information in it, I need to click the button and read the current URL in the new window using the "current_url" method. However, this doesn't always work when the new URL is not valid. I'm asking for help with a solution.
To give you an example, say one wants to obtain the Spotify link to the song listed on https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712. After clicking the Spotify button, instead of being directed to the Spotify web player, I see a new window pop up with the url "spotify:track:6ta5yavnnEfCE4faU0jebM". It's not valid, probably due to some error on the website's side, but the identifier "6ta5yavnnEfCE4faU0jebM" is still useful, so I want to obtain it.
However, when I try the "current_url" method, it gives me the original link "https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712" instead of the invalid url. My code is attached below. Note that I already have a time.sleep.
Specs: macOS 12.6, Chrome and ChromeDriver version 106.something, Python 3.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait

s = Service('/web_scraping/chromedriver')
driver = webdriver.Chrome(service=s)
wait = WebDriverWait(driver, 3)
driver.get('https://www.what-song.com/Tvshow/100242/BoJack-Horseman/e/116712')
spotify_button_element = driver.find_element("xpath", '/html/body/div/div[2]/main/div[2]/div/div[1]/div[5]/div[1]/div[2]/div/div/div[2]/div/div[1]/button[3]')
driver.execute_script("arguments[0].click();", spotify_button_element)
time.sleep(3)
print(driver.current_url)
Any idea why this happens and how to fix it? Huge thanks in advance!

Instead of finding the button to click and opening a new tab, you could do the following:
import json
spotify_data_request = driver.find_element("id",'__NEXT_DATA__') # get the data stored in a script tag with id = '__NEXT_DATA__'
temp = json.loads(spotify_data_request.get_attribute('innerHTML')) # convert the string into a dict like object
print(temp['props']['pageProps']['episode']['songs'][0]['song']['spotifyId']) # get the Id attribute that you want instead of having to click the spotify button and retrieve it from the URL
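If you still want to go through the click itself, keep in mind that the spotify: URI opens in a separate window handle, so current_url read on the original handle still reports the what-song page. A minimal sketch, assuming the click really does open a new browser window and that the new handle is the last entry in driver.window_handles (driver, wait and spotify_button_element are the objects from the question):
original_handle = driver.current_window_handle
driver.execute_script("arguments[0].click();", spotify_button_element)
wait.until(lambda d: len(d.window_handles) > 1)  # wait until the new window shows up

driver.switch_to.window(driver.window_handles[-1])  # assumption: the new handle is the last one
raw_url = driver.current_url  # e.g. "spotify:track:6ta5yavnnEfCE4faU0jebM"
print(raw_url.rsplit(":", 1)[-1])  # just the identifier

driver.close()
driver.switch_to.window(original_handle)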

Related

SELENIUM Can't access payment form/fields

https://squidindustries.co/checkout
checkout_cc_number = driver.find_element_by_id("number")
checkout_cc_number.send_keys(card_number)
When I try to input information into the card number field I get an error saying the element could not be located. I tried using time.sleep and driver.implicitly_wait when I first got to the page, but both failed. Any ideas?
The element is in a frame (i.e. a webpage within a webpage). Selenium will look for elements in the page it has loaded and not within frames. That's the problem.
To solve this we just need a bit more code, which will tell Selenium to look in the frame.
The example you've given is several pages deep into a shopping cart, so I'm going to use a much more accessible example instead: the Mozilla guide to iframes.
Here is some code to open that page and then click the CSS button within the frame:
from selenium import webdriver
import time
browser = webdriver.Chrome()
browser.get(r"https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe")
time.sleep(5)
browser.switch_to.frame(browser.find_element_by_class_name("interactive"))
css_button = browser.find_element_by_id("css")
css_button.click()
browser.switch_to.default_content()
There are two lines that are important. The first one is:
browser.switch_to.frame(browser.find_element_by_class_name("interactive"))
That finds the frame and then switches to it. Once we have done that, any code that looks for elements will be looking in the frame and not in the page that we navigated to. That is what you need to do to access the number element. In your example the class of the frame is card-fields-iframe, so use that instead of interactive.
The second important line is:
browser.switch_to.default_content()
That reverts the previous line. So now Selenium will be looking for elements within the page that we navigated to. You'll want to do that after interacting with the frame, so that you can continue through the shopping cart.
Have you tried getting the input element using the DOM? What happens if you do document.getElementById('number')?
I ran into the same issue, and with checkouts, as you mentioned, all the iframe class names are the same. What I did was get all the iframes with the same class name as a list:
iframes = driver.find_elements(By.CLASS_NAME, "card-fields-iframe")
I then switched through the iframes referencing each one by its place in the list. Since there are only four fields in the checkout, the list is only 4 elements long, starting with [0].
driver.switch_to.frame(iframes[0])
number = driver.find_element(By.ID, "number")
if number.is_displayed():
    number.send_keys("4000300040005000")
driver.switch_to.default_content()
It's important to note that switching back to the default content with driver.switch_to.default_content() before switching to the next frame was the only way I could make this work. The is_displayed() check just verifies whether the element is actually rendered on the page.
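To fill in all four fields, the same pattern can be repeated per iframe. A minimal sketch, assuming each card-fields iframe contains exactly one input element and that the iframes appear in the same order as the values below (the test values themselves are placeholders):
from selenium.webdriver.common.by import By

values = ["4000300040005000", "JOHN DOE", "12/25", "123"]  # placeholder test data

iframes = driver.find_elements(By.CLASS_NAME, "card-fields-iframe")
for frame, value in zip(iframes, values):
    driver.switch_to.frame(frame)
    field = driver.find_element(By.TAG_NAME, "input")  # assumption: one input per iframe
    if field.is_displayed():
        field.send_keys(value)
    driver.switch_to.default_content()  # always switch back before entering the next frame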

Taking full page screenshot in chrome store SELENIUM PYTHON

I'm trying to save a full-page screenshot of a Chrome Web Store page using Selenium and Python 3.
I've searched online for different answers and I keep getting only the "header" part, no matter what I try. It's as if the page doesn't scroll to the next "section".
I tried clicking inside the page to verify it's in focus but that didn't help.
Tried answers with stitching and imported Screenshots and Image.
My current code is:
ob = Screenshot_Clipping.Screenshot()
driver2 = webdriver.Chrome(executable_path=chromedriver)
url = "https://chrome.google.com/webstore/detail/online-game-zone-new-tab/abalcghoakdcaalbfadaacmapphamklh"
driver2.get(url)
img_url = ob.full_Screenshot(driver, save_path=r'.', image_name='Myimage.png')
print(img_url)
print('done')
driver2.close()
driver2.quit()
but that gives me a picture of only the header section of the page.
What am I doing wrong?
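One thing that stands out in the snippet: the browser you create is named driver2, but full_Screenshot is called with driver, so the screenshot is taken from a different (or undefined) session rather than the Chrome Web Store page you just loaded. A minimal corrected sketch, assuming Screenshot_Clipping comes from the Selenium-Screenshot package and keeping everything else as in the question:
ob = Screenshot_Clipping.Screenshot()
driver2 = webdriver.Chrome(executable_path=chromedriver)
url = "https://chrome.google.com/webstore/detail/online-game-zone-new-tab/abalcghoakdcaalbfadaacmapphamklh"
driver2.get(url)

# Pass the same driver instance that actually loaded the page.
img_url = ob.full_Screenshot(driver2, save_path=r'.', image_name='Myimage.png')
print(img_url)

driver2.quit()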

Access widget window beautifulsoup python mechanize

I am trying to scrape information off websites like this:
https://www.glassdoor.com/Overview/Working-at-7-Eleven-EI_IE3581.11,19.htm
using python + beautifulsoup + mechanize.
Accessing anything on the main site is no problem. However, I also need the information that appears in an overlay window that opens when one clicks on the "Rating Trends" button next to the bar with stars.
This overlay window can also be accessed directly via the url:
https://www.glassdoor.com/Reviews/7-Eleven-Reviews-E3581.htm#trends-overallRating
The html associated with this page is a modification of the original site's html.
However, regardless of what element I try to find (via findAll) on that overlay-window page, beautifulsoup returns zero hits.
How can I fix this? I tried adding a sleep time between accessing the website and reading anything in, to no avail.
Thanks!
If you're using the Chrome browser, right-click the background of that page (without the additional information displayed), select 'Inspect' from the context menu (on Windows, at least), and open the 'Network' tab so that you can see network traffic. Now click on 'Rating Trends'. The entry marked 'xhr' will be https://www.glassdoor.ca/api/employer/3581-rating.htm?locationStr=&jobTitleStr=&filterCurrentEmployee=false&filterEmploymentStatus=REGULAR&filterEmploymentStatus=PART_TIME (I much hope!) and its contents will be the following.
{"employerId":3581,"ratings":[{"hasRating":true,"type":"overallRating","value":2.9},{"hasRating":true,"type":"ceoRating","value":0.54},{"hasRating":true,"type":"bizOutlook","value":0.35},{"hasRating":true,"type":"recommend","value":0.4},{"hasRating":true,"type":"compAndBenefits","value":2.4},{"hasRating":true,"type":"cultureAndValues","value":2.5},{"hasRating":true,"type":"careerOpportunities","value":2.5},{"hasRating":true,"type":"workLife","value":2.4},{"hasRating":true,"type":"seniorManagement","value":2.3}],"week":0,"year":0}
Whether this URL can be altered for use in obtaining information for other employers, I regret, I cannot tell you.
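If that endpoint still responds the same way, the overlay data can be fetched without a browser at all. A minimal sketch using requests (assuming the endpoint is still public and returns the JSON above; the User-Agent header is just a plausible browser string, since plain requests are sometimes blocked):
import requests

url = "https://www.glassdoor.ca/api/employer/3581-rating.htm?locationStr=&jobTitleStr=&filterCurrentEmployee=false&filterEmploymentStatus=REGULAR&filterEmploymentStatus=PART_TIME"
headers = {"User-Agent": "Mozilla/5.0"}  # pretend to be a browser

response = requests.get(url, headers=headers)
response.raise_for_status()
data = response.json()

for rating in data["ratings"]:
    print(rating["type"], rating["value"])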

Python Webdriver need code to find a particular element

I have used Python 2.7, Webdriver and Chrome to access Pinterest and insert images into a board. I have successfully logged in to the site, created a board and clicked on the Pin Image button (thanks to Stack Overflow). The problem I have is identifying and clicking the "No Thanks" button using xpath find-element code. I attach an image of the web page and the Chrome inspect of the element.
[Screenshots: the Pinterest popup and the 'Not now' element code]
I guess you can give this xpath a try; it will grab the first element whose class contains "cancelButton". Hopefully, the button in your popup is the first element on the page with this class.
//button[contains(@class, 'cancelButton')]
Hope this helps.
//span[contains(text(),'Not now')]
This is the general syntax: //tag[contains(@attribute, 'value')]
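Put back into Python, a minimal sketch of waiting for and clicking that element (assuming the popup lives in the main document and not inside an iframe, and that driver is your existing Chrome session):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
not_now = wait.until(
    EC.element_to_be_clickable((By.XPATH, "//span[contains(text(),'Not now')]"))
)
not_now.click()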

Identifying URL bar element in Firefox

What is the "tag name" (or other way to identify the element) for the URL field in Firefox?
e.g. when you want to open a new tab in Firefox, you can select the body by:
body = driver.find_element_by_tag_name('body')
and, for example, open a new tab:
body.send_keys(Keys.CONTROL + 't')
Is there a simple "tag name" that anyone knows of for the URL bar?
Normally, you can right-click on elements in Firefox and press Q to identify them, but in this case that doesn't apply.
EDIT: I am not trying to find the URL bar element so that I can navigate to a new web page. I'd like to find it so I can send the "return" key to it as a workaround to refresh the page.
I believe it's 'urlbar'. At least that's how you call it when you code autocomplete features...
The URL bar that you see in the browser is not a part of the page.
If you want to get the current url, use .current_url:
driver.current_url
If you want to change the url, use .get():
driver.get("new_url_here")
You can also navigate through the browsing history using back() and forward().
FYI, the Navigating documentation page has a lot of relevant information.
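And since the EDIT says the real goal is to refresh the page, that doesn't need the URL bar either; Selenium exposes it directly. A short sketch of the navigation calls mentioned above, plus the refresh:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.mozilla.org/")

print(driver.current_url)  # read the current url
driver.refresh()           # reload the page (no URL bar needed)
driver.back()              # one step back in the browsing history
driver.forward()           # and forward again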
