I have two simple questions about Selenium. I am new to this framework.
The questions concern these two anchors:
HERE ARE CHINESE CHARACTERS
and
HERE ARE ANOTHER CHINESE CHARACTERS
How can I use selenium to click each anchor?
Please note that the first anchor has a "title" attribute. I know I could match on it, but I have no idea how to do that. I don't plan to match on the CHARACTERS shown there, because they vary from project to project.
For the second anchor, the CHINESE CHARACTERS are fixed. Since there is no other clue, I think I will have to match on that text and have Selenium issue the click event.
Please advise, thanks.
You have multiple ways to find both links. Which option to choose depends on the locations of the links on the page, uniqueness of element attributes, text etc.
That said, there are two relevant methods that should be tried first:
find_element_by_link_text()
find_element_by_partial_link_text()
Here is how you should use them:
first_link = driver.find_element_by_link_text(u'HERE ARE CHINESE CHARACTERS')
first_link.click()
second_link = driver.find_element_by_link_text(u'HERE ARE ANOTHER CHINESE CHARACTERS')
second_link.click()
where driver is a Webdriver instance, e.g.:
from selenium import webdriver
driver = webdriver.Firefox()
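In Selenium 4 the find_element_by_* helpers are deprecated in favour of find_element(strategy, value). Here is a minimal sketch of the same click in that style. The helper name click_link is hypothetical, and the two strategy strings are written out literally (they are the values of By.LINK_TEXT and By.PARTIAL_LINK_TEXT) so the snippet needs no import of its own:

```python
def click_link(driver, text, partial=False):
    """Click the first <a> whose visible text equals (or contains) `text`."""
    # "link text" and "partial link text" are the string values of
    # By.LINK_TEXT and By.PARTIAL_LINK_TEXT in selenium.webdriver.common.by.
    strategy = "partial link text" if partial else "link text"
    link = driver.find_element(strategy, text)
    link.click()
    return link
```

Usage would look like click_link(driver, u'HERE ARE CHINESE CHARACTERS'), or with partial=True when only part of the link text is stable.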
If you cannot rely on the link texts, then check title and onclick attributes and use one of the following methods:
find_element_by_xpath()
find_element_by_css_selector()
Example (using the first method of two):
first_link = driver.find_element_by_xpath('//a[@title="title_text"]')
first_link.click()
second_link = driver.find_element_by_xpath('//a[@onclick="anotherjsfunc();"]')
second_link.click()
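Since the title text varies between projects, it may help to build the XPath from a variable instead of hard-coding it. XPath 1.0 has no escape character inside string literals, so a value containing both quote types has to be assembled with concat(). A small sketch, where xpath_literal and anchor_by_attribute are hypothetical helper names:

```python
def xpath_literal(value):
    """Quote `value` as an XPath 1.0 string literal."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote characters are present: split on the apostrophe and
    # glue the single-quoted pieces back together with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join("'" + p + "'" for p in parts) + ")"


def anchor_by_attribute(attr, value):
    """Build an XPath matching <a> elements whose attribute equals `value`."""
    return "//a[@" + attr + "=" + xpath_literal(value) + "]"
```

The result can be passed straight to the XPath lookup, e.g. anchor_by_attribute("title", project_title), where project_title stands for whatever title the current project uses.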
You can use cssSelector in both cases.
css for first a tag should look like
[title='title_text']
and 2nd a tag
[onclick='anotherjsfunc();']
I am new to scraping with Python and have encountered a weird issue.
I am attempting to scrape OCR'd newspaper articles from a list of URLs using Selenium -- the proxy settings on the data source make this easier than other options.
However, I receive tracebacks for the text data every time I run my code. Here is the code that I am using:
article_links = []
for link in driver.find_elements_by_xpath('/html/body/div[1]/main/section[1]/ul[2]/li[*]/div[2]/div[1]/h3/a'):
    links = link.get_attribute("href")
    article_links.append(links)
articles = []
for article in article_links:
    driver.switch_to.window(driver.window_handles[-1])
    driver.get(article)
    driver.find_element_by_css_selector("#js-doc-explorer-show-additional-views").click()
    time.sleep(1)
    for article_text in driver.find_elements_by_css_selector("#ocr-container > div.fulltext-ocr.js-page-ocr"):
        articles.append(article_text)
I come closest to solving the issue by using .click(), which opens a hidden panel for my data. However, upon using this code, the only data that fills is the last row in the dataset. Without the .click(), all rows come back with nothing. Changing the sleep settings also does not help.
The Xpath for the text data is:
/html/body/div[2]/main/section/div[2]/div[2]/section[2]/div/div[4]/text()
Alternatively, is there a way to get each link's source code and parse it with beautifulsoup after the fact?
UPDATE: There has to be something wrong with the loops -- I can get either the first or last values, but nothing in between.
In more recent versions of Selenium, the method find_elements_by_xpath() is deprecated. Is that the issue you are facing? If it is, add the import from selenium.webdriver.common.by import By and change the call to find_elements(By.XPATH, ...). Similarly, find_elements_by_css_selector() is replaced with find_elements(By.CSS_SELECTOR, ...).
You don't specify if this is even the issue, but if it is, I hope this helps :-)
The solution is found by calling the relevant (unique) class and specifying that it must contain text.
news = []
for article in article_links:
    driver2.get(article)
    driver2.find_element(By.CSS_SELECTOR, "#js-doc-explorer-show-additional-views").click()
    article_text = driver2.find_element(By.XPATH, '//div[@class="fulltext-ocr js-page-ocr"][contains(text()," ")]')
    news.append([article_text.text])
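On the side question of parsing each page's source after the fact: driver.page_source returns the current DOM as a string, which can be handed to any HTML parser. The sketch below uses the standard library's html.parser rather than BeautifulSoup, purely to stay dependency-free. It assumes the OCR text sits in a div carrying the fulltext-ocr class, as in the selectors above. OcrTextParser and extract_ocr_text are hypothetical names:

```python
from html.parser import HTMLParser


class OcrTextParser(HTMLParser):
    """Collect the text inside <div> elements carrying a given class."""

    def __init__(self, wanted_class="fulltext-ocr"):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # > 0 while inside a matching <div>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Track nested <div>s so the matching block closes at the right tag.
        if self.depth or self.wanted_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_ocr_text(html):
    """Return the whitespace-joined text of all matching <div> blocks."""
    parser = OcrTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Inside the loop this would be called as extract_ocr_text(driver2.page_source) once the hidden panel has been opened, which stores plain strings rather than WebElement references that go stale after the next driver2.get().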
This is the website I am dealing with: https://www.bseindia.com/corporates/ann.html?curpg=1&annflag=1&dt=20211021&dur=P&dtto=20211027&cat=Insider%20Trading%20/%20SAST&scrip=&anntype=A
Here I am able to send the code for the "security name", but in order to submit it I need to click the dropdown element that appears after the security name is entered. How do I achieve this with Selenium? I used the code below and it's not working (StaleElementReferenceException).
security_name = driver.find_element_by_id("scripsearchtxtbx")
security_name.send_keys('INE350H01032')
sec_click = driver.find_element_by_xpath('//*[@id="ulSearchQuote2"]/li')
sec_click.click()
This can also be accomplished using the Keys Library within Selenium. Selenium Keys not only sends input statements like strings, but it can also send commands such as escape, tab, or in this case enter. Your updated code should look as follows:
from selenium.webdriver.common.keys import Keys
security_name = driver.find_element_by_id("scripsearchtxtbx")
security_name.send_keys('INE350H01032')
security_name.send_keys(Keys.ENTER)
sec_click = driver.find_element_by_xpath('//*[@id="ulSearchQuote2"]/li')
sec_click.click()
These are called special keys. For more examples and more information on this, see this link
You can play with XPath:
security_name = driver.find_element_by_id("scripsearchtxtbx")
security_name.send_keys('INE350H01032')
sec_click = driver.find_element_by_xpath("//ul[@id='ulSearchQuote2']/li//strong[text()='INE350H01032']")
sec_click.click()
This shorter expression does the same: ("//ul[@id='ulSearchQuote2']//strong[text()='INE350H01032']")
You can read more about XPath here: xpath
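A StaleElementReferenceException means the element reference was obtained before the page re-rendered that part of the DOM, which is common with suggestion lists that are rebuilt as you type. One hedge is to look the element up again and retry. The sketch below is a generic retry helper (retry_on is a hypothetical name) that takes the exception type as a parameter, so it does not itself depend on Selenium:

```python
import time


def retry_on(exceptions, action, attempts=3, delay=0.5):
    """Call action() until it succeeds or `attempts` tries have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(delay)


# Possible use with Selenium (not executed here):
# from selenium.common.exceptions import StaleElementReferenceException
# from selenium.webdriver.common.by import By
# retry_on(
#     StaleElementReferenceException,
#     lambda: driver.find_element(
#         By.XPATH, "//ul[@id='ulSearchQuote2']//strong[text()='INE350H01032']"
#     ).click(),
# )
```

The important detail is that the lookup happens inside the lambda, so every retry fetches a fresh reference instead of reusing the stale one.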
I'm trying to create a program that clicks on boxes that contain a certain word, however all of the boxes have other words around them.
For example, the site has a bunch of recipes, but I just want the ones that contain the word "soup". So it needs to be able to click on text that says "tomato soup", "yummy soup", "some other soup type soup", and so on.
I've found this line.
WebDriverWait(driver, 10).until(expected_conditions.element_to_be_clickable((By.XPATH, "//span[text()='Soup']"))).click()
which is great but only works if you put the exact text in it.
Ex. WebDriverWait(driver, 10).until(expected_conditions.element_to_be_clickable((By.XPATH, "//span[text()='Tomato Soup']"))).click()
If anyone knows how to do a more loose find that would be a big help. Thank You.
If you are using XPath 2.0 you could use a regular expression to look for everything that contains Soup.
//*[matches(@id, '.*Soup.*')]
Maybe take a look at How to use regex in XPath "contains" function.
Update
browser = webdriver.Chrome()
browser.get("https://www.allrecipes.com/search/results/?wt=Soup&sort=re&page=10")
elems = browser.find_elements_by_xpath("//span[contains(text(), 'Soup')]")
for elem in elems[:2]:
    print(elem.text)
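Two caveats on the above: browsers evaluate XPath 1.0, so matches() is generally not available through Selenium, and contains() is case-sensitive, so 'Soup' will not match "tomato soup". The usual XPath 1.0 workaround is to lower-case the text with translate() before comparing. A sketch that builds such an expression (contains_text_xpath is a hypothetical helper, and it only folds ASCII letters):

```python
import string


def contains_text_xpath(tag, word):
    """Build an XPath 1.0 expression for a case-insensitive substring match."""
    if "'" in word:
        raise ValueError("word must not contain a single quote")
    upper, lower = string.ascii_uppercase, string.ascii_lowercase
    # translate() maps each upper-case ASCII letter to its lower-case twin,
    # so the comparison is made against lower-cased text.
    return (
        f"//{tag}[contains(translate(text(), '{upper}', '{lower}'), "
        f"'{word.lower()}')]"
    )
```

The returned string can be used wherever the "//span[contains(text(), 'Soup')]" expression was used, both with find_elements and inside element_to_be_clickable.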
I'm having trouble selecting a button in my Splinter script using the find_by_css method. The documentation is sparse at best, and I haven't found a lot of good articles out there with examples.
br.find_by_css('div#edit-field-download-files-und-0 a.button.launcher').first.click()
...where br is my browser instance.
I've tried a few different ways of writing it. I'm really not sure how I'm supposed to do it because the documentation doesn't give any hard examples of the syntax.
Here's a screenshot of the element.
Sorry the screenshot kind of sucks.
Does anyone have any experience with this?
The CSS selector looks all right, but I am not sure where you got find_by_css as a method.
How about this :-
br.find_element_by_css_selector("div#edit-field-download-files-und-0 a.button.launcher").click()
Selenium provides the following methods to locate elements in a page:
find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
To find multiple elements (these methods will return a list):
find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector
I'm working on something similar where I'm trying to click stuff on a webpage. The documentation for find_by_css() is very poor and you need to type the css path to the element you want to click.
Say we want to go to the about tab on python.org
from splinter import Browser
from time import sleep
with Browser() as browser:  # <-- Create browser instance (Firefox is the default driver)
    browser.visit('http://www.python.org')  # <-- Visit the URL string
    browser.find_by_css('#about > a').click()
    #                   ^-- Put the CSS path here, in quotes
    sleep(5)
If your connection is good you might not get the chance to see the about tab getting clicked but you should end up on the about page.
The hard part is figuring out the css path to an element. However once you have it, the find_by_css() method looks pretty easy
I like the W3Schools reference for CSS selection parameters: http://www.w3schools.com/cssref/css_selectors.asp
As for your code... I recommend breaking this down into a few steps, at least during debug. The call to br.find_by_css('css_string') returns a list of elements. So you can grab that list and check the count.
elems = br.find_by_css('div#edit-field-download-files-und-0 a.button.launcher')
if len(elems) == 1:
    elems.first.click()
If you don't check the length of the returned list and call '.first' on an empty list, you'll get an exception. If len > 1, you're probably getting things you don't expect.
Each id on a page is unique, and you can daisy-chain searches, so you can use a few different statements to make this happen:
id_elems = br.find_by_id('edit-field-download-files-und-0')
if id_elems:
    id_elem = id_elems.first
    a_elems = id_elem.find_by_tag("a")
    for e in a_elems:
        if e.has_class("button launcher"):
            print('Found it!')
            e.click()
This is, of course, just one of many ways to do this.
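One thing to watch in the daisy-chained version is the class test: it passes two class names in a single string, and whether that matches may depend on the order in which the classes appear in the markup. If it misbehaves, an order-independent check on the raw class attribute is easy to write. This sketch assumes you can read the attribute as a string (for example through something like e['class'], which is an assumption about the element API). has_all_classes is a hypothetical helper:

```python
def has_all_classes(class_attr, required):
    """True if every class name in `required` appears in `class_attr`."""
    present = set((class_attr or "").split())
    return set(required.split()) <= present
```

With that, the condition in the loop could read has_all_classes(e['class'], "button launcher").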
Lastly, Splinter is a wrapper around Selenium and other webdrivers. It's possible that, even after you find the element to click, the actual click won't do anything. If this happens, you can also try clicking on the wrapped Selenium object, available as e._element. So you could try e._element.click() if necessary.
I'm using Selenium and coding with Python.
I'm trying to do the following: for a flight search website, under Flight 1's 'Enter routing code' text box, type 'AA'
This is the code that I have at the moment:
flight1_routing = driver.find_element_by_xpath(".//*[@id='ita_form_location_RouteLanguageTextBox_0']")
flight1_routing.clear()
flight1_routing.send_keys("AA")
But instead, I get this error message: "invalid element state: Element is not currently interactable and may not be manipulated". How can this be with a regular text field that is also not an autocomplete field, AFAIK?
If you get "Element is not currently interactable", check that the element is not disabled and that it is visible. If you want to hack around it, execute JS to enable it.
I visited the homepage: the id ita_form_location_RouteLanguageTextBox_0 doesn't exist, and under flight one there is no "Enter routing code" field. I can only see a text box saying airport city or city name.
Also, if you have the id, prefer find_element_by_id; if not, try to use a CSS selector rather than XPath if you can. It's much cleaner.
Update
Here's a working script:
As noted above, the selected elements are not visible. What actually happens is that there are 5-6 different elements, all hidden, and when you click on show advanced route the page picks 2 of them at random and makes them visible.
So the id is not always the same. If you use the same id you will sometimes get a hidden element (because the ids are picked at random), and Selenium cannot interact with it. I made a selector that gets the 2 hidden elements.
from selenium import webdriver
import selenium.webdriver.support.ui as ui
driver = webdriver.Firefox()
driver.get("http://matrix.itasoftware.com/")
#click on the multi tab
tab = driver.find_element_by_id("ita_layout_TabContainer_0_tablist_ita_form_multislice_MultiSliceForm_0").click()
#click on the advanced routes
advanced_routing = ui.WebDriverWait(driver, 10).until(
    lambda driver: driver.find_element_by_id("sites_matrix_layout_RouteLanguageToggleLink_1")
)
advanced_routing.click()
# get all elements with an id like ita_form_location_RouteLanguageTextBox; similar to the regex ita_form_location_RouteLanguageTextBox.*
element = ui.WebDriverWait(driver, 10).until(
    lambda driver: driver.find_elements_by_css_selector("[id*=ita_form_multislice_MultiSliceRow] [id*=ita_form_location_RouteLanguageTextBox]")
)
element[0].send_keys("foo")
element[1].send_keys("bar")
import time
time.sleep(20)
Did you click into the correct tab first and enable advanced routing codes? For example:
#Go to right tab
driver.find_element_by_css_selector("div#ta_layout_TabContainer_0_tablist_ita_form_multislice_MultiSliceForm_0 > span").click()
#Enable routing
driver.find_element_by_css_selector("a.itaToggleLink").click()
# Note: I seem to get a different id from the one you're using; I assume the numbering is dynamic, so I handle all cases.
# If you know how the dynamic numbering works, you may be able to deduce a single id that will work for your test case.
# Instead I find all elements matching a pattern and then search through them, assuming only one will be visible.
flight1_routings = driver.find_elements_by_css_selector("input[id^='ita_form_location_RouteLanguageTextBox_']")
# It is probably better to find the element first and then use it separately, but I was feeling lazy, sorry.
for route in flight1_routings:
    if route.is_displayed():
        route.clear()
        route.send_keys("AA")
Also, you can probably skip the .clear() call, as it looks like the box starts with no text to overwrite.
Edit: Updated the routing toggle so it does not depend on knowing the id; the class name stays the same, so it should work. The input is found despite its variable id with the CSS selector, as suggested by foo bar, and the code then iterates over that list and checks whether each element is displayed.
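If you would rather separate finding the visible input from using it, the loop can be folded into a small helper. This is a sketch only: first_displayed is a hypothetical name, and it relies on nothing beyond the is_displayed() method that WebElement provides.

```python
def first_displayed(elements):
    """Return the first element whose is_displayed() is true, else None."""
    for element in elements:
        if element.is_displayed():
            return element
    return None
```

Usage would be route = first_displayed(flight1_routings), followed by route.send_keys("AA") once you have checked that route is not None.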