Select column value based on multiple conditions XML - python

I am trying to find and click on the second sibling of an element that is identified by two conditions using By.XPATH in the following table:
Type == "Renewal" and Seq# == 1, but I cannot seem to make the two conditions work:
driver.find_element(By.XPATH, f"//td[@text()='Renewal' and text()='1']/followingsibling::td[2]/a").click()

To click on the element associated with Type == "Renewal" and Seq# == 1 you can use either of the following locator strategies:
Using xpath and following:
driver.find_element(By.XPATH, "//td[text()='Renewal']//following::td[contains(., '1')]//following::td[1]").click()
Using xpath and following-sibling:
driver.find_element(By.XPATH, "//td[text()='Renewal']//following-sibling::td[contains(., '1')]//following-sibling::td[1]").click()
Note : You have to add the following imports :
from selenium.webdriver.common.by import By
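The two-condition idea can be tried offline with the standard library before touching the browser. This is only a sketch: the table markup, the column order (Type first, Seq# second) and the find_link helper are assumptions, since the real table is not shown in the question. The point is that both conditions are checked against cells of the same row, and the link is then taken from that row. In Selenium the same idea would be an XPath that puts both conditions on the row, for example //tr[td[normalize-space()='Renewal'] and td[normalize-space()='1']]//a, though the exact path depends on the real table.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: Type, Seq#, a filler column, and a link column.
TABLE = """
<table>
  <tr><td>Renewal</td><td>2</td><td>x</td><td><a href="/doc/10">Open</a></td></tr>
  <tr><td>Renewal</td><td>1</td><td>x</td><td><a href="/doc/11">Open</a></td></tr>
  <tr><td>New</td><td>1</td><td>x</td><td><a href="/doc/12">Open</a></td></tr>
</table>
"""


def find_link(table_xml, type_value, seq_value):
    """Return the href of the link in the row matching both conditions."""
    root = ET.fromstring(table_xml)
    for row in root.iter("tr"):
        cells = [(td.text or "").strip() for td in row.findall("td")]
        # Both conditions must hold for cells of the same row.
        if len(cells) >= 2 and cells[0] == type_value and cells[1] == seq_value:
            link = row.find(".//a")
            return link.get("href") if link is not None else None
    return None


href = find_link(TABLE, "Renewal", "1")
print(href)
```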

Related

Keep only an element of a webpage while web-scraping

I am trying to extract a table from a webpage with Python. I managed to get all the contents inside that table, but since I am very new to web scraping I don't know how to keep only the elements that I am looking for.
I know that I should look for this class in the code: <a class="_3BFvyrImF3et_ZF21Xd8SC", which marks the items in the table.
So how can I keep only the elements with that class and then extract their titles?
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Python" href="/r/Python/">r/Python</a>
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Java" href="/r/Java/">r/Java</a>
I failed miserably at writing code for that. I don't know how to extract only the elements with this class, so any input will be highly appreciated.
To extract the value of title attributes you can use list comprehension and you can use either of the following locator strategies:
Using CSS_SELECTOR:
print([my_elem.get_attribute("title") for my_elem in driver.find_elements(By.CSS_SELECTOR, "a._3BFvyrImF3et_ZF21Xd8SC[title]")])
Using XPATH:
print([my_elem.get_attribute("title") for my_elem in driver.find_elements(By.XPATH, "//a[@class='_3BFvyrImF3et_ZF21Xd8SC' and @title]")])
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
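If the page source has already been saved as a string, the same filtering can be sketched without a browser using only the standard library. This is an illustration under assumptions: the TitleCollector class is a hypothetical helper, and the sample HTML is just the two anchors from the question plus one unrelated link added to show the filter working.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the title attribute of <a> tags that carry a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attributes.get("class") or "").split()
        if self.class_name in classes and attributes.get("title"):
            self.titles.append(attributes["title"])


html = """
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Python" href="/r/Python/">r/Python</a>
<a class="other" title="ignored" href="/x/">x</a>
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Java" href="/r/Java/">r/Java</a>
"""

collector = TitleCollector("_3BFvyrImF3et_ZF21Xd8SC")
collector.feed(html)
titles = collector.titles
print(titles)
```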
Okay, I have made a very simple thing that worked.
Basically, I pasted the code into VS Code and then selected all the occurrences of that class. Then I just had to copy and paste them into another file. I'm not sure why the shortcut CTRL + Shift + L did not work, but I have managed to get what I needed.
Select all occurrences of selected word in VSCode

How to locate next element after text found in b tag - Selenium Python

I'm trying to extract the text value following a b tag that contains specific text. I'm using Selenium web driver with Python3.
The HTML inspected for the value I'm trying to return (11,847) is here:
This has an Xpath below (I'm not using this xpath directly to find the element as the table construction changes for different examples that I plan to iterate through):
/html/body/form[1]/div[2]/table[2]/tbody/tr[3]/td[2]/text()
As an example, when I print the below it returns Att: i.e. the element located by my search for the text 'Att' within the b tags.
att=driver.find_element("xpath",".//b[contains(text(), 'Att')]").text
print(att)
Is there a way I can return the value following <b>Att:</b> by searching for 'Att:'? Conversely, I'd also like to return the value following <b>Ref:</b>.
Thanks in advance.
The 11,847 text content belongs to the td node.
You can locate this td element by its child b element's text content.
Then you will be able to retrieve the entire text content of that td node.
It will contain Att:, some extra spaces, and the desired 11,847 string.
Now you will need to remove the Att: and the extra spaces so that only 11,847 remains.
As follows:
#get the entire text content
entire_text = driver.find_element(By.XPATH,"//td[.//b[contains(text(), 'Att')]]").text
#get the child node text content
child_text = driver.find_element(By.XPATH,"//b[contains(text(), 'Att')]").text
#remove child text content from entire text content
goal_text = entire_text.replace(child_text,'')
#trim white spaces
goal_text = goal_text.strip()
You can use find_element() to locate the <b> element that contains the text 'Att:' and then read the text node that follows it. Selenium locators can only return elements, not text nodes, so the following text is read through execute_script(). Here is an example of how you can do this:
att_element = driver.find_element(By.XPATH, "//b[contains(text(), 'Att:')]")
att_value = driver.execute_script("return arguments[0].nextSibling.textContent;", att_element).strip()
print(att_value)
This locates the element that contains the text 'Att:', reads the node that immediately follows it, and returns that node's text.
Similarly, you can use the same XPath for 'Ref:' by changing the text part to 'Ref:':
ref_element = driver.find_element(By.XPATH, "//b[contains(text(), 'Ref:')]")
ref_value = driver.execute_script("return arguments[0].nextSibling.textContent;", ref_element).strip()
print(ref_value)
Note that this will only work if the value you're trying to extract is in a text node immediately following the element that contains 'Att:' or 'Ref:'.
The following xpath would result in an error:
/html/body/form[1]/div[2]/table[2]/tbody/tr[3]/td[2]/text()
as Selenium locators return only WebElements, not text nodes.
Solution
The text 11,847 is within a text node which is the second descendant of the <td> node. So to print the text you have to induce WebDriverWait for visibility_of_element_located() and you can use either of the following locator strategies:
Using XPATH and childNodes[n]:
print(driver.execute_script('return arguments[0].childNodes[2].textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='initial']//td[@align='right']")))).strip())
Using XPATH and splitlines():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='initial']//td[@align='right']"))).get_attribute("innerHTML").splitlines()[2])
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
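The "text node after the b tag" idea from this thread can be checked offline with the standard library. In xml.etree.ElementTree the text that follows an element's closing tag is exposed as that element's tail, which plays the same role as the trailing text node here. This is a sketch under assumptions: the row markup is a guess (the inspected HTML is not shown in the question) and text_after_label is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed markup: a label in <b> followed directly by the value as plain text.
ROW = """
<tr>
  <td>
    <b>Ref:</b> J. Smith
  </td>
  <td>
    <b>Att:</b> 11,847
  </td>
</tr>
"""


def text_after_label(row_xml, label):
    """Return the text that directly follows the <b> containing `label`."""
    root = ET.fromstring(row_xml)
    for bold in root.iter("b"):
        if label in (bold.text or ""):
            # .tail is the text between </b> and the next tag.
            return (bold.tail or "").strip()
    return None


att_value = text_after_label(ROW, "Att")
ref_value = text_after_label(ROW, "Ref")
print(att_value, ref_value)
```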

Any built-in way for branching waits with OR conditions?

After I click a button on a webpage, one of two things can happen. Normally, I would use a wait until when there's a single event outcome, but is there any built in methodology where I can wait until 1 of two things happens i.e. one of two elements exists?
To wait until either of two elements appears, you can induce WebDriverWait with an OR condition using any of the following approaches:
Using CssSelector you can pass the expressions separated by a comma as follows:
element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".classA, .classB")))
Using CssSelector and lambda you can pass the expressions through an OR condition. find_elements() is used here because it returns an empty list when nothing matches, whereas find_element() raises NoSuchElementException before the second expression is evaluated:
elements = WebDriverWait(driver, 20).until(lambda driver: driver.find_elements(By.CSS_SELECTOR, "tagname.classname") or driver.find_elements(By.CSS_SELECTOR, "tagname#elementID"))
Using XPath you can pass the expressions through an OR condition as follows:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tag_name[@class='elementA' or @id='elementB']")))
Using XPath and lambda you can pass the expressions through an OR condition as follows (the result is again a list):
elements = WebDriverWait(driver, 20).until(lambda driver: driver.find_elements(By.XPATH, "xpathA") or driver.find_elements(By.XPATH, "xpathB"))
Reference
You can find a couple of relevant discussions in:
WebDriverWait for multiple conditions (OR logical evaluation)
selenium two xpath tests in one
Python / Selenium: Logic Operators in WebDriverWait Expected Conditions
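The polling logic behind an OR wait can be sketched in plain Python, which also shows why a lookup that raises must be handled per condition instead of being chained with `or`. Everything here is illustrative: wait_for_any, missing and present are hypothetical names, and LookupError stands in for Selenium's NoSuchElementException. Selenium 4 ships a comparable built-in, EC.any_of(), which takes several expected conditions and succeeds as soon as one of them does.

```python
import time


def wait_for_any(conditions, timeout=2.0, poll=0.05):
    """Poll until one condition returns a truthy value; return (index, value)."""
    deadline = time.monotonic() + timeout
    while True:
        for index, condition in enumerate(conditions):
            try:
                value = condition()
            except LookupError:
                # Stand-in for NoSuchElementException: treat as "not there yet"
                # and move on to the next condition instead of aborting.
                continue
            if value:
                return index, value
        if time.monotonic() >= deadline:
            raise TimeoutError("none of the conditions became true")
        time.sleep(poll)


def missing():
    # Simulates the first outcome never appearing.
    raise LookupError("element A not found")


def present():
    # Simulates the second outcome being on the page.
    return "element B"


result = wait_for_any([missing, present])
print(result)
```

The returned index tells you which of the two outcomes happened, so the code after the wait can branch on it.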
You can do it. My example is in Java and I don't know the Python equivalent, but it should at least give you the idea. Say you have two conditions to check, for example that input 1 is present and that input 2 is present.
I would create two boolean variables and assign each the result of driver.findElements(By....).size() > 0.
Then I would add an if statement, so that when the elements do not both show up I can fail or do whatever else I want.
Code example:
boolean AccesViaLogin = driver.findElements(By.id("username_login")).size() > 0;
boolean AccesViaHomepage = driver.findElements(By.xpath("//button[contains(text(),\"Connexion\")]")).size() > 0;
if (AccesViaLogin && AccesViaHomepage) {
    // both elements are present; continue here
}

How to find the class path for the number of recovered people from covid using Selenium and Python

I need to get the text (the number of people recovered from COVID) from this webpage into the console, but I can't find the class for the numbers. Can someone help me locate the class so I can print the numbers to the console? I need to use PhantomJS because I don't want the browser window to open when I run the code.
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('https://www.tvnet.lv/covid19Live')
text = driver.find_element_by_class_name("covid-summary__count covid-c-recovered")
print(text)
find_element_by_class_name() expects a single class as an argument but you are providing two class names (class is a "multi-valued attribute", multiple values are separated by a space).
Either check for a single class:
driver.find_element_by_class_name("covid-c-recovered")
Or, switch to a CSS selector:
driver.find_element_by_css_selector(".covid-summary__count.covid-c-recovered")
Digging Deeper
Let's look at the source code. When elements are searched by class name, Python selenium actually constructs a CSS selector under the hood:
elif by == By.CLASS_NAME:
by = By.CSS_SELECTOR
value = ".%s" % value
This means that when you've used covid-summary__count covid-c-recovered as a class name value, the actual CSS selector that was used to find an element happened to be:
.covid-summary__count covid-c-recovered
which understandably did not match any elements (covid-c-recovered would be considered as a tag name here).
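The conversion can be reproduced without a browser to see the difference between the two selectors. This is a small sketch: class_name_to_css mirrors the ".%s" % value line quoted above, while classes_to_css is a hypothetical helper (not part of Selenium) that builds a compound selector from a space-separated class attribute.

```python
def class_name_to_css(value):
    # Mirrors the quoted conversion: a single dot is prepended to the value.
    return ".%s" % value


def classes_to_css(class_attribute):
    # Hypothetical helper: one dot per class name and no spaces in between,
    # which yields a compound selector matching elements with all the classes.
    return "".join("." + name for name in class_attribute.split())


class_attribute = "covid-summary__count covid-c-recovered"
naive = class_name_to_css(class_attribute)
compound = classes_to_css(class_attribute)
print(naive)
print(compound)
```

The second string is the selector to pass to a CSS-based lookup when an element has to match both classes.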
If you want the number make sure you have dots between class names.
driver.get('https://www.tvnet.lv/covid19Live')
element = driver.find_element_by_class_name("covid-summary__count.covid-c-recovered")
print(element.text)
Outputs
19 072
Per the documentation of the selenium.webdriver.common.by implementation:
class selenium.webdriver.common.by.By
Set of supported locator strategies.
CLASS_NAME = 'class name'
So using find_element_by_class_name() you won't be able to pass multiple class names as it accepts a single class.
Solution
To print the number of people HEALED you can use either of the following Locator Strategies:
LATVIJĀ:
print(driver.find_element_by_xpath("//h1[contains(., 'COVID-19 LATVIJĀ')]//following::ul[1]//p[@class='covid-summary__count covid-c-recovered']").text)
PASAULĒ:
print(driver.find_element_by_xpath("//h1[contains(., 'COVID-19 PASAULĒ')]//following::ul[1]//p[@class='covid-summary__count covid-c-recovered']").text)
Ideally you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
LATVIJĀ:
driver.get("https://www.tvnet.lv/covid19Live")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[contains(., 'COVID-19 LATVIJĀ')]//following::ul[1]//p[@class='covid-summary__count covid-c-recovered']"))).text)
PASAULĒ:
driver.get("https://www.tvnet.lv/covid19Live")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[contains(., 'COVID-19 PASAULĒ')]//following::ul[1]//p[@class='covid-summary__count covid-c-recovered']"))).text)
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
19 072
52 546 925
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
References
You can find a couple of relevant detailed discussions in:
Invalid selector: Compound class names not permitted error using Selenium
How to locate an element with multiple class names?

Order of found elements in Selenium

I'm using selenium with python to interact with a webpage.
There is a table in the webpage. I'm trying to access to its rows with this code:
rows = driver.find_elements_by_class_name("data-row")
It works as expected. It returns all elements of the table.
The question is, is the order of the returned elements guaranteed to be the same as they appear on the page?
For example, Will the first row that I see in the table in browser ALWAYS be the 0th index in the array?
You shouldn't be depending on the fact whether Selenium returns the elements in the same order as they appear on the webpage or DOM Tree.
Each WebElement within the HTML DOM can be identified uniquely using either of the Locator Strategies.
Though you were able to pull out all the desired elements using find_elements_by_class_name() as follows:
rows = driver.find_elements_by_class_name("data-row")
Ideally, you need to induce WebDriverWait for the visibility_of_all_elements_located() and you can use either of the following Locator Strategies:
Using CLASS_NAME:
element = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "data-row")))
Using CSS_SELECTOR:
element = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".data-row")))
Using XPATH:
element = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@class='data-row']")))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a detailed discussion in WebDriverWait not working as expected
