How can I know whether there is text in an element or not? I used this code:
pricelist = driver.find_elements_by_xpath(".//*[@id='scroll']/div[2]/div/div[2]/div")
if EC.text_to_be_present_in_element(By.XPATH(".//*[@id='scroll']/div[2]/div/div[2]/div[1]")):
    price = pricelist[1].text
elif EC.text_to_be_present_in_element(By.XPATH(".//*[@id='scroll']/div[2]/div/div[2]/div[2]")):
    price = pricelist[2].text
else:
    price = pricelist[3].text
Problem:
TypeError: 'str' object is not callable
You can use XPath to check the length of the text in the element. See the code below.
price = driver.find_element_by_xpath(".//*[@id='scroll']/div[2]/div/div[2]/div[string-length(text()) > 1]").text
You can change the text length in the XPath, e.g. /div[string-length(text()) > 10], to find an element which contains more than 10 characters.
By.XPATH is not a method but a plain string. The correct usage is
EC.text_to_be_present_in_element((By.XPATH, xpath_expression))
If you want to match an element that contains text, use the [text()] predicate:
".//*[@id='scroll']/div[2]/div/div[2]/div[text()]"
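To make the corrected call concrete, here is a minimal sketch (locator taken from the question; the expected text is a placeholder, since text_to_be_present_in_element also needs the text to wait for). The live-driver part is shown in comments because it assumes an open browser session:

```python
# By.XPATH is the plain string "xpath", so a locator is a (strategy, expression)
# tuple rather than a call like By.XPATH(...). Shown here without a live driver:
xpath_expression = ".//*[@id='scroll']/div[2]/div/div[2]/div[1]"
locator = ("xpath", xpath_expression)  # same value as (By.XPATH, xpath_expression)

# In a real session the condition is handed to WebDriverWait, e.g.:
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.support.ui import WebDriverWait
#   from selenium.webdriver.support import expected_conditions as EC
#   WebDriverWait(driver, 10).until(
#       EC.text_to_be_present_in_element(locator, "expected text"))
```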
I have used another method:
pricelist = driver.find_elements_by_xpath(".//*[@id='scroll']/div[%d]/div/div[2]/div" % num)
for prc in pricelist:
    if 0 < len(prc.text) < 10:
        price = prc.text
This method is rather slow, but the most readable.
I'm trying to scrape a website and get every meal_box meal_container row in a list by driver.find_elements but for some reason I couldn't do it. I tried By.CLASS_NAME, because it seemed the logical one but the length of my list was 0. Then I tried By.XPATH, and the length was then 1 (I understand why). I think I can use XPATH to get them one by one, but I don't want to do it if I can handle it in a for loop.
I don't know why find_elements(By.CLASS_NAME, 'print_name') works but find_elements(By.CLASS_NAME, 'meal_box meal_container row') does not.
I'm new at both web scraping and stackoverflow, so if any other details are needed I can add them.
Here is my code:
meals = driver.find_elements(By.CLASS_NAME, "meal_box meal_container row")
print(len(meals))
for index, meal in enumerate(meals):
    foods = meal.find_elements(By.CLASS_NAME, 'print_name')
    print(len(foods))
    if index == 0:
        mealName = "Breakfast"
    elif index == 1:
        mealName = "Lunch"
    elif index == 2:
        mealName = "Dinner"
    else:
        mealName = "Snack"
    for index, title in enumerate(foods):
        recipe = {}
        print(title.text)
        print(mealName + "\n")
        recipe["name"] = title.text
        recipe["meal"] = mealName
Here is the screenshot of the HTML:
It seems OK, but in the class name put a dot between the classes, like "meal_box.meal_container.row". Try this:
meals = driver.find_elements(By.CLASS_NAME,"meal_box.meal_container.row")
Try to use driver.find_element_by_css_selector
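To illustrate: a compound class attribute like "meal_box meal_container row" is really three separate classes, and a CSS selector chains them with dots. A small sketch (the actual find_elements call is commented out since it assumes a live `driver`):

```python
# Build the CSS selector from the space-separated class attribute:
class_attr = "meal_box meal_container row"
selector = "." + ".".join(class_attr.split())  # ".meal_box.meal_container.row"

# With a live WebDriver this would be:
#   from selenium.webdriver.common.by import By
#   meals = driver.find_elements(By.CSS_SELECTOR, selector)
```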
It could be because "meal_box meal_container row" is inside another element, so you should try finding the highest-level element first and then look for the needed one inside it:
root = driver.find_element(By.CLASS_NAME, "row")
meals = root.find_elements(By.CLASS_NAME, "meal_box")  # By.CLASS_NAME takes a single class
I'm trying to work out the number of for-loop iterations to run depending on the number of list items (totalListNum).
And it seems that it is returning NoneType when in fact it should be returning either text or an int.
website:https://stamprally.org/?search_keywords=&search_keywords_operator=and&search_cat1=68&search_cat2=0
Code Below
for prefectureValue in prefectureValueStorage:
    driver.get(
        f"https://stamprally.org/?search_keywords&search_keywords_operator=and&search_cat1={prefectureValue}&search_cat2=0")
    # Calculate how many times to run the page loop
    totalListNum = driver.find_element_by_css_selector(
        'div.page_navi2.clearfix>p').get_attribute('text')
    totalListNum.text.split("件中")
    if totalListNum[0] % 10 != 0:
        pageLoopCount = math.ceil(totalListNum[0])
    else:
        continue
    currentpage = 0
    while currentpage < pageLoopCount:
        currentpage += 1
        print(currentpage)
I don't think you should use get_attribute here. Instead, try this:
totalListNum = driver.find_element_by_css_selector('div.page_navi2.clearfix>p').text
First, your locator is not unique.
Use this:
div.page_navi2.clearfix:nth-of-type(1)>p
or for the second element:
div.page_navi2.clearfix:nth-of-type(2)>p
Second, as already mentioned, use .text to get the text.
If .text does not work you can use .get_attribute('innerHTML')
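Putting the two answers together, the page-count arithmetic can be separated from Selenium entirely. A sketch, assuming the navigation text looks like "135件中 1~10件を表示" (the exact wording on stamprally.org is an assumption here) and 10 results per page:

```python
import math

def page_loop_count(nav_text, per_page=10):
    # the number before "件中" is the total result count
    total = int(nav_text.split("件中")[0])
    return math.ceil(total / per_page)

# nav_text would come from a live session, e.g.:
#   nav_text = driver.find_element_by_css_selector('div.page_navi2.clearfix > p').text
```

For example, `page_loop_count("135件中 1~10件を表示")` returns 14.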
Hello guys, I'm currently trying to build a phone number generator with two methods. The first one is phone_number and should be called every time new numbers need to be generated. You can get the numbers from next_phone_number; this property should also call phone_number if there are no numbers available. However, I always get an error message when I call it. Here is my code:
def phone_number(self):
    request = requests.get('https://www.bestrandoms.com/random-at-phone-number')
    soup = BeautifulSoup(request.content, 'html.parser')
    self.phone_numbers = soup.select_one('#main > div > div.col-xs-12.col-sm-9.main > ul > textarea').text
    self.phone_numbers = self.phone_numbers.splitlines()
    for phone in range(len(self.phone_numbers)):
        self.phone_numbers[phone] = '+43' + self.phone_numbers[phone][1:].replace(' ', '')
    self.phone_numbers.extend(self.phone_numbers)
    return self.phone_numbers

@property
def next_phone_number(self):
    self.phone_index += 1
    if self.phone_index >= len(self.phone_numbers):
        self.phone_number()
    return self.phone_numbers[self.phone_index]
Error message: (screenshot in the original post)
The issue is that the phone_number() function is not extending the self.phone_numbers list enough to cover the fact that the self.phone_index is out of range. Perhaps consider doing this to extend until it is large enough:
@property
def next_phone_number(self):
    self.phone_index += 1
    while self.phone_index >= len(self.phone_numbers):
        self.phone_number()
    return self.phone_numbers[self.phone_index]
When you do self.phone_numbers = soup.select_one(...), you're clobbering the old list of phone numbers you had in self.phone_numbers and replacing it with the data you've parsed from your web page. You later refine this into a new list, but it's still not doing what you want in terms of adding new numbers (I've not tried the code, I'd guess you always get the same amount of them from the web).
You should use a different variable name for the new data. That way you can extend the existing list with the new data without overwriting anything you already have:
def phone_number(self):
    request = requests.get('https://www.bestrandoms.com/random-at-phone-number')
    soup = BeautifulSoup(request.content, 'html.parser')
    # these lines all use new local variables instead of clobbering self.phone_numbers
    new_data = soup.select_one('#main > div > div.col-xs-12.col-sm-9.main > ul > textarea').text
    new_numbers = new_data.splitlines()
    new_numbers_reformatted = ['+43' + number[1:].replace(' ', '') for number in new_numbers]
    # so now we can extend the list as desired
    self.phone_numbers.extend(new_numbers_reformatted)
    return self.phone_numbers
It's possible that you'll also need to change the initialization code in your class to make sure it initializes self.phone_numbers to an empty list, if it's not doing that already. The bad behavior of your method might have been covering up a bug if the list was not being created anywhere else.
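A minimal self-contained sketch of that initialization (the class name and the fetch body here are stand-ins; a real phone_number would scrape bestrandoms.com as in the question):

```python
class PhoneNumbers:
    def __init__(self):
        self.phone_numbers = []  # must exist before next_phone_number first runs
        self.phone_index = -1    # so the first access lands on index 0

    def phone_number(self):
        # stand-in for the network fetch; reformats raw numbers like the question does
        raw = ['0660 1234567', '0661 7654321']
        self.phone_numbers.extend('+43' + n[1:].replace(' ', '') for n in raw)
        return self.phone_numbers

    @property
    def next_phone_number(self):
        self.phone_index += 1
        while self.phone_index >= len(self.phone_numbers):
            self.phone_number()
        return self.phone_numbers[self.phone_index]
```

With this, `PhoneNumbers().next_phone_number` yields '+436601234567' on the first access and keeps extending the list as the index runs past the end.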
I have been working on a project where I need to assert text of web elements in Selenium Python. Below is the example of what I meant:
What I want to assert is the value 6.00 at the far right of the image and the $6.00 at the bottom left of the image.
Example.py
class Example:
    def checkValue(self):
        element_1 = self.driver.find_element_by_xpath("//div[contains(text(),'6.00')]")
        element_2 = self.driver.find_element_by_xpath("//label[contains(text(),'Total Amount: $6.00')]")
Test.py
class Test:
    def test_1(self):
        np = Example()
        np.checkValue()
I have included the blueprint for the two elements. May I know how to include an assert statement in this Example.py program?
You should either check whether both elements can be found with the given price, or check whether the given elements have the expected value.
Option 1
Find elements by price and let selenium raise the exception:
def check_value(self, expected_price):
    elem1_xp = '//tr[@whatever]//td[contains(text(),"%s")]' % expected_price  # xpath of the subtotal price element, adjust it
    elem2_xp = '//div[contains(text(), "Total amount: $%s")]' % expected_price  # xpath of the total element, adjust it
    element_1 = self.driver.find_element_by_xpath(elem1_xp)
    element_2 = self.driver.find_element_by_xpath(elem2_xp)
With this solution, when element_1 or element_2 can't be found, an Exception will be raised and you will see which element couldn't be found. You don't need an assert here. If both elements are found, the check is passed.
If you are searching for 6.00 and it is present twice on the screen, you need to be careful to avoid finding the same element twice (or the element and its parent).
Option 2
Check if the value of the two elements match. The first and most important thing here is to have a precise xpath for both elements.
def check_value(self, expected_price):
    elem1_xp = '//tr[@whatever]//td[4]'  # precise xpath of the subtotal price element
    elem2_xp = '//div[contains(text(), "Total amount: $")]'  # total price element
    element_1_value = self.driver.find_element_by_xpath(elem1_xp).text
    element_2_value = self.driver.find_element_by_xpath(elem2_xp).text.replace("Total amount: $", "")
    assert element_1_value == element_2_value == expected_price, "Price mismatch! Expected price is %s, subtotal is %s, total price is %s" % (expected_price, element_1_value, element_2_value)
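The comparison itself can also be pulled out into a plain helper so it is unit-testable without a browser; the two texts would come from the find_element(...).text calls above, and the "Total amount: $" label prefix is the same assumption as in the xpaths:

```python
def assert_prices_match(subtotal_text, total_text, expected_price):
    # strip the label prefix from the total element's text
    total_value = total_text.replace("Total amount: $", "")
    assert subtotal_text == total_value == expected_price, (
        "Price mismatch! Expected price is %s, subtotal is %s, total price is %s"
        % (expected_price, subtotal_text, total_value))
```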
Relatively new to programming, doing Python with Selenium and XPath.
This is from this website: https://www.marketbeat.com/stocks/NASDAQ/AMZN/earnings/
What I am trying to do is copy the dates of the past earnings.
The problem is that the date is in a table, and its tag also carries an attribute.
I have this full Xpath:
/html/body/div[1]/main/article/form/div[8]/div[2]/div[2]/div/div[5]/div/table/tbody/tr[2]/td[1]
and upon inspection it shows this
<td data-sort-value="20201027000000">10/27/2020</td>
I am trying to get the 10/27/2020 out from here but I don't know how to.
It is easy when it is just <td>$1.53</td>, where giving the full XPath and then doing a .text on it gives me the text.
The point is: how do I get 10/27/2020? I suspect that I have to get past the data-sort-value="" part.
Here is what I have:
stringeEPSxpath = """/html/body/div[1]/main/article/form/div[8]/div[2]/div[2]/div/div[5]/div/table/tbody/tr[""" + row + """]/td[1]"""
Date = stringeEPSxpath.text
(the row part is just me iterating the website page through a loop no problem there.)
For the rest of my code it has been pretty simple:
stringeEPSxpath = """/html/body/div[1]/main/article/form/div[8]/div[2]/div[2]/div/div[5]/div/table/tbody/tr[""" + row + """]/td[3]"""
stringEPSxpath = """/html/body/div[1]/main/article/form/div[8]/div[2]/div[2]/div/div[5]/div/table/tbody/tr[""" + row + """]/td[4]"""
elem = driver.find_element_by_xpath(stringeEPSxpath)
elem2 = driver.find_element_by_xpath(stringEPSxpath)
fix1 = (elem.text).replace(')','')
fix2 = (elem2.text).replace(')','')
eps1 = fix1.replace('(','-')
eps2 = fix2.replace('(','-')
As you can see above, all I had to do was set a variable to it and then use .text to get the string. But now that does not work for the dates.
The end result is the error AttributeError: 'str' object has no attribute 'text', which I suspect is because of the data-sort-value="20201027000000" part.
In this code, stringeEPSxpath is a string, not a WebElement.
stringeEPSxpath = """/html/body/div[1]/main/article/form/div[8]/div[2]/div[2]/div/div[5]/div/table/tbody/tr[""" + row + """]/td[1]"""
Date = stringeEPSxpath.text
You can try this one:
from selenium.webdriver.common.by import By
element = driver.find_element(By.XPATH,"//td[#data-sort-value='20201027000000']")
Date = element.text
I forgot driver.find_element_by_xpath(stringeEPSxpath), which is what actually resolves the xpath. Since I copied and pasted from my other program, I guess I deleted too much and completely forgot to add the driver.find_element call back in.
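For reference, a sketch of the corrected flow (xpath copied from the question; the `driver` part is commented out since it assumes a live session, and the helper name is just for illustration):

```python
def earnings_date_xpath(row):
    # same string concatenation as in the question, parameterized by table row
    return ("/html/body/div[1]/main/article/form/div[8]/div[2]/div[2]"
            "/div/div[5]/div/table/tbody/tr[%d]/td[1]" % row)

# The xpath string itself has no .text attribute; it must first be resolved
# to a WebElement:
#   elem = driver.find_element_by_xpath(earnings_date_xpath(2))
#   date = elem.text  # e.g. "10/27/2020"
```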