I'm trying to scrape a website and collect every meal_box meal_container row element in a list with driver.find_elements, but for some reason I can't. I tried By.CLASS_NAME, since that seemed the logical choice, but the length of my list was 0. Then I tried By.XPATH, and the length was 1 (I understand why). I could use XPath to fetch the elements one by one, but I'd rather not if I can handle it in a for loop.
I don't understand why find_elements(By.CLASS_NAME, 'print_name') works but find_elements(By.CLASS_NAME, "meal_box meal_container row") does not.
I'm new to both web scraping and Stack Overflow, so if any other details are needed I can add them.
Here is my code:
meals = driver.find_elements(By.CLASS_NAME, "meal_box meal_container row")
print(len(meals))
for index, meal in enumerate(meals):
    foods = meal.find_elements(By.CLASS_NAME, 'print_name')
    print(len(foods))
    if index == 0:
        mealName = "Breakfast"
    elif index == 1:
        mealName = "Lunch"
    elif index == 2:
        mealName = "Dinner"
    else:
        mealName = "Snack"
    for index, title in enumerate(foods):
        recipe = {}
        print(title.text)
        print(mealName + "\n")
        recipe["name"] = title.text
        recipe["meal"] = mealName
Here is the screenshot of the HTML:
Your code looks OK, but in the class name you should put a dot between the classes instead of spaces, like "meal_box.meal_container.row". Try this:
meals = driver.find_elements(By.CLASS_NAME, "meal_box.meal_container.row")
Try using driver.find_element_by_css_selector instead.
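For example, a minimal sketch using the equivalent By.CSS_SELECTOR form (assuming driver is the WebDriver from the question; note that the find_element_by_* helpers are deprecated in Selenium 4):

from selenium.webdriver.common.by import By

# a single CSS selector that requires all three classes on the same element
meals = driver.find_elements(By.CSS_SELECTOR, ".meal_box.meal_container.row")
print(len(meals))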
It could be because the "meal_box meal_container row" element is nested inside another element, so you could try finding the outermost element first and then looking for the one you need inside it:
root = driver.find_element(By.CLASS_NAME, "row")
meals = root.find_elements(By.CLASS_NAME, "meal_box meal_container row")
I'm trying to work out how many times to run a for loop based on the number of list items (totalListNum), and it seems to be returning NoneType when in fact it should return either text or an int.
Website: https://stamprally.org/?search_keywords=&search_keywords_operator=and&search_cat1=68&search_cat2=0
Code below:
for prefectureValue in prefectureValueStorage:
    driver.get(
        f"https://stamprally.org/?search_keywords&search_keywords_operator=and&search_cat1={prefectureValue}&search_cat2=0")
    # Calculate How Many Times To Run Page Loop
    totalListNum = driver.find_element_by_css_selector(
        'div.page_navi2.clearfix>p').get_attribute('text')
    totalListNum.text.split("件中")
    if totalListNum[0] % 10 != 0:
        pageLoopCount = math.ceil(totalListNum[0])
    else:
        continue
    currentpage = 0
    while currentpage < pageLoopCount:
        currentpage += 1
        print(currentpage)
I don't think you should use get_attribute here. Try this instead:
totalListNum = driver.find_element_by_css_selector('div.page_navi2.clearfix>p').text
First, your locator is not unique.
Use this:
div.page_navi2.clearfix:nth-of-type(1)>p
or for the second element:
div.page_navi2.clearfix:nth-of-type(2)>p
Second, as already mentioned, use .text to get the text.
If .text does not work, you can use .get_attribute('innerHTML').
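Putting both fixes together, here is a minimal sketch of the count extraction (assuming the paragraph's text begins with the total followed by "件中", and 10 results per page, as the question's math.ceil suggests):

import math

# .text returns the visible text; get_attribute('text') returns None here
total_text = driver.find_element_by_css_selector('div.page_navi2.clearfix>p').text
total_count = int(total_text.split("件中")[0])  # e.g. "123件中 ..." -> 123
pageLoopCount = math.ceil(total_count / 10)     # assuming 10 results per page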
I don't know how many usernames there are, because the data changes on each iteration: old users are replaced with new ones, so I don't know which one will be the last. How can I break the loop once no more new users are found?
n = 0
usernames = soup.find_all('div', class_='KV-D4')
while n < 10000:
    for each in usernames:
        each.get_text()
        n += 1
    if(usernames[last]):
        break
If all you want to do is loop through a list of divs and break as soon as a value is no longer unique, then keep a list of the usernames seen so far, append each new one, and break when you hit a duplicate.
For example:
unique_usernames = []
usernames = soup.find_all('div', class_='KV-D4')
for each in usernames:
    username = each.text
    if username in unique_usernames:
        break  # first non-unique user found
    else:
        unique_usernames.append(username)
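To stop once a whole pass yields nothing new, you can re-parse the page on every iteration and compare against the set of users seen so far. A minimal sketch, assuming driver is a Selenium WebDriver and that loading the next batch of users is a site-specific step:

from bs4 import BeautifulSoup

seen = set()
while True:
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    batch = {div.get_text() for div in soup.find_all('div', class_='KV-D4')}
    new_users = batch - seen
    if not new_users:
        break  # no new users appeared, so we are done
    seen.update(new_users)
    # trigger the next batch here, e.g. scroll down or click "next" (site-specific)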
I am looping through rows which each have a link and an index value that I assign to it. In addition to selenium, I am also using Beautiful Soup API to check page html.
The main issue is that once I have found the link index that I want to use, I execute links[index].click() and it will only work occasionally.
Error: list index out of range
When I double-checked, my index was still within the range of the list, but the click still didn't work.
# Each link is confirmed to work, but only works every other time the script is run
page_html = BeautifulSoup(driver.page_source, 'html.parser')
links = [link1, link2]
rows = page_html.find_all('tr', recursive=False)
index = 0
found = False
for row in rows:
    col = row.select('td:nth-of-type(5)')
    for string in col[0].strings:
        # If the column has a "Yes" string, use the index of this row
        if (string == 'Yes'):
            found = True
            break
    # Break from the loop if we already have the row that we want
    if found:
        break
    # If not found, keep incrementing the index
    index += 1

# This is the part of the code that does not work consistently
links[index].click()
To debug this I attempted the following:
def custom_wait(num=3):
    driver.implicitly_wait(num)
    time.sleep(num)

attempts = 0
while attempts < 10:
    custom_wait()
    try:
        links[index].click()
    except:
        PrintException()
        attempts += 1
    else:
        logger.debug("Link Successfully clicked")
        break
When I run this code, it logs that the link was successfully clicked, but it still reports that the index is out of range.
If the page contains more than 2 rows, it's not really a surprise that it throws an exception :O
The links list contains two values (index 0 and index 1). If the third row's column does not contain the string 'Yes', you don't break out of the for loop, and the index variable is incremented again.
So at the third row index = 2, and the links list has nothing at index 2, hence the IndexError.
Why don't you loop over the links instead?
found = False
for link in links:
    link.click()
    # re-parse the page after each click so the rows reflect the current page
    page_html = BeautifulSoup(driver.page_source, 'html.parser')
    rows = page_html.find_all('tr', recursive=False)
    for row in rows:
        col = row.select('td:nth-of-type(5)')
        for string in col[0].strings:
            if (string == 'Yes'):
                found = True
                break
        if found:
            break
    if found:
        break
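Alternatively, if you keep the index-based approach from the question, a simple bounds check before the click avoids the IndexError (variable names as in the question):

if found and index < len(links):
    links[index].click()
else:
    print("no matching row, or row index %d is beyond the %d available links" % (index, len(links)))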
Instead of writing 10+ if statements, I'm trying to create one if statement using a variable. Unfortunately, I'm not familiar with how to do string concatenation for XPath in Python. Can anyone show me how to perform string formatting for the following code segments?
I would greatly appreciate it, thanks.
if page_number == 1:
    next_link = browser.find_element_by_xpath('//*[@title="Go to page 2"]')
    next_link.click()
    page_number = page_number + 1
    time.sleep(30)
elif page_number == 2:
    next_link = browser.find_element_by_xpath('//*[@title="Go to page 3"]')
    next_link.click()
    page_number = page_number + 1
    time.sleep(30)
This answer is not about string concatenation, but about a simpler solution to the problem...
Instead of clicking a particular link in the pagination, you can click the "Next" button:
pages_number = 10
for _ in range(pages_number):
    driver.find_element_by_xpath("//a[@title='Go to next page']").click()
    time.sleep(30)
If you need to open a specific page, you can use the following (note that find_element_by_link_text expects a string):
required_page = 3
driver.find_element_by_link_text(str(required_page)).click()
P.S. I assumed you are talking about this site
You can use a for-loop
Ex:
for i in range(1, 10):
    next_link = browser.find_element_by_xpath('//*[@title="Go to page {0}"]'.format(i + 1))  # using str.format for the page number
    next_link.click()
    time.sleep(30)
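On Python 3.6+, the same loop reads a little more cleanly with an f-string (identical behavior, just different formatting syntax):

for i in range(1, 10):
    next_link = browser.find_element_by_xpath(f'//*[@title="Go to page {i + 1}"]')
    next_link.click()
    time.sleep(30)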
How can I know whether there is text in an element or not? I used this code:
pricelist = driver.find_elements_by_xpath(".//*[@id='scroll']/div[2]/div/div[2]/div")
if EC.text_to_be_present_in_element(By.XPATH(".//*[@id='scroll']/div[2]/div/div[2]/div[1]")):
    price = pricelist[1].text
elif EC.text_to_be_present_in_element(By.XPATH(".//*[@id='scroll']/div[2]/div/div[2]/div[2]")):
    price = pricelist[2].text
else:
    price = pricelist[3].text
Problem:
TypeError: 'str' object is not callable
You can use XPath to check the length of the text in the element. See the code below.
price = driver.find_element_by_xpath(".//*[@id='scroll']/div[2]/div/div[2]/div[string-length(text()) > 1]").text
You can change the length threshold in the XPath, e.g. /div[string-length(text()) > 10], to find an element that contains more than 10 characters.
By.XPATH is not a method but a plain string, so it cannot be called. The correct usage is
EC.text_to_be_present_in_element((By.XPATH, xpath_expression), text_)
If you want to match an element that contains any text, use the [text()] predicate:
".//*[@id='scroll']/div[2]/div/div[2]/div[text()]"
I have used another method:
pricelist = driver.find_elements_by_xpath(".//*[@id='scroll']/div[%d]/div/div[2]/div" % num)
for prc in pricelist:
    if 0 < len(prc.text) < 10:
        price = prc.text
This method is rather slow, but it is the most readable.