Selenium cannot find an element when the class name contains special characters - Python

I am crawling data from a web page (screenshots of the page's HTML and of my code were attached to the original post).
I have an issue when I run that code: when I try to find an element whose class name contains special characters such as "--+", it does not work.
As a workaround I tried another approach, shown in a further screenshot.
My Selenium version is 4.3.
My question: how can Selenium locate an element whose class name contains special characters?
Expected result: I want to get the text inside the element with this class, as shown in the final screenshot.
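One possible cause, offered as an assumption since the original code is only available as a screenshot: `+` is a combinator in CSS, so a class name containing it cannot be used unescaped in a class selector. A common way around this is to match on the raw `class` attribute instead. The sketch below only builds the selector string; the helper name `css_for_class_fragment` and the class name `price--+` are hypothetical.

```python
def css_for_class_fragment(fragment):
    """Build a CSS attribute selector that matches any element whose
    class attribute contains `fragment` as a substring."""
    # Inside a double-quoted CSS string only backslash and the double
    # quote need escaping; characters such as "+" or "-" are safe there.
    escaped = fragment.replace("\\", "\\\\").replace('"', '\\"')
    return f'[class*="{escaped}"]'


# "price--+" is a made-up class name used only for illustration.
selector = css_for_class_fragment("price--+")
print(selector)  # [class*="price--+"]

# With a live driver the selector would be used roughly like this:
#   from selenium.webdriver.common.by import By
#   text = driver.find_element(By.CSS_SELECTOR, selector).text
```

Because `class*=` is a substring match, it can also match unrelated classes that happen to contain the same fragment, so the fragment should be as specific as possible.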

Related

Unable to extract alt text from <img> tags using Selenium

I'm building a web-scraping bot and I want to get the alt text of images from Instagram posts. In theory this should not be hard with the .get_attribute('alt') method in Selenium, but for some reason it never works. It always returns none.
Here's the code of my whole function:
def get_image_alt_text(posts):
    alt_text = []
    for link in posts:
        driver.get(link)
        time.sleep(4)
        try:
            image = driver.find_element(by=By.TAG_NAME, value="img")
            alt_text.append(image.get_attribute('alt'))
            print(alt_text)
        except:
            continue
    return alt_text
When I run this, it just puts an empty string into the list.
I also realize this isn't great code as there are usually multiple image tags on a page. So I rewrote it to find_elements() and then iterated every image on the page and none returned any alt text. Even though when inspecting the HTML on the site there is a value set for alt.
I ended up getting around this issue by using .get_attribute('innerHTML') and then making a separate function to slice out the alt value from the given string.
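The workaround described above (reading innerHTML and slicing the alt value out of the string) can be made less brittle by parsing the markup instead of slicing it. The following is a minimal sketch using only the standard library; the names `AltCollector` and `extract_alt_texts` and the sample HTML are hypothetical, and the input is assumed to be the string returned by .get_attribute('innerHTML').

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collect the non-empty alt attributes of every <img> tag fed in."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)


def extract_alt_texts(html):
    """Return the alt texts found in an HTML string, in document order."""
    collector = AltCollector()
    collector.feed(html)
    collector.close()
    return collector.alts


# Made-up markup standing in for an element's innerHTML.
sample = '<div><img src="a.jpg" alt="A dog on a beach"><img src="b.jpg"></div>'
print(extract_alt_texts(sample))  # ['A dog on a beach']
```

Images without an alt attribute, or with an empty one, are skipped rather than added as empty strings.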

Extract id of the div using selenium in Python

Can someone help me extract the id text (the underlined one in the attached image) from this link: https://opennem.org.au/facilities/au/?selected=DEIBDL&status=operating,committed,commissioning,retired
To see it, right-click any of the items in the table and choose Inspect.
I know how to extract the table text, but I want the id text that I underlined.
The selected element can be located by this XPath
//div[@class='card is-selected']
Or this CSS Selector
div.is-selected
So, you can use this Selenium code to extract the id attribute value:
element_id = driver.find_element(By.XPATH, "//div[@class='card is-selected']").get_attribute("id")
This gives you the id shown in the dev tools, which is not always the text you see as a user on the page.
To get the text the user sees, use this:
name = driver.find_element(By.CSS_SELECTOR,"div.is-selected .station-name").text
This can be done with XPath as well.
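As an offline illustration of the attribute predicate, the same `[@class='...']` syntax can be tried with the standard library's ElementTree, which supports a small XPath subset. This is only a sketch: the markup, the ids, and the station names below are made up, and the real page's structure is assumed rather than known.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for the real page.
SAMPLE = """
<section>
  <div class="card" id="AAA"><span class="station-name">Alpha</span></div>
  <div class="card is-selected" id="DEIBDL"><span class="station-name">Beta</span></div>
</section>
""".strip()

root = ET.fromstring(SAMPLE)

# "@class" addresses the class attribute; the value must match exactly here.
selected = root.find(".//div[@class='card is-selected']")
element_id = selected.get("id")

# Search below the selected card for the element holding the visible name.
station_name = selected.find(".//span[@class='station-name']").text

print(element_id, station_name)  # DEIBDL Beta
```

In Selenium the corresponding expression would be passed to driver.find_element(By.XPATH, ...), where the browser's full XPath engine also allows contains(@class, ...) for partial class matches.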

Scrapy CSS selector: get text from the first occurrence of a class

I'm trying to scrape text from the class .s-recipe-header__info-item, but as you can see in the picture, there are three elements with the same class name and I would like to extract only the first one to get the text "Do hodiny". See the image of the code here. So far I have tried this code:
recipe_item["preparation_time"] = response.css(".s-recipe-header__info > .s-recipe-header__info-items > .s-recipe-header__info-item::text").extract_first()
I have also tried .get() instead of .extract_first(), but neither seems to work.
I am new to web scraping and have only elementary HTML and CSS knowledge. Thank you in advance for your help.

Search results don't change URL - Web Scraping with Python and Selenium

I am trying to create a python script to scrape the public county records website. I ultimately want to be able to have a list of owner names and the script run through all the names and pull the most recent deed of trust information (lender name and date filed). For the code below, I just wrote the owner name as a string 'ANCHOR EQUITIES LTD'.
I have used Selenium to automate entering the owner name into the form boxes, but when the 'return' key is pressed and my results are shown, the website URL does not change. I try to locate the specific text in the table using XPath, but the path does not exist when I look for it. I have concluded that the path does not exist because Selenium is searching for the XPath on the first page, with no results shown. BeautifulSoup4 wouldn't work in this situation because parsing the URL would only return the HTML of a blank search form.
See my code below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome()
browser.get('http://deed.co.travis.tx.us/ords/f?p=105:5:0::NO:::#results')
ownerName = browser.find_element_by_id("P5_GRANTOR_FULLNAME")
ownerName.send_keys('ANCHOR EQUITIES LTD')
docType = browser.find_element_by_id("P5_DOCUMENT_TYPE")
docType.send_keys("deed of trust")
ownerName.send_keys(Keys.RETURN)
print(browser.page_source)
#lenderName = browser.find_element_by_xpath("//*[@id=\"report_results\"]/tbody[2]/tr/td/table/tbody/tr[25]/td[9]/text()")
I have commented out the line that is giving me trouble. Please help!
If I am not explaining my problem correctly, please feel free to ask and I will clear up any questions.
I think you almost have it.
You match the element you are interested in using:
lenderNameElement = browser.find_element_by_xpath("//*[@id=\"report_results\"]/tbody[2]/tr/td/table/tbody/tr[25]/td[9]")
Next you access the text of that element:
lenderName = lenderNameElement.text
Or in a single step:
lenderName = browser.find_element_by_xpath("//*[@id=\"report_results\"]/tbody[2]/tr/td/table/tbody/tr[25]/td[9]").text
Have you tried the following XPath?
//table[contains(@summary,"Search Results")]/tbody/tr
I have checked it and it works. With it, you have to iterate over each tr.

Code to extract a value from html code in Python

I urgently need help with the issue described below. I am automating a project using the Selenium Python bindings.
Scenario: create a new member with a profile picture and add this member to a group, then check whether the profile picture given to the member at profile creation is the same one that appears in the friends list.
To do this, I would like to compare the image IDs at profile creation and in the friends list.
I found the image ID using Firebug. It is given inside a
<div><a class=........Imageid=234563453.....................>
But how can I extract this image ID from the <a> element?
print self.driver.find_element_by_xpath("")._getattribute_(ImageId)
Can anybody provide the code to extract this Imageid from the <a class> tag?
You can use a regex.
import re
s = "<a class=........Imageid=234563453.....................>"
m = re.search(r"Imageid=\d+", s)  # raw string so \d is not treated as a string escape
print(m.group().split('=')[1])
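A slightly more defensive variant of the same regex idea is sketched below. It uses a capture group instead of splitting on '=', and returns None when no id is present instead of raising. The function name `extract_image_id` and the sample markup are hypothetical, and the optional double quote in the pattern is an assumption about how the attribute might be written.

```python
import re

# Capture the digits after "Imageid=", tolerating an optional opening quote.
IMAGE_ID_PATTERN = re.compile(r'Imageid="?(\d+)')


def extract_image_id(html):
    """Return the first Imageid value found in `html` as a string, or None."""
    match = IMAGE_ID_PATTERN.search(html)
    return match.group(1) if match else None


# Made-up markup shaped like the tag in the question.
print(extract_image_id('<a class="avatar" Imageid=234563453 href="#">'))  # 234563453
```

The ids from the profile page and the friends list can then be compared with a plain equality check, after confirming that neither call returned None.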
