Selenium Webscraping Twitter - Getting hold of tweet timestamp? - python

When inspecting a Twitter results page, each tweet contains the following element:
<small class="time">
....
</small>
Inside it is a span whose 'data-time' attribute holds the timestamp of the tweet:
<span class="_timestamp js-short-timestamp js-relative-timestamp" data-time="1510698047" data-time-ms="1510698047000" data-long-form="true" aria-hidden="true">12m</span>
In Selenium I am using the following code:
tweet_date = browser.find_elements_by_class_name('_timestamp')
But looking at the text of a single entry only returns the visible label, in this case 12m.
How can I access one of the other attributes of this element in Selenium?

I usually use find_elements_by_xpath, which lets you grab a specific element from a page without worrying about names. Or so that's how it seems to work.
EDIT
Alright so I think I've got it figured out. First, find element by xpath and assign.
ts=browser.find_elements_by_xpath('//*[@id="stream-item-tweet-929138668551380992"]/div/div[2]/div[1]/small/a/span')
I forgot that "elements" (unlike "element") returns a list, so you'll need to pick an item from it, something like this.
ts=ts[0]
Then you can use the get_attribute method to get the info associated with 'data-time' in the html.
raw_time=ts.get_attribute('data-time')
Returns
raw_time == '1510358895'

Thank you to SuperStew who found the key to the answer - get_attribute()
My final solution for anyone wondering:
tweet_date = browser.find_elements_by_class_name("_timestamp")
And then for any date in that list:
tweet_date[1].get_attribute('data-time')
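
The value returned by get_attribute('data-time') is a string. Judging by the data-time-ms attribute in the question's span, it appears to hold seconds since the Unix epoch. A minimal sketch of converting it to a readable date, assuming that interpretation is right (the literal is copied from the question's span; the variable names are only illustrative):

```python
from datetime import datetime, timezone

# Value as returned by get_attribute('data-time'); copied from the question's span.
raw_time = "1510698047"

# Assumption: data-time is seconds since the Unix epoch (data-time-ms is the same
# value in milliseconds), so it can be passed to fromtimestamp after int().
tweet_time = datetime.fromtimestamp(int(raw_time), tz=timezone.utc)

print(tweet_time.isoformat())
```

Passing tz=timezone.utc keeps the result independent of the machine's local time zone.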

Related

I found a span on a website that is not visible and I can't scrape it! Why?

Currently I'm trying to scrape data from a website. Therefore I'm using Selenium.
Everything is working as it should. Until I realised I have to scrape a tooltiptext.
I found already different threads on stackoverflow that are providing an answer. Anyway I did not manage to solve this issue so far.
After a few hours of frustration I realised the following:
This span has nothing to do with the tooltip I guess. Because the tooltip looks like this:
There is actually a span that I can't read. I try to read it like this:
bewertung = driver.find_elements_by_xpath('//span[@class="a-icon-alt"]')
for item in bewertung:
    print(item.text)
So Selenium finds this element, but unfortunately '.text' returns nothing. Why is it always empty?
And what is the span from the first screenshot for? By the way, it is not displayed on the website either.
Since you mentioned that Selenium finds this element, I assume you have printed the length of the bewertung list,
something like
print(len(bewertung))
If this list has elements in it, you could probably use innerText, since .text only returns text that is visibly rendered:
bewertung = driver.find_elements_by_xpath('//span[@class="a-icon-alt"]')
for item in bewertung:
    print(item.get_attribute("innerText"))
Note that you are using find_elements, which does not raise an error when nothing matches; it returns an empty list instead.
If you use find_element instead, it raises NoSuchElementException when the element is not found.
Also, I think your XPath points at a span that does not appear in the UI (sometimes such elements don't appear until some action is triggered).
You can try to use this xpath instead:
//i[@data-hook='average-stars-rating-anywhere']//span[@data-hook='acr-average-stars-rating-text']
Something like this in code:
bewertung = driver.find_elements_by_xpath("//i[@data-hook='average-stars-rating-anywhere']//span[@data-hook='acr-average-stars-rating-text']")
for item in bewertung:
    print(item.text)

Find a link by href in selenium python

Let's take the example of spotify because I'm listening to music on it right now.
I would like to get the text contained in the href tag in the following code.
<a data-testid="nowplaying-track-link" href="/album/3xIwVbGJuAcovYIhzbLO3J">Toosie Slide</a>
What I want is to get "/album/3xIwVbGJuAcovYIhzbLO3J" or if that's not possible, get "Toosie Slide" in order to store it in a variable to compare it with a constant.
The difficulty with Spotify (and many other sites) is that this href tag is present several times on the web page. So I'd like to get only the link that's contained in "nowplaying-track-link" which is a data-testid.
There, I hope I was clear.
PS: I already know the commands like: driver.find_element_by_xpath, etc... but I can't use them in this case...
I'm not sure what you mean about the commands of the type and not being able to use them, but this is how you would get the info you're seeking:
element = driver.find_element_by_css_selector('[data-testid="nowplaying-track-link"]')
href = element.get_attribute('href')
element_text = element.text
If href comes back as a relative path and you want to put together the full link, you can do it this way:
link = driver.current_url + href
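
Plain string concatenation can produce a malformed URL when the current page URL already has a path, or when the href is already absolute (as far as I know, Selenium often returns the resolved, absolute URL for href). A hedged alternative is the standard library's urljoin, sketched here with a hypothetical page URL standing in for driver.current_url:

```python
from urllib.parse import urljoin

# Hypothetical stand-in for driver.current_url (not taken from the question).
current_url = "https://open.spotify.com/collection/tracks"

# The relative href from the question's <a> tag.
href = "/album/3xIwVbGJuAcovYIhzbLO3J"

# urljoin resolves a relative href against the page URL.
link = urljoin(current_url, href)
print(link)

# An href that is already absolute is returned unchanged.
same_link = urljoin(current_url, link)
print(same_link)
```

Either way the result can be compared against a constant without worrying about which form the browser returned.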

Need Selenium to return the class title content of given HTML

Using Selenium to perform some webscraping. Have it log in to a site, where an HTML table of data is returned with five values at a time. I'm going to have Selenium scrape a particular bit of data off the table, write to a file, click next, and repeat with the next five.
This is a new automation script. I've tried a myriad of variations of get_attribute, find_elements_by_class_name, etc. Example:
pnum = prtnames.get_attribute("title")
for x in prtnames:
    print('pnum')
Here's the HTML from one of the returned values:
<div class="text-container prtname"><span class="PrtName" title="P011">P011</span></div>
I need to get that "P011" value. Obviously Selenium doesn't have "find_elements_by_title", and there is no HTML id for the value. The Xpath for that line of HTML is:
//*[@id="printerConnectTable"]/tbody/tr[5]/td/table/tbody/tr[1]/td[2]/div/span
But I don't see a reference to "title" or "P011" in that Xpath.
pnum = prtnames.get_attribute("title")
AttributeError: 'list' object has no attribute 'get_attribute'
It's like get_attribute doesn't exist, but there is some (albeit not much) documentation on it.
Fundamentally I'd like to grab that "P011" value and print to console, then I know Selenium is working with the right data.
P.S. I'm self-taught with all of this, I'm automating a sysadmin task.
I think the problem is that prtnames is a list of elements, not a single element. You can use a list comprehension if you want a list of the title attributes of everything in prtnames.
pnums = [x.get_attribute('title') for x in prtnames]
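
To see why the AttributeError appears without starting a browser, here is a small sketch that uses a hypothetical FakeElement class in place of Selenium's WebElement. The class and the second title value "P012" are made up for illustration; only "P011" comes from the question.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (not a Selenium class)."""

    def __init__(self, attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        # Return the attribute value, or None if the attribute is absent.
        return self._attributes.get(name)


# find_elements_* returns a plain Python list, which is mimicked here.
prtnames = [FakeElement({"title": "P011"}), FakeElement({"title": "P012"})]

# The list itself has no get_attribute method, hence the AttributeError.
list_has_method = hasattr(prtnames, "get_attribute")
print(list_has_method)

# Each item in the list does have it, so call it once per element.
pnums = [x.get_attribute("title") for x in prtnames]
for pnum in pnums:
    print(pnum)
```

Note that the loop prints the variable pnum, not the string literal 'pnum'.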

XPath: Select specific item from property string

Trying to drill down to a specific Xpath of a url in a longer string. I've gotten down to each of the listed blocks, but can't seem to get any further than the long string of properties.
example code:
<div class="abc class">
<a class="123" title="abc" keys="xyz" href="url string">
Right now I have...
.//*[@id='content']/div/div[1]/a
That only retrieves the whole string of data, from class through href. What would I need to just retrieve the "url string" from that part? Would this need to be accomplished with a subsequent 'for' argument in the python input?
A pure XPath solution would involve just adding the @href to the expression:
.//*[@id='content']/div/div[1]/a/@href
In Python, assuming you are using lxml.html, you can get the attribute using the .attrib:
for link in root.xpath(".//*[@id='content']/div/div[1]/a"):
    print(link.attrib['href'])
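
If lxml is not installed, roughly the same idea can be tried with the standard library's xml.etree.ElementTree. This is a swapped-in technique, not the lxml approach above: ElementTree supports only a subset of XPath (no trailing /@href) and needs well-formed markup. The sample document below is a hypothetical reconstruction around the snippet from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page; only the inner <div>/<a>
# attributes come from the question.
SAMPLE = """
<html><body>
  <div id="content">
    <div>
      <div class="abc class">
        <a class="123" title="abc" keys="xyz" href="url string">link</a>
      </div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Select the <a> elements with the XPath subset, then read href with .get().
hrefs = [a.get("href") for a in root.findall(".//*[@id='content']/div/div[1]/a")]
print(hrefs)
```

Real-world HTML is often not well-formed XML, in which case lxml.html remains the more forgiving choice.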
Try to avoid the positional index.
If your class name is unique, you can do it like this:
//*[@id='content']/div/div[@class='abc class']/a[@keys='xyz']/@href
Hope it will help you :)

How to get value of onclick using xpath?

I have this piece of Html:
<tr class="selectable" onclick="PesquisarProntuarioView.EditarProntuario('108077098085')">
I would like to get what is inside onclick. I am looking for a command that would give me:
"PesquisarProntuarioView.EditarProntuario('01048108077098085')"
I already have tried several commands like this one:
element=driver.find_element_by_xpath("//a/tr/@onlick=PesquisarProntuarioView.EditarProntuario(*)")
driver.get(element)
However, I still have no clue. Could someone please help me with the right command in Selenium/Python?
Locate the element and use get_attribute() method:
element = driver.find_element_by_xpath("//a/tr")
print(element.get_attribute("onclick"))
If there are multiple elements you need to extract onclick from, use find_elements_* and call get_attribute() for every element found:
for element in driver.find_elements_by_xpath("//a/tr"):
print(element.get_attribute("onclick"))
I would slightly change the above solution to only select relevant elements in path //a/tr by providing part of the onclick attribute, like this:
onclickValue = driver.find_element_by_xpath("//a/tr[contains(@onclick,'PesquisarProntuarioView.EditarProntuario')]").get_attribute("onclick")
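
Once the onclick string is in hand, the argument inside the call can be pulled out with a regular expression if only that part is needed. A minimal sketch, assuming the attribute always has the single-quoted form shown in the question (record_id is an illustrative name):

```python
import re

# Attribute value as it would come back from get_attribute("onclick");
# copied from the <tr> in the question.
onclick = "PesquisarProntuarioView.EditarProntuario('108077098085')"

# Capture whatever sits between the single quotes of the call.
match = re.search(r"EditarProntuario\('([^']*)'\)", onclick)
record_id = match.group(1) if match else None
print(record_id)
```

The None fallback covers rows whose onclick does not match the expected pattern.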
