Using selenium to scrape email addresses with xpath


I want to scrape the email address from the following web page.
Facebook Business Info Page
So I decided to use the Selenium driver with Python, and I figured the best way to do this was by defining an XPath. From inspecting the elements, I noticed that the info I was looking for sits in the following HTML structure, as seen here:
Now I must admit that I am a bit of a noob when it comes to using Selenium and locating elements by XPath, so I was hoping someone would correct me if I am defining the following XPath incorrectly. This is what I have right now:
But I'm fairly certain I'm defining the wrong XPath. I know I want to grab the information in the div with class _50f4, but I don't know how to express that. If someone could help me figure that out, I would greatly appreciate it.

You can get the text of the email address using an XPath like: //div[@id = 'u_0_u']//ul/li[4]//div[@class = '_50f4']
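As a rough illustration of how an expression of that shape walks the tree, here is a minimal sketch that uses Python's standard-library ElementTree on a made-up snippet. The markup, the `SAMPLE_HTML` and `EMAIL_XPATH` names, and the example address are all assumptions; the real page's structure may well differ. ElementTree supports only a subset of XPath and needs a leading `.`; with Selenium 4 the same expression (without the dot) would instead be passed to `driver.find_element(By.XPATH, ...)` and the result read through `.text`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page's markup (not the real page).
SAMPLE_HTML = """
<html><body>
  <div id="u_0_u">
    <ul>
      <li><div class="_50f4">Some Street 1</div></li>
      <li><div class="_50f4">555-0100</div></li>
      <li><div class="_50f4">Open now</div></li>
      <li><div><div class="_50f4">info@example.com</div></div></li>
    </ul>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE_HTML)

# Same shape as the XPath in the answer. ElementTree needs the leading "."
# because it only evaluates paths relative to an element.
EMAIL_XPATH = ".//div[@id='u_0_u']//ul/li[4]//div[@class='_50f4']"

match = root.find(EMAIL_XPATH)
email = match.text if match is not None else None
print(email)
```

Note that `li[4]` is positional, so the expression stops matching if the page reorders its list items; the class predicate alone may be enough if `_50f4` is unique within the block.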

Related

Can't find twitter DM textbox by xpath using selenium

I am trying to find the textbox element using the find_element_by_xpath() method, but it keeps telling me it can't find the element.
I've tried finding it by link text, partial link text, and selector, and none of them work. Here's the line of code that does the lookup:
bar = nav.find_element_by_xpath('//*[@id="react-root"]/div/div/div[2]/main/div/div/div/div[2]/div/div/aside/div[2]/div[2]/div/div/div/div/div[1]/div/div/div/div[2]/div/div/div/div')
Thanks in advance!

I suggest writing your own XPath if you want to be precise, rather than copying one that is based on the HTML structure (which can change).
The locator looks like:
And you can select it with XPath:
//input[@placeholder='Search people' and @role='combobox']
To avoid this problem, I suggest going through a tutorial to get a better understanding of how to create custom locators: Xpath tutorial
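To make the difference between the two styles concrete, here is a small sketch using the standard-library ElementTree on invented markup (the `SAMPLE` snippet and variable names are assumptions, not the real Twitter DOM). ElementTree does not support `and` inside a predicate, so two chained predicates stand in for it; for simple attribute tests like these the result is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two inputs nested at different depths.
SAMPLE = """
<main>
  <div><input placeholder="Search" role="textbox"/></div>
  <aside><div><div>
    <input placeholder="Search people" role="combobox"/>
  </div></div></aside>
</main>
"""
root = ET.fromstring(SAMPLE)

# Attribute-based: keeps working after layout changes as long as the
# attributes stay the same. Chained predicates replace "and" here.
by_attrs = root.find(".//input[@placeholder='Search people'][@role='combobox']")

# Position-based: tied to the exact nesting, so it breaks as soon as a
# wrapper element is added or removed.
by_position = root.find("./aside/div/div/input")

# Both locators currently resolve to the same element.
print(by_attrs is by_position)
```

Both lookups hit the same element today, but only the attribute-based one would be expected to survive a change in nesting, such as the layout shift caused by a different screen size.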

What was happening: when I opened the tab to inspect the element, the DM layout changed because of my screen size, so the XPath was no longer the same.

Python Selenium identifying xpath iframes element

I am trying to find the names of all iframes in a web page. When I run driver.find_element_by_xpath("//iframe") I get session="f139d552bcf5b17598ba7b5af3987c8", element="04036644-d6cf-40a1-9434-5ce5d951e9a". How do I correlate this back to a name that is useful in the HTML, so that I can switch by different locators like tag, CSS, id, etc.? Preferably not in Java, but if that is the only solution available, that's fine. What kind of id/attribute is element="04036644-d6cf-40a1-9434-5ce5d951e9a"? Can someone provide an example of what the code would look like?
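One possible approach, sketched under assumptions: the session/element pair printed above is the WebDriver's internal reference to the element for that browser session, not an HTML attribute, so the usable names have to be read from each element with `get_attribute`. The helper name `describe_iframes` is hypothetical; it passes the locator strategy as the plain string "tag name" (the value of Selenium's `By.TAG_NAME`) so the function itself needs no import.

```python
def describe_iframes(driver):
    """Return a list of dicts with the name, id and src of every <iframe>.

    `driver` is assumed to be a Selenium WebDriver instance.
    """
    # "tag name" is the string value of selenium's By.TAG_NAME.
    frames = driver.find_elements("tag name", "iframe")
    return [
        {attr: frame.get_attribute(attr) for attr in ("name", "id", "src")}
        for frame in frames
    ]


# Usage with a real browser (requires selenium and a browser driver):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("https://example.com")  # placeholder URL
# for info in describe_iframes(driver):
#     print(info)
# driver.switch_to.frame("some_name_or_id")  # value taken from the output
```

Attributes that are missing on a frame come back as `None`, in which case the frame can still be switched to by passing the element object itself to `driver.switch_to.frame`.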

Xpath issues with content egg plugin

I don't usually post on here as you all seem pretty advanced for me, but this feels like an advanced question, so thought I'd ask.
I'm using the Content Egg plugin for WordPress (please don't roll eyes) and it's asking me for XPath for a price I'm trying to get. I have tried so many different xpaths and none of them seem to be working. I'm trying to get the price Xpath from this webpage. Any ideas?
https://www.hockeydirect.com/Catalogue/Hockey-Sticks/Young-Ones-Hockey-Sticks/Young-Ones-ABD-Hockey-Sticks/Young-Ones-ABD-70-Hockey-Stick-341111
This is what I've tried:
//html/body/form/div[5]/div[2]/div[3]/div/span/div[1]/div[4]/div[2]/div[2]/div/div[1]/div/span/fieldset/div[1]/div/div[1]
//*div[@class="MainPriceContainer"]
//div[@class='MainPriceContainer vcenter']
Any help would be greatly appreciated.

Try copying the XPath from Firefox with these steps:
Inspect the page and select the price with the element picker; it will highlight the element in the inspector's HTML view.
Right-click the highlighted code, choose Copy, then select XPath from the Copy submenu.
Paste it into the XPath field.
Then set the price update in the offer module settings to 30 seconds and test.
If it works, let me know :)

Python Xpath request returns empty list irregularly

I know there are many similar questions, but I've been through all of those and they couldn't help me. I'm trying to get information from a website, and I've used the same method on other websites with success. Here however, it doesn't work. I would very much appreciate if somebody could give me a few tips!
I want to get the max temperature for tomorrow from this website.
import re, requests, time
from lxml import html
page = requests.get('http://www.weeronline.nl/Europa/Nederland/Amsterdam/4058223')
tree = html.fromstring(page.content)
a = tree.xpath('//*[@id="app"]/div/div[2]/div[5]/div[2]/div[2]/div[6]/div/div/div/div/div/div/ul/div[2]/div/li[1]/div/span/text()')
print(a)
This returns an empty list, however. The same method on a few other websites I checked worked fine. I've tried applying this method on other parts of this website and this domain, all to no avail.
Thanks for any and all help!
Best regards

Notice that when you try to open that page you are asked whether you agree to allow cookies. (It's something like that, I have no Dutch.) You will need to use something like selenium to click on a button to 'OK' that so that you have access to the page that you really want. Then you can use the technique discussed at Web Scrape page with multiple sections to be able to get the HTML for that page, and finally apply whatever xpath it takes to retrieve the content that you want.

Finding the xpath of a button, using it in python and selenium

I am unsure if any of you are familiar with Reddit, but I want to start a small subreddit for Warhammer lore questions, where people can post questions and then answer them. To highlight the questions that have been answered, I want a moderator account to automatically upvote them once they are "Solved". I am trying to do this with Selenium, but I am having trouble finding the upvote button.
Currently I am able to log in with my moderator account, but I am unable to press the upvote button. I have tried the following code to no avail:
driver.get("https://www.reddit.com/r/ChosenSub/ChosenThread")
time.sleep(3)
driver.find_element_by_xpath("div[@id='siteTable']/div[@id='thing_t3_XXXXXX']/div[@class='midcol unvoted']/div[@class='arrow up login-required access-required']").click
Here XXXXXX is the id of the thread in question. This produces absolutely no result. I am fairly familiar with Python, but not at all with XPath. I used the XPath Helper tool in Chrome to get the XPath above, but still no luck.
If anyone has any potential ideas please do let me know, any and all help is very appreciated.

Considering the link provided in the comments, you can try a simplified XPath as below:
driver.find_element_by_xpath("//div[@id='thing_t3_XXXXXX']//div[@aria-label='upvote']").click()
If you need a more general way to upvote a question by its id (assuming the id value is known in advance):
def upvote_question(question_id):
    driver.find_element_by_xpath("//div[@id='%s']//div[@aria-label='upvote']" % question_id).click()
And then you can just use it with a question's id as argument:
upvote_question("thing_t1_dcjl4vu")

You probably need to add '//' in front of that XPath so that it finds the div anywhere in the document; otherwise the div would have to be at the root of the HTML (which it most likely is not). So the XPath would be:
"//div[@id='siteTable']..."

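The effect of the leading '//' can be reproduced with a small sketch using the standard-library ElementTree on invented markup (the `SAMPLE` snippet is an assumption, not Reddit's real HTML). It is only an analogy to the Selenium case: ElementTree evaluates paths relative to an element and writes the descendant search as `.//`, but the distinction between "direct child of the context node" and "anywhere below it" is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the target div is nested, not a direct child of the root.
SAMPLE = """
<html><body>
  <div id="content">
    <div id="siteTable"><div class="arrow up">vote</div></div>
  </div>
</body></html>
"""
root = ET.fromstring(SAMPLE)

# Without "//": only direct children of the context node are considered,
# so the nested div is not found.
direct_only = root.find("div[@id='siteTable']")

# With "//": the div is matched at any depth below the context node.
anywhere = root.find(".//div[@id='siteTable']")

print(direct_only)
print(anywhere.get("id"))
```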