Can't get text while web scraping a site - python

I'm trying to get the button text "Avvisami" from this page: https://www.nike.com/it/launch/t/womens-air-jordan-3-sp-a-ma-maniere
as a string in my code, but nothing I have tried works. This is the relevant part of the code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
options = webdriver.ChromeOptions()
options.add_argument(r'--user-data-dir=C:\Users\mainuser\AppData\Local\Google\Chrome\User Data')
options.add_argument('--profile-directory=Profile 1')
driver = webdriver.Chrome(options = options)
driver.get('https://www.nike.com/it/launch/t/womens-air-jordan-3-sp-a-ma-maniere')
instock = (driver.find_elements_by_class_name('ncss-btn-primary-dark btn-lg'))
print(instock)
Within that, this is the part I think I need to change:
instock = (driver.find_elements_by_class_name('ncss-btn-primary-dark btn-lg'))
print(instock)
I've been trying to fix it for an hour or so but I just can't wrap my head around how.

instock = driver.find_element_by_css_selector(".ncss-btn-primary-dark.btn-lg").text
print(instock)
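find_element_by_class_name accepts only a single class name, so an element with several classes has to be located with a CSS selector that joins the classes with dots. To get the text, read .text and store it in your variable.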
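The conversion from a class attribute to a CSS selector can be sketched without a browser. The helper name class_names_to_css is hypothetical (it is not part of Selenium); its result is the string you would pass to the CSS-selector lookup shown above.

```python
def class_names_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("no class names given")
    # Each class gets a leading dot and nothing separates them, so the
    # selector only matches elements that carry every class at once.
    return "".join("." + name for name in names)


selector = class_names_to_css("ncss-btn-primary-dark btn-lg")
print(selector)
```

A space between the dotted names would mean "descendant of" in CSS, which is why the classes are joined without one.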

You are trying to get text from a list of elements. Iterate over it and read .text on each one:
elems = driver.find_elements_by_css_selector(".ncss-btn-primary-dark.btn-lg")
for el in elems:
    print(el.text)
All of these elements are buttons.

Related

Selenium xpath no result

I am trying to do some web crawling with Selenium. However, when I run the code, it does not print any result.
Here is my code:
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time
import pandas as pd
driver = webdriver.Chrome()
url = 'https://vimeo.com/510879223'
driver.get(url)
#head > meta:nth-child(14)
#/html/head/meta[8]
title = driver.find_element(By.CSS_SELECTOR,"head > meta:nth-child(14)")
print (title.text)
description = driver.find_element(By.XPATH,"//meta[@property='og:description']").text
print (description)
Result:
Process finished with exit code 0
In this case, what should I add or delete? Does this happen because the site I want to scrape does not support XPath scraping?
If I do print (title), the result is:
<selenium.webdriver.remote.webelement.WebElement (session="6f182a4afb7c1173f1e74f1cd6a40d87", element="e10f1407-3a09-4f3e-96e4-19071cda7d8e")>
It looks like there is a result, but I cannot read it as text. What is the best way to fix this? Thank you!

In your case I would recommend finding title by Xpath since the css selector you are trying to use is showing me the description tag. Notice that the text you are looking for is not stored as text on the page but rather in the content attribute. Using the .get_attribute() method should help.
title = driver.find_element(By.XPATH,"//meta[@property='og:title']").get_attribute('content')
print (title)
description = driver.find_element(By.XPATH,"//meta[@property='og:description']").get_attribute('content')
print (description)
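Why .text comes back empty here can be shown without a browser: a meta tag has no text of its own, only attributes. The sketch below uses the standard-library html.parser in place of Selenium, and the markup in SAMPLE_HTML is made up for illustration rather than taken from the real page.

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Collect the content attribute of <meta property="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # The value lives in the content attribute, not between tags.
        if "property" in attributes and "content" in attributes:
            self.meta[attributes["property"]] = attributes["content"]


# Made-up sample markup (assumption), not the real Vimeo page.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Sample video title">
<meta property="og:description" content="Sample description">
</head><body></body></html>
"""

parser = MetaContentParser()
parser.feed(SAMPLE_HTML)
print(parser.meta["og:title"])
print(parser.meta["og:description"])
```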

Hi, errors sometimes creep in when an XPath is copied from the browser, and the copied expression may not match the element you want to act on. You can try other locator methods instead:
find_element_by_css_selector
find_element_by_class_name
My example project may help you.
https://github.com/kaayaumutt/instagramBotApp/blob/main/instagramBotApp/instagramBotApp.py

Unable to obtain table info through python selenium

I am new to Python and Selenium. I am trying to get the SQL Server version table from the website used in the code below:
from selenium.webdriver.common.by import By
from selenium import webdriver
# define the website to scrape and path where the chromediver is located
website = "https://www.sqlserverversions.com"
driver = webdriver.Chrome(executable_path='/Users//Downloads/chromedriver/chromedriver.exe')
# define 'driver' variable
# open Google Chrome with chromedriver
driver.get(website)
matches = driver.find_elements(By.TAG_NAME, 'tr')
for match in matches:
    b=match.find_elements(By.XPATH,"./td[1]")
    print(b.text)
It says AttributeError: 'list' object has no attribute 'text'. Am I using the right syntax and the right parameters to grab the data?
Below is the table I am trying to get data from.
[screenshot of the table]
Below are the parameters I am trying to use in the code.
[screenshot of the parameters]
Please advise what I need to change in the code to get the data in table format.
Thanks,
Arun

If you need data only from first table:
from selenium.webdriver.common.by import By
from selenium import webdriver
website = "https://www.sqlserverversions.com"
driver = webdriver.Chrome(executable_path='/Users//Downloads/chromedriver/chromedriver.exe')
driver.get(website)
show_service_pack_versions = True
xpath_first_table_sql_rows = "(//table[@class='tbl'])[1]//tr/td/a[starts-with(text(),'SQL Server')]//ancestor::tr"
matches = driver.find_elements(By.XPATH, xpath_first_table_sql_rows)
for match in matches:
    sql_server_a_element = match.find_element(By.XPATH, "./td/a[2]")
    print(sql_server_a_element.text)
    sql_server_rtm_version_a_element = match.find_element(By.XPATH, ".//td[@class='rtm']")
    print('RTMs:')
    print(sql_server_rtm_version_a_element.text)
    if show_service_pack_versions:
        print('SPs:')
        sql_server_sp_version_td_elements = match.find_elements(By.XPATH, ".//td[@class='sp']")
        for td in sql_server_sp_version_td_elements:
            print('---')
            print(td.text)
    print('----------------------------------')
If you set show_service_pack_versions = False, the service pack information is skipped.

There was a part of your code where you were calling b.text after getting the result of find_elements, which returns a list. You can only call b.text on a single WebElement (not a list of them). Here's the updated code:
from selenium.webdriver.common.by import By
from selenium import webdriver
website = "https://www.sqlserverversions.com"
driver = webdriver.Chrome(executable_path='/Users//Downloads/chromedriver/chromedriver.exe')
driver.get(website)
matches = driver.find_elements("css selector", "tr")
for match in matches[1:]:
    items = match.find_elements("css selector", "td")
    for item in items:
        print(item.text)
That will print out A LOT of rows, unless you limit the loop.

If you just need text it's simpler to do it on the browser side:
data = driver.execute_script("""
return [...document.querySelectorAll('tr')].map(tr => [...tr.querySelectorAll('td')].map(td => td.innerText))
""")

scraping yahoo stock news

I am scraping the news articles about Infosys listed at the bottom of the page, but I get this error:
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector .
I want to scrape all the articles related to Infosys.
from bs4 import BeautifulSoup
import re
from selenium import webdriver
import chromedriver_binary
import string
import time
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
driver = webdriver.Chrome("/Users/abhishekgupta/Downloads/chromedriver")
driver.get("https://finance.yahoo.com/quote/INFY/news?p=INFY")
for i in range(20): # adjust integer value for need
    # you can change right side number for scroll convenience or destination
    driver.execute_script("window.scrollBy(0, 250)")
    # you can change time integer to float or remove
    time.sleep(1)
print(driver.find_element_by_xpath('//*[@id="latestQuoteNewsStream-0-Stream"]/ul/li[9]/div/div/div[2]/h3/a/text()').text())

You could use a less detailed XPath, with // instead of /div/div/div[2].
If you want the last item, get all the li elements as a list and then use [-1] to take the last element of the list.
from selenium import webdriver
import time
driver = webdriver.Chrome("/Users/abhishekgupta/Downloads/chromedriver")
#driver = webdriver.Firefox()
driver.get("https://finance.yahoo.com/quote/INFY/news?p=INFY")
for i in range(20):
    driver.execute_script("window.scrollBy(0, 250)")
    time.sleep(1)
all_items = driver.find_elements_by_xpath('//*[@id="latestQuoteNewsStream-0-Stream"]/ul/li')
#for item in all_items:
#    print(item.find_element_by_xpath('.//h3/a').text)
#    print(item.find_element_by_xpath('.//p').text)
#    print('---')
print(all_items[-1].find_element_by_xpath('.//h3/a').text)
print(all_items[-1].find_element_by_xpath('.//p').text)
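The two ideas in this answer, a relative path that skips the wrapper elements and [-1] for the last list item, can be tried offline. The sketch below uses the standard-library xml.etree.ElementTree as a stand-in for Selenium, and the markup in SAMPLE is made up; only the id is borrowed from the question.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample markup (assumption); the real page is far larger.
SAMPLE = """
<div id="latestQuoteNewsStream-0-Stream">
  <ul>
    <li><div><div><h3><a>First headline</a></h3><p>First summary</p></div></div></li>
    <li><div><div><h3><a>Second headline</a></h3><p>Second summary</p></div></div></li>
    <li><div><div><h3><a>Last headline</a></h3><p>Last summary</p></div></div></li>
  </ul>
</div>
"""

root = ET.fromstring(SAMPLE)

# All list items under the stream container.
all_items = root.findall("./ul/li")

# Relative lookups start from the item itself, so the number of wrapping
# <div> elements between <li> and <h3> does not matter.
last_item = all_items[-1]
headline = last_item.find(".//h3/a").text
summary = last_item.find(".//p").text
print(headline)
print(summary)
```

ElementTree supports only a subset of XPath, so this illustrates the path shape rather than everything Selenium accepts.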

The XPath you provided does not exist in the page.
You can use the xPath Finder Chrome extension to find the correct XPath for the articles.
Here is an example XPath for the article list; you need to loop over ID:
/html/body/div[1]/div/div/div[1]/div/div[3]/div[1]/div/div[5]/div/div/div/ul/li[ID]/div/div/div[2]/h3/a/u

I think your code is fine apart from one thing: retrieving text or links through XPath works a little differently in Selenium than in Scrapy or with lxml's fromstring. In Selenium the XPath has to select an element rather than a text() node, and .text is a property, not a method. This should work for you:
#use this code for printing instead
print(driver.find_element_by_xpath('//*[@id="latestQuoteNewsStream-0-Stream"]/ul/li[9]/div/div/div[2]/h3/a').text)
Since only one element has this id, you can also read the text of the whole container, which prints every article in the stream:
#This should also work fine
print(driver.find_element_by_xpath('//*[@id="latestQuoteNewsStream-0-Stream"]').text)

xpath returns more than one result, how to handle in python

I have started using Selenium with Python. I am able to change the message text using find_element_by_id. I want to do the same with find_element_by_xpath, but it fails because the XPath matches two elements. I want to try this out to learn about XPath.
I want to scrape a page with Python, and I need to understand XPath mainly for navigating to the next page.
#This code works:
import time
import requests
from selenium import webdriver
driver = webdriver.Chrome()
url = "http://www.seleniumeasy.com/test/basic-first-form-demo.html"
driver.get(url)
eleUserMessage = driver.find_element_by_id("user-message")
eleUserMessage.clear()
eleUserMessage.send_keys("Testing Python")
time.sleep(2)
driver.close()
#This works fine. I wish to do the same with xpath.
#I inspect the input box in Chrome and copy the xpath '//*[@id="user-message"]', which seems to refer to the other box as well.
#I wish to use the xpath method to write text in this box as follows, which does not work.
driver = webdriver.Chrome()
url = "http://www.seleniumeasy.com/test/basic-first-form-demo.html"
driver.get(url)
eleUserMessage = driver.find_elements_by_xpath('//*[@id="user-message"]')
eleUserMessage.clear()
eleUserMessage.send_keys("Test Python")
time.sleep(2)
driver.close()

To elaborate on my comment you would use a list like this:
eleUserMessage_list = driver.find_elements_by_xpath('//*[@id="user-message"]')
my_desired_element = eleUserMessage_list[0] # or maybe [1]
my_desired_element.clear()
my_desired_element.send_keys("Test Python")
time.sleep(2)
The only real difference between find_elements_by_xpath and find_element_by_xpath is the first option returns a list that needs to be indexed. Once it's indexed, it works the same as if you had run the second option!
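The list-versus-element distinction can be reproduced without a browser. This sketch uses the standard-library xml.etree.ElementTree, whose findall and find play roughly the roles of the plural and singular Selenium lookups; the markup is made up to mirror a page where two elements share an id. One difference worth noting: ElementTree's find returns None when nothing matches, whereas Selenium's single-element lookup raises NoSuchElementException.

```python
import xml.etree.ElementTree as ET

# Made-up markup (assumption) with two elements sharing one id.
SAMPLE = """
<form>
  <input id="user-message" value="first box"/>
  <input id="user-message" value="second box"/>
</form>
"""

root = ET.fromstring(SAMPLE)

matches = root.findall(".//*[@id='user-message']")  # always a list
first = root.find(".//*[@id='user-message']")       # first match only

print(type(matches).__name__, len(matches))
print(first.get("value"))
print(matches[1].get("value"))

# A list has no element methods, which is what the original code ran into
# when it called clear() and send_keys() on the result of find_elements.
print(hasattr(matches, "get"))
```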

Selenium find_elements_by_css_selector returns an empty list

I'm trying to select all the ids which contain coupon-link keyword with the following script.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://udemycoupon.discountsglobal.com/coupon-category/free-2/")
elems = driver.find_elements_by_css_selector('[id~=\"coupon-link\"]')
print(elems)
But I got an empty list [] as the result. What's wrong with my css_selector?
I've tested that find_elements_by_css_selector('[id=\"coupon-link-92654\"]') works successfully. But I want to select all the coupon-links, not just one of them.
I referenced the document at w3schools.com.

CSS attribute selectors have three substring-matching operators: ^= (starts with), $= (ends with) and *= (contains).
Your ~= expression is not a substring match, so use *= or ^= instead. You can use XPath too.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://udemycoupon.discountsglobal.com/coupon-category/free-2/")
#select by css
#try *
css_lnks = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[id*=coupon-link]')]
#or try ^
#css_lnks = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[id^=coupon-link]')]
#select by xpath
xpth_lnks = [i.get_attribute('href') for i in driver.find_elements_by_xpath("//a[contains(@id,'coupon-link-')]")]
print(xpth_lnks)
print(css_lnks)

The ~= selector selects by value delimited by spaces. In that sense, it works similarly to a class selector matching the class attribute.
Since IDs don't usually have spaces in them (because an id attribute can only specify one ID at a time), it doesn't make sense to use ~= with the id attribute.
If you just want to select an element by a prefix in its ID, use ^=:
elems = driver.find_elements_by_css_selector('[id^=\"coupon-link\"]')
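To see why ~= finds nothing while ^= and *= do, here is a small pure-Python sketch of how each operator compares an attribute value. The function attr_matches is hypothetical and only mimics the matching rules; it is not how a browser or Selenium implements them.

```python
def attr_matches(value: str, operator: str, needle: str) -> bool:
    """Emulate how a CSS attribute selector compares an attribute value."""
    if operator == "=":
        return value == needle
    if operator == "~=":
        # Whole-word match in a whitespace-separated list, like class names.
        return needle in value.split()
    if operator == "^=":
        return needle != "" and value.startswith(needle)
    if operator == "$=":
        return needle != "" and value.endswith(needle)
    if operator == "*=":
        return needle != "" and needle in value
    raise ValueError(f"unsupported operator: {operator}")


element_id = "coupon-link-92654"
for op in ("~=", "^=", "*=", "$="):
    print(op, attr_matches(element_id, op, "coupon-link"))
```

The id is a single word, "coupon-link-92654", so the word match of ~= only succeeds for the full id, while the prefix and substring operators match on "coupon-link".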
