I'm trying to get the text of "Redeemed Highlight My Message" messages from Twitch chat. Here is my code:
from selenium import webdriver

driver = webdriver.Chrome(r'D:\Project\Project\Rebot Router\chromedriver11.exe')
driver.get("https://www.twitch.tv/nightblue3")

while True:
    text11 = driver.find_elements_by_xpath('//*[@id="6583f0b7722e3be4537e78903686d3b4"]/div/div[1]/div/div/section/div/div[3]/div[2]/div[3]/div/div/div[116]/div[2]/span[4]/span')
    text44 = driver.find_elements_by_class_name("chat-line--inline chat-line__message")
    print(str(text11))
    print(str(text44))
But when I run it, this is the output I get:
[]
[]
[]
[]
[]
[]
[]
[]
[]
And when I use .text, like this:
while True:
    text11 = driver.find_elements_by_xpath('//*[@id="6583f0b7722e3be4537e78903686d3b4"]/div/div[1]/div/div/section/div/div[3]/div[2]/div[3]/div/div/div[116]/div[2]/span[4]/span').text
    text44 = driver.find_elements_by_class_name("chat-line--inline chat-line__message").text
    print(str(text11))
    print(str(text44))
this is what I get:
Traceback (most recent call last):
File "D:/Project/Project/Rebot Router/test.py", line 7, in <module>
text11= driver.find_elements_by_xpath('//*[@id="6583f0b7722e3be4537e78903686d3b4"]/div/div[1]/div/div/section/div/div[3]/div[2]/div[3]/div/div/div[116]/div[2]/span[4]/span').text
AttributeError: 'list' object has no attribute 'text'
So, any help please? By the way, text11 and text44 target the same message; I just use an XPath for text11 and a class name for text44.
while True:
    Texts = driver.find_elements_by_xpath("//span[@class='text-fragment']")
    for t in Texts:
        print(t.text)
I'm new to web scraping with python and am having a problem with the weather web scraping script I wrote. Here is the whole code 'weather.py':
#! python3
import bs4, requests
weatherSite = requests.get('https://weather.com/en-CA/weather/today/l/eef019cb4dca2160f08eb9714e30f28e05e624bbae351ccb6a855dbc7f14f017')
weatherSoup = bs4.BeautifulSoup(weatherSite.text, 'html.parser')
weatherLoc = weatherSoup.select('.CurrentConditions--location--kyTeL')
weatherTime = weatherSoup.select('.CurrentConditions--timestamp--23dfw')
weatherTemp = weatherSoup.select('.CurrentConditions--tempValue--3a50n')
weatherCondition = weatherSoup.select('.CurrentConditions--phraseValue--2Z18W')
weatherDet = weatherSoup.select('.CurrentConditions--precipValue--3nxCj > span:nth-child(1)')
location = weatherLoc[0].text
time = weatherTime[0].text
temp = weatherTemp[0].text
condition = weatherCondition[0].text
det = weatherDet[0].text
print(location)
print(time)
print(temp + 'C')
print(condition)
print(det)
It basically parses the weather information from 'The Weather Channel' and prints it out. This code was working fine yesterday when I wrote it, but when I ran it today it gave me the following error:
Traceback (most recent call last):
File "C:\Users\username\filesAndStuff\weather.py", line 16, in <module>
location = weatherLoc[0].text
IndexError: list index out of range
Replace:
weatherLoc = weatherSoup.select('.CurrentConditions--location--kyTeL')
# print(weatherLoc)
# []
By:
weatherLoc = weatherSoup.select('h1[class*="CurrentConditions--location--"]')
# print(weatherLoc)
# [<h1 class="CurrentConditions--location--2_osB">Hamilton, Ontario Weather</h1>]
As you can see, your suffix kyTeL is not the same as mine, 2_osB: the site regenerates these class-name suffixes, so a hard-coded one will stop matching. You need a partial match on the class attribute with class*= (note the *).
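The same prefix-matching idea can be applied to every selector in the script. A minimal, self-contained sketch of the partial-match selector, using inline HTML in place of a live request (the 2_osB suffix here is just an example):

```python
import bs4

# Inline HTML standing in for the live page; the suffix changes between deployments
html = '<h1 class="CurrentConditions--location--2_osB">Hamilton, Ontario Weather</h1>'
soup = bs4.BeautifulSoup(html, 'html.parser')

# class*= matches any class attribute that *contains* the stable prefix,
# so the randomly generated suffix no longer matters
weatherLoc = soup.select('h1[class*="CurrentConditions--location--"]')
print(weatherLoc[0].text)  # Hamilton, Ontario Weather
```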
from selenium import webdriver
import time
import random

driver = webdriver.Chrome()
videos = [
    'https://www.youtube.com/watch?v=HPLJ-k8Mt-I'
    'https://www.youtube.com/watch?v=nN1XDURBNYo'
    'https://www.youtube.com/watch?v=KUTXpFO2yDo'
    'https://www.youtube.com/watch?v=MccLaedI05g'
    'https://www.youtube.com/watch?v=eHMSMGfkGjY'
    'https://www.youtube.com/watch?v=c86GdHQaLsY'
    'https://www.youtube.com/watch?v=dz59GsdvUF8'
    'https://www.youtube.com/watch?v=Xjv1sY630Uc'
]
for i in range(1000):
    print("vid is running for", i)
    randomvideo = random.randint(0, 3)
    print(randomvideo)
    driver.get(videos[randomvideo])
    sleep_time = random.randint(5, 10)
    time.sleep(sleep_time)
driver.quit()
The output is:
Traceback (most recent call last):
File "/home/asus/PycharmProjects/pythonProject2/selenum practice.py", line 21, in <module>
driver.get(videos[randomvideo])
IndexError: list index out of range
How do I fix this? The value should be one of the items in the list. Sorry, this is my first time using Stack Overflow. Please help.
It should be written like this, with a comma after each URL:
videos = [
'https://www.youtube.com/watch?v=HPLJ-k8Mt-I',
'https://www.youtube.com/watch?v=nN1XDURBNYo',
'https://www.youtube.com/watch?v=KUTXpFO2yDo',
'https://www.youtube.com/watch?v=MccLaedI05g',
'https://www.youtube.com/watch?v=eHMSMGfkGjY',
'https://www.youtube.com/watch?v=c86GdHQaLsY',
'https://www.youtube.com/watch?v=dz59GsdvUF8',
'https://www.youtube.com/watch?v=Xjv1sY630Uc'
]
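The reason is Python's implicit string-literal concatenation: without commas, adjacent string literals are merged into a single string, so the original list had length 1 and randint(0, 3) usually produced an out-of-range index. A quick demonstration:

```python
# Without a comma, adjacent string literals fuse into one long string
videos_wrong = [
    'https://www.youtube.com/watch?v=HPLJ-k8Mt-I'
    'https://www.youtube.com/watch?v=nN1XDURBNYo'
]
# With commas, each URL is its own list element
videos_right = [
    'https://www.youtube.com/watch?v=HPLJ-k8Mt-I',
    'https://www.youtube.com/watch?v=nN1XDURBNYo',
]
print(len(videos_wrong))  # 1
print(len(videos_right))  # 2
```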
I'm trying to scrape multiple tables with the same class name using BeautifulSoup 4 and Python.
import requests
from bs4 import BeautifulSoup
import csv

standingsURL = "https://efl.network/index/efl/Standings.html"
standingsPage = requests.get(standingsURL)
standingsSoup = BeautifulSoup(standingsPage.content, 'html.parser')
standingTable = standingsSoup.find_all('table', class_='Grid')
standingTitles = standingTable.find_all("tr", class_='hilite')
standingHeaders = standingTable.find_all("tr", class_="alt")
However, when running this, it gives me the following error:
Traceback (most recent call last):
File "C:/Users/user/Desktop/program.py", line 15, in <module>
standingTitles = standingTable.find_all("tr", class_='hilite')
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\site-packages\bs4\element.py", line 2128, in __getattr__
"ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
If I change standingTable = standingsSoup.find_all('table', class_='Grid') to
standingTable = standingsSoup.find('table', class_='Grid')
it works, but it only gives me the data of one of the tables, while I'm trying to get the data of both.
Try this.
from simplified_scrapy import SimplifiedDoc,req,utils
standingsURL = "https://efl.network/index/efl/Standings.html"
standingsPage = req.get(standingsURL)
doc = SimplifiedDoc(standingsPage)
standingTable = doc.selects('table.Grid')
standingTitles = standingTable.selects("tr.hilite")
standingHeaders = standingTable.selects("tr.alt")
print(standingTitles.tds.text)
Result:
[[['Wisconsin Brigade', '10', '3', '0', '.769', '386', '261', '6-1-0', '4-2-0', '3-2-0', '3-2-0', 'W4'], ...
Here are more examples. https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
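If you'd rather stay with BeautifulSoup, note that find_all returns a ResultSet, which is essentially a list of tags: you can loop over it and call find_all on each table individually. A minimal sketch with inline HTML standing in for the real page:

```python
from bs4 import BeautifulSoup

# Two tables with the same class, mimicking the standings page structure
html = """
<table class="Grid"><tr class="hilite"><td>Team A</td></tr></table>
<table class="Grid"><tr class="hilite"><td>Team B</td></tr></table>
"""
soup = BeautifulSoup(html, 'html.parser')

rows = []
for table in soup.find_all('table', class_='Grid'):      # iterate the ResultSet
    rows.extend(table.find_all('tr', class_='hilite'))   # find_all works on each tag

print([row.td.text for row in rows])  # ['Team A', 'Team B']
```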
My script stops scraping after the 449th Yelp restaurant.
Entire Code: https://pastebin.com/5U3irKZp
for idx, item in enumerate(yelp_containers, 1):
    print("--- Restaurant number #", idx)
    restaurant_title = item.h3.get_text(strip=True)
    restaurant_title = re.sub(r'^[\d.\s]+', '', restaurant_title)
    restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
The error I am getting is:
Traceback (most recent call last):
File "/Users/kenny/MEGA/Python/yelp scraper.py", line 41, in
restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
IndexError: list index out of range
The problem is that some restaurants are missing the address entirely.
What you should do is check first whether the split result has enough elements before indexing into it. Change this line of code:
restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
to these:
restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')
restaurant_address = restaurant_address[1] if len(restaurant_address) > 1 else restaurant_address[0]
I ran your parser for all pages and it worked.
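The guard generalizes to any "split, then index" pattern: take the second field when it exists, and fall back to the first otherwise. A small standalone sketch of that fallback (the helper name is my own, not from the original script):

```python
def second_or_first(parts):
    """Return parts[1] when present, otherwise fall back to parts[0]."""
    return parts[1] if len(parts) > 1 else parts[0]

# A listing with an address yields two fields after splitting on '|'
with_address = 'Restaurant Name|123 Main St'.split('|')
# A listing without an address yields only one field
without_address = 'Restaurant Name'.split('|')

print(second_or_first(with_address))     # 123 Main St
print(second_or_first(without_address))  # Restaurant Name
```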
I am trying to get the program below working. It is supposed to find email addresses on a website, but it is breaking. I suspect the problem is with initializing result = [] inside the crawl function. Here is the code:
# -*- coding: utf-8 -*-
import requests
import re
import urlparse

# In this example we're trying to collect e-mail addresses from a website

# Basic e-mail regexp:
# letter/number/dot/comma @ letter/number/dot/comma . letter/number
email_re = re.compile(r'([\w\.,]+@[\w\.,]+\.\w+)')

# HTML <a> regexp
# Matches href="" attribute
link_re = re.compile(r'href="(.*?)"')

def crawl(url, maxlevel):
    result = []
    # Limit the recursion, we're not downloading the whole Internet
    if(maxlevel == 0):
        return
    # Get the webpage
    req = requests.get(url)
    # Check if successful
    if(req.status_code != 200):
        return []
    # Find and follow all the links
    links = link_re.findall(req.text)
    for link in links:
        # Get an absolute URL for a link
        link = urlparse.urljoin(url, link)
        result += crawl(link, maxlevel - 1)
    # Find all emails on current page
    result += email_re.findall(req.text)
    return result

emails = crawl('http://ccs.neu.edu', 2)

print "Scrapped e-mail addresses:"
for e in emails:
    print e
The error I get is below:
C:\Python27\python.exe "C:/Users/Sagar Shah/PycharmProjects/crawler/webcrawler.py"
Traceback (most recent call last):
File "C:/Users/Sagar Shah/PycharmProjects/crawler/webcrawler.py", line 41, in <module>
emails = crawl('http://ccs.neu.edu', 2)
File "C:/Users/Sagar Shah/PycharmProjects/crawler/webcrawler.py", line 35, in crawl
result += crawl(link, maxlevel - 1)
File "C:/Users/Sagar Shah/PycharmProjects/crawler/webcrawler.py", line 35, in crawl
result += crawl(link, maxlevel - 1)
TypeError: 'NoneType' object is not iterable
Process finished with exit code 1
Any suggestions will help. Thanks!
The problem is this:
if(maxlevel == 0):
    return
Currently it returns None when maxlevel == 0, and you can't concatenate a list with a None object.
You need to return an empty list [] to be consistent.