When I run the following function:
def checkChange():
    for user in userLinks:
        url = userLinks[user]
        response = urllib2.urlopen(url)
        html = response.read()
I get
Traceback (most recent call last):
  File "InStockBot.py", line 34, in <module>
    checkChange()
  File "InStockBot.py", line 24, in checkChange
    html = response.read()
UnboundLocalError: local variable 'response' referenced before assignment
This makes no sense to me. I have no global variable named response. I expect it to work the way it does in the interpreter:
>>> url="http://google.com"
>>> response = urllib2.urlopen(url)
>>> html = response.read()
>>> html
'<!doctype html>
Anyone know why I get this error?
You're mixing tabs and spaces. Looking at the raw code you pasted:
' def checkChange():'
' \tfor user in userLinks:'
' \t\turl = userLinks[user]'
' \t\tresponse = urllib2.urlopen(url) '
'        html = response.read()'
You can see the switch in the last line. Effectively, this means that the html = response.read() line isn't indented as far as you think it is, meaning that if userLinks is empty, you'll get:
Traceback (most recent call last):
  File "inde.py", line 10, in <module>
    checkChange()
  File "inde.py", line 5, in checkChange
    html = response.read()
UnboundLocalError: local variable 'response' referenced before assignment
Run your code using python -tt yourprogramname.py to confirm this, and switch to always using four spaces per indentation level instead of tabs.
Your code isn't indented properly. Change it to this and it'll work (probably not as intended, but it will work):
for user in userLinks:
    url = userLinks[user]
    response = urllib2.urlopen(url)
    html = response.read()
    if userSources[user] != html:
        del userSources[user]
        del userLinks[user]
        api.PostDirectMessage(user,'It appears the page has updated! Your item may be back in stock!')
The error occurs because response is only assigned inside the for loop. If the loop body never runs (i.e. userLinks is empty), the variable is never set, and the later read of response fails.
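To see the effect in isolation, here is a minimal sketch of a variable that is only assigned inside a loop body, together with one possible guard: binding a default before the loop. The function names are hypothetical and not taken from the original script.

```python
def last_upper(items):
    """Return the upper-cased last item; breaks when items is empty."""
    for item in items:
        result = item.upper()
    # If the loop body never ran, 'result' was never bound.
    return result


def last_upper_safe(items):
    """Same idea, but bind a default before the loop."""
    result = None
    for item in items:
        result = item.upper()
    return result


try:
    last_upper([])
except UnboundLocalError as exc:
    print("empty input failed:", type(exc).__name__)

print(last_upper_safe([]))
print(last_upper_safe(["a", "b"]))
```

Whether a default of None is the right choice depends on what the caller expects; moving the use of the variable inside the loop body is the other common fix.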
Related
This is my first post on this forum and I hope I explain my problem the right way.
I wrote this little web crawler to notify me when the price of a product on Amazon changes. When it does, the script sends me a notification on Telegram.
def check_price():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    title = soup.find(id='priceblock_ourprice').get_text() # this is the problematic line
    converted_price = title[0:6]
    converted_price = float(converted_price.replace(',', '.'))
    if os.path.exists('data.txt'):
        with open('data.txt', 'r+') as f:
            f_contents = f.read()
            if converted_price != float(f_contents):
                send_msg('The price was updated to: ' + str(converted_price) + '€')
                f.write(str(converted_price))
    else:
        send_msg('The price was updated to: ' + str(converted_price) + '€')
        with open('data.txt', 'w') as f:
            f.write(str(converted_price))
    return
The problem is now that it works on my local machine and I get the notification. But when I try to run the code on the server I get this message:
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    check_price()
  File "main.py", line 16, in check_price
    title = soup.find(id='priceblock_ourprice').get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'
I have only posted the main function that checks the price, not the code that sends the message, because the problem occurs before that.
I can't find the error in my code. I hope you can help, and thanks.
BS4 was working earlier today, but now it fails when trying to load a page.
import requests
from bs4 import BeautifulSoup
name = input("")
twitter = requests.get("https://twitter.com/" + name)
#instagram = requests.get("https//instagram.com/" + name)
#website = requests.get("https://" + name + ".com")
twitter_soup = BeautifulSoup(twitter, 'html.parser')
twitter_available = twitter_soup.body.findAll(text="This account doesn't exist")
if twitter_available == True:
    print("Available")
else:
    print("Not Available")
On the line where twitter_soup is declared, I get the following error:
Traceback (most recent call last):
  File "D:\Programming\Python\name-checker.py", line 12, in <module>
    twitter_soup = BeautifulSoup(twitter, 'html.parser')
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\bs4\__init__.py", line 310, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()
I have also tried the other parsers the docs suggest, but none of them work.
I just figured it out.
I had to pass the actual HTML, which is twitter.text in this case, instead of the response object returned by requests.get.
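The traceback shows the parser calling len(markup), which is why handing it the response object fails. The sketch below reproduces that failure without any network access. FakeResponse and to_markup are hypothetical names introduced for illustration; FakeResponse only mimics the fact that a response object exposes .text and has no length.

```python
class FakeResponse:
    """Hypothetical stand-in for a requests Response: it has .text but no __len__."""

    def __init__(self, text):
        self.text = text


def to_markup(source):
    """Return the HTML string to hand to a parser.

    Accepts either markup (str/bytes) or a response-like object with .text.
    """
    if isinstance(source, (str, bytes)):
        return source
    return source.text


response = FakeResponse("<html><body><p>hello</p></body></html>")

try:
    len(response)  # the same kind of call the traceback shows failing
except TypeError as exc:
    print("len() on the response object fails:", exc)

markup = to_markup(response)
print(len(markup) > 0)
```

In the original script the equivalent change is passing twitter.text as the first argument to BeautifulSoup.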
I am writing a web scraping program for the first time. The tutorial I am following was written for Python 2.7, but I am using 3.8.2. I have edited most of my code to work with Python 3, but one error pops up that I can't fix.
import requests
import csv
from bs4 import BeautifulSoup
url = 'http://www.showmeboone.com/sheriff/JailResidents/JailResidents.asp'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(features="html.parser")
results_table = soup.find('table', attrs={'class': 'resultsTable'})
output = []
for row in results_table.findAll('tr'):
    output_rows = []
    for cell in tr.findAll('td'):
        output_rows.append(cell.text.replace(' ', ''))
    output.append(output_rows)
print(output)
handle = open('out-using-requests.csv', 'a')
outfile = csv.writer(handle)
outfile.writerows(output)
The error I get is:
Traceback (most recent call last):
  File "C:\Code\scrape.py", line 17, in <module>
    for row in results_table.findAll('tr'):
AttributeError: 'NoneType' object has no attribute 'findAll'
The tutorial I am using is https://first-web-scraper.readthedocs.io/en/latest/
I tried some other questions, but they didn't help.
Please help!!!
Edit: Never mind, I got a good answer.
find returns None if it doesn't find a match. You need to check for that before attempting to find any sub elements in it:
results_table = soup.find('table', attrs={'class': 'resultsTable'})
output = []
if results_table:
    for row in results_table.findAll('tr'):
        output_rows = []
        # iterate over the cells of the current row (the name tr is never defined)
        for cell in row.findAll('td'):
            output_rows.append(cell.text.replace(' ', ''))
        output.append(output_rows)
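The same guard can be tried without third-party packages. The sketch below swaps in the standard library's xml.etree.ElementTree, whose find() also returns None when nothing matches. It is not BeautifulSoup, it needs well-formed markup, and table_rows is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def table_rows(markup, table_class="resultsTable"):
    """Return the cell text of each row, or [] when the table is missing."""
    root = ET.fromstring(markup)
    table = root.find(f".//table[@class='{table_class}']")
    if table is None:
        # Nothing matched, so there are no rows to walk.
        return []
    rows = []
    for row in table.findall("tr"):
        rows.append([(cell.text or "").strip() for cell in row.findall("td")])
    return rows


with_table = (
    '<html><body><table class="resultsTable">'
    "<tr><td>A</td><td> B </td></tr>"
    "</table></body></html>"
)
without_table = "<html><body><p>no table here</p></body></html>"

print(table_rows(with_table))
print(table_rows(without_table))
```

The check is written as `is None` rather than a plain truth test because an ElementTree element with no children is falsy, which would hide a table that exists but is empty.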
The error leads to the following conclusion:
results_table = None
Therefore, you cannot call findAll(), because None has no findAll() method.
It is best to step through your program with a debugger, watch how the variables change line by line, and see why the line in question returns None. The important line is:
results_table = soup.find('table', attrs={'class': 'resultsTable'})
This is where results_table is assigned, so this is where find() returns None and that value is stored. Note that in the posted code BeautifulSoup(features="html.parser") is called without the downloaded html, so the soup is empty and find() has nothing to match.
I was trying to write a simple program to extract words from the paragraphs of a web page.
My code looks like this:
import requests
from bs4 import BeautifulSoup
import operator
def start(url):
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code)
    for post_text in soup.find_all('p'):
        cont = post_text.string
        words = cont.lower().split()
        for each_word in words:
            print(each_word)
            word_list.append(each_word)

start('https://lifehacker.com/why-finding-your-passion-isnt-enough-1826996673')
First I am getting this warning -
UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 17 of the file D:/Projects/Crawler/Main.py. To get rid of this warning, change code that looks like this:
BeautifulSoup(YOUR_MARKUP})
to this:
BeautifulSoup(YOUR_MARKUP, "html.parser")
markup_type=markup_type))
and then there is this error in the end:
Traceback (most recent call last):
  File "D:/Projects/Crawler/Main.py", line 17, in <module>
    start('https://lifehacker.com/why-finding-your-passion-isnt-enough-1826996673')
  File "D:/Projects/Crawler/Main.py", line 11, in start
    words = cont.lower().split()
AttributeError: 'NoneType' object has no attribute 'lower'
I have tried searching, but I have not been able to resolve or understand the problem.
You are parsing that page using the paragraph tag <p>, but that tag does not always have textual content associated to it. For instance, if you were to instead run:
def start(url):
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code)
    for post_text in soup.find_all('p'):
        print(post_text)
You would see that you're getting hits on things like advertisements: <p class="ad-label=bottom"></p>. For such a tag post_text.string is None, and as others have stated in the comments, None does not have string methods, which is literally what your error is telling you.
A simple way to guard against this would be to wrap a section of your function in a try/except block:
for post_text in soup.find_all('p'):
    try:
        cont = post_text.string
        words = cont.lower().split()
        for each_word in words:
            print(each_word)
            word_list.append(each_word)
    except AttributeError:
        pass
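An alternative to try/except is an explicit None check, which avoids silently swallowing unrelated AttributeErrors. The sketch below works on a plain list standing in for the values that post_text.string would yield (a string or None). collect_words is a hypothetical helper, not part of the original code.

```python
def collect_words(strings):
    """Lower-case and split every non-None string, skipping the None entries."""
    words = []
    for text in strings:
        if text is None:
            # e.g. an empty <p> tag, where .string gives None
            continue
        words.extend(text.lower().split())
    return words


# Stand-in for [p.string for p in soup.find_all('p')]
paragraph_strings = ["Finding Your Passion", None, "Is Not Enough"]
print(collect_words(paragraph_strings))
```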
Getting the following error:
Traceback (most recent call last):
  File "stack.py", line 31, in ?
    print >> out, "%s" % escape(p)
  File "/usr/lib/python2.4/cgi.py", line 1039, in escape
    s = s.replace("&", "&amp;") # Must be done first!
TypeError: 'NoneType' object is not callable
For the following code:
import urllib2
from cgi import escape # Important!
from BeautifulSoup import BeautifulSoup
def is_talk_anchor(tag):
    return tag.name == "a" and tag.findParent("dt", "thumbnail")

def talk_description(tag):
    return tag.name == "p" and tag.findParent("h3")

links = []
desc = []
for pagenum in xrange(1, 5):
    soup = BeautifulSoup(urllib2.urlopen("http://www.ted.com/talks?page=%d" % pagenum))
    links.extend(soup.findAll(is_talk_anchor))
    page = BeautifulSoup(urllib2.urlopen("http://www.ted.com/talks/arvind_gupta_turning_trash_into_toys_for_learning.html"))
    desc.extend(soup.findAll(talk_description))

out = open("test.html", "w")
print >>out, """<html><head><title>TED Talks Index</title></head>
<body>
<table>
<tr><th>#</th><th>Name</th><th>URL</th><th>Description</th></tr>"""
for x, a in enumerate(links):
    print >> out, "<tr><td>%d</td><td>%s</td><td>http://www.ted.com%s</td>" % (x + 1, escape(a["title"]), escape(a["href"]))
    for y, p in enumerate(page):
        print >> out, "<td>%s</td>" % escape(p)
print >>out, "</tr></table>"
I think the issue is with % escape(p). I'm trying to take the contents of that <p> out. Am I not supposed to use escape?
Also having an issue with the line:
page = BeautifulSoup(urllib2.urlopen("%s") % a["href"])
That's what I want to do, but again running into errors and wondering if there's an alternate way of doing it. Just trying to collect the links I found from previous lines and run it through BeautifulSoup again.
You have to investigate (using pdb) why one of your links comes back as None.
In particular, the traceback is self-explanatory: escape() is called with None. So you have to find out which argument is None. It's one of the items in 'links'. So why is one of your items None?
Likely because one of your calls to
def is_talk_anchor(tag):
    return tag.name == "a" and tag.findParent("dt", "thumbnail")
returns None because tag.findParent("dt", "thumbnail") returns None (due to your given HTML input).
So you have to check or filter the items in 'links' for None (or adjust your parser code above) so that you pick up only the links you actually need.
And please read your tracebacks carefully and think about what the problem might be. Tracebacks are very helpful and give you valuable information about your problem.