This is my first post on this forum, so I hope I explain my problem the right way.
I wrote a little web crawler that checks the price of a product on Amazon and, when the price changes, sends me a notification via Telegram.
def check_price():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    title = soup.find(id='priceblock_ourprice').get_text()  # this is the problematic line
    converted_price = title[0:6]
    converted_price = float(converted_price.replace(',', '.'))
    if os.path.exists('data.txt'):
        with open('data.txt', 'r+') as f:
            f_contents = f.read()
            if converted_price != float(f_contents):
                send_msg('The price was updated to: ' + str(converted_price) + '€')
                f.write(str(converted_price))
    else:
        send_msg('The price was updated to: ' + str(converted_price) + '€')
        with open('data.txt', 'w') as f:
            f.write(str(converted_price))
    return
The problem is that it works on my local machine and I get the notification, but when I run the code on the server I get this message:
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    check_price()
  File "main.py", line 16, in check_price
    title = soup.find(id='priceblock_ourprice').get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'
I'm only posting the main price-checking function, not the sending code, because the problem occurs before that.
I can't find the error in what I did. I hope you can help me, and thanks.
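For what it's worth, a common cause of this on servers is Amazon returning a captcha or robot-check page, in which case soup.find(id='priceblock_ourprice') matches nothing and returns None. A minimal sketch of a guarded version of the conversion step (parse_price is a hypothetical helper name, not from the original script):

```python
def parse_price(price_text):
    """Mirror the snippet's conversion (first six chars, comma to dot),
    but tolerate soup.find(...) having returned no element."""
    if price_text is None:  # find() matched nothing on the fetched page
        return None
    return float(price_text[0:6].replace(',', '.'))

print(parse_price('19,99 €'))  # 19.99
print(parse_price(None))       # None
```

In check_price this means fetching the element first (element = soup.find(id='priceblock_ourprice')) and only calling element.get_text() when element is not None.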
So BS4 was working earlier today; however, it now has problems when trying to load a page.
import requests
from bs4 import BeautifulSoup

name = input("")

twitter = requests.get("https://twitter.com/" + name)
#instagram = requests.get("https//instagram.com/" + name)
#website = requests.get("https://" + name + ".com")

twitter_soup = BeautifulSoup(twitter, 'html.parser')
twitter_available = twitter_soup.body.findAll(text="This account doesn't exist")

if twitter_available == True:
    print("Available")
else:
    print("Not Available")
On the line where twitter_soup is declared, I get the following error:
Traceback (most recent call last):
  File "D:\Programming\Python\name-checker.py", line 12, in <module>
    twitter_soup = BeautifulSoup(twitter, 'html.parser')
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\bs4\__init__.py", line 310, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()
I have also tried the other parsers the docs suggest; however, none of them work.
I just figured it out: I had to pass the actual HTML, which in this situation is twitter.text, instead of the Response object itself.
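To make the distinction concrete: BeautifulSoup accepts a string (or bytes) of markup, not the Response wrapper, which is exactly why len(markup) blows up inside its __init__. A small network-free illustration (FakeResponse is a stand-in I'm inventing; real code would pass requests.get(...).text):

```python
class FakeResponse:
    """Minimal stand-in for requests.Response (illustration only)."""
    def __init__(self, text):
        self.text = text

twitter = FakeResponse("<html><body>This account doesn't exist</body></html>")

# BeautifulSoup(twitter, 'html.parser') fails because the response
# object has no len(); the markup string itself does.
try:
    len(twitter)
except TypeError:
    print("no len() on the response object")

markup = twitter.text  # this is what the parser should receive
print(isinstance(markup, str))  # True
```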
I am making a program for web scraping, but this is my first time. The tutorial I am using is built for Python 2.7, but I am using 3.8.2. I have mostly adapted my code to Python 3, but one error pops up that I can't fix.
import requests
import csv
from bs4 import BeautifulSoup

url = 'http://www.showmeboone.com/sheriff/JailResidents/JailResidents.asp'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(features="html.parser")
results_table = soup.find('table', attrs={'class': 'resultsTable'})

output = []
for row in results_table.findAll('tr'):
    output_rows = []
    for cell in tr.findAll('td'):
        output_rows.append(cell.text.replace(' ', ''))
    output.append(output_rows)

print(output)

handle = open('out-using-requests.csv', 'a')
outfile = csv.writer(handle)
outfile.writerows(output)
The error I get is:
Traceback (most recent call last):
  File "C:\Code\scrape.py", line 17, in <module>
    for row in results_table.findAll('tr'):
AttributeError: 'NoneType' object has no attribute 'findAll'
The tutorial I am using is https://first-web-scraper.readthedocs.io/en/latest/
I tried some other questions, but they didn't help.
Please help!!!
Edit: Never mind, I got a good answer.
find returns None if it doesn't find a match. You need to check for that before attempting to find any sub-elements in it (note also that the inner loop should iterate over row, not the undefined name tr):

results_table = soup.find('table', attrs={'class': 'resultsTable'})

output = []
if results_table:
    for row in results_table.findAll('tr'):
        output_rows = []
        for cell in row.findAll('td'):
            output_rows.append(cell.text.replace(' ', ''))
        output.append(output_rows)
The error allows the following conclusion: results_table is None. Therefore you cannot call the findAll() method on it, because None has no findAll().
It is best to use a debugger to step through your program and see how the variables change line by line, and why the line in question only produces None. Especially important is this line:
results_table = soup.find('table', attrs={'class': 'resultsTable'})
because this is where results_table is initialized: find() returns None here, and that None value is assigned to results_table.
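One more observation on the question's snippet: html = response.content is built but never passed to BeautifulSoup(features="html.parser"), so soup is parsing an empty document and find() can only return None. A self-contained sketch of the parsing step with markup actually supplied (the inline table is made-up sample data, and the undefined tr is replaced with row):

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

html = """
<table class="resultsTable">
  <tr><td>Doe, John</td><td>2020-01-01</td></tr>
  <tr><td>Roe, Jane</td><td>2020-01-02</td></tr>
</table>
"""

soup = BeautifulSoup(html, features="html.parser")  # markup passed in
results_table = soup.find('table', attrs={'class': 'resultsTable'})

output = []
if results_table:  # guard against find() returning None
    for row in results_table.findAll('tr'):
        output.append([cell.text for cell in row.findAll('td')])

print(output)
```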
I'm programming in school and will soon need to program my final piece. The following Python program is one I'm writing simply to practice accessing APIs.
I'm attempting to access the API for a site based on a game. The idea of the program is to check this API every 30 seconds for changes in the data: it stores two values ('baseRank' and 'basePP') as soon as it starts running, then compares them with new data taken 30 seconds later.
Here is my program:
import time

apiKey = '###'
rankDifferences = []
ppDifferences = []
const = True
username = '- Legacy'
url = "https://osu.ppy.sh/api/get_user?u={1}&k={0}".format(apiKey, username)

import urllib.request, json
with urllib.request.urlopen(url) as url:
    stats = json.loads(url.read().decode())

stats = stats[0]
basePP = stats['pp_raw']
print(basePP)
baseRank = stats['pp_rank']
print(baseRank)

while const == True:
    time.sleep(30)
    import urllib.request, json
    with urllib.request.urlopen(url) as url:
        check = json.loads(url.read().decode())
    check = check[0]
    rankDifference = baseRank + check['pp_rank']
    ppDifference = basePP + check['pp_raw']
    baseRank = check['pp_raw']
    basePP = check['pp_raw']
    if rankDifference != 0:
        print(rankDifference)
    if ppDifference != 0:
        print(ppDifference)
Please note: where I have written apiKey = '###', I am in fact using a real, working API key, but I've hidden it because the site asks you not to share your API key with others.
Here is the state of the shell after running:
5206.55
12045
Traceback (most recent call last):
  File "C:/Users/ethan/Documents/osu API Accessor.py", line 23, in <module>
    with urllib.request.urlopen(url) as url:
  File "C:\Users\ethan\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\ethan\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 518, in open
    protocol = req.type
AttributeError: 'HTTPResponse' object has no attribute 'type'
As you can see, it does print both 'basePP' and 'baseRank', proving that I can access this API. The problem seems to be when I try to access it a second time. To be completely honest, I'm not entirely sure what this error means, so if you wouldn't mind taking the time to explain and/or help fix it, it would be greatly appreciated.
Side note: This is my first time using this forum so if I'm doing anything wrong, I'm very sorry!
The problem seems to be when you do:
with urllib.request.urlopen(url) as url:
    stats = json.loads(url.read().decode())
Your use of the url variable rebinds it, so when you try to use it later it no longer holds the URL.
Try something like:
with urllib.request.urlopen(url) as page:
    stats = json.loads(page.read().decode())
and it should be okay.
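The underlying issue is that with urllib.request.urlopen(url) as url: rebinds url from the URL string to the HTTPResponse object, so the second urlopen(url) receives a response instead of a URL. A network-free illustration of the rebinding (FakePage is an invented stand-in for the HTTPResponse; the JSON mimics the osu! fields from the question):

```python
import json

class FakePage:
    """Invented stand-in for the HTTPResponse returned by urlopen."""
    def read(self):
        return b'[{"pp_raw": "5206.55", "pp_rank": "12045"}]'
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False

url = "https://osu.ppy.sh/api/get_user?u=...&k=..."

# Bind the response to a different name so the URL string survives
# for the next request.
with FakePage() as page:
    stats = json.loads(page.read().decode())[0]

print(stats['pp_rank'])  # 12045
print(url)               # the URL string is still intact
```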
A little while ago, I posted on here for help using the API to download data from Tumblr blogs. birryree (https://stackoverflow.com/users/297696/birryree) was kind enough to help me correct my script and figure out where I had been going wrong, and I have been using his script with no problems since (Print more than 20 posts from Tumblr API).
This script requires that I manually input the blog name that I want to download each time. However, I need to download hundreds of blogs, so this has led to me working with hundreds of versions of the same script, which is very time-consuming. I did some googling and found that it is possible to write Python scripts where you input arguments from the command line, which are then processed (if that's the right terminology) one by one.
I tried to write a script that lets me run a command from the command prompt and then downloads the three blogs I've specified there (in this case prettythingsicantafford.tumblr.com, theficrecfairy.tumblr.com, and staff.tumblr.com).
So my script that I'm trying to run is:
import pytumblr
import sys

def get_all_posts(client, blog):
    offset = 0
    while True:
        response = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)
        # Get the 'posts' field of the response
        posts = response['posts']
        if not posts: return
        for post in posts:
            yield post
        # move to the next offset
        offset += 20

client = pytumblr.TumblrRestClient('SECRET')
blog = (sys.argv[1], sys.argv[2], sys.argv[3])

# use our function
with open('{}-posts.txt'.format(blog), 'w') as out_file:
    for post in get_all_posts(client, blog):
        print >>out_file, post
I am running the following command from the command prompt:
tumblr_test2.py theficrecfairy prettythingsicantafford staff
However, I get the following error message:
Traceback (most recent call last):
  File "C:\Users\izzy\test\tumblr_test2.py", line 29, in <module>
    for post in get_all_posts(client, blog):
  File "C:\Users\izzy\test\tumblr_test2.py", line 8, in get_all_posts
    response = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)
  File "C:\Python27\lib\site-packages\pytumblr\helpers.py", line 46, in add_dot_tumblr
    args[1] += ".tumblr.com"
TypeError: can only concatenate tuple (not "str") to tuple
I have been trying to modify my script for about two weeks now in response to this error, but I have been unable to correct my no doubt very obvious mistake and would be very grateful for any help or advice.
EDIT FOLLOWING vishes_shell's ADVICE:
I am now working with the following script:
import pytumblr
import sys

def get_all_posts(client, blogs):
    for blog in blogs:
        offset = 0
        while True:
            response = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True, filter='raw')
            # Get the 'posts' field of the response
            posts = response['posts']
            if not posts: return
            for post in posts:
                yield post
            # move to the next offset
            offset += 20

client = pytumblr.TumblrRestClient('SECRET')
blog = sys.argv

# use our function
with open('{}-postsredux.txt'.format(blog), 'w') as out_file:
    for post in get_all_posts(client, blog):
        print >>out_file, post
However, I now get the following error message:
Traceback (most recent call last):
  File "C:\Users\izzy\test\tumblr_test2.py", line 27, in <module>
    with open('{}-postsredux.txt'.format(blog), 'w') as out_file:
IOError: [Errno 22] invalid mode ('w') or filename: "['C:\\\\Users\\\\izzy\\\\test\\\\tumblr_test2.py', 'prettythingsicantafford', 'theficrecfairy']-postsredux.txt"
The problem is that you are calling client.posts(blog, ...) while blog is a tuple object, declared as:
blog = (sys.argv[1], sys.argv[2], sys.argv[3])
You need to refactor your method to go over each blog separately.
def get_all_posts(client, blogs):
    for blog in blogs:
        offset = 0
        ...
        while True:
            response = client.posts(blog, ...)
            ...
        ...

blog = sys.argv
...
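This also explains the IOError in the edit: blog = sys.argv includes the script path as argv[0], and formatting that whole list into one filename produces an invalid path. A sketch of slicing the arguments and deriving one output file per blog (blog_filenames is a name I'm introducing; the pytumblr calls are omitted):

```python
def blog_filenames(argv):
    """argv[0] is the script path; the rest are blog names.
    One file per blog keeps '{}-posts.txt'.format(blog) a valid filename."""
    return {blog: '{}-posts.txt'.format(blog) for blog in argv[1:]}

names = blog_filenames(['tumblr_test2.py', 'theficrecfairy', 'staff'])
print(names)  # {'theficrecfairy': 'theficrecfairy-posts.txt', 'staff': 'staff-posts.txt'}
```

The main script would then open names[blog] inside the per-blog loop instead of once for the whole run.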
When I run the following function:
def checkChange():
    for user in userLinks:
        url = userLinks[user]
        response = urllib2.urlopen(url)
    html = response.read()
I get
Traceback (most recent call last):
  File "InStockBot.py", line 34, in <module>
    checkChange()
  File "InStockBot.py", line 24, in checkChange
    html = response.read()
UnboundLocalError: local variable 'response' referenced before assignment
This makes no sense to me; I have no global variable response. I expect it to work normally, as below:
>>> url="http://google.com"
>>> response = urllib2.urlopen(url)
>>> html = response.read()
>>> html
'<!doctype html>
Anyone know why I get this error?
You're mixing tabs and spaces. Looking at the raw code you pasted:
' def checkChange():'
' \tfor user in userLinks:'
' \t\turl = userLinks[user]'
' \t\tresponse = urllib2.urlopen(url) '
' html = response.read()'
You can see the switch in the last line. Effectively, this means that the html = response.read() line isn't indented as far as you think it is, meaning that if userLinks is empty, you'll get:
Traceback (most recent call last):
  File "inde.py", line 10, in <module>
    checkChange()
  File "inde.py", line 5, in checkChange
    html = response.read()
UnboundLocalError: local variable 'response' referenced before assignment
Run your code using python -tt yourprogramname.py to confirm this, and switch to always using four-space tabs.
Your code isn't indented properly. Change it to this and it'll work (probably not as intended, but it will work):
for user in userLinks:
    url = userLinks[user]
    response = urllib2.urlopen(url)
    html = response.read()
    if userSources[user] != html:
        del userSources[user]
        del userLinks[user]
        api.PostDirectMessage(user, 'It appears the page has updated! Your item may be back in stock!')
The error occurs because response is only assigned inside the for loop; if the loop never runs (i.e. userLinks is empty), the variable is never set.
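The failure mode is easy to reproduce without any networking: a name assigned only inside a for loop is unbound when the iterable is empty. A stripped-down stand-in for the question's function (check_change and the fake fetch are invented for illustration):

```python
def check_change(user_links):
    for user in user_links:
        response = "fetched " + user_links[user]  # stands in for urlopen(url)
    # This line sits outside the loop, like the mis-indented read() call:
    return response

print(check_change({"alice": "http://example.com"}))  # fetched http://example.com

try:
    check_change({})  # empty dict: the loop body never runs
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
```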