I have been at this for a while, but I can't seem to get the text value of an element in a ResultSet object using BeautifulSoup. Here is the method that is failing:
def __getNameOfProduct(self, product):
    # product is of type bs4.ResultSet...
    for value in product:
        print value.find_all("div",class_="proddisc").title.string
It's my own markup, so I don't have a URL (I am working through a tutorial), but here is the error I am getting:
Traceback (most recent call last):
File "ctd.py", line 64, in <module>
main()
File "ctd.py", line 60, in main
p.getItemsInStock()
File "ctd.py", line 26, in getItemsInStock
return self.__returnItemDetailAsDictionary(itemDetail)
File "ctd.py", line 32, in __returnItemDetailAsDictionary
nameOfProduct = self.__getNameOfProduct(product)
File "ctd.py", line 44, in __getNameOfProduct
print value.find_all("div",class_="proddisc").title.string
AttributeError: 'ResultSet' object has no attribute 'title'
Any help would be very much appreciated.
Thanks!
The way you are accessing the attribute is only valid for a single element, not for the collection of elements that find_all returns.
As I understand your requirement, something like this should work:
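import urllib
from bs4 import BeautifulSoup

html = urllib.urlopen("http://yoursite.com")
soup = BeautifulSoup(html)
# find_all returns a ResultSet (a list of tags), so loop over it
prodisc_div = soup.find_all('div', attrs={"class": "proddisc"})
for each in prodisc_div:
    print each.get("title")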
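To illustrate the difference between the ResultSet and the individual tags inside it, here is a minimal Python 3 sketch. The original markup was not posted, so the HTML below (a `proddisc` div with a `title` attribute and some text) is made up for the example, and it assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the asker's page.
html = """
<div class="proddisc" title="Red mug">A red ceramic mug</div>
<div class="proddisc" title="Blue mug">A blue ceramic mug</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet (a list of Tag objects), so per-element
# attributes and text must be read from each Tag, one at a time.
products = soup.find_all("div", class_="proddisc")
titles = [div.get("title") for div in products]
texts = [div.get_text(strip=True) for div in products]

print(titles)
print(texts)
```

If only the first match is needed, `soup.find(...)` returns a single Tag (or None) instead of a ResultSet.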
First of all, I am getting this error. When I try running
pip3 install --upgrade json
in an attempt to resolve the error, python is unable to find the module.
The segment of code I am working with can be found below the error, and any further direction on the code itself would be appreciated.
Error:
Traceback (most recent call last):
File "Chicago_cp.py", line 18, in <module>
StopWork_data = json.load(BeautifulSoup(StopWork_response.data,'lxml'))
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
TypeError: 'NoneType' object is not callable
Script:
#!/usr/bin/python
import json
from bs4 import BeautifulSoup
import urllib3
http = urllib3.PoolManager()
# Define Merge
def Merge(dict1, dict2):
    res = {**dict1, **dict2}
    return res
# Open the URL and the screen name
StopWork__url = "someJsonUrl"
Violation_url = "anotherJsonUrl"
StopWork_response = http.request('GET', StopWork__url)
StopWork_data = json.load(BeautifulSoup(StopWork_response.data,'lxml'))
Violation_response = http.request('GET', Violation_url)
Violation_data = json.load(BeautifulSoup(Violation_response.data,'lxml'))
dict3 = Merge(StopWork_data,Violation_data)
print(dict3)
json.load expects a file object or something else with a read method. A BeautifulSoup object doesn't have a read method. Instead, any attribute you ask it for is looked up as a child tag with that name, i.e. a <read> tag in this case. When it doesn't find one it returns None, and calling None is what raises the TypeError. Here's a demo:
import json
from bs4 import BeautifulSoup
soup = BeautifulSoup("<p>hi</p>", "html5lib")
assert soup.read is None
assert soup.blablabla is None
assert json.loads is not None
json.load(soup)
Output:
Traceback (most recent call last):
File "main.py", line 8, in <module>
json.load(soup)
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
TypeError: 'NoneType' object is not callable
If the URL is returning JSON then you don't need BeautifulSoup at all because that's for parsing HTML and XML. Just use json.loads(response.data).
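As a rough sketch of that suggestion, the two response bodies can be decoded and merged without any HTML parser. The byte strings below are invented stand-ins for `StopWork_response.data` and `Violation_response.data`, and the sketch assumes each endpoint returns a JSON object (a dict) at the top level.

```python
import json

# Hypothetical response bodies; with urllib3 these would be
# StopWork_response.data and Violation_response.data (bytes).
stop_work_body = b'{"stop_work_orders": 3}'
violation_body = b'{"violations": 7}'

# json.loads accepts str, bytes, or bytearray, so no HTML parser is needed.
stop_work_data = json.loads(stop_work_body)
violation_data = json.loads(violation_body)

# Merge the two dictionaries; keys from the second win on conflict.
merged = {**stop_work_data, **violation_data}
print(merged)
```

If an endpoint returns a JSON array instead of an object, the `{**a, **b}` merge will fail, and the two lists would need to be combined some other way.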
I'm building an application to download content from Instagram, and I'm writing a function to download Instagram stories.
The instaloader documentation says:
download_storyitem(item, target)
Download one user story.
Parameters
item (StoryItem) – Story item, as in story['items'] for story in get_stories()
target (Union[str, Path]) – Replacement for {target} in dirname_pattern and filename_pattern
And this is my code:
def get_stories(self):
    get = root.get_stories(userids=6328186396)
    root.download_storyitem(get, ['file_name.jpg', "C:\Users"])
But this produces an error:
Traceback (most recent call last):
File "instagram_downloader.py", line 68, in <module>
main()
File "instagram_downloader.py", line 59, in main
start.get_stories()
File "instagram_downloader.py", line 21, in get_stories
root.download_storyitem(get, ['file_name.jpg', "\Pictures"])
File "C:\Users\codin\AppData\Local\Programs\Python\Python38-32\lib\site-packages\instaloader\instaloader.py", line 631, in download_storyitem
date_local = item.date_local
AttributeError: 'generator' object has no attribute 'date_local'
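get_stories() returns a generator, so its result cannot be passed directly to download_storyitem, which expects a single StoryItem. Pass each item individually, and give the target as a single string rather than a list. For example:
download_storyitem(item, '{}/{}'.format(highlight.owner_username, highlight.title))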
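For stories specifically, the loop might look roughly like the sketch below. The helper name `download_all_stories` is made up for illustration. It assumes `loader` is a logged-in `instaloader.Instaloader` instance (the `root` object in the question), that `userids` is given as a list, and that each story exposes `get_items()` and `owner_username`. I have not run this against a live account.

```python
def download_all_stories(loader, user_ids):
    """Download every story item for the given user IDs.

    `loader` is assumed to be a logged-in instaloader.Instaloader instance.
    Returns the number of items passed to download_storyitem.
    """
    count = 0
    # get_stories yields Story objects lazily (it is a generator),
    # which is why it cannot be handed to download_storyitem directly.
    for story in loader.get_stories(userids=user_ids):
        # Each Story yields its individual StoryItem objects.
        for item in story.get_items():
            # target is a single string (or Path), not a list.
            loader.download_storyitem(item, target=story.owner_username)
            count += 1
    return count
```

With the loader from the question, the call would be something like `download_all_stories(root, [6328186396])`.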
Need help with an adidas auto checkout script. Getting the following error:
Traceback (most recent call last):
File "adidas.py", line 169, in <module>
checkout()
File "adidas.py", line 80, in checkout
url = soup.find('div', {'class': 'cart_wrapper rbk_shadow_angle rbk_wrapper_checkout summary_wrapper'})['data-url']
TypeError: 'NoneType' object is not subscriptable
Link to the entire script: https://github.com/kfichter/OpenATC/blob/482360a7a160136a4969d2cf0527809660d021fb/Scripts/adidas.py
soup.find() is returning None. You are trying to look up the key 'data-url' in this result, but None does not support key lookup.
Depending on what you're trying to do, you should either change the query so it doesn't return None, or check that the value is not None before trying to access the 'data-url' key.
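A minimal sketch of the second option (checking for None before the key lookup), written for Python 3 with bs4. The helper name `get_data_url` and the markup are invented for the example. The real page and class names come from the linked script.

```python
from bs4 import BeautifulSoup

# Hypothetical page where the expected cart wrapper is absent.
html = '<div class="something_else" data-url="/checkout">...</div>'
soup = BeautifulSoup(html, "html.parser")


def get_data_url(soup, css_class):
    """Return the data-url of the first div with css_class, or None."""
    tag = soup.find("div", {"class": css_class})
    if tag is None:
        # No matching element: the page differs from what was expected.
        return None
    # .get returns None instead of raising KeyError if the attribute is missing.
    return tag.get("data-url")


missing = get_data_url(soup, "cart_wrapper")
found = get_data_url(soup, "something_else")
print(missing, found)
```

When the helper returns None, it is usually worth printing or saving the fetched HTML to see what the server actually sent back.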
I am trying to collect all the numbers from an HTML page into a list using BeautifulSoup:
import urllib
from BeautifulSoup import *
import re
line = None
url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
# Retrieve all of the anchor tags
tags = soup('span')
for line in tags:
    line = line.strip()
numlist = re.findall('[0-9]+' , tags)
print numlist
I'm getting a traceback:
Traceback (most recent call last):
File "C:\Documents and Settings\mea388\Desktop\PythonSchool\new 12.py", line 14, in <module>
line = line.strip()
TypeError: 'NoneType' object is not callable
I cannot understand why I'm getting a traceback.
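That's because you are calling strip on a BeautifulSoup Tag object, not on a string. Tag has no strip method, so the attribute lookup is treated as a search for a child <strip> tag, which returns None, and calling None raises the TypeError.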
Change line 14 to:
line = line.string.strip()
However, be aware that line.string can still be None when the tag you are searching for has multiple child elements. See the section on the .string attribute in the BeautifulSoup documentation.
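One way to sidestep the None case is to use get_text(), which joins all nested strings. The sketch below is for Python 3 with bs4 rather than the older BeautifulSoup 3 import used in the question, and the markup is made up so that the second span has several children.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the second span has several children, so its .string is None.
html = "<span>12</span><span><b>3</b> and <i>45</i></span>"
soup = BeautifulSoup(html, "html.parser")

numbers = []
for tag in soup.find_all("span"):
    # get_text() joins all nested strings and always returns a str.
    text = tag.get_text().strip()
    numbers.extend(re.findall(r"[0-9]+", text))

# For comparison: .string on the span with several children.
second_string = soup.find_all("span")[1].string
print(numbers, second_string)
```

Note that re.findall is applied to the text of each tag in turn, not to the list of tags.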
I have the following script, to find an image on a page and download it:
from lxml import html
import urllib
import urllib2
url = 'http://www.example.com/pages/page0987/'
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
tree = html.fromstring(data)
src = tree.xpath('/html/body/div[2]/div[4]/div/div/img/@src')
urllib.urlretrieve(src, "local-filename.jpg")
I fetch a webpage, locate an <img> element on it (I try to find it using an XPath query), read the src attribute of that element, and then try to download the image from that URL.
But something is wrong; Python says:
Traceback (most recent call last):
File "C:\Users\Sergey\Desktop\dlImg.py", line 15, in <module>
urllib.urlretrieve(src, "local-filename.jpg")
File "C:\Python27\lib\urllib.py", line 94, in urlretrieve
return _urlopener.retrieve(url, filename, reporthook, data)
File "C:\Python27\lib\urllib.py", line 228, in retrieve
url = unwrap(toBytes(url))
File "C:\Python27\lib\urllib.py", line 1060, in unwrap
url = url.strip()
AttributeError: 'list' object has no attribute 'strip'
Your tree.xpath() query returns a list, not a single match. At the very least, index into it to get the first item:
urllib.urlretrieve(src[0], "local-filename.jpg")
or use a loop over the results. Take into account that the list can be empty as well (no matches found).
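A small sketch of guarding against the empty-list case. The helper name `first_or_none` and the image URL are made up for illustration. The two lists stand in for what `tree.xpath(...)` might return, so the sketch runs without lxml or network access.

```python
def first_or_none(matches):
    """Return the first XPath match, or None when the result list is empty."""
    return matches[0] if matches else None


# Stand-ins for what tree.xpath('.../img/@src') might return.
one_match = ["http://www.example.com/images/photo.jpg"]
no_match = []

src = first_or_none(one_match)
missing = first_or_none(no_match)

if src is not None:
    # Here the real script would call urlretrieve(src, "local-filename.jpg").
    print("would download", src)
if missing is None:
    print("no image found, nothing to download")
```

If the src attribute on the page is a relative path, it would also need to be joined with the page URL before downloading.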