I am parsing websites that sell electronic products.
Specifically, I am looking to collect the name and the price of each product.
I ran into a small problem when parsing an XML-based site.
Here is my code:
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> url=urllib2.urlopen("http://store.explorelabs.com/index.php?main_page=products_all")
>>> soup=BeautifulSoup(url,"xml")
>>> data=soup.find_all(colspan="2")
The code above works.
Now when I do this (as the name is inside the strong tags):
>>> data.strong
or
>>> data.attrs
It shows me this:
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
data.strong
AttributeError: 'ResultSet' object has no attribute 'strong'
or
Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
data.find_all('a')
AttributeError: 'ResultSet' object has no attribute 'find_all'
I am trying to iterate over it to find out more.
Any pointers would be very helpful.
find_all returns a list of matching elements, not a single one. Loop over the result set to get at the individual tags:
for element in data:
    element.attrs
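As a minimal sketch of that loop, with invented markup standing in for the real site (the tag names and colspan attribute are taken from the question, the product data is made up):

```python
from bs4 import BeautifulSoup

# Invented markup for illustration; the real site's structure differs.
html = """
<table>
  <tr><td colspan="2"><strong>Widget A</strong> $9.99</td></tr>
  <tr><td colspan="2"><strong>Widget B</strong> $19.99</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
data = soup.find_all(colspan="2")   # a ResultSet of matching tags

for element in data:
    # .strong and .attrs work on each individual tag, not on the set itself
    print(element.strong.get_text(), element.attrs)
```

The same applies to chained searches: call element.find_all(...) inside the loop, not data.find_all(...).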
This is the code I'm running.
import bs4
import requests
from bs4 import BeautifulSoup
r=requests.get('https://finance.yahoo.com/quote/SAS.ST/?guccounter=1')
soup=bs4,BeautifulSoup(r.text,"xml")
soup.find_all('div')
And when I run it, the output is:
Traceback (most recent call last):
  File "/Users/darre/Desktop/script.py3", line 8, in <module>
    bi=soup.find_all('div')
AttributeError: 'tuple' object has no attribute 'find_all'
Here is the error: the comma in soup=bs4,BeautifulSoup(r.text,"xml") makes soup a tuple of (the bs4 module, the soup object), and tuples have no find_all. Use
soup = BeautifulSoup(r.text, "lxml")
instead. BeautifulSoup supports several different parsers; see the parser section of its documentation for details.
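A minimal sketch of the tuple trap, using a trivial snippet of HTML rather than the live Yahoo page:

```python
import bs4
from bs4 import BeautifulSoup

html = "<div>hello</div>"

# The stray comma builds a 2-tuple: (the bs4 module, a BeautifulSoup object).
broken = bs4, BeautifulSoup(html, "html.parser")
print(type(broken))          # <class 'tuple'>

# The intended call assigns the soup object itself:
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all("div"))  # [<div>hello</div>]
```

Any comma on the right-hand side of an assignment creates a tuple in Python, which is why the error mentions 'tuple' rather than anything about parsing.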
Running the example from the pip page, modified to look up the 'chrome' browser, I get a KeyError.
script:
import browserhistory as bh
dict_obj = bh.get_browserhistory()
data = dict_obj.keys()
print(data)
data = dict_obj['chrome'][0]
print(data)
output:
dict_keys([])
Traceback (most recent call last):
File "/home/raj/Desktop/anilyzer/myproject/main.py", line 6, in
data = dict_obj['chrome'][0]
KeyError: 'chrome'
What is happening?
It doesn't look like the module is maintained, and it cannot detect your browser(s): https://github.com/kcp18/browserhistory
The function bh.get_browserhistory() should return a dictionary where the type of browser is the lookup key. Displaying this as you do shows that the dictionary is empty.
>>> data = dict_obj.keys()
>>> print(data)
dict_keys([])
This is why you get a KeyError when attempting to read a specific key from the dict:
>>> d = {}
>>> type(d)
<class 'dict'>
>>> d["something"]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'something'
However, there appears to be a different project which may do what you want: https://github.com/pesos/browser-history
This is from reading the GitHub Issues: https://github.com/kcp18/browserhistory/issues and I cannot comment on its quality or safety!
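Independent of which library you end up with, dict.get is the usual way to probe for a key that may be absent instead of letting a bare lookup raise KeyError. A sketch, with an empty dict standing in for what bh.get_browserhistory() returned here:

```python
history = {}  # stands in for the empty dict bh.get_browserhistory() returned

# A bare lookup raises KeyError when the key is missing:
try:
    history['chrome'][0]
except KeyError:
    print("no chrome history found")

# .get returns a default instead of raising:
rows = history.get('chrome', [])
print(len(rows))   # 0
```

Checking `if 'chrome' in history:` before indexing is an equivalent guard.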
I can successfully extract the response as JSON. However, I am unable to list or extract the keys and values I need.
Below is my code:
import requests
response = requests.get("https://www.woolworths.com.au/apis/ui/Product/Specials/half-price?GroupID=948&isMobile=false&pageNumber=1&pageSize=36&richRelevanceId=SP_948&sortType=Personalised")
data = response.json()
I tried data['Stockcode'], but no luck, and also data['Product'].
It says:
>>> data['Product']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'Product'
>>> data['Products']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'Products'
Try:
>>> data['Items'][0]['Products']
Print data and inspect its structure to see how it is constructed; then you can extract the values you need.
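A sketch of that exploration, level by level. The sample payload below is a guess shaped like the answer describes ("Items" wrapping "Products"); the live API response will contain far more fields:

```python
import json

# Invented sample shaped like the answer's path data['Items'][0]['Products'].
data = json.loads("""
{
  "Items": [
    {"Products": [{"Stockcode": 123, "Name": "Sample product"}]}
  ]
}
""")

# Inspect the keys at each level before indexing into the structure.
print(list(data.keys()))                            # ['Items']
print(list(data["Items"][0].keys()))                # ['Products']
print(data["Items"][0]["Products"][0]["Stockcode"]) # 123
```

Printing keys() at each level like this is a quick way to discover why data['Product'] raised KeyError: the top level simply has no such key.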
Need help with an adidas auto checkout script. Getting the following error:
Traceback (most recent call last):
File "adidas.py", line 169, in <module>
checkout()
File "adidas.py", line 80, in checkout
url = soup.find('div', {'class': 'cart_wrapper rbk_shadow_angle rbk_wrapper_checkout summary_wrapper'})['data-url']
TypeError: 'NoneType' object is not subscriptable
Link to the entire script: https://github.com/kfichter/OpenATC/blob/482360a7a160136a4969d2cf0527809660d021fb/Scripts/adidas.py
soup.find() is returning None. You are trying to look up the key 'data-url' in this result, but None does not support key lookup.
Depending on what you're trying to do, you should either change the query so it doesn't return None, or check that the value is not None before trying to access the 'data-url' key.
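A minimal sketch of the second option, guarding against find() returning None. The class string is copied from the question; the markup here is deliberately missing the target div so the guard fires:

```python
from bs4 import BeautifulSoup

html = "<div class='other'>no cart here</div>"  # markup without the target div
soup = BeautifulSoup(html, "html.parser")

wrapper = soup.find('div', {'class': 'cart_wrapper rbk_shadow_angle '
                                     'rbk_wrapper_checkout summary_wrapper'})
if wrapper is not None:
    url = wrapper['data-url']
else:
    # find() matched nothing; bail out instead of subscripting None
    url = None
    print("cart wrapper not found")
```

In a scraper like this, find() returning None often means the site changed its markup, so logging the miss is more useful than letting the TypeError propagate.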
I'm trying to integrate Pagseguro (a Brazilian payment service, similar to PayPal) using this lib:
https://github.com/rochacbruno/python-pagseguro
But I don't know how to access the data from the notification that the service sends to me. This is my code:
notification_code = request.POST['notificationCode']
pg = PagSeguro(email="testPerson@gmail.com", token="token")
notification_data = pg.check_notification(notification_code)
print notification_data['status']
On the last line I receive this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'PagSeguroNotificationResponse' object has no attribute '__getitem__'
The documentation in the README doesn't seem to match the code. It looks like notification_data is not a dictionary but an object whose attributes match the dictionary keys from the README.
So this should work if you just change print notification_data['status'] to the following:
print notification_data.status
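To illustrate the difference, here is a sketch with a stand-in class (the real PagSeguroNotificationResponse comes from the library; the field value is invented):

```python
class NotificationResponse:
    """Stand-in for PagSeguroNotificationResponse: data lives in attributes."""
    def __init__(self, status):
        self.status = status

notification_data = NotificationResponse(status="paid")

# Attribute access works; item access like notification_data['status']
# would raise TypeError, since the class defines no __getitem__.
print(notification_data.status)                    # paid

# getattr gives the same dict.get-style fallback for attributes:
print(getattr(notification_data, 'status', None))  # paid
```

The TypeError about '__getitem__' is Python 2's way of saying the object does not support the obj[key] syntax at all.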