Web scraping with BS4: AttributeError with find_all

This is the code I'm running:
import bs4
import requests
from bs4 import BeautifulSoup
r=requests.get('https://finance.yahoo.com/quote/SAS.ST/?guccounter=1')
soup=bs4,BeautifulSoup(r.text,"xml")
soup.find_all('div')
When I run it, the output is:
Traceback (most recent call last):
File "/Users/darre/Desktop/script.py3", line 8, in <module>
bi=soup.find_all('div')
AttributeError: 'tuple' object has no attribute 'find_all'

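The error is the comma in this line:
soup=bs4,BeautifulSoup(r.text,"xml")
With a comma instead of a dot, Python builds a tuple containing the bs4 module and the parsed document, so soup is a tuple and has no find_all method. Use this instead:
soup=BeautifulSoup(r.text,"lxml")
BeautifulSoup supports several parsers: "xml" is meant for XML documents, while "lxml" parses HTML. See the parser section of the BeautifulSoup documentation for details.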
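To see why the comma matters, here is a minimal sketch that uses only the standard library. The function make_document is a hypothetical stand-in for the BeautifulSoup(...) call, and json stands in for the bs4 module; neither is part of the original script.

```python
import json  # stand-in for bs4; any module shows the same behaviour


def make_document(text):
    """Hypothetical stand-in for BeautifulSoup(text, "lxml")."""
    return {"text": text}


# A comma between two expressions builds a tuple: (module, document).
wrong = json, make_document("<div>hi</div>")

# Calling the constructor on its own returns just the document.
right = make_document("<div>hi</div>")

print(type(wrong).__name__)
print(type(right).__name__)
```

The first value is a tuple, which is why calling find_all on it fails with an AttributeError.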

Related

Troubles merging two json urls

First of all, I am getting the error shown below. When I try running
pip3 install --upgrade json
in an attempt to resolve the error, Python is unable to find the module.
The segment of code I am working with can be found below the error. Some further direction on the code itself would also be appreciated.
Error:
Traceback (most recent call last):
File "Chicago_cp.py", line 18, in <module>
StopWork_data = json.load(BeautifulSoup(StopWork_response.data,'lxml'))
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
TypeError: 'NoneType' object is not callable
Script:
#!/usr/bin/python
import json
from bs4 import BeautifulSoup
import urllib3
http = urllib3.PoolManager()
# Define Merge
def Merge(dict1, dict2):
    res = {**dict1, **dict2}
    return res
# Open the URL and the screen name
StopWork__url = "someJsonUrl"
Violation_url = "anotherJsonUrl"
StopWork_response = http.request('GET', StopWork__url)
StopWork_data = json.load(BeautifulSoup(StopWork_response.data,'lxml'))
Violation_response = http.request('GET', Violation_url)
Violation_data = json.load(BeautifulSoup(Violation_response.data,'lxml'))
dict3 = Merge(StopWork_data,Violation_data)
print (dict1)
json.load expects a file object or something else with a read method. The BeautifulSoup object doesn't have a read method. Instead, when you ask it for an attribute it doesn't know, it looks for a child tag with that name, i.e. a <read> tag in this case. When it doesn't find one, it returns None, and calling None causes the error. Here's a demo:
import json
from bs4 import BeautifulSoup
soup = BeautifulSoup("<p>hi</p>", "html5lib")
assert soup.read is None
assert soup.blablabla is None
assert json.loads is not None
json.load(soup)
Output:
Traceback (most recent call last):
File "main.py", line 8, in <module>
json.load(soup)
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
TypeError: 'NoneType' object is not callable
If the URL is returning JSON then you don't need BeautifulSoup at all because that's for parsing HTML and XML. Just use json.loads(response.data).
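As a rough sketch of that suggestion, the script could look something like the following. The two byte strings are made-up stand-ins for StopWork_response.data and Violation_response.data, since the real URLs are not given, and the sketch assumes both responses are JSON objects at the top level. Note that json is part of Python's standard library, so there is nothing to install with pip.

```python
import json


def merge(dict1, dict2):
    """Return a new dict; keys in dict2 override the same keys in dict1."""
    return {**dict1, **dict2}


# Made-up stand-ins for the response bodies (urllib3 returns bytes in .data).
stop_work_body = b'{"stop_work": [1, 2]}'
violation_body = b'{"violations": [3]}'

# json.loads accepts str or bytes, so no BeautifulSoup step is needed.
stop_work_data = json.loads(stop_work_body)
violation_data = json.loads(violation_body)

merged = merge(stop_work_data, violation_data)
print(merged)
```

If either endpoint returns a JSON array rather than an object, the dict merge would not apply and the two lists would need to be combined some other way.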

AttributeError: 'module' object has no attribute 'webdriver'

AttributeError: 'module' object has no attribute 'webdriver'
Why does this error happen when I write
import selenium
but not when I write the import like this?
from selenium import webdriver
You get the error because webdriver is a subpackage of the selenium package, and importing a package does not automatically import its submodules. A submodule only becomes an attribute of the package once something imports it explicitly.
If you take a look at help(selenium), you'll see that the package contains two subpackages and one plain module.
PACKAGE CONTENTS
common (package)
selenium
webdriver (package)
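It behaves as described above. selenium.selenium works presumably because the package's own __init__.py already imports that name, which is why it shows up as a class rather than a module: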
>>> selenium.common # doesn't work
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'common'
>>> from selenium import common # works
>>> selenium.selenium # works
<class 'selenium.selenium.selenium'>
>>> selenium.webdriver # doesn't work
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'webdriver'
>>> from selenium import webdriver # works
>>>
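The same behaviour can be reproduced without selenium. The sketch below builds a throwaway package on disk; the names demo_pkg and sub are made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: demo_pkg/ with an empty subpackage demo_pkg/sub/.
root = tempfile.mkdtemp()
sub_dir = os.path.join(root, "demo_pkg", "sub")
os.makedirs(sub_dir)
for init_path in (
    os.path.join(root, "demo_pkg", "__init__.py"),
    os.path.join(sub_dir, "__init__.py"),
):
    with open(init_path, "w"):
        pass  # an empty __init__.py is enough

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg

# The subpackage has not been imported yet, so it is not an attribute.
before = hasattr(demo_pkg, "sub")

from demo_pkg import sub

# Importing the subpackage binds it as an attribute of the parent package.
after = hasattr(demo_pkg, "sub")

print(before, after)
```

A package can make its submodules available up front by importing them in its own __init__.py, which is a design choice left to each library.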

How to scrape a XML website using bs4?

I am parsing websites that sell electronic products.
Specifically, I want to collect the name and the price of each product.
I ran into a small problem when parsing an XML-based site.
Here is my code:
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> url=urllib2.urlopen("http://store.explorelabs.com/index.php?main_page=products_all")
>>> soup=BeautifulSoup(url,"xml")
>>> data=soup.find_all(colspan="2")
The code above works.
Now, when I do this (the name is inside the strong tags):
>>> data.strong
or
>>> data.attrs
It shows me this:
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
data.strong
AttributeError: 'ResultSet' object has no attribute 'strong'
or
Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
data.find_all('a')
AttributeError: 'ResultSet' object has no attribute 'find_all'
I am trying to iterate over the results to find out more.
Any pointers would be very helpful.
find_all returns a ResultSet, which is a list of all matching elements, not a single element. Loop over the result set to get the individual items:
for element in data:
    element.attrs
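The same pattern can be sketched with the standard library's xml.etree.ElementTree instead of bs4, so that it runs without third-party packages. This is a swapped-in library, not the one from the question, and the markup and product names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented markup that mimics cells with colspan="2" and a <strong> name.
markup = """
<table>
  <tr><td colspan="2"><strong>Widget A</strong></td></tr>
  <tr><td colspan="2"><strong>Widget B</strong></td></tr>
</table>
"""

root = ET.fromstring(markup)

# findall returns a plain list of matching elements, like bs4's find_all.
cells = root.findall(".//td[@colspan='2']")

# The list itself has no element attributes, so work on each item in turn.
names = [cell.find("strong").text for cell in cells]
print(names)
```

With bs4 the loop body would use element.strong or element.find_all('a') on each item in the same way.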

Printing docstrings in Python 3

How do you print docstrings in Python 3.1.x?
I tried with the re and sys modules as a test and I keep getting errors. Thanks
import re
print(re._doc_)
Traceback (most recent call last):
File "<pyshell#91>", line 1, in <module>
print(re._doc_)
AttributeError: 'module' object has no attribute '_doc_'
The attribute is called __doc__, with two underscores on each side, not _doc_.
import re
print(re.__doc__)
Works just fine.
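The same attribute exists on functions and classes. A small sketch, where greet is a made-up function used only for illustration:

```python
import inspect
import re


def greet(name):
    """Return a greeting for name."""
    return "Hello, " + name


# __doc__ holds the raw docstring of a function, class, or module.
print(greet.__doc__)

# inspect.getdoc returns the docstring with indentation cleaned up.
print(inspect.getdoc(greet))

# Module docstrings work the same way; show only the first line of re's.
print(re.__doc__.strip().splitlines()[0])
```

In an interactive session, help(re) shows the same text along with the module's contents.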

python: module has no attribute mechanize

#!/usr/bin/env python
import mechanize
mech = mechanize.Browser()
page = br.open(SchoolRank('KY'))
Gives:
Traceback (most recent call last):
File "mechanize.py", line 2, in <module>
import mechanize
File "/home/jcress/Documents/programming/schooldig/trunk/mechanize.py", line 12, in <module>
mech = mechanize.Browser()
AttributeError: 'module' object has no attribute 'Browser'
And I'm confused. I have the module installed for 2.6 and 2.7, same result...
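Rename your script to something other than mechanize.py. The script's own directory comes first on the import path, so import mechanize loads your file instead of the installed library, as the file path in the traceback shows. Also delete any leftover mechanize.pyc next to the script, or it will keep shadowing the real module.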
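One way to check for this kind of shadowing is to ask Python which file an import would load. The helper module_origin below is a hypothetical name, and json is used only as an example module that is always available.

```python
import importlib.util


def module_origin(name):
    """Return the location `import name` would load from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# For a standard library module this points into the Python installation.
origin = module_origin("json")
print(origin)
```

If the printed path points into your own project directory rather than the Python installation or site-packages, a local file is shadowing the library.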
