This is my program:
import webbrowser
import urllib
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
webbrowser.open("randomurl.com") # Open "randomurl.com"
and this is my error:
Traceback (most recent call last):
File "c:\Users\Utilisateur\Desktop\code de foue\teste.py", line 4, in <module>
soup = BeautifulSoup(html_doc, 'html.parser')
NameError: name 'html_doc' is not defined
I have installed it using pip install beautifulsoup4.
I'm a beginner, please help me :)
I'm already asking on other forums..
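The NameError means html_doc is never defined before it is passed to BeautifulSoup. A minimal sketch of what the script presumably intends, with an inline placeholder document so it runs on its own (in a real script you would fetch the page first, e.g. with urllib.request.urlopen(url).read()):

```python
import webbrowser
from bs4 import BeautifulSoup

# html_doc must exist before BeautifulSoup can parse it.
# Inline string used here so the example is self-contained.
html_doc = "<html><head><title>Example</title></head><body><p>hello</p></body></html>"

soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.title.get_text())  # prints: Example

# webbrowser.open("https://randomurl.com")  # would open the page in your browser
```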
I am trying to web scrape, but this error is keeping me from going further.
My code is...
import requests
from bs4 import BeautifulSoup
indeed_result = requests.get("https://kr.indeed.com/jobs?q=python&limit=50")
indeed_soup = BeautifulSoup(indeed_result.text, "html.parser")
pagination = indeed_soup.find("div", {"class": "pagination"})
pages = pagination.fine_all('a')
for page in pages:
    print(page.get_text())
and my error message is...
Traceback (most recent call last):
File "index.py", line 13, in <module>
pages = pagination.find_all('a')
TypeError: 'NoneType' object is not callable
For your information, I have installed BeautifulSoup4.
pages = pagination.fine_all('a') has a typo in the word find; change it to find_all and that should fix it.
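With the typo fixed, and a guard for the case where the pagination div is missing (find() returns None on no match, which is how this kind of 'NoneType' error usually arises), the relevant part looks like this. It is tried here against a small inline snippet rather than the live Indeed page, whose layout may differ:

```python
from bs4 import BeautifulSoup

# Stand-in for indeed_result.text from the original question.
html = '<div class="pagination"><a>1</a><a>2</a><a>Next</a></div>'
indeed_soup = BeautifulSoup(html, "html.parser")

pagination = indeed_soup.find("div", {"class": "pagination"})
if pagination is not None:            # guard: find() returns None on no match
    pages = pagination.find_all('a')  # find_all, not fine_all
    for page in pages:
        print(page.get_text())
```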
I am using Beautiful Soup for web scraping and getting a TypeError here.
My code is as follows:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.amazon.com/s?k=asus&rh=n%3A300189&nav_sdd=aps&pd_rd_r=58b28d7d-1955-433b-b33b-b1b5dcf1f522&pd_rd_w=MJzan&pd_rd_wg=QG3cj&pf_rd_p=6d81377b-6d6c-4363-ae02-8fa202ed7b50&pf_rd_r=X0BDDAPN7TTW0ZT1REX6&qid=1583290662&ref=sxwds-sbc_c2")
soup = BeautifulSoup(r.text, 'html.parser')
x = soup.find(Class='a-size-medium a-color-base a-text-normal')
for vari in x:
    print(vari.get_text())
The error:
Traceback (most recent call last):
File "c:/Users/intel/Desktop/Untitled-1.py", line 8, in <module>
for vari in x:
TypeError: 'NoneType' object is not iterable
I don't think my class id is wrong...
Your code doesn't work because Amazon is blocking your automated request. You can confirm this by saving the response to a file and inspecting what was actually returned:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.amazon.com/s?k=asus&rh=n%3A300189&nav_sdd=aps&pd_rd_r=58b28d7d-1955-433b-b33b-b1b5dcf1f522&pd_rd_w=MJzan&pd_rd_wg=QG3cj&pf_rd_p=6d81377b-6d6c-4363-ae02-8fa202ed7b50&pf_rd_r=X0BDDAPN7TTW0ZT1REX6&qid=1583290662&ref=sxwds-sbc_c2")
soup = BeautifulSoup(r.text, 'html.parser')
with open("out.html", "w") as f:
    f.write(str(soup))
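If the saved page turns out to be a robot check, one common workaround (not guaranteed — Amazon may still block you) is to send a browser-like User-Agent header. A sketch using only the standard library; the URL is shortened and the header string is just an example:

```python
from urllib.request import Request, urlopen

url = "https://www.amazon.com/s?k=asus"  # shortened for illustration
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
# html = urlopen(req).read()  # network call, left commented out here
```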
Python cannot find module Request from urllib.request
Update: well, I tried to run this with python3... see:
martin@linux-3645:~/dev/python> Python3 w9.py
If 'Python3' is not a typo you can use command-not-found to lookup the package that contains it, like this:
cnf Python3
martin@linux-3645:~/dev/python>
When trying to import Request from urllib.request in a Python-code, it's unable to find the package.
>>> from urllib.request import urlopen as uReq
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named request
See the code I am trying to run:
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = "https://wordpress.org/plugins/participants-database/"
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
ttt = page_soup.find("div", {"class":"plugin-meta"})
text_nodes = [node.text.strip() for node in ttt.ul.findChildren('li')[:-1:2]]
Unfortunately this results in:
martin@linux-3645:~/dev/python> python w9.py
Traceback (most recent call last):
File "w9.py", line 3, in <module>
from urllib.request import urlopen as uReq
ImportError: No module named request
martin@linux-3645:~/dev/python>
Well: there is no urllib.request module in Python 2, that module only exists in Python 3.
I did some searching and found a possible fix; see this solution: I could use urllib2 instead:
from urllib2 import Request
From the top of the module documentation:
Note: The urllib2 module has been split across several modules in Python 3 named urllib.request and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to Python 3.
But wait: I thought that I was already running Python 3 and would continue to use that version; the code I am trying to execute is clearly designed for Python 3.
What is going wrong here?
Your sample code is for python3. Please note that you must run "python3" in lowercase, not Python3.
With the sample code you have posted, you won't get any output on screen because you have not asked for any output.
If you add print(text_nodes) at the end, you will get the following output:
['Version: 1.7.7.7', 'Active installations: 10,000+', 'Tested up to: 4.9.4']
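If you need the same script to run under both interpreters, a common pattern is to try the Python 3 import first and fall back to urllib2. Whether this is worth doing for new code is doubtful, since Python 2 is end-of-life; a minimal sketch:

```python
try:
    # Python 3: urllib was split into urllib.request, urllib.error, etc.
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback
    from urllib2 import urlopen

# urlopen(my_url) now works under either interpreter
```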
I just bought a book to teach me how to scrape websites, but the first example right off the bat is not working for me. I'm a little upset that I bought the book in the first place, but I would like to try to get it going.
In Python 3.5 my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
BsObj = BeautifulSoup(html.read())
print(bsObj.h1)*
Here is the error I am getting
Traceback (most recent call last):
File "C:/Users/MyName/AppData/Local/Programs/Python/Python35-32/Lib/site-packages/bs4/test.py", line 5, in <module>
BsObj = BeautifulSoup(html.read())
File "C:\Users\MyName\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\__init__.py", line 153, in __init__
builder = builder_class()
File "C:\Users\MyName\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\builder\_htmlparser.py", line 39, in __init__
return super(HTMLParserTreeBuilder, self).__init__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'strict'
Any ideas would be super helpful?
Thanks in advance
I guess you transcribed the code from the book. bsObj is not named consistently and there is an unnecessary * after print(). It will work after you change those two things.
Also note that read() is not needed and that it's better to define the parser, otherwise you will get a warning.
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://www.pythonscraping.com/pages/page1.html')
bsObj = BeautifulSoup(html, 'html.parser')
print(bsObj.h1)
Hey, you just had a typo: it should be BsObj, not bsObj, in the print line.
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
BsObj = BeautifulSoup(html.read())
print(BsObj.h1)
Here is my code using find_all; it works great with .find(), though:
import requests
from BeautifulSoup import BeautifulSoup
r = requests.get(URL_DEFINED)
print r.status_code
soup = BeautifulSoup(r.text)
print soup.find_all('ul')
This is what I got:
Traceback (most recent call last):
File "scraper.py", line 19, in <module>
print soup.find_all('ul')
TypeError: 'NoneType' object is not callable
It looks like you're using BeautifulSoup version 3, which used a slightly different naming convention, e.g. .findAll, while BeautifulSoup 4 standardised naming to be more PEP 8-like, e.g. .find_all (but keeps the older names for backwards compatibility). Note that soup('ul') is equivalent to find-all in both.
To download and install, use pip install beautifulsoup4.
Then change your import to be:
from bs4 import BeautifulSoup
Then you're good to go.
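To see the naming equivalence concretely, here is a small self-contained check with bs4 (under BeautifulSoup 3 only findAll exists, so the same code would fail there on find_all):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>a</li></ul><ul><li>b</li></ul>", "html.parser")

# In bs4, all three spellings return the same result:
print(soup.find_all('ul') == soup.findAll('ul') == soup('ul'))  # True
```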
Download BS4 from here: http://www.crummy.com/software/BeautifulSoup/#Download
Install it and import it at the beginning of your code like this:
import requests
from bs4 import BeautifulSoup
r = requests.get(URL_DEFINED)
print r.status_code
soup = BeautifulSoup(r.text)
print soup.find_all('ul')