I have the following code (in PyCharm (MacOS)):
import pandas as pd
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(fiddy_states)
And I get the following error:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/user_name/PycharmProjects/PandasTest/Doc3.py
Traceback (most recent call last):
File "/Users/user_name/PycharmProjects/PandasTest/Doc3.py", line 9, in <module>
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 906, in read_html
keep_default_na=keep_default_na)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 733, in _parse
parser = _parser_dispatch(flav)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 693, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
In Anaconda does appear installed the last version of lxml (3.8.0). Despite of that, I have tried to reinstall it by: 1) writing pip install lxml and 2) downloading the lxml wheel corresponding to my python version (lxml-3.8.0-cp36-cp36m-win_amd64.whl), but in any case all remains the same (in the second case I get that it is not a supported wheel on this platform, even though the version of python is correct (3.6, 64 bits)).
I've read similar questions here (even with the same code above, since it's from a tutorial), but the problem still persists.
Based on the fact that the error is:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6
This means that you are working with python-3.6. Now usually the package manager for python-3.x is pip3. So you probably should install it with:
pip3 install lxml
For people reached here using Jupyter notebook, I restarted the kernel after pip install lxml and the error is gone.
I got same error, it seems that my python3 was pointing to pandas in python2 (since I have not install pandas in python3). After doing pip3 install pandas and restarting a notebook, it worked fine.
you may have to (re)install some of your libraries pip install lxml bs4 html5lib
pd.read_html() reads with 'lxml' library by default, so try another library that you installed above like pd.read_html(some_url, flavor='html5lib')
You can go to Settings > Project Interpreter > Click on '+' icon
Find 'lxml' from the list of packages and click 'Install Package' button found below.
I am using PyCharm 2019.2.1 (Community Edition)
Build #PC-192.6262.63, built on August 22, 2019
Runtime version: 11.0.3+12-b304.39 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Linux 4.15.0-58-generic
GC: ParNew, ConcurrentMarkSweep
Memory: 937M
Cores: 4
I tried to reinstall lxml without any progress.
I ended uninstalling pandas and reinstalling and updating and that solved my issues!
pip uninstall pandas
pip install pandas
pip3 install --upgrade pandas
I got the same error when trying to run some code that was using pandas. I tried some suggestions here but those did not work. Finally, what worked for me was the following two steps :
conda update anaconda
conda install spyder=5.0.5
Now when I restarted Spyder and ran my code it worked fine.
I have just installed and starting using anaconda so I don't know the root cause of this issue, but my guess is there seemed to be some "cross-connection" in the packages I had installed prior to my installation of Anaconda, and by running the above two steps now everything is running from within the Anaconda environment.
This error occurs when lxml is not installed, so just go to the terminal
and run: pip3 install lxml
I got the same problem. Trying to reinstall lxml does not work. After rereading the error message and tracing the error ~\Miniconda3\envs\mini_ds\lib\site-packages\pandas\io\html.py:872, I think I found the problem lies in the function _importers() in ~/pandas/io/html.py.
Here is the function:
def _importers() -> None:
# import things we need
# but make this done on a first use basis
global _IMPORTS
if _IMPORTS:
return
global _HAS_BS4, _HAS_LXML, _HAS_HTML5LIB
bs4 = import_optional_dependency("bs4", errors="ignore")
_HAS_BS4 = bs4 is not None
lxml = import_optional_dependency("lxml.etree", errors="ignore")
_HAS_LXML = lxml is not None
html5lib = import_optional_dependency("html5lib", errors="ignore")
_HAS_HTML5LIB = html5lib is not None
_IMPORTS = True
You can see that for lxml option, it actually tries importing "lxml.etree" instead of "lxml". So this is probably why reinstalling "lxml" would not help.
Conclusion, I think this is perhaps a problem of pandas version (mine is 1.4.1). For me, a quick solution is to specify the flavor ='html5lib' in pd.read_html().
I installed lxml 4.9.1, but it didn't work. So I tried to install lxml 4.8.0 instead, and it worked!
pip install lxml==4.8
As OP is using Anaconda, in order to solve that issue, install lxml by opening the CMD.Exe Prompt for the environment one is working on, and run
conda install -c anaconda lxml
(Source)
One can also do it by specifying the version as follows
conda install -c anaconda lxml=4.8.0
Notes:
pip doesn't manage dependencies the same way conda does and can, potentially, damage one's installation. Therefore, would recommend to use it only if conda doesn't work.
pip install lxml
# or
pip install lxml==4.9.1
If one is using pip and one has already the package installed and one is getting errors, one can pass -I (--ignore-installed) and -v as follows
pip install -Iv lxml==4.9.1
lxml official documentation can be found here.
This is their official GitHub repo.
I was seeing this issue as well on my RPi.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 1113, in read_html
displayed_only=displayed_only,
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 902, in _parse
parser = _parser_dispatch(flav)
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 859, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
Looking into /home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py it was attempting to use lxml.etree, so I attempted to just use that module
>>> from lxml import etree
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
I searched for that error and found that the following packages needed to be installed on the RPi
sudo apt-get install libxslt
After installing I was successfully able to use pandas
import pandas as pd
from urllibenter code here.request import Request, urlopen
url = 'WEB-SITE'
request_site = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(request_site)
dfk1 = pd.read_html(webpage, flavor='html5lib')
print(dfk1)
im running python 3.1.2 on my ubuntu 10.04
which version of BeautifulSoup i need to install and how?
i already download version 3.2 and run sudo python3 setup.py install
but doesnt works
thnx
EDIT : The error i get is :
>>> import BeautifulSoup
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "BeautifulSoup.py", line 448
raise AttributeError, "'%s' object has no attribute '%s'" % (self.__class__.__name__, attr)
^
SyntaxError: invalid syntax
>>>
I followed these steps to install beautifulsoup4 for HTML parsing.
Download the Beautiful Soup 4 source tarball and install it with
$ python3 setup.py install
After that enter your python3 console
$ import bs4
$ from bs4 import BeautifulSoup
The first statement will work while the second will fail. You have to change BeautifulSoup module into python3 format with the help of this command.
$ 2to3 -w bs4
3 After that you can run the test again, and everything will work fine.
The only series of BeautifulSoup which works with Python 3 is 3.1 However the author has abandoned it and will not release updates. You can read more about the problems here.
UPDATE: This is no longer true, BeautifulSoup 4 works on Python 3. You can install it with pip install beautifulsoup4.
pip install BeautifulSoup will install version 3.
Try beautifulsoup4. It works for both Python 2.x and Python 3.x.
Then just try with:
from bs4 import BeautifulSoup
I'm trying to install BeautifulSoup package through PyCharm, I have also tried to download it through the command line. But I just cannot seem to get to it to run on PyCharm. I keep getting the following error
Collecting BeautifulSoup
Using cached BeautifulSoup-3.2.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Hemesh\AppData\Local\Temp\pycharm-packaging\BeautifulSoup\setup.py", line 22
print "Unit tests have failed!"
^
SyntaxError: Missing parentheses in call to 'print'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\Hemesh\AppData\Local\Temp\pycharm-packaging\BeautifulSoup\
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
Can anyone help?
thanks in advance
Are you used to using Python 2.7? It seems to be asking for Python 3.X syntax for your print statement, so it might be an issue with the versions you're trying to use?
BeautifulSoup 3.2.1 only works with Python 2.7 (and is obsolete), if you get BeautifulSoup 4, it should work with both versions.
I am trying to create a script to download the captcha from my website.
I think the code works except from that error, when I run it in cmd (I am using windows not Linux) I receive the following:
from bs4 import BeautifulSoup
ImportError: No module named bs4
I tried using pip install BeautifulSoup4
but then I receive a syntax error at install.
Here is the script:
from bs4 import BeautifulSoup
import urllib2
import urllib
url = "https://example.com"
content = urllib2.urlopen(url)
soup = BeautifulSoup(content)
img = soup.find('img',id ='imgCaptcha')
print img
urllib.urlretrieve(urlparse.urljoin(url, img['src']), 'captcha.bmp')
The problem according to this answer must be due to the fact I have not activated the virtualenv, and THEN install BeautifulSoup4.
I don't think this information will be of any help but I saved my python text in a notepad.py and the run it using cmd.
I had the same problem until a moment ago. Thanks for the post and comments! According to #Martin Vseticka 's suggestion I checked if I have the pip.exe file in my python folders. I run python 2.7 and 3.7 simultaneously. I didn't have it in the python 2.7 folder but in the 3.7.
So in the command line I changed the directory to where the pip.exe file was located. Then I ran "pip install BeautifulSoup4" and it worked. See enclosed screen shot.
I did a fresh install of Python 3.5 on my Windows 8.1 (64b) machine and then run:
C:\Users\Me\AppData\Local\Programs\Python\Python35-32\Scripts>pip install b
eautifulsoup4
Collecting beautifulsoup4
Downloading beautifulsoup4-4.4.1-py3-none-any.whl (81kB)
100% |################################| 81kB 890kB/s
Installing collected packages: beautifulsoup4
Successfully installed beautifulsoup4-4.4.1
and then run:
C:\Users\Me\AppData\Local\Programs\Python\Python35-32>python test.py
test.py contains only:
from bs4 import BeautifulSoup
I received no error.
I could install bs4, but still got an error message like yours. My operation system is 64 bit, and I only had this issue after I installed the latest Python in "32 bit" version. So I removed the 32 bit one, then I can import the bs4 again, and it still works after I install the latest Python in 64 version. Hope it helps.
I have python3.7 32bit and python3.7 64 bit installed on Windows 10.
For this app, I am using the 32 bit version.
I was getting the following error:
ImportError: No module named bs4
when I was trying to 'import bs4' from my app.py. Someone said: "Make sure your pip and your python are both 32 bits", but omitted to explicitly say how, so here it is:
Delete your virtual environment if it exists.
Create a new venv using python -m venv venv, but explicitly call it from your 32bit python directory like this:
C:\Users\me\AppData\Local\Programs\Python\Python37-32\python -m venv my_venv
install the requirements by explicitly calling the 32bit version like this:
C:\Users\me\AppData\Local\Programs\Python\Python37-32\Scripts\pip install -r requirements.txt
That's it. I hope it works.
I installed bs4 successfully but when I import it, the command line told me that
Traceback (most recent call last):
Python Shell, prompt 3, line 1
File "C:\Python27\Lib\site-packages\bs4\__init__.py", line 303, in <module>
from . import _htmlparser
File "C:\Python27\Lib\site-packages\bs4\_htmlparser.py", line 36, in <module>
from bs4.builder import (
ImportError: No module named builder
I have searched google but I didn't find a solution..
Could our experts help me on this issue ?
thanks a lot !
my system info:
PC OS : windows 7 64bit
Python version: 2.7.10
You must first pip install beautifulsoup4, then try import bs4. If this doesn't work odds are you have a messed up pip configuration. In order to remedy this either reinstall pip, use easy_install or build from source.
In order to use easy_install just run easy_install beautifulsoup4. In order to build from source run download and extract this zip (unzip /path/to/beautifulsoup4.zip if you're at the terminal). Next cd into the now unzipped folder by doing cd /path/to/beautifulsoup and run python setup.py. The package will now be installed and ready to import!
I uninstalled bs4 package and re-installed it. and now it works...
It is quite weird because I tried to uninstall and re-install, but only this time, it worked...
Thanks for your kind help :)