I have the following code (in PyCharm (MacOS)):
import pandas as pd
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(fiddy_states)
And I get the following error:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/user_name/PycharmProjects/PandasTest/Doc3.py
Traceback (most recent call last):
File "/Users/user_name/PycharmProjects/PandasTest/Doc3.py", line 9, in <module>
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 906, in read_html
keep_default_na=keep_default_na)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 733, in _parse
parser = _parser_dispatch(flav)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 693, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
In Anaconda does appear installed the last version of lxml (3.8.0). Despite of that, I have tried to reinstall it by: 1) writing pip install lxml and 2) downloading the lxml wheel corresponding to my python version (lxml-3.8.0-cp36-cp36m-win_amd64.whl), but in any case all remains the same (in the second case I get that it is not a supported wheel on this platform, even though the version of python is correct (3.6, 64 bits)).
I've read similar questions here (even with the same code above, since it's from a tutorial), but the problem still persists.
Based on the fact that the error is:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6
This means that you are working with python-3.6. Now usually the package manager for python-3.x is pip3. So you probably should install it with:
pip3 install lxml
For people reached here using Jupyter notebook, I restarted the kernel after pip install lxml and the error is gone.
I got same error, it seems that my python3 was pointing to pandas in python2 (since I have not install pandas in python3). After doing pip3 install pandas and restarting a notebook, it worked fine.
you may have to (re)install some of your libraries pip install lxml bs4 html5lib
pd.read_html() reads with 'lxml' library by default, so try another library that you installed above like pd.read_html(some_url, flavor='html5lib')
You can go to Settings > Project Interpreter > Click on '+' icon
Find 'lxml' from the list of packages and click 'Install Package' button found below.
I am using PyCharm 2019.2.1 (Community Edition)
Build #PC-192.6262.63, built on August 22, 2019
Runtime version: 11.0.3+12-b304.39 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Linux 4.15.0-58-generic
GC: ParNew, ConcurrentMarkSweep
Memory: 937M
Cores: 4
I tried to reinstall lxml without any progress.
I ended uninstalling pandas and reinstalling and updating and that solved my issues!
pip uninstall pandas
pip install pandas
pip3 install --upgrade pandas
I got the same error when trying to run some code that was using pandas. I tried some suggestions here but those did not work. Finally, what worked for me was the following two steps :
conda update anaconda
conda install spyder=5.0.5
Now when I restarted Spyder and ran my code it worked fine.
I have just installed and starting using anaconda so I don't know the root cause of this issue, but my guess is there seemed to be some "cross-connection" in the packages I had installed prior to my installation of Anaconda, and by running the above two steps now everything is running from within the Anaconda environment.
This error occurs when lxml is not installed, so just go to the terminal
and run: pip3 install lxml
I got the same problem. Trying to reinstall lxml does not work. After rereading the error message and tracing the error ~\Miniconda3\envs\mini_ds\lib\site-packages\pandas\io\html.py:872, I think I found the problem lies in the function _importers() in ~/pandas/io/html.py.
Here is the function:
def _importers() -> None:
# import things we need
# but make this done on a first use basis
global _IMPORTS
if _IMPORTS:
return
global _HAS_BS4, _HAS_LXML, _HAS_HTML5LIB
bs4 = import_optional_dependency("bs4", errors="ignore")
_HAS_BS4 = bs4 is not None
lxml = import_optional_dependency("lxml.etree", errors="ignore")
_HAS_LXML = lxml is not None
html5lib = import_optional_dependency("html5lib", errors="ignore")
_HAS_HTML5LIB = html5lib is not None
_IMPORTS = True
You can see that for lxml option, it actually tries importing "lxml.etree" instead of "lxml". So this is probably why reinstalling "lxml" would not help.
Conclusion, I think this is perhaps a problem of pandas version (mine is 1.4.1). For me, a quick solution is to specify the flavor ='html5lib' in pd.read_html().
I installed lxml 4.9.1, but it didn't work. So I tried to install lxml 4.8.0 instead, and it worked!
pip install lxml==4.8
As OP is using Anaconda, in order to solve that issue, install lxml by opening the CMD.Exe Prompt for the environment one is working on, and run
conda install -c anaconda lxml
(Source)
One can also do it by specifying the version as follows
conda install -c anaconda lxml=4.8.0
Notes:
pip doesn't manage dependencies the same way conda does and can, potentially, damage one's installation. Therefore, would recommend to use it only if conda doesn't work.
pip install lxml
# or
pip install lxml==4.9.1
If one is using pip and one has already the package installed and one is getting errors, one can pass -I (--ignore-installed) and -v as follows
pip install -Iv lxml==4.9.1
lxml official documentation can be found here.
This is their official GitHub repo.
I was seeing this issue as well on my RPi.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 1113, in read_html
displayed_only=displayed_only,
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 902, in _parse
parser = _parser_dispatch(flav)
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 859, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
Looking into /home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py it was attempting to use lxml.etree, so I attempted to just use that module
>>> from lxml import etree
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
I searched for that error and found that the following packages needed to be installed on the RPi
sudo apt-get install libxslt
After installing I was successfully able to use pandas
import pandas as pd
from urllibenter code here.request import Request, urlopen
url = 'WEB-SITE'
request_site = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(request_site)
dfk1 = pd.read_html(webpage, flavor='html5lib')
print(dfk1)
Related
Problem:
I've been attempting to use the Python library pdf2image, which I am aware requires the prior installation of poppler. Poppler is installed (via homebrew) and the package via pip.
However, when running convert_to_path(my_pdf) I get the following:
Traceback (most recent call last):
File "<ipython-input-9-ba107659b495>", line 1, in <module>
test_image = convert_from_path(testfile,
File "/Users/<myuser>/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 97, in convert_from_path
page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
File "/Users/<myuser>/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pdf2image/pdf2image.py", line 467, in pdfinfo_from_path
raise PDFInfoNotInstalledError(
PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
My System:
Mac, OC 10.15.7
Python (via Homebrew) version 3.8.0
which python
/Users/<myuser>/.pyenv/shims/python
which pip
/Users/<myuser>/.pyenv/shims/pip
What I've tried so far:
Poppler is installed via homebrew (brew install poppler) and pdf2image installed with pip3 install pdf2image
I've run brew cleanup too.
Attempted to force the poppler path in convert_to_path with the following,
pop_path = "/usr/local/Cellar/poppler/21.03.0_1"
convert_to_path(my_pdf_file,poppler_path = pop_path)
but still get the same error.
Had a decent look online and found many people with similar, yet not identical, problems. I feel like I must be doing something wrong, so any guidance would be great.
Partial Solution?
In manually entering the pop_path file path, I forgot to append /bin to the path,
pop_path = "/usr/local/Cellar/poppler/21.03.0_1/bin"
The code works now. Though my pride will take some time to recover...
I feel like I might still be sitting on a bad configuration issue? The many posts on similar issues seem to imply a homebrew installed-popper shouldn't have this issue. Is it maybe because I am using pyenv too?
if you use conda environment run the below code in mac terminal
conda install -c conda-forge pdf2image
conda install -c conda-forge poppler
It solve the problem
I am new to python and was using a python script written by someone else. I was running it fine in a different PC. Just had to install coupe of packages including pip3, google-cloud, google-cloud-bigquery and pandas.
Now when I installed the same packages on a different PC, I am unable to run the script. It is showed the following error first:
module = 'google.protobuf.descriptor_pb2' TypeError: expected bytes, Descriptor found
However, when In purged/re-installed/updated packages and also added protobuf3 and protobuf-py3 package, The error has been updated to the following message:
from google.cloud import bigquery
File "/home/mobeen/.local/lib/python3.6/site-packages/google/cloud/bigquery/__init__.py", line 35, in <module>
from google.cloud.bigquery.client import Client
File "/home/mobeen/.local/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 50, in <module>
import google.cloud._helpers
File "/home/mobeen/.local/lib/python3.6/site-packages/google/cloud/_helpers.py", line 33, in <module>
from google.protobuf import duration_pb2
File "/home/mobeen/.local/lib/python3.6/site-packages/google/protobuf/duration_pb2.py", line 8, in <module>
from google.protobuf import symbol_database as _symbol_database
File "/home/mobeen/.local/lib/python3.6/site-packages/google/protobuf/symbol_database.py", line 193, in <module>
_DEFAULT = SymbolDatabase(pool=descriptor_pool.Default())
AttributeError: module 'google.protobuf.descriptor_pool' has no attribute 'Default'
.Any help or leads in this will be appreciated
I solved the problem by uninstalling protobuf:
pip3 uninstall protobuf
pip3 uninstall python3-protobuf
NB: You should repeat this command until you get a message that there is no package named protobuf.
After that execute:
pip3 install protobuf
Install just the protobuf , don't install python3-protobuf
Hope this solution can help you.
I solved it by first executing twice
pip3 uninstall protobuf
The second time the terminal returned
Found existing installation: protobuf 3.6.1
Not uninstalling protobuf at /usr/lib/python3/dist-packages, outside environment /usr
Can't uninstall 'protobuf'. No files were found to uninstall.
Then I removed protobuf manually
sudo rm -r /usr/lib/python3/dist-packages/google/protobuf*
sudo rm -r /usr/lib/python3/dist-packages/protobuf*
And finally I executed
pip3 install --upgrade protobuf
And the problem was solved
Did you try this too?
"I solved the problem of showed Attribute Error: 'module' object has no attribute 'Default' when import tensorflow after installed by delete redundant protobuf file.
The reason is some google/protobuf/descriptor_pool.py do not have a 'Default' defined. This usually happened at old version of protobuf so I upgraded successfully, but problem not solved. And by checking PATH and search about 'google/protobuf', I found it existed in both "/usr/local/lib/python2.7/dist-packages/google/protobuf/" and "/usr/lib/python2.7/dist-packages/google/protobuf/". Previous one have attribute 'Default' but the second one not.
I tried import google.protobuf and google.protobuf.file, It shows '/usr/lib/python2.7/dist-packages/google/protobuf/init.pyc'. I deleted /usr/lib/python2.7/dist-packages/google/protobuf and tried to import tensorflow, worked."
Actually I've ran into a similar case, we had 2 packages installed protobuf and python3-protobuf. I actually dont know the root cause for this but apparently when you do that:
pip install protobuf
pip install python3-protobuf
that error you described is happening, looks like it gives you some different version, like those two packages have overlapping files, and they override each other or something.
the solution for me was simply to reverse the installation order (make sure to uninstall them both first):
pip install python3-protobuf
pip install protobuf
or just
pip install python3-protobuf protobuf
hope this helps anyone here.
The reason can be that the interpreter that you are using to run the python programm uses the previous version of google.protobuf
You can assure it to run in the interpreter
>>> import google.protobuf
>>> print google.protobuf.__version__
Then compare it running in the terminal
$pip show protobuf
If the versions are different that's the reason
So I suggest deleting this package right from the python interpreter console
>>> pip uninstall protobuf -y
After you can even install the package right from the python console
>>> pip install protobuf
After that you good to go ✌️
If anyone has had this problem before and come across a solution then please share. If I try
import bs4
I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'bs4'
I get this error if i try to import any variation of beautifulsoup4 (capitalized, with no number etc). But if I check that the module is installed:
conda list beautifulsoup4
PS C:\WINDOWS\system32> conda list beautifulsoup4
# packages in environment at C:\Anaconda3:
#
beautifulsoup4 4.6.0 <pip>
I've tried removing the packages and reinstalling with conda or pip and the same result always happens. If I check where its installed the appropriate package seems to be installed and unpacked in the right place:
C:\Anaconda3\pkgs\beautifulsoup4-4.6.0-py34_0\Lib\site-packages\bs4
I've tried running the test programs in the directory and it returns the error described above. 'no bs4', I tried typing help('modules') into the shell but there is no sign of a beautifulsoup module at all which is unexpected given that conda list says its installed.
I've had no problems installing other packages with conda and can't see any obvious solution to this. Any help would be great?
Update 2:
the main problem turned out to be a different one from what I had thought it was, and asked for help here. I moved the new question to a new post:
Install custom python package in virtualenv
Update:
ok, so I screwed up my non-virtualenv by accident.
The non-virtualenv (normal bash) I could easily fix by removing the manually installed (via pip) lxml and running
conda install lxml --force
But for some reason, that doesn't work in the virtualenv.
There, running
conda install lxml --force
works without error message, but when I run python and simply say
>>> import lxml
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named lxml
Any suggestions??
old message:
I'm trying to use virtualenv for my python flask application.
The python code runs perfectly fine without the virtualenv.
I've installed the packages I need in the virtualenv, but I after installing lxml via
pip install lxml
Installing collected packages: lxml
Successfully installed lxml-3.6.0
I get the following error message when running my code:
File "/Users/XXX/xxx/flask-aws/lib/python2.7/site-packages/docx-0.2.4-py2.7.egg/docx.py", line 17, in <module>
from lxml import etree
ImportError: dlopen(/Users/XXX/xxx/flask-aws/lib/python2.7/site-packages/lxml/etree.so, 2): Library not loaded: libxml2.2.dylib
Referenced from: /Users/XXX/xxx/flask-aws/lib/python2.7/site-packages/lxml/etree.so
Reason: Incompatible library version: etree.so requires version 12.0.0 or later, but libxml2.2.dylib provides version 10.0.0
I have seen other people report similar problems at stackoverflow, and one guy remarked that the problem might related to the virtualenv, but there was no solution.
Once again: The python code runs perfectly fine without virtualenv! But inside virtualenv, I can't get it to work.
I'm using Anaconda Python 2.7 on a Mac.
I'd appreciate any help guys!
I had the same error and stumbled upon this link, after searching for the incompatible library error "libxml2.2.dylib provides version 10.0.0"
Installing libxml2 that worked for me:
brew install libxml2
brew link --force libxml2
Solution that works for me in virtual environment is to force pip to recompile lxml:
pip install lxml --force-reinstall --ignore-installed --no-binary :all:
I installed bs4 successfully but when I import it, the command line told me that
Traceback (most recent call last):
Python Shell, prompt 3, line 1
File "C:\Python27\Lib\site-packages\bs4\__init__.py", line 303, in <module>
from . import _htmlparser
File "C:\Python27\Lib\site-packages\bs4\_htmlparser.py", line 36, in <module>
from bs4.builder import (
ImportError: No module named builder
I have searched google but I didn't find a solution..
Could our experts help me on this issue ?
thanks a lot !
my system info:
PC OS : windows 7 64bit
Python version: 2.7.10
You must first pip install beautifulsoup4, then try import bs4. If this doesn't work odds are you have a messed up pip configuration. In order to remedy this either reinstall pip, use easy_install or build from source.
In order to use easy_install just run easy_install beautifulsoup4. In order to build from source run download and extract this zip (unzip /path/to/beautifulsoup4.zip if you're at the terminal). Next cd into the now unzipped folder by doing cd /path/to/beautifulsoup and run python setup.py. The package will now be installed and ready to import!
I uninstalled bs4 package and re-installed it. and now it works...
It is quite weird because I tried to uninstall and re-install, but only this time, it worked...
Thanks for your kind help :)