Scrapy, problems on the tutorial - python

Ive been trying to get around Scrapy.
I have python 2.7 installed on my mac (OSX 10.8.5) from before, so I installed pip, scrapy, lxml and twisted (I did the last one manually though through dmg file).
I try to run scrapy startproject tutorial to no success I only get:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/bin/scrapy", line 3, in <module>
from scrapy.cmdline import execute
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/__init__.py", line 43, in <module>
from twisted import version as _txv
ImportError: No module named twisted
Now I've looked around for hours can't seem to find what the problem is, so I thought I'd give this here a shot, any suggestions?
pJ

You might be better off setting up a virtualenv and installing twisted with pip instead.

Got the same error. Works if you install twisted manually, then uninstall scrappy and
reinstall
pip install twisted
pip uninstall scrapy
pip install scrapy

Related

Python SAML authentication

I'm trying to build a SAML authentication mechanism in Python using the OneLogin module, but I can't even get started. Trying from the example code provided in the docs, I can't even load the package.
This works:
import onelogin
but this gets a ModuleNotFoundError:
>>> from onelogin.saml2.auth import OneLogin_Saml2_Auth
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'onelogin.saml2'
I only managed to get this working on a Linux machine, so hope you're not on windows. Here's what worked for me.
I had to do a seperate install of hte xmlsec-library first:
apt-get install xmlsec1 openssl python-lxml libxmlsec1 libxmlsec1-dev
and then I was able to
pip install python3-saml
and there were no complains anymore.
You need to install Xmlsec first and then python3-saml but if you are on Windows, Xmlsec has some issues but as with python 3.6 and below the problem is solved. Use this link to download the wheel file for your python version https://github.com/mehcode/python-xmlsec/releases
Install the wheel file using
pip install <wheel_file_name>
As with Python 3.7, the only way out is to install xmlsec on a Linux machine as it is not yet supported on Windows.

Python: ImportError: lxml not found, please install it

I have the following code (in PyCharm (MacOS)):
import pandas as pd
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(fiddy_states)
And I get the following error:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/user_name/PycharmProjects/PandasTest/Doc3.py
Traceback (most recent call last):
File "/Users/user_name/PycharmProjects/PandasTest/Doc3.py", line 9, in <module>
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 906, in read_html
keep_default_na=keep_default_na)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 733, in _parse
parser = _parser_dispatch(flav)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 693, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
In Anaconda does appear installed the last version of lxml (3.8.0). Despite of that, I have tried to reinstall it by: 1) writing pip install lxml and 2) downloading the lxml wheel corresponding to my python version (lxml-3.8.0-cp36-cp36m-win_amd64.whl), but in any case all remains the same (in the second case I get that it is not a supported wheel on this platform, even though the version of python is correct (3.6, 64 bits)).
I've read similar questions here (even with the same code above, since it's from a tutorial), but the problem still persists.
Based on the fact that the error is:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6
This means that you are working with python-3.6. Now usually the package manager for python-3.x is pip3. So you probably should install it with:
pip3 install lxml
For people reached here using Jupyter notebook, I restarted the kernel after pip install lxml and the error is gone.
I got same error, it seems that my python3 was pointing to pandas in python2 (since I have not install pandas in python3). After doing pip3 install pandas and restarting a notebook, it worked fine.
you may have to (re)install some of your libraries pip install lxml bs4 html5lib
pd.read_html() reads with 'lxml' library by default, so try another library that you installed above like pd.read_html(some_url, flavor='html5lib')
You can go to Settings > Project Interpreter > Click on '+' icon
Find 'lxml' from the list of packages and click 'Install Package' button found below.
I am using PyCharm 2019.2.1 (Community Edition)
Build #PC-192.6262.63, built on August 22, 2019
Runtime version: 11.0.3+12-b304.39 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Linux 4.15.0-58-generic
GC: ParNew, ConcurrentMarkSweep
Memory: 937M
Cores: 4
I tried to reinstall lxml without any progress.
I ended uninstalling pandas and reinstalling and updating and that solved my issues!
pip uninstall pandas
pip install pandas
pip3 install --upgrade pandas
I got the same error when trying to run some code that was using pandas. I tried some suggestions here but those did not work. Finally, what worked for me was the following two steps :
conda update anaconda
conda install spyder=5.0.5
Now when I restarted Spyder and ran my code it worked fine.
I have just installed and starting using anaconda so I don't know the root cause of this issue, but my guess is there seemed to be some "cross-connection" in the packages I had installed prior to my installation of Anaconda, and by running the above two steps now everything is running from within the Anaconda environment.
This error occurs when lxml is not installed, so just go to the terminal
and run: pip3 install lxml
I got the same problem. Trying to reinstall lxml does not work. After rereading the error message and tracing the error ~\Miniconda3\envs\mini_ds\lib\site-packages\pandas\io\html.py:872, I think I found the problem lies in the function _importers() in ~/pandas/io/html.py.
Here is the function:
def _importers() -> None:
# import things we need
# but make this done on a first use basis
global _IMPORTS
if _IMPORTS:
return
global _HAS_BS4, _HAS_LXML, _HAS_HTML5LIB
bs4 = import_optional_dependency("bs4", errors="ignore")
_HAS_BS4 = bs4 is not None
lxml = import_optional_dependency("lxml.etree", errors="ignore")
_HAS_LXML = lxml is not None
html5lib = import_optional_dependency("html5lib", errors="ignore")
_HAS_HTML5LIB = html5lib is not None
_IMPORTS = True
You can see that for lxml option, it actually tries importing "lxml.etree" instead of "lxml". So this is probably why reinstalling "lxml" would not help.
Conclusion, I think this is perhaps a problem of pandas version (mine is 1.4.1). For me, a quick solution is to specify the flavor ='html5lib' in pd.read_html().
I installed lxml 4.9.1, but it didn't work. So I tried to install lxml 4.8.0 instead, and it worked!
pip install lxml==4.8
As OP is using Anaconda, in order to solve that issue, install lxml by opening the CMD.Exe Prompt for the environment one is working on, and run
conda install -c anaconda lxml
(Source)
One can also do it by specifying the version as follows
conda install -c anaconda lxml=4.8.0
Notes:
pip doesn't manage dependencies the same way conda does and can, potentially, damage one's installation. Therefore, would recommend to use it only if conda doesn't work.
pip install lxml
# or
pip install lxml==4.9.1
If one is using pip and one has already the package installed and one is getting errors, one can pass -I (--ignore-installed) and -v as follows
pip install -Iv lxml==4.9.1
lxml official documentation can be found here.
This is their official GitHub repo.
I was seeing this issue as well on my RPi.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 1113, in read_html
displayed_only=displayed_only,
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 902, in _parse
parser = _parser_dispatch(flav)
File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 859, in _parser_dispatch
raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
Looking into /home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py it was attempting to use lxml.etree, so I attempted to just use that module
>>> from lxml import etree
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
I searched for that error and found that the following packages needed to be installed on the RPi
sudo apt-get install libxslt
After installing I was successfully able to use pandas
import pandas as pd
from urllibenter code here.request import Request, urlopen
url = 'WEB-SITE'
request_site = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(request_site)
dfk1 = pd.read_html(webpage, flavor='html5lib')
print(dfk1)

Python error message "Incompatible library version" libxml and etree.so

Update 2:
the main problem turned out to be a different one from what I had thought it was, and asked for help here. I moved the new question to a new post:
Install custom python package in virtualenv
Update:
ok, so I screwed up my non-virtualenv by accident.
The non-virtualenv (normal bash) I could easily fix by removing the manually installed (via pip) lxml and running
conda install lxml --force
But for some reason, that doesn't work in the virtualenv.
There, running
conda install lxml --force
works without error message, but when I run python and simply say
>>> import lxml
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named lxml
Any suggestions??
old message:
I'm trying to use virtualenv for my python flask application.
The python code runs perfectly fine without the virtualenv.
I've installed the packages I need in the virtualenv, but I after installing lxml via
pip install lxml
Installing collected packages: lxml
Successfully installed lxml-3.6.0
I get the following error message when running my code:
File "/Users/XXX/xxx/flask-aws/lib/python2.7/site-packages/docx-0.2.4-py2.7.egg/docx.py", line 17, in <module>
from lxml import etree
ImportError: dlopen(/Users/XXX/xxx/flask-aws/lib/python2.7/site-packages/lxml/etree.so, 2): Library not loaded: libxml2.2.dylib
Referenced from: /Users/XXX/xxx/flask-aws/lib/python2.7/site-packages/lxml/etree.so
Reason: Incompatible library version: etree.so requires version 12.0.0 or later, but libxml2.2.dylib provides version 10.0.0
I have seen other people report similar problems at stackoverflow, and one guy remarked that the problem might related to the virtualenv, but there was no solution.
Once again: The python code runs perfectly fine without virtualenv! But inside virtualenv, I can't get it to work.
I'm using Anaconda Python 2.7 on a Mac.
I'd appreciate any help guys!
I had the same error and stumbled upon this link, after searching for the incompatible library error "libxml2.2.dylib provides version 10.0.0"
Installing libxml2 that worked for me:
brew install libxml2
brew link --force libxml2
Solution that works for me in virtual environment is to force pip to recompile lxml:
pip install lxml --force-reinstall --ignore-installed --no-binary :all:

python 2.7.10 issues about import bs4

I installed bs4 successfully but when I import it, the command line told me that
Traceback (most recent call last):
Python Shell, prompt 3, line 1
File "C:\Python27\Lib\site-packages\bs4\__init__.py", line 303, in <module>
from . import _htmlparser
File "C:\Python27\Lib\site-packages\bs4\_htmlparser.py", line 36, in <module>
from bs4.builder import (
ImportError: No module named builder
I have searched google but I didn't find a solution..
Could our experts help me on this issue ?
thanks a lot !
my system info:
PC OS : windows 7 64bit
Python version: 2.7.10
You must first pip install beautifulsoup4, then try import bs4. If this doesn't work odds are you have a messed up pip configuration. In order to remedy this either reinstall pip, use easy_install or build from source.
In order to use easy_install just run easy_install beautifulsoup4. In order to build from source run download and extract this zip (unzip /path/to/beautifulsoup4.zip if you're at the terminal). Next cd into the now unzipped folder by doing cd /path/to/beautifulsoup and run python setup.py. The package will now be installed and ready to import!
I uninstalled bs4 package and re-installed it. and now it works...
It is quite weird because I tried to uninstall and re-install, but only this time, it worked...
Thanks for your kind help :)

ImportError: No module named libxml2

I am using Ubuntu 12.04.2 LTS. I have used libxml2 in my python script and when I try to run it, gives error
Traceback (most recent call last):
File "deploy.py", line 3, in <module>
import libxml2
ImportError: No module named libxml2
I tried almost all stackoverflow answers for the same question but nothings solves the issue (Installed several different packages).
You have to install the package in Ubuntu before being able to use it:
sudo apt-get install python-libxml2
You'd have to install python-libxml2.
Note that that's a fairly basic wrapper around the libxml2 C library. You may want to use lxml instead, for a far more Pythonic XML API.
You can try
pip install libxml2-python

Categories

Resources