Unable to run scrapy for Python - python

I am new to this platform as well as to Python scraping. I hope that my question will still be understandable and somebody can help me. Sorry, in case I make something unclear...
I have already checked other posts on a similar topic but could not manage to overcome my problem.
I am currently getting into web scraping and wanted to try Scrapy, so I followed the installation instructions on the website: http://doc.scrapy.org/en/0.16/intro/install.html#intro-install
After I had figured out how it works, I decided to run it in a virtual environment.
I installed virtualenv and pip.
Then I installed Scrapy.
When I now want to start with the tutorial
scrapy startproject tutorial
I get the following error message:
File "/Users/XXX/environment_trial/bin/scrapy", line 3, in <module>
from scrapy.cmdline import execute
File "/Users/XXX/environment_trial/lib/python2.7/site-packages/scrapy/cmdline.py", line 7, in <module>
from scrapy.crawler import CrawlerProcess
File "/Users/XXX/environment_trial/lib/python2.7/site-packages/scrapy/crawler.py", line 3, in <module>
from twisted.internet import reactor, defer
ImportError: No module named twisted.internet
(environment_trial)XXX-iMac:~ XXX$
I could not find a Twisted.py on my Mac as suggested by other posts.
Can somebody please tell me what to do?

Simply put, you need to install twisted. You can get it from the download page. It looks like you'll need to install from source on a newer Mac, but that's just a case of extracting the tarball and running python setup.py install in the extracted folder.
Edit: Since you already have pip installed, you can also grab Twisted with it. With the virtualenv activated, run:
pip install -U twisted
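To confirm the fix, it can help to check what the interpreter inside the virtualenv actually sees. The following is a minimal sketch, assuming Python 3 (the traceback above comes from Python 2.7, where importlib.util is not available); locate_module is a hypothetical helper name, not part of Scrapy or Twisted.

```python
import importlib.util
import sys


def locate_module(name):
    """Return where the current interpreter would import `name` from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # A missing parent package (e.g. "twisted" for "twisted.internet")
        # raises instead of returning None.
        return None
    if spec is None:
        return None
    # Namespace packages have no single origin, so fall back to a label.
    return spec.origin or "(namespace package)"


print("interpreter:", sys.executable)
for module_name in ("scrapy", "twisted.internet"):
    print(module_name, "->", locate_module(module_name))
```

If the interpreter path printed is not the one inside the virtualenv, the package was probably installed into a different Python than the one running the scrapy command.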

Please make sure that all the binaries you install correspond to the exact version of Python that you have installed (e.g. Python 2.7).
I made the mistake of installing pyOpenSSL for Python 3.6, and it took me a long time to realize that the versions did not match.

Related

How to install PyGTK on macOS Catalina (2020)

I am trying to install a software called p4vasp:
https://github.com/orest-d/p4vasp/blob/master/README.MacOS.
I am getting this error message:
You need to get version 2.0 (or later) of PyGTK for this to work. You can get source code from http://www.pygtk.org
So I go there and follow the steps, but when I try to run the python script hello.py it fails:
File "hello.py", line 1, in <module>
import gi
ImportError: No module named gi
I think this could be the problem.
pygobject3 3.36.1 is already installed, and so is gtk+3 3.24.22.
I have tried all the solutions I could find, but none have worked so far. Could someone help me, please?
Thank you in advance.
Best wishes to all

Windows: Can no longer find BeautifulSoup when double-clicking python file

For years now I've been using a Python script on my Windows box that uses BeautifulSoup and requests to do a bit of web scraping and return some running race results for my club. A long time ago, I installed Python 3.x and did a pip install of BeautifulSoup4, and things just worked. I would double-click the script, or run it from a command prompt with custom arguments, and all would run well.
This thing has worked perfectly for years.
Unfortunately I seem to have messed up the install lately. Now when I run it from the command line, I get this:
D:\Dev\Srr>srr_new6.py
Traceback (most recent call last):
File "D:\Dev\Srr\srr_new6.py", line 23, in <module>
from bs4 import BeautifulSoup
ImportError: No module named bs4
D:\Dev\Srr>
But if I try to install beautifulsoup4 it tells me it's already installed.
D:\Dev\Srr>pip install beautifulsoup4
Requirement already satisfied: beautifulsoup4 in c:\program files\python36\lib\site-packages
D:\Dev\Srr>
When I installed Python I made sure to put it in the system path, and when I check the path from my command prompt, there it is. If I type "python" at the command line, I do get the interactive environment. I also tried uninstalling and then reinstalling both Python and BeautifulSoup, but it did not appear to help.
Googling I've found mention of "virtualenv" so it sounds to me like I need to "activate" that but cannot seem to find or invoke it. Windows doesn't know about it when I type it from the command line.
Is there some simple way that I can fix this? Is there some way to make beautifulsoup4 be a part of the "default" virtualenv on Windows so that windows will always find it?
(I'd love to learn python. I've been saying that for years. But this isn't for work and there always seems to be other things I need to learn first for work)

python2-gobject on Arch linux ImportError: No module named gobject

I have some example file (download URL) to understand how to create Twisted chat with GUI.
In this particular file I have an exception ImportError: No module named gobject.
It's true: I only have gi, and I have already installed:
sudo pacman -S python2-gobject
So I decided that this code is for Python 3, and failed again. After pip install twisted I still can't run the code: ImportError: cannot import name 'gtk2reactor' appears.
How can I at least run this code?
And how can I prevent this in the future? I get the same error with many scientific packages for Python.
P.S. Installing from source is not possible either:
make returns a lot of errors even though ./configure completes fine.
You may want to clarify your question a bit, but if you can't run code after a pip install, or if your pip is broken, uninstalling and reinstalling pip may be your best bet.
If you did successfully download the package, then I would look at where it was installed and make sure the package is for the correct version of Python, and that it is actually installed.
Juhaz from Freenode told me that the code in the examples is pretty old and uses unmaintained bindings.
In case someone starts the same way I did, this question may be helpful.
Try looking at wxPython; see, for example, this post.
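If a script has to cope with both the old PyGTK-era module (gobject) and the newer PyGObject package (gi), one option is to probe for whichever is installed before doing anything else. A minimal sketch; first_importable is a hypothetical helper, and the two module names are simply the ones mentioned in the question.

```python
import importlib


def first_importable(candidates):
    """Return (name, module) for the first candidate that imports, else (None, None)."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    return None, None


binding_name, binding = first_importable(["gobject", "gi"])
if binding is None:
    print("Neither gobject nor gi is available to this interpreter")
else:
    print("Using binding:", binding_name)
```

This only reports which binding is present. Code written against gobject generally still has to be ported before it works with gi.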

Conflicting versions of python in ubuntu

So I had Python 2.7.2 on my server and needed to update it to Python 2.7.3. I tried to remove the 2.7.2 version and then install the new one from source. I wasn't able to remove 2.7.2 because the system uses it to run crucial services on the server, so I installed 2.7.3 in the hope that I would be able to remove the old version afterwards. I still can't remove the old version, and although I can run Python 2.7.3, I can't import any module I install with it. I added the path to sys.path, and Python started finding the module, but importing it causes other errors.
Running python executes /usr/local/bin/python, which is the 2.7.3 version where the problems are.
If I run /usr/bin/python instead, it executes the old version and everything works fine there; I can import the newly installed modules.
So what can I do to make Python 2.7.3 work?
I've read a lot of tutorials and tried things like adding the library directories to .pth files. Python started finding the modules, but when importing them I get errors like this:
>>> import numpy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line 137, in <module>
import add_newdocs
File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line 9, in <module>
from numpy.lib import add_newdoc
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", line 4, in <module>
from type_check import *
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/type_check.py", line 8, in <module>
import numpy.core.numeric as _nx
File "/usr/local/lib/python2.7/dist-packages/numpy/core/__init__.py", line 5, in <module>
import multiarray
ImportError: /usr/local/lib/python2.7/dist-packages/numpy/core/multiarray.so: undefined symbol: PyUnicodeUCS4_AsUnicodeEscapeString
Thanks for the help
EDIT: PROBLEM SOLVED
To solve the missing modules, I created a .pth file under /usr/local/lib/python2.7/site-packages/ listing the directories where the Python modules are, and Python started to find them.
To fix the compatibility problems, you can install Python from source and specify the Unicode width at configure time. The undefined symbol PyUnicodeUCS4_AsUnicodeEscapeString means the modules expect a UCS4 build, so run ./configure --enable-unicode=ucs4
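For readers who have not used .pth files before, here is a small sketch of the mechanism. It uses temporary directories instead of the real site-packages directory, and add_paths_via_pth and the file name extra_paths.pth are made up for the illustration. site.addsitedir processes a directory the same way Python processes site-packages at startup: every line of a .pth file that names an existing directory is appended to sys.path.

```python
import os
import site
import sys
import tempfile


def add_paths_via_pth(site_dir, extra_dirs, pth_name="extra_paths.pth"):
    """Write a .pth file listing extra_dirs into site_dir, then process site_dir."""
    pth_path = os.path.join(site_dir, pth_name)
    with open(pth_path, "w") as handle:
        for directory in extra_dirs:
            # One directory per line; lines naming missing paths are ignored.
            handle.write(directory + "\n")
    site.addsitedir(site_dir)
    return pth_path


demo_site = tempfile.mkdtemp()      # stands in for .../site-packages
demo_modules = tempfile.mkdtemp()   # stands in for the directory holding the modules
pth_path = add_paths_via_pth(demo_site, [demo_modules])
print("wrote", pth_path)
print("on sys.path:", demo_modules in sys.path)
```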
more information here
Do not EVER mess with system python, EVER.
What you should do is install python 2.7.3 with a --prefix into your home directory, then use virtualenv -p /home/myuser/path/to/python.
In any case, using virtualenv to run your own application is almost always a good idea, as it avoids polluting the system package directories with libraries you use in your own applications.
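A quick way to tell whether a script is running inside a virtualenv or against the system interpreter is to compare the interpreter prefixes. This is a sketch, not an official API: in_virtualenv is a hypothetical helper, sys.real_prefix is what older virtualenv releases set, and sys.base_prefix is what the standard venv module and newer virtualenv releases provide.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter appears to live in a virtual environment."""
    if hasattr(sys, "real_prefix"):
        # Set by older virtualenv releases.
        return True
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # In a venv, sys.prefix points at the environment, base_prefix at the real install.
    return base_prefix != sys.prefix


print("prefix:", sys.prefix)
print("inside a virtual environment:", in_virtualenv())
```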
It looks like the modules you've installed were built against your old version of Python, or at least a version incompatible with your newer installation. The import error you're seeing at the bottom is the numpy module searching for a symbol that is not in your build of 2.7.3. There is further information here.
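The UCS4 in the missing symbol name refers to how the interpreter was compiled to store Unicode strings. You can check which kind of build a given Python 2 interpreter is by looking at sys.maxunicode. A minimal sketch; unicode_build is a hypothetical helper name.

```python
import sys


def unicode_build():
    """Return "UCS4" for a wide Unicode build and "UCS2" for a narrow one."""
    # Narrow (UCS2) builds report 0xFFFF; wide (UCS4) builds report 0x10FFFF.
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"


print(sys.version.split()[0], unicode_build())
```

Running this with both /usr/bin/python and /usr/local/bin/python shows whether the two builds disagree. Python 3.3 and later always report the wide value, so the distinction only matters for Python 2 and early Python 3.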
If possible, it's usually much easier to upgrade Python with a package manager. That way, if anything on your system depends on Python but does not need exactly 2.7.2, Python can be upgraded without disturbing anything. I'm guessing that either your server doesn't have a newer version of Python available and you can't add new repositories, or you don't have access to a package manager. If using packages is possible, I would go ahead and remove what you've built from source first. Note that 'make clean' only removes the build artifacts inside the source tree; the files that 'make install' copied under /usr/local have to be removed separately.
If that isn't an option, then there should be a way to compile Python, but not install it into system directories. Then you could add a symlink for users, and make sure that symlink has precedence in their path.
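When installing Python from source, use the following steps.
Use --prefix to specify the installation directory (the path below is only an example; choose a directory of your own rather than a system location such as /usr/bin):
./configure --prefix=$HOME/python-2.7.3
make
make install
Then, every time you open a new terminal, you have to run
export PATH="$HOME/python-2.7.3/bin:$PATH"
to tell the shell where this Python is installed.
This way you can keep any number of Python versions side by side.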
You can install python libs from R. It works for me.
For example, to install numpy library from R type:
system('python -m pip install -U numpy')
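The same idea works from Python itself. Running pip as a module of the interpreter you are actually using (sys.executable) avoids installing into a different Python than the one that later does the import. This is a sketch: pip_install is a hypothetical helper, and it only runs pip when dry_run is False.

```python
import subprocess
import sys


def pip_install(package, upgrade=True, dry_run=True):
    """Build (and optionally run) a pip command bound to the current interpreter."""
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("-U")
    command.append(package)
    if not dry_run:
        # Raises CalledProcessError if pip exits with a non-zero status.
        subprocess.check_call(command)
    return command


print(" ".join(pip_install("numpy")))
```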

How to install python database library to work with robotframework - API issue

I am on a Windows 7 64bit machine, using Python 2.7 and I am trying to use the python database library in robotframework. I have previously used a java library file but now I want to use the python library.
I have gone to github and downloaded version 0.6.
I have also installed a setup file and MySQL-python from here
However when I try to install the database library (using python setup.py build) I get the following error:
Traceback (most recent call last):
File "setup.py", line 25, in <module>
from DatabaseLibrary import __version__
File "src\DatabaseLibrary\__init__.py", line 15, in <module>
from connection_manager import ConnectionManager
File "src\DatabaseLibrary\connection_manager.py", line 16, in <module>
from robot.api import logger
ImportError: No module named api
Why do I not have robot.api and how do I get it and install it? Or is there an easier way to install the python database library?
It seems that the Database library uses Robot Framework internals but does not list Robot Framework as its dependency. The robot.api package was introduced in RF 2.6, so installing or upgrading to the latest Robot Framework (from the project pages) should resolve your issue.
First ensure the integrity of your module before trying to install. In order to install a module using distutils (setup.py) you need to run this command as an administrator:
python setup.py install
That should run the setup and report back to you any missing dependencies.
Alternatively, you can install pip from this location: pip project home page. The page provides instructions on how to install pip, a package manager for Python similar to PEAR for PHP, CPAN for Perl, or gem for Ruby. Once it is installed, you can install packages with this command:
pip install <module>
The issue was that I did not have the "API" folder in the "Robot" folder in "Python27\Lib\site-packages", because I did not have the latest version of RF. logger is a new logging API introduced in Robot Framework 2.6 (October 2011), as janne has pointed out.
Two fixes for this issue seem to be:
1. Tested and working, but not recommended unless you don't want to update RF: edit the two files "connection_manager.py" and "query.py" in "robotframework-databaselibrary-0.6" so that there is no dependency on the Robot Framework logger. This is an easy and quick edit: replace "from robot.api import logger" with "import logging", and replace "logger" with "logging".
See "http://robotframework.googlecode.com/hg/doc/userguide/RobotFrameworkUserGuide.html#programmatic-logging-apis" for more detail.
2. Reinstall Robot Framework and ensure the "API" folder is created. This is the recommended approach.
(Added as an answer as too long for a comment)
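A variation on the first fix, for anyone who wants the files to work with both old and new Robot Framework versions, is to fall back to the standard logging module only when robot.api is missing. This is a sketch of that idea rather than the edit described above, and the logger name "DatabaseLibrary" is just an illustrative choice.

```python
import logging

try:
    # Present in Robot Framework 2.6 and later.
    from robot.api import logger
except ImportError:
    # Older or missing Robot Framework: use a standard library logger instead.
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("DatabaseLibrary")

# Both the Robot Framework logger and a standard Logger provide info() and debug().
logger.info("Connecting to database")
```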
