I am running an EMR cluster (AWS), but I do not understand how the notebook imports packages. I am using the PySpark kernel.
import boto3
No module named 'boto3'
Traceback (most recent call last):
ModuleNotFoundError: No module named 'boto3'
print (sys.version) shows
3.7.6 (default, Feb 26 2020, 20:54:15)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-6)]
print(sys.executable) shows
/tmp/1594625399736-0/bin/python
I have installed boto3 with both Conda and pip3.
How can I solve this?
Are you using PySpark? If so, you need to install the packages in the Spark context. Refer to this AWS document: https://aws.amazon.com/blogs/big-data/install-python-libraries-on-a-running-cluster-with-emr-notebooks/
Similarly, install any dependency packages if you see a module-not-found error on import, and make sure the versions are compatible.
sc.list_packages()
Package Version
-------------------------- -------
beautifulsoup4 4.9.0
boto 2.49.0
cycler 0.10.0
jmespath 0.9.5
kiwisolver 1.2.0
lxml 4.5.0
matplotlib 3.2.2
mysqlclient 1.4.2
nltk 3.4.5
nose 1.3.4
numpy 1.19.0
pandas 1.0.5
pip 9.0.1
py-dateutil 2.2
py4j 0.10.9
pyparsing 2.4.7
pyspark 3.0.0
python-dateutil 2.8.1
python37-sagemaker-pyspark 1.3.0
pytz 2020.1
PyYAML 5.3.1
setuptools 28.8.0
six 1.15.0
soupsieve 1.9.5
wheel 0.29.0
windmill 1.6
I have boto.
sc.install_pypi_package("boto3")
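To confirm which interpreter a notebook cell actually runs under, and whether a module is visible to it, a small check like the one below may help. It is only a sketch: describe_module is a hypothetical helper name, not part of EMR or PySpark, and it relies on the standard library alone.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether a top-level module is importable by this interpreter."""
    # find_spec returns None when the module cannot be found on sys.path.
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "interpreter": sys.executable,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


# Run this in the notebook cell before and after installing the package.
print(describe_module("boto3"))
```

If "found" is False while the package is installed elsewhere on the cluster, the interpreter path shows which environment the package needs to be installed into.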
So, I'm trying to install and use the google-images-download repo both through: pip install google-images-download and pip install git+https://github.com/Joeclinton1/google-images-download.git
I've tried installing it as SU as well. In PyCharm when I view packages I do see it but when I try this code:
from google_images_download import google_images_download
#instantiate the class
response = google_images_download.googleimagesdownload()
arguments = {"keywords":"aeroplane, school bus, dog in front of house",
"limit":10,"print_urls":False}
paths = response.download(arguments)
#print complete paths to the downloaded images
print(paths)
it gives this error continuously:
Traceback (most recent call last):
File "/Users/*x*/Desktop/SchoolPython/PythonUVA/Webscrape.py", line 1, in <module>
from google_images_download import google_images_download
ModuleNotFoundError: No module named 'google_images_download'
I think it might not be looking in the right filepath or library but any other repo I tried previously did work.
Any help is greatly appreciated.
*edit for versions
(3.9UVA) MacBook-Pro-van-Flavia:Webscrape.py flavia$ which pip
/Users/flavia/PycharmProjects/3.9UVA/bin/pip
(3.9UVA) MacBook-Pro-van-Flavia:Webscrape.py flavia$ which python
/Users/flavia/PycharmProjects/3.9UVA/bin/python
(3.9UVA) MacBook-Pro-van-Flavia:Webscrape.py flavia$ pip list
Package Version
---------------------- -----------
async-generator 1.10
attrs 21.4.0
certifi 2022.5.18.1
cffi 1.15.0
charset-normalizer 2.0.12
cryptography 37.0.2
google-images-download 2.8.0
h11 0.13.0
idna 3.3
outcome 1.1.0
Pillow 9.1.1
pip 21.3.1
pycparser 2.21
pyOpenSSL 22.0.0
PySocks 1.7.1
requests 2.27.1
selenium 4.2.0
setuptools 60.2.0
sniffio 1.2.0
sortedcontainers 2.4.0
trio 0.20.0
trio-websocket 0.9.2
urllib3 1.26.9
wheel 0.37.1
wsproto 1.1.0
I'm stumped. I'm developing some enhancements to scikit-image which are failing the automated build tests, probably due to rounding errors. I therefore need to get the automated tests running on my Windows system so that I can debug and work out what's wrong. I've so far tried two approaches, neither of which is working:
In my Anaconda Python 3.6 environment, when I try to run the automated tests, I am getting the following error:
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
...which I have found reference to in other contexts, but have not been able to eliminate.
Since the automated tests do run (but fail) on a Python 3.5-based system, I thought things might work if I tried a local Python 3.5 environment. Here, I am running into the issue that the environment cannot find the MS C++ compiler cl.exe, despite it being installed. It is installed in C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64\ and is found and executed by my Python 3.6 environment, but my Python 3.5 environment doesn't find it even though I added that directory to my PATH. I should add that my Python 3.6 environment finds it without the directory being added to the PATH. I understand that both Python 3.5 and 3.6 use MSVC 14.0.
I would prefer to fix the problem in my Python 3.6 environment if possible. Any assistance much appreciated.
Update
I have made a box-fresh Python 3.6 conda environment as follows:
conda create --name sk36 python=3.6
conda activate sk36
conda install scikit-image --only-deps
conda install cython
git clone https://github.com/scikit-image/scikit-image.git
cd scikit-image
pip install -e .
pytest skimage/feature
The specific error I am getting is as follows:
..\Anaconda3\lib\site-packages\py\_path\local.py:662: in pyimport
__import__(modname)
skimage\__init__.py:135: in <module>
from .data import data_dir
skimage\data\__init__.py:13: in <module>
from ..io import imread, use_plugin
skimage\io\__init__.py:7: in <module>
from .manage_plugins import *
skimage\io\manage_plugins.py:24: in <module>
from .collection import imread_collection_wrapper
skimage\io\collection.py:12: in <module>
from ..external.tifffile import TiffFile
skimage\external\tifffile\__init__.py:1: in <module>
from .tifffile import imsave, imread, imshow, TiffFile, TiffWriter, TiffSequence
skimage\external\tifffile\tifffile.py:292: in <module>
from . import _tifffile
E RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
...which appears to have something to do with tifffile. Since this package wasn't originally explicitly installed in my new environment, I tried installing various versions of it, including some which downgraded numpy and scipy. Still the same error as above.
Having done some more research it would appear that something is seeing numpy 1.13.x when in fact version 1.15.4 is installed. Here is the full output from conda list:
# Name Version Build Channel
blas 1.0 mkl anaconda
ca-certificates 2018.03.07 0 anaconda
certifi 2018.10.15 py36_0 anaconda
cloudpickle 0.6.1 py36_0 anaconda
cycler 0.10.0 py36h009560c_0 anaconda
cython 0.29 py36ha925a31_0 anaconda
dask-core 0.20.0 py36_0 anaconda
decorator 4.3.0 py36_0 anaconda
freetype 2.9.1 ha9979f8_1 anaconda
icc_rt 2017.0.4 h97af966_0 anaconda
icu 58.2 ha66f8fd_1 anaconda
imageio 2.4.1 py36_0 anaconda
intel-openmp 2019.0 118 anaconda
jpeg 9b hb83a4c4_2 anaconda
kiwisolver 1.0.1 py36h6538335_0 anaconda
libpng 1.6.35 h2a8f88b_0 anaconda
libtiff 4.0.9 h36446d0_2 anaconda
matplotlib 3.0.1 py36hc8f65d3_0 anaconda
mkl 2019.0 118 anaconda
mkl_fft 1.0.6 py36hdbbee80_0 anaconda
mkl_random 1.0.1 py36h77b88f5_1 anaconda
networkx 2.2 py36_1 anaconda
numpy 1.15.4 py36ha559c80_0 anaconda
numpy-base 1.15.4 py36h8128ebf_0 anaconda
olefile 0.46 py36_0 anaconda
openssl 1.0.2p hfa6e2cd_0 anaconda
package_has_been_revoked 1.0 0 enable_revoked
pillow 5.3.0 py36hdc69c19_0 anaconda
pip 18.1 py36_0 anaconda
pyparsing 2.3.0 py36_0 anaconda
pyqt 5.9.2 py36h6538335_2 anaconda
python 3.6.7 h33f27b4_1 anaconda
python-dateutil 2.7.5 py36_0 anaconda
pytz 2018.7 py36_0 anaconda
pywavelets 1.0.1 py36h8c2d366_0 anaconda
qt 5.9.6 vc14h1e9a669_2 anaconda
scikit-image 0.15.dev0 <pip>
scipy 1.1.0 py36h4f6bf74_1 anaconda
setuptools 40.5.0 py36_0 anaconda
sip 4.19.8 py36h6538335_0 anaconda
six 1.11.0 py36_1 anaconda
sqlite 3.25.2 hfa6e2cd_0 anaconda
tifffile 0.15.1 py36h452e1ab_1001 conda-forge
tk 8.6.8 hfa6e2cd_0 anaconda
toolz 0.9.0 py36_0 anaconda
tornado 5.1.1 py36hfa6e2cd_0 anaconda
vc 14.1 h21ff451_3 anaconda
vs2015_runtime 15.5.2 3 anaconda
wheel 0.32.2 py36_0 anaconda
wincertstore 0.2 py36h7fe50ca_0 anaconda
zlib 1.2.11 h8395fce_2 anaconda
Update 2
I've solved the problem for Python 3.6, and I think there's enough information above for the astute to be able to work out what was wrong. I'll put the solution in an answer below.
A cleanly built Python 3.5 environment can't find the compiler, so that issue still remains.
One approach you could try is to upgrade your numpy with
pip install numpy --upgrade
as described here: RuntimeError: module compiled against API version a but this version of numpy is 9
Otherwise (if for some reason you cannot upgrade numpy) I would suggest using a virtual environment for the scikit-image project. I just tried it on Windows 10 and was able to run the tests successfully. My steps (from cmd, inside the project folder):
conda uninstall scikit-image to remove any previously built/installed versions
conda create -n scikit-image python=3.6 to create a virtual environment for this project (I used Python 3.6, but you can change it to 3.5)
activate scikit-image to activate the new virtual environment
pip install -r requirements.txt to install the dependencies (without this step the dependencies for the tests were not installed)
pip install -e .
pytest
It turns out that pytest wasn't actually installed in the correct environment; it was being invoked from base, which did indeed have numpy 1.13.3 installed. Installing it in the cleanly built Python 3.6 environment solved the problem, for Python 3.6 at least.
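A quick way to spot this kind of mix-up is to check whether the command found on PATH lives inside the active environment. The following is a minimal sketch using only the standard library; tool_location is a hypothetical helper name, and it assumes the environment's scripts sit somewhere under sys.prefix.

```python
import os
import shutil
import sys


def tool_location(tool):
    """Return (path, inside_env) for a command-line tool found on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None, False
    prefix = os.path.normcase(os.path.abspath(sys.prefix))
    resolved = os.path.normcase(os.path.abspath(path))
    try:
        # The tool belongs to this environment if it sits under sys.prefix.
        inside = os.path.commonpath([prefix, resolved]) == prefix
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        inside = False
    return path, inside


print(tool_location("pytest"))
```

Running the tests with python -m pytest instead of the bare pytest command also makes sure the interpreter of the active environment is the one used.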
I am writing a script that simply asks the Google API for the latitudes and longitudes of a list of addresses read in from a CSV file and outputs an HTML file with the Google Maps widget embedded. I also hoped to run PyInstaller to turn this into a .exe.
Running the code in my original conda environment works fine; however, the .exe that PyInstaller creates is massive for such a small script (over 300 MB). I therefore created a new virtual environment, installed what I believe to be the bare minimum of packages, and rewrote the code to use as few packages as I can. For the currently working portion of the code, this dropped the size considerably, to just over 10 MB. (No numpy or pandas for me... ah well.)
The code again works fine up until the final step:
from ipywidgets.embed import embed_minimal_html
embed_minimal_html("exporttest.html", None)
The above line should take any widgets, in particular the figure created from
fig = gmaps.figure(layout=figure_layout)
markers = gmaps.marker_layer(coordinates)
fig.add_layer(markers)
fig
In my original conda environment, with all of my usual packages installed, the currently modified version runs as expected without errors. In the virtual environment, however, the lines mentioned above produce the following KeyError:
KeyError Traceback (most recent call last)
c:\programdata\anaconda3\envs\synod_environ\lib\sre_parse.py in
parse_template(source, pattern)
1020 try:
-> 1021 this = chr(ESCAPES[this][1])
1022 except KeyError:
KeyError: '\\u'
During handling of the above exception, another exception occurred:
error Traceback (most recent call last)
<ipython-input-5-3359941239ab> in <module>
1 from ipywidgets.embed import embed_minimal_html
2
----> 3 embed_minimal_html("exporttest.html", None)
...
error: bad escape \u at position 0
(For clarification, key error has two slashes before the u, some frustration in getting this to post correctly)
As the code runs correctly in the one environment but not the other, I can only assume that I'm missing a package somewhere that ipywidgets requires, but running pip check doesn't notify me of anything missing.
pip list returns the following packages:
altgraph 0.16.1
backcall 0.1.0
bleach 3.0.2
certifi 2018.10.15
chardet 3.0.4
colorama 0.4.0
decorator 4.3.0
defusedxml 0.5.0
entrypoints 0.2.3
future 0.17.1
geojson 2.4.1
gmaps 0.8.2
idna 2.7
ipykernel 5.1.0
ipython 7.1.1
ipython-genutils 0.2.0
ipywidgets 7.4.2
jedi 0.13.1
Jinja2 2.10
jsonschema 2.6.0
jupyter 1.0.0
jupyter-client 5.2.3
jupyter-console 6.0.0
jupyter-core 4.4.0
macholib 1.11
MarkupSafe 1.0
mistune 0.8.4
nbconvert 5.4.0
nbformat 4.4.0
notebook 5.7.0
pandocfilters 1.4.2
parso 0.3.1
pefile 2018.8.8
pickleshare 0.7.5
pip 10.0.1
prometheus-client 0.4.2
prompt-toolkit 2.0.7
Pygments 2.2.0
PyInstaller 3.4
python-dateutil 2.7.5
pywin32-ctypes 0.2.0
pywinpty 0.5.4
pyzmq 17.1.2
qtconsole 4.4.2
requests 2.20.0
Send2Trash 1.5.0
setuptools 40.4.3
six 1.11.0
terminado 0.8.1
testpath 0.4.2
tornado 5.1.1
traitlets 4.3.2
urllib3 1.24
wcwidth 0.1.7
webencodings 0.5.1
wheel 0.32.2
widgetsnbextension 3.4.2
wincertstore 0.2
Any thoughts on how to further identify what went wrong, what package might be missing or how to fix the issue, and/or alternate ways to save a googlemaps output?
Fiddling with it and comparing from one environment to the other, I found that my virtual environment had ipywidgets 7.4.2 while the base environment had ipywidgets 7.2.1. Downgrading versions fixed the issue I was having.
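The comparison between the two environments can be automated by diffing the pip list output of each. This is only a sketch: parse_pip_list and version_differences are hypothetical helper names, and the sample text is illustrative apart from the two ipywidgets versions mentioned above.

```python
def parse_pip_list(text):
    """Parse `pip list` output (name and version columns) into a dict."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header and the dashed rule.
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        versions[parts[0].lower()] = parts[1]
    return versions


def version_differences(env_a, env_b):
    """Return {package: (version_a, version_b)} for packages whose versions differ."""
    a, b = parse_pip_list(env_a), parse_pip_list(env_b)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


# Illustrative input: paste the real `pip list` output of each environment here.
base_env = """ipywidgets 7.2.1
six 1.11.0"""
virtual_env = """ipywidgets 7.4.2
six 1.11.0"""

print(version_differences(base_env, virtual_env))
```

Only packages present in both environments are compared, so anything installed in just one of them has to be checked separately.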
I followed the instructions here
Steps 1 and 2 have been checked. My Intel(R) Core(TM) i7-2860QM CPU @ 2.50GHz CPU can support 64-bit and Intel virtualization tech, and it is currently using virtualization tech according to my BIOS. Checking step 3: I'm on Ubuntu, so no antivirus software, and I'm not running any system-level debugging. Now see the attached image: even though I set the VM to be 64-bit on the left, it is still 32-bit on the right.
I know that the settings are merely for organizational purposes and that they can't actually change the bitness of the VM. I downloaded the VM here - https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/linux/. None of them are marked as 64-bit, so I do not know how to guarantee that I have a 64-bit Windows image.
This is not the main issue I'm trying to solve though. It has been inferred as being the cause of my main issue.
Same code works on Ubuntu 14.04 but not the Windows 7 VM. Below you'll see me debugging and all variables look identical.
Next I type the error-causing line into the console and sure enough on one OS we have no issues and on the other we blow up
>>> np.asarray(frames, dtype=np.float32)
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm 2016.1.2\helpers\pydev\_pydevd_bundle\pydevd_exec.py", line 3, in Exec
exec exp in global_vars, local_vars
File "<input>", line 1, in <module>
File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 482, in asarray
return array(a, dtype, copy=False, order=order)
MemoryError
Now the part that gives me the creeps. When I go back to the debugging tab from the console in Ubuntu I see many new variables have spontaneously been created even though I only typed one line into the Python console
I figure there should be a package problem.
I'm using Python 2.7.11 on the Windows 7 VM and these packages are installed
FITS-tools 0.0.dev0
Pillow 3.2.0 3.2.0
Pillow-PIL 0.1.dev0 0.1dev
PyQt4 4.11.4 4.11.4
astropy 1.1.2 1.1.2
cycler 0.10.0
image-registration 0.2.2.dev272
matplotlib 1.5.1 1.5.1
numpy 1.11.0 1.11.0
parmap 1.2.3 1.2.3
pip 8.1.1 8.1.1
pyfits 3.4 3.4
pyparsing 2.1.1 2.1.1
pyqtgraph 0.9.10 0.9.10
python-dateutil 2.5.3 2.5.3
pytz 2016.4 2016.4
scipy 0.17.0 0.17.0
setuptools 20.10.1 21.0.0
six 1.10.0 1.10.0
wheel 0.29.0 0.29.0
On Ubuntu - to my surprise - I'm using Python 2.7.6 (and for some reason I get a make: *** [libinstall] Error 1 when I try to upgrade to 2.7.11, but that's another issue). Here are the packages I have installed on the working Ubuntu side:
BeautifulSoup 3.2.1 3.2.1
CherryPy 3.2.2 5.3.0
Cython 0.22 0.24
Django 1.9.1 1.9.6
Markdown 2.4 2.6.6
PAM 0.4.2
Pillow 2.3.0 3.2.0
PyOpenGL 3.0.2 3.1.1a1
Pygments 1.6 2.1.3
Routes 2.0 2.3.1
Twisted-Core 13.2.0
Twisted-Web 13.2.0
VTK 5.8.0
WebOb 1.3.1 1.6.0
adium-theme-ubuntu 0.3.4
amqplib 1.0.2 1.0.2
apptools 4.3.0 4.4.0
apsw 3.8.2-r1 3.9.2-r1
apt-xapian-index 0.45
argparse 1.2.1 1.4.0
astropy 1.1.2 1.1.2
cffi 0.8.6 1.6.0
chardet 2.0.1 2.3.0
colorama 0.2.5 0.3.7
command-not-found 0.3
configobj 5.0.6 5.0.6
cssselect 0.9.1 0.9.1
cssutils 0.9.10 1.0.1
debtagshw 0.1
defer 1.0.6 1.0.4
deluge 1.3.6
dirspec 13.10 13.08
dnspython 1.11.1 1.12.0
duplicity 0.6.23
envisage 4.1.0 4.5.1
feedparser 5.1.3 5.2.1
h5py 2.2.1 2.6.0
html5lib 0.999 0.9999999
httplib2 0.8 0.9.2
image-registration 0.2.2.dev272
ipython 3.1.0 4.2.0
libtfr 1.0.4 2.0.0b4
lockfile 0.8 0.12.2
lxml 3.3.3 3.6.0
matplotlib 1.4.3 1.5.1
mayavi 4.4.3 4.4.4
mechanize 0.2.5 0.2.5
mock 1.0.1 2.0.0
netifaces 0.8 0.10.4
nose 1.3.7 1.3.7
numexpr 2.2.2 2.5.2
numpy 1.9.2 1.11.0
oauthlib 0.6.1 1.1.1
oneconf 0.3.7.14.04.1 0.0.1.dev0
pandas 0.16.1 0.18.1
parmap 1.2.3 1.2.3
pexpect 3.1 4.0.1
pip 1.5.4 8.1.1
piston-mini-client 0.7.5 0.7.5
plotly 1.6.17 1.9.10
ply 3.4 3.8
py 1.4.31 1.4.31
pyFFTW 0.9.2 0.10.1
pyOpenSSL 0.13 16.0.0
pycparser 2.10 2.14
pycrypto 2.6.1 2.6.1
pycups 1.9.66 1.9.73
pyface 5.0.0 5.1.0
pygame 1.9.1release
pygobject 3.12.0
pygpgme 0.3 0.3
pyparsing 2.0.3 2.1.1
pyqtgraph 0.9.10 0.9.10
pyserial 2.6 3.0.1
pysmbc 1.0.14.1 1.0.15.5
pytest 2.9.1 2.9.1
python-apt 0.9.3.5ubuntu2 0.7.8
python-dateutil 2.4.2 2.5.3
python-debian 0.1.21-nmu2ubuntu2 0.1.23
python-libtorrent 0.16.13 1.1.0
pytz 2015.4 2016.4
pyxdg 0.25 0.25
pyzmq 14.7.0 15.2.0
reportlab 3.0 3.3.0
repoze.lru 0.6 0.6
requests 2.2.1 2.10.0
scikit-learn 0.17.1 0.17.1
scipy 0.15.1 0.17.0
sessioninstaller 0.0.0
setuptools 3.3 21.0.0
simplejson 3.7.3 3.8.2
six 1.5.2 1.10.0
sklearn 0.0 0.0
software-center-aptd-plugins 0.0.0
system-service 0.1.6
tables 3.1.1 3.2.2
traits 4.5.0 4.5.0
traitsui 5.0.0 5.1.0
uTidylib 0.2 0.2
unity-lens-photos 1.0
urllib3 1.7.1 1.15.1
vboxapi 1.0 1.0
wheel 0.24.0 0.29.0
wsgiref 0.1.2 0.1.2
wxPython 2.8.12.1 2.9.1.1
wxPython-common 2.8.12.1 2.6.3.3
xdiagnose 3.6.3build2
xppy 0.7.0
zope.interface 4.0.5 4.1.3
You've got a MemoryError, which means you requested an allocation beyond the memory available on your VM. Some potential reasons:
not enough memory allocated to the VM (try increasing it)
the 2 GB per-process limit for a 32-bit process on Windows (run a LARGEADDRESSAWARE Python or move to 64-bit)
memory corruption, so your heap is corrupted (debug your code)
Similar discussion: Memory errors and list limits?
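To tell which of these reasons applies, it can help to check whether the Python interpreter itself is a 32-bit or 64-bit build and to estimate how much memory the array needs. The following is a rough sketch written for Python 3; interpreter_bits and float32_array_bytes are hypothetical helper names, and the shape used is made up for illustration, not taken from the question.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def float32_array_bytes(shape):
    """Estimate the raw data size of a float32 array with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    # Each float32 element occupies 4 bytes.
    return count * 4


print("interpreter bits:", interpreter_bits())
print("64-bit address space:", sys.maxsize > 2**32)

# Hypothetical shape: 1000 frames of 1024 x 1024 pixels.
print("GiB needed:", float32_array_bytes((1000, 1024, 1024)) / 2**30)
```

A 32-bit interpreter cannot satisfy a request larger than its address space no matter how much memory the VM has, so the first number is worth checking before increasing the VM's allocation.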
I created an environment called imagescraper and installed pip with it.
I then used pip to install a package called ImageScraper:
>>activate imagescraper
[imagescraper]>>pip install ImageScraper
Just to ensure that I have the package successfully installed:
>>conda list
[imagescraper] C:\Users\John>conda list
# packages in environment at C:\Anaconda2\envs\imagescrap
#
future 0.15.2 <pip>
imagescraper 2.0.7 <pip>
lxml 3.6.0 <pip>
numpy 1.11.0 <pip>
pandas 0.18.0 <pip>
pip 8.1.1 py27_1
python 2.7.11 4
python-dateutil 2.5.2 <pip>
pytz 2016.3 <pip>
requests 2.9.1 <pip>
setproctitle 1.1.9 <pip>
setuptools 20.3 py27_0
simplepool 0.1 <pip>
six 1.10.0 <pip>
vs2008_runtime 9.00.30729.1 0
wheel 0.29.0 py27_0
Before I launch Jupyter notebook, just to check where we are getting the path from:
[imagescraper] C:\Users\John>python
Python 2.7.11 |Continuum Analytics, Inc.| (default, Feb 16 2016, 09:58:36) [MSC
v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import sys
>>> sys.executable
'C:\\Anaconda2\\envs\\imagescraper\\python.exe'
>>> import image_scraper
Seems ok, so I proceed to launch Jupyter notebook using
[imagescraper]>>jupyter notebook
Within Jupyter I created a new notebook, and when I tried the same:
import image_scraper
I get:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-6c2b65c9cdeb> in <module>()
----> 1 import image_scraper
ImportError: No module named image_scraper
Doing the same to check the paths within Jupyter notebook, I get this;
import sys
sys.executable
'C:\\Anaconda2\\python.exe'
This tells me that it is not using the environment in which I installed the modules.
Is there a way I can ensure that each of my notebooks uses the packages from its own environment?
Here are two possible solutions:
You can register a new kernel based on your imagescraper environment. The kernel will start from the imagescraper environment and thus sees all its packages.
source activate imagescraper
conda install ipykernel
ipython kernel install --name imagescraper
This will add a new kernel named imagescraper to your jupyter dashboard.
Another solution is to install jupyter notebook into the imagescraper environment and start jupyter from that environment. This requires activating imagescraper whenever you start jupyter notebook.
source activate imagescraper
conda install notebook
jupyter notebook
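Whichever solution you pick, you can verify from inside a notebook cell that the kernel really runs from the intended environment. This is a minimal sketch: kernel_env_name and kernel_uses_env are hypothetical helper names, and it assumes the environment's directory name matches the environment name, as it does for conda environments under envs.

```python
import os
import sys


def kernel_env_name():
    """Return the last path component of the running interpreter's prefix."""
    return os.path.basename(os.path.normpath(sys.prefix))


def kernel_uses_env(expected):
    """Check whether the running interpreter belongs to the named environment."""
    return kernel_env_name().lower() == expected.lower()


print(sys.executable)
print(kernel_uses_env("imagescraper"))
```

If the second line prints False, the notebook is still attached to a kernel from another environment and the kernel needs to be switched in the notebook's kernel menu.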