pandas version is not updated after installing a new version on databricks - python

I am trying to fix a pandas import problem that appears when I run Python 3.7 code on Databricks.
The error is:
ImportError: cannot import name 'roperator' from 'pandas.core.ops' (/databricks/python/lib/python3.7/site-packages/pandas/core/ops.py)
the pandas version:
pd.__version__
0.24.2
The following import works fine on my laptop, which has pandas 0.25.1:
from pandas.core.ops import roperator
So, I tried to upgrade pandas on databricks.
%sh pip uninstall -y pandas
Successfully uninstalled pandas-1.1.2
%sh pip install pandas==0.25.1
Collecting pandas==0.25.1
Downloading pandas-0.25.1-cp37-cp37m-manylinux1_x86_64.whl (10.4 MB)
Requirement already satisfied: python-dateutil>=2.6.1 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from pandas==0.25.1) (2.8.0)
Requirement already satisfied: numpy>=1.13.3 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from pandas==0.25.1) (1.16.2)
Requirement already satisfied: pytz>=2017.2 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from pandas==0.25.1) (2018.9)
Requirement already satisfied: six>=1.5 in /databricks/conda/envs/databricks-ml/lib/python3.7/site-packages (from python-dateutil>=2.6.1->pandas==0.25.1) (1.12.0)
Installing collected packages: pandas
ERROR: After October 2020 you may experience errors when installing or updating packages.
This is because pip will change the way that it resolves dependency conflicts.
We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
mlflow 1.8.0 requires alembic, which is not installed.
mlflow 1.8.0 requires prometheus-flask-exporter, which is not installed.
mlflow 1.8.0 requires sqlalchemy<=1.3.13, which is not installed.
sklearn-pandas 2.0.1 requires numpy>=1.18.1, but you'll have numpy 1.16.2 which is incompatible.
sklearn-pandas 2.0.1 requires pandas>=1.0.5, but you'll have pandas 0.25.1 which is incompatible.
sklearn-pandas 2.0.1 requires scikit-learn>=0.23.0, but you'll have scikit-learn 0.20.3 which is incompatible.
sklearn-pandas 2.0.1 requires scipy>=1.4.1, but you'll have scipy 1.2.1 which is incompatible.
Successfully installed pandas-0.25.1
When I run:
import pandas as pd
pd.__version__
it is still:
0.24.2
Did I miss something?
thanks

It is recommended to install libraries via a cluster initialization script. The %sh command is executed only on the driver node, not on the executor nodes, and it does not affect a Python instance that is already running.
The correct solution is to use the dbutils.library commands, like this:
dbutils.library.installPyPI("pandas", "1.0.1")
dbutils.library.restartPython()
This installs the library on all nodes, but Python must be restarted to pick up the new libraries.
Although it is possible to specify only the package name, it is recommended to specify the version explicitly, because some library versions may not be compatible with the runtime. Also consider using a newer runtime where library versions are already updated; check the runtime release notes to find out which library versions are installed out of the box.
For newer Databricks runtimes you can use the new magic commands %pip and %conda to install dependencies. See the documentation for more details.
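To confirm whether the running Python process actually sees the new version, it can help to print the interpreter path together with the module's version and file location. In the question, the pip log points at /databricks/conda/envs/databricks-ml/... while the traceback points at /databricks/python/lib/python3.7/site-packages, so those two paths are worth comparing. The following is a minimal sketch; describe_module is a hypothetical helper name, not a Databricks or pandas API:

```python
import importlib
import sys


def describe_module(name):
    """Return (version, file path) of a module as seen by this interpreter."""
    module = importlib.import_module(name)
    # Not every module defines __version__ or __file__, so fall back gracefully.
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "built-in")
    return version, path


print("interpreter:", sys.executable)
try:
    print("pandas:", describe_module("pandas"))
except ImportError as exc:
    print("pandas is not importable here:", exc)
```

If the reported path is not under the environment that pip installed into, the notebook is importing a different copy of the package.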

Related

Jupyter packages

I am working with Jupyter notebook files and trying to import certain packages, but most of them seem to be missing even though I have installed them. For example, when I run from bs4 import BeautifulSoup or import requests,
I get ModuleNotFoundError: No module named 'bs4' for the first one and a similar error for requests. I have tried pip install requests and pip install bs4, but the issue persists. I installed them from:
"(base) aminnazemzadeh@amins-MacBook-Pro ~ % ", which seems to be my home directory, and I also have anaconda3 installed alongside python3. Why can't I import these modules?
I am using Visual Studio, if it makes any difference.
Once I add :
!pip install requests
!pip install bs4
I get:
/Users/aminnazemzadeh/.zshenv:.:1: no such file or directory: /Users/aminnazemzadeh/.cargo/env
Requirement already satisfied: requests in /Users/aminnazemzadeh/opt/anaconda3/lib/python3.9/site-packages (2.28.1)
Requirement already satisfied: charset-normalizer<3,>=2 in /Users/aminnazemzadeh/opt/anaconda3/lib/python3.9/site-packages (from requests) (2.0.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/aminnazemzadeh/opt/anaconda3/lib/python3.9/site-packages (from requests) (1.26.11)
Requirement already satisfied: idna<4,>=2.5 in /Users/aminnazemzadeh/opt/anaconda3/lib/python3.9/site-packages (from requests) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in /Users/aminnazemzadeh/opt/anaconda3/lib/python3.9/site-packages (from requests) (2022.9.24)
/Users/aminnazemzadeh/.zshenv:.:1: no such file or directory: /Users/aminnazemzadeh/.cargo/env
Requirement already satisfied: bs4 in /Users/aminnazemzadeh/opt/anaconda3/lib/python3.9/site-packages (0.0.1)
Requirement already satisfied: beautifulsoup4 in /Users/aminnazemzadeh/opt/anaconda3/lib/python3.9/site-packages (from bs4) (4.11.1)
Requirement already satisfied: soupsieve>1.2 in /Users/aminnazemzadeh/opt/anaconda3/lib/python3.9/site-packages (from beautifulsoup4->bs4) (2.3.1)
followed by this error:
ModuleNotFoundError Traceback (most recent call last)
Cell In[7], line 4
2 get_ipython().system('pip install bs4')
3 from urllib.request import urlopen
----> 4 from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'
Thanks
You are probably installing the packages into an environment other than the one VS Code is using. Try installing the packages directly from your Jupyter notebook by running the following in a notebook cell. The current best practice for running installs from a notebook is to use the magic commands %pip or %conda:
%pip install requests beautifulsoup4
# or
%conda install requests beautifulsoup4
This should install the packages into the same environment the notebook is running in.
Note that you may need to restart the kernel to use the affected packages.
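Outside of the %pip magic, one commonly used way to target the kernel's own environment is to invoke pip through the interpreter that runs the notebook. A minimal sketch, assuming pip is available in that environment; build_pip_command is a hypothetical helper name:

```python
import subprocess
import sys


def build_pip_command(packages):
    """Build a pip install command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


command = build_pip_command(["requests", "beautifulsoup4"])
print(" ".join(command))
# Uncomment to actually run the installation in this environment:
# subprocess.check_call(command)
```

Because the command starts with sys.executable, the packages land in the environment of the running kernel rather than whichever pip happens to be first on the shell PATH.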
sources:
Jupyter Discourse Forum - Location of libraries or extensions installed in JupyterLab
Jupyter Discourse Forum - Why users can install modules from pip but not from conda?
Installing Beautiful Soup
PS: thanks @wayne for the comments regarding the current best practices for installing into the currently running environment.
If you're using conda, you should install via conda whenever possible. When you install via pip, conda loses some of its ability to manage dependency versions.
Try creating a new conda environment, installing the needed packages via conda, and then setting the kernel to your new environment in VS Code. Dedicate conda environments to specific projects. It is okay to have a default/generic environment for playing around, but not for any significant work, as you can easily break your other work if a dependency changes to an incompatible version.
Conda cheat sheet for reference if you need it: https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
You will need the Jupyter extension in vscode if you do not already have it installed.
You will also need to install the corresponding jupyter package in your conda environment.
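After switching the kernel, it may be worth checking from inside the notebook which environment is actually in use. The sketch below is only illustrative; kernel_environment is a hypothetical helper, and CONDA_DEFAULT_ENV is only present when the kernel was started from an activated conda environment:

```python
import os
import sys


def kernel_environment():
    """Collect details that identify the environment this kernel runs in."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Set by `conda activate`; None if the kernel was launched another way.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in kernel_environment().items():
    print(f"{key}: {value}")
```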

pip can't find a tensorflow-cpu version which is listed in pypi.python.org

I am trying to pip install rasa on Ubuntu x64 in a new Python 3.6.9 virtual environment.
Collecting tensorflow-cpu~=1.15.0 (from rasa)
Cache entry deserialization failed, entry ignored
Could not find a version that satisfies the requirement tensorflow-cpu~=1.15.0 (from rasa) (from versions: )
No matching distribution found for tensorflow-cpu~=1.15.0 (from rasa)
If I pip install tensorflow-cpu~=1.15.0 I get the same error. I also cannot pip install tensorflow-cpu.
According to this answer I can list the available package versions. It uses this snippet:
import json
from urllib import request
from pkg_resources import parse_version

def versions(pkg_name):
    url = f'https://pypi.python.org/pypi/{pkg_name}/json'
    releases = json.loads(request.urlopen(url).read())['releases']
    return sorted(releases, key=parse_version, reverse=True)
Running it with pkg_name="tensorflow-cpu" I get
2.1.0
2.1.0rc2
2.1.0rc1
2.1.0rc0
1.15.0
1.15.0rc3
1.15.0rc2
1.15.0rc1
1.15.0rc0
But 1.15.0 is in this list. So why can't pip install it?

Cannot read ".parquet" files in Azure Jupyter Notebook (Python 2 and 3)

I am currently trying to open parquet files using Azure Jupyter Notebooks. I have tried both Python kernels (2 and 3).
After the installation of pyarrow I can import the module only if the Python kernel is 2 (not working with Python 3)
Here is what I've done so far (for clarity, I am not mentioning all my various attempts, such as using conda instead of pip, as it also failed):
!pip install --upgrade pip
!pip install -I Cython==0.28.5
!pip install pyarrow
import pandas
import pyarrow
import pyarrow.parquet
#so far, so good
filePath_parquet = "foo.parquet"
table_parquet_raw = pandas.read_parquet(filePath_parquet, engine='pyarrow')
This works well if I'm doing that off-line (using Spyder, Python v.3.7.0). But it fails using an Azure Notebook.
AttributeError Traceback (most recent call last)
<ipython-input-54-2739da3f2d20> in <module>()
6
7 #table_parquet_raw = pd.read_parquet(filePath_parquet, engine='pyarrow')
----> 8 table_parquet_raw = pandas.read_parquet(filePath_parquet, engine='pyarrow')
AttributeError: 'module' object has no attribute 'read_parquet'
Any idea please?
Thank you in advance !
EDIT:
Thank you very much for your reply Peter Pan !
I have typed these statements, here is what I got:
1.
print(pandas.__dict__)
=> read_parquet does not appear
2.
print(pandas.__file__)
=> I get:
/home/nbuser/anaconda3_23/lib/python3.4/site-packages/pandas/__init__.py
import sys; print(sys.path) => I get:
['', '/home/nbuser/anaconda3_23/lib/python34.zip',
'/home/nbuser/anaconda3_23/lib/python3.4',
'/home/nbuser/anaconda3_23/lib/python3.4/plat-linux',
'/home/nbuser/anaconda3_23/lib/python3.4/lib-dynload',
'/home/nbuser/.local/lib/python3.4/site-packages',
'/home/nbuser/anaconda3_23/lib/python3.4/site-packages',
'/home/nbuser/anaconda3_23/lib/python3.4/site-packages/Sphinx-1.3.1-py3.4.egg',
'/home/nbuser/anaconda3_23/lib/python3.4/site-packages/setuptools-27.2.0-py3.4.egg',
'/home/nbuser/anaconda3_23/lib/python3.4/site-packages/IPython/extensions',
'/home/nbuser/.ipython']
Do you have any idea please ?
EDIT 2:
Dear @PeterPan, I have run both !conda update conda and !conda update pandas: when checking the pandas version (pandas.__version__), it is still 0.19.2.
I have also tried with !conda update pandas -y -f, it returns:
Fetching package metadata ...........
Solving package specifications: .
Package plan for installation in environment /home/nbuser/anaconda3_23:
The following NEW packages will be INSTALLED:
pandas: 0.19.2-np111py34_1
When typing:
!pip install --upgrade pandas
I get:
Requirement already up-to-date: pandas in /home/nbuser/anaconda3_23/lib/python3.4/site-packages
Requirement already up-to-date: pytz>=2011k in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from pandas)
Requirement already up-to-date: numpy>=1.9.0 in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from pandas)
Requirement already up-to-date: python-dateutil>=2 in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from pandas)
Requirement already up-to-date: six>=1.5 in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from python-dateutil>=2->pandas)
Finally, when typing:
!pip install --upgrade pandas==0.24.0
I get:
Collecting pandas==0.24.0
Could not find a version that satisfies the requirement pandas==0.24.0 (from versions: 0.1, 0.2b0, 0.2b1, 0.2, 0.3.0b0, 0.3.0b2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0rc1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0rc1, 0.8.0rc2, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0rc1, 0.19.0, 0.19.1, 0.19.2, 0.20.0rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0rc1, 0.21.0, 0.21.1, 0.22.0)
No matching distribution found for pandas==0.24.0
Therefore, my guess is that the problem comes from the way the packages are managed in Azure. Updating a package (here Pandas), should lead to an update to the latest version available, shouldn't it?
I tried to reproduce your issue on my Azure Jupyter Notebook, but could not. Everything worked for me without your two steps !pip install --upgrade pip and !pip install -I Cython==0.28.5, which I think do not matter.
Please run the code below to check whether the pandas package you import is the correct one.
Run print(pandas.__dict__) to check whether read_parquet appears in the output.
Run print(pandas.__file__) to check whether you imported a different pandas package.
Run import sys; print(sys.path) to check the order of the paths and whether a file or directory with the same name exists under any of them.
If there is a file or directory named pandas, just rename it and restart your ipynb to re-run. It is a common issue; see the SO threads AttributeError: 'module' object has no attribute 'reader' and Importing installed package from script raises "AttributeError: module has no attribute" or "ImportError: cannot import name".
Otherwise, please update your post with more details.
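The check for a same-named file or directory can be sketched in code. This is a minimal illustration, not part of pandas or Jupyter; find_name_on_path is a hypothetical helper that only looks for a plain directory or .py file with the given name in each search path entry:

```python
import os
import sys


def find_name_on_path(name, search_paths=None):
    """List files or directories on the search path that could be imported as `name`."""
    if search_paths is None:
        search_paths = sys.path
    matches = []
    for entry in search_paths:
        base = entry or os.getcwd()  # '' in sys.path means the current directory
        for candidate in (os.path.join(base, name), os.path.join(base, name + ".py")):
            if os.path.exists(candidate):
                matches.append(candidate)
    return matches


# Matches are listed in sys.path order, which is the order Python searches.
for match in find_name_on_path("pandas"):
    print(match)
```

More than one match, or a match outside site-packages, suggests that something is shadowing the installed package.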
The latest pandas version should be 0.23.4, not 0.24.0.
I tried to find the earliest pandas version that supports read_parquet by searching for the function name in the documentation of versions 0.19.2 through 0.23.3. I found that pandas supports read_parquet from version 0.21.1 onward, as shown in the What's New of version 0.21.1.
According to your EDIT 2, you seem to be using Python 3.4 in the Azure Jupyter Notebook. Not all pandas versions support Python 3.4.
Versions 0.21.1 and 0.22.0 officially support Python 2.7, 3.5, and 3.6.
The PyPI page for pandas also states the required Python versions.
So you can try installing pandas 0.21.1 or 0.22.0 in the current Python 3.4 notebook. If that fails, create a new notebook with Python 2.7 or >=3.5 and install pandas >= 0.21.1 to use read_parquet.
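Before calling read_parquet, the notebook can check which Python version is running and whether the installed pandas exposes the function at all. A minimal sketch; module_has is a hypothetical helper name:

```python
import importlib
import sys


def module_has(module_name, attribute):
    """Return True if the imported module exposes the given attribute."""
    module = importlib.import_module(module_name)
    return hasattr(module, attribute)


print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
try:
    print("pandas.read_parquet available:", module_has("pandas", "read_parquet"))
except ImportError:
    print("pandas is not installed in this environment")
```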

python package version incompatibility even on virtualenv

I am trying to install Cuckoo Sandbox (a malware analysis tool).
I am running pip install -U cuckoo as stated in the Cuckoo documentation, but it gives me the following error:
pandas 0.23.3 has requirement python-dateutil>=2.5.0, but you'll have python-dateutil 2.4.2 which is incompatible
My understanding is that there is a package named python-dateutil, that pandas needs a version >= 2.5.0 of it while cuckoo needs version 2.4.2, and that the installation is refused to avoid instability.
So I created a virtualenv venv and tried to install cuckoo in it. Since there is no pandas in venv/lib/python2.7/site-packages, installing an older version of python-dateutil shouldn't be a problem. But I get the same error again, and I don't understand where the problem is.

No module named googleapiclient.discovery

I have been looking to implement the example Python scripts I have found online to allow me to interact with the YouTube API as per the GitHub link found here
The problem I am having is with the import statement at the start:
import argparse
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
The online documentation requires the following command to install the googleapiclient library:
pip install --upgrade google-api-python-client
However, once installed I am still receiving an error that googleapiclient.discovery cannot be found. I have tried reinstalling via pip, with the following command line output generated, suggesting all is well:
Requirement already up-to-date: google-api-python-client in g:\python27\lib\site-packages (1.7.4)
Requirement not upgraded as not directly required: httplib2<1dev,>=0.9.2 in g:\python27\lib\site-packages (from google-api-python-client) (0.9.2)
Requirement not upgraded as not directly required: google-auth>=1.4.1 in g:\python27\lib\site-packages (from google-api-python-client) (1.5.0)
Requirement not upgraded as not directly required: google-auth-httplib2>=0.0.3 in g:\python27\lib\site-packages (from google-api-python-client) (0.0.3)
Requirement not upgraded as not directly required: six<2dev,>=1.6.1 in g:\python27\lib\site-packages (from google-api-python-client) (1.10.0)
Requirement not upgraded as not directly required: uritemplate<4dev,>=3.0.0 in g:\python27\lib\site-packages (from google-api-python-client) (3.0.0)
Requirement not upgraded as not directly required: rsa>=3.1.4 in g:\python27\lib\site-packages (from google-auth>=1.4.1->google-api-python-client) (3.4.2)
Requirement not upgraded as not directly required: cachetools>=2.0.0 in g:\python27\lib\site-packages (from google-auth>=1.4.1->google-api-python-client) (2.1.0)
Requirement not upgraded as not directly required: pyasn1-modules>=0.2.1 in g:\python27\lib\site-packages (from google-auth>=1.4.1->google-api-python-client) (0.2.2)
Requirement not upgraded as not directly required: pyasn1>=0.1.3 in g:\python27\lib\site-packages (from rsa>=3.1.4->google-auth>=1.4.1->google-api-python-client) (0.1.9)
pyasn1-modules 0.2.2 has requirement pyasn1<0.5.0,>=0.4.1, but you'll have pyasn1 0.1.9 which is incompatible.
What am I doing wrong?
Thanks
In case you are running Python3 (python --version), perhaps you should run this instead:
pip3 install google-api-python-client
Another quick way to counter this problem could be to install the package in the same folder as your code:
pip install google-api-python-client -t ./
That's not ideal but it will definitely work.
Or if you prefer to move external libraries to a lib/ folder:
pip install google-api-python-client -t ./lib
In that last case you will also need this at the beginning of your Python code:
import os
import sys
file_path = os.path.dirname(__file__)
module_path = os.path.join(file_path, "lib")
sys.path.append(module_path)
from googleapiclient.discovery import build
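To see whether the interpreter can resolve the package at all, and from which location, the standard library's importlib.util.find_spec can be used before importing. A minimal sketch; locate_package is a hypothetical helper name, and it is meant for top-level package names only:

```python
import importlib.util


def locate_package(name):
    """Return the file a top-level package would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("googleapiclient:", locate_package("googleapiclient"))
```

A result of None means this interpreter cannot find the package, even if pip reports it as installed for another Python installation.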
This solution applies only to those using Visual Studio to build Flask apps (others can try it too).
The only thing you need to check is where you are importing your libraries from. Follow the steps below when creating a new environment:
Python Environments > (right click) > Add new env > check "View in Python Environments window".
I faced a similar issue while writing code that uses the YouTube API in VS Code. Following suggestions from several online coding forums, I ran
pip install --upgrade google-api-python-client
but it didn't help.
The following steps resolved the issue for me:
In VS Code, go to 'Settings' (Ctrl + , on Windows), enter venv in 'Search settings', and under the heading 'Python: Venv Path' enter the path to your virtual environment.
[Screenshot: settings for Python: Venv Path in VS Code]
Then click on the Python interpreter in VS Code (the selected interpreter is shown in the bottom-left corner of the VS Code editor).
[Screenshot: Python interpreter path]
