Adding pandas dependencies after kedro new - python

I began a new project with kedro new without adding the files from the iris example. The original requirements.txt looked like:
black==v19.10b0
flake8>=3.7.9, <4.0
ipython~=7.0
isort>=4.3.21, <5.0
jupyter~=1.0
jupyter_client>=5.1, < 7.0
jupyterlab==0.31.1
kedro==0.16.6
nbstripout==0.3.3
pytest-cov~=2.5
pytest-mock>=1.7.1, <2.0
pytest~=5.0
wheel==0.32.2
I then ran kedro install to install the packages, generating requirements.in and requirements.txt. I now want to install the necessary dependencies for working with pandas and csv files. I tried updating the requirements.in with the line: kedro[pandas]==0.16.6 and then executing kedro install --build-reqs. However, that line fails with the error:
Could not find a version that matches pyarrow<1.0.0,<2.0dev,>=0.12.0,>=1.0.0 (from kedro[pandas]==0.16.6->-r /lrlhps/data/busanalytics/Guilherme/Projects/kedro-environment/spaceflights/src/requirements.in (line 8))
Tried: 0.9.0, 0.10.0, 0.11.0, 0.11.1, 0.12.0, 0.12.1, 0.13.0, 0.14.0, 0.15.1, 0.16.0, 0.16.0, 0.16.0, 0.16.0, 0.17.0, 0.17.0, 0.17.0, 0.17.0, 0.17.1, 0.17.1, 0.17.1, 0.17.1, 1.0.0, 1.0.0, 1.0.0, 1.0.0, 1.0.1, 1.0.1, 1.0.1, 1.0.1, 2.0.0, 2.0.0, 2.0.0
There are incompatible versions in the resolved dependencies:
pyarrow<2.0dev,>=1.0.0 (from google-cloud-bigquery[bqstorage,pandas]==2.2.0->pandas-gbq==0.14.0->kedro[pandas]==0.16.6->-r /Projects/kedro/spaceflights/src/requirements.in (line 8))
pyarrow<1.0.0,>=0.12.0 (from kedro[pandas]==0.16.6->-r /Projects/kedro/spaceflights/src/requirements.in (line 8))
Question: Is it possible to update requirements.in and have the pandas dependencies installed with the --build-reqs option? Or must I install the dependency with pip?

You should be able to install the pandas dependencies by listing the specific datasets you want to use, as shown in the documentation:
The dependencies above may be sufficient for some projects, but for the
spaceflights project, you need to add a requirement for the pandas project
because you are working with CSV and Excel files. You can add the necessary
dependencies for these files types as follows:
kedro[pandas.CSVDataSet,pandas.ExcelDataSet]==0.17.0
https://kedro.readthedocs.io/en/stable/03_tutorial/02_tutorial_template.html#add-and-remove-project-specific-dependencies
For instance, after adding
kedro[pandas.CSVDataSet]==0.17.0
to your requirements.in and running kedro build-reqs, you should see
kedro[pandas.csvdataset]==0.17.0 # via -r /.../src/requirements.in
(...)
pandas==1.2.0 # via kedro
in your requirements.txt file.
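To see why the full kedro[pandas] extra could not be resolved in the question, here is a minimal sketch that checks the two pyarrow constraints from the error message against a few of the versions pip tried. The helpers parse_version and satisfies are hypothetical names written for this illustration, and the version parsing is a deliberate simplification of real PEP 440 rules (it only compares the numeric parts).

```python
import operator
import re

# Comparison operators that appear in the constraints from the error message.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def parse_version(text):
    """Turn '1.0.1' into (1, 0, 1). Simplification: suffixes such as 'dev' are ignored."""
    parts = [int(part) for part in re.findall(r"\d+", text)]
    # Pad to three components so that '2.0' and '2.0.0' compare as equal.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies(version, constraints):
    """Return True if `version` meets every comma-separated constraint."""
    for constraint in constraints.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*(\S+)\s*", constraint)
        op, bound = match.groups()
        if not OPS[op](parse_version(version), parse_version(bound)):
            return False
    return True


# A few of the versions listed under "Tried:" in the error message.
candidates = ["0.12.0", "0.17.1", "1.0.0", "1.0.1", "2.0.0"]

# The two conflicting requirements reported by the resolver.
kedro_pin = "<1.0.0,>=0.12.0"     # from kedro[pandas]==0.16.6
bigquery_pin = "<2.0dev,>=1.0.0"  # from google-cloud-bigquery via pandas-gbq

both = [
    v for v in candidates
    if satisfies(v, kedro_pin) and satisfies(v, bigquery_pin)
]
print(both)  # no version can be both below 1.0.0 and at least 1.0.0
```

Requesting only the datasets you need, such as pandas.CSVDataSet, presumably avoids pulling in pandas-gbq and with it the second constraint.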

Related

Pipenv Install command not able to find compatible version

I am trying to install pyarrow 4.0.0 in my project. Python version is 3.6.
install pyarrow==4.0.0
I am getting the following error
ERROR: Could not find a version that satisfies the requirement pyarrow~=4.0.0 (from versions: 0.9.0, 0.10.0, 0.11.0, 0.11.1, 0.12.0, 0.12.1, 0.13.0, 0.14.0, 0.15.1, 0.16.0, 0.17.0, 0.17.1, 1.0.0, 1.0.1, 2.0.0, 3.0.0, 4.0.0, 4.0.1, 5.0.0, 6.0.0, 6.0.1)
ERROR: No matching distribution found for pyarrow~=4.0.0
Why can't it find version 4.0.0 when it appears in the list of versions? I can also find it in the source specified in the Pipfile.
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"
It should be able to install the package since it is available.
Device: MacBook Pro M1. I have tried 4.0.1 and 4.0.* with no success.
The first pyarrow release to ship wheels for M1 was 5.0.0. See this JIRA ticket for some history:
https://issues.apache.org/jira/browse/ARROW-12122
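When pip lists a version but still refuses to install it, it can help to print what the interpreter reports about itself, since wheel compatibility depends on the Python version and the platform. The sketch below uses only the standard library; interpreter_summary is a hypothetical helper name. On an Apple Silicon Mac the machine field is typically arm64, though an interpreter running under Rosetta may report x86_64 instead.

```python
import platform
import sys


def interpreter_summary():
    """Collect the interpreter and platform facts relevant to wheel compatibility."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
    }


summary = interpreter_summary()
for key, value in summary.items():
    print(key, "=", value)
```

Comparing this output with the files published for the release on PyPI shows whether a matching wheel exists for your combination of Python version and architecture.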

Pip can't install flask-socketio

I want to install the module flask-socketio using pip2, but I get an error saying that no matching bidict version was found. I looked bidict up, and it turns out that the version doesn't even exist. I tried installing some other packages, but nothing worked. Here you can see the full error:
'''
ERROR: Could not find a version that satisfies the requirement bidict>=0.21.0 (from python-socketio>=5.0.2->flask-socketio) (from versions: 0.1.5, 0.2.1, 0.3.0, 0.3.1, 0.9.0rc0, 0.9.0.post1, 0.10.0, 0.10.0.post1, 0.11.0, 0.12.0.post1, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.14.2, 0.15.0.dev0, 0.15.0.dev1, 0.15.0rc1, 0.15.0, 0.16.0, 0.17.0, 0.17.1, 0.17.2, 0.17.3, 0.17.4, 0.17.5, 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.18.4)
ERROR: No matching distribution found for bidict>=0.21.0 (from python-socketio>=5.0.2->flask-socketio)
'''
Any ideas?
Flask-SocketIO dropped support for Python 2 at version 5.0.0. Install an older version:
pip install "Flask-SocketIO<5.0.0"
This solved the problem on my Raspberry Pi:
pip3 install flask-socketio

What to do if pip can't find right versions from requirements.txt

I found a GitHub repo, cloned it, created and activated a venv, and then tried to install the requirements from requirements.txt.
In this example, pip fails, saying:
ERROR: Could not find a version that satisfies the requirement tensorflow==1.5.0 (from -r requirements.txt (line 54)) (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 1.15.0rc0, 1.15.0rc1, 1.15.0rc2, 1.15.0rc3, 1.15.0, 1.15.2, 1.15.3, 2.0.0a0, 2.0.0b0, 2.0.0b1, 2.0.0rc0, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.1.0rc0, 2.1.0rc1, 2.1.0rc2, 2.1.0, 2.1.1, 2.2.0rc0, 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0)
This has happened to me in the past too, with a completely different project using the same tensorflow version.
Why is this tf version not in the list? Did the tf authors remove versions? Where can I find it?
How do I troubleshoot these situations in the future?
It just seems silly that the whole point in a venv and requirements.txt is to make sure I'm running the exact same packages and versions that the author has, but it trips up at the first hurdle.
If the package is not available on PyPI, you need to download and install it manually from the GitHub repo or wherever it is hosted.
This is probably a TensorFlow problem that has already been described here:
TensorFlow not found using pip
This is likely due to OP's version of Python. TensorFlow 1.5.0 was released on January 26, 2018 and supports Python versions 2.7 and 3.3-3.6. The solution is to use a supported Python version. Python 3.7+ is not supported because Python 3.7 was released on June 27, 2018, after the release of TensorFlow 1.5.0.
See the TensorFlow 1.5.0 PyPI page for more information.
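As a rough illustration of the check described above, the following sketch tests whether an interpreter version falls in the range the answer gives for TensorFlow 1.5.0 (2.7 and 3.3 to 3.6). The list is copied from the answer, not read from PyPI, and is_supported is a hypothetical helper name.

```python
import sys

# Python versions supported by TensorFlow 1.5.0 according to the answer above.
TF_1_5_SUPPORTED = [(2, 7), (3, 3), (3, 4), (3, 5), (3, 6)]


def is_supported(version_info=None, supported=TF_1_5_SUPPORTED):
    """Return True if the (major, minor) part of version_info is in `supported`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in supported


print("This interpreter is supported:", is_supported())
print("Python 3.6.9 is supported:", is_supported((3, 6, 9)))
print("Python 3.7.0 is supported:", is_supported((3, 7, 0)))
```

If the current interpreter is not supported, creating the venv with one of the listed Python versions should let pip see the pinned release.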

Could not find version error during install Imblearn in python

I am trying to install Imblearn to do SMOTE in Python. I have been trying to install the imblearn package for some time, but I keep getting errors. These are the commands I have tried:
pip install imblearn
pip install git+https://github.com/fmfn/UnbalancedDataset
And this is the error I am getting:
Collecting imblearn
Using cached https://files.pythonhosted.org/packages/81/a7/4179e6ebfd654bd0eac0b9c06125b8b4c96a9d0a8ff9e9507eb2a26d2d7e/imblearn-0.0-py2.py3-none-any.whl
Collecting imbalanced-learn (from imblearn)
Using cached https://files.pythonhosted.org/packages/e0/87/39a4cecebc7fb9ddb433fe8bc7f76379b4918a0ade91f8a1423dc25c7ddc/imbalanced-learn-0.5.0.tar.gz
Requirement already satisfied: numpy>=1.11 in
Requirement already satisfied: scipy>=0.17 in
Collecting scikit-learn>=0.21 (from imbalanced-learn->imblearn)
ERROR: Could not find a version that satisfies the requirement scikit-learn>=0.21 (from imbalanced-learn->imblearn) (from versions: 0.9, 0.10, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.14, 0.14.1, 0.15.0b1, 0.15.0b2, 0.15.0, 0.15.1, 0.15.2, 0.16b1, 0.16.0, 0.16.1, 0.17b1, 0.17, 0.17.1, 0.18rc2, 0.18, 0.18.1, 0.18.2, 0.19b2, 0.19.0, 0.19.1, 0.19.2, 0.20rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21rc2)
ERROR: No matching distribution found for scikit-learn>=0.21 (from imbalanced-learn->imblearn)
Because of this I cannot proceed with the modeling part, as my data is heavily imbalanced. Can anybody please help me install this package? If not, what are the alternative ways to do SMOTE in Python?
Scikit-learn 0.20 was the last version to support Python 2.7. Scikit-learn 0.21 and later require Python 3.5 or newer.
Refer to the following link:
https://pypi.org/project/scikit-learn/0.21.0/
I would recommend using Python 3.
The pip install command has changed. It is as below:
https://pypi.org/project/scikit-learn/0.21.0/

Cannot read ".parquet" files in Azure Jupyter Notebook (Python 2 and 3)

I am currently trying to open parquet files using Azure Jupyter Notebooks. I have tried both Python kernels (2 and 3).
After the installation of pyarrow I can import the module only if the Python kernel is 2 (not working with Python 3)
Here is what I've done so far (for clarity, I am not mentioning all my various attempts, such as using conda instead of pip, as it also failed):
!pip install --upgrade pip
!pip install -I Cython==0.28.5
!pip install pyarrow
import pandas
import pyarrow
import pyarrow.parquet
#so far, so good
filePath_parquet = "foo.parquet"
table_parquet_raw = pandas.read_parquet(filePath_parquet, engine='pyarrow')
This works well if I'm doing that off-line (using Spyder, Python v.3.7.0). But it fails using an Azure Notebook.
AttributeError                            Traceback (most recent call last)
<ipython-input-54-2739da3f2d20> in <module>()
6
7 #table_parquet_raw = pd.read_parquet(filePath_parquet, engine='pyarrow')
----> 8 table_parquet_raw = pandas.read_parquet(filePath_parquet, engine='pyarrow')
AttributeError: 'module' object has no attribute 'read_parquet'
Any idea please?
Thank you in advance !
EDIT:
Thank you very much for your reply Peter Pan !
I have typed these statements, here is what I got:
1.
print(pandas.__dict__)
=> read_parquet does not appear
2.
print(pandas.__file__)
=> I get:
/home/nbuser/anaconda3_23/lib/python3.4/site-packages/pandas/__init__.py
import sys; print(sys.path) => I get:
['', '/home/nbuser/anaconda3_23/lib/python34.zip',
'/home/nbuser/anaconda3_23/lib/python3.4',
'/home/nbuser/anaconda3_23/lib/python3.4/plat-linux',
'/home/nbuser/anaconda3_23/lib/python3.4/lib-dynload',
'/home/nbuser/.local/lib/python3.4/site-packages',
'/home/nbuser/anaconda3_23/lib/python3.4/site-packages',
'/home/nbuser/anaconda3_23/lib/python3.4/site-packages/Sphinx-1.3.1-py3.4.egg',
'/home/nbuser/anaconda3_23/lib/python3.4/site-packages/setuptools-27.2.0-py3.4.egg',
'/home/nbuser/anaconda3_23/lib/python3.4/site-packages/IPython/extensions',
'/home/nbuser/.ipython']
Do you have any idea please ?
EDIT 2:
Dear @PeterPan, I have typed both !conda update conda and !conda update pandas: when checking the pandas version (pandas.__version__), it is still 0.19.2.
I have also tried !conda update pandas -y -f, which returns:
Fetching package metadata ...........
Solving package specifications: .
Package plan for installation in environment /home/nbuser/anaconda3_23:
The following NEW packages will be INSTALLED:
pandas: 0.19.2-np111py34_1
When typing:
!pip install --upgrade pandas
I get:
Requirement already up-to-date: pandas in /home/nbuser/anaconda3_23/lib/python3.4/site-packages
Requirement already up-to-date: pytz>=2011k in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from pandas)
Requirement already up-to-date: numpy>=1.9.0 in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from pandas)
Requirement already up-to-date: python-dateutil>=2 in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from pandas)
Requirement already up-to-date: six>=1.5 in /home/nbuser/anaconda3_23/lib/python3.4/site-packages (from python-dateutil>=2->pandas)
Finally, when typing:
!pip install --upgrade pandas==0.24.0
I get:
Collecting pandas==0.24.0
Could not find a version that satisfies the requirement pandas==0.24.0 (from versions: 0.1, 0.2b0, 0.2b1, 0.2, 0.3.0b0, 0.3.0b2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0rc1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0rc1, 0.8.0rc2, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0rc1, 0.19.0, 0.19.1, 0.19.2, 0.20.0rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0rc1, 0.21.0, 0.21.1, 0.22.0)
No matching distribution found for pandas==0.24.0
Therefore, my guess is that the problem comes from the way the packages are managed in Azure. Updating a package (here Pandas), should lead to an update to the latest version available, shouldn't it?
I tried to reproduce your issue on my Azure Jupyter Notebook, but could not. It worked for me even without your first two steps, !pip install --upgrade pip and !pip install -I Cython==0.28.5, which I don't think matter.
Please run the following to check whether the pandas package you imported is the right one.
Run print(pandas.__dict__) to check whether the read_parquet function appears in the output.
Run print(pandas.__file__) to check whether you imported a different pandas package.
Run import sys; print(sys.path) to check the order of the paths and whether any of them contains a file or directory with the same name.
If there is a file or directory named pandas, just rename it and restart your ipynb to re-run. This is a common issue; see these SO threads: AttributeError: 'module' object has no attribute 'reader' and Importing installed package from script raises "AttributeError: module has no attribute" or "ImportError: cannot import name".
Otherwise, please update your post with more details.
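The three checks above can be bundled into one small helper. This is only a sketch using the standard library; check_attribute is a hypothetical name, and it is demonstrated here on the json module so that it runs anywhere. In the notebook you would call it with "pandas" and "read_parquet".

```python
import importlib


def check_attribute(module_name, attribute):
    """Report where a module was imported from and whether it has `attribute`.

    Returns None if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return {
        # A path outside site-packages may point to a shadowing file or directory.
        "file": getattr(module, "__file__", None),
        "version": getattr(module, "__version__", None),
        "has_attribute": hasattr(module, attribute),
    }


# Demonstration on a standard-library module.
report = check_attribute("json", "loads")
print(report)

# In the notebook from the question, the call would be:
# print(check_attribute("pandas", "read_parquet"))
```

If has_attribute is False while file points into site-packages, the installed version is probably too old to provide the function, which is a different problem from a shadowed module.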
The latest pandas version should be 0.23.4, not 0.24.0.
I looked for the earliest pandas version that supports read_parquet by searching for the function name in the documentation of each version from 0.19.2 to 0.23.3. I found that pandas supports read_parquet from the 0.21 series onwards; it appears in the What's New for version 0.21.1.
According to your EDIT 2, you seem to be using Python 3.4 in the Azure Jupyter Notebook. Not all pandas versions support Python 3.4.
Versions 0.21.1 and 0.22.0 officially support Python 2.7, 3.5, and 3.6, and the PyPI page for pandas lists the same Python requirement.
So you can try installing pandas 0.21.1 or 0.22.0 in the current Python 3.4 notebook. If that fails, please create a new notebook with Python 2.7 or >=3.5 and install pandas >= 0.21.1 to use read_parquet.
