Splitting a pandas DataFrame - np.array_split error - Python

I'm trying to split a DataFrame; this code used to work just fine:
split_dfs = np.array_split(big_df,8)
Now it gives me an error (I did a system update in between):
Traceback (most recent call last):
File "./prepare_fixations_dataset.py", line 127, in <module>
split_dfs = np.array_split(big_df,8)
File "/usr/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 426, in array_split
if sub_arys[-1].size == 0 and sub_arys[-1].ndim != 1:
File "/usr/lib/python2.7/site-packages/pandas-0.15.1-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 1936, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'size'
If you have any suggestions as to why it may not work now, please let me know.

The error is caused by a bug (or regression) in numpy 1.9.0 and an incompatibility between numpy 1.9.0 and pandas 0.15.1. The bug doesn't happen with numpy 1.8.1 and pandas 0.15.1.
I filed a bug on the pandas GitHub:
https://github.com/pydata/pandas/issues/8846
It seems it is already fixed for pandas 0.15.2.
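If you're stuck on the affected version pair in the meantime, one workaround (a minimal sketch; big_df here is a toy stand-in for your real data) is to split an array of row positions rather than the DataFrame itself, then slice with iloc:
import numpy as np
import pandas as pd

big_df = pd.DataFrame({"x": range(100)})  # stand-in for the real data

# Split the positional index instead of the DataFrame; slicing with iloc
# sidesteps np.array_split's attribute checks on the DataFrame object.
split_dfs = [big_df.iloc[chunk] for chunk in np.array_split(np.arange(len(big_df)), 8)]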

ERROR - secure_channel() got an unexpected keyword argument 'default_scopes' with .to_dataframe() on bigquery object

Environment Details
Python 3.7.12
google-api-core 1.23.0
google-auth 1.35.0
bigquery 2.3.1
Let me know if I can provide any other library versions.
We are querying some data from BigQuery using Python in Airflow, and converting the results into a DataFrame. See this chunk of code:
from google.cloud import bigquery
bq = bigquery.Client()
query_result = bq.query(f"select count(*) as num_rows from our_project.ourdataset.our_table")
our_df = query_result.to_dataframe()
query_result is a google.cloud.bigquery.job.query.QueryJob object.
When we run the last line, our_df = query_result.to_dataframe(), we get the error TypeError: secure_channel() got an unexpected keyword argument 'default_scopes'. The whole error message is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 1313, in to_dataframe
date_as_object=date_as_object,
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1695, in to_dataframe
create_bqstorage_client=create_bqstorage_client,
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1510, in to_arrow
bqstorage_client = self.client._create_bqstorage_client()
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 446, in _create_bqstorage_client
return bigquery_storage.BigQueryReadClient(credentials=self._credentials)
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1/services/big_query_read/client.py", line 386, in __init__
or Transport == type(self).get_transport_class("grpc_asyncio")
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1/services/big_query_read/transports/grpc.py", line 170, in __init__
("grpc.max_receive_message_length", -1),
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery_storage_v1/services/big_query_read/transports/grpc.py", line 221, in create_channel
**kwargs,
File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 286, in create_channel
return grpc_gcp.secure_channel(target, composite_credentials, **kwargs)
TypeError: secure_channel() got an unexpected keyword argument 'default_scopes'
This line of code was previously working for us. The only change I recall us making since the code was last working was:
Installed dbt==0.19.0
to_dataframe() seems like the most basic Python function there is, and it is quite frustrating that it is not working here for us. What can we do to resolve this?
I've found the issue, but I have no idea how to resolve it. dbt-bigquery 0.19.0 depends on google-api-core>=1.16.0,<1.24. When we installed dbt, it must have downgraded google-api-core from something greater to 1.23.0. However, google-api-core being as low as 1.23 is what causes this issue with to_dataframe(). I know this because when I manually upgraded google-api-core to 1.26, .to_dataframe() worked again.
EDIT: upgrading dbt to 0.20.0 allows for google-api-core 1.3+, which resolves our issue!
For me, upgrading google-auth, google-auth-httplib2, and google-api-core did the trick.
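If you hit the same kind of pin conflict, it can help to confirm which google-api-core actually ended up installed. A quick check (using pkg_resources, which ships with setuptools):
import pkg_resources

# Per the fix above, this should report >= 1.26 for to_dataframe() to work here.
print(pkg_resources.get_distribution("google-api-core").version)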

Trouble opening old pickle file

I am trying to load an old pickle file containing the airline dataset (https://arxiv.org/abs/1611.06740). The pickle is very old and I have problems accessing it. If I try:
import pickle

objects = []
with open("airline.pickle", "rb") as openfile:
    while True:
        try:
            objects.append(pickle.load(openfile))
        except EOFError:
            break
I get the following warning and error:
FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
objects.append(pickle.load(openfile))
Traceback (most recent call last):
File "c:\Users\LocalAdmin\surfdrive\Code\Python\Airline\pickleToCSV.py", line 9, in <module>
objects.append(pickle.load(openfile))
TypeError: _reconstruct: First argument must be a sub-type of ndarray
Trying with pandas does not work:
File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\io\pickle.py", line 203, in read_pickle
return pickle.load(handles.handle) # type: ignore[arg-type]
TypeError: _reconstruct: First argument must be a sub-type of ndarray
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\Users\LocalAdmin\surfdrive\Code\Python\Airline\pickleToCSV.py", line 7, in <module>
df = pd.read_pickle('airline.pickle')
File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\io\pickle.py", line 208, in read_pickle
return pc.load(handles.handle, encoding=None)
File "C:\Users\LocalAdmin\surfdrive\Code\Python\Airline\Airline\lib\site-packages\pandas\compat\pickle_compat.py",
line 249, in load
return up.load()
File "C:\Users\LocalAdmin\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1212, in load
dispatch[key[0]](self)
File "C:\Users\LocalAdmin\AppData\Local\Programs\Python\Python39\lib\pickle.py", line 1725, in load_build
for k, v in state.items():
AttributeError: 'tuple' object has no attribute 'items'
How can I access the file and save it to CSV? I need the data contained there. I am using pandas 1.2.4 and Python 3.6.
The syntax should be simpler than in your example:
with open("airline.pickle", "rb") as f:
    objects = pickle.load(f)
If this fails, then I would look at the pickle documentation, which covers the optional parameters that are useful for decoding pickle files created by Python 2.
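For a pickle written by Python 2, the usual knob is pickle.load's encoding argument; a sketch ("latin1" is the conventional choice for Python 2 pickles containing NumPy arrays):
import pickle

with open("airline.pickle", "rb") as f:
    # "latin1" preserves the raw bytes of Python 2 str objects,
    # which is what numpy's pickled buffers need.
    objects = pickle.load(f, encoding="latin1")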
As mentioned in a previous answer, the error TypeError: _reconstruct: First argument must be a sub-type of ndarray is due to a change from pandas version 0.14 to 0.15 (source). The documentation says that pd.read_pickle should be able to load such old pickle files, but this does not work on recent versions. If you install an older version (I tested 0.17.1, which can be obtained from PyPI or conda-forge), it can load that pickle file successfully.
If you are using conda, the following should work:
conda create -n old_pandas -c conda-forge pandas=0.17.* python=3.*
conda activate old_pandas
And then, in a Python prompt,
import pandas as pd
dataset = pd.read_pickle("airline.pickle")
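Since the original goal was a CSV, you can then re-export so any modern pandas can read the data back. Continuing in the same prompt, a minimal sketch, assuming the unpickled object is a DataFrame (or something pandas can wrap in one):
pd.DataFrame(dataset).to_csv("airline.csv", index=False)  # version-proof re-export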

MetPy Geocolor Satellite Tutorial Breakage

I am encountering an error in MetPy when following the geocolor satellite imagery tutorial, specifically the section entitled "Plot with Cartopy Geostationary Projection". The breakage occurred roughly two weeks ago and functionality has yet to return. Consider the following code:
from xarray import open_dataset
import metpy
data_dir = '.'
color_file = 'OR_ABI-L1b-RadC-M3C01_G16_s20180152002235_e20180152005008_c20180152005054.nc'
c = open_dataset('/'.join([data_dir,color_file]))
dat = c.metpy.parse_cf('Rad')
This block is functionally similar to that provided in the MetPy geocolor satellite tutorial. It worked fine until recently. Now the following error occurs:
Traceback (most recent call last):
File "<stdin>", line 1, in module
File "/usr/local/anaconda3/lib/python3.7/site-packages/metpy/xarray.py", line 191, in parse_cf
from .plots.mapping import CFProjection
File "/usr/local/anaconda3/lib/python3.7/site-packages/metpy/plots/__init__.py", line 13, in module
from .skewt import * # noqa: F403
File "/usr/local/anaconda3/lib/python3.7/site-packages/metpy/plots/skewt.py", line 28, in module
from ..calc import dewpoint, dry_lapse, moist_lapse, vapor_pressure
File "/usr/local/anaconda3/lib/python3.7/site-packages/metpy/calc/__init__.py", line 7, in module
from .cross_sections import * # noqa: F403
File "/usr/local/anaconda3/lib/python3.7/site-packages/metpy/calc/cross_sections.py", line 14, in module
from .tools import first_derivative
File "/usr/local/anaconda3/lib/python3.7/site-packages/metpy/calc/tools.py", line 101, in module
def find_intersections(x, a, b, direction='all'):
File "/usr/local/anaconda3/lib/python3.7/site-packages/pint/registry_helpers.py", line 248, in decorator
% (func.__name__, count_params, len(args))
TypeError: find_intersections takes 4 parameters, but 3 units were passed
What seems to be the problem here? Is there a workaround available?
I think it's an incompatibility between your installed versions of MetPy and Pint. Try making sure you're running the latest versions of those two with:
conda update metpy pint
I should note that MetPy 0.12.0 (currently the latest) is incompatible with xarray 0.15.1. As of this writing, if the above command updates xarray, you'll need to roll it back slightly with:
conda install xarray=0.15.0
We're working on a bugfix release to address this.
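To confirm what the update actually left you with, all three packages expose __version__; a quick check:
import metpy
import pint
import xarray

print(metpy.__version__, pint.__version__, xarray.__version__)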

Pandas ExcelWriter yields AttributeError

While trying to create an xlsx file with pandas, I receive the following error:
Traceback (most recent call last):
File "<ipython-input-24-201aac2da411>", line 1, in <module>
writer = pd.ExcelWriter('test_file2.xlsx', engine='xlsxwriter', options={'constant_memory': True})
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\excel.py", line 1945, in __init__
self.book = xlsxwriter.Workbook(path, **engine_kwargs)
AttributeError: module 'xlsxwriter' has no attribute 'Workbook'
My code:
import pandas as pd
writer = pd.ExcelWriter('test_file.xlsx', engine='xlsxwriter', options={'constant_memory': True})
Versions
Python 3.7.3
Pandas 0.24.2
I resolved the issue. Previously, I had downloaded the xlsxwriter package and added it to my Python path manager in Spyder. This conflicted with pandas. Deleting the path and xlsxwriter resolved the issue.
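If you suspect the same kind of shadowing, a quick way to check is to see where the module is actually imported from; a sketch:
import xlsxwriter

# If this prints a path inside your own project rather than site-packages,
# a local file or folder named xlsxwriter is shadowing the real package.
print(xlsxwriter.__file__)
print(hasattr(xlsxwriter, "Workbook"))  # should be True for the real package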

How to import a time column from Snowflake into a Jupyter notebook DataFrame?

I need to import data from Snowflake into Jupyter. In the dataset I have a time column which is derived from timestamp values.
Every time I try to import the data, Jupyter says the process failed; the error message is below.
How should I get around this issue?
ERROR:snowflake.connector.converter:Failed to convert: field T: TIME::76493.000000000
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/snowflake/connector/converter.py", line 88, in to_python
type_name=type_name))
AttributeError: 'SnowflakeConverter' object has no attribute '_TIME_to_python'
ERROR:snowflake.connector.cursor:failed to convert row to python
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/snowflake/connector/cursor.py", line 658, in __row_to_python
res += (self._connection.converter.to_python(col_desc, col),)
File "/usr/local/lib/python2.7/site-packages/snowflake/connector/converter.py", line 88, in to_python
type_name=type_name))
AttributeError: 'SnowflakeConverter' object has no attribute '_TIME_to_python'
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))
Can you check your Python Connector version? The error indicates the TIME data type is not supported by your Python Connector. The TIME data type has been supported since v1.0.6. As of today, the latest version is 1.2.8:
https://pypi.python.org/pypi/snowflake-connector-python/
Here is an example of TIME data type in Jupyter notebook:
https://gist.github.com/smtakeda/e401c80d71f2da4aa7452d238c5ccffa
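A quick way to check the installed connector version from Python (recent releases of the connector expose __version__; verify against your install):
import snowflake.connector

# TIME support landed in 1.0.6; anything older raises the
# _TIME_to_python AttributeError shown above.
print(snowflake.connector.__version__)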
