IndexingError: Too many indexers in a function - python

I am trying to diagnose the following error: IndexingError: Too many indexers.
The solutions I have already found on Stack Overflow don't resolve my issue.
Here is my simplified code:
import pandas as pd
import numpy as np
def foo(T=None, *c):
    A = df.loc[c][0]
    B = df.loc[c][1]
    C = df.loc[c][2]
    return (10**(A-((B)/(T+C))))*760
file = r'C:\Users\ayubi\Documents\Python Files\Chemical Engineering Files\T-xy_Data.xlsx'
df = pd.read_excel(file)
df_adj = df.set_index('Component')
print(df_adj)
compOne = df_adj.loc['Benzene']
foo(25,*compOne)
The reason I have the foo arguments in that order is that, if T is not given, I will be able to use fsolve.
Here is the error:
runcell(0, 'C:/Users/ayubi/Documents/Python Files/Chemical Engineering Files/test_example.py')
A B C
Component
Benzene 6.87987 1196.76 219.161
Toluene 6.95087 1342.31 219.187
Water 8.07131 1730.63 233.426
Traceback (most recent call last):
File "C:\Users\ayubi\Documents\Python Files\Chemical Engineering Files\test_example.py", line 24, in <module>
foo(25,*compOne)
File "C:\Users\ayubi\Documents\Python Files\Chemical Engineering Files\test_example.py", line 12, in foo
A = df.loc[c][0]
File "C:\Users\ayubi\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 873, in __getitem__
return self._getitem_tuple(key)
File "C:\Users\ayubi\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1049, in _getitem_tuple
self._has_valid_tuple(tup)
File "C:\Users\ayubi\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 700, in _has_valid_tuple
raise IndexingError("Too many indexers")
IndexingError: Too many indexers
The function should take the argument T, if given, along with compOne, grab the values associated with compOne, and then calculate something.
Thanks for your help in advance.
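For what it's worth, the error itself comes from unpacking compOne: df_adj.loc['Benzene'] is a Series of three floats, so *compOne makes c a tuple of three numbers, and df.loc[c] then supplies three indexers to a two-dimensional frame. A minimal sketch of one way around this (a hypothetical rewrite, not necessarily the intended design, using the label-indexed df_adj) is to pass the component name and look the coefficients up by label inside the function:
def foo(T, component):
    # look up the A, B, C coefficients by label in the indexed frame
    A, B, C = df_adj.loc[component, ['A', 'B', 'C']]
    return (10**(A - B/(T + C))) * 760

foo(25, 'Benzene')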

Related

Using `xarray.apply_ufunc` with `np.linalg.pinv` returns an error with `dask.array`

I get an error when running the following MWE:
import xarray as xr
import numpy as np
from numpy.linalg import pinv
import dask
data = np.random.randn(4, 4, 3, 2)
da = xr.DataArray(data=data, dims=("x", "y", "i", "j"),)
da = da.chunk(x=1, y=1)
da_inv = xr.apply_ufunc(pinv, da,
                        input_core_dims=[["i", "j"]],
                        output_core_dims=[["i", "j"]],
                        exclude_dims=set(("i", "j")),
                        dask="parallelized",
                        )
This throws the following error:
Traceback (most recent call last):
File "/glade/scratch/tomasc/tracer_inversion2/mwe.py", line 14, in <module>
da_inv = xr.apply_ufunc(pinv, da,
File "/glade/u/home/tomasc/miniconda3/envs/py310/lib/python3.10/site-packages/xarray/core/computation.py", line 1204, in apply_ufunc
return apply_dataarray_vfunc(
File "/glade/u/home/tomasc/miniconda3/envs/py310/lib/python3.10/site-packages/xarray/core/computation.py", line 315, in apply_dataarray_vfunc
result_var = func(*data_vars)
File "/glade/u/home/tomasc/miniconda3/envs/py310/lib/python3.10/site-packages/xarray/core/computation.py", line 771, in apply_variable_ufunc
result_data = func(*input_data)
File "/glade/u/home/tomasc/miniconda3/envs/py310/lib/python3.10/site-packages/xarray/core/computation.py", line 747, in func
res = da.apply_gufunc(
File "/glade/u/home/tomasc/miniconda3/envs/py310/lib/python3.10/site-packages/dask/array/gufunc.py", line 489, in apply_gufunc
core_output_shape = tuple(core_shapes[d] for d in ocd)
File "/glade/u/home/tomasc/miniconda3/envs/py310/lib/python3.10/site-packages/dask/array/gufunc.py", line 489, in <genexpr>
core_output_shape = tuple(core_shapes[d] for d in ocd)
KeyError: 'dim0'
Even though when using dask.array.map_blocks directly, things seem to work right out of the box:
data_inv = dask.array.map_blocks(pinv, da.data).compute() # works!
What am I missing here?
(Same question answered on the xarray repository here.)
You were almost there; you just needed to add the sizes of the new output dimensions by including the kwarg
dask_gufunc_kwargs={'output_sizes': {'i': 2, 'j': 3}}
It does sort of say this in the docstring for apply_ufunc but it could definitely be clearer!
That's a very unhelpful error, but it's ultimately being thrown because the keys 'i' and 'j' don't exist in the dict of expected sizes of the output (because you didn't provide them).
The actual error message has been improved in xarray version v2023.2.0.
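Putting the fix together with the call from the question, the full invocation would look roughly like this (same arguments as above, plus the extra kwarg):
da_inv = xr.apply_ufunc(pinv, da,
                        input_core_dims=[["i", "j"]],
                        output_core_dims=[["i", "j"]],
                        exclude_dims=set(("i", "j")),
                        dask="parallelized",
                        dask_gufunc_kwargs={"output_sizes": {"i": 2, "j": 3}},
                        )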

TypeError: '(slice(0, 15, None), 15)' is an invalid key

I have some Python code that looks like the code pasted below. For context, all of the CSV files print [15 rows x 16 columns]; I just changed the names for privacy purposes.
import numpy as np
import pandas as pd
C = pd.read_csv('/Users/name/Desktop/filename1.csv')
Chome = pd.read_csv('/Users/name/Desktop/filename2.csv')
Cwork = pd.read_csv('/Users/name/Desktop/filename3.csv')
Cschool = pd.read_csv('/Users/name/Desktop/filename4.csv')
Cother = pd.read_csv('/Users/name/Desktop/filename5.csv')
Cf = np.zeros([17,17])
Cf = C
Cf[0:15,16] = C[0:15,15]
Cf[16,0:15] = C[15,0:15]
Cf[16,16] = C[15,15]
print(Cf)
When I run the code I get the following error:
runfile('/Users/name/.spyder-py3/untitled12.py', wdir='/Users/name/.spyder-py3')
Traceback (most recent call last):
File "/Users/name/.spyder-py3/untitled12.py", line 23, in <module>
Cf[0:15,16] = C[0:15,15]
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 116, in pandas._libs.index.IndexEngine.get_loc
TypeError: '(slice(0, 15, None), 15)' is an invalid key
I am not exactly sure what this error means. I am pretty new to Python, so debugging is a skill I am still trying to develop. Any advice on what I can do to fix this error, or on what it means, would be helpful. Thank you.
Note the following sequence in your code sample:
C = pd.read_csv(...)
... # Other cases of pd.read_csv
Cf = np.zeros([17,17])
So, at least up to this point, C is a DataFrame and Cf is a NumPy array.
Then Cf = C is probably a logical error, since it overwrites the NumPy array (full of zeroes) with another reference to C.
Now, as far as the offending instruction (Cf[0:15,16] = C[0:15,15]) is concerned: note that C[0:15,15] is wrong (run this code on your own to see it). With pandasonic DataFrames you can use "positional addressing", including slices, using iloc. On the other hand, this notation is allowed for NumPy arrays.
So, assuming that Cf = C is not needed and Cf should remain a NumPy array, you should probably correct this instruction to:
Cf[0:15,16] = C.iloc[0:15,15]
and make analogous corrections in the remaining instructions in your code.
Edit
Another option is to refer to the underlying NumPy array of the C DataFrame, using the values attribute. In this case you can use Numpythonic addressing style, e.g.:
C.values[0:15,15]
causes no error.
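To make the difference concrete, here is a small self-contained sketch (using a made-up frame instead of the private CSV files) of positional indexing on a DataFrame versus on a NumPy array:
import numpy as np
import pandas as pd

C = pd.DataFrame(np.arange(12).reshape(3, 4))   # stand-in for the CSV data
Cf = np.zeros([5, 5])
Cf[0:3, 4] = C.iloc[0:3, 3]      # positional slicing on a DataFrame needs .iloc
Cf[4, 0:4] = C.values[2, 0:4]    # or go through the underlying NumPy array
# C[0:3, 3] would raise the same "invalid key" TypeError as in the question
print(Cf)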

DistributedSampler — Expected a ‘cuda’ device type for generator when generating indices

For distributed training, I have code like the following:
training_sampler = DistributedSampler(training_set, num_replicas=2, rank=0)
training_generator = data.DataLoader(training_set, **params, sampler=training_sampler)
for x, y, z in training_generator: # Error occurs here.
...
Overall, I get the following message:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/ubuntu/VC/ppg_training_extraction/ppg_training_scripts/train_ASR_trim_scp.py", line 336, in train
for local_batch_src, local_batch_tgt, lengths in dataloaders[phase]:
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 352, in __iter__
return self._get_iterator()
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 294, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 827, in __init__
self._reset(loader, first_iter=True)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 857, in _reset
self._try_put_index()
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1091, in _try_put_index
index = self._next_index()
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 427, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 227, in __iter__
for idx in self.sampler:
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/data/distributed.py", line 97, in __iter__
indices = torch.randperm(len(self.dataset), generator=g).tolist() # type: ignore
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
Now at that line, I ran the following instructions in pdb:
(Pdb) g = torch.Generator()
(Pdb) g.manual_seed(0)
<torch._C.Generator object at 0x7ff7f8143110>
(Pdb) indices = torch.randperm(4556, generator=g).tolist()
(Pdb) indices = torch.randperm(455604, generator=g).tolist()
*** RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
Why am I getting the runtime error when the upper-bound integer is high, but not when it's low enough?
Note: I ran the following in a clean Python session and found
>>> import torch
>>> g = torch.Generator()
>>> g.manual_seed(0)
<torch._C.Generator object at 0x7f9d2dfb39f0>
>>> indices = torch.randperm(455604, generator=g).tolist()
that it worked fine. Is it something about how I'm configuring distributed training across multiple GPUs? Any insights would be appreciated!
I just ran into the same problem using a DataLoader, and I found that the following helps without removing torch.set_default_tensor_type('torch.cuda.FloatTensor'):
data.DataLoader(..., generator=torch.Generator(device='cuda'))
This avoids having to manually add .to('cuda') for tons of tensors in my code.
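Wired into the snippet from the question, that workaround would look roughly like this (a sketch reusing training_set and params from the question; note that the traceback shows DistributedSampler building its own CPU generator internally, so whether this alone is enough may depend on the PyTorch version):
training_sampler = DistributedSampler(training_set, num_replicas=2, rank=0)
training_generator = data.DataLoader(training_set, **params,
                                     sampler=training_sampler,
                                     generator=torch.Generator(device='cuda'))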
I had the same issue. Looks like somebody found the source of the problem here:
torch.set_default_tensor_type('torch.cuda.FloatTensor')
I fixed my problem by getting rid of this line in my code and manually using .to(device).

How to Read nrows From Pandas HDF Storage?

What am I trying to do?
pd.read_csv(... nrows=###) can read the top nrows of a file. I'd like to do the same while using pd.read_hdf(...).
What is the problem?
I am confused by the documentation. start and stop look like what I need, but when I try them, a ValueError is returned. The second thing I tried was nrows=10, thinking it might be an allowable keyword argument. When I do that, no error is thrown, but the full dataset is returned instead of just 10 rows.
Question: How does one correctly read a smaller subset of rows from an HDF file? (edit: without having to read the whole thing into memory first!)
Below is my interactive session:
>>> import pandas as pd
>>> df = pd.read_hdf('storage.h5')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
df = pd.read_hdf('storage.h5')
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 367, in read_hdf
raise ValueError('key must be provided when HDF5 file '
ValueError: key must be provided when HDF5 file contains multiple datasets.
>>> import h5py
>>> f = h5py.File('storage.h5', mode='r')
>>> list(f.keys())[0]
'table'
>>> f.close()
>>> df = pd.read_hdf('storage.h5', key='table', start=0, stop=10)
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
df = pd.read_hdf('storage.h5', key='table', start=0, stop=10)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 370, in read_hdf
return store.select(key, auto_close=auto_close, **kwargs)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 740, in select
return it.get_result()
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 1447, in get_result
results = self.func(self.start, self.stop, where)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 733, in func
columns=columns, **kwargs)
File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 2890, in read
return self.obj_type(BlockManager(blocks, axes))
File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 2795, in __init__
self._verify_integrity()
File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 3006, in _verify_integrity
construction_error(tot_items, block.shape[1:], self.axes)
File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 4280, in construction_error
passed, implied))
ValueError: Shape of passed values is (614, 593430), indices imply (614, 10)
>>> df = pd.read_hdf('storage.h5', key='table', nrows=10)
>>> df.shape
(593430, 614)
Edit:
I just attempted to use where:
mylist = list(range(30))
df = pd.read_hdf('storage.h5', key='table', where='index=mylist')
Received a TypeError indicating a Fixed format store (the default format value of df.to_hdf(...)):
TypeError: cannot pass a where specification when reading from a
Fixed format store. this store must be selected in its entirety
Does this mean I can't select a subset of rows if the format is Fixed format?
I ran into the same problem. I am pretty certain by now that https://github.com/pandas-dev/pandas/issues/11188 tracks this very problem. It is a ticket from 2015 and it contains a repro. Jeff Reback suggested that this is actually a bug, and he even pointed us towards a solution back in 2015. It's just that nobody has built that solution yet. I might give it a try.
Seems like this now works, at least with pandas 1.0.1. Just provide start and stop arguments:
df = pd.read_hdf('test.h5', '/floats/trajectories', start=0, stop=5)
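Regarding the Fixed-format question in the edit above: if you control how the file is written, saving it in table format also makes where and start/stop selection possible without reading the whole store. A short sketch, with made-up file and key names:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 3), columns=list('abc'))
df.to_hdf('storage_table.h5', key='table', format='table')   # queryable 'table' store
head = pd.read_hdf('storage_table.h5', key='table', start=0, stop=10)
subset = pd.read_hdf('storage_table.h5', key='table', where='index < 30')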

Python sklearn kaggle/titanic tutorial fails on the last feature scale

I was in the process of working through this tutorial: http://ahmedbesbes.com/how-to-score-08134-in-titanic-kaggle-challenge.html
It went with no problems until I got to the last part of the middle section:
As you can see, the features range in different intervals. Let's normalize all of them in the unit interval. All of them except the PassengerId that we'll need for the submission
In [48]:
def scale_all_features():
    global combined
    features = list(combined.columns)
    features.remove('PassengerId')
    combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
    print 'Features scaled successfully !'
In [49]:
scale_all_features()
Features scaled successfully !
and despite typing it word for word in my python script:
#Cell 48
GreatDivide.split()
def scale_all_features():
    global combined
    features = list(combined.columns)
    features.remove('PassengerId')
    combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
    print 'Features scaled successfully !'
#Cell 49
GreatDivide.split()
scale_all_features()
It keeps giving me an error:
--------------------------------------------------48--------------------------------------------------
--------------------------------------------------49--------------------------------------------------
Traceback (most recent call last):
File "KaggleTitanic[2-FE]--[01].py", line 350, in <module>
scale_all_features()
File "KaggleTitanic[2-FE]--[01].py", line 332, in scale_all_features
combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4061, in apply
return self._apply_standard(f, axis, reduce=reduce)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4157, in _apply_standard
results[i] = func(v)
File "KaggleTitanic[2-FE]--[01].py", line 332, in <lambda>
combined[features] = combined[features].apply(lambda x: x/x.max(), axis=0)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 651, in wrapper
return left._constructor(wrap_results(na_op(lvalues, rvalues)),
File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 592, in na_op
result[mask] = op(x[mask], y)
TypeError: ("unsupported operand type(s) for /: 'str' and 'str'", u'occurred at index Ticket')
What's the problem here? All of the previous 49 sections ran with no problem, so if I was getting an error it would have shown by now, right?
You can help ensure that the math transformation only occurs on numeric columns with the following:
numeric_cols = combined.columns[combined.dtypes != 'object']
combined.loc[:, numeric_cols] = combined[numeric_cols] / combined[numeric_cols].max()
There is no need for that apply function.
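Dropped into the tutorial's function, that fix might look like this (a sketch: it keeps the tutorial's global combined pattern and also leaves PassengerId unscaled, as the tutorial intends):
def scale_all_features():
    global combined
    # keep only numeric columns, and skip PassengerId so it survives for the submission
    numeric_cols = [c for c in combined.columns
                    if combined[c].dtype != 'object' and c != 'PassengerId']
    combined[numeric_cols] = combined[numeric_cols] / combined[numeric_cols].max()
    print('Features scaled successfully !')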
