Why am I getting a 'hashable' error when combining two dataframes? - python

I have two DataFrames and I'm attempting to combine them as follows:
df3 = df1.combine(df2, np.mean)
However, I'm getting the following error:
TypeError: 'Series' objects are mutable, thus they cannot be hashed.
I'm not sure I understand the message: DataFrames are by definition mutable, so why does hashing come into it?
I don't get an error if I switch to:
df3 = df1.combine(df2, np.minimum)
Is this something to do with the NaN values in the two DataFrames? If so, what would be the solution? Do I need to devise my own function to replicate np.mean?
Updated:
I just found np.nanmean but that gives the following error:
TypeError: 'Series' object cannot be interpreted as an integer

np.mean takes a single positional argument, the input array. So you cannot and should not do
np.mean(series1, series2)
The call above interprets series2 as the second positional argument of np.mean, which is axis. That argument must be an integer, so NumPy tries to convert series2 into one, which triggers the error.
Instead, for the mean, wrap the two series in a list and average element-wise along the first axis:
np.mean([series1, series2], axis=0)
In the other case, np.minimum is designed to do:
np.minimum(series1, series2)
and gives the minimum element-wise as expected.
TLDR For the mean, you can just do:
df3 = (df1 + df2) / 2
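Putting it together inside combine, a minimal sketch with two small hypothetical DataFrames (df1 and df2 here are made up for illustration):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})
df2 = pd.DataFrame({'a': [3.0, 4.0], 'b': [5.0, 6.0]})

# combine calls func(series1, series2) column by column, so the function
# must accept two positional Series; wrap them in a list for np.mean
df3 = df1.combine(df2, lambda s1, s2: np.mean([s1, s2], axis=0))
```

The lambda is needed because combine always passes the second series positionally, which np.mean would otherwise swallow as its axis argument.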

Related

use a list plus some strings to select columns from a dataframe

I am trying to make a dynamic list and then combine it with a fixed string to select columns from a dataframe:
import pandas as pd
df = pd.DataFrame([], columns=['c1','c2','c3','c4'])
column_list= ['c2','c3']
df2 = df[['c1',column_list]]
but I get the following error:
TypeError: unhashable type: 'list'
I tried a dict as well, but that gives a similar error.
In your code, pandas tries to find a column literally named ['c2', 'c3'] (the nested list inside your selector), which is not possible as only hashable objects can be column names. Even if this didn't trigger an error (e.g. if you used a tuple), it wouldn't give you what you want. You need a single flat list of labels.
Use expansion:
df[['c1', *column_list]]
Or addition:
df[['c1']+column_list]
Output:
Empty DataFrame
Columns: [c1, c2, c3]
Index: []
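A self-contained sketch of both forms from the answer, using the question's own column names:

```python
import pandas as pd

df = pd.DataFrame([], columns=['c1', 'c2', 'c3', 'c4'])
column_list = ['c2', 'c3']

# Both build one flat list of labels: ['c1', 'c2', 'c3']
via_expansion = df[['c1', *column_list]]
via_addition = df[['c1'] + column_list]
```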

TypeError: unhashable type: 'list' in pandas with Groupby or PivotTable

First of all, I want to point out that my question is very similar to other questions asked before, but I tried their answers and nothing worked for me.
I'm trying to aggregate some info using more than one variable to group by. I can use pivot_table or groupby, both are fine for this, but I get the same error every time.
My code is:
import numpy as np
import pandas as pd
vars_agrup = ['num_vars', 'list_vars', 'Modelo']
metricas = ["MAE", "MAE_perc", "MSE", "R2"]
g = pd.pivot_table(df, index=vars_agrup, aggfunc=np.sum, values=metricas)
or
df.groupby(vars_agrup, as_index=False).agg(Count=('MAE','sum'))
Also, I tried using () instead of [] to avoid making it a list, but then pandas searches for a single column called "'num_vars', 'list_vars', 'Modelo'", which doesn't exist. I tried ([]) and [()], and index instead of columns. It's always the same: grouping by one variable is fine, but with multiple variables I get the error TypeError: unhashable type: 'list'.
For sure, all these variables are columns in df.
Edit: My df looks like this: (screenshot of the DataFrame omitted)
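One common cause of this exact error is a grouping column whose cells are themselves lists, since list values cannot be hashed as group keys. A sketch reproducing and fixing that scenario (the df below is hypothetical; the real df isn't shown):

```python
import pandas as pd

df = pd.DataFrame({
    'num_vars': [1, 1, 2],
    'list_vars': [['a'], ['a'], ['b']],  # cells are lists -> unhashable
    'Modelo': ['m1', 'm1', 'm2'],
    'MAE': [0.1, 0.2, 0.3],
})

# Grouping by a column of list cells raises TypeError: unhashable type: 'list'
try:
    df.groupby(['num_vars', 'list_vars', 'Modelo'])['MAE'].sum()
except TypeError as err:
    print(err)

# Converting the lists to (hashable) tuples makes the same groupby work
df['list_vars'] = df['list_vars'].apply(tuple)
g = df.groupby(['num_vars', 'list_vars', 'Modelo'], as_index=False)['MAE'].sum()
```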

Getting "'int' object is not subscriptable" error while apply a method to a pandas data series

I have a stocks_df data frame that looks like the one in the picture. When I apply the lambda as in the picture, it doesn't throw any errors.
However when I do
list = pandas.Series([1,2,3,4,5])
new_list = list.apply(lambda x: x/x[0])
It gives me "'int' object is not subscriptable" error. Is there any difference between the two? What am I doing wrong here?
For a Series, apply operates element-wise. To reference the first element of the series, use list[0] instead of x[0]:
new_list = list.apply(lambda x: x/list[0])
For a DataFrame, apply by default operates column-wise, which is why x/x[0] works.
To use the same syntax, you could use:
new_list = list.to_frame().apply(lambda x: x/x[0])
By the way, using built-in type name (list) as variable name is not a good idea.
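A runnable sketch of both variants, with the shadowing fixed by naming the series s instead of list:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Series.apply: x is a scalar, so refer back to the series for its first value
ratios = s.apply(lambda x: x / s[0])

# DataFrame.apply: x is the whole column, so x[0] works inside the lambda
ratios_df = s.to_frame(name='v').apply(lambda x: x / x[0])
```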

Trouble passing in lambda to apply for pandas DataFrame

I'm trying to apply a function to all rows of a pandas DataFrame (actually just one column in that DataFrame).
I'm sure this is a syntax error, but I'm not sure what I'm doing wrong:
df['col'].apply(lambda x, y:(x - y).total_seconds(), args=[d1], axis=1)
The col column contains a bunch of datetime.datetime objects, and d1 is the earliest of them. I'm trying to get a column with the total number of seconds elapsed for each of the rows.
EDIT I keep getting the following error
TypeError: <lambda>() got an unexpected keyword argument 'axis'
I don't understand why axis is getting passed to my lambda function
EDIT 2
I've also tried doing
def diff_dates(d1, d2):
    return (d1 - d2).total_seconds()

df['col'].apply(diff_dates, args=[d1], axis=1)
And I get the same error
Note there is no axis parameter for a Series.apply call, unlike a DataFrame.apply call.
Series.apply(func, convert_dtype=True, args=(), **kwds)
func : function
convert_dtype : boolean, default True
Try to find better dtype for elementwise function results. If False, leave as dtype=object
args : tuple
Positional arguments to pass to function in addition to the value
There is an axis parameter for DataFrame.apply, but it's unclear how you expect this to work when you're calling apply on a Series yet asking it to operate on a row.
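Dropping the axis argument is enough; a minimal sketch with a hypothetical two-row df:

```python
import datetime

import pandas as pd

df = pd.DataFrame({'col': [datetime.datetime(2024, 1, 1),
                           datetime.datetime(2024, 1, 2)]})
d1 = df['col'].min()  # the earliest datetime in the column

# Series.apply forwards args to the function; there is no axis to pass
seconds = df['col'].apply(lambda x, y: (x - y).total_seconds(), args=(d1,))
```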

Resolving Reindexing only valid with uniquely valued Index objects

I have viewed many of the questions that come up with this error. I am running pandas '0.10.1'
import numpy as np
from pandas import DataFrame

df = DataFrame({'A': np.random.randn(5),
                'B': np.random.randn(5),
                'C': np.random.randn(5),
                'D': ['a', 'b', 'c', 'd', 'e']})
#gives error
df.take([2,0,1,2,3], axis=1).drop(['C'],axis=1)
#works fine
df.take([2,0,1,2,1], axis=1).drop(['C'],axis=1)
The only thing I can see is that in the former case I have the non-numeric column, which seems to be affecting the index somehow, but the command below returns empty:
df.take([2,0,1,2,3], axis=1).index.get_duplicates()
So "Reindexing only valid with uniquely valued Index objects" does not seem to apply, as my old index is unique, at least as far as I can tell from that command, following this Q&A: problems with reindexing dataframes: Reindexing only valid with uniquely valued Index objects.
I also think my pandas version is recent enough that this bug should not be the problem: pandas Reindexing only valid with uniquely valued Index objects
Firstly, I believe you meant to test for duplicates using the following command:
df.take([2,0,1,2,3], axis=1).columns.get_duplicates()
because if you check index instead of columns, it will obviously return an empty result: the row index here is the default integer index 0-4, which is unique. The command above returns, as expected:
['C']
Secondly, I think you're right that the non-numeric column is throwing it off, because even with the following there is still an error:
df = DataFrame({'A' : np.random.randn(5), 'B' : np.random.randn(5),'C' :np.random.randn(5), 'D':[str(x) for x in np.random.randn(5) ]})
It could be a bug: if you look at the core file 'index.py', at lines 86 and 1228, the engine types it expects are (respectively):
_engine_type = _index.ObjectEngine
_engine_type = _index.Int64Engine
and neither of those seems to expect a string, if you look deeper into the documentation. That's the best I've got, good luck!! Let me know if you solve this as I'm interested too.
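In newer pandas versions Index.get_duplicates() has been removed; Index.duplicated() gives the same information. A sketch of locating the duplicate column label after the take:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randn(5), 'B': np.random.randn(5),
                   'C': np.random.randn(5), 'D': list('abcde')})

taken = df.take([2, 0, 1, 2, 3], axis=1)  # columns: C, A, B, C, D

# The duplicate lives in the columns, not the row index
dupes = taken.columns[taken.columns.duplicated()].tolist()
```

This is why drop(['C'], axis=1) cannot reindex uniquely: the label 'C' appears twice among the columns.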
