Unfortunately I am unable to produce a reproducible example, but here's the issue I'm running into. With one dataframe, I am able to loop through the columns and save the count of unique values per column. With another dataframe, which has the exact same columns and data, I run into an unhashable type: 'dict' error. The only difference is that the second dataframe is all object dtypes, while the first has some ints and floats.
This works:
for col in olddf.columns:
    unique = len(olddf[col].unique())
    print(col, unique)
I get an unhashable type: 'dict' error with this:
for col in orig_results.columns:
    unique = len(orig_results[col].unique())
Like I mentioned, I'm unfortunately unable to come up with a sample dataset that replicates this. Does anyone have a general idea of what might be happening? Thanks!
Turns out it was the location column throwing the error; it contains dictionaries as values, e.g. {'latitude': '40.7388739110531', 'longitude': '40.738873911'}. Since dictionaries are unhashable, unique() can't hash them to get a unique count.
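For anyone hitting the same thing, here's a minimal sketch (with made-up data) that reproduces the error and one workaround: stringify the dicts so the values become hashable before counting.
import pandas as pd

# Hypothetical frame whose 'location' column holds dicts, like the one above.
df = pd.DataFrame({
    'city': ['NYC', 'NYC'],
    'location': [{'latitude': '40.73', 'longitude': '-73.99'},
                 {'latitude': '40.73', 'longitude': '-73.99'}],
})

# df['location'].unique() raises TypeError: unhashable type: 'dict'.
# Converting each dict to its string representation makes the values hashable:
print(len(df['location'].astype(str).unique()))  # 1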
I have a data frame as shown below. I need to select all rows from the data frame where the column values are less than or equal to 0.3768, but I am getting an error.
Below is my dataframe.
Below is my code:
for column in df_thd_funct_mode1_T:
    new_df = df_thd_funct_mode1_T.loc[(np.isclose(df_thd_funct_mode1_T[column] <= .3768))]
I am getting an error as shown below.
TypeError: '<=' not supported between instances of 'str' and 'float'
May I know how to solve this issue?
I ran the script provided by Christian, but I am getting values greater than .3768 in my new data frame.
You could try this:
new_df = df_thd_funct_mode1_T[df_thd_funct_mode1_T.apply(lambda row: all(float(value) <= .3768 for value in row), axis=1)]
With this you no longer need the loop from your example.
Here I go through the dataframe row by row and check whether all values in a row are less than or equal to .3768.
If instead you want to accept a row as soon as any value in it is less than or equal to .3768, replace all with any.
This of course only works if every column contains values that can be cast to float; otherwise you will run into an error when casting.
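If some columns may hold strings that can't be parsed as numbers, a defensive variant (just a sketch, assuming unparseable cells should disqualify the row) is to coerce everything with pd.to_numeric first:
import pandas as pd

# Coerce every column to numeric; cells that cannot be parsed become NaN.
numeric = df_thd_funct_mode1_T.apply(pd.to_numeric, errors='coerce')

# Keep rows where every value parsed and is <= .3768; comparisons against
# NaN are False, so rows with unparseable cells are dropped.
new_df = df_thd_funct_mode1_T[(numeric <= .3768).all(axis=1)]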
If you want to check multiple columns, you have to combine a condition for each column. Each condition should look like:
df.loc[df['column_name'] <= 0.3768 ]
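Combining two such conditions with & (element-wise "and"), using hypothetical column names:
import pandas as pd

df = pd.DataFrame({'a': [0.1, 0.5, 0.2], 'b': [0.3, 0.2, 0.9]})

# Each comparison must be wrapped in parentheses before combining with &.
new_df = df.loc[(df['a'] <= 0.3768) & (df['b'] <= 0.3768)]
print(new_df)  # only the first row satisfies both conditions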
Check these examples; they should be helpful for your question: https://www.statology.org/pandas-select-rows-based-on-column-values/
I have a dataframe, and I'm trying to encode all the categorical values within it. The following is the code I wrote to encode all categorical columns in one go:
for col in data.select_dtypes('object').columns:
    data[col] = data[col].astype('category').cat.codes
but this only works sometimes and often throws the following error saying the DataFrame has no attribute cat:
AttributeError: 'DataFrame' object has no attribute 'cat'
Now I'm not able to understand how it works sometimes and fails other times. Also, I haven't applied the cat method to the whole dataframe but to a single column (Series) each time through the loop.
Does anyone know what's going wrong here?
The problem is duplicated column names: if you select one column whose label is duplicated, you get back all columns with that label, i.e. a DataFrame, and a DataFrame has no cat accessor.
for col in data.select_dtypes('object').columns:
    print(col)
    # check what the selection returns; for a duplicated label this is a
    # DataFrame, not a Series
    print(data[col])
    data[col] = data[col].astype('category').cat.codes
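To confirm that duplicated labels are the culprit, a short sketch (with made-up data) that lists them — any label it prints needs renaming or dropping before the loop will work:
import pandas as pd

# Reproduce the failure: two columns share the label 'x'.
data = pd.DataFrame([['a', 'b'], ['c', 'd']], columns=['x', 'x'])

# Lists the duplicated column labels; prints ['x'] here.
print(data.columns[data.columns.duplicated()].tolist())

# data['x'] is a two-column DataFrame, so .astype('category').cat fails with
# AttributeError: 'DataFrame' object has no attribute 'cat'.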
Goal: To update a cell value of a pandas dataframe with a list of lists of lists
Problem: When attempting to do this, I get an error of "Must have equal len keys and value when setting with an iterable"
Overview: I have a pandas dataframe with a cell value I want to update. The new cell value is a list of lists of lists. The cell already stores such a value, but it needs to be updated to a new list of lists of lists. However, when I attempt to make the cell update, I receive the error noted above. I cannot reproduce this issue in some simplified code, so I wonder if it has to do with how long the list is - 40+ lists within lists.
My code to perform this action looks something like this, though as I said, this will not reproduce the issue, as I suspect it has to do with the length of the list:
import pandas

df = pandas.DataFrame(data={'First': ['Bob', 'Bobby'], 'Last': ['Bobberson', 'Bobson']})
for i in df.index:
    df.loc[i, 'First'] = [[[1, 3], [4, 5]], [[2, 3], [4, 5]]]
Try replacing .loc with .at:
for i in df.index:
    df.at[i, 'First'] = [[[1, 3], [4, 5]], [[2, 3], [4, 5]]]
.at sets a single cell and stores the list as one object, whereas .loc treats a list-like value as something to align element-wise, which is why it complains about equal-length keys and values. See more details about why .loc has a problem here.
I am trying to filter rows within column sample.single by excluding './.'. All data types are objects. I have attempted the options below; I suspect the special characters are breaking the second attempt. The dataframe comprises 195 columns.
My gtdata columns:
Index(['sample.single', 'sample2.single', 'sample3.single'], dtype='object')
Please advise, thank you!
When I try:
gtdata = gtdata[('sample.single') != ('./.')]
I receive a key error: KeyError: True
When I try:
gtdata = gtdata[gtdata.sample.single != ('./.') ]
I receive an attribute error:
AttributeError: 'DataFrame' object has no attribute 'single'
Not 100% sure what you're trying to achieve, but assuming you have cells containing the "./." string which you want to filter out, the below is one way to do it:
import pandas as pd
# generate some sample data
gtdata = pd.DataFrame({'sample.single': ['blah1', 'blah2', 'blah3./.'],
                       'sample2.single': ['blah1', 'blah2', 'blah3'],
                       'sample3.single': ['blah1', 'blah2', 'blah3']})
# filter out all cells in column sample.single containing the literal "./."
# (regex=False, since "." is a regex wildcard and would otherwise match any character)
gtdata = gtdata[~gtdata['sample.single'].str.contains("./.", regex=False)]
When subsetting in pandas you should be passing a boolean vector with the same length as the DataFrame's index.
The problem with your first approach is that ('sample.single') != ('./.') compares two plain strings and evaluates to a single boolean value rather than a boolean vector; no column of the DataFrame is involved at all.
The problem with your second approach is that gtdata.sample.single is not valid pandas syntax. To get the sample.single column you have to refer to it as gtdata['sample.single']. If your column name did not contain a ".", you could use the attribute shorthand you were trying, e.g. gtdata.sample_single, provided the name doesn't clash with an existing DataFrame attribute (sample, for instance, collides with the DataFrame.sample method).
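For completeness, the boolean-mask version of the exact-match filter you attempted would look like this (a sketch; prefer it over the substring approach when './.' is always the entire cell value):
mask = gtdata['sample.single'] != './.'  # boolean Series, one entry per row
gtdata = gtdata[mask]                    # keep only rows where the mask is True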
I recommend reviewing the documentation for subsetting Pandas DataFrames.
I have a dataframe that has about 20 columns, and I am trying to get a subset of it by selecting only some specific columns, about 6. My line of code is:
df3_query = df3[['Cont NUMBER'],['PL NUMBER'],['NAME'],['LOAN COUNT'],['SCORE MINIMUM'],['COUNT PERCENT']]
I am getting this error:
TypeError: unhashable type: 'list'
May I know the reason why I get this error? I would also like to select only those columns from the df3 dataframe. Can anyone help me with this?
You need to write your column names in one list, not as a list of lists:
df3_query = df3[['Cont NUMBER', 'PL NUMBER', 'NAME', 'LOAN COUNT', 'SCORE MINIMUM', 'COUNT PERCENT']]
From the docs:
You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner.
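The same selection can also be written with .loc, which makes the row/column split explicit:
cols = ['Cont NUMBER', 'PL NUMBER', 'NAME', 'LOAN COUNT', 'SCORE MINIMUM', 'COUNT PERCENT']
df3_query = df3.loc[:, cols]  # all rows, just the six listed columns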