I can't create a new Dataframe with the columns I want - python

I am trying to extract 3 columns so that I can create a graph later.
newDF = df.loc[filt,['Dublin','Cork','Galway']]
print(newDF)
but unfortunately I get an error:
Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index(['Dublin'], dtype='object')
Thank you for your help...

new_df = df[['Dublin', 'Cork', 'Galway']]
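The error message says the label 'Dublin' does not exist in the frame, so it can help to check the column names before selecting. The sketch below is only an illustration: the sample data and the trailing space in the header are assumptions, since the question does not show the real columns.

```python
import pandas as pd

# Hypothetical data: the 'Dublin' header carries a trailing space.
df = pd.DataFrame({"Dublin ": [1, 2], "Cork": [3, 4], "Galway": [5, 6]})
wanted = ["Dublin", "Cork", "Galway"]

# Labels that are requested but absent from the frame.
missing = [c for c in wanted if c not in df.columns]

# One possible fix, if stray whitespace is the cause: normalise the headers.
df.columns = df.columns.str.strip()
new_df = df[wanted]
```

If `missing` is non-empty for some other reason (a typo, different capitalisation), the headers need to be corrected accordingly before the selection will work.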

Related

Passing list-likes to .loc or [] with any missing labels is no longer supported

I want to create a modified dataframe with the specified columns.
I tried the following, but it throws the error "Passing list-likes to .loc or [] with any missing labels is no longer supported"
# columns to keep
filtered_columns = ['text', 'agreeCount', 'disagreeCount', 'id', 'user.firstName', 'user.lastName', 'user.gender', 'user.id']
tips_filtered = tips_df.loc[:, filtered_columns]
# display tips
tips_filtered
Thank you
It looks like Pandas has deprecated this method of indexing. According to their docs:
This behavior is deprecated and will show a warning message pointing
to this section. The recommended alternative is to use .reindex()
Using the new recommended method, you can filter your columns using:
tips_filtered = tips_df.reindex(columns=filtered_columns)
NB: To reindex rows, you would use reindex(index=...).
Some of the columns in the list are not present in the dataframe. If you still want to select them, try reindex:
tips_filtered = tips_df.reindex(columns=filtered_columns)
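A minimal sketch of what reindex does with a missing column, next to the alternative of keeping only the columns that exist. The small `tips_df` here is made-up sample data, not the asker's frame.

```python
import pandas as pd

# Hypothetical sample: 'agreeCount' is requested but not present.
tips_df = pd.DataFrame({"text": ["good", "bad"], "id": [1, 2]})
filtered_columns = ["text", "agreeCount", "id"]

# reindex keeps every requested column; absent ones are filled with NaN.
with_nan = tips_df.reindex(columns=filtered_columns)

# Alternatively, keep only the requested columns that actually exist.
present = [c for c in filtered_columns if c in tips_df.columns]
only_present = tips_df[present]
```

Which variant fits depends on whether downstream code expects the missing columns to exist as empty placeholders.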
I encountered the same error with missing row index labels rather than columns.
For example, I would have a dataset of products with the following ids: ['a','b','c','d']. I store those products in a dataframe with indices ['a','b','c','d']:
df=pd.DataFrame(['product a','product b','product c', 'product d'],index=['a','b','c','d'])
Now let's assume I have an updated product index:
row_indices=['b','c','d','e'] in which 'e' corresponds to a new product: 'product e'. Note that 'e' was not present in my original index ['a','b','c','d'].
If I try to pass this updated index to my df dataframe: df.loc[row_indices,:],
I'll get this nasty error message:
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index(['e'], dtype='object')"
To avoid this error I need to take the intersection of my updated index with the original index:
df.loc[df.index.intersection(row_indices),:]
This is in line with the recommendation in the pandas docs.
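The two row-wise options can be sketched side by side. The column name `name` is an assumption added for readability; the original example uses an unnamed column.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["product a", "product b", "product c", "product d"]},
    index=["a", "b", "c", "d"],
)
row_indices = ["b", "c", "d", "e"]

# Keep only the labels that exist in the frame ('e' is silently skipped).
existing = df.loc[df.index.intersection(row_indices), :]

# Or keep every requested label; rows for unknown labels are filled with NaN.
with_new = df.reindex(index=row_indices)
```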
This error appears when you index on a label that is not present. reset_index() worked for me because I was indexing a subset of the original dataframe that still carried the original index; in this case the column may not be present in the dataframe.
I had the same issue while trying to create new columns along with existing ones :
df = pd.DataFrame([[1,2,3]], columns=["a","b","c"])
def foobar(a,b):
    return a,b
df[["c","d"]] = df.apply(lambda row: foobar(row["a"], row["b"]), axis=1)
The solution was to add result_type="expand" as an argument of apply() :
df[["c","d"]] = df.apply(lambda row: foobar(row["a"], row["b"]), axis=1, result_type="expand")
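A runnable version of the fix above, with the intermediate result kept in a variable (`expanded` is a name introduced here for illustration) so it is easier to see what apply() returns.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"])

def foobar(a, b):
    return a, b

# With result_type="expand" the returned tuples become the columns of a
# DataFrame (labelled 0 and 1) instead of a single Series of tuples.
expanded = df.apply(
    lambda row: foobar(row["a"], row["b"]), axis=1, result_type="expand"
)

# Overwrites the existing column 'c' and creates the new column 'd'.
df[["c", "d"]] = expanded
```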

Sorting by columns after grouping generating error

Could anyone please tell me why sorting is generating an error here? I suspect it is related to indexing, but reset_index didn't solve the issue.
df['s'] = df.groupby(['ID','Date'],as_index=False)['Text_Data']\
    .transform(lambda x : ' '.join(x))\
    .sort_values(['ID','Date'])
KeyError: ('ID', 'Date')
What I was trying to do is to sort the dataframe regardless of the grouping. In R you would do ungroup() first; I am not sure whether anything similar is necessary in Python. Thanks
df.groupby(['ID','Date'],as_index=False)['Text_Data'].transform(lambda x : ' '.join(x))
The code above returns a pandas Series that contains only the Text_Data column. Calling sort_values(['ID','Date']) on it raises an error because the Series has no ID or Date columns.
You can sort the dataframe and compute the transformed column separately, then delete the original column and assign the transformed values in its place, like this:
df = df.sort_values(['ID','Date'])
df['s'] = df.groupby(['ID','Date'],as_index=False)['Text_Data'].transform(lambda x : ' '.join(x))
del df['Text_Data']
df['Text_Data'] = df['s'].values
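A small end-to-end sketch of the same idea on made-up data: compute the joined text per group first, then sort the whole dataframe, where the ID and Date columns are available.

```python
import pandas as pd

# Hypothetical sample data, deliberately unsorted.
df = pd.DataFrame({
    "ID": [2, 1, 1, 2],
    "Date": ["2020-01-02", "2020-01-01", "2020-01-01", "2020-01-02"],
    "Text_Data": ["d", "a", "b", "c"],
})

# transform returns a Series aligned to df's rows, so it can be assigned
# directly; no "ungroup" step is needed afterwards.
df["s"] = df.groupby(["ID", "Date"])["Text_Data"].transform(lambda x: " ".join(x))

# Sort the dataframe itself, not the Series returned by transform.
df = df.sort_values(["ID", "Date"]).reset_index(drop=True)
```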

Pivoting python

Quick question:
I have the following situation (table):
Imported data frame
Now what I would like to achieve is the following (or something in those lines, it does not have to be exactly that)
Goal
I do not want the following columns so I drop them
data.drop(data.columns[[0,5,6]], axis=1,inplace=True)
What I assumed is that the following line of code could solve it, but I am missing something?
pivoted = data.pivot(index=["Intentional homicides and other crimes","Unnamed: 2"],columns='Unnamed: 3', values='Unnamed: 4')
produces
ValueError: Length of passed values is 3395, index implies 2
The difference from the related question is that I do not want any aggregation functions; I just want to leave the values as they are.
Data can be found at: Data
The problem with the method pandas.DataFrame.pivot is that it does not handle duplicate index/column combinations. One way to solve this is to use the function pandas.pivot_table instead.
df = pd.read_csv('Crimes_UN_data.csv', skiprows=[0], encoding='latin1')
cols = list(df.columns)
cols[1] = 'Region'
df.columns = cols
pivoted = pd.pivot_table(df, values='Value', index=['Region', 'Year'], columns='Series', aggfunc=sum)
The aggfunc argument should not actually sum anything here, but without it the call raised pandas.core.base.DataError: No numeric types to aggregate.
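If the goal is to leave the values untouched, one alternative (a substitution, not what the answer above used) is to pass aggfunc="first", which also works on non-numeric columns. The sample frame below is invented; the column names follow the answer's renamed columns.

```python
import pandas as pd

# Hypothetical data: values kept as strings, as they might arrive from a raw CSV.
data = pd.DataFrame({
    "Region": ["A", "A", "B", "B"],
    "Year": [2005, 2005, 2005, 2005],
    "Series": ["rate", "count", "rate", "count"],
    "Value": ["1.5", "3", "2.5", "7"],
})

# "first" takes the first value found for each (Region, Year, Series) cell,
# so nothing is summed and string values do not raise an error.
pivoted = pd.pivot_table(
    data,
    values="Value",
    index=["Region", "Year"],
    columns="Series",
    aggfunc="first",
)
```

Note that if a cell really does have several source rows, "first" silently keeps only one of them, so it is worth checking for duplicates beforehand.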

Pandas - Function to remove na

I'm trying to write a quick function but struggling since I'm new to Pandas/Python. I'm trying to remove NAs from two of my columns, but I keep getting this error. My code is the following:
def remove_na():
    df.dropna(subset=['Column 1', 'Column 2'])
    df.reset_index(drop=True)
df = remove_rows()
df.head(3)
AttributeError: 'NoneType' object has no attribute 'dropna'
I want to use this function on different tables, hence why I thought it would make sense to create a method. However, I just don't understand why it's not working for this method when compared to others it seems fine. Thank you.
I believe you can specify whether to drop rows or columns with the parameter axis, where 0 is index and 1 is columns. This would drop every column that contains an NA:
df.dropna(axis =1, inplace=True )
I think you can use apply with dropna:
df = df.apply(lambda x: pd.Series(x.dropna().values))
print (df)
OR you can also try this
df=df.dropna(axis=0, how='any')
You're getting an error because dropna returns a new dataframe as its output instead of modifying df.
You can either save it to a dataframe:
df = df.dropna(subset=['Column 1', 'Column 2'])
or pass the argument 'inplace=True':
df.dropna(subset=['Column 1', 'Column 2'], inplace=True)
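Since the question asks for a function that can be reused on different tables, here is one possible sketch. The signature (a dataframe plus a list of column names) is an assumption; the key points are that the function receives the frame as an argument and returns the result, so the caller never ends up with None.

```python
import numpy as np
import pandas as pd

def remove_na(df, columns):
    """Return a new frame without rows that have NA in any of `columns`."""
    return df.dropna(subset=columns).reset_index(drop=True)

# Hypothetical sample table.
raw = pd.DataFrame({
    "Column 1": [1.0, np.nan, 3.0],
    "Column 2": ["x", "y", None],
    "Other": [np.nan, 1.0, 2.0],
})

clean = remove_na(raw, ["Column 1", "Column 2"])
```

Only the listed columns are checked, so the NA in "Other" does not cause the first row to be dropped, and `raw` itself is left unchanged.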
To remove all rows with missing values from the data set at once, you can use the following (remember to specify the axis argument so that rows, not columns, are dropped):
# making new data frame with dropped NA values
new_data = data.dropna(axis = 0, how ='any')

pandas: drop multiple columns whose names are in a list and assign to a new dataframe

I have a dataframe with several columns:
df
pymnt_plan ... settlement_term days
Now I know which columns I want to delete/drop, based on the following list:
mylist = ['pymnt_plan',
'recoveries',
'collection_recovery_fee',
'policy_code',
'num_tl_120dpd_2m',
'hardship_flag',
'debt_settlement_flag_date',
'settlement_status',
'settlement_date',
'settlement_amount',
'settlement_percentage',
'settlement_term']
How can I drop multiple columns whose names are in a list and assign the result to a new dataframe? In this case:
df2
days
You can do
new_df = df[mylist]
df = df.drop(columns=mylist)
In Pandas 0.20.3 using 'df = df.drop(columns=mylist)' I get:
TypeError: drop() got an unexpected keyword argument 'columns'
So you can use this instead:
df = df.drop(axis=1, labels=mylist)
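A short sketch of the drop on a cut-down, invented frame. The errors="ignore" variant is an addition for the case where some names in the list are not present in the dataframe.

```python
import pandas as pd

# Hypothetical frame with only a few of the columns from the question.
df = pd.DataFrame({"pymnt_plan": ["n"], "recoveries": [0.0], "days": [30]})
mylist = ["pymnt_plan", "recoveries"]

# New dataframe without the listed columns; df itself is unchanged.
df2 = df.drop(columns=mylist)

# errors="ignore" skips labels that do not exist instead of raising KeyError.
df3 = df.drop(columns=mylist + ["not_a_column"], errors="ignore")
```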
