I am trying to rename unnamed columns in my data frame.
The value in the first row of each such column should become that column's name. If a column's name doesn't contain "Unnamed", it should remain unchanged.
I'm trying to achieve it this way:
for col in df.columns:
    if 'Unnamed' in col:
        df = df.rename(columns=df.iloc[0])
        break
In this case each column is renamed. Any ideas what am I doing wrong?
Use Index.where with str.contains; it replaces values where the mask is False, so invert the mask with ~:
import pandas as pd

df = pd.DataFrame({'Unnamed 1': ['a', 2], 'col': ['b', 8]})
df.columns = df.columns.where(~df.columns.str.contains('Unnamed'), df.iloc[0])
print(df)
   a col
0  a   b
1  2   8
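If the promoted first row should no longer remain as a data row (the question doesn't say; this is an assumption), you can drop it afterwards:
# Assumption: the header row should be removed from the data once promoted.
df = df.iloc[1:].reset_index(drop=True)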
Your solution can be changed to loop over the first row as a Series:
new = []
for col, v in df.iloc[0].items():
    if 'Unnamed' in col:
        new.append(v)
    else:
        new.append(col)
df.columns = new
The same as a list comprehension:
df.columns = [v if 'Unnamed' in col else col for col, v in df.iloc[0].items()]
You can rename unnamed columns as shown below:
df = df.rename(columns={'Unnamed: 0': 'NewName1'})
If there are multiple unnamed columns, map each one based on the occurrence of the unnamed column:
df = df.rename(columns={'Unnamed: 0': 'NewName1', 'Unnamed: 1': 'NewName2'})
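If there are many such columns, a sketch that avoids hard-coding the names by reusing the first-row idea from above (an assumption about your layout, not part of this answer):
# Map each Unnamed column to its first-row value; other columns are untouched.
mapping = {col: df.iloc[0][col] for col in df.columns if 'Unnamed' in col}
df = df.rename(columns=mapping)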
I have a dataframe resulting from a pivot which has several thousand columns (representing time-boxed attributes). Below is a much-shortened version for illustration.
import numpy as np
import pandas as pd

d = {'incount - 14:00': [1, np.nan, 1, 1, np.nan, np.nan, np.nan, np.nan, 1],
     'incount - 15:00': [2, 1, 2, np.nan, np.nan, np.nan, 1, 4, np.nan],
     'outcount - 14:00': [2, np.nan, 1, 1, 1, 1, 2, 2, 1],
     'outcount - 15:00': [2, 2, 1, 1, np.nan, 2, np.nan, 1, 1]}
df = pd.DataFrame(data=d)
I want to replace the NaNs in columns that contain "incount" with 0 (leaving other columns untouched). I have tried the following but predictably it does not recognise the column name.
df['incount'] = df['incount'].fillna(0)
I need the ability to search the column names and only impact those containing a defined string.
Try this:
# Select the column labels that start with 'incount' and fill NaNs only there.
m = df.columns[df.columns.str.startswith('incount')]
df.loc[:, m] = df.loc[:, m].fillna(0)
print(df)
You can use:
# Get the columns containing 'incount' as a list
loop_cols = list(df.columns[df.columns.str.contains('incount', na=False)])
# or:
# loop_cols = [col for col in df.columns if 'incount' in col]
print(loop_cols)
'''
['incount - 14:00', 'incount - 15:00']
'''
for i in loop_cols:
    df[i] = df[i].fillna(0)
I'm trying to append a list of values ('A') from a separate df to the bottom of my output (dfFinal), where the values are always the same and don't need to be in order.
Here's what I have tried so far:
temp1 = pd.DataFrame(df['A'].append(df1['A'], ignore_index = True))
temp2 = pd.DataFrame(df['B'].append(df1['B'], ignore_index = True))
print(df.shape)
print(temp1.shape)
print(temp2.shape)
Shape output (example from my code, with 28 extra values from df1):
(11641, 6)
(11669, 1)
(11669, 1)
Appending the values seems to work based on the shape of temp1, but I can't seem to apply the values from both col 'A' and col 'B' to the bottom of dfFinal together; it's always either col 'A' or col 'B' from df1, never both in df.
TLDR; How can I best take the values from col 'A' and Col 'B' in df1 and append them to Col 'A' and Col 'B' in df to make dfFinal which I can then export to csv ?
This can be done with the concat function along axis=0, i.e. it will join the data frames provided along rows. In layman's terms, it will join the 2nd data frame below the 1st. Keep in mind that the number of columns should be the same in both data frames.
dfFinal = pd.concat([df, df1], axis=0, ignore_index=True)
Here, ignore_index discards the indexes of the concatenated frames and creates a new one running from 0 to n-1.
For more information: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
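A minimal, self-contained sketch of the whole round trip (the data and the output filename are illustrative assumptions):
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df1 = pd.DataFrame({'A': [5], 'B': [6]})

# Stack df1 below df; ignore_index renumbers the rows 0..n-1.
dfFinal = pd.concat([df, df1], axis=0, ignore_index=True)
dfFinal.to_csv('dfFinal.csv', index=False)  # hypothetical output path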
I have a dataframe and I have a list of some column names that correspond to the dataframe. How do I filter the dataframe so that it != the list of column names, i.e. I want the dataframe columns that are outside the specified list.
I tried the following:
quant_vair = X != true_binary_cols
but get the output error of: Unable to coerce to Series, length must be 545: given 155
Been battling for hours, any help will be appreciated.
This will help:
df.drop(columns=["col1", "col2"])
You can either drop the columns from the dataframe, or create a list that does not contain all these columns:
df_filtered = df.drop(columns=true_binary_cols)
Or:
filtered_col = [col for col in df if col not in true_binary_cols]
df_filtered = df[filtered_col]
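For example, with a small illustrative frame (the data and the excluded column are assumptions, not from the question):
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
true_binary_cols = ['b']  # hypothetical list of columns to exclude

filtered_col = [col for col in df if col not in true_binary_cols]
print(df[filtered_col])
#    a  c
# 0  1  5
# 1  2  6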
I'm working with a dataframe. If the column in the dataframe has a certain percentage of blanks I want to append that column into a dictionary (and eventually turn that dictionary into a new dataframe).
features = {}
percent_is_blank = 0.4
for column in df:
    x = df[column].isna().mean()
    if x < percent_is_blank:
        features[column] = ??
new_df = pd.DataFrame.from_dict([features], columns=features.keys())
What would go in the "??"
I think filtering with DataFrame.loc is better:
new_df = df.loc[:, df.isna().mean() < percent_is_blank]
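A quick sketch of what this does, with made-up data: column 'b' is half NaN, so it is dropped at the 0.4 threshold, while 'a' is kept.
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, np.nan, np.nan, 4]})
percent_is_blank = 0.4

new_df = df.loc[:, df.isna().mean() < percent_is_blank]
print(new_df.columns.tolist())  # ['a']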
In your solution it is possible to use:
for column in df:
    x = df[column].isna().mean()
    if x < percent_is_blank:
        features[column] = df[column]
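Since features then maps column names to Series, the new frame can be built directly from it, instead of the from_dict call in the question (a sketch following the question's variable names):
new_df = pd.DataFrame(features)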
I want to delete the columns that start with the word "TYPE" and do not contain "_1".
df =
  TYPE_1 TYPE_2 TYPE_3 COL1
  aaa    asb    bbb    123
The result should be:
df =
  TYPE_1 COL1
  aaa    123
Currently I am deleting these columns manually, however this approach is not very efficient if the number of columns is big:
df = df.drop(["TYPE_2","TYPE_3"], axis=1)
A list comprehension can be used. Note: axis=1 denotes that we are referring to columns, and inplace=True can be used to modify the frame in place, as per the pandas.DataFrame.drop docs.
droplist = [i for i in df.columns if i.startswith('TYPE') and '_1' not in i]
df.drop(droplist, axis=1, inplace=True)
This is the fifth answer, but I wanted to showcase the power of the filter dataframe method, which filters by column names with a regex. This keeps columns that either don't start with TYPE or have _1 somewhere in them.
df.filter(regex='^(?!TYPE)|_1')
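A quick demonstration on the sample frame from the question (output shown as a comment):
import pandas as pd

df = pd.DataFrame({'TYPE_1': ['aaa'], 'TYPE_2': ['asb'],
                   'TYPE_3': ['bbb'], 'COL1': [123]})

# Keep columns that don't start with TYPE, or that contain _1.
print(df.filter(regex='^(?!TYPE)|_1'))
#   TYPE_1  COL1
# 0    aaa   123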
Easy:
unwanted = [column for column in df.columns
            if column.startswith("TYPE") and "_1" not in column]
df = df.drop(columns=unwanted)
t_cols = [c for c in df.columns.values if c.startswith('TYPE_') and c != 'TYPE_1']
df = df.drop(t_cols, axis=1)
Should do the job