Pandas DataFrame Add column to index without resetting - python

How do I add 'd' to the index below without having to reset it first?
from pandas import DataFrame
df = DataFrame( {'a': range(6), 'b': range(6), 'c': range(6)} )
df.set_index(['a','b'], inplace=True)
df['d'] = range(6)
# how do I set index to 'a b d' without having to reset it first?
df.reset_index(inplace=True)
df.set_index(['a','b','d'], inplace=True)
df

We added an append option to set_index. Try that.
The command is:
df.set_index(['d'], append=True)
(we don't need to specify ['a', 'b'], as they already are in the index and we're appending to them)
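As a minimal sketch of the append option, using the frame from the question: set_index returns a new DataFrame by default, so the result is assigned back here rather than modified in place.

```python
import pandas as pd

df = pd.DataFrame({'a': range(6), 'b': range(6), 'c': range(6)})
df = df.set_index(['a', 'b'])
df['d'] = range(6)

# append=True keeps the existing levels 'a' and 'b'
# and adds 'd' as a further index level
df = df.set_index('d', append=True)

print(df.index.names)
```

After this, 'd' is no longer a regular column; only 'c' remains as data.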

Your code is not valid: reset_index has no inplace argument in my version of pandas (0.8.1).
The following achieves what you want, though there is probably a more elegant way. You have not explained why you are avoiding reset_index, so it is hard to suggest anything better.
from pandas import MultiIndex
df.index = MultiIndex.from_tuples([(x, y, df['d'].values[i]) for i, (x, y) in enumerate(df.index.values)])
HTH

Related

how to append a dataframe without overwriting existing dataframe using for loop in python

I have an empty DataFrame and want to append additional DataFrames to it in a for loop without overwriting what is already there. The regular append method overwrites the existing DataFrame, so the output shows only the last appended DataFrame.
Use concat() from the pandas module.
import pandas as pd
df_new = pd.concat([df_empty, df_additional])
Read more about it in the pandas docs.
Regarding the question in the comment:
df = pd.DataFrame()  # optionally pass columns=... matching the frames to be appended
for i in range(10):
    df_new = function_to_get_df_new()
    df = pd.concat([df, df_new])
Suppose you have a list of DataFrames, list_of_df = [df1, df2, df3].
You also have an empty DataFrame, df = pd.DataFrame().
If you want to append all the DataFrames in the list to that empty DataFrame df:
for i in list_of_df:
    df = df.append(i)
The loop above will not change df1, df2, or df3, but df will change.
Note that df.append(df1) on its own does not change df; you have to assign the result back, as in df = df.append(df1). DataFrame.append was removed in pandas 2.0, so on newer versions use df = pd.concat([df, i]) instead.
You also can't pass a set:
df_new = pd.concat({df_empty, df_additional})
A set requires hashable elements, and pandas.DataFrame objects can't be hashed, so this fails.
A tuple works, though:
df_new = pd.concat((df_empty, df_additional))
Tuples may be a little quicker than lists.
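A small sketch of the difference between passing a tuple and a set to concat. The frames df_a and df_b are made-up sample data, not taken from the question.

```python
import pandas as pd

# Hypothetical sample frames for illustration
df_a = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
df_b = pd.DataFrame([[5, 6], [7, 8]], columns=['A', 'B'])

# A tuple is an ordinary sequence, so concat accepts it
df_new = pd.concat((df_a, df_b))

# A set literal fails while the set is being built,
# before concat is even called, because DataFrames are unhashable
try:
    pd.concat({df_a, df_b})
    set_error = None
except TypeError as exc:
    set_error = exc

print(type(set_error).__name__)
```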
Update for the for loop:
df = pd.DataFrame(data)
for i in range(n):  # n is the number of iterations you need
    df_new = function_to_get_df_new()
    df = pd.concat((df, df_new))  # a tuple or a list works here; a set does not
The question is already well answered; my 5 cents is the suggestion to use the ignore_index=True option to get a continuous new index instead of duplicating the older ones.
import pandas as pd
df_to_append = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))  # sample
df = pd.DataFrame()  # this is a placeholder for the destination
for i in range(3):
    # DataFrame.append was removed in pandas 2.0; there, use
    # df = pd.concat([df, df_to_append], ignore_index=True)
    df = df.append(df_to_append, ignore_index=True)
I don't think you need a for loop here; try concat():
import pandas
result = pandas.concat([emptydf,additionaldf])
pandas.concat documentation
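When the frames are produced one at a time, a common pattern is to collect them in a plain list and call concat once at the end, rather than growing a DataFrame inside the loop. The sketch below assumes a hypothetical helper, make_frame, standing in for whatever produces each new DataFrame.

```python
import pandas as pd


def make_frame(i):
    # Hypothetical stand-in for the code that builds each new DataFrame
    return pd.DataFrame({'run': [i, i], 'value': [i * 10, i * 10 + 1]})


# Collect the pieces first, then concatenate once
frames = [make_frame(i) for i in range(3)]
result = pd.concat(frames, ignore_index=True)

print(result)
```

None of the frames in the list are modified; result is a new DataFrame holding all of their rows.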

Pandas, how to reindex a dataframe that is generated from appending multiple dataframe.

I have a dataframe that is generated by appending multiple dataframes together into one long frame. As shown in the figure, the default index cycles through 0~7 because each original df has this index. The total number of rows is 240. So how can I reindex the new df to 0~239 instead of 30 x 0~7?
I tried df.reset_index(drop=True), but it doesn't seem to work. I also tried df.reindex(np.arange(240)), but it returned an error:
ValueError: cannot reindex from a duplicate axis
It seems you forgot to assign the output, because by default reset_index does not work in place:
df = df.reset_index(drop=True)
Or:
df.reset_index(drop=True, inplace=True)
But a better solution (if you use concat) is to add the parameter ignore_index=True:
df = pd.concat([df1, df2, ..., df7], ignore_index=True)
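You could change your append() call to ignore the index (append() was removed in pandas 2.0, where pd.concat([df1, df2], ignore_index=True) is the equivalent):
df = df1.append(df2, ignore_index=True)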
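To make the behaviour concrete, here is a small sketch with made-up data: three frames of eight rows each instead of the thirty in the question. It shows the repeated index, the ignore_index=True option, and reset_index with the result assigned. The reindex attempt in the question fails because reindex needs unique labels to look rows up, and the concatenated index repeats.

```python
import pandas as pd

# Hypothetical sample: three frames, each indexed 0..7
parts = [pd.DataFrame({'x': range(8)}) for _ in range(3)]

stacked = pd.concat(parts)                        # index repeats 0..7
renumbered = pd.concat(parts, ignore_index=True)  # index runs 0..23

# reset_index returns a new frame, so the result must be assigned
fixed = stacked.reset_index(drop=True)

print(stacked.index.is_unique)
print(list(fixed.index) == list(renumbered.index))
```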

How to change column names in pandas Dataframe using a list of names?

I have been trying to change the column names of a pandas dataframe using a list of names. The following code is being used:
df.rename(columns = list_of_names, inplace=True)
However I got a Type Error each time, with an error message that says "list object is not callable".
Why does this happen, and what can I do to solve it?
Thank you for your help.
You could use
df.columns = ['Leader', 'Time', 'Score']
If you need rename (l is your list of new names):
df = df.rename(columns=dict(zip(df.columns, l)))
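The error in the question most likely arises because rename expects a mapping from old label to new label, or a function; anything that is not dict-like is treated as a function and called on each label, hence "list object is not callable". A minimal sketch of the mapping approach, with made-up column names:

```python
import pandas as pd

# Hypothetical frame and names for illustration
df = pd.DataFrame([[1, 2, 3]], columns=['X', 'Y', 'Z'])
list_of_names = ['Leader', 'Time', 'Score']

# Pair each current column name with its replacement
mapping = dict(zip(df.columns, list_of_names))

# rename returns a new frame; df itself is left unchanged
renamed = df.rename(columns=mapping)

print(list(renamed.columns))
```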
Just update the columns attribute:
df.columns = list_of_names
set_axis
To set column names, use set_axis along axis=1 or axis='columns':
df = df.set_axis(list_of_names, axis=1)
Note that the default axis=0 sets the index (row) labels.
Why not just modify df.columns directly?
The accepted answer is fine and is used often, but set_axis has some advantages:
set_axis allows method chaining:
df.some_method().set_axis(list_of_names, axis=1).another_method()
vs:
df = df.some_method()
df.columns = list_of_names
df.another_method()
set_axis should theoretically provide better error checking than directly modifying an attribute, though I can't find a specific example at the moment.
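A short sketch of the chaining point, with sort_values and reset_index standing in for the unspecified some_method and another_method; the data is made up.

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({'X': [3, 1, 2], 'Y': [6, 4, 5]})
list_of_names = ['a', 'b']

# set_axis sits in the middle of the chain, so no intermediate
# variable or separate df.columns assignment is needed
result = (
    df.sort_values('X')
      .set_axis(list_of_names, axis=1)
      .reset_index(drop=True)
)

print(result)
```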
If your list is column_list = ['a', 'b', 'c']
and the original df.columns is ['X', 'Y', 'Z'],
you just need: df.columns = column_list

how to reset index pandas dataframe after dropna() pandas dataframe

I'm not sure how to reset index after dropna(). I have
df_all = df_all.dropna()
df_all.reset_index(drop=True)
but after running my code, row index skips steps. For example, it becomes 0,1,2,4,...
The code you've posted already does what you want, but does not do it "in place." Try adding inplace=True to reset_index() or else reassigning the result to df_all. Note that you can also use inplace=True with dropna(), so:
df_all.dropna(inplace=True)
df_all.reset_index(drop=True, inplace=True)
Does it all in place. Or,
df_all = df_all.dropna()
df_all = df_all.reset_index(drop=True)
to reassign df_all.
You can chain methods and write it as a one-liner:
df = df.dropna().reset_index(drop=True)
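A minimal sketch with made-up data that reproduces the gap described in the question and then closes it:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value at position 3
df_all = pd.DataFrame({'v': [1.0, 2.0, 3.0, np.nan, 5.0]})

# dropna keeps the original labels, so 3 is now missing from the index
dropped = df_all.dropna()

# reset_index returns a new frame; assigning it is what closes the gap
clean = dropped.reset_index(drop=True)

print(list(dropped.index))
print(list(clean.index))
```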
You can also reset the index to the default using set_axis(). Recent pandas versions no longer accept inplace=True for set_axis(), so assign the result:
df = df.dropna()
df = df.set_axis(range(len(df)))
set_axis() is especially useful if you want to reset the index to something other than the default: as long as the lengths match, you can change the index to literally anything with it. For example, you can change it to 'first row', 'second row', and so on.
df = df.dropna()
df = df.set_axis(['first row', 'second row'])

Pandas dataframe column selection

I am using Pandas to select columns from a dataframe, olddf. Let's say the variable names are 'a', 'b', 'c', 'startswith1', 'startswith2', 'startswith3', ..., 'startswith10'.
My approach was to create a list of all variables with a common starting value.
filter_col = [col for col in list(olddf) if col.startswith('startswith')]
I'd like to then select columns within that list as well as others, by name, so I don't have to type them all out. However, this doesn't work:
newdf = olddf['a','b',filter_col]
And this doesn't either:
newdf = olddf[['a','b'],filter_col]
I'm a newbie so this is probably pretty simple. Is the reason this doesn't work because I'm mixing a list improperly?
Thanks.
Use
newdf = olddf[['a','b']+filter_col]
since adding lists concatenates them:
In [264]: ['a', 'b'] + ['startswith1']
Out[264]: ['a', 'b', 'startswith1']
Alternatively, you could use the filter method:
newdf = olddf.filter(regex=r'^(startswith|[ab])')
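Both approaches can be checked side by side on a small made-up frame with the column names from the question. Note that the regex as written keeps any column whose name begins with 'a' or 'b', not only the columns named exactly 'a' and 'b'; that makes no difference for these column names.

```python
import pandas as pd

# Hypothetical frame using the column names from the question
cols = ['a', 'b', 'c', 'startswith1', 'startswith2', 'startswith3']
olddf = pd.DataFrame([list(range(len(cols)))], columns=cols)

filter_col = [col for col in olddf.columns if col.startswith('startswith')]

# List concatenation builds one flat list of column names
by_list = olddf[['a', 'b'] + filter_col]

# filter keeps the columns whose name matches the regular expression
by_regex = olddf.filter(regex=r'^(startswith|[ab])')

print(list(by_list.columns))
print(list(by_regex.columns))
```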
