How to rename the first column of a pandas dataframe? - python

I have come across this question many times online, but there are few answers apart from ones like the following:
Cannot rename the first column in pandas DataFrame
I approached it as follows:
df = df.rename(columns={df.columns[0]: 'Column1'})
Is there a better or cleaner way to rename the first column of a pandas DataFrame, or a column at any specific position?

You're already using a clean way in pandas.
Unfortunately, this:
df.columns[0] = 'Column1'
is impossible because Index objects are immutable and do not support item assignment. It raises a TypeError.
You could still use iterable unpacking:
df.columns = ['Column1', *df.columns[1:]]
Or:
df = df.set_axis(['Column1', *df.columns[1:]], axis=1)

Not sure whether it is cleaner, but one possible idea is to convert the columns to a list and set the new value by index:
import pandas as pd
df = pd.DataFrame(columns=[4,7,0,2])
arr = df.columns.tolist()
arr[0] = 'Column1'
df.columns = arr
print(df)
Empty DataFrame
Columns: [Column1, 7, 0, 2]
Index: []
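The answers above generalize to any column position. As a minimal sketch (the helper name rename_column_at is hypothetical, not a pandas function), the list-based approach can be wrapped in a function. It also shows one difference worth knowing: rename with a dict matches by label, so it renames every column that shares the first column's label, while the positional version touches only one.

```python
import pandas as pd


def rename_column_at(df, position, new_name):
    """Return a new DataFrame with the column at `position` renamed.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    labels = df.columns.tolist()
    labels[position] = new_name
    return df.set_axis(labels, axis=1)


# A frame with a duplicated label, to show the difference between approaches.
df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])

# Positional rename: only the first column changes.
renamed = rename_column_at(df, 0, "Column1")
print(renamed.columns.tolist())

# Label-based rename: every column labelled "a" changes.
by_label = df.rename(columns={df.columns[0]: "Column1"})
print(by_label.columns.tolist())
```

If the column labels are unique, both approaches give the same result, and the original df is left unchanged in either case.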

Related

Create a dictionary from pandas empty dataframe with only column names

I have a pandas DataFrame with only two column names (a single row, which can also be considered the header). I want to make a dictionary out of it, with the first column name as the key and the second column name as the value. I already tried the to_dict() method, but it doesn't work because the DataFrame is empty.
Example
df=|Land |Norway| to {'Land': Norway}
I can change the pandas data frame to some other type and find my way around it, but this question is mostly to learn the best/different/efficient approach for this problem.
For now I have this as the solution :
dict(zip(a.iloc[0:0,0:1],a.iloc[0:0,1:2]))
Is there any other way to do this?
Here's a simple way: convert the columns to a list, then the list to a dictionary.
import pandas as pd

def list_to_dict(a):
    # Pull items off the same iterator twice per step to pair neighbours
    it = iter(a)
    ret_dict = dict(zip(it, it))
    return ret_dict

df = pd.DataFrame([], columns=['Land', 'Norway'])
dict_val = list_to_dict(df.columns.to_list())
dict_val # {'Land': 'Norway'}
A very manual solution:
df = pd.DataFrame(columns=['Land', 'Norway'])
df = pd.DataFrame({df.columns[0]: df.columns[1]}, index=[0])
If you have any number of columns and you want each sequential pair to have this transformation, try:
df = pd.DataFrame(dict(zip(df.columns[::2], df.columns[1::2])), index=[0])
Note: You will get an error if your DataFrame does not have at least two columns.
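As a small sketch of the pairing idea with more than two columns (the extra column names City and Oslo are made up for illustration), slicing the column list with a step of two gives keys and values without building an intermediate DataFrame:

```python
import pandas as pd

# Empty frame whose header holds alternating key/value names (illustrative data).
df = pd.DataFrame(columns=["Land", "Norway", "City", "Oslo"])

cols = df.columns.tolist()
# Even positions become keys, odd positions become values.
pairs = dict(zip(cols[::2], cols[1::2]))
print(pairs)
```

Because zip stops at the shorter input, a trailing unpaired column name is silently dropped in this version.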

What is the correct way to add a list as a column to a dataframe?

I want to add a list as a new column to a dataframe. I am doing:
df['Intervention'] = interventionList
It gives me
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
I read Pandas add a series to dataframe column where the accepted answer is very similar.
I believe one option would be to use assign, which returns a new DataFrame:
df = df.assign(Intervention=interventionList)
or to make a copy of the dataframe before adding the column:
df2 = df.copy()
You can try something like this:
import pandas as pd
li = [1,2,3,4,5]
li2 =[6,7,8,9,10]
df = pd.DataFrame()
#Using pd.Series to add lists to dataframe
df['col1'] = pd.Series(li)
df['col2'] = pd.Series(li2)
df
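A hedged sketch of what is probably going on: this warning usually appears when df was itself produced by filtering or slicing another DataFrame, so taking an explicit copy first is one way to avoid it. All data and names below are invented for illustration. The sketch also shows that wrapping the list in pd.Series aligns values on the index, which matters when the target frame's index does not start at 0.

```python
import pandas as pd

# Illustrative data; `source` stands in for whatever df was derived from.
source = pd.DataFrame({"patient": ["p1", "p2", "p3", "p4"],
                       "age": [30, 41, 52, 63]})
intervention_list = ["A", "B", "C"]

# An explicit copy makes the filtered frame independent of `source`.
subset = source[source["age"] > 35].copy()
subset["Intervention"] = intervention_list  # assigned by position

# A Series is aligned on index labels instead: subset has index 1, 2, 3,
# while the Series has index 0, 1, 2, so the values shift and one is NaN.
aligned = source[source["age"] > 35].copy()
aligned["Intervention"] = pd.Series(intervention_list)

print(subset)
print(aligned)
```

Assigning the plain list requires its length to match the number of rows; the Series version never raises on a length mismatch, which can hide mistakes.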

Read dataframe in pandas skipping first column to read time series data

The question is fairly self-explanatory. Is there a way to read time series data from a CSV file while skipping the first column?
I tried this code:
df = pd.read_csv("occupancyrates.csv", delimiter = ',')
df = df[:,1:]
print(df)
But this is throwing an error:
"TypeError: unhashable type: 'slice'"
If you know the name of the column just do:
df = pd.read_csv("occupancyrates.csv") # delimiter=',' is the default, so it can be omitted
df = df.drop(['your_column_to_drop'], axis=1)
print(df)
df = pd.read_csv("occupancyrates.csv")
df.pop('column_name')
A DataFrame is like a dictionary in which the column names are the keys and the columns are the values. For example:
d = dict(a=1, b=2)
d.pop('a')
Now if you print d, the output will be
{'b': 2}
This is what I did above to remove a column from the DataFrame.
Done this way, you don't need to assign the result back to the DataFrame as in the other answer(s), such as:
df = df.iloc[:, 1:]
and you don't need to specify inplace=True anywhere either.
The simplest way to delete the first column should be:
del df[df.columns[0]]
or
df.pop(df.columns[0])
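The first column can also be skipped without knowing its name. The sketch below uses a small in-memory CSV (invented data standing in for occupancyrates.csv) and shows two options: positional slicing with iloc after reading, and the usecols parameter of read_csv, which accepts a callable that is evaluated against each column name.

```python
import io

import pandas as pd

# Invented sample standing in for the real file.
csv_text = "id,date,rate\n0,2020-01-01,0.5\n1,2020-01-02,0.7\n"

# Option 1: read everything, then keep all columns from position 1 onward.
sliced = pd.read_csv(io.StringIO(csv_text)).iloc[:, 1:]

# Option 2: read only the header to learn the first column's name,
# then tell read_csv to skip that column while parsing.
first = pd.read_csv(io.StringIO(csv_text), nrows=0).columns[0]
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name != first)

print(df)
```

Note that the original attempt, df[:, 1:], fails because plain square brackets on a DataFrame select by column label; positional row and column slicing goes through iloc.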

how to append a dataframe without overwriting existing dataframe using for loop in python

I have an empty DataFrame and want to append additional DataFrames to it in a for loop without overwriting what is already there. The regular append method overwrites the existing DataFrame, and the output shows only the last appended DataFrame.
Use concat() from the pandas module.
import pandas as pd
df_new = pd.concat([df_empty, df_additional])
Read more about it in the pandas docs.
Regarding the question in the comment:
df = pd.DataFrame()  # or pass columns=... matching the DataFrames to be appended
for i in range(10):
    df_new = function_to_get_df_new()
    df = pd.concat([df, df_new])
Suppose you have a list of DataFrames, list_of_df = [df1, df2, df3], and an empty DataFrame, df = pd.DataFrame().
To append all the DataFrames in the list to that empty DataFrame df:
for i in list_of_df:
    df = df.append(i)  # removed in pandas 2.0; there, use df = pd.concat([df, i])
The loop above does not change df1, df2, or df3, but df does change.
Note that df.append(df1) on its own does not change df; you have to assign the result back, as in df = df.append(df1).
You can't use a set, though:
df_new = pd.concat({df_empty, df_additional})
because pandas.DataFrame objects can't be hashed and set elements must be hashable, so this raises a TypeError.
A tuple works:
df_new = pd.concat((df_empty, df_additional))
Tuples are a little quicker to build than lists, though the difference is negligible here.
Update for the for loop:
df = pd.DataFrame(data)
for i in range(n):  # n is the number of iterations you need
    df_new = function_to_get_df_new()
    df = pd.concat((df, df_new))  # a tuple or a list works; a set does not
The question is already well answered; my 5 cents is the suggestion to use the ignore_index=True option to get a continuous new index rather than duplicating the old ones.
import pandas as pd
df_to_append = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB')) # sample
df = pd.DataFrame() # this is a placeholder for the destination
for i in range(3):
    # Before pandas 2.0 this could be df.append(df_to_append, ignore_index=True)
    df = pd.concat([df, df_to_append], ignore_index=True)
I don't think you need a for loop here; try concat():
import pandas
result = pandas.concat([emptydf,additionaldf])
pandas.concat documentation
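When the DataFrames do come out of a loop, a common pattern is to collect them in a plain Python list and call concat once at the end, which avoids re-copying the growing result on every iteration. This is a minimal sketch; make_frame is a hypothetical stand-in for whatever produces each DataFrame.

```python
import pandas as pd


def make_frame(i):
    """Hypothetical stand-in for the code that produces each DataFrame."""
    return pd.DataFrame({"A": [i, i + 1], "B": [i * 10, i * 10 + 1]})


# Collect the pieces first; nothing is overwritten because each
# frame is a separate element of the list.
frames = [make_frame(i) for i in range(3)]

# One concat at the end; ignore_index=True renumbers the rows 0..n-1.
result = pd.concat(frames, ignore_index=True)
print(result)
```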

Unexpected transformation in pandas DataFrame while editing its copy

I have a pandas DataFrame df with columns of different types; some values in df are NaN.
To test an assumption, I create a copy of df and transform the copy to 0/1 values with pandas.isnull():
df_copy = df
for column in df_copy:
    df_copy[column] = df_copy[column].isnull().astype(int)
but after that BOTH df and df_copy consist of 0 and 1.
Why does this code transform df to 0 and 1, and is there a way to prevent it?
You can prevent it by declaring:
df_copy = df.copy()
This creates a new object. Prior to that you essentially had two pointers to the same object. You also might want to check this answer and note that DataFrames are mutable.
Btw, you could obtain the desired result simply with:
df_copy = df.isnull().astype(int)
or, even better memory-wise, by adding flag columns to df itself:
for column in df:
    df[column + 'flag'] = df[column].isnull().astype(int)
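The difference between plain assignment and copy() can be checked directly. In this small sketch (invented data), alias is just another name for the same object, while copied is an independent DataFrame, so editing copied leaves df untouched.

```python
import numpy as np
import pandas as pd

# Invented data with one missing value per column.
df = pd.DataFrame({"x": [1.0, np.nan], "y": ["a", None]})

alias = df          # second name bound to the same object
copied = df.copy()  # new, independent DataFrame

# Editing the copy does not affect the original.
copied["x"] = copied["x"].isnull().astype(int)

# One-step version of the 0/1 transformation, leaving df as it was.
flags = df.isnull().astype(int)

print(alias is df, copied is df)
print(flags)
```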
