Rename several unnamed columns in a pandas dataframe - python

I imported a CSV from the San Francisco Salaries dataset on Kaggle as a dataframe:
df=pd.read_csv('Salaries.csv')
I then created a second dataframe by aggregating 'df':
df2=df.groupby(['JobTitle','Year'])[['TotalPay']].median()
Problem 1: The first and second columns appear nameless, and that shouldn't happen.
Even when I run
df2.columns
it only lists TotalPay as a column.
Problem 2: When I try to rename, for instance, the first column to JobTitle, the code doesn't do anything:
df3=df2.rename(columns = {0:'JobTitle'},inplace=True)
So the solution given in "Rename unnamed column pandas dataframe" apparently does not work here.
I would like either of two possible solutions:
1) That the aggregate function respects the column naming AND/OR
2) Rename the dataframe's unnamed columns

The problem isn't really that you need to rename the columns.
What do the first few rows of the .csv file that you're importing look like? I ask because you're not importing it properly: pandas isn't recognising that JobTitle and Year are meant to be column headers. Pandas read_csv() is very flexible with what it will let you do.
If you import the data properly, you won't need to reindex, or relabel.

Quoting answer by MaxU:
df3 = df2.reset_index()
Thank you!
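To expand on why reset_index() works: groupby moves the grouping keys into the index of the result, so JobTitle and Year are index levels rather than columns, and that is why df2.columns lists only TotalPay. Also note that rename(..., inplace=True) returns None, so df3 in the question ends up as None. Below is a minimal sketch, assuming a small made-up frame in place of Salaries.csv (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for Salaries.csv; the numbers are made up.
df = pd.DataFrame({
    "JobTitle": ["Clerk", "Clerk", "Clerk", "Nurse", "Nurse"],
    "Year": [2011, 2011, 2012, 2011, 2011],
    "TotalPay": [100.0, 200.0, 300.0, 400.0, 600.0],
})

df2 = df.groupby(["JobTitle", "Year"])[["TotalPay"]].median()
print(df2.index.names)    # the grouping keys live here, as index levels
print(list(df2.columns))  # only TotalPay is an actual column

# Option 1: turn the index levels back into ordinary columns.
df3 = df2.reset_index()

# Option 2: ask groupby not to put the keys in the index at all.
df4 = df.groupby(["JobTitle", "Year"], as_index=False)[["TotalPay"]].median()

print(df3)
print(df4)
```

Both options should give a frame with JobTitle, Year and TotalPay as regular columns, so no renaming is needed.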

Related

How to add more than one dataframe column in pandas groupby function?

I have written the following codes in three separate cells in my jupyter notebook and have been able to generate the output I want. However, having this information in one dataframe will make it much easier to read.
How can I combine these separate dataframes into one so that the member_casual column is the index with max_ride_length, avg_ride_length and most_active_day_of_week columns next to it in the same dataframe?
Malo is correct. I will expand a little bit because you can also name the columns when they are aggregated:
df.groupby('member_casual').agg(max_ride_length=('ride_length','max'), avg_ride_length=('ride_length','mean'), most_active_day_of_the_week=('day_of_week',pd.Series.mode))
From the docs: https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html
agg also accepts a list of functions, as in the example:
df.groupby('A').agg(['min', 'max'])
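Here is a small runnable sketch of the named-aggregation form, assuming made-up ride data (the frame and its values are hypothetical). One deliberate change from the line above: instead of pd.Series.mode, the sketch uses a lambda that takes the first (smallest) mode, so that each group always yields a single value even when two days tie. That tie-breaking rule is an assumption of this example, not something the question specifies.

```python
import pandas as pd

# Hypothetical ride data, invented for illustration.
rides = pd.DataFrame({
    "member_casual": ["member", "member", "member", "casual", "casual", "casual"],
    "ride_length": [10, 20, 30, 5, 50, 35],
    "day_of_week": [2, 2, 3, 7, 7, 1],
})

# Each keyword becomes an output column: name=(source column, aggregation).
summary = rides.groupby("member_casual").agg(
    max_ride_length=("ride_length", "max"),
    avg_ride_length=("ride_length", "mean"),
    # First mode only, so ties resolve to the smallest day number.
    most_active_day_of_week=("day_of_week", lambda s: s.mode().iloc[0]),
)

print(summary)
```

The result keeps member_casual as the index with the three summary columns next to it, which is the layout asked for in the question.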

Pandas DataFrame 'Date' Column has no header / dtypes / not listed in df.columns

I'm importing data from nasdaqdatalink api
Two questions from this:
(1) How is this already a Pandas DataFrame without me needing to type df = pd.DataFrame ?
(2) The 'Date' column doesn't appear to be a DataFrame column. If I try df.columns, it doesn't show up in the index and obviously has no header, so I am confused about what's happening here.
Essentially, I wanted to select data from this DataFrame between two dates, but the only way I really know how to do that is by selecting the column name first. However, I'm missing something here. I tried to rename the column in position [0], but that just created a new column named 'Date' with NaN values.
What am I not understanding? (I only began learning Python, pandas etc. about a month ago, so this is about as far as I could go on my own without more help.)
There's actually a better way, by keeping Date as the index, see the output of:
df.loc['2008-01-01':'2009-01-01']
df.reset_index() makes whatever the current index is into a column.
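To make both options concrete, here is a minimal sketch, assuming a tiny made-up frame with a DatetimeIndex named Date in place of the API result (the Value column and its numbers are hypothetical):

```python
import pandas as pd

# Hypothetical data standing in for what the API returns.
dates = pd.DatetimeIndex(
    ["2007-12-31", "2008-01-02", "2008-06-30", "2009-01-01", "2009-01-02"],
    name="Date",
)
df = pd.DataFrame({"Value": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates)

print(df.index.name)     # 'Date' is the index name, not a column
print(list(df.columns))  # so it does not appear here

# Option 1: keep Date as the index and slice by label (both ends inclusive).
window = df.loc["2008-01-01":"2009-01-01"]

# Option 2: turn the index into a regular column and filter on it.
flat = df.reset_index()
mask = flat["Date"].between(pd.Timestamp("2008-01-01"), pd.Timestamp("2009-01-01"))
between = flat[mask]

print(window)
print(between)
```

Label slicing on a sorted DatetimeIndex is usually the shorter route; the reset_index() version is there if you prefer working with Date as an ordinary column.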

Convert rows to columns in Python

I have an Excel file in the format below.
Note: the values in Column Name are dynamic. The current example shows 10 records; another data set may have a different number of column names.
I want to convert the rows into columns as shown below.
Is there an easy option in Python pandas to handle this scenario?
Thanks #juhat for the suggestion on pivot table. I was able to achieve the intended result with this code:
fsdData = pd.read_csv("py_fsd.csv")
fsdData.pivot(index="msg Srl", columns="Column Name", values="Value")
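For anyone who wants to try this without the original file, here is a small sketch, assuming a made-up long-format frame with the same three column names as above (the records themselves are hypothetical):

```python
import pandas as pd

# Hypothetical long-format data; one row per (record, field) pair.
long_df = pd.DataFrame({
    "msg Srl": [1, 1, 1, 2, 2, 2],
    "Column Name": ["Name", "City", "Age", "Name", "City", "Age"],
    "Value": ["Ann", "Oslo", "31", "Bob", "Rome", "45"],
})

# Each distinct value of "Column Name" becomes its own column.
wide = long_df.pivot(index="msg Srl", columns="Column Name", values="Value")

# Optional: make "msg Srl" a regular column and drop the leftover axis label.
wide = wide.reset_index().rename_axis(columns=None)

print(wide)
```

Because the new columns come from whatever values appear in Column Name, this adapts to a different number of names automatically. Note that pivot raises a ValueError if the same (msg Srl, Column Name) pair occurs more than once; in that case pivot_table with an explicit aggfunc is the usual alternative.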

How to rename columns of dataframe that has same column names?

I have dataframes that have the same column names as follows
df1=pd.DataFrame({'Group1':['a','b','c','d','e'],'Group2':["f","g","h","i","j"],'Group3':['k','L','m','n',"0"]})
df2=pd.DataFrame({'Group1':[0,0,2,1,0],'Group2':[1,2,0,0,0],'Group3':[0,0,0,1,1]})
For some reason, I want to concatenate these dataframes as follows.
dfnew=pd.concat([df1[["Group1","Group2"]], df2[["Group1","Group2"]]], axis=1)
I want to rename the columns of this new dataframe, so I tried the following.
dfnew.columns={"1","2","3","4"}
I expected the order of the columns would be 1,2,3,4, but the actual result was 4,3,1,2 instead.
I do not know why this happens.
If someone could advise me, I would appreciate it very much.
In addition, I need to concatenate many dataframes for future work.
(i.e. concatenate df1,df2, df3...df1000).
Is there a good way to rename the columns as "1,2,3,4.....1000"? Typing these numbers by hand is a lot of work.
Thank you.
To rename columns you can use this syntax:
dfnew.columns=["1","2","3","4"]
In future, if you want to rename 1000 columns as you have asked, you can do something like this:
dfnew.columns=[str(i) for i in range(1,1001)]
Use square brackets (a list) rather than curly braces (a set). A set is unordered, so the column order is not preserved; a list keeps it:
dfnew.columns=["1","2","3","4"]
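If the number of columns is not known in advance, the labels can be generated from the frame itself. A minimal sketch, assuming two small made-up frames in place of the ones in the question:

```python
import pandas as pd

# Hypothetical frames with clashing column names.
df1 = pd.DataFrame({"Group1": ["a", "b"], "Group2": ["f", "g"]})
df2 = pd.DataFrame({"Group1": [0, 0], "Group2": [1, 2]})

dfnew = pd.concat([df1, df2], axis=1)

# Build "1", "2", ... to match however many columns there are.
# A list keeps this order; a set like {"1", "2", ...} would not.
dfnew.columns = [str(i) for i in range(1, len(dfnew.columns) + 1)]

print(dfnew)
```

Using len(dfnew.columns) avoids a length mismatch error when the frame does not have exactly the number of columns that was hard-coded.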

export table to csv keeping format python

I have a dataframe grouped by 3 variables. It looks like:
https://i.stack.imgur.com/q8W0y.png
When I export the table to csv, the format changes. I want to keep the original format.
Any ideas?
Thanks!
CSV is a flat format and has no way to represent the MultiIndex used in your data. Pandas to_csv therefore stores the index "long": each level of the MultiIndex becomes a column, and every row carries its full index values. I suspect that's what you are calling "format changes".
The upshot is that if you expect to save a pandas dataframe to csv and then reestablish the dataframe from the csv, you need to rebuild the MultiIndex yourself when importing it, for example with set_index after reading or by passing index_col to read_csv.
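A minimal round-trip sketch, assuming a made-up frame grouped by three hypothetical keys (region, year, product) and using an in-memory buffer in place of a real file:

```python
import io

import pandas as pd

# Hypothetical data; the column names are invented for illustration.
df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "year": [2020, 2021, 2020, 2021],
    "product": ["a", "a", "b", "b"],
    "sales": [1, 2, 3, 4],
})
grouped = df.groupby(["region", "year", "product"]).sum()

# Write to CSV: each index level is written out as an ordinary column.
buffer = io.StringIO()
grouped.to_csv(buffer)
print(buffer.getvalue())

# Read it back, telling pandas which columns form the MultiIndex.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=["region", "year", "product"])

print(restored)
```

The CSV itself still looks flat when opened in a spreadsheet; only the frame read back into pandas gets the grouped layout again.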
