Convert index in column header in python dataframe - python

I am trying to turn the repeating labels in the first column of a pandas DataFrame into column headers. I tried the transpose function, but the results are not as expected. Which function can be used to produce the result shown below?
The data is:
Year 2020
Month SEPTEMBER
Filed Date 29-11-2020
Year 2022
Month JULY
Filed Date 20-08-2022
Year 2022
Month APRIL
Filed Date 20-05-2022
Year 2017
Month AUGUST
Filed Date 21-09-2017
Year 2018
Month JULY
Filed Date 03-02-2019
Year 2021
Month MAY
Filed Date 22-06-2021
Year 2017
Month DECEMBER
Filed Date 19-01-2018
Year 2018
Month MAY
Filed Date 03-02-2019
Year 2019
Month MARCH
Filed Date 28-09-2019
and I want to convert it into:
Year Month Filed Date
2020 September 29-11-2020
2022 July 20-08-2022

You can do it like this:
# df1 has integer column labels: 0 holds the labels, 1 holds the values
df = pd.DataFrame(
    # one output row per block of three input rows
    [df1.iloc[i:i+3][1].tolist() for i in range(0, len(df1), 3)],
    # the first three labels become the column headers
    columns=df1.iloc[0:3][0].tolist(),
)
print(df)
   Year      Month            Filed
0  2020  SEPTEMBER  Date 29-11-2020
1  2022       JULY  Date 20-08-2022
2  2022      APRIL  Date 20-05-2022
3  2017     AUGUST  Date 21-09-2017
4  2018       JULY  Date 03-02-2019
5  2021        MAY  Date 22-06-2021
6  2017   DECEMBER  Date 19-01-2018
7  2018        MAY  Date 03-02-2019
8  2019      MARCH  Date 28-09-2019

I have found a solution to my problem. Here df1 is:
Year 2020
Month SEPTEMBER
Filed Date 29-11-2020
Year 2022
Month JULY
Filed Date 20-08-2022
Year 2022
Month APRIL
Filed Date 20-05-2022
Year 2017
Month AUGUST
Filed Date 21-09-2017
Year 2018
Month JULY
Filed Date 03-02-2019
Year 2021
Month MAY
Filed Date 22-06-2021
Year 2017
Month DECEMBER
Filed Date 19-01-2018
Year 2018
Month MAY
Filed Date 03-02-2019
Year 2019
Month MARCH
Filed Date 28-09-2019
I used the pivot function (the columns of df1 are named A and B) and approached the problem like this:
# DataFrame.append was removed in pandas 2.0, so take every third row
# of the back-filled pivot instead of appending row by row in a loop
wide = df1.pivot(columns='A', values='B').bfill(axis=0)
df = wide.iloc[::3].reset_index(drop=True)
print(df)
result:
A Filed Date Month Year
0 29-11-2020 SEPTEMBER 2020
1 20-08-2022 JULY 2022
2 20-05-2022 APRIL 2022
3 21-09-2017 AUGUST 2017
4 03-02-2019 JULY 2018
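As a sketch of a variant that does not depend on back-filling or on a fixed block size of three, the rows can be numbered per label with groupby().cumcount() and then pivoted on that number. The column names A and B follow the pivot answer above; the helper column `record` is a hypothetical name, and the sketch assumes every label appears exactly once per record.

```python
import pandas as pd

# Small sample of the question's data; column names "A" and "B" are assumed.
df1 = pd.DataFrame(
    [
        ["Year", "2020"], ["Month", "SEPTEMBER"], ["Filed Date", "29-11-2020"],
        ["Year", "2022"], ["Month", "JULY"], ["Filed Date", "20-08-2022"],
    ],
    columns=["A", "B"],
)

# The n-th occurrence of each label belongs to record n.
record = df1.groupby("A").cumcount()

wide = (
    df1.assign(record=record)
    .pivot(index="record", columns="A", values="B")
    .loc[:, ["Year", "Month", "Filed Date"]]  # restore the original label order
    .rename_axis(index=None, columns=None)    # drop the "record" and "A" axis names
)
print(wide)
```

Selecting the columns explicitly is needed because pivot sorts the new column labels alphabetically.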

Related

Python-Pandas-Datetime- How to convert Financial Year and Financial Month to Calendar date

I am trying to convert a financial year and financial month to a calendar date. I have a dataframe as below. Each ID will have multiple records.
ID Financial_Year Financial_Month
1 2021 1
1 2022 2
2 2021 3
2 2023 1
Example: if the financial year runs from July to June, then FY 2022 means:
July 2021 - 1st month of the financial year
August 2021 - 2nd month of the financial year
September 2021 - 3rd month of the financial year
October 2021 - 4th month of the financial year
November 2021 - 5th month of the financial year
December 2021 - 6th month of the financial year
January 2022 - 7th month of the financial year
February 2022 - 8th month of the financial year
March 2022 - 9th month of the financial year
April 2022 - 10th month of the financial year
May 2022 - 11th month of the financial year
June 2022 - 12th month of the financial year
In the calendar year, January is the 1st month and December is the 12th.
Expected output (financial year and month converted to a calendar date):
ID Financial_Year Financial_Month Calendar_date
1 2021 1 01-07-2021
1 2022 2 01-08-2022
2 2021 3 01-09-2021
2 2023 12 01-06-2023
The datetime and apply functions on a dataframe can get you the desired result:
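import datetime
import pandas as pd

def calendar_year(yr, mnth):
    # Shift by 6 so that financial month 1 maps to July. Subtracting 1 before
    # the modulo and adding it back keeps the result in 1..12; a plain
    # (mnth + 6) % 12 would give 0 for financial month 6 (December).
    mnth = (mnth + 6 - 1) % 12 + 1
    return datetime.datetime(int(yr), int(mnth), 1).strftime("%d-%m-%Y")

df["calendar_month"] = df.apply(
    lambda x: calendar_year(x["Financial_Year"], x["Financial_Month"]), axis=1
)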
As the year isn't changing in your example, I have just adjusted the month to reflect the calendar month.
You can adjust the number of months added (currently 6) to adjust to your financial year.
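If the year should also follow the July-to-June definition given in the question (FY 2022 month 1 is July 2021), the year has to be shifted as well. The following is a minimal sketch under that assumption; the function name `fy_to_calendar` and the `start_month` parameter are hypothetical, and its results differ from the expected output in the question, which keeps the year unchanged.

```python
import pandas as pd

def fy_to_calendar(fin_year, fin_month, start_month=7):
    """Return the first day of the calendar month as DD-MM-YYYY."""
    # Financial month 1 corresponds to start_month; wrap around after December.
    cal_month = (int(fin_month) + start_month - 2) % 12 + 1
    # Months from start_month through December fall in the calendar year
    # before the financial-year label (only when the FY does not start in January).
    in_previous_year = start_month > 1 and cal_month >= start_month
    cal_year = int(fin_year) - 1 if in_previous_year else int(fin_year)
    return f"01-{cal_month:02d}-{cal_year}"

df = pd.DataFrame(
    {
        "ID": [1, 1, 2, 2],
        "Financial_Year": [2021, 2022, 2021, 2023],
        "Financial_Month": [1, 2, 3, 1],
    }
)
df["Calendar_date"] = [
    fy_to_calendar(y, m)
    for y, m in zip(df["Financial_Year"], df["Financial_Month"])
]
print(df)
```

Passing a different `start_month` adapts the sketch to a financial year that begins in another month.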

How to get the Australian financial year from a date in a pandas dataframe

I have a pandas dataframe that has a datetime column called date.
How can I create a new column to represent the Australian financial year using the date column?
The Australian financial year starts on 1 July and ends the next year on 30 June.
Example 1: 10 June 2019 is FY 2019
Example 2: 5 July 2019 is FY 2020
The code below creates a new column representing Australian financial year using the existing 'date' column:
df['FY'] = df['date'].map(lambda d: d.year + 1 if d.month > 6 else d.year)
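The same rule can also be written without a Python-level lambda by using the .dt accessor. This is a small sketch on the two example dates from the question; it assumes the date column already has a datetime dtype.

```python
import pandas as pd

# The two example dates from the question.
df = pd.DataFrame({"date": pd.to_datetime(["2019-06-10", "2019-07-05"])})

# Add 1 to the calendar year for dates from July onwards.
df["FY"] = df["date"].dt.year + (df["date"].dt.month > 6).astype(int)
print(df)
```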

How to add a new column based on different conditions on other columns pandas

This is my dataframe:
Date Month
04/21/2019 April
07/03/2019 July
01/05/2019 January
09/23/2019 September
I want to add a column called fiscal year. A new fiscal year starts on 1st of July every year and ends on the last day of June. So for example if the year is 2019 and month is April, it is still fiscal year 2019. However, if the year is 2019 but month is anything after June, it will be fiscal year 2020. The resulting data frame should look like this:
Date Month FY
04/21/2019 April FY19
07/03/2019 July FY20
01/05/2019 January FY19
09/23/2019 September FY20
How do I achieve this?
One way, using pandas.DateOffset:
# Shifting every date forward by six months moves July-December into the next year
df["FY"] = (pd.to_datetime(df["Date"])
            + pd.DateOffset(months=6)).dt.strftime("FY%Y")
print(df)
Output:
Date Month FY
0 04/21/2019 April FY2019
1 07/03/2019 July FY2020
2 01/05/2019 January FY2019
3 09/23/2019 September FY2020
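The output above uses a four-digit year, while the question asks for labels such as FY19. A minimal sketch of the same DateOffset idea with the two-digit format code %y follows; the explicit format="%m/%d/%Y" is an assumption based on the sample dates.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["04/21/2019", "07/03/2019", "01/05/2019", "09/23/2019"],
        "Month": ["April", "July", "January", "September"],
    }
)

# Parse month/day/year explicitly so that 07/03/2019 is read as 3 July.
dates = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Shift by six months, then format the shifted year with two digits.
df["FY"] = (dates + pd.DateOffset(months=6)).dt.strftime("FY%y")
print(df)
```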
Try pd.PeriodIndex() together with pd.to_datetime():
df['Date']=pd.to_datetime(df['Date'])
df['FY']=pd.PeriodIndex(df['Date'],freq='A-JUN').strftime("FY%y")
output:
Date Month FY
0 2019-04-21 April FY19
1 2019-07-03 July FY20
2 2019-01-05 January FY19
3 2019-09-23 September FY20
Note: I suggest you convert the 'Date' column to datetime first and then do any operation on it. If you don't want to convert the 'Date' column, use the above code in a single step:
df['FY']=pd.PeriodIndex(pd.to_datetime(df['Date']),freq='A-JUN').strftime("FY%y")

Select corresponding column value for max value of separate column(from a specific range of column) of pandas data frame

year month quantity
DateNew
2005-01 2005 January 49550
2005-02 2005 February 96088
2005-03 2005 March 28874
2005-04 2005 April 66917
2005-05 2005 May 24070
... ... ... ...
2018-08 2018 August 132629
2018-09 2018 September 104394
2018-10 2018 October 121305
2018-11 2018 November 121049
2018-12 2018 December 174984
This is the data frame that I have. I want to select the maximum quantity for each year and return the corresponding month for it.
I have tried this so far
df.groupby('year').max()
But this returns the maximum of every column independently, so the month is September for every year (the alphabetically last month name) rather than the month of the maximum quantity.
I have no clue how to approach the actual solution.
I think you want idxmax:
df.loc[df.groupby('year')['quantity'].idxmax()]
Output:
year month quantity
DateNew
2005-02 2005 February 96088
2018-12 2018 December 174984
Or just for the months:
df.loc[df.groupby('year')['quantity'].idxmax(), 'month']
Output:
DateNew
2005-02 February
2018-12 December
Name: month, dtype: object
Also, you can use sort_values followed by duplicated:
df.loc[~df.sort_values('quantity').duplicated('year', keep='last'), 'month']
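To make the idxmax approach easy to try, here is a small self-contained sketch. The frame is built from a few of the rows shown in the question, with DateNew as the index.

```python
import pandas as pd

# A few rows from the question's data, indexed by DateNew.
df = pd.DataFrame(
    {
        "year": [2005, 2005, 2005, 2018, 2018, 2018],
        "month": ["January", "February", "March", "October", "November", "December"],
        "quantity": [49550, 96088, 28874, 121305, 121049, 174984],
    },
    index=pd.Index(
        ["2005-01", "2005-02", "2005-03", "2018-10", "2018-11", "2018-12"],
        name="DateNew",
    ),
)

# idxmax returns, per year, the index label of the row with the largest quantity;
# .loc then pulls those complete rows out of the original frame.
best = df.loc[df.groupby("year")["quantity"].idxmax()]
print(best)
```

Because idxmax returns index labels, this relies on the index being unique, as it is here.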

How to save split data in panda in reverse order?

You can use this to create the dataframe:
dataframe = pd.DataFrame({'release' : ['7 June 2013', '2012', '31 January 2013',
                                       'February 2008', '17 June 2014', '2013']})
I am trying to split the data and save it into 3 columns named "day", "month" and "year", using this command:
dataframe[['day','month','year']] = dataframe['release'].str.rsplit(expand=True)
[Image in the original post: the resulting dataframe]
As you can see, it works perfectly when a value contains 3 strings, but whenever it contains fewer than 3 strings, the data is saved in the wrong place.
I have tried split and rsplit; both give the same result.
Is there a way to get the data into the right place?
The year comes last and is present in every row, so it should be stored first; then the month, if present, otherwise nothing; and the day should be handled the same way.
You could reverse each split list before expanding it into columns:
In [17]: dataframe[['year', 'month', 'day']] = dataframe['release'].apply(
             lambda x: pd.Series(x.split()[::-1]))
In [18]: dataframe
Out[18]:
release year month day
0 7 June 2013 2013 June 7
1 2012 2012 NaN NaN
2 31 January 2013 2013 January 31
3 February 2008 2008 February NaN
4 17 June 2014 2014 June 17
5 2013 2013 NaN NaN
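Try reversing each split list before expanding it. A DataFrame has no reverse() method, so the reversal has to happen per row:
parts = dataframe['release'].str.split().str[::-1]
dataframe[['year','month','day']] = pd.DataFrame(parts.tolist(), index=dataframe.index)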
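A different technique from the answers above is to describe the three parts with a regular expression and let str.extract fill the columns by name. This is only a sketch: the pattern is an assumption that fits the sample values (optional one- or two-digit day, optional alphabetic month, four-digit year) and would need adjusting for other formats.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "release": [
            "7 June 2013", "2012", "31 January 2013",
            "February 2008", "17 June 2014", "2013",
        ]
    }
)

# Optional day, optional month, mandatory year; named groups become column names.
pattern = r"^(?:(?:(?P<day>\d{1,2})\s+)?(?P<month>[A-Za-z]+)\s+)?(?P<year>\d{4})$"

# Parts that are absent from a value come back as NaN.
parts = dataframe["release"].str.extract(pattern)
dataframe = dataframe.join(parts)
print(dataframe)
```

The extracted values are strings; convert day and year with pd.to_numeric if numbers are needed.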
