PYTHON: Changing Column names

I am reading an Excel sheet with pandas using pd.read_excel(), putting the result in a list, and appending it to the final DataFrame.
The sheet I am reading has a column named Sales, while the final DataFrame has a column named Item.
AllFields is the DataFrame with the full list of columns.
My question is: while appending the list to the final DataFrame, how do I make the records of the Sales column land under the column named Item?
Example of the data I am reading from the sheet:
Sales 2013 2014 2015 2016 2017 2018 2019
Units Sold 0 0 0 0 0 0 0
Unit Sale Price $900 $900 $900 $900 $900 $900 $900
Unit Profit $500 $500 $500 $500 $500 $500 $500
and then appending it to the DataFrame, which has the columns:
Full Project Item Market Project Project Step Round Sponsor Subproduct 2013 2014 2015 2016 2017 2018 2019
reading_book1 = pd.read_excel(file, sheet_name="1-Rollout", skiprows=restvalue).iloc[:10]
EmptyList1 = [reading_book1]
RestDataframe = RestDataframe.append(AllFields).append(EmptyList1)
RestDataframe['Project'] = read_ProjectNumber
RestDataframe['Full Project'] = read_fullProject
RestDataframe['Sponsor'] = read_Sponsor
RestDataframe['Round'] = read_round
RestDataframe['Project Step'] = read_projectstep
RestDataframe['Market'] = "Rest of the World Market"
FinalDataframe = FinalDataframe.append(CADataframe).append(RestDataframe)

You need to use pd.concat
RestDataframe = pd.concat([AllFields] + EmptyList1, axis=1)
And then rename the Sales column to Item with
data.rename(columns={'Sales':'Item'}, inplace=True)
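If the goal is to stack the sheet's records as new rows under the final DataFrame's columns, one minimal sketch (using the variable names from the question, so reading_book1 holds the sheet data; renaming before the concat keeps the Sales values under Item):
# rename Sales to Item on the sheet data before combining
reading_book1 = reading_book1.rename(columns={'Sales': 'Item'})
# row-wise concat aligns columns by name and fills any missing ones with NaN
RestDataframe = pd.concat([AllFields, reading_book1], ignore_index=True, sort=False)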

Related

group two dataframes with different sizes in python pandas

I've got two data frames, one has historical prices of stocks in this format:
year  Company1  Company2
1980  4.66      12.32
1981  5.68      15.53
etc., with hundreds of columns. Then I have a dataframe specifying a company, its sector and its country:
company 1  industrials     Germany
company 2  consumer goods  US
company 3  industrials     France
I used the first dataframe to plot the prices of various companies over time. However, I'd now like to group the data from the first table with the second one and create a separate dataframe showing the total value per sector over time, i.e.:
year  industrials  consumer goods  healthcare
1980  50.65        42.23           25.65
1981  55.65        43.23           26.15
Thank you
You can do the following, assuming df_1 is your DataFrame with price of stock per year and company, and df_2 your DataFrame with information on the companies:
# turn company columns into rows
df_1 = df_1.melt(id_vars='year', var_name='company')
df_1 = df_1.merge(df_2)
# groupby and move industry to columns
output = df_1.groupby(['year', 'industry'])['value'].sum().unstack('industry')
Output:
industry consumer goods industrials
year
1980 12.32 4.66
1981 15.53 5.68
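Note that df_1.merge(df_2) joins on whatever column names the two frames share, so this assumes the second table's columns are labelled 'company', 'industry' and 'country' (hypothetical labels, since the question's table has no headers). A short sketch that makes the labels and the join key explicit:
# assumed labels for the unlabelled company/sector/country table
df_2.columns = ['company', 'industry', 'country']
# merging on an explicit key avoids accidental joins on other shared columns
df_1 = df_1.merge(df_2, on='company')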

How to get the Australian financial year from a date in a pandas dataframe

I have a pandas dataframe that has a datetime column called date.
How can I create a new column to represent the Australian financial year using the date column?
The Australian financial year starts on 1 July and ends the next year on 30 June.
Example 1: 10 June 2019 is FY 2019
Example 2: 5 July 2019 is FY 2020
The code below creates a new column representing the Australian financial year from the existing 'date' column:
df['FY'] = df['date'].map(lambda d: d.year + 1 if d.month > 6 else d.year)
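The same rule can also be written in vectorised form, which avoids the per-row lambda on large frames (a sketch assuming the 'date' column already has datetime dtype):
# July (month 7) onwards belongs to the next financial year
df['FY'] = df['date'].dt.year + (df['date'].dt.month > 6).astype(int)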

Extract Data from DF into a new DF

I am not confident you can see the image. I am a student, last class before graduation, thought python would be fun. Stuck on an issue.
I have a dataframe called final_hgun_frame_raw that successfully lists every state plus DC, in alphabetical order. There is an index column that runs from 0 to 51. The column headings are STATE, 2010, 2011, ..., 2019.
The table shows, for example, that index 0 is AL: under column 2010 there is a value of 2.44, under 2011 a value of 2.72, and so on. There is a value for every year and every state.
My assignment is to create another data frame with 4 columns: Index, State, Year and Value
I have created a null dataframe with STATE, YEAR and VALUE
I know that I should use .tolist and .append, but I am having trouble starting. The output should look something like:
State Year Value
AL 2010 2.44
AL 2011 2.72
Each state, year and value combination should be its own row rather than its own table; there should be a single table that is 4 columns x 510 rows.
How do I extract that information?
You can use pd.melt for this:
import pandas as pd
data = [{'State':'AL', 2010:2.44, 2011:2.72, 2012:3.68}, {'State':'AK', 2010:3.60, 2011:3.93, 2012:4.91}]
df = pd.DataFrame(data)
df = pd.melt(df, id_vars=['State'], var_name='Year', value_name='Value').sort_values(by=['State'])
Output:
  State  Year  Value
1  AK     2010  3.6
3  AK     2011  3.93
5  AK     2012  4.91
0  AL     2010  2.44
2  AL     2011  2.72
4  AL     2012  3.68
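If the fourth column required by the assignment is a fresh 0-based index rather than the original row labels shown above, one way (a sketch continuing from the answer's df) is:
# renumber the rows 0..n-1, then expose that numbering as an explicit 'Index' column
df = df.reset_index(drop=True).rename_axis('Index').reset_index()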

When merging DataFrames on a common column like ID (primary key), how do you handle data that appears more than once for a single ID in the second df?

So I have two dfs.
DF1
Superhero ID Superhero City
212121 Spiderman New york
364331 Ironman New york
678523 Batman Gotham
432432 Dr Strange New york
665544 Thor Asgard
123456 Superman Metropolis
555555 Nightwing Gotham
666666 Loki Asgard
Df2
SID Mission End date
665544 10/10/2020
665544 03/03/2021
212121 02/02/2021
665544 05/12/2020
212121 15/07/2021
123456 03/06/2021
666666 12/10/2021
I need to create a new df that summarizes how many heroes are in each city and in which quarter their missions will be complete. I'll be able to match the superhero (and their city) in df1 to the mission end date via their Superhero ID or SID in Df2 ('Superhero Id' == 'SID'). Superhero IDs appear only once in Df1 but can appear multiple times in DF2.
Ultimately I need a count for the total no. of heroes in the different cities (which I can do - see below) as well as how many heroes will be free per quarter.
These are the thresholds for the quarters
Quarter 1 – Apr, May, Jun
Quarter 2 – Jul, Aug, Sept
Quarter 3 – Oct, Nov, Dec
Quarter 4 – Jan, Feb, Mar
The following code tells me how many heroes are in each city:
df_Count = pd.DataFrame(df1.City.value_counts().reset_index())
Which produces:
City Count
New york 3
Gotham 2
Asgard 2
Metropolis 1
I can also convert the dates into datetime format via the following operation:
#Convert to datetime series
Df2['Mission End date'] = pd.to_datetime(Df2['Mission End date'], dayfirst=True)  # dates are dd/mm/yyyy
Ultimately I need a new df that looks like this
City Total Count No. of heroes free in Q3 No. of heroes free in Q4 Free in Q1 2021+
New york 3 2 0 1
Gotham 2 2 2 0
Asgard 2 1 2 0
Metropolis 1 0 0 1
If anyone can help me create the appropriate quarters and sort them into the appropriate columns, I'd be extremely grateful. I'd also like a way to handle heroes having multiple mission end dates; I can't ignore them, I still need to count them. I suspect I'll need to create a custom function which I can then apply to each row via the apply() method and a lambda expression. This issue has been a pain for a while now, so I'd appreciate all the help I can get. Thank you very much :)
After merging your dataframes with
df = df1.merge(df2, left_on='Superhero ID', right_on='SID')
And converting your date column to datetime format
df = df.assign(mission_end_date=lambda x: pd.to_datetime(x['Mission End date'], dayfirst=True))
You can create two columns; one to extract the quarter and one to extract the year of the newly created datetime column
df = (df.assign(quarter_end_date=lambda x: x.mission_end_date.dt.quarter)
        .assign(year_end_date=lambda x: x.mission_end_date.dt.year))
And combine them into a column that shows the quarter in a format Qx, yyyy
df = df.assign(quarter_year_end=lambda x: 'Q' + x.quarter_end_date.astype(str) + ', ' + x.year_end_date.astype(str))
Finally, group by city and quarter, count the number of superheroes, and pivot the dataframe to get your desired result:
result = (df.groupby(['City', 'quarter_year_end'])
            .count()
            .reset_index()
            .pivot(index='City', columns='quarter_year_end', values='Superhero'))
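One caveat: dt.quarter gives calendar quarters (Q1 = Jan-Mar). If you want the thresholds defined in the question (Q1 = Apr-Jun through Q4 = Jan-Mar), one sketch of that mapping is:
# shift months so Apr-Jun -> 1, Jul-Sep -> 2, Oct-Dec -> 3, Jan-Mar -> 4
df = df.assign(quarter_end_date=lambda x: (x.mission_end_date.dt.month - 4) % 12 // 3 + 1)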

how to select a group from groupby dataframe using pandas

I have a dataframe with a multilevel index (company, year), grouped by mean, that looks like this:
company year mean salary
ABC 2018 3000
2019 3400
LOL 2018 1200
2019 3500
I want to select the data belongs to "LOL", my desired outcome would be:
company year mean salary
LOL 2018 1200
2019 3500
Is there a way I can select only a certain group? I tried to use the .filter function on the dataframe, but I was only able to apply it to row values (such as lambda x: x > 1000), not to index values.
Any advice will be appreciated!
Use DataFrame.xs with drop_level=False to avoid dropping the first level:
df1 = df.xs('LOL', drop_level=False)
Or filter by the first level with Index.get_level_values:
df1 = df[df.index.get_level_values(0) == 'LOL']
print (df1)
mean salary
company year
LOL 2018 1200
2019 3500
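A third, label-based option: selecting with a list of labels via DataFrame.loc also keeps the first level rather than dropping it (a small sketch on the same df):
# passing a list keeps the 'company' level of the MultiIndex
df1 = df.loc[['LOL']]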
