I have a dataframe that looks like this:
Deal Year Financial Data1 Financial Data2 Financial Data3 Quarter
0 1 1991/1/1 122 123 120 1
3 1 1991/1/1 122 123 120 2
6 1 1991/1/1 122 123 120 3
1 2 1992/1/1 85 90 80 4
4 2 1992/1/1 85 90 80 5
7 2 1992/1/1 85 90 80 6
2 3 1993/1/1 85 90 100 1
5 3 1993/1/1 85 90 100 2
8 3 1993/1/1 85 90 100 3
However, I only want Financial Data1 displayed for the first quarter in each deal, Financial Data2 for the second, and Financial Data3 for the third, with the whole thing combined into one column again.
The end result should look something like this:
Deal Year Financial Data Quarter
0 1 1991/1/1 122 1
3 1 1991/1/1 123 2
6 1 1991/1/1 120 3
1 2 1992/1/1 85 4
4 2 1992/1/1 90 5
7 2 1992/1/1 80 6
2 3 1993/1/1 85 1
5 3 1993/1/1 90 2
8 3 1993/1/1 100 3
Okie dokie, using np.where() I think this does what you're trying to do:
import pandas as pd
import numpy as np
from io import StringIO

df = pd.read_fwf(StringIO(
"""Deal Year Financial_Data1 Financial_Data2 Financial_Data3 Quarter
1 1991/1/1 122 123 120 1
1 1991/1/1 122 123 120 2
1 1991/1/1 122 123 120 3
2 1992/1/1 85 90 80 4
2 1992/1/1 85 90 80 5
2 1992/1/1 85 90 80 6
3 1993/1/1 85 90 100 1
3 1993/1/1 85 90 100 2
3 1993/1/1 85 90 100 3"""))
df['Financial_Data'] = np.where(
# if 'Quarter'%3==1
df['Quarter']%3==1,
# Then return Financial_Data1
df['Financial_Data1'],
# Else
np.where(
# If 'Quarter'%3==2
df['Quarter']%3==2,
# Then return Financial_Data2
df['Financial_Data2'],
# Else return Financial_Data3
df['Financial_Data3']
)
)
# Drop Old Columns
df = df.drop(['Financial_Data1', 'Financial_Data2', 'Financial_Data3'], axis=1)
print(df)
Output:
Deal Year Quarter Financial_Data
0 1 1991/1/1 1 122
1 1 1991/1/1 2 123
2 1 1991/1/1 3 120
3 2 1992/1/1 4 85
4 2 1992/1/1 5 90
5 2 1992/1/1 6 80
6 3 1993/1/1 1 85
7 3 1993/1/1 2 90
8 3 1993/1/1 3 100
(PS: I wasn't 100% sure how you intended on dealing with Quarter 4-6, in this example I just treat them as 1-3)
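As a side note, the nested np.where calls can be flattened with np.select, which takes parallel lists of conditions and choices. A minimal sketch on a cut-down, hypothetical version of the sample data:

```python
import numpy as np
import pandas as pd

# Cut-down version of the sample frame (hypothetical values).
df = pd.DataFrame({
    'Quarter': [1, 2, 3, 4, 5, 6],
    'Financial_Data1': [122, 122, 122, 85, 85, 85],
    'Financial_Data2': [123, 123, 123, 90, 90, 90],
    'Financial_Data3': [120, 120, 120, 80, 80, 80],
})
# np.select picks, per row, the choice for the first condition that is true.
conditions = [
    df['Quarter'] % 3 == 1,
    df['Quarter'] % 3 == 2,
    df['Quarter'] % 3 == 0,
]
choices = [df['Financial_Data1'], df['Financial_Data2'], df['Financial_Data3']]
df['Financial_Data'] = np.select(conditions, choices)
df = df.drop(['Financial_Data1', 'Financial_Data2', 'Financial_Data3'], axis=1)
```

This avoids the nesting growing one level per extra column, which matters if a deal ever spans more than three data columns.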
Let's say that these are my data
day region cars motorcycles bikes buses
1 A 0 1 1 2
2 A 4 0 6 8
3 A 2 9 8 0
1 B 6 12 34 82
2 B 13 92 76 1
3 B 23 87 98 9
1 C 29 200 31 45
2 C 54 80 23 89
3 C 129 90 231 56
How do I make the regions into columns and the columns (except for the day column) into rows?
Basically, I want it to look like this :
day vehicle_type A B C
1 cars 0 6 29
2 cars 4 13 54
3 cars 2 23 129
1 motorcycles 1 12 200
2 motorcycles 0 92 80
3 motorcycles 9 87 90
1 bikes 1 34 31
2 bikes 6 76 23
3 bikes 8 98 231
1 buses 2 82 45
2 buses 8 1 89
3 buses 0 9 56
Use stack and unstack:
(
df.set_index(["day", "region"])
.rename_axis(columns="vehicle_type")
.stack()
.unstack(level=1)
.rename_axis(columns=None)
.reset_index()
)
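If stack/unstack feels opaque, an equivalent route is melt followed by pivot (the list-valued index argument needs pandas 1.1+). A sketch on a cut-down version of the data:

```python
import pandas as pd

# Cut-down version of the data: two days, two regions, two vehicle columns.
df = pd.DataFrame({
    'day': [1, 2, 1, 2],
    'region': ['A', 'A', 'B', 'B'],
    'cars': [0, 4, 6, 13],
    'bikes': [1, 6, 34, 76],
})
out = (
    df.melt(id_vars=['day', 'region'], var_name='vehicle_type')  # to long form
      .pivot(index=['day', 'vehicle_type'], columns='region', values='value')
      .rename_axis(columns=None)
      .reset_index()
)
```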
I've got a dataframe like
Season Game Event_Num Home Away Margin
0 2016-17 1 1 0 0 0
1 2016-17 1 2 0 0 0
2 2016-17 1 3 0 2 2
3 2016-17 1 4 0 2 2
4 2016-17 1 5 0 2 2
.. ... ... ... ... ... ...
95 2017-18 5 53 17 10 7
96 2017-18 5 54 17 10 7
97 2017-18 5 55 17 10 7
98 2017-18 5 56 17 10 7
99 2017-18 5 57 17 10 7
Ultimately, I'd like to take the last row of each Game played (the last row for Game 1, Game 2, etc.) so I can see what the final margin was, and I'd like to do this for every unique season.
For example, if there were 3 games played for 2 unique seasons then the df would look something like:
Season Game Event_Num Home Away Final Margin
0 2016-17 1 1 90 80 10
1 2016-17 2 2 83 88 5
2 2016-17 3 3 67 78 11
3 2017-18 1 4 101 102 1
4 2017-18 2 5 112 132 20
5 2017-18 3 6
Is there a good way to do something like this? TIA.
Try:
df.groupby(['Season','Game']).tail(1)
Output:
Season Game Event_Num Home Away Margin
4 2016-17 1 5 0 2 2
9 2017-18 5 57 17 10 7
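Note that tail(1) assumes the events are already in order within each game. If that's not guaranteed, selecting the row with the largest Event_Num per group is safer; a sketch with made-up rows:

```python
import pandas as pd

# Made-up rows: two seasons, with 2017-18's events deliberately out of order.
df = pd.DataFrame({
    'Season': ['2016-17'] * 4 + ['2017-18'] * 3,
    'Game': [1, 1, 2, 2, 1, 1, 1],
    'Event_Num': [1, 2, 1, 2, 1, 3, 2],
    'Margin': [0, 2, 1, 5, 0, 7, 3],
})
# Pick, per (Season, Game), the row whose Event_Num is largest --
# unlike tail(1), this does not depend on row order.
last = df.loc[df.groupby(['Season', 'Game'])['Event_Num'].idxmax()]
```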
It seemed like a simple problem of pivoting a pandas table, but unfortunately it turns out to be a bit more complicated.
I am providing a tiny sample table and the output I am looking to give the example of the problem I am facing:
Say, I have a table like this:
df =
AF BF AT BT
1 4 100 70
2 7 102 66
3 11 200 90
4 13 300 178
5 18 403 200
So I need to reshape it into a longer format, but the parameter name in each case should be kept the same. (I am not looking to subset the strings if possible.)
My output table should look like the following:
dfout =
PAR F T
A 1 100
B 4 70
A 2 102
B 7 66
A 3 200
B 11 90
A 4 300
B 13 178
A 5 403
B 18 200
I tried pivoting, but was not able to achieve the desired output. Any help will be immensely appreciated. Thanks.
You can use pandas wide_to_long, but first you have to reorder the columns:
pd.wide_to_long(
df.rename(columns=lambda x: x[::-1]).reset_index(),
stubnames=["F", "T"],
i="index",
sep="",
j="PAR",
suffix=".",
).reset_index("PAR")
PAR F T
index
0 A 1 100
1 A 2 102
2 A 3 200
3 A 4 300
4 A 5 403
0 B 4 70
1 B 7 66
2 B 11 90
3 B 13 178
4 B 18 200
Alternatively, you could use the pivot_longer function from pyjanitor to reshape the data:
# pip install pyjanitor
import janitor
df.pivot_longer(names_to=("PAR", ".value"), names_pattern=r"(.)(.)")
PAR F T
0 A 1 100
1 B 4 70
2 A 2 102
3 B 7 66
4 A 3 200
5 B 11 90
6 A 4 300
7 B 13 178
8 A 5 403
9 B 18 200
Update: Using data from #jezrael:
df
C AF BF AT BT
0 10 1 4 100 70
1 20 2 7 102 66
2 30 3 11 200 90
3 40 4 13 300 178
4 50 5 18 403 200
pd.wide_to_long(
df.rename(columns=lambda x: x[::-1]),
stubnames=["F", "T"],
i="C",
sep="",
j="PAR",
suffix=".",
).reset_index()
C PAR F T
0 10 A 1 100
1 20 A 2 102
2 30 A 3 200
3 40 A 4 300
4 50 A 5 403
5 10 B 4 70
6 20 B 7 66
7 30 B 11 90
8 40 B 13 178
9 50 B 18 200
If you use the pivot_longer function:
df.pivot_longer(index="C", names_to=("PAR", ".value"), names_pattern=r"(.)(.)")
C PAR F T
0 10 A 1 100
1 10 B 4 70
2 20 A 2 102
3 20 B 7 66
4 30 A 3 200
5 30 B 11 90
6 40 A 4 300
7 40 B 13 178
8 50 A 5 403
9 50 B 18 200
pivot_longer is being worked on; in the next release of pyjanitor it should be much better. But pd.wide_to_long can solve your task pretty easily. The other answers can easily solve it as well.
The idea is to create a MultiIndex in the columns from the first and last letters, then use DataFrame.stack to reshape, and finally do some cleanup of the MultiIndex in the index:
df.columns = [df.columns.str[-1], df.columns.str[0]]
df = df.stack().reset_index(level=0, drop=True).rename_axis('PAR').reset_index()
print (df)
PAR F T
0 A 1 100
1 B 4 70
2 A 2 102
3 B 7 66
4 A 3 200
5 B 11 90
6 A 4 300
7 B 13 178
8 A 5 403
9 B 18 200
EDIT:
print (df)
C AF BF AT BT
0 10 1 4 100 70
1 20 2 7 102 66
2 30 3 11 200 90
3 40 4 13 300 178
4 50 5 18 403 200
df = df.set_index('C')
df.columns = pd.MultiIndex.from_arrays([df.columns.str[-1],
df.columns.str[0]], names=[None,'PAR'])
df = df.stack().reset_index()
print (df)
C PAR F T
0 10 A 1 100
1 10 B 4 70
2 20 A 2 102
3 20 B 7 66
4 30 A 3 200
5 30 B 11 90
6 40 A 4 300
7 40 B 13 178
8 50 A 5 403
9 50 B 18 200
Let's try:
(pd.wide_to_long(df.reset_index(),stubnames=['A','B'],
i='index',
j='PAR', sep='', suffix='[FT]')
.stack().unstack('PAR').reset_index(level=1)
)
Output:
PAR level_1 F T
index
0 A 1 100
0 B 4 70
1 A 2 102
1 B 7 66
2 A 3 200
2 B 11 90
3 A 4 300
3 B 13 178
4 A 5 403
4 B 18 200
My dataset:
name day value
A 7 88
A 15 101
A 21 121
A 29 56
B 21 131
B 30 78
B 35 102
C 8 80
C 16 101
...
I am trying to plot these values by day, but there are too many unique days to label.
Is there a way to simplify the labeling by cutting every 7 days into a week?
For example, days 1~7 = week 1, days 8~14 = week 2, and so on.
output what I want
name day value week
A 7 88 1
A 15 101 3
A 21 121 3
A 29 56 5
B 21 131 3
B 30 78 5
B 35 102 5
C 8 80 2
C 16 101 3
Thank you for reading.
Subtract 1, then use integer division by 7, and finally add 1:
df['week'] = (df['day'] - 1) // 7 + 1
print (df)
name day value week
0 A 7 88 1
1 A 15 101 3
2 A 21 121 3
3 A 29 56 5
4 B 21 131 3
5 B 30 78 5
6 B 35 102 5
7 C 8 80 2
8 C 16 101 3
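The same labels can also be read as a ceiling division: week = ceil(day / 7). A quick sanity check of the equivalence on the sample days:

```python
import math

import pandas as pd

# Sample days from the question.
days = pd.Series([7, 15, 21, 29, 21, 30, 35, 8, 16])
week_floor = (days - 1) // 7 + 1                     # the answer's formula
week_ceil = days.apply(lambda d: math.ceil(d / 7))   # ceiling division
assert (week_floor == week_ceil).all()
```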
I have a data frame with two columns, xPos and lineNum:
import pandas as pd
from io import StringIO

data = '''\
xPos lineNum
40 1
50 1
75 1
90 1
42 2
75 2
110 2
45 3
70 3
95 3
125 3
38 4
56 4
74 4'''
df = pd.read_csv(StringIO(data), sep=r'\s+')
I have created the aggregate data frame for this with the command
aggrDF = df.describe(include='all')
and I am interested in the minimum of the xPos values. So, I get it using
minxPos = aggrDF.loc['min', 'xPos']
Desired output
data = '''\
xPos lineNum xDiff
40 1 2
50 1 10
75 1 25
90 1 15
42 2 4
75 2 33
110 2 35
45 3 7
70 3 25
95 3 25
125 3 30
38 4 0
56 4 18
74 4 18'''
The logic
I want to compare two consecutive rows of the data frame and calculate a new column based on this logic:
if( df['lineNum'] != df['lineNum'].shift(1) ):
    df['xDiff'] = df['xPos'] - minxPos
else:
    df['xDiff'] = df['xPos'] - df['xPos'].shift(1)
Essentially, I want the new column to have the difference of the two consecutive rows in the df, as long as the line number is the same.
If the line number changes, then, the xDiff column should have the difference with the minimum xPos value that I have from the aggregate data frame.
Can you please help? Thanks!
These two lines should do it:
df['xDiff'] = df.groupby('lineNum').diff()['xPos']
df.loc[df['xDiff'].isnull(), 'xDiff'] = df['xPos'] - minxPos
>>> df
xPos lineNum xDiff
0 40 1 2.0
1 50 1 10.0
2 75 1 25.0
3 90 1 15.0
4 42 2 4.0
5 75 2 33.0
6 110 2 35.0
7 45 3 7.0
8 70 3 25.0
9 95 3 25.0
10 125 3 30.0
11 38 4 0.0
12 56 4 18.0
13 74 4 18.0
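The two lines can also be fused into a single expression: take the within-group diff, then fill the NaN that appears at the start of each group with the distance to the global minimum. A sketch that rebuilds the sample frame (with minxPos computed directly rather than via describe):

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    'xPos': [40, 50, 75, 90, 42, 75, 110, 45, 70, 95, 125, 38, 56, 74],
    'lineNum': [1] * 4 + [2] * 3 + [3] * 4 + [4] * 3,
})
minxPos = df['xPos'].min()  # 38, same value describe() would report
# diff() within each line; the NaN at the first row of each group marks a
# line change, so fill it with the distance to the global minimum.
df['xDiff'] = df.groupby('lineNum')['xPos'].diff().fillna(df['xPos'] - minxPos)
```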
You just need to groupby lineNum and apply the logic you already wrote down:
import numpy as np

df['xDiff'] = np.concatenate(df.groupby('lineNum').apply(
    lambda x: np.where(x['lineNum'] != x['lineNum'].shift(1),
                       x['xPos'] - minxPos,
                       x['xPos'] - x['xPos'].shift(1)).astype(int)).values)
df
Out[76]:
    xPos  lineNum  xDiff
0     40        1      2
1     50        1     10
2     75        1     25
3     90        1     15
4     42        2      4
5     75        2     33
6    110        2     35
7     45        3      7
8     70        3     25
9     95        3     25
10   125        3     30
11    38        4      0
12    56        4     18
13    74        4     18