How do I manipulate a DataFrame with pivot_table in Python

I have spent much time on this but I am no closer to a solution.
I have a dataframe that prints as
RegionID AreaID Year Jan Feb Mar Apr May Jun
0 20.0 1.0 2020.0 1174.0 1056.0 1051.0 1107.0 1097.0 1118.0
1 19.0 2.0 2020.0 460.0 451.0 421.0 421.0 420.0 457.0
2 20.0 3.0 2020.0 2723.0 2594.0 2590.0 2399.0 2377.0 2331.0
3 21.0 4.0 2020.0 863.0 859.0 813.0 785.0 757.0 765.0
4 19.0 5.0 2020.0 4037.0 3942.0 4069.0 3844.0 3567.0 3721.0
5 19.0 6.0 2020.0 1695.0 1577.0 1531.0 1614.0 1671.0 1693.0
6 18.0 7.0 2020.0 1757.0 1505.0 1445.0 1514.0 1406.0 1444.0
7 18.0 8.0 2020.0 832.0 721.0 747.0 852.0 885.0 872.0
8 18.0 9.0 2020.0 2538.0 2000.0 2026.0 1981.0 1987.0 1949.0
9 21.0 10.0 2020.0 1145.0 1235.0 1114.0 1161.0 1150.0 1189.0
10 20.0 11.0 2020.0 551.0 497.0 503.0 472.0 505.0 532.0
11 19.0 12.0 2020.0 1664.0 1526.0 1389.0 1373.0 1384.0 1404.0
12 21.0 13.0 2020.0 381.0 351.0 299.0 286.0 297.0 319.0
13 21.0 14.0 2020.0 1733.0 1627.0 1567.0 1561.0 1498.0 1511.0
14 18.0 15.0 2020.0 1257.0 1257.0 1160.0 1172.0 1124.0 1113.0
I want to reshape this data so that the month columns are combined into a single Month field, like below:
RegionID AreaID Year Month Amount
20.0 1.0 2020 Jan 1174
20.0 1.0 2020 Feb 1056
20.0 1.0 2020 Mar 1051
Can this be done using pandas? I have been trying with pivot_table but I can't get it to work.

I hope I've understood your question well. You can .set_index() and then .stack():
print(
    df.set_index(["RegionID", "AreaID", "Year"])
    .stack()
    .reset_index()
    .rename(columns={"level_3": "Month", 0: "Amount"})
)
Prints:
RegionID AreaID Year Month Amount
0 20.0 1.0 2020.0 Jan 1174.0
1 20.0 1.0 2020.0 Feb 1056.0
2 20.0 1.0 2020.0 Mar 1051.0
3 20.0 1.0 2020.0 Apr 1107.0
4 20.0 1.0 2020.0 May 1097.0
5 20.0 1.0 2020.0 Jun 1118.0
6 19.0 2.0 2020.0 Jan 460.0
7 19.0 2.0 2020.0 Feb 451.0
8 19.0 2.0 2020.0 Mar 421.0
9 19.0 2.0 2020.0 Apr 421.0
10 19.0 2.0 2020.0 May 420.0
11 19.0 2.0 2020.0 Jun 457.0
...
Or:
print(
    df.melt(
        ["RegionID", "AreaID", "Year"], var_name="Month", value_name="Amount"
    )
)
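Note that the resulting Month column is plain text, so sorting it later will be alphabetical rather than calendar order. If that matters, one option (a small sketch, assuming you saved the reshaped frame as out, a name not used above) is to make Month an ordered categorical:
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
out["Month"] = pd.Categorical(out["Month"], categories=months, ordered=True)
out = out.sort_values(["RegionID", "AreaID", "Year", "Month"]).reset_index(drop=True)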

Related

How to do something similar to conditional COUNTIFS on a dataframe

I am trying to replicate COUNTIFS in Excel to get a rank between two unique values that are listed in my dataframe. I have attached the expected output, calculated in Excel using COUNTIF and LET/RANK functions.
I am trying to generate an "average rank of gas and coal plants" column that takes the number from the "Average Rank" column and then ranks the two unique types from Technology (CCGT or COAL) into two new ranks (Gas or Coal), so that I can then get the relevant quantiles for this. In case you are wondering why I need this when there are only two coal plants: when I run this model on a larger dataset it will be useful to know how to do this in code rather than manually.
Ideally the output will return two ranks: 1-47 for all units with Technology == CCGT and 1-2 for all units with Technology == COAL.
These are the columns I am looking to make:
Unit ID | Technology | 03/01/2022 | 04/01/2022 | 05/01/2022 | 06/01/2022 | 07/01/2022 | 08/01/2022 | Average Rank | Unit Rank | Avg Rank of Gas & Coal plants | Gas Quintiles | Coal Quintiles | Quintiles
FAWN-1 | CCGT | 1.0 | 5.0 | 1.0 | 5.0 | 2.0 | 1.0 | 2.5 | 1 | 1 | 1 | 0 | Gas_1
GRAI-6 | CCGT | 4.0 | 18.0 | 2.0 | 4.0 | 3.0 | 3.0 | 5.7 | 2 | 2 | 1 | 0 | Gas_1
EECL-1 | CCGT | 5.0 | 29.0 | 4.0 | 1.0 | 1.0 | 2.0 | 7.0 | 3 | 3 | 1 | 0 | Gas_1
PEMB-21 | CCGT | 7.0 | 1.0 | 6.0 | 13.0 | 8.0 | 8.0 | 7.2 | 4 | 4 | 1 | 0 | Gas_1
PEMB-51 | CCGT | 3.0 | 3.0 | 3.0 | 11.0 | 16.0 | NaN | 7.2 | 5 | 5 | 1 | 0 | Gas_1
PEMB-41 | CCGT | 9.0 | 4.0 | 7.0 | 7.0 | 10.0 | 13.0 | 8.3 | 6 | 6 | 1 | 0 | Gas_1
WBURB-1 | CCGT | 6.0 | 9.0 | 22.0 | 2.0 | 7.0 | 5.0 | 8.5 | 7 | 7 | 1 | 0 | Gas_1
PEMB-31 | CCGT | 14.0 | 6.0 | 13.0 | 6.0 | 4.0 | 9.0 | 8.7 | 8 | 8 | 1 | 0 | Gas_1
GRMO-1 | CCGT | 2.0 | 7.0 | 10.0 | 24.0 | 11.0 | 6.0 | 10.0 | 9 | 9 | 1 | 0 | Gas_1
PEMB-11 | CCGT | 21.0 | 2.0 | 9.0 | 10.0 | 9.0 | 14.0 | 10.8 | 10 | 10 | 2 | 0 | Gas_2
STAY-1 | CCGT | 19.0 | 12.0 | 5.0 | 23.0 | 6.0 | 7.0 | 12.0 | 11 | 11 | 2 | 0 | Gas_2
GRAI-7 | CCGT | 10.0 | 27.0 | 15.0 | 9.0 | 15.0 | 11.0 | 14.5 | 12 | 12 | 2 | 0 | Gas_2
DIDCB6 | CCGT | 28.0 | 11.0 | 11.0 | 8.0 | 19.0 | 15.0 | 15.3 | 13 | 13 | 2 | 0 | Gas_2
SCCL-3 | CCGT | 17.0 | 16.0 | 31.0 | 3.0 | 18.0 | 10.0 | 15.8 | 14 | 14 | 2 | 0 | Gas_2
STAY-4 | CCGT | 12.0 | 8.0 | 20.0 | 18.0 | 14.0 | 23.0 | 15.8 | 14 | 14 | 2 | 0 | Gas_2
CDCL-1 | CCGT | 13.0 | 22.0 | 8.0 | 25.0 | 12.0 | 16.0 | 16.0 | 16 | 16 | 2 | 0 | Gas_2
STAY-3 | CCGT | 8.0 | 17.0 | 17.0 | 20.0 | 13.0 | 22.0 | 16.2 | 17 | 17 | 2 | 0 | Gas_2
MRWD-1 | CCGT | NaN | NaN | 19.0 | 26.0 | 5.0 | 19.0 | 17.3 | 18 | 18 | 2 | 0 | Gas_2
WBURB-3 | CCGT | NaN | NaN | 24.0 | 14.0 | 17.0 | 17.0 | 18.0 | 19 | 19 | 3 | 0 | Gas_3
WBURB-2 | CCGT | NaN | 14.0 | 21.0 | 12.0 | 31.0 | 18.0 | 19.2 | 20 | 20 | 3 | 0 | Gas_3
GYAR-1 | CCGT | NaN | 26.0 | 14.0 | 17.0 | 20.0 | 21.0 | 19.6 | 21 | 21 | 3 | 0 | Gas_3
STAY-2 | CCGT | 18.0 | 20.0 | 18.0 | 21.0 | 24.0 | 20.0 | 20.2 | 22 | 22 | 3 | 0 | Gas_3
KLYN-A-1 | CCGT | NaN | 24.0 | 12.0 | 19.0 | 27.0 | NaN | 20.5 | 23 | 23 | 3 | 0 | Gas_3
SHOS-1 | CCGT | 16.0 | 15.0 | 28.0 | 15.0 | 29.0 | 27.0 | 21.7 | 24 | 24 | 3 | 0 | Gas_3
DIDCB5 | CCGT | NaN | 10.0 | 35.0 | 22.0 | NaN | NaN | 22.3 | 25 | 25 | 3 | 0 | Gas_3
CARR-1 | CCGT | NaN | 33.0 | 26.0 | 27.0 | 22.0 | 4.0 | 22.4 | 26 | 26 | 3 | 0 | Gas_3
LAGA-1 | CCGT | 15.0 | 13.0 | 29.0 | 32.0 | 23.0 | 24.0 | 22.7 | 27 | 27 | 3 | 0 | Gas_3
CARR-2 | CCGT | 24.0 | 25.0 | 27.0 | 29.0 | 21.0 | 12.0 | 23.0 | 28 | 28 | 3 | 0 | Gas_3
GRAI-8 | CCGT | 11.0 | 28.0 | 36.0 | 16.0 | 26.0 | 25.0 | 23.7 | 29 | 29 | 4 | 0 | Gas_4
SCCL-2 | CCGT | 29.0 | NaN | 16.0 | 28.0 | 25.0 | NaN | 24.5 | 30 | 30 | 4 | 0 | Gas_4
LBAR-1 | CCGT | NaN | 19.0 | 25.0 | 31.0 | 28.0 | NaN | 25.8 | 31 | 31 | 4 | 0 | Gas_4
CNQPS-2 | CCGT | 20.0 | NaN | 32.0 | NaN | 32.0 | 26.0 | 27.5 | 32 | 32 | 4 | 0 | Gas_4
SPLN-1 | CCGT | NaN | NaN | 23.0 | 30.0 | 30.0 | NaN | 27.7 | 33 | 33 | 4 | 0 | Gas_4
DAMC-1 | CCGT | 23.0 | 21.0 | 38.0 | 34.0 | NaN | NaN | 29.0 | 34 | 34 | 4 | 0 | Gas_4
KEAD-2 | CCGT | 30.0 | NaN | NaN | NaN | NaN | NaN | 30.0 | 35 | 35 | 4 | 0 | Gas_4
SHBA-1 | CCGT | 26.0 | 23.0 | NaN | 35.0 | 37.0 | NaN | 30.3 | 36 | 36 | 4 | 0 | Gas_4
HUMR-1 | CCGT | 22.0 | 30.0 | 37.0 | 37.0 | 33.0 | 28.0 | 31.2 | 37 | 37 | 4 | 0 | Gas_4
CNQPS-4 | CCGT | 27.0 | NaN | NaN | 33.0 | 35.0 | 30.0 | 31.3 | 38 | 38 | 5 | 0 | Gas_5
CNQPS-1 | CCGT | 25.0 | 40.0 | 33.0 | NaN | NaN | NaN | 32.7 | 39 | 39 | 5 | 0 | Gas_5
SEAB-1 | CCGT | NaN | 32.0 | 34.0 | 36.0 | NaN | 29.0 | 32.8 | 40 | 40 | 5 | 0 | Gas_5
PETEM1 | CCGT | NaN | 35.0 | NaN | NaN | NaN | NaN | 35.0 | 41 | 41 | 5 | 0 | Gas_5
ROCK-1 | CCGT | 31.0 | 34.0 | NaN | 38.0 | 38.0 | NaN | 35.3 | 42 | 42 | 5 | 0 | Gas_5
SEAB-2 | CCGT | NaN | 31.0 | 39.0 | 39.0 | 34.0 | NaN | 35.8 | 43 | 43 | 5 | 0 | Gas_5
WBURB-43 | COAL | 32.0 | 37.0 | NaN | 40.0 | 39.0 | 31.0 | 35.8 | 44 | 1 | 0 | 1 | Coal_1
FDUNT-1 | CCGT | NaN | 36.0 | NaN | NaN | NaN | NaN | 36.0 | 45 | 44 | 5 | 0 | Gas_5
COSO-1 | CCGT | NaN | NaN | 30.0 | 42.0 | 36.0 | NaN | 36.0 | 45 | 44 | 5 | 0 | Gas_5
WBURB-41 | COAL | 33.0 | 38.0 | NaN | 41.0 | 40.0 | 32.0 | 36.8 | 47 | 2 | 0 | 1 | Coal_1
FELL-1 | CCGT | 34.0 | 39.0 | NaN | 43.0 | 41.0 | 33.0 | 38.0 | 48 | 46 | 5 | 0 | Gas_5
KEAD-1 | CCGT | NaN | NaN | 43.0 | NaN | NaN | NaN | 43.0 | 49 | 47 | 5 | 0 | Gas_5
I have tried to do it the same way I got Average Rank (a rank of the average of the inputs in the dataframe), but it doesn't seem to work with additional conditions.
Thank you!!
import pandas as pd

df = pd.read_csv("gas.csv")
display(df['Technology'].value_counts())
print('------')
display(df['Technology'].value_counts().iloc[0])  # count of CCGT (the most frequent value)
display(df['Technology'].value_counts().iloc[1])  # count of COAL
Output:
CCGT 47
COAL 2
Name: Technology, dtype: int64
------
47
2
By the way: pd.cut or pd.qcut can be used to calculate quantiles. You don't have to manually define what a quantile is.
Refer to the documentation and other websites:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html
https://www.geeksforgeeks.org/how-to-use-pandas-cut-and-qcut/
There are many methods you can pass to rank. Refer to documentation:
https://pandas.pydata.org/docs/reference/api/pandas.Series.rank.html
df['rank'] = df.groupby("Technology")["Average Rank"].rank(method="dense", ascending=True)
df
method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
How to rank the group of records that have the same value (i.e. ties):
average: average rank of the group
min: lowest rank in the group
max: highest rank in the group
first: ranks assigned in order they appear in the array
dense: like ‘min’, but rank always increases by 1 between groups.
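Putting the two together, a sketch (the Technology and "Average Rank" column names are taken from your table; the 1-5 quintile labels here are illustrative, not your exact Gas_1/Coal_1 scheme):
import pandas as pd

df = pd.read_csv("gas.csv")

# rank within each technology group: 1-47 for CCGT, 1-2 for COAL
df["rank"] = df.groupby("Technology")["Average Rank"].rank(method="dense")

# quintile of the average rank within each technology group
df["Quintile"] = (
    df.groupby("Technology")["Average Rank"]
      .transform(lambda s: pd.qcut(s, 5, labels=False, duplicates="drop") + 1)
)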

What is the best way to create a new dataframe with existing ones of different shapes and criteria

I have a few dataframes that I have made through various sorting and processing of data from the main dataframe (df1).
df1 - large; it currently covers 6 days' worth of data at 30-minute intervals, but I wish to scale up to longer periods:
import pandas as pd
import numpy as np

bmu_units = pd.read_csv('bmu_units_technology.csv')
b1610 = pd.read_csv('b1610_df.csv')
b1610 = b1610.merge(bmu_units, on=['BM Unit ID 1'], how='left')
b1610['% of capacity running'] = b1610.quantity / b1610.Capacity

def func(tech):
    if tech in ["CCGT", "OCGT", "COAL"]:
        return "Fossil"
    else:
        return "ZE"

b1610["Type"] = b1610['Technology'].apply(func)
settlementDate time BM Unit ID 1 BM Unit ID 2_x settlementPeriod quantity BM Unit ID 2_y Capacity Technology % of capacity running Type
0 03/01/2022 00:00:00 RCBKO-1 T_RCBKO-1 1 278.658 T_RCBKO-1 279.0 WIND 0.998774 ZE
1 03/01/2022 00:00:00 LARYO-3 T_LARYW-3 1 162.940 T_LARYW-3 180.0 WIND 0.905222 ZE
2 03/01/2022 00:00:00 LAGA-1 T_LAGA-1 1 262.200 T_LAGA-1 905.0 CCGT 0.289724 Fossil
3 03/01/2022 00:00:00 CRMLW-1 T_CRMLW-1 1 3.002 T_CRMLW-1 47.0 WIND 0.063872 ZE
4 03/01/2022 00:00:00 GRIFW-1 T_GRIFW-1 1 9.972 T_GRIFW-1 102.0 WIND 0.097765 ZE
... ... ... ... ... ... ... ... ... ... ... ...
52533 08/01/2022 23:30:00 CRMLW-1 T_CRMLW-1 48 8.506 T_CRMLW-1 47.0 WIND 0.180979 ZE
52534 08/01/2022 23:30:00 LARYO-4 T_LARYW-4 48 159.740 T_LARYW-4 180.0 WIND 0.887444 ZE
52535 08/01/2022 23:30:00 HOWBO-3 T_HOWBO-3 48 32.554 T_HOWBO-3 440.0 Offshore Wind 0.073986 ZE
52536 08/01/2022 23:30:00 BETHW-1 E_BETHW-1 48 5.010 E_BETHW-1 30.0 WIND 0.167000 ZE
52537 08/01/2022 23:30:00 HMGTO-1 T_HMGTO-1 48 92.094 HMGTO-1 108.0 WIND 0.852722 ZE
df2:
rank = b1610.pivot_table(
    index=['settlementDate', 'BM Unit ID 1', 'Technology'],
    columns='settlementPeriod',
    values='% of capacity running',
    aggfunc='sum',
    fill_value=0,
)
rank['rank of capacity'] = rank.sum(axis=1)
rank
settlementPeriod 1 2 3 4 5 6 7 8 9 10 ... 40 41 42 43 44 45 46 47 48 rank of capacity
settlementDate BM Unit ID 1 Technology
03/01/2022 ABRBO-1 WIND 0.936970 0.969293 0.970909 0.925051 0.885657 0.939394 0.963434 0.938586 0.863232 0.781212 ... 0.461818 0.394545 0.428889 0.537172 0.520606 0.545253 0.873333 0.697778 0.651111 29.566263
ABRTW-1 WIND 0.346389 0.343333 0.345389 0.341667 0.342222 0.346778 0.347611 0.347722 0.346833 0.340556 ... 0.018778 0.015889 0.032056 0.043056 0.032167 0.109611 0.132111 0.163278 0.223556 10.441333
ACHRW-1 WIND 0.602884 0.575628 0.602140 0.651070 0.667721 0.654791 0.539209 0.628698 0.784233 0.782140 ... 0.174419 0.148465 0.139860 0.091535 0.094698 0.272419 0.205023 0.184651 0.177628 18.517814
AKGLW-2 WIND 0.000603 0.000603 0.000603 0.000635 0.000603 0.000635 0.000635 0.000635 0.000635 0.000603 ... 0.191079 0.195079 0.250476 0.281048 0.290000 0.279524 0.358508 0.452698 0.572730 8.616032
ANSUW-1 WIND 0.889368 0.865053 0.915684 0.894000 0.888526 0.858211 0.875158 0.878421 0.809368 0.898737 ... 0.142632 0.212526 0.276421 0.225053 0.235789 0.228000 0.152211 0.226000 0.299158 19.662421
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
08/01/2022 WBURB-2 CCGT 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.636329 0.642447 0.961835 0.908706 0.650212 0.507012 0.513176 0.503576 0.518212 24.439765
HOWBO-3 Offshore Wind 0.030418 0.026355 0.026595 0.014373 0.012523 0.008418 0.010977 0.016918 0.019127 0.025641 ... 0.055509 0.063845 0.073850 0.073923 0.073895 0.073791 0.073886 0.074050 0.073986 2.332809
MRWD-1 CCGT 0.808043 0.894348 0.853043 0.650870 0.159783 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.701739 0.488913 0.488913 0.489348 0.489130 0.392826 0.079130 0.000000 0.000000 23.485217
WBURB-3 CCGT 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.771402 0.699986 0.648386 0.919242 0.759520 0.424513 0.430598 0.420089 0.436376 25.436282
DRAXX-4 BIOMASS 0.706074 0.791786 0.806713 0.806462 0.806270 0.806136 0.806509 0.806369 0.799749 0.825070 ... 0.777395 0.816093 0.707122 0.666639 0.680406 0.679216 0.501433 0.000000 0.000000 36.576512
df3 - this was made by sorting the above dataframe to list sums for each day for each BM Unit ID filtered for specific technology types.
BM Unit ID 1 Technology 03/01/2022 04/01/2022 05/01/2022 06/01/2022 07/01/2022 08/01/2022 ave rank rank
0 FAWN-1 CCGT 1.0 5.0 1.0 5.0 2.0 1.0 2.500000 1.0
1 GRAI-6 CCGT 4.0 18.0 2.0 4.0 3.0 3.0 5.666667 2.0
2 EECL-1 CCGT 5.0 29.0 4.0 1.0 1.0 2.0 7.000000 3.0
3 PEMB-21 CCGT 7.0 1.0 6.0 13.0 8.0 8.0 7.166667 4.0
4 PEMB-51 CCGT 3.0 3.0 3.0 11.0 16.0 NaN 7.200000 5.0
5 PEMB-41 CCGT 9.0 4.0 7.0 7.0 10.0 13.0 8.333333 6.0
6 WBURB-1 CCGT 6.0 9.0 22.0 2.0 7.0 5.0 8.500000 7.0
7 PEMB-31 CCGT 14.0 6.0 13.0 6.0 4.0 9.0 8.666667 8.0
8 GRMO-1 CCGT 2.0 7.0 10.0 24.0 11.0 6.0 10.000000 9.0
9 PEMB-11 CCGT 21.0 2.0 9.0 10.0 9.0 14.0 10.833333 10.0
10 STAY-1 CCGT 19.0 12.0 5.0 23.0 6.0 7.0 12.000000 11.0
11 GRAI-7 CCGT 10.0 27.0 15.0 9.0 15.0 11.0 14.500000 12.0
12 DIDCB6 CCGT 28.0 11.0 11.0 8.0 19.0 15.0 15.333333 13.0
13 STAY-4 CCGT 12.0 8.0 20.0 18.0 14.0 23.0 15.833333 14.0
14 SCCL-3 CCGT 17.0 16.0 31.0 3.0 18.0 10.0 15.833333 14.0
15 CDCL-1 CCGT 13.0 22.0 8.0 25.0 12.0 16.0 16.000000 15.0
16 STAY-3 CCGT 8.0 17.0 17.0 20.0 13.0 22.0 16.166667 16.0
17 MRWD-1 CCGT NaN NaN 19.0 26.0 5.0 19.0 17.250000 17.0
18 WBURB-3 CCGT NaN NaN 24.0 14.0 17.0 17.0 18.000000 18.0
19 WBURB-2 CCGT NaN 14.0 21.0 12.0 31.0 18.0 19.200000 19.0
20 GYAR-1 CCGT NaN 26.0 14.0 17.0 20.0 21.0 19.600000 20.0
21 STAY-2 CCGT 18.0 20.0 18.0 21.0 24.0 20.0 20.166667 21.0
22 SHOS-1 CCGT 16.0 15.0 28.0 15.0 29.0 27.0 21.666667 22.0
23 KLYN-A-1 CCGT NaN 24.0 12.0 19.0 27.0 29.0 22.200000 23.0
24 DIDCB5 CCGT NaN 10.0 35.0 22.0 NaN NaN 22.333333 24.0
25 CARR-1 CCGT NaN 33.0 26.0 27.0 22.0 4.0 22.400000 25.0
26 LAGA-1 CCGT 15.0 13.0 29.0 32.0 23.0 24.0 22.666667 26.0
27 CARR-2 CCGT 24.0 25.0 27.0 29.0 21.0 12.0 23.000000 27.0
28 GRAI-8 CCGT 11.0 28.0 36.0 16.0 26.0 25.0 23.666667 28.0
29 SCCL-2 CCGT 29.0 NaN 16.0 28.0 25.0 NaN 24.500000 29.0
30 LBAR-1 CCGT NaN 19.0 25.0 31.0 28.0 NaN 25.750000 30.0
31 CNQPS-2 CCGT 20.0 NaN 32.0 NaN 32.0 26.0 27.500000 31.0
32 SPLN-1 CCGT NaN NaN 23.0 30.0 30.0 NaN 27.666667 32.0
33 CNQPS-1 CCGT 25.0 NaN 33.0 NaN NaN NaN 29.000000 33.0
34 DAMC-1 CCGT 23.0 21.0 38.0 34.0 NaN NaN 29.000000 33.0
35 KEAD-2 CCGT 30.0 NaN NaN NaN NaN NaN 30.000000 34.0
36 HUMR-1 CCGT 22.0 30.0 37.0 37.0 33.0 28.0 31.166667 35.0
37 SHBA-1 CCGT 26.0 23.0 40.0 35.0 37.0 NaN 32.200000 36.0
38 SEAB-1 CCGT NaN 32.0 34.0 36.0 NaN 30.0 33.000000 37.0
39 CNQPS-4 CCGT 27.0 NaN 41.0 33.0 35.0 31.0 33.400000 38.0
40 PETEM1 CCGT NaN 35.0 NaN NaN NaN NaN 35.000000 39.0
41 SEAB-2 CCGT NaN 31.0 39.0 39.0 34.0 NaN 35.750000 40.0
42 COSO-1 CCGT NaN NaN 30.0 42.0 36.0 NaN 36.000000 41.0
43 ROCK-1 CCGT 31.0 34.0 42.0 38.0 38.0 NaN 36.600000 42.0
44 WBURB-43 COAL 32.0 37.0 45.0 40.0 39.0 32.0 37.500000 43.0
45 WBURB-41 COAL 33.0 38.0 46.0 41.0 40.0 33.0 38.500000 44.0
46 FELL-1 CCGT 34.0 39.0 47.0 43.0 41.0 34.0 39.666667 45.0
47 FDUNT-1 OCGT NaN 36.0 44.0 NaN NaN NaN 40.000000 46.0
48 KEAD-1 CCGT NaN NaN 43.0 NaN NaN NaN 43.000000 47.0
My issue is that I am trying to create a new dataframe, using the existing dataframes listed above, in which I list all my BM Unit ID 1's in order of rank from df2 while populating the values with the mean of the values across all dates (not split by date) from df1. An example of what I am after is below, which I made in Excel using INDEX/MATCH. Here I have the results for each settlement period from df1 and df2, but instead of being split by date they are an aggregated mean over all dates in the dataframe; they are still ranked according to the last column of df2, which is key.
Desired Output:
Rank Capacity BM Unit ID Technology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
1 150 FAWN-1 CCGT 130.43 130.93 130.78 130.58 130.57 130.54 130.71 130.87 130.89 130.98 130.83 130.80 130.88 131.02 130.81 130.65 130.86 130.84 131.19 130.60 130.69 130.70 130.40 130.03 130.13 130.03 129.75
2 455 GRAI-6 CCGT 339.45 342.33 322.53 312.40 303.78 307.60 316.35 277.18 293.48 325.75 326.75 271.34 299.74 328.06 317.12 342.66 364.50 390.90 403.32 411.52 400.18 405.94 394.04 400.08 389.08 382.74 374.76
3 408 EECL-1 CCGT 363.31 386.71 364.46 363.31 363.31 363.38 361.87 305.06 286.99 282.74 323.93 242.88 242.64 207.73 294.71 357.15 383.47 426.93 433.01 432.98 435.14 436.38 416.04 417.69 430.42 415.09 406.45
4 430 PEMB-21 CCGT 334.40 419.50 436.70 441.90 440.50 415.80 327.90 323.70 322.70 331.10 367.50 368.40 396.70 259.05 415.95 356.32 386.84 400.00 429.52 435.40 434.84 435.88 435.60 438.48 438.16 437.84 437.76
5 465 PEMB-51 CCGT 370.65 370.45 359.90 326.25 326.20 322.65 324.60 274.25 319.55 288.80 301.75 279.08 379.60 376.76 389.92 419.24 403.64 420.92 428.20 421.32 396.92 397.80 424.40 433.92 434.56 431.44 434.40
6 445 PEMB-41 CCGT 337.00 423.40 423.10 427.50 427.00 419.00 361.00 318.80 263.20 226.70 268.70 231.35 366.90 378.35 392.20 421.55 354.96 382.48 422.64 428.28 428.76 431.24 431.92 431.84 429.52 429.00 431.48
7 425 WBURB-1 CCGT 240.41 293.17 252.27 256.51 261.65 253.44 247.14 217.08 223.11 199.27 254.69 314.16 361.07 317.50 259.54 266.83 349.64 383.43 408.18 412.29 395.54 383.48 355.98 340.49 360.87 352.74 376.92
8 465 PEMB-31 CCGT 297.73 360.27 355.40 357.07 358.67 353.07 300.93 284.73 268.73 255.20 248.53 257.75 366.75 376.45 396.40 320.56 342.68 352.52 361.16 379.40 386.64 390.36 409.12 427.48 426.60 426.80 427.16
9 144 GRMO-1 CCGT 106.62 106.11 105.96 106.00 106.00 105.98 105.99 105.90 105.47 105.31 105.28 105.07 105.04 105.06 105.06 105.04 105.06 105.06 105.07 105.04 105.05 105.06 105.04 105.04 105.04 105.06 105.07
10 430 PEMB-11 CCGT 432.80 430.40 430.70 431.90 432.10 429.30 430.00 408.30 320.90 346.50 432.90 432.20 312.93 297.20 414.55 432.00 420.40 429.80 402.60 426.90 430.65 435.85 435.10 431.15 435.20 431.50 431.75
11 457 STAY-1 CCGT 216.07 223.27 232.67 243.47 234.67 221.73 227.00 219.00 237.00 218.33 250.73 228.27 219.67 142.68 243.00 300.64 312.28 331.00 360.84 379.28 398.92 410.04 410.56 409.24 411.96 408.84 411.88
12 455 GRAI-7 CCGT 425.20 425.40 377.90 339.40 342.00 329.80 408.00 402.40 329.00 257.30 130.43 211.37 262.60 318.45 299.98 324.72 350.40 386.26 394.20 402.10 390.48 401.22 388.94 394.10 395.14 379.70 377.26
13 710 DIDCB6 CCGT 465.80 459.50 411.60 411.70 413.70 410.80 351.50 333.40 333.70 390.40 234.60 265.56 348.16 430.28 524.32 554.04 536.28 589.28 594.04 597.72 592.76 557.86 687.70 687.25 687.35 687.25 679.80
14 400 SCCL-3 CCGT 311.50 337.40 378.80 311.50 381.30 338.60 302.70 300.70 300.60 300.70 338.20 321.50 363.80 260.35 228.18 308.70 334.73 324.60 354.63 362.38 347.30 306.22 346.86 365.04 365.40 370.68 370.52
400 SCCL-3 CCGT 311.50 337.40 378.80 311.50 381.30 338.60 302.70 300.70 300.60 300.70 338.20 321.50 363.80 260.35 228.18 308.70 334.73 324.60 354.63 362.38 347.30 306.22 346.86 365.04 365.40 370.68 370.52
16 440 CDCL-1 CCGT 270.63 255.24 210.87 197.10 195.12 198.72 197.64 198.99 233.19 221.31 176.94 317.52 280.68 213.12 297.68 342.25 397.26 372.28 371.74 379.87 347.51 348.48 352.15 384.88 395.14 381.02 360.40
17 457 STAY-3 CCGT 311.25 311.30 311.60 311.45 311.15 311.30 308.40 313.10 223.90 196.05 242.95 172.87 217.40 236.84 252.92 352.98 384.06 414.76 403.68 424.90 418.38 403.00 420.26 424.40 427.06 421.64 424.66
18 920 MRWD-1 CCGT 468.70 483.90 420.60 267.80 472.60 470.20 241.40 299.30 327.70 327.80 336.90 241.60 308.33 529.93 793.73 828.40 870.67 846.67 827.07 855.93 829.33 865.87 870.40 846.87 765.47 785.20 824.00
19 425 WBURB-3 CCGT 311.73 427.68 333.68 333.93 370.68 335.09 420.85 433.86 370.45 321.70 340.54 300.95 155.47 190.67 290.81 310.43 332.52 376.63 391.11 413.74 408.33 398.69 397.54 368.05 410.64 413.05 428.91
20 425 WBURB-2 CCGT 295.54 424.56 336.68 334.08 371.20 358.44 358.90 358.96 377.94 325.42 203.19 165.32 205.75 121.41 162.51 180.15 301.12 413.77 410.33 397.21 385.59 378.09 381.50 380.93 413.71 418.53 427.09
21 420 GYAR-1 CCGT 404.33 404.33 403.73 405.12 404.13 404.33 404.33 376.98 218.02 218.02 351.01 215.10 177.46 222.43 345.47 398.94 401.97 401.97 402.17 401.87 401.47 401.77 401.62 402.51 402.31 402.41 402.26
22 457 STAY-2 CCGT 434.20 435.40 435.40 435.20 434.20 434.20 434.20 434.60 249.80 196.20 291.20 234.80 196.80 88.73 167.10 239.52 324.52 372.80 412.40 423.32 424.04 423.96 423.92 424.08 423.88 420.96 422.44
23 400 KLYN-A-1 CCGT 382.58 382.50 384.94 385.81 385.83 385.79 385.02 384.94 259.16 141.03 195.65 205.75 278.81 256.95 296.85 337.82 369.26 376.38 376.84 376.56 376.30 376.09 375.62 375.45 375.11 375.17 375.09
24 420 SHOS-1 CCGT 290.63 326.33 229.60 265.70 269.05 259.40 299.45 310.20 301.65 266.00 307.90 319.30 253.06 246.85 263.04 220.46 277.68 297.84 290.62 297.86 302.83 295.13 293.73 289.04 306.14 314.24 321.76
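One way to sketch this (assuming the b1610 frame built above and the df3 frame with its rank column; the frame and column names are taken from the question):
# mean per settlement period across all dates, one row per unit
overall = b1610.pivot_table(
    index=['BM Unit ID 1', 'Technology'],
    columns='settlementPeriod',
    values='quantity',
    aggfunc='mean',
)

# order the units by df3's rank column
overall = (
    overall.reset_index()
           .merge(df3[['BM Unit ID 1', 'rank']], on='BM Unit ID 1')
           .sort_values('rank')
           .set_index(['rank', 'BM Unit ID 1', 'Technology'])
)
Capacity could be brought in the same way with an extra merge from bmu_units.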

How to concatenate variable string data to a row in a dataframe based on numeric value

I have a pandas dataframe result that looks like this:
Weekday Day Store1 Store2 Store3 Store4 Store5
0 Mon 6 0.0 0.0 0.0 0.0 0.0
1 Tue 7 42.0 33.0 23.0 42.0 21.0
2 Wed 8 43.0 29.0 13.0 33.0 22.0
3 Thu 9 45.0 24.0 20.0 29.0 18.0
4 Fri 10 48.0 21.0 22.0 37.0 22.0
5 Sat 11 34.0 22.0 23.0 34.0 18.0
0 Mon 13 39.0 21.0 21.0 25.0 21.0
1 Tue 14 39.0 20.0 18.0 0.0 19.0
2 Wed 15 46.0 26.0 18.0 31.0 24.0
3 Thu 16 38.0 21.0 15.0 45.0 29.0
4 Fri 17 42.0 21.0 21.0 41.0 20.0
5 Sat 18 40.0 25.0 15.0 36.0 19.0
0 Mon 20 39.0 22.0 23.0 36.0 19.0
1 Tue 21 31.0 18.0 16.0 35.0 23.0
2 Wed 22 33.0 25.0 17.0 39.0 22.0
3 Thu 23 34.0 24.0 19.0 18.0 27.0
4 Fri 24 33.0 18.0 24.0 43.0 24.0
5 Sat 25 38.0 22.0 20.0 40.0 12.0
0 Mon 27 41.0 21.0 18.0 31.0 23.0
1 Tue 28 32.0 21.0 14.0 23.0 14.0
2 Wed 29 33.0 18.0 15.0 19.0 23.0
3 Thu 30 36.0 21.0 21.0 23.0 18.0
4 Fri 1 40.0 30.0 24.0 38.0 23.0
5 Sat 2 40.0 19.0 22.0 38.0 21.0
Notice how Day goes from 6 to 30, then back to 1 and 2. In this example, the data refers to September 6, 2021 - October 2, 2021.
I currently have variables PrimaryMonth = 'September' and SecondaryMonth = 'October'.
I know that I can do result['Month'] = 'September', but that will list all the Month values as September. I'd like to find a way, if possible, to iterate through the rows so that when it reaches the 1 and 2 at the bottom it shows October in the new Month column.
Is it possible to do a for loop or some other iteration to accomplish this? I was initially brainstorming some pseudocode:
# for row in result:
#     while Day <= 31:
#         concat PrimaryMonth
#     else:
#         concat SecondaryMonth
You can kind of get an idea of where I want to go with this.
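For what it's worth, that idea can be done without an explicit loop. A sketch (assuming the frame is named result and PrimaryMonth / SecondaryMonth hold the month names as strings):
import numpy as np

# Day only decreases (e.g. 30 -> 1) where the month rolls over
rolled_over = result['Day'].diff().lt(0).cumsum()
result['Month'] = np.where(rolled_over.eq(0), PrimaryMonth, SecondaryMonth)
That said, the approach below with real dates is more robust.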
Many things are easier if you use proper date formats...
date_str = 'Monday, September 6, 2021 - Saturday, October 2, 2021'
new_index = pd.date_range(*map(pd.to_datetime, date_str.split(' - ')))
dates = pd.DataFrame(index=new_index)
dates['day'] = dates.index.day
dates.columns = ['Day']
df = pd.merge(dates, df, 'outer')
df.index = dates.index
df['month'] = df.index.month_name()
print(df.dropna())
Output:
Day Weekday Store1 Store2 Store3 Store4 Store5 month
2021-09-06 6 Mon 0.0 0.0 0.0 0.0 0.0 September
2021-09-07 7 Tue 42.0 33.0 23.0 42.0 21.0 September
2021-09-08 8 Wed 43.0 29.0 13.0 33.0 22.0 September
2021-09-09 9 Thu 45.0 24.0 20.0 29.0 18.0 September
2021-09-10 10 Fri 48.0 21.0 22.0 37.0 22.0 September
2021-09-11 11 Sat 34.0 22.0 23.0 34.0 18.0 September
2021-09-13 13 Mon 39.0 21.0 21.0 25.0 21.0 September
2021-09-14 14 Tue 39.0 20.0 18.0 0.0 19.0 September
2021-09-15 15 Wed 46.0 26.0 18.0 31.0 24.0 September
2021-09-16 16 Thu 38.0 21.0 15.0 45.0 29.0 September
2021-09-17 17 Fri 42.0 21.0 21.0 41.0 20.0 September
2021-09-18 18 Sat 40.0 25.0 15.0 36.0 19.0 September
2021-09-20 20 Mon 39.0 22.0 23.0 36.0 19.0 September
2021-09-21 21 Tue 31.0 18.0 16.0 35.0 23.0 September
2021-09-22 22 Wed 33.0 25.0 17.0 39.0 22.0 September
2021-09-23 23 Thu 34.0 24.0 19.0 18.0 27.0 September
2021-09-24 24 Fri 33.0 18.0 24.0 43.0 24.0 September
2021-09-25 25 Sat 38.0 22.0 20.0 40.0 12.0 September
2021-09-27 27 Mon 41.0 21.0 18.0 31.0 23.0 September
2021-09-28 28 Tue 32.0 21.0 14.0 23.0 14.0 September
2021-09-29 29 Wed 33.0 18.0 15.0 19.0 23.0 September
2021-09-30 30 Thu 36.0 21.0 21.0 23.0 18.0 September
2021-10-01 1 Fri 40.0 30.0 24.0 38.0 23.0 October
2021-10-02 2 Sat 40.0 19.0 22.0 38.0 21.0 October
And no, no matter what you do, a for-loop is probably the wrong answer when it comes to pandas.

Calculating age from dataframe (dob -y/m/d)

I'm trying to add a column "Age" to my data
number of purchased hours(mins) dob Y dob M dob D
0 7200 2010.0 10.0 12.0
1 7320 2010.0 6.0 2.0
2 5400 2011.0 6.0 18.0
3 9180 2009.0 10.0 18.0
4 3102 2007.0 7.0 30.0
5 5400 2011.0 4.0 6.0
6 9000 2009.0 8.0 5.0
7 6000 2004.0 2.0 7.0
8 6000 2007.0 8.0 17.0
9 6000 2013.0 5.0 5.0
10 12000 2012.0 9.0 27.0
11 12000 2004.0 11.0 25.0
12 6000 2009.0 11.0 20.0
I've tried this code, but not sure what went wrong
from datetime import datetime as dt
df['Age'] = dt.now() - pd.to_datetime(df[['dob D','dob M','dob Y']])
Below is the error that popped up
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
If you want to use to_datetime with 3 columns, it works only with renamed column names:
d = {'dob Y': 'year', 'dob M': 'month', 'dob D': 'day'}
df['Age'] = (pd.Timestamp.now().floor('d') -
             pd.to_datetime(df[['dob D', 'dob M', 'dob Y']].rename(columns=d)))
print(df)
number of purchased hours(mins) dob Y dob M dob D Age
0 7200 2010.0 10.0 12.0 3380 days
1 7320 2010.0 6.0 2.0 3512 days
2 5400 2011.0 6.0 18.0 3131 days
3 9180 2009.0 10.0 18.0 3739 days
4 3102 2007.0 7.0 30.0 4550 days
5 5400 2011.0 4.0 6.0 3204 days
6 9000 2009.0 8.0 5.0 3813 days
7 6000 2004.0 2.0 7.0 5819 days
8 6000 2007.0 8.0 17.0 4532 days
9 6000 2013.0 5.0 5.0 2444 days
10 12000 2012.0 9.0 27.0 2664 days
11 12000 2004.0 11.0 25.0 5527 days
12 6000 2009.0 11.0 20.0 3706 days
If you want to convert the timedeltas to days:
d = {'dob Y': 'year', 'dob M': 'month', 'dob D': 'day'}
df['Age'] = (pd.Timestamp.now().floor('d') -
             pd.to_datetime(df[['dob D', 'dob M', 'dob Y']].rename(columns=d))).dt.days
print(df)
number of purchased hours(mins) dob Y dob M dob D Age
0 7200 2010.0 10.0 12.0 3380
1 7320 2010.0 6.0 2.0 3512
2 5400 2011.0 6.0 18.0 3131
3 9180 2009.0 10.0 18.0 3739
4 3102 2007.0 7.0 30.0 4550
5 5400 2011.0 4.0 6.0 3204
6 9000 2009.0 8.0 5.0 3813
7 6000 2004.0 2.0 7.0 5819
8 6000 2007.0 8.0 17.0 4532
9 6000 2013.0 5.0 5.0 2444
10 12000 2012.0 9.0 27.0 2664
11 12000 2004.0 11.0 25.0 5527
12 6000 2009.0 11.0 20.0 3706
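And if "Age" should be whole years rather than days, one rough option (a sketch; dividing by 365.25 is an approximation that ignores exact calendar rules):
d = {'dob Y': 'year', 'dob M': 'month', 'dob D': 'day'}
born = pd.to_datetime(df[['dob D', 'dob M', 'dob Y']].rename(columns=d))
df['Age'] = ((pd.Timestamp.now().floor('d') - born).dt.days / 365.25).astype(int)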

Pandas merging DF's that have diff size shape and column names and freq with no duplicates

I am merging DataFrames that have 2 index columns in common: team name (32 teams) and year (2018-2015). DF1 has 9 columns of yearly team NFL stat averages. DF2 has the same (team, year) index but 11 columns of different stats for each of the 17 games ("weeks"). I am trying to merge them so that, for every team and year, the yearly averages print first (DF1's 9 columns, in one row), followed by DF2's 11 columns of stats across 17 rows, one per game ("week"), for that same team and year.
lvl0 = result.Tm_name.values
lvl1 = result.Year.values
newidx = pd.MultiIndex.from_arrays([lvl0, lvl1], names = ["Tm_name", "Year"])
result.set_index(newidx, inplace = True)
result.drop(["Year", "Tm_name"], axis = 1, inplace = True)
print(result)
W L W_L_Pct PD MoV SoS SRS OSRS DSRS
Tm_name Year
1 2015 13.0 3.0 0.813 176.0 11.0 1.3 12.3 9.0 3.4
2016 7.0 8.0 0.469 56.0 3.5 -1.9 1.6 2.4 -0.8
2017 8.0 8.0 0.500 -66.0 -4.1 0.4 -3.7 -4.0 0.2
2018 3.0 13.0 0.188 -200.0 -12.5 1.0 -11.5 -9.6 -1.9
2 2015 8.0 8.0 0.500 -6.0 -0.4 -3.4 -3.8 -4.0 0.3
2016 11.0 5.0 0.688 134.0 8.4 0.1 8.5 10.5 -2.0
2017 10.0 6.0 0.625 38.0 2.4 1.9 4.3 1.1 3.2
2018 7.0 9.0 0.438 -9.0 -0.6 0.4 -0.1 2.5 -2.6
3 2015 5.0 11.0 0.313 -73.0 -4.6 2.6 -1.9 -0.7 -1.2
2016 8.0 8.0 0.500 22.0 1.4 0.2 1.5 -1.1 2.6
2017 9.0 7.0 0.563 92.0 5.8 -2.4 3.4 2.2 1.2
2018 10.0 6.0 0.625 102.0 6.4 0.6 7.0 0.6 6.4
4 2015 8.0 8.0 0.500 20.0 1.3 -1.2 0.0 0.3 -0.2
2016 7.0 9.0 0.438 21.0 1.3 -1.6 -0.3 1.8 -2.2
2017 9.0 7.0 0.563 -57.0 -3.6 -0.5 -4.0 -3.0 -1.0
2018 6.0 10.0 0.375 -105.0 -6.6 -0.3 -6.9 -6.3 -0.6
5 2015 15.0 1.0 0.938 192.0 12.0 -3.9 8.1 6.0 2.1
2016 6.0 10.0 0.375 -33.0 -2.1 1.1 -1.0 -0.2 -0.8
2017 11.0 5.0 0.688 36.0 2.3 2.1 4.3 1.7 2.7
2018 7.0 9.0 0.438 -6.0 -0.4 1.3 0.9 0.1 0.8
6 2015 6.0 10.0 0.375 -62.0 -3.9 2.6 -1.3 -0.1 -1.2
2016 3.0 13.0 0.188 -120.0 -7.5 0.0 -7.5 -5.2 -2.3
2017 5.0 11.0 0.313 -56.0 -3.5 2.2 -1.3 -4.6 3.3
2018 12.0 4.0 0.750 138.0 8.6 -2.3 6.3 1.5 4.8
7 2015 12.0 4.0 0.750 140.0 8.8 1.9 10.6 4.8 5.8
lvl_0 = result2.Tm_name.values
lvl_1 = result2.Year.values
newidx_2 = pd.MultiIndex.from_arrays([lvl_0, lvl_1], names=["Tm_name", "Year"])
result2.set_index(newidx_2, inplace=True)
result2.drop(["Year", "Tm_name"], axis=1, inplace=True)
print(result2)
Week Date win_loss home_away Opp1_team Tm_Pnts \
Tm_name Year
1 2018 1 2018-09-09 0.0 1.0 32.0 6.0
2018 2 2018-09-16 0.0 0.0 18.0 0.0
2018 3 2018-09-23 0.0 1.0 6.0 14.0
2018 4 2018-09-30 0.0 1.0 28.0 17.0
2018 5 2018-10-07 1.0 0.0 29.0 28.0
2018 6 2018-10-14 0.0 0.0 20.0 17.0
2018 7 2018-10-18 0.0 1.0 10.0 10.0
2018 8 2018-10-28 1.0 1.0 29.0 18.0
2018 10 2018-11-11 0.0 0.0 16.0 14.0
2018 11 2018-11-18 0.0 1.0 25.0 21.0
2018 12 2018-11-25 0.0 0.0 17.0 10.0
2018 13 2018-12-02 1.0 0.0 12.0 20.0
2018 14 2018-12-09 0.0 1.0 11.0 3.0
2018 15 2018-12-16 0.0 0.0 2.0 14.0
2018 16 2018-12-23 0.0 1.0 18.0 9.0
2018 17 2018-12-30 0.0 0.0 28.0 24.0
2017 1 2017-09-10 0.0 0.0 11.0 23.0
2017 2 2017-09-17 1.0 0.0 14.0 16.0
2017 3 2017-09-25 0.0 1.0 9.0 17.0
2017 4 2017-10-01 1.0 1.0 29.0 18.0
2017 5 2017-10-08 0.0 0.0 26.0 7.0
2017 6 2017-10-15 1.0 1.0 30.0 38.0
2017 7 2017-10-22 0.0 0.0 18.0 0.0
2017 9 2017-11-05 1.0 0.0 29.0 20.0
2017 10 2017-11-09 0.0 1.0 28.0 16.0
2017 11 2017-11-19 0.0 0.0 13.0 21.0
2017 12 2017-11-26 1.0 1.0 15.0 27.0
2017 13 2017-12-03 0.0 1.0 18.0 16.0
2017 14 2017-12-10 1.0 1.0 31.0 12.0
2017 15 2017-12-17 0.0 0.0 32.0 15.0
... ... ... ... ... ... ...
2016 5 2016-10-06 1.0 0.0 29.0 33.0
2016 6 2016-10-17 1.0 1.0 24.0 28.0
2016 7 2016-10-23 NaN 1.0 28.0 6.0
2016 8 2016-10-30 0.0 0.0 5.0 20.0
2016 10 2016-11-13 1.0 1.0 29.0 23.0
2016 11 2016-11-20 0.0 0.0 20.0 24.0
2016 12 2016-11-27 0.0 0.0 2.0 19.0
2016 13 2016-12-04 1.0 1.0 32.0 31.0
2016 14 2016-12-11 0.0 0.0 19.0 23.0
2016 15 2016-12-18 0.0 1.0 22.0 41.0
2016 16 2016-12-24 1.0 0.0 28.0 34.0
2016 17 2016-01-01 1.0 0.0 18.0 44.0
2015 1 2015-09-13 1.0 1.0 22.0 31.0
2015 2 2015-09-20 1.0 0.0 6.0 48.0
2015 3 2015-09-27 1.0 1.0 29.0 47.0
2015 4 2015-10-04 0.0 1.0 NaN 22.0
2015 5 2015-10-11 1.0 0.0 11.0 42.0
2015 6 2015-10-18 0.0 0.0 27.0 13.0
2015 7 2015-10-26 1.0 1.0 3.0 26.0
2015 8 2015-11-01 1.0 0.0 8.0 34.0
2015 10 2015-11-15 1.0 0.0 28.0 39.0
2015 11 2015-11-22 1.0 1.0 7.0 34.0
2015 12 2015-11-29 1.0 0.0 29.0 19.0
2015 13 2015-12-06 1.0 0.0 NaN 27.0
2015 14 2015-12-10 1.0 1.0 20.0 23.0
2015 15 2015-12-20 1.0 0.0 26.0 40.0
2015 16 2015-12-27 1.0 1.0 12.0 38.0
2015 17 2015-01-03 0.0 1.0 28.0 6.0
2 2018 1 2018-09-06 0.0 0.0 26.0 12.0
2018 2 2018-09-16 1.0 1.0 5.0 31.0
Opp2_pnts Off_1stD Off_TotYd Def_1stD_All Def_TotYd_All
Tm_name Year
1 2018 24.0 14.0 213.0 30.0 429.0
2018 34.0 5.0 137.0 24.0 432.0
2018 16.0 13.0 221.0 21.0 316.0
2018 20.0 18.0 263.0 19.0 331.0
2018 18.0 10.0 220.0 33.0 447.0
2018 27.0 16.0 268.0 20.0 411.0
2018 45.0 14.0 223.0 15.0 309.0
2018 15.0 20.0 321.0 16.0 267.0
2018 26.0 21.0 260.0 20.0 330.0
2018 23.0 13.0 282.0 19.0 325.0
2018 45.0 10.0 149.0 30.0 414.0
2018 17.0 18.0 315.0 22.0 325.0
2018 17.0 22.0 279.0 16.0 218.0
2018 40.0 18.0 253.0 23.0 435.0
2018 31.0 15.0 263.0 33.0 461.0
2018 27.0 12.0 198.0 16.0 291.0
2017 35.0 24.0 308.0 19.0 367.0
2017 13.0 17.0 389.0 18.0 266.0
2017 28.0 22.0 332.0 15.0 273.0
2017 15.0 25.0 368.0 20.0 305.0
2017 34.0 16.0 307.0 19.0 419.0
2017 33.0 23.0 432.0 21.0 412.0
2017 33.0 10.0 196.0 28.0 425.0
2017 10.0 20.0 368.0 17.0 329.0
2017 22.0 24.0 290.0 14.0 287.0
2017 31.0 17.0 292.0 22.0 357.0
2017 24.0 20.0 344.0 19.0 219.0
2017 32.0 19.0 305.0 18.0 303.0
2017 7.0 16.0 261.0 14.0 204.0
2017 20.0 19.0 286.0 14.0 218.0
... ... ... ... ... ...
2016 21.0 17.0 288.0 25.0 286.0
2016 3.0 28.0 396.0 11.0 230.0
2016 6.0 23.0 443.0 11.0 257.0
2016 30.0 22.0 340.0 19.0 349.0
2016 20.0 26.0 443.0 15.0 281.0
2016 30.0 24.0 290.0 16.0 217.0
2016 38.0 23.0 332.0 28.0 360.0
2016 23.0 24.0 369.0 19.0 333.0
2016 26.0 21.0 300.0 15.0 314.0
2016 48.0 26.0 425.0 33.0 488.0
2016 31.0 21.0 370.0 24.0 391.0
2016 6.0 21.0 344.0 9.0 123.0
2015 19.0 25.0 427.0 18.0 408.0
2015 23.0 21.0 300.0 18.0 335.0
2015 7.0 28.0 446.0 10.0 156.0
2015 24.0 26.0 447.0 13.0 328.0
2015 17.0 15.0 345.0 29.0 435.0
2015 25.0 21.0 469.0 14.0 310.0
2015 18.0 21.0 414.0 18.0 276.0
2015 20.0 25.0 491.0 16.0 254.0
2015 32.0 30.0 451.0 18.0 343.0
2015 31.0 21.0 383.0 24.0 377.0
2015 13.0 26.0 337.0 17.0 368.0
2015 3.0 29.0 524.0 9.0 212.0
2015 20.0 22.0 393.0 23.0 389.0
2015 17.0 28.0 493.0 19.0 424.0
2015 8.0 19.0 381.0 16.0 178.0
2015 36.0 16.0 232.0 22.0 354.0
2 2018 18.0 16.0 299.0 18.0 232.0
2018 24.0 23.0 442.0 27.0 439.0
I can't just merge them; I've tried many different ways and it never comes out right, so I figured I could get the rows into a list and make a dataframe from that array. The array looks like what I want, but when I put it into a dataframe, it will not work (check below):
app = []
for row in result.itertuples():
    app.append(row)
    for row_1 in result2.itertuples():
        if row[0] == row_1[0]:
            app.append(row_1)
9.0, Off_1stD=25.0, Off_TotYd=427.0, Def_1stD_All=18.0, Def_TotYd_All=408.0)
Pandas(Index=(1, 2015), Week='2', Date=Timestamp('2015-09-20 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=6.0, Tm_Pnts=48.0, Opp2_pnts=23.0, Off_1stD=21.0, Off_TotYd=300.0, Def_1stD_All=18.0, Def_TotYd_All=335.0)
Pandas(Index=(1, 2015), Week='3', Date=Timestamp('2015-09-27 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=29.0, Tm_Pnts=47.0, Opp2_pnts=7.0, Off_1stD=28.0, Off_TotYd=446.0, Def_1stD_All=10.0, Def_TotYd_All=156.0)
Pandas(Index=(1, 2015), Week='4', Date=Timestamp('2015-10-04 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=nan, Tm_Pnts=22.0, Opp2_pnts=24.0, Off_1stD=26.0, Off_TotYd=447.0, Def_1stD_All=13.0, Def_TotYd_All=328.0)
Pandas(Index=(1, 2015), Week='5', Date=Timestamp('2015-10-11 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=11.0, Tm_Pnts=42.0, Opp2_pnts=17.0, Off_1stD=15.0, Off_TotYd=345.0, Def_1stD_All=29.0, Def_TotYd_All=435.0)
Pandas(Index=(1, 2015), Week='6', Date=Timestamp('2015-10-18 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=27.0, Tm_Pnts=13.0, Opp2_pnts=25.0, Off_1stD=21.0, Off_TotYd=469.0, Def_1stD_All=14.0, Def_TotYd_All=310.0)
Pandas(Index=(1, 2015), Week='7', Date=Timestamp('2015-10-26 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=3.0, Tm_Pnts=26.0, Opp2_pnts=18.0, Off_1stD=21.0, Off_TotYd=414.0, Def_1stD_All=18.0, Def_TotYd_All=276.0)
Pandas(Index=(1, 2015), Week='8', Date=Timestamp('2015-11-01 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=8.0, Tm_Pnts=34.0, Opp2_pnts=20.0, Off_1stD=25.0, Off_TotYd=491.0, Def_1stD_All=16.0, Def_TotYd_All=254.0)
Pandas(Index=(1, 2015), Week='10', Date=Timestamp('2015-11-15 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=28.0, Tm_Pnts=39.0, Opp2_pnts=32.0, Off_1stD=30.0, Off_TotYd=451.0, Def_1stD_All=18.0, Def_TotYd_All=343.0)
Pandas(Index=(1, 2015), Week='11', Date=Timestamp('2015-11-22 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=7.0, Tm_Pnts=34.0, Opp2_pnts=31.0, Off_1stD=21.0, Off_TotYd=383.0, Def_1stD_All=24.0, Def_TotYd_All=377.0)
Pandas(Index=(1, 2015), Week='12', Date=Timestamp('2015-11-29 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=29.0, Tm_Pnts=19.0, Opp2_pnts=13.0, Off_1stD=26.0, Off_TotYd=337.0, Def_1stD_All=17.0, Def_TotYd_All=368.0)
Pandas(Index=(1, 2015), Week='13', Date=Timestamp('2015-12-06 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=nan, Tm_Pnts=27.0, Opp2_pnts=3.0, Off_1stD=29.0, Off_TotYd=524.0, Def_1stD_All=9.0, Def_TotYd_All=212.0)
Pandas(Index=(1, 2015), Week='14', Date=Timestamp('2015-12-10 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=20.0, Tm_Pnts=23.0, Opp2_pnts=20.0, Off_1stD=22.0, Off_TotYd=393.0, Def_1stD_All=23.0, Def_TotYd_All=389.0)
Pandas(Index=(1, 2015), Week='15', Date=Timestamp('2015-12-20 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=26.0, Tm_Pnts=40.0, Opp2_pnts=17.0, Off_1stD=28.0, Off_TotYd=493.0, Def_1stD_All=19.0, Def_TotYd_All=424.0)
Pandas(Index=(1, 2015), Week='16', Date=Timestamp('2015-12-27 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=12.0, Tm_Pnts=38.0, Opp2_pnts=8.0, Off_1stD=19.0, Off_TotYd=381.0, Def_1stD_All=16.0, Def_TotYd_All=178.0)
Pandas(Index=(1, 2015), Week='17', Date=Timestamp('2015-01-03 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=28.0, Tm_Pnts=6.0, Opp2_pnts=36.0, Off_1stD=16.0, Off_TotYd=232.0, Def_1stD_All=22.0, Def_TotYd_All=354.0)
Pandas(Index=(1, 2016), W=7.0, L=8.0, W_L_Pct=0.469, PD=56.0, MoV=3.5, SoS=-1.9, SRS=1.6, OSRS=2.4, DSRS=-0.8)
Pandas(Index=(1, 2016), Week=1, Date=Timestamp('2016-09-11 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=21.0, Tm_Pnts=21.0, Opp2_pnts=23.0, Off_1stD=21.0, Off_TotYd=344.0, Def_1stD_All=19.0, Def_TotYd_All=363.0)
Pandas(Index=(1, 2016), Week=2, Date=Timestamp('2016-09-18 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=30.0, Tm_Pnts=40.0, Opp2_pnts=7.0, Off_1stD=20.0, Off_TotYd=416.0, Def_1stD_All=21.0, Def_TotYd_All=306.0)
Pandas(Index=(1, 2016), Week=3, Date=Timestamp('2016-09-25 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=4.0, Tm_Pnts=18.0, Opp2_pnts=33.0, Off_1stD=25.0, Off_TotYd=348.0, Def_1stD_All=16.0, Def_TotYd_All=297.0)
Pandas(Index=(1, 2016), Week=4, Date=Timestamp('2016-10-02 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=18.0, Tm_Pnts=13.0, Opp2_pnts=17.0, Off_1stD=26.0, Off_TotYd=420.0, Def_1stD_All=12.0, Def_TotYd_All=288.0)
Pandas(Index=(1, 2016), Week=5, Date=Timestamp('2016-10-06 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=29.0, Tm_Pnts=33.0, Opp2_pnts=21.0, Off_1stD=17.0, Off_TotYd=288.0, Def_1stD_All=25.0, Def_TotYd_All=286.0)
Pandas(Index=(1, 2016), Week=6, Date=Timestamp('2016-10-17 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=24.0, Tm_Pnts=28.0, Opp2_pnts=3.0, Off_1stD=28.0, Off_TotYd=396.0, Def_1stD_All=11.0, Def_TotYd_All=230.0)
Pandas(Index=(1, 2016), Week=7, Date=Timestamp('2016-10-23 00:00:00'), win_loss=nan, home_away=1.0, Opp1_team=28.0, Tm_Pnts=6.0, Opp2_pnts=6.0, Off_1stD=23.0, Off_TotYd=443.0, Def_1stD_All=11.0, Def_TotYd_All=257.0)
Pandas(Index=(1, 2016), Week=8, Date=Timestamp('2016-10-30 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=5.0, Tm_Pnts=20.0, Opp2_pnts=30.0, Off_1stD=22.0, Off_TotYd=340.0, Def_1stD_All=19.0, Def_TotYd_All=349.0)
Pandas(Index=(1, 2016), Week=10, Date=Timestamp('2016-11-13 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=29.0, Tm_Pnts=23.0, Opp2_pnts=20.0, Off_1stD=26.0, Off_TotYd=443.0, Def_1stD_All=15.0, Def_TotYd_All=281.0)
Pandas(Index=(1, 2016), Week=11, Date=Timestamp('2016-11-20 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=20.0, Tm_Pnts=24.0, Opp2_pnts=30.0, Off_1stD=24.0, Off_TotYd=290.0, Def_1stD_All=16.0, Def_TotYd_All=217.0)
Pandas(Index=(1, 2016), Week=12, Date=Timestamp('2016-11-27 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=2.0, Tm_Pnts=19.0, Opp2_pnts=38.0, Off_1stD=23.0, Off_TotYd=332.0, Def_1stD_All=28.0, Def_TotYd_All=360.0)
Pandas(Index=(1, 2016), Week=13, Date=Timestamp('2016-12-04 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=32.0, Tm_Pnts=31.0, Opp2_pnts=23.0, Off_1stD=24.0, Off_TotYd=369.0, Def_1stD_All=19.0, Def_TotYd_All=333.0)
Pandas(Index=(1, 2016), Week=14, Date=Timestamp('2016-12-11 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=19.0, Tm_Pnts=23.0, Opp2_pnts=26.0, Off_1stD=21.0, Off_TotYd=300.0, Def_1stD_All=15.0, Def_TotYd_All=314.0)
Pandas(Index=(1, 2016), Week=15, Date=Timestamp('2016-12-18 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=22.0, Tm_Pnts=41.0, Opp2_pnts=48.0, Off_1stD=26.0, Off_TotYd=425.0, Def_1stD_All=33.0, Def_TotYd_All=488.0)
Pandas(Index=(1, 2016), Week=16, Date=Timestamp('2016-12-24 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=28.0, Tm_Pnts=34.0, Opp2_pnts=31.0, Off_1stD=21.0, Off_TotYd=370.0, Def_1stD_All=24.0, Def_TotYd_All=391.0)
Pandas(Index=(1, 2016), Week=17, Date=Timestamp('2016-01-01 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=18.0, Tm_Pnts=44.0, Opp2_pnts=6.0, Off_1stD=21.0, Off_TotYd=344.0, Def_1stD_All=9.0, Def_TotYd_All=123.0)....
This is set up just how I would like: each year starts with the year averages in a row, followed by 17 rows of stats from each game that year. Now, when I try to put it into a DataFrame:
new = pd.DataFrame(app, index=["Tm_name", "Year"])
AssertionError: 10 cols passed, passed data had 12 cols
Could someone please help me? I've been playing with this for 2 weeks; I've tried multi-indexing, different merges, and concats, but I just can't seem to get it to look like the app array with no duplicates...
Thanks again
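For what it's worth, one way to sketch the interleaving without loops (assuming result and result2 share the ("Tm_name", "Year") MultiIndex built above; the 'kind' level name is illustrative):
combined = pd.concat([result, result2], keys=['season_avg', 'games'], names=['kind'])
# move the new level last, then sort by team/year only; the stable sort keeps
# each (Tm_name, Year) group's season_avg row ahead of its game rows
combined = (
    combined.reorder_levels(['Tm_name', 'Year', 'kind'])
            .sort_index(level=['Tm_name', 'Year'], sort_remaining=False, kind='stable')
)
concat aligns on the union of columns, so the averages row carries NaN in the per-game columns and vice versa, which matches the shape of the app list above.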
