How do I manipulate a DataFrame with pivot_table in Python
I have spent a lot of time on this but I am no closer to a solution.
I have a dataframe which outputs as:
RegionID AreaID Year Jan Feb Mar Apr May Jun
0 20.0 1.0 2020.0 1174.0 1056.0 1051.0 1107.0 1097.0 1118.0
1 19.0 2.0 2020.0 460.0 451.0 421.0 421.0 420.0 457.0
2 20.0 3.0 2020.0 2723.0 2594.0 2590.0 2399.0 2377.0 2331.0
3 21.0 4.0 2020.0 863.0 859.0 813.0 785.0 757.0 765.0
4 19.0 5.0 2020.0 4037.0 3942.0 4069.0 3844.0 3567.0 3721.0
5 19.0 6.0 2020.0 1695.0 1577.0 1531.0 1614.0 1671.0 1693.0
6 18.0 7.0 2020.0 1757.0 1505.0 1445.0 1514.0 1406.0 1444.0
7 18.0 8.0 2020.0 832.0 721.0 747.0 852.0 885.0 872.0
8 18.0 9.0 2020.0 2538.0 2000.0 2026.0 1981.0 1987.0 1949.0
9 21.0 10.0 2020.0 1145.0 1235.0 1114.0 1161.0 1150.0 1189.0
10 20.0 11.0 2020.0 551.0 497.0 503.0 472.0 505.0 532.0
11 19.0 12.0 2020.0 1664.0 1526.0 1389.0 1373.0 1384.0 1404.0
12 21.0 13.0 2020.0 381.0 351.0 299.0 286.0 297.0 319.0
13 21.0 14.0 2020.0 1733.0 1627.0 1567.0 1561.0 1498.0 1511.0
14 18.0 15.0 2020.0 1257.0 1257.0 1160.0 1172.0 1124.0 1113.0
I want to unpivot this data so that the months are combined into a single field, like below:
RegionID AreaID Year Month Amount
20.0 1.0 2020 Jan 1174
20.0 1.0 2020 Feb 1056
20.0 1.0 2020 Mar 1051
Can this be done using pandas? I have been trying with pivot_table but I can't get it to work.
I hope I've understood your question well. You can .set_index() and then .stack():
print(
    df.set_index(["RegionID", "AreaID", "Year"])
    .stack()
    .reset_index()
    .rename(columns={"level_3": "Month", 0: "Amount"})
)
Prints:
RegionID AreaID Year Month Amount
0 20.0 1.0 2020.0 Jan 1174.0
1 20.0 1.0 2020.0 Feb 1056.0
2 20.0 1.0 2020.0 Mar 1051.0
3 20.0 1.0 2020.0 Apr 1107.0
4 20.0 1.0 2020.0 May 1097.0
5 20.0 1.0 2020.0 Jun 1118.0
6 19.0 2.0 2020.0 Jan 460.0
7 19.0 2.0 2020.0 Feb 451.0
8 19.0 2.0 2020.0 Mar 421.0
9 19.0 2.0 2020.0 Apr 421.0
10 19.0 2.0 2020.0 May 420.0
11 19.0 2.0 2020.0 Jun 457.0
...
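As a side note (not part of the original answer), the level_3 rename can be avoided by naming the column axis before stacking; Series.reset_index accepts a name for the values column:

print(
    df.set_index(["RegionID", "AreaID", "Year"])
    .rename_axis(columns="Month")   # name the column axis so stack() keeps it
    .stack()
    .reset_index(name="Amount")
)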
Or:
print(
    df.melt(
        ["RegionID", "AreaID", "Year"], var_name="Month", value_name="Amount"
    )
)
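Both produce the same long format. If the result later needs to be in calendar order, note that string months sort alphabetically; one option (again, not part of the original answer) is an ordered categorical:

import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
out = df.melt(["RegionID", "AreaID", "Year"], var_name="Month", value_name="Amount")
# Ordered categorical so sorting follows calendar order, not alphabetical order.
out["Month"] = pd.Categorical(out["Month"], categories=months, ordered=True)
out = out.sort_values(["RegionID", "AreaID", "Year", "Month"])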
Related
How to do the equivalent of a conditional COUNTIFS on a dataframe
I am trying to replicate COUNTIFS in Excel to get a rank between two unique values that are listed in my dataframe. I have attached the expected output, calculated in Excel using COUNTIF and LET/RANK functions. I am trying to generate the "average rank of gas and coal plants" column: it takes the number from the "Average Rank" column and then ranks the two unique types from Technology (CCGT or COAL) into two new ranks (Gas or Coal), so that I can then get the relevant quantiles. In case you are wondering why I would need this when there are only two coal plants: when I run this model on a larger dataset it will be useful to know how to do it in code and not manually on my dataset. Ideally the output will return two ranks: 1-47 for all units with Technology == CCGT and 1-2 for all units with Technology == COAL. This is the column I am looking to make:

Unit ID Technology 03/01/2022 04/01/2022 05/01/2022 06/01/2022 07/01/2022 08/01/2022 Average Rank Unit Rank Avg Rank of Gas & Coal plants Gas Quintiles Coal Quintiles Quintiles
FAWN-1 CCGT 1.0 5.0 1.0 5.0 2.0 1.0 2.5 1 1 1 0 Gas_1
GRAI-6 CCGT 4.0 18.0 2.0 4.0 3.0 3.0 5.7 2 2 1 0 Gas_1
EECL-1 CCGT 5.0 29.0 4.0 1.0 1.0 2.0 7.0 3 3 1 0 Gas_1
PEMB-21 CCGT 7.0 1.0 6.0 13.0 8.0 8.0 7.2 4 4 1 0 Gas_1
PEMB-51 CCGT 3.0 3.0 3.0 11.0 16.0 7.2 5 5 1 0 Gas_1
PEMB-41 CCGT 9.0 4.0 7.0 7.0 10.0 13.0 8.3 6 6 1 0 Gas_1
WBURB-1 CCGT 6.0 9.0 22.0 2.0 7.0 5.0 8.5 7 7 1 0 Gas_1
PEMB-31 CCGT 14.0 6.0 13.0 6.0 4.0 9.0 8.7 8 8 1 0 Gas_1
GRMO-1 CCGT 2.0 7.0 10.0 24.0 11.0 6.0 10.0 9 9 1 0 Gas_1
PEMB-11 CCGT 21.0 2.0 9.0 10.0 9.0 14.0 10.8 10 10 2 0 Gas_2
STAY-1 CCGT 19.0 12.0 5.0 23.0 6.0 7.0 12.0 11 11 2 0 Gas_2
GRAI-7 CCGT 10.0 27.0 15.0 9.0 15.0 11.0 14.5 12 12 2 0 Gas_2
DIDCB6 CCGT 28.0 11.0 11.0 8.0 19.0 15.0 15.3 13 13 2 0 Gas_2
SCCL-3 CCGT 17.0 16.0 31.0 3.0 18.0 10.0 15.8 14 14 2 0 Gas_2
STAY-4 CCGT 12.0 8.0 20.0 18.0 14.0 23.0 15.8 14 14 2 0 Gas_2
CDCL-1 CCGT 13.0 22.0 8.0 25.0 12.0 16.0 16.0 16 16 2 0 Gas_2
STAY-3 CCGT 8.0 17.0 17.0 20.0 13.0 22.0 16.2 17 17 2 0 Gas_2
MRWD-1 CCGT 19.0 26.0 5.0 19.0 17.3 18 18 2 0 Gas_2
WBURB-3 CCGT 24.0 14.0 17.0 17.0 18.0 19 19 3 0 Gas_3
WBURB-2 CCGT 14.0 21.0 12.0 31.0 18.0 19.2 20 20 3 0 Gas_3
GYAR-1 CCGT 26.0 14.0 17.0 20.0 21.0 19.6 21 21 3 0 Gas_3
STAY-2 CCGT 18.0 20.0 18.0 21.0 24.0 20.0 20.2 22 22 3 0 Gas_3
KLYN-A-1 CCGT 24.0 12.0 19.0 27.0 20.5 23 23 3 0 Gas_3
SHOS-1 CCGT 16.0 15.0 28.0 15.0 29.0 27.0 21.7 24 24 3 0 Gas_3
DIDCB5 CCGT 10.0 35.0 22.0 22.3 25 25 3 0 Gas_3
CARR-1 CCGT 33.0 26.0 27.0 22.0 4.0 22.4 26 26 3 0 Gas_3
LAGA-1 CCGT 15.0 13.0 29.0 32.0 23.0 24.0 22.7 27 27 3 0 Gas_3
CARR-2 CCGT 24.0 25.0 27.0 29.0 21.0 12.0 23.0 28 28 3 0 Gas_3
GRAI-8 CCGT 11.0 28.0 36.0 16.0 26.0 25.0 23.7 29 29 4 0 Gas_4
SCCL-2 CCGT 29.0 16.0 28.0 25.0 24.5 30 30 4 0 Gas_4
LBAR-1 CCGT 19.0 25.0 31.0 28.0 25.8 31 31 4 0 Gas_4
CNQPS-2 CCGT 20.0 32.0 32.0 26.0 27.5 32 32 4 0 Gas_4
SPLN-1 CCGT 23.0 30.0 30.0 27.7 33 33 4 0 Gas_4
DAMC-1 CCGT 23.0 21.0 38.0 34.0 29.0 34 34 4 0 Gas_4
KEAD-2 CCGT 30.0 30.0 35 35 4 0 Gas_4
SHBA-1 CCGT 26.0 23.0 35.0 37.0 30.3 36 36 4 0 Gas_4
HUMR-1 CCGT 22.0 30.0 37.0 37.0 33.0 28.0 31.2 37 37 4 0 Gas_4
CNQPS-4 CCGT 27.0 33.0 35.0 30.0 31.3 38 38 5 0 Gas_5
CNQPS-1 CCGT 25.0 40.0 33.0 32.7 39 39 5 0 Gas_5
SEAB-1 CCGT 32.0 34.0 36.0 29.0 32.8 40 40 5 0 Gas_5
PETEM1 CCGT 35.0 35.0 41 41 5 0 Gas_5
ROCK-1 CCGT 31.0 34.0 38.0 38.0 35.3 42 42 5 0 Gas_5
SEAB-2 CCGT 31.0 39.0 39.0 34.0 35.8 43 43 5 0 Gas_5
WBURB-43 COAL 32.0 37.0 40.0 39.0 31.0 35.8 44 1 0 1 Coal_1
FDUNT-1 CCGT 36.0 36.0 45 44 5 0 Gas_5
COSO-1 CCGT 30.0 42.0 36.0 36.0 45 44 5 0 Gas_5
WBURB-41 COAL 33.0 38.0 41.0 40.0 32.0 36.8 47 2 0 1 Coal_1
FELL-1 CCGT 34.0 39.0 43.0 41.0 33.0 38.0 48 46 5 0 Gas_5
KEAD-1 CCGT 43.0 43.0 49 47 5 0 Gas_5

I have tried to do it the same way I got "average rank" (which is a rank of the average of inputs in the dataframe), but it doesn't seem to work with additional conditions. Thank you!!
import pandas as pd

df = pd.read_csv("gas.csv")
display(df['Technology'].value_counts())
print('------')
display(df['Technology'].value_counts()[0])  # This is how you access count of CCGT
display(df['Technology'].value_counts()[1])

Output:

CCGT    47
COAL     2
Name: Technology, dtype: int64
------
47
2

By the way: pd.cut or pd.qcut can be used to calculate quantiles. You don't have to manually define what a quantile is. Refer to the documentation and other websites:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html
https://www.geeksforgeeks.org/how-to-use-pandas-cut-and-qcut/

There are many methods you can pass to rank. Refer to the documentation:
https://pandas.pydata.org/docs/reference/api/pandas.Series.rank.html

df['rank'] = df.groupby("Technology")["Average Rank"].rank(method="dense", ascending=True)
df

method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
How to rank the group of records that have the same value (i.e. ties):
average: average rank of the group
min: lowest rank in the group
max: highest rank in the group
first: ranks assigned in order they appear in the array
dense: like 'min', but rank always increases by 1 between groups.
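For the quintile labels themselves, a possible sketch using pd.qcut within each technology group (the "Technology" and "Average Rank" column names are taken from the question; the +1 just makes the bin labels 1-based):

import pandas as pd

df = pd.read_csv("gas.csv")  # assumes the same file as above
# Quintile (1-5) of each unit's average rank, computed separately per technology.
df["Quintile"] = (
    df.groupby("Technology")["Average Rank"]
      .transform(lambda s: pd.qcut(s, 5, labels=False, duplicates="drop") + 1)
)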
What is the best way to create a new dataframe from existing ones of different shapes and criteria?
I have a few dataframes that I have made through various sorting and processing of data from the main dataframe (df1). df1 is large; it currently covers 6 days' worth of data at 30-minute intervals, but I wish to scale up to longer periods:

import pandas as pd
import numpy as np

bmu_units = pd.read_csv('bmu_units_technology.csv')
b1610 = pd.read_csv('b1610_df.csv')
b1610 = (b1610.merge(bmu_units, on=['BM Unit ID 1'], how='left'))
b1610['% of capacity running'] = b1610.quantity / b1610.Capacity

def func(tech):
    if tech in ["CCGT", "OCGT", "COAL"]:
        return "Fossil"
    else:
        return "ZE"

b1610["Type"] = b1610['Technology'].apply(func)

settlementDate time BM Unit ID 1 BM Unit ID 2_x settlementPeriod quantity BM Unit ID 2_y Capacity Technology % of capacity running Type
0 03/01/2022 00:00:00 RCBKO-1 T_RCBKO-1 1 278.658 T_RCBKO-1 279.0 WIND 0.998774 ZE
1 03/01/2022 00:00:00 LARYO-3 T_LARYW-3 1 162.940 T_LARYW-3 180.0 WIND 0.905222 ZE
2 03/01/2022 00:00:00 LAGA-1 T_LAGA-1 1 262.200 T_LAGA-1 905.0 CCGT 0.289724 Fossil
3 03/01/2022 00:00:00 CRMLW-1 T_CRMLW-1 1 3.002 T_CRMLW-1 47.0 WIND 0.063872 ZE
4 03/01/2022 00:00:00 GRIFW-1 T_GRIFW-1 1 9.972 T_GRIFW-1 102.0 WIND 0.097765 ZE
... ... ... ... ... ... ... ... ... ... ...
52533 08/01/2022 23:30:00 CRMLW-1 T_CRMLW-1 48 8.506 T_CRMLW-1 47.0 WIND 0.180979 ZE
52534 08/01/2022 23:30:00 LARYO-4 T_LARYW-4 48 159.740 T_LARYW-4 180.0 WIND 0.887444 ZE
52535 08/01/2022 23:30:00 HOWBO-3 T_HOWBO-3 48 32.554 T_HOWBO-3 440.0 Offshore Wind 0.073986 ZE
52536 08/01/2022 23:30:00 BETHW-1 E_BETHW-1 48 5.010 E_BETHW-1 30.0 WIND 0.167000 ZE
52537 08/01/2022 23:30:00 HMGTO-1 T_HMGTO-1 48 92.094 HMGTO-1 108.0 WIND 0.852722 ZE

df2:

rank = (
    b1610.pivot_table(
        index=['settlementDate', 'BM Unit ID 1', 'Technology'],
        columns='settlementPeriod',
        values='% of capacity running',
        aggfunc=sum,
        fill_value=0)
)
rank['rank of capacity'] = rank.sum(axis=1)
rank

settlementPeriod 1 2 3 4 5 6 7 8 9 10 ... 40 41 42 43 44 45 46 47 48 rank of capacity
settlementDate BM Unit ID 1 Technology
03/01/2022 ABRBO-1 WIND 0.936970 0.969293 0.970909 0.925051 0.885657 0.939394 0.963434 0.938586 0.863232 0.781212 ... 0.461818 0.394545 0.428889 0.537172 0.520606 0.545253 0.873333 0.697778 0.651111 29.566263
ABRTW-1 WIND 0.346389 0.343333 0.345389 0.341667 0.342222 0.346778 0.347611 0.347722 0.346833 0.340556 ... 0.018778 0.015889 0.032056 0.043056 0.032167 0.109611 0.132111 0.163278 0.223556 10.441333
ACHRW-1 WIND 0.602884 0.575628 0.602140 0.651070 0.667721 0.654791 0.539209 0.628698 0.784233 0.782140 ... 0.174419 0.148465 0.139860 0.091535 0.094698 0.272419 0.205023 0.184651 0.177628 18.517814
AKGLW-2 WIND 0.000603 0.000603 0.000603 0.000635 0.000603 0.000635 0.000635 0.000635 0.000635 0.000603 ... 0.191079 0.195079 0.250476 0.281048 0.290000 0.279524 0.358508 0.452698 0.572730 8.616032
ANSUW-1 WIND 0.889368 0.865053 0.915684 0.894000 0.888526 0.858211 0.875158 0.878421 0.809368 0.898737 ... 0.142632 0.212526 0.276421 0.225053 0.235789 0.228000 0.152211 0.226000 0.299158 19.662421
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
08/01/2022 WBURB-2 CCGT 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.636329 0.642447 0.961835 0.908706 0.650212 0.507012 0.513176 0.503576 0.518212 24.439765
HOWBO-3 Offshore Wind 0.030418 0.026355 0.026595 0.014373 0.012523 0.008418 0.010977 0.016918 0.019127 0.025641 ... 0.055509 0.063845 0.073850 0.073923 0.073895 0.073791 0.073886 0.074050 0.073986 2.332809
MRWD-1 CCGT 0.808043 0.894348 0.853043 0.650870 0.159783 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.701739 0.488913 0.488913 0.489348 0.489130 0.392826 0.079130 0.000000 0.000000 23.485217
WBURB-3 CCGT 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.771402 0.699986 0.648386 0.919242 0.759520 0.424513 0.430598 0.420089 0.436376 25.436282
DRAXX-4 BIOMASS 0.706074 0.791786 0.806713 0.806462 0.806270 0.806136 0.806509 0.806369 0.799749 0.825070 ... 0.777395 0.816093 0.707122 0.666639 0.680406 0.679216 0.501433 0.000000 0.000000 36.576512

df3 - this was made by sorting the above dataframe to list sums for each day for each BM Unit ID, filtered for specific technology types:

BM Unit ID 1 Technology 03/01/2022 04/01/2022 05/01/2022 06/01/2022 07/01/2022 08/01/2022 ave rank rank
0 FAWN-1 CCGT 1.0 5.0 1.0 5.0 2.0 1.0 2.500000 1.0
1 GRAI-6 CCGT 4.0 18.0 2.0 4.0 3.0 3.0 5.666667 2.0
2 EECL-1 CCGT 5.0 29.0 4.0 1.0 1.0 2.0 7.000000 3.0
3 PEMB-21 CCGT 7.0 1.0 6.0 13.0 8.0 8.0 7.166667 4.0
4 PEMB-51 CCGT 3.0 3.0 3.0 11.0 16.0 NaN 7.200000 5.0
5 PEMB-41 CCGT 9.0 4.0 7.0 7.0 10.0 13.0 8.333333 6.0
6 WBURB-1 CCGT 6.0 9.0 22.0 2.0 7.0 5.0 8.500000 7.0
7 PEMB-31 CCGT 14.0 6.0 13.0 6.0 4.0 9.0 8.666667 8.0
8 GRMO-1 CCGT 2.0 7.0 10.0 24.0 11.0 6.0 10.000000 9.0
9 PEMB-11 CCGT 21.0 2.0 9.0 10.0 9.0 14.0 10.833333 10.0
10 STAY-1 CCGT 19.0 12.0 5.0 23.0 6.0 7.0 12.000000 11.0
11 GRAI-7 CCGT 10.0 27.0 15.0 9.0 15.0 11.0 14.500000 12.0
12 DIDCB6 CCGT 28.0 11.0 11.0 8.0 19.0 15.0 15.333333 13.0
13 STAY-4 CCGT 12.0 8.0 20.0 18.0 14.0 23.0 15.833333 14.0
14 SCCL-3 CCGT 17.0 16.0 31.0 3.0 18.0 10.0 15.833333 14.0
15 CDCL-1 CCGT 13.0 22.0 8.0 25.0 12.0 16.0 16.000000 15.0
16 STAY-3 CCGT 8.0 17.0 17.0 20.0 13.0 22.0 16.166667 16.0
17 MRWD-1 CCGT NaN NaN 19.0 26.0 5.0 19.0 17.250000 17.0
18 WBURB-3 CCGT NaN NaN 24.0 14.0 17.0 17.0 18.000000 18.0
19 WBURB-2 CCGT NaN 14.0 21.0 12.0 31.0 18.0 19.200000 19.0
20 GYAR-1 CCGT NaN 26.0 14.0 17.0 20.0 21.0 19.600000 20.0
21 STAY-2 CCGT 18.0 20.0 18.0 21.0 24.0 20.0 20.166667 21.0
22 SHOS-1 CCGT 16.0 15.0 28.0 15.0 29.0 27.0 21.666667 22.0
23 KLYN-A-1 CCGT NaN 24.0 12.0 19.0 27.0 29.0 22.200000 23.0
24 DIDCB5 CCGT NaN 10.0 35.0 22.0 NaN NaN 22.333333 24.0
25 CARR-1 CCGT NaN 33.0 26.0 27.0 22.0 4.0 22.400000 25.0
26 LAGA-1 CCGT 15.0 13.0 29.0 32.0 23.0 24.0 22.666667 26.0
27 CARR-2 CCGT 24.0 25.0 27.0 29.0 21.0 12.0 23.000000 27.0
28 GRAI-8 CCGT 11.0 28.0 36.0 16.0 26.0 25.0 23.666667 28.0
29 SCCL-2 CCGT 29.0 NaN 16.0 28.0 25.0 NaN 24.500000 29.0
30 LBAR-1 CCGT NaN 19.0 25.0 31.0 28.0 NaN 25.750000 30.0
31 CNQPS-2 CCGT 20.0 NaN 32.0 NaN 32.0 26.0 27.500000 31.0
32 SPLN-1 CCGT NaN NaN 23.0 30.0 30.0 NaN 27.666667 32.0
33 CNQPS-1 CCGT 25.0 NaN 33.0 NaN NaN NaN 29.000000 33.0
34 DAMC-1 CCGT 23.0 21.0 38.0 34.0 NaN NaN 29.000000 33.0
35 KEAD-2 CCGT 30.0 NaN NaN NaN NaN NaN 30.000000 34.0
36 HUMR-1 CCGT 22.0 30.0 37.0 37.0 33.0 28.0 31.166667 35.0
37 SHBA-1 CCGT 26.0 23.0 40.0 35.0 37.0 NaN 32.200000 36.0
38 SEAB-1 CCGT NaN 32.0 34.0 36.0 NaN 30.0 33.000000 37.0
39 CNQPS-4 CCGT 27.0 NaN 41.0 33.0 35.0 31.0 33.400000 38.0
40 PETEM1 CCGT NaN 35.0 NaN NaN NaN NaN 35.000000 39.0
41 SEAB-2 CCGT NaN 31.0 39.0 39.0 34.0 NaN 35.750000 40.0
42 COSO-1 CCGT NaN NaN 30.0 42.0 36.0 NaN 36.000000 41.0
43 ROCK-1 CCGT 31.0 34.0 42.0 38.0 38.0 NaN 36.600000 42.0
44 WBURB-43 COAL 32.0 37.0 45.0 40.0 39.0 32.0 37.500000 43.0
45 WBURB-41 COAL 33.0 38.0 46.0 41.0 40.0 33.0 38.500000 44.0
46 FELL-1 CCGT 34.0 39.0 47.0 43.0 41.0 34.0 39.666667 45.0
47 FDUNT-1 OCGT NaN 36.0 44.0 NaN NaN NaN 40.000000 46.0
48 KEAD-1 CCGT NaN NaN 43.0 NaN NaN NaN 43.000000 47.0

My issue is that I am trying to create a new dataframe, using the existing dataframes listed above, in which I can list all my BM Unit ID 1's in order of rank from df2, while populating the values with means of the values for all dates (not split by date) in df1. An example of what I am after is below, which I made in Excel using INDEX/MATCH. Here I have the results for each settlement period from df1 and df2, but instead of being split by date they are an aggregated mean over all dates in the df; they are still ranked according to the last column of df2, which is key.

Desired Output:

BM Unit ID Technology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 Rank Capacity
1 150 FAWN-1 CCGT 130.43 130.93 130.78 130.58 130.57 130.54 130.71 130.87 130.89 130.98 130.83 130.80 130.88 131.02 130.81 130.65 130.86 130.84 131.19 130.60 130.69 130.70 130.40 130.03 130.13 130.03 129.75
2 455 GRAI-6 CCGT 339.45 342.33 322.53 312.40 303.78 307.60 316.35 277.18 293.48 325.75 326.75 271.34 299.74 328.06 317.12 342.66 364.50 390.90 403.32 411.52 400.18 405.94 394.04 400.08 389.08 382.74 374.76
3 408 EECL-1 CCGT 363.31 386.71 364.46 363.31 363.31 363.38 361.87 305.06 286.99 282.74 323.93 242.88 242.64 207.73 294.71 357.15 383.47 426.93 433.01 432.98 435.14 436.38 416.04 417.69 430.42 415.09 406.45
4 430 PEMB-21 CCGT 334.40 419.50 436.70 441.90 440.50 415.80 327.90 323.70 322.70 331.10 367.50 368.40 396.70 259.05 415.95 356.32 386.84 400.00 429.52 435.40 434.84 435.88 435.60 438.48 438.16 437.84 437.76
5 465 PEMB-51 CCGT 370.65 370.45 359.90 326.25 326.20 322.65 324.60 274.25 319.55 288.80 301.75 279.08 379.60 376.76 389.92 419.24 403.64 420.92 428.20 421.32 396.92 397.80 424.40 433.92 434.56 431.44 434.40
6 445 PEMB-41 CCGT 337.00 423.40 423.10 427.50 427.00 419.00 361.00 318.80 263.20 226.70 268.70 231.35 366.90 378.35 392.20 421.55 354.96 382.48 422.64 428.28 428.76 431.24 431.92 431.84 429.52 429.00 431.48
7 425 WBURB-1 CCGT 240.41 293.17 252.27 256.51 261.65 253.44 247.14 217.08 223.11 199.27 254.69 314.16 361.07 317.50 259.54 266.83 349.64 383.43 408.18 412.29 395.54 383.48 355.98 340.49 360.87 352.74 376.92
8 465 PEMB-31 CCGT 297.73 360.27 355.40 357.07 358.67 353.07 300.93 284.73 268.73 255.20 248.53 257.75 366.75 376.45 396.40 320.56 342.68 352.52 361.16 379.40 386.64 390.36 409.12 427.48 426.60 426.80 427.16
9 144 GRMO-1 CCGT 106.62 106.11 105.96 106.00 106.00 105.98 105.99 105.90 105.47 105.31 105.28 105.07 105.04 105.06 105.06 105.04 105.06 105.06 105.07 105.04 105.05 105.06 105.04 105.04 105.04 105.06 105.07
10 430 PEMB-11 CCGT 432.80 430.40 430.70 431.90 432.10 429.30 430.00 408.30 320.90 346.50 432.90 432.20 312.93 297.20 414.55 432.00 420.40 429.80 402.60 426.90 430.65 435.85 435.10 431.15 435.20 431.50 431.75
11 457 STAY-1 CCGT 216.07 223.27 232.67 243.47 234.67 221.73 227.00 219.00 237.00 218.33 250.73 228.27 219.67 142.68 243.00 300.64 312.28 331.00 360.84 379.28 398.92 410.04 410.56 409.24 411.96 408.84 411.88
12 455 GRAI-7 CCGT 425.20 425.40 377.90 339.40 342.00 329.80 408.00 402.40 329.00 257.30 130.43 211.37 262.60 318.45 299.98 324.72 350.40 386.26 394.20 402.10 390.48 401.22 388.94 394.10 395.14 379.70 377.26
13 710 DIDCB6 CCGT 465.80 459.50 411.60 411.70 413.70 410.80 351.50 333.40 333.70 390.40 234.60 265.56 348.16 430.28 524.32 554.04 536.28 589.28 594.04 597.72 592.76 557.86 687.70 687.25 687.35 687.25 679.80
14 400 SCCL-3 CCGT 311.50 337.40 378.80 311.50 381.30 338.60 302.70 300.70 300.60 300.70 338.20 321.50 363.80 260.35 228.18 308.70 334.73 324.60 354.63 362.38 347.30 306.22 346.86 365.04 365.40 370.68 370.52
400 SCCL-3 CCGT 311.50 337.40 378.80 311.50 381.30 338.60 302.70 300.70 300.60 300.70 338.20 321.50 363.80 260.35 228.18 308.70 334.73 324.60 354.63 362.38 347.30 306.22 346.86 365.04 365.40 370.68 370.52
16 440 CDCL-1 CCGT 270.63 255.24 210.87 197.10 195.12 198.72 197.64 198.99 233.19 221.31 176.94 317.52 280.68 213.12 297.68 342.25 397.26 372.28 371.74 379.87 347.51 348.48 352.15 384.88 395.14 381.02 360.40
17 457 STAY-3 CCGT 311.25 311.30 311.60 311.45 311.15 311.30 308.40 313.10 223.90 196.05 242.95 172.87 217.40 236.84 252.92 352.98 384.06 414.76 403.68 424.90 418.38 403.00 420.26 424.40 427.06 421.64 424.66
18 920 MRWD-1 CCGT 468.70 483.90 420.60 267.80 472.60 470.20 241.40 299.30 327.70 327.80 336.90 241.60 308.33 529.93 793.73 828.40 870.67 846.67 827.07 855.93 829.33 865.87 870.40 846.87 765.47 785.20 824.00
19 425 WBURB-3 CCGT 311.73 427.68 333.68 333.93 370.68 335.09 420.85 433.86 370.45 321.70 340.54 300.95 155.47 190.67 290.81 310.43 332.52 376.63 391.11 413.74 408.33 398.69 397.54 368.05 410.64 413.05 428.91
20 425 WBURB-2 CCGT 295.54 424.56 336.68 334.08 371.20 358.44 358.90 358.96 377.94 325.42 203.19 165.32 205.75 121.41 162.51 180.15 301.12 413.77 410.33 397.21 385.59 378.09 381.50 380.93 413.71 418.53 427.09
21 420 GYAR-1 CCGT 404.33 404.33 403.73 405.12 404.13 404.33 404.33 376.98 218.02 218.02 351.01 215.10 177.46 222.43 345.47 398.94 401.97 401.97 402.17 401.87 401.47 401.77 401.62 402.51 402.31 402.41 402.26
22 457 STAY-2 CCGT 434.20 435.40 435.40 435.20 434.20 434.20 434.20 434.60 249.80 196.20 291.20 234.80 196.80 88.73 167.10 239.52 324.52 372.80 412.40 423.32 424.04 423.96 423.92 424.08 423.88 420.96 422.44
23 400 KLYN-A-1 CCGT 382.58 382.50 384.94 385.81 385.83 385.79 385.02 384.94 259.16 141.03 195.65 205.75 278.81 256.95 296.85 337.82 369.26 376.38 376.84 376.56 376.30 376.09 375.62 375.45 375.11 375.17 375.09
24 420 SHOS-1 CCGT 290.63 326.33 229.60 265.70 269.05 259.40 299.45 310.20 301.65 266.00 307.90 319.30 253.06 246.85 263.04 220.46 277.68 297.84 290.62 297.86 302.83 295.13 293.73 289.04 306.14 314.24 321.76
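Not an answer from the original thread, but a sketch of one possible approach, assuming b1610 (df1) and df3 exist exactly as shown above, and that the desired values are means of df1's "quantity" column (the MW-like numbers in the example output):

import pandas as pd

# Mean value per unit and settlement period, aggregated over all dates.
mean_by_period = b1610.pivot_table(
    index=["BM Unit ID 1", "Technology"],  # one row per unit
    columns="settlementPeriod",
    values="quantity",
    aggfunc="mean",                        # mean over all dates, not per date
)
# Reorder the rows according to df3's "rank" column.
order = df3.sort_values("rank")["BM Unit ID 1"].tolist()
ranked = mean_by_period.reindex(order, level="BM Unit ID 1")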
How to concatenate variable string data to a row in a dataframe based on numeric value
I have a pandas dataframe, result, which looks like this:

Weekday Day Store1 Store2 Store3 Store4 Store5
0 Mon 6 0.0 0.0 0.0 0.0 0.0
1 Tue 7 42.0 33.0 23.0 42.0 21.0
2 Wed 8 43.0 29.0 13.0 33.0 22.0
3 Thu 9 45.0 24.0 20.0 29.0 18.0
4 Fri 10 48.0 21.0 22.0 37.0 22.0
5 Sat 11 34.0 22.0 23.0 34.0 18.0
0 Mon 13 39.0 21.0 21.0 25.0 21.0
1 Tue 14 39.0 20.0 18.0 0.0 19.0
2 Wed 15 46.0 26.0 18.0 31.0 24.0
3 Thu 16 38.0 21.0 15.0 45.0 29.0
4 Fri 17 42.0 21.0 21.0 41.0 20.0
5 Sat 18 40.0 25.0 15.0 36.0 19.0
0 Mon 20 39.0 22.0 23.0 36.0 19.0
1 Tue 21 31.0 18.0 16.0 35.0 23.0
2 Wed 22 33.0 25.0 17.0 39.0 22.0
3 Thu 23 34.0 24.0 19.0 18.0 27.0
4 Fri 24 33.0 18.0 24.0 43.0 24.0
5 Sat 25 38.0 22.0 20.0 40.0 12.0
0 Mon 27 41.0 21.0 18.0 31.0 23.0
1 Tue 28 32.0 21.0 14.0 23.0 14.0
2 Wed 29 33.0 18.0 15.0 19.0 23.0
3 Thu 30 36.0 21.0 21.0 23.0 18.0
4 Fri 1 40.0 30.0 24.0 38.0 23.0
5 Sat 2 40.0 19.0 22.0 38.0 21.0

Notice how Day goes from 6 to 30, then back to 1 and 2. In this example, it's referring to September 6, 2021 - October 2, 2021. I currently have variables PrimaryMonth = September and SecondaryMonth = October. I know that I can do result['Month'] = 'September', but that will list all the Month values as September. I'd like to find a way, if possible, to iterate through the rows so that when it reaches the bottom 1 and 2 it will show October in the new Month column. Is it possible to do a for loop or some other iteration to accomplish this? I was initially brainstorming some pseudocode:

# for row in result:
#     while Day <= 31
#         concat PrimaryMonth
#     else concat SecondaryMonth

You can kind of get an idea of where I want to go with this.
Many things are easier if you use proper date formats...

date_str = 'Monday, September 6, 2021 - Saturday, October 2, 2021'
new_index = pd.date_range(*map(pd.to_datetime, date_str.split(' - ')))
dates = pd.DataFrame(index=new_index)
dates['day'] = dates.index.day
dates.columns = ['Day']
df = pd.merge(dates, df, 'outer')
df.index = dates.index
df['month'] = df.index.month_name()
print(df.dropna())

Output:

Day Weekday Store1 Store2 Store3 Store4 Store5 month
2021-09-06 6 Mon 0.0 0.0 0.0 0.0 0.0 September
2021-09-07 7 Tue 42.0 33.0 23.0 42.0 21.0 September
2021-09-08 8 Wed 43.0 29.0 13.0 33.0 22.0 September
2021-09-09 9 Thu 45.0 24.0 20.0 29.0 18.0 September
2021-09-10 10 Fri 48.0 21.0 22.0 37.0 22.0 September
2021-09-11 11 Sat 34.0 22.0 23.0 34.0 18.0 September
2021-09-13 13 Mon 39.0 21.0 21.0 25.0 21.0 September
2021-09-14 14 Tue 39.0 20.0 18.0 0.0 19.0 September
2021-09-15 15 Wed 46.0 26.0 18.0 31.0 24.0 September
2021-09-16 16 Thu 38.0 21.0 15.0 45.0 29.0 September
2021-09-17 17 Fri 42.0 21.0 21.0 41.0 20.0 September
2021-09-18 18 Sat 40.0 25.0 15.0 36.0 19.0 September
2021-09-20 20 Mon 39.0 22.0 23.0 36.0 19.0 September
2021-09-21 21 Tue 31.0 18.0 16.0 35.0 23.0 September
2021-09-22 22 Wed 33.0 25.0 17.0 39.0 22.0 September
2021-09-23 23 Thu 34.0 24.0 19.0 18.0 27.0 September
2021-09-24 24 Fri 33.0 18.0 24.0 43.0 24.0 September
2021-09-25 25 Sat 38.0 22.0 20.0 40.0 12.0 September
2021-09-27 27 Mon 41.0 21.0 18.0 31.0 23.0 September
2021-09-28 28 Tue 32.0 21.0 14.0 23.0 14.0 September
2021-09-29 29 Wed 33.0 18.0 15.0 19.0 23.0 September
2021-09-30 30 Thu 36.0 21.0 21.0 23.0 18.0 September
2021-10-01 1 Fri 40.0 30.0 24.0 38.0 23.0 October
2021-10-02 2 Sat 40.0 19.0 22.0 38.0 21.0 October

And no, no matter what you do, a for-loop is probably the wrong answer when it comes to pandas.
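If you only have the day numbers and the two month variables, the rollover can also be detected directly. This sketch is not from the original answer and assumes a single month rollover within the frame:

import pandas as pd

PrimaryMonth, SecondaryMonth = 'September', 'October'
# A day number lower than the previous row's marks the rollover to the next month.
rollover = result['Day'].diff().lt(0).cumsum()   # 0 before the drop, 1 after
result['Month'] = rollover.map({0: PrimaryMonth, 1: SecondaryMonth})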
Calculating age from dataframe (dob -y/m/d)
I'm trying to add a column "Age" to my data:

number of purchased hours(mins) dob Y dob M dob D
0 7200 2010.0 10.0 12.0
1 7320 2010.0 6.0 2.0
2 5400 2011.0 6.0 18.0
3 9180 2009.0 10.0 18.0
4 3102 2007.0 7.0 30.0
5 5400 2011.0 4.0 6.0
6 9000 2009.0 8.0 5.0
7 6000 2004.0 2.0 7.0
8 6000 2007.0 8.0 17.0
9 6000 2013.0 5.0 5.0
10 12000 2012.0 9.0 27.0
11 12000 2004.0 11.0 25.0
12 6000 2009.0 11.0 20.0

I've tried this code, but not sure what went wrong:

from datetime import datetime as dt
df['Age'] = datetime.datetime.now() - pd.to_datetime(df[['dob D','dob M','dob Y']])

Below is the error that popped up:

ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
If you want to use to_datetime with 3 columns, it works only with renamed column names:

d = {'dob Y':'year', 'dob M':'month', 'dob D':'day'}
df['Age'] = (pd.Timestamp.now().floor('d') -
             pd.to_datetime(df[['dob D','dob M','dob Y']].rename(columns=d)))
print (df)

number of purchased hours(mins) dob Y dob M dob D Age
0 7200 2010.0 10.0 12.0 3380 days
1 7320 2010.0 6.0 2.0 3512 days
2 5400 2011.0 6.0 18.0 3131 days
3 9180 2009.0 10.0 18.0 3739 days
4 3102 2007.0 7.0 30.0 4550 days
5 5400 2011.0 4.0 6.0 3204 days
6 9000 2009.0 8.0 5.0 3813 days
7 6000 2004.0 2.0 7.0 5819 days
8 6000 2007.0 8.0 17.0 4532 days
9 6000 2013.0 5.0 5.0 2444 days
10 12000 2012.0 9.0 27.0 2664 days
11 12000 2004.0 11.0 25.0 5527 days
12 6000 2009.0 11.0 20.0 3706 days

If you want to convert the timedeltas to days:

d = {'dob Y':'year', 'dob M':'month', 'dob D':'day'}
df['Age'] = ((pd.Timestamp.now().floor('d') -
              pd.to_datetime(df[['dob D','dob M','dob Y']].rename(columns=d)))
             .dt.days)
print (df)

number of purchased hours(mins) dob Y dob M dob D Age
0 7200 2010.0 10.0 12.0 3380
1 7320 2010.0 6.0 2.0 3512
2 5400 2011.0 6.0 18.0 3131
3 9180 2009.0 10.0 18.0 3739
4 3102 2007.0 7.0 30.0 4550
5 5400 2011.0 4.0 6.0 3204
6 9000 2009.0 8.0 5.0 3813
7 6000 2004.0 2.0 7.0 5819
8 6000 2007.0 8.0 17.0 4532
9 6000 2013.0 5.0 5.0 2444
10 12000 2012.0 9.0 27.0 2664
11 12000 2004.0 11.0 25.0 5527
12 6000 2009.0 11.0 20.0 3706
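If an age in whole years is more useful than a day count, a common approximation (not part of the original answer) is to divide by 365.25, which averages in leap years. This assumes 'Age' is the timedelta column from the first snippet:

# Approximate age in whole years; 365.25 roughly accounts for leap years.
df['Age years'] = (df['Age'].dt.days / 365.25).astype(int)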
Pandas merging DFs that have different sizes, shapes, column names and frequencies, with no duplicates
I am merging DFs that have 2 index columns in common: team name (32 teams) and year (2015-2018). DF1 has 9 columns of yearly team NFL stat averages; DF2 has the same index (32 teams and years 2015-2018) but has 11 columns of different stats for each of the 17 games (or "weeks"). I am trying to merge them so that, for every team and year, the yearly averages (DF1's 9 columns, in one row) are printed first, followed by DF2's 11 columns of stats for each of the 17 games (rows, or "weeks") on the same team-and-year index.

lvl0 = result.Tm_name.values
lvl1 = result.Year.values
newidx = pd.MultiIndex.from_arrays([lvl0, lvl1], names=["Tm_name", "Year"])
result.set_index(newidx, inplace=True)
result.drop(["Year", "Tm_name"], axis=1, inplace=True)
print(result)

W L W_L_Pct PD MoV SoS SRS OSRS DSRS
Tm_name Year
1 2015 13.0 3.0 0.813 176.0 11.0 1.3 12.3 9.0 3.4
2016 7.0 8.0 0.469 56.0 3.5 -1.9 1.6 2.4 -0.8
2017 8.0 8.0 0.500 -66.0 -4.1 0.4 -3.7 -4.0 0.2
2018 3.0 13.0 0.188 -200.0 -12.5 1.0 -11.5 -9.6 -1.9
2 2015 8.0 8.0 0.500 -6.0 -0.4 -3.4 -3.8 -4.0 0.3
2016 11.0 5.0 0.688 134.0 8.4 0.1 8.5 10.5 -2.0
2017 10.0 6.0 0.625 38.0 2.4 1.9 4.3 1.1 3.2
2018 7.0 9.0 0.438 -9.0 -0.6 0.4 -0.1 2.5 -2.6
3 2015 5.0 11.0 0.313 -73.0 -4.6 2.6 -1.9 -0.7 -1.2
2016 8.0 8.0 0.500 22.0 1.4 0.2 1.5 -1.1 2.6
2017 9.0 7.0 0.563 92.0 5.8 -2.4 3.4 2.2 1.2
2018 10.0 6.0 0.625 102.0 6.4 0.6 7.0 0.6 6.4
4 2015 8.0 8.0 0.500 20.0 1.3 -1.2 0.0 0.3 -0.2
2016 7.0 9.0 0.438 21.0 1.3 -1.6 -0.3 1.8 -2.2
2017 9.0 7.0 0.563 -57.0 -3.6 -0.5 -4.0 -3.0 -1.0
2018 6.0 10.0 0.375 -105.0 -6.6 -0.3 -6.9 -6.3 -0.6
5 2015 15.0 1.0 0.938 192.0 12.0 -3.9 8.1 6.0 2.1
2016 6.0 10.0 0.375 -33.0 -2.1 1.1 -1.0 -0.2 -0.8
2017 11.0 5.0 0.688 36.0 2.3 2.1 4.3 1.7 2.7
2018 7.0 9.0 0.438 -6.0 -0.4 1.3 0.9 0.1 0.8
6 2015 6.0 10.0 0.375 -62.0 -3.9 2.6 -1.3 -0.1 -1.2
2016 3.0 13.0 0.188 -120.0 -7.5 0.0 -7.5 -5.2 -2.3
2017 5.0 11.0 0.313 -56.0 -3.5 2.2 -1.3 -4.6 3.3
2018 12.0 4.0 0.750 138.0 8.6 -2.3 6.3 1.5 4.8
7 2015 12.0 4.0 0.750 140.0 8.8 1.9 10.6 4.8 5.8

lvl_0 = result2.Tm_name.values
lvl_1 = result2.Year.values
newidx_2 = newidx = pd.MultiIndex.from_arrays([lvl_0, lvl_1], names=["Tm_name", "Year"])
result2.set_index(newidx, inplace=True)
result2.drop(["Year", "Tm_name"], axis=1, inplace=True)
print(result2)

Week Date win_loss home_away Opp1_team Tm_Pnts \
Tm_name Year
1 2018 1 2018-09-09 0.0 1.0 32.0 6.0
2018 2 2018-09-16 0.0 0.0 18.0 0.0
2018 3 2018-09-23 0.0 1.0 6.0 14.0
2018 4 2018-09-30 0.0 1.0 28.0 17.0
2018 5 2018-10-07 1.0 0.0 29.0 28.0
2018 6 2018-10-14 0.0 0.0 20.0 17.0
2018 7 2018-10-18 0.0 1.0 10.0 10.0
2018 8 2018-10-28 1.0 1.0 29.0 18.0
2018 10 2018-11-11 0.0 0.0 16.0 14.0
2018 11 2018-11-18 0.0 1.0 25.0 21.0
2018 12 2018-11-25 0.0 0.0 17.0 10.0
2018 13 2018-12-02 1.0 0.0 12.0 20.0
2018 14 2018-12-09 0.0 1.0 11.0 3.0
2018 15 2018-12-16 0.0 0.0 2.0 14.0
2018 16 2018-12-23 0.0 1.0 18.0 9.0
2018 17 2018-12-30 0.0 0.0 28.0 24.0
2017 1 2017-09-10 0.0 0.0 11.0 23.0
2017 2 2017-09-17 1.0 0.0 14.0 16.0
2017 3 2017-09-25 0.0 1.0 9.0 17.0
2017 4 2017-10-01 1.0 1.0 29.0 18.0
2017 5 2017-10-08 0.0 0.0 26.0 7.0
2017 6 2017-10-15 1.0 1.0 30.0 38.0
2017 7 2017-10-22 0.0 0.0 18.0 0.0
2017 9 2017-11-05 1.0 0.0 29.0 20.0
2017 10 2017-11-09 0.0 1.0 28.0 16.0
2017 11 2017-11-19 0.0 0.0 13.0 21.0
2017 12 2017-11-26 1.0 1.0 15.0 27.0
2017 13 2017-12-03 0.0 1.0 18.0 16.0
2017 14 2017-12-10 1.0 1.0 31.0 12.0
2017 15 2017-12-17 0.0 0.0 32.0 15.0
... ... ... ... ... ... ...
2016 5 2016-10-06 1.0 0.0 29.0 33.0
2016 6 2016-10-17 1.0 1.0 24.0 28.0
2016 7 2016-10-23 NaN 1.0 28.0 6.0
2016 8 2016-10-30 0.0 0.0 5.0 20.0
2016 10 2016-11-13 1.0 1.0 29.0 23.0
2016 11 2016-11-20 0.0 0.0 20.0 24.0
2016 12 2016-11-27 0.0 0.0 2.0 19.0
2016 13 2016-12-04 1.0 1.0 32.0 31.0
2016 14 2016-12-11 0.0 0.0 19.0 23.0
2016 15 2016-12-18 0.0 1.0 22.0 41.0
2016 16 2016-12-24 1.0 0.0 28.0 34.0
2016 17 2016-01-01 1.0 0.0 18.0 44.0
2015 1 2015-09-13 1.0 1.0 22.0 31.0
2015 2 2015-09-20 1.0 0.0 6.0 48.0
2015 3 2015-09-27 1.0 1.0 29.0 47.0
2015 4 2015-10-04 0.0 1.0 NaN 22.0
2015 5 2015-10-11 1.0 0.0 11.0 42.0
2015 6 2015-10-18 0.0 0.0 27.0 13.0
2015 7 2015-10-26 1.0 1.0 3.0 26.0
2015 8 2015-11-01 1.0 0.0 8.0 34.0
2015 10 2015-11-15 1.0 0.0 28.0 39.0
2015 11 2015-11-22 1.0 1.0 7.0 34.0
2015 12 2015-11-29 1.0 0.0 29.0 19.0
2015 13 2015-12-06 1.0 0.0 NaN 27.0
2015 14 2015-12-10 1.0 1.0 20.0 23.0
2015 15 2015-12-20 1.0 0.0 26.0 40.0
2015 16 2015-12-27 1.0 1.0 12.0 38.0
2015 17 2015-01-03 0.0 1.0 28.0 6.0
2 2018 1 2018-09-06 0.0 0.0 26.0 12.0
2018 2 2018-09-16 1.0 1.0 5.0 31.0

Opp2_pnts Off_1stD Off_TotYd Def_1stD_All Def_TotYd_All
Tm_name Year
1 2018 24.0 14.0 213.0 30.0 429.0
2018 34.0 5.0 137.0 24.0 432.0
2018 16.0 13.0 221.0 21.0 316.0
2018 20.0 18.0 263.0 19.0 331.0
2018 18.0 10.0 220.0 33.0 447.0
2018 27.0 16.0 268.0 20.0 411.0
2018 45.0 14.0 223.0 15.0 309.0
2018 15.0 20.0 321.0 16.0 267.0
2018 26.0 21.0 260.0 20.0 330.0
2018 23.0 13.0 282.0 19.0 325.0
2018 45.0 10.0 149.0 30.0 414.0
2018 17.0 18.0 315.0 22.0 325.0
2018 17.0 22.0 279.0 16.0 218.0
2018 40.0 18.0 253.0 23.0 435.0
2018 31.0 15.0 263.0 33.0 461.0
2018 27.0 12.0 198.0 16.0 291.0
2017 35.0 24.0 308.0 19.0 367.0
2017 13.0 17.0 389.0 18.0 266.0
2017 28.0 22.0 332.0 15.0 273.0
2017 15.0 25.0 368.0 20.0 305.0
2017 34.0 16.0 307.0 19.0 419.0
2017 33.0 23.0 432.0 21.0 412.0
2017 33.0 10.0 196.0 28.0 425.0
2017 10.0 20.0 368.0 17.0 329.0
2017 22.0 24.0 290.0 14.0 287.0
2017 31.0 17.0 292.0 22.0 357.0
2017 24.0 20.0 344.0 19.0 219.0
2017 32.0 19.0 305.0 18.0 303.0
2017 7.0 16.0 261.0 14.0 204.0
2017 20.0 19.0 286.0 14.0 218.0
... ... ... ... ... ...
2016 21.0 17.0 288.0 25.0 286.0
2016 3.0 28.0 396.0 11.0 230.0
2016 6.0 23.0 443.0 11.0 257.0
2016 30.0 22.0 340.0 19.0 349.0
2016 20.0 26.0 443.0 15.0 281.0
2016 30.0 24.0 290.0 16.0 217.0
2016 38.0 23.0 332.0 28.0 360.0
2016 23.0 24.0 369.0 19.0 333.0
2016 26.0 21.0 300.0 15.0 314.0
2016 48.0 26.0 425.0 33.0 488.0
2016 31.0 21.0 370.0 24.0 391.0
2016 6.0 21.0 344.0 9.0 123.0
2015 19.0 25.0 427.0 18.0 408.0
2015 23.0 21.0 300.0 18.0 335.0
2015 7.0 28.0 446.0 10.0 156.0
2015 24.0 26.0 447.0 13.0 328.0
2015 17.0 15.0 345.0 29.0 435.0
2015 25.0 21.0 469.0 14.0 310.0
2015 18.0 21.0 414.0 18.0 276.0
2015 20.0 25.0 491.0 16.0 254.0
2015 32.0 30.0 451.0 18.0 343.0
2015 31.0 21.0 383.0 24.0 377.0
2015 13.0 26.0 337.0 17.0 368.0
2015 3.0 29.0 524.0 9.0 212.0
2015 20.0 22.0 393.0 23.0 389.0
2015 17.0 28.0 493.0 19.0 424.0
2015 8.0 19.0 381.0 16.0 178.0
2015 36.0 16.0 232.0 22.0 354.0
2 2018 18.0 16.0 299.0 18.0 232.0
2018 24.0 23.0 442.0 27.0 439.0

I can't just merge them; I've tried many different ways and it never comes out right, so I figured I could get it into a list and make a dataframe from the array.

The array looks like what I want, but when I put it into a dataframe it will not do it (check below):

app = []
for row in result.itertuples():
    app.append(row)
    for row_1 in result2.itertuples():
        if row[0] == row_1[0]:
            app.append(row_1)

9.0, Off_1stD=25.0, Off_TotYd=427.0, Def_1stD_All=18.0, Def_TotYd_All=408.0)
Pandas(Index=(1, 2015), Week='2', Date=Timestamp('2015-09-20 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=6.0, Tm_Pnts=48.0, Opp2_pnts=23.0, Off_1stD=21.0, Off_TotYd=300.0, Def_1stD_All=18.0, Def_TotYd_All=335.0)
Pandas(Index=(1, 2015), Week='3', Date=Timestamp('2015-09-27 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=29.0, Tm_Pnts=47.0, Opp2_pnts=7.0, Off_1stD=28.0, Off_TotYd=446.0, Def_1stD_All=10.0, Def_TotYd_All=156.0)
Pandas(Index=(1, 2015), Week='4', Date=Timestamp('2015-10-04 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=nan, Tm_Pnts=22.0, Opp2_pnts=24.0, Off_1stD=26.0, Off_TotYd=447.0, Def_1stD_All=13.0, Def_TotYd_All=328.0)
Pandas(Index=(1, 2015), Week='5', Date=Timestamp('2015-10-11 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=11.0, Tm_Pnts=42.0, Opp2_pnts=17.0, Off_1stD=15.0, Off_TotYd=345.0, Def_1stD_All=29.0, Def_TotYd_All=435.0)
Pandas(Index=(1, 2015), Week='6', Date=Timestamp('2015-10-18 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=27.0, Tm_Pnts=13.0, Opp2_pnts=25.0, Off_1stD=21.0, Off_TotYd=469.0, Def_1stD_All=14.0, Def_TotYd_All=310.0)
Pandas(Index=(1, 2015), Week='7', Date=Timestamp('2015-10-26 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=3.0, Tm_Pnts=26.0, Opp2_pnts=18.0, Off_1stD=21.0, Off_TotYd=414.0, Def_1stD_All=18.0, Def_TotYd_All=276.0)
Pandas(Index=(1, 2015), Week='8', Date=Timestamp('2015-11-01 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=8.0, Tm_Pnts=34.0, Opp2_pnts=20.0, Off_1stD=25.0, Off_TotYd=491.0, Def_1stD_All=16.0, Def_TotYd_All=254.0)
Pandas(Index=(1, 2015), Week='10', Date=Timestamp('2015-11-15 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=28.0, Tm_Pnts=39.0, Opp2_pnts=32.0, Off_1stD=30.0, Off_TotYd=451.0, Def_1stD_All=18.0, Def_TotYd_All=343.0)
Pandas(Index=(1, 2015), Week='11', Date=Timestamp('2015-11-22 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=7.0, Tm_Pnts=34.0, Opp2_pnts=31.0, Off_1stD=21.0, Off_TotYd=383.0, Def_1stD_All=24.0, Def_TotYd_All=377.0)
Pandas(Index=(1, 2015), Week='12', Date=Timestamp('2015-11-29 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=29.0, Tm_Pnts=19.0, Opp2_pnts=13.0, Off_1stD=26.0, Off_TotYd=337.0, Def_1stD_All=17.0, Def_TotYd_All=368.0)
Pandas(Index=(1, 2015), Week='13', Date=Timestamp('2015-12-06 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=nan, Tm_Pnts=27.0, Opp2_pnts=3.0, Off_1stD=29.0, Off_TotYd=524.0, Def_1stD_All=9.0, Def_TotYd_All=212.0)
Pandas(Index=(1, 2015), Week='14', Date=Timestamp('2015-12-10 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=20.0, Tm_Pnts=23.0, Opp2_pnts=20.0, Off_1stD=22.0, Off_TotYd=393.0, Def_1stD_All=23.0, Def_TotYd_All=389.0)
Pandas(Index=(1, 2015), Week='15', Date=Timestamp('2015-12-20 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=26.0, Tm_Pnts=40.0, Opp2_pnts=17.0, Off_1stD=28.0, Off_TotYd=493.0, Def_1stD_All=19.0, Def_TotYd_All=424.0)
Pandas(Index=(1, 2015), Week='16', Date=Timestamp('2015-12-27 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=12.0, Tm_Pnts=38.0, Opp2_pnts=8.0, Off_1stD=19.0, Off_TotYd=381.0, Def_1stD_All=16.0, Def_TotYd_All=178.0)
Pandas(Index=(1, 2015), Week='17', Date=Timestamp('2015-01-03 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=28.0, Tm_Pnts=6.0, Opp2_pnts=36.0, Off_1stD=16.0, Off_TotYd=232.0, Def_1stD_All=22.0, Def_TotYd_All=354.0)
Pandas(Index=(1, 2016), W=7.0, L=8.0, W_L_Pct=0.469, PD=56.0, MoV=3.5, SoS=-1.9, SRS=1.6, OSRS=2.4, DSRS=-0.8)
Pandas(Index=(1, 2016), Week=1, Date=Timestamp('2016-09-11 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=21.0, Tm_Pnts=21.0, Opp2_pnts=23.0, Off_1stD=21.0, Off_TotYd=344.0, Def_1stD_All=19.0, Def_TotYd_All=363.0)
Pandas(Index=(1, 2016), Week=2, Date=Timestamp('2016-09-18 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=30.0, Tm_Pnts=40.0, Opp2_pnts=7.0, Off_1stD=20.0, Off_TotYd=416.0, Def_1stD_All=21.0, Def_TotYd_All=306.0)
Pandas(Index=(1, 2016), Week=3, Date=Timestamp('2016-09-25 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=4.0, Tm_Pnts=18.0, Opp2_pnts=33.0, Off_1stD=25.0, Off_TotYd=348.0, Def_1stD_All=16.0, Def_TotYd_All=297.0)
Pandas(Index=(1, 2016), Week=4, Date=Timestamp('2016-10-02 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=18.0, Tm_Pnts=13.0, Opp2_pnts=17.0, Off_1stD=26.0, Off_TotYd=420.0, Def_1stD_All=12.0, Def_TotYd_All=288.0)
Pandas(Index=(1, 2016), Week=5, Date=Timestamp('2016-10-06 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=29.0, Tm_Pnts=33.0, Opp2_pnts=21.0, Off_1stD=17.0, Off_TotYd=288.0, Def_1stD_All=25.0, Def_TotYd_All=286.0)
Pandas(Index=(1, 2016), Week=6, Date=Timestamp('2016-10-17 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=24.0, Tm_Pnts=28.0, Opp2_pnts=3.0, Off_1stD=28.0, Off_TotYd=396.0, Def_1stD_All=11.0, Def_TotYd_All=230.0)
Pandas(Index=(1, 2016), Week=7, Date=Timestamp('2016-10-23 00:00:00'), win_loss=nan, home_away=1.0, Opp1_team=28.0, Tm_Pnts=6.0, Opp2_pnts=6.0, Off_1stD=23.0, Off_TotYd=443.0, Def_1stD_All=11.0, Def_TotYd_All=257.0)
Pandas(Index=(1, 2016), Week=8, Date=Timestamp('2016-10-30 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=5.0, Tm_Pnts=20.0, Opp2_pnts=30.0, Off_1stD=22.0, Off_TotYd=340.0, Def_1stD_All=19.0, Def_TotYd_All=349.0)
Pandas(Index=(1, 2016), Week=10, Date=Timestamp('2016-11-13 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=29.0, Tm_Pnts=23.0, Opp2_pnts=20.0, Off_1stD=26.0, Off_TotYd=443.0, Def_1stD_All=15.0, Def_TotYd_All=281.0)
Pandas(Index=(1, 2016), Week=11, Date=Timestamp('2016-11-20 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=20.0, Tm_Pnts=24.0, Opp2_pnts=30.0, Off_1stD=24.0, Off_TotYd=290.0, Def_1stD_All=16.0, Def_TotYd_All=217.0)
Pandas(Index=(1, 2016), Week=12, Date=Timestamp('2016-11-27 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=2.0, Tm_Pnts=19.0, Opp2_pnts=38.0, Off_1stD=23.0, Off_TotYd=332.0, Def_1stD_All=28.0, Def_TotYd_All=360.0)
Pandas(Index=(1, 2016), Week=13, Date=Timestamp('2016-12-04 00:00:00'), win_loss=1.0, home_away=1.0, Opp1_team=32.0, Tm_Pnts=31.0, Opp2_pnts=23.0, Off_1stD=24.0, Off_TotYd=369.0, Def_1stD_All=19.0, Def_TotYd_All=333.0)
Pandas(Index=(1, 2016), Week=14, Date=Timestamp('2016-12-11 00:00:00'), win_loss=0.0, home_away=0.0, Opp1_team=19.0, Tm_Pnts=23.0, Opp2_pnts=26.0, Off_1stD=21.0, Off_TotYd=300.0, Def_1stD_All=15.0, Def_TotYd_All=314.0)
Pandas(Index=(1, 2016), Week=15, Date=Timestamp('2016-12-18 00:00:00'), win_loss=0.0, home_away=1.0, Opp1_team=22.0, Tm_Pnts=41.0, Opp2_pnts=48.0, Off_1stD=26.0, Off_TotYd=425.0, Def_1stD_All=33.0, Def_TotYd_All=488.0)
Pandas(Index=(1, 2016), Week=16, Date=Timestamp('2016-12-24 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=28.0, Tm_Pnts=34.0, Opp2_pnts=31.0, Off_1stD=21.0, Off_TotYd=370.0, Def_1stD_All=24.0, Def_TotYd_All=391.0)
Pandas(Index=(1, 2016), Week=17, Date=Timestamp('2016-01-01 00:00:00'), win_loss=1.0, home_away=0.0, Opp1_team=18.0, Tm_Pnts=44.0, Opp2_pnts=6.0, Off_1stD=21.0, Off_TotYd=344.0, Def_1stD_All=9.0, Def_TotYd_All=123.0)....

This is set up just how I would like: each year starts with the year averages in a row, followed by 17 rows of stats from each game that year. Now, when I try to put it into a DataFrame:

new = pd.DataFrame(app, index=["Tm_name", "Year"])

AssertionError: 10 cols passed, passed data had 12 cols

Could someone please help me? I've been playing with this for 2 weeks; I've tried multi-indexing and different merges and concats, but I just can't seem to get it to look like the app array and have no duplicates... Thanks again
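Not an answer from the original thread, but a sketch of a concat-based way to get the interleaved layout (one yearly-averages row per team/year followed by that team/year's game rows), assuming result and result2 are indexed by (Tm_name, Year) as above:

import pandas as pd

avg = result.copy()
avg["_order"] = 0       # averages sort first within each (Tm_name, Year)
games = result2.copy()
games["_order"] = 1     # then the individual games
combined = (
    pd.concat([avg, games])           # aligns the two different column sets
      .set_index("_order", append=True)
      .sort_index(kind="stable")      # sort by (Tm_name, Year, _order), keep game order
      .droplevel("_order")
)

Unlike building a DataFrame from the mixed-width itertuples records (the cause of the "10 cols passed, passed data had 12 cols" error), concat pads the differing columns with NaN instead of failing.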