Scale values of a particular column of python dataframe between 1-10 - python

I have a dataframe which contains youtube videos views, I want to scale these values in the range of 1-10.
Below is the sample of how values look like? How do i normalize it in the range of 1-10 or is there any more efficient way to do this thing?
rating
4394029
274358
473691
282858
703750
255967
3298456
136643
796896
2932
220661
48688
4661584
2526119
332176
7189818
322896
188162
157437
1153128
788310
1307902

One possibility is performing a scaling with max.
1 + df / df.max() * 9
rating
0 6.500315
1 1.343433
2 1.592952
3 1.354073
4 1.880933
5 1.320412
6 5.128909
7 1.171046
8 1.997531
9 1.003670
10 1.276217
11 1.060946
12 6.835232
13 4.162121
14 1.415808
15 10.000000
16 1.404192
17 1.235536
18 1.197075
19 2.443451
20 1.986783
21 2.637193
Similar solution by Wen (now deleted):
1 + (df - df.min()) * 9 / (df.max() - df.min())
rating
0 6.498887
1 1.339902
2 1.589522
3 1.350546
4 1.877621
5 1.316871
6 5.126922
7 1.167444
8 1.994266
9 1.000000
10 1.272658
11 1.057299
12 6.833941
13 4.159739
14 1.412306
15 10.000000
16 1.400685
17 1.231960
18 1.193484
19 2.440368
20 1.983514
21 2.634189

Related

How to add values in a data frame with specific conditions

I have a code where it outputs the amount of times a product is bought in a specific month in all stores; however, I was wondering how I would be able to have the sum of 3 conditions, where python would add the products from a specific month and a specific store.
This is my code so far:
df = df.groupby(['Month_Bought'])['Amount_Bought'].sum()
print(df)
Output:
01-2020 27
02-2020 26
03-2020 24
04-2020 23
05-2020 31
06-2020 33
07-2020 26
08-2020 30
09-2020 33
10-2020 26
11-2020 30
12-2020 30
Need to separate the data to make the dataframe look like this:
Store1 Store2
01-2020 3 24
02-2020 4 22
03-2020 8 16
04-2020 4 19
05-2020 10 21
06-2020 11 21
07-2020 12 14
08-2020 10 20
09-2020 3 30
10-2020 14 12
11-2020 21 9
12-2020 9 21
Assuming your data is long (a column contains values for which store a product was purchased), you could group by store and month:
import pandas as pd
records = [
{'Month_Bought':'01-2020', 'Amount_Bought':1, 'Store': 'Store1'},
{'Month_Bought':'01-2020', 'Amount_Bought':2, 'Store': 'Store2'},
{'Month_Bought':'02-2020', 'Amount_Bought':2, 'Store': 'Store1'},
{'Month_Bought':'02-2020', 'Amount_Bought':4, 'Store': 'Store2'}
]
df = pd.DataFrame.from_records(records)
# Initial dataframe
Month_Bought Amount_Bought Store
0 01-2020 1 Store1
1 01-2020 2 Store2
2 02-2020 2 Store1
3 02-2020 4 Store2
# Now groupby store and month
df_agg = df.groupby(['Store', 'Month_Bought'], as_index=False)['Amount_Bought'].sum()
# Convert from long to wide:
df_agg_pivot = df_agg.pivot(index='Month_Bought', columns='Store', values='Amount_Bought')
# Reformat
df_agg_pivot = df_agg_pivot.reset_index()
df_agg_pivot.columns.name = None
# Final result:
Month_Bought Store1 Store2
0 01-2020 1 2
1 02-2020 2 4

Sort pandas dataframe column based on substring

I have a pandas dataframe, as shown below:
Timestamp_Start Event_ID Duration
555.54944 Fix_1 0.42248
559.07281 Fix_10 0.01996
559.14642 Fix_11 0
556.03192 Fix_2 0.16113
556.27985 Fix_3 0.24188
556.56097 Fix_4 0.04987
556.65497 Fix_5 0.10748
556.80859 Fix_6 0.75708
557.57983 Fix_7 0.11329
557.75348 Fix_8 0.65643
558.43665 Fix_9 0.27447
555.97925 Sac_1 0.04577
559.09961 Sac_10 0.0404
559.15302 Sac_11 0.00726
556.19916 Sac_2 0.07403
556.52747 Sac_3 0.02789
556.61865 Sac_4 0.02985
556.76849 Sac_5 0.0337
557.57294 Sac_6 0
557.69965 Sac_7 0.04687
558.41632 Sac_8 0.01325
558.71796 Sac_9 0.34552
I want to sort the 'Event_ID' column, so that Fix_1,Fix_2,Fix_3... and Sac_1,Sac_2,Sac_3... appear in order, like below:
Timestamp_StartEvent_ID Duration
555.54944 Fix_1 0.42248
556.03192 Fix_2 0.16113
556.27985 Fix_3 0.24188
556.56097 Fix_4 0.04987
556.65497 Fix_5 0.10748
556.80859 Fix_6 0.75708
557.57983 Fix_7 0.11329
557.75348 Fix_8 0.65643
558.43665 Fix_9 0.27447
559.07281 Fix_10 0.01996
559.14642 Fix_11 0
555.97925 Sac_1 0.04577
556.19916 Sac_2 0.07403
556.52747 Sac_3 0.02789
556.61865 Sac_4 0.02985
556.76849 Sac_5 0.0337
557.57294 Sac_6 0
557.69965 Sac_7 0.04687
558.41632 Sac_8 0.01325
558.71796 Sac_9 0.34552
559.09961 Sac_10 0.0404
559.15302 Sac_11 0.00726
Any ideas on how to do that? Thanks for your help.
One way using distutils.version:
import numpy as np
from distutils.version import LooseVersion
f = np.vectorize(LooseVersion)
new_df = df.sort_values("Event_ID", key=f)
print(new_df)
Output:
Timestamp_Start Event_ID Duration
0 555.54944 Fix_1 0.42248
3 556.03192 Fix_2 0.16113
4 556.27985 Fix_3 0.24188
5 556.56097 Fix_4 0.04987
6 556.65497 Fix_5 0.10748
7 556.80859 Fix_6 0.75708
8 557.57983 Fix_7 0.11329
9 557.75348 Fix_8 0.65643
10 558.43665 Fix_9 0.27447
1 559.07281 Fix_10 0.01996
2 559.14642 Fix_11 0.00000
11 555.97925 Sac_1 0.04577
14 556.19916 Sac_2 0.07403
15 556.52747 Sac_3 0.02789
16 556.61865 Sac_4 0.02985
17 556.76849 Sac_5 0.03370
18 557.57294 Sac_6 0.00000
19 557.69965 Sac_7 0.04687
20 558.41632 Sac_8 0.01325
21 558.71796 Sac_9 0.34552
12 559.09961 Sac_10 0.04040
13 559.15302 Sac_11 0.00726
Normal sorting on the dataframe will not work, as you need the integer in the string to be treated as int value.
It can be done with extra space though.
You can make two columns like this,
df['event'] = df.Event_ID.str.rsplit("_").str[0]
df['idx'] = df.Event_ID.str.rsplit("_").str[-1].astype(int)
Now, sort on these two columns,
df.sort_values(['event', 'idx'])
Timestamp_Start Event_ID Duration idx event
0 555.54944 Fix_1 0.42248 1 Fix
3 556.03192 Fix_2 0.16113 2 Fix
4 556.27985 Fix_3 0.24188 3 Fix
5 556.56097 Fix_4 0.04987 4 Fix
6 556.65497 Fix_5 0.10748 5 Fix
7 556.80859 Fix_6 0.75708 6 Fix
8 557.57983 Fix_7 0.11329 7 Fix
9 557.75348 Fix_8 0.65643 8 Fix
10 558.43665 Fix_9 0.27447 9 Fix
1 559.07281 Fix_10 0.01996 10 Fix
2 559.14642 Fix_11 0.00000 11 Fix
11 555.97925 Sac_1 0.04577 1 Sac
14 556.19916 Sac_2 0.07403 2 Sac
15 556.52747 Sac_3 0.02789 3 Sac
16 556.61865 Sac_4 0.02985 4 Sac
17 556.76849 Sac_5 0.03370 5 Sac
18 557.57294 Sac_6 0.00000 6 Sac
19 557.69965 Sac_7 0.04687 7 Sac
20 558.41632 Sac_8 0.01325 8 Sac
21 558.71796 Sac_9 0.34552 9 Sac
12 559.09961 Sac_10 0.04040 10 Sac
13 559.15302 Sac_11 0.00726 11 Sac
You can reset_index, drop extra columns as needed

Pandas - formatting a NxN matrix

I have to deal with a square matrix (N x N) (N will change depending on the system, but the matrix will always be a square matrix).
Here is an example:
0 1 2 3 4
0 5.1677124550E-001 5.4962112499E-005 3.2484393256E-002 -1.8901697652E-001 -6.7156804753E-003
1 5.5380106796E-005 5.6159927753E-001 -1.9000545049E-003 -1.4737748792E-002 -7.2598453774E-002
2 3.2486915835E-002 -1.8996351539E-003 5.6791783316E-001 7.2316374186E-002 1.5013066446E-003
3 -1.8901411495E-001 -1.4737367075E-002 7.2315825338E-002 6.2721160365E-001 3.1553528602E-002
4 -6.7136454124E-003 -7.2597907350E-002 1.5007743348E-003 3.1554372311E-002 2.7318109331E-001
5 6.6738948243E-002 1.4102132238E-003 -1.2689244944E-001 4.7666038803E-002 1.8559074897E-002
6 -2.5293332676E-002 3.7536452002E-002 -1.3453018251E-002 -1.3177136905E-001 6.8262612506E-002
7 5.0951492945E-003 2.1082303893E-005 2.2599127408E-004 1.0287898189E-001 -1.1117916184E-001
8 1.0818230191E-003 -1.2435319909E-002 8.1008075834E-003 -4.2864102001E-002 4.2865913226E-002
9 -1.8399671295E-002 -2.1579653853E-002 -8.3073582356E-003 -2.1848513510E-001 -7.3408914625E-002
10 3.4566032399E-003 -4.0687639382E-003 1.3769999130E-003 -1.1873434189E-001 -3.3274201039E-002
11 6.6093238125E-003 1.7153435473E-002 4.9392012712E-003 -8.4590814134E-002 -4.3601041176E-002
12 -1.1418316960E-001 -1.1241625427E-001 -3.2263873516E-002 -1.9323129435E-002 -2.6233049625E-002
13 -1.1352899039E-001 -2.2898299860E-001 -5.3035072561E-002 7.4480651562E-004 6.3778892206E-004
14 -3.2197359289E-002 -5.3404040557E-002 -6.2530142462E-002 9.6648204015E-003 1.5382174347E-002
15 -1.2210509335E-001 1.1380412205E-001 -3.8374895516E-002 -1.2823165326E-002 2.3865200517E-002
16 1.1478157080E-001 -2.1487971631E-001 5.9955334103E-002 -1.2803721235E-003 -2.2477259002E-004
17 -3.9162044498E-002 6.0167325377E-002 -6.7692892326E-002 6.3814569032E-003 -1.3309923267E-002
18 -5.1386866211E-002 -1.1483215267E-003 -3.8482481829E-002 2.2227734790E-003 2.4860195290E-004
19 -1.8287048910E-003 -4.5442287955E-002 -7.6787332291E-003 7.6970470456E-004 -1.8456603178E-003
20 -3.4812676792E-002 -7.8376169613E-003 -3.1205975353E-001 -2.8005140005E-003 3.9792109835E-004
21 2.6908361866E-003 3.7102890907E-004 2.8494060446E-002 -4.8904422930E-002 -5.8840348122E-004
22 -1.6354677061E-003 2.2592828188E-003 1.6591434361E-004 -4.9992263663E-003 -4.3243295112E-002
23 -1.4297833794E-003 -1.7830154308E-003 -1.1426700328E-002 1.7125095395E-003 -1.2016863398E-002
24 1.6271802154E-003 1.6383303957E-003 -7.8049656555E-004 3.7177399735E-003 -1.0472268655E-002
25 -4.1949740427E-004 1.5301971185E-004 -9.8681335931E-004 -2.2257204483E-004 -5.1722898203E-003
26 1.0290471110E-003 9.3255502541E-004 7.7166886713E-004 4.5630851485E-003 -4.3761358485E-003
27 -7.0031784470E-004 -3.5205332654E-003 -1.6311730073E-003 -1.2805479632E-002 -6.5565487971E-003
28 7.4046927792E-004 1.9332629981E-003 3.7374682636E-004 3.9965654817E-003 -6.2275912806E-003
29 -3.4680278867E-004 -2.3027344089E-003 -1.1338817043E-003 -1.2023581780E-002 -5.4242202971E-003
5 6 7 8 9
0 6.6743285428E-002 -2.5292337123E-002 5.0949675928E-003 1.0817408844E-003 -1.8399704662E-002
1 1.4100215877E-003 3.7536256943E-002 2.1212526899E-005 -1.2435482773E-002 -2.1579384876E-002
2 -1.2689432485E-001 -1.3453164785E-002 2.2618690004E-004 8.1008703937E-003 -8.3084039605E-003
3 4.7663851818E-002 -1.3181118094E-001 1.0290976691E-001 -4.2887391630E-002 -2.1847562123E-001
4 1.8558453001E-002 6.8311145594E-002 -1.1122358467E-001 4.2891711956E-002 -7.3413776745E-002
5 6.5246209445E-001 -3.7960754525E-002 5.8439215647E-002 -9.0620367134E-002 -8.4164313206E-002
6 -3.7935271881E-002 1.9415336793E-001 -6.8115262349E-002 5.0899890760E-002 -3.3687874555E-002
7 5.8422477033E-002 -6.8128901087E-002 3.9950499633E-001 -4.4336879147E-002 -4.0665928103E-002
8 -9.0612201567E-002 5.0902528870E-002 -4.4330072001E-002 1.2680415316E-001 1.7096405711E-002
9 -8.4167028549E-002 -3.3690056890E-002 -4.0677875424E-002 1.7097273427E-002 5.2579065978E-001
10 -6.4841142152E-002 -5.4453858464E-003 -2.4697277476E-001 8.5069643903E-005 1.8744016178E-001
11 -1.0367060076E-001 1.5864203200E-002 -1.6074822795E-002 -5.5265410413E-002 -7.3152548403E-002
12 -9.0665723957E-003 3.3027526012E-003 1.8484849938E-003 -7.5841163489E-004 -3.3700244298E-003
13 4.7717318460E-004 -1.8118719766E-003 1.6014630540E-003 -2.3830908057E-004 2.1049292570E-003
14 4.3836856576E-003 -1.7242302777E-003 -1.2023546553E-003 4.0533783460E-004 1.4850814596E-003
15 -1.2402059167E-002 -7.4793143461E-003 -3.8769252328E-004 3.9551076185E-003 1.0737706641E-003
16 -9.3076805579E-005 -1.6074185601E-003 1.7551579833E-003 -5.1663470094E-004 1.1072804383E-003
17 4.6817349747E-003 3.6900011954E-003 -8.6155331565E-004 -9.1007768778E-005 -7.3899260162E-004
18 3.2959550689E-002 3.0400921147E-003 3.9724187499E-004 -1.9220339108E-003 1.8075790317E-003
19 7.0905456379E-004 -5.0949208181E-004 -4.6021500516E-004 -7.9847500945E-004 1.4079850530E-004
20 -1.8687467448E-002 -6.3913023759E-004 -7.3566296037E-004 2.3726543730E-003 -1.0663719038E-003
21 3.6598966411E-003 -8.2335128379E-003 7.5645765132E-004 -2.1824880567E-002 -3.5125687811E-003
22 -1.6198130808E-002 8.4576317115E-003 -6.2045498682E-004 3.3460766491E-002 3.2638760335E-003
23 -3.2057393808E-001 -1.1315081941E-002 3.4822885510E-003 -5.8263446092E-003 2.9508421818E-004
24 -2.6366856593E-002 -5.8331954255E-004 1.1995976399E-003 3.4813904521E-003 -5.0942740761E-002
25 6.5474742063E-003 -5.7681583908E-003 -2.2680039574E-002 -3.3264360995E-002 4.8925407218E-003
26 -1.1288074542E-002 -4.5938216710E-003 -1.9339903561E-003 1.0812058656E-002 2.3005958417E-002
27 1.8937006089E-002 6.5590668002E-003 -2.9973042787E-003 -9.1466195902E-003 -2.0027029530E-001
28 -5.0006834397E-003 -3.1011487603E-002 -2.1071980031E-002 1.5171078954E-002 -6.3286786806E-002
29 1.0199591553E-002 -7.9372677248E-004 3.0157129340E-003 3.3043947441E-003 1.2554933598E-001
10 11 12 13 14
0 3.4566170422E-003 6.6091516193E-003 -1.1418209846E-001 -1.1352717720E-001 -3.2196213169E-002
1 -4.0687114857E-003 1.7153538295E-002 -1.1241515840E-001 -2.2897846552E-001 -5.3401852861E-002
2 1.3767476381E-003 4.9395834885E-003 -3.2262805417E-002 -5.3032729716E-002 -6.2527093260E-002
3 -1.1874067860E-001 -8.4586993618E-002 -1.9322697616E-002 7.4504831410E-004 9.6646936748E-003
4 -3.3280804952E-002 -4.3604931512E-002 -2.6232842935E-002 6.3789697287E-004 1.5382093474E-002
5 -6.4845769217E-002 -1.0366990398E-001 -9.0664935892E-003 4.7719667654E-004 4.3835884630E-003
6 -5.4306282394E-003 1.5863464756E-002 3.3027917727E-003 -1.8118646089E-003 -1.7242102753E-003
7 -2.4687457565E-001 -1.6075394559E-002 1.8484728466E-003 1.6014634135E-003 -1.2023496466E-003
8 8.5962912652E-005 -5.5265657567E-002 -7.5843145596E-004 -2.3831274033E-004 4.0533385644E-004
9 1.8744386918E-001 -7.3152643002E-002 -3.3700964189E-003 2.1048865009E-003 1.4850822567E-003
10 4.2975054072E-001 1.0364270794E-001 -1.5875283846E-003 6.7147216913E-004 1.2875627684E-004
11 1.0364402707E-001 6.0712435750E-001 5.1492123223E-003 8.2705404716E-004 -1.8653698814E-003
12 -1.5875318643E-003 5.1492269487E-003 1.2662026379E-001 1.2488481495E-001 3.3008712754E-002
13 6.7147489686E-004 8.2705994225E-004 1.2488477299E-001 2.4603749137E-001 5.7666439818E-002
14 1.2875157882E-004 -1.8653719810E-003 3.3008614344E-002 5.7666322609E-002 6.3196096154E-002
15 1.1375173141E-003 -1.2188735107E-003 9.5708352328E-003 -1.3282223067E-002 5.3571128896E-003
16 2.1319373893E-004 -2.6367828437E-004 1.4833724552E-002 -2.0115235494E-002 7.8461850894E-003
17 2.3051283757E-004 3.4044831571E-004 4.9262824289E-003 -6.6151918659E-003 1.1684894610E-003
18 -5.6658408835E-004 1.5710333316E-003 -2.6543076573E-003 1.0490950154E-003 -1.5676208892E-002
19 1.0005496308E-003 1.0400419914E-003 -2.7122935995E-003 -5.3716049248E-005 -2.6747366947E-002
20 3.1068907684E-004 5.3348953665E-004 -4.7934824223E-004 4.4853558686E-004 -6.0300656596E-003
21 2.7080517882E-003 -1.9033626829E-002 8.8615570289E-004 -3.7735646663E-004 -7.4101143501E-004
22 -2.9622921796E-003 -2.4159082408E-002 6.6943323966E-004 1.1154593780E-004 1.5914682394E-004
23 3.2842560830E-003 -6.2612752482E-003 1.5738434272E-004 4.6284599959E-004 4.0588132107E-004
24 1.6971737369E-003 2.4217812563E-002 4.3246402884E-004 9.5059931011E-005 3.5484698283E-004
25 -7.4868993750E-002 -8.7332668698E-002 -6.0147742690E-005 -4.8099146029E-005 1.1509155506E-004
26 -9.3177706949E-002 -2.9315061874E-001 2.1287190612E-004 5.0813661565E-005 2.6955715462E-004
27 -7.0097859908E-002 1.2458191360E-001 -1.2846480258E-003 1.2192486380E-004 4.6853704861E-004
28 -6.9485493530E-002 4.8763866344E-002 7.7223643475E-004 1.3853535883E-004 5.4636752811E-005
29 4.8961381968E-002 -1.5272337445E-001 -8.8648769643E-004 -4.4975303480E-005 5.9586006091E-004
15 16 17 18 19
0 -1.2210501176E-001 1.1478027359E-001 -3.9162145749E-002 -5.1389252158E-002 -1.8288904037E-003
1 1.1380272374E-001 -2.1487588526E-001 6.0165774430E-002 -1.1487007778E-003 -4.5441546655E-002
2 -3.8374694597E-002 5.9953296524E-002 -6.7691825286E-002 -3.8484030260E-002 -7.6800715249E-003
3 -1.2822729286E-002 -1.2805898275E-003 6.3813065178E-003 2.2220841872E-003 7.6991955181E-004
4 2.3864994996E-002 -2.2470892452E-004 -1.3309838494E-002 2.4851560674E-004 -1.8460620529E-003
5 -1.2402212045E-002 -9.2994801153E-005 4.6817064931E-003 3.2958166488E-002 7.0866732024E-004
6 -7.4793278406E-003 -1.6074103229E-003 3.6899979002E-003 3.0392561951E-003 -5.0946020505E-004
7 -3.8770026733E-004 1.7551659565E-003 -8.6155605026E-004 3.9692465089E-004 -4.6038088334E-004
8 3.9551171890E-003 -5.1663991899E-004 -9.1008948343E-005 -1.9220277566E-003 -7.9837924658E-004
9 1.0738350084E-003 1.1072790098E-003 -7.3897453645E-004 1.8057852560E-003 1.4013275714E-004
10 1.1375075076E-003 2.1317640112E-004 2.3050639764E-004 -5.6673414945E-004 1.0005316579E-003
11 -1.2189105982E-003 -2.6367792495E-004 3.4043235164E-004 1.5732522246E-003 1.0407973658E-003
12 9.5708232459E-003 1.4833737759E-002 4.9262816092E-003 -2.6542614308E-003 -2.7122986789E-003
13 -1.3282260152E-002 -2.0115238348E-002 -6.6152067653E-003 1.0491248568E-003 -5.3705750675E-005
14 5.3571028398E-003 7.8462085672E-003 1.1684872139E-003 -1.5676176683E-002 -2.6747374282E-002
15 1.3378635756E-001 -1.2613361119E-001 4.2401828623E-002 -2.6595403473E-003 1.9873360401E-003
16 -1.2613349126E-001 2.3154756121E-001 -6.5778628114E-002 -2.2828335280E-003 1.4601821131E-003
17 4.2401749392E-002 -6.5778591727E-002 6.8187241643E-002 -1.6653902450E-002 2.5505038138E-002
18 -2.6595920073E-003 -2.2828074980E-003 -1.6653942562E-002 5.4855247002E-002 2.4729783529E-003
19 1.9873415121E-003 1.4601899329E-003 2.5505058190E-002 2.4729967206E-003 4.4724663284E-002
20 -3.8366743828E-004 -8.8746730931E-004 -6.4420927497E-003 3.6656962180E-002 8.1224860664E-003
21 9.2845385141E-004 3.6802433505E-004 -9.5040708316E-004 -5.1941208846E-003 -1.2444625713E-004
22 -5.0318487549E-004 1.4342911215E-004 2.8985859503E-004 2.0416113478E-004 9.1951318240E-004
23 7.4036073171E-004 -3.4730013615E-004 -1.3351566400E-004 2.3474188588E-003 1.3102362758E-005
24 -2.7749145090E-004 4.7724454321E-005 5.5527644806E-005 -1.7302886151E-004 -1.7726879169E-004
25 -2.5090250470E-004 2.1741519930E-005 2.7208805916E-004 -2.5982303487E-004 -1.9668228900E-004
26 -1.4489113997E-004 -3.0397727583E-005 2.7239543481E-005 -6.0050637375E-004 -2.9892198193E-005
27 -1.6519482597E-005 1.6435294198E-004 5.0961893634E-005 1.4077278097E-004 -1.9027010603E-005
28 -2.3547595249E-004 7.6124571826E-005 1.0117983985E-004 -1.1534040559E-004 -1.0579685787E-004
29 7.0507166233E-005 1.1552377841E-004 -4.5931305760E-005 -2.0007797315E-004 -1.3505340062E-004
20 21 22 23 24
0 -3.4812101478E-002 2.6911592086E-003 -1.6354152863E-003 -1.4301333227E-003 1.6249964844E-003
1 -7.8382610347E-003 3.7103408229E-004 2.2593110441E-003 -1.7829862164E-003 1.6374435740E-003
2 -3.1205423941E-001 2.8493671639E-002 1.6587990556E-004 -1.1426237591E-002 -7.8189111866E-004
3 -2.8004725758E-003 -4.8903739721E-002 -4.9988134121E-003 1.7100983514E-003 3.7179545055E-003
4 3.9806443322E-004 -5.8790208912E-004 -4.3242458298E-002 -1.2016207108E-002 -1.0472139534E-002
5 -1.8686790048E-002 3.6592865292E-003 -1.6198931842E-002 -3.2057224847E-001 -2.6367531700E-002
6 -6.3919412091E-004 -8.2335246704E-003 8.4576155591E-003 -1.1315054733E-002 -5.8369163532E-004
7 -7.3581915791E-004 7.5646519519E-004 -6.2047477465E-004 3.4823216513E-003 1.1991380964E-003
8 2.3726528036E-003 -2.1824763131E-002 3.3460717579E-002 -5.8262172949E-003 3.4812921433E-003
9 -1.0665296285E-003 -3.5124206435E-003 3.2639684654E-003 2.9530797749E-004 -5.0943824872E-002
10 3.1067613876E-004 2.7079189356E-003 -2.9623459983E-003 3.2841200274E-003 1.6984442797E-003
11 5.3351732140E-004 -1.9033427571E-002 -2.4158940046E-002 -6.2609613281E-003 2.4221378111E-002
12 -4.7937892256E-004 8.8611314755E-004 6.6939922854E-004 1.5740024716E-004 4.3249394082E-004
13 4.4851926804E-004 -3.7736678097E-004 1.1153694999E-004 4.6284806253E-004 9.5077824774E-005
14 -6.0300787410E-003 -7.4096053004E-004 1.5918637627E-004 4.0586523098E-004 3.5485782222E-004
15 -3.8368712363E-004 9.2843754228E-004 -5.0316845184E-004 7.4036906127E-004 -2.7745851356E-004
16 -8.8745240886E-004 3.6801936222E-004 1.4342995270E-004 -3.4729860789E-004 4.7711904531E-005
17 -6.4420819427E-003 -9.5038506002E-004 2.8983698019E-004 -1.3352326563E-004 5.5544671478E-005
18 3.6656852373E-002 -5.1941195232E-003 2.0415783452E-004 2.3474119607E-003 -1.7153048632E-004
19 8.1224361521E-003 -1.2444681834E-004 9.1951236579E-004 1.3097434442E-005 -1.7668019335E-004
20 3.3911554853E-001 2.8652507893E-003 -6.8339696880E-005 3.7476484447E-004 8.3606654277E-004
21 2.8652527558E-003 6.1967615286E-002 -3.2455918220E-003 7.8074203872E-003 -1.5351890960E-003
22 -6.8340068690E-005 -3.2455946984E-003 4.1826230856E-002 6.5337193429E-003 -3.1932674182E-003
23 3.7476336333E-004 7.8073802579E-003 6.5336763366E-003 3.4246747567E-001 -2.2590437719E-005
24 8.3515185725E-004 -1.5351889308E-003 -3.1932682244E-003 -2.2585651674E-005 4.7006835231E-002
25 5.3158843621E-007 1.0652535047E-003 1.4954902777E-003 2.4073368793E-004 1.1954474977E-003
26 5.5963948637E-004 -4.4872582333E-004 -1.4772351943E-003 6.3199701928E-004 -2.1389718034E-002
27 -1.7619372799E-004 9.0741766644E-004 9.8175835796E-004 -2.9459682310E-004 7.2835611826E-004
28 2.5127782091E-004 -9.3298199434E-004 6.8787235133E-005 1.2732690365E-004 7.9688727422E-003
29 2.6201943695E-004 1.7128017387E-004 1.2934748675E-003 3.4008367645E-004 1.9615268308E-002
25 26 27 28 29
0 -4.2035299977E-004 1.0294528397E-003 -7.0032537135E-004 7.4047266192E-004 -3.4678947810E-004
1 1.5264932827E-004 9.3263518942E-004 -3.5205362458E-003 1.9332600101E-003 -2.3027335108E-003
2 -9.8735571502E-004 7.7177183895E-004 -1.6311830663E-003 3.7374078263E-004 -1.1338849320E-003
3 -2.2267753982E-004 4.5631164845E-003 -1.2805227755E-002 3.9967067646E-003 -1.2023590679E-002
4 -5.1722782688E-003 -4.3757731112E-003 -6.5561880794E-003 -6.2274289617E-003 -5.4242286711E-003
5 6.5472637324E-003 -1.1287788747E-002 1.8937046693E-002 -5.0006811267E-003 1.0199602824E-002
6 -5.7685226078E-003 -4.5935456207E-003 6.5591405092E-003 -3.1011377655E-002 -7.9382348181E-004
7 -2.2680665405E-002 -1.9338350120E-003 -2.9972765688E-003 -2.1071947728E-002 3.0156847654E-003
8 -3.3264515239E-002 1.0812126530E-002 -9.1466888768E-003 1.5170890552E-002 3.3044094214E-003
9 4.8928775025E-003 2.3007654009E-002 -2.0026482543E-001 -6.3285758846E-002 1.2554808336E-001
10 -7.4869041758E-002 -9.3178724533E-002 -7.0098856149E-002 -6.9485640501E-002 4.8962839723E-002
11 -8.7330564494E-002 -2.9314613543E-001 1.2458021507E-001 4.8763534298E-002 -1.5272144228E-001
12 -6.0132426168E-005 2.1286995818E-004 -1.2846479090E-003 7.7223667108E-004 -8.8648784383E-004
13 -4.8090893023E-005 5.0813447259E-005 1.2192474211E-004 1.3853537972E-004 -4.4975512069E-005
14 1.1509828375E-004 2.6955725919E-004 4.6853708025E-004 5.4636589826E-005 5.9585997916E-004
15 -2.5088560837E-004 -1.4490239429E-004 -1.6517113547E-005 -2.3547725232E-004 7.0506301073E-005
16 2.1741623849E-005 -3.0396484786E-005 1.6435437640E-004 7.6123660238E-005 1.1552303684E-004
17 2.7209709129E-004 2.7234932342E-005 5.0963084246E-005 1.0117936124E-004 -4.5931984725E-005
18 -2.5882735848E-004 -6.0031848430E-004 1.4070861538E-004 -1.1535910049E-004 -2.0001808065E-004
19 -1.9638025822E-004 -2.9919459983E-005 -1.9047914816E-005 -1.0580143635E-004 -1.3503643634E-004
20 8.4829116415E-007 5.5948891149E-004 -1.7619563318E-004 2.5127749619E-004 2.6202088722E-004
21 1.0652521780E-003 -4.4872868033E-004 9.0739586785E-004 -9.3299673048E-004 1.7126146660E-004
22 1.4954902653E-003 -1.4772362211E-003 9.8175151528E-004 6.8801505444E-005 1.2934673074E-003
23 2.4072903510E-004 6.3199689136E-004 -2.9460500091E-004 1.2731327319E-004 3.4007600115E-004
24 1.1952923145E-003 -2.1389995888E-002 7.2832026293E-004 7.9688600183E-003 1.9615297182E-002
25 9.4289717269E-002 1.0562741426E-001 -1.7552990896E-004 7.0060843371E-003 8.7782610441E-003
26 1.0562750999E-001 3.0308674016E-001 -1.6382699707E-003 -5.5832273099E-003 -1.1726448645E-002
27 -1.7551353029E-004 -1.6382784849E-003 2.0673701256E-001 8.2101212014E-002 -1.3115219203E-001
28 7.0060896795E-003 -5.5832572276E-003 8.2101377926E-002 8.7668224780E-002 -5.4259499038E-002
29 8.7782416309E-003 -1.1726450275E-002 -1.3115216547E-001 -5.4259354736E-002 1.5092602943E-001
This should be a 30x30 matrix and I'm trying:
data = pd.read_fwf('C:/Users/henri/Documents/Projects/Python-Lessons/ORCA/orca.hess',
widths=[9, 19, 19, 19, 19, 19])
But it reads as 185x6. I'd like to ignore the first column (numbering the lines) from 0-29 and I'm not using the columns indexes (from 0-29 too) to perform any mathematical operation. Also, Pandas is rounding my numbers and I'd like to keep the original format.
Here is a snip of my output:
Unnamed: 0 0 1 2 3 4
0 0.0 5.167712e-01 0.000055 0.032484 -0.189017 -0.006716
1 1.0 5.538011e-05 0.561599 -0.001900 -0.014738 -0.072598
2 2.0 3.248692e-02 -0.001900 0.567918 0.072316 0.001501
Any help is much appreciated, guys.
import pandas as pd
filename = 'data'
df = pd.read_fwf(filename, widths=[9, 19, 19, 19, 19, 19])
df = df.rename(columns={'Unnamed: 0':'row'})
df = df.dropna(subset=['row'], how='any')
df['col'] = df.groupby('row').cumcount()
df = df.pivot(index='row', columns='col')
df = df.dropna(how='any', axis=1)
df.columns = range(len(df.columns))
print(df.head())
yields
0 1 2 3 4 5 6 \
row
0.0 0.516771 0.066743 0.003457 -0.122105 -0.034812 -0.000420 0.000055
1.0 0.000055 0.001410 -0.004069 0.113803 -0.007838 0.000153 0.561599
2.0 0.032487 -0.126894 0.001377 -0.038375 -0.312054 -0.000987 -0.001900
3.0 -0.189014 0.047664 -0.118741 -0.012823 -0.002800 -0.000223 -0.014737
4.0 -0.006714 0.018558 -0.033281 0.023865 0.000398 -0.005172 -0.072598
7 8 9 ... 20 21 22 \
row ...
0.0 -0.025292 0.006609 0.114780 ... -0.113527 -0.051389 -0.001430
1.0 0.037536 0.017154 -0.214876 ... -0.228978 -0.001149 -0.001783
2.0 -0.013453 0.004940 0.059953 ... -0.053033 -0.038484 -0.011426
3.0 -0.131811 -0.084587 -0.001281 ... 0.000745 0.002222 0.001710
4.0 0.068311 -0.043605 -0.000225 ... 0.000638 0.000249 -0.012016
23 24 25 26 27 28 29
row
0.0 0.000740 -0.006716 -0.018400 -0.032196 -0.001829 0.001625 -0.000347
1.0 0.001933 -0.072598 -0.021579 -0.053402 -0.045442 0.001637 -0.002303
2.0 0.000374 0.001501 -0.008308 -0.062527 -0.007680 -0.000782 -0.001134
3.0 0.003997 0.031554 -0.218476 0.009665 0.000770 0.003718 -0.012024
4.0 -0.006227 0.273181 -0.073414 0.015382 -0.001846 -0.010472 -0.005424
[5 rows x 30 columns]
After parsing the file with
df = pd.read_fwf(filename, widths=[9, 19, 19, 19, 19, 19])
df = df.rename(columns={'Unnamed: 0':'row'})
the column headers can be identified by have a df['row'] value of NaN.
So they can be removed with
df = df.dropna(subset=['row'], how='any')
Now the row numbers keep repeating from 0 to 29. If we group by the row
value, then we can assign an intra-group "cumulative count" to the rows within
each group. That is, the first row of the group gets assigned the value 0, the
next row 1, etc. -- within that group -- and the process is repeated for each
group.
df['col'] = df.groupby('row').cumcount()
# row 0 1 2 3 4 col
# 0 0.0 5.167712e-01 0.000055 0.032484 -0.189017 -0.006716 0
# 1 1.0 5.538011e-05 0.561599 -0.001900 -0.014738 -0.072598 0
# 2 2.0 3.248692e-02 -0.001900 0.567918 0.072316 0.001501 0
# ...
# 182 27.0 -1.755135e-04 -0.001638 0.206737 0.082101 -0.131152 5
# 183 28.0 7.006090e-03 -0.005583 0.082101 0.087668 -0.054259 5
# 184 29.0 8.778242e-03 -0.011726 -0.131152 -0.054259 0.150926 5
Now the desired DataFrame can be obtained by pivoting:
df = df.pivot(index='row', columns='col')
and relabeling the columns:
df.columns = range(len(df.columns))
A more NumPy-based approach might look like this:
import numpy as np
import pandas as pd
filename = 'data'
df = pd.read_csv(filename, delim_whitespace=True)
arr = df.values
N = df.index.max()+1
arr = np.delete(arr, np.arange(N, len(arr), N+1), axis=0)
chunks = np.split(arr, np.arange(N, len(arr), N))
result = pd.DataFrame(np.hstack(chunks)).dropna(axis=1)
print(result)
This will also work for any sized matrix.

Break Existing Dataframe Apart Based on Multi Index

I have an existing dataframe that is sorted like this:
In [3]: result_GB_daily_average
Out[3]:
NREL Avert
Month Day
1 1 14.718417 37.250000
2 40.381167 45.250000
3 42.512646 40.666667
4 12.166896 31.583333
5 14.583208 50.416667
6 34.238000 45.333333
7 45.581229 29.125000
8 60.548479 27.916667
9 48.061583 34.041667
10 20.606958 37.583333
11 5.418833 70.833333
12 51.261375 43.208333
13 21.796771 42.541667
14 27.118979 41.958333
15 8.230542 43.625000
16 14.233958 48.708333
17 28.345875 51.125000
18 43.896375 55.500000
19 95.800542 44.500000
20 53.763104 39.958333
21 26.171437 50.958333
22 20.372688 66.916667
23 20.594042 42.541667
24 16.889083 48.083333
25 16.416479 42.125000
26 28.459625 40.125000
27 1.055229 49.833333
28 36.798792 42.791667
29 27.260083 47.041667
30 23.584917 55.750000
... ... ...
12 2 34.491604 55.916667
3 26.444333 53.458333
4 15.088333 45.000000
5 10.213500 32.083333
6 19.087688 17.000000
7 23.078292 17.375000
8 41.523667 29.458333
9 17.173854 37.833333
10 11.488687 52.541667
11 15.203479 30.000000
12 8.390917 37.666667
13 70.067062 23.458333
14 24.281729 25.583333
15 31.826104 33.458333
16 5.085271 42.916667
17 3.778229 46.916667
18 31.276958 57.625000
19 7.399458 46.916667
20 18.531958 39.291667
21 26.831937 35.958333
22 55.514000 32.375000
23 24.018875 34.041667
24 54.454125 43.083333
25 57.379812 25.250000
26 94.520833 33.958333
27 49.693854 27.500000
28 2.406438 46.916667
29 7.133833 53.916667
30 7.829167 51.500000
31 5.584646 55.791667
I would like to split this dataframe apart into 12 different data frames, one for each month, but the problem is they are all slightly different lengths because the amount of days in a month vary, meaning that attempts at using np.array_split have failed. How can I split this based on the Month index?
One solution :
df=result_GB_daily_average
[df.iloc[df.index.get_level_values('Month')==i+1] for i in range(12)]
or, shorter:
[df.ix[i] for i in range(12)]

Nested if loop with DataFrame is very,very slow

I have 10 million rows to go through and it will take many hours to process, I must be doing something wrong
I converted the names of my df variables for ease in typing
Close=df['Close']
eqId=df['eqId']
date=df['date']
IntDate=df['IntDate']
expiry=df['expiry']
delta=df['delta']
ivMid=df['ivMid']
conf=df['conf']
The below code works fine, just ungodly slow, any suggestions?
print(datetime.datetime.now().time())
for i in range(2,1000):
if delta[i]==90:
if delta[i-1]==50:
if delta[i-2]==10:
if expiry[i]==expiry[i-2]:
df.Skew[i]=ivMid[i]-ivMid[i-2]
print(datetime.datetime.now().time())
14:02:11.014396
14:02:13.834275
df.head(100)
Close eqId date IntDate expiry delta ivMid conf Skew
0 37.380005 7 2008-01-02 39447 1 50 0.3850 0.8663
1 37.380005 7 2008-01-02 39447 1 90 0.5053 0.7876
2 36.960007 7 2008-01-03 39448 1 50 0.3915 0.8597
3 36.960007 7 2008-01-03 39448 1 90 0.5119 0.7438
4 35.179993 7 2008-01-04 39449 1 50 0.4055 0.8454
5 35.179993 7 2008-01-04 39449 1 90 0.5183 0.7736
6 33.899994 7 2008-01-07 39452 1 50 0.4464 0.8400
7 33.899994 7 2008-01-07 39452 1 90 0.5230 0.7514
8 31.250000 7 2008-01-08 39453 1 10 0.4453 0.7086
9 31.250000 7 2008-01-08 39453 1 50 0.4826 0.8246
10 31.250000 7 2008-01-08 39453 1 90 0.5668 0.6474 0.1215
11 30.830002 7 2008-01-09 39454 1 10 0.4716 0.7186
12 30.830002 7 2008-01-09 39454 1 50 0.4963 0.8479
13 30.830002 7 2008-01-09 39454 1 90 0.5735 0.6704 0.1019
14 31.460007 7 2008-01-10 39455 1 10 0.4254 0.6737
15 31.460007 7 2008-01-10 39455 1 50 0.4929 0.8218
16 31.460007 7 2008-01-10 39455 1 90 0.5902 0.6411 0.1648
17 30.699997 7 2008-01-11 39456 1 10 0.4868 0.7183
18 30.699997 7 2008-01-11 39456 1 50 0.4965 0.8411
19 30.639999 7 2008-01-14 39459 1 10 0.5117 0.7620
20 30.639999 7 2008-01-14 39459 1 50 0.4989 0.8804
21 30.639999 7 2008-01-14 39459 1 90 0.5887 0.6845 0.077
22 29.309998 7 2008-01-15 39460 1 10 0.4956 0.7363
23 29.309998 7 2008-01-15 39460 1 50 0.5054 0.8643
24 30.080002 7 2008-01-16 39461 1 10 0.4983 0.6646
At this rate it will take 7.77 hrs to process
Basically, the whole point of numpy & pandas is to avoid loops like the plague, and do things in a vectorial way. As you noticed, without that, speed is gone.
Let's break your problem into steps.
The Conditions
Here, your your first condition can be written like this:
df.delta == 90
(Note how this compares the entire column at once. This is much much faster than your loop!).
and the second one can be written like this (using shift):
df.delta.shift(1) == 50
The rest of your conditions are similar.
Note that to combine conditions, you need to use parentheses. So, the first two conditions, together, should be written as:
(df.delta == 90) & (df.delta.shift(1) == 50)
You should be able to now write an expression combining all your conditions. Let's call it cond, i.e.,
cond = (df.delta == 90) & (df.delta.shift(1) == 50) & ...
The assignment
To assign things to a new column, use
df['skew'] = ...
We just need to figure out what to put on the right-hand-sign
The Right Hand Side
Since we have cond, we can write the right-hand-side as
np.where(cond, df.ivMid - df.ivMid.shift(2), 0)
What this says is: when condition is true, take the second term; when it's not, take the third term (in this case I used 0, but do whatever you like).
By combining all of this, you should be able to write a very efficient version of your code.

Categories

Resources