I have a dataframe that looks like this.
SectorID MyDate PrevMainCost OutageCost
0 123 10/31/2022 332 193
1 123 9/30/2022 308 269
2 123 8/31/2022 33 440
3 123 7/31/2022 230 147
4 123 6/30/2022 264 184
5 123 5/31/2022 290 46
6 123 4/30/2022 51 150
7 123 3/31/2022 69 253
8 123 2/28/2022 257 308
9 123 1/31/2022 441 349
10 456 10/31/2022 280 188
11 456 9/30/2022 432 150
12 456 8/31/2022 357 307
13 456 7/31/2022 425 45
14 456 6/30/2022 101 278
15 456 5/31/2022 62 240
16 456 4/30/2022 407 46
17 456 3/31/2022 35 218
18 456 2/28/2022 403 113
19 456 1/31/2022 295 200
20 456 12/31/2021 20 235
21 456 11/30/2021 440 403
22 789 10/31/2022 145 181
23 789 9/30/2022 320 259
24 789 8/31/2022 485 472
25 789 7/31/2022 59 24
26 789 6/30/2022 345 64
27 789 5/31/2022 34 480
28 789 4/30/2022 260 162
29 789 3/31/2022 46 399
30 999 10/31/2022 491 346
31 999 9/30/2022 77 212
32 999 8/31/2022 316 112
33 999 7/31/2022 106 351
34 999 6/30/2022 481 356
35 999 5/31/2022 20 269
36 999 4/30/2022 246 268
37 999 3/31/2022 377 173
38 999 2/28/2022 426 413
39 999 1/31/2022 341 168
40 999 12/31/2021 144 471
41 999 11/30/2021 358 393
42 999 10/31/2021 340 197
43 999 9/30/2021 119 252
44 999 8/31/2021 470 203
45 999 7/31/2021 359 163
46 999 6/30/2021 410 383
47 999 5/31/2021 200 119
48 999 4/30/2021 230 291
I am trying to find the minimum of PrevMainCost and OutageCost, after grouping by SectorID. Here's my primitive code.
import numpy as np
import pandas
df = pandas.read_clipboard(sep='\\s+')
df
df_sum = df.groupby('SectorID').sum()
df_sum
df_sum.loc[df_sum['PrevMainCost'] <= df_sum['OutageCost'], 'Result'] = 'Main'
df_sum.loc[df_sum['PrevMainCost'] > df_sum['OutageCost'], 'Result'] = 'Out'
Result: (result column shows flags whether PrevMainCost is lower or OutageCost is lower)
PrevMainCost OutageCost Result
SectorID
123 2275 2339 Main
456 3257 2423 Out
789 1694 2041 Main
999 5511 5140 Out
I am trying to figure out how to use Scipy Optimization to solve this problem. I Googled this problem and came up with this simple code sample.
from scipy.optimize import *
df_sum.groupby(by=['SectorID']).apply(lambda g: minimize(equation, g.Result, options={'xtol':0.001}).x)
When I run that, I get an error saying 'NameError: name 'equation' is not defined'.
How can I find the minimum of either the preventative maintenance cost or the outage cost, after grouping by SectorID? Also, how can I add some kind of constraint, such as no more than 30% of all resources can be used by one any particular SectorID?
I have this loop which print each 10 numbers in line then move to next line
for i in range(100, 201):
if i % 10 == 0:
print(i)
else:
print(i, end=" ", )
and this the result:
100
101 102 103 104 105 106 107 108 109 110
111 112 113 114 115 116 117 118 119 120
121 122 123 124 125 126 127 128 129 130
131 132 133 134 135 136 137 138 139 140
141 142 143 144 145 146 147 148 149 150
151 152 153 154 155 156 157 158 159 160
161 162 163 164 165 166 167 168 169 170
171 172 173 174 175 176 177 178 179 180
181 182 183 184 185 186 187 188 189 190
191 192 193 194 195 196 197 198 199 200
it printing first number in line alone, but the want the opposite, the last number alone, something like this
100 101 102 103 104 105 106 107 108 109
110 111 112 113 114 115 116 117 118 119
120 121 122 123 124 125 126 127 128 129
130 131 132 133 134 135 136 137 138 139
140 141 142 143 144 145 146 147 148 149
150 151 152 153 154 155 156 157 158 159
160 161 162 163 164 165 166 167 168 169
170 171 172 173 174 175 176 177 178 179
180 181 182 183 184 185 186 187 188 189
190 191 192 193 194 195 196 197 198 199
200
You have the correct code. The only thing is that you are breaking at mod 0. You should break at mod 9.
for i in range(100, 201):
if i % 10 == 9:
print(i)
else:
print(i, end=" ", )
Try this:
for i in range(100, 201):
if (i + 1) % 10 == 0:
print(i)
else:
print(i, end=" ")
Well, 100 % 10 is equals to zero.
Which means it's going to print 100 in a line, and 101 in another line.
If you want the new line to start from 100 then all you have to do is to make the statement i % 10 == 0 a false statement. So you will have to add one to the 100.
so just try this code instead
for i in range(100, 201):
if (i+1) % 10 == 0:
print(i)
else:
print(i, end=" ", )
or you could try changing i % 10 == 0 to i % 10 == 9
What I Can Say By Seeing Your Code Is That You Are Using 2 Print Statements. One Inside The If Statement And One Inside Else Statement.
So What Python Compiler Is Doing There Is Printing New Line When Each If Statement Is True.
So When Your Iterating Number%10 Will Be Zero It Will Print That Number And Add A New Line, Because After Each Print Statement Python Do So.
And As You Have end=" " in Else Statement So It Is Printing The Number In Same Line When IF Statements Are False.
So What You Can Do Is:
You Can Use if (i+1)%10: instead of i % 10 == 0:
When Your Iterating Number's(i's) One's Place Will Be 9 (i.e. 109,119,etc) The If Statement Will Be True For That One, The Number Will Be Printed On The Same Line And Will Add A New Line After That.
I have a dataframe as shown below:
Category 1 2 3 4 5 6 7 8 9 10 11 12 13
A 424 377 161 133 2 81 141 169 297 153 53 50 197
B 231 121 111 106 4 79 68 70 92 93 71 65 66
C 480 379 159 139 2 116 148 175 308 150 98 82 195
D 88 56 38 40 0 25 24 55 84 36 24 26 36
E 1084 1002 478 299 7 256 342 342 695 378 175 132 465
F 497 246 283 206 4 142 151 168 297 224 194 198 148
H 8 5 4 3 0 2 3 2 7 5 3 2 0
G 3191 2119 1656 856 50 826 955 739 1447 1342 975 628 1277
K 58 26 27 51 1 18 22 42 47 35 19 20 14
S 363 254 131 105 6 82 86 121 196 98 81 57 125
T 54 59 20 4 0 9 12 7 36 23 5 4 20
O 554 304 207 155 3 130 260 183 287 204 98 106 195
P 756 497 325 230 5 212 300 280 448 270 201 140 313
PP 64 43 26 17 1 15 35 17 32 28 18 9 27
R 265 157 109 89 1 68 68 104 154 96 63 55 90
S 377 204 201 114 5 112 267 136 209 172 147 90 157
St 770 443 405 234 5 172 464 232 367 270 290 136 294
Qs 47 33 11 14 0 18 14 19 26 17 5 6 13
Y 1806 626 1102 1177 14 625 619 1079 1273 981 845 891 455
W 123 177 27 28 0 18 62 34 64 27 14 4 51
Z 2770 1375 1579 1082 17 900 1630 1137 1465 1383 861 755 1201
I want to sort the dataframe by values in each row. Once done, I want to sort the index also.
For example the values in first row corresponding to category A, should appear as:
2 50 53 81 133 141 153 161 169 197 297 377 424
I have tried df.sort_values(by=df.index.tolist(), ascending=False, axis=1) but this doesn't work. The values don't appear in sorted order at all
np.sort + sort_index
You can use np.sort along axis=1, then sort_index:
cols, idx = df.columns[1:], df.iloc[:, 0]
res = pd.DataFrame(np.sort(df.iloc[:, 1:].values, axis=1), columns=cols, index=idx)\
.sort_index()
print(res)
1 2 3 4 5 6 7 8 9 10 11 12 \
Category
A 2 50 53 81 133 141 153 161 169 197 297 377
B 4 65 66 68 70 71 79 92 93 106 111 121
C 2 82 98 116 139 148 150 159 175 195 308 379
D 0 24 24 25 26 36 36 38 40 55 56 84
E 7 132 175 256 299 342 342 378 465 478 695 1002
F 4 142 148 151 168 194 198 206 224 246 283 297
G 50 628 739 826 856 955 975 1277 1342 1447 1656 2119
H 0 0 2 2 2 3 3 3 4 5 5 7
K 1 14 18 19 20 22 26 27 35 42 47 51
O 3 98 106 130 155 183 195 204 207 260 287 304
P 5 140 201 212 230 270 280 300 313 325 448 497
PP 1 9 15 17 17 18 26 27 28 32 35 43
Qs 0 5 6 11 13 14 14 17 18 19 26 33
R 1 55 63 68 68 89 90 96 104 109 154 157
S 6 57 81 82 86 98 105 121 125 131 196 254
S 5 90 112 114 136 147 157 172 201 204 209 267
St 5 136 172 232 234 270 290 294 367 405 443 464
T 0 4 4 5 7 9 12 20 20 23 36 54
W 0 4 14 18 27 27 28 34 51 62 64 123
Y 14 455 619 625 626 845 891 981 1079 1102 1177 1273
Z 1 17 755 861 900 1082 1137 1375 1383 1465 1579 1630
One way is to apply sorted setting 1 as axis, applying pd.Series to return a dataframe instead of a list, and finally sorting by Category:
df.loc[:,'1':].apply(sorted, axis = 1).apply(pd.Series)
.set_index(df.Category).sort_index()
Category 0 1 2 3 4 5 6 7 8 9 10 ...
0 A 2 50 53 81 133 141 153 161 169 197 297 ...
1 B 4 65 66 68 70 71 79 92 93 106 111 ...
I have following df
Blades & Razors & Foam Diaper Empty Fem Care HairCare Irrelevant Laundry Oral Care Others Personal Cleaning Care Skin Care
retailer
RTM 158 486 193 2755 3490 1458 889 2921 69 1543 645
RTM 39 0 28 2305 80 27 0 0 0 1207 414
RTM 98 276 121 1090 2359 717 561 911 293 1286 528
RTM 107 484 54 2136 2777 151 80 2191 7 1096 673
RTM 156 465 254 2972 2802 763 867 1065 8 2777 728
RTM 126 326 142 2126 2035 581 575 753 45 1768 292
RTM 0 0 181 1816 1455 598 579 0 2 749 451
RTM 86 374 308 2197 2075 576 698 693 26 1398 212
RTM 132 61 153 2094 1508 180 590 785 66 1519 486
RTM 90 303 8 0 0 18 0 60 0 358 0
RTM 0 14 6 190 198 21 131 75 18 171 0
I want make groupby() on my index and then get average on every column with the group? Any idea how to get so ?
To group on index, use:
df.groupby(level=0).mean()
or
df.groupby(df.index).mean()
Sample:
df = pd.DataFrame(data=np.random.random((10, 5)), columns=list('CDEFG'), index=list('AB')*5)
df.head()
C D E F G
A 0.230504 0.830818 0.560533 0.266903 0.745196
B 0.996806 0.861006 0.257780 0.258976 0.738617
A 0.409191 0.688814 0.214247 0.309678 0.565571
B 0.805192 0.940919 0.707562 0.772370 0.122562
A 0.596964 0.935662 0.493612 0.108362 0.673538
for either of the above yields:
C D E F G
A 0.328301 0.560188 0.632549 0.491101 0.343343
B 0.405996 0.490331 0.540921 0.394136 0.466504
C D E F G
A 0.328301 0.560188 0.632549 0.491101 0.343343
B 0.405996 0.490331 0.540921 0.394136 0.466504
I have 4 csv files named PV.csv, Dwel.csv, Sess.csv, and Elap.csv. I have 15 columns and arouind 2000 rows in each file. At first I would like to add a new column named Var in each file and fill up the cells of the new column with the same file name. Therefore, new column 'Var' in PV.csv file will filled up by PV. Same is for other 3 files.
After that I would like to manipulate the all the files as follows.
Finally I would like to merge / join these 4 files based on the A_ID and B_ID and write records into a new csv file name finalFile.csv.
Any suggestion and help is appreciated.
<p>PV.csv is as follows:</p>
A_ID B_ID LO UP LO UP
103 321 0 402
103 503 192 225 433 608
106 264 104 258 334 408
107 197 6 32 113 258
Dwell.csv is as follows:
A_ID B_ID LO UP LO UP
103 321 40 250 517 780
103 503 80 125 435 585
106 264 192 525 682
107 197 324 492 542 614
Session.csv is as follows:
A_ID B_ID LO UP LO UP
103 321 75 350 370 850
106 264 92 225 482 608
107 197 24 92 142
Elapsed.csv is as follows:
A_ID B_ID LO UP LO UP
103 321 5 35 75
103 503 100 225 333 408
106 264 102 325 582
107 197 24 92 142 214
First output file of PV.csv will be as follows:
Same way all rest of three files will be filled up with new column with ehrer file name, Dwell, Session, and Elapsed:
A_ID B_ID Var LO UP LO UP
103 321 PV 0 402
103 503 PV 192 225 433 608
106 264 PV 104 258 334 408
107 197 PV 6 32 113 258
Final output file will be as follows:
finalFile.csv.
A_ID B_ID Var LO UP
103 321 PV 0 402
103 321 Dwel 40 250
103 321 Dwel 251 517
103 321 Dwel 518 780
103 321 Sess 75 350
103 321 Sess 351 370
103 321 Sess 371 850
103 321 Elap 5 35
103 321 Elap 36 75
103 503 PV 192 225
103 503 PV 226 433
103 503 PV 434 608
103 503 Dwel 80 125
103 503 Dwel 126 435
103 503 Dwel 436 585
103 503 Elap 100 225
103 503 Elap 226 333
103 503 Elap 334 408
106 264 PV 104 258
106 264 PV 259 334
106 264 PV 335 408
106 264 Dwel 192 525
106 264 Dwel 526 682
106 264 Sess 92 225
106 264 Sess 226 482
106 264 Sess 483 608
106 264 Elap 102 325
106 264 Elap 326 582
107 197 PV 6 32
107 192 PV 33 113
107 192 PV 114 258
107 192 Dwel 324 492
107 192 Dwel 493 542
107 192 Dwel 543 614
107 192 Sess 24 92
107 192 Sess 93 142
107 192 Elap 24 92
107 192 Elap 93 142
107 192 Elap 143 214
You should use python builtin csv module.
To create the final csv file you can do like this. Read through each file, add the new column value to every row and write it to the new file
import csv
with open('finalcsv.csv', 'w') as outcsv:
writer = csv.writer(outcsv)
writer.writerow(['a','b','c','etc','Var']) # write final headers
for filename in ['PV.csv','Dwel.csv','Sess.csv','Elap.csv']:
with open(filename) as incsv:
val = filename.split('.csv')[0]
reader = csv.reader(incsv) # create reader object
reader.next() # skip the headers
for row in reader:
writer.writerow(row+[val])
The following script should get you started:
from collections import defaultdict
from itertools import groupby
import csv
entries = defaultdict(list)
csv_files = [(0, 'PV.csv', 'PV'), (1, 'Dwell.csv', 'Dwel'), (2, 'Session.csv', 'Sess'), (3, 'Elapsed.csv', 'Elap')]
for index, filename, shortname in csv_files:
f_input = open(filename, 'rb')
csv_input = csv.reader(f_input)
header = next(csv_input)
for row in csv_input:
row[:] = [col for col in row if col]
entries[(row[0], row[1])].append((index, shortname, row[2:]))
f_input.close()
f_output = open('finalFile.csv', 'wb')
csv_output = csv.writer(f_output)
csv_output.writerow(header[:2] + ['Var'] + header[2:4])
for key in sorted(entries.keys()):
for k, g in groupby(sorted(entries[key]), key=lambda x: x[1]):
var_group = list(g)
if len(var_group[0][2]):
up = var_group[0][2][0]
for entry in var_group:
for pair in zip(*[iter(entry[2])]*2):
csv_output.writerow([key[0], key[1], entry[1], up, pair[1]])
up = int(pair[1]) + 1
f_output.close()
Using the data you have provided, this gives the following output:
A_ID,B_ID,Var,LO,UP
103,321,PV,0,402
103,321,Dwel,40,250
103,321,Dwel,251,780
103,321,Sess,75,350
103,321,Sess,351,850
103,321,Elap,5,35
103,503,PV,192,225
103,503,PV,226,608
103,503,Dwel,80,125
103,503,Dwel,126,585
103,503,Elap,100,225
103,503,Elap,226,408
106,264,PV,104,258
106,264,PV,259,408
106,264,Dwel,192,525
106,264,Sess,92,225
106,264,Sess,226,608
106,264,Elap,102,325
107,197,PV,6,32
107,197,PV,33,258
107,197,Dwel,324,492
107,197,Dwel,493,614
107,197,Sess,24,92
107,197,Elap,24,92
107,197,Elap,93,214
To work with all csv files in a folder, you could add the following to the top of the script:
import os
import glob
csv_files = [(index, file, os.path.splitext(file)[0]) for index, file in enumerate(glob.glob('*.csv'))]
You should also change the location of the output file otherwise it will be read in the next time the script is run.
Tested using Python 2.6.6 (which I believe is what the OP is using)
There's a standard library module for these manipulations
https://docs.python.org/2/library/csv.html#module-csv
Not a full answer by any means, but your full implementation will almost certainly start there. The python docs above include several working examples which will get you started.