I have a range of values i iterating through the number of hours in a year (8760) starting at 1. For every hour, the variable hour increments by 1 until it reaches 24 where it restarts. The variable year_day increments by 1 after every 24 hours is reached. Eg
i hour year_day
1 1 1
2 2 1
3 3 1
...
23 23 1
24 1 2
25 2 2
...
47 24 2
48 1 3
49 2 3
I'm struggling to make it so that when i = 24, hour also is 24 and year_day remains at 1. Then when i is the next value directly after a multiple is found, the hour restarts at 1 and year_day increments by 1. In other words, everytime it reaches midnight, the hour = 24 and year_day is still the previous day. Eg
i hour year_day
23 23 1
24 24 1
25 1 2
...
47 23 2
48 24 2
49 1 3
Here is the code:
hour = 0
year_day = 1
for i in range(1, 8761):
hour = hour + 1
if i % 24 == 0:
hour = 1
year_day = year_day + 1
print(i, hour, year_day)
Your code is ok, you just need to start with hour=1 and print before the if statement. Try the following:
hour = 1
year_day = 1
for i in range(1, 8761):
print(i, hour, year_day)
hour+=1
if i % 24 == 0:
hour = 1
year_day = year_day + 1
Output:
...
21 21 1
22 22 1
23 23 1
24 24 1
25 1 2
26 2 2
27 3 2
...
I have used a pandas approach to this question. The code is as follows:
import numpy as np
import pandas as pd
i = list(range(1,50))
df = pd.DataFrame(i, columns=["i"])
df["hours"] = df["i"]%24
df["hours"][df["hours"]==0] = 24
df["days"] = (df["i"]//24.1+1).astype(int)
display(df)
The output is:
i hours days
0 1 1 1
1 2 2 1
2 3 3 1
3 4 4 1
4 5 5 1
5 6 6 1
6 7 7 1
7 8 8 1
8 9 9 1
9 10 10 1
10 11 11 1
11 12 12 1
12 13 13 1
13 14 14 1
14 15 15 1
15 16 16 1
16 17 17 1
17 18 18 1
18 19 19 1
19 20 20 1
20 21 21 1
21 22 22 1
22 23 23 1
23 24 24 1
24 25 1 2
25 26 2 2
26 27 3 2
27 28 4 2
28 29 5 2
29 30 6 2
30 31 7 2
31 32 8 2
32 33 9 2
33 34 10 2
34 35 11 2
35 36 12 2
36 37 13 2
37 38 14 2
38 39 15 2
39 40 16 2
40 41 17 2
41 42 18 2
42 43 19 2
43 44 20 2
44 45 21 2
45 46 22 2
46 47 23 2
47 48 24 2
48 49 1 3
hour = 0
year_day = 1
for i in range(1, 8761):
if i % 24 == 0:
hour = 0
year_day += 1
hour += 1
print(i, hour, year_day)
Returns:
20 20 1
. . .
24 1 2
25 2 2
. . .
46 23 2
47 24 2
48 1 3
Related
How to iterate rows in pandas to perform the Manipulation in a format below
I have a csv file that contains a 365 column and 1152 rows(the rows index is divided like(1,48),(1,48)...), I need to select K maximum rows from every (1,48) row index and perform some manipulation.
Steps I took:
I used df.apply to do this.
Code I tried
def with_battery(val):
for i in range(d2i.shape[0]):
if i in [31,32,33,34,35,36]: #[31,32,33,34,35,36] should be replaced by top K max.
#batterysize = 50
if val.iloc[i]>batterysize:
val.iloc[i]=0
else:
val.iloc[i] -= batterysize
return val
D2j = D2i.apply(with_battery,axis=0)
How the data is:
**Input Dataframe**
1 2 3 4 5 6 7
1 10 11 34 21 23 12 10
2 11 11 11 11 11 11 11
3 32 32 32 32 32 32 32
4 21 21 21 21 21 21 21
5 42 42 42 42 42 42 42
6 34 34 34 34 34 34 34
1 21 21 21 21 21 21 21
2 22 22 22 22 22 22 22
3 54 54 54 54 54 54 54
4 45 45 45 45 45 45 45
5 43 43 43 43 43 43 43
6 42 42 42 42 42 42 42
> for K=3, the row (3,5,6) is max so I made the value less than 50 as Zero and value more than 50 as value - 50. Similarly in next chunk of rows (3,4,5) is top 3 max rows and I performed similar action as above
Output Dataframe
1 2 3 4 5 6 7
1 10 11 34 21 23 12 10
2 11 11 11 11 11 11 11
3 0 0 0 0 0 0 0
4 21 21 21 21 21 21 21
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0
1 21 21 21 21 21 21 21
2 22 22 22 22 22 22 22
3 4 4 4 4 4 4 4
4 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0
6 42 42 42 42 42 42 42
I have a dataframe with profit values, IDs, and week values. It looks a little like this
ID
Week
Profit
A
1
2
A
2
2
A
3
0
A
4
0
I want to create two new columns called "Bi-Weekly" and "Monthly", so week 1 would be label 2, week 2 would also be label 2, but week 3 would be labeled 4, and week 4 would be labeled 4, and they would all be labeled month 1, so I could groupby weekly, bi-weekly, or monthly profit as needed. Right now I've created two functions which work, but the weeks are going to go up to a year (52 weeks) so I was wondering if there's a more efficient way. My bi-weekly function below.
def biweek(prof_calc):
if (prof_calc['week']==2):
return 2
elif (prof_calc['week']==3):
return 2
elif (prof_calc['week']==4):
return 4
elif (prof_calc['week']==5):
return 4
elif (prof_calc['week']==6):
return 6
elif (prof_calc['week']==7):
return 6
elif (prof_calc['week']==8):
return 8
elif (prof_calc['week']==9):
return 8
elif (prof_calc['week']==10):
return 10
elif (prof_calc['week']==11):
return 10
prof_calc['BiWeek'] = prof_calc.apply(biweek, axis=1)
IIUC, you could try:
df["Biweekly"] = (df["Week"]-1)//2+1
df["Monthly"] = (df["Week"]-1)//4+1
>>> df
ID Week Profit Biweekly Monthly
0 A 1 42 1 1
1 A 2 69 1 1
2 A 3 53 2 1
3 A 4 63 2 1
4 A 5 56 3 2
5 A 6 57 3 2
6 A 7 86 4 2
7 A 8 23 4 2
8 A 9 35 5 3
9 A 10 10 5 3
10 A 11 25 6 3
11 A 12 21 6 3
12 A 13 39 7 4
13 A 14 82 7 4
14 A 15 76 8 4
15 A 16 20 8 4
16 A 17 97 9 5
17 A 18 67 9 5
18 A 19 21 10 5
19 A 20 22 10 5
20 A 21 88 11 6
21 A 22 67 11 6
22 A 23 33 12 6
23 A 24 38 12 6
24 A 25 8 13 7
25 A 26 67 13 7
26 A 27 16 14 7
27 A 28 49 14 7
28 A 29 3 15 8
29 A 30 17 15 8
30 A 31 79 16 8
31 A 32 19 16 8
32 A 33 21 17 9
33 A 34 9 17 9
34 A 35 56 18 9
35 A 36 83 18 9
36 A 37 1 19 10
37 A 38 53 19 10
38 A 39 66 20 10
39 A 40 55 20 10
40 A 41 85 21 11
41 A 42 90 21 11
42 A 43 34 22 11
43 A 44 3 22 11
44 A 45 9 23 12
45 A 46 28 23 12
46 A 47 58 24 12
47 A 48 14 24 12
48 A 49 42 25 13
49 A 50 69 25 13
50 A 51 76 26 13
51 A 52 49 26 13
I am working on a dataframe and I want to group the data for an hour into 4 different slots of 15 mins,
0-15 - 1st slot
15-30 - 2nd slot
30-45 - 3rd slot
45-00(or 60) - 4th slot
I am not even able to think, how to go forward with this
I tried extracting hours, minutes and seconds from the time, but what to do now?
Use integer division by 15 and then add 1:
df = pd.DataFrame({'M': range(60)})
df['slot'] = df['M'] // 15 + 1
print (df)
M slot
0 0 1
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 7 1
8 8 1
9 9 1
10 10 1
11 11 1
12 12 1
13 13 1
14 14 1
15 15 2
16 16 2
17 17 2
18 18 2
19 19 2
20 20 2
21 21 2
22 22 2
23 23 2
24 24 2
25 25 2
26 26 2
27 27 2
28 28 2
29 29 2
30 30 3
31 31 3
32 32 3
33 33 3
34 34 3
35 35 3
36 36 3
37 37 3
38 38 3
39 39 3
40 40 3
41 41 3
42 42 3
43 43 3
44 44 3
45 45 4
46 46 4
47 47 4
48 48 4
49 49 4
50 50 4
51 51 4
52 52 4
53 53 4
54 54 4
55 55 4
56 56 4
57 57 4
58 58 4
59 59 4
Here I'm sharing a sample data(I'm dealing with Big Data), the "counts" value varies from 1 to 3000+,, sometimes more than that..
Sample data looks like :
ID counts
41 44 17 16 19 52 6
17 30 16 19 4
52 41 44 30 17 16 6
41 44 52 41 41 41 6
17 17 17 17 41 5
I was trying to split "ID" column into multiple & trying to get that count,,
data= reading the csv_file
split_data = data.ID.apply(lambda x: pd.Series(str(x).split(" "))) # separating columns
as I mentioned, I'm dealing with big data,, so this method is not that much effective..i'm facing problem to get the "ID" counts
I want to collect the total counts of each ID & map it to the corresponding ID column.
Expected output:
ID counts 16 17 19 30 41 44 52
41 41 17 16 19 52 6 1 1 1 0 2 0 1
17 30 16 19 4 1 1 1 1 0 0 0
52 41 44 30 17 16 6 1 1 0 1 1 1 1
41 44 52 41 41 41 6 0 0 0 0 4 1 1
17 17 17 17 41 5 0 4 0 0 1 0 0
If you have any idea,, please let me know
Thank you
Use Counter for get counts of values splitted by space in list comprehension:
from collections import Counter
L = [{int(k): v for k, v in Counter(x.split()).items()} for x in df['ID']]
df1 = pd.DataFrame(L, index=df.index).fillna(0).astype(int).sort_index(axis=1)
df = df.join(df1)
print (df)
ID counts 16 17 19 30 41 44 52
0 41 44 17 16 19 52 6 1 1 1 0 1 1 1
1 17 30 16 19 4 1 1 1 1 0 0 0
2 52 41 44 30 17 16 6 1 1 0 1 1 1 1
3 41 44 52 41 41 41 6 0 0 0 0 4 1 1
4 17 17 17 17 41 5 0 4 0 0 1 0 0
Another idea, but I guess slowier:
df1 = df.assign(a = df['ID'].str.split()).explode('a')
df1 = df.join(pd.crosstab(df1['ID'], df1['a']), on='ID')
print (df1)
ID counts 16 17 19 30 41 44 52
0 41 44 17 16 19 52 6 1 1 1 0 1 1 1
1 17 30 16 19 4 1 1 1 1 0 0 0
2 52 41 44 30 17 16 6 1 1 0 1 1 1 1
3 41 44 52 41 41 41 6 0 0 0 0 4 1 1
4 17 17 17 17 41 5 0 4 0 0 1 0 0
I have some code here:
for i in range(self.size):
print('{:6d}'.format(self.data[i], end=' '))
if (i + 1) % NUMBER_OF_COLUMNS == 0:
print()
Right now this prints as:
1
1
1
1
1
2
3
3
3
3
(whitespace)
3
3
3
etc.
It creates a new line when it hits 10 digits, but it doens't print the initial 10 in a row...
This is what I want-
1 1 1 1 1 1 1 2 2 3
3 3 3 3 3 4 4 4 4 5
However when it hits two digit numbers it gets messed up -
8 8 8 8 8 9 9 9 9 10
10 10 10 10 10 10 etc.
I want it to be right-aligned like this-
8 8 8 8 8 9
10 10 10 10 11 12 etc.
When I remove the format piece it will print the rows out, but there wont be the extra spacing in there of course!
You can align strings by "padding" values using a string's .rjust method. Using some dummy data:
NUMBER_OF_COLUMNS = 10
for i in range(100):
print("{}".format(i//2).rjust(3), end=' ')
#print("{:3}".format(i//2), end=' ') edit: this also works. Thanks AChampion
if (i + 1) % NUMBER_OF_COLUMNS == 0:
print()
#Output:
0 0 1 1 2 2 3 3 4 4
5 5 6 6 7 7 8 8 9 9
10 10 11 11 12 12 13 13 14 14
15 15 16 16 17 17 18 18 19 19
20 20 21 21 22 22 23 23 24 24
25 25 26 26 27 27 28 28 29 29
30 30 31 31 32 32 33 33 34 34
35 35 36 36 37 37 38 38 39 39
40 40 41 41 42 42 43 43 44 44
45 45 46 46 47 47 48 48 49 49
Another approach is to just chunk up the data into rows and print each row, e.g.:
def chunk(iterable, n):
return zip(*[iter(iterable)]*n)
for row in chunk(self.data, NUMBER_OF_COLUMNS):
print(' '.join(str(data).rjust(6) for data in row))
e.g:
In []:
for row in chunk(range(100), 10):
print(' '.join(str(data//2).rjust(3) for data in row))
Out[]:
0 0 1 1 2 2 3 3 4 4
5 5 6 6 7 7 8 8 9 9
10 10 11 11 12 12 13 13 14 14
15 15 16 16 17 17 18 18 19 19
20 20 21 21 22 22 23 23 24 24
25 25 26 26 27 27 28 28 29 29
30 30 31 31 32 32 33 33 34 34
35 35 36 36 37 37 38 38 39 39
40 40 41 41 42 42 43 43 44 44
45 45 46 46 47 47 48 48 49 49