I have a df as follows
Time Samstag
0 00:15:00 80.6
1 00:30:00 74.6
2 00:45:00 69.2
3 01:00:00 63.6
4 01:15:00 57.1
5 01:30:00 50.4
6 01:45:00 44.1
7 02:00:00 39.1
8 02:15:00 36.0
9 02:30:00 34.4
10 02:45:00 33.7
11 03:00:00 33.3
12 03:15:00 32.7
13 03:30:00 32.0
14 03:45:00 31.5
15 04:00:00 31.3
16 04:15:00 31.5
17 04:30:00 31.7
18 04:45:00 31.5
19 05:00:00 30.3
20 05:15:00 28.1
21 05:30:00 26.4
22 05:45:00 27.1
23 06:00:00 32.3
24 06:15:00 42.9
25 06:30:00 56.2
26 06:45:00 68.5
27 07:00:00 76.3
28 07:15:00 77.0
29 07:30:00 72.9
30 07:45:00 67.3
31 08:00:00 63.6
32 08:15:00 64.5
33 08:30:00 69.5
34 08:45:00 77.4
35 09:00:00 87.1
36 09:15:00 97.4
37 09:30:00 108.4
38 09:45:00 119.9
39 10:00:00 132.1
40 10:15:00 144.7
41 10:30:00 156.7
42 10:45:00 166.9
43 11:00:00 174.1
44 11:15:00 177.4
45 11:30:00 177.7
46 11:45:00 176.2
47 12:00:00 174.1
48 12:15:00 172.6
49 12:30:00 172.0
50 12:45:00 172.4
51 13:00:00 174.1
52 13:15:00 177.1
53 13:30:00 180.4
54 13:45:00 183.0
55 14:00:00 183.9
56 14:15:00 182.4
57 14:30:00 179.5
58 14:45:00 176.6
59 15:00:00 175.1
60 15:15:00 176.0
61 15:30:00 178.9
62 15:45:00 182.8
63 16:00:00 186.8
64 16:15:00 190.3
65 16:30:00 193.8
66 16:45:00 197.9
67 17:00:00 203.5
68 17:15:00 210.8
69 17:30:00 218.8
70 17:45:00 226.3
71 18:00:00 231.8
72 18:15:00 234.4
73 18:30:00 234.5
74 18:45:00 233.0
75 19:00:00 230.9
76 19:15:00 228.7
77 19:30:00 226.9
78 19:45:00 225.3
79 20:00:00 224.0
80 20:15:00 223.0
81 20:30:00 221.5
82 20:45:00 218.9
83 21:00:00 214.2
84 21:15:00 207.0
85 21:30:00 197.0
86 21:45:00 184.4
87 22:00:00 169.2
88 22:15:00 151.8
89 22:30:00 133.7
90 22:45:00 116.7
91 23:00:00 102.7
92 23:15:00 93.0
93 23:30:00 86.6
94 23:45:00 82.2
I am trying to plot this as follows:
sns.lineplot(x="Time", y="Samstag", data=w_df)
plt.xticks(rotation=15)
plt.xlabel("Time")
plt.ylabel("KWH")
plt.show()
and it gives:
The label of x-axis is 00:00, 05:33:20, .... and so on.
I am trying to plot the Time column as the ticks in x-axis
I tried:
t = pd.to_datetime(w_df["Time"], format='%H:%M:%S')
t = t.apply(lambda x: x.strftime('%H:%M:%S'))
sns.lineplot(x="Time", y="Samstag", data=w_df)
plt.xticks(ticks=t, rotation=15)
plt.xlabel("Time")
plt.ylabel("KWH")
plt.show()
It throws the following error:
Traceback (most recent call last):
File "", line 2, in
plt.xticks(ticks=t, rotation=15)
File
"/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/pyplot.py",
line 1540, in xticks
locs = ax.set_xticks(ticks)
File
"/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_base.py",
line 3350, in set_xticks
ret = self.xaxis.set_ticks(ticks, minor=minor)
File
"/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py",
line 1755, in set_ticks
self.set_view_interval(min(ticks), max(ticks))
File
"/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py",
line 1892, in setter
setter(self, min(vmin, vmax, oldmin), max(vmin, vmax, oldmax),
TypeError: '<' not supported between instances of 'numpy.ndarray' and
'str'
Can anyone please tell the mistake that I am doing?
Also,
w_df.dtypes
Out[27]:
Time object
Samstag float64
Sonntag float64
Werktag float64
dtype: object
So I took some of your data and attempted to get your result. Unfortunately, my Seaborn plot is plotting in the same format that you would like. This may have to do with the format of your time column. When I made my small dataset from your example, I made the time column a string, and it appears that everything is plotting fine.
d = {'Time': ["00:15:00", "00:30:00", "00:45:00", "01:00:00", "01:15:00", "01:30:00", "01:45:00",
"02:00:00", "02:15:00", "02:30:00", "02:45:00", "03:00:00", "03:15:00", "03:30:00", "03:45:00",
"04:00:00", "04:15:00", "04:30:00", "04:45:00", "05:00:00", "05:15:00", "05:30:00",
"05:45:00", "06:00:00"],
'Samstag': [80.6, 74.6,69.2, 62.6, 57.1,50.4, 44.1, 39.1, 36.0, 34.4, 33.7,33.3, 32.7, 32.0,
31.5, 31.3, 31.5, 31.7, 31.5,30.3, 28.1, 26.4, 27.1, 32.3]
}
df = pd.DataFrame(d)
sns.lineplot(x="Time", y="Samstag", data=df)
plt.xticks(rotation=15)
plt.xlabel("Time")
plt.ylabel("KWH")
plt.show()
This makes every time stamp a tick mark. Perhaps you can change your time column to be a string, if it is not already.
df['Time'] = df['Time'].astype(str)
I have a dataframe with a datetime index and 100 columns.
I want to have a new dataframe with the same datetime index and columns, but the values would contain the sum of the first 10 hours of each day.
So if I had an original dataframe like this:
A B C
---------------------------------
2018-01-01 00:00:00 2 5 -10
2018-01-01 01:00:00 6 5 7
2018-01-01 02:00:00 7 5 9
2018-01-01 03:00:00 9 5 6
2018-01-01 04:00:00 10 5 2
2018-01-01 05:00:00 7 5 -1
2018-01-01 06:00:00 1 5 -1
2018-01-01 07:00:00 -4 5 10
2018-01-01 08:00:00 9 5 10
2018-01-01 09:00:00 21 5 -10
2018-01-01 10:00:00 2 5 -1
2018-01-01 11:00:00 8 5 -1
2018-01-01 12:00:00 8 5 10
2018-01-01 13:00:00 8 5 9
2018-01-01 14:00:00 7 5 -10
2018-01-01 15:00:00 7 5 5
2018-01-01 16:00:00 7 5 -10
2018-01-01 17:00:00 4 5 7
2018-01-01 18:00:00 5 5 8
2018-01-01 19:00:00 2 5 8
2018-01-01 20:00:00 2 5 4
2018-01-01 21:00:00 8 5 3
2018-01-01 22:00:00 1 5 3
2018-01-01 23:00:00 1 5 1
2018-01-02 00:00:00 2 5 2
2018-01-02 01:00:00 3 5 8
2018-01-02 02:00:00 4 5 6
2018-01-02 03:00:00 5 5 6
2018-01-02 04:00:00 1 5 7
2018-01-02 05:00:00 7 5 7
2018-01-02 06:00:00 5 5 1
2018-01-02 07:00:00 2 5 2
2018-01-02 08:00:00 4 5 3
2018-01-02 09:00:00 6 5 4
2018-01-02 10:00:00 9 5 4
2018-01-02 11:00:00 11 5 5
2018-01-02 12:00:00 2 5 8
2018-01-02 13:00:00 2 5 0
2018-01-02 14:00:00 4 5 5
2018-01-02 15:00:00 5 5 4
2018-01-02 16:00:00 7 5 4
2018-01-02 17:00:00 -1 5 7
2018-01-02 18:00:00 1 5 7
2018-01-02 19:00:00 1 5 7
2018-01-02 20:00:00 5 5 7
2018-01-02 21:00:00 2 5 7
2018-01-02 22:00:00 2 5 7
2018-01-02 23:00:00 8 5 7
So for all rows with date 2018-01-01:
The value for column A would be 68 (2+6+7+9+10+7+1-4+9+21)
The value for column B would be 50 (5+5+5+5+5+5+5+5+5+5)
The value for column C would be 22 (-10+7+9+6+2-1-1+10+10-10)
So for all rows with date 2018-01-02:
The value for column A would be 39 (2+3+4+5+1+7+5+2+4+6)
The value for column B would be 50 (5+5+5+5+5+5+5+5+5+5)
The value for column C would be 46 (2+8+6+6+7+7+1+2+3+4)
The outcome would be:
A B C
---------------------------------
2018-01-01 00:00:00 68 50 22
2018-01-01 01:00:00 68 50 22
2018-01-01 02:00:00 68 50 22
2018-01-01 03:00:00 68 50 22
2018-01-01 04:00:00 68 50 22
2018-01-01 05:00:00 68 50 22
2018-01-01 06:00:00 68 50 22
2018-01-01 07:00:00 68 50 22
2018-01-01 08:00:00 68 50 22
2018-01-01 09:00:00 68 50 22
2018-01-01 10:00:00 68 50 22
2018-01-01 11:00:00 68 50 22
2018-01-01 12:00:00 68 50 22
2018-01-01 13:00:00 68 50 22
2018-01-01 14:00:00 68 50 22
2018-01-01 15:00:00 68 50 22
2018-01-01 16:00:00 68 50 22
2018-01-01 17:00:00 68 50 22
2018-01-01 18:00:00 68 50 22
2018-01-01 19:00:00 68 50 22
2018-01-01 20:00:00 68 50 22
2018-01-01 21:00:00 68 50 22
2018-01-01 22:00:00 68 50 22
2018-01-01 23:00:00 68 50 22
2018-01-02 00:00:00 39 50 46
2018-01-02 01:00:00 39 50 46
2018-01-02 02:00:00 39 50 46
2018-01-02 03:00:00 39 50 46
2018-01-02 04:00:00 39 50 46
2018-01-02 05:00:00 39 50 46
2018-01-02 06:00:00 39 50 46
2018-01-02 07:00:00 39 50 46
2018-01-02 08:00:00 39 50 46
2018-01-02 09:00:00 39 50 46
2018-01-02 10:00:00 39 50 46
2018-01-02 11:00:00 39 50 46
2018-01-02 12:00:00 39 50 46
2018-01-02 13:00:00 39 50 46
2018-01-02 14:00:00 39 50 46
2018-01-02 15:00:00 39 50 46
2018-01-02 16:00:00 39 50 46
2018-01-02 17:00:00 39 50 46
2018-01-02 18:00:00 39 50 46
2018-01-02 19:00:00 39 50 46
2018-01-02 20:00:00 39 50 46
2018-01-02 21:00:00 39 50 46
2018-01-02 22:00:00 39 50 46
2018-01-02 23:00:00 39 50 46
I figured I'd group by date first and perform a sum and then merge the results based on the date. Is there a better/faster way to do this?
Thanks.
EDIT: I worked on this answer in the mean time:
df= df.between_time('0:00','9:00').groupby(pd.Grouper(freq='D')).sum()
df= df.resample('1H').ffill()
You need groupby df.index.date and use transfrom with lambda function to find sum of first 10 values as:
df.loc[:,['A','B','C']] = df.groupby(df.index.date).transform(lambda x: x[:10].sum())
Or if the sequence is the same for both grouped values and real columns
df.loc[:,:] = df.groupby(df.index.date).transform(lambda x: x[:10].sum())
print(df)
A B C
2018-01-01 00:00:00 68 50 22
2018-01-01 01:00:00 68 50 22
2018-01-01 02:00:00 68 50 22
2018-01-01 03:00:00 68 50 22
2018-01-01 04:00:00 68 50 22
2018-01-01 05:00:00 68 50 22
2018-01-01 06:00:00 68 50 22
2018-01-01 07:00:00 68 50 22
2018-01-01 08:00:00 68 50 22
2018-01-01 09:00:00 68 50 22
2018-01-01 10:00:00 68 50 22
2018-01-01 11:00:00 68 50 22
2018-01-01 12:00:00 68 50 22
2018-01-01 13:00:00 68 50 22
2018-01-01 14:00:00 68 50 22
2018-01-01 15:00:00 68 50 22
2018-01-01 16:00:00 68 50 22
2018-01-01 17:00:00 68 50 22
2018-01-01 18:00:00 68 50 22
2018-01-01 19:00:00 68 50 22
2018-01-01 20:00:00 68 50 22
2018-01-01 21:00:00 68 50 22
2018-01-01 22:00:00 68 50 22
2018-01-01 23:00:00 68 50 22
2018-01-02 00:00:00 39 50 46
2018-01-02 01:00:00 39 50 46
2018-01-02 02:00:00 39 50 46
2018-01-02 03:00:00 39 50 46
2018-01-02 04:00:00 39 50 46
2018-01-02 05:00:00 39 50 46
2018-01-02 06:00:00 39 50 46
2018-01-02 07:00:00 39 50 46
2018-01-02 08:00:00 39 50 46
2018-01-02 09:00:00 39 50 46
2018-01-02 10:00:00 39 50 46
2018-01-02 11:00:00 39 50 46
2018-01-02 12:00:00 39 50 46
2018-01-02 13:00:00 39 50 46
2018-01-02 14:00:00 39 50 46
2018-01-02 15:00:00 39 50 46
2018-01-02 16:00:00 39 50 46
2018-01-02 17:00:00 39 50 46
2018-01-02 18:00:00 39 50 46
2018-01-02 19:00:00 39 50 46
2018-01-02 20:00:00 39 50 46
2018-01-02 21:00:00 39 50 46
2018-01-02 22:00:00 39 50 46
2018-01-02 23:00:00 39 50 46
I have a pandas dataframe with time-series data in 1-min intervals. Is there a pythonic way to slice my data for every 15 min like this?
a=pd.DataFrame(index=pd.date_range('2017-01-01 00:04','2017-01-01 01:04',freq='1T'))
a['data']=np.arange(61)
for i in range(0,len(a),15):
print a[i:i+15]
Is there any built in function for this in pandas?
IIUC, use groups and pd.Grouper with freq=15min
for _, g in a.groupby(pd.Grouper(freq='15min')):
print(g)
Can also do
groups = a.groupby(pd.Grouper(freq='15min'))
list(groups)
Outputs
data
2017-01-01 00:04:00 0
2017-01-01 00:05:00 1
2017-01-01 00:06:00 2
2017-01-01 00:07:00 3
2017-01-01 00:08:00 4
2017-01-01 00:09:00 5
2017-01-01 00:10:00 6
2017-01-01 00:11:00 7
2017-01-01 00:12:00 8
2017-01-01 00:13:00 9
2017-01-01 00:14:00 10
data
2017-01-01 00:15:00 11
2017-01-01 00:16:00 12
2017-01-01 00:17:00 13
2017-01-01 00:18:00 14
2017-01-01 00:19:00 15
2017-01-01 00:20:00 16
2017-01-01 00:21:00 17
2017-01-01 00:22:00 18
2017-01-01 00:23:00 19
2017-01-01 00:24:00 20
2017-01-01 00:25:00 21
2017-01-01 00:26:00 22
2017-01-01 00:27:00 23
2017-01-01 00:28:00 24
2017-01-01 00:29:00 25
data
2017-01-01 00:30:00 26
2017-01-01 00:31:00 27
2017-01-01 00:32:00 28
2017-01-01 00:33:00 29
2017-01-01 00:34:00 30
2017-01-01 00:35:00 31
2017-01-01 00:36:00 32
2017-01-01 00:37:00 33
2017-01-01 00:38:00 34
2017-01-01 00:39:00 35
2017-01-01 00:40:00 36
2017-01-01 00:41:00 37
2017-01-01 00:42:00 38
2017-01-01 00:43:00 39
2017-01-01 00:44:00 40
I'm trying to slice a Dataframe using DateTimeIndex, but a got one issue.
When the new DataFrame Change Month, he switch the day and the month.
Here is my dataframe:
Valeur
date
2015-01-08 00:00:00 93
2015-01-08 00:10:00 90
2015-01-08 00:20:00 88
2015-01-08 00:30:00 103
2015-01-08 00:40:00 86
2015-01-08 00:50:00 88
2015-01-08 01:00:00 86
2015-01-08 01:10:00 84
2015-01-08 01:20:00 95
2015-01-08 01:30:00 88
2015-01-08 01:40:00 85
2015-01-08 01:50:00 92
... ...
2016-10-30 22:20:00 98
2016-10-30 22:30:00 94
2016-10-30 22:40:00 94
2016-10-30 22:50:00 103
2016-10-30 23:00:00 92
2016-10-30 23:10:00 85
2016-10-30 23:20:00 98
2016-10-30 23:30:00 96
2016-10-30 23:40:00 95
2016-10-30 23:50:00 101
[65814 rows x 1 columns]
Here my two TimeStamps:
startingDate : 2015-10-31 23:50:00
lastDate : 2016-10-30 23:50:00
When i slice my df like this :
dfconso = dfconso[startingDate:lastDate]
i got something like this :
Valeur
date
2015-10-31 23:50:00 88
2015-01-11 00:00:00 83
2015-01-11 00:10:00 82
2015-01-11 00:20:00 87
2015-01-11 00:30:00 77
2015-01-11 00:40:00 72
2015-01-11 00:50:00 86
2015-01-11 01:00:00 77
2015-01-11 01:10:00 80
... ...
2016-10-30 23:10:00 85
2016-10-30 23:20:00 98
2016-10-30 23:30:00 96
2016-10-30 23:40:00 95
2016-10-30 23:50:00 101
The problem is the slice start at the good date, but when the DateTimeIndex change month, something wrong append.
Pass from 31 October 2015 to 11 January 2015.
And i don't understand why..
I try to print the month and day to see and i got that :
In:
print("Index 0 : month", dfconso.index[0].month, ", day", dfconso.index[0].day)
print("Index 1 : month", dfconso.index[1].month, ", day", dfconso.index[1].day)
Out:
Index 0 : month 10 , day 31
Index 1 : month 1 , day 11
If someone has an idea
EDIT :
After df.sort_index() my df, i can see the convert of String date to TimeStamps date, didn't work sometimes, and switch Month and Day.
Format at String :
"31/08/2015 20:00:00"
My code to transform from String to TimeStamps:
dfconso.index = pd.to_datetime(dfconso.index, infer_datetime_format=True, format="%d/%m/%Y")
SOLUTION :
that was a bad use of pd.to_datetime, i change infer_date_time_format to Dayfirst :
dfconso.index = pd.to_datetime(dfconso.index, dayfirst=True)
That solve my problem.
The error might not be a mixup of day and month, but just an ordering problem. Try reordering the data before slicing it (the provided part of your data looks fine, but who knows about the rest..).
Here is how reordering works: Sort a pandas datetime index
I have a pandas DataFrame that represents a value for every hour of a day and I want to report each value of each day for a year. I have written the 'naive' way to do it. Is there a more efficient way?
Naive way (that works correctly, but takes a lot of time):
dfConsoFrigo = pd.read_csv("../assets/datas/refregirateur.csv", sep=';')
dataframe = pd.DataFrame(columns=['Puissance'])
iterator = 0
for day in pd.date_range("01 Jan 2017 00:00", "31 Dec 2017 23:00", freq='1H'):
iterator = iterator % 24
dataframe.loc[day] = dfConsoFrigo.iloc[iterator]['Puissance']
iterator += 1
Input (time;value) 24 rows:
Heure;Puissance
00:00;48.0
01:00;47.0
02:00;46.0
03:00;46.0
04:00;45.0
05:00;46.0
...
19:00;55.0
20:00;53.0
21:00;51.0
22:00;50.0
23:00;49.0
Expected Output (8760 rows):
Puissance
2017-01-01 00:00:00 48
2017-01-01 01:00:00 47
2017-01-01 02:00:00 46
2017-01-01 03:00:00 46
2017-01-01 04:00:00 45
...
2017-12-31 20:00:00 53
2017-12-31 21:00:00 51
2017-12-31 22:00:00 50
2017-12-31 23:00:00 49
I think you need numpy.tile:
np.random.seed(10)
df = pd.DataFrame({'Puissance':np.random.randint(100, size=24)})
rng = pd.date_range("01 Jan 2017 00:00", "31 Dec 2017 23:00", freq='1H')
df = pd.DataFrame({'a':np.tile(df['Puissance'].values, 365)}, index=rng)
print (df.head(30))
a
2017-01-01 00:00:00 9
2017-01-01 01:00:00 15
2017-01-01 02:00:00 64
2017-01-01 03:00:00 28
2017-01-01 04:00:00 89
2017-01-01 05:00:00 93
2017-01-01 06:00:00 29
2017-01-01 07:00:00 8
2017-01-01 08:00:00 73
2017-01-01 09:00:00 0
2017-01-01 10:00:00 40
2017-01-01 11:00:00 36
2017-01-01 12:00:00 16
2017-01-01 13:00:00 11
2017-01-01 14:00:00 54
2017-01-01 15:00:00 88
2017-01-01 16:00:00 62
2017-01-01 17:00:00 33
2017-01-01 18:00:00 72
2017-01-01 19:00:00 78
2017-01-01 20:00:00 49
2017-01-01 21:00:00 51
2017-01-01 22:00:00 54
2017-01-01 23:00:00 77
2017-01-02 00:00:00 9
2017-01-02 01:00:00 15
2017-01-02 02:00:00 64
2017-01-02 03:00:00 28
2017-01-02 04:00:00 89
2017-01-02 05:00:00 93