How to order dataframe for plotting 3d bar in pandas - python

I am trying to create a chart with multiple bars in 3d from pandas. Reviewing some examples on the web, I see that the best way to accomplish this is to get a dataframe like this:
data
Variable A B C D
date
2000-01-03 0.469112 -1.135632 0.119209 -2.104569
2000-01-04 -0.282863 1.212112 -1.044236 -0.494929
2000-01-05 -1.509059 -0.173215 -0.861849 1.071804
My dataframe is:
df
Date_inicio Date_Fin Date_Max Clase
0 2004-04-09 23:00:00 2004-04-10 04:00:00 2004-04-10 02:00:00 MBCCM
1 2004-04-12 23:00:00 2004-04-13 04:00:00 2004-04-13 00:00:00 MBSCL
2 2004-04-24 04:00:00 2004-04-24 12:00:00 2004-04-24 09:00:00 SCL
3 2004-05-02 07:00:00 2004-05-02 14:00:00 2004-05-02 11:00:00 SCL
4 2004-05-30 05:00:00 2004-05-30 08:00:00 2004-05-30 07:00:00 MBCCM
5 2004-05-31 03:00:00 2004-05-31 07:00:00 2004-05-31 05:00:00 MBCCM
6 2004-06-08 00:00:00 2004-06-08 05:00:00 2004-06-08 03:00:00 MBSCL
7 2004-06-12 22:00:00 2004-06-13 12:00:00 2004-06-13 06:00:00 CCM
8 2004-06-13 03:00:00 2004-06-13 08:00:00 2004-06-13 06:00:00 MBCCM
9 2004-06-14 00:00:00 2004-06-14 03:00:00 2004-06-14 02:00:00 MBSCL
10 2004-06-14 03:00:00 2004-06-14 09:00:00 2004-06-14 07:00:00 MBSCL
11 2004-06-17 08:00:00 2004-06-17 14:00:00 2004-06-17 11:00:00 MBCCM
12 2004-06-17 12:00:00 2004-06-17 17:00:00 2004-06-17 14:00:00 MBCCM
13 2004-06-22 00:00:00 2004-06-22 08:00:00 2004-06-22 06:00:00 SCL
14 2004-06-22 08:00:00 2004-06-22 14:00:00 2004-06-22 11:00:00 MBCCM
15 2004-06-22 23:00:00 2004-06-23 09:00:00 2004-06-23 06:00:00 CCM
16 2004-07-01 05:00:00 2004-07-01 09:00:00 2004-07-01 06:00:00 MBCCM
17 2004-07-02 00:00:00 2004-07-02 04:00:00 2004-07-02 02:00:00 MBSCL
18 2004-07-04 12:00:00 2004-07-04 15:00:00 2004-07-04 13:00:00 MBCCM
19 2004-07-06 04:00:00 2004-07-06 13:00:00 2004-07-06 07:00:00 SCL
20 2004-07-07 04:00:00 2004-07-07 12:00:00 2004-07-07 10:00:00 CCM
21 2004-07-08 03:00:00 2004-07-08 06:00:00 2004-07-08 05:00:00 MBCCM
22 2004-07-08 12:00:00 2004-07-08 17:00:00 2004-07-08 13:00:00 MBCCM
23 2004-07-08 02:00:00 2004-07-08 06:00:00 2004-07-08 04:00:00 MBCCM
24 2004-07-09 05:00:00 2004-07-09 12:00:00 2004-07-09 08:00:00 CCM
25 2004-07-11 18:00:00 2004-07-12 12:00:00 2004-07-11 21:00:00 MBSCL
26 2004-07-11 23:00:00 2004-07-12 05:00:00 2004-07-12 02:00:00 MBSCL
27 2004-07-15 11:00:00 2004-07-15 19:00:00 2004-07-15 12:00:00 CCM
28 2004-07-16 12:00:00 2004-07-16 16:00:00 2004-07-16 14:00:00 MBCCM
29 2004-07-17 02:00:00 2004-07-17 06:00:00 2004-07-17 05:00:00 MBCCM
Now I want to count the occurrences of all classes per hour, that is, how many times an event falls at each hour of the day in Date_inicio, Date_Fin and Date_Max. From df I obtain the following frequency table:
frec
Frec_Inicio Frec_Max Frec_Fin
Horas
1 2 0 1
2 3 8 1
3 5 3 2
4 6 2 6
5 6 6 5
6 5 6 4
7 5 7 2
8 2 4 5
9 1 6 6
10 0 3 2
11 2 5 5
12 4 1 9
13 2 4 2
14 3 2 4
15 0 2 3
16 1 1 3
17 0 2 3
18 1 1 1
19 0 0 3
20 1 1 1
21 1 1 0
22 3 1 0
23 9 1 0
24 8 3 2
Now, my goal is to plot a 3D bar chart like the figure below.
To achieve this, I wrote the following code:
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# One bar per (hour, column) cell; indexing='ij' keeps the grid in the same
# row-major order as frec.values.ravel()
xpos, ypos = np.meshgrid(np.arange(frec.shape[0]) + 0.25,
                         np.arange(frec.shape[1]) + 0.25, indexing='ij')
xpos = xpos.flatten()
ypos = ypos.flatten()
zpos = np.zeros(frec.shape).flatten()
dx = 0.5 * np.ones_like(zpos)
dy = 0.5 * np.ones_like(zpos)
dz = frec.values.ravel().astype(float)
dz[np.isnan(dz)] = 0.
ax.bar3d(xpos, ypos, zpos, dx, dy, dz, color='b', alpha=0.5)
# The number of tick positions must match the number of labels on each axis
ax.set_xticks(np.arange(frec.shape[0]) + 0.5)
ax.set_yticks(np.arange(frec.shape[1]) + 0.5)
ax.set_xticklabels(frec.index)
ax.set_yticklabels(frec.columns)
ax.set_xlabel('Time')
ax.set_ylabel('B')
ax.set_zlabel('Occurrence')
plt.show()
How can I get a better plot, similar to the figure above?

Here is the code to do the count:
import pandas as pd
text="""Date_inicio, Date_Fin, Date_Max, Clase
2004-04-09 23:00:00, 2004-04-10 04:00:00, 2004-04-10 02:00:00, MBCCM
2004-04-12 23:00:00, 2004-04-13 04:00:00, 2004-04-13 00:00:00, MBSCL
2004-04-24 04:00:00, 2004-04-24 12:00:00, 2004-04-24 09:00:00, SCL
2004-05-02 07:00:00, 2004-05-02 14:00:00, 2004-05-02 11:00:00, SCL
2004-05-30 05:00:00, 2004-05-30 08:00:00, 2004-05-30 07:00:00, MBCCM
2004-05-31 03:00:00, 2004-05-31 07:00:00, 2004-05-31 05:00:00, MBCCM
2004-06-08 00:00:00, 2004-06-08 05:00:00, 2004-06-08 03:00:00, MBSCL
2004-06-12 22:00:00, 2004-06-13 12:00:00, 2004-06-13 06:00:00, CCM
2004-06-13 03:00:00, 2004-06-13 08:00:00, 2004-06-13 06:00:00, MBCCM
2004-06-14 00:00:00, 2004-06-14 03:00:00, 2004-06-14 02:00:00, MBSCL
2004-06-14 03:00:00, 2004-06-14 09:00:00, 2004-06-14 07:00:00, MBSCL
2004-06-17 08:00:00, 2004-06-17 14:00:00, 2004-06-17 11:00:00, MBCCM
2004-06-17 12:00:00, 2004-06-17 17:00:00, 2004-06-17 14:00:00, MBCCM
2004-06-22 00:00:00, 2004-06-22 08:00:00, 2004-06-22 06:00:00, SCL
2004-06-22 08:00:00, 2004-06-22 14:00:00, 2004-06-22 11:00:00, MBCCM
2004-06-22 23:00:00, 2004-06-23 09:00:00, 2004-06-23 06:00:00, CCM
2004-07-01 05:00:00, 2004-07-01 09:00:00, 2004-07-01 06:00:00, MBCCM
2004-07-02 00:00:00, 2004-07-02 04:00:00, 2004-07-02 02:00:00, MBSCL
2004-07-04 12:00:00, 2004-07-04 15:00:00, 2004-07-04 13:00:00, MBCCM
2004-07-06 04:00:00, 2004-07-06 13:00:00, 2004-07-06 07:00:00, SCL
2004-07-07 04:00:00, 2004-07-07 12:00:00, 2004-07-07 10:00:00, CCM
2004-07-08 03:00:00, 2004-07-08 06:00:00, 2004-07-08 05:00:00, MBCCM
2004-07-08 12:00:00, 2004-07-08 17:00:00, 2004-07-08 13:00:00, MBCCM
2004-07-08 02:00:00, 2004-07-08 06:00:00, 2004-07-08 04:00:00, MBCCM
2004-07-09 05:00:00, 2004-07-09 12:00:00, 2004-07-09 08:00:00, CCM
2004-07-11 18:00:00, 2004-07-12 12:00:00, 2004-07-11 21:00:00, MBSCL
2004-07-11 23:00:00, 2004-07-12 05:00:00, 2004-07-12 02:00:00, MBSCL
2004-07-15 11:00:00, 2004-07-15 19:00:00, 2004-07-15 12:00:00, CCM
2004-07-16 12:00:00, 2004-07-16 16:00:00, 2004-07-16 14:00:00, MBCCM
2004-07-17 02:00:00, 2004-07-17 06:00:00, 2004-07-17 05:00:00, MBCCM"""
import io
df = pd.read_csv(io.StringIO(text), skipinitialspace=True)
df = df.drop(columns=["Clase"])
# Characters 11-12 of each timestamp string hold the hour
df = df.apply(lambda s: pd.to_numeric(s.str[11:13]))
df2 = df.apply(lambda s: s.value_counts())
print(df2)
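As a possible variation on the count above, the hour can be taken from parsed datetimes rather than by slicing the string. This is only a sketch on the first four rows of the question's data; the names hours and counts are mine, and the hours come out labelled 0 to 23 rather than 1 to 24 as in the question's table.

```python
import pandas as pd

# First four rows of the question's data, without the Clase column
df = pd.DataFrame({
    "Date_inicio": ["2004-04-09 23:00:00", "2004-04-12 23:00:00",
                    "2004-04-24 04:00:00", "2004-05-02 07:00:00"],
    "Date_Fin": ["2004-04-10 04:00:00", "2004-04-13 04:00:00",
                 "2004-04-24 12:00:00", "2004-05-02 14:00:00"],
    "Date_Max": ["2004-04-10 02:00:00", "2004-04-13 00:00:00",
                 "2004-04-24 09:00:00", "2004-05-02 11:00:00"],
})

# Parse every column as datetime and keep only the hour of the day
hours = df.apply(lambda s: pd.to_datetime(s).dt.hour)

# Count how often each hour occurs per column, then fill in the hours
# that never occur so the table always has 24 rows
counts = (
    hours.apply(lambda s: s.value_counts())
    .reindex(range(24))
    .fillna(0)
    .astype(int)
)
print(counts)
```

Reindexing to range(24) means hours with no events show up as explicit zeros, which is convenient when the table is later fed to a bar plot.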
Here is the code that draws the 3D bars:
import pandas as pd
text="""Horas Frec_Inicio Frec_Max Frec_Fin
1 2 0 1
2 3 8 1
3 5 3 2
4 6 2 6
5 6 6 5
6 5 6 4
7 5 7 2
8 2 4 5
9 1 6 6
10 0 3 2
11 2 5 5
12 4 1 9
13 2 4 2
14 3 2 4
15 0 2 3
16 1 1 3
17 0 2 3
18 1 1 1
19 0 0 3
20 1 1 1
21 1 1 0
22 3 1 0
23 9 1 0
24 8 3 2"""
import io
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
df.set_index("Horas", inplace=True)
columns_name = [x.replace("_", " ") for x in df.columns]
df.columns = [0, 2, 4]
x, y, z = df.stack().reset_index().values.T
import visvis as vv
app = vv.use()
f = vv.clf()
a = vv.cla()
bar =vv.bar3(x, y, z, width=0.8)
bar.colors = ["r","g","b"] * 24
a.axis.yTicks = dict(zip(df.columns, columns_name))
app.Run()
the output:

Related

From hours to String

I have this df:
Index Dates
0 2017-01-01 23:30:00
1 2017-01-12 22:30:00
2 2017-01-20 13:35:00
3 2017-01-21 14:25:00
4 2017-01-28 22:30:00
5 2017-08-01 13:00:00
6 2017-09-26 09:39:00
7 2017-10-08 06:40:00
8 2017-10-04 07:30:00
9 2017-12-13 07:40:00
10 2017-12-31 14:55:00
The goal is to build a new df that says "morning" for every row whose time falls between 5:00 and 11:59. To do this I converted the times to booleans:
hour_morning=(pd.to_datetime(df['Dates']).dt.strftime('%H:%M:%S').between('05:00:00','11:59:00'))
and then passed them to a list with "morning" str
text_morning=[str('morning') for x in hour_morning if x==True]
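The last line is where it goes wrong: it returns only 'morning' strings, as if x ignored the if condition. Why is this happening and how do I fix it?
Use a conditional expression instead of a trailing if. A trailing if only filters out the False elements; it never produces a value for them: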
text_morning=[str('morning') if x==True else 'not_morning' for x in hour_morning ]
You can also use np.where:
text_morning = np.where(hour_morning, 'morning', 'not morning')
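To make the difference concrete, here is a small sketch on four of the dates from the question. The variable names filtered, labelled and with_where are mine, chosen only for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime([
    "2017-01-01 23:30:00",
    "2017-09-26 09:39:00",
    "2017-10-08 06:40:00",
    "2017-12-31 14:55:00",
]))
hour_morning = dates.dt.strftime("%H:%M:%S").between("05:00:00", "11:59:00")

# Trailing "if": a filter, so False rows are dropped and the list is shorter
filtered = ["morning" for x in hour_morning if x]

# Conditional expression: one label per row, same length as the input
labelled = ["morning" if x else "not morning" for x in hour_morning]

# Vectorised equivalent of the conditional expression
with_where = np.where(hour_morning, "morning", "not morning")

print(filtered)
print(labelled)
print(with_where)
```

The filtered list has two entries while the other two results have four, so only the latter can be assigned back as a column of df.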
Given:
Dates values
0 2017-01-01 23:30:00 0
1 2017-01-12 22:30:00 1
2 2017-01-20 13:35:00 2
3 2017-01-21 14:25:00 3
4 2017-01-28 22:30:00 4
5 2017-08-01 13:00:00 5
6 2017-09-26 09:39:00 6
7 2017-10-08 06:40:00 7
8 2017-10-04 07:30:00 8
9 2017-12-13 07:40:00 9
10 2017-12-31 14:55:00 10
Doing:
# df.Dates = pd.to_datetime(df.Dates)
df = df.set_index("Dates")
Now we can use pd.DataFrame.between_time:
new_df = df.between_time('05:00:00','11:59:00')
print(new_df)
Output:
values
Dates
2017-09-26 09:39:00 6
2017-10-08 06:40:00 7
2017-10-04 07:30:00 8
2017-12-13 07:40:00 9
Or use it to update the original dataframe:
df.loc[df.between_time('05:00:00','11:59:00').index, 'morning'] = 'morning'
# Output:
values morning
Dates
2017-01-01 23:30:00 0 NaN
2017-01-12 22:30:00 1 NaN
2017-01-20 13:35:00 2 NaN
2017-01-21 14:25:00 3 NaN
2017-01-28 22:30:00 4 NaN
2017-08-01 13:00:00 5 NaN
2017-09-26 09:39:00 6 morning
2017-10-08 06:40:00 7 morning
2017-10-04 07:30:00 8 morning
2017-12-13 07:40:00 9 morning
2017-12-31 14:55:00 10 NaN

How to create a set rows for every group depending on the result after the operation in the groupby() Pandas?

I have a situation where my pandas dataframe df looks like below:
stay_id starttime charttime dd day delta total cum_uo
0 30578301 2154-03-14 00:30:00 2154-03-13 13:00:00 0 days 11:30:00 0 0 0 90.0
1 30578301 2154-03-14 00:30:00 2154-03-13 14:00:00 0 days 10:30:00 0 1 1 215.0
2 30578301 2154-03-14 00:30:00 2154-03-13 15:00:00 0 days 09:30:00 0 1 2 325.0
3 30578301 2154-03-14 00:30:00 2154-03-13 16:00:00 0 days 08:30:00 0 1 3 370.0
4 30578301 2154-03-14 00:30:00 2154-03-13 17:00:00 0 days 07:30:00 0 1 4 425.0
5 30578301 2154-03-14 00:30:00 2154-03-13 18:00:00 0 days 06:30:00 0 1 5 490.0
6 30578301 2154-03-14 00:30:00 2154-03-13 19:00:00 0 days 05:30:00 0 1 6 540.0
7 30578301 2154-03-14 00:30:00 2154-03-13 20:00:00 0 days 04:30:00 0 1 7 615.0
8 30578301 2154-03-14 00:30:00 2154-03-13 21:00:00 0 days 03:30:00 0 1 8 660.0
9 30578301 2154-03-14 00:30:00 2154-03-13 22:00:00 0 days 02:30:00 0 1 9 710.0
10 30578301 2154-03-14 00:30:00 2154-03-13 23:00:00 0 days 01:30:00 0 1 10 740.0
11 30578301 2154-03-14 00:30:00 2154-03-14 00:00:00 0 days 00:30:00 0 1 11 780.0
12 30578301 2154-03-14 00:30:00 2154-03-14 01:00:00 -1 days+23:30:00 -1 1 12 905.0
13 30578301 2154-03-14 00:30:00 2154-03-14 02:00:00 -1 days+22:30:00 -1 1 13 1255.0
The frame I am showing above is grouped by stay_id, starttime.
For each group, I want to take the value of the "total" column at the row with the smallest dd among the rows where df["day"] == 0 (here dd = 0 days 00:30:00, so total = 11) and add it as a new column.
What I tried:
def helper(rows):
    val = rows[rows["day"]==0]["dd"].min()
    d_hour_array = rows[rows["dd"]==val]["total"].values[0]
    return d_hour_array
df.groupby(['stay_id', 'starttime']).apply(helper)
The result I am getting
stay_id starttime
30578301 2154-03-14 00:30:00 11
2154-03-14 05:11:00 16
2154-03-14 09:41:00 20
2154-03-14 19:05:00 29
2154-03-15 09:59:00 44
2154-03-15 20:58:00 55
dtype: int64
How can I add the values 11, 16, 20, 29, 44, 55 from the result as a new column for each group after groupby(['stay_id', 'starttime'])?
(like creating df["d_hour"] for each group).
Expected Output:
stay_id starttime charttime dd day delta total cum_uo d_hour
30578301 2154-03-14 00:30:00 2154-03-13 13:00:00 0 days 11:30:00 0 0 0 90.0 11
2154-03-14 00:30:00 2154-03-13 14:00:00 0 days 10:30:00 0 1 1 215.0 11
30578301 2154-03-14 05:11:00 2154-03-13 14:00:00 0 days 13:11:00 0 1 3 370.0 16
2154-03-14 05:11:00 2154-03-13 17:00:00 0 days 12:11:00 0 1 4 425.0 16
Any help is much appreciated please...
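One possible way to broadcast the per-group result back onto every row is to name the resulting Series and join it on the group keys. The following is a minimal sketch, not a tested answer for the real data: the frame is toy data with made-up values that only mimics the dd, day and total columns, and the helper is the one from the question, lightly rewritten with .loc.

```python
import pandas as pd

# Toy data (made up for illustration): two groups sharing one stay_id
df = pd.DataFrame({
    "stay_id": [30578301] * 6,
    "starttime": pd.to_datetime(
        ["2154-03-14 00:30:00"] * 3 + ["2154-03-14 05:11:00"] * 3
    ),
    "dd": pd.to_timedelta([1.5, 0.5, -0.5, 1.2, 0.2, -0.8], unit="h"),
    "day": [0, 0, -1, 0, 0, -1],
    "total": [10, 11, 12, 15, 16, 17],
})

def helper(rows):
    # Smallest dd among the rows of this group that have day == 0
    val = rows.loc[rows["day"] == 0, "dd"].min()
    # "total" at the row where dd equals that minimum
    return rows.loc[rows["dd"] == val, "total"].iloc[0]

keys = ["stay_id", "starttime"]

# One value per group, indexed by (stay_id, starttime)
d_hour = df.groupby(keys)[["dd", "day", "total"]].apply(helper).rename("d_hour")

# Join on the group keys so every row of a group receives its group's value
out = df.join(d_hour, on=keys)
print(out)
```

Joining keeps the original row order and leaves the helper unchanged, at the cost of one extra pass over the frame.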

Error plotting a time column as x-axis ticks

I have a df as follows
Time Samstag
0 00:15:00 80.6
1 00:30:00 74.6
2 00:45:00 69.2
3 01:00:00 63.6
4 01:15:00 57.1
5 01:30:00 50.4
6 01:45:00 44.1
7 02:00:00 39.1
8 02:15:00 36.0
9 02:30:00 34.4
10 02:45:00 33.7
11 03:00:00 33.3
12 03:15:00 32.7
13 03:30:00 32.0
14 03:45:00 31.5
15 04:00:00 31.3
16 04:15:00 31.5
17 04:30:00 31.7
18 04:45:00 31.5
19 05:00:00 30.3
20 05:15:00 28.1
21 05:30:00 26.4
22 05:45:00 27.1
23 06:00:00 32.3
24 06:15:00 42.9
25 06:30:00 56.2
26 06:45:00 68.5
27 07:00:00 76.3
28 07:15:00 77.0
29 07:30:00 72.9
30 07:45:00 67.3
31 08:00:00 63.6
32 08:15:00 64.5
33 08:30:00 69.5
34 08:45:00 77.4
35 09:00:00 87.1
36 09:15:00 97.4
37 09:30:00 108.4
38 09:45:00 119.9
39 10:00:00 132.1
40 10:15:00 144.7
41 10:30:00 156.7
42 10:45:00 166.9
43 11:00:00 174.1
44 11:15:00 177.4
45 11:30:00 177.7
46 11:45:00 176.2
47 12:00:00 174.1
48 12:15:00 172.6
49 12:30:00 172.0
50 12:45:00 172.4
51 13:00:00 174.1
52 13:15:00 177.1
53 13:30:00 180.4
54 13:45:00 183.0
55 14:00:00 183.9
56 14:15:00 182.4
57 14:30:00 179.5
58 14:45:00 176.6
59 15:00:00 175.1
60 15:15:00 176.0
61 15:30:00 178.9
62 15:45:00 182.8
63 16:00:00 186.8
64 16:15:00 190.3
65 16:30:00 193.8
66 16:45:00 197.9
67 17:00:00 203.5
68 17:15:00 210.8
69 17:30:00 218.8
70 17:45:00 226.3
71 18:00:00 231.8
72 18:15:00 234.4
73 18:30:00 234.5
74 18:45:00 233.0
75 19:00:00 230.9
76 19:15:00 228.7
77 19:30:00 226.9
78 19:45:00 225.3
79 20:00:00 224.0
80 20:15:00 223.0
81 20:30:00 221.5
82 20:45:00 218.9
83 21:00:00 214.2
84 21:15:00 207.0
85 21:30:00 197.0
86 21:45:00 184.4
87 22:00:00 169.2
88 22:15:00 151.8
89 22:30:00 133.7
90 22:45:00 116.7
91 23:00:00 102.7
92 23:15:00 93.0
93 23:30:00 86.6
94 23:45:00 82.2
I am trying to plot this as follows:
sns.lineplot(x="Time", y="Samstag", data=w_df)
plt.xticks(rotation=15)
plt.xlabel("Time")
plt.ylabel("KWH")
plt.show()
and it gives:
The label of x-axis is 00:00, 05:33:20, .... and so on.
I am trying to plot the Time column as the ticks in x-axis
I tried:
t = pd.to_datetime(w_df["Time"], format='%H:%M:%S')
t = t.apply(lambda x: x.strftime('%H:%M:%S'))
sns.lineplot(x="Time", y="Samstag", data=w_df)
plt.xticks(ticks=t, rotation=15)
plt.xlabel("Time")
plt.ylabel("KWH")
plt.show()
It throws the following error:
Traceback (most recent call last):
  File "", line 2, in
    plt.xticks(ticks=t, rotation=15)
  File "/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/pyplot.py", line 1540, in xticks
    locs = ax.set_xticks(ticks)
  File "/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_base.py", line 3350, in set_xticks
    ret = self.xaxis.set_ticks(ticks, minor=minor)
  File "/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py", line 1755, in set_ticks
    self.set_view_interval(min(ticks), max(ticks))
  File "/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py", line 1892, in setter
    setter(self, min(vmin, vmax, oldmin), max(vmin, vmax, oldmax),
TypeError: '<' not supported between instances of 'numpy.ndarray' and 'str'
Can anyone please tell me what mistake I am making?
Also,
w_df.dtypes
Out[27]:
Time object
Samstag float64
Sonntag float64
Werktag float64
dtype: object
So I took some of your data and tried to reproduce your result. I could not reproduce the problem: my Seaborn plot already comes out in the format you want. This may have to do with the format of your time column. When I built a small dataset from your example, I made the time column a string, and everything plots fine.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# The Time values are plain strings here
d = {'Time': ["00:15:00", "00:30:00", "00:45:00", "01:00:00", "01:15:00", "01:30:00", "01:45:00",
              "02:00:00", "02:15:00", "02:30:00", "02:45:00", "03:00:00", "03:15:00", "03:30:00", "03:45:00",
              "04:00:00", "04:15:00", "04:30:00", "04:45:00", "05:00:00", "05:15:00", "05:30:00",
              "05:45:00", "06:00:00"],
     'Samstag': [80.6, 74.6, 69.2, 62.6, 57.1, 50.4, 44.1, 39.1, 36.0, 34.4, 33.7, 33.3, 32.7, 32.0,
                 31.5, 31.3, 31.5, 31.7, 31.5, 30.3, 28.1, 26.4, 27.1, 32.3]
     }
df = pd.DataFrame(d)
sns.lineplot(x="Time", y="Samstag", data=df)
plt.xticks(rotation=15)
plt.xlabel("Time")
plt.ylabel("KWH")
plt.show()
This makes every time stamp a tick mark. Perhaps you can change your time column to be a string, if it is not already.
df['Time'] = df['Time'].astype(str)

How to extract rows between 2 times with Pandas?

I want to make sub dataframes out of one dataframe, using its datetime index. For example, if I want to extract rows between 07:00~06:00 and make new dataframes:
import pandas as pd
int_rows = 24
str_freq = '180min'
i = pd.date_range('2018-04-09', periods=int_rows, freq=str_freq)
df = pd.DataFrame({'A': [i for i in range(int_rows)]}, index=i)
>>> df
A
2018-04-09 00:00:00 0
2018-04-09 03:00:00 1
2018-04-09 06:00:00 2
2018-04-09 09:00:00 3
2018-04-09 12:00:00 4
2018-04-09 15:00:00 5
2018-04-09 18:00:00 6
2018-04-09 21:00:00 7
2018-04-10 00:00:00 8
2018-04-10 03:00:00 9
2018-04-10 06:00:00 10
2018-04-10 09:00:00 11
2018-04-10 12:00:00 12
2018-04-10 15:00:00 13
2018-04-10 18:00:00 14
2018-04-10 21:00:00 15
2018-04-11 00:00:00 16
2018-04-11 03:00:00 17
2018-04-11 06:00:00 18
2018-04-11 09:00:00 19
2018-04-11 12:00:00 20
2018-04-11 15:00:00 21
2018-04-11 18:00:00 22
2018-04-11 21:00:00 23
# new dataframes that I want
A
2018-04-09 00:00:00 0
2018-04-09 03:00:00 1
A
2018-04-09 06:00:00 2
2018-04-09 09:00:00 3
2018-04-09 12:00:00 4
2018-04-09 15:00:00 5
2018-04-09 18:00:00 6
2018-04-09 21:00:00 7
2018-04-10 00:00:00 8
2018-04-10 03:00:00 9
A
2018-04-10 06:00:00 10
2018-04-10 09:00:00 11
2018-04-10 12:00:00 12
2018-04-10 15:00:00 13
2018-04-10 18:00:00 14
2018-04-10 21:00:00 15
2018-04-11 00:00:00 16
2018-04-11 03:00:00 17
A
2018-04-11 06:00:00 18
2018-04-11 09:00:00 19
2018-04-11 12:00:00 20
2018-04-11 15:00:00 21
2018-04-11 18:00:00 22
2018-04-11 21:00:00 23
I found the between_time method, but it ignores dates. I could iterate over the original dataframe and check each date and time, but I think that would be inefficient. Is there a simple way to do this?
You can 'shift' the timestamp by 6 hours and group by day:
for k, d in df.groupby((df.index - pd.to_timedelta('6:00:00')).normalize()):
    print(d); print()
Output:
A
2018-04-09 00:00:00 0
2018-04-09 03:00:00 1
A
2018-04-09 06:00:00 2
2018-04-09 09:00:00 3
2018-04-09 12:00:00 4
2018-04-09 15:00:00 5
2018-04-09 18:00:00 6
2018-04-09 21:00:00 7
2018-04-10 00:00:00 8
2018-04-10 03:00:00 9
A
2018-04-10 06:00:00 10
2018-04-10 09:00:00 11
2018-04-10 12:00:00 12
2018-04-10 15:00:00 13
2018-04-10 18:00:00 14
2018-04-10 21:00:00 15
2018-04-11 00:00:00 16
2018-04-11 03:00:00 17
A
2018-04-11 06:00:00 18
2018-04-11 09:00:00 19
2018-04-11 12:00:00 20
2018-04-11 15:00:00 21
2018-04-11 18:00:00 22
2018-04-11 21:00:00 23
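If the pieces are needed as separate dataframes rather than printed, the same shifted grouping can be collected into a dict. This is a sketch built on the question's example frame; day_start, keys and sub_frames are names I picked for illustration.

```python
import pandas as pd

idx = pd.date_range("2018-04-09", periods=24, freq="180min")
df = pd.DataFrame({"A": range(24)}, index=idx)

# Treat 06:00 as the start of each "day": shift back 6 hours,
# then truncate to midnight to get one key per 06:00-to-06:00 window
day_start = pd.Timedelta(hours=6)
keys = (df.index - day_start).normalize()

# Map each window key to its own sub-dataframe
sub_frames = {key: part for key, part in df.groupby(keys)}

sizes = [len(part) for part in sub_frames.values()]
print(sizes)
```

Each key is the calendar date on which the window starts, so the rows from 2018-04-09 06:00 up to 2018-04-10 03:00 are found under pd.Timestamp("2018-04-09").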

Grouping by monthly and plot a bar stacked in pandas

I would like to get a dataframe with the monthly frequency of each class. For example, in the following dataframe I want to use the column Forma to build a dataframe of monthly frequencies for each class in that column, such as df1 below.
df
Evento Forma Excentricidad
Fecha
2004-04-09 22:45:00 1 MBCCM 0.7
2004-04-12 22:45:00 2 MBSCL 0.6
2004-04-24 03:45:00 3 SCL 0.4
2004-05-02 06:45:00 4 SCL 0.5
2004-05-30 04:45:00 5 MBCCM 0.9
2004-05-31 03:15:00 6 MBCCM 0.8
2004-06-08 00:15:00 7 MBSCL 0.6
2004-06-12 22:15:00 8 CCM 1.0
2004-06-13 02:45:00 9 MBCCM 0.8
2004-06-13 23:45:00 10 MBSCL 0.6
2004-06-14 03:15:00 11 MBSCL 0.6
2004-06-17 08:15:00 12 MBCCM 0.7
2004-06-17 11:45:00 13 MBCCM 0.7
2004-06-22 00:15:00 14 SCL 0.5
2004-06-22 07:45:00 15 MBCCM 0.9
2004-06-22 22:45:00 16 CCM 0.8
2004-07-01 05:15:00 17 MBCCM 0.8
2004-07-02 00:15:00 18 MBSCL 0.6
2004-07-04 11:45:00 19 MBCCM 0.9
2004-07-06 03:45:00 20 SCL 0.6
2004-07-07 04:15:00 21 CCM 0.9
2004-07-08 02:45:00 22 MBCCM 1.0
2004-07-08 11:45:00 23 MBCCM 0.8
2004-07-08 02:15:00 24 MBCCM 0.9
2004-07-09 04:45:00 25 CCM 0.7
2004-07-11 18:15:00 26 MBSCL 0.4
2004-07-11 23:15:00 27 MBSCL 0.3
2004-07-15 10:45:00 28 CCM 0.8
2004-07-16 12:15:00 29 MBCCM 0.8
2004-07-17 02:15:00 30 MBCCM 0.8
2004-07-17 05:45:00 31 MBCCM 0.7
2004-07-19 23:15:00 32 CCM 0.9
2004-07-20 09:15:00 33 CCM 0.7
2004-07-20 21:45:00 34 SCL 0.6
2004-07-23 03:45:00 35 SCL 0.6
2004-07-23 12:45:00 36 MBCCM 0.9
2004-07-24 00:45:00 37 CCM 0.7
2004-07-26 00:15:00 38 MBCCM 0.8
2004-07-27 05:15:00 39 MBSCL 0.6
2004-07-27 07:15:00 40 MBSCL 0.6
2004-07-27 14:15:00 41 MBCCM 0.7
2004-07-27 19:45:00 42 SCL 0.6
2004-07-27 23:15:00 43 MBSCL 0.6
2004-07-28 07:15:00 44 MBCCM 0.8
2004-07-30 05:15:00 45 MBCCM 0.7
2004-07-31 00:15:00 46 SCL 0.5
2004-07-31 04:15:00 47 MBSCL 0.6
df1
Tipo Abril Mayo Junio Julio Agosto Septiembre Octubre
MCC 2 9 8 1 5 6 3
CCM 7 11 23 12 7 2 4
MBCCM 4 8 4 1 3 4 2
SCL 1 7 2 4 1 9 5
MBSCL 6 3 7 1 9 3 10
How can I do this from df?
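import pandas as pd
# Columns are separated by two or more spaces; a regex separator needs the python engine
df = pd.read_table('data', sep=r'\s{2,}', engine='python')
df.index = pd.to_datetime(df.index)
df['Month'] = [date.strftime('%B') for date in df.index]
# crosstab takes index= and columns= (the old rows= and cols= keywords were removed)
print(pd.crosstab(index=[df['Forma']], columns=[df['Month']], margins=False))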
yields
Month April July June May
Forma
CCM 0 6 2 0
MBCCM 1 13 4 2
MBSCL 1 7 3 0
SCL 1 5 1 1
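The crosstab output sorts the month names alphabetically (April, July, June, May). A possible follow-up, sketched here on the first eight rows of the question's data, is to reindex the columns into calendar order. The month_order list and the table name are my own choices.

```python
import pandas as pd

# First eight rows of the question's data (Forma column only)
df = pd.DataFrame(
    {"Forma": ["MBCCM", "MBSCL", "SCL", "SCL", "MBCCM", "MBCCM", "MBSCL", "CCM"]},
    index=pd.to_datetime([
        "2004-04-09 22:45:00", "2004-04-12 22:45:00", "2004-04-24 03:45:00",
        "2004-05-02 06:45:00", "2004-05-30 04:45:00", "2004-05-31 03:15:00",
        "2004-06-08 00:15:00", "2004-06-12 22:15:00",
    ]),
)
df["Month"] = df.index.month_name()

# Count events per class and month
table = pd.crosstab(index=df["Forma"], columns=df["Month"])

# Put the months in calendar order; months with no events become 0
month_order = ["April", "May", "June"]
table = table.reindex(columns=month_order, fill_value=0)
print(table)
```

With matplotlib installed, table.T.plot(kind="bar", stacked=True) should then draw one stacked bar per month, with one segment per class.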
