Error plotting a time column as x-axis ticks - python

I have a df as follows
Time Samstag
0 00:15:00 80.6
1 00:30:00 74.6
2 00:45:00 69.2
3 01:00:00 63.6
4 01:15:00 57.1
5 01:30:00 50.4
6 01:45:00 44.1
7 02:00:00 39.1
8 02:15:00 36.0
9 02:30:00 34.4
10 02:45:00 33.7
11 03:00:00 33.3
12 03:15:00 32.7
13 03:30:00 32.0
14 03:45:00 31.5
15 04:00:00 31.3
16 04:15:00 31.5
17 04:30:00 31.7
18 04:45:00 31.5
19 05:00:00 30.3
20 05:15:00 28.1
21 05:30:00 26.4
22 05:45:00 27.1
23 06:00:00 32.3
24 06:15:00 42.9
25 06:30:00 56.2
26 06:45:00 68.5
27 07:00:00 76.3
28 07:15:00 77.0
29 07:30:00 72.9
30 07:45:00 67.3
31 08:00:00 63.6
32 08:15:00 64.5
33 08:30:00 69.5
34 08:45:00 77.4
35 09:00:00 87.1
36 09:15:00 97.4
37 09:30:00 108.4
38 09:45:00 119.9
39 10:00:00 132.1
40 10:15:00 144.7
41 10:30:00 156.7
42 10:45:00 166.9
43 11:00:00 174.1
44 11:15:00 177.4
45 11:30:00 177.7
46 11:45:00 176.2
47 12:00:00 174.1
48 12:15:00 172.6
49 12:30:00 172.0
50 12:45:00 172.4
51 13:00:00 174.1
52 13:15:00 177.1
53 13:30:00 180.4
54 13:45:00 183.0
55 14:00:00 183.9
56 14:15:00 182.4
57 14:30:00 179.5
58 14:45:00 176.6
59 15:00:00 175.1
60 15:15:00 176.0
61 15:30:00 178.9
62 15:45:00 182.8
63 16:00:00 186.8
64 16:15:00 190.3
65 16:30:00 193.8
66 16:45:00 197.9
67 17:00:00 203.5
68 17:15:00 210.8
69 17:30:00 218.8
70 17:45:00 226.3
71 18:00:00 231.8
72 18:15:00 234.4
73 18:30:00 234.5
74 18:45:00 233.0
75 19:00:00 230.9
76 19:15:00 228.7
77 19:30:00 226.9
78 19:45:00 225.3
79 20:00:00 224.0
80 20:15:00 223.0
81 20:30:00 221.5
82 20:45:00 218.9
83 21:00:00 214.2
84 21:15:00 207.0
85 21:30:00 197.0
86 21:45:00 184.4
87 22:00:00 169.2
88 22:15:00 151.8
89 22:30:00 133.7
90 22:45:00 116.7
91 23:00:00 102.7
92 23:15:00 93.0
93 23:30:00 86.6
94 23:45:00 82.2
I am trying to plot this as follows:
sns.lineplot(x="Time", y="Samstag", data=w_df)
plt.xticks(rotation=15)
plt.xlabel("Time")
plt.ylabel("KWH")
plt.show()
and the labels on the x-axis come out as 00:00, 05:33:20, ... and so on.
I am trying to plot the Time column as the ticks in x-axis
I tried:
t = pd.to_datetime(w_df["Time"], format='%H:%M:%S')
t = t.apply(lambda x: x.strftime('%H:%M:%S'))
sns.lineplot(x="Time", y="Samstag", data=w_df)
plt.xticks(ticks=t, rotation=15)
plt.xlabel("Time")
plt.ylabel("KWH")
plt.show()
It throws the following error:
Traceback (most recent call last):
File "", line 2, in
plt.xticks(ticks=t, rotation=15)
File
"/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/pyplot.py",
line 1540, in xticks
locs = ax.set_xticks(ticks)
File
"/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_base.py",
line 3350, in set_xticks
ret = self.xaxis.set_ticks(ticks, minor=minor)
File
"/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py",
line 1755, in set_ticks
self.set_view_interval(min(ticks), max(ticks))
File
"/home/user/anaconda3/lib/python3.7/site-packages/matplotlib/axis.py",
line 1892, in setter
setter(self, min(vmin, vmax, oldmin), max(vmin, vmax, oldmax),
TypeError: '<' not supported between instances of 'numpy.ndarray' and
'str'
Can anyone please tell me what mistake I am making?
Also,
w_df.dtypes
Out[27]:
Time object
Samstag float64
Sonntag float64
Werktag float64
dtype: object

I took some of your data and tried to reproduce your result. However, my Seaborn plot already comes out in the format that you would like, so I could not reproduce the problem. This may have to do with the dtype of your time column. When I made my small dataset from your example, I made the time column a string, and everything plots fine.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
d = {'Time': ["00:15:00", "00:30:00", "00:45:00", "01:00:00", "01:15:00", "01:30:00", "01:45:00",
"02:00:00", "02:15:00", "02:30:00", "02:45:00", "03:00:00", "03:15:00", "03:30:00", "03:45:00",
"04:00:00", "04:15:00", "04:30:00", "04:45:00", "05:00:00", "05:15:00", "05:30:00",
"05:45:00", "06:00:00"],
'Samstag': [80.6, 74.6, 69.2, 63.6, 57.1, 50.4, 44.1, 39.1, 36.0, 34.4, 33.7, 33.3, 32.7, 32.0,
31.5, 31.3, 31.5, 31.7, 31.5, 30.3, 28.1, 26.4, 27.1, 32.3]
}
df = pd.DataFrame(d)
sns.lineplot(x="Time", y="Samstag", data=df)
plt.xticks(rotation=15)
plt.xlabel("Time")
plt.ylabel("KWH")
plt.show()
This makes every time stamp a tick mark. Perhaps you can change your time column to be a string, if it is not already.
df['Time'] = df['Time'].astype(str)
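The traceback in the question comes from passing strings as tick locations: plt.xticks(ticks=...) expects numeric positions, and the labels go in a separate argument. With a string column each row sits at positions 0, 1, 2, ..., so labelling all 95 rows gets crowded. Here is a minimal sketch that labels only every eighth row. It assumes the Time column holds datetime.time objects (the question only shows dtype object, so that is a guess); the stand-in data and the plot_day helper are hypothetical.

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-in for w_df: 95 quarter-hour rows, "Time" as datetime.time objects
start = dt.datetime(2000, 1, 1)
times = [(start + dt.timedelta(minutes=15 * i)).time() for i in range(1, 96)]
w_df = pd.DataFrame({"Time": times, "Samstag": [float(i) for i in range(95)]})

# Plain strings make the x-axis categorical: one integer position per row
w_df["Time"] = w_df["Time"].astype(str)

# Label every 8th row (every two hours) instead of all 95
tick_positions = list(range(0, len(w_df), 8))
tick_labels = w_df["Time"].iloc[tick_positions].tolist()


def plot_day(df, positions, labels):
    """Hypothetical helper: line plot with thinned-out x tick labels."""
    import matplotlib.pyplot as plt  # imported here so the data prep runs without it

    fig, ax = plt.subplots()
    ax.plot(df["Time"], df["Samstag"])
    ax.set_xticks(positions)                  # numeric positions on the categorical axis
    ax.set_xticklabels(labels, rotation=15)   # matching string labels
    ax.set_xlabel("Time")
    ax.set_ylabel("KWH")
    return fig
```

Calling plot_day(w_df, tick_positions, tick_labels) followed by plt.show() should draw the curve with a label every two hours.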

Related

Select only the nth largest value in a Series, for each day

I have some noise survey data telling me noise levels measured over the period of several days. I want to find the 5th highest noise level in each night-time period. I have made this into a Pandas Series and used groupby and nlargest methods to show me the 5 highest noise levels each night, but now I want to view only the 5th highest value for each period (i.e. 82, 86, 86, 87 etc.). What's the best way to achieve this?
night_time_lmax.groupby(by=night_time_lmax.index.date).nlargest(5)
Start date & time
2021-08-18 2021-08-18 23:00:00 82.0
2021-08-18 23:15:00 82.0
2021-08-18 23:30:00 82.0
2021-08-18 23:45:00 82.0
2021-08-19 2021-08-19 05:45:00 100.0
2021-08-19 01:15:00 91.0
2021-08-19 04:45:00 87.0
2021-08-19 06:15:00 87.0
2021-08-19 01:45:00 86.0
2021-08-20 2021-08-20 06:30:00 90.0
2021-08-20 06:00:00 88.0
2021-08-20 03:15:00 87.0
2021-08-20 05:30:00 87.0
2021-08-20 01:15:00 86.0
2021-08-21 2021-08-21 01:30:00 98.0
2021-08-21 03:00:00 93.0
2021-08-21 00:45:00 88.0
2021-08-21 06:00:00 88.0
2021-08-21 03:30:00 87.0
2021-08-22 2021-08-22 23:45:00 102.0
2021-08-22 00:30:00 96.0
2021-08-22 06:30:00 92.0
2021-08-22 05:00:00 91.0
2021-08-22 01:30:00 90.0
2021-08-23 2021-08-23 01:15:00 98.0
2021-08-23 02:15:00 88.0
2021-08-23 00:45:00 87.0
2021-08-23 03:00:00 86.0
2021-08-23 06:00:00 86.0
2021-08-24 2021-08-24 01:00:00 93.0
2021-08-24 00:30:00 89.0
2021-08-24 06:30:00 87.0
2021-08-24 02:45:00 86.0
2021-08-24 06:00:00 86.0
I see two options here.
Either sort the values in descending order and then take the 5th element per group (nth is zero-based, so that is nth(4)). Group on the index of the sorted Series so the keys stay aligned with the values:
s = night_time_lmax.sort_values(ascending=False)
s.groupby(s.index.date).nth(4)
## sorting in ascending order and counting from the end gives the same value:
# s = night_time_lmax.sort_values()
# s.groupby(s.index.date).nth(-5)
Or use a double groupby, once for the top 5 and once for the last of those (level 0 of the intermediate result holds the dates):
(night_time_lmax.groupby(by=night_time_lmax.index.date).nlargest(5)
.groupby(level=0).last()
)
Note that a day with fewer than 5 readings (2021-08-18 above) is dropped by nth(4), while last() returns its lowest reading.
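A small self-contained check of the idea, as a sketch only: the readings below are hypothetical, and fifth_largest is an illustrative helper name, not something from the question. It returns NaN for a day with fewer than five readings, which is one possible way to handle a short night.

```python
import pandas as pd

# Hypothetical readings: six per day on two days
idx = pd.to_datetime(
    [f"2021-08-19 0{h}:00" for h in range(6)]
    + [f"2021-08-20 0{h}:00" for h in range(6)]
)
night_time_lmax = pd.Series(
    [100.0, 91.0, 87.0, 87.0, 86.0, 80.0, 90.0, 88.0, 87.0, 87.0, 86.0, 85.0],
    index=idx,
)


def fifth_largest(group: pd.Series) -> float:
    # nlargest(5) returns values in descending order, so the last one is the 5th highest
    top = group.nlargest(5)
    return top.iloc[-1] if len(top) == 5 else float("nan")


# One value per calendar day
fifth = night_time_lmax.groupby(night_time_lmax.index.date).apply(fifth_largest)
```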

Pandas - Sum of first X hours of datetime index

I have a dataframe with a datetime index and 100 columns.
I want to have a new dataframe with the same datetime index and columns, but the values would contain the sum of the first 10 hours of each day.
So if I had an original dataframe like this:
A B C
---------------------------------
2018-01-01 00:00:00 2 5 -10
2018-01-01 01:00:00 6 5 7
2018-01-01 02:00:00 7 5 9
2018-01-01 03:00:00 9 5 6
2018-01-01 04:00:00 10 5 2
2018-01-01 05:00:00 7 5 -1
2018-01-01 06:00:00 1 5 -1
2018-01-01 07:00:00 -4 5 10
2018-01-01 08:00:00 9 5 10
2018-01-01 09:00:00 21 5 -10
2018-01-01 10:00:00 2 5 -1
2018-01-01 11:00:00 8 5 -1
2018-01-01 12:00:00 8 5 10
2018-01-01 13:00:00 8 5 9
2018-01-01 14:00:00 7 5 -10
2018-01-01 15:00:00 7 5 5
2018-01-01 16:00:00 7 5 -10
2018-01-01 17:00:00 4 5 7
2018-01-01 18:00:00 5 5 8
2018-01-01 19:00:00 2 5 8
2018-01-01 20:00:00 2 5 4
2018-01-01 21:00:00 8 5 3
2018-01-01 22:00:00 1 5 3
2018-01-01 23:00:00 1 5 1
2018-01-02 00:00:00 2 5 2
2018-01-02 01:00:00 3 5 8
2018-01-02 02:00:00 4 5 6
2018-01-02 03:00:00 5 5 6
2018-01-02 04:00:00 1 5 7
2018-01-02 05:00:00 7 5 7
2018-01-02 06:00:00 5 5 1
2018-01-02 07:00:00 2 5 2
2018-01-02 08:00:00 4 5 3
2018-01-02 09:00:00 6 5 4
2018-01-02 10:00:00 9 5 4
2018-01-02 11:00:00 11 5 5
2018-01-02 12:00:00 2 5 8
2018-01-02 13:00:00 2 5 0
2018-01-02 14:00:00 4 5 5
2018-01-02 15:00:00 5 5 4
2018-01-02 16:00:00 7 5 4
2018-01-02 17:00:00 -1 5 7
2018-01-02 18:00:00 1 5 7
2018-01-02 19:00:00 1 5 7
2018-01-02 20:00:00 5 5 7
2018-01-02 21:00:00 2 5 7
2018-01-02 22:00:00 2 5 7
2018-01-02 23:00:00 8 5 7
So for all rows with date 2018-01-01:
The value for column A would be 68 (2+6+7+9+10+7+1-4+9+21)
The value for column B would be 50 (5+5+5+5+5+5+5+5+5+5)
The value for column C would be 22 (-10+7+9+6+2-1-1+10+10-10)
So for all rows with date 2018-01-02:
The value for column A would be 39 (2+3+4+5+1+7+5+2+4+6)
The value for column B would be 50 (5+5+5+5+5+5+5+5+5+5)
The value for column C would be 46 (2+8+6+6+7+7+1+2+3+4)
The outcome would be:
A B C
---------------------------------
2018-01-01 00:00:00 68 50 22
2018-01-01 01:00:00 68 50 22
2018-01-01 02:00:00 68 50 22
2018-01-01 03:00:00 68 50 22
2018-01-01 04:00:00 68 50 22
2018-01-01 05:00:00 68 50 22
2018-01-01 06:00:00 68 50 22
2018-01-01 07:00:00 68 50 22
2018-01-01 08:00:00 68 50 22
2018-01-01 09:00:00 68 50 22
2018-01-01 10:00:00 68 50 22
2018-01-01 11:00:00 68 50 22
2018-01-01 12:00:00 68 50 22
2018-01-01 13:00:00 68 50 22
2018-01-01 14:00:00 68 50 22
2018-01-01 15:00:00 68 50 22
2018-01-01 16:00:00 68 50 22
2018-01-01 17:00:00 68 50 22
2018-01-01 18:00:00 68 50 22
2018-01-01 19:00:00 68 50 22
2018-01-01 20:00:00 68 50 22
2018-01-01 21:00:00 68 50 22
2018-01-01 22:00:00 68 50 22
2018-01-01 23:00:00 68 50 22
2018-01-02 00:00:00 39 50 46
2018-01-02 01:00:00 39 50 46
2018-01-02 02:00:00 39 50 46
2018-01-02 03:00:00 39 50 46
2018-01-02 04:00:00 39 50 46
2018-01-02 05:00:00 39 50 46
2018-01-02 06:00:00 39 50 46
2018-01-02 07:00:00 39 50 46
2018-01-02 08:00:00 39 50 46
2018-01-02 09:00:00 39 50 46
2018-01-02 10:00:00 39 50 46
2018-01-02 11:00:00 39 50 46
2018-01-02 12:00:00 39 50 46
2018-01-02 13:00:00 39 50 46
2018-01-02 14:00:00 39 50 46
2018-01-02 15:00:00 39 50 46
2018-01-02 16:00:00 39 50 46
2018-01-02 17:00:00 39 50 46
2018-01-02 18:00:00 39 50 46
2018-01-02 19:00:00 39 50 46
2018-01-02 20:00:00 39 50 46
2018-01-02 21:00:00 39 50 46
2018-01-02 22:00:00 39 50 46
2018-01-02 23:00:00 39 50 46
I figured I'd group by date first and perform a sum and then merge the results based on the date. Is there a better/faster way to do this?
Thanks.
EDIT: In the meantime I came up with this:
df= df.between_time('0:00','9:00').groupby(pd.Grouper(freq='D')).sum()
df= df.resample('1H').ffill()
You need to group by df.index.date and use transform with a lambda function that sums the first 10 rows of each day:
df.loc[:,['A','B','C']] = df.groupby(df.index.date).transform(lambda x: x[:10].sum())
Or, if the transformed result has the same columns in the same order as df:
df.loc[:,:] = df.groupby(df.index.date).transform(lambda x: x[:10].sum())
print(df)
A B C
2018-01-01 00:00:00 68 50 22
2018-01-01 01:00:00 68 50 22
2018-01-01 02:00:00 68 50 22
2018-01-01 03:00:00 68 50 22
2018-01-01 04:00:00 68 50 22
2018-01-01 05:00:00 68 50 22
2018-01-01 06:00:00 68 50 22
2018-01-01 07:00:00 68 50 22
2018-01-01 08:00:00 68 50 22
2018-01-01 09:00:00 68 50 22
2018-01-01 10:00:00 68 50 22
2018-01-01 11:00:00 68 50 22
2018-01-01 12:00:00 68 50 22
2018-01-01 13:00:00 68 50 22
2018-01-01 14:00:00 68 50 22
2018-01-01 15:00:00 68 50 22
2018-01-01 16:00:00 68 50 22
2018-01-01 17:00:00 68 50 22
2018-01-01 18:00:00 68 50 22
2018-01-01 19:00:00 68 50 22
2018-01-01 20:00:00 68 50 22
2018-01-01 21:00:00 68 50 22
2018-01-01 22:00:00 68 50 22
2018-01-01 23:00:00 68 50 22
2018-01-02 00:00:00 39 50 46
2018-01-02 01:00:00 39 50 46
2018-01-02 02:00:00 39 50 46
2018-01-02 03:00:00 39 50 46
2018-01-02 04:00:00 39 50 46
2018-01-02 05:00:00 39 50 46
2018-01-02 06:00:00 39 50 46
2018-01-02 07:00:00 39 50 46
2018-01-02 08:00:00 39 50 46
2018-01-02 09:00:00 39 50 46
2018-01-02 10:00:00 39 50 46
2018-01-02 11:00:00 39 50 46
2018-01-02 12:00:00 39 50 46
2018-01-02 13:00:00 39 50 46
2018-01-02 14:00:00 39 50 46
2018-01-02 15:00:00 39 50 46
2018-01-02 16:00:00 39 50 46
2018-01-02 17:00:00 39 50 46
2018-01-02 18:00:00 39 50 46
2018-01-02 19:00:00 39 50 46
2018-01-02 20:00:00 39 50 46
2018-01-02 21:00:00 39 50 46
2018-01-02 22:00:00 39 50 46
2018-01-02 23:00:00 39 50 46
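Taking x[:10] selects the first ten rows of each day, which equals the first ten hours only when no hourly row is missing. Below is a minimal sketch, assuming a DatetimeIndex, that selects by hour of day instead and then broadcasts the daily sums back onto the original index. The sample frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly frame covering two days
idx = pd.date_range("2018-01-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"A": np.arange(48), "B": 5}, index=idx)

# Rows whose timestamp falls in hours 00:00 through 09:00
mask = df.index.hour < 10

# One row per day: sum over the selected hours only
days = df.index.normalize()
daily = df[mask].groupby(days[mask]).sum()

# Repeat each day's sums for every row of that day, keeping the original index
result = daily.reindex(days)
result.index = df.index
```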

How do I group hourly data by day and count only values greater than a set amount in Pandas?

I am new to Pandas but have been working with Python for a few years now.
I have a large data set of hourly data with multiple columns. I need to group the data by day then count how many times the value is above 85 for each day for each column.
example data:
date KMRY KSNS PCEC1 KFAT
2014-06-06 13:00:00 56.000000 63.0 17 11
2014-06-06 14:00:00 58.000000 61.0 17 11
2014-06-06 15:00:00 63.000000 63.0 16 10
2014-06-06 16:00:00 67.000000 65.0 12 11
2014-06-06 17:00:00 67.000000 67.0 10 13
2014-06-06 18:00:00 72.000000 75.0 9 14
2014-06-06 19:00:00 77.000000 79.0 9 15
2014-06-06 20:00:00 84.000000 81.0 9 23
2014-06-06 21:00:00 81.000000 86.0 12 31
2014-06-06 22:00:00 84.000000 84.0 13 28
2014-06-06 23:00:00 83.000000 86.0 15 34
2014-06-07 00:00:00 84.000000 86.0 16 36
2014-06-07 01:00:00 86.000000 89.0 17 43
2014-06-07 02:00:00 86.000000 89.0 20 44
2014-06-07 03:00:00 89.000000 89.0 22 49
2014-06-07 04:00:00 86.000000 86.0 22 51
2014-06-07 05:00:00 86.000000 89.0 21 53
From the sample above my results should look like the following:
date KMRY KSNS PCEC1 KFAT
2014-06-06 0 2 0 0
2014-06-07 5 6 0 0
Any help would be greatly appreciated.
(D_RH>85).sum()
The above code gets me close, but I also need a daily breakdown, not just the counts per column.
One way would be to make date a DatetimeIndex and then groupby the result of the comparison to 85. For example:
>>> df["date"] = pd.to_datetime(df["date"]) # only if it isn't already
>>> df = df.set_index("date")
>>> (df > 85).groupby(df.index.date).sum()
KMRY KSNS PCEC1 KFAT
2014-06-06 0 2 0 0
2014-06-07 5 6 0 0
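A variation on the same idea, sketched with a few hypothetical rows: with a DatetimeIndex in place, resample("D") can do the daily grouping. Unlike grouping on index.date, it also emits a row of zeros for any calendar day inside the range that has no data.

```python
import pandas as pd

# Hypothetical excerpt with two stations
df = pd.DataFrame(
    {"KMRY": [84.0, 81.0, 86.0, 89.0], "KSNS": [81.0, 86.0, 89.0, 84.0]},
    index=pd.to_datetime(
        ["2014-06-06 20:00", "2014-06-06 21:00", "2014-06-07 01:00", "2014-06-07 03:00"]
    ),
)

# True/False per cell, cast to 0/1, then summed per calendar day
counts = (df > 85).astype(int).resample("D").sum()
```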

How to order dataframe for plotting 3d bar in pandas

I am trying to create a chart with multiple bars in 3d from pandas. Reviewing some examples on the web, I see that the best way to accomplish this is to get a dataframe like this:
data
Variable A B C D
date
2000-01-03 0.469112 -1.135632 0.119209 -2.104569
2000-01-04 -0.282863 1.212112 -1.044236 -0.494929
2000-01-05 -1.509059 -0.173215 -0.861849 1.071804
My dataframe is:
df
Date_inicio Date_Fin Date_Max Clase
0 2004-04-09 23:00:00 2004-04-10 04:00:00 2004-04-10 02:00:00 MBCCM
1 2004-04-12 23:00:00 2004-04-13 04:00:00 2004-04-13 00:00:00 MBSCL
2 2004-04-24 04:00:00 2004-04-24 12:00:00 2004-04-24 09:00:00 SCL
3 2004-05-02 07:00:00 2004-05-02 14:00:00 2004-05-02 11:00:00 SCL
4 2004-05-30 05:00:00 2004-05-30 08:00:00 2004-05-30 07:00:00 MBCCM
5 2004-05-31 03:00:00 2004-05-31 07:00:00 2004-05-31 05:00:00 MBCCM
6 2004-06-08 00:00:00 2004-06-08 05:00:00 2004-06-08 03:00:00 MBSCL
7 2004-06-12 22:00:00 2004-06-13 12:00:00 2004-06-13 06:00:00 CCM
8 2004-06-13 03:00:00 2004-06-13 08:00:00 2004-06-13 06:00:00 MBCCM
9 2004-06-14 00:00:00 2004-06-14 03:00:00 2004-06-14 02:00:00 MBSCL
10 2004-06-14 03:00:00 2004-06-14 09:00:00 2004-06-14 07:00:00 MBSCL
11 2004-06-17 08:00:00 2004-06-17 14:00:00 2004-06-17 11:00:00 MBCCM
12 2004-06-17 12:00:00 2004-06-17 17:00:00 2004-06-17 14:00:00 MBCCM
13 2004-06-22 00:00:00 2004-06-22 08:00:00 2004-06-22 06:00:00 SCL
14 2004-06-22 08:00:00 2004-06-22 14:00:00 2004-06-22 11:00:00 MBCCM
15 2004-06-22 23:00:00 2004-06-23 09:00:00 2004-06-23 06:00:00 CCM
16 2004-07-01 05:00:00 2004-07-01 09:00:00 2004-07-01 06:00:00 MBCCM
17 2004-07-02 00:00:00 2004-07-02 04:00:00 2004-07-02 02:00:00 MBSCL
18 2004-07-04 12:00:00 2004-07-04 15:00:00 2004-07-04 13:00:00 MBCCM
19 2004-07-06 04:00:00 2004-07-06 13:00:00 2004-07-06 07:00:00 SCL
20 2004-07-07 04:00:00 2004-07-07 12:00:00 2004-07-07 10:00:00 CCM
21 2004-07-08 03:00:00 2004-07-08 06:00:00 2004-07-08 05:00:00 MBCCM
22 2004-07-08 12:00:00 2004-07-08 17:00:00 2004-07-08 13:00:00 MBCCM
23 2004-07-08 02:00:00 2004-07-08 06:00:00 2004-07-08 04:00:00 MBCCM
24 2004-07-09 05:00:00 2004-07-09 12:00:00 2004-07-09 08:00:00 CCM
25 2004-07-11 18:00:00 2004-07-12 12:00:00 2004-07-11 21:00:00 MBSCL
26 2004-07-11 23:00:00 2004-07-12 05:00:00 2004-07-12 02:00:00 MBSCL
27 2004-07-15 11:00:00 2004-07-15 19:00:00 2004-07-15 12:00:00 CCM
28 2004-07-16 12:00:00 2004-07-16 16:00:00 2004-07-16 14:00:00 MBCCM
29 2004-07-17 02:00:00 2004-07-17 06:00:00 2004-07-17 05:00:00 MBCCM
Now I want to count the occurrences of all classes per hour, that is, how many events fall on each hour in Date_inicio, Date_Fin and Date_Max. From df I obtain the following frequency table:
frec
Frec_Inicio Frec_Max Frec_Fin
Horas
1 2 0 1
2 3 8 1
3 5 3 2
4 6 2 6
5 6 6 5
6 5 6 4
7 5 7 2
8 2 4 5
9 1 6 6
10 0 3 2
11 2 5 5
12 4 1 9
13 2 4 2
14 3 2 4
15 0 2 3
16 1 1 3
17 0 2 3
18 1 1 1
19 0 0 3
20 1 1 1
21 1 1 0
22 3 1 0
23 9 1 0
24 8 3 2
Now, my goal is to plot a 3D bar chart like the figure below (figure not shown).
To achieve this, I wrote the following code:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
xpos=np.arange(frec.shape[0])
ypos=np.arange(frec.shape[1])
xpos, ypos = np.meshgrid(xpos+0.25, ypos+0.25)
xpos = xpos.flatten()
ypos = ypos.flatten()
zpos=np.zeros(frec.shape).flatten()
dx=0.5 * np.ones_like(zpos)
dy=0.5 * np.ones_like(zpos)
dz=frec.values.ravel()
dz[np.isnan(dz)]=0.
ax.bar3d(xpos,ypos,zpos,dx,dy,dz,color='b', alpha=0.5)
ax.set_xticks([.5,1.5,2.5])
ax.set_yticks([.5,1.5,2.5,3.5])
ax.w_yaxis.set_ticklabels(frec.columns)
ax.w_xaxis.set_ticklabels(frec.index)
ax.set_xlabel('Time')
ax.set_ylabel('B')
ax.set_zlabel('Occurrence')
plt.show()
How can I get a better plot, similar to the previous figure?
Here is the code to do the count:
import pandas as pd
text="""Date_inicio, Date_Fin, Date_Max, Clase
2004-04-09 23:00:00, 2004-04-10 04:00:00, 2004-04-10 02:00:00, MBCCM
2004-04-12 23:00:00, 2004-04-13 04:00:00, 2004-04-13 00:00:00, MBSCL
2004-04-24 04:00:00, 2004-04-24 12:00:00, 2004-04-24 09:00:00, SCL
2004-05-02 07:00:00, 2004-05-02 14:00:00, 2004-05-02 11:00:00, SCL
2004-05-30 05:00:00, 2004-05-30 08:00:00, 2004-05-30 07:00:00, MBCCM
2004-05-31 03:00:00, 2004-05-31 07:00:00, 2004-05-31 05:00:00, MBCCM
2004-06-08 00:00:00, 2004-06-08 05:00:00, 2004-06-08 03:00:00, MBSCL
2004-06-12 22:00:00, 2004-06-13 12:00:00, 2004-06-13 06:00:00, CCM
2004-06-13 03:00:00, 2004-06-13 08:00:00, 2004-06-13 06:00:00, MBCCM
2004-06-14 00:00:00, 2004-06-14 03:00:00, 2004-06-14 02:00:00, MBSCL
2004-06-14 03:00:00, 2004-06-14 09:00:00, 2004-06-14 07:00:00, MBSCL
2004-06-17 08:00:00, 2004-06-17 14:00:00, 2004-06-17 11:00:00, MBCCM
2004-06-17 12:00:00, 2004-06-17 17:00:00, 2004-06-17 14:00:00, MBCCM
2004-06-22 00:00:00, 2004-06-22 08:00:00, 2004-06-22 06:00:00, SCL
2004-06-22 08:00:00, 2004-06-22 14:00:00, 2004-06-22 11:00:00, MBCCM
2004-06-22 23:00:00, 2004-06-23 09:00:00, 2004-06-23 06:00:00, CCM
2004-07-01 05:00:00, 2004-07-01 09:00:00, 2004-07-01 06:00:00, MBCCM
2004-07-02 00:00:00, 2004-07-02 04:00:00, 2004-07-02 02:00:00, MBSCL
2004-07-04 12:00:00, 2004-07-04 15:00:00, 2004-07-04 13:00:00, MBCCM
2004-07-06 04:00:00, 2004-07-06 13:00:00, 2004-07-06 07:00:00, SCL
2004-07-07 04:00:00, 2004-07-07 12:00:00, 2004-07-07 10:00:00, CCM
2004-07-08 03:00:00, 2004-07-08 06:00:00, 2004-07-08 05:00:00, MBCCM
2004-07-08 12:00:00, 2004-07-08 17:00:00, 2004-07-08 13:00:00, MBCCM
2004-07-08 02:00:00, 2004-07-08 06:00:00, 2004-07-08 04:00:00, MBCCM
2004-07-09 05:00:00, 2004-07-09 12:00:00, 2004-07-09 08:00:00, CCM
2004-07-11 18:00:00, 2004-07-12 12:00:00, 2004-07-11 21:00:00, MBSCL
2004-07-11 23:00:00, 2004-07-12 05:00:00, 2004-07-12 02:00:00, MBSCL
2004-07-15 11:00:00, 2004-07-15 19:00:00, 2004-07-15 12:00:00, CCM
2004-07-16 12:00:00, 2004-07-16 16:00:00, 2004-07-16 14:00:00, MBCCM
2004-07-17 02:00:00, 2004-07-17 06:00:00, 2004-07-17 05:00:00, MBCCM"""
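import io
# text is a str, so wrap it in StringIO (BytesIO only works on Python 2)
df = pd.read_csv(io.StringIO(text), skipinitialspace=True)
df.drop(["Clase"], axis=1, inplace=True)
# characters 11:13 of each timestamp are the hour; convert_objects was removed from pandas
df = df.apply(lambda s: pd.to_numeric(s.str[11:13]))
df2 = df.apply(lambda s: s.value_counts())
print(df2)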
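The same count can also be sketched by parsing the timestamps and reading the hour from the .dt accessor rather than slicing characters. This is only an illustration on the first three rows of the data; the reindex to hours 0 through 23 is an added assumption so that hours with no events show up as zero.

```python
import io

import pandas as pd

text = """Date_inicio, Date_Fin, Date_Max, Clase
2004-04-09 23:00:00, 2004-04-10 04:00:00, 2004-04-10 02:00:00, MBCCM
2004-04-12 23:00:00, 2004-04-13 04:00:00, 2004-04-13 00:00:00, MBSCL
2004-04-24 04:00:00, 2004-04-24 12:00:00, 2004-04-24 09:00:00, SCL
"""
date_cols = ["Date_inicio", "Date_Fin", "Date_Max"]

df = pd.read_csv(io.StringIO(text), skipinitialspace=True)

# Parse each date column and keep only the hour of day
hours = df[date_cols].apply(lambda s: pd.to_datetime(s).dt.hour)

# Count occurrences of each hour per column; absent hours become 0
frec = (
    hours.apply(lambda s: s.value_counts())
    .reindex(range(24))
    .fillna(0)
    .astype(int)
)
```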
Here is the code that draws the 3D bars:
import pandas as pd
text="""Horas Frec_Inicio Frec_Max Frec_Fin
1 2 0 1
2 3 8 1
3 5 3 2
4 6 2 6
5 6 6 5
6 5 6 4
7 5 7 2
8 2 4 5
9 1 6 6
10 0 3 2
11 2 5 5
12 4 1 9
13 2 4 2
14 3 2 4
15 0 2 3
16 1 1 3
17 0 2 3
18 1 1 1
19 0 0 3
20 1 1 1
21 1 1 0
22 3 1 0
23 9 1 0
24 8 3 2"""
import io
# text is a str, so use StringIO; sep=r"\s+" splits on runs of whitespace
df = pd.read_csv(io.StringIO(text), skipinitialspace=True, sep=r"\s+")
df.set_index("Horas", inplace=True)
columns_name = [x.replace("_", " ") for x in df.columns]
df.columns = [0, 2, 4]
x, y, z = df.stack().reset_index().values.T
import visvis as vv
app = vv.use()
f = vv.clf()
a = vv.cla()
bar =vv.bar3(x, y, z, width=0.8)
bar.colors = ["r","g","b"] * 24
a.axis.yTicks = dict(zip(df.columns, columns_name))
app.Run()
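If staying with matplotlib is preferred, one thing worth checking in the question's code is the bar ordering: np.meshgrid defaults to indexing="xy", so the flattened grid does not follow the same order as frec.values.ravel() and heights can land on the wrong bars. The sketch below is an alternative to the visvis approach, not a reproduction of it. It uses a hypothetical three-hour excerpt of the table, and draw_bars is an illustrative helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical three-hour excerpt of the frequency table
frec = pd.DataFrame(
    {"Frec_Inicio": [2, 3, 5], "Frec_Max": [0, 8, 3], "Frec_Fin": [1, 1, 2]},
    index=pd.Index([1, 2, 3], name="Horas"),
)

n_rows, n_cols = frec.shape
# indexing="ij" makes the flattened grid row-major, matching frec.values.ravel()
xpos, ypos = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
xpos, ypos = xpos.ravel(), ypos.ravel()
dz = frec.to_numpy(dtype=float).ravel()
colors = ["r", "g", "b"] * n_rows  # one colour per column, repeated for each hour


def draw_bars(frec, xpos, ypos, dz, colors):
    """Hypothetical helper: draw the bars with matplotlib's bar3d."""
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, needed on older matplotlib

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.bar3d(xpos, ypos, np.zeros_like(dz), 0.5, 0.5, dz, color=colors, alpha=0.6)
    # Centre one tick under each bar and label it from the table itself
    ax.set_xticks(np.arange(len(frec.index)) + 0.25)
    ax.set_xticklabels(frec.index)
    ax.set_yticks(np.arange(len(frec.columns)) + 0.25)
    ax.set_yticklabels(frec.columns)
    ax.set_xlabel("Hour")
    ax.set_zlabel("Occurrence")
    return fig
```

Calling draw_bars(frec, xpos, ypos, dz, colors) and then plt.show() should give one coloured row of bars per column; the same code applies to the full 24-hour table.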
the output:

Grouping by monthly and plot a bar stacked in pandas

I would like to get a dataframe of monthly frequencies for each class. For example, in the following dataframe df I want to use the column Forma to count how often each of its classes occurs per month, and get a dataframe like df1:
df
Evento Forma Excentricidad
Fecha
2004-04-09 22:45:00 1 MBCCM 0.7
2004-04-12 22:45:00 2 MBSCL 0.6
2004-04-24 03:45:00 3 SCL 0.4
2004-05-02 06:45:00 4 SCL 0.5
2004-05-30 04:45:00 5 MBCCM 0.9
2004-05-31 03:15:00 6 MBCCM 0.8
2004-06-08 00:15:00 7 MBSCL 0.6
2004-06-12 22:15:00 8 CCM 1.0
2004-06-13 02:45:00 9 MBCCM 0.8
2004-06-13 23:45:00 10 MBSCL 0.6
2004-06-14 03:15:00 11 MBSCL 0.6
2004-06-17 08:15:00 12 MBCCM 0.7
2004-06-17 11:45:00 13 MBCCM 0.7
2004-06-22 00:15:00 14 SCL 0.5
2004-06-22 07:45:00 15 MBCCM 0.9
2004-06-22 22:45:00 16 CCM 0.8
2004-07-01 05:15:00 17 MBCCM 0.8
2004-07-02 00:15:00 18 MBSCL 0.6
2004-07-04 11:45:00 19 MBCCM 0.9
2004-07-06 03:45:00 20 SCL 0.6
2004-07-07 04:15:00 21 CCM 0.9
2004-07-08 02:45:00 22 MBCCM 1.0
2004-07-08 11:45:00 23 MBCCM 0.8
2004-07-08 02:15:00 24 MBCCM 0.9
2004-07-09 04:45:00 25 CCM 0.7
2004-07-11 18:15:00 26 MBSCL 0.4
2004-07-11 23:15:00 27 MBSCL 0.3
2004-07-15 10:45:00 28 CCM 0.8
2004-07-16 12:15:00 29 MBCCM 0.8
2004-07-17 02:15:00 30 MBCCM 0.8
2004-07-17 05:45:00 31 MBCCM 0.7
2004-07-19 23:15:00 32 CCM 0.9
2004-07-20 09:15:00 33 CCM 0.7
2004-07-20 21:45:00 34 SCL 0.6
2004-07-23 03:45:00 35 SCL 0.6
2004-07-23 12:45:00 36 MBCCM 0.9
2004-07-24 00:45:00 37 CCM 0.7
2004-07-26 00:15:00 38 MBCCM 0.8
2004-07-27 05:15:00 39 MBSCL 0.6
2004-07-27 07:15:00 40 MBSCL 0.6
2004-07-27 14:15:00 41 MBCCM 0.7
2004-07-27 19:45:00 42 SCL 0.6
2004-07-27 23:15:00 43 MBSCL 0.6
2004-07-28 07:15:00 44 MBCCM 0.8
2004-07-30 05:15:00 45 MBCCM 0.7
2004-07-31 00:15:00 46 SCL 0.5
2004-07-31 04:15:00 47 MBSCL 0.6
df1
Tipo Abril Mayo Junio Julio Agosto Septiembre Octubre
MCC 2 9 8 1 5 6 3
CCM 7 11 23 12 7 2 4
MBCCM 4 8 4 1 3 4 2
SCL 1 7 2 4 1 9 5
MBSCL 6 3 7 1 9 3 10
How can I do this from df?
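import pandas as pd
# a regex separator needs the python parsing engine
df = pd.read_table('data', sep=r'\s{2,}', engine='python')
df.index = pd.to_datetime(df.index)
df['Month'] = [date.strftime('%B') for date in df.index]
# current pandas uses index=/columns= instead of the old rows=/cols= keywords
print(pd.crosstab(index=[df['Forma']], columns=[df['Month']], margins=False))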
yields
Month April July June May
Forma
CCM 0 6 2 0
MBCCM 1 13 4 2
MBSCL 1 7 3 0
SCL 1 5 1 1
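In the output above the months come out in alphabetical order (April, July, June, May) because they are plain strings. A minimal sketch, using a hypothetical seven-row excerpt of the data, keys on year-month strings so the rows sort chronologically, and puts months on the rows so that a stacked bar chart gets one bar per month:

```python
import pandas as pd

# Hypothetical excerpt of the data, indexed by Fecha
df = pd.DataFrame(
    {"Forma": ["MBCCM", "MBSCL", "SCL", "SCL", "MBCCM", "MBSCL", "CCM"]},
    index=pd.to_datetime(
        [
            "2004-04-09 22:45", "2004-04-12 22:45", "2004-04-24 03:45",
            "2004-05-02 06:45", "2004-05-30 04:45", "2004-06-08 00:15",
            "2004-06-12 22:15",
        ]
    ),
)
df.index.name = "Fecha"

# "2004-04", "2004-05", ... sort chronologically as plain strings
df["Month"] = df.index.strftime("%Y-%m")

# Rows are months, columns are classes, cells are counts
counts = pd.crosstab(index=df["Month"], columns=df["Forma"])

# With matplotlib installed, a stacked bar chart per month would be:
# counts.plot(kind="bar", stacked=True)
```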
