I have some noise survey data telling me noise levels measured over the period of several days. I want to find the 5th highest noise level in each night-time period. I have made this into a Pandas Series and used groupby and nlargest methods to show me the 5 highest noise levels each night, but now I want to view only the 5th highest value for each period (i.e. 82, 86, 86, 87 etc.). What's the best way to achieve this?
night_time_lmax.groupby(by=night_time_lmax.index.date).nlargest(5)
Start date & time
2021-08-18 2021-08-18 23:00:00 82.0
2021-08-18 23:15:00 82.0
2021-08-18 23:30:00 82.0
2021-08-18 23:45:00 82.0
2021-08-19 2021-08-19 05:45:00 100.0
2021-08-19 01:15:00 91.0
2021-08-19 04:45:00 87.0
2021-08-19 06:15:00 87.0
2021-08-19 01:45:00 86.0
2021-08-20 2021-08-20 06:30:00 90.0
2021-08-20 06:00:00 88.0
2021-08-20 03:15:00 87.0
2021-08-20 05:30:00 87.0
2021-08-20 01:15:00 86.0
2021-08-21 2021-08-21 01:30:00 98.0
2021-08-21 03:00:00 93.0
2021-08-21 00:45:00 88.0
2021-08-21 06:00:00 88.0
2021-08-21 03:30:00 87.0
2021-08-22 2021-08-22 23:45:00 102.0
2021-08-22 00:30:00 96.0
2021-08-22 06:30:00 92.0
2021-08-22 05:00:00 91.0
2021-08-22 01:30:00 90.0
2021-08-23 2021-08-23 01:15:00 98.0
2021-08-23 02:15:00 88.0
2021-08-23 00:45:00 87.0
2021-08-23 03:00:00 86.0
2021-08-23 06:00:00 86.0
2021-08-24 2021-08-24 01:00:00 93.0
2021-08-24 00:30:00 89.0
2021-08-24 06:30:00 87.0
2021-08-24 02:45:00 86.0
2021-08-24 06:00:00 86.0
I see two options here.
Either sort the Series by value and then take the nth element per group (note that GroupBy.nth is 0-indexed, so the 5th highest is nth(4)):
srt = night_time_lmax.sort_values(ascending=False)
srt.groupby(srt.index.date).nth(4)
## sorting ascending and counting from the end gives the same result:
# srt = night_time_lmax.sort_values()
# srt.groupby(srt.index.date).nth(-5)
Or use a double groupby, once for the top 5 and once for the last of those. The nlargest result has the date as the first index level, so group on level=0:
(night_time_lmax.groupby(night_time_lmax.index.date).nlargest(5)
 .groupby(level=0).last()
)
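Another option (a sketch, not from the original answer): apply nlargest per night and take its last element directly, which returns one value per date. For a night with fewer than five readings this falls back to the lowest available value.
fifth_highest = (night_time_lmax
                 .groupby(night_time_lmax.index.date)
                 .apply(lambda s: s.nlargest(5).iloc[-1]))  # smallest of each night's top 5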
I have a dataframe with a datetime index and 100 columns.
I want a new dataframe with the same datetime index and columns, where the values are the sum of the first 10 hours of each day.
So if I had an original dataframe like this:
A B C
---------------------------------
2018-01-01 00:00:00 2 5 -10
2018-01-01 01:00:00 6 5 7
2018-01-01 02:00:00 7 5 9
2018-01-01 03:00:00 9 5 6
2018-01-01 04:00:00 10 5 2
2018-01-01 05:00:00 7 5 -1
2018-01-01 06:00:00 1 5 -1
2018-01-01 07:00:00 -4 5 10
2018-01-01 08:00:00 9 5 10
2018-01-01 09:00:00 21 5 -10
2018-01-01 10:00:00 2 5 -1
2018-01-01 11:00:00 8 5 -1
2018-01-01 12:00:00 8 5 10
2018-01-01 13:00:00 8 5 9
2018-01-01 14:00:00 7 5 -10
2018-01-01 15:00:00 7 5 5
2018-01-01 16:00:00 7 5 -10
2018-01-01 17:00:00 4 5 7
2018-01-01 18:00:00 5 5 8
2018-01-01 19:00:00 2 5 8
2018-01-01 20:00:00 2 5 4
2018-01-01 21:00:00 8 5 3
2018-01-01 22:00:00 1 5 3
2018-01-01 23:00:00 1 5 1
2018-01-02 00:00:00 2 5 2
2018-01-02 01:00:00 3 5 8
2018-01-02 02:00:00 4 5 6
2018-01-02 03:00:00 5 5 6
2018-01-02 04:00:00 1 5 7
2018-01-02 05:00:00 7 5 7
2018-01-02 06:00:00 5 5 1
2018-01-02 07:00:00 2 5 2
2018-01-02 08:00:00 4 5 3
2018-01-02 09:00:00 6 5 4
2018-01-02 10:00:00 9 5 4
2018-01-02 11:00:00 11 5 5
2018-01-02 12:00:00 2 5 8
2018-01-02 13:00:00 2 5 0
2018-01-02 14:00:00 4 5 5
2018-01-02 15:00:00 5 5 4
2018-01-02 16:00:00 7 5 4
2018-01-02 17:00:00 -1 5 7
2018-01-02 18:00:00 1 5 7
2018-01-02 19:00:00 1 5 7
2018-01-02 20:00:00 5 5 7
2018-01-02 21:00:00 2 5 7
2018-01-02 22:00:00 2 5 7
2018-01-02 23:00:00 8 5 7
So for all rows with date 2018-01-01:
The value for column A would be 68 (2+6+7+9+10+7+1-4+9+21)
The value for column B would be 50 (5+5+5+5+5+5+5+5+5+5)
The value for column C would be 22 (-10+7+9+6+2-1-1+10+10-10)
So for all rows with date 2018-01-02:
The value for column A would be 39 (2+3+4+5+1+7+5+2+4+6)
The value for column B would be 50 (5+5+5+5+5+5+5+5+5+5)
The value for column C would be 46 (2+8+6+6+7+7+1+2+3+4)
The outcome would be:
A B C
---------------------------------
2018-01-01 00:00:00 68 50 22
2018-01-01 01:00:00 68 50 22
2018-01-01 02:00:00 68 50 22
2018-01-01 03:00:00 68 50 22
2018-01-01 04:00:00 68 50 22
2018-01-01 05:00:00 68 50 22
2018-01-01 06:00:00 68 50 22
2018-01-01 07:00:00 68 50 22
2018-01-01 08:00:00 68 50 22
2018-01-01 09:00:00 68 50 22
2018-01-01 10:00:00 68 50 22
2018-01-01 11:00:00 68 50 22
2018-01-01 12:00:00 68 50 22
2018-01-01 13:00:00 68 50 22
2018-01-01 14:00:00 68 50 22
2018-01-01 15:00:00 68 50 22
2018-01-01 16:00:00 68 50 22
2018-01-01 17:00:00 68 50 22
2018-01-01 18:00:00 68 50 22
2018-01-01 19:00:00 68 50 22
2018-01-01 20:00:00 68 50 22
2018-01-01 21:00:00 68 50 22
2018-01-01 22:00:00 68 50 22
2018-01-01 23:00:00 68 50 22
2018-01-02 00:00:00 39 50 46
2018-01-02 01:00:00 39 50 46
2018-01-02 02:00:00 39 50 46
2018-01-02 03:00:00 39 50 46
2018-01-02 04:00:00 39 50 46
2018-01-02 05:00:00 39 50 46
2018-01-02 06:00:00 39 50 46
2018-01-02 07:00:00 39 50 46
2018-01-02 08:00:00 39 50 46
2018-01-02 09:00:00 39 50 46
2018-01-02 10:00:00 39 50 46
2018-01-02 11:00:00 39 50 46
2018-01-02 12:00:00 39 50 46
2018-01-02 13:00:00 39 50 46
2018-01-02 14:00:00 39 50 46
2018-01-02 15:00:00 39 50 46
2018-01-02 16:00:00 39 50 46
2018-01-02 17:00:00 39 50 46
2018-01-02 18:00:00 39 50 46
2018-01-02 19:00:00 39 50 46
2018-01-02 20:00:00 39 50 46
2018-01-02 21:00:00 39 50 46
2018-01-02 22:00:00 39 50 46
2018-01-02 23:00:00 39 50 46
I figured I'd group by date first and perform a sum and then merge the results based on the date. Is there a better/faster way to do this?
Thanks.
EDIT: I worked on this answer in the meantime:
df = df.between_time('0:00', '9:00').groupby(pd.Grouper(freq='D')).sum()
df = df.resample('1H').ffill()
You need to group by df.index.date and use transform with a lambda function that sums the first 10 values of each group:
df.loc[:, ['A', 'B', 'C']] = df.groupby(df.index.date).transform(lambda x: x[:10].sum())
Or, since the transformed result has the same shape and column order as the original, simply:
df.loc[:, :] = df.groupby(df.index.date).transform(lambda x: x[:10].sum())
print(df)
A B C
2018-01-01 00:00:00 68 50 22
2018-01-01 01:00:00 68 50 22
2018-01-01 02:00:00 68 50 22
2018-01-01 03:00:00 68 50 22
2018-01-01 04:00:00 68 50 22
2018-01-01 05:00:00 68 50 22
2018-01-01 06:00:00 68 50 22
2018-01-01 07:00:00 68 50 22
2018-01-01 08:00:00 68 50 22
2018-01-01 09:00:00 68 50 22
2018-01-01 10:00:00 68 50 22
2018-01-01 11:00:00 68 50 22
2018-01-01 12:00:00 68 50 22
2018-01-01 13:00:00 68 50 22
2018-01-01 14:00:00 68 50 22
2018-01-01 15:00:00 68 50 22
2018-01-01 16:00:00 68 50 22
2018-01-01 17:00:00 68 50 22
2018-01-01 18:00:00 68 50 22
2018-01-01 19:00:00 68 50 22
2018-01-01 20:00:00 68 50 22
2018-01-01 21:00:00 68 50 22
2018-01-01 22:00:00 68 50 22
2018-01-01 23:00:00 68 50 22
2018-01-02 00:00:00 39 50 46
2018-01-02 01:00:00 39 50 46
2018-01-02 02:00:00 39 50 46
2018-01-02 03:00:00 39 50 46
2018-01-02 04:00:00 39 50 46
2018-01-02 05:00:00 39 50 46
2018-01-02 06:00:00 39 50 46
2018-01-02 07:00:00 39 50 46
2018-01-02 08:00:00 39 50 46
2018-01-02 09:00:00 39 50 46
2018-01-02 10:00:00 39 50 46
2018-01-02 11:00:00 39 50 46
2018-01-02 12:00:00 39 50 46
2018-01-02 13:00:00 39 50 46
2018-01-02 14:00:00 39 50 46
2018-01-02 15:00:00 39 50 46
2018-01-02 16:00:00 39 50 46
2018-01-02 17:00:00 39 50 46
2018-01-02 18:00:00 39 50 46
2018-01-02 19:00:00 39 50 46
2018-01-02 20:00:00 39 50 46
2018-01-02 21:00:00 39 50 46
2018-01-02 22:00:00 39 50 46
2018-01-02 23:00:00 39 50 46
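A vectorized alternative, closer to the asker's own EDIT, is sketched below. It assumes an hourly DatetimeIndex where the "first 10 hours" are the rows from 00:00 to 09:00 inclusive: sum those rows per day, then broadcast each day's totals back onto every hourly row of that day.
daily = df.between_time('00:00', '09:00').groupby(pd.Grouper(freq='D')).sum()
result = daily.reindex(df.index.normalize()).set_axis(df.index)  # repeat each day's totals for every hour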
I am new to Pandas but have been working with python for a few years now.
I have a large data set of hourly data with multiple columns. I need to group the data by day then count how many times the value is above 85 for each day for each column.
example data:
date KMRY KSNS PCEC1 KFAT
2014-06-06 13:00:00 56.000000 63.0 17 11
2014-06-06 14:00:00 58.000000 61.0 17 11
2014-06-06 15:00:00 63.000000 63.0 16 10
2014-06-06 16:00:00 67.000000 65.0 12 11
2014-06-06 17:00:00 67.000000 67.0 10 13
2014-06-06 18:00:00 72.000000 75.0 9 14
2014-06-06 19:00:00 77.000000 79.0 9 15
2014-06-06 20:00:00 84.000000 81.0 9 23
2014-06-06 21:00:00 81.000000 86.0 12 31
2014-06-06 22:00:00 84.000000 84.0 13 28
2014-06-06 23:00:00 83.000000 86.0 15 34
2014-06-07 00:00:00 84.000000 86.0 16 36
2014-06-07 01:00:00 86.000000 89.0 17 43
2014-06-07 02:00:00 86.000000 89.0 20 44
2014-06-07 03:00:00 89.000000 89.0 22 49
2014-06-07 04:00:00 86.000000 86.0 22 51
2014-06-07 05:00:00 86.000000 89.0 21 53
From the sample above my results should look like the following:
date KMRY KSNS PCEC1 KFAT
2014-06-06 0 2 0 0
2014-06-07 5 6 0 0
Any help would be greatly appreciated.
(D_RH > 85).sum()
The above code gets me close, but I need a daily breakdown as well, not just the overall column counts.
One way would be to make date a DatetimeIndex and then groupby the result of the comparison to 85. For example:
>>> df["date"] = pd.to_datetime(df["date"]) # only if it isn't already
>>> df = df.set_index("date")
>>> (df > 85).groupby(df.index.date).sum()
KMRY KSNS PCEC1 KFAT
2014-06-06 0 2 0 0
2014-06-07 5 6 0 0
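If the index is already a DatetimeIndex as above, resample should give the same daily breakdown (a sketch; the result is labelled with midnight timestamps rather than plain dates):
>>> (df > 85).resample('D').sum()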
I am trying to create a chart with multiple bars in 3d from pandas. Reviewing some examples on the web, I see that the best way to accomplish this is to get a dataframe like this:
data
Variable A B C D
date
2000-01-03 0.469112 -1.135632 0.119209 -2.104569
2000-01-04 -0.282863 1.212112 -1.044236 -0.494929
2000-01-05 -1.509059 -0.173215 -0.861849 1.071804
My dataframe is:
df
Date_inicio Date_Fin Date_Max Clase
0 2004-04-09 23:00:00 2004-04-10 04:00:00 2004-04-10 02:00:00 MBCCM
1 2004-04-12 23:00:00 2004-04-13 04:00:00 2004-04-13 00:00:00 MBSCL
2 2004-04-24 04:00:00 2004-04-24 12:00:00 2004-04-24 09:00:00 SCL
3 2004-05-02 07:00:00 2004-05-02 14:00:00 2004-05-02 11:00:00 SCL
4 2004-05-30 05:00:00 2004-05-30 08:00:00 2004-05-30 07:00:00 MBCCM
5 2004-05-31 03:00:00 2004-05-31 07:00:00 2004-05-31 05:00:00 MBCCM
6 2004-06-08 00:00:00 2004-06-08 05:00:00 2004-06-08 03:00:00 MBSCL
7 2004-06-12 22:00:00 2004-06-13 12:00:00 2004-06-13 06:00:00 CCM
8 2004-06-13 03:00:00 2004-06-13 08:00:00 2004-06-13 06:00:00 MBCCM
9 2004-06-14 00:00:00 2004-06-14 03:00:00 2004-06-14 02:00:00 MBSCL
10 2004-06-14 03:00:00 2004-06-14 09:00:00 2004-06-14 07:00:00 MBSCL
11 2004-06-17 08:00:00 2004-06-17 14:00:00 2004-06-17 11:00:00 MBCCM
12 2004-06-17 12:00:00 2004-06-17 17:00:00 2004-06-17 14:00:00 MBCCM
13 2004-06-22 00:00:00 2004-06-22 08:00:00 2004-06-22 06:00:00 SCL
14 2004-06-22 08:00:00 2004-06-22 14:00:00 2004-06-22 11:00:00 MBCCM
15 2004-06-22 23:00:00 2004-06-23 09:00:00 2004-06-23 06:00:00 CCM
16 2004-07-01 05:00:00 2004-07-01 09:00:00 2004-07-01 06:00:00 MBCCM
17 2004-07-02 00:00:00 2004-07-02 04:00:00 2004-07-02 02:00:00 MBSCL
18 2004-07-04 12:00:00 2004-07-04 15:00:00 2004-07-04 13:00:00 MBCCM
19 2004-07-06 04:00:00 2004-07-06 13:00:00 2004-07-06 07:00:00 SCL
20 2004-07-07 04:00:00 2004-07-07 12:00:00 2004-07-07 10:00:00 CCM
21 2004-07-08 03:00:00 2004-07-08 06:00:00 2004-07-08 05:00:00 MBCCM
22 2004-07-08 12:00:00 2004-07-08 17:00:00 2004-07-08 13:00:00 MBCCM
23 2004-07-08 02:00:00 2004-07-08 06:00:00 2004-07-08 04:00:00 MBCCM
24 2004-07-09 05:00:00 2004-07-09 12:00:00 2004-07-09 08:00:00 CCM
25 2004-07-11 18:00:00 2004-07-12 12:00:00 2004-07-11 21:00:00 MBSCL
26 2004-07-11 23:00:00 2004-07-12 05:00:00 2004-07-12 02:00:00 MBSCL
27 2004-07-15 11:00:00 2004-07-15 19:00:00 2004-07-15 12:00:00 CCM
28 2004-07-16 12:00:00 2004-07-16 16:00:00 2004-07-16 14:00:00 MBCCM
29 2004-07-17 02:00:00 2004-07-17 06:00:00 2004-07-17 05:00:00 MBCCM
Now I want to get the hourly occurrence of all classes, i.e. how many events fall on each hour of the day in Date_inicio, Date_Fin and Date_Max. From df I obtain the following frequency table:
frec
Frec_Inicio Frec_Max Frec_Fin
Horas
1 2 0 1
2 3 8 1
3 5 3 2
4 6 2 6
5 6 6 5
6 5 6 4
7 5 7 2
8 2 4 5
9 1 6 6
10 0 3 2
11 2 5 5
12 4 1 9
13 2 4 2
14 3 2 4
15 0 2 3
16 1 1 3
17 0 2 3
18 1 1 1
19 0 0 3
20 1 1 1
21 1 1 0
22 3 1 0
23 9 1 0
24 8 3 2
Now, my goal is to plot a 3D bar chart like the figure below.
To achieve this, I wrote the following code:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
xpos=np.arange(frec.shape[0])
ypos=np.arange(frec.shape[1])
xpos, ypos = np.meshgrid(xpos+0.25, ypos+0.25)
xpos = xpos.flatten()
ypos = ypos.flatten()
zpos=np.zeros(frec.shape).flatten()
dx=0.5 * np.ones_like(zpos)
dy=0.5 * np.ones_like(zpos)
dz=frec.values.ravel()
dz[np.isnan(dz)]=0.
ax.bar3d(xpos,ypos,zpos,dx,dy,dz,color='b', alpha=0.5)
ax.set_xticks([.5,1.5,2.5])
ax.set_yticks([.5,1.5,2.5,3.5])
ax.w_yaxis.set_ticklabels(frec.columns)
ax.w_xaxis.set_ticklabels(frec.index)
ax.set_xlabel('Time')
ax.set_ylabel('B')
ax.set_zlabel('Occurrence')
plt.show()
How do I get a better plot, similar to the figure above?
Here is the code to do the counting:
import pandas as pd
text="""Date_inicio, Date_Fin, Date_Max, Clase
2004-04-09 23:00:00, 2004-04-10 04:00:00, 2004-04-10 02:00:00, MBCCM
2004-04-12 23:00:00, 2004-04-13 04:00:00, 2004-04-13 00:00:00, MBSCL
2004-04-24 04:00:00, 2004-04-24 12:00:00, 2004-04-24 09:00:00, SCL
2004-05-02 07:00:00, 2004-05-02 14:00:00, 2004-05-02 11:00:00, SCL
2004-05-30 05:00:00, 2004-05-30 08:00:00, 2004-05-30 07:00:00, MBCCM
2004-05-31 03:00:00, 2004-05-31 07:00:00, 2004-05-31 05:00:00, MBCCM
2004-06-08 00:00:00, 2004-06-08 05:00:00, 2004-06-08 03:00:00, MBSCL
2004-06-12 22:00:00, 2004-06-13 12:00:00, 2004-06-13 06:00:00, CCM
2004-06-13 03:00:00, 2004-06-13 08:00:00, 2004-06-13 06:00:00, MBCCM
2004-06-14 00:00:00, 2004-06-14 03:00:00, 2004-06-14 02:00:00, MBSCL
2004-06-14 03:00:00, 2004-06-14 09:00:00, 2004-06-14 07:00:00, MBSCL
2004-06-17 08:00:00, 2004-06-17 14:00:00, 2004-06-17 11:00:00, MBCCM
2004-06-17 12:00:00, 2004-06-17 17:00:00, 2004-06-17 14:00:00, MBCCM
2004-06-22 00:00:00, 2004-06-22 08:00:00, 2004-06-22 06:00:00, SCL
2004-06-22 08:00:00, 2004-06-22 14:00:00, 2004-06-22 11:00:00, MBCCM
2004-06-22 23:00:00, 2004-06-23 09:00:00, 2004-06-23 06:00:00, CCM
2004-07-01 05:00:00, 2004-07-01 09:00:00, 2004-07-01 06:00:00, MBCCM
2004-07-02 00:00:00, 2004-07-02 04:00:00, 2004-07-02 02:00:00, MBSCL
2004-07-04 12:00:00, 2004-07-04 15:00:00, 2004-07-04 13:00:00, MBCCM
2004-07-06 04:00:00, 2004-07-06 13:00:00, 2004-07-06 07:00:00, SCL
2004-07-07 04:00:00, 2004-07-07 12:00:00, 2004-07-07 10:00:00, CCM
2004-07-08 03:00:00, 2004-07-08 06:00:00, 2004-07-08 05:00:00, MBCCM
2004-07-08 12:00:00, 2004-07-08 17:00:00, 2004-07-08 13:00:00, MBCCM
2004-07-08 02:00:00, 2004-07-08 06:00:00, 2004-07-08 04:00:00, MBCCM
2004-07-09 05:00:00, 2004-07-09 12:00:00, 2004-07-09 08:00:00, CCM
2004-07-11 18:00:00, 2004-07-12 12:00:00, 2004-07-11 21:00:00, MBSCL
2004-07-11 23:00:00, 2004-07-12 05:00:00, 2004-07-12 02:00:00, MBSCL
2004-07-15 11:00:00, 2004-07-15 19:00:00, 2004-07-15 12:00:00, CCM
2004-07-16 12:00:00, 2004-07-16 16:00:00, 2004-07-16 14:00:00, MBCCM
2004-07-17 02:00:00, 2004-07-17 06:00:00, 2004-07-17 05:00:00, MBCCM"""
import io
df = pd.read_csv(io.StringIO(text), skipinitialspace=True)
df.drop(["Clase"], axis=1, inplace=True)
# keep only the hour part "HH" of each timestamp string and convert it to int
df = df.apply(lambda s: s.str[11:13]).astype(int)
df2 = df.apply(lambda s: s.value_counts())
print(df2)
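As a sketch of an alternative, the datetime accessor gives the same hourly counts without string slicing (starting again from the three timestamp columns, before they are overwritten above):
raw = pd.read_csv(io.StringIO(text), skipinitialspace=True).drop(["Clase"], axis=1)
hour_counts = (raw.apply(lambda s: pd.to_datetime(s).dt.hour)   # hour of day, 0-23
                  .apply(lambda s: s.value_counts())
                  .fillna(0).astype(int))
print(hour_counts)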
Here is the code that draws the 3D bars:
import pandas as pd
text="""Horas Frec_Inicio Frec_Max Frec_Fin
1 2 0 1
2 3 8 1
3 5 3 2
4 6 2 6
5 6 6 5
6 5 6 4
7 5 7 2
8 2 4 5
9 1 6 6
10 0 3 2
11 2 5 5
12 4 1 9
13 2 4 2
14 3 2 4
15 0 2 3
16 1 1 3
17 0 2 3
18 1 1 1
19 0 0 3
20 1 1 1
21 1 1 0
22 3 1 0
23 9 1 0
24 8 3 2"""
import io
df = pd.read_csv(io.StringIO(text), skipinitialspace=True, delim_whitespace=True)
df.set_index("Horas", inplace=True)
columns_name = [x.replace("_", " ") for x in df.columns]
df.columns = [0, 2, 4]
x, y, z = df.stack().reset_index().values.T
import visvis as vv
app = vv.use()
f = vv.clf()
a = vv.cla()
bar = vv.bar3(x, y, z, width=0.8)
bar.colors = ["r","g","b"] * 24
a.axis.yTicks = dict(zip(df.columns, columns_name))
app.Run()
the output:
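If visvis is not available, here is a rough matplotlib-only sketch (my own variant, assuming frec is the frequency table from the question, with Horas as the index and the three Frec_* columns):
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # needed on older matplotlib to register the 3d projection

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for i, (col, c) in enumerate(zip(frec.columns, ['r', 'g', 'b'])):
    # one row of bars per frequency column, offset along the y axis
    ax.bar3d(frec.index - 0.4, np.full(len(frec), i, dtype=float), np.zeros(len(frec)),
             0.8, 0.5, frec[col].values, color=c, alpha=0.7)
ax.set_yticks([i + 0.25 for i in range(len(frec.columns))])
ax.set_yticklabels(frec.columns)
ax.set_xlabel('Hora')
ax.set_zlabel('Frecuencia')
plt.show()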
I would like to get a DataFrame whose rows are the different classes and whose columns are the monthly frequencies of each class. For example, from the following DataFrame df I want to use the column Forma to build a DataFrame of monthly frequencies for each class in Forma, and obtain something like df1 below:
df
Evento Forma Excentricidad
Fecha
2004-04-09 22:45:00 1 MBCCM 0.7
2004-04-12 22:45:00 2 MBSCL 0.6
2004-04-24 03:45:00 3 SCL 0.4
2004-05-02 06:45:00 4 SCL 0.5
2004-05-30 04:45:00 5 MBCCM 0.9
2004-05-31 03:15:00 6 MBCCM 0.8
2004-06-08 00:15:00 7 MBSCL 0.6
2004-06-12 22:15:00 8 CCM 1.0
2004-06-13 02:45:00 9 MBCCM 0.8
2004-06-13 23:45:00 10 MBSCL 0.6
2004-06-14 03:15:00 11 MBSCL 0.6
2004-06-17 08:15:00 12 MBCCM 0.7
2004-06-17 11:45:00 13 MBCCM 0.7
2004-06-22 00:15:00 14 SCL 0.5
2004-06-22 07:45:00 15 MBCCM 0.9
2004-06-22 22:45:00 16 CCM 0.8
2004-07-01 05:15:00 17 MBCCM 0.8
2004-07-02 00:15:00 18 MBSCL 0.6
2004-07-04 11:45:00 19 MBCCM 0.9
2004-07-06 03:45:00 20 SCL 0.6
2004-07-07 04:15:00 21 CCM 0.9
2004-07-08 02:45:00 22 MBCCM 1.0
2004-07-08 11:45:00 23 MBCCM 0.8
2004-07-08 02:15:00 24 MBCCM 0.9
2004-07-09 04:45:00 25 CCM 0.7
2004-07-11 18:15:00 26 MBSCL 0.4
2004-07-11 23:15:00 27 MBSCL 0.3
2004-07-15 10:45:00 28 CCM 0.8
2004-07-16 12:15:00 29 MBCCM 0.8
2004-07-17 02:15:00 30 MBCCM 0.8
2004-07-17 05:45:00 31 MBCCM 0.7
2004-07-19 23:15:00 32 CCM 0.9
2004-07-20 09:15:00 33 CCM 0.7
2004-07-20 21:45:00 34 SCL 0.6
2004-07-23 03:45:00 35 SCL 0.6
2004-07-23 12:45:00 36 MBCCM 0.9
2004-07-24 00:45:00 37 CCM 0.7
2004-07-26 00:15:00 38 MBCCM 0.8
2004-07-27 05:15:00 39 MBSCL 0.6
2004-07-27 07:15:00 40 MBSCL 0.6
2004-07-27 14:15:00 41 MBCCM 0.7
2004-07-27 19:45:00 42 SCL 0.6
2004-07-27 23:15:00 43 MBSCL 0.6
2004-07-28 07:15:00 44 MBCCM 0.8
2004-07-30 05:15:00 45 MBCCM 0.7
2004-07-31 00:15:00 46 SCL 0.5
2004-07-31 04:15:00 47 MBSCL 0.6
df1
Tipo Abril Mayo Junio Julio Agosto Septiembre Octubre
MCC 2 9 8 1 5 6 3
CCM 7 11 23 12 7 2 4
MBCCM 4 8 4 1 3 4 2
SCL 1 7 2 4 1 9 5
MBSCL 6 3 7 1 9 3 10
How can I do this from df?
import pandas as pd
df = pd.read_table('data', sep=r'\s{2,}')
df.index = pd.to_datetime(df.index)
df['Month'] = [date.strftime('%B') for date in df.index]
print(pd.crosstab(index=df['Forma'], columns=df['Month'], margins=False))
yields
Month April July June May
Forma
CCM 0 6 2 0
MBCCM 1 13 4 2
MBSCL 1 7 3 0
SCL 1 5 1 1
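Note that strftime('%B') produces month names that sort alphabetically (April, July, June, May above). A small variant that keeps calendar order is sketched below; it labels the columns with month numbers, which you could rename to Spanish month names afterwards as in df1.
print(pd.crosstab(index=df['Forma'], columns=df.index.month, margins=False))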