I have a dataframe which I pivoted, and I now want to select specific rows from the data. I have seen similar questions such as the one here: Selecting columns in a pandas pivot table based on specific row value?. In my case I want to return all the columns but select only specific rows.
timestamp,value
2008-03-01 00:00:00,55.0
2008-03-01 00:15:00,20.0
2008-03-01 00:30:00,13.0
2008-03-01 00:45:00,78.0
2008-03-01 01:00:00,34.0
2008-03-01 01:15:00,123.0
2008-03-01 01:30:00,25.0
2008-03-01 01:45:00,91.0
2008-03-02 00:00:00,55.0
2008-03-02 00:15:00,46.0
2008-03-02 00:30:00,66.0
2008-03-02 00:45:00,24.0
2008-03-02 01:00:00,70.0
2008-03-02 01:15:00,32.0
2008-03-02 01:30:00,15.0
2008-03-02 01:45:00,92.0
I ran the code below to generate the output that follows it:
import pandas as pd
import numpy as np
from datetime import datetime
df = pd.read_csv('df.csv')
df.timestamp = pd.to_datetime(df.timestamp)
df = df.set_index('timestamp')
df['date'] = df.index.map(lambda t: t.date())
df['time'] = df.index.map(lambda t: t.time())
df_pivot = pd.pivot_table(df, values='value', index='timestamp', columns='time')
df_pivot = df_pivot.fillna(0.0)
print(df_pivot)
Generated output
time 00:00:00 00:15:00 00:30:00 00:45:00 01:00:00 01:15:00 01:30:00 01:45:00
timestamp
2008-03-01 00:00:00 55.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2008-03-01 00:15:00 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0
2008-03-01 00:30:00 0.0 0.0 13.0 0.0 0.0 0.0 0.0 0.0
2008-03-01 00:45:00 0.0 0.0 0.0 78.0 0.0 0.0 0.0 0.0
2008-03-01 01:00:00 0.0 0.0 0.0 0.0 34.0 0.0 0.0 0.0
2008-03-01 01:15:00 0.0 0.0 0.0 0.0 0.0 123.0 0.0 0.0
2008-03-01 01:30:00 0.0 0.0 0.0 0.0 0.0 0.0 25.0 0.0
2008-03-01 01:45:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 91.0
2008-03-02 00:00:00 55.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2008-03-02 00:15:00 0.0 46.0 0.0 0.0 0.0 0.0 0.0 0.0
2008-03-02 00:30:00 0.0 0.0 66.0 0.0 0.0 0.0 0.0 0.0
2008-03-02 00:45:00 0.0 0.0 0.0 24.0 0.0 0.0 0.0 0.0
2008-03-02 01:00:00 0.0 0.0 0.0 0.0 70.0 0.0 0.0 0.0
2008-03-02 01:15:00 0.0 0.0 0.0 0.0 0.0 32.0 0.0 0.0
2008-03-02 01:30:00 0.0 0.0 0.0 0.0 0.0 0.0 15.0 0.0
2008-03-02 01:45:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 92.0
I want to select e.g., only the data for 2008-03-01 00:00:00, 2008-03-01 01:15:00, and 2008-03-02 01:00:00.
Expected output
time 00:00:00 00:15:00 00:30:00 00:45:00 01:00:00 01:15:00 01:30:00 01:45:00
timestamp
2008-03-01 00:00:00 55.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2008-03-01 01:15:00 0.0 0.0 0.0 0.0 0.0 123.0 0.0 0.0
2008-03-02 01:00:00 0.0 0.0 0.0 0.0 70.0 0.0 0.0 0.0
How can I do that?
Use list of datetimes converted by to_datetime and select by DataFrame.loc:
#create DatetimeIndex
df = pd.read_csv('df.csv', index_col='timestamp', parse_dates=['timestamp'])
#used pandas methods
df['date'] = df.index.date
df['time'] = df.index.time
#added fill_value parameter
df_pivot = pd.pivot_table(df,values='value',index='timestamp',columns='time',fill_value=0)
L = ['2008-03-01 00:00:00','2008-03-01 01:15:00','2008-03-02 01:00:00']
df = df_pivot.loc[pd.to_datetime(L)]
print (df)
time 00:00:00 00:15:00 00:30:00 00:45:00 01:00:00 \
2008-03-01 00:00:00 55 0 0 0 0
2008-03-01 01:15:00 0 0 0 0 0
2008-03-02 01:00:00 0 0 0 0 70
time 01:15:00 01:30:00 01:45:00
2008-03-01 00:00:00 0 0 0
2008-03-01 01:15:00 123 0 0
2008-03-02 01:00:00 0 0 0
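If some of the requested timestamps might not be present in the pivoted index, a boolean mask built with `Index.isin` is a possible variation on the `.loc` lookup above. This is only a sketch: the CSV text is an abbreviated copy of the question's sample, and the fourth entry in `wanted` is a made-up timestamp added to show that missing labels are skipped rather than raising a `KeyError`.

```python
import io
from datetime import time

import pandas as pd

# Abbreviated copy of the question's sample data.
csv_text = """timestamp,value
2008-03-01 00:00:00,55.0
2008-03-01 00:15:00,20.0
2008-03-01 01:15:00,123.0
2008-03-02 01:00:00,70.0
2008-03-02 01:15:00,32.0
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])
df['time'] = df['timestamp'].dt.time

df_pivot = pd.pivot_table(df, values='value', index='timestamp',
                          columns='time', fill_value=0)

# The last entry is deliberately absent from the data (illustrative assumption).
wanted = pd.to_datetime(['2008-03-01 00:00:00', '2008-03-01 01:15:00',
                         '2008-03-02 01:00:00', '2008-03-03 00:00:00'])

# isin builds a boolean mask, so labels that are missing are simply skipped.
selected = df_pivot.loc[df_pivot.index.isin(wanted)]
print(selected)
```

Unlike `.loc` with a list of labels, the mask keeps the rows in their original index order instead of the order of `wanted`.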
Suppose I have the following code that calculates how many products I can purchase given my budget:
import math
import pandas as pd
data = [['2021-01-02', 5.5], ['2021-02-02', 10.5], ['2021-03-02', 15.0], ['2021-04-02', 20.0]]
df = pd.DataFrame(data, columns=['Date', 'Current_Price'])
df.Date = pd.to_datetime(df.Date)
mn = df.Date.min()
mx = df.Date.max()
dr = pd.date_range(mn - pd.tseries.offsets.MonthBegin(), mx + pd.tseries.offsets.MonthEnd(), name="Date")
df = df.set_index("Date").reindex(dr).reset_index()
df['Current_Price'] = df.groupby(
pd.Grouper(key='Date', freq='1M'))['Current_Price'].ffill().bfill()
# The dataframe below shows the current price of the product
# I'd like to buy at the specific date_range
print(df)
# Create 'Day' column to know which day of the month
df['Day'] = pd.to_datetime(df['Date']).dt.day
# Create 'Deposit' column to record how much money is
# deposited in, say, my bank account to buy the product.
# 'Withdrawal' column is to record how much I spent in
# buying product(s) at the current price on a specific date.
# 'Num_of_Products_Bought' shows how many items I bought
# on that specific date.
#
# Please note that the calculation below takes into account
# the left over money, which remains after I've purchased a
# product, for future purchase. For example, if you observe
# the resulting dataframe at the end of this code, you'll
# notice that I was able to purchase 7 products on March 1, 2021
# although my deposit on that day was $100. That is because
# on the days leading up to March 1, 2021, I have been saving
# the spare change from previous product purchases and that
# extra money allows me to buy an extra product on March 1, 2021
# even though my budget of $100 alone would only allow me to purchase
# 6 products.
df[['Deposit', 'Withdrawal', 'Num_of_Products_Bought']] = 0.0
# Suppose I save $100 at the beginning of every month in my bank account
df.loc[df['Day'] == 1, 'Deposit'] = 100.0
for index, row in df.iterrows():
    if df.loc[index, 'Day'] == 1:
        # num_prod_bought = (sum_of_deposit_so_far - sum_of_withdrawal)/current_price
        df.loc[index, 'Num_of_Products_Bought'] = math.floor(
            (sum(df.iloc[0:(index + 1)]['Deposit'])
             - sum(df.iloc[0:(index + 1)]['Withdrawal']))
            / df.loc[index, 'Current_Price'])
        # Record how much I spent buying the product on specific date
        df.loc[index, 'Withdrawal'] = df.loc[index, 'Num_of_Products_Bought'] * df.loc[index, 'Current_Price']
print(df)
# This code above is working as intended,
# but how can I make it more efficient/pandas-like?
# In particular, I don't like the idea of having to
# iterate the rows and having to recalculate
# the running (sum of) deposit amount and
# the running (sum of) the withdrawal.
As mentioned in the comment in the code, I would like to know how to accomplish the same without having to iterate the rows one by one and calculating the sum of the rows up to the current row in my iteration (I read around StackOverflow and saw cumsum() function, but I don't think cumsum has the notion of current row in the iteration).
Thank you very much in advance for your suggestions/answers!
A solution using .apply:
def fn():
    leftover = 0
    amount, deposit = yield
    while True:
        new_amount, new_deposit = yield (deposit + leftover) // amount
        leftover = (deposit + leftover) % amount
        amount, deposit = new_amount, new_deposit
df = df.set_index("Date")
s = fn()
next(s)
m = df.index.day == 1
df.loc[m, "Deposit"] = 100
df.loc[m, "Num_of_Products_Bought"] = df.loc[
m, ["Current_Price", "Deposit"]
].apply(lambda x: s.send((x["Current_Price"], x["Deposit"])), axis=1)
df.loc[m, "Withdrawal"] = (
df.loc[m, "Num_of_Products_Bought"] * df.loc[m, "Current_Price"]
)
print(df.fillna(0).reset_index())
Prints:
Date Current_Price Deposit Num_of_Products_Bought Withdrawal
0 2021-01-01 5.5 100.0 18.0 99.0
1 2021-01-02 5.5 0.0 0.0 0.0
2 2021-01-03 5.5 0.0 0.0 0.0
3 2021-01-04 5.5 0.0 0.0 0.0
4 2021-01-05 5.5 0.0 0.0 0.0
5 2021-01-06 5.5 0.0 0.0 0.0
6 2021-01-07 5.5 0.0 0.0 0.0
7 2021-01-08 5.5 0.0 0.0 0.0
8 2021-01-09 5.5 0.0 0.0 0.0
9 2021-01-10 5.5 0.0 0.0 0.0
10 2021-01-11 5.5 0.0 0.0 0.0
11 2021-01-12 5.5 0.0 0.0 0.0
12 2021-01-13 5.5 0.0 0.0 0.0
13 2021-01-14 5.5 0.0 0.0 0.0
14 2021-01-15 5.5 0.0 0.0 0.0
15 2021-01-16 5.5 0.0 0.0 0.0
16 2021-01-17 5.5 0.0 0.0 0.0
17 2021-01-18 5.5 0.0 0.0 0.0
18 2021-01-19 5.5 0.0 0.0 0.0
19 2021-01-20 5.5 0.0 0.0 0.0
20 2021-01-21 5.5 0.0 0.0 0.0
21 2021-01-22 5.5 0.0 0.0 0.0
22 2021-01-23 5.5 0.0 0.0 0.0
23 2021-01-24 5.5 0.0 0.0 0.0
24 2021-01-25 5.5 0.0 0.0 0.0
25 2021-01-26 5.5 0.0 0.0 0.0
26 2021-01-27 5.5 0.0 0.0 0.0
27 2021-01-28 5.5 0.0 0.0 0.0
28 2021-01-29 5.5 0.0 0.0 0.0
29 2021-01-30 5.5 0.0 0.0 0.0
30 2021-01-31 5.5 0.0 0.0 0.0
31 2021-02-01 10.5 100.0 9.0 94.5
32 2021-02-02 10.5 0.0 0.0 0.0
33 2021-02-03 10.5 0.0 0.0 0.0
34 2021-02-04 10.5 0.0 0.0 0.0
35 2021-02-05 10.5 0.0 0.0 0.0
36 2021-02-06 10.5 0.0 0.0 0.0
37 2021-02-07 10.5 0.0 0.0 0.0
38 2021-02-08 10.5 0.0 0.0 0.0
39 2021-02-09 10.5 0.0 0.0 0.0
40 2021-02-10 10.5 0.0 0.0 0.0
41 2021-02-11 10.5 0.0 0.0 0.0
42 2021-02-12 10.5 0.0 0.0 0.0
43 2021-02-13 10.5 0.0 0.0 0.0
44 2021-02-14 10.5 0.0 0.0 0.0
45 2021-02-15 10.5 0.0 0.0 0.0
46 2021-02-16 10.5 0.0 0.0 0.0
47 2021-02-17 10.5 0.0 0.0 0.0
48 2021-02-18 10.5 0.0 0.0 0.0
49 2021-02-19 10.5 0.0 0.0 0.0
50 2021-02-20 10.5 0.0 0.0 0.0
51 2021-02-21 10.5 0.0 0.0 0.0
52 2021-02-22 10.5 0.0 0.0 0.0
53 2021-02-23 10.5 0.0 0.0 0.0
54 2021-02-24 10.5 0.0 0.0 0.0
55 2021-02-25 10.5 0.0 0.0 0.0
56 2021-02-26 10.5 0.0 0.0 0.0
57 2021-02-27 10.5 0.0 0.0 0.0
58 2021-02-28 10.5 0.0 0.0 0.0
59 2021-03-01 15.0 100.0 7.0 105.0
60 2021-03-02 15.0 0.0 0.0 0.0
61 2021-03-03 15.0 0.0 0.0 0.0
62 2021-03-04 15.0 0.0 0.0 0.0
63 2021-03-05 15.0 0.0 0.0 0.0
64 2021-03-06 15.0 0.0 0.0 0.0
65 2021-03-07 15.0 0.0 0.0 0.0
66 2021-03-08 15.0 0.0 0.0 0.0
67 2021-03-09 15.0 0.0 0.0 0.0
68 2021-03-10 15.0 0.0 0.0 0.0
69 2021-03-11 15.0 0.0 0.0 0.0
70 2021-03-12 15.0 0.0 0.0 0.0
71 2021-03-13 15.0 0.0 0.0 0.0
72 2021-03-14 15.0 0.0 0.0 0.0
73 2021-03-15 15.0 0.0 0.0 0.0
74 2021-03-16 15.0 0.0 0.0 0.0
75 2021-03-17 15.0 0.0 0.0 0.0
76 2021-03-18 15.0 0.0 0.0 0.0
77 2021-03-19 15.0 0.0 0.0 0.0
78 2021-03-20 15.0 0.0 0.0 0.0
79 2021-03-21 15.0 0.0 0.0 0.0
80 2021-03-22 15.0 0.0 0.0 0.0
81 2021-03-23 15.0 0.0 0.0 0.0
82 2021-03-24 15.0 0.0 0.0 0.0
83 2021-03-25 15.0 0.0 0.0 0.0
84 2021-03-26 15.0 0.0 0.0 0.0
85 2021-03-27 15.0 0.0 0.0 0.0
86 2021-03-28 15.0 0.0 0.0 0.0
87 2021-03-29 15.0 0.0 0.0 0.0
88 2021-03-30 15.0 0.0 0.0 0.0
89 2021-03-31 15.0 0.0 0.0 0.0
90 2021-04-01 20.0 100.0 5.0 100.0
91 2021-04-02 20.0 0.0 0.0 0.0
92 2021-04-03 20.0 0.0 0.0 0.0
93 2021-04-04 20.0 0.0 0.0 0.0
94 2021-04-05 20.0 0.0 0.0 0.0
95 2021-04-06 20.0 0.0 0.0 0.0
96 2021-04-07 20.0 0.0 0.0 0.0
97 2021-04-08 20.0 0.0 0.0 0.0
98 2021-04-09 20.0 0.0 0.0 0.0
99 2021-04-10 20.0 0.0 0.0 0.0
100 2021-04-11 20.0 0.0 0.0 0.0
101 2021-04-12 20.0 0.0 0.0 0.0
102 2021-04-13 20.0 0.0 0.0 0.0
103 2021-04-14 20.0 0.0 0.0 0.0
104 2021-04-15 20.0 0.0 0.0 0.0
105 2021-04-16 20.0 0.0 0.0 0.0
106 2021-04-17 20.0 0.0 0.0 0.0
107 2021-04-18 20.0 0.0 0.0 0.0
108 2021-04-19 20.0 0.0 0.0 0.0
109 2021-04-20 20.0 0.0 0.0 0.0
110 2021-04-21 20.0 0.0 0.0 0.0
111 2021-04-22 20.0 0.0 0.0 0.0
112 2021-04-23 20.0 0.0 0.0 0.0
113 2021-04-24 20.0 0.0 0.0 0.0
114 2021-04-25 20.0 0.0 0.0 0.0
115 2021-04-26 20.0 0.0 0.0 0.0
116 2021-04-27 20.0 0.0 0.0 0.0
117 2021-04-28 20.0 0.0 0.0 0.0
118 2021-04-29 20.0 0.0 0.0 0.0
119 2021-04-30 20.0 0.0 0.0 0.0
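The carry-over logic inside the generator can also be checked in isolation with a plain loop over the first-of-month prices. This is a minimal sketch, not part of the answer above; the helper name `products_with_carryover` is made up for illustration, and the prices are the four monthly values from the question.

```python
def products_with_carryover(prices, deposit):
    """Return units bought on each purchase date, carrying leftover cash forward."""
    leftover = 0.0
    bought = []
    for price in prices:
        # divmod gives the whole units affordable and the cash left over.
        units, leftover = divmod(leftover + deposit, price)
        bought.append(int(units))
    return bought


monthly_prices = [5.5, 10.5, 15.0, 20.0]
print(products_with_carryover(monthly_prices, 100.0))  # [18, 9, 7, 5]
```

The result agrees with the Num_of_Products_Bought values on the first of each month in the table above, including the 7 products on 2021-03-01 that the $100 deposit alone would not cover.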
I have the data like this:
OwnerUserId Score
CreationDate
2015-01-01 00:16:46.963 1491895.0 0.0
2015-01-01 00:23:35.983 1491895.0 1.0
2015-01-01 00:30:55.683 1491895.0 1.0
2015-01-01 01:10:43.830 2141635.0 0.0
2015-01-01 01:11:08.927 1491895.0 1.0
2015-01-01 01:12:34.273 3297613.0 1.0
..........
This is a whole year of data with scores for different users. I hope to get data like this:
OwnerUserId 1491895.0 1491895.0 1491895.0 2141635.0 1491895.0
00:00 0.0 3.0 0.0 3.0 5.8
00:01 5.0 3.0 0.0 3.0 5.8
00:02 3.0 33.0 20.0 3.0 5.8
......
23:40 12.0 33.0 10.0 3.0 5.8
23:41 32.0 33.0 20.0 3.0 5.8
23:42 12.0 13.0 10.0 3.0 5.8
Each element of the dataframe is the score (mean or sum).
I have tried the following:
pd.pivot_table(data_series.reset_index(),index=['CreationDate'],columns=['OwnerUserId'],
fill_value=0).resample('W').sum()['Score']
I get the result shown in the image.
I think you need:
#pass scalars instead of lists and add the values parameter to avoid a MultiIndex in columns
df = pd.pivot_table(data_series.reset_index(),
index='CreationDate',
columns='OwnerUserId',
values='Score',
fill_value=0)
#truncate seconds and convert to TimedeltaIndex ('min' replaces the deprecated 'T' alias)
df.index = pd.to_timedelta(df.index.floor('min').strftime('%H:%M:%S'))
#or round to minutes
#df.index = pd.to_timedelta(df.index.round('min').strftime('%H:%M:%S'))
print (df)
OwnerUserId 1491895.0 2141635.0 3297613.0
00:16:00 0 0 0
00:23:00 1 0 0
00:30:00 1 0 0
01:10:00 0 0 0
01:11:00 1 0 0
01:12:00 0 0 1
idx = pd.timedelta_range('00:00:00', '23:59:00', freq='min')
#resample by minute and aggregate with sum; reindex adds the missing rows
df = df.resample('min').sum().fillna(0).reindex(idx, fill_value=0)
print (df)
OwnerUserId 1491895.0 2141635.0 3297613.0
00:00:00 0.0 0.0 0.0
00:01:00 0.0 0.0 0.0
00:02:00 0.0 0.0 0.0
00:03:00 0.0 0.0 0.0
00:04:00 0.0 0.0 0.0
00:05:00 0.0 0.0 0.0
00:06:00 0.0 0.0 0.0
...
...
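Another way to get one row per minute of the day is to label each record with an 'HH:MM' string and pivot on that label. The sketch below swaps that in for the timedelta-based resampling above and sums the scores directly (`aggfunc='sum'`), so it is a variation rather than the same computation. The `minute` column name and the dummy date in `pd.date_range` are assumptions for illustration; the six rows are the sample from the question.

```python
import pandas as pd

# The six sample rows shown in the question.
data = pd.DataFrame(
    {'OwnerUserId': [1491895.0, 1491895.0, 1491895.0,
                     2141635.0, 1491895.0, 3297613.0],
     'Score': [0.0, 1.0, 1.0, 0.0, 1.0, 1.0]},
    index=pd.to_datetime(['2015-01-01 00:16:46.963', '2015-01-01 00:23:35.983',
                          '2015-01-01 00:30:55.683', '2015-01-01 01:10:43.830',
                          '2015-01-01 01:11:08.927', '2015-01-01 01:12:34.273']))
data.index.name = 'CreationDate'

# Label every record with its minute of the day, ignoring the date.
data['minute'] = data.index.strftime('%H:%M')

by_minute = data.pivot_table(index='minute', columns='OwnerUserId',
                             values='Score', aggfunc='sum', fill_value=0)

# Add the minutes that never occur; the date used here is only a dummy.
all_minutes = pd.date_range('2000-01-01', periods=24 * 60,
                            freq='min').strftime('%H:%M')
by_minute = by_minute.reindex(all_minutes, fill_value=0)
print(by_minute.loc['00:20':'00:25'])
```

With a full year of data, every date contributes to the same 1440 rows, so each cell is the total score a user earned in that minute of the day across the year.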
I have a timeseries
date
2009-12-23 0.0
2009-12-28 0.0
2009-12-29 0.0
2009-12-30 0.0
2009-12-31 0.0
2010-01-04 0.0
2010-01-05 0.0
2010-01-06 0.0
2010-01-07 0.0
2010-01-08 0.0
2010-01-11 0.0
2010-01-12 0.0
2010-01-13 0.0
2010-01-14 0.0
2010-01-15 0.0
2010-01-18 0.0
2010-01-19 0.0
2010-01-20 0.0
2010-01-21 0.0
2010-01-22 0.0
2010-01-25 0.0
2010-01-26 0.0
2010-01-27 0.0
2010-01-28 0.0
2010-01-29 0.0
2010-02-01 0.0
2010-02-02 0.0
I would like to set the value to 1 based on the following rule:
If the constant is set to 9, this means the 9th of each month. Because 2010-01-09 doesn't exist in the series, I would like to set the next date that does exist to 1, which is 2010-01-11 above.
I have tried creating two series, one (series1) with day < 9 set to 1 and one (series2) with day > 9 set to 1, and then computing series1.shift(1) * series2.
This works in the middle of the month, but not if the day is set to 1, because the last date of the previous month is set to 0 in series1.
Assume your time series is s, with a DatetimeIndex.
I want to create a groupby object, grouped by month, that flags the index values whose day is greater than or equal to 9.
# pd.TimeGrouper has been removed from pandas; pd.Grouper(freq='M') replaces it (use 'ME' on pandas >= 2.2)
g = s.index.to_series().dt.day.ge(9).groupby(pd.Grouper(freq='M'))
Then I'll check that each month has at least one day >= 9 and grab the first among them. Those dates are assigned the value 1.
s.loc[g.idxmax()[g.any()]] = 1
s
date
2009-12-23 1.0
2009-12-28 0.0
2009-12-29 0.0
2009-12-30 0.0
2009-12-31 0.0
2010-01-04 0.0
2010-01-05 0.0
2010-01-06 0.0
2010-01-07 0.0
2010-01-08 0.0
2010-01-11 1.0
2010-01-12 0.0
2010-01-13 0.0
2010-01-14 0.0
2010-01-15 0.0
2010-01-18 0.0
2010-01-19 0.0
2010-01-20 0.0
2010-01-21 0.0
2010-01-22 0.0
2010-01-25 0.0
2010-01-26 0.0
2010-01-27 0.0
2010-01-28 0.0
2010-01-29 0.0
2010-02-01 0.0
2010-02-02 0.0
Name: val, dtype: float64
Note that 2009-12-23 also was assigned a 1 as it satisfies this requirement as well.
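The same rule can be sketched without `idxmax` by keeping only the eligible dates and taking the earliest one per calendar month. This is an alternative under stated assumptions, not the method above: the helper name `mark_first_on_or_after` is made up, and the dates are an abbreviated subset of the series in the question.

```python
import pandas as pd


def mark_first_on_or_after(s, day):
    """Set to 1 the first available date in each month whose day is >= `day`."""
    out = s.copy()
    dates = s.index.to_series()
    eligible = dates[dates.dt.day >= day]
    # Earliest eligible date per calendar month; months with none are absent.
    first = eligible.groupby(eligible.dt.to_period('M')).min()
    out.loc[pd.DatetimeIndex(first)] = 1.0
    return out


# Abbreviated subset of the dates shown in the question.
dates = pd.to_datetime(['2009-12-23', '2009-12-28', '2009-12-31',
                        '2010-01-04', '2010-01-08', '2010-01-11',
                        '2010-01-12', '2010-01-29', '2010-02-01',
                        '2010-02-02'])
s = pd.Series(0.0, index=dates, name='val')

result = mark_first_on_or_after(s, 9)
print(result[result == 1.0])
```

Because the grouping is done per month on the dates themselves, a constant of 1 simply marks the first available date of every month, which is the case the shift-based attempt in the question could not handle.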
I have a dataframe df with this structure :
TIMESTAMP probab-activ1 probab-activ3 probab-activ5
2015-07-31 23:00:00 90.0 90.0 90.0
2015-07-31 23:10:00 0.0 0.0 0.0
2015-07-31 23:20:00 0.0 0.0 0.0
2015-07-31 23:30:00 0.0 0.0 0.0
2015-07-31 23:40:00 0.0 0.0 0.0
...
2015-10-31 23:20:00 0.0 0.0 0.0
2015-10-31 23:30:00 0.0 0.0 0.0
2015-10-31 23:40:00 0.0 0.0 0.0
I need to calculate, for each day of the week (Monday, Tuesday, ..., Sunday), the mean of the probabilities (probab-activ1, probab-activ3 and probab-activ5) over the last 2 months.
Any idea to solve this problem?
Thank you in advance
You can use the datetime module and convert your timestamp to a format that is useful for your purpose. For example, you could do:
import datetime
timestamp = '2015-07-31 23:00:00'
day_of_week = datetime.datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S').strftime('%a')
day_of_week
'Fri'
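If the timestamps are already a pandas DatetimeIndex, the weekday can be read from the index directly and used as a grouping key. The following is a rough sketch with made-up values (two weeks of daily rows instead of the 10-minute data in the question), assuming TIMESTAMP is the index; `dayofweek` numbers the days from 0 (Monday) to 6 (Sunday).

```python
import numpy as np
import pandas as pd

# Made-up sample: 14 daily rows starting on Monday 2015-08-03.
idx = pd.date_range('2015-08-03', periods=14, freq='D')
df = pd.DataFrame({'probab-activ1': np.arange(14, dtype=float),
                   'probab-activ3': 10.0,
                   'probab-activ5': 0.0}, index=idx)
df.index.name = 'TIMESTAMP'

# Keep only the last two months relative to the newest timestamp.
cutoff = df.index.max() - pd.DateOffset(months=2)
recent = df[df.index >= cutoff]

# Mean of every probability column per weekday (0 = Monday ... 6 = Sunday).
weekday_mean = recent.groupby(recent.index.dayofweek).mean()
print(weekday_mean)
```

Replacing `recent.index.dayofweek` with `recent.index.day_name()` would give rows labelled 'Monday', 'Tuesday' and so on, at the cost of alphabetical ordering.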
I would like to plot a bar graph from a pandas DataFrame in which each column has only a few nonzero entries. The plot is produced, but it has the wrong y-axis limits, and the x ticks are so closely spaced that the graph is useless. I would like to change the tick step to about one per week and display only the day, month and year. I have the following DataFrame:
Observed WRF
2014-06-28 12:00:00 0.0 0.0
2014-06-28 13:00:00 0.0 0.0
2014-06-28 14:00:00 0.0 0.0
2014-06-28 15:00:00 0.0 0.0
2014-06-28 16:00:00 0.0 0.0
2014-06-28 17:00:00 0.0 0.0
2014-06-28 18:00:00 0.0 0.0
2014-06-28 19:00:00 0.0 0.0
2014-06-28 20:00:00 0.0 0.0
2014-06-28 21:00:00 0.0 0.0
2014-06-28 22:00:00 0.0 0.0
2014-06-28 23:00:00 0.0 0.0
2014-06-29 00:00:00 0.0 0.0
2014-06-29 01:00:00 0.0 0.0
2014-06-29 02:00:00 0.0 0.0
2014-06-29 03:00:00 0.0 0.0
2014-06-29 04:00:00 0.0 0.0
2014-06-29 05:00:00 0.0 0.0
2014-06-29 06:00:00 0.0 0.0
2014-06-29 07:00:00 0.0 0.0
2014-06-29 08:00:00 0.0 0.0
2014-06-29 09:00:00 0.0 0.0
2014-06-29 10:00:00 0.0 0.0
2014-06-29 11:00:00 0.0 0.0
2014-06-29 12:00:00 0.0 0.0
2014-06-29 13:00:00 0.0 0.0
2014-06-29 14:00:00 0.0 0.0
2014-06-29 15:00:00 0.0 0.0
2014-06-29 16:00:00 0.0 0.0
2014-06-29 17:00:00 0.0 0.0
... ...
2014-07-04 02:00:00 0.0002 0.0
2014-07-04 03:00:00 0.2466 0.0
2014-07-04 04:00:00 0.7103 0.0
2014-07-04 05:00:00 0.9158 1.93521e-13
2014-07-04 06:00:00 0.6583 0.0
2014-07-04 07:00:00 0.3915 0.0
2014-07-04 08:00:00 0.1249 0.0
2014-07-04 09:00:00 0.0 0.0
... ...
2014-08-30 07:00:00 0.0 0.0
2014-08-30 08:00:00 0.0 0.0
2014-08-30 09:00:00 0.0 0.0
2014-08-30 10:00:00 0.0 0.0
2014-08-30 11:00:00 0.0 0.0
2014-08-30 12:00:00 0.0 0.0
2014-08-30 13:00:00 0.0 0.0
2014-08-30 14:00:00 0.0 0.0
2014-08-30 15:00:00 0.0 0.0
2014-08-30 16:00:00 0.0 0.0
2014-08-30 17:00:00 0.0 0.0
2014-08-30 18:00:00 0.0 0.0
2014-08-30 19:00:00 0.0 0.0
2014-08-30 20:00:00 0.0 0.0
2014-08-30 21:00:00 0.0 0.0
2014-08-30 22:00:00 0.0 0.0
2014-08-30 23:00:00 0.0 0.0
2014-08-31 00:00:00 0.0 0.0
2014-08-31 01:00:00 0.0 0.0
2014-08-31 02:00:00 0.0 0.0
2014-08-31 03:00:00 0.0 0.0
2014-08-31 04:00:00 0.0 0.0
2014-08-31 05:00:00 0.0 0.0
2014-08-31 06:00:00 0.0 0.0
2014-08-31 07:00:00 0.0 0.0
2014-08-31 08:00:00 0.0 0.0
2014-08-31 09:00:00 0.0 0.0
2014-08-31 10:00:00 0.0 0.0
2014-08-31 11:00:00 0.0 0.0
2014-08-31 12:00:00 0.0 0.0
And the following code to plot it:
df4.plot(kind='bar',edgecolor='none',figsize=(16,8),linewidth=2, color=((1,0.502,0),'black'))
plt.legend(prop={'size':16})
plt.subplots_adjust(left=.1, right=0.9, top=0.9, bottom=.1)
plt.title('Five Day WRF Model Comparison Near %.2f,%.2f' %(lat,lon),fontsize=24)
plt.ylabel('Hourly Accumulated Precipitation [mm]',fontsize=18,color='black')
ax4=plt.gca()
maxs4=df4.max()
ax4.set_ylim([0, maxs4.max()])
ax4.xaxis_date()
ax4.xaxis.set_label_coords(0.5, -0.05)
plt.xlabel('Time',fontsize=18,color='black')
plt.show()
The y-axis starts at 0, but continues to about double the maximum value of the y-limit. The x-axis counts by hours, which is what I separated the data by, so that makes sense. However, it is not a helpful display.
Look at this code:
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
# Sample data
df_origin = pd.DataFrame(pd.date_range(datetime(2014,6,28,12,0,0),
                         datetime(2014,8,30,12,0,0), freq='1h'), columns=['Valid Time'])
df_origin = df_origin.set_index('Valid Time')
df_origin['Precipitation'] = np.random.uniform(low=0., high=10., size=(len(df_origin.index)))
# positional slices need .iloc because the index is a DatetimeIndex
col = df_origin.columns.get_loc('Precipitation')
df_origin.iloc[20:100, col] = 0.
df_origin.iloc[168:168*2, col] = 0. # second week has to be dry
# Plotting
df_origin.plot(y='Precipitation',kind='bar',edgecolor='none',figsize=(16,8),linewidth=2, color=((1,0.502,0)))
plt.legend(prop={'size':16})
plt.subplots_adjust(left=.1, right=0.9, top=0.9, bottom=.1)
plt.title('Precipitation (WRF Model)',fontsize=24)
plt.ylabel('Hourly Accumulated Precipitation [mm]',fontsize=18,color='black')
ax = plt.gca()
plt.gcf().autofmt_xdate()
# skip ticks for X axis
ax.set_xticklabels([dt.strftime('%Y-%m-%d') for dt in df_origin.index])
for i, tick in enumerate(ax.xaxis.get_major_ticks()):
    if (i % (24*7) != 0): # 24 hours * 7 days = 1 week
        tick.set_visible(False)
plt.xlabel('Time',fontsize=18,color='black')
plt.show()
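Instead of creating a tick for every bar and hiding most of them, the weekly tick positions and labels can be computed up front and handed to the axes. This is a sketch under assumptions: the helper names `weekly_ticks` and `plot_with_weekly_ticks` are made up, the bars are drawn at integer positions with `ax.bar`, and the sample values are random. It also sets the y-limit explicitly to the data maximum, which is what the question asked for.

```python
import numpy as np
import pandas as pd


def weekly_ticks(index, step=24 * 7, fmt='%Y-%m-%d'):
    """Return bar positions and date labels for every `step`-th timestamp."""
    positions = list(range(0, len(index), step))
    labels = [index[i].strftime(fmt) for i in positions]
    return positions, labels


def plot_with_weekly_ticks(series, step=24 * 7):
    """Bar plot of an hourly series with one x tick per week."""
    import matplotlib.pyplot as plt  # imported here so weekly_ticks works without it

    fig, ax = plt.subplots(figsize=(16, 8))
    ax.bar(range(len(series)), series.to_numpy(), width=1.0, color=(1, 0.502, 0))
    positions, labels = weekly_ticks(series.index, step)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45, ha='right')
    ax.set_ylim(0, series.max())  # stop the y-axis at the largest value
    ax.set_ylabel('Hourly Accumulated Precipitation [mm]')
    return fig, ax


# Hourly sample index covering the same period as the question's data.
index = pd.date_range('2014-06-28 12:00', '2014-08-30 12:00', freq='h')
precip = pd.Series(np.random.uniform(0.0, 10.0, size=len(index)), index=index)

positions, labels = weekly_ticks(precip.index)
print(positions[:3], labels[:3])
```

Calling `plot_with_weekly_ticks(precip)` followed by `plt.show()` draws the figure; changing `fmt` to something like '%d %b %Y' gives a day, month and year label in a different order.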