Input:
#Fixed-mono-cell temperature
parameters = pvlib.temperature.TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_glass'] #to extract specfic parameter
cell_temperature_mono_fixed = pvlib.temperature.sapm_cell(effective_irrad_mono_fixed,
df['T_a'],
df['W_s'],
**parameters)
cell_temperature_mono_fixed
Output:
2005-01-01 01:00:00 NaN
2005-01-01 02:00:00 NaN
2005-01-01 03:00:00 NaN
2005-01-01 04:00:00 NaN
2005-01-01 05:00:00 NaN
..
8755 NaN
8756 NaN
8757 NaN
8758 NaN
8759 NaN
Length: 17520, dtype: float64
cell_temperature_mono_fixed.plot
Output:
/Users/charlielinck/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py:4024: RuntimeWarning: '
Extra data information:
df: dataframe
date_time Sun_Az Sun_alt GHI DHI DNI T_a W_s
0 2005-01-01 01:00:00 17.9 90.0 0.0 0.0 0.0 15.5 13.3
1 2005-01-01 02:00:00 54.8 90.0 0.0 0.0 0.0 17.0 14.5
2 2005-01-01 03:00:00 73.7 90.0 0.0 0.0 0.0 16.7 14.0
3 2005-01-01 04:00:00 85.7 90.0 0.0 0.0 0.0 16.7 14.2
4 2005-01-01 05:00:00 94.9 90.0 0.0 0.0 0.0 16.7 14.1
5 2005-01-01 06:00:00 103.5 90.0 0.0 0.0 0.0 16.6 14.3
6 2005-01-01 07:00:00 111.6 90.0 0.0 0.0 0.0 16.5 13.8
7 2005-01-01 08:00:00 120.5 89.6 1.0 1.0 0.0 16.6 16.0
8 2005-01-01 09:00:00 130.5 79.9 27.0 27.0 0.0 16.8 16.5
9 2005-01-01 10:00:00 141.8 71.7 55.0 55.0 0.0 16.9 16.9
10 2005-01-01 11:00:00 154.9 65.5 83.0 83.0 0.0 17.0 17.2
11 2005-01-01 12:00:00 169.8 61.9 114.0 114.0 0.0 17.4 17.9
12 2005-01-01 13:00:00 185.2 61.4 110.0 110.0 0.0 17.5 18.0
13 2005-01-01 14:00:00 200.4 64.0 94.0 94.0 0.0 17.5 17.8
14 2005-01-01 15:00:00 214.3 69.5 70.0 70.0 0.0 17.5 17.6
15 2005-01-01 16:00:00 226.3 77.2 38.0 38.0 0.0 17.2 17.0
16 2005-01-01 17:00:00 236.5 86.4 4.0 4.0 0.0 16.7 16.3
17 2005-01-01 18:00:00 245.5 90.0 0.0 0.0 0.0 16.0 14.5
18 2005-01-01 19:00:00 254.2 90.0 0.0 0.0 0.0 14.9 13.0
19 2005-01-01 20:00:00 262.3 90.0 0.0 0.0 0.0 16.0 14.1
20 2005-01-01 21:00:00 271.3 90.0 0.0 0.0 0.0 15.1 13.3
21 2005-01-01 22:00:00 282.1 90.0 0.0 0.0 0.0 15.5 13.2
22 2005-01-01 23:00:00 298.1 90.0 0.0 0.0 0.0 15.6 13.0
23 2005-01-02 00:00:00 327.5 90.0 0.0 0.0 0.0 15.8 13.1
df['T_a'] is temperature data,
df['W_s'] is windspeed data
effective_irrad_mono_fixed.head(24)
date_time
2005-01-01 01:00:00 0.000000
2005-01-01 02:00:00 0.000000
2005-01-01 03:00:00 0.000000
2005-01-01 04:00:00 0.000000
2005-01-01 05:00:00 0.000000
2005-01-01 06:00:00 0.000000
2005-01-01 07:00:00 0.000000
2005-01-01 08:00:00 0.936690
2005-01-01 09:00:00 25.168996
2005-01-01 10:00:00 51.165091
2005-01-01 11:00:00 77.354266
2005-01-01 12:00:00 108.002486
2005-01-01 13:00:00 103.809820
2005-01-01 14:00:00 88.138705
2005-01-01 15:00:00 65.051870
2005-01-01 16:00:00 35.390518
2005-01-01 17:00:00 3.742581
2005-01-01 18:00:00 0.000000
2005-01-01 19:00:00 0.000000
2005-01-01 20:00:00 0.000000
2005-01-01 21:00:00 0.000000
2005-01-01 22:00:00 0.000000
2005-01-01 23:00:00 0.000000
2005-01-02 00:00:00 0.000000
Question: I don't understand that if I simply run the function I only get NaN values, might it have something to with the timestamp.
I believe this also results in the RunTimeWarning when I want to plot the function.
This is not really a pvlib issue, more a pandas issue. The problem is that your input time series objects are not on a consistent index: the irradiance input has a pandas.DatetimeIndex while the temperature and wind speed inputs have pandas.RangeIndex (see the index printed out from your df). Math operations on Series are done by aligning index elements and substituting NaN where things don't line up. For example see how only the shared index elements correspond to non-NaN values here:
In [46]: a = pd.Series([1, 2, 3], index=[1, 2, 3])
...: b = pd.Series([2, 3, 4], index=[2, 3, 4])
...: a*b
Out[46]:
1 NaN
2 4.0
3 9.0
4 NaN
dtype: float64
If you examine the index of your cell_temperature_mono_fixed, you'll see it has both timestamps (from the irradiance input) and integers (from the other two), so it's taking the union of the indexes but only filling in values for the intersection (which is empty in this case).
So to fix your problem, you should make sure all the inputs are on a consistent index. The easiest way to do that is probably at the dataframe level, i.e. df = df.set_index('date_time').
Related
I have rain and temp data sourced from Environment Canada but it contains some NaN values.
start_date = '2015-12-31'
end_date = '2021-05-26'
mask = (data['date'] > start_date) & (data['date'] <= end_date)
df = data.loc[mask]
print(df)
date time rain_gauge_value temperature
8760 2016-01-01 00:00:00 0.0 -2.9
8761 2016-01-01 01:00:00 0.0 -3.4
8762 2016-01-01 02:00:00 0.0 -3.6
8763 2016-01-01 03:00:00 0.0 -3.6
8764 2016-01-01 04:00:00 0.0 -4.0
... ... ... ... ...
56107 2021-05-26 19:00:00 0.0 22.0
56108 2021-05-26 20:00:00 0.0 21.5
56109 2021-05-26 21:00:00 0.0 21.1
56110 2021-05-26 22:00:00 0.0 19.5
56111 2021-05-26 23:00:00 0.0 18.5
[47352 rows x 4 columns]
Find the rows with a NaN value
null = df[df['rain_gauge_value'].isnull()]
print(null)
date time rain_gauge_value temperature
11028 2016-04-04 12:00:00 NaN -6.9
11986 2016-05-14 10:00:00 NaN NaN
11987 2016-05-14 11:00:00 NaN NaN
11988 2016-05-14 12:00:00 NaN NaN
11989 2016-05-14 13:00:00 NaN NaN
... ... ... ... ...
49024 2020-08-04 16:00:00 NaN NaN
49025 2020-08-04 17:00:00 NaN NaN
50505 2020-10-05 09:00:00 NaN 11.3
54083 2021-03-03 11:00:00 NaN -5.1
54084 2021-03-03 12:00:00 NaN -4.5
[6346 rows x 4 columns]
This is my dataframe I want to use to fill the NaN values
print(rain_df)
date time rain_gauge_value temperature
0 2015-12-28 00:00:00 0.1 -6.0
1 2015-12-28 01:00:00 0.0 -7.0
2 2015-12-28 02:00:00 0.0 -8.0
3 2015-12-28 03:00:00 0.0 -8.0
4 2015-12-28 04:00:00 0.0 -7.0
... ... ... ... ...
48043 2021-06-19 19:00:00 0.6 20.0
48044 2021-06-19 20:00:00 0.6 19.0
48045 2021-06-19 21:00:00 0.8 18.0
48046 2021-06-19 22:00:00 0.4 17.0
48047 2021-06-19 23:00:00 0.0 16.0
[48048 rows x 4 columns]
But when I use the fillna() method, some of the values don't get substitued.
null = null.fillna(rain_df)
null = null[null['rain_gauge_value'].isnull()]
print(null)
date time rain_gauge_value temperature
48057 2020-06-25 09:00:00 NaN NaN
48058 2020-06-25 10:00:00 NaN NaN
48059 2020-06-25 11:00:00 NaN NaN
48060 2020-06-25 12:00:00 NaN NaN
48586 2020-07-17 10:00:00 NaN NaN
48587 2020-07-17 11:00:00 NaN NaN
48588 2020-07-17 12:00:00 NaN NaN
49022 2020-08-04 14:00:00 NaN NaN
49023 2020-08-04 15:00:00 NaN NaN
49024 2020-08-04 16:00:00 NaN NaN
49025 2020-08-04 17:00:00 NaN NaN
50505 2020-10-05 09:00:00 NaN 11.3
54083 2021-03-03 11:00:00 NaN -5.1
54084 2021-03-03 12:00:00 NaN -4.5
How can I resolve this issue?
when fillna, you probably want a method, like fill using previous/next value, mean of column etc, what we can do is like this
nulls_index = df['rain_gauge_value'].isnull()
df = df.fillna(method='ffill') # use ffill as example
nulls_after_fill = df[nulls_index]
take a look at:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
You need to inform pandas how you want to patch. It may be obvious to you want to use the "patch" dataframe's values when the date and times line up, but it won't be obvious to pandas. see my dummy example:
raw = pd.DataFrame(dict(date=[date(2015,12,28), date(2015,12,28)], time= [time(0,0,0),time(0,0,1)],temp=[1.,np.nan],rain=[4.,np.nan]))
raw
date time temp rain
0 2015-12-28 00:00:00 1.0 4.0
1 2015-12-28 00:00:01 NaN NaN
patch = pd.DataFrame(dict(date=[date(2015,12,28), date(2015,12,28)], time=[time(0,0,0),time(0,0,1)],temp=[5.,5.],rain=[10.,10.]))
patch
date time temp rain
0 2015-12-28 00:00:00 5.0 10.0
1 2015-12-28 00:00:01 5.0 10.0
you need the indexes of raw and patch to correspond to how you want to patch the raw data (in this case, you want to patch based on date and time)
raw.set_index(['date','time']).fillna(patch.set_index(['date','time']))
returns
temp rain
date time
2015-12-28 00:00:00 1.0 4.0
00:00:01 5.0 10.0
I have a timeseries with data related to the irradiance of the sun. I have data for every hour during a year, but every month has data from a diferent year. For example, the data taken in March can be from 2012 and the data taken in January can be from 2014.
T2m RH G(h) Gb(n) Gd(h) IR(h) WS10m WD10m SP Hour Month
time(UTC)
2012-01-01 00:00:00 16.00 81.66 0.0 -0.0 0.0 310.15 2.56 284.0 102252.0 0 1
2012-01-01 01:00:00 15.97 82.42 0.0 -0.0 0.0 310.61 2.49 281.0 102228.0 1 1
2012-01-01 02:00:00 15.93 83.18 0.0 -0.0 0.0 311.06 2.41 278.0 102205.0 2 1
2012-01-01 03:00:00 15.89 83.94 0.0 -0.0 0.0 311.52 2.34 281.0 102218.0 3 1
2012-01-01 04:00:00 15.85 84.70 0.0 -0.0 0.0 311.97 2.26 284.0 102232.0 4 1
... ... ... ... ... ... ... ... ... ... ... ...
2011-12-31 19:00:00 16.19 77.86 0.0 -0.0 0.0 307.88 2.94 301.0 102278.0 19 12
2011-12-31 20:00:00 16.15 78.62 0.0 -0.0 0.0 308.33 2.86 302.0 102295.0 20 12
2011-12-31 21:00:00 16.11 79.38 0.0 -0.0 0.0 308.79 2.79 297.0 102288.0 21 12
2011-12-31 22:00:00 16.08 80.14 0.0 -0.0 0.0 309.24 2.71 292.0 102282.0 22 12
2011-12-31 23:00:00 16.04 80.90 0.0 -0.0 0.0 309.70 2.64 287.0 102275.0 23 12
My question is: there is a way I can set all the data to a certain year?
For example, set all data to 2014
T2m RH G(h) Gb(n) Gd(h) IR(h) WS10m WD10m SP Hour Month
time(UTC)
2014-01-01 00:00:00 16.00 81.66 0.0 -0.0 0.0 310.15 2.56 284.0 102252.0 0 1
2014-01-01 01:00:00 15.97 82.42 0.0 -0.0 0.0 310.61 2.49 281.0 102228.0 1 1
2014-01-01 02:00:00 15.93 83.18 0.0 -0.0 0.0 311.06 2.41 278.0 102205.0 2 1
2014-01-01 03:00:00 15.89 83.94 0.0 -0.0 0.0 311.52 2.34 281.0 102218.0 3 1
2014-01-01 04:00:00 15.85 84.70 0.0 -0.0 0.0 311.97 2.26 284.0 102232.0 4 1
... ... ... ... ... ... ... ... ... ... ... ...
2014-12-31 19:00:00 16.19 77.86 0.0 -0.0 0.0 307.88 2.94 301.0 102278.0 19 12
2014-12-31 20:00:00 16.15 78.62 0.0 -0.0 0.0 308.33 2.86 302.0 102295.0 20 12
2014-12-31 21:00:00 16.11 79.38 0.0 -0.0 0.0 308.79 2.79 297.0 102288.0 21 12
2014-12-31 22:00:00 16.08 80.14 0.0 -0.0 0.0 309.24 2.71 292.0 102282.0 22 12
2014-12-31 23:00:00 16.04 80.90 0.0 -0.0 0.0 309.70 2.64 287.0 102275.0 23 12
Thanks in advance.
Use offsets.DateOffset with year (without s) for set same year in all DatetimeIndex:
rng = pd.date_range('2009-04-03', periods=10, freq='350D')
df = pd.DataFrame({ 'a': range(10)}, rng)
print (df)
a
2009-04-03 0
2010-03-19 1
2011-03-04 2
2012-02-17 3
2013-02-01 4
2014-01-17 5
2015-01-02 6
2015-12-18 7
2016-12-02 8
2017-11-17 9
df.index += pd.offsets.DateOffset(year=2014)
print (df)
a
2014-04-03 0
2014-03-19 1
2014-03-04 2
2014-02-17 3
2014-02-01 4
2014-01-17 5
2014-01-02 6
2014-12-18 7
2014-12-02 8
2014-11-17 9
Another idea with Index.map and replace:
df.index = df.index.map(lambda x: x.replace(year=2014))
I have a df as follows
dates winter summer rest Final
2020-01-01 00:15:00 65.5 71.5 73.0 NaN
2020-01-01 00:30:00 62.6 69.0 70.1 NaN
2020-01-01 00:45:00 59.6 66.3 67.1 NaN
2020-01-01 01:00:00 57.0 63.5 64.5 NaN
2020-01-01 01:15:00 54.8 60.9 62.3 NaN
2020-01-01 01:30:00 53.1 58.6 60.6 NaN
2020-01-01 01:45:00 51.7 56.6 59.2 NaN
2020-01-01 02:00:00 50.5 55.1 57.9 NaN
2020-01-01 02:15:00 49.4 54.2 56.7 NaN
2020-01-01 02:30:00 48.5 53.7 55.6 NaN
2020-01-01 02:45:00 47.9 53.4 54.7 NaN
2020-01-01 03:00:00 47.7 53.3 54.2 NaN
2020-01-01 03:15:00 47.9 53.1 54.1 NaN
2020-01-01 03:30:00 48.7 53.2 54.6 NaN
2020-01-01 03:45:00 50.2 54.1 55.8 NaN
2020-01-01 04:00:00 52.3 56.1 57.9 NaN
2020-04-28 12:30:00 225.1 200.0 209.8 NaN
2020-04-28 12:45:00 215.7 193.8 201.9 NaN
2020-04-28 13:00:00 205.6 186.9 193.4 NaN
2020-04-28 13:15:00 195.7 179.9 185.0 NaN
2020-04-28 13:30:00 186.7 173.4 177.4 NaN
2020-04-28 13:45:00 179.2 168.1 170.9 NaN
2020-04-28 14:00:00 173.8 164.4 166.3 NaN
2020-04-28 14:15:00 171.0 163.0 163.9 NaN
2020-04-28 14:30:00 170.7 163.5 163.6 NaN
2020-12-31 21:15:00 88.5 90.2 89.2 NaN
2020-12-31 21:30:00 85.2 88.5 87.2 NaN
2020-12-31 21:45:00 82.1 86.3 85.0 NaN
2020-12-31 22:00:00 79.4 84.1 83.2 NaN
2020-12-31 22:15:00 77.6 82.4 82.1 NaN
2020-12-31 22:30:00 76.4 81.2 81.7 NaN
2020-12-31 22:45:00 75.6 80.3 81.6 NaN
2020-12-31 23:00:00 74.7 79.4 81.3 NaN
2020-12-31 23:15:00 73.7 78.4 80.6 NaN
2020-12-31 23:30:00 72.3 77.2 79.5 NaN
2020-12-31 23:45:00 70.5 75.7 77.9 NaN
2021-01-01 00:00:00 68.2 73.8 75.7 NaN
The dates column has dates starting from 2020-01-01 00:15:00 till 2021-01-01 00:00:00 split at every 15 mins.
I also have the following date range conditions:
Winter: 01.11 - 20.03
Summer: 15.05 - 14.09
Rest: 21.03 - 14.05 & 15.09 - 31.10
What I want to do is to create a new column named season that checks every date in the dates column and assigns winter if the date is in Winter range, summer if it is in Summer range and rest if it is the Rest range.
Then, based on the value in the season column, the Final column must be filled. If the value in season column is 'winter', then the values from winter column must be placed, if the value in season column is 'summer', then the values from summer column must be placed and so on.
How can this be done?
Idea is normalize datetimes for same year, then filter by Series.between and set new column by numpy.select:
d = pd.to_datetime(df['dates'].dt.strftime('%m-%d-2020'))
m1 = d.between('2020-11-01','2020-12-31') | d.between('2020-01-01','2020-03-20')
m2 = d.between('2020-05-15','2020-09-14')
df['Final'] = np.select([m1, m2], ['Winter','Summer'], default='Rest')
print (df)
dates winter summer rest Final
0 2020-01-01 00:15:00 65.5 71.5 73.0 Winter
1 2020-06-15 00:30:00 62.6 69.0 70.1 Summer
2 2020-12-25 00:45:00 59.6 66.3 67.1 Winter
3 2020-10-10 01:00:00 57.0 63.5 64.5 Rest
I have a dataframe that has one column and a timestamp index including anywhere from 2 to 7 days:
kWh
Timestamp
2017-07-08 06:00:00 0.00
2017-07-08 07:00:00 752.75
2017-07-08 08:00:00 1390.20
2017-07-08 09:00:00 2027.65
2017-07-08 10:00:00 2447.27
.... ....
2017-07-12 20:00:00 167.64
2017-07-12 21:00:00 0.00
2017-07-12 22:00:00 0.00
2017-07-12 23:00:00 0.00
I would like to transpose the kWh column so that one day's worth of values (hourly granularity, so 24 values/day) fill up a row. And the next row is the next day of values and so on (so five days of forecasted data has five rows with 24 elements each).
Because my query of the data comes in the vertical format, and my regression and subsequent analysis already occurs in the vertical format, I don't want to change the process too much and am hoping there is a simpler way. I have tried giving a multi-index with df.index.hour and then using unstack(), but I get a huge dataframe with NaN values everywhere.
Is there an elegant way to do this?
If we start from a frame like
In [25]: df = pd.DataFrame({"kWh": [1]}, index=pd.date_range("2017-07-08",
"2017-07-12", freq="1H").rename("Timestamp")).cumsum()
In [26]: df.head()
Out[26]:
kWh
Timestamp
2017-07-08 00:00:00 1
2017-07-08 01:00:00 2
2017-07-08 02:00:00 3
2017-07-08 03:00:00 4
2017-07-08 04:00:00 5
we can make date and hour columns and then pivot:
In [27]: df["date"] = df.index.date
In [28]: df["hour"] = df.index.hour
In [29]: df.pivot(index="date", columns="hour", values="kWh")
Out[29]:
hour 0 1 2 3 4 5 6 7 8 9 ... \
date ...
2017-07-08 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
2017-07-09 25.0 26.0 27.0 28.0 29.0 30.0 31.0 32.0 33.0 34.0 ...
2017-07-10 49.0 50.0 51.0 52.0 53.0 54.0 55.0 56.0 57.0 58.0 ...
2017-07-11 73.0 74.0 75.0 76.0 77.0 78.0 79.0 80.0 81.0 82.0 ...
2017-07-12 97.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
hour 14 15 16 17 18 19 20 21 22 23
date
2017-07-08 15.0 16.0 17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0
2017-07-09 39.0 40.0 41.0 42.0 43.0 44.0 45.0 46.0 47.0 48.0
2017-07-10 63.0 64.0 65.0 66.0 67.0 68.0 69.0 70.0 71.0 72.0
2017-07-11 87.0 88.0 89.0 90.0 91.0 92.0 93.0 94.0 95.0 96.0
2017-07-12 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
[5 rows x 24 columns]
Not sure why your MultiIndex code doesn't work.
I'm assuming your MultiIndex code is something along the lines, which gives the same output as the pivot:
In []
df = pd.DataFrame({"kWh": [1]}, index=pd.date_range("2017-07-08",
"2017-07-12", freq="1H").rename("Timestamp")).cumsum()
df.index = pd.MultiIndex.from_arrays([df.index.date, df.index.hour], names=['Date','Hour'])
df.unstack()
Out[]:
kWh ... \
Hour 0 1 2 3 4 5 6 7 8 9 ...
Date ...
2017-07-08 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
2017-07-09 25.0 26.0 27.0 28.0 29.0 30.0 31.0 32.0 33.0 34.0 ...
2017-07-10 49.0 50.0 51.0 52.0 53.0 54.0 55.0 56.0 57.0 58.0 ...
2017-07-11 73.0 74.0 75.0 76.0 77.0 78.0 79.0 80.0 81.0 82.0 ...
2017-07-12 97.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
Hour 14 15 16 17 18 19 20 21 22 23
Date
2017-07-08 15.0 16.0 17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0
2017-07-09 39.0 40.0 41.0 42.0 43.0 44.0 45.0 46.0 47.0 48.0
2017-07-10 63.0 64.0 65.0 66.0 67.0 68.0 69.0 70.0 71.0 72.0
2017-07-11 87.0 88.0 89.0 90.0 91.0 92.0 93.0 94.0 95.0 96.0
2017-07-12 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
[5 rows x 24 columns]
I would like to plot a bar graph that has only a few entries of data in each column of a pandas DataFrame with a bar graph. This is successful, but not only does it have the wrong y-axis limits, it also makes the x ticks very closely spaced so that the graph is useless. I would like to change the step rate to be about every week or so and only display day, month and year. I have the following DataFrame:
Observed WRF
2014-06-28 12:00:00 0.0 0.0
2014-06-28 13:00:00 0.0 0.0
2014-06-28 14:00:00 0.0 0.0
2014-06-28 15:00:00 0.0 0.0
2014-06-28 16:00:00 0.0 0.0
2014-06-28 17:00:00 0.0 0.0
2014-06-28 18:00:00 0.0 0.0
2014-06-28 19:00:00 0.0 0.0
2014-06-28 20:00:00 0.0 0.0
2014-06-28 21:00:00 0.0 0.0
2014-06-28 22:00:00 0.0 0.0
2014-06-28 23:00:00 0.0 0.0
2014-06-29 00:00:00 0.0 0.0
2014-06-29 01:00:00 0.0 0.0
2014-06-29 02:00:00 0.0 0.0
2014-06-29 03:00:00 0.0 0.0
2014-06-29 04:00:00 0.0 0.0
2014-06-29 05:00:00 0.0 0.0
2014-06-29 06:00:00 0.0 0.0
2014-06-29 07:00:00 0.0 0.0
2014-06-29 08:00:00 0.0 0.0
2014-06-29 09:00:00 0.0 0.0
2014-06-29 10:00:00 0.0 0.0
2014-06-29 11:00:00 0.0 0.0
2014-06-29 12:00:00 0.0 0.0
2014-06-29 13:00:00 0.0 0.0
2014-06-29 14:00:00 0.0 0.0
2014-06-29 15:00:00 0.0 0.0
2014-06-29 16:00:00 0.0 0.0
2014-06-29 17:00:00 0.0 0.0
... ...
2014-07-04 02:00:00 0.0002 0.0
2014-07-04 03:00:00 0.2466 0.0
2014-07-04 04:00:00 0.7103 0.0
2014-07-04 05:00:00 0.9158 1.93521e-13
2014-07-04 06:00:00 0.6583 0.0
2014-07-04 07:00:00 0.3915 0.0
2014-07-04 08:00:00 0.1249 0.0
2014-07-04 09:00:00 0.0 0.0
... ...
2014-08-30 07:00:00 0.0 0.0
2014-08-30 08:00:00 0.0 0.0
2014-08-30 09:00:00 0.0 0.0
2014-08-30 10:00:00 0.0 0.0
2014-08-30 11:00:00 0.0 0.0
2014-08-30 12:00:00 0.0 0.0
2014-08-30 13:00:00 0.0 0.0
2014-08-30 14:00:00 0.0 0.0
2014-08-30 15:00:00 0.0 0.0
2014-08-30 16:00:00 0.0 0.0
2014-08-30 17:00:00 0.0 0.0
2014-08-30 18:00:00 0.0 0.0
2014-08-30 19:00:00 0.0 0.0
2014-08-30 20:00:00 0.0 0.0
2014-08-30 21:00:00 0.0 0.0
2014-08-30 22:00:00 0.0 0.0
2014-08-30 23:00:00 0.0 0.0
2014-08-31 00:00:00 0.0 0.0
2014-08-31 01:00:00 0.0 0.0
2014-08-31 02:00:00 0.0 0.0
2014-08-31 03:00:00 0.0 0.0
2014-08-31 04:00:00 0.0 0.0
2014-08-31 05:00:00 0.0 0.0
2014-08-31 06:00:00 0.0 0.0
2014-08-31 07:00:00 0.0 0.0
2014-08-31 08:00:00 0.0 0.0
2014-08-31 09:00:00 0.0 0.0
2014-08-31 10:00:00 0.0 0.0
2014-08-31 11:00:00 0.0 0.0
2014-08-31 12:00:00 0.0 0.0
And the following code to plot it:
df4.plot(kind='bar',edgecolor='none',figsize=(16,8),linewidth=2, color=((1,0.502,0),'black'))
plt.legend(prop={'size':16})
plt.subplots_adjust(left=.1, right=0.9, top=0.9, bottom=.1)
plt.title('Five Day WRF Model Comparison Near %.2f,%.2f' %(lat,lon),fontsize=24)
plt.ylabel('Hourly Accumulated Precipitation [mm]',fontsize=18,color='black')
ax4=plt.gca()
maxs4=df4.max()
ax4.set_ylim([0, maxs4.max()])
ax4.xaxis_date()
ax4.xaxis.set_label_coords(0.5, -0.05)
plt.xlabel('Time',fontsize=18,color='black')
plt.show()
The y-axis starts at 0, but continues to about double the maximum value of the y-limit. The x-axis counts by hours, which is what I separated the data by, so that makes sense. However, it is not a helpful display.
Look at this code:
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pylab as plt
from matplotlib.dates import DateFormatter
# Sample data
df_origin = pd.DataFrame(pd.date_range(datetime(2014,6,28,12,0,0),
datetime(2014,8,30,12,0,0), freq='1H'), columns=['Valid Time'])
df_origin = df_origin .set_index('Valid Time')
df_origin ['Precipitation'] = np.random.uniform(low=0., high=10., size=(len(df_origin.index)))
df_origin .loc[20:100, 'Precipitation'] = 0.
df_origin .loc[168:168*2, 'Precipitation'] = 0. # second week has to be dry
# Plotting
df_origin.plot(y='Precipitation',kind='bar',edgecolor='none',figsize=(16,8),linewidth=2, color=((1,0.502,0)))
plt.legend(prop={'size':16})
plt.subplots_adjust(left=.1, right=0.9, top=0.9, bottom=.1)
plt.title('Precipitation (WRF Model)',fontsize=24)
plt.ylabel('Hourly Accumulated Precipitation [mm]',fontsize=18,color='black')
ax = plt.gca()
plt.gcf().autofmt_xdate()
# skip ticks for X axis
ax.set_xticklabels([dt.strftime('%Y-%m-%d') for dt in df_origin.index])
for i, tick in enumerate(ax.xaxis.get_major_ticks()):
if (i % (24*7) != 0): # 24 hours * 7 days = 1 week
tick.set_visible(False)
plt.xlabel('Time',fontsize=18,color='black')
plt.show()