I was wondering how one rotates the x-axis labels, something along the lines of:
theme(axis.text.x = element_text(angle = 90, hjust = 1))
in ggplot?
Thank you.
Plotnine is essentially a clone of ggplot2, so you can use almost exactly the same call.
Here's an example:
import pandas as pd
from datetime import datetime, timedelta
from plotnine import ggplot, geom_point, aes, theme, element_text
now = datetime.now()
ago_28days = now - timedelta(days=28)
delta = now - ago_28days
timestamps = [ago_28days + timedelta(days=i) for i in range(delta.days)]
df = pd.DataFrame(data={'timestamp': timestamps, 'value':list(range(28))})
(ggplot(df) +
 geom_point(aes('timestamp', 'value')) +
 theme(axis_text_x=element_text(rotation=90, hjust=1))
)
Starting from the code given here, I have developed another version that uses Matplotlib in place of Seaborn. The data are now plotted on several figures and subplots, so they are more readable, and I am closer to my goal: when the user hovers the cursor over a point, they get access to all of that point's information, in particular its datetime.
Here it is:
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import random
from datetime import datetime

# size of the database
n = 1000
nA = 4
nB = 9

no = np.arange(n)
date = np.random.randint(1e9, size=n).astype('datetime64[s]')
A = [''.join(['A', str(random.randint(1, nA))]) for j in range(n)]
B = [''.join(['B', str(random.randint(1, nB))]) for j in range(n)]
Epsilon1 = np.random.random_sample((n,))
Epsilon2 = np.random.random_sample((n,))
Epsilon3 = np.random.random_sample((n,))
data = pd.DataFrame({'no': no,
                     'Date': date,
                     'A': A,
                     'B': B,
                     'Epsilon1': Epsilon1,
                     'Epsilon2': Epsilon2,
                     'Epsilon3': Epsilon3})

def format_coord(x, y):
    # first attempt: treat the x coordinate as seconds since the Unix epoch
    string_x = datetime.utcfromtimestamp(x).strftime("%m/%d/%Y, %H:%M:%S")
    return 'x={}, y={:.4f}'.format(string_x, y)

def plot_Epsilon_matplotlib():
    for A in data['A'].sort_values().drop_duplicates().to_list():
        n_col = 2
        fig, axes = plt.subplots(np.ceil(nB / n_col).astype(int), n_col)
        for j, B in enumerate(data['B'].sort_values().drop_duplicates().to_list()):
            df = data.loc[(data['A'] == A) & (data['B'] == B)]
            df = df.sort_values("Date", ascending=True)
            axes.flatten()[j].plot(df["Date"], df['Epsilon1'], marker='x', c='b', label="Epsilon1")
            axes.flatten()[j].plot(df["Date"], df['Epsilon2'], marker='x', c='r', label="Epsilon2")
            axes.flatten()[j].plot(df["Date"], df['Epsilon3'], marker='x', c='g', label="Epsilon3")
            # use the custom formatter for the cursor position in the status bar
            axes.flatten()[j].format_coord = format_coord

if __name__ == '__main__':
    plot_Epsilon_matplotlib()
The goal is that, when the user hovers the cursor over a point, they get access to the full datetime of that data point.
I first tried changing the major formatter (as here):
axes.flatten()[j].xaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d %H:%M:%S'))
but then the x-axis tick labels are not readable (especially if the user zooms in on a subplot).
I then tried to define my own format_coord, as here. My first attempt is given in the full code above. The format of the datetime in the Matplotlib status bar is good, but the date stays in 1970!
After reading this discussion, I realized the problem comes down to converting NumPy datetime64 values to Python datetime. I then wrote this new version of format_coord (strongly inspired by this answer):
def format_coord_bis(x, y):
    dt64 = np.datetime64(datetime.utcfromtimestamp(x))
    unix_epoch = np.datetime64(0, 's')
    one_second = np.timedelta64(1, 's')
    seconds_since_epoch = (dt64 - unix_epoch) / one_second
    string_x = datetime.utcfromtimestamp(seconds_since_epoch).strftime("%m/%d/%Y, %H:%M:%S")
    return 'x={}, y={:.4f}'.format(string_x, y)
but the date given in the status bar remains 01/01/1970...
I found the solution in this answer. On a date axis, the x value passed to format_coord is a Matplotlib date number (days, not seconds since the Unix epoch), which is why datetime.utcfromtimestamp() kept returning dates in 1970; matplotlib.dates.num2date() performs the correct conversion.
The function format_coord() should be defined as follows:
def format_coord(x, y):
    # on a date axis, x is a Matplotlib date number; mdates is matplotlib.dates, imported above
    string_x = mdates.num2date(x).strftime('%Y-%m-%d %H:%M:%S')
    return 'x={}, y={:.4f}'.format(string_x, y)
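For completeness, here is a small self-contained sketch of my own (with made-up data, not taken from the question) showing the formatter attached to a date axis:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# made-up daily data
dates = pd.date_range('2021-01-01', periods=50, freq='D')
values = np.random.rand(50)

fig, ax = plt.subplots()
ax.plot(dates, values, marker='x')

def format_coord(x, y):
    # on a date axis the cursor x position arrives as a Matplotlib date number
    return 'x={}, y={:.4f}'.format(mdates.num2date(x).strftime('%Y-%m-%d %H:%M:%S'), y)

ax.format_coord = format_coord
plt.show()
Hovering over the axes now shows the full datetime in the status bar instead of a 1970 date.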
I am using WRF output data to plot a Skew-T; here is the code:
import wrf
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as np
import metpy.calc as mpcalc
from metpy.plots import SkewT
from metpy.units import units
wrfin = Dataset(r'wrfout_d02_2022-06-19_00_00_00')
lat_lon = [25.0803, 121.2183]
x_y = wrf.ll_to_xy(wrfin, lat_lon[0], lat_lon[1])
p1 = wrf.getvar(wrfin,"pressure",timeidx=0)
T1 = wrf.getvar(wrfin,"tc",timeidx=0)
Td1 = wrf.getvar(wrfin,"td",timeidx=0)
u1 = wrf.getvar(wrfin,"ua",timeidx=0)
v1 = wrf.getvar(wrfin,"va",timeidx=0)
p = p1[:,x_y[0],x_y[1]] * units.hPa
T = T1[:,x_y[0],x_y[1]] * units.degC
Td = Td1[:,x_y[0],x_y[1]] * units.degC
u = v1[:,x_y[0],x_y[1]] * units('m/s')
v = u1[:,x_y[0],x_y[1]] * units('m/s')
skew = SkewT()
skew.plot(p, T, 'r')
skew.plot(p, Td, 'g')
my_interval = np.arange(100, 1000, 50) * units('mbar')
ix = mpcalc.resample_nn_1d(p, my_interval)
skew.plot_barbs(p[ix], u[ix], v[ix])
skew.plot_dry_adiabats()
skew.plot_moist_adiabats()
skew.plot_mixing_lines()
skew.ax.set_ylim(1000, 100)
skew.ax.set_xlim(-60, 40)
skew.ax.set_xlabel('Temperature ($^\circ$C)')
skew.ax.set_ylabel('Pressure (hPa)')
plt.savefig('SkewT.png', bbox_inches='tight')
but when I run it, I get an error, in particular:
raise DimensionalityError(
pint.errors.DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'millibar' ([mass] / [length] / [time] ** 2)
It seems that mpcalc.resample_nn_1d doesn't work. How can I solve this?
Python version: 3.9.7
MetPy version: 0.12.0
My guess is that wrf.getvar() is returning NumPy masked arrays, which behave a little weird with the Pint Quantity instances (compared with normal arrays). I would recommend trying this syntax to "attach" units:
p = units.Quantity(p1[:,x_y[0],x_y[1]], 'hPa')
T = units.Quantity(T1[:,x_y[0],x_y[1]], 'degC')
Td = units.Quantity(Td1[:,x_y[0],x_y[1]], 'degC')
u = units.Quantity(v1[:,x_y[0],x_y[1]], 'm/s')
v = units.Quantity(u1[:,x_y[0],x_y[1]], 'm/s')
Also, a heads up that MetPy 0.12 is quite a bit out of date, and I would recommend updating to the latest version (currently 1.3.1) as soon as you can.
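For reference, a minimal sketch (my own, with a synthetic pressure profile standing in for the WRF column) of the resampling step once the data are proper Quantity arrays:
import numpy as np
import metpy.calc as mpcalc
from metpy.units import units

# synthetic pressure profile (made-up values, descending like a model column)
p = units.Quantity(np.linspace(1000., 100., 40), 'hPa')
my_interval = np.arange(100, 1000, 50) * units('mbar')

# with unit-tagged input, resample_nn_1d returns the indices nearest to each level
ix = mpcalc.resample_nn_1d(p, my_interval)
print(ix)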
I'm plotting the counts of a variable grouped by time as a heatmap. However, when including both hour and minute, the counts are quite low, so the resulting heatmap doesn't provide much insight. Is it possible to group the counts into bigger blocks of time? I'm hoping to test a few different periods (5 and 10 minutes).
I'm also hoping to plot time on the x-axis. Similar to the output attached.
import seaborn as sns
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import timedelta
start = datetime(1900,1,1,10,0,0)
end = datetime(1900,1,1,13,0,0)
seconds = (end - start).total_seconds()
step = timedelta(minutes = 1)
array = []
for i in range(0, int(seconds), int(step.total_seconds())):
    array.append(start + timedelta(seconds=i))
array = [i.strftime('%Y-%m-%d %H:%M:%S') for i in array]
df2 = pd.DataFrame(array).rename(columns = {0:'Time'})
df2['Count'] = np.random.uniform(0.0, 0.5, size = len(df2))
df2['Count'] = df2['Count'].round(1)
df2['Time'] = pd.to_datetime(df2['Time'])
df2['Hour'] = df2['Time'].dt.hour
df2['Min'] = df2['Time'].dt.minute
g = df2.groupby(['Hour','Min','Count'])
count_df = g['Count'].nunique().unstack()
count_df.fillna(0, inplace = True)
sns.heatmap(count_df)
To deal with such cases, I think the easiest approach is to downsample the data; it also makes it easy to change the bin size. The axis labels in the output graph will need to be adjusted (see the note after the code), but I recommend this method.
import seaborn as sns
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import timedelta
start = datetime(1900,1,1,10,0,0)
end = datetime(1900,1,1,13,0,0)
seconds = (end - start).total_seconds()
step = timedelta(minutes = 1)
array = []
for i in range(0, int(seconds), int(step.total_seconds())):
    array.append(start + timedelta(seconds=i))
array = [i.strftime('%Y-%m-%d %H:%M:%S') for i in array]
df2 = pd.DataFrame(array).rename(columns = {0:'Time'})
df2['Count'] = np.random.uniform(0.0, 0.5, size = len(df2))
df2['Count'] = df2['Count'].round(1)
df2['Time'] = pd.to_datetime(df2['Time'])
df2['Hour'] = df2['Time'].dt.hour
df2['Min'] = df2['Time'].dt.minute
df2.set_index('Time', inplace=True)
count_df = df2.resample('10min')['Count'].value_counts().unstack()
count_df.fillna(0, inplace = True)
sns.heatmap(count_df.T)
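As for the axis labels mentioned above: the resampled index comes back as full timestamps, so one way to tidy them (a small sketch, reusing count_df from the code above) is to format the index as HH:MM before plotting:
# format the 10-minute bin starts as HH:MM so the heatmap labels stay readable
count_df.index = count_df.index.strftime('%H:%M')
sns.heatmap(count_df.T)
With the transpose, the formatted times end up on the x-axis, as requested in the question.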
Another way you could achieve this is to create a column of group numbers in which each number is repeated for the number of minutes in a block.
For example:
minutes = 3
x = [0, 1, 2]
np.repeat(x, repeats=minutes, axis=0)
# array([0, 0, 0, 1, 1, 1, 2, 2, 2])
and then group your data using this column.
So your code would look like:
...
minutes = 5
x = [i for i in range(int(df2.shape[0] / minutes))]
df2['group'] = np.repeat(x, repeats=minutes, axis=0)
g = df2.groupby(['group', 'Count'])
count_df = g['Count'].nunique().unstack()
count_df.fillna(0, inplace=True)
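One caveat (my note, not part of the original answer): np.repeat only lines up if the number of rows is an exact multiple of minutes. If it isn't, you can build the group ids with a ceiling and trim to the frame length:
import numpy as np

minutes = 5
n_groups = int(np.ceil(df2.shape[0] / minutes))
# repeat each group id `minutes` times, then trim to the exact number of rows
df2['group'] = np.repeat(range(n_groups), minutes)[:df2.shape[0]]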
I guess this is supposed to be simple, but I can't seem to make it work.
I have some stock data:
import pandas as pd
import numpy as np
df = pd.DataFrame(index=pd.date_range(start = "06/01/2018", end = "08/01/2018"),
data = np.random.rand(62)*100)
I am doing some analysis on it, which results in my drawing some lines on the graph.
I want to plot a 45-degree line somewhere on the graph as a reference for the lines I drew.
What I have tried is
x = df.tail(len(df)/20).index
x = x.reset_index()
x_first_val = df.loc[x.loc[0].date].adj_close
in order to get some point, then use slope = 1 and calculate the y values... but this all sounds wrong.
Any ideas?
Here is a possibility:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.legend / plt.ylabel below

df = pd.DataFrame(index=pd.date_range(start="06/01/2018", end="08/01/2018"),
                  data=np.random.rand(62)*100,
                  columns=['data'])
# Get values for the time:
index_range = df.index[('2018-06-18' < df.index) & (df.index < '2018-07-21')]
# get the timestamps in nanoseconds (since epoch)
timestamps_ns = index_range.astype(np.int64)
# convert it to a relative number of days (for example, could be seconds)
time_day = (timestamps_ns - timestamps_ns[0]) / 1e9 / 60 / 60 / 24
# Define y-data for a line:
slope = 3 # unit: "something" per day
something = time_day * slope
trendline = pd.Series(something, index=index_range)
# Graph:
df.plot(label='data', alpha=0.8)
trendline.plot(label='some trend')
plt.legend(); plt.ylabel('something');
which gives:
Edit: my first answer, which used dayofyear instead of the timestamps:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.legend / plt.ylabel below

df = pd.DataFrame(index=pd.date_range(start="06/01/2018", end="08/01/2018"),
                  data=np.random.rand(62)*100,
                  columns=['data'])
# Define data for a line:
slope = 3 # unit: "something" per day
index_range = df.index[('2018-06-18' < df.index) & (df.index < '2018-07-21')]
dayofyear = index_range.dayofyear # it will not work around the new year...
dayofyear = dayofyear - dayofyear[0]
something = dayofyear * slope
trendline = pd.Series(something, index=index_range)
# Graph:
df.plot(label='data', alpha=0.8)
trendline.plot(label='some trend')
plt.legend(); plt.ylabel('something');
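A side note of my own (not part of the original answer): with slope = 1 the line rises one y-unit per day, but whether it looks like 45 degrees on screen also depends on the axis scales. Reusing index_range and time_day from the first snippet:
# a reference line with slope 1: one unit of y per day of x
slope = 1
trendline_45 = pd.Series(time_day * slope, index=index_range)

ax = df.plot(label='data', alpha=0.8)
trendline_45.plot(ax=ax, label='slope = 1 reference')
plt.legend()
# the on-screen angle depends on the figure size and axis limits,
# not only on the slope value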
Is anyone else having trouble with the new rolling.std() in pandas? The deprecated method was rolling_std(). The new method runs fine but produces a constant number that does not roll with the time series.
Sample code is below. If you trade stocks, you may recognize the formula for Bollinger bands. The output I get from rolling.std() tracks the stock day by day and is obviously not rolling.
This is in pandas 0.19.1. Any help would be appreciated.
import datetime
import pandas as pd
import pandas_datareader.data as web
start = datetime.datetime(2012,1,1)
end = datetime.datetime(2012,12,31)
g = web.DataReader(['AAPL'], 'yahoo', start, end)
stocks = g['Close']
stocks['Date'] = pd.to_datetime(stocks.index)
stocks['AAPL_LO'] = stocks['AAPL'] - stocks['AAPL'].rolling(20).std() * 2
stocks['AAPL_HI'] = stocks['AAPL'] + stocks['AAPL'].rolling(20).std() * 2
stocks.dropna(axis=0, how='any', inplace=True)
import pandas as pd
from pandas_datareader import data as pdr
import numpy as np
import datetime

end = datetime.date.today()
begin = end - pd.DateOffset(365 * 10)
st = begin.strftime('%Y-%m-%d')
ed = end.strftime('%Y-%m-%d')
data = pdr.get_data_yahoo("AAPL", st, ed)

def bollinger_strat(data, window, no_of_std):
    rolling_mean = data['Close'].rolling(window).mean()
    rolling_std = data['Close'].rolling(window).std()
    # write the bands back onto the DataFrame that was passed in
    data['Bollinger High'] = rolling_mean + (rolling_std * no_of_std)
    data['Bollinger Low'] = rolling_mean - (rolling_std * no_of_std)

bollinger_strat(data, 20, 2)
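To check that the bands really do roll with the series, a quick plot helps (a small sketch of my own, assuming matplotlib is installed):
import matplotlib.pyplot as plt

# the rolling std varies over time, so the width of the band should visibly change
data[['Close', 'Bollinger High', 'Bollinger Low']].plot(figsize=(10, 5))
plt.show()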