I am plotting a simple chart and adding a number taken from a DataFrame via plot.text(). The number is plotted as intended, but details of its properties are also displayed. I would like to suppress the properties and plot just the number.
The following code reproduces the issue.
import numpy as np
import pandas as pd
from pandas import *
import matplotlib.pyplot as plot
%matplotlib inline
rand = np.random.RandomState(1)
index = np.arange(8)
df = DataFrame(rand.randn(8, 1), index=index, columns=list('A'))
df['date'] = date_range('1/1/2014', periods=8)
print(df)
A date
0 1.624345 2014-01-01
1 -0.611756 2014-01-02
2 -0.528172 2014-01-03
3 -1.072969 2014-01-04
4 0.865408 2014-01-05
5 -2.301539 2014-01-06
6 1.744812 2014-01-07
7 -0.761207 2014-01-08
df2 = pd.DataFrame(index = ['1'], columns=['example'])
df2['example'] = 1.436792
print(df2)
example
1 1.436792
fig, ax = plot.subplots(figsize=(15,10))
df.plot(x='date', y='A')
plot.text(0.05, 0.95, df2['example'],
horizontalalignment='left',
verticalalignment='center',
transform = ax.transAxes)
The plot is showing index, name and dtype data along with the example number. Can anybody show how to suppress this detail and just plot the number? Any help much appreciated.
Just plot with the DataFrame values:
plot.text(0.05, 0.95, df2['example'].values,
horizontalalignment='left',
verticalalignment='center',
transform = ax.transAxes)
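Note that .values returns a one-element array, so the label may still show up with brackets, such as [1.436792]. A minimal sketch of one way around that, assuming only the first entry of the column is wanted: pull out the scalar with .iloc[0] and format it as a string before passing it to text(). The Agg backend line and the six-decimal format are assumptions made so the sketch runs without a display; they are not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df2 = pd.DataFrame({"example": [1.436792]}, index=["1"])

# .iloc[0] returns a plain scalar rather than a Series or an array
value = df2["example"].iloc[0]
label = "{:.6f}".format(value)  # six decimals is an arbitrary choice

fig, ax = plt.subplots()
text_obj = ax.text(0.05, 0.95, label,
                   horizontalalignment="left",
                   verticalalignment="center",
                   transform=ax.transAxes)
```

Because the label is an ordinary string, no index, name or dtype information can leak into the plot.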
Or set visible=False to hide the text entirely:
plot.text(0.05, 0.95, df2['example'],
horizontalalignment='left',
verticalalignment='center',
transform = ax.transAxes, visible=False)
I want to plot the data between two variables. In that I want to plot monthly data using a special color.
My code and expected output:
import matplotlib.pyplot as plt
df
A B
2019-01-01 10 20
2019-01-02 20 30
2019-02-01 10 15
2019-02-02 20 40
2019-03-01 12 32
2019-03-02 5 14
plt.plot(df['A'],df['B'])
plt.show()
My current plot plots all the data as usual but I am expecting something different as given below.
My expected output:
2019-03-01 10 20
You can do something like this:
markers = 'dsxo'
months = pd.to_datetime(df.index).to_period('M')
for i, (k,d) in enumerate(df.groupby(months) ):
plt.plot(d['A'],d['B'], label=k, marker=markers[i])
plt.legend()
Output:
Check this code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('data.csv', index_col=0)  # use the date strings as the index
df['Month'] = df.index.map(lambda x: x[:-3])
fig, ax = plt.subplots(1, 1, figsize = (6, 6))
for month in df['Month'].unique():
ax.plot(df[df['Month'] == month]['A'],
df[df['Month'] == month]['B'],
label = month)
plt.legend()
plt.show()
that gives this graph:
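For anyone who wants to try the per-month grouping without a data.csv file, here is a minimal self-contained sketch that builds the sample frame from the question inline. The "o" marker and the Agg backend are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the question, with the dates as a string index
df = pd.DataFrame(
    {"A": [10, 20, 10, 20, 12, 5], "B": [20, 30, 15, 40, 32, 14]},
    index=["2019-01-01", "2019-01-02", "2019-02-01",
           "2019-02-02", "2019-03-01", "2019-03-02"],
)

# One monthly period per row, used as the grouping key
months = pd.to_datetime(df.index).to_period("M")

fig, ax = plt.subplots()
for month, group in df.groupby(months):
    # Each call to plot() draws a new line in the next color of the cycle
    ax.plot(group["A"], group["B"], marker="o", label=str(month))
ax.legend()
```

Each month ends up as its own line, so matplotlib assigns it a separate color and legend entry.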
I've got two pandas series, one with a 7 day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7 day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph is showing up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the series look like:
These are the first 15 values of the 7 day rolling mean series, which has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!
The problem is that pandas bar plots are categorical (Bars are at subsequent integer positions). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical coordinates as well. This is not possible with pandas, but is the default behaviour with matplotlib.
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
import pandas as pd
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1)))+14, index=t1)
s1[:6] = np.nan
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2))*15+5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()
The problem might be that the two series' indices are on very different scales. You can use ax.twiny to plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()
Output:
I have a dataframe which looks like this (left column is the index):
YYYY-MO-DD HH-MI-SS_SSS ATMOSPHERIC PRESSURE (hPa) mean
2016-11-07 14:00:00 1014.028782
2016-11-07 15:00:00 1014.034111
.... ....
2016-11-30 09:00:00 1006.516436
2016-11-30 10:00:00 1006.216156
Now I want to plot a colormap with this data - so I want to create an X (horizontal axis) to be just the dates:
2016-11-07, 2016-11-08,...,2016-11-30
and the Y (Vertical axis) to be the time:
00:00:00, 01:00:00, 02:00:00, ..., 23:00:00
And finally the Z (color map) to be the pressure data for each date and time [f(x,y)].
How can I arrange the data for this kind of plotting ?
Thank you !
With test data prepared like so:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
samples = 24 * 365
index = pd.date_range('2017-01-01', freq='1H', periods=samples)
data = pd.DataFrame(np.random.rand(samples), index=index, columns=['data'])
I would do something like this:
data = data.reset_index()
data['date'] = data['index'].apply(lambda x: x.date())
data['time'] = data['index'].apply(lambda x: x.time())
pivoted = data.pivot(index='time', columns='date', values='data')
fig, ax = plt.subplots(1, 1)
ax.imshow(pivoted, origin='lower', cmap='viridis')
plt.show()
Which produces:
To improve the axis labeling, this is a start:
ax.set_yticklabels(['{:%H:%M:%S}'.format(t) for t in data['time'].unique()])
ax.set_xticklabels(['{:%Y-%m-%d}'.format(t) for t in data['date'].unique()])
but you'll need to figure out how to choose how often a label appears with set_xticks() and set_yticks()
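As a rough sketch of that last step, assuming a label every 7th date and every 6th hour is wanted (both step sizes are arbitrary choices, as are the two-week sample and the Agg backend), the tick positions can be chosen first and the labels built from the pivoted frame at exactly those positions:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

samples = 24 * 14  # two weeks of hourly test data
index = pd.date_range("2017-01-01", periods=samples, freq=pd.Timedelta(hours=1))
data = pd.DataFrame({"data": np.random.rand(samples)}, index=index)
data["date"] = data.index.date
data["time"] = data.index.time
pivoted = data.pivot(index="time", columns="date", values="data")

fig, ax = plt.subplots()
ax.imshow(pivoted.to_numpy(), origin="lower", cmap="viridis", aspect="auto")

x_step, y_step = 7, 6  # hypothetical spacing: every 7th day, every 6th hour
x_pos = np.arange(0, pivoted.shape[1], x_step)
y_pos = np.arange(0, pivoted.shape[0], y_step)
x_labels = [pivoted.columns[i].strftime("%Y-%m-%d") for i in x_pos]
y_labels = [pivoted.index[i].strftime("%H:%M") for i in y_pos]

# Set the positions first, then the labels, so the two stay aligned
ax.set_xticks(x_pos)
ax.set_xticklabels(x_labels, rotation=45, ha="right")
ax.set_yticks(y_pos)
ax.set_yticklabels(y_labels)
```

Taking the labels from pivoted.columns and pivoted.index, rather than from unique() on the long-format frame, keeps each label tied to the column or row it actually describes.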
I have a DataFrame that looks like this when unstacked.
Start Date 2016-07-11 2016-07-12 2016-07-13
Period
0 1.000000 1.000000 1.0
1 0.684211 0.738095 NaN
2 0.592105 NaN NaN
I'm trying to plot it in Seaborn as a heatmap but it's giving me unintended results.
Here's my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.array(data), columns=['Start Date', 'Period', 'Users'])
df = df.fillna(0)
df = df.set_index(['Start Date', 'Period'])
sizes = df['Users'].groupby(level=0).first()
df = df['Users'].unstack(0).divide(sizes, axis=1)
plt.title("Test")
sns.heatmap(df.T, mask=df.T.isnull(), annot=True, fmt='.0%')
plt.tight_layout()
plt.savefig(table._v_name + "fig.png")
I want it so that text doesn't overlap and there aren't 6 heat legends on the side. Also if possible, how do I fix the date so that it only displays %Y-%m-%d?
Since exact reproducible data is not available, consider the example below, which uses the posted snippet data. It runs pivot_table() to reproduce the posted structure, with the StartDate values across the columns. Your heatmap possibly produces the multiple color bars and overlapping figures because of the unstack() processing, where you seem to be dividing by users (look into seaborn.FacetGrid to split). So the example below passes the df as is to heatmap. An apply() also reformats the datetime values to the requested format.
from io import StringIO
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
data = '''Period,StartDate,Value
0,2016-07-11,1.000000
0,2016-07-12,1.000000
0,2016-07-13,1.0
1,2016-07-11,0.684211
1,2016-07-12,0.738095
1,2016-07-13
2,2016-07-11,0.592105
2,2016-07-12
2,2016-07-13'''
df = pd.read_csv(StringIO(data))
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['StartDate'] = df['StartDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
pvtdf = df.pivot_table(values='Value', index=['Period'],
columns='StartDate', aggfunc=sum)
print(pvtdf)
# StartDate 2016-07-11 2016-07-12 2016-07-13
# Period
# 0 1.000000 1.000000 1.0
# 1 0.684211 0.738095 NaN
# 2 0.592105 NaN NaN
sns.set()
plt.title("Test")
ax = sns.heatmap(pvtdf.T, mask=pvtdf.T.isnull(), annot=True, fmt='.0%')
plt.tight_layout()
plt.show()
EDIT: The missing dates 20140103 and 20140104 are expected; I don't want them to be filled in automatically.
And I don't want to use xs = pd.Series(data=range(len(ts)), index=ts) instead of xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts)), since I want to use operations such as
xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts))
xs['2014']
Out[24]:
2014-01-01 0
2014-01-02 1
2014-01-05 2
2014-01-06 3
2014-01-07 4
dtype: int64
which doesn't work for:
In [26]: xs = pd.Series(data=range(len(ts)), index=ts)
In [27]: xs['2014']
---------------------------------------------------------------------------
KeyError: '2014'
For example:
In [1]: import pandas as pd
In [2]: ts = ['20140101', '20140102', '20140105', '20140106', '20140107']
In [3]: xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts))
In [4]: xs.plot()
This plots the image below, which adds extra points for 20140103 and 20140104.
Whereas I want to get an image like this:
import matplotlib.pyplot as plt
plt.plot(xs.values)
Thanks to euri10 for use_index = False
ts = ['20140101', '20140102', '20140105', '20140106', '20140107']
xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts))
fig, ax = plt.subplots()
xs.plot(use_index=False)
# set the tick positions first so the labels line up with them
ax.set_xticks(range(len(ts)))
ax.set_xticklabels(pd.to_datetime(ts))
fig.autofmt_xdate()
plt.show()
Couldn't test it but you may want to use use_index=False and/or xticks
Since you converted the dates to datetime, pandas is plotting the dates themselves on the x-axis. Since there are days that aren't included in the Series, pandas will leave the appropriate spaces on the x-axis where those points are. If, for some reason, you don't want this (for instance, if you are counting business days and the missing points are weekends), I think the solution that makes it most clear is to make the index not a date but a number (Day 1, Day 2, etc.). To my knowledge, pandas won't allow you to distort the axes.
import pandas as pd
ts = ['20140101', '20140102', '20140105', '20140106', '20140107']
xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts))
print(xs)
#output
2014-01-01 0
2014-01-02 1
2014-01-05 2
2014-01-06 3
2014-01-07 4
You are missing 2 dates.
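A minimal sketch of the "number instead of date" idea that still keeps the DatetimeIndex for selection: plot against integer positions and use the dates only as tick labels. The date format and the Agg backend are assumptions for illustration, and .loc is used for the partial-string selection.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ts = ['20140101', '20140102', '20140105', '20140106', '20140107']
xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts))

# The DatetimeIndex is untouched, so selecting by year still works
subset = xs.loc['2014']

# Plot against 0..n-1 so the missing days take up no space on the x-axis
positions = np.arange(len(xs))
fig, ax = plt.subplots()
line, = ax.plot(positions, xs.to_numpy())
ax.set_xticks(positions)
ax.set_xticklabels(xs.index.strftime('%Y-%m-%d'))
fig.autofmt_xdate()
```

The points are evenly spaced regardless of the gap between 2014-01-02 and 2014-01-05, while the tick labels still show the real dates.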