EDIT: The missing dates 20140103 and 20140104 are expected; I don't want them to be filled in automatically.
And I don't want to use xs = pd.Series(data=range(len(ts)), index=ts) instead of xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts)), since I want to use operations such as
xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts))
xs['2014']
Out[24]:
2014-01-01 0
2014-01-02 1
2014-01-05 2
2014-01-06 3
2014-01-07 4
dtype: int64
which doesn't work for:
In [26]: xs = pd.Series(data=range(len(ts)), index=ts)
In [27]: xs['2014']
---------------------------------------------------------------------------
KeyError: '2014'
For example:
In [1]: import pandas as pd
In [2]: ts = ['20140101', '20140102', '20140105', '20140106', '20140107']
In [3]: xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts))
In [4]: xs.plot()
This produces the plot below, which adds the extra dates 20140103 and 20140104.
Whereas I want to get an image like this:
import matplotlib.pyplot as plt
plt.plot(xs.values)
Thanks to euri10 for use_index = False
ts = ['20140101', '20140102', '20140105', '20140106', '20140107']
xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts))
fig, ax = plt.subplots()
xs.plot(ax=ax, use_index=False)
# Set the tick positions first, then attach one label per position.
ax.set_xticks(range(len(ts)))
ax.set_xticklabels(pd.to_datetime(ts))
fig.autofmt_xdate()
plt.show()
Couldn't test it but you may want to use use_index=False and/or xticks
Since you converted the dates to datetime, pandas is plotting the dates themselves on the x-axis. Since there are days that aren't included in the Series, pandas will leave the appropriate spaces on the x-axis where those points are. If, for some reason, you don't want this (for instance, if you are counting business days and the missing points are weekends), I think the solution that makes it most clear is to make the index not a date but a number (Day 1, Day 2, etc.). To my knowledge, pandas won't allow you to distort the axes.
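The numbered-index idea can be sketched as follows. This is a minimal illustration, not the answerer's own code: the names `in_2014` and `by_day` are made up for the example, and it assumes you are happy to keep the date-indexed Series for selection and use a numbered copy only for plotting.

```python
import pandas as pd

ts = ['20140101', '20140102', '20140105', '20140106', '20140107']
xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts))

# Keep the DatetimeIndex for date-based selection such as "everything in 2014".
in_2014 = xs.loc['2014']

# Build a numbered copy (Day 1, Day 2, ...) purely for plotting, so the
# x-axis has no room reserved for the dates that are not in the data.
by_day = xs.copy()
by_day.index = pd.RangeIndex(1, len(xs) + 1, name='day')
ax = by_day.plot(marker='o')
```

The trade-off is that the x-axis then shows day numbers rather than dates; relabelling the ticks with the dates is a separate step.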
import pandas as pd
ts = ['20140101', '20140102', '20140105', '20140106', '20140107']
xs = pd.Series(data=range(len(ts)), index=pd.to_datetime(ts))
print(xs)
#output
2014-01-01 0
2014-01-02 1
2014-01-05 2
2014-01-06 3
2014-01-07 4
You are missing 2 dates.
I'm trying to plot a time series data but I have some problems.
I'm using this code:
from matplotlib import pyplot as plt
plt.figure('Fig')
plt.plot(data.index,data.Colum,'g', linewidth=2.0,label='Data')
And I get this:
But I don't want the interpolation between missing values!
How can I achieve this?
Since you are using pandas you could do something like this:
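from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(1234)
idx = pd.date_range(end=datetime.today().date(), periods=10, freq='D')
# Use a float Series so that NaN can be assigned without changing the dtype.
vals = pd.Series(np.random.randint(1, 10, size=idx.size), index=idx, dtype=float)
vals.iloc[4:8] = np.nan
print(vals)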
Here is an example of a column from a DataFrame with DatetimeIndex
2016-03-29 4.0
2016-03-30 7.0
2016-03-31 6.0
2016-04-01 5.0
2016-04-02 NaN
2016-04-03 NaN
2016-04-04 NaN
2016-04-05 NaN
2016-04-06 9.0
2016-04-07 1.0
Freq: D, dtype: float64
To plot it without dates where data is NaN you could do something like this:
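fig, ax = plt.subplots()
present = vals.dropna()
ax.plot(range(present.size), present)
# Pin one tick to each plotted point so the labels line up with the data.
ax.set_xticks(range(present.size))
ax.set_xticklabels(present.index.date.tolist())
fig.autofmt_xdate()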
Which should produce a plot like this:
The trick here is to replace the dates with some range of values that does not trigger matplotlib's internal date processing when you call the .plot method.
Later, when the plotting is done, replace the ticklabels with actual dates.
Optionally, call .autofmt_xdate() to make labels readable.
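The three steps above can be wrapped in a small helper. This is only a sketch: `plot_without_gaps` and its `date_format` parameter are hypothetical names introduced here, and the sample values are copied from the printed Series above rather than generated randomly.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_without_gaps(series, ax=None, date_format="%Y-%m-%d"):
    """Plot the non-NaN values at evenly spaced positions, labelled by date."""
    if ax is None:
        _, ax = plt.subplots()
    present = series.dropna()
    positions = np.arange(len(present))
    # Step 1: plot against plain integers so no date handling is triggered.
    ax.plot(positions, present.to_numpy())
    # Step 2: put one tick on each point and label it with the actual date.
    ax.set_xticks(positions)
    ax.set_xticklabels(present.index.strftime(date_format))
    # Step 3: rotate the labels so they stay readable.
    ax.figure.autofmt_xdate()
    return ax


idx = pd.date_range("2016-03-29", periods=10, freq="D")
vals = pd.Series([4, 7, 6, 5, np.nan, np.nan, np.nan, np.nan, 9, 1],
                 index=idx, dtype=float)
ax = plot_without_gaps(vals)
```

Because the x positions are plain integers, the distance between 2016-04-01 and 2016-04-06 is drawn the same as between any two adjacent days, which is the intended distortion.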
I have a pandas DataFrame with a TIMESTAMP column (not the index), and the timestamp format is as follows:
2015-03-31 22:56:45.510
I also have columns called CLASS and AXLES. I would like to compute the count of records for each month separately for each unique value of AXLES (AXLES can take an integer value between 3-12).
I came up with a combination of resample and groupby:
resamp = dfWIM.set_index('TIMESTAMP').groupby('AXLES').resample('M')['CLASS'].count()
This seems to give me a multiIndex dataframe object, as shown below.
In [72]: resamp
Out [72]:
AXLES TIMESTAMP
3 2014-07-31 5517
2014-08-31 31553
2014-09-30 42816
2014-10-31 49308
2014-11-30 44168
2014-12-31 45518
2015-01-31 54782
2015-02-28 52166
2015-03-31 47929
4 2014-07-31 3147
2014-08-31 24810
2014-09-30 39075
2014-10-31 46857
2014-11-30 42651
2014-12-31 48282
2015-01-31 42708
2015-02-28 43904
2015-03-31 50033
From here, how can I access different components of this multiIndex object to create a bar plot for the following conditions?
show data when AXLES = 3
show x ticks in the Month - Year format (no days, hours, minutes etc.)
Thanks!
EDIT: The following code gives me the plot, but I could not change the xtick formatting to MM-YY.
resamp[3].plot(kind='bar')
EDIT 2: Below is a code snippet that generates a small sample of data similar to what I have:
dftest = {'TIMESTAMP':['2014-08-31','2014-09-30','2014-10-31'], 'AXLES':[3, 3, 3], 'CLASS':[5,6,7]}
dfTest = pd.DataFrame(dftest)
dfTest.TIMESTAMP = pd.to_datetime(pd.Series(dfTest.TIMESTAMP))
resamp = dfTest.set_index('TIMESTAMP').groupby('AXLES').resample('M')['CLASS'].count()
resamp[3].plot(kind='bar')
EDIT 3:
Here is the solution:
A. Plot the whole resampled dataframe (based on @Ako's suggestion):
df = resamp.unstack(0)
df.index = [ts.strftime('%b 20%y') for ts in df.index]
df.plot(kind='bar', rot=0)
B. Plot an individual index from the resampled dataframe (based on @Alexander's suggestion):
df = resamp[3]
df.index = [ts.strftime('%b 20%y') for ts in df.index]
df.plot(kind='bar', rot=0)
You could generate and set the labels explicitly using ax.xaxis.set_major_formatter with a ticker.FuncFormatter. This will allow you to keep your Series' MultiIndex with timestamp values, while displaying the timestamps in the desired %m-%Y format:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as ticker
dftest = {'TIMESTAMP':['2014-08-31','2014-09-30','2014-10-31'], 'AXLES':[3, 3, 3], 'CLASS':[5,6,7]}
dfTest = pd.DataFrame(dftest)
dfTest.TIMESTAMP = pd.to_datetime(pd.Series(dfTest.TIMESTAMP))
resamp = dfTest.set_index('TIMESTAMP').groupby('AXLES').resample('M')['CLASS'].count()
ax = resamp[3].plot(kind='bar')
ticklabels = [timestamp.strftime('%m-%Y') for axle, timestamp in resamp.index]
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: ticklabels[int(x)]))
plt.gcf().autofmt_xdate()
plt.show()
yields
The following should work, but it is difficult to test without some data.
Start by resetting your index to get access to the TIMESTAMP column. Then use strftime to format it to your desired text representation (e.g. mm-yy). Finally, reset the index back to AXLES and TIMESTAMP.
df = resamp.reset_index()
df['TIMESTAMP'] = [ts.strftime('%m-%y') for ts in df.TIMESTAMP]
df.set_index(['AXLES', 'TIMESTAMP'], inplace=True)
df.xs(3, level=0).plot(kind='bar')
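For readers on a recent pandas release, where resample no longer accepts the how= argument, the same "format the timestamps as text, then plot one AXLES value" idea can be sketched with a monthly period grouping. This swaps resample for a groupby on `dt.to_period('M')`, which is a different technique from the one in the answers; the names `df_test`, `counts` and `axle3` are made up for the example.

```python
import pandas as pd

# Small sample shaped like the one in the question.
df_test = pd.DataFrame({
    'TIMESTAMP': pd.to_datetime(['2014-08-31', '2014-09-30', '2014-10-31']),
    'AXLES': [3, 3, 3],
    'CLASS': [5, 6, 7],
})

# Count records per (AXLES, month); the month is a pandas Period.
month = df_test['TIMESTAMP'].dt.to_period('M')
counts = df_test.groupby(['AXLES', month])['CLASS'].count()

# Select AXLES == 3 and turn the periods into month-year text labels.
axle3 = counts.xs(3, level='AXLES')
axle3.index = [period.strftime('%m-%Y') for period in axle3.index]

ax = axle3.plot(kind='bar', rot=0)
```

Since the index is plain text by the time it reaches the plot, the bar labels appear exactly as formatted, with no day or time component.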
I have a simple pandas DataFrame with yearly values that I am plotting as a line graph:
import matplotlib.pyplot as plt
import pandas as pd
>>>df
a b
2010-01-01 9.7 9.0
2011-01-01 8.8 14.2
2012-01-01 8.4 7.6
2013-01-01 9.6 8.4
2014-01-01 8.2 5.5
The expected format for the X axis is to use no margins for the labels:
fig = plt.figure(0)
ax = fig.add_subplot(1, 1, 1)
df.plot(ax = ax)
But I would like to force the values to plot in the middle of the year range, as is done in Excel:
I have tried setting the x axis margins:
ax.margins(xmargin = 1)
But I can see no difference.
If you just want to move the dates, you could try adding this line at the end:
ax.set_xlim(ax.get_xlim()[0] - 0.5, ax.get_xlim()[1] + 0.5)
If you need to format the dates as well you could either modify your index or make changes in the plotted ticks like so:
(presuming that your df.index is a DatetimeIndex)
ax.set_xticklabels(df.index.to_series().apply(lambda x: x.strftime('%d/%m/%Y')))
This will format the dates to look like your Excel example.
Or you could change your index to look like you want and then call .plot():
df.index = df.index.to_series().apply(lambda x: x.strftime('%d/%m/%Y'))
print(df.index.tolist())
['01/01/2010', '01/01/2011', '01/01/2012', '01/01/2013', '01/01/2014']
And if your index is not a datetime, you need to convert it first like this:
df.index = pd.to_datetime(df.index)
I am plotting a simple chart and adding a number taken from a DataFrame, via plot.text(). The number is plotted as intended, but details of its properties are also being displayed. I would like to suppress the display of the properties and plot just the number.
The following code reproduces the issue.
import numpy as np
import pandas as pd
from pandas import *
import matplotlib.pyplot as plot
%matplotlib inline
rand = np.random.RandomState(1)
index = np.arange(8)
df = DataFrame(rand.randn(8, 1), index=index, columns=list('A'))
df['date'] = date_range('1/1/2014', periods=8)
print(df)
A date
0 1.624345 2014-01-01
1 -0.611756 2014-01-02
2 -0.528172 2014-01-03
3 -1.072969 2014-01-04
4 0.865408 2014-01-05
5 -2.301539 2014-01-06
6 1.744812 2014-01-07
7 -0.761207 2014-01-08
df2 = pd.DataFrame(index = ['1'], columns=['example'])
df2['example'] = 1.436792
print(df2)
example
1 1.436792
fig, ax = plot.subplots(figsize=(15,10))
df.plot(x='date', y='A')
plot.text(0.05, 0.95, df2['example'],
horizontalalignment='left',
verticalalignment='center',
transform = ax.transAxes)
The plot is showing index, name and dtype data along with the example number. Can anybody show how to suppress this detail and just plot the number? Any help much appreciated.
Just plot with the DataFrame values:
plot.text(0.05, 0.95, df2['example'].values,
horizontalalignment='left',
verticalalignment='center',
transform = ax.transAxes)
Or just set visible=False to hide everything:
plot.text(0.05, 0.95, df2['example'],
horizontalalignment='left',
verticalalignment='center',
transform = ax.transAxes, visible=False)
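If only the bare number should appear, one option is to pull the scalar out of the column before handing it to text(), since passing the one-element .values array may still render with square brackets. A minimal sketch, assuming the value sits in the first row and six decimals are wanted; `value` and `label` are names introduced for the example:

```python
import pandas as pd
import matplotlib.pyplot as plt

df2 = pd.DataFrame({'example': [1.436792]}, index=['1'])

# Extract a plain float instead of a Series or array, then format it.
value = df2['example'].iloc[0]
label = f'{value:.6f}'

fig, ax = plt.subplots()
text = ax.text(0.05, 0.95, label,
               horizontalalignment='left',
               verticalalignment='center',
               transform=ax.transAxes)
```

Formatting the value yourself also gives control over the number of decimals shown.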
I create a pandas dataframe with a DatetimeIndex like so:
import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create datetime index and random data column
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=14, freq='D')
data = np.random.randint(1, 10, size=14)
columns = ['A']
df = pd.DataFrame(data, index=index, columns=columns)
# initialize new weekend column, then set all values to 'yes' where the index corresponds to a weekend day
df['weekend'] = 'no'
df.loc[(df.index.weekday == 5) | (df.index.weekday == 6), 'weekend'] = 'yes'
print(df)
Which gives
A weekend
2014-10-13 7 no
2014-10-14 6 no
2014-10-15 7 no
2014-10-16 9 no
2014-10-17 4 no
2014-10-18 6 yes
2014-10-19 4 yes
2014-10-20 7 no
2014-10-21 8 no
2014-10-22 8 no
2014-10-23 1 no
2014-10-24 4 no
2014-10-25 3 yes
2014-10-26 8 yes
I can easily plot the A column with pandas by doing:
df.plot()
plt.show()
which plots a line of the A column but leaves out the weekend column as it does not hold numerical data.
How can I put a "marker" on each spot of the A column where the weekend column has the value yes?
Meanwhile I found out that it is as simple as using boolean indexing in pandas. I do the plot directly with pyplot instead of pandas' own plot wrapper (which is more convenient to me):
plt.plot(df.index, df.A)
plt.plot(df[df.weekend=='yes'].index, df[df.weekend=='yes'].A, 'ro')
Now the red dots mark all weekend days, which are the rows where df.weekend == 'yes'.
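The same boolean-indexing idea also works without the helper 'weekend' column, by building the mask straight from the index. This is a sketch under assumptions: the start date is fixed to match the printed table, the random values will differ from it, and `is_weekend` is a name introduced here.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

index = pd.date_range('2014-10-13', periods=14, freq='D')
rng = np.random.RandomState(0)
df = pd.DataFrame({'A': rng.randint(1, 10, size=14)}, index=index)

# weekday is 0 for Monday, so 5 and 6 are Saturday and Sunday.
is_weekend = df.index.weekday >= 5

fig, ax = plt.subplots()
ax.plot(df.index, df['A'])                                    # the full line
ax.plot(df.index[is_weekend], df.loc[is_weekend, 'A'], 'ro')  # weekend markers
```

Computing the mask once and reusing it avoids filtering the DataFrame twice, as the two df[df.weekend=='yes'] expressions do.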