I have a problem plotting time series data created with pandas date_range and period_range. The former works, but the latter does not. To illustrate the problem, consider the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# random numbers
N = 100
z = np.random.randn(N)
ts = pd.DataFrame({'Y': z, 'X': np.cumsum(z)})
# 'date_range' is used
month_date = pd.date_range('1978-02', periods=N, freq='M')
df_date = ts.set_index(month_date)
# 'period_range' is used
month_period = pd.period_range('1978-02', periods=N, freq='M')
df_period = ts.set_index(month_period)
# plot
plt.plot(df_date);plt.show()
plt.plot(df_period);plt.show()
plt.plot(df_date) gives a nice figure, whereas plt.plot(df_period) generates the following error, which I do not understand:
ValueError: view limit minimum 0.0 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
Is this an expected result? What am I missing here?
BTW, df_date.plot() and df_period.plot() both work fine, causing no problem.
Thanks in advance.
The indexes of the two dataframes are of different types:
print(type(df_date.index)) # pandas.core.indexes.datetimes.DatetimeIndex
print(type(df_period.index)) # pandas.core.indexes.period.PeriodIndex
Matplotlib does not know how to plot a PeriodIndex.
You may convert the PeriodIndex to a DatetimeIndex and plot that instead:
plt.plot(df_period.to_timestamp())
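A minimal, self-contained sketch of that conversion, assuming a monthly PeriodIndex like the one in the question. The seed and variable names are illustrative only; by default to_timestamp() places each period at its start.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data: 100 monthly periods starting in February 1978.
rng = np.random.default_rng(0)
z = rng.standard_normal(100)
month_period = pd.period_range("1978-02", periods=100, freq="M")
df_period = pd.DataFrame({"Y": z, "X": np.cumsum(z)}, index=month_period)

# Convert the PeriodIndex to a DatetimeIndex (start of each period by default).
df_ts = df_period.to_timestamp()

# Matplotlib can plot a DatetimeIndex directly.
fig, ax = plt.subplots()
ax.plot(df_ts.index, df_ts["X"], label="X")
ax.plot(df_ts.index, df_ts["Y"], label="Y")
ax.legend()
# Call plt.show() to display the figure interactively.
```

The original df_period is left untouched, so df_period.plot() keeps working as before.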
I have a CSV file which has been generated and altered to the current form;
Quick snapshot of sample Data
I want to plot a graph with X along the x-axis as normal and, on the y-axis, the frequency of 'True' values (i.e. 1s), so that I can visualise the relationship between time and how often the event occurs.
Thus far I have attempted a melt and value_counts, but they seem to give absolute counts rather than counts relative to the X value. I understand the data will likely need sorting before plotting, but I'm not sure of the best way to go about this.
Many thanks for any help.
You can either plot a histogram, which would probably work, or group by 'x' and aggregate the sum of 'y', as shown below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = np.array([[1988,1988,1988,1989,1990,1990,1991,1991], [0,1,1,0,1,1,0,0]])
df = pd.DataFrame(data.T, columns = ['x', 'y'])
df=df.groupby(['x']).sum()
print(df)
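Since the question asks for a frequency relative to each X value, a possible variation is to take the mean of 'y' per group: for 0/1 data the mean equals the share of 1s. This is a sketch on the same toy data; the column names count_true, total and share_true are made up for illustration.

```python
import pandas as pd

# Same toy data as above: x is the year, y is the 0/1 event flag.
df = pd.DataFrame({
    "x": [1988, 1988, 1988, 1989, 1990, 1990, 1991, 1991],
    "y": [0, 1, 1, 0, 1, 1, 0, 0],
})

# Per year: number of 1s, number of rows, and the fraction of rows that are 1.
summary = df.groupby("x")["y"].agg(
    count_true="sum", total="size", share_true="mean"
)
print(summary)
```

summary["share_true"].plot(kind="bar") would then draw the relative frequency per year.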
I want to reduce the x-axis tick labels because I'm using datetime information, and that takes up a lot of space on the x-axis. The problem is that the labels become unreadable.
So I think I need some way to scale them.
dates = pd.read_csv("EURUSDtest.csv")
dates = dates["Date"]+" " + dates["Time"]
plt.title("EUR/USD")
plt.plot(dates, data_pred)
plt.xticks(rotation="vertical")
plt.tick_params(labelsize=10)
plt.plot(forecasting)
The problem...
IIUC: You need to convert the dates column to pandas datetime type by calling pd.to_datetime.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# To reproduce the issue, let's create a date column as strings
df = pd.DataFrame({"Dates":pd.date_range(start='2018-1-1', end='2019-1-1', freq='15min').strftime("%m-%d-%Y %H-%M-%S")})
# Convert the date strings to datetime type (the format matches the strftime pattern above)
df["Dates"] = pd.to_datetime(df["Dates"], format="%m-%d-%Y %H-%M-%S")
# Add column to assign some dummy values
df = df.assign(VAL=np.linspace(10, 110, len(df)))
# Plot the graph
# Now the graph automatically adjusts the XLIM based on the size of the graph
plt.title("eur/usd")
plt.plot(df["Dates"], df["VAL"])
plt.xticks(rotation="vertical")
plt.show()
However, if you need further control over xlim, consult the matplotlib tutorials.
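As one possible starting point, here is a sketch that restricts the visible date window with set_xlim and lets matplotlib choose compact tick labels. The window dates and the dummy values are assumptions for illustration, not taken from the question's CSV.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Dummy 15-minute series standing in for the real data.
dates = pd.date_range("2018-01-01", "2019-01-01", freq="15min")
values = np.linspace(10, 110, len(dates))

fig, ax = plt.subplots()
ax.plot(dates, values)

# Restrict the visible window (illustrative range).
start, end = pd.Timestamp("2018-03-01"), pd.Timestamp("2018-06-01")
ax.set_xlim(start, end)

# Let matplotlib pick tick positions and keep the labels short.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
# Call plt.show() to display the figure interactively.
```

ConciseDateFormatter shortens the labels enough that vertical rotation is often unnecessary.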
I am plotting time series using pandas .plot() and want to see every month shown as an x-tick.
Here is the dataset structure
Here is the result of the .plot()
I was trying to use examples from other posts and matplotlib documentation and do something like
ax.xaxis.set_major_locator(
dates.MonthLocator(revenue_pivot.index, bymonthday=1,interval=1))
But that removed all the ticks :(
I also tried to pass xticks = df.index, but it has not changed anything.
What would be the right way to show more ticks on the x-axis?
No need to pass any args to MonthLocator. Make sure to use x_compat in the df.plot() call per @Rotkiv's answer.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.DataFrame(np.random.rand(100,2), index=pd.date_range('1-1-2018', periods=100))
ax = df.plot(x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
plt.show()
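If the default labels are not what you want, a formatter can be paired with the locator. A small sketch, assuming month-and-year labels are wanted (the "%b %Y" pattern is just an example):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 2)),
                  index=pd.date_range("2018-01-01", periods=100))

# x_compat=True makes pandas use plain matplotlib date units on the x-axis.
ax = df.plot(x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())            # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))  # e.g. "Feb 2018"
# Call plt.show() to display the figure interactively.
```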
formatted x-axis with set_major_locator
unformatted x-axis
You could also format the x-axis ticks and labels of a pandas DatetimeIndex "manually" using the attributes of a pandas Timestamp object.
I found that much easier than using locators from matplotlib.dates, which work on datetime formats other than pandas' (if I am not mistaken) and thus sometimes behave oddly if dates are not converted accordingly.
Here's a generic example that shows the first day of each month as a label based on attributes of pandas Timestamp objects:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# data
dim = 8760
idx = pd.date_range('1/1/2000 00:00:00', freq='h', periods=dim)
df = pd.DataFrame(np.random.randn(dim, 2), index=idx)
# select tick positions based on timestamp attribute logic. see:
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html
positions = [p for p in df.index
             if p.hour == 0
             and p.is_month_start
             and p.month in range(1, 13, 1)]
# for date formatting, see:
# https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
labels = [l.strftime('%m-%d') for l in positions]
# plot with adjusted labels
ax = df.plot(kind='line', grid=True)
ax.set_xlabel('Time (h)')
ax.set_ylabel('Foo (Bar)')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
plt.show()
yields:
Hope this helps!
The right way to do that is described here
Using the x_compat parameter, it is possible to suppress automatic tick resolution adjustment:
df.A.plot(x_compat=True)
If you want to just show more ticks, you can also dive deep into the structure of pd.plotting._converter:
dai = ax.xaxis.minor.formatter.plot_obj.date_axis_info
dai['fmt'][dai['fmt'] == b''] = b'%b'
After plotting, the formatter is a TimeSeries_DateFormatter and _set_default_format has been called, so self.plot_obj.date_axis_info is not None. You can now manipulate the structured array .date_axis_info to your liking, namely so that it contains fewer b'' entries and more b'%b'.
Remove tick labels:
ax = df.plot(x='date', y=['count'])
every_nth = 10
for n, label in enumerate(ax.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)
Lower every_nth to include more labels, raise to keep fewer.
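A self-contained sketch of the same idea. The data and the choice of a bar plot are assumptions made here for illustration; a bar plot is used because it draws exactly one tick per row, which makes the effect easy to see.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data: 30 daily counts with the date stored as a string column.
df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=30).strftime("%Y-%m-%d"),
    "count": list(range(30)),
})

ax = df.plot(x="date", y="count", kind="bar")

every_nth = 10
# Keep a reference to the full list of labels before hiding any of them.
labels = ax.xaxis.get_ticklabels()
for n, label in enumerate(labels):
    if n % every_nth != 0:
        label.set_visible(False)

visible_count = sum(label.get_visible() for label in labels)
print(visible_count)
```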
I'm trying to plot two series and have the x-axis ticks labeled every 5 years. If I index the data with a PeriodIndex for some reason I get ticks every 10 years. If I use a list of integers to index, then it works fine. Is there a way to get the right tick labels with a PeriodIndex?
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
np.random.seed(0)
idx = pd.PeriodIndex(range(2000,2021),freq='A')
data = pd.DataFrame(np.random.normal(size=(len(idx),2)),index=idx)
fig,ax = plt.subplots(1,2,figsize=(10,5))
data.loc[:,0].plot(ax=ax[0])
data.iloc[9:,1].plot(ax=ax[1])
ax[1].xaxis.set_major_locator(mpl.ticker.MultipleLocator(5))
plt.show()
idx = range(2000,2021)
The workaround I know is to convert the PeriodIndex to a DatetimeIndex and then to an array of datetime.datetime objects, and use plt.plot_date() to plot and mpl.dates.YearLocator(5) to format. This seems overly complicated.
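For reference, a shorter form of that workaround might look like the sketch below. It assumes that converting with to_timestamp() and passing the resulting DatetimeIndex straight to ax.plot is acceptable, which skips the datetime.datetime array and plot_date steps. The annual frequency alias "Y" and the random data are illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

rng = np.random.default_rng(0)
idx = pd.period_range("2000", "2020", freq="Y")
data = pd.DataFrame(rng.normal(size=(len(idx), 2)), index=idx)

# PeriodIndex -> DatetimeIndex (each year mapped to its first day).
ts = data.to_timestamp()

fig, ax = plt.subplots()
ax.plot(ts.index, ts[0])
ax.xaxis.set_major_locator(mdates.YearLocator(5))        # a tick every 5 years
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))  # label with the year only
# Call plt.show() to display the figure interactively.
```

This plots with matplotlib directly rather than through data.plot(), so pandas' own period-aware tick handling is not involved.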
I want to create a plot just like this:
The code:
P.fill_between(DF.start.index, DF.lwr, DF.upr, facecolor='blue', alpha=.2)
P.plot(DF.start.index, DF.Rt, '.')
but with dates in the x axis, like this (without bands):
the code:
P.plot_date(DF.start, DF.Rt, '.')
The problem is that fill_between fails when the x values are datetime objects.
Does anyone know of a workaround? DF is a pandas DataFrame.
It would help if you show how df is defined. What does df.info() report? This will show us the dtypes of the columns.
There are many ways that dates can be represented: as strings, ints, floats, datetime.datetime, NumPy datetime64s, Pandas Timestamps, or Pandas DatetimeIndex. The correct way to plot it depends on what you have.
Here is an example showing your code works if df.index is a DatetimeIndex:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
index = pd.date_range(start='2000-1-1', end='2015-1-1', freq='M')
N = len(index)
poisson = (stats.poisson.rvs(1000, size=(N,3))/100.0)
poisson.sort(axis=1)
df = pd.DataFrame(poisson, columns=['lwr', 'Rt', 'upr'], index=index)
plt.fill_between(df.index, df.lwr, df.upr, facecolor='blue', alpha=.2)
plt.plot(df.index, df.Rt, '.')
plt.show()
If the index has string representations of dates, then (with Matplotlib version 1.4.2) you would get a TypeError:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
index = pd.date_range(start='2000-1-1', end='2015-1-1', freq='M')
N = len(index)
poisson = (stats.poisson.rvs(1000, size=(N,3))/100.0)
poisson.sort(axis=1)
df = pd.DataFrame(poisson, columns=['lwr', 'Rt', 'upr'])
index = [item.strftime('%Y-%m-%d') for item in index]
plt.fill_between(index, df.lwr, df.upr, facecolor='blue', alpha=.2)
plt.plot(index, df.Rt, '.')
plt.show()
yields
File "/home/unutbu/.virtualenvs/dev/local/lib/python2.7/site-packages/numpy/ma/core.py", line 2237, in masked_invalid
condition = ~(np.isfinite(a))
TypeError: Not implemented for this type
In this case, the fix is to convert the strings to Timestamps:
index = pd.to_datetime(index)
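Put together, a compact sketch of the string-to-datetime fix could look like this. It uses NumPy random data instead of scipy, and the 24-month range is arbitrary; both are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Dates stored as strings, as in the failing case.
labels = pd.date_range("2000-01-01", periods=24, freq="MS").strftime("%Y-%m-%d")

# Three sorted columns so that lwr <= Rt <= upr in every row.
rng = np.random.default_rng(0)
bands = np.sort(rng.normal(10, 1, size=(len(labels), 3)), axis=1)
df = pd.DataFrame(bands, columns=["lwr", "Rt", "upr"])

# Convert the strings to a DatetimeIndex before plotting.
index = pd.to_datetime(list(labels), format="%Y-%m-%d")

fig, ax = plt.subplots()
ax.fill_between(index, df["lwr"], df["upr"], facecolor="blue", alpha=0.2)
ax.plot(index, df["Rt"], ".")
# Call plt.show() to display the figure interactively.
```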
Regarding the error reported by chilliq:
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs
could not be safely coerced to any supported types according to the casting
rule ''safe''
This can be produced if the DataFrame columns have "object" dtype when using fill_between. Changing the example column types and then trying to plot, as follows, results in the error above:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
index = pd.date_range(start='2000-1-1', end='2015-1-1', freq='M')
N = len(index)
poisson = (stats.poisson.rvs(1000, size=(N,3))/100.0)
poisson.sort(axis=1)
df = pd.DataFrame(poisson, columns=['lwr', 'Rt', 'upr'], index=index)
dfo = df.astype(object)
plt.fill_between(dfo.index, dfo.lwr, dfo.upr, facecolor='blue', alpha=.2)
plt.show()
From dfo.info() we see that the column types are "object":
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 180 entries, 2000-01-31 to 2014-12-31
Freq: M
Data columns (total 3 columns):
lwr 180 non-null object
Rt 180 non-null object
upr 180 non-null object
dtypes: object(3)
memory usage: 5.6+ KB
Ensuring that the DataFrame has numerical columns will solve the problem. To do this we can use pandas.to_numeric to convert, as follows:
dfn = dfo.apply(pd.to_numeric)
plt.fill_between(dfn.index, dfn.lwr, dfn.upr, facecolor='blue', alpha=.2)
plt.show()
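A quick way to check for this before plotting is to inspect the dtypes. The sketch below uses made-up data and only shows the check and the conversion, without the plot:

```python
import numpy as np
import pandas as pd

index = pd.date_range("2000-01-01", periods=12, freq="MS")
rng = np.random.default_rng(0)
df = pd.DataFrame(np.sort(rng.random((12, 3)), axis=1),
                  columns=["lwr", "Rt", "upr"], index=index)

# Simulate the problematic case: numeric values stored with object dtype.
dfo = df.astype(object)
print(dfo.dtypes)

# Convert every column back to a numeric dtype.
dfn = dfo.apply(pd.to_numeric)
print(dfn.dtypes)
```

pd.to_numeric raises by default if a column holds values that cannot be parsed, which surfaces bad data early instead of at plot time.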
I got a similar error while using fill_between:
ufunc 'bitwise_and' not supported
However, in my case the cause of the error was rather silly. I was passing the color parameter without an explicit argument name, so it was taken as the fourth positional parameter, where. Simply making sure that keyword parameters are passed by name solved the issue:
ax.fill_between(xdata, highs, lows, color=color, alpha=0.2)
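A runnable version of that call, with made-up data standing in for xdata, lows and highs:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data.
xdata = np.arange(10)
lows = xdata - 1.0
highs = xdata + 1.0
color = "tab:blue"

fig, ax = plt.subplots()
# The fourth positional parameter of fill_between is `where`,
# so the colour has to be passed by keyword.
band = ax.fill_between(xdata, lows, highs, color=color, alpha=0.2)
# Call plt.show() to display the figure interactively.
```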
I think none of the answers addresses the original question; they all change it a little bit.
If you want to plot timedeltas, you can use this workaround:
ax = df.Rt.plot()
x = ax.get_lines()[0].get_xdata().astype(float)
ax.fill_between(x, df.lwr, df.upr, color="b", alpha=0.2)
plt.show()
This works in your case. In general, the only caveat is that you always need to plot the index using pandas first and then get the coordinates from the artist. I am sure that by looking at the pandas code one can find out how it plots timedeltas. One could then apply that directly, and the first plot would no longer be needed.