I would like to be able to change the x limits so it shows a time frame of my choice.
reproducible example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# libraries for Data Download
import datetime # if you want to play with the dates
from pandas_datareader import data as pdr
import yfinance as yf
df = pdr.get_data_yahoo('ETH-USD', interval = '1d', period = "5y")
plt.figure(figsize=(24,10), dpi=140)
plt.grid()
df['Close'].plot()
df['Close'].ewm(span=50).mean().plot(c = '#4d00ff')
df['Close'].ewm(span=100).mean().plot(c = '#9001f0')
df['Close'].ewm(span=200).mean().plot(c = '#d102e8')
df['Close'].ewm(span=300).mean().plot(c = '#f101c2')
df['Close'].rolling(window=200).mean().plot(c = '#e80030')
plt.title('ETH-USD PLOT',fontsize=25, ha='center')
plt.legend(['C L O S E', 'EMA 50','EMA 100','EMA 200','EMA 300','MA 200', ])
# plt.xlim(['2016-5','2017-05']) # My attempt
plt.show()
When I uncomment the plt.xlim line above, I get this (image not included):
I would like the period from '2016-5' to '2017-05' to take up the whole plot so I can see more detail.
It seems to me that your xlim works well. However, if I understand your question correctly, you also need to adjust ylim to stretch the data vertically so that it fills the graph. Judging from your graph, (0, 100) is a reasonable choice, since the data within the specified period does not seem to exceed 100.
Try adding plt.ylim((0, 100)) together with your commented-out line.
Output:
with your plt.xlim(['2016-5','2017-05']) and plt.ylim((0,100))
with your plt.xlim(['2016-5','2017-05']) and plt.ylim((0,40))
As you can see, because the values vary so much over the period, a tight ylim cuts off data at later dates, while a loose one gives a less clear picture of the movement at earlier dates.
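Rather than guessing the y limits, they can be computed from the data that falls inside the chosen time frame. The following is a minimal sketch, not the asker's exact setup: it assumes a synthetic daily series in place of df['Close'] (so it runs without a download), uses ax.plot instead of Series.plot so the x-axis holds ordinary matplotlib dates, and the 5% padding is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for df['Close']: a daily series with a strong upward trend.
idx = pd.date_range("2016-01-01", "2020-12-31", freq="D")
close = pd.Series(np.linspace(1.0, 1000.0, len(idx)), index=idx, name="Close")

# The time frame to zoom into.
start, end = pd.Timestamp("2016-05-01"), pd.Timestamp("2017-05-31")
window = close.loc[start:end]  # only the data inside the time frame

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(close.index, close.values, label="Close")

# Horizontal zoom: limits given as timestamps.
ax.set_xlim(start, end)

# Vertical zoom: limits taken from the visible data, with 5% padding (arbitrary).
pad = 0.05 * (window.max() - window.min())
ax.set_ylim(window.min() - pad, window.max() + pad)
ax.legend()
```

Call plt.show() afterwards to display the figure. Any moving averages should still be computed on the full series before zooming, so that their values at the start of the window are not distorted.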
I want to overlay 30 plots, each showing the temperature of one day, so that I can compare how the temperature develops and how much it differs from one day to the next. The problem is that when I separate the data into the 30 days in pandas, every day's data set has a different length: for example, the first day has 54977 temperature readings, the second has 54988, and the third differs again.
In summary, I want to overlay the 30 plots so that the x-axis of the resulting graph uses the time ticks of the first day, and the other 29 plots simply line up with those ticks. The data can be cut to a common range so that all curves start and end at the same points; it does not matter if some hours or readings are lost. I just want something like the last image below.
This is my code so far. I'm not very good at Python, so please don't judge its length.
`
import pandas as pd
from datetime import date
import datetime as dt
import calendar
import numpy as np
import pylab as plt
import matplotlib.ticker as ticker
import seaborn as sns

datos = pd.read_csv("Jun2018T.txt", sep = ',', names=('Fecha', 'Hora', 'RADNETA', 'RADCORENT', 'RADCORSAL', 'RADINFENT', 'RADINFSAL', 'TEMP'))

datos['Hora'] = datos['Hora'].str[:9]
datos['Hora']

Dia01Jun2018 = datos[datos['Fecha'] == "2018-06-01"]

tiempo01=Dia01Jun2018['Hora']
temp01=Dia01Jun2018['TEMP']

imagen = plt.figure(figsize=(25,10))
plt.plot(tiempo01,temp01)
plt.xticks(np.arange(0, 54977, 7000)) #the number 54977 is the last data that the first day has, the second day has a different length an so on with the rest of the days
plt.xlabel("Tiempo (H:M:S)(Formato 24 Horas)")
plt.ylabel("Temperatura (K)")
plt.title("Día 01 Jun 2018")
plt.show()
imagen.savefig('D1JUN2018')
`
The code above is repeated for every day. A loop would probably be quicker, but I don't handle Python very well.
The result is this graph:
(image not included)
The graph that I want is this:
(image not included)
My data is represented in this form:
(image not included)
and these are the formats:
(image not included)
If I understood your question correctly and you want to plot all days in a single plot, you have to create one figure, call plt.plot() once for each day, and only then call plt.show(), so that the image includes all the plots made before. Try something like the code shown below.
(As I don't know your data, I don't know whether this code will work as is. The concept should be clear at least.)
import pandas as pd
from datetime import date
import datetime as dt
import calendar
import numpy as np
import pylab as plt
import matplotlib.ticker as ticker
import seaborn as sns

datos = pd.read_csv("Jun2018T.txt", sep = ',', names=('Fecha', 'Hora', 'RADNETA', 'RADCORENT', 'RADCORSAL', 'RADINFENT', 'RADINFSAL', 'TEMP'))

datos['Hora'] = datos['Hora'].str[:9]

imagen = plt.figure(figsize=(25,10))
# one plt.plot() call per day, all drawn on the same figure
for day in range(1,31):
    dia = datos[datos['Fecha'] == "2018-06-"+(f"{day:02d}")]
    # the column is named 'Hora' in read_csv above (not 'HORA')
    tiempo= pd.to_datetime(dia['Hora'], format='%H:%M:%S').dt.time
    temp= dia['TEMP']
    plt.plot(tiempo, temp)
#plt.xticks(np.arange(0, 54977, 7000))
plt.xlabel("Tiempo (H:M:S)(Formato 24 Horas)")
plt.ylabel("Temperatura (K)")
plt.title("Jun 2018")
plt.show()
imagen.savefig('JUN2018')
For the second part of your question:
As your data is stored with a timestamp, you can convert it to pandas time objects. When these are used for the plots, the x-axis should no longer have an offset between days. I've modified the tiempo = ... assignment in the code above accordingly.
The x-ticks should now automatically be in time mode.
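An alternative that avoids plotting time objects altogether is to convert every timestamp to hours since midnight, which gives all days the same numeric 0 to 24 scale regardless of how many samples each day has. The sketch below is only an illustration under stated assumptions: the data is synthetic (three days instead of thirty, with made-up temperatures), and the column name stamp is hypothetical. With the real file, a full timestamp could presumably be built by combining the Fecha and Hora columns before this step.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the file: three days, each with a different number of samples.
rng = np.random.default_rng(0)
frames = []
for day, n in zip(["2018-06-01", "2018-06-02", "2018-06-03"], [500, 480, 510]):
    stamps = pd.date_range(day, periods=n, freq="2min")
    temp = 290 + 5 * np.sin(np.linspace(0, np.pi, n)) + rng.normal(0, 0.2, n)
    frames.append(pd.DataFrame({"stamp": stamps, "TEMP": temp}))
datos = pd.concat(frames, ignore_index=True)

# Hours since midnight: the same 0-24 scale for every day.
midnight = datos["stamp"].dt.normalize()
datos["hours"] = (datos["stamp"] - midnight).dt.total_seconds() / 3600

fig, ax = plt.subplots(figsize=(12, 5))
# One line per calendar day, all on the same axes.
for day, dia in datos.groupby(datos["stamp"].dt.date):
    ax.plot(dia["hours"], dia["TEMP"], label=str(day))

ax.set_xlim(0, 24)
ax.set_xticks(range(0, 25, 3))
ax.set_xlabel("Hour of day")
ax.set_ylabel("Temperature (K)")
ax.legend()
```

Because the x values are plain numbers, the differing lengths of the daily data sets no longer matter, and ax.set_xlim can be used to cut all curves to a common range.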
I'm hoping to create a line graph which shows the changes to flowering and fruiting times (phenophases) from year to year. For each phenophase I'd like to plot the average Day of Year and, if possible, show the min and max for each year as an error bar. I've filtered down all the data I need in a few data frames, grouped it all in a sensible way, but I can't figure out how to get it all to plot. Here's a screen grab of where I'm at: Imgur
All the examples I've found that add error bars are based on formulas or on equal amounts over and under the value, but in my case the distances to the max and min will differ, so I'm not sure how to integrate that. Should I possibly just create a list of each column's data and feed that to plot? I'm playing with that now but not getting far.
Also, if anyone has general suggestions as to better ways to present this data I'm all ears. I've looked into Gantt plots but didn't get far with them, as this seems a bit more straight-forward just using matplotlib. I'm happy to put some demo data or the rest of my notebook up if anyone thinks that would help.
Edit: Here's some sample data and the code from my notebook: Gist
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
pd.set_option('display.max_columns', 40)
tick_spacing = 1
dfClean = df[['Site_Cluster', 'Species', 'Phenophase_Name',
'Phenophase_Status', 'Observation_Year', 'Day_of_Year']]
dfClean = dfClean[dfClean.Phenophase_Status == 1]
PhenoNames = ['Open flowers', 'Ripe fruits']
dfLakes = dfClean[(dfClean.Phenophase_Name.isin(PhenoNames))
& (dfClean.Site_Cluster == 'Lakes')
& (dfClean.Species == 'lapponica')]
dfLakesGrouped = dfLakes.groupby(['Observation_Year', 'Phenophase_Name'])
dfLakesReady = dfLakesGrouped.Day_of_Year.agg([np.min, np.mean, np.max]).round(0)
dfLakesReady = dfLakesReady.unstack()
print(dfLakesReady['mean'].plot())
Here's another answer:
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
rng = date_range(start='2015-01-01', periods=5, freq='D')
df = DataFrame({'y':np.random.normal(size=len(rng))}, index=rng)
y1 = df['y']
y2 = (y1*3)
sd1 = (y1*2).abs()  # error-bar sizes must be non-negative
sd2 = (y1*2).abs()
fig,(ax1,ax2) = plt.subplots(2,1,sharex=True)
_ = y1.plot(yerr=sd1, ax=ax1)
_ = y2.plot(yerr=sd2, ax=ax2)
Output:
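For error bars that differ above and below the mean, as the question asks, matplotlib's errorbar accepts a yerr array of shape 2 x N, where the first row holds the distances below each point and the second row the distances above. The sketch below assumes made-up day-of-year observations for a single phenophase; only the column names Observation_Year and Day_of_Year come from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up day-of-year observations, for illustration only.
obs = pd.DataFrame({
    "Observation_Year": [2012, 2012, 2012, 2013, 2013, 2013, 2014, 2014, 2014],
    "Day_of_Year":      [150, 160, 185, 140, 158, 162, 155, 170, 172],
})
stats = obs.groupby("Observation_Year")["Day_of_Year"].agg(["min", "mean", "max"])

# Asymmetric errors: row 0 = distance down to the min, row 1 = distance up to the max.
yerr = np.vstack([stats["mean"] - stats["min"], stats["max"] - stats["mean"]])

fig, ax = plt.subplots()
ax.errorbar(stats.index, stats["mean"], yerr=yerr, marker="o", capsize=4,
            label="Open flowers (synthetic)")
ax.set_xticks(stats.index)
ax.set_xlabel("Observation year")
ax.set_ylabel("Day of year")
ax.legend()
```

A second phenophase could be drawn with another errorbar call on the same axes, using its own min/mean/max table.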
If I run the following, it appears to work as expected, but the y-axis is limited to the earliest and latest times in the data. I want it to show midnight to midnight. I thought I could do that with the code that's commented out. But when I uncomment it, I get the correct y-axis, yet nothing plots. Where am I going wrong?
from datetime import datetime
import matplotlib.pyplot as plt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
x = []
y = []
for i in range(0, len(data)):
    t = datetime.strptime(data[i], '%Y-%m-%d %H:%M:%S')
    x.append(t.strftime('%Y-%m-%d')) # X-axis = date
    y.append(t.strftime('%H:%M:%S')) # Y-axis = time
plt.plot(x, y, '.')
# begin = datetime.strptime('00:00:00', '%H:%M:%S').strftime('%H:%M:%S')
# end = datetime.strptime('23:59:59', '%H:%M:%S').strftime('%H:%M:%S')
# plt.ylim(begin, end)
plt.show()
Edit: I also noticed that the x-axis isn't right either. The data skips Jan 2, but I want that on the axis so the data is to scale.
This is a dramatically simplified version of code dealing with over a year's worth of data with over 2,500 entries.
If Pandas is available to you, consider this approach:
import pandas as pd
data = pd.to_datetime(data, yearfirst=True)
plt.plot(data.date, data.time)
_=plt.ylim(["00:00:00", "23:59:59"])
Update per comments
X-axis date formatting can be adjusted using the Locator and Formatter classes of the matplotlib.dates module. A Locator finds the tick positions, and a Formatter specifies how the labels appear.
Sometimes Matplotlib/Pandas just gets it right, other times you need to call out exactly what you want using these extra methods. In this case, I'm not sure why those numbers are showing up, but this code will remove them.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time)
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
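A likely reason the question's version plots nothing is that matplotlib treats strings as categories rather than as dates or times, so string limits do not land where the data is. A pandas-free alternative is sketched below: the x values are real date objects (so Jan 2 gets its place on the axis) and the y values are plain hours since midnight. The three-hour tick spacing and the label format are arbitrary choices for this sketch.

```python
from datetime import datetime

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter, MultipleLocator

data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44',
        '2018-01-03 15:30:27', '2018-01-04 11:55:09']
stamps = [datetime.strptime(s, '%Y-%m-%d %H:%M:%S') for s in data]

# x: calendar dates (real date objects, so the axis is to scale)
days = [t.date() for t in stamps]
# y: time of day as a number of hours since midnight
hours = [t.hour + t.minute / 60 + t.second / 3600 for t in stamps]

fig, ax = plt.subplots()
ax.plot(days, hours, '.')

# Midnight to midnight, one tick every three hours, labelled as HH:00.
ax.set_ylim(0, 24)
ax.yaxis.set_major_locator(MultipleLocator(3))
ax.yaxis.set_major_formatter(FuncFormatter(lambda h, pos: f"{int(h):02d}:00"))

# One tick per day on the x-axis, including days without data.
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
```

For a year's worth of data, a DayLocator would produce far too many ticks; something like mdates.MonthLocator() is probably a better fit there.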
How can I format the x-axis so that the spacing between periods is "to scale"? That is, the distance between 10yr and 30yr should be much larger than the distance between 1yr and 2yr.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import Quandl as ql
yield_ = ql.get("USTREASURY/YIELD")
today = yield_.iloc[-1,:]
month_ago = yield_.iloc[-1000,:]
df = pd.concat([today, month_ago], axis=1)
df.columns = ['today', 'month_ago']
df.plot(style={'today': 'ro-', 'month_ago': 'bx--'},title='Treasury Yield Curve, %');
plt.show()
I want my chart to look like this...
I think doing this while staying purely within pandas might be tricky. You first need to create a new matplotlib figure and axes. The following might not work exactly, but it will give you a good idea.
df['years']=[1/12.,0.25,0.5,1,2,3,5,7,10,20,30]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
df.plot(x='years',y='today',ax=ax,kind='scatter')
df.plot(x='years',y='month_ago',ax=ax,kind='scatter')
plt.show()
If you want your axis labels to look like those in your chart, you'll also need to set the lower and upper limits of your axis so that they look good, and then do something like:
ax.set_xticklabels(list(df.index))
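Tick labels only line up with the data if the tick positions are set as well, so set_xticklabels is normally paired with set_xticks. The sketch below puts the pieces together under explicit assumptions: the yields are invented placeholders (not real Treasury data), the maturity labels are assumed to match the columns of the downloaded table, and ticks are shown only from one year upwards because the shorter maturities would overlap on a to-scale axis.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed maturity labels and their length in years.
labels = ['1 MO', '3 MO', '6 MO', '1 YR', '2 YR', '3 YR',
          '5 YR', '7 YR', '10 YR', '20 YR', '30 YR']
years = [1 / 12, 0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30]

# Placeholder yields, invented for illustration (not real Treasury data).
df = pd.DataFrame({
    'years': years,
    'today': [0.2, 0.3, 0.4, 0.6, 0.9, 1.1, 1.5, 1.8, 2.0, 2.4, 2.7],
    'month_ago': [0.1, 0.2, 0.3, 0.5, 0.8, 1.0, 1.4, 1.7, 1.9, 2.3, 2.6],
}, index=labels)

fig, ax = plt.subplots(figsize=(10, 4))
# Plot against the numeric maturity so the spacing is to scale.
ax.plot(df['years'], df['today'], 'ro-', label='today')
ax.plot(df['years'], df['month_ago'], 'bx--', label='month_ago')

# Put ticks at the maturities themselves and label them with the original names.
shown = df[df['years'] >= 1]
ax.set_xticks(shown['years'].tolist())
ax.set_xticklabels(shown.index.tolist())
ax.set_title('Treasury Yield Curve, %')
ax.legend()
```

Using ax.plot with a line style keeps the 'ro-' and 'bx--' look of the question's chart, which kind='scatter' would not.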
I have a question that sounds simple, but it has been driving me mad for days. I have a historical time series stored in two lists: the first contains prices, say P = [1, 1.5, 1.3 ...], and the second contains the corresponding dates, say D = [01/01/2010, 02/01/2010...]. I would like to show only SOME of these dates as tick labels, with more detail appearing as I zoom in. The best result I have got so far shows all of them as ticks, which creates a black cloud of unreadable labels on the x-axis. This picture currently shows the automatic numeric range chosen by Matplotlib:
Instead of 0, 200, 400, etc., I would like to have the dates that correspond to the plotted data points. Moreover, when I zoom in I get the following:
Just as I get the detail between 0 and 200 (20, 40, etc.), I would like to get the dates from the list.
I'm sure this is a simple problem to solve, but I'm new to Matplotlib as well as to Python, and any hint would be appreciated. Thanks in advance.
Matplotlib has sophisticated support for plotting dates. I'd recommend the use of AutoDateFormatter and AutoDateLocator. They are even locale-specific, so they choose month-names according to your locale.
import matplotlib.pyplot as plt
from matplotlib.dates import AutoDateFormatter, AutoDateLocator
xtick_locator = AutoDateLocator()
xtick_formatter = AutoDateFormatter(xtick_locator)
ax = plt.axes()
ax.xaxis.set_major_locator(xtick_locator)
ax.xaxis.set_major_formatter(xtick_formatter)
EDIT
For use with multiple subplots, use multiple locator/formatter pairs:
import datetime
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import AutoDateFormatter, AutoDateLocator, date2num
x = [datetime.datetime.now() + datetime.timedelta(days=30*i) for i in range(20)]
y = np.random.random((20))
for i in range(4):
    ax = plt.subplot(2,2,i+1)
    # each subplot gets its own locator/formatter pair
    xtick_locator = AutoDateLocator()
    xtick_formatter = AutoDateFormatter(xtick_locator)
    ax.xaxis.set_major_locator(xtick_locator)
    ax.xaxis.set_major_formatter(xtick_formatter)
    ax.plot(date2num(x),y)
plt.show()
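For the two lists in the question, the missing step is converting the date strings into datetime objects before plotting. Once the x values are real dates, the automatic locator can pick a readable subset of ticks and refine it on zoom. A minimal sketch follows, assuming the strings are in day/month/year order (the question does not say) and using short hypothetical lists in place of the real data.

```python
from datetime import datetime

import matplotlib.pyplot as plt
from matplotlib.dates import AutoDateFormatter, AutoDateLocator

# Hypothetical stand-ins for the question's lists; day/month/year order is assumed.
D = ['01/01/2010', '02/01/2010', '03/01/2010', '04/01/2010', '05/01/2010']
P = [1.0, 1.5, 1.3, 1.4, 1.6]

# Convert the strings to datetime objects so matplotlib knows the x-axis holds dates.
dates = [datetime.strptime(d, '%d/%m/%Y') for d in D]

fig, ax = plt.subplots()
ax.plot(dates, P)

# The locator picks tick positions for the current zoom level; the formatter labels them.
locator = AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(AutoDateFormatter(locator))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

Call plt.show() to open the interactive window; the tick labels should then change as the view is zoomed.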
You can make a time series plot with pandas.
For details, refer to http://pandas.pydata.org/pandas-docs/dev/timeseries.html and
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.Series.plot.html
import matplotlib.pyplot as plt
import pandas as pd

DateStrList = ['01/01/2010','02/01/2010']
P = [2,3]
D = pd.to_datetime(DateStrList)  # DatetimeIndex built from the date strings
series = pd.Series(P, index=D)
# pandas.TimeSeries has been removed from pandas; a Series with a DatetimeIndex replaces it
series.plot()
plt.show()