I have daily data spanning 5 years. I want to plot the maximum value for each day of the year, so I grouped the data into a DataFrame with 365 rows:
dfmax=df.groupby('Day').agg('max')
And got the following dataframe:
       Data_Value
Day
01-01         150
01-02         180
01-03         160
I want to plot this with the dates as x-axis labels, but the code:
plt.plot(dfmax.index,dfmax.Data_Value)
returned an error: ValueError: could not convert string to float: '12-31'
I know that I have to convert the index to datetime format, and I saw some answers, but I did not find a way to do it without specifying a year in the label. I want the labels in month-day format.
EDIT:
An example to reproduce the error:
df=pd.DataFrame({'Day':['01-01', '02-01','03-01'], 'Value':[1,6,3]})
plt.figure()
plt.plot(df.Day, df.Value)
plt.show()
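One possible approach, as a minimal sketch rather than a definitive answer: prepend a placeholder year to each 'MM-DD' string so it parses as a real date, then format the ticks so that only month and day are shown. The year 2000 is an arbitrary assumption (chosen because it is a leap year, so '02-29' would also parse), and the variable name x is only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

df = pd.DataFrame({'Day': ['01-01', '02-01', '03-01'], 'Value': [1, 6, 3]})

# The placeholder year (2000, an assumption) only positions the points; it is never displayed
x = pd.to_datetime('2000-' + df['Day'], format='%Y-%m-%d')

fig, ax = plt.subplots()
ax.plot(x, df['Value'])
# Show month-day only on the tick labels
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
fig.autofmt_xdate()
```

Calling plt.show() afterwards displays the figure when an interactive backend is in use.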
Related
I have a dataframe with two columns: Week and Spend.
For instance:
Week Spend
2019-01-13 600
2018-12-30 400
The Week column is in datetime format.
When I create a line chart to see the trend, I get a plot with x-axis labels as follows:
I want the dates to be shown as is, instead of being aggregated as months. This is because I want to see the weekly change.
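A sketch of one way to get a tick per week, assuming the Week column is already datetime: place the ticks exactly at the dates present in the data and format them as full dates. The two middle rows below are made up to extend the sample from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Rows from the question plus two hypothetical rows (2019-01-06 and 2019-01-20)
df = pd.DataFrame({
    'Week': pd.to_datetime(['2018-12-30', '2019-01-06', '2019-01-13', '2019-01-20']),
    'Spend': [400, 520, 600, 480],
}).sort_values('Week')

fig, ax = plt.subplots()
ax.plot(df['Week'], df['Spend'], marker='o')
# One tick per row, located at that row's date
ax.set_xticks(mdates.date2num(df['Week'].to_numpy()))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

With many weeks the labels will crowd, so showing only every n-th date (for example df['Week'][::4]) may be a reasonable compromise.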
I have a dataset of S&P500 historical prices with the date, the price and other data that I don't need right now to solve my problem.
Date Price
0 1981.01 6.19
1 1981.02 6.17
2 1981.03 6.24
3 1981.04 6.25
. . .
and so on till 2020
The date is a float with the year, a dot and the month.
I tried to plot all historical prices with matplotlib.pyplot as plt.
plt.plot(df["Price"].tail(100))
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
This is the result. I used df["Price"].tail(100) so that you can see the difference between the first and the second graph more clearly (you will see it in a second).
But then I tried to change the index from the default one (0, 1, 2, etc.) to the df["Date"] column of the DataFrame in order to see the dates on the x-axis.
df = df.set_index("Date")
plt.plot(df["Price"].tail(100))
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
This is the result, and it's quite disappointing.
The dates are now where they should be on the x-axis, but the problem is that the graph differs from the previous one, which is the correct one.
If you need the dataset to try out the problem here you can find it.
It is called U.S. Stock Markets 1871-Present and CAPE Ratio.
Hope you've understood everything.
Thanks in advance
UPDATE
I found something that could be causing the problem. If you look closely at the dates, month 10 of each year is written as a float in the original dataset like this: for the year 1884, 1884.1. The problem occurs when you use pd.to_datetime() to convert the float Date series to datetimes: the October date becomes (in the example above) 1884-01-01, which is the first month of the year, and that affects the final plot.
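The effect described in the UPDATE can be reproduced on a tiny sample. This is only an illustrative sketch: the three values are picked by hand, and formatting with two decimals is one possible way to keep the trailing zero, not necessarily what the original poster ended up using.

```python
import pandas as pd

dates = pd.Series([1884.09, 1884.10, 1884.11])

# Converting the floats with astype(str) drops the trailing zero of October
as_str = dates.astype(str)

# Formatting with two decimals keeps it
fixed = dates.map('{:.2f}'.format)

parsed = pd.to_datetime(fixed, format='%Y.%m')
print(as_str.tolist())           # ['1884.09', '1884.1', '1884.11']
print(fixed.tolist())            # ['1884.09', '1884.10', '1884.11']
print(parsed.dt.month.tolist())  # [9, 10, 11]
```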
SOLUTION
Finally, I solved my problem!
Yes, the error was the one I explained in the UPDATE paragraph, so I decided to append a "0" wherever the length of the Date (as a string) is 6, changing, for example, 1884.1 ==> 1884.10
df["len"] = df["Date"].apply(len)
df["Date"] = df["Date"].where(df["len"] == 7, df["Date"] + "0")
Then I drop the len column I've just created.
df.drop(columns="len", inplace=True)
At the end I changed the "Date" to a Datetime with pd.to_datetime
df["Date"] = pd.to_datetime(df["Date"], format='%Y.%m')
df = df.set_index("Date")
And then I plot
df["Price"].tail(100).plot()
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
plt.show()
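As a side note, the same padding can be sketched without the helper column, assuming Date is already a string column as in the snippet above. Series.str.ljust pads on the right up to a given width, so only the six-character values change. The sample values are hand-picked for illustration.

```python
import pandas as pd

dates = pd.Series(['1884.09', '1884.1', '1884.11'])

# Right-pad every value to 7 characters with '0': '1884.1' -> '1884.10'
padded = dates.str.ljust(7, '0')

parsed = pd.to_datetime(padded, format='%Y.%m')
print(padded.tolist())           # ['1884.09', '1884.10', '1884.11']
print(parsed.dt.month.tolist())  # [9, 10, 11]
```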
The easiest way would be to transform the date into an actual datetime index. This way matplotlib will automatically pick it up and plot it accordingly. For example, given your date format, you could do:
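# format with two decimals first; astype(str) would turn 1981.10 into '1981.1', which parses as January
df["Date"] = pd.to_datetime(df["Date"].map("{:.2f}".format), format='%Y.%m')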
df = df.set_index("Date")
plt.plot(df["Price"].tail(100))
Currently, the first plot you showed is actually plotting the Price column against the index, which seems to be a regular range index from 0 to 1800 and something. Although your data starts in 1981, each observation is evenly spaced on the x-axis (at an interval of 1, which is the jump from one index value to the next). That's why the chart looks reasonable, even though the x-axis values don't.
Now when you set the Date (as a float) to be the index, note that you're no longer evenly covering the interval between, for example, 1981 and 1982. You have evenly spaced values from 1981.01 to 1981.12, but nothing between 1981.12 and 1982.01. That's why the second chart looks distorted, even though it is plotted exactly as the float values dictate. Setting the index to a DatetimeIndex as described above should remove this issue, as Matplotlib will know how to evenly space the dates along the x-axis.
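To make the uneven spacing concrete, here is a small sketch with four hand-picked months around a year boundary. It compares the gaps between consecutive float dates with the gaps between the corresponding real dates.

```python
import numpy as np
import pandas as pd

float_dates = np.array([1981.11, 1981.12, 1982.01, 1982.02])

# Gaps between consecutive float "dates": tiny within a year, large across the year boundary
float_gaps = np.round(np.diff(float_dates), 2)
print(float_gaps.tolist())  # [0.01, 0.89, 0.01]

# The same months as real dates are roughly evenly spaced
real_dates = pd.to_datetime(['1981.11', '1981.12', '1982.01', '1982.02'], format='%Y.%m')
day_gaps = np.diff(real_dates.to_numpy()).astype('timedelta64[D]').astype(int)
print(day_gaps.tolist())    # [30, 31, 31]
```

The jump of 0.89 between December and January is what stretches the second chart.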
I think your problem is that your Date is of float type, and using it as the x-axis does exactly what you would expect for an array like [2012.01, 2012.02, ..., 2012.12, 2013.01, ...]. You might convert the Date column to a DatetimeIndex first and then use the built-in pandas plot method:
df["Price"].tail(100).plot()
It is not a good idea to treat df['Date'] as float. It should be converted into pandas datetime64[ns]. This can be achieved using pandas pd.to_datetime method.
Try this:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('ie_data.csv')
df=df[['Date','Price']]
df.dropna(inplace=True)
#converting to pandas datetime format
df['Date'] = df['Date'].astype(float).map('{:.2f}'.format)  # two decimals, so 1884.1 becomes '1884.10'
df['Date'] = pd.to_datetime(df['Date'], format='%Y.%m')
df.set_index(['Date'],inplace=True)
#plotting
df.plot() #full data plot
df.tail(100).plot() #plotting just the tail
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
plt.show()
From a whole data set, I need to plot the maximum & minimum temperatures for just the months of January and July. Column 2 is the date, and columns 8 and 9 are the 'TMAX' and 'TMIN.' This is what I have so far:
napa3=pd.read_csv('MET 51 Lab #10 data (Pandas, NAPA).csv',usecols=[2,8,9])
time2=pd.to_datetime(napa3['DATE'],format='%Y-%m-%d')
imon=time2.dt.month
jj=(imon==1)&(imon==7)
data_jj=napa3.loc[jj]
data_jj.plot.hist(title='TMAX & TMIN for January and July')
plt.show()
I keep getting the error: "TypeError: no numeric data to plot"
Why is this?
The problem can arise because the dates are stored as "object" dtype, that is, as strings.
However, I can't see that you have created dataframe?! you do read_csv but you do not make dataframe out of that:
dnapa3 = pd.DataFrame(napa3)
Then convert your time data again and check the dtypes:
print(dnapa3.dtypes)
Once you are sure that the values in the column you need are strings or objects, you can convert that column to floats:
dnapa3['your_temp_column_label'] = dnapa3['your_temp_column_label'].astype(float)
Hopefully this works. Or, similarly:
dnapa3['your_temp_column_label'] = pd.to_numeric(dnapa3['your_temp_column_label'], errors='coerce')
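Another thing that may be worth checking in the question's code is the month mask. A month can never be 1 and 7 at the same time, so (imon==1)&(imon==7) selects no rows, and an empty selection leaves nothing numeric to plot. The sketch below uses a small made-up frame in place of the NAPA file (its values are hypothetical) and selects the rows with isin instead.

```python
import pandas as pd

# Hypothetical sample standing in for the NAPA csv; only the column names come from the question
napa3 = pd.DataFrame({
    'DATE': ['2015-01-10', '2015-03-05', '2015-07-21', '2016-01-02'],
    'TMAX': [55, 64, 88, 57],
    'TMIN': [38, 45, 56, 40],
})
time2 = pd.to_datetime(napa3['DATE'], format='%Y-%m-%d')
imon = time2.dt.month

both = (imon == 1) & (imon == 7)   # always False: no month is both January and July
either = imon.isin([1, 7])         # True for January or July

# Keep only the numeric temperature columns for plotting
data_jj = napa3.loc[either, ['TMAX', 'TMIN']]
print(both.sum(), len(data_jj))    # 0 3
```

From there, data_jj.plot.hist(title='TMAX & TMIN for January and July') should have numeric data to work with.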
I have daily data from 2004-01-01 until 2015-12-31 and want to plot the maximum and the minimum value for each day of the year.
The original data is in df, and df['Day'] is a column with the day and month.
So, I created two new dataframes:
dfmin=df.groupby('Day').agg('min')
dfmax=df.groupby('Day').agg('max')
The new dataframes have one row for each day of the year, holding the maximum and the minimum value for that day across the whole range.
I want to label the axis with each day, but without specifying any year.
I have already looked at these questions and this documentation, but did not find the answer.
For example, I did:
observation_dates = np.arange('2013-01-01', '2014-01-01', dtype='datetime64[D]')
plt.plot(dfmin.index, dfmin.Data_Value)
plt.plot(dfmin.index, dfmax.Data_Value)
...
And created the following chart:
But I would like to do something like:
observation_dates = np.arange(' -01-01', ' -01-01', dtype='datetime64[D]')
...
So the axis would be labeled just with the days, but without specifying any year.
EDIT TO CLARIFY A LITTLE MORE:
After grouping the data by day, I got the following dataframe (represented by the blue line in the chart):
DAY Data_Value
01-01 -160
01-02 -267
01-03 -267
I just want to plot these values with dates on the x-axis.
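A rough sketch of the whole pipeline, under stated assumptions: the daily values are random stand-ins, and the year 2013 is only a placeholder used to position the points (for data containing '02-29', a leap year would be needed). The minimum and maximum are computed in one groupby, and the ticks show month and day only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Hypothetical data: two years of random daily values
dates = pd.date_range('2013-01-01', '2014-12-31', freq='D')
rng = np.random.default_rng(0)
df = pd.DataFrame({'Day': dates.strftime('%m-%d'),
                   'Data_Value': rng.integers(-300, 300, len(dates))})

# One row per calendar day with both statistics
stats = df.groupby('Day')['Data_Value'].agg(['min', 'max'])

# Placeholder year (an assumption) so the 'MM-DD' labels become plottable dates
x = pd.to_datetime('2013-' + stats.index, format='%Y-%m-%d')

fig, ax = plt.subplots()
ax.plot(x, stats['min'], label='min')
ax.plot(x, stats['max'], label='max')
ax.xaxis.set_major_locator(mdates.MonthLocator())            # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b-%d'))  # no year in the label
ax.legend()
```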
I am plotting the following pandas MultiIndex DataFrame:
print(log_returns_weekly.head())
               AAPL      MSFT      TSLA        FB     GOOGL
Date Date
2016 1    -0.079078  0.005278 -0.155689  0.093245  0.002512
     2    -0.001288 -0.072344  0.003811 -0.048291 -0.059711
     3     0.119746  0.082036  0.179948  0.064994  0.061744
     4    -0.150731 -0.102087  0.046722  0.030044 -0.074852
     5     0.069314  0.067842 -0.075598  0.010407  0.056264
with the first sub-index representing the year, and the second one the week from that specific year.
This is simply achieved via the pandas plot() method; however, as seen below, the x-axis will not be in a (year, week) format i.e. (2016, 1), (2016, 2) etc. Instead, it simply shows 'Date,Date' - does anyone therefore know how I can overcome this issue?
log_returns_weekly.plot(figsize=(8,8))
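You need to convert your MultiIndex to a single datetime index by turning each (year, week) pair into a date such as 2016-01-04. Note that pd.datetime has been removed from recent pandas versions, and passing the week number where a month is expected would fail for weeks above 12, so the line below maps each pair to the Monday of that week, assuming ISO week numbering (requires Python 3.8+):
import datetime
log1 = log_returns_weekly.set_index(log_returns_weekly.index.map(lambda x: datetime.datetime.fromisocalendar(int(x[0]), int(x[1]), 1)))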
log1.plot()
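If you would rather keep the (year, week) pairs themselves as axis labels instead of converting them to dates, a possible sketch is to flatten the MultiIndex into strings. The small frame below is a hypothetical stand-in for log_returns_weekly, and the name flat and the level names Year and Week are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import pandas as pd

# Hypothetical frame with the same kind of (year, week) MultiIndex as in the question
idx = pd.MultiIndex.from_tuples([(2016, 1), (2016, 2), (2016, 3)], names=['Year', 'Week'])
log_returns_weekly = pd.DataFrame({'AAPL': [-0.079078, -0.001288, 0.119746]}, index=idx)

flat = log_returns_weekly.copy()
# Turn each (year, week) tuple into a label such as '(2016, 1)'
flat.index = [f'({year}, {week})' for year, week in flat.index]

ax = flat.plot(figsize=(8, 8))  # tick labels are taken from the string index
print(flat.index.tolist())      # ['(2016, 1)', '(2016, 2)', '(2016, 3)']
```

The trade-off is that the x-axis is then categorical, so gaps between missing weeks are not shown to scale.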