I am plotting the following pandas MultiIndex DataFrame:
print(log_returns_weekly.head())
AAPL MSFT TSLA FB GOOGL
Date Date
2016 1 -0.079078 0.005278 -0.155689 0.093245 0.002512
2 -0.001288 -0.072344 0.003811 -0.048291 -0.059711
3 0.119746 0.082036 0.179948 0.064994 0.061744
4 -0.150731 -0.102087 0.046722 0.030044 -0.074852
5 0.069314 0.067842 -0.075598 0.010407 0.056264
The first index level is the year and the second is the week within that year.
Plotting is simple with the pandas plot() method; however, the x-axis is not labeled in (year, week) format, i.e. (2016, 1), (2016, 2), etc. Instead, it just shows 'Date,Date'. Does anyone know how I can fix this?
log_returns_weekly.plot(figsize=(8, 8))
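You need to convert the MultiIndex to a single DatetimeIndex. Note that passing (year, week) straight into a date constructor, as in pd.datetime(*x, 1), treats the week number as a month and fails for any week above 12 (pd.datetime has also been removed in pandas 2.0). One option is to place each week at 1 January of its year plus (week - 1) weeks, so (2016, 1) becomes 2016-01-01:
log1 = log_returns_weekly.set_index(log_returns_weekly.index.map(lambda x: pd.Timestamp(year=x[0], month=1, day=1) + pd.Timedelta(weeks=x[1] - 1)))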
log1.plot()
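A self-contained sketch of two ways to get a usable x-axis from a (year, week) MultiIndex. The data and the level names year/week are hypothetical stand-ins for log_returns_weekly. Option 1 builds a real DatetimeIndex by offsetting from 1 January, which approximates each week's start rather than performing an ISO-calendar conversion; option 2 keeps the (year, week) pairs as readable string labels.

```python
import pandas as pd

# Hypothetical stand-in for log_returns_weekly: index levels are (year, week).
idx = pd.MultiIndex.from_tuples(
    [(2016, 1), (2016, 2), (2016, 3), (2016, 52), (2017, 1)], names=["year", "week"]
)
df = pd.DataFrame({"AAPL": [-0.079078, -0.001288, 0.119746, -0.150731, 0.069314]}, index=idx)

# Option 1: a real DatetimeIndex. Each week is placed at 1 January of its year plus
# (week - 1) weeks; an approximation of the week start, not an ISO-calendar conversion.
dates = pd.DatetimeIndex(
    [pd.Timestamp(year=int(y), month=1, day=1) + pd.Timedelta(weeks=int(w) - 1) for y, w in df.index]
)
by_date = df.set_index(dates)

# Option 2: keep the (year, week) structure but turn it into string labels such as "2016-W01".
labels = pd.Index([f"{int(y)}-W{int(w):02d}" for y, w in df.index], name="year_week")
by_label = df.set_index(labels)

print(by_date.index[:3])
print(list(by_label.index))
```

Either frame can then be plotted with .plot(figsize=(8, 8)); by_date should give a date axis, while by_label shows one category label per row.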
Related
I have a pandas DataFrame where one column contains the quarter and year as a string in the format Q12019.
My question: how do I convert this to datetime format?
You can use pandas PeriodIndex for this. First reformat the quarters column into a form PeriodIndex can parse, such as 2019Q1, using a regex to move the year to the front:
reformatted_quarters = df['QuarterYear'].str.replace(r'(Q\d)(\d+)', r'\2\1', regex=True)
print(reformatted_quarters)
This prints:
0 2019Q1
1 2018Q2
2 2019Q4
Name: QuarterYear, dtype: object
Then, feed this result to PeriodIndex to get the datetime format. Use 'Q' to specify a quarterly frequency:
datetimes = pd.PeriodIndex(reformatted_quarters, freq='Q').to_timestamp()
print(datetimes)
This prints:
DatetimeIndex(['2019-01-01', '2018-04-01', '2019-10-01'], dtype='datetime64[ns]', name='Quarter', freq=None)
Note: Pandas PeriodIndex functionality experienced a regression in behavior (documented here), so for Pandas versions greater than 0.23.4, you'll need to use reformatted_quarters.values instead:
datetimes = pd.PeriodIndex(reformatted_quarters.values, freq='Q').to_timestamp()
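A runnable sketch of the whole pipeline, with a small hypothetical df in place of the real data. The explicit regex=True matters on pandas 2.0 and later, where Series.str.replace treats the pattern as a literal string by default; without it the groups are never swapped.

```python
import pandas as pd

# Hypothetical sample of the QuarterYear column.
df = pd.DataFrame({"QuarterYear": ["Q12019", "Q22018", "Q42019"]})

# Move the year in front of the quarter: "Q12019" -> "2019Q1".
reformatted = df["QuarterYear"].str.replace(r"(Q\d)(\d+)", r"\2\1", regex=True)

# Parse as quarterly periods, then convert each period to the timestamp of its first day.
datetimes = pd.PeriodIndex(reformatted.values, freq="Q").to_timestamp()
print(datetimes)
```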
In JavaScript, the same conversion can be written as a one-liner:
(quarter) => new Date(quarter.slice(-4), 3 * (quarter.slice(1, 2) - 1), 1)
This will give you the start of every quarter (e.g. q42019 will give 2019-10-01).
You should probably add some validation, since out-of-range quarters just roll over into later months (e.g. q52019 = q12020 = 2020-01-01).
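For comparison, here is the same start-of-quarter arithmetic in Python, with the validation suggested above. The function name quarter_start is hypothetical; it accepts labels like Q42019 or q42019 and rejects quarters outside 1 to 4 instead of rolling them over into the next year.

```python
import re

import pandas as pd


def quarter_start(label: str) -> pd.Timestamp:
    """Return the first day of the quarter named by a label like 'Q42019' or 'q42019'."""
    match = re.fullmatch(r"[Qq]([1-4])(\d{4})", label.strip())
    if match is None:
        # Reject malformed labels and out-of-range quarters such as 'q52019'.
        raise ValueError(f"expected a label like 'Q12019' with quarter 1-4, got {label!r}")
    quarter, year = int(match.group(1)), int(match.group(2))
    # Quarter q starts in month 3 * (q - 1) + 1: Q1 -> Jan, Q2 -> Apr, Q3 -> Jul, Q4 -> Oct.
    return pd.Timestamp(year=year, month=3 * (quarter - 1) + 1, day=1)


print(quarter_start("q42019"))  # 2019-10-01 00:00:00
```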
I have a time series and want to sum the business-day values for each week. A snapshot of the DataFrame (df) is shown below. Note that 2017-06-01 is a Friday, so the missing days are weekends.
I use resample to group the data by week and then take the sum. However, I get results I can't explain: I expected the first row to be 0 (the sum of the values in the first week), then 15 for the next week, and so on.
df_resampled = df.resample('W', label='left').sum()
df_resampled.head()
Can someone explain what I am missing? It seems I have not understood the resample function correctly.
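A sketch of how resample('W') bins business days, using a hypothetical series of values 0, 1, 2, ... that starts on a Friday. 'W' is an alias for 'W-SUN': each bin runs up to and including a Sunday and is labeled with that Sunday by default, so label='left' only moves each label back to the previous Sunday without changing the sums. One thing that may be worth checking in the original data: 2017-06-01 is actually a Thursday, while 2017-01-06 is a Friday, so if the dates were written day-first, pandas may have parsed them month-first; passing an explicit format avoids that.

```python
import pandas as pd

# Hypothetical business-day series starting on Friday 2017-01-06, values 0, 1, 2, ...
idx = pd.bdate_range("2017-01-06", periods=11)
df = pd.DataFrame({"value": range(11)}, index=idx)

# 'W' == 'W-SUN': bins are closed on the right and labeled with their closing Sunday.
weekly = df.resample("W").sum()

# label='left' labels each bin with the previous Sunday instead; the sums are identical.
weekly_left = df.resample("W", label="left").sum()

# If the raw strings are year-day-month, parse them explicitly instead of relying on inference.
parsed = pd.to_datetime("2017-06-01", format="%Y-%d-%m")

print(weekly)
print(weekly_left)
print(parsed, parsed.day_name())
```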
I have daily data from 2004-01-01 to 2015-12-31 and want to plot the maximum and minimum value for each day of the year.
The original data is in df, and df['Day'] is a column holding the day and month.
So, I created two new dataframes:
dfmin=df.groupby('Day').agg('min')
dfmax=df.groupby('Day').agg('max')
Each new DataFrame has one row per day of the year, holding the minimum (or maximum) value for that day across the whole date range.
I want to label the x-axis with each day, without specifying any year.
I have already looked at these questions and this documentation but did not find the answer.
For example, I did:
observation_dates = np.arange('2013-01-01', '2014-01-01', dtype='datetime64[D]')
plt.plot(dfmin.index, dfmin.Data_Value)
plt.plot(dfmax.index, dfmax.Data_Value)
...
And created the following chart:
But I would like to do something like:
observation_dates = np.arange(' -01-01', ' -01-01', dtype='datetime64[D]')
...
So the axis would be labeled with just the days, without any year.
EDIT TO CLARIFY A LITTLE MORE:
After grouping the data by day, I get the following DataFrame (the blue line in the chart):
DAY Data_Value
01-01 -160
01-02 -267
01-03 -267
I just want to plot these values with dates on the x-axis.
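One common way to do this, sketched under assumptions: the index strings are taken to be "MM-DD" (swap the format to %d-%m if the day comes first), and dfmin/dfmax are small hypothetical stand-ins. Each label is attached to an arbitrary leap year (2016 here, so 02-29 stays valid) purely to give matplotlib real dates, and a DateFormatter then hides the year on the tick labels.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for dfmin / dfmax, indexed by "MM-DD" strings.
days = ["01-01", "01-02", "01-03", "02-29"]
dfmin = pd.DataFrame({"Data_Value": [-160, -267, -267, -100]}, index=days)
dfmax = pd.DataFrame({"Data_Value": [150, 155, 160, 200]}, index=days)

# Attach a dummy leap year; it is never displayed, it only makes the labels real dates.
x = pd.to_datetime([f"2016-{d}" for d in dfmin.index], format="%Y-%m-%d")

fig, ax = plt.subplots()
ax.plot(x, dfmin["Data_Value"], label="min")
ax.plot(x, dfmax["Data_Value"], label="max")

# One tick per month, labeled with month and day only.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
ax.legend()
```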
I have this pandas dataframe:
ISIN MATURITY PRICE
0 AR121489 Corp 29/09/2019 5.300
1 AR714081 Corp 29/12/2019 7.500
2 AT452141 Corp 29/06/2020 2.950
3 QJ100923 Corp 29/09/2020 6.662
Is there a way to interpolate on the "MATURITY" column to get the price for an arbitrary date? For example, if I select 18/11/2019, the price on that date should lie between 5.300 and 7.500. I don't know whether this is possible, but thank you for taking the time to read this and help.
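If you want prices interpolated at daily frequency, first parse the dates, then create a daily date range from the first to the last maturity, reindex onto it and interpolate. Note that concatenating on axis=1 aligns rows by position rather than by date, so reindexing on a date index is needed:
old_df["MATURITY"] = pd.to_datetime(old_df["MATURITY"], format="%d/%m/%Y")
daily = pd.date_range(old_df["MATURITY"].min(), old_df["MATURITY"].max(), freq="D")
new_df = old_df.set_index("MATURITY")[["PRICE"]].reindex(daily)  # NaN on days without a price
new_df["PRICE"] = new_df["PRICE"].interpolate(method="time")  # linear in elapsed time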
I would treat the dates as datetime objects and, for interpolation, convert them to a numeric time value, i.e. seconds or days since some reference time (20XX-XX-XX 00:00:00), doing the same for the dates you want to evaluate. After that, the interpolation also works with NumPy's interp function.
matplotlib.dates provides date2num and num2date, which are worth trying for this conversion.
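A minimal sketch of that approach using the data from the question. Instead of matplotlib's date2num, it measures days since the first maturity date with pandas, which gives an equivalent numeric axis; price_on is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

old_df = pd.DataFrame({
    "MATURITY": ["29/09/2019", "29/12/2019", "29/06/2020", "29/09/2020"],
    "PRICE": [5.300, 7.500, 2.950, 6.662],
})
maturity = pd.to_datetime(old_df["MATURITY"], format="%d/%m/%Y")

# Numeric time axis: days elapsed since the earliest maturity date (must be increasing for np.interp).
t0 = maturity.min()
x = (maturity - t0).dt.days.to_numpy(dtype=float)
y = old_df["PRICE"].to_numpy()


def price_on(date_str: str) -> float:
    """Linearly interpolate the price for a dd/mm/YYYY date between known maturities."""
    target = pd.to_datetime(date_str, format="%d/%m/%Y")
    return float(np.interp((target - t0).days, x, y))


print(price_on("18/11/2019"))  # between 5.300 and 7.500
```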
I have DataFrames of 1-minute bars going back years (the datetime is the index). I need to get the bars for a long, irregular (non-consecutive) list of dates.
For daily bars, I could do something like this:
datelist = ['20140101','20140205']
dfFiltered = df[df.index.isin(datelist)]
However, if I try that on 1-minute data, it only returns the bars at 00:00:00; in this case, just two bars: 20140101 00:00:00 and 20140205 00:00:00.
My actual source df will look something like:
df1m = pd.DataFrame(index=pd.date_range('20100101', '20140730', freq='1min'),
data={'open':3, 'high':4, 'low':1, 'close':2}
).between_time('00:00:00', '07:00:00')
Is there any better way to get all the bars for each day in the list than looping over the list? Thanks in advance.
One way is to add a date column based on the index:
df1m['date'] = pd.to_datetime(df1m.index.date)
Then filter on that column, parsing the list into timestamps so it compares against the datetime column:
datelist = pd.to_datetime(['20140101', '20140205'])
df1m[df1m['date'].isin(datelist)]
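A variant that avoids the helper column, sketched on a shorter date range than the question's df1m: DatetimeIndex.normalize() sets every timestamp to midnight, so each bar maps to its calendar day and isin keeps all bars whose day is in the list, with no loop.

```python
import pandas as pd

# Smaller version of df1m from the question: 1-minute bars between 00:00 and 07:00 each day.
df1m = pd.DataFrame(
    index=pd.date_range("20140101", "20140210", freq="1min"),
    data={"open": 3, "high": 4, "low": 1, "close": 2},
).between_time("00:00:00", "07:00:00")

datelist = pd.to_datetime(["20140101", "20140205"])

# normalize() maps each bar to midnight of its day; isin then matches whole days.
filtered = df1m[df1m.index.normalize().isin(datelist)]
print(len(filtered))
```

Each selected day contributes 421 bars here (00:00 through 07:00 inclusive), so two dates give 842 rows.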