How to plot time series data in a multi-index dataframe? - python

I am trying to plot hypothetical yearly data using pandas or matplotlib.
I have the dataframe as below:
Data Frame
Here month and day are index levels, while minute and value are regular columns. I have tried
df.loc['AMD'].plot(x=col[0], y=col[1])
where col[0] is minute and col[1] is value.
What happens is that it superimposes the 00:00 values of all days at one place, so the plot effectively covers a single day with all days' values stacked on top of each other.
Can you guide me on how to plot it properly for the full year?
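One common fix is to rebuild a single datetime axis from the index levels plus the minute column, then plot against that. A runnable sketch with toy data shaped like the question (the column and level names month, day, minute, value are assumed from the screenshot, and the year is invented since the data has none):

```python
import pandas as pd

# Toy data shaped like the question: a (month, day) MultiIndex with
# 'minute' and 'value' columns (names assumed from the screenshot).
rows = [
    (1, 1, "00:00", 10), (1, 1, "12:00", 12),
    (1, 2, "00:00", 11), (1, 2, "12:00", 14),
    (2, 1, "00:00", 13), (2, 1, "12:00", 9),
]
df = pd.DataFrame(rows, columns=["month", "day", "minute", "value"])
df = df.set_index(["month", "day"])

# Rebuild one continuous datetime axis instead of plotting minute-of-day,
# so points from different days no longer land on top of each other.
flat = df.reset_index()
flat["timestamp"] = pd.to_datetime(
    "2023-"                                  # assumed year; the data has none
    + flat["month"].astype(str).str.zfill(2)
    + "-"
    + flat["day"].astype(str).str.zfill(2)
    + " "
    + flat["minute"]
)
flat = flat.set_index("timestamp").sort_index()
flat["value"].plot()  # one line spanning the whole year
```

With a real year-long dataframe the same reset-and-combine step gives one monotonically increasing time axis for the plot.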

Related

How to show dates instead of months in plots for a column with datetime type in Python?

I have a dataframe with columns: Week and Spend.
For instance:
Week Spend
2019-01-13 600
2018-12-30 400
The Week column is in datetime format.
When I create a line chart to see the trend, I get a plot with x-axis labels as follows:
I want the dates shown as-is instead of being aggregated into months, because I want to see the weekly change.
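One way to do this is to control the tick locator and formatter explicitly via matplotlib.dates, so the axis shows one label per week rather than auto-aggregated month labels. A sketch with toy data matching the question's columns:

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Toy weekly data matching the question's Week/Spend columns.
df = pd.DataFrame({
    "Week": pd.to_datetime(["2018-12-30", "2019-01-06", "2019-01-13"]),
    "Spend": [400, 500, 600],
})

fig, ax = plt.subplots()
ax.plot(df["Week"], df["Spend"])

# Force a tick on every week and label it with the full date,
# instead of letting matplotlib collapse the labels into months.
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.SU))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()  # rotate labels so they fit
```

The byweekday argument is set to Sunday here because the sample dates are Sundays; adjust it to whatever weekday your Week column falls on.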

Pandas: Resampling Hourly Data for each Group

I have a dataframe that contains GPS locations of vehicles received at various times in a day. For each vehicle, I want to resample to hourly data so that I have the median report (according to the timestamp) for each hour of the day. For hours with no corresponding rows, I want a blank row.
I am using the following code:
for i, j in enumerate(df.id.unique()):
    data = df.loc[df.id == j]
    data['hour'] = data['timestamp'].dt.hour
    data_grouped = data.groupby(['imo', 'hour']).median().reset_index()
    data = data_grouped.set_index('hour').reindex(idx).reset_index()  # idx is a list of integers from 0 to 23
Since my dataframe has millions of ids, it takes a lot of time to iterate through all of them. Is there a more efficient way of doing this?
Unlike Pandas reindex dates in Groupby, I have multiple rows for each hour, in addition to some hours having no rows at all.
Tested in the latest version of pandas: convert the hour column to a categorical with all possible categories, then aggregate without a loop:
df['hour'] = pd.Categorical(df['timestamp'].dt.hour, categories=range(24))
df1 = df.groupby(['id','imo','hour']).median().reset_index()
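A minimal runnable version of this idea, with invented id/imo/position values since the original data isn't shown (observed=False is passed explicitly so all 24 hour categories appear even where no rows exist):

```python
import pandas as pd

# Invented sample: a few GPS reports (id/imo/lat values are made up).
df = pd.DataFrame({
    "id": [1, 1, 2],
    "imo": [9, 9, 7],
    "timestamp": pd.to_datetime(
        ["2020-01-01 00:15", "2020-01-01 00:45", "2020-01-01 05:30"]
    ),
    "lat": [10.0, 12.0, 20.0],
})

# All 24 hours become categories, so empty hours still get a (NaN) row.
df["hour"] = pd.Categorical(df["timestamp"].dt.hour, categories=range(24))
df1 = (
    df.groupby(["id", "imo", "hour"], observed=False)
      .median(numeric_only=True)
      .reset_index()
)
```

Each group now carries one row per hour of the day, with NaN medians for hours that had no reports, replacing the per-id reindex loop.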

How can I group monthly over years in Python with pandas?

I have a dataset ranging from 2009 to 2019. The Dates include Years, months and days. I have two columns: one with dates and the other with values. I need to group my Dataframe monthly summing up the Values in the other column. At the moment what I am doing is setting the date column as index and using "df.resample('M').sum()".
The problem is that this is grouping my Dataframe monthly but for each different year (so I have 128 values in the "date" column). How can I group my data only for the 12 months without taking into consideration years?
Thank you very much in advance
I attached two images as example of the Dataset I have and the one I want to obtain.
Dataframe I have
Dataframe I want to obtain
Use dt.month on your date column. For example:
df.groupby(df['date'].dt.month).agg({'value':'sum'})
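A runnable sketch of the same call with made-up dates spanning two years, showing that the year is ignored:

```python
import pandas as pd

# Made-up daily data spanning multiple years.
df = pd.DataFrame({
    "date": pd.to_datetime(["2009-01-15", "2010-01-20", "2009-03-05"]),
    "value": [10, 20, 5],
})

# Grouping on dt.month ignores the year, so January 2009 and
# January 2010 fall into the same bucket.
monthly = df.groupby(df["date"].dt.month).agg({"value": "sum"})
```

Unlike resample('M'), which keeps one bucket per calendar month of each year, this yields at most 12 rows.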

How to plot the correlation coefficient for every last 30 days of two dataframe columns over the past year and plot it? (pandas)

Through the following code, I get 1 year of price history for both ETH and BTC. I know how to compute the correlation of the two close columns over the full 12 months, but how do I get the 30-day correlation coefficient for each day of the year and plot it?
def get_price(pair):
    df = binance.fetch_ohlcv(pair, timeframe="1d", limit=365)
    df = pd.DataFrame(df).rename(columns={0: "date", 1: "open", 2: "high", 3: "low", 4: "close", 5: "vol"})
    df.set_index("date", inplace=True)
    df.index = pd.to_datetime(df.index, unit="ms") + pd.Timedelta(hours=8)
    return df

eth = get_price("ETH/USDT")
btc = get_price("BTC/USDT")
btc["close"].corr(eth["close"])
I tried the following code but am not sure if it is correct:
btc["corre"] = btc["close"].rolling(30).corr(eth["close"].rolling(30))
You can group by month, deriving the month from your index, then subset the groupby to the two columns you want to correlate:
btc.groupby(btc.index.month)[['Val1','Val2']].corr()
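For the 30-day rolling correlation itself, a sketch with synthetic random-walk prices standing in for the exchange data (binance.fetch_ohlcv needs network access):

```python
import numpy as np
import pandas as pd

# Synthetic year of daily "prices" in place of real exchange data.
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", periods=365, freq="D")
btc = pd.DataFrame({"close": 20000 + rng.normal(0, 100, 365).cumsum()}, index=dates)
eth = pd.DataFrame({"close": 1500 + rng.normal(0, 10, 365).cumsum()}, index=dates)

# rolling(30).corr takes a plain Series as the other argument,
# not another Rolling object.
btc["corre"] = btc["close"].rolling(30).corr(eth["close"])
btc["corre"].plot()  # first 29 points are NaN (incomplete window)
```

This gives one correlation value per day, each computed over the preceding 30-day window, which is what the question describes.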

How can I group by a date range and sum up dates that fall within the range?

I am using pandas dataframe and matplotlib to create a time series plot.
I have some dates in my dataframe that I need to group by weekly frequency. I tried groupby(dt.week).count(), but my dates span multiple years, so the ticks show nothing other than the week number (1-52).
ct_weekly_freq = df2['authorTimestamp'].groupby(df2['authorTimestamp'].dt.week).count().plot(kind='line')
plt.show()
This would be an acceptable plot if it weren't for the fact that it spans multiple years. I would like to sum up the dates that fall within each weekly interval from the starting date and have the tick marks show the actual dates.
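One way to get real dates on the axis is to resample on the timestamp itself rather than grouping on the week number. A sketch with invented timestamps, since the original data isn't shown:

```python
import pandas as pd

# Invented commit-style timestamps spanning two years.
ts = pd.to_datetime(["2018-12-25", "2018-12-26", "2019-01-02", "2019-06-15"])
df2 = pd.DataFrame({"authorTimestamp": ts})

# resample('W') buckets by actual week-ending dates, so the x-axis
# shows dates rather than bare week numbers that repeat every year.
weekly = df2.set_index("authorTimestamp").resample("W").size()
weekly.plot(kind="line")
```

Because the bins are real dates, week 1 of 2018 and week 1 of 2019 stay separate, and the tick labels show calendar dates instead of 1-52.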
