I currently have a DataFrame, dfB, which looks as follows:
My goal now is to plot the number of orders per week. I am not sure how to go about grouping my column most_recent_order_date by week, though. How can I do this?
Convert the date column to datetime dtype if you haven't already.
dfB['most_recent_order_date'] = pd.to_datetime(dfB.most_recent_order_date)
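Then use resample. To count the rows (orders) that fall in each week, use size():
dfB.resample('W-MON', on='most_recent_order_date').size()
If dfB already has a numeric column holding order counts, sum it instead:
dfB.resample('W-MON', on='most_recent_order_date').sum()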
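A minimal, self-contained sketch of the weekly count. The DataFrame below is a hypothetical stand-in for dfB (the real data is only shown as an image), and it assumes each row is one order, so size() gives the number of orders per week. With 'W-MON', each week ends on, and is labelled by, a Monday.

```python
import pandas as pd

# Hypothetical stand-in for dfB: one row per order (real columns are not shown).
dfB = pd.DataFrame({
    "most_recent_order_date": ["2021-03-01", "2021-03-03", "2021-03-09",
                               "2021-03-15", "2021-03-16"],
    "order_id": [101, 102, 103, 104, 105],
})

# Make sure the date column has a datetime64 dtype.
dfB["most_recent_order_date"] = pd.to_datetime(dfB["most_recent_order_date"])

# Bin by weeks ending on Monday and count the rows (orders) in each bin.
orders_per_week = dfB.resample("W-MON", on="most_recent_order_date").size()
print(orders_per_week)

# To plot (requires matplotlib):
# orders_per_week.plot(kind="bar")
```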
I have a DataFrame with the following details (first df in the image):
I want to add new rows to the df that compute next_apt + days, stamped with the new timestamp at which the calculation was run. So I want it to look like this:
The other columns should be left as they are; just add the next next_apt value with the newer timestamp at which it was calculated, and append those rows to the same df.
Use date_add and cast the result to timestamp.
This should work:
from pyspark.sql import functions as F
df1.withColumn("newDateWithTimestamp", F.date_add(F.col("next_apt"), F.col("days")).cast("timestamp")).show()
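The Spark answer above adds a column rather than appending rows. For comparison only, here is a minimal sketch of appending the computed rows in pandas instead of Spark (a swapped-in library, not what the answer uses). The column names id and run_timestamp and all values are hypothetical, since the question's data is only shown in an image.

```python
import pandas as pd

# Hypothetical input; the real columns from the question's image are unknown.
df = pd.DataFrame({
    "id": [1, 2],
    "next_apt": pd.to_datetime(["2021-05-01", "2021-05-10"]),
    "days": [7, 14],
    "run_timestamp": pd.to_datetime(["2021-04-20 09:00", "2021-04-20 09:00"]),
})

# Build the new rows: shift next_apt by each row's own number of days
# and stamp them with the time this calculation was run.
new_rows = df.copy()
new_rows["next_apt"] = df["next_apt"] + pd.to_timedelta(df["days"], unit="D")
new_rows["run_timestamp"] = pd.Timestamp.now()

# Append below the original rows; the other columns are copied unchanged.
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
```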
I have the DataFrame below and I am trying to display how many rides there are per day.
But only one column, "near_penn", shows up as a column; "Date" does not.
c = df[['start day','near_penn','Date']]
c=c.loc[c['near_penn']==1]
pre_pandemic_df_new=pd.DataFrame()
pre_pandemic_df_new=c.groupby('Date').agg({'near_penn':'sum'})
print(pre_pandemic_df_new)
print(pre_pandemic_df_new.columns)
Why doesn't it consider "Date" as a column?
How can I make Date a column of "pre_pandemic_df_new"?
I think you can use the to_datetime method. After the groupby, Date is the index, so convert the index:
import pandas as pd
pre_pandemic_df_new.index = pd.to_datetime(pre_pandemic_df_new.index)
Hope this works.
Why doesn't it consider "Date" as a column?
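Because Date is the index of your DataFrame: groupby moves the grouping key into the index.
How can I make Date a column of "pre_pandemic_df_new"?
You can try this (reset_index returns a new DataFrame, so assign the result back):
pre_pandemic_df_new = pre_pandemic_df_new.reset_index(level=['Date'])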
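A small reproducible sketch of two ways to keep Date as a column. The rides data is hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical rides data using the question's column names.
df = pd.DataFrame({
    "start day": ["Mon", "Mon", "Tue", "Wed"],
    "near_penn": [1, 1, 0, 1],
    "Date": ["2019-01-07", "2019-01-07", "2019-01-08", "2019-01-09"],
})

c = df.loc[df["near_penn"] == 1, ["start day", "near_penn", "Date"]]

# Option 1: groupby puts 'Date' in the index; reset_index() turns it back into a column.
rides = c.groupby("Date").agg({"near_penn": "sum"}).reset_index()

# Option 2: as_index=False keeps 'Date' as a regular column from the start.
rides_alt = c.groupby("Date", as_index=False).agg({"near_penn": "sum"})

print(rides)
print(list(rides.columns))
```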
df[['Date_new','near_penn_new']] = df[['Date','near_penn']]
Once you have created your DataFrame, you can try this to copy the columns into new columns at the end of the DataFrame and test that it works before you make adjustments.
OR
You can check the value in the first row to confirm that it corresponds to the first "Date" row.
These are the first things that came to mind; I hope it helps.
Hi, I have a question regarding resampling in pandas.
In my data I have a date range from 31/12/2018 to 25/3/2019 with an interval of 7 days (e.g. 31/12/2018, 7/1/2019, 14/1/2019, etc.). I want to resample the sales corresponding to those dates onto a new range of dates, say 30/4/2020 to 24/9/2020, with the same 7-day interval. Is there a way to do this with the pandas resample function? As shown in the picture, I want to take the sales from the DataFrame on the left and populate the DataFrame on the right.
Just to be clear: the left dataframe consists of 13 rows and the right consists of 22 rows.
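Let's try this. First build the new weekly date range (ISO-format strings avoid day/month ambiguity, and freq='7D' gives one date per week instead of one per day):
dates = pd.date_range(start='2020-04-30', end='2020-09-24', freq='7D')
The new DataFrame can then be created from the old values. The index argument is needed because the two columns have different lengths: assuming df1 has the default 0 to 12 index, its 13 sales values are aligned to the first 13 of the 22 rows and the rest are filled with NaN. If you wish, you can apply df2.fillna(0), too.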
df2 = pd.DataFrame({"date": pd.date_range("2020-04-30", freq="7D", periods=22), "sales": df1.sales}, index=np.arange(22))
Or without using index:
df2 = pd.DataFrame({"date": pd.date_range("2020-04-30", freq="7D", periods=22), "sales": np.concatenate([df1.sales.values, np.zeros(9)])})
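A self-contained version of the approach above, assuming df1 holds the 13 weekly rows with a default integer index; the sales figures are made up. Because the mapping is positional (row i of the old range feeds row i of the new one), this sketch uses reindex rather than resample, which aggregates values into time bins and would not move them to a different year.

```python
import numpy as np
import pandas as pd

# Hypothetical df1: 13 weekly dates from 31/12/2018 to 25/3/2019 with made-up sales.
df1 = pd.DataFrame({
    "date": pd.date_range("2018-12-31", "2019-03-25", freq="7D"),
    "sales": np.arange(1, 14) * 10,
})

# Target: 22 weekly dates from 30/4/2020 to 24/9/2020.
new_dates = pd.date_range("2020-04-30", "2020-09-24", freq="7D")

# Copy the sales values by position; reindex pads the 9 extra rows with NaN,
# which fillna(0) then replaces with 0.
sales = df1["sales"].reset_index(drop=True).reindex(range(len(new_dates))).fillna(0)
df2 = pd.DataFrame({"date": new_dates, "sales": sales.to_numpy()})
print(df2)
```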
My DataFrame contains several rows for each date. In my Date column, the date is entered only for the first row of each day; the remaining rows of that day have an empty value. How can I fill all the empty Date values with the corresponding date?
Following is a snippet of my DataFrame:
In Python
df['Date']=df['Date'].replace({'':np.nan}).ffill()
In R
library(zoo)
df$Date[df$Date=='']=NA
df$Date=na.locf(df$Date)
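You can forward-fill (or backward-fill) the missing values. This only fills real missing values (NaN), so blank strings must be converted to NaN first. fillna(method='ffill') also works, but it is deprecated since pandas 2.1 in favour of ffill() and bfill().
# Say df is your dataframe
# To fill values forward use:
df = df.ffill()
# To fill values backward use:
df = df.bfill()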
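A runnable sketch of the Python approach, with hypothetical data. Blank strings are not treated as missing by pandas, which is why they are replaced with NaN before forward-filling; the final to_datetime step is optional.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the date is only filled on the first row of each day.
df = pd.DataFrame({
    "Date": ["2020-01-01", "", "", "2020-01-02", ""],
    "value": [5, 3, 8, 1, 4],
})

# Turn blanks into NaN, then copy the last seen date downwards.
df["Date"] = df["Date"].replace("", np.nan).ffill()

# Optionally parse the filled column as real dates.
df["Date"] = pd.to_datetime(df["Date"])
print(df)
```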
I am trying to combine two separate one-minute data series into a ratio, and then create Open High Low Close (OHLC) files for the ratio for each entire day. I bring in the two time series and create associated DataFrames using pandas. The time series have missing data, so I create a datetime variable in each file and then merge the files with pd.merge on the datetime variable. Up to this point everything is going fine.
Next I group the data by the date using groupby. I then feed the grouped data to a for loop that calculates the OHLC and feeds that into a new dataframe for each respective day. However, the newly populated dataframe uses the date (from the grouping) as the dataframe index and the sorting is off. The index data looks like this (even when sorted):
01/29/2013
01/29/2014
01/29/2015
12/2/2013
12/2/2014
In short, the sorting is done on the month only, not on the whole date, so it isn't chronological. My goal is to get the index sorted by date so that it is chronological. Perhaps I need to create a new column in the DataFrame referencing the index (I'm not sure how). Or maybe there is a way to tell pandas that the index holds dates, not just values? I tried various sort approaches, including sort_index, but since the index values don't seem to be treated as dates, the sort functions order them by month regardless of the year, and so my output file is out of order. More generally, I am not sure how to reference or manipulate the index of a pandas DataFrame, so any related material would be useful.
Thank you
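For reference, the pipeline described in the question can be sketched with made-up one-minute prices (the column names price_a and price_b are hypothetical). Grouping on the normalized datetime, rather than on date strings, keeps the daily keys as Timestamps, so the OHLC result comes out in chronological order; here GroupBy.ohlc() stands in for the hand-written loop.

```python
import pandas as pd

# Hypothetical one-minute timestamps spanning two days.
times = pd.to_datetime([
    "2013-12-02 09:30", "2013-12-02 09:31", "2013-12-02 09:32",
    "2014-01-29 09:30", "2014-01-29 09:31", "2014-01-29 09:32",
])
a = pd.DataFrame({"datetime": times, "price_a": [10.0, 12.0, 9.0, 20.0, 30.0, 10.0]})
# The second series is missing the 2013-12-02 09:31 minute.
b = pd.DataFrame({"datetime": times[[0, 2, 3, 4, 5]],
                  "price_b": [5.0, 3.0, 10.0, 10.0, 4.0]})

# Inner merge keeps only the minutes present in both series.
merged = pd.merge(a, b, on="datetime", how="inner")
merged["ratio"] = merged["price_a"] / merged["price_b"]

# Group by calendar day (as Timestamps) and compute open/high/low/close per day.
daily = merged.groupby(merged["datetime"].dt.normalize())["ratio"].ohlc()
print(daily)
```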
Years later...
This fixes the problem.
Here df is the DataFrame:
import pandas as pd
df.index = pd.to_datetime(df.index)  # convert the index to datetime objects
df = df.sort_index()  # sort by the converted datetime index
This should put the rows back into chronological order.
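A small reproduction of the problem and the fix, using a hypothetical one-column table indexed by the date strings from the question. Passing an explicit format to to_datetime (an addition, not part of the answer) avoids pandas having to guess between month-first and day-first.

```python
import pandas as pd

# Hypothetical daily table whose index holds date strings, as in the question.
ohlc = pd.DataFrame(
    {"close": [1.5, 1.2, 1.7, 1.1, 1.3]},
    index=["01/29/2013", "01/29/2014", "01/29/2015", "12/2/2013", "12/2/2014"],
)

# Sorting strings is lexicographic, so "12/2/2013" stays after "01/29/2015".
print(ohlc.sort_index().index.tolist())

# Parse as month/day/year, then sort chronologically.
ohlc.index = pd.to_datetime(ohlc.index, format="%m/%d/%Y")
ohlc = ohlc.sort_index()
print(ohlc)
```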