How to fill missing dates in pandas DataFrame? - python

My DataFrame contains several rows of data for each date. In my date column, the date is entered only in the first row of each day; the remaining rows of that day are blank. How can I fill all the empty date values with the corresponding date?
Following is a snippet of my data frame.

In Python:
import numpy as np
# Treat empty strings as missing, then forward-fill the last seen date
df['Date'] = df['Date'].replace({'': np.nan}).ffill()
In R:
library(zoo)
df$Date[df$Date == ''] = NA
df$Date = na.locf(df$Date)
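A self-contained sketch of the Python approach, assuming the blank cells are empty strings (column name Date as in the post):

```python
import numpy as np
import pandas as pd

# Only the first row of each day carries the date; the rest are blank
df = pd.DataFrame({
    "Date": ["2021-01-01", "", "", "2021-01-02", ""],
    "Value": [1, 2, 3, 4, 5],
})

# Replace empty strings with NaN, then forward-fill the last seen date
df["Date"] = df["Date"].replace({"": np.nan}).ffill()
print(df["Date"].tolist())
# ['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02']
```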

You can use the fillna function (or the dedicated ffill/bfill methods, since fillna(method=...) is deprecated in recent pandas).
# Say df is your dataframe
# To propagate values forward use:
df.ffill()  # older pandas: df.fillna(method='ffill')
# To propagate values backward use:
df.bfill()  # older pandas: df.fillna(method='bfill')
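For illustration, a minimal sketch of the difference between the two fill directions (toy data, not the asker's):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, np.nan, 4.0]})

# Forward fill copies the last valid value downward
print(df.ffill()["x"].tolist())  # [1.0, 1.0, 1.0, 4.0]
# Backward fill copies the next valid value upward
print(df.bfill()["x"].tolist())  # [1.0, 4.0, 4.0, 4.0]
```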

Related

How to add a column (integer type) to a column that is a date? Pyspark/Spark dataframe?

I have a dataframe with the following details: (first df in image)
I want to be able to add new rows to the df that compute next_apt + days, with the new timestamp at which it was run. So I want it to look like this:
The other columns should be left as they are; just compute the next next_apt with the newer timestamp and append the rows to the same df.
Use date_add and cast the result to timestamp.
This should work:
df1.withColumn("newDateWithTimestamp", F.date_add(F.col("next_apt"), F.col("days")).cast("timestamp")).show()
(Input and output are shown in images in the original post.)
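For readers without a Spark session at hand, a rough pandas equivalent of the date_add + cast step (column names next_apt and days taken from the post) would be:

```python
import pandas as pd

df = pd.DataFrame({
    "next_apt": ["2022-01-10", "2022-02-01"],
    "days": [7, 30],
})

# Add `days` days to `next_apt` and keep the result as a timestamp
df["newDateWithTimestamp"] = pd.to_datetime(df["next_apt"]) + pd.to_timedelta(df["days"], unit="D")
print(df["newDateWithTimestamp"].dt.strftime("%Y-%m-%d").tolist())
# ['2022-01-17', '2022-03-03']
```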

Populate dates in dataframe

I have a dataframe, which contains following records:
I need to fill this dataframe with rows for the dates which are not present in it.
After inserting the new dates, the timestamp column should stay in the range df.timestamp.iloc[0] to df.timestamp.iloc[-1].
You can use relativedelta() from the dateutil library along with split()
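The answer above uses relativedelta; an alternative, reindex-based sketch (assuming a daily frequency and a timestamp column, since the original data is not shown) could be:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-01", "2021-01-04", "2021-01-06"]),
    "value": [10, 40, 60],
})

# Build the full daily range between the first and last timestamps,
# then reindex so the missing dates appear as new (NaN-valued) rows
full_range = pd.date_range(df["timestamp"].iloc[0], df["timestamp"].iloc[-1], freq="D")
df = df.set_index("timestamp").reindex(full_range).rename_axis("timestamp").reset_index()
print(len(df))  # 6 rows: Jan 1 through Jan 6
```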

How to set pandas series index from a dataframe and fill the series with other data?

I have a pandas dataframe myDataFrame with many columns and a MultiIndex (two levels).
I want to create a series that has the same index as myDataFrame, and set a value at each row.
I was thinking of something along the lines of:
mySeries.index = myDataFrame.index  # a Series has no set_index method
for i in mySeries.index:
    mySeries.loc[i] = someValue
Thank you very much!
You can do
pd.Series(somevalue, index = df.index)
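A tiny sketch of that one-liner with a two-level MultiIndex like the asker describes (made-up index values):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["key", "num"])
df = pd.DataFrame({"col": [10, 20, 30]}, index=idx)

# Broadcast a single value over the dataframe's index, no loop needed
s = pd.Series(0, index=df.index)
print(s.tolist())  # [0, 0, 0]
```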

Collapsing values of a Pandas column based on Non-NA value of other column

I have data like this in a csv file which I am importing into a pandas df.
I want to collapse the values of the Type column by concatenating its strings into one sentence, keeping it in the first row next to the date value, while keeping all other rows and values the same.
As shown below.
Edit:
You can try ffill + transform:
df1 = df.copy()
# Forward-fill the group keys so every row knows its Number/Date
df1[['Number', 'Date']] = df1[['Number', 'Date']].ffill()
df1.Type = df1.Type.fillna('')
# Concatenate all Type strings within each (Number, Date) group
s = df1.groupby(['Number', 'Date']).Type.transform(' '.join)
# Keep the concatenated sentence only on the first row of each group
df.loc[df.Date.notnull(), 'Type'] = s
df.loc[df.Date.isnull(), 'Type'] = ''
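A self-contained sketch of that answer on made-up data (two groups; Number and Date filled only on each group's first row):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Number": [1, np.nan, np.nan, 2, np.nan],
    "Date": ["2020-01-01", np.nan, np.nan, "2020-01-02", np.nan],
    "Type": ["foo", "bar", "baz", "qux", "quux"],
})

df1 = df.copy()
# Forward-fill the group keys so every row knows its Number/Date
df1[["Number", "Date"]] = df1[["Number", "Date"]].ffill()
df1.Type = df1.Type.fillna("")
# One concatenated sentence per (Number, Date) group
s = df1.groupby(["Number", "Date"]).Type.transform(" ".join)
# Keep the sentence on the first row of each group, blank out the rest
df.loc[df.Date.notnull(), "Type"] = s
df.loc[df.Date.isnull(), "Type"] = ""
print(df["Type"].tolist())  # ['foo bar baz', '', '', 'qux quux', '']
```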

How to manipulate your Data Set based on the values of your index?

I have this dataset, wind_modified. In this dataset, the columns are locations and the index is the date; the values in the columns are wind speeds.
Let's say I want to find the average wind speed in January for each location. How do I use groupby or any other method to find the average?
Would it be possible without resetting the index?
Edit - [This][2] is the actual dataset. I have combined the three columns "Yr, Mo, Dy" into one, i.e. "DATE", and made it the index.
I imported the dataset using pd.read_fwf.
And "DATE" is of type datetime64[ns].
Sure. If you want all Januaries across all years, first filter them by boolean indexing and then take the mean:
#if necessary convert index to DatetimeIndex
#df.index = pd.to_datetime(df.index)
df1 = df[df.index.month == 1].mean().to_frame().T
Or if you need each year's January separately, filter first and then use groupby with DatetimeIndex.year, aggregating with mean:
df2 = df[df.index.month == 1]
df3 = df2.groupby(df2.index.year).mean()
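A minimal sketch of both variants on made-up data (two locations, two years; the location names are placeholders, not the real dataset's):

```python
import pandas as pd

idx = pd.to_datetime(["1961-01-10", "1961-01-20", "1961-06-15", "1962-01-05"])
df = pd.DataFrame({"RPT": [10.0, 14.0, 8.0, 20.0], "VAL": [5.0, 7.0, 3.0, 9.0]}, index=idx)

# All Januaries across all years, one mean per location
df1 = df[df.index.month == 1].mean().to_frame().T

# Each year's January separately
df2 = df[df.index.month == 1]
df3 = df2.groupby(df2.index.year).mean()
print(df3.loc[1961, "RPT"])  # (10 + 14) / 2 = 12.0
```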
