Populate dates in dataframe - python

I have a dataframe which contains the following records:
I need to fill this dataframe with rows for the dates that are not present in it.
After inserting the new dates, the timestamp column should cover the range from df.timestamp.iloc[0] to df.timestamp.iloc[-1].

You can use relativedelta() from the dateutil library (it is not part of datetime) along with str.split().
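If the dataframe is a pandas DataFrame, one alternative to stepping through dates by hand is to build the full range and reindex. The following is only a sketch under assumptions: the sample data and the `value` column are hypothetical, the timestamps are taken to be daily, unique, and sorted, and the new rows get NaN in the other columns.

```python
import pandas as pd

# Hypothetical sample data: 2021-01-03 and 2021-01-04 are missing.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-05"]),
    "value": [10, 20, 30],
})

# Full daily range from the first to the last timestamp.
full_range = pd.date_range(df["timestamp"].iloc[0], df["timestamp"].iloc[-1], freq="D")

# Reindex on the timestamp so the missing dates appear as new rows (NaN elsewhere).
filled = (
    df.set_index("timestamp")
      .reindex(full_range)
      .rename_axis("timestamp")
      .reset_index()
)
print(filled)
```

The NaN values in the inserted rows can then be filled with ffill(), fillna(0), or whatever suits the data.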

Related

How to add a column (integer type) to a column that is a date? PySpark / Spark dataframe

I have a dataframe with the following details: (first df in image)
I want to be able to add new rows to the df that calculate next_apt + days, together with the timestamp at which the calculation was run. So I want it to look like this:
The other columns should be left as they are. Just add the new next_apt with the newer timestamp at which it was calculated, and append the rows to the same df.
Use date_add and cast the result to timestamp.
This should work:
from pyspark.sql import functions as F
df1.withColumn("newDateWithTimestamp", F.date_add(F.col("next_apt"), F.col("days")).cast("timestamp")).show()

Iterating Date in python Dataframe

I realize this is probably a very trivial question, but I have a dataframe of 1000+ rows and I want to create a new column "Date" that holds a single date, "2018-01-31", in every row. I tried the code below, but Python just returns "Length of values (1) does not match length of index".
I would really appreciate any help!
Date = ['2018-01-31']
for i in range(len(Output)):
    Output['Date'] = Date
Assuming Output is the name of your pandas dataframe with 1000+ rows, you can do:
Output['Date'] = "2018-01-31"
or using the datetime library you could do:
from datetime import date
Output["Date"] = date(2018, 1, 31)
to store it as a date object rather than a string. You also do not need to iterate over the rows if you want the same value in every row. Assigning a single value to a new column sets that value for each row.
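As a small illustration of why the loop fails and the scalar assignment works, here is a sketch with a hypothetical five-row stand-in for Output. Using pd.Timestamp is an assumption on my part (it gives a datetime64 column); a list is accepted only when its length matches the number of rows.

```python
import pandas as pd

# Hypothetical stand-in for the 1000+ row dataframe from the question.
Output = pd.DataFrame({"x": range(5)})

# A scalar is broadcast to every row, so no loop is needed.
Output["Date"] = pd.Timestamp("2018-01-31")

# A list works only when it has exactly one entry per row.
Output["DateFromList"] = [pd.Timestamp("2018-01-31")] * len(Output)

print(Output.dtypes)
```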

Convert to datetime using column position/number in python pandas

A very simple query, but I did not find the answer on Google.
I have a df with timestamps in the Date column:
Date
22/11/2019 22:30:10 etc., which shows up as type object in df.dtypes.
Code:
df['Date']=pd.to_datetime(df['Date']).dt.date
Now I want the date to be converted to datetime using the column number rather than the column name. The column number in this case is 0 (I have very long column names and multiple similar files, so I want to convert the date column to datetime using its position, 0 in this case).
Can anyone help?
Use DataFrame.iloc to select the column (Series) by position:
df.iloc[:, 0] = pd.to_datetime(df.iloc[:, 0]).dt.date
It is also possible to look up the column name by position and index with it:
df[df.columns[0]] = pd.to_datetime(df[df.columns[0]]).dt.date
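Because the sample value is in day/month/year order, it may be safer to pass an explicit format so that ambiguous values such as 01/12/2019 are not read month-first. A minimal sketch, assuming hypothetical data and column names, and keeping the full datetime rather than calling .dt.date:

```python
import pandas as pd

# Hypothetical data: day/month/year strings in the first column.
df = pd.DataFrame({
    "A very long column name": ["22/11/2019 22:30:10", "01/12/2019 08:15:00"],
    "v": [1, 2],
})

# Look up the column name by position, then convert with an explicit format.
first_col = df.columns[0]
df[first_col] = pd.to_datetime(df[first_col], format="%d/%m/%Y %H:%M:%S")

print(df.dtypes)
```

Appending .dt.date to the converted Series would give plain date objects, as in the answer above.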

How to fill missing dates in pandas DataFrame?

My DataFrame contains several rows of data for each date. In my Date column, the date is entered only for the first row of each day; for the rest of that day's rows the value is left blank. How can I fill all the unfilled date values with the corresponding date?
The following is a snippet of my data frame:
In Python
import numpy as np
df['Date'] = df['Date'].replace({'': np.nan}).ffill()
In R
library(zoo)
df$Date[df$Date=='']=NA
df$Date=na.locf(df$Date)
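You can use the fill functions. Note that they only fill missing values (NaN), so blank strings must be replaced with NaN first. In recent pandas versions fillna(method=...) is deprecated in favour of ffill() and bfill().
# Say df is your dataframe
# To fill values forward use:
df = df.ffill()
# To fill values backward use:
df = df.bfill()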
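To make the two steps concrete (turn blanks into NaN, then forward fill), here is a short sketch on hypothetical data. It assumes the unfilled cells are empty strings; if they contain spaces or another placeholder, the value passed to replace would need to change.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the date appears only on the first row of each day.
df = pd.DataFrame({
    "Date": ["2020-01-01", "", "", "2020-01-02", ""],
    "value": [1, 2, 3, 4, 5],
})

# Empty strings are not treated as missing, so convert them to NaN first,
# then carry the last seen date forward.
df["Date"] = df["Date"].replace("", np.nan).ffill()

print(df)
```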

Add pandas Series to a DataFrame, preserving index

I have been having some problems adding the contents of a pandas Series to a pandas DataFrame. I start with an empty DataFrame, initialised with several columns (corresponding to consecutive dates).
I would like to then sequentially fill the DataFrame using different pandas Series, each one corresponding to a different date. However, each Series has a (potentially) different index.
I would like the resulting DataFrame to have an index that is essentially the union of each of the Series indices.
I have been doing this so far:
for date in dates:
    df[date] = series_for_date
However, my df index corresponds to that of the first Series and so any data in successive Series that correspond to an index 'key' not in the first Series are lost.
Any help would be much appreciated!
Ben
If I understand correctly, you can use concat:
pd.concat([series1,series2,series3],axis=1)
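A small sketch of how this aligns differing indices: the Series below and the date labels are hypothetical, and passing keys= to name the columns is my addition rather than part of the answer above.

```python
import pandas as pd

# Hypothetical Series, each with a different index.
s1 = pd.Series([1.0, 2.0], index=["a", "b"])
s2 = pd.Series([3.0, 4.0], index=["b", "c"])
s3 = pd.Series([5.0], index=["d"])
dates = ["2020-01-01", "2020-01-02", "2020-01-03"]

# axis=1 places each Series in its own column; the row index becomes the
# union of all Series indices, with NaN where a Series has no entry.
df = pd.concat([s1, s2, s3], axis=1, keys=dates)

print(df)
```

Collecting the Series in a list and concatenating once also avoids growing the DataFrame column by column inside the loop.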
