How to add values from one column to timestamp column as days? - python

I have a table df:
date day
2021-07-25 1
2021-07-29 1
2021-07-30 1
I want to filter it this way:
df[df['device_install_date'] + df['lifetime'] < '2021-07-27']
But it brings error:
TypeError: can only concatenate str (not "int") to str
How can I do that? The values in the day column should be added to date as days. The day column has type pandas.core.series.Series.

To do this, you have to make sure you're using the right data types. In your example I'm assuming the date column is meant to hold datetimes, so to increment it by days you need to convert the day counts to timedeltas, add them to the dates, and then compare the result to a date, like so:
import pandas as pd
df = pd.DataFrame([
{'date':'2021-07-25','days':1},
{'date':'2021-07-29','days':1},
{'date':'2021-07-30','days':1}
])
df['date'] = pd.to_datetime(df['date'])
df['days'] = pd.to_timedelta(df['days'],unit='D')
query = df[(df['date']+df['days']).dt.date<pd.to_datetime('2021-07-27').date()]
print(query)
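For reference, here is a minimal sketch that uses the column names from the filter expression in the question (device_install_date and lifetime) and starts from string dates, which is what the TypeError suggests. The sample values come from the table in the question. Comparing against a pd.Timestamp instead of going through .dt.date is a choice made for this sketch, not part of the answer above.

```python
import pandas as pd

# Sample data mirroring the question; the dates start out as plain strings.
df = pd.DataFrame({
    "device_install_date": ["2021-07-25", "2021-07-29", "2021-07-30"],
    "lifetime": [1, 1, 1],
})

# Parse the strings once, then turn the integer day counts into timedeltas.
install = pd.to_datetime(df["device_install_date"])
lifetime = pd.to_timedelta(df["lifetime"], unit="D")

# datetime + timedelta is well defined, so the sum can be compared directly.
cutoff = pd.Timestamp("2021-07-27")
result = df[install + lifetime < cutoff]
print(result)
```

Only the first row survives the filter, because 2021-07-25 plus one day is the only sum that falls before the cutoff.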

Related

How to convert string to date in defined format based on defined conditions in Python Pandas?

I have Data Frame in Python Pandas like below:
col1
------
20002211
19980515
First four values are year
Next two values are month
Next two values are day
I need to replace a value in "col1" with 19000102 if its month part is not in the range 1-12, because we have only 12 months :)
Then I need to convert this string to a date, so the result should look like this:
col1
--------
1900-01-02
1998-05-15
This is because the first row was 20002211, where the month value is 22, and the calendar has only 12 months.
The second row was correct.
Use pd.to_datetime with errors='coerce' as a parameter.
If 'coerce', then invalid parsing will be set as NaT.
>>> pd.to_datetime(df['col'], format='%Y%m%d', errors='coerce') \
.fillna('1900-01-02')
0 1900-01-02
1 1998-05-15
Name: col, dtype: datetime64[ns]
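The same idea as a self-contained sketch, using the column name col1 from the question. Passing an explicit pd.Timestamp as the fill value is an assumption of this sketch, made so that the example does not rely on fillna converting a string for a datetime column.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["20002211", "19980515"]})

# Month 22 cannot be parsed, so errors="coerce" turns that row into NaT.
parsed = pd.to_datetime(df["col1"], format="%Y%m%d", errors="coerce")

# Replace the unparseable rows with the fallback date from the question.
fallback = pd.Timestamp("1900-01-02")
df["col1"] = parsed.fillna(fallback)
print(df)
```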

Filter a Dataframe by months and year in pandas

I have a dataframe from which I am getting month and year values in a column like the following:
temp dd delivery_month
0 02-2021 #mm-yyyy
1 01-2021
2 02-2021
3 02-2021
I want to use the month and year from all of the rows in that column to filter another dataframe
but I got the following error:
Can only use .dt accessor with datetimelike values
The date in the second dataframe is like the following:
allot_df['dispatch_date'][0] 06-01-2020
I tried the following but it doesn't work
for row in temp_dd.itertuples():
allot_df = allot_df[allot_df['dispatch_date'].dt.month == row.delivery_month[:2]]
where temp_dd is the first dataframe
How can I filter the allot_df dataframe by both month and year from temp_dd?
I think the problem is that the values are not datetimes. If the format is mm-YYYY, use:
allot_df['dispatch_date'] = pd.to_datetime(allot_df['dispatch_date'], format='%m-%Y')
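One way to do the filtering itself, sketched under a loud assumption: the sample value 06-01-2020 is read here as day-month-year (%d-%m-%Y), so swap the format string if the data is actually month-day-year. The rows in allot_df and the qty column are made up for illustration.

```python
import pandas as pd

# First frame: month-year strings, as in the question.
temp_dd = pd.DataFrame({"delivery_month": ["02-2021", "01-2021", "02-2021", "02-2021"]})

# Second frame: hypothetical rows; the date format is ASSUMED to be dd-mm-YYYY.
allot_df = pd.DataFrame({
    "dispatch_date": ["06-01-2020", "15-02-2021", "20-01-2021", "03-03-2021"],
    "qty": [10, 20, 30, 40],
})
allot_df["dispatch_date"] = pd.to_datetime(allot_df["dispatch_date"], format="%d-%m-%Y")

# Render each dispatch date as mm-YYYY and keep the rows whose month-year
# appears anywhere in temp_dd, which avoids looping over rows.
month_year = allot_df["dispatch_date"].dt.strftime("%m-%Y")
filtered = allot_df[month_year.isin(temp_dd["delivery_month"])]
print(filtered)
```

Comparing formatted strings keeps both month and year in a single key, so no row-by-row loop with a separate .dt.month comparison is needed.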

Pandas: Change column of integers to datetime and add a timestamp

I have a dataframe with an id column, and a date column made up of an integer.
d = {'id': [1, 2], 'date': [20161031, 20170930]}
df = pd.DataFrame(data=d)
id date
0 1 20161031
1 2 20170930
I can convert the date column to an actual date like so.
df['date'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
id date
0 1 2016-10-31
1 2 2017-09-30
But I need to have this field as a timestamp with hours, minutes, and seconds so that it is compatible with my database table. I don't care what the values are; we can keep it easy by setting them to zeros.
2016-10-31 00:00:00
2017-09-30 00:00:00
What is the best way to change this field to a timestamp? I tried
df['date'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d%H%M%S'))
but pandas didn't like that.
I think I could append six 0's to the end of every value in that field and then use the above statement, but I was wondering if there is a better way.
With pandas it is simpler and faster to convert entire columns. First convert to string and then to a timestamp:
pd.to_datetime(df['date'].apply(str))
PS: there are a few other conversion methods of varying performance: https://datatofish.com/fastest-way-to-convert-integers-to-strings-in-pandas-dataframe/
The problem seems to be that pd.to_datetime doesn't accept dates in this integer format:
pd.to_datetime(20161031) gives Timestamp('1970-01-01 00:00:00.020161031')
It assumes the integers are nanoseconds since 1970-01-01.
You have to convert to a string first:
df['date'] = pd.to_datetime(df["date"].astype(str))
Output:
id date
0 1 2016-10-31
1 2 2017-09-30
Note that these are datetimes, so they include a time component (all zeros in this case) even though it is not shown in the data frame representation above.
print(df.loc[0,'date'])
Out:
Timestamp('2016-10-31 00:00:00')
You can use
df['date'] = pd.to_datetime(df["date"].dt.strftime('%Y%m%d%H%M%S'))
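Putting the pieces together, here is a small sketch that converts the whole integer column at once with an explicit format and then renders the full timestamp as text. The strftime step is only needed if the database layer expects strings. That requirement is an assumption here, not something stated in the question.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "date": [20161031, 20170930]})

# Cast to string first so the digits are read as YYYYMMDD, not as nanoseconds.
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

# The values already carry a midnight time component; format it explicitly
# when a textual timestamp is required.
as_text = df["date"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(as_text.tolist())
```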

pandas change column type to datetime after group by

This is related to a previous question which I asked here (pandas average by timestamp and day of the week).
Here, I perform a groupby operation as follows:
df = pd.DataFrame(np.random.random(2838),index=pd.date_range('2019-09-13 12:40:00', periods=2838, freq='5T'))
# Reset the index
df.reset_index(inplace=True)
df.groupby(df.index.dt.strftime('%A %H:%M')).mean()
df.reset_index(inplace=True)
Now if I check the data types of the column, we have:
index object
0 float64
The column does not retain its datetime data type. How can I still preserve the column data type?
I wouldn't do grouping like that, instead, I would do double grouping/indexing:
days = df.index.day_name()
times = df.index.time
df.groupby([days,times]).mean()
which gives (head):
0
Friday 00:00:00 0.524322
00:05:00 0.857684
00:10:00 0.593461
00:15:00 0.755158
00:20:00 0.049511
where the first-level index holds the (string) day names, and the second-level index holds datetime.time objects.
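A runnable sketch of this two-level grouping, applied to the frame before reset_index is called so that the index is still a DatetimeIndex. The frequency is written as "5min", the spelled-out form of "5T", and the seeded random generator is an addition for reproducibility.

```python
import datetime

import numpy as np
import pandas as pd

# Same shape as the question's data; "5min" is the spelled-out form of "5T".
rng = np.random.default_rng(0)
index = pd.date_range("2019-09-13 12:40:00", periods=2838, freq="5min")
df = pd.DataFrame(rng.random(2838), index=index)

# Group on weekday name and time of day while the index is still datetime-like.
days = df.index.day_name()
times = df.index.time
result = df.groupby([days, times]).mean()

# The second index level holds datetime.time objects rather than strings.
first_key = result.index[0]
print(first_key, isinstance(first_key[1], datetime.time))
```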

pandas add a value column to datetime

I have this simple problem but for some reason it's giving me a headache. I want to add an existing Date column to another column to get a NewDate column.
For example: I have Date and n columns, and I want to add a NewDate column to my existing df.
df:
Date n NewDate (New Calculation here: Date + n)
05/31/2017 3 08/31/2017
01/31/2017 4 05/31/2017
12/31/2016 2 02/28/2017
I tried:
df['NewDate'] = (pd.to_datetime(df['Date']) + MonthEnd(n))
but I get an error saying "cannot convert the series to class 'int'".
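You're probably looking for an addition with a timedelta object. Note that unit='M' treats a month as a fixed average length (about 30.44 days), which is why the results drift away from month ends, and recent pandas versions no longer accept this unit.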
v = pd.to_datetime(df.Date) + (pd.to_timedelta(df.n, unit='M'))
v
0 2017-08-30 07:27:18
1 2017-06-01 17:56:24
2 2017-03-01 20:58:12
dtype: datetime64[ns]
At the end, you can convert the result back into the same format as before -
df['NewDate'] = v.dt.strftime('%m/%d/%Y')
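If the goal is the month-end result shown in the question, one option is to build a MonthEnd offset per row. This is a sketch of an alternative to the timedelta approach, not the answerer's method. The list comprehension is chosen for clarity rather than speed.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["05/31/2017", "01/31/2017", "12/31/2016"],
    "n": [3, 4, 2],
})

dates = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# MonthEnd(n) moves a date forward by n month ends, so each row needs
# its own offset built from that row's n.
new_dates = pd.Series(
    [d + pd.offsets.MonthEnd(int(n)) for d, n in zip(dates, df["n"])],
    index=df.index,
)
df["NewDate"] = new_dates.dt.strftime("%m/%d/%Y")
print(df)
```

For dates that are not already month ends, MonthEnd counts the roll to the end of the current month as the first step, so pd.DateOffset(months=n) may be the better fit for such data.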
