I am working on log data where I have to find the usage of a piece of software on a daily basis. For instance, suppose the log shows for a user: start time 04/01/2019 9:15 AM, end time 04/03/2019 12:00 PM. If I take the difference between these two timestamps, I get the usage for the whole span, not for a particular day. Is there a way to get the usage per day up to the end date?
The data has a form similar to the one shown below
and here is what I am trying to achieve
Since you didn't provide the original data, I created some fake data myself. I am also not sure from your description whether you mean to compare the start date with the end date. If I have misunderstood, please post a comment below.
In [10]: import pandas as pd
In [11]: import numpy as np
In [12]: df1 = pd.DataFrame({"A":[1,2], "Start":[20190302, 20190401], "End": [20190304, 20190402]})
In [13]: df1
Out[13]:
A Start End
0 1 20190302 20190304
1 2 20190401 20190402
In [14]: df2 = pd.DataFrame(df1.values.repeat((df1.End - df1.Start > 1) + 1, axis=0), columns=df1.columns)
In [15]: df2
Out[15]:
A Start End
0 1 20190302 20190304
1 1 20190302 20190304
2 2 20190401 20190402
If you need to compare actual dates, you may want to use something like the datetime library. For example:
In [28]: import datetime
In [29]: dt1 = datetime.datetime.strptime("11/30/2018 17:13", "%m/%d/%Y %H:%M")
In [30]: dt1
Out[30]: datetime.datetime(2018, 11, 30, 17, 13)
In [31]: dt2 = datetime.datetime.strptime("11/29/2018 17:13", "%m/%d/%Y %H:%M")
In [32]: dt3 = datetime.datetime.strptime("11/28/2018 17:13", "%m/%d/%Y %H:%M")
In [33]: dt1 - dt2
Out[33]: datetime.timedelta(days=1)
In [34]: (dt1 - dt2).days
Out[34]: 1
In [35]: (dt1 - dt3).days
Out[35]: 2
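For the per-day usage asked about in the question, one possible approach is to cut the interval at each midnight and measure every piece separately. The following is only a minimal sketch: the function name usage_per_day and the output column names are made up for illustration, and it assumes the timestamps use the month/day/year format from the question.

```python
import pandas as pd

def usage_per_day(start, end, fmt="%m/%d/%Y %I:%M %p"):
    """Split the span from start to end into one row per calendar day."""
    start = pd.to_datetime(start, format=fmt)
    end = pd.to_datetime(end, format=fmt)
    rows = []
    # One candidate day for every calendar date touched by the span
    for day in pd.date_range(start.normalize(), end.normalize(), freq="D"):
        seg_start = max(start, day)
        seg_end = min(end, day + pd.Timedelta(days=1))
        if seg_end > seg_start:  # skip an empty final day (end exactly at midnight)
            hours = (seg_end - seg_start) / pd.Timedelta(hours=1)
            rows.append({"date": day.date(), "hours": hours})
    return pd.DataFrame(rows)

per_day = usage_per_day("04/01/2019 9:15 AM", "04/03/2019 12:00 PM")
print(per_day)
```

For the example in the question this gives 14.75 hours on the first day, 24 on the second, and 12 on the third.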
d : Datetime object
Given a date d and a day of the week x in the range of 0–6, return the date of x within the same week as d.
I can think of some ways to do this, but they all seem rather inefficient. Is there a pythonic way?
Example
Input: datetime(2020,2,4,18,0,55,00000), 6
Output: date(2020,2,7)
Input: datetime(2020,2,4,18,0,55,00000), 0
Output: date(2020,2,3)
This approach gets the first day in the week and goes from there to find the date requested by the weekday integer:
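import datetime as dt

def weekday_in_week(d, weekday=None):
    # 0 (Monday) is a valid weekday, so test for None explicitly
    if weekday is None:
        return None
    # Monday of the week that contains d
    week_start = d - dt.timedelta(days=d.weekday())
    return week_start + dt.timedelta(days=weekday)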
Example usage:
In [27]: weekday_in_week(dt.date.today(),6)
Out[27]: datetime.date(2020, 2, 9)
Remember that the weekdays are as such: 0 is Monday, 6 is Sunday.
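As a quick check against the inputs from the question, here is a small self-contained sketch of the same idea. The name same_week_date is hypothetical, and it assumes the Monday=0 convention of datetime.weekday(), so weekday 6 lands on the Sunday of that week.

```python
import datetime as dt

def same_week_date(d, weekday):
    """Return the date of the given weekday (0=Monday .. 6=Sunday) in d's week."""
    if not 0 <= weekday <= 6:
        raise ValueError("weekday must be in the range 0-6")
    # Accept both datetime and date inputs, but always return a date
    day = d.date() if isinstance(d, dt.datetime) else d
    monday = day - dt.timedelta(days=day.weekday())
    return monday + dt.timedelta(days=weekday)

d = dt.datetime(2020, 2, 4, 18, 0, 55)  # a Tuesday
print(same_week_date(d, 0))  # Monday of that week
print(same_week_date(d, 6))  # Sunday of that week
```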
I know the current year and the current week; for example, the current year is 2018 and the current week is 8.
I want to know which year and week it was 10 weeks ago. In this example, 10 weeks ago is the fiftieth week of 2017.
currentYear=2018
currentWeek=8
How to get it?
In [31]: from datetime import datetime as dt
In [32]: from datetime import timedelta
In [33]: current_date = dt(2018, 2, 20)
In [34]: current_date
Out[34]: datetime.datetime(2018, 2, 20, 0, 0)
In [35]: current_date.strftime('%V')  # This is how we can get the week of the year.
Out[35]: '08'
In [36]: current_date - timedelta(weeks=10)  # This is how to go back in time.
Out[36]: datetime.datetime(2017, 12, 12, 0, 0)
In [37]: ten_weeks_ago = _
In [38]: ten_weeks_ago.strftime('%V')
Out[38]: '50'
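You can also do this without knowing a date in the 8th week of 2018, by creating a date directly from the year and week number. Note that %W counts weeks from the first Monday of the year, which is not always the same as the ISO week given by %V.
Here it is: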
import datetime
d = "%s-W%s"%(currentYear,currentWeek)
r = datetime.datetime.strptime(d + '-0', "%Y-W%W-%w")
print(r-datetime.timedelta(weeks=10))
Output:
2017-12-17 00:00:00
Or, if you want the week number:
print((r-datetime.timedelta(weeks=10)).strftime('%V'))
Output:
50
Alternatively, parse the year and week directly with strptime. The format needs a weekday in addition to the year and week, so an arbitrary one (here 1, Monday) is appended:
from datetime import datetime
current_date = datetime.strptime("2018-8-1", "%Y-%W-%w")
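If the week numbers are meant as ISO weeks (the numbering that %V reports), the conversion can also go through the ISO calendar in both directions. This is a sketch only: weeks_ago is a made-up helper name, and date.fromisocalendar requires Python 3.8 or later.

```python
from datetime import date, timedelta

def weeks_ago(iso_year, iso_week, n_weeks):
    """Return (ISO year, ISO week) for n_weeks before the given ISO week."""
    # Any weekday in the week would do; Monday (1) is chosen arbitrarily
    monday = date.fromisocalendar(iso_year, iso_week, 1)
    earlier = monday - timedelta(weeks=n_weeks)
    iso = earlier.isocalendar()
    return iso[0], iso[1]

print(weeks_ago(2018, 8, 10))
```

For the values in the question this returns (2017, 50), so the year rollover is handled without any special casing.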
Date1 :20061201
Date2 :01/12/2006
How can I use pandas in Python to convert Date1 into the Date2 (day/month/year) format? Thanks! Date1 and Date2 are two columns in CSV files.
Data:
In [151]: df
Out[151]:
Date
0 20061201
1 20170530
Option 1:
In [152]: pd.to_datetime(df.Date, format='%Y%m%d').dt.strftime('%d/%m/%Y')
Out[152]:
0 01/12/2006
1 30/05/2017
Name: Date, dtype: object
Option 2:
In [153]: df.Date.astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\3/\2/\1', regex=True)
Out[153]:
0 01/12/2006
1 30/05/2017
Name: Date, dtype: object
If you're using pandas and want a timestamp object back
pd.to_datetime('20061201')
Timestamp('2006-12-01 00:00:00')
If you want a string back
str(pd.to_datetime('20061201').date())
'2006-12-01'
Assuming you have a dataframe df
df = pd.DataFrame(dict(Date1=['20161201']))
Then you can use the same techniques in vectorized form.
as timestamps
df.assign(Date2=pd.to_datetime(df.Date1))
Date1 Date2
0 20161201 2016-12-01
as strings
df.assign(Date2=pd.to_datetime(df.Date1).dt.date.astype(str))
Date1 Date2
0 20161201 2016-12-01
import datetime
A=datetime.datetime.strptime('20061201','%Y%m%d')
A.strftime('%d/%m/%Y')  # day/month/year, as asked in the question
You may use apply and lambda function here.
Suppose you have a dataset named df as below:
id date1
0 20061201
2 20061202
You can use the code like below:
df['date2'] = df['date1'].apply(lambda x: x[6:] + '/' + x[4:6] + '/' + x[:4])
The result will be:
id date1 date2
0 20061201 01/12/2006
2 20061202 02/12/2006
The simplest way is probably using the date parsing provided by datetime:
from datetime import datetime
datetime.strptime(str(20061201), "%Y%m%d")
You can apply this transformation to all rows in your pandas dataframe/series using the following:
from datetime import datetime
def convert_date(d):
    return datetime.strptime(str(d), "%Y%m%d")
df['Date2'] = df.Date1.apply(convert_date)
This will add a Date2 column to your dataframe df, which is the datetime representation of the Date1 column.
You can then serialize the date again by using strftime:
def serialize_date(d):
    return d.strftime("%d/%m/%Y")
df['Date2'] = df.Date2.apply(serialize_date)
Alternatively, assuming Date1 holds integers, you can do it all with integer arithmetic and string formatting:
def reformat_date(d):
    year = d // 10000
    month = d % 10000 // 100
    day = d % 100
    # zero-pad day and month so that 20061201 becomes 01/12/2006
    return "{day:02d}/{month:02d}/{year}".format(day=day, month=month, year=year)
df['Date2'] = df.Date1.apply(reformat_date)
This avoids the parsing machinery provided by strptime and is typically quite a bit faster.
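Since the question mentions that the dates live in CSV files, here is a small end-to-end sketch. The CSV content is made up and read from an in-memory string; the column names Date1 and Date2 follow the question. Reading Date1 as a string avoids any surprises from integer parsing, and errors="coerce" turns unparseable values into NaT instead of raising.

```python
import io
import pandas as pd

# Made-up CSV content standing in for the real file
csv_text = "Date1\n20061201\n20170530\n"

# Keep Date1 as text so its digits are preserved exactly as written
df = pd.read_csv(io.StringIO(csv_text), dtype={"Date1": str})

# Parse with an explicit format, then render as day/month/year
parsed = pd.to_datetime(df["Date1"], format="%Y%m%d", errors="coerce")
df["Date2"] = parsed.dt.strftime("%d/%m/%Y")
print(df)
```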
I have a DataFrame which is indexed with the last day of the month. Sometimes this date is a weekday and sometimes it is a weekend. Ignoring holidays, I'm looking to offset the date to the next business date if the date is on a weekend and leave the result unchanged if it is already on a weekday.
Some example data would be
import pandas as pd
idx = [pd.to_datetime('20150430'), pd.to_datetime('20150531'),
pd.to_datetime('20150630')]
df = pd.DataFrame(0, index=idx, columns=['A'])
df
A
2015-04-30 0
2015-05-31 0
2015-06-30 0
df.index.weekday
array([3, 6, 1], dtype=int32)
Something like the following works, however I would appreciate if someone has a solution that is a little more straightforward.
idx = df.index.copy()
wknds = (idx.weekday == 5) | (idx.weekday == 6)
idx2 = idx[~wknds]
idx2 = idx2.append(idx[wknds] + pd.offsets.BDay(1))
idx2 = idx2.sort_values()
df.index = idx2
df
A
2015-04-30 0
2015-06-01 0
2015-06-30 0
You can add 0*BDay(), which rolls a weekend date forward to the next business day and leaves a business day unchanged:
from pandas.tseries.offsets import BDay
df.index = df.index.map(lambda x : x + 0*BDay())
You can also use this with a Holiday calendar with CDay(calendar) in case there are holidays.
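The same roll-forward can be spelled out with the rollforward method of the offset, which some readers may find more explicit than adding a zero offset. The sketch below reuses the example index from the question; the holiday on 2015-06-01 is invented purely to show the effect of a custom calendar.

```python
import pandas as pd
from pandas.tseries.offsets import BDay, CustomBusinessDay

idx = pd.to_datetime(["20150430", "20150531", "20150630"])
df = pd.DataFrame(0, index=idx, columns=["A"])

# rollforward leaves business days alone and moves weekends to the next business day
bday = BDay()
df.index = pd.DatetimeIndex([bday.rollforward(ts) for ts in df.index])
print(df)

# With a custom calendar (hypothetical holiday on Monday 2015-06-01),
# the Sunday rolls forward past the holiday as well
cday = CustomBusinessDay(holidays=["2015-06-01"])
rolled = cday.rollforward(pd.Timestamp("2015-05-31"))
print(rolled)
```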
You can map the index with a lambda function, and set the result back to the index.
df.index = df.index.map(lambda x: x if x.dayofweek < 5 else x + pd.DateOffset(days=7 - x.dayofweek))
df
A
2015-04-30 0
2015-06-01 0
2015-06-30 0
Using DataFrame.resample
A more idiomatic method would be to resample to business days:
df.resample('B', label='right', closed='right').first().dropna()
A
2015-04-30 0.0
2015-06-01 0.0
2015-06-30 0.0
You can also use a variation of this logic: (a) given an input date inputdate, go back one business day using pandas date_range, which accepts a business-day frequency; then (b) go forward one business day in the same way. To do this, generate a two-element range with date_range and take its min or max to get the single value you need. This could look as follows:
a) get business day before:
date_1b_bef = min(pd.date_range(start=inputdate, periods = 2, freq='-1B'))
b) get business day after the 'business day before':
date_1b_aft = max(pd.date_range(start=date_1b_bef, periods = 2, freq='1B'))
or substituting a) into b) to get one line:
date_1b_aft = max(pd.date_range(start=min(pd.date_range(start=inputdate, periods = 2, freq='-1B')), periods = 2, freq='1B'))
This can also be combined with relativedelta (from dateutil.relativedelta) to get the business day after some calendar period offset from inputdate. For example:
a) get the business day (using 'following' convention if offset day is not a business day) for 1 calendar month prior to 'input date':
date_1mbef_fol = max(pd.date_range(min(pd.date_range(start=inputdate + relativedelta(months=-1), periods = 2, freq='-1B')), periods = 2, freq = '1B'))
b) get the business day (using 'preceding' convention if offset day is not a business day) for 1 year prior to 'input date':
date_1ybef_pre = min(pd.date_range(max(pd.date_range(start=inputdate + relativedelta(years=-1), periods = 2, freq='1B')), periods = 2, freq = '-1B'))
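The "following" and "preceding" conventions described above can also be sketched with the rollforward and rollback methods of BDay, which is a different technique from the date_range approach in this answer. The input date is an arbitrary example, and pd.DateOffset(months=1) is used here in place of relativedelta.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

inputdate = pd.Timestamp("2015-06-30")  # example input, a Tuesday

# One calendar month earlier lands on Saturday 2015-05-30
one_month_before = inputdate - pd.DateOffset(months=1)

# "Following" convention: move forward to the next business day if needed
date_1mbef_fol = BDay().rollforward(one_month_before)

# "Preceding" convention: move back to the previous business day if needed
date_1mbef_pre = BDay().rollback(one_month_before)

print(one_month_before.date(), date_1mbef_fol.date(), date_1mbef_pre.date())
```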
I have two date features of type datetime. I'd like to express the difference between them in days cast as type int. How do I accomplish this:
In[]
print lcd.time_to_default
print lcd.issue_date
lcd['time_to_default']=(lcd.last_pymnt_date - lcd.issue_date)
lcd.time_to_default.head()
Out[92]:
datetime64[ns]
datetime64[ns]
0 1127 days
1 487 days
2 913 days
3 1127 days
4 1217 days
Name: time_to_default, dtype: timedelta64[ns]
I want to cast this series as an int, not timedelta64.
Addendum: I can't use ".days" as suggested in the linked question that this is supposedly a duplicate of.
In[]
lcd.time_to_default.days
Returns:
Out[]
'Series' object has no attribute 'days'
Just subtract the two datetime variables. That yields timedelta type.
Eg:
In [2]: datetime.datetime.now()
Out[2]: datetime.datetime(2015, 6, 2, 0, 30, 49, 548657)
In [3]: yesterday = datetime.datetime.now() - datetime.timedelta(days=1)
In [4]: datetime.datetime.now() - yesterday
Out[4]: datetime.timedelta(1, 17, 32459)
In [5]: diff = (datetime.datetime.now() - yesterday)
In [6]: diff.days
Out[6]: 1
Try this,
>>> from datetime import datetime
>>> date1 = datetime(2015,6,2)
>>> date2 = datetime(2015,5,2)
>>> diff = date1 - date2
>>> print (diff.days)
31
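To get an integer number of days from a series of timedelta64[ns], you could try dividing by one day (not tested). The division yields floats, so cast the result if you need integers; the cast assumes there are no missing values:
result = np.divide(lcd.time_to_default, np.timedelta64(1, 'D')).astype(int)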
See Time difference in seconds from numpy.timedelta64 and Converting between datetime, Timestamp and datetime64.
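The error in the addendum arises because .days is an attribute of a single timedelta, not of a Series. On a Series of timedeltas the element-wise equivalent is reached through the .dt accessor. A minimal sketch, with a made-up frame that borrows the column names from the question:

```python
import pandas as pd

# Made-up data; only the column names come from the question
lcd = pd.DataFrame({
    "issue_date": pd.to_datetime(["2011-12-01", "2012-03-01"]),
    "last_pymnt_date": pd.to_datetime(["2015-01-01", "2013-07-01"]),
})

# Subtracting two datetime columns gives timedelta64[ns];
# .dt.days extracts the whole-day component as integers
lcd["time_to_default"] = (lcd.last_pymnt_date - lcd.issue_date).dt.days
print(lcd.time_to_default)
print(lcd.time_to_default.dtype)
```

If the column can contain missing dates, the result will hold NaN for those rows and will no longer be an integer dtype.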