Split based on _ and find the difference between dates - python

I am trying to find the difference between the two dates below. Each date is embedded in a string in "dd_mm_yyyy" format. I split the two strings on _ to extract the day, month and year.
previous_date = "filename_03_03_2021"
current_date = "filename_09_03_2021"
previous_array = previous_date.split("_")
I am not sure what to do after that to combine the parts into a date and find the difference between the dates in days.
Any leads/suggestions would be appreciated.

You could index into the list after the split, for example previous_array[1], to get the values and pass them to date.
But instead of using split, you might use a pattern with 3 capture groups to make the match a bit more specific, extract the numbers, then subtract the dates and read the days value.
You might make the date-like pattern more specific using the pattern on this page
import re
from datetime import date
previous_date = "filename_03_03_2021"
current_date = "filename_09_03_2021"
pattern = r"filename_(\d{2})_(\d{2})_(\d{4})"
pMatch = re.match(pattern, previous_date)
cMatch = re.match(pattern, current_date)
pDate = date(int(pMatch.group(3)), int(pMatch.group(2)), int(pMatch.group(1)))
cDate = date(int(cMatch.group(3)), int(cMatch.group(2)), int(cMatch.group(1)))
print((cDate - pDate).days)
Output
6
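The split-based route mentioned at the start of this answer can be sketched as follows. This is only a minimal illustration: the helper name date_from_name is hypothetical, and it assumes every name ends in _dd_mm_yyyy.

```python
from datetime import date

def date_from_name(name):
    """Build a date from a name that ends in _dd_mm_yyyy (assumed layout)."""
    # Split from the right so underscores in the prefix do not matter
    day, month, year = name.rsplit("_", 3)[1:]
    return date(int(year), int(month), int(day))

previous_date = "filename_03_03_2021"
current_date = "filename_09_03_2021"

# Subtracting two date objects gives a timedelta; .days is the whole-day count
days_between = (date_from_name(current_date) - date_from_name(previous_date)).days
print(days_between)  # 6
```

Splitting from the right keeps the sketch working if the prefix itself contains underscores, which the regex above would not allow.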

Related

Python find when list of dates becomes non-consecutive

I have a list of dates which are mostly consecutive, for example:
['01-Jan-10', '02-Jan-10', '03-Jan-10', '04-Jan-10', '08-Jan-10', '09-Jan-10', '10-Jan-10', '11-Jan-10', '13-Jan-10']
This is just an illustration as the full list contains thousands of dates.
This list can have a couple of spots where the consecutiveness breaks. In the example shown above, the gaps run from 05-Jan-10 to 07-Jan-10, and then 12-Jan-10. I am looking for the minimal and maximal day in each gap. Is there any way to do this efficiently in Python?
The datetime package from the standard library can be useful.
Work out the right date format and apply it with strptime to every item in the list, then loop through consecutive pairs and check the difference between them (in days) using timedelta arithmetic. To keep the same (non-standard) format in the output, apply strftime.
from datetime import datetime, timedelta
dates = ['01-Jan-10', '02-Jan-10', '03-Jan-10', '04-Jan-10', '08-Jan-10', '09-Jan-10', '10-Jan-10', '11-Jan-10', '13-Jan-10']
# date format code
date_format = '%d-%b-%y'
# cast to datetime objects
days = list(map(lambda d: datetime.strptime(d, date_format).date(), dates))
# check consecutive days
for d1, d2 in zip(days, days[1:]):
    date_gap = (d2-d1).days
    # check consecutiveness
    if date_gap > 1:
        # compute day boundary of the gap
        min_day_gap, max_day_gap = d1 + timedelta(days=1), d2 - timedelta(days=1)
        # apply format
        min_day_gap = min_day_gap.strftime(date_format)
        max_day_gap = max_day_gap.strftime(date_format)
        # check
        print(min_day_gap, max_day_gap)
#05-Jan-10 07-Jan-10
#12-Jan-10 12-Jan-10
Remark: it is not clear what should happen when the gap between two entries is 2 days, because then only one day is missing and the min and max day of the gap are identical. If that case needs different handling, add a conditional check for date_gap == 2, e.g.
if date_gap == 2: ... elif date_gap > 1: ...
or add a comment/edit the question with a proper description.
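If the gaps are needed as data rather than printed, the same pairwise comparison can be wrapped in a function. This is a sketch under the same assumptions as the answer (sorted input, '%d-%b-%y' format, English month abbreviations in the current locale); the name find_gaps is hypothetical.

```python
from datetime import datetime, timedelta

def find_gaps(dates, date_format="%d-%b-%y"):
    """Return (first_missing, last_missing) pairs for each break in a sorted date list."""
    days = [datetime.strptime(d, date_format).date() for d in dates]
    one_day = timedelta(days=1)
    gaps = []
    # Compare each date with its successor
    for d1, d2 in zip(days, days[1:]):
        if (d2 - d1).days > 1:
            first_missing = d1 + one_day
            last_missing = d2 - one_day
            gaps.append((first_missing.strftime(date_format),
                         last_missing.strftime(date_format)))
    return gaps

dates = ['01-Jan-10', '02-Jan-10', '03-Jan-10', '04-Jan-10', '08-Jan-10',
         '09-Jan-10', '10-Jan-10', '11-Jan-10', '13-Jan-10']
gaps = find_gaps(dates)
print(gaps)
```

A single missing day shows up as a pair with identical start and end, so the caller can decide how to treat that case.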

queries while retrieving data from list in python

I have a list in Python with the data below. Each item represents a table name.
[tablename_20211011, tablename_20201010, tablename_20211009,
tablename_20211009, tablename_20211008]
20211011 is the date when the table was created.
How can I fetch the table names that were created in the last 1 year in Python?
If the criterion is 1 year, the result should be tablename_20211011, tablename_20211009, tablename_20211008, tablename_20211009.
This works. You don't need to specify last year's date manually; it is computed automatically.
import datetime
# date one year (365 days) ago, formatted as YYYYMMDD
d1 = (datetime.datetime.now() - datetime.timedelta(days=365)).strftime("%Y%m%d")
# a is the list of table names; compare the last 8 characters (the date) with d1
[x for x in a if x[-8:] >= d1]
This returns the items dated on or after the computed date.
Assuming that the date is always the last 8 characters of your table name (i.e. YYYYMMDD format), you could just use:
files = ['tablename_20211011', 'tablename_20201010', 'tablename_20211009', 'tablename_20211009', 'tablename_20211008']
print ([x for x in files if x[-8:] >= '20210101'])
Simply set the date-string to the right of the >= symbol as needed.
If the date is not always the last 8 characters of the string, then you may need to use a regular expression (regex) approach to extract it.
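A minimal sketch of that regex approach, assuming each name contains one standalone 8-digit YYYYMMDD run. The function name tables_created_since and the fixed "today" are hypothetical, chosen only to make the example reproducible.

```python
import re
from datetime import date, datetime, timedelta

def tables_created_since(names, cutoff):
    """Return the names whose embedded YYYYMMDD date is on or after cutoff."""
    # An 8-digit run that is not part of a longer run of digits
    pattern = re.compile(r"(?<!\d)\d{8}(?!\d)")
    recent = []
    for name in names:
        match = pattern.search(name)
        if match is None:
            continue  # no date found in this name; skip it
        # strptime raises ValueError if the digits are not a real date
        created = datetime.strptime(match.group(), "%Y%m%d").date()
        if created >= cutoff:
            recent.append(name)
    return recent

files = ['tablename_20211011', 'tablename_20201010', 'tablename_20211009',
         'tablename_20211009', 'tablename_20211008']
# Hypothetical fixed "today"; in practice use date.today()
today = date(2021, 10, 12)
recent_tables = tables_created_since(files, today - timedelta(days=365))
print(recent_tables)
```

Comparing date objects instead of strings also guards against names where the 8 digits are not a valid calendar date.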

How to convert two columns from decimal years to date

I'm new to Python and I have a problem.
I have two columns of data in decimal years in a .txt document and I want to transform each number in the two columns into a date (yyyy-mm-dd)
2014.16020 2019.07190
2000.05750 2019.10750
2001.82610 2019.10750
2010.36280 2019.07190
2005.24570 2019.10750
2015.92610 2019.10750
2003.43600 2014.37100
and then subtract the date in the first column from the date in the second column in order to obtain the number of days between the two dates.
For example, the result should look like:
1825
3285
2920
3283
etc.
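Give the proper paths for your input file and output file. The code below does the rest.
import pandas as pd
# read both columns as floats (decimal years)
df=pd.read_csv('your_file.txt',delimiter=' ',header=None)
def to_date(col):
    # integer part is the year; the fraction is scaled to days, assuming 365-day years
    year=col.astype(int)
    days=((col-year)*365).astype(int)
    return pd.to_datetime(year.astype(str),format='%Y')+pd.to_timedelta(days,unit='D')
df['date_diffrence']=(to_date(df[1])-to_date(df[0])).dt.days
df.to_csv('your_file_result.txt',header=None,sep=' ',index=False)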
I put my explanations in the code below
from datetime import datetime # yes they named a class the same as module
x = '''2014.16020 2019.07190
2000.05750 2019.10750
2001.82610 2019.10750
2010.36280 2019.07190
2005.24570 2019.10750
2015.92610 2019.10750
2003.43600 2014.37100'''
# split input into lines. Assumption here is that there is one pair of dates per each line
lines = x.splitlines()
# set up a container (list) for outputs
deltas = []
# process line by line
for line in lines:
    # split line into separate dates
    inputs = line.split()
    dates = []
    for value in inputs:
        # convert text to number
        date_decimal = float(value)
        # year is the integer part of the input
        date_year = int(date_decimal)
        # number of days is part of the year, which is left after we subtract year
        year_fraction = date_decimal - date_year
        # a little oversimplified here with int and assuming all years have 365 days
        # (+1 because %j counts days of the year from 1, so 0 would be rejected)
        days = int(year_fraction * 365) + 1
        # now convert the year and days into string and then into date (there is probably a better way to do this - without the string step)
        date = datetime.strptime("{}-{}".format(date_year, days),"%Y-%j")
        # see https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior for format explanation
        dates.append(date)
    deltas.append(dates[1] - dates[0])
# now print outputs
for delta in deltas:
    print(delta.days)
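The answer above notes that it assumes 365-day years and goes through a string. One possible way around both, shown only as a sketch: scale the fraction by the actual length of that year and add it to 1 January as a timedelta. The helper names decimal_year_to_date and days_between are hypothetical, and treating the fraction as "share of the year elapsed" is an assumption about how the decimal years were produced.

```python
from datetime import date, timedelta

def decimal_year_to_date(value):
    """Convert a decimal year such as 2014.1602 to a date (fraction = share of the year elapsed)."""
    year = int(value)
    start = date(year, 1, 1)
    # 365 or 366, depending on whether the year is a leap year
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=int((value - year) * days_in_year))

def days_between(first, second):
    """Whole days from the first decimal year to the second."""
    return (decimal_year_to_date(second) - decimal_year_to_date(first)).days

print(decimal_year_to_date(2001.5))   # 2001-07-02
print(days_between(2000.0, 2001.0))   # 366, because 2000 is a leap year
```

Because the fraction is truncated to whole days, results can differ by a day from other conventions for decimal years.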

Python: Extract two dates from string

I have a string s which contains two dates in it and I am trying to extract these two dates in order to subtract them from each other to count the number of days in between. In the end I am aiming to get a string like this: s = "o4_24d_20170708_20170801"
At the company where I work we can't install additional packages, so I am looking for a solution using native Python. Below is what I have so far using the datetime package, which only extracts one date. How can I get both dates out of the string?
import re, datetime
s = "o4_20170708_20170801"
match = re.search(r'\d{4}\d{2}\d{2}', s)
date = datetime.datetime.strptime(match.group(), '%Y%m%d').date()
print(date)
from datetime import datetime
import re
s = "o4_20170708_20170801"
pattern = re.compile(r'(\d{8})_(\d{8})')
dates = pattern.search(s)
# dates[0] is full match, dates[1] and dates[2] are captured groups
start = datetime.strptime(dates[1], '%Y%m%d')
end = datetime.strptime(dates[2], '%Y%m%d')
difference = end - start
print(difference.days)
will print
24
then, you could do something like:
days = 'd{}_'.format(difference.days)
match_index = dates.start()
new_name = s[:match_index] + days + s[match_index:]
print(new_name)
to get
o4_d24_20170708_20170801
import re, datetime
s = "o4_20170708_20170801"
match = re.findall(r'\d{4}\d{2}\d{2}', s)
for a_date in match:
    date = datetime.datetime.strptime(a_date, '%Y%m%d').date()
    print(date)
This will print:
2017-07-08
2017-08-01
Your regex was working correctly at regexpal
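Putting the pieces together, the target string from the question ("o4_24d_20170708_20170801") could be built roughly like this. It is a sketch that assumes the name contains exactly two 8-digit YYYYMMDD dates; the function name add_day_count is hypothetical.

```python
import re
from datetime import datetime

def add_day_count(name):
    """Insert '<days>d_' in front of the first of two YYYYMMDD dates in name."""
    # Assumes exactly two 8-digit dates; unpacking fails otherwise
    first, second = re.findall(r"\d{8}", name)
    start = datetime.strptime(first, "%Y%m%d").date()
    end = datetime.strptime(second, "%Y%m%d").date()
    days = (end - start).days
    index = name.index(first)  # position where the first date begins
    return "{}{}d_{}".format(name[:index], days, name[index:])

new_name = add_day_count("o4_20170708_20170801")
print(new_name)  # o4_24d_20170708_20170801
```

Only the standard library is used, which fits the constraint that no additional packages can be installed.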

retrieve different values from a single string with regular expressions and python

Suppose that I have this:
valuestringdate = "24/6/2010"
and I want to get something like this from the variable
day = 24
month = 6
year = 2010
Use the .split() method.
In this case,
dateList = valuestringdate.split("/")
Which would produce the list: dateList = ["24","6","2010"]
Using indexes:
day = dateList[0] would set day = "24"
From there you can use day = int(day) to convert the day from a string to an integer.
You should be able to figure it out from there.
You could just split the string:
day, month, year = valuestringdate.split('/')
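Since the question asks about regular expressions, here is a sketch that extracts the three values as integers with a pattern and also checks that they form a real date. The function name parse_day_month_year is hypothetical, and the d/m/yyyy layout is assumed from the example.

```python
import re
from datetime import date

def parse_day_month_year(text):
    """Return (day, month, year) as ints from a d/m/yyyy string."""
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", text)
    if match is None:
        raise ValueError("expected d/m/yyyy, got {!r}".format(text))
    day, month, year = (int(group) for group in match.groups())
    # Constructing a date raises ValueError for impossible dates such as 31/2/2010
    date(year, month, day)
    return day, month, year

valuestringdate = "24/6/2010"
day, month, year = parse_day_month_year(valuestringdate)
print(day, month, year)  # 24 6 2010
```

For a fixed, known layout like this one, split plus int is shorter; the pattern mainly helps when the input needs validating.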
