split string dates into list of multiple dates

I have the following list and I am trying to split each element into multiple "dates". I want to write a function to do it, but I am not sure whether regex or datetime is the way to go.
x=["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
for i in x:
    temp = my_func(i)
    if len(temp) == 1:
        date1 = temp[0]
        date2 = ""
    elif len(temp) >= 2:
        date1 = temp[0]
        date2 = temp[1]
    else:
        continue
    # rest of my code
Here is the expected output of my_func:
# my_func(x[0]) = ["2- 7 MAY, 2020", "10-12 JUN, 2014"]
# my_func(x[1]) = ["7 February, 2020", "6 February, 2020", "26 October, 2018"]
# my_func(x[-1]) = ["14 JUN, 2020"]

According to your examples, a lazy regex that runs from a digit to the next four-digit year works:
import re
for i in x:
    temp = re.findall(r'\d.*?\d{4}', i)
    print(temp)
# output
['2- 7 MAY, 2020', '10-12 JUN, 2014']
['7 February, 2020', '6 February, 2020', '26 October, 2018']
['16 JUN, 2020', '24 JUL, 2020', '28 FEB, 2020', '15 SEPT, 2020', '8-11 MAY, 2023', '22 OCT, 2020']
['14 JUN, 2020']
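If a reusable function is preferred, the same pattern can be wrapped as the my_func the question asks for. This is only a minimal sketch: it assumes every date starts with a digit and ends with a four-digit year, as in the sample data, and the name DATE_PATTERN is made up for illustration.

```python
import re

# Assumption: each date starts with a digit and ends with a 4-digit year.
DATE_PATTERN = re.compile(r'\d.*?\d{4}')

def my_func(s):
    """Return every date-like substring found in s."""
    return DATE_PATTERN.findall(s)

x = ["2- 7 MAY, 2020, 10-12 JUN, 2014", "14 JUN, 2020"]
for s in x:
    print(my_func(s))
```

The pattern is lazy (.*?), so each match stops at the first four-digit run it reaches instead of swallowing the rest of the string.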

Split the string on ','. There is always an even number of parts, and two contiguous parts constitute a date, so just join each pair of parts to form a date string.
re is fine, but this should work too:
>>> x = ["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
>>> result = []
>>> for s in x:
...     parts = s.split(',')
...     result.append([','.join(parts[i:i+2]).strip() for i in range(0, len(parts), 2)])
...
>>> result
[['2- 7 MAY, 2020', '10-12 JUN, 2014'],
 ['7 February, 2020', '6 February, 2020', '26 October, 2018'],
 ['16 JUN, 2020', '24 JUL, 2020', '28 FEB, 2020', '15 SEPT, 2020', '8-11 MAY, 2023', '22 OCT, 2020'],
 ['14 JUN, 2020']]
Your my_func will be simply:
>>> def my_func(s):
...     parts = s.split(',')
...     return [','.join(parts[i:i+2]).strip() for i in range(0, len(parts), 2)]
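The pairing step relies on each date containing exactly one comma. A hedged variant that makes this assumption explicit could raise an error when the number of parts is odd. The name split_dates and the choice of ValueError are illustrative, not part of the answer above.

```python
def split_dates(s):
    """Split 'D MON, YYYY, D MON, YYYY, ...' into a list of date strings."""
    parts = [p.strip() for p in s.split(',')]
    if len(parts) % 2 != 0:
        # Assumption: every date is "<day(s)> <month>, <year>", one comma each.
        raise ValueError(f"expected an even number of comma-separated parts: {s!r}")
    # Re-join neighbouring parts: (day-month, year) -> "day-month, year"
    return [f"{parts[i]}, {parts[i + 1]}" for i in range(0, len(parts), 2)]

print(split_dates("2- 7 MAY, 2020, 10-12 JUN, 2014"))
```

Failing loudly on malformed input is a design choice; silently pairing whatever is there would hide data problems.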

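A global variable is used here.
The program depends on how the dates are formatted. Note that with this format it only splits off the first date of each string (leaving a trailing comma), and a string holding a single date is truncated, as the output below shows.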
dates=["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
def splitdates(date):
    if type(date) is int:
        tosplit = str(dates[date])
    else:
        tosplit = date
    month = ["J" , "j" , "f" , "F" , "m" , "M" , "A" , "a" , "s" , "S" ,"n" , "N" , "o" , "O" , "D" , "d"]
    for item, character in enumerate(tosplit):
        if character in month:
            for item2, character in enumerate(tosplit[item+1:]):
                if character.startswith(","):
                    for item3, character in enumerate(tosplit[item+item2+2:]):
                        if character.startswith(","):
                            global newdate
                            newdate.append(tosplit[:item + item2 + item3 + 3])
                            nextPart = tosplit[item + item2 + item3 + 3:]
                            if nextPart.endswith(";"):
                                newPart = nextPart
                                splitdates(newPart)
                            else:
                                newdate.append(tosplit[item+item2+item3+3:])
                                return newdate
                    newdate.append(tosplit[:item+item2+item3])
                    return newdate
for x in range(len(dates)):
    newdate = []
    print("Date: ",splitdates(x))
Output is:
Date: ['2- 7 MAY, 2020,', ' 10-12 JUN, 2014']
Date: ['7 February, 2020,', ' 6 February, 2020, 26 October, 2018']
Date: ['16 JUN, 2020,', ' 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020']
Date: ['14 JUN, 2']

This is the simplest algorithm:
iterate over the list and pick the strings one by one, then split each string on ',';
pick two consecutive values, join them back with the comma, and append the result to a new list.
x = ["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
for i in x:
    final_answer = []
    f = i.split(',')
    j = 0
    while j + 1 < len(f):
        # re-join the day/month part with its year, keeping the comma
        final_answer.append((f[j] + ',' + f[j+1]).strip())
        j = j + 2
    print(final_answer)
Output:
['2- 7 MAY, 2020', '10-12 JUN, 2014']
['7 February, 2020', '6 February, 2020', '26 October, 2018']
['16 JUN, 2020', '24 JUL, 2020', '28 FEB, 2020', '15 SEPT, 2020', '8-11 MAY, 2023', '22 OCT, 2020']
['14 JUN, 2020']

Related

Group python dataframe and display all correspond values for each unique key in a dictionary

I have the following dataset
id    date
7510  15 Jun 2020
7510  16 Jun 2020
7512  15 Jun 2020
7512  07 Jul 2020
7520  15 Jun 2020
7520  16 Aug 2020
I need to convert this to a dictionary, which is quite straightforward, but I need each unique id as a key and all of its corresponding dates as that key's values.
For example:
dictionary = {7510: ["15 Jun 2020", "16 Jun 2020"], 7512: ["15 Jun 2020", "07 Jul 2020"],
7520: ["15 Jun 2020", "16 Aug 2020"] }

Try this:
df.groupby('id')['date'].agg(list).to_dict()
Output:
{7510: ['15 Jun 2020', '16 Jun 2020'],
7512: ['15 Jun 2020', '07 Jul 2020'],
7520: ['15 Jun 2020', '16 Aug 2020']}
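For readers who do not have pandas at hand, the same grouping can be sketched with the standard library. This swaps in collections.defaultdict for groupby; the rows list and the function name group_dates are hypothetical and simply mirror the table in the question.

```python
from collections import defaultdict

# Hypothetical rows mirroring the table in the question.
rows = [
    (7510, "15 Jun 2020"), (7510, "16 Jun 2020"),
    (7512, "15 Jun 2020"), (7512, "07 Jul 2020"),
    (7520, "15 Jun 2020"), (7520, "16 Aug 2020"),
]

def group_dates(pairs):
    """Map each id to the list of its dates, in input order."""
    grouped = defaultdict(list)
    for key, date in pairs:
        grouped[key].append(date)
    return dict(grouped)

print(group_dates(rows))
```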

Sort a list of dictionaries of dates by value

I'm trying to sort the values so that entries from the current year are shown first.
mdlist = [{0:'31 Jan 2022', 1:'', 2:'10 Feb 2022'},
          {0:'10 Feb 2021', 1:'20 Feb 2021', 2:''},
          {0:'10 Feb 2022', 1:'10 Feb 2022', 2:'10 Feb 2022'}]
mdlist = sorted(mdlist, key=lambda d:d[0])
But it is not working as expected.
Expected output:
mdlist = [{0:'31 Jan 2022', 1:'', 2:'10 Feb 2022'},
{0:'10 Feb 2022', 1:'10 Feb 2022', 2:'10 Feb 2022'},
{0:'10 Feb 2021', 1:'20 Feb 2021', 2:''}]

Maybe you could leverage the fact that these are dates by parsing them with the datetime module, then sorting by year in descending order and by month and day in ascending order:
from datetime import datetime
def sorting_key(dct):
    ymd = datetime.strptime(dct[0], "%d %b %Y")
    return -ymd.year, ymd.month, ymd.day
mdlist.sort(key=sorting_key)
Output:
[{0: '31 Jan 2022', 1: '', 2: '10 Feb 2022'},
{0: '10 Feb 2022', 1: '10 Feb 2022', 2: '10 Feb 2022'},
{0: '10 Feb 2021', 1: '20 Feb 2021', 2: ''}]

Use a key function that returns 0 if the year is 2022 and 1 otherwise. Because sorted() is stable, this puts all the 2022 dates first and keeps their original relative order.
firstyear = '2022'
mdlist = sorted(mdlist, key=lambda d: 0 if d[0].split()[-1] == firstyear else 1)
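The answers above either sort years in descending order or hard-code the year. A hedged sketch that puts the actual current year first and then orders chronologically could look like this. The names current_year_first and year are assumptions, and the key assumes d[0] always holds a date in "%d %b %Y" form with English month abbreviations.

```python
from datetime import datetime

def current_year_first(records, year=None):
    """Sort dicts so entries whose d[0] falls in `year` come first."""
    if year is None:
        year = datetime.now().year  # default: the real current year

    def key(d):
        date = datetime.strptime(d[0], "%d %b %Y")
        # False sorts before True, so matching years lead; ties go by date.
        return (date.year != year, date)

    return sorted(records, key=key)

mdlist = [{0: '31 Jan 2022', 1: '', 2: '10 Feb 2022'},
          {0: '10 Feb 2021', 1: '20 Feb 2021', 2: ''},
          {0: '10 Feb 2022', 1: '10 Feb 2022', 2: '10 Feb 2022'}]
print([d[0] for d in current_year_first(mdlist, year=2022)])
```

Passing year explicitly keeps the example reproducible; leaving it out falls back to the year at run time.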

Python - Given list of dates (as strings), how do we return only those that fall within last 365 days?

Given the following list of strings
from datetime import datetime
import numpy as np
strings = ['Nov 1 2021', 'Oct 25 2021', 'Oct 18 2021', 'Oct 11 2021', 'Oct 4 2021', 'Sep 27 2021',
'Sep 20 2021', 'Aug 24 2021', 'Aug 16 2021', 'Aug 9 2021', 'Aug 2 2021', 'Jul 26 2021',
'Jun 28 2021', 'Jun 21 2021', 'Jun 14 2021', 'Jun 7 2021', 'May 24 2021', 'May 10 2021',
'May 3 2021', 'Apr 26 2021', 'Apr 12 2021', 'Apr 12 2021', 'Apr 5 2021', 'Mar 22 2021',
'Feb 22 2021', 'Feb 13 2021', 'Feb 8 2021', 'Feb 1 2021', 'Nov 2 2020', 'Sep 28 2020',
'Aug 31 2020', 'Aug 20 2020', 'Aug 10 2020', 'Jun 29 2020', 'Jun 22 2020', 'Jun 15 2020',
'Mar 2 2020', 'Feb 10 2020', 'Feb 3 2020', 'Jan 27 2020', 'Jan 20 2020', 'Jan 13 2020',
'Jan 6 2020', 'Aug 26 2019', 'Aug 5 2019', 'Jul 29 2019', 'Jul 22 2019', 'Jul 15 2019']
What's the most efficient way to return a list of those dates that fall within the last 365 days?
Here's my failed attempt:
# Converts strings to datetime format and appends to new list, 'dates.'
dates = []
for item in strings:
    convert_string = datetime.strptime(item, "%b %d %Y").date()
    dates.append(convert_string)
# Given each item in 'dates' list returns corresponding
# list showing elapsed time between each item and today (Nov 11th 2021).
elapsed_time = []
def dateDelta(i):
    today = datetime.fromisoformat(datetime.today().strftime('%Y-%m-%d')).date()
    date = i
    delta = (today - date).days
    elapsed_time.append(delta)
for i in dates:
    dateDelta(i)
# Concatenates 'dates' list and 'elapsed_times_list' in attempt to somehow connect the two.
date_and_elapsed_time = []
date_and_elapsed_time.append(dates)
date_and_elapsed_time.append(elapsed_time)
# Takes 'elapsed_time list' appends only the dates that fall within the last 365 days.
relevant_elapsed_time_list = []
for i in elapsed_time:
    if i <= 365:
        relevant_elapsed_time_list.append(i)
# Finds indices of 'relevant_elapsed_time_list' within last 365 days.
# After trawling StackOverflow posts, I import numpy in an effort to help with indexing.
# My thinking is I can use the indices of the relevant elapsed times from the
# 'elapsed_time_list' and return the corresponding date from the 'dates' list.
relevant_elapsed_time_list_indices = []
for item in relevant_elapsed_time_list:
    indexes = []
    for index, sub_lst in enumerate(date_and_elapsed_time):
        try:
            indexes.append((index, sub_lst.index(item)))
        except ValueError:
            pass
    relevant_elapsed_time_list_indices.append(indexes)
relevant_elapsed_time_list_indices = np.array([[x[0][0], x[0][1]] for x in relevant_elapsed_time_list_indices])
At this point, I'm as yet unable to convert the relevant_elapsed_time_list_indices list to the corresponding indices for the first sub-list in date_and_elapsed_time. The point of this would be to then isolate those indices (i.e. dates).
What's the most efficient way to solve this problem?

You can convert the strings to datetime objects using .strptime, then use a conditional list comprehension that uses timedelta to pick ones that fall within the last 365 days:
from datetime import datetime, timedelta
last_365_days = [s for s in strings if datetime.strptime(s, "%b %d %Y") + timedelta(days=365) >= datetime.today()]
Alternatively you can compute the cutoff date in advance:
cutoff = datetime.today() - timedelta(days=365)
last_365_days = [s for s in strings if datetime.strptime(s, "%b %d %Y") >= cutoff]
The value of last_365_days should then be (for today):
['Nov 1 2021', 'Oct 25 2021', 'Oct 18 2021', 'Oct 11 2021', 'Oct 4 2021',
'Sep 27 2021', 'Sep 20 2021', 'Aug 24 2021', 'Aug 16 2021', 'Aug 9 2021',
'Aug 2 2021', 'Jul 26 2021', 'Jun 28 2021', 'Jun 21 2021', 'Jun 14 2021',
'Jun 7 2021', 'May 24 2021', 'May 10 2021', 'May 3 2021', 'Apr 26 2021',
'Apr 12 2021', 'Apr 12 2021', 'Apr 5 2021', 'Mar 22 2021', 'Feb 22 2021',
'Feb 13 2021', 'Feb 8 2021', 'Feb 1 2021']
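Because the result depends on the day the code runs, it can help to make the reference date a parameter. The following is a sketch under assumptions: the function name within_last_days and its days and today parameters are made up here, it compares plain dates rather than datetimes, and unlike the answer above it also drops dates later than today.

```python
from datetime import date, datetime, timedelta

def within_last_days(strings, days=365, today=None):
    """Return the strings whose date is no older than `days` before `today`."""
    if today is None:
        today = date.today()
    cutoff = today - timedelta(days=days)
    return [s for s in strings
            if cutoff <= datetime.strptime(s, "%b %d %Y").date() <= today]

sample = ['Nov 1 2021', 'Feb 1 2021', 'Nov 2 2020', 'Jul 15 2019']
# Fixed reference date (the day mentioned in the question) for reproducibility.
print(within_last_days(sample, today=date(2021, 11, 11)))
```

With the reference date fixed to 11 Nov 2021, the cutoff is 11 Nov 2020, so 'Nov 2 2020' and 'Jul 15 2019' are filtered out.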

Splitting date range of dates into two to lists

I want to split the following list of dates:
Month=['1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020']
The desired output is as follows:
Month = [['1 October 2020', '31 October 2020'],
['1 October 2020', '31 October 2020'],
['1 October 2020','31 October 2020'],
['1 October 2020', '31 October 2020'],
['1 October 2020','31 October 2020']]
How can I do this using regex?
I used Month.str.split('to'), but this does not work properly because "October" contains "to", so the split also happens inside "October". Therefore, I guess I have to use regex for this. What is the best way to achieve this?

Use ' to ' as the separator instead of just 'to'. This matches the input format better anyway, since if you split on 'to' you would also need to strip the whitespace.
>>> [i.split(' to ') for i in Month]
[['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020']]
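Since the question asks about regex specifically, here is a hedged sketch of the same idea with re.split. It tolerates extra whitespace around the separator; the second list entry is a made-up example, and SEPARATOR and split_range are illustrative names.

```python
import re

Month = ['1 October 2020 to 31 October 2020',
         '15 March 2021  to  2 April 2021']  # second entry is a made-up example

# \s+to\s+ only matches "to" surrounded by whitespace, so the "to" inside
# "October" is left alone.
SEPARATOR = re.compile(r'\s+to\s+')

def split_range(s):
    """Split 'START to END' into [START, END]."""
    return SEPARATOR.split(s)

result = [split_range(s) for s in Month]
print(result)
```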

Function that only works for the first two list from a list of lists

I have this list:
mylist = [
[1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183],
[1890922350110, 'May 2015, June 2015, April 2015', 'INDEMNIZATIA DE HRANA', 1183],
[1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183]
]
and my desired output is this:
mylist = [
[1890731350060, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA', 1183],
[1890922350110, 'Apr 2015, Mai 2015, Iun 2015', 'INDEMNIZATIA DE HRANA', 1183],
[1890731350060, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA', 1183]
]
For this I have these two functions:
from datetime import datetime
import re
def translateInRo(string, dyct):
    substrs = sorted(dyct, key=len, reverse=True)
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    return regexp.sub(lambda match: dyct[match.group(0)], string)
def orderDateslist(thislist):
    i=0
    for dates in thislist:
        sorted_list = []
        chgDates = dates[1].split(",")
        for test1 in chgDates:
            sorted_list.append(test1.strip())
        test = sorted(sorted_list, key=lambda x: datetime.strptime(x, "%B %Y"))
        str1 = ', '.join(test)
        translate = translateInRo(
            str1, {"January": "Ian", "February": "Feb", "March": "Mar", "April": "Apr", "May": "Mai", "June": "Iun", "July": "Iul", "August": "Aug", "September": "Sept", "October": "Oct", "November": "Nov", "December": "Dec"})
        thislist[i][1] = translate
        i = + 1
    return thislist
And when I print:
print (orderDateslist(mylist))
[[1890731350060, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA', 1183], [1890922350110, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA', 1183], [1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183]]
The last list is not processed: the function only works for the first two lists in the list of lists, and the ones after that stay the same. I want this function to work for a large number of lists. What do I have to change? I'm using Python 3. Also, the output contains a duplicated result.

Problem
To clarify the problem: from your expected output, it appears you wish to replace the string of dates at index 1 of each sublist by:
sorting dates by time
abbreviating the months according to a translation dictionary
This can be done as follows:
# Given
import datetime
mylist = [
[1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183],
[1890922350110, 'May 2015, June 2015, April 2015', 'INDEMNIZATIA DE HRANA', 1183],
[1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183]
]
TRANSLATE = {
"January": "Ian", "February": "Feb", "March": "Mar", "April": "Apr",
"May": "Mai", "June": "Iun", "July": "Iul", "August": "Aug",
"September": "Sept", "October": "Oct", "November": "Nov", "December": "Dec"
}
Code
def transform_dates(iterable, translate=TRANSLATE):
    transformed_lists = []
    for i, sublst in enumerate(iterable):
        transformed_lists.append(sublst[:])
        # Clean dates string
        raw_dates = sublst[1]
        cleaned_dates = set(map(str.strip, raw_dates.split(",")))
        # Sort dates string
        months_yrs = sorted(cleaned_dates, key=lambda x: datetime.datetime.strptime(x, "%B %Y"))
        months_yrs_split = [i.split() for i in months_yrs]
        # Abbreviate months
        abbrev_dates = [" ".join((translate[i[0]], i[1])) for i in months_yrs_split]
        transformed_lists[i][1] = ", ".join(abbrev_dates)
    return transformed_lists
transform_dates(mylist)
# [[1890731350060, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA',1183],
# [1890922350110, 'Apr 2015, Mai 2015, Iun 2015', 'INDEMNIZATIA DE HRANA',1183],
# [1890731350060, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA',1183]]
Notes
This function sorts chronologically, by year and then month.
lst = [[1890731350060, 'February 2015, March 2013, January 2016', 'INDEMNIZATIA DE HRANA', 1183]]
transform_dates(lst)
# [[1890731350060, 'Mar 2013, Feb 2015, Ian 2016', 'INDEMNIZATIA DE HRANA', 1183]]
This function removes duplicate dates.
lst = [[1890731350060, 'May 2016, June 2016, May 2016, July 2016', 'INDEMNIZATIA DE HRANA', 1183]]
transform_dates(lst)
# [[1890731350060, 'Mai 2016, Iun 2016, Iul 2016', 'INDEMNIZATIA DE HRANA', 1183]]
Details
If you are new to Python, these details may help explain what is happening.
The transform_dates() function accepts the list of lists called mylist as an argument. Inside the function, we first make a new list called transformed_lists that we will later append items to. We then loop over iterable (equivalent to mylist) to get each sublist and track its index position (i).
We add a copy of sublst to transformed_lists (hence the [:], which keeps us from modifying the original items in mylist). Then we start working on index 1, which contains the string of dates. We clean the string, first by splitting it into a list of month-year pairs and then by stripping leading and trailing spaces, e.g. ['February 2016', 'March 2016', 'January 2016']. If there are any duplicate dates, set() removes them, since a set is a collection of unique elements.
In preparation for the next step, we take this opportunity to sort the dates and split them further on the single space. Splitting makes a temporary nested list, e.g. [['January', '2016'], ['February', '2016'], ['March', '2016']].
Finally, for each item in that nested list, we abbreviate the month using the TRANSLATE dictionary and join() it back with the year, making a single list of new strings, e.g. ['Ian 2016', 'Feb 2016', 'Mar 2016']. Then we perform a final join() where each item is delimited by a comma (as requested), e.g. 'Ian 2016, Feb 2016, Mar 2016'.
We have finished transforming the string. Now we simply replace the old string at index 1 of the current sublist in transformed_lists by assigning the new string to that index. In summary, we have systematically selected the string, decomposed it, transformed it, put it back together and reassigned it to its original position in the list. We repeat this process for every sublist in iterable until the loop is complete. The result is transformed_lists, which is returned by the function.
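The steps described above can be traced on a single string. This sketch uses a three-month subset of the translation table and only illustrates the intermediate values; the variable names are chosen for this example, and it is not a replacement for transform_dates().

```python
from datetime import datetime

# Subset of the translation table, enough for this example.
TRANSLATE = {"January": "Ian", "February": "Feb", "March": "Mar"}

raw_dates = 'February 2016, March 2016, January 2016'

# 1. Split on commas, strip whitespace, and drop duplicates with set().
cleaned = set(map(str.strip, raw_dates.split(",")))

# 2. Sort chronologically by parsing "Month Year".
ordered = sorted(cleaned, key=lambda s: datetime.strptime(s, "%B %Y"))

# 3. Split each entry into month and year, and abbreviate the month.
abbreviated = [" ".join((TRANSLATE[m], y)) for m, y in (s.split() for s in ordered)]

# 4. Join back into one comma-separated string.
result = ", ".join(abbreviated)
print(result)
```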

You can try this:
import re
import itertools
def orderdates(full_date):
    table = {"January": "Ian", "February": "Feb", "March": "Mar", "April": "Apr", "May": "Mai", "June": "Iun", "July": "Iul", "August": "Aug", "September": "Sept", "October": "Oct", "November": "Nov", "December": "Dec"}
    l = ["Ian", "Feb", "Mar", "Apr", "Mai", "Iun", "Iul", "Aug", "Sept", "Oct", "Nov", "Dec"]
    new_dates = re.split(r",\s", full_date)
    final_dates = [[a, int(b)] for a, b in [i.split() for i in new_dates]]
    new_dates = sorted(final_dates, key = lambda x: x[-1])
    current = [list(b) for a, b in itertools.groupby(new_dates, lambda x: x[-1])]
    new_current = [[table[i]+" "+str(b) for i, b in c] for c in current]
    final_current = [sorted(b, key= lambda x:l.index(x.split()[0])) for b in new_current]
    return list(itertools.chain.from_iterable(final_current))
mylist = [[1890731350060, 'January 2016, February 2016, March 2015', 'INDEMNIZATIA DE HRANA', 1183], [1890922350110, 'May 2015, June 2015, April 2015', 'INDEMNIZATIA DE HRANA', 1183], [1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183]]
new_data = [[i[0], orderdates(i[1]), i[2:]] for i in mylist]
new_data = [list(itertools.chain(*[[b] if not isinstance(b, list) else b for b in i])) for i in new_data]
print(new_data)
Output:
[[1890731350060, 'Mar 2015', 'Ian 2016', 'Feb 2016', 'INDEMNIZATIA DE HRANA', 1183], [1890922350110, 'Apr 2015', 'Mai 2015', 'Iun 2015', 'INDEMNIZATIA DE HRANA', 1183], [1890731350060, 'Ian 2016', 'Feb 2016', 'Mar 2016', 'INDEMNIZATIA DE HRANA', 1183]]
