Order dates inside a string from a list - python

I have this kind of list of lists; here is an example input:
thislist= [[1, 'Aug 2014, Sept 2016, Ian 2014, Feb 2016', 2], [5,'Aug 2015, Sept 2012, Ian 2015, Aug 2017',4]]
I only want to work on index [1] of each sublist (the one with the dates), and my desired output is:
thislist= [[1, 'Ian, Aug 2014; Feb, Sept 2016', 2], [5,'Sept 2012; Ian, Aug 2015; Aug 2017',4]]
(The above is just an example; in my actual case I will have many more dates and years, but the format is exactly the same.)
Basically, I want to order the month abbreviations (they are in Romanian, but quite similar to English) by their actual calendar order (e.g. Ian, Feb, Mar, Apr, ...), group them by year in chronological order (2010, 2011, 2012, 2013, ...), and separate the year groups with ";". How can I do this? I think regex might be the only option, but I'm not very good with it. How can I get my desired output? I'm using Python 3. Thank you so much for your time!

Note that "%B %Y" parses the full month name. I used full (English) names because Romanian and English month abbreviations are not the same in all cases.
from datetime import datetime

thislist = [[1, 'August 2014, September 2016, January 2014, February 2016', 2],
            [5, 'August 2015, September 2012, January 2015, February 2017', 4]]

for dates in thislist:
    # Split the dates string on commas and strip the surrounding spaces
    parts = [d.strip() for d in dates[1].split(",")]
    # Sort chronologically by parsing each "Month Year" string
    ordered = sorted(parts, key=lambda x: datetime.strptime(x, "%B %Y"))
    # Write the result back in place, separated by "; "
    dates[1] = '; '.join(ordered)

print(thislist)
Output:
[[1, 'January 2014; August 2014; February 2016; September 2016', 2], [5, 'September 2012; January 2015; August 2015; February 2017', 4]]

Now you can translate from English to Romanian. You should read a little about lists and dictionaries in Python; I don't think you will receive a full answer if you just wait for the community.
from datetime import datetime
import re

thislist = [[1, 'August 2014, September 2016, January 2014, February 2016, March 2016', 2],
            [5, 'August 2015, September 2012, January 2015, February 2017', 4]]

# English -> Romanian month names (each key appears only once)
EN_TO_RO = {"January": "Ianuarie", "February": "Februarie", "March": "Martie",
            "April": "Aprilie", "May": "Mai", "June": "Iunie", "July": "Iulie",
            "August": "August", "September": "Septembrie", "October": "Octombrie",
            "November": "Noiembrie", "December": "Decembrie"}

def translateInRo(string, dyct):
    # Build one alternation of all keys, longest first, and substitute each match
    substrs = sorted(dyct, key=len, reverse=True)
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    return regexp.sub(lambda match: dyct[match.group(0)], string)

for dates in thislist:
    parts = [d.strip() for d in dates[1].split(",")]
    # Sort while the names are still English, so that strptime can parse them
    ordered = sorted(parts, key=lambda x: datetime.strptime(x, "%B %Y"))
    dates[1] = translateInRo(', '.join(ordered), EN_TO_RO)

print(thislist)
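Neither snippet above groups the months by year, which the question asks for. Below is a minimal sketch that works directly on the Romanian abbreviations. It assumes every entry looks like "<abbreviation> <year>". It also assumes the abbreviations are exactly Ian, Feb, Mar, Apr, Mai, Iun, Iul, Aug, Sept, Oct, Nov, Dec, which is the list used in another answer on this page; extend MONTHS if your data differs. The names MONTHS and group_dates are illustrative. No regex is needed: a plain sort on (year, month index) followed by itertools.groupby is enough.

```python
from itertools import groupby

# Assumed Romanian month abbreviations, in calendar order
MONTHS = ["Ian", "Feb", "Mar", "Apr", "Mai", "Iun", "Iul", "Aug", "Sept", "Oct", "Nov", "Dec"]

def group_dates(date_string):
    """'Aug 2014, Sept 2016, Ian 2014' -> 'Ian, Aug 2014; Sept 2016'"""
    entries = []
    for item in date_string.split(","):
        month, year = item.split()
        # Sort key: year first, then the month's position in the calendar
        entries.append((int(year), MONTHS.index(month), month))
    entries.sort()

    groups = []
    # After sorting, all entries of one year are adjacent, which groupby needs
    for year, items in groupby(entries, key=lambda e: e[0]):
        months = ", ".join(month for _, _, month in items)
        groups.append(f"{months} {year}")
    return "; ".join(groups)

thislist = [[1, 'Aug 2014, Sept 2016, Ian 2014, Feb 2016', 2],
            [5, 'Aug 2015, Sept 2012, Ian 2015, Aug 2017', 4]]
for row in thislist:
    row[1] = group_dates(row[1])
print(thislist)
```

With the sample data this prints the output requested in the question: [[1, 'Ian, Aug 2014; Feb, Sept 2016', 2], [5, 'Sept 2012; Ian, Aug 2015; Aug 2017', 4]].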

Related

Sort a list of dictionaries of dates by value

I'm trying to sort the values by year: values from the current year should come first.
mdlist = [{0:'31 Jan 2022', 1:'', 2:'10 Feb 2022'},
{0:'10 Feb 2021', 1:'20 Feb 2021', 2:''},
{0:'10 Feb 2022', 1:'10 Feb 2022', 2:'10 Feb 2022'}]
mdlist = sorted(mdlist, key=lambda d:d[0])
But it is not working as expected.
Expected output:
mdlist = [{0:'31 Jan 2022', 1:'', 2:'10 Feb 2022'},
{0:'10 Feb 2022', 1:'10 Feb 2022', 2:'10 Feb 2022'},
{0:'10 Feb 2021', 1:'20 Feb 2021', 2:''}]
You could leverage the fact that these are dates: parse them with the datetime module, then sort by year in descending order and by month and day in ascending order:
from datetime import datetime

def sorting_key(dct):
    ymd = datetime.strptime(dct[0], "%d %b %Y")
    return -ymd.year, ymd.month, ymd.day

mdlist.sort(key=sorting_key)
Output:
[{0: '31 Jan 2022', 1: '', 2: '10 Feb 2022'},
{0: '10 Feb 2022', 1: '10 Feb 2022', 2: '10 Feb 2022'},
{0: '10 Feb 2021', 1: '20 Feb 2021', 2: ''}]
Use a key function that returns 0 if the year is 2022 and 1 otherwise. This sorts all the 2022 dates first; because sorted() is stable, entries within each group keep their original order.
firstyear = '2022'
mdlist = sorted(mdlist, key=lambda d: 0 if d[0].split()[-1] == firstyear else 1)
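Both answers hard-code the reference year. Below is a sketch that reads the current year from the clock instead. The function name current_year_first and its current_year parameter are illustrative. It puts current-year entries first and sorts each of the two groups chronologically. Ordering the older years oldest-first is an assumption; the datetime answer above orders them newest-first. "%b" also assumes English month abbreviations, as in the default C locale.

```python
from datetime import datetime

def current_year_first(records, current_year=None):
    """Sort dicts on the date at key 0: current-year entries first, then the rest,
    each group in chronological order."""
    if current_year is None:
        current_year = datetime.now().year

    def key(d):
        dt = datetime.strptime(d[0], "%d %b %Y")
        # False sorts before True, so current-year entries come first
        return (dt.year != current_year, dt)

    return sorted(records, key=key)

mdlist = [{0: '31 Jan 2022', 1: '', 2: '10 Feb 2022'},
          {0: '10 Feb 2021', 1: '20 Feb 2021', 2: ''},
          {0: '10 Feb 2022', 1: '10 Feb 2022', 2: '10 Feb 2022'}]

# Pin the year so the result matches the question's expected output
print(current_year_first(mdlist, current_year=2022))
```

With current_year=2022 the order is 31 Jan 2022, 10 Feb 2022, 10 Feb 2021, as in the expected output. Leaving current_year out makes the function use today's year.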

split string dates into list of multiple dates

I have the following list, and I am trying to split each element into multiple dates. I want to write a function to do it, but I am not sure whether regex or datetime is the way to go.
x=["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
for i in x:
    temp = my_func(i)
    if len(temp) == 1:
        date1 = temp[0]
        date2 = ""
    elif len(temp) >= 2:
        date1 = temp[0]
        date2 = temp[1]
    else:
        continue
    # rest of my code
Here is the expected output of my_func:
# my_func(x[0]) -> ["2- 7 MAY, 2020", "10-12 JUN, 2014"]
# my_func(x[1]) -> ["7 February, 2020", "6 February, 2020", "26 October, 2018"]
# my_func(x[-1]) -> ["14 JUN, 2020"]
Based on your examples, a lazy match from a digit up to the first four-digit year does the job:
import re
for i in x:
    temp = re.findall(r'\d.*?\d{4}', i)
    print(temp)
Output:
['2- 7 MAY, 2020', '10-12 JUN, 2014']
['7 February, 2020', '6 February, 2020', '26 October, 2018']
['16 JUN, 2020', '24 JUL, 2020', '28 FEB, 2020', '15 SEPT, 2020', '8-11 MAY, 2023', '22 OCT, 2020']
['14 JUN, 2020']
Split the string on ','. There is always an even number of parts, and every two contiguous parts make up one date, so just join each pair of parts back together to form a date string.
re is fine, but this should work too:
>>> x = ["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
>>> result = []
>>> for s in x:
...     parts = s.split(',')
...     result.append([','.join(parts[i:i+2]).strip() for i in range(0, len(parts), 2)])
...
>>> result
[['2- 7 MAY, 2020', '10-12 JUN, 2014'],
 ['7 February, 2020', '6 February, 2020', '26 October, 2018'],
 ['16 JUN, 2020', '24 JUL, 2020', '28 FEB, 2020', '15 SEPT, 2020', '8-11 MAY, 2023', '22 OCT, 2020'],
 ['14 JUN, 2020']]
Your my_func will be simply:
>>> def my_func(s):
...     parts = s.split(',')
...     return [','.join(parts[i:i+2]).strip() for i in range(0, len(parts), 2)]
...
This version uses a global variable, and its result depends on how the dates are formatted. As the output below shows, it only separates the first date from the rest and truncates the last string, so it does not fully produce the requested split.
dates=["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]

def splitdates(date):
    if type(date) is int:
        tosplit = str(dates[date])
    else:
        tosplit = date
    month = ["J" , "j" , "f" , "F" , "m" , "M" , "A" , "a" , "s" , "S" ,"n" , "N" , "o" , "O" , "D" , "d"]
    for item, character in enumerate(tosplit):
        if character in month:
            for item2, character in enumerate(tosplit[item+1:]):
                if character.startswith(","):
                    for item3, character in enumerate(tosplit[item+item2+2:]):
                        if character.startswith(","):
                            global newdate
                            newdate.append(tosplit[:item + item2 + item3 + 3])
                            nextPart = tosplit[item + item2 + item3 + 3:]
                            if nextPart.endswith(";"):
                                newPart = nextPart
                                splitdates(newPart)
                            else:
                                newdate.append(tosplit[item+item2+item3+3:])
                                return newdate
                    newdate.append(tosplit[:item+item2+item3])
                    return newdate

for x in range(len(dates)):
    newdate = []
    print("Date: ", splitdates(x))
Output is:
Date: ['2- 7 MAY, 2020,', ' 10-12 JUN, 2014']
Date: ['7 February, 2020,', ' 6 February, 2020, 26 October, 2018']
Date: ['16 JUN, 2020,', ' 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020']
Date: ['14 JUN, 2']
**This is the simplest algorithm**
Iterate over the list one string at a time and split each string on commas.
Then take two consecutive values at a time, concatenate them, and append the result to a new list.
x = ["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
for i in x:
    final_answer = []
    f = i.split(',')
    j = 0
    while j < len(f):
        final_answer.append(str(f[j] + f[j+1]))
        j = j + 2
    print(final_answer)
Output:
['2- 7 MAY 2020', ' 10-12 JUN 2014']
['7 February 2020', ' 6 February 2020', ' 26 October 2018']
['16 JUN 2020', ' 24 JUL 2020', ' 28 FEB 2020', ' 15 SEPT 2020', ' 8-11 MAY 2023', ' 22 OCT 2020']
['14 JUN 2020']
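Another option is a single re.split() that only breaks at a comma preceded by a four-digit year, so the comma between the month and the year is kept. This sketch assumes that every date ends in a four-digit year and that no other comma in the string follows four digits. split_dates is an illustrative name standing in for the question's my_func.

```python
import re

def split_dates(s):
    # (?<=\d{4}) is a fixed-width lookbehind: split only at commas that
    # directly follow four digits (a year), and consume the spaces after them
    return re.split(r"(?<=\d{4}),\s*", s)

x = ["2- 7 MAY, 2020, 10-12 JUN, 2014",
     "7 February, 2020, 6 February, 2020, 26 October, 2018",
     "16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020",
     "14 JUN, 2020"]

for s in x:
    print(split_dates(s))
```

This prints the same four lists as the findall answer above, with the month/year comma intact and no leading spaces.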

How do I dynamically capture in regex two dates from one line of text?

I have a text that will change weekly:
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
I'm looking for regex patterns that capture the date for year 1 and for year 2.
(Both will change weekly so I need the formula to capture all months, days, years)
My output should be the following:
2015 = November 5, 2015
2016 = November 3, 2016
The framework I'm using does not allow for regex capture groups or splits, so I need the formula to be specialized for this type of string.
Thanks!
Code
As per my original comments
See the regex in use on regex101:
(\w+\s+\d+,\s*(\d+))
Note: the regex above and the one on regex101 differ on purpose. regex101 can only demonstrate the output of substitutions, so there I prepended .*? to the regex in order to display the expected output properly.
Results
Input
Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015
Output
2016 = November 3, 2016
2015 = November 5, 2015
Usage
import re

regex = r"(\w+\s+\d+,\s*(\d+))"
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
# findall returns one (full date, year) tuple per match
for date, year in re.findall(regex, text):
    print(year + ' = ' + date)
You can try this:
import re

text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
dates = re.findall(r"[a-zA-Z]+\s\d+,\s\d+", text)
# Prefix each date with its year, then sort numerically on that year
final_data = sorted(["{} = {}".format(re.findall(r"\d+$", d)[0], d) for d in dates],
                    key=lambda x: int(re.findall(r"^\d+", x)[0]))
print(final_data)
Output:
['2015 = November 5, 2015', '2016 = November 3, 2016']
Using @ctwheels' regex:
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
import re
result = [(date.split(",")[1].strip(), date) for date in re.findall(r'\w+\s+\d+,\s*\d+', text)]
print(result)
# [('2016', 'November 3, 2016'), ('2015', 'November 5, 2015')]
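The question says the framework allows neither capture groups nor splits. Here is a sketch that uses lookarounds instead, assuming the framework's regex engine supports lookahead (Python's re does; other engines may not). One pattern matches the first "Month D, YYYY" date, which is still followed by another date. The other matches the last one, which is not. DATE, first_pattern and last_pattern are illustrative names.

```python
import re

text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"

# A "Month D, YYYY" date; no capture groups are used anywhere
DATE = r"[A-Za-z]+ \d{1,2}, \d{4}"

# First date: one that is still followed by another date later in the line
first_pattern = DATE + r"(?=.*" + DATE + r")"
# Last date: one that is NOT followed by any further date
last_pattern = DATE + r"(?!.*" + DATE + r")"

first = re.search(first_pattern, text).group()
last = re.search(last_pattern, text).group()

for date in (first, last):
    print(date[-4:] + " = " + date)  # the year is the last four characters
```

This prints 2016 = November 3, 2016 followed by 2015 = November 5, 2015.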

Map conversion inside a list

Here is an example of mylist as input:
mylist = [['RWILY MORSHED', 7670315350025, 'August/2014, Iulie/2014, Septembrie/2014', 1620, 1620], ['AL BADRI MOHAMMED YAHYA TAWFEEQ', 7700119350028, 'Martie/2015, Aprilie/2015, Februarie/2015', 1620, 1620]]
and my desired output is:
mylist = [['RWILY MORSHED', 7670315350025, 'August 2014, July 2014, September 2014', 1620, 1620], ['AL BADRI MOHAMMED YAHYA TAWFEEQ', 7700119350028, 'March 2015, April 2015, February 2015', 1620, 1620]]
Basically, I want to convert the month names from Romanian to English and get the desired output above (all the months must stay at index 2, in one string, as in the output).
To do that, I ended up with this:
conversionsEnNames = {"Ianuarie": "January", "Februarie": "February","Martie": "March", "Aprilie": "April","Mai": "May","Iunie": "June", "Iulie": "July","August": "August", "Septembrie": "September","Octombrie": "October", "Noiembrie": "November","Decembrie": "December"}
for i in mylist:
    i[2] = i[2].replace("/", " ")
    for j in i:
        if j in conversionsEnNames:
            j = conversionsEnNames[j]
            i[2] = j
print(mylist)
But that will print:
[['RWILY MORSHED', 7670315350025, 'August 2014, Iulie 2014, Septembrie 2014', 1620, 1620], ['AL BADRI MOHAMMED YAHYA TAWFEEQ', 7700119350028, 'Martie 2015, Aprilie 2015, Februarie 2015', 1620, 1620]]
The lookup clearly won't work, since 'Iulie 2014' != 'Iulie', so it can't convert it to July. What do I have to modify to achieve my desired output? Thank you so much for your time!
That should do it.
for item in mylist:
    item[2] = item[2].replace('/', ' ')
    for roman, english in conversionsEnNames.items():
        item[2] = item[2].replace(roman, english)
You could also use regular expressions (just because regexes are cool, not because the solution is better than the others). Here's a snippet that takes care of the replacement in a single string:
import re

def rep(m):
    # Translate the month if it is known; otherwise leave the match unchanged
    if m.group(1) in conversionsEnNames:
        return conversionsEnNames[m.group(1)] + " " + m.group(2)
    else:
        return m.group(0)

test = "blah blah Iulie/2014 blah Februarie/2015 blih blah bluh"
result = re.sub(r"([A-Za-z]+)/(\d\d\d\d)", rep, test)
It should yield
"blah blah July 2014 blah February 2015 blih blah bluh"
You can improve on that as needed.
You can use a simple dict lookup:
def convert_date(date_string, conversionsEnNames):
    # 'Iulie/2014, ...' -> [['Iulie', '2014'], ...]
    parts = [part.split('/') for part in date_string.split(', ')]
    return ', '.join(conversionsEnNames[part[0]] + ' ' + part[1] for part in parts)

for line in mylist:
    line[2] = convert_date(line[2], conversionsEnNames)
print(mylist)
[['RWILY MORSHED', 7670315350025, 'August 2014, July 2014, September 2014', 1620, 1620],
['AL BADRI MOHAMMED YAHYA TAWFEEQ', 7700119350028, 'March 2015, April 2015, February 2015', 1620, 1620]]
Just change the last three lines in the loop.
>>> for i in mylist:
...     i[2] = i[2].replace("/", " ")
...     for j in conversionsEnNames:
...         i[2] = i[2].replace(j, conversionsEnNames[j])
...
>>> print(mylist)
[['RWILY MORSHED', 7670315350025, 'August 2014, July 2014, September 2014', 1620, 1620], ['AL BADRI MOHAMMED YAHYA TAWFEEQ', 7700119350028, 'March 2015, April 2015, February 2015', 1620, 1620]]
Be careful: i is not the "months" string but the whole sublist, so "Iulie" is not an element of the list (it is only a substring of one of its elements).
Instead, you can apply .replace(x, conversionsEnNames[x]) for each x in conversionsEnNames. This is not entirely "safe", because plain substring replacement would also change a month name that happens to appear inside other text, but it works in this case.
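Here is one way to make the replacement safer, as a sketch. It only translates whole month names that are directly followed by "/" and a four-digit year, using word boundaries (\b). That way a name such as "Mai" is never replaced inside an unrelated word. The pattern MONTH_YEAR and the helper to_english are illustrative; conversionsEnNames is the dictionary from the question.

```python
import re

conversionsEnNames = {"Ianuarie": "January", "Februarie": "February", "Martie": "March",
                      "Aprilie": "April", "Mai": "May", "Iunie": "June", "Iulie": "July",
                      "August": "August", "Septembrie": "September", "Octombrie": "October",
                      "Noiembrie": "November", "Decembrie": "December"}

# Whole Romanian month name, a slash, then a four-digit year
MONTH_YEAR = re.compile(r"\b(" + "|".join(map(re.escape, conversionsEnNames)) + r")/(\d{4})\b")

def to_english(date_string):
    # Replace each "Name/YYYY" with "EnglishName YYYY"; any other text is left untouched
    return MONTH_YEAR.sub(lambda m: conversionsEnNames[m.group(1)] + " " + m.group(2), date_string)

mylist = [['RWILY MORSHED', 7670315350025, 'August/2014, Iulie/2014, Septembrie/2014', 1620, 1620],
          ['AL BADRI MOHAMMED YAHYA TAWFEEQ', 7700119350028, 'Martie/2015, Aprilie/2015, Februarie/2015', 1620, 1620]]
for row in mylist:
    row[2] = to_english(row[2])
print(mylist)
```

This gives the desired 'August 2014, July 2014, September 2014' and 'March 2015, April 2015, February 2015' at index 2. A string such as 'Maine Mai/2016' becomes 'Maine May 2016'.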

Function that only works for the first two lists of a list of lists

I have this list:
mylist = [
[1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183],
[1890922350110, 'May 2015, June 2015, April 2015', 'INDEMNIZATIA DE HRANA', 1183],
[1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183]
]
and my desired output is this:
mylist = [
[1890731350060, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA', 1183],
[1890922350110, 'Iun 2016, Mai 2016, Apr 2016', 'INDEMNIZATIA DE HRANA', 1183],
[1890731350060, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA', 1183]
]
For this I have these two functions:
from datetime import datetime
import re

def translateInRo(string, dyct):
    substrs = sorted(dyct, key=len, reverse=True)
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    return regexp.sub(lambda match: dyct[match.group(0)], string)

def orderDateslist(thislist):
    i=0
    for dates in thislist:
        sorted_list = []
        chgDates = dates[1].split(",")
        for test1 in chgDates:
            sorted_list.append(test1.strip())
        test = sorted(sorted_list, key=lambda x: datetime.strptime(x, "%B %Y"))
        str1 = ', '.join(test)
        translate = translateInRo(
            str1, {"January": "Ian", "February": "Feb", "March": "Mar", "April": "Apr", "May": "Mai", "June": "Iun", "July": "Iul", "August": "Aug", "September": "Sept", "October": "Oct", "November": "Nov", "December": "Dec"})
        thislist[i][1] = translate
        i = + 1
    return thislist
And when I print:
print (orderDateslist(mylist))
[[1890731350060, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA', 1183], [1890922350110, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA', 1183], [1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183]]
The last list isn't processed: my function only works for the first two lists in a list of lists, and the ones after stay the same. I want this function to work for a large number of lists; what do I have to change? I'm using Python 3. Also, the last result is duplicated.
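The cause is the line i = + 1. Python reads it as i = (+1), which assigns positive one instead of incrementing. From the second sublist on, every result is therefore written to index 1. The second sublist ends up overwritten by the third one's result, which looks like a duplicate, and the third is never changed. Writing i += 1 fixes it, and letting enumerate() supply the index removes the manual counter altogether. Below is a minimal sketch of that change that keeps the question's translateInRo helper; the dictionary name ABBREV is illustrative.

```python
from datetime import datetime
import re

# Why the original loop breaks: "i = + 1" is the assignment "i = (+1)"
i = 0
for _ in range(3):
    i = + 1
print(i)  # 1, not 3

ABBREV = {"January": "Ian", "February": "Feb", "March": "Mar", "April": "Apr",
          "May": "Mai", "June": "Iun", "July": "Iul", "August": "Aug",
          "September": "Sept", "October": "Oct", "November": "Nov", "December": "Dec"}

def translateInRo(string, dyct):
    substrs = sorted(dyct, key=len, reverse=True)
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    return regexp.sub(lambda match: dyct[match.group(0)], string)

def orderDateslist(thislist):
    # enumerate() yields (index, sublist) pairs, so there is no counter to get wrong
    for i, dates in enumerate(thislist):
        parts = [d.strip() for d in dates[1].split(",")]
        ordered = sorted(parts, key=lambda x: datetime.strptime(x, "%B %Y"))
        thislist[i][1] = translateInRo(", ".join(ordered), ABBREV)
    return thislist

mylist = [
    [1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183],
    [1890922350110, 'May 2015, June 2015, April 2015', 'INDEMNIZATIA DE HRANA', 1183],
    [1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183]
]
print(orderDateslist(mylist))
```

Every sublist is now processed: the first and third become 'Ian 2016, Feb 2016, Mar 2016' and the second becomes 'Apr 2015, Mai 2015, Iun 2015'.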
Problem
To clarify the problem: from your expected output, it appears you wish to replace the string of dates at index 1 of each sublist by:
sorting dates by time
abbreviating the months according to a translation dictionary
This can be done as follows:
# Given
import datetime
mylist = [
[1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183],
[1890922350110, 'May 2015, June 2015, April 2015', 'INDEMNIZATIA DE HRANA', 1183],
[1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183]
]
TRANSLATE = {
"January": "Ian", "February": "Feb", "March": "Mar", "April": "Apr",
"May": "Mai", "June": "Iun", "July": "Iul", "August": "Aug",
"September": "Sept", "October": "Oct", "November": "Nov", "December": "Dec"
}
Code
def transform_dates(iterable, translate=TRANSLATE):
    transformed_lists = []
    for i, sublst in enumerate(iterable):
        transformed_lists.append(sublst[:])
        # Clean dates string
        raw_dates = sublst[1]
        cleaned_dates = set(map(str.strip, raw_dates.split(",")))
        # Sort dates string
        months_yrs = sorted(cleaned_dates, key=lambda x: datetime.datetime.strptime(x, "%B %Y"))
        months_yrs_split = [d.split() for d in months_yrs]
        # Abbreviate months
        abbrev_dates = [" ".join((translate[month], yr)) for month, yr in months_yrs_split]
        transformed_lists[i][1] = ", ".join(abbrev_dates)
    return transformed_lists
transform_dates(mylist)
# [[1890731350060, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA',1183],
# [1890922350110, 'Apr 2015, Mai 2015, Iun 2015', 'INDEMNIZATIA DE HRANA',1183],
# [1890731350060, 'Ian 2016, Feb 2016, Mar 2016', 'INDEMNIZATIA DE HRANA',1183]]
Notes
This function sorts chronologically (by year, then by month).
lst = [[1890731350060, 'February 2015, March 2013, January 2016', 'INDEMNIZATIA DE HRANA', 1183]]
transform_dates(lst)
# [[1890731350060, 'Mar 2013, Feb 2015, Ian 2016', 'INDEMNIZATIA DE HRANA', 1183]]
This function removes duplicate dates.
lst = [[1890731350060, 'May 2016, June 2016, May 2016, July 2016', 'INDEMNIZATIA DE HRANA', 1183]]
transform_dates(lst)
# [[1890731350060, 'Mai 2016, Iun 2016, Iul 2016', 'INDEMNIZATIA DE HRANA', 1183]]
Details
If you are new to Python, these details should help explain what's happening.
The transform_dates() function accepts the list of lists called mylist as an argument. Inside the function, we first make a new list called transformed_lists, to which we will later append items. We then loop over iterable (equivalent to mylist) to get each sublist and track its index position (i).
We append a copy of sublst to transformed_lists (hence the [:], which keeps us from modifying the original items in mylist). Then we start working on index 1, which contains the string of dates. We clean the string, first by splitting it into a list of month-year strings and then by stripping leading and trailing spaces, e.g. ['February 2016', 'March 2016', 'January 2016']. If there are any duplicate dates, set() removes them, since a set is a collection of unique elements.
In preparation for the next step, we take this opportunity to sort the dates and split each of them further on the single space. Splitting makes a temporary nested list, e.g. [['January', '2016'], ['February', '2016'], ['March', '2016']].
Finally, for each item in that nested list, we abbreviate the month using the TRANSLATE dictionary and join() it back with the year, making a single list of new strings, e.g. ['Ian 2016', 'Feb 2016', 'Mar 2016']. Then we perform a final join() where the items are delimited by a comma (as requested), e.g. 'Ian 2016, Feb 2016, Mar 2016'.
We have finished transforming the string. Now we simply replace the old string at index 1 of the copied sublist in transformed_lists by assigning the new string to that index. In summary, we have systematically selected the string, decomposed it, transformed it, put it back together, and reassigned it to its original position in the list. We repeat this process for every sublist in iterable until the loop is complete. The result is transformed_lists, which the function returns.
You can try this:
import re
import itertools

def orderdates(full_date):
    table = {"January": "Ian", "February": "Feb", "March": "Mar", "April": "Apr", "May": "Mai", "June": "Iun", "July": "Iul", "August": "Aug", "September": "Sept", "October": "Oct", "November": "Nov", "December": "Dec"}
    l = ["Ian", "Feb", "Mar", "Apr", "Mai", "Iun", "Iul", "Aug", "Sept", "Oct", "Nov", "Dec"]
    new_dates = re.split(r",\s", full_date)
    final_dates = [[a, int(b)] for a, b in [i.split() for i in new_dates]]
    # Sort by year, then group the entries that share a year
    new_dates = sorted(final_dates, key=lambda x: x[-1])
    current = [list(b) for a, b in itertools.groupby(new_dates, lambda x: x[-1])]
    # Abbreviate the months, then sort each year's group in calendar order
    new_current = [[table[i] + " " + str(b) for i, b in c] for c in current]
    final_current = [sorted(b, key=lambda x: l.index(x.split()[0])) for b in new_current]
    return list(itertools.chain.from_iterable(final_current))

mylist = [[1890731350060, 'January 2016, February 2016, March 2015', 'INDEMNIZATIA DE HRANA', 1183], [1890922350110, 'May 2015, June 2015, April 2015', 'INDEMNIZATIA DE HRANA', 1183], [1890731350060, 'February 2016, March 2016, January 2016', 'INDEMNIZATIA DE HRANA', 1183]]
new_data = [[i[0], orderdates(i[1]), i[2:]] for i in mylist]
# Flatten the nested lists back into one flat list per row
new_data = [list(itertools.chain(*[[b] if not isinstance(b, list) else b for b in i])) for i in new_data]
print(new_data)
Output:
[[1890731350060, 'Mar 2015', 'Ian 2016', 'Feb 2016', 'INDEMNIZATIA DE HRANA', 1183], [1890922350110, 'Apr 2015', 'Mai 2015', 'Iun 2015', 'INDEMNIZATIA DE HRANA', 1183], [1890731350060, 'Ian 2016', 'Feb 2016', 'Mar 2016', 'INDEMNIZATIA DE HRANA', 1183]]
