How to find time format in a text? - python

I want to extract only the time from text that contains many different date and time formats, such as 'thursday, august 6, 2020 4:32:54 pm', '25 september 2020 04:05 pm' and '29 april 2020 07:42'. For these examples I want to get 4:32:54, 04:05 and 07:42. Can you help me with that?

I would try something like this:
times = [
'thursday, august 6, 2020 4:32:54 pm',
'25 september 2020 04:05 pm',
'29 april 2020 07:42',
]
print("\n".join("".join(i for i in t.split() if ":" in i) for t in times))
Output:
4:32:54
04:05
07:42
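Splitting on whitespace works as long as each time is its own token. A regex is an alternative. The sketch below assumes a time always looks like H:MM or H:MM:SS; the names TIME_RE and extract_times are illustrative and not part of the original answer.

```python
import re

# Assumed time shape: one or two hour digits, ":MM", and an optional ":SS".
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")

times = [
    'thursday, august 6, 2020 4:32:54 pm',
    '25 september 2020 04:05 pm',
    '29 april 2020 07:42',
]

def extract_times(text):
    """Return every H:MM or H:MM:SS substring found in text."""
    return TIME_RE.findall(text)

for t in times:
    print(extract_times(t))  # one list per input string
```

Unlike the split-based version, this returns a list per string, so a text containing two times yields two separate entries instead of one concatenated string.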

Related

Pandas date conversion: unconverted data remains

In Pandas (Jupyter) I have a column with dates in string format:
koncerti.Date.values[:20]
array(['15 September 2010', '16 September 2010', '18 September 2010',
'20 September 2010', '21 September 2010', '23 September 2010',
'24 September 2010', '26 September 2010', '28 September 2010',
'30 September 2010', '1 October 2010', '3 October 2010',
'5 October 2010', '6 October 2010', '8 October 2010',
'10 October 2010', '12 October 2010', '13 October 2010',
'15 October 2010', '17 October 2010'], dtype=object)
I try to convert them to date format with the following statement:
koncerti.Date = pd.to_datetime(koncerti.Date, format='%d %B %Y')
Unfortunately, it produces the following error: ValueError: unconverted data remains: [31]
What does this error mean?
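Solution: koncerti.Date = pd.to_datetime(koncerti.Date, format='%d %B %Y', exact=False)
An additional parameter was needed: exact=False. The error means that for at least one value, text was left over after '%d %B %Y' had been matched (the leftover text, here "[31]", is shown after the colon). With exact=False, pandas allows the format to match anywhere in the string instead of requiring it to match the whole string.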
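The same error comes from the standard library's datetime.strptime, which makes it easy to see what is going on. This sketch assumes the offending value is a date followed by a footnote marker such as "[31]" (the value raw is hypothetical) and shows an alternative to exact=False: strip the marker, then parse with the strict format.

```python
import re
from datetime import datetime

# Hypothetical value: a date with a trailing footnote marker.
raw = '15 September 2010[31]'

try:
    datetime.strptime(raw, '%d %B %Y')
except ValueError as exc:
    print(exc)  # the message names the leftover text after the matched format

def parse_date(text):
    """Remove "[n]" footnote markers, then parse the remaining date strictly."""
    cleaned = re.sub(r'\[\d+\]', '', text).strip()
    return datetime.strptime(cleaned, '%d %B %Y')

print(parse_date(raw).date())  # prints 2010-09-15
```

In pandas, a similar cleanup could be applied to the whole column first, e.g. koncerti.Date.str.replace(r'\[\d+\]', '', regex=True), before calling pd.to_datetime. Cleaning explicitly has the advantage that genuinely malformed values still raise an error instead of being parsed from a partial match.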

Splitting a list of date ranges into two-element lists

I want to split the following list of dates:
Month=['1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020']
The desired output is as follows:
Month = [['1 October 2020', '31 October 2020'],
['1 October 2020', '31 October 2020'],
['1 October 2020','31 October 2020'],
['1 October 2020', '31 October 2020'],
['1 October 2020','31 October 2020']]
How can I do this using regex?
I used Month.str.split('to'), but this does not work properly because "October" contains "to", so the split also cuts "October" apart. Therefore, I guess I have to use regex for this. What is the best way to achieve this?
Split on ' to ' (with the surrounding spaces) instead of just 'to'. This matches the input format better anyway, since if you split on 'to' you would also need to strip the whitespace.
>>> [i.split(' to ') for i in Month]
[['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020']]
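Since the question asks for regex, here is a sketch of the regex equivalent. It assumes "to" only ever appears as a separate word between the two dates; the name SEP is illustrative.

```python
import re

Month = ['1 October 2020 to 31 October 2020'] * 5

# \bto\b matches "to" only as a whole word, so "October" is left intact;
# the surrounding \s* absorbs the spaces so no extra strip() is needed.
SEP = re.compile(r'\s*\bto\b\s*')

pairs = [SEP.split(s) for s in Month]
print(pairs[0])  # ['1 October 2020', '31 October 2020']
```

The word boundaries do the same job as the spaces in ' to ', but they also tolerate irregular spacing such as "2020  to 31".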

Split a string of dates into a list of multiple dates

I have the following list, and I am trying to split each element into multiple "dates". I want to write a function to do it, but I am not sure whether regex or datetime is the way to go.
x=["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
for i in x:
    temp = my_func(i)
    if len(temp) == 1:
        date1 = temp[0]
        date2 = ""
    elif len(temp) >= 2:
        date1 = temp[0]
        date2 = temp[1]
    else:
        continue
    # rest of my code
Here is the expected output of my_func:
# my_func(x[0]) = ["2- 7 MAY, 2020", "10-12 JUN, 2014"]
# my_func(x[1]) = ["7 February, 2020", "6 February, 2020", "26 October, 2018"]
# my_func(x[-1]) = ["14 JUN, 2020"]
Based on your examples, a non-greedy regex that stops at the first four-digit year does the job:
import re
for i in x:
    temp = re.findall(r'\d.*?\d{4}', i)
    print(temp)
# output:
['2- 7 MAY, 2020', '10-12 JUN, 2014']
['7 February, 2020', '6 February, 2020', '26 October, 2018']
['16 JUN, 2020', '24 JUL, 2020', '28 FEB, 2020', '15 SEPT, 2020', '8-11 MAY, 2023', '22 OCT, 2020']
['14 JUN, 2020']
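The lazy pattern \d.*?\d{4} works because it expands only until the first four-digit run. A stricter sketch spells out the shape of one date instead, so stray text between dates is never swallowed. It assumes every date is a day or day range, a month word, a comma and a four-digit year; DATE_RE is an illustrative name.

```python
import re

# Assumed date shape: "7" or "2- 7" or "10-12", a month word, ",", a 4-digit year.
DATE_RE = re.compile(r'\d{1,2}(?:\s*-\s*\d{1,2})?\s+[A-Za-z]+,\s*\d{4}')

def my_func(s):
    """Return every date-like substring of s, in order of appearance."""
    return DATE_RE.findall(s)

x = ["2- 7 MAY, 2020, 10-12 JUN, 2014",
     "7 February, 2020, 6 February, 2020, 26 October, 2018",
     "16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020",
     "14 JUN, 2020"]

for s in x:
    print(my_func(s))
```

Using the non-capturing group (?:...) keeps findall returning whole matches rather than tuples of groups.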
Split the string on ','. There is always an even number of parts, and every two consecutive parts make up one date, so just join each pair back together to form a date string.
re is fine, but this should work too:
>>> x = ["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
>>> result = []
>>> for s in x:
...     parts = s.split(',')
...     result.append([','.join(parts[i:i+2]).strip() for i in range(0, len(parts), 2)])
...
>>> result
[['2- 7 MAY, 2020', '10-12 JUN, 2014'],
 ['7 February, 2020', '6 February, 2020', '26 October, 2018'],
 ['16 JUN, 2020', '24 JUL, 2020', '28 FEB, 2020', '15 SEPT, 2020', '8-11 MAY, 2023', '22 OCT, 2020'],
 ['14 JUN, 2020']]
Your my_func is then simply:
>>> def my_func(s):
...     parts = s.split(',')
...     return [','.join(parts[i:i+2]).strip() for i in range(0, len(parts), 2)]
...
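This uses a global variable (newdate). Note that the output is not exactly what you asked for: the first date keeps a trailing comma, everything after the first date ends up in a single second element, and the single date in the last string is cut short.
The program depends on how the dates are formatted.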
dates=["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
def splitdates(date):
    if type(date) is int:
        tosplit = str(dates[date])
    else:
        tosplit = date
    month = ["J" , "j" , "f" , "F" , "m" , "M" , "A" , "a" , "s" , "S" ,"n" , "N" , "o" , "O" , "D" , "d"]
    for item, character in enumerate(tosplit):
        if character in month:
            for item2, character in enumerate(tosplit[item+1:]):
                if character.startswith(","):
                    for item3, character in enumerate(tosplit[item+item2+2:]):
                        if character.startswith(","):
                            global newdate
                            newdate.append(tosplit[:item + item2 + item3 + 3])
                            nextPart = tosplit[item + item2 + item3 + 3:]
                            if nextPart.endswith(";"):
                                newPart = nextPart
                                splitdates(newPart)
                            else:
                                newdate.append(tosplit[item+item2+item3+3:])
                            return newdate
                    newdate.append(tosplit[:item+item2+item3])
                    return newdate
for x in range(len(dates)):
    newdate = []
    print("Date: ",splitdates(x))
Output is:
Date: ['2- 7 MAY, 2020,', ' 10-12 JUN, 2014']
Date: ['7 February, 2020,', ' 6 February, 2020, 26 October, 2018']
Date: ['16 JUN, 2020,', ' 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020']
Date: ['14 JUN, 2']
This is the simplest algorithm:
iterate over the list, taking one string at a time, and split each string on ','.
Then take two consecutive parts at a time, concatenate them, and append the result to a new list.
x = ["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
for i in x:
    final_answer = []
    f = i.split(',')
    j = 0
    while j < len(f):
        final_answer.append(str(f[j] + f[j+1]))
        j = j + 2
    print(final_answer)
Output:
['2- 7 MAY 2020', ' 10-12 JUN 2014']
['7 February 2020', ' 6 February 2020', ' 26 October 2018']
['16 JUN 2020', ' 24 JUL 2020', ' 28 FEB 2020', ' 15 SEPT 2020', ' 8-11 MAY 2023', ' 22 OCT 2020']
['14 JUN 2020']
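The "take two consecutive parts" step can also be written with zip over a single iterator, which keeps the comma and strips the leading space. This is a sketch; pair_up is a hypothetical name. Unlike the while loop in this answer, an odd number of parts does not raise IndexError; the unpaired last part is silently dropped.

```python
def pair_up(s):
    """Join every two comma-separated parts of s back into one date string."""
    parts = iter(s.split(','))
    # zip(parts, parts) pulls two items from the same iterator on each step,
    # so consecutive parts are paired up.
    return [f'{a},{b}'.strip() for a, b in zip(parts, parts)]

x = ["2- 7 MAY, 2020, 10-12 JUN, 2014",
     "7 February, 2020, 6 February, 2020, 26 October, 2018",
     "14 JUN, 2020"]

for s in x:
    print(pair_up(s))
```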

How do I dynamically capture in regex two dates from one line of text?

I have a text that will change weekly:
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
I'm looking for regex patterns for year 1 and year 2.
(Both change weekly, so the pattern needs to capture any month, day and year.)
My output should be the following:
2015 = November 5, 2015
2016 = November 3, 2016
The framework I'm using does not allow for regex capture groups or splits, so I need the formula to be specialized for this type of string.
Thanks!
Code
As per my original comments
See regex in use here
(\w+\s+\d+,\s*(\d+))
Note: The regex above and the regex on regex101 differ. This is deliberate: regex101 can only display the output of substitutions, so I've prepended .*? to the regex there in order to display the expected output properly.
Results
Input
Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015
Output
2016 = November 3, 2016
2015 = November 5, 2015
Usage
import re
regex = r"(\w+\s+\d+,\s*(\d+))"
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
# findall returns one (date, year) tuple per match because the regex has two groups
for date, year in re.findall(regex, text):
    print(year + ' = ' + date)
You can try this:
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
import re
final_data = sorted(["{} = {}".format(re.findall(r"\d+$", i)[0], i) for i in re.findall(r"[a-zA-Z]+\s\d+,\s\d+", text)], key=lambda x: int(re.findall(r"^\d+", x)[0]))
Output:
['2015 = November 5, 2015', '2016 = November 3, 2016']
Using ctwheels' regex:
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
import re
result = [(date.split(",")[1].strip(), date) for date in re.findall(r'\w+\s+\d+,\s*\d+', text)]
print(result)
# [('2016', 'November 3, 2016'), ('2015', 'November 5, 2015')]
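The question says the framework does not allow capture groups, while the answers here rely on groups or split. A sketch of group-free patterns follows. It assumes the framework supports a lookahead (?=...) and the $ anchor, which the question does not confirm, and that the first range end is always followed by the second date, which ends the text. The pattern names are illustrative.

```python
import re

text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"

# Each pattern matches a whole "Month D, YYYY" date without capture groups.
FIRST_DATE = r'[A-Za-z]+\s+\d{1,2},\s*\d{4}(?=\s+[A-Za-z])'  # followed by another date
SECOND_DATE = r'[A-Za-z]+\s+\d{1,2},\s*\d{4}$'              # at the end of the text
YEAR = r'\d{4}$'  # the year is the last four digits of a matched date

for pattern in (SECOND_DATE, FIRST_DATE):
    date = re.search(pattern, text).group()
    year = re.search(YEAR, date).group()
    print(year + ' = ' + date)
```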

Split, edit, and replace values in list

I'm having trouble with some email text munging. I have participant sign-ups in a list like this:
body=['Study: Study 1', 'Date: Friday, March 28, 2014 3:15 PM - 4:00 PM',
'Location: Some Place','Participant: John Doe','Study: Study 1',
'Date: Friday, March 28, 2014 4:00 PM - 4:40 PM',
'Location: Some Place','Participant: Mary Smith']
I'm new to using Python, so I'm not sure if there is a specific name for the operation I want. Practically, what I want is to take the list items with the 'Participant:' tag, remove that tag, and split the names into separate list items for first and last names. So, something like this:
body=['Study: Study 1', 'Date: Friday, March 28, 2014 3:15 PM - 4:00 PM',
'Location: Some Place','John' ,'Doe']
I've tried using a list comprehension similar to here:
[item.split(' ')[1:] for item in body if re.match('Participant:*', item)]
which gives me back a nested list like this:
[['John', 'Doe'],['Mary','Smith']]
But, I have no idea how to make those nested lists with the first and last names into single list items, and no idea how to insert them back into the original list.
Any help is much appreciated!
You can have your cake and eat it with:
[elem
for line in body
for elem in (line.split()[1:] if line.startswith('Participant:') else (line,))]
This flattens the list with a nested loop: the inner loop iterates either over the split output (the name parts) or over a one-element tuple containing the unsplit list element:
>>> from pprint import pprint
>>> body=['Study: Study 1', 'Date: Friday, March 28, 2014 3:15 PM - 4:00 PM',
... 'Location: Some Place','Participant: John Doe','Study: Study 1',
... 'Date: Friday, March 28, 2014 4:00 PM - 4:40 PM',
... 'Location: Some Place','Participant: Mary Smith']
>>> [elem
... for line in body
... for elem in (line.split()[1:] if line.startswith('Participant:') else (line,))]
['Study: Study 1', 'Date: Friday, March 28, 2014 3:15 PM - 4:00 PM', 'Location: Some Place', 'John', 'Doe', 'Study: Study 1', 'Date: Friday, March 28, 2014 4:00 PM - 4:40 PM', 'Location: Some Place', 'Mary', 'Smith']
>>> pprint(_)
['Study: Study 1',
'Date: Friday, March 28, 2014 3:15 PM - 4:00 PM',
'Location: Some Place',
'John',
'Doe',
'Study: Study 1',
'Date: Friday, March 28, 2014 4:00 PM - 4:40 PM',
'Location: Some Place',
'Mary',
'Smith']
IMHO, this sort of thing is cleanest with a function:
def do_whatever(lst):
    for item in lst:
        if item.startswith('Participant:'):
            head, tail = item.split(':', 1)
            for name in tail.split():
                yield name
        else:
            yield item
body = list(do_whatever(body))
e.g.:
>>> def do_whatever(lst):
...     for item in lst:
...         if item.startswith('Participant:'):
...             head, tail = item.split(':', 1)
...             for name in tail.split():
...                 yield name
...         else:
...             yield item
...
>>> body=['Study: Study 1', 'Date: Friday, March 28, 2014 3:15 PM - 4:00 PM',
... 'Location: Some Place','Participant: John Doe','Study: Study 1',
... 'Date: Friday, March 28, 2014 4:00 PM - 4:40 PM',
... 'Location: Some Place','Participant: Mary Smith']
>>> body = list(do_whatever(body))
>>> body
['Study: Study 1', 'Date: Friday, March 28, 2014 3:15 PM - 4:00 PM', 'Location: Some Place', 'John', 'Doe', 'Study: Study 1', 'Date: Friday, March 28, 2014 4:00 PM - 4:40 PM', 'Location: Some Place', 'Mary', 'Smith']
Sorry for the really bad function name -- I'm not feeling creative at the moment...
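The question's own attempt already produced nested lists like [['John', 'Doe'], ...]; the missing step was flattening. A sketch using itertools.chain.from_iterable does exactly that: each line maps to a list (the name parts, or the line itself), and the lists are concatenated. The helper name expand is illustrative.

```python
from itertools import chain

body = ['Study: Study 1', 'Date: Friday, March 28, 2014 3:15 PM - 4:00 PM',
        'Location: Some Place', 'Participant: John Doe', 'Study: Study 1',
        'Date: Friday, March 28, 2014 4:00 PM - 4:40 PM',
        'Location: Some Place', 'Participant: Mary Smith']

def expand(line):
    """Return the name parts of a Participant line, else the line wrapped in a list."""
    if line.startswith('Participant:'):
        # partition splits once on the first ':'; split() breaks the name on whitespace
        return line.partition(':')[2].split()
    return [line]

flat = list(chain.from_iterable(expand(line) for line in body))
print(flat)
```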
