Pandas date conversion: unconverted data remains

In Pandas (Jupyter) I have a column with dates in string format:
koncerti.Date.values[:20]
array(['15 September 2010', '16 September 2010', '18 September 2010',
'20 September 2010', '21 September 2010', '23 September 2010',
'24 September 2010', '26 September 2010', '28 September 2010',
'30 September 2010', '1 October 2010', '3 October 2010',
'5 October 2010', '6 October 2010', '8 October 2010',
'10 October 2010', '12 October 2010', '13 October 2010',
'15 October 2010', '17 October 2010'], dtype=object)
I try to convert them to datetime with the following statement:
koncerti.Date = pd.to_datetime(koncerti.Date, format='%d %B %Y')
Unfortunately, it produces the following error: ValueError: unconverted data remains: [31]
What does this error mean?

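Solution: koncerti.Date = pd.to_datetime(koncerti.Date, format='%d %B %Y', exact=False)
An additional parameter was needed: exact=False. With exact=False, the format may match anywhere in the target string instead of having to match the whole value, so trailing text such as [31] no longer raises an error.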
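The error message itself comes from strptime-style parsing: the format matched the beginning of a value, but some characters were left over. Below is a minimal sketch using only the standard library. It assumes the offending row looked something like '17 October 2010[31]'; the real value is not shown in the question, so that sample and the helper name leftover_text are hypothetical.

```python
from datetime import datetime

FMT = "%d %B %Y"

def leftover_text(value, fmt=FMT):
    """Return the message strptime raises for value, or None if it parses cleanly."""
    try:
        datetime.strptime(value, fmt)
    except ValueError as exc:
        return str(exc)
    return None

# Hypothetical samples: the second one carries a trailing footnote marker.
samples = ["15 September 2010", "17 October 2010[31]"]
for s in samples:
    print(repr(s), "->", leftover_text(s))
```

Running a check like this over the column's unique values is one way to locate the rows that still contain extra text before deciding whether exact=False is acceptable.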

Related

Sort a list of dictionaries of dates by value

I'm trying to sort the values so that entries from the current year show first.
mdlist = [{0:'31 Jan 2022', 1:'', 2:'10 Feb 2022'},
{0:'10 Feb 2021', 1:'20, Feb 2021', 2:''},
{0:'10 Feb 2022', 1:'10 Feb 2022', 2:'10 Feb 2022'}]
mdlist = sorted(mdlist, key=lambda d:d[0])
But it is not working as expected.
Expected output:
mdlist = [{0:'31 Jan 2022', 1:'', 2:'10 Feb 2022'},
{0:'10 Feb 2022', 1:'10 Feb 2022', 2:'10 Feb 2022'},
{0:'10 Feb 2021', 1:'20 Feb 2021', 2:''}]

Maybe you could leverage the fact that these are dates: parse them with the datetime module, then sort by year in descending order and by month and day in ascending order:
from datetime import datetime
def sorting_key(dct):
    # Parse the date stored under key 0, e.g. '31 Jan 2022'
    ymd = datetime.strptime(dct[0], "%d %b %Y")
    # Negated year sorts newest year first; month and day sort ascending
    return -ymd.year, ymd.month, ymd.day
mdlist.sort(key=sorting_key)
Output:
[{0: '31 Jan 2022', 1: '', 2: '10 Feb 2022'},
{0: '10 Feb 2022', 1: '10 Feb 2022', 2: '10 Feb 2022'},
{0: '10 Feb 2021', 1: '20 Feb 2021', 2: ''}]

Use a key function that returns 0 if the year is 2022, 1 otherwise. This will sort all the 2022 dates first.
firstyear = '2022'
mdlist = sorted(mdlist, key=lambda d: 0 if d[0].split()[-1] == firstyear else 1)
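As a rough sketch of the same idea, the year can be taken from the parsed date rather than from string splitting. The helper name year_first is hypothetical; it relies on sorted() being stable, so records within the same group keep their original order.

```python
from datetime import datetime

def year_first(records, year):
    """Stable sort: records whose key-0 date falls in `year` come first."""
    def key(rec):
        parsed = datetime.strptime(rec[0], "%d %b %Y")
        return 0 if parsed.year == year else 1
    return sorted(records, key=key)

mdlist = [{0: '31 Jan 2022', 1: '', 2: '10 Feb 2022'},
          {0: '10 Feb 2021', 1: '20, Feb 2021', 2: ''},
          {0: '10 Feb 2022', 1: '10 Feb 2022', 2: '10 Feb 2022'}]

# Pass datetime.now().year instead of 2022 to use the actual current year.
result = year_first(mdlist, 2022)
print([d[0] for d in result])
```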

Python - Given list of dates (as strings), how do we return only those that fall within last 365 days?

Given the following list of strings
from datetime import datetime
import numpy as np
strings = ['Nov 1 2021', 'Oct 25 2021', 'Oct 18 2021', 'Oct 11 2021', 'Oct 4 2021', 'Sep 27 2021',
'Sep 20 2021', 'Aug 24 2021', 'Aug 16 2021', 'Aug 9 2021', 'Aug 2 2021', 'Jul 26 2021',
'Jun 28 2021', 'Jun 21 2021', 'Jun 14 2021', 'Jun 7 2021', 'May 24 2021', 'May 10 2021',
'May 3 2021', 'Apr 26 2021', 'Apr 12 2021', 'Apr 12 2021', 'Apr 5 2021', 'Mar 22 2021',
'Feb 22 2021', 'Feb 13 2021', 'Feb 8 2021', 'Feb 1 2021', 'Nov 2 2020', 'Sep 28 2020',
'Aug 31 2020', 'Aug 20 2020', 'Aug 10 2020', 'Jun 29 2020', 'Jun 22 2020', 'Jun 15 2020',
'Mar 2 2020', 'Feb 10 2020', 'Feb 3 2020', 'Jan 27 2020', 'Jan 20 2020', 'Jan 13 2020',
'Jan 6 2020', 'Aug 26 2019', 'Aug 5 2019', 'Jul 29 2019', 'Jul 22 2019', 'Jul 15 2019']
What's the most efficient way to return a list of those dates that fall within the last 365 days?
Here's my failed attempt:
# Converts strings to datetime format and appends to new list, 'dates.'
dates = []
for item in strings:
    convert_string = datetime.strptime(item, "%b %d %Y").date()
    dates.append(convert_string)
# Given each item in 'dates' list returns corresponding
# list showing elapsed time between each item and today (Nov 11th 2021).
elapsed_time = []
def dateDelta(i):
    today = datetime.fromisoformat(datetime.today().strftime('%Y-%m-%d')).date()
    date = i
    delta = (today - date).days
    elapsed_time.append(delta)
for i in dates:
    dateDelta(i)
# Concatenates 'dates' list and 'elapsed_times_list' in attempt to somehow connect the two.
date_and_elapsed_time = []
date_and_elapsed_time.append(dates)
date_and_elapsed_time.append(elapsed_time)
# Takes 'elapsed_time list' appends only the dates that fall within the last 365 days.
relevant_elapsed_time_list = []
for i in elapsed_time:
    if i <= 365:
        relevant_elapsed_time_list.append(i)
# Finds indices of 'relevant_elapsed_time_list' within last 365 days.
# After trawling StackOverflow posts, I import numpy in an effort to help with indexing.
# My thinking is I can use the indices of the relevant elapsed times from the
# 'elapsed_time_list' and return the corresponding date from the 'dates' list.
relevant_elapsed_time_list_indices = []
for item in relevant_elapsed_time_list:
    indexes = []
    for index, sub_lst in enumerate(date_and_elapsed_time):
        try:
            indexes.append((index, sub_lst.index(item)))
        except ValueError:
            pass
    relevant_elapsed_time_list_indices.append(indexes)
relevant_elapsed_time_list_indices = np.array([[x[0][0], x[0][1]] for x in relevant_elapsed_time_list_indices])
At this point, I'm as yet unable to convert the relevant_elapsed_time_list_indices list to the corresponding indices for the first sub-list in date_and_elapsed_time. The point of this would be to then isolate those indices (i.e. dates).
What's the most efficient way to solve this problem?

You can convert the strings to datetime objects using .strptime, then use a conditional list comprehension that uses timedelta to pick ones that fall within the last 365 days:
from datetime import datetime, timedelta
last_365_days = [s for s in strings if datetime.strptime(s, "%b %d %Y") + timedelta(days=365) >= datetime.today()]
Alternatively you can compute the cutoff date in advance:
cutoff = datetime.today() - timedelta(days=365)
last_365_days = [s for s in strings if datetime.strptime(s, "%b %d %Y") >= cutoff]
The value of last_365_days should then be (taking today as Nov 11th 2021, as in the question):
['Nov 1 2021', 'Oct 25 2021', 'Oct 18 2021', 'Oct 11 2021', 'Oct 4 2021',
'Sep 27 2021', 'Sep 20 2021', 'Aug 24 2021', 'Aug 16 2021', 'Aug 9 2021',
'Aug 2 2021', 'Jul 26 2021', 'Jun 28 2021', 'Jun 21 2021', 'Jun 14 2021',
'Jun 7 2021', 'May 24 2021', 'May 10 2021', 'May 3 2021', 'Apr 26 2021',
'Apr 12 2021', 'Apr 12 2021', 'Apr 5 2021', 'Mar 22 2021', 'Feb 22 2021',
'Feb 13 2021', 'Feb 8 2021', 'Feb 1 2021']
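Because the result depends on the day the code is run, it can help to pass the reference date in explicitly. The following is a minimal sketch of the cutoff approach; the function name within_last_days and its parameters are hypothetical, and the upper bound (excluding future dates) is an added assumption rather than something the question asks for.

```python
from datetime import date, datetime, timedelta

def within_last_days(strings, today, days=365, fmt="%b %d %Y"):
    """Keep the strings whose parsed date lies in [today - days, today]."""
    cutoff = today - timedelta(days=days)
    return [s for s in strings
            if cutoff <= datetime.strptime(s, fmt).date() <= today]

# A short sample taken from the list in the question.
sample = ['Nov 1 2021', 'Feb 1 2021', 'Nov 2 2020', 'Jul 15 2019']
print(within_last_days(sample, today=date(2021, 11, 11)))
```

For everyday use, date.today() can be passed as the today argument.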

How to find time format in a text?

I want to extract only time from a text with so many different formats of dates and time such as 'thursday, august 6, 2020 4:32:54 pm', '25 september 2020 04:05 pm' and '29 april 2020 07:42'. So I want to extract, for example, 4:32:54, 07:42, 04:05. Can you help me with that?

I would try something like this:
times = [
'thursday, august 6, 2020 4:32:54 pm',
'25 september 2020 04:05 pm',
'29 april 2020 07:42',
]
print("\n".join("".join(i for i in t.split() if ":" in i) for t in times))
Output:
4:32:54
04:05
07:42
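A regular expression is another option. This is only a sketch, assuming every time looks like H:MM or H:MM:SS with one or two hour digits; the names TIME_RE and extract_time are hypothetical.

```python
import re

# One or two hour digits, two minute digits, optional two second digits.
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")

def extract_time(text):
    """Return the first time-like token in text, or None if there is none."""
    match = TIME_RE.search(text)
    return match.group(0) if match else None

times = [
    'thursday, august 6, 2020 4:32:54 pm',
    '25 september 2020 04:05 pm',
    '29 april 2020 07:42',
]
for t in times:
    print(extract_time(t))
```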

Splitting date range of dates into two to lists

I want to split the following list of dates:
Month=['1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020',
'1 October 2020 to 31 October 2020']
The desired output is as follows:
Month = [['1 October 2020', '31 October 2020'],
['1 October 2020', '31 October 2020'],
['1 October 2020','31 October 2020'],
['1 October 2020', '31 October 2020'],
['1 October 2020','31 October 2020']]
How can I do this using regex?
I used Month.str.split('to'), but this does not work properly: October contains "to", so each October is split as well. Therefore, I guess I have to use regex for this. What is the best way to achieve this?

Use ' to ' as the separator instead of just 'to'. This matches the input format better anyway, since splitting on 'to' alone would also require stripping the whitespace.
>>> [i.split(' to ') for i in Month]
[['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020'], ['1 October 2020', '31 October 2020']]
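Since the question asks about regex, here is a small sketch of the same split with re.split, which also tolerates extra whitespace around the separator. The helper name split_range is hypothetical.

```python
import re

def split_range(text):
    """Split 'START to END' on the word 'to' surrounded by whitespace."""
    return re.split(r"\s+to\s+", text)

Month = ['1 October 2020 to 31 October 2020',
         '1 October 2020 to 31 October 2020']
print([split_range(m) for m in Month])
```

The "to" inside "October" is not surrounded by whitespace, so it is left alone.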

split string dates into list of multiple dates

I have the following list and I am trying to split each element into multiple "dates". I want to write a function to do it, but I am not sure whether regex or datetime is the way to go.
x=["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
for i in x:
    temp=my_func(i)
    if len(temp)==1:
        date1=temp[0]
        date2=""
    elif len(temp)>=2:
        date1=temp[0]
        date2=temp[1]
    else:
        continue
    #rest of my code
Here is the expected output of my_func:
#my_func(x[0])=["2- 7 MAY, 2020", "10-12 JUN, 2014"]
#my_func(x[1])=["7 February, 2020", "6 February, 2020", "26 October, 2018"]
#my_func(x[-1])=["14 JUN, 2020"]

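According to your examples, each date starts with a digit and ends with a four-digit year, so a lazy regex match works:
import re
for i in x:
    temp = re.findall(r'\d.*?\d{4}', i)
    print(temp)
Output: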
['2- 7 MAY, 2020', '10-12 JUN, 2014']
['7 February, 2020', '6 February, 2020', '26 October, 2018']
['16 JUN, 2020', '24 JUL, 2020', '28 FEB, 2020', '15 SEPT, 2020', '8-11 MAY, 2023', '22 OCT, 2020']
['14 JUN, 2020']
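Wrapped up as the my_func from the question, the regex approach might look like the sketch below. It assumes every date starts with a digit and ends with a four-digit year, as in the examples; the helper first_two is hypothetical and mirrors the date1/date2 logic from the question.

```python
import re

def my_func(s):
    """Return every 'digit ... four-digit year' chunk found in s."""
    return re.findall(r"\d.*?\d{4}", s)

def first_two(s):
    """Return (date1, date2), using '' when a date is missing."""
    found = my_func(s)
    date1 = found[0] if len(found) >= 1 else ""
    date2 = found[1] if len(found) >= 2 else ""
    return date1, date2

x = ["2- 7 MAY, 2020, 10-12 JUN, 2014",
     "7 February, 2020, 6 February, 2020, 26 October, 2018",
     "14 JUN, 2020"]
for item in x:
    print(first_two(item))
```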

Split the string on ','. There is always an even number of parts, and every two contiguous parts make up one date, so just join the parts in pairs to form each date string.
re is fine, but this should work too:
>>> x = ["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
>>> result = []
>>> for s in x:
...     parts = s.split(',')
...     result.append([','.join(parts[i:i+2]).strip() for i in range(0,len(parts),2)])
...
>>> result
[['2- 7 MAY, 2020', '10-12 JUN, 2014'],
['7 February, 2020', '6 February, 2020', '26 October, 2018'],
['16 JUN, 2020', '24 JUL, 2020', '28 FEB, 2020', '15 SEPT, 2020', '8-11 MAY, 2023', '22 OCT, 2020'],
['14 JUN, 2020']
]
Your my_func is then simply:
>>> def my_func(s):
...     parts = s.split(',')
...     return [','.join(parts[i:i+2]).strip() for i in range(0,len(parts),2)]

A global variable is used, but the output is as you wanted it to be.
The program depends on how the dates are formatted, but with this format it works fine.
dates=["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
def splitdates(date):
    if type(date) is int:
        tosplit = str(dates[date])
    else:
        tosplit = date
    month = ["J" , "j" , "f" , "F" , "m" , "M" , "A" , "a" , "s" , "S" ,"n" , "N" , "o" , "O" , "D" , "d"]
    for item, character in enumerate(tosplit):
        if character in month:
            for item2, character in enumerate(tosplit[item+1:]):
                if character.startswith(","):
                    for item3, character in enumerate(tosplit[item+item2+2:]):
                        if character.startswith(","):
                            global newdate
                            newdate.append(tosplit[:item + item2 + item3 + 3])
                            nextPart = tosplit[item + item2 + item3 + 3:]
                            if nextPart.endswith(";"):
                                newPart = nextPart
                                splitdates(newPart)
                            else:
                                newdate.append(tosplit[item+item2+item3+3:])
                                return newdate
                    newdate.append(tosplit[:item+item2+item3])
                    return newdate
for x in range(len(dates)):
    newdate = []
    print("Date: ",splitdates(x))
Output is:
Date: ['2- 7 MAY, 2020,', ' 10-12 JUN, 2014']
Date: ['7 February, 2020,', ' 6 February, 2020, 26 October, 2018']
Date: ['16 JUN, 2020,', ' 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020']
Date: ['14 JUN, 2']

This is the simplest algorithm:
Iterate over the list one string at a time and split each string on ','.
Then take each pair of consecutive parts, concatenate them, and append the result to a new list.
x = ["2- 7 MAY, 2020, 10-12 JUN, 2014","7 February, 2020, 6 February, 2020, 26 October, 2018","16 JUN, 2020, 24 JUL, 2020, 28 FEB, 2020, 15 SEPT, 2020, 8-11 MAY, 2023, 22 OCT, 2020","14 JUN, 2020"]
for i in x:
    final_answer=[]
    f=i.split(',')
    j=0
    while(j<len(f)):
        final_answer.append((str(f[j]+f[j+1])))
        j=j+2
    print(final_answer)
Output:
['2- 7 MAY 2020', ' 10-12 JUN 2014']
['7 February 2020', ' 6 February 2020', ' 26 October 2018']
['16 JUN 2020', ' 24 JUL 2020', ' 28 FEB 2020', ' 15 SEPT 2020', ' 8-11 MAY 2023', ' 22 OCT 2020']
['14 JUN 2020']
