I was just curious whether there is a way to delete text from a string, or to capture only specific parts of it, when the string's contents vary.
Examples of the strings I'm working with:
3/5/2019 12:38 PM
10/30/2019 6:32 AM
9/12/2019 9:53 AM
I want to be able to extract the date and the hour of the day separately and append them to a list. However, these values vary, and even the position of the hour in the string can change, because the day, month, or hour can have either one or two digits, which can shift the hour by up to three characters.
import re
s = "3/5/2019 12:38 PM"
result = re.compile(r"[\s\/:]").split(s)
result:
['3', '5', '2019', '12', '38', 'PM']
This should solve your problem, assuming the incoming strings always use the same delimiters.
You can use regular expressions, something like this:
import re
m = re.match(r"(\d+/\d+/\d+) (\d+:\d+) (\wM)", "3/5/2019 12:38 PM")
print(m.groups())
This will print a tuple whose first item is the date, second item is the time, and third item is AM or PM: ('3/5/2019', '12:38', 'PM'), which you can easily parse yourself.
Edit
You can also use the datetime module to parse the date string:
import datetime
dt = datetime.datetime.strptime("3/5/2019 12:38 PM", "%m/%d/%Y %I:%M %p")
print(dt.date(), dt.hour)  # 2019-03-05 12
This gives you a datetime object from which you can get all the information you need.
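Putting the strptime approach together for all three sample strings, a minimal sketch of what the question asks for (dates and hours appended to lists) might look like this. It assumes every string is month/day/year with a 12-hour clock and an AM/PM marker, as in the examples; the list names dates and hours are only illustrative.

```python
from datetime import datetime

# Sample strings from the question (month/day/year, 12-hour clock).
samples = ["3/5/2019 12:38 PM", "10/30/2019 6:32 AM", "9/12/2019 9:53 AM"]

dates = []  # the date part of each timestamp
hours = []  # the hour of the day on a 24-hour scale (0-23)

for s in samples:
    # strptime accepts numbers without zero padding for %m, %d and %I,
    # so the varying field widths in the input are not a problem.
    dt = datetime.strptime(s, "%m/%d/%Y %I:%M %p")
    dates.append(dt.date())
    hours.append(dt.hour)

print(dates)
print(hours)
```

Because strptime works on the whole format rather than fixed character positions, it does not matter how many digits the day, month, or hour has.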
Related
How do I convert a date like 2022-09-28 to 28092022 in Python?
I have some files with this date pattern in their names, and I need to convert the dates to find the latest one. Is this possible?
Any help is welcome :)
So, when you use date.today() you get back a datetime.date object:
>>> from datetime import date
>>> date.today()
datetime.date(2022, 9, 28)
You can call the .strftime() method on it directly to get back a string in whatever format you like:
>>> date.today().strftime('%d%m%Y')
'28092022'
You can use this function:
from datetime import datetime
def getDateString(date):
    year, month, day = str(date).split("-")
    return day + month + year
For example:
date = datetime(year=2022, month=9, day=28).date()
print(getDateString(date))  # prints 28092022
We can use regex to get the task done.
The first step is to extract the date expression from the file name (if there are several file names to read and modify, we can run the task in a loop):
import re
file_name = 'backup 2022-09-28-17:33.xyz' # a sample file name
pattern = r'(\d{4}-\d{2}-\d{2})'
date = ''.join(re.findall(pattern, file_name)).split('-')
result of date: ['2022', '09', '28']
In the second step, we replace the current date expression with the new one:
file_name = re.sub(pattern, ''.join(date[::-1]), file_name)
print(file_name)
result: backup 28092022-17:33.xyz
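Since the goal is to find the latest file, it may be simpler to compare the parsed dates directly rather than rewrite them. A minimal sketch, assuming every file name contains exactly one YYYY-MM-DD date (the file names below are made up for illustration, and name_date is a hypothetical helper):

```python
import re
from datetime import datetime

# Hypothetical file names following the question's YYYY-MM-DD pattern.
file_names = [
    "backup 2022-09-28-17:33.xyz",
    "backup 2021-12-01-08:00.xyz",
    "backup 2022-10-02-09:15.xyz",
]

pattern = re.compile(r"(\d{4}-\d{2}-\d{2})")

def name_date(name):
    """Return the date embedded in a file name as a datetime.

    Assumes the name contains a date; search() would return None otherwise.
    """
    match = pattern.search(name)
    return datetime.strptime(match.group(1), "%Y-%m-%d")

# max() with a key picks the file whose embedded date is the latest.
latest = max(file_names, key=name_date)
print(latest)
# The ddmmyyyy form is still available for display if needed.
print(name_date(latest).strftime("%d%m%Y"))
```

Comparing datetime objects orders the files chronologically. Comparing the ddmmyyyy strings as plain text would not: '28092022' sorts after '02102022' even though it is the earlier date.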
My problem is the following:
I have the string:
datetime = "2021/04/07 08:30:00"
I want to save in the variable hour, 08 and
I want to save in the variable minutes, 30
What I've done is the following:
import re
pat = re.compile(' (.*):')
hour = re.search(pat, datetime)
minutes = re.search(pat, datetime)
print(hour.group(1))
print(minutes.group(1))
What I obtain from the prints is
08:30 and 30, so the minutes are correct, but for some reason I don't understand, for the hours the first : is skipped and everything from the whitespace up to the second : is captured.
What am I doing wrong? Thank you.
Please use strptime from the datetime module, which is the recommended way to handle date strings in Python.
strptime returns a datetime object from the date string, and this datetime object comes with all sorts of goodies like date, time, hour, isoformat, timestamp, etc., which make working with datetimes a breeze.
>>> import datetime
>>> datetime.datetime.strptime("2021/04/07 08:30:00", "%Y/%m/%d %H:%M:%S")
datetime.datetime(2021, 4, 7, 8, 30)
>>> datetime.datetime.strptime("2021/04/07 08:30:00", "%Y/%m/%d %H:%M:%S").hour
8
>>> datetime.datetime.strptime("2021/04/07 08:30:00", "%Y/%m/%d %H:%M:%S").second
0
Ah, no no, Python has a much better approach with datetime.strptime:
https://www.programiz.com/python-programming/datetime/strptime
So for you:
from datetime import datetime
dt_string = "2021/04/07 08:30:00"
# The date is in yyyy/mm/dd format
dt_object1 = datetime.strptime(dt_string, "%Y/%m/%d %H:%M:%S")
You want hours?
hours = dt_object1.hour
or minutes?
mins = dt_object1.minute
Now, if what you have presented is just an example where you need to work around whitespace, then you could split the string up instead. Again with dt_string:
date_part, time_part = dt_string.split(" ")
dateString = date_part.split("/")  # A list in [year, month, day]
timeString = time_part.split(":")  # A list in [hours, minutes, seconds]
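The wildcard . matches any single character, including the :, and .* is greedy, so it extends as far as the last : it can find. That is why .* matches 08:30.
Use a character class that cannot match the colon instead:
hour = re.search(r' ([0-9]*):', datetime)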
Output:
>>> hour.group(1)
'08'
You can try the regex below, which makes the match non-greedy so that it stops at the first ':':
hour = re.search(' (.*?):', datetime)
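To see the difference between the greedy and non-greedy versions side by side, here is a small sketch on the string from the question (stored in stamp here so it doesn't shadow the datetime module). The last pattern, which captures hours and minutes in one search with \d{2}, is just one possible alternative:

```python
import re

stamp = "2021/04/07 08:30:00"

# Greedy: .* runs as far as it can, so the capture ends at the LAST ':'.
greedy = re.search(r" (.*):", stamp).group(1)

# Non-greedy: .*? takes as little as possible, so it ends at the FIRST ':'.
lazy = re.search(r" (.*?):", stamp).group(1)

# Capture hours and minutes in a single search using two digit groups.
m = re.search(r" (\d{2}):(\d{2})", stamp)
hour, minutes = m.group(1), m.group(2)

print(greedy, lazy, hour, minutes)
```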
An alternative to what you are doing is to split the original datetime string on the space into a variable such as dates, which gives you ['2021/04/07', '08:30:00']. You can then take the second item of dates and split it again on ':' into a variable time, whose items are the hours, minutes, and seconds.
datetime = "2021/04/07 08:30:00"
dates = datetime.split(" ")
print(dates)
time = dates[1].split(":")
print(time)
Running the code prints:
print(dates) --> ['2021/04/07', '08:30:00']
print(time) --> ['08', '30', '00']
You can access individual parts of time with time[0] for '08', time[1] for '30' etc.
I would use re.compile with named capture groups and iterate:
import re
inp = "Hello World 2021/04/07 08:30:00 Goodbye"
r = re.compile(r'\b\d{4}/\d{2}/\d{2} (?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}\b')
output = [m.groupdict() for m in r.finditer(inp)]
print(output[0]['hour']) # 08
print(output[0]['minute']) # 30
This is a simple datetime question. Python already has the ability to do exactly what you need. 3 steps:
Use strptime to generate your datetime from your string.
You can get the formatting options here.
Return just the hour or minute from the datetime object.
from datetime import datetime
dt_string = "2021/04/07 08:30:00"
dt_object = datetime.strptime(dt_string, "%Y/%m/%d %H:%M:%S")
print(dt_object)
print(dt_object.hour, dt_object.minute)
# 2021-04-07 08:30:00
# 8 30
You can do this with strptime in some cases. If you need to work with the string itself, you can search with a character class such as [0-9], [a-z], [A-Z], or a set of symbols, followed by +, which matches one or more of those characters:
import re
datetime = "2021/04/07 08:30:00"
# The change is here: the captured hour must be followed by ':', digits, ':'
hour = re.search(r' (.*):+[0-9]+:', datetime)
minutes = re.search(r':(.*):', datetime)
print(hour.group(1))
print(minutes.group(1))
I'm trying to create a function that takes the "day" part of a Y-M-D date string.
For example:
Input: ["2022 November 23,2023 April 9"]
Output: 23
I have tried to do this by using the .split() function to split the string up at the comma, then slicing the last 2 indexes out to get the day. However, while I can get the last term of the new split string easily, I cannot get the 2nd-to-last term.
Ex:
y_m_d="2022 November 20,2023 April 9"
split_ymd=y_m_d.split(",")
first_value=split_ymd[0]
print(split_ymd[-1]) #This prints "0"
However, adding the 2nd argument to the slice command breaks it
y_m_d="2022 November 20,2023 April 9"
split_ymd=y_m_d.split(",")
first_value=split_ymd[0]
print(split_ymd[-1:-2]) #This prints "[]"
I understand that some of the terminology above might not be correct, as I am new to Python and to programming in general, and that the code above is very messy, but I just need help understanding why the slice above does not work. I am open to suggestions for improving the code itself, but I really just want to know why the slice does not work in this situation.
Thanks!
The slice doesn't work in this situation because the slice doesn't work in any situation. It's unrelated to .split() or anything else you're doing.
Consider this simpler test case:
>>> [1,2,3,4][-1]
4
>>> [1,2,3,4][-1:-2]
[]
This happens because -1 refers to index 3 and -2 refers to index 2, and the span [3,2) is backwards so it's treated as empty.
You can swap them if you actually want a range:
>>> [1,2,3,4][-2:-1]
[3]
Or you can just use -2 if you want the second-from-last element:
>>> [1,2,3,4][-2]
3
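A few more cases may help build intuition for negative indices and steps. The same rules apply to strings, so they also cover the day-slicing in the question:

```python
values = [1, 2, 3, 4]

# With the default step of 1, a slice whose start lies after its stop is empty.
print(values[-1:-2])     # []
# Swapping the bounds gives the one element between them.
print(values[-2:-1])     # [3]
# Omitting the stop runs to the end, giving the last two elements.
print(values[-2:])       # [3, 4]
# A negative step walks backwards, so start may come after stop.
print(values[-1:-3:-1])  # [4, 3]

# The same rules on a string: the last two characters of the first date.
day = "2022 November 20,2023 April 9".split(",")[0][-2:]
print(day)               # 20
```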
You can split the string in your initial list on the comma to create a list of
strings, each representing a date.
Then iterate over those dates, splitting each one on the space. The last value in the resulting list is the day you are looking for.
Let me know if this isn't clear.
It looks something like this:
list_of_dates = ["2022 November 23, 2023 April 9"]
# This separates all dates by splitting on the comma
dates = "".join(list_of_dates).split(",")
days = []
for d in dates:
    # This splits each date on the space
    temp = d.split(" ")
    days.append(temp[-1])
print(days)
# Output: ['23', '9']
So, I have two recommendations here:
Try learning how Python slicing works (with both negative and positive numbers).
For your actual solution, I see a list of dates. Splitting it on the comma and then parsing each date into a datetime object might make things easier here.
# For example,
import datetime
date_str_list = "2022 November 20,2023 April 9"
for date_str in date_str_list.split(","):
    date = datetime.datetime.strptime(date_str, "%Y %B %d")
    day = date.day
See https://docs.python.org/3/library/datetime.html for more details on how you can control the string format of a datetime object.
The Amazing datetime Module!
First lesson: Let the built-ins do the hard work for you!
Here is an example of your function which parses a date string and returns the day. Additionally, here is an example implementation of how you can use the function.
Hope this helps!
The Function:
from datetime import datetime as dt
def extract_day(date_string, mask='%Y %B %d'):
    """Extract and return the day.
    Args:
        date_string (str): Date text as a string.
        mask (str): Format of the provided date string,
            used for parsing.
    """
    day = dt.strptime(date_string, mask).day
    return day
Implementation:
dates1 = "2022 November 20,2023 April 9"
dates2 = "20 Nov 2022,9 Apr 2023"
date_strings1 = dates1.split(',')
date_strings2 = dates2.split(',')
for date in date_strings1:
    day = extract_day(date)
    print('\nOriginal string: {}'.format(date))
    print('Extracted day: {}'.format(day))
for date in date_strings2:
    day = extract_day(date, mask='%d %b %Y')
    print('\nOriginal string: {}'.format(date))
    print('Extracted day: {}'.format(day))
Output:
Original string: 2022 November 20
Extracted day: 20
Original string: 2023 April 9
Extracted day: 9
Original string: 20 Nov 2022
Extracted day: 20
Original string: 9 Apr 2023
Extracted day: 9
How about:
ymd = "2022 November 20,2023 April 9"
lchar = ymd.find(',')
fchar = lchar - 2
d_int = int(ymd[fchar:lchar])
print(d_int)
This should give you what you want, provided the first date ends right before the first comma and has a two-digit day.
You can tackle this in a few different ways:
Using the len function:
date = "2013 November 20,2023 April 10"
splitted = date.split(',')
splitted[0][len(splitted[0])-2:]
Use negative indexing:
date = "2013 November 20,2023 April 10"
splitted = date.split(',')
splitted[0][-2:]
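Both versions above assume a two-digit day. If single-digit days like 9 can occur, one option (a sketch, not the only way) is to take the last whitespace-separated field of each date instead of a fixed number of characters:

```python
dates = "2022 November 20,2023 April 9"

# Split into individual dates, then take the last whitespace-separated
# field of each one. split() with no argument also ignores any stray
# spaces after the commas.
days = [int(part.split()[-1]) for part in dates.split(",")]
print(days)
```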
Can anyone help me with converting date strings like:
1st-October-1998
2nd-October-1998
3rd-October-1998
nth-October-1998
I cannot see anything in the datetime.strptime() behaviour that allows for this format.
You can try dateutil.parser:
import dateutil.parser
s = "1st-October-1998"
d = dateutil.parser.parse(s)
print(d.date())
Output :
1998-10-01
You can still use strptime; however, you need to remove the extra characters from the date using regex:
import re
from datetime import datetime
date_string = "1st-October-1998"
def remove_extra_chars(ds):
    return re.sub(r'(\d)(st|nd|rd|th)', r'\1', ds)
d = datetime.strptime(remove_extra_chars(date_string), '%d-%B-%Y')
print(d.strftime('%d-%B-%Y')) # output: 01-October-1998
print(d.strftime('%Y-%m-%d')) # output: 1998-10-01
Apparently I made a mistake in answering your question.
To convert the string to a date without using regex, we can try:
from datetime import datetime as dt
s = '22nd-October-1998'
i = s.find('-')
dt.strptime(s[:i - 2] + s[i:], '%d-%B-%Y').date()
The idea is to find the first - character, cut out the 2 characters just before it (the ordinal suffix), and then convert the result using datetime.strptime().
In a DataFrame, we can do it using pandas' native functions. Suppose the DataFrame is df and the column holding the date strings is Date; then we can convert the column to datetime format using
import pandas as pd
pd.to_datetime(df['Date'].str.replace(r'(?<=\d)(st|nd|rd|th)', '', regex=True),
               format='%d-%B-%Y')
The idea is to remove the suffixes st, nd, rd, and th where they directly follow a digit (so month names such as August are left intact), and then convert the column using pandas.to_datetime().
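One thing to watch for with suffix stripping is that some month names contain the same letters ('August' contains 'st'), so a blind replace can damage the month name. A sketch of a plain-Python helper that only strips a suffix directly after a digit; parse_ordinal_date is a hypothetical name, and %B assumes English month names (the default C locale):

```python
import re
from datetime import datetime

def parse_ordinal_date(text):
    """Parse dates like '1st-October-1998' or '22nd-August-2001'."""
    # The lookbehind (?<=\d) only strips st/nd/rd/th right after a digit,
    # so 'August' keeps its 'st'.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)", "", text)
    return datetime.strptime(cleaned, "%d-%B-%Y").date()

for s in ["1st-October-1998", "2nd-October-1998", "3rd-October-1998", "22nd-August-2001"]:
    print(parse_ordinal_date(s))
```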
I need to parse a few dates that are roughly in the format (1 or 2-digit year)-(Month abbreviation), for example:
5-Jun (June 2005)
13-Jan (January 2013)
I tried using strptime with the format %b-%y but it did not consistently produce the desired date. Per the documentation, this is because some years in my dataset are not zero-padded.
Further, when I tested dateutil's parser (please see my code below) on the string "5-Jun", I got "2019-06-05" instead of the desired result (June 2005), even when I set yearfirst=True when calling parse.
from dateutil.parser import parse
parsed = parse("5-Jun",yearfirst=True)
print(parsed)
It will be easier if single-digit years are zero-padded, as the string can then be converted directly using a format. A regular expression is used here to replace any standalone single-digit number with its zero-padded value. I've used the regex from here.
Sample code:
import re
from datetime import datetime
match_condn = r'\b([0-9])\b'
replace_str = r'0\1'
datetime.strptime(re.sub(match_condn, replace_str, '15-Jun'), '%y-%b').strftime("%B %Y")
Output:
June 2015
One approach is to use str.zfill
Ex:
import datetime
d = ["5-Jun", "13-Jan"]
for date in d:
    date, month = date.split("-")
    date = date.zfill(2)
    print(datetime.datetime.strptime(date + "-" + month, "%y-%b").strftime("%B %Y"))
Output:
June 2005
January 2013
Ah, I see from @Rakesh's answer what your data is about. I thought you needed to parse the full name of the month. So you had your two terms %b and %y backwards, and then you also had the problem with the single-digit years. I get it now. Here's a much simpler way to get what you want if you can assume your dates are always in one of the two formats you indicate:
import time
inp = "5-Jun"
t = time.strptime(("0" + inp)[-6:], "%y-%b")  # "5-Jun" becomes "05-Jun"
print(time.strftime("%B %Y", t))  # June 2005