Python: Extract two dates from string

I have a string s that contains two dates, and I am trying to extract both so I can subtract one from the other and count the number of days between them. In the end I want a string like this: s = "o4_24d_20170708_20170801"
At the company where I work we can't install additional packages, so I am looking for a solution using native Python. Below is what I have so far, using the datetime package; it only extracts one date. How can I get both dates out of the string?
import re, datetime
s = "o4_20170708_20170801"
match = re.search(r'\d{4}\d{2}\d{2}', s)  # search() returns only the first match
date = datetime.datetime.strptime(match.group(), '%Y%m%d').date()
print(date)

from datetime import datetime
import re
s = "o4_20170708_20170801"
pattern = re.compile(r'(\d{8})_(\d{8})')
dates = pattern.search(s)
# dates[0] is full match, dates[1] and dates[2] are captured groups
start = datetime.strptime(dates[1], '%Y%m%d')
end = datetime.strptime(dates[2], '%Y%m%d')
difference = end - start
print(difference.days)
will print
24
then, you could do something like:
days = 'd{}_'.format(difference.days)
match_index = dates.start()
new_name = s[:match_index] + days + s[match_index:]
print(new_name)
to get
o4_d24_20170708_20170801

import re, datetime
s = "o4_20170708_20170801"
match = re.findall(r'\d{4}\d{2}\d{2}', s)  # findall() returns every match as a list
for a_date in match:
    date = datetime.datetime.strptime(a_date, '%Y%m%d').date()
    print(date)
This will print:
2017-07-08
2017-08-01
Your regex was already working correctly (checked at regexpal); re.search only returns the first match, whereas re.findall returns all of them.
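Putting the pieces from both answers together, here is a minimal sketch that pulls out both dates, subtracts them and builds the string in the form the question asks for ("24d" rather than "d24"). The helper name add_day_count is made up for illustration, and it assumes the string contains exactly two 8-digit YYYYMMDD dates.

```python
import re
from datetime import datetime


def add_day_count(s):
    """Insert '<days>d_' before the first of two YYYYMMDD dates in s.

    Hypothetical helper: assumes s holds exactly two 8-digit dates.
    """
    found = list(re.finditer(r'\d{8}', s))
    if len(found) != 2:
        raise ValueError('expected exactly two dates in {!r}'.format(s))
    start, end = (datetime.strptime(m.group(), '%Y%m%d').date() for m in found)
    days = (end - start).days
    pos = found[0].start()  # index where the first date begins
    return '{}{}d_{}'.format(s[:pos], days, s[pos:])


print(add_day_count("o4_20170708_20170801"))  # o4_24d_20170708_20170801
```

Only re and datetime from the standard library are used, so nothing extra has to be installed.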

Related

Split URL at - With Python

Does anyone know how I can extract the last 6 characters of an absolute URL, e.g.
/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104
This is not a typical URL; sometimes it ends with -221104
Also, is there a way to turn 221104 into the date 04 11 2022 easily?
Thanks in advance
Mark
You should use the datetime module for parsing strings into datetimes, like so.
from datetime import datetime
url = 'https://www.ig.com/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104'
datetime_string = url.split('--')[1]
date = datetime.strptime(datetime_string, '%y%m%d')
print(f"{date.day} {date.month} {date.year}")
The %y%m%d format string tells the strptime method that in the string '221104' the first two characters are the year, the next two are the month, and the final two are the day.
Here is a link to the documentation on using this method:
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
If the url always has this structure (that is it has the date at the end after a -- and only has -- once), you can get the date with:
str_date = str(url).split("--")[1]
If we relax the assumption that -- appears only once, the code still works if we take the last element of the split list (again assuming the date is always at the end):
str_date = str(url).split("--")[-1]
(Thanks to @The Myth for pointing that out)
To convert the obtained date into a datetime.date object and get it in the format you want:
from datetime import datetime
datetime_date = datetime.strptime(str_date, "%y%m%d")
formatted_date = datetime_date.strftime("%d %m %Y")
print(formatted_date) # 04 11 2022
Docs:
strftime
strptime
behaviour of the above two functions and format codes
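Since the question mentions that the URL sometimes ends with a single - before the digits, splitting on -- may not always work. A minimal sketch of an alternative is to anchor a regex at the end of the string. The function name trailing_date is made up here, and it assumes the six digits are always the very last characters (no trailing slash or query string).

```python
import re
from datetime import datetime


def trailing_date(url):
    """Return the date encoded in the last six digits (yymmdd) of url, or None."""
    match = re.search(r'-(\d{6})$', url)  # works for both '-221104' and '--221104'
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), '%y%m%d').date()
    except ValueError:
        return None  # six digits that do not form a valid date


url = '/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104'
parsed = trailing_date(url)
if parsed is not None:
    print(parsed.strftime('%d %m %Y'))  # 04 11 2022
```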
Given that the date is always in yymmdd format at the end of the URL, you can slice it off:
url = "https://www.ig.com/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104"
time = url[-6:] # Gets last 6 values
To convert yymmdd into dd mm yyyy we will use the datetime module:
import datetime as dt
new_time = dt.datetime.strptime(time, '%y%m%d') # Converts your date into datetime using the format
format_time = dt.datetime.strftime(new_time, '%d %m %Y') # Format
print(format_time)
The whole code looks like this:
url = "https://www.ig.com/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104"
time = url[-6:] # Gets last 6 values
import datetime as dt
new_time = dt.datetime.strptime(time, '%y%m%d') # Converts your date into datetime using the format
format_time = dt.datetime.strftime(new_time, '%d %m %Y') # Format
print(format_time)
Learn more about datetime
You can use Python's built-in str.split method.
date = url.split("--")[1]
This gives us 221104.
You can then rearrange the string by slicing it:
date_string = f"{date[4:6]} {date[2:4]} {date[0:2]}"
This gives us 04 11 22 (note that the year stays two digits).
Assuming that -- only appears where it does in the URL you posted, you can do the following.
Split the URL at -- and take the second element:
a = 'https://www.ig.com/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104'
desired_value = a.split('--')[1]
Then convert it:
from datetime import datetime
converted_date = datetime.strptime(desired_value , "%y%m%d")
formatted_date = datetime.strftime(converted_date, "%d %m %Y")

How to extract date from filename in python?

I need to extract the event date written in the filename into a new column called event_date. I assume I can use regex, but I have not found the exact pattern to use.
The filename is written below
file_name = X-Y Cable Installment Monitoring (10-7-20).xlsx
The (10-7-20) is in mm-dd-yy format.
I expect the result to be df['event_date'] = 2020-10-07
How should I write my script to get the correct date from the filename?
Thanks in advance.
Use str.rsplit() with the datetime module.
Steps:
Extract the date.
Convert it into the required date format.
from datetime import datetime
file_name = 'X-Y Cable Installment Monitoring (10-7-20).xlsx'
date = file_name.rsplit('(')[1].rsplit(')')[0] # '10-7-20'
date = datetime.strptime(date, "%m-%d-%y").strftime('%Y-%m-%d') # '2020-10-07'
Or via regex -
import re
regex = re.compile(r"(\d{1,2}-\d{1,2}-\d{2})") # pattern to capture date
matchArray = regex.findall(file_name)
date = matchArray[0]
date = datetime.strptime(date, "%m-%d-%y").strftime('%Y-%m-%d')
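As a rough sketch, the regex variant can be wrapped in a small function that also copes with file names that have no date. The name event_date_from_name is made up for illustration, and the pattern assumes the date is always written as (m-d-yy) inside parentheses.

```python
import re
from datetime import datetime

# Date in parentheses, e.g. "(10-7-20)"; month and day may be one or two digits.
DATE_IN_PARENS = re.compile(r'\((\d{1,2}-\d{1,2}-\d{2})\)')


def event_date_from_name(file_name):
    """Return the (m-d-yy) date in file_name as 'YYYY-MM-DD', or None."""
    match = DATE_IN_PARENS.search(file_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), '%m-%d-%y').strftime('%Y-%m-%d')


print(event_date_from_name('X-Y Cable Installment Monitoring (10-7-20).xlsx'))  # 2020-10-07
```

If the file names live in a pandas column, a function like this could be applied to it (for example with Series.map) to fill event_date, though that part is not shown here.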

How to search for string between whitespace and marker? Python

My problem is the following:
I have the string:
datetime = "2021/04/07 08:30:00"
I want to save in the variable hour, 08 and
I want to save in the variable minutes, 30
What I've done is the following:
import re
pat = re.compile(' (.*):')
hour = re.search(pat, datetime)
minutes = re.search(pat, datetime)
print(hour.group(1))
print(minutes.group(1))
What I obtain from the prints is
08:30 and 30, so the minutes are correct but for some reason that I'm not understanding, in the hours the first : is skipped and takes everything from the whitespace to the second :.
What am I doing wrong? Thank you.
Please use strptime from the datetime module, which is the recommended way to handle date strings in Python.
strptime returns a datetime object from the date string, and this object comes with all sorts of goodies such as date, time, hour, isoformat and timestamp, which make working with datetimes a breeze.
>>> import datetime
>>> datetime.datetime.strptime("2021/04/07 08:30:00", "%Y/%m/%d %H:%M:%S")
datetime.datetime(2021, 4, 7, 8, 30)
>>> datetime.datetime.strptime("2021/04/07 08:30:00", "%Y/%m/%d %H:%M:%S").hour
8
>>> datetime.datetime.strptime("2021/04/07 08:30:00", "%Y/%m/%d %H:%M:%S").second
0
Ah, no no, python has a much better approach with datetime.strptime
https://www.programiz.com/python-programming/datetime/strptime
So for you:
from datetime import datetime
dt_string = "2021/04/07 08:30:00"
# The date is in yyyy/mm/dd format
dt_object1 = datetime.strptime(dt_string, "%Y/%m/%d %H:%M:%S")
You want hours?
hours = dt_object1.hour
or minutes?
mins = dt_object1.minute
Now, if what you have presented is just an example of where you need to work around whitespace, then you could split the string up. Again with dt_string:
date_part, time_part = dt_string.split(" ")  # ['2021/04/07', '08:30:00']
dateString = date_part.split("/") # A list in [years, months, days]
timeString = time_part.split(":") # A list in [hours, minutes, seconds]
The wildcard . matches any single character, including :, and * is greedy, so .* matches 08:30 (everything up to the last :).
Use:
hour = re.search(r' ([0-9]*):', datetime)
Output:
>>> hour.group(1)
'08'
You can try the regex below, which makes the match non-greedy so that it stops at the first :
hour = re.search(' (.*?):', datetime)
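To see the difference side by side, here is a small sketch comparing the greedy pattern from the question, the non-greedy version, and a digits-only pattern that captures hour and minute in one go. The variable names are only for illustration.

```python
import re

text = "2021/04/07 08:30:00"

greedy = re.search(r' (.*):', text).group(1)   # .* runs on to the last colon
lazy = re.search(r' (.*?):', text).group(1)    # .*? stops at the first colon
digits = re.search(r' (\d+):(\d+):', text)     # \d+ cannot cross a colon

print(greedy)           # 08:30
print(lazy)             # 08
print(digits.groups())  # ('08', '30')
```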
An alternative is to split the original datetime string on the space into a variable such as dates, which gives you ['2021/04/07', '08:30:00']. You can then take the second element of dates and split it again on ':' into a variable time, whose elements are the hours, minutes and seconds.
datetime = "2021/04/07 08:30:00"
dates = datetime.split(" ")
print(dates)
time = dates[1].split(":")
print(time)
Running the code will give you
print(dates) --> ['2021/04/07', '08:30:00']
print(time) --> ['08', '30', '00']
You can access individual parts of time with time[0] for '08', time[1] for '30' etc.
I would use re.compile with named capture groups and iterate:
inp = "Hello World 2021/04/07 08:30:00 Goodbye"
r = re.compile(r'\b\d{4}/\d{2}/\d{2} (?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}\b')
output = [m.groupdict() for m in r.finditer(inp)]
print(output[0]['hour']) # 08
print(output[0]['minute']) # 30
This is a simple datetime question. Python can already do exactly what you need, in three steps:
Use strptime to build a datetime from your string.
Look up the format codes for your string (they are listed in the Python datetime documentation).
Read just the hour or minute from the datetime object.
from datetime import datetime
dt_string = "2021/04/07 08:30:00"
dt_object = datetime.strptime(dt_string, "%Y/%m/%d %H:%M:%S")
print(dt_object)
print(dt_object.hour, dt_object.minute)
# 2021-04-07 08:30:00
# 8 30
You can do this with strptime. If you need to work on the string directly, you can restrict what is matched with a character class followed by +, for example [0-9]+, [a-z]+, [A-Z]+ or a set of symbols.
import re
datetime = "2021/04/07 08:30:00"
#here you need to make a change
hour = re.search(r' (.*):+[0-9]+:', datetime)  # [0-9] so that minutes containing 8 or 9 also match
minutes = re.search(':(.*):', datetime)
print(hour.group(1))
print(minutes.group(1))

Split based on _ and find the difference between dates

I am trying to find the difference between the two dates below. They are in "dd_mm_yyyy" format. I split the two strings on _ to extract the day, month and year.
previous_date = "filename_03_03_2021"
current_date = "filename_09_03_2021"
previous_array = previous_date.split("_")
I am not sure what to do after that to combine the parts into a date and find the difference between the dates in days.
Any leads/suggestions would be appreciated.
You could index into the list after the split, e.g. previous_array[1], to get the values and pass them to date.
But instead of using split, you might use a pattern with 3 capture groups to make the match a bit more specific, get the numbers, then subtract the dates and read the days value.
You might make the date-like pattern more specific using the pattern on this page
import re
from datetime import date
previous_date = "filename_03_03_2021"
current_date = "filename_09_03_2021"
pattern = r"filename_(\d{2})_(\d{2})_(\d{4})"
pMatch = re.match(pattern, previous_date)
cMatch = re.match(pattern, current_date)
pDate = date(int(pMatch.group(3)), int(pMatch.group(2)), int(pMatch.group(1)))
cDate = date(int(cMatch.group(3)), int(cMatch.group(2)), int(cMatch.group(1)))
print((cDate - pDate).days)
Output
6
See a Python demo
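The split-based route mentioned at the start of this answer can be sketched as follows. The helper name date_from_name is made up, and it assumes the name always ends in _dd_mm_yyyy.

```python
from datetime import datetime


def date_from_name(name):
    """Parse the trailing dd_mm_yyyy part of a name like 'filename_03_03_2021'."""
    # rsplit with maxsplit=3 keeps any underscores in the prefix intact
    parts = name.rsplit('_', 3)
    return datetime.strptime('_'.join(parts[1:]), '%d_%m_%Y').date()


previous = date_from_name("filename_03_03_2021")
current = date_from_name("filename_09_03_2021")
print((current - previous).days)  # 6
```

Letting strptime do the conversion also means an impossible date such as 31_02_2021 raises a ValueError instead of silently passing through.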

How to format date in python

I made a crawler using Python.
But my crawler gets the date in this format:
s = page_ad.findAll('script')[25].text.replace('\'', '"')
s = re.search(r'\{.+\}', s, re.DOTALL).group() # get json data
s = re.sub(r'//.+\n', '', s) # replace comment
s = re.sub(r'\s+', '', s) # strip whitespace
s = re.sub(r',}', '}', s) # get rid of last , in the dict
dataLayer = json.loads(s)
print dataLayer["page"]["adDetail"]["adDate"]
2017-01-1412:28:07
I want only the date without the time (2017-01-14). How can I get only the date when there is no whitespace to split on?
use string subset:
>>> date ="2017-01-1412:28:07"
>>> datestr= date[:-8]
>>> datestr
'2017-01-14'
>>>
As this is not a standard date format, just slice the end.
st = "2017-01-1412:28:07"
res = st[:10]
print res
>>>2017-01-14
try this code:
In [2]: from datetime import datetime
In [3]: now = datetime.now()
In [4]: now.strftime('%Y-%m-%d')
Out[4]: '2017-01-24'
Update
I suggest you first parse the date into a datetime object and then pull the relevant information out of it.
A better approach for this would be to use a library.
I use dateparser for these tasks. Example usage:
import dateparser
date = dateparser.parse('12/12/12')
date.strftime('%Y-%m-%d')
Use datetime as follows to first convert it into a datetime object, and then format the output as required using the strftime() function:
from datetime import datetime
ad_date = dataLayer["page"]["adDetail"]["adDate"]
print datetime.strptime(ad_date, "%Y-%m-%d%H:%M:%S").strftime("%Y-%m-%d")
This will print:
2017-01-14
By using this method, it would give you the flexibility to display other items, for example adding %A to the end would give you the day of the week:
print datetime.strptime(ad_date, "%Y-%m-%d%H:%M:%S").strftime("%Y-%m-%d %A")
e.g.
2017-01-14 Saturday
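For Python 3, the same idea might look like the sketch below. The function name ad_date_only is made up, and trying a second format with a space is an extra assumption, in case the whitespace-stripping step in the crawler is ever removed.

```python
from datetime import datetime


def ad_date_only(raw):
    """Return the date part of '2017-01-1412:28:07' as 'YYYY-MM-DD', or None."""
    # First the glued-together form from the crawler, then the usual spaced form.
    for fmt in ('%Y-%m-%d%H:%M:%S', '%Y-%m-%d %H:%M:%S'):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(ad_date_only('2017-01-1412:28:07'))  # 2017-01-14
```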
