How to extract date from filename in python? - python

I need to extract the event date written in the filename into a new column called event_date. I assume I can use regex, but I have not worked out the exact pattern.
The filename is:
file_name = 'X-Y Cable Installment Monitoring (10-7-20).xlsx'
The (10-7-20) is in mm-dd-yy format.
I expect the result to be df['event_date'] = 2020-10-07
How should I write my script to get the correct date from the filename?
Thanks in advance.

Use str.rsplit() with the datetime module.
Steps:
Extract the date string.
Convert it into the required date format.
from datetime import datetime
file_name = 'X-Y Cable Installment Monitoring (10-7-20).xlsx'
date = file_name.rsplit('(')[1].rsplit(')')[0] # '10-7-20'
date = datetime.strptime(date, "%m-%d-%y").strftime('%Y-%m-%d') # '2020-10-07'
Or via regex -
import re
regex = re.compile(r"(\d{1,2}-\d{1,2}-\d{2})") # pattern to capture date
matchArray = regex.findall(file_name)
date = matchArray[0]
date = datetime.strptime(date, "%m-%d-%y").strftime('%Y-%m-%d')
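Both approaches above raise an exception when a filename has no date in it. A minimal sketch of a more defensive helper follows; the name extract_event_date and the choice to return None for missing or impossible dates are assumptions for illustration, not part of the original answer.

```python
import re
from datetime import datetime

# Matches a parenthesised m-d-yy date such as "(10-7-20)".
DATE_IN_PARENS = re.compile(r"\((\d{1,2}-\d{1,2}-\d{2})\)")

def extract_event_date(file_name):
    """Return the date in file_name as 'YYYY-MM-DD', or None if absent."""
    match = DATE_IN_PARENS.search(file_name)
    if match is None:
        return None
    try:
        parsed = datetime.strptime(match.group(1), "%m-%d-%y")
    except ValueError:
        # Digits were present but do not form a real date, e.g. "13-40-20".
        return None
    return parsed.strftime("%Y-%m-%d")

print(extract_event_date("X-Y Cable Installment Monitoring (10-7-20).xlsx"))  # 2020-10-07
```

If the filenames live in a DataFrame column, a helper like this could probably be applied with something along the lines of df['file_name'].map(extract_event_date) to build event_date.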

Related

Split URL at - With Python

Does anyone know how I can extract the last 6 characters of an absolute URL, e.g.
/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104
This is not a typical URL; only sometimes does it end in -221104.
Also, is there a way to turn 221104 into the date 04 11 2022 easily?
Thanks in advance
Mark
You should use the datetime module for parsing strings into datetimes, like so.
from datetime import datetime
url = 'https://www.ig.com/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104'
datetime_string = url.split('--')[1]
date = datetime.strptime(datetime_string, '%y%m%d')
print(f"{date.day} {date.month} {date.year}")
The '%y%m%d' format string tells strptime that in '221104' the first two digits are the year, the next two are the month, and the final two are the day.
Here is a link to the documentation on using this method:
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
If the url always has this structure (that is it has the date at the end after a -- and only has -- once), you can get the date with:
str_date = str(url).split("--")[1]
If we relax the assumption that -- appears only once, the code still works as long as we take the last element of the split list (again assuming the date is always at the end):
str_date = str(url).split("--")[-1]
(Thanks to @The Myth for pointing that out)
To convert the obtained string into a datetime object and get it in the format you want:
from datetime import datetime
datetime_date = datetime.strptime(str_date, "%y%m%d")
formatted_date = datetime_date.strftime("%d %m %Y")
print(formatted_date) # 04 11 2022
Docs:
strftime
strptime
behaviour of the above two functions and format codes
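Since the question says the URL only sometimes ends in a date, here is a small sketch that checks for the six-digit suffix before parsing it. The function name trailing_date and the decision to return None when no valid date is found are assumptions made for this example.

```python
import re
from datetime import datetime

def trailing_date(url):
    """Return the yymmdd suffix of url as a date, or None if there is none."""
    # A hyphen followed by exactly six digits at the very end of the string.
    match = re.search(r"-(\d{6})$", url)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%y%m%d").date()
    except ValueError:
        # Six digits that are not a real date, e.g. "229999".
        return None

url = "/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104"
found = trailing_date(url)
print(found.strftime("%d %m %Y") if found else "no date")  # 04 11 2022
```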
Assuming the date is always in yymmdd format at the end of the URL, you can slice it off:
url = "https://www.ig.com/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104"
time = url[-6:] # Gets last 6 values
To convert yymmdd into dd mm yyyy, use the datetime module:
import datetime as dt
new_time = dt.datetime.strptime(time, '%y%m%d') # Converts your date into datetime using the format
format_time = dt.datetime.strftime(new_time, '%d %m %Y') # Format
print(format_time)
The whole code looks like this:
import datetime as dt
url = "https://www.ig.com/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104"
time = url[-6:] # Gets last 6 values
new_time = dt.datetime.strptime(time, '%y%m%d') # Converts your date into datetime using the format
format_time = dt.datetime.strftime(new_time, '%d %m %Y') # Format
print(format_time)
Learn more about datetime
You can use Python's built-in str.split method.
date = url.split("--")[1]
This gives us 221104.
Then you can rearrange the string by slicing it:
date_string = f"{date[4:6]} {date[2:4]} {date[0:2]}"
This gives us 04 11 22 (note the two-digit year).
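One trade-off of slicing is that it only reorders characters and never checks that they form a real date. The sketch below contrasts the two approaches; both function names are made up for this comparison.

```python
from datetime import datetime

def rearrange_by_slicing(yymmdd):
    # Pure string reordering: no check that the digits form a real date.
    return f"{yymmdd[4:6]} {yymmdd[2:4]} {yymmdd[0:2]}"

def rearrange_by_parsing(yymmdd):
    # strptime raises ValueError for impossible dates such as month 13.
    return datetime.strptime(yymmdd, "%y%m%d").strftime("%d %m %Y")

print(rearrange_by_slicing("221104"))  # 04 11 22
print(rearrange_by_parsing("221104"))  # 04 11 2022
print(rearrange_by_slicing("221304"))  # 04 13 22
```

Calling rearrange_by_parsing("221304") raises ValueError instead of returning a nonsensical date, which may or may not be what you want for your data.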
Assuming that -- will only be there as it is in the url you posted, you can do something as follows:
You can split the URL at -- & extract the element
a = 'https://www.ig.com/es/ideas-de-trading-y-noticias/el-ibex-35-insiste-en-buscar-los-7900-puntos-a-la-espera-de-las--221104'
desired_value = a.split('--')[1]
And to convert it:
from datetime import datetime
converted_date = datetime.strptime(desired_value , "%y%m%d")
formatted_date = datetime.strftime(converted_date, "%d %m %Y")

how can i convert yyyy-MM-dd (date.today()) to ddMMyyyy(string or date) in python

How do I convert a date such as 2022-09-28 to 28092022 in Python?
I have some files with this date pattern in their names, and I need to convert the date to find the latest one. Is that possible?
Any help is welcome :)
So, when you use date.today() you get back a datetime.date object:
>>> from datetime import date
>>> date.today()
datetime.date(2022, 9, 28)
On this, you can directly use the .strftime() method to get back a string with whatever format you would like:
>>> date.today().strftime('%d%m%Y')
'28092022'
You can use this function:
from datetime import datetime
def getDateString(date):
    # str(date) is 'YYYY-MM-DD' for a datetime.date object
    year, month, day = str(date).split("-")
    return day + month + year
E.g.:
date = datetime(year=2022, month=9, day=28).date()
print(getDateString(date))  # prints 28092022
We can use regex to get the task done.
The first step is to extract the date expression from the file name
(if there is more than one file name to read and modify, run the task in a loop):
import re
file_name = 'backup 2022-09-28-17:33.xyz' # a sample file name
pattern = r'(\d{4}-\d{2}-\d{2})'
date = ''.join(re.findall(pattern, file_name)).split('-')
result of date: ['2022', '09', '28']
In the second step, we replace the current date expression with the new one:
file_name = re.sub(pattern, ''.join(date[::-1]), file_name)
print(file_name)
result: backup 28092022-17:33.xyz
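The question's underlying goal is to find the latest file. Strings in ddmmyyyy form do not sort chronologically, so one option is to compare parsed dates and only format at the end. The sketch below assumes every name contains a yyyy-mm-dd date; the file names and the helper date_in_name are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical file names; each contains a yyyy-mm-dd date.
file_names = [
    "backup 2022-09-28-17:33.xyz",
    "backup 2022-10-02-08:15.xyz",
    "backup 2021-12-31-23:59.xyz",
]

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def date_in_name(file_name):
    """Parse the first yyyy-mm-dd date found in file_name."""
    match = DATE_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"no date in {file_name!r}")
    return datetime.strptime(match.group(0), "%Y-%m-%d").date()

# max() compares real date objects, not strings.
latest = max(file_names, key=date_in_name)
print(latest)                                   # backup 2022-10-02-08:15.xyz
print(date_in_name(latest).strftime("%d%m%Y"))  # 02102022
```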

Split the Date and Year and format the date into standard MM/DD/YYYY in Python

I'm working on date formatting, and a few cells contain values such as June/142017 (no slash between the day and the year). I want to split the day and year and convert the value into the standard format MM/DD/YYYY.
I am currently fixing this with the replace function, i.e. replace("June/142017", "June/14/2017"), but that only handles this one June value. Could you help me with code that splits and converts such values in general, not just this specific one?
Below is the code I'm using:
`import pandas as pd
import datetime as dt
File = pd.read_excel("Final_file.xlsx")
LFile = File.replace("June/142017","June/14/2017")
LFile["Date"] = pd.to_datetime(LFile["Date"]).dt.strftime("%m/%d/%Y")
LFile.to_excel("Updated_Final_File.xlsx")`
*** FYI - I'm new to Python.
Thank you in Advance.
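Use the format %B/%d%Y to match June/142017:
import pandas as pd
LFile = pd.read_excel("Final_file.xlsx")
# Parse the values with the missing slash; anything else becomes NaT
d1 = pd.to_datetime(LFile["Date"], format='%B/%d%Y', errors='coerce')
# Parse the values that are already well formed
d2 = pd.to_datetime(LFile["Date"], errors='coerce')
# Prefer the regular parse and fall back to the special format
LFile["Date"] = d2.fillna(d1).dt.strftime("%m/%d/%Y")
LFile.to_excel("Updated_Final_File.xlsx")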
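The same fallback idea can be sketched for a single value with only the standard library, which makes it easy to try out on a few sample strings. The list of candidate formats is an assumption about what the cells contain, and %B expects English month names under the default locale.

```python
from datetime import datetime

# Formats to try in order; the second one is an assumption about
# what the well-formed cells look like.
CANDIDATE_FORMATS = ("%B/%d%Y", "%B/%d/%Y")

def to_standard(text):
    """Return text as MM/DD/YYYY, or None if no candidate format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    return None

print(to_standard("June/142017"))   # 06/14/2017
print(to_standard("June/14/2017"))  # 06/14/2017
```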

I want to know how to just use the date portion of the Unix timestamp. I have left blank the portion I am having trouble trying to figure out

I am lost as to how to turn the time stamp into a formal date and how to format it.
import os
import datetime
def file_date(filename):
    # Create the file in the current directory
    with open(filename,'w') as file:
        #Create the timestamp
        timestamp = os.path.getmtime(filename)
    # Convert the timestamp into a readable format, then into a string
    date_time = datetime.datetime.fromtimestamp(timestamp)
    # Return just the date portion
    date = ______
    # Hint: how many characters are in “yyyy-mm-dd”? 10
    return ("{}".format(date))
print(file_date("newfile.txt"))
# Should be today's date in the format of yyyy-mm-dd
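One possible way to fill in the blank is to format the datetime object with strftime, as in the sketch below. The temporary directory is an addition so that the example leaves no file behind; it is not part of the original exercise.

```python
import datetime
import os
import tempfile

def file_date(filename):
    # Create (or truncate) the file so that it has a fresh modification time.
    with open(filename, "w"):
        pass
    timestamp = os.path.getmtime(filename)
    date_time = datetime.datetime.fromtimestamp(timestamp)
    # strftime keeps only the date portion, formatted as yyyy-mm-dd.
    return date_time.strftime("%Y-%m-%d")

# Write into a temporary directory so the sketch leaves no file behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    result = file_date(os.path.join(tmp_dir, "newfile.txt"))

print(result)  # today's date in yyyy-mm-dd form
```

Following the hint in the exercise, str(date_time)[:10] should give the same text, because the string form of a datetime starts with the ten-character date.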

How do I shorten this code to make the date SQL compatible?

I'm using tkcalendar to get dates. The date format is 'DD/MM/YYYY'. However, SQL DATE format accepts only 'YYYY-MM-DD'.
I have the following code:
dated = cal.get_date()
dates = dated.split('/')
reverse_dates = dates[::-1]
separator = '-'
final_date = separator.join(reverse_dates)
print(final_date)
Here, cal is a tkcalendar.Calendar object. Is there a shorter way to write this code?
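Try using strftime. Calendar.get_date() returns a string, so call selection_get(), which returns a datetime.date object, before formatting:
date = cal.selection_get()
date_str = date.strftime("%Y-%m-%d")
print("date formatted:", date_str)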
Using a value of datetime.now() for the starting date, the output from the above script is:
date formatted: 2020-11-23
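If all you have is the 'DD/MM/YYYY' string, as in the question, the conversion can be sketched without the widget at all. Both function names below are made up for illustration: the first validates the date while converting, the second is just the question's split-and-reverse written as one expression.

```python
from datetime import datetime

def to_sql_date(dd_mm_yyyy):
    """Convert 'DD/MM/YYYY' text to 'YYYY-MM-DD', validating it on the way."""
    return datetime.strptime(dd_mm_yyyy, "%d/%m/%Y").strftime("%Y-%m-%d")

def to_sql_date_unchecked(dd_mm_yyyy):
    """Same reordering as the question's code, written as one expression."""
    return "-".join(reversed(dd_mm_yyyy.split("/")))

print(to_sql_date("23/11/2020"))            # 2020-11-23
print(to_sql_date_unchecked("23/11/2020"))  # 2020-11-23
```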
