Batch filename rename with conditional or math operation

This is my first question here. Thanks in advance.
I have automatically uploaded hundreds of images to a Webfaction server with an incorrect timestamp in the filename and now I have to rename them.
Here is an example:
tl0036_20161121-120558.jpg
myCode_yyyymmdd_hhmmss.jpg
I have to change the "hh" characters 1 hour back, so 12 should be 11. They are the 17th and 18th position. I imagine two ways of doing it:
Math operation: Images are taken from 1am to 11pm, so I see no problem in doing just a math operation like 12-1=11 for all of them.
Conditional: if 17th and 18th characters value are 12, then rename to 11. 24 conditions should be written, from 01 to 23 starting value.
I have found many answers here to replace/rename a fixed string and others about conditional operation, but nothing about my kind of mixed replacement.
Please advise on what the script should look like, assuming it will be run inside the folder that contains the files. I am a novice and have worked with bash and Python.
Thank you!

Solution using datetime in Python
import time
import datetime
def change_filename(fn):
    # EXTRACT JUST THE TIMESTAMP AS A STRING
    timestamp_str = fn[7:22]
    # CONVERT TO UNIX TIMESTAMP
    timestamp = time.mktime(datetime.datetime.strptime(timestamp_str, "%Y%m%d-%H%M%S").timetuple())
    # SUBTRACT AN HOUR (3600 SECONDS)
    timestamp = timestamp - 3600
    # CHANGE BACK TO A STRING
    timestamp_str = datetime.datetime.fromtimestamp(timestamp).strftime("%Y%m%d-%H%M%S")
    # RETURN THE FILENAME WITH THE NEW TIMESTAMP
    return fn[:7] + timestamp_str + fn[22:]
This takes into account possible changes in the day, month, and year that could happen by putting the timestamp back an hour. If you're using a 12-hour time rather than 24 hour, you can use the format string "%Y%m%d-%I%M%S" instead; see the Python docs.
Credit to: Convert string date to timestamp in Python and Converting unix timestamp string to readable date in Python
This assumes that your myCode is of a fixed length. If it is not, you could use the str.split method to pull out the hours from after the -, or, if your filenames have an unknown number/placement of -s, you could use regular expressions to find the hours and replace them using capturing groups.
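As a rough sketch of the regular-expression route, assuming each filename contains one `yyyymmdd-hhmmss` stamp somewhere after the code. The names `STAMP_RE` and `shift_stamp` are illustrative and not part of the question or answer:

```python
import datetime
import re

# Assumed pattern: 8 digits, a hyphen, 6 digits (e.g. 20161121-120558).
STAMP_RE = re.compile(r"(\d{8}-\d{6})")
STAMP_FMT = "%Y%m%d-%H%M%S"


def shift_stamp(filename, hours=-1):
    """Return filename with its first timestamp shifted by `hours`."""
    def _replace(match):
        old = datetime.datetime.strptime(match.group(1), STAMP_FMT)
        new = old + datetime.timedelta(hours=hours)
        return new.strftime(STAMP_FMT)

    # count=1 so that only the first stamp in the name is rewritten.
    return STAMP_RE.sub(_replace, filename, count=1)


print(shift_stamp("tl0036_20161121-120558.jpg"))       # tl0036_20161121-110558.jpg
print(shift_stamp("a-long-code_20160101-000558.jpg"))  # a-long-code_20151231-230558.jpg
```

Because the stamp is located by pattern rather than by position, the length of myCode and any extra hyphens in it no longer matter.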
In Python, you can use a combination of glob and shutil.move to walk through your files and rename them using that function. You might want to use a regular expression to ensure that you only operate on files matching your naming scheme, if there are other files also in the directory/ies.
Naive Solution
With the caveats about the length of myCode and filename format as above.
If your timestamps are using the 24 hour format (00-23 hours), then you can replace the hours by subtracting one, as you say; but you'd have to use conditionals to ensure that you take care of turning 23 into 00, and take care of adding a leading zero to hours less than 10.
An example in Python would be:
def change_filename(fn):
    hh = int(fn[16:18])
    if hh == 0:
        hh = 23
    else:
        hh -= 1
    hh = str(hh)
    # ADD LEADING ZERO IF hh < 10
    if len(hh) == 1:
        hh = '0' + hh
    return fn[:16] + str(hh) + fn[18:]
As pointed out above, an important point to bear in mind is that this approach would not put the day back by one if the hour is 00 and is changed to 23, so you would have to take that into account as well. The same could happen for the month, and the year, and you'd have to take these all into account. It's much better to use datetime.

For your file renaming logic, not only are you going to have issues over day boundaries, but also month, year, and leap-year boundaries. For example, if your filename is tl0036_20160101-000558.jpg, it needs to change to tl0036_20151231-230558.jpg. Another example: tl0036_20160301-000558.jpg will be tl0036_20160229-230558.jpg.
Creating this logic from scratch will be very time consuming - but luckily there's the datetime module in Python that has all this logic already built in.
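To make the boundary cases concrete, here is a minimal sketch that runs the two example timestamps above through datetime. The helper name `one_hour_back` is only for illustration:

```python
import datetime

FMT = "%Y%m%d-%H%M%S"


def one_hour_back(stamp):
    """Shift a yyyymmdd-hhmmss string back by one hour."""
    moment = datetime.datetime.strptime(stamp, FMT)
    return (moment - datetime.timedelta(hours=1)).strftime(FMT)


print(one_hour_back("20160101-000558"))  # 20151231-230558 (year boundary)
print(one_hour_back("20160301-000558"))  # 20160229-230558 (leap day)
```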
Your script should consist of the following steps:
Iterate through each '.jpg' file in your folder.
Try to match your timestamp file name for each '.jpg'
Extract the timestamp values and create a datetime.datetime object out of those values.
Subtract a datetime.timedelta object equal to 1 hour from that datetime.datetime object, and set that as your new timestamp.
Construct a new filename with your new timestamp.
Replace the old filename with the new filename.
Here's an example implementation:
import datetime
import glob
import os
import re
def change_timestamps(source_folder, hourdiff = -1):
    file_re = re.compile(r'(.*)_(\d{8}-\d{6})\.jpg')
    for source_file_name in glob.iglob(os.path.join(source_folder, '*.jpg')):
        # Match against the bare file name, not the full path; otherwise
        # group(1) would include the folder and the join below would repeat it.
        file_match = file_re.match(os.path.basename(source_file_name))
        if file_match is not None:
            old_time = datetime.datetime.strptime(file_match.group(2), "%Y%m%d-%H%M%S")
            new_time = old_time + datetime.timedelta(hours = hourdiff)
            new_file_str = '{}_{}.jpg'.format(file_match.group(1), new_time.strftime("%Y%m%d-%H%M%S"))
            new_file_name = os.path.join(source_folder, new_file_str)
            # os.replace overwrites an existing file with the same name.
            os.replace(source_file_name, new_file_name)
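One caveat worth checking before running a batch rename like this: if two pictures were taken exactly one hour apart, a renamed file can land on the name of a file that has not been renamed yet, and os.replace would overwrite it. A hedged sketch of one way around that is to plan all renames first and order them so that each target name has already been vacated. The helper `plan_renames` is hypothetical, and it assumes the same filename pattern as above:

```python
import datetime
import re

FILE_RE = re.compile(r"(.*)_(\d{8}-\d{6})\.jpg")
FMT = "%Y%m%d-%H%M%S"


def plan_renames(names, hourdiff=-1):
    """Return (old, new) name pairs in an order that avoids overwriting."""
    plan = []
    for name in names:
        match = FILE_RE.fullmatch(name)
        if match is None:
            continue  # not one of the timestamped images
        old_time = datetime.datetime.strptime(match.group(2), FMT)
        new_time = old_time + datetime.timedelta(hours=hourdiff)
        new_name = "{}_{}.jpg".format(match.group(1), new_time.strftime(FMT))
        plan.append((old_time, name, new_name))
    # Shifting back in time: rename the earliest files first.
    # Shifting forward: rename the latest files first.
    plan.sort(key=lambda item: item[0], reverse=hourdiff > 0)
    return [(old, new) for _, old, new in plan]


names = ["tl_20161121-120000.jpg", "tl_20161121-110000.jpg", "readme.txt"]
for old, new in plan_renames(names):
    print(old, "->", new)
```

Each pair could then be joined with the folder path and passed to os.replace in the order returned.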

Related

Convert date with more than 31 days to regular date format in python

I have exported some data from another program, in which I added up the waiting time at a station.
So after some time, I have the format '32:00:00:33.7317' for the waiting time.
This is my function to convert every date into the format I want:
def Datum_formatieren(Datensatz):
    if len(str(Datensatz)) == 24:
        return datetime.datetime.strptime(Datensatz, "%d.%m.%Y %H:%M:%S.%f").strftime("%d%H%M")
    elif len(str(Datensatz)) == 3:
        return 0
        #return datetime.datetime.strptime(Datensatz, "%S.%f").strftime("%d%H%M")
    elif len(str(Datensatz)) == 5:
        return str(Datensatz)
    elif len(str(Datensatz)) == 7:
        return str(Datensatz)
    elif len(str(Datensatz)) == 6:
        return datetime.datetime.strptime(str(Datensatz), "%S.%f").strftime("%d%H%M")
    elif len(str(Datensatz)) == 9 or len(str(Datensatz))==10:
        return datetime.datetime.strptime(str(Datensatz), "%M:%S.%f").strftime("%d%H%M")
    elif len(str(Datensatz)) == 12 or len(str(Datensatz)) ==13:
        return datetime.datetime.strptime(str(Datensatz), "%H:%M:%S.%f").strftime("%d%H%M")
    elif len(str(Datensatz)) == 15 or len(str(Datensatz)) == 16:
        return datetime.datetime.strptime(str(Datensatz), "%d:%H:%M:%S.%f").strftime("%d%H%M")
I get the following error since python does not recognize days above 30 or 31:
ValueError: time data '32:00:00:33.7317' does not match format '%d:%H:%M:%S.%f'
How do I convert all entries with days above 31 into a format, which python can recognize?

You cannot use datetime.datetime.strptime() to construct datetimes that are invalid; see the other answer for why.
You can, however, leverage datetime.timedelta:
import datetime
def Datum_formatieren(Datensatz):
    # other cases omitted for brevity
    # Input: "days:hours:minutes:seconds.ms"
    if len(Datensatz) in (15,16):
        k = list(map(float,Datensatz.split(":")))
        secs = k[0]*60*60*24 + k[1]*60*60 + k[2]*60 + k[3]
        td = datetime.timedelta(seconds=secs)
        days = td.total_seconds() / 24 / 60 // 60
        hours = (td.total_seconds() - days * 24*60*60) / 60 // 60
        minuts = (td.total_seconds() - days *24*60*60 - hours * 60*60) // 60
        print(td)
        return f"{td.days}{int(hours):02d}{int(minuts):02d}"
print(Datum_formatieren("32:32:74:33.731"))
Output for "32:32:74:33.731":
33 days, 9:14:33.731000 # timedelta
330914 # manually parsed

You are misusing datetime, which only maps to valid dates with times, not "any amount of time passed".
Use a timedelta instead:
Adapted from datetime.timedelta:
from datetime import datetime, timedelta
delta = timedelta( days=50, seconds=27, microseconds=10,
milliseconds=29000, minutes=5, hours=8, weeks=2 )
print(datetime.now() + delta)
You can add any timedelta to a normal datetime and get the resulting value.
If you want to stick with your approach, you may want to shorten it. Instead of:
if len(str(Datensatz)) == 9 or len(str(Datensatz))==10:
you can write:
if len(Datensatz) in (9,10):
Related: How to construct a timedelta object from a simple string (look at its answers and take inspiration with attribution from it)

You're taking the Datensatz variable, converting it to string using str(), then parsing it back into an internal representation; there is almost always a better way to do it.
Can you check what type the Datensatz variable has, perhaps print(type(Datensatz)) or based on the rest of your code?
Most likely the Datensatz variable already has fields for the number of days, hours, minutes and seconds. It's usually much better to base your logic on those directly, rather than converting to string and back.
As others have pointed out, you're trying to use a datetime.datetime to represent a time interval; this is incorrect. Instead, you need to either:
Use the datetime.timedelta type, which is designed for time intervals. It can handle periods over 30 days correctly:
>>> print(datetime.timedelta(days=32, seconds=12345))
32 days, 3:25:45
>>>
Since your function is named Datum_formatieren, perhaps you intend to take Datensatz and convert it to string, for output to the user or to another system.
In that case, you should take the fields directly in Datensatz and convert them appropriately, perhaps using f-strings or % formatting. Depending on the situation, you may need to do some arithmetic. The details will depend on the type of Datensatz and the format you need on the output.
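As a minimal sketch of that kind of arithmetic, assuming the value is (or can be turned into) a non-negative datetime.timedelta and that the desired output is days, hours and minutes run together as in the question's "%d%H%M". The function name `format_interval` is made up for this example:

```python
import datetime


def format_interval(delta):
    """Format a non-negative timedelta as days + 2-digit hours + 2-digit minutes."""
    # timedelta normalises itself into days and seconds (0 <= seconds < 86400).
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return f"{delta.days}{hours:02d}{minutes:02d}"


print(format_interval(datetime.timedelta(days=32, seconds=33.7317)))  # 320000
print(format_interval(datetime.timedelta(days=32, seconds=12345)))    # 320325
```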

is there a way to modify a string to remove a decimal?

I have a file with a lot of images. Each image is named something like:
100304.jpg
100305.jpg
100306.jpg
etc...
I also have a spreadsheet in which each image is a row. The first value in the row is the name, and the values after the name are various decimals and 0's that describe features of each image.
The issue is that when I pull the name from the sheet, something is adding a decimal which then results in the file not being able to be transferred via the shutil.move()
import xlrd
import shutil
dataLocation = "C:/Users/User/Documents/Python/Project/sort_solutions_rev1.xlsx"
imageLocBase = "C:/Users/User/Documents/Python/Project/unsorted"
print("Specify which folder to put images in. Type the number only.")
print("1")
print("2")
print("3")
typeOfSet = input("")
#Sorting for folder 1
if int(typeOfSet) == 1:
    #Identifying what to move
    name = str(sheet.cell(int(nameRow), 0).value)
    sortDataStorage = (sheet.cell(int(nameRow), 8).value) #float
    sortDataStorageNoFloat = str(sortDataStorage) #non-float
    print("Proccessing: " + name)
    print(name + " has a correlation of " + (sortDataStorageNoFloat))
    #sorting for this folder utilizes the information in column 8)
    if sortDataStorage >= sortAc:
        print("test success")
        folderPath = "C:/Users/User/Documents/Python/Project/Image Folder/Folder1"
        shutil.move(imageLocBase + "/" + name, folderPath)
        print(name + " has been sorted.")
    else:
        print(name + " does not meet correlation requirement. Moving to next image.")
The issue I'm having occurs with the shutil.move(imageLocBase + "/" +name, folderPath)
For some reason my code takes the name from the spreadsheet (ex: 100304) and then adds a ".0" So when trying to move a file, it is trying to move 100304.0 (which doesn't exist) instead of 100304.

Using pandas to read your Excel file.
As suggested in a comment on the original question, here is a quick example of how to use pandas to read your Excel file, along with an example of the data structure.
Any questions, feel free to shout, or have a look into the docs.
import pandas as pd
# My path looks a little different as I'm on Linux.
path = '~/Desktop/so/MyImages.xlsx'
df = pd.read_excel(path)
Data Structure
This is completely contrived as I don't have an example of your actual file.
   IMAGE_NAME  FEATURE_1  FEATURE_2  FEATURE_3
0  100304.jpg     0.0111      0.111      1.111
1  100305.jpg     0.0222      0.222      2.222
2  100306.jpg     0.0333      0.333      3.333
Hope this helps get you started.
Suggestion:
Excel likes to think it's clever and does 'unexpected' things, as you're experiencing with the decimal (data type) issue. Perhaps consider storing your image data in a database (SQLite) or as plain old CSV file. Pandas can read from either of these as well! :-)

splitOn = '.'
nameOfFile = name.split(splitOn, 1)[0]
should work.
Take your file name, e.g. 12345.0, and store it in a variable:
name = "12345.0"
Now we need to split this string, in this case on the "." character, so we save the separator in a second variable:
splitOn = '.'
Then we call Python's .split on the string:
nameOfFile = name.split(splitOn, 1)[0]
Read literally: take 12345.0, split at ".", and make only one split, which gives a list of two values (12345 at position 0 and 0 at position 1). Lists are 0-based, so [0] asks for the first value. (If you ever get confused with lists, arrays, etc., start counting from 0 instead of 1 on your fingers: positions 0 1 2 3 4 hold the 1st, 2nd, 3rd, 4th and 5th values.)
So:
name = "12345.0"
splitOn = '.'
nameOfFile = name.split(splitOn, 1)[0]
print(nameOfFile)
The output will be:
12345
I hope that helps.
https://www.geeksforgeeks.org/python-string-split/
OR
as highlighted below, convert the float to an int:
https://www.geeksforgeeks.org/type-conversion-python/
If the name is stored as a float:
name = 12345.0
newName = int(name)
This drops the fractional part (which is 0 here, so nothing is lost).
OR
if the float is stored as a string:
print(int(float(name)))

Apparently the value you retrieve from the spreadsheet comes parsed as a float, so when you cast it to string it retains the decimal part.
You can trim the “.0” from the string value, or cast it to integer before casting to string.
You could also check the spreadsheet's cell format and make sure it is set to something that is not a number (I don't know the exact name of the setting). With that fixed, your data probably won't come with the .0 anymore.
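The integer cast mentioned above could look roughly like this. It is a sketch that assumes numeric cells arrive as Python floats; `cell_to_name` is a hypothetical helper, and whether an extension such as .jpg still has to be appended depends on what the sheet stores:

```python
def cell_to_name(value):
    """Turn a spreadsheet cell value into a name without a trailing '.0'."""
    # Whole-number floats such as 100304.0 are converted via int first.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    # Anything else (already a string, or a true decimal) is left as it is.
    return str(value)


print(cell_to_name(100304.0))  # 100304
print(cell_to_name("100305"))  # 100305
```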

If ".0" is always added to the end of the variable, you can read the string "name" this way:
shutil.move(imageLocBase + "/" + name[:-2], folderPath)
A string is like a list: we can choose which elements to read.
This technique is called slicing.
Sorry for my English. Bye

All these people have taken time to reply, please out of politeness rate the replies.

How to change the string when the string is present

I need some help with my code. I am having trouble changing strings.
I am checking whether the variable getTime3 contains the string 30, and if it does I want to replace it with 00. My code finds the string 30 but replaces it with 0030, which is wrong. It should be 00.
Here is the code:
if getTime3 == '11:30PM':
    self.getControl(346).setLabel('12:00AM')
elif getTime3 == '12:30PM':
    self.getControl(346).setLabel('1:00AM')
else:
    ind = getTime3.find(':')
    if getTime3[ind+1:ind+3]=='30':
        getTime3 = getTime3[:ind]+':00'+getTime3[+2:]
        self.getControl(346).setLabel(getTime3)
    else:
        getTime3 = str(int(getTime3[:ind])+1)+':30'+getTime3[+2:]
        self.getControl(346).setLabel(getTime3)
Besides the two special cases, this is what I expect: when the program finds the :, it checks whether 30 follows. If so, it should advance the current hour to the next hour and build a new string that keeps the AM/PM label, for example changing 8 to 9 and replacing 30 with 00 so that it shows 9:00PM. If the minutes are 00, I want to change 00 to 30 instead, again preserving the AM/PM part. If getTime3 is the string 11:30AM, I want to change it to 12:00PM.
Can you please help me with how to fix the 0030 to make it to show 00 instead and add the next hour?

In Python, a slice like x[a:b] starts at a (inclusive) and finishes at b (exclusive).
So getTime3[:ind] is the slice from 0 to ind exclusive, which is the hours without the ":".
Indexes are absolute, not relative, so getTime3[+2:] is the same as getTime3[2:], which corresponds to the substring starting at index 2.
What you want is:
getTime3 = getTime3[:ind] + ':00' + getTime3[ind + 3:]
# or
getTime3 = getTime3[:ind + 1] + '00' + getTime3[ind + 3:]
Example:
getTime3 = '08:30PM'
ind = getTime3.index(":")
getTime3[:ind] + ':00' + getTime3[ind + 3:]
# -> '08:00PM'
EDIT
If you want to do some calculation on time, you can use the datetime module.
time_fmt = '%I:%M%p'
Is the format used to represent time like '09:30PM', where:
%I Hour (12-hour clock) as a zero-padded decimal number.
%M Minute as a zero-padded decimal number.
%p Locale’s equivalent of either AM or PM.
How to add 30 min:
import datetime
time3 = '09:30PM'
dt3 = datetime.datetime.strptime(time3, time_fmt)
dt3 += datetime.timedelta(minutes=30)
time3 = dt3.strftime(time_fmt)
If you want to set the minutes to 0, you can do:
dt3 = datetime.datetime.strptime(time3, time_fmt)
dt3 = dt3.replace(minute=0)

Please don't use the wrong tool for the task. You are manipulating times, yet using strings to do it. Start with this to turn 11:30PM into 12:00AM:
import datetime
t3 = datetime.datetime.strptime(getTime3, '%I:%M%p')
t3 += datetime.timedelta(minutes=30)
print(t3.strftime('%I:%M%p'))
Adding timedelta(minutes=30) makes the intent perfectly clear. Some relevant documentation is at https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
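Wrapped into a function, the idea might look like the sketch below. It assumes labels like 9:30PM without a leading zero (as in the question) and a locale where %p produces AM/PM; the name `add_half_hour` is hypothetical:

```python
import datetime

TIME_FMT = "%I:%M%p"  # 12-hour clock with AM/PM; %p depends on the locale


def add_half_hour(label):
    """Return the time label 30 minutes later, keeping the AM/PM style."""
    moment = datetime.datetime.strptime(label, TIME_FMT)
    moment += datetime.timedelta(minutes=30)
    # strftime zero-pads the hour ("09:00PM"); strip it to match "9:00PM".
    return moment.strftime(TIME_FMT).lstrip("0")


print(add_half_hour("11:30PM"))  # 12:00AM
print(add_half_hour("11:30AM"))  # 12:00PM
print(add_half_hour("8:30PM"))   # 9:00PM
print(add_half_hour("9:00PM"))   # 9:30PM
```

With this approach the special cases around 11:30 and 12:30 need no separate branches, since the hour rollover is handled by the datetime arithmetic.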

Python script for EC2 snapshots, use datetime to delete old snapshots

I am a beginner with Python and I have written a Python script which takes a snapshot of a specified volume and then retains only the number of snapshots requested for that volume.
#Built with Python 3.3.2
import boto.ec2
from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo
from boto.ec2.snapshot import Snapshot
from datetime import datetime
from functools import cmp_to_key
import sys
aws_access_key = str(input("AWS Access Key: "))
aws_secret_key = str(input("AWS Secret Key: "))
regionname = str(input("AWS Region Name: "))
regionendpoint = str(input("AWS Region Endpoint: "))
region = RegionInfo(name=regionname, endpoint=regionendpoint)
conn = EC2Connection(aws_access_key_id = aws_access_key, aws_secret_access_key = aws_secret_key, region = region)
print (conn)
volumes = conn.get_all_volumes()
print ("%s" % repr(volumes))
vol_id = str(input("Enter Volume ID to snapshot: "))
keep = int(input("Enter number of snapshots to keep: "))
volume = volumes[0]
description = str(input("Enter volume snapshot description: "))
if volume.create_snapshot(description):
    print ('Snapshot created with description: %s' % description)
snapshots = volume.snapshots()
print (snapshots)
def date_compare(snap1, snap2):
    if snap1.start_time < snap2.start_time:
        return -1
    elif snap1.start_time == snap2.start_time:
        return 0
    return 1
snapshots.sort(key=cmp_to_key(date_compare))
delta = len(snapshots) - keep
for i in range(delta):
    print ('Deleting snapshot %s' % snapshots[i].description)
    snapshots[i].delete()
What I want to do now is rather than use the number of snapshots to keep I want to change this to specifying the date range of the snapshots to keep. For example delete anything older than a specific date & time. I kind of have an idea where to start and based on the above script I have the list of snapshots sorted by date. What I would like to do is prompt the user to specify the date and time from where snapshots would be deleted eg 2015-3-4 14:00:00 anything older than this would be deleted. Hoping someone can get me started here
Thanks!!

First, you can prompt user to specify the date and time from when snapshots would be deleted.
import datetime
user_time = str(input("Enter datetime from when you want to delete, like this format 2015-3-4 14:00:00:"))
real_user_time = datetime.datetime.strptime(user_time, '%Y-%m-%d %H:%M:%S')
print(real_user_time) # as you can see here, user time has been changed from a string to a datetime object
Second, delete anything older than that
SOLUTION ONE:
for snap in snapshots:
    start_time = datetime.datetime.strptime(snap.start_time[:-5], '%Y-%m-%dT%H:%M:%S')
    # delete only snapshots that started before the user's cutoff
    if start_time < real_user_time:
        snap.delete()
SOLUTION TWO:
Since snapshots is sorted from oldest to newest, you can delete from the start of the list and stop at the first snapshot that is not older than real_user_time.
for snap in snapshots:
    # start_time is a string here, so convert it to a datetime object as above
    start_time = datetime.datetime.strptime(snap.start_time[:-5], '%Y-%m-%dT%H:%M:%S')
    if start_time >= real_user_time:
        break  # every remaining snapshot is newer
    snap.delete()
Hope it helps. :)

Be careful. Make sure to normalize the start time values (e.g., convert them to UTC). It doesn't make sense to compare the time in user local timezone with whatever timezone is used on the server. Also the local timezone may have different utc offsets at different times anyway. See Find if 24 hrs have passed between datetimes - Python.
If all dates are in UTC then you could sort the snapshots as:
from operator import attrgetter
snapshots.sort(key=attrgetter('start_time'))
If snapshots is sorted then you could "delete anything older than a specific date & time" using bisect module:
from bisect import bisect
class Seq(object):
    def __init__(self, seq):
        self.seq = seq
    def __len__(self):
        return len(self.seq)
    def __getitem__(self, i):
        return self.seq[i].start_time
del snapshots[:bisect(Seq(snapshots), given_time)]
This removes all snapshots with start_time <= given_time from the list (it does not call .delete() on them).
You could also remove older snapshots without sorting:
snapshots[:] = [s for s in snapshots if s.start_time > given_time]
If you want to call .delete() method explicitly without changing snapshots list:
for s in snapshots:
    if s.start_time <= given_time:
        s.delete()
If s.start_time is a string that uses the 2015-03-04T06:35:18.000Z format, then given_time should also be in that format (note: Z here means that the time is in UTC). If the user uses a different timezone, you have to convert the time before comparison (str -> datetime -> datetime in utc -> str). If given_time is already a string in the correct format, then you can compare the strings directly without converting them to datetime first.
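The conversion chain described above (str -> datetime -> datetime in utc -> str) could be sketched as follows. This assumes the user's zone can be described by a fixed UTC offset, which ignores daylight-saving changes; `to_snapshot_time` and the stand-in snapshot objects are made up for the example and are not part of boto:

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace

SNAPSHOT_FMT = "%Y-%m-%dT%H:%M:%S.000Z"


def to_snapshot_time(user_time, utc_offset_hours):
    """Convert 'YYYY-MM-DD HH:MM:SS' local time to the UTC string format."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(user_time, "%Y-%m-%d %H:%M:%S")
    local = local.replace(tzinfo=local_zone)
    return local.astimezone(timezone.utc).strftime(SNAPSHOT_FMT)


# Stand-ins for snapshot objects; real ones would come from the EC2 API.
snapshots = [
    SimpleNamespace(start_time="2015-03-04T06:35:18.000Z"),
    SimpleNamespace(start_time="2015-03-04T13:10:00.000Z"),
]
given_time = to_snapshot_time("2015-3-4 14:00:00", utc_offset_hours=2)
old = [s.start_time for s in snapshots if s.start_time <= given_time]
print(given_time)  # 2015-03-04T12:00:00.000Z
print(old)         # ['2015-03-04T06:35:18.000Z']
```

Comparing the strings directly works here only because this format sorts lexicographically in the same order as the times it represents.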

Regex is not validating date correctly

def chkDay(x, size, part):
    dayre = re.compile('[0-3][0-9]') # day digit 0-9
    if (dayre.match(x)):
        if (len(x) > size):
            return tkMessageBox.showerror("Warning", "This "+ part +" is invalid")
            app.destroy
        else:
            tkMessageBox.showinfo("OK", "Thanks for inserting a valid "+ part)
    else:
        tkMessageBox.showerror("Warning", part + " not entered correctly!")
        root.destroy
#when clicked
chkDay(vDay.get(),31, "Day")
#interface of tkinter
vDay = StringVar()
Entry(root, textvariable=vDay).pack()
Problem:
Not validating, I can put in a day greater than 31 and it still shows: OK
root (application) does not close when I call root.destroy

Validating date with regex is hard. You can use some patterns from: http://regexlib.com/DisplayPatterns.aspx?cattabindex=4&categoryId=5&AspxAutoDetectCookieSupport=1
or from http://answers.oreilly.com/topic/226-how-to-validate-traditional-date-formats-with-regular-expressions/
Remember that it is especially hard to check whether a year is a leap year; for example, is the date 2011-02-29 valid or not?
I think it is better to use specialized functions to parse and validate dates. You can use strptime() from the datetime module.
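A minimal sketch of that suggestion, assuming dates are entered as YYYY-MM-DD (the function name `is_valid_date` is illustrative):

```python
import datetime


def is_valid_date(text, fmt="%Y-%m-%d"):
    """Return True if text parses as a real calendar date in the given format."""
    try:
        datetime.datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


print(is_valid_date("2011-02-29"))  # False: 2011 is not a leap year
print(is_valid_date("2012-02-29"))  # True
```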

Let the standard datetime library handle your datetime data as well as parsing:
import datetime
try:
    dt = datetime.datetime.strptime(date_string, '%Y-%m-%d')
except ValueError:
    # insert error handling
    pass
else:
    # date_string is ok, it represents the date stored in dt, now use it
    pass

Values above 31 pass your check because [0-3][0-9] also matches 32 to 39, so it is not exactly what you're looking for.
You would do better to cast the value to an int and explicitly check its bounds.
Otherwise, the correct regex would be ([0-2]?\d|3[01]) to match a number from 0 up to 31.
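The int-and-bounds check could be sketched like this, assuming valid days run from 1 to 31 regardless of month (`is_valid_day` is a hypothetical name):

```python
def is_valid_day(text):
    """Return True if text is an integer between 1 and 31 inclusive."""
    try:
        day = int(text)
    except ValueError:
        return False  # not a number at all
    return 1 <= day <= 31


print(is_valid_day("31"))   # True
print(is_valid_day("32"))   # False
print(is_valid_day("abc"))  # False
```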

In order to limit the values between 1 and 31, you could use:
[1-9]|[12][0-9]|3[01]
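Note that with re.match this pattern only has to match the start of the input, so "32" would still pass because the first alternative matches the "3". A small sketch, assuming Python's re module, that uses fullmatch to require the whole string to match (`matches_day` is an illustrative name):

```python
import re

DAY_RE = re.compile(r"[1-9]|[12][0-9]|3[01]")


def matches_day(text):
    """Return True only if the entire string is a day number from 1 to 31."""
    return DAY_RE.fullmatch(text) is not None


print(DAY_RE.match("32").group())  # 3 (a prefix match, which is not what we want)
print(matches_day("32"))           # False
print(matches_day("31"))           # True
print(matches_day("7"))            # True
```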
