How do I copy files by date created? - python

I am trying to copy files from one folder to another. Sometimes the folder has 5 gigs worth of files, but I only need two months worth of files. How do I tell python to copy files from a date range of today to 2 months ago?
example: copy files created on 2.4.2022 - 4.4.2022.
would I do:
import shutil
import datetime
for file in range(2.4.2022, 4.4.2022):
shutil.copy('C:\\folder', 'C:\\folder2')
I need python to automatically use today's date. So when the code is run Python will use the date range of, the date that the code is run to two months ago.
Thank you for your help!
I am not good with python yet. I was able to use shutil.copytree for one folder. That worked because I need all the files in that particular folder, as for the second folder I don't need all the files.

I would recommend a couple of things.
First, you can compare dates as long as they have the right format, for example, you need to split your folder names from 2.4.2022, to datetime(2022,4,2), then in your program you can compare them like.
if datetime(2022,4,2) > datetime(2020,1,1):
print ("This folder needs to be copied")
...your copy statements
So, if this is a one time activity, you can just convert those folder names to datetime(), then compare them in a for loop against the initial date that you need (or dates), then run the copy.

Related

Python script to move files based on date and partition

I am looking to create a python script which can take a date to move file from one s3 folder to another s3 folder. Now while moving it uses created date to create folder in target i.e. stage/2023/01/12 and copy the file to this new folder.
Thanks
Param
I have used boto3 but not sure how to achieve that
To get the modification or creation date of each file, look here. It explains how to get the modification or creation date (returned as POSIX timestamp, i.e. seconds from the Unix epoch, January 1 1970).
You'll likely want to make the POSIX timestamp easier to work with by using the python datetime module; you'll begin by converting to a datetime object with date.fromtimestamp(your_posix_timestamp_here)
To programmatically create folders for year, month, and day, and copy the file to that folder: First pull the year, month, and day out of the datetime object, then do something like this:
#!/usr/bin/env python3
import os
from pathlib import Path
Path('2023/01/12/').mkdir(parents=True, exist_ok=True) # make nested folders for year, month, day
shutil.move("path/to/current/file.foo", "2023/01/12/file.foo") # move the file
Hope that helps!

Move file using specific nav path to another folder

I have a list of navigation paths to specific files, which all come from different folders. I'd like to move them all to a new folder.
Specifically, my data is formatted in two columns in a dataframe, where I'd like to move each file to its new folder. (Each row describes a file.)
My input:
df = pd.DataFrame({'old_path': ['/Users/myname/images/cat/file_0.jpg', '/Users/myname/images/dog/file_1.jpg', '/Users/myname/images/squirrel/file_2.jpg'],
'new_path': ['/Users/myname/mydir/file_0.jpg', '/Users/myname/mydir/file_1.jpg', '/Users/myname/mydir/file_2.jpg'],
})
However, I have yet to figure out how to modify the code below to do this in a loop or anything similarly helpful: How to move a file in Python?
import os
import shutil
shutil.move("path/to/current/file.foo", "path/to/new/destination/for/file.foo")
In this example, all the files are being moved to the same folder. But, it'd be great if the answered code were generalizable to send files to different folders, too.
Use zip with a for loop:
for old, new in zip(df['old_path'], df['new_path']):
shutil.move(old, new)

Automate appending yesterday's date to end of file names using Python?

I'm trying to solve a problem at work, I am not a developer but work in general IT operations and am trying to learn a bit here and there, so I may be way off right now with what I'm trying to do here. I've just been utilizing online resources, here and a little bit from the book Automate the Boring Stuff with Python. Here is my objective:
I have two files that are automatically placed in a folder on my computer every morning at the same time using a post processor, and I need to add yesterday's date to the end of the file names before I upload them to an FTP server which I have do each morning around the same time. I am trying to write a Python script that I can somehow schedule to run each morning right after the files are placed in the folder, which will append yesterday's date in MMDDYYYY format. For example, if the files are called "holdings.CSV" and "transactions.CSV" when they are placed in the folder, I need to rename them to "holdings01112022.CSV" and "transactions01112022.CSV". I only want to rename the new files in the folder, the files from previous days with the dates already appended will remain in the folder. Again, I'm a total beginner, so my code may not make sense and there may be superfluous or redundant lines, I'd love corrections... Am I going down the right path here, am I off altogether? Any suggestions?
import os, re
from datetime import date
from datetime import timedelta
directory = 'C:\Users\me\main folder\subfolder'
filePattern = re.compile('%s.CSV', re.VERBOSE)
for originalName in os.listdir('.'):
mo = filePattern.search(originalName)
if mo == None:
continue
today = date.today()
yesterday = today - timedelta(days = 1), '%M%D%Y'
for originalName in directory:
newName = originalName + yesterday
os.rename(os.path.join(directory, originalName), os.path.join(directory, newName))
Any help is appreciated. Thanks.
Here's a short example on how to code your algorithm.
import pathlib
from datetime import date, timedelta
if __name__ == '__main__':
directory = pathlib.Path('/Users/cesarv/Downloads/tmp')
yesterday = date.today() - timedelta(days=1)
for file in directory.glob('*[!0123456789].csv'):
new_path = file.with_stem(file.stem + yesterday.strftime('%m-%d-%Y'))
if not new_path.exists():
print(f'Renaming {file.name} to {new_path.name}')
file.rename(new_path)
else:
print(f'File {new_path.name} already exists.')
By using pathlib, you'll simplify the handling of filenames and paths and, if you run this program on Linux or macOS, it will work just the same.
Note that:
We're limiting the list of files to process with glob(), where the pattern *[!0123456789].csv means "all filenames that do not end with a digit before the suffix (the [!0123456789] represents one character, before the suffix, that should not - ! - equal any of the characters in the brackets)." This allows you to process only files that do not contain a date in their names.
The elements given by the for cycle, referenced with the file variable, are objects of class Path, which give us methods and properties we can work with, such as with_stem() and stem.
We create a new Path object, new_path, which will have its stem (the file's name part without the suffix .csv) renamed to the original file's name (file.name) plus yesterday in the format you require by using the method with_stem().
Since this new Path object contains the same path of the original file, we can use it to rename the original file.
As you can see, we can also check that a file with the new name does not exist before any renaming by using the exists() method.
You can also review the documentation on pathlib.
If you have any doubts, ask away!
Are you running to any issue? If all you want is to get the csv file you can simplify your code by using one if statement instead of using regex. You also need to convert your date to string before adding it to your file name. (i.e: newName = originalName[:-3] +str(yesterday)).
If this code didnt solve your problem please mention the error that you are receiving to make it easier for others to help you.
Here is my suggestion to eliminate possible errors without knowing what type of error you are getting.
import os, re
from datetime import date
from datetime import timedelta
directory = 'C:/Users/me/main folder/subfolder'
for originalName in os.listdir(directory):
print(f'original name is = {originalName}')
if originalName[-3:].lower() =='csv':
today = date.today()
yesterday = today - timedelta(days = 1)
yesterday_file_name = originalName[:-4]+yesterday.strftime("%m%d%Y")+'.csv'
os.rename(os.path.join(directory, originalName), os.path.join(directory, yesterday_file_name))
#print('formatted file names are')
#print(yesterday_file_name)
#here is the sample output
#original name is = holdings.csv
#formatted file names are
#holdings01112022.csv
#original name is = transactions.CSV
#formatted file names are
#transactions01112022.csv

Compare folders with DateTime Stamps - Python

I have a directory (Say Main folder) which contains two sub-directories. The two sub-directories have date-time stamp in their names: Like folder07242020_15_21PM and folder07242020_15_26PM. The Date and Time stamp in their names represent the date-time when they were created.
Can someone help me write a python code which will go to the Main folder, read the sub-directory names and then print something like
"folder07242020_15_26PM was created after folder07242020_15_21PM".
Thanks.

Most efficient/fastest way to get a single file from a directory

What is the most efficient and fastest way to get a single file from a directory using Python?
More details on my specific problem:
I have a directory containing a lot of pregenerated files, and I just want to pick a random one. Since I know that there's no really efficient way of picking a random file from a directory other than listing all the files first, my files are generated with an already random name, thus they are already randomly sorted, and I just need to pick the first file from the folder.
So my question is: how can I pick the first file from my folder, without having to load the whole list of files from the directory (nor having the OS to do that, my optimal goal would be to force the OS to just return me a single file and then stop!).
Note: I have a lot of files in my directory, hence why I would like to avoid listing all the files to just pick one.
Note2: each file is only picked once, then deleted to ensure that only new files are picked the next time (thus ensuring some kind of randomness).
SOLUTION
I finally chose to use an index file that will store:
the index of the current file to be picked (eg: 1 for file1.ext, 2 for file2.ext, etc..)
the index of the last file generated (eg: 1999 for file1999.ext)
Of course, this means that my files are not generated with a random name anymore, but using a deterministic incrementable pattern (eg: "file%s.ext" % ID)
Thus I have a near constant time for my two main operations:
Accessing the next file in the folder
Counting the number of files that are left (so that I can generate new files in a background thread when needed).
This is a specific solution for my problem, for more generic solutions, please read the accepted answer.
Also you might be interested into these two other solutions I've found to optimize the access of files and directory walking using Python:
os.walk optimized
Python FAM (File Alteration Monitor)
Don't have a lot of pregenerated files in 1 directory. Divide them over subdirectories if more than 'n' files in the directory.
when creating the files add the name of the newest file to a list stored in a text file. When you want to read/process/delete a file:
Open the text file
Set filename to the name on the top of the list.
Delete the name from the top of the list
Close the text file
Process filename.
Just use random.choice() on the os.listdir() result:
import random
import os
randomfilename = random.choice(os.listdir(path_to_directory))
os.listdir() returns results in the ordering given by the OS. Using random filenames does not change that ordering, only adding items to or removing items from the directory can influence that ordering.
If your fear that you'll have too many files, do not use a single directory. Instead, set up a tree of directories with pre-generated names, pick one of those at random, then pick a file from there.

Categories

Resources