Compare folders with DateTime Stamps - Python

I have a directory (say, Main folder) that contains two sub-directories. Both have a date-time stamp in their names, for example folder07242020_15_21PM and folder07242020_15_26PM. The stamp in each name represents the date and time when the folder was created.
Can someone help me write Python code that goes to the Main folder, reads the sub-directory names, and then prints something like
"folder07242020_15_26PM was created after folder07242020_15_21PM".
Thanks.
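
One possible approach, sketched under assumptions: the stamp in each name is taken to be month, day, year, then a 24-hour hour and minute (MMDDYYYY_HH_MM), with the trailing AM/PM ignored because the hour is already unambiguous. The names parse_stamp and describe_order are hypothetical helpers introduced for illustration.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed layout: <prefix>MMDDYYYY_HH_MM followed by an optional AM/PM suffix.
STAMP = re.compile(r"(\d{8}_\d{2}_\d{2})(?:AM|PM)?$")


def parse_stamp(name):
    """Return the datetime encoded at the end of a folder name (hypothetical helper)."""
    match = STAMP.search(name)
    if match is None:
        raise ValueError(f"no date-time stamp found in {name!r}")
    return datetime.strptime(match.group(1), "%m%d%Y_%H_%M")


def describe_order(main_folder):
    """Compare the sub-directories of main_folder by the stamp in their names."""
    names = sorted(
        (p.name for p in Path(main_folder).iterdir() if p.is_dir()),
        key=parse_stamp,
    )
    # Pair each folder with the next newer one.
    return [
        f"{later} was created after {earlier}"
        for earlier, later in zip(names, names[1:])
    ]
```

Calling describe_order on the Main folder returns one sentence per adjacent pair of folders, which can then be printed. Note that this compares the stamps in the names, not the filesystem creation times.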

Related

Python script to move files based on date and partition

I am looking to create a Python script that takes a date and moves a file from one S3 folder to another. While moving, it should use the file's created date to create a folder in the target, e.g. stage/2023/01/12, and copy the file into this new folder.
Thanks
Param
I have used boto3 but I am not sure how to achieve that.

The modification or creation date of a local file is available as a POSIX timestamp, i.e. seconds from the Unix epoch (January 1, 1970), for example from os.path.getmtime() or os.path.getctime().
You'll likely want to make the POSIX timestamp easier to work with by using the Python datetime module; begin by converting it to a datetime object with datetime.fromtimestamp(your_posix_timestamp_here).
To programmatically create folders for year, month, and day, and move the file to that folder: first pull the year, month, and day out of the datetime object, then do something like this:
#!/usr/bin/env python3
import shutil
from pathlib import Path
Path('2023/01/12/').mkdir(parents=True, exist_ok=True) # make nested folders for year, month, day
shutil.move("path/to/current/file.foo", "2023/01/12/file.foo") # move the file
Hope that helps!
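
For the S3 case in the question, "folders" are only key prefixes, so the same idea reduces to building a destination key from the date. The following is a minimal sketch of that step only; dated_key is a hypothetical helper, and the "stage" prefix is taken from the question's example. The actual move would still need to be done with boto3, for instance by copying the object to the new key and deleting the original, which is not shown here.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def dated_key(source_key, when, prefix="stage"):
    """Build a destination key like stage/2023/01/12/<filename> (hypothetical helper)."""
    filename = PurePosixPath(source_key).name  # drop the source "folder" part
    return f"{prefix}/{when:%Y/%m/%d}/{filename}"


# A POSIX timestamp can be converted first, as described above.
when = datetime.fromtimestamp(1673481600, tz=timezone.utc)
target = dated_key("incoming/file.foo", when)
```

Which date to use is an assumption left to the caller: S3 object listings report a last-modified time rather than a separate creation time.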

How to run my Python code for every Excel file contained in a folder?

I have a folder named with a certain acronym, and inside this folder there are a number of Excel files.
The folder's name indicates the name of the apartment (e.g. UDC06_45) and, inside this folder, each Excel file's name is composed of
the name of the apartment followed by the name of the appliance located in that apartment (e.g. UDC06_45_Oven).
These Excel files are very simple DataFrames containing energy consumption measurements: one column named "timestamps" and one column named "Energy" (all of these measurements have a 15 min frequency). All of the Excel files inside the folder have the same structure.
My Python code takes only one of these Excel files as input at a time and performs a few operations on it (resampling, time interpolation, etc.), starting with the command "pd.read_excel()", and creates an output Excel file with "df.to_excel()" after giving it a name.
What I want to do is to apply my code automatically to all of the files in that folder.
The code should take as input only the name of the folder ("UDC06_45") and create as many output files as needed.
So if the folder contains only two appliances:
"UDC06_45_Oven"
"UDC06_45_Fridge"
the code will process them both, one after the other, and I should obtain two distinct Excel files as output. Each name is just the input file's name followed by "_output":
"UDC06_45_Oven_output"
"UDC06_45_Fridge_output".
In general, this must be done for every Excel file contained in that folder. If the folder contains 5 appliances, meaning 5 input Excel files, I should obtain 5 output Excel files... and so on.
How can I do it?

In the following code, you only need to assign your path; in my case I used a test folder, path=r'D:\test'. The code automatically creates a new output folder in the same path.
import os
from glob import glob

import pandas as pd

path = r'D:\test'  # add whatever your path is in place of 'D:\test'
input_folder = 'UDC06_45'  # name of input folder
output_folder = input_folder + '_out'
new_path = os.path.join(path, output_folder)
os.makedirs(new_path, exist_ok=True)  # create the output folder if it is missing

files = glob(os.path.join(path, input_folder, '*.xlsx'))
for file in files:
    # file name without folder or extension, e.g. 'UDC06_45_Oven'
    name = os.path.splitext(os.path.basename(file))[0]
    df = pd.read_excel(file)
    # do all your operations here
    df.to_excel(os.path.join(new_path, name + '_output.xlsx'))

How do I copy files by date created?

I am trying to copy files from one folder to another. Sometimes the folder has 5 gigs worth of files, but I only need two months worth of files. How do I tell python to copy files from a date range of today to 2 months ago?
example: copy files created on 2.4.2022 - 4.4.2022.
would I do:
import shutil
import datetime
for file in range(2.4.2022, 4.4.2022):
    shutil.copy('C:\\folder', 'C:\\folder2')
I need Python to automatically use today's date, so that whenever the code is run, the date range goes from the day the code is run back to two months earlier.
Thank you for your help!
I am not good with python yet. I was able to use shutil.copytree for one folder. That worked because I need all the files in that particular folder, as for the second folder I don't need all the files.

I would recommend a couple of things.
First, you can compare dates as long as they are in the right format. For example, convert a date string such as 2.4.2022 into datetime(2022, 4, 2); then in your program you can compare them like this:
if datetime(2022, 4, 2) > datetime(2020, 1, 1):
    print("This folder needs to be copied")
    ...your copy statements
So, if this is a one-time activity, you can just convert those names to datetime(), compare them in a for loop against the initial date (or dates) that you need, and then run the copy.
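
For the "today back to two months ago" case in the question, the comparison can be made against each file's timestamp instead of its name. This is a minimal sketch, assuming that "two months" can be approximated as 60 days and that the modification time is an acceptable stand-in for the creation date; copy_recent_files is a hypothetical helper.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path


def copy_recent_files(src, dst, days=60):
    """Copy files from src to dst if they were modified within the last `days` days."""
    cutoff = datetime.now() - timedelta(days=days)  # start of the date range
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in Path(src).iterdir():
        if not path.is_file():
            continue  # skip sub-directories
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if modified >= cutoff:
            shutil.copy2(path, dst / path.name)  # copy2 keeps the timestamps
            copied.append(path.name)
    return sorted(copied)
```

With the paths from the question, this would be called as copy_recent_files('C:\\folder', 'C:\\folder2'). Because the cutoff is computed from datetime.now(), the range moves with the day the script is run.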

Split 2 csv files into smaller sets of files based on unique values in python

Sorry if this question has been asked before, I just couldn't find a simple example.
I have 2 large CSV files that I would like to split based on the unique values in the Location and LocationType columns. I would like to store the split CSV files in sub-directories, one for each value, in a folder named item/{item_name}, where item_name is the unique value in Location and LocationType.
Location.csv
Location-type.csv
Each split csv file should have the same header line as the parent file
If the sub-directory already exists, delete those files before writing the new files.
End result would be a directory called item with two sub-directories called fm5 & fm15 with our split CSV files stored. location.csv & location_type.csv
Thank you in advance

would like to know the workflow for this type of project
open the file
sort the contents on the desired column
group by the desired column
write each group to a new file
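
The workflow above could be sketched with the standard csv module roughly as follows. This is an illustration under assumptions: split_csv is a hypothetical helper, rows are grouped in a dictionary rather than sorted first, and the whole file is held in memory, which may not suit very large inputs.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_csv(source, column, out_root="item"):
    """Write one CSV per unique value of `column` to out_root/<value>/<source name>."""
    source = Path(source)
    groups = defaultdict(list)
    with source.open(newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames  # reused so every split file has the same header
        for row in reader:
            groups[row[column]].append(row)

    written = []
    for value, rows in groups.items():
        target = Path(out_root) / value / source.name
        target.parent.mkdir(parents=True, exist_ok=True)
        # Mode "w" replaces any split file left over from an earlier run.
        with target.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=header)
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written
```

Running it once per input file, each with its own column name, fills the same item/<value>/ sub-directories with one split file per source.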

Combine several excel files from multiple folders and subfolders into one pandas dataframe

My main folder is called "Data". Inside, I have 20 folders labelled from 1 to 20. In each of these 20 subfolders I have another 1 to 5 subfolders, and one of them is called "test_results" (the one I am interested in). Inside that test_results folder I have several files of different types: .jpeg, .csv, .xlsx. I need to work with the .xlsx files. How do I retrieve ONLY the .xlsx files that fall inside the parent folder "Data" and concatenate them into one data frame so that I can do my analyses?
I know how to do so when all the files are located in a single folder but the fact that they are in subfolders and mixed with other types of files adds complexity to it and I am unable to figure it out.

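Use the pathlib module.
Demo:
from pathlib import Path

import pandas as pd

p = Path(r'/path/to/Data')
# '**' matches any depth of subfolders; only .xlsx files inside test_results are read
df = pd.concat([pd.read_excel(f) for f in p.glob('**/test_results/*.xlsx')],
               ignore_index=True)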
