How can I process only files with a certain name? - python

I am writing a Python script that automatically processes and combines CSV datasets.
When I access each folder, there are two CSV datasets in it, for example
download/2019/201901/dragon.csv
download/2019/201901/kingdom.csv
and the file names are the same across all folders. In other words, each folder has two datasets with the names above.
In the 'download' folder, there are 4 folders: 2019, 2020, 2021, and 2022, and
in the folder of each year, there are folders for each month, e.g., 2019/201901, 2019/201902, and so on.
In this situation, I want to process only the 'dragon.csv' files. I wonder how I can do it. My current code is
import os
import pandas as pd
import numpy as np

path = 'download/2019'
save_path = 'download'

class Preprocess:
    def __init__(self, path, save_path):
        self.path = path
        self.save_path = save_path

and after the preprocessing is finished, I save the dataset:

def save_dataset(path, save_path):
    for dir in os.listdir(path):
        for file in os.listdir(os.path.join(path, dir)):
            if file[-3:] == 'csv':
                df = pd.read_csv(os.path.join(path, dir, file))
                print(f'Reading data from {os.path.join(path, dir, file)}')
                print('Start Preprocessing...')
                df = preprocessing(df)
                print('Finished!')
                if not os.path.exists(os.path.join(save_path, dir)):
                    os.makedirs(os.path.join(save_path, dir))
                df.to_csv(os.path.join(save_path, dir, file), index=False)

save_dataset(path, save_path)

If I understand your question, you only want to process files whose name includes the substring "dragon". You can do this by extending your if-clause: instead of writing if file[-3:] == 'csv', write if file[-3:] == 'csv' and 'dragon' in file.
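For illustration, a minimal sketch of that loop with the extra check (same directory layout as in the question, preprocessing omitted):
import os
import pandas as pd

path = 'download/2019'

for dir_ in os.listdir(path):
    for file in os.listdir(os.path.join(path, dir_)):
        # only CSV files whose name contains 'dragon'
        if file.endswith('.csv') and 'dragon' in file:
            df = pd.read_csv(os.path.join(path, dir_, file))
            # ... preprocess and save df as in your existing code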

You can use pathlib's glob method:
from pathlib import Path
p = Path()  # the current folder; if it does not contain `download`, point this to the folder that does
dragon_paths = p.glob("download/**/dragon.csv")
dragon_paths is a generator that yields the paths of all dragon.csv files under the download folder.
PS. dir shadows a built-in function, so you should avoid reusing it; call your variable dir_ or d instead.
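A quick usage sketch that consumes the generator and reads each file with pandas (an illustrative follow-up, not part of the original answer):
import pandas as pd
from pathlib import Path

p = Path()
for csv_path in p.glob("download/**/dragon.csv"):
    print(f"Reading {csv_path}")
    df = pd.read_csv(csv_path)
    # ... preprocess and save df as in your existing code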

Related

Read all files in project folder with variable path - Python

Looking for thoughts on how to read all csv files inside a folder in a project.
As an example, the following code is part of my present working code, where 'ProjectFolder' is on the Desktop and I am hardcoding the path. Inside the project folder I have 'csvFolder', where I keep all my csv files.
However, if I move 'ProjectFolder' to a different hard drive or another location, my path fails and I have to provide a new one. Is there a smart way to not have to worry about the location of the project folder?
import glob
import pandas as pd

path = r'C:\Users\XXX\Desktop\ProjectFolder\csvFolder'  # use your path
all_files = glob.glob(path + "/*.csv")
df_mm = pd.concat((pd.read_csv(f, usecols=["[mm]"]) for f in all_files),
                  axis=1, ignore_index=True)
There are relative and absolute path concepts; just search for "absolute vs relative path". In your case, if your Python file is in ProjectFolder, you can simply try this:
from os import listdir
from os.path import dirname, realpath, join

def main():
    # This is your project directory
    current_directory_path = dirname(realpath(__file__))
    # This is your csv directory
    csv_files_directory_path = join(current_directory_path, "csvFolder")
    for each_file_name in listdir(csv_files_directory_path):
        if each_file_name.endswith(".csv"):
            each_csv_file_full_path = join(csv_files_directory_path, each_file_name)
            # Do whatever you want with each_csv_file_full_path

if __name__ == '__main__':
    main()
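A roughly equivalent sketch with pathlib, assuming the same csvFolder sits next to the script and you want the concatenation from the question:
from pathlib import Path
import pandas as pd

# directory containing this script, regardless of where it is run from
project_dir = Path(__file__).resolve().parent
csv_dir = project_dir / "csvFolder"

all_files = sorted(csv_dir.glob("*.csv"))
df_mm = pd.concat((pd.read_csv(f, usecols=["[mm]"]) for f in all_files),
                  axis=1, ignore_index=True)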

shutil.move() only works with existing folder?

I would like to use the shutil.move() function to move some files that match a certain pattern into a folder newly created inside the Python script, but it seems that this function only works with existing folders.
For example, I have 'a.txt', 'b.txt', 'c.txt' in folder '/test', and I would like to create a folder '/test/b' in my Python script using os.path.join() and move all .txt files to folder '/test/b'.
import os
import shutil
import glob

files = glob.glob('./*.txt')  # assume that we are in '/test'
for f in files:
    shutil.move(f, './b')  # assume that './b' already exists

The above code works as expected, but the following does not:

import os
import shutil
import glob

new_dir = 'b'
parent_dir = './'
path = os.path.join(parent_dir, new_dir)
files = glob.glob('./*.txt')
for f in files:
    shutil.move(f, path)

After that, I got only 'b' in '/test', and 'cd b' gives:
[Errno 20] Not a directory: 'b'
Any suggestion is appreciated!
The problem is that when you build the destination path:
path = os.path.join(parent_dir, new_dir)
the path doesn't exist yet. So shutil.move still works, but not as you expect; like a standard mv command, it renames each file to 'b' in the parent directory, overwriting the previous one each time and leaving only the last file (very dangerous, because of the risk of data loss).
Create the directory first if it doesn't exist:
path = os.path.join(parent_dir, new_dir)
if not os.path.exists(path):
    os.mkdir(path)
Now shutil.move will move the files into b, because b is a directory.
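Putting it together, a minimal sketch of the fixed script; os.makedirs with exist_ok=True avoids the explicit existence check:
import os
import shutil
import glob

new_dir = 'b'
parent_dir = './'
path = os.path.join(parent_dir, new_dir)

# create the destination directory if it is not already there
os.makedirs(path, exist_ok=True)

for f in glob.glob('./*.txt'):
    shutil.move(f, path)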

Python: Move only files whose filename matches the current date to another dir

I have this situation:
On my HD I have this kind of directory structure:
root>
root>dir1>05-13-2018_xxxxxxxx.TXT
root>dir1>05-14-2018_xxxxxxxx.TXT  <-- today's file, to copy to another dir
root>dir2>05-13-2018_xxxxxxxx.TXT
root>dir2>05-14-2018_xxxxxxxx.TXT  <-- today's file, to copy to another dir
root>dir3>etc...
I have built a Python variable that reflects today's date, like this: data_oggi = str(datetime.datetime.today().strftime('%m-%d-%Y'))
I need to scan all subdirs and move the files whose filename matches today's date to another dir...
I'm at an impasse, can anybody help me?
Thanks for any support.
The best way would be to use the glob module to list all matching files under your source directory and then move them to the destination with shutil.
I hope the following will be helpful:
import os
import shutil
import datetime
import glob

# directories
BASE = "/root/"
DESTINATION = '/root/test2'
os.path.exists(DESTINATION) or os.makedirs(DESTINATION)

DIR = 'dir1'

def get_file_key():
    """
    Builds the file identifier (today's date in the filename format)
    """
    return str(datetime.datetime.today().strftime('%m-%d-%Y'))

def move_file(file_path):
    try:
        shutil.move(file_path, DESTINATION)
    except shutil.Error:
        print('file exists....')

if __name__ == '__main__':
    file_for_today = get_file_key()
    files = glob.glob(BASE + '/**/{key}*.txt'.format(key=file_for_today), recursive=True)
    for file in files:
        move_file(file)
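If you prefer pathlib, a roughly equivalent sketch under the same BASE/DESTINATION assumptions (here matching the uppercase .TXT extension shown in the question):
import datetime
import shutil
from pathlib import Path

base = Path("/root/")
destination = Path("/root/test2")
destination.mkdir(parents=True, exist_ok=True)

today = datetime.datetime.today().strftime('%m-%d-%Y')

# rglob scans all subdirectories for files starting with today's date
for txt_file in base.rglob(today + "*.TXT"):
    shutil.move(str(txt_file), str(destination))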

Keeping renamed text files in original folder

This is my current code (from a Jupyter notebook) for renaming some text files.
The issue is that when I run the code, the renamed files end up in my current working Jupyter folder. I would like the files to stay in the original folder.
import glob
import os

path = r'C:\data_research\text_test\*.txt'
files = glob.glob(r'C:\data_research\text_test\*.txt')
for file in files:
    os.rename(file, file[-27:])
You should only change the name and keep the path the same. Your filenames will not always be exactly 27 characters long, so hard-coding that into your code is not ideal. What you want is something that separates the name from the path, no matter what either of them is. Something like:
import os
import glob

path = r'C:\data_research\text_test\*.txt'
files = glob.glob(r'C:\data_research\text_test\*.txt')
for file in files:
    old_name = os.path.basename(file)  # now this is just the name of your file
    # now you can do something with the name... here I'll just add new_ to it
    new_name = 'new_' + old_name  # or do something else with it
    new_file = os.path.join(os.path.dirname(file), new_name)  # put the path and the name together again
    os.rename(file, new_file)  # and now we rename
If you are using Windows, you might want to use the ntpath module instead.
file[-27:] takes the last 27 characters of the path returned by glob, so unless all of your paths are exactly 27 characters long it will not do what you want. Even when it appears to succeed, you've stripped off the target directory, so the file is moved to your current directory. os.path has utilities to manage file names and you should use them:
import glob
import os

path = r'C:\data_research\text_test\*.txt'
files = glob.glob(r'C:\data_research\text_test\*.txt')
for file in files:
    dirname, basename = os.path.split(file)
    # I don't know how you want to rename, so I made something up
    newname = basename + '.bak'
    os.rename(file, os.path.join(dirname, newname))
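A pathlib-based sketch of the same idea, assuming the same folder and an illustrative 'new_' prefix for the new names:
from pathlib import Path

for txt_file in Path(r'C:\data_research\text_test').glob('*.txt'):
    # with_name keeps the file in its original folder and only changes the name
    txt_file.rename(txt_file.with_name('new_' + txt_file.name))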

Python: Loop through directories, check if a certain number of files is in there; if not, copy 2 files from another directory plus one file based on name

I'm still in the process of learning Python.
I'm trying to make a script that does the following:
Loop through directories based on today's date (so if I run it tomorrow, it'll look for the folders with tomorrow's date on them).
Check if there are .pdf files in them.
If there aren't any .pdf files in them: copy 2 standard ones from another directory + copy one based on the name of the Excel file. (So let's say the Excel file is called Excelfile45, then it should copy the PDF file called "45".) EDIT: It can also be based on the folder name if that is an easier way of doing things.
This is how far I got:
import os, fnmatch

def findDir(path, filter):
    for root, dirs, files in os.walk(path):
        for file in fnmatch.filter(files, filter):
            yield os.path.join(root, file)

for pdfFile in findDir(r'C:\new', '*.pdf'):
    print(pdfFile)
It runs through the directories and looks for PDFs in them, but now I've got no clue on how to continue.
Any help is greatly appreciated!
Also, my apologies for any grammar / spelling errors.
Your specs are pretty vague, so I had to assume a lot of things. I think this code achieves what you want, but you may have to tweak it a bit (for example, the date format in the directory names).
I assumed a directory structure like this:
c:\new  (base dir)
    daily_2014_12_14
    daily_2014_12_15
    ...
    standard
And the code:
import os
import fnmatch
import time
import shutil
import re

# directories
base_dir = r"C:\new"
standard_dir = os.path.join(base_dir, "standard")
DIR = 'dir1'

# find files in directory. based on yours, but modified to return a list.
def find_dir(path, name_filter):
    result = []
    for root, dirs, files in os.walk(path):
        for filename in fnmatch.filter(files, name_filter):
            result.append(os.path.join(root, filename))
    return result

# getting today's directory. you can rearrange year-month-day as you want.
def todays_dir():
    date_str = time.strftime("%Y_%m_%d")
    return os.path.join(base_dir, "daily_" + date_str)

# copy a file from one directory to another
def copy(filename, from_dir, to_dir):
    from_file = os.path.join(from_dir, filename)
    to_file = os.path.join(to_dir, filename)
    shutil.copyfile(from_file, to_file)

# main logic
today_dir = todays_dir()
pdfs = find_dir(today_dir, "*.pdf")
excels = find_dir(today_dir, "*.xls")

if len(pdfs) == 0:
    copy("standard1.pdf", standard_dir, today_dir)
    copy("standard2.pdf", standard_dir, today_dir)
    if len(excels) == 1:
        excel_name = os.path.splitext(excels[0])[0]
        excel_num = re.findall(r"\d+", excel_name)[-1]
        copy(excel_num + ".pdf", standard_dir, today_dir)
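For clarity, the filename-number extraction behaves like this (a quick illustrative check, assuming an Excel file named Excelfile45.xls inside a dated folder):
import os
import re

excel_path = r"C:\new\daily_2014_12_14\Excelfile45.xls"
excel_name = os.path.splitext(excel_path)[0]    # strips the .xls extension
excel_num = re.findall(r"\d+", excel_name)[-1]  # last run of digits, i.e. '45', not the date parts
print(excel_num + ".pdf")                       # 45.pdf, the file to copy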
Also: I agree with Iplodman's comment. Show us a bit more effort next time.
