I’d like to write a function to iterate over Excel files that are in different folders. Parts of the path of each file are the same, for instance:
C:\Main\Division\Reports\Year\Data.xls
The only part of each path that changes is ‘Year’. The files all have the same name.
Is there a way to do this with a placeholder for Year? If not, what approach should I take?
You can use the os.listdir function:
import os
directory = r"C:\Main\Division\Reports"
for data in os.listdir(directory):
    # Each entry is a Year folder, so build Reports\<Year>\Data.xls from it
    file_name = os.path.join(directory, data, 'Data.xls')
    # do something
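To answer the placeholder part directly: glob's * wildcard can stand in for the Year folder. The sketch below is one way to do it, assuming every Year folder sits directly under Reports and holds a file with the same name. find_yearly_files is a hypothetical helper name, and the demo builds a throwaway tree instead of touching C:\Main.

```python
import glob
import os
import tempfile


def find_yearly_files(reports_dir, file_name="Data.xls"):
    """Return (year, path) pairs for every reports_dir/<Year>/<file_name>.

    The "*" in the pattern acts as the placeholder for the Year folder;
    glob.escape protects reports_dir in case it contains wildcard characters.
    """
    pattern = os.path.join(glob.escape(reports_dir), "*", file_name)
    results = []
    for path in sorted(glob.glob(pattern)):
        # The Year is the name of the folder that directly contains the file.
        year = os.path.basename(os.path.dirname(path))
        results.append((year, path))
    return results


# Demo on a temporary tree; in practice pass r"C:\Main\Division\Reports".
with tempfile.TemporaryDirectory() as reports:
    for year in ("2019", "2020"):
        os.makedirs(os.path.join(reports, year))
        open(os.path.join(reports, year, "Data.xls"), "w").close()
    os.makedirs(os.path.join(reports, "2021"))  # a Year folder without Data.xls
    found = find_yearly_files(reports)
    years = [year for year, _ in found]
    print(years)  # ['2019', '2020']
```

If you already know which years to process, building each path with os.path.join(reports_dir, str(year), "Data.xls") inside a loop works just as well; the glob version has the advantage of picking up new Year folders automatically.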
You could try os.walk:
import os
parent = r"C:\Main\Division\Reports"
for root, dirs, files in os.walk(parent):
    print(root)
    print(dirs)
    print(files)
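Unlike os.listdir, os.walk also descends into nested folders, which helps if some Year folders contain subfolders of their own. A minimal Python 3 sketch that yields every Data.xls at any depth, assuming the file name is identical everywhere (walk_for_file is a hypothetical name):

```python
import os
import tempfile


def walk_for_file(parent, file_name="Data.xls"):
    """Yield the full path of every file called file_name under parent, at any depth."""
    for root, dirs, files in os.walk(parent):
        if file_name in files:
            # root is the directory currently being visited, so join it with the name.
            yield os.path.join(root, file_name)


# Demo on a temporary tree with one nested Year folder.
with tempfile.TemporaryDirectory() as parent:
    os.makedirs(os.path.join(parent, "2019"))
    os.makedirs(os.path.join(parent, "2020", "archive"))
    open(os.path.join(parent, "2019", "Data.xls"), "w").close()
    open(os.path.join(parent, "2020", "archive", "Data.xls"), "w").close()
    hits = sorted(os.path.relpath(p, parent) for p in walk_for_file(parent))
    print(hits)
```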
So my program search_file.py is trying to look for .log files in the directory it is currently placed in. I used the following code to do so:
import os
# Get the directory that contains this script
dir_path = os.path.dirname(os.path.realpath(__file__))
# for loop is meant to scan through the current directory the program is in
for root, dirs, files in os.walk(dir_path):
    for file in files:
        # Check if file ends with .log, if so print file name
        if file.endswith('.log'):
            print(file)
My current directory is as follows:
search_file.py
sample_1.log
sample_2.log
extra_file (this is a folder)
And within the extra_file folder we have:
extra_sample_1.log
extra_sample_2.log
Now, when the program runs and prints the files out, it also picks up the .log files in the extra_file folder. But I do not want this. I only want it to print out sample_1.log and sample_2.log. How would I approach this?
Try this:
import os
files = os.listdir()
for file in files:
    if file.endswith('.log'):
        print(file)
The problem in your code is that os.walk traverses the whole directory tree, not just the top directory. os.listdir returns the names of the entries (files and folders) in a single directory. With no argument it uses the current working directory, which matches the script's folder only if you run the script from there; to be safe, pass dir_path explicitly.
os.walk documentation
os.listdir documentation
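Putting both points together, here is a sketch that lists only regular .log files directly inside a given folder. It assumes you pass the script's folder explicitly (list_log_files is a hypothetical helper), and the demo recreates the layout from the question in a temporary directory:

```python
import os
import tempfile


def list_log_files(directory):
    """Return the names of regular files ending in .log directly inside directory.

    os.listdir does not recurse, and the isfile check skips any subfolder
    whose name happens to end in .log.
    """
    return sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(".log") and os.path.isfile(os.path.join(directory, name))
    )


# In search_file.py you would pass the script's own folder:
#   dir_path = os.path.dirname(os.path.realpath(__file__))
#   print(list_log_files(dir_path))
with tempfile.TemporaryDirectory() as demo:
    for name in ("sample_1.log", "sample_2.log", "search_file.py"):
        open(os.path.join(demo, name), "w").close()
    os.makedirs(os.path.join(demo, "extra_file"))
    open(os.path.join(demo, "extra_file", "extra_sample_1.log"), "w").close()
    logs = list_log_files(demo)
    print(logs)  # ['sample_1.log', 'sample_2.log']
```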
By default, os.walk does a top-down traversal of the tree (topdown=True), so the first tuple it yields describes the top directory itself. So, just ask for the first one with next(). And since you don't really care about root or dirs, use _ as the "don't care" variable name:
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
# get root files list.
_, _, files = next(os.walk(dir_path))
for file in files:
    # Check if file ends with .log, if so print file name
    if file.endswith('.log'):
        print(file)
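One edge case worth knowing: os.walk ignores errors by default, so if dir_path does not exist it yields nothing and a bare next() raises StopIteration. A small sketch that supplies a default instead (top_level_files is a hypothetical name):

```python
import os
import tempfile


def top_level_files(directory):
    """Return the file names directly inside directory (no recursion).

    os.walk skips directories it cannot read (onerror is None by default),
    so for a missing folder it yields nothing; the default tuple passed to
    next() turns that case into an empty list instead of StopIteration.
    """
    _, _, files = next(os.walk(directory), (None, None, []))
    return sorted(files)


with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "sample_1.log"), "w").close()
    os.makedirs(os.path.join(demo, "extra_file"))
    open(os.path.join(demo, "extra_file", "extra_sample_1.log"), "w").close()
    names = top_level_files(demo)
    missing = top_level_files(os.path.join(demo, "does_not_exist"))
    print(names, missing)  # ['sample_1.log'] []
```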
It's also common to use glob:
import os
from glob import glob
dir_path = os.path.dirname(os.path.realpath(__file__))
for file in glob(os.path.join(dir_path, "*.log")):
    print(file)
This runs the risk of matching a directory whose name ends in ".log", so you could also add a test using os.path.isfile(file).
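With that isfile test added, the glob version might look like this sketch (glob_log_files is a hypothetical name; glob.escape guards against wildcard characters in the folder path):

```python
import glob
import os
import tempfile


def glob_log_files(dir_path):
    """Return full paths of regular .log files directly inside dir_path."""
    pattern = os.path.join(glob.escape(dir_path), "*.log")
    # glob's "*" does not cross directory separators, so this is non-recursive;
    # os.path.isfile drops any directory whose name ends in ".log".
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "sample_1.log"), "w").close()
    os.makedirs(os.path.join(demo, "old.log"))  # a directory, not a file
    found_logs = [os.path.basename(p) for p in glob_log_files(demo)]
    print(found_logs)  # ['sample_1.log']
```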
My notebook is in the home folder, where I also have another folder, "test". In the test folder, I have 5 subfolders. Each of the folders contains a .shp file. I want to iterate over all subfolders within test and open all the .shp files. It doesn't matter if they get overwritten.
data = gpd.read_file("./test/folder1/file1.shp")
data.head()
How can I do so? I tried this:
path = os.getcwd()
files = glob.glob(os.path.join(path + "/test/", "*.shp"))
print(files)
but this only searches the test folder itself, not its subfolders.
You can use os.walk from the os module:
import os
import geopandas as gpd
for root, dirs, files in os.walk("./test"):
    for name in files:
        # Only open .shp files, skipping anything else in the folders
        if name.endswith(".shp"):
            fpath = os.path.join(root, name)
            data = gpd.read_file(fpath)
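If you would rather keep every shapefile instead of overwriting data on each iteration, one option is to collect the paths first with pathlib. This is a sketch; find_shapefiles is a hypothetical helper, and the demo only creates empty .shp placeholders, so it never calls geopandas:

```python
import tempfile
from pathlib import Path


def find_shapefiles(base):
    """Return every .shp file under base, including nested subfolders."""
    # Path.rglob("*.shp") matches "*.shp" in base and all of its subdirectories.
    return sorted(p for p in Path(base).rglob("*.shp") if p.is_file())


with tempfile.TemporaryDirectory() as test_dir:
    for i in (1, 2):
        folder = Path(test_dir, f"folder{i}")
        folder.mkdir()
        (folder / f"file{i}.shp").touch()
    shp_names = [p.name for p in find_shapefiles(test_dir)]
    print(shp_names)  # ['file1.shp', 'file2.shp']
```

With geopandas imported as gpd, something like frames = {p.stem: gpd.read_file(str(p)) for p in find_shapefiles("./test")} would then keep one GeoDataFrame per file, keyed by file name.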
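Just do os.chdir(path), and then use glob.glob('*.shp'); that matches the .shp files in the folder you changed into, so repeat it for each subfolder.
Note also that in your code you concatenated part of the path by hand (path + "/test/") instead of passing each component to os.path.join.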
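Changing directories and globbing covers only one folder at a time. A recursive glob (Python 3.5+) searches every subfolder without touching the working directory; a minimal sketch, with glob_shapefiles as a hypothetical name:

```python
import glob
import os
import tempfile


def glob_shapefiles(base):
    """Return .shp paths under base at any depth using a recursive glob."""
    # With recursive=True, "**" matches zero or more directory levels.
    pattern = os.path.join(glob.escape(base), "**", "*.shp")
    return sorted(glob.glob(pattern, recursive=True))


with tempfile.TemporaryDirectory() as test_dir:
    os.makedirs(os.path.join(test_dir, "folder1"))
    os.makedirs(os.path.join(test_dir, "folder2", "nested"))
    open(os.path.join(test_dir, "folder1", "file1.shp"), "w").close()
    open(os.path.join(test_dir, "folder1", "file1.dbf"), "w").close()
    open(os.path.join(test_dir, "folder2", "nested", "file2.shp"), "w").close()
    shp_found = [os.path.basename(p) for p in glob_shapefiles(test_dir)]
    print(shp_found)  # ['file1.shp', 'file2.shp']
```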
I am trying to use Python's os module to loop through all the subdirectories in my root directory, find files with a specific name, and rename them.
Just to make it clear, this is my tree structure:
My Python file is located at the root level.
What I am trying to do is target the directory 942ba, loop through all of its subdirectories, locate each file named 000000, and rename it to 000000.csv.
The current code I have is as follows:
import os
root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
    for f in files:
        print(dirs)
        if f == '000000':
            dirs = dirs.strip(root)
            f_new = f + '.csv'
            os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working: when I run my code, for some reason it strips the date from the subdirectory names.
Can anyone help me understand how to solve this issue?
A more compact way to iterate through the folders and select only the files you are looking for is below:
import os
source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
files = [os.path.normpath(os.path.join(root, f))
         for root, dirs, names in os.walk(source_folder)
         for f in names
         if '000000' in f and not f.endswith('.gz')]
for file in files:
    os.rename(file, f"{file}.csv")
The list comprehension stores the full path to each file you are looking for. You can change the condition inside the comprehension to anything you need. I use this snippet a lot to find just images of a certain type, or to remove unwanted files from the selection.
In the for loop, each file is renamed by appending the .csv extension.
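Two refinements, sketched below under the assumption that you want exact matches: the substring test '000000' in f would also pick up 000000.csv on a second run and rename it to 000000.csv.csv, so an exact comparison is safer; and passing the condition in as a function makes the snippet reusable. find_files and add_csv_extension (with its dry_run flag) are hypothetical helpers, not part of any library:

```python
import os
import tempfile


def find_files(source_folder, predicate):
    """Return normalized full paths of files under source_folder whose name satisfies predicate."""
    return sorted(
        os.path.normpath(os.path.join(root, name))
        for root, _dirs, names in os.walk(source_folder)
        for name in names
        if predicate(name)
    )


def add_csv_extension(paths, dry_run=True):
    """Append '.csv' to each path; with dry_run=True only return the planned renames."""
    planned = [(path, path + ".csv") for path in paths]
    if not dry_run:
        for old, new in planned:
            os.rename(old, new)
    return planned


with tempfile.TemporaryDirectory() as src:
    os.makedirs(os.path.join(src, "2021-01-01"))
    target = os.path.join(src, "2021-01-01", "000000")
    open(target, "w").close()
    open(target + ".gz", "w").close()
    matches = find_files(src, lambda name: name == "000000")  # exact match only
    add_csv_extension(matches, dry_run=False)
    renamed_ok = os.path.isfile(target + ".csv") and not os.path.exists(target)
    print(renamed_ok)  # True
```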
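I would use glob to find the files.
import os, glob
zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
# '**' with recursive=True also matches files inside nested subdirectories
files = glob.glob(os.path.join('*' + zdir, '**', '000000'), recursive=True)
for fly in files:
    os.rename(fly, '{}.csv'.format(fly))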
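For completeness, the original loop loses part of the folder names because str.strip(root) treats root as a set of characters to trim from both ends, not as a prefix to remove. Since the first value os.walk yields is already the full path of the folder being visited, it can be used as-is. A sketch, where the "/data/942ba/" path in the demo is a made-up example and rename_to_csv is a hypothetical name:

```python
import os
import tempfile

# strip() removes any of the characters "/data942b" from both ends, so it
# keeps eating into the date until it hits a character not in that set.
stripped = "/data/942ba/2021-01-01".strip("/data/942ba/")
print(stripped)  # 021-01-01


def rename_to_csv(root, target="000000"):
    """Rename every file named target under root to target + '.csv'; return the new paths."""
    renamed = []
    for dirpath, _subdirs, files in os.walk(root):
        if target in files:
            # dirpath is already the full path of the folder being visited,
            # so no stripping or reformatting is needed.
            old = os.path.join(dirpath, target)
            new = old + ".csv"
            os.rename(old, new)
            renamed.append(new)
    return renamed


with tempfile.TemporaryDirectory() as demo_root:
    for day in ("2021-01-01", "2021-01-02"):
        os.makedirs(os.path.join(demo_root, day))
        open(os.path.join(demo_root, day, "000000"), "w").close()
    renamed = rename_to_csv(demo_root)
    folders = sorted(os.path.basename(os.path.dirname(p)) for p in renamed)
    print(folders)  # ['2021-01-01', '2021-01-02']
```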
I want to add the names of all the files in a specific folder to a list. How can I do that? The path is Dropbox -> a folder called 'UMM' -> a folder called '2018'. Could someone help me with the code for this? I have tried using os.walk(), but it doesn't seem to work.
You can use os.walk and append only the names in files (the third value it yields for each directory).
from os import walk
file_names = list()
path = 'path/of/folder'
for root, dirc, files in walk(path):
    for FileName in files:
        file_names.append(FileName)
print(file_names)
This appends the names of all files in all directories and subdirectories of the specified path.
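If the same file name can appear in several subfolders, bare names become ambiguous. One option, sketched here with a hypothetical all_file_paths helper, is to keep each path relative to the starting folder:

```python
import os
import tempfile


def all_file_paths(path):
    """Return paths (relative to path) of all files in path and its subfolders."""
    result = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Keep the subfolder part so equal names in different folders stay distinct.
            result.append(os.path.relpath(os.path.join(root, name), path))
    return sorted(result)


with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "sub"))
    open(os.path.join(demo, "report.txt"), "w").close()
    open(os.path.join(demo, "sub", "report.txt"), "w").close()
    rel = all_file_paths(demo)
    print(rel)
```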
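This creates a list of the entries (files and subfolders) directly inside one folder, without recursing:
from os import listdir
# replace with the path to your folder (an empty string is not a valid path)
path = 'path/of/folder'
fileList = listdir(path)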
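Note that listdir also returns subfolder names. A sketch that keeps only files, using os.scandir; the Dropbox location below is an assumption (a Dropbox folder in your home directory), so adjust it to wherever Dropbox syncs on your machine. files_in_folder is a hypothetical name:

```python
import os
import tempfile


def files_in_folder(folder):
    """Return the names of regular files directly inside folder (subfolders excluded)."""
    with os.scandir(folder) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())


# Assumption: Dropbox syncs to a "Dropbox" folder in your home directory.
umm_2018 = os.path.join(os.path.expanduser("~"), "Dropbox", "UMM", "2018")
# file_list = files_in_folder(umm_2018)

with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "a.csv"), "w").close()
    os.makedirs(os.path.join(demo, "subfolder"))
    listed = files_in_folder(demo)
    print(listed)  # ['a.csv']
```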
I am trying to create a program that copies files with a certain file extension to a given folder. When files are located in subfolders instead of the root folder, the program fails to get the correct path. In its current state the program works perfectly for files in the root folder, but it crashes when it finds matching items in subfolders: it tries to use the root folder as the directory instead of the correct subfolder.
My code is as follows:
# Selective_copy.py walks through file tree and copies files with
# certain extension to given folder
import shutil
import os
import re
# Deciding the folders and extensions to be targeted
# TODO: user input instead of static values
extension = "zip"
source_folder = "/Users/viliheikkila/documents/kooditreeni/"
destination_folder = "/Users/viliheikkila/documents/test"
def Selective_copy(source_folder):
    # create regex to identify file extensions
    mo = re.compile(r"(\w+).(\w+)") # Group(2) represents the file extension
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for i in filenames:
            if mo.search(i).group(2) == extension:
                file_path = os.path.abspath(i)
                print("Copying from " + file_path + " to " + destination_folder)
                shutil.copy(file_path, destination_folder)
Selective_copy(source_folder)
dirpath is one of the things provided by walk for a reason: it gives the path to the directory that the items in filenames are located in. You can use it to determine the subfolder you should be using.
file_path = os.path.abspath(i)
This line is blatantly wrong.
Keep in mind that filenames holds a list of base file names. At this point it's just a list of strings, and (technically) they are not associated with files in the filesystem at all.
os.path.abspath does string-only operations: it joins the file name with the current working directory. As a result, the merged path points to a file that does not exist.
What should be done instead is to join the directory path and the base file name (both values yielded by os.walk):
file_path = os.path.join(dirpath, i)
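Applying that fix, and replacing the regex with os.path.splitext (the unescaped . in the original pattern matches any character), the program could look like this sketch. The extension default and the selective_copy name are assumptions mirroring the question, and the demo copies between two temporary folders instead of the paths in the question:

```python
import os
import shutil
import tempfile


def selective_copy(source_folder, destination_folder, extension="zip"):
    """Copy every file under source_folder with the given extension into destination_folder.

    Returns the list of source paths that were copied.
    """
    os.makedirs(destination_folder, exist_ok=True)
    wanted = "." + extension.lower()
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_folder):
        for name in filenames:
            # splitext returns (root, ext), e.g. ("archive", ".zip").
            if os.path.splitext(name)[1].lower() == wanted:
                # Join with dirpath so files in subfolders get their real location.
                file_path = os.path.join(dirpath, name)
                print("Copying from " + file_path + " to " + destination_folder)
                shutil.copy(file_path, destination_folder)
                copied.append(file_path)
    return copied


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    os.makedirs(os.path.join(src, "sub"))
    open(os.path.join(src, "top.zip"), "w").close()
    open(os.path.join(src, "sub", "nested.zip"), "w").close()
    open(os.path.join(src, "sub", "notes.txt"), "w").close()
    selective_copy(src, dst)
    copied_names = sorted(os.listdir(dst))
    print(copied_names)  # ['nested.zip', 'top.zip']
```

Files with the same name in different subfolders would overwrite each other in the destination; if that matters, build the destination name from the relative path instead.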