os.listdir not listing files in directory - python

For some reason os.listdir isn't working for me. I have 6 .xlsx files inside input_dir, but it creates an empty list instead of a list of the 6 files. If I move the .xlsx files into the directory where the script is, one directory back, and update the input_dir path, it finds all 6 files. But I need the 6 files to be one directory up, in their own folder, and when I move them there and update the input_dir path, it doesn't find them at all.
import openpyxl as xl
import os
import pandas as pd
import xlsxwriter
input_dir='C:\\Users\\work\\comparison'
files = [file for file in os.listdir(input_dir)
         if os.path.isfile(file) and file.endswith(".xlsx")]
for file in files:
    input_file = os.path.join(input_dir, file)
    wb1 = xl.load_workbook(input_file)
    ws1 = wb1.worksheets[0]

When you move the files into input_dir, the following line creates an empty list:
files = [file for file in os.listdir(input_dir)
         if os.path.isfile(file) and file.endswith(".xlsx")]
This is because you are checking os.path.isfile(file) instead of os.path.isfile(os.path.join(input_dir, file)). A bare file name is resolved relative to the current working directory, not relative to input_dir.
When the files are in the same directory as the script (and you run the script from there), the bare name resolves correctly, so the list is built as expected.
Alternatively, you could try glob.glob, which accepts a path pattern and returns a list of matching paths that include the directory part of the pattern.
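The two fixes described in this answer can be sketched as follows. The helper names xlsx_names_via_listdir and xlsx_paths_via_glob are hypothetical, and a temporary directory stands in for input_dir; this is an illustration under those assumptions, not the asker's final script.

```python
import glob
import os
import tempfile


def xlsx_names_via_listdir(input_dir):
    """Return bare .xlsx file names, testing isfile against the joined path."""
    return sorted(
        name for name in os.listdir(input_dir)
        if os.path.isfile(os.path.join(input_dir, name))
        and name.endswith(".xlsx")
    )


def xlsx_paths_via_glob(input_dir):
    """Return paths matching input_dir/*.xlsx (directory part included)."""
    return sorted(glob.glob(os.path.join(input_dir, "*.xlsx")))


# Demo: a throwaway directory plays the role of input_dir.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.xlsx", "b.xlsx", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    print(xlsx_names_via_listdir(demo_dir))
    print(xlsx_paths_via_glob(demo_dir))
```

The listdir version returns bare names, so each one still has to be joined with input_dir before opening it; the glob version returns paths that can be passed to load_workbook directly.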

The problem comes from os.path.isfile(file): os.listdir(input_dir) returns a list of the file names inside the input_dir directory, but without their path. Hence your error, as os.path.isfile(file) looks in your current working directory, which doesn't contain any of those file names.
You can correct this by changing the check to os.path.isfile(input_dir + '\\' + file), but a cleaner solution is to delete that part of the condition: if os.listdir returns a name, the entry is necessarily in your directory, so there is no need to check that it exists (this assumes no subdirectory has a name ending in .xlsx):
files = [file for file in os.listdir(input_dir) if file.endswith(".xlsx")]

Related

Python: Finding files in directory but ignoring folders and their contents

So my program search_file.py is trying to look for .log files in the directory it is currently placed in. I used the following code to do so:
import os
# This is to get the directory that the program is currently running in
dir_path = os.path.dirname(os.path.realpath(__file__))
# for loop is meant to scan through the current directory the program is in
for root, dirs, files in os.walk(dir_path):
    for file in files:
        # Check if file ends with .log, if so print file name
        if file.endswith('.log'):
            print(file)
My current directory is as follows:
search_file.py
sample_1.log
sample_2.log
extra_file (this is a folder)
And within the extra_file folder we have:
extra_sample_1.log
extra_sample_2.log
Now, when the program runs and prints the files out it also takes into account the .log files in the extra_file folder. But I do not want this. I only want it to print out sample_1.log and sample_2.log. How would I approach this?
Try this:
import os
files = os.listdir()
for file in files:
    if file.endswith('.log'):
        print(file)
The problem in your code is that os.walk traverses the whole directory tree, not just your current directory. os.listdir returns a list of all the names in a single directory (files and folders alike), defaulting to your current working directory, which is what you are looking for.
os.walk documentation
os.listdir documentation
By default, os.walk does a root-first (top-down) traversal of the tree, so the first tuple it yields describes the root directory itself. So, just ask for the first one. And since you don't really care about root or dirs, use _ as the "don't care" variable name:
# get root files list.
_, _, files = next(os.walk(dir_path))
for file in files:
    # Check if file ends with .log, if so print file name
    if file.endswith('.log'):
        print(file)
It's also common to use glob:
import os
from glob import glob
dir_path = os.path.dirname(os.path.realpath(__file__))
for file in glob(os.path.join(dir_path, "*.log")):
    print(file)
This runs the risk of matching a directory whose name ends in ".log", so you could also add a test using os.path.isfile(file).
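A minimal sketch of the glob approach combined with the os.path.isfile test suggested above. The function name top_level_logs and the temporary directory layout are assumptions made for illustration; the layout mirrors the one in the question, plus a directory whose name ends in ".log".

```python
import glob
import os
import tempfile


def top_level_logs(dir_path):
    """Return names of regular files ending in .log directly inside dir_path."""
    pattern = os.path.join(dir_path, "*.log")
    return sorted(
        os.path.basename(path)
        for path in glob.glob(pattern)
        if os.path.isfile(path)  # skips a directory that happens to end in .log
    )


with tempfile.TemporaryDirectory() as demo_dir:
    # Recreate the question's layout, plus a directory named like a log file.
    os.mkdir(os.path.join(demo_dir, "extra_file"))
    os.mkdir(os.path.join(demo_dir, "archive.log"))
    for rel in ("sample_1.log", "sample_2.log",
                os.path.join("extra_file", "extra_sample_1.log")):
        open(os.path.join(demo_dir, rel), "w").close()
    print(top_level_logs(demo_dir))
```

Because the pattern has no recursive wildcard, files inside extra_file are never visited at all.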

Python loop through directories

I am trying to use python library os to loop through all my subdirectories in the root directory, and target specific file name and rename them.
Just to make it clear this is my tree structure
My python file is located at the root level.
What I am trying to do is target the directory 942ba, loop through all of its subdirectories, locate the file 000000 and rename it to 000000.csv.
The current code I have is as follows:
import os
root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
    for f in files:
        print(dirs)
        if f == '000000':
            dirs = dirs.strip(root)
            f_new = f + '.csv'
            os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working: when I run my code, for some reason it strips the date from the subdirectory names.
Can anyone help me understand how to solve this issue?
A more efficient way to iterate through the folders and only select the files you are looking for is below:
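import os
source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
# Collect the full path of every matching file under source_folder
files = [os.path.normpath(os.path.join(root, f))
         for root, dirs, files in os.walk(source_folder)
         for f in files
         if '000000' in f and not f.endswith('.gz')]
for file in files:
    # Rename using the loop variable (the comprehension's f is not visible here)
    os.rename(file, f"{file}.csv")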
The list comprehension stores the full path to the files you are looking for. You can change the condition inside the comprehension to anything you need. I use this code snippet a lot to find just images of certain type, or remove unwanted files from the selected files.
In the for loop, files are renamed adding the .csv extension.
I would use glob to find the files.
import os, glob
zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
files = glob.glob('*{}/000000'.format(zdir))
for fly in files:
    os.rename(fly, '{}.csv'.format(fly))
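As background for the symptom in the question: str.strip(chars) treats its argument as a set of characters to remove from both ends, not as a prefix, so dirs.strip(root) can eat leading or trailing characters of a subdirectory name. The directory path yielded by os.walk already starts with the root, so no stripping is needed. A minimal sketch of the rename under that observation follows; the function name add_csv_extension and the date-like folder names in the demo are assumptions, since the original tree structure is not shown.

```python
import os
import tempfile


def add_csv_extension(root, target="000000"):
    """Rename every file named exactly `target` under root to target + '.csv'."""
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if target in filenames:
            old_path = os.path.join(dirpath, target)  # dirpath already includes root
            new_path = old_path + ".csv"
            os.rename(old_path, new_path)
            renamed.append(new_path)
    return sorted(renamed)


with tempfile.TemporaryDirectory() as demo_root:
    # Hypothetical layout: one subdirectory per date, each holding a 000000 file.
    for sub in ("2020-01-01", "2020-01-02"):
        os.mkdir(os.path.join(demo_root, sub))
        open(os.path.join(demo_root, sub, "000000"), "w").close()
    for path in add_csv_extension(demo_root):
        print(os.path.relpath(path, demo_root))
```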

On creating a zip file, all folders from the root directory are added

I am trying to zip all the files and folders present in folder3 using Python.
I have used zipfile for this. The resulting zip contains every folder from the root directory down to the directory I want to zip.
import os
import zipfile

def CreateZip(dir_name):
    os.chdir(dir_name)
    zf = zipfile.ZipFile("temp.zip", "w")
    for dirname, subdirs, files in os.walk(dir_name):
        zf.write(dirname)
        for filename in files:
            file = os.path.join(dirname, filename)
            zf.write(file)
    zf.printdir()
    zf.close()
Expected output:
toBeZippedcontent1\toBeZippedFile1.txt
toBeZippedcontent1\toBeZippedFile2.txt
toBeZippedcontent2\toBeZippedFile1.txt
toBeZippedcontent2\toBeZippedFile2.txt
Current output (folder structure inside zip file):
folder1\folder2\folder3\toBeZippedcontent1\toBeZippedFile1.txt
folder1\folder2\folder3\toBeZippedcontent1\toBeZippedFile2.txt
folder1\folder2\folder3\toBeZippedcontent2\toBeZippedFile1.txt
folder1\folder2\folder3\toBeZippedcontent2\toBeZippedFile2.txt
Because dir_name is an absolute path, walk() yields an absolute path for dirname, so join() creates an absolute path for your files.
You have to remove folder1\folder2\folder3 from the path to get a relative path.
file = os.path.relpath(file)
zf.write(file)
You could try to slice it (note the raw string, so that \f is not read as a form-feed escape):
file = file[len(r"folder1\folder2\folder3" + "\\"):]
zf.write(file)
but relpath() should be better.
You can also use the second argument (arcname) to change the path/name inside the zip file:
zf.write(file, 'path/filename.ext')
It can be useful if you run code from different folder and you don't use os.chdir() so you can't create relative path.
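A minimal sketch of the second-argument approach, computing each archive name with os.path.relpath so that no os.chdir() is needed. The function name create_zip, its zip_path parameter, and the demo file layout are assumptions for illustration; the archive is written outside the folder being zipped so it does not try to include itself.

```python
import os
import tempfile
import zipfile


def create_zip(dir_name, zip_path):
    """Zip the files under dir_name, storing paths relative to dir_name."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for dirname, _subdirs, files in os.walk(dir_name):
            for filename in files:
                full_path = os.path.join(dirname, filename)
                # arcname is the name stored inside the archive.
                zf.write(full_path, arcname=os.path.relpath(full_path, dir_name))


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    for folder in ("toBeZippedcontent1", "toBeZippedcontent2"):
        os.mkdir(os.path.join(src, folder))
        for name in ("toBeZippedFile1.txt", "toBeZippedFile2.txt"):
            open(os.path.join(src, folder, name), "w").close()
    archive = os.path.join(out, "temp.zip")
    create_zip(src, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.printdir()
```

This sketch only stores files, so empty directories are not recorded in the archive.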

Run the for loop for each file in directory using Python

I want to run a for loop in Python over each file in a directory. The directory names will be passed in through a separate file (folderlist.txt).
Inside my main folder (/user/), new folders get added daily. I want to run the loop over each file in a given folder, but not over folders whose files have already been processed. I'm thinking of maintaining folderlist.txt so that it holds only the names of the folders newly added each day, which will then be passed to the for loop.
For example under my main path (/user/) we see below folders :
(file present inside each folder are listed below folder name just to give the idea)
(day 1)
folder1
file1, file2, file3
folder2
file4, file5
folder3
file6
(day 2)
folder4
file7, file8, file9, file10
folder5
file11, file12
import os
with open('/user/folderlist.txt') as f:
    for line in f:
        line=line.strip("\n")
        dir='/user/'+line
        for files in os.walk (dir):
            for file in files:
                print(file)
        # for filename in glob.glob(os.path.join (dir, '*.json')):
        #     print(filename)
I tried using the os.walk and glob modules in the above code, but it looks like the loop runs more times than there are files in the folder. Please provide inputs.
Try replacing os.walk(dir) with os.listdir(dir). This will give you a list of all the entries in the directory.
import os
with open('/user/folderlist.txt') as f:
    for line in f:
        line = line.strip("\n")
        dir = '/user/' + line
        for file in os.listdir(dir):
            if file.endswith("fileExtension"):
                print(file)
Hope it helps
Help on function walk in module os:
walk(top, topdown=True, onerror=None, followlinks=False)
    Directory tree generator.
    For each directory in the directory tree rooted at top (including top
    itself, but excluding '.' and '..'), yields a 3-tuple
        dirpath, dirnames, filenames
    dirpath is a string, the path to the directory. dirnames is a list of
    the names of the subdirectories in dirpath (excluding '.' and '..').
    filenames is a list of the names of the non-directory files in dirpath.
    Note that the names in the lists are just names, with no path components.
    To get a full path (which begins with top) to a file or directory in
    dirpath, do os.path.join(dirpath, name).
Therefore, in your code, files is the whole 3-tuple, and the inner loop iterates over its three elements: dirpath (string), dirnames (list), filenames (list).
Using os.listdir(dir) gives a list of the names of all the files and folders in dir.

Copy files (as backup) and change original file names (rearranging contents)

I'm a total Python noob, but I want to learn it and integrate it into my workflow.
I have about 400 files whose names contain 4 different parts separated by underscores:
-> Version_Date_ProjectName_ProjectNumber
As we always look at the project number first, we rearranged the parts of the filename for new projects to:
-> ProjectNumber_Version_ProjectName
My problem now is that I would like to rename all the existing files to the new format while keeping a backup of them in a subdirectory called "Archiv".
It just has to be a simple script that I put in the directory, so that every file in this directory is copied as a backup and renamed to the new filename.
EDIT:
My first step was to create a subfolder within the source directory, and it worked somehow. But now I see that I only need to back up the files with a specific file extension.
import os, shutil
src_dir= os.curdir
dst_dir= os.path.join(os.curdir, "Archiv")
shutil.copytree(src_dir, dst_dir)
I tried to extend the code with the solutions from here, but it doesn't work. :/
import os
import shutil
import glob
src_path = "YOU_SOURCE_PATH"
dest_path = "YOUR DESTINATION PATH"
if not os.path.exists(dest_path):
    os.makedirs(dest_path)
files = glob.iglob(os.path.join(src_dir, "*.pdf"))
for file in files:
    if os.path.isfile(file):
        shutil.copy2(file, dest_path)
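Note that the snippet above defines src_path but passes src_dir to glob.iglob, which would raise a NameError when run on its own. Beyond that, one possible shape for the backup-and-rename step is sketched below. It assumes every file name has exactly four underscore-separated parts before the extension and that no part contains an underscore itself; the function names rearranged_name and backup_and_rename, the ".pdf" default, and the sample file name are all hypothetical.

```python
import os
import shutil
import tempfile


def rearranged_name(filename):
    """Map Version_Date_ProjectName_ProjectNumber.ext to
    ProjectNumber_Version_ProjectName.ext, or return None when the stem
    does not have exactly four underscore-separated parts."""
    stem, ext = os.path.splitext(filename)
    parts = stem.split("_")
    if len(parts) != 4:
        return None
    version, _date, project_name, project_number = parts
    return f"{project_number}_{version}_{project_name}{ext}"


def backup_and_rename(src_dir, extension=".pdf", archive_name="Archiv"):
    """Copy matching files into src_dir/archive_name, then rename the originals."""
    archive_dir = os.path.join(src_dir, archive_name)
    os.makedirs(archive_dir, exist_ok=True)
    renamed = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not (os.path.isfile(path) and name.lower().endswith(extension)):
            continue
        new_name = rearranged_name(name)
        if new_name is None:
            continue  # leave files that do not match the assumed pattern alone
        shutil.copy2(path, archive_dir)  # backup keeps the old name
        os.rename(path, os.path.join(src_dir, new_name))
        renamed.append(new_name)
    return renamed


with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "V2_20200101_Bridge_1234.pdf"), "w").close()
    print(backup_and_rename(demo_dir))
    print(sorted(os.listdir(os.path.join(demo_dir, "Archiv"))))
```

Trying this on a copy of the real folder first would be prudent, since the rename happens in place.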
