I'm trying to use os.walk to iterate through a series of folders, find the earliest excel file in a given folder, and copy it to a temporary folder. This loop keeps exiting prior to creating the matches list. I'm getting the error
NameError: name 'xlsxExt' is not defined
How can I get the loop to run through each file in the expFolder, create one 'matches' list for each expFolder, and sort it? Any help would be appreciated!
path = r'C:\Users...'
tempFolder = os.mkdir(os.path.join(path, 'tempFolder'))
for dataFolder, expFolder, file in os.walk(path):
    print("Starting", dataFolder)
    matches = []
    for file_name in file:
        if file_name.endswith('.xlsx'):
            xlsxExt = os.path.join(dataFolder, file_name)
    matches.append(xlsxExt) #breaks here
    sortedMatches = sorted(matches, key=os.path.getmtime)
    first = sortedMatches[0][0:]
    print(first)
I think there are two issues with your code.
The first is that your append call is happening outside of the inner loop. This means that you only append the last value you saw from the loop, not all the values that have the right extension.
The second issue is that you don't properly deal with folders that have no matching files. This is the current cause of your exception, though the exact exception details would change if you move the append call as suggested above. If you avoid the NameError, you'd instead get an IndexError when you try to do sortedMatches[0][0:] when sortedMatches is empty.
I suggest something like this:
for dataFolder, expFolder, file in os.walk(path):
    print("Starting", dataFolder)
    matches = []
    for file_name in file:
        if file_name.endswith('.xlsx'):
            xlsxExt = os.path.join(dataFolder, file_name)
            matches.append(xlsxExt) # indent this line!
    if matches: # add this, ignores empty lists
        sortedMatches = sorted(matches, key=os.path.getmtime) # indent the rest
        first = sortedMatches[0] # you don't need to slice here
        print(first)
As I commented, you don't need the slice when getting first from sortedMatches. You were just copying the whole filename string, which is entirely unnecessary. You don't need to make that change, it's unrelated to your errors.
You assign xlsxExt only inside the if block nested within the two for loops. You get this error because, in a folder where the inner loop never runs or the if condition is never true, xlsxExt is never assigned, yet you still append it to the list. Always initialise such variables before the conditional statements that may skip the assignment.
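For completeness, here is a minimal sketch of the whole task (find the earliest .xlsx in each folder, then copy it elsewhere). It is only one possible approach: the helper name earliest_xlsx_per_folder and the demo file names are made up for illustration, and it uses min() instead of sorting because only the oldest file is needed. Note also that os.mkdir returns None, so the tempFolder variable in the question never holds a path; the sketch keeps the destination path in its own variable.

```python
import os
import shutil
import tempfile


def earliest_xlsx_per_folder(root):
    """Map each folder under root to its oldest .xlsx file (by modification time)."""
    earliest = {}
    for folder, _subfolders, files in os.walk(root):
        matches = [os.path.join(folder, name) for name in files
                   if name.endswith('.xlsx')]
        if matches:  # skip folders without any .xlsx file
            earliest[folder] = min(matches, key=os.path.getmtime)
    return earliest


# Demo on a throwaway tree; all names below are made up for illustration.
root = tempfile.mkdtemp()
exp_folder = os.path.join(root, 'exp1')
os.mkdir(exp_folder)
os.mkdir(os.path.join(root, 'empty'))  # a folder with no matches
for name, mtime in [('new.xlsx', 2_000_000_000), ('old.xlsx', 1_000_000_000)]:
    file_path = os.path.join(exp_folder, name)
    open(file_path, 'w').close()
    os.utime(file_path, (mtime, mtime))  # force a known modification time

result = earliest_xlsx_per_folder(root)

# os.mkdir returns None, so keep the destination path in its own variable.
temp_folder = tempfile.mkdtemp()  # outside root, so os.walk never visits it
for source in result.values():
    shutil.copy2(source, temp_folder)  # copy2 also preserves timestamps
```

Keeping the destination outside the walked tree avoids os.walk picking up the copies on a later run.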
Related
I would like to delete files that begin with the array values and end with "_1.txt", from this given directory. I have this so far, which deletes the files successfully, but it throws an error every time, "FileNotFoundError: [Errno 2] No such file or directory: 'tom_1.txt'". I think the loop is not ending somehow, but I cannot figure out how to fix it.
import os
import numpy as np
directory = '/home/ec2-user/SageMaker/'
names = np.array(['tom','jen','bob'])
for filename in os.scandir(directory):
    for name in names:
        os.remove(f'{name}_1.txt')
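os.scandir yields entries for the directory, but os.remove is given a bare file name, which is resolved against the current working directory rather than against directory. You have to add the path: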
os.remove( f'{directory}{name}_1.txt' )
However, unless every name exists, that might still fail. You might consider:
name = f'{directory}{name}_1.txt'
if os.path.isfile(name):
    os.remove(name)
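I didn't even notice you aren't using the result of os.scandir. You can remove that outer loop and this will still work. That outer loop is also the likely reason the error only appears after the files are deleted: it repeats the deletions once per directory entry, so the second pass looks for files that are already gone.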
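Putting these points together, a small sketch might look like the following. The helper name remove_suffixed and the demo directory are assumptions made for illustration; a plain list stands in for the NumPy array, since only iteration over the names is needed.

```python
import os
import tempfile


def remove_suffixed(directory, names, suffix='_1.txt'):
    """Delete '<name><suffix>' in directory for each name; return the paths removed."""
    removed = []
    for name in names:
        path = os.path.join(directory, f'{name}{suffix}')
        if os.path.isfile(path):  # skip names that have no matching file
            os.remove(path)
            removed.append(path)
    return removed


# Demo in a throwaway directory: only 'tom' and 'jen' have files.
directory = tempfile.mkdtemp()
for name in ['tom', 'jen']:
    open(os.path.join(directory, f'{name}_1.txt'), 'w').close()

removed = remove_suffixed(directory, ['tom', 'jen', 'bob'])
second_pass = remove_suffixed(directory, ['tom', 'jen', 'bob'])
```

Because each path is checked before removal, calling the function a second time simply returns an empty list instead of raising FileNotFoundError.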
I created a script that is supposed to download some data and then run a few processes. The data source (ArcGIS Online) always downloads the data as a zip file, and when extracted, the folder name is a series of letters and numbers. I noticed that these occasionally change (not entirely sure why). My thought is to run os.listdir to get the folder name and then rename it. Where I run into issues is that the list returns the folder name with brackets and quotes: it returns ['f29a52b8908242f5b1f32c58b74c063b.gdb'] as the folder name, while the folder in the file explorer does not have the brackets and quotes. Below is my code and the error I receive.
import os
from zipfile import ZipFile
file_name = "THDNuclearFacilitiesBaseSandboxData.zip"
with ZipFile(file_name) as zip:
    # unzipping all the files
    print("Unzipping "+ file_name)
    zip.extractall("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
    print('Unzip Complete')
#removes old zip file
os.remove(file_name)
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(str(x), "Test.gdb")
Output:
FileNotFoundError: [WinError 2] The system cannot find the file specified: "['f29a52b8908242f5b1f32c58b74c063b.gdb']" -> 'Test.gdb'
I'm relatively new to python scripting, so if there is an easier alternative, that would be great as well. Thanks!
os.listdir() returns a list of the names of the files and folders in a directory.
Lists are represented, when printed to the screen, using a set of square brackets.
The name of each file is a string of characters, and strings are represented, when printed to the screen, using quotes.
So we are seeing a list with a single filename:
['f29a52b8908242f5b1f32c58b74c063b.gdb']
To access an item within a list in Python, you can use index notation, which also uses brackets to tell Python which item in the list to use by referencing the index (position number) of the item.
Python list indexes start at zero, so to get the first (and in this case only) item in the list, you can use x[0].
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(x[0], "Test.gdb")
Having said that, I would generally not use x as a variable name in this case... I might write the code a bit differently:
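data_dir = "C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data"
files = os.listdir(data_dir)
# listdir returns bare names, so join them with the directory before renaming
os.renames(os.path.join(data_dir, files[0]), os.path.join(data_dir, "Test.gdb"))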
Square brackets indicate a list. Try x[0]; that should get rid of the brackets and give you just the name.
The return value of listdir may be a list with only one value or a whole bunch.
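Since listdir may return one name or many, and the names it returns do not include the directory, a slightly more defensive sketch could look like this. The helper name rename_only_entry is hypothetical, and the demo uses a temporary directory in place of the real Data folder.

```python
import os
import tempfile


def rename_only_entry(data_dir, new_name):
    """Rename the single entry inside data_dir to new_name; return the new path."""
    entries = os.listdir(data_dir)
    if len(entries) != 1:
        raise ValueError(f'expected exactly one entry, found {len(entries)}')
    source = os.path.join(data_dir, entries[0])  # listdir gives bare names
    target = os.path.join(data_dir, new_name)
    os.rename(source, target)
    return target


# Demo: a temporary directory stands in for the extraction folder.
data_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(data_dir, 'f29a52b8908242f5b1f32c58b74c063b.gdb'))

as_text = str(os.listdir(data_dir))  # the list's repr, brackets and quotes included
new_path = rename_only_entry(data_dir, 'Test.gdb')
```

The as_text value shows why str(x) failed in the question: converting the list to a string keeps the brackets and quotes, which are not part of the folder name.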
I have a class that takes positional arguments (startDate, endDate, unmappedDir, and fundCodes) and has the following methods.
The method below is supposed to take an array of fundCodes, look in a directory, and see whether it finds files matching a certain format:
def file_match(self, fundCodes):
    # Get a list of the files in the unmapped directory
    files = os.listdir(self.unmappedDir)
    # loop through all the files and search for matching fund code
    for check_fund in fundCodes:
        # set a file pattern
        file_match = 'unmapped_positions_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate)
        # look in the unmappeddir and see if there's a file with that name
        if file_match in files:
            # if there's a match, load unmapped positions as etl
            return self.read_file(file_match)
        else:
            Logger.error('No file found with those dates/funds')
The other method is simply supposed to create an etl table from that file.
def read_file(self, filename):
    loadDir = Path(self.unmappedDir)
    for file in loadDir.iterdir():
        print('*' *40)
        Logger.info("Found a file : {}".format(filename))
        print(filename)
        unmapped_positions_table = etl.fromcsv(filename)
        print(unmapped_positions_table)
        print('*' * 40)
        return unmapped_positions_table
When running it, I'm able to retrieve the filename:
Found a file : unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv
unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv
But when trying to create the table, I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv'
Is it expecting a full path to the filename or something?
The proximate problem is that you need a full pathname.
The filename that you're trying to call fromcsv on is passed into the function, and ultimately came from listdir(self.unmappedDir). This means it's a path relative to self.unmappedDir.
Unless that happens to also be your current working directory, it's not going to be a valid path relative to the current working directory.
To fix that, you'd want to use os.path.join(self.unmappedDir, filename) instead of just filename. Like this:
return self.read_file(os.path.join(self.unmappedDir, file_match))
Or, alternatively, you'd want to use pathlib objects instead of strings, as you do with the for file in loadDir.iterdir(): loop. If file_match is a Path instead of a dumb string, then you can just pass it to read_file and it'll work.
But, if that's what you actually want, you've got a lot of useless code. In fact, the entire read_file function should just be one line:
def read_file(self, path):
    return etl.fromcsv(path)
What you're doing instead is looping over every file in the directory, then ignoring that file and reading filename, and then returning early after the first one. So, if there's 1 file there, or 20 of them, this is equivalent to the one-liner; if there are no files, it returns None. Either way, it doesn't do anything useful except to add complexity, wasted performance, and multiple potential bugs.
If, on the other hand, the loop is supposed to do something meaningful, then you should be using file rather than filename inside the loop, and you almost certainly shouldn't be doing an unconditional return inside the loop.
With this:
files = os.listdir(self.unmappedDir)
you're getting the bare file names in self.unmappedDir, without the directory.
So when you get a match on the name, you have to read the file by passing the full path (otherwise the routine looks for the file in the current directory):
return self.read_file(os.path.join(self.unmappedDir,file_match))
Aside: use a set here:
files = set(os.listdir(self.unmappedDir))
so that each filename lookup is a hash lookup instead of a linear scan of the list.
And your read_file method (which I didn't see earlier) should just open the file, instead of scanning the directory again (and returning at first iteration anyway, so it doesn't make sense):
def read_file(self, filepath):
    print('*' *40)
    Logger.info("Found a file : {}".format(filepath))
    print(filepath)
    unmapped_positions_table = etl.fromcsv(filepath)
    print(unmapped_positions_table)
    print('*' * 40)
    return unmapped_positions_table
Alternately, don't change your main code (except for the set part), and prepend the directory name in read_file since it's an instance method so you have it handy.
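The pathlib route mentioned above can be sketched as follows. This is an illustration under assumptions, not the original class: find_unmapped_file is a hypothetical standalone function, and it returns the full path rather than loading the table, so the result could then be handed to the CSV reader (converted with str() if that reader needs a string).

```python
import tempfile
from pathlib import Path


def find_unmapped_file(unmapped_dir, fund_codes, start, end):
    """Return the full path of the first matching file, or None if there is none."""
    base = Path(unmapped_dir)
    for fund in fund_codes:
        candidate = base / f'unmapped_positions_{fund}_{start}_{end}.csv'
        if candidate.is_file():  # checked inside unmapped_dir, not the cwd
            return candidate
    return None


# Demo: a temporary directory holding one file for the fund 'PUPSFF'.
unmapped_dir = tempfile.mkdtemp()
expected_name = 'unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv'
(Path(unmapped_dir) / expected_name).write_text('fund,qty\nPUPSFF,10\n')

found = find_unmapped_file(unmapped_dir, ['ABC', 'PUPSFF'],
                           '2018-07-01', '2018-07-11')
missing = find_unmapped_file(unmapped_dir, ['ABC'],
                             '2018-07-01', '2018-07-11')
```

Because the candidate is built from the directory, the returned path stays valid regardless of the current working directory. Unlike the original method, this version reports a miss once, after all fund codes have been checked, by returning None.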
I have a zip file downloaded from a website. I want to write a script that renames the zip file and, before unzipping it, checks how many files it contains.
The problem is that the zip file is in the directory, but I keep getting this error:
'FileNotFoundError: [Errno 2] No such file or directory: 'filename.zip''
I assumed it might be caused by the file name: I use Ubuntu, and when I downloaded the file the name was garbled because it was not in English. So I changed it to numbers (e.g. 20176), but I still get this error.
My script:
path means absolute path.
data_type = '{}{}'.format('201706', '.zip')
filename = [i for i in os.listdir('user/directory')]
filename.sort(key=lambda ctime: ctime[0])
downloaded = str(filename[0])
old = os.path.join('user/directory', downloaded)
new = os.path.join('user/directory', data_type)
os.rename(old, new)
zip = ZipFile(data_type)
archived_files = zip.namelist()
amount = len(archived_files)
Let's suppose the first filename in the sorted list is myfile.txt. Your code
old = os.path.join('user/directory', downloaded)
new = os.path.join('user/directory', data_type)
os.rename(old, new)
renames the first file in the directory listing, user/directory/myfile.txt (which, for the sorting reasons discussed below, is not necessarily the oldest one), to user/directory/201706.zip. The next statement then tries to open 201706.zip relative to the current working directory, where of course it doesn't exist. It should work if you try
zip = ZipFile(new)
Unfortunately there's no guarantee that the file actually will be a zipfile, so the operation may fail.
Some other points to consider, perhaps in other questions:
I suspect you are misunderstanding the sorting functions: although you appear to want to sort the list of filenames on creation time, just calling a lambda's parameter ctime doesn't mean that Python will understand your needs. If only programming languages had a DWIM ("do what I mean") mode, life would be so much easier!
The key argument to sort is a function that the sort function calls once for each value to be sorted, expecting it to return a "sort key" (that is, a value that represents its place in the required ordering). Suppose I take your lambda and apply it to a filename:
In [1]: ruth_lambda = lambda ctime: ctime[0]
In [2]: ruth_lambda("MY_FILENAME.TXT")
Out[2]: 'M'
You can see that you are sorting on the first character of the filename, and I doubt that's what you really want. But we can go into that later.
A quick note on formatting: it would have been simpler to write
data_type = '{}{}'.format(zipfile_name, '.zip')
as
data_type = '{}.zip'.format(zipfile_name)
Finally, since os.listdir returns a list, instead of
filename = [i for i in os.listdir('user/directory')]
it's simpler to write
filename = os.listdir('user/directory')
though the extra computation will do no harm. As a beginner you will find that as you improve, your older code starts to look really clunky - don't worry about that, it's a common experience! Just move forwards and try not to repeat old mistakes.
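To make the sort-key point concrete, here is a hedged sketch that sorts full paths by modification time, renames the oldest archive, and counts its members. It assumes modification time (os.path.getmtime) is an acceptable stand-in for "creation time", which is not reliably available on every platform; the file names and timestamps are invented for the demo.

```python
import os
import tempfile
from zipfile import ZipFile

# Build two small archives with known modification times (demo data only).
directory = tempfile.mkdtemp()
demo_archives = [
    ('b_newer.zip', 2_000_000_000, ['x.txt']),
    ('z_older.zip', 1_000_000_000, ['a.txt', 'b.txt']),
]
for name, mtime, members in demo_archives:
    archive_path = os.path.join(directory, name)
    with ZipFile(archive_path, 'w') as archive:
        for member in members:
            archive.writestr(member, 'data')
    os.utime(archive_path, (mtime, mtime))

# The key function is called once per full path and returns its timestamp.
paths = [os.path.join(directory, name) for name in os.listdir(directory)]
paths.sort(key=os.path.getmtime)
oldest = paths[0]

new = os.path.join(directory, '201706.zip')
os.rename(oldest, new)

# Open the archive by its full path, not by the bare file name.
with ZipFile(new) as archive:
    amount = len(archive.namelist())
```

Sorting full paths rather than bare names matters here too, since os.path.getmtime would otherwise look for each name in the current working directory.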
I want to make a name list and store all the names in four folders. I build
namelist = {1:[], 2:[], 3:[], 4:[]}
In the method, I write
for file_name in sorted(os.listdir(full_subdir_name)):
    full_file_name = os.path.join(full_subdir_name,file_name)
    #namelist[level] += blabla...
I want to add the names from the first folder into namelist[1], from the second folder to namelist[2]. I don't know how I can add all the names in different levels to it. Thx!
I am not totally sure this is what you are getting at with your question above. But it seems that first, you want to use enumerate() to allow you to keep the index and name of the four folders. But then, I think you will actually need an additional for-loop, based on your comments above, to actually append all of the contents for each of the files within each sub-folder. The following should do the trick.
Also note that your dictionary keys start with 1, so you need to account for the fact that the enumerate index starts with 0.
# Here, enumerate gives back the index and element.
for i,file_name in enumerate(sorted(os.listdir(full_subdir_name))):
    full_file_name = os.path.join(full_subdir_name,file_name)
    # Here, 'elem' will be the strings naming the actual
    # files inside of the folders.
    for elem in sorted(os.listdir(full_file_name)):
        # Here I am assuming you don't want to append the full path,
        # but you can easily change what to append by adding the
        # whole current file path: os.path.join(full_file_name, elem)
        namelist[i+1].append(elem)
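A self-contained variant of the same idea is sketched below, assuming the levels are simply the sub-folders in sorted order. The folder and file names are invented for the demo, and enumerate(..., start=1) is used so that the index lines up with the dictionary keys without the i+1 adjustment.

```python
import os
import tempfile

# Demo tree: two sub-folders standing in for the four real ones.
root = tempfile.mkdtemp()
demo_layout = {'level_a': ['x.txt', 'y.txt'], 'level_b': ['z.txt']}
for folder, files in demo_layout.items():
    os.mkdir(os.path.join(root, folder))
    for name in files:
        open(os.path.join(root, folder, name), 'w').close()

# Keep only directories, in a stable (sorted) order.
subdirs = sorted(entry for entry in os.listdir(root)
                 if os.path.isdir(os.path.join(root, entry)))

namelist = {}
for level, subdir in enumerate(subdirs, start=1):  # keys start at 1
    namelist[level] = sorted(os.listdir(os.path.join(root, subdir)))
```

Filtering with os.path.isdir guards against stray files at the top level, which would otherwise make the inner os.listdir call fail.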