Python: read file names and build a namelist

I want to build a name list that stores the names of all the files in four folders. I create
namelist = {1:[], 2:[], 3:[], 4:[]}
In the method, I write
for file_name in sorted(os.listdir(full_subdir_name)):
    full_file_name = os.path.join(full_subdir_name, file_name)
    #namelist[level] += blabla...
I want to add the names from the first folder into namelist[1], the names from the second folder into namelist[2], and so on. I don't know how to add the names for each level. Thx!

I am not totally sure this is what you are getting at with your question above, but it seems that, first, you want to use enumerate() so that you keep both the index and the name of each of the four folders. Then, based on your comments above, I think you will need an additional for loop to append the names of all of the files within each subfolder. The following should do the trick.
Also note that your dictionary keys start at 1, so you need to account for the fact that the enumerate index starts at 0.
import os

namelist = {1: [], 2: [], 3: [], 4: []}
# Here, enumerate gives back the index and element.
# (file_name / full_file_name are really the folder name / folder path.)
for i, file_name in enumerate(sorted(os.listdir(full_subdir_name))):
    full_file_name = os.path.join(full_subdir_name, file_name)
    # Here, 'elem' will be the strings naming the actual
    # files inside of the folders.
    for elem in sorted(os.listdir(full_file_name)):
        # Here I am assuming you don't want to append the full path,
        # but you can easily change what to append by adding the
        # whole current file path: os.path.join(full_file_name, elem)
        namelist[i + 1].append(elem)
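A self-contained sketch of the same idea, assuming the four folders are the immediate subfolders of one root folder and that only plain file names (not full paths) are wanted. build_namelist is a hypothetical helper name. enumerate() also accepts a start argument, so enumerate(..., start=1) lines the keys up with 1..4 without the i + 1 adjustment. Unlike the loop above, this version skips stray files in the root and nested folders inside each level.

```python
import os


def build_namelist(root):
    """Map 1, 2, 3, ... to the sorted file names inside each subfolder of root.

    Hypothetical helper: assumes every subfolder of root is one "level".
    """
    # Only directories count as levels; stray files in root are ignored
    folders = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    namelist = {}
    # start=1 makes the first folder's key 1 instead of 0
    for level, folder in enumerate(folders, start=1):
        folder_path = os.path.join(root, folder)
        # Keep plain files only, as bare names
        namelist[level] = sorted(
            name for name in os.listdir(folder_path)
            if os.path.isfile(os.path.join(folder_path, name))
        )
    return namelist
```

Usage would be namelist = build_namelist(full_subdir_name).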

Related

If then loop exits without populating list

I'm trying to use os.walk to iterate through a series of folders, find the earliest Excel file in a given folder, and copy it to a temporary folder. This loop keeps exiting before it creates the matches list. I'm getting the error
NameError: name 'xlsxExt' is not defined
How can I get the loop to run through each file in the expFolder, create one 'matches' list for each expFolder, and sort it? Any help would be appreciated!
path = r'C:\Users...'
tempFolder = os.mkdir(os.path.join(path, 'tempFolder'))
for dataFolder, expFolder, file in os.walk(path):
    print("Starting", dataFolder)
    matches = []
    for file_name in file:
        if file_name.endswith('.xlsx'):
            xlsxExt = os.path.join(dataFolder, file_name)
    matches.append(xlsxExt) #breaks here
    sortedMatches = sorted(matches, key=os.path.getmtime)
    first = sortedMatches[0][0:]
    print(first)
I think there are two issues with your code.
The first is that your append call is happening outside of the inner loop. This means that you only append the last value you saw from the loop, not all the values that have the right extension.
The second issue is that you don't properly deal with folders that have no matching files. This is the current cause of your exception, though the exact exception details would change if you move the append call as suggested above. If you avoid the NameError, you'd instead get an IndexError when you try to do sortedMatches[0][0:] when sortedMatches is empty.
I suggest something like this:
for dataFolder, expFolder, file in os.walk(path):
    print("Starting", dataFolder)
    matches = []
    for file_name in file:
        if file_name.endswith('.xlsx'):
            xlsxExt = os.path.join(dataFolder, file_name)
            matches.append(xlsxExt) # indent this line!
    if matches: # add this, ignores empty lists
        sortedMatches = sorted(matches, key=os.path.getmtime) # indent the rest
        first = sortedMatches[0] # you don't need to slice here
        print(first)
As I commented, you don't need the slice when getting first from sortedMatches. You were just copying the whole filename string, which is entirely unnecessary. You don't need to make that change, though; it's unrelated to your errors.
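If only the earliest file per folder is needed, min() with the same key avoids sorting the whole list. A minimal sketch, assuming "earliest" means the oldest modification time (os.path.getmtime), as in the code above; earliest_xlsx_per_folder is a hypothetical name. Each resulting path could then be copied into the temporary folder, for example with shutil.copy2.

```python
import os


def earliest_xlsx_per_folder(path):
    """Return {folder: oldest .xlsx path} for every folder under path.

    Hypothetical helper: "earliest" means smallest os.path.getmtime value.
    Folders without any .xlsx file are skipped.
    """
    earliest = {}
    for data_folder, _subfolders, files in os.walk(path):
        matches = [
            os.path.join(data_folder, name)
            for name in files
            if name.endswith('.xlsx')
        ]
        if matches:  # avoids the IndexError on folders with no matches
            earliest[data_folder] = min(matches, key=os.path.getmtime)
    return earliest
```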
You are initialising the variable xlsxExt inside the nested for/for/if block. You get this error because that assignment never runs before you append xlsxExt to the list, which means that either your inner for loop or your if condition is not being executed (for example, in a folder that contains no .xlsx files). Always initialise such variables outside your conditional statements.

Rename directory with constantly changing name

I created a script that is supposed to download some data and then run a few processes. The data source (ArcGIS Online) always downloads the data as a zip file, and when it is extracted the folder name is a series of letters and numbers. I noticed that this name occasionally changes (not entirely sure why). My thought is to run os.listdir to get the folder name and then rename it. Where I run into issues is that the list returns the folder name with brackets and quotes: it comes back as ['f29a52b8908242f5b1f32c58b74c063b.gdb'], while the folder in File Explorer does not have the brackets and quotes. Below is my code and the error I receive.
from zipfile import ZipFile
file_name = "THDNuclearFacilitiesBaseSandboxData.zip"
with ZipFile(file_name) as zip:
    # unzipping all the files
    print("Unzipping "+ file_name)
    zip.extractall("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
    print('Unzip Complete')
#removes old zip file
os.remove(file_name)
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(str(x), "Test.gdb")
Output:
FileNotFoundError: [WinError 2] The system cannot find the file specified: "['f29a52b8908242f5b1f32c58b74c063b.gdb']" -> 'Test.gdb'
I'm relatively new to python scripting, so if there is an easier alternative, that would be great as well. Thanks!
os.listdir() returns a list of the files and folders that are in a directory.
Lists are represented, when printed to the screen, using a set of brackets.
The name of each file is a string of characters, and strings are represented, when printed to the screen, using quotes.
So we are seeing a list with a single filename:
['f29a52b8908242f5b1f32c58b74c063b.gdb']
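The difference between converting the list to text and indexing into it can be seen without touching the file system; the list below is just the value from the error message typed in by hand.

```python
# The value os.listdir() returned in the question, written out by hand
x = ['f29a52b8908242f5b1f32c58b74c063b.gdb']

# str() turns the whole list into text, brackets and quotes included,
# which is exactly the name that os.renames() could not find
as_text = str(x)
print(as_text)  # ['f29a52b8908242f5b1f32c58b74c063b.gdb']

# Indexing pulls out the single string stored in the list
first = x[0]
print(first)  # f29a52b8908242f5b1f32c58b74c063b.gdb
```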
To access an item within a list in Python, you can use index notation (which happens to also use brackets, to tell Python which item in the list to use by referencing the index, or number, of the item).
Python list indexes start at zero, so to get the first (and in this case only) item in the list, you can use x[0].
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(x[0], "Test.gdb")
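Having said that, I would generally not use x as a variable name in this case. Also, os.listdir() returns bare names rather than full paths, so unless your script runs from inside the Data folder, you need to join the name back onto the folder path. I might write the code a bit differently:
data_dir = "C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data"
files = os.listdir(data_dir)
os.renames(os.path.join(data_dir, files[0]), os.path.join(data_dir, "Test.gdb"))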
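Since the extracted folder name changes, it may be safer not to assume that files[0] is the right entry. A sketch, assuming the zip always extracts exactly one *.gdb folder into the Data folder; rename_extracted_gdb is a hypothetical helper, and it uses plain os.rename because no intermediate folders need to be created.

```python
import os


def rename_extracted_gdb(data_dir, new_name="Test.gdb"):
    """Rename the single extracted .gdb folder in data_dir to new_name.

    Hypothetical helper: assumes exactly one *.gdb folder (other than
    new_name itself) is present, and raises if it finds zero or several.
    """
    gdb_folders = [
        name for name in os.listdir(data_dir)
        if name.endswith(".gdb")
        and name != new_name
        and os.path.isdir(os.path.join(data_dir, name))
    ]
    if len(gdb_folders) != 1:
        raise RuntimeError(
            f"expected exactly one .gdb folder in {data_dir!r}, found {gdb_folders}"
        )
    # listdir gives bare names, so rebuild full paths for both sides
    old_path = os.path.join(data_dir, gdb_folders[0])
    new_path = os.path.join(data_dir, new_name)
    os.rename(old_path, new_path)
    return new_path
```

If a Test.gdb from an earlier run is still in the folder, os.rename will typically fail, so that copy would need to be removed first (for example with shutil.rmtree).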
Square brackets indicate a list. Try x[0]; that gives you just the string, without the brackets.
The list returned by listdir may contain only one entry or a whole bunch of them.

Looping through files using lists

I have a folder (pseudo directory /usr/folder/) containing files that look like this:
target_07750_20181128.tsv.gz
target_07750_20181129.tsv.gz
target_07751_20181130.tsv.gz
target_07751_20181203.tsv.gz
target_07751_20181204.tsv.gz
target_27103_20181128.tsv.gz
target_27103_20181129.tsv.gz
target_27103_20181130.tsv.gz
I am trying to join the above tsv files to one xlsx file on store code (found in the file names above).
I am reading a file, say file.xlsx, into a pandas DataFrame.
I have extracted store codes from file.xlsx so I have the following:
stores = instore.store_code.astype(str).unique()
output:
07750
07751
27103
So my end goal is to loop through each store in stores and find which filenames in the directory correspond to it. Here is what I have so far, but I can't seem to get the proper filenames to print:
import os
for store in stores:
    print(store)
    if store in os.listdir('/usr/folder/'):
        print(os.listdir('/usr/folder/'))
The output I'm expecting to see for say store_code in loop = '07750' would be:
07750
target_07750_20181128.tsv.gz
target_07750_20181129.tsv.gz
Instead I'm only seeing the store codes returned:
07750
07751
27103
What am I doing wrong here?
The reason your if statement fails is that it checks if "07750" etc is one of the filenames in the directory, which it is not. What you want is to see if "07750" is contained in one of the filenames.
I'd go about it like this:
import os
from collections import defaultdict

store_files = defaultdict(list)
for filename in os.listdir('/usr/folder/'):
    # 'target_07750_20181128.tsv.gz'.split('_') -> ['target', '07750', '20181128.tsv.gz']
    store_number = filename.split('_')[1]
    store_files[store_number].append(filename)
Now store_files will be a dictionary with a list of filenames for each store number.
The problem is that you're assuming a substring search -- that's not how in works on a list. For instance, on the first iteration, your if looks like this:
if "07750" in ["target_07750_20181128.tsv.gz",
               "target_07750_20181129.tsv.gz",
               "target_07751_20181130.tsv.gz",
               ... ]:
The string "07750" is not an element of that list. It does appear as a substring of some elements, but in doesn't work that way on a list. Instead, try this:
for filename in os.listdir('/usr/folder/'):
    if '_' + store + '_' in filename:
        print(filename)
Does that help?
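Putting both answers together, here is a sketch that contrasts in on a list with in on a string and then collects the matching files for every store at once. files_for_stores is a hypothetical helper; the filenames list is typed in from the question, and in real use it would come from os.listdir('/usr/folder/').

```python
def files_for_stores(stores, filenames):
    """Map each store code to the filenames that contain it as _<code>_.

    Hypothetical helper: assumes names follow target_<store>_<date>.tsv.gz.
    """
    return {
        store: sorted(name for name in filenames if '_' + store + '_' in name)
        for store in stores
    }


filenames = [
    "target_07750_20181128.tsv.gz",
    "target_07750_20181129.tsv.gz",
    "target_07751_20181130.tsv.gz",
    "target_27103_20181128.tsv.gz",
]

# `in` on a list looks for an exact element; `in` on a string finds substrings
print("07750" in filenames)     # False
print("07750" in filenames[0])  # True

matches = files_for_stores(["07750", "07751"], filenames)
print(matches["07750"])
```

Wrapping the code in underscores keeps a code such as 0775 from matching 07750.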

How can I make an array of subdirectories in Python?

I want to make an array of subdirectories in Python. Here is an example layout and model list I would like to obtain.
Root
  |
directories
  /        \
subdir_1 ... subdir_n
Hence from the root, I would like to run a program that will make a list of all the subdirectories. That way if I were to write:
print(List_of_Subdirectories)
Where List_of_Subdirectories is the list of appended directories, I would obtain the output:
[subdir_1, subdir_2, ... , subdir_n]
In essence, I would like to achieve the same results as if I were to hard code every directory into a list. For example:
List_of_Subdirectories = ["subdir_1", "subdir_2", ... , "subdir_n"]
Where subdir_n denotes an arbitrary nth directory.
Unlike other posts here on Stack Overflow, I would like the list to contain just the directory names, without tuples or paths.
If you just want the directory names, you can use os.walk to do this:
os.walk(directory)
will yield a 3-tuple (dirpath, dirnames, filenames) for each directory it visits. The first entry is the path of that directory and the second is the list of its immediate subdirectory names, which is what the function below collects. You can wrap this in a function to simply return the list of directory names like so:
import os

def list_paths(path):
    directories = [x[1] for x in os.walk(path)]
    non_empty_dirs = [x for x in directories if x]  # filter out empty lists
    return [item for subitem in non_empty_dirs for item in subitem]  # flatten the list
This should give you the names of all the subdirectories below path, at every depth (not only the top level).
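The same flattening can be written as a plain loop over the (dirpath, dirnames, filenames) tuples. Extending a list with an empty list does nothing, so no separate filter for empty lists is needed. all_subdir_names is a hypothetical name; as with list_paths, nested folder names are mixed in with top-level ones, and the same name can appear more than once if different folders contain subfolders with that name.

```python
import os


def all_subdir_names(path):
    """Collect the names (not paths) of every subdirectory below path, at any depth."""
    names = []
    for _dirpath, dirnames, _filenames in os.walk(path):
        # dirnames holds the immediate subdirectory names of _dirpath
        names.extend(dirnames)
    return names
```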
If all you want is a list of the subdirectories of a specified directory, all you need is os.listdir and a filter to keep only the directories.
It's as simple as:
List_of_Subdirectories = list(filter(os.path.isdir, os.listdir()))
print(List_of_Subdirectories)
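The return from os.listdir is a list containing the names of all the entries in the specified directory (or . by default), both directories and files. We keep only the directories using os.path.isdir. Because these are bare names rather than paths, os.path.isdir checks them relative to the current working directory, so this works as written for the current directory; for another directory, join each name with that directory's path first. Then, as you want a list, we explicitly convert the filtered result.
In Python 3, filter() returns a lazy iterator rather than a list, so printing it directly shows something like <filter object at 0x...> instead of the names, but you can still iterate over it. The snippet below achieves the same result as the one above.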
directory_elements = filter(os.path.isdir, os.listdir())
List_of_Subdirectories = []
for element in directory_elements:
    List_of_Subdirectories.append(element)
print(List_of_Subdirectories)
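For a directory other than the current one, os.scandir() sidesteps the relative-name problem, because each entry it yields can report whether it is a directory on its own. A minimal sketch; list_subdirectories is a hypothetical name, and entry.is_dir() follows symbolic links by default.

```python
import os


def list_subdirectories(path="."):
    """Return the names of the immediate subdirectories of path, sorted."""
    with os.scandir(path) as entries:
        # entry.name is the bare name; entry.is_dir() knows the full path
        return sorted(entry.name for entry in entries if entry.is_dir())
```

print(list_subdirectories("some/other/folder")) would then print just the names, for example ['subdir_1', 'subdir_2'].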

Automatically find files that start with similar strings (and find these strings) using Python

I have a directory with a number of files in a format similar to this:
"ABC_01.dat", "ABC_02.dat", "ABC_03-08.dat", "DEF_13.dat", "DEF_14.dat", "DEF_16.dat", "GHI_09.dat", "GHI_12-14.dat"
etc., you get the idea. Essentially, what I want to do is merge all files whose names start with the same prefix. At the moment, I do this by manually setting a variable names = ["ABC", "DEF", "GHI"], iterating over it (for name in names) and getting the respective filenames using glob.glob(name + "*.dat"). The merging step is later done using pandas. I don't just need the names/prefixes for finding the files; they are also used later in my script to set the output files' names.
Is there a way I can automatically generate the variable names if I know that the files are all in the format name_*.dat?
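Consider this:
from glob import glob
names = {name.rpartition('_')[0] for name in glob('*_*.dat')}
This will get all unique prefixes before the last '_'. You will also want to set the correct path in glob() before matching; if the pattern includes a directory, that directory will be part of each prefix too.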
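You can do this:
import glob
fileNames = sorted(glob.glob("*_*.dat"))
result = [[f for f in fileNames if f.startswith(sn + '_')] for sn in set(i.split('_')[0] for i in fileNames)]
print(result)
output (the order of the groups may vary, because the prefixes come from a set):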
[['ABC_01.dat', 'ABC_02.dat', 'ABC_03-08.dat'], ['GHI_09.dat', 'GHI_12-14.dat'], ['DEF_13.dat', 'DEF_14.dat', 'DEF_16.dat']]
Now, all files from result[0] are to be merged; similarly for result[1],...
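Since the prefixes are also needed later for the output file names, a dictionary that keeps each prefix attached to its files may be handier than a list of lists. A sketch, assuming the prefix is everything before the first underscore of the file's base name; group_by_prefix is a hypothetical name and the pandas merge step is left out.

```python
import os
from collections import defaultdict


def group_by_prefix(filenames):
    """Group names like 'ABC_01.dat' under their prefix ('ABC').

    Assumes the prefix is everything before the first '_' of the base name,
    so a directory part in the path does not leak into the prefix.
    """
    groups = defaultdict(list)
    for name in sorted(filenames):
        prefix = os.path.basename(name).split('_', 1)[0]
        groups[prefix].append(name)
    return dict(groups)


# In the real script the names would come from the directory, e.g.
# groups = group_by_prefix(glob.glob("*_*.dat"))
names = ["ABC_01.dat", "ABC_02.dat", "ABC_03-08.dat", "DEF_13.dat",
         "DEF_14.dat", "DEF_16.dat", "GHI_09.dat", "GHI_12-14.dat"]
groups = group_by_prefix(names)
for prefix, files in groups.items():
    # prefix can be reused for the merged output file's name
    print(prefix, files)
```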
