I created a script that is supposed to download some data and then run a few processes. The data source (ArcGIS Online) always delivers the data as a zip file, and when it is extracted the folder name is a series of letters and numbers. I noticed that this name occasionally changes (not entirely sure why). My thought is to run os.listdir to get the folder name and then rename it. Where I run into issues is that the list returns the folder name with brackets and quotes: it comes back as ['f29a52b8908242f5b1f32c58b74c063b.gdb'], while the folder in the file explorer does not have the brackets and quotes. Below is my code and the error I receive.
import os
from zipfile import ZipFile
file_name = "THDNuclearFacilitiesBaseSandboxData.zip"
with ZipFile(file_name) as zip:
    # unzipping all the files
    print("Unzipping "+ file_name)
    zip.extractall("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
    print('Unzip Complete')
# removes old zip file
os.remove(file_name)
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(str(x), "Test.gdb")
Output:
FileNotFoundError: [WinError 2] The system cannot find the file specified: "['f29a52b8908242f5b1f32c58b74c063b.gdb']" -> 'Test.gdb'
I'm relatively new to Python scripting, so if there is an easier alternative, that would be great as well. Thanks!
os.listdir() returns a list of the files and folders that are in a directory.
Lists are represented, when printed to the screen, using a set of square brackets.
The name of each file is a string of characters, and strings are represented, when printed to the screen, using quotes.
So we are seeing a list with a single filename:
['f29a52b8908242f5b1f32c58b74c063b.gdb']
To access an item within a list in Python, you can use index notation (which also happens to use brackets, this time to tell Python which item in the list to use by its index, or position number).
Python list indexes start at zero, so to get the first (and in this case only) item in the list, you can use x[0].
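data_dir = "C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data"
x = os.listdir(data_dir)
# listdir returns bare names, so join them with the folder before renaming
os.renames(os.path.join(data_dir, x[0]), os.path.join(data_dir, "Test.gdb"))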
Having said that, I would generally not use x as a variable name in this case... I might write the code a bit differently:
data_dir = "C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data"
files = os.listdir(data_dir)
os.renames(os.path.join(data_dir, files[0]), os.path.join(data_dir, "Test.gdb"))
Square brackets indicate a list. Try x[0]; that should get rid of the brackets and give you just the folder name.
The return value from listdir may be a list with only one value or a whole bunch, so check what it contains before relying on x[0].
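Putting the answers above together, here is a minimal sketch rather than a definitive fix. It assumes the extraction folder is expected to contain exactly one .gdb entry; the helper name rename_single_gdb is made up for illustration.

```python
import os


def rename_single_gdb(data_dir, new_name="Test.gdb"):
    """Rename the only .gdb entry in data_dir and return its new full path.

    Hypothetical helper: assumes data_dir holds exactly one *.gdb entry.
    """
    # os.listdir returns a list of bare names (strings)
    candidates = [name for name in os.listdir(data_dir) if name.endswith(".gdb")]
    if len(candidates) != 1:
        raise RuntimeError(f"expected exactly one .gdb entry, found {candidates}")
    # Join with the folder so the rename does not depend on the working directory
    old_path = os.path.join(data_dir, candidates[0])
    new_path = os.path.join(data_dir, new_name)
    os.rename(old_path, new_path)
    return new_path
```

Raising an error when the count is not one is a design choice: it avoids silently renaming the wrong entry if the folder ever holds leftovers from an earlier run.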
Related
I have a directory with files like data_Mon_15-8-22.csv, data_Tue_16-8-22.csv, data_Mon_22-8-22.csv, etc., and I am trying to delete all but the Monday files. However, my script doesn't seem to differentiate between the filenames and just deletes everything, even though I check the name first. Where did I go wrong? Any help would be much appreciated!
My Code:
def file_delete():
    directory = pathlib.Path('/Path/To/Data')
    for file in directory.glob('data_*.csv'):
        if file != 'data_Mon_*.csv':
            os.remove(file)
If all Monday files start with "data_Mon_", then you might use str.startswith:
import os
import pathlib

def file_delete():
    directory = pathlib.Path('/Path/To/Data')
    for file in directory.glob('data_*.csv'):
        if not file.name.startswith('data_Mon_'):
            os.remove(file)
if file != 'data_Mon_*.csv'
There are two problems here:
file is compared against the string 'data_Mon_*.csv'. Since file is a pathlib.Path object rather than a string, the two objects are never equal, so the != condition is always true. To fix this, compare the file's name (file.name) rather than the Path object itself.
Even if you fix this, the string 'data_Mon_*.csv' is a literal: the * is just the character *. Unlike in directory.glob('data_*.csv'), it does not match "anything" as it would in a glob expression. To fix this, you need real pattern matching applied to the file name, for example fnmatch.fnmatch, Path.match, or a regular expression.
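Both problems can be addressed at once with the standard-library fnmatch module. This is a sketch under the assumption that Monday files follow the data_Mon_*.csv pattern from the question; the function name delete_non_monday and its return value are made up for illustration.

```python
import fnmatch
import os
import pathlib


def delete_non_monday(directory):
    """Delete data_*.csv files in directory unless they match data_Mon_*.csv.

    Returns the sorted names of the files that were deleted.
    """
    deleted = []
    for file in pathlib.Path(directory).glob("data_*.csv"):
        # Compare the name (a str), not the Path object, and let fnmatchcase
        # expand the * wildcard instead of treating it as a literal character
        if not fnmatch.fnmatchcase(file.name, "data_Mon_*.csv"):
            os.remove(file)
            deleted.append(file.name)
    return sorted(deleted)
```

Returning the deleted names makes it easy to do a dry check on a copy of the data before pointing the function at the real directory.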
I'm trying to create a list of Excel files that are saved to a specific directory, but when the list is generated it contains a duplicate entry for one of the file names (I am absolutely certain there is not actually a duplicate of the file).
import glob
# get data file names
path = r'D:\larvalSchooling\data'
filenames = glob.glob(path + "/*.xlsx")
output:
>>> filenames
['D:\\larvalSchooling\\data\\copy.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_70dpf_GroupA_n5_20200808_1015-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx']
You'll note 'D:\larvalSchooling\data\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx' is listed twice.
Rather than going through after the fact and removing duplicates, I was hoping to figure out why it's happening to begin with.
I'm using Python 3.7 on Windows 10 Pro.
If you wrote the code to remove duplicates (which can be as simple as filenames = set(filenames)) you'd see that you still have two filenames. Print them out one on top of the other to make a visual comparison easier:
'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx',
'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx'
The second one has a leading ~$ (probably a temporary file that Excel creates while the workbook is open).
Whenever you open an Excel file, Excel creates a small hidden companion file next to it whose name starts with ~$. It acts as a temporary lock (owner) file for that specific workbook. In this case:
Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx
~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx
This means that the workbook is open in some program, and glob is picking up that temporary file as well (it is usually hidden in Explorer).
Just find the program that has the workbook open and close it. If you want to avoid this in general, also add a check so that files whose names start with "~$" are ignored.
You can use os.path.splitext to get the file extension and loop through the directory using os.listdir. The open Excel files can be skipped using the following code:
import os
path = r'D:\larvalSchooling\data'
filenames = []
for file in os.listdir(path):
    filename, file_extension = os.path.splitext(file)
    if file_extension == '.xlsx':
        if not file.startswith('~$'):
            # listdir returns bare names, so join with the folder for a full path
            filenames.append(os.path.join(path, file))
Note: this might not be the best solution, but it'll get the job done :)
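The same filter can also be applied to the glob call from the question. A minimal sketch, assuming the only unwanted entries are the ones whose file name starts with ~$; the function name list_workbooks is made up for illustration.

```python
import glob
import os


def list_workbooks(folder):
    """Return sorted full paths of *.xlsx files in folder, skipping ~$ files."""
    pattern = os.path.join(folder, "*.xlsx")
    return sorted(
        path
        for path in glob.glob(pattern)
        # Check the file name only, not the folder part of the path
        if not os.path.basename(path).startswith("~$")
    )
```

Checking os.path.basename matters here because glob returns full paths, and those start with the folder rather than with ~$.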
I am trying to import several "dat" files into Python (Spyder).
dat_file_list_images
The dat files are listed here; there are two types in the list, ending with _1 and _2. I want to import only the dat files ending with "_1".
Is there any way to import them all at once with a single loop?
After I import them, I would like to aggregate them all into a single matrix.
import os
files_to_import = [f for f in os.listdir(folder_path)
                   if f.endswith("1")]
Make sure that you know whether the files have a .dat extension or not. In Windows Explorer, the default setting is to hide file extensions, and this code will not match anything if the real names end with ".dat" rather than "1".
What this code does is called a list comprehension: os.listdir() provides all the names in the folder, and you create a list with only the ones that end with "1".
Use str.endswith(): it returns True if the string ends with the given suffix.
According to this website
Syntax: str.endswith(suffix[, start[, end]])
For your case:
You will need a loop that gets each filename as a string and checks whether it ends with "_1".
yourFilename = "yourfilename_1"
if yourFilename.endswith("_1"):
    # do your job here
    pass
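If the files do carry a .dat extension, the check has to look at the name without the extension. A minimal sketch under that assumption; the function name select_dat_files is made up for illustration.

```python
from pathlib import Path


def select_dat_files(folder, suffix="_1"):
    """Return sorted paths of *.dat files whose name before .dat ends with suffix."""
    return sorted(
        p for p in Path(folder).glob("*.dat")
        # p.stem is the file name without its final extension
        if p.stem.endswith(suffix)
    )
```

Reading the selected files and stacking them into one matrix is left out, since the layout of the dat files is not shown in the question.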
I have a zip file downloaded from a website. I want to write a script that renames the zip file and, before unzipping, checks how many files are in it, then unzips it.
The problem is that the zip file is in the directory, but I keep getting this error:
'FileNotFoundError: [Errno 2] No such file or directory: 'filename.zip''
I assumed that it might be caused by the file name, because I use Ubuntu and when I downloaded the file the name was garbled because it was not in English. So I changed it to numbers (ex: 20176), but I am still getting this error.
my script
path means absolute path.
data_type = '{}{}'.format('201706', '.zip')
filename = [i for i in os.listdir('user/directory')]
filename.sort(key=lambda ctime: ctime[0])
downloaded = str(filename[0])
old = os.path.join('user/directory', downloaded)
new = os.path.join('user/directory', data_type)
os.rename(old, new)
zip = ZipFile(data_type)
archived_files = zip.namelist()
amount = len(archived_files)
Let's suppose the first filename in the sorted list is myfile.txt. Your code
old = os.path.join('user/directory', downloaded)
new = os.path.join('user/directory', data_type)
os.rename(old, new)
renames the first file in the directory listing, user/directory/myfile.txt (which, given the sort key discussed below, is not necessarily the oldest one), to user/directory/201706.zip. The next statement then tries to open 201706.zip, a bare file name that is looked up in the current working directory, where of course it doesn't exist. It should work if you try
zip = ZipFile(new)
Unfortunately there's no guarantee that the file actually will be a zipfile, so the operation may fail.
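A minimal sketch of the corrected sequence, with a guard for the "not actually a zip file" case. The function name rename_and_count is made up for illustration, and raising ValueError is just one possible way to report the problem.

```python
import os
import zipfile


def rename_and_count(old_path, new_path):
    """Rename old_path to new_path and return how many members the zip holds.

    Hypothetical helper: raises ValueError if the renamed file is not a zip.
    """
    os.rename(old_path, new_path)
    # Guard against the renamed file not being a zip archive at all
    if not zipfile.is_zipfile(new_path):
        raise ValueError(f"{new_path} is not a zip archive")
    # Open the archive by the same full path that was used for the rename
    with zipfile.ZipFile(new_path) as archive:
        return len(archive.namelist())
```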
Some other points to consider, perhaps in other questions:
I suspect you are misunderstanding the sorting functions: although you appear to want to sort the list of filenames on creation time, just calling a lambda's parameter ctime doesn't mean that Python will understand your needs. If only programming languages had a DWIM ("do what I mean") mode, life would be so much easier!
The key argument to sort is a function that the sort function calls once for each value to be sorted, expecting it to return a "sort key" (that is, a value that represents its place in the required ordering). Suppose I take your lambda and apply it to a filename:
In [1]: ruth_lambda = lambda ctime: ctime[0]
In [2]: ruth_lambda("MY_FILENAME.TXT")
Out[2]: 'M'
You can see that you are sorting on the first character of the filename, and I doubt that's what you really want. But we can go into that later.
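For reference, a minimal sketch of sorting on a timestamp instead. It uses os.path.getmtime (last modification time) as the key, on the assumption that this is close enough to "when the file was downloaded"; os.path.getctime also exists, but its meaning differs between platforms. The function name files_oldest_first is made up for illustration.

```python
import os


def files_oldest_first(directory):
    """Return full paths of the entries in directory, oldest modification first."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    # The key function is called once per path and returns its timestamp
    return sorted(paths, key=os.path.getmtime)
```

The paths are joined with the directory before sorting because os.path.getmtime needs a path it can actually find, not just the bare name.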
A quick note on formatting: it would have been simpler to write
data_type = '{}{}'.format(zipfile_name, '.zip')
as
data_type = '{}.zip'.format(zipfile_name)
Finally, since os.listdir returns a list, instead of
filename = [i for i in os.listdir('user/directory')]
it's simpler to write
filename = os.listdir('user/directory')
though the extra computation will do no harm. As a beginner you will find that as you improve, your older code starts to look really clunky - don't worry about that, it's a common experience! Just move forwards and try not to repeat old mistakes.
I made a script in the past to mass rename any file greater than x characters in a directory. That script took a source directory which you had to enter manually. Any file that was over x characters in that directory would be stripped of its extension and renamed, then the extension would be re-added, and os.path.join would be used to join the source directory and the newly created filename+ext. I'm now making another script and used os.path.join("Folder in the current dir", "file in that dir"). Because this worked, I'm guessing that when os.path.join is called with just a folder name and no full path in its first parameter, it starts its search from the directory that the script was run in? Just wondering if this is correct.
os.path.join has nothing to do with any actual filesystem, and does not "start" anywhere. It simply joins two arbitrary paths, whether they exist or not.
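What os.path.join does is just join path elements in a system-compatible way, taking the particular directory separator character, etc., into account. It's a simple string manipulation tool.
So the returned result simply starts from whatever you give it as the first argument. If that is a relative path, it is only resolved when you pass the result to something that actually touches the filesystem (open, os.rename, and so on), and then it is interpreted relative to the process's current working directory, which is not necessarily the folder containing the script.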
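A short sketch that illustrates the point. The folder and file names are placeholders and are assumed not to exist on disk.

```python
import os

# Joining is pure string work: neither name has to exist on disk
joined = os.path.join("no_such_folder", "no_such_file.txt")

# A relative path only gets a starting point when it is resolved,
# and that starting point is the current working directory
resolved = os.path.abspath(joined)

print(joined)
print(resolved)
print(os.path.exists(joined))
```

Running the script from a different directory changes the second printed line but not the first, which is why the earlier script appeared to "start" from wherever it was launched.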