python : unzipping error 'no such file or directory' - python

I have zip file downloaded from website. I wanted to make script that rename zip file and before unzip, it checks how many files are in it and unzip it.
The problem is that zip file is in the directory but it keep giving me error that
'FileNotFoundError: [Errno 2] No such file or directory: 'filename.zip''
I assumed that It might be caused by file name because I use ubuntu and when I downloaded the file, the name was broken because it was not English. so I changed it into numbers (ex:20176) but still getting this error.
my script
path means absolute path.
data_type = '{}{}'.format('201706', '.zip')
filename = [i for i in os.listdir('user/directory')]
filename.sort(key=lambda ctime: ctime[0])
downloaded = str(filename[0])
old = os.path.join('user/directory', downloaded)
new = os.path.join('user/directory', data_type)
os.rename(old, new)
zip = ZipFile(data_type)
archived_files = zip.namelist()
amount = len(archived_files)

Let's suppose the first filename in the sorted list is myfile.txt. Your code
old = os.path.join('user/directory', downloaded)
new = os.path.join('user/directory', data_type)
os.rename(old, new)
renames the first file in the directory listing, user/directory/myfile.txt (not, due to the considerations above, the oldest one) to user/directory/201706.zip. The next statement then tries to open 2010706.zip, which of course doesn't exist. It should work if you try
zip = ZipFile(new)
Unfortunately there's no guarantee that the file actually will be a zipfile, so the operation may fail.
Some other points to consider, perhaps in other questions:
I suspect you are misunderstanding the sorting functions: although you appear to want to sort the list of filenames on creation time, just calling a lambda's parameter ctime doesn't mean that Python will understand your needs. If only programming languages had a DWIM ("do what I mean") mode, life would be so much easier!
The key argument to sort is a function that the sort function calls once for each value to be sorted, expecting it to return a "sort key" (that is, a value that represents its place in the required ordering). Suppose I take your lambda and apply it to a filename:
In [1]: ruth_lambda = lambda ctime: ctime[0]
In [2]: ruth_lambda("MY_FILENAME.TXT")
Out[2]: 'M'
You can see that you are sorting on the first character of the filename, and I doubt that's what you really want. But we can go into that later.
A quick note on formatting: it would have been simpler to write
data_type = '{}{}'.format(zipfile_name, '.zip')
as
data_type = '{}.zip'.format(zipfile_name)
Finally, since os.listdir returns a list, instead of
filename = [i for i in os.listdir('user/directory')]
it's simpler to write
filename = os.listdir('user/directory')
though the extra computation will do no harm. As a beginner you will find that as you improve, your older code starts to look really clunky - don't worry about that, it's a common experience! Just move forwards and try not to repeat old mistakes.

Related

Duplicate in list created from filenames (python)

I'm trying to create a list of excel files that are saved to a specific directory, but I'm having an issue where when the list is generated it creates a duplicate entry for one of the file names (I am absolutely certain there is not actually a duplicate of the file).
import glob
# get data file names
path =r'D:\larvalSchooling\data'
filenames = glob.glob(path + "/*.xlsx")
output:
>>> filenames
['D:\\larvalSchooling\\data\\copy.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_70dpf_GroupA_n5_20200808_1015-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx']
you'll note 'D:\larvalSchooling\data\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx' is listed twice.
Rather than going through after the fact and removing duplicates I was hoping to figure out why it's happening to begin with.
I'm using python 3.7 on windows 10 pro
If you wrote the code to remove duplicates (which can be as simple as filenames = set(filenames)) you'd see that you still have two filenames. Print them out one on top of the other to make a visual comparison easier:
'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx',
'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx'
The second one has a leading ~ (probably an auto-backup).
Whenever you open an excel file it will create a ghost copy that works as a temporary backup copy for that specific file. In this case:
Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial1.xlsx
~$ Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial1.xlsx
This means that the file is open by some software and it's showing you that backup inside(usually that file is hidden from the explorer as well)
Just search for the program and close it. Other actions, such as adding validation so the "~$.*.xlsx" type of file is ignored should be also implemented if this is something you want to avoid.
You can use os.path.splittext to get the file extension and loop through the directory using os.listdir . The open excel files can be skipped using the following code:
filenames = []
for file in os.listdir('D:\larvalSchooling\data'):
filename, file_extension = os.path.splitext(file)
if file_extension == '.xlsx':
if not file.startswith('~$'):
filenames.append(file)
Note: this might not be the best solution, but it'll get the job done :)

Writing Data to File then Moving to Desktop

I'm sure this is really simple but I can't figure out how to make a parsed result into its own file then move it to my desktop using python. Here is my code so far. I just want to save the result "names" as its own file then move it to my desktop but I can't find the answer anywhere. Is this an uncommon practice?
from gedcom.element.individual import IndividualElement
from gedcom.parser import Parser
import os
import shutil
import pickle
# Path to your `.ged` file
file_path = '/Users/Justin/Desktop/Lienhard_Line.ged'
# Initialize the parser
gedcom_parser = Parser()
# Parse your file
gedcom_parser.parse_file(file_path, False)
root_child_elements = gedcom_parser.get_root_child_elements()
# Iterate through all root child elements
for element in root_child_elements:
# Is the `element` an actual `IndividualElement`? (Allows usage of extra functions such as `surname_match` and `get_name`.)
if isinstance(element, IndividualElement):
# Get all individuals whose surname matches "Doe"
if element.surname_match('Lienhard'):
# Unpack the name tuple
(first, last) = element.get_name()
names = (first + " " + last)
pickle.dumps(names)
Saving a file to one location and then moving it is to another not how it's usually done, no. Just save to the final location.
from pathlib import Path
pic = Path.home() / 'Desktop' / 'names.pickle'
with pic.open('w') as picklefile:
pickle.dump(names, picklefile)
The pathlib library makes working with file names somewhat easier than the old os.path API, though both are in common use.
Writing and then renaming has some uses; if you need to absolutely make sure your data is saved somewhere, saving it to a temporary file until it can be renamed to its final name is a fairly common technique. But in this case, saving to a different location first seems like it would only introduce brittleness.
The above assumes that you have a directory named Desktop in your home directory, as is commonly the default on beginner-oriented systems. To be perfectly robust, the code should create the directory if it doesn't already exist, or perhaps simply save to your home directory.
Indeed, a much better convention for most purposes is simply to always save in the current directory, and let the user take it from there. Hardcoding paths in another directory just adds confusion and uncertainty.
with open('names.pickle', 'w') as picklefile:
pickle.dump(names, picklefile)
This code creates a file called names.pickle in the invoking user's current working directory. The operating system provides the concept of a current working directory for precisely this purpose. Perhaps see also Difference between ./ and ~/ if you need more details about how this works.

Rename directory with constantly changing name

I created a script that is supposed to download some data, then run a few processes. The data source (being ArcGIS Online) always downloads the data as a zip file and when extracted the folder name will be a series of letters and numbers. I noticed that these occasionally change (not entirely sure why). My thought is to run an os.listdir to get the folder name then rename it. Where I run into issues is that the list returns the folder name with brackets and quotes. It returns as ['f29a52b8908242f5b1f32c58b74c063b.gdb'] as the folder name while folder in the file explorer does not have the brackets and quotes. Below is my code and the error I receive.
from zipfile import ZipFile
file_name = "THDNuclearFacilitiesBaseSandboxData.zip"
with ZipFile(file_name) as zip:
# unzipping all the files
print("Unzipping "+ file_name)
zip.extractall("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
print('Unzip Complete')
#removes old zip file
os.remove(file_name)
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(str(x), "Test.gdb")
Output:
FileNotFoundError: [WinError 2] The system cannot find the file specified: "['f29a52b8908242f5b1f32c58b74c063b.gdb']" -> 'Test.gdb'
I'm relatively new to python scripting, so if there is an easier alternative, that would be great as well. Thanks!
os.listdir() returns a list files/objects that are in a folder.
lists are represented, when printed to the screen, using a set of brackets.
The name of each file is a string of characters and strings are represented, when printed to the screen, using quotes.
So we are seeing a list with a single filename:
['f29a52b8908242f5b1f32c58b74c063b.gdb']
To access an item within a list using Python, you can using index notation (which happens to also use brackets to tell Python which item in the list to use by referencing the index or number of the item.
Python list indexes starting at zero, so to get the first (and in this case only item in the list), you can use x[0].
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(x[0], "Test.gdb")
Having said that, I would generally not use x as a variable name in this case... I might write the code a bit differently:
files = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(files[0], "Test.gdb")
Square brackets indicate a list. Try x[0] that should get rid of the brackets and be just the data.
The return from listdir may be a list with only one value or a whole bunch

How to read a file from a directory and convert it to a table?

I have a class that takes in positional arguments (startDate, endDate, unmappedDir, and fundCodes), I have the following methods:
The method below is supposed to take in a an array of fundCodes and look in a directory and see if it finds files matching a certain format
def file_match(self, fundCodes):
# Get a list of the files in the unmapped directory
files = os.listdir(self.unmappedDir)
# loop through all the files and search for matching fund code
for check_fund in fundCodes:
# set a file pattern
file_match = 'unmapped_positions_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate)
# look in the unmappeddir and see if there's a file with that name
if file_match in files:
# if there's a match, load unmapped positions as etl
return self.read_file(file_match)
else:
Logger.error('No file found with those dates/funds')
The other method is simply supposed to create an etl table from that file.
def read_file(self, filename):
loadDir = Path(self.unmappedDir)
for file in loadDir.iterdir():
print('*' *40)
Logger.info("Found a file : {}".format(filename))
print(filename)
unmapped_positions_table = etl.fromcsv(filename)
print(unmapped_positions_table)
print('*' * 40)
return unmapped_positions_table
When running it, I'm able to retrieve the filename:
Found a file : unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv
unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv
But when trying to create the table, I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv'
Is it expecting a full path to the filename or something?
The proximate problem is that you need a full pathname.
The filename that you're trying to call fromcsv on is passed into the function, and ultimately came from listdir(self.unmappedDir). This means it's a path relative to self.unmappedDir.
Unless that happens to also be your current working directory, it's not going to be a valid path relative to the current working directory.
To fix that, you'd want to use os.path.join(self.unmappedDir, filename) instead of just filename. Like this:
return self.read_file(os.path.join(self.unmappedDir), file_match)
Or, alternatively, you'd want to use pathlib objects instead of strings, as you do with the for file in loadDir.iterdir(): loop. If file_match is a Path instead of a dumb string, then you can just pass it to read_file and it'll work.
But, if that's what you actually want, you've got a lot of useless code. In fact, the entire read_file function should just be one line:
def read_file(self, path):
return etl.fromcsv(path)
What you're doing instead is looping over every file in the directory, then ignoring that file and reading filename, and then returning early after the first one. So, if there's 1 file there, or 20 of them, this is equivalent to the one-liner; if there are no files, it returns None. Either way, it doesn't do anything useful except to add complexity, wasted performance, and multiple potential bugs.
If, on the other hand, the loop is supposed to do something meaningful, then you should be using file rather than filename inside the loop, and you almost certainly shouldn't be doing an unconditional return inside the loop.
with this:
files = os.listdir(self.unmappedDir)
you're getting the file names of self.unmappedDir
So when you get a match on the name (when generating your name), you have to read the file by passing the full path (else the routine probably checks for the file in the current directory):
return self.read_file(os.path.join(self.unmappedDir,file_match))
Aside: use a set here:
files = set(os.listdir(self.unmappedDir))
so the filename lookup will be much faster than with a list
And your read_file method (which I didn't see earlier) should just open the file, instead of scanning the directory again (and returning at first iteration anyway, so it doesn't make sense):
def read_file(self, filepath):
print('*' *40)
Logger.info("Found a file : {}".format(filepath))
print(filepath)
unmapped_positions_table = etl.fromcsv(filepath)
print(unmapped_positions_table)
print('*' * 40)
return unmapped_positions_table
Alternately, don't change your main code (except for the set part), and prepend the directory name in read_file since it's an instance method so you have it handy.

Pythonic way to smart-rename files for record-keeping sake?

Using IronPython 2.6 (I'm new), I'm trying to write a program that opens a file, saves it at a series of locations, and then opens/manipulates/re-saves those. It will be run by an upper-level program on a loop, and this entire procedure is designed to catch/preserve corrupted saves so my company can figure out why this glitch of corruption occasionally happens.
I've currently worked out the Open/Save to locations parts of the script and now I need to build a function that opens, checks for corruption, and (if corrupted) moves the file into a subfolder (with an iterative renaming applied, for copies) or (if okay), modifies the file and saves a duplicate, where the process is repeated on the duplicate, sans duplication.
I tell this all for context to the root problem. In my situation, what is the most pythonic, consistent, and windows/unix friendly way to move a file (corrupted) into a subfolder while also renaming it based on the number of pre-existing copies of the file that exist within said subfolder?
In other words:
In a folder structure built as:
C:\Folder\test.txt
C:\Folder\Subfolder
C:\Folder\Subfolder\test.txt
C:\Folder\Subfolder\test01.txt
C:\Folder\Subfolder\test02.txt
C:\Folder\Subfolder\test03.txt
How to I move test.txt such that:
C:\Folder\Subfolder
C:\Folder\Subfolder\test.txt
C:\Folder\Subfolder\test01.txt
C:\Folder\Subfolder\test02.txt
C:\Folder\Subfolder\test03.txt
C:\Folder\Subfolder\test04.txt
In an automated way, so that I can loop my program overnight and have it stack up the corrupted text files I need to save? Note: They're not text files in practice, just example.
assuming you are going to use the convention of incrementally suffinxing numbers to the files:
import os.path
import shutil
def store_copy( file_to_copy, destination):
filename, extension = os.path.splitext( os.path.basename(file_to_copy)
existing_files = [i for in in os.listdir(destination) if i.startswith(filename)]
new_file_name = "%s%02d%s" % (filename, len(existing_files), extension)
shutil.copy2(file_to_copy, os.path.join(destination, new_file_name)
There's a fail case if you have subdirectories or files in destination whose names overlap with the source file, ie, if your file is named 'example.txt' and the destination containst 'example_A.txt' as well as 'example.txt' and 'example01.txt' If that's a possibility you'd have to change the test in the existing files = line to something more sophisticated

Categories

Resources