Extracting a particular file from a zip archive doesn't work in Python

I need to: 1) find a zip file in a particular directory; 2) if it exists, unzip it; 3) from its contents, find a specific file and move it to another directory.
import fnmatch
import os
from zipfile import ZipFile
def searchfile():
    for file in os.listdir('/user/adam/datafiles'):
        if fnmatch.fnmatch(file, 'abc.zip'):
            return True
    return False
if searchfile():
    print('File exists')
else:
    print('File not found')
def file_extract():
    os.chdir('/user/adam/datafiles')
    file_name = 'abc.zip'
    destn = '/user/adam/extracted_files'
    zip_archive = ZipFile(file_name)
    zip_archive.extract('class.xlsx', destn)
    print("Extracted the file")
    zip_archive.close()
file_extract()
When I execute the above script, there are no compile-time or runtime errors, but only the first function seems to work. When I check the extracted_files folder, I don't see the files.

So, for the sake of completeness, and since my comment solved your issue, I guess I should make it an answer:
In Python, if a function foo is defined (def foo(<...>):),
foo refers to the function itself and can be copied around (effectively copying a pointer), passed to other functions, and so on, like almost any other object;
foo() is a call to that function with no arguments.
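A minimal illustration of that difference (the function name greet is made up for this sketch): leaving off the parentheses only gives you another name for the function object, while adding them actually runs it.

```python
def greet():
    return "hello"

ref = greet        # no parentheses: just another name for the function object
result = greet()   # parentheses: the function body actually runs

print(ref is greet)  # True
print(result)        # hello
```

So a bare file_extract on its own line would evaluate the name and do nothing; only file_extract() performs the extraction.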
As this question does not seem to be an assignment, I will add the following:
To improve your code, you might want to look into:
Parameters to functions (your functions can currently only do one single thing; for example, you could pass the file and directory names to searchfile);
os.path and all its content;
The in operator for checking whether an object is in a container;
The with statement for clearer and safer handling of objects such as ZipFile instances;
The x if b else y construct (conditional expression);
Notice that even if the archive does not exist, your code still attempts to extract a file from it.
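As a rough sketch of the first, third and last suggestions combined (the names archive_exists and describe are hypothetical, not from the question), the search function could take its inputs as parameters and use the in test instead of a loop with fnmatch:

```python
import os

def archive_exists(directory, archive_name):
    """Return True if archive_name is an entry directly inside directory."""
    # The `in` test replaces the manual loop + fnmatch from the question.
    return archive_name in os.listdir(directory)

def describe(directory, archive_name):
    # Conditional expression: x if b else y
    return 'File exists' if archive_exists(directory, archive_name) else 'File not found'
```

Usage would then be something like print(describe('/user/adam/datafiles', 'abc.zip')), and the same function works for any other directory or archive name.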
Here is a more robust way to implement what you want:
import os
import zipfile
arch_name, file_name = 'abc.zip', 'class.xlsx'
home_dir = os.path.join(os.path.abspath(os.sep), 'user', 'adam')
# or even better: home_dir = os.path.expanduser('~')
arch_dir = os.path.join(home_dir, 'datafiles')
dest_dir = os.path.join(home_dir, 'extracted_files')
arch_path = os.path.join(arch_dir, arch_name)
if os.path.isfile(arch_path):
    print('File {} exists'.format(arch_path))
    with zipfile.ZipFile(arch_path) as archive:
        archive.extract(file_name, dest_dir)
    print('Extracted {} from {}'.format(file_name, arch_name))
else:
    print('File {} not found'.format(arch_path))
Disclaimer: This code is untested and could contain minor mistakes!
Notice how the second half of the code works with generic variables that can easily be modified in a single place in the first half. Also, notice the improved readability of if os.path.isfile(arch_path): as opposed to if searchfile(): (requiring us to then read the implementation of searchfile).
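Since extracting straight into dest_dir already places the file in the target directory, step 3 of the question needs no separate move. One possible refinement, sketched below under the assumption that a missing member should be reported rather than raise (the helper name extract_member is made up), is to check ZipFile.namelist() with the in test before calling extract(), which would otherwise raise KeyError for a name that isn't in the archive:

```python
import os
import zipfile

def extract_member(arch_path, member, dest_dir):
    """Extract a single member of a zip archive into dest_dir.

    Returns the path of the extracted file, or None if the archive or the
    member does not exist.
    """
    if not os.path.isfile(arch_path):
        return None
    with zipfile.ZipFile(arch_path) as archive:
        # namelist() lists the archive members, so `in` checks for the file
        if member not in archive.namelist():
            return None
        # extract() creates dest_dir if needed and returns the new file's path
        return archive.extract(member, dest_dir)
```

With the variables from the answer above, the call would be extract_member(arch_path, file_name, dest_dir).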

There is a good chapter that will help with this in Automate the Boring Stuff
https://automatetheboringstuff.com/chapter9/

Related

Check if a given directory contains any directory in python

Essentially, I'm wondering if the top answer given to this question can be implemented in Python. I am reviewing the modules os, os.path, and shutil and I haven't yet been able to find an easy equivalent, though I assume I'm just missing something simple.
More specifically, say I have a directory A, and inside directory A is any other directory. I can call os.walk('path/to/A') and check if dirnames is empty, but I don't want to make the program go through the entire tree rooted at A; i.e. what I'm looking for should stop and return true as soon as it finds a subdirectory.
For clarity, on a directory containing files but no directories an acceptable solution will return False.
Maybe you want:
import os
def folders_in(path_to_parent):
    for fname in os.listdir(path_to_parent):
        if os.path.isdir(os.path.join(path_to_parent,fname)):
            yield os.path.join(path_to_parent,fname)
print(list(folders_in("/path/to/parent")))
This returns a list of all subdirectories; if it's empty, there are no subdirectories.
Or, in one line:
set([os.path.dirname(p) for p in glob.glob("/path/to/parent/*/*")])
although for a subdirectory to be counted with this method, it must contain at least one (non-hidden) entry.
Or, by manipulating os.walk:
import os
def subfolders(path_to_parent):
    try:
        return next(os.walk(path_to_parent))[1]
    except StopIteration:
        return []
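For the "stop as soon as one subdirectory is found" requirement specifically, a different technique than the ones above is os.scandir (Python 3.6+ for the with form) combined with any(). This is a sketch, not taken from the answers; the name has_subdirectory is made up:

```python
import os

def has_subdirectory(path):
    """Return True as soon as one subdirectory of path is found."""
    # os.scandir yields entries lazily and any() stops at the first True,
    # so no further entries are examined after the first directory.
    with os.scandir(path) as entries:
        return any(entry.is_dir() for entry in entries)
```

A directory that contains only files returns False. Note that entry.is_dir() follows symlinks by default; entry.is_dir(follow_symlinks=False) would ignore symlinks to directories.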
I would just do the following:
import os
#for example
dir_of_interest = "/tmp/a/b/c"
print(dir_of_interest in (v[0] for v in os.walk("/tmp/")))
This prints True or False, depending on whether dir_of_interest is produced by the generator. Because a generator is used here, the directories are generated one by one, and the in test stops walking as soon as it finds a match.
You can break out of the walk at any time. For example, this breaks as soon as the folder currently being walked has no subdirectories:
for root, dirs, files in os.walk("/tmp/"):
    print(root, len(dirs))
    if not dirs:
        break
Maybe this is in line with what you are after.
Try this:
#!/usr/local/cpython-3.4/bin/python
import glob
import os
top_of_hierarchy = '/tmp/'
#top_of_hierarchy = '/tmp/orbit-dstromberg'
pattern = os.path.join(top_of_hierarchy, '*')
for candidate in glob.glob(pattern):
    if os.path.isdir(candidate):
        print("{0} is a directory".format(candidate))
        break
else:
    print('No directories found')
# Tested on 2.6, 2.7 and 3.4
I apparently can't comment yet; however, I wanted to update part of the answer https://stackoverflow.com/users/541038/joran-beasley gave, or at least what worked for me.
Using python3 (3.7.3), I had to modify his first code snippet as follows:
import os
def has_folders(path_to_parent):
    for fname in os.listdir(path_to_parent):
        if os.path.isdir(os.path.join(path_to_parent, fname)):
            yield os.path.join(path_to_parent, fname)
print(list(has_folders("/repo/output")))
Narrowing this further to "does the given directory contain any directory?" results in code like:
import os
def folders_in(path_to_parent):
    for fname in os.listdir(path_to_parent):
        if os.path.isdir(os.path.join(path_to_parent, fname)):
            yield os.path.join(path_to_parent, fname)
def has_folders(path_to_parent):
    folders = list(folders_in(path_to_parent))
    return len(folders) != 0
print(has_folders("the/path/to/parent"))
This prints True or False.
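Building the full list checks every entry with isdir() before answering. Because folders_in is already a generator, one possible tweak (a sketch, not the poster's code) is to ask it only for its first result with next() and a default, so isdir() stops being called once a folder is found:

```python
import os

def folders_in(path_to_parent):
    for fname in os.listdir(path_to_parent):
        if os.path.isdir(os.path.join(path_to_parent, fname)):
            yield os.path.join(path_to_parent, fname)

def has_folders(path_to_parent):
    # next() pulls only the first folder from the generator (or the default
    # None), so the remaining entries are not checked with isdir().
    return next(folders_in(path_to_parent), None) is not None
```

os.listdir() still reads the whole directory listing; only the per-entry isdir() checks are cut short.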

How can I open multiple files (number of files unknown beforehand) using "with open" statement?

I specifically need to use the with open statement for opening the files, because I need to open a few hundred files together and merge them using a K-way merge. I understand that, ideally, I should have kept K low, but I did not foresee this problem.
Starting from scratch is not an option now, as I have a deadline to meet. So at this point, I need very fast I/O that does not store the whole file, or a huge portion of it, in memory (because there are hundreds of files, each ~10 MB). I just need to read one line at a time for the K-way merge. Reducing memory usage is my primary focus right now.
I learned that with open is the most efficient technique, but I cannot understand how to open all the files together in a single with open statement. Excuse my beginner ignorance!
Update: This problem was solved. It turns out the issue was not about how I was opening the files at all. I found out that the excessive memory usage was due to inefficient garbage collection. I did not use with open at all. I used the regular f=open() and f.close(). Garbage collection saved the day.
It's fairly easy to write your own context manager to handle this by using the contextlib.contextmanager decorator to define "a factory function for with statement context managers", as the documentation puts it. For example:
from contextlib import contextmanager
@contextmanager
def multi_file_manager(files, mode='rt'):
    """ Open multiple files and make sure they all get closed. """
    files = [open(file, mode) for file in files]
    try:
        yield files
    finally:
        # runs even if the body of the with statement raises
        for file in files:
            file.close()
if __name__ == '__main__':
    filenames = 'file1', 'file2', 'file3'
    with multi_file_manager(filenames) as files:
        a = files[0].readline()
        b = files[2].readline()
        ...
If you don't know all the files ahead of time, it would be equally easy to create a context manager that supports adding them incrementally within the context. In the code below, contextlib.ContextDecorator is used as the base class to simplify the implementation of a MultiFileManager class.
from contextlib import ContextDecorator
class MultiFileManager(ContextDecorator):
    def __init__(self, files=None):
        self.files = [] if files is None else files
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        for file in self.files:
            file.close()
    def __iadd__(self, other):
        """Add file to be closed when leaving context."""
        self.files.append(other)
        return self
if __name__ == '__main__':
    filenames = 'mfm_file1.txt', 'mfm_file2.txt', 'mfm_file3.txt'
    with MultiFileManager() as mfmgr:
        for count, filename in enumerate(filenames, start=1):
            file = open(filename, 'w')
            mfmgr += file  # Add file to be closed later.
            file.write(f'this is file {count}\n')
with open(...) as f:
    ...  # do stuff
translates roughly to
f = open(...)
try:
    ...  # do stuff
finally:
    f.close()
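In your case, I wouldn't use the with open syntax. If you have a list of filenames, then do something like this:
import os
filenames = os.listdir(file_directory)
# os.listdir() returns bare names, so join them with the directory;
# use a list (not map()) so the files can be iterated again to close them
open_files = [open(os.path.join(file_directory, name)) for name in filenames]
# do stuff
for f in open_files:
    f.close()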
If you really want to use the with open syntax, you can make your own context manager that accepts a list of filenames:
class MultipleFileManager(object):
    def __init__(self, files):
        self.files = files
    def __enter__(self):
        # a list rather than map(), so __exit__ can still see every file
        self.open_files = [open(name) for name in self.files]
        return self.open_files
    def __exit__(self, exc_type, exc_val, exc_tb):
        for f in self.open_files:
            f.close()
And then use it like this:
filenames = [os.path.join(file_directory, name) for name in os.listdir(file_directory)]
with MultipleFileManager(filenames) as files:
    for f in files:
        ...  # do stuff
The only advantage I see to using a context manager in this case is that you can't forget to close the files. But there is nothing wrong with manually closing the files. And remember, the OS will reclaim its resources when your program exits anyway.
While not a solution for 2.7, I should note there is one good, correct solution for 3.3+, contextlib.ExitStack, which can be used to do this correctly (surprisingly difficult to get right when you roll your own) and nicely:
from contextlib import ExitStack
with open('source_dataset.txt') as src_file, ExitStack() as stack:
    files = [stack.enter_context(open(fname, 'w')) for fname in fname_list]
    # ... do stuff with src_file and the values in files ...
# ... src_file and all elements in stack cleaned up on block exit ...
Importantly, if any of the opens fails, all of the opens that succeeded prior to that point will be cleaned up deterministically; most naive solutions end up failing to clean up in that case, relying on the garbage collector at best, and in cases like lock acquisition where there is no object to collect, failing to ever release the lock.
Posted here since this question was marked as the "original" for a duplicate that didn't specify Python version.
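Applied to the original K-way merge problem, one possible sketch (assuming Python 3, input files that are each already sorted line by line, and every line including the last ending in a newline; merge_sorted_files is a made-up name) combines ExitStack with the standard library's lazy heapq.merge, so only about one pending line per input file is held in memory:

```python
import heapq
from contextlib import ExitStack

def merge_sorted_files(input_paths, output_path):
    """K-way merge of text files that are each already sorted line by line."""
    with ExitStack() as stack:
        # Every file opened here is closed when the with block exits,
        # even if a later open() fails.
        inputs = [stack.enter_context(open(path)) for path in input_paths]
        out = stack.enter_context(open(output_path, 'w'))
        # heapq.merge is lazy: it pulls lines from the inputs on demand,
        # so memory use does not grow with file size.
        out.writelines(heapq.merge(*inputs))
```

With a few hundred inputs, the operating system's per-process limit on open files may also need checking.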

os.walk() never returns when asked to print dirpaths

I have a simple directory structure:
rootdir\
    subdir1\
        file1.tif
    subdir2\
        file2.tif
    ...
    subdir13\
        file13.tif
    subdir14\
        file14.tif
If I call:
import os
print os.listdir('absolute\path\to\rootdir')
...then I get what you'd expect:
['subdir1', 'subdir2', ... 'subdir13', 'subdir14']
The same thing happens if I call os.listdir() on those subdirectories: for each one, it returns the name of the file in that directory. No problems there.
And if I call:
import os
for dirpath, dirnames, filenames in os.walk('absolute\path\to\rootdir'):
    print filenames
    print dirnames
...then I get what you'd expect:
[]
['subdir1', 'subdir2', ... 'subdir13', 'subdir14']
['file1.tif']
[]
['file2.tif']
[]
...
But here's the strangeness. When I call:
import os
for dirpath, dirnames, filenames in os.walk('absolute\path\to\rootdir'):
    print filenames
    print dirnames
    print dirpath
...it never returns, ever. Even if I try:
print [each[0] for each in os.walk('absolute\path\to\rootdir')]
...or anything of the sort. I can always print the second and third parts of the tuple returned by os.walk(), but the moment that I try to touch the first part the whole thing just stops.
Even stranger, this behavior only appears in scripts launched using the shell. The command line interpreter acts normally. I'm curious, what's going on here?
-----EDIT-----
Actual code:
ALLOWED_IMGFORMATS = [".jpg",".tif"]
def getCategorizedFiles(pathname):
    cats = [each[0] for each in os.walk(pathname) if not each[0] == pathname]
    ncats = len(cats)
    tree = [[] for i in range(ncats+1)]
    for cat in cats:
        catnum = int(os.path.basename(cat))
        for item in os.listdir(cat):
            if not item.endswith('.sift') and os.path.splitext(item)[-1].lower() in ALLOWED_IMGFORMATS:
                tree[catnum].append(cat + '\\' + item)
    fileDict = {cat : tree[cat] for cat in range(1,ncats+1)}
    return fileDict
----EDIT 2----
Another development. As stated above, this problem exists when the code is in scripts launched from the shell. But not any shell. The problem exists with Console 2, but not the Windows command prompt. It also exists when the script is launched from java (how I originally came across the problem) like so: http://www.programmersheaven.com/mb/python/415726/415726/invoking-python-script-from-java/?S=B20000
I've never really trusted os.walk(). Just write your own recursive stuff. It's not hard:
import os
def contents(folder, l):  # Recursive, returns list of all files with full paths
    directContents = os.listdir(folder)
    for item in directContents:
        if os.path.isfile(os.path.join(folder, item)):
            l.append(os.path.join(folder, item))
        else:
            contents(os.path.join(folder, item), l)
    return l
contents = contents(folder, [])
contents will then be a list of all the files with full paths included. You can use os.path.split() if you like, to make it a little easier to read.
Knowing how this works eliminates the uncertainty of using os.walk() in your code, which means you'll be able to identify if the problem in your code is really involved with os.walk().
If you need to put them in a dictionary (because dictionaries have aliasing benefits, too), you can also sort your files that way.

Assembling output from a recursive function?

I'm attempting to loop through a directory and any nested directories within. It seemed like recursion would be a good way to go about it.
I ended up with this code:
def get_file_list(directory=os.getcwd()):
    for i in os.listdir(directory):
        if os.path.isdir(i):
            get_file_list(i)
            continue
        print i
This prints everything beautifully -- exactly the output I expected. However, I wanted to take this list of files and pass it to another function for further processing. So I tried compiling everything into a list.
def get_file_list(directory=os.getcwd()):
    files = []
    for i in os.listdir(directory):
        if os.path.isdir(i):
            get_file_list(i)
            continue
        files.append(i)
    return files
So now, the problem is that it only returns the files from the current working directory. After some thinking, I guess this is a scoping issue. A new files variable is being created in a unique piece of memory each time get_file_list() is called, right? So how do you get around something like this? How do you assemble the results from nested calls?
import os
all_files = []
# os.walk yields (dirpath, dirnames, filenames) tuples, in that order
for current_dir, directories, files in os.walk("C:\\"):
    current_files = [os.path.join(current_dir, file) for file in files]
    all_files.extend(current_files)
print(all_files)
I think this would work better.
Use extend:
import os
def get_file_list(directory='.'):
    files = []
    for i in os.listdir(directory):
        path = os.path.join(directory, i)  # isdir() needs the full path, not just the name
        if os.path.isdir(path):
            files.extend(get_file_list(path))
        else:
            files.append(path)
    return files
Also, I changed your os.getcwd() default to just '.', since you probably want it to default to the working directory at call time, not the working directory at the point at which the function was defined.
Use generators! They're very powerful and make things easy to read. Here are some references.
Basically, you use "yield" to produce values instead of "return". When the function reaches a "yield" statement, it hands the value back to the caller and pauses; when the next value is requested (for example, by the next iteration of a for loop), execution picks up right where it left off!
And to top it off, you can iterate over the values a generator function produces with "for x in my_generator_function()". Very handy.
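A tiny sketch of that pause-and-resume behaviour (count_up_to is a made-up example, unrelated to files): each next() call runs the function body only up to the next yield.

```python
def count_up_to(n):
    """Generator function: yields 1, 2, ..., n one value at a time."""
    i = 1
    while i <= n:
        yield i   # hand i to the caller and pause here
        i += 1    # resumes from here when the next value is requested

gen = count_up_to(3)
print(next(gen))  # 1
print(next(gen))  # 2
print(list(gen))  # [3]
```

A for loop does the same thing, calling next() behind the scenes until the generator is exhausted.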
import os
#this is a "generator function"
def get_files(directory='.'):
    for item in os.listdir(directory):
        item = os.path.join(directory, item)
        if os.path.isdir(item):
            for subitem in get_files(item):
                yield subitem
                # The fact that there's a "yield" statement here
                # tells python that this is a generator function
        else:
            yield item
for item in get_files():
    print(item)  # Do something besides printing here, obviously ;)
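On Python 3.3+, the inner for loop that re-yields each subitem can be written with yield from, which delegates to the recursive generator. This is a sketch of the same get_files idea, not a change the answerer made:

```python
import os

def get_files(directory='.'):
    """Yield the path of every file under directory, recursing into subfolders."""
    for item in os.listdir(directory):
        item = os.path.join(directory, item)
        if os.path.isdir(item):
            # Python 3.3+: delegate to the recursive generator
            yield from get_files(item)
        else:
            yield item
```

It is used exactly as before, e.g. for path in get_files(): print(path).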
A common way to do this recursively in the spirit of your original question is to pass in the list you are appending to as a parameter. Pass the empty list to the very first call to the function. A recursive "helper" (often implemented as a nested function) can accumulate the files.
EDIT:
Here is a complete script (fixed from a previous version):
import os
def get_file_list(directory=os.getcwd()):
    def file_list(directory, files):
        for i in os.listdir(directory):
            path = os.path.join(directory, i)  # use the full path, not just the name
            if os.path.isdir(path):
                file_list(path, files)
                continue
            files.append(i)
        return files
    return file_list(directory, [])
print(get_file_list())
import os
def get_file_list(files, directory=os.getcwd()):
    for i in os.listdir(directory):
        path = os.path.join(directory, i)  # isdir() needs the full path
        if os.path.isdir(path):
            get_file_list(files, path)  # note: we needed to amend this call to pass the reference down the calls
            continue
        files.append(i)  # insert the file name into our referenced list
myfiles = []  # the list we want to insert all the file names into
get_file_list(myfiles)  # call the function and pass a reference to myfiles in
print('\n'.join(myfiles))
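For comparison, and as a swapped-in technique rather than anything from the answers above, Python 3's pathlib can do the recursion itself with Path.rglob(). A minimal sketch, assuming full paths as strings are the desired output:

```python
from pathlib import Path

def get_file_list(directory='.'):
    """Return the paths of all files under directory (recursively), as strings."""
    # rglob('*') walks the whole tree; keep only regular files
    return [str(p) for p in Path(directory).rglob('*') if p.is_file()]
```

No accumulator list or helper function is needed, because the recursion happens inside rglob().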

Can anyone figure out my problem [Python]

I have been trying to debug the Python CGI code below, but it doesn't seem to work. When I try these three lines in a new file, they seem to work:
filename=unique_file('C:/wamp/www/project/input.fasta')
prefix, suffix = os.path.splitext(filename)
fd, filename = tempfile.mkstemp(suffix, prefix+"_", dirname)
But when I try it this way, I get the error that unique_file is not defined:
form=cgi.FieldStorage()
i=(form["dfile"].value)
j=(form["sequence"].value)
if (i!="" and j=="" ):
    filename=(form["dfile"].filename)
    (name, ext) = os.path.splitext(filename)
    alignfile=name + '.aln'
elif(j!="" and i==""):
    filename=unique_file('C:/wamp/www/project/input.fasta')
    prefix, suffix = os.path.splitext(filename)
    fd, filename = tempfile.mkstemp(suffix, prefix+"_", dirname)
    file = open(filename, 'w')
    value=str(j)
    file.write(value)
    file.close()
    (name, ext) = os.path.splitext(filename)
    alignfile=name + '.aln'
What I am trying to do is check two options from the form: a file upload and a textarea. If the file upload is used, there is nothing to do except separate the file name from its extension. But when the textarea is used, I have to generate a unique file name, write the content into it, and pass on the file name and its extension.
The error I got is:
<type 'exceptions.NameError'>: name 'unique_file' is not defined
args = ("name 'unique_file' is not defined",)
message = "name 'unique_file' is not defined"
Any suggestions and corrections are appreciated
Thanks for your concern
unique_file() isn't a built-in Python function. So I assume that either you forgot a line in your first code snippet that actually imports this function, or you configured your Python interpreter to load a startup file (http://docs.python.org/using/cmdline.html#envvar-PYTHONSTARTUP). In the second case, the CGI script can't find this function because it runs under the web server's identity, which probably lacks the PYTHONSTARTUP environment variable definition.
You need to either import or define your unique_file method before using it.
They will look something like:
from mymodule import unique_file
or:
def unique_file(path):
    ...  # return a unique file name based on path
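Since the original helper's source isn't shown, here is one hypothetical way unique_file could be defined (the behaviour is an assumption: create a new, uniquely named file next to the given path and return its path). It relies on tempfile.mkstemp, which creates the file atomically, so under this assumption the second mkstemp call in the question would probably no longer be needed:

```python
import os
import tempfile

def unique_file(path):
    """Hypothetical helper: create a new, uniquely named file next to path.

    For 'dir/input.fasta' this creates something like 'dir/input_abc123.fasta'
    and returns its path.
    """
    directory, base = os.path.split(path)
    prefix, suffix = os.path.splitext(base)
    fd, new_path = tempfile.mkstemp(suffix=suffix, prefix=prefix + '_', dir=directory)
    os.close(fd)  # mkstemp returns an open OS-level handle; close it
    return new_path
```

The directory part of path must already exist, because mkstemp does not create it.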
Usually when a compiler or interpreter says something isn't defined, that's precisely what the problem is. So, you have to answer the question "why is it not defined?". Have you actually defined a method named "unique_file"? If so, maybe the name is misspelled, or maybe it's not defined before this code is executed.
If the function is in another file or module, have you imported that module to gain access to the function?
When you say it works in one way but not the other, what's the difference? Does one method auto-import some functions that the other does not?
Since unique_file is not a built-in command, you're probably forgetting to actually define a function with that name, or forgetting to import it from an existing module.
