I'm having some trouble writing a program and was wondering if somebody could help. Here is my code so far:
def parseExtension(filename):
    periodPosition = filename.find(".")
    extension = (filename[periodPosition + 1:])
    return extension

def fileExtensionExists(fileList, fileExtension):
    for fileName in fileList:
        return(parseExtension(fileList))
    return

print(fileExtensionExists( ["python.exe", "assignment5.docx",
    "assignment4.py", "shortcuts.docx", "geographyhw1.txt"], "py"))
The program consists of two functions. The first function takes a filename, finds the period, and returns what comes after it, i.e. the extension.
The second function (the main function) is where I'm having trouble. It is supposed to call the first function in a for loop to get the extension of every file in the list, and then compare each returned extension to the second parameter, fileExtension (hard-coded as "py" in the call). If there are "py" files in the list, the function should return True; if not, it should return False.
return exits the function; execution does not continue past a return statement.
Either build up your results in a list and then return them after the end of the for loop, or use yield to pass them back one at a time (but then you need to use a loop or construct a list from the results).
def fileExtensionExists(fileList, fileExtension):
    extensions = []
    for fileName in fileList:
        # pass the single file name, not the whole list
        extensions.append(parseExtension(fileName))
    return extensions
or
def fileExtensionExists(fileList, fileExtension):
    for fileName in fileList:
        # pass the single file name, not the whole list
        yield parseExtension(fileName)

print(list(fileExtensionExists( ["python.exe", "assignment5.docx",
    "assignment4.py", "shortcuts.docx", "geographyhw1.txt"], "py")))
BTW, unless this is a school exercise, do yourself a favour and use os.path.splitext() to split off the extension; there's really no need to re-invent the wheel here, Python comes with lots of wheels. splitext() will include the period in front of the extension, so you would want to pass in ".py", but that usually keeps code cleaner in any case.
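As a rough sketch of the splitext() suggestion (written for Python 3; the function name has_extension is made up for illustration and is not from the question), the whole check could look something like this:

```python
import os.path

def has_extension(file_list, extension):
    """Return True if any name in file_list has the given extension.

    `extension` must include the leading period (e.g. ".py"), because
    that is how os.path.splitext() returns it.
    """
    # splitext() returns a (root, ext) pair; any() stops at the first match
    return any(os.path.splitext(name)[1] == extension for name in file_list)

my_files = ["python.exe", "assignment5.docx", "assignment4.py",
            "shortcuts.docx", "geographyhw1.txt"]
print(has_extension(my_files, ".py"))   # True
print(has_extension(my_files, ".pdf"))  # False
```

Note that splitext() splits at the last period, so "archive.tar.gz" has the extension ".gz", whereas the find(".") approach in the question splits at the first period.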
First of all, you need to supply fileName in the parseExtension function call. But then you have some more issues. I think what you want is this.
def parseExtension(filename):
    periodPosition = filename.find(".")
    extension = (filename[periodPosition + 1:])
    return extension

def fileExtensionExists(fileList, fileExtension):
    for fileName in fileList:
        if parseExtension(fileName) == fileExtension:
            return True
    return False

my_files = ["python.exe", "assignment5.docx", "assignment4.py",
            "shortcuts.docx", "geographyhw1.txt"]
print(fileExtensionExists(my_files, "py"))
Your for loop never gets past the first element of the list: fileExtensionExists exits at the first return it reaches.
Related
I know this may be a stupid question, but I am not sure how to use functions properly. For an assignment I had to come up with a program, and it had to be written in the form of a function. Initially I started without the function format, planning to add it at the very end, but when I do so I get an error when I close the function at the very end. Could someone help me figure out what I need to do, please?
It's not how you're "closing" (returning from) the function, but how you're calling it and how it's using the parameter.
Change:
def find_repeats(fname):
    file_name = "whatever.txt"
to:
def find_repeats(file_name):
    # don't set file_name to anything, it already has a value from the caller
so that the body of the function will use the file_name you pass in instead of a hardcoded one.
Now pass it in when you call the function by changing:
find_repeats(fname) # this would error because what's fname?
to:
find_repeats("whatever.txt")
I think this is what you are looking for
fname = "whatever.txt"

def find_repeats(fname):
    textfile = open(fname, "r")
    for line_number, line in enumerate(textfile):
        for x, y in zip(line.split(), line.split()[1:]):
            if(x==y):
                print(line_number,x,y,"Error")
    textfile.close() #function ends (close!) here

find_repeats(fname) #This is how you call above function
I need to 1) Find a zipfile at a particular directory location 2) If it exists then unzip it 3) Out of its contents find a specific file and move it to other directory.
import fnmatch
import os
from zipfile import ZipFile

def searchfile():
    for file in os.listdir('/user/adam/datafiles'):
        if fnmatch.fnmatch(file, 'abc.zip'):
            return True
    return False

if searchfile():
    print('File exists')
else:
    print('File not found')

def file_extract():
    os.chdir('/user/adam/datafiles')
    file_name = 'abc.zip'
    destn = '/user/adam/extracted_files'
    zip_archive = ZipFile(file_name)
    zip_archive.extract('class.xlsx', destn)
    print("Extracted the file")
    zip_archive.close()

file_extract()
When I execute the above script, it shows no compile-time or runtime issues, but it only works for the first function. When I check the extracted_files folder, I don't see the files.
So for the sake of completeness and as my comment solved your issue, I guess I should make it an Answer:
In Python, if a function foo is defined (def foo(<...>):),
foo refers to the function itself and can be copied around (effectively copying a pointer), passed to other functions, ... as about any object;
foo() is a call without passed argument to that function.
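A small illustration of that difference, in Python 3 (always_false is a made-up function used only for this sketch): a condition written against the bare name tests the function object, which is always truthy, while adding parentheses tests what the function returns.

```python
def always_false():
    return False

# The bare name is the function object itself, and objects like this are truthy.
print(bool(always_false))    # True
# With parentheses the function is called and its return value is tested.
print(bool(always_false()))  # False
```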
As this question does not seem to be an assignment, I will add the following:
To improve your code, you might want to look into:
Parameters to functions (your functions are currently only doing one single thing. For example, you could pass the file and directory names to searchfile);
os.path and all its content;
The in test for checking if an object is in a container;
The with statement for clearer and safer handling of objects such as ZipFile instances;
The x if b else y construct;
Notice that even if the archive does not exist, your code still attempts to extract a file from it.
Here is a more robust way to implement what you want:
import os
import zipfile

arch_name, file_name = 'abc.zip', 'class.xlsx'
home_dir = os.path.join(os.path.abspath(os.sep), 'user', 'adam')
# or even better: home_dir = os.path.expanduser('~')
arch_dir = os.path.join(home_dir, 'datafiles')
dest_dir = os.path.join(home_dir, 'extracted_files')
arch_path = os.path.join(arch_dir, arch_name)

if os.path.isfile(arch_path):
    print('File {} exists'.format(arch_path))
    with zipfile.ZipFile(arch_path) as archive:
        archive.extract(file_name, dest_dir)
    print('Extracted {} from {}'.format(file_name, arch_name))
else:
    print('File {} not found'.format(arch_path))
Disclaimer: This code is untested and could contain minor mistakes!
Notice how the second half of the code works with generic variables that can easily be modified in a single place in the first half. Also, notice the improved readability of if os.path.isfile(arch_path): as opposed to if searchfile(): (requiring us to then read the implementation of searchfile).
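To tie the suggestions about parameters, the in test, and the with statement together, here is one possible sketch in Python 3. The function name extract_member and its return convention (None on failure) are assumptions made for illustration, not part of the original code:

```python
import os
import zipfile

def extract_member(arch_path, member, dest_dir):
    """Extract `member` from the zip file at `arch_path` into `dest_dir`.

    Returns the path of the extracted file, or None if either the
    archive or the member does not exist.
    """
    if not os.path.isfile(arch_path):
        return None
    # The with statement closes the archive even if extraction fails
    with zipfile.ZipFile(arch_path) as archive:
        # namelist() lists the member names, so `in` works as a membership test
        if member not in archive.namelist():
            return None
        return archive.extract(member, dest_dir)
```

A call such as extract_member(arch_path, file_name, dest_dir) would then replace the body of the if branch, and the caller can decide what to print based on whether None came back.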
There is a good chapter that will help with this in Automate the Boring Stuff
https://automatetheboringstuff.com/chapter9/
I have written a small script (this is partial); the full code should search a bunch of .c files and check whether the parameters within them are being used or not. This particular code is responsible for grabbing the parameter from a row, so it can be used to search the .c files for identical parameter names and their values.
The issue is that the first print (inside the takeTheParam method) shows the correct parameter in the command prompt, while the second print (after the call to takeTheParam) shows a blank.
import os

theParam = ""

def takeTheParam(row, theParam):
    for item in row.split():
        if "_" in item:
            theParam = item
            print theParam
            return theParam

for root, dirs, files in os.walk('C:/pathtoworkdir'):
    for cFile in files:
        if cFile.endswith('.c'):
            with open(os.path.join(root, cFile), 'r') as this:
                for row in this:
                    if '=' in row:
                        takeTheParam(row, theParam)
                        print theParam
                        while theParam not in usedParameters: # Has the param already been checked?
                            value(row, savedValue, statements, cur)
                            searchAndValueExtract(theParam, parameterCounter, compareValue)
                            while isEqual(savedValue, compareValue, equalValueCounter):
                                searchAndValueExtract(theParam, parameterCounter, compareValue)
                            else:
                                # If IsEqual returns false, that means a param has different values
                                # and it's therefore being used
                                usedParameters.append(theParam)
                    pass
I haven't got enough experience in Python to figure out why this happens, but I suspect that when theParam is used outside of the method, its value is retrieved from its definition at the beginning of the code (theParam = ""), and I have no idea why, if this is the case.
Change
takeTheParam(row, theParam)
to
theParam = takeTheParam(row, theParam)
The returned variable is never assigned to theParam in your case, so it would stay "" forever. Now it isn't anymore.
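A stripped-down sketch of why the assignment matters, in Python 3 (take_param and the sample row are invented for illustration): assigning to a parameter inside a function only rebinds the local name, so the caller's variable changes only if the return value is assigned back.

```python
def take_param(row, param):
    for item in row.split():
        if "_" in item:
            param = item  # rebinds the local name only
    return param

the_param = ""
take_param("int max_speed = 10;", the_param)  # return value is thrown away
print(repr(the_param))                        # ''

the_param = take_param("int max_speed = 10;", the_param)
print(repr(the_param))                        # 'max_speed'
```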
I'm attempting to loop through a directory and any nested directories within. It seemed like recursion would be a good way to go about it.
I ended up with this code:
def get_file_list(directory=os.getcwd()):
    for i in os.listdir(directory):
        if os.path.isdir(i):
            get_file_list(i)
            continue
        print i
This prints everything beautifully -- exactly the output I expected. However, I wanted to take this list of files and pass it to another function for further processing. So I tried compiling everything into a list.
def get_file_list(directory=os.getcwd()):
    files = []
    for i in os.listdir(directory):
        if os.path.isdir(i):
            get_file_list(i)
            continue
        files.append(i)
    return files
So now, the problem is that it only returns the files from the current working directory. After some thinking, I guess this is a scoping issue. A new files variable is being created in a unique piece of memory each time get_file_list() is called, right? So how do you get around something like this? How do you assemble the results from nested calls?
all_files = []
# os.walk yields (dirpath, dirnames, filenames), in that order
for current_dir, directories, files in os.walk("C:\\"):
    current_files = [os.path.join(current_dir, file) for file in files]
    all_files.extend(current_files)
print all_files
I would think os.walk would work better.
Use extend:
def get_file_list(directory='.'):
    files = []
    for i in os.listdir(directory):
        path = os.path.join(directory, i)  # isdir needs the full path, not the bare name
        if os.path.isdir(path):
            files.extend(get_file_list(path))
        else:
            files.append(path)
    return files
Also, I changed your os.getcwd() call to just . since you probably want it to default to the current working directory at the time of the call, not the working directory at the point at which the function was defined.
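The point about defaults being fixed at definition time can be checked with a short Python 3 sketch. The function names frozen_default and fresh_default are invented for illustration, and the None placeholder is one common alternative to passing '.':

```python
import os
import tempfile

def frozen_default(directory=os.getcwd()):
    # os.getcwd() ran once, when this def statement was executed
    return directory

def fresh_default(directory=None):
    # None is only a placeholder; the lookup happens on every call
    if directory is None:
        directory = os.getcwd()
    return directory

start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        print(frozen_default() == start)       # True: still the old directory
        print(fresh_default() == os.getcwd())  # True: follows the chdir
    finally:
        os.chdir(start)  # restore before the temporary directory is removed
```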
Use generators! They're very powerful and make things easy to read.
Basically, you use "yield" to hand back values instead of "return". When the function reaches a "yield" statement, it hands back the value and pauses; when the next value is requested, it picks up where it left off!
And to top it off, you can iterate over the results of a generator function using "for x in my_generator_function()". Very handy.
import os

# this is a "generator function"
def get_files(directory='.'):
    for item in os.listdir(directory):
        item = os.path.join(directory, item)
        if os.path.isdir(item):
            for subitem in get_files(item):
                yield subitem
                # The fact that there's a "yield" statement here
                # tells python that this is a generator function
        else:
            yield item

for item in get_files():
    print item # Do something besides printing here, obviously ;)
A common way to do this recursively in the spirit of your original question is to pass in the list you are appending to as a parameter. Pass the empty list to the very first call to the function. A recursive "helper" (often implemented as a nested function) can accumulate the files.
EDIT:
Here is a complete script (fixed from a previous version):
import os

def get_file_list(directory=os.getcwd()):
    def file_list(directory, files):
        for i in os.listdir(directory):
            path = os.path.join(directory, i)  # join so nested directories resolve correctly
            if os.path.isdir(path):
                file_list(path, files)
                continue
            files.append(path)
        return files
    return file_list(directory, [])

print get_file_list()
import os

def get_file_list(files, directory=os.getcwd()):
    for i in os.listdir(directory):
        path = os.path.join(directory, i)  # join so nested directories resolve correctly
        if os.path.isdir(path):
            get_file_list(files, path)  # note: we need to pass the list reference down the calls
            continue
        files.append(path)  # insert the file path into our referenced list

myfiles = []  # the list we want to insert all the file names into
get_file_list(myfiles)  # call the function and pass a reference to myfiles in
print('\n'.join(myfiles))
I am trying to do something to all the files under a given path. I don't want to collect all the file names beforehand then do something with them, so I tried this:
import os
import stat

def explore(p):
    s = ''
    list = os.listdir(p)
    for a in list:
        path = p + '/' + a
        stat_info = os.lstat(path )
        if stat.S_ISDIR(stat_info.st_mode):
            explore(path)
        else:
            yield path

if __name__ == "__main__":
    for x in explore('.'):
        print '-->', x
But this code skips over directories when it hits them, instead of yielding their contents. What am I doing wrong?
Iterators do not work recursively like that. You have to re-yield each result, by replacing
explore(path)
with something like
for value in explore(path):
    yield value
Python 3.3 added the syntax yield from X, as proposed in PEP 380, to serve this purpose. With it you can do this instead:
yield from explore(path)
If you're using generators as coroutines, this syntax also supports the use of generator.send() to pass values back into the recursively-invoked generators. The simple for loop above would not.
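For reference, here is how the whole function might look with yield from. This is a Python 3.3+ sketch rather than the asker's exact code: it swaps the lstat/S_ISDIR test for os.path.isdir plus an os.path.islink check, which is assumed here to give the same "do not descend into symlinked directories" behaviour.

```python
import os

def explore(path):
    """Yield the path of every non-directory entry below `path`."""
    for name in os.listdir(path):
        full = os.path.join(path, name)
        # Skip symlinked directories, as the lstat-based original does
        if os.path.isdir(full) and not os.path.islink(full):
            # Delegate to the recursive generator and re-yield its values
            yield from explore(full)
        else:
            yield full
```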
The problem is this line of code:
explore(path)
What does it do?
calls explore with the new path
explore runs, creating a generator
the generator is returned to the spot where explore(path) was executed . . .
and is discarded
Why is it discarded? It wasn't assigned to anything, it wasn't iterated over -- it was completely ignored.
If you want to do something with the results, well, you have to do something with them! ;)
The easiest way to fix your code is:
for name in explore(path):
    yield name
When you are confident you understand what's going on, you'll probably want to use os.walk() instead.
Once you have migrated to Python 3.3 (assuming all works out as planned) you will be able to use the new yield from syntax and the easiest way to fix your code at that point will be:
yield from explore(path)
Use os.walk instead of reinventing the wheel.
In particular, following the examples in the library documentation, here is an untested attempt:
import os
from os.path import join

def hellothere(somepath):
    for root, dirs, files in os.walk(somepath):
        for curfile in files:
            yield join(root, curfile)

# call and get full list of results:
allfiles = [ x for x in hellothere("...") ]

# iterate over results lazily:
for x in hellothere("..."):
    print x
Change this:
explore(path)
To this:
for subpath in explore(path):
    yield subpath
Or use os.walk, as phooji suggested (which is the better option).
That calls explore like a function. What you should do is iterate it like a generator:
if stat.S_ISDIR(stat_info.st_mode):
    for p in explore(path):
        yield p
else:
    yield path
EDIT: Instead of the stat module, you could use os.path.isdir(path).
Try this:
if stat.S_ISDIR(stat_info.st_mode):
    for p in explore(path):
        yield p
os.walk is great if you need to traverse all the folders and subfolders. If you don't need that, it's like using an elephant gun to kill a fly.
However, for this specific case, os.walk could be a better approach.
You can also implement the recursion using a stack.
There is not really any advantage in doing this though, other than the fact that it is possible. If you are using python in the first place, the performance gains are probably not worthwhile.
import os
import stat

def explore(p):
    '''
    perform a depth first search and yield the path elements in dfs order
      -implement the recursion using a stack because a python can't yield within a nested function call
    '''
    list_t=type(list())
    st=[[p,0]]
    while len(st)>0:
        x=st[-1][0]
        print x
        i=st[-1][1]
        if type(x)==list_t:
            if i>=len(x):
                st.pop(-1)
            else:
                st[-1][1]+=1
                st.append([x[i],0])
        else:
            st.pop(-1)
            stat_info = os.lstat(x)
            if stat.S_ISDIR(stat_info.st_mode):
                st.append([['%s/%s'%(x,a) for a in os.listdir(x)],0])
            else:
                yield x

print list(explore('.'))
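The explicit-stack idea can also be sketched more compactly. The following Python 3 version is an illustration, not a drop-in copy of the code above: the name explore_iterative is made up, the stack holds plain paths instead of [list, index] pairs, and children are visited in sorted order, which the original does not guarantee.

```python
import os

def explore_iterative(top):
    """Depth-first traversal using an explicit stack instead of recursion."""
    stack = [top]
    while stack:
        current = stack.pop()
        if os.path.isdir(current) and not os.path.islink(current):
            # Push children in reverse so they are popped in sorted order
            for name in sorted(os.listdir(current), reverse=True):
                stack.append(os.path.join(current, name))
        else:
            yield current
```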
To answer the original question as asked, the key is that the yield statement needs to be propagated back out of the recursion (just like, say, return). Here is a working reimplementation of os.walk(). I'm using this in a pseudo-VFS implementation, where I additionally replace os.listdir() and similar calls.
import os, os.path

def walk (top, topdown=False):
    items = ([], [])
    for name in os.listdir(top):
        isdir = os.path.isdir(os.path.join(top, name))
        items[isdir].append(name)
    result = (top, items[True], items[False])
    if topdown:
        yield result
    for folder in items[True]:
        for item in walk(os.path.join(top, folder), topdown=topdown):
            yield item
    if not topdown:
        yield result