I have inherited some Python scripts and I'm working to understand them. I am a beginner-level Python programmer but very experienced in several other scripting languages.
The following Python code snippet generates a file list which is then used in a later code block. I would like to understand exactly how it does this. I understand that os.path.isfile tests whether a path is a regular file and that os.path.join combines its arguments into a filepath string. Could someone help me understand the rest?
flist = [file for file in whls if os.path.isfile(os.path.join(whdir, i, file))]
whls is an iterable of some kind.
For each element in whls, it checks if os.path.join(whdir, i, that_element) is a file.
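(os.path.join("C:\\", "users", "adsmith") on Windows is r"C:\users\adsmith"; with a bare "C:" as the first argument the result is the drive-relative path r"C:users\adsmith")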
If so, it includes it in that list.
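To see how os.path.join treats a Windows drive letter without needing a Windows machine, here is a small sketch. It uses the standard-library ntpath and posixpath modules, which are the Windows and POSIX implementations behind os.path; the paths themselves are only illustrative.

```python
import ntpath      # Windows flavour of os.path, importable on any platform
import posixpath   # POSIX flavour of os.path

# A bare drive letter gives a drive-relative path: no separator after "C:".
drive_relative = ntpath.join("C:", "users", "adsmith")

# Including the root separator gives the absolute path.
absolute = ntpath.join("C:\\", "users", "adsmith")

# On POSIX systems the components are simply joined with "/".
posix = posixpath.join("/home", "adsmith", "notes.txt")

print(drive_relative)  # C:users\adsmith
print(absolute)        # C:\users\adsmith
print(posix)           # /home/adsmith/notes.txt
```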
As @jonsharpe posted in the comments, this is an example of a list comprehension, which is well worth your time to master.
The list comprehension means that Python will iterate over each member of whls (probably a tuple or list), and for each item it will test whether os.path.join(whdir, i, file) is a file (as opposed to a directory etc.). It will return a list containing only the elements from whls that pass this check.
This list comprehension is equivalent to the following loop:
flist = []
for file in whls:
    if os.path.isfile(os.path.join(whdir, i, file)):
        flist.append(file)
The list comprehension is more compact. Performance-wise they are similar, with the list comprehension usually being a little faster because it avoids looking up and calling the list's append() method on every iteration.
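A minimal sketch to check the equivalence yourself. The names whdir, i and whls come from the snippet in the question, but their values here are assumptions: a temporary directory, a made-up subdirectory name, and a made-up list of entries.

```python
import os
import tempfile

# Hypothetical stand-ins for the names in the original snippet.
with tempfile.TemporaryDirectory() as whdir:
    i = "batch1"                                   # assumed subdirectory name
    os.makedirs(os.path.join(whdir, i, "subdir"))  # a directory, not a file
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(whdir, i, name), "w") as fh:
            fh.write("data")

    whls = ["a.txt", "b.txt", "subdir", "missing.txt"]

    # List comprehension form.
    flist = [file for file in whls if os.path.isfile(os.path.join(whdir, i, file))]

    # Equivalent explicit loop.
    flist_loop = []
    for file in whls:
        if os.path.isfile(os.path.join(whdir, i, file)):
            flist_loop.append(file)

print(flist)       # ['a.txt', 'b.txt']
print(flist_loop)  # ['a.txt', 'b.txt']
```

The directory and the nonexistent name are both filtered out, since os.path.isfile returns False for each.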
I am trying to remove multiple files using this method:
map(os.remove, glob.glob("*.pdf"))
glob.glob("*.pdf") does return the list of files with the pdf extension, but the map call does not remove any of them.
My workaround was to wrap the map call in list().
Is there another solution that does not require list() or something similar?
Use a for loop:
for i in glob.glob('*.pdf'):
    os.remove(i)
The reason map(...) doesn't work by itself is that, in Python 3, map(...) returns a lazy iterator: the function is only called as items are actually pulled from that iterator.
Furthermore, the point of map(...) is to collect the results of calling the function on every item. That doesn't make sense here, because os.remove(...) returns None, and assembling a list of Nones that you throw away immediately serves no purpose. A for loop is the more appropriate way to approach this task.
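The laziness is easy to observe without deleting anything. In this sketch, fake_remove is a hypothetical stand-in for os.remove that only records which paths it was called with; the file names are made up.

```python
removed = []  # records calls, standing in for os.remove's side effect

def fake_remove(path):
    """Hypothetical stand-in for os.remove: records the path, returns None."""
    removed.append(path)

paths = ["a.pdf", "b.pdf", "c.pdf"]

lazy = map(fake_remove, paths)   # nothing has been called yet
calls_before = len(removed)      # 0

for _ in lazy:                   # consuming the iterator triggers the calls
    pass
calls_after = len(removed)       # 3

print(calls_before, calls_after)  # 0 3
```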
Someone has challenged me to create a program that sorts their pictures into folders based on the month they were taken, and I want to do it in one line (I know, it's inefficient and unreadable, but I still want to do it because one-liners are cool)
I needed a for loop to accomplish this, but the only way I know of to use a for loop in one line is list comprehension, so that's what I did, but it creates an empty list, and doesn't print anything from the list or anything.
What I'm doing is renaming the file to be the month created + original filename (ex: bacon.jpg --> May\bacon.jpg)
Here is my code (Python 3.7.3):
import time
import os.path
[os.rename(str(os.fspath(f)), str(time.ctime(os.path.getctime(str(os.fspath(f))))).split()[1] + '\\' + str(os.fspath(f))) for f in os.listdir() if f.endswith('.jpg')]
and the more readable, non-list-comprehension version:
import time
import os.path
for f in os.listdir():
    fn = str(os.fspath(f))
    dateCreated = str(time.ctime(os.path.getctime(fn)))
    monthCreated = dateCreated.split()[1]
    os.rename(fn, monthCreated + '\\' + fn)
Is list comprehension a bad way to do it? Also, is there a reason why, if I print the list it's [] instead of [None, None, None, None, None, (continuing "None"s for every image moved)]?
Please note: I understand that it's inefficient and bad practice. If I were doing this for purposes other than just for fun to see if I could do it, I would obviously not try to do it in one line.
This is bad in two immediate respects:
You're using a list comprehension when you're not actually interested in constructing a list -- you ignore the object you just constructed.
Your construction has an ugly side effect in the OS.
Your purpose appears to be renaming a sequence of files, not constructing a list. The Python facility you want is, I believe, the map function. Write a function to change one file name, and then use map on a list of file names (or tuples of old and new file names) to run through the sequence of desired changes. Note that in Python 3 map is lazy, so the resulting iterator has to be consumed before any of the renames actually happen.
Is list comprehension a bad way to do it?
YES. But if you want to do it in one line, it is either that or using ";". For instance:
for x in range(5): print(x);print(x+2)
And, by the way, renaming a file to a path that includes a folder will not create that folder. You have to create it first, for example with os.mkdir('foldername').
In the end, if you really want to do that, I would recommend writing it normally over several lines and then joining the statements with semicolons into a single line.
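For comparison, here is a multi-line sketch of the whole task that creates the month folder before moving each file. The function name sort_by_month, the suffix parameter, and the demo file names are assumptions made for illustration; the month extraction is the same time.ctime(...).split()[1] trick used in the question, and os.path.join replaces the hard-coded backslash so it also runs outside Windows.

```python
import os
import tempfile
import time

def sort_by_month(folder, suffix=".jpg"):
    """Move matching files in `folder` into subfolders named by month (e.g. 'May')."""
    moved = []
    for name in os.listdir(folder):
        src = os.path.join(folder, name)
        if not (os.path.isfile(src) and name.endswith(suffix)):
            continue
        # Same month extraction as the question: second field of time.ctime().
        month = time.ctime(os.path.getctime(src)).split()[1]
        os.makedirs(os.path.join(folder, month), exist_ok=True)  # create folder first
        dst = os.path.join(folder, month, name)
        os.rename(src, dst)
        moved.append(dst)
    return moved

# Demo in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("bacon.jpg", "eggs.jpg", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    moved = sort_by_month(demo_dir)
    moved_names = sorted(os.path.basename(p) for p in moved)
    all_exist = all(os.path.isfile(p) for p in moved)
    txt_untouched = os.path.isfile(os.path.join(demo_dir, "notes.txt"))

print(moved_names)  # ['bacon.jpg', 'eggs.jpg']
```

Keep in mind that os.path.getctime reports creation time on Windows but the last metadata change on most Unix systems, so the "month taken" may need a different source (such as EXIF data) for real photos.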
Why am I seeing extra ] characters in output of a list construction that should have just a list of lists? Is this a terminal problem (using CoCalc's terminal)?
Particularly, the output should have just two levels of lists, the global list and each of the sublists inside it.
But when I read through the printed output of data in a Python interpreter in CoCalc's terminal, I see this kind of thing:
Notice the extra ] characters, as if there were inner lists that should not exist. Also notice that the numbers do not appear to be in order, even though they are ordered in the data.
What's happening here?
To reconstruct the problem:
Download the dorothea_valid.data file from here:
https://archive.ics.uci.edu/ml/machine-learning-databases/dorothea/DOROTHEA/
Then create a project in CoCalc (https://cocalc.com/). Upload dorothea_valid.data to that project.
Start a Linux terminal in CoCalc, and make sure you know the path/working directory so that you can find dorothea_valid.data from Python. In the Linux terminal start the Python interpreter by writing python.
Paste the following function meant for reading a file with sequences of integer values separated by "\n" to the interpreter:
def read_datafile(fname):
    data = list()
    with open(fname, 'r') as file:
        for line in file:
            data.append([int(i) for i in line.split()])
    return data
# and then call print(read_datafile(fname)) to get the output.
Then call read_datafile() on dorothea_valid.data, and then print the resulting object as suggested in the above comment. The screen captured lines are seen when scrolling right to the bottom, however problems may be seen from other parts of the output as well.
EDIT:
It's now 10/08/2022 and I'm unable to see the problem. Maybe it has been fixed in CoCalc.
You are creating inner lists. You're using one list comprehension per line of the file so it's making one list of integers per line. If you want it all as one list, use extend rather than append:
for line in file:
    data.extend(int(i) for i in line.split())
Notice I'm using a generator expression here rather than a list comprehension. Using a list comprehension would be wasteful because it creates the whole list in memory only for it to be read through once and then discarded.
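The difference between append and extend can be seen on a tiny input. This sketch uses io.StringIO with two made-up lines in place of the real data file.

```python
import io

# Two lines of space-separated integers, standing in for the data file.
sample = io.StringIO("1 2 3\n4 5\n")

nested = []
flat = []
for line in sample:
    # append adds the whole list as a single element: one inner list per line.
    nested.append([int(tok) for tok in line.split()])
    # extend adds each integer individually, giving one flat list.
    flat.extend(int(tok) for tok in line.split())

print(nested)  # [[1, 2, 3], [4, 5]]
print(flat)    # [1, 2, 3, 4, 5]
```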
In Python 2, I used map to apply a function to several items, for instance to remove all files matching a pattern:
map(os.remove,glob.glob("*.pyc"))
Of course I ignore the return value of os.remove; I just want all the files to be deleted. It created a temporary list for nothing, but it worked.
With Python 3, as map returns an iterator and not a list, the above code does nothing.
I found a workaround: since os.remove returns None, I use any to force iteration over the whole sequence without creating a list (better performance):
any(map(os.remove,glob.glob("*.pyc")))
But it seems a bit hazardous, especially when applying it to functions that return something. Is there another way to do this in one line without creating an unnecessary list?
The change from map() (and many other functions between 2.7 and 3.x) returning a list to returning a lazy iterator is a memory-saving technique. In most cases there is no performance penalty to writing out the loop more formally (it may even be preferred for readability).
I would provide an example, but @vaultah nailed it in the comments with something that is still a one-liner:
for x in glob.glob("*.pyc"): os.remove(x)
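The hazard with any() is that it stops at the first truthy return value. If a loop is not wanted, one alternative idiom (not mentioned in the thread, so treat it as a suggestion) is a zero-length collections.deque, which consumes an iterator completely while storing nothing. In this sketch, record is a hypothetical side-effecting function that returns a truthy value.

```python
from collections import deque

calls = []

def record(x):
    """Hypothetical side-effecting function that returns a truthy value."""
    calls.append(x)
    return x

items = [1, 2, 3]

# any() stops at the first truthy result, so later items are never processed.
any(map(record, items))
calls_with_any = list(calls)     # [1]

calls.clear()

# A zero-length deque consumes the whole iterator and stores nothing.
deque(map(record, items), maxlen=0)
calls_with_deque = list(calls)   # [1, 2, 3]

print(calls_with_any, calls_with_deque)
```

The plain for loop remains the most readable choice; the deque form is mainly useful when an expression is required.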
When I was a Python beginner, I would write a multi-line for loop to make a list of the numbers 1 to 100:
a=[]
for i in range(1,101):
    a.append(i)
Once I learned how to write a single-line for loop, I could simplify my code:
a=[ _ for _ in range(1,101)]
Now that I am reviewing the Python documentation and relearning Python in detail, I find that the range() built-in function can make the list directly, but I see no one doing this. Why?
a=range(1,101)
In Python 2.x
If you want to create a list of numbers from 1 to 100, you simply do:
range(1, 101)
In Python 3.x
range() no longer returns a list; it returns a lazy range object (an immutable sequence, not a generator). We can easily convert that into a list, though.
list(range(1, 101))
When I review python document and relearn python in detail now, I find
range() built-in function it can directly make a list, but I look no
one doing this.
It depends: in Python 2.x it does, but in Python 3.x it produces a range object, which you can pass to list() if you need an actual list.
In any case, for all practical purposes, expanding a range object with a list comprehension is pointless and wastes memory.
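A short sketch of how a Python 3 range object behaves: it already supports len(), indexing and membership tests without being expanded, so list() is only needed when a real list is required.

```python
r = range(1, 101)

as_list = list(r)            # materialise only when a real list is needed

print(type(r).__name__)      # range
print(len(r), r[0], r[-1])   # 100 1 100
print(50 in r)               # True
print(as_list[:3])           # [1, 2, 3]

# Unlike a generator, a range can be iterated more than once.
first_pass = sum(r)
second_pass = sum(r)
print(first_pass, second_pass)  # 5050 5050
```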