Why is concatenating strings with ''.join(list) so popular? [duplicate] - python

This question already has answers here:
Python string join performance
(7 answers)
Closed 5 years ago.
I know that ''.join(list) is the preferred method to concatenate strings as opposed to say:
for x in list:
    s += x
My question is why is this much faster?
Also, what if I need to concatenate items that are not already in a list? Is it still faster to put them in a list just for the purpose of doing the ''.join(list)?
EDIT: This is different than the previously linked question because I'm specifically interested in knowing if the items are not in a list already, is it still recommended for performance reasons to put them in a list for the sole purpose of joining.

This is faster because join does its work in C rather than in the interpreter loop. The loop has to step through the sequence and handle each object in turn, and in the general case each s += x builds a brand-new string and copies everything accumulated so far, so the total work grows quadratically with the length of the result. join instead makes one pass to add up the lengths of the pieces, allocates the result once, and copies each piece into place. (CPython can sometimes resize the string in place for s += x, which is why the loop is often not as slow in practice as this suggests, but that is an implementation detail and not something to rely on.)
If the objects aren't already in a list, it depends on the application. Almost any such application has to go through that loop-ish overhead somewhere just to form the list, so you lose some of the advantage of join, although building the result in a single allocation still saves time.
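A minimal sketch of the "not already in a list" case, assuming the pieces are produced one at a time inside a loop. The function name render_rows and the sample data are hypothetical. The pieces are appended to a list as they are produced and joined once at the end:

```python
def render_rows(rows):
    """Build one string from pieces that are produced one at a time."""
    parts = []                      # list appends are amortized O(1)
    for name, value in rows:
        parts.append(name)
        parts.append("=")
        parts.append(str(value))    # join only accepts strings
        parts.append(";")
    return "".join(parts)           # the final string is built once


result = render_rows([("a", 1), ("b", 2)])
print(result)  # a=1;b=2;
```

Whether this beats plain += for a given workload is something to measure; for a handful of short pieces the difference is unlikely to matter.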

Yes, join is faster because it doesn't need to keep building new strings.
But you don't need a list to use join! You can give it any iterable, such as a generator expression:
''.join(x for x in lst if x != 'toss')
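It appears that join is optimized when you use a list, though. In CPython, join first turns any argument that is not already a list or tuple into a list so that it can measure the pieces before allocating the result, so a generator expression saves nothing here. All of these produce the same string, but the one with a list comprehension is fastest.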
>>> timeit("s=''.join('x' for i in range(200) if i!=47)")
15.870241802178043
>>> timeit("s=''.join(['x' for i in range(200) if i!=47])")
11.294011708363996
>>> timeit("s=''\nfor i in range(200):\n  if i!=47:\n    s+='x'")
16.86279364279278
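The comparison above can be reproduced with a small script along these lines. This is a sketch, not the original benchmark: the number=10_000 repeat count is an arbitrary choice to keep the run short, and the timings will differ by machine and Python version, so no expected numbers are shown.

```python
from timeit import timeit

# The three variants from the answer above, as statements for timeit.
variants = {
    "generator": "s = ''.join('x' for i in range(200) if i != 47)",
    "list comprehension": "s = ''.join(['x' for i in range(200) if i != 47])",
    "+= loop": "s = ''\nfor i in range(200):\n    if i != 47:\n        s += 'x'",
}

# Run each statement once to confirm that all three build the same string.
results = {}
for label, stmt in variants.items():
    namespace = {}
    exec(stmt, namespace)
    results[label] = namespace["s"]

# Time each statement; number is the repeat count (arbitrary here).
timings = {label: timeit(stmt, number=10_000) for label, stmt in variants.items()}
for label, seconds in timings.items():
    print(f"{label:>20}: {seconds:.3f} s")
```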

Related

If I am appending multiple elements to a list in python, is it more worth putting them all into a list then extending it? [duplicate]

This question already has answers here:
Python - append VS extend efficiency
(3 answers)
Closed last year.
For example:
lst.append(x)
lst.append(25)
lst.append(y)
Would it be better to write this:
lst.extend([x, 25, y])
I would argue that it depends on the nature of the list being appended. Given that the list is fixed, meaning that you are sure what elements are in it, it would make sense to use the latter version (lst.extend([x, 25, y])).
Given that you want to append elements depending on certain logical conclusions, you should probably append elements separately, except if you are sure that two elements are always being appended together.
What I mean by this is that if you want to append elements to the list depending on logic such as if-statements, where one if statement might append one element to the list but not another, it makes sense to split the elements being appended into several separate elements. If you know on the other hand that all elements will be added at the same time, you would much rather use the latter version, as it makes your code more compact and easier to read.
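A single extend() call is also generally faster than several separate append() calls, if speed is what you are after, although for a handful of elements the difference is small.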
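A short sketch showing that the two forms leave the list in the same state, with a rough timing of each. The values of x and y and the repeat count are hypothetical, and the timings depend on the machine and Python version, so none are quoted here.

```python
from timeit import timeit

x, y = "first", "last"          # hypothetical values

a = [0]
a.append(x)
a.append(25)
a.append(y)

b = [0]
b.extend([x, 25, y])            # one call, one temporary list

print(a == b)  # True

# Rough timing of both forms; number is an arbitrary repeat count.
setup = "x, y = 'first', 'last'"
t_append = timeit(
    "lst = []; lst.append(x); lst.append(25); lst.append(y)",
    setup=setup, number=100_000,
)
t_extend = timeit("lst = []; lst.extend([x, 25, y])", setup=setup, number=100_000)
print(f"append x3: {t_append:.3f} s, extend: {t_extend:.3f} s")
```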

Which of these is the fastest way to check if a list is empty in Python? [duplicate]

This question already has answers here:
How do I check if a list is empty?
(27 answers)
Closed 2 years ago.
Before getting to the main question, I should first ask: When you're trying to check if a list is empty in Python, is there any case where the four cases below would yield a different boolean?
if not []
if not len([])
if len([]) == 0
if len([]) is 0
If not, which is the fastest way to check for this boolean and why? - i.e. what exactly is happening under the hood for each case? The difference may be trivial but I'm curious as to how these might differ during execution.
if not array
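This is the most idiomatic way to check it. Caveat: it does not work for every container. For a NumPy array with more than one element, for example, the truth value is ambiguous and not array raises ValueError.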
if not len(array)
Equivalent to the expression above for lists, but not as idiomatic. It also works on NumPy arrays, though it could in principle still misbehave on other containers with a custom __len__ (a nonexistent threat in practice, to be clear).
if len(array) == 0
Same as above, but eliminates the nonexistent threat from custom iterables
if len(array) is 0
DANGER ZONE: this works in CPython only because small integers are cached, which is an implementation detail. There is no guarantee that it won't break in the future or that it works on other Python implementations, and Python 3.8 and later emit a SyntaxWarning for is with a literal. Avoid at all costs.
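A small sketch comparing the first three checks on several built-in containers and on a custom class. Basket and emptiness_checks are hypothetical names used only for illustration. The is 0 form is left out on purpose, for the reason given above.

```python
class Basket:
    """Hypothetical container that defines __len__ but not __bool__."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


def emptiness_checks(container):
    """Return the result of each check as a tuple of booleans."""
    return (
        not container,            # truth testing falls back to __len__
        not len(container),
        len(container) == 0,
    )


for c in ([], [1], (), "", {}, Basket([]), Basket([1, 2])):
    print(type(c).__name__, emptiness_checks(c))
```

For all of these containers the three checks agree, so the choice between them comes down to readability.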

Joining Lists and Splitting Strings [duplicate]

This question already has answers here:
Why is it string.join(list) instead of list.join(string)?
(11 answers)
Closed 7 years ago.
I have some previous experience with C++ and am just getting started with Python. I read this text from Dive into Python:
In my experience, the general idea is that if you want to perform an operation on object 'O', you call a method on that object to do it.
E.g., if I have a list object and I want to get the sum of all its elements, I would do something like:
listObject.getSumOfAllElements()
However, the call given in above book excerpt looks a little odd to me. To me this would make more sense :
return (["%s=%s" % (k, v) for k, v in params.items()]).join(";")
i.e. join all the elements of the list as a string, and here, use this parameter ';' as a delimiter.
Is this a design difference or just syntactically a little different than what I am thinking?
Edit:
For completeness, the book says this a little later:
Is this a design difference or just syntactically a little different than what I am thinking?
Yes, I think it is by design. The join method is defined on the separator string and is intentionally generic, so that it can be used with any iterable of strings (including iterator objects and generators). This avoids implementing join separately for list, tuple, set, etc.
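To illustrate the point about generality, here is a brief sketch that calls the same str.join method on several kinds of iterable. The params dictionary is hypothetical sample data, not the one from the book.

```python
params = {"host": "localhost", "port": 5432}   # hypothetical sample data

# A list of strings.
from_list = ";".join(["%s=%s" % (k, v) for k, v in params.items()])

# A generator expression: no list.join would be available here.
from_generator = ";".join("%s=%s" % (k, v) for k, v in params.items())

# A tuple of strings.
from_tuple = ";".join(("host=localhost", "port=5432"))

# A dict: iterating it yields its keys.
from_keys = ";".join(params)

print(from_list)       # host=localhost;port=5432
print(from_keys)       # host;port
```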

Python build a list by two different methods [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What is the difference between LIST.append(1) and LIST = LIST + [1] (Python)
I'm new to Python and new to programming. I followed the book ThinkPython and here is one thing I can't get straight.
Exercise 10.7 Write a function that reads the file words.txt and builds a list with one element per word. Write two versions of this function, one using the append method and the other using the idiom t = t + [x]. Which one takes longer to run? Why?
I tried the two methods and found that the latter one (t=t+[x]) took much longer than the append method. Here is my first question: why would this happen?
For no particular reason, I changed the line t=t+[x] to t+=[x], only to find that this revised version takes almost the same time as the append method. I thought t=t+[x] was equivalent to t+=[x], but apparently it is not. Why?
BTW: I tried searching Google using python += as keywords, but it seems Google won't take += as a keyword even if I put quotation marks around it.
t = t + [x]
takes t, concatenates with [x] (calling t's method __add__), which creates a new list, which is then named t.
t += [x]
calls t's method __iadd__, which works directly on the list itself. No extra list is created.
First, you need to know that the __add__ method creates a new object, which means copying every existing element each time, while append() just modifies the existing object, resulting in better performance.
As for the second question, knowing the above, you can work out what the '+=' or 'plus equals' operator is equivalent to in Python and why it therefore behaves differently from the '+' operator.
You might also want to check out this link, which explains the difference between the __add__ and __iadd__ methods that are being called in your example, and perhaps this one as well to consolidate your knowledge.
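The difference can be observed directly by keeping a second name bound to the original list. This is a minimal sketch; the variable names are arbitrary.

```python
t = [1, 2]
alias = t
t = t + [3]          # __add__: builds a new list and rebinds t to it
print(t is alias)    # False
print(alias)         # [1, 2]  (the original list is unchanged)

u = [1, 2]
alias_u = u
u += [3]             # __iadd__: extends the existing list in place
print(u is alias_u)  # True
print(alias_u)       # [1, 2, 3]
```

Because t = t + [x] copies the whole list on every iteration, building an n-element list that way does work proportional to n squared, while append and += stay close to linear.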

Length of the longest sublist? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Python’s most efficient way to choose longest string in list?
I have a list L
L = [[1,2,3],[5,7],[1,3],[77]]
I want to return the length of the longest sublist without needing to loop through them, in this case 3 because [1,2,3] is length 3 and it is the longest of the four sublists. I tried len(max(L)) but this doesn't do what I want. Any way to do this or is a loop my only way?
max(L,key=len) will give you the object with the longest length ([1,2,3] in your example). To actually get the length (if that's all you care about), you can do len(max(L,key=len)), which is a bit ugly, so I'd break it up into two lines. Or you can use the version supplied by ecatamur.
All of these answers have loops. In my case the loops are implicit, which usually means they run in optimized native code inside the interpreter. If you think about it, how could you know which element is the longest without looking at each one?
Finally, note that key=function isn't a feature that is specific to max. A lot of the python builtins (max,min,sorted,itertools.groupby,...) use this particular keyword argument. It's definitely worth investing a little time to understand how it works and what it typically does.
Try a comprehension:
max(len(l) for l in L)
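A short sketch putting the approaches side by side on the list from the question, including why len(max(L)) gives the wrong answer. The default=0 line is an optional extra for the empty-list case and is not part of the original answers.

```python
L = [[1, 2, 3], [5, 7], [1, 3], [77]]

# max(L) compares the sublists element by element, so [77] wins.
wrong = len(max(L))
print(max(L), wrong)                  # [77] 1

# key=len compares the sublists by length instead.
longest = max(L, key=len)
print(longest, len(longest))          # [1, 2, 3] 3

# Taking the max of the lengths directly.
longest_len = max(len(sub) for sub in L)
print(longest_len)                    # 3

# max() raises ValueError on an empty iterable unless default is given.
empty_safe = max((len(sub) for sub in []), default=0)
print(empty_safe)                     # 0
```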
