I'm learning about data structures and algorithm efficiency in my CS class right now, and I've written an algorithm that returns a new list with the items in reverse order from the original list. Now I'm trying to figure out how to do this in place on a Python list using recursion, but the solution is eluding me. Here is my code:
def reverse_list(in_list):
    n = len(in_list)
    print('List is: ', in_list)
    if n <= 2:
        in_list[0], in_list[-1] = in_list[-1], in_list[0]
        return in_list
    else:
        first = [in_list[0]]
        last = [in_list[-1]]
        temp = reverse_list(in_list[1:-1])
        in_list = last + temp + first
        print('Now list is: ', in_list)
        return in_list

if __name__ == '__main__':
    list1 = list(range(1, 12))
    list2 = reverse_list(list1)
    print(list2)
As a side note, this is O(n) in the average case since it does about n/2 swaps, right?
You really don't want to use recursion here: it doesn't simplify the problem, and you pay extra overhead for the function calls:
list3 = list(range(1, 10))
for i in range(len(list3) // 2):  # integer division: only n/2 swaps are needed
    list3[i], list3[len(list3)-i-1] = list3[len(list3)-i-1], list3[i]
print(list3)
If you wanted to approach it using recursion, you'd pass an index counter into each call of your function until you had gotten halfway through the list. That is n/2 swaps, which is still O(n) runtime.
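For illustration, a minimal sketch of that idea might look like this (reverse_in_place is my own name, not from the question):

def reverse_in_place(a, i=0):
    # Recursively reverse a in place by swapping a[i] with a[-1-i].
    if i >= len(a) // 2:  # reached the middle: done
        return a
    a[i], a[-1 - i] = a[-1 - i], a[i]
    return reverse_in_place(a, i + 1)

print(reverse_in_place(list(range(1, 12))))  # [11, 10, ..., 1]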
Your n <= 2 case is good (though note it would raise an IndexError on an empty list). It is "in-place" in both of the usual senses. It modifies the same list as is passed in, and it only uses a constant amount of memory in addition to the list passed in.
Your general case is where you have problems. It uses a lot of extra memory (n/2 levels of recursion, each of which holds two lists across the recursive call, so all those lists at all levels have to exist at once), and it does not modify the list passed in (rather, it returns a new list).
I don't want to do too much for you, but the key to solving this is to pass one (or two, if you prefer) extra parameter(s) into the recursive step, either as optional parameters to your function or through a recursive helper function that takes them. Use them to indicate what region of the list to reverse in place.
As someone mentioned in comments, CPython doesn't have a tail-recursion optimization. This means that no algorithm that uses call-recursion like this is ever going to be truly "in-place", since it will use linear amounts of stack. But "in-place" in the sense of "modifies the original list instead of returning a new one" is certainly doable.
So I was wondering how to best create a list of blank lists:
[[],[],[]...]
Because of how Python works with lists in memory, this doesn't work:
[[]]*n
This does create [[],[],...] but each element is the same list:
d = [[]]*n
d[0].append(1)
#[[1],[1],...]
Something like a list comprehension works:
d = [[] for x in xrange(0,n)]
But this uses the Python VM for looping. Is there any way to use an implied loop (taking advantage of it being written in C)?
d = []
map(lambda _: d.append([]), xrange(0, n))
This is actually slower. :(
Probably the only way that is marginally faster than
d = [[] for x in xrange(n)]
is
from itertools import repeat
d = [[] for i in repeat(None, n)]
It does not have to create a new int object on every iteration and is about 15% faster on my machine.
Edit: Using NumPy, you can avoid the Python loop using
d = numpy.empty((n, 0)).tolist()
but this is actually 2.5 times slower than the list comprehension.
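If you want to reproduce measurements like these, a minimal timeit harness (my own sketch, written for Python 3, where xrange is simply range) could look like this:

from itertools import repeat
from timeit import timeit

n = 10000
print(timeit(lambda: [[] for x in range(n)], number=1000))         # list comprehension baseline
print(timeit(lambda: [[] for i in repeat(None, n)], number=1000))  # repeat(None, n) variant

# If NumPy is installed, the variant from the edit above:
# import numpy
# print(timeit(lambda: numpy.empty((n, 0)).tolist(), number=1000))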
List comprehensions are actually implemented more efficiently than explicit looping (compare the dis output for example functions), and the map way has to invoke an opaque callable object on every iteration, which incurs considerable overhead.
Regardless, [[] for _dummy in xrange(n)] is the right way to do it, and none of the tiny (if they exist at all) speed differences between the various other ways should matter. Unless, of course, you spend most of your time doing this - but in that case, you should work on your algorithms instead. How often do you create these lists?
Here are two methods: one sweet, simple, and conceptual; the other more formal and extensible to a variety of situations after a dataset has been read in.
Method 1: Conceptual
X2=[]
X1=[1,2,3]
X2.append(X1)
X3=[4,5,6]
X2.append(X3)
X2 is now [[1,2,3],[4,5,6]], i.e. a list of lists.
Method 2: Formal and extensible
Another way is to store a list of lists of different numbers that the code reads from a file. (The file here holds the dataset train.)
train is a dataset with, say, 50 rows and 20 columns; train[0] gives the first row of the CSV file, train[1] gives the second row, and so on. I want to turn the 50-row dataset into a list of lists, excluding column 0, which is my explained variable here and so must be removed from the original train dataset, building it up list by list, i.e. a list of lists. Here's the code that does that.
Note that I read from index 1 in the inner loop since I am interested in the explanatory variables only. I also re-initialize X1=[] inside the outer loop; otherwise X2.append(X1[0:(len(train[0])-1)]) would keep appending the same ever-growing X1, and resetting it is more memory efficient besides.
X2 = []
for j in range(0, len(train)):
    X1 = []
    for k in range(1, len(train[0])):
        txt2 = train[j][k]
        X1.append(txt2)
    X2.append(X1[0:(len(train[0])-1)])
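For what it's worth, the same result can be had with a single list comprehension (a sketch of mine, assuming train is a list of rows as described above):

# Drop column 0 from every row; equivalent to the nested loops above.
X2 = [row[1:] for row in train]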
To create a list of empty lists, use the syntax below:
x = [[] for i in range(10)]
This creates a 1-d list of empty lists; to initialize the elements, put a number inside the inner brackets ([number]), and set the length of the list by putting the length in range(length).
To create a list of lists, use the syntax below:
x = [[0 for i in range(3)] for i in range(10)]
This initializes a 10*3 list of lists, with every value set to 0.
To access or manipulate an element:
x[1][2] = value
So I did some speed comparisons to get the fastest way.
List comprehensions are indeed very fast. The only way to get close is to avoid bytecode getting executed during construction of the list.
My first attempt was the following method, which would appear to be faster in principle:
l = [[]]
for _ in range(n):
    l.extend(map(list, l))  # in Python 3, wrap the map in list(...) first, since map is lazy
(produces a list of length 2**n, of course)
This construction is twice as slow as the list comprehension, according to timeit, for both short and long (a million) lists.
My second attempt was to use starmap to call the list constructor for me. This construction appears to run the list constructor at top speed, but is still slower, if only by a tiny amount:
from itertools import starmap
l = list(starmap(list,[()]*(1<<n)))
Interestingly enough, the execution time suggests that it is the final list call that makes the starmap solution slow, since its execution time is almost exactly equal to that of:
l = list([] for _ in range(1<<n))
My third attempt came when I realized that list(()) also produces a list, so I tried the apparently simple:
l = list(map(list, [()]*(1<<n)))
but this was slower than the starmap call.
Conclusion: for the speed maniacs:
Do use the list comprehension.
Only call functions if you have to.
Use builtins.
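For reference, here is a sketch of the kind of harness that can produce such comparisons (my own code, not the answer's original benchmark; Python 3):

from itertools import starmap
from timeit import timeit

n = 1 << 14  # length of the list being built

print(timeit(lambda: [[] for _ in range(n)], number=100))         # list comprehension
print(timeit(lambda: list(starmap(list, [()] * n)), number=100))  # starmap construction
print(timeit(lambda: list(map(list, [()] * n)), number=100))      # map construction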
I'm new to python and am trying to figure out the concept of time and space complexity. I want to make a dict of two lists, both of the same length. I can do this in the following two ways:
1) by looping over the lists and adding them to the dict:
dictLists = {}
for i in range(0, len(list1)):
    dictLists[list1[i]] = list2[i]
2) by zipping the lists and then making a dict from that:
dictZip = dict(zip(list1,list2))
To my understanding, the time complexity of the first method should be O(N) where N is the length of the lists. However, I do not know the time complexity for the second option, except the fact that the zip operation itself takes O(1) time complexity.
What would be the difference in time complexity between these two methods? Would there be additional space complexity in the second method due to an extra zip object?
Both have the same time and space complexity. They each have their own individual overheads that aren't included when talking about complexity, like the zip object you mentioned, the range object you didn't, and all the function calls that happen in the shadows.
In practice, these aren’t important, so don’t micro-optimize prematurely (“prematurely” here means without having a good reason to expect a performance problem, without encountering one, and without benchmarking) – pick the readable option of dict(zip(list1, list2)).
P.S.
except the fact that the zip operation itself takes O(1) time complexity
Creating a zip is O(1), but iterating over all of its elements is O(N) on the number of elements.
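A quick illustration of that laziness (my own snippet): the zip object is created instantly, and work happens only as elements are pulled from it.

pairs = zip(range(10**9), range(10**9))  # O(1): nothing has been iterated yet
print(next(pairs))  # (0, 0) -- one element of work
print(next(pairs))  # (1, 1)
# Consuming all N elements, e.g. via dict(pairs), is what costs O(N).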
Because Python is a dynamic, interpreted language that has to figure out the types of variables at runtime, variations in how you implement your code can make a noticeable difference in run time. For example, in the first solution Python needs to figure out the type of i on every iteration (this can be fixed using Cython), which slows the program down somewhat. That said, you probably wouldn't notice it with a small number of iterations. As you can see in the test bench below, the first approach is almost 4x slower.
import time

list1 = [x for x in range(1000000)]
list2 = [x for x in range(1000000)]

dictLists = dict()
l = len(list1)
s = time.time()
for i in range(0, l):
    dictLists[list1[i]] = list2[i]
print(f"Time: {time.time()-s}")
# 0.39275574684143066

dictLists = dict()
s = time.time()
dictZip = dict(zip(list1, list2))
print(f"Time: {time.time()-s}")
# 0.09296393394470215
I noticed from this answer that the code
for i in userInput:
    if i in wordsTask:
        a = i
        break
can be written as a list comprehension in the following way:
next(i for i in userInput if i in wordsTask)
I have a similar problem which is that I would like to write the following (simplified from original problem) code in terms of a list comprehension:
for i in xrange(N):
    point = Point(long_list[i], lat_list[i])
    for feature in feature_list:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            new_list.append(feature['properties'])
            break
I expect each point to be associated with a single polygon from the feature list. Hence, once a polygon that contains the point is found, break is used to move on to the next point. Therefore, new_list will have exactly N elements.
I wrote it as a list comprehension as follows:
new_list = [feature['properties'] for i in xrange(1000) for feature in feature_list if shape(feature['geometry']).contains(Point(long_list[i], lat_list[i]))]
Of course, this doesn't take into account the break in the if statement, and therefore takes significantly longer than using nested for loops. Using the advice from the above-linked post (which I probably don't fully understand), I did
new_list2 = next(feature['properties'] for i in xrange(1000) for feature in feature_list if shape(feature['geometry']).contains(Point(long_list[i], lat_list[i])))
However, new_list2 has far fewer than N elements (in my case, N=1000 and new_list2 had only 5 elements).
Question 1: Is it even worth doing this as a list comprehension? The only reason is that I read that list comprehensions are usually a bit faster than nested for loops. With 2 million data points, every second counts.
Question 2: If so, how would I go about incorporating the break statement in a list comprehension?
Question 3: What was the error going on with using next in the way I was doing?
Thank you so much for your time and kind help.
List comprehensions are not necessarily faster than a for loop. If you have a pattern like:
some_var = []
for ...:
    if ...:
        some_var.append(some_other_var)
then yes, the list comprehension is faster than the bunch of .append()s. You have extenuating circumstances, however. For one thing, it is actually a generator expression in the case of next(...) because it doesn't have the [ and ] around it.
You aren't actually creating a list (and therefore not using .append()). You are merely getting one value.
Your generator calls Point(long_list[i], lat_list[i]) once for each feature for each i in xrange(N), whereas the loop calls it only once for each i.
and, of course, your generator expression doesn't work.
Why doesn't your generator expression work? Because it finds only the first value overall. The loop, on the other hand, finds the first value for each i. You see the difference? The generator expression breaks out of both loops, but the for loop breaks out of only the inner one.
If you want a slight improvement in performance, use itertools.izip() (or just zip() in Python 3):
from itertools import izip

for long, lat in izip(long_list, lat_list):
    point = Point(long, lat)
    ...
I don't know that complex list comprehensions or generator expressions are that much faster than nested loops if they're running the same algorithm (e.g. visiting the same number of values). To get a definitive answer you should probably try to implement a solution both ways and test to see which is faster for your real data.
As for how to short-circuit the inner loop but not the outer one, you'll need to put the next call inside the main list comprehension, with a separate generator expression inside of it:
new_list = [next(feature['properties'] for feature in feature_list
                 if shape(feature['geometry']).contains(Point(long, lat)))
            for long, lat in zip(long_list, lat_list)]
I've changed up one other thing: Rather than indexing long_list and lat_list with indexes from a range I'm using zip to iterate over them in parallel.
Note that if creating the Point objects over and over ends up taking too much time, you can streamline that part of the code by adding in another nested generator expression that creates the points and lets you bind them to a (reusable) name:
new_list = [next(feature['properties'] for feature in feature_list
                 if shape(feature['geometry']).contains(point))
            for point in (Point(long, lat) for long, lat in zip(long_list, lat_list))]
I am trying to create a function, new_function, that takes a number as an argument.
This function will manipulate values in a list based on what number I pass as an argument. Within this function, I will place another function, new_sum, that is responsible for manipulating values inside the list.
For example, if I pass 4 into new_function, I need new_function to run new_sum on each of the first four elements. The corresponding value will change, and I need to create four new lists.
example:
listone = [1, 2, 3, 4, 5]

def new_function(value):
    for i in range(0, value):
        new_list = listone[:]
        variable = new_sum(i)
        new_list[i] = variable
        return new_list
# running new_function(4) should return four new lists
# [(new value for index zero, based on new_sum),2,3,4,5]
# [1,(new value for index one, based on new_sum),3,4,5]
# [1,2,(new value for index two, based on new_sum),4,5]
# [1,2,3,(new value for index three, based on new_sum),5]
My problem is that I keep on getting one giant list. What am I doing wrong?
Fix the indentation of the return statement:
listone = [1, 2, 3, 4, 5]

def new_function(value):
    for i in range(0, value):
        new_list = listone[:]
        variable = new_sum(i)
        new_list[i] = variable
    return new_list
The problem with return new_list is that once you return, the function is done.
You can make things more complicated by accumulating the results and returning them all at the end:
listone = [1, 2, 3, 4, 5]

def new_function(value):
    new_lists = []
    for i in range(0, value):
        new_list = listone[:]
        variable = new_sum(i)
        new_list[i] = variable
        new_lists.append(new_list)
    return new_lists
However, this is exactly what generators are for: if you yield instead of return, that gives the caller one value, and the function then resumes when the caller asks for the next value. So:
listone = [1, 2, 3, 4, 5]

def new_function(value):
    for i in range(0, value):
        new_list = listone[:]
        variable = new_sum(i)
        new_list[i] = variable
        yield new_list
The difference is that the first version gives the caller a list of four lists, while the second gives the caller an iterator of four lists. Often, you don't care about the difference—and, in fact, an iterator may be better for responsiveness, memory, or performance reasons.*
If you do care, it often makes more sense to just make a list out of the iterator at the point you need it. In other words, use the second version of the function, then just write:
new_lists = list(new_function(4))
By the way, you can simplify this by not trying to mutate new_list in-place, and instead just change the values while copying. For example:
def new_function(value):
    for i in range(value):
        yield listone[:i] + [new_sum(i)] + listone[i+1:]
* Responsiveness is improved because you get the first result as soon as it's ready, instead of only after they're all ready. Memory use is improved because you don't need to keep all of the lists in memory at once, just one at a time. Performance may be improved because interleaving the work can result in better cache behavior and pipelining.
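Since new_sum is never defined in the thread, here is a runnable sketch with a hypothetical stand-in for it, just to show the generator version in action:

listone = [1, 2, 3, 4, 5]

def new_sum(i):
    # Hypothetical stand-in; the real new_sum comes from the asker's code.
    return listone[i] * 10

def new_function(value):
    for i in range(value):
        yield listone[:i] + [new_sum(i)] + listone[i+1:]

print(list(new_function(4)))
# [[10, 2, 3, 4, 5], [1, 20, 3, 4, 5], [1, 2, 30, 4, 5], [1, 2, 3, 40, 5]]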