Confused myself with matrix and array addition - python

I was doing some practice with lists, arrays and matrices in python and I got confused at something.
if I do:
list1 = [1,2,3,4]
list2 = [2,3,4,5]
print list1 + list2
Output:
I get [1,2,3,4,2,3,4,5]
Just yesterday I was doing something similar, but I got
Output2:
[3,5,7,9]
i.e. the element-wise addition of the values at each respective position in the two lists. In that case I was actually expecting the first output, but it added the values instead.
I haven't done linear algebra or prob & stats in a while. What is the method called that produces output1? And output2? I've confused myself badly.
edit: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.add.html
If you look at the 2nd example, they add a 3x3 array to a 1x3 array. I thought you couldn't add them if they're not the same dimension?

When using standard Python lists, addition (the + operator) is defined as concatenation of the two lists:
import numpy as np
list1 = [1,2,3,4]
list2 = [2,3,4,5]
print list1 + list2
# [1, 2, 3, 4, 2, 3, 4, 5]
When using numpy types, addition is defined as element-wise addition rather than list concatenation.
array1 = np.array(list1)
array2 = np.array(list2)
print array1 + array2
# [3 5 7 9]
This is often called a vectorized operation. When the arrays are large it can be much faster than iterating over the structures in Python, since the vectorized operation uses a highly optimized implementation provided by numpy.
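Regarding the edit in the question (a 3x3 array plus a 1x3 array): numpy relaxes the same-shape rule through broadcasting, which virtually repeats the smaller array along the missing dimension when the trailing shapes are compatible. A minimal sketch, with made-up values for illustration:
import numpy as np

a = np.arange(9).reshape(3, 3)   # 3x3 array
b = np.array([10, 20, 30])       # 1x3 array

# b is broadcast across each row of a, so the shapes need not match exactly
print a + b
# [[10 21 32]
#  [13 24 35]
#  [16 27 38]]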

If you do not understand zip or numpy:
Assuming both lists list1 and list2 have the same length, this will do the job:
[list1[i] + list2[i] for i in xrange(len(list1))]
PS: simply using list1 + list2 only concatenates the two lists.
To add the respective elements you need to iterate through the lists.

You can get "Output2" canonically with sum() and zip():
result = [sum(item) for item in zip(list1, list2)]
If you put each list1, list2, etc. into a container (such as a tuple or another list, e.g. lists = [list1, list2]) you can instead use zip(*lists) and then not have to change that code for any quantity of lists.
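For example, a quick sketch (the third list here is invented just to show that the code does not change with more inputs):
list1 = [1, 2, 3, 4]
list2 = [2, 3, 4, 5]
list3 = [10, 10, 10, 10]   # hypothetical extra list

lists = [list1, list2, list3]
result = [sum(item) for item in zip(*lists)]
print result
# [13, 15, 17, 19]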

Related

best way to create a numpy array from a list and additional individual values

I want to create an array from list entries and some additional individual values.
I am using the following approach which seems clumsy:
x=[1,2,3]
y=some_variable1
z=some_variable2
x.append(y)
x.append(z)
arr = np.array(x)
#print arr --> [1 2 3 some_variable1 some_variable2]
is there a better solution to the problem?
You can use list addition to append the extra values, placed in a list, to the larger one, like so:
arr = np.array(x + [y, z])
Appending or concatenating lists is fine, and probably fastest.
Concatenating at the array level works as well
In [456]: np.hstack([x, y, z])   # here y = 4 and z = 5
Out[456]: array([1, 2, 3, 4, 5])
This is compact, but under the covers it does
np.concatenate([np.array(x),np.array([y]),np.array([z])])
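A quick sketch of that equivalence, assuming (as the Out above suggests) that y and z are the integers 4 and 5; note that if any value were a float, hstack would upcast the whole result to a float array:
import numpy as np

x = [1, 2, 3]
y, z = 4, 5   # assumed values, matching Out[456] above

print np.hstack([x, y, z])                                           # [1 2 3 4 5]
print np.concatenate([np.array(x), np.array([y]), np.array([z])])    # [1 2 3 4 5]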

Intersection of two lists, keeping duplicates in the first list

I have two flat lists where one of them contains duplicate values.
For example,
array1 = [1,4,4,7,10,10,10,15,16,17,18,20]
array2 = [4,6,7,8,9,10]
I need to find values in array1 that are also in array2, KEEPING THE DUPLICATES in array1.
Desired outcome will be
result = [4,4,7,10,10,10]
I want to avoid loops as the actual arrays will contain millions of values.
I have tried various set and intersect combinations, but just couldn't keep the duplicates...
What do you mean you don't want to use loops? You're going to have to iterate over it one way or another. Just take in each item individually and check if it's in array2 as you go:
items = set(array2)
found = [i for i in array1 if i in items]
Furthermore, depending on how you are going to use the result, consider using a generator:
found = (i for i in array1 if i in items)
so that you won't have to hold the whole result in memory at once.
The following will do it:
array1 = [1,4,4,7,10,10,10,15,16,17,18,20]
array2 = [4,6,7,8,9,10]
set2 = set(array2)
print [el for el in array1 if el in set2]
It keeps the order and repetitions of elements in array1.
It turns array2 into a set for faster lookups. Note that this is only beneficial if array2 is sufficiently large; if array2 is small, it may be more performant to keep it as a list.
Following on from @Alex's answer, if you also want to extract the index of each matching element, here's how (reusing the items set from above for fast lookups):
found = [[index, i] for index, i in enumerate(array1) if i in items]

Basic python: how to increase value of item in list [duplicate]

This question already has answers here:
Why does this iterative list-growing code give IndexError: list assignment index out of range?
How can I repeatedly add (append) elements to a list?
This is such a simple issue that I don't know what I'm doing wrong. Basically I want to iterate through the items in an empty list and increase each one according to some criteria. This is an example of what I'm trying to do:
list1 = []
for i in range(5):
    list1[i] = list1[i] + 2*i
This fails with a list index out of range error and I'm stuck. The expected result (what I'm aiming at) would be a list with values:
[0, 2, 4, 6, 8]
Just to be more clear: I'm not after producing that particular list. The question is about how I can modify the items of an initially empty list in a loop. As gnibbler showed below, initializing the list was the answer. Cheers.
Ruby (for example) lets you assign items beyond the end of the list. Python doesn't - you would have to initialise list1 like this
list1 = [0] * 5
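With that initialization in place, the original loop runs as intended; a quick sketch of the fixed version:
list1 = [0] * 5
for i in range(5):
    list1[i] = list1[i] + 2*i
print list1
# [0, 2, 4, 6, 8]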
Since you already have i inside the loop, you can just do the math on i directly and assign the result; there is no need to do the math on what is already in the list when you have i. So just use a list comprehension:
list1 = [2*i for i in range(5)]
Since you say your real case is more complex, you can skip the list comprehension and edit your for loop like this:
list1 = [0] * 5   # initialise first, as noted above
for i in range(5):
    x = 2*i
    list1[i] = x
This way you can do whatever computation you need, store the outcome in a variable, and assign it to the right slot. You could also do list1.append(x), which I actually prefer because it works with any list, even one that hasn't been pre-sized like a list made with range.
Edit: Since you want to manipulate the list element-wise, I would suggest using numpy! There is this great thing called vectorize that lets you apply a function to a 1D array:
import numpy as np
list1 = range(5)

def my_func(x):
    return x * 2

vfunc = np.vectorize(my_func)
vfunc(list1)
# array([0, 2, 4, 6, 8])
I would advise only using this for more complex functions, because you can use numpy broadcasting for easy things like multiplying by two.
Your list is empty, so when you try to read an element of the list (right hand side of this line)
list1[i] = list1[i] + 2*i
it doesn't exist, so you get the error message.
You may also wish to consider using numpy. The multiplication operation is overloaded to be performed on each element of the array. Depending on the size of your list and the operations you plan to perform on it, using numpy very well may be the most efficient approach.
Example:
>>> import numpy
>>> 2 * numpy.arange(5)
array([0, 2, 4, 6, 8])
I would instead write
for i in range(5):
    list1.append(2*i)
Yet another way to do this is to use the append method on your list. The reason you're getting an out of range error is that you're effectively saying:
list1 = []
list1.__getitem__(0)
and then trying to manipulate that item, but the item does not exist since you made an empty list.
Proof of concept:
list1 = []
list1[1]
IndexError: list index out of range
We can, however, append new stuff to this list like so:
list1 = []
for i in range(5):
    list1.append(i * 2)

Sorting based on one of the list among Nested list in python

I have a list such as [[4,5,6],[2,3,1]]. Now I want to sort it based on list[1], i.e. the output should be [[6,4,5],[1,2,3]]. So basically I am sorting 2,3,1 and reordering list[0] to keep the element-wise pairing with list[1].
While searching I found a function that sorts based on the first element of every sublist, but that is not what I need. Also I do not want to recreate the list as [[4,2],[5,3],[6,1]] and then use that function.
Since [4, 5, 6] and [2, 3, 1] serve two different purposes I will make a function taking two arguments: the list to be reordered, and the list whose sorting will decide the order. I'll only return the reordered list.
This answer has timings of three different solutions for creating a permutation list for a sort. Using the fastest option gives this solution:
def pyargsort(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)

def using_pyargsort(a, b):
    "Reorder the list a the same way as list b would be reordered by a normal sort"
    return [a[i] for i in pyargsort(b)]

print using_pyargsort([4, 5, 6], [2, 3, 1])  # [6, 4, 5]
The pyargsort method is inspired by the numpy argsort method, which does the same thing much faster. Numpy also has advanced indexing operations whereby an array can be used as an index, making possible very quick reordering of an array.
So if your need for speed is great, one would assume that this numpy solution would be faster:
import numpy as np
def using_numpy(a, b):
    "Reorder the list a the same way as list b would be reordered by a normal sort"
    return np.array(a)[np.argsort(b)].tolist()

print using_numpy([4, 5, 6], [2, 3, 1])  # [6, 4, 5]
However, for short lists (length < 1000), this solution is in fact slower than the first. This is because we first convert the a and b lists to arrays and then convert the result back to a list before returning. If we instead assume you're using numpy arrays throughout your application, so that we do not need to convert back and forth, we get this solution:
def all_numpy(a, b):
    "Reorder array a the same way as array b would be reordered by a normal sort"
    return a[np.argsort(b)]

print all_numpy(np.array([4, 5, 6]), np.array([2, 3, 1]))  # [6 4 5]
The all_numpy function executes up to 10 times faster than the using_pyargsort function.
The following logarithmic graph compares these three solutions with the two alternative solutions from the other answers. The arguments are two randomly shuffled ranges of equal length, and the functions all receive identically ordered lists. I'm timing only the time the function takes to execute. For illustrative purposes I've added an extra graph line for each numpy solution where the 60 ms overhead for loading numpy is added to the time.
As we can see, the all-numpy solution beats the others by an order of magnitude. Converting from python list and back slows the using_numpy solution down considerably in comparison, but it still beats pure python for large lists.
For a list length of about 1,000,000, using_pyargsort takes 2.0 seconds, using_numpy + overhead is only 1.3 seconds, while all_numpy + overhead is 0.3 seconds.
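For reference, a minimal timing harness along these lines (not the original benchmark; it assumes the three functions defined above and just shuffles two equal-length ranges) might look like:
import random
import timeit

n = 10000
a = range(n)
b = range(n)
random.shuffle(a)
random.shuffle(b)

for f in (using_pyargsort, using_numpy):
    # time 10 runs of each list-based solution on identical inputs
    print f.__name__, timeit.timeit(lambda: f(a, b), number=10)

aa, bb = np.array(a), np.array(b)
print all_numpy.__name__, timeit.timeit(lambda: all_numpy(aa, bb), number=10)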
The sorting you describe is not very easy to accomplish. The only way that I can think of to do it is to use zip to create the list you say you don't want to create:
lst = [[4,5,6],[2,3,1]]
# key = operator.itemgetter(1) works too, and may be slightly faster ...
transpose_sort = sorted(zip(*lst),key = lambda x: x[1])
lst = zip(*transpose_sort)
Is there a reason for this constraint?
(Also note that you could do this all in one line if you really want to:
lst = zip(*sorted(zip(*lst),key = lambda x: x[1]))
This also results in a list of tuples. If you really want a list of lists, you can map the result:
lst = map(list, lst)
Or a list comprehension would work as well:
lst = [ list(x) for x in lst ]
If the second list doesn't contain duplicates, you could just do this:
l = [[4,5,6],[2,3,1]] #the list
l1 = l[1][:] #a copy of the to-be-sorted sublist
l[1].sort() #sort the sublist
l[0] = [l[0][l1.index(x)] for x in l[1]] #order the first sublist accordingly
(As this makes a copy of the sublist l[1], it might be a bad idea if your input list is huge.)
How about this one:
a = [[4,5,6],[2,3,1]]
[a[0][i] for i in sorted(range(len(a[1])), key=lambda x: a[1][x])]
It uses the same principle numpy does, without having to use numpy and without the zip stuff.
Neither using numpy nor the zipping around seems to be the cheapest way for giant structures. Unfortunately the .sort() method is built into the list type and uses hard-wired access to the elements in the list (overriding __getitem__() or similar does not have any effect here).
So you can implement your own sort() which sorts two or more lists according to the values in one; this is basically what numpy does.
Or you can create a list of values to sort, sort that, and recreate the sorted original list out of it.
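A rough sketch of that last idea (decorate each value with its sort key, sort the pairs, then rebuild both lists in the new order; just an illustration, not a tuned implementation):
a = [4, 5, 6]
b = [2, 3, 1]

# pair each key with its position, sort the pairs by key,
# then rebuild both lists in the new order
pairs = sorted((key, i) for i, key in enumerate(b))
b_sorted = [key for key, i in pairs]
a_reordered = [a[i] for key, i in pairs]

print a_reordered, b_sorted  # [6, 4, 5] [1, 2, 3]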

list comprehension to merge various lists in python

I need to plot a lot of data samples, each stored in a list of integers. I want to create one big list out of many concatenated lists, so that I can plot it with enumerate(big_list) and get a fixed-offset x coordinate.
My current code is:
biglist = []
for n in xrange(number_of_lists):
    biglist.extend(recordings[n][chosen_channel])

for x,y in enumerate(biglist):
    print x,y
Notes: number_of_lists and chosen_channel are integer parameters defined elsewhere, and print x,y is just an example (in reality there are other statements to plot the points).
My question is:
Is there a better way, for example a list comprehension or some other operation, to achieve the same result (the merged list) without the explicit loop and the pre-declared empty list?
Thanks
import itertools
for x,y in enumerate(itertools.chain(*(recordings[n][chosen_channel] for n in xrange(number_of_lists)))):
    print x,y
You can think of itertools.chain() as managing an iterator over the individual lists. It remembers which list you are in and where in that list you are. This saves you all the memory you would otherwise need to create the big list.
>>> import itertools
>>> l1 = [2,3,4,5]
>>> l2=[9,8,7]
>>> itertools.chain(l1,l2)
<itertools.chain object at 0x100429f90>
>>> list(itertools.chain(l1,l2))
[2, 3, 4, 5, 9, 8, 7]
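As a side note, itertools.chain.from_iterable does the same job without unpacking the generator with *, which reads a little cleaner here (a sketch, reusing the recordings, chosen_channel and number_of_lists names from the question):
import itertools

# recordings, chosen_channel and number_of_lists are assumed to be defined as in the question
channels = (recordings[n][chosen_channel] for n in xrange(number_of_lists))
for x, y in enumerate(itertools.chain.from_iterable(channels)):
    print x, y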
