Efficient way to compare elements in two lists? - python

I know this is similar to "Efficient way to compare elements in 2 lists", but my question is basically an extension of that one.
Say I have two lists:
a = [1,2,4,1,0,3,2]
b = [0,1,2,3,4]
I want to find the indices of a at which the element equals each element of b.
For instance, for b[1] I would want the output to tell me that a[i] == b[1] for i in [0,3].
A DataFrame output would be useful as well, something like:
b  index_a
0  4
1  0
1  3
2  1
2  6
3  5
4  2
What I used before was:
import pandas as pd
b = pd.DataFrame(b)
a = pd.DataFrame(a)
pd.merge(b.reset_index(), a.reset_index(),
         left_on=b.columns.tolist(),
         right_on=a.columns.tolist(),
         suffixes=('_b', '_a'))[['index_b', 'index_a']]
However, I am unsure whether this is necessary, since these are plain lists (I used this method previously when I was working with DataFrames).
I am doing this operation thousands of times with much larger lists, so I am wondering if there is a more efficient method.
In addition, b is just list(range(X)), where in this case X = 5.
If anyone has some input I'd greatly appreciate it!
Thanks

A very simple and efficient solution is to build a mapping from the values in the range 0..N-1 to the indices of a. Since the keys are consecutive integers, the mapping can be a plain list, so you end up with:
indices = [[] for _ in b]
for i, x in enumerate(a):
    indices[x].append(i)
Example run:
>>> a = [1,2,4,1,0,3,2]
>>> b = [0,1,2,3,4]
>>> indices = [[] for _ in b]
>>> for i, x in enumerate(a):
...     indices[x].append(i)
...
>>> indices[1]
[0, 3]
Note that b[i] == i, so the b list itself isn't really needed; only its length matters.
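Building on that list-of-lists mapping, the long (b, index_a) table from the question can be produced without pandas. This is a minimal sketch, assuming b is list(range(n)) and every value in a lies in that range; the function name index_pairs is hypothetical.

```python
def index_pairs(a, n):
    """Return (value, index_in_a) pairs sorted by value; assumes 0 <= x < n for every x in a."""
    # One bucket per possible value 0..n-1, indexed directly by the value
    buckets = [[] for _ in range(n)]
    for i, x in enumerate(a):
        buckets[x].append(i)
    # Flatten the buckets into long-format rows; indices inside a bucket stay ascending
    return [(value, i) for value, idxs in enumerate(buckets) for i in idxs]


a = [1, 2, 4, 1, 0, 3, 2]
for value, i in index_pairs(a, 5):
    print(value, i)
```

If a DataFrame is really required, pd.DataFrame(index_pairs(a, 5), columns=["b", "index_a"]) wraps the result once, instead of paying for a merge on every call.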

import collections
dd = collections.defaultdict(list)
for i, x in enumerate(a):
    dd[x].append(i)
>>> sorted(dd.items())
[(0, [4]), (1, [0, 3]), (2, [1, 6]), (3, [5]), (4, [2])]

If b consists of sorted consecutive integers, as you've shown here, then a bucket approach (one list per value, indexed by the value itself) is the most effective.
Otherwise, you can construct a hash table with the values of b as keys and lists of the matching indices of a as values.
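For the hash-table case, a minimal sketch could look like this, assuming the values of b are hashable and that values of a which do not appear in b should simply be skipped; indices_by_key is a hypothetical name.

```python
def indices_by_key(a, b):
    """Map each value in b to the list of positions in a where it occurs."""
    # Hash table with one (initially empty) entry per value of b
    positions = {key: [] for key in b}
    for i, x in enumerate(a):
        bucket = positions.get(x)
        if bucket is not None:  # ignore values of a that are not in b
            bucket.append(i)
    return positions


a = [10, 30, 10, 99, 20]
b = [20, 10, 30, 40]
print(indices_by_key(a, b))  # {20: [4], 10: [0, 2], 30: [1], 40: []}
```

Each element of a is looked up once, so the cost is roughly proportional to len(a) + len(b), regardless of whether b is a contiguous range.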

I'm not sure if this is efficient enough for your needs, but this would work:
from collections import defaultdict
indexes = defaultdict(set)
a = [1,2,4,1,0,3,2]
b = [0,1,2,3,4]
for i, x in enumerate(a):
    indexes[x].add(i)
for x in b:
    print(x, indexes.get(x))

Related

Python sum values from multiple lists (more than two)

Looking for a pythonic way to sum values from multiple lists:
I have got the following list of lists:
a = [0,5,2]
b = [2,1,1]
c = [1,1,1]
d = [5,3,4]
my_list = [a,b,c,d]
I am looking for the output:
[8,10,8]
I've used:
print([sum(x) for x in zip(*my_list)])
but zip only works when I have 2 elements in my_list.
Any idea?
zip works for an arbitrary number of iterables:
>>> list(map(sum, zip(*my_list)))
[8, 10, 8]
which is, of course, roughly equivalent to your comprehension which also works:
>>> [sum(x) for x in zip(*my_list)]
[8, 10, 8]
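To illustrate that zip(*lists) is not limited to two lists, here is a small sketch with hypothetical helper names. One caveat worth knowing: plain zip stops at the shortest list, so the second helper uses itertools.zip_longest with fillvalue=0, assuming that missing values should count as 0.

```python
from itertools import zip_longest


def column_sums(*lists):
    """Sum values position by position across any number of equal-length lists."""
    # zip(*lists) yields one tuple per position, holding one value from each list
    return [sum(column) for column in zip(*lists)]


def column_sums_padded(*lists):
    """Like column_sums, but treats missing trailing values in shorter lists as 0."""
    return [sum(column) for column in zip_longest(*lists, fillvalue=0)]


print(column_sums([0, 5, 2], [2, 1, 1], [1, 1, 1], [5, 3, 4]))  # [8, 10, 8]
print(column_sums_padded([1, 2, 3], [10]))  # [11, 2, 3]
```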
NumPy has a nice way of doing this, and it can also handle very large arrays. First, create my_list as a NumPy array:
import numpy as np
a = [0,5,2]
b = [2,1,1]
c = [1,1,1]
d = [5,3,4]
my_list = np.array([a,b,c,d])
To get the sum over the columns, you can do the following:
np.sum(my_list, axis=0)
Alternatively, the sum over the rows can be retrieved with:
np.sum(my_list, axis=1)
I'd make it a NumPy array and then sum along axis 0:
import numpy
my_list = numpy.array([a,b,c,d])
print(my_list.sum(axis=0))
Output:
[ 8 10  8]

In-place modification of Python lists

I am trying to perform in-place modification of a list of lists at the level of the outer list. However, when I try to modify the iterating variable (row in the example below), it appears to create a new pointer to it rather than modifying it.
Smallest example of my problem:
c = [1,2,3]
for x in c:
    x = x + 3
print(c)  # returns [1,2,3], expected [4,5,6]
The above is a trivial example of my problem. Is there a way to modify x elementwise and in place, and have the changes appear in c?
Less trivial example of my problem. I am switching all 0's to 1's and vice-versa.
A = [[1,1,0],
[1,0,1],
[0,0,0]]
for row in A:
    row = list(map(lambda val: 1 - val, row))
print(A)
Expected
A = [[0,0,1],
[0,1,0],
[1,1,1]]
Returned
A = [[1,1,0],
[1,0,1],
[0,0,0]]
Update:
Great answers so far. I am interested in how the iterating variable (row in the second example) is linked to the iterable (A in the second example).
If I do the following, which reverses each sublist of A, it works perfectly.
Why does this example, where I modify the iterating variable, work when the examples above do not?
A = [[1,1,0],
[1,0,1],
[0,0,0]]
for row in A:
    row.reverse()
print(A)
#returns, as expected
A = [[0, 1, 1],
[1, 0, 1],
[0, 0, 0]]
I found this in the docs: https://docs.python.org/3/tutorial/controlflow.html#for
Python’s for statement iterates over the items of any sequence (a list or a string), in the order that they appear in the sequence.
If you need to modify the sequence you are iterating over while inside the loop (for example to duplicate selected items), it is recommended that you first make a copy. Iterating over a sequence does not implicitly make a copy.
I was wrong in my first response: iterating through a list does give you the actual items in that list. However, assigning to the loop variable (x = ... or row = ...) only rebinds that name to a new object; it does not change what is stored in the list. This is why iterating over the indices (range(len(c))) and assigning to c[n] works.
As for why .reverse() works: it mutates the list object in place, and row refers to that same object stored in A, so the change is visible through A. When I tried similar built-in methods on non-list types, such as .replace() on strings, they had no effect: strings are immutable, so .replace() returns a new string and leaves the original unchanged.
The other list methods I tried also worked: .append(), .remove(), and .reverse() as you showed, because they all modify the list object itself. I hope this clears up what you can and can't do in for loops.
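A short sketch contrasting the two cases on the matrix from the question: rebinding the loop variable versus mutating the object it refers to. The is checks compare object identity, not equality.

```python
A = [[1, 1, 0], [1, 0, 1], [0, 0, 0]]

for i, row in enumerate(A):
    print(row is A[i])          # True: row is the very same list object stored in A
    row = [1 - v for v in row]  # rebinding: row now names a NEW list
    print(row is A[i])          # False: A[i] still holds the original list

print(A)  # [[1, 1, 0], [1, 0, 1], [0, 0, 0]] (unchanged)

for row in A:
    row.reverse()  # mutation: changes the list object that A also refers to

print(A)  # [[0, 1, 1], [1, 0, 1], [0, 0, 0]]

s = "abc"
t = s.replace("a", "x")  # str is immutable: replace() returns a new string
print(s, t)  # abc xbc
```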
Answer to old question below:
The way you are using the for loops doesn't affect the actual list, just the temporary variable that is iterating through the list. There are a few ways you can fix this. Instead of iterating through each element, you can count up to the length of the list and modify the list directly.
c = [1,2,3]
for n in range(len(c)):
    c[n] += 3
print(c)
You can also use the enumerate() function to iterate through both a counter and list items.
c = [1,2,3]
for n, x in enumerate(c):
    c[n] = x + 3
print(c)
In this case, n is a counter and x is the item in the list.
Finally, you can use a list comprehension to generate a new list with the desired changes in one line.
c = [1, 2, 3]
d = [x + 3 for x in c]
print(d)
The usual way to poke values into an existing list in Python is to use enumerate which lets you iterate over both the indices and the values at once -- then use the indices to manipulate the list:
c = [1,2,3]
for index, value in enumerate(c):
    c[index] = value + 3
For your second example you'd do almost the same:
A = [[1,1,0],
[1,0,1],
[0,0,0]]
for row in A:
    for index, val in enumerate(row):
        row[index] = 0 if val > 0 else 1
In the second example, the list objects in A become the loop variable row, and since you're only mutating them (not assigning to row), you don't need enumerate or an index for the outer loop.
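Wrapped as a function, the nested loop above can be checked directly. This is a sketch with a hypothetical name; the alias variable is only there to show that the original outer list object is the one that changes.

```python
def flip_bits_in_place(matrix):
    """Replace every 0 with 1 and every nonzero value with 0, mutating the inner lists."""
    for row in matrix:  # row is the inner list itself, so no outer index is needed
        for index, val in enumerate(row):
            row[index] = 0 if val > 0 else 1  # assigning to row[index] changes the list


A = [[1, 1, 0], [1, 0, 1], [0, 0, 0]]
alias = A  # a second name for the same outer list
flip_bits_in_place(A)
print(alias)  # [[0, 0, 1], [0, 1, 0], [1, 1, 1]]
```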
If you want to keep it concise without creating an additional variable, you could also use slice assignment, which replaces the contents of the existing list (note that id(c) is unchanged):
c = [1,2,3]
print(id(c))
c[:] = [i+3 for i in c]
print(c, id(c))
Output:
2881750110600
[4, 5, 6] 2881750110600
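The practical difference between slice assignment and plain assignment shows up when another name refers to the same list. A small sketch, where alias and alias_d are illustrative names:

```python
c = [1, 2, 3]
alias = c                  # another reference to the same list object
c[:] = [i + 3 for i in c]  # slice assignment replaces the contents in place
print(alias)               # [4, 5, 6]: alias sees the change

d = [1, 2, 3]
alias_d = d
d = [i + 3 for i in d]     # plain assignment binds d to a brand-new list
print(alias_d)             # [1, 2, 3]: the old list is untouched
```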
A list comprehension also works here (note that this builds a new list and rebinds A to it, rather than modifying the original list in place):
A = [[1,1,0],
[1,0,1],
[0,0,0]]
A = [[0 if x > 0 else 1 for x in row] for row in A]
print(A)
Output:
[[0, 0, 1],
[0, 1, 0],
[1, 1, 1]]

Python: how to find common values in three lists

I'm trying to find the list of values common to three different lists:
a = [1,2,3,4]
b = [2,3,4,5]
c = [3,4,5,6]
Naturally, I tried to use the and operator, but that way I just get the value of the last list in the expression:
>>> a and b and c
[3, 4, 5, 6]
Is there a short way to get the list of common values:
[3,4]
Br
Use sets:
>>> a = [1, 2, 3, 4]
>>> b = [2, 3, 4, 5]
>>> c = [3, 4, 5, 6]
>>> set(a) & set(b) & set(c)
{3, 4}
Or as Jon suggested:
>>> set(a).intersection(b, c)
{3, 4}
Using sets has the benefit that you don’t need to repeatedly iterate the original lists. Each list is iterated once to create the sets, and then the sets are intersected.
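A set is unordered, so if you want a list in the order of the first list (like the [3,4] in the question), one option is to build the lookup sets once and filter the first list against them. This is a sketch with a hypothetical name; it also drops duplicates from the first list.

```python
def common_values(first, *others):
    """Values of `first` present in every other list, in `first`'s order, without duplicates."""
    # Build each lookup set once, so membership tests are O(1) on average
    other_sets = [set(other) for other in others]
    seen = set()
    result = []
    for x in first:
        if x not in seen and all(x in s for s in other_sets):
            seen.add(x)
            result.append(x)
    return result


print(common_values([1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]))  # [3, 4]
```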
The naive way to solve this, using a filtered list comprehension as Geotob did, iterates over lists b and c for each element of a, so for longer lists it will be a lot less efficient.
out = [x for x in a if x in b and x in c]
is a quick and simple solution. This constructs a list out with entries from a, if those entries are in b and c.
For larger lists, you'll want to look at the answer provided by @poke.
For those still stumbling upon this question, with NumPy one can use:
np.intersect1d(array1, array2)
This works with lists as well as numpy arrays.
It could be extended to more arrays with the help of functools.reduce, or it can simply be repeated for several arrays.
import numpy as np
from functools import reduce
reduce(np.intersect1d, (array1, array2, array3))
or
new_array = np.intersect1d(array1, array2)
np.intersect1d(new_array, array3)
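The same reduce pattern also works without NumPy on plain lists by using sets; unlike np.intersect1d, which returns a sorted array, the result here is an unordered set. A sketch with a hypothetical name:

```python
from functools import reduce


def intersect_all(lists):
    """Intersect any number of lists using sets; the result is an unordered set."""
    # set.intersection accepts any iterable, so only the first list needs converting
    return reduce(set.intersection, lists[1:], set(lists[0]))


print(intersect_all([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]))  # {3, 4}
```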

Binary list from indices of ascending integer list

I have an ascending list of integers e that starts from 0 and I would like to have a binary list b whose i-th element is 1 if and only if i belongs to e.
For example, if e=[0,1,3,6], then this binary list should be [1,1,0,1,0,0,1], where the first 1 is because 0 is in e, the second 1 is because 1 is in e, the third 0 is because 2 is not in e, and so on.
You can find my code for that below.
My question is: is there something built-in in Python for that? If not, is my approach the most efficient?
def list2bin(e):
    b = [1]
    j = 1
    for i in range(1, e[-1]+1):
        if i == e[j]:
            b.append(1)
            j += 1
        else:
            b.append(0)
    return b
This can be done with a list comprehension, and if e is huge, it's better to convert it to a set first:
>>> e = [0, 1, 3, 6]
>>> [int(i in e) for i in range(0, e[-1]+1)]
[1, 1, 0, 1, 0, 0, 1]
The in operator returns True or False depending on whether the item is found in the list, and you can convert that bool to an integer using int. Note that for lists, in is an O(N) operation, so if e is large, converting it to a set will be much more efficient.
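Following that advice, a set-based version could look like this. list2bin_set is a hypothetical name, and e is assumed to be non-empty and ascending, so that e[-1] is its maximum.

```python
def list2bin_set(e):
    """Binary list whose i-th entry is 1 iff i is in e (e ascending, non-empty, non-negative)."""
    members = set(e)  # O(1) average membership tests instead of O(len(e)) for a list
    return [int(i in members) for i in range(e[-1] + 1)]


print(list2bin_set([0, 1, 3, 6]))  # [1, 1, 0, 1, 0, 0, 1]
```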
I don't think there is a built-in way to do that, but you can use a list comprehension:
a = [1 if i in e else 0 for i in range(e[-1]+1)]
Have fun.
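A different technique, not taken from either answer: because e is ascending, its last element is the maximum, so you can preallocate a list of zeros and set only the positions listed in e. This touches each element of e once instead of testing membership for every index. A sketch with a hypothetical name, assuming e is non-empty and contains only non-negative integers:

```python
def list2bin_prealloc(e):
    """Set b[i] = 1 for each i in e; assumes e is ascending, non-empty and non-negative."""
    b = [0] * (e[-1] + 1)  # e[-1] is the largest value because e is ascending
    for i in e:
        b[i] = 1
    return b


print(list2bin_prealloc([0, 1, 3, 6]))  # [1, 1, 0, 1, 0, 0, 1]
```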

python: Matching between indexes in a matrix and list

Suppose you have a tuple of tuples (it looks like a matrix).
Now I want to change its contents, so I convert it into a list.
Suppose that I know the number of rows and columns of the matrix.
How can I map the indexes in the matrix to the indexes in the list?
Thanks in advance.
With a list, you can simply use the [] operator again. So, for example:
>>> a = [[1,2], [3, 4]]
>>> a[0][0]
1
>>> a[0][1]
2
>>> a[1][0]
3
>>> a[1][1]
4
>>> type(a)
<class 'list'>
>>> type(a[0])
<class 'list'>
>>> type(a[0][0])
<class 'int'>
The explanation is simple: the first time you use the [] operator, you get a list, so you can use the [] operator again on that list. This way, you can emulate a matrix.
If you want to find the indexes, then you can use this nifty little function:
def finder(value_to_find, matrix):
    for r, row in enumerate(matrix):
        for c, col in enumerate(row):
            if col == value_to_find:
                return r, c
And, for a demo:
>>> a = [[1,2], [3, 4]]
>>> a[0][0]
1
>>> a[0][1]
2
>>> a[1][0]
3
>>> a[1][1]
4
>>> def finder(value_to_find, matrix):
...     for r, row in enumerate(matrix):
...         for c, col in enumerate(row):
...             if col == value_to_find:
...                 return r, c
...
>>> finder(4, a)
(1, 1)
And here is an explanation with comments:
def finder(value_to_find, matrix):
    """
    Find the indexes of the first occurrence of a given value
    @param value_to_find: The value we need to find
    @param matrix: The matrix we are to work with
    @return: A tuple of row, column
    """
    # Looping over the rows (lists within the main matrix)
    for r, row in enumerate(matrix):  # enumerate returns the index r and the value row (which is a list)
        for c, col in enumerate(row):  # Looping over the values in each row, with index c and value col
            if col == value_to_find:  # If col equals the value we want, return the (row, column) tuple
                return r, c
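finder stops at the first match and implicitly returns None when the value is absent. If every position is needed, a generator variant is a small change; find_all is a hypothetical name, and it works for a tuple of tuples as well as a list of lists.

```python
def find_all(value_to_find, matrix):
    """Yield every (row, column) pair where matrix[row][column] == value_to_find."""
    for r, row in enumerate(matrix):
        for c, item in enumerate(row):
            if item == value_to_find:
                yield r, c


a = ((1, 2), (3, 1))
print(list(find_all(1, a)))  # [(0, 0), (1, 1)]
```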
If your matrix is stored as a flat (1-D) list, you can use this formula from Hyperborius:
listindex = row * length_of_row + column
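That formula assumes row-major order (rows laid out one after another). A sketch of the conversion in both directions, with hypothetical helper names; the inverse uses divmod to split a flat index back into (row, column).

```python
def to_flat(row, column, row_length):
    """Position of matrix[row][column] in a row-major flat list."""
    return row * row_length + column


def from_flat(index, row_length):
    """Inverse of to_flat: recover (row, column) from a flat index."""
    return divmod(index, row_length)  # (index // row_length, index % row_length)


matrix = ((1, 2, 3), (4, 5, 6))
flat = [x for row in matrix for x in row]  # [1, 2, 3, 4, 5, 6]
width = len(matrix[0])
print(flat[to_flat(1, 2, width)])  # 6, same as matrix[1][2]
print(from_flat(4, width))         # (1, 1), and matrix[1][1] == flat[4] == 5
```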
a = [[1,2], [3, 4]]
If a is the matrix, we can access the individual elements by using two indices, row and column. Here, row is the position of the inner list within a, and column is the position of the element within that inner list.
So, column is nothing but the normal way of referencing an element in the list and row is nothing but the normal way of referencing a list inside a list. Both row and column are zero based indices.
The format is
a[row][column]
when we say
a[row]
it means that, from the list of lists, get the list at position row and when we say
a[row][column]
we say that, from the list which we want, pick the element at position column.
Assuming you have a tuple of tuples:
>>> a = ((1,2,3),(4,5,6),(7,8,9))
which you converted to a flat list, presumably using a technique like this (credits to Joel Cornett, https://stackoverflow.com/a/10636583/219229):
>>> b = list(sum(a, ()))
>>> b
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Now b has effectively lost the original multi-dimensional indexing. If you already know the original index into a, you can calculate the corresponding index in b as follows:
>>> matrix_width = len(a[0])
... (assuming i is the column index and j is the row index, i.e. the item is a[j][i]) ...
>>> index_in_b = j*matrix_width + i
>>> item_to_find = b[index_in_b]
If you extend this to multi-dimensional arrays, such as a 3-D array indexed as a[k][j][i], then it should be index = i + (j * width) + (k * width * span), where width = len(a[0][0]) and span = len(a[0]).
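A sketch of the 3-D formula, assuming the array is flattened in row-major order and the element is addressed as a[k][j][i]; flat_index_3d is a hypothetical name and the 2x3x2 array is made-up sample data.

```python
def flat_index_3d(i, j, k, width, span):
    """Flat position of a[k][j][i] when a is flattened in row-major order."""
    # i: innermost index (0 <= i < width), j: middle index (0 <= j < span), k: outermost
    return i + j * width + k * width * span


a = [[[1, 2], [3, 4], [5, 6]],
     [[7, 8], [9, 10], [11, 12]]]
width = len(a[0][0])  # length of the innermost lists: 2
span = len(a[0])      # number of rows per 2-D slice: 3
flat = [x for plane in a for row in plane for x in row]
print(flat[flat_index_3d(1, 2, 1, width, span)], a[1][2][1])  # 12 12
```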
P.S. Just in case you want to convert it to a list of lists, here are some shorthands you can use:
>>> b = list(map(list, a))  # using map
>>> b = [list(x) for x in a]  # using a list comprehension
