Mapping two lists without looping - python

I have two lists of equal length. The first list l1 contains data.
l1 = [2, 3, 5, 7, 8, 10, ... , 23]
The second list l2 contains the category the data in l1 belongs to:
l2 = [1, 1, 2, 1, 3, 4, ... , 3]
How can I partition the first list based on the category numbers (such as 1, 2, 3, 4) at the corresponding positions in the second list, using a list comprehension or a lambda function? For example, 2, 3 and 7 from the first list belong to the same partition because they have the same corresponding value (1) in the second list.
The number of partitions is known at the beginning.

You can use a dictionary:
>>> l1 = [2, 3, 5, 7, 8, 10, 23]
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> d = {}
>>> for i, j in zip(l1, l2):
...     d.setdefault(j, []).append(i)
...
>>>
>>> d
{1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}

If a dict is fine, I suggest using a defaultdict:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for number, category in zip(l1, l2):
...     d[category].append(number)
...
>>> d
defaultdict(<type 'list'>, {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]})
Consider using itertools.izip for memory efficiency if you are using Python 2.
This is basically the same solution as Kasramvd's, but I think the defaultdict makes it a little easier to read.
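Since the number of partitions is known at the beginning, the same loop can also fill a preallocated list of lists. This is only a minimal sketch: it assumes the categories are the integers 1..n, and the names partition and n_partitions are illustrative, not taken from the answers here.

```python
def partition(values, categories, n_partitions):
    """Group values into n_partitions lists (assumes categories are 1..n_partitions)."""
    parts = [[] for _ in range(n_partitions)]
    for value, category in zip(values, categories):
        # Category k goes into slot k - 1 because lists are 0-indexed.
        parts[category - 1].append(value)
    return parts


l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]
print(partition(l1, l2, 4))  # [[2, 3, 7], [5], [8, 23], [10]]
```

Unlike the dictionary versions, a category that never occurs still gets an (empty) list in the result.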

This will give a list of partitions using a list comprehension:
>>> l1 = [2, 3, 5, 7, 8, 10, 23]
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> [[value for i, value in enumerate(l1) if j == l2[i]] for j in set(l2)]
[[2, 3, 7], [5], [8, 23], [10]]

A nested list comprehension:
[ [ l1[j] for j in range(len(l1)) if l2[j] == i ] for i in range(1, max(l2)+1 )]

If it is reasonable to have your data stored in numpy's ndarrays (l1 and l2 must be arrays, not plain lists, for this to work), you can use boolean mask indexing
{i:l1[l2==i] for i in set(l2)}
to construct a dictionary of ndarrays indexed by category code.
There is an overhead associated with l2==i (i.e., building a new Boolean array for each category) that grows with the number of categories, so that you may want to check which alternative, either numpy or defaultdict, is faster with your data.
I tested with n=200000, nc=20 and numpy was faster than defaultdict + izip (124 vs 165 ms) but with nc=10000 numpy was (much) slower (11300 vs 251 ms)

Using some itertools and operator goodies and a sort, you can do this in a one-liner:
>>> import itertools, operator
>>> l1 = [2, 3, 5, 7, 8, 10, 23]
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> itertools.groupby(sorted(zip(l2, l1)), operator.itemgetter(0))
The result of this is an itertools.groupby object that can be iterated over:
>>> for g, li in itertools.groupby(sorted(zip(l2, l1)), operator.itemgetter(0)):
...     print(g, list(map(operator.itemgetter(1), li)))
...
1 [2, 3, 7]
2 [5]
3 [8, 23]
4 [10]
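The groupby result can also be collected into a dictionary. The following is a sketch for Python 3, not part of the original answer; sorting on the category alone (rather than on the whole pair) is a choice made here so that values keep their original order within each category.

```python
from itertools import groupby
from operator import itemgetter

l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]

# Sort (category, value) pairs by category only; sorted() is stable,
# so values keep their original relative order inside each group.
pairs = sorted(zip(l2, l1), key=itemgetter(0))

# groupby only merges adjacent items, which is why the sort is required.
partitions = {
    category: [value for _, value in group]
    for category, group in groupby(pairs, key=itemgetter(0))
}
print(partitions)  # {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}
```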

This is not a list comprehension but a dictionary comprehension. It resembles @cromod's solution but preserves the "categories" from l2:
{k:[val for i, val in enumerate(l1) if k == l2[i]] for k in set(l2)}
Output:
>>> l1
[2, 3, 5, 7, 8, 10, 23]
>>> l2
[1, 1, 2, 1, 3, 4, 3]
>>> {k:[val for i, val in enumerate(l1) if k == l2[i]] for k in set(l2)}
{1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}
>>>

Related

Create dictionary where keys are from a list and values are the sum of corresponding elements in another list

I have two lists L1 and L2. Each unique element in L1 is a key which has a value in the second list L2. I want to create a dictionary where the values are the sum of elements in L2 that are associated to the same key in L1.
I did the following but I am not very proud of this code. Is there any simpler pythonic way to do it ?
L = [2, 3, 7, 3, 4, 5, 2, 7, 7, 8, 9, 4] # as L1
W = range(len(L)) # as L2
d = { l:[] for l in L }
for l,w in zip(L,W): d[l].append(w)
d = {l:sum(v) for l,v in d.items()}
EDIT:
Q: How do I know which elements of L2 are associated to a given key element of L1?
A: If they have the same index. For example, if the element 7 is repeated 3 times in L1 (e.g. L1[2] == L1[7] == L1[8] == 7), then I want the value of the key 7 to be L2[2]+L2[7]+L2[8].
You can use enumerate() to get each item's index while looping over the list, together with collections.defaultdict(int). Passing int as the default factory makes a missing key start at 0, so the values are accumulated whenever a key repeats. Note that this relies on the values in L2 being the indices themselves, as in your example:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i,j in enumerate(L):
...     d[j]+=i
...
>>> d
defaultdict(<type 'int'>, {2: 6, 3: 4, 4: 15, 5: 5, 7: 17, 8: 9, 9: 10})
If you don't need the intermediate dict of lists you can use the collections.Counter:
import collections
L = [2, 3, 7, 3, 4, 5, 2, 7, 7, 8, 9, 4] # as L1
W = range(len(L)) # as L2
d2 = collections.Counter()
for i, value in enumerate(L):
    d2[value] += i
which behaves like a normal dict:
Counter({2: 6, 3: 4, 4: 15, 5: 5, 7: 17, 8: 9, 9: 10})
Hope this may help you.
L = [2, 3, 7, 3, 4, 5, 2, 7, 7, 8, 9, 4] # as L1
dict_a = dict.fromkeys(set(L),0)
for l,w in enumerate(L):
    dict_a[w] = int(dict_a[w]) + l
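The answers above use the index as the value, which works because W is range(len(L)) in the question. If L2 holds arbitrary numbers, the same accumulation can be done over zip(L, W). This is a hedged sketch; the function name sum_by_key is made up for illustration.

```python
from collections import defaultdict


def sum_by_key(keys, weights):
    """Sum the weights that share the same key (paired by position)."""
    totals = defaultdict(int)  # missing keys start at 0
    for key, weight in zip(keys, weights):
        totals[key] += weight
    return dict(totals)


L = [2, 3, 7, 3, 4, 5, 2, 7, 7, 8, 9, 4]  # as L1
W = range(len(L))                          # as L2
print(sum_by_key(L, W))
# {2: 6, 3: 4, 7: 17, 4: 15, 5: 5, 8: 9, 9: 10}
```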

Python: Find-replace on lists

I first want to note that my question is different from what's in this link:
finding and replacing elements in a list (python)
What I want to ask is whether there is some known API or conventional way to achieve such a functionality (If it's not clear, a function/method like my imaginary list_replace() is what I'm looking for):
>>> list = [1, 2, 3]
>>> list_replace(list, 3, [3, 4, 5])
>>> list
[1, 2, 3, 4, 5]
An API with limitation of number of replacements will be better:
>>> list = [1, 2, 3, 3, 3]
>>> list_replace(list, 3, [8, 8], 2)
>>> list
[1, 2, 8, 8, 8, 8, 3]
And another optional improvement is that the input to replace will be a list itself, instead of a single value:
>>> list = [1, 2, 3, 3, 3]
>>> list_replace(list, [2, 3], [8, 8], 2)
>>> list
[1, 8, 8, 3, 3]
Is there any API that looks at least similar and performs these operations, or should I write it myself?
Try:
def list_replace(ls, val, l_insert, num = 1):
    l_insert_len = len(l_insert)
    indx = 0
    for i in range(num):
        indx = ls.index(val, indx) # raises ValueError if val is not found
        ls = ls[:indx] + l_insert + ls[(indx + 1):]
        indx += l_insert_len
    return ls
This function works for both the first and second cases.
It won't work with your third requirement.
Demo
>>> list = [1, 2, 3]
>>> list_replace(list, 3, [3, 4, 5])
[1, 2, 3, 4, 5]
>>> list = [1, 2, 3, 3, 3]
>>> list_replace(list, 3, [8, 8], 2)
[1, 2, 8, 8, 8, 8, 3]
Note
It returns a new list; The list passed in will not change.
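The examples in the question show the list being modified in place. If that behaviour is wanted, slice assignment can do it. The following is only a sketch under that assumption: the name list_replace_inplace and the count=None convention (meaning "replace every match") are illustrative choices, and it covers single-value matches only.

```python
def list_replace_inplace(lst, value, replacement, count=None):
    """Replace occurrences of value in lst with the items of replacement, in place.

    count limits the number of replacements; None means no limit.
    """
    i = 0
    done = 0
    while i < len(lst) and (count is None or done < count):
        if lst[i] == value:
            lst[i:i + 1] = replacement   # splice the new items over one element
            i += len(replacement)        # skip past what was just inserted
            done += 1
        else:
            i += 1


data = [1, 2, 3, 3, 3]
list_replace_inplace(data, 3, [8, 8], 2)
print(data)  # [1, 2, 8, 8, 8, 8, 3]
```

Skipping past the inserted items means a replacement that itself contains the searched value (such as [3, 4, 5] for 3) is not matched again.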
How about this? It handles all three call forms, although when elem is a list only the first element of the matched run is replaced, as the last example shows:
def list_replace(origen,elem,new,cantidad=None):
    n=0
    resul=list()
    len_elem=0
    if isinstance(elem,list):
        len_elem=len(elem)
    for i,x in enumerate(origen):
        if x==elem or elem==origen[i:i+len_elem]:
            if cantidad and n<cantidad:
                resul.extend(new)
                n+=1
                continue
            elif not cantidad:
                resul.extend(new)
                continue
        resul.append(x)
    return resul
>>> list_replace([1,2,3,4,5,3,5,33,23,3],3,[42,42])
[1, 2, 42, 42, 4, 5, 42, 42, 5, 33, 23, 42, 42]
>>> list_replace([1,2,3,4,5,3,5,33,23,3],3,[42,42],2)
[1, 2, 42, 42, 4, 5, 42, 42, 5, 33, 23, 3]
>>> list_replace([1,2,3,4,5,3,5,33,23,3],[33,23],[42,42,42],2)
[1, 2, 3, 4, 5, 3, 5, 42, 42, 42, 23, 3]
Given this isn't hard to write, and not a very common use case, I don't think it will be in the standard library. What would it be named, replace_and_flatten? It's quite hard to explain what that does, and justify the inclusion.
Explicit is also better than implicit, so...
def replace_and_flatten(lst, searched_item, new_list):
    def _replace():
        for item in lst:
            if item == searched_item:
                yield from new_list # element matches, yield all the elements of the new list instead
            else:
                yield item # element doesn't match, yield it as is
    return list(_replace()) # convert the iterable back to a list
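The same replace-and-flatten idea can be written with itertools.chain.from_iterable instead of a nested generator. This is an alternative sketch rather than the answer's own code, and the name replace_and_flatten_chain is chosen here only to keep the two apart.

```python
from itertools import chain


def replace_and_flatten_chain(lst, searched_item, new_list):
    """Return a new list where each searched_item is replaced by the items of new_list."""
    # Each element becomes a small iterable: new_list on a match, [item] otherwise;
    # chain.from_iterable then flattens them one level.
    return list(chain.from_iterable(
        new_list if item == searched_item else [item] for item in lst
    ))


print(replace_and_flatten_chain([1, 2, 3], 3, [3, 4, 5]))  # [1, 2, 3, 4, 5]
```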
I developed my own function; you are welcome to use and review it.
Note that, in contrast to the examples in the question, my function creates and returns a new list. It does not modify the provided list.
Working examples:
list = [1, 2, 3]
l2 = list_replace(list, [3], [3, 4, 5])
print('Changed: {0}'.format(l2))
print('Original: {0}'.format(list))
list = [1, 2, 3, 3, 3]
l2 = list_replace(list, [3], [8, 8], 2)
print('Changed: {0}'.format(l2))
print('Original: {0}'.format(list))
list = [1, 2, 3, 3, 3]
l2 = list_replace(list, [2, 3], [8, 8], 2)
print('Changed: {0}'.format(l2))
print('Original: {0}'.format(list))
I also print the original list each time, so you can see that it is not modified:
Changed: [1, 2, 3, 4, 5]
Original: [1, 2, 3]
Changed: [1, 2, 8, 8, 8, 8, 3]
Original: [1, 2, 3, 3, 3]
Changed: [1, 8, 8, 3, 3]
Original: [1, 2, 3, 3, 3]
Now, the code (tested with Python 2.7 and with Python 3.4):
def list_replace(lst, source_sequence, target_sequence, limit=0):
    if limit < 0:
        raise Exception('A negative replacement limit is not supported')
    source_sequence_len = len(source_sequence)
    target_sequence_len = len(target_sequence)
    original_list_len = len(lst)
    if source_sequence_len > original_list_len:
        return list(lst)
    new_list = []
    i = 0
    replace_counter = 0
    while i < original_list_len:
        suffix_is_long_enough = source_sequence_len <= (original_list_len - i)
        limit_is_satisfied = (limit == 0 or replace_counter < limit)
        if suffix_is_long_enough and limit_is_satisfied:
            if lst[i:i + source_sequence_len] == source_sequence:
                new_list.extend(target_sequence)
                i += source_sequence_len
                replace_counter += 1
                continue
        new_list.append(lst[i])
        i += 1
    return new_list
I developed a function for you (it works for your 3 requirements):
def list_replace(lst,elem,repl,n=0):
    ii=0
    if type(repl) is not list:
        repl = [repl]
    if type(elem) is not list:
        elem = [elem]
    if type(elem) is list:
        length = len(elem)
    else:
        length = 1
    for i in range(len(lst)-(length-1)):
        if ii>=n and n!=0:
            break
        e = lst[i:i+length]
        if e==elem:
            lst[i:i+length] = repl
            if n!=0:
                ii+=1
    return lst
I've tried with your examples and it works ok.
Tests made:
print list_replace([1,2,3], 3, [3, 4, 5])
print list_replace([1, 2, 3, 3, 3], 3, [8, 8], 2)
print list_replace([1, 2, 3, 3, 3], [2, 3], [8, 8], 2)
NOTE: never use list as a variable name. I need the built-in list object for the type(...) is list check.

extracting item with most common probability in python list

I have a list [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] and I need [1,2,3,7] as final result (this is kind of reverse engineering). One logic is to check intersections -
while(i<dlistlen):
    j=i+1
    while(j<dlistlen):
        il = dlist1[i]
        jl = dlist1[j]
        tmp = list(set(il) & set(jl))
        print tmp
        #print i,j
        j=j+1
    i=i+1
this is giving me output :
[1, 2]
[1, 2, 7]
[1, 2, 7]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 7]
[]
It looks like I am close to getting [1,2,3,7] as my final answer, but I can't figure out how. Please note that the very first list ([[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]) may contain more items, leading to one more final answer besides [1,2,3,7]. But as of now, I need to extract only [1,2,3,7].
Please note that this is not homework; I am creating my own clustering algorithm that fits my needs.
You can use the Counter class to keep track of how often elements appear.
>>> from itertools import chain
>>> from collections import Counter
>>> l = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
>>> #use chain(*l) to flatten the lists into a single list
>>> c = Counter(chain(*l))
>>> print c
Counter({1: 4, 2: 4, 3: 3, 7: 3, 5: 1, 6: 1})
>>> #sort keys in order of descending frequency
>>> sortedValues = sorted(c.keys(), key=lambda x: c[x], reverse=True)
>>> #show the four most common values
>>> print sortedValues[:4]
[1, 2, 3, 7]
>>> #alternatively, show the values that appear in more than 50% of all lists
>>> print [value for value, freq in c.iteritems() if float(freq) / len(l) > 0.50]
[1, 2, 3, 7]
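The session above uses Python 2 syntax (print statements, iteritems). A rough Python 3 equivalent of the "more than 50% of all lists" filter is sketched below. The function name common_items and the min_fraction parameter are illustrative, and each sublist is converted to a set first on the assumption that a value should count at most once per sublist.

```python
from collections import Counter
from itertools import chain


def common_items(lists, min_fraction=0.5):
    """Return the values that occur in more than min_fraction of the sublists, sorted."""
    # set(sub) ensures a value is counted at most once per sublist.
    counts = Counter(chain.from_iterable(set(sub) for sub in lists))
    return sorted(v for v, n in counts.items() if n / len(lists) > min_fraction)


l = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
print(common_items(l))  # [1, 2, 3, 7]
```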
It looks like you're trying to find the largest intersection of two list elements. This will do that:
from itertools import combinations
# convert all list elements to sets for speed
dlist = [set(x) for x in dlist]
intersections = (x & y for x, y in combinations(dlist, 2))
longest_intersection = max(intersections, key=len)
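Applied to the list from the question, the snippet can be run end to end as follows. This is just the same technique with the data filled in; the variable name dlist1 is taken from the question's code.

```python
from itertools import combinations

dlist1 = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]

# Convert once so each pairwise intersection is a cheap set operation.
sets = [set(x) for x in dlist1]

# Intersect every unordered pair and keep the largest result.
intersections = (x & y for x, y in combinations(sets, 2))
longest_intersection = max(intersections, key=len)

print(sorted(longest_intersection))  # [1, 2, 3, 7]
```

If several pairs tie for the largest intersection, max returns the first one it encounters.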

How can I get a set of (possibly overlapping) slices in a Python list based on elements that match a criteria?

Suppose I have a python list l=[1,2,3,4,5]. I would like to find all x-element lists starting with elements that satisfy a function f(e), or the sublist going to the end of l if there aren't enough items. For instance, suppose f(e) is e%2==0, and x=3 I'd like to get [[2,3,4],[4,5]].
Is there an elegant or "pythonic" way to do this?
>>> f = lambda e: e % 2 == 0
>>> x = 3
>>> l = [1, 2, 3, 4, 5]
>>> def makeSublists(lst, length, f):
        for i in range(len(lst)):
            if f(lst[i]):
                yield lst[i:i+length]
>>> list(makeSublists(l, x, f))
[[2, 3, 4], [4, 5]]
>>> list(makeSublists(list(range(10)), 5, f))
[[0, 1, 2, 3, 4], [2, 3, 4, 5, 6], [4, 5, 6, 7, 8], [6, 7, 8, 9], [8, 9]]
Using a list comprehension:
>>> l = list(range(1,6))
>>> x = 3
>>> def f(e):
        return e%2 == 0
>>> [l[i:i+x] for i, j in enumerate(l) if f(j)]
[[2, 3, 4], [4, 5]]

find common data python

Using
def compare_lsts(list1,list2):
    first_set = set(list1)
    second_set=set(list2)
    results =[x for x in list1 if x in list2]
    print(results)
and running compare_lsts([1,2,3,4,5],[3,8,9,1,7]) gives the numbers contained in both sets, i.e. [1,3].
However making list 1 contain more than 1 list e.g. compare_lsts([[1,2,3,4,5],[5,8,2,9,12],[3,7,19,4,16]],[3,7,2,16,19]) gives [],[],[].
I have used for list in list1 followed by results for the loop. I clearly don't know what I am doing.
Basically the question is: How does one compare items in one static list with as many lists as there are?
First of all, you already started using sets, so you should definitely use them, as they are faster when checking containment. Also, there are already a few helpful built-in features for sets, so for comparing two lists, you can just intersect the sets to get those items that are in both lists:
>>> set1 = set([1, 2, 3, 4, 5])
>>> set2 = set([3, 8, 9, 1, 7])
>>> set1 & set2
{1, 3}
>>> list(set1 & set2) # in case you need a list as the output
[1, 3]
Similarly, you can also find the union of two sets to get those items that are in any of the sets:
>>> set1 | set2
{1, 2, 3, 4, 5, 7, 8, 9}
So, if you want to find all items from list2 that are in any of list1’s sublists, then you could intersect all the sublists with list2 and then union all those results:
>>> sublists = [set([1, 2, 3, 4, 5]), set([5, 8, 2, 9, 12]), set([3, 7, 19, 4, 16])]
>>> otherset = set([3, 7, 2, 16, 19])
>>> intersections = [sublist & otherset for sublist in sublists]
>>> intersections
[{2, 3}, {2}, {16, 3, 19, 7}]
>>> union = set()
>>> for intersection in intersections:
        union = union | intersection
>>> union
{16, 19, 2, 3, 7}
You can also do that a little bit nicer using functools.reduce:
>>> import functools
>>> functools.reduce(set.union, intersections)
{16, 19, 2, 3, 7}
Similarly, if you want to actually intersect those results, you could do that as well:
>>> functools.reduce(set.intersection, intersections)
set()
And finally, you can pack that all in a nice function:
def compareLists (mainList, *otherLists):
    mainSet = set(mainList)
    otherSets = [set(otherList) for otherList in otherLists]
    intersections = [mainSet & otherSet for otherSet in otherSets]
    return functools.reduce(set.union, intersections) # or replace with set.intersection
And use it like this:
>>> compareLists([1, 2, 3, 4, 5], [3, 8, 9, 1, 7])
{1, 3}
>>> compareLists([3, 7, 2, 16, 19], [1, 2, 3, 4, 5], [5, 8, 2, 9, 12], [3, 7, 19, 4, 16])
{16, 19, 2, 3, 7}
Note, that I replaced the order of the arguments in the function, so the main list (in your case list2) is mentioned first as that is the one the others are compared to.
If you're after elements from the first that are in all of the lists:
set(first).intersection(second, third) # fourth, fifth, etc...
>>> set([1, 2, 3]).intersection([2, 3, 4], [3, 4, 5])
set([3])
If you're after elements from the first that are in any of the other lists:
>>> set([1, 2, 3]) & set([4]).union([2])
set([2])
So, then a simple func:
def in_all(fst, *rst):
    return set(fst).intersection(*rst)
def in_any(fst, *rst):
    it = iter(rst)
    return set(fst) & set(next(it, [])).union(*it)
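To connect these two helpers back to the question, here they are applied to its data, with the static list passed first and the sublists unpacked after it. The variable names main and others are placeholders introduced for this example.

```python
def in_all(fst, *rst):
    """Items of fst that appear in every one of the other iterables."""
    return set(fst).intersection(*rst)


def in_any(fst, *rst):
    """Items of fst that appear in at least one of the other iterables."""
    it = iter(rst)
    # Union all the others together (an empty set if there are none), then intersect.
    return set(fst) & set(next(it, [])).union(*it)


main = [3, 7, 2, 16, 19]
others = [[1, 2, 3, 4, 5], [5, 8, 2, 9, 12], [3, 7, 19, 4, 16]]

print(sorted(in_any(main, *others)))  # [2, 3, 7, 16, 19]
print(sorted(in_all(main, *others)))  # []
```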
Not sure if it's the best way, but:
def flat(l):
    c_l = []
    for i in l:
        if isinstance(i,list):
            c_l.extend(i) # extend instead of map(c_l.append, i), which is lazy in Python 3
        else:
            c_l.append(i)
    return c_l
def compare_lsts(a,b):
    if all(isinstance(x,list) for x in a): # every item of a is a sublist
        a = flat(a) # flatten a
    if all(isinstance(x,list) for x in b): # every item of b is a sublist
        b = flat(b) # flatten b
    return list(set(a) & set(b)) # intersection between a and b
print(compare_lsts([[1,2,3,4,5],[5,8,2,9,12],[3,7,19,4,16]],[3,7,2,16,19])) # e.g. [16, 3, 2, 19, 7]; the order may vary
