Checking for key in dict during comprehension - python

Is it possible to do something like this:
l = [1, 2, 2, 3, 4, 4, 1, 1]
d = {num: [num] if num not in d else d[num].append(num) for num in l}
Inherently, I wouldn't think so, without declaring d = {} first; even then, it doesn't append:
Output: {1: [1], 2: [2], 3: [3], 4: [4]}
# desired: {1: [1, 1, 1], 2: [2, 2], 3: [3], 4: [4, 4]}
Could use a defaultdict, curious if the comprehension is even possible?

No, it's not possible. If you think about it, it makes sense why. When Python evaluates an assignment statement, it first evaluates the right-hand side of the assignment (the expression). Since the assignment hasn't completed yet, the name on the left-hand side hasn't been bound in the current namespace. Thus, while the comprehension is being evaluated, d is undefined. If you declare d = {} first, the comprehension only ever sees that old, empty dict, so num not in d is always true and nothing is appended.
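A minimal sketch of both cases, for illustration only. The function names are made up for this example, and the code assumes Python 3:

```python
l = [1, 2, 2, 3, 4, 4, 1, 1]

def without_predefined_d():
    # `d` is bound only after the comprehension has finished,
    # so looking it up inside the comprehension fails.
    d = {num: [num] if num not in d else d[num].append(num) for num in l}
    return d

def with_predefined_d():
    d = {}
    # The comprehension reads the old, empty dict on every iteration,
    # so `num not in d` is always True and repeated keys are simply overwritten.
    d = {num: [num] if num not in d else d[num].append(num) for num in l}
    return d

try:
    without_predefined_d()
    lookup_failed = False
except NameError:  # UnboundLocalError is a subclass of NameError
    lookup_failed = True

print(lookup_failed)        # True
print(with_predefined_d())  # {1: [1], 2: [2], 3: [3], 4: [4]}
```

Note also that list.append returns None, so even if the lookup did see the partially built dict, the else branch would store None rather than the extended list.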
As suggested, you can use collections.defaultdict to accomplish what you want:
>>> from collections import defaultdict
>>>
>>> l = [1, 2, 2, 3, 4, 4, 1, 1]
>>> d = defaultdict(list)
>>> for num in l:
...     d[num].append(num)
...
>>> d
defaultdict(<class 'list'>, {1: [1, 1, 1], 2: [2, 2], 3: [3], 4: [4, 4]})
>>>

d doesn't exist in your dictionary comprehension.
Why not:
l = [1, 2, 2, 3, 4, 4, 1, 1]
d = {num: [num] * l.count(num) for num in set(l)}
EDIT: I think it is better to use a loop here:
d = {}
for item in l:
    d.setdefault(item, []).append(item)
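If a single comprehension is really wanted, one possible sketch (a different technique from the ones above, not something the answers here propose) is to sort the list and group equal neighbours with itertools.groupby. It assumes the elements can be sorted:

```python
from itertools import groupby

l = [1, 2, 2, 3, 4, 4, 1, 1]

# groupby only groups adjacent equal items, so the list is sorted first.
d = {key: list(group) for key, group in groupby(sorted(l))}
print(d)  # {1: [1, 1, 1], 2: [2, 2], 3: [3], 4: [4, 4]}
```

The sort makes this O(n log n), whereas the setdefault loop is a single pass over the list.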

No, you cannot refer to the dict being built by a comprehension before the comprehension's result is assigned to a variable.
But you can use collections.Counter to limit those costly list.append calls.
from collections import Counter
l = [1, 2, 2, 3, 4, 4, 1, 1]
c = Counter(l)
d = {k: [k]*v for k, v in c.items()}
# {1: [1, 1, 1], 2: [2, 2], 3: [3], 4: [4, 4]}
Related: Create List of Single Item Repeated n Times in Python

Related

Python find frequency of numbers in list of lists

I have a list of lists of int as shown below
[[1, 2, 3],
[1, 5],
[4, 2, 6]]
I want to generate the frequency of the numbers in the lists as a dict, for example 1 occurs in 2 of the lists, and so on, expected output is
{1:2,
2:2,
3:1,
4:1,
5:1,
6:1}
How can this be generated?
You could try this:
L is your list of list.
expected = {1:2,
2:2,
3:1,
4:1,
5:1,
6:1}
>>> from itertools import chain
>>> from collections import Counter
>>> flattened = list(chain.from_iterable(L))
>>> flattened
[1, 2, 3, 1, 5, 4, 2, 6]
>>> counts = Counter(flattened)
>>> counts
Counter({1: 2, 2: 2, 3: 1, 5: 1, 4: 1, 6: 1})
# It's easy to make it to a function or one-liner too.
>>> counts = Counter(chain.from_iterable(L))
>>> assert counts == expected # your expected result shown above
# silence means matching.
You can use Counter for this:
>>> from collections import Counter as c
>>> array = [[1, 2, 3],[1,5],[4,2,6]]
>>> result = c()
>>> for sublist in array:
...     result += c(sublist)
...
>>> result
Counter({1: 2, 2: 2, 3: 1, 5: 1, 4: 1, 6: 1})
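Both answers count every occurrence. The question asks in how many of the lists a number occurs, which gives the same result only while no sublist repeats a number. A hedged sketch that counts each number at most once per sublist (the variable names are illustrative):

```python
from collections import Counter

array = [[1, 2, 3], [1, 5], [4, 2, 6]]

lists_containing = Counter()
for sublist in array:
    # set() drops repeats inside one sublist, so each list contributes at most 1
    lists_containing.update(set(sublist))

# With a repeated value the two ways of counting differ:
with_duplicate = [[1, 1, 2], [1, 3]]
per_list = Counter()
for sublist in with_duplicate:
    per_list.update(set(sublist))
per_occurrence = Counter(x for sublist in with_duplicate for x in sublist)

print(per_list[1], per_occurrence[1])  # 2 3
```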

Creating a random dictionary in python

I would like to create a random dictionary starting from the following array:
list = [1,2,3,4,5]
In particular, I would like to have as keys of the dictionary all the elements of the array and, as the corresponding values, some values picked randomly from that array, except the value that corresponds to the key.
An example of expected output should be something like:
Randomdict = {1: [2, 4], 2: [1, 3, 5], 3: [2], 4: [2, 3, 5], 5: [1, 2, 3, 4]}
And last but not least all keys should have at least 1 value
It can be done with the random module and comprehensions:
from random import sample, randrange
d = {i: sample([j for j in lst if i != j], randrange(1, len(lst) - 1))
     for i in lst}
If you first use random.seed(0) for reproducible data, you will get:
{1: [3, 2], 2: [4, 3], 3: [2, 4], 4: [3, 1]}
{1: [3], 2: [1], 3: [4, 1], 4: [1, 3]}
{1: [3, 2], 2: [3, 4], 3: [4], 4: [2, 3]}
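Note that randrange(1, len(lst) - 1) picks at most len(lst) - 2 values per key, while the example in the question lets a key receive every other element. The following is a sketch of a variant with the wider bound. The function name random_neighbors and the use of a private random.Random instance are assumptions made for this example:

```python
import random

def random_neighbors(items, rng):
    """Map each item to a random, non-empty sample of the other items."""
    result = {}
    for key in items:
        others = [x for x in items if x != key]       # never pick the key itself
        how_many = rng.randrange(1, len(others) + 1)  # 1 .. len(others), inclusive
        result[key] = rng.sample(others, how_many)
    return result

lst = [1, 2, 3, 4, 5]
d = random_neighbors(lst, random.Random(0))  # seeded generator for repeatable runs
print(d)
```

Passing the generator in keeps the global random state untouched, which can be convenient in tests.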
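Something like this? It might need some tweaks:
from random import randrange, sample
q = [1, 2, 3, 4, 5]
a = {}
for i in q:
    others = [x for x in q if x != i]  # never pick the key itself
    n = randrange(1, len(q))  # 1 .. len(q) - 1 values, so never empty
    a[i] = sample(others, n)
print(a)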

Mapping two list without looping

I have two lists of equal length. The first list l1 contains data.
l1 = [2, 3, 5, 7, 8, 10, ... , 23]
The second list l2 contains the category the data in l1 belongs to:
l2 = [1, 1, 2, 1, 3, 4, ... , 3]
How can I partition the first list based on the positions defined by numbers such as 1, 2, 3, 4 in the second list, using a list comprehension or lambda function? For example, 2, 3, 7 from the first list belong to the same partition, as they have the same corresponding value in the second list.
The number of partitions is known at the beginning.
You can use a dictionary:
>>> l1 = [2, 3, 5, 7, 8, 10, 23]
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> d = {}
>>> for i, j in zip(l1, l2):
...     d.setdefault(j, []).append(i)
...
>>>
>>> d
{1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}
If a dict is fine, I suggest using a defaultdict:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for number, category in zip(l1, l2):
...     d[category].append(number)
...
>>> d
defaultdict(<type 'list'>, {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]})
Consider using itertools.izip for memory efficiency if you are using Python 2.
This is basically the same solution as Kasramvd's, but I think the defaultdict makes it a little easier to read.
This will give a list of partitions using list comprehension :
>>> l1 = [2, 3, 5, 7, 8, 10, 23]
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> [[value for i, value in enumerate(l1) if j == l2[i]] for j in set(l2)]
[[2, 3, 7], [5], [8, 23], [10]]
A nested list comprehension :
[ [ l1[j] for j in range(len(l1)) if l2[j] == i ] for i in range(1, max(l2)+1 )]
If it is reasonable to have your data stored in numpy's ndarrays you can use extended indexing
{i:l1[l2==i] for i in set(l2)}
to construct a dictionary of ndarrays indexed by category code.
There is an overhead associated with l2==i (i.e., building a new Boolean array for each category) that grows with the number of categories, so that you may want to check which alternative, either numpy or defaultdict, is faster with your data.
I tested with n=200000, nc=20 and numpy was faster than defaultdict + izip (124 vs 165 ms) but with nc=10000 numpy was (much) slower (11300 vs 251 ms)
Using some itertools and operator goodies and a sort you can do this in a one liner:
>>> import itertools
>>> import operator
>>> l1 = [2, 3, 5, 7, 8, 10, 23]
>>> l2 = [1, 1, 2, 1, 3, 4, 3]
>>> itertools.groupby(sorted(zip(l2, l1)), operator.itemgetter(0))
The result of this is an itertools.groupby object that can be iterated over:
>>> for g, li in itertools.groupby(sorted(zip(l2, l1)), operator.itemgetter(0)):
...     print(g, list(map(operator.itemgetter(1), li)))
...
1 [2, 3, 7]
2 [5]
3 [8, 23]
4 [10]
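The same grouping can be collected into a dict with a comprehension. This is a sketch for Python 3. It sorts by category only, an assumption made here so that values keep their original relative order within each partition:

```python
from itertools import groupby
from operator import itemgetter

l1 = [2, 3, 5, 7, 8, 10, 23]
l2 = [1, 1, 2, 1, 3, 4, 3]

# sorted() is stable, so sorting on the category alone preserves the order of l1
pairs = sorted(zip(l2, l1), key=itemgetter(0))
partitions = {category: [value for _, value in group]
              for category, group in groupby(pairs, key=itemgetter(0))}
print(partitions)  # {1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}
```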
This is not a list comprehension but a dictionary comprehension. It resembles @cromod's solution but preserves the "categories" from l2:
{k:[val for i, val in enumerate(l1) if k == l2[i]] for k in set(l2)}
Output:
>>> l1
[2, 3, 5, 7, 8, 10, 23]
>>> l2
[1, 1, 2, 1, 3, 4, 3]
>>> {k:[val for i, val in enumerate(l1) if k == l2[i]] for k in set(l2)}
{1: [2, 3, 7], 2: [5], 3: [8, 23], 4: [10]}
>>>

extracting item with most common probability in python list

I have a list [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] and I need [1,2,3,7] as final result (this is kind of reverse engineering). One logic is to check intersections -
while(i<dlistlen):
    j=i+1
    while(j<dlistlen):
        il = dlist1[i]
        jl = dlist1[j]
        tmp = list(set(il) & set(jl))
        print tmp
        #print i,j
        j=j+1
    i=i+1
this is giving me output :
[1, 2]
[1, 2, 7]
[1, 2, 7]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 7]
[]
Looks like I am close to getting [1,2,3,7] as my final answer, but I can't figure out how. Please note that the very first list ([[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]) may contain more items, leading to one more final answer besides [1,2,3,7]. But as of now, I need to extract only [1,2,3,7].
Please note that this is not homework; I am creating my own clustering algorithm that fits my needs.
You can use the Counter class to keep track of how often elements appear.
>>> from itertools import chain
>>> from collections import Counter
>>> l = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
>>> #use chain(*l) to flatten the lists into a single list
>>> c = Counter(chain(*l))
>>> print c
Counter({1: 4, 2: 4, 3: 3, 7: 3, 5: 1, 6: 1})
>>> #sort keys in order of descending frequency
>>> sortedValues = sorted(c.keys(), key=lambda x: c[x], reverse=True)
>>> #show the four most common values
>>> print sortedValues[:4]
[1, 2, 3, 7]
>>> #alternatively, show the values that appear in more than 50% of all lists
>>> print [value for value, freq in c.iteritems() if float(freq) / len(l) > 0.50]
[1, 2, 3, 7]
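The session above is Python 2 (print statements, iteritems). A rough Python 3 equivalent, as a sketch with illustrative variable names, uses Counter.most_common instead of sorting the keys by hand:

```python
from collections import Counter
from itertools import chain

lists = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
counts = Counter(chain.from_iterable(lists))

# the four most frequent values, sorted so the order does not depend on tie-breaking
top_four = sorted(value for value, _ in counts.most_common(4))
print(top_four)  # [1, 2, 3, 7]

# values that appear in more than 50% of the lists
majority = sorted(value for value, freq in counts.items()
                  if freq / len(lists) > 0.5)
print(majority)  # [1, 2, 3, 7]
```

In Python 3 the / operator always performs true division, so the float() call is no longer needed.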
It looks like you're trying to find the largest intersection of two list elements. This will do that:
from itertools import combinations
# convert all list elements to sets for speed
dlist = [set(x) for x in dlist]
intersections = (x & y for x, y in combinations(dlist, 2))
longest_intersection = max(intersections, key=len)
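For reference, here is the same idea as a self-contained sketch run on the list from the question. The name sets is introduced here so the original list is not overwritten:

```python
from itertools import combinations

dlist = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]

# convert the sublists to sets once, then intersect every pair
sets = [set(x) for x in dlist]
intersections = (x & y for x, y in combinations(sets, 2))
longest_intersection = max(intersections, key=len)
print(sorted(longest_intersection))  # [1, 2, 3, 7]
```

max raises ValueError when there are fewer than two sublists, because there is then no pair to intersect.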

Merge List of lists where sublists have common elements

I have a list of lists like this
list = [[1, 2], [1, 3], [4, 5]]
and as you see the first element of the first two sublists is repeated
So I want my output to be:
list = [[1, 2, 3], [4, 5]]
Thank you
The following code should solve your problem:
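def merge_subs(lst_of_lsts):
    res = []
    for row in lst_of_lsts:
        for i, resrow in enumerate(res):
            if row[0] == resrow[0]:
                # same first element: extend the already merged row
                res[i] += row[1:]
                break
        else:
            # no merged row starts with this element yet; copy the row
            # so that the input lists are not mutated later
            res.append(list(row))
    return res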
Note that the else belongs to the inner for and is executed only if the loop finishes without hitting the break.
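A small standalone illustration of for/else, separate from the merging code. The function first_even is a made-up example:

```python
def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            found = n
            break
    else:
        # runs only when the loop finished without hitting `break`
        found = None
    return found

print(first_even([3, 4, 5]))  # 4
print(first_even([1, 3, 5]))  # None
print(first_even([]))         # None
```

The else branch also runs when the iterable is empty, since the loop then ends without a break.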
I have a solution that builds a dict first with the 1st values, then creates a list from that, but the order may not be the same (i.e. [4, 5] may be before [1, 2, 3]):
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> map(lambda x: d[x[0]].append(x[1]), l)
[None, None, None]
>>> d
defaultdict(<type 'list'>, {1: [2, 3], 4: [5]})
>>> [[key] + list(val) for key, val in d.iteritems()]
[[1, 2, 3], [4, 5]]
You can use Python sets, because they make computing intersections and unions pretty easy. The code would be clearer, but the complexity would probably be comparable to the other solutions.
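One way the set idea could look, as a sketch only. It assumes that sublists should be merged when they share any element, not just the first one, which is broader than what the question strictly asks for. The function name is hypothetical:

```python
def merge_overlapping(lists):
    """Merge sublists that share at least one element, using set operations."""
    groups = []  # pairwise disjoint sets built so far
    for items in lists:
        current = set(items)
        remaining = []
        for group in groups:
            if group & current:      # non-empty intersection: absorb the group
                current |= group
            else:
                remaining.append(group)
        remaining.append(current)
        groups = remaining
    return [sorted(group) for group in groups]

print(merge_overlapping([[1, 2], [1, 3], [4, 5]]))  # [[1, 2, 3], [4, 5]]
```

Because sets are unordered, each merged group is sorted before it is returned, so the original element order is not preserved.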
Although arguably unreadable:
# Note the _ after the list, otherwise you are redefining the list type in your scope
list_ = [[1, 2], [1, 3], [4, 5]]
from itertools import groupby
grouper = lambda l: [[k] + sum((v[1::] for v in vs), []) for k, vs in groupby(l, lambda x: x[0])]
print grouper(list_)
A more readable variant:
from collections import defaultdict
groups = defaultdict(list)
for vs in list_:
    groups[vs[0]] += vs[1:]
print groups.items()
Note that these solve a more generic form of your problem: instead of [[1, 2], [1, 3], [4, 5]] you could also have something like [[1, 2, 3], [1, 4, 5], [2, 4, 5, 6], [3]].
Explanation about the _. This is why you don't want to overwrite list:
spam = list()
print spam
# returns []
list = spam
print list
# returns []
spam = list()
# TypeError: 'list' object is not callable
As you can see above, by setting list = spam we broke the default behaviour of list().
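The snippet above uses Python 2 print statements. A Python 3 sketch of the same effect follows, confined to a function (shadowing_demo is a made-up name) so that the built-in is only hidden locally:

```python
def shadowing_demo():
    list = []  # the local name `list` now hides the built-in type in this function
    try:
        list()  # calls the empty list object, not the built-in type
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()
print(message)     # 'list' object is not callable
print(list("ab"))  # ['a', 'b'] (the built-in is untouched outside the function)
```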
