Should I use a dictionary for membership testing? - python

Suppose I have two lists A, B such that A is a subset of B. If I were to parse through the points of B and each time I want to test if an element is a member of A, would representing A as a dictionary be better than as a list? I ask because I am under the impression that dictionaries have worst case lookup time O(1), whereas for arrays it is O(n).
That is, which of the following would be more efficient in terms of time complexity?
# Code 1
A = [1, 2, 3]
B = [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 1, 2, 3]
for i in B:
    if i in A:
        print(i)
    else:
        print(-1)
# Code 2
A = [1, 2, 3]
B = [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 1, 2, 3]
A_dict = {}
for i in A:
    A_dict[i] = 0
for i in B:
    if i in A_dict:
        print(i)
    else:
        print(-1)
It seems that if what I said about time complexities above is true, then the first code has complexity O(|B| x |A|), whereas the second has complexity O(|B|). Is this correct?

You should use sets for that. They have O(1) lookup on average, like dicts, but they store only the elements rather than key-value pairs.
Your code would then look like this:
A = [1, 2, 3]
B = [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 1, 2, 3]
A_set = set(A)
for i in B:
    if i in A_set:
        print(i)
    else:
        print(-1)
or:
A = {1, 2, 3}
...
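If you want to see the difference rather than take it on faith, here is a rough timing sketch. The sizes and the helper names with_list and with_set are made up for illustration, and the exact numbers will vary by machine; the point is only that the list version scans A for every element of B, while the set version does a hash lookup.

```python
import timeit

A = list(range(1000))   # hypothetical sizes, chosen only for illustration
B = list(range(2000))
A_set = set(A)          # built once, in O(len(A))

def with_list():
    # each `in` scans the list: O(len(A)) per lookup
    return [i if i in A else -1 for i in B]

def with_set():
    # each `in` is a hash lookup: O(1) on average
    return [i if i in A_set else -1 for i in B]

list_time = timeit.timeit(with_list, number=5)
set_time = timeit.timeit(with_set, number=5)
print("list: %.4fs  set: %.4fs" % (list_time, set_time))
```

Both functions return the same result; only the time spent on membership tests differs.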

Related

How to create a list that represent the number of times a given item was shown?

This problem seems really stupid but I can't get my head around it.
I have the following list:
a = [2, 1, 3, 1, 1, 2, 3, 2, 3]
I have to produce a second list of the same size as the first one, where each value is the number of times the corresponding item has appeared in the list up to that point. For example:
b = [1, 1, 1, 2, 3, 2, 2, 3, 3]
So b[0] = 1 because it's the first time the item '2' appears in the 'a' list. b[5] = 2 and b[7] = 3 because those are the second and third times the item '2' appears in the list 'a'.
Here is a solution:
from collections import defaultdict
a = [2, 1, 3, 1, 1, 2, 3, 2, 3]
b = []
d = defaultdict(int)
for x in a:
    d[x] += 1
    b.append(d[x])
print(b)
Output:
[1, 1, 1, 2, 3, 2, 2, 3, 3]
I think using a dictionary might help you. Basically, I iterate over the list and store the current frequency of each number.
a = [2, 1, 3, 1, 1, 2, 3, 2, 3]
d = {}
z = []
for i in a:
    if i not in d:
        d[i] = 1
        z.append(1)
    else:
        d[i] += 1
        z.append(d[i])
print(z)
Output: [1, 1, 1, 2, 3, 2, 2, 3, 3]
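If you need this in more than one place, the same idea can be wrapped in a small generator. This is just a sketch; the name running_counts is made up here and is not from any library.

```python
from collections import Counter

def running_counts(items):
    """Yield, for each item, how many times it has been seen so far."""
    seen = Counter()
    for item in items:
        seen[item] += 1
        yield seen[item]

print(list(running_counts([2, 1, 3, 1, 1, 2, 3, 2, 3])))
# [1, 1, 1, 2, 3, 2, 2, 3, 3]
```

Because it is a generator, it also works on iterables that are too large to hold in a second list all at once.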

Remove duplicate numbers from a list

I was attempting to remove all duplicated numbers in a list.
I was trying to understand what is wrong with my code.
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
for x in numbers:
    if numbers.count(x) >= 2:
        numbers.remove(x)
print(numbers)
The result I got was:
[1, 1, 6, 5, 2, 3]
I guess the idea is to write the code yourself without using library functions. In that case I would still suggest using an additional set to store the items you have already seen, and going over your list only once:
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
unique = set()
for x in numbers:
    if x not in unique:
        unique.add(x)
numbers = list(unique)
print(numbers)
If you want to keep your code, the problem is that you modify the collection while iterating over it in the for loop, which is a big no-no in most programming languages. Although Python allows you to do that, the problem and solution are already described in this answer: How to remove items from a list while iterating?:
Note: There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, i.e. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence, e.g.,
for x in a[:]:
    if x < 0: a.remove(x)
Using a shallow copy of the list:
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
for x in numbers[:]:
    if numbers.count(x) >= 2:
        numbers.remove(x)
print(numbers) # [1, 6, 5, 2, 3]
Alternatives:
Preserving the order of the list:
Using dict.fromkeys()
print(list(dict.fromkeys(numbers).keys())) # [1, 6, 5, 2, 3]
Using more_itertools.unique_everseen(iterable, key=None):
from more_itertools import unique_everseen
print(list(unique_everseen(numbers))) # [1, 6, 5, 2, 3]
Using pandas.unique:
import pandas as pd
print(pd.unique(numbers).tolist()) # [1, 6, 5, 2, 3]
Using collections.OrderedDict([items]):
from collections import OrderedDict
print(list(OrderedDict.fromkeys(numbers))) # [1, 6, 5, 2, 3]
Using itertools.groupby(iterable[, key]) (this only collapses consecutive duplicates, so it works here because equal numbers are adjacent):
from itertools import groupby
print([k for k, _ in groupby(numbers)]) # [1, 6, 5, 2, 3]
Ignoring the order of the list:
Using numpy.unique:
import numpy as np
print(np.unique(numbers).tolist()) # [1, 2, 3, 5, 6]
Using set():
print(list(set(numbers))) # [1, 2, 3, 5, 6]
Using frozenset([iterable]):
print(list(frozenset(numbers))) # [1, 2, 3, 5, 6]
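One caveat about the alternatives above, shown as a small sketch: groupby only merges runs of equal neighbours, so on input where duplicates are not adjacent it gives a different answer than dict.fromkeys. The list data below is a made-up example, and the dict ordering relies on Python 3.7+.

```python
from itertools import groupby

data = [1, 2, 1, 1, 3, 2]   # hypothetical input with non-adjacent duplicates

by_groupby = [k for k, _ in groupby(data)]   # collapses runs only
by_fromkeys = list(dict.fromkeys(data))      # keeps the first occurrence of each value

print(by_groupby)   # [1, 2, 1, 3, 2]
print(by_fromkeys)  # [1, 2, 3]
```

If the order does not matter, sorting first and then using groupby would also remove all duplicates.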
Why don't you simply use a set:
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers = list(set(numbers))
print(numbers)
Before anything else, the first advice I can give is to never modify a list while you are looping over it. All kinds of wacky stuff happens. Otherwise your code is fine (I recommend reading the other answers though; there's an easier way to do this with a set, which pretty much handles the duplicates for you).
Instead of removing numbers from the list you are looping over, loop over a clone of it, made with slicing right in the for statement.
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
for x in numbers[:]:
    if numbers.count(x) >= 2:
        numbers.remove(x)
    print(numbers)
print("Final")
print(numbers)
The key is numbers[:], which gives back a copy of the list. Here's the print output:
[1, 1, 1, 6, 5, 5, 2, 3]
[1, 1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
Final
[1, 6, 5, 2, 3]
Leaving a placeholder here until I figure out how to explain why in your particular case it's not working, like the actual step by step reason.
Another way to solve this, making use of the beautiful language that is Python, is with a comprehension and a set.
Why a set? Because by definition the elements of this data structure are unique, so even if you try to put in multiple elements that are the same, they won't appear as repeated in the set. Cool, right?
A comprehension is syntactic sugar for looping in one line. Get used to it in Python; you'll either use it a lot or see it a lot :)
With a comprehension you iterate over an iterable and return each item. In the code below, x represents each number in numbers, and because the comprehension is written with braces, each x becomes part of a set. Because the set handles duplicates... voilà, your code is done.
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers_as_set = {x for x in numbers}
print(numbers_as_set)
This seems like homework, but here is a possible solution:
import numpy as np
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
filtered = np.unique(numbers).tolist()
print(filtered)
# [1, 2, 3, 5, 6]
This solution does not preserve the ordering. If you also need the ordering, use:
filtered_with_order = list(dict.fromkeys(numbers))
Why don't you use fromkeys?
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers = list(dict.fromkeys(numbers))
Output: [1,6,5,2,3]
The flow is as follows.
Initially the list is [1, 1, 1, 1, 6, 5, 5, 2, 3] and the index is 0.
x is 1. numbers.count(1) is 4, so the 1 at index 0 is removed.
Now the list becomes [1, 1, 1, 6, 5, 5, 2, 3], but the index is incremented to 1.
x is 1. numbers.count(1) is 3, so the 1 at index 1 is removed.
Now the list becomes [1, 1, 6, 5, 5, 2, 3], but the index is incremented to 2.
x is 6.
etc...
So that's why two 1's remain.
Please correct me if I am wrong. Thanks!
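The flow described above can be reproduced with an explicit index. This is only a sketch that imitates what the for loop does internally (the variable index is made up for the illustration), but it ends with the same list the question reported.

```python
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
index = 0
while index < len(numbers):      # the length is re-checked on every pass
    x = numbers[index]
    if numbers.count(x) >= 2:
        numbers.remove(x)        # removes the first occurrence; later items shift left
    index += 1                   # advances even after a removal, which skips an item
print(numbers)
# [1, 1, 6, 5, 2, 3]
```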
A fancy method is to use collections.Counter:
>>> from collections import Counter
>>> numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
>>> c = Counter(numbers)
>>> list(c.keys())
[1, 6, 5, 2, 3]
This method has linear time complexity (O(n)) and uses a really performant library.
You can try:
from more_itertools import unique_everseen
items = [1, 1, 1, 1, 6, 5, 5, 2, 3]
list(unique_everseen(items))
or
>>> from collections import OrderedDict
>>> items = [1, 1, 1, 1, 6, 5, 5, 2, 3]
>>> list(OrderedDict.fromkeys(items))
[1, 6, 5, 2, 3]
You can find more here:
How do you remove duplicates from a list whilst preserving order?

Swap two values randomly in list

I have the following list:
a = [1, 2, 5, 4, 3, 6]
And I want to know how I can swap any two values at a time randomly within a list regardless of position within the list. Below are a few example outputs on what I'm thinking about:
[1, 4, 5, 2, 3, 6]
[6, 2, 3, 4, 5, 1]
[1, 6, 5, 4, 3, 2]
[1, 3, 5, 4, 2, 6]
Is there a way to do this in Python 2.7? My current code is like this:
import random
n = len(a)
if n:
    i = random.randint(0,n-1)
    j = random.randint(0,n-1)
    a[i] += a[j]
    a[j] = a[i] - a[j]
    a[i] -= a[j]
The issue with the code I currently have, however, is that it starts setting all values to zero given enough swaps and iterations, which I do not want; I want the values to stay the same in the array, but do something like 2opt and only switch around two with each swap.
You are over-complicating it, it seems. Just randomly sample two indices from the list, then swap the values at those indices:
>>> import random
>>> def swap_random(seq):
...     idx = range(len(seq))
...     i1, i2 = random.sample(idx, 2)
...     seq[i1], seq[i2] = seq[i2], seq[i1]
...
>>> a
[1, 2, 5, 4, 3, 6]
>>> swap_random(a)
>>> a
[1, 2, 3, 4, 5, 6]
>>> swap_random(a)
>>> a
[1, 2, 6, 4, 5, 3]
>>> swap_random(a)
>>> a
[1, 2, 6, 5, 4, 3]
>>> swap_random(a)
>>> a
[6, 2, 1, 5, 4, 3]
Note, I used the Python swap idiom, which doesn't require an intermediate variable. It is equivalent to:
temp = seq[i1]
seq[i1] = seq[i2]
seq[i2] = temp
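As for why the arithmetic swap in the question ends up producing zeros: two independent randint calls can return the same index, and when i == j the add/subtract trick wipes the value out. A small sketch with the indices fixed by hand to show that case:

```python
a = [1, 2, 5, 4, 3, 6]
i = j = 2            # the case two randint calls can produce: both indices equal

a[i] += a[j]         # a[2] becomes 10
a[j] = a[i] - a[j]   # same slot, so a[2] becomes 10 - 10 = 0
a[i] -= a[j]         # a[2] stays 0
print(a)             # [1, 2, 0, 4, 3, 6]

b = [1, 2, 5, 4, 3, 6]
b[i], b[j] = b[j], b[i]   # the tuple swap is a harmless no-op when i == j
print(b)                  # [1, 2, 5, 4, 3, 6]
```

random.sample avoids the issue altogether because it never returns the same index twice.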

Pythonic way to find all elements with the highest frequency? [duplicate]

I have a list such as this:
lst = [1, 3, 5, 1, 5, 6, 1, 1, 3, 4, 5, 2, 3, 4, 5, 3, 4]
I would like to find all the elements which occur most frequently.
So I would like:
most = [1, 3, 5]
1, 3, and 5 would occur the most, which is 4 times. What's a fast, pythonic way to do this? I tried methods shown here:
How to find most common elements of a list?.
But it only gives me the top 3, I need all elements. Thank you.
With collections.Counter and a list comprehension:
from collections import Counter
lst = [1, 3, 5, 1, 5, 6, 1, 1, 3, 4, 5, 2, 3, 4, 5, 3, 4]
r = [x for x, _ in Counter(lst).most_common(3)]
print(r)
# [1, 3, 5]
You can generalize for values with highest count by using max on the counter values:
c = Counter(lst)
m = max(c.values())
r = [k for k in c if c[k] == m]
print(r)
# [1, 3, 5]
For large iterables, to efficiently iterate through the counter and stop once the required items have been taken, you can use itertools.takewhile with most_common without any parameters:
from itertools import takewhile
c = Counter(lst)
m = max(c.values())
r = [x for x, _ in takewhile(lambda x: x[1]==m, c.most_common())]
print(r)
# [1, 3, 5]
You gain by not having to iterate through all the items in the counter object, although there is some overhead from having to sort the items with most_common, so I'm not sure this is actually more efficient after all. You could do some experiments with timeit.
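One thing to watch for with the max-based versions: max() raises ValueError when the list is empty. Here is a sketch of a helper that guards against that; the name most_frequent is made up for this example, and the first-seen ordering of the result assumes Python 3.7+.

```python
from collections import Counter

def most_frequent(items):
    """Return every element tied for the highest count, in first-seen order."""
    counts = Counter(items)
    if not counts:
        return []          # max() would raise ValueError on an empty sequence
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

lst = [1, 3, 5, 1, 5, 6, 1, 1, 3, 4, 5, 2, 3, 4, 5, 3, 4]
print(most_frequent(lst))  # [1, 3, 5]
print(most_frequent([]))   # []
```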
You can also get the same result with groupby from the itertools module and a list comprehension, like this (note that the threshold of 3 is hard-coded for this particular list):
from itertools import groupby
a = [1, 3, 5, 1, 5, 6, 1, 1, 3, 4, 5, 2, 3, 4, 5, 3, 4]
most_common = 3
final = [k for k, v in groupby(sorted(a), lambda x: x) if len(list(v)) > most_common]
print(final)
Output:
[1, 3, 5]
You can do the following if you would like to print all the elements, ordered from most to least frequent:
from collections import Counter
words = [1, 3, 5, 1, 5, 6, 1, 1, 3, 4, 5, 2, 3, 4, 5, 3, 4]
most = [word for word, word_count in Counter(words).most_common()]
print(most)
Output:
[1, 3, 5, 4, 6, 2]
Please note that if you want to limit the result, you can pass a number to most_common(), e.g. ...most_common(3)]. Elements with equal counts come out in the order they were first encountered (Python 3.7+). Hope this answers your question.

Compare each element in a list to all others

Is there a way to compare all elements of a list (ie one such as [4, 3, 2, 1, 4, 3, 2, 1, 4]) to all others and return, for each element, the number of other elements it is different from (ie, for the list above [6, 7, 7, 7, 6, 7, 7, 7, 6])? I then will need to add the numbers from this list.
from collections import Counter
li = [4, 3, 2, 1, 4, 3, 2, 1, 4]
c = Counter(li)
print(c)
length = len(li)
print([length - c[el] for el in li])
Creating c before executing [length - c[el] for el in li] is better than calling count(i) for each element i of the list, because count() would repeat the same count several times (every time it encounters a given element, it counts it again).
By the way, another way to write it:
list(map(lambda x: length - c[x], li))
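Since the question says the numbers will be added up afterwards, the total can also be computed straight from the counts without building the per-element list. A sketch, using the fact that each distinct value occurring k times contributes k * (n - k) to the sum:

```python
from collections import Counter

li = [4, 3, 2, 1, 4, 3, 2, 1, 4]
n = len(li)
counts = Counter(li)

# Each distinct value occurring k times contributes k * (n - k) to the total.
total = sum(k * (n - k) for k in counts.values())
print(total)  # 60
```

Here that is 3*6 + 2*7 + 2*7 + 2*7 = 60, the same as summing [6, 7, 7, 7, 6, 7, 7, 7, 6].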
You can get a similar count with the count() method and subtract it from the total length.
Do it in one line with a list comprehension.
>>> l = [4, 3, 2, 1, 4, 3, 2, 1, 4]
>>> [ len(l)-l.count(i) for i in l ]
[6, 7, 7, 7, 6, 7, 7, 7, 6]
For Python 2.7:
test = [4, 3, 2, 1, 4, 3, 2, 1, 4]
length = len(test)
print [length - test.count(x) for x in test]
You could just use the sum function, along with a generator expression.
>>> l = [4, 3, 2, 1, 4, 3, 2, 1, 4]
>>> length = len(l)
>>> print(sum(length - l.count(i) for i in l))
60
The good thing about a generator expression is that you don't create an actual list in memory, but functions like sum can still iterate over it and produce the desired result. Note, however, that once you have iterated over a generator, you can't iterate over it again.
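To make that last point concrete, here is a short sketch (the names gen, first and second are only for illustration) showing what happens when the same generator is summed twice:

```python
l = [4, 3, 2, 1, 4, 3, 2, 1, 4]
length = len(l)

gen = (length - l.count(i) for i in l)   # nothing is computed yet
first = sum(gen)     # consumes the generator
second = sum(gen)    # the generator is now exhausted, so this sums nothing
print(first, second)  # 60 0
```

If you need the values more than once, build a list instead, or create a fresh generator expression each time.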
