Get index range of the repetitive elements in the list - python

Suppose I have a list a = [-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1] in Python. Is there a built-in function that takes a list and returns which elements are present at which index ranges? For example:
>>> index_range(a)
{-1 :'0-2,9-11', 1:'3-5,12-14', 2:'6-8'}
I have tried to use Counter from the collections module, but it only outputs the count of each element.
If there is no built-in function, can you please guide me on how I can achieve this in my own function? Not the whole code, just a guideline.

You can create your own custom function using itertools.groupby and collections.defaultdict to get the ranges of numbers in the form of a list:
from itertools import groupby
from collections import defaultdict
def index_range(my_list):
    my_dict = defaultdict(list)
    for i, j in groupby(enumerate(my_list), key=lambda x: x[1]):
        index_range, numlist = list(zip(*j))
        my_dict[numlist[0]].append((index_range[0], index_range[-1]))
    return my_dict
Sample Run:
>>> index_range([-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1])
{1: [(3, 5), (12, 14)], 2: [(6, 8)], -1: [(0, 2), (9, 11)]}
In order to get the values as strings in your dict, you may either modify the above function, or use its return value in a dictionary comprehension:
>>> result_dict = index_range([-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1])
>>> {k: ','.join('{}:{}'.format(*i) for i in v) for k, v in result_dict.items()}
{1: '3:5,12:14', 2: '6:8', -1: '0:2,9:11'}
(Use '{}-{}' as the format string if you want dashes, as in your example.)

You can use a dict that uses list items as keys and their indexes as values:
>>> lst = [-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1]
>>> indexes = {}
>>> for index, item in enumerate(lst):
...     indexes.setdefault(item, []).append(index)
>>> indexes
{1: [3, 4, 5, 12, 13, 14], 2: [6, 7, 8], -1: [0, 1, 2, 9, 10, 11]}
You could then merge the index lists into ranges if that's what you need. I can help you with that too if necessary.
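For example, here is a minimal sketch of that merging step (my addition; to_ranges is a hypothetical helper, assuming the indexes dict built above with sorted index lists):
def to_ranges(indexes):
    # Collapse sorted index lists like [3, 4, 5, 12, 13, 14]
    # into (start, end) pairs like [(3, 5), (12, 14)].
    ranges = {}
    for item, idx_list in indexes.items():
        pairs = []
        start = prev = idx_list[0]
        for i in idx_list[1:]:
            if i == prev + 1:
                prev = i
            else:
                pairs.append((start, prev))
                start = prev = i
        pairs.append((start, prev))
        ranges[item] = pairs
    return ranges
With the indexes dict from above, to_ranges(indexes) gives {1: [(3, 5), (12, 14)], 2: [(6, 8)], -1: [(0, 2), (9, 11)]}.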

Related

Efficient way to find key by value in Python dict where dict values are iterables

I have an iterable of unique numbers:
lst = [14, 11, 8, 55]
where every value occurs somewhere among the dict's iterable values, say lists:
dict_itms = dict([(1, [0, 1, 2, 3]), (2, [11, 14, 12]), (3, [30, 8, 42]), (4, [55, 6])])
I have to find each lst element among the dict's values so that, in the end, I have a list of keys matched pairwise against the elements of lst.
This method:
keys_ = []
for a in lst:
    for k, v in dict_itms.items():
        if a in v:
            keys_ += [k]
            break
        else:
            continue
gives:
[2, 2, 3, 4]
Is there a more efficient way to find every key pairwise against each number?
You can use any in a list comprehension:
print([k for k,v in dict_itms.items() if any(x in lst for x in v)])
Output:
[2, 3, 4]
Update
According to this answer, not set(v).isdisjoint(lst) is the fastest:
print([k for k,v in dict_itms.items() if not set(v).isdisjoint(lst)])
It's unclear what you mean by 'efficient'; do you need this to be efficient in a given pass or in aggregate? The reason I ask is that typically the best way to handle this in aggregate is by doing a pre-processing pass that flips your key-value relation:
reverse_lookup = dict()
for k, v in d.items():
    for i in v:
        keys = reverse_lookup.get(i, [])  # Provide an empty list if this item not yet found
        keys.append(k)
        reverse_lookup[i] = keys
Now that you have your reverse lookup processed, you can use it in a straightforward manner:
result = [reverse_lookup.get(i) for i in lst]
# `result` is actually a list of lists, so as to allow duplicates. You will need to flatten it, or change the reverse lookup to ignore dupes.
The initial processing for the reverse lookup is O(n*m), where n*m is the total length of the original dictionary's values summed. However, each lookup for the lst portion is O(1), so if you squint and have enough lookups, the query step is O(p), where p is the length of lst. This will be wildly more efficient than other approaches if you have to do it a lot, and much less efficient if you only ever pass over a given dictionary once.
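A minimal sketch of that flattening step (my addition, assuming every lst value appears under exactly one key, as in the example):
result = [reverse_lookup[i] for i in lst]  # e.g. [[2], [2], [3], [4]]
flat = [keys[0] for keys in result]        # take the first matching key: [2, 2, 3, 4]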
A simple and Pythonic implementation:
d = dict([(1, [0, 1, 2, 3]), (2, [11, 14, 12]), (3, [30, 8, 42]), (4, [55, 6])])
xs = [14, 11, 8, 55]
keys = [k for k, v in d.items() if set(v).intersection(xs)]
print(keys)
However, this doesn't duplicate the 2 key, which your example does - not sure if that's the behaviour you need?
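If you do need the duplicated 2, a minimal variant (my sketch, not part of the original answer) iterates xs instead of the dict, at the cost of scanning the dict once per element:
keys = [k for x in xs for k, v in d.items() if x in v]
print(keys)  # [2, 2, 3, 4]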

Sort a list alphabetically and retrieve initial index in python

I've been trying to sort a list of names alphabetically, let's say:
list=['Bob','Alice','Charlie']
print(list.index('Alice'))
1
However, I'd also like to keep track of the original indexes, so this won't work:
list.sort()
print(list)
['Alice','Bob','Charlie']
print(list.index('Alice'))
0
After sorting, the indexes changed; is there any way of keeping track of the original indexes? I've checked other similar questions and numpy has a solution, but it's not useful for str variables.
Just sort the reversed (name, index) tuples built from enumerate to keep track of the elements and their indices:
>>> names = ['Bob','Alice','Charlie']
>>> sorted((name, index) for index, name in enumerate(names))
[('Alice', 1), ('Bob', 0), ('Charlie', 2)]
l = ['Bob','Alice','Charlie']
e = enumerate(l) # creates a generator of [(0, 'Bob'), (1, 'Alice'), (2, 'Charlie')]
sl = sorted(e, key=lambda x: x[1]) # [(1, 'Alice'), (0, 'Bob'), (2, 'Charlie')]
You may create another list of indices and sort that one, leaving the original untouched:
>>> a = ['Bob', 'Alice', 'Charlie']
>>> idx = list(range(len(a)))
>>> idx
[0, 1, 2]
>>> sorted(idx, key=lambda x: a[x])
[1, 0, 2]
You could create a nested dictionary of sorts to hold the original index and sorted value.
First, I would recommend using a proper name for your list object; list is a built-in name in Python and should not be shadowed.
names=['Bob','Alice','Charlie']
name_dict = {name: {'unsorted': idx} for idx, name in enumerate(names)}
for sorted_idx, name in enumerate(sorted(names)):
    name_dict[name].update({'sorted': sorted_idx})
print(name_dict['Bob']['sorted'])
1
print(name_dict['Bob']['unsorted'])
0
print(name_dict)
{'Bob': {'unsorted': 0, 'sorted': 1},
'Alice': {'unsorted': 1, 'sorted': 0},
'Charlie': {'unsorted': 2, 'sorted': 2}}
If you only need the first element alphabetically:
l = ['Bob', 'Alice', 'Charlie']
def sort_and_get_first_element(list1):
    list1.sort()
    return list1[0]
print(sort_and_get_first_element(l))
Yes, you can keep track of the initial index, but with a different data structure:
a = ['Bob','Alice','Charlie']
l = sorted(enumerate(a), key=lambda i: i[1])
print(l)
Now the sorted list that keeps track of the initial index is:
[(1, 'Alice'), (0, 'Bob'), (2, 'Charlie')]

Fast removal of consecutive duplicates in a list and corresponding items from another list

My question is similar to this previous SO question.
I have two very large lists of data (almost 20 million data points each) that contain numerous consecutive duplicates. I would like to remove the consecutive duplicates as follows:
list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2] # This is 20M long!
list2 = ... # another list of size len(list1), also 20M long!
i = 0
while i < len(list1)-1:
    if list1[i] == list1[i+1]:
        del list1[i]
        del list2[i]
    else:
        i = i+1
And the output should be [1, 2, 3, 4, 5, 1, 2] for the first list.
Unfortunately, this is very slow, since deleting an element from a list is a slow operation by itself. Is there any way I can speed up this process? Please note that, as shown in the code snippet above, I also need to keep track of the index i so that I can remove the corresponding element in list2.
Python has this groupby in the libraries for you:
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [k for k,_ in groupby(list1)]
[1, 2, 3, 4, 5, 1, 2]
You can tweak it using the key argument, to also process the second list at the same time.
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> list2 = [9,9,9,8,8,8,7,7,7,6,6,6,5]
>>> from operator import itemgetter
>>> keyfunc = itemgetter(0)
>>> [next(g) for k,g in groupby(zip(list1, list2), keyfunc)]
[(1, 9), (2, 7), (3, 7), (4, 7), (5, 6), (1, 6), (2, 5)]
If you want to split those pairs back into separate sequences again:
>>> list(zip(*_))  # "unzip" them
[(1, 2, 3, 4, 5, 1, 2), (9, 7, 7, 7, 6, 6, 5)]
You can use collections.deque and its maxlen argument to set a window size of 2. Then just compare the two entries in the window, and append to the results if they differ.
from collections import deque

def remove_adj_dups(x):
    """
    :param x: an iterable such as a string, list, or generator,
              e.g. [1, 1, 2, 3, 3]
    :return: the items with consecutive duplicates removed, e.g. [1, 2, 3], as a list
    """
    result = []
    # The first entry is object(), which only compares equal to itself,
    # so the first real item never matches it. Kudos to Trey Hunner.
    d = deque([object()], maxlen=2)
    for i in x:
        d.append(i)
        a, b = d
        if a != b:
            result.append(b)
    return result
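A quick usage check (my example, not the original answer's):
>>> remove_adj_dups([1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 1, 2])
[1, 2, 3, 4, 5, 1, 2]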
I generated a random list with duplicates of 2 million numbers between 0 and 9.
import random

def random_nums_with_dups(number_range=None, range_len=None):
    """
    :param number_range: draw numbers from range(number_range); the smaller
                         this is, the more duplicates
    :param range_len: length of the generated sequence
    :return: a generator
    Note: if number_range = 2, random binary digits are returned
    """
    return (random.choice(range(number_range)) for i in range(range_len))
I then tested with
range_len = 2000000
def mytest():
    return remove_adj_dups(random_nums_with_dups(number_range=10, range_len=range_len))
big_result = mytest()
print(len(big_result))
The len was 1800197 (i.e. with adjacent duplicates removed), in under 5 seconds, which includes the time for the random list generator to spin up.
I lack the experience/know-how to say whether it is memory efficient as well. Could someone comment, please?
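One hedged observation (mine, not the original poster's): the deque window itself is O(1) memory, so the dominant cost is the result list. If that matters, the same sentinel-window idea can be written as a generator so nothing is materialized until needed:
from collections import deque

def iter_remove_adj_dups(x):
    # Same idea as remove_adj_dups above, but yields items lazily
    # instead of building a result list.
    d = deque([object()], maxlen=2)
    for i in x:
        d.append(i)
        a, b = d
        if a != b:
            yield b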

How to extract from a Python list while also accounting for the position of the extracted elements?

Given a list x e.g.
[4,6,7,21,1,7,3]
I need to extract those values that are less than or equal to 4. This is easily done, but I also need to take note of where in the list those values occurred. If all values were unique, I know I could probably use list.index() in some way, but there will be duplicated values. How best to achieve this?
how about simply
[(i, val) for i, val in enumerate([4,6,7,21,1,7,3]) if val <= 4]
or depending on your use-case, perhaps a dictionary would be more suitable? Either from index to value:
{i:val for i, val in enumerate([4,6,7,21,1,7,3]) if val <= 4}
or from value to index:
from collections import defaultdict
indexes = defaultdict(list)
for i, val in enumerate([4,6,7,21,1,7,3]):
    if val <= 4:
        indexes[val].append(i)
You can make another list that stores tuples, with the element less than or equal to 4 as the first item and its index as the second, like this:
my_list = [4, 6, 7, 21, 1, 7, 3]
req_list = []
for i in range(len(my_list)):
    e = my_list[i]
    if e <= 4:
        req_list.append((e, i))
Here req_list will hold pair-tuples with the qualifying element first and its index second.
e.g.
if
my_list = [4, 6, 7, 21, 1, 7, 3]
then
req_list = [(4, 0), (1, 4), (3, 6)]

Identify duplicate values in a list in Python

Is it possible to get which values are duplicates in a list using python?
I have a list of items:
mylist = [20, 30, 25, 20]
I know the best way of removing the duplicates is set(mylist), but is it possible to know what values are being duplicated? As you can see, in this list the duplicates are the first and last values. [0, 3].
Is it possible to get this result or something similar in python? I'm trying to avoid making a ridiculously big if elif conditional statement.
These answers are O(n), so a little more code than using mylist.count(), but much more efficient as mylist gets longer.
If you just want to know the duplicates, use collections.Counter
from collections import Counter
mylist = [20, 30, 25, 20]
[k for k,v in Counter(mylist).items() if v>1]
If you need to know the indices,
from collections import defaultdict
D = defaultdict(list)
for i, item in enumerate(mylist):
    D[item].append(i)
D = {k: v for k, v in D.items() if len(v) > 1}
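With mylist = [20, 30, 25, 20], this leaves D == {20: [0, 3]}, which matches the indices you described.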
Here's a list comprehension that does what you want. As #Codemonkey says, the list starts at index 0, so the indices of the duplicates are 0 and 3.
>>> [i for i, x in enumerate(mylist) if mylist.count(x) > 1]
[0, 3]
You can use a list comprehension and a set to reduce the complexity.
my_list = [3, 5, 2, 1, 4, 4, 1]
opt = [item for item in set(my_list) if my_list.count(item) > 1]
The following list comprehension will yield the duplicate values:
[x for x in mylist if mylist.count(x) >= 2]
The simplest way, without any intermediate list, uses list.index():
z = ['a', 'b', 'a', 'c', 'b', 'a']
[z[i] for i in range(len(z)) if i == z.index(z[i])]
['a', 'b', 'c']
and you can also list the duplicates themselves (which may again contain duplicates, as in the example):
[z[i] for i in range(len(z)) if not i == z.index(z[i])]
['a', 'b', 'a']
or their indices:
[i for i in range(len(z)) if not i == z.index(z[i])]
[2, 4, 5]
or the duplicates as a list of 2-tuples of their index and the index of their first occurrence, which is the answer to the original question:
[(i, z.index(z[i])) for i in range(len(z)) if not i == z.index(z[i])]
[(2, 0), (4, 1), (5, 0)]
or this together with the item itself:
[(i, z.index(z[i]), z[i]) for i in range(len(z)) if not i == z.index(z[i])]
[(2, 0, 'a'), (4, 1, 'b'), (5, 0, 'a')]
or any other combination of elements and indices....
I tried the code below to find duplicate values in a list:
1) create a set from the list
2) iterate through the set, checking each element's count in the original list.
glist = [1, 2, 3, "one", 5, 6, 1, "one"]
x = set(glist)
dup = []
for c in x:
    if glist.count(c) > 1:
        dup.append(c)
print(dup)
OUTPUT
[1, 'one']
Now get all the indices for each duplicate element:
glist = [1, 2, 3, "one", 5, 6, 1, "one"]
x = set(glist)
dup = []
for c in x:
    if glist.count(c) > 1:
        indices = [i for i, item in enumerate(glist) if item == c]
        dup.append((c, indices))
print(dup)
OUTPUT
[(1, [0, 6]), ('one', [3, 7])]
Hope this helps someone
That's the simplest way I can think of for finding duplicates in a list:
my_list = [3, 5, 2, 1, 4, 4, 1]
my_list.sort()
for i in range(0, len(my_list) - 1):
    if my_list[i] == my_list[i + 1]:
        print(str(my_list[i]) + ' is a duplicate')
The following code will fetch the desired results, with each duplicate item and the index of its first occurrence:
for i in set(mylist):
    if mylist.count(i) > 1:
        print(i, mylist.index(i))
You should sort the list:
mylist.sort()
After this, iterate through it like this:
doubles = []
for i, elem in enumerate(mylist):
    if i != 0:
        if elem == old:
            doubles.append(elem)
            old = None
            continue
    old = elem
You can print the duplicate and unique values using the logic below:
def dup(x):
    duplicate = []
    unique = []
    for i in x:
        if i in unique:
            duplicate.append(i)
        else:
            unique.append(i)
    print("Duplicate values: ", duplicate)
    print("Unique Values: ", unique)

list1 = [1, 2, 1, 3, 2, 5]
dup(list1)
mylist = [20, 30, 25, 20]
kl = {i: mylist.count(i) for i in mylist if mylist.count(i) > 1}
print(kl)
It looks like you want the indices of the duplicates. Here is some short code that will find those in O(n) time, without using any packages:
dups = {}
[dups.setdefault(v, []).append(i) for i, v in enumerate(mylist)]
dups = {k: v for k, v in dups.items() if len(v) > 1}
# dups now has keys for all the duplicate values
# and a list of matching indices for each
# The second line produces an unused list.
# It could be replaced with this:
for i, v in enumerate(mylist):
    dups.setdefault(v, []).append(i)
m = len(mylist)
for index, value in enumerate(mylist):
    for i in range(1, m):
        if index != i:
            if mylist[i] == mylist[index]:
                print("Location %d and location %d have the same list entry: %r" % (index, i, value))
This has some redundancy that can be improved, however.
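For instance, a minimal sketch of one such improvement (my variant, not part of the original answer): start the inner loop at index + 1 so each pair is examined exactly once:
m = len(mylist)
for index in range(m):
    for i in range(index + 1, m):
        if mylist[i] == mylist[index]:
            print("Location %d and location %d have the same list entry: %r" % (index, i, mylist[index]))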
def checkduplicate(lists):
    seen = []
    duplicate = []
    for i in lists:
        if i in seen:
            duplicate.append(i)
        else:
            seen.append(i)
    return duplicate

print(checkduplicate([1, 9, 78, 989, 2, 2, 3, 6, 8]))
