Find multiple occuring string in array and output index - python

I have a array filled with e-mail addresses which change constantly. e.g.
mailAddressList = ['chip#plastroltech.com','spammer#example.test','webdude#plastroltech.com','spammer#example.test','spammer#example.test','support#plastroltech.com']
How do I find multiple occurrences of the same string in the array and output it's indexes?

just group indexes by email and print only those items, where lenght of index list is greater than 1:
from collections import defaultdict
mailAddressList = ['chip#plastroltech.com',
'spammer#example.test',
'webdude#plastroltech.com',
'spammer#example.test',
'spammer#example.test',
'support#plastroltech.com'
]
index = defaultdict(list)
for i, email in enumerate(mailAddressList):
index[email].append(i)
print [(email, positions) for email, positions in index.items()
if len(positions) > 1]
# [('spammer#example.test', [1, 3, 4])]

Try this:
query = 'spammer#example.test''
indexes = [i for i, x in enumerate(mailAddressList) if x == query]
Output:
[1, 3, 4]

In [7]: import collections
In [8]: q=collections.Counter(mailAddressList).most_common()
In [9]: indexes = [i for i, x in enumerate(mailAddressList) if x == q[0][0]]
In [10]: indexes
Out[10]: [1, 3, 4]

note: solutions submitted before are more pythonic than mine. but in my opinon, lines that i've written before are easier to understand. i simply will create a dictionary, then will add mail adresses as key and the indexes as value.
first declare an empty dictionary.
>>> dct = {}
then iterate over mail adresses (m) and their indexes (i) in mailAddressList and add them to dictionary.
>>> for i, m in enumerate(mailAddressList):
... if m not in dct.keys():
... dct[m]=[i]
... else:
... dct[m].append(i)
...
now, dct looks liike this.
>>> dct
{'support#plastroltech.com': [5], 'webdude#plastroltech.com': [2],
'chip#plastroltech.com': [0], 'spammer#example.test': [1, 3, 4]}
there are many ways to grab the [1,3,4]. one of them (also not so pythonic :) )
>>> [i for i in dct.values() if len(i)>1][0]
[1, 3, 4]
or this
>>> [i for i in dct.items() if len(i[1])>1][0] #you can add [1] to get [1,3,4]
('spammer#example.test', [1, 3, 4])

Here's a dictionary comprehension solution:
result = { i: [ k[0] for k in list(enumerate(mailAddressList)) if k[1] == i ] for j, i in list(enumerate(mailAddressList)) }
# Gives you: {'webdude#plastroltech.com': [2], 'support#plastroltech.com': [5], 'spammer#example.test': [1, 3, 4], 'chip#plastroltech.com': [0]}
It's not ordered, of course, since it's a hash table. If you want to order it, you can use the OrderedDict collection. For instance, like so:
from collections import OrderedDict
final = OrderedDict(sorted(result.items(), key=lambda t: t[0]))
# Gives you: OrderedDict([('chip#plastroltech.com', [0]), ('spammer#example.test', [1, 3, 4]), ('support#plastroltech.com', [5]), ('webdude#plastroltech.com', [2])])
This discussion is less relevant, but it might also prove useful to you.

mailAddressList = ["chip#plastroltech.com","spammer#example.test","webdude#plastroltech.com","spammer#example.test","spammer#example.test","support#plastroltech.com"]
print [index for index, address in enumerate(mailAddressList) if mailAddressList.count(address) > 1]
prints [1, 3, 4], the indices of the addresses occuring more than once in the list.

Related

Segmenting a list of lists in Python

I have a list of lists all of the same length. I would like to segment the first list into contiguous runs of a given value. I would then like to segment the remaining lists to match the segments generated from the first list.
For example:
Given value: 2
Given list of lists: [[0,0,2,2,2,1,1,1,2,3], [1,2,3,4,5,6,7,8,9,10], [1,1,1,1,1,1,1,1,1,1]
Return: [ [[2,2,2],[2]], [[3,4,5],[9]], [[1,1,1],[1]] ]
The closest I have gotten is to get the indices by:
>>> import itertools
>>> import operator
>>> x = 2
>>> L = [[0,0,2,2,2,1,1,1,2,3],[1,2,3,4,5,6,7,8,9,10],[1,1,1,1,1,1,1,1,1,1]]
>>> I = [[i for i,value in it] for key,it in itertools.groupby(enumerate(L[0]), key=operator.itemgetter(1)) if key == x]
>>> print I
[[2, 3, 4], [8]]
This code was modified from another question on this site.
I would like to find the most efficient way possible, since these lists may be very long.
EDIT:
Maybe if I place the lists one on top of each other it might be clearer:
[[0,0,[2,2,2],1,1,1,[2],3], -> [2,2,2],[2]
[1,2,[3,4,5],6,7,8,[9],10],-> [3,4,5],[9]
[1,1,[1,1,1],1,1,1,[1],1]] -> [1,1,1],[1]
You can use groupby to create a list of groups in the form of a tuple of starting index and length of the group, and use this list to extract the values from each sub-list:
from itertools import groupby
from operator import itemgetter
def match(L, x):
groups = [(next(g)[0], sum(1 for _ in g) + 1)
for k, g in groupby(enumerate(L[0]), key=itemgetter(1)) if k == x]
return [[lst[i: i + length] for i, length in groups] for lst in L]
so that:
match([[0,0,2,2,2,1,1,1,2,3], [1,2,3,4,5,6,7,8,9,10], [1,1,1,1,1,1,1,1,1,1]], 2)
returns:
[[[2, 2, 2], [2]], [[3, 4, 5], [9]], [[1, 1, 1], [1]]]
l=[[0,0,2,2,2,1,1,1,2,3], [1,2,3,4,5,6,7,8,9,10], [1,1,1,1,1,1,1,1,1,1]]
temp=l[0]
value=2
dict={}
k=-1
prev=-999
for i in range(0,len(temp)):
if(temp[i]==value):
if(prev!=-999 and prev==i-1):
if(k in dict):
dict[k].append(i)
else:
dict[k]=[i]
else:
k+=1
if(k in dict):
dict[k].append(i)
else:
dict[k]=[i]
prev=i
output=[]
for i in range(0,len(l)):
single=l[i]
final=[]
for keys in dict: #{0: [2, 3, 4], 1: [8]}
ans=[]
desired_indices=dict[keys]
for j in range(0,len(desired_indices)):
ans.append(single[desired_indices[j]])
final.append(ans)
output.append(final)
print(output) #[[[2, 2, 2], [2]], [[3, 4, 5], [9]], [[1, 1, 1], [1]]]
This seems to be one of the approach, this first creates the dictionary of contagious elements and then looks for that keys in every list and stores in output.

Split lists and tuples in Python

I have a simple question.
I have list, or a tuple, and I want to split it into many lists (or tuples) that contain the same elements.
I'll try to be more clear using an example:
(1,1,2,2,3,3,4) --> (1,1),(2,2),(3,3),(4,)
(1,2,3,3,3,3) --> (1,),(2,),(3,3,3,3)
[2,2,3,3,2,3] --> [2,2],[3,3],[2],[3]
How can I do? I know that tuples and lists do not have the attribute "split" so i thought that i could turn them into strings before. This is what i tried:
def splitt(l)
x=str(l)
for i in range (len(x)-1):
if x[i]!=x[i+1]:
x.split()
return x
You can use groupby.
import itertools as it
[list(grp) if isinstance(t,list) else tuple(grp) for k, grp in it.groupby(t)]
Examples:
>>> t = (1,2,3,3,3,3)
[(1,), (2,), (3, 3, 3, 3)]
>>> t = [2,2,3,3,2,3]
[[2, 2], [3, 3], [2], [3]]
You also may try with for-loop:
def group_lt(list_or_tuple):
result = []
for x in list_or_tuple:
if not result or result[-1][0] != x:
result.append(type(list_or_tuple)([x]))
else:
result[-1] += type(list_or_tuple)([x])
return result
t = (1,1,2,2,3,3,4)
print(group_lt(t)) # [(1,1),(2,2),(3,3),(4,)]
l = [2,2,3,3,2,3]
print(group_lt(l)) # [[2,2],[3,3],[2],[3]]
Try this
from itertools import groupby
input_list = [1, 1, 2, 4, 6, 6, 7]
output = [list(g) for k, g in groupby(input_list)]

Unique list of lists

I have a nested list as an example:
lst_a = [[1,2,3,5], [1,2,3,7], [1,2,3,9], [1,2,6,8]]
I'm trying to check if the first 3 indices of a nested list element are the same as other.
I.e.
if [1,2,3] exists in other lists, remove all the other nested list elements that contain that. So that the nested list is unique.
I'm not sure the most pythonic way of doing this would be.
for i in range(0, len(lst_a)):
if lst[i][:3] == lst[i-1][:3]:
lst[i].pop()
Desired output:
lst_a = [[1,2,3,9], [1,2,6,8]]
If, as you said in comments, sublists that have the same first three elements are always next to each other (but the list is not necessarily sorted) you can use itertools.groupby to group those elements and then get the next from each of the groups.
>>> from itertools import groupby
>>> lst_a = [[1,2,3,5], [1,2,3,7], [1,2,3,9], [1,2,6,8]]
>>> [next(g) for k, g in groupby(lst_a, key=lambda x: x[:3])]
[[1, 2, 3, 5], [1, 2, 6, 8]]
Or use a list comprehension with enumerate and compare the current element with the last one:
>>> [x for i, x in enumerate(lst_a) if i == 0 or lst_a[i-1][:3] != x[:3]]
[[1, 2, 3, 5], [1, 2, 6, 8]]
This does not require any imports, but IMHO when using groupby it is much clearer what the code is supposed to do. Note, however, that unlike your method, both of those will create a new filtered list, instead of updating/deleting from the original list.
I think you are missing a loop For if you want to check all possibilities. I guess it should like :
for i in range(0, len(lst_a)):
for j in range(i, len(lst_a)):
if lst[i][:3] == lst[j][:3]:
lst[i].pop()
Deleting while going throught the list is maybe not the best idea you should delete unwanted elements at the end
Going with your approach, Find the below code:
lst=[lst_a[0]]
for li in lst_a[1:]:
if li[:3]!=lst[0][:3]:
lst.append(li)
print(lst)
Hope this helps!
You can use a dictionary to filter a list:
dct = {tuple(i[:3]): i for i in lst}
# {(1, 2, 3): [1, 2, 3, 9], (1, 2, 6): [1, 2, 6, 8]}
list(dct.values())
# [[1, 2, 3, 9], [1, 2, 6, 8]]

Intersection of subelements of multiple list of lists in dictionary

I have a number of lists of lists stored in a dictionary. I want to find the intersection of the sub-lists (i.e. intersection of dict[i][j]) for all keys of the dictionary. )
For example, if the dictionary stored sets of tuples instead, I could use the code:
set.intersection(*[index[key] for key in all_keys])
What is an efficient way to do this? One way I tried was to first convert each list of lists into a set of tuples and then taking the intersection of those, but this is rather clunky.
Example:
Suppose the dictionary of lists of lists is
dict = {}
dict['A'] = [[1, 'charlie'], [2, 'frankie']]
dict['B'] = [[1, 'charlie'], [2, 'chuck']]
dict['C'] = [[2, 'chuck'], [1, 'charlie']]
then I want to return
[1, 'charlie']
(maybe as a tuple, doesn't have to be list)
EDIT: I just found an okay way to do it, but it's not very 'pythonic'
def search(index, search_words):
rv = {tuple(t) for t in index[search_words[0]]}
for word in search_words:
rv = rv.intersection({tuple(t) for t in index[word]})
return rv
Let's call your dictionary of lists of lists d:
>>> d = {'A': [[1, 'charlie'], [2, 'frankie']], 'B': [[1, 'charlie'], [2, 'chuck']], 'C': [[2, 'chuck'], [1, 'charlie']]}
I am calling it d because dict is a builtin and we would prefer not to overwrite it.
Now, find the intersection:
>>> set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )
set([(1, 'charlie')])
How it works
set(tuple(x) for x in d[k])
For key k, this forms a set out of tuples of the elements in d[k]. Taking k='A' for example:
>>> k='A'; set(tuple(x) for x in d[k])
set([(2, 'frankie'), (1, 'charlie')])
[ set(tuple(x) for x in d[k]) for k in d ]
This makes a list of the sets from the step above. Thus:
>>> [ set(tuple(x) for x in d[k]) for k in d ]
[set([(2, 'frankie'), (1, 'charlie')]),
set([(2, 'chuck'), (1, 'charlie')]),
set([(2, 'chuck'), (1, 'charlie')])]
set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )
This gathers the intersection of the three sets as above:
>>>set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )
set([(1, 'charlie')])
[item for item in A if item in B+C]
Your usecase demands the use of reduce function. But, quoting the BDFL's take on reduce,
So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.
So, what you have got is already 'pythonic'.
I would write the program like this
>>> d = {'A': [[1, 'charlie'], [2, 'frankie']],
... 'B': [[1, 'charlie'], [2, 'chuck']],
... 'C': [[2, 'chuck'], [1, 'charlie']]}
>>> values = (value for value in d.values())
>>> result = {tuple(item) for item in next(values)}
>>> for item in values:
... result &= frozenset(tuple(items) for items in item)
>>> result
set([(1, 'charlie')])
from collections import defaultdict
d = {'A': [[1, 'charlie'], [2, 'frankie']], 'B': [[1, 'charlie'], [2, 'chuck']], 'C': [[2, 'chuck'], [1, 'charlie']]}
frequency = defaultdict(set)
for key, items in d.iteritems():
for pair in items:
frequency[tuple(pair)].add(key)
output = [
pair for pair, occurrances in frequency.iteritems() if len(d) == len(occurrances)
]
print output

How do I remove duplicate arrays in a list in Python

I have a list in Python filled with arrays.
([4,1,2],[1,2,3],[4,1,2])
How do I remove the duplicate array?
Very simple way to remove duplicates (if you're okay with converting to tuples/other hashable item) is to use a set as an intermediate element.
lst = ([4,1,2],[1,2,3],[4,1,2])
# convert to tuples
tupled_lst = set(map(tuple, lst))
lst = map(list, tupled_lst)
If you have to preserve order or don't want to convert to tuple, you can use a set to check if you've seen the item before and then iterate through, i.e.,
seen = set()
def unique_generator(lst)
for item in lst:
tupled = tuple(item)
if tupled not in seen:
seen.add(tupled)
yield item
lst = list(unique_generator(lst))
This isn't great python, but you can write this as a crazy list comprehension too :)
seen = set()
lst = [item for item in lst if not(tuple(item) in seen or seen.add(tuple(item)))]
If order matters:
>>> from collections import OrderedDict
>>> items = ([4,1,2],[1,2,3],[4,1,2])
>>> OrderedDict((tuple(x), x) for x in items).values()
[[4, 1, 2], [1, 2, 3]]
Else it is much simpler:
>>> set(map(tuple, items))
set([(4, 1, 2), (1, 2, 3)])
l = ([4,1,2],[1,2,3],[4,1,2])
uniq = []
for i in l:
if not i in uniq:
uniq.append(i)
print('l=%s' % str(l))
print('uniq=%s' % str(uniq))
which produces:
l=([4, 1, 2], [1, 2, 3], [4, 1, 2])
uniq=[[4, 1, 2], [1, 2, 3]]
Use sets to keep track of seen items, but as sets can only contain hashable items so you may have to convert the items of your tuple to some hashable value first( tuple in this case) .
Sets provide O(1) lookup, so overall complexity is going to be O(N)
This generator function will preserve the order:
def solve(lis):
seen = set()
for x in lis:
if tuple(x) not in seen:
yield x
seen.add(tuple(x))
>>> tuple( solve(([4,1,2],[1,2,3],[4,1,2])) )
([4, 1, 2], [1, 2, 3])
If the order doesn't matter then you can simply use set() here:
>>> lis = ([4,1,2],[1,2,3],[4,1,2]) # this contains mutable/unhashable items
>>> set( tuple(x) for x in lis) # apply tuple() to each item, to make them hashable
set([(4, 1, 2), (1, 2, 3)]) # sets don't preserve order
>>> lis = [1, 2, 2, 4, 1] #list with immutable/hashable items
>>> set(lis)
set([1, 2, 4])

Categories

Resources