Find values that occur in multiple lists and their frequency

I have a list of lists. Each list has string values in it.
The same value often appears in several different lists. I want to find the values that occur in the lists more than k times.
For example, in the data below, 127-0-0-1-59928 appears 3 times and 3-7-3-final-0 appears 4 times, and other values repeat in a similar way.
[['127-0-0-1-59924'],
['127-0-0-1-59922'],
['127-0-0-1-59926'],
['127-0-0-1-59926', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
['127-0-0-1-59928'],
['127-0-0-1-59928', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
['127-0-0-1-59928'],
['127-0-0-1-59926'],
['127-0-0-1-34426'],
['127-0-0-1-34426', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
['127-0-0-1-34428'],
['127-0-0-1-34428', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
['127-0-0-1-34428'],
['127-0-0-1-34426']]
Is there an efficient way to calculate the frequency of each value, and/or to find the values that occur in multiple lists more often than some threshold k?
Thanks a lot for the help!

You could just create a collections.Counter with the elements of all the lists:
import collections
lst = [['127-0-0-1-59924'], ...]
counts = collections.Counter(c for l in lst for c in l)
print(counts.most_common())
# [('3-8-0', 4), ('4-15-0-76', 4), ('3-7-3-final-0', 4), ('127-0-0-1-59926', 3), ('127-0-0-1-59928', 3), ('127-0-0-1-34426', 3), ('127-0-0-1-34428', 3), ('127-0-0-1-59924', 1), ('127-0-0-1-59922', 1)]
Note that these are the accumulated counts over all the lists, so if an element appears twice in the same list, that counts as two occurrences.
If, instead, you do not want to count multiple occurrences within the same list, but only the number of different lists each element appears in, you can do the same thing but convert the sublists to sets first (the result is the same in this case):
counts = collections.Counter(c for l in lst for c in set(l))
Neither of those methods considers the position of the element in the list, in case that's a concern.
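Neither snippet applies the threshold k from the question. Below is a minimal sketch of that last step, assuming "above the threshold" means a count strictly greater than k. The helper name values_above and its distinct_per_list flag are hypothetical, chosen only to illustrate switching between the two counting variants above:

```python
import collections

# Sample data from the question.
lst = [['127-0-0-1-59924'],
       ['127-0-0-1-59922'],
       ['127-0-0-1-59926'],
       ['127-0-0-1-59926', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
       ['127-0-0-1-59928'],
       ['127-0-0-1-59928', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
       ['127-0-0-1-59928'],
       ['127-0-0-1-59926'],
       ['127-0-0-1-34426'],
       ['127-0-0-1-34426', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
       ['127-0-0-1-34428'],
       ['127-0-0-1-34428', '3-8-0', '4-15-0-76', '3-7-3-final-0'],
       ['127-0-0-1-34428'],
       ['127-0-0-1-34426']]


def values_above(lists, k, distinct_per_list=True):
    """Hypothetical helper: return {value: count} for values counted more than k times.

    With distinct_per_list=True each sublist contributes at most one
    occurrence per value (the set(l) variant); otherwise every occurrence counts.
    """
    if distinct_per_list:
        counts = collections.Counter(v for sub in lists for v in set(sub))
    else:
        counts = collections.Counter(v for sub in lists for v in sub)
    # Keep only the values whose count exceeds the threshold.
    return {v: c for v, c in counts.items() if c > k}


print(values_above(lst, 3))
```

With k = 3 only '3-8-0', '4-15-0-76' and '3-7-3-final-0' (4 occurrences each) pass; use c >= k instead if the threshold should be inclusive.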

Related

Find number of array pairs (a,b) where abs(a-b) = k in O(n)

What is the number of pairs in an array where the difference of the two elements is a target number k? Brute force is trivial: start from each index and check all later indices. I need to do it in O(n). I know it's possible using a hash table, but I can't get my head around it.
Example:
[2,5,4,1,7,4], k=3
Here there are 5 pairs: (2,5), (4,1), (4,7), (1,4), (7,4).
I see this as a simpler version of LeetCode's "Subarray Sum Equals K". I imagine the LeetCode problem has the same solution, except that it is applied to the running-sum array.

You can build a dictionary that maps each value n-k to the list of numbers n that produce it. This has O(n) complexity. Then run through the numbers and pair each one with the list of matching numbers from the dictionary (also O(n), apart from the caveat below):
numbers = [2,5,4,1,7,4]
k=3
numSet = dict()
for n in numbers: numSet.setdefault(n-k,[]).append(n) # O(n)
pairs = [(n,m) for n in numbers for m in numSet.get(n,[])] # O(n + number of pairs)
print(pairs)
# [(2, 5), (4, 7), (1, 4), (1, 4), (4, 7)]
Because numbers can repeat in the list, the actual complexity of the second step is O(n + p), where p is the number of pairs produced, and p can exceed n (e.g. [3,6,3,6,3,6] with k=3 produces 9 pairs). Listing more than n pairs in O(n) time is impossible, so the O(n) bound only holds when the number of pairs stays within O(n); presumably the input data is designed to avoid these extreme cases.
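If only the number of pairs is needed (which is what the question asks for), the O(n) bound can be kept even with repeated values by counting instead of listing. This is a sketch of a swapped-in counting technique, not the answer's code: count each value with collections.Counter, then multiply the count of x by the count of x + k. The function name count_pairs_with_diff is hypothetical.

```python
from collections import Counter


def count_pairs_with_diff(numbers, k):
    """Hypothetical helper: number of index pairs i < j with abs(a[i] - a[j]) == k."""
    k = abs(k)
    cnt = Counter(numbers)  # value -> how many times it occurs, built in O(n)
    if k == 0:
        # Pairs of equal values: choose 2 of the c copies of each value.
        return sum(c * (c - 1) // 2 for c in cnt.values())
    # Each qualifying pair is counted once, from its smaller value x (partner x + k).
    return sum(c * cnt.get(x + k, 0) for x, c in cnt.items())


print(count_pairs_with_diff([2, 5, 4, 1, 7, 4], 3))
print(count_pairs_with_diff([3, 6, 3, 6, 3, 6], 3))
```

For the two examples above this gives 5 and 9 without materialising any pairs, so the work stays proportional to the number of distinct values.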

Python, Make variable equal to the second column of an array

I realise that there's a fair chance this has been asked somewhere else, but to be honest I'm not sure exactly what terminology I should be using to search for it.
But basically I've got a list with a varying number of elements. Each element contains 3 values: a string, another list, and an integer, e.g. the first element is
('A', [], 0)
so
ListofElements[0] = ('A', [], 0)
What I am trying to do is make a new list that consists of all of the integers (the 3rd item in each element) in ListofElements.
I can already do this by stepping through each element of ListofElements and appending its integer to a new list, as shown here:
NewList = []
for element in ListofElements:
    NewList.append(element[2])
Using a for loop seems like the most basic way of doing it; is there a way that uses less code? Maybe a list comprehension or something similar. It seems like something that should be doable on a single line.
That is just a step towards my ultimate goal, which is to find the index of the element in ListofElements that has the minimum integer value. So my process so far is to make a new list, and then find the index of its minimum using:
index=NewList.index(min(NewList))
Is there a way to avoid making the new list entirely and get the index straight from the original ListofElements? I got stuck on what I would need to fill in here, or how I would iterate through it:
min(ListofElements[?][2])

You can use a list comprehension:
[x[2] for x in ListofElements]
This is generally considered a "Pythonic" approach.
You can also find the minimum in a rather stylish manner using:
minimum = min(ListofElements, key=lambda x: x[2])
index = ListofElements.index(minimum)
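The two-step version scans the list twice (once in min, once in index). A minimal sketch that gets the index in a single pass, by taking the minimum over the indices (or over enumerate pairs) with a key on the third item; the sample ListofElements below is made-up data in the question's (string, list, integer) shape:

```python
# Made-up sample data in the question's (string, list, integer) shape.
ListofElements = [('A', [], 5), ('B', [1], 2), ('C', [], 7), ('D', [], 2)]

# Minimum over the indices, comparing on the integer stored at position 2.
index = min(range(len(ListofElements)), key=lambda i: ListofElements[i][2])

# Same idea with enumerate, which also hands back the winning element.
index2, best = min(enumerate(ListofElements), key=lambda pair: pair[1][2])

print(index, best)
```

On ties, min keeps the first minimal item it sees, so both forms return index 1 here, the same position ListofElements.index(minimum) would give.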
Some general notes:
In Python, the convention (PEP 8) is lowercase names with underscores for variables rather than CamelCase.
In Python you can often avoid an explicit for loop; prefer comprehensions or functional tools such as map and filter where they read more clearly.

You can map your list with itemgetter:
>>> from operator import itemgetter
>>> l = [(1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3), (1, 2, 3)]
>>> list(map(itemgetter(2), l))
[3, 3, 3, 3, 3]
Then you can use your approach to find the position of the minimum value.

Removing similar, but not identical, lists from a list of lists in python

I am identifying loops in directional graphs. My function returns a list of lists which store the nodes in any loops found.
For instance in a graph where the nodes are connected like this:
(1,2)(2,3)(3,4)(3,5)(5,2)
a loop is found at 2 - 3 - 5 so the function would return:
[[2,3,5]]
There are occasions where there are multiple loops which would return something like:
[[2,3,4],[6,7,8,9]]
This is great, but if there are multiple start points in a graph which join the same loop at different points, such as in the graph:
(1,2)(2,3)(3,4)(3,5)(5,2)(6,3)
both nodes 1 and 6 join the same loop at different points which would return:
[[2,3,5],[3,5,2]]
So here there are two identical loops, which are not identical lists. I want to identify such duplication and remove all but one (it doesn't matter which).
Note, there may be cases where there are multiple loops, one which is duplicated, such as:
[[2,3,5],[3,5,2],[7,8,9,6]]
I've tried looking into itertools:
loops.sort()
list(loops for loops,_ in itertools.groupby(loops))
but that hasn't helped, and I'm not 100% sure that this is appropriate anyway. Any ideas? I'm on Python 2.4. Thanks for any help.

If you only care about the elements of each loop, and not the order, I would canonicalize each loop by sorting it, and then take the set:
>>> loops = [[2,3,5],[3,5,2],[7,8,9,6]]
>>> set(tuple(sorted(loop)) for loop in loops)
set([(2, 3, 5), (6, 7, 8, 9)])
In order to use set here you need to convert each loop to a tuple, since lists are not hashable. You could convert the tuples back to lists, or turn the final set back into a list (maybe even using sorted to get a canonical order), but whether you actually need to depends on what you do with it next.
If you need to preserve path order, I'd canonicalize in a different way:
def rotated(l, n):
    return l[n:] + l[:n]

def canonicalize(l):
    # rotate the loop so that its smallest node comes first
    m = min(l)
    where = l.index(m)
    return rotated(l, where)
and then
>>> loops = [[2,5,3], [5,3,2], [7,8,6,9]]
>>> set(tuple(canonicalize(loop)) for loop in loops)
set([(2, 5, 3), (6, 9, 7, 8)])
[Edit: note that this simple canonicalization only works if each vertex can only be visited once in a path.]

First you need to define what "similar" means, because it is a stronger condition than set equality:
def is_similar(X, Y):
    # True if Y equals X rotated by 1 to n-1 positions (same cyclic order)
    n = len(X)
    return len(Y) == n and any(all(X[i] == Y[(i + j) % n]
                                   for i in range(n))
                               for j in range(1, n))  # the 1 here so that identical lists are not similar
The distinction is important because the path (1,2,3,4) is different from the path (1,3,2,4); they do not correspond to the same loop.
def remove_similars(L):
    # keep an item only if it is not a rotation of one already kept
    new_L = []
    for item in L:
        if not any(is_similar(item, l) for l in new_L):
            new_L.append(item)
    return new_L

You could take a set of each of your lists. If two sets are equal, then you have a duplicated loop. You lose the order of the nodes in the loop, though; does that matter to you?
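A minimal sketch of that set-based idea. It keeps the first loop seen for each node set, so the kept loop's node order survives, and drops later ones; frozenset is used because a plain set cannot be stored inside another set. The function name dedupe_by_node_set is hypothetical. As noted above, two genuinely different loops over the same nodes, such as [1,2,3,4] and [1,3,2,4], would be treated as duplicates.

```python
def dedupe_by_node_set(loops):
    """Hypothetical helper: keep the first loop for each distinct set of nodes."""
    seen = set()
    unique = []
    for loop in loops:
        key = frozenset(loop)  # order-insensitive, hashable key for the loop
        if key not in seen:
            seen.add(key)
            unique.append(loop)
    return unique


print(dedupe_by_node_set([[2, 3, 5], [3, 5, 2], [7, 8, 9, 6]]))
```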

Iterate through tuple values python

I have a list like
[(1, 3), (6, 7)]
and a string
'AABBCCDD'
I need to get the result AABCD.
I know I can get the integers from the tuple with nameOfTuple[0][0] yielding 1.
I also know that I can get the chars from the string with nameOfString[0] yielding A.
My question is, how do I iterate over the two values in each tuple, in order to save the integers (to a list maybe) and then get the chars from the string?

In [1]: l = [(1, 3), (6, 7)]
In [2]: s = 'AABBCCDD'
In [3]: ''.join(s[start-1:end] for (start,end) in l)
Out[3]: 'AABCD'
Here, pairs of indices from l are assigned to start and end, one pair at a time. Because the positions in l are 1-based and inclusive, the relevant portion of the string is s[start-1:end]; this yields a sequence of strings, which join() then merges into one.
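The question also asks how to save the integers to a list first. A minimal sketch of the same slicing written as an explicit loop, unpacking each tuple into start and end; the names indices, pieces and result are illustrative only:

```python
l = [(1, 3), (6, 7)]
s = 'AABBCCDD'

indices = []  # the integers from the tuples, saved as the question wanted
pieces = []   # the matching substrings
for start, end in l:  # tuple unpacking: one (start, end) pair per iteration
    indices.extend([start, end])
    pieces.append(s[start - 1:end])  # 1-based inclusive positions -> Python slice

result = ''.join(pieces)
print(indices, result)
```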

Accessing grouped items in arrays

I'm new to Python and have a list of numbers, e.g.
5,10,32,35,64,76,23,53....
and I've grouped them into fours ((5,10,32,35), (64,76,23,53), etc.) using the code from this post.
def group_iter(iterator, n=2, strict=False):
    """ Transforms a sequence of values into a sequence of n-tuples.
    e.g. [1, 2, 3, 4, ...] => [(1, 2), (3, 4), ...] (when n == 2)
    If strict, then it will raise ValueError if there is a group of fewer
    than n items at the end of the sequence. """
    accumulator = []
    for item in iterator:
        accumulator.append(item)
        if len(accumulator) == n:  # tested as fast as separate counter
            yield tuple(accumulator)
            accumulator = []  # tested faster than accumulator[:] = []
            # and tested as fast as re-using one list object
    if strict and len(accumulator) != 0:
        raise ValueError("Leftover values")
How can I access the individual groups so that I can apply functions to them? For example, I'd like to get the average of the first values of every group (e.g. 5 and 64 in my example numbers).

Let's say you have the following tuple of tuples:
a=((5,10,32,35), (64,76,23,53))
To access the first element of each tuple, use a for-loop:
for i in a:
    print(i[0])
To calculate average for the first values:
elements=[i[0] for i in a]
avg=sum(elements)/float(len(elements))

Ok, this is yielding a tuple of four numbers each time it's iterated. So, convert the whole thing to a list:
L = list(group_iter(your_list, n=4))
Then you'll have a list of tuples:
>>> L
[(5, 10, 32, 35), (64, 76, 23, 53), ...]
You can get the first item in each tuple this way:
firsts = [tup[0] for tup in L]
(There are other ways, of course.)
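One of those other ways, sketched under the assumption that every group is complete (zip stops at the shortest group, so a short final group would truncate the columns): transpose the groups with zip(*...) so that each column holds the n-th value of every group, then average a column. The slicing used here to build the groups is a stand-in for group_iter, chosen only to keep the sketch self-contained.

```python
your_list = [5, 10, 32, 35, 64, 76, 23, 53]

# Stand-in for list(group_iter(your_list, n=4)): consecutive 4-tuples via slicing.
L = [tuple(your_list[i:i + 4]) for i in range(0, len(your_list), 4)]

# zip(*L) transposes: columns[0] holds the first value of every group, and so on.
columns = list(zip(*L))

firsts = columns[0]
average_first = sum(firsts) / float(len(firsts))
column_means = [sum(col) / float(len(col)) for col in columns]
print(firsts, average_first, column_means)
```

For the example numbers the first column is (5, 64), whose average is 34.5.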

You've created a tuple of tuples, or a list of tuples, or a list of lists, or a tuple of lists, or whatever...
You can access any element of any nested list directly:
toplist[x][y] # yields the yth element of the xth nested list
You can also access the nested structures by iterating over the top structure:
for sublist in toplist:
    print(sublist[y])

Might be overkill for your application, but you should check out my library, pandas. Stuff like this is pretty simple with the GroupBy functionality:
http://pandas.sourceforge.net/groupby.html
To do the 4-at-a-time thing you would need to compute a bucketing array:
import numpy as np
import pandas as pd
data = pd.Series(your_list)
bucket_size = 4
n = len(your_list)
buckets = np.arange(n) // bucket_size  # 0,0,0,0,1,1,1,1,...
Then it's as simple as:
data.groupby(buckets).mean()  # mean of each bucket; .first() instead gives each bucket's first value
