Find difference between two values? - python

I have a random data set, and I was wondering if it is at all possible to find all sets of points where the difference between them is greater than some constant. It doesn't matter if the points aren't consecutive, as long as the difference between the corresponding value is greater than that constant.

You can (and probably should) use itertools.permutations; no nested loops are required.
E.g., to find the pairs of numbers between 10 and 15 (inclusive) whose difference is greater than 3:
from itertools import permutations
numbers = range(10, 16)
restriction = 3
filtered_numbers_pairs = []
for value, other_value in permutations(numbers, r=2):
    if value - other_value > restriction:
        filtered_numbers_pairs.append((value, other_value))
print(filtered_numbers_pairs)
gives us
[(14, 10), (15, 10), (15, 11)]
Or, if you need to store the indexes of the values, just add enumerate:
from itertools import permutations
numbers = range(10, 16)
restriction = 3
filtered_numbers_pairs = []
for (index, value), (other_index, other_value) in permutations(enumerate(numbers), r=2):
    if value - other_value > restriction:
        filtered_numbers_pairs.append((index, other_index))
print(filtered_numbers_pairs)
gives us
[(4, 0), (5, 0), (5, 1)]

Python supports sets:
>>> a = {1, 2, 3}
>>> type(a)
<type 'set'>
>>> b = {2, 4, 5}
>>> a-b # Finds all items in a, but not in b.
set([1, 3])
>>> b-a # Finds all items in b, but not in a.
set([4, 5])
>>> (a-b).union(b-a) # Finds the union of both differences.
set([1, 3, 4, 5])
See help(set) for documentation.
To apply this to your question, however, we will need an example of the data you have and the outcome you want. E.g., some normalization may be required, or maybe you aren't dealing with sets after all.
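As a small sketch of the same operations in Python 3 syntax: the union of both differences shown above can also be written with the `^` operator (symmetric difference). The sets `a` and `b` are the ones from the session above; the names `only_in_a`, `only_in_b` and `either_only` are introduced here purely for illustration.

```python
a = {1, 2, 3}
b = {2, 4, 5}

only_in_a = a - b      # items in a but not in b
only_in_b = b - a      # items in b but not in a
either_only = a ^ b    # symmetric difference, same as (a - b) | (b - a)

print(sorted(only_in_a), sorted(only_in_b), sorted(either_only))
```

Whether this helps with the original question depends on the data, since set difference compares membership rather than numeric distance.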

Use nested loops:
diff = []
for i, val1 in enumerate(dataset):
    for j, val2 in enumerate(dataset[i+1:], start=i+1):
        if abs(val1 - val2) > some_constant:
            diff.append((i, j))
The inner loop uses a slice of the array so we don't add both (i, j) and (j, i) to the result. Passing start=i+1 to enumerate makes j an index into the original dataset rather than into the slice.
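The same idea can be sketched without an explicit slice by letting itertools.combinations produce each unordered pair once. This is only one possible arrangement; the function name `index_pairs_exceeding` and its parameter names are hypothetical and not taken from the answers.

```python
from itertools import combinations

def index_pairs_exceeding(dataset, threshold):
    """Return (i, j) index pairs with i < j whose values differ by more than threshold."""
    return [
        (i, j)
        for (i, x), (j, y) in combinations(enumerate(dataset), 2)
        if abs(x - y) > threshold
    ]

print(index_pairs_exceeding([10, 11, 12, 13, 14, 15], 3))
```

Because combinations never yields a pair in both orders, no extra bookkeeping is needed to avoid duplicates.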

Yes it is possible.
It would be something like this:
sets = []
for item1 in dataset:
    for item2 in dataset:
        if abs(item1 - item2) > somevalue:
            sets.append((item1, item2))
You create a list, sets, which will hold the value pairs that have an absolute difference bigger than somevalue. Then you append a tuple containing the two values to sets. Note that each qualifying pair appears twice, once in each order.
EDIT: The list sets is a mutable object; if you need it to be immutable, this code won't work for you.

Related

Best way to find unique sublists of a given length that are present in a list?

I have built a function that finds all of the unique sublists, of length i, present in a given list.
For example if you have list=[0,1,1,0,1] and i=1, you just get [1,0]. If i=2, you get [[0,1],[1,1],[1,0]], but not [0,0] because while it is a possible combination of 1 and 0, it is not present in the given list. The code is listed below.
While the code works, I do not believe it is the most efficient. It relies on finding all possible sublists and testing for the presence of each one, which becomes impractical at i > 4 (for, say, a list length of 100). I was hoping I could get help in finding a more efficient method for computing this. I am fully aware that this is probably not a great way to do this, but with what little knowledge I have it's the first thing that I could come up with.
The code I have written:
import itertools
def present_sublists (l,of_length):
    """
    takes a given list of 1s and 0s and returns all the unique sublist of that
    string that are of a certain length
    """
    l_str=[str(int) for int in l] #converts entries in input to strings
    l_joined="".join(l_str) #joins input into one string, i.e. "101010"
    sublist_sets=set(list(itertools.combinations(l_joined,of_length)))
    #uses itertools to get all possible combinations of substrings, and then set
    #properties to remove duplicates
    pos_sublists=list(sublist_sets) #returns the set to a list
    sublists1=[]
    for entry in pos_sublists: #returns the entries to a list
        sublists1.append(list(entry))
    for entry in sublists1: #returns the "1"s and "0" to 1s and 0s
        for entry2 in entry:
            entry[entry.index(entry2)]=int(entry2)
    present_sublists=[]
    for entry in sublists1: #tests whether the possible sublist is
                            #present in the input list
        for x in range(len(l) - len(entry) + 1):
            if entry not in present_sublists:
                if l[x: x + len(entry)] == entry:
                    present_sublists.append(entry)
    output=present_sublists
    return output
Given your code and sample, it looks like you want all the unique contiguous sub-sequences of the given input. If so, you don't need to compute all combinations, convert back and forth between strings, lists and sets, or loop over the data multiple times; slice notation is more than enough to get the desired result:
>>> [0,1,2,3,4][0:2]
[0, 1]
>>> [0,1,2,3,4][1:3]
[1, 2]
>>> [0,1,2,3,4][2:4]
[2, 3]
>>> [0,1,2,3,4][3:5]
[3, 4]
>>>
Appropriate use of the slice indexes gets us all the contiguous sub-sequences of any given size (2 in the example).
To make this more automatic, we write an appropriate for loop:
>>> seq=[0,1,2,3,4]
>>> size=2
>>> for i in range(len(seq)-size+1):
        print(seq[i:i+size])
[0, 1]
[1, 2]
[2, 3]
[3, 4]
>>>
Now that we know how to get all the sub-sequences we care about, we focus on keeping only the unique ones. For that we use a set, but a list can't be stored in a set because it is not hashable, so we use a tuple instead (which is basically an immutable list). That is everything you need; let's put it all together:
>>> def sub_sequences(seq,size):
        """return a set with all the unique contiguous sub-sequences of the given size of the given input"""
        seq = tuple(seq) #make it into a tuple so it can be used in a set
        if size>len(seq) or size<0: #base/trivial case
            return set() #or raise an exception like ValueError
        return {seq[i:i+size] for i in range(len(seq)-size+1)} #a set comprehension version of the previous mentioned loop
>>> sub_sequences([0,1,2,3,4],2)
{(0, 1), (1, 2), (2, 3), (3, 4)}
>>>
>>> #now lets use your sample
>>>
>>> sub_sequences([0,1,1,0,1],2)
{(0, 1), (1, 0), (1, 1)}
>>> sub_sequences([0,1,1,0,1],3)
{(1, 0, 1), (1, 1, 0), (0, 1, 1)}
>>> sub_sequences([0,1,1,0,1],4)
{(1, 1, 0, 1), (0, 1, 1, 0)}
>>> sub_sequences([0,1,1,0,1],5)
{(0, 1, 1, 0, 1)}
>>>
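If the result should be a list of lists in the order each window first appears (as in the question's example output), one possible variation is sketched below. It assumes Python 3.7+, where dict keys keep insertion order; the name `unique_windows` is hypothetical.

```python
def unique_windows(seq, size):
    """Return the distinct contiguous windows of `size` items, in first-seen order."""
    if size < 0 or size > len(seq):
        return []
    windows = (tuple(seq[i:i + size]) for i in range(len(seq) - size + 1))
    # dict keys keep insertion order (Python 3.7+), so duplicates are dropped in place
    return [list(w) for w in dict.fromkeys(windows)]

print(unique_windows([0, 1, 1, 0, 1], 2))
```

Like the set-based version, this makes a single pass over the input, so the cost grows with the number of windows rather than with the number of possible combinations.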
Let's label the bits 0, 1, 2, 3, .....
Let's also define a function f(len, n) where f(len, n) is defined to be set of all the strings of length len that occur in the first n bits.
So
f(0, n) = {''} since you can always make the empty string
f(len, 0) = set() if len > 0
So what is the value of f(len, n) if len > 0 and n > 0? It contains everything in f(len, n - 1), plus it contains everything in f(len - 1, n - 1) with l[n-1] appended to it.
You now have everything you need to find f(of_length, len(l)) reasonably efficiently.
To stick to your function signature I would suggest something like:
Iterate through each sublist and put them into a set() to ensure uniqueness.
The sublists need to be converted to tuples, since lists cannot be hashed and therefore cannot be put into sets as they are.
Convert the resulting tuples in the set back to the required format.
When creating new lists, a list comprehension is usually the most concise and Pythonic choice.
>>> def present_sublists(l,of_length):
...     sublists = set([tuple(l[i:i+of_length]) for i in range(0,len(l)+1-of_length)])
...     return [list(sublist) for sublist in sublists]
...
>>> present_sublists([0,1,1,0,1], 1)
[[0], [1]]
>>> present_sublists([0,1,1,0,1], 2)
[[0, 1], [1, 0], [1, 1]]

Turn a List of Dictionaries Into a Set of Dictionaries

I have a list of dictionaries like the following:
a = [{1000976: 975},
{1000977: 976},
{1000978: 977},
{1000979: 978},
{1000980: 979},
{1000981: 980},
{1000982: 981},
{1000983: 982},
{1000984: 983},
{1000985: 984}]
I could be thinking about this wrong, but I'm comparing this list of dicts to another list of dicts and am attempting to remove elements (dictionaries) in one list that are in the other. In order to do this, I want to transform both into sets and perform set subtraction. However, I'm getting the following error when attempting to do the conversion.
set_a = set(a)
TypeError: unhashable type: 'dict'
Am I thinking about this incorrectly?
Try this:
>>> a = [{1000976: 975},
... {1000977: 976},
... {1000978: 977},
... {1000979: 978},
... {1000980: 979},
... {1000981: 980},
... {1000982: 981},
... {1000983: 982},
... {1000984: 983},
... {1000985: 984}]
>>> a.extend(a) # just to add some duplicates
>>> len(a)
20
>>> dict_set = set(frozenset(d.items()) for d in a)
>>> b = [dict(s) for s in dict_set]
>>> b
[{1000982: 981}, {1000983: 982}, {1000981: 980}, {1000985: 984}, {1000978: 977}, {1000980: 979}, {1000977: 976}, {1000976: 975}, {1000984: 983}, {1000979: 978}]
>>> len(b)
10
If you want to do set subtraction between two lists of dicts, then just use the same conversion to sets as above on both lists, do the subtraction, then convert back.
Note: At the very least, all values in your dicts should also be hashable (as well as the keys, but that goes without saying). If not, you need a similar transformation of the values into a hashable, immutable type of some kind.
Note: This also does not preserve the original order; if that's important to you, you need to adapt this to an order-preserving algorithm. The key, though, is converting dicts to some immutable type.
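A minimal sketch of that subtraction, written so that the order of the first list is kept. It assumes every value in the dicts is hashable; the function name `subtract_dict_lists` and the sample list `b` are made up for illustration.

```python
def subtract_dict_lists(a, b):
    """Return the dicts of `a` that have no equal dict in `b`, keeping a's order."""
    # frozenset(d.items()) is hashable as long as every value in d is hashable
    excluded = {frozenset(d.items()) for d in b}
    return [d for d in a if frozenset(d.items()) not in excluded]

a = [{1000976: 975}, {1000977: 976}, {1000978: 977}]
b = [{1000977: 976}]
print(subtract_dict_lists(a, b))
```

Unlike a pure set difference, this keeps duplicates that occur in `a`; wrap the result in a further de-duplication step if that is not wanted.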
You could turn the dictionaries into tuples, since each dictionary holds only a single key-value pair, like so:
a_set = set(t for d in a for t in d.items())
And then use set operations to compare two sets from that point. To convert back into a list of dictionaries, you can use:
a_list = [{key: value} for key, value in a_set]
For filtering there's a one-liner. (b is the filter list of dicts). This is by far and away the fastest approach, unless you are using the same filter against multiple sets.
c = [d for d in a if d not in b]
Or using the built in filter: another one-liner (slower):
c = list(filter(lambda i: i not in b, a))
If you are really asking how to convert a list of dicts into a set-operable variable, then you can do this with yet another one-liner:
a_set = set(map(lambda i: frozenset(i.items()), a))
again, if we have 'b' as a list of dicts as our filter
b_set = set(map(lambda i: frozenset(i.items()), b))
... and we can now use set operations on them:
c_set = a_set - b_set
The 'frozenset' method of converting a dict to a set is about 25% faster than using a list comprehension; but it's much slower to convert everything to sets and then perform the set operations than it is just to use a simple list comprehension filter such as the one at the top of my answer. Obviously, if one is going to do many filters, it may be cost effective to convert the objects to immutables; but in that case, it may be better to change the underlying data structure of the objects, and convert the entire structure to a class.
If you don't want to use frozenset and your dicts are arbitrary, rather than single-entry dicts, you can turn the dicts into tuples of (key, value) tuples:
a_set = set(map(lambda j: tuple(map(lambda i: tuple((i, j[i])), j)), a))
You suggest in the question that you don't want ANY nested loop, and so far all the answers (including mine) have a 'for' (or a lambda).
When we want to use a set method for filtering two dictionaries, it's not too shabby to do exactly that as follows:
c = a.items() - b.items()
of course if we want c to be a dict, we need to wrap it again:
c = dict(a.items() - b.items())
Likewise, for lists of immutable types, we can do the same (by coercing our lists into sets):
x = [3, 4, 5, 6, 7]
y = [3, 2, 1, 7]
z = set(x) - set(y)
or (tuples are immutable)
x = [(3, 1), (4, 1), (5, 1), (6, 2), (7, 5)]
y = [(4, 1), (4, 2), (5, 1)]
z = set(x) - set(y)
but (mutable) lists fail (as do your dicts):
x = [[3, 1], [4, 1], [5, 1], [6, 2], [7, 5]]
y = [[4, 1], [4, 2], [5, 1]]
z = set(x) - set(y)
TypeError: unhashable type: 'list'
This is because lists and dicts are mutable, so Python does not make them hashable, and a set can only hold hashable items. One can handle it by creating a class, but then that is not using a list of dicts anymore, and your 'for' is just being buried in a class method.
So you will need a nested loop somewhere, even if it is hidden by a lambda or a function.

How do you convert a Dictionary to a List?

For example, if the Dictionary is {0:0, 1:0, 2:0} making a list: [0, 0, 0].
If this isn't possible, how do you take the minimum of a dictionary, meaning the dictionary: {0:3, 1:2, 2:1} returning 1?
Converting a dictionary to a list is pretty simple; you have three flavors for that: .keys(), .values() and .items()
>>> test = {1:30,2:20,3:10}
>>> test.keys() # you get the same result with list(test)
[1, 2, 3]
>>> test.values()
[30, 20, 10]
>>> test.items()
[(1, 30), (2, 20), (3, 10)]
>>>
(in python 3 you would need to call list on those)
finding the maximum or minimum is also easy with the min or max function
>>> min(test.keys()) # is the same as min(test)
1
>>> min(test.values())
10
>>> min(test.items())
(1, 30)
>>> max(test.keys()) # is the same as max(test)
3
>>> max(test.values())
30
>>> max(test.items())
(3, 10)
>>>
(in python 2, to be efficient, use the .iter* versions of those instead )
The most interesting case is finding the key of the min/max value, and min/max have that covered too
>>> max(test.items(),key=lambda x: x[-1])
(1, 30)
>>> min(test.items(),key=lambda x: x[-1])
(3, 10)
>>>
Here you need a key function: a function that takes one of the items you pass to the main function and returns the part of it (or a value derived from it) that should be used for the comparison.
lambda is a way to define anonymous functions, which saves you from having to do this
>>> def last(x):
        return x[-1]
>>> min(test.items(),key=last)
(3, 10)
>>>
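If only the key is wanted rather than the (key, value) pair, a common shorthand is to iterate over the dictionary itself and use its `get` method as the key function. The sketch below reuses the `test` dictionary from above; `key_of_min` and `key_of_max` are illustrative names.

```python
test = {1: 30, 2: 20, 3: 10}

# test.get maps each key to its value, so min/max compare the keys by their values
key_of_min = min(test, key=test.get)
key_of_max = max(test, key=test.get)

print(key_of_min, key_of_max)
```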
You can simply take the minimum with:
min(dic.values())
And convert it to a list with:
list(dic.values())
Before Python 3.7 a dictionary is unordered, so the order of elements in the resulting list is undefined; from Python 3.7 on, the list follows insertion order.
In Python 2.7 you do not need to call list(..); dic.values() alone is sufficient:
dic.values()
>>> a = {0:0, 1:2, 2:4}
>>> a.keys()
[0, 1, 2]
>>> a.values()
[0, 2, 4]
Here is my one-liner solution for a flattened list of keys and values:
d = {'foo': 'bar', 'zoo': 'bee'}
list(sum(d.items(), tuple()))
And the result:
['foo', 'bar', 'zoo', 'bee']
A dictionary is defined as the following:
dict{[Any]:[Any]} = {[Key]:[Value]}
The problem with your question is that you haven't clarified what the keys are.
1: Assuming the keys are just numbers and in ascending order without gaps, dict.values() will suffice, as other authors have already pointed out.
2: Assuming the keys are consecutive numbers starting at 0, but not stored in ascending order:
i = 0
values = []
while i <= max(mydict.keys()):
    values.append(mydict[i])
    i += 1
3: Assuming the keys are just numbers but not in strictly ascending order:
There is still a way, but you have to get the keys first and then loop up to the maximum of the keys using a try-except block.
4: If none of these is the case, maybe a dict is not what you are looking for and a 2D or 3D array would suffice? This also applies if one of the solutions does work. A dict seems to be a bad choice for what you are doing.
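Case 3 above is only described in words, so here is one way it might look. This is a sketch under the assumption that the keys are non-negative integers and that a placeholder should stand in for missing keys; the function name `dict_to_dense_list` and the `fill` parameter are hypothetical.

```python
def dict_to_dense_list(mydict, fill=None):
    """Build a list indexed 0..max(key), using `fill` where a key is missing."""
    if not mydict:
        return []
    result = []
    for i in range(max(mydict) + 1):
        try:
            result.append(mydict[i])
        except KeyError:
            result.append(fill)  # gap in the keys
    return result

print(dict_to_dense_list({0: 3, 2: 1, 5: 2}))
```

The same loop could be written with `mydict.get(i, fill)` instead of the try-except block; the explicit form is kept here because that is what the answer describes.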

Sort a list then give the indexes of the elements in their original order

I have an array of n numbers, say [1,4,6,2,3]. The sorted array is [1,2,3,4,6], and the indexes of these numbers in the old array are 0, 3, 4, 1, and 2. What is the best way, given an array of n numbers, to find this array of indexes?
My idea is to run order statistics for each element. However, since I have to rewrite this function many times (in contest), I'm wondering if there's a short way to do this.
>>> a = [1,4,6,2,3]
>>> [b[0] for b in sorted(enumerate(a),key=lambda i:i[1])]
[0, 3, 4, 1, 2]
Explanation:
enumerate(a) returns an enumeration over tuples consisting of the indexes and values in the original list: [(0, 1), (1, 4), (2, 6), (3, 2), (4, 3)]
Then sorted with a key of lambda i:i[1] sorts based on the original values (item 1 of each tuple).
Finally, the list comprehension [b[0] for b in ...] returns the original indexes (item 0 of each tuple).
Using numpy arrays instead of lists may be beneficial if you are doing a lot of statistics on the data. If you choose to do so, this would work:
import numpy as np
a = np.array( [1,4,6,2,3] )
b = np.argsort( a )
argsort() can operate on lists as well, but I believe that in this case it simply copies the data into an array first.
Here is another way:
>>> sorted(xrange(len(a)), key=lambda ix: a[ix])
[0, 3, 4, 1, 2]
This approach sorts not the original list, but its indices (created with xrange), using the original list as the sort keys.
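In Python 3, xrange no longer exists and range plays the same role. A sketch of the same index-sorting idea wrapped in a small helper follows; the helper name `argsort` is borrowed from NumPy for familiarity and is otherwise an assumption.

```python
def argsort(values):
    """Return the indices that would sort `values` (stable for equal items)."""
    return sorted(range(len(values)), key=values.__getitem__)

a = [1, 4, 6, 2, 3]
order = argsort(a)
print(order)
print([a[i] for i in order])
```

Using `values.__getitem__` as the key avoids the lambda but behaves the same way for lists and tuples.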
This should do the trick:
from operator import itemgetter
indices = list(zip(*sorted(enumerate(my_list), key=itemgetter(1))))[0]  # list() is needed in Python 3, where zip returns an iterator
Here is the long way, without a list comprehension, for beginners like me:
a = [1,4,6,2,3]
b = enumerate(a)
c = sorted(b, key = lambda i:i[1])
d = []
for e in c:
    d.append(e[0])
print(d)

Accessing grouped items in arrays

I'm new to Python and have a list of numbers. e.g.
5,10,32,35,64,76,23,53....
and I've grouped them into fours (5,10,32,35, 64,76,23,53 etc..) using the code from this post.
def group_iter(iterator, n=2, strict=False):
    """ Transforms a sequence of values into a sequence of n-tuples.
    e.g. [1, 2, 3, 4, ...] => [(1, 2), (3, 4), ...] (when n == 2)
    If strict, then it will raise ValueError if there is a group of fewer
    than n items at the end of the sequence. """
    accumulator = []
    for item in iterator:
        accumulator.append(item)
        if len(accumulator) == n: # tested as fast as separate counter
            yield tuple(accumulator)
            accumulator = [] # tested faster than accumulator[:] = []
            # and tested as fast as re-using one list object
    if strict and len(accumulator) != 0:
        raise ValueError("Leftover values")
How can I access the individual groups so that I can perform functions on them? For example, I'd like to get the average of the first values of every group (e.g. 5 and 64 in my example numbers).
Let's say you have the following tuple of tuples:
a=((5,10,32,35), (64,76,23,53))
To access the first element of each tuple, use a for-loop:
for i in a:
    print(i[0])
To calculate average for the first values:
elements=[i[0] for i in a]
avg=sum(elements)/float(len(elements))
Ok, this is yielding a tuple of four numbers each time it's iterated. So, convert the whole thing to a list:
L = list(group_iter(your_list, n=4))
Then you'll have a list of tuples:
>>> L
[(5, 10, 32, 35), (64, 76, 23, 53), ...]
You can get the first item in each tuple this way:
firsts = [tup[0] for tup in L]
(There are other ways, of course.)
You've created a tuple of tuples, or a list of tuples, or a list of lists, or a tuple of lists, or whatever...
You can access any element of any nested list directly:
toplist[x][y] # yields the yth element of the xth nested list
You can also access the nested structures by iterating over the top structure:
for sublist in lists:
    print(sublist[y])
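To average the values at each position across all groups at once, one option is to transpose the groups with zip. The sketch below assumes the groups all have the same length, as they do when produced by group_iter with n=4; the name `column_means` is hypothetical.

```python
def column_means(groups):
    """Average the values that share a position across equally sized groups."""
    # zip(*groups) transposes: first items together, second items together, ...
    return [sum(column) / float(len(column)) for column in zip(*groups)]

groups = [(5, 10, 32, 35), (64, 76, 23, 53)]
print(column_means(groups))
```

The first entry of the result is the average of the first values of every group, which is what the question asks for.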
Might be overkill for your application but you should check out my library, pandas. Stuff like this is pretty simple with the GroupBy functionality:
http://pandas.sourceforge.net/groupby.html
To do the 4-at-a-time thing you would need to compute a bucketing array:
import numpy as np
bucket_size = 4
n = len(your_list)
buckets = np.arange(n) // bucket_size
Then it's as simple as:
data.groupby(buckets).mean()
