Sorting tuples with dynamic values and length - python

I am trying to sort numerical tuples by two criteria:
first criterion, the length of the tuple: the shorter the tuple, the better;
second criterion, the n-th value of the tuple: if two tuples have the same length, they should be ordered by the first numerical value at which they differ.
For example, let's say we have these four tuples:
a = (2, 5) # len(a) == 2
b = (2, 5, 3) # len(b) == 3
c = (2, 4, 3) # len(c) == 3
d = (1, 4, 4, 8) # len(d) == 4
What I want is a function that will help me sort these tuples so that:
a is the first tuple (the shortest one)
b and c follow (the middle ones)
d is the last tuple (the longest one)
Since b and c have the same length, they should be ordered so that c comes before b: their first values are the same, but c's second value is smaller than b's.
Therefore, the four tuples above should be listed as [a, c, b, d].
The question is: how do I do it, given that the tuples have no fixed length and may differ at any position from the first value to the last?

You can sort tuples with... tuples:
res = sorted([a, b, c, d], key=lambda x: (len(x), x))
# [(2, 5), (2, 4, 3), (2, 5, 3), (1, 4, 4, 8)]
The important part is the key argument, here an anonymous (lambda) function that returns a tuple for each item. Python compares tuples element by element, so the first element, len(x), gives priority to the length of the tuple. The second element, x, breaks ties by comparing the tuples themselves, which is again done element-wise.
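As a quick check of the element-wise comparison described above, here is a minimal sketch using the four tuples from the question. The helper name sort_by_length_then_value is made up for illustration and is not part of the original answer.

```python
def sort_by_length_then_value(tuples):
    """Sort shorter tuples first; break ties by element-wise comparison."""
    # Hypothetical helper name, used only for this illustration.
    return sorted(tuples, key=lambda t: (len(t), t))

a = (2, 5)
b = (2, 5, 3)
c = (2, 4, 3)
d = (1, 4, 4, 8)

# The input order is deliberately shuffled.
result = sort_by_length_then_value([d, b, a, c])
print(result)  # [(2, 5), (2, 4, 3), (2, 5, 3), (1, 4, 4, 8)]

# The keys are ordinary tuples, so they compare left to right:
# the lengths tie (3 == 3), then 4 < 5 decides.
keys_in_order = (3, (2, 4, 3)) < (3, (2, 5, 3))
print(keys_in_order)  # True
```

Because the key is built per item, the same function should work for tuples of any length.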

Related

Find difference between two values?

I have a random data set, and I was wondering if it is at all possible to find all sets of points where the difference between them is greater than some constant. It doesn't matter if the points aren't consecutive, as long as the difference between the corresponding value is greater than that constant.
You can (and probably should) use itertools.permutations, no nested loops required.
E.g., if we want to find pairs of numbers between 10 and 15 (inclusive) whose difference is greater than 3:
from itertools import permutations
numbers = range(10, 16)
restriction = 3
filtered_numbers_pairs = []
for value, other_value in permutations(numbers, r=2):
    if value - other_value > restriction:
        filtered_numbers_pairs.append((value, other_value))
print(filtered_numbers_pairs)
gives us
[(14, 10), (15, 10), (15, 11)]
or, if you need to store the indexes of the values, just add enumerate:
from itertools import permutations
numbers = range(10, 16)
restriction = 3
filtered_numbers_pairs = []
for (index, value), (other_index, other_value) in permutations(enumerate(numbers), r=2):
    if value - other_value > restriction:
        filtered_numbers_pairs.append((index, other_index))
print(filtered_numbers_pairs)
gives us
[(4, 0), (5, 0), (5, 1)]
Python supports sets:
>>> a = {1, 2, 3}
>>> type(a)
<type 'set'>
>>> b = {2, 4, 5}
>>> a-b # Finds all items in a, but not in b.
set([1, 3])
>>> b-a # Finds all items in b, but not in a.
set([4, 5])
>>> (a-b).union(b-a) # Finds the union of both differences.
set([1, 3, 4, 5])
See help(set) for documentation.
To apply this to your question, however, we will need an example of the data you have and the outcome you want. E.g., some normalization may be required, or maybe you aren't dealing with sets after all.
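As a side note on the last expression above, the union of both differences is what Python calls the symmetric difference, and sets have a dedicated operator for it. A small sketch in Python 3 syntax, reusing the same a and b:

```python
a = {1, 2, 3}
b = {2, 4, 5}

# Items that are in exactly one of the two sets.
only_in_one = a ^ b  # same as a.symmetric_difference(b)

# It matches the union of the two one-sided differences.
same_as_union = only_in_one == (a - b) | (b - a)
print(same_as_union)        # True
print(sorted(only_in_one))  # [1, 3, 4, 5]
```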
Use nested loops:
diff = []
for i, val1 in enumerate(dataset):
    # Start the inner index at i + 1 so that j is a position in dataset, not in the slice.
    for j, val2 in enumerate(dataset[i+1:], start=i+1):
        if abs(val1 - val2) > some_constant:
            diff.append((i, j))
The inner loop iterates over a slice of the list so that we don't add both (i, j) and (j, i) to the result.
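The same idea of visiting each unordered pair only once can also be sketched with itertools.combinations, which avoids slicing the list. This is an alternative under the assumption that index pairs are wanted; the function name pairs_exceeding and the sample data are made up for illustration.

```python
from itertools import combinations

def pairs_exceeding(dataset, threshold):
    """Return index pairs (i, j) with i < j whose values differ by more than threshold."""
    # Hypothetical helper; combinations() yields each unordered pair exactly once.
    return [
        (i, j)
        for (i, x), (j, y) in combinations(enumerate(dataset), 2)
        if abs(x - y) > threshold
    ]

dataset = [10, 11, 12, 13, 14, 15]  # sample data, same range as the permutations example
result = pairs_exceeding(dataset, 3)
print(result)  # [(0, 4), (0, 5), (1, 5)]
```

Unlike permutations, combinations never produces both (i, j) and (j, i), so abs() is needed to catch differences in either direction.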
Yes, it is possible.
It would be something like this:
sets = []
for item1 in dataset:
    for item2 in dataset:
        if abs(item1 - item2) > somevalue:
            sets.append((item1, item2))
You create a list called sets, which holds the value pairs whose absolute difference is greater than somevalue. Each matching pair is appended to that list as a tuple.
EDIT: The list sets is a mutable object; if you need it to be immutable, this code won't work for you.

Sort a list then give the indexes of the elements in their original order

I have an array of n numbers, say [1,4,6,2,3]. The sorted array is [1,2,3,4,6], and the indexes of these numbers in the old array are 0, 3, 4, 1, and 2. What is the best way, given an array of n numbers, to find this array of indexes?
My idea is to run order statistics for each element. However, since I have to rewrite this function many times (in a contest), I'm wondering if there's a shorter way to do this.
>>> a = [1,4,6,2,3]
>>> [b[0] for b in sorted(enumerate(a),key=lambda i:i[1])]
[0, 3, 4, 1, 2]
Explanation:
enumerate(a) returns an enumeration over tuples consisting of the indexes and values in the original list: [(0, 1), (1, 4), (2, 6), (3, 2), (4, 3)]
Then sorted with a key of lambda i:i[1] sorts based on the original values (item 1 of each tuple).
Finally, the list comprehension [b[0] for b in ...] returns the original indexes (item 0 of each tuple).
Using numpy arrays instead of lists may be beneficial if you are doing a lot of statistics on the data. If you choose to do so, this would work:
import numpy as np
a = np.array( [1,4,6,2,3] )
b = np.argsort( a )
argsort() can operate on lists as well, but I believe that in this case it simply copies the data into an array first.
Here is another way:
>>> sorted(xrange(len(a)), key=lambda ix: a[ix])
[0, 3, 4, 1, 2]
This approach sorts not the original list, but its indices (created with xrange), using the original list as the sort keys.
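For readers on Python 3, where xrange no longer exists, the same approach can be sketched with range. The function name argsort is borrowed from NumPy for familiarity and is only an illustrative choice here.

```python
def argsort(values):
    """Return the indices that would sort `values`."""
    # sorted() is stable, so equal values keep their original relative order.
    return sorted(range(len(values)), key=values.__getitem__)

a = [1, 4, 6, 2, 3]
order = argsort(a)
print(order)                  # [0, 3, 4, 1, 2]
print([a[i] for i in order])  # [1, 2, 3, 4, 6]
```

Passing values.__getitem__ as the key is equivalent to lambda ix: values[ix].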
This should do the trick:
from operator import itemgetter
indices = list(zip(*sorted(enumerate(my_list), key=itemgetter(1))))[0]  # list() is needed on Python 3, where zip is lazy
The long way, without a list comprehension, for beginners like me:
a = [1,4,6,2,3]
b = enumerate(a)
c = sorted(b, key = lambda i:i[1])
d = []
for e in c:
    d.append(e[0])
print(d)

add a tuple to a tuple in Python

I have a tuple:
a = (1,2,3)
and I need to add a tuple at the end
b = (4,5)
The result should be:
(1,2,3,(4,5))
Even if I wrap b in extra parentheses:
a + (b)
I get (1,2,3,4,5), which is not what I wanted.
When you do a + b you are simply concatenating both the tuples. Here, you want the entire tuple to be a part of another tuple. So, we wrap that inside another tuple.
a, b = (1, 2, 3), (4,5)
print(a + (b,)) # (1, 2, 3, (4, 5))
>>> a = (1,2,3)
>>> b = (4,5)
>>> a + (b,)
(1, 2, 3, (4, 5))
Tuple objects are immutable, so the + operator (and +=) never modifies a tuple in place; it builds a new tuple holding the contents of both operands, just as + does for lists. So when you add two tuples, Python concatenates their contents.
To add an entire tuple onto the end of another tuple, wrap the tuple to be added inside another tuple.
c = a + (b,) # Add a 1-tuple containing the tuple to be added.
print(c) # >>> (1, 2, 3, (4, 5))
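To see why a + (b) did not work while a + (b,) does, here is a short sketch in Python 3 syntax. The variable names flat and nested are just for illustration.

```python
a = (1, 2, 3)
b = (4, 5)

# Parentheses alone only group an expression; the trailing comma makes the tuple.
print((b) == b)    # True: (b) is just b
print(len((b,)))   # 1: a 1-tuple whose only item is b

flat = a + b       # concatenates the contents
nested = a + (b,)  # appends b as a single element
print(flat)        # (1, 2, 3, 4, 5)
print(nested)      # (1, 2, 3, (4, 5))
```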

Getting one value from a tuple

Is there a way to get one value from a tuple in Python using expressions?
def tup():
    return (3, "hello")
i = 5 + tup() # I want to add just the three
I know I can do this:
(j, _) = tup()
i = 5 + j
But that would add a few dozen lines to my function, doubling its length.
You can write
i = 5 + tup()[0]
Tuples can be indexed just like lists.
The main difference between tuples and lists is that tuples are immutable - you can't set the elements of a tuple to different values, or add or remove elements like you can from a list. But other than that, in most situations, they work pretty much the same.
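A small sketch of both points, indexing and immutability, in Python 3 syntax. The names point, raised and updated are made up for this example.

```python
point = (3, "hello")

# Indexing works exactly as it does for lists, including negative indexes.
print(point[0])   # 3
print(point[-1])  # hello

# Assigning to an element is where tuples differ from lists.
raised = False
try:
    point[0] = 4
except TypeError:
    raised = True
print(raised)     # True

# To "change" a value, build a new tuple instead.
updated = (4,) + point[1:]
print(updated)    # (4, 'hello')
```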
For anyone in the future looking for an answer, I would like to give a much clearer answer to the question.
# for making a tuple
my_tuple = (89, 32)
my_tuple_with_more_values = (1, 2, 3, 4, 5, 6)
# to concatenate tuples
another_tuple = my_tuple + my_tuple_with_more_values
print(another_tuple)
# (89, 32, 1, 2, 3, 4, 5, 6)
# getting a value from a tuple is similar to a list
first_val = my_tuple[0]
second_val = my_tuple[1]
# if you have a function called my_tuple_fun that returns a tuple,
# you might want to do this
my_tuple_fun()[0]
my_tuple_fun()[1]
# or this
v1, v2 = my_tuple_fun()
Hope this clears things up further for those that need it.
General
Single elements of a tuple a can be accessed in an indexed, array-like fashion via a[0], a[1], ..., depending on the number of elements in the tuple.
Example
If your tuple is a=(3,"a")
a[0] yields 3,
a[1] yields "a"
Concrete answer to question
def tup():
    return (3, "hello")
tup() returns a 2-tuple.
In order to "solve"
i = 5 + tup() # I want to add just the three
you select the 3 by:
tup()[0] # first element
so all together:
i = 5 + tup()[0]
Alternatives
Go with namedtuple that allows you to access tuple elements by name (and by index). Details are at https://docs.python.org/3/library/collections.html#collections.namedtuple
>>> import collections
>>> MyTuple=collections.namedtuple("MyTuple", "mynumber, mystring")
>>> m = MyTuple(3, "hello")
>>> m[0]
3
>>> m.mynumber
3
>>> m[1]
'hello'
>>> m.mystring
'hello'

Binning into timeslots - Is there a better way than using list comp?

I have a dataset of events (tweets to be specific) that I am trying to bin / discretize. The following code seems to work fine so far (assuming 100 bins):
import datetime
HOUR = datetime.timedelta(hours=1)
start = datetime.datetime(2009, 1, 1)
z = [start + x*HOUR for x in xrange(1, 100)]
But then I came across this fateful line in the Python docs: 'This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n)'. The zip idiom does indeed work, but I can't understand how (what is the * operator for, for instance?). How could I use it to make my code prettier? I'm guessing this means I should make a generator / iterable for time that yields the time in graduations of an HOUR?
I will try to explain zip(*[iter(s)]*n) in terms of a simpler example:
imagine you have the list s = [1, 2, 3, 4, 5, 6]
iter(s) gives you a listiterator object that will yield the next number from s each time you ask for an element.
[iter(s)] * n gives you the list with iter(s) in it n times e.g. [iter(s)] * 2 = [<listiterator object>, <listiterator object>] - the key here is that these are 2 references to the same iterator object, not 2 distinct iterator objects.
zip takes a number of sequences and returns a list of tuples where each tuple contains the ith element from each of the sequences. e.g. zip([1,2], [3,4], [5,6]) = [(1, 3, 5), (2, 4, 6)] where (1, 3, 5) are the first elements from the parameters passed to zip and (2, 4, 6) are the second elements from the parameters passed to zip.
The * in front of [iter(s)]*n unpacks that list into separate positional parameters for zip, so if n is 2 we get zip(<listiterator object>, <listiterator object>).
zip will request the next element from each of its parameters but because these are both references to the same iterator this will result in (1, 2), it does the same again resulting in (3, 4) and again resulting in (5, 6) and then there are no more elements so it stops. Hence the result [(1, 2), (3, 4), (5, 6)]. This is the clustering a data series into n-length groups as mentioned.
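The steps above can be tried out directly. This is a sketch in Python 3 syntax, where zip returns a lazy iterator rather than a list, so list() is used to show the result; the variable names are made up for illustration.

```python
s = [1, 2, 3, 4, 5, 6]
n = 2

it = iter(s)
same_iterator_twice = [it] * n

# Both list entries are the very same iterator object.
shared = same_iterator_twice[0] is same_iterator_twice[1]
print(shared)  # True

# zip pulls from that one iterator in turn, so consecutive items get grouped.
groups = list(zip(*same_iterator_twice))
print(groups)  # [(1, 2), (3, 4), (5, 6)]

# Two separate iterators would instead pair each element with itself.
separate = list(zip(iter(s), iter(s)))
print(separate)  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
```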
The expression from the docs looks like this:
zip(*[iter(s)]*n)
This is equivalent to:
it = iter(s)
zip(*[it, it, ..., it]) # n times
The [...]*n repeats the list n times, and this results in a list that contains n references to the same iterator.
This is again equal to:
it = iter(s)
zip(it, it, ..., it) # turning a list into positional parameters
The * before the list turns the list elements into positional parameters of the function call.
Now, when zip is called, it starts from left to right to call the iterators to obtain elements that should be grouped together. Since all parameters refer to the same iterator, this yields the first n elements of the initial sequence. Then that process continues for the second group in the resulting list, and so on.
The result is the same as if you had constructed the list like this (evaluated from left to right):
it = iter(s)
[(it.next(), it.next(), ..., it.next()), (it.next(), it.next(), ..., it.next()), ...]
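One consequence of this evaluation order is that zip stops as soon as the shared iterator runs out, so a trailing incomplete group is silently dropped. Here is a sketch in Python 3 syntax, modelled on the grouper recipe in the itertools documentation, that pads the last group instead; whether padding is appropriate for the data at hand is an assumption.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect items into n-length groups, padding the last group with fillvalue."""
    args = [iter(iterable)] * n  # n references to the same iterator
    return zip_longest(*args, fillvalue=fillvalue)

s = [1, 2, 3, 4, 5, 6, 7]

truncated = list(zip(*[iter(s)] * 3))
print(truncated)  # [(1, 2, 3), (4, 5, 6)]  (the 7 is dropped)

padded = list(grouper(s, 3))
print(padded)     # [(1, 2, 3), (4, 5, 6), (7, None, None)]
```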
