Consider a portion of an ndarray x formed by two consecutive slices (I'm using numpy in the example, but the question is more general; I'm actually using pytorch in my application):
import numpy as np
x = np.arange(4 * 10 * 12 * 7).reshape(4, 10, 12, 7)
first = (slice(None), 3, slice(3, 9))
second = (2, slice(1, 3), slice(5))
out = x[first][second]
I want a way to get a canonical, combined ndslice as a function of x.shape, first and second. E.g.
combined = compose(x.shape, first, second)
assert np.equal(x[first][second], x[combined]).all()
assert combined == (2, 3, slice(4, 6), slice(5))
I am only interested in "simple" ndslices consisting of: a single int, a single slice object, or a tuple of any combination of ints and slice objects.
By canonical, I mean that the resulting combined slice should uniquely identify that subset of x.
For example here is another way to access the same segment and it should lead to the same combined ndslice:
other_first = (slice(2, 4), slice(None), slice(2, 7))
other_second = (0, 3, slice(2, -1), slice(5))
combined = compose(x.shape, other_first, other_second)
assert np.equal(x[other_first][other_second], x[combined]).all()
assert combined == (2, 3, slice(4, 6), slice(5))
Because slices support None and negative indices, we need the shape of x in order to produce a canonical ndslice.
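For instance, slice(2, -1) only resolves to concrete bounds once the axis length is known. As a small illustration (my own, not from the original question), Python's built-in slice.indices method performs exactly this normalization:
s = slice(2, -1)
# slice.indices(length) returns the normalized (start, stop, step) for that axis length
print(s.indices(12))  # (2, 11, 1)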
Related Questions
Note that I am interested in x[first][second], which is different (in general) from x[first, second], discussed elsewhere.
I'm still interested if someone happens to know of a builtin or more elegant solution (even if specific to numpy or pytorch), but this is the general, home-grown solution I came up with:
def compose_single(lhs, rhs, length):
    # Slicing a range yields another range, and indexing it with an int yields
    # an int, so the range type does the index arithmetic for us.
    out = range(length)[lhs][rhs]
    return out if isinstance(out, int) else slice(out.start, out.stop, out.step)

def compose(shape, first, second):
    def ensure_tuple(ndslice):
        return ndslice if isinstance(ndslice, tuple) else (ndslice,)

    first = ensure_tuple(first)
    second = ensure_tuple(second)
    # Pad first out to the full rank, then apply second only to the axes that
    # survive first (ints consume an axis; slices keep it).
    out = list(first) + [slice(None)] * (len(shape) - len(first))
    remaining_dims = [i for i, s in enumerate(out) if isinstance(s, slice)]
    for i, rhs in zip(remaining_dims, second):
        out[i] = compose_single(out[i], rhs, length=shape[i])
    return tuple(out)
Note that the canonical output will not use negative or None starts or ends. So I've updated the test cases below:
shape = (4, 10, 12, 7)
first = (slice(None), 3, slice(3, 9))
second = (2, slice(1, 3), slice(5))
expected_combined = (2, 3, slice(4, 6, 1), slice(0, 5, 1))
assert compose(shape, first, second) == expected_combined
other_first = (slice(2, 4), slice(None), slice(2, 7))
other_second = (0, 3, slice(2, -1), slice(5))
assert compose(shape, other_first, other_second) == expected_combined
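As a side note, ensure_tuple means compose also accepts a bare int or slice. Tracing the code above, axes never touched by second simply keep slice(None); this extra check is my own addition rather than part of the original tests:
assert compose(shape, 2, 1) == (2, 1, slice(None), slice(None))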
I am relatively new to programming, and have this problem:
There are two lists: C = [i, i, k, l, i] and D = [m, n, o, p, q].
I want to select the index of the minimum element of C. If k or l is the minimum, it is quite simple, since the min function will directly lead to the desired index. But if i is the minimum, there are several possibilities. In that case, I want to look at list D's elements, but only at the indices where i occurs in C, and then choose my sought-after index based on the minimum of those particular elements of D.
I thought of the following code:
min_C = min(C)
if C.count(min_C) == 1:
    soughtafter_index = C.index(min_C)
else:
    possible_D_value = []
    for iterate in C:
        if iterate == min_C:
            possible_index = C.index(iterate)
            possible_D_value.append(D[possible_index])
    best_D_value = min(possible_D_value)
    soughtafter_index = D.index(best_D_value)
(Note that in the problem C and D will always have the same length)
I haven't had a chance to test the code yet, but I wanted to ask whether it is reasonable. Is there a better way to handle this? (And what if there is a third list? Then this code will get even longer...)
Thank you all
Try this:
soughtafter_index = list(zip(C, D)).index(min(zip(C, D)))
UPDATE with the required explanation:
>>> C = [1, 5, 1, 3, 1, 4]
>>> D = [0, 1, 1, 3, 0, 1]
>>> list(zip(C, D))
[(1, 0), (5, 1), (1, 1), (3, 3), (1, 0), (4, 1)]
>>> min(zip(C, D))
(1, 0)
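Since tuples compare lexicographically (element by element, left to right), the same one-liner extends unchanged to a third tie-breaking list; E below is a hypothetical example, not from the question:
>>> E = [9, 8, 7, 6, 5, 4]
>>> list(zip(C, D, E)).index(min(zip(C, D, E)))
4
The tie between indices 0 and 4 (which are equal in C and in D) is broken by E.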
I want to generate permutations of elements in a list, but only keep a set where each element is on each position only once.
For example [1, 2, 3, 4, 5, 6] could be a user list and I want 3 permutations. A good set would be:
[1,2,3,5,4,6]
[2,1,4,6,5,3]
[3,4,5,1,6,2]
However, one could not add, for example, [1, 3, 2, 6, 5, 4] to the set above: 1 would then appear in the first position twice, and 5 in the fifth position twice, while the other elements appear in those positions only once.
My code so far is :
import random

# This simply generates a number of permutations specified by number_of_samples.
def generate_perms(player_list, number_of_samples):
    myset = set()
    while len(myset) < number_of_samples:
        random.shuffle(player_list)
        myset.add(tuple(player_list))
    return [list(x) for x in myset]

# And this is my function that takes the stratified samples for permutations.
def generate_stratified_perms(player_list, number_of_samples):
    user_idx_dict = {}
    i = 0
    while i < number_of_samples:
        perm = generate_perms(player_list, 1)
        for elem in perm:
            if not user_idx_dict[elem]:
                user_idx_dict[elem] = [perm.index(elem)]
            else:
                user_idx_dict[elem] += [perm.index(elem)]
        [...]
    return total_perms
but I don't know how to finish the second function.
So in short, I want to give my function a number of permutations to generate, and the function should give me that many permutations, in which no element appears in the same position more often than the others (once each if all appear there once, twice each if all appear there twice, and so on).
Let's start by solving the case of generating n or fewer rows first. In that case, your output must be a Latin rectangle or a Latin square. These are easy to generate: start by constructing a Latin square, shuffle the rows, shuffle the columns, and then keep just the first r rows. The following construction always works for the initial Latin square:
1 2 3 ... n
2 3 4 ... 1
3 4 5 ... 2
... ... ...
n 1 2 3 ...
Shuffling rows is a lot easier than shuffling columns, so we'll shuffle the rows, then take the transpose, then shuffle the rows again. Here's an implementation in Python:
from random import shuffle

def latin_rectangle(n, r):
    # Cyclic Latin square: row j is the sequence 1..n rotated by j positions.
    square = [
        [1 + (i + j) % n for i in range(n)]
        for j in range(n)
    ]
    shuffle(square)
    square = list(zip(*square))  # transpose, so shuffling rows again shuffles the original columns
    shuffle(square)
    return square[:r]
Example:
>>> latin_rectangle(5, 4)
[(2, 4, 3, 5, 1),
 (5, 2, 1, 3, 4),
 (1, 3, 2, 4, 5),
 (3, 5, 4, 1, 2)]
Note that this algorithm can't generate all possible Latin squares; by construction, the rows are cyclic permutations of each other, so you won't get Latin squares in other equivalence classes. I'm assuming that's OK since generating a uniform probability distribution over all possible outputs isn't one of the question requirements.
The upside is that this is guaranteed to work, and consistently in O(n^2) time, because it doesn't use rejection sampling or backtracking.
Now let's solve the case where r > n, i.e. we need more rows. Each column can't have equal frequencies for each number unless r % n == 0, but it's simple enough to guarantee that the frequencies in each column will differ by at most 1. Generate enough Latin squares, stack them on top of each other, and then slice r rows from the stack. For additional randomness, it's safe to shuffle those r rows, but only after taking the slice.
def generate_permutations(n, r):
    rows = []
    while len(rows) < r:
        rows.extend(latin_rectangle(n, n))
    rows = rows[:r]
    shuffle(rows)
    return rows
Example:
>>> generate_permutations(5, 12)
[(4, 3, 5, 2, 1),
 (3, 4, 1, 5, 2),
 (3, 1, 2, 4, 5),
 (5, 3, 4, 1, 2),
 (5, 1, 3, 2, 4),
 (2, 5, 1, 3, 4),
 (1, 5, 2, 4, 3),
 (5, 4, 1, 3, 2),
 (3, 2, 4, 1, 5),
 (2, 1, 3, 5, 4),
 (4, 2, 3, 5, 1),
 (1, 4, 5, 2, 3)]
This uses the numbers 1 to n because of the formula 1 + (i + j) % n in the first list comprehension. If you want to use something other than the numbers 1 to n, you can take it as a list (e.g. players) and change this part of the list comprehension to players[(i + j) % n], where n = len(players).
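For example, here is a minimal sketch of that substitution (latin_rectangle_items is my name for it, not part of the answer above):
from random import shuffle

def latin_rectangle_items(players, r):
    n = len(players)
    # Same cyclic construction as before, but filled with the players themselves.
    square = [[players[(i + j) % n] for i in range(n)] for j in range(n)]
    shuffle(square)
    square = list(zip(*square))  # transpose
    shuffle(square)
    return square[:r]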
If runtime is not that important, I would go for the lazy way: generate all possible permutations (itertools can do that for you) and then filter out all permutations which do not meet your requirements.
Here is one way to do it.
import itertools

def permuts(l, n):
    all_permuts = list(itertools.permutations(l))
    picked = []
    for a in all_permuts:
        valid = True
        for p in picked:
            for i in range(len(a)):
                if a[i] == p[i]:
                    valid = False
                    break
        if valid:
            picked.append(a)
        if len(picked) >= n:
            break
    print(picked)

permuts([1, 2, 3, 4, 5, 6], 3)
I'm trying to figure out how to iterate over an arbitrary number of loops where each loop depends on the most recent outer loop. The following code is an example of what I want to do:
def function(z):
    n = int(log(z))
    tupes = []
    for i_1 in range(1, n):
        for i_2 in range(1, i_1):
            ...
                for i_n in range(1, i_{n - 1}):
                    if i_1 * i_2 * ... * i_n > z:
                        tupes.append((i_1, i_2, ..., i_n))
    return tupes
While I'd like this to work for any z > e**2, it's sufficient for it to work for values of z up to e**100. I know that if I take the Cartesian product of the appropriate ranges I'll end up with a superset of the tuples I desire, but I'd like to obtain only the tuples I seek.
If anyone can help me with this, I'd greatly appreciate it. Thanks in advance.
Combinations can be listed in ascending order; in fact, this is the default behavior of itertools.combinations.
The code:
for i1 in range(1, 6):
    for i2 in range(1, i1):
        for i3 in range(1, i2):
            print (i3, i2, i1)
# (1, 2, 3)
# (1, 2, 4)
# ...
# (3, 4, 5)
Is equivalent to the code:
from itertools import combinations
for combination in combinations(range(1, 6), 3):
    print combination
# (1, 2, 3)
# (1, 2, 4)
# ...
# (3, 4, 5)
Using the combinations instead of the Cartesian product culls the sample space down to what you want.
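For instance, here is a hedged sketch tying this back to the original problem (the function name is mine, and math.prod requires Python 3.8+):
from itertools import combinations
from math import log, prod

def tuples_with_large_product(z, k):
    # All increasing k-tuples of distinct ints in [1, n) whose product exceeds z.
    n = int(log(z))
    return [c for c in combinations(range(1, n), k) if prod(c) > z]

For z = e**10 and k = 8 this yields the same nine tuples as the recursive solution below, just in increasing rather than decreasing order.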
The logic in your question implemented recursively (note that this allows for duplicate tuples):
import functools
import numpy as np

def f(n, z, max_depth, factors=(), depth=0):
    res = []
    if depth == max_depth:
        product = functools.reduce(lambda x, y: x * y, factors, 1)
        if product > z:
            res.append(factors)
    else:
        for i in range(1, n):
            new_factors = factors + (i,)
            res.extend(f(i, z, factors=new_factors, depth=depth + 1, max_depth=max_depth))
    return res

z = np.e ** 10
n = int(np.log(z))
print(f(n, z, max_depth=8))
yields
[(8, 7, 6, 5, 4, 3, 2, 1),
 (9, 7, 6, 5, 4, 3, 2, 1),
 (9, 8, 6, 5, 4, 3, 2, 1),
 (9, 8, 7, 5, 4, 3, 2, 1),
 (9, 8, 7, 6, 4, 3, 2, 1),
 (9, 8, 7, 6, 5, 3, 2, 1),
 (9, 8, 7, 6, 5, 4, 2, 1),
 (9, 8, 7, 6, 5, 4, 3, 1),
 (9, 8, 7, 6, 5, 4, 3, 2)]
As zondo suggested, you'll need to use a function and recursion to accomplish this task. Something along the lines of the following should work:
def recurse(tuplesList, potentialTupleAsList, rangeEnd, z):
    # No range left to iterate over: check whether the product clears z
    # (the question asks for the product of the loop variables, not the sum)
    if rangeEnd == 1 and reduce(lambda x, y: x * y, potentialTupleAsList, 1) > z:
        tuplesList.append(tuple(potentialTupleAsList))
        return
    for i in range(1, rangeEnd):
        potentialTupleAsList.append(i)
        recurse(tuplesList, potentialTupleAsList, rangeEnd - 1, z)
        # Need to remove the item you just used to make room for the next value
        potentialTupleAsList.pop(-1)
Then you could call it as such to get the results:
from math import log

l = []
recurse(l, [], int(log(z)), z)
print l
Your innermost loop can (if reached at all) only go over range(1, 1). Since the endpoint is not included, the loop will not iterate over any values. The shortest implementation of your function is thus:
def function(z):
    return []
If you are content with tuples of length smaller than n, then I propose the following solution:
import math

def function(z):
    def f(tuples, loop_variables, product, end):
        if product > z:
            tuples.append(loop_variables)
        for i in range(end - 1, 0, -1):
            f(tuples, loop_variables + (i,), product * i, i)

    n = int(math.log(z))
    tuples = []
    f(tuples, (), 1, n)
    return tuples
The time complexity is not good though: with n nested loops over O(n) elements each, we are on the order of n**n steps.
Given 2 lists:
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
I want to find the "overlap":
c = [3,4,5,5,6]
I'd also like it if i could extract the "remainder" the part of a and b that's not in c.
a_remainder = [5]
b_remainder = [1, 4, 7]
Note:
a has three 5's in it and b has two.
b has two 4's in it and a has one.
The resultant list c should have two 5's (limited by list b) and one 4 (limited by list a).
This gives me what i want, but I can't help but think there's a much better way.
import copy

a = [3, 4, 5, 5, 5, 6]
b = [1, 3, 4, 4, 5, 5, 6, 7]
c = []

for elem in copy.deepcopy(a):
    if elem in b:
        a.pop(a.index(elem))
        c.append(b.pop(b.index(elem)))

# Now a and b both contain the "remainders" and c contains the "overlap".
On another note, what is a more accurate name for what I'm asking for than "overlap" and "remainder"?
collections.Counter, available since Python 2.7, can be used to implement multisets that do exactly what you want.
import collections

a = [3, 4, 5, 5, 5, 6]
b = [1, 3, 4, 4, 5, 5, 6, 7]

a_multiset = collections.Counter(a)
b_multiset = collections.Counter(b)

overlap = list((a_multiset & b_multiset).elements())
a_remainder = list((a_multiset - b_multiset).elements())
b_remainder = list((b_multiset - a_multiset).elements())
print overlap, a_remainder, b_remainder
Use Python's set type:
intersection = set(a) & set(b)
a_remainder = set(a) - set(b)
b_remainder = set(b) - set(a)
In the language of sets, "overlap" is intersection and "remainder" is set difference. If you had distinct items, you wouldn't have to do these operations yourself; check out http://docs.python.org/library/sets.html if you're interested.
Since we're not working with distinct elements, your approach is reasonable. If you wanted this to run faster, you could create a dictionary for each list mapping each number to how many times it occurs (e.g., for a: 3 -> 1, 4 -> 1, 5 -> 3, etc.). You would then iterate through a, check whether each element exists in b's dictionary with a positive count, decrement that count, and add the element to the overlap list.
Untested code, but this is the idea
def add_or_update(counts, value):
    if value in counts:
        counts[value] += 1
    else:
        counts[value] = 1

b_dict = dict()
for b_elem in b:
    add_or_update(b_dict, b_elem)

intersect = []
a_remainder = []
for a_elem in a:
    if a_elem in b_dict and b_dict[a_elem] > 0:
        b_dict[a_elem] -= 1  # consume one occurrence of this value from b
        intersect.append(a_elem)
    else:
        a_remainder.append(a_elem)

# Whatever counts remain in b_dict make up b's remainder.
b_remainder = []
for k, v in b_dict.items():
    b_remainder.extend([k] * v)
OK, verbose, but kind of cool (similar in spirit to the collections.Counter idea, but more home-made):
import itertools as it
flatten = it.chain.from_iterable
sorted(
    v for u, v in
    set(flatten(enumerate(g)
                for k, g in it.groupby(a))).intersection(
        set(flatten(enumerate(g)
                    for k, g in it.groupby(b))))
)
The basic idea is to turn each of the lists into a new list that attaches a counter to each object, numbered to account for duplicates, so that you can then use set operations on these tuples after all.
To be slightly less verbose:
aa = set(flatten(enumerate(g) for k, g in it.groupby(a)))
bb = set(flatten(enumerate(g) for k, g in it.groupby(b)))
# aa = set([(0, 3), (0, 4), (0, 5), (0, 6), (1, 5), (2, 5)])
# bb = set([(0, 1), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (1, 4), (1, 5)])
cc = aa.intersection(bb)
# cc = set([(0, 3), (0, 4), (0, 5), (0, 6), (1, 5)])
c = sorted(v for u,v in cc)
# c = [3, 4, 5, 5, 6]
groupby -- produces groups of identical elements (because it yields (key, group) pairs, the g for k, g in it.groupby(a) is needed to extract each group)
enumerate -- attaches a counter to each element of each group
flatten -- create a single list
flatten -- create a single list
set -- convert to a set
intersection -- find the common elements
sorted(v for u,v in cc) -- get rid of the counters and sort the result
Finally, the remainders are just aa - cc and bb - cc, which indeed reproduce the values from the question:
sorted(v for u,v in aa-cc)
# [5]
sorted(v for u,v in bb-cc)
# [1, 4, 7]
A response from kerio in #python on freenode:
import itertools
from collections import Counter

[i for i in itertools.chain.from_iterable([k] * v for k, v in
                                          (Counter(a) & Counter(b)).iteritems())]
Try difflib.SequenceMatcher(), "a flexible class for comparing pairs of sequences of any type"...
A quick try:
import difflib

a = [3, 4, 5, 5, 5, 6]
b = [1, 3, 4, 4, 5, 5, 6, 7]

sm = difflib.SequenceMatcher(None, a, b)
c = []
a_remainder = []
b_remainder = []

for tag, i1, i2, j1, j2 in sm.get_opcodes():
    if tag == 'replace':
        a_remainder.extend(a[i1:i2])
        b_remainder.extend(b[j1:j2])
    elif tag == 'delete':
        a_remainder.extend(a[i1:i2])
    elif tag == 'insert':
        b_remainder.extend(b[j1:j2])
    elif tag == 'equal':
        c.extend(a[i1:i2])
And now...
>>> print c
[3, 4, 5, 5, 6]
>>> print a_remainder
[5]
>>> print b_remainder
[1, 4, 7]
a_set = set(a)
b_set = set(b)
a_remainder = a_set.difference(b_set)
b_remainder = b_set.difference(a_set)
c = a_set.intersection(b_set)
But if you need c to have duplicates, and order is important for you, you may look at the longest common subsequence problem.
I don't think you should actually use this solution, but I took the opportunity to practice with lambda functions, and here is what I came up with :)
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
dedup = lambda x: [set(x)] if len(set(x)) == len(x) else [set(x)] + dedup([x[i] for i in range(1, len(x)) if x[i] == x[i-1]])
default_set = lambda x: (set() if x[0] is None else x[0], set() if x[1] is None else x[1])
deduped = map(default_set, map(None, dedup(a), dedup(b)))
get_result = lambda f: reduce(lambda x, y: list(x) + list(y), map(lambda x: f(x[0], x[1]), deduped))
c = get_result(lambda x, y: x.intersection(y)) # [3, 4, 5, 6, 5]
a_remainder = get_result(lambda x, y: x.difference(y)) # [5]
b_remainder = get_result(lambda x, y: y.difference(x)) # [1, 7, 4]
I'm pretty sure izip_longest would have simplified this a bit (wouldn't have needed the default_set lambda), but I was testing this with Python 2.5.
Here are some of the intermediate values used in the calculation in case anyone wants to understand this:
dedup(a) = [set([3, 4, 5, 6]), set([5]), set([5])]
dedup(b) = [set([1, 3, 4, 5, 6, 7]), set([4, 5])]
deduped = [(set([3, 4, 5, 6]), set([1, 3, 4, 5, 6, 7])), (set([5]), set([4, 5])), (set([5]), set([]))]