Sorting a record array in numpy - python

I have a numpy structured array:
import numpy as np
a = np.array([(0, 1, 1167606000), (0, 1, 1167606005), (0, 1, 1167606008),
              (0, 10, 1167606010), (0, 10, 1167606012), (1, 0, 1167606000),
              (1, 2, 1167606001), (1, 0, 1167606005), (1, 0, 1167606008),
              (2, 1, 1167606001), (2, 3, 1167606002), (3, 2, 1167606002),
              (3, 4, 1167606003), (4, 3, 1167606003), (4, 5, 1167606004),
              (5, 4, 1167606004), (5, 6, 1167606005), (6, 5, 1167606005),
              (6, 7, 1167606006), (7, 6, 1167606006), (7, 8, 1167606007),
              (8, 7, 1167606007), (8, 9, 1167606008), (9, 8, 1167606008),
              (9, 10, 1167606009), (10, 9, 1167606009), (10, 0, 1167606010),
              (10, 0, 1167606012)],
             dtype=[('fr', '<i8'), ('to', '<i8'), ('time', '<i8')])
Is there a vectorized way to sort it, first by the minimum of 'fr' and 'to' and then by 'time'? I also want to do this without making any copies.
Edit:
The sort is not by 'fr', 'to' and 'time', but first by the minimum of 'fr' and 'to', then by 'time'. The expected output in the above case is:
(0, 1, 1167606000),
(1, 0, 1167606000),
(0, 1, 1167606005),
(1, 0, 1167606005),
(0, 1, 1167606008),
(1, 0, 1167606008),
(0, 10, 1167606010),
(0, 10, 1167606012),
(1, 2, 1167606001),
(2, 1, 1167606001),
(2, 3, 1167606002),
(3, 2, 1167606002),
(3, 4, 1167606003),
(4, 3, 1167606003),
...

You can give an order argument to sort:
a.sort(order=['fr', 'to', 'time'])
Sorting by minimum of two columns:
Using lexsort, you can sort by any set of keys. Here, give it a['time'] and np.minimum(a['to'], a['fr']); lexsort treats the last key passed as the primary sort key:
inds = np.lexsort((a['time'], np.minimum(a['to'], a['fr'])))
a = a[inds]
To avoid making a copy of a when you rearrange it, you can use take instead of a = a[inds]:
np.take(a, inds, out=a)
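Putting the two pieces together, a minimal end-to-end sketch (assuming a is the structured array defined in the question):
inds = np.lexsort((a['time'], np.minimum(a['to'], a['fr'])))  # primary key (the minimum) goes last
np.take(a, inds, out=a)   # write the reordered rows back into a
print(a[:4])
# expected first rows: (0, 1, 1167606000), (1, 0, 1167606000),
#                      (0, 1, 1167606005), (1, 0, 1167606005)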

Related

List Transformation of Routes in a list

I have a routes list like this:
routes = [[(0, 1), (1, 6), (6, 7), (7, 10), (10, 9), (9, 0)],
          [(0, 2), (2, 3), (3, 4), (4, 5), (5, 0)],
          [(0, 8), (8, 0)]]
Route 1 is from 0 to 1 to 6 to 7 and so on.
How can I transform this list into a list like this:
routes_new = [[0, 1, 6, 7, 10, 9, 0], [0, 2, 3, 4, 5, 0], [0, 8, 0]]
Thanks a lot!
Assuming you are looking for cycles here, which seems to be the case, a simple way to achieve what you want is to use NetworkX's nx.simple_cycles. This does not require the edges to be ordered.
import networkx as nx
routes = [[(0, 1), (1, 6), (6, 7), (7, 10), (10, 9), (9, 0)],
          [(0, 2), (2, 3), (3, 4), (4, 5), (5, 0)],
          [(0, 8), (8, 0)]]
paths = []
for route in routes:
    G = nx.from_edgelist(route, create_using=nx.DiGraph)
    paths.append(list(nx.simple_cycles(G)))
paths
# [[[0, 1, 6, 7, 10, 9]], [[0, 2, 3, 4, 5]], [[0, 8]]]
If there can be multiple cycles in each route, check other functions in the cycle module, like nx.find_cycle, which allow you to specify an origin.
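For example, a minimal sketch with nx.find_cycle and an explicit origin (the node-list reconstruction is my own addition, not part of the original answer):
G = nx.from_edgelist(routes[0], create_using=nx.DiGraph)
cycle_edges = nx.find_cycle(G, source=0)    # edges of a cycle reachable from node 0
cycle_nodes = [u for u, v in cycle_edges] + [cycle_edges[-1][1]]
# [0, 1, 6, 7, 10, 9, 0]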
A simple list comprehension can help you flatten your lists:
routes_new = [[t[0] for t in l] + [l[-1][1]] for l in routes]
NB: this assumes the edges are in order and linked; no check is performed. If that is not the case, please provide such an example.
output:
[[0, 1, 6, 7, 10, 9, 0], [0, 2, 3, 4, 5, 0], [0, 8, 0]]
You can just take the first number of each tuple and add the very last one for each route:
routes_new = [[val[0] for val in route] + [route[-1][-1]] for route in routes]

Create new list out of two lists via multiplication. Python

Good morning!
I'm trying to generate a new list out of two lists, using a multiplication operation.
Below I show you step by step what I did:
import itertools
from itertools import product
import numpy as np
import pandas as pd
Parameter_list = []
Parameter = [range(0, 2, 1), range(0, 2, 1)]
Parameter_list = list(itertools.product(*Parameter))
print(Parameter_list)
[(0, 0), (0, 1), (1, 0), (1, 1)]
Then I deleted the first value, which is basically the null matrix:
del Parameter_list[0]
print(Parameter_list)
[(0, 1), (1, 0), (1, 1)]
I proceeded by creating the two parameter lists:
Parameter_A = [range(1, 2, 1), range(3, 6, 2), range(10, 20, 10)]
Parameter_A = list(itertools.product(*Parameter_A))
Parameter_B = [range(0, 2, 1), range(4, 6, 2), range(10, 20, 10)]
Parameter_B = list(itertools.product(*Parameter_B))
print(Parameter_A)
print(Parameter_B)
[(1, 3, 10), (1, 5, 10)]
[(0, 4, 10), (1, 4, 10)]
And combined the lists:
comb=list(product(Parameter_A,Parameter_B))
print(comb)
[((1, 3, 10), (0, 4, 10)),
((1, 3, 10), (1, 4, 10)),
((1, 5, 10), (0, 4, 10)),
((1, 5, 10), (1, 4, 10))]
Up to here, no problem. But now I'm struggling to create a new list by multiplying the Parameter_list with the comb list. The desired output is the following:
[((0, 0, 0), (0, 4, 10)),
((0, 0, 0), (1, 4, 10)),
((0, 0, 0), (0, 4, 10)),
((0, 0, 0), (1, 4, 10)),
((1, 3, 10), (0, 0, 0)),
((1, 3, 10), (0, 0, 0)),
((1, 5, 10), (0, 0, 0)),
((1, 5, 10), (0, 0, 0)),
((1, 3, 10), (0, 4, 10)),
((1, 3, 10), (1, 4, 10)),
((1, 5, 10), (0, 4, 10)),
((1, 5, 10), (1, 4, 10))]
Can someone help me? Many thanks!
Doing this with lists instead of with a numpy array is not the most convenient choice. That said, it's still something you can do with a one-liner.
prod = [tuple(i if j != 0 else (0,) * len(i) for i, j in zip(comb_items, bool_items))
        for comb_items, bool_items in itertools.product(comb, Parameter_list)]
>>> prod
[((0, 0, 0), (0, 4, 10)),
((1, 3, 10), (0, 0, 0)),
((1, 3, 10), (0, 4, 10)),
((0, 0, 0), (1, 4, 10)),
((1, 3, 10), (0, 0, 0)),
((1, 3, 10), (1, 4, 10)),
((0, 0, 0), (0, 4, 10)),
((1, 5, 10), (0, 0, 0)),
((1, 5, 10), (0, 4, 10)),
((0, 0, 0), (1, 4, 10)),
((1, 5, 10), (0, 0, 0)),
((1, 5, 10), (1, 4, 10))]
I am assuming that the order of the outputs isn't critical and that Parameter_list will always contain booleans (0/1 values). Both of these things can be pretty easily changed if needed.
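If you do want a NumPy-based version, a rough sketch of the same idea with broadcasting (the array names below are my own, and the result is an array rather than nested tuples):
import numpy as np

comb_arr = np.array(comb)            # shape (4, 2, 3): pairs of parameter tuples
mask = np.array(Parameter_list)      # shape (3, 2): the 0/1 switch pairs
# each switch either keeps a tuple (1) or zeroes it out (0)
prod_arr = comb_arr[:, None, :, :] * mask[None, :, :, None]   # shape (4, 3, 2, 3)
pairs = prod_arr.reshape(-1, 2, 3)   # one (2, 3) block per (comb, switch) pairing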

Find rows in which sum of column values have a specific result

I have the following dataframe:
value
0 2
1 3
2 10
3 1
4 12
I need to build a formula that identifies which rows, when their values are summed, give 23 as a result.
In this case the output should be something like [2,3,4] (10+1+12).
I believe this falls in the permutation/combination field; however, the option I found that got me closest requires a specific combination length, and that won't work here, since the combination could be composed of n values (I will never know the exact size of n upfront).
Is there a way to do that?
from pandas import Series
import itertools
s = Series([2, 3, 10, 1, 12])
result = []
for a, b, c in itertools.combinations(s.index, 3):
    combination_sum = s.iloc[[a, b, c]].sum()
    if combination_sum == 23:
        result.append((a, b, c))
result
You can generalize this and make it into a function for n values.
This is how you can generalize it.
In the example series I have added some more values for better understanding:
from pandas import Series
import itertools
s = Series([2, 3, 10, 1, 12, 4, 5, 6, 7, 8])
def get_column_whose_sum_is(sum_value=23, combination_of_columns=3, data_as_series=s):
    result = []
    for columns in itertools.combinations(data_as_series.index, combination_of_columns):
        combination_sum = data_as_series.iloc[list(columns)].sum()
        if combination_sum == sum_value:
            result.append(columns)
    return result
get_column_whose_sum_is(sum_value = 33, combination_of_columns = 4, data_as_series = s)
# [(1, 2, 4, 9), (2, 4, 5, 8), (2, 4, 6, 7), (4, 7, 8, 9)]
get_column_whose_sum_is(sum_value = 23, combination_of_columns = 3, data_as_series = s)
# [(1, 4, 9), (2, 3, 4), (2, 6, 9), (2, 7, 8), (4, 5, 8), (4, 6, 7)]
# for loop to find combinations of all lengths
c = []
for i in range(len(s.index)):
    c = c + get_column_whose_sum_is(sum_value=23, combination_of_columns=i, data_as_series=s)
print(c)
#[(1, 4, 9), (2, 3, 4), (2, 6, 9), (2, 7, 8), (4, 5, 8), (4, 6, 7), (0, 1, 2, 9), (0, 1, 4, 7), (0, 2, 5, 8), (0, 2, 6, 7), (0, 3, 4, 9), (0, 4, 5, 6), (0, 7, 8, 9), (1, 2, 5, 7), (1, 3, 4, 8), (1, 6, 8, 9), (2, 3, 5, 9), (2, 3, 6, 8), (3, 4, 5, 7), (5, 6, 7, 9), (0, 1, 2, 3, 8), (0, 1, 3, 4, 6), (0, 1, 5, 7, 9), (0, 1, 6, 7, 8), (0, 2, 3, 5, 7), (0, 3, 6, 8, 9), (1, 2, 3, 5, 6), (1, 3, 5, 8, 9), (1, 3, 6, 7, 9), (3, 5, 6, 7, 8), (0, 1, 3, 5, 6, 9), (0, 1, 3, 5, 7, 8)]
Be aware that a subset-sum search like this can run into performance problems even with small samples, since the number of combinations grows very quickly.
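As a side note, the all-lengths loop above can also be written more compactly with itertools; this is just a sketch, and subsets_with_sum is a hypothetical helper name:
from itertools import chain, combinations

def subsets_with_sum(s, target):
    # try every subset size from 1 to len(s) and keep those hitting the target
    all_combos = chain.from_iterable(
        combinations(s.index, n) for n in range(1, len(s.index) + 1))
    return [c for c in all_combos if s.iloc[list(c)].sum() == target]

subsets_with_sum(s, 23)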

Using Numpy to find combination of rows in an array such that each column sums to the same value

I'm trying to use numpy to find configurations of rows in a matrix such that summing the selected rows gives the same value in every column. As an example, for the matrix/array
[[0, 0, 0, 1],
 [1, 0, 1, 0],
 [1, 1, 0, 0],
 [0, 1, 0, 0]]
I would like to have the first, second, and last row as output, because
0,0,0,1
1,0,1,0
0,1,0,0 +
-------
= 1,1,1,1
Are there any tools built into numpy that would help me achieve this?
One solution is to enumerate the power set of rows and then check each possible subset of rows for the summation condition. For matrices with a large number of rows, this is likely to be quite slow.
Use the standard itertools recipe for the power set:
from itertools import chain, combinations

def powerset(iterable):
    xs = list(iterable)
    return chain.from_iterable(combinations(xs, n) for n in range(len(xs) + 1))
Then here is a working example with some synthetic data:
In [79]: data
Out[79]:
array([[0, 1, 1],
       [0, 0, 1],
       [1, 0, 1],
       [0, 1, 1],
       [0, 0, 0],
       [0, 1, 0],
       [1, 1, 1],
       [1, 1, 0],
       [1, 1, 1],
       [0, 1, 0]], dtype=int32)

In [80]: def is_constant(array):
    ...:     return (array == array[0]).all()
    ...:

In [81]: solution = []

In [82]: for candidate in powerset(range(len(data))):
    ...:     if candidate and is_constant(data[candidate, :].sum(axis=0)):
    ...:         solution.append(candidate)
    ...:
Which shows, for example:
In [83]: solution
Out[83]:
[(4,),
(6,),
(8,),
(1, 7),
(2, 5),
(2, 9),
(4, 6),
(4, 8),
(6, 8),
(0, 2, 7),
(1, 4, 7),
(1, 6, 7),
(1, 7, 8),
(2, 3, 7),
(2, 4, 5),
(2, 4, 9),
(2, 5, 6),
(2, 5, 8),
(2, 6, 9),
(2, 8, 9),
(4, 6, 8),
(0, 2, 4, 7),
(0, 2, 6, 7),
(0, 2, 7, 8),
(1, 2, 5, 7),
(1, 2, 7, 9),
(1, 4, 6, 7),
(1, 4, 7, 8),
(1, 6, 7, 8),
(2, 3, 4, 7),
(2, 3, 6, 7),
(2, 3, 7, 8),
(2, 4, 5, 6),
(2, 4, 5, 8),
(2, 4, 6, 9),
(2, 4, 8, 9),
(2, 5, 6, 8),
(2, 6, 8, 9),
(0, 2, 4, 6, 7),
(0, 2, 4, 7, 8),
(0, 2, 6, 7, 8),
(1, 2, 4, 5, 7),
(1, 2, 4, 7, 9),
(1, 2, 5, 6, 7),
(1, 2, 5, 7, 8),
(1, 2, 6, 7, 9),
(1, 2, 7, 8, 9),
(1, 4, 6, 7, 8),
(2, 3, 4, 6, 7),
(2, 3, 4, 7, 8),
(2, 3, 6, 7, 8),
(2, 4, 5, 6, 8),
(2, 4, 6, 8, 9),
(0, 2, 4, 6, 7, 8),
(1, 2, 4, 5, 6, 7),
(1, 2, 4, 5, 7, 8),
(1, 2, 4, 6, 7, 9),
(1, 2, 4, 7, 8, 9),
(1, 2, 5, 6, 7, 8),
(1, 2, 6, 7, 8, 9),
(2, 3, 4, 6, 7, 8),
(1, 2, 4, 5, 6, 7, 8),
(1, 2, 4, 6, 7, 8, 9)]
and we can verify the solution for a few of these cases:
In [84]: data[(1, 2, 4, 6, 7, 8, 9), :].sum(axis=0)
Out[84]: array([4, 4, 4])
In [85]: data[(0, 2, 4, 6, 7), :].sum(axis=0)
Out[85]: array([3, 3, 3])
To extend this for more specific use cases, you could use itertools.combinations to produce subsets of only a certain size, like subsets of exactly 2 rows or exactly 3 rows, etc.
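For instance, restricting the search to subsets of exactly three rows might look like this (a sketch reusing data and is_constant from above):
from itertools import combinations

triples = [c for c in combinations(range(len(data)), 3)
           if is_constant(data[list(c), :].sum(axis=0))]
# e.g. (0, 2, 7), (1, 4, 7), ... as in the solution list above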
Or you could just filter out unwanted results (like trivial solutions consisting of one row at a time) from the result set given in my example.
Note that you can simplify the function definition of powerset (the one I use is taken literally from the itertools recipes in the Python docs). Instead of passing an iterable that gets converted to a list, you could just pass an integer and return the final chain.from_iterable result directly, then pass len(data) as the argument of powerset in my example, like this:
from itertools import chain, combinations
def powerset(N):
    """Power set of integers {0, ..., N-1}."""
    xs = list(range(N))
    return chain.from_iterable(combinations(xs, n) for n in range(N + 1))
...
for candidate in powerset(len(data)):
    ...

python: perform a generic multi dimensional loop

Python:
How do I efficiently execute a multidimensional loop when the number of indices to loop over is dynamic?
Assume an array var_size containing the size of each variable
var_size = [ 3, 4, 5 ]
and a function 'loop' which will call 'f(current_state)' for each point.
def f(state): print(state)
loop(var_size, f)
This call would call f in the following order:
f([0, 0, 0])
f([0, 0, 1])
f([0, 0, 2])
f([0, 1, 0])
etc....
You can do this with itertools.product:
>>> print(list(itertools.product(*(range(x) for x in reversed([3,4,5])))))
[(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 2, 0), (0, 2, 1), (0, 2, 2), (0, 3, 0), (0, 3, 1), (0, 3, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (1, 3, 0), (1, 3, 1), (1, 3, 2), (2, 0, 0), (2, 0, 1), (2, 0, 2), (2, 1, 0), (2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 2), (2, 3, 0), (2, 3, 1), (2, 3, 2), (3, 0, 0), (3, 0, 1), (3, 0, 2), (3, 1, 0), (3, 1, 1), (3, 1, 2), (3, 2, 0), (3, 2, 1), (3, 2, 2), (3, 3, 0), (3, 3, 1), (3, 3, 2), (4, 0, 0), (4, 0, 1), (4, 0, 2), (4, 1, 0), (4, 1, 1), (4, 1, 2), (4, 2, 0), (4, 2, 1), (4, 2, 2), (4, 3, 0), (4, 3, 1), (4, 3, 2)]
Note that I'm generating tuples instead of lists, but that's easy to fix if you really need to.
So, to me it looks like you want:
list(map(f, itertools.product(*map(range, reversed(var_size)))))
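Wrapped up as the loop function the question asks for, a minimal sketch along the same lines (this wrapper is my own, keeping the reversed ordering used above; f receives a list, as in the question):
import itertools

def loop(var_size, f):
    # one range per entry of var_size, reversed to match the ordering shown above
    for state in itertools.product(*map(range, reversed(var_size))):
        f(list(state))

loop([3, 4, 5], print)   # [0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 1, 0], ...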
Make a list initialized to 0s, with as many entries as var_size. Treat this list as a set of 'tumblers': increment the last one until it overflows its limit (var_size at the same position in the list). When it overflows, set it to 0, move one position to the left, and repeat the increment/overflow check. Either some tumbler does not overflow (so we go back to the last position and continue) or every entry overflows (we have looped all the way around and are done). After each successful increment, perform the next call.
I don't know if this is optimal or pythonic, but it is O(n).
This code does the job, and it has the advantage of not creating the list. However, it's not that elegant...
Any ideas on how to make this better?
def loop(var_size, f):
    nb = len(var_size)
    state = [0] * nb
    ok = True
    while ok:
        f(state)
        for i in range(nb - 1, -1, -1):
            state[i] = state[i] + 1
            if state[i] < var_size[i]:
                break
            else:
                if i == 0:
                    ok = False
                    break
                else:
                    state[i] = 0
var_size = [3, 4, 5]
def f(state):
    print(state)
loop(var_size, f)
