I'm trying to write a function that will take as input length L and distance D (both integers > 1) and output all possible sequences that fit the following parameters:
start with the number 1
have L elements
have a distance of 1 to D between each element and the following element
So, for L = 4 and D = 2, the possible sequences would be:
1 2 3 4 (distance of 1 between each consecutive element)
1 2 3 5
1 2 4 5
1 2 4 6
1 3 4 5
1 3 4 6
1 3 5 6
1 3 5 7 (distance of 2 between each consecutive element)
Or, for L = 3 and D = 3, the possible sequences would be:
1 2 3 (distance of 1 between each consecutive element)
1 2 4
1 2 5
1 3 4
1 3 5 (distance of 2 between each consecutive element)
1 3 6
1 4 5
1 4 6
1 4 7 (distance of 3 between each consecutive element)
From hand-coding several of these, the number of possible sequences seems to be D ** (L-1). At first I only needed 2**7, and 128 sequences wasn't that difficult to create by hand. However, I now need 3**7, and possibly even larger amounts, so I need to write a function.
Python is the language I'm learning. Recursion seems to be the way to do it, but I've only practiced on simple recursion, and I'm stuck as to how precisely to write this. As best as I can work out, I need a function that calls itself from within a for-loop. Does this make sense? Directions to similarly structured functions would be greatly appreciated, as well.
You can use itertools.product and itertools.accumulate to achieve your desired function:
import itertools
def f(l, d):
    for sub in itertools.product(range(1, d+1), repeat=l-1):
        yield tuple(itertools.accumulate((1,) + sub))

for l in f(4, 2):
    print(l)
(1, 2, 3, 4)
(1, 2, 3, 5)
(1, 2, 4, 5)
(1, 2, 4, 6)
(1, 3, 4, 5)
(1, 3, 4, 6)
(1, 3, 5, 6)
(1, 3, 5, 7)
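As a quick check, the count matches the D ** (L - 1) formula from the question (reusing f from above):
print(len(list(f(4, 2))))  # 8 == 2 ** 3
print(len(list(f(8, 3))))  # 2187 == 3 ** 7, the case mentioned in the question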
Here is a quick and dirty implementation:
import itertools
import numpy

def gen_seq(D, L):
    for uple in itertools.product(range(1, D+1), repeat=L-1):
        yield tuple(numpy.cumsum((1,) + uple))
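Used with the second example from the question (L = 3, D = 3), this should yield the nine sequences listed there:
for seq in gen_seq(3, 3):
    print(seq)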
The point of recursion is not only writing the code in a recursive form, but orienting your mind in a recursive way. Compare carefully the results for length 3 and length 4 with distance 2.
a. Length 3
1 2 3
1 2 4
1 3 4
1 3 5
b. Length 4
1 2 3 | 4
1 2 3 | 5
1 2 4 | 5
1 2 4 | 6
1 3 4 | 5
1 3 4 | 6
1 3 5 | 6
1 3 5 | 7
In the result for length 4, the left side of | is just the result for length 3. That means the result for length N can be derived from the result for length N - 1.
Assume we already have a procedure solve_part(k-1) that solves the problem for length k - 1; by extending its result to the next length k with next_len(solve_part(k-1), ...), this problem is naturally solved in a recursive way.
import itertools

def flatmap(func, *iterable):
    return itertools.chain.from_iterable(map(func, *iterable))

# extend one partial sequence by every allowed distance 1..D
def next_distance(eachlist, D):
    return map(lambda eachdis: eachlist + [eachlist[-1] + eachdis], range(1, D+1))

# extend every partial sequence of length k-1 to length k
def next_len(L, D):
    return flatmap(lambda eachlist: next_distance(eachlist, D), L)

def solve_it(leng, dis):
    def solve_part(k):
        if k == 0:
            return [[]]
        elif k == 1:
            return [[1]]
        else:
            return next_len(solve_part(k-1), dis)
    return solve_part(leng)

result = solve_it(4, 2)
print([[i for i in j] for j in result])
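For completeness, here is a sketch of the same idea as a plain recursive generator, in the shape the question describes (a function that calls itself inside a for-loop):
def sequences(L, D):
    # base case: the only length-1 sequence is [1]
    if L == 1:
        yield [1]
        return
    # recursive case: extend each shorter sequence by every allowed gap
    for seq in sequences(L - 1, D):
        for gap in range(1, D + 1):
            yield seq + [seq[-1] + gap]

for s in sequences(4, 2):
    print(s)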
Related
I want to do some complex calculations in pandas while referencing previous values (basically I'm calculating row by row). However, the loops take forever, and I wanted to know if there was a faster way. Everybody keeps mentioning using shift but I don't understand how that would even work.
import pandas as pd

df = pd.DataFrame(index=range(500))
df["A"] = 2
df["B"] = 5
df["A"][0] = 1
for i in range(len(df)):
    if i != 0:
        df['A'][i] = (df['A'][i-1] / 3) - df['B'][i-1] + 25
shift helps when each row is a function of the original values of earlier rows, but here each A[i] depends on the freshly computed A[i-1], so this is an expanding calculation. numpy_ext can be used for expanding calculations; see pandas-rolling-apply-using-multiple-columns for reference.
I have also included a simpler calculation to demonstrate the behaviour.
import numpy as np
import numpy_ext as npe
import pandas as pd

df = pd.DataFrame(index=range(5000))
df["A"] = 2
df["B"] = 5
df["A"][0] = 1

# the original loop:
# for i in range(len(df)):
#     if i != 0: df['A'][i] = (df['A'][i-1] / 3) - df['B'][i-1] + 25

# SO example - function of previous values in A and B
def f(A, B):
    r = np.sum(A[:-1]/3) - np.sum(B[:-1] + 25) if len(A) > 1 else A[0]
    return r

# much simpler example, sum of previous values
def g(A):
    return np.sum(A[:-1])

df["AB_combo"] = npe.expanding_apply(f, 1, df["A"].values, df["B"].values)
df["A_running"] = npe.expanding_apply(g, 1, df["A"].values)
print(df.head(10).to_markdown())
Sample output:

|    |   A |   B |   AB_combo |   A_running |
|---:|----:|----:|-----------:|------------:|
|  0 |   1 |   5 |    1       |           0 |
|  1 |   2 |   5 |  -29.6667  |           1 |
|  2 |   2 |   5 |  -59       |           3 |
|  3 |   2 |   5 |  -88.3333  |           5 |
|  4 |   2 |   5 | -117.667   |           7 |
|  5 |   2 |   5 | -147       |           9 |
|  6 |   2 |   5 | -176.333   |          11 |
|  7 |   2 |   5 | -205.667   |          13 |
|  8 |   2 |   5 | -235       |          15 |
|  9 |   2 |   5 | -264.333   |          17 |
In one move we can make an element equal to the second-maximum element, and we have to make all elements equal to the minimum element.
My code is given below; it works fine, but I want to reduce its time complexity.
def No_Books(arr, n):
    arr = sorted(arr)
    steps = 0
    while arr[0] != arr[arr.index(max(arr))]:
        max1 = max(arr)
        count = arr.count(max1)
        scnd_max = arr.index(max1) - 1
        arr[scnd_max + count] = arr[scnd_max]
        steps += 1
    return steps

n = int(input())
arr = [int(x) for x in input().split()]
print(No_Books(arr, n))
Input:
5
4 5 5 2 4
Output:
6
Here the minimum number of moves required is 6.
I'm interpreting the question in the following way:
For each element in the array, there is one and only one operation you're allowed to perform, and that operation is to replace an index's value with the array's current second-largest element.
How many operations are necessary to make the entire array's values equal to the initial minimum value?
With the example input 4 5 5 2 4, we go through the following steps:
Array - step - comments
4 5 5 2 4 - 0 - start
4 4 5 2 4 - 1 - replace the first 5 with 4 (the second-largest value in the array)
4 4 4 2 4 - 2 - replace the second 5 with 4
2 4 4 2 4 - 3 - replace the first 4 with 2
2 2 4 2 4 - 4
2 2 2 2 4 - 5
2 2 2 2 2 - 6
It took 6 steps, so the result is 6.
If that is correct, then I can change your quadratic solution (O(n^2), where n is the size of the array) to a quasilinear solution (O(n + m log m), where n is the size of the array and m is the number of unique values in the array), as follows.
The approach is to notice that each element must be stepped down once for every unique value between it and the minimum. So if we track the count of each unique value, we can determine the number of steps without actually doing any array updates.
In pseudocode:
function determineSteps(array):
    define map from integer to integer, defaulting to 0
    for each value in array:              // Linear in N
        map(value)++
    sort map by key, descending           // M log M
    // largerCount is the number of elements larger than the current second-largest value
    define largerCount, assign 0 to largerCount
    // stepCount is the number of steps required
    define stepCount, assign 0 to stepCount
    for each key in map except the last:  // Linear in M
        largerCount = largerCount + map(key)
        stepCount = stepCount + largerCount
    return stepCount
On your example input:
4 5 5 2 4
Create map { 4: 2, 5: 2, 2: 1 }
Sort map by key, descending: { 5: 2, 4: 2, 2: 1 }
stepCount = 0
largerCount = 0
Examine key = 5, map(key) = 2
largerCount = 0 + 2 = 2
stepCount = 0 + 2 = 2
Examine key = 4, map(key) = 2
largerCount = 2 + 2 = 4
stepCount = 2 + 4 = 6
return 6
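A direct Python translation of that pseudocode might look like this (a sketch; collections.Counter plays the role of the map):
from collections import Counter

def determine_steps(array):
    counts = Counter(array)  # count of each unique value
    larger_count = 0         # number of elements larger than the current value
    step_count = 0
    # visit unique values from largest to smallest, skipping the minimum
    for value in sorted(counts, reverse=True)[:-1]:
        larger_count += counts[value]
        step_count += larger_count
    return step_count

print(determine_steps([4, 5, 5, 2, 4]))  # 6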
Let's say I have this array:
np.arange(9)
[0 1 2 3 4 5 6 7 8]
I would like to shuffle the elements with np.random.shuffle but certain numbers have to be in the original order.
I want that 0, 1, 2 have the original order.
I want that 3, 4, 5 have the original order.
And I want that 6, 7, 8 have the original order.
The number of elements in the array would be multiple of 3.
For example, some possible outputs would be:
[ 3 4 5 0 1 2 6 7 8]
[ 0 1 2 6 7 8 3 4 5]
But this one:
[2 1 0 3 4 5 6 7 8]
would not be valid, because 0, 1, 2 are not in the original order.
I think that maybe zip() could be useful here, but I'm not sure.
Short solution using numpy.random.shuffle and numpy.ndarray.flatten functions:
import numpy as np

arr = np.arange(9)
arr_reshaped = arr.reshape((3, 3))  # reshape the input array to 3x3
np.random.shuffle(arr_reshaped)     # shuffles the rows in place
result = arr_reshaped.flatten()
print(result)
One of the possible random results:
[3 4 5 0 1 2 6 7 8]
Naive approach:
num_indices = len(array_to_shuffle) // 3  # use normal / in Python 2
indices = np.arange(num_indices)
np.random.shuffle(indices)
shuffled_array = np.empty_like(array_to_shuffle)
cur_idx = 0
for idx in indices:
    shuffled_array[cur_idx:cur_idx+3] = array_to_shuffle[idx*3:(idx+1)*3]
    cur_idx += 3
Faster (and cleaner) option:
num_indices = len(array_to_shuffle) // 3  # use normal / in Python 2
indices = np.arange(num_indices)
np.random.shuffle(indices)
tmp = array_to_shuffle.reshape([-1, 3])
tmp = tmp[indices, :]
shuffled_array = tmp.reshape([-1])  # reshape returns a new array; capture the result
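A quick end-to-end check on the array from the question (a sketch):
import numpy as np

array_to_shuffle = np.arange(9)
num_indices = len(array_to_shuffle) // 3
indices = np.arange(num_indices)
np.random.shuffle(indices)
shuffled_array = array_to_shuffle.reshape([-1, 3])[indices].reshape([-1])
print(shuffled_array)  # e.g. [3 4 5 0 1 2 6 7 8]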
This is a typical use case for FEM/FVM equation systems, so it is perhaps of broader interest. Given a triangular mesh, I would like to create a scipy.sparse.csr_matrix. The matrix rows/columns represent values at the nodes of the mesh. The matrix has entries on the main diagonal and wherever two nodes are connected by an edge.
Here's an MWE that first builds a node->edge->cells relationship and then builds the matrix:
import numpy
import meshzoo
from scipy import sparse
nx = 1600
ny = 1000
verts, cells = meshzoo.rectangle(0.0, 1.61, 0.0, 1.0, nx, ny)
n = len(verts)
nds = cells.T
nodes_edge_cells = numpy.stack([nds[[1, 2]], nds[[2, 0]],nds[[0, 1]]], axis=1)
# assign values to each edge (per cell)
alpha = numpy.random.rand(3, len(cells))
vals = numpy.array([
[alpha**2, -alpha],
[-alpha, alpha**2],
])
# Build I, J, V entries for COO matrix
I = []
J = []
V = []
#
V.append(vals[0][0])
V.append(vals[0][1])
V.append(vals[1][0])
V.append(vals[1][1])
#
I.append(nodes_edge_cells[0])
I.append(nodes_edge_cells[0])
I.append(nodes_edge_cells[1])
I.append(nodes_edge_cells[1])
#
J.append(nodes_edge_cells[0])
J.append(nodes_edge_cells[1])
J.append(nodes_edge_cells[0])
J.append(nodes_edge_cells[1])
# Create suitable data for coo_matrix
I = numpy.concatenate(I).flat
J = numpy.concatenate(J).flat
V = numpy.concatenate(V).flat
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
matrix = matrix.tocsr()
With
python -m cProfile -o profile.prof main.py
snakeviz profile.prof
one can create and view a profile of the above.
The method tocsr() takes the lion's share of the runtime here, and this is also true when building alpha is more complex. Consequently, I'm looking for ways to speed this up.
What I've already found:
Due to the structure of the data, the values on the diagonal of the matrix can be summed up in advance, i.e.,
V.append(vals[0, 0, 0] + vals[1, 1, 2])
I.append(nodes_edge_cells[0, 0]) # == nodes_edge_cells[1, 2]
J.append(nodes_edge_cells[0, 0]) # == nodes_edge_cells[1, 2]
This makes I, J, V shorter and thus speeds up tocsr.
Right now, edges are "per cell". I could identify equal edges with each other using numpy.unique, effectively saving about half of I, J, V. However, I found that this too takes some time. (Not surprising.)
One other thought I had was that I could replace the diagonal V, I, J by a simple numpy.add.at if there were a csr_matrix-like data structure where the main diagonal is kept separately. I know this exists in some other software packages, but I couldn't find it in scipy. Correct?
Perhaps there's a sensible way to construct CSR directly?
I would try creating the csr structure directly, especially if you are resorting to np.unique, since this gives you sorted keys, which is half the job done.
I'm assuming you are at the point where you have i, j sorted lexicographically and overlapping v summed using np.add.at on the optional inverse output of np.unique.
Then v and j are already in csr format. All that's left is creating the indptr, which you get with np.searchsorted(i, np.arange(M+1)), where M is the number of rows. You can pass these directly to the sparse.csr_matrix constructor.
Ok, let code speak:
import numpy as np
from scipy import sparse
from timeit import timeit
def tocsr(I, J, E, N):
    n = len(I)
    K = np.empty((n,), dtype=np.int64)
    # pack (J, I) pairs into a single int64 key so one argsort sorts lexicographically
    K.view(np.int32).reshape(n, 2).T[...] = J, I
    S = np.argsort(K)
    KS = K[S]
    steps = np.flatnonzero(np.r_[1, np.diff(KS)])
    ED = np.add.reduceat(E[S], steps)  # sum duplicate entries
    JD, ID = KS[steps].view(np.int32).reshape(-1, 2).T
    ID = np.searchsorted(ID, np.arange(N+1))  # build indptr
    return sparse.csr_matrix((ED, np.array(JD, dtype=int), ID), (N, N))

def viacoo(I, J, E, N):
    return sparse.coo_matrix((E, (I, J)), (N, N)).tocsr()
# testing and timing
# correctness
N = 1000
A = np.random.random((N, N)) < 0.001
I, J = np.where(A)
E = np.random.random((2, len(I)))
D = np.zeros((2,) + A.shape)
D[:, I, J] = E
D2 = tocsr(np.r_[I, I], np.r_[J, J], E.ravel(), N).A
print('correct:', np.allclose(D.sum(axis=0), D2))
# speed
N = 100000
K = 10
I, J = np.random.randint(0, N, (2, K*N))
E = np.random.random((2 * len(I),))
I, J, E = np.r_[I, I, J, J], np.r_[J, J, I, I], np.r_[E, E]
print('N:', N, ' -- nnz (with duplicates):', len(E))
print('direct: ', timeit('f(a,b,c,d)', number=10, globals={'f': tocsr, 'a': I, 'b': J, 'c': E, 'd': N}), 'secs for 10 iterations')
print('via coo:', timeit('f(a,b,c,d)', number=10, globals={'f': viacoo, 'a': I, 'b': J, 'c': E, 'd': N}), 'secs for 10 iterations')
Prints:
correct: True
N: 100000 -- nnz (with duplicates): 4000000
direct: 7.702431229001377 secs for 10 iterations
via coo: 41.813509466010146 secs for 10 iterations
Speedup: 5x
So, in the end this turned out to be the difference between COO's and CSR's sum_duplicates (just like @hpaulj suspected). Thanks to the efforts of everyone involved here (particularly @Paul Panzer), a PR is underway to give tocsr a tremendous speedup.
SciPy's tocsr does a lexsort on (I, J), so it helps to organize the indices in such a way that (I, J) comes out fairly sorted already.
For nx=4, ny=2 in the above example, I and J are
[1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7]
[1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7]
First sorting each row of cells, then sorting the rows by the first column, like
cells = numpy.sort(cells, axis=1)
cells = cells[cells[:, 0].argsort()]
produces
[1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6]
[1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6]
For the number in the original post, sorting cuts down the runtime from about 40 seconds to 8 seconds.
Perhaps an even better ordering can be achieved if the nodes are numbered more appropriately in the first place. I'm thinking of Cuthill-McKee and friends.
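For instance, SciPy exposes reverse Cuthill-McKee in scipy.sparse.csgraph; a sketch of computing a renumbering permutation (assuming `matrix` is the CSR matrix from the MWE above and its sparsity pattern is symmetric):
from scipy.sparse.csgraph import reverse_cuthill_mckee

# permutation that renumbers the nodes for better index locality
perm = reverse_cuthill_mckee(matrix, symmetric_mode=True)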
If I were to type something like this, I would get these values:
print range(1,10)
[1,2,3,4,5,6,7,8,9]
but say if I want to use this same value in a for loop then it would instead start at 0, an example of what I mean:
for r in range(1,10):
    for c in range(r):
        print c,
    print ""
The Output is this:
0
0 1
0 1 2
0 1 2 3
0 1 2 3 4
0 1 2 3 4 5
0 1 2 3 4 5 6
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7 8
Why is 0 here? Shouldn't it start at 1 and end at 9?
You are creating a second range() object in your loop. The default start value is 0.
Each iteration you create a loop over range(r), meaning a range from 0 to r, exclusive, to produce the output numbers. For range(1) that means you get a list with just [0] in it, for range(2) you get [0, 1], etc.
If you wanted to produce ranges from 1 to r inclusive, just add 1 to the number you actually print:
for r in range(1,10):
    for c in range(r):
        print c + 1,
    print ""
or range from 1 to r + 1:
for r in range(1,10):
    for c in range(1, r + 1):
        print c,
    print ""
Both produce your expected output:
>>> for r in range(1,10):
...     for c in range(r):
...         print c + 1,
...     print ""
...
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
1 2 3 4 5 6
1 2 3 4 5 6 7
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9
>>> for r in range(1,10):
...     for c in range(1, r + 1):
...         print c,
...     print ""
...
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
1 2 3 4 5 6
1 2 3 4 5 6 7
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9
If you pass only one argument to the range function, it is treated as the ending value (not included), starting from zero.
If you pass two arguments, the first is the starting value and the second is the ending value (not included).
If you pass three arguments, the first is the starting value, the second is the ending value (not included), and the third is the step value.
You can confirm this with a few trial runs like this:
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] # Default start value 0
>>> range(5, 10)
[5, 6, 7, 8, 9] # Starts from 5
>>> range(5, 10, 2)
[5, 7, 9] # Starts from 5 with a step of 2
Nope.
for r in range(1,10):
    for c in range(r):
        print c,
    print ""
range(), when given only one argument, produces the numbers from 0 up to, but not including, the argument:
>>> range(6)
[0, 1, 2, 3, 4, 5]
And so, on the third iteration of your code, this is what happens:
for r in range(1,10): # r is 3
for c in range(r): # range(3) is [0,1,2]
print c, #you then print each of the range(3), giving the output you observe
print ""
https://docs.python.org/2/library/functions.html#range
From the docs:
The arguments must be plain integers. If the step argument is omitted, it defaults to 1. If the start argument is omitted, it defaults to 0.