I am looking at a very large structured dataset that I would like to make unstructured. Here is the example…
x1 x2 x3 day id
1 5 9 2 A
9 7 9 3 B
3 1 4 1 A
2 6 5 1 B
3 5 8 2 B
3 2 3 2 C
The rows above are presented in a random order. Another way to think of this example is as follows…
x = [[1, 5, 9, 2, "A"],
     [9, 7, 9, 3, "B"],
     [3, 1, 4, 1, "A"],
     [2, 6, 5, 1, "B"],
     [3, 5, 8, 2, "B"],
     [3, 2, 3, 2, "C"]]
Once processed, the desired output is…
[[[3, 1, 4, 1], [1, 5, 9, 2]],
[[2, 6, 5, 1], [3, 5, 8, 2], [9, 7, 9, 3]],
[[3, 2, 3, 2]]],
[[1, A], [1,B], [2,C]]
The first list has the x variables, and the second list has the start date with each identifier.
I have an idea of how to achieve this, but it is in O(n^3). Is there a more efficient method, maybe in O(nlogn)?
Edit: Although this was mentioned in my previous post, I have made it clearer that the rows are presented in random order. I have also removed a redundant column from the code example.
Try:
x = [
    [3, 1, 4, 1, 1, "A"],
    [1, 5, 9, 2, 2, "A"],
    [2, 6, 5, 1, 1, "B"],
    [3, 5, 8, 2, 2, "B"],
    [9, 7, 9, 3, 3, "B"],
    [3, 2, 3, 2, 2, "C"],
]
out = {}
for row in x:
    out.setdefault(row[-1], []).append(row[:-1])
print(list(out.values()) + [[[v[0][-1], k] for k, v in out.items()]])
Prints:
[
[[3, 1, 4, 1, 1], [1, 5, 9, 2, 2]],
[[2, 6, 5, 1, 1], [3, 5, 8, 2, 2], [9, 7, 9, 3, 3]],
[[3, 2, 3, 2, 2]],
[[1, "A"], [1, "B"], [2, "C"]],
]
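The loop above relies on the rows already being ordered by id and day, while the question states that they arrive in random order. A minimal sketch of one way to handle that, assuming the id is the last column and the day is the second-to-last column (the helper name group_rows is hypothetical): sort once by (id, day), which costs O(n log n), then group consecutive rows.

```python
from itertools import groupby
from operator import itemgetter


def group_rows(rows):
    """Group rows by id (last column); order each group by day (second-to-last column)."""
    # One sort by (id, day) dominates the cost: O(n log n).
    ordered = sorted(rows, key=lambda r: (r[-1], r[-2]))
    groups, starts = [], []
    for key, grp in groupby(ordered, key=itemgetter(-1)):
        stripped = [r[:-1] for r in grp]        # drop the id column
        groups.append(stripped)
        starts.append([stripped[0][-1], key])   # earliest day for this id
    return groups, starts


x = [[1, 5, 9, 2, "A"],
     [9, 7, 9, 3, "B"],
     [3, 1, 4, 1, "A"],
     [2, 6, 5, 1, "B"],
     [3, 5, 8, 2, "B"],
     [3, 2, 3, 2, "C"]]

groups, starts = group_rows(x)
print(groups)  # [[[3, 1, 4, 1], [1, 5, 9, 2]], [[2, 6, 5, 1], [3, 5, 8, 2], [9, 7, 9, 3]], [[3, 2, 3, 2]]]
print(starts)  # [[1, 'A'], [1, 'B'], [2, 'C']]
```

Sorting first keeps the grouping step a single linear pass; a dict of lists followed by a per-group sort would be another option with the same overall complexity.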
I am writing a recursive function and storing a value at each step of the recursion.
My working code looks like this:
lst=[]
def after_occurance(ls,l,curr):
    for i in range(l,curr):
        if ls[i]==ls[curr]:
            return False
    return True
def permutate(A,l,r):
    if l==r:
        ans=A.copy()
        print(A,ans)
        # change the commenting of the following 2 lines to see the difference
        lst.append(A)
        #lst.append(ans)
        print(lst)
        return lst
    else:
        for i in range(l,r+1):
            if after_occurance(A,l,i):
                A[i],A[l] = A[l],A[i]
                permutate(A,l+1,r)
                hm[A[l]]=1  # note: hm is not defined anywhere in this excerpt
                A[l],A[i] = A[i],A[l]
            else:
                continue
lst.clear()
A=[1,2,6]
A=sorted(A)
permutate(A,0,len(A)-1)
return lst  # only valid inside a function; the enclosing function is not shown
Toggling between the two commented lines produces the following two outputs: the first with lst.append(ans) active, the second with lst.append(A) active.
[1, 2, 6] [1, 2, 6]
[[1, 2, 6]]
[1, 6, 2] [1, 6, 2]
[[1, 2, 6], [1, 6, 2]]
[2, 1, 6] [2, 1, 6]
[[1, 2, 6], [1, 6, 2], [2, 1, 6]]
[2, 6, 1] [2, 6, 1]
[[1, 2, 6], [1, 6, 2], [2, 1, 6], [2, 6, 1]]
[6, 2, 1] [6, 2, 1]
[[1, 2, 6], [1, 6, 2], [2, 1, 6], [2, 6, 1], [6, 2, 1]]
[6, 1, 2] [6, 1, 2]
[[1, 2, 6], [1, 6, 2], [2, 1, 6], [2, 6, 1], [6, 2, 1], [6, 1, 2]]
[1 2 6 ] [1 6 2 ] [2 1 6 ] [2 6 1 ] [6 1 2 ] [6 2 1 ]
[1, 2, 6] [1, 2, 6]
[[1, 2, 6]]
[1, 6, 2] [1, 6, 2]
[[1, 6, 2], [1, 6, 2]]
[2, 1, 6] [2, 1, 6]
[[2, 1, 6], [2, 1, 6], [2, 1, 6]]
[2, 6, 1] [2, 6, 1]
[[2, 6, 1], [2, 6, 1], [2, 6, 1], [2, 6, 1]]
[6, 2, 1] [6, 2, 1]
[[6, 2, 1], [6, 2, 1], [6, 2, 1], [6, 2, 1], [6, 2, 1]]
[6, 1, 2] [6, 1, 2]
[[6, 1, 2], [6, 1, 2], [6, 1, 2], [6, 1, 2], [6, 1, 2], [6, 1, 2]]
[1 2 6 ] [1 2 6 ] [1 2 6 ] [1 2 6 ] [1 2 6 ] [1 2 6 ]
Can somebody explain this behavior, and what basic rule should I follow when making recursive calls and accessing variables in Python?
So, this is the code you really wanted to post:
def after_occurance(ls, l, curr):
    for i in range(l, curr):
        if ls[i] == ls[curr]:
            return False
    return True
def permutate(A, l, r):
    if l == r:
        ans = A.copy()
        # change the commenting of the following 2 lines to see the difference
        #lst.append(A)
        lst.append(ans)
        return
    else:
        for i in range(l, r + 1):
            if after_occurance(A, l, i):
                A[i],A[l] = A[l],A[i]
                permutate(A, l + 1, r)
                A[l],A[i] = A[i],A[l]
            else:
                continue
lst = []
A = [1,2,6]
A = sorted(A)
permutate(A, 0, len(A) - 1)
print(lst)
The difference comes from appending a copy() of A or just a reference to A.
When you append a reference to A, all the future changes to A show up in lst because the result is lst = [A, A, A, A, A, ....] and so lst cannot be anything apart from a list of the same thing.
When you append a copy() of A, you make a new list which is not changed after the append() and so records the history of how A looked over time.
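The same effect can be seen without any recursion. This is a small illustrative sketch (the names by_reference and by_copy are made up for the example): one list is appended twice by reference and twice as a copy, and it is mutated in between.

```python
a = [1, 2, 6]
by_reference = []   # will hold the same list object twice
by_copy = []        # will hold independent snapshots

for _ in range(2):
    by_reference.append(a)       # stores a reference to a itself
    by_copy.append(a.copy())     # stores a new list with the current contents
    a[0], a[1] = a[1], a[0]      # mutate a in place after appending

print(by_reference)  # [[1, 2, 6], [1, 2, 6]]  both entries are a, which has been swapped back
print(by_copy)       # [[1, 2, 6], [2, 1, 6]]  the history of a at each append
```

A reasonable rule of thumb is to store a copy whenever a mutable object will keep changing after it has been recorded.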
I want to duplicate rows in numpy arrays based on the numeric value of the first entry in each row. So if the value is 1, the row isn't duplicated, but if the value is 3, that row will appear 3 times. I tried np.repeat and np.tile, but I don't know whether they're the right tools for this, and I haven't found a way to do it yet.
Here are my randomly generated arrays :
[[[3 1 3 1 2]
[4 4 4 2 0]
[3 4 4 4 0]
[1 4 3 3 0]]
[[4 2 0 2 1]
[2 1 2 0 3]
[4 1 3 4 3]
[2 3 2 0 0]]]
My goal is to end up with this:
[[[3 1 3 1 2]
[3 1 3 1 2]
[3 1 3 1 2]
[4 4 4 2 0]
[4 4 4 2 0]
[4 4 4 2 0]
[4 4 4 2 0]
[3 4 4 4 0]
[3 4 4 4 0]
[3 4 4 4 0]
[1 4 3 3 0]]
[[4 2 0 2 1]
[4 2 0 2 1]
[4 2 0 2 1]
[4 2 0 2 1]
[2 1 2 0 3]
[2 1 2 0 3]
[4 1 3 4 3]
[4 1 3 4 3]
[4 1 3 4 3]
[4 1 3 4 3]
[2 3 2 0 0]
[2 3 2 0 0]]]
Here is the code I have so far
import numpy as np
array = np.random.randint(5, size=(2, 4, 5))
for a in array:
    for b in a:
        array = np.tile(a, (b[0], 1))
If I print b[0], I can get each value. I want to use those values to duplicate each row.
3
4
3
1
4
2
4
2
So I thought I could loop through those values and multiply each row by its corresponding value to add new rows, but my result only duplicates the second array one time.
[[4 2 0 2 1]
[2 1 2 0 3]
[4 1 3 4 3]
[2 3 2 0 0]
[4 2 0 2 1]
[2 1 2 0 3]
[4 1 3 4 3]
[2 3 2 0 0]]
Where am I going wrong? Should I not use np.tile?
Since there is no guarantee that your original 2D sub-arrays in the 3D source array will have the same shape after performing this operation, they cannot in general be stacked back into a 3D array.
You can get a list of arrays with np.repeat by passing the first column of each 2D array as the number of repeats. It will then repeat each row the corresponding number of times:
import numpy as np
from pprint import pprint
# a is the 3D source array from the question
result = [np.repeat(a[i], a[i, :, 0], axis=0) for i in range(a.shape[0])]
pprint(result)
Output:
[array([[3, 1, 3, 1, 2],
[3, 1, 3, 1, 2],
[3, 1, 3, 1, 2],
[4, 4, 4, 2, 0],
[4, 4, 4, 2, 0],
[4, 4, 4, 2, 0],
[4, 4, 4, 2, 0],
[3, 4, 4, 4, 0],
[3, 4, 4, 4, 0],
[3, 4, 4, 4, 0],
[1, 4, 3, 3, 0]]),
array([[4, 2, 0, 2, 1],
[4, 2, 0, 2, 1],
[4, 2, 0, 2, 1],
[4, 2, 0, 2, 1],
[2, 1, 2, 0, 3],
[2, 1, 2, 0, 3],
[4, 1, 3, 4, 3],
[4, 1, 3, 4, 3],
[4, 1, 3, 4, 3],
[4, 1, 3, 4, 3],
[2, 3, 2, 0, 0],
[2, 3, 2, 0, 0]])]
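To make the difference between the two functions concrete, here is a small sketch on a made-up 3x3 block (not the data from the question). np.tile repeats the whole block as a unit, while np.repeat with axis=0 repeats each row by its own count. Note that a count of 0 removes the row entirely, which matters because the question's random values can include 0.

```python
import numpy as np

# Illustrative block; the first column holds the repeat count for each row.
block = np.array([[2, 7, 7],
                  [0, 8, 8],
                  [1, 9, 9]])

# np.tile copies the entire block: here twice along the row axis.
tiled = np.tile(block, (2, 1))

# np.repeat repeats row i exactly block[i, 0] times.
repeated = np.repeat(block, block[:, 0], axis=0)

print(tiled.shape)  # (6, 3)
print(repeated)     # rows: [2 7 7], [2 7 7], [1 9 9]; the row with count 0 is dropped
```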
Use numpy.repeat with np.arange:
import numpy as np
arr = np.array([[[3, 1, 3, 1, 2],
[4, 4, 4, 2, 0],
[3, 4, 4, 4, 0],
[1, 4, 3, 3, 0]],
[[4, 2, 0, 2, 1],
[2, 1, 2, 0, 3],
[4, 1, 3, 4, 3],
[2, 3, 2, 0, 0]]])
arr2d = np.vstack(arr)
dup = arr2d[np.repeat(np.arange(arr2d.shape[0]), arr2d[:,0])]
np.split(dup, np.cumsum(np.sum(np.split(arr2d[:,0], arr.shape[0]), 1)))[:-1]
Output:
[array([[3, 1, 3, 1, 2],
[3, 1, 3, 1, 2],
[3, 1, 3, 1, 2],
[4, 4, 4, 2, 0],
[4, 4, 4, 2, 0],
[4, 4, 4, 2, 0],
[4, 4, 4, 2, 0],
[3, 4, 4, 4, 0],
[3, 4, 4, 4, 0],
[3, 4, 4, 4, 0],
[1, 4, 3, 3, 0]]),
array([[4, 2, 0, 2, 1],
[4, 2, 0, 2, 1],
[4, 2, 0, 2, 1],
[4, 2, 0, 2, 1],
[2, 1, 2, 0, 3],
[2, 1, 2, 0, 3],
[4, 1, 3, 4, 3],
[4, 1, 3, 4, 3],
[4, 1, 3, 4, 3],
[4, 1, 3, 4, 3],
[2, 3, 2, 0, 0],
[2, 3, 2, 0, 0]])]
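The split indices in the last line of the code above can be hard to read. A possibly simpler way to obtain them, sketched here under the assumption that the repeat counts come from the first column of each 2D block, is to sum the counts per block and take the cumulative sum (the names rows_per_block and parts are only illustrative):

```python
import numpy as np

arr = np.array([[[3, 1, 3, 1, 2],
                 [4, 4, 4, 2, 0],
                 [3, 4, 4, 4, 0],
                 [1, 4, 3, 3, 0]],
                [[4, 2, 0, 2, 1],
                 [2, 1, 2, 0, 3],
                 [4, 1, 3, 4, 3],
                 [2, 3, 2, 0, 0]]])

# Flatten the blocks into one 2D array and repeat every row by its first value.
arr2d = arr.reshape(-1, arr.shape[-1])
dup = np.repeat(arr2d, arr2d[:, 0], axis=0)

# Each block contributes sum(first column) rows after repetition.
rows_per_block = arr[:, :, 0].sum(axis=1)

# Split at the cumulative boundaries; the final boundary is the end of dup.
parts = np.split(dup, np.cumsum(rows_per_block)[:-1])

print([p.shape for p in parts])  # [(11, 5), (12, 5)]
```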
Since the 2D arrays will not always end up with the same shape, the result will usually be a list of arrays, and numpy does not handle such ragged data well.
In that case, you can simply use itertools.repeat with a list comprehension (though it looks quite similar to #gmds' answer).
Given l:
import itertools
l = [[[3, 1, 3, 1, 2], [4, 4, 4, 2, 0], [3, 4, 4, 4, 0], [1, 4, 3, 3, 0]],
[[4, 2, 0, 2, 1], [2, 1, 2, 0, 3], [4, 1, 3, 4, 3], [2, 3, 2, 0, 0]]]
[[j for i in sub for j in itertools.repeat(i, i[0])] for sub in l]
Output:
[[[3, 1, 3, 1, 2],
[3, 1, 3, 1, 2],
[3, 1, 3, 1, 2],
[4, 4, 4, 2, 0],
[4, 4, 4, 2, 0],
[4, 4, 4, 2, 0],
[4, 4, 4, 2, 0],
[3, 4, 4, 4, 0],
[3, 4, 4, 4, 0],
[3, 4, 4, 4, 0],
[1, 4, 3, 3, 0]],
[[4, 2, 0, 2, 1],
[4, 2, 0, 2, 1],
[4, 2, 0, 2, 1],
[4, 2, 0, 2, 1],
[2, 1, 2, 0, 3],
[2, 1, 2, 0, 3],
[4, 1, 3, 4, 3],
[4, 1, 3, 4, 3],
[4, 1, 3, 4, 3],
[4, 1, 3, 4, 3],
[2, 3, 2, 0, 0],
[2, 3, 2, 0, 0]]]
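One thing to keep in mind with itertools.repeat is that it yields the same row object each time, so the duplicated rows in the result are aliases of a single list. If the rows will be modified later, a variant that copies each row may be safer. A sketch, using the hypothetical helper name repeat_rows:

```python
def repeat_rows(nested):
    """Repeat each row row[0] times, giving every duplicate its own list."""
    return [[list(row) for row in sub for _ in range(row[0])] for sub in nested]


sample = [[[2, 5], [1, 6]]]
out = repeat_rows(sample)
print(out)                    # [[[2, 5], [2, 5], [1, 6]]]
print(out[0][0] is out[0][1]) # False: the two copies are independent lists
```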
I found this code on the Internet (Find all possible subsets that sum up to a given number)
def partitions(n):
    if n:
        for subpart in partitions(n-1):
            yield [1] + subpart
            if subpart and (len(subpart) < 2 or subpart[1] > subpart[0]):
                yield [subpart[0] + 1] + subpart[1:]
    else:
        yield []
I was wondering if someone could find a way to extract from the result only the partitions that are the sum of exactly two numbers?
For example: I type in 10. It gives me:
[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 2], [1, 1, 1, 1, 1, 1, 2, 2], [1, 1, 1, 1, 2, 2, 2], [1, 1, 2, 2, 2, 2], [2, 2, 2, 2, 2], [1, 1, 1, 1, 1, 1, 1, 3], [1, 1, 1, 1, 1, 2, 3], [1, 1, 1, 2, 2, 3], [1, 2, 2, 2, 3], [1, 1, 1, 1, 3, 3], [1, 1, 2, 3, 3], [2, 2, 3, 3], [1, 3, 3, 3], [1, 1, 1, 1, 1, 1, 4], [1, 1, 1, 1, 2, 4], [1, 1, 2, 2, 4], [2, 2, 2, 4], [1, 1, 1, 3, 4], [1, 2, 3, 4], [3, 3, 4], [1, 1, 4, 4], [2, 4, 4], [1, 1, 1, 1, 1, 5], [1, 1, 1, 2, 5], [1, 2, 2, 5], [1, 1, 3, 5], [2, 3, 5], [1, 4, 5], [5, 5], [1, 1, 1, 1, 6], [1, 1, 2, 6], [2, 2, 6], [1, 3, 6], [4, 6], [1, 1, 1, 7], [1, 2, 7], [3, 7], [1, 1, 8], [2, 8], [1, 9], [10]]
I would like it to give only:
[[5, 5], [4, 6], [3, 7], [2, 8], [1, 9]]
Since you only want partitions of length 2 (and the products of the elements of each partition), we can use a simpler approach:
#! /usr/bin/env python
''' Find pairs of positive integers that sum to n, and their product '''
def part_prod(n):
    parts = [(i, n-i) for i in range(1, 1 + n//2)]
    print(parts)
    print('\n'.join(["%d * %d = %d" % (u, v, u*v) for u,v in parts]))
def main():
    n = 10
    part_prod(n)
if __name__ == '__main__':
    main()
output
[(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)]
1 * 9 = 9
2 * 8 = 16
3 * 7 = 21
4 * 6 = 24
5 * 5 = 25
You could use itertools.combinations_with_replacement
from itertools import combinations_with_replacement
n = 10
print([x for x in combinations_with_replacement(range(1,n), 2) if sum(x) == n])
[(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)]
Just for fun with list comprehension without using itertools.
num = 10
[[x, y] for x in range(1, num) for y in range(1, num) if x + y == num and x <= y]
# [[1, 9], [2, 8], [3, 7], [4, 6], [5, 5]]
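If the goal is to keep the original partitions generator, its output can also be filtered by length. The sketch below repeats the generator from the question so that it runs on its own; the helper name two_part_partitions is hypothetical. It enumerates every partition before discarding most of them, so the direct approaches above should be much cheaper for large n.

```python
def partitions(n):
    """Generator from the question: yields every partition of n as an ascending list."""
    if n:
        for subpart in partitions(n - 1):
            yield [1] + subpart
            if subpart and (len(subpart) < 2 or subpart[1] > subpart[0]):
                yield [subpart[0] + 1] + subpart[1:]
    else:
        yield []


def two_part_partitions(n):
    """Keep only the partitions that consist of exactly two numbers."""
    return [p for p in partitions(n) if len(p) == 2]


pairs = two_part_partitions(10)
print(sorted(pairs))  # [[1, 9], [2, 8], [3, 7], [4, 6], [5, 5]]
```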
I wanted to know why the solution I have written doesn't work:
def transpose(sudoku):
    n = len(sudoku)
    l_tr = [0]*n
    k = 0
    tr_sudoku = [0]*n
    while k < n:
        tr_sudoku[k] = l_tr
        k = k+1
    i = 0
    for i in range(len(sudoku)):
        j = 0
        for j in range(len(sudoku)):
            tr_sudoku[i][j] = sudoku[j][i]
            print(j, i, tr_sudoku, sudoku[i][j])
        print(tr_sudoku)
    return tr_sudoku
correct = [[1,2,3],[2,3,1],[3,1,2]]
print(transpose(correct))
It outputs the following incorrect solution:
0 0 [[1, 0, 0], [1, 0, 0], [1, 0, 0]] 1
1 0 [[1, 2, 0], [1, 2, 0], [1, 2, 0]] 2
2 0 [[1, 2, 3], [1, 2, 3], [1, 2, 3]] 3
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
0 1 [[2, 2, 3], [2, 2, 3], [2, 2, 3]] 2
1 1 [[2, 3, 3], [2, 3, 3], [2, 3, 3]] 3
2 1 [[2, 3, 1], [2, 3, 1], [2, 3, 1]] 1
[[2, 3, 1], [2, 3, 1], [2, 3, 1]]
0 2 [[3, 3, 1], [3, 3, 1], [3, 3, 1]] 3
1 2 [[3, 1, 1], [3, 1, 1], [3, 1, 1]] 1
2 2 [[3, 1, 2], [3, 1, 2], [3, 1, 2]] 2
[[3, 1, 2], [3, 1, 2], [3, 1, 2]]
[[3, 1, 2], [3, 1, 2], [3, 1, 2]]
Help would be appreciated! Thanks.
The ideal correct solution to:
correct = [[1,2,4],[2,3,4],[3,4,2]]
would be:
tr_correct = [[1,2,3],[2,3,4],[4,4,2]]
You can easily transpose with zip:
def transpose(sudoku):
    return list(map(list, zip(*sudoku)))
Example output:
>>> correct = [[1,2,3],[2,3,1],[3,1,2]]
>>> transpose(correct)
[[1, 2, 3], [2, 3, 1], [3, 1, 2]]
The easiest "manual" way is to switch rows and columns:
def transpose_manually(sudoku):
    # new, independent grid; a slice copy (sudoku[:]) would share the row lists with the input
    output = [[None] * len(sudoku) for _ in sudoku[0]]
    for r in range(len(sudoku)): # each row
        for c in range(len(sudoku[0])): # each column
            output[c][r] = sudoku[r][c] # switch
    return output
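As for why the attempt in the question fails: the while loop assigns the same l_tr list to every position of tr_sudoku, so all three "rows" are one object, which matches the printed output where every row changes at once. A minimal sketch of the effect and of one possible fix, assuming a square grid as in the question (the name transpose_fixed is made up for the example):

```python
# Reproduce the aliasing: three references to one list.
shared = [0] * 3
aliased = [shared for _ in range(3)]
aliased[0][0] = 1
print(aliased)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]


def transpose_fixed(grid):
    """Transpose a square grid, building a distinct list for every row."""
    n = len(grid)
    result = [[0] * n for _ in range(n)]  # n separate row lists
    for i in range(n):
        for j in range(n):
            result[i][j] = grid[j][i]
    return result


print(transpose_fixed([[1, 2, 4], [2, 3, 4], [3, 4, 2]]))
# [[1, 2, 3], [2, 3, 4], [4, 4, 2]]
```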