I'm having trouble writing this piece of code.
I need to create lists that keep 3 values, then skip the next 3 values, and so on:
The expected output must be something like :
output1 = [1,2,3,7,8,9,13,14,15,....67,68,69]
output2 = [4,5,6,10,11,12...70,71,72]
Any ideas how I can achieve that?
Use two loops -- one for each group of three, and one for each item within that group. For example:
>>> [i*6 + j for i in range(12) for j in range(1, 4)]
[1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20, 21, 25, 26, 27, 31, 32, 33, 37, 38, 39, 43, 44, 45, 49, 50, 51, 55, 56, 57, 61, 62, 63, 67, 68, 69]
>>> [i*6 + j for i in range(12) for j in range(4, 7)]
[4, 5, 6, 10, 11, 12, 16, 17, 18, 22, 23, 24, 28, 29, 30, 34, 35, 36, 40, 41, 42, 46, 47, 48, 52, 53, 54, 58, 59, 60, 64, 65, 66, 70, 71, 72]
Suppose you want total sets of n consecutive values, taking n and skipping n, starting at start. Just change start and the number of sets you need. In the example below the list starts at 1, so the first set is [1, 2, 3], and we want 12 sets of 3 consecutive elements each.
Method 1
n = 3
start = 1
total = 12
# 2*n*i + start is the first element of the i-th set of n values (an arithmetic progression)
print([j for i in range(total) for j in range(2*n*i + start, 2*n*i + start+n)])
# Or
print(sum([list(range(2*n*i + start, 2*n*i + start+n)) for i in range(total)], []))
Method 2 (NumPy performs the operations in C, so it is fast)
import numpy as np
n = 3
start = 1
total = 12
# One liner
print(
(np.arange(start, start + n, step=1)[:, np.newaxis] + np.arange(0, total, 1) * 2*n).transpose().reshape(-1)
)
############## EXPLANATION OF THE ABOVE ONE-LINER ########################
# np.arange start, start+1, ... start + n - 1
first_set = np.arange(start, start + n, step=1)
# [1 2 3]
# np.arange 0, 2*n, 4*n, 6*n, ....
multiple_to_add = np.arange(0, total, 1) * 2*n
print(multiple_to_add)
# broadcast the first set using np.newaxis and add it to each element in multiple_to_add
each_set_as_col = first_set[:, np.newaxis] + multiple_to_add
# [[ 1 7 13 19 25 31 37 43 49 55 61 67]
# [ 2 8 14 20 26 32 38 44 50 56 62 68]
# [ 3 9 15 21 27 33 39 45 51 57 63 69]]
# invert rows and columns
each_set_as_row = each_set_as_col.transpose()
# [[ 1 2 3]
# [ 7 8 9]
# [13 14 15]
# [19 20 21]
# [25 26 27]
# [31 32 33]
# [37 38 39]
# [43 44 45]
# [49 50 51]
# [55 56 57]
# [61 62 63]
# [67 68 69]]
merge_all_set_in_single_row = each_set_as_row.reshape(-1)
# array([ 1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20, 21, 25, 26, 27, 31, 32,
# 33, 37, 38, 39, 43, 44, 45, 49, 50, 51, 55, 56, 57, 61, 62, 63, 67,
# 68, 69])
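As a quick sanity check (not part of the original explanation), you can verify that the NumPy one-liner and the pure-Python comprehension from Method 1 produce the same sequence for the same n, start and total:

import numpy as np

n, start, total = 3, 1, 12
list_version = [j for i in range(total) for j in range(2*n*i + start, 2*n*i + start + n)]
numpy_version = (np.arange(start, start + n)[:, np.newaxis] + np.arange(total) * 2*n).transpose().reshape(-1)
# Both give 1, 2, 3, 7, 8, 9, ..., 67, 68, 69
assert list_version == numpy_version.tolist()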
To make the logic easier to follow, because the Pythonic one-liners can sometimes look like 'magic',
here's a naive algorithm that does the same thing:
output1 = []
output2 = []
for i in range(1, 100):  # change as you like
    if (i-1) % 6 < 3:
        output1.append(i)
    else:
        output2.append(i)
What's going on here:
Initializing two empty lists.
Iterate through integers in a range.
How to tell if i should go to output1 or output2:
I can see that 3 consecutive numbers go to output1, then 3 consecutive to output2.
This tells me I can use the modulo operator % (doing % 6).
The rest is simple logic to get the exact result wanted.
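If you want to wrap that naive logic into a reusable function, here is a minimal sketch (the function and parameter names are just illustrative, not from the answer above):

def split_alternating(limit, size=3):
    """Split 1..limit into two lists, alternating groups of `size` consecutive numbers."""
    first, second = [], []
    for i in range(1, limit + 1):
        # Groups of `size` numbers go alternately to `first` and `second`.
        if (i - 1) % (2 * size) < size:
            first.append(i)
        else:
            second.append(i)
    return first, second

output1, output2 = split_alternating(72)
# output1 -> [1, 2, 3, 7, 8, 9, ..., 67, 68, 69]
# output2 -> [4, 5, 6, 10, 11, 12, ..., 70, 71, 72]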
Related
I am translating code from MATLAB to Python. I need to extract the lower subdiagonal values of a matrix. My attempt in Python seems to extract the same values (the sum is equal), but in a different order. This is a problem because I need to apply corrcoef afterwards.
The original Matlab code is using an array of indices to subset a matrix.
MATLAB code:
values = 1:100;
matrix = reshape(values,[10,10]);
subdiag = find(tril(ones(10),-1));
matrix_subdiag = matrix(subdiag);
subdiag_sum = sum(matrix_subdiag);
disp(matrix_subdiag(1:10))
disp(subdiag_sum)
Output:
2
3
4
5
6
7
8
9
10
13
1530
My attempt in Python
import numpy as np
matrix = np.arange(1,101).reshape(10,10)
matrix_t = matrix.T #to match MATLAB arrangement
matrix_subdiag = matrix_t[np.tril_indices((10), k = -1)]
subdiag_sum = np.sum(matrix_subdiag)
print(matrix_subdiag[0:10], subdiag_sum)
Output:
[2 3 13 4 14 24 5 15 25 35] 1530
How do I get the same order output? Where is my error?
Thank you!
For the sum, use numpy.triu directly on the non-transposed matrix:
S = np.triu(matrix, k=1).sum()
# 1530
For the values, use numpy.triu_indices_from to index the matrix (the result comes out as a flattened array):
idx = matrix[np.triu_indices_from(matrix, k=1)]
output:
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18, 19, 20,
24, 25, 26, 27, 28, 29, 30, 35, 36, 37, 38, 39, 40, 46, 47, 48, 49,
50, 57, 58, 59, 60, 68, 69, 70, 79, 80, 90])
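Putting the two pieces together, here is a small self-contained sketch (just to verify that the ordering matches the MATLAB output shown above):

import numpy as np

matrix = np.arange(1, 101).reshape(10, 10)

# Reading the upper triangle of the row-major matrix row by row gives the same
# order as MATLAB's column-major read of the lower triangle of its matrix.
matrix_subdiag = matrix[np.triu_indices_from(matrix, k=1)]
subdiag_sum = np.triu(matrix, k=1).sum()

print(matrix_subdiag[:10])  # [ 2  3  4  5  6  7  8  9 10 13]
print(subdiag_sum)          # 1530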
I have the following start and end values:
start = 0
end = 54
I need to generate subsets of 4 sequential integers, starting from start and going up to end, with a gap of 20 between subsets. The result should be:
0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51
In this example, we obtained 3 subsets:
0, 1, 2, 3
24, 25, 26, 27
48, 49, 50, 51
How can I do it using numpy or pandas?
If I do r = [i for i in range(0,54,4)], I get [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52].
This should get you what you want:
j = 20
k = 4
result = [split for i in range(0,55, j+k) for split in range(i, k+i)]
print (result)
Output:
[0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
Maybe something like this:
r = [j for i in range(0, 54, 24) for j in range(i, i + 4)]
print(r)
[0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
You can use numpy.arange, which returns an ndarray containing evenly spaced values within a given range:
import numpy as np
r = np.arange(0, 54, 4)
print(r)
Result
[ 0  4  8 12 16 20 24 28 32 36 40 44 48 52]
NumPy approach
You can use np.arange to generate numbers with a step of 20 + 4, where 20 is the gap between subsets and 4 is the length of each sequential sub-array.
start = 0
end = 54
out = np.arange(0, 54, 24) # array([ 0, 24, 48]) These are the starting points
# for each subarray
step = np.tile(np.arange(4), (len(out), 1))
# [[0 1 2 3]
# [0 1 2 3]
# [0 1 2 3]]
res = out[:, None] + step
# array([[ 0, 1, 2, 3],
# [24, 25, 26, 27],
# [48, 49, 50, 51]])
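If you need the result as one flat sequence rather than one row per subset, a small addition to the sketch above does it (shown self-contained here):

import numpy as np

starts = np.arange(0, 54, 24)          # [ 0 24 48], the first element of each subset
res = starts[:, None] + np.arange(4)   # one row per subset
flat = res.ravel()
# array([ 0,  1,  2,  3, 24, 25, 26, 27, 48, 49, 50, 51])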
This can be done with plain Python:
rangeStart = 0
rangeStop = 54
setLen = 4
step = 20
stepTot = step + setLen
a = list( list(i+s for s in range(setLen)) for i in range(rangeStart,rangeStop,stepTot))
In this case you get the subsets as sublists within the list.
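If you prefer the single flat list from the question instead of nested sublists, the same comprehension can be flattened directly (reusing the variables defined above):

flat = [i + s for i in range(rangeStart, rangeStop, stepTot) for s in range(setLen)]
# [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]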
I don't think you need to use numpy or pandas to do what you want. I achieved it with a simple while loop:
num = 0
end = 54
sequence = []
while num <= end:
    sequence.append(num)
    num += 1
    if num % 4 == 0:  # if four numbers have been added
        num += 20
# output: [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
I have a list of numbers, and from it I want to create 3 more lists that contain the maximum, the average, and the 5th largest number. My original list overdraw is a list of blocks: it has 3 sub-blocks of 6 numbers each, i.e. a 6x3 matrix or array.
overdraw:
[[16,13,23,14,33,45],[23,11,54,34,23,76],[22,54,34,43,41,11]]
I know how to calculate the max, average, and 5th largest value in this list. But I want the answer in a specific format: I already know the max, average, and 5th largest value of each block, but I want each of them repeated 4 times. I know all the values:
Max = [45, 76, 54]
Average = [24, 37, 34]
Largest(5th) = [14, 23, 22]
my approach:
overdraw = [[16,13,23,14,33,45],[23,11,54,34,23,76],[22,54,34,43,41,11]]
x = [sorted(block, reverse=True) for block in overdraw] # sort each block in descending order
max = [x[i][0] for i in range(0, len(x))] # for max
largest = [x[i][4] for i in range(0, len(x))] #5th largest
average = [sum(x[i])/len(x[i]) for i in range(0, len(x))] #average
print("max: ", max)
print("5th largest: ", largest)
print("average: ", average)
Running this code gives those values, but I want the output in this format:
Average = [24, 24, 24, 24, 37, 37, 37, 37, 34, 34, 34, 34]
Max = [45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]
Largest(5th) = [14, 14, 14, 14, 23, 23, 23, 23, 22, 22, 22, 22]
As you can see, each average, max, and 5th-largest number is repeated 4 times in its respective list. Can anyone help with this?
What about using pandas.DataFrame.explode?
import pandas as pd
df = pd.DataFrame({
    'OvIdx'       : 3 * [range(4)],
    'Average'     : average,
    'Max'         : max,      # should be renamed/assigned as max_ instead
    'Largest(5th)': largest
}).explode('OvIdx').set_index('OvIdx').astype(int)
print(df)
which shows
Average Max Largest(5th)
OvIdx
0 24 45 14
1 24 45 14
2 24 45 14
3 24 45 14
0 36 76 23
1 36 76 23
2 36 76 23
3 36 76 23
0 34 54 22
1 34 54 22
2 34 54 22
3 34 54 22
From here, you can still do all the calculations you want and/or get a NumPy array by calling df.values.
Following your comment, you can also get your column(s) as individual lists, e.g.
>>> df.Average.tolist()
[24, 24, 24, 24, 36, 36, 36, 36, 34, 34, 34, 34]
>>> df.Max.tolist()
[45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]
>>> df['Largest(5th)'].tolist() # as string key since the name is a little bit exotic
[14, 14, 14, 14, 23, 23, 23, 23, 22, 22, 22, 22]
This approach is admittedly a bit of overkill, though it stays readable.
A solution that returns lists in the format you specified:
import itertools
import numpy as np
n_times = 4
overdraw = [[16,13,23,14,33,45],[23,11,54,34,23,76],[22,54,34,43,41,11]]
y = [sorted(block, reverse=True) for block in overdraw]
maximum = list(itertools.chain(*[[max(x)]*n_times for x in y]))
average = list(itertools.chain(*[[int(round(sum(x)/len(x)))]*n_times for x in y]))
fifth_largest = list(itertools.chain(*[[x[4]]*n_times for x in y]))
print(f"Average = {average}")
print(f"Max = {maximum}")
print(f"Largest(5th): {fifth_largest}")
Outputs:
Average = [24, 24, 24, 24, 37, 37, 37, 37, 34, 34, 34, 34]
Max = [45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]
Largest(5th): [14, 14, 14, 14, 23, 23, 23, 23, 22, 22, 22, 22]
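As an aside (not from the original answers), the "repeat each value 4 times" step can also be written very compactly with NumPy's repeat, assuming the per-block values have already been computed:

import numpy as np

max_per_block = [45, 76, 54]
print(np.repeat(max_per_block, 4).tolist())
# [45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]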
I have an n x n numpy array, and I would like to divide it evenly into square tiles and randomly shuffle these tiles, while retaining the pattern inside each tile.
For example, if I have an array that's size (200,200), I want to be able to divide this into say 16 arrays of size (50,50), or even 64 arrays of size (25,25), and randomly shuffle these, while retaining the same shape of the original array (200,200) and retaining the order of numbers inside of the smaller arrays.
I have looked up specific numpy functions, and I found the numpy.random.shuffle(x) function, but this will randomly shuffle the individual elements of an array. I would only like to shuffle these smaller arrays within the larger array.
Is there any numpy function or quick way that will do this? I'm not sure where to begin.
EDIT: To further clarify exactly what I want:
Let's say I have an input 2D array of shape (10,10) of values:
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
I choose a tile size such that it fits evenly into this array; since this array has shape (10,10), I can either split it into 4 (5,5) tiles or 25 (2,2) tiles. If I choose 4 (5,5) tiles, I want to randomly shuffle these tiles, resulting in an output array that could look like this:
50 51 52 53 54 0 1 2 3 4
60 61 62 63 64 10 11 12 13 14
70 71 72 73 74 20 21 22 23 24
80 81 82 83 84 30 31 32 33 34
90 91 92 93 94 40 41 42 43 44
55 56 57 58 59 5 6 7 8 9
65 66 67 68 69 15 16 17 18 19
75 76 77 78 79 25 26 27 28 29
85 86 87 88 89 35 36 37 38 39
95 96 97 98 99 45 46 47 48 49
Every array (the input array, the output array, and the separate tiles) would be square, so that after the random shuffle the size and dimensions of the main array stay the same (10,10).
Here is my solution using loops:
import numpy as np
arr = np.arange(36).reshape(6,6)
def shuffle_section(arr, n_sections):
    assert arr.shape[0] == arr.shape[1], "arr must be square"
    assert arr.shape[0] % n_sections == 0, "arr size must be divisible into equal n_sections"
    size = arr.shape[0] // n_sections
    new_arr = np.empty_like(arr)
    ## random permutation of the flat section indices
    rand_indxes = np.random.permutation(n_sections * n_sections)
    for i in range(n_sections):
        for j in range(n_sections):
            ## convert the flat random index into the section's row/column index
            rand_i = rand_indxes[i * n_sections + j] // n_sections
            rand_j = rand_indxes[i * n_sections + j] % n_sections
            new_arr[i*size:(i+1)*size, j*size:(j+1)*size] = \
                arr[rand_i*size:(rand_i+1)*size, rand_j*size:(rand_j+1)*size]
    return new_arr

result = shuffle_section(arr, 3)
display(arr)
display(result)
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
array([[ 4, 5, 16, 17, 24, 25],
[10, 11, 22, 23, 30, 31],
[14, 15, 2, 3, 0, 1],
[20, 21, 8, 9, 6, 7],
[26, 27, 12, 13, 28, 29],
[32, 33, 18, 19, 34, 35]])
If you have access to skimage (it comes with Spyder) you could use view_as_blocks:
from skimage.util import view_as_blocks
def shuffle_tiles(arr, m, n):
    a_ = view_as_blocks(arr, (m, n)).reshape(-1, m, n)
    # shuffle works along 1st dimension and in-place
    np.random.shuffle(a_)
    return a_
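Note that the function above returns the shuffled tiles as a stack of shape (number_of_tiles, m, n) rather than an array with the original shape. One possible way to stitch the tiles back into the original layout (essentially what the shape-fixed version in the timing script further down does) is sketched here:

import numpy as np
from skimage.util import view_as_blocks

def shuffle_tiles_keep_shape(arr, m, n):
    blocks = view_as_blocks(arr, (m, n))        # shape (rows, cols, m, n)
    rows, cols = blocks.shape[:2]
    flat = blocks.reshape(-1, m, n).copy()      # independent stack of tiles
    np.random.shuffle(flat)                     # shuffle tiles along the first axis
    # (rows, cols, m, n) -> (rows, m, cols, n) -> (rows*m, cols*n)
    return flat.reshape(rows, cols, m, n).swapaxes(1, 2).reshape(arr.shape)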
We will use np.random.shuffle along with axis permutations to achieve the desired results. There are two interpretations of the task; hence, two solutions.
Shuffle randomly within each block
Elements in each block are randomized, and that same randomized order is maintained in all blocks.
def randomize_tiles_shuffle_within(a, M, N):
    # M,N are the height and width of the blocks
    m,n = a.shape
    b = a.reshape(m//M,M,n//N,N).swapaxes(1,2).reshape(-1,M*N)
    np.random.shuffle(b.T)
    return b.reshape(m//M,n//N,M,N).swapaxes(1,2).reshape(a.shape)
Shuffle randomly blocks w.r.t each other
Blocks are randomized w.r.t. each other, while keeping the order within each block the same as in the original array.
def randomize_tiles_shuffle_blocks(a, M, N):
    m,n = a.shape
    b = a.reshape(m//M,M,n//N,N).swapaxes(1,2).reshape(-1,M*N)
    np.random.shuffle(b)
    return b.reshape(m//M,n//N,M,N).swapaxes(1,2).reshape(a.shape)
Sample runs -
In [47]: a
Out[47]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
In [48]: randomize_tiles_shuffle_within(a, 3, 3)
Out[48]:
array([[ 1, 7, 13, 4, 10, 16],
[14, 8, 12, 17, 11, 15],
[ 0, 6, 2, 3, 9, 5],
[19, 25, 31, 22, 28, 34],
[32, 26, 30, 35, 29, 33],
[18, 24, 20, 21, 27, 23]])
In [49]: randomize_tiles_shuffle_blocks(a, 3, 3)
Out[49]:
array([[ 3, 4, 5, 18, 19, 20],
[ 9, 10, 11, 24, 25, 26],
[15, 16, 17, 30, 31, 32],
[ 0, 1, 2, 21, 22, 23],
[ 6, 7, 8, 27, 28, 29],
[12, 13, 14, 33, 34, 35]])
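If the reshape/swapaxes step in those functions looks opaque, here is a tiny self-contained illustration (a 4x4 array with 2x2 blocks, chosen purely for the demo) of what the first reshape produces before anything is shuffled:

import numpy as np

a = np.arange(16).reshape(4, 4)
M = N = 2
m, n = a.shape

# Group into a (block_row, M, block_col, N) layout, bring the two block axes
# together, then flatten each block into one row.
b = a.reshape(m//M, M, n//N, N).swapaxes(1, 2).reshape(-1, M*N)
print(b)
# [[ 0  1  4  5]
#  [ 2  3  6  7]
#  [ 8  9 12 13]
#  [10 11 14 15]]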
Here is an approach that tries hard to avoid unnecessary copies:
import numpy as np
def f_pp(a,bs):
    i,j = a.shape
    k,l = bs
    esh = i//k,k,j//l,l
    bc = esh[::2]
    sh1,sh2 = np.unravel_index(np.random.permutation(bc[0]*bc[1]),bc)
    ns1,ns2 = np.unravel_index(np.arange(bc[0]*bc[1]),bc)
    out = np.empty_like(a)
    out.reshape(esh)[ns1,:,ns2] = a.reshape(esh)[sh1,:,sh2]
    return out
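For reference, np.unravel_index is what converts the flat block permutation into per-axis block indices; a tiny standalone illustration of that step:

import numpy as np

# Flat block numbers on a 2x2 grid of blocks: 0 -> (0,0), 1 -> (0,1), 2 -> (1,0), 3 -> (1,1)
rows, cols = np.unravel_index([3, 0, 2, 1], (2, 2))
print(rows)  # [1 0 1 0]
print(cols)  # [1 0 0 1]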
Timings:
pp 0.41529153706505895
dv 1.3133141631260514
br 1.6034217830747366
Test script (continued)
# Divakar
def f_dv(a,bs):
    M,N = bs
    m,n = a.shape
    b = a.reshape(m//M,M,n//N,N).swapaxes(1,2).reshape(-1,M*N)
    np.random.shuffle(b)
    return b.reshape(m//M,n//N,M,N).swapaxes(1,2).reshape(a.shape)
from skimage.util import view_as_blocks
# Brenlla shape fixed by pp
def f_br(arr,bs):
    m,n = bs
    a_ = view_as_blocks(arr,(m,n))
    sh = a_.shape
    a_ = a_.reshape(-1,m,n)
    # shuffle works along 1st dimension and in-place
    np.random.shuffle(a_)
    return a_.reshape(sh).swapaxes(1,2).reshape(arr.shape)
ex = np.arange(100000).reshape(1000,100)
bs = 10,10
tst = np.tile(np.arange(np.prod(bs)).reshape(bs),np.floor_divide(ex.shape,bs))
from timeit import timeit
for n,f in list(globals().items()):
    if n.startswith('f_'):
        assert (tst==f(tst,bs)).all()
        print(n[2:],timeit(lambda:f(ex,bs),number=1000))
Here's code to shuffle row order but keep row items exactly as is:
import numpy as np
np.random.seed(0)
#creates a 6x6 array
a = np.random.randint(0,100,(6,6))
a
array([[44, 47, 64, 67, 67, 9],
[83, 21, 36, 87, 70, 88],
[88, 12, 58, 65, 39, 87],
[46, 88, 81, 37, 25, 77],
[72, 9, 20, 80, 69, 79],
[47, 64, 82, 99, 88, 49]])
#creates a number for each row index, 0,1,2,3,4,5
order = np.arange(6)
#shuffle index array
np.random.shuffle(order)
#make new array in shuffled order
shuffled = np.array([a[y] for y in order])
shuffled
array([[46, 88, 81, 37, 25, 77],
[88, 12, 58, 65, 39, 87],
[83, 21, 36, 87, 70, 88],
[47, 64, 82, 99, 88, 49],
[44, 47, 64, 67, 67, 9],
[72, 9, 20, 80, 69, 79]])
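The list comprehension works, but since order is just an integer index array, NumPy fancy indexing gives the same result in one step (reusing a and order from the snippet above):

shuffled = a[order]
# identical to np.array([a[y] for y in order]), without a Python-level loop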
I have two arrays and I want to loop through the second array to return only the sub-arrays whose first element is equal to an element of the first array.
a = [10, 11, 12, 13, 14]
b = [[9, 23, 45, 67, 56, 23, 54], [10, 8, 52, 30, 15, 47, 109], [11, 81,
152, 54, 112, 78, 167], [13, 82, 84, 63, 24, 26, 78], [18, 182, 25, 63, 96,
104, 74]]
I have two different arrays, a and b. I would like to find a way to look through each of the sub-arrays within b, keep those whose first value equals one of the values in array a, and use them to create a new array, c.
The result I am looking for is:
c = [[10, 8, 52, 30, 15, 47, 109],[11, 81, 152, 54, 112, 78, 167],[13, 82, 84, 63, 24, 26, 78]]
Does Python have a tool to do this, similar to Excel's MATCH()?
I tried looping in a manner such as:
for i in a:
    if i in b:
        print(b)
But because there are other elements within each sub-array, this approach is not working. Any help would be greatly appreciated.
Further explanation of the problem:
a = [5, 6, 7, 9, 12]
I read in a excel file using XLRD (b_csv_data):
Start Count Error Constant Result1 Result2 Result3 Result4
5 41 0 45 23 54 66 19
5.4 44 1 21 52 35 6 50
6 16 1 42 95 39 1 13
6.9 50 1 22 71 86 59 97
7 38 1 43 50 47 83 67
8 26 1 29 100 63 15 40
9 46 0 28 85 9 27 81
12 43 0 21 74 78 20 85
Next, I created a loop to read in a select number of rows. For simplicity, the file above only has a few rows; my current file has about 100 rows.
for r in range(1, 7):  # skipping headers and only wanting the first few rows to start
    b_raw = b_csv_data.row_values(r)
    b = np.array(b_raw)  # I created this b numpy array from the line of code above
Use np.isin -
In [8]: b[np.isin(b[:,0],a)]
Out[8]:
array([[ 10,   8,  52,  30,  15,  47, 109],
       [ 11,  81, 152,  54, 112,  78, 167],
       [ 13,  82,  84,  63,  24,  26,  78]])
With sorted a, we can also use np.searchsorted -
a = np.asarray(a)  # a needs to be an array for the fancy indexing below
idx = np.searchsorted(a, b[:,0])
idx[idx==len(a)] = 0
out = b[a[idx] == b[:,0]]
If you have an array with a different number of elements per row, which is essentially an array of lists, you need to modify the slicing part. In that case, get the first elements off like this -
b0 = [bi[0] for bi in b]
Then, use b0 to replace all instances of b[:,0] in earlier posted methods.
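For the ragged case, a small sketch of how b0 would be used together with the list-based b from the question (variable names follow the note above):

import numpy as np

a = [10, 11, 12, 13, 14]
b = [[9, 23, 45, 67, 56, 23, 54], [10, 8, 52, 30, 15, 47, 109],
     [11, 81, 152, 54, 112, 78, 167], [13, 82, 84, 63, 24, 26, 78],
     [18, 182, 25, 63, 96, 104, 74]]

b0 = [bi[0] for bi in b]        # first element of every row
mask = np.isin(b0, a)           # which rows start with a value from a
c = [row for row, keep in zip(b, mask) if keep]
# [[10, 8, 52, 30, 15, 47, 109], [11, 81, 152, 54, 112, 78, 167], [13, 82, 84, 63, 24, 26, 78]]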
Use list comprehension:
c = [l for l in b if l[0] in a]
Output:
[[10, 8, 52, 30, 15, 47, 109], [11, 81, 152, 54, 112, 78, 167], [13, 82, 84, 63, 24, 26, 78]]
If your lists or arrays are considerably large, using numpy.isin can be significantly faster:
b[np.isin(b[:, 0], a), :]
Benchmark:
a = [10, 11, 12, 13, 14]
b = [[9, 23, 45, 67, 56], [10, 8, 52, 30, 15], [11, 81, 152, 54, 112],
[13, 82, 84, 63, 24], [18, 182, 25, 63, 96]]
import timeit
import numpy as np

list_comp, np_isin = [], []
for i in range(1, 100):
    a_test = a * i
    b_test = b * i
    list_comp.append(timeit.timeit('[l for l in b_test if l[0] in a_test]', number=10, globals=globals()))
    a_arr = np.array(a_test)
    b_arr = np.array(b_test)
    np_isin.append(timeit.timeit('b_arr[np.isin(b_arr[:, 0], a_arr), :]', number=10, globals=globals()))
While the exact threshold is not clear-cut, I would recommend using the list comprehension if b is shorter than about 100 rows. Otherwise, NumPy is the way to go.
You are doing it in reverse. It is better to loop through the elements of b and check whether each one belongs in a; if yes, print that element of b. See the answer below.
a = [10, 11, 12, 13, 14]
b = [[9, 23, 45, 67, 56, 23, 54], [10, 8, 52, 30, 15, 47, 109], [11, 81, 152, 54, 112, 78, 167], [13, 82, 84, 63, 24, 26, 78], [18, 182, 25, 63, 96, 104, 74]]
for bb in b:  # if you want to check only whether the first element of each row is in a
    if bb[0] in a:
        print(bb)

for bb in b:  # if you want to check whether any element of a row is in a
    for bbb in bb:
        if bbb in a:
            print(bb)
Output:
[10, 8, 52, 30, 15, 47, 109]
[11, 81, 152, 54, 112, 78, 167]
[13, 82, 84, 63, 24, 26, 78]