numpy: Print matrix with random elements, columns and rows - python

I want to print a matrix with a random number of columns (0, 9), a random number of rows (0, 9) and random elements (0, 9),
where (0, 9) means any random number between 0 and 9.

First, randomize your number of columns and rows:
import numpy as np
rows, cols = np.random.randint(10, size = 2)
If you want a matrix of integers just try:
m = np.random.randint(10, size = (rows,cols))
This will output a rows x cols matrix with random integers in the closed interval [0, 9].
If you want a matrix of float numbers just try:
m = np.random.rand(rows,cols) * 9
This will output a rows x cols matrix with random floats in the half-open interval [0, 9), because np.random.rand samples from [0, 1).
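As a hedged aside, newer NumPy versions recommend the Generator API (np.random.default_rng) over the legacy np.random functions. The following is a minimal sketch of the same idea with that API; the seed and the variable names are my own choices and not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility

# Random number of rows and columns, each in the closed interval [0, 9].
rows, cols = rng.integers(0, 10, size=2)

# Integers in [0, 9]: the upper bound of integers() is exclusive by default.
int_matrix = rng.integers(0, 10, size=(rows, cols))

# Floats in the half-open interval [0, 9).
float_matrix = rng.uniform(0, 9, size=(rows, cols))

print(int_matrix)
print(float_matrix)
```

Note that a dimension of 0 is a valid draw here, in which case both matrices are empty.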

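If what you're looking for is a matrix of random size filled with random integers between 0 and 9, here's what you want. The upper bound of np.random.randint is exclusive, so pass 10 to include 9:
import numpy as np
# this randomizes the size of the matrix (each dimension is between 0 and 9).
rows, cols = np.random.randint(10, size=2)
# this prints a matrix filled with random numbers, with the given size.
print(np.random.randint(10, size=(rows, cols)))
Example output (here rows and cols both came out as 8):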
[[1 7 1 4 4 4 4 3]
[1 4 7 3 0 5 3 5]
[6 3 3 7 5 7 6 1]
[3 8 5 7 2 0 1 6]
[5 0 8 5 0 1 5 1]
[1 3 3 7 3 7 5 6]
[3 7 4 1 8 3 7 8]
[8 8 8 5 8 4 7 1]]

Related

Calculating the max and index of max within a section of array

Following the StackOverflow post Elegantly calculate mean of first three values of a list I have tweaked the code to find the maximum.
However, I also require to know the position/index of the max.
So the code below calculates the max value for the first 3 numbers and then the max value for the next 3 numbers and so on.
For example for a list of values [6 3 7 4 6 9 2 6 7 4 3 7 7 2 5 4 1 7 5 1]. The code below takes the first 3 values 6,3,7 and outputs the max as 7 and then for the next 3 values 4,6,9 outputs the value 9 and so on.
But I also want to find which position/index they are at, i.e. 7 is at position 2 and 9 at position 5. The final result would be [2,5,8,11,12,...]. Any ideas on how to calculate the index? Thanks in advance.
import numpy as np
np.random.seed(42)
test_data = np.random.randint(low = 0, high = 10, size = 20)
maxval = [max(test_data[i:i+3]) for i in range(0,len(test_data),3)]
print(test_data)
print(maxval)
output: test_data : [6 3 7 4 6 9 2 6 7 4 3 7 7 2 5 4 1 7 5 1]
output: [7, 9, 7, 7, 7, 7, 5]
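np.argmax gives the position of the maximum within each slice; adding the slice's start offset i gives the index in the full array:
import numpy as np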
np.random.seed(42)
test_data = np.random.randint(low = 0, high = 10, size = 20)
maxval = [max(test_data[i:i+3]) for i in range(0,len(test_data),3)]
index = [(np.argmax(test_data[i: i+3]) + i) for i in range(0,len(test_data),3)]
print(test_data)
print(maxval)
print(index)
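For illustration, here is the same idea run on the literal values from the question, so the result does not depend on the random seed. This is only a sketch; the name chunk is my own. On ties, np.argmax returns the first maximum in the slice.

```python
import numpy as np

test_data = np.array([6, 3, 7, 4, 6, 9, 2, 6, 7, 4, 3, 7, 7, 2, 5, 4, 1, 7, 5, 1])
chunk = 3  # size of each section (the last section may be shorter)

starts = range(0, len(test_data), chunk)
# Position inside the slice plus the slice offset gives the global index.
index = [int(np.argmax(test_data[i:i + chunk])) + i for i in starts]
# The maxima can then be read back from the indices.
maxval = [int(test_data[k]) for k in index]

print(maxval)  # [7, 9, 7, 7, 7, 7, 5]
print(index)   # [2, 5, 8, 11, 12, 17, 18]
```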

Function for scaling numbers in a pd.DF with a list of numbers

Pretty new to Python. I'm trying to create a function which reads a csv file with an ID number, a name, and then N columns of numbers from different tests, and then scales/rounds the numbers so they can be compared to the Danish grading system [-3, 00, 02, 4, 7, 10, 12].
My script below does exactly that, but my function only returns the result for the last row of the DF.
Here's the CSV I use for testing:
StudentID,Name,Assignment1,Assignment2,Assignment3
s123456,Michael Andersen,7,5,4
s123789,Bettina Petersen,12,3,10
s123468,Thomas Nielsen,-3,7,2
s123579,Marie Hansen,10,12,12
s123579,Marie Hansen,10,12,12
s127848, Andreas Nielsen,2,2,2
s120799, Mads Westergaard,12,12,10
It's worth mentioning that I need these functions to be separate for my main script.
I've made a simple function which loads the file using pandas:
import pandas as pd
def dataLoad(filename):
    grades = pd.read_csv(filename)
    return grades
Then I've written this script for the rounding of the numbers:
# Importing modules
import pandas as pd
import numpy as np
#Loading in the function dataLoad
from dataLoad import dataLoad
#Defining my data with the function
grades=dataLoad('Karakterer.csv')
def roundGrade(grades):
    #Dropping the two first columns of the pd.DF
    grades=grades.drop(['StudentID','Name'],axis=1)
    #Making the pd.DF into a numpy array
    sample_grades=np.array(grades)
    #Setting the parameters of the scale to round up to
    grade_Scale = np.array([-3,0,2,4,7,10,12])
    #Defining i, so i get gradually bigger with each cycle
    i=0
    #Making a for loop, which rounds every number in every row of the given array
    for i in range(0,len(grades)):
        grouped = [min(grade_Scale,key=lambda x:abs(grade-x)) for grade in sample_grades[i,:]]
        #Making i 1 time bigger for each cycle
        i=i+1
    return grouped
Tell me if you need more information about the script. Cheers, guys!
To improve performance, use numpy:
# assign the loaded data to df instead of grades, so the values can be assigned back in the last step
df = dataLoad('Karakterer.csv')
grades = df.drop(['StudentID','Name'],axis=1)
grade_Scale = np.array([-3,0,2,4,7,10,12])
print (grades)
   Assignment1  Assignment2  Assignment3
0            7            5            4
1           12            3           10
2           -3            7            2
3           10           12           12
4           10           12           12
5            2            2            2
6           12           12           10
arr = grades.values
a = grade_Scale[np.argmin(np.abs(arr[:,:, None] - grade_Scale[None,:]), axis=2)]
print (a)
[[ 7 4 4]
[12 2 10]
[-3 7 2]
[10 12 12]
[10 12 12]
[ 2 2 2]
[12 12 10]]
Finally, if you need to assign the output back to the columns:
df[grades.columns] = a
print (df)
   StudentID               Name  Assignment1  Assignment2  Assignment3
0    s123456   Michael Andersen            7            4            4
1    s123789   Bettina Petersen           12            2           10
2    s123468     Thomas Nielsen           -3            7            2
3    s123579       Marie Hansen           10           12           12
4    s123579       Marie Hansen           10           12           12
5    s127848    Andreas Nielsen            2            2            2
6    s120799   Mads Westergaard           12           12           10
Explanation:
This is the usual nearest-value lookup, applied to multiple columns at once.
The idea is to compare the 2d array arr, created from all the grade columns of the DataFrame, with the array grade_Scale. Broadcasting makes it possible to create a 3d array of the absolute differences between every grade and every value of the scale:
print (np.abs(arr[:,:, None] - grade_Scale[None,:]))
[[[10 7 5 3 0 3 5]
[ 8 5 3 1 2 5 7]
[ 7 4 2 0 3 6 8]]
[[15 12 10 8 5 2 0]
[ 6 3 1 1 4 7 9]
[13 10 8 6 3 0 2]]
[[ 0 3 5 7 10 13 15]
[10 7 5 3 0 3 5]
[ 5 2 0 2 5 8 10]]
[[13 10 8 6 3 0 2]
[15 12 10 8 5 2 0]
[15 12 10 8 5 2 0]]
[[13 10 8 6 3 0 2]
[15 12 10 8 5 2 0]
[15 12 10 8 5 2 0]]
[[ 5 2 0 2 5 8 10]
[ 5 2 0 2 5 8 10]
[ 5 2 0 2 5 8 10]]
[[15 12 10 8 5 2 0]
[15 12 10 8 5 2 0]
[13 10 8 6 3 0 2]]]
Then get the positions of the minimal values with numpy.argmin along axis=2 (the 3rd axis of the 3d array):
print (np.argmin(np.abs(arr[:,:, None] - grade_Scale[None,:]), axis=2))
[[4 3 3]
[6 2 5]
[0 4 2]
[5 6 6]
[5 6 6]
[2 2 2]
[6 6 5]]
Finally, index grade_Scale with these positions to get the grade values:
print (grade_Scale[np.argmin(np.abs(arr[:,:, None] - grade_Scale[None,:]), axis=2)])
[[ 7 4 4]
[12 2 10]
[-3 7 2]
[10 12 12]
[10 12 12]
[ 2 2 2]
[12 12 10]]
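The three steps above (broadcast the differences, take argmin along the last axis, index the scale) can be wrapped into one small helper. This is a minimal sketch; the function name round_to_scale is hypothetical and not part of the answer above.

```python
import numpy as np

def round_to_scale(values, scale):
    """Map every entry of `values` to the nearest entry of `scale`."""
    values = np.asarray(values)
    scale = np.asarray(scale)
    # A trailing axis on values lets it broadcast against scale:
    # (..., 1) - (k,) -> (..., k) array of absolute differences.
    diffs = np.abs(values[..., None] - scale)
    # argmin over the last axis picks the closest scale entry
    # (the first one when two entries are equally close).
    return scale[np.argmin(diffs, axis=-1)]

grade_scale = np.array([-3, 0, 2, 4, 7, 10, 12])
rounded = round_to_scale([[7, 5, 4], [12, 3, 10]], grade_scale)
print(rounded)
```

Because ties go to the first match, a grade of 3 is mapped to 2 rather than 4, which agrees with the output shown above.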
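You are re-assigning the newly calculated value to grouped in every iteration, so only the last row survives. One way to handle that is to declare a list before the loop and append to it:
import numpy as np
def roundGrade(grades):
    # the scale is defined inside the function so it is self-contained
    grade_Scale = np.array([-3,0,2,4,7,10,12])
    sample_grades = np.array(grades)
    grouped = []
    for i in range(0,len(sample_grades)):
        grouped.append([min(grade_Scale,key=lambda x:abs(grade-x)) for grade in sample_grades[i,:]])
    return grouped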
Now call the function,
roundGrade(np.array([[ 7, 5, 4],
[12, 3, 10]]))
[[7, 4, 4], [12, 2, 10]]

Shuffle "coupled" elements in python array

Let's say I have this array:
np.arange(9)
[0 1 2 3 4 5 6 7 8]
I would like to shuffle the elements with np.random.shuffle but certain numbers have to be in the original order.
I want that 0, 1, 2 have the original order.
I want that 3, 4, 5 have the original order.
And I want that 6, 7, 8 have the original order.
The number of elements in the array would be multiple of 3.
For example, some possible outputs would be:
[ 3 4 5 0 1 2 6 7 8]
[ 0 1 2 6 7 8 3 4 5]
But this one:
[2 1 0 3 4 5 6 7 8]
would not be valid, because 0, 1, 2 are not in the original order.
I think that maybe zip() could be useful here, but I'm not sure.
Short solution using numpy.random.shuffle and numpy.ndarray.flatten functions:
arr = np.arange(9)
arr_reshaped = arr.reshape((3,3)) # reshaping the input array to size 3x3
np.random.shuffle(arr_reshaped)
result = arr_reshaped.flatten()
print(result)
One possible random result:
[3 4 5 0 1 2 6 7 8]
Naive approach:
num_indices = len(array_to_shuffle) // 3 # use normal / in python 2
indices = np.arange(num_indices)
np.random.shuffle(indices)
shuffled_array = np.empty_like(array_to_shuffle)
cur_idx = 0
for idx in indices:
    shuffled_array[cur_idx:cur_idx+3] = array_to_shuffle[idx*3:(idx+1)*3]
    cur_idx += 3
Faster (and cleaner) option:
num_indices = len(array_to_shuffle) // 3 # use normal / in python 2
indices = np.arange(num_indices)
np.random.shuffle(indices)
tmp = array_to_shuffle.reshape([-1,3])
tmp = tmp[indices,:]
shuffled_array = tmp.reshape([-1]) # reshape does not modify tmp in place, so keep the returned array
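The approaches above can be collected into one self-contained helper. This is a sketch under my own assumptions: the name shuffle_blocks and its parameters are hypothetical, and it uses the Generator API instead of np.random.shuffle.

```python
import numpy as np

def shuffle_blocks(arr, block=3, rng=None):
    """Shuffle consecutive blocks of `block` elements, keeping the order inside each block."""
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr)
    if arr.size % block != 0:
        raise ValueError("array length must be a multiple of the block size")
    blocks = arr.reshape(-1, block)           # one row per block
    order = rng.permutation(len(blocks))      # random order of the rows
    return blocks[order].reshape(-1)          # reorder rows, then flatten

result = shuffle_blocks(np.arange(9), rng=np.random.default_rng(0))
print(result)
```

Unlike np.random.shuffle on a reshaped view, this returns a new array and leaves the input untouched.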

numpy: extract multiple subarrays of a position array in an efficient way

I have a 2D coefficient array COEFF with size row x col and a position array POS with size n x 2.
The goal is to create a batched array BAT with size n x (2*l) x (2*l) where l is the half length of subarray.
It looks like this
BAT[i, :, :] = COEFF[POS[i, 1] - l:POS[i, 1] + l, POS[i, 0] - l:POS[i, 0] + l]
It is possible to generate BAT with the sequential code above. However, I'm wondering whether there is an efficient way to construct the BAT array in parallel.
Thanks!
I'm not aware of a perfectly satisfactory solution to mixing advanced indexing and slicing in that way. But the following may be acceptable (assuming that by "parallel" you mean "vectorised"):
import numpy as np
nrow, ncol = 7, 7
n, l = 3, 2
coeff = np.random.randint(0,10, (nrow,ncol))
pos = np.c_[np.random.randint(l, nrow-l+1, (n,)),np.random.randint(l, ncol-l+1, (n,))]
i = (pos[:, :1] + np.arange(-l, l))[:, :, None]
j = (pos[:, 1:] + np.arange(-l, l))[:, None, :]
print(coeff, '\n')
print(pos, '\n')
print(coeff[i, j])
Prints:
# [[7 6 7 6 3 9 9]
# [3 6 8 3 4 8 6]
# [3 7 4 7 4 6 8]
# [0 7 2 3 7 0 4]
# [8 5 2 0 0 1 7]
# [4 6 1 9 4 5 4]
# [1 6 8 3 4 5 0]]
# [[2 2]
# [3 2]
# [2 4]]
# [[[7 6 7 6]
# [3 6 8 3]
# [3 7 4 7]
# [0 7 2 3]]
# [[3 6 8 3]
# [3 7 4 7]
# [0 7 2 3]
# [8 5 2 0]]
# [[7 6 3 9]
# [8 3 4 8]
# [4 7 4 6]
# [2 3 7 0]]]
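Note that the answer above treats the first column of pos as the row index, whereas the formula in the question uses POS[i, 1] for rows and POS[i, 0] for columns. The following sketch follows the question's convention and checks the vectorised result against the sequential loop; the array sizes and the seed are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
nrow, ncol, n, l = 7, 8, 3, 2
coeff = rng.integers(0, 10, (nrow, ncol))
# pos[:, 0] is the column and pos[:, 1] is the row, as in the question.
pos = np.c_[rng.integers(l, ncol - l + 1, n), rng.integers(l, nrow - l + 1, n)]

offsets = np.arange(-l, l)
rows = (pos[:, 1:] + offsets)[:, :, None]   # shape (n, 2l, 1)
cols = (pos[:, :1] + offsets)[:, None, :]   # shape (n, 1, 2l)
bat = coeff[rows, cols]                     # broadcasts to shape (n, 2l, 2l)

# Sequential reference, written exactly like the formula in the question.
bat_loop = np.stack([coeff[y - l:y + l, x - l:x + l] for x, y in pos])

print(np.array_equal(bat, bat_loop))
```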

Python: Shrink/Extend 2D arrays in fractions

There are 2D arrays of numbers, produced as outputs of some numerical processes, with shapes 1x1, 3x3, 5x5, ... that correspond to different resolutions.
At some stage an average needs to be produced, i.e. a single 2D array of shape nxn.
If the outputs all had the same shape, say 11x11, the solution would be obvious:
element_wise_mean_of_all_arrays.
For the problem in this post, however, the arrays have different shapes, so the obvious way does not work!
I thought the kron function might help, but it didn't. For example, if an array has shape 17x17, how can it be made 21x21? The same goes for all the others, from 1x1, 3x3, ..., to build arrays of one constant shape, say 21x21.
The arrays can also be either smaller or bigger than the target shape, e.g. a 31x31 array has to be shrunk to 21x21.
You could imagine the problem as a very common task for images, being shrunk or extended.
What are possible efficient approaches to do the same jobs on 2D arrays, in Python, using numpy, scipy, etc?
Updates:
Here is a slightly optimized version of the accepted answer below:
def resize(X,shape=None):
    if shape is None:
        return X
    m,n = shape
    Y = np.zeros((m,n),dtype=X.dtype)
    k = len(X)  # assumes X is square
    p,q = k/m,k/n
    for i in range(m):
        # int() is needed because floats cannot be used as indices
        Y[i,:] = X[int(i*p),np.int_(np.arange(n)*q)]
    return Y
It works perfectly; however, do you all agree it is the best choice in terms of efficiency? If not, are there any improvements?
# Expanding ---------------------------------
>>> X = np.array([[1,2,3],[4,5,6],[7,8,9]])
[[1 2 3]
[4 5 6]
[7 8 9]]
>>> resize(X,[7,11])
[[1 1 1 1 2 2 2 2 3 3 3]
[1 1 1 1 2 2 2 2 3 3 3]
[1 1 1 1 2 2 2 2 3 3 3]
[4 4 4 4 5 5 5 5 6 6 6]
[4 4 4 4 5 5 5 5 6 6 6]
[7 7 7 7 8 8 8 8 9 9 9]
[7 7 7 7 8 8 8 8 9 9 9]]
# Shrinking ---------------------------------
>>> X = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]
>>> resize(X,(2,2))
[[ 1 3]
[ 9 11]]
Final note: the code above could easily be translated to Fortran for the highest possible performance.
I'm not sure I understand exactly what you are trying to do, but if it is what I think, the simplest way would be:
import numpy
wanted_size = 21
a = numpy.array([[1,2,3],[4,5,6],[7,8,9]])
b = numpy.zeros((wanted_size, wanted_size))
for i in range(wanted_size):
    for j in range(wanted_size):
        # integer division, so the results can be used as indices
        idx1 = i * len(a) // wanted_size
        idx2 = j * len(a) // wanted_size
        b[i][j] = a[idx1][idx2]
You could maybe replace the b[i][j] = a[idx1][idx2] with some custom function like the average of a 3x3 matrix centered in a[idx1][idx2] or some interpolation function.
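The index mapping used in the nested loops above can also be done without Python loops. The following is a minimal sketch of nearest-neighbour resizing with integer index arithmetic and np.ix_; the function name resize_nearest is hypothetical, and it performs no averaging or interpolation.

```python
import numpy as np

def resize_nearest(X, shape):
    """Resize a 2D array to `shape` by picking source elements (no interpolation)."""
    X = np.asarray(X)
    m, n = shape
    # Integer arithmetic maps each target index to a source index
    # without floating-point rounding issues.
    rows = np.arange(m) * X.shape[0] // m
    cols = np.arange(n) * X.shape[1] // n
    # np.ix_ builds an open mesh so that every (row, col) pair is selected.
    return X[np.ix_(rows, cols)]

small = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
expanded = resize_nearest(small, (7, 11))
big = np.arange(1, 17).reshape(4, 4)
shrunk = resize_nearest(big, (2, 2))
print(expanded)
print(shrunk)
```

Since rows and columns are handled separately, this version also works for non-square inputs.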
