I have a set of arrays that I want to match to a separate set of arrays, within a certain range. Example code below:
arr1_1 = np.array([468.12, 30.4, 879.74])
arr1_2 = np.array([351.20, 84.98, 514.45])
arr1_3 = np.array([21.456, 89.56, 69.45])
array_1 = np.column_stack((arr1_1, arr1_2, arr1_3))
arr2_1 = np.array([879.12, 48.4, 212.47...])
arr2_2 = np.array([389.06, 80.91, 87.98...])
arr2_3 = np.array([224.566, 98.35, 657.30..])
array_2 = np.column_stack((arr2_1, arr2_2, arr2_3))
The second set of arrays is much larger than the first. Is there any way to match the second set to the first, by column, within a specific range? I.e. any rows from array_2 where the second column's value (arr2_2) is within ±5 of the second column of array_1 (arr1_2)? In this example my ideal output would be
array_match = ([48.4, 80.91, 98.35])
(because 80.91 is within 5 of 84.98)
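A minimal sketch of one way to do this with broadcasting, assuming the tolerance applies to the second column of each array. Note that with the truncated sample values shown, 87.98 is also within 5 of 84.98, so two rows of array_2 qualify:

```python
import numpy as np

array_1 = np.column_stack((
    np.array([468.12, 30.4, 879.74]),
    np.array([351.20, 84.98, 514.45]),
    np.array([21.456, 89.56, 69.45]),
))
array_2 = np.column_stack((
    np.array([879.12, 48.4, 212.47]),
    np.array([389.06, 80.91, 87.98]),
    np.array([224.566, 98.35, 657.30]),
))

# Compare every second-column value in array_2 against every
# second-column value in array_1, within +/- 5.
diff = np.abs(array_2[:, 1][:, None] - array_1[:, 1][None, :])
mask = (diff <= 5).any(axis=1)   # rows of array_2 with at least one match
array_match = array_2[mask]
print(array_match)               # includes the row [48.4, 80.91, 98.35]
```

The `[:, None]` / `[None, :]` pair builds a (rows-of-array_2, rows-of-array_1) difference matrix, so each value is checked against every candidate at once.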
Code:
import numpy as np
ray = [1,22,33,42,51], [61,71,812,92,103], [113,121,132,143,151], [16,172,183,19,201]
ray = np.asarray(ray)
type(ray)
ray[np.ix_([-2:],[3:4])]
I'd like to use index slicing and get a subarray consisting of the last two rows and the 3rd/4th columns. My current code produces an error:
I'd also like to sum each column. What am I doing wrong? I cannot post a picture because I need at least 10 reputation points.
So you want to make a slice of an array. The most straightforward way to do it is... slicing:
slice = ray[-2:,3:]
or if you want it explicitly
slice = ray[-2:,3:5]
See it explained in Understanding slicing
But if you do want to use np.ix_ for some reason, you need
slice = ray[np.ix_([-2,-1],[3,4])]
You can't use : here, because inside np.ix_ the brackets don't make a slice; they construct plain lists, so you must specify explicitly every row number and every column number you want in the result. If there are too many consecutive indices, you can use range:
slice = ray[np.ix_(range(-2, 0),range(3, 5))]
And to sum each column:
slice.sum(0)
0 means you want to reduce the 0th dimension (rows) by summation and keep other dimensions (columns in this case).
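Putting it together with the array from the question, a quick check of both the slice and the column sums:

```python
import numpy as np

ray = np.asarray([[1, 22, 33, 42, 51],
                  [61, 71, 812, 92, 103],
                  [113, 121, 132, 143, 151],
                  [16, 172, 183, 19, 201]])

sub = ray[-2:, 3:5]   # last two rows, 4th and 5th columns
print(sub)            # [[143 151]
                      #  [ 19 201]]
print(sub.sum(0))     # [162 352] -- each column summed over the rows
```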
I am using the following to compare several strings to each other. It's the fastest method I've been able to devise, but it results in a very large 2D array, which I can look at and see what I want. Ideally, I would like to set a threshold and pull the index(es) of each value over that number. To make matters more complicated, I don't want the index comparing a string to itself, but it's possible the string is duplicated elsewhere, so I'd want to know if that's the case, and I can't just ignore 1s.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
texts = sql.get_corpus()
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(texts)
similarity = cosine_similarity(vectors)
sql.get_corpus() returns a list of strings, currently around 1600 strings.
Is what I want possible? I've tried comparing each of the 1.4M combinations to each other using Levenshtein, which works, but it takes 2.5 hours vs half above. I've also tried vectors with spaCy, which takes days.
I'm not entirely sure I read your post correctly, but I believe this should get you started:
import numpy as np
# randomly distributed data we want to filter
data = np.random.rand(5, 5)
# get index of all values above a threshold
threshold = 0.5
above_threshold = data > threshold
# I am assuming your matrix has all string comparisons to
# itself on the diagonal
not_ident = np.identity(5) == 0.
# [edit: to prevent duplicate comparisons, use this instead of not_ident]
#upper_only = np.triu(np.ones((5,5)) - np.identity(5))
# 2D array, True when criteria met
result = above_threshold * not_ident
print(result)
# original shape, but 0 in place of all values not matching above criteria
values_orig_shape = data * result
print(values_orig_shape)
# all values that meet criteria, as a 1D array
values = data[result]
print(values)
# indices of all values that meet criteria (in same order as values array)
indices = [index for index,value in np.ndenumerate(result) if value]
print(indices)
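As a variation on the above, `np.argwhere` combined with an upper-triangle mask yields each qualifying pair exactly once and skips the diagonal. A sketch with random stand-in data rather than the actual similarity matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
sim = rng.random((5, 5))
sim = (sim + sim.T) / 2   # symmetric, like cosine_similarity output

threshold = 0.5
# k=1 excludes the diagonal (no string compared to itself),
# and the upper triangle keeps only one copy of each (i, j) pair.
mask = np.triu(np.ones(sim.shape, dtype=bool), k=1) & (sim > threshold)
pairs = np.argwhere(mask)  # array of [i, j] index pairs
for i, j in pairs:
    print(i, j, sim[i, j])
```

Each row of `pairs` indexes directly back into the corpus, so `texts[i]` and `texts[j]` are the two strings whose similarity exceeded the threshold.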
I have a space of zeros with a variable dimension and an array of ones with a variable dimension, for instance:
import numpy
space = numpy.zeros((1000,5))
a = numpy.ones((150))
I would like to insert the ones of the array inside the matrix in order that those ones will be homogeneously distributed inside the matrix.
You can use numpy.linspace to obtain the indices.
It's not obvious if you'd like to assign a slice of five ones to every index location or just assign to the first index of the slice. This is how both of these would work:
space = numpy.zeros((1000,5))
a = numpy.ones((150, 5))
b = numpy.ones((150,))
index = numpy.rint(numpy.linspace(start=0, stop=999, num=150)).astype(int)
# This would assign five ones to every location
space[index] = a
# This would assign a one to the first element at every location
space[index, 0] = b
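As a quick sanity check of the first variant (using the same shapes as above): with 150 points spread over 1000 rows the spacing is about 6.7, so all 150 rounded indices are distinct and exactly 150 rows of five ones get written:

```python
import numpy as np

space = np.zeros((1000, 5))
a = np.ones((150, 5))
index = np.rint(np.linspace(start=0, stop=999, num=150)).astype(int)

space[index] = a
# All 150 indices are distinct, so 150 * 5 ones were assigned.
print(len(np.unique(index)))  # 150
print(space.sum())            # 750.0
```

If the spacing were below 1 (more ones than rows), `linspace` would produce duplicate rounded indices and some assignments would overwrite each other, so it is worth keeping the uniqueness check.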
I'm new to Python and Stack Overflow, and I'm working on a project that deals with manually created arrays of different lengths.
import glob
import numpy as np
import pandas as pd

path = '/home/Documents/Noise'
files = glob.glob(path + '/*.txt')
data_noise = []
for file in files:
    df = pd.read_csv(file, delimiter=',', header=None)
    df = df.values
    m, n = df.shape
    df = np.reshape(df, m)
    data_noise.append(df)
I create a list data_noise to store numpy arrays, each array has different length m. I want to select subarrays from each array so that they have same length, say, 100. But instead of selecting the first 100 elements or the last 100 in each array, I want to evenly space and select in each array.
For example, for a length 300 array, I need elements indexed by 0,3,6,9,... and for a length 500 array, I need elements indexed by 0,5,10,15,...
How do I modify my code to do that?
It's not numpy, but in plain Python this should work:
step = m // 100
sub_list = df[::step][:100]
Note the step must be an integer and goes the other way round: for a length-300 array the step is 3, for length 500 it is 5. The final [:100] trims the extra elements you get when m is not an exact multiple of 100.
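Alternatively, numpy can compute the 100 indices directly. A sketch (assuming each array in data_noise is 1-D with length at least 100), which also handles lengths that are not exact multiples of 100:

```python
import numpy as np

def evenly_sample(arr, k=100):
    """Pick k evenly spaced elements from a 1-D array."""
    idx = np.linspace(0, len(arr) - 1, num=k).astype(int)
    return arr[idx]

data = np.arange(300)
sub = evenly_sample(data)
print(len(sub))   # 100
print(sub[:4])    # [0 3 6 9]
```

Applied to the loop in the question, `data_noise.append(evenly_sample(df))` would make every stored array exactly 100 elements long.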
My problem involves different moving objects, and we calculate the distance between these objects in different time frames.
I have an nd array A with shape (a, b) which stores distances; a is the number of frames and b is the number of coordinates on which this distance is calculated.
I have a list L with the names of these objects. It has length b.
I want to find where the distance value is 1, and then locate the name at that index in list L (which has the same index). I wrote the following:
A=[[nd array]]
L=[list of names]
list_to_array=np.array(L)
array_of_names_meeting_criteria=list_to_array[np.where(A==1)[1]]
The problem is I am not getting names per frame. I want this array to be split frame-wise so I get shape (a, x), where a is the number of frames and for each frame I have x names.
sample case
A = np.array([[1,2,2,6],[3,4,5,1],[3,1,17,4],[2,3,1,5]])
L=[('cat','dog'),('lion','elephant'),('man','women'),('fish','shark')]
list_to_array = np.array(L)
array_of_names_meeting_criteria=list_to_array[np.where(A==1)[1]]
I should get the below:
[['cat','dog'],['fish','shark'],['lion','elephant'],['man','women']]
I just did some minor edits from your code and here's the result:
A = np.array([[1,2,2,6],[3,4,5,1],[3,1,17,4],[2,3,1,5]])
L = [('cat','dog'),('lion','elephant'),('man','women'),('fish','shark')]
list_to_array = np.array(L)
array_of_names_meeting_criteria = list_to_array[np.where(A==1)[1]]
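To actually get the matches grouped by frame, one option is a sketch building on the code above: use both outputs of np.where and collect the names row by row. The result is a Python list rather than an array, since in general frames can match different numbers of names:

```python
import numpy as np

A = np.array([[1, 2, 2, 6], [3, 4, 5, 1], [3, 1, 17, 4], [2, 3, 1, 5]])
L = [('cat', 'dog'), ('lion', 'elephant'), ('man', 'women'), ('fish', 'shark')]
names = np.array(L)

rows, cols = np.where(A == 1)
# Group the matching column indices by the frame (row) they occur in.
per_frame = [list(names[cols[rows == r]].ravel()) for r in range(A.shape[0])]
print(per_frame)
# [['cat', 'dog'], ['fish', 'shark'], ['lion', 'elephant'], ['man', 'women']]
```

Here frame 0 matches at column 0, frame 1 at column 3, and so on, which reproduces the expected output from the question.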