Removing indexes to match array dimensions - python

I have two arrays (x, y) with different values, and I am trying to find the median of y for the values where x < 100. My problem is that I have filtered out some values in array y, so the arrays no longer have the same shape. Is there a way to remove from array x the same indexes that I removed from array y?
For example, both start with shape (24, 36), but after the filtering array y is (22, 32) and x is still (24, 36). How can I remove the same indexes? Let's say I removed indexes 4, 7 and 9, 14. How can I remove those exact same ones in array x?
My code if needed. data_mg is y and data_dg is x.
data_mg = image_data_mg[0].data[0:x, 0:y].astype('float')
data_err = image_data_err[0].data[0:x, 0:y].astype('float')
data_dg = image_data_dg[0].data[0:x, 0:y].astype('float')
data_mg[data_mg == 0] = np.nan
data_err[data_err == 0] = np.nan
data_dg[data_dg == 0] = np.nan
data_mg = data_mg[data_mg/data_err > 2]
data_dg = np.ndarray.flatten(data_dg)
data_dg = data_dg[data_mg]
data_mg = np.ndarray.flatten(data_mg)
data_mg = data_mg[np.logical_not(np.isnan(data_mg))]
data_dg = np.ndarray.flatten(data_dg)
data_dg = data_dg[np.logical_not(np.isnan(data_dg))]
b = np.where(np.array(data_dg > 100))
median = np.median(data_mg[b])
print('Flux median at dispersion > 100 km/s is ' + str(median))
a = np.where(data_dg <= 100)
median1 = np.median(data_mg[a])
print('Flux median at dispersion <= 100 km/s is ' + str(median1))
IndexError: arrays used as indices must be of integer (or boolean) type, line 10

It looks like data_mg and data_dg start with the same shape, and you use boolean indexing to keep the values that are not NaN in each. The trouble is that different values are NaN in each array. I would suggest making a combined index that you can use for both arrays.
data_mg = np.ndarray.flatten(data_mg)
data_dg = np.ndarray.flatten(data_dg)
ix_mg = np.logical_not(np.isnan(data_mg))
ix_dg = np.logical_not(np.isnan(data_dg))
ix_combined = np.logical_and(ix_mg, ix_dg)
data_mg = data_mg[ix_combined]
data_dg = data_dg[ix_combined]

First, you could just do the same indexing operation on each array so they'll be of the same shape. I believe that would look something like this:
idx = data_mg / data_err > 2
data_mg = data_mg[idx]
data_dg = data_dg[idx]
But the error you're getting may not be due to this. It looks like your error is coming from the line:
data_dg = data_dg[data_mg]
Giving the error:
IndexError: arrays used as indices must be of integer (or boolean) type, line 10
I'm not sure what your intent is here, so I'm not sure what to recommend. If this is you trying to get them to be the same shape, the lines I included above should do that for you.
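To make the idea in both answers concrete, here is a minimal sketch that builds a single boolean mask from every condition and applies it to both arrays, so they always stay the same length. The small arrays are made-up stand-ins for data_mg, data_err and data_dg, not the real image data.

```python
import numpy as np

# Hypothetical stand-ins for data_mg (flux), data_err and data_dg (dispersion).
data_mg = np.array([[5.0, 0.0, 8.0], [2.0, 9.0, 4.0]])
data_err = np.array([[1.0, 1.0, 2.0], [4.0, 3.0, 0.0]])
data_dg = np.array([[80.0, 120.0, 150.0], [90.0, 0.0, 60.0]])

# Treat zeros as missing values, as in the question.
for arr in (data_mg, data_err, data_dg):
    arr[arr == 0] = np.nan

# One mask: signal-to-noise above 2 and no NaN in either array.
# Comparisons against NaN evaluate to False, so NaN ratios drop out too.
with np.errstate(invalid="ignore"):
    good = (data_mg / data_err > 2) & ~np.isnan(data_mg) & ~np.isnan(data_dg)

# Applying the same mask to both arrays keeps them aligned (and flattens them).
mg = data_mg[good]
dg = data_dg[good]

median_high = np.median(mg[dg > 100])
median_low = np.median(mg[dg <= 100])
print('Flux median at dispersion > 100 km/s is', median_high)
print('Flux median at dispersion <= 100 km/s is', median_low)
```

Because the mask is boolean, it can be used directly as an index, which avoids the IndexError raised when a float array such as data_mg is used as the index.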

Related

Compare a 2D numpy array to check whether the data lies within a range and, if so, append it to a new group

I want to compare each point of a 2D numpy array against a single x_min and x_max (and the same for y), but I don't understand how to write the loop for this comparison or how to use numpy.where and numpy.logical_and.
import numpy as np
group_count = 0
xy = np.array([[116,2306],[118,2307],[126,1517]])
idx = np.array([[0,0],[0,1]])
group1 = []
for l in xy:
for i in idx:
for j in range(1):
x_temp = xy[idx[i][j]]
x1 = x_temp[0][0]
y1 = x_temp[0][1]
x1_max = x1 + 60
x1_min = x1 - 60
y1_max = y1 +60
y1_min = y1 - 60
range_grp_1 = [x1_max,x1_min,y1_min,y1_max]
grp1 = [x1,y1]
grp_1 = np.array(grp1)
#print(grp_1,range_grp_1)
if group_count != 0:
print('group count greater than 0')
if np.where((l[i]>x1_min) and (l[i]<x1_max) and (l[i]>y1_min) and (l[i]<y1_max)):
print(l[i])
else:
group1.append(grp_1)
group_count+=1
Error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I am posting new code here.
Since you said you have a lot of points and the ranges seem to vary, I suggest wrapping the check inside a function, so you can call it as many times as you need, passing the coordinates to be evaluated.
# function to return max and min of list of coordinates
def min_max(coords):
    xy = np.array(coords)
    xs = [] #save 'x' values
    for i in range(len(xy)):
        x = [xy[i][0]]
        xs.append(x)
    ys = [] #save 'y' values
    for i in range(len(xy)):
        y = [xy[i][1]]
        ys.append(y)
    rangex = []
    rangey = []
    for x in min(xs): #get min 'x'
        minx = x - 60
        rangex.append(minx)
        maxx = x + 60
        rangex.append(maxx)
    for y in min(ys): #get min 'y'
        miny = y - 60
        rangey.append(miny)
        maxy = y + 60
        rangey.append(maxy)
    return [rangex,rangey]
If you pass the same coordinates you posted the first time, it returns
Execution #1:
coords = [[116,2306],[118,2307],[126,1517]]
my_ranges = min_max(coords)
print(my_ranges)
#[[56, 176], [1457, 1577]]
Or if you pass just the new range you gave me:
Execution #2:
new_coord = [[518,2007]]#pay attention to the format
my_ranges = min_max(new_coord)
print(my_ranges)
#[[458, 578], [1947, 2067]]
And the last part of the code, which separates the coordinates into groups depending on whether they belong to the evaluated range or not.
#changed again:
group1 = [] #coords in the interval
group2 = [] #coords out of the interval
for l in dynCoords:
    pair = [l[0],l[1]]
    if l[0] in range(my_ranges[0][0],my_ranges[0][1]) and l[1] in range(my_ranges[1][0],my_ranges[1][1]):
        group1.append(pair)
    else:
        group2.append(pair)
#new line appended
my_ranges = min_max(group2)
With the original coordinates [[116,2306],[118,2307],[126,1517]], the pairs [118,2307] and [126,1517] fell outside the range and went to group2. With the new line appended, they were used to change the minimum threshold again; now it goes from 56-2246 for xs and 176-2366 for ys. If you then use group2 as dynCoords (dynCoords = group2) and execute what is under the label #changed again, you get group1: [[116, 2306], [118, 2307]], and group2 ends up empty.
I think you can make a function for that part of the code too, and run it as many times as you need to process your whole set of coordinates.
You are going in the right direction.
Suppose we have 1st element of the array: [116 1517]
x_min = 116-60 (56)
x_max = 116+60 (176)
y_min = 1517-60 (1457)
y_max = 1517+60 (1577)
Now the other coordinates are compared to these values.
For example, now we have array = [146 1568]
then
x = 146, y = 1568
if x>x_min and x<x_max and y<y_max and y>y_min:
    grp.append(array)
else:
    print('not in range')
So I want this type of output:
146>56 and 146<176 and 1568>1457 and 1568<1577
This is true, so the array would be appended to the new group.
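The ValueError in the question comes from using Python's `and` on arrays: `and` needs a single True/False, while an array comparison yields one value per element. A minimal sketch of the range check with element-wise `&` instead, assuming the reference point is the first row and the half-width is 60 as in the example above (the variable names here are illustrative only):

```python
import numpy as np

# Hypothetical points; the first row is used as the reference point.
points = np.array([[116, 1517], [146, 1568], [118, 2307]])
ref = points[0]
half_width = 60

x_min, y_min = ref - half_width
x_max, y_max = ref + half_width

x, y = points[:, 0], points[:, 1]

# Element-wise comparison: one True/False per point, combined with & (not `and`).
in_range = (x > x_min) & (x < x_max) & (y > y_min) & (y < y_max)

group = points[in_range]     # points inside the window
others = points[~in_range]   # points outside the window
print(group)
print(others)
```

The points left in `others` could then be passed through the same steps again with a new reference point, which is the repeated grouping the answer describes.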

Removing Numbers from Column that Don't Appear in Array

I took a large data set, removed any numbers in a specific column that are not within 2 SD of the mean, and created an array. Now I want to remove from the column any numbers that are not in that array, without messing up the index. Preferably, I would like to convert any non-present numbers to NaN.
Code used to remove values outside of 2 SD:
pupil_area_array = numpy.array(part_data['pupil_area'])
mean = numpy.mean(part_data['pupil_area'], axis=0)
sd = numpy.std(part_data['pupil_area'], axis=0)
final_list = [x for x in part_data['pupil_area'] if (x > mean - 2 * sd)]
final_list = [x for x in final_list if (x < mean + 2 * sd)]
print(final_list)
If you are not restricted to using a generator, you should be able to use map() https://www.geeksforgeeks.org/python-map-function/:
def filter_sd(value):
    # keep the value only if it lies within 2 SD of the mean
    if mean - 2 * sd < value < mean + 2 * sd:
        return value
    return None #or return float('nan')
final = list(map(filter_sd, part_data['pupil_area']))
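As an alternative that keeps every position in place, here is a rough sketch using a boolean mask and `numpy.where`, so that out-of-range values become NaN instead of being dropped. The sample values are invented for illustration and stand in for part_data['pupil_area'].

```python
import numpy as np

# Hypothetical sample standing in for part_data['pupil_area'].
pupil_area = np.array([10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 50.0, 10.2, 9.8, 10.1])

mean = pupil_area.mean()
sd = pupil_area.std()

# True where the value lies within 2 SD of the mean.
within = np.abs(pupil_area - mean) < 2 * sd

# Same length as the input: values outside the band are replaced by NaN.
cleaned = np.where(within, pupil_area, np.nan)
print(cleaned)
```

If part_data is a pandas DataFrame (an assumption, since the question does not say), `part_data['pupil_area'].where(mask)` with an equivalent boolean mask should behave similarly while preserving the index.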

numpy - could not broadcast input unknown error

I am attempting to run the following code, but am getting the following error:
line 71, in cross_validation
folds[index] = numpy.vstack((folds[index], dataset[jindex]))
ValueError: could not broadcast input array from shape (2,8) into shape (8)
What is interesting is that when I print out the shapes of the two items I am trying to use in the vstack, they have the same shape (8,)
I am trying to determine why this line of the function is failing. Any advice would be greatly appreciated.
import numpy
def csv_to_array(file):
# Open the file, and load it in delimiting on the ',' for a comma separated value file
data = open(file, 'r')
data = numpy.loadtxt(data, delimiter=',')
# Loop through the data in the array
for index in range(len(data)):
# Utilize a try catch to try and convert to float, if it can't convert to float, converts to 0
try:
data[index] = [float(x) for x in data[index]]
except Exception:
data[index] = 0
except ValueError:
data[index] = 0
# Return the now type-formatted data
return data
def create_folds(dataset):
length = len(dataset)
folds = numpy.empty_like(dataset)
for index in range(5):
tempArray = numpy.ndarray(shape=(1, length))
numpy.append(folds, tempArray)
temp_class_array = numpy.ndarray(shape=(1,1))
numpy.append(folds, temp_class_array)
return folds
def class_distribution(dataset):
dataset = numpy.asarray(dataset)
num_total_rows = dataset.shape[0]
num_columns = dataset.shape[1]
classes = dataset[:,num_columns-1]
classes = numpy.unique(classes)
class_weights = []
for aclass in classes:
total = 0
weight = 0
for row in dataset:
if numpy.array_equal(aclass, row[-1]):
total = total + 1
else:
continue
weight = float((total/num_total_rows))
class_weights.append(weight)
class_weights = numpy.asarray(class_weights)
return classes, class_weights
def cross_validation(dataset):
classes, class_weights = class_distribution(dataset)
total_length = len(dataset)
folds = create_folds(dataset)
added_so_far = 0
for a_class, a_class_weight in zip(classes, class_weights):
amt_for_fold = float(((a_class_weight * total_length) / 5)-1)
for index in range(0,10,2):
added = 0
for jindex in range(len(classes)):
if added >= amt_for_fold:
break
if classes[jindex] == a_class:
print(folds[index].shape)
print(dataset[jindex].shape)
folds[index] = numpy.vstack((folds[index], dataset[jindex]))
# print(folds)
folds[index + 1] = numpy.vstack((folds[index + 1], [classes[jindex]]))
if index < 8:
dataset = numpy.delete(dataset, jindex, 0)
classes = numpy.delete(classes, jindex, 0)
added_so_far = added_so_far + 1
for xindex in range(len(folds)):
folds[xindex] = numpy.delete(folds[xindex], 0, 0)
print(folds)
return folds
def main():
print("BEGINNING CFV")
ecoli = csv_to_array('Classification/ecoli.csv')
cross_validation(ecoli)
main()
On the following dataset:
0.61,0.45,0.48,0.5,0.48,0.35,0.41,0
0.17,0.38,0.48,0.5,0.45,0.42,0.5,0
0.44,0.35,0.48,0.5,0.55,0.55,0.61,0
0.43,0.4,0.48,0.5,0.39,0.28,0.39,0
0.42,0.35,0.48,0.5,0.58,0.15,0.27,0
0.23,0.33,0.48,0.5,0.43,0.33,0.43,0
0.37,0.52,0.48,0.5,0.42,0.42,0.36,0
0.29,0.3,0.48,0.5,0.45,0.03,0.17,0
0.22,0.36,0.48,0.5,0.35,0.39,0.47,0
0.23,0.58,0.48,0.5,0.37,0.53,0.59,0
0.47,0.47,0.48,0.5,0.22,0.16,0.26,0
0.54,0.47,0.48,0.5,0.28,0.33,0.42,0
0.51,0.37,0.48,0.5,0.35,0.36,0.45,0
0.4,0.35,0.48,0.5,0.45,0.33,0.42,0
0.44,0.34,0.48,0.5,0.3,0.33,0.43,0
0.44,0.49,0.48,0.5,0.39,0.38,0.4,0
0.43,0.32,0.48,0.5,0.33,0.45,0.52,0
0.49,0.43,0.48,0.5,0.49,0.3,0.4,0
0.47,0.28,0.48,0.5,0.56,0.2,0.25,0
0.32,0.33,0.48,0.5,0.6,0.06,0.2,0
0.34,0.35,0.48,0.5,0.51,0.49,0.56,0
0.35,0.34,0.48,0.5,0.46,0.3,0.27,0
0.38,0.3,0.48,0.5,0.43,0.29,0.39,0
0.38,0.44,0.48,0.5,0.43,0.2,0.31,0
0.41,0.51,0.48,0.5,0.58,0.2,0.31,0
0.34,0.42,0.48,0.5,0.41,0.34,0.43,0
0.51,0.49,0.48,0.5,0.53,0.14,0.26,0
0.25,0.51,0.48,0.5,0.37,0.42,0.5,0
0.29,0.28,0.48,0.5,0.5,0.42,0.5,0
0.25,0.26,0.48,0.5,0.39,0.32,0.42,0
0.24,0.41,0.48,0.5,0.49,0.23,0.34,0
0.17,0.39,0.48,0.5,0.53,0.3,0.39,0
0.04,0.31,0.48,0.5,0.41,0.29,0.39,0
0.61,0.36,0.48,0.5,0.49,0.35,0.44,0
0.34,0.51,0.48,0.5,0.44,0.37,0.46,0
0.28,0.33,0.48,0.5,0.45,0.22,0.33,0
0.4,0.46,0.48,0.5,0.42,0.35,0.44,0
0.23,0.34,0.48,0.5,0.43,0.26,0.37,0
0.37,0.44,0.48,0.5,0.42,0.39,0.47,0
0,0.38,0.48,0.5,0.42,0.48,0.55,0
0.39,0.31,0.48,0.5,0.38,0.34,0.43,0
0.3,0.44,0.48,0.5,0.49,0.22,0.33,0
0.27,0.3,0.48,0.5,0.71,0.28,0.39,0
0.17,0.52,0.48,0.5,0.49,0.37,0.46,0
0.36,0.42,0.48,0.5,0.53,0.32,0.41,0
0.3,0.37,0.48,0.5,0.43,0.18,0.3,0
0.26,0.4,0.48,0.5,0.36,0.26,0.37,0
0.4,0.41,0.48,0.5,0.55,0.22,0.33,0
0.22,0.34,0.48,0.5,0.42,0.29,0.39,0
0.44,0.35,0.48,0.5,0.44,0.52,0.59,0
0.27,0.42,0.48,0.5,0.37,0.38,0.43,0
0.16,0.43,0.48,0.5,0.54,0.27,0.37,0
0.06,0.61,0.48,0.5,0.49,0.92,0.37,1
0.44,0.52,0.48,0.5,0.43,0.47,0.54,1
0.63,0.47,0.48,0.5,0.51,0.82,0.84,1
0.23,0.48,0.48,0.5,0.59,0.88,0.89,1
0.34,0.49,0.48,0.5,0.58,0.85,0.8,1
0.43,0.4,0.48,0.5,0.58,0.75,0.78,1
0.46,0.61,0.48,0.5,0.48,0.86,0.87,1
0.27,0.35,0.48,0.5,0.51,0.77,0.79,1
0.52,0.39,0.48,0.5,0.65,0.71,0.73,1
0.29,0.47,0.48,0.5,0.71,0.65,0.69,1
0.55,0.47,0.48,0.5,0.57,0.78,0.8,1
0.12,0.67,0.48,0.5,0.74,0.58,0.63,1
0.4,0.5,0.48,0.5,0.65,0.82,0.84,1
0.73,0.36,0.48,0.5,0.53,0.91,0.92,1
0.84,0.44,0.48,0.5,0.48,0.71,0.74,1
0.48,0.45,0.48,0.5,0.6,0.78,0.8,1
0.54,0.49,0.48,0.5,0.4,0.87,0.88,1
0.48,0.41,0.48,0.5,0.51,0.9,0.88,1
0.5,0.66,0.48,0.5,0.31,0.92,0.92,1
0.72,0.46,0.48,0.5,0.51,0.66,0.7,1
0.47,0.55,0.48,0.5,0.58,0.71,0.75,1
0.33,0.56,0.48,0.5,0.33,0.78,0.8,1
0.64,0.58,0.48,0.5,0.48,0.78,0.73,1
0.11,0.5,0.48,0.5,0.58,0.72,0.68,1
0.31,0.36,0.48,0.5,0.58,0.94,0.94,1
0.68,0.51,0.48,0.5,0.71,0.75,0.78,1
0.69,0.39,0.48,0.5,0.57,0.76,0.79,1
0.52,0.54,0.48,0.5,0.62,0.76,0.79,1
0.46,0.59,0.48,0.5,0.36,0.76,0.23,1
0.36,0.45,0.48,0.5,0.38,0.79,0.17,1
0,0.51,0.48,0.5,0.35,0.67,0.44,1
0.1,0.49,0.48,0.5,0.41,0.67,0.21,1
0.3,0.51,0.48,0.5,0.42,0.61,0.34,1
0.61,0.47,0.48,0.5,0,0.8,0.32,1
0.63,0.75,0.48,0.5,0.64,0.73,0.66,1
0.71,0.52,0.48,0.5,0.64,1,0.99,1
0.72,0.42,0.48,0.5,0.65,0.77,0.79,2
0.79,0.41,0.48,0.5,0.66,0.81,0.83,2
0.83,0.48,0.48,0.5,0.65,0.76,0.79,2
0.69,0.43,0.48,0.5,0.59,0.74,0.77,2
0.79,0.36,0.48,0.5,0.46,0.82,0.7,2
0.78,0.33,0.48,0.5,0.57,0.77,0.79,2
0.75,0.37,0.48,0.5,0.64,0.7,0.74,2
0.59,0.29,0.48,0.5,0.64,0.75,0.77,2
0.67,0.37,0.48,0.5,0.54,0.64,0.68,2
0.66,0.48,0.48,0.5,0.54,0.7,0.74,2
0.64,0.46,0.48,0.5,0.48,0.73,0.76,2
0.76,0.71,0.48,0.5,0.5,0.71,0.75,2
0.84,0.49,0.48,0.5,0.55,0.78,0.74,2
0.77,0.55,0.48,0.5,0.51,0.78,0.74,2
0.81,0.44,0.48,0.5,0.42,0.67,0.68,2
0.58,0.6,0.48,0.5,0.59,0.73,0.76,2
0.63,0.42,0.48,0.5,0.48,0.77,0.8,2
0.62,0.42,0.48,0.5,0.58,0.79,0.81,2
0.86,0.39,0.48,0.5,0.59,0.89,0.9,2
0.81,0.53,0.48,0.5,0.57,0.87,0.88,2
0.87,0.49,0.48,0.5,0.61,0.76,0.79,2
0.47,0.46,0.48,0.5,0.62,0.74,0.77,2
0.76,0.41,0.48,0.5,0.5,0.59,0.62,2
0.7,0.53,0.48,0.5,0.7,0.86,0.87,2
0.64,0.45,0.48,0.5,0.67,0.61,0.66,2
0.81,0.52,0.48,0.5,0.57,0.78,0.8,2
0.73,0.26,0.48,0.5,0.57,0.75,0.78,2
0.49,0.61,1,0.5,0.56,0.71,0.74,2
0.88,0.42,0.48,0.5,0.52,0.73,0.75,2
0.84,0.54,0.48,0.5,0.75,0.92,0.7,2
0.63,0.51,0.48,0.5,0.64,0.72,0.76,2
0.86,0.55,0.48,0.5,0.63,0.81,0.83,2
0.79,0.54,0.48,0.5,0.5,0.66,0.68,2
0.57,0.38,0.48,0.5,0.06,0.49,0.33,2
0.78,0.44,0.48,0.5,0.45,0.73,0.68,2
0.78,0.68,0.48,0.5,0.83,0.4,0.29,3
0.63,0.69,0.48,0.5,0.65,0.41,0.28,3
0.67,0.88,0.48,0.5,0.73,0.5,0.25,3
0.61,0.75,0.48,0.5,0.51,0.33,0.33,3
0.67,0.84,0.48,0.5,0.74,0.54,0.37,3
0.74,0.9,0.48,0.5,0.57,0.53,0.29,3
0.73,0.84,0.48,0.5,0.86,0.58,0.29,3
0.75,0.76,0.48,0.5,0.83,0.57,0.3,3
0.77,0.57,0.48,0.5,0.88,0.53,0.2,3
0.74,0.78,0.48,0.5,0.75,0.54,0.15,3
0.68,0.76,0.48,0.5,0.84,0.45,0.27,3
0.56,0.68,0.48,0.5,0.77,0.36,0.45,3
0.65,0.51,0.48,0.5,0.66,0.54,0.33,3
0.52,0.81,0.48,0.5,0.72,0.38,0.38,3
0.64,0.57,0.48,0.5,0.7,0.33,0.26,3
0.6,0.76,1,0.5,0.77,0.59,0.52,3
0.69,0.59,0.48,0.5,0.77,0.39,0.21,3
0.63,0.49,0.48,0.5,0.79,0.45,0.28,3
0.71,0.71,0.48,0.5,0.68,0.43,0.36,3
0.68,0.63,0.48,0.5,0.73,0.4,0.3,3
0.74,0.49,0.48,0.5,0.42,0.54,0.36,4
0.7,0.61,0.48,0.5,0.56,0.52,0.43,4
0.66,0.86,0.48,0.5,0.34,0.41,0.36,4
0.73,0.78,0.48,0.5,0.58,0.51,0.31,4
0.65,0.57,0.48,0.5,0.47,0.47,0.51,4
0.72,0.86,0.48,0.5,0.17,0.55,0.21,4
0.67,0.7,0.48,0.5,0.46,0.45,0.33,4
0.67,0.81,0.48,0.5,0.54,0.49,0.23,4
0.67,0.61,0.48,0.5,0.51,0.37,0.38,4
0.63,1,0.48,0.5,0.35,0.51,0.49,4
0.57,0.59,0.48,0.5,0.39,0.47,0.33,4
0.71,0.71,0.48,0.5,0.4,0.54,0.39,4
0.66,0.74,0.48,0.5,0.31,0.38,0.43,4
0.67,0.81,0.48,0.5,0.25,0.42,0.25,4
0.64,0.72,0.48,0.5,0.49,0.42,0.19,4
0.68,0.82,0.48,0.5,0.38,0.65,0.56,4
0.32,0.39,0.48,0.5,0.53,0.28,0.38,4
0.7,0.64,0.48,0.5,0.47,0.51,0.47,4
0.63,0.57,0.48,0.5,0.49,0.7,0.2,4
0.69,0.65,0.48,0.5,0.63,0.48,0.41,4
0.43,0.59,0.48,0.5,0.52,0.49,0.56,4
0.74,0.56,0.48,0.5,0.47,0.68,0.3,4
0.71,0.57,0.48,0.5,0.48,0.35,0.32,4
0.61,0.6,0.48,0.5,0.44,0.39,0.38,4
0.59,0.61,0.48,0.5,0.42,0.42,0.37,4
0.74,0.74,0.48,0.5,0.31,0.53,0.52,4
The vstack() is returning a shape (2,8) array.
You're then assigning that (2,8) array to the LHS folds[index], which is just a shape (8,) array.
numpy tries to see if such a mismatched assignment can be justified by broadcasting, subject to the rules and constraints of broadcasting, and is finally giving up, with that error message.
I'm not sure what your actual intent is, so I can't suggest an alternative.
My guess is that folds should actually be created as a 3D array, in which each inner 2D array has as many rows as the length of each fold.
I also suspect that the line folds = numpy.empty_like(dataset) is based on a misunderstanding of numpy.empty_like(). Please double-check that.
I think you might be misunderstanding what vstack does. Given two vectors with 8 items, it will stack them vertically and you will get a 2x8 matrix. Indeed, the output will always be at least 2D. See the documentation and examples at https://docs.scipy.org/doc/numpy/reference/generated/numpy.vstack.html
E.g.
a = np.array([1,2,3])
b = np.array([1,2,3])
np.vstack((a,b))
outputs
array([[1, 2, 3],
[1, 2, 3]])
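The failure itself can be reproduced on a small scale, along with one possible workaround: collect rows in plain Python lists per fold and stack each fold once at the end. This is only a sketch under the assumption that each fold is meant to hold a variable number of rows; the names below are illustrative and not taken from the question.

```python
import numpy as np

row_a = np.array([1.0, 2.0, 3.0])
row_b = np.array([4.0, 5.0, 6.0])

stacked = np.vstack((row_a, row_b))  # shape (2, 3)

# target[0] is a single row of shape (3,), so it cannot hold a (2, 3) array.
target = np.empty((4, 3))
message = ""
try:
    target[0] = stacked
except ValueError as exc:
    message = str(exc)
print(message)

# Workaround sketch: grow Python lists, then stack each fold once.
folds = [[] for _ in range(2)]
folds[0].append(row_a)
folds[0].append(row_b)
folds[1].append(row_b)
fold_arrays = [np.vstack(rows) for rows in folds]
print([f.shape for f in fold_arrays])
```

Appending to a list and stacking once also avoids reallocating the array on every iteration.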

Python: how to make conditional operation in an array

I have a numpy array M of dimension NxM and a dataframe tmp containing information about the cells of the array.
If I have to add values to the cells of M, I do
M[tmp.a, tmp.b] = tmp1.n
However, I would like to add the values only to those cells in which M < tmp.n, something like
M[M[tmp.a, tmp.b] < tmp1.n] = tmp1.n
I solved it this way:
s = shape(M)
M0 = np.zeros((s[1], s[0]))
M0[tmp1.a, tmp1.b] += tmp1.n
idx = np.where(M < M0)
M[idx[:][0], idx[:][1]] = M0[idx[:][0], idx[:][1]]
If I understood you correctly, you can do something like this (np.maximum is element-wise, whereas the built-in max() fails on arrays):
M[tmp.a, tmp.b] = np.maximum(tmp1.n, M[tmp.a, tmp.b])
This can be done using Numpy logical indexing
# a logical (boolean) array
log = M < tmp.n
# apply it to source and target and use `+=` to add the values
M[log] += tmp.n[log]
If the arrays don't have the same shape then you can also pick a specific dimension:
log = M[:, 0] < tmp.n
# apply it to source and target and use `+=` to add the values
M[log, 0] += tmp.n[log]
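For the original goal (write a value into a cell only when it is larger than what is already there), a small self-contained sketch with an element-wise maximum may help. The arrays rows, cols and vals are hypothetical stand-ins for tmp.a, tmp.b and tmp.n.

```python
import numpy as np

M = np.array([[5, 1, 0],
              [2, 7, 3]])

# Hypothetical stand-ins for tmp.a, tmp.b and tmp.n.
rows = np.array([0, 1, 1])
cols = np.array([1, 1, 2])
vals = np.array([4, 6, 9])

# Keep the larger of the existing cell value and the candidate value.
M[rows, cols] = np.maximum(M[rows, cols], vals)
print(M)
```

If the same (row, col) pair can occur more than once, plain fancy-index assignment keeps only one of the writes; `np.maximum.at(M, (rows, cols), vals)` applies the maximum for every occurrence.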

filling numpy array by index

I have a function which gives me the index for a given value. Eg,
def F(value):
index = do_something(value)
return index
I want to use this index to fill a huge numpy array with 1s. Let's call the array features.
l = [1,4,2,3,7,5,3,6,.....]
NOTE: features.shape[0] = len(l)
for i in range(features.shape[0]):
idx = F(l[i])
features[i, idx] = 1
Is there a pythonic way to perform this (as the loop takes a lot of time if the array is huge)?
If you can vectorize F(value) you could write something like
indices = np.arange(features.shape[0])
feature_indices = F(l)
features[indices, feature_indices] = 1
try this:
i = np.arange(features.shape[0]) # rows
j = np.vectorize(F)(np.array(l)) # columns
features[i,j] = 1
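Here is a small runnable sketch of the row/column indexing idea. The function F below is a made-up placeholder (value modulo 4) chosen only because it already works on whole arrays; the real F from the question is unknown.

```python
import numpy as np

def F(value):
    # Hypothetical index function; works on scalars and on arrays alike.
    return value % 4

l = [1, 4, 2, 3, 7, 5, 3, 6]
features = np.zeros((len(l), 4), dtype=int)

rows = np.arange(len(l))   # one row per entry of l
cols = F(np.asarray(l))    # column index computed for every entry at once

# Paired fancy indexing sets features[rows[k], cols[k]] = 1 for every k.
features[rows, cols] = 1
print(features)
```

Note that np.vectorize is mainly a convenience wrapper that still calls F once per element, so the real speed-up comes from writing F with array operations where that is possible.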
