How to feed a nested array into an SVM model - Python
My question is the following:
I have an array of feature vectors, one per audio file; so if, for example, there are 10 audio files, then this array has length 10.
One of the features is itself a list (this list carries the information of a specific feature of the audio file), and for a given audio file the feature vector looks like this:
array([0.03861840871664194, 187.72393405210002, 62.59881268743305,
0.2911392405063291,
array([4963.40332031, 3229.98046875, 2691.65039062, 3208.44726562,
4338.94042969, 4220.5078125 , 4166.67480469, 4801.90429688,
5555.56640625, 5910.86425781, 6115.4296875 , 5706.29882812,
4984.93652344, 2756.25 , 1991.82128906, 2551.68457031,
2734.71679688, 2906.98242188, 3143.84765625, 3219.21386719,
3186.9140625 , 3165.38085938, 3068.48144531, 2465.55175781,
2110.25390625, 2508.61816406, 2993.11523438, 3843.67675781,
4715.77148438, 5652.46582031, 5480.20019531, 5792.43164062,
5932.39746094, 6244.62890625, 6072.36328125, 6201.5625 ,
6158.49609375, 6201.5625 , 6233.86230469, 6061.59667969])],
dtype=object)
Now, when I try to feed this data into the SVM model:
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
X_train, X_val, y_train, y_val = train_test_split(X,y,test_size=0.3)
model = svm.SVC()
model.fit(X_train,y_train)
yt_p = model.predict(X_train)
yv_p = model.predict(X_val)
I get this error: ValueError: setting an array element with a sequence.
How can I structure my feature vectors so that I can feed them to the SVM?
EDIT:
Here I provide an example of X:
if we have 5 audio files, then X will be:
array([[0.017455393927437918, 227.66237105624407, 32.42076654734572,
0.3867924528301887,
array([1851.85546875, 2433.25195312, 3057.71484375, 3079.24804688,
3079.24804688, 3068.48144531, 3046.94824219, 3359.1796875 ,
3908.27636719, 4618.87207031, 4618.87207031, 4521.97265625,
4091.30859375, 3111.54785156, 3100.78125 , 2863.91601562,
1561.15722656, 1119.7265625 , 1065.89355469, 947.4609375 ,
979.76074219, 990.52734375, 990.52734375, 1356.59179688,
2077.95410156, 2993.11523438, 3025.41503906, 3068.48144531,
3079.24804688, 3090.01464844, 3100.78125 , 3111.54785156,
2993.11523438, 3100.78125 , 3079.24804688, 2853.14941406,
1205.859375 , 1281.22558594, 1614.99023438, 2131.78710938,
2325.5859375 , 2034.88769531, 1916.45507812, 1744.18945312,
1851.85546875, 2357.88574219, 2368.65234375, 1916.45507812,
1959.52148438, 1959.52148438, 1754.95605469, 1787.25585938,
2207.15332031])],
[0.03861840871664194, 187.72393405210002, 62.59881268743305,
0.2911392405063291,
array([4963.40332031, 3229.98046875, 2691.65039062, 3208.44726562,
4338.94042969, 4220.5078125 , 4166.67480469, 4801.90429688,
5555.56640625, 5910.86425781, 6115.4296875 , 5706.29882812,
4984.93652344, 2756.25 , 1991.82128906, 2551.68457031,
2734.71679688, 2906.98242188, 3143.84765625, 3219.21386719,
3186.9140625 , 3165.38085938, 3068.48144531, 2465.55175781,
2110.25390625, 2508.61816406, 2993.11523438, 3843.67675781,
4715.77148438, 5652.46582031, 5480.20019531, 5792.43164062,
5932.39746094, 6244.62890625, 6072.36328125, 6201.5625 ,
6158.49609375, 6201.5625 , 6233.86230469, 6061.59667969])],
[0.042435441297643324, 128.81225073038124, 20.912528554426807,
0.313953488372093,
array([4349.70703125, 4242.04101562, 4274.34082031, 4123.60839844,
4457.37304688, 4834.20410156, 4661.93847656, 4306.640625 ,
4231.27441406, 4543.50585938, 4435.83984375, 6201.5625 ,
8817.84667969, 8817.84667969, 742.89550781, 721.36230469,
732.12890625, 732.12890625, 710.59570312, 721.36230469,
925.92773438, 1119.7265625 , 1141.25976562, 1431.95800781,
7762.71972656, 7934.98535156, 7891.91894531, 7332.05566406,
3789.84375 , 2799.31640625, 2831.61621094, 2217.91992188,
581.39648438, 602.9296875 , 2217.91992188, 2228.68652344,
2368.65234375, 2519.38476562, 2863.91601562, 3682.17773438,
3649.87792969, 4188.20800781, 4112.84179688])],
[0.006295381642571726, 130.28309914454434, 5.193614287487564,
0.2411764705882353,
array([7978.05175781, 8010.3515625 , 8118.01757812, 8430.24902344,
8257.98339844, 8451.78222656, 8591.74804688, 8677.88085938,
8796.31347656, 8850.14648438, 8796.31347656, 8925.51269531,
6244.62890625, 344.53125 , 344.53125 , 1614.99023438,
2325.5859375 , 2971.58203125, 3316.11328125, 3617.578125 ,
3294.58007812, 2788.54980469, 2637.81738281, 2702.41699219,
2723.95019531, 3133.08105469, 3413.01269531, 5663.23242188,
5770.8984375 , 5577.09960938, 2228.68652344, 1604.22363281,
1690.35644531, 4123.60839844, 5566.33300781, 5803.19824219,
5749.36523438, 5846.26464844, 6772.19238281, 7073.65722656,
7622.75390625, 7859.61914062, 8236.45019531, 8441.015625 ,
8699.4140625 , 8807.08007812, 8742.48046875, 8667.11425781,
8710.18066406, 8947.04589844, 9140.84472656, 9130.078125 ,
8936.27929688, 8925.51269531, 8947.04589844, 8925.51269531,
9097.77832031, 9205.44433594, 9194.67773438, 9140.84472656,
9162.37792969, 9043.9453125 , 9162.37792969, 9108.54492188,
9183.91113281, 9280.81054688, 9270.04394531, 9108.54492188,
9076.24511719, 9356.17675781, 9226.97753906, 9216.2109375 ,
9248.51074219, 9140.84472656, 9237.74414062, 9334.64355469,
9259.27734375, 9226.97753906, 9216.2109375 , 9108.54492188,
9183.91113281, 9216.2109375 , 9248.51074219, 9259.27734375,
9183.91113281])],
[0.017070271599460656, 171.91660927761163, 26.854424936811768,
0.11188811188811189,
array([4715.77148438, 4629.63867188, 4898.80371094, 5275.63476562,
4941.87011719, 4532.73925781, 4618.87207031, 4995.703125 ,
4705.00488281, 4500.43945312, 4188.20800781, 4371.24023438,
4457.37304688, 4188.20800781, 4909.5703125 , 4877.27050781,
6761.42578125, 7708.88671875, 7719.65332031, 7956.51855469,
8484.08203125, 9033.17871094, 9043.9453125 , 9000.87890625,
9011.64550781, 9011.64550781, 9000.87890625, 9108.54492188,
8817.84667969, 6686.05957031, 1808.7890625 , 1830.32226562,
1851.85546875, 1636.5234375 , 1022.82714844, 1281.22558594,
1927.22167969, 1948.75488281, 1302.75878906, 1399.65820312,
1873.38867188, 1959.52148438, 7245.92285156, 9011.64550781,
9420.77636719, 9549.97558594, 9453.07617188, 9431.54296875,
9410.00976562, 9248.51074219, 9151.61132812, 9194.67773438,
8968.57910156, 8634.81445312, 8268.75 , 7439.72167969,
5501.73339844, 5232.56835938, 5103.36914062, 7052.12402344,
7299.75585938, 7127.49023438, 7192.08984375, 5673.99902344,
5523.26660156, 5986.23046875, 6729.12597656, 6309.22851562,
5135.66894531, 5081.8359375 , 5329.46777344, 5404.83398438])]],
dtype=object)
You can feed the feature that has lists inside it to your model in two ways:
1. Treat the list elements as additional features.
2. Map all of its elements to a single number with a function you deem appropriate (min, median, mean, max, sum, etc.).
To try the first option (this needs pandas in addition to the imports from the question):
import pandas as pd
# Convert `X` to a data frame
X = pd.DataFrame(X)
# Rename columns
X.columns = ['feature_' + str(i + 1) for i in range(X.shape[1])]
# Convert the feature with lists inside to long format
x = X['feature_5'].explode().to_frame()
# Create a counter by observation so we can pivot
x['observation_id'] = x.groupby(level=0).cumcount()
# Pivot to wide format: one column per list element; shorter lists are padded with 0
x = x.pivot(columns='observation_id', values='feature_5').fillna(0)
x = x.add_prefix('list_element_')
# Drop `feature_5` from X
X.drop(columns='feature_5', inplace=True)
# Concatenate X and x together
X = pd.concat([X, x], axis=1)
# Carry on as before
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
model = svm.SVC()
model.fit(X_train, y_train)
There's no single right answer for the second option; only you can decide how to aggregate the lists, because only you know what they mean. However, if you want to use, for example, the mean of each list as the feature (this assumes X is still the original NumPy object array, not the data frame from the first option):
import numpy as np
# Get the mean of each list
means = [np.mean(array) for array in X[:, 4]]
# Replace the lists with `means`
X[:, 4] = means
And then carry on with the splitting and fitting.
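For completeness, a minimal sketch of that last step, reusing the split-and-fit code from the question (casting to float first, since after the replacement the object array is all numeric):
# Cast the all-numeric object array to float before fitting
X = X.astype(float)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
model = svm.SVC()
model.fit(X_train, y_train)
yt_p = model.predict(X_train)
yv_p = model.predict(X_val)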
Related
Area under the histogram is not 1 when using density in plt.hist
Consider the following dataset with random data:

test_dataset = np.array([
    -2.09601881, -4.26602684, 1.09105452, -4.59559669, 1.05865251, -0.93076762, -14.70398945, -18.01937129,
    4.64126152, -10.34178822, -9.46058493, -5.66864965, -3.17562022, 15.7030379, 10.59675205, -5.80882413,
    -24.00604149, -4.81518663, -1.94333927, 1.18142171, 12.72030312, 3.84917581, -0.4468796, 11.91828567,
    -17.99171774, 9.35108712, -5.57233376, 5.77547128, 5.49296099, -10.96132844, -18.75174336, 5.27843303,
    25.73548956, -21.58043021, -14.24734733, 12.57886018, -22.10002076, 1.72207555, -6.0411867, -3.63568527,
    7.26542117, -0.21449529, -6.64974714, -0.94574606, -4.23339431, 16.76199734, -12.42195793, 18.965854,
    -23.85336123, -15.55104466, 6.17215868, 7.34993316, 8.62461351, -16.30482638, -16.35601099, 1.96857833,
    18.74440399, -22.48374434, -10.895831, -10.14393648, -17.62768751, 4.83388855, 20.1578181, 6.04299626,
    0.97198296, -3.40889754, -10.62734293, 1.70240472, 20.4203839, 10.26751364, 15.47859675, -10.97940064,
    1.82728251, 4.22894717, 8.31502887, -5.48502811, -1.09244874, -11.32072796, -24.88520436, -7.42108403,
    19.4200716, 4.82704045, -12.46290135, -15.18466755, 6.37714692, -11.06825059, 5.10898588, -9.07485484,
    1.63946084, -12.2270078, 12.63776832, -25.03916909, 2.42972082, -14.22890171, 18.2199446, 6.9819771,
    -12.07795089, 2.59948596, -16.90206575, 6.35192719, 7.33823106, -23.69653447, -11.66091871, -19.40251179,
    -12.64863792, 11.04004231, 13.7247356, -16.36107329, 20.43227515, 17.97334692, 16.92675175, -5.62051239,
    -8.66304184, -8.40848514, -23.20919855, 0.96808137, -5.03287253, -3.13212582, 18.81155666, -8.27988284,
    3.85708447, 12.43039322, 17.98003878, 18.11009997, -3.74294421, -16.62276121, 9.4446743, 2.2060981,
    8.34853736, 14.79144713, -1.91113975, -5.17061419, 4.53451746, 8.19090358, 7.98343201, 11.44592322,
    -16.9132677, -25.92554857, 10.10638432, -8.09236786, 20.8878207, 19.52368296, 0.85858125, 2.61760415,
    9.21360649, -8.1192651, -6.94829273, 2.73562447, 13.40981323, -9.05018331, -17.77563166, -21.03927199,
    4.10415845, -1.31550732, 5.68284828, 15.08670773, -19.78675315, 12.94697869, -11.51797637, 1.91485992,
    16.69417993, -16.04271622, -1.14028558, 9.79830109, -18.58386093, -7.52963269, -10.10059878, -25.2194216,
    -0.10598426, -15.77641532, -14.15999125, 14.35011271, 11.15178588, -14.43856266, 15.84015226, -3.41221883,
    11.90724469, 0.57782081, 18.82127466, -6.01068727, -19.83684476, 2.20091942, -1.38707755, -8.62821053,
    -11.89000913, -11.69539815, 5.70242019, -3.83781841, 5.35894135, -0.30995954, 21.76661212, 8.52974329,
    -9.13065082, -11.06209, -12.00654618, 2.769838, -12.21579496, -27.2686534, -4.58538197, -6.94388425])

I'd like to plot a normalized histogram of it, so in the plt.hist options I choose density=True:

import numpy as np
import matplotlib.pyplot as plt

data1, bins, _ = plt.hist(test_dataset, density=True)
print(np.trapz(data1))
print(sum(data1))

which outputs the following (histogram figure not reproduced here):

0.18206124014272715
0.18866449755723017

From the matplotlib documentation:

The density parameter, which normalizes bin heights so that the integral of the histogram is 1. The resulting histogram is an approximation of the probability density function.
But from my example it is clearly seen that the integral of the histogram is NOT 1, and it strongly depends on the number of bins: if I specify it, for example, to be 40, the sum increases:

data1, bins, _ = plt.hist(test_dataset, density=True, bins=40)
print(np.trapz(data1))
print(sum(data1))

0.7508847002777762
0.7546579902289207

Is the description in the documentation incorrect, or am I misunderstanding something here?
You are not calculating the area: np.trapz(data1) and sum(data1) ignore the bin widths. The area should be calculated as follows (in your example):

sum(data1 * np.diff(bins)) == 1
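A quick way to verify this, assuming the test_dataset and imports from the question: multiplying each bar height by its bin width and summing gives the area under the histogram, which is 1 by construction whenever density=True, regardless of the number of bins.

import numpy as np
import matplotlib.pyplot as plt

data1, bins, _ = plt.hist(test_dataset, density=True, bins=40)
# Bar heights times bin widths = total area under the histogram
area = np.sum(data1 * np.diff(bins))
print(area)  # 1.0, up to floating-point error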
Python OptBinning package's OptimalBinning and BinningProcess giving different results sometimes
I'm using the OptBinning package to bin some numeric data. I'm following this example to do this. And from this tutorial I read that "... the best way to view BinningProcess is as a wrapper for OptimalBinning", which implies that they should both give the same outputs. However, I'm seeing that they give different outputs for some features and the same for others. Why is this the case? Below is an example showing how the two methods lead to the same output for 'mean radius' but not 'worst radius', using the breast cancer data in sklearn.

import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from optbinning import BinningProcess
from optbinning import OptimalBinning

# Load data
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Bin 'mean radius' data using the OptimalBinning method
var = 'mean radius'
x = df[var]
y = data.target
optb = OptimalBinning(name=var, dtype="numerical")
optb.fit(x, y)
binning_table = optb.binning_table
binning_table.build()['WoE']

0         -3.12517
1         -2.71097
2         -1.64381
3        -0.839827
4        -0.153979
5          2.00275
6          5.28332
7                0
8                0
Totals
Name: WoE, dtype: object

# Bin 'mean radius' using the BinningProcess method
var = ['mean radius']
bc_pipe = Pipeline([('WOE Binning', BinningProcess(variable_names=var))])
preprocessor = ColumnTransformer([('Numeric Pipeline', bc_pipe, var)], remainder='passthrough')
preprocessor.fit(df, y)
df_processed = preprocessor.transform(df)
df_processed = pd.DataFrame(df_processed, columns=df.columns)
df_processed[var[0]].unique()

array([ 5.28332344, -3.12517033, -1.64381421, -0.15397917,  2.00275405,
       -0.83982705, -2.71097154])

## We see that the Weight of Evidence (WoE) values are the same for 'mean radius' using both methods (except for the 0's, which we can ignore for now)

# Bin 'worst radius' using the OptimalBinning method
var = 'worst radius'
x = df[var]
y = data.target
optb = OptimalBinning(name=var, dtype="numerical")
optb.fit(x, y)
binning_table = optb.binning_table
binning_table.build()['WoE']

0         -4.56645
1          -2.6569
2        -0.800606
3        -0.060772
4          1.61976
5           5.5251
6                0
7                0
Totals
Name: WoE, dtype: object

# Bin 'worst radius' using the BinningProcess method
var = ['worst radius']
bc_pipe = Pipeline([('WOE Binning', BinningProcess(variable_names=var))])
preprocessor = ColumnTransformer([('Numeric Pipeline', bc_pipe, var)], remainder='passthrough')
preprocessor.fit(df, y)
df_processed = preprocessor.transform(df)
df_processed = pd.DataFrame(df_processed, columns=df.columns)
df_processed[var[0]].unique()

array([0.006193, 0.003532, 0.004571, 0.009208, 0.005115, 0.005082,
       0.002179, 0.005412, 0.003749, 0.01008 , 0.003042, 0.004144,
       0.01284 , 0.003002, 0.008093, 0.005466, 0.002085, 0.004142,
       0.001997, 0.0023  , 0.002425, 0.002968, 0.004394, 0.001987,
       0.002801, 0.007444, 0.003711, 0.004217, 0.002967, 0.003742,
       0.00456 , 0.005667, 0.003854, 0.003896, 0.003817, ... ])

## We now see that for 'worst radius' the two WoE's are not the same. Why?
I think the problem is due to the default behaviour of the ColumnTransformer option remainder="passthrough". The remaining columns are concatenated after the transformed ones, and that's why the position of the transformed variables changes. If you look at the dataframe, the first column contains the WoE values of the feature "worst radius". As an example, please try the following:

binning_process = BinningProcess(variable_names=var)
binning_process.fit(df[var], y)
np.unique(binning_process.transform(df[var]).values)

The binning process, as expected, returns the same WoE values. See also: https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html

By default, only the specified columns in transformers are transformed and combined in the output, and the non-specified columns are dropped (the default of 'drop'). By specifying remainder='passthrough', all remaining columns that were not specified in transformers will be automatically passed through. This subset of columns is concatenated with the output of the transformers.
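As a minimal sketch of the positional point above (assuming the preprocessor and df from the question): with remainder='passthrough', the transformed column is the first column of the ColumnTransformer output, so reading it by position recovers the WoE values even though the original column labels no longer line up.

# The transformed 'worst radius' column comes first in the ColumnTransformer output
df_processed = pd.DataFrame(preprocessor.transform(df))
print(np.unique(df_processed.iloc[:, 0]))  # matches the OptimalBinning WoE values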
ValueError: x and y must be the same size In Python while creating KMeans Model
I'm building a KMeans clustering model with a churn dataset and am getting an error that says ValueError: x and y must be the same size when trying to create the cluster graph. I'll post both my function and the graph code below, but in trying to narrow it down, I think it may have something to do with this line of code in the function:

x=kmeans.cluster_centers_[:,0], y=kmeans.cluster_centers_[:,1]

Here's the full code:

def Create_kmeans_cluster_graph(df_final, data, n_clusters, x_title, y_title, chart_title):
    """ Display K-means cluster based on data """
    kmeans = KMeans(n_clusters=n_clusters        # No of clusters in data
                    , random_state=random_state  # Selecting same training data
                    )
    kmeans.fit(data)
    kmean_colors = [plotColor[c] for c in kmeans.labels_]
    fig = plt.figure(figsize=(12, 8))
    plt.scatter(x=x_title + '_norm'
                , y=y_title + '_norm'
                , data=data
                , color=kmean_colors  # color of data points
                , alpha=0.25          # transparency of data points
                )
    plt.xlabel(x_title)
    plt.ylabel(y_title)
    plt.scatter(x=kmeans.cluster_centers_[:, 0]
                , y=kmeans.cluster_centers_[:, 1]
                , color='black'
                , marker='X'  # marker sign for cluster centers
                , s=100       # marker size
                )
    plt.title(chart_title, fontsize=15)
    plt.show()
    return kmeans.fit_predict(df_final[df_final.Churn == 1][[x_title + '_norm', y_title + '_norm']])

# Graph
df_final['Cluster'] = -1  # by default set Cluster to -1
df_final.iloc[(df_final.Churn == 1), 'Cluster'] = Create_kmeans_cluster_graph(df_final
    , df_final[df_final.Churn == 1][['Tenure_norm', 'MonthlyCharge_norm']]
    , 3
    , 'Tenure'
    , 'MonthlyCharges'
    , "Tenure vs Monthlycharges : Churn customer cluster")

df_final['Cluster'].unique()
You get that error because of this line:

plt.scatter(x=x_title + '_norm'
            , y=y_title + '_norm'
            , data=data
            , color=kmean_colors  # color of data points
            , alpha=0.25          # transparency of data points
            )

Used this way, plt.scatter does not resolve x and y from data= the way you expect (you can read the help page). You can either do:

plt.scatter(data[x_title + '_norm'], data[y_title + '_norm'], ...)

Or you can use the plot.scatter method on a pandas dataframe, which I did in an edited version of your function:

def Create_kmeans_cluster_graph(df_final, data, n_clusters, x_title, y_title, chart_title):
    plotColor = ['k', 'g', 'b']
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_state)
    kmeans.fit(data)
    kmean_colors = [plotColor[c] for c in kmeans.labels_]
    data.plot.scatter(x=x_title + '_norm', y=y_title + '_norm', color=kmean_colors, alpha=0.25)
    plt.xlabel(x_title)
    plt.ylabel(y_title)
    plt.scatter(x=kmeans.cluster_centers_[:, 0], y=kmeans.cluster_centers_[:, 1], color='black', marker='X', s=100)
    return kmeans.labels_

On an example dataset, it works:

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

random_state = 42
np.random.seed(42)
df_final = pd.DataFrame({'Tenure_norm': np.random.uniform(0, 1, 50),
                         'MonthlyCharge_norm': np.random.uniform(0, 1, 50),
                         'Churn': np.random.randint(0, 3, 50)})

Create_kmeans_cluster_graph(df_final
    , df_final[df_final.Churn == 1][['Tenure_norm', 'MonthlyCharge_norm']]
    , 3
    , 'Tenure'
    , 'MonthlyCharge'
    , "Tenure vs Monthlycharges : Churn customer cluster")
Can't get the fit with lmfit
I want to do a fit using lmfit, but I am having some issues. Here is my code:

from lmfit import Model
import numpy as np

def fit_func(x, a, b, c):
    return a*(b-x)**(5/8)+c

x = np.array([ 131.871, 218.825, 305.046, 390.533, 475.128, 558.959,
        642.001, 724.307, 805.794, 886.422, 966.20900001, 1045.19300001,
        1123.39300001, 1200.75800001, 1277.23700001, 1352.83300001,
        1427.57800001, 1501.49800001, 1574.55300001, 1646.69500001,
        1717.90800001, 1788.22100001, 1857.65100001, 1926.18300001,
        1993.76400001, 2060.37000001, 2126.00900001, 2190.70600001,
        2254.44800001, 2317.20000001, 2378.92000001, 2439.60300001,
        2499.25800001, 2557.89000001, 2615.46600001, 2671.95000001,
        2727.30900001, 2781.54300001, 2834.64700001, 2886.60600001,
        2937.38000001, 2986.92900001])

y = np.array([ 0., 3.14159265, 6.28318531, 9.42477796, 12.56637061,
        15.70796327, 18.84955592, 21.99114858, 25.13274123, 28.27433388,
        31.41592654, 34.55751919, 37.69911184, 40.8407045, 43.98229715,
        47.1238898, 50.26548246, 53.40707511, 56.54866776, 59.69026042,
        62.83185307, 65.97344573, 69.11503838, 72.25663103, 75.39822369,
        78.53981634, 81.68140899, 84.82300165, 87.9645943, 91.10618695,
        94.24777961, 97.38937226, 100.53096491, 103.67255757, 106.81415022,
        109.95574288, 113.09733553, 116.23892818, 119.38052084, 122.52211349,
        125.66370614, 128.8052988])

fit_model = Model(fit_func)
params = fit_model.make_params()
params['b'].set(5000, min=3500)
result = fit_model.fit(y, x=x)

But I am getting this error:

ValueError: The model function generated NaN values and the fit aborted! Please check your model function and/or set boundaries on parameters where applicable. In cases like this, using "nan_policy='omit'" will probably not work.

What am I doing wrong? I tried to adjust the a, b, c parameters by hand, and a=-1.2, b=3600, c=196 give a pretty good fit, so the program should be able to find something similar to that.
Two things are missing:

a) You need to pass params to fit_model.fit(), as with result = fit_model.fit(y, params, x=x).

b) You need to give initial values for all parameters. Un-initialized parameters will have a value of -np.inf, which is deliberately chosen because it will throw such errors.

You say you know reasonable values for a, b, and c. Use that knowledge! Something like:

fit_model = Model(fit_func)
params = fit_model.make_params(a=-1, b=4000, c=200)
params['b'].min = x.max() * (1.000001)  # prevent (negative number)**fraction
result = fit_model.fit(y, params, x=x)
print(result.fit_report())

should work.
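To see why the un-initialized fit aborts, you can inspect the default parameter values before setting anything (a minimal check, assuming the fit_model from the question):

params = fit_model.make_params()
# Un-initialized lmfit parameters default to -inf, so (b - x)**(5/8) evaluates to NaN
print(params['a'].value, params['b'].value, params['c'].value)  # -inf -inf -inf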
Tensorflow Grab Predictions and Indices for values above thresholds
What is the easiest way to grab the corresponding prediction values and indices based on those above a certain threshold? Consider this problem:

sess = tf.InteractiveSession()

predictions = tf.constant([[ 0.32957435, 0.82079124, 0.54503286, 0.51966476, 0.63359714,
                             0.92034972, 0.13774526, 0.45154464, 0.18284607, 0.14604568],
                           [ 0.78612137, 0.98291659, 0.4841609 , 0.63260579, 0.21568334,
                             0.82978213, 0.05054879, 0.09517837, 0.28309393, 0.01788473],
                           [ 0.05706763, 0.24366784, 0.04608512, 0.32987678, 0.2342416 ,
                             0.91725373, 0.60084391, 0.51787591, 0.74161232, 0.30830121],
                           [ 0.67310858, 0.6250236 , 0.42477703, 0.37107778, 0.65123832,
                             0.97282803, 0.59533679, 0.49564457, 0.54935825, 0.63008392],
                           [ 0.70233917, 0.48129809, 0.59114349, 0.63535333, 0.71188867,
                             0.4799161 , 0.90896237, 0.86089945, 0.47896886, 0.83451629],
                           [ 0.82923532, 0.8950938 , 0.99231505, 0.05526769, 0.98151541,
                             0.18153167, 0.63851702, 0.07426929, 0.91846335, 0.81246626],
                           [ 0.12850153, 0.23018432, 0.29871917, 0.71228445, 0.13235569,
                             0.41061044, 0.98215759, 0.90024149, 0.53385031, 0.92247963],
                           [ 0.87011361, 0.44218826, 0.01772344, 0.87317121, 0.52231467,
                             0.86476815, 0.25352192, 0.31709731, 0.38249743, 0.74694788],
                           [ 0.15262914, 0.49544573, 0.49644637, 0.07461977, 0.13706958,
                             0.18619633, 0.86163998, 0.03700352, 0.51173556, 0.40018845]])

score_idx = tf.where(predictions > 0.8)
scores = tf.SparseTensor(score_idx, tf.gather_nd(predictions, score_idx),
                         dense_shape=tf.shape(predictions, out_type=tf.int64))
dense_scores = tf.sparse_tensor_to_dense(scores)

print(sess.run([scores, dense_scores]))

I can easily get a sparse tensor that has all of the predictions above 0.8, but ultimately I am looking to return two separate 1D tensors:

Predicted indices = the list of indices above the threshold (0.8 in this example)
Scores = the scores for the corresponding examples

So for the first row, which is:

[ 0.32957435, 0.82079124, 0.54503286, 0.51966476, 0.63359714, 0.92034972, 0.13774526, 0.45154464, 0.18284607, 0.14604568]

I am looking to return:

predicted_indices = [1, 5]
scores = [0.821, 0.920]

Is there a simple solution that I am missing?
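For reference, a minimal sketch of one possible approach (not a verified answer), building only on the tf.where and tf.gather_nd calls already shown above: tf.where returns (row, column) index pairs, so slicing out the column component gives the predicted indices, and tf.gather_nd gives the matching scores.

# (row, column) pairs of all entries above the threshold
score_idx = tf.where(predictions > 0.8)
# 1D tensor of the matching scores
scores = tf.gather_nd(predictions, score_idx)
# 1D tensor of the column indices (the per-row predicted indices)
predicted_indices = score_idx[:, 1]

print(sess.run([predicted_indices, scores]))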