how to split an array by value - python

My code so far is:
import numpy as np
data=np.genfromtxt('filename')
print(data)
which prints:
[[ 0.723 1. ]
[ 0.433 2. ]
[ 0.258 1. ]
[ 1.52 2. ]
[ 0.083 2. ]
[ 2.025 1. ]
[ 3.928 1. ]]
How do I split the data into two groups, based on whether the row has a 1 or a 2 in the second column?

A simple solution is to use np.where. Called with only a condition, it returns a tuple of index arrays marking where the condition is true, and that tuple can be passed straight to NumPy's advanced indexing to pull the matching rows into a new variable.
import numpy as np
data = np.array(
[[ 0.723, 1. ],
[ 0.433, 2. ],
[ 0.258, 1. ],
[ 1.52, 2. ],
[ 0.083, 2. ],
[ 2.025, 1. ],
[ 3.928, 1. ]])
data1 = data[np.where(data[:,1] == 1)]
data2 = data[np.where(data[:,1] == 2)]
print(data1)
print(data2)
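To see what the two indexing lines above do, the following sketch prints the intermediate values. It reuses the data array from this answer; the names mask and rows are introduced here for illustration only.

```python
import numpy as np

data = np.array([[0.723, 1.],
                 [0.433, 2.],
                 [0.258, 1.],
                 [1.52, 2.],
                 [0.083, 2.],
                 [2.025, 1.],
                 [3.928, 1.]])

mask = data[:, 1] == 1      # boolean array with one entry per row
rows = np.where(mask)       # tuple holding one array of row indices (0, 2, 5, 6)
print(mask)
print(rows)

data1 = data[rows]          # integer indexing with the tuple from np.where
# Indexing with the boolean mask directly selects the same rows.
print(np.array_equal(data1, data[mask]))
```

Since both forms select the same rows, np.where is only needed when the row indices themselves are of interest.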

How about something like this:
import numpy as np
data = np.asarray([[0.723, 1.],
[0.433, 2.],
[0.258, 1.],
[1.520, 2.],
[0.083, 2.],
[2.025, 1.],
[3.928, 1.]])
split_data = [data[data[:,1] == 1.], data[data[:,1] == 2.]]
print(f'data:\n{data}')
print(f'split_data:\n{split_data}')
Explanation:
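data[:,1] selects the second column of data. Comparing it with 1. or 2. gives a boolean mask with one entry per row, and indexing data with that mask keeps only the rows where the mask is True.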
Output:
data:
[[0.723 1. ]
[0.433 2. ]
[0.258 1. ]
[1.52 2. ]
[0.083 2. ]
[2.025 1. ]
[3.928 1. ]]
split_data:
[array([[0.723, 1. ],
[0.258, 1. ],
[2.025, 1. ],
[3.928, 1. ]]),
array([[0.433, 2. ],
[1.52 , 2. ],
[0.083, 2. ]])]

Your question was rather brief, so I didn't quite catch the data format, but I tried replicating it with:
foo = [[ 0.723, 1 ], [ 0.433, 2 ], [ 0.258, 1 ], [ 1.52, 2 ],
[ 0.083, 2 ], [ 2.025, 1 ], [ 3.928, 1 ]]
If you want to filter this list foo so that it only contains entries whose second element matches a certain number, you could use the following list comprehensions:
foo_is_1 = [e for e in foo if e[1] == 1]
foo_is_2 = [e for e in foo if e[1] == 2]
print(foo_is_1)
print(foo_is_2)
In case you know nothing about the values of the second element in advance and just want to split your list into a list of lists, one per unique second element, you could use:
list_of_lists = [[e for e in foo if e[1] == a] for a in list(set([a[1] for a in foo]))]
for entry in list_of_lists:
    print(entry)
This is basically two nested list comprehensions: the outer one runs over each unique second element a, and the inner one over each entry e in foo.
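For readers who find the one-liner hard to follow, here is the same grouping written as explicit loops. This is only a sketch: the names unique_keys and group are made up for illustration, and sorted() is an addition (not in the answer) so that the order of the groups is predictable.

```python
foo = [[0.723, 1], [0.433, 2], [0.258, 1], [1.52, 2],
       [0.083, 2], [2.025, 1], [3.928, 1]]

# Outer level: every distinct second element.
unique_keys = sorted(set(entry[1] for entry in foo))

list_of_lists = []
for key in unique_keys:
    group = []
    # Inner level: keep the entries whose second element equals the current key.
    for entry in foo:
        if entry[1] == key:
            group.append(entry)
    list_of_lists.append(group)

for group in list_of_lists:
    print(group)
```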

Related

How to use all the elements of the array using for loop?

Actually, I need to feed the values returned by the function global_displacement(X) into another running loop.
Can someone please tell me how to obtain the required output, and what mistake I have been making?
Every time, it gives me only the first value ([ 0, 0, X[0], X[1]]) or
the last value ([ X[20], X[21], X[53], X[54]]) in the output,
depending on the indentation of "return j" in the code below.
import numpy as np
X = [ 0.19515612, 0.36477665, 0.244737, 0.42873321, 0.16864666, 0.08636661, 0.05376605, -0.57201897, -0.00935055, -1.24923862, 0., -1.53111525, 0.00935055, -1.24923862, -0.05376605, -0.57201897, -0.1686466,
0.08636661, -0.244737, 0.42873321, -0.19515612, 0.36477665, 0.02279911, 0. , 0.3563355 , 0.01379104, 0. , 0.42289958, -0.00747999, 0. , 0.0825908, -0.02949519 , 0. , -0.57435396,
-0.04074819, 0. , -1.25069528 ,-0.02972642, 0. , -1.53227704, -0. , 0. , -1.25069528 , 0.02972642 , 0. , -0.57435396 , 0.04074819 , 0. , 0.0825908, 0.02949519, 0. ,
0.42289958, 0.00747999 , 0. , 0.3563355 , -0.01379104, -0.02279911]
def global_displacement(X):
    global_displacements = np.array( [[ 0, 0, X[0], X[1]], [ X[0], X[1], X[2], X[3]], [ X[2], X[3],X[4], X[5]], [ X[4],X[5],X[6], X[7]],[ X[6],X[7],X[8],X[9]], [ X[8],X[9],X[10], X[11] ], [ X[10], X[11],X[12], X[13]], [ X[12], X[13],X[14], X[15]],[ X[14], X[15],X[16], X[17]],[ X[16], X[17],X[18], X[19]], [ X[18], X[19],X[20], X[21]],[ X[20], X[21], 0, 0],
        [ X[0], X[1], X[23], X[24]], [ X[2], X[3], X[26],X[27]], [ X[4], X[5], X[29],X[30]], [ X[6], X[7], X[32],X[33]], [ X[8],X[9],X[35], X[36]], [ X[10], X[11], X[38], X[39]], [ X[12], X[13], X[41], X[42]] ,[ X[14], X[15], X[44], X[45]],[ X[16], X[17], X[47], X[48]],[ X[18], X[19], X[50], X[51]], [ X[20], X[21], X[53], X[54]] ] )
    for i in (global_displacements):
        j = i.reshape(4,1)
        return j
print(global_displacement(X))
This is the expected output. I need to use these values in another loop by calling this function.
[[0. ]
[0. ]
[0.19515612]
[0.36477665]]
[[0.19515612]
[0.36477665]
[0.244737 ]
[0.42873321]]
[[0.244737 ]
[0.42873321]
[0.16864666]
[0.08636661]]
[[ 0.16864666]
[ 0.08636661]
[ 0.05376605]
[-0.57201897]]
[[ 0.05376605]
[-0.57201897]
[-0.00935055]
[-1.24923862]]
[[-0.00935055]
[-1.24923862]
[ 0. ]
[-1.53111525]]
[[ 0. ]
[-1.53111525]
[ 0.00935055]
[-1.24923862]]
[[ 0.00935055]
[-1.24923862]
[-0.05376605]
[-0.57201897]]
[[-0.05376605]
[-0.57201897]
[-0.1686466 ]
[ 0.08636661]]
[[-0.1686466 ]
[ 0.08636661]
[-0.244737 ]
[ 0.42873321]]
[[-0.244737 ]
[ 0.42873321]
[-0.19515612]
[ 0.36477665]]
[[-0.19515612]
[ 0.36477665]
[ 0. ]
[ 0. ]]
[[0.19515612]
[0.36477665]
[0. ]
[0.3563355 ]]
[[0.244737 ]
[0.42873321]
[0. ]
[0.42289958]]
[[0.16864666]
[0.08636661]
[0. ]
[0.0825908 ]]
[[ 0.05376605]
[-0.57201897]
[ 0. ]
[-0.57435396]]
[[-0.00935055]
[-1.24923862]
[ 0. ]
[-1.25069528]]
[[ 0. ]
[-1.53111525]
[ 0. ]
[-1.53227704]]
[[ 0.00935055]
[-1.24923862]
[ 0. ]
[-1.25069528]]
[[-0.05376605]
[-0.57201897]
[ 0. ]
[-0.57435396]]
[[-0.1686466 ]
[ 0.08636661]
[ 0. ]
[ 0.0825908 ]]
[[-0.244737 ]
[ 0.42873321]
[ 0. ]
[ 0.42289958]]
[[-0.19515612]
[ 0.36477665]
[ 0. ]
[ 0.3563355 ]]
Your function already builds the right rows; the only thing missing is that each inner value should be wrapped in its own one-element array, so that every row becomes a 4 by 1 column. For this you can use numpy.newaxis, which adds a new dimension to your array.
import numpy as np
def global_displacement(X):
    global_displacements = np.array( [[ 0, 0, X[0], X[1]], [ X[0], X[1], X[2], X[3]], [ X[2], X[3],X[4], X[5]], [ X[4],X[5],X[6], X[7]],[ X[6],X[7],X[8],X[9]], [ X[8],X[9],X[10], X[11] ], [ X[10], X[11],X[12], X[13]], [ X[12], X[13],X[14], X[15]],[ X[14], X[15],X[16], X[17]],[ X[16], X[17],X[18], X[19]], [ X[18], X[19],X[20], X[21]],[ X[20], X[21], 0, 0],
        [ X[0], X[1], X[23], X[24]], [ X[2], X[3], X[26],X[27]], [ X[4], X[5], X[29],X[30]], [ X[6], X[7], X[32],X[33]], [ X[8],X[9],X[35], X[36]], [ X[10], X[11], X[38], X[39]], [ X[12], X[13], X[41], X[42]] ,[ X[14], X[15], X[44], X[45]],[ X[16], X[17], X[47], X[48]],[ X[18], X[19], X[50], X[51]], [ X[20], X[21], X[53], X[54]] ] )
    new_structure = global_displacements[:, :, np.newaxis]
    return new_structure
X = [ 0.19515612, 0.36477665, 0.244737, 0.42873321, 0.16864666, 0.08636661, 0.05376605, -0.57201897, -0.00935055, -1.24923862, 0., -1.53111525, 0.00935055, -1.24923862, -0.05376605, -0.57201897, -0.1686466,
0.08636661, -0.244737, 0.42873321, -0.19515612, 0.36477665, 0.02279911, 0. , 0.3563355 , 0.01379104, 0. , 0.42289958, -0.00747999, 0. , 0.0825908, -0.02949519 , 0. , -0.57435396,
-0.04074819, 0. , -1.25069528 ,-0.02972642, 0. , -1.53227704, -0. , 0. , -1.25069528 , 0.02972642 , 0. , -0.57435396 , 0.04074819 , 0. , 0.0825908, 0.02949519, 0. ,
0.42289958, 0.00747999 , 0. , 0.3563355 , -0.01379104, -0.02279911]
result = global_displacement(X)
print(result)
Output:
[[[ 0. ]
[ 0. ]
[ 0.19515612]
[ 0.36477665]]
[[ 0.19515612]
[ 0.36477665]
[ 0.244737 ]
[ 0.42873321]]
[[ 0.244737 ]
[ 0.42873321]
[ 0.16864666]
[ 0.08636661]]
[[ 0.16864666]
[ 0.08636661]
[ 0.05376605]
[-0.57201897]]
[[ 0.05376605]
[-0.57201897]
[-0.00935055]
[-1.24923862]]
[[-0.00935055]
[-1.24923862]
[ 0. ]
[-1.53111525]]
[[ 0. ]
[-1.53111525]
[ 0.00935055]
[-1.24923862]]
[[ 0.00935055]
[-1.24923862]
[-0.05376605]
[-0.57201897]]
[[-0.05376605]
[-0.57201897]
[-0.1686466 ]
[ 0.08636661]]
[[-0.1686466 ]
[ 0.08636661]
[-0.244737 ]
[ 0.42873321]]
[[-0.244737 ]
[ 0.42873321]
[-0.19515612]
[ 0.36477665]]
[[-0.19515612]
[ 0.36477665]
[ 0. ]
[ 0. ]]
[[ 0.19515612]
[ 0.36477665]
[ 0. ]
[ 0.3563355 ]]
[[ 0.244737 ]
[ 0.42873321]
[ 0. ]
[ 0.42289958]]
[[ 0.16864666]
[ 0.08636661]
[ 0. ]
[ 0.0825908 ]]
[[ 0.05376605]
[-0.57201897]
[ 0. ]
[-0.57435396]]
[[-0.00935055]
[-1.24923862]
[ 0. ]
[-1.25069528]]
[[ 0. ]
[-1.53111525]
[ 0. ]
[-1.53227704]]
[[ 0.00935055]
[-1.24923862]
[ 0. ]
[-1.25069528]]
[[-0.05376605]
[-0.57201897]
[ 0. ]
[-0.57435396]]
[[-0.1686466 ]
[ 0.08636661]
[ 0. ]
[ 0.0825908 ]]
[[-0.244737 ]
[ 0.42873321]
[ 0. ]
[ 0.42289958]]
[[-0.19515612]
[ 0.36477665]
[ 0. ]
[ 0.3563355 ]]]
First off, you don't need .reshape to transform a 1D array of N elements into a 2D array that's N by 1. You can just add a dimension to the array.
Second, you generally don't want to write loops to handle a Numpy array. You want to use Numpy tools to process everything at once. Just think about the problem in the full number of dimensions: you want to transform a 2D array that's M by N, into a 3D one that's M by N by 1. That's... still just adding a dimension to the array.
So:
global_displacements = np.array(...)
return global_displacements[..., np.newaxis]
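As a small, self-contained illustration of the point above, the sketch below uses a made-up 3 by 4 array (rows, a hypothetical stand-in for global_displacements) and shows that after adding the axis, each item you get when looping is already a 4 by 1 column.

```python
import numpy as np

# Hypothetical stand-in for global_displacements: 3 rows of 4 values.
rows = np.arange(12, dtype=float).reshape(3, 4)

columns = rows[..., np.newaxis]   # shape (3, 4) becomes (3, 4, 1)
print(columns.shape)

# Each item along the first axis is already a 4 by 1 column,
# so a loop that consumes the result needs no reshape call.
for column in columns:
    print(column.shape)
```

A caller can therefore write "for column in global_displacement(X):" and use each column directly in the other loop.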

multiply numpy row with all elements in list

How do I combine each row of a NumPy array with each element of a list, one by one: the first row with the first tuple, then with the second tuple, and so on for every row?
I am doing this:
import itertools
import numpy as np
utl = np.array([[ 3, 12. ],
                [ 3. , 17. ]])
all_ltp = ([(0, 134.30000305175778), (1, 133.80000305175778)])
a=np.array(list(itertools.product(utl, all_ltp)))
a = np.reshape(a, (-1,4))
print(a)
The output is:
[[ 3. 12. 0. 134.30000305]
[ 3. 12. 1. 133.80000305]
[ 3. 17. 0. 134.30000305]
[ 3. 17. 1. 133.80000305]]
This works, but if I add a column to the array:
utl = np.array([[ 3, 12. , 99 ],
[ 3. , 17. , 99 ]])
all_ltp = ([(0, 134.30000305175778), (1, 133.80000305175778)])
a=np.array(list(itertools.product(utl, all_ltp)))
a = np.reshape(a, (-1,2))
print(a)
the output is:
[[array([ 3., 12., 99.]) (0, 134.30000305175778)]
[array([ 3., 12., 99.]) (1, 133.80000305175778)]
[array([ 3., 17., 99.]) (0, 134.30000305175778)]
[array([ 3., 17., 99.]) (1, 133.80000305175778)]]
This also runs, but it does not combine the elements into flat rows.
The output must be:
[[ 3. 12. 99 0. 134.30000305]
[ 3. 12. 99 1. 133.80000305]
[ 3. 17. 99 0. 134.30000305]
[ 3. 17. 99 1. 133.80000305]]
First convert all_ltp to a NumPy array:
b = np.array(all_ltp)
Then generate 2 intermediate arrays, by repeating utl and tiling b:
wrk1 = np.repeat(utl, repeats=b.shape[0], axis=0)
wrk2 = np.tile(b, reps=(utl.shape[0], 1))
(Print both of them to see the intermediate results.)
And to get the final result, horizontally stack both these tables:
result = np.hstack((wrk1, wrk2))
The result, for your source data, is:
[[ 3. 12. 99. 0. 134.30000305]
[ 3. 12. 99. 1. 133.80000305]
[ 3. 17. 99. 0. 134.30000305]
[ 3. 17. 99. 1. 133.80000305]]
Or, to have more concise code, run:
result = np.hstack((np.repeat(utl, repeats=b.shape[0], axis=0),
                    np.tile(b, reps=(utl.shape[0], 1))))
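To make the two intermediate arrays concrete, here is a self-contained run of the steps from this answer on the data from the question. Nothing new is introduced apart from the print calls and the comments.

```python
import numpy as np

utl = np.array([[3., 12., 99.],
                [3., 17., 99.]])
all_ltp = [(0, 134.30000305175778), (1, 133.80000305175778)]

b = np.array(all_ltp)

# Each row of utl repeated once per tuple: row 0, row 0, row 1, row 1.
wrk1 = np.repeat(utl, repeats=b.shape[0], axis=0)
# The whole block of tuples repeated once per row of utl: tuple 0, 1, 0, 1.
wrk2 = np.tile(b, reps=(utl.shape[0], 1))

print(wrk1)
print(wrk2)

# Side by side, the rows line up as every (row, tuple) pair.
result = np.hstack((wrk1, wrk2))
print(result.shape)
```

Because repeat and tile take their counts from the shapes, the same code keeps working when utl gains more columns or all_ltp gains more tuples.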

Floating point with normalized function

I wrote my own normalization module, because it seems that sklearn doesn't normalize all the data together (only per column or per row). I have two pieces of code.
The first uses sklearn:
import numpy as np
from sklearn import preprocessing
data = np.array([[-1], [-0.5], [0], [1], [2], [6], [10], [18]])
print(data)
scaler = preprocessing.MinMaxScaler(feature_range=(5, 10))
print(scaler.fit_transform(data))
print(scaler.inverse_transform(scaler.fit_transform(data)))
Result:
[[-1. ]
[-0.5]
[ 0. ]
[ 1. ]
[ 2. ]
[ 6. ]
[10. ]
[18. ]]
[[ 5. ]
[ 5.13157895]
[ 5.26315789]
[ 5.52631579]
[ 5.78947368]
[ 6.84210526]
[ 7.89473684]
[10. ]]
[[-1. ]
[-0.5]
[ 0. ]
[ 1. ]
[ 2. ]
[ 6. ]
[10. ]
[18. ]]
And with my module:
data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]])
print(data)
scaler = scl.Scaler(feature_range=(5, 10))
print(scaler.transform(data))
print(scaler.inverse_transform(scaler.transform(data)))
Result:
[[-1. 2. ]
[-0.5 6. ]
[ 0. 10. ]
[ 1. 18. ]]
[[ 5. 5.78947368]
[ 5.13157895 6.84210526]
[ 5.26315789 7.89473684]
[ 5.52631579 10. ]]
[[-1.00000000e+00 2.00000000e+00]
[-5.00000000e-01 6.00000000e+00]
[ 1.33226763e-15 1.00000000e+01]
[ 1.00000000e+00 1.80000000e+01]]
The value 1.33226763e-15 does not suit me.
I think it occurs because of floating-point arithmetic, although sklearn doesn't have this problem.
Please tell me, where is my mistake?
import numpy as np
class Scaler:
    def __init__(self, feature_range: tuple = (0, 1)):
        self.scaler_min = feature_range[0]
        self.scaler_max = feature_range[1]
        self.data_min = None
        self.data_max = None
    def transform(self, x: np.ndarray):
        self.data_min = x.min(initial=0)
        self.data_max = x.max(initial=0)
        scaled_data = (x - x.min(initial=0)) / (x.max(initial=0) - x.min(initial=0))
        return scaled_data * (self.scaler_max - self.scaler_min) + self.scaler_min
    def inverse_transform(self, x: np.ndarray):
        scaled_data = (x - self.scaler_min) / (self.scaler_max - self.scaler_min)
        return scaled_data * (self.data_max - self.data_min) + self.data_min
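The residual in the round trip above is on the order of 1e-15, which is the scale of double-precision rounding error for values of this size. As a hedged sketch, the following repeats the same arithmetic as the Scaler class with plain NumPy and compares with a tolerance instead of exact equality. It says nothing about how sklearn arrives at its result; the variable names and the use of np.allclose and np.round are choices made here for illustration.

```python
import numpy as np

data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]], dtype=float)
lo, hi = 5, 10                          # feature_range from the question
d_min, d_max = data.min(), data.max()   # global min and max, as in the module

scaled = (data - d_min) / (d_max - d_min) * (hi - lo) + lo
restored = (scaled - lo) / (hi - lo) * (d_max - d_min) + d_min

# Every step rounds to the nearest representable double, so exact equality
# with the input is not guaranteed; compare with a tolerance instead.
print(np.allclose(restored, data))

# Rounding to a fixed number of decimals hides the residual for display.
print(np.round(restored, 10))
```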

Remove square brackets in Python

I have this output:
[[[-0.015, -0.1533, 1. ]]
[[-0.0069, 0.1421, 1. ]]
...
[[ 0.1318, -0.4406, 1. ]]
[[ 0.2059, -0.3854, 1. ]]]
But I would like to remove the square brackets that are leftover resulting as this:
[[-0.015 -0.1533 1. ]
[-0.0069 0.1421 1. ]
...
[ 0.1318 -0.4406 1. ]
[ 0.2059 -0.3854 1. ]]
My code is this:
import random
import numpy as np
Xy = []
for i in range(4000):
    Xy_1 = [round(random.uniform(-0.5, 0.5), 4), round(random.uniform(-0.5, 0.5), 4), 1]
    Xy_0 = [round(random.uniform(-0.5, 0.5), 4), round(random.uniform(-0.5, 0.5), 4), 0]
    Xy.append(random.choices(population=(Xy_0, Xy_1), weights=(0.15, 0.85)))
Xy = np.asarray(Xy)
You can use numpy.squeeze to remove the dimension of length 1 from the array:
>>> np.squeeze(Xy)
array([[ 0.3609, 0.2378, 0. ],
[-0.2432, -0.2043, 1. ],
[ 0.3081, -0.2457, 1. ],
...,
[ 0.311 , 0.03 , 1. ],
[-0.0572, -0.317 , 1. ],
[ 0.3026, 0.1829, 1. ]])
Or reshape it using numpy.reshape:
>>> Xy.reshape(4000,3)
array([[ 0.3609, 0.2378, 0. ],
[-0.2432, -0.2043, 1. ],
[ 0.3081, -0.2457, 1. ],
...,
[ 0.311 , 0.03 , 1. ],
[-0.0572, -0.317 , 1. ],
[ 0.3026, 0.1829, 1. ]])
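The two calls above can be compared on a small array. This sketch uses a hypothetical 5-row stand-in for Xy; the axis=1 argument and the -1 placeholder are additions not used in the answer, shown because they avoid hard-coding the row count and make explicit which dimension is dropped.

```python
import numpy as np

# Hypothetical small stand-in for Xy: 5 rows, each wrapped in an extra level.
Xy = np.zeros((5, 1, 3))

squeezed = np.squeeze(Xy, axis=1)   # drop only axis 1, which has length 1
reshaped = Xy.reshape(-1, 3)        # -1 lets NumPy infer the number of rows

print(squeezed.shape)
print(reshaped.shape)
```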
Try the extend method instead of append:
Xy.extend(random.choices(population=(Xy_0, Xy_1), weights=(0.15, 0.85)))
Alternatively, take the first element of the list that random.choices returns, i.e. random.choices(population=(Xy_0, Xy_1), weights=(0.15, 0.85))[0]:
import random
import numpy as np
Xy = []
for i in range(4000):
    Xy_1 = [round(random.uniform(-0.5, 0.5), 4), round(random.uniform(-0.5, 0.5), 4), 1]
    Xy_0 = [round(random.uniform(-0.5, 0.5), 4), round(random.uniform(-0.5, 0.5), 4), 0]
    # Pythonic way :-)
    Xy.append(random.choices(population=(Xy_0, Xy_1), weights=(0.15, 0.85))[0])
Xy = np.asarray(Xy)
print(Xy)
Output
[[ 0.3948 0.0915 1. ]
[ 0.4197 -0.344 1. ]
[-0.4541 0.3192 1. ]
[ 0.3285 0.0453 1. ]
[-0.0171 -0.3088 1. ]
[ 0.2958 -0.2757 1. ]
[-0.1303 0.1581 0. ]
[-0.4146 -0.4454 1. ]
[ 0.0247 0.325 1. ]
[-0.227 0.139 1. ]]
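If the data is a nested Python list rather than a NumPy array, you can also remove the extra level with sum, using an empty list as the start value: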
a=[ [[-0.015, -0.1533, 1. ]],
[[-0.0069, 0.1421, 1. ]],
...
[[ 0.1318, -0.4406, 1. ]],
[[ 0.2059, -0.3854, 1. ]] ]
sum(a,[])
'''
[[-0.015, -0.1533, 1. ],
[-0.0069, 0.1421, 1. ],
...
[ 0.1318, -0.4406, 1. ],
[ 0.2059, -0.3854, 1. ]]
'''
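Here is a runnable version of the sum trick, restricted to the four rows that are actually shown above (the elided rows are left out). It relies on the input being plain Python lists, since + on lists means concatenation.

```python
a = [[[-0.015, -0.1533, 1.0]],
     [[-0.0069, 0.1421, 1.0]],
     [[0.1318, -0.4406, 1.0]],
     [[0.2059, -0.3854, 1.0]]]

# sum starts from [] and adds each one-element outer list in turn,
# so list concatenation strips one level of nesting.
flat = sum(a, [])
print(flat)
```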

Min-max normalisation of a NumPy array

I have the following numpy array:
foo = np.array([[0.0, 10.0], [0.13216, 12.11837], [0.25379, 42.05027], [0.30874, 13.11784]])
which yields:
[[ 0. 10. ]
[ 0.13216 12.11837]
[ 0.25379 42.05027]
[ 0.30874 13.11784]]
How can I normalize the Y component (the second column) of this array, so that it gives me something like:
[[ 0. 0. ]
[ 0.13216 0.06 ]
[ 0.25379 1 ]
[ 0.30874 0.097]]
Referring to this Cross Validated link, How to normalize data to 0-1 range?, it looks like you can perform min-max normalisation on the last column of foo.
v = foo[:, 1] # foo[:, -1] for the last column
foo[:, 1] = (v - v.min()) / (v.max() - v.min())
foo
array([[ 0. , 0. ],
[ 0.13216 , 0.06609523],
[ 0.25379 , 1. ],
[ 0.30874 , 0.09727968]])
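The same formula can be wrapped in a small helper that also supports a target range other than 0 to 1. This is only a sketch: min_max_column and its parameters are hypothetical names, it returns a copy instead of modifying foo in place, and it assumes the column is not constant (otherwise the denominator is zero).

```python
import numpy as np

def min_max_column(arr, col, feature_range=(0.0, 1.0)):
    """Return a copy of arr with column `col` min-max scaled into feature_range."""
    low, high = feature_range
    out = arr.astype(float)      # astype returns a copy, so arr is left untouched
    v = out[:, col]
    out[:, col] = (v - v.min()) / (v.max() - v.min()) * (high - low) + low
    return out

foo = np.array([[0.0, 10.0], [0.13216, 12.11837],
                [0.25379, 42.05027], [0.30874, 13.11784]])

print(min_max_column(foo, 1))
print(min_max_column(foo, 1, feature_range=(5, 10)))
```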
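Another option for performing normalisation (as suggested by OP) is sklearn.preprocessing.normalize, which yields slightly different results because norm='max' divides the column by its maximum absolute value instead of applying the min-max formula: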
from sklearn.preprocessing import normalize
foo[:, [-1]] = normalize(foo[:, -1, None], norm='max', axis=0)
foo
array([[ 0. , 0.2378106 ],
[ 0.13216 , 0.28818769],
[ 0.25379 , 1. ],
[ 0.30874 , 0.31195614]])
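The difference between the two results can be reproduced with plain NumPy. The sketch below does not call sklearn; it assumes that dividing by the largest absolute value is what norm='max' amounts to for this column, and compares that with the min-max formula on the original second column of foo.

```python
import numpy as np

v = np.array([10.0, 12.11837, 42.05027, 13.11784])

# Division by the largest absolute value: the minimum is not moved to 0.
max_scaled = v / np.abs(v).max()
# Min-max scaling: the minimum maps to 0 and the maximum to 1.
min_max_scaled = (v - v.min()) / (v.max() - v.min())

print(max_scaled)
print(min_max_scaled)
```

Both map the largest value to 1, but only min-max scaling also pins the smallest value to 0.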
sklearn.preprocessing.MinMaxScaler can also be used (feature_range=(0, 1) is default):
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()
v = foo[:, [1]]  # keep the column 2-D: MinMaxScaler expects shape (n_samples, n_features)
v_scaled = min_max_scaler.fit_transform(v)
foo[:, [1]] = v_scaled
print(foo)
Output:
[[ 0. 0. ]
[ 0.13216 0.06609523]
[ 0.25379 1. ]
[ 0.30874 0.09727968]]
An advantage is that feature_range lets you scale to any target range.
I think you want this:
foo[:,1] = (foo[:,1] - foo[:,1].min()) / (foo[:,1].max() - foo[:,1].min())
You are trying to min-max scale between 0 and 1 only the second column.
Using sklearn.preprocessing.minmax_scale, should easily solve your problem.
e.g.:
from sklearn.preprocessing import minmax_scale
column_1 = foo[:,0] #first column you don't want to scale
column_2 = minmax_scale(foo[:,1], feature_range=(0,1)) #second column you want to scale
foo_norm = np.stack((column_1, column_2), axis=1) #stack both columns to get a 2d array
Should yield
array([[0. , 0. ],
[0.13216 , 0.06609523],
[0.25379 , 1. ],
[0.30874 , 0.09727968]])
Maybe you want to min-max scale between 0 and 1 both columns. In this case, use:
foo_norm = minmax_scale(foo, feature_range=(0,1), axis=0)
Which yields
array([[0. , 0. ],
[0.42806245, 0.06609523],
[0.82201853, 1. ],
[1. , 0.09727968]])
Note: this is not to be confused with the operation that scales the norm (length) of a vector to a certain value (usually 1), which is also commonly referred to as normalization.
