I am trying to generate two children from two parents by crossover. I want to keep a fixed part from parent A and fill the blanks with elements from parent B.
I was able to mask both parents and extract the elements into another array, but I am not able to fill the gaps in the fixed part of parent A with the fill elements from parent B.
Here's what I have tried so far:
import numpy as np
from numpy.random import default_rng
rng = default_rng()
numMachines = 5
numJobs = 5
population =[[[4, 0, 2, 1, 3],
[4, 2, 0, 1, 3],
[4, 2, 0, 1, 3],
[4, 0, 3, 2, 1],
[2, 3, 4, 1, 0]],
[[2, 0, 1, 3, 4],
[4, 3, 1, 2, 0],
[2, 0, 3, 4, 1],
[4, 3, 1, 0, 2],
[4, 0, 3, 1, 2]]]
parentA = np.array(population[0])
parentB = np.array(population[1])
childA = np.zeros((numJobs, numMachines))
np.copyto(childA, parentA)
childB = np.zeros((numJobs, numMachines))
np.copyto(childB, parentB)
subJobs = np.stack([rng.choice(numJobs, size=int(np.max([2, np.floor(numJobs / 2)])), replace=False) for i in range(numMachines)])
maskA = np.stack([(np.isin(childA[i], subJobs[i])) for i in range(numMachines)])
invMaskA = np.invert(maskA)
maskB = np.stack([(np.isin(childB[i], subJobs[i])) for i in range(numMachines)])
invMaskB = np.invert(maskB)
maskedChildAFixed = np.ma.masked_array(childA, maskA)
maskedChildBFixed = np.ma.masked_array(childB, maskB)
maskedChildAFill = np.ma.masked_array(childA, invMaskA)
maskedChildBFill = np.ma.masked_array(childB, invMaskB)
maskedChildAFill = np.stack([maskedChildAFill[i].compressed() for i in range(numMachines)])
maskedChildBFill = np.stack([maskedChildBFill[i].compressed() for i in range(numMachines)])
EDIT:
Sorry, I was so frustrated with this yesterday that I forgot to add some information to make it clearer. First, I have fixed the code so it now runs by just copying and pasting (I had forgotten some imports and some variables).
This is a fixed portion from Parent A that won't change in child A.
>>> print(maskedChildAFixed)
[[-- 0.0 2.0 -- 3.0]
[4.0 -- 0.0 1.0 --]
[4.0 -- -- 1.0 3.0]
[-- 0.0 3.0 2.0 --]
[-- -- 4.0 1.0 0.0]]
I need to fill in these blank parts with the fill part from parent B.
>>> print(maskedChildBFill)
[[1. 4.]
[3. 2.]
[2. 0.]
[4. 1.]
[3. 2.]]
For my children to be legal, I can't repeat an integer within a row. If I try to use the "np.ma.filled()" function with the compressed maskedChildBFill, it gives me an error.
>>> print(np.ma.filled(maskedChildAFixed, fill_value=maskedChildBFill))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Rafael\.conda\envs\CoutoBinario\lib\site-packages\numpy\ma\core.py", line 639, in filled
return a.filled(fill_value)
File "C:\Users\Rafael\.conda\envs\CoutoBinario\lib\site-packages\numpy\ma\core.py", line 3752, in filled
np.copyto(result, fill_value, where=m)
File "<__array_function__ internals>", line 6, in copyto
ValueError: could not broadcast input array from shape (5,2) into shape (5,5)
I'll now comment out the part of the code that compresses the fill portion (the two lines that call compressed()). That keeps the blank spaces in maskedChildBFill, so the sizes of the matrices are preserved.
>>> print(np.ma.filled(maskedChildAFixed, fill_value=maskedChildBFill))
[[2. 0. 2. 3. 3.]
[4. 3. 0. 1. 0.]
[4. 0. 3. 1. 3.]
[4. 0. 3. 2. 2.]
[4. 0. 4. 1. 0.]]
See how I get an invalid individual? Note the repeated integers in row 1. The individual should look like this:
[[1.0 0.0 2.0 4.0 3.0]
[4.0 3.0 0.0 1.0 2.0]
[4.0 2.0 0.0 1.0 3.0]
[4.0 0.0 3.0 2.0 1.0]
[3.0 2.0 4.0 1.0 0.0]]
I hope this update makes it easier to understand what I am trying to do. Thanks for all the help so far! <3
EDIT 2
I was able to work around it by converting everything to lists and then substituting the values in place with for loops, but this should be super slow. There might be a way to do this using numpy.
maskedChildAFill = maskedChildAFill.tolist()
maskedChildBFill = maskedChildBFill.tolist()
maskedChildAFixed = maskedChildAFixed.tolist()
maskedChildBFixed = maskedChildBFixed.tolist()
for i in range(numMachines):
    counterA = 0
    counterB = 0
    for n, j in enumerate(maskedChildAFixed[i]):
        if maskedChildAFixed[i][n] is None:
            maskedChildAFixed[i][n] = maskedChildBFill[i][counterA]
            counterA += 1
    for n, j in enumerate(maskedChildBFixed[i]):
        if maskedChildBFixed[i][n] is None:
            maskedChildBFixed[i][n] = maskedChildAFill[i][counterB]
            counterB += 1
I think you are looking for this:
parentA = np.array(population[0])
parentB = np.array(population[1])
childA = np.zeros((numJobs, numMachines))
np.copyto(childA, parentA)
childB = np.zeros((numJobs, numMachines))
np.copyto(childB, parentB)
subJobs = np.stack([rng.choice(numJobs, size=int(np.max([2, np.floor(numJobs / 2)])), replace=False) for i in range(numMachines)])
maskA = np.stack([(np.isin(childA[i], subJobs[i])) for i in range(numMachines)])
invMaskA = np.invert(maskA)
maskB = np.stack([(np.isin(childB[i], subJobs[i])) for i in range(numMachines)])
invMaskB = np.invert(maskB)
maskedChildAFixed = np.ma.masked_array(childA, maskA)
maskedChildBFixed = np.ma.masked_array(childB, maskB)
maskedChildAFill = np.ma.masked_array(childB, invMaskA)
maskedChildBFill = np.ma.masked_array(childA, invMaskB)
from operator import and_
crossA = np.ma.array(maskedChildAFixed.filled(0)+maskedChildAFill.filled(0),mask=list(map(and_,maskedChildAFixed.mask,maskedChildAFill.mask)))
crossB = np.ma.array(maskedChildBFixed.filled(0)+maskedChildBFill.filled(0),mask=list(map(and_,maskedChildBFixed.mask,maskedChildBFill.mask)))
Please note that I changed the line maskedChildAFill = np.ma.masked_array(childB, invMaskA) to fit the description of your problem. If that is not what you want, simply change it back to your original code. The last two lines should do the work for you.
output:
crossA
[[4.0 0.0 2.0 1.0 4.0]
[4.0 2.0 0.0 2.0 0.0]
[2.0 2.0 3.0 1.0 3.0]
[4.0 3.0 3.0 2.0 2.0]
[2.0 0.0 4.0 1.0 0.0]]
crossB
[[2.0 0.0 1.0 1.0 4.0]
[4.0 2.0 0.0 2.0 0.0]
[2.0 2.0 3.0 1.0 1.0]
[4.0 3.0 3.0 2.0 2.0]
[4.0 0.0 4.0 1.0 2.0]]
EDIT: Per OP's edit on question, this would work for the purpose:
maskedChildAFixed[np.where(maskA)] = maskedChildBFill.ravel()
maskedChildBFixed[np.where(maskB)] = maskedChildAFill.ravel()
Example output for maskedChildAFixed:
[[4.0 0.0 2.0 1.0 3.0]
[4.0 2.0 0.0 1.0 3.0]
[3.0 2.0 0.0 1.0 4.0]
[4.0 0.0 3.0 2.0 1.0]
[1.0 3.0 4.0 2.0 0.0]]
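Putting the idea together, here is a small self-contained sketch of the whole crossover. The parent rows and the fixed job sets are hard-coded here (instead of drawn with rng.choice) so the result is reproducible, and boolean-mask assignment is used, which is equivalent to indexing with np.where:

```python
import numpy as np

parentA = np.array([[4, 0, 2, 1, 3],
                    [4, 2, 0, 1, 3]])
parentB = np.array([[2, 0, 1, 3, 4],
                    [4, 3, 1, 2, 0]])

# hypothetical fixed job sets, one per row (normally drawn with rng.choice)
subJobs = np.array([[0, 2, 3],
                    [0, 1, 4]])

# True marks a position that is NOT fixed and must be filled
maskA = ~np.stack([np.isin(parentA[i], subJobs[i]) for i in range(len(parentA))])
maskB = ~np.stack([np.isin(parentB[i], subJobs[i]) for i in range(len(parentB))])

childA = parentA.copy()
childB = parentB.copy()

# row by row, fill A's holes with B's non-fixed elements in B's order
# (and vice versa); both parents' holes hold the same value set per row,
# so each child row stays a valid permutation
for i in range(len(parentA)):
    childA[i, maskA[i]] = parentB[i][maskB[i]]
    childB[i, maskB[i]] = parentA[i][maskA[i]]

print(childA)  # [[1 0 2 4 3]
               #  [4 3 0 1 2]]
```

Because the fill values are exactly the values missing from the fixed part, no integer repeats within a row.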
Related
How do I get this code to always return 1 decimal for every element in the array?
import numpy as np
def mult_list_with_x(liste, skalar):
print(np.array(liste) * skalar)
liste = [1, 1.5, 2, 2.5, 3]
skalar = 2
mult_list_with_x(liste, skalar)
I.e.:
[2.0 3.0 4.0 5.0 6.0]
not
[2. 3. 4. 5. 6.]
You can use np.set_printoptions to set the format:
import numpy as np
def mult_list_with_x(liste, skalar):
print(np.array(liste) * skalar)
liste = [1, 1.5, 2, 2.5, 3]
skalar = 2
np.set_printoptions(formatter={'float': '{: 0.1f}'.format})
mult_list_with_x(liste, skalar)
Output:
[ 2.0 3.0 4.0 5.0 6.0]
Note that this np printoptions setting is global and stays in effect for all subsequent prints - see below for a temporary option.
Or to reset the defaults afterwards use:
np.set_printoptions(formatter=None)
np.get_printoptions() # to check the settings
An option to temporarily set the print options - kudos to mozway for the hint in the comments!:
with np.printoptions(formatter={'float': '{: 0.1f}'.format}):
print(np.array(liste) * skalar)
An option to just format the print output as string:
print(["%.1f" % x for x in (np.array(liste) * skalar)])
Output:
['2.0', '3.0', '4.0', '5.0', '6.0']
Choose an option fitting how the output should further be used.
You need to use this setup first:
float_formatter = "{:.1f}".format
np.set_printoptions(formatter={'float_kind':float_formatter})
Output
[2.0 3.0 4.0 5.0 6.0]
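For a one-off print there is also np.array2string, which accepts the same formatter dict without touching the global print options; a small sketch:

```python
import numpy as np

liste = [1, 1.5, 2, 2.5, 3]
skalar = 2
arr = np.array(liste) * skalar

# format only this one string; global print options stay untouched
s = np.array2string(arr, formatter={'float': '{:0.1f}'.format})
print(s)  # [2.0 3.0 4.0 5.0 6.0]
```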
I have a datafile similar to the following (the original file is much bigger):
Data
6 6
0.0 0.2 0.4
0.6 0.8
1.0
0.0 0.4 0.6
1.2 1.6
2.0
1.0 3.0 4.0 1.0
1.0 3.0
1.0 2.0 1.0 4.0
5.0 2.0
3.0 3.0 1.0 1.0
5.0 1.0
2.0 7.0 1.0 1.0
5.0 2.0
2.0 3.0 8.0 6.0
3.0 1.0
3.0 3.0 4.0 6.0
1.0 1.0
and I need to plot it as a 2D contour.
The first line is a dummy header. The first 6 gives the number of elements in the x direction:
0.0 0.2 0.4
0.6 0.8
1.0
and the second 6 gives the number of elements in the y direction:
0.0 0.4 0.6
1.2 1.6
2.0
Then each group of 6 numbers gives the contour values of one row, starting from row 1:
1.0 3.0 4.0 1.0
1.0 3.0
I want to cast this data into a 2D array so that I can plot them.
I tried,
data = numpy.genfromtxt('fileHere',delimiter= " ",skip_header=1)
to read data into a general array and then split it. But I get the following error,
Line #16999 (got 15 columns instead of 3)
I also tried Python's readline() and split() functions, but they make it much harder to continue. I want x and y in arrays, and a separate array for the data in, say, a 6x6 shape. In Matlab I used to use the fscanf function:
fscanf(fid,'%d',6);
I will be happy to hear your ideas on this. Thanks
I think you want to read the full file into a variable, then replace all newlines ('\n'), then split and convert it into an ndarray.
Here's how I did it.
import numpy as np
txt = '''\
6 6
0.0 0.2 0.4
0.6 0.8
1.0
0.0 0.4 0.6
1.2 1.6
2.0
1.0 3.0 4.0 1.0
1.0 3.0
1.0 2.0 1.0 4.0
5.0 2.0
3.0 3.0 1.0 1.0
5.0 1.0
2.0 7.0 1.0 1.0
5.0 2.0
2.0 3.0 8.0 6.0
3.0 1.0
3.0 3.0 4.0 6.0
1.0 1.0'''
txt_list = txt.replace('\n',' ').split()
# convert the values to floats; without this line the
# array would hold strings instead of numbers
txt_list = [float(i) for i in txt_list]
width = int(txt_list[1])
height = len(txt_list[2:])//width
txt_array = np.reshape(txt_list[2:], (height, width))
print (txt_array)
The output of this will be:
[[0.  0.2 0.4 0.6 0.8 1. ]
 [0.  0.4 0.6 1.2 1.6 2. ]
 [1.  3.  4.  1.  1.  3. ]
 [1.  2.  1.  4.  5.  2. ]
 [3.  3.  1.  1.  5.  1. ]
 [2.  7.  1.  1.  5.  2. ]
 [2.  3.  8.  6.  3.  1. ]
 [3.  3.  4.  6.  1.  1. ]]
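The same idea also works with the header counts used explicitly, so x, y, and the data grid come out as separate arrays right away. A sketch (the inline string stands in for the contents of the real file, and the dummy first line is stripped before parsing):

```python
import numpy as np

# stand-in for open('fileHere').read(), shortened to one data row
raw = """Data
6 6
0.0 0.2 0.4
0.6 0.8
1.0
0.0 0.4 0.6
1.2 1.6
2.0
1.0 3.0 4.0 1.0
1.0 3.0
"""

# drop the dummy header line, then treat the rest as one token stream
numbers = np.array(raw.split('\n', 1)[1].split(), dtype=float)
nx, ny = int(numbers[0]), int(numbers[1])

x = numbers[2:2 + nx]                        # x coordinates
y = numbers[2 + nx:2 + nx + ny]              # y coordinates
data = numbers[2 + nx + ny:].reshape(-1, nx) # contour values, one row per line
```

Since line breaks are ignored entirely, it does not matter how the numbers are wrapped across lines, which is exactly the problem genfromtxt had.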
Quite a smart solution to reformat your input is possible using
Pandas.
Start with reading your input file as a pandasonic DataFrame (with
standard field separator, i.e. a comma):
df = pd.read_csv('Input.txt')
As your input file does not contain commas, each line is read as a single
field and the column name (Data) is taken from the first line.
So far the initial part of df is:
Data
0 6 6
1 0.0 0.2 0.4
2 0.6 0.8
3 1.0
4 0.0 0.4 0.6
The left column is the index, but it is not important.
The type of the only column is object, actually a string.
Then, to reformat this DataFrame into a 6-columns Numpy array, it
is enough to run the following one-liner:
result = df.drop(0).Data.str.split(' ').explode().astype('float').values.reshape(-1, 6)
Steps:
drop(0) - Drop the initial row (with index 0).
Data - Take Data column.
str.split(' ') - Split each element on spaces (the result is a list of strings).
explode() - Convert each list into a sequence of rows. So far each
element is of string type.
astype('float') - Change the type to float.
values - Take the underlying Numpy (1-D) array.
reshape(-1, 6) - Reshape to 6 columns and as many rows as needed.
The result, for your data sample is:
array([[0. , 0.2, 0.4, 0.6, 0.8, 1. ],
[0. , 0.4, 0.6, 1.2, 1.6, 2. ],
[1. , 3. , 4. , 1. , 1. , 3. ],
[1. , 2. , 1. , 4. , 5. , 2. ],
[3. , 3. , 1. , 1. , 5. , 1. ],
[2. , 7. , 1. , 1. , 5. , 2. ],
[2. , 3. , 8. , 6. , 3. , 1. ],
[3. , 3. , 4. , 6. , 1. , 1. ]])
And the last step is to divide this array into:
2 initial rows (x and y coordinates),
following rows (actual data),
To do it, run:
x = result[0]
y = result[1]
data = result[2:]
Alternative: Don't create separate variables, but call plt.contour
passing respective rows of result as X, Y and Z.
Something like:
plt.contour(result[0], result[1], result[2:]);
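The whole pipeline can be checked end-to-end on an inline sample, with io.StringIO standing in for the input file (the sample is shortened to one data row):

```python
import numpy as np
import pandas as pd
from io import StringIO

txt = """Data
6 6
0.0 0.2 0.4
0.6 0.8
1.0
0.0 0.4 0.6
1.2 1.6
2.0
1.0 3.0 4.0 1.0
1.0 3.0
"""

# each line becomes one string field in the single "Data" column
df = pd.read_csv(StringIO(txt))

# drop the counts row, split on spaces, flatten, cast, reshape to 6 columns
result = (df.drop(0).Data.str.split(' ').explode()
            .astype('float').values.reshape(-1, 6))

x, y, data = result[0], result[1], result[2:]
```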
Suppose we have a DataFrame with two types of data: float and ndarray (shape is always (2,)):
data = [
0.1, np.array([1.0, 0.1]), np.array([1.0, 0.1]),
np.array([1.0, 0.1]), 0.1, 0.1, np.array([0.1, 1.0]), 1.0
]
df = pd.DataFrame(
{'A': data,}, index=[0., 1., 2.0, 2.6, 3., 3.2, 3.4, 4.0]
)
x | A
----|-------------
0.0 | 0.1
1.0 | [1.0, 0.1]
2.0 | [1.0, 0.1]
2.6 | [1.0, 0.1]
3.0 | 0.1
3.2 | 0.1
3.4 | [0.1, 1.0]
4.0 | 1.0
I would like to process consecutive duplicates in order to:
Drop repetitions if they are floats (keeping the first in the "group");
Modify each element in the "group" using all of the group's index values, if the elements are ndarrays.
The expected result for a given example would be something like (here I tried to proportionally split the range [1., 0.1] onto three regions):
x | A
----|-------------
0.0 | 0.1
1.0 | [1.0, 0.55]
2.0 | [0.55, 0.28]
2.6 | [0.28, 0.1]
3.0 | 0.1
3.4 | [0.1, 1.0]
4.0 | 1.0
To start with, I've tried using df != df.shift() to find duplicates, but it raises an error when comparing a float with an ndarray, and it would not "group" more than 2 elements.
I was also trying groupby(by=function), where function checks the dtype of the element, but it seems that groupby acts only on the index in this case.
Obviously, I can loop through rows and keep track of repetitions, but that is not very elegant (or efficient).
Do you have any suggestions?
Step 1: Drop consecutive equal floats
To check whether 2 elements of a row are equal floats, define
the following function:
def equalFloats(row):
if (type(row.A).__name__ == 'float') and (type(row.B).__name__ == 'float'):
return row.A == row.B
return False
Then, temporarily add to df column B containing the previous value
in A column:
df['B'] = df.A.shift()
And to drop consecutive floats in A column (and also drop
B column) run:
df = df[~df.apply(equalFloats, axis=1)][['A']]
For the time being df contains:
A
0.0 0.1
1.0 [1.0, 0.1]
2.0 [1.0, 0.1]
2.6 [1.0, 0.1]
3.0 0.1
3.4 [0.1, 1.0]
4.0 1
4.5 2.1
To check that consecutive, but different floats are not removed,
I added row with index 4.5 and value 2.1. As you see, it was not removed.
Step 2: Convert arrays in A column
Define another function:
def step2(row):
if type(row.A).__name__ == 'ndarray':
arr = row.A
arr[1] = arr.sum() / row.name
return row
return row
(row.name is the index value of the current row).
Then apply it:
df = df.apply(step2, axis=1)
The result is:
A
0.0 0.1
1.0 [1.0, 2.1]
2.0 [1.0, 0.775]
2.6 [1.0, 0.5473372781065089]
3.0 0.1
3.4 [0.1, 0.12456747404844293]
4.0 1
4.5 2.1
If you want, change the formula in step2 to any other of your choice.
Edit following comments
I defined df as:
A
0.0 0.1
1.0 [1.0, 0.1]
2.0 [1.0, 0.1]
2.6 [1.0, 0.1]
3.0 0.1
3.1 0.1
3.2 0.1
3.4 [0.1, 1.0]
4.0 1
4.5 2.1
It contains 3 consecutive 0.1 values.
Note that you didn't say how many consecutive values such a "group" can contain.
Both functions can be defined also with isinstance:
def equalFloats(row):
if isinstance(row.A, float) and isinstance(row.B, float):
return row.A == row.B
return False
def step2(row):
if isinstance(row.A, np.ndarray):
arr = row.A
arr[1] = arr.sum() / row.name
return row
return row
Then after you run:
df['B'] = df.A.shift()
df = df[~df.apply(equalFloats, axis=1)][['A']]
df = df.apply(step2, axis=1)
The result is:
A
0.0 0.1
1.0 [1.0, 1.1]
2.0 [1.0, 0.55]
2.6 [1.0, 0.4230769230769231]
3.0 0.1
3.4 [0.1, 0.3235294117647059]
4.0 1
4.5 2.1
As you can see, from a sequence of three 0.1 values only the first remained.
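For reference, both steps can also be done without apply at all, which sidesteps the quirks of storing ndarrays in DataFrame cells. A minimal sketch, still using the answer's placeholder formula (second element := sum / index value), not the proportional split the question ultimately wants:

```python
import numpy as np
import pandas as pd

data = [0.1, np.array([1.0, 0.1]), np.array([1.0, 0.1]),
        np.array([1.0, 0.1]), 0.1, 0.1, np.array([0.1, 1.0]), 1.0]
df = pd.DataFrame({'A': data}, index=[0., 1., 2., 2.6, 3., 3.2, 3.4, 4.])

# step 1: keep a row unless it is a float equal to the previous float
vals = df['A'].tolist()
keep = [True] + [
    not (isinstance(prev, float) and isinstance(cur, float) and prev == cur)
    for prev, cur in zip(vals, vals[1:])
]
df = df[keep].copy()

# step 2: transform array cells with the placeholder formula,
# leaving float cells untouched; copying avoids mutating the input
def transform(val, idx):
    if isinstance(val, np.ndarray):
        out = val.copy()
        out[1] = out.sum() / idx
        return out
    return val

df['A'] = [transform(v, i) for v, i in zip(df['A'], df.index)]
print(df)
```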
I am working with a numpy ndarray for preprocessing data for a neural network. It basically contains several fixed-length arrays of sensor data. For example:
>>> type(arr)
<class 'numpy.ndarray'>
>>> arr.shape
(400,1,5,4)
>>> arr
[
[[ 9.4 -3.7 -5.2 3.8]
[ 2.8 1.4 -1.7 3.4]
[ 0.0 0.0 0.0 0.0]
[ 0.0 0.0 0.0 0.0]
[ 0.0 0.0 0.0 0.0]]
..
[[ 0.0 -1.0 2.1 0.0]
[ 3.0 2.8 -3.0 8.2]
[ 7.5 1.7 -3.8 2.6]
[ 0.0 0.0 0.0 0.0]
[ 0.0 0.0 0.0 0.0]]
]
Each nested array is shaped (1, 5, 4). The goal is to run through arr and select only those arrays whose first three rows are each non-zero (a single entry can be zero, but not a whole row).
So in the example above, the first nested array should be deleted, because only its first 2 rows are non-zero, whereas we need 3 or more.
Here's a trick you can use:
mask = arr[:,:,:3].any(axis=3).all(axis=2)
arr_filtered = arr[mask]
Quick explanation: To keep a nested array it should have at least 3 first rows (hence we need to look only at arr[:,:,:3]) such that all of them (hence .all(axis=2) at the end) have at least one non-zero entry (hence .any(axis=3)).
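A small reproducible check of the trick, with two nested arrays of which the first has an all-zero third row and should therefore be dropped:

```python
import numpy as np

arr = np.array([
    [[[9.4, -3.7, -5.2, 3.8],
      [2.8, 1.4, -1.7, 3.4],
      [0.0, 0.0, 0.0, 0.0],   # third row all zero -> dropped
      [0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]],
    [[[0.0, -1.0, 2.1, 0.0],
      [3.0, 2.8, -3.0, 8.2],
      [7.5, 1.7, -3.8, 2.6],  # first three rows each non-zero -> kept
      [0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]],
])                             # shape (2, 1, 5, 4)

mask = arr[:, :, :3].any(axis=3).all(axis=2)  # shape (2, 1)
arr_filtered = arr[mask]                      # shape (1, 5, 4)
print(arr_filtered.shape)
```

Note that indexing with the (2, 1)-shaped boolean mask collapses the first two axes, so the filtered result has shape (n_kept, 5, 4).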
I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:
s1 s2 s3 s4
Action State
1 s1 0.0 0 0.8 0.2
s2 0.1 0 0.9 0.0
2 s1 0.0 0 0.9 0.1
s2 0.0 0 1.0 0.0
I would like the resulting numpy.ndarray to be the following with np.shape() = (2,2,4):
[[[ 0.0 0.0 0.8 0.2 ]
[ 0.1 0.0 0.9 0.0 ]]
[[ 0.0 0.0 0.9 0.1 ]
[ 0.0 0.0 1.0 0.0]]]
I have tried df.as_matrix() but this returns:
[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]
[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]
How do I return a list of lists for the first level, with each inner list representing one Action's records?
You could use the following:
dim = len(df.index.get_level_values(0).unique())
result = df.values.reshape((dim, -1, df.shape[1]))
print(result)
[[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]]
[[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]]
The first line just finds the number of groups that you want to groupby.
Why this (or groupby) is needed: as soon as you use .values, you lose the dimensionality of the MultiIndex from pandas. So you need to re-pass that dimensionality to NumPy in some way.
One way
In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2],
[0.1, 0.0, 0.9, 0.0]],
[[0.0, 0.0, 0.9, 0.1],
[0.0, 0.0, 1.0, 0.0]]], dtype=object)
Using Divakar's suggestion, np.reshape() worked:
>>> print(P)
s1 s2 s3 s4
Action State
1 s1 0.0 0 0.8 0.2
s2 0.1 0 0.9 0.0
2 s1 0.0 0 0.9 0.1
s2 0.0 0 1.0 0.0
>>> np.reshape(P,(2,2,-1))
[[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]]
[[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]]
>>> np.shape(np.reshape(P, (2, 2, -1)))
(2, 2, 4)
Elaborating on Brad Solomon's answer, to get a slightly more generic solution - index levels of different sizes and an unfixed number of levels - one could do something like this:
def df_to_numpy(df):
try:
shape = [len(level) for level in df.index.levels]
except AttributeError:
shape = [len(df.index)]
ncol = df.shape[-1]
if ncol > 1:
shape.append(ncol)
return df.to_numpy().reshape(shape)
If df has missing sub-indexes reshape will not work. One way to add them would be (maybe there are better solutions):
def enforce_df_shape(df):
try:
ind = pd.MultiIndex.from_product([level.values for level in df.index.levels])
except AttributeError:
return df
fulldf = pd.DataFrame(-1, columns=df.columns, index=ind) # remove -1 to fill fulldf with nan
fulldf.update(df)
return fulldf
If you are just trying to pull out one column, say s1, and get an array with shape (2,2) you can use the .index.levshape like this:
x = df.s1.to_numpy().reshape(df.index.levshape)
This will give you a (2, 2) array containing the values of s1.
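The same levshape idea extends to the whole frame: reshape to levshape plus the column count. A self-contained check, rebuilding a DataFrame like the one in the question:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([[1, 2], ['s1', 's2']],
                                 names=['Action', 'State'])
df = pd.DataFrame([[0.0, 0, 0.8, 0.2],
                   [0.1, 0, 0.9, 0.0],
                   [0.0, 0, 0.9, 0.1],
                   [0.0, 0, 1.0, 0.0]],
                  index=idx, columns=['s1', 's2', 's3', 's4'])

# levshape is (2, 2); append the column count for the full cube
cube = df.to_numpy().reshape(df.index.levshape + (df.shape[1],))
print(cube.shape)  # (2, 2, 4)
```

This assumes the index is complete (every Action/State combination present); with missing sub-indexes the reshape fails, as noted above.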