Python - issue with dimension of sequence - python

I want to create in Python the following sequence of zero's and one's:
{0, 1,1,1,1, 0,0, 1,1,1, 0,0,0, 1,1, 0,0,0,0, 1}
So there is first 1 zero and 4 one's, then 2 zeros and 3 one's, then 3 zeros and 2 ones and finally 4 zeros and 1 one. The final array is supposed to have dimension 20x1, but my code gives me the dimension 4x2. Does anyone know how I can fix this?
Here's my code:
import numpy as np
seq = [ (np.ones(n), np.zeros(5-n) ) for n in range(1,5)]
Many thanks in advance!

For each iteration you create a tuple of two arrays, so you end up with 4 tuples of 2 items each, hence the 4x2 result. You can get the form you want by concatenating all the arrays together, but there is also a pattern in your sequence: it looks like the rows of an upper triangular matrix of 1s and 0s, which you can build and then flatten.
import numpy as np
n = 5
ones = np.ones((n, n), dtype=int)
seq = np.triu(ones)[1:].flatten()
Output:
array([0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1])
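The concatenation route mentioned above could look roughly like this. It is only a sketch: `pieces` is an illustrative name, and the zeros are placed before the ones to match the sequence asked for in the question.

```python
import numpy as np

# For each n, build n zeros followed by (5 - n) ones as one small array.
pieces = [
    np.concatenate((np.zeros(n, dtype=int), np.ones(5 - n, dtype=int)))
    for n in range(1, 5)
]

# Join the four small arrays end to end into a single 1-D array.
seq = np.concatenate(pieces)
print(seq.shape)  # (20,)
print(seq)
```

If a 20x1 column is really needed rather than a flat array of length 20, `seq.reshape(-1, 1)` gives that shape.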

You can use flatten:
import numpy as np
l = np.array([[0] * n + [1] * (5 - n) for n in range(1, 5)]).flatten()
print(l)
# >>> [0 1 1 1 1 0 0 1 1 1 0 0 0 1 1 0 0 0 0 1]

Related

Insert value in numpy array with conditions

I want to insert the value in the NumPy array as follows,
If Nth row is the same as (N-1)th row insert 1 for Nth row and (N-1)th row and rest 0
If Nth row is different from (N-1)th row then change column and repeat condition
Here is the example
import numpy as np
import pandas as pd
d = {'col1': [2,2, 3,3,3, 4,4, 5,5,5,],
     'col2': [3,3, 4,4,4, 1,1, 0,0,0]}
df = pd.DataFrame(data=d)
np.zeros((10,4))
###########################################################
OUTPUT MATRIX
1 0 0 0 First two rows are the same so 1,1 in a first column
1 0 0 0
0 1 0 0 Three-rows are same 1,1,1
0 1 0 0
0 1 0 0
0 0 1 0 Again two rows are the same 1,1
0 0 1 0
0 0 0 1 Again three rows are same 1,1,1
0 0 0 1
0 0 0 1
IIUC, you can achieve this simply with numpy indexing:
# group by successive identical values
group = df.ne(df.shift()).all(1).cumsum().sub(1)
# craft the numpy array
a = np.zeros((len(group), group.max()+1), dtype=int)
a[np.arange(len(df)), group] = 1
print(a)
Alternative with numpy.identity:
# group by successive identical values
group = df.ne(df.shift()).all(1).cumsum().sub(1)
shape = df.groupby(group).size()
# craft the numpy array
a = np.repeat(np.identity(len(shape), dtype=int), shape, axis=0)
print(a)
output:
array([[1, 0, 0, 0],
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 1, 0],
[0, 0, 0, 1],
[0, 0, 0, 1],
[0, 0, 0, 1]])
intermediates:
group
0 0
1 0
2 1
3 1
4 1
5 2
6 2
7 3
8 3
9 3
dtype: int64
shape
0 2
1 3
2 2
3 3
dtype: int64
Another option, just for fun (likely not so efficient on large inputs):
a = pd.get_dummies(df.agg(tuple, axis=1)).to_numpy()
Note that this get_dummies option groups identical rows wherever they occur, not only successive identical rows. To get that same grouping with the first (numpy indexing) approach, you would need to use group = df.groupby(list(df)).ngroup() together with the numpy indexing step (this wouldn't work with repeating the identity).
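As a rough sketch of that last remark, grouping by identical rows with ngroup and then using the numpy indexing step could look like this. The data is the example from the question; `.to_numpy()` is added here only so that a plain array is used as the index.

```python
import numpy as np
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({'col1': [2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
                   'col2': [3, 3, 4, 4, 4, 1, 1, 0, 0, 0]})

# One group number per distinct row, regardless of where the row occurs.
group = df.groupby(list(df)).ngroup().to_numpy()

# One column per group; put a 1 in each row's group column.
a = np.zeros((len(df), group.max() + 1), dtype=int)
a[np.arange(len(df)), group] = 1
print(a)
```

For this particular data the result matches the expected matrix, because identical rows are always consecutive and the group keys happen to be in sorted order. With other data the column order follows the ngroup numbering, not the order of first appearance.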

Is there a way to simplify the creation of all possible (length x height) grids?

Here's my code for a 4x4 grid to better explain my problem:
#The "Duct-Tape" solution
totalGrids = []
for box0 in range(0,2):
    for box1 in range(0,2):
        for box2 in range(0,2):
            for box3 in range(0,2):
                for box4 in range(0,2):
                    for box5 in range(0,2):
                        for box6 in range(0,2):
                            for box7 in range(0,2): #0 = OutBag, 1 = InBag
                                for box8 in range(0,2):
                                    for box9 in range(0,2):
                                        for box10 in range(0,2):
                                            for box11 in range(0,2):
                                                for box12 in range(0,2):
                                                    for box13 in range(0,2):
                                                        for box14 in range(0,2):
                                                            for box15 in range(0,2):
                                                                totalGrids.append([[box0,box1,box2,box3],
                                                                                   [box4,box5,box6,box7],
                                                                                   [box8,box9,box10,box11],
                                                                                   [box12,box13,box14,box15]])
What's a way to make something like this for a length x height size grid?
This is another way to do it with fewer for loops by using binary arithmetic:
totalGrids = []
for i in range(0, 1 << 16):
    totalGrids.append(
        [
            [(i >> j) & 1 for j in range(0, 4)],
            [(i >> j) & 1 for j in range(4, 8)],
            [(i >> j) & 1 for j in range(8, 12)],
            [(i >> j) & 1 for j in range(12, 16)]
        ])
print(totalGrids[0])
print(totalGrids[1])
print(totalGrids[2])
print()
print(totalGrids[-3])
print(totalGrids[-2])
print(totalGrids[-1])
Output (first 3 and last 3 elements):
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
[[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
[[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
[[1, 0, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
[[0, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
To generalize this from 4 x 4 to height x width, something like this should work:
height = 3
width = 5
totalGrids = []
for i in range(0, 1 << (height * width)):
    totalGrids.append(
        [[(i >> j) & 1 for j in range(k * width, (k + 1) * width)] for k in range(0, height)]
    )
Here is an explanation of the above.
The matrix, which has height x width elements, is to be filled with every possible combination of 0s and 1s across these elements. As an example, if height = 2 and width = 4, then there are 8 elements in total, and one ordering of the required combinations of 0s and 1s is:
0 0 0 0 0 0 0 0 (this is 0 in binary)
0 0 0 0 0 0 0 1 (this is 1 in binary)
0 0 0 0 0 0 1 0 (this is 2 in binary)
0 0 0 0 0 0 1 1 (this is 3 in binary)
...
0 0 0 0 1 1 1 1 (this is 15 in binary)
0 0 0 1 0 0 0 0 (this is 16 in binary)
0 0 0 1 0 0 0 1
0 0 0 1 0 0 1 0
0 0 0 1 0 0 1 1 (EXAMPLE VALUE USED BELOW)
...
0 0 1 0 0 0 0 0 (this is 32 in binary)
...
0 0 1 1 0 0 0 0 (this is 48 in binary)
...
1 1 1 1 1 1 1 1 (this is 255 = 2**8 - 1 in binary)
These are just the binary values from 0 to 2**8 - 1 which can be expressed as Python integers in range(0, 2**8). They are exactly what is needed, and now the only question is how to populate a Python list of lists of size height x width.
The answer is to use binary arithmetic. Let's look at 0 0 0 1 0 0 1 1 as an example. We can specify this in Python as an integer, namely i = 19.
For the 1st slot of 8, we want to use the rightmost binary bit in our example, which is 1. We can extract this using Python's bitwise & operation by taking value = i & 1. Applying & 1 to any integer effectively masks off all but the binary ones-place digit.
For the 2nd slot, we need to add an additional step:
First we slide the bits to the right by 1 position (allowing the rightmost bit to fall off the edge, which is fine since we have already processed it and won't need it again) using Python's right shift operation >> as follows: value = i >> 1. In binary, this yields 0 0 0 0 1 0 0 1, which is the integer 9. The right-shift operator has moved the bit that was in the binary twos-place rightward into the binary ones-place.
Next, we can use the same technique as we did for the 1st slot to mask off all but the ones-place bit of the shifted value: value = value & 1.
Rather than do the above as two separate statements, we can simply write: value = (i >> 1) & 1.
In general, for the j'th slot (counting from j = 0 for the 1st slot), we can extract the j'th bit from our example integer by writing: value = (i >> j) & 1.
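To make the walkthrough concrete, here is a small sketch that extracts all 8 bits of the example value i = 19 and arranges them into a 2 x 4 grid. The names `bits` and `grid` are only illustrative.

```python
i = 19  # 0 0 0 1 0 0 1 1 in binary

# Bit j of i, for j = 0 (rightmost bit) up to j = 7.
bits = [(i >> j) & 1 for j in range(8)]
print(bits)  # [1, 1, 0, 0, 1, 0, 0, 0]

# Split the 8 bits into height = 2 rows of width = 4.
height, width = 2, 4
grid = [bits[k * width:(k + 1) * width] for k in range(height)]
print(grid)  # [[1, 1, 0, 0], [1, 0, 0, 0]]
```

Note that the rightmost binary digit ends up in the first slot, so each row reads in the reverse order of the binary notation.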
Now let's look at the key logic within the loop:
[[(i >> j) & 1 for j in range(k * width, (k + 1) * width)] for k in range(0, height)]
This uses a nested list comprehension to loop first over k in range(0, height) and then over j in range(k * width, (k + 1) * width), and to put the result of the above bitwise expression (i >> j) & 1 into each successive element in our matrix (or list of lists).
Finally, let's look again at the very outer loop in the code:
for i in range(0, 1 << (height * width)):
This uses Python's bitwise left shift operation <<, which does the opposite of what right shift (>>) does, namely to shift the bits of 1 to the left by (height * width) binary positions. Because each shift to the left causes a number to double in value, our left shift expression gives the same result as 2 ** (height * width), which is exactly the number of 0/1 combinations that your question is seeking.
So, by iterating from 0 up to (but not including) 2 ** (height * width), then extracting and collating the bits of each value into the corresponding matrix elements for that iteration's matrix, and appending that matrix to the totalGrids variable, we ultimately construct a list of matrices with the required properties.
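If it helps, the same logic can be wrapped in a small generator so that grids are produced one at a time instead of all being stored in a list. This is just a sketch; `all_grids` is a hypothetical helper name, not something from the question.

```python
def all_grids(height, width):
    """Yield every height x width grid of 0s and 1s (hypothetical helper)."""
    cells = height * width
    for i in range(1 << cells):  # 2 ** cells combinations in total
        yield [[(i >> j) & 1 for j in range(k * width, (k + 1) * width)]
               for k in range(height)]


grids = list(all_grids(2, 2))
print(len(grids))   # 16
print(grids[1])     # [[1, 0], [0, 0]]
print(grids[-1])    # [[1, 1], [1, 1]]
```

The number of grids doubles with every extra cell, so for anything much larger than 4 x 4 it is probably better to loop over the generator directly rather than calling list() on it.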

How to change values into a matrix?

I have a matrix filled with 0 values and I want to add randomly a 1 value into a and a+1 position. Then I want to use b and b+1 for the next row.. and so on.
How can I do it?
import random
w, h = 10, 3
Matrix = [[0 for x in range(w)] for y in range(h)]
a = random.randint(0,9)
b = random.randint(0,9)
c = random.randint(0,9)
print(a, b, c)
EXAMPLE:
a = 5 b = 2 c = 1
0000011000
0011000000
0110000000
You should reduce the upper bound of randint to 8 (more generally w-2), or alternatively use randrange(w-1), so that the +1 doesn't take you past the end of the row.
Then just loop on each row, generate the number and change that row using the number as an index:
import random
w, h = 10, 3
matrix = [[0 for x in range(w)] for y in range(h)]
for row in matrix:
    i = random.randrange(w-1)
    print(i)
    row[i:i+2] = [1, 1]
print(*matrix, sep='\n')
Will give:
0
8
2
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
[0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
For a matrix stored as a list of lists in Python, Matrix[i] gives you row i and Matrix[i][j] gives you the element in column j of that row.
So if you want the third item in the second row, it would be Matrix[1][2]; note that indices start from 0.
Getting back to your question, it would look something like this,
Matrix[0][a] = 1
Matrix[0][a + 1] = 1
And so on for the b, c but you could also use a loop.
offsets = [a, b, c]  # not named "random", which would shadow the random module
for row in range(h):
    Matrix[row][offsets[row]] = 1
    Matrix[row][offsets[row] + 1] = 1
In this loop, for each row of the matrix (there are h rows in your code), we take that row's random value and set the element at that index to 1. For the first row, that is the index 'a'.
Then we go to 'a + 1' in the same row and also set it to 1.
For the next row, we take b and do the same thing.
Note: this would raise an IndexError if a, b or c is 9, because as soon as you increase it by 1 you are past the end of the row, whose valid indices are 0 to 9.
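One way to avoid that error is to draw the start positions from 0 to w - 2 in the first place. A minimal sketch, assuming the same w and h as in the question (the names `matrix` and `offsets` are illustrative):

```python
import random

w, h = 10, 3
matrix = [[0] * w for _ in range(h)]

# randint includes both endpoints, so the largest start index is w - 2
# and start + 1 is at most w - 1, the last valid index of a row.
offsets = [random.randint(0, w - 2) for _ in range(h)]

for row, start in zip(matrix, offsets):
    row[start] = 1
    row[start + 1] = 1

print(offsets)
print(*matrix, sep='\n')
```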

How do I replace an element with another new element in the same index and move the previous element to the next index

I have this problem where I want to replace an element with a new element, and instead of removing the element I replaced, I just want it to move to the next index.
import numpy as np
empty_arr = [0] * 5
arr = np.array(empty_arr)
inserted = np.insert(empty_arr, 1, 3)
inserted = np.insert(empty_arr, 1, 4)
#Output:
[0 4 0 0 0 0]
I don't know the right syntax for this but I just want to replace element 3 with 4
#Expected Output:
[0 3 4 0 0 0] #move the element 4 to next index
You are placing the result of the first insertion in the inserted variable, but for the 2nd insertion you start over from the original array, which overwrites the previous result.
You should start the 2nd insertion from the previous result:
inserted = np.insert(empty_arr, 1, 3)
inserted = np.insert(inserted, 1, 4)
BTW, do you have to use numpy arrays for this? Regular Python lists seem better suited:
empty_arr = [0] * 5
empty_arr.insert(1,3)
empty_arr.insert(1,4)
print(empty_arr)
[0, 4, 3, 0, 0, 0, 0]
Note that if you want 4 to appear after 3 in your result, you either have to insert them in the reverse order at index 1 or insert 4 at index 2 after inserting 3 at index 1.
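Both orderings from the note above are sketched here with plain lists, plus a numpy variant that passes both values to np.insert in one call. The variable names are only illustrative.

```python
import numpy as np

# Option 1: insert in reverse order at the same index.
first = [0] * 5
first.insert(1, 4)
first.insert(1, 3)   # 3 pushes 4 one position to the right

# Option 2: insert 3 at index 1, then 4 at index 2.
second = [0] * 5
second.insert(1, 3)
second.insert(2, 4)

# numpy: insert both values at index 1 in a single call.
third = np.insert(np.zeros(5, dtype=int), 1, [3, 4])

print(first)   # [0, 3, 4, 0, 0, 0, 0]
print(second)  # [0, 3, 4, 0, 0, 0, 0]
print(third)   # [0 3 4 0 0 0 0]
```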
import numpy as np
empty_arr = [0] * 5
arr = np.array(empty_arr)
empty_arr = np.insert(empty_arr, 1, 3)
empty_arr = np.insert(empty_arr, 1, 4)
#output
array([0, 4, 3, 0, 0, 0, 0])

Numpy: how to convert observations to probabilities?

I have a feature matrix and a corresponding vector of targets, which are ones or zeroes:
# raw observations
features = np.array([[1, 1, 0],
[1, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 0, 1]])
targets = np.array([1, 0, 1, 1, 0, 0])
As you can see, each feature may correspond to both ones and zeros. I need to convert my raw observation matrix to probability matrix, where each feature will correspond to the probability of seeing one as a target:
[1 1 0] -> 0.5
[0 1 0] -> 0.67
[0 0 1] -> 0
I have constructed a quite straight-forward solution:
import numpy as np
# raw observations
features = np.array([[1, 1, 0],
[1, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 0, 1]])
targets = np.array([1, 0, 1, 1, 0, 0])
from collections import Counter
def convert_obs_to_proba(features, targets):
    features_ = []
    targets_ = []
    # compute unique rows (idx will point to some representative)
    b = np.ascontiguousarray(features).view(np.dtype((np.void, features.dtype.itemsize * features.shape[1])))
    _, idx = np.unique(b, return_index=True)
    idx = idx[::-1]
    zeros = Counter()
    ones = Counter()
    # collect row-wise number of one and zero targets
    for i, row in enumerate(features[:]):
        if targets[i] == 0:
            zeros[tuple(row)] += 1
        else:
            ones[tuple(row)] += 1
    # iterate over unique features and compute probabilities
    for k in idx:
        unique_row = features[k]
        zero_count = zeros[tuple(unique_row)]
        one_count = ones[tuple(unique_row)]
        proba = float(one_count) / float(zero_count + one_count)
        features_.append(unique_row)
        targets_.append(proba)
    return np.array(features_), np.array(targets_)
features_, targets_ = convert_obs_to_proba(features, targets)
print(features_)
print(targets_)
which:
extracts unique features;
counts number of zero and one observations targets for each unique feature;
computes probability and constructs the result.
Could it be solved in a prettier way using some advanced numpy magic?
Update: the previous code was pretty inefficient, since it rescanned all rows for every unique row (O(n^2) in the worst case). I converted it to the more performance-friendly version above. Old code:
import numpy as np
# raw observations
features = np.array([[1, 1, 0],
[1, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 0, 1]])
targets = np.array([1, 0, 1, 1, 0, 0])
def convert_obs_to_proba(features, targets):
    features_ = []
    targets_ = []
    # compute unique rows (idx will point to some representative)
    b = np.ascontiguousarray(features).view(np.dtype((np.void, features.dtype.itemsize * features.shape[1])))
    _, idx = np.unique(b, return_index=True)
    idx = idx[::-1]
    # calculate ZERO class occurrences and ONE class occurrences
    for k in idx:
        unique_row = features[k]
        zeros = 0
        ones = 0
        for i, row in enumerate(features[:]):
            if np.array_equal(row, unique_row):
                if targets[i] == 0:
                    zeros += 1
                else:
                    ones += 1
        proba = float(ones) / float(zeros + ones)
        features_.append(unique_row)
        targets_.append(proba)
    return np.array(features_), np.array(targets_)
features_, targets_ = convert_obs_to_proba(features, targets)
print(features_)
print(targets_)
It's easy using Pandas:
import pandas as pd
df = pd.DataFrame(features)
df['targets'] = targets
Now you have:
   0  1  2  targets
0  1  1  0        1
1  1  1  0        0
2  0  1  0        1
3  0  1  0        1
4  0  1  0        0
5  0  0  1        0
Now, the fancy part:
df.groupby([0,1,2]).targets.mean()
Gives you:
0  1  2
0  0  1    0.000000
   1  0    0.666667
1  1  0    0.500000
Name: targets, dtype: float64
Pandas doesn't print the 0 at the leftmost part of the 0.666 row, but if you inspect the value there, it is indeed 0.
np.sum(np.reshape([targets[f] if tuple(features[f])==tuple(i) else 0 for i in np.vstack(list(set(map(tuple,features)))) for f in range(features.shape[0])],features.shape[::-1]),axis=1)/np.sum(np.reshape([1 if tuple(features[f])==tuple(i) else 0 for i in np.vstack(list(set(map(tuple,features)))) for f in range(features.shape[0])],features.shape[::-1]),axis=1)
Here you go, numpy magic! Although unnecessarily so; this could probably be cleaned up using some boring variables ;)
(And this is probably far from optimal)
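For comparison, here is a sketch of a shorter numpy-only version. It swaps in a different technique from the one-liner above: np.unique with axis=0 to find the distinct rows, and np.bincount to count targets per distinct row. The variable names are illustrative.

```python
import numpy as np

features = np.array([[1, 1, 0],
                     [1, 1, 0],
                     [0, 1, 0],
                     [0, 1, 0],
                     [0, 1, 0],
                     [0, 0, 1]])
targets = np.array([1, 0, 1, 1, 0, 0])

# Distinct rows (sorted), plus for each original row the index of its distinct row.
unique_rows, inverse = np.unique(features, axis=0, return_inverse=True)
inverse = inverse.ravel()  # make sure the index array is 1-D

# Sum of targets per distinct row, divided by the number of rows in that group.
ones_per_row = np.bincount(inverse, weights=targets)
rows_per_group = np.bincount(inverse)
probabilities = ones_per_row / rows_per_group

print(unique_rows)
print(probabilities)
```

The distinct rows come back in sorted order, so here they are [0 0 1], [0 1 0], [1 1 0] with probabilities of roughly 0, 0.67 and 0.5, which is a different row order than in the question's function.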
