Numpy pack bits into 32-bit little-endian values - python

NumPy provides the packbits function to convert an array of 0/1 values into packed bits. With bitorder='little' I can read them in C as uint8_t values without issues. However, I would like to read them as uint32_t values, which means that I have to reverse the order of each group of 4 bytes.
I tried to use
import numpy as np
array = np.array([1,0,1,1,0,1,0,1,0,1,0,0,1,0,1,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,0,1,
1,0,0,1,1,0,1,0,1,1,0,0,1,1,1,0,0,1])
array = np.packbits(array, bitorder='little')
array.dtype = np.uint32
array.byteswap(inplace=True)
print(array)
but got the following error:
Traceback (most recent call last):
File "sample.py", line 5, in <module>
array.dtype = np.uint32
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
I have 50 bits in the input. The first chunk of 32 bits, written in little-endian format (earliest input bit is the least significant bit), is 0b10101001101011001101001010101101 = 2846675629; the second is 0b100111001101011001 = 160601. So the expected output is
[2846675629 160601]
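These expected values can be reproduced in plain Python by treating the earliest bit of each 32-bit chunk as the least significant (a quick sanity check of the arithmetic above):
bits = [1,0,1,1,0,1,0,1,0,1,0,0,1,0,1,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,0,1,
        1,0,0,1,1,0,1,0,1,1,0,0,1,1,1,0,0,1]
words = [sum(b << i for i, b in enumerate(bits[k:k+32])) for k in range(0, len(bits), 32)]
print(words)  # [2846675629, 160601]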

My other answer below fixes the exception; this one produces the expected uint32 values. It relies on the two questions linked in the code comments:
Pad the array on the right to the nearest power of two (for 50 bits that is 64, which is also a multiple of 32)
Reshape into rows of 32 bits each
Pack bits PER ROW and only then view the result as uint32.
import numpy as np
import math

# https://stackoverflow.com/questions/49791312/numpy-packbits-pack-to-uint16-array
# https://stackoverflow.com/questions/36534035/pad-0s-of-numpy-array-to-nearest-power-of-two/36534077

def next_power_of_2(number):
    # Returns the smallest power of two that is >= 'number'
    return 2**math.ceil(math.log(number, 2))

a = np.array([
    1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1,
    1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1
])
# a = np.array([
#     0 for _ in range(31)
# ] + [1])
padding_size = next_power_of_2(len(a)) - len(a)
b = np.concatenate([a, np.zeros(padding_size, dtype=a.dtype)])
c = b.reshape((-1, 32)).astype(np.uint8)
d = np.packbits(c, bitorder='little').view(np.uint32)
print(d)
output:
[2846675629 160601]
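Note that the power-of-two padding works here because 64 happens to be a multiple of 32; padding to the next multiple of 32 is what the reshape actually needs, and with a flat array it lets you skip the reshape entirely. A minimal sketch of that variant (the view assumes a little-endian machine, as the byte layout in the question does):
import numpy as np

a = np.array([1,0,1,1,0,1,0,1,0,1,0,0,1,0,1,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,0,1,
              1,0,0,1,1,0,1,0,1,1,0,0,1,1,1,0,0,1])
pad = -len(a) % 32  # bits needed to reach the next multiple of 32
packed = np.packbits(np.concatenate([a, np.zeros(pad, dtype=a.dtype)]), bitorder='little')
print(packed.view(np.uint32))  # [2846675629     160601]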

You can't reassign array.dtype = np.uint32 as you did, because your 50 bits pack into 7 bytes, and 7 is not divisible by the 4-byte item size of uint32 (which is exactly what the ValueError says).
Instead, you can create a new array of the new type. Note that this widens each packed byte into its own uint32 value; it fixes the exception, but it does not merge groups of four bytes into one value (see my other answer for that).
import numpy as np
array = np.array([1,0,1,1,0,1,0,1,0,1,0,0,1,0,1,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,0,1,1,0,0,1,1,0,1,0,1,1,0,0,1,1,1,0,0,1])
array = np.packbits(array, bitorder='little')
array = np.array(array, dtype=np.uint32)
array.byteswap(inplace=True)
print(array)
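The difference between the two conversions is easy to see on a small example (an added illustration; the view result assumes a little-endian machine):
import numpy as np

packed = np.packbits(np.ones(16, dtype=np.uint8), bitorder='little')  # two 0xFF bytes
print(np.array(packed, dtype=np.uint32))       # [255 255] - one uint32 per byte
print(np.pad(packed, (0, 2)).view(np.uint32))  # [65535]   - four bytes reinterpreted as one value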

Related

Optimal combination of linked-buckets

Let's say I have the following (always binary) options:
import numpy as np
a=np.array([1, 1, 0, 0, 1, 1, 1])
b=np.array([1, 1, 0, 0, 1, 0, 1])
c=np.array([1, 0, 0, 1, 0, 0, 0])
d=np.array([1, 0, 1, 1, 0, 0, 0])
And I want to find the optimal combination of the above that gets me to at least the requirement, with minimal overshoot:
req = np.array([50,50,20,20,100,40,10])
For example:
final = X1*a + X2*b + X3*c + X4*d
Does this map to a known operations research problem? Or does it fall under mathematical programming?
Is this NP-hard, or exactly solvable in a reasonable amount of time? (I've assumed it's combinatorially impossible to solve exactly.)
Are there known solutions to this?
Note: The actual length of arrays are longer - think ~50, and the number of options are ~20
My current research has led me to some combination of the assignment problem and knapsack, but not too sure.
It's a covering problem, easily solvable using an integer program solver (I used OR-Tools below). If the X variables can be fractional, substitute NumVar for IntVar. If the X variables are 0-1, substitute BoolVar.
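In symbols, the model built by the code below is (with X_i the multipliers and extra_j the surplus of row j):
minimize    sum_j extra_j
subject to  sum_i opt_i[j] * X_i - req_j = extra_j >= 0   for every j
            X_i >= 0 and integer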
import numpy as np

a = np.array([1, 1, 0, 0, 1, 1, 1])
b = np.array([1, 1, 0, 0, 1, 0, 1])
c = np.array([1, 0, 0, 1, 0, 0, 0])
d = np.array([1, 0, 1, 1, 0, 0, 0])
opt = [a, b, c, d]
req = np.array([50, 50, 20, 20, 100, 40, 10])

from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("SCIP")
x = [solver.IntVar(0, solver.infinity(), "x{}".format(i)) for i in range(len(opt))]
extra = [solver.NumVar(0, solver.infinity(), "y{}".format(j)) for j in range(len(req))]
for j, (req_j, extra_j) in enumerate(zip(req, extra)):
    solver.Add(extra_j == sum(opt_i[j] * x_i for (opt_i, x_i) in zip(opt, x)) - req_j)
solver.Minimize(sum(extra))
status = solver.Solve()
if status == pywraplp.Solver.OPTIMAL:
    print("Solution:")
    print("Objective value =", solver.Objective().Value())
    for i, x_i in enumerate(x):
        print("x{} = {}".format(i, x[i].solution_value()))
else:
    print("The problem does not have an optimal solution.")
Output:
Solution:
Objective value = 210.0
x0 = 40.0
x1 = 60.0
x2 = -0.0
x3 = 20.0
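A quick numpy check of that solution (added for illustration) confirms every requirement is met and the total surplus is 210:
import numpy as np

a = np.array([1, 1, 0, 0, 1, 1, 1])
b = np.array([1, 1, 0, 0, 1, 0, 1])
d = np.array([1, 0, 1, 1, 0, 0, 0])
req = np.array([50, 50, 20, 20, 100, 40, 10])
total = 40 * a + 60 * b + 20 * d
print(total - req)          # [70 50  0  0  0  0 90]
print((total - req).sum())  # 210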

Complex index numpy array or indexing dataframe

I have an array (dataframe) with shape (9800, 9800). I need to index it (without labels) like:
x = (9800,9800)
a = x[0:7000,0:7000] (plus) x[7201:9800, 0:7000] (plus) x[0:7000, 7201:9800] (plus) x[7201:9800, 7201:9800]
b = x[7000:7200, 7000:7200]
c = x[7000:7200, 0:7000] (plus) x[7000:7200, 7201:9800]
d = x[0:7000, 7000:7200] (plus) x[7201:9800, 7000:7200]
What I mean by plus is not a proper addition but more like a concatenation: putting the resulting dataframes together one next to the other (see the attached image).
Is there any "easy" way of doing this? I need to replicate this for 10,000 dataframes and add them up individually to save memory.
You have np.r_, which basically creates an index array for you, for example:
np.r_[:3,4:6]
gives
array([0, 1, 2, 4, 5])
So in your case (using the question's 7201:9800 for the outer blocks; note you need np.ix_ to take the cross product of row and column indices, since plain x[a_idx, a_idx] would only pick out the diagonal element pairs):
a_idx = np.r_[0:7000, 7201:9800]
a = x[np.ix_(a_idx, a_idx)]
c = x[7000:7200, a_idx]
In [167]: x=np.zeros((9800,9800),'int8')
The first list of slices:
In [168]: a = [x[0:7000,0:7000], x[7201:9800, 0:7000],x[0:7000, 7201:9800], x[7201:9800, 7201:9800]]
and their shapes:
In [169]: [i.shape for i in a]
Out[169]: [(7000, 7000), (2599, 7000), (7000, 2599), (2599, 2599)]
Since the shapes vary, you can't simply concatenate them all:
In [170]: np.concatenate(a, axis=0)
Traceback (most recent call last):
File "<ipython-input-170-c111dc665509>", line 1, in <module>
np.concatenate(a, axis=0)
File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 7000 and the array at index 2 has size 2599
In [171]: np.concatenate(a, axis=1)
Traceback (most recent call last):
File "<ipython-input-171-227af3749524>", line 1, in <module>
np.concatenate(a, axis=1)
File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 7000 and the array at index 1 has size 2599
You can concatenate subsets:
In [172]: np.concatenate(a[:2], axis=0)
Out[172]:
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int8)
In [173]: _.shape
Out[173]: (9599, 7000)
I won't take the time to construct the other lists, but it looks like you could construct the first column with
np.concatenate([a[0], c[0], a[1]], axis=0)
similarly for the other columns, and then concatenate columns. Or join them by rows first.
np.block([[a[0],d[0],a[2]],[....]]) with an appropriate mix of list elements should do the same (just a difference in notation, same concatenation work).
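As a small illustration of that np.block layout on a toy array (block sizes of my own choosing, scaled down from the question's):
import numpy as np

x = np.arange(100).reshape(10, 10)
# corner blocks (rows/cols outside the middle band) and the vertical band, as in the question
a = [x[0:3, 0:3], x[5:10, 0:3], x[0:3, 5:10], x[5:10, 5:10]]
d = [x[0:3, 3:5], x[5:10, 3:5]]
rearranged = np.block([[a[0], d[0], a[2]],
                       [a[1], d[1], a[3]]])
print(rearranged.shape)  # (8, 10)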

Calculate sum of Nth column of numpy array grouped by the indices in the first two columns?

I would like to loop over the following check_matrix in such a way that the code recognizes whether the first and second elements are 1 and 1, or 1 and 2, etc. Then, for each separate class of pair, i.e. (1,1), (1,2) or (2,2), the code should store in new matrices the sum of the last element (which in this case has index 8) times exp(-i*q·(check_matrix[k][2:5]-check_matrix[k][5:8])), where i is the imaginary unit, k is the running index over check_matrix, and q is a vector defined as given below. So there are 20 q vectors.
import numpy as np

q = []
for i in np.linspace(0, 10, 20):
    q.append(np.array((0, 0, i)))
q = np.array(q)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
                         [1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
                         [1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
                         [2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
                         [2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
                         [2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
This means in principle I will have 20 matrices of shape 2x2, one corresponding to each q vector.
For the moment my code gives only one matrix, which appears to be the last one, even though I am appending to Matrices. My code looks like this:
for i in range(2):
    i = i + 1
    for j in range(2):
        j = j + 1
        j_list = []
        Matrices = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                j_list.append(check_matrix[k][8]*np.exp(-1J*np.dot(q, (np.subtract(check_matrix[k][2:5], check_matrix[k][5:8])))))
        j_11 = np.sum(j_list)
        I_matrix[i-1][j-1] = j_11
        Matrices.append(I_matrix)
I_matrix is defined as below:
I_matrix= np.zeros((2,2),dtype=np.complex_)
At the moment I get the following output.
Matrices = [array([[-0.66071446-0.77603624j, -0.29038112+2.34855023j], [-0.31387562-0.08116629j, 4.2788 +0.j ]])]
But I want a matrix corresponding to each q value, meaning that in total there should be 20 matrices in this case, where each element of a 2x2 matrix contains the sum for the corresponding (1,1), (1,2), (2,1) or (2,2) pair, laid out in the following manner
array([[11., 12.],
[21., 22.]])
I shall highly appreciate your suggestion to correct it. Thanks in advance!
I am pretty sure you can solve this problem in an easier way, and I am not 100% sure that I understood you correctly, but here is some code that does what I think you want. If you have a way to check whether the results are valid, I suggest you do so.
import numpy as np

n = 20
q = np.zeros((20, 3))
q[:, -1] = np.linspace(0, 10, n)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
                         [1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
                         [1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
                         [2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
                         [2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
                         [2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
check_matrix[:, :2] -= 1  # python indexing is zero based

matrices = np.zeros((n, 2, 2), dtype=np.complex_)
for i in range(2):
    for j in range(2):
        k_list = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                k_list.append(check_matrix[k][8] *
                              np.exp(-1J * np.dot(q, check_matrix[k][2:5]
                                                  - check_matrix[k][5:8])))
        matrices[:, i, j] = np.sum(k_list, axis=0)
NOTE: I changed your indices to have consistent zero-based indexing.
Here is another approach where I replaced the k-loop with a vectored version:
for i in range(2):
    for j in range(2):
        k = np.logical_and(check_matrix[:, 0] == i, check_matrix[:, 1] == j)
        temp = np.dot(check_matrix[k, 2:5] - check_matrix[k, 5:8], q[:, :, np.newaxis])[..., 0]
        temp = check_matrix[k, 8:] * np.exp(-1J * temp)
        matrices[:, i, j] = np.sum(temp, axis=0)
3 line solution
You asked for an efficient solution in your original title, so how about this three-liner that avoids nested loops and if statements, and is thus hopefully faster?
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
grp=np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[np.sum(x) for x in grp]
output:
[-0.23872600000000002, 1.126557, 0.023742000000000003, 0.21394]
How does it work?
I combine the first two columns into a single index, treating each as "bits" (i.e. base 2)
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
(If you have indexes that exceed 2, you can still use this technique, but you will need a different base to combine the columns; i.e. if your indices go from 1 to 18, you would need to multiply column 0 by a number equal to or larger than 18 instead of 2.)
So the result of the first line is
array([0., 0., 1., 2., 2., 3.])
Note as well that it assumes the data is ordered, with one column changing fastest; if this is not the case, you will need an extra step to sort the index and the original check matrix. In your example the data is ordered.
The next step groups the data according to the index, and uses the solution posted here.
np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[array([-0.243293, 0.004567]), array([1.126557]), array([ 0.038934, -0.015192]), array([0.21394])]
i.e. it outputs column 8 of check_matrix grouped according to fac.
The last line then simply sums those groups. Knowing how the first two columns were combined into the single index allows you to map the results back; or you could simply add the combined index to check_matrix as an extra column if you wanted.
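For instance (an added sketch, assuming every pair occurs at least once, so the groups come out in (1,1), (1,2), (2,1), (2,2) order):
import numpy as np

check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
                         [1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
                         [1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
                         [2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
                         [2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
                         [2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
fac = 2*(check_matrix[:, 0]-1) + (check_matrix[:, 1]-1)
grp = np.split(check_matrix[:, 8], np.cumsum(np.unique(fac, return_counts=True)[1])[:-1])
print(np.array([np.sum(x) for x in grp]).reshape(2, 2))
# [[-0.238726  1.126557]
#  [ 0.023742  0.21394 ]]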

Detecting edge on square wave

I have two lists, one for time and the other for amplitude.
time = [0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20] # (seconds for example) the step isn't fixed
ampli = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0] # ugly space to facilitate the reading
I want to know when there's a change from '0' to '1' or vice-versa, but I only care if the event happens after verify_time = X. So, if verify_time = 12.5 it would return time[8] = 13 and time[10] = 16.
What I have so far is:
time = [0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20] # (seconds for example) the step isn't fixed
ampli = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0] # ugly spacing to facilitate the reading
verify_time = 12.5
start_end = []
for i, (t, a) in enumerate(zip(time, ampli)):
    if t >= verify_time:  # should check the values from here
        if ampli[i-1] and (a != ampli[i-1]):  # there's a change from 0 to 1 or vice-versa
            start_end.append(i)
print(f"Start: {time[start_end[0]]}s")
print(f"End: {time[start_end[1]]}s")
This will print:
Start: 13s
End: 17s
Question 1) Shouldn't it print End: 16s? I'm kind of lost with this logic, because the number of '1's is three (3).
Question 2) Is there another way to get the same results without this for-if-if? I find it awkward; in Matlab I would use the diff() function.
If you don't mind using numpy, it is easiest (and also faster on larger lists) to find edges by calculating differences, unless your waves take gigabytes and go out of memory:
import numpy as np
verify_time = 12.5
time = np.array([0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20])
ampli = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
ind = time>verify_time
time = time[ind]
ampli = ampli[ind]
d_ampli = np.diff(ampli)
starts = np.where(d_ampli>0)[0]
ends = np.where(d_ampli<0)[0]-1
UPDATE
I forgot to change the diff properly; it should be d_ampli = np.diff(ampli, prepend=ampli[0])
UPDATE
As you noted, the original answer returns an empty starts array. The reason is that after filtering, ampli starts with [1, 1, ...], so there is no rising edge left. A philosophical question arises here: does the edge really start before 12.5 or after it? We don't know, and I'm fairly sure you won't care. What you want here is a backward differencing scheme; np.diff is a forward difference, so we trick it by shifting everything forward one index:
import numpy as np
verify_time = 12.5
time = np.array([0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20])
ampli = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
d_ampli = np.r_[[0], np.diff(ampli)]
starts = np.where(d_ampli>0)[0]
ends = np.where(d_ampli<0)[0]-1
starts = starts[time[starts] > verify_time]
ends = ends[time[ends] > verify_time]
starts, ends
(array([8], dtype=int64), array([10], dtype=int64))
It prints 17s because you take note of the first value after the change, which is 17 for the first 0 after the end of the square wave.
I've simplified the logic into a list comprehension, so it should make more sense:
assert len(time) == len(ampli)
start_end = [i for i in range(len(time)) if time[i] >= verify_time and ampli[i-1] is not None and (ampli[i] != ampli[i-1])]
print(f"Start: {time[start_end[0]]}s")
print(f"End: {time[start_end[1]]}s")
Also, you had an issue where ampli[i-1] was falsy when it was 0; fixed that too. It would be most accurate if you took the average of time[start_end[0]] and time[start_end[0]-1], since all you know at your resolution is that the transition occurred somewhere between the two samples.
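A sketch of that midpoint estimate (reusing the names from the snippet above; start_end = [8, 11] for this data):
time = [0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20]
start_end = [8, 11]  # indices found by the list comprehension above
start_est = (time[start_end[0]] + time[start_end[0] - 1]) / 2
end_est = (time[start_end[1]] + time[start_end[1] - 1]) / 2
print(f"Start: ~{start_est}s, End: ~{end_est}s")  # Start: ~12.0s, End: ~16.5s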
I've made the below solution to have a straightforward algorithm. In summary, it goes as follows:
Convert lists to NumPy arrays
Find closest value in time array to verify_time, cut off all indexes that occur beforehand.
NumPys' "diff" method is great for finding rising and falling edges. Once those edges are found, we can use NumPys' "where" method to look up the indexes and then return the time found at the same indexes in the time array.
Coding Environment
Python 3.6 (minimum requirement for the f-string print statements)
NumPy 1.15.2 (Older versions are probably fine)
import numpy as np
# inputs
time = [0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20] # (seconds for example) the step isn't fixed
ampli = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0] # ugly spacing to facilitate the reading
verify_time = 12.5
# ------------------------------------------
# Solution
# Step 1) Convert lists to Numpy Arrays
npTime = np.array(time)
npAmplitude = np.array(ampli) # Amplitude
# Step 2) Find closest Value in time array to 'verify_time'.
# Strategy:
# i) Subtract 'verify_time' from each value in array. (Produces an array of Diffs)
# ii) The Diff that is nearest to zero, or better yet is zero is the best match for 'verify_time'
# iii) Get the array index of the Diff selected in step ii
# Step i
npDiffs = np.abs(npTime - float(verify_time))
# Step ii
smallest_value = np.amin(npDiffs)
# Step iii (Use numpy.where to lookup array index)
first_index_we_care_about = (np.where(npDiffs == smallest_value)[0])[0]
first_index_we_care_about = first_index_we_care_about - 1 # Below edge detection requires previous index
# Remove the beginning parts of the arrays that the question doesn't care about
npTime = npTime[first_index_we_care_about:len(npTime)]
npAmplitude = npAmplitude[first_index_we_care_about:len(npAmplitude)]
# Step 3) Edge Detection: Find the rising and falling edges
# Generates a 1 when rising edge is found, -1 for falling edges, 0s for no change
npEdges = np.diff(npAmplitude)
# For Reference
# Here you can see that numpy diff placed a 1 before all rising edges, and a -1 before falling
# ampli [ 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
# npEdges [ 0, 1, 0, -1, 0, 0, 0, 1, 0, 0, -1, 0, 0]
# Get array indexes where a 1 is found (I.e. A Rising Edge)
npRising_edge_indexes = np.where(npEdges == 1)[0]
# Get array indexes where a -1 is found (I.e. A Falling Edge)
npFalling_edge_indexes = np.where(npEdges == -1)[0]
# Print times that edges are found after 'verify_time'
# Note: Adjust edge detection index by '+1' to answer question correctly (yes this is consistent)
print(f'Start: {npTime[npRising_edge_indexes[0]+1]}s')
print(f'End: {npTime[npFalling_edge_indexes[0]+1]}s')
Output
Start: 13s
End: 17s

Calculate number of equal neighbouring cells within a numpy array

I have 2d binary numpy arrays of varying size, which contain certain patterns.
Just like this:
import numpy
a = numpy.zeros((6,6), dtype=int)
a[1,2] = a[1,3] = 1
a[4,4] = a[5,4] = a[4,3] = 1
Here the "image" contains two patches one with 2 and one with 3 connected cells.
print(a)
[[0 0 0 0 0 0]
 [0 0 1 1 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 1 1 0]
 [0 0 0 0 1 0]]
I want to know how often a non-zero cell borders another non-zero cell (neighbours defined as rook's case, i.e. the cells to the left, right, below and above each cell), including their pseudo-replication (so vice-versa).
A previous approach returns a wrong value (5) for the inner boundaries, as it was intended to calculate outer boundaries:
numpy.abs(numpy.diff(a, axis=1)).sum()
So for the above test array, the correct total result would be 6 (the upper patch has two internal borders, the lower four).
Grateful for any tips!
EDIT:
Mistake: The lower obviously has 4 internal edges (neighbouring cells with the same value)
Explained the desired neighbourhood a bit more
I think the result is 8 if it's an 8-connected neighborhood. Here is the code:
import numpy as np
from scipy.ndimage import convolve

a = np.zeros((6,6), dtype=int)
a[1,2] = a[1,3] = 1
a[4,4] = a[5,4] = a[4,3] = 1
kernel = np.ones((3, 3))
kernel[1, 1] = 0
b = convolve(a, kernel, mode="constant")
print(b[a != 0].sum())  # 8.0
but you said rook's case.
edit
Here is the code for 4-connected neighborhood:
import numpy as np
from scipy import ndimage

a = np.zeros((6,6), dtype=int)
a[1,2] = a[1,3] = 1
a[4,4] = a[5,4] = a[4,3] = 1
kernel = ndimage.generate_binary_structure(2, 1).astype(int)
kernel[1, 1] = 0
b = ndimage.convolve(a, kernel, mode="constant")
print(b[a != 0].sum())  # 6
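For reference, the kernel built here is the rook's-case cross, and the count for the example array comes out as 6 (two internal edges in the upper patch, four in the lower), matching the corrected expectation in the question:
print(kernel)
# [[0 1 0]
#  [1 0 1]
#  [0 1 0]]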
