Setting indicators based on index per row in numpy - python

I am looking for an efficient way to set indicators from index zero up to a known index (which differs for each row).
e.g.
a =
array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0]])
and I know the vector with the index at which each row of a goes from 1 to 0.
b = [3, 1, 6, 2, 8]
Rather than filling all the rows of a using a for-loop, I want to know if there is a fast way to set these indicators.

Use outer-comparison on ranged array vs. b -
In [16]: ncols = 9
In [17]: b
Out[17]: [3, 1, 6, 2, 8]
In [19]: np.greater.outer(b,np.arange(ncols)).view('i1')
Out[19]:
array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0]], dtype=int8)
Other similar ways to express the same -
(np.asarray(b)[:,None] > np.arange(ncols)).view('i1')
(np.asarray(b)[:,None] > np.arange(ncols)).astype(int)
If b is already an array, this simplifies further, as we can skip the conversion with np.asarray(b).

Simplest way I can think of is:
result = []
for row in a:
    result.append(row.tolist().index(0))
print(result)
[3, 1, 6, 2, 8]
The reason this works is that a list has a method called index, which returns the position of the first occurrence of a specific item. So I iterate over the 2-dimensional array, convert each row to a list, and call index(0) on it.
Each value is appended to the result list, one per row, and that's it.
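If the Python-level loop becomes slow on large arrays, the same lookup (recovering b from a) can probably be vectorized. A minimal sketch, assuming every row of a contains at least one zero; first_zero is an illustrative name, not something from the answer above:

```python
import numpy as np

a = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1, 1, 1, 0]])

# argmax on a boolean array returns the position of the first True per row,
# which here is the index of the first zero in each row.
first_zero = (a == 0).argmax(axis=1)
print(first_zero.tolist())  # [3, 1, 6, 2, 8]
```

Note that a row without any zero would also report 0, so such rows would need a separate check.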

You can use broadcasting to do an outer comparison:
b = np.asarray([3, 1, 6, 2, 8])
a = (np.arange(b.max() + 1) < b[:, None]).astype(int)
# array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
# [1, 0, 0, 0, 0, 0, 0, 0, 0],
# [1, 1, 1, 1, 1, 1, 0, 0, 0],
# [1, 1, 0, 0, 0, 0, 0, 0, 0],
# [1, 1, 1, 1, 1, 1, 1, 1, 0]])
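If a already exists and only needs to be filled in place, the same comparison can serve as a boolean mask. A minimal sketch, assuming a starts out as zeros with 9 columns (ncols and mask are illustrative names):

```python
import numpy as np

b = np.asarray([3, 1, 6, 2, 8])
ncols = 9
a = np.zeros((len(b), ncols), dtype=int)  # pre-existing array to fill

# mask[i, j] is True where column j lies left of the cut-off b[i]
mask = np.arange(ncols) < b[:, None]
a[mask] = 1  # sets the indicators without a Python-level loop
```

The mask has shape (5, 9) because the (9,) range broadcasts against the (5, 1) column b[:, None].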

Related

Turning a list into list of lists

I am writing a function which takes columns=c and rows=r (the two can be unequal!) and should return a list of lists, where each row is a list containing c elements. How do I create such sublists given the list below?
list = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
should return:
[[0, 0, 0, 0, 0], [1, 1, 0, 1, 1], [0, 0, 1, 1, 1], [1, 1, 1, 1, 0], [0, 1, 0, 1, 1]]
I tried to use split() however it seems like it works for strings only.
Numpy:
import numpy
c, r = 4, 5
list_ = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
numpy.array(list_).reshape(c, r).tolist()
#out (shortened example list to avoid 5x5):
[[0, 0, 0, 0, 0], [1, 1, 0, 1, 1], [0, 0, 1, 1, 1], [1, 1, 1, 1, 0]]
However, if your goal is to create "a c x r array of zeros and ones", you are better off using:
numpy.random.randint(0, high=2, size=(c, r))
# out
array([[1, 1, 1, 0, 0],
[1, 1, 0, 0, 0],
[0, 1, 1, 1, 0],
[1, 0, 0, 1, 0]])
Use itertools.islice: (Also don't use list as a variable name. It replaces the builtin function)
from itertools import islice
def chunker(data, rows, cols):
    d = iter(data)
    return [list(islice(d, cols)) for row in range(rows)]
data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
result = chunker(data, 4, 5)
Result:
[[0, 0, 0, 0, 0],
[1, 1, 0, 1, 1],
[0, 0, 1, 1, 1],
[1, 1, 1, 1, 0]]
You can use a list comprehension:
c, r = 5, 5
data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
# len(data) + 1 so that the final chunk is included
list_of_lists = [data[i - c: i] for i in range(c, len(data) + 1, c)]
L = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
print([L[i:i+4] for i in range(0, len(L), 4)])
output:
[[0, 0, 0, 0], [0, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 0, 1], [1]]
Use slicing with a list comprehension, stepping through the list five items at a time:
new_list = [list[i:i+5] for i in range(0, len(list), 5)]
Try this:
ls = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
[ls[i*5:i*5+5] for i in range(len(ls)//5)]
Out[1]:
[[0, 0, 0, 0, 0],
 [1, 1, 0, 1, 1],
 [0, 0, 1, 1, 1],
 [1, 1, 1, 1, 0],
 [0, 1, 0, 1, 1]]
Or as a function:
def split_list(data, length):
    return [data[i*length:i*length+length] for i in range(len(data)//length)]
split_list(ls, 5)
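Since the question asks for a function taking both rows and columns, it may help to check that the list length actually fits. A minimal sketch in plain Python; to_rows, flat and grid are hypothetical names chosen for illustration:

```python
def to_rows(data, rows, cols):
    """Split a flat list into `rows` sublists of `cols` items each."""
    if len(data) != rows * cols:
        raise ValueError("expected %d items, got %d" % (rows * cols, len(data)))
    # Step through the list cols items at a time.
    return [data[i:i + cols] for i in range(0, len(data), cols)]

flat = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
grid = to_rows(flat, rows=5, cols=5)
print(grid)
```

Raising on a size mismatch is a design choice; silently dropping or padding the remainder would be equally possible.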

How can I apply a user-defined row function onto all rows of a loaded in array?

I have written a user defined function that loops through a row of values in order to give the number of zeros between values (distance between values). Those distances are appended into a list and then averaged for a final value of average distance between values. The function works great when I load in a CSV file with just one row of values. However, I would like to be able to apply the function to a file with multiple rows, and then report the output of each row into a dataframe.
This is all being run with Python 3.7. I attempted to create a nested loop in order to apply the function manually. I have tried the numpy.apply_along_axis function. I have also tried reading the file in as a pandas dataframe and then using the .apply() function. However, I am a bit unfamiliar with pandas, and when I replaced the numpy indexing in the function with pandas indexing, I began to generate multiple errors.
When I load in a larger CSV file and try to apply it to file[0] for example, the function does not work. It seems to work only when I load in a file with one row of values.
def avg_dist():
    import statistics as st
    dist = []
    ctr=0
    #distances between events
    for i in range(len(n)):
        if n[i] > 0 and i < (len(n)-1):
            if n[i+1]==0:
                i+=1
                while n[i]==0 and i < (len(n)-1):
                    ctr+=1
                    i+=1
                dist.append(ctr)
                ctr=0
            else:
                i+=1
        else:
            i+=1
    #Average distance between events
    aved = st.mean(dist)
    return(aved)
The latest response is at the end of the answer. There have been several edits.
The very end (4th edit) of the answer has a completely new approach.
I'm not certain what you're trying to do but hopefully this can help.
import numpy as np
# Generate some events
events = np.random.rand(3,12)*10.
events *= np.random.randint(5, size=(3,12))<1
events
Out[36]:
array([[ 0. , 0. , 0. , 0. , 0. ,
0. , 5.35598205, 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 6.65094145, 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 6.04581361],
[ 6.88119682, 4.31178109, 0. , 0. , 0. ,
0. , 0. , 1.16999289, 0. , 0. ,
0. , 0. ]])
# generate a boolean array of events. (as int for a compact print.)
an_event = (events != 0).astype(int)
an_event
Out[37]:
array([[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]])
def event_range(arr):
    from_start = arr.cumsum(axis=1)
    from_end = np.flip(arr, axis=1).cumsum(axis=1)
    from_end = np.flip(from_end, axis=1)
    return np.logical_and(from_start, from_end).astype(int)
event_range function step by step.
from_start is the cumsum of an_event. zero before any event, >0 after that.
from_start = an_event.cumsum(axis=1) # cumsum the event count. zeros before the first event.
from_start
Out[40]:
array([[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], # zeroes before the first event.
[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
[1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]], dtype=int32)
from_end is the cumsum of an_event but from max to min. Therefore zero after the last event.
from_end = np.flip(an_event, axis=1).cumsum(axis=1) # cumsum of reversed arrays
from_end = np.flip(from_end, axis=1) # reverse the result.
from_end
Out[41]:
array([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0], # zero after the last event.
[2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[3, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]], dtype=int32)
logically anding these together to get, zeroes before the first event, ones after that and zeroes after the last event.
ev_range = np.logical_and(from_start, from_end).astype(int)
ev_range
Out[42]:
# zero before first and after last event, one between the events.
array([[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]])
n_range = ev_range.sum(axis=1)
n_range
Out[43]: array([ 1, 11, 8])
n_events = an_event.sum(axis=1)
n_events
Out[44]: array([1, 2, 3])
avg = n_range / n_events
avg
Out[45]: array([ 1. , 5.5 , 2.66666667])
Should avg be n_range/ (n_events-1)? i.e. count the gaps, not the events.
What would you expect for only one event in a row? What for zero events in a row?
Edit following comments
To count gaps longer than zero gets a bit involved. The easiest is probably to take the differences for consecutive columns. Where these are -1 there is a 1 followed by a zero. You need to add a final zero column to your data in case the last column has an event in it.
np.random.seed(10)
test = 1*(np.random.randint(4, size=(4,12))<1)
test
Out[24]:
array([[0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
[0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
temp = np.diff(test, axis=-1)
temp
Out[26]:
array([[ 0, 1, -1, 1, -1, 0, 1, -1, 0, 1, -1],
[ 0, 1, -1, 1, -1, 1, -1, 1, -1, 1, 0],
[ 0, 0, 1, -1, 0, 0, 0, 1, 0, -1, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0]])
np.where(temp<0, 1,0)
Out[28]:
array([[0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1],
[0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
In [29]: np.where(temp<0, 1,0).sum(axis=-1)-1
Out[29]: array([3, 3, 1, 0]) # should be [3, 4, 1, 0]
Add a column of zeros to test.
test = np.hstack((test, np.zeros((4,1), dtype = int)))
test
Out[31]:
array([[0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]])
temp=np.diff(test, axis=-1)
temp
Out[35]:
array([[ 0, 1, -1, 1, -1, 0, 1, -1, 0, 1, -1, 0],
[ 0, 1, -1, 1, -1, 1, -1, 1, -1, 1, 0, -1], # An extra -1 here.
[ 0, 0, 1, -1, 0, 0, 0, 1, 0, -1, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0]])
np.where(temp<0, 1,0).sum(axis=-1)-1
Out[36]: array([3, 4, 1, 0])
As I said, a bit involved. It may be easier to loop through but this should be faster if more difficult to understand.
2nd Edit following another idea.
import numpy as np
np.random.seed(10)
test = 1*(np.random.randint(4, size=(4,12))<1)
test
Out[2]:
array([[0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
[0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
temp = np.diff(test, axis=-1)
np.where(temp<0, 1, 0).sum(axis=-1)+test[:,-1]-1
# +test[:,-1] adds the last column to include any 1's from there.
Out[4]: array([3, 4, 1, 0])
3rd Edit
Thinking this through, I created two functions. I also show a do_divide helper which copes with division by zero.
import numpy as np
def zero_after_last_event(arr):
    """
    Returns an array set to zero in all cells after the last event
    """
    from_end = np.flip(arr, axis=-1).cumsum(axis=-1) # cumsum of reversed arrays
    from_end = np.flip(from_end, axis=-1) # reverse the result.
    from_end[from_end>0] = 1 # gt zero set to 1
    return from_end
def event_range(arr):
    """ event_range is zero before the first event,
    zero after the last event and 1 elsewhere. """
    return np.logical_and(arr.cumsum(axis=-1), zero_after_last_event(arr)).astype(int)
def do_divide(a, b):
    """ Does a protected divide. Returns zero for divide by zero """
    with np.errstate(divide='ignore', invalid='ignore'): # Silence x/0 and 0/0 warnings
        result = a / b
    result[~np.isfinite(result)] = 0.
    return result
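As an alternative to patching non-finite values after the fact, np.divide accepts out and where arguments so that the division is never evaluated where the divisor is zero. A minimal sketch, assuming a and b have the same shape; safe_divide is a hypothetical name, not part of the answer's code:

```python
import numpy as np

def safe_divide(a, b):
    """Element-wise a / b, returning 0.0 wherever b is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out = np.zeros_like(a)
    # The division only runs where the mask is True;
    # all other cells keep the 0.0 already stored in `out`.
    np.divide(a, b, out=out, where=b != 0)
    return out

ratios = safe_divide([5, 4, 4, 0], [3, 4, 1, 0])
print(ratios)
```

Pre-filling out matters here: cells skipped by where are left untouched, so without it they would hold arbitrary values.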
Set up a test array
np.random.seed(10)
events = 1*(np.random.randint(4, size=(4,12))<1)
events
Out[15]:
array([[0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0],
[0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
[0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
With the functions and data above, this follows.
# Count gap lengths
gaps = 1 - events # invert the values in events (1->0, 0->1)
gaps = np.logical_and(gaps, event_range(events)).astype(int)
gaps
Out[19]:
array([[0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
sumgaps = gaps.sum(axis = -1)
sumgaps
Out[22]: array([5, 4, 4, 0])
# Count how many gaps
temp = np.diff(events, axis=-1) # temp is -1 when an event isn't immediately followed by another event.
n_gaps = np.where(temp<0, 1, 0).sum(axis=-1)+events[:,-1]-1
# +events[:,-1] adds the last column to include any 1's from there.
n_gaps
Out[23]: array([3, 4, 1, 0])
do_divide(sumgaps, n_gaps)
Out[21]: array([1.66666667, 1. , 4. , 0. ])
4th Edit - using np.bincount
import numpy as np
def do_divide(a, b):
    """ Does a protected divide. Returns zero for divide by zero """
    with np.errstate(divide='ignore', invalid='ignore'): # Silence x/0 and 0/0 warnings
        result = a / b
    result[~np.isfinite(result)] = 0.
    return result
np.random.seed(10)
events = 1*(np.random.randint(4, size=(4,12))<1)
cumulative = events.cumsum(axis=1)
cumulative
Out[2]:
array([[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6],
[0, 0, 0, 1, 1, 1, 1, 1, 2, 3, 3, 3],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]])
bin_count_len = 1+cumulative.max() # Biggest bins length required.
result = np.zeros((cumulative.shape[0], bin_count_len), dtype=int)
for ix, row in enumerate(cumulative):
    result[ix] = np.bincount( row, minlength = bin_count_len )
result
Out[4]:
array([[2, 2, 3, 3, 2, 0, 0],
[2, 2, 2, 2, 2, 1, 1],
[3, 5, 1, 3, 0, 0, 0],
[9, 3, 0, 0, 0, 0, 0]])
Lose column 0: it's before any events. Lose the last column: it's always after the last event. The gaps include the opening event; subtracting 1 removes it from the gap size.
temp = result[:, 1:-1]-1 #
temp
Out[6]:
array([[ 1, 2, 2, 1, -1],
[ 1, 1, 1, 1, 0],
[ 4, 0, 2, -1, -1],
[ 2, -1, -1, -1, -1]])
Set any cell temp[r, n] to 0 if no event follows it, i.e. if result[r, n+2] == 0.
temp_lag = (result[:, 2:]>0)*1
temp_lag
Out[8]:
array([[1, 1, 1, 0, 0],
[1, 1, 1, 1, 1],
[1, 1, 0, 0, 0],
[0, 0, 0, 0, 0]])
temp *= temp_lag
temp
Out[10]:
array([[1, 2, 2, 0, 0],
[1, 1, 1, 1, 0],
[4, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
tot_gaps = temp.sum(axis=1)
n_gaps = np.count_nonzero(temp, axis=1)
tot_gaps, n_gaps
Out[13]: (array([5, 4, 4, 0]), array([3, 4, 1, 0]))
do_divide(tot_gaps, n_gaps)
Out[14]: array([1.66666667, 1. , 4. , 0. ])
HTH
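For the original question of running a per-row function over every row of a loaded array, np.apply_along_axis may be the most direct route, provided the function takes the row as an argument instead of reading a global. A minimal sketch, assuming gaps are counted only between events (as in the answer above, not including trailing zeros); mean_gap and per_row are hypothetical names:

```python
import numpy as np

def mean_gap(row):
    """Mean length of the zero runs lying between two events in a 1-D row.

    Returns 0.0 when the row has fewer than two events or no gaps.
    """
    idx = np.flatnonzero(row)   # positions of the events
    if idx.size < 2:
        return 0.0
    gaps = np.diff(idx) - 1     # number of zeros between consecutive events
    gaps = gaps[gaps > 0]       # adjacent events contribute no gap
    return float(gaps.mean()) if gaps.size else 0.0

events = np.array([[0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0],
                   [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
                   [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])

# axis=1 hands each row to mean_gap in turn.
per_row = np.apply_along_axis(mean_gap, 1, events)
print(per_row)
```

This still calls Python once per row, so it is likely slower than the fully vectorized versions above, but it is easier to follow and to adapt.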

Error shuffling list into new list

I am creating a list by shifting an old list out_g, item by item, and appending the result to the new one, new_sets. As I am iterating, I check the resulting shift, and it is correct. After this is complete, I print out the new list, and it is all a single object repeated. What am I missing?
The error occurs during the for loop at the end, where I append the results to new_sets.
#!/usr/bin/python
import math
def LFSR(register, feedback, output):
    """
    https://natronics.github.io/blag/2014/gps-prn/
    :param list feedback: which positions to use as feedback (1 indexed)
    :param list output: which positions are output (1 indexed)
    :returns output of shift register:
    """
    # calculate output
    out = [register[i-1] for i in output]
    if len(out) > 1:
        out = sum(out) % 2
    else:
        out = out[0]
    # modulo 2 add feedback
    fb = sum([register[i-1] for i in feedback]) % 2
    # shift to the right
    for i in reversed(range(len(register[1:]))):
        register[i+1] = register[i]
    # put feedback in position 1
    register[0] = fb
    return out
def shiftInPlace(l, n):
    # https://stackoverflow.com/questions/2150108/efficient-way-to-shift-a-list-in-python
    n = n % len(l)
    head = l[:n]
    l[:n] = []
    l.extend(head)
    return l
##########
## Main ##
##########
n = 3
# init register states
if n == 5:
    LFSR_A = [1,1,1,1,0]
    LFSR_B = [1,1,1,0,1]
    LFSR_A_TAPS = [5,4,3,2]
    LFSR_B_TAPS = [5,3]
elif n == 7:
    LFSR_A = [1,0,0,1,0,1,0]
    LFSR_B = [1,0,0,1,1,1,0]
    LFSR_A_TAPS = [7,3,2,1]
    LFSR_B_TAPS = [7,3]
elif n == 3:
    LFSR_A = [1,0,1]
    LFSR_B = [0,1,1]
    LFSR_A_TAPS = [3,2]
    LFSR_B_TAPS = [3,1]
output_reg = [n]
N = 2**n-1
out_g = []
for i in range(0,N): #replace N w/ spread_fact
    a = (LFSR(LFSR_A, LFSR_A_TAPS, output_reg))
    b = (LFSR(LFSR_B, LFSR_B_TAPS, output_reg))
    out_g.append(a ^ b)
# FOR BALANCED GOLD CODES NUMBER OF ONES MUST BE ONE MORE THAN NUMBER
# OF ZEROS
nzeros = sum(x == 0 for x in out_g)
nones = sum(x == 1 for x in out_g)
print("Gold Code Output Period[%d] of length %d -- {%d} 0's, {%d} 1's" % (N,N,nzeros,nones))
# produce all time shifted versions of the code
new_sets = []
for i in range(0,N-1):
    new_sets.append(shiftInPlace(out_g,1))
    # a=shiftInPlace(out_g,1)
    # new_sets.append(a)
    print(new_sets[i])
print(new_sets)
My output :
Gold Code Output Period[7] of length 7 -- {3} 0's, {4} 1's
[1, 1, 0, 1, 0, 1, 0]
[1, 0, 1, 0, 1, 0, 1]
[0, 1, 0, 1, 0, 1, 1]
[1, 0, 1, 0, 1, 1, 0]
[0, 1, 0, 1, 1, 0, 1]
[1, 0, 1, 1, 0, 1, 0]
[[1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0]]
Correct values are printing on the iteration, but the final list has all the same values.
The problem should be obvious from your output - you are seeing the same list because you are appending the same list. Consider - you even name your function "shift in place", so that returns a mutated version of the same list you passed in, and then you append that same list. So one quick fix is to make a copy which you end up appending:
new_sets = []
for i in range(0,N-1):
    new_sets.append(shiftInPlace(out_g,1)[:]) # append copy
    # a=shiftInPlace(out_g,1)
    # new_sets.append(a)
    print(new_sets[i])
This gives the output:
Gold Code Output Period[7] of length 7 -- {3} 0's, {4} 1's
[1, 1, 0, 1, 0, 1, 0]
[1, 0, 1, 0, 1, 0, 1]
[0, 1, 0, 1, 0, 1, 1]
[1, 0, 1, 0, 1, 1, 0]
[0, 1, 0, 1, 1, 0, 1]
[1, 0, 1, 1, 0, 1, 0]
[[1, 1, 0, 1, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 0, 1, 1], [1, 0, 1, 0, 1, 1, 0], [0, 1, 0, 1, 1, 0, 1], [1, 0, 1, 1, 0, 1, 0]]
As an aside, for efficient in-place rotations, consider changing your data-structure to a collections.deque, which implements a doubly-linked list:
In [10]: from collections import deque
    ...: d = deque([1, 1, 0, 1, 0, 1, 0])
    ...: print(d)
    ...: for i in range(0, N-1):
    ...:     d.rotate(-1)
    ...:     print(d)
    ...:
deque([1, 1, 0, 1, 0, 1, 0])
deque([1, 0, 1, 0, 1, 0, 1])
deque([0, 1, 0, 1, 0, 1, 1])
deque([1, 0, 1, 0, 1, 1, 0])
deque([0, 1, 0, 1, 1, 0, 1])
deque([1, 0, 1, 1, 0, 1, 0])
deque([0, 1, 1, 0, 1, 0, 1])
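The deque is rotated in place too, so collecting the shifted versions still needs a snapshot on each step. A minimal sketch, assuming the unshifted code is [0, 1, 1, 0, 1, 0, 1] (the value implied by the first shift printed in the question); code and rotations are illustrative names:

```python
from collections import deque

code = [0, 1, 1, 0, 1, 0, 1]   # assumed starting sequence
d = deque(code)
rotations = []
for _ in range(len(code) - 1):
    d.rotate(-1)               # shift left by one, in place
    rotations.append(list(d))  # list(d) makes an independent snapshot

for r in rotations:
    print(r)
```

Appending d itself would reproduce the original problem, since every entry would refer to the same deque.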
You might try creating your list of rotations like this:
>>> li=[1,0,1,1,0,0]
>>> [li[r:]+li[:r] for r in range(len(li))]
[[1, 0, 1, 1, 0, 0], [0, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1], [0, 0, 1, 0, 1, 1], [0, 1, 0, 1, 1, 0]]
... following up on my comment to juanpa's answer ...
When you append in this fashion, you append a reference to the in-place list. Your two-line code with variable a works the same way. You've appended 6 copies of the same variable reference; every time you shift the list, you shift the underlying object. All of the appended references point to that object.
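The effect can be reproduced in a few lines, independent of the Gold code program. A minimal sketch with made-up names (shared, aliases, copies):

```python
shared = [1, 2, 3]
aliases = []
copies = []
for _ in range(2):
    shared.append(shared.pop(0))  # rotate left in place
    aliases.append(shared)        # stores a reference to the same object
    copies.append(shared[:])      # stores an independent snapshot

print(aliases)  # [[3, 1, 2], [3, 1, 2]]
print(copies)   # [[2, 3, 1], [3, 1, 2]]
print(aliases[0] is aliases[1])  # True: both entries are the one list
```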
Here's detailed output tracing your program. Note how all of the elements of new_sets change on every iteration. In my repair, I used the two-line assignment, but added a copy like this: new_sets.append(a[:])
Gold Code Output Period[7] of length 7 -- {3} 0's, {4} 1's
TRACE out_g = [0, 1, 1, 0, 1, 0, 1]
ENTER shiftInPlace, l= [0, 1, 1, 0, 1, 0, 1]
LEAVE shiftInPlace, head= [0] l= [1, 1, 0, 1, 0, 1, 0]
TRACE a= [1, 1, 0, 1, 0, 1, 0] new_sets= [[1, 1, 0, 1, 0, 1, 0]]
TRACE out_g = [1, 1, 0, 1, 0, 1, 0]
ENTER shiftInPlace, l= [1, 1, 0, 1, 0, 1, 0]
LEAVE shiftInPlace, head= [1] l= [1, 0, 1, 0, 1, 0, 1]
TRACE a= [1, 0, 1, 0, 1, 0, 1] new_sets= [[1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1]]
TRACE out_g = [1, 0, 1, 0, 1, 0, 1]
ENTER shiftInPlace, l= [1, 0, 1, 0, 1, 0, 1]
LEAVE shiftInPlace, head= [1] l= [0, 1, 0, 1, 0, 1, 1]
TRACE a= [0, 1, 0, 1, 0, 1, 1] new_sets= [[0, 1, 0, 1, 0, 1, 1], [0, 1, 0, 1, 0, 1, 1], [0, 1, 0, 1, 0, 1, 1]]
TRACE out_g = [0, 1, 0, 1, 0, 1, 1]
ENTER shiftInPlace, l= [0, 1, 0, 1, 0, 1, 1]
LEAVE shiftInPlace, head= [0] l= [1, 0, 1, 0, 1, 1, 0]
TRACE a= [1, 0, 1, 0, 1, 1, 0] new_sets= [[1, 0, 1, 0, 1, 1, 0], [1, 0, 1, 0, 1, 1, 0], [1, 0, 1, 0, 1, 1, 0], [1, 0, 1, 0, 1, 1, 0]]
TRACE out_g = [1, 0, 1, 0, 1, 1, 0]
ENTER shiftInPlace, l= [1, 0, 1, 0, 1, 1, 0]
LEAVE shiftInPlace, head= [1] l= [0, 1, 0, 1, 1, 0, 1]
TRACE a= [0, 1, 0, 1, 1, 0, 1] new_sets= [[0, 1, 0, 1, 1, 0, 1], [0, 1, 0, 1, 1, 0, 1], [0, 1, 0, 1, 1, 0, 1], [0, 1, 0, 1, 1, 0, 1], [0, 1, 0, 1, 1, 0, 1]]
TRACE out_g = [0, 1, 0, 1, 1, 0, 1]
ENTER shiftInPlace, l= [0, 1, 0, 1, 1, 0, 1]
LEAVE shiftInPlace, head= [0] l= [1, 0, 1, 1, 0, 1, 0]
TRACE a= [1, 0, 1, 1, 0, 1, 0] new_sets= [[1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0]]
[[1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0], [1, 0, 1, 1, 0, 1, 0]]

scipy.ndimage.label: include error margin

After reading an interesting topic on scipy.ndimage.label (Variable area threshold for identifying objects - python), I'd like to include an 'error margin' in the labelling.
In the above linked discussion:
How can the blue dot on top be included, too (let's say it is wrongly disconnected from the orange, biggest, object)?
I found the structure argument, which should be able to include that dot by changing the array from np.ones((3,3,3)) to something larger (I'd like it to be 3D). However, adjusting structure to a larger array does not seem to work, unfortunately. It either raises a dimension error (RuntimeError: structure and input must have equal rank) or it does not change anything.
Thanks!
this is the code:
labels, nshapes = ndimage.label(a, structure=np.ones((3,3,3)))
in which a is a 3D array.
Here's a possible approach that uses scipy.ndimage.binary_dilation. It is easier to see what is going on in a 2D example, but I'll show how to generalize to 3D at the end.
In [103]: a
Out[103]:
array([[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 1, 0, 0],
[1, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 1],
[1, 1, 1, 0, 0, 0, 0]])
In [104]: from scipy.ndimage import label, binary_dilation
Extend each "shape" by one pixel down and to the right:
In [105]: b = binary_dilation(a, structure=np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]])).astype(int)
In [106]: b
Out[106]:
array([[0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0],
[1, 1, 1, 0, 1, 1, 0],
[1, 1, 1, 0, 1, 1, 1],
[1, 1, 1, 0, 0, 1, 1],
[1, 1, 1, 1, 0, 1, 1]])
Apply label to the dilated array:
In [107]: labels, numlabels = label(b)
In [108]: numlabels
Out[108]: 2
In [109]: labels
Out[109]:
array([[0, 0, 0, 1, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0],
[2, 2, 2, 0, 1, 1, 0],
[2, 2, 2, 0, 1, 1, 1],
[2, 2, 2, 0, 0, 1, 1],
[2, 2, 2, 2, 0, 1, 1]], dtype=int32)
By multiplying a by labels, we get the desired array of labels of a:
In [110]: alab = labels*a
In [111]: alab
Out[111]:
array([[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[2, 2, 0, 0, 1, 0, 0],
[2, 2, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 1],
[2, 2, 2, 0, 0, 0, 0]])
(This assumes that the values in a are 0 or 1. If they are not, you can use alab = labels * (a > 0).)
For a 3D input, you have to change the structure argument to binary_dilation:
struct = np.zeros((3, 3, 3), dtype=int)
struct[1:, 1:, 1:] = 1
b = binary_dilation(a, structure=struct).astype(int)
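Putting the 3D pieces together on a toy volume may make the effect easier to check. A minimal sketch, assuming a 4x4x4 binary array with two voxels separated by a one-voxel gap and a third voxel far away (the data and the names n_before, n_after are made up for illustration):

```python
import numpy as np
from scipy.ndimage import label, binary_dilation

a = np.zeros((4, 4, 4), dtype=int)
a[0, 0, 0] = 1
a[0, 0, 2] = 1   # one empty voxel away from the first
a[3, 3, 0] = 1   # far from both

_, n_before = label(a)   # plain labelling treats all three as separate

# Grow every voxel by one step along the positive direction of each axis.
struct = np.zeros((3, 3, 3), dtype=int)
struct[1:, 1:, 1:] = 1
b = binary_dilation(a, structure=struct).astype(int)

labels, n_after = label(b)
alab = labels * a        # keep labels only where a had data
print(n_before, n_after)
```

Because this structure only grows objects in one direction per axis, the tolerated gap is one voxel; a larger margin would need a larger structure or the iterations argument of binary_dilation.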

Searching numpy array for pattern

I'd like to find a value in a numpy array given a search pattern. For instance for the given array a, I want to retrieve a result of 1 when using the search pattern s because 1 is the element at index 0 of a[:,1] (=array([1, 0, 0, 1])) and the elements of a[1:,1] match s (i.e. (a[1:,1] == s).all() == True => return a[0,1]).
Another example would be s=[1, 0, 1], for which I would expect a search result of 2 (the match is at the 4th column, counting from 1). 2 would also be the search result for s=[2, 0, 0], etc.
>>> import numpy as np
>>> a = np.asarray([[0, 1, 2, 2, 2, 2, 2, 2], [0, 0, 1, 1, 2, 2, 3, 3], [0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0, 1]])
>>> a
array([[0, 1, 2, 2, 2, 2, 2, 2],
[0, 0, 1, 1, 2, 2, 3, 3],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 1]])
>>> s = np.asarray([0, 0, 1])
I came up with a[0, np.where((a[1:,:].transpose() == s).all(axis=-1))[0][0]], but thought there must be something more elegant...
Additionally, it would be great if I could do this operation with one call on multiple search patterns, so that I retrieve the 0-element for which the values of index 1 to index 3 match.
Single search pattern
Here's one approach with help from broadcasting and slicing -
a[0,(a[1:] == s[:,None]).all(0)]
Multiple search patterns
For multiple search patterns (stored as 2D array), we just need to broadcast as before and look for ANY match at the end -
a[0,((a[1:] == s[...,None]).all(1)).any(0)]
Here's a sample run -
In [327]: a
Out[327]:
array([[0, 1, 2, 2, 2, 2, 2, 2],
[0, 0, 1, 1, 2, 2, 3, 3],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 1]])
In [328]: s
Out[328]:
array([[1, 0, 1],
[2, 0, 0]])
In [329]: a[0,((a[1:] == s[...,None]).all(1)).any(0)]
Out[329]: array([2, 2])
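The expression above returns the matched values pooled together, without saying which pattern produced which value. If one result per pattern is wanted, the intermediate match matrix can be kept. A minimal sketch, assuming the first matching column is the one of interest and using -1 as an arbitrary "no match" marker (patterns, matches, found and values are illustrative names):

```python
import numpy as np

a = np.asarray([[0, 1, 2, 2, 2, 2, 2, 2],
                [0, 0, 1, 1, 2, 2, 3, 3],
                [0, 0, 0, 0, 0, 0, 0, 0],
                [0, 1, 0, 1, 0, 1, 0, 1]])
patterns = np.asarray([[0, 0, 1],
                       [1, 0, 1],
                       [2, 0, 0],
                       [9, 9, 9]])  # last pattern matches no column

# matches[p, c] is True when column c of a[1:] equals pattern p
matches = (a[1:] == patterns[..., None]).all(axis=1)
found = matches.any(axis=1)
first_col = matches.argmax(axis=1)             # 0 for patterns without a match
values = np.where(found, a[0, first_col], -1)  # mask out the non-matches
print(values.tolist())  # [1, 2, 2, -1]
```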
