Find and flat repeated values in numpy array

I want to find values in a NumPy array that are repeated more than x times in a row and set them to 0.
Let's say this is my array:
[255,0,0,255,255,255,0,0,255,255,255,255,255,0,0]
I want to set to 0 every run that is repeated more than x times.
Let's say x = 3; the output array will be:
[255,0,0,255,255,255,0,0,0,0,0,0,0,0,0]
If x = 2:
[255,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Of course, I can loop over the indexes, count them and set them to 0, but there has to be a faster and more efficient way (the purpose is to remove horizontal grid lines from an image).

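Using pandas (here x is your array, and n is the shortest run length that gets zeroed, so n = 5 zeroes every run of 5 or more equal values):
import pandas as pd
s = pd.Series(x)
n = 5
# label consecutive runs, then replace every run with at least n elements by zeros
s.groupby((s != s.shift()).cumsum()).apply(lambda z: z if z.size < n else pd.Series([0]*z.size)).values
array([255, 0, 0, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)
With n = 2:
array([255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)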
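If you would rather stay in plain NumPy, the same run-length idea can be sketched without pandas. This is only a sketch: the function name zero_long_runs is made up for illustration, and x is the threshold from the question (runs longer than x are zeroed).

```python
import numpy as np

def zero_long_runs(arr, x):
    """Return a copy of arr in which every run longer than x is set to 0.

    zero_long_runs is a hypothetical helper name, not a NumPy function.
    """
    arr = np.asarray(arr)
    if arr.size == 0:
        return arr.copy()
    # Indices where a new run starts: position 0 plus every change of value
    starts = np.flatnonzero(np.r_[True, arr[1:] != arr[:-1]])
    # Length of each run
    lengths = np.diff(np.r_[starts, arr.size])
    # Expand the per-run "too long" flag back to one flag per element
    too_long = np.repeat(lengths > x, lengths)
    out = arr.copy()
    out[too_long] = 0
    return out

data = [255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0]
print(zero_long_runs(data, 3))
print(zero_long_runs(data, 2))
```

For an image you would apply this row by row; whether that beats the pandas version depends on the image size, so it is worth timing on your own data.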

You may be able to solve this by looking at your data through a rolling window of length x+1 and step size 1. If all values in a window are equal, set them all to zero. Rolling windows can easily be created using scikit-image's view_as_windows():
import numpy
import skimage.util
x = 3
data = numpy.asarray([255,0,0,255,255,255,0,0,255,255,255,255,255,0,0])
# data_view is a view on data, so writing to it modifies data itself
data_view = skimage.util.view_as_windows(data, window_shape=(x + 1,))
# True for every window whose values all equal the window's first value
mask = numpy.all(numpy.isclose(data_view, data_view[..., 0, None]), axis=1)
data_view[mask, :] = 0
data
# array([255, 0, 0, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Related

How to create a flow generator for a given iterable object?

Write a function that produces a stream generator for a given iterable object (list, generator, etc.) whose elements contain a position and a value and are sorted by order of appearance. The stream generator should be equal to the initial stream (without positions), but with the gaps filled with zeroes. For example:
gen = gen_stream(9,[(4,111),(7,12)])
list(gen)
[0, 0, 0, 0, 111, 0, 0, 12, 0] # the first element has index zero, so 111 is at the fifth position and 12 at the eighth
I.e. the 2 significant elements have indexes 4 and 7; all other elements are filled with zeroes.
To simplify things, the elements of the initial stream are sorted (i.e. an element with a lower position precedes an element with a higher position).
The first parameter can be None; in this case the stream should be infinite, e.g. an infinite stream of zeroes:
gen_stream(None, [])
The following stream starts with 0, 0, 0, 0, 111, 0, 0, 12, ... and then generates zeroes infinitely:
gen_stream(None, [(4,111),(7,12)])
The function should also support a custom position-value extractor for more advanced cases, e.g.
def day_extractor(x):
    # days per month in a non-leap year (June has 30 days, July has 31)
    months = [31,28,31,30,31,30,31,31,30,31,30,31]
    acc = sum(months[:x[1]-1]) + x[0] - 1
    return (acc, x[2])
precipitation_days = [(3,1,4),(5,2,6)]
list(gen_stream(59,precipitation_days,day_extractor)) #59: January and February to limit output
[0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The precipitation_days format is the following: (d,m,mm), where d is the day of the month, m is the month and mm is the precipitation in millimeters.
So, in the example:
(3,1,4) - January 3, precipitation: 4 mm
(5,2,6) - February 5, precipitation: 6 mm
The extractor is passed as an optional third parameter whose default value is a lambda function that handles (position, value) pairs as in the first example.
That's what I did:
import sys
a=[(4,111),(7,12)]
n = 9
def gen_stream(n1, a1):
    if n1==None:
        b = [0 for i in range(sys.maxsize)]
    else:
        b = [0 for i in range(n1)]
    for i in range(len(a1)):
        b[a[i][0]]=a[i][1]
    for i in range(len(b)):
        yield b[i]
for i in gen_stream(None, a):
    print(i)
So far I have tried to produce the stream with infinite zeroes, but for some reason the function never gets to yield anything: the program eats a lot of RAM and then crashes with a MemoryError. And how do I handle the months case next? Please help.
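The MemoryError comes from building a list with sys.maxsize elements before anything is yielded (the loop also reads the global a instead of the parameter a1). A generator does not need that list at all. Below is a minimal sketch of one way to do it; the parameter names total, sorted_iterable and extractor are my own choice, and it assumes the positions arrive in strictly increasing order, as the task states.

```python
from itertools import islice

def gen_stream(total, sorted_iterable, extractor=lambda x: x):
    """Yield each value at its extracted position and 0 everywhere else.

    total=None means the stream never ends. Parameter names are illustrative.
    """
    # Lazily turn every source element into a (position, value) pair
    pairs = (extractor(item) for item in sorted_iterable)
    nxt = next(pairs, None)  # next pair to emit, or None once the source is exhausted
    pos = 0
    while total is None or pos < total:
        if nxt is not None and nxt[0] == pos:
            yield nxt[1]
            nxt = next(pairs, None)
        else:
            yield 0
        pos += 1

print(list(gen_stream(9, [(4, 111), (7, 12)])))
# [0, 0, 0, 0, 111, 0, 0, 12, 0]

# An infinite stream has to be cut off by the consumer, e.g. with islice
print(list(islice(gen_stream(None, [(4, 111), (7, 12)]), 12)))
```

The months case needs no extra code: pass day_extractor as the third argument and the generator places each value at the position the extractor returns.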

Python Numpy find row index of array consisting of 5 values, by searching for 2 values

Hi, apologies if the title is confusing; I am new to NumPy and not used to the terminology.
Suppose we have a NumPy array acting as a world map.
The columns are (x y r g b) - all are int16
Example:
a = np.array([[ 0, 0, 0, 255, 0],   #index 0
              [ 0, 1, 0, 0, 255],   #index 1
              [ 0, 2, 0, 255, 0]])  #index 2
Now we want to find the index of the row with x and y values (0, 2), i.e. the row with index 2:
[ 0, 2, 0, 255, 0] #index 2
How would I do this without also supplying the rest of the values (r g b)? Basically we are searching for a five-value row using only two values.

You can slice the rows up to the second column and check whether they are equal to [0,2]. Then use all with axis set to 1 to get True for the rows that satisfy both conditions, and use the resulting boolean array to index the ndarray:
a = np.array([[ 0, 0, 0, 255, 0],
              [ 0, 1, 0, 0, 255],
              [ 0, 2, 0, 255, 0]])
a[(a[:,:2] == [0,2]).all(1)]
# array([[ 0, 2, 0, 255, 0]])

Here's your data:
import numpy as np
arr = np.array([[ 0, 0, 0, 255, 0],
                [ 0, 1, 0, 0, 255],
                [ 0, 2, 0, 255, 0]])
a,b = 0,2 # [a,b] is what we are looking for, in the first two cols
Here's the solution to get the row index containing [a,b]:
found_index = np.argmax(np.logical_and(arr[:,0]==[a],arr[:,1]==[b]))
print (found_index)
Output:
2
Explanation:
The best way to understand how this works, is by printing each part of it:
print (arr[:,0]==[a])
Outputs:
[ True True True]
print (arr[:,1]==[b])
Outputs:
[False False True]
print (np.logical_and(arr[:,0]==[a],arr[:,1]==[b]))
# print (np.logical_and([ True True True], [False False True]))
Outputs:
[False False True]
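One caveat with np.argmax: it also returns 0 when no row matches at all, so a missing row looks the same as a match in row 0. A small sketch that returns every matching index instead, using np.flatnonzero (find_row_indices is a made-up helper name):

```python
import numpy as np

arr = np.array([[0, 0, 0, 255, 0],
                [0, 1, 0, 0, 255],
                [0, 2, 0, 255, 0]])

def find_row_indices(arr, a, b):
    """Indices of all rows whose first two columns equal (a, b).

    Hypothetical helper; returns an empty array when nothing matches.
    """
    mask = (arr[:, 0] == a) & (arr[:, 1] == b)
    return np.flatnonzero(mask)

print(find_row_indices(arr, 0, 2))  # [2]
print(find_row_indices(arr, 5, 5))  # [] (no match)
```

If you know each (x, y) pair occurs exactly once, taking element [0] of the result gives you the single index.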

An elegant/faster way to find the end points of a line on a image?

I've been working to improve the speed of my code by replacing for loops over arrays with appropriate NumPy functions.
The function aims to get the end points of a line, which are the only two points that have exactly one neighbouring pixel with value 255.
Is there a way to get the two points from np.where with conditions, or is there some NumPy function I'm not familiar with that will do the job?
def get_end_points(image):
    x1=-1
    y1=-1
    x2=-1
    y2=-1
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            if image[i][j]==255 and neighbours_sum(i,j,image) == 255:
                if x1==-1:
                    x1 = j
                    y1 = i
                else:
                    x2=j
                    y2=i
    return x1,y1,x2,y2

Here is a solution with convolution:
import numpy as np
import scipy.signal
def find_endpoints(img):
    # Kernel to sum the neighbours
    kernel = [[1, 1, 1],
              [1, 0, 1],
              [1, 1, 1]]
    # 2D convolution (cast image to int32 to avoid overflow)
    img_conv = scipy.signal.convolve2d(img.astype(np.int32), kernel, mode='same')
    # Pick points where pixel is 255 and neighbours sum 255
    endpoints = np.stack(np.where((img == 255) & (img_conv == 255)), axis=1)
    return endpoints
# Test
img = np.zeros((1000, 1000), dtype=np.uint8)
# Draw a line from (200, 130) to (800, 370)
for i in range(200, 801):
    j = round(i * 0.4 + 50)
    img[i, j] = 255
print(find_endpoints(img))
# [[200 130]
#  [800 370]]
EDIT:
You may also consider using Numba for this. The code would be pretty much what you already have, so maybe not particularly "elegant", but much faster. For example, something like this:
import numpy as np
import numba as nb
@nb.njit
def find_endpoints_nb(img):
    endpoints = []
    # Iterate through every row and column
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            # Check current pixel is white
            if img[i, j] != 255:
                continue
            # Sum neighbours
            s = 0
            for ii in range(max(i - 1, 0), min(i + 2, img.shape[0])):
                for jj in range(max(j - 1, 0), min(j + 2, img.shape[1])):
                    s += img[ii, jj]
            # Sum including self pixel for simplicity, check for two white pixels
            if s == 255 * 2:
                endpoints.append((i, j))
                if len(endpoints) >= 2:
                    break
        if len(endpoints) >= 2:
            break
    return np.array(endpoints)
print(find_endpoints_nb(img))
# [[200 130]
#  [800 370]]
This runs considerably faster on my computer:
%timeit find_endpoints(img)
# 34.4 ms ± 64.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit find_endpoints_nb(img)
# 552 µs ± 4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Also, it should use less memory. The code above assumes there will be only two endpoints. You may be able to make it even faster if you add parallelization (although you would have to make some changes, because you would not be able to modify the list endpoints from parallel threads).

Edit: I didn't notice you have a grayscale image, but as far as the idea is concerned, nothing changes.
I cannot give you an exact solution, but I can give you a faster way to find what you want.
1) a) Find the indexes (pixels) that are white [255,255,255]:
indices = np.where(np.all(image==255, axis=2))
1) b) Do your loops around these points only.
This is faster because you are not doing useless loops.
2) This solution should be very fast, but it will be harder to program.
a) Find the indexes as in 1):
indices = np.where(np.all(image==255, axis=2))
b) Shift the index array by +1 along the x axis and add the shifted pixels to the original ones:
indices = np.where(np.all(image==255, axis=2))
indices_up = (indices[0] + 1, indices[1]) # shift every index by +1 along the first axis (take care at the image border)
add_up = image[indices].astype(int) + image[indices_up]
# if add_up contains a row [510,510,510] (255+255 in each RGB channel), that pixel has a neighbour at x+1
# Note that you can't add uint8 values directly, because 255 is the maximum and the sum wraps around (255 + 255 gives 254), hence the cast to int
You have to do this for all neighbours though -> x+1, x-1, y+1, y-1, x+1,y+1, ...
It will be extra fast though.
EDIT2: I was able to make a script that should do it, but you should test it first
import numpy as np
image = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
                  [0, 0, 255, 0, 0, 0, 0, 0, 0],
                  [0, 0, 255, 0, 255, 0, 0, 0, 0],
                  [0, 0, 0, 255, 0, 255, 0, 0, 0],
                  [0, 0, 0, 0, 0, 255, 0, 0, 0],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0]])
image_f = image[1:-1,1:-1] # cut image
i = np.where(image_f==255) # find 255 in the cut image
x = i[0]+1 # calibrate x indexes for original image
y = i[1]+1 # calibrate y indexes for original image
# this is done so you dont search in get_indexes() out of image
def get_indexes(xx,yy,image):
    for i in np.where(image[xx,yy]==255):
        for a in i:
            yield xx[a],yy[a]
# Search for horizontal and vertical duplicates(neighbours)
for neighbours_index in get_indexes(x+1,y,image):
    print(neighbours_index)
for neighbours_index in get_indexes(x-1,y,image):
    print(neighbours_index)
for neighbours_index in get_indexes(x,y+1,image):
    print(neighbours_index)
for neighbours_index in get_indexes(x,y-1,image):
    print(neighbours_index)

I think I can at least provide an elegant solution using convolutions.
We can count the neighbouring pixels by convolving the original image with a 3x3 ring. A pixel is then a line end if it is white itself and has exactly one white neighbour.
>>> import numpy as np
>>> from scipy.signal import convolve2d
>>> a = np.array([[0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 1, 0]])
>>> a
array([[0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 1, 0]])
>>> c = np.full((3, 3), 1)
>>> c[1, 1] = 0
>>> c
array([[1, 1, 1],
[1, 0, 1],
[1, 1, 1]])
>>> result = np.logical_and(convolve2d(a, c, mode='same') == 1, a == 1).astype(int)
>>> result
array([[0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Feel free to check what the individual components produce; for the sake of brevity I didn't include them here. As you might have noticed, it correctly rejects the case where the line ends in a pixel with two neighbours.
You can of course convert this to the indices of an arbitrary number of line endings with np.where:
np.array(np.where(result))

Is there a way to use bincount with bin width = 0.1 in python?

I have something like
[12.414261306701654, 10.52589457006108, 12.398125569114093, 11.900971715356471, 11.566273761189997, 10.31504117886884, 10.235859974871904, 10.25704925592012, 10.296557787801154, 10.19010244226054]
Say I want the count of occurrences in (10, 10.1), (10.1, 10.2), ...
I think that numpy.bincount only works with integer bins. However, if I multiply my array by 10 and use bincount, the x scale is also off by a factor of 10 when I plot the result later, and I don't know how to get an accurate plot.
Thanks

Have a look at np.histogram:
>>> import numpy as np
>>> data = [12.414261306701654, 10.52589457006108, 12.398125569114093, 11.900971715356471, 11.566273761189997, 10.31504117886884, 10.235859974871904, 10.25704925592012, 10.296557787801154, 10.19010244226054]
>>> counts, bin_edges = np.histogram(data, bins=np.arange(10, 12.6, 0.1))
>>> counts
array([0, 1, 3, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
1, 1])
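For plotting, the bin edges returned by np.histogram already carry the real x scale, so nothing has to be multiplied by 10. A small sketch, assuming you want 25 bins of width 0.1 from 10.0 to 12.5 (np.linspace is used here only so the edges are evenly spaced without accumulating step error):

```python
import numpy as np

data = [12.414261306701654, 10.52589457006108, 12.398125569114093,
        11.900971715356471, 11.566273761189997, 10.31504117886884,
        10.235859974871904, 10.25704925592012, 10.296557787801154,
        10.19010244226054]

# 26 evenly spaced edges from 10.0 to 12.5 give 25 bins of width 0.1
edges = np.linspace(10.0, 12.5, 26)
counts, edges = np.histogram(data, bins=edges)

# Bin centers are the natural x positions for a bar plot
centers = (edges[:-1] + edges[1:]) / 2

# Print only the non-empty bins (the last bin also includes its right edge)
for left, right, c in zip(edges[:-1], edges[1:], counts):
    if c:
        print(f"{left:.1f} to {right:.1f}: {c}")
```

With matplotlib, something like plt.bar(centers, counts, width=0.1) should then show the counts on the original scale.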

Python - creating a list with 2 characteristics bug

The goal is to create a list of 99 elements. All elements must be 1s or 0s. The first element must be a 1. There must be 7 1s in total.
import random
import math
import time
# constants determined through testing
generation_constant = 0.96
def generate_candidate():
    coin_vector = []
    coin_vector.append(1)
    for i in range(0, 99):
        random_value = random.random()
        if (random_value > generation_constant):
            coin_vector.append(1)
        else:
            coin_vector.append(0)
    return coin_vector
def validate_candidate(vector):
    vector_sum = sum(vector)
    sum_test = False
    if (vector_sum == 7):
        sum_test = True
    first_slot = vector[0]
    first_test = False
    if (first_slot == 1):
        first_test = True
    return (sum_test and first_test)
vector1 = generate_candidate()
while (validate_candidate(vector1) == False):
    vector1 = generate_candidate()
print vector1, sum(vector1), validate_candidate(vector1)
Most of the time, the output is correct, saying something like
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0] 7 True
but sometimes, the output is:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 2 False
What exactly am I doing wrong?

I'm not certain I understand your requirements, but here's what it sounds like you need:
#!/usr/bin/python3
import random
ones = [ 1 for i in range(6) ]
zeros = [ 0 for i in range(99 - 7) ]  # 92 zeros, so the final list has 99 elements
list_ = ones + zeros
random.shuffle(list_)
list_.insert(0, 1)
print(list_)
print(list_.count(1))
print(list_.count(0))
HTH

The algorithm you gave works, though it's slow. Note that the ideal generation_constant can actually be calculated using the binomial distribution. The optimum is ≈0.928571429, which will fit the conditions 1.104% of the time. If you set the first element to 1 manually, then the optimum generation_constant is ≈0.93877551, which will fit the conditions 16.58% of the time.
The above is based on the binomial distribution, which says that the probability of having exactly k "success" events out of N total tries, where each try has probability p, is P(k | N, p) = N! * p^k * (1 - p)^(N - k) / (k! * (N - k)!). Just stick that into Excel, Mathematica, or a graphing calculator and maximize P.
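If you want to check the figure in Python rather than Excel, here is a small sketch of the formula. It assumes my reading of the second case: the first element is fixed, so there are N = 98 random slots that must contain exactly k = 6 ones, and each slot becomes a 1 with probability p = 1 - generation_constant. binom_pmf is just an illustrative name, and math.comb needs Python 3.8 or newer.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent tries with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

generation_constant = 0.93877551
p_one = 1 - generation_constant      # chance that a single slot becomes a 1

# Assumed setup: 98 random slots, 6 of which must be ones
print(binom_pmf(6, 98, p_one))

# The probability is largest at p = k / n; compare with the constant from the question
print(binom_pmf(6, 98, 1 - 0.96))
```

The first value should come out close to the 16.58% quoted above, and the second shows how much a generation_constant of 0.96 gives away.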
Alternatively:
To generate a list of 99 numbers where the first and 6 additional items are 1 and the remaining elements are 0, you don't need to call random.random so often. Generating pseudo-random numbers is relatively expensive.
There are two ways to avoid calling random so much.
The most processor-efficient way is to call random only 6 times, once for each of the 6 ones you need to insert:
import random
# create vector of 99 0's
vector = [0 for i in range(99)]
# set first element to 1
vector[0] = 1
# list of locations of all 0's (a real list, so that pop() also works in Python 3)
indexes = list(range(1, 99))
# only need to loop 6 times for remaining 6 ones
for i in range(6):
    # select one of the 0 locations at random
    # "pop" it from the list so it can't be selected again
    # and set its corresponding element in vector to 1.
    vector[indexes.pop(random.randint(0, len(indexes) - 1))] = 1
Alternatively, to save on memory, you can just test each new index to make sure it will actually set something:
import random
# create vector of 99 0's
vector = [0 for i in range(99)]
# only need to loop 7 times
for i in range(7):
    index = 0 # first element is set to 1 first
    while vector[index] == 1: # keep calling random until a 0 is found
        index = random.randint(0, 98) # random index to check/set
    vector[index] = 1 # set the random (or first) element to 1
The second one will always set the first element to 1 first, because index = random.randint(0, 98) only ever gets called if vector[0] == 1.

With genetic programming you want to control your domain so that invalid configurations are eliminated as much as possible. The fitness is supposed to rate valid configurations, not eliminate invalid ones. Honestly, this problem doesn't really seem to be a good fit for genetic programming. You have outlined the domain, but I don't see a fitness description anywhere.
Anyway, that being said, the way I would populate the domain would be: since the first element is always 1, ignore it; since the remaining 98 elements contain only 6 ones, shuffle 6 ones into 92 zeros. Or even enumerate the possibilities, as your domain isn't very large.
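A sketch of that idea with the standard library: random.sample picks 6 distinct positions among the remaining 98 slots, so every generated candidate is valid by construction and no validation loop is needed. The function name make_vector and its parameters are made up for illustration.

```python
import random

def make_vector(length=99, ones=7):
    """Build a 0/1 list of the given length whose first element is 1, with `ones` ones in total.

    Hypothetical helper; assumes 1 <= ones <= length.
    """
    vector = [0] * length
    vector[0] = 1
    # Choose the positions of the remaining ones without repetition
    for idx in random.sample(range(1, length), ones - 1):
        vector[idx] = 1
    return vector

candidate = make_vector()
print(candidate)
print(len(candidate), sum(candidate), candidate[0])
```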

I had a feeling it was your use of sum(), but sum() does not modify the list in place:
>>> mylist = [1,2,3,4]
>>> sum(mylist)
10
>>> mylist
[1, 2, 3, 4]
Here's a (somewhat) pythonic recursive version
import random
def generate_vector():
    generation_constant = .96
    # leading 1 plus 98 random elements gives 99 elements in total
    myvector = [1]+[ 1 if random.random() > generation_constant else 0 for i in range(0,98)]
    mysum = 0
    for a in myvector:
        mysum = (mysum + a)
    if mysum == 7 and myvector[0]==1:
        return myvector
    return generate_vector()
and for good measure
def generate_test():
    for i in range(0,10000):
        vector = generate_vector()
        total = 0
        for a in vector:
            total = total + a
        if total != 7 or vector[0]!=1:
            print(vector)
output:
>>> generate_test()
>>>
