Suppose I have a pandas Series that consists of 0s and 1s (the same should work with NumPy arrays or any iterable). I would like a function that takes an array and an input n and returns a new series containing 1s at the n indices leading up to every 1 in the original series, as well as at the positions of the 1s themselves. Here is an example:
array = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1])
> preceding_indices_function(array, 2)
np.array([0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
For each time there is a 1 in the input array, the two indices preceding it are filled in with 1 regardless of whether there is a 0 or 1 in that index in the original array.
I would really appreciate some help on this. Thanks!
Use a convolution with np.convolve:
N = 2
# craft a custom kernel
kernel = np.ones(2*N+1)
kernel[-N:] = 0
# array([1, 1, 1, 0, 0])
out = (np.convolve(array, kernel, mode='same') != 0).astype(int)
Output:
array([0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Unless you don't want to use numpy, mozway's transpose is the best solution.
But since several iterative solutions have been given, I'll add my itertools-based one (written for n = 2; it needs import itertools):
[a or b or c for a,b,c in itertools.zip_longest(array, array[1:], array[2:], fillvalue=0)]
zip_longest works like the classic zip, but if the iterables have different lengths, iteration continues until the longest one is exhausted, and exhausted iterables yield None unless you pass a fillvalue parameter to zip_longest.
So here itertools.zip_longest(array, array[1:], array[2:], fillvalue=0) yields a sequence of triplets (a, b, c) of 3 consecutive elements (a being the current element, b the next one, and c the one after that; b and c are 0 when there is no such element).
From there, a simple comprehension builds a list of a or b or c, which is 1 if any of a, b, or c is 1, and 0 otherwise.
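The comprehension above is hard-coded for n = 2. Below is a minimal sketch of how the same zip_longest idea might generalize to any n. The function name preceding_ones and the use of islice are illustrative choices, not part of the original answer.

```python
from itertools import islice, zip_longest

def preceding_ones(values, n):
    """Return 1 at every index that holds a 1 or has a 1 within the next n indices."""
    values = list(values)
    # One iterator per offset 0..n; the iterator for offset k starts k elements later.
    shifted = [islice(values, k, None) for k in range(n + 1)]
    # Iterators that run out early are padded with 0 by zip_longest.
    return [int(any(window)) for window in zip_longest(*shifted, fillvalue=0)]

array = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
print(preceding_ones(array, 2))
```

Because the input is copied into a list first, this should accept lists, NumPy arrays, and pandas Series alike, at the cost of one extra copy.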
import numpy as np
array = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1])
array = np.array([a or array[idx+1] or array[idx+2] for idx, a in enumerate(array[:-2])] + [array[-2] or array[-1]] + [array[-1]])
This function works if array is a list (it modifies the list in place) and should work with other mutable sequences as well:
def preceding_indices_function(array, n):
    for i in range(len(array)):
        if array[i] == 1:
            # set up to n positions before index i to 1
            for j in range(n):
                if i-j-1 >= 0:
                    array[i-j-1] = 1
    return array
I got a solution that is similar to the other one but slightly simpler in my opinion (note that it drops the last two positions, so the result is two elements shorter than the input):
>>> [1 if (array[i+1] == 1 or array[i+2] == 1) else x for i,x in enumerate(array) if i < len(array) - 2]
[0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
Problem
I want to identify when I've encountered a true value and maintain that value for the rest of the array... for a particular bin. From a Numpy perspective it would be like a combination of numpy.logical_or.accumulate and numpy.logical_or.at.
Example
Consider the truth values in a, the bins in b and the expected output in c.
I've used 0 for False and 1 for True then converted to bool in order to align the array values.
a = np.array([0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]).astype(bool)
b = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 2, 3, 3, 0, 1, 2, 3])
# zeros at indices 0, 1, 2, 7, 8, 9, 13
# ones at indices 3, 4, 5, 6, 14
# twos at indices 10, 15
# threes at indices 11, 12, 16
c = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1]).astype(bool)
# indices 0-2: zero bin, no True yet
# index 4: one bin, first True
# index 8: zero bin, first True
# index 12: three bin, first True
# index 15: two bin, never had a True
What I've Tried
I can loop through each value and track whether the associated bin has seen a True value yet.
tracker = np.zeros(4, bool)
result = np.zeros(len(b), bool)
for i, (truth, bin_) in enumerate(zip(a, b)):
    tracker[bin_] |= truth
    result[i] = tracker[bin_]
result * 1
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
But I was hoping for an O(n) time NumPy solution. I have the option of using a JIT wrapper like Numba, but I'd rather keep it pure NumPy.
O(n) solution
def cumulative_linear_seen(seen, bins):
    """
    Tracks whether or not a value has been observed as
    True in a 1D array, and marks all future values as
    True for each individual bin.

    Parameters
    ----------
    seen: ndarray
        Boolean array marking an occurrence of a value
    bins: ndarray
        Array of bins to which occurrences belong

    Returns
    -------
    Integer (0/1) array indicating if the corresponding bin has
    been observed at a point in time
    """
    # zero indexing won't work with logical and, need to 1-index
    one_up = bins + 1
    # Next step is finding where each unique value is seen
    occ = np.flatnonzero(seen)
    v_obs = one_up[seen]
    # We can fill another mapping array with these occurrences,
    # then map by corresponding index. np.minimum.at keeps the
    # first occurrence when a bin is True more than once.
    i_obs = np.full(one_up.max() + 1, seen.shape[0] + 1)
    np.minimum.at(i_obs, v_obs, occ)
    # Finally, we create the map and compare to an array of
    # indices from the original seen array
    seen_idx = i_obs[one_up]
    return (seen_idx <= np.arange(seen_idx.shape[0])).astype(int)
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
PiR's contribution
Based on insights above
r = np.arange(len(b))
one_hot = np.eye(b.max() + 1, dtype=bool)[b]
np.logical_or.accumulate(one_hot & a[:, None], axis=0)[r, b] * 1
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
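As a sanity check, the loop from the question and the one-hot approach above can be wrapped in functions and compared on the sample data. This is only a sketch: the function names seen_loop and seen_one_hot are made up for illustration. Note that the one-hot version builds an intermediate array of shape (len(b), number of bins), so it trades memory for vectorization.

```python
import numpy as np

a = np.array([0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]).astype(bool)
b = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 2, 3, 3, 0, 1, 2, 3])

def seen_loop(a, b):
    """Reference implementation: track per-bin state in a Python loop."""
    tracker = np.zeros(b.max() + 1, bool)
    result = np.zeros(len(b), bool)
    for i, (truth, bin_) in enumerate(zip(a, b)):
        tracker[bin_] |= truth
        result[i] = tracker[bin_]
    return result

def seen_one_hot(a, b):
    """Vectorized version: cumulative OR over a (len(b), n_bins) one-hot mask."""
    r = np.arange(len(b))
    one_hot = np.eye(b.max() + 1, dtype=bool)[b]
    return np.logical_or.accumulate(one_hot & a[:, None], axis=0)[r, b]

print(np.array_equal(seen_loop(a, b), seen_one_hot(a, b)))
```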
Older attempts
Just to get things started, here is a solution that, while vectorized, is not O(n). I believe an O(n) solution similar to this exists, I'll work on the complexity :-)
Attempt 1
from scipy import sparse  # SciPy is required for this attempt
q = b + 1
u = sparse.csr_matrix(
    (a, q, np.arange(a.shape[0] + 1)), (a.shape[0], q.max()+1)
)
m = np.maximum.accumulate(u.toarray()) * np.arange(u.shape[1])
r = np.where(m[:, 1:] == 0, np.nan, m[:, 1:])
(r == q[:, None]).any(1).view(np.int8)
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1], dtype=int8)
Attempt 2
q = b + 1
m = np.logical_and(a, q)
r = np.flatnonzero(m)
t = q[m]
f = np.zeros((a.shape[0], q.max()))
f[r, t-1] = 1
v = np.maximum.accumulate(f) * np.arange(1, q.max()+1)
(v == q[:, None]).any(1).view(np.int8)
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1], dtype=int8)
How can I find the number of consecutive 1s (or any other value) in each row of the following numpy array? I need a pure numpy solution.
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
There are two parts to my question, first: what is the maximum number of 1s in a row? Should be
array([2,3,2])
in the example case.
And second, what is the index of the start of the first set of multiple consecutive 1s in a row? For the example case this would be
array([3,9,9])
In this example I put 2 consecutive 1s in a row. But it should be possible to change that to 5 consecutive 1s in a row, this is important.
A similar question was answered using np.unique, but it only works for one row and not an array with multiple rows as the result would have different lengths.
Here's a vectorized approach based on differencing with np.diff (counts is the input array) -
import numpy as np
import pandas as pd
# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get intervals using differences between start and stop indices
start_stop = np.column_stack((starts[:,0], stops[:,1] - starts[:,1]))
# Get indices corresponding to max. interval lens and thus lens themselves
SS_df = pd.DataFrame(start_stop)
out = start_stop[SS_df.groupby([0],sort=False)[1].idxmax(),1]
Sample input, output -
Original sample case :
In [574]: counts
Out[574]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)
Modified case :
In [577]: counts
Out[577]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
[0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])
In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)
Here's a pure NumPy version that is identical to the previous one up to the point where we have starts and stops. Here's the full implementation -
# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get intervals using differences between start and stop indices
intvs = stops[:,1] - starts[:,1]
# Store intervals as a 2D array for further vectorized ops to make.
c = np.bincount(starts[:,0])
mask = np.arange(c.max()) < c[:,None]
intvs2D = mask.astype(float)
intvs2D[mask] = intvs
# Get max along each row as final output
out = intvs2D.max(1)
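The code above covers the first part of the question (the longest run per row). For the second part, the start index of the first run of at least k consecutive 1s, one possible sketch uses sliding windows. It assumes NumPy 1.20 or newer for sliding_window_view; the function name first_run_start and the -1 marker for rows without such a run are illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def first_run_start(counts, k, value=1):
    """Per row, index where the first run of at least k consecutive `value`s starts (-1 if none)."""
    # windows has shape (rows, cols - k + 1, k); a window is a hit if all k entries match.
    windows = sliding_window_view(counts == value, k, axis=1)
    hits = windows.all(axis=-1)
    # argmax returns the first True per row; rows without any hit are marked with -1.
    return np.where(hits.any(axis=1), hits.argmax(axis=1), -1)

counts = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
                   [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
print(first_run_start(counts, 2))
```

Changing k to 5 checks for five consecutive 1s instead; k must not exceed the number of columns, otherwise sliding_window_view raises an error.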
A closely related problem is checking whether, within each sorted row, elements a fixed number of positions apart differ by a given amount. For example, five consecutive values show up as a difference of 4 between elements four positions apart (assuming each row is sorted in ascending order and has no duplicate values); the same idea with a difference of 0 between neighbours detects a pair:
cardAmount=cards[0,:].size
has4=cards[:,4:]-cards[:,:cardAmount-4]
isStraight=np.any(has4 == 4, axis=1)
I have a binary matrix that I created with NumPy. The matrix has 6 rows and 8 columns.
array([[1, 0, 1, 1, 1, 0, 1, 1],
[1, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 1, 0, 0, 1, 1, 1],
[1, 0, 1, 1, 0, 1, 1, 0],
[0, 1, 0, 0, 1, 0, 1, 1],
[0, 1, 0, 1, 1, 1, 0, 0]])
First column is the sign of a number.
Example:
1, 0, 1, 1, 1, 0, 1, 1 -> 1 0111011 -> -59
When I used int(str, base=2), I got 187, but the value should be -59.
>>> int(''.join(map(str, array[0])), 2)
187
How can I convert the string into the signed integer?
Python doesn't know that the first bit is supposed to represent the sign (compare with bin(-59)), so you have to handle that yourself. For example, if A contains the array:
num = int(''.join(map(str, A[0,1:])), 2)
if A[0,0]:
    num *= -1
Here's a more Numpy-ish way to do it, for the whole array at once:
num = np.packbits(A).astype(np.int8)
num[num<0] = -128 - num[num<0]
Finally, a code-golf version:
(A[:,:0:-1]<<range(7)).sum(1)*(1-2*A[:,0])
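For readers who find the code-golf version hard to follow, here is a more explicit sketch of the same sign-and-magnitude idea. The function name sign_magnitude_to_int is made up for illustration, and the cast to int64 is an added precaution so that unsigned input dtypes do not wrap around.

```python
import numpy as np

def sign_magnitude_to_int(bits):
    """Convert rows of bits (first column = sign bit, rest = magnitude, MSB first) to signed ints."""
    bits = np.asarray(bits).astype(np.int64)
    n_mag = bits.shape[1] - 1
    # Place values for the magnitude bits, most significant first: 64, 32, ..., 1 for 7 bits.
    weights = 1 << np.arange(n_mag - 1, -1, -1)
    magnitude = bits[:, 1:] @ weights
    # Sign bit 0 -> +1, sign bit 1 -> -1.
    sign = 1 - 2 * bits[:, 0]
    return sign * magnitude

A = np.array([[1, 0, 1, 1, 1, 0, 1, 1],
              [1, 1, 1, 1, 1, 1, 0, 0],
              [0, 0, 1, 0, 0, 1, 1, 1],
              [1, 0, 1, 1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 1, 0, 0]])
print(sign_magnitude_to_int(A))
```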
You could split each row into a sign and a value variable. Then, if the sign bit is set, multiply the value by -1.
row = array[0]
sign, value = row[0], row[1:]
int(''.join(map(str, value)), 2) if sign == 0 else int(''.join(map(str, value)), 2) * -1
First of all, it looks like a NumPy array rather than a NumPy matrix.
There are a couple of options I can think of. A pretty straightforward way looks like this:
def rowToSignedDec(arr, row):
    res = int(''.join(str(x) for x in arr[row][1:].tolist()),2)
    if arr[row][0] == 1:
        return -res
    else:
        return res
print(rowToSignedDec(arr, 0))
-59
That one is clearly not the most efficient, and the following one-liner is not the shortest possible either:
int(''.join(str(x) for x in arr[0][1:].tolist()),2) - 2*int(arr[0][0])*int(''.join(str(x) for x in arr[0][1:].tolist()),2)
Where arr is the above-mentioned array.
The goal is to create a list of 99 elements. All elements must be 1s or 0s. The first element must be a 1. There must be 7 1s in total.
import random
import math
import time
# constants determined through testing
generation_constant = 0.96
def generate_candidate():
    coin_vector = []
    coin_vector.append(1)
    for i in range(0, 99):
        random_value = random.random()
        if (random_value > generation_constant):
            coin_vector.append(1)
        else:
            coin_vector.append(0)
    return coin_vector
def validate_candidate(vector):
    vector_sum = sum(vector)
    sum_test = False
    if (vector_sum == 7):
        sum_test = True
    first_slot = vector[0]
    first_test = False
    if (first_slot == 1):
        first_test = True
    return (sum_test and first_test)
vector1 = generate_candidate()
while (validate_candidate(vector1) == False):
    vector1 = generate_candidate()
print(vector1, sum(vector1), validate_candidate(vector1))
Most of the time, the output is correct, saying something like
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0] 7 True
but sometimes, the output is:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 2 False
What exactly am I doing wrong?
I'm not certain I understand your requirements, but here's what it sounds like you need:
#!/usr/bin/python3
import random
ones = [ 1 for i in range(6) ]
# 98 remaining slots in total; the leading 1 is inserted below to reach 99 elements
zeros = [ 0 for i in range(98 - 6) ]
list_ = ones + zeros
random.shuffle(list_)
list_.insert(0, 1)
print(list_)
print(list_.count(1))
print(list_.count(0))
HTH
The algorithm you gave works, though it's slow. Note that the ideal generation_constant can actually be calculated using the binomial distribution. The optimum is ≈0.928571429 which will fit the conditions 1.104% of the time. If you set the first element to 1 manually, then the optimum generation_constant is ≈0.93877551 which will fit the conditions 16.58% of the time.
The above is based on the binomial distribution, which says that the probability of having exactly k "success" events out of N total tries, where each try has probability p, is P(k | N, p) = N! * p^k * (1 - p)^(N - k) / (k! * (N - k)!). Just stick that into Excel, Mathematica, or a graphing calculator and maximize P.
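The maximization can also be done in a few lines of Python instead of a spreadsheet. The sketch below assumes the variant where the first element is set to 1 manually, so 6 ones are needed among the remaining 98 slots; binomial_pmf is an illustrative helper name, and math.comb requires Python 3.8 or newer. Since the original code draws a 1 when random.random() exceeds generation_constant, the per-slot probability of a 1 is p = 1 - generation_constant.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent tries with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Assumed setup: first element fixed to 1, so 6 ones are needed among 98 random slots.
n, k = 98, 6

# Coarse grid search over p; analytically the maximum is at p = k / n.
candidates = [i / 1000 for i in range(1, 200)]
best_p = max(candidates, key=lambda p: binomial_pmf(k, n, p))

print("best p on grid:", best_p)
print("generation_constant:", 1 - k / n)
print("success probability:", binomial_pmf(k, n, k / n))
```

The printed success probability should land close to the 16.58% quoted above.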
Alternatively:
To generate a list of 99 numbers where the first and 6 additional items are 1 and the remaining elements are 0, you don't need to call random.random so much. Generating thousands of pseudo-random numbers only to throw most candidates away is comparatively expensive.
There are two ways to avoid calling random so much.
The most processor efficient way is to only call random 6 times, for the 6 ones you need to insert:
import random
# create vector of 99 0's
vector = [0 for i in range(99)]
# set first element to 1
vector[0] = 1
# list of locations of all 0's (list() is needed in Python 3 so that pop works)
indexes = list(range(1, 99))
# only need to loop 6 times for remaining 6 ones
for i in range(6):
    # select one of the 0 locations at random
    # "pop" it from the list so it can't be selected again
    # and set its corresponding element in vector to 1.
    vector[indexes.pop(random.randint(0, len(indexes) - 1))] = 1
Alternatively, to save on memory, you can just test each new index to make sure it will actually set something:
import random
# create vector of 99 0's
vector = [0 for i in range(99)]
# only need to loop 7 times
for i in range(7):
    index = 0  # first element is set to 1 first
    while vector[index] == 1:  # keep calling random until a 0 is found
        index = random.randint(0, 98)  # random index to check/set
    vector[index] = 1  # set the random (or first) element to 1
The second one will always set the first element to 1 first, because index = random.randint(0, 98) only ever gets called if vector[0] == 1.
With genetic programming you want to control your domain so that invalid configurations are eliminated as much as possible. The fitness function is supposed to rate valid configurations, not eliminate invalid ones. Honestly, this problem doesn't really seem to be a good fit for genetic programming. You have outlined the domain, but I don't see a fitness description anywhere.
Anyway, that being said, the way I would populate the domain would be: since the first element is always 1, ignore it; since the remaining 98 elements contain only 6 ones, shuffle 6 ones in with 92 zeros. Or even enumerate the possibilities, as your domain isn't very large.
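One way to populate the domain along these lines is to draw the 6 free positions directly, so every generated vector is valid by construction. This is only a sketch; make_vector and its parameters are hypothetical names, and random.sample is used here in place of an explicit shuffle.

```python
import random

def make_vector(length=99, ones=7):
    """Build a 0/1 list with a leading 1 and `ones` ones in total."""
    vector = [0] * length
    vector[0] = 1
    # Choose distinct positions for the remaining ones among indices 1..length-1.
    for idx in random.sample(range(1, length), ones - 1):
        vector[idx] = 1
    return vector

vector = make_vector()
print(len(vector), sum(vector), vector[0])
```

Because no candidate is ever rejected, no validation loop is needed.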
I had a feeling it was your use of sum(), but the built-in sum() does not modify the list in place, so that is not the cause:
>>> mylist = [1,2,3,4]
>>> sum(mylist)
10
>>> mylist
[1, 2, 3, 4]
Here's a (somewhat) pythonic recursive version
import random
def generate_vector():
    generation_constant = .96
    # leading 1 plus 98 random slots gives the required 99 elements
    myvector = [1]+[ 1 if random.random() > generation_constant else 0 for i in range(0,98)]
    mysum = 0
    for a in myvector:
        mysum = (mysum + a)
    if mysum == 7 and myvector[0]==1:
        return myvector
    return generate_vector()
and for good measure
def generate_test():
    for i in range(0,10000):
        vector = generate_vector()
        total = 0  # named total to avoid shadowing the built-in sum
        for a in vector:
            total = total + a
        if total != 7 or vector[0]!=1:
            print(vector)
output:
>>> generate_test()
>>>