How to create a triangle of "1"s? - python

I want to create this from multiple arrays, best using NumPy:
1 0 0 0 0 0
1 1 0 0 0 0
1 1 1 0 0 0
1 1 1 1 0 0
1 1 1 1 1 0
1 1 1 1 1 1
I would prefer to use a library to create this. How do I go about doing it?
Note: NumPy can be used to create the array as well.
There are a lot of answers on SO, but they all provide answers that do not use libraries, and I haven't been able to find anything online to produce this!

You can use np.tril:
>>> np.tril(np.ones((6, 6), dtype=int))
array([[1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1]])

Using numpy.tri
Syntax:
numpy.tri(N, M=None, k=0, dtype=<class 'float'>, *, like=None)
Basically it creates an array with 1's at and below the given diagonal and 0's elsewhere.
Example:
>>> import numpy as np
>>> np.tri(6, dtype=int)
array([[1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 1]])
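As a small sketch of the other parameters in the signature above: k shifts the diagonal and M sets the number of columns. The variable names here are only illustrative.

```python
import numpy as np

# Same lower triangle as the np.tril answer, built directly
lower = np.tri(6, dtype=int)

# k=-1 excludes the main diagonal (strictly lower triangle)
strict = np.tri(4, k=-1, dtype=int)

# M=5 gives a non-square result; k=1 also fills the first superdiagonal
rect = np.tri(3, 5, k=1, dtype=int)

print(lower)
print(strict)
print(rect)
```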

Related

Pandas column of lists, append a new column to each list

For example, I have a pd.Series of lists like the one below:
test = pd.Series([[1, 0, 0, 0],[0, 1, 0, 0],[0, 1, 0, 0],[0, 0, 0, 1],[1, 0, 0, 0]])
print(test)
0 [1, 0, 0, 0]
1 [0, 1, 0, 0]
2 [0, 1, 0, 0]
3 [0, 0, 0, 1]
4 [1, 0, 0, 0]
What I want to do is append (index + 1) of each element to its list, like this:
0 [1, 0, 0, 0, 1]
1 [0, 1, 0, 0, 2]
2 [0, 1, 0, 0, 3]
3 [0, 0, 0, 1, 4]
4 [1, 0, 0, 0, 5]
How can I achieve this?
np.column_stack
Stack the index onto the existing lists and assign the result back to test in place:
test[:] = np.column_stack([test.tolist(), test.index + 1]).tolist()
test
0 [1, 0, 0, 0, 1]
1 [0, 1, 0, 0, 2]
2 [0, 1, 0, 0, 3]
3 [0, 0, 0, 1, 4]
4 [1, 0, 0, 0, 5]
dtype: object
Here, the Series is converted to a list of lists, then concatenated with (index + 1). When assigning back, you need to use a list of lists because pandas doesn't understand you want a column of lists if you're assigning a numpy array back.
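To make the intermediate steps concrete, here is a hedged sketch that builds the stacked array first and then wraps it in a new Series instead of assigning in place. The names stacked and result are illustrative, and the approach assumes every list has the same length.

```python
import numpy as np
import pandas as pd

test = pd.Series([[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

# 2-D array of shape (5, 5): the original lists plus one extra column holding index + 1
stacked = np.column_stack([test.tolist(), test.index + 1])

# Convert the rows back to Python lists so pandas stores one list per element
result = pd.Series(stacked.tolist(), index=test.index)
print(result)
```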
Series.map and itertools.count
Another option, having fun with itertools:
from itertools import count
c = count(1)
test.map(lambda l: [*l, next(c)])
0 [1, 0, 0, 0, 1]
1 [0, 1, 0, 0, 2]
2 [0, 1, 0, 0, 3]
3 [0, 0, 0, 1, 4]
4 [1, 0, 0, 0, 5]
dtype: object
test = pd.Series([[1, 0, 0, 0],[0, 1, 0, 0],[0, 1, 0, 0],
                  [0, 0, 0, 1],[1, 0, 0, 0]])
b = 0
for a in test:
    b += 1
    # append the running counter to each list in place
    a.append(b)
print(test)
will give
0 [1, 0, 0, 0, 1]
1 [0, 1, 0, 0, 2]
2 [0, 1, 0, 0, 3]
3 [0, 0, 0, 1, 4]
4 [1, 0, 0, 0, 5]
You could also add a second pd.Series built with a list comprehension; adding two Series of lists concatenates the lists element-wise:
import pandas as pd
test = pd.Series([[1, 0, 0, 0],[0, 1, 0, 0],[0, 1, 0, 0],[0, 0, 0, 1],[1, 0, 0, 0]])
print(test + pd.Series([[i + 1] for i in test.index]))
Output:
0 [1, 0, 0, 0, 1]
1 [0, 1, 0, 0, 2]
2 [0, 1, 0, 0, 3]
3 [0, 0, 0, 1, 4]
4 [1, 0, 0, 0, 5]
dtype: object
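Note that the loop-based answer mutates the lists stored in test, while the map and addition answers build new lists. A minimal non-mutating sketch that uses 1-based positions rather than index labels (a difference that only matters when the index is not 0..n-1) could look like this; with_position is a hypothetical helper name.

```python
import pandas as pd

def with_position(series):
    # Build new lists so the originals are left untouched;
    # enumerate(start=1) supplies 1-based positions regardless of index labels
    return pd.Series(
        [lst + [pos] for pos, lst in enumerate(series, start=1)],
        index=series.index,
    )

test = pd.Series([[1, 0, 0, 0], [0, 1, 0, 0]], index=["a", "b"])
print(with_position(test).tolist())  # [[1, 0, 0, 0, 1], [0, 1, 0, 0, 2]]
print(test.tolist())                 # [[1, 0, 0, 0], [0, 1, 0, 0]]
```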

Smoothing a 2-D Numpy Array with a Kernel

Suppose I have an (m x n) 2-d numpy array that contains only 0's and 1's. I want to "smooth" the array by running, for example, a 3x3 kernel over the array and taking the majority value within that kernel. For values at the edges, I would just ignore the "missing" values.
For example, let's say the array looked like
import numpy as np
x = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 0, 1, 1, 0],
[0, 0, 1, 0, 1, 1, 1, 0],
[0, 1, 1, 1, 1, 0, 1, 0],
[0, 0, 1, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0]])
Starting at the top-left "1", a 3 x 3 kernel centered on that element would be missing its first row and first column. I want to handle that by ignoring the missing cells and considering only the remaining 2 x 2 matrix:
1 0
0 0
In this case, the majority value is 0, so set that element to 0. Repeating this for all elements, the resulting 2-d array I would want is:
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0
0 0 1 1 1 1 1 0
0 0 1 1 1 1 1 0
0 0 1 1 1 1 1 0
0 0 1 1 1 1 0 0
0 0 0 0 0 0 0 0
How do I accomplish this?
You can use skimage.filters.rank.majority to assign to each value the most frequently occurring one within its neighborhood. The 3x3 kernel can be defined using skimage.morphology.square:
from skimage.filters.rank import majority
from skimage.morphology import square
majority(x.astype('uint8'), square(3))
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
Note: You'll need a recent stable version of scikit-image for majority.
I ended up doing something like this (which is based off of How do I use scipy.ndimage.filters.gereric_filter?):
import numpy as np
import scipy.ndimage
import scipy.stats as scs
def filter_most_common_element(a, w_k=np.ones(shape=(3, 3))):
    """
    Creating a function for scipy.ndimage.generic_filter.
    See https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.generic_filter.html for more information
    on generic filters.
    This filter takes a kernel of np.ones() to find the most common element in the array.
    Based off of https://stackoverflow.com/questions/61197364/smoothing-a-2-d-numpy-array-with-a-kernel
    """
    a = a.reshape(w_k.shape)
    a = np.multiply(a, w_k)
    # See https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.stats.mode.html
    # np.ravel copes with both the array result of older SciPy and the scalar result of newer versions
    most_common_element = np.ravel(scs.mode(a, axis=None)[0])[0]
    return most_common_element
x = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 1, 1, 0],
              [0, 0, 1, 1, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1, 0],
              [0, 1, 1, 1, 1, 0, 1, 0],
              [0, 0, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0]])
# generic_filter is called from scipy.ndimage directly; the scipy.ndimage.filters namespace is deprecated
out = scipy.ndimage.generic_filter(x, filter_most_common_element, footprint=np.ones((3, 3)), mode='constant', cval=0.0)
out
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]])
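The generic_filter call above pads the border with zeros (mode='constant', cval=0.0) rather than ignoring the missing cells as the question describes. A NumPy-only sketch of the "ignore missing values" rule follows. window_sum and majority_ignore_edges are hypothetical helper names, and breaking ties toward 0 is an assumption, since the question does not say how ties should be handled.

```python
import numpy as np

def window_sum(a, size=3):
    # Sum of each size x size neighbourhood, treating cells outside the array as 0
    pad = size // 2
    padded = np.pad(a, pad, mode="constant", constant_values=0)
    rows, cols = a.shape
    total = np.zeros(a.shape, dtype=int)
    for dr in range(size):
        for dc in range(size):
            total += padded[dr:dr + rows, dc:dc + cols]
    return total

def majority_ignore_edges(a, size=3):
    ones = window_sum(a, size)                  # number of 1s in each window
    valid = window_sum(np.ones_like(a), size)   # number of cells that really exist
    # 1 only when strictly more than half of the existing cells are 1 (ties go to 0)
    return (2 * ones > valid).astype(a.dtype)

x = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 1, 1, 0],
              [0, 0, 1, 1, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1, 0],
              [0, 1, 1, 1, 1, 0, 1, 0],
              [0, 0, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0]])
print(majority_ignore_edges(x))
```

For this particular x the two rules happen to agree, so the printed array is the same as the output shown in the answers above. They can differ on inputs where a border cell has a majority of 1s among only its existing neighbours.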

Apply a function to series of list without apply in pandas

I have a dataframe
df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 1, 1]]})
df
Binary_List
0 [0, 0, 1, 0, 0, 0, 0]
1 [0, 1, 0, 0, 0, 0, 0]
2 [0, 0, 1, 1, 0, 0, 0]
3 [0, 0, 0, 0, 1, 1, 1]
I want to apply a function to each list without using apply, because apply is very slow on a large dataset:
def count_one(lst):
index = [i for i, e in enumerate(lst) if e != 0]
# some more steps
return len(index)
df['Value'] = df['Binary_List'].apply(lambda x: count_one(x))
df
Binary_List Value
0 [0, 0, 1, 0, 0, 0, 0] 1
1 [0, 1, 0, 0, 0, 0, 0] 1
2 [0, 0, 1, 1, 0, 0, 0] 2
3 [0, 0, 0, 0, 1, 1, 1] 3
I tried using this, but saw no improvement:
vfunc = np.vectorize(count_one)
df['Value'] = vfunc(df['Binary_List'])
This gives me an error:
df['Value'] = count_one(df['Binary_List'])
You can try DataFrame.explode:
df.explode('Binary_List').reset_index().groupby('index').sum()
Binary_List
index
0 1
1 1
2 2
3 3
You can also do:
pd.Series([np.array(key).sum() for key in df['Binary_List']])
0 1
1 1
2 2
3 3
dtype: int64
To count the 1s in each list, you can use the str accessor as below:
df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})
# np.str was removed from NumPy; the built-in str does the same job here
df["Binary_List"].astype(str).str.count("1")
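When all lists have the same length, as in the example, another option is to stack them into a single 2-D array and count along each row. This is only a sketch covering the counting step shown in count_one, not the elided "some more steps"; arr is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})

# Shape (4, 7): one row per list (requires equal-length lists)
arr = np.array(df['Binary_List'].tolist())

# Number of non-zero entries per row, matching len(index) in count_one
df['Value'] = np.count_nonzero(arr, axis=1)
print(df['Value'].tolist())  # [1, 1, 2, 3]
```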

How to cluster data points near each other and assign each cluster a new numeric value?

If I have an array of data like this:
[[1, 1, 0, 0, 0, 1, 1],
[1, 0, 0, 0, 0, 1, 1],
[1, 0, 1, 1, 0, 0, 1],
[0, 0, 1, 1, 0, 0, 0]]
How do I cluster each grouping of 1s and assign each grouping of 1s a count such that I get an array like this:
[[1, 1, 0, 0, 0, 2, 2],
[1, 0, 0, 0, 0, 2, 2],
[1, 0, 3, 3, 0, 0, 2],
[0, 0, 3, 3, 0, 0, 0]]
Basically, I am trying to identify each cluster of data points and assign that cluster a specific value identifying it.
The skimage.measure.label() function (as already mentioned by Aaron) should give exactly the result you're looking for:
import numpy as np
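import skimage.measure  # import the submodule explicitly; a bare "import skimage" does not load it in older versions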
# Initialize example array
arr = np.array([
[1, 1, 0, 0, 0, 1, 1],
[1, 0, 0, 0, 0, 1, 1],
[1, 0, 1, 1, 0, 0, 1],
[0, 0, 1, 1, 0, 0, 0],
])
# Label connected regions
result = skimage.measure.label(arr)
print(result)
# Output:
# [[1 1 0 0 0 2 2]
# [1 0 0 0 0 2 2]
# [1 0 3 3 0 0 2]
# [0 0 3 3 0 0 0]]
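If scikit-image is not available, scipy.ndimage.label offers similar functionality. A sketch, noting that by default it treats only edge-adjacent cells as connected, so an all-ones structure is passed here to join diagonal neighbours as well (an assumption about the connectivity you want):

```python
import numpy as np
from scipy import ndimage

arr = np.array([
    [1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 0, 0],
])

# structure=np.ones((3, 3)) means the 8 surrounding cells count as neighbours
labeled, num = ndimage.label(arr, structure=np.ones((3, 3)))
print(num)  # 3
print(labeled)
```

The function returns both the labelled array and the number of clusters found, which is convenient if you need to loop over the clusters afterwards.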

Get # of contiguous hits and their first/last index in a NumPy array

Here is an itertools solution that returns a list of the lengths of each contiguous block. Here a contiguous block is a run of 1s without breaks in between. Is there a way to also have itertools return the index associated with each block?
import itertools
import numpy as np
stack = np.zeros((10,10))
stack[0] = 1
stack[5,:5] = 1
contiguous_hits = [ (sum( 1 for _ in group )) for row in stack for key, group in itertools.groupby(row) if key ]
Many thanks!
Here's one vectorized method -
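import numpy as np
import pandas as pd
def start_stop_per_row(stack):
    # Pad each row with a zero on both sides so runs touching the edges are detected
    z = np.zeros((stack.shape[0],1),dtype=stack.dtype)
    z_ext = np.column_stack((z,stack,z))
    # True wherever the value changes between neighbouring columns
    mask = z_ext[:,1:] != z_ext[:,:-1]
    idx = np.argwhere(mask)
    # Changes alternate between start and stop; subtract 1 to make stop inclusive
    return pd.DataFrame({'row':idx[::2,0],'start':idx[::2,1],'stop':idx[1::2,1]-1})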
Sample run -
In [108]: stack
Out[108]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])
In [109]: start_stop_per_row(stack)
Out[109]:
row start stop
0 0 0 9
1 2 1 4
2 2 7 9
3 5 0 4
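To answer the itertools part of the question directly, grouping over enumerate(row) keeps the column positions available. A sketch, where runs_per_row is a hypothetical name and each tuple is (row, start, stop, length) with an inclusive stop:

```python
from itertools import groupby

def runs_per_row(rows):
    runs = []
    for r, row in enumerate(rows):
        # Group (column, value) pairs by value so each group remembers its columns
        for key, group in groupby(enumerate(row), key=lambda pair: pair[1]):
            if key:
                cols = [c for c, _ in group]
                runs.append((r, cols[0], cols[-1], len(cols)))
    return runs

stack = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
]
print(runs_per_row(stack))
# [(0, 0, 9, 10), (2, 1, 4, 4), (2, 7, 9, 3), (5, 0, 4, 5)]
```

The same function also accepts a 2-D NumPy array, since iterating over it yields its rows. For large arrays the vectorized method above is likely to be faster.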
