Duplicating specific elements in lists or Numpy arrays - python

I work with large data sets in my research.
I need to duplicate an element in a Numpy array. The code below achieves this, but is there a function in Numpy that performs the operation in a more efficient manner?
"""
Example output
>>> (executing file "example.py")
Choose a number between 1 and 10:
2
Choose number of repetitions:
9
Your output array is:
[1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>
"""
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = int(input('Choose the number you want to repeat (1-10):\n'))
repetitions = int(input('Choose number of repetitions:\n'))
output = []
for i in range(len(x)):
    if x[i] != y:
        output.append(x[i])
    else:
        for j in range(repetitions):
            output.append(x[i])
print('Your output array is:\n', output)

One approach is to find the index of the element to be repeated with np.searchsorted, which assumes that x is sorted and contains y. Use that index to slice the left and right sides of the array and insert the repeated values in between.
Thus, one solution would be -
import numpy as np
idx = np.searchsorted(x,y)
out = np.concatenate(( x[:idx], np.repeat(y, repetitions), x[idx+1:] ))
Let's consider a bit more generic sample case with x as -
x = [2, 4, 5, 6, 7, 8, 9, 10]
Let the number to be repeated be y = 5 and repetitions = 7.
Now, use the proposed code -
In [57]: idx = np.searchsorted(x,y)
In [58]: idx
Out[58]: 2
In [59]: np.concatenate(( x[:idx], np.repeat(y, repetitions), x[idx+1:] ))
Out[59]: array([ 2, 4, 5, 5, 5, 5, 5, 5, 5, 6, 7, 8, 9, 10])
For the specific case of x always being [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], we would have a more compact/elegant solution, like so -
np.r_[x[:y-1], [y]*repetitions, x[y:]]
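
When x is not sorted, or y may occur more than once, passing a per-element count array to np.repeat is another option. The following is a minimal sketch under that assumption; the helper name repeat_value is hypothetical and not part of NumPy.

```python
import numpy as np

def repeat_value(x, y, repetitions):
    """Repeat every occurrence of y in x `repetitions` times (hypothetical helper)."""
    x = np.asarray(x)
    # One count per element: `repetitions` where x == y, otherwise 1.
    counts = np.where(x == y, repetitions, 1)
    return np.repeat(x, counts)

out = repeat_value([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2, 9)
print(out.tolist())
```

Because the counts are computed from a comparison rather than a position, this does not depend on the ordering of x.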

There is the numpy.repeat function:
>>> np.repeat(3, 4)
array([3, 3, 3, 3])
>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, 2)
array([1, 1, 2, 2, 3, 3, 4, 4])
>>> np.repeat(x, 3, axis=1)
array([[1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4]])
>>> np.repeat(x, [1, 2], axis=0)
array([[1, 2],
       [3, 4],
       [3, 4]])

Related

Checking if a list contains the same element in a straight row

So, I have a list [5. 5. 5. 5. 0. 0.] and I want to check whether it contains the same element at least 4 times in a straight row.
I came up with this:
for i in list:
    w = []
    for x in range(len(i)-4):
        if (i[x] == i[x+1] == i[x+2] == i[x+3] != 0) :
            print(i[x])
            break
It gives me the desired output, but what would be an efficient way of achieving the same result without much looping?
In NumPy, it is worth finding the indices where the value changes and returning every such index that is followed by the next change at least 4 positions later.
x = np.array([6,6,6,6,7,7,8,8,4,4,4,4,4,4,4,3,3,3,3])
div_points = np.flatnonzero(np.diff(x, prepend=x[0]+1, append=x[-1]+1))
idx = np.r_[np.diff(div_points)>=4, False]
x[div_points[idx]]
>>> array([6, 4, 3])
And if you're quite lazy, you could just 'slide' all the comparisons:
view = np.lib.stride_tricks.sliding_window_view(x, 4)
view
array([[6, 6, 6, 6],
       [6, 6, 6, 7],
       [6, 6, 7, 7],
       [6, 7, 7, 8],
       [7, 7, 8, 8],
       [7, 8, 8, 4],
       [8, 8, 4, 4],
       [8, 4, 4, 4],
       [4, 4, 4, 4],
       [4, 4, 4, 4],
       [4, 4, 4, 4],
       [4, 4, 4, 4],
       [4, 4, 4, 3],
       [4, 4, 3, 3],
       [4, 3, 3, 3],
       [3, 3, 3, 3]])
r = np.all(x[:-3, None] == view, axis=1)
x[:-3][r]
>>> array([6, 4, 4, 4, 4, 3])
Here's a one-liner that will return the indexes of all items in the specified 1D array that are identical to the next N items. Note that it requires the array be of floats because it uses NaN.
import functools as ft
N = 3
# Start from an all-True mask so that every shifted copy, including the first, is compared with a
indexes = np.where(ft.reduce(lambda prev, cur: prev & (cur == a), [np.pad(a, (0,i), constant_values=np.nan)[i:] for i in range(1, N+1)], np.ones(a.shape, dtype=bool)))[0]
Example:
>>> import functools as ft
>>> a = np.array([1, 2, 3, 3, 3, 3, 4, 5, 10, 10, 10, 10, 10, 2, 4], dtype=float)
>>> N = 3
>>> indexes = np.where(ft.reduce(lambda prev, cur: prev & (cur == a), [np.pad(a, (0,i), constant_values=np.nan)[i:] for i in range(1, N+1)], np.ones(a.shape, dtype=bool)))[0]
>>> indexes
array([2, 8, 9])
Now, if we look at the array and the indexes:
[1, 2, 3, 3, 3, 3, 4, 5, 10, 10, 10, 10, 10, 2, 4]
       ^ 2               ^^ 8
                             ^^ 9
Try this:
lst = [3, 5, 5, 5, 5, 0, 0]
# len(lst) + 1 so that the last window of four elements is checked too
for i in range(4, len(lst) + 1):
    if len(set(lst[i-4:i])) == 1:
        print(lst[i-4])
If you want to solve this problem without any library, you can iterate over the list and count the consecutive elements:
a = [
    [5, 5, 5, 5, 0, 0],
    [1, 2, 3, 4, 5, 6],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 1, 2, 2, 2, 2],
    [1, 1, 1, 1]
]
def contains_n(n, l):
    c = 0      # length of the current run
    last = ""  # previous value seen
    for v in l:
        if v == last:
            c += 1
        else:
            c = 1
        last = v
        if c == n:
            return True
    return False
for v in a:
    print(contains_n(4, v))
The output will be:
True
False
False
True
True
True
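
The same run counting can also be sketched with itertools.groupby from the standard library, which groups consecutive equal items. The function name has_run is an assumption made for illustration, and like contains_n it does not exclude runs of zeros.

```python
from itertools import groupby

def has_run(values, n):
    """Return True if some value occurs at least n times in a row."""
    # groupby yields one group per run of consecutive equal items.
    return any(sum(1 for _ in group) >= n for _, group in groupby(values))

rows = [
    [5, 5, 5, 5, 0, 0],
    [1, 2, 3, 4, 5, 6],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],
]
results = [has_run(row, 4) for row in rows]
print(results)
```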

Python: How to split a list into ordered chunks

If I have the following list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Then
np.array_split([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
Returns
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
Is there a way to get the sub-arrays in the following order?
[array([0, 3, 6, 9]), array([1, 4, 7]), array([2, 5, 8])]
As the lists are of differing lengths, a numpy.ndarray isn't possible without a bit of fiddling, as all sub-arrays must be the same length.
However, if a simple list meets your requirement, you can use:
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = []
for i in range(3):
    l2.append(l[i::3])
Output:
[[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
Or more concisely, giving the same output:
[l[i::3] for i in range(3)]
Let's look at a refactored version of the np.array_split source code:
def array_split(arr, Nsections):
    Neach_section, extras = divmod(len(arr), Nsections)
    section_sizes = ([0] + extras * [Neach_section + 1] + (Nsections - extras) * [Neach_section])
    div_points = np.array(section_sizes).cumsum()
    sub_arrs = []
    for i in range(Nsections):
        st = div_points[i]
        end = div_points[i + 1]
        sub_arrs.append(arr[st:end])
    return sub_arrs
Taking into account your example arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and Nsections = 3 it will construct section sizes [0, 4, 3, 3] and dividing points [0, 4, 7, 10]. Then do something like this:
[arr[div_points[i]:div_points[i + 1]] for i in range(3)]
Trying to mimic the behaviour of numpy, indeed,
def array_split_withswap(arr, N):
    sub_arrs = []
    for i in range(N):
        sub_arrs.append(arr[i::N])
    return sub_arrs
is the best option to go with (as in S3DEV's solution).
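
As a quick check of the strided approach, the sketch below applies it to a NumPy array and compares the chunk lengths with those produced by np.array_split. The helper name interleaved_split is hypothetical.

```python
import numpy as np

def interleaved_split(arr, n):
    """Split arr into n chunks by taking every n-th element (hypothetical helper)."""
    arr = np.asarray(arr)
    # Basic slicing returns views, so no data is copied here.
    return [arr[i::n] for i in range(n)]

chunks = interleaved_split(range(10), 3)
print([c.tolist() for c in chunks])

# The chunk lengths match those of np.array_split; only the ordering of elements differs.
reference = np.array_split(np.arange(10), 3)
print([len(c) for c in chunks] == [len(r) for r in reference])
```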

Create histogram from two arrays

I have two numpy arrays with the same dimensions: weights, and percents. Percents is 'real' data, and the weights is how many of each 'real' data there is in the histogram.
Eg)
weights = [[0, 1, 1, 4, 2],
           [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5],
            [1, 2, 3, 4, 5]]
(every row of percents is the same)
I would like to "multiply" these together in such a way that I produce weights[x] * [percents[x]]:
results = [0 * [1] + 1 * [2] + 1 * [3] + 4 * [4] + 2 * [5],
           0 * [1] + 1 * [2] + 0 * [3] + 3 * [4] + 5 * [5]]
        = [[2, 3, 4, 4, 4, 4, 5, 5],
           [2, 4, 4, 4, 5, 5, 5, 5, 5]]
Notice that the lengths of the rows can differ. Ideally this can be done in numpy, but because of this it may end up being a list of lists.
Edit:
I've been able to cobble together these nested for loops but obviously it's not ideal:
list_of_hists = []
for index in df.index:
    hist = []
    # Create a list of lists, later to be flattened to 'results'
    for i, percent in enumerate(percents):
        hist.append(
            # For each percent, create a list of [percent] * weight
            [percent]
            * int(
                df.iloc[index].values[i]
            )
        )
    # flatten the list of lists in hist
    results = [val for list_ in hist for val in list_]
    list_of_hists.append(results)
np.repeat is designed for this kind of operation, but it cannot apply a different count to each element of a 2D array directly. So you need to work with flattened views of the arrays instead.
weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
>>> np.repeat(percents.ravel(), weights.ravel())
array([2, 3, 4, 4, 4, 4, 5, 5, 2, 4, 4, 4, 5, 5, 5, 5, 5])
And after that you need to select index locations where to split it:
>>> np.split(np.repeat(percents.ravel(), weights.ravel()), np.cumsum(np.sum(weights, axis=1)[:-1]))
[array([2, 3, 4, 4, 4, 4, 5, 5]), array([2, 4, 4, 4, 5, 5, 5, 5, 5])]
Note that np.split is a rather inefficient operation, as is building an array out of rows of unequal lengths.
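
If a list of arrays is acceptable anyway, one alternative is to call np.repeat once per row and skip the split step. This is only a sketch under that assumption, reusing the weights and percents arrays defined above; it loops over rows in Python, so it may not be faster when there are many rows.

```python
import numpy as np

weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])

# Each row is expanded independently, so rows may end up with different lengths.
rows = [np.repeat(p, w) for p, w in zip(percents, weights)]
print([r.tolist() for r in rows])
```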
You can use list-comprehension and reduce from functools:
import functools
res=[functools.reduce(lambda x,y: x+y,
[x*[y] for x, y in zip(w, p)])
for w, p in zip(weights, percents)]
OUTPUT:
[[2, 3, 4, 4, 4, 4, 5, 5],
[2, 4, 4, 4, 5, 5, 5, 5, 5]]
Or, just list-comprehension solution only:
res= [[j for i in [x*[y]
for x, y in zip(w, p)]
for j in i]
for w, p in zip(weights, percents)]
OUTPUT:
[[2, 3, 4, 4, 4, 4, 5, 5],
[2, 4, 4, 4, 5, 5, 5, 5, 5]]

Make a list of ranges in numpy

I want to make a list of integer sequences with random start points. The way I would do this in pure python is
x = np.zeros((1000, 10)) # 1000 sequences of 10 elements each
starts = np.random.randint(1, 1000, 1000)
for i in range(len(x)):
    x[i] = np.arange(starts[i], starts[i] + 10)
I wonder if there is a more elegant way of doing this using Numpy functionality.
You can use broadcasting after extending starts to a 2D version and adding in the 1D range array, like so -
x = starts[:,None] + np.arange(10)
Explanation
Let's take a small example for starts to see what that broadcasting does in this case.
In [382]: starts
Out[382]: array([3, 1, 3, 2])
In [383]: starts.shape
Out[383]: (4,)
In [384]: starts[:,None]
Out[384]:
array([[3],
       [1],
       [3],
       [2]])
In [385]: starts[:,None].shape
Out[385]: (4, 1)
In [386]: np.arange(10).shape
Out[386]: (10,)
Thus, looking at the shapes and putting those together, a schematic diagram of the same would look something like this -
starts          :      4
np.arange(10)   :     10
After extending starts :
starts[:,None]  :  4 x 1
np.arange(10)   :     10
Thus, when we add starts[:,None] to np.arange(10), the elems of starts[:,None] are broadcast along its second axis 10 times, corresponding to the length of the other array along that axis. np.arange(10) is treated as 2D with a singleton first dim, and its elems are broadcast along that dim 4 times, corresponding to the length of starts[:,None] along that axis. Please note that there are no explicit replications; under the hood the elems are broadcast and added on the fly.
Thus, functionally we would have the replications, like so -
In [391]: np.repeat(starts[:,None],10,axis=1)
Out[391]:
array([[3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])
In [392]: np.repeat(np.arange(10)[None],4,axis=0)
Out[392]:
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
These broadcasted elems are then added to give us the desired output x.
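
To confirm that the broadcast sum matches the loop from the question, the two can be compared directly. This is a small sketch; the seeded generator is an assumption added only to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
starts = rng.integers(1, 1000, size=1000)

# Loop version from the question (with the shape passed as a tuple).
x_loop = np.zeros((1000, 10), dtype=starts.dtype)
for i in range(len(x_loop)):
    x_loop[i] = np.arange(starts[i], starts[i] + 10)

# Broadcasting version: shapes (1000, 1) + (10,) give (1000, 10).
x_bcast = starts[:, None] + np.arange(10)

print(x_bcast.shape, np.array_equal(x_loop, x_bcast))
```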

Python - Read nth line in a matrix

What is the simplest way to read the nth element of every row in a matrix?
I thought this would be possible with a simple for loop but so far I haven't had any luck.
The best I can do so far is using a count which is not exactly elegant:
matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
count = 0
for n in matrix:
    print(matrix[count][nth])
    count += 1
For example:
Read the 0th number of every row: 1, 2, 1.
Read the 4th number of every row: 6, 2, 8.
If you need to do this operation a lot, you could transpose your matrix using zip(*matrix) (wrapped in list() on Python 3, where zip returns an iterator):
>>> matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
>>> matrix_t = list(zip(*matrix))
>>> matrix_t
[(1, 2, 1), (3, 6, 6), (5, 1, 2), (2, 6, 6), (6, 2, 8), (2, 5, 2), (4, 7, 6)]
>>> matrix_t[0]
(1, 2, 1)
>>> matrix_t[3]
(2, 6, 6)
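
Note that zip stops at the shortest row, so the eighth element of the first row is dropped in the transposed result above. If ragged rows need to be kept, itertools.zip_longest can pad the missing entries. A sketch, assuming None is an acceptable fill value:

```python
from itertools import zip_longest

matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]

# zip truncates to the shortest row (7 columns here).
truncated = list(zip(*matrix))
# zip_longest pads shorter rows with fillvalue instead (8 columns here).
padded = list(zip_longest(*matrix, fillvalue=None))

print(len(truncated), len(padded))
print(padded[7])
```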
Here's something that will handle rows of different lengths (as in your example), as well as supporting Python's special interpretation of negative indexes as relative to the end of the sequence (by changing them into len(s) + n):
NULL = type('NULL', (object,), {'__repr__': lambda self: '<NULL>'})()
def nth_elems(n):
    # n is a valid index for a row when -len(row) <= n < len(row)
    return [row[n] if -len(row) <= n < len(row) else NULL for row in matrix]
matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
print(nth_elems(0)) # [1, 2, 1]
print(nth_elems(6)) # [4, 7, 6]
print(nth_elems(7)) # [1, <NULL>, <NULL>]
print(nth_elems(-1)) # [1, 7, 6]
Maybe this way?
column = [row[0] for row in matrix]
(for the 0th element)
In [1]: matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
In [2]: nth=0
In [3]: [row[nth] for row in matrix]
Out[3]: [1, 2, 1]
In [4]: nth=4
In [5]: [row[nth] for row in matrix]
Out[5]: [6, 2, 8]
Here is a solution using list comprehension:
[x[0] for x in matrix]
Which is basically equal to:
for x in matrix:
    print(x[0])
You can also make it a function:
def getColumn(lst, col):
    return [i[col] for i in lst]
Demo:
>>> matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
>>> def getColumn(lst, col):
...     return [i[col] for i in lst]
...
>>> getColumn(matrix, 0)
[1, 2, 1]
>>> getColumn(matrix, 5)
[2, 5, 2]
Hope this helps!
List comprehensions will work well here:
>>> matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
>>> # Get all the 0th indexes
>>> a = [item[0] for item in matrix]
>>> a
[1, 2, 1]
>>> # Get all the 4th indexes
>>> b = [item[4] for item in matrix]
>>> b
[6, 2, 8]
>>>
Your for loop is likely not doing what you expect. n is not an integer. It is the current row.
I think what you wanted to do was:
for row in matrix:
    print(row[0], row[4])
This prints,
1 6
2 2
1 8
Also, strictly speaking, matrix is a list of lists. To really have a matrix you might need to use numpy.
Lists in Python are not intended to be used like this. Using list comprehension may cause both memory and CPU issues if the data is sufficiently big. Consider using numpy if this is an issue.
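
For reference, a minimal sketch of what the NumPy route could look like. NumPy needs rectangular data, so trimming every row to the shortest length is an assumption made here for illustration; padding would be another option.

```python
import numpy as np

matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]

# Trim every row to the shortest length so the data is rectangular (an assumption).
width = min(len(row) for row in matrix)
arr = np.array([row[:width] for row in matrix])

# arr[:, n] selects the nth element of every row without a Python loop.
print(arr[:, 0].tolist())
print(arr[:, 4].tolist())
```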
Use zip:
>>> matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
>>> list(zip(*matrix))[0]
(1, 2, 1)
>>> list(zip(*matrix))[4]
(6, 2, 8)
