Create histogram from two arrays


I have two numpy arrays with the same dimensions: weights and percents. percents holds the 'real' data, and weights holds how many times each 'real' value occurs in the histogram.
For example:
weights = [[0, 1, 1, 4, 2],
           [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5],
            [1, 2, 3, 4, 5]]
(every row of percents is the same)
I would like to "multiply" these together so that each row becomes the concatenation of weights[x] * [percents[x]]:
results = [0 * [1] + 1 * [2] + 1 * [3] + 4 * [4] + 2 * [5],
           0 * [1] + 1 * [2] + 0 * [3] + 3 * [4] + 5 * [5]]
        = [[2, 3, 4, 4, 4, 4, 5, 5],
           [2, 4, 4, 4, 5, 5, 5, 5, 5]]
Notice that the rows can have different lengths. Ideally this can be done in numpy, but because of the ragged rows it may end up being a list of lists.
Edit:
I've been able to cobble together these nested for loops but obviously it's not ideal:
list_of_hists = []
for index in df.index:
    hist = []
    # Create a list of lists, later to be flattened to 'results'
    for i, percent in enumerate(percents):
        hist.append(
            # For each percent, create a list of [percent] * weight
            [percent]
            * int(
                df.iloc[index].values[i]
            )
        )
    # flatten the list of lists in hist
    results = [val for list_ in hist for val in list_]
    list_of_hists.append(results)

There is np.repeat, which is designed for this kind of operation, but it doesn't handle the 2D case directly, so you need to work with flattened views of the arrays instead.
weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
>>> np.repeat(percents.ravel(), weights.ravel())
array([2, 3, 4, 4, 4, 4, 5, 5, 2, 4, 4, 4, 5, 5, 5, 5, 5])
After that, you need to pick the index locations at which to split the result:
>>> np.split(np.repeat(percents.ravel(), weights.ravel()), np.cumsum(np.sum(weights, axis=1)[:-1]))
[array([2, 3, 4, 4, 4, 4, 5, 5]), array([2, 4, 4, 4, 5, 5, 5, 5, 5])]
Note that np.split is a fairly inefficient operation, as is trying to build an array out of rows of unequal lengths.
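The two steps above (repeat on the flattened arrays, then split at the cumulative row totals) can be wrapped into one helper. This is only a sketch: the name expand_rows is made up for illustration, and it assumes both inputs have the same 2D shape and that the counts are non-negative integers.

```python
import numpy as np

def expand_rows(values, counts):
    """Repeat values[i, j] counts[i, j] times; return one 1-D array per row."""
    values = np.asarray(values)
    counts = np.asarray(counts)
    # Work on flattened views so np.repeat gets one count per element
    flat = np.repeat(values.ravel(), counts.ravel())
    # Row i contributes counts[i].sum() items; the cumulative sums
    # (without the last one) are the positions at which to cut
    cut_points = np.cumsum(counts.sum(axis=1))[:-1]
    return np.split(flat, cut_points)

weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
rows = expand_rows(percents, weights)
print([r.tolist() for r in rows])
# [[2, 3, 4, 4, 4, 4, 5, 5], [2, 4, 4, 4, 5, 5, 5, 5, 5]]
```

The result is a plain list of arrays, since rows of unequal length cannot form a regular 2D array.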

You can use a list comprehension together with reduce from functools:
import functools
res = [functools.reduce(lambda x, y: x + y,
                        [x * [y] for x, y in zip(w, p)])
       for w, p in zip(weights, percents)]
OUTPUT:
[[2, 3, 4, 4, 4, 4, 5, 5],
[2, 4, 4, 4, 5, 5, 5, 5, 5]]
Or, using only list comprehensions:
res = [[j for i in [x * [y]
                    for x, y in zip(w, p)]
        for j in i]
       for w, p in zip(weights, percents)]
OUTPUT:
[[2, 3, 4, 4, 4, 4, 5, 5],
[2, 4, 4, 4, 5, 5, 5, 5, 5]]
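A variant of the same idea that avoids building the intermediate x*[y] lists is to chain lazy repeats from itertools. This is a sketch rather than a benchmarked improvement; expand_row is a name chosen here for illustration, and plain nested lists stand in for the arrays.

```python
from itertools import chain, repeat

def expand_row(values, counts):
    """Return a flat list with each value repeated by its matching count."""
    # repeat(v, c) lazily yields v exactly c times; chain joins the pieces
    return list(chain.from_iterable(repeat(v, int(c)) for v, c in zip(values, counts)))

weights = [[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
res = [expand_row(p, w) for w, p in zip(weights, percents)]
print(res)
# [[2, 3, 4, 4, 4, 4, 5, 5], [2, 4, 4, 4, 5, 5, 5, 5, 5]]
```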

Related

Shuffling two 2D tensors in PyTorch and maintaining same order correlation

Is it possible to shuffle two 2D tensors in PyTorch by their rows, but maintain the same order for both? I know you can shuffle a 2D tensor by rows with the following code:
a=a[torch.randperm(a.size()[0])]
To elaborate:
If I had 2 tensors
a = torch.tensor([[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3]])
b = torch.tensor([[4, 4, 4, 4, 4],
[5, 5, 5, 5, 5],
[6, 6, 6, 6, 6]])
And ran them through some function/block of code to shuffle randomly but maintain correlation and produce something like the following
a = torch.tensor([[2, 2, 2, 2, 2],
[1, 1, 1, 1, 1],
[3, 3, 3, 3, 3]])
b = torch.tensor([[5, 5, 5, 5, 5],
[4, 4, 4, 4, 4],
[6, 6, 6, 6, 6]])
My current solution is converting to a list, using the random.shuffle() function like below.
a_list = a.tolist()
b_list = b.tolist()
temp_list = list(zip(a_list , b_list ))
random.shuffle(temp_list) # Shuffle
a_temp, b_temp = zip(*temp_list)
a_list, b_list = list(a_temp), list(b_temp)
# Convert back to tensors
a = torch.tensor(a_list)
b = torch.tensor(b_list)
This takes quite a while, and I was wondering if there is a better way.

Do you mean indexing both tensors with the same random permutation?
indices = torch.randperm(a.size()[0])
a = a[indices]
b = b[indices]
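The point of this answer is that one permutation of row indices is drawn once and applied to both containers. The sketch below shows the same principle on plain Python lists so it runs without PyTorch; shuffle_together and its seed parameter are hypothetical names used only for illustration.

```python
import random

def shuffle_together(a_rows, b_rows, seed=None):
    """Reorder two equal-length lists with one shared random permutation."""
    if len(a_rows) != len(b_rows):
        raise ValueError("both inputs need the same number of rows")
    rng = random.Random(seed)
    indices = list(range(len(a_rows)))
    rng.shuffle(indices)  # one permutation, reused for both lists
    return [a_rows[i] for i in indices], [b_rows[i] for i in indices]

a = [[1] * 5, [2] * 5, [3] * 5]
b = [[4] * 5, [5] * 5, [6] * 5]
a_shuf, b_shuf = shuffle_together(a, b)
# Rows that were paired before the shuffle are still paired afterwards
for row_a, row_b in zip(a_shuf, b_shuf):
    print(row_a, row_b)
```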

checking if list contains same element in straight row

I have a list [5. 5. 5. 5. 0. 0.] and I want to check whether it contains the same element at least 4 times in a row.
I came up with this:
for i in list:
    w = []
    for x in range(len(i)-3):
        if (i[x] == i[x+1] == i[x+2] == i[x+3] != 0) :
            print(i[x])
            break
It gives me the desired output,
but what would be an efficient way to achieve the same result without so much looping?

In numpy, a good approach is to find the indices where the value changes and keep every change point whose next change point is at least 4 positions later.
x = np.array([6,6,6,6,7,7,8,8,4,4,4,4,4,4,4,3,3,3,3])
div_points = np.flatnonzero(np.diff(x, prepend=x[0]+1, append=x[-1]+1))
idx = np.r_[np.diff(div_points)>=4, False]
x[div_points[idx]]
>>> array([6, 4, 3])
And if you're quite lazy, you could just 'slide' all the comparisons:
view = np.lib.stride_tricks.sliding_window_view(x, 4)
view
array([[6, 6, 6, 6],
[6, 6, 6, 7],
[6, 6, 7, 7],
[6, 7, 7, 8],
[7, 7, 8, 8],
[7, 8, 8, 4],
[8, 8, 4, 4],
[8, 4, 4, 4],
[4, 4, 4, 4],
[4, 4, 4, 4],
[4, 4, 4, 4],
[4, 4, 4, 4],
[4, 4, 4, 3],
[4, 4, 3, 3],
[4, 3, 3, 3],
[3, 3, 3, 3]])
r = np.all(x[:-3, None] == view, axis=1)
x[:-3][r]
>>> array([6, 4, 4, 4, 4, 3])

Here's a one-liner that will return the indexes of all items in the specified 1D array that are identical to the next N items. Note that it requires the array be of floats because it uses NaN.
import functools as ft
import numpy as np
N = 3
indexes = np.where(ft.reduce(lambda prev, cur: prev & (cur == a), [np.pad(a, (0,i), constant_values=np.nan)[i:] for i in range(1, N+1)], np.ones(a.shape, dtype=bool)))[0]
Example:
>>> import functools as ft
>>> import numpy as np
>>> a = np.array([1, 2, 3, 3, 3, 3, 4, 5, 10, 10, 10, 10, 10, 2, 4], dtype=float)
>>> N = 3
>>> indexes = np.where(ft.reduce(lambda prev, cur: prev & (cur == a), [np.pad(a, (0,i), constant_values=np.nan)[i:] for i in range(1, N+1)], np.ones(a.shape, dtype=bool)))[0]
>>> indexes
array([2, 8, 9])
Now, if we look at the array and the indexes:
[1, 2, 3, 3, 3, 3, 4, 5, 10, 10, 10, 10, 10, 2, 4]
       ^ 2               ^^ 8
                             ^^ 9

Try this:
lst = [3, 5, 5, 5, 5, 0, 0]
for i in range(4, len(lst) + 1):
    if len(set(lst[i-4:i])) == 1:
        print(lst[i-4])

If you want to solve this problem without any library, you can iterate over the list and count consecutive equal elements:
a = [
    [5, 5, 5, 5, 0, 0],
    [1, 2, 3, 4, 5, 6],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],
    [0, 1, 2, 2, 2, 2],
    [1, 1, 1, 1]
]
def contains_n(n, l):
    c = 0
    last = ""
    for v in l:
        if v == last:
            c += 1
        else:
            c = 1
            last = v
        if c == n:
            return True
    return False
for v in a:
    print(contains_n(4, v))
The output will be:
True
False
False
True
True
True
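The counting loop above can also be expressed with itertools.groupby, which groups consecutive equal items. The following is a sketch under that approach; runs_at_least is an illustrative name, and unlike the code in the question it does not skip runs of zeros.

```python
from itertools import groupby

def runs_at_least(seq, n):
    """Return (value, run_length) for every run of equal neighbours of length >= n."""
    result = []
    for value, group in groupby(seq):
        length = sum(1 for _ in group)  # size of this run of equal items
        if length >= n:
            result.append((value, length))
    return result

print(runs_at_least([6, 6, 6, 6, 7, 7, 8, 8, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3], 4))
# [(6, 4), (4, 7), (3, 4)]
print(runs_at_least([5, 5, 5, 5, 0, 0], 4))
# [(5, 4)]
```

A caller that only needs a yes/no answer can test whether the returned list is empty.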

Python: How to split a list into ordered chunks

If I have the following list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Then
np.array_split([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
Returns
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
Is there a way to get the sub-arrays in the following order?
[array([0, 3, 6, 9]), array([1, 4, 7]), array([2, 5, 8])]

As the lists are of differing lengths, a numpy.ndarray isn't possible without a bit of fiddling, as all sub-arrays must be the same length.
However, if a simple list meets your requirement, you can use:
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = []
for i in range(3):
    l2.append(l[i::3])
Output:
[[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
Or more concisely, giving the same output:
[l[i::3] for i in range(3)]
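The slicing trick generalises to any number of chunks. A small sketch, with strided_chunks as a name invented for this example:

```python
def strided_chunks(seq, n):
    """Split seq into n lists by dealing items out round-robin."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Chunk i takes items i, i + n, i + 2n, ...
    return [list(seq[i::n]) for i in range(n)]

print(strided_chunks([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(strided_chunks("abcde", 2))
# [['a', 'c', 'e'], ['b', 'd']]
```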

Let's look at a refactored version of the np.array_split source code:
import numpy as np
def array_split(arr, Nsections):
    Neach_section, extras = divmod(len(arr), Nsections)
    section_sizes = ([0] + extras * [Neach_section + 1] + (Nsections - extras) * [Neach_section])
    div_points = np.array(section_sizes).cumsum()
    sub_arrs = []
    for i in range(Nsections):
        st = div_points[i]
        end = div_points[i + 1]
        sub_arrs.append(arr[st:end])
    return sub_arrs
For your example, arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and Nsections = 3, it constructs section sizes [0, 4, 3, 3] and dividing points [0, 4, 7, 10], and then does something like this:
[arr[div_points[i]:div_points[i + 1]] for i in range(3)]
To mimic that behaviour of numpy while taking every N-th element instead,
def array_split_withswap(arr, N):
    sub_arrs = []
    for i in range(N):
        sub_arrs.append(arr[i::N])
    return sub_arrs
is indeed the best option (as in @S3DEV's solution).
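One way to check that the strided version really mirrors np.array_split is to compare chunk lengths. The sketch below recomputes the section sizes with the same divmod logic as the refactored function and compares them with the lengths of the arr[i::N] slices; both helper names are made up for this check.

```python
def array_split_sizes(length, n_sections):
    """Chunk lengths produced by the divmod logic shown above."""
    each, extras = divmod(length, n_sections)
    return extras * [each + 1] + (n_sections - extras) * [each]

def strided_sizes(length, n_sections):
    """Chunk lengths produced by slicing with arr[i::n_sections]."""
    return [len(range(i, length, n_sections)) for i in range(n_sections)]

print(array_split_sizes(10, 3))  # [4, 3, 3]
print(strided_sizes(10, 3))      # [4, 3, 3]

# The two agree for every small case tried here
for length in range(25):
    for n in range(1, 8):
        assert array_split_sizes(length, n) == strided_sizes(length, n)
```

So the two approaches differ only in which elements land in each chunk, not in how large the chunks are.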

Duplicating specific elements in lists or Numpy arrays

I work with large data sets in my research.
I need to duplicate an element in a Numpy array. The code below achieves this, but is there a function in Numpy that performs the operation in a more efficient manner?
"""
Example output
>>> (executing file "example.py")
Choose a number between 1 and 10:
2
Choose number of repetitions:
9
Your output array is:
[1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>
"""
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = int(input('Choose the number you want to repeat (1-10):\n'))
repetitions = int(input('Choose number of repetitions:\n'))
output = []
for i in range(len(x)):
    if x[i] != y:
        output.append(x[i])
    else:
        for j in range(repetitions):
            output.append(x[i])
print('Your output array is:\n', output)

One approach would be to find the index of the element to be repeated with np.searchsorted (this assumes x is sorted and contains y). Use that index to slice the left and right sides of the array and insert the repeated array in between.
Thus, one solution would be -
idx = np.searchsorted(x,y)
out = np.concatenate(( x[:idx], np.repeat(y, repetitions), x[idx+1:] ))
Let's consider a slightly more generic sample case, with x as -
x = [2, 4, 5, 6, 7, 8, 9, 10]
Let the number to be repeated be y = 5, with repetitions = 7.
Now, use the proposed code -
In [57]: idx = np.searchsorted(x,y)
In [58]: idx
Out[58]: 2
In [59]: np.concatenate(( x[:idx], np.repeat(y, repetitions), x[idx+1:] ))
Out[59]: array([ 2, 4, 5, 5, 5, 5, 5, 5, 5, 6, 7, 8, 9, 10])
For the specific case of x always being [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], we would have a more compact/elegant solution, like so -
np.r_[x[:y-1], [y]*repetitions, x[y:]]
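Both variants above rely on x being sorted. For a plain, possibly unsorted list, the same slice-and-insert idea can be sketched with list.index instead; duplicate_value is a hypothetical helper name, and it only expands the first occurrence of the value.

```python
def duplicate_value(items, value, repetitions):
    """Return a copy of items in which the first `value` appears `repetitions` times."""
    idx = items.index(value)  # raises ValueError if value is absent
    # Left part, the repeated block, then everything after the original element
    return items[:idx] + [value] * repetitions + items[idx + 1:]

x = [7, 2, 9, 4]  # deliberately unsorted
print(duplicate_value(x, 9, 3))
# [7, 2, 9, 9, 9, 4]
```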

There is the numpy.repeat function:
>>> np.repeat(3, 4)
array([3, 3, 3, 3])
>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, 2)
array([1, 1, 2, 2, 3, 3, 4, 4])
>>> np.repeat(x, 3, axis=1)
array([[1, 1, 1, 2, 2, 2],
[3, 3, 3, 4, 4, 4]])
>>> np.repeat(x, [1, 2], axis=0)
array([[1, 2],
[3, 4],
[3, 4]])
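Since np.repeat accepts one repeat count per element, the question's task can be sketched by building a count array that is 1 everywhere except where the element equals the chosen value. The variable names follow the question; the values of y and repetitions are the ones from its example output.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = 2            # value to duplicate
repetitions = 9  # how many copies of y the output should contain

# One repeat count per element: `repetitions` where x == y, otherwise 1
counts = np.where(x == y, repetitions, 1)
out = np.repeat(x, counts)
print(out.tolist())
# [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

This does not need x to be sorted, and it expands every occurrence of y if there are several.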

Python - Read nth line in a matrix

What is the simplest way I can read the nth element of every row in a matrix?
I thought this would be possible with a simple for loop but so far I haven't had any luck.
The best I can do so far is using a count which is not exactly elegant:
matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
count = 0
for n in matrix:
    print(matrix[count][nth])
    count += 1
For example:
Read the 0th number of every row: 1, 2, 1.
Read the 4th number of every row: 6, 2, 8.

If you need to do this operation a lot, you could transpose your matrix using zip(*matrix):
>>> matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
>>> matrix_t = list(zip(*matrix))
>>> matrix_t
[(1, 2, 1), (3, 6, 6), (5, 1, 2), (2, 6, 6), (6, 2, 8), (2, 5, 2), (4, 7, 6)]
>>> matrix_t[0]
(1, 2, 1)
>>> matrix_t[3]
(2, 6, 6)
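One thing to keep in mind with this transpose is that zip stops at the shortest row, so the extra element of the first row disappears. If the ragged columns matter, itertools.zip_longest pads the missing cells instead. A short sketch, using None as an arbitrary fill value:

```python
from itertools import zip_longest

matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]

# zip stops at the shortest row, so the 8th column of the first row is dropped
short = list(zip(*matrix))
# zip_longest keeps every column and pads missing cells with fillvalue
full = list(zip_longest(*matrix, fillvalue=None))

print(len(short), len(full))  # 7 8
print(full[4])                # (6, 2, 8)
print(full[7])                # (1, None, None)
```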

Here's something that will handle rows of different lengths (as in your example), as well as supporting Python's special interpretation of negative indexes as relative to the end of the sequence (by changing them into len(s) + n):
NULL = type('NULL', (object,), {'__repr__': lambda self: '<NULL>'})()
def nth_elems(n):
    # An index n is valid for a row when -len(row) <= n < len(row)
    return [row[n] if -len(row) <= n < len(row) else NULL for row in matrix]
matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
print(nth_elems(0))   # [1, 2, 1]
print(nth_elems(6))   # [4, 7, 6]
print(nth_elems(7))   # [1, <NULL>, <NULL>]
print(nth_elems(-1))  # [1, 7, 6]

Maybe this way?
column = [row[0] for row in matrix]
(for the 0th element)

In [1]: matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
In [2]: nth=0
In [3]: [row[nth] for row in matrix]
Out[3]: [1, 2, 1]
In [4]: nth=4
In [5]: [row[nth] for row in matrix]
Out[5]: [6, 2, 8]

Here is a solution using a list comprehension:
[x[0] for x in matrix]
It collects the same values that this loop prints:
for x in matrix:
    print(x[0])
You can also make it a function:
def getColumn(lst, col):
    return [i[col] for i in lst]
Demo:
>>> matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
>>> def getColumn(lst, col):
...     return [i[col] for i in lst]
...
>>> getColumn(matrix, 0)
[1, 2, 1]
>>> getColumn(matrix, 5)
[2, 5, 2]
Hope this helps!

List comprehensions will work well here:
>>> matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
>>> # Get all the 0th indexes
>>> a = [item[0] for item in matrix]
>>> a
[1, 2, 1]
>>> # Get all the 4th indexes
>>> b = [item[4] for item in matrix]
>>> b
[6, 2, 8]
>>>

Your for loop is likely not doing what you expect. n is not an integer. It is the current row.
I think what you wanted to do was:
for row in matrix:
    print(row[0], row[4])
This prints,
1 6
2 2
1 8
Also, strictly speaking, matrix is a list of lists. To really have a matrix you might need to use numpy.

Lists in Python are not intended to be used like this. Using list comprehension may cause both memory and CPU issues if the data is sufficiently big. Consider using numpy if this is an issue.

Use zip:
>>> matrix = [[1, 3, 5, 2, 6, 2, 4, 1], [2, 6, 1, 6, 2, 5, 7], [1, 6, 2, 6, 8, 2, 6]]
>>> list(zip(*matrix))[0]
(1, 2, 1)
>>> list(zip(*matrix))[4]
(6, 2, 8)
