I want to create a new list that is the sum of the columns of the previous lists.
I have a lot of different lists and I would like to sum up all of the different lists in the most efficient way possible. Below is an example of the issue I am trying to solve:
list[0] = [2,4,1,6,7]
list[1] = [3,1,2,11,0]
list[2] = [2,4,2,2,1]
...
list[999] = [4,2,5,6,7]
The newlist would then look something like this:
Newlist = [1340,1525,675,1825,895]
What would be the best way to create the new list?
What do you think of this solution:
import numpy as np
a = list()
a.append([2, 4, 1, 6, 7]) # a[0]
a.append([3, 1, 2, 11, 0]) # a[1]
a.append([2, 4, 2, 2, 1]) # a[2]
# 1st solution
rslt_1 = a[0]
for i in range(1, len(a)):
    rslt_1 = np.add(rslt_1, a[i])
# 2nd solution
rslt_2 = np.sum(a, axis=0)
print("Rslt_1:", rslt_1)
print("Rslt_2:", rslt_2)
Returns:
Rslt_1: [ 7 9 5 19 8]
Rslt_2: [ 7 9 5 19 8]
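If NumPy is not available, the same column sums can be sketched in plain Python with zip. This assumes every list has the same length; the name rows is only illustrative and stands in for the lists from the question.

```python
# Three sample rows standing in for the full set of lists
rows = [
    [2, 4, 1, 6, 7],
    [3, 1, 2, 11, 0],
    [2, 4, 2, 2, 1],
]

# zip(*rows) groups the values by column; sum() adds up each group
column_sums = [sum(column) for column in zip(*rows)]
print(column_sums)  # [7, 9, 5, 19, 8]
```

For a thousand long lists, np.sum(a, axis=0) is likely the faster option, but that is worth measuring on the real data.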
I have a multi-dimensional array of scores, and I need to get the sum of each column at the third level in Python. I am using NumPy to achieve this.
import numpy as np
Data is something like:
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
This should return:
[[3 8 8] [5 3 8]]
This works correctly with the following:
np_array = np.array(score_list)
sum_array = np_array.sum(axis=0)
print(sum_array)
However, if I have an irregular shape like this:
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
I expect it to return:
[[3 8] [5 3 8]]
However, it raises a warning and the return value is:
[list([1, 1, 2, 7]) list([1, 2, 5, 4, 1, 3])]
How can I get the expected result?
NumPy will try to cast the ragged list into an n-dimensional array, which fails. Instead, consider passing each group of sublists individually using zip.
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
import numpy as np
res = [np.sum(x,axis=0) for x in zip(*score_list)]
print(res)
[array([3, 8]), array([5, 3, 8])]
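The same idea can also be sketched without NumPy by applying zip twice. This assumes that the sublists sharing a position (for example [1,1] and [2,7]) have equal lengths; the name result is illustrative.

```python
score_list = [
    [[1, 1], [1, 2, 5]],
    [[2, 7], [4, 1, 3]],
]

# Outer zip pairs up the sublists that share a position;
# inner zip pairs up the values that share an index within them
result = [
    [sum(values) for values in zip(*group)]
    for group in zip(*score_list)
]
print(result)  # [[3, 8], [5, 3, 8]]
```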
Here is one solution. Keep in mind that it doesn't use NumPy and will be very inefficient for larger matrices, although it runs just fine for smaller ones. It also accumulates the sums into the first row of score_list in place.
# Create matrix
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
# Get each row
for i in range(1, len(score_list)):
    # Get each list within the row
    for j in range(len(score_list[i])):
        # Get each value in each list
        for k in range(len(score_list[i][j])):
            # Add current value to the same index
            # on the first row
            score_list[0][j][k] += score_list[i][j][k]
print(score_list[0])
There is bound to be a better solution but this is a temporary fix for you :)
Edit. Made more efficient
A possible solution:
a = np.vstack([np.array(score_list[x], dtype='object')
               for x in range(len(score_list))])
[np.add(*[x for x in a[:, i]]) for i in range(a.shape[1])]
Another possible solution:
a = sum(score_list, [])
b = [a[x] for x in range(0,len(a),2)]
c = [a[x] for x in range(1,len(a),2)]
[np.add(x[0], x[1]) for x in [b, c]]
Output:
[array([3, 8]), array([5, 3, 8])]
I am working in a colab with some dataframes and I have two numpy arrays:
-First one indicates the index of a row.
-The other one indicates the number of repetitions (I did some methods before all this).
If I print both arrays I get something like this:
print(uniqueValues, occurCount)
OUTPUT: [ 13 33 66 ... 99907 99911 99928] [7 1 6 ... 1 6 4]
We can interpret it as: 13 is repeated 7 times, 33 is repeated 1 time, and so on.
Now the question:
How can I remove an index and its repetition count from both arrays, based on the number of repetitions?
Example:
if the count is < 5, remove the element
Expected output: [ 13 66 ... 99911] [7 6 ... 6]
You can use the matching values from occurCount as a filter on uniqueValues and occurCount using boolean indexing:
uniqueValues = uniqueValues[occurCount >= 5]
occurCount = occurCount[occurCount >= 5]
For example:
import numpy as np
uniqueValues = np.array([13, 33, 66, 99907, 99911, 99928])
occurCount = np.array([7, 1, 6, 1, 6, 4])
uniqueValues = uniqueValues[occurCount >= 5]
occurCount = occurCount[occurCount >= 5]
print(uniqueValues)
print(occurCount)
Output:
[ 13 66 99911]
[7 6 6]
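One detail worth noting: the two assignments above only work in that order, because filtering occurCount first would change the mask used for uniqueValues. A small sketch that avoids the ordering issue by computing the mask once (the name keep is illustrative):

```python
import numpy as np

uniqueValues = np.array([13, 33, 66, 99907, 99911, 99928])
occurCount = np.array([7, 1, 6, 1, 6, 4])

# Compute the boolean mask once, then apply it to both arrays
keep = occurCount >= 5
uniqueValues = uniqueValues[keep]
occurCount = occurCount[keep]

print(uniqueValues)  # [   13    66 99911]
print(occurCount)    # [7 6 6]
```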
uniqueValues = np.array([13, 33, 66, 99907, 99911, 99928])
occurCount = np.array([7, 1, 6, 1, 6, 4])
np.array([uniqueValues, occurCount])[:, occurCount >= 5]
This will return a two-dimensional array with your results, but the logic is the same as pointed out by Nick.
Build a list of the indexes whose occurCount value meets the criterion of < 5. Then use these indexes to delete the corresponding values from both arrays. The result needs to be assigned back to the variables because np.delete returns a new array and does not modify the original in place.
import numpy as np
uniqueValues = np.array([13, 33, 66, 99907, 99911, 99928])
occurCount = np.array([7, 1, 6, 1, 6, 4])
# Collect the positions of the counts that are below the threshold
indexes = []
for index, item in enumerate(occurCount):
    if item < 5:
        indexes.append(index)
# Drop those positions from both arrays
occurCount = np.delete(occurCount, indexes)
uniqueValues = np.delete(uniqueValues, indexes)
print(uniqueValues, occurCount)
I am pretty new to Python and have yet to get a good handle on it.
I am dealing with huge amounts of data in arrays and matrices, so I need some help parallelizing the loops.
Here's my exact problem:
#Program Block starts
A= [1, 2, 3, 4, 5]
B= [2, 5, 7, 9, 15]
# I have a 3*3 matrix
C = [[0 for x in range(3)] for x in range(3)]
C[0][:]=[2,5,7]
C[1][:]=[7,9,15]
C[2][:]=[2,9,15]
# C is composed of elements of the array B, and I want to replace each element with the corresponding element of A, i.e. change 2 to 1, 5 to 2, and so on.
C_new = []
for el in range(0, 3):
    n = C[el][:]
    n_new = []
    for i in range(0, 3):
        for j in range(0, 5):
            if n[i] == B[j]:
                # j + 1 equals A[j] for this particular A
                n_new.append(j + 1)
    C_new.append(n_new)
#Program Block Ends
I will obtain an output of
C_new =[ 1 2 3; 3 4 5; 1 4 5]
My original sizes are as follows:
A and B have 600000 elements each
C is 4000000*4
So I would like to parallelize with respect to the rows of C, i.e. break the 4000000 rows down into parts.
For something like this, I recommend taking a look at NumPy. It was built to work with large arrays such as this.
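As a rough sketch of what that could look like, the lookup can be vectorized with np.argsort and np.searchsorted instead of nested loops. This assumes that the values in B are unique and that every element of C occurs in B; the names order, B_sorted, A_sorted and positions are illustrative.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])
B = np.array([2, 5, 7, 9, 15])
C = np.array([[2, 5, 7],
              [7, 9, 15],
              [2, 9, 15]])

# Sort B once (keeping A aligned with it) so binary search can be used
order = np.argsort(B)
B_sorted = B[order]
A_sorted = A[order]

# Position of each element of C inside the sorted B
positions = np.searchsorted(B_sorted, C)

# Replace each element of C with the value of A paired with it
C_new = A_sorted[positions]
print(C_new)
# [[1 2 3]
#  [3 4 5]
#  [1 4 5]]
```

Because the whole of C is processed in one call, this may already be fast enough that splitting the rows across processes is unnecessary, though that depends on the real data.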
Simple Matlab code: e.g A(5+(1:3)) -> gives [A(6), A(7), A(8)]
In the above, A is a vector or a matrix. For instance:
A = [1 2 3 4 5 6 7 8 9 10];
A(5+(1:3))
ans =
6 7 8
Note that MATLAB indexing starts at 1, not 0.
How can I do the same in Python?
You are looking for slicing behavior:
>>> A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> A[5:8]
[6, 7, 8]
If A is some function that you want to call with parameters 6, 7, and 8, you could use a list comprehension.
answers = [A(6+i) for i in range(3)]
You want to do two things.
First, create a range (5 + (1:3)), which could be done in Python with range(start, stop).
Second, apply a function to each index in the range. This could be done with map or a for loop.
The for loop solutions have been addressed, so here's a map-based one (in Python 3, map returns an iterator, so wrap it in list to get the values):
result = list(map(A, your_range))
Use a list comprehension:
x = 5
f = 1 # from
t = 3 # till
print([x+i for i in range(f,t+1)])
If you are trying to use subscripts to create an array which is a subset of the whole array:
subset_list = A[5:8]
In Python you can do it easily with A[5:5+3]. You can also use variables in place of the values 5 and 3, like this:
b=5
c=3
A[b:b+c]
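If A is a NumPy array rather than a list, the MATLAB expression can be mirrored more literally by indexing with an array of indices. This is a sketch under that assumption; the names idx and subset are illustrative.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# MATLAB's 5 + (1:3) gives the 1-based indices 6, 7, 8;
# the 0-based equivalent is 5 + (0, 1, 2)
idx = 5 + np.arange(3)
subset = A[idx]
print(subset)  # [6 7 8]
```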
How to find out which indices belong to the lowest x (say, 5) numbers of an array?
[10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
Also, how to directly find the sorted (from low to high) lowest x numbers?
The existing answers are nice, but here's the solution if you're using numpy:
import numpy as np
mylist = np.array([10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7])
x = 5
lowestx = np.argsort(mylist)[:x]
#array([ 2, 3, 5, 10, 4])
You could do something like this:
>>> l = [5, 1, 2, 4, 6]
>>> sorted(range(len(l)), key=lambda i: l[i])
[1, 2, 3, 0, 4]
mylist = [10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
# lowest 5
lowest = sorted(mylist)[:5]
# indices of lowest 5
lowest_ind = [i for i, v in enumerate(mylist) if v in lowest]
# 5 indices of lowest 5
import operator
lowest_5ind = [i for i, v in sorted(enumerate(mylist), key=operator.itemgetter(1))[:5]]
[a.index(b) for b in sorted(a)[:5]]
sorted(a)[:x]
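Another option, not mentioned in the answers above, is the standard library's heapq.nsmallest, which avoids sorting the whole list when x is small. A minimal sketch (the names lowest_values and lowest_indices are illustrative):

```python
import heapq

mylist = [10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549,
          9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
x = 5

# The x lowest values, sorted from low to high
lowest_values = heapq.nsmallest(x, mylist)

# The indices of those values, ordered by the value they point to
lowest_indices = heapq.nsmallest(x, range(len(mylist)), key=mylist.__getitem__)

print(lowest_values)   # [9.41220631, 9.42846614, 9.69949144, 9.7, 9.7300549]
print(lowest_indices)  # [2, 3, 5, 10, 4]
```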