Suppose I have a list of 10 elements [a, b, c, d, e, f, g, h, i, j] and I can multiply each element by 0, 1, 2, -1 or -2.
The multiplication factors I use must add up to zero. I.e., if I multiply five numbers by -1, I must multiply the other five by 1; or I can multiply a by 2, b and c by -1, and the rest by 0.
I want to find the list resulting from this operation that has the largest sum.
How can I go about coding this in Python?
I've tried generating every combination of [2, 1, 0, -1, -2], deleting the lists that do not add up to 0 and then multiplying by the original list, but I got stuck.
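The brute-force route described above can be completed along these lines. This is a minimal sketch that assumes the list is short enough to enumerate: there are 5**len(xs) factor combinations, about 9.8 million for 10 elements, so it is slow but workable. `FACTORS` and `brute_force_best` are hypothetical names.

```python
from itertools import product

# the allowed multiplication factors
FACTORS = (2, 1, 0, -1, -2)

def brute_force_best(xs):
    """Try every factor assignment and keep the best one whose factors sum to 0.

    Enumerates 5 ** len(xs) combinations, so only use it for short lists.
    """
    best = None
    for combo in product(FACTORS, repeat=len(xs)):
        if sum(combo) != 0:
            continue  # the factors must add up to zero
        products = [x * c for x, c in zip(xs, combo)]
        if best is None or sum(products) > sum(best):
            best = products
    return best
```

This is handy mainly as a reference to check a faster method against on small inputs.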
You can sort the list and scan it from both ends towards the center, assigning 2 to the larger element of each pair and -2 to the smaller one (for an odd-length list, the middle element gets 0).
def baby_knapsack(xs):
    xs = sorted(xs, reverse=True)
    res = list()
    n = len(xs)
    for i in range(n//2):
        res.extend(((xs[i], 2), (xs[-1-i], -2)))
    if n % 2 == 1:
        res.append((xs[n//2], 0))
    return res
xs = [-10, -5, 0, 5, 10, 15]
# In [73]: q.baby_knapsack(q.xs)
# Out[73]: [(15, 2), (-10, -2), (10, 2), (-5, -2), (5, 2), (0, -2)]
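`baby_knapsack` returns (value, factor) pairs in sorted order rather than the multiplied list itself. A minimal sketch of the same greedy that keeps the original element order and also returns the total might look like this (`best_assignment` is a hypothetical name, not part of the answer). Each +2/-2 pair contributes 2 * (larger - smaller), which is never negative, and the two factors of a pair cancel, so the factors always add up to 0.

```python
def best_assignment(xs):
    """Same greedy as baby_knapsack, but results follow the original order.

    The larger half of the values gets factor 2, the smaller half gets -2,
    and the middle value of an odd-length list gets 0.
    """
    n = len(xs)
    # indices of xs ordered from largest to smallest value
    order = sorted(range(n), key=lambda k: xs[k], reverse=True)
    factors = [0] * n
    for i in range(n // 2):
        factors[order[i]] = 2           # i-th largest value
        factors[order[n - 1 - i]] = -2  # i-th smallest value
    products = [x * c for x, c in zip(xs, factors)]
    return factors, products, sum(products)

factors, products, total = best_assignment([-10, -5, 0, 5, 10, 15])
```

For the sample input this gives the factors [-2, -2, -2, 2, 2, 2] and a total of 90.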
How can I quickly find the min or max sum of the elements of a row in an array?
For example:
1, 2
3, 4
5, 6
7, 8
The minimum sum would be row 0 (1 + 2), and the maximum sum would be row 3 (7 + 8)
print mat.shape
(8, 1, 2)
print mat
[[[-995.40045 -409.15112]]
[[-989.1511 3365.3267 ]]
[[-989.1511 3365.3267 ]]
[[1674.5447 3035.3523 ]]
[[ 0. 0. ]]
[[ 0. 3199. ]]
[[ 0. 3199. ]]
[[2367. 3199. ]]]
In native Python, min and max have key functions:
>>> LoT=[(1, 2), (3, 4), (5, 6), (7, 8)]
>>> min(LoT, key=sum)
(1, 2)
>>> max(LoT, key=sum)
(7, 8)
If you want the index of the first min or max in Python, you would do something like:
>>> min(enumerate(LoT), key=lambda it: sum(it[1]))
(0, (1, 2))
And then peel that tuple apart to get what you want. You also could use that in numpy, but at unknown (to me) performance cost.
In numpy, you can do:
>>> a=np.array(LoT)
>>> a[a.sum(axis=1).argmin()]
array([1, 2])
>>> a[a.sum(axis=1).argmax()]
array([7, 8])
To get the index only:
>>> a.sum(axis=1).argmax()
3
x = np.sum(x,axis=1)
min_x = x.min()
max_x = x.max()
Presuming x is a 4×2 array, use np.sum with axis=1 to sum across each row; .min() then returns the smallest row sum and .max() the largest.
You can do this using np.argmin and np.sum:
array_minimum_index = np.argmin([np.sum(x, axis=1) for x in mat])
array_maximum_index = np.argmax([np.sum(x, axis=1) for x in mat])
For your array, this results in array_minimum_index = 0 and array_maximum_index = 7, as your sums at those indices are -1404.55157 and 5566.0
To get the values of the min and max sums directly (each result is a one-element array), you can do this:
array_sum_min = min([np.sum(x,axis=1) for x in mat])
array_sum_max = max([np.sum(x,axis=1) for x in mat])
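For the (8, 1, 2) `mat` in the question, the list comprehension can be avoided by summing over the last axis and flattening the singleton middle axis. A sketch, with `mat` rebuilt from the values printed in the question:

```python
import numpy as np

# Same values as the (8, 1, 2) array printed in the question.
mat = np.array([[[-995.40045, -409.15112]],
                [[-989.1511, 3365.3267]],
                [[-989.1511, 3365.3267]],
                [[1674.5447, 3035.3523]],
                [[0., 0.]],
                [[0., 3199.]],
                [[0., 3199.]],
                [[2367., 3199.]]])

# Sum the two values of each row (last axis), then drop the singleton
# middle axis so row_sums has shape (8,).
row_sums = mat.sum(axis=-1).ravel()

min_idx, max_idx = row_sums.argmin(), row_sums.argmax()
min_sum, max_sum = row_sums[min_idx], row_sums[max_idx]
```

This gives the same indices (0 and 7) and sums as the list-based version, as plain scalars.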
You can use min and max and use sum as their key.
lst = [(1, 2), (3, 4), (5, 6), (7, 8)]
min(lst, key=sum) # (1, 2)
max(lst, key=sum) # (7, 8)
If you want the sum directly and you do not care about the tuple itself, then map can be of help.
min(map(sum, lst)) # 3
max(map(sum, lst)) # 15
I have a data frame and a vocabulary file that I am trying to fit to the data frame's string columns. I am using scikit-learn's CountVectorizer, which produces a sparse matrix. I need to take the sparse matrix output and merge it into the data frame, matching each matrix row to the corresponding data frame row.
Code:
from sklearn.feature_extraction.text import CountVectorizer
docs = ["You can catch more flies with honey than you can with vinegar.",
"You can lead a horse to water, but you can't make him drink.",
"search not cleaning up on hard delete",
"updating firmware version failed",
"increase not service topology s memory",
"Nothing Matching Here"
]
vocabulary = ["catch more","lead a horse", "increase service", "updating" , "search", "vinegar", "drink", "failed", "not"]
vectorizer = CountVectorizer(analyzer=u'word', vocabulary=vocabulary,lowercase=True,ngram_range=(0,19))
SpraseMatrix = vectorizer.fit_transform(docs)
Below is the sparse matrix output:
(0, 0) 1
(0, 5) 1
(1, 6) 1
(2, 4) 1
(2, 8) 1
(3, 3) 1
(3, 7) 1
(4, 8) 1
Now, what I want to do is build a string for each row of the sparse matrix and add it to the corresponding document.
For example, for doc 3 ("updating firmware version failed"), I want to get "3:1 7:1" from the sparse matrix (i.e. the column indices of "updating" and "failed" and their frequencies) and add this to row 3 of the docs data frame.
I tried the code below, but it produces flat output, whereas I want to get the submatrix for each row index, loop through it, build a concatenated string such as "3:1 7:1" for each row, and finally add these strings as a new column of the data frame, one per corresponding row.
cx = SpraseMatrix.tocoo()
for i,j,v in zip(cx.row, cx.col, cx.data):
    print((i,j,v))
(0, 0, 1)
(0, 5, 1)
(1, 6, 1)
(2, 4, 1)
(2, 8, 1)
(3, 3, 1)
(3, 7, 1)
(4, 8, 1)
I'm not entirely following what you want, but maybe the lil format will be easier to work with:
In [1122]: M = sparse.coo_matrix(([1,1,1,1,1,1,1,1],([0,0,1,2,2,3,3,4],[0,5,6,4,
...: 8,3,7,8])))
In [1123]: M
Out[1123]:
<5x9 sparse matrix of type '<class 'numpy.int32'>'
with 8 stored elements in COOrdinate format>
In [1124]: print(M)
(0, 0) 1
(0, 5) 1
(1, 6) 1
(2, 4) 1
(2, 8) 1
(3, 3) 1
(3, 7) 1
(4, 8) 1
In [1125]: Ml = M.tolil()
In [1126]: Ml.data
Out[1126]: array([list([1, 1]), list([1]), list([1, 1]), list([1, 1]), list([1])], dtype=object)
In [1127]: Ml.rows
Out[1127]: array([list([0, 5]), list([6]), list([4, 8]), list([3, 7]), list([8])], dtype=object)
Its attributes are organized by row, which appears to be how you want it.
In [1130]: Ml.rows[3]
Out[1130]: [3, 7]
In [1135]: for i,(rd) in enumerate(zip(Ml.rows, Ml.data)):
...: print(' '.join(['%s:%s'%ij for ij in zip(*rd)]))
...:
0:1 5:1
6:1
4:1 8:1
3:1 7:1
8:1
You can also iterate through the rows of the csr format, but that requires a bit more math with the .indptr attribute.
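A sketch of that CSR route, which also adds the strings as a new data frame column. `row_strings` and the `features` column name are hypothetical; the matrix is rebuilt from the entries printed in the question, and the vectorizer's own output (already CSR) should be usable in its place.

```python
import pandas as pd
from scipy import sparse

def row_strings(m):
    """Return one 'col:count col:count' string per row of a sparse matrix."""
    csr = sparse.csr_matrix(m, copy=True)  # copy so sort_indices() can't alter the caller's matrix
    csr.sort_indices()  # column indices ascending within each row
    out = []
    for r in range(csr.shape[0]):
        # indptr[r]:indptr[r + 1] is the slice of indices/data that belongs to row r
        start, end = csr.indptr[r], csr.indptr[r + 1]
        pairs = zip(csr.indices[start:end], csr.data[start:end])
        out.append(" ".join(f"{c}:{v}" for c, v in pairs))
    return out

docs = ["You can catch more flies with honey than you can with vinegar.",
        "You can lead a horse to water, but you can't make him drink.",
        "search not cleaning up on hard delete",
        "updating firmware version failed",
        "increase not service topology s memory",
        "Nothing Matching Here"]

# Same entries as the matrix printed in the question; shape=(6, 9) so the
# last document, which matches nothing, still gets a row (an empty string).
M = sparse.coo_matrix(([1] * 8, ([0, 0, 1, 2, 2, 3, 3, 4], [0, 5, 6, 4, 8, 3, 7, 8])),
                      shape=(6, 9))

df = pd.DataFrame({"doc": docs})
df["features"] = row_strings(M)  # one string per matrix row, aligned with df's rows
```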
I have a list:
a=[1, 2, 3, 300] # this is IDs of workers
And a list of tuples:
f=[(1, 1, 1), (1, 0, 0), (0, 0, 0), (1, 500, 600)]
For every element a[i] in a there is a related element (tuple) f[i] in f. What I need is to sum the elements of f[i] for every a[i], up to an index chosen by the user. For example, if the user wants the summation to end at index 2, the output for ID 1 = a[0] is 2 (f[0][0] = 1 + f[0][1] = 1), for ID 2 = a[1] the sum is 1 (f[1][0] = 1 + f[1][1] = 0), and so on up to a[3].
Here is my code:
str1=int(input('enter the index[enter -->1/2/3]'))
a=[1, 2, 3, 300]
f=[(1, 1, 1), (1, 0, 0), (0, 0, 0), (1, 500, 600)]
length=len(a)
temp=0 #sum
for i in range(0,length):
    y=a[i]
    att_2=f[i]
    print("{} {}".format("The worker ID is ", y))
    for z in range(0,(str1)):
        temp=temp+att_2[i]
        print(temp) # tracing the sum
I am getting an error, plus wrong results for some a[i]:
enter the index[enter -->1/2/3]2
temp=temp+att_2[i]
IndexError: tuple index out of range
The Student ID is 1
1
2
The Student ID is 2
2
2
The Student ID is 3
2
2
The Student ID is 300
Process finished with exit code 1
I am trying to fix these errors, but I cannot find their cause. Thank you.
Your error comes from mixing up the variables i and z.
The inner loop indexes the tuple with i (the worker index) instead of z (the position inside the tuple). i runs up to len(a) - 1 = 3, but each tuple has only three elements, so att_2[3] raises IndexError; for the other workers you keep adding the same element str1 times instead of adding the first str1 elements.
Switching the variable on line 11 fixes the indexing. In addition, temp has to start again from 0 for each worker; otherwise it carries over the running total from the previous workers, so the new version also moves temp=0 inside the outer loop.
Original:
str1=int(input('enter the index[enter -->1/2/3]'))
a=[1, 2, 3, 300]
f=[(1, 1, 1), (1, 0, 0), (0, 0, 0), (1, 500, 600)]
length=len(a)
temp=0 #sum
for i in range(0,length):
    y=a[i]
    att_2=f[i]
    print("{} {}".format("The worker ID is ", y))
    for z in range(0,(str1)):
        temp=temp+att_2[i]
        print(temp) # tracing the sum
New:
str1=int(input('enter the index[enter -->1/2/3]'))
a=[1, 2, 3, 300]
f=[(1, 1, 1), (1, 0, 0), (0, 0, 0), (1, 500, 600)]
length=len(a)
for i in range(0,length):
    temp=0 #sum, reset for each worker
    y=a[i]
    att_2=f[i]
    print("{} {}".format("The worker ID is ", y))
    for z in range(0,(str1)):
        temp=temp+att_2[z]
        print(temp) # tracing the sum
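A more compact alternative, not part of the answer above, pairs each ID with its tuple via zip and sums a slice. `partial_sums` is a hypothetical helper, and the cut-off `k` is hard-coded here; in the original script it would come from `int(input(...))`.

```python
def partial_sums(ids, tuples, k):
    """Map each worker ID to the sum of the first k elements of its tuple."""
    # zip keeps a[i] paired with f[i]; slicing t[:k] never raises IndexError,
    # even if k is larger than the tuple length.
    return {wid: sum(t[:k]) for wid, t in zip(ids, tuples)}

a = [1, 2, 3, 300]
f = [(1, 1, 1), (1, 0, 0), (0, 0, 0), (1, 500, 600)]
for wid, total in partial_sums(a, f, 2).items():
    print("The worker ID is", wid, "- sum:", total)
```

Because each sum is computed independently, there is no shared accumulator to reset between workers.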
I have got a list of 2d coordinates with this structure:
coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]
Where coo[0] is the first coordinate stored in a tuple.
I would like to choose two different random coordinates. I can of course use this method:
import numpy as np
rndcoo1 = coo[np.random.randint(0,len(coo))]
rndcoo2 = coo[np.random.randint(0,len(coo))]
if rndcoo1 != rndcoo2:
    pass  # do something
But because I have to repeat this operation 1'000'000 times, I was wondering if there is a faster method. np.random.choice() can't be used on a 2D array; is there any alternative I can use?
import random
result = random.sample(coo, 2)
will give you the expected output. And it is (probably) as fast as you can get with Python.
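If you would rather stay in NumPy: the legacy np.random.choice() only accepts 1-D input, but you can draw two distinct row indices instead and index the array with them. A sketch assuming NumPy 1.17+ for `default_rng`; `coo_arr` is a hypothetical name.

```python
import numpy as np

coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]
coo_arr = np.asarray(coo)  # shape (7, 2)
rng = np.random.default_rng()

# Draw two distinct *row indices* (replace=False), then index the 2-D array.
i, j = rng.choice(len(coo_arr), size=2, replace=False)
rndcoo1, rndcoo2 = coo_arr[i], coo_arr[j]
```

This replaces the two randint calls plus the inequality check, since replace=False already guarantees two different rows.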
This post lists a vectorized approach that produces such random choices for many iterations in one go, without a Python-level loop over the iterations. The idea uses np.argpartition and is inspired by this post.
Here's the implementation:
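import numpy as np
def get_items(coo, num_items = 2, num_iter = 10):
    # random keys per row; the positions of the num_items smallest keys are distinct random indices
    idx = np.random.rand(num_iter, len(coo)).argpartition(num_items, axis=1)[:, :num_items]
    return np.asarray(coo)[idx]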
Note that this returns a 3D array: the first dimension is the number of iterations, the second is the number of choices made at each iteration, and the last is the length of each tuple.
A sample run should make this clearer:
In [55]: coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]
In [56]: get_items(coo, 2, 5)
Out[56]:
array([[[2, 0],
[1, 1]],
[[0, 0],
[1, 1]],
[[0, 2],
[2, 0]],
[[1, 1],
[1, 0]],
[[0, 2],
[1, 1]]])
Runtime test comparing a loop over random.sample, as listed in freakish's post, with get_items:
In [52]: coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]
In [53]: %timeit [random.sample(coo, 2) for i in range(10000)]
10 loops, best of 3: 34.4 ms per loop
In [54]: %timeit get_items(coo, 2, 10000)
100 loops, best of 3: 2.81 ms per loop
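A lighter-weight vectorized variant, which is a different technique from the argpartition answer and not something either answer proposes, draws the first index uniformly and the second as the first shifted by a random nonzero offset modulo n. That makes the two indices always distinct and the second uniform over the remaining rows. `random_distinct_pairs` is a hypothetical name, the sketch assumes NumPy 1.17+ and at least two coordinates, and no timing claim is made here.

```python
import numpy as np

def random_distinct_pairs(coo, num_iter, rng=None):
    """Draw num_iter pairs of distinct rows from coo in one vectorized step.

    first is uniform over all n rows; second = (first + offset) % n with
    offset uniform in [1, n - 1], so second never equals first.
    """
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(coo)
    n = len(arr)
    first = rng.integers(0, n, size=num_iter)                 # values in [0, n - 1]
    second = (first + rng.integers(1, n, size=num_iter)) % n  # offsets in [1, n - 1]
    return arr[first], arr[second]  # each of shape (num_iter, 2)

coo = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]
p1, p2 = random_distinct_pairs(coo, 10000, np.random.default_rng(0))
```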
Is coo just an example, or are your coordinates actually equally spaced? If so, you can just sample M 2D-coordinates like this:
import numpy
N = 100
M = 1000000
coo = numpy.random.randint(0, N, size=(M, 2))
Of course you can also bias and scale the distribution using addition and multiplication to account for different step sizes and offsets.
If you run into memory limitations with large Ms, you can of course sample smaller sizes, or just one array of 2 values with size=2.