I am a newbie in Python. I think I'm looking for something easy, but I can't find it.
I have a numpy binary array, e.g.:
[1,0,1,1,0,0,0,1,1,1,1,0]
And I want to do two things:
First, join all elements into one number, so the result will be:
x=101100011110
Next I want to convert it from binary to decimal, so:
xx=2846
I have an algorithm for step 2, but I don't know how to do step 1. I can do it with a loop, but is it possible with numpy, without a loop? My array will be huge, so I need the most efficient option.
>>> int(''.join(map(str, [1,0,1,1,0,0,0,1,1,1,1,0])))
101100011110
Or with a little numpy:
>>> int(''.join(np.array([1,0,1,1,0,0,0,1,1,1,1,0]).astype(str)))
101100011110
I like @timgeb's answer, but if you're sure you want to use numpy calculations directly, you could do something like this:
x = np.array([1,0,1,1,0,0,0,1,1,1,1,0])
exponents = np.arange(len(x))[::-1]
powers = 10**exponents
result = sum(powers * x)
In [12]: result
Out[12]: 101100011110
As pointed out by @Magellan88 in the comments, if you set powers=2**exponents you can get from step 1 to the second part of your question in one sweep.
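A small sketch of that variant, using base 2 so the result is the decimal value directly:

```python
import numpy as np

x = np.array([1,0,1,1,0,0,0,1,1,1,1,0])
exponents = np.arange(len(x))[::-1]
powers = 2**exponents              # base 2 instead of base 10
result = int((powers * x).sum())   # 2846
```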
Since you don't want a loop for the first task, you can go with the map method; I just wanted to show that you can also try this:
import numpy as np
array=np.array([1,0,1,1,0,0,0,1,1,1,1,0])
int_con=str(array).replace(',','').replace(' ','').replace('[','').replace(']','')
print("Joined {}".format(int_con))
bin_to_de=0
for digit in int_con:
    bin_to_de=bin_to_de*2+int(digit)
print("Decimal conversion {}".format(bin_to_de))
output:
Joined 101100011110
Decimal conversion 2846
Related
I have the following statement in Pandas that uses the apply method and can take up to 2 minutes to run.
I read that in order to optimize the speed, I should vectorize the statement. My original statement looks like this:
output_data["on_s"] = output_data["m_ind"].apply(lambda x: my_matrix[x, 0] + my_matrix[x, 1] + my_matrix[x, 2])
Where my_matrix is a scipy.sparse matrix. So my initial step was to use the sum method:
summed_matrix = my_matrix.sum(axis=1)
But then after this point I get stuck on how to proceed.
Update: Including example data
The matrix looks like this (scipy.sparse.csr_matrix):
(290730, 2) 0.3058016922838267
(290731, 2) 0.3390328430763723
(290733, 2) 0.0838999800585995
(290734, 2) 0.0237008960604337
(290735, 2) 0.0116864263235209
output_data["m_ind"] is just a Pandas series that has some values like so:
97543
97544
97545
97546
97547
Update: you have to convert the sparse matrix into a dense matrix first.
Since you haven't provided any reproducible code I can't tell exactly what your problem is or give a very precise answer, but I will answer according to my understanding.
Let's assume your my_matrix is some thing like this
[[1,2,3],
[4,5,6],
[7,8,9]]
then summed_matrix will be [6, 15, 24]. So if my assumption is right, it looks like you are almost there.
First I'll give you the simplest answer.
Try using this line.
output_data["on_s"] = output_data["m_ind"].apply(lambda x: summed_matrix[x])
Then we can try this completely vectorized method.
Turn m_ind into a one-hot encoded array ohe_array. Be careful to generate ohe_array in increasing (sorted) order, otherwise you will get the wrong answer.
Then take the dot product of ohe_array and summed_matrix.
Assign the result to the column on_s.
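A minimal sketch of those steps on made-up data (the small my_matrix and m_ind values below are stand-ins, not the asker's data); note that fancy indexing computes the same thing as the one-hot dot product:

```python
import numpy as np
import pandas as pd
from scipy import sparse

# hypothetical stand-ins for the asker's data
my_matrix = sparse.csr_matrix(np.array([[1., 2., 3.],
                                        [4., 5., 6.],
                                        [7., 8., 9.]]))
output_data = pd.DataFrame({"m_ind": [2, 0, 2, 1]})

# row sums of the sparse matrix, flattened to a plain 1-D array
summed_matrix = np.asarray(my_matrix.sum(axis=1)).ravel()   # [6., 15., 24.]

# one-hot encode m_ind and take the dot product with the row sums
ohe_array = np.eye(my_matrix.shape[0])[output_data["m_ind"].values]
output_data["on_s"] = ohe_array @ summed_matrix

# fancy indexing gives the same result without building ohe_array:
# output_data["on_s"] = summed_matrix[output_data["m_ind"].values]
```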
We can also try the following code and compare its performance against apply:
indexes = output_data["m_ind"].values
on_s = []
for i in indexes:
    on_s.append(summed_matrix[i])
output_data['on_s'] = on_s
What is the most performant way to convert something like this:
problem = [ [np.array([1,2,3]), np.array([4,5])],
[np.array([6,7,8]), np.array([9,10])]]
into
desired = np.array([[1,2,3,4,5],
[6,7,8,9,10]])
Unfortunately, the final number of columns and rows (and length of subarrays) is not known in advance, as the subarrays are read from a binary file, record by record.
How about this:
problem = [[np.array([1,2,3]), np.array([4,5])],
[np.array([6,7,8]), np.array([9,10])]]
print(np.array([np.concatenate(x) for x in problem]))
I think this:
print(np.array([np.hstack(i) for i in problem]))
Using your example, this runs in 0.00022 s, whereas concatenate takes 0.00038 s.
You can also use apply_along_axis, although this runs in 0.00024 s:
print(np.apply_along_axis(np.hstack, 1, problem))
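If your NumPy is new enough (1.13+), np.block appears to do this directly: blocks in the inner lists are concatenated along the last axis and the outer list is stacked into rows. I haven't timed it against the comprehensions above:

```python
import numpy as np

problem = [[np.array([1,2,3]), np.array([4,5])],
           [np.array([6,7,8]), np.array([9,10])]]

# inner lists -> hstack, outer list -> vstack
desired = np.block(problem)
# array([[ 1,  2,  3,  4,  5],
#        [ 6,  7,  8,  9, 10]])
```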
I am implementing code in Python with variables stored in numpy vectors. I need to perform simple elementwise operations like (vec1+vec2**2)/vec3, where elements are summed and multiplied pairwise (the analog of MATLAB's elementwise .* operation).
The problem is that my code has a dictionary which stores all the vectors:
var = {'a':np.array([1,2,2]),'b':np.array([2,1,3]),'c':np.array([3])}
The third vector is just one number, which means I want to multiply this number by each element of the other arrays, like 3*[1,2,3]. At the same time I have a formula which is provided as a string:
formula = '2*a*(b/c)**2'
I am replacing the variable names in the formula using a regexp:
formula_for_dict_variables = re.sub(r'([A-z][A-z0-9]*)', r'%(\1)s', formula)
which produces result:
2*%(a)s*(%(b)s/%(c)s)**2
and substitute the dictionary variables:
eval(formula%var)
In the case when I have just plain numbers (not numpy arrays) everything works, but when I put numpy arrays in the dict I receive an error.
Could you give an example of how I can solve this problem, or maybe suggest a different approach? The vectors are stored in a dictionary and the formula is a string input.
I can also store the variables in any other container. The problem is that I don't know the variable names or the formula before the code executes (they are provided by the user).
Also, I think iterating through each element of the vectors will probably be slow, given that Python for loops are slow.
Using numexpr, you could do this:
In [143]: import numexpr as ne
In [146]: ne.evaluate('2*a*(b/c)**2', local_dict=var)
Out[146]: array([ 0.88888889, 0.44444444, 4. ])
Pass the dictionary to python eval function:
>>> var = {'a':np.array([1,2,2]),'b':np.array([2,1,3]),'c':np.array([3])}
>>> formula = '2*a*(b/c)**2'
>>> eval(formula, var)
array([ 0.8889, 0.4444, 4. ])
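One caveat with this approach: passing var as the globals dict lets eval insert a __builtins__ key into it. If you want to keep var untouched, pass it as the locals mapping instead (a small sketch):

```python
import numpy as np

var = {'a': np.array([1,2,2]), 'b': np.array([2,1,3]), 'c': np.array([3])}
formula = '2*a*(b/c)**2'

# var is only used for name lookup here and is not modified by eval
result = eval(formula, {}, var)
# array([0.8889, 0.4444, 4.    ])
```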
I have this:
npoints=10
vectorpoint=random.uniform(-1,1,[1,2])
experiment=random.uniform(-1,1,[npoints,2])
and now I want to create an array with dimensions [1, npoints].
I can't think how to do this.
For example table=[1,npoints]
Also, I want to evaluate this:
for i in range(1,npoints):
    if experiment[i,0]**2+experiment[i,1]**2 > 1:
        table[i]=0
    else:
        table[i]=1
I am trying to evaluate experiment[:,0]**2+experiment[:,1]**2, and if it is >1 then the corresponding element in table becomes 0, else it becomes 1.
The table should give me something like [1,1,1,1,0,1,0,1,1,0].
I can't try it because I can't create the array "table".
Also, is there a better way (with list comprehensions) to produce this?
Thanks!
Try:
table = (experiment[:,0]**2 + experiment[:,1]**2 <= 1).astype(int)
You can leave off the astype(int) call if you're happy with an array of booleans rather than an array of integers. As Joe Kington points out, this can be simplified to:
table = ((experiment**2).sum(axis=1) <= 1).astype(int)
If you really need to create the table array up front, you could do:
table = zeros(npoints, dtype=int)
(assuming that you've already imported zeros from numpy). Then your for loop should work as written.
Aside: I suspect that you want range(npoints) rather than range(1, npoints) in your for statement.
Edit: just noticed that I had the 1s and 0s backwards. Now fixed.
I would like to apply a function to a one-dimensional array, 3 elements at a time, and output a single element for each group.
For example, I have an array of 13 elements:
a = np.arange(13)**2
and I want to apply a function, let's say np.std as an example.
Here is the equivalent list comprehension:
[np.std(a[i:i+3]) for i in range(0, len(a),3)]
[1.6996731711975948,
6.5489609014628334,
11.440668201153674,
16.336734339790461,
0.0]
Does anyone know a more efficient way using numpy functions?
The simplest way is to reshape it and apply the function along an axis.
import numpy as np
a = np.arange(12)**2
b = a.reshape(4,3)
print(np.std(b, axis=1))
If you need a little better performance than that, you could try stride_tricks. Below is the same as above, except using stride_tricks. I was wrong about the performance gain: as you can see below, b becomes exactly the same view as the b above, and I wouldn't be surprised if they compile to exactly the same thing.
import numpy as np
a = np.arange(12)**2
b = np.lib.stride_tricks.as_strided(a, shape=(4,3), strides=(a.itemsize*3, a.itemsize))
print(np.std(b, axis=1))
Are you talking about something like vectorize? http://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
You can reshape it, but that requires that the size not change. If you can tack some bogus entries onto the end, you can do this:
[np.std(s) for s in a.reshape(-1,3)]
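Following that padding idea, one hedged sketch: pad with NaN up to a multiple of 3 and use np.nanstd so the bogus entries don't affect the statistics (this assumes the function you want to apply has a NaN-ignoring variant):

```python
import numpy as np

a = np.arange(13)**2
pad = (-len(a)) % 3                      # bogus entries needed to fill the last row
b = np.pad(a.astype(float), (0, pad), constant_values=np.nan)

# nanstd ignores the NaN padding, so the last (short) group is handled correctly
result = np.nanstd(b.reshape(-1, 3), axis=1)
# matches the list comprehension above, ending in 0.0 for the lone last element
```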