Forgive me if this has been asked before; I couldn't find it. I am trying to progressively sum a numpy array into a new numpy array using vector operations. What I mean is that the 2nd entry of the new array equals the 1st + 2nd entries of the old array; that is, A[n] = B[0] + B[1] + ... + B[n]. I know how to do this with a for loop, but I'm looking for a vectorized solution.
Here is my non-vectorized solution:
import numpy as np
A = np.arange(10)
B = np.empty(10)
for i in range(len(A)):
    B[i] = sum(A[0:i+1])
print(B)
You can do it like this:
import numpy as np
A = np.arange(10)
B = np.cumsum(A)
# [ 0 1 3 6 10 15 21 28 36 45]
Thanks
The "progressive" sum is called cumulative sum. Use NumPy's cumsum for this.
Using your example and comparing B to np.cumsum(A) results in equal arrays:
>>> import numpy as np
>>> A = np.arange(10)
>>> B = np.empty(10)
>>> for i in range(len(A)):
...     B[i] = sum(A[0:i+1])
...
>>> np.array_equal(B, np.cumsum(A))
True
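As a side note, np.cumsum is the accumulation of the addition ufunc, so np.add.accumulate gives the same result; and since the loop version filled a float np.empty array, passing an explicit dtype keeps the output integer. A small sketch:

```python
import numpy as np

A = np.arange(10)

# cumsum is the "accumulate" form of addition
B = np.add.accumulate(A)

# an explicit dtype avoids the float64 array the np.empty loop produced
C = np.cumsum(A, dtype=np.int64)

print(B)  # [ 0  1  3  6 10 15 21 28 36 45]
```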
I am learning Pandas and moving my Python code to Pandas. I want to compare every value with the values after it, so the first with the second, the second with the third, and so on, but never back with an earlier value, because that comparison has already been done. In Python I use two nested loops over a list:
def match_values(a, b):
    # do some stuff...
    pass

l = ['a', 'b', 'c']
length = len(l)
for i in range(length):
    for j in range(i + 1, length):  # starts after i, not from the start!
        if match_values(l[i], l[j]):
            pass  # do some stuff...
How do I do a similar technique in Pandas when my list is a column in a dataframe? Do I simply reference every value like before or is there a clever "vector-style" way to do this fast and efficient?
Thanks in advance,
Jo
Can you please check this? It produces, for each row, a list comparing its value with the values in the rows after it.
>>> import pandas as pd
>>> import numpy as np
>>> val = [16,19,15,19,15]
>>> df = pd.DataFrame({'val': val})
>>> df
val
0 16
1 19
2 15
3 19
4 15
>>> df['match'] = df.apply(lambda x: [ (1 if (x['val'] == df.loc[idx, 'val']) else 0) for idx in range(x.name+1, len(df)) ], axis=1)
>>> df
val match
0 16 [0, 0, 0, 0]
1 19 [0, 1, 0]
2 15 [0, 1]
3 19 [0]
4 15 []
Yes, vectorized comparison works, since pandas is built on NumPy:
df['columnname'] > 5
This results in a Boolean Series. If you also want to get back the matching part of the dataframe:
df[df['columnname'] > 5]
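For the original every-value-with-the-later-values comparison, a fully vectorized alternative to apply is to build the whole comparison matrix at once with NumPy broadcasting. A sketch, assuming the column fits in memory (the matrix is n x n) and a pandas version with .to_numpy():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [16, 19, 15, 19, 15]})
vals = df['val'].to_numpy()

# n x n Boolean matrix: eq[i, j] is True where vals[i] == vals[j]
eq = vals[:, None] == vals[None, :]

# keep only j > i: each value compared once with the values after it
upper = np.triu(eq, k=1)
i, j = np.nonzero(upper)
pairs = list(zip(i.tolist(), j.tolist()))
print(pairs)  # [(1, 3), (2, 4)]
```

This replaces the per-row Python loop inside apply with a single vectorized comparison.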
I have one simple one-dimensional array and an empty array in NumPy. I try to concatenate them, but I get a float array.
from numpy import *
a = zeros(5,'i')
a += 1
b = []
c = hstack((a,b))
d = concatenate((a, b))
print("a",a)
print("b",b)
print("c",c)
print("d",d)
I got:
a [1 1 1 1 1]
b []
c [1. 1. 1. 1. 1.]
d [1. 1. 1. 1. 1.]
But I am looking for an integer array
[1 1 1 1 1]
How? And what is the most efficient way?
Try this way:
np.zeros gives a float64 array by default, so set the dtype explicitly, e.g. np.int32:
a = np.zeros(5, dtype=np.int32)
a += 1
b = np.array([], dtype=np.int32)
You might create b as a 0-size np.array of dtype 'i' rather than a list, that is:
import numpy as np
a = np.zeros(5,'i')
a += 1
b = np.array([],'i')
c = np.hstack((a,b))
d = np.concatenate((a, b))
print(d)
Output:
[1 1 1 1 1]
NumPy infers float64 for an empty array.
If you run the following
np.array([]).dtype
it returns dtype('float64'),
so you should initialize the empty array as follows:
b = np.array([], dtype="int32")
What is the point of wanting the same array as the input? In any case, you can use numpy.ones instead of numpy.zeros followed by += 1:
import numpy
a = numpy.ones(5, dtype=int)
b = numpy.array([], dtype=int)
d = numpy.concatenate((a, b))
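The promotion rule behind these answers is worth spelling out: NumPy turns a bare [] into a float64 array, and concatenating int32 with float64 promotes the result to float64; giving the empty operand a matching dtype keeps the result integer. A small sketch:

```python
import numpy as np

a = np.ones(5, dtype=np.int32)

# a bare empty list is inferred as float64, which promotes the result
print(np.concatenate((a, [])).dtype)   # float64

# an empty array with a matching dtype keeps the result integer
b = np.array([], dtype=a.dtype)
c = np.concatenate((a, b))
print(c.dtype)   # int32
```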
I want to slice a multidimensional array in Numpy.
Suppose my array is a 5*5*5 array, and the slicing that I have in mind in theory can be done using numpy.ix_:
s0 = [0,1,2]
s1 = [1,2,3]
s2 = [1,2,3]
b = a[numpy.ix_(s0,s1,s2)]
The problem is that the dimension of the array, as well as the way I need to slice the array along different dimensions change within the programme (for example array 'a' might be 2,3,4,... dimensional, and s0, s1, ... also change), so the above code doesn't work as I like unless I can pass a list/tuple to numpy.ix_ like this:
N = 3
M = 3
s = [np.ones(M).astype(int) for i in range(N)]
s[0] = [0,1,2]
s[1] = [1,2,3]
s[2] = [1,2,3]
b = a[numpy.ix_(s)]
This unfortunately doesn't work, because ix_ only accepts one-dimensional objects. What's the best workaround? How can I cleanly implement ix_ myself, or is there an easier way to do this?
Use the * argument-unpacking operator:
b = a[numpy.ix_(*s)]
is equivalent to
b = a[numpy.ix_(s[0], s[1], ..., s[n])]
For example,
import numpy as np
N = 3
M = 3
a = np.arange((M+1)**N).reshape([M+1]*N)
s = [np.ones(M).astype(int) for i in range(N)]
s[0] = [0,1,2]
s[1] = [1,2,3]
s[2] = [1,2,3]
b = a[np.ix_(*s)]
print(b)
prints
[[[ 5 6 7]
[ 9 10 11]
[13 14 15]]
[[21 22 23]
[25 26 27]
[29 30 31]]
[[37 38 39]
[41 42 43]
[45 46 47]]]
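Because s is an ordinary list, the same a[np.ix_(*s)] call keeps working when the number of dimensions changes, which was the original concern; a quick 2-D sketch:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# two index lists this time instead of three
s = [[0, 1], [1, 3]]
b = a[np.ix_(*s)]
print(b)
# [[1 3]
#  [5 7]]
```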
In R I can do:
> y = c(2,3)
> x = c(4,5)
> z = data.frame(x,y)
> z[3,3]<-6
> z
x y V3
1 4 2 NA
2 5 3 NA
3 NA NA 6
R automatically fills the empty cells with NA.
If I use numpy.insert from numpy, numpy throws an error by default:
import numpy
y = [2,3]
x = [4,5]
z = numpy.array([y, x])
z = numpy.insert(z, 3, 6, 3)
IndexError: axis 3 is out of bounds for an array of dimension 2
Is there a way to insert values in a way that works similar to R in numpy?
numpy is more of a replacement for R's matrices, and not so much for its data frames. You should consider using python's pandas library for this. For example:
In [1]: import pandas
In [2]: y = pandas.Series([2,3])
In [3]: x = pandas.Series([4,5])
In [4]: z = pandas.DataFrame([x,y])
In [5]: z
Out[5]:
0 1
0 4 5
1 2 3
In [19]: z.loc[3,3] = 6
In [20]: z
Out[20]:
0 1 3
0 4 5 NaN
1 2 3 NaN
3 NaN NaN 6
In numpy you need to initialize an array of the appropriate size:
z = numpy.empty((3, 3))
z.fill(numpy.nan)
z[:2, 0] = x
z[:2, 1] = y
z[2, 2] = 6
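A slightly shorter way to get the same NaN-filled array, assuming a NumPy recent enough to have np.full (1.8+):

```python
import numpy as np

x = [4, 5]
y = [2, 3]

# allocate and fill with NaN in one step
z = np.full((3, 3), np.nan)
z[:2, 0] = x
z[:2, 1] = y
z[2, 2] = 6
```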
Looking at the raised error, it is possible to understand why it occurred: you are trying to insert values along an axis that does not exist in z.
You can fix it by doing
import numpy as np
y = [2,3]
x = [4,5]
array = np.array([y, x])
z = np.insert(array, 1, [3,6], axis=1)
The interface is quite different from R's. If you are using IPython,
you can easily access the documentation for some numpy function, in this case
np.insert, doing:
help(np.insert)
which gives you the function signature, explains each parameter, and provides some examples.
You could, alternatively, do
import numpy as np
x = [4,5]
y = [2,3]
array = np.array([y, x])
z = [3,6]
new_array = np.vstack([array.T, z]).T  # or, as below
# new_array = np.hstack([array, np.array(z)[:, np.newaxis]])
Also, have a look at the pandas module. It provides an interface similar to what you asked for, implemented with numpy.
With pandas you could do something like:
import pandas as pd
data = {'y':[2,3], 'x':[4,5]}
dataframe = pd.DataFrame(data)
dataframe['z'] = [3,6]
which gives the nice output:
   x  y  z
0  4  2  3
1  5  3  6
If you want a more R-like experience within python, I can highly recommend pandas, which is a higher-level numpy based library, which performs operations of this kind.
Say I have a 1D array:
import numpy as np
my_array = np.arange(0,10)
my_array.shape
(10,)
In Pandas I would like to create a DataFrame with only one row and 10 columns using this array. For example:
import pandas as pd
import random, string
# Random list of characters to be used as columns
cols = [random.choice(string.ascii_uppercase) for x in range(10)]
But when I try:
pd.DataFrame(my_array, columns = cols)
I get:
ValueError: Shape of passed values is (1,10), indices imply (10,10)
I presume this is because Pandas expects a 2D array and I have a (flat) 1D array. Is there a way to inflate my 1D array into a 2D array, or have pandas use a 1D array when creating the dataframe?
Note: I am using the latest stable version of Pandas (0.11.0)
pd.DataFrame reads a 1D array as a single column of values, while your cols list implies 10 columns; hence the shape mismatch in the error. Try:
my_array = np.arange(10).reshape(1,10)
cols = [random.choice(string.ascii_uppercase) for x in range(10)]
pd.DataFrame(my_array, columns=cols)
Which results in:
F H L N M X B R S N
0 0 1 2 3 4 5 6 7 8 9
Either of these should do it:
my_array2 = my_array[None] # same as myarray2 = my_array[numpy.newaxis]
or
my_array2 = my_array.reshape((1,10))
A single-row, many-columned DataFrame is unusual. A more natural, idiomatic choice would be a Series indexed by what you call cols:
pd.Series(my_array, index=cols)
But, to answer your question, the DataFrame constructor is assuming that my_array is a column of 10 data points. Try DataFrame(my_array.reshape((1, 10)), columns=cols). That works for me.
By using one of the alternate DataFrame constructors it is possible to create a DataFrame without needing to reshape my_array.
import numpy as np
import pandas as pd
import random, string
my_array = np.arange(0,10)
cols = [random.choice(string.ascii_uppercase) for x in range(10)]
pd.DataFrame.from_records([my_array], columns=cols)
Out[22]:
H H P Q C A G N T W
0 0 1 2 3 4 5 6 7 8 9
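Equivalently, wrapping the array in a list makes the plain DataFrame constructor read it as one row of ten values, with no reshape or from_records needed; a minimal sketch (fixed column labels are used here instead of random ones, for reproducibility):

```python
import numpy as np
import pandas as pd

my_array = np.arange(0, 10)
cols = list('ABCDEFGHIJ')  # fixed labels stand in for the random ones

# a list containing one array is interpreted as a single row
df = pd.DataFrame([my_array], columns=cols)
print(df.shape)  # (1, 10)
```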