I need to populate a 2D array whose shape is 3xN, where N is initially unknown. The code looks as follows:
import numpy as np
import random
nruns = 5
all_data = [[]]
for run in range(nruns):
    n = random.randint(1,10)
    d1 = random.sample(range(0, 30), n)
    d2 = random.sample(range(0, 30), n)
    d3 = random.sample(range(0, 30), n)
    data_tmp = [d1, d2, d3]
    all_data = np.concatenate((all_data,data_tmp),axis=0)
This gives the following error:
ValueError Traceback (most recent call last)
<ipython-input-103-22af8f04e7c0> in <module>
10 d3 = random.sample(range(0, 30), n)
11 data_tmp = [d1, d2, d3]
---> 12 all_data = np.concatenate((all_data,data_tmp),axis=0)
13 print(np.shape(data_tmp))
<__array_function__ internals> in concatenate(*args, **kwargs)
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 4
Is there a way to do this without pre-allocating all_data? Note that in my application, the data will not be random, but generated inside the loop.
Many thanks!
You could store the data generated in each step of the for loop into a list and create the array when you are done.
In [298]: import numpy as np
...: import random
In [299]: nruns = 5
...: all_data = []
In [300]: for run in range(nruns):
...:     n = random.randint(1,10)
...:     d1 = random.sample(range(0, 30), n)
...:     d2 = random.sample(range(0, 30), n)
...:     d3 = random.sample(range(0, 30), n)
...:     all_data.append([d1, d2, d3])
In [301]: all_data = np.hstack(all_data)
In [302]: all_data
Out[302]:
array([[13, 28, 14, 15, 11, 0, 0, 19, 6, 28, 14, 18, 1, 15, 4, 20,
9, 14, 15, 13, 27, 28, 25, 5, 7, 4, 10, 22, 12, 6, 23, 15,
0, 20, 14, 5, 13],
[10, 9, 23, 4, 25, 28, 17, 14, 3, 4, 5, 9, 7, 18, 23, 9,
14, 15, 25, 26, 29, 12, 21, 0, 5, 6, 11, 27, 13, 26, 22, 14,
6, 5, 7, 23, 0],
[13, 0, 7, 14, 29, 26, 12, 16, 13, 3, 9, 6, 11, 2, 19, 17,
28, 14, 25, 24, 3, 12, 22, 7, 23, 18, 5, 14, 0, 14, 15, 8,
3, 2, 26, 21, 16]])
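The same collect-then-join idea can be sketched with deterministic data, which makes the resulting shape easy to check. The chunk contents and the rule `n = run + 1` are placeholders chosen for illustration; they stand in for whatever the loop really generates.

```python
import numpy as np

chunks = []
for run in range(4):
    n = run + 1  # placeholder for a length that is only known inside the loop
    # each chunk has 3 rows and n columns
    block = np.arange(3 * n).reshape(3, n) + 100 * run
    chunks.append(block)

# join all chunks once, along the column axis
all_data = np.concatenate(chunks, axis=1)
print(all_data.shape)
```

Joining once at the end avoids copying the growing array on every iteration, which is what repeated calls to np.concatenate inside the loop would do.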
See if this is what you need, i.e. populate along axis 1 instead of 0.
import numpy as np
import random
nruns = 5
all_data = [[], [], []]
for run in range(nruns):
    n = random.randint(1,10)
    d1 = random.sample(range(0, 30), n)
    d2 = random.sample(range(0, 30), n)
    d3 = random.sample(range(0, 30), n)
    data_tmp = [d1, d2, d3]
    all_data = np.concatenate((all_data, data_tmp), axis=1)
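Why the starting value matters can be sketched as follows: np.concatenate requires every axis except the concatenation axis to match, so a start of shape (1, 0) fails where a start of shape (3, 0) works. The names `start_bad`, `start_ok` and `chunk` are made up for this illustration.

```python
import numpy as np

start_bad = np.array([[]])                   # shape (1, 0), as in the question
start_ok = np.empty((3, 0), dtype=int)       # shape (3, 0), same as [[], [], []]
chunk = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)

# Along axis 0 the column counts (0 and 2) must match, so this raises.
try:
    np.concatenate((start_bad, chunk), axis=0)
    failed = False
except ValueError:
    failed = True

# Along axis 1 only the row counts (3 and 3) must match, so this works.
grown = np.concatenate((start_ok, chunk, chunk), axis=1)
print(failed, grown.shape)
```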
How about using np.random only:
import numpy as np
nruns = 5
# set seed for repeatability, remove for randomness
np.random.seed(42)
# randomize the lengths for the runs (the upper bound 10 is exclusive)
num_samples = np.random.randint(1,10, nruns)
# sampling with the total length
all_data = np.random.randint(0,30, (3, num_samples.sum()))
# or, if `range(0,30)` represents some population
# all_data = np.random.choice(range(0,30), (3,num_samples.sum()) )
print(all_data)
Output:
[[25 18 22 10 10 23 20 3 7 23 2 21 20 1 23 11 29 5 1 27 20 0 11 25
21 28 11 24 16 26 26]
[ 9 27 27 15 14 29 29 14 29 18 11 22 19 24 2 4 18 6 20 8 6 17 3 24
27 13 17 25 8 25 20]
[ 1 19 27 14 27 6 11 28 7 14 2 13 16 3 17 7 3 1 29 5 21 9 3 21
28 17 25 11 1 9 29]]
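If the per-run blocks are needed again later, the combined array can be cut back into pieces at the cumulative run lengths. This is only a sketch: it uses the newer np.random.default_rng interface instead of np.random.seed, and, like the snippet above, it samples with replacement, which differs from random.sample in the question.

```python
import numpy as np

rng = np.random.default_rng(42)
nruns = 5

# run lengths between 1 and 10 (the upper bound 11 is exclusive)
num_samples = rng.integers(1, 11, size=nruns)

# one draw for the whole 3 x N array
all_data = rng.integers(0, 30, size=(3, num_samples.sum()))

# split the columns back into one block per run
split_points = np.cumsum(num_samples)[:-1]
per_run = np.split(all_data, split_points, axis=1)
print([block.shape for block in per_run])
```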
I'm trying to append a list to a dataframe in Python
I want to put the first 6 numbers in the first row and then keep adding row by row until the DataFrame is complete.
I tried to generate the data to make it easier:
import pandas as pd
import random
randomlist = []
for i in range(0,30):
    n = random.randint(1,30)
    randomlist.append(n)
resultado(randomlist)
randomlist = 30, 11, 18, 11, 28, 18, 22, 18, 20, 10, 11, 6, 29, 1, 11, 15, 3, 4, 17, 11, 17, 18, 27, 25, 11, 10, 7, 4, 18, 27
lista_colunas = ['Carro', 'Moto', 'Barco', 'Patinete', 'Mobilete', 'Skate']
lista_index = ['Entre 1 a 5', 'Entre 6 a 10', 'Entre 11 a 15', 'Entre 16 a 20', 'Entre 21 a 25']
Expected outcome:
I'd suggest directly using NumPy instead.
import numpy as np
import pandas as pd
# 1 (inclusive) and 30 (exclusive) are the boundaries of the values;
# (5, 6) is the shape of the resulting matrix
df = pd.DataFrame(np.random.randint(1, 30, (5, 6)), index=lista_index, columns=lista_colunas)
Result:
>>> df
Carro Moto Barco Patinete Mobilete Skate
Entre 1 a 5 20 8 24 21 29 15
Entre 6 a 10 16 3 29 29 21 27
Entre 11 a 15 25 8 27 29 25 23
Entre 16 a 20 18 27 22 3 23 7
Entre 21 a 25 22 2 2 17 4 12
You first need to reshape your list and then pass it to the DataFrame constructor. This should work:
import pandas as pd
import numpy as np
randomlist = [30, 11, 18, 11, 28, 18, 22, 18, 20, 10, 11, 6, 29, 1, 11, 15, 3, 4, 17, 11, 17, 18, 27, 25, 11, 10, 7, 4, 18, 27]
lista_colunas = ['Carro', 'Moto', 'Barco', 'Patinete', 'Mobilete', 'Skate']
lista_index = ['Entre 1 a 5', 'Entre 6 a 10', 'Entre 11 a 15', 'Entre 16 a 20', 'Entre 21 a 25']
randomlist = np.reshape(randomlist, (len(lista_index), len(lista_colunas)))
df = pd.DataFrame(randomlist, index = lista_index, columns=lista_colunas)
print(df.head())
Hope this could help.
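A variation on the reshape approach, as a sketch: passing -1 for the row count lets NumPy infer it from the number of columns. The values 1 to 30 are placeholder data chosen so the row-by-row filling is easy to see.

```python
import numpy as np
import pandas as pd

values = list(range(1, 31))  # placeholder data instead of random numbers
lista_colunas = ['Carro', 'Moto', 'Barco', 'Patinete', 'Mobilete', 'Skate']
lista_index = ['Entre 1 a 5', 'Entre 6 a 10', 'Entre 11 a 15',
               'Entre 16 a 20', 'Entre 21 a 25']

# -1 means "work out the number of rows from the list length"
table = np.reshape(values, (-1, len(lista_colunas)))
df = pd.DataFrame(table, index=lista_index, columns=lista_colunas)
print(df)
```

The first six values fill the first row, the next six the second row, and so on.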
I created a NumPy array np3:
np1 = np.array(range(2*3*5))
np3 = np1.reshape(2,3,5)
and np3 looks like this:
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]]
[[15 16 17 18 19]
[20 21 22 23 24]
[25 26 27 28 29]]]
Then I created a new NumPy array np_55:
np_55 = np.full((3,1),55)
and np_55 looks like this:
[[55]
[55]
[55]]
I want to build an array like the one below from np3 and np_55 (I'll call it 'ANSWER'):
[[[ 0 1 2 3 4 55]
[ 5 6 7 8 9 55]
[10 11 12 13 14 55]]
[[15 16 17 18 19 55]
[20 21 22 23 24 55]
[25 26 27 28 29 55]]]
but I can't work out how to build it from np3 and np_55. Of course I can hard-code it like this:
a = np.append((np3[0]), np_55, axis=1)
b = np.append((np3[1]), np_55, axis=1)
a = a.reshape(1,3,6)
b = b.reshape(1,3,6)
np.append(a, b, axis=0)
but I don't know how to produce ANSWER in a simpler way.
You can try the following:
import numpy as np
a = np.arange(2*3*5).reshape(2, 3, 5)
b = np.full((3,1),55)
np.c_[a, np.broadcast_to(b, (a.shape[0], *b.shape))]
It gives:
array([[[ 0, 1, 2, 3, 4, 55],
[ 5, 6, 7, 8, 9, 55],
[10, 11, 12, 13, 14, 55]],
[[15, 16, 17, 18, 19, 55],
[20, 21, 22, 23, 24, 55],
[25, 26, 27, 28, 29, 55]]])
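An equivalent way to write this, as a sketch, spells out the axis with np.concatenate instead of relying on np.c_. The name `b_stacked` is introduced here only for readability.

```python
import numpy as np

a = np.arange(2 * 3 * 5).reshape(2, 3, 5)
b = np.full((3, 1), 55)

# repeat b (without copying) once per leading block of a: shape (2, 3, 1)
b_stacked = np.broadcast_to(b, (a.shape[0], *b.shape))

# glue the extra column onto the last axis: shape (2, 3, 6)
result = np.concatenate([a, b_stacked], axis=-1)
print(result.shape)
```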
I have the following code:
ttmbond = 10
daywalk = np.arange(0,30)
dtm = ttmbond - daywalk/252
curve_list = [0.083,0.25,0.5,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
pos1= np.ones((len(daywalk+1),len(curve_list)))
pos2 = pos1*curve_list
pos3 = pos2 <= dtm
which now gives me this
TRUE/FALSE ndarray
I would like to get a list with the row index of the last True value in each column. For this example, my final result should be something like [12, 11, 11, 11, 11, 11, ...]
Or is there any way to get the position of the value in curve_list that is the largest value less than or equal to each value in dtm?
Thanks
Extending your code:
ttmbond = 10
daywalk = np.arange(0,30)
dtm = ttmbond - daywalk/252
curve_list = [0.083,0.25,0.5,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
pos1= np.ones((len(daywalk+1),len(curve_list)))
pos2 = pos1*curve_list
pos3 = (pos2 <= (dtm+pos1.T).T)
temp = np.where(pos3==True)
loc = np.where((temp[0][1:]-temp[0][:-1])==1)[0]
res = np.append(temp[1][loc], temp[1][-1])
print(res)
'''
Output:
[13 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
12 12 12 12 12 12]
'''
Other way around:
ttmbond = 10
daywalk = np.arange(0,30)
dtm = ttmbond - daywalk/252
curve_list = [0.083,0.25,0.5,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
res = [np.where((i>0)==True)[0][0] for i in [curve_list - i for i in dtm]]
print(res)
'''
Output:
[13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12]
'''
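For the second phrasing of the question (the position of the largest curve_list value that is less than or equal to each dtm value), np.searchsorted may be a more direct route. This is a sketch that assumes curve_list is sorted in ascending order. It gives 12 for the first day and 11 afterwards, which is the pattern the question asks for and one less than the positions printed above.

```python
import numpy as np

ttmbond = 10
daywalk = np.arange(0, 30)
dtm = ttmbond - daywalk / 252
curve = np.array([0.083, 0.25, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                  10, 11, 12, 13, 14, 15])

# side="right" counts how many curve values are <= each dtm value;
# subtracting 1 turns that count into the 0-based position of the last one.
res = np.searchsorted(curve, dtm, side="right") - 1
print(res.tolist())
```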
I am trying to convert some R code into numpy. I have a vector as follows:
r = [2.00, 1.64, 1.36, 1.16, 1.04, 1.00,
     1.64, 1.28, 1.00, 0.80, 0.68, 0.64,
     1.36, 1.00, 0.72, 0.52, 0.40, 0.36,
     1.16, 0.80, 0.52, 0.32, 0.20, 0.16,
     1.04, 0.68, 0.40, 0.20, 0.08, 0.04,
     1.00, 0.64, 0.36, 0.16, 0.04, 0.00]
I am trying to convert following R code
index <- order(r)
into numpy by following code
index = np.argsort(r)
Here are the results
Numpy
index=array([35, 29, 34, 28, 33, 23, 27, 22, 21, 32, 17, 16, 26, 15, 20, 11, 31,25, 10, 14, 9, 19, 30, 5, 8, 13, 4, 24, 18, 3, 7, 12, 2, 6, 1, 0])
R
index= [36 30 35 29 24 34 23 28 22 18 33 17 27 16 21 12 32 11 26 15 10 20 6 9 14 31 5 25 4 19 8 3 13 2 7 1]
As you can see, the results are different. How can I obtain R's results in numpy?
Looking at the documentation of order, it looks like R uses radix sort for short vectors, which is indeed a stable sort. argsort, on the other hand, uses quicksort by default, which is not a stable sort and does not guarantee that ties stay in the same order as in the original array.
However, you can use a stable sort with argsort by specifying the kind flag:
np.argsort(r, kind='stable')
When I use a stable sort on your vector:
array([35, 29, 34, 28, 23, 33, 22, 27, 21, 17, 32, 16, 26, 15, 20, 11, 31,
10, 25, 14, 9, 19, 5, 8, 13, 30, 4, 24, 3, 18, 7, 2, 12, 1,
6, 0], dtype=int64)
Compared to the R result (subtracting one for the difference in indexing):
np.array_equal(np.argsort(r, kind='stable'), r_out - 1)
True
A word of warning: it appears that R switches to shell sort under certain conditions (I don't know enough about R to give a more detailed explanation), and shell sort is not stable. This is something you will have to address if those conditions are met.
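The effect of the stable sort and the index shift can be seen on a small vector with ties. This is a sketch with made-up values; the 1-based result is what a stable ordering in R would be expected to give for the same input.

```python
import numpy as np

r_small = np.array([1.0, 0.5, 1.0, 0.5, 0.0])  # made-up values with ties

# stable sort: tied values keep their original left-to-right order
order_zero_based = np.argsort(r_small, kind="stable")

# R counts positions from 1, NumPy from 0
order_one_based = order_zero_based + 1
print(order_one_based.tolist())  # [5, 2, 4, 1, 3]
```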
I have a list of elements and I want to use mapping functions to generate an element-wise list of whether they are within any ranges in a list of ranges. I already have a solution that uses a for-loop, but for-loops are too slow because both my element and range lists will be much larger.
Here is my code so far:
import pandas as pd
# check element-wise if [1,0,45,60] within ranges 1-10, 21-30, or 41-50
# expected output: true, false, true, false
s = pd.Series([1,0,45,60])
f = lambda x: any((x >= pd.Series([1,20,40])) & (x <= pd.Series([10,30,50])))
print map(f, s)
Error:
elif isinstance(other, (np.ndarray, pd.Index)):
--> if len(self) != len(other):
raise ValueError('Lengths must match to compare')
return self._constructor(na_op(self.values, np.asarray(other)),
TypeError: len() of unsized object
Figured it out. Seems like everything works and is still fast if I convert to numpy. Normally I'd frown on introducing a new library but pandas is built on top of numpy.
import pandas as pd, numpy as np
s = pd.Series([1,0,45,60])
mins = np.array(pd.Series([1,20,40]))
maxes = np.array(pd.Series([10,30,50]))
f = lambda x: np.any((x >= mins) & (x <= maxes))
print map(f, s)
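The per-element lambda can also be replaced by a single broadcast comparison. This is a sketch written for Python 3 (the snippet above uses the Python 2 print statement), and it assumes the ranges 1-10, 21-30 and 41-50 stated in the question's comment.

```python
import numpy as np

values = np.array([1, 0, 45, 60])
mins = np.array([1, 21, 41])     # lower bounds, taken from the question's comment
maxes = np.array([10, 30, 50])   # upper bounds

# values[:, None] has shape (4, 1); comparing it with shape (3,) bounds
# gives a (4, 3) table: one row per value, one column per range.
inside = ((values[:, None] >= mins) & (values[:, None] <= maxes)).any(axis=1)
print(inside.tolist())  # [True, False, True, False]
```

The intermediate table has len(values) * len(mins) entries, so for very large inputs memory use is the thing to watch.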
I think you can first create all ranges and for check use isin with tolist:
import pandas as pd
s = pd.Series([1,0,45,60])
print s
0 1
1 0
2 45
3 60
dtype: int64
rng = range(1,11) + range(21,31) + range(41,51)
print rng
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
print s.isin(rng)
0 True
1 False
2 True
3 False
dtype: bool
print s.isin(rng).tolist()
[True, False, True, False]
EDIT:
For creating ranges you can use numpy.arange and numpy.concatenate:
import numpy as np
rng = np.concatenate((np.arange(1, 11), np.arange(21, 31), np.arange(41, 51)))
print rng
[ 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25
26 27 28 29 30 41 42 43 44 45 46 47 48 49 50]
Another solution for generating ranges can be slicing:
s = range(0,51)
print s
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
print s[1:11] + s[21:31] + s[41:51]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
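The snippets above are Python 2, where range returns a list that can be added with +. In Python 3 that no longer works, so one possible sketch builds the allowed values with np.r_, which accepts slice notation, and then uses isin as before.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 0, 45, 60])

# np.r_ turns each slice into a range and joins them into one array
allowed = np.r_[1:11, 21:31, 41:51]

mask = s.isin(allowed)
print(mask.tolist())  # [True, False, True, False]
```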
You can use the cut() function to categorize your values:
In [296]: s[pd.cut(s, bins=range(0, 110, 10), labels=labels).isin(['1 - 10','21 - 30','41 - 50'])]
Out[296]:
0 1
3 23
5 45
dtype: int64
Explanation:
original series:
In [291]: s
Out[291]:
0 1
1 0
2 19
3 23
4 35
5 45
6 60
dtype: int64
labels for categories:
In [292]: labels = [ "{0} - {1}".format(i, i + 9) for i in range(1, 100, 10) ]
In [293]: labels
Out[293]:
['1 - 10',
'11 - 20',
'21 - 30',
'31 - 40',
'41 - 50',
'51 - 60',
'61 - 70',
'71 - 80',
'81 - 90',
'91 - 100']
using cut() for categorizing your series:
In [294]: pd.cut(s, bins=range(0, 110, 10), labels=labels)
Out[294]:
0 1 - 10
1 NaN
2 11 - 20
3 21 - 30
4 31 - 40
5 41 - 50
6 51 - 60
dtype: category
Categories (10, object): [1 - 10 < 11 - 20 < 21 - 30 < 31 - 40 ... 61 - 70 < 71 - 80 < 81 - 90 <
91 - 100]
select only the interesting categories:
In [295]: pd.cut(s, bins=range(0, 110, 10), labels=labels).isin(['1 - 10','21 - 30','41 - 50'])
Out[295]:
0 True
1 False
2 False
3 True
4 False
5 True
6 False
dtype: bool
and finally:
In [296]: s[pd.cut(s, bins=range(0, 110, 10), labels=labels).isin(['1 - 10','21 - 30','41 - 50'])]
Out[296]:
0 1
3 23
5 45
dtype: int64