Having trouble inserting a value into a numpy array - Python

I'm creating a hash function, and for some reason I can't get the np.insert function to work. I don't understand what I'm doing wrong and need a bit of help.
import numpy as np

lis = np.arange(1, 13)
key = np.array([18, 41, 22, 44, 59, 32, 31, 73])
# h(x) = x mod 13
for i in range(len(key)):
    slot = key[i] % 13
    np.insert(lis, slot, key[i])
print(lis)
but lis still prints as
[ 1 2 3 4 5 6 7 8 9 10 11 12]

np.insert does not modify lis in place; it returns a new array, i.e. a copy of lis with the value inserted. So if you want to store the results back into lis, assign the return value:
lis = np.arange(1, 13)
key = np.array([18, 41, 22, 44, 59, 32, 31, 73])
# h(x) = x mod 13
for i in range(len(key)):
    slot = key[i] % 13
    lis = np.insert(lis, slot, key[i])
print(lis)
This gives,
[ 1 2 41 3 4 31 44 32 73 5 59 18 6 7 22 8 9 10 11 12]
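As a side check (values made up, not from the question), you can see that np.insert leaves its argument untouched and returns a new array:
import numpy as np

lis = np.arange(1, 13)
result = np.insert(lis, 0, 99)
print(lis[:3])     # [1 2 3]   -- lis is unchanged
print(result[:3])  # [99  1  2] -- the insertion lives only in the returned copy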


How to merge 2 lists and sort them based on index?

I have the following dataframe:
idx  val1  val2
0    15    12
1    14    38
2    11    88
3    95    21
4    19    98
5    12    48
6    35    38
7    25    39
8    65    28
I created two lists based on the index, say:
list1 = [0, 3, 6]
list2 = [5, 8]
I am trying to write code so that the indices in list1 take their values from val1 and the indices in list2 take theirs from val2, with the combined result sorted by index.
My output list should be
output = [15, 95, 48, 35, 28]
The solution is something like:
pd.concat([df1, df2], axis=0).sort_index()
Please provide a minimal reproducible example so that a more specific solution can be given.
Try:
x = df.loc[df['idx'].isin(list1), 'val1']
y = df.loc[df['idx'].isin(list2), 'val2']
x = pd.concat([x, y]).sort_index().to_list()
print(x)
Prints:
[15, 95, 48, 35, 28]
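For reference, here is a self-contained version of that answer, with the dataframe rebuilt from the table in the question:
import pandas as pd

df = pd.DataFrame({
    "idx": range(9),
    "val1": [15, 14, 11, 95, 19, 12, 35, 25, 65],
    "val2": [12, 38, 88, 21, 98, 48, 38, 39, 28],
})
list1 = [0, 3, 6]
list2 = [5, 8]

x = df.loc[df["idx"].isin(list1), "val1"]
y = df.loc[df["idx"].isin(list2), "val2"]
# .loc keeps the original row labels, so sort_index() restores index order
output = pd.concat([x, y]).sort_index().to_list()
print(output)  # [15, 95, 48, 35, 28]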

How to sort chunks of a list in Python?

I have a list of rates containing about 35,040 values. I have divided the list into 365 blocks of 96 elements each. Now I want to get the 4 smallest values from each block; to achieve that, I first sort each block in increasing order and then insert its first 4 elements into a new list.
My approach:
import pandas as pd
inputFile = "inputFile.xlsx"
fileName = inputFile
inputSheetDF = pd.read_excel(fileName, sheet_name='Sheet1')
iexRate = inputSheetDF['IEX Price']
#iexRate = [2.3, 2.4, 3, 4, 3.2, 4.1, 5.......]
testList = []
n = 96
x = [iexRate[i:i + n] for i in range(0, len(iexRate), n)]
x.sort()
but this x.sort() giving me an error.
ValueError: Can only compare identically-labeled Series objects
So basically I want testList to contain the 4 smallest elements from each block of 96.
Here's a proposed solution, which has the advantage of being vectorized. I'm using a much smaller dataset - 3 chunks of 4 each, taking the bottom 2 from each chunk - but the idea for a larger dataset is of course the same.
import numpy as np
import pandas as pd

df = pd.DataFrame({"rate": np.random.randint(1, 100, 12), "chunk": [1]*4 + [2]*4 + [3]*4})
print(df)
==>
rate chunk
0 81 1
1 51 1
2 50 1
3 83 1
4 33 2
5 88 2
6 97 2
7 2 2
8 22 3
9 23 3
10 4 3
11 83 3
df.sort_values("rate", inplace=True)
res = df.groupby("chunk").head(2).sort_values("chunk")
print(res)
==>
rate chunk
2 50 1
1 51 1
7 2 2
4 33 2
10 4 3
8 22 3
To get a flat list of all the rates, just do:
flat_list = list(res.rate)
==> [50, 51, 2, 33, 4, 22]
Alternatively, sort each block individually and take its first four elements; here shown on a toy Series:
iexRate = pd.Series(range(1, 100))
n = 15
x = [iexRate[i:i + n] for i in range(0, len(iexRate), n)]
testList = [sorted(block)[:4] for block in x]
print(testList)
[[1, 2, 3, 4], [16, 17, 18, 19], [31, 32, 33, 34], [46, 47, 48, 49], [61, 62, 63, 64], [76, 77, 78, 79], [91, 92, 93, 94]]
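As another sketch, pandas groupby with nsmallest avoids sorting each block by hand (assuming iexRate is a Series as above):
import pandas as pd

iexRate = pd.Series(range(1, 100))
n = 15
# Group by block number (integer division of the positional index),
# then take the 4 smallest values per block.
smallest = iexRate.groupby(iexRate.index // n).nsmallest(4)
testList = [chunk.tolist() for _, chunk in smallest.groupby(level=0)]
print(testList)
# [[1, 2, 3, 4], [16, 17, 18, 19], ..., [91, 92, 93, 94]]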

Comparing one value against a range of values in an array

I have a 2D array that looks like this:
a = [[ 0 0 0 0 0 25 30 35 40 45 50 55 60 65 70 75]
[ 4 5 6 7 8 29 34 39 44 49 54 59 64 69 74 250]]
and I also have another 1D array that looks like this:
age_array = [45,46,3,7]
Is there a way to verify whether each value in age_array falls between the two values in a column of a, and if not, move on to the next column? For example, something like:
if a[0: , :] <= age_array[i] <= a[1:, :]
return True
else: return False
If you want to know, for each entry in the age array, which column's range [a[0][i], a[1][i]] it falls into:
a = [[0, 0, 0, 0, 0, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75],
     [4, 5, 6, 7, 8, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 250]]
age_array = [45, 46, 3, 7]

for age in age_array:
    for i in range(len(a[0])):
        if a[0][i] <= age and age <= a[1][i]:
            print(str(age) + ' is between ' + str(a[0][i]) + ' and ' + str(a[1][i]))
            break
This outputs:
45 is between 45 and 49
46 is between 45 and 49
3 is between 0 and 4
7 is between 0 and 7
You can convert both arrays to sets and then check whether the age_array set is a subset of the a set (this would check exact membership rather than ranges).
Unfortunately I cannot post an answer, as your first array is not properly formatted.
Very simple to understand, but it might look quite ugly (note this compares for exact equality rather than checking ranges):
value = []
b = age_array  # the list of ages from the question
for x in range(len(a)):
    for xx in range(len(a[x])):
        for xxx in range(len(b)):
            if a[x][xx] == b[xxx]:
                value.append("true")
            else:
                value.append("false")

for v in value:
    if v == "true":
        # it falls in the category
        pass
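For a fully vectorized check of the ranges, here is a numpy broadcasting sketch (assuming, as in the question, that row a[0] holds the lower bounds and a[1] the upper bounds of each column):
import numpy as np

a = np.array([[0, 0, 0, 0, 0, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75],
              [4, 5, 6, 7, 8, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 250]])
age_array = np.array([45, 46, 3, 7])

# in_range[i, j] is True when age_array[i] falls in column j's range
in_range = (a[0] <= age_array[:, None]) & (age_array[:, None] <= a[1])
print(in_range.any(axis=1))  # [ True  True  True  True]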

New pandas DataFrame column based on the value of a variable, using a function

I have a variable, 'ImageName', which ranges from 0 to 1600. I want to create a new variable, 'LocationCode', based on the value of 'ImageName'.
If 'ImageName' is 70 or less, I want 'LocationCode' to be 1. If 'ImageName' is between 71 and 90, I want 'LocationCode' to be 2. I have 13 different codes in all. I'm not sure how to write this in python pandas. Here's what I tried:
def spatLoc(ImageName):
    if ImageName <= 70:
        LocationCode = 1
    elif ImageName > 70 and ImageName <= 90:
        LocationCode = 2
    return LocationCode

df['test'] = df.apply(spatLoc(df['ImageName']))
but it returned an error. I'm clearly not defining things the right way but I can't figure out how to.
You can just use 2 boolean masks:
df.loc[df['ImageName'] <= 70, 'Test'] = 1
df.loc[(df['ImageName'] > 70) & (df['ImageName'] <= 90), 'Test'] = 2
By using the masks you only set the value where the boolean condition is met. For the second mask you need the & operator to combine the two conditions, and each condition must be enclosed in parentheses because of operator precedence.
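If you end up needing all 13 codes, np.select keeps the conditions and codes in parallel lists; here is a sketch, with made-up cutoffs beyond 90 (not from the question):
import numpy as np
import pandas as pd

df = pd.DataFrame({'ImageName': [10, 75, 95, 150]})  # made-up values
conditions = [
    df['ImageName'] <= 70,
    (df['ImageName'] > 70) & (df['ImageName'] <= 90),
    (df['ImageName'] > 90) & (df['ImageName'] <= 120),  # placeholder cutoff
]
choices = [1, 2, 3]  # extend both lists up to your 13 codes
df['LocationCode'] = np.select(conditions, choices, default=-1)
print(df['LocationCode'].tolist())  # [1, 2, 3, -1]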
Actually I think it would be better to define your bin values and call cut, example:
In [20]:
df = pd.DataFrame({'ImageName': np.random.randint(0, 100, 20)})
df
Out[20]:
ImageName
0 48
1 78
2 5
3 4
4 9
5 81
6 49
7 11
8 57
9 17
10 92
11 30
12 74
13 62
14 83
15 21
16 97
17 11
18 34
19 78
In [22]:
df['group'] = pd.cut(df['ImageName'], range(0, 105, 10), right=False)
df
Out[22]:
ImageName group
0 48 [40, 50)
1 78 [70, 80)
2 5 [0, 10)
3 4 [0, 10)
4 9 [0, 10)
5 81 [80, 90)
6 49 [40, 50)
7 11 [10, 20)
8 57 [50, 60)
9 17 [10, 20)
10 92 [90, 100)
11 30 [30, 40)
12 74 [70, 80)
13 62 [60, 70)
14 83 [80, 90)
15 21 [20, 30)
16 97 [90, 100)
17 11 [10, 20)
18 34 [30, 40)
19 78 [70, 80)
Here the bin values were generated using range, but you could pass your own list of bin edges. Once you have the bins, you can define a lookup dict:
In [32]:
d = dict(zip(df['group'].unique(), range(len(df['group'].unique()))))
d
Out[32]:
{'[0, 10)': 2,
'[10, 20)': 4,
'[20, 30)': 9,
'[30, 40)': 7,
'[40, 50)': 0,
'[50, 60)': 5,
'[60, 70)': 8,
'[70, 80)': 1,
'[80, 90)': 3,
'[90, 100)': 6}
You can now call map and add your new column:
In [33]:
df['test'] = df['group'].map(d)
df
Out[33]:
ImageName group test
0 48 [40, 50) 0
1 78 [70, 80) 1
2 5 [0, 10) 2
3 4 [0, 10) 2
4 9 [0, 10) 2
5 81 [80, 90) 3
6 49 [40, 50) 0
7 11 [10, 20) 4
8 57 [50, 60) 5
9 17 [10, 20) 4
10 92 [90, 100) 6
11 30 [30, 40) 7
12 74 [70, 80) 1
13 62 [60, 70) 8
14 83 [80, 90) 3
15 21 [20, 30) 9
16 97 [90, 100) 6
17 11 [10, 20) 4
18 34 [30, 40) 7
19 78 [70, 80) 1
The above can be modified to suit your needs but it's just to demonstrate an approach which should be fast and without the need to iterate over your df.
You use dictionary-style lookup to find a field within a row; the field name is ImageName. In the spatLoc() function below, the parameter row holds an entire row (it behaves like a dictionary), and you look up an individual column by using the field name as the key.
def spatLoc(row):
    if row['ImageName'] <= 70:
        LocationCode = 1
    elif row['ImageName'] > 70 and row['ImageName'] <= 90:
        LocationCode = 2
    else:
        LocationCode = None  # add elif branches here for the remaining codes
    return LocationCode

df['test'] = df.apply(spatLoc, axis=1)
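As a quick usage check of the row-wise apply above (the ImageName values here are made up):
import pandas as pd

# Uses the spatLoc defined above; the third row falls outside the two
# coded ranges and so gets the placeholder None.
df = pd.DataFrame({'ImageName': [50, 85, 120]})
df['test'] = df.apply(spatLoc, axis=1)
print(df['test'].tolist())  # [1, 2, None]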

Numpy array operation using another array as indices

I want to do a multidimensional array operation using numpy on three arrays, of which one is an index array, e.g.:
a = numpy.arange(20).reshape((5, 4))
# a = [[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11] [12 13 14 15] [16 17 18 19]]
b = numpy.arange(24).reshape(((3, 2, 4)))
# b = [[[ 0 1 2 3] [ 4 5 6 7]] [[ 8 9 10 11] [12 13 14 15]] [[16 17 18 19] [20 21 22 23]]]
c = numpy.array([0,0,1,1,2])
# c = [0 0 1 1 2]
Now, what I want is:
d = a * b[&] + b[&&]
where & denotes the second element along b's second dimension (e.g. [4 5 6 7]) and && the first element (e.g. [0 1 2 3]), both taken from the i-th item of b's first dimension, where i comes from array c (e.g. c[0] = 0 selects the first item of b's first dimension). d has the same shape as a.
Edit: the answer for the above example is:
# d = [[0 6 14 24] [16 26 38 52] [104 126 150 176] [152 178 206 236] [336 374 414 456]]
Thanks
>>> a * b[c,1,:] + b[c,0,:]
array([[ 0, 6, 14, 24],
[ 16, 26, 38, 52],
[104, 126, 150, 176],
[152, 178, 206, 236],
[336, 374, 414, 456]])
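To spell out what that fancy indexing does: b[c, 1, :] stacks b[c[i], 1, :] for each i, so each row of a is multiplied by the matching second sub-row of b and the matching first sub-row is added. A small verification of the first row:
import numpy as np

a = np.arange(20).reshape((5, 4))
b = np.arange(24).reshape((3, 2, 4))
c = np.array([0, 0, 1, 1, 2])

d = a * b[c, 1, :] + b[c, 0, :]
# First row by hand: a[0] * b[0, 1] + b[0, 0]
print(a[0] * b[0, 1] + b[0, 0])  # [ 0  6 14 24]
print(d[0])                      # [ 0  6 14 24]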
