Comparing between one value and a range of values in an array - python

I have a 2D array that looks like this:
a = [[ 0 0 0 0 0 25 30 35 40 45 50 55 60 65 70 75]
[ 4 5 6 7 8 29 34 39 44 49 54 59 64 69 74 250]]
and I also have another 1D array that looks like this:
age_array = [45,46,3,7]
Is there a way to verify that each value in age_array lies within the range given by the two values in the first column of a and, if not, move on to the next column? For example (pseudocode, where col is the column being tested):
if a[0, col] <= age_array[i] <= a[1, col]:
    return True
else:
    return False

To find, for each entry in age_array, the first column of a whose lower bound a[0][i] and upper bound a[1][i] enclose it:
a = [[0, 0, 0, 0, 0, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75],
     [4, 5, 6, 7, 8, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 250]]
age_array = [45, 46, 3, 7]
for age in age_array:
    for i in range(len(a[0])):
        # stop at the first column whose range contains this age
        if a[0][i] <= age <= a[1][i]:
            print(str(age) + ' is between ' + str(a[0][i]) + ' and ' + str(a[1][i]))
            break
This outputs:
45 is between 45 and 49
46 is between 45 and 49
3 is between 0 and 4
7 is between 0 and 7
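If a and age_array are NumPy arrays, the same first-matching-column lookup can be sketched without explicit loops by using broadcasting. This is only an illustration: the names lower, upper, inside and first_col are made up here, and reporting -1 for an age that fits no column is an assumed convention.

```python
import numpy as np

a = np.array([[0, 0, 0, 0, 0, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75],
              [4, 5, 6, 7, 8, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 250]])
age_array = np.array([45, 46, 3, 7])

lower, upper = a[0], a[1]
# inside[k, i] is True when age_array[k] lies within the range of column i
inside = (lower <= age_array[:, None]) & (age_array[:, None] <= upper)
# index of the first matching column for each age; -1 when no column matches
first_col = np.where(inside.any(axis=1), inside.argmax(axis=1), -1)
print(first_col.tolist())  # [9, 9, 0, 3]
```

The column indices agree with the loop output above: 45 and 46 fall in column 9 (45 to 49), 3 in column 0 and 7 in column 3.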

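You can convert both arrays into sets and then check whether the set built from age_array is a subset of the set built from a. Note that this only tests exact membership, not whether a value falls between two bounds.
Unfortunately I cannot post a full answer because your first array is not properly formatted.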

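Very simple to understand, but it might look quite ugly. Note that it tests for exact matches rather than ranges, and that b in the original is assumed to be age_array.
value = []
for x in range(len(a)):
    for xx in range(len(a[x])):
        for xxx in range(len(age_array)):
            if a[x][xx] == age_array[xxx]:
                value.append("true")
            else:
                value.append("false")
for flag in value:
    if flag == "true":
        pass  # it falls in the category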

Related

having trouble inserting a value into a numpy array

I'm creating a hash function, and for some reason I can't get np.insert to work. I don't understand what I'm doing wrong and need a bit of help.
import numpy as np
lis = np.arange(1,13)
key = np.array([18, 41, 22, 44, 59, 32, 31, 73])
# h(x) = x mod 13
for i in range(len(key)):
    slot = key[i] % 13
    np.insert(lis, slot, key[i])
print(lis)
lis prints as
[ 1 2 3 4 5 6 7 8 9 10 11 12]
np.insert returns a new array; that is, it does not modify lis in place but creates a copy of lis with the value inserted. So if you want to keep the results, assign them back to lis:
import numpy as np
lis = np.arange(1,13)
key = np.array([18, 41, 22, 44, 59, 32, 31, 73])
# h(x) = x mod 13
for i in range(len(key)):
    slot = key[i] % 13
    lis = np.insert(lis, slot, key[i])  # keep the returned copy
print(lis)
This gives:
[ 1 2 41 3 4 31 44 32 73 5 59 18 6 7 22 8 9 10 11 12]
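A minimal sketch of the copy behaviour on a smaller array (the name result is chosen here for illustration):

```python
import numpy as np

lis = np.arange(1, 6)
result = np.insert(lis, 2, 99)  # returns a new array with 99 placed before index 2

print(lis.tolist())     # [1, 2, 3, 4, 5]  (the original is unchanged)
print(result.tolist())  # [1, 2, 99, 3, 4, 5]
```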

Python 3: Numpy 3d array to Pandas dataframe with 1st dimension values as columns and rows/cols position paired in one column

As the long title hints, I have an array of shape [n,m,z] and I want to turn it into a Pandas DataFrame whose first column holds the [row, col] position (2nd and 3rd dimensions) and whose next n columns (13 in my case) hold the values along the 1st dimension, giving m*z rows by n value columns. I have been reading the other examples, but I haven't found any that pivot one dimension into columns.
For example, for an array of shape [3,2,4]
import numpy as np
import pandas as pd
rand_int = np.random.randint(10,90,(3,2,4))
print(rand_int)
[[[57 76 30 34]
  [21 70 10 51]]
 [[73 67 55 51]
  [78 38 50 76]]
 [[89 58 47 35]
  [45 11 61 18]]]
I want it to return as
Pair   Col1  Col2  Col3
[0,0]    57    73    89
[0,1]    76    67    58
[0,2]    30    55    47
...
[1,3]    51    76    18
Can anyone help?
You can loop over the m and z dimensions and collect the value from each of the n layers:
import pandas as pd
n = 3
m = 2
z = 4
# fixed sample data from the question, so the output is reproducible
datas = [[[57, 76, 30, 34],
          [21, 70, 10, 51]],
         [[73, 67, 55, 51],
          [78, 38, 50, 76]],
         [[89, 58, 47, 35],
          [45, 11, 61, 18]]]
res = []
for i in range(m):
    for j in range(z):
        # one row per (i, j) position: the pair, then the value from each layer
        res.append([[i, j]] + [data[i][j] for data in datas])
df = pd.DataFrame(res, columns=['Pair', 'Col1', 'Col2', 'Col3'])
print(df)
     Pair  Col1  Col2  Col3
0  [0, 0]    57    73    89
1  [0, 1]    76    67    58
2  [0, 2]    30    55    47
3  [0, 3]    34    51    35
4  [1, 0]    21    78    45
5  [1, 1]    70    38    11
6  [1, 2]    10    50    61
7  [1, 3]    51    76    18
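For larger arrays, the same table can probably be built without Python loops over the values by reshaping. The following is a sketch under the assumption that the data is a NumPy array of shape (n, m, z); the names arr, values and pairs are illustrative only.

```python
import numpy as np
import pandas as pd

arr = np.array([[[57, 76, 30, 34], [21, 70, 10, 51]],
                [[73, 67, 55, 51], [78, 38, 50, 76]],
                [[89, 58, 47, 35], [45, 11, 61, 18]]])
n, m, z = arr.shape

# Flatten each layer to length m*z, then transpose so layers become columns.
values = arr.reshape(n, m * z).T
# np.ndindex walks (row, col) positions in the same row-major order as reshape.
pairs = [list(idx) for idx in np.ndindex(m, z)]

df = pd.DataFrame(values, columns=[f"Col{k + 1}" for k in range(n)])
df.insert(0, "Pair", pd.Series(pairs, index=df.index))
print(df)
```

The column names are generated from n, so the same code should carry over to 13 layers without changes.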

How can I group the numbers by 5?

def num ():
    num = int (input("Enter a number: ") )
    while num in range (num >= 0,100) :
        num += 1
        print (num, end = " ")
num ()
My problem is that I don't know how to group the output into rows of 5 (e.g. 1 2 3 4 5, then 6 7 8 9 10 on the next line), five numbers per line. When the user inputs a number, it should count up from that number.
Here is a variation:
num = int(input("Enter a number: "))
l = list(range(num, 100))
for i in range(0, len(l), 5):
    print(" ".join(map(str, l[i:i+5])))
We take sublists of size 5 (or fewer for the last one if necessary) and use join to create a string with spaces. Since join needs strings, I use map to convert each number with str.
Example: (input 83)
83 84 85 86 87
88 89 90 91 92
93 94 95 96 97
98 99
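The slicing idea can also be wrapped in a small function so that it can be tried without typing input. This is a sketch; the name rows_of and its parameters are hypothetical.

```python
def rows_of(start, stop, width=5):
    """Return the numbers start..stop-1 as space-joined rows of `width` items."""
    numbers = list(range(start, stop))
    return [" ".join(map(str, numbers[i:i + width]))
            for i in range(0, len(numbers), width)]


rows = rows_of(83, 100)
print("\n".join(rows))
```

Calling rows_of(83, 100) reproduces the four lines shown in the example above.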
You probably want something like this. Note, this handles when user input is 0.
def num ():
    num = int (input("Enter a number: "))
    count = 0
    if num == 0:
        foo = range(0, 100)
        for num in foo:
            count += 1
            print(num, end = " ")
            if count == 5:
                count = 0
                print()
    else:
        while num in range(num >= 0,100):
            num += 1
            count += 1
            print (num, end = " ")
            if count == 5:
                count = 0
                print()
num ()
Enter a number: 13
14 15 16 17 18
19 20 21 22 23
24 25 26 27 28
29 30 31 32 33
34 35 36 37 38
39 40 41 42 43
44 45 46 47 48
49 50 51 52 53
54 55 56 57 58
59 60 61 62 63
64 65 66 67 68
69 70 71 72 73
74 75 76 77 78
79 80 81 82 83
84 85 86 87 88
89 90 91 92 93
94 95 96 97 98
99 100
from functools import partial
from itertools import islice
num = int(input("Enter a number: "))
get_sublist = lambda iterable,length: list(islice(iterable, length))
print(list(iter(partial(get_sublist, iter(range(num,100)), 5), [])))
Enter a number: 12
[[12, 13, 14, 15, 16], [17, 18, 19, 20, 21], [22, 23, 24, 25, 26], [27, 28, 29, 30, 31],
[32, 33, 34, 35, 36], [37, 38, 39, 40, 41], [42, 43, 44, 45, 46], [47, 48, 49, 50, 51],
[52, 53, 54, 55, 56], [57, 58, 59, 60, 61], [62, 63, 64, 65, 66], [67, 68, 69, 70, 71],
[72, 73, 74, 75, 76], [77, 78, 79, 80, 81], [82, 83, 84, 85, 86], [87, 88, 89, 90, 91],
[92, 93, 94, 95, 96], [97, 98, 99]]
ref : more_itertools
Check this code:
num = int (input("Enter a number: "))
i = 100
while num in range (num >= 1,100) :
    num += 1
    i += 1
    print (num, end = " ")
    if i%5 == 0:
        print()
Output:
Enter a number: 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
....
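This may be able to be condensed, but:
num = int(input("Enter a number: "))
i = num  # start counting at the number entered
while i <= 100:
    # print up to five consecutive numbers, stopping at 100
    print(*range(i, min(i + 5, 101)))
    i = i + 5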

how to sort the chunk of list in python?

I have a list of rates containing 35040 values. I have divided the list into 365 blocks of 96 elements each. Now I want the 4 smallest values from each block; to get them, I first sort each block in increasing order and then put its first 4 elements into a new list.
My approach:
import pandas as pd
inputFile = "inputFile.xlsx"
fileName = inputFile
inputSheetDF = pd.read_excel(fileName, sheet_name='Sheet1')
iexRate = inputSheetDF['IEX Price']
#iexRate = [2.3, 2.4, 3, 4, 3.2, 4.1, 5.......]
testList = []
n = 96
x = [iexRate[i:i + n] for i in range(0, len(iexRate), n)]
x.sort()
But x.sort() gives me an error:
ValueError: Can only compare identically-labeled Series objects
So basically I want testList to contain the 4 smallest elements of each 96-element block.
Here's a proposed solution, which has the advantage of being vectorized. I'm using a much smaller dataset (3 chunks of 4 each, taking the bottom 2 from each chunk), but the idea for a larger dataset is of course the same.
import numpy as np
import pandas as pd
df = pd.DataFrame({"rate": np.random.randint(1, 100, 12), "chunk": [1]*4 + [2]*4 + [3]*4})
print(df)
==>
    rate  chunk
0     81      1
1     51      1
2     50      1
3     83      1
4     33      2
5     88      2
6     97      2
7      2      2
8     22      3
9     23      3
10     4      3
11    83      3
df.sort_values("rate", inplace=True)
res = df.groupby("chunk").head(2).sort_values("chunk")
print(res)
==>
    rate  chunk
2     50      1
1     51      1
7      2      2
4     33      2
10     4      3
8     22      3
To get a flat list of all the rates, just do:
flat_list = list(res.rate)
==> [50, 51, 2, 33, 4, 22]
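Alternatively, keep your list of blocks and sort each block on its own rather than sorting the list of blocks (x.sort() tries to compare whole Series with each other, which is what raises the error). A small example with blocks of 15:
iexRate = pd.Series(range(1,100))
n = 15
x = [iexRate[i:i + n] for i in range(0, len(iexRate), n)]
testList = [sorted(block)[:4] for block in x]
print(testList)
[[1, 2, 3, 4], [16, 17, 18, 19], [31, 32, 33, 34], [46, 47, 48, 49], [61, 62, 63, 64], [76, 77, 78, 79], [91, 92, 93, 94]]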
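When the number of values is an exact multiple of the block size (as with 35040 = 365 * 96), another possible approach is to reshape the data into a 2D NumPy array and sort along each row. This is a sketch, not a drop-in replacement: the function name smallest_per_block and its parameters are hypothetical, and the sample rates are the small dataset used above.

```python
import numpy as np


def smallest_per_block(values, block_size, k):
    """Return the k smallest values of each consecutive block, in ascending order."""
    arr = np.asarray(values)
    if arr.size % block_size != 0:
        raise ValueError("number of values must be a multiple of block_size")
    blocks = arr.reshape(-1, block_size)     # one row per block
    return np.sort(blocks, axis=1)[:, :k]    # sort each row, keep the first k


rates = [81, 51, 50, 83, 33, 88, 97, 2, 22, 23, 4, 83]
result = smallest_per_block(rates, block_size=4, k=2)
print(result.tolist())  # [[50, 51], [2, 33], [4, 22]]
```

For the data in the question, the call would be smallest_per_block(iexRate.to_numpy(), 96, 4), assuming iexRate contains no missing values.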

new python pandas dataframe column based on value of variable, using function

I have a variable, 'ImageName' which ranges from 0-1600. I want to create a new variable, 'LocationCode', based on the value of 'ImageName'.
If 'ImageName' is 70 or less, I want 'LocationCode' to be 1. If 'ImageName' is between 71 and 90, I want 'LocationCode' to be 2. I have 13 different codes in all. I'm not sure how to write this in Python pandas. Here's what I tried:
def spatLoc(ImageName):
    if ImageName <=70:
        LocationCode = 1
    elif ImageName >70 and ImageName <=90:
        LocationCode = 2
    return LocationCode
df['test'] = df.apply(spatLoc(df['ImageName'])
but it returned an error. I'm clearly not defining things the right way, but I can't figure out how to.
You can just use 2 boolean masks:
df.loc[df['ImageName'] <= 70, 'Test'] = 1
df.loc[(df['ImageName'] > 70) & (df['ImageName'] <= 90), 'Test'] = 2
By using the masks you only set the value where the boolean condition is met. For the second mask you need the & operator to combine the two conditions, and each condition must be enclosed in parentheses because of operator precedence.
Actually I think it would be better to define your bin values and call cut, example:
In [20]:
df = pd.DataFrame({'ImageName': np.random.randint(0, 100, 20)})
df
Out[20]:
ImageName
0 48
1 78
2 5
3 4
4 9
5 81
6 49
7 11
8 57
9 17
10 92
11 30
12 74
13 62
14 83
15 21
16 97
17 11
18 34
19 78
In [22]:
df['group'] = pd.cut(df['ImageName'], range(0, 105, 10), right=False)
df
Out[22]:
ImageName group
0 48 [40, 50)
1 78 [70, 80)
2 5 [0, 10)
3 4 [0, 10)
4 9 [0, 10)
5 81 [80, 90)
6 49 [40, 50)
7 11 [10, 20)
8 57 [50, 60)
9 17 [10, 20)
10 92 [90, 100)
11 30 [30, 40)
12 74 [70, 80)
13 62 [60, 70)
14 83 [80, 90)
15 21 [20, 30)
16 97 [90, 100)
17 11 [10, 20)
18 34 [30, 40)
19 78 [70, 80)
Here the bin values were generated using range, but you could pass your own list of bin values. Once you have the bin values you can define a lookup dict:
In [32]:
d = dict(zip(df['group'].unique(), range(len(df['group'].unique()))))
d
Out[32]:
{'[0, 10)': 2,
'[10, 20)': 4,
'[20, 30)': 9,
'[30, 40)': 7,
'[40, 50)': 0,
'[50, 60)': 5,
'[60, 70)': 8,
'[70, 80)': 1,
'[80, 90)': 3,
'[90, 100)': 6}
You can now call map and add your new column:
In [33]:
df['test'] = df['group'].map(d)
df
Out[33]:
ImageName group test
0 48 [40, 50) 0
1 78 [70, 80) 1
2 5 [0, 10) 2
3 4 [0, 10) 2
4 9 [0, 10) 2
5 81 [80, 90) 3
6 49 [40, 50) 0
7 11 [10, 20) 4
8 57 [50, 60) 5
9 17 [10, 20) 4
10 92 [90, 100) 6
11 30 [30, 40) 7
12 74 [70, 80) 1
13 62 [60, 70) 8
14 83 [80, 90) 3
15 21 [20, 30) 9
16 97 [90, 100) 6
17 11 [10, 20) 4
18 34 [30, 40) 7
19 78 [70, 80) 1
The above can be modified to suit your needs but it's just to demonstrate an approach which should be fast and without the need to iterate over your df.
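As a possible shortcut, pd.cut also accepts a labels argument, so the codes can be assigned in one step without a lookup dict. The sketch below makes some assumptions: only the first two ranges (up to 70, and 71 to 90) come from the question, while the third bin and its code 3 are placeholders for the remaining codes.

```python
import pandas as pd

df = pd.DataFrame({"ImageName": [0, 70, 71, 90, 91, 1600]})

# Bin edges are right-inclusive by default: (-1, 70], (70, 90], (90, 1600].
# The last bin and its code are placeholders, not taken from the question.
bins = [-1, 70, 90, 1600]
codes = [1, 2, 3]

df["LocationCode"] = pd.cut(df["ImageName"], bins=bins, labels=codes).astype(int)
print(df["LocationCode"].tolist())  # [1, 1, 2, 2, 3, 3]
```

Extending this to all 13 codes would mean listing 14 bin edges and 13 labels.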
With df.apply(..., axis=1), the function is called once per row, and the parameter row is a pandas Series holding the entire row. You look up an individual column with dictionary-style notation, using the column name (here ImageName) as the key.
def spatLoc(row):
    if row['ImageName'] <= 70:
        return 1
    elif row['ImageName'] <= 90:
        return 2
    # add the remaining codes here; values above 90 currently get None
    return None
df['test'] = df.apply(spatLoc, axis=1)
