How to replace values in an array? - python

I'm beginning to study Python and ran into this:
I have an array (km_media) that has nan values,
km_media = km / (2019 - year)
This happens because the variable year contains some 2019 values.
So, for the sake of learning, I would like to know how to do two things:
How can I use replace() to substitute 0 for the nan values in the variable?
How can I print the variable that has the nan values, with the replacement applied?
What I have so far:
1.
km_media = km_media.replace('nan', 0)
print(f'{km_media.replace('nan',0)}')
Thanks

Not sure if this will do what you are looking for:
a = 2 / np.arange(5)
print(a)
array([ inf, 2. , 1. , 0.66666667, 0.5 ])
b = [i if np.isfinite(i) else 0 for i in a]
print(b)
Output:
[0, 2.0, 1.0, 0.6666666666666666, 0.5]
Or:
np.where(np.isinf(a) | np.isnan(a), 0, a)
Or:
a[np.isinf(a)] = 0
Also, for part 2 of your question, I'm not sure what you mean. If you have just replaced the inf's with 0, then you will just be printing zeros. If you want the index position of the inf's you have replaced, you can grab them before replacement:
np.where(a == np.inf)[0][0]
Output:
0 # this is the index position of np.inf in array a
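For the nan case in the question specifically, here is a minimal sketch. The sample values for km and year are made up for illustration. A plain NumPy array has no replace() method, and comparing with == np.nan never matches, so the sketch relies on np.isfinite and np.nan_to_num instead:

```python
import numpy as np

# Hypothetical sample data; the real km and year arrays come from the question.
km = np.array([0.0, 1000.0, 5000.0])
year = np.array([2019, 2019, 2015])

# Dividing by zero gives nan (0/0) or inf (x/0); silence the warnings here.
with np.errstate(divide="ignore", invalid="ignore"):
    km_media = km / (2019 - year)

# Positions of the problem values, grabbed before replacing them.
nan_or_inf_positions = np.flatnonzero(~np.isfinite(km_media))

# Two equivalent ways to get a cleaned copy with 0 in those positions.
cleaned = np.where(np.isfinite(km_media), km_media, 0.0)
cleaned_alt = np.nan_to_num(km_media, nan=0.0, posinf=0.0, neginf=0.0)

print(nan_or_inf_positions)
print(cleaned)
```

Both cleaned and cleaned_alt are new arrays, so km_media itself still holds the nan and inf values if you want to print them for comparison.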

Related

Combining two numpy arrays with equations based on both arrays

I have a python numpy 3x4 array A:
A=np.array([[0,1,2,3],[4,5,6,7],[1,1,1,1]])
and a 3x3 array B:
B=np.array([[1,1, 1],[2, 2, 2],[3,3,3]])
I am trying to use a numpy operation to produce array C where each element in C is based on an equation using corresponding elements in A and the entire row in B. A simplified example:
C[row,col] = A[row,col] * ( A[row,col] / B[row,0] + B[row,1] + B[row,2] )
My first thought was to keep it simple and just multiply all of A by a column of B, but that raises an error.
C = A * B[:,0]
Then I thought to try this but it didn't work.
C = A[:,:] * B[:,0]
I am not sure how to use the " : " operator and get access to the specific row, col at the same time. I can do this in regular loops but I wanted something more numpy.
import numpy as np
A=np.array([[0,1,2,3],[4,5,6,7],[1,1,1,1]])
B=np.array([[1,1, 1],[2, 2, 2],[3,3,3]])
C=np.zeros([3,4])
row,col = A.shape
print(A.shape)
print(A)
print(B.shape)
print(B)
print(C.shape)
print(C)
print(range(row-1))
for row in range(row):
    for col in range(col):
        C[row,col] = A[row,col] * (( A[row,col] / B[row,0]) + B[row,1] + B[row,2])
print(C)
Which prints:
(3, 4)
[[0 1 2 3]
[4 5 6 7]
[1 1 1 1]]
(3, 3)
[[1 1 1]
[2 2 2]
[3 3 3]]
(3, 4)
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
range(0, 2)
[[ 0. 3. 8. 15. ]
[24. 32.5 42. 0. ]
[ 6.33333333 6.33333333 0. 0. ]]
Suggestions on a better way?
Edited:
Now that I understand broadcasting a bit more, and got that code running, let me expand in a generic way what I am trying to solve. I am trying to map values of a category such as "Air" which can be a range (such as 0-5) that have to be mapped to a shade of a given RGB value. The values are recorded over a time period.
For example, at time 1, the value of Water is 4. The standard RGB color for Water is Blue (0,0,255). There are 5 possible values for Water. In the case of Blue, 255 / 5 = 51. To get the effect of the 4 value on the Blue palette, multiply 51 x 4 = 204. Since we want higher values to be darker, we subtract 255 (white) - 204 yielding 51. The Red and Green components end up being 0. So the value read at time N is a multiply on the weighted R, G and B values. We invert 0 values to be subtracted from 255 so they appear white. Stronger values are darker.
So to calculate the R' G' and B' for time 1 I used:
answer = data[:,1:4] - (data[:,1:4] / data[:,[0]] * data[:,[4]])
I can extract an [R, G, B] from answer and put it into an Image at some x,y. That works well. But I can't figure out how to use Range, R, G and B to calculate new R', G', B' for all of Time 1, 2, ... N. I'd like to extend the numpy approach if possible. I did it with standard loops as:
for row in range(rows):
    for col in range(cols):
        r = int(data[row,1] - (data[row,1] / data[row,0] * data[row,col_offset+col] ))
        g = int(data[row,2] - (data[row,2] / data[row,0] * data[row,col_offset+col] ))
        b = int(data[row,3] - (data[row,3] / data[row,0] * data[row,col_offset+col] ))
        almostImage[row,col] = [r,g,b]
I can display the image in matplotlib and save it to .png, etc. So I think next step is to try list comprehension over the time points 2D array, and then refer back to the range and RGB values. Will give it a try.
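One way the loop above might be expressed with broadcasting, as a rough sketch only. It assumes the column layout implied by the loop (column 0 is the range, columns 1 to 3 are R, G, B, and the time values start at col_offset = 4); the sample rows are invented for illustration:

```python
import numpy as np

# Hypothetical data: [range, R, G, B, value at time 1, time 2, time 3]
data = np.array([
    [5,   0, 0, 255, 4, 0, 5],
    [5, 255, 0,   0, 1, 2, 3],
], dtype=float)
col_offset = 4  # assumed index of the first time column

value_range = data[:, 0]        # shape (rows,)
rgb = data[:, 1:4]              # shape (rows, 3)
values = data[:, col_offset:]   # shape (rows, times)

# Insert length-1 axes so everything broadcasts to (rows, times, 3).
step = rgb[:, None, :] / value_range[:, None, None]
almost_image = (rgb[:, None, :] - step * values[:, :, None]).astype(int)

print(almost_image.shape)
print(almost_image[0, 0])
```

The astype(int) call truncates toward zero, which matches what int() does in the loop version.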
Try this:
A*(A / B[:,[0]] + B[:,1:].sum(1, keepdims=True))
Output:
array([[ 0. , 3. , 8. , 15. ],
[24. , 32.5 , 42. , 52.5 ],
[ 6.33333333, 6.33333333, 6.33333333, 6.33333333]])
Explanation:
The first operation A/B[:,[0]] utilizes numpy broadcasting.
Then B[:,1:].sum(1, keepdims=True) is just B[:,1] + B[:,2], and keepdims=True keeps the result as a (3, 1) column so that it broadcasts against the (3, 4) array. Print it to see details.
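To see the shapes involved, here is a small check of the broadcast expression against an explicit loop. The variable names n_rows, n_cols, r and c are my own. Note that the loop in the question reuses row and col both as the bounds and as the loop variables, so the inner range shrinks on each pass, which would explain the zeros in its output:

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [1, 1, 1, 1]])
B = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])

first_col = B[:, [0]]                           # shape (3, 1)
rest_sum = B[:, 1:].sum(axis=1, keepdims=True)  # shape (3, 1)
C = A * (A / first_col + rest_sum)              # broadcasts to (3, 4)

# Reference loop with separate names for the bounds and the loop variables.
n_rows, n_cols = A.shape
C_loop = np.zeros((n_rows, n_cols))
for r in range(n_rows):
    for c in range(n_cols):
        C_loop[r, c] = A[r, c] * (A[r, c] / B[r, 0] + B[r, 1] + B[r, 2])

print(first_col.shape, rest_sum.shape, C.shape)
print(np.allclose(C, C_loop))
```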

Assignment by logical indexing in numpy

I have a real-valued numpy array of size (1000,). All values lie between 0 and 1, and I want to convert this to a categorical array. All values less than 0.25 should be assigned to category 0, values between 0.25 and 0.5 to category 1, 0.5 to 0.75 to category 2, and 0.75 to 1 to category 3. Logical indexing doesn't seem to work:
Y[Y < 0.25] = 0
Y[np.logical_and(Y >= 0.25, Y < 0.5)] = 1
Y[np.logical_and(Y >= 0.5, Y < 0.75)] = 2
Y[Y >= 0.75] = 3
Result:
for i in range(4):
    print(f"Y == {i}: {sum(Y == i)}")
Y == 0: 206
Y == 1: 0
Y == 2: 0
Y == 3: 794
What needs to be done instead?
The error is in your conversion logic, not in your indexing. The final statement:
Y[Y >= 0.75] = 3
Converts not only the values in range 0.75 - 1.00, but also the prior assignments to classes 1 and 2.
You can reverse the assignment order, starting with class 3.
You can put an upper limit on the final class, although you still have a boundary problem with 1.00 vs class 1.
Perhaps best would be to harness the regularity of your divisions, such as:
Y = (4*Y).astype(int) # but you still have boundary problems.
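Another option, sketched here with made-up random data, is to write the categories into a separate array so that earlier assignments are never re-tested. np.digitize does this in one call, and with these bin edges a value of exactly 1.0 still lands in category 3:

```python
import numpy as np

# Hypothetical stand-in for the real array of 1000 values in [0, 1].
rng = np.random.default_rng(0)
Y = rng.random(1000)

# bins[i-1] <= y < bins[i] maps to category i; values >= 0.75 map to 3.
bin_edges = [0.25, 0.5, 0.75]
categories = np.digitize(Y, bin_edges)

counts = np.bincount(categories, minlength=4)
print(counts)

# Boundary behaviour on a few hand-picked values.
print(np.digitize(np.array([0.0, 0.25, 0.49, 0.5, 0.75, 1.0]), bin_edges))
```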

get value by column index where row is a specific value

I have a dataframe sheet_overview:
   Unnamed: 0 Headline Unnamed: 2 Unnamed: 3
0         nan    1. 1.   username       Erik
1         nan    1. 2.    userage         23
2         nan    1. 3.   favorite        ice
I want to get the value 23, by looking for "1. 2." in the second column.
If I don't want to rely on the column names, I have to use positional indexes. My question is whether my approach is too complicated.
It works, but it seems like too much and not very pythonic:
age = sheet_overview.iloc[
    sheet_overview[
        sheet_overview.iloc[:, 1] == '1. 2.']
    .index[0], 3]
Add .values to convert the boolean mask to a numpy array so it can be used to filter with iloc, then use next to return the first matched value, or 'missing' if there is no match:
a = sheet_overview.iloc[(sheet_overview.iloc[:, 1] == '1. 2.').values, 3]
a = next(iter(a), 'missing')
print (a)
23
If performance is important, use numba:
from numba import njit
@njit
def first_val(A, k):
    a = A[:, 0]
    b = A[:, 1]
    for i in range(len(a)):
        if a[i] == k:
            return b[i]
    return 'missing'
a = first_val(sheet_overview.iloc[:, [1,3]].values, '1. 2.')
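For anyone who wants to try the iloc lookup from earlier in this answer, here is a self-contained sketch. The frame is rebuilt by hand from the table in the question, so its dtypes are an assumption, and to_numpy() is used in place of .values:

```python
import pandas as pd

# Hand-built copy of the frame shown in the question (dtypes are assumed).
sheet_overview = pd.DataFrame({
    "Unnamed: 0": [float("nan")] * 3,
    "Headline": ["1. 1.", "1. 2.", "1. 3."],
    "Unnamed: 2": ["username", "userage", "favorite"],
    "Unnamed: 3": ["Erik", 23, "ice"],
})

# Boolean mask on the second column, by position rather than by name.
mask = (sheet_overview.iloc[:, 1] == "1. 2.").to_numpy()

# First matching value from the fourth column, or a default if none match.
age = next(iter(sheet_overview.iloc[mask, 3]), "missing")
print(age)

# The same lookup with a key that does not exist falls back to the default.
no_mask = (sheet_overview.iloc[:, 1] == "9. 9.").to_numpy()
not_found = next(iter(sheet_overview.iloc[no_mask, 3]), "missing")
print(not_found)
```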

np.logical_and operator not working as it should (numpy)

I have a dataset (ndarray, float 32), for example:
[-3.4028235e+38 -3.4028235e+38 -3.4028235e+38 ... 1.2578617e-01
1.2651859e-01 1.3053264e-01] ...
I want to remove all values below 0 or greater than 1, so I use:
with rasterio.open(raster_file) as src:
    h = src.read(1)
i = h[0]
i[np.logical_and(i >= 0.0, i <= 1.0)]
Obviously the first entries (i.e. -3.4028235e+38) should be removed, but they still appear after the operator is applied. I'm wondering if this is related to the scientific notation and whether a pre-step is required, but I can't see what exactly. Any ideas?
To simplify this, here is the code again:
pp = [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, 1.2578617e-01, 1.2651859e-01, 1.3053264e-01]
pp[np.logical_and(pp >= 0.0, pp <= 1.0)]
print (pp)
And the result
pp = [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38, 0.12578617, 0.12651859, 0.13053264]
So the first 3 entries still remain.
The problem is that you are not removing the elements you selected. You are just selecting them.
If you want to remove them, you should probably convert them to nans, like this:
from numpy import random, nan, logical_and
a = random.randn(10, 3)
print(a)
a[logical_and(a > 0, a < 1)] = nan
print(a)
Output example
[[-0.95355719 nan nan]
[-0.21268393 nan -0.24113676]
[-0.58929128 nan nan]
[ nan -0.89110972 nan]
[-0.27453321 1.07802157 1.60466863]
[-0.34829213 nan 1.51556019]
[-0.4890989 nan -1.08481203]
[-2.17016962 nan -0.65332871]
[ nan 1.58937678 1.79992471]
[ nan -0.91716538 1.60264461]]
Alternatively, you can look into masked arrays.
Silly mistake: I had to wrap the list in a numpy array, then assign the filtered result to a new variable, like so:
j = np.array(pp)
mask = j[np.logical_and(j >= 0.0, j <= 1.0)]
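To make the difference between selecting and modifying concrete, here is a short sketch using the sample values from the question. The names keep, filtered and masked are my own. Boolean indexing returns a new array and leaves the original untouched, and np.ma.masked_where is one way to get the masked-array variant mentioned above:

```python
import numpy as np

pp = np.array(
    [-3.4028235e+38, -3.4028235e+38, -3.4028235e+38,
     1.2578617e-01, 1.2651859e-01, 1.3053264e-01],
    dtype=np.float32,
)

# Boolean mask of the values to keep.
keep = (pp >= 0.0) & (pp <= 1.0)

# Indexing with the mask builds a new, shorter array; pp is unchanged.
filtered = pp[keep]
print(filtered)
print(pp.size, filtered.size)

# Masked-array alternative: same shape as pp, unwanted entries hidden.
masked = np.ma.masked_where(~keep, pp)
print(masked.count())
```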

Finding an interval from a list

I have a list of cost values for a project. The list c says that the project started in year 1 (assuming c[0] = year 1), was suspended in year 3, and was completed at the end of year 4. So, there is no associated cost in year 5.
c = [3000000.0, 3000000.0, 0.0, 200000.0, 0.0]
From the list, I want to find the project length, which should be 4 in the above example, not the 3 that my way of counting gives. If the list were as follows:
d = [3000000.0, 3000000.0, 100000.0, 200000.0, 0.0]
I could have the following to solve my problem:
Input:
cc = 0
for i in d:
    if i>0:
        cc += 1
Output:
cc = 4
However, it does not really work when there is a suspension(gap) between two years. Any suggestions to make it work?
So, you want to find the position of last 0 in the list.
Look at this question
What I think is the best approach from the link above is:
last_0 = max(loc for loc, val in enumerate(c) if val == 0)
You can also calculate the first 0:
first_0 = min(loc for loc, val in enumerate(c) if val == 0)
And their difference is the length.
In one block:
zeros_indices = [loc for loc, val in enumerate(c) if val == 0]
length = max(zeros_indices) - min(zeros_indices)
If you want to find the last index that is not 0, plus 1 (indices start at 0), you can do:
>>> c = [3000000.0, 3000000.0, 0.0, 200000.0, 0.0, 0.0]
>>> cc = [i for i, v in enumerate(c) if v != 0][-1] + 1
>>> cc
4
EDIT :
You can use numpy to ignore the leading zeros in the list as well:
>>> c = [0.0, 0.0, 3000000.0, 3000000.0, 0.0, 200000.0, 0.0, 0.0]
>>> import numpy as np
>>> np.trim_zeros(c)
[3000000.0, 3000000.0, 0.0, 200000.0]
>>> len(np.trim_zeros(c))
4
You could look backwards through the list until you find the index of an element which has a value greater than zero, and if you add one to that index it will be the length in years with gaps allowed.
To include gap years to your counting, you can check that the current item is not the last one in the list.
You would better do this by iterating over indexes instead of elements to avoid having to find out your position within the list at each iteration.
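cc = 0
for i in xrange(len(c)):
    if c[i] > 0 and i < (len(c) - 1):
        cc += 1
The xrange function (Python 2) yields the indices from 0 to len(c) - 1 lazily. You could use range instead, but in Python 2 range builds the whole list in memory while xrange does not. In Python 3, range is already lazy and xrange no longer exists.
EDIT: The loop above does not handle multiple zeros at the end of the list c. Counting the trailing zeros from the end does: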
c = [3000000.0, 3000000.0, 0.0, 0.0, 200000.0, 0.0]
cc = 0
templength = 0
for i in reversed(c):
    if i == 0:
        templength += 1
    else:
        break
print (len(c)-templength)
output:
5
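Pulling the ideas from these answers together, here is a minimal Python 3 sketch. The helper name project_length is made up for illustration, and it assumes the project runs from the first nonzero cost to the last nonzero cost, with gap years in between counted:

```python
def project_length(costs):
    """Years from the first nonzero cost through the last one, gaps included."""
    nonzero_years = [i for i, cost in enumerate(costs) if cost != 0]
    if not nonzero_years:
        return 0  # no costs at all, so no project years
    return nonzero_years[-1] - nonzero_years[0] + 1


c = [3000000.0, 3000000.0, 0.0, 200000.0, 0.0]
d = [3000000.0, 3000000.0, 100000.0, 200000.0, 0.0]
padded = [0.0, 0.0, 3000000.0, 3000000.0, 0.0, 200000.0, 0.0, 0.0]

print(project_length(c))
print(project_length(d))
print(project_length(padded))
```

If leading zeros should count as project years, drop the subtraction of nonzero_years[0] and return nonzero_years[-1] + 1 instead.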
