Python: check if float value is missing

I am trying to find "missing" values in a python array of floats.
For example, given [1.1, 1.3, 2.1, 2.2, 2.3], I would like to print "1.2".
I don't have much experience with floats. I have tried something like How to find a missing number from a list?, but it doesn't work on floats.
Thanks!

To solve this, the problem needs to be simplified first. I am assuming that all values are floats with one decimal place, that there can be multiple ranges such as 1.1-1.3 and 2.1-2.3, and that the numbers are sorted. Note that this only reports a gap of exactly one missing value between two neighbouring numbers. Here is a solution, written in Python 3:
vals = [1.1, 1.3, 2.1, 2.2, 2.3]  # The values in which to find the missing number
# The logic starts here: compare integer tenths so float noise cannot break the == test
for i in range(len(vals) - 1):
    if round(vals[i + 1] * 10) - round(vals[i] * 10) == 2:
        print((round(vals[i] * 10) + 1) / 10)
print("\nfinished")

You might want to use numpy.arange (https://numpy.org/doc/stable/reference/generated/numpy.arange.html) to create a list of floats, if you know the start, end and step values.
Then you can create two sets and use their difference to find the missing values.
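A sketch of this approach, assuming the start, end and step are known (here 1.1, 2.3 and 0.1, picked to match the question's list). np.setdiff1d does the set difference on arrays. Note that it reports every grid value that is absent, so the whole stretch between the two runs (1.4 to 2.0) counts as missing too:

```python
import numpy as np

vals = [1.1, 1.3, 2.1, 2.2, 2.3]
# Assumed grid parameters; every value between start and end is "expected"
start, end, step = 1.1, 2.3, 0.1
# Build the grid from an integer count instead of a float stop value,
# because np.arange with a float step may include or drop the end point
n_points = round((end - start) / step) + 1
expected = np.round(start + step * np.arange(n_points), 1)
# Round both sides to one decimal so tiny float errors do not break equality
missing = np.setdiff1d(expected, np.round(vals, 1))
print(missing)
```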

Simplest yet dumb way:
Split each float into its integer and decimal parts.
Take the Cartesian product of both to generate the full array.
Use a set symmetric difference (^, i.e. XOR) to find the missing ones.
from itertools import product
source = [1.1, 1.3, 2.1, 2.2, 2.3]
separated = [str(n).split(".") for n in source]
integers, decimals = map(set, zip(*separated))
products = [float(f"{i}.{d}") for i, d in product(integers, decimals)]
print(*(set(products) ^ set(source)))
output:
1.2

I guess the solutions to the question you quote probably work in your case; you just need to replace the built-in range function with numpy.arange, which lets you create a range of numbers with a float step.
Something like this (just a simple example):
import numpy as np
np_range = np.arange(1, 2, 0.1)
float_list = [1.2, 1.3, 1.4, 1.6]
for i in np_range:
    if round(i, 1) not in float_list:
        print(round(i, 1))
output:
1.0
1.1
1.5
1.7
1.8
1.9
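An alternative that avoids a float step entirely and stays closer to the integer range solutions in the linked question: scale the values to integer tenths, use the built-in range, then scale back. A sketch assuming every value has exactly one decimal place:

```python
vals = [1.1, 1.3, 2.1, 2.2, 2.3]
# Convert to integer tenths; round() absorbs any tiny error from the multiplication
tenths = {round(v * 10) for v in vals}
# Plain integer range over the full span, then back to floats for the gaps
missing = [n / 10 for n in range(min(tenths), max(tenths) + 1) if n not in tenths]
print(missing)
```

Because the comparison happens on integers, membership tests are exact and no rounding of the grid itself is needed.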

This is an absolutely AWFUL way to do this, but depending on how many numbers you have in the list and how difficult the other solutions are, you might appreciate it.
You could write:
firstvalue = 1.1
secondvalue = 1.2
thirdvalue = 1.3
#assign these for every value you are keeping track of
if firstvalue in val:  # (or whatever you named your list)
    print("1.1 is in the list")
else:
    print("1.1 is missing!")
if secondvalue in val:
    print("1.2 is in the list")
else:
    print("1.2 is missing!")
# etc. for every value in the list. It's tedious and dumb, but if you have few enough values in your list it might be your simplest option
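The same check can be written once as a loop over a list of expected values, which avoids one variable per value. A sketch, assuming you write the expected values by hand and they have one decimal place:

```python
vals = [1.1, 1.3, 2.1, 2.2, 2.3]
expected = [1.1, 1.2, 1.3]  # hand-written values to check for (an assumption)
# Round the list entries so tiny float errors do not break the membership test
present = {round(v, 1) for v in vals}
missing = []
for value in expected:
    if value in present:
        print(f"{value} is in the list")
    else:
        print(f"{value} is missing!")
        missing.append(value)
```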

With numpy
import numpy as np
arr = [1.1, 1.3, 2.1, 2.2, 2.3]
find_gaps = np.array(arr).round(1)
find_gaps[np.r_[np.diff(find_gaps).round(1), False] == 0.2] + 0.1
Output
array([1.2])
Test with random data
import numpy as np
np.random.seed(10)
arr = np.arange(0.1, 10.4, 0.1)
mask = np.random.randint(0,2, len(arr)).astype(bool)
gaps = arr[mask]
print(gaps)
find_gaps = np.array(gaps).round(1)
print('missing values:')
print(find_gaps[np.r_[np.diff(find_gaps).round(1), False] == 0.2] + 0.1)
Output
[ 0.1 0.2 0.4 0.6 0.7 0.9 1. 1.2 1.3 1.6 2.2 2.5 2.6 2.9
3.2 3.6 3.7 3.9 4. 4.1 4.2 4.3 4.5 5. 5.2 5.3 5.4 5.6
5.8 5.9 6.1 6.4 6.8 6.9 7.3 7.5 7.6 7.8 7.9 8.1 8.7 8.9
9.7 9.8 10. 10.1]
missing values:
[0.3 0.5 0.8 1.1 3.8 4.4 5.1 5.5 5.7 6. 7.4 7.7 8. 8.8 9.9]
More general solution
Find all missing value with specific gap size
import numpy as np
def find_missing(find_gaps, gaps = 1):
    find_gaps = np.array(find_gaps)
    gaps_diff = np.r_[np.diff(find_gaps).round(1), False]
    gaps_index = find_gaps[(gaps_diff >= 0.2) & (gaps_diff <= round(0.1*(gaps + 1),1))]
    gaps_values = np.searchsorted(find_gaps, gaps_index)
    ranges = np.vstack([(find_gaps[gaps_values]+0.1).round(1),find_gaps[gaps_values+1]]).T
    return np.concatenate([np.arange(start, end, 0.1001) for start, end in ranges]).round(1)
vals = [0.1,0.3, 0.6, 0.7, 1.1, 1.5, 1.8, 2.1]
print('Vals:', vals)
print('gap=1', find_missing(vals, gaps = 1))
print('gap=2', find_missing(vals, gaps = 2))
print('gap=3', find_missing(vals, gaps = 3))
Output
Vals: [0.1, 0.3, 0.6, 0.7, 1.1, 1.5, 1.8, 2.1]
gap=1 [0.2]
gap=2 [0.2 0.4 0.5 1.6 1.7 1.9 2. ]
gap=3 [0.2 0.4 0.5 0.8 0.9 1. 1.2 1.3 1.4 1.6 1.7 1.9 2. ]

Related

What's the best way for looping through pandas df and comparing 2 different dataframes then performing division on values returned?

I'm currently writing Python code that compares offensive and defensive stats in basketball and I want to be able to create weights with the given stats. I have my stats saved in a dataframe according to: team, position, and other numerical stats. I want to be able to loop through each team and their respective positions and corresponding stats. e.g.:
['DAL', 'C', 0.0, 3.0, 0.5, 0.4, 0.5, 0.7, 6.4] vs ['BOS', 'C', 1.7, 6.0, 2.1, 0.1, 0.7, 1.9, 9.0]
So I would like to compare BOS vs DAL at the C position and compare points, rebounds, assists etc. If one is greater than the other then divide the greater by the lesser.
The best thing I have so far is to convert the dataframes to NumPy arrays, then loop through those and append the matching rows to an empty list:
df1 = df1.to_numpy()
df2 = df2.to_numpy()
df1_array = []
df2_array = []
for x in range(len(df1)):
    for a, h in zip(away, home):
        if df1[x][0] == a or df1[x][0] == h:
            df1_array.append(df1[x])
After I get the new arrays I would then loop through them again to compare values, however I feel like this is too rudimentary. What could be a more efficient or smarter way of executing this?
Use numpy.where to compare rows and return the truth value of ('team1' > 'team2') element-wise:
import pandas as pd
import numpy as np
# Creating the dataframe
team1 = ['DAL', 'C', 0.1, 3.0, 0.5, 0.4, 0.5, 0.7, 6.4]
team2 = ['BOS', 'C', 1.7, 6.0, 2.1, 0.1, 0.7, 1.9, 9.0]
df = pd.DataFrame(
{'team1':team1,
'team2':team2,
})
# Select the rows that contain numbers
df2 = df.iloc[2:].copy()
# Make the comparison: if team1 is larger than team2 then team1/team2, and vice versa.
df2['result'] = np.where(df2['team1']>df2['team2'], \
df2['team1']/df2['team2'], \
df2['team2']/df2['team1'])
df['result'] = df2['result'].fillna(0)
This yields
team1 team2 result
0 DAL BOS NaN
1 C C NaN
2 0.1 1.7 17.0
3 3.0 6.0 2.0
4 0.5 2.1 4.2
5 0.4 0.1 4.0
6 0.5 0.7 1.4
7 0.7 1.9 2.714286
8 6.4 9.0 1.40625
Be careful with the 0 in the first column of values in your problem description, though: I changed it to 0.1, as otherwise it will cause a division-by-zero error.
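Because the rule is "divide the greater by the lesser", the same ratio can also be written as np.maximum / np.minimum, with no np.where and with purely numeric columns (the mixed frame above holds strings and numbers, so its columns are object dtype). A sketch using only the seven numeric stats from the example:

```python
import numpy as np
import pandas as pd

# Numeric stats only; the team and position labels are kept out of this frame
stats = pd.DataFrame({
    "team1": [0.1, 3.0, 0.5, 0.4, 0.5, 0.7, 6.4],  # DAL, C (0.0 replaced by 0.1, as in the answer)
    "team2": [1.7, 6.0, 2.1, 0.1, 0.7, 1.9, 9.0],  # BOS, C
})
# "Greater divided by lesser" is max / min, computed row by row
stats["result"] = (np.maximum(stats["team1"], stats["team2"])
                   / np.minimum(stats["team1"], stats["team2"]))
print(stats)
```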

Why are the values different when iterating them in a for loop, than when printing the whole array? [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 2 years ago.
In Python, using NumPy, the values look different when I print them one at a time while iterating than when I print the whole array. Why, and how can I fix this? I would like them to be e.g. 0.8 instead of 0.7999999999999999.
>>> import numpy as np
>>> b = np.arange(0.5,2,0.1)
>>> for value in b:
...     print(value)
...
0.5
0.6
0.7
0.7999999999999999
0.8999999999999999
0.9999999999999999
1.0999999999999999
1.1999999999999997
1.2999999999999998
1.4
1.4999999999999998
1.5999999999999996
1.6999999999999997
1.7999999999999998
1.8999999999999997
>>> print(b)
[0.5 0.6 0.7 0.8 0.9 1. 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9]
>>>
This happens because Python and NumPy use binary floating-point arithmetic, in which some numbers, e.g. 0.1, cannot be represented exactly.
Also see the Python tutorial section "Floating Point Arithmetic: Issues and Limitations".
You can use NumPy's np.around for this:
>>> b = np.around(b, 1)  # first arg is the np array, second arg is the number of decimal places
>>> for value in b:
...     print(value)
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
If you only need to change how the values are printed, you can use a format string such as "%.1f":
>>> for value in b:
...     print("%.1f" % value)
...
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
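The same display-only formatting with an f-string; like "%.1f", it changes only the printed text, not the values stored in b:

```python
import numpy as np

b = np.arange(0.5, 2, 0.1)
# Format each element with one decimal place; b itself is untouched
labels = [f"{value:.1f}" for value in b]
print(" ".join(labels))
```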
You can also use the built-in round to round each value to a given number of decimal places:
for value in b:
    print(round(value, 2))
What I think is that the __str__ method of ndarray is simply implemented that way. There is nothing strange about this behaviour: when you use
print(b)
ndarray.__str__ is called, which formats the array for readability using NumPy's print options (by default at most 8 digits after the decimal point), so 0.7999999999999999 is displayed as 0.8. When you print inside the for loop, each element is printed with the __str__ of a single float, which shows the shortest representation of the stored value.
Hope that is clear and helpful.
:)
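A quick way to see this, sketched with NumPy's print-option API (np.get_printoptions and the np.printoptions context manager): raising the array-print precision makes the extra digits appear, while the stored values never change.

```python
import numpy as np

b = np.arange(0.5, 2, 0.1)
# Default array-print precision (digits after the decimal point)
print(np.get_printoptions()["precision"])
# Array printing rounds to that precision, hiding the representation error
print(b)
# Temporarily raise the precision inside the block; b itself is unchanged
with np.printoptions(precision=17):
    print(b)
# Converting to Python floats and printing shows each value's shortest repr
print(b.tolist())
```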

Sort list based on another list

I have two lists in Python 3.6, and I would like to sort w according to the values in d. This is similar to the question
Sorting list based on values from another list?, but I could not use zip because w and d are not paired data.
Here is a code sample; I want to get the t variable.
Update:
I could do it using a for loop. Is there a faster way?
import numpy as np
w = np.arange(0.0, 1.0, 0.1)
t = np.zeros(10)
d = np.array([3.1, 0.2, 5.3, 2.2, 4.9, 6.1, 7.7, 8.1, 1.3, 9.4])
ind = np.argsort(d)
print('w', w)
print('d', d)
for i in range(10):
    t[ind[i]] = w[i]
print('t', t)
#w [ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
#d [ 3.1 0.2 5.3 2.2 4.9 6.1 7.7 8.1 1.3 9.4]
#t [ 0.3 0. 0.5 0.2 0.4 0.6 0.7 0.8 0.1 0.9]
Use argsort like so:
>>> t = np.empty_like(w)
>>> t[d.argsort()] = w
>>> t
array([0.3, 0. , 0.5, 0.2, 0.4, 0.6, 0.7, 0.8, 0.1, 0.9])
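Why this works: d.argsort() lists the positions of d in ascending order, so t[d.argsort()] = w writes w[i] into the slot of the i-th smallest d (a scatter). The same result can be obtained as a gather, because the argsort of a permutation is its inverse, i.e. the rank of each d value. A sketch comparing the two:

```python
import numpy as np

w = np.arange(0.0, 1.0, 0.1)
d = np.array([3.1, 0.2, 5.3, 2.2, 4.9, 6.1, 7.7, 8.1, 1.3, 9.4])
ind = np.argsort(d)

# Scatter form (the answer above): position ind[i] receives w[i]
t_scatter = np.empty_like(w)
t_scatter[ind] = w

# Gather form: argsort of a permutation is its inverse,
# so ranks[j] is the rank of d[j] within d
ranks = np.argsort(ind)
t_gather = w[ranks]

print(t_scatter)
print(np.array_equal(t_scatter, t_gather))
```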
They are paired data, but in the opposite direction.
Make a third list, i, np.arange(0, 10).
zip this with d.
Sort the tuples with d as the sort key; i still holds the original index of each d element.
zip this with w.
Sort the triples (well, pairs with a pair as one element) with i as the sort key.
Extract the w values in their new order; this is your t array.
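The steps above, sketched in plain Python with range standing in for np.arange (the names by_d and by_i are just illustrative):

```python
w = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
d = [3.1, 0.2, 5.3, 2.2, 4.9, 6.1, 7.7, 8.1, 1.3, 9.4]

# Pair each d value with its original index i and sort by d
by_d = sorted(zip(d, range(len(d))))
# Pair the sorted (d, i) tuples with w, then sort by the original index i
by_i = sorted(zip(by_d, w), key=lambda pair: pair[0][1])
# Read the w values back out in original-index order
t = [w_val for _, w_val in by_i]
print(t)
```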
The answers to this question are fantastic, but I feel it is prudent to point out that you are not doing what you think you are doing.
What you want to do (or at least what I gather): you want t to contain the values of w rearranged into the sorted order of d.
What you are doing: filling t with the elements of w in the sorted order of d. You are only changing the order in which t gets filled; you are not applying the sort of d to w to produce t.
Consider a small variation of your for loop:
for i in range(0,10):
    t[i] = w[ind[i]]
This outputs the following t:
('t', array([0.1, 0.8, 0.3, 0. , 0.4, 0.2, 0.5, 0.6, 0.7, 0.9]))
You can just adapt PaulPanzer's answer to this as well.
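Adapting the argsort answer to this gather variant is a single indexing step; a sketch using the question's w and d:

```python
import numpy as np

w = np.arange(0.0, 1.0, 0.1)
d = np.array([3.1, 0.2, 5.3, 2.2, 4.9, 6.1, 7.7, 8.1, 1.3, 9.4])
# Gather form: t[i] = w[ind[i]], the w value at the position of the i-th smallest d
t = w[d.argsort()]
print(t)
```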

Running mean of numpy ndarrays from iterator

The question of how to compute a running mean of a series of numbers has been asked and answered before. However, I am trying to compute the running mean of a series of ndarrays, with an unknown length of series. So, for example, I have an iterator data where I would do:
running_mean = np.zeros((1000,3))
while True:
    datum = next(data)
    running_mean = calc_running_mean(datum)
What would calc_running_mean look like? My primary concern here is memory, as I can't have the entirety of the data in memory, and I don't know how much data I will be receiving. datum would be an ndarray, let's say that for this example it's a (1000,3) array, and the running mean would be an array of the same size, with each element containing the elementwise mean of every element we've seen in that position so far.
The key distinction this question has from previous questions is that it's calculating the elementwise mean of a series of ndarrays, and the number of arrays is unknown.
You can use itertools together with standard operators. Both accumulate and map are lazy, so only the current running sum is kept in memory:
>>> import itertools, operator
>>> running_sum = itertools.accumulate(data)
>>> running_mean = map(operator.truediv, running_sum, itertools.count(1))
Example:
>>> data = (np.linspace(-i, i*i, 6) for i in range(10))
>>>
>>> running_sum = itertools.accumulate(data)
>>> running_mean = map(operator.truediv, running_sum, itertools.count(1))
>>>
>>> for i in running_mean:
... print(i)
...
[0. 0. 0. 0. 0. 0.]
[-0.5 -0.3 -0.1 0.1 0.3 0.5]
[-1. -0.46666667 0.06666667 0.6 1.13333333 1.66666667]
[-1.5 -0.5 0.5 1.5 2.5 3.5]
[-2. -0.4 1.2 2.8 4.4 6. ]
[-2.5 -0.16666667 2.16666667 4.5 6.83333333 9.16666667]
[-3. 0.2 3.4 6.6 9.8 13. ]
[-3.5 0.7 4.9 9.1 13.3 17.5]
[-4. 1.33333333 6.66666667 12. 17.33333333 22.66666667]
[-4.5 2.1 8.7 15.3 21.9 28.5]
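The accumulate approach keeps a running sum, which can grow large for long streams. A common alternative, sketched here as a generator (not the calc_running_mean signature from the question), updates the mean in place with mean += (x - mean) / n, so only the current mean is kept between steps:

```python
import numpy as np


def running_means(data):
    """Yield the elementwise mean of all arrays seen so far."""
    mean = None
    for n, datum in enumerate(data, start=1):
        datum = np.asarray(datum, dtype=float)
        if mean is None:
            mean = datum.copy()
        else:
            # Incremental update: new_mean = old_mean + (x - old_mean) / n
            mean += (datum - mean) / n
        # Yield a copy so later in-place updates do not change earlier results
        yield mean.copy()


data = (np.linspace(-i, i * i, 6) for i in range(10))
means = list(running_means(data))
print(means[-1])
```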

Create Pandas dataframe from numpy array and use first column of the array as index

I have a numpy array (a):
array([[ 1. , 5.1, 3.5, 1.4, 0.2],
[ 1. , 4.9, 3. , 1.4, 0.2],
[ 2. , 4.7, 3.2, 1.3, 0.2],
[ 2. , 4.6, 3.1, 1.5, 0.2]])
I would like to make a pandas DataFrame with values=a, columns=A,B,C,D and index equal to the first column of my NumPy array, so that it looks like this:
A B C D
1 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
2 4.6 3.1 1.5 0.2
I am trying this:
df = pd.DataFrame(a, index=a[:,0], columns=['A', 'B','C','D'])
and I get the following error:
ValueError: Shape of passed values is (5, 4), indices imply (4, 4)
Any help?
Thanks
You passed the complete array as the data parameter. If you want only four columns of the array as the data, you need to slice the array as well:
In [158]:
df = pd.DataFrame(a[:,1:], index=a[:,0], columns=['A', 'B','C','D'])
df
Out[158]:
A B C D
1 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
2 4.6 3.1 1.5 0.2
Also note that duplicate values in the index will make filtering and indexing problematic.
Here, a[:,1:] takes all the rows and only the columns from index 1 onwards, as desired; see the docs.
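An alternative sketch: build the frame from all five columns, then move the first one into the index with DataFrame.set_index. The temporary column name "key" is only an illustrative placeholder, and the cast to int is optional (the first column of a is float, so the index would otherwise hold 1.0 and 2.0):

```python
import numpy as np
import pandas as pd

a = np.array([[1.0, 5.1, 3.5, 1.4, 0.2],
              [1.0, 4.9, 3.0, 1.4, 0.2],
              [2.0, 4.7, 3.2, 1.3, 0.2],
              [2.0, 4.6, 3.1, 1.5, 0.2]])
# Name all five columns, then promote the hypothetical "key" column to the index
df = pd.DataFrame(a, columns=["key", "A", "B", "C", "D"]).set_index("key")
# Optional: integer labels instead of floats, and no index name in the printout
df.index = df.index.astype(int)
df.index.name = None
print(df)
```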
