I need help getting the standard deviation without using import math (Python)

### import math
def mean(values):
    return sum(values)*1.0/len(values)
def std():
    pass
print(std())
def std(values):
    length = len(values)
    if length < 2:
        return("Standard deviation requires at least two data points")
    m = mean(values)
    total_sum = 0
    for i in range(length):
        total_sum += (values[i]-m)**2
    under_root = total_sum*1.0/length
    return math.sqrt(under_root)
vals = [5]
stan_dev = std(vals)
print(stan_dev)
values = [1, 2, 3, 4, 5]
stan_dev = std(values)
print(stan_dev)
__________________________________________________________________________
lst = [3, 19, 21, 1435, 653342]
sum = reduce((lambda x, y: x +y), lst)
print (sum)
# list = [3, 19, 21, 1435, 653342]
I need to be able to get the standard deviation without using sum or len.
I need to 'unpack' the standard deviation?

You can do it with two loops (there are shorter ways, but this is simple):
arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Calculate the mean first
N, X = 0, 0
for xi in arr:
    N += 1
    X += xi
mean = X/N
# Calculate the standard deviation
DSS = 0
for xi in arr:
    DSS += (xi - mean)**2
std = (DSS/N)**(1/2)
This gives 4.5 for the mean and about 2.872 for the (population) standard deviation.
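The two loops can also be wrapped in a function so they are reusable. This is only a minimal sketch: the name mean_and_std is made up for illustration, it computes the population standard deviation (dividing by N, as in the loops above), and it uses ** 0.5 in place of math.sqrt.

```python
def mean_and_std(values):
    """Return (mean, population standard deviation) using only plain loops."""
    count = 0
    total = 0.0
    for v in values:          # first pass: count and add up the values
        count += 1
        total += v
    if count == 0:
        raise ValueError("need at least one data point")
    mean = total / count
    squared_dev = 0.0
    for v in values:          # second pass: sum of squared deviations
        squared_dev += (v - mean) ** 2
    return mean, (squared_dev / count) ** 0.5


m, s = mean_and_std([1, 2, 3, 4, 5])
print(m, round(s, 3))  # 3.0 1.414
```

No sum, len, or math import is needed; for the sample standard deviation, divide by count - 1 instead (which then requires at least two data points).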

Related

Index array based on value limits of another

Let's say I have an array (or even a list) that looks like:
tmp_data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
And then I have another array of distance values:
dist_data = [15.625, 46.875, 78.125, 109.375, 140.625, 171.875, 203.125, 234.375, 265.625, 296.875]
Now, say I want to create a threshold of distance that I would like to perform an operation on from tmp_data. For this example, let's just take the max value. And let's set the threshold distance to 100. What I would like to do is take the n number of elements every 100 distance units and replace all elements in that with the maximum value in that small array. For example: I would want the final output to be
max_tmp_data_100 = [2,2,2,5,5,5,8,8,8,9]
This is because the first 3 elements in dist_data are below 100, so we take the first three elements of tmp_data (0,1,2), and get the maximum of this and replace all elements in there with that value, 2
Then, the next set of data that would be below the next 100 value would be
tmp_dist_array_100 = [109.375, 140.625, 171.875]
tmp_data_100 = [3,4,5]
max_tmp_data_100 = [5,5,5]
(append to [2,2,2])
I have come up with the following:
# Initialize
final_array = []
d_array = []
idx = 1
for i in range(0,10):
    if dist_data[i] < idx * final_res:
        d_array.append(tmp_data[i])
    elif dist_data[i] > idx * final_res:
        # Now get the values
        max_val = np.amax(d_array)
        new_array = np.ones(len(d_array)) * max_val
        final_array.extend(new_array)
        idx = idx + 1
But the outcome is
[2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0]
When it should be [2,2,2,5,5,5,8,8,8,9]
With numpy:
import numpy as np
dist_data = [15.625, 46.875, 78.125, 109.375, 140.625, 171.875, 203.125, 234.375, 265.625, 296.875]
cut = 100
a = np.array(dist_data)
vals = np.searchsorted(a, np.r_[cut:a.max() + cut:cut]) - 1
print(vals[(a/cut).astype(int)])
It gives:
[2 2 2 5 5 5 9 9 9 9]
You can do it with groupby:
from itertools import groupby
dist_data = [15.625, 46.875, 78.125, 109.375, 140.625, 171.875, 203.125, 234.375, 265.625, 296.875]
tmp_data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
result = []
index_list = [[dist_data.index(i) for i in l]
              for k, l in groupby(dist_data, key=lambda x:x//100)]
for i in tmp_data:
    for lst in index_list:
        if i in lst:
            result.append(max(lst))
print(result)
# [2, 2, 2, 5, 5, 5, 9, 9, 9, 9]
As per your requirements, the last four elements fall under the next threshold value, and the maximum of those four elements is 9.
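Both answers return the largest index in each bin, which matches the maximum only because tmp_data happens to equal its own indices. A hedged sketch that takes the maximum of the actual values per bin is shown here; the helper name max_per_distance_bin is hypothetical, and it assumes dist_data is sorted in ascending order.

```python
from itertools import groupby


def max_per_distance_bin(values, distances, width):
    """Replace each value with the maximum of the values in its distance bin.

    Assumes `distances` is sorted ascending, because groupby only merges
    consecutive items that share the same key.
    """
    result = []
    pairs = zip(values, distances)
    for _, group in groupby(pairs, key=lambda p: int(p[1] // width)):
        bin_values = [v for v, _ in group]
        result.extend([max(bin_values)] * len(bin_values))
    return result


tmp_data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
dist_data = [15.625, 46.875, 78.125, 109.375, 140.625,
             171.875, 203.125, 234.375, 265.625, 296.875]
binned = max_per_distance_bin(tmp_data, dist_data, 100)
print(binned)  # [2, 2, 2, 5, 5, 5, 9, 9, 9, 9]
```

Pairing the values with their distances avoids dist_data.index(), which would pick the wrong position if a distance occurred twice.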

Add padding based on partial sum

I have four given variables:
group size
total of groups
partial sum
1-D tensor
and I want to add zeros when the sum within a group reaches the partial sum. For example:
groupsize = 4
totalgroups = 3
partialsum = 15
d1tensor = torch.tensor([ 3, 12, 5, 5, 5, 4, 11])
The expected result is:
[ 3, 12, 0, 0, 5, 5, 5, 0, 4, 11, 0, 0]
I have no clue how I can achieve that in pure PyTorch. In Python it would be something like this:
target = [0]*(groupsize*totalgroups)
cursor = 0
current_count = 0
d1tensor = [ 3, 12, 5, 5, 5, 4, 11]
for idx, ele in enumerate(target):
    subgroup_start = (idx//groupsize) *groupsize
    subgroup_end = subgroup_start + groupsize
    if sum(target[subgroup_start:subgroup_end]) < partialsum:
        target[idx] = d1tensor[cursor]
        cursor +=1
Can anyone help me with that? I have already googled it but couldn't find anything.
Some logic, NumPy, and list comprehensions are sufficient here.
I will break it down step by step; you can make it slimmer and prettier afterwards:
import numpy as np
my_val = 15
block_size = 4
total_groups = 3
d1 = [3, 12, 5, 5, 5, 4, 11]
d2 = np.cumsum(d1)
d3 = d2 % my_val == 0 #find where sum of elements is 15 or multiple
split_points= [i+1 for i, x in enumerate(d3) if x] # find index where cumsum == my_val
#### Option 1
split_array = np.split(d1, split_points, axis=0)
padded_arrays = [np.pad(array, (0, block_size - len(array)), mode='constant') for array in split_array] #pad arrays
padded_d1 = np.concatenate(padded_arrays[:total_groups]) #put them together, discard extra group if present
#### Option 2
split_points = [el for el in split_points if el <len(d1)] #make sure we are not splitting on the last element of d1
split_array = np.split(d1, split_points, axis=0)
padded_arrays = [np.pad(array, (0, block_size - len(array)), mode='constant') for array in split_array] #pad arrays
padded_d1 = np.concatenate(padded_arrays)
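The cumulative-sum approach above relies on the running total landing exactly on a multiple of the partial sum. As a comparison, here is a plain-Python sketch of the greedy rule from the question (not a PyTorch solution); the function name pad_groups is an assumption made for illustration.

```python
def pad_groups(values, group_size, total_groups, partial_sum):
    """Fill groups left to right and zero-pad each one to group_size.

    A group stops taking values once its sum reaches partial_sum,
    once it is full, or once the input is exhausted.
    """
    out = []
    cursor = 0
    for _ in range(total_groups):
        group = []
        running = 0
        while (cursor < len(values)
               and len(group) < group_size
               and running < partial_sum):
            group.append(values[cursor])
            running += values[cursor]
            cursor += 1
        group.extend([0] * (group_size - len(group)))  # pad with zeros
        out.extend(group)
    return out


padded = pad_groups([3, 12, 5, 5, 5, 4, 11], 4, 3, 15)
print(padded)  # [3, 12, 0, 0, 5, 5, 5, 0, 4, 11, 0, 0]
```

Unlike the loop in the question, the cursor is bounds-checked, so running out of input simply leaves the remaining slots as zeros.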

Adding element of a list

I have a list containing some values. I want to calculate the sum of every 5 elements and then divide it by 5 and then store it in an empty list. While doing so I am not sure if I can iterate over a list the way I am doing. Being a newbie to python, any help would be much appreciated.
My list looks like this:
My code is:
a = []
i = np.arange(0,125,5)
j = np.arange(5,130,5)
for q,r in i,j:
    cov = (np.sum(l[q:r]))/5
    cov.append(a)
print(a)
I am getting the following error:
Instead of np.sum(l[i:i+5])/5 you can use np.average().
Instead of two index arrays you can use range(0, length, 5).
Try this:
a = []
for r in range(0,len(l),5):
    try:
        cov = (np.average(l[r:r+5]))
    except IndexError:
        cov = (np.average(l[r:]))
    a.append(cov)
print(a)
If numpy is not a hard requirement, I'd definitely do it with something simple like this:
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
values_avg = []
temp_sum = 0
for i in range(len(values)):
    temp_sum += values[i]
    if (i + 1) % 5 == 0:
        values_avg.append(temp_sum / 5)
        temp_sum = 0
print(values_avg)
# [3.0, 8.0, 8.0, 3.0]
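The loop above silently drops a trailing group with fewer than five elements. A small sketch that averages slices instead, and keeps a shorter final chunk, could look like this; chunk_averages is an illustrative name, not something from the original posts.

```python
def chunk_averages(values, size=5):
    """Average consecutive chunks of `size` items; the last chunk may be shorter."""
    averages = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]  # slicing past the end never raises
        averages.append(sum(chunk) / len(chunk))
    return averages


avgs = chunk_averages([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print(avgs)  # [3.0, 8.0, 11.5]
```

Because a slice that runs past the end of a list is simply truncated, no IndexError handling is needed.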

Checking if n elements in an array are increasing

I have written code for SPC (statistical process control) and I am attempting to highlight certain out-of-control runs.
So I was wondering if there is a way to pull out n (in my case 7) increasing elements in an array so I can color them red when I plot them.
This is what I attempted, but I obviously get an indexing error.
import numpy as np
import matplotlib.pyplot as plt
y = np.linspace(0,10,15)
x = np.array([1,2,3,4,5,6,7,8,9,1,4,6,4,6,8])
col =[]
for i in range(len(x)):
    if x[i]<x[i+1] and x[i+1]<x[i+2] and x[i+2]<x[i+3] and x[i+3]<x[i+4] and x[i+4]<x[i+5] and x[i+5]<x[i+6] and x[i+6]<x[i+7]:
        col.append('red')
    elif x[i]>x[i+1] and x[i+1]>x[i+2] and x[i+2]>x[i+3] and x[i+3]>x[i+4] and x[i+4]>x[i+5] and x[i+5]>x[i+6] and x[i+6]>x[i+7]:
        col.append('red')
    else:
        col.append('blue')
for i in range(len(x)):
    # plotting the corresponding x with y
    # and respective color
    plt.scatter(y[i], x[i], c = col[i], s = 10,
                linewidth = 0)
Any help would be greatly appreciated!
As Andy said in his comment, you get the index error because at i=8 the index i+7 reaches 15, which is the length of x.
Either you loop only over len(x)-7 and repeat the last entry in col 7 times, or you could do something like this:
import numpy as np
import matplotlib.pyplot as plt
y = np.linspace(0,10,20)
x = np.array([1,2,3,4,5,6,1,2,3,1,0,-1,-2,-3,-4,-5,-6,4,5])
col =[]
diff = np.diff(x) # get diff to see if x inc + or dec - // len(x)-1
diff_sign = np.diff(np.sign(diff)) # get difference of the signs to get either 1 (true) or 0 (false) // len(x)-2
zero_crossings = np.where(diff_sign)[0] + 2 # get indices (-2 from len(x)-2) where a zero crossing occurs
diff_zero_crossings = np.diff(np.concatenate([[0],zero_crossings,[len(x)]])) # get how long the periods are till next zero crossing
for i in diff_zero_crossings:
    if i >= 6:
        for _ in range(i):
            col.append("r")
    else:
        for _ in range(i):
            col.append("b")
for i in range(len(x)):
    # plotting the corresponding x with y
    # and respective color
    plt.scatter(y[i], x[i], c = col[i], s = 10,
                linewidth = 0)
plt.show()
To determine if all integer elements of a list are ascending, you could do this:
def ascending(arr):
    _rv = True
    for i in range(len(arr) - 1):
        if arr[i + 1] <= arr[i]:
            _rv = False
            break
    return _rv
a1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 10, 11, 12, 13, 14, 16]
a2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16]
print(ascending(a1))
print(ascending(a2))
If you want to limit the sequence of ascending values then you could just use nested loops. It may look inelegant but it's surprisingly efficient and much simpler than bringing dataframes into the mix:
def ascending(arr, seq):
    for i in range(len(arr) - seq + 1):
        state = True
        for j in range(i, i + seq - 1):
            if arr[j] >= arr[j + 1]:
                state = False
                break
        if state:
            return True
    return False
a1 = [100, 99, 98, 6, 7, 8, 10, 11, 12, 13, 14, 13]
a2 = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(ascending(a1, 7))
print(ascending(a2, 7))
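The functions above only report whether a run exists. For coloring a plot, a per-element mask is more convenient. The following is a rough sketch in plain Python; run_mask is a hypothetical helper, and it treats n as the number of elements in a strictly increasing or strictly decreasing run, which is an assumption about what the question means by "7".

```python
def run_mask(values, n):
    """Mark each element that lies in a strictly monotonic run of >= n elements."""
    mask = [False] * len(values)
    start = 0       # index where the current run begins
    direction = 0   # +1 rising, -1 falling, 0 not yet known
    for i in range(1, len(values) + 1):
        step = 0    # stays 0 past the end, which flushes the final run
        if i < len(values):
            step = (values[i] > values[i - 1]) - (values[i] < values[i - 1])
        if step != 0 and step == direction:
            continue  # the run keeps going in the same direction
        # The run values[start:i] ends here; mark it if it is long enough.
        if i - start >= n:
            mask[start:i] = [True] * (i - start)
        # After a direction change the turning point i - 1 starts the new run;
        # after a tie the new run starts at i.
        start = i - 1 if step != 0 else i
        direction = step
    return mask


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 4, 6, 4, 6, 8]
col = ['red' if flagged else 'blue' for flagged in run_mask(x, 7)]
print(col.count('red'))  # 9
```

The resulting col list has one entry per element of x, so it can be indexed in the plotting loop without running past the end of the array.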

Count Overlap Between Neighboring Indices in NumPy Array

I have a NumPy array of integers:
x = np.array([1, 0, 2, 1, 4, 1, 4, 1, 0, 1, 4, 3, 0, 1, 0, 2, 1, 4, 3, 1, 4, 1, 0])
and another array of indices that references the array above:
indices = np.array([22, 12, 8, 1, 14, 21, 7, 0, 13, 19, 5, 3, 9, 16, 2, 15, 11, 18, 20, 6, 4, 10, 17])
For every pair of neighboring indices, we need to count how many consecutive values in x are overlapping starting at each of the two neighboring indices. For example, for indices[2] and indices[3], we have index 8 and 1, respectively, and they both reference positions in x. Then, starting at x[8] and x[1], we count how many consecutive values are the same or are overlapping but we stop checking the overlap under specific conditions (see below). In other words, we check if:
x[8] == x[1]
x[9] == x[2] # increment each index by one
... # continue incrementing each index except in the following conditions
stop if i >= x.shape[0]
stop if j >= x.shape[0]
stop if x[i] == 0
stop if x[j] == 0
stop if x[i] != x[j]
In reality, we do this for all neighboring index pairs:
out = np.zeros(indices.shape[0], dtype=int)
for idx in range(indices.shape[0]-1):
    count = 0
    i = indices[idx]
    j = indices[idx + 1]
    k = 0
    # while i+k < x.shape[0] and j+k < x.shape[0] and x[i+k] != 0 and x[j+k] != 0 and x[i+k] == x[j+k]:
    while i+k < x.shape[0] and j+k < x.shape[0] and x[i+k] == x[j+k]:
        count += 1
        k += 1
    out[idx] = k
And the output is:
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 2, 3, 0, 3, 0, 1, 0, 2, 2, 1, 2, 0] # This is the old output if x[i] == 0 and x[j] == 0 are included
[1 2 1 4 0 2 2 5 1 4 3 2 3 0 3 0 1 0 3 2 1 2 0]
I'm looking for a vectorized way to do this in NumPy.
This should do the trick (I am ignoring the two conditions x[i] == 0 and x[j] == 0):
for idx in range(indices.shape[0]-1):
    i = indices[idx]
    j = indices[idx + 1]
    l = len(x) - max(i,j)
    x1 = x[i:i+l]
    x2 = x[j:j+l]
    # Add False at the end to handle the case in which arrays are exactly the same
    x0 = np.append(x1==x2, False)
    out[idx] = np.argmin(x0)
Notice that with np.argmin I am exploiting the following two facts:
False < True
np.argmin only returns the first instance of the min in the array
Performance Analysis
Regarding time performance, I tested with N=10**5 and N=10**6, and as suggested in the comments, this cannot compete with numba jit.
def f(x, indices):
    out = np.zeros(indices.shape[0], dtype=int)
    for idx in range(indices.shape[0]-1):
        i = indices[idx]
        j = indices[idx + 1]
        l = len(x) - max(i,j)
        x1 = x[i:i+l]
        x2 = x[j:j+l]
        x0 = np.append(x1==x2, False)
        out[idx] = np.argmin(x0)
    return out
N=100_000
x = np.random.randint(0,10, N)
indices = np.arange(0, N)
np.random.shuffle(indices)
%timeit f(x, indices)
3.67 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
N=1_000_000
x = np.random.randint(0,10, N)
indices = np.arange(0, N)
np.random.shuffle(indices)
%time f(x, indices)
Wall time: 8min 20s
(I did not have the patience to let %timeit finish)
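For checking any faster version against the stopping rules in the question, including the two zero conditions this answer leaves out, a plain-Python reference can help. This is only a sketch: overlap_counts and its stop_at_zero flag are names invented here, and it is not vectorized.

```python
def overlap_counts(x, indices, stop_at_zero=True):
    """Count matching consecutive values for each pair of neighboring indices."""
    n = len(x)
    out = [0] * len(indices)   # the last entry stays 0, as in the question
    for idx in range(len(indices) - 1):
        i, j = indices[idx], indices[idx + 1]
        k = 0
        while i + k < n and j + k < n and x[i + k] == x[j + k]:
            # The two values are equal here, so one zero check covers both.
            if stop_at_zero and x[i + k] == 0:
                break
            k += 1
        out[idx] = k
    return out


x = [1, 2, 0, 1, 2, 0]
print(overlap_counts(x, [0, 3]))                      # [2, 0]
print(overlap_counts(x, [0, 3], stop_at_zero=False))  # [3, 0]
```

With stop_at_zero=False the loop follows the same conditions as the while loop in the question, so its results can be compared directly against the NumPy version.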
