How to dynamically create columns based on multiple conditions - python

So I'm having the following problem:
I have a dataframe like the one below, where time_diff_float is the time difference in minutes between each row and the row above. So, for example, value = 4 arrived 20 minutes after value = 1.
value | time_diff_float
1     | NaN
4     | 20
3     | 13
2     | 55
5     | 8
7     | 15
First I have to check whether the time difference between two consecutive rows is < 60 (one hour) and, if so, create a column using the formula rem = value (from the row above) * lambda ** (time difference between the 2 rows). My lambda is a constant with the value 0.97.
Then, if the time difference between each row and the row 2 rows above is still less than 60, I have to do the same thing comparing each row with the row 2 rows above. Then the same again comparing with 3 rows above, and so on.
To do that I wrote the following code:
df.loc[df['time_diff_float'] < 60, 'rem_1'] = df['value'].shift() * (lambda_ ** (df['time_diff_float'] - 1))
df.loc[df['time_diff_float'] + df['time_diff_float'].shift() < 60, 'rem_2'] = df['value'].shift(2) * (lambda_ ** (df['time_diff_float'] + df['time_diff_float'].shift() - 1))
df.loc[df['time_diff_float'] + df['time_diff_float'].shift() + df['time_diff_float'].shift(2) < 60, 'rem_3'] = df['value'].shift(3) * (lambda_ ** (df['time_diff_float'] + df['time_diff_float'].shift() + df['time_diff_float'].shift(2) - 1))
My question is: since I have to re-do this at least 10 times (even more) with the real values I have, is there a way to create the "rem columns" dynamically?
Thanks in advance!

You can keep a running mask of the cumulative time difference and update it on every iteration of the loop:
n = 3
for i in range(1, n + 1):
    if i == 1:
        # copy so the in-place += below never touches df['time_diff_float']
        mask = df['time_diff_float'].copy()
    else:
        # add the gap between row t-i+1 and row t-i
        mask += df['time_diff_float'].shift(i - 1)
    df.loc[mask < 60, 'rem_' + str(i)] = df['value'].shift(i) * (lambda_ ** (mask - 1))
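A self-contained sketch of the same loop on the sample data from the question, so the result can be inspected. It assumes `lambda_ = 0.97` and keeps the `- 1` in the exponent exactly as in the question's code. Instead of `.loc` it uses `Series.where`, so every `rem_i` column is created even when no row is within 60 minutes; `add_rem_columns` and its `limit` parameter are hypothetical names.

```python
import numpy as np
import pandas as pd

# Sample data from the question; lambda_ = 0.97 as stated there.
df = pd.DataFrame({
    "value": [1, 4, 3, 2, 5, 7],
    "time_diff_float": [np.nan, 20, 13, 55, 8, 15],
})
lambda_ = 0.97
n = 3  # number of rem_ columns to build (raise it for the real data)


def add_rem_columns(df, n, lambda_, limit=60):
    """Add rem_1 ... rem_n; rem_i compares each row with the row i rows above."""
    window = pd.Series(0.0, index=df.index)  # running sum of time differences
    for i in range(1, n + 1):
        # total minutes between row t and row t - i (NaN if not available)
        window = window + df["time_diff_float"].shift(i - 1)
        rem = df["value"].shift(i) * lambda_ ** (window - 1)
        # keep rem only where the total gap is under the limit, NaN elsewhere
        df[f"rem_{i}"] = rem.where(window < limit)
    return df


add_rem_columns(df, n, lambda_)
print(df)
```

With this data rem_3 is entirely NaN, because no row is within 60 minutes of the row three positions above it.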

Related

Geometric series: calculate quotient and number of elements from sum and first & last element

Creating evenly spaced numbers on a log scale (a geometric progression) can easily be done for a given base and number of elements if the starting and final values of the sequence are known, e.g., with numpy.logspace and numpy.geomspace. Now assume I want to define the geometric progression the other way around, i.e., based on the properties of the resulting geometric series. If I know the sum of the series as well as the first and last element of the progression, can I compute the quotient and number of elements?
For instance, assume the first and last elements of the progression are 1 and 15 and the sum of the series should be equal to 50. I know from trial and error that it works out for n=9 and r≈1.404, but how could these values be computed?
You have enough information to solve it:
Sum of series = a + a*r + a*(r^2) ... + a*(r^(n-1))
= a*((r^n)-1)/(r-1)
= ((last element * r) - a)/(r-1)   (because a*r^n = a*r^(n-1) * r = last element * r)
Given the sum of series, a, and the last element, you can use the above equation to find the value of r.
Plugging in values for the given example:
50 = 1 * ((15*r)-1) / (r-1)
50r - 50 = 15r - 1
35r = 49
r = 1.4
Then, using sum of series = a*((r^n)-1)/(r-1):
50 = 1*((1.4^n)-1)/(1.4-1)
21 = 1.4^n
n = log(21)/log(1.4) ≈ 9.048
You can approximate n and recalculate r if n isn't an integer.
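The steps above can be checked numerically. This sketch assumes first = 1, last = 15 and total = 50 as in the example; `solve_progression` is a hypothetical helper name. It solves the rearranged sum formula for r, gets n from the last element (last = first * r^(n-1)), then rounds n, recomputes r so the first and last elements stay exact, and builds the progression with numpy.geomspace (mentioned in the question) to show how far the sum drifts from 50.

```python
import math

import numpy as np


def solve_progression(first, last, total):
    """Return (r, n) from the first element, last element and sum of the series."""
    # total * (r - 1) = last * r - first  =>  r = (total - first) / (total - last)
    r = (total - first) / (total - last)
    # last = first * r**(n - 1)  =>  n = log(last / first) / log(r) + 1
    n = math.log(last / first) / math.log(r) + 1
    return r, n


r, n = solve_progression(1, 15, 50)
print(r, n)  # r is 1.4; n is about 9.048, not an integer

# Round n, then recompute r so that first and last stay exact; the sum drifts.
n_int = round(n)
r_int = (15 / 1) ** (1 / (n_int - 1))
terms = np.geomspace(1, 15, num=n_int)  # 1, r_int, r_int**2, ..., 15
print(n_int, r_int, terms.sum())
```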
We have to reconstruct the geometric progression, i.e. obtain a, q, m (here ^ means raising to a power):
a, a * q, a * q^2, ..., a * q^m
if we know first, last, total:
first = a # first item
last = a * q^m # last item
total = a * (q^(m + 1) - 1) / (q - 1) # sum
Solving these equation we can find
a = first
q = (total - first) / (total - last)
m = log(last / a) / log(q)
To get the number of items n, note that n == m + 1.
Code:
import math
...
def Solve(first, last, total):
    a = first
    q = (total - first) / (total - last)
    n = math.log(last / a) / math.log(q) + 1
    return (a, q, n)
If you put your data (1, 15, 50) you'll get the solution
a = 1
q = 1.4
n = 9.04836151801382 # not integer
Since n is not an integer, you probably want to adjust it: keep last == 15 exact and let total vary. In this case q = (last / first) ^ (1 / (n - 1)) and total = first * (q ^ n - 1) / (q - 1):
a = 1
q = 1.402850552006674
n = 9
total = 49.752 # now n is integer, but total <> 50
You have to solve the following two equations for r and n:
a:= An / Ao = r^(n - 1)
and
s:= Sn / Ao = (r^n - 1) / (r - 1)
You can eliminate n by
s = (r a - 1) / (r - 1)
and solve for r, which gives r = (s - 1) / (s - a). Then n follows from n = log(a) / log(r) + 1.
In your case, from s = 50 and a = 15, we obtain r = 7/5 = 1.4 and n = 9.048...
It makes sense to round n to 9, but then r^8 = 15 (r ~ 1.40285) and r = 1.4 are not quite compatible.

Python mathematical expression evaluation

I was trying to solve a problem using a relation I had worked out myself, but when I implemented it in Python it gave me different results from the ones I had calculated by hand, so I tried changing the expression.
The thing is, I don't understand how Python evaluates each one.
These two expressions sometimes give different results:
((column+1)//2) * ((row+1)//2)
= (column+1)//2 * (row+1)//2
Here's an example:
rows, columns = 4, 4
for row in range(2, rows+1):
    for column in range(1, columns+1):
        print('*'*15)
        result = ((column+1)//2) * ((row+1)//2)
        f_result = (column+1)//2 * (row+1)//2
        print('>> normal expression:', (column+1)//2, (row+1)//2)
        print('>> second expression:', ((column+1)//2), ((row+1)//2))
        print('>> row:', row)
        print('>> column:', column)
        print('>> Results:', result, f_result)
        print()
The last two entries in the results:
***************
>> normal expression: 2 2
>> second expression: 2 2
>> row: 4
>> column: 3
>> Results: 4 5
***************
>> normal expression: 2 2
>> second expression: 2 2
>> row: 4
>> column: 4
>> Results: 4 5
You need to understand operator precedence and associativity first; see the operator precedence table in the "Expressions" chapter of the Python language reference.
Now compare the two expressions, with col = row = 4:
((col+1)//2) * ((row+1)//2)   versus   (col+1)//2 * (row+1)//2
((col+1)//2) * ((row+1)//2) = ((4+1)//2) * ((4+1)//2)
= (5//2)*(5//2)
= 2 * 2
= 4
(col+1)//2 * (row+1)//2 = (4+1)//2 * (4+1)//2
= 5//2 * 5//2
= 2 * 5//2
= 10//2   (* and // have the same precedence, so they are applied from left to right)
= 5
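To see the grouping Python actually applies, the unparenthesized expression can be compared with an explicitly left-to-right version over the same ranges as the question's loop. `grouped`, `ungrouped` and `left_to_right` are illustrative names only.

```python
def grouped(column, row):
    # Explicit grouping: each half is floored before multiplying.
    return ((column + 1) // 2) * ((row + 1) // 2)


def ungrouped(column, row):
    # Same tokens without the outer parentheses. * and // share one
    # precedence level and group left to right, so this is read as
    # (((column + 1) // 2) * (row + 1)) // 2
    return (column + 1) // 2 * (row + 1) // 2


def left_to_right(column, row):
    return (((column + 1) // 2) * (row + 1)) // 2


for row in range(2, 5):
    for column in range(1, 5):
        assert ungrouped(column, row) == left_to_right(column, row)

print(grouped(4, 4), ungrouped(4, 4))  # 4 5
```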

Python Maclaurin series ln(x+1)

I have to write a program for the Maclaurin series of ln(x+1) in Python.
I need to use the input function for two values, x and n. Then the program checks that the values are legal and calculates the Maclaurin approximation (of order n) of ln(1 + 𝑥) at the point x.
*Maclaurin series: ln(x+1) = sum over n = 1, 2, ... of ((-1)^(n+1)/n)*x^n
I got stuck at the end, when computing the expression. This is what I wrote (after all the earlier checks):
for i in range(n + 1):
    if i <= 1:
        continue
    else:
        x = x + (((-1) ** (i + 1)) * (x ** i) / i)
When I run my test input I get a number, but it's the wrong answer.
Please help me understand what is wrong in this code.
Mathematically, the Maclaurin series is a bit beyond me, but I'll try to help. Two things.
First, you're storing all the successive values in x, as you calculate them; that means that the term for n = 5 (i = 5) is using a value of x which isn't the original value of the parameter x, but which has the successive results of the four previous computations stored in it. What you need to do instead is something like:
total = 0
for each value:
    this term = some function of x  # the value of x does not change
    total = total + this term
Second, why aren't you interested in the term when i (or n) is equal to 1? The condition
if i <= 1:
    continue
skips the case i == 1, whose term is (+1) * x / 1 = x. Once the sum is kept in a separate variable that starts at 0, that term has to be included.
That should fix it, as far as I can see.
You are modifying the value of x in each iteration of the loop. Accumulate the partial sums in a separate variable instead.
def maclaurin_ln(x, n):
    mac_sum = 0
    for i in range(1, n + 1):
        mac_sum += (((-1) ** (i + 1)) * (x ** i) / i)
    return mac_sum
You can check it against the standard-library function math.log1p to see how close they get.
For ln(2) and different n:
import math
from tabulate import tabulate  # third-party package: pip install tabulate

res = []
for n in [1, 10, 100, 1000, 10000]:
    p = math.log1p(1)
    q = maclaurin_ln(1, n)
    res.append([1, n, p, q, q - p])
print(tabulate(res, headers=["x", "n", "log1p", "maclaurin_ln", "maclaurin_ln-log1p"]))
x    n      log1p     maclaurin_ln    maclaurin_ln-log1p
---  -----  --------  --------------  --------------------
1    1      0.693147  1               0.306853
1    10     0.693147  0.645635        -0.0475123
1    100    0.693147  0.688172        -0.004975
1    1000   0.693147  0.692647        -0.00049975
1    10000  0.693147  0.693097        -4.99975e-05
For different x (with n = 100):
res = []
for x in range(10):
    p = math.log1p(x/10)
    q = maclaurin_ln(x/10, 100)
    res.append([x/10, 100, p, q, q - p])
print(tabulate(res, headers=["x", "n", "log1p", "maclaurin_ln", "maclaurin_ln-log1p"]))
x    n    log1p      maclaurin_ln    maclaurin_ln-log1p
---  ---  ---------  --------------  --------------------
0    100  0          0               0
0.1  100  0.0953102  0.0953102       1.38778e-17
0.2  100  0.182322   0.182322        2.77556e-17
0.3  100  0.262364   0.262364        -1.11022e-16
0.4  100  0.336472   0.336472        0
0.5  100  0.405465   0.405465        -1.11022e-16
0.6  100  0.470004   0.470004        5.55112e-17
0.7  100  0.530628   0.530628        -4.44089e-16
0.8  100  0.587787   0.587787        -9.00613e-13
0.9  100  0.641854   0.641854        -1.25155e-07
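The question also asks to check that the inputs are legal before computing. A minimal sketch, assuming "legal" means a positive integer order n and a point inside the series' interval of convergence, -1 < x <= 1 (outside it the partial sums do not approach ln(1 + x)). Reading x and n with input() is left out so the function can be called directly, and x**k is built up incrementally instead of being recomputed on every step.

```python
import math


def maclaurin_ln(x, n):
    """Order-n Maclaurin polynomial of ln(1 + x): sum of (-1)**(k+1) * x**k / k."""
    # Assumed validity rules: n is a positive int, and -1 < x <= 1, the
    # interval where the series converges to ln(1 + x).
    if not isinstance(n, int) or n < 1:
        raise ValueError("n must be a positive integer")
    if not -1 < x <= 1:
        raise ValueError("x must satisfy -1 < x <= 1")
    total = 0.0
    power = 1.0
    for k in range(1, n + 1):
        power *= x  # power == x**k; x itself is never modified
        total += (-1) ** (k + 1) * power / k
    return total


print(maclaurin_ln(0.5, 20), math.log1p(0.5))
```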

Converting MATLAB code to Python: Python types and order of operations

This is a MATLAB function from the author of RainbowCrack:
function ret = calc_success_probability(N, t, m)
    arr = zeros(1, t - 1);
    arr(1) = m;
    for i = 2 : t - 1
        arr(i) = N * (1 - (1 - 1 / N) ^ arr(i - 1));
    end
    exp = 0;
    for i = 1 : t - 1
        exp = exp + arr(i);
    end
    ret = 1 - (1 - 1 / N) ^ exp;
It calculates the probability of success in finding a plaintext password given a rainbow table with keyspace N, a large unsigned integer, chain of length t, and number of chains m.
A sample run:
calc_success_probability(80603140212, 2400, 40000000)
Returns 0.6055.
I am having difficulty converting this into Python. In Python 3, there is no max integer anymore, so N isn't an issue. I think in the calculations I have to force everything to a large floating point number, but I'm not sure.
I also don't know the order of operations in MATLAB. I think the code is saying this:
Create array of size [1 .. 10] so ten elements
Initialize every element of that array with zero
In zero-based indexing, I think this would be array[0 .. t-1], it looks like MATLAB uses 1 as the first (0'th) index.
Then second element of array (0-based indexing) initialized to m.
For each element in array, pos=1 (0-based indexing) to t-1:
array[pos] = N * (1 - (1 - 1/N) ** array[pos-1])
Where ** is the power operator. I think power is ^ in MATLAB, so N * (1 - (1-1/N) to the array[pos-1] power is like that above.
Then set an exponent. For each element in array 0 to t-1:
exponent is exponent + array[pos]
return probability = 1 - (1 - 1/N) power of exp;
My Python code looks like this, and doesn't work. I can't figure out why, but it could be that I don't understand MATLAB enough, or Python, both, or I'm reading the math wrong somehow and what's going on in MATLAB is not what I'm expecting, i.e. I have order of operations and/or types wrong to make it work and I'm missing something in those terms...
def calc_success_probability(N, t, m):
    comp_arr = []
    # array with indices 1 to t-1 in MATLAB, which is otherwise 0 to t-2???
    # range with 0, t is 0 to t excluding t, so t here is t-1, t-1 is up
    # to including t-2... sounds wrong...
    for i in range(0, t-1):
        # initialize array
        comp_arr.append(0)
    print("t = {0:d}, array size is {1:d}".format(t, len(comp_arr)))
    # zero'th element chain count
    comp_arr[0] = m
    for i in range(1, t-1):
        comp_arr[i] = N * (1 - (1 - 1 / N)) ** comp_arr[i-1]
    final_exp = 0
    for i in range(0, t-1):
        final_exp = final_exp + comp_arr[i]
    probability = (1 - (1 - 1 / N)) ** final_exp
    return probability
Watch your brackets! You have translated this:
arr(i) = N * ( 1 - ( 1 - 1 / N ) ^ arr(i - 1) );
to this:
comp_arr[i] = N * ( 1 - ( 1 - 1 / N ) ) ** comp_arr[i-1]
I've lined up everything so you can better see where it goes wrong. You've moved a bracket to the wrong location.
It should be:
comp_arr[i] = N * ( 1 - ( 1 - 1 / N ) ** comp_arr[i-1] )
Similarly,
ret = 1 - (1 - 1 / N) ^ exp;
is not the same as
probability = (1 - (1 - 1 / N)) ** final_exp
This should be
probability = 1 - (1 - 1 / N) ** final_exp
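Putting both bracket fixes together, a complete translation might look like the sketch below. It maps MATLAB's 1-based arr(1) .. arr(t-1) onto Python indices 0 .. t-2 (so your range(0, t-1) was right), and replaces the second loop with sum(), which adds the same elements. No special large-float type is needed: in Python 3, 1 / N already produces an ordinary float. The question reports 0.6055 for the MATLAB version with the sample inputs; compare the printed value against that.

```python
def calc_success_probability(N, t, m):
    """Python port of the MATLAB calc_success_probability(N, t, m)."""
    # MATLAB zeros(1, t - 1) has t - 1 elements, indexed 1..t-1;
    # here the same slots are indices 0..t-2.
    arr = [0.0] * (t - 1)
    arr[0] = m                       # arr(1) = m
    q = 1 - 1 / N                    # true division gives a float
    for i in range(1, t - 1):        # MATLAB: for i = 2 : t - 1
        arr[i] = N * (1 - q ** arr[i - 1])
    exp_total = sum(arr)             # the second MATLAB loop is just a sum
    return 1 - q ** exp_total


print(round(calc_success_probability(80603140212, 2400, 40000000), 4))
```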

Why is the computing of the value of pi using the Machin Formula giving a wrong value?

For my school project I was trying to compute the value of π using different methods. One of the formulas I found was Machin's formula, which can be calculated using the Taylor expansion of arctan(x).
I wrote the following code in python:
import decimal
count = pi = a = b = c = d = val1 = val2 = decimal.Decimal(0) #Initializing the variables
decimal.getcontext().prec = 25 #Setting precision
while (decimal.Decimal(count) <= decimal.Decimal(100)):
    a = pow(decimal.Decimal(-1), decimal.Decimal(count))
    b = ((decimal.Decimal(2) * decimal.Decimal(count)) + decimal.Decimal(1))
    c = pow(decimal.Decimal(1/5), decimal.Decimal(b))
    d = (decimal.Decimal(a) / decimal.Decimal(b)) * decimal.Decimal(c)
    val1 = decimal.Decimal(val1) + decimal.Decimal(d)
    count = decimal.Decimal(count) + decimal.Decimal(1)
#The series has been divided into multiple small parts to reduce confusion
count = a = b = c = d = decimal.Decimal(0) #Resetting the variables
while (decimal.Decimal(count) <= decimal.Decimal(10)):
    a = pow(decimal.Decimal(-1), decimal.Decimal(count))
    b = ((decimal.Decimal(2) * decimal.Decimal(count)) + decimal.Decimal(1))
    c = pow(decimal.Decimal(1/239), decimal.Decimal(b))
    d = (decimal.Decimal(a) / decimal.Decimal(b)) * decimal.Decimal(c)
    val2 = decimal.Decimal(val2) + decimal.Decimal(d)
    count = decimal.Decimal(count) + decimal.Decimal(1)
#The series has been divided into multiple small parts to reduce confusion
pi = (decimal.Decimal(16) * decimal.Decimal(val1)) - (decimal.Decimal(4) * decimal.Decimal(val2))
print(pi)
The problem is that I am getting the right value of pi only till 15 decimal places, no matter the number of times the loop repeats itself.
For example:
at 11 repetitions of the first loop
pi = 3.141592653589793408632493
at 100 repetitions of the first loop
pi = 3.141592653589793410703296
I am not increasing the repetitions of the second loop as arctan(1/239) is very small and reaches an extremely small value with a few repetitions and therefore should not affect the value of pi at only 15 decimal places.
EXTRA INFORMATION:
The Machin Formula states that:
π = (16 * Summation of (((-1)^n) / (2n+1)) * ((1/5)^(2n+1))) - (4 * Summation of (((-1)^n) / (2n+1)) * ((1/239)^(2n+1)))
That many terms is enough to get you over 50 decimal places. The problem is that you are mixing Python floats with Decimals, so your calculations are polluted with the errors in those floats, which are only precise to 53 bits (around 15 decimal digits).
You can fix that by changing
c = pow(decimal.Decimal(1/5), decimal.Decimal(b))
to
c = pow(1 / decimal.Decimal(5), decimal.Decimal(b))
or
c = pow(decimal.Decimal(5), decimal.Decimal(-b))
Obviously, a similar change needs to be made to
c = pow(decimal.Decimal(1/239), decimal.Decimal(b))
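The pollution is easy to see directly: Decimal(1/5) converts, exactly, the binary float that 1/5 has already produced (rounding error included), while 1 / Decimal(5) performs the division in decimal arithmetic. A small sketch using the same precision as the question:

```python
import decimal

decimal.getcontext().prec = 25

from_float = decimal.Decimal(1/5)        # 1/5 is computed as a binary float first
from_decimal = 1 / decimal.Decimal(5)    # division done in decimal arithmetic

print(from_float)    # exact value of the double nearest to 0.2
print(from_decimal)  # 0.2
print(pow(from_float, 3), pow(from_decimal, 3))  # the error survives into the powers
```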
You could make your code a lot more readable. For starters, you should put the stuff that calculates the arctan series into a function, rather than duplicating it for arctan(1/5) and arctan(1/239).
Also, you don't need to use Decimal for everything. You can just use plain Python integers for things like count and a. E.g., your calculation for a can be written as
a = (-1) ** count
or you could just set a to 1 outside the loop and negate it each time through the loop.
Here's a more compact version of your code.
import decimal
decimal.getcontext().prec = 60 #Setting precision
def arccot(n, terms):
    base = 1 / decimal.Decimal(n)
    result = 0
    sign = 1
    for b in range(1, 2*terms, 2):
        result += sign * (base ** b) / b
        sign = -sign
    return result
pi = 16 * arccot(5, 50) - 4 * arccot(239, 11)
print(pi)
output
3.14159265358979323846264338327950288419716939937510582094048
The last 4 digits are rubbish, but the rest are fine.
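One way to get rid of those last bad digits is to compute with a few extra guard digits and then round back to the requested precision with unary plus, which is what the pi recipe in the `decimal` module documentation does. The sketch below also stops each series as soon as a term no longer changes the running total, instead of using fixed term counts. `arccot` and `machin_pi` are hypothetical helper names, and 10 guard digits is an arbitrary choice.

```python
from decimal import Decimal, localcontext


def arccot(n):
    """arctan(1/n) by its Taylor series, summed until a term no longer changes the total."""
    x = 1 / Decimal(n)
    x2 = x * x
    power = x              # holds x**(2k+1)
    total = Decimal(0)
    k = 0
    while True:
        term = power / (2 * k + 1)
        new_total = total - term if k % 2 else total + term
        if new_total == total:   # term is below the working precision
            return total
        total = new_total
        power *= x2
        k += 1


def machin_pi(digits):
    with localcontext() as ctx:
        ctx.prec = digits + 10   # guard digits absorb accumulated rounding error
        pi = 16 * arccot(5) - 4 * arccot(239)
    with localcontext() as ctx:
        ctx.prec = digits
        return +pi               # unary plus rounds to the requested precision


print(machin_pi(50))
```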
