I'm trying to process data statistically in Python for learning purposes.
In my problem I toss two dice n times, and X is a random variable defined as the product of the two tosses. I managed to calculate the expectation of X and then the variance of X, but I have problems computing the standard deviation of X.
Here is my question.
How do I get a third list from two lists by applying algebraic operations to the elements of the two lists that have the same index? Precisely, I want to get something like this:
x = [x0, x1, .., xi, .., xn]
y = [y0, y1, .., yi, .., yn]
z = [(x0-y0)^2, (x1-y1)^2, .., (xi-yi)^2, .., (xn-yn)^2]
Here is my code. Maybe it's a bit bulky, but it's my first one. I get the error
TypeError: unsupported operand type(s) for -: 'list' and 'Decimal'
on the line
x_error_2 = Decimal (((x_storage) - (expectation_x))**2).quantize(Decimal('.0001'))
Clearly, I'm doing it wrong.
n = input ("n=")
sum_x = 0
sum_x_2 = 0
sum_x_error_2 = 0
x_storage = [ ]
expectation_x_storage = []
from decimal import Decimal
for i in range (0, n):
    from random import *
    x = Decimal ((randint(1, 6)*randint(1, 6))).quantize(Decimal('1'))
    x_storage.append(x)
    x_2 = Decimal (x**2).quantize(Decimal('.01'))
    sum_x = sum_x + x
    sum_x_2 = sum_x_2 + x_2
expectation_x = Decimal (sum_x / n).quantize(Decimal('.01'))
expectation_x_2 = Decimal (sum_x_2 / n).quantize(Decimal('.01'))
variance_x = Decimal ((expectation_x_2 - (expectation_x)**2)).quantize(Decimal('.01'))
print ("E(X)=")
print (expectation_x)
print ("V(X)=")
print (variance_x)
for i in range (0, n):
    expectation_x_storage.append(expectation_x)
print x_storage
print expectation_x_storage
#code is working until the next line
for i in range (0, n):
    x_error_2 = Decimal (((x_storage) - (expectation_x))**2).quantize(Decimal('.0001'))
    sum_x_error_2 = sum_x_error_2 + x_error_2
    standard_deviation_x_2 = Decimal ((sum_x_error_2)/(n-1)).quantize(Decimal('.01'))
print ("Sn2(X)=")
print (standard_deviation_x_2)
It looks like you simply need to take the i-th element of x_storage here:
x_error_2 = Decimal (((x_storage[i]) - (expectation_x))**2).quantize(Decimal('.0001'))
Also change the indentation of the line
standard_deviation_x_2 = Decimal ((sum_x_error_2)/(n-1)).quantize(Decimal('.01'))
to place it outside the for loop. Not sure it is worth mentioning, but in Python indentation is significant.
Then it should work.
It seems you're using Python 2.7? I'd suggest not mixing print calls with and without parentheses. Use print(...) consistently.
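For comparison, here is a minimal Python 3 sketch of the whole computation. The function name dice_statistics and its rng parameter are hypothetical, not from the question, and Decimal precision is left at the default instead of quantizing every intermediate value. It iterates over the samples directly instead of subtracting a Decimal from the whole list, and it takes the square root of Sn2(X) with Decimal.sqrt(), which gives the standard deviation the question asks about:

```python
import random
from decimal import Decimal


def dice_statistics(n, rng=random):
    """Toss two dice n times (n >= 2); X is the product of the two tosses.

    Returns (samples, E(X), V(X), Sn2(X), sample standard deviation).
    """
    # One Decimal value of X per experiment, like x_storage in the question.
    samples = [Decimal(rng.randint(1, 6) * rng.randint(1, 6)) for _ in range(n)]
    mean = sum(samples) / n
    # V(X) = E(X^2) - E(X)^2, the same formula the question uses.
    variance = sum(s ** 2 for s in samples) / n - mean ** 2
    # Squared deviation of each sample from the mean: iterate, don't subtract from a list.
    squared_errors = [(s - mean) ** 2 for s in samples]
    sample_variance = sum(squared_errors) / (n - 1)  # Sn2(X) in the question
    sample_std = sample_variance.sqrt()  # standard deviation = sqrt(Sn2(X))
    return samples, mean, variance, sample_variance, sample_std


samples, mean, variance, s2, s = dice_statistics(1000)
print("E(X) =", mean.quantize(Decimal(".01")))
print("V(X) =", variance.quantize(Decimal(".01")))
print("Sn2(X) =", s2.quantize(Decimal(".01")))
print("Sn(X) =", s.quantize(Decimal(".01")))
```

The printed numbers change from run to run; pass a seeded random.Random(...) as rng if you need reproducible results.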
You already have two lists x = [x1, x2, ..., xn] and y = [y1, y2, ..., yn], and z should be z = [(x1-y1)^2, (x2-y2)^2, ..., (xn-yn)^2].
You can do it this way:
>>> a=[35.5,36.6,37.7]
>>> b=[12.34,13.89,30.8]
>>> c=[(a[i]-b[i])**2 for i in range(len(a))]
>>> c
[536.3856, 515.7441, 47.61000000000003]
>>>
If you want to round those numbers, you can use the round function:
>>> c=[round((a[i]-b[i])**2,3) for i in range(len(a))]
>>> c
[536.386, 515.744, 47.61]
>>>
round(x, y) rounds the number x to y decimal digits.
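A slightly more idiomatic sketch pairs the elements with zip instead of indexing with range(len(a)); note that zip silently stops at the end of the shorter list:

```python
a = [35.5, 36.6, 37.7]
b = [12.34, 13.89, 30.8]

# zip yields (a[0], b[0]), (a[1], b[1]), ... so no index variable is needed.
c = [(ai - bi) ** 2 for ai, bi in zip(a, b)]
c_rounded = [round((ai - bi) ** 2, 3) for ai, bi in zip(a, b)]
print(c)
print(c_rounded)
```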
Related
For some reason it shows an error message: TypeError: argument should be a string or a Rational instance
import cmath
from fractions import Fraction
#Function
# Quadratic equation solver
def solver(a_entry, b_entry, c_entry):
    a = int(a_entry)
    b = int(b_entry)
    c = int(c_entry)
    d = (b*b) - (4*a*c)
    sol1 = (-b-cmath.sqrt(d)/(2*a))
    sol2 = (-b+cmath.sqrt(d)/(2*a))
    sol3 = Fraction(sol1)
    sol4 = Fraction(sol2)
    print(f"Value of x1 = {sol3} and value of x2 = {sol4}")
solver(1, 2, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 8, in solver
File "/usr/lib/python3.10/fractions.py", line 139, in __new__
raise TypeError("argument should be a string "
TypeError: argument should be a string or a Rational instance
I am a new programmer, and I noticed that this code produces weird numbers (for example 5.42043240824+0j, i.e. inexact values) when I give it random values. I want it to give either exact decimal values or fractions. The Fraction method doesn't work for some reason. Can someone please help? Thanks a lot.
Two things are wrong in your code:
Use math instead of cmath: cmath is meant for complex values (it always returns a complex value, even 1+0j), and complex numbers are not accepted by Fraction.
Be careful: you wrote (-b-cmath.sqrt(d)/(2*a)), but it should be ((-b-cmath.sqrt(d))/(2*a)).
Also, a real solution might not exist. For example, 1x^2 + 3x + 10 = 0 has no real solution (the graph of your function does not cross the x-axis); it only has complex solutions.
To handle this you can use a try/except to catch the error, or you can check that b^2 is at least 4ac (that is, d >= 0), because you can't take the square root of a negative value (except with complex values ;) ):
import math
from fractions import Fraction

def solver():
    # entry, entry1, entry2 and label2 are presumably the GUI widgets from your full program (not shown)
    a = int(entry.get())
    b = int(entry1.get())
    c = int(entry2.get())
    d = (b*b) - (4*a*c)
    if d < 0:
        text = "No real answer! The function doesn't cross the x-axis!"
        label2.configure(text = text)
    else:
        sol1 = ((-b-math.sqrt(d))/(2*a))
        sol2 = ((-b+math.sqrt(d))/(2*a))
        sol3 = Fraction(sol1)
        sol4 = Fraction(sol2)
        label2.configure(text = f"Value of x1 = {sol3} and value of x2 = {sol4}")
Hope it helps
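One caveat about converting the float roots: Fraction(sol1) converts the binary floating-point value exactly, so a root like 1/3 comes out as 6004799503160661/18014398509481984 rather than 1/3. A minimal standard-library sketch that keeps everything in integers when the discriminant is a perfect square (rational_roots is a hypothetical helper name, and math.isqrt needs Python 3.8 or later):

```python
import math
from fractions import Fraction


def rational_roots(a, b, c):
    """Exact roots of a*x**2 + b*x + c == 0 for integers a != 0, b, c.

    Returns a pair of Fractions, or None when there is no real root or
    when the roots are irrational (the discriminant is not a perfect square).
    """
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    d = b * b - 4 * a * c
    if d < 0:
        return None  # only complex roots
    s = math.isqrt(d)  # integer square root, no floating point involved
    if s * s != d:
        return None  # real but irrational roots: no exact Fraction exists
    return Fraction(-b - s, 2 * a), Fraction(-b + s, 2 * a)


print(rational_roots(3, -4, 1))  # 3x^2 - 4x + 1 = 0
print(rational_roots(1, 3, 10))  # no real roots
# A float converted with Fraction keeps its full binary value...
print(Fraction(1 / 3))
# ...while limit_denominator finds the closest fraction with a small denominator.
print(Fraction(1 / 3).limit_denominator(1000))
```

For irrational roots such as those of x^2 - 2 = 0 no exact fraction exists, so the function returns None in that case.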
The issue with sqrt
It appears that you do not want to evaluate the square roots to numerical approximations. But that is exactly what cmath.sqrt and math.sqrt do: they calculate numerical approximations of square roots.
For instance:
import math
print( math.sqrt(2) )
# 1.4142135623730951
If you are not interested in numerical approximations, then I suggest using a library for symbolic calculus. The best-known library for symbolic calculus in python is called sympy. This module has a sympy.sqrt function that will simplify a square root as much as it can, but without returning a numerical approximation:
import sympy
print( sympy.sqrt(9) )
# 3
print( sympy.sqrt(2) )
# sqrt(2)
print( sympy.sqrt(18) )
# 3*sqrt(2)
More information about sympy: https://docs.sympy.org/latest/tutorials/intro-tutorial/intro.html
Other advice
When you write a program, it is usually a good idea to cleanly separate the parts of the code that deal with algorithms, math, and logic from the parts that deal with input and output. I suggest writing two functions, one that solves quadratic equations and one that does input and output:
import sympy

# returns solutions of a x**2 + b x + c == 0
def solver(a, b, c):
    Delta = b*b - 4*a*c
    sol1 = (-b - sympy.sqrt(Delta)) / (2*a)
    sol2 = (-b + sympy.sqrt(Delta)) / (2*a)
    return (sol1, sol2)

# ask for user input and solve an equation
def input_equation_output_solution():
    a = int(entry.get())
    b = int(entry1.get())
    c = int(entry2.get())
    sol1, sol2 = solver(a, b, c)
    label2.configure(text = f"Value of x1 = {sol1} and value of x2 = {sol2}")
I'm writing a program that evaluates the power series sum_{m=0}^{oo} a[m]*x^m, where a[m] is defined recursively: a[m] = f(a[m-1]). I am generating symbols as follows:
a = list(sympy.symbols(' '.join([('a%d' % i) for i in range(10)])))
for i in range(1, LIMIT):
    a[i] = f_recur(a[i-1], i-1)
This lets me refer to the symbols a0,a1,...,a9 using a[0],a[1],...,a[9], and a[m] is a function of a[m-1] given by f_recur.
Now, I hoped to code up the summation as follows:
m, x, y = sympy.symbols('m x y')
y = sympy.Sum(a[m]*x**m, (m, 0, 10))
But m is a Symbol, not an integer, so a[m] raises an exception (a Python list can only be indexed by integers).
In this situation, where symbols are stored in a list, how would you code the summation? Thanks for any help!
SymPy's Sum is designed as a sum with a symbolic index. You want a sum with a concrete index running through 0, ..., 9. This could be Python's built-in sum:
y = sum([a[m]*x**m for m in range(10)])
or, which is preferable from the performance point of view (relevant issue)
y = sympy.Add(*[a[m]*x**m for m in range(10)])
In either case, m is not a symbol but an integer.
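Since the coefficients are already a concrete Python list, the same idea also works without SymPy at all. A minimal sketch, assuming a hypothetical recurrence f_recur(prev, i) = prev / (i + 1) (which gives a[m] = 1/m!) and exact Fraction arithmetic; coefficients and power_series are illustrative names, not from the question:

```python
import math
from fractions import Fraction


def f_recur(prev, i):
    # Hypothetical recurrence: a[i+1] = a[i] / (i + 1), so a[m] = 1/m!
    return prev / (i + 1)


def coefficients(a0, recur, limit):
    """Build a[0..limit-1] with a[i] = recur(a[i-1], i-1), mirroring the question's loop."""
    a = [a0]
    for i in range(1, limit):
        a.append(recur(a[i - 1], i - 1))
    return a


def power_series(a, x):
    # m is an ordinary int here, so indexing/enumerating the list is fine.
    return sum(a_m * x**m for m, a_m in enumerate(a))


a = coefficients(Fraction(1), f_recur, 10)
y = power_series(a, Fraction(1))  # partial sum of e = sum of 1/m!
print(y, float(y), math.e)
```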
I have a work-around that does not use sympy.Sum:
x = sympy.symbols('x')
y = a[0]*x**0
for i in range(1, LIMIT):
    y += a[i]*x**i
This does the job, but sympy.Sum is not used.
Use IndexedBase instead of Symbol:
>>> from sympy import IndexedBase, Sum, symbols
>>> x, m = symbols('x m')
>>> a = IndexedBase('a')
>>> Sum(x**m*a[m],(m,1,3))
Sum(a[m]*x**m, (m, 1, 3))
>>> _.doit()
a[1]*x + a[2]*x**2 + a[3]*x**3
I am trying to complete the following exercise:
https://www.codewars.com/kata/whats-a-perfect-power-anyway/train/python
I tried multiple variations, with solutions involving log and power functions, but my code breaks down when big numbers are involved:
Exercise:
Your task is to check whether a given integer is a perfect power. If it is a perfect power, return a pair m and k with m^k = n as a proof. Otherwise return Nothing, Nil, null, None or your language's equivalent.
Note: For a perfect power, there might be several pairs. For example 81 = 3^4 = 9^2, so (3,4) and (9,2) are valid solutions. However, the tests take care of this, so if a number is a perfect power, return any pair that proves it.
The exercise uses Python 3.4.3
My code:
import math
def isPP(n):
    for i in range(2 +n%2,n,2):
        a = math.log(n,i)
        if int(a) == round(a, 1):
            if pow(i, int(a)) == n:
                return [i, int(a)]
    return None
Question:
How is it possible that I keep getting incorrect answers for bigger numbers? I read that in Python 3, all ints are treated as "long" from Python 2, i.e. they can be very large and still represented accurately. Thus, since i and int(a) are both ints, shouldn't the pow(i, int(a)) == n be assessed correctly? I'm actually baffled.
(Edit note: I also added an integer nth root below.)
You are on the right track with logarithms, but you are doing the math wrong. You are also skipping numbers you should not: you only test all the even numbers or all the odd numbers, without considering that a number can be even with an odd power or vice versa.
Check this:
>>> math.log(170**3,3)
14.02441559235585
>>>
Not even close. The correct method is described here: Nth root
which is:
Let x be the number whose nth root we want, n the degree of the root, and r the result; then
$r^n = x$
Take the log in any base $b$ of both sides, and solve for $r$:
$\log_b(r^n) = \log_b(x)$
$n \log_b(r) = \log_b(x)$
$\log_b(r) = \log_b(x) / n$
$b^{\log_b(r)} = b^{\log_b(x)/n}$
$r = b^{\log_b(x)/n}$
So, for instance, with log base 10 we get:
>>> pow(10, math.log10(170**3)/3 )
169.9999999999999
>>>
That is much closer, and just rounding it gives the answer:
>>> round(169.9999999999999)
170
>>>
Therefore the function should be something like this:
import math
def isPP(x):
    for n in range(2, 1+round(math.log2(x)) ):
        root = pow( 10, math.log10(x)/n )
        result = round(root)
        if result**n == x:
            return result,n
The upper limit in range avoids testing exponents that will certainly fail: for n > log2(x), the nth root of x is less than 2.
Test:
>>> isPP(170**3)
(170, 3)
>>> isPP(6434856)
(186, 3)
>>> isPP(9**2)
(9, 2)
>>> isPP(23**8)
(279841, 2)
>>> isPP(279841)
(529, 2)
>>> isPP(529)
(23, 2)
>>>
EDIT
Or, as Tin Peters pointed out, you can use pow(x, 1./n), since the nth root of a number can also be written as x^(1/n).
For example:
>>> pow(170**3, 1./3)
169.99999999999994
>>> round(_)
170
>>>
But keep in mind that this will fail for extremely large numbers, for example:
>>> pow(8191**107,1./107)
Traceback (most recent call last):
File "<pyshell#90>", line 1, in <module>
pow(8191**107,1./107)
OverflowError: int too large to convert to float
>>>
while the logarithmic approach succeeds:
>>> pow(10, math.log10(8191**107)/107)
8190.999999999999
>>>
The reason is that 8191^107 is simply too big: it has 419 digits, which is far beyond the largest representable float (about 1.8e308), whereas reducing it with a log produces a much more reasonable number.
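The digit count and the float limit are easy to check directly. A small standard-library sketch; it relies on math.log10 accepting ints that are too large to convert to float, which is why the log approach avoids the overflow:

```python
import math
import sys

big = 8191 ** 107
print(len(str(big)))       # decimal digits of 8191**107
print(sys.float_info.max)  # largest finite float, about 1.8e308
print(math.log10(big))     # math.log10 handles this int without converting it to float
print(round(pow(10, math.log10(big) / 107)))
```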
EDIT 2
Now, if you want to work with ridiculously big numbers, or simply don't want to use floating-point arithmetic at all and use only integer arithmetic, then the best course of action is Newton's method. The helpful link provided by Tin Peters for the particular case of the cube root, together with the Wikipedia article, shows how to do it in general:
def inthroot(A,n):
    if A<0:
        if n%2 == 0:
            raise ValueError("even root of a negative number")
        return - inthroot(-A,n)
    if A==0:
        return 0
    n1 = n-1
    if A.bit_length() < 1024: # float(A) is safe from overflow
        xk = int( round( pow(A,1/n) ) )
        xk = ( n1*xk + A//pow(xk,n1) )//n # Ensure xk >= floor(nthroot(A)).
    else:
        xk = 1 << -(-A.bit_length()//n) # power of 2 greater than the nth root of A
    while True:
        sig = A // pow(xk,n1)
        if xk <= sig:
            return xk
        xk = ( n1*xk + sig )//n
Check the explanation by Mark Dickinson to understand how the algorithm works for the cube root; it is basically the same in this general case.
Now let's compare this with the other one:
>>> def nthroot(x,n):
        return pow(10, math.log10(x)/n )
>>> n = 2**(2**12) + 1 # a ridiculously big number
>>> r = nthroot(n**2,2)
Traceback (most recent call last):
File "<pyshell#48>", line 1, in <module>
nthroot(n**2,2)
File "<pyshell#47>", line 2, in nthroot
return pow(10, math.log10(x)/n )
OverflowError: (34, 'Result too large')
>>> r = inthroot(n**2,2)
>>> r == n
True
>>>
Then the function becomes:
import math
def isPPv2(x):
    for n in range(2,1+round(math.log2(x))):
        root = inthroot(x,n)
        if root**n == x:
            return root,n
test
>>> n = 2**(2**12) + 1 # a ridiculously big number
>>> r,p = isPPv2(n**23)
>>> p
23
>>> r == n
True
>>> isPPv2(170**3)
(170, 3)
>>> isPPv2(8191**107)
(8191, 107)
>>> isPPv2(6434856)
(186, 3)
>>>
Now let's compare isPP and isPPv2:
>>> x = (1 << 53) + 1
>>> x
9007199254740993
>>> isPP(x**2)
>>> isPPv2(x**2)
(9007199254740993, 2)
>>>
Clearly, avoiding floating point is the best choice.
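As a slower but very simple cross-check for inthroot, the same integer-only idea can be implemented with bisection instead of Newton's method. A minimal sketch, assuming x >= 1; inthroot_bisect and is_perfect_power are hypothetical names:

```python
def inthroot_bisect(A, n):
    """floor(A ** (1/n)) for integers A >= 0 and n >= 1, using only integer arithmetic."""
    if A < 0 or n < 1:
        raise ValueError("need A >= 0 and n >= 1")
    # Invariant: lo**n <= A < hi**n. hi = 2**ceil(bit_length / n) guarantees hi**n > A.
    lo, hi = 0, 1 << -(-A.bit_length() // n)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** n <= A:
            lo = mid
        else:
            hi = mid
    return lo


def is_perfect_power(x):
    """Return (m, k) with m**k == x and k >= 2, or None (assumes x >= 1)."""
    # A root m >= 2 needs k < x.bit_length(), so this range is more than enough.
    for k in range(2, x.bit_length() + 1):
        m = inthroot_bisect(x, k)
        if m ** k == x:
            return m, k
    return None


print(is_perfect_power(6434856))
print(is_perfect_power(((1 << 53) + 1) ** 2))
```

Bisection needs roughly bit_length/k iterations per exponent, so it is much slower than the Newton version on huge inputs; it is mainly useful for testing.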
I have a function that takes two inputs and returns a list of tuples, where the two numbers in each tuple have exactly the same ratio as the two numbers given to the function.
Everything was working fine, but in some instances it does not pick up every tuple. Here is an example, and I don't know why:
In [52]: def find_r(num1,num2):
   ....:     ratio = num1/float(num2)
   ....:     ratio = 1/ratio
   ....:     my_list = [(a,int(a * ratio)) for a in range(1,num1) if float(a * ratio).is_integer()] #and a * 1/float(ratio) + a <= num1]
   ....:     return my_list
   ....:
In [53]: find_r(100,364)
Out[53]: [(75, 273)]
So it returned just one tuple, but if you divide both 75 and 273 by 3, you get the tuple (25, 91), which has the same ratio! Why did my function not pick up this instance?
If it helps, I do suspect it has something to do with the is_integer() method, but I am not too sure.
Thanks!
It is due to the imprecision of floating point arithmetic:
>>> ((100/364)*364).is_integer()
False
>>> ((25/91)*91).is_integer()
False
Instead of doing what you're doing, you should check for equivalence of fractions by cross-multiplying. That is, given a fraction a/b, to check if it is equivalent to another c/d, check whether ad == bc. This will avoid division and keep everything as integers.
You can do this:
def find_r(num1,num2):
    return [(a, a*num2//num1) for a in range(1, num1) if (a*num2) % num1 == 0]
>>> find_r(100, 364)
[(25, 91), (50, 182), (75, 273)]
(There are other ways to accomplish your task, but this is the most similar to your original approach.)
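Another of those ways, sketched under the assumption that both inputs are positive integers: reduce the ratio to lowest terms with math.gcd, then list its multiples. find_r_gcd is a hypothetical name, and like the function above it only returns pairs whose first element is below num1:

```python
from math import gcd


def find_r_gcd(num1, num2):
    """All (a, b) with 0 < a < num1 and a/b == num1/num2, using integers only."""
    g = gcd(num1, num2)
    p, q = num1 // g, num2 // g  # ratio in lowest terms, e.g. 100/364 -> 25/91
    # Every pair with the same ratio is a multiple (k*p, k*q) of the reduced pair.
    return [(k * p, k * q) for k in range(1, (num1 - 1) // p + 1)]


print(find_r_gcd(100, 364))
```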
I think you get the answer you should expect, given floating-point rounding:
>>> r=100/float(364)
>>> r
0.27472527472527475
>>> r=1/r
>>> r
3.6399999999999997
>>> r*25
90.99999999999999
>>> r*75
273.0
To do your integer check, you can use
if int(a*ratio) == a*ratio, as in:
def find_r(num1,num2):
    ratio = num1/float(num2)
    ratio = 1/ratio
    my_list = [(a,int(a * ratio)) for a in range(1,num1) if int(a * ratio) == a * ratio]
    for a in range(1,num1):
        if int(a * ratio) == a * ratio:
            print a * ratio
    return my_list

print find_r(100,364)
Could you guys please tell me how I can make the following code more pythonic?
The code is correct. Full disclosure: it's problem 1b in Handout #4 of this machine learning course. I'm supposed to use Newton's method on the two data sets to fit a logistic hypothesis. But they use MATLAB and I'm using SciPy.
For example, one question I have: the matrices kept rounding to integers until I initialized one value to 0.0. Is there a better way?
Thanks
import os.path
import math
from numpy import matrix
from scipy.linalg import inv #, det, eig

x = matrix( '0.0;0;1' )
y = 11
grad = matrix( '0.0;0;0' )
hess = matrix('0.0,0,0;0,0,0;0,0,0')
theta = matrix( '0.0;0;0' )

# run until convergence=6or7
for i in range(1, 6):
    #reset
    grad = matrix( '0.0;0;0' )
    hess = matrix('0.0,0,0;0,0,0;0,0,0')

    xfile = open("q1x.dat", "r")
    yfile = open("q1y.dat", "r")

    #over whole set=99 items
    for i in range(1, 100):
        xline = xfile.readline()
        s= xline.split(" ")
        x[0] = float(s[1])
        x[1] = float(s[2])
        y = float(yfile.readline())

        hypoth = 1/ (1+ math.exp(-(theta.transpose() * x)))

        for j in range(0,3):
            grad[j] = grad[j] + (y-hypoth)* x[j]
            for k in range(0,3):
                hess[j,k] = hess[j,k] - (hypoth *(1-hypoth)*x[j]*x[k])

    theta = theta - inv(hess)*grad #update theta after construction

    xfile.close()
    yfile.close()

print "done"
print theta
One obvious change is to get rid of the "for i in range(1, 100):" loop and just iterate over the file lines. To iterate over both files (xfile and yfile) at once, zip them, i.e. replace that block with something like:
import itertools
for xline, yline in itertools.izip(xfile, yfile):
    s= xline.split(" ")
    x[0] = float(s[1])
    x[1] = float(s[2])
    y = float(yline)
    ...
(This assumes the file has 100 lines, i.e. you want the whole file.) If you're deliberately restricting to the first 100 lines, you could use something like:
for i, xline, yline in itertools.izip(range(100), xfile, yfile):
However, it's also inefficient to iterate over the same file 6 times. It is better to load the data into memory in advance and loop over it there, i.e. outside your loop, have:
xfile = open("q1x.dat", "r")
yfile = open("q1y.dat", "r")
data = zip([map(float, line.split(" ")[1:3]) for line in xfile], map(float, yfile))
And inside just:
for (x1,x2), y in data:
    x[0] = x1
    x[1] = x2
    ...
import math
from numpy import matrix, zeros
from scipy.linalg import inv

x = matrix([[0.],[0],[1]])
theta = matrix(zeros([3,1]))
for i in range(5):
    grad = matrix(zeros([3,1]))
    hess = matrix(zeros([3,3]))
    [xfile, yfile] = [open('q1'+a+'.dat', 'r') for a in 'xy']
    for xline, yline in zip(xfile, yfile):
        x.transpose()[0,:2] = [map(float, xline.split(" ")[1:3])]
        y = float(yline)
        hypoth = 1 / (1 + math.exp(-(theta.transpose() * x)))
        grad += (y - hypoth) * x
        hess -= hypoth * (1 - hypoth) * x * x.transpose()
    theta -= inv(hess) * grad
print "done"
print theta
the matrixes kept rounding to integers until I initialized one value
to 0.0. Is there a better way?
At the top of your code:
from __future__ import division
In Python 2 (including 2.7), dividing two integers with / always returns an integer unless at least one operand is a float. In Python 3 (and in Python 2 with this future import), / performs true division, which works more the way we humans might expect.
If you want integer (floor) division after importing division from __future__, use a double slash //. That is:
from __future__ import division
print 1//2 # prints 0
print 5//2 # prints 2
print 1/2 # prints 0.5
print 5/2 # prints 2.5
You could make use of the with statement.
The code that reads the files into lists could be drastically simpler:
with open("q1x.dat", "r") as xfile:
    x = [map(float, line.split(" ")[1:]) for line in xfile]
with open("q1y.dat", "r") as yfile:
    y = map(float, yfile.readlines())