I have a function that I use on two different machines: one a Mac running Python 2.6, the other a Lenovo running Python 3.2. The function writes data to a file and is called from within a loop. When using Python 3.2 it works as expected and I get output such as this:
25.0 25.0 25.0 0
25.0 25.0 75.0 0
25.0 25.0 125.0 0
25.0 25.0 175.0 0
25.0 25.0 225.0 0
25.0 75.0 25.0 0
25.0 75.0 75.0 0
25.0 75.0 125.0 0
25.0 75.0 175.0 0
25.0 75.0 225.0 0
When I run it on the machine running version 2.6, I get this:
175.0 25.0 75.0 2
175.0 25.0 125.0 0
25.0 25.0 25.0 0    <- should be the first line
175.0 25.0 175.0 0
25.0 25.0 75.0 0    <- should be the second line
175.0 25.0 225.0 0
25.0 25.0 125.0 1
175.0 75.0 25.0 0
25.0 25.0 175.0 1
175.0 75.0 75.0 2
Here is the code:

def filesave(Xc,Yc,Zc,S):
    Xc = str(Xc)
    Yc = str(Yc)
    Zc = str(Zc)
    Xs = str(S)
    #Ms = str(Ma)
    w = open("Myout.txt.","a+")
    w.write(Xc)
    w.write('\t')
    w.write(Yc)
    w.write('\t')
    w.write(Zc)
    w.write('\t')
    w.write(Xs)
    w.write('\n')
    w.close()
    return()
Is there some difference between the two versions that is causing this? Thanks!
EDIT
Rest of the code:

def cell_centers():
    read_file(F)
    dx = dy = dz = float(input('Please enter a value for dr:'))  # length of cell side
    N = int(input('Please enter a value for N:'))  # N^3 number of cells to be created
    Xc = zeros(N)  # array creation
    Yc = zeros(N)
    Zc = zeros(N)
    x1 = 0
    y1 = 0
    z1 = 0
    county = 0
    countz = 0
    for i in range(N):  # for loops to define cell centers
        Xc[i] = dx/2 + x1
        xmin = Xc[i] - dx/2
        xmax = Xc[i] + dx/2
        x1 += dx  # increments x1 position by dx
        for j in range(N):
            Yc[j] = dy/2 + y1
            ymin = Yc[j] - dy/2
            ymax = Yc[j] + dy/2
            county += 1
            if county == N:  # if/else statement resets y1 to zero
                y1 = 0
                county = 0
            else:
                y1 += dy
            for k in range(N):
                Zc[k] = dz/2 + z1
                countz += 1
                zmin = Zc[k] - dz/2
                zmax = Zc[k] + dz/2
                if countz == N:
                    z1 = 0
                    countz = 0
                else:
                    z1 += dz
                counter(Xc[i],Yc[j],Zc[k],N,xmin,xmax,ymin,ymax,zmin,zmax,*read_file(F))
    return()

def counter(Xc,Yc,Zc,N,xmin,xmax,ymin,ymax,zmin,zmax,Xa,Ya,Za):
    Cellcount = zeros(1)
    S = (((xmin <= Xa) & (Xa <= xmax)) &  # count what is in the specific range
         ((ymin <= Ya) & (Ya <= ymax)) &
         ((zmin <= Za) & (Za <= zmax))).sum()
    for l in range(1):
        Cellcount[l] = S
    filesave(Xc,Yc,Zc,S)
    return()
I am going to go out on a limb and say the difference you are observing is due to the changed division behavior between versions 2.x and 3.x (it looks like there's a lot of dividing going on, and I can't tell whether the numbers are integers or floats).
In 2.x you get integer truncation when dividing two integers; this doesn't happen in 3.x:
Python 2
In [267]: 5 / 2
Out[267]: 2
Python 3:
In [1]: 5 / 2
Out[1]: 2.5
Your code does a lot of division.
If you still want the old integer-division behavior, you can use // with Python 3:
Python 3:
In [2]: 5 // 2
Out[2]: 2
Changing the Division Operator explains this in detail.
What’s New In Python 3.0 goes over the big changes from version 2 to 3.
If you want the new division behavior in Python 2.2+, you can use the from __future__ import division directive (thanks @Jeff for reminding me).
Python 2:
In [1]: 5 / 2
Out[1]: 2
In [2]: from __future__ import division
In [3]: 5 / 2
Out[3]: 2.5
UPDATE:
Finally, consider division as the cause of the apparent reordering: perhaps the lines aren't actually out of order, but the results differ because of the division, which only makes them look shuffled. Is that possible? Also notice that the 4th column of the 3.x output is all zeros, while the 2.x output contains nonzero counts; that further points toward the results being computed differently, not written out of order.
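To see the version difference concretely, here is a minimal sketch (illustrative values only, not taken from the question's code) that runs unchanged under both interpreters:

# Uses %-formatting so the same line works as a print statement (2.x)
# and as a print function call (3.x).
for a, b in [(5, 2), (5.0, 2), (1, 2)]:
    print('%r / %r = %r' % (a, b, a / b))

# Python 2.6 prints: 5 / 2 = 2    5.0 / 2 = 2.5    1 / 2 = 0
# Python 3.x prints: 5 / 2 = 2.5  5.0 / 2 = 2.5    1 / 2 = 0.5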
Your filesave function is fine. I bet the difference in output is because Python 2 returns an integer from integer division expressions, while Python 3 returns a float:
Python 2
>>> 1/2
0
>>> 4/2
2
Python 3
>>> 1/2
0.5
>>> 4/2
2.0
This will give different mathematical results in your program and might account for the different ordering of the output.
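If division is indeed the culprit, one hedged fix (not something the question itself confirms) is to put the __future__ import at the very top of the module so Python 2.6 divides the same way 3.2 does:

from __future__ import division  # must be the first statement in the module

print(5 / 2)   # 2.5 on Python 2.2+ as well as on Python 3
print(5 // 2)  # floor division remains available in both: 2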
Related
Exercise 7.3 from Think Python 2nd Edition:
To test the square root algorithm in this chapter, you could compare it with
math.sqrt. Write a function named test_square_root that prints a table like this:
1.0 1.0 1.0 0.0
2.0 1.41421356237 1.41421356237 2.22044604925e-16
3.0 1.73205080757 1.73205080757 0.0
4.0 2.0 2.0 0.0
5.0 2.2360679775 2.2360679775 0.0
6.0 2.44948974278 2.44948974278 0.0
7.0 2.64575131106 2.64575131106 0.0
8.0 2.82842712475 2.82842712475 4.4408920985e-16
9.0 3.0 3.0 0.0
The first column is a number, a; the second column is the square root of a computed with the function from Section 7.5; the third column is the square root computed by math.sqrt; the fourth column is the absolute value of the difference between the two estimates.
It took me a while to get to this point:
import math

def square_root(a):
    x = a / 2
    epsilon = 0.0000001
    while True:
        y = (x + a/x) / 2
        if abs(y-x) < epsilon:
            break
        x = y
    return y

def last_digit(number):
    rounded = '{:.11f}'.format(number)
    dig = str(rounded)[-1]
    return dig

def test_square_root():
    for a in range(1, 10):
        if square_root(a) - int(square_root(a)) < .001:
            f = 1
            s = 13
        elif last_digit(math.sqrt(a)) == '0':
            f = 10
            s = 13
        else:
            f = 11
            s = 13
        print('{0:.1f} {1:<{5}.{4}f} {2:<{5}.{4}f} {3}'.format(a, square_root(a), math.sqrt(a), abs(square_root(a)-math.sqrt(a)), f, s))

test_square_root()
That's my current output:
1.0 1.0 1.0 1.1102230246251565e-15
2.0 1.41421356237 1.41421356237 2.220446049250313e-16
3.0 1.73205080757 1.73205080757 0.0
4.0 2.0 2.0 0.0
5.0 2.2360679775 2.2360679775 0.0
6.0 2.44948974278 2.44948974278 8.881784197001252e-16
7.0 2.64575131106 2.64575131106 0.0
8.0 2.82842712475 2.82842712475 4.440892098500626e-16
9.0 3.0 3.0 0.0
I'm focused for now on getting the right output; I'll perfect the code itself afterwards. Here are my main problems:
Formatting the last column: I used {:.12g} once, but then '0.0' turned into just '0', so what should I do? (See the sketch after this list.)
Fixing the values of the last column: as the expected table shows, only two entries should be greater than 0 (at a = 2 and a = 8), but my output has two more (at a = 1 and a = 6). I printed those values on their own to see what was going on and got the same results; I can't understand it.
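For the first problem, here is a small illustration (mine, not part of the exercise) of why the g presentation type drops the trailing zero, along with one possible workaround:

print('{:.12g}'.format(0.0))                    # prints 0 -- 'g' strips trailing zeros
print('{:.12g}'.format(2.220446049250313e-16))  # prints 2.22044604925e-16

# Possible workaround (a hypothetical helper): special-case zero.
def fmt(x):
    return '0.0' if x == 0 else '{:.12g}'.format(x)

print(fmt(0.0), fmt(2.220446049250313e-16))     # 0.0 2.22044604925e-16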
Thanks for your help! :)
I'm trying to convert kilometer values in one column of a dataframe to mile values. I've tried various things and this is what I have now:
def km_dist(column, dist):
    length = len(column)
    for dist in zip(range(length), column):
        if (column == data["dist"] and dist in data.loc[(data["dist"] > 25)]):
            return dist / 5820
        else:
            return dist

data = data.apply(lambda x: km_dist(data["dist"], x), axis=1)
The dataset I'm working with looks something like this:
past_score dist income lab score gender race income_bucket plays_sports student_id lat long
0 8.091553 11.586920 67111.784934 0 7.384394 male H 3 0 1 0.0 0.0
1 8.091553 11.586920 67111.784934 0 7.384394 male H 3 0 1 0.0 0.0
2 7.924539 7858.126614 93442.563796 1 10.219626 F W 4 0 2 0.0 0.0
3 7.924539 7858.126614 93442.563796 1 10.219626 F W 4 0 2 0.0 0.0
4 7.726480 11.057883 96508.386987 0 8.544586 M W 4 0 3 0.0 0.0
With my code above, I'm trying to loop through all the "dist" values and, if those values are in the right column (data["dist"]) and greater than 25, divide them by 5820 (the number of feet in a kilometer). More generally, I'd like to find a way to operate on specific elements of dataframes. I'm sure this is a fairly common question; I just haven't been able to find an answer for it. If someone could point me toward somewhere with an answer, I would be just as happy.
Instead of your solution, filter the rows with a boolean mask and divide the dist column by 5820:
data.loc[data["dist"] > 25, 'dist'] /= 5820
This works the same as:
data.loc[data["dist"] > 25, 'dist'] = data.loc[data["dist"] > 25, 'dist'] / 5820
data.loc[data["dist"] > 25, 'dist'] /= 5820
print (data)
past_score dist income lab score gender race \
0 8.091553 11.586920 67111.784934 0 7.384394 male H
1 8.091553 11.586920 67111.784934 0 7.384394 male H
2 7.924539 1.350194 93442.563796 1 10.219626 F W
3 7.924539 1.350194 93442.563796 1 10.219626 F W
4 7.726480 11.057883 96508.386987 0 8.544586 M W
income_bucket plays_sports student_id lat long
0 3 0 1 0.0 0.0
1 3 0 1 0.0 0.0
2 4 0 2 0.0 0.0
3 4 0 2 0.0 0.0
4 4 0 3 0.0 0.0
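An equivalent alternative (a hedged variation, not in the original answer) builds the column with numpy.where in a single vectorized expression:

import numpy as np

# Same masked division: divide only where dist exceeds 25, keep the rest.
data['dist'] = np.where(data['dist'] > 25, data['dist'] / 5820, data['dist'])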
The program I'm writing simulates rolling 4 dice and adds the result from each into a "Total" column. I'm trying to print the outcomes for 10,000 dice rolls, but for some reason the value of each die drops to 0.0 partway through and stays that way until the end. Could anyone tell me what's going wrong here and how to fix it? Thanks :)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(101)

four_dice = np.zeros([pow(10,4),5])  # 10,000 rows, 5 columns
n = 0
outcomes = [1,2,3,4,5,6]
for i in outcomes:
    for j in outcomes:
        for k in outcomes:
            for l in outcomes:
                four_dice[n,:] = [i,j,k,l,i+j+k+l]
                n += 1

four_dice_df = pd.DataFrame(four_dice,columns=('1','2','3','4','Total'))
print(four_dice_df)  # print the table
OUTPUT
1 2 3 4 Total
0 1.0 1.0 1.0 1.0 4.0
1 1.0 1.0 1.0 2.0 5.0
2 1.0 1.0 1.0 3.0 6.0
3 1.0 1.0 1.0 4.0 7.0
4 1.0 1.0 1.0 5.0 8.0
... ... ... ... ... ...
9995 0.0 0.0 0.0 0.0 0.0
9996 0.0 0.0 0.0 0.0 0.0
9997 0.0 0.0 0.0 0.0 0.0
9998 0.0 0.0 0.0 0.0 0.0
9999 0.0 0.0 0.0 0.0 0.0
[10000 rows x 5 columns]
Does this work for what you want?
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1,7,size=(10000,4)),columns = [1,2,3,4])
df['total'] = df.sum(axis=1)
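A quick sanity check of that approach (my addition, illustrative only):

print(df.head())                         # first five simulated rolls
print(df['total'].between(4, 24).all())  # True: four dice always sum to 4..24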
You ran out of dice combinations. You made your table 10^4 rows long, but there are only 6^4 combinations. Any row from 1296 through 9999 will be 0, because that's the initialized value.
To fix this, cut your table at the proper value: pow(6, 4)
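A minimal version of that fix, assuming everything else in the script stays the same:

import numpy as np

n_rows = pow(6, 4)                 # 1296 actual combinations of four dice
four_dice = np.zeros([n_rows, 5])  # instead of pow(10, 4) rows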
Response to OP comment:
Of course you can write a loop. In this case, the controlling factor should be the number of results you want; then you generate the rolls to fulfill your needs. The Pythonic way to do this is the itertools package: product (not permutations, which would never repeat a face) will give you the rolls in order; cycle will repeat the sequence until you quit asking.
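A short sketch of that idea (hedged; the variable names are mine):

from itertools import cycle, product

# product enumerates all 6**4 ordered rolls; cycle repeats the sequence
# so the loop can fill 10,000 rows if that is really what you want.
rolls = cycle(product(range(1, 7), repeat=4))
for n, (i, j, k, l) in zip(range(10000), rolls):
    total = i + j + k + l  # e.g. four_dice[n, :] = [i, j, k, l, total]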
However, the approach closest to your current code is perhaps to simply count in base 6:
digits = [1, 1, 1, 1, 1]
for i in range(10000):
    # Record your digits in the data frame
    ...
    # Add one for the next iteration; roll over if the die is already 6
    for idx, die in enumerate(digits):
        if die < 6:
            digits[idx] += 1
            break
        else:  # Reset die to 1 and continue to next die
            digits[idx] = 1
This will increment the dice, left to right, until you either have one that doesn't need a reset to 1, or run out of dice.
Another possibility is to copy any of the many base-conversion functions available online. Convert your iteration counter i to base 6, take the lowest 4 digits (the quantity of dice), and add 1 to each digit.
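A hedged sketch of that conversion (the helper name is mine):

def dice_from_counter(i):
    # Read i in base 6: the lowest 4 digits become the four die faces.
    digits = []
    for _ in range(4):
        i, d = divmod(i, 6)
        digits.append(d + 1)  # shift digit range 0..5 to faces 1..6
    return digits[::-1]

print(dice_from_counter(0))     # [1, 1, 1, 1]
print(dice_from_counter(1295))  # [6, 6, 6, 6]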
This must be really obvious but I am currently doing a little tutorial that features this code snippet:
n = 0
a = 1
while a > 0:
    n = n + 1
    a = (1.0 + 2.0 ** (-n)) - 1.0
    print(n)
When I run it, it always stops at n = 53. Why? I just assumed the while condition would always be true...
If you change the last line to print(n, a) you can see what's happening more clearly:
n = 0
a = 1
while a > 0:
    n = n + 1
    a = (1.0 + 2.0 ** (-n)) - 1.0
    print(n, a)
Output:
1 0.5
2 0.25
3 0.125
4 0.0625
# ...
50 8.881784197001252e-16
51 4.440892098500626e-16
52 2.220446049250313e-16
53 0.0
As you can see, a is half the size each time through the loop. Eventually, 2.0 ** (-n) is so small that floating point math (which has limited precision) is unable to tell the difference between 1.0 and 1.0 + 2.0 ** (-n):
>>> 1.0 + 2.0 ** -51
1.0000000000000004
>>> 1.0 + 2.0 ** -52
1.0000000000000002
>>> 1.0 + 2.0 ** -53
1.0
… and when that happens, subtracting 1.0 from 1.0 gives you 0.0, and the while loop terminates.
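The cutoff at n = 53 is no accident: a double-precision float has a 53-bit significand, so the smallest representable step above 1.0 is 2 ** -52. You can confirm this with the standard library:

>>> import sys
>>> sys.float_info.mant_dig
53
>>> sys.float_info.epsilon
2.220446049250313e-16
>>> sys.float_info.epsilon == 2.0 ** -52
True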
I have the following code:
a = 0.0
b = 0.0
c = 1.0
while a < 300:
    a = a + 1
    b = b + 1
    c = c * b
    d = (3**a)
    e = (a+1)*c
    f = d / e
    print a, f
The moment f becomes less than 1, I get "0" displayed... why?
The moment f becomes less than 1, I get "0" displayed
That's not what happens. The first time f is less than 1, 4.0 0.675 is printed. That's not 0:
1.0 1.5
2.0 1.5
3.0 1.125
4.0 0.675
5.0 0.3375
The value of f then quickly becomes very, very small, to the point that Python starts using scientific notation with negative exponents:
6.0 0.144642857143
7.0 0.0542410714286
8.0 0.0180803571429
9.0 0.00542410714286
10.0 0.00147930194805
11.0 0.000369825487013
12.0 8.53443431568e-05
13.0 1.82880735336e-05
14.0 3.65761470672e-06
15.0 6.8580275751e-07
Note the -05, -06, etc. The value of f has become so small that Python displays it in scientific notation rather than writing out all the leading zeros. If you were to format those values using fixed-point notation, they'd show those zeros:
>>> format(8.53443431568e-05, '.53f')
'0.00008534434315680000128750276600086976941383909434080'
>>> format(6.8580275751e-07, '.53f')
'0.00000068580275751000002849671038224199648425383202266'
Eventually, the computation runs into the limits of the floating point type, and f collapses to 0.0:
165.0 5.89639287564e-220
166.0 1.05923225311e-221
167.0 1.89148616627e-223
168.0 3.35766775077e-225
169.0 5.92529603077e-227
170.0 0.0
The last value before 0.0 is 5.925 with two hundred and twenty-six zeros between the decimal point and it:
>>> format(5.92529603077e-227, '.250f')
'0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000592529603077000028336751'
That's still far above the absolute minimum value a float object can represent; see the min attribute of the sys.float_info named tuple:
>>> import sys
>>> sys.float_info.min
2.2250738585072014e-308
So f itself has not run out of room. What gives out first is the denominator: c grows factorially, and at a = 170 the expression e = (a + 1) * c reaches 171!, which exceeds the largest representable float (sys.float_info.max, about 1.8e308) and overflows to inf. Dividing the finite d by inf then yields exactly 0.0, which is the zero you see.
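A quick check of that explanation (my addition):

>>> import sys
>>> c = 1.0
>>> for b in range(1, 172):
...     c *= b                # c steps through 1!, 2!, ..., 171!
...
>>> sys.float_info.max
1.7976931348623157e+308
>>> c
inf
>>> (3.0 ** 170) / c
0.0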