is there an alternative to using mod "%" in python - python

I am trying to cycle through a list of numbers (mostly decimals), but I want to return both 0.0 and the max number.
for example
maxNum = 3.0
steps = 5
increment = 0
time = 10
while increment < time:
print increment * (maxNum / steps)% maxNum
increment+=1
#
I am getting this as an output
0.0
0.6
1.2
1.8
2.4
0.0
but I want 3.0 as the largest number and to start back at 0.0 I.E.
0.0
0.6
1.2
1.8
2.4
3.0
0.0
Note, I have to avoid logical loops for the calculation part.

You could create the numbers that you want then use itertools.cycle to cycle through them:
import itertools
nums = itertools.cycle(0.6*i for i in range(6))
for t in range(10):
print(next(nums))
Output:
0.0
0.6
1.2
1.7999999999999998
2.4
3.0
0.0
0.6
1.2
1.7999999999999998

Only small change did the trick:
maxNum = 3.0
steps = 5
i = 0
times = 10
step = maxNum / steps
while (i < times):
print(step * (i % (steps + 1)))
i += 1
0.0
0.6
1.2
1.7999999999999998
2.4
3.0
0.0
0.6
1.2
1.7999999999999998

You could make a if statement that looks ahead if the next printed number is 0.0 then print the maxNum
maxNum = 3.0
steps = 5
increment = 0
time = 10
while increment < time:
print(round(increment * (maxNum / steps)% maxNum, 2))
increment+=1
if (round(increment * (maxNum / steps)% maxNum, 2)) == 0.0:
print(maxNum)
0.0
0.6
1.2
1.8
2.4
3.0
0.0
0.6
1.2
1.8
2.4
3.0

Related

String formatting in general format

Exercise 7.3 from Think Python 2nd Edition:
To test the square root algorithm in this chapter, you could compare it with
math.sqrt. Write a function named test_square_root that prints a table like this:
1.0 1.0 1.0 0.0
2.0 1.41421356237 1.41421356237 2.22044604925e-16
3.0 1.73205080757 1.73205080757 0.0
4.0 2.0 2.0 0.0
5.0 2.2360679775 2.2360679775 0.0
6.0 2.44948974278 2.44948974278 0.0
7.0 2.64575131106 2.64575131106 0.0
8.0 2.82842712475 2.82842712475 4.4408920985e-16
9.0 3.0 3.0 0.0
The first column is a number, a; the second column is the square root of a computed with the function from Section 7.5; the third column is the square root computed by math.sqrt; the fourth column is the absolute value of the difference between the two estimates.
It took me a while to get to this point:
import math
def square_root(a):
x = a / 2
epsilon = 0.0000001
while True:
y = (x + a/x) / 2
if abs(y-x) < epsilon:
break
x = y
return y
def last_digit(number):
rounded = '{:.11f}'.format(number)
dig = str(rounded)[-1]
return dig
def test_square_root():
for a in range(1, 10):
if square_root(a) - int(square_root(a)) < .001:
f = 1
s = 13
elif last_digit(math.sqrt(a)) == '0':
f = 10
s = 13
else:
f = 11
s = 13
print('{0:.1f} {1:<{5}.{4}f} {2:<{5}.{4}f} {3}'.format(a, square_root(a), math.sqrt(a), abs(square_root(a)-math.sqrt(a)), f, s))
test_square_root()
That's my current output:
1.0 1.0 1.0 1.1102230246251565e-15
2.0 1.41421356237 1.41421356237 2.220446049250313e-16
3.0 1.73205080757 1.73205080757 0.0
4.0 2.0 2.0 0.0
5.0 2.2360679775 2.2360679775 0.0
6.0 2.44948974278 2.44948974278 8.881784197001252e-16
7.0 2.64575131106 2.64575131106 0.0
8.0 2.82842712475 2.82842712475 4.440892098500626e-16
9.0 3.0 3.0 0.0
I'm more focused now on achieving the right output, then I'll perfect the code itself, so here are my main problems:
Format the last column (I used {:.12g} once, but then the '0.0' turned to be only '0', so, what should I do?)
Fix the values of the last column. As you can see, there should be only two numbers greater than 0 (when a = 2 and 8), but there are two more (when a = 6 and 1), I printed them alone to see what was going on and the results were the same, I can't understand it.
Thanks for your help! :)

Is there any method for labeling without iteration with pandas?

I have two time-based data. One is the accelerometer's measurement data, another is label data.
For example,
accelerometer.csv
timestamp,X,Y,Z
1.0,0.5,0.2,0.0
1.1,0.2,0.3,0.0
1.2,-0.1,0.5,0.0
...
2.0,0.9,0.8,0.5
2.1,0.4,0.1,0.0
2.2,0.3,0.2,0.3
...
label.csv
start,end,label
1.0,2.0,"running"
2.0,3.0,"exercising"
Maybe these data are unrealistic because these are just examples.
In this case, I want to merge these data to below:
merged.csv
timestamp,X,Y,Z,label
1.0,0.5,0.2,0.0,"running"
1.1,0.2,0.3,0.0,"running"
1.2,-0.1,0.5,0.0,"running"
...
2.0,0.9,0.8,0.5,"exercising"
2.1,0.4,0.1,0.0,"exercising"
2.2,0.3,0.2,0.3,"exercising"
...
I'm using the "iterrows" of pandas. However, the number of rows of real data is greater than 10,000. Therefore, the running time of program is so long. I think, there is at least one method for this work without iteration.
My code like to below:
import pandas as pd
acc = pd.read_csv("./accelerometer.csv")
labeled = pd.read_csv("./label.csv")
for index, row in labeled.iterrows():
start = row["start"]
end = row["end"]
acc.loc[(start <= acc["timestamp"]) & (acc["timestamp"] < end), "label"] = row["label"]
How can I modify my code to get rid of "for" iteration?
If the times in accelerometer don't go outside the boundaries of the times in label, you could use merge_asof:
accmerged = pd.merge_asof(acc, labeled, left_on='timestamp', right_on='start', direction='backward')
Output (for the sample data in your question):
timestamp X Y Z start end label
0 1.0 0.5 0.2 0.0 1.0 2.0 running
1 1.1 0.2 0.3 0.0 1.0 2.0 running
2 1.2 -0.1 0.5 0.0 1.0 2.0 running
3 2.0 0.9 0.8 0.5 2.0 3.0 exercising
4 2.1 0.4 0.1 0.0 2.0 3.0 exercising
5 2.2 0.3 0.2 0.3 2.0 3.0 exercising
Note you can remove the start and end columns with drop if you want to:
accmerged = accmerged.drop(['start', 'end'], axis=1)
Output:
timestamp X Y Z label
0 1.0 0.5 0.2 0.0 running
1 1.1 0.2 0.3 0.0 running
2 1.2 -0.1 0.5 0.0 running
3 2.0 0.9 0.8 0.5 exercising
4 2.1 0.4 0.1 0.0 exercising
5 2.2 0.3 0.2 0.3 exercising

Printing outcomes for rolling 4 dice 10,000 times

The program I'm writing simulates rolling 4 dice and adds the result from each together into a "Total" column. I'm trying to print the outcomes for 10,000 dice rolls but for some reason the value of each dice drops to 0.0 somewhere in the program and it continues like this until the end. Could anyone tell me what's going wrong here and how to fix it? Thanks :)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(101)
four_dice = np.zeros([pow(10,4),5]) # 10,000 rows, 5 columns
n = 0
outcomes = [1,2,3,4,5,6]
for i in outcomes:
for j in outcomes:
for k in outcomes:
for l in outcomes:
four_dice[n,:] = [i,j,k,l,i+j+k+l]
n +=1
four_dice_df = pd.DataFrame(four_dice,columns=('1','2','3','4','Total'))
print(four_dice_df) #print the table
OUTPUT
1 2 3 4 Total
0 1.0 1.0 1.0 1.0 4.0
1 1.0 1.0 1.0 2.0 5.0
2 1.0 1.0 1.0 3.0 6.0
3 1.0 1.0 1.0 4.0 7.0
4 1.0 1.0 1.0 5.0 8.0
... ... ... ... ... ...
9995 0.0 0.0 0.0 0.0 0.0
9996 0.0 0.0 0.0 0.0 0.0
9997 0.0 0.0 0.0 0.0 0.0
9998 0.0 0.0 0.0 0.0 0.0
9999 0.0 0.0 0.0 0.0 0.0
[10000 rows x 5 columns]
Does this work for what you want?
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1,7,size=(10000,4)),columns = [1,2,3,4])
df['total'] = df.sum(axis=1)
You ran out of dice combinations. You made your table 10^4 rows long, but there are only 6^4 combinations. Any row from 1296 through 9999 will be 0, because that's the initialized value.
To fix this, cut your table at the proper value: pow(6, 4)
Response to OP comment:
Of course you can write a loop. In this case, the controlling factor should be the number of results you want. Then you generate permutations to fulfill your needs. The Pythonic way to do this is to use the itertools package: permutations will give you the rolls in order; cycle will repeat the sequence until you quit asking.
However, the more obvious way for your current programming is perhaps to simply count in base 6:
digits = [1, 1, 1, 1, 1]
for i in range(10000):
# Record your digits in the data frame
...
# Add one for the next iteration; roll over if the die is already 6
for idx, die in enumerate(digits):
if die < 6:
digits[idx] += 1
break
else: # Reset die to 1 and continue to next die
digits[idx] = 1
This will increment the dice, left to right, until you either have one that doesn't need a reset to 1, or run out of dice.
Another possibility is to copy any of the many base-conversion functions available on line. Convert your iteration counter i to base 6, take the lowest 4 digits (quantity of dice), and add 1 to each digit.

Summing subsets of many dataframes

I have ~1.2k files that when converted into dataframes look like this:
df1
A B C D
0 0.1 0.5 0.2 C
1 0.0 0.0 0.8 C
2 0.5 0.1 0.1 H
3 0.4 0.5 0.1 H
4 0.0 0.0 0.8 C
5 0.1 0.5 0.2 C
6 0.1 0.5 0.2 C
Now, I have to subset each dataframe with a window of fixed size along the rows, and add its contents to a second dataframe, with all its values originally initialized to 0.
df_sum
A B C
0 0.0 0.0 0.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
For example, let's set the window size to 3. The first subset therefore will be
window = df.loc[start:end, 'A':'C']
window
A B C
0 0.1 0.5 0.2
1 0.0 0.0 0.8
2 0.5 0.1 0.1
window.index = correct_index
df_sum = df_sum.add(window, fill_value=0)
df_sum
A B C
0 0.1 0.5 0.2
1 0.0 0.0 0.8
2 0.5 0.1 0.1
After that, the window will be the subset of df1 from rows 1-4, then rows 2-5, and finally rows 3-6. Once the first file has been scanned, the second file will begin, until all file have been processed. As you can see, this approach relies on df.loc for the subset and df.add for the addition. However, despite the ease of coding, it is very inefficient. On my machine it takes about 5 minutes to process the whole batch of 1.2k files of 200 lines each. I know that an implementation based on numpy arrays is orders of magnitude faster (about 10 seconds), but a bit more complicated in terms of subsetting and adding. Is there any way to increase the performance of this method while stile using dataframe? For example substituting the loc with a more performing slice method.
Example:
def generate_index_list(window_size):
before_offset = -(window_size - 1)// 2
after_offset = (window_size - 1)// 2
index_list = list()
for n in range(before_offset, after_offset + 1):
index_list.append(str(n))
return index_list
window_size = 3
for file in os.listdir('.'):
df1 = pd.read_csv(file, sep= '\t')
starting_index = (window_size - 1)//2
before_offset = (window_size - 1)// 2
after_offset = (window_size -1)//2
for index in df1.iterrows():
if index < starting_index or index + before_offset + 1 > len(profile.index):
continue
indexes = generate_index_list(window_size)
window = df1.loc[index - before_offset:index + after_offset, 'A':'C']
window.index = indexes
df_sum = df_sum.add(window, fill_value=0)
Expected output:
df_sum
A B C
0 1.0 1.1 2.0
1 1.0 1.1 2.0
2 1.1 1.6 1.4
Consider building a list of subsetted data frames with.loc and .head. Then run groupby aggregation after individual elements are concatenated.
window_size = 3
def window_process(file):
csv_df = pd.read_csv(file, sep= '\t')
window_dfs = [(csv_df.loc[i:,['A', 'B', 'C']] # ROW AND COLUMN SLICE
.head(window) # SELECT FIRST WINDOW ROWS
.reset_index(drop=True) # RESET INDEX TO 0, 1, 2, ...
) for i in range(df.shape[0])]
sum_df = (pd.concat(window_dfs) # COMBINE WINDOW DFS
.groupby(level=0).sum()) # AGGREGATE BY INDEX
return sum_df
# BUILD LONG DF FROM ALL FILES
long_df = pd.concat([window_process(f) for file in os.listdir('.')])
# FINAL AGGREGATION
df_sum = long_df.groupby(level=0).sum()
Using posted data sample, below are the outputs of each window_dfs:
A B C
0 0.1 0.5 0.2
1 0.0 0.0 0.8
2 0.5 0.1 0.1
A B C
0 0.0 0.0 0.8
1 0.5 0.1 0.1
2 0.4 0.5 0.1
A B C
0 0.5 0.1 0.1
1 0.4 0.5 0.1
2 0.0 0.0 0.8
A B C
0 0.4 0.5 0.1
1 0.0 0.0 0.8
2 0.1 0.5 0.2
A B C
0 0.0 0.0 0.8
1 0.1 0.5 0.2
2 0.1 0.5 0.2
A B C
0 0.1 0.5 0.2
1 0.1 0.5 0.2
A B C
0 0.1 0.5 0.2
With final df_sum to show accuracy of DataFrame.add():
df_sum
A B C
0 1.2 2.1 2.4
1 1.1 1.6 2.2
2 1.1 1.6 1.4

Checking decimal point python

I'm trying to create a script that counts to 3 (step size 0.1) using while, and I'm trying to make it not display .0 for numbers without decimal number (1.0 should be displayed as 1, 2.0 should be 2...)
What I tried to do is convert the float to int and then check if they equal. the problem is that it works only with the first number (0) but it doesn't work when it gets to 1.0 and 2.0..
this is my code:
i = 0
while i < 3.1:
if int(i) == i:
print int(i)
else:
print i
i = i + 0.1
that's the output I get:
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
3.0
the output I should get:
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
3
thank you for your time.
Due to lack of precision in floating point numbers, they will not have an exact integral representation. Therefore, you want to make sure the difference is smaller than some small epsilon.
epsilon = 1e-10
i = 0
while i < 3.1:
if abs(round(i) - i) < epsilon:
print round(i)
else:
print i
i = i + 0.1
You can remove trailing zeros with '{0:g}'.format(1.00).
i = 0
while i < 3.1:
if int(i) == i:
print int(i)
else:
print '{0:g}'.format(i)
i = i + 0.1
See: https://docs.python.org/3/library/string.html#format-specification-mini-language
Update: Too lazy while copy/pasting. Thanks #aganders3
i = 0
while i < 3.1:
print '{0:g}'.format(i)
i = i + 0.1

Categories

Resources