Segmenting a list of data; python


I am trying to write code that takes a list, flow_rate, and splits it into a list of segments, list_segmented, each of length segment_len. Then I take each element of that segmented list and treat it as a data_segment.
I am stuck on how to make each list_segmented[i] become data_segment. The last part of the code calls another function on data_segment, which I have previously written and can import.
I'd appreciate your help.
def flow_rate_to_disorder_status(flow_rate, segment_len, interval, threshold):
    inlist = flow_rate[:]
    list_segmented = []
    disorder_status = []
    while inlist:
        list_segmented.append(inlist[0 : segment_len])
        inlist[0 : segment_len] = []
    for i in range(0, len(list_segmented)):
        data_segment = list_segmented[i]
    condition = sym.has_symptom(data_segment, interval, threshold)
    disorder_status.append(condition)
Initial function:
def has_symptom(data_segment, interval, threshold):
    max_ratio = 1  # maximum ratio allowed when dividing
    # data points in interval by len data_segment
    # for our example it is 1
    # NOTE: max_ratio can NOT be less than threshold
    # to define the range of the given interval:
    min_interval = interval[0]
    max_interval = interval[1]
    # create an empty list to collect the data points that fall in the interval
    symptom_yes = []
    # loop over every point in data_segment
    # and check whether or not it falls in the interval
    for i in range(0, len(data_segment)):
        if min_interval <= data_segment[i] <= max_interval:
            # if the data falls in the interval, add it to list symptom_yes
            symptom_yes.append(data_segment[i])
    # to get the fraction ratio between interval points and total data points
    fraction_ratio = len(symptom_yes) / len(data_segment)
    # if the ratio of data points that fall in the interval to total points in
    # the data segment is more than or equal to threshold and less than or equal
    # to max_ratio (1 in our case) then apply condition
    if threshold <= fraction_ratio <= max_ratio:
        condition = True  # entire segment has the symptom
    else:
        condition = False  # entire segment does NOT have the symptom
    return condition

You nearly did it:
for i in range(0, len(data_segment)):  # <-- looping through data_segment
    # data_segment = list_segmented[i]  <-- this was back to front.
    list_segmented[i] = data_segment  # <-- this will work
Note: there are cleaner ways of doing this in Python (such as a list comprehension).
Anyway, good question. Hope that helps.
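As a sketch of the list-comprehension route the note mentions (the helper name segment is hypothetical, not from the question), the whole while loop can be replaced by slicing at every multiple of segment_len:

```python
def segment(values, segment_len):
    """Split values into consecutive chunks of segment_len items.

    The last chunk is shorter when len(values) is not a multiple of
    segment_len; segment_len must be a positive integer.
    """
    return [values[i:i + segment_len] for i in range(0, len(values), segment_len)]


print(segment([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```

With that helper, the status list could be built as [sym.has_symptom(seg, interval, threshold) for seg in segment(flow_rate, segment_len)], assuming sym is the module that holds has_symptom. Slicing never modifies flow_rate, so no copy of the input is needed.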

It looks like the lines
condition = sym.has_symptom(data_segment, interval, threshold)
disorder_status.append(condition)
should each be indented by one more level to be inside the for loop, so that they are executed for each data segment.
You presumably also want to return disorder_status at the end of the function.
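A minimal sketch of the function with both suggestions applied: the two calls moved into the for loop and disorder_status returned. Because the asker's sym module isn't available here, a stand-in has_symptom that follows the question's logic (fraction of points inside the interval compared with threshold) is defined inline; in the real code you would keep calling sym.has_symptom.

```python
def has_symptom(data_segment, interval, threshold):
    # Stand-in for sym.has_symptom: True when the fraction of points inside
    # [interval[0], interval[1]] is between threshold and 1 (inclusive).
    low, high = interval
    inside = sum(1 for x in data_segment if low <= x <= high)
    return threshold <= inside / len(data_segment) <= 1


def flow_rate_to_disorder_status(flow_rate, segment_len, interval, threshold):
    inlist = flow_rate[:]  # copy so the caller's list is not emptied
    list_segmented = []
    disorder_status = []
    while inlist:
        list_segmented.append(inlist[0:segment_len])
        inlist[0:segment_len] = []  # drop the chunk just stored
    for i in range(0, len(list_segmented)):
        data_segment = list_segmented[i]
        # These two lines now run once per segment
        condition = has_symptom(data_segment, interval, threshold)
        disorder_status.append(condition)
    return disorder_status


print(flow_rate_to_disorder_status([1, 2, 3, 4, 10, 11], 2, (0, 5), 0.5))
# [True, True, False]
```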

Related

Trying to add a progress bar as my python program runs

I am a beginner writing Python code in which the computer generates a random number between 1 and 10, 1 and 100, 1 and 1000, 1 and 10000, 1 and 100000, and so on. The computer then guesses that random number a number of times (the user enters how many), and each time it counts how many guesses it took. The mean of those counts is stored in an array, and matplotlib plots it against x = log10(the upper bound of the range, i.e. 10, 100, 1000, ...).
At the moment I print the log10 of each bound as it is processed, and that has been acting as my progress tracker. But I am thinking of adding a progress bar, and I don't know where to put it so that I can see how much of the overall program has run.
I have tried adding tqdm.tqdm in all sorts of different places, to no avail. I expect a progress bar that fills up as the program runs.
My program is as shown.
# Importing the modules needed
import random
import time
import timeit
import numpy as np
import matplotlib.pyplot as plt
import tqdm
# Function for making the computer guess the number it itself has generated and seeing how many times it takes for it to guess the number
def computer_guess(x):
    # Telling program that value "low" exists and it's 0
    low = 0
    # Telling program that value "high" exists and it's the arbitrary parameter x
    high = x
    # Storing random number with lower limit "low" and upper limit "high" as "ranno" for the while-loop later
    ranno = random.randint(low, high)
    # Setting initial value of "guess" for iteration
    guess = -1
    # Setting initial value of "count" for iteration
    count = 1
    # While-loop for all guessing conditions
    while guess != ranno:
        # Condition: As long as "low" and "high" aren't the same, keep guessing between them; once they are the same, "guess" is "low" (or "high", because they are the same anyway)
        if low != high:
            guess = random.randint(low, high)
        else:
            guess = low
        # Condition: If the guess is bigger than the random number, lower "high" to one less than the guess, and add 1 to the guess count
        if guess > ranno:
            high = guess - 1
            count += 1
        # Condition: If the guess is smaller than the random number, raise "low" to one more than the guess, and add 1 to the guess count
        elif guess < ranno:
            low = guess + 1
            count += 1
        # Condition: If it's neither of the above, i.e. the computer has guessed the number, return the guess count for this function
        else:
            return count
# Setting up a global array "upper_bounds" of the upper limits of the random-number ranges, log-spaced from 10^1 to 10^50
upper_bounds = np.logspace(1, 50, num=50, base=10)
def guess_avg(no_of_guesses):
    # Empty list for all averages
    list_of_averages = []
    # For every value in the "upper_bounds" array,
    for bound in upper_bounds:
        # choose random number, "ranx", between 0 and the bound in the array
        ranx = random.randint(0, bound)
        # make an empty NumPy array, "guess_array", to insert the guesses into
        guess_array = np.array([])
        # For every value in whatever the no_of_guesses is when function called,
        for i in np.arange(no_of_guesses):
            # assign value, "guess", as calling function with value "ranx"
            guess = computer_guess(ranx)
            # stuff each resultant guess into the "guess_array" array
            guess_array = np.append(guess_array, guess)
        # Print log10 of each value in "upper_bounds"
        print(int(np.log10(bound)))
        # Finding the mean of all guesses for this order of magnitude
        average_of_guesses = np.mean(guess_array)
        # Append the average of the guesses to the list made before
        list_of_averages.append(average_of_guesses)
    # Save the average of all averages in the list of averages into a single variable
    average_of_averages = np.mean(list_of_averages)
    # Print the list of averages
    print(f"Your list of averages: {list_of_averages}")
    # Print the average of averages
    print(f"Average of averages: {average_of_averages}")
    return list_of_averages
# Repeat the "guess_avg" function as long as the program is running
while True:
    # Ask user to input a number for how many guesses they want the computer to make for each order of magnitude, and use that number for calling the function "guess_avg()"
    resultant_average_numbers = guess_avg(
        int(input("Input the number of guesses you want the computer to make: ")))
    # Plot a graph with log10 of the orders of magnitude on the horizontal axis and the returned averages of guesses on the vertical axis
    plt.plot(np.log10(upper_bounds), resultant_average_numbers)
    # Show plot
    plt.show()
I apologise if this is badly explained, it's my first time using Stackoverflow.

You can define the following progress_bar function and call it from wherever you want to monitor progress in your code:
import colorama
def progress_bar(progress, total, color=colorama.Fore.YELLOW):
    percent = 100 * (progress / float(total))
    bar = '█' * int(percent) + '-' * (100 - int(percent))
    print(color + f'\r|{bar}| {percent:.2f}%', end='\r')
    if progress == total:
        print(colorama.Fore.GREEN + f'\r|{bar}| {percent:.2f}%', end='\r')
Hope this helps
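If you'd rather not depend on colorama, here is a stdlib-only sketch of the same idea. render_bar and with_progress are hypothetical names; wrapping the outer loop of guess_avg() is one placement that makes the bar advance once per bound.

```python
import sys


def render_bar(progress, total, width=40):
    """Return a text bar, e.g. render_bar(4, 10, width=10) -> '|####------| 40.00%'."""
    fraction = progress / float(total) if total else 1.0
    filled = int(width * fraction)
    return f"|{'#' * filled}{'-' * (width - filled)}| {100 * fraction:.2f}%"


def with_progress(iterable, total=None, stream=None):
    """Yield the items of iterable, redrawing the bar after each one is processed."""
    stream = sys.stderr if stream is None else stream
    total = len(iterable) if total is None else total
    for done, item in enumerate(iterable, 1):
        yield item  # the caller's loop body runs while the generator is paused here
        stream.write("\r" + render_bar(done, total))
        stream.flush()
    stream.write("\n")
```

In guess_avg() this would be for bound in with_progress(upper_bounds):. The existing print(int(np.log10(bound))) line would push the bar onto new lines, so it is probably best removed once the bar is in place.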

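You can also create a tqdm bar by hand and then update it manually:
progress_bar = tqdm.tqdm(total=100)
progress_bar.update()
Each update() call advances the bar by one step. When you are finished, call progress_bar.close(); to start the same bar again from zero, call progress_bar.reset() (progress_bar.clear() only clears the bar from the display).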

You probably want two progress bars in the guess_avg() function. One to track the ranges and another to track the guesses.
In this example I've used the Enlighten progress bar library, but you can accomplish similar behavior with other libraries that support nested progress bars. One advantage Enlighten has over others is that you can print whatever you want while the progress bar is running, which is good for debugging.
You can make this simpler by using context managers and auto-updating counters, but I didn't do that here to make it clearer what's happening. You can also customize the template used for the progress bar.
import enlighten
def guess_avg(no_of_guesses):
    # Empty array for all averages
    list_of_averages = []
    # For every value in the "upper_bounds" array,
    # Create a progress bar manager
    manager = enlighten.get_manager(leave=False)
    # Create main progress bar for ranges
    pbar_bounds = manager.counter(total=len(upper_bounds), desc='Bound ranges', unit='ranges')
    for bound in upper_bounds:
        # choose random number, "ranx", between 0 and the bound in the array
        ranx = random.randint(0, bound)
        # make an empty NumPy array, "guess_array", to insert the guesses into
        guess_array = np.array([])
        # For every value in whatever the no_of_guesses is when function called,
        # Create nested progress bar for guesses, leave=False removes progress bar on close
        pbar_guesses = manager.counter(total=no_of_guesses, desc='Guessing', unit='guesses', leave=False)
        for i in np.arange(no_of_guesses):
            # assign value, "guess", as calling function with value "ranx"
            guess = computer_guess(ranx)
            # stuff each resultant guess into the "guess_array" array
            guess_array = np.append(guess_array, guess)
            pbar_guesses.update()  # Update nested progress bar
        pbar_guesses.close()  # Close nested progress bar
        # Print log10 of each value in "upper_bounds"
        print(int(np.log10(bound)))  # You can remove this now if you want
        pbar_bounds.update()  # Update main progress bar
        # Finding the mean of all guesses for this order of magnitude
        average_of_guesses = np.mean(guess_array)
        # Append the average of the guesses to the list made before
        list_of_averages.append(average_of_guesses)
    manager.stop()  # Stop the progress bar manager
    # Save the average of all averages in the list of averages into a single variable
    average_of_averages = np.mean(list_of_averages)
    # Print the list of averages
    print(f"Your list of averages: {list_of_averages}")
    # Print the average of averages
    print(f"Average of averages: {average_of_averages}")
    return list_of_averages

How to make a nested for-loop with two individual loops?

I am tasked with running a for-loop which initially finds the value of the funds in a man's investment account from his 41st to his 65th birthday. Here is the code:
import numpy as np
import pandas as pd
mu = 0.076 ###Mean
sigma = 0.167 ###Standard deviation
np.exp(np.random.normal(mu, sigma,))
u = .076 ###set variables
bi = 50000 ###set variables
empty = [] ###empty list
for i in range(25): ###range for birthdays 40-65
    bi = ((bi) * (np.exp(np.random.normal(mu, sigma))))
    empty.append(bi)
print(empty)
len(empty) ##making sure my lists match up
roundedempty = [ '%.2f' % elem for elem in empty ]
age = [41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,
       60,61,62,63,64,65]
len(age) ##making sure my lists match up
investing = pd.DataFrame({"Balance":roundedempty, "Age":age})
investing.set_index('Age', inplace=True)
investing
When I print this out, it gives me this:
Age Balance
41 53948.13
.........
65 334294.72
Now I am tasked with simulating this 100,000 times, but I am not sure how to nest another loop within that first set of code.
mu = 0.076 ###Mean
sigma = 0.167 ###Standard deviation
bi = 50000
lognormvals = np.zeros(100000)
for i in range(100000):
    lognormvals[i] = ((bi) * (np.exp(np.random.normal(mu, sigma,))))
print(lognormvals)
np.mean(lognormvals)
This is what I want, but it only does it for his 41st birthday. I am tasked with finding the mean balance on each birthday from his 41st to his 65th. How can I nest this loop within the first loop to solve this?
My attempt at solving it:
def InvestmentSim():
    mu = 0.076 ###Mean
    sigma = 0.167 ###Standard deviation
    np.exp(np.random.normal(mu, sigma,))
    u = .076 ###set variables
    bi = 50000 ###set variables
    empty = [] ###empty list
    for i in range(25): ###range for birthdays 40-65
        bi = ((bi) * (np.exp(np.random.normal(mu, sigma))))
        empty.append(bi)
    roundedempty = [ '%.2f' % elem for elem in empty ]
    age = [41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,
           60,61,62,63,64,65]
    len(age) ##making sure my lists match up
    investing = pd.DataFrame({"Balance":roundedempty, "Age":age})
    investing.set_index('Age', inplace=True)
    a = investing.iloc[24]['Balance']
    return a
def AverageSim(iterations):
    results = []
    for n in range (0, iterations):
        a = InvestmentSim()
        results.append(a)
    print(results)
    return myresult
    myresult = AverageSim(1)
    myresults = np.array(myresult) # Turn the list of values into an array
    mean = np.mean(myresults)
    median = np.median(myresults)
    print(mean, median)
Instead of keeping the balance for every year, I singled out the balance on his 65th birthday and assigned it to a. Is this how my code should work? It doesn't seem to be running.

If you just want to repeat the first snippet n times, I'd suggest wrapping your simulation code in a function that you can call repeatedly in a for loop. The function should return your expected values, and the for loop should collect the results. After the loop is finished you can do further calculations with the results, such as taking the mean.
# Your simulation as a callable function
def InvestmentSim():
    # your first code
    return investing
def AverageSims(iterations):
    # Initialise an empty list to collect the results
    results = []
    for n in range (0, iterations):
        investing = InvestmentSim()
        results.append(investing)
        # Adds investing to the end of the list
    # Returns a list of all the dataframes which you can use for future
    # calculations.
    # Or do the desired calculations here and return only those results.
    return results
myresult = AverageSims(100000)
Note that with 100,000 iterations you will get a very long list of fairly bulky dataframes. So instead you may want to do some calculations in place or pull out relevant results from each iteration and discard the rest. For example you could just save the start and end balances from each sim and append those to lists and return those.
I'd give an example but I don't use pandas so I don't want to guess at the syntax. The basic principle is the same though: Initialise some blank lists and append the results in the for loop.
Calculations will probably be simpler to set up if you turn the finished list(s) into numpy arrays using np.array(somelist)
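Since the answer stops short of an example, here is a minimal sketch of the "keep only the start and end balances" idea without pandas. It uses NumPy's Generator API (np.random.default_rng) in place of the global np.random.normal; the function names and keyword defaults are illustrative, taken from the question's mu, sigma and 50000 starting balance.

```python
import numpy as np


def investment_sim(rng, start=50_000, years=25, mu=0.076, sigma=0.167):
    """Simulate one path; return the balances on the 41st..65th birthdays."""
    balances = []
    balance = start
    for _ in range(years):
        balance *= np.exp(rng.normal(mu, sigma))  # one year of log-normal growth
        balances.append(balance)
    return balances


def average_sims(iterations, seed=None):
    """Run the simulation repeatedly, keeping only the start and end balances."""
    rng = np.random.default_rng(seed)
    start_balances = []
    end_balances = []
    for _ in range(iterations):
        path = investment_sim(rng)
        start_balances.append(path[0])  # balance on the 41st birthday
        end_balances.append(path[-1])  # balance on the 65th birthday
    return np.array(start_balances), np.array(end_balances)


starts, ends = average_sims(10_000, seed=1)
print(np.mean(ends), np.median(ends))
```

The 100,000-iteration run from the question works the same way, just more slowly, because each year is still drawn in a Python-level loop.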
Edit
Your code isn't running because you're calling the AverageSims function inside the AverageSims function, so you never actually get to that call. You need to move that call outside so it's executed when you run your script. The simplest way is to write the call the same way I did above, outside any function and without indentation.
Also, if your AverageSims() function doesn't have the return line, it will return None instead. That's not a problem unless you want to use the results later on.
If you don't want to keep the results and are happy with printing them out as you do now, you can also call the function without equating it to a variable:
def InvestmentSim():
    # your first code
    return investing
def AverageSims(iterations):
    # Repeatedly do the simulation, collect results
    # Returning the results, if you don't need them you can omit this:
    return results
# Now call the AverageSims function, otherwise it will never be executed. Note: No indent.
myresult = AverageSims(100000)  # If you're returning the results
# OR:
AverageSims(100000)  # If you don't return the results
# Further calculations (these need the returned results):
myresult = np.array(myresult)  # Turn the list of values into an array
mean = np.mean(myresult)
median = np.median(myresult)
print(mean, median)

You can just put a loop inside another loop:
for x in range(10):
    for y in range(5):
        print(x, y)
In your case it would also be advisable to put the inner loop into a function, and run that function in the outer loop. Like this:
def calculate(*args):
    # do your calculations here that include a loop
    return result
for x in range(100000):
    calculate()
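For the original goal (the mean balance on every birthday from 41 to 65), both loops can also be handed to NumPy: draw a simulations × birthdays array of growth factors and take a running product along each row. This is a sketch using the question's parameters; mean_balance_by_age is a hypothetical name, and with 100,000 simulations the array holds 2.5 million floats (about 20 MB).

```python
import numpy as np


def mean_balance_by_age(n_sims, start=50_000, first_age=41, last_age=65,
                        mu=0.076, sigma=0.167, seed=None):
    """Return (ages, mean balance at each age) over n_sims simulated paths."""
    rng = np.random.default_rng(seed)
    years = last_age - first_age + 1
    # Rows play the role of the outer loop (simulations),
    # columns the inner loop (birthdays)
    growth = np.exp(rng.normal(mu, sigma, size=(n_sims, years)))
    # Running product along each row gives the balance on every birthday
    balances = start * np.cumprod(growth, axis=1)
    ages = np.arange(first_age, last_age + 1)
    return ages, balances.mean(axis=0)


ages, means = mean_balance_by_age(100_000, seed=42)
for age, mean in zip(ages, means):
    print(age, round(mean, 2))
```

To get a table like the one in the question, pd.DataFrame({"Balance": means}, index=ages) should give one row per age.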

How to make function that calculates moving average of a list?

I am trying to calculate a moving average of a list called 'temp_data' in the function below. The moving average data should be stored in a list called 'moving_average'. The code below works in the sense that the list 'temp_mov' is printed inside the function (line 12), but not when I call the function later on (in the last line of the code); in that case I get an empty list. What mistake am I making?
# calculate moving average of a list of weather data
def make_moving(temps, temp_mov):
    ''' Create moving average from list weather data'''
    cumsum, temp_mov = [0], []
    for i, x in enumerate(temps, 1):
        cumsum.append(cumsum[i-1] + x)
        if i>=N:
            moving_ave = round((cumsum[i] - cumsum[i-N])/N, 1)
            temp_mov.append(moving_ave)
    print(temp_mov)
    return temp_mov
make_moving(temp_data, moving_average)
print(moving_average)

You assign a new list to temp_mov here:
cumsum, temp_mov = [0], []
Therefore, moving_average is not updated when temp_mov changes.
Changing make_moving(temp_data, moving_average) to moving_average = make_moving(temp_data) and removing the temp_mov parameter will solve the problem.
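A sketch of the fixed function along those lines: temp_mov is no longer a parameter, and the window size is passed in as n rather than read from the global N the question relies on (an assumption about where N comes from).

```python
def make_moving(temps, n):
    """Return the n-point moving averages of temps, rounded to 1 decimal."""
    cumsum = [0]
    temp_mov = []
    for i, x in enumerate(temps, 1):
        cumsum.append(cumsum[i - 1] + x)  # running total of the first i values
        if i >= n:
            temp_mov.append(round((cumsum[i] - cumsum[i - n]) / n, 1))
    return temp_mov


temp_data = [10.0, 12.0, 11.0, 13.0]
moving_average = make_moving(temp_data, 2)
print(moving_average)  # [11.0, 11.5, 12.0]
```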

How to write multiple columns in a csv file according to "Lists of tuples" obtained during "for loop" iteration in python

With due respect to the previous responses, I am going to completely change my question. I am generating lists of tuples as below.
import random
for i in range(5):
    TotalDistance = 0  # particle i starts moving from 0
    TotalTime = 0  # particle i starts moving at time 0
    driftpoints = [(0, 0)]
    while TotalDistance < 5.0:
        time = random.uniform(0, 1)  # particle takes time to move to next position
        distance = random.uniform(0, 1)  # particle moves by distance.
        TotalTime = TotalTime + time
        TotalDistance = TotalDistance + distance
        position = (TotalTime, TotalDistance)
        driftpoints.append(position)
An example list for the first iteration is given below.
[(0, 0), (0.21724544874575513, 0.754467286127031), (0.25007307998158623, 1.118356895500405), (0.7047856454945854, 1.4755146942363875), (1.3710776008226833, 2.16401542582095), (1.9942383846177156, 2.9751487045440026), (2.707031044871571, 3.9578284975759295), (3.3278895170648877, 4.831285527860187), (4.000180863917544, 5.218308572399064)]
If it were a single list, I could save it to a CSV file in the following format.
Time, Position,
0, 0,
0.21724544874575513, 0.754467286127031,
0.25007307998158623, 1.118356895500405,
0.7047856454945854, 1.4755146942363875,
1.3710776008226833, 2.16401542582095,
1.9942383846177156, 2.9751487045440026,
2.707031044871571, 3.9578284975759295,
3.3278895170648877, 4.831285527860187,
4.000180863917544, 5.218308572399064
But I have more iterations to come, and the problem I am facing is adding columns for the next particles. How do I save five pairs of columns, one pair per particle, in a single CSV file? Remember that the lengths of the columns can differ significantly because of the random numbers used for the time and distance.
Please forgive me and redirect me to the solution if this or similar question has been already answered.

Okay, I did not get any response to my question except the first one, after which I edited my question and that response was deleted.
Anyway, I solved it the following way.
transposed = []
for i in range(5):
    # ... all my previous code ...
    transposed = transposed + zip(*driftpoints)
FinalFile = list(itertools.izip_longest(*transposed))
The line after my previous code transposes the list created in each iteration and adds it to one long list of tuples that looks like
transposed = [(0, t1, t2,...t5),(0, x1, x2,...x5), ... (0, t1, t2,...t8),(0,x1, x2,...x8)]
The last line then re-transposes that list so that each iteration appears in the next pair of columns. Unlike zip(*transposed), izip_longest is not limited to the length of the shortest column.
Finally, write FinalFile to a CSV file:
with open('file.csv', 'wb') as out:
    csv_out = csv.writer(out)
    for row in FinalFile:
        csv_out.writerow(row)
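The code above is Python 2: in Python 3, zip() returns an iterator, itertools.izip_longest was renamed itertools.zip_longest, and the csv module expects a text-mode file opened with newline=''. A sketch of the same transpose, pad and re-transpose approach for Python 3, assuming each particle's driftpoints list is collected into a list of paths (paths_to_rows, write_rows and the Time1/Position1 headers are hypothetical names):

```python
import csv
from itertools import zip_longest


def paths_to_rows(paths):
    """Build CSV rows with one (time, position) column pair per particle.

    Each path is a non-empty list of (time, position) tuples; shorter
    paths are padded with empty cells so every row has the same width.
    """
    header = []
    columns = []
    for k, path in enumerate(paths, 1):
        header += [f"Time{k}", f"Position{k}"]
        times, positions = zip(*path)  # transpose the pairs into two columns
        columns += [times, positions]
    # Re-transpose the columns into rows without truncating to the shortest
    rows = list(zip_longest(*columns, fillvalue=""))
    return [header] + rows


def write_rows(rows, out):
    """Write the rows to an already opened text file (or file-like object)."""
    csv.writer(out).writerows(rows)
```

In the question's loop, that would mean appending driftpoints to a list such as all_paths on each iteration, then calling write_rows(paths_to_rows(all_paths), out) inside with open('file.csv', 'w', newline='') as out:.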

How to create a list of n dictionaries

I'm learning Python and for practicing purposes I'm writing a script that reads a file (containing a graph in Trivial Graph Format) and runs a couple of graph algorithms on the graph.
I thought about storing the graph as a list of n dictionaries, where n is the number of vertices and each dictionary holds all the edges of one vertex.
I tried this
edges = [{} for i in xrange(num_vertexes)]
for line in file:
    args = line.split(' ')
    vertex1 = int(args[0])
    vertex2 = int(args[1])
    label = int(args[2])
    edges[vertex1][vertex2] = label
but I'm getting this error for the last line:
IndexError: list index out of range

It looks like vertex1 is probably too large. Python indexes from 0, while the example of the format on the wiki numbers vertices from 1, so the last line's vertex number is probably 1 higher than the highest valid index (I'd need to see the file to know for sure, of course). In Python, lst[0] is the first element and lst[n-1] is the last, whereas the vertices run from 1 (the first) to n (the last).
So the fix here is to use vertex1 = int(args[0])-1
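A sketch of that fix, assuming lines of the form 'from to label' with integer labels, as in the question. It also subtracts 1 from vertex2 so that the dictionary keys use the same 0-based numbering as the list indices (that part is a design choice, not something the answer requires); build_edges is a hypothetical name, and range replaces Python 2's xrange.

```python
def build_edges(lines, num_vertexes):
    """Store the graph as a list of dicts: edges[v][w] = label (0-based ids)."""
    edges = [{} for _ in range(num_vertexes)]
    for line in lines:
        args = line.split()  # split() also drops the trailing newline
        vertex1 = int(args[0]) - 1  # the file numbers vertices from 1
        vertex2 = int(args[1]) - 1
        label = int(args[2])
        edges[vertex1][vertex2] = label
    return edges


print(build_edges(["1 2 5", "2 3 7", "3 1 9"], 3))  # [{1: 5}, {2: 7}, {0: 9}]
```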

The issue is somewhere in your data. Add some validation to make sure your code doesn't choke on bad data: currently it will fail if a line has non-numbers, has fewer than three numbers, or has vertex1 >= len(edges).
edges = [{} for i in xrange(num_vertexes)]
for line in file:
    args = line.split(' ')
    if len(args) >= 3:
        try:
            vertex1 = int(args[0])
            vertex2 = int(args[1])
            label = int(args[2])
            if vertex1 < len(edges):
                edges[vertex1][vertex2] = label
            else:
                # value for vertex1 is too large
                pass
        except ValueError:
            # you got some non-number data
            pass
    else:
        # you got a line with not enough data
        pass
Replace any of those pass statements with logging if needed (you can also remove the two else blocks if you don't intend to use them).
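In Trivial Graph Format, the edge lines come after a line containing #, preceded by one node line per vertex, so a loop over the whole file also meets the node lines and the separator; that may be part of what trips the original code. A sketch that counts the node lines to size the list, uses logging.warning in place of the pass statements, and keeps edge labels as text (TGF labels are free text, so int() is not assumed). parse_tgf is a hypothetical name, and vertex ids are assumed to be integers numbered from 1.

```python
import logging


def parse_tgf(lines):
    """Parse TGF lines into a list of dicts: edges[v][w] = label (0-based ids)."""
    node_count = 0
    edges = None  # created when the '#' separator line is reached
    for lineno, raw in enumerate(lines, 1):
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if edges is None:
            if fields[0] == "#":
                edges = [{} for _ in range(node_count)]
            else:
                node_count += 1  # node line: 'id [label]'
            continue
        if len(fields) < 2:
            logging.warning("line %d: expected 'from to [label]'", lineno)
            continue
        try:
            vertex1 = int(fields[0]) - 1
            vertex2 = int(fields[1]) - 1
        except ValueError:
            logging.warning("line %d: non-numeric vertex id", lineno)
            continue
        if not 0 <= vertex1 < len(edges):
            logging.warning("line %d: vertex %s is out of range", lineno, fields[0])
            continue
        edges[vertex1][vertex2] = " ".join(fields[2:])  # label is optional
    return edges if edges is not None else [{} for _ in range(node_count)]
```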
