creating initial population for genetic algorithm - python

I need to create the initial population to solve the equation z = x⁵ - 10x³ + 30x - y² + 21y with a genetic algorithm. The population must be binary and must follow these rules:
X and Y range: [-2.5, 2.5]
The first bit represents the sign (0 or 1)
The second and third bits represent the integer part, values from 0 to 2 (00, 01, 10)
The remaining bits represent the fractional part, values from 0 to 5000.
import numpy as np

def pop(pop_size):
    pop = []
    for i in range(pop_size):
        for j in range(2):
            signal = bin(np.random.randint(0, 2))[2:]
            integer = bin(np.random.randint(0, 3))[2:]
            float = bin(np.random.randint(0, 5001))[2:].zfill(13)
            binary = [signal, integer, float]
            binary = [''.join(binary)]
            pop.append(binary)
    return pop
My output right now looks like this: [['1110001000110000'], ['1100010010011000'], ['11000100010001010'], ['0100011000000010'], ['0100010111100001'], ['01000111001101110']]
But I need it to look like this: [['1110001000110000', '1100010010011000'], ['11000100010001010', '0100011000000010'], ['0100010111100001', '01000111001101110']] because each pair represents the value for X and Y.
Any idea of what I'm missing?

How about this:
import numpy as np

def pop(pop_size):
    rlt = []
    for i in range(pop_size):
        # one [x, y] pair per individual, filled in by the inner loop
        rlt.append([None, None])
        for j in range(2):
            signal = bin(np.random.randint(0, 2))[2:]
            integer = bin(np.random.randint(0, 3))[2:]
            floats = bin(np.random.randint(0, 5001))[2:].zfill(13)
            rlt[-1][j] = signal + integer + floats
    return rlt
Demo
>>> pop(3)
[['0100111010000110', '000110111010111'], ['100101100010010', '010010000101100'], ['0100011000011010', '0100111100001011']]
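The rules in the question describe a fixed-width layout (1 sign bit, 2 integer bits, 13 fractional bits), but `bin(...)[2:]` drops leading zeros, which is why the strings above vary in length. A minimal sketch of a fixed-width variant follows. It assumes that a sign bit of 1 means negative and that the fractional field counts ten-thousandths, so that 2 + 5000/10000 = 2.5 matches the stated range; neither is confirmed by the question. The names `encode_gene`, `decode_gene` and `make_population` are hypothetical.

```python
import random


def encode_gene(rng):
    """Return a 16-character bit string: 1 sign + 2 integer + 13 fractional bits."""
    sign = format(rng.randint(0, 1), "01b")
    integer = format(rng.randint(0, 2), "02b")        # zero-padded to 2 bits
    fraction = format(rng.randint(0, 5000), "013b")   # zero-padded to 13 bits
    return sign + integer + fraction


def decode_gene(bits):
    """Decode a gene, assuming sign bit 1 = negative and fraction = ten-thousandths."""
    sign = -1 if bits[0] == "1" else 1
    integer = int(bits[1:3], 2)
    fraction = int(bits[3:], 2)
    return sign * (integer + fraction / 10000)


def make_population(pop_size, seed=None):
    """Build pop_size individuals, each an [x, y] pair of encoded genes."""
    rng = random.Random(seed)
    return [[encode_gene(rng) for _ in range(2)] for _ in range(pop_size)]


population = make_population(3, seed=1)
for x_bits, y_bits in population:
    print(x_bits, y_bits, decode_gene(x_bits), decode_gene(y_bits))
```

Building the pair first and appending it once per individual is what gives the nested shape; the code in the question appends each string separately inside the inner loop.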

Generating random floats, summing to 1, with minimum value

I have seen many solutions for generating random floats within a specific range (like this), which actually help me, and solutions for generating random floats summing to 1 (like this). Separately these solutions work perfectly, but I can't figure out how to merge them.
Currently my code is:
import random

def sample_floats(low, high, k=1):
    """ Return a k-length list of unique random floats
        in the range of low <= x <= high
    """
    result = []
    seen = set()
    for i in range(k):
        x = random.uniform(low, high)
        while x in seen:
            x = random.uniform(low, high)
        seen.add(x)
        result.append(x)
    return result
And still, applying
weights = sample_floats(0.055, 1.0, 11)
weights /= np.sum(weights)
returns a weights array in which some floats are less than 0.055.
Should I somehow use np.random.dirichlet in the function above, or should the function be built on np.random.dirichlet with the condition > 0.055 applied afterwards? I can't figure out a solution.
Thank you in advance!
IIUC, you want to generate an array of k values that sums to 1, with a minimum value of low=0.055.
It is easier to generate non-negative numbers that sum to 1-low*k, and then add low to each of them so that the final array sums to 1. This guarantees both the lower bound and the sum.
Regarding high, I am pretty sure it is mathematically impossible to add this constraint: once you fix the lower bound and the sum, there are not enough degrees of freedom to choose an upper bound. The upper bound will be 1-low*(k-1) (here 0.505).
Also, be aware that a minimum value necessarily enforces a maximum k of 1//low (here 18 values). If you set k higher, the lower bound cannot be satisfied.
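import numpy as np

# parameters
low = 0.055
k = 10
# k random values, rescaled so that they sum to 1 - low*k
a = np.random.rand(k)
a = (a/a.sum()*(1-low*k))
# shifting every value by low restores a total of 1
weights = a+low
# checking that the sum is 1
assert np.isclose(weights.sum(), 1)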
Example output:
array([0.13608635, 0.06796974, 0.07444545, 0.1361171 , 0.07217206,
0.09223554, 0.12713463, 0.11012871, 0.1107402 , 0.07297022])
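The bounds claimed above (every weight at least low, no weight above 1-low*(k-1), and k limited by 1//low) can be checked directly. The following is a minimal sketch of the same shift-and-rescale idea using only the standard library; `weights_with_floor` and its `seed` parameter are hypothetical names introduced for illustration.

```python
import random


def weights_with_floor(k, low, seed=None):
    """Return k floats that sum to 1, each at least `low`."""
    if k * low > 1:
        # with more than 1/low values the floor alone would exceed a total of 1
        raise ValueError("k * low must not exceed 1")
    rng = random.Random(seed)
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    spare = 1 - low * k  # mass left to distribute once every value holds `low`
    return [low + spare * value / total for value in raw]


weights = weights_with_floor(k=10, low=0.055, seed=0)
print(min(weights), max(weights), sum(weights))
```

Each value is low plus a non-negative share of the spare mass, so the largest possible value is low + (1 - low*k), which is the 1-low*(k-1) stated above.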
You could generate k-1 numbers iteratively by varying the upper bound of the uniform random number generator. The constraint at any iteration is that the number generated must leave enough room for each of the remaining numbers to be at least low:
import random

def sample_floats(low, high, k=1):
    # high is kept from the question's signature but is not used here
    result = []
    generated = 0
    while generated < k-1:
        # leave at least low for every number still to be generated
        current_higher_bound = max(low, 1 - (k - 1 - generated)*low - sum(result))
        next_num = random.uniform(low, current_higher_bound)
        result.append(next_num)
        generated += 1
    # the last number takes whatever is left so that the total is 1
    last_num = 1 - sum(result)
    result.append(last_num)
    return result

print(sample_floats(0.01, 1, k=15))
#[0.08878760926151083,
# 0.17897435239586243,
# 0.5873150041878156,
# 0.021487776792166513,
# 0.011234379498998357,
# 0.012408564286727042,
# 0.015391011259745103,
# 0.01264921242128719,
# 0.010759267284382326,
# 0.010615007333002748,
# 0.010288605412288477,
# 0.010060487014659121,
# 0.010027216923973544,
# 0.010000064276203318,
# 0.010001441651377285]
The samples are correlated, so I believe you can't generate them in an IID way. You can, however, do it iteratively, as in the code below. There are a few more special cases to check, such as the user passing low > high or high*k < sum, but I figured you can find and handle them starting from my modification of your code.
import random
import warnings

def sample_floats(low = 0.055, high = 1., x_sum = 1., k = 1):
    """ Return a k-length list of unique random floats
        in the range of 'low' <= x <= 'high' summing up to 'x_sum'.
    """
    sum_i = 0
    xs = []
    if x_sum - (k-1)*low < high:
        warnings.warn(f'high = {high} is too high to be generated under the'
                      f' conditions set by k = {k}, sum = {x_sum}, and low = {low}.'
                      f' high automatically set to {x_sum - (k-1)*low}.')
        high = x_sum - (k-1)*low  # apply the cap announced in the warning
    if k == 1:
        if high < x_sum:
            raise ValueError(f'The parameter combination k = {k}, sum = {x_sum},'
                             f' and high = {high} is impossible.')
        else:
            return [x_sum]  # a one-element list, as the docstring promises
    high_i = high
    for i in range(k-1):
        x = random.uniform(low, high_i)
        xs.append(x)
        sum_i = sum_i + x
        # shrink the upper bound so the remaining values can still reach low
        if high < (x_sum - sum_i - (k-1-i)*low):
            high_i = high
        else:
            high_i = x_sum - sum_i - (k-1-i)*low
    # the last value is whatever is left of the target sum
    xs.append(x_sum - sum_i)
    return xs
For example:
random.seed(0)
xs = sample_floats(low = 0.055, high = 0.5, x_sum = 1., k = 5)
print(xs)
print(sum(xs))
Output:
[0.43076772392864643, 0.27801464913542906, 0.08495210994346317, 0.06568433355884717, 0.14058118343361425]
1.0

Reading just after decimal point for an entire data file (Python)

I'm trying to read just the value after the decimal point of a parameter calculated in my python script for a whole data set. I've read up on using math.modf, and I understand how it should work but I'm unsure how to apply that to my dataset as a whole. Ultimately I'm trying to plot a scatter graph with the values calculated.
I only need the number after the decimal point from the equation where x is the imported dataset
p = (x[:,0]-y)/z
I understand math.modf gives a result (fractional, integer) so I tried adding [0] at the end but I think that interferes when I'm trying to read certain lines from the dataset.
Sorry if this is really basic I'm new to python, thanks in advance.
This is how I've inputted it so far
norm1 = np.loadtxt("dataset")
y = (numerical value)
z = (numerical value)
p = (norm1[:,0] - y)/z
decp = math.modf(p)
plt.scatter(decp, norm1[:,2])
Use it like this:
import math
x = 5/2
print(x)
x = math.modf(x)[0]
print(x)
output:
2.5
0.5
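Edit 1:
For an entire dataset:
import math
x = 5/2
y = 5/2
values = []  # not named "list", which would shadow the built-in
values.append(x)
values.append(y)
print(values)
for i in range(len(values)):
    # keep only the fractional part of each entry
    values[i] = math.modf(values[i])[0]
print(values)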
output:
[2.5, 2.5]
[0.5, 0.5]
Can't you handle the numbers as strings and cast the result back to int?
# example input
nbrs = [45646.45646, 45646.15649, 48646.67845, 486468.15684]
def get_first_frac(n: float) -> int:
    # returns only the first digit after the decimal point
    return int(str(n).split('.')[1][0])
nbrs_frac = [get_first_frac(n) for n in nbrs]
print(nbrs_frac)
result:
[4, 1, 6, 1]
edit: to apply this on a np array do the following
result = np.array(list(map(get_first_frac, x)))
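For a NumPy array such as p in the question, math.modf fails because it expects a single float. NumPy provides an element-wise counterpart, np.modf, which returns the fractional and integral parts as two arrays. The sketch below uses made-up sample values in place of the real dataset.

```python
import numpy as np

# made-up stand-in for p = (norm1[:, 0] - y) / z
p = np.array([2.5, 3.25, 10.0, 7.125])

# np.modf works element-wise and returns (fractional parts, integral parts)
fractional, whole = np.modf(p)
print(fractional)

# p % 1 gives the same result for non-negative values
print(p % 1)
```

For negative inputs the two differ: np.modf keeps the sign of the input (np.modf(-2.5) gives -0.5 and -2.0), while -2.5 % 1 gives 0.5.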

Python: Pseudo-random color from a string

I am doing some data visualization with Python in Blender and I need to assign colors to the data being represented. There is too much data to spend time hand-picking colors for each, so I want to generate the colors pseudo-randomly - I would like a given string to always result in the same color. That way, if it appears more than once, it will be clear that it is the same item.
For example, given a list like ['Moose', 'Frog', 'Your Mother'], let's say Moose would always be maroon.
The colors are specified in RGB, where each channel is a float from 0.0 to 1.0.
Here's what I've tried so far that isn't working:
import random
import hashlib

def norm(x, min, max): # this normalization function has a problem
    normalized = ( x - min(x) ) / (max(x) - min(x) )
    return normalized

def r(string, val):
    h = hash(string)
    print(h)
    if h < 0:
        h = h * -1
    rs = random.seed( int(h) + int(val) )
    output = norm(rs, 0.0, 1.0)
    return output

my_list = ['Moose', 'Frog', 'Your Mother']
item = my_list[0]
color = [ r(item,1), r(item,2), (item,1) ]
print(color)
It results in TypeError: 'float' object is not callable but I don't know why. I'm trying to normalize the way this answer demonstrates.
It might be best to have a list of possible colors, as it allows for control over the palette. Either way, I need a pseudo-random float in the range of 0.0 ~ 1.0.
You can try this function. It returns an RGB value for the name by using its first 3 letters.
def name_color(name):
    # maps 'a'..'z' to 0..200 per channel, so the result is on a 0-255 style scale
    color = [(ord(c.lower())-97)*8 for c in name[:3]]
    return color
It turns out I didn't need to normalize or do anything fancy. random.random() already returns a value in the range 0.0 ~ 1.0. I just needed to take a closer look at how random.seed() is supposed to be used.
Here's my solution:
import random

def r(string, val): # pseudo-randomization function
    h = hash( string + str(val) ) # hash string and val together
    if h < 0: # ensure positive number
        h = h * -1
    random.seed(h) # set the seed to use for randomization
    output = random.random() # produce random value in range 0.0 ~ 1.0
    output = round(output, 6) # round to 6 decimal places (optional)
    return output

my_list = ['Toyota', 'Tesla', 'Mercedes-Benz']
item = my_list[0] # choose which list item
color = [ r(item,0), r(item,1), r(item,2) ] # R,G,B values
print(item, color)
Output: Toyota [0.049121, 0.383824, 0.635146]
I may try the list approach later.
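One caveat with hash(): in Python 3, string hashes are salted per process unless PYTHONHASHSEED is set, so the same string can produce a different color on the next run. If the color has to be stable across runs, a digest from hashlib is one option. The following is a minimal sketch; `string_to_rgb` is a hypothetical name, and using the first three digest bytes as the channels is just one possible mapping.

```python
import hashlib


def string_to_rgb(text):
    """Map a string to a repeatable [r, g, b] list of floats in 0.0..1.0."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # the first three bytes (0..255 each) become the three channels
    return [byte / 255 for byte in digest[:3]]


for name in ['Moose', 'Frog', 'Your Mother']:
    print(name, string_to_rgb(name))
```

Because the digest depends only on the bytes of the string, the same name gives the same color in every session.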

Finding nearest set of numbers given a position

I have a dictionary that looks something like so:
exons = {'NM_015665': [(0, 225), (356, 441), (563, 645), (793, 861)], etc...}
and another file that has a position like so:
isoform pos
NM_015665 449
What I want to do is print the range of numbers that the position in the file is the closest to and then print the number within that range of numbers that the value is closest to. For this case, I want to print (356, 441) and then 441. I've successfully figured out a way to print the number in the set of numbers that the value is closest to, but my code below only takes into account 10 values on either side of the numbers listed. Is there any way to take into account that there are a different amount of numbers between each set of ranges?
This is the code I have so far:
import csv

with open('splicing_reinitialized.txt') as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        pos = row['pos']
        name = row['isoform']
        ppos1 = int(pos)
        if name in exons:
            y = exons[name]
            for i, (low, high) in enumerate(exons[name]):
                if low - 5 <= ppos1 <= high + 5:
                    values = (low, high)
                    closest = min((low, high), key=lambda x: abs(x - ppos1))
I would rewrite it as a minimum distance search:
if name in exons:
    y = exons[name]
    minDist = float('inf')  # larger than any real distance
    minIdx = None
    minNum = None
    for i, (low, high) in enumerate(y):
        dlow = abs(low - ppos1)
        dhigh = abs(high - ppos1)
        dist = min(dlow, dhigh)
        if dist < minDist:
            minDist = dist
            minIdx = i
            minNum = 0 if dlow < dhigh else 1  # 0 = low end, 1 = high end
    print(y[minIdx])
    print(y[minIdx][minNum])
This ignores the search range and simply searches for the pair with the minimum distance.
A functional alternative :). This might even run faster. It clearly is very RAM-friendly and can be easily parallelized due to the perks of functional programming. I hope you'll find it interesting enough to study.
# Python 2 only: imap, izip and ifilter were removed from itertools in Python 3
from itertools import imap, izip, ifilter, repeat

def closest_point(position, interval):
    """:rtype: tuple[int, int]"""  # closest interval point, distance to it
    position_in_interval = interval[0] <= position <= interval[1]
    closest = min([(border, abs(position - border)) for border in interval], key=lambda x: x[1])
    return closest if not position_in_interval else (closest[0], 0)  # distance is 0 if position is inside an interval

def closest_interval(exons, pos):
    """:rtype: tuple[tuple[int, int], tuple[int, int]]"""
    return min(ifilter(lambda x: x[1][1], izip(exons, imap(closest_point, repeat(pos, len(exons)), exons))),
               key=lambda x: x[1][1])

print(closest_interval(exons['NM_015665'], 449))
This prints
((356, 441), (441, 8))
The first tuple is a range. The first integer in the second tuple is the closest point in the interval, the second integer is the distance.
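The itertools version above runs only on Python 2. A minimal Python 3 sketch of the same nearest-interval search is shown here; it is not a line-by-line port. It assumes a position that falls inside an interval should count as distance 0 and select that interval, which the question does not specify, and the two-argument `closest_interval` below is a hypothetical rewrite.

```python
def closest_interval(intervals, position):
    """Return (nearest interval, nearest endpoint of that interval)."""
    def distance(interval):
        low, high = interval
        if low <= position <= high:
            return 0  # assumption: a position inside an interval is at distance 0
        return min(abs(position - low), abs(position - high))

    best = min(intervals, key=distance)
    nearest_edge = min(best, key=lambda edge: abs(edge - position))
    return best, nearest_edge


exons = {'NM_015665': [(0, 225), (356, 441), (563, 645), (793, 861)]}
print(closest_interval(exons['NM_015665'], 449))
# ((356, 441), 441)
```

No fixed search window is needed, because every interval is compared by its actual distance to the position.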

Using non integer values as in a function

I'm getting this error: 'TypeError: list indices must be integers, not float'
but the functions I'm using need to accept non integer values, otherwise my results are different...
Just to give you an idea, I have written some code that fits a gaussian to some data with a single peak. To do this, I need to calculate an estimated value for sigma. To get that, I've written two functions that are meant to look at the data, use the x value for the peak to find two points(r_pos and l_pos) which are either side of the peak and a set distance from the y axis (thresh). And from that I can get an estimated sigma(r_pos - l_pos).
This is all coming about from a piece of code that worked, but the mark sheet for my coursework says I need to use functions, so I'm trying to turn this:
I0 = max(y)
pos = y.index(I0)
print 'Peak value is',I0,'Counts per sec at' ,x[pos], 'degrees(2theta)'
print pos,I0
#left position
thresh = 10
i = pos
while y[i] > thresh:
    i -= 1
l_pos = x[i]
#right position
thresh = 10
i = y.index(I0)
while y[i] > thresh:
    i += 1
r_pos = x[i]
print r_pos
sigma0 = r_pos - l_pos
print sigma0
Into something that uses functions that can be called etc. This is my attempt:
def Peak_Find(x,y):
    I0 = max(y)
    pos = y.index(I0)
    return I0, x[pos]

def R_Pos(thresh,position):
    i = position
    while y[i] > thresh:
        i += 0.1
    r_pos = x[i]
    return r_pos

peak_y,peak_x = Peak_Find(x,y)
Right_Position = R_Pos(10,peak_x)
By the way, peak_y = 855.0 and peak_x = 32.1.
It looks like you want to replace the line
i = position
With something like
i = x.index(position)
because position is a float, and you want the location in the array of position. You are using i to get the index of an array, and you must use ints to do this, hence using the .index method to return the (integer) location in the array.
You are better off writing the program this way because then the variable names will actually match what is contained in the variables.
def Peak_Find(x,y):
    I0 = max(y)
    pos = y.index(I0)
    return I0, pos

def R_Pos(thresh,position):
    while y[position] > thresh:
        position += 1 # Not sure if this is what you want
    r_pos = x[position]
    return r_pos # Not sure what you want here... this is the value at x, not the position
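Putting the pieces together, the original script can be split into functions that pass the integer index around and only look up x at the end. This is a minimal sketch written for Python 3 (the question uses Python 2 print statements); `peak_find`, `threshold_crossing` and the sample data are hypothetical, and the bounds check that stops the loop at the ends of the list is an addition not present in the original.

```python
def peak_find(y):
    """Return (peak value, integer index of the peak)."""
    peak_value = max(y)
    return peak_value, y.index(peak_value)


def threshold_crossing(x, y, start, thresh, step):
    """Walk from index `start` in direction `step` (+1 or -1) until y drops to thresh or below."""
    i = start
    # stop at the ends of the list instead of running past them
    while 0 <= i + step < len(y) and y[i] > thresh:
        i += step
    return x[i]


# made-up data with a single peak
x = [30.0, 30.5, 31.0, 31.5, 32.0, 32.5, 33.0]
y = [5, 8, 50, 855, 60, 9, 4]

peak_y, peak_index = peak_find(y)
l_pos = threshold_crossing(x, y, peak_index, thresh=10, step=-1)
r_pos = threshold_crossing(x, y, peak_index, thresh=10, step=+1)
sigma0 = r_pos - l_pos
print(peak_y, x[peak_index], l_pos, r_pos, sigma0)
```

The index stays an int throughout, so the "list indices must be integers" error cannot occur, and x and y are passed in explicitly instead of being read as globals.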
