Python ete3 - Is there a way to stretch branches of phylogenetic trees? - python

I am trying to read a phylogenetic tree and stretch its branches to be bigger or smaller than the original, but I didn't find how.
The stretch needs to be on the tree itself - not on its visualization.
For example, the following code reads a tree and presents it:
from ete3 import Tree
t = Tree("(2azaa:0.1871453443,1dz0a:0.1944528747, (((1joi:0.1917345578,1nwpa:0.206793251):0.2050584423,"
"(1jzga:0.3027313573,1rkra:0.2710518895):0.08148637118):0.06756061176,(1cuoa:0.2959705289,"
"((1qhqa:0.585997308,1gy1a:2.509606787):0.1590837051,(1kdj:0.9427371887,"
"((1iuz:0.1918780006,7pcy:0.2035503755):0.1750205426,((2plt:0.2727097306,(2b3ia:0.6259053315,"
"(((1bawa:0.3036227494,1nin:0.5134587308):0.1375675558,((2raca:0.4617882857,1id2a:0.3274320042):0.7764884063,"
"(1pmy:0.7017063073,(1bqk:0.2214168026,(1adwa:0.4171298259,1paz:0.4214910379):0.08599165577):0.2074622534):0.9354371144):0.4486761297)"
":0.1105387947,(1m9wa:0.4551681561,1bxva:0.3931722476):0.06879588421):0.1131812572):0.4242876607):0.1447393581,"
"(1plb:0.2176281022,(1byoa:0.2314554253,(9pcy:0.2456728049,(1ag6:0.1776514893,1plc:0.318467746):0.02728470893)"
":0.07383541027):0.1260361833):0.2659408726):0.05013755844):0.2637791318):1.001560925):1.018869112):0.4609302267):0.1807238866);")
t.show()
The following link discusses how to use the library, but I didn't find what I was looking for:
http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html
Can anyone help?
Edit: If there are other Python libraries that can do that, I would love to hear which and how it's done.
Edit2: I know that in R there is a library named "ape" that can do this very simply... maybe someone who works with it knows the equivalent operation in some Python library?

After a long time I found a solution:
As far as I know, there are no built-in functions for stretching trees in the phylogenetic Python libraries. This is very strange, and I hope I am wrong.
However, once you understand their data structures there is an easy way to do it: iterate over all the edges in the tree and multiply each length by the desired factor. How this is done depends on which library you use.
Here are two examples that stretch a tree to twice its size, using dendropy and ete3:
from ete3 import Tree
import dendropy as dp
original_tree = "(2azaa:0.1871453443,1dz0a:0.1944528747,(((1joi:0.1917345578,1nwpa:0.206793251):0.2050584423,"\
"(1jzga:0.3027313573,1rkra:0.2710518895):0.08148637118):0.06756061176,(1cuoa:0.2959705289,"\
"((1qhqa:0.585997308,1gy1a:2.509606787):0.1590837051,(1kdj:0.9427371887,"\
"((1iuz:0.1918780006,7pcy:0.2035503755):0.1750205426,((2plt:0.2727097306,(2b3ia:0.6259053315,"\
"(((1bawa:0.3036227494,1nin:0.5134587308):0.1375675558,((2raca:0.4617882857,1id2a:0.3274320042):0.7764884063,"\
"(1pmy:0.7017063073,(1bqk:0.2214168026,(1adwa:0.4171298259,1paz:0.4214910379):0.08599165577):0.2074622534):0.9354371144):0.4486761297)"\
":0.1105387947,(1m9wa:0.4551681561,1bxva:0.3931722476):0.06879588421):0.1131812572):0.4242876607):0.1447393581,"\
"(1plb:0.2176281022,(1byoa:0.2314554253,(9pcy:0.2456728049,(1ag6:0.1776514893,1plc:0.318467746):0.02728470893)"\
":0.07383541027):0.1260361833):0.2659408726):0.05013755844):0.2637791318):1.001560925):1.018869112):0.4609302267):0.1807238866);"
# dendropy test
print("These are the dendropy results:")
t1 = dp.Tree.get_from_string(original_tree,"newick")
t2 = dp.Tree.get_from_string(original_tree,"newick")
for edge in t2.levelorder_edge_iter():
    # edges without a length (such as the one above the root here) are skipped
    if edge.length is None:
        continue
    edge.length *= 2
print(t1)
print(t2)
#ete3 test
print("These are the ete3 results:")
t3 = Tree(original_tree)
t4 = Tree(original_tree)
for node in t4.iter_descendants():
    node.dist *= 2
print(t3.write())
print(t4.write())
Another lesson we can learn from this case - always do your homework on the data-structure you work with before you search for a built in function...

So far I haven't found a way to do this stretch...
I wrote some simple code that runs over the string that represents the tree, finds the numbers (which are the branch lengths) and multiplies them by 2.
This is a patch and not a real solution... still hoping someone will have an idea.
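number = ""
stretched_tree = ""
for c in original_tree:
    if c.isdigit() or c == '.':
        number += c
    else:
        if len(number) < 5:
            # short digit runs are treated as part of a taxon name, not a branch length
            stretched_tree += number
            number = ""
        elif number != "":
            stretched_tree += str(float(number) * 2)
            number = ""
        stretched_tree += c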
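A slightly more robust version of the same string patch could use a regular expression instead of a length heuristic. This is only a sketch: it assumes every branch length directly follows a colon and that no quoted label or comment in the Newick string contains a colon followed by digits. The names scale_newick and BRANCH_LENGTH are made up for this example.

```python
import re

# Matches a colon followed by a plain or scientific-notation number.
BRANCH_LENGTH = re.compile(r":(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")

def scale_newick(newick, factor):
    """Return a copy of the Newick string with every branch length multiplied by factor."""
    def _scaled(match):
        return ":" + str(float(match.group(1)) * factor)
    return BRANCH_LENGTH.sub(_scaled, newick)

print(scale_newick("(A:0.1,B:0.2):0.3;", 2))  # (A:0.2,B:0.4):0.6;
```

Digits inside taxon names such as 1joi or 7pcy are left alone because they are not preceded by a colon.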

Related

What is the best way to test performance of different functions in my algorithm implemented in Python?

I have searched the site but have not been able to find the answer I'm looking for, so I would like someone to point me in the right direction.
I have been developing a deterministic k-means algorithm and I'm trying to test its performance using different initialization functions. I was wondering what the best way to do this is. Currently this is what I'm doing:
# . . .
if init == RANDOM:
    c_means = unique_datap[np.random.choice(unique_datap.shape[0], k, replace=False)].astype(np.float32)
elif init == FFT:
    c_means = deterministic_fft(unique_datap, el_count, k, rgb_distance).astype(np.float32)
elif init == UMDI:
    c_means = uniform_mode_dist_init(unique_datap, el_count, k, rgb_distance).astype(np.float32)
else:
    raise ValueError('Unknown initialization')
# . . .
Obviously this is neither practical nor elegant. What would be a better way to do this? I'm interested in saving the outputs of the algorithm as well as measuring its execution time and maybe the amount of memory it uses. Any help or direction would be appreciated. I'm not really looking for a quick answer (although it would still be greatly appreciated); I can take all the time I need to read whatever source you can point me to.
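One possible direction, as a minimal sketch rather than a definitive answer: keep the initialization functions in a dictionary and run them all through one small harness that records the output, wall time and peak memory. The helper benchmark and the stand-in functions first_k and last_k are hypothetical; the real initializers would take their place. Note that tracemalloc only reports allocations it is able to trace.

```python
import time
import tracemalloc

def benchmark(initializers, *args, **kwargs):
    """Run each initializer once and record its output, wall time and peak memory."""
    results = {}
    for name, init_fn in initializers.items():
        tracemalloc.start()
        start = time.perf_counter()
        output = init_fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[name] = {"output": output, "seconds": elapsed, "peak_bytes": peak}
    return results

# Hypothetical stand-ins for the real initialization functions.
def first_k(data, k):
    return data[:k]

def last_k(data, k):
    return data[-k:]

INITIALIZERS = {"first": first_k, "last": last_k}

for name, stats in benchmark(INITIALIZERS, list(range(100)), 3).items():
    print(name, stats["output"], f"{stats['seconds']:.6f}s", stats["peak_bytes"], "bytes")
```

Adding a new initialization then means adding one dictionary entry instead of another elif branch. For timings that are less noisy, the standard timeit module repeats the call many times.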

terrain gen python constraints

In my free time I'm making a text-based/ASCII (for now) rogue-like game as a study (relevant as context for the question).
At the moment, I'm trying to generate the terrain/the rooms that will be used in the world.
The world should be 'endless'.
Generating random terrain isn't the big issue.
I'm struggling to find a maintainable way to add constraints like:
'plains cannot be next to mountains'
I could build a big decision tree; however, this would mean an if currentTile == plain: if not next to mountain and an if currentTile == mountain: if not next to plains.
This is not maintainable, since every rule has to be implemented in two places.
I'm wondering what standard solutions exist for this type of issue?
Greetings
I'm not entirely sure I follow your example excerpt, but you could keep a list of two-element sets, each containing a disallowed combination. Because sets are unordered, each rule only has to be written once. Then you could look it up like this:
disallowed = [{plain, mountains}]
if {currentTile, newTile} not in disallowed:
    # rest of code
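The lookup above can be fleshed out roughly as follows. This is a sketch under assumptions: tiles are represented as plain strings, and the names DISALLOWED, can_be_adjacent and allowed_tiles as well as the desert/swamp rule are invented for illustration. Using frozenset makes the pairs hashable, so they can live in a set.

```python
# Each rule is an unordered pair, so it is written once and checked in both directions.
# The tile names are illustrative.
DISALLOWED = {
    frozenset({"plain", "mountain"}),
    frozenset({"desert", "swamp"}),
}

def can_be_adjacent(tile_a, tile_b):
    """Return True unless the unordered pair of tiles is forbidden."""
    return frozenset({tile_a, tile_b}) not in DISALLOWED

def allowed_tiles(candidates, neighbours):
    """Filter candidate tiles down to those compatible with every neighbour."""
    return [t for t in candidates if all(can_be_adjacent(t, n) for n in neighbours)]

print(can_be_adjacent("plain", "mountain"))   # False
print(can_be_adjacent("mountain", "plain"))   # False
print(allowed_tiles(["plain", "forest", "mountain"], ["plain"]))  # ['plain', 'forest']
```

A terrain generator could call allowed_tiles with the already placed neighbours and pick randomly from the result.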

Floating point calculations debugging

So I recently decided to learn Python, and as an exercise (plus making something useful) I decided to implement Euler's Modified Method for solving higher-than-first-order differential equations. An example input would be:
python script_name.py -y[0] [10,0]
where the first argument is the differential equation (here: y''=-y), and the second one is the initial conditions (here: y(0)=10, y'(0)=0). It is then meant to output the results to two files (x-data.txt and y-data.txt).
Here's the problem:
When I run the code with the specified input, the final line (at t=1) reads -0.0, but if you solve the ODE (y=10*cos(x)), it should read 5.4. Even if you go through the program with pen and paper and execute the code, your results and the computer's appear to diverge by the second iteration. Any idea what could have caused this?
NB: I'm using Python 2.7 on OS X
Here's my code:
#! /usr/bin/python
# A higher order differential equation solver using Euler's Modified Method
import math
import sys
step_size = 0.01
x=0
x_max=1
def derivative(x, y):
    d = eval(sys.argv[1])
    return d
y=eval(sys.argv[2])
order = len(y)
y_derivative=y
xfile = open('x-data.txt','w+')
yfile = open('y-data.txt','w+')
while (x<x_max):
    xfile.write(str(x)+"\n")
    yfile.write(str(y[0])+"\n")
    for i in range(order-1):
        y_derivative[i]=y[(i+1)]
    y_derivative[(order-1)] = derivative(x,y)
    for i in range(order):
        y[i]=y[i]+step_size*y_derivative[i]
    x=x+step_size
xfile.close()
yfile.close()
print('done')
When you say y_derivative=y they are the SAME list with different names. I.e. when you change y_derivative[i]=y[i+1] both lists are changing. You want to use y_derivative=y[:] to make a copy of y to put in y_derivative
See How to clone or copy a list? for more info
Also see http://effbot.org/zone/python-list.htm
Note, I was able to debug this in IDLE by replacing sys.argv with your provided example. Then if you turn on the debugger and step through the code, you can see both lists change.
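To see the difference between a second name and a copy in isolation, here is a small standalone sketch (the variable names alias and copy are just for illustration):

```python
y = [10, 0]

alias = y        # a second name for the same list object
copy = y[:]      # a new list holding the same elements

alias[0] = 99    # also changes y, because alias and y are one object
copy[1] = -1     # leaves y untouched

print(y)            # [99, 0]
print(alias is y)   # True
print(copy is y)    # False
print(copy)         # [10, -1]
```

A slice copy like y[:] is shallow, which is enough here because the list only holds numbers.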

Lattice paths algorithm does not finish running for 20 X 20 grid

I wrote the following code in python to solve
problem 15 from Project Euler:
grid_size = 2
def get_paths(node):
    global paths
    if node[0] >= grid_size and node[1] >= grid_size:
        paths += 1
        return
    else:
        if node[0]<grid_size+1 and node[1] < grid_size+1:
            get_paths((node[0]+1,node[1]))
            get_paths((node[0],node[1]+1))
    return paths
def euler():
    print get_paths((0,0))
paths = 0
if __name__ == '__main__':
    euler()
Although it runs quite well for a 2 X 2 grid, it's been running for hours for a 20 X 20 grid. How can I optimise the code so that it can run on larger grids?
Is it a kind of breadth first search problem? (It seems so to me.)
How can I measure the complexity of my solution in its current form?
You might want to look into the maths behind this problem. It's not necessary to actually iterate through all routes. (In fact, you'll never make the 1 minute mark like that).
I can post a hint but won't do so unless you ask for it, since I wouldn't want to spoil it for you.
Edit:
Yes, the algorithm you're using will never really be optimal since there's no way to reduce the search space of your problem. This means that (as pg1989 stated) you'll have to look into alternative means of solving this problem.
As sverre said looking over here might give a nudge in the right direction:
http://en.wikipedia.org/wiki/Binomial_coefficient
A direct solution may be found here (warning, big spoiler):
http://www.joaoff.com/2008/01/20/a-square-grid-path-problem/
Your algorithm is exponential, but only because you are re-evaluating get_paths with the same input many times. Adding Memoization to it will make it run in time. Also, you'll need to get rid of the global variable, and use return values instead. See also Dynamic Programming for a similar idea.
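A minimal sketch of that suggestion, assuming Python 3 and using functools.lru_cache for the memoization (the names count_paths and paths_from are made up here). It replaces the global counter with return values, so each grid point is evaluated only once:

```python
from functools import lru_cache

def count_paths(grid_size):
    """Count right/down lattice paths from one corner of the grid to the opposite one."""
    @lru_cache(maxsize=None)
    def paths_from(x, y):
        # Reaching either far edge leaves exactly one way to finish.
        if x == grid_size or y == grid_size:
            return 1
        return paths_from(x + 1, y) + paths_from(x, y + 1)
    return paths_from(0, 0)

print(count_paths(2))   # 6
print(count_paths(20))  # 137846528820
```

With the cache in place the number of distinct calls grows with the number of grid points rather than with the number of paths.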
When solving problems on Project Euler, think about the math behind the problem for a long time before starting to code. This problem can be solved without any code whatsoever.
We're trying to count the number of ways through a grid. If you observe that the number of moves down and right do not change regardless of the path, then you only need to worry about the order in which you move down and right. So in the 2x2 case, the following combinations work:
DDRR
DRDR
RDRD
RRDD
RDDR
DRRD
Notice that if we pick where we put the R moves, the placement of the D moves is determined. So really we only have to choose, from the 4 movement slots available, which get the R moves. Can you think of a mathematical operation that does this?
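For small grids the idea can be checked by brute force. The sketch below (the function name routes is made up) picks the slots that get an R move and fills the remaining slots with D moves, which reproduces the six routes listed above for the 2x2 case. Warning: it gives away part of the answer to the question just asked.

```python
from itertools import combinations

def routes(n):
    """List every route through an n x n grid as a string of R and D moves."""
    result = []
    for r_slots in combinations(range(2 * n), n):   # choose which slots hold R
        moves = ["D"] * (2 * n)
        for slot in r_slots:
            moves[slot] = "R"
        result.append("".join(moves))
    return result

print(routes(2))       # ['RRDD', 'RDRD', 'RDDR', 'DRRD', 'DRDR', 'DDRR']
print(len(routes(2)))  # 6
```

Enumerating like this is only practical for small n; for the 20x20 grid you want to count the choices without listing them.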
Probably not the way the project Euler guys wanted this problem to be solved but the answer is just the central binomial coefficient of a 20x20 grid.
Using the formula provided at the wiki article you get:
from math import factorial
grid = 20
# integer arithmetic keeps the result exact
print(factorial(2 * grid) // factorial(grid) ** 2)
The key is not to make your algorithm run faster, as it will (potentially) run in exponential time, no matter how fast each step is.
It is probably better to find another way of computing the answer. Using your (expensive, but correct) solution as a comparison for small values is probably a sanity-preserver during the algorithm optimization effort.
This question provides some good insight into optimization. The code is in c# but the algorithms are applicable. Watch out for spoilers, though.
Project Euler #15
It can be solved by simple observation of the pattern for small grids, and determining a straightforward formula for larger grids. There are over 100 billion paths for a 20x20 grid and any iterative solution will take too long to compute.
Here's my solution:
memo = {(0, 1) : 1, (1, 0) : 1}
def get_pathways(x, y):
    if (x, y) in memo : return memo[(x, y)]
    pathways = 0
    if 0 in (x, y):
        pathways = 1
    else:
        pathways = get_pathways(x-1, y) + get_pathways(x, y-1)
    memo[(x, y)] = pathways
    return pathways
enjoy :)

python: Chess moves validation

Does anybody know if there is a free python chess moves validation function available somewhere?
What I need: I have a diagram stored as a string, and a move candidate. I need to check whether the move candidate is valid for the diagram.
Would be really interested to see examples, if possible.
The string looks this way:
ememememememememememememememememememembbememwpemememememememwpemembkememememememememememememememememwbembrememememwkemememememem
I understand it may seem stupid, but I find this the easiest way to encode a position. A move candidate for me is just another such position (the one reached after the next move; I can change this behavior, I think).
You are missing information e.g. whose turn to move, whether each king has ever moved (means castling is not allowed), the "en passant" status of each pawn. That aside, it would be a very instructive exercise for you to write your own, using a not-very-complicated board representation like the 10x12-element array described here (except that you'd linearise it to a 120-element array).
I know this is a rather old question, but my brother and I were looking for the same thing and we came across this awesome little Python module called Chessnut.
Here is an example of its use:
#!/usr/bin/python
from Chessnut import Game
chessgame = Game(fen="rnbq1rk1/ppppp1bp/5np1/5p2/2PP4/2NBPN2/PP3PPP/R1BQK2R b KQ - 4 6")
print(chessgame)
print(chessgame.get_moves())
# apply a move
chessgame.apply_move(chessgame.get_moves()[1])
print(chessgame)
and here the generated output:
rnbq1rk1/ppppp1bp/5np1/5p2/2PP4/2NBPN2/PP3PPP/R1BQK2R b KQ - 4 6
['b8a6', 'b8c6', 'd8e8', 'f8e8', 'f8f7', 'g8h8', 'g8f7', 'a7a6', 'a7a5', 'b7b6', 'b7b5', 'c7c6', 'c7c5', 'd7d6', 'd7d5', 'e7e6', 'e7e5', 'g7h8', 'g7h6', 'h7h6', 'h7h5', 'f6e8', 'f6d5', 'f6e4', 'f6g4', 'f6h5', 'g6g5', 'f5f4']
r1bq1rk1/ppppp1bp/2n2np1/5p2/2PP4/2NBPN2/PP3PPP/R1BQK2R w KQ - 5 7
Awesome! :)
Thanks cgearhart!
Just use the source of one of the Python Chess programs like PyChess or Python Chess
Specifically, the valid moves for pychess: https://code.google.com/p/pychess/source/browse/lib/pychess/Utils/lutils/validator.py
Wouldn't hurt to look at some of the related answers on the side: Chess move validation library and https://stackoverflow.com/questions/1239913/smallest-chess-playing-program stand out to me.
Though personally I'm in favor of building your own.
Check out ChessBoard.
Unfortunately it has some drawbacks:
it seems to be abandoned, because the bugs reported more than one year ago in the comments don't seem to be fixed
the code is not really PEP-8 compliant
some methods are very ugly and big, not all methods have docstrings
there are no unit tests, so digging into that code might be a challenge (I've already tried it at least twice and failed)
The good thing is that the code is GPL so you can play with it as long as you stick to that license.
I've made a simple chess implementation with move validation here: https://github.com/akulakov/pychess
Validation logic is in each piece's "moves()" method, and you can validate your own move by generating full list of moves and checking if your move is there.
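The "generate all moves, then check membership" idea can be sketched without any chess library. The toy example below is not taken from that repository: it only handles a single knight, ignores checks and pins, and all names in it (KNIGHT_JUMPS, knight_moves, is_valid and so on) are hypothetical.

```python
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def to_coords(square):
    """Convert algebraic notation such as 'g1' to zero-based (file, rank)."""
    return ord(square[0]) - ord("a"), int(square[1]) - 1

def to_square(file, rank):
    """Convert zero-based (file, rank) back to algebraic notation."""
    return "abcdefgh"[file] + str(rank + 1)

def knight_moves(square, own_pieces=()):
    """Generate knight moves from a square, skipping squares held by own pieces."""
    file, rank = to_coords(square)
    moves = []
    for df, dr in KNIGHT_JUMPS:
        f, r = file + df, rank + dr
        if 0 <= f < 8 and 0 <= r < 8 and to_square(f, r) not in own_pieces:
            moves.append(square + to_square(f, r))
    return moves

def is_valid(move, legal_moves):
    """A move is valid exactly when the generator produced it."""
    return move in legal_moves

legal = knight_moves("g1", own_pieces={"e2"})
print(sorted(legal))             # ['g1f3', 'g1h3']
print(is_valid("g1f3", legal))   # True
print(is_valid("g1e2", legal))   # False
```

A full validator would need one such generator per piece type plus the extra state mentioned earlier (side to move, castling rights, en passant).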
