Python OOP __Add__ matrices together (Looping Problem) - python

class Matrix:
  def __init__(self, data):
    self.data = data
  def __repr__(self):
    return repr(self.data)
  def __add__(self, other):
    data = []
    for j in range(len(self.data)):
      for k in range(len(self.data[0])):
        data.append([self.data[k] + other.data[k]])
      data.append([self.data[j] + other.data[j]])
      data = []
    return Matrix(data)
x = Matrix([[1,2,3],[2,3,4]])
y = Matrix([[10,10,10],[10,10,10]])
print(x + y,x + x + y)
I was able to get Matrices to add for 1 row by n columns, but when I tried to improve it for all n by n matrices by adding in a second loop I got this error.
Traceback (most recent call last):
line 24, in <module>
print(x + y,x + x + y)
line 15, in __add__
data.append([self.data[k] + other.data[k]])
IndexError: list index out of range

How about this:
class Matrix:
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return repr(self.data)
    def __add__(self, other):
        data = []
        for j in range(len(self.data)):
            data.append([])
            for k in range(len(self.data[0])):
                data[j].append(self.data[j][k] + other.data[j][k])
        return Matrix(data)

Your code has a few problems. The first is the basic logic of the addition algorithm:
data.append([self.data[k] + other.data[k]])
This statement is highly suspect: self.data is a two-dimensional matrix, but here you are accessing it with a single index. self.data[k] is therefore a whole row, and + concatenates the two rows (probably not what you wanted). The index k also runs over the columns, so as soon as there are more columns than rows it goes past the last row, which is the IndexError you got. The solution by highBandWidth is probably what you were looking for.
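To make the difference concrete, here is a minimal sketch (the variable names are only illustrative) of what + does to two rows compared with an element-by-element sum:

```python
row_a = [1, 2, 3]
row_b = [10, 10, 10]

# "+" on two lists joins them end to end.
concatenated = row_a + row_b

# Adding matching elements needs an explicit loop or comprehension.
elementwise = [a + b for a, b in zip(row_a, row_b)]

print(concatenated)  # [1, 2, 3, 10, 10, 10]
print(elementwise)   # [11, 12, 13]
```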
The second problem is more subtle, and concerns the statement
self.data = data
This may be a problem because Python uses so-called "reference semantics". Your matrix uses the passed data parameter for its content without copying it: it stores a reference to the same list object you passed to the constructor.
Maybe this is intentional, maybe not; it isn't clear. Is it OK for you that, if you build two matrices from the same data and then change a single element in the first, the content of the second changes too? If not, you should copy the elements of data instead of just assigning the data member, for example using
self.data = [row[:] for row in data]
or using copy.deepcopy from the standard copy module.
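A small sketch of the aliasing effect, assuming two hypothetical classes (SharedMatrix and CopiedMatrix are names made up for this illustration) that differ only in how the constructor stores data:

```python
class SharedMatrix:
    def __init__(self, data):
        self.data = data  # keeps a reference to the caller's list


class CopiedMatrix:
    def __init__(self, data):
        self.data = [row[:] for row in data]  # copies every row


source = [[1, 2], [3, 4]]
shared = SharedMatrix(source)
copied = CopiedMatrix(source)

source[0][0] = 99  # change the original list after construction

print(shared.data[0][0])  # 99: the matrix sees the change
print(copied.data[0][0])  # 1: the copy is unaffected
```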
A third problem is that you are using just two spaces for indenting. This is not smart: when working in Python you should indent with 4 spaces and never use hard tab characters. Note that I said that doing this (using two spaces) is not smart, not that you are not smart, so please don't take this personally (I made the very same mistake myself when starting with Python). If you really want to be different, do so by writing amazing bug-free software in Python, not by using unusual indentation or choosing bad names for functions or variables. Focus on higher-level beauty.
One last point: once you really understand why your code didn't work, you should read about Python list comprehensions, a tool that can greatly simplify your code if used judiciously. Your addition code could, for example, become
return Matrix([[a + b for a, b in zip(my_row, other_row)]
               for my_row, other_row in zip(self.data, other.data)])
To a trained eye this is easier to read than your original code (and it's also faster).
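Putting the suggestions from this answer together gives something like the following sketch. The shape check is an extra added here for illustration and is not part of the original answers.

```python
class Matrix:
    def __init__(self, data):
        # Copy each row so later changes to the caller's lists do not leak in.
        self.data = [row[:] for row in data]

    def __repr__(self):
        return repr(self.data)

    def __add__(self, other):
        # Assumption for this sketch: reject operands of different shape.
        if len(self.data) != len(other.data) or any(
            len(a) != len(b) for a, b in zip(self.data, other.data)
        ):
            raise ValueError("matrices must have the same shape")
        return Matrix([[a + b for a, b in zip(my_row, other_row)]
                       for my_row, other_row in zip(self.data, other.data)])


x = Matrix([[1, 2, 3], [2, 3, 4]])
y = Matrix([[10, 10, 10], [10, 10, 10]])
print(x + y)      # [[11, 12, 13], [12, 13, 14]]
print(x + x + y)  # [[12, 14, 16], [14, 16, 18]]
```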

Related

Python equivalent of java-streams piping

I mainly program in Java and I found that for data analysis python is more convenient.
I am looking for a way to pipe operations in a way that is equivalent to java streams. For example, I would like to do something like (I'm mixing java and python syntax).
(key, value) = Files.lines(Paths.get(path))
    .map(line -> new Angle(line))
    .filter(angle -> foo(angle))
    .map(angle -> (angle, cosine(angle)))
    .max(Comparator.comparing(Pair::getValue))
Here I take a list of lines from a file, convert each line into an Angle object, filter the angles by some parameter, then create a list of pairs and finally find the maximal pair. There may be multiple additional operations in addition, but the point is that this is one pipe passing the output of one operation into the next.
I know about Python list comprehensions, but they seem to be limited to a single "map" and a single "filter". If I need to pipe several maps using comprehensions, the expression soon becomes complicated (I need to put one comprehension inside another).
Is there a syntax construct in python that allows adding multiple operations in one command?
It is not difficult to build this yourself, for example:
class BasePipe:
    def __init__(self, data):
        self.data = data
    def filter(self, f):
        self.data = [d for d in self.data if f(d)]
        return self
    def map(self, f):
        self.data = [*map(f, self.data)]
        return self
    def __iter__(self):
        yield from self.data
    def __str__(self):
        return str(self.data)
    def max(self):
        return max(self.data)
    def min(self):
        return min(self.data)
value = (
    BasePipe([1, 2, 3, 4]).
    map(lambda x: x * 2).
    filter(lambda x: x > 4).
    max()
)
print(value)
This gives:
8
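Without a helper class, a similar pipeline can be written as a chain of generator expressions, each one consuming the previous stage lazily. This is only a sketch: the input lines and the filter threshold are made-up stand-ins for the file and the foo() test in the question.

```python
import math

# Hypothetical input standing in for the lines of a file.
lines = ["0.0", "1.0", "2.0", "3.0"]

angles = (float(line) for line in lines)         # map: parse each line
filtered = (a for a in angles if a > 0.5)        # filter: keep some angles
pairs = ((a, math.cos(a)) for a in filtered)     # map: (angle, cosine) pairs

# max with a key plays the role of Comparator.comparing(Pair::getValue).
best_angle, best_cos = max(pairs, key=lambda pair: pair[1])
print(best_angle, best_cos)
```

Each stage is a named variable, so adding another map or filter means adding one more line rather than nesting comprehensions.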

How to access the __setitem__ definition | Syntax & Examples?

I have a matrix class similar to here:
class Matrix(object):
    def __init__(self, m, n, init=True):
        if init:
            self.rows = [[0]*n for x in range(m)]
        else:
            self.rows = []
        self.m = m
        self.n = n
    def __setitem__(self, idx, item):
        self.rows[idx] = item
        print("HERE")
    ...
I would like to set an element to a value of 2:
my_mat = 0000      my_mat = 0000
         0000  ->           0200
         0000               0000
         0000               0000
and in my main() I set the element like this:
from matrix import Matrix
def main():
    # Create matrix
    my_mat = Matrix(4,3)
    # Set element
    my_mat[1][1] = 2
    print(my_mat)
if __name__ == "__main__":
    main()
The __setitem__ definition takes 3 arguments (one of which is self, which is provided automatically), so idx and item are needed. I have tried a number of different combinations to set an element of the matrix. When I set the element as above, "HERE" isn't printed. It appears that I'm not reaching the __setitem__ method at all.
How do I set an element using the __setitem__ def? Syntax and examples would be appreciated.
I have tried variations like:
my_mat(1,1) = 2
my_mat(1,1,2)
my_mat([1,1],2)
.... but all fail.
The __setitem__() method in the Matrix class in the linked ActiveState recipe is for accessing entire rows (likewise for the __getitem__() method).
So, to invoke its __setitem__() method would require something like the following, which first retrieves the entire row, changes a single element of it, and then stores the entire row back into the matrix at the same row index:
def main():
    # Create matrix
    my_mat = Matrix(4,3)
    # Set single element, logically equivalent to my_mat[1][1] = 2
    row = my_mat[1]  # Uses Matrix.__getitem__()
    row[1] = 2       # Change single element of row.
    my_mat[1] = row  # Uses Matrix.__setitem__() to replace row
    print(my_mat)
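To see why "HERE" never appears for my_mat[1][1] = 2, the following sketch records which special method runs. TracingMatrix is a hypothetical stand-in for the recipe's class, reduced to the two methods that matter here.

```python
class TracingMatrix:
    def __init__(self, m, n):
        self.rows = [[0] * n for _ in range(m)]
        self.calls = []  # names of the special methods that were invoked

    def __getitem__(self, idx):
        self.calls.append("getitem")
        return self.rows[idx]

    def __setitem__(self, idx, item):
        self.calls.append("setitem")
        self.rows[idx] = item


mat = TracingMatrix(4, 3)

# mat[1] calls TracingMatrix.__getitem__; the "[1] = 2" part is then
# handled by the returned list's own __setitem__, not by the matrix.
mat[1][1] = 2
print(mat.calls)  # ['getitem']

# Assigning a whole row is what reaches TracingMatrix.__setitem__.
mat[1] = [0, 5, 0]
print(mat.calls)  # ['getitem', 'setitem']
```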
As written, the recipe does not provide a way to do this in a single line of code. If it is going to be done frequently, you will need to modify the definition of the Matrix class itself.
If you can't figure out how to do that, find a different recipe that supports it (or ask another question here, I suppose).
Parting thought: I suggest you look into getting and installing the numpy add-on module (it's not built-in to Python). It also has a number of other useful features and is very fast.
Minor Update
For fun, I figured out a very hacky and unreadable way to do it in a single line of code:
# Logically equivalent to my_mat[1][1] = 2
my_mat[1] = (lambda r, i, v: (r.__setitem__(i, v), r)[1])(my_mat[1], 1, 2)
This defines an inline anonymous function that accepts 3 arguments, r, i, and v:
r: row to modify
i: index of item within the row
v: new value for item
This function calls row r's __setitem__() method (each row being a sublist in the recipe) to modify the item's value within the row, and then returns the modified row, which is assigned back to same row position it was retrieved from (overwriting its original value).
The Matrix class' __setitem__() will get called to perform the final step of replacing the entire row.
(I'm not recommending you do it this way, but hopefully this will give you an idea of what a new method added to the class would have to do, since it would need to do something similar.)
I think in this situation you would want to define __getitem__ and have it return the proper row:
def __getitem__(self, item):
    if isinstance(item, int):
        return self.rows[item]
    return super().__getitem__(item)
Or, alternatively, define a more complex __setitem__ as described here
Also, unrelated to your question, note that the else branch in your constructor initializes rows as a one-dimensional list instead of a two-dimensional one; I'm not sure whether you meant to do that.
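One possible shape for a "more complex __setitem__" is to accept a (row, column) tuple as the index, so that a single element can be set with my_mat[1, 1] = 2. This is a sketch under that assumption, not the recipe's actual code; the class below is a simplified stand-in.

```python
class Matrix:
    def __init__(self, m, n):
        self.rows = [[0] * n for _ in range(m)]
        self.m = m
        self.n = n

    def __getitem__(self, idx):
        # mat[r, c] passes the tuple (r, c) as idx; mat[r] passes an int.
        if isinstance(idx, tuple):
            r, c = idx
            return self.rows[r][c]
        return self.rows[idx]

    def __setitem__(self, idx, item):
        if isinstance(idx, tuple):
            r, c = idx
            self.rows[r][c] = item   # set a single element
        else:
            self.rows[idx] = item    # replace a whole row

    def __repr__(self):
        return "\n".join("".join(str(v) for v in row) for row in self.rows)


my_mat = Matrix(4, 3)
my_mat[1, 1] = 2
print(my_mat)
```

Row access (my_mat[1]) and row replacement (my_mat[0] = [1, 1, 1]) keep working as before, because the tuple check only changes the behaviour for two-part indices.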

Small python program involving newton method

I'm trying to write a small program in Python that involves (among other things) Newton's method, but I'm running into several problems that are probably pretty basic; since I'm new to programming, I can't get past them.
First I defined the function and its derivative:
import math
def f(x,e,m):
    return x-e*math.sin(x)-m
def df(x,e):
    return 1-e*math.cos(x)
def newtons_method(E0,m,e,q):#q is the error
    while abs(f(E0,e,m))>q:
        E=E0-f(E0,e,m)/df(E0,e)
        E0=E
    return (E0)
def trueanomaly(e,E):
    ta=2*math.arctan(math.sqrt((1+e)/(1-e))*math.tan(E))
    return (ta)
def keplerianfunction(T,P,e,K,y,w):
    for t in frange (0,100,0.5):
        m=(2*math.pi*((t-T)/P))
        E0=m+e*math.sin(m)+((e**2)/2)*math.sin(2*m)
        newtons_method(E0,m,e,0.001)
        trueanomaly(e,E0)
        rv=y+K*(e*math.cos(w)+math.cos(w+ta))
    return (ta)","(rv)
def frange(start, stop, step):
    i = start
    while i < stop:
        yield i
        i += step
The problem is that this keeps giving me errors (indentation errors and the like), especially in keplerianfunction. Can someone help me? What am I doing wrong here?
Thank you in advance!
Many things are wrong with this code, and I don't know what the desired behaviour is so can't guarantee that this will help, but I'm going to try and help you debug (although it looks like you mostly need to re-read your Python coursebook...).
First, in most languages if not all, there is a thing called the scope: a variable, function, or any other object, exists only within a certain scope. In particular, variables exist only in the scope of the function that they are defined in. This means that, to use the result of a function, you first need to return that result (which you are doing), and when you call that function you need to store that result into a variable, for example ta = trueanomaly(e, E0).
Then, you don't need parentheses when returning values, even when returning several of them. To return multiple values, separate them with a plain comma, not with a comma inside a string literal: write return ta, rv instead of return (ta)","(rv).
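A tiny sketch of both points, using a made-up function (compute_pair is not from the question): local variables only become available to the caller through the return value, and a comma-separated return produces a tuple that can be unpacked.

```python
def compute_pair(value):
    doubled = value * 2       # local: exists only inside compute_pair
    squared = value ** 2
    return doubled, squared   # the comma builds a tuple


# The caller must store the result to use it.
doubled, squared = compute_pair(3)
print(doubled, squared)  # 6 9
```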
Finally, you seem to be iterating over a range of values, yet you don't return the whole range of values but either the first value (if your return is in the for loop), or the last one (if your return is under the for loop). Instead, you may want to store all the ta and rv values into one/two lists, and return that/those lists in the end, for example:
def keplerianfunction(T,P,e,K,y,w):
    # Initialise two empty lists
    tas = []
    rvs = []
    for t in frange (0,100,0.5):
        m = 2*math.pi*((t-T)/P)
        E0 = m+e*math.sin(m)+((e**2)/2)*math.sin(2*m)
        E0 = newtons_method(E0,m,e,0.001)
        ta = trueanomaly(e,E0)
        rv = y+K*(e*math.cos(w)+math.cos(w+ta))
        # At each step save the value for ta and rv into the appropriate list
        tas.append(ta)
        rvs.append(rv)
    # And finally return the lists
    return (tas,rvs)
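The loop above still depends on the helper functions from the question. Here is a self-contained sketch of those two helpers, with hypothetical names (solve_kepler, true_anomaly). Two things differ from the question's code and should be checked against your own reference: the math module has no arctan (the function is math.atan), and the standard relation between the eccentric anomaly E and the true anomaly uses tan(E/2) rather than tan(E).

```python
import math


def solve_kepler(m, e, tol=1e-10, max_iter=50):
    """Solve E - e*sin(E) = m for E with Newton's method."""
    E = m + e * math.sin(m)  # starting guess
    for _ in range(max_iter):
        residual = E - e * math.sin(E) - m
        if abs(residual) <= tol:
            return E
        # Newton step: divide by the derivative 1 - e*cos(E).
        E -= residual / (1 - e * math.cos(E))
    raise RuntimeError("Newton iteration did not converge")


def true_anomaly(e, E):
    """True anomaly from eccentric anomaly (assumes 0 <= e < 1)."""
    return 2 * math.atan(math.sqrt((1 + e) / (1 - e)) * math.tan(E / 2))


E = solve_kepler(1.0, 0.3)
print(E, true_anomaly(0.3, E))
```

The max_iter guard is a design choice for this sketch: it turns a non-converging iteration into an error instead of an endless while loop.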
Also, note that the built-in range only accepts integer arguments, so it cannot replace frange here: a step of 0.5 needs a helper like yours (or something like numpy.arange).
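A quick check of how range and the question's frange behave with a fractional step (the flag name range_accepts_floats is just for this illustration):

```python
def frange(start, stop, step):
    # Same generator as in the question.
    i = start
    while i < stop:
        yield i
        i += step


try:
    range(0, 100, 0.5)
    range_accepts_floats = True
except TypeError:
    range_accepts_floats = False

print(range_accepts_floats)     # False
print(list(frange(0, 2, 0.5)))  # [0, 0.5, 1.0, 1.5]
```

Keep in mind that repeatedly adding a float step accumulates rounding error for steps that are not exactly representable in binary (0.1, for instance), so the number of values produced can be off by one in such cases.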

cython: reducing the size of a class, reduce memory use, improve speed [closed]

I have a relatively simple problem: given a position in the genome, return the name of the gene at that point.
The way I am solving this problem right now is with the following class in Cython:
class BedFile():
    """ A Lookup Object """
    def __init__(self, bedfile):
        self.dict = {}
        cdef int start, end
        with open(bedfile) as infile:
            for line in infile:
                f = line.rstrip().split('\t')
                if len(f) < 4:
                    continue
                chr = f[0]
                start = int(f[1])
                end = int(f[2])
                gene = f[3]
                if chr not in self.dict:
                    self.dict[chr] = {}
                self.dict[chr][gene] = (start, end)
    def lookup(self, chromosome, location):
        """ Lookup your gene. Returns the gene name """
        cdef int l
        l = int(location)
        answer = ''
        for k, v in self.dict[chromosome].items():
            if v[0] < l < v[1]:
                answer = k
                break
        if answer:
            return answer
        else:
            return None
The full project is here: https://github.com/MikeDacre/python_bed_lookup, although the entire relevant class is above.
The issue with the code as it stands is that the resulting class/dictionary takes up a very large amount of memory for the human genome, with 110 million genes (that's a 110 million line long text file). I killed the init function while it was still building the dictionary, after about two minutes, when it hit 16GB of memory. Anything that uses that much memory is basically useless.
I am sure there must be a more efficient way of doing this, but I am not a C programmer, and I am very new to Cython. My guess is that I could build a pure C structure of some kind to hold the gene name and the start and end values. Then I could convert lookup() into a function that calls another cdef function called _lookup(), and use that cdef function to do the actual query.
In an ideal world, the whole structure could live in memory and take up less than 2GB of memory for ~2,000,000 entries (each entry with two ints and a string).
Edit:
I figured out how to do this efficiently with sqlite for large files; the complete code with sqlite is here: https://github.com/MikeDacre/python_bed_lookup
However, I still think that the class above can be optimized with Cython to make the dictionary smaller in memory and lookups faster. Any suggestions are appreciated.
Thanks!
To expand on my suggestion in the comments a bit, instead of storing (start,end) as a tuple, store it as a simple Cython-defined class:
cdef class StartEnd:
    cdef public int start, end
    def __init__(self, start, end):
        self.start = start
        self.end = end
(you could also play with changing the integer type for more size savings). I'm not recommending getting rid of the Python dictionaries because they're easy to use, and (I believe) optimised to be reasonably efficient for the (common in Python) case of string keys.
We can estimate the rough size savings by using sys.getsizeof. (Be aware that this will work well for built-in classes and Cython classes, but not so well for Python classes so don't trust it too far. Also be aware that the results are platform dependent so yours may differ slightly).
>>> sys.getsizeof((1,2)) # tuple
64
>>> sys.getsizeof(1) # Python int
28
(therefore each tuple contains 64+28+28=120 bytes)
>>> sys.getsizeof(StartEnd(1,2)) # my custom class
24
(24 makes sense: it's the PyObject_Head (16 bytes: a 64bit integer and a pointer) + 2 32-bit integers).
Therefore, 5 times smaller, which is a good start I think.
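If compiling a Cython class is not an option, a rough pure-Python way to get a similarly compact layout is to keep the start and end positions in two parallel arrays from the standard array module. This is a different technique from the cdef class above, shown only as a sketch; it assumes positions fit in the C int type selected by the "i" type code (typically 4 bytes), and the data is made up.

```python
import sys
from array import array

n = 1000
# Parallel arrays: entry i is described by starts[i] and ends[i].
starts = array("i", (i for i in range(n)))
ends = array("i", (i + 10 for i in range(n)))

tuple_bytes = sys.getsizeof((1, 2))             # tuple container only, ints not included
array_bytes = starts.itemsize + ends.itemsize   # bytes per (start, end) pair
print(tuple_bytes, array_bytes)
```

The printed numbers are platform dependent, so no expected output is given here. The gene names would still need their own storage, for example a list in the same order as the arrays.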
In my limited experience with cython and numpy, it is most profitable to use cython for 'inner' calculations that don't need to use Python/numpy code. They are iterations that can be cast to compact and fast C code.
Here's a rewrite of your code, splitting out two classes that could be recast as Cython/C structures:
# cython candidate, like DavidW's StartEnd
class Gene(object):
    def __init__(self, values):
        self.chr = values[0]
        self.start = int(values[1])
        self.end = int(values[2])
        self.gene = values[3]
    def find(self, i):
        return self.start<=i<self.end
    def __repr__(self):
        return "%s(%s, %d:%d)"%(self.chr,self.gene,self.start,self.end)
# cython candidate
class Chrom(list):
    def add(self, aGene):
        self.append(aGene)
    def find(self, loc):
        # find - analogous to string find?
        i = int(loc)
        for gene in self:
            if gene.find(i):
                return gene  # gene.gene
        return None
    def __repr__(self):
        astr = []
        for gene in self:
            astr += [repr(gene)]
        return '; '.join(astr)
These would be imported and used by a high level Python function (or class) that does not need to be in the Cython .pyx file:
from collections import defaultdict
def load(anIterable):
    data = defaultdict(Chrom)
    for line in anIterable:
        f = line.rstrip().split(',')
        if len(f)<4:
            continue
        aGene = Gene(f)
        data[aGene.chr].add(aGene)
    return data
Use with a file or a text simulation:
# bedfile = 'filename'
# with open(bedfile) as infile:
#     data = load(infile)
txt = """\
A, 1,4,a
A, 4,8,b
B, 3,5,a
B, 5,10,c
"""
data = load(txt.splitlines())
print(data)
# defaultdict(<class '__main__.Chrom'>, {
#     'A': A(a, 1:4); A(b, 4:8),
#     'B': B(a, 3:5); B(c, 5:10)})
print(3, data['A'].find(3))    # a gene
print(9, data['B'].find(9))    # c gene
print(11, data['B'].find(11))  # none
I could define a find function that defers to a method if available, otherwise uses its own. This is analogous to numpy functions that delegate to methods:
def find(chrdata, loc):
    # find - analogous to string find?
    fn = getattr(chrdata, 'find',None)
    if fn is None:
        #raise AttributeError(chrdata,'does not have find method')
        def fn(loc):
            i = int(loc)
            for gene in chrdata:
                if gene.find(i):
                    return gene  # gene.gene
            return None
    return fn(loc)
print(3, find(data['A'],3))
Test the find with an ordinary list data structure:
def loadlist(anIterable):
    # collect data in ordinary list
    data = defaultdict(list)
    for line in anIterable:
        f = line.rstrip().split(',')
        if len(f)<4:
            continue
        aGene = Gene(f)
        data[aGene.chr].append(aGene)
    return data
data = loadlist(txt.splitlines())
print(3, find(data['A'],3))
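Both versions above scan every gene on a chromosome for each query. If the genes on a chromosome do not overlap (an assumption that may not hold for real annotation files), the scan can be replaced by a binary search over the sorted start positions. IntervalIndex is a hypothetical name for this sketch; it uses the same start <= loc < end test as Gene.find.

```python
import bisect


class IntervalIndex:
    """Lookup over non-overlapping (start, end, name) intervals."""

    def __init__(self, intervals):
        ordered = sorted(intervals)  # sort by start position
        self.starts = [start for start, _, _ in ordered]
        self.ends = [end for _, end, _ in ordered]
        self.names = [name for _, _, name in ordered]

    def find(self, loc):
        # Index of the last interval whose start is <= loc.
        i = bisect.bisect_right(self.starts, loc) - 1
        if i >= 0 and self.starts[i] <= loc < self.ends[i]:
            return self.names[i]
        return None


chrom_a = IntervalIndex([(4, 8, "b"), (1, 4, "a")])
print(chrom_a.find(3))  # a
print(chrom_a.find(4))  # b
print(chrom_a.find(8))  # None
```

Each lookup then costs a logarithmic number of comparisons in the number of genes, rather than a full pass over the list.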

List/matrix is not saving the correct values

I have a weird problem with an assignment I got. We are supposed to implement a matrix class. Well, it's not that hard, but Python just won't do as I tell it to. But I'm sure there is an explanation.
The problem is that, in the following code, I try to save values (provided in a list) into a matrix.
class simplematrix:
    matrix = [[]]
    def __init__(self, * args):
        lm = args[0]
        ln = args[1]
        values = args[2]
        self.matrix = [[0]*ln]*lm
        for m in range(lm):
            for n in range(ln):
                self.matrix[m][n] = values[(m*ln)+n]
vals = [0,1,2,3,4,5]
a = simplematrix(2,3,vals)
When I try to print the matrix, I expect to get [[0,1,2],[3,4,5]], which I get if I run it by hand, on a piece of paper. If I print the matrix from Python I get [[3,4,5],[3,4,5]] instead.
Can anyone tell me why Python acts like this, or if I made some stupid mistake somewhere? :)
The problem is in [[0]*ln]*lm. The result consists of lm references to the same list, so when you modify one row, all rows appear to change.
Try:
self.matrix = [[0]*ln for i in xrange(lm)]
The answers by Tim and aix correct your mistake, but that step isn't even necessary, you can do the whole thing in one line using a list comprehension:
self.matrix = [[values[m*ln+n] for n in range(ln)] for m in range(lm)]
You can also say:
vals = range(6)
as opposed to what you already have. This tidies up your code and makes it more Pythonic.
The problem is that self.matrix = [[0]*ln]*lm doesn't give you a list of lm separate sublists, but a list of lm references to the single same list [0]*ln.
Try
self.matrix = [[0]*ln for i in range(lm)]
(If you're on Python 2, use xrange(lm) instead).
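The difference between the two constructions is easy to observe directly. In this short sketch (variable names are illustrative), one element is assigned in the first row of each matrix:

```python
# Multiplying the outer list repeats a reference to one inner list.
aliased = [[0] * 3] * 2
aliased[0][0] = 7

# The comprehension evaluates [0] * 3 once per row, giving separate lists.
independent = [[0] * 3 for _ in range(2)]
independent[0][0] = 7

print(aliased)                   # [[7, 0, 0], [7, 0, 0]]
print(independent)               # [[7, 0, 0], [0, 0, 0]]
print(aliased[0] is aliased[1])  # True: both rows are the same object
```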
