Error when trying to round values in an ndarray - python

I am working on a memory-based collaborative filtering algorithm. I am building a matrix that I want to write into a CSV file with three columns: users, apps, and ratings.
fid = fopen('pred_ratings.csv','wt');
for i = 1:user_num
    for j = 1:item_num
        if R(j,i) == 1
            entry = Y(j,i);
        else
            entry = round(P(j,i));
        end
        fprintf(fid,'%d %d %d\n',i,j,entry);
    end
end
fclose(fid);
The above code is a MATLAB implementation that writes a multidimensional matrix into a file with three columns. I tried to imitate it in Python, using:
n_users=816
n_items=17
f = open("guru.txt","w+")
for i in range(1,n_users):
    for j in range(1,n_items):
        if (i,j)==1 in a:
            entry = data_matrix(j, i)
        else:
            entry = round(user_prediction(j, i))
        print(f, '%d%d%d\n', i, j, entry)
f.close
But this results in the following error:
File "<ipython-input-198-7a444566e1ce>", line 7, in <module>
entry = round(user_prediction(j, i))
TypeError: 'numpy.ndarray' object is not callable
What can be done to fix this?

NumPy uses square brackets for indexing. Since user_prediction is a NumPy array, it should be indexed as
user_prediction[j, i]
The same goes for data_matrix.
You should probably read the NumPy for MATLAB users guide.
Edit:
Also, the
if (i,j)==1 in a:
line is very dubious. Since == and in are both comparison operators, Python chains them, so the expression is evaluated as ((i, j) == 1) and (1 in a). A tuple of two integers is never equal to 1, so the condition is always False, which is probably not what you want.
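Putting both fixes together, a corrected version of the loop might look like the sketch below. This is only a sketch: it assumes data_matrix, user_prediction, and a are NumPy arrays of shape (n_items, n_users), with a playing the role of the MATLAB R matrix (dummy arrays are used as stand-ins here), it uses range(n_users) because NumPy indexing is 0-based, and it writes with f.write() since print(f, ...) does not write to the file.

import numpy as np

n_users = 816
n_items = 17

# Hypothetical stand-ins for the arrays from the question.
a = np.zeros((n_items, n_users))                    # plays the role of R
data_matrix = np.zeros((n_items, n_users))          # plays the role of Y
user_prediction = np.random.rand(n_items, n_users)  # plays the role of P

with open("guru.txt", "w") as f:
    for i in range(n_users):
        for j in range(n_items):
            if a[j, i] == 1:
                entry = data_matrix[j, i]            # square brackets, not parentheses
            else:
                entry = round(user_prediction[j, i])
            f.write('%d %d %d\n' % (i, j, entry))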

Related

What is the equivalent of coo_matrix in Matlab?

I'm trying to translate the following lines of code from Python to MATLAB. V, Id, and J are of size (6400,), which in MATLAB are 1-by-6400 row vectors. pts is of size 242.
My Python code
A = coo_matrix((V, (Id, J)), shape=(pts.size, pts.size)).tocsr()
A = A.tobsr(blocksize=(2, 2))
I translated the first line to MATLAB as follows:
A = sparse(V,Id,J,242,242);
However, I got the error
Error using sparse
Index into matrix must be an integer.
How can I translate this code to MATLAB?
The MATLAB sparse function has several forms:
S = sparse(A)
S = sparse(m,n)
S = sparse(i,j,v)
S = sparse(i,j,v,m,n)
S = sparse(i,j,v,m,n,nz)
The form you are most likely looking for is the fourth one, S = sparse(i,j,v,m,n), and you will want to call it for your use case as:
A = sparse(Id, J, V, 242, 242);
I think your error is that MATLAB wants the i and j indices first, followed by the values, whereas you are passing the values as the first argument.
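As a side-by-side check, here is a small Python sketch (with toy data, not the vectors from the question) showing which coo_matrix arguments correspond to which sparse arguments:

import numpy as np
from scipy.sparse import coo_matrix

# Toy triplet data: values V at row indices Id and column indices J.
V = np.array([1.0, 2.0, 3.0])
Id = np.array([0, 1, 2])
J = np.array([2, 0, 1])

# Python:  coo_matrix((V, (Id, J)), shape=(m, n))  -- values first, then the index pair
# MATLAB:  sparse(Id, J, V, m, n)                  -- indices first, then the values
A = coo_matrix((V, (Id, J)), shape=(3, 3)).tocsr()
print(A.toarray())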

How to fix: "Int object not iterable" when assigning variables to two lists?

I tried making a question on this earlier and did a horrible job of explaining what I wanted. Hopefully the information I provide in this one is more helpful.
The program I am trying to make will read input from a file in the following form (there will be multiple varying test cases):
7 10
4 8
The program will assign variables to the top-right integer (in this case, 10) and the bottom-left integer (4), and then compute the difference of the two. Here is the code I have so far:
with open('C:\\Users\\ayush\\Desktop\\USACO\\paint\\paint_test.in', 'r') as fn:
    matrix = fn.readlines()

input_array = []
for line in matrix:
    input_array.append(line.strip())

for p,q in enumerate(input_array):
    for x,y in enumerate(p):
        pass
    for a,b in enumerate(q):
        pass

print(y - a)
However, when I run this code I get the following error:
Traceback (most recent call last):
File "C:\Users\ayush\Desktop\USACO\paint\paint.py", line 16, in <module>
for x,y in enumerate(p):
TypeError: 'int' object is not iterable
[Finished in 0.571s]
I'm not sure what the problem is, or why my lists cannot be iterated.
I hope I did a better job explaining my goal this time. Please let me know if there are any additional details I could try to provide. I would really appreciate some help - I've been stuck on this for the longest time.
Thanks!
Were you going for something along the lines of:
with open('C:\\Users\\ayush\\Desktop\\USACO\\paint\\paint_test.in', 'r') as fn:
    matrix = fn.readlines()

input_array = []
for line in matrix:
    input_array.append(line.strip())

top_line, bottom_line = input_array              # previously p, q
top_left, top_right = top_line.split()           # previously x, y
bottom_left, bottom_right = bottom_line.split()  # previously a, b
print(int(top_right) - int(bottom_left))  # you would run into issues subtracting strings without the int() calls
?
If so, that should work, but you can avoid all the unpacking if you just use [0] and [-1] indexes to get the first and last items (this has the advantage of working on a matrix of any size):
with open('C:\\Users\\ayush\\Desktop\\USACO\\paint\\paint_test.in', 'r') as fn:
    lines = fn.read().splitlines()

matrix = [
    [
        int(item)
        for item in line.split()
    ]
    for line in lines
]

top_right = matrix[0][-1]
bottom_left = matrix[-1][0]
print(top_right - bottom_left)

Push data to google sheet from dataframe

I'm trying to push data into my Google Sheet with the following code. How can I change the code so that it writes into the 2nd row, in the correct column based on the header that I've created?
First code:
class Header:
    def __init__(self):
        self.No_DOB_Y=1
        self.No_DOB_M=2
        self.No_DOB_D=3
        self.Paid_too_much_little=4
        self.No_number_of_ins=5
        self.No_gender=6
        self.No_first_login=7
        self.No_last_login=8
        self.Too_young_old=9

    def __repr__(self):
        return str(self.__dict__)

    def add_col(self,name):
        setattr(self,name,max(anomali_header.__dict__.values())+1)

anomali_header=Header()
2nd part of code (NEW):
# No_gender
a = list(df.loc[df['gender'].isnull()]['id'])
#print(a)
cells=sh3.range(1,1,len(a),1)
for i,cell in enumerate(cells):
    cell.value=a[i]
sh3.update_cells(cells)
At the moment it writes into cell A1....
This is what I essentially want to achieve:
as you can see, the code writes the results into the first available cell, which is A1. I essentially want the values to appear under the "No_gender" column defined in my anomali_header, but I'm not sure how to link the 1st part of the code to the 2nd part...
Thanks to v25, the code below works, but rather than going through the columns one by one, I wanted to create a loop that goes through all of the checks.
I'm trying to run the code below, but I get an error when I use the loop.
Error:
TypeError: 'list' object cannot be interpreted as an integer
Code:
# No_DOB_Y
a = list(df.loc[df['Year'].isnull()]['id'])
# No number of ins
b = list(df.loc[df['number of ins'].isnull()]['id'])
# No_gender
c = list(df.loc[df['gender'].isnull()]['id'])

# Updating anomalies to sheet
condition = [a,b,c]
column = [1,2,3]
for j in range(column,condition):
    cells=sh3.range(2,column,len(condition)+1,column)
    for i,cell in enumerate(cells):
        cell.value=condition[i]
    print('end of check')
    sh3.update_cells(cells)
You need to change the range() parameters:
first_row (int) – Row number
first_col (int) – Column number
last_row (int) – Row number
last_col (int) – Column number
So something like:
cells=sh3.range(2, 6, len(a)+1, 6)
Or you could issue the range as a string:
cells=sh3.range('F2:F' + str(len(a)+1))
These numbers may not be perfect, but this should change the positioning. You might need to tweak the digits slightly ;)
UPDATE:
I've encountered an error using a loop; I've updated my original post.
TypeError: 'list' object cannot be interpreted as an integer
This is happening because the built-in range() function you use in the for loop (not to be confused with sh3.range(), which is a different function altogether) expects integers, but you're passing it lists.
However, a simpler way to implement this would be to create a list of tuples which map the strings to column integers, then loop based on this. Something like:
col_map = [('Year', 1),
           ('number of ins', 5),
           ('gender', 6)
           ]

for col_tup in col_map:
    df_list = list(df.loc[df[col_tup[0]].isnull()]['id'])
    cells = sh3.range(2, col_tup[1], len(df_list)+1, col_tup[1])
    for i, cell in enumerate(cells):
        cell.value = df_list[i]
    sh3.update_cells(cells)
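If you want to link this back to the Header class from the first part of the question, one option is to build col_map from anomali_header instead of hard-coding the column numbers. This is a hypothetical sketch: the mapping from DataFrame column names to Header attribute names below is an assumption and should be adjusted to your real data.

# Assumed mapping from DataFrame columns to Header attributes; adjust as needed.
df_to_header = {
    'Year': 'No_DOB_Y',
    'number of ins': 'No_number_of_ins',
    'gender': 'No_gender',
}

# Look up each column number on the anomali_header object defined earlier.
col_map = [(df_col, getattr(anomali_header, attr))
           for df_col, attr in df_to_header.items()]

for df_col, col_idx in col_map:
    df_list = list(df.loc[df[df_col].isnull()]['id'])
    if not df_list:
        continue  # nothing to write for this check
    cells = sh3.range(2, col_idx, len(df_list) + 1, col_idx)
    for i, cell in enumerate(cells):
        cell.value = df_list[i]
    sh3.update_cells(cells)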

Convert Matlab to Python

I'm converting MATLAB code to Python, and I'm unsure about the following line of code:
BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
The whole code is this:
BD_teste = [];
por_treino = 0;
for l = 1:k
    quant_elementos_t = int64((length(grupos.(['g',int2str(l)]).('elementos')) * por_treino)/100);
    for element_c = 1 : quant_elementos_t
        ind_element = randi([1 length(grupos.(['g',int2str(l)]).('elementos'))]);
        BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
        grupos.(['g',int2str(l)]).('elementos')(ind_element,:) = [];
    end
end
The expression below accesses a structure; as I am converting to Python, I used a list and, inside it, a dictionary with its own list 'elementos':
grupos.(['g',int2str(l)]).('elementos')
So my question is just about the line I quoted above: I would like to understand what is happening there and how I would write it in Python.
Thank you very much in advance.
BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
Is one very weird line. Let's break it down into pieces:
int2str(l) returns the number l as a char array (running from '1' up to int2str(k)).
['g',int2str(l)] concatenates 'g' with that char array, giving 'g1', then 'g2', and so on as l increases.
grupos.(['g',int2str(l)]) will return the value of the field named g1, g2 and so on that belongs to the struct grupos.
grupos.(['g',int2str(l)]).('elementos') Now assumes that grupos.(['g',int2str(l)]) is itself a struct, and returns the value of its field named 'elementos'.
grupos.(['g',int2str(l)]).('elementos')(ind_element,:) Assuming that grupos.(['g',int2str(l)]).('elementos') is a matrix, this returns a row vector containing the ind_element-th row of said matrix.
grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l appends the number l to the vector obtained before.
[BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l] appends the row vector [grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l] to the bottom of the matrix BD_teste, creating a new matrix.
Finally:
BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l]; assigns the resulting matrix to the variable BD_teste, overwriting its previous value. Effectively, this just appends the new row, but because of the copy involved in the overwriting step, it is not very efficient.
It would be preferable to append with:
BD_teste(end+1,:) = [grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
Now, how you will rewrite this in Python is a whole different story, and will depend mostly on how you decide to represent the variable grupos.
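For example, here is a minimal sketch of one possible translation. It assumes grupos is represented as a dict mapping keys like 'g1', 'g2', ... to 2-D NumPy arrays (the 'elementos' matrices); the data, k, and por_treino values below are hypothetical stand-ins:

import numpy as np

k = 3
grupos = {'g%d' % l: np.random.rand(10, 4) for l in range(1, k + 1)}
por_treino = 50  # percentage of each group moved into BD_teste

BD_teste = []
for l in range(1, k + 1):
    elementos = grupos['g%d' % l]
    quant_elementos_t = int(len(elementos) * por_treino / 100)
    for _ in range(quant_elementos_t):
        ind_element = np.random.randint(len(elementos))
        # MATLAB: BD_teste = [BD_teste; elementos(ind_element,:), l];
        BD_teste.append(np.append(elementos[ind_element, :], l))
        # MATLAB: elementos(ind_element,:) = [];
        elementos = np.delete(elementos, ind_element, axis=0)
    grupos['g%d' % l] = elementos

BD_teste = np.array(BD_teste)  # stack the accumulated rows once at the end

Appending to a Python list and stacking once at the end plays the same role as the BD_teste(end+1,:) = ... suggestion above, avoiding a copy of the whole matrix on every iteration.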

Scipy Sparse Eigensolver: MemoryError after multiple passes through loop without anything new being written during loop

I'm using Python + Scipy to diagonalize sparse matrices with random entries on the diagonal; in particular, I need eigenvalues in the middle of the spectrum. The code I've written has worked fine for months, but now I'm looking at bigger matrices and am running into "MemoryError"s. What's confusing/driving me insane is that the error only shows up after a few iterations (namely 9) of constructing a random matrix and diagonalizing it, but I don't see any way in which my code stores anything extra in memory from one iteration to the next, and so can't see how my code could fail during the 9th iteration but not the 1st.
Here are the details (and I apologize in advance if I've left anything out, I'm new to posting on this site):
Each matrix I construct is 16000x16000, with 15x16000 non-zero entries. Everything ran fine when I was looking at 4000x4000-size matrices. The bulk of my code is
#Initialization
#...

for i in range(dim):
    for n in range(N):
        digit = (i % 2**(n+1)) / 2**n
        index = (i % 2**n) + ((digit + 1) % 2)*(2**n) + (i / 2**(n+1))*(2**(n+1))
        row[dim + N*i + n] = index
        col[dim + N*i + n] = i
        dat[dim + N*i + n] = -G

e_list = open(e_list_name + "_%03dk_%010ds" % (num_states, int(start_time)), "w")
e_log = open(e_log_name + "_%03dk_%010ds" % (num_states, int(start_time)), "w")

for t in range(num_itr): #Begin iterations
    dat[0:dim] = math.sqrt(N/2.0)*np.random.randn(dim) #Get new diagonal elements
    H = sparse.csr_matrix((dat, (row, col))) #Construct new matrix
    vals = sparse.linalg.eigsh(H, k = num_states + 2, sigma = target_energy, which = 'LM', return_eigenvectors = False) #Get new eigenvalues
    vals = np.sort(vals)
    vals.tofile(e_list)
    e_log.write("Iter %d complete\n" % (t+1))
    e_list.flush()
    e_log.flush()

e_list.close()
e_log.close()
I've been setting num_itr to 100. During the 9th pass through the num_itr loop (as indicated by 8 lines having been written to e_log), the program crashes with the error message
Can't expand MemType 0: jcol 7438
Traceback (most recent call last):
File "/usr/lusers/clb37/QREM_Energy_Gatherer.py", line 55, in <module>
vals = sparse.linalg.eigsh(H, k = num_states + 2, sigma = target_energy, which = 'LM', return_eigenvectors = False)
File "/usr/lusers/clb37/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1524, in eigsh
symmetric=True, tol=tol)
File "/usr/lusers/clb37/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1030, in get_OPinv_matvec
return SpLuInv(A.tocsc()).matvec
File "/usr/lusers/clb37/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 898, in __init__
self.M_lu = splu(M)
File "/usr/lusers/clb37/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 242, in splu
ilu=False, options=_options)
MemoryError
Sure enough, the program will fail during the 9th pass through that loop every time I run it on my machine, and when I try running this code on machines with more memory the program makes it through more iterations before crashing, so it looks like the computer really is running out of memory. If that's all there is to it then fine, but what I can't understand is why the program doesn't crash during the 1st iteration. I don't see any point in the 8 lines of the num_itr loop at which something gets written to memory without just being overwritten during the following iteration. I've used Heapy's heap() function to look at my memory usage, and it just prints out "Total size = 11715240 bytes" during every pass.
I feel like there's something fundamental that I just don't know about going on here, either some bug in my writing that I don't know to look for or some detail about how memory is handled. Can anyone explain to me why this code fails during the 9th pass through the num_itr loop but not the 1st?
Ok, this seems to be reproducible on Scipy 0.14.0.
The issue can apparently be worked around by adding
import gc; gc.collect()
inside the loop to force Python's cyclic garbage collector to run.
The issue appears to be that somewhere inside scipy.sparse.linalg.eigsh there is a cyclic reference loop, along the lines of:
class Foo(object):
    pass

a = Foo()
b = Foo()
a.spam = b
b.spam = a
del a, b  # <- but a and b still refer to each other and are not dead
This is still perfectly OK in principle: although Python's reference counting doesn't detect such cyclic garbage, a collection is run periodically to gather such objects. However, if each object is very large in memory (e.g. big NumPy arrays), the periodic runs are too infrequent, and you run out of memory before the next cyclic garbage collection pass happens.
So a workaround is to force the GC to run when you know there's big garbage to collect.
A better fix would be to change scipy.sparse.linalg.eigsh so that such cyclic garbage is not generated in the first place.
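For reference, a minimal sketch of the workaround applied to the loop from the question (everything else unchanged; the only addition is the explicit collection at the end of each iteration):

import gc

for t in range(num_itr):
    dat[0:dim] = math.sqrt(N/2.0)*np.random.randn(dim)
    H = sparse.csr_matrix((dat, (row, col)))
    vals = sparse.linalg.eigsh(H, k = num_states + 2, sigma = target_energy, which = 'LM', return_eigenvectors = False)
    vals = np.sort(vals)
    vals.tofile(e_list)
    e_log.write("Iter %d complete\n" % (t+1))
    e_list.flush()
    e_log.flush()
    gc.collect()  # force a cyclic garbage collection pass before the next iteration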
