I have a matrix with dimensions 236 x 97. When I print the matrix in Python the output isn't complete: it is abridged with '...' in the middle of the matrix.
I tried to write the matrix to a text file, but the result is exactly the same.
I can't post a screenshot because my reputation isn't high enough, and the output won't appear correctly if I choose another markup option.
Can anyone solve this?
def build(self):
    self.keys = [k for k in self.wdict.keys() if len(self.wdict[k]) > 1]
    self.keys.sort()
    self.A = zeros([len(self.keys), self.dcount])
    for i, k in enumerate(self.keys):
        for d in self.wdict[k]:
            self.A[i,d] += 1

def printA(self):
    outprint = open('outputprint.txt','w')
    print 'Here is the weighted matrix'
    print self.A
    outprint.write('%s' % self.A)
    outprint.close()
    print self.A.shape
Assuming your matrix is a NumPy array, you can use matrix.tofile(<options>) to write the array to a file, as documented in the NumPy reference:
#!/usr/bin/env python
# coding: utf-8
import numpy as np
# create a matrix of random numbers and desired dimension
a = np.random.rand(236, 97)
# write matrix to file
a.tofile('output.txt', sep=' ')
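One caveat worth noting: tofile writes the values but not the shape, so reading the matrix back means restoring the dimensions yourself. A minimal sketch, assuming the 236 x 97 shape from the question:
import numpy as np

# tofile with sep=' ' writes plain text but drops the shape, so
# restore the 236 x 97 dimensions by hand when reading back
b = np.fromfile('output.txt', sep=' ').reshape(236, 97)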
The problem is that you're specifically saving the str representation to a file with this line:
outprint.write('%s' % self.A)
The %s conversion explicitly turns the array into its string form, generating the abridged version you're seeing.
There are lots of ways to write the entire matrix to output; one easy option would be to use numpy.savetxt, for example:
import numpy
numpy.savetxt('outputprint.txt', self.A)
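If the goal is only to see the full matrix in the console rather than in a file, another option is to raise numpy's summarization threshold, which controls when the abridged '...' output kicks in. A minimal sketch:
import sys
import numpy as np

# raise the summarization threshold so the full array is always printed
np.set_printoptions(threshold=sys.maxsize)
print(np.zeros((236, 97)))  # now prints every element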
I'm trying to translate the following lines of code from Python to MATLAB. V, Id, and J have shape (6400,), which in MATLAB would be 1-by-6400 row vectors. pts has size 242.
My Python code
A = coo_matrix((V, (Id, J)), shape=(pts.size, pts.size)).tocsr()
A = A.tobsr(blocksize=(2, 2))
I translated the first line to MATLAB as follows:
A = sparse(V,Id,J,242,242);
However, I got the error
Error using sparse
Index into matrix must be an integer.
How can I translate this code to MATLAB?
The MATLAB sparse function has several forms:
S = sparse(A)
S = sparse(m,n)
S = sparse(i,j,v)
S = sparse(i,j,v,m,n)
S = sparse(i,j,v,m,n,nz)
The form you are most likely looking for is the fourth one, S = sparse(i,j,v,m,n), and you will want to call it (for your use case) as:
A = sparse(Id, J, V, 242, 242);
I think your error is that MATLAB wants the i and j indices first, followed by the values, while you are passing the values as the first argument. That is also why MATLAB complains that the index is not an integer: it is trying to use the floating-point values in V as indices.
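To make the correspondence concrete, here is a small sketch with toy data (not the asker's actual vectors): scipy's coo_matrix takes (values, (rows, cols)) while MATLAB's sparse takes (rows, cols, values), and MATLAB additionally expects 1-based integer indices:
import numpy as np
from scipy.sparse import coo_matrix

# toy data: three nonzero entries
V = np.array([10.0, 20.0, 30.0])   # values
Id = np.array([0, 1, 2])           # 0-based row indices
J = np.array([0, 2, 1])            # 0-based column indices

A = coo_matrix((V, (Id, J)), shape=(3, 3)).tocsr()
# the MATLAB equivalent is A = sparse(Id + 1, J + 1, V, 3, 3);
# note the argument order (indices first, values third) and the +1,
# because MATLAB indexing is 1-based
print(A.toarray())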
I am working on a problem which involves a batch of 19 tokens, each with 400 features. I get the shape (19, 1, 400) when concatenating two vectors of size (1, 200) into the final feature vector. If I squeeze the 1 out I am left with (19,), but I am trying to get (19, 400). I have tried converting to a list, squeezing, and raveling, but nothing has worked.
Is there a way to convert this array to the correct shape?
def attn_output_concat(sample):
    out_h, state_h = get_output_and_state_history(agent.model, sample)
    attns = get_attentions(state_h)
    inner_outputs = get_inner_outputs(state_h)
    if len(attns) != len(inner_outputs):
        print('Length err')
    else:
        tokens = [np.zeros((400))] * largest
        print(tokens.shape)
        for j, (attns_token, inner_token) in enumerate(zip(attns, inner_outputs)):
            tokens[j] = np.concatenate([attns_token, inner_token], axis=1)
        print(np.array(tokens).shape)
        return tokens
The easiest way would be to declare tokens as a NumPy array of shape (19, 400) to start with. That's also more memory- and time-efficient. Here's the relevant portion of your code revised...
import numpy as np

attns_token = np.zeros(shape=(1, 200))
inner_token = np.zeros(shape=(1, 200))
largest = 19
tokens = np.zeros(shape=(largest, 400))

for j in range(largest):
    tokens[j] = np.concatenate([attns_token, inner_token], axis=1)

print(tokens.shape)
BTW... it makes it difficult for people to help you if you don't include a self-contained and runnable segment of code (which is probably why you haven't gotten a response on this yet). Something like the above snippet is preferred and will help you get better answers, because there's less guessing at what you're trying to accomplish.
I am creating a sparse matrix file by extracting the features from an input file. Each row of the input file contains one film ID, followed by some feature IDs and each feature's score.
6729792 4:0.15568 8:0.198796 9:0.279261 13:0.17829 24:0.379707
The first number is the ID of the film; then the value to the left of each colon is a feature ID and the value to the right is that feature's score.
Each line represents one film, and the number of feature:score pairs varies from one film to another.
Here is how I construct my sparse matrix.
import sys
import os
import os.path
import time
import json
import numpy as np
import tables as tb  # PyTables, needed for open_file/Filters/create_carray below
from Film import Film
import scipy
from scipy.sparse import coo_matrix, csr_matrix, rand

def sparseCreate(self, Debug):
    a = rand(self.total_rows, self.total_columns, format='csr')
    l, m = a.shape[0], a.shape[1]
    f = tb.open_file("sparseFile.h5", 'w')
    filters = tb.Filters(complevel=5, complib='blosc')
    data_matrix = f.create_carray(f.root, 'data', tb.Float32Atom(), shape=(l, m), filters=filters)
    index_film = 0
    input_data = open('input_file.txt', 'r')
    for line in input_data:
        my_line = np.array(line.split())
        id_film = my_line[0]
        my_line = np.core.defchararray.split(my_line[1:], ":")
        self.data_matrix_search_normal[str(id_film)] = index_film
        self.data_matrix_search_reverse[index_film] = str(id_film)
        for element in my_line:
            if int(element[0]) in self.selected_features:
                column = self.index_selected_feature[str(element[0])]
                data_matrix[index_film, column] = float(element[1])
        index_film += 1
    self.selected_matrix = data_matrix
    json.dump(self.data_matrix_search_reverse,
              open(os.path.join(self.output_path, "data_matrix_search_reverse.json"), 'wb'),
              sort_keys=True, indent=4)
    my_films = Film(
        self.selected_matrix, self.data_matrix_search_reverse, self.path_doc, self.output_path)
    x_matrix_unique = self.selected_matrix[:, :]
    r_matrix_unique = np.asarray(x_matrix_unique)
    f.close()
    return my_films
Question:
I feel that this function is too slow on big datasets, and it takes too long to run.
How can I improve and accelerate it? Maybe using MapReduce? What is wrong in this function that makes it so slow?
IO + conversions (from str, to str, even twice to str of the same variable, etc.) + splits + explicit loops. By the way, there is the csv Python module, which may be used to parse your input file; you can experiment with it (I suppose you use space as the delimiter). Also, I see you convert element[0] to int/str repeatedly, which is bad: you create many temporary objects. If you call this function several times, you may try to reuse some internal objects (arrays?). You could also try to implement it in another style, with map or a list comprehension, but experiments are needed...
The general idea of Python code optimization is to avoid explicit Python byte-code execution and to prefer native/C Python functions for everything. And certainly try to eliminate as many conversions as possible. Also, if the input file is yours, you can format it with fixed-length fields; this lets you avoid splitting/parsing entirely (only string indexing).
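As a rough illustration of the "fewer conversions, fewer temporary objects" idea, here is a minimal sketch (assuming the same space-delimited id feature:score format as above) that parses each line with plain string methods and one comprehension instead of numpy's string machinery:
def parse_line(line):
    # one split for the whole line, one split per pair; no numpy
    # string functions and no repeated int()/str() round-trips
    parts = line.split()
    film_id = parts[0]
    pairs = [(int(f), float(s)) for f, s in (p.split(':') for p in parts[1:])]
    return film_id, pairs

film_id, pairs = parse_line('6729792 4:0.15568 8:0.198796 9:0.279261')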
I'm having trouble with Python 3.4 using NumPy. My question is: how can I get a NumPy matrix with plain strings instead of byte strings?
def res(data):
    M = np.zeros(data.shape).astype(dtype='|S20')
    lines, columns = M.shape
    for l in range(lines):
        M[l][0] = data[l][1]
        M[l][1] = data[l][2]
        M[l][2] = data[l][3]
    return M
**result python2.7**
[['Ann' '38.72' '-9.133']
['John' '55.68' '12.566']
['Richard' '52.52' '13.411']
['Alex' '40.42' '-3.703']]
**result python3.4**
[[b'Ann' b'38.72' b'-9.133']
[b'John' b'55.68' b'12.566']
[b'Richard' b'52.52' b'13.411']
[b'Alex' b'40.42' b'-3.703']]
In Python 3.4, how can I get my matrix with plain strings, like in the Python 2.7 example? This is a problem because I have functions that expect string values, not byte strings.
Any help would be great. Thanks.
In my case the solution was simply to change dtype='|S20' to dtype=str. I hope this helps.
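Another option that should also work under Python 3, assuming 20 characters is wide enough for your data, is the fixed-width Unicode dtype, which keeps the same storage model as '|S20' but yields plain str values:
import numpy as np

# 'U20' is the fixed-width Unicode counterpart of '|S20'
M = np.zeros((4, 3), dtype='U20')
M[0][0] = 'Ann'
print(M[0][0])  # plain str, no b'' prefix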
I am rather new to Python programming, so please keep your answer simple.
I have a .raw file in 2b/2b complex short int format. It's actually a 2-D raster file. I want to read it and separate the real and imaginary parts. Let's say the raster is of size [MxN].
Please let me know if question is not clear.
Cheers
N
You could do it with the struct module. Here's a simple example based on the file formatting information you mentioned in a comment:
import struct
def read_complex_array(filename, M, N):
    row_fmt = '={}h'.format(N)  # "=" prefix means short ints in native byte-order
    row_len = struct.calcsize(row_fmt)
    result = []
    with open(filename, "rb") as input:
        for row in xrange(M):
            reals = struct.unpack(row_fmt, input.read(row_len))
            imags = struct.unpack(row_fmt, input.read(row_len))
            cmplx = [complex(r, i) for r, i in zip(reals, imags)]
            result.append(cmplx)
    return result
This will return a list of complex-number lists, as can be seen in this output from a trivial test I ran:
[
[ 0.0+ 1.0j 1.0+ 2.0j 2.0+ 3.0j 3.0+ 4.0j],
[256.0+257.0j 257.0+258.0j 258.0+259.0j 259.0+260.0j],
[512.0+513.0j 513.0+514.0j 514.0+515.0j 515.0+516.0j]
]
Both the real and imaginary parts of a Python complex number are represented as machine-level double-precision floating-point numbers.
You could also use the array module. Here's the same thing using it:
import array
def read_complex_array2(filename, M, N):
    result = []
    with open(filename, "rb") as input:
        for row in xrange(M):
            reals = array.array('h')
            reals.fromfile(input, N)
            # reals.byteswap()  # if necessary
            imags = array.array('h')
            imags.fromfile(input, N)
            # imags.byteswap()  # if necessary
            cmplx = [complex(r, i) for r, i in zip(reals, imags)]
            result.append(cmplx)
    return result
As you can see, they're very similar, so it's not clear there's a big advantage to using one over the other. I suspect the array based version might be faster, but that would have to be determined by actually timing it with some real data to be able to say with any certainty.
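If numpy is an option, a third variant would be to let fromfile do the unpacking. This is only a sketch, and it assumes the same layout as the versions above (each of the M rows stored as N real int16s followed by N imaginary int16s, in native byte order):
import numpy as np

def read_complex_array_np(filename, M, N):
    # read all M*2*N int16 values at once, then split each row
    # into its real half and its imaginary half
    raw = np.fromfile(filename, dtype=np.int16).reshape(M, 2, N)
    return raw[:, 0, :] + 1j * raw[:, 1, :]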
Take a look at the Hachoir library. It's designed for this purpose, and does its job really well.