How to read a matrix from a given file? - python

I have a text file which contains a matrix of N * M dimensions.
For example the input.txt file contains the following:
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,2,1,0,2,0,0,0,0
0,0,2,1,1,2,2,0,0,1
0,0,1,2,2,1,1,0,0,2
1,0,1,1,1,2,1,0,2,1
I need to write a Python script in which I can import the matrix.
My current Python script is:
f = open ( 'input.txt' , 'r')
l = []
l = [ line.split() for line in f]
print l
The output list looks like this:
[['0,0,0,0,0,0,0,0,0,0'], ['0,0,0,0,0,0,0,0,0,0'], ['0,0,0,0,0,0,0,0,0,0'],
['0,0,0,0,0,0,0,0,0,0'], ['0,0,0,0,0,0,0,0,0,0'], ['0,0,0,0,0,0,0,0,0,0'],
['0,0,2,1,0,2,0,0,0,0'], ['0,0,2,1,1,2,2,0,0,1'], ['0,0,1,2,2,1,1,0,0,2'],
['1,0,1,1,1,2,1,0,2,1']]
I need to fetch the values as ints. If I try to type cast, it throws errors.

Consider
with open('input.txt', 'r') as f:
    l = [[int(num) for num in line.split(',')] for line in f]
print(l)
produces
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 2, 1, 0, 2, 0, 0, 0, 0], [0, 0, 2, 1, 1, 2, 2, 0, 0, 1], [0, 0, 1, 2, 2, 1, 1, 0, 0, 2], [1, 0, 1, 1, 1, 2, 1, 0, 2, 1]]
Note that you have to split on commas.
If you do have blank lines then change
l = [[int(num) for num in line.split(',')] for line in f ]
to
l = [[int(num) for num in line.split(',')] for line in f if line.strip() != "" ]
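The same idea can be wrapped in a small helper. This is only a sketch: the name parse_matrix and the row-length check are assumptions added for illustration, not part of the answer above. It accepts any iterable of lines, so an open file object works as well as a list of strings.

```python
def parse_matrix(lines, delimiter=","):
    """Parse delimiter-separated integers, one matrix row per non-blank line."""
    matrix = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = [int(token) for token in line.split(delimiter)]
        # Assumption: a matrix should be rectangular, so reject ragged rows.
        if matrix and len(row) != len(matrix[0]):
            raise ValueError(
                f"line {line_number}: expected {len(matrix[0])} values, got {len(row)}"
            )
        matrix.append(row)
    return matrix


sample_lines = ["0,0,2,1\n", "\n", "1,0,1,1\n"]
print(parse_matrix(sample_lines))
```

With a real file this would be called as `with open('input.txt') as f: matrix = parse_matrix(f)`.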

You can simply use numpy.loadtxt.
It is easy to use, and you can also specify the delimiter, data type, etc.
Specifically, all you need to do is this:
import numpy as np
matrix = np.loadtxt("input.txt", dtype='i', delimiter=',')  # a name other than input avoids shadowing the built-in
print(matrix)
And the output would be:
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 2 1 0 2 0 0 0 0]
[0 0 2 1 1 2 2 0 0 1]
[0 0 1 2 2 1 1 0 0 2]
[1 0 1 1 1 2 1 0 2 1]]
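For experimenting without a file on disk, np.loadtxt also accepts a file-like object. The sketch below uses an in-memory string as a stand-in for input.txt (the sample text is made up for illustration). The ndmin=2 argument is an optional extra that keeps the result two-dimensional when the file has only one row.

```python
import io

import numpy as np

# In-memory stand-in for the contents of input.txt (illustrative data).
text = "0,0,2,1\n0,0,1,2\n1,0,1,1\n"

matrix = np.loadtxt(io.StringIO(text), dtype=int, delimiter=",")
print(matrix.shape)

# A single-row file would otherwise come back as a 1-D array.
single = np.loadtxt(io.StringIO("1,2,3\n"), dtype=int, delimiter=",", ndmin=2)
print(single.shape)
```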

You can do this:
a = []
with open('input.txt', 'r') as fin:
    for line in fin:
        a.append([int(x) for x in line.split(',')])

The following does what you want:
l = []
with open('input.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if len(line) > 0:
            # list(...) is needed in Python 3, where map returns an iterator
            l.append(list(map(int, line.split(','))))
print(l)

You should not write your own CSV parser. Consider the csv module when reading such files, and use the with statement to close the file after reading:
import csv
with open('input.txt') as f:
    data = [list(map(int, row)) for row in csv.reader(f)]
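A slightly more defensive variant of the csv approach might look like the sketch below. The helper name read_int_rows is hypothetical, and the in-memory sample stands in for an open file. csv.reader yields an empty list for a blank line, so filtering on `if row` skips those lines.

```python
import csv
import io


def read_int_rows(handle):
    """Parse comma-separated integers from an open text file, skipping empty rows."""
    return [[int(cell) for cell in row] for row in csv.reader(handle) if row]


# io.StringIO stands in for open('input.txt', newline='').
sample = io.StringIO("0,0,2,1\n\n1,0,1,1\n")
print(read_int_rows(sample))
```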

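Check out this small one-line snippet for reading a matrix:
matrix = [[input() for x in range(3)] for y in range(3)]
This reads a 3*3 matrix from standard input, one entry per line. In Python 3 the entries are kept as strings, so wrap input() in int() if you need numbers.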

import numpy as np
f = open ( 'input.txt' , 'r')
l = []
l = np.array([ line.split() for line in f])
print (l)
type(l)
output:
[['0'] ['0'] ['0'] ['0,0,0,0,0,0,0,0,0,0,0']
['0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0']
['0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0']]
numpy.ndarray

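The following code converts the above input to matrix form:
import numpy as np
with open('input.txt', 'r') as f:
    # split on commas and skip blank lines so every row has the same length
    l = [line.strip().split(',') for line in f if line.strip()]
l = np.array(l)
l = l.astype(int)  # np.int was removed in NumPy 1.24; the built-in int works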

Related

Numpy scalable diagonal matrices

Assuming I have the variables:
A = 3
B = 2
C = 1
How can I transform them into diagonal matrices in the following form:
np.diag([1, 1, 1, 0, 0, 0])
Out[0]:
array([[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
np.diag([0,0,0,1,1,0])
Out[1]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0]])
np.diag([0,0,0,0,0,1])
Out[2]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1]])
I would like this to be scalable, so for instance with 4 variables a = 500, b = 20, c = 300, d = 200 the size of the matrix will be 500 + 20 + 300 + 200 = 1020.
What is the easiest way to do this?

The obligatory solution with np.einsum, about 2.25x slower than the accepted answer for the [500,20,200,300] arrays on a 2-core Colab instance.
import numpy as np
A = 3
B = 2
C = 1
r = [A,B,C]
m = np.arange(len(r))
np.einsum('ij,kj->ijk', m.repeat(r) == m[:,None], np.eye(np.sum(r), dtype='int'))
Output
array([[[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1]]])

Here's one approach. The resulting array mats contains the matrices you're looking for.
A = 3
B = 2
C = 1
n_list = [A,B,C]
ab_list = np.cumsum([0] + n_list)
ran = np.arange(ab_list[-1])
mats = [np.diag(((a <= ran) & (ran < b)).astype('int'))
        for a,b in zip(ab_list[:-1],ab_list[1:])]
for mat in mats:
    print(mat,'\n')
Result:
[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 1]]
Edit: Here's a faster solution that yields the same result
n_list = [A,B,C]
ab_list = np.cumsum([0] + n_list)
total = ab_list[-1]
ran = np.arange(total)
mats = np.zeros((len(n_list),total,total))
for k,p in enumerate(zip(ab_list[:-1],ab_list[1:])):
    idx = np.arange(p[0],p[1])
    mats[k,idx,idx] = 1
for mat in mats:
    print(mat,'\n')
This seems to yield a ~10% speedup over the currently accepted solution
Another with roughly equivalent performance:
n_list = [A,B,C]
m = len(n_list)
ab_list = np.cumsum([0] + n_list)
total = ab_list[-1]
ran = np.arange(total)
mats = np.zeros((m,total,total))
idx = [k for a,b in zip(ab_list[:-1],ab_list[1:]) for k in range(a,b)]
mats[[k for k,n in enumerate(n_list) for _ in range(n)],
     idx,idx] = 1
for mat in mats:
    print(mat,'\n')

You can achieve even better performance by just allocating the array once, then setting the values all at once by specifying the indices. The indices are fortunately easy to obtain.
import numpy as np
a = [3, 2, 1] # Put your values in a list
s = np.sum(a)
m = np.zeros((len(a), s, s), dtype=int) # Initialize array once
indices = (np.repeat(range(len(a)), a), *np.diag_indices(s, 2)) # Get indices
m[indices] = 1 # Set the diagonals at once
print(m)
Output:
[[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 1]]]
Comparing to @Ben Grossmann's answer, with A=3000, B=2000, C=1000 and 100 repeats:
import timeit
import numpy as np
def A():
    '''My solution'''
    a = [3000, 2000, 1000] # Put your values in a list
    s = np.sum(a)
    m = np.zeros((len(a), s, s), dtype=int) # Initialize array once
    indices = (np.repeat(range(len(a)), a), *np.diag_indices(s, 2)) # Get indices
    m[indices] = 1 # Set the diagonals at once
    return m
def B():
    '''Bens solution'''
    A = 3000
    B = 2000
    C = 1000
    n_list = [A,B,C]
    ab_list = np.cumsum([0] + n_list)
    ran = np.arange(ab_list[-1])
    return [np.diag(((a <= ran) & (ran < b)).astype('int')) for a,b in zip(ab_list[:-1], ab_list[1:])]
print('Timings:')
timeA = timeit.timeit(A, number=100)
timeB = timeit.timeit(B, number=100)
ratio = timeA / timeB
print(f'This solution: {timeA} seconds')
print(f'Current accepted answer: {timeB} seconds')
if ratio < 1:
    print(f'This solution is {1 / ratio} times faster than Bens solution')
else:
    print(f'Bens solution is {ratio} times faster than this solution')
Output:
Timings:
This solution: 1.6834218999993027 seconds
Current accepted answer: 5.096610300000066 seconds
This solution is 3.027529997086397 times faster than Bens solution
EDIT: Changed the "indices" algorithm to use np.repeat instead of np.concatenate.
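The allocate-once idea can be packaged as a reusable function and checked against np.diag on the small example from the question. This is a sketch under assumptions: the function name diagonal_blocks is hypothetical, and it uses np.arange for the diagonal positions where the answer uses np.diag_indices.

```python
import numpy as np


def diagonal_blocks(sizes):
    """Return a stack of 0/1 diagonal matrices, one per entry in sizes."""
    sizes = np.asarray(sizes)
    total = int(sizes.sum())
    out = np.zeros((len(sizes), total, total), dtype=int)  # allocate once
    # layer[i] is the index of the matrix that owns diagonal position i.
    layer = np.repeat(np.arange(len(sizes)), sizes)
    pos = np.arange(total)
    out[layer, pos, pos] = 1  # set all diagonal entries in one assignment
    return out


mats = diagonal_blocks([3, 2, 1])
print(mats.shape)
print(np.array_equal(mats[1], np.diag([0, 0, 0, 1, 1, 0])))
```

Summing the stack over its first axis gives the identity matrix, which is a quick sanity check that every diagonal position belongs to exactly one matrix.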

One possible method (I don't think it's optimal, but it works):
import numpy as np
a = 3
b = 2
c = 1
values = [a,b,c] # create a list with the values
n = sum(values) # total length of the diagonal
# create an array with cumulative sums, starting from 0, to use as indices
idx_vals = np.zeros(len(values)+1,dtype=int)
np.cumsum(values,out=idx_vals[1:])
# build each diagonal from the values, then create the diagonal matrices and
# save them in the `matrices` list
matrices = []
for idx,v in enumerate(values):
    diag = np.zeros(n)
    diag[idx_vals[idx]:idx_vals[idx]+v] = np.ones(v)
    print(diag)
    matrices.append(np.diag(diag))

Yet another possibility:
import numpy as np
# your constants here
constants = [3, 2, 1] # [A, B, C]
size = sum(constants)
cumsum = np.cumsum([0] + constants)
for i in range(len(cumsum) - 1):
    inputVector = np.zeros(size, dtype=int)
    inputVector[cumsum[i]:cumsum[i+1]] = 1
    matrix = np.diag(inputVector)
    print(matrix, '\n')
Output:
[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 1]]

Dividing an input into different lists by the lines

I'm trying to divide a multi-lined input into different lists by the lines. Is there a method for that?
For example:
#The input:
0 0 0 0 0
0 0 0 0 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
#What I request:
column1 = [0, 0, 0, 0, 0] #the first line as a list
column2 = [0, 0, 0, 0, 1] # the second line as a list
.
. #so goes on
.

You don't need a method. This line is pretty much the standard:
first_line = list(map(int, input().split()))
Breakdown:
First, input() takes one line of input from the input stream, i.e. "0 0 0 0 0".
The split method breaks it on whitespace, which results in ["0", "0", "0", "0", "0"].
Then the map function converts every token to an integer, i.e. from "0" to 0.
If you want to process all input lines regardless of the input length, then you can use fileinput.input(). Here's some sample code:
import fileinput
input_grid = [list(map(int, line.split())) for line in fileinput.input()]
input_grid:
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]

You probably want a list of lists, not individual variables for each row. If you do it this way, once output is built, you can access any row by index, such as output[2].
s ='''0 0 0 0 0
0 0 0 0 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0'''
s = s.split('\n')
output = [list(map(int,x.split())) for x in s]
Output
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]
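The question names its variables column1, column2 even though each one holds an input line, that is, a row. If actual columns are wanted, zip(*grid) transposes the list of lists. The following is a sketch with a hypothetical helper name, parse_grid, using the sample input from the question.

```python
def parse_grid(text):
    """Turn whitespace-separated lines of integers into a list of rows."""
    return [[int(token) for token in line.split()]
            for line in text.splitlines() if line.strip()]


text = """0 0 0 0 0
0 0 0 0 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0"""

grid = parse_grid(text)                        # grid[1] is the second input line
columns = [list(col) for col in zip(*grid)]    # columns[4] is the last column
print(grid[1])
print(columns[4])
```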

Do you have any advice about signal processing on binary time series?

I have a binary time series with some ASK modulated signals in different frequencies inside of it.
Let's say it's something like this: x = [0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0, ...]
What matters to me is having all the '1's and '0's in runs of 4 samples or more, but sometimes a '0' and a '1' change places like this: x1 = [0,0,0,1,1,1,1,1] when it should have been x2 = [0,0,0,0,1,1,1,1]
There is also some noise in the form of spikes, as seen in n1 = [0,0,0,0,0,0,1,1,0,0,0,0,0], when it should be only zeros.
I've already tried a moving average, but it introduced a lag into the signal that wasn't good for my application.
Do you have any advice about signal processing on binary time series?

The following code finds the indices of all continuous sequences with a length smaller than 4 (min_cont_length). It also gives you the lengths of the problematic sections, so you can decide how to handle them.
import numpy as np
def find_index_of_err(signal, min_cont_length = 4):
    # pad sides to detect problems at the edges
    signal = np.concatenate(([1-signal[0]],signal,[1-signal[-1]]))
    # calculate differences from 1 element to the next
    delta = np.concatenate(([0], np.diff(signal, 1)))
    # detect discontinuities
    discontinuity = np.where(delta!=0)[0]
    # select discontinuities with matching length (< min_cont_length)
    err_idx = discontinuity[:-1][np.diff(discontinuity) < min_cont_length] - 1
    # get also the size of the gap
    err_val = np.diff(discontinuity)[np.argwhere(np.diff(discontinuity) < min_cont_length).flatten()]
    return err_idx, err_val
# some test signals
signals = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]])
for sig in signals:
    index, value = find_index_of_err(sig)
    print(sig, index, value)
# Output:
# [1 0 0 0 0 0 0 0 0 0 0] [0] [1]
# [0 0 1 0 0 0 0 0 0 0 0] [0 2] [2 1]
# [0 0 0 0 1 0 0 0 0 0 0] [4] [1]
# [0 0 0 0 0 0 1 1 0 0 0] [6 8] [2 3]
# [0 0 0 0 0 0 1 1 1 1 1] [] []
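Once the short runs are located, one possible way to handle them is to overwrite each one with the value of the run before it. The sketch below is an assumed cleanup policy, not part of the answer above, and the function names are hypothetical. It works offline on the whole signal, leaves runs of at least min_len samples where they are, and only removes spikes: it cannot tell that an edge such as the one in x1 sits one sample too early.

```python
from itertools import groupby


def run_lengths(bits):
    """Return [(value, length), ...] for the consecutive runs of a sequence."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(bits)]


def absorb_short_runs(bits, min_len=4):
    """Overwrite every run shorter than min_len with the preceding run's value.

    Assumed policy: a short run at the very start has no predecessor and is kept.
    """
    cleaned = []
    for value, length in run_lengths(bits):
        if cleaned and length < min_len:
            cleaned.extend([cleaned[-1]] * length)  # treat the short run as a spike
        else:
            cleaned.extend([value] * length)
    return cleaned


n1 = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
print(run_lengths(n1))
print(absorb_short_runs(n1))
```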

Reading numpy list of vectors from file causes a change of vectors to be string

I have the following code that generates a numpy list, writes it to a file, and later reads it back as a numpy list again:
filepath = 'vectors.txt'
rows = 10
dim = 5
V = np.random.choice(np.array([0, 1], dtype=np.uint8), size=(rows, dim))
M = np.unique(V.view(V.dtype.descr * dim))
Matrix = M.view(V.dtype).reshape(-1, dim)
with open(filepath, 'w') as f:
    for i in Matrix:
        f.write("{}\n".format(i))
    f.close()
with open(filepath, "r") as f:
    contents = []
    for line in f:
        l = np.asarray(line.strip())
        contents.append(l)
    contents = np.asarray(contents)
    print(contents)
The output looks like this:
['[0 0 0 1 0]' '[0 0 1 0 1]' '[0 0 1 1 1]' '[0 1 0 1 1]' '[0 1 1 0 0]'
'[0 1 1 0 1]' '[0 1 1 1 0]' '[1 1 1 0 0]' '[1 1 1 0 1]']
How can I remove the single quotation marks around each vector so that it becomes a numpy vector? In other words, instead of '[0 0 0 1 0]', it must be [0 0 0 1 0]. I tried using l = np.asarray(line.strip()) but it seems to have no effect when appending. Note that I loop while reading and writing on purpose.
Thank you

There are easier ways to re-read the output if you write it out in a better format. Assuming you have a reason to output it the way you did, and if you don't care about the data structure when it's read back in...
with open(filepath, "r") as f:
    contents = []
    for line in f:
        # strip the newline first, then drop the surrounding brackets
        l = np.fromstring(line.strip()[1:-1], sep=' ', dtype=int)
        contents.append(l)
>>> contents = np.asarray(contents)
>>> print(contents)
array([[0, 0, 0, 0, 1],
[0, 1, 0, 0, 0],
[0, 1, 1, 0, 1],
[1, 0, 0, 0, 1],
[1, 0, 1, 0, 1],
[1, 0, 1, 1, 0],
[1, 0, 1, 1, 1],
[1, 1, 0, 1, 1]])
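As an illustration of the "better format" mentioned above, one option is to write plain space-separated digits with np.savetxt and read them back with np.loadtxt. This is a sketch under assumptions: it does not loop row by row as the question asks for, the sample matrix is made up, and the temporary directory stands in for wherever vectors.txt would really live.

```python
import os
import tempfile

import numpy as np

# Illustrative data in the same shape as the question's Matrix.
matrix = np.array([[0, 0, 0, 1, 0],
                   [0, 1, 1, 0, 1],
                   [1, 1, 1, 0, 0]], dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vectors.txt")  # hypothetical location
    np.savetxt(path, matrix, fmt="%d")           # one row per line, no brackets
    restored = np.loadtxt(path, dtype=np.uint8, ndmin=2)

print(np.array_equal(matrix, restored))
```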

How to check word in file with list element by using index?

I have a list of words in two files. I want to check the words from file 2 against file 1. A word that matches is replaced by 1. If a word is duplicated, its count is used instead of 1. If there is no match, 0 is used. The output uses the same column order as file 1.
(sorry about my explanation)
file 1: a,b,c,1,5,9,12
file 2: a 1 c 12
c 9 a b
5 b 5 c
9 12 a b
I tried the code below but I'm still lost, as I got all 0s. Any suggestions?
header = []
for line in open(file1):
    lines = line.strip().split(',')
    for i,j in enumerate(lines):
        header.append(j)
#print header
for line in open(file2):
    linesMo = line.strip().split()
    for words in linesMo:
        if words != j:
            print '0',
        if words == j:
            print '1',
I want the results to be:
1, 0, 1, 1, 0, 0, 1 # a 1 c 12
1, 1, 1, 0, 0, 1, 0 # c 9 a b
0, 1, 1, 0, 2, 0, 0 # 5 b 5 c
1, 1, 0, 0, 0, 1, 1 # 9 12 a b

with open("Input1.txt") as in_file1, open("Input2.txt") as in_file2:
    line = next(in_file1).rstrip().split(",")
    for row in map(str.split, in_file2):
        print([row.count(item) for item in line])
Output
[1, 0, 1, 1, 0, 0, 1]
[1, 1, 1, 0, 0, 1, 0]
[0, 1, 1, 0, 2, 0, 0]
[1, 1, 0, 0, 0, 1, 1]
You can do it even more efficiently, like this:
from collections import Counter
with open("Input1.txt") as in_file1, open("Input2.txt") as in_file2:
    line = next(in_file1).rstrip().split(",")
    for row in map(str.split, in_file2):
        # a Counter returns 0 for words that do not occur in the row
        print(list(map(Counter(row).__getitem__, line)))
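To try the Counter approach without creating the two files, the sketch below uses the sample data from the question as in-memory strings. The function name count_matches is hypothetical.

```python
from collections import Counter


def count_matches(header_line, rows):
    """For each row, count how often every header word occurs in it."""
    header = header_line.strip().split(",")
    result = []
    for row in rows:
        counts = Counter(row.split())  # missing words count as 0
        result.append([counts[word] for word in header])
    return result


header_line = "a,b,c,1,5,9,12"                               # file 1
rows = ["a 1 c 12", "c 9 a b", "5 b 5 c", "9 12 a b"]        # file 2
for counts in count_matches(header_line, rows):
    print(counts)
```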
