How to read a file and store its content in a 2D matrix? - python

I have a lot of data files in the format described below, and I'm trying to make a pcolormesh plot from them:
0 0 1 2
-3 1 7 7
-2 1 2 3
-1 1 7 3
In the first row, [0 1 2] are the values for the y axis of the plot, and in the first column, [-3 -2 -1] are the values for the x axis of the same plot. The very first 0 (the top-left corner) is only a placeholder.
These are the numbers that I actually want inside the pcolormesh:
1 7 7
1 2 3
1 7 3
I'm trying to read these values and store them in a 2D matrix like this:
Matrix = [[1. 7. 7.]
          [1. 2. 3.]
          [1. 7. 3.]]
Here is a figure illustrating it further:
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
# ------------- Input Data Files ------------- #
data = np.loadtxt('my_colormesh_data.dat') # Load Data File
# ------ Transform Data into 2D Matrix ------- #
Matrix = []
n_row = 4 # Number of rows counting 0 from file #
n_column = 4 # Number of columns counting 0 from file #
x = data[range(1,n_row),0] # Read x axis values from data file and store in a variable #
y = data[0, range(1,n_column)] # Read y axis values from data file and store in a variable #
print(data)
print('\n', x) # print values of x (for checking)
print('\n', y) # print values of y (for checking)
for i in range (2, n_row):
    for j in range(2, n_column):
        print(i, j, data[i,j]) # print values of i, j and data (for checking)
        Matrix[i,j] = data[i,j]
print(Matrix)
and results in this error:
Matrix[i,j] = data[i,j]
TypeError: list indices must be integers or slices, not tuple
Could you clarify what I'm doing wrong?
Thanks in advance!

You are getting the error because Matrix is a list and you are trying to index it with a tuple, i,j, which is not a valid operation: a list can only be indexed with integers or slices. (Matrix is also empty, so even Matrix[i][j] would fail with an IndexError.)
Second, your data variable is already a 2D array, so you don't need any further conversion.
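If you prefer to keep an explicit loop like the one in the question, the target has to be a preallocated NumPy array rather than an empty list, and the indices have to be shifted by one because the header row and column are skipped (the question's loops also start at 2 instead of 1). A minimal sketch, with the sample grid typed in directly instead of being read from my_colormesh_data.dat:

```python
import numpy as np

# The sample grid from the question, typed in so the sketch is self-contained
data = np.array([[ 0, 0, 1, 2],
                 [-3, 1, 7, 7],
                 [-2, 1, 2, 3],
                 [-1, 1, 7, 3]], dtype=float)

n_row, n_column = data.shape
# A NumPy array (unlike a list) accepts matrix[i, j] indexing
matrix = np.zeros((n_row - 1, n_column - 1))
for i in range(1, n_row):          # skip the header row (y values)
    for j in range(1, n_column):   # skip the first column (x values)
        matrix[i - 1, j - 1] = data[i, j]
print(matrix)
```

The loop gives the same result as the slice data[1:,1:], which is the simpler option.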
To skip the first row and the first column, you can simply use slicing:
>>> import numpy as np
>>> from io import StringIO
>>> input_data = """0 0 1 2
... -3 1 7 7
... -2 1 2 3
... -1 1 7 3 """
>>>
>>> data = np.loadtxt(StringIO(input_data))
>>> data
array([[ 0.,  0.,  1.,  2.],
       [-3.,  1.,  7.,  7.],
       [-2.,  1.,  2.,  3.],
       [-1.,  1.,  7.,  3.]])
>>> data[1:,1:]
array([[1., 7., 7.],
       [1., 2., 3.],
       [1., 7., 3.]])

Related

Get a 25x2 array like the following (a sorted list with its coordinates)

My goal: get a 25x2 array like the following (a sorted list of coordinates):
0 0
0 1
0 2
0 3
0 4
1 0
1 1
..
..
4 3
4 4
My (failed) solution:
import numpy as np
n=5
lis=np.zeros((n*n,2))
for i in range(n):
    for j in range(n):
        print(i,j) # This prints just what I want to have in the array
        lis[j,0]=i
        lis[j,1]=j
Output:
array([[4., 0.],
[4., 1.],
[4., 2.],
[4., 3.],
[4., 4.],
[0., 0.],
[0., 0.],
....
In your code the row number is just j, which only runs from 0 to 4. The row number has to increase by 1 on every iteration, even when j goes back to 0 on the next iteration of the i loop, so compute it as i*n + j:
for i in range(n):
    for j in range(n):
        lis[i*n + j, 0] = i
        lis[i*n + j, 1] = j
Another way to do it is to loop over range(n*n) and calculate the two indices using integer division and remainder:
for i in range(n*n):
    lis[i, 0] = i // n
    lis[i, 1] = i % n
When you index lis with lis[j, ...], j only ranges from 0 to 4, so you never update the rows after that.
You have some options to achieve your goal:
Following your logic, you can do:
for i, e in enumerate(range(n*n)):
    lis[i] = e//n, e%n
Using a Cartesian product:
import itertools
for i, e in enumerate(itertools.product(range(n), range(n))):
    lis[i] = e
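The loops can also be replaced by a single vectorized NumPy expression. This is a different technique from the answers (np.divmod applied to np.arange instead of a Python loop), shown as a minimal sketch for the same n = 5:

```python
import numpy as np

n = 5
k = np.arange(n * n)           # flat row numbers 0 .. n*n-1
rows, cols = np.divmod(k, n)   # the same // and % as the loop, for all k at once
lis = np.column_stack((rows, cols)).astype(float)
print(lis[:7])
```

It produces the same 25x2 array as the itertools.product version, with the first coordinate changing slowest.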

Tuples of arrays 1D and 2D to dataframe with python

This is what model.predict returns. How can I convert this tuple into columns of a DataFrame?
(array([1., 1., 1., ..., 1., 1., 1.]), array([[0.46502338, 0.53497662],
[0.47072865, 0.52927135],
[0.4696557 , 0.5303443 ],
...,
[0.47139825, 0.52860175],
[0.46367829, 0.53632171],
[0.46586898, 0.53413102]]))
<class 'tuple'>
None of these attempts work for me:
pd.DataFrame(dict(class_pred=tuple[0], prob_0=tuple[1], prob_1=tuple[2]))
pd.DataFrame(np.column_stack(tuple),columns=['class_pred','prob_0','prob_1'])
I would like to obtain something like this:
class_pred prob_0 prob_1
1 0.470728 0.5292713
AniSkywalker's solution works perfectly:
type(data)
tuple
print(data)
(array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]),
array([[0.46502338, 0.53497662],
[0.47072865, 0.52927135],
[0.4696557 , 0.5303443 ],
[0.46511921, 0.53488079],
[0.46739934, 0.53260066],
[0.47387646, 0.52612354],
[0.4737461 , 0.5262539 ],
[0.47052631, 0.52947369],
[0.47658316, 0.52341684],
[0.47222654, 0.52777346]]))
df_pred = pd.DataFrame(data=dict(pred=data[0], prob_0=data[1][:,0], prob_1=data[1][:,1]))
print(df_pred)
pred prob_0 prob_1
0 1.0 0.465023 0.534977
1 1.0 0.470729 0.529271
2 1.0 0.469656 0.530344
3 1.0 0.465119 0.534881
4 1.0 0.467399 0.532601
5 1.0 0.473876 0.526124
6 1.0 0.473746 0.526254
7 1.0 0.470526 0.529474
8 1.0 0.476583 0.523417
9 1.0 0.472227 0.527773
I'm assuming your data has the form ((n,), (n, 2)), i.e. a 1-D array of length n and an (n, 2) array, so that:
import numpy as np
n = 5
data = (np.random.rand(n), np.random.rand(n, 2))
provides a reasonable estimate of what your output looks like.
Let's say that data is:
(array([0.27856312, 0.66255123, 0.47976175, 0.59381106, 0.82096555]), array([[0.53719357, 0.55803381],
[0.5749893 , 0.09712089],
[0.91607789, 0.21579499],
[0.50163898, 0.39188127],
[0.60427654, 0.07801227]]))
Your dict method actually works with one modification:
import pandas as pd
df = pd.DataFrame(data=dict(class_pred=data[0], prob_0=data[1][:,0], prob_1=data[1][:,1]))
Notice that prob_0 and prob_1 both come from the second tuple element; NumPy's column indexing splits that (n, 2) array into the two columns you described.
Let's take data[1][:,0], for example: first, we select the second element of the data tuple, which is the (n, 2) matrix. Then, we select the first column (0) from all rows (:). The result is a vector of the first element of every row in that matrix.
Using my made-up numbers, df.head() should give you:
class_pred prob_0 prob_1
0 0.278563 0.537194 0.558034
1 0.662551 0.574989 0.097121
2 0.479762 0.916078 0.215795
3 0.593811 0.501639 0.391881
4 0.820966 0.604277 0.078012
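Both attempts in the question index the name tuple, which in Python is the built-in type rather than the returned value (presumably a placeholder for the real variable). With the actual tuple, the np.column_stack attempt works as well, because column_stack turns a 1-D array into a single column and places the 2-D array's columns next to it. A minimal sketch, with made-up numbers standing in for the output of model.predict:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the (class_pred, probabilities) tuple from model.predict
data = (np.array([1., 1., 0.]),
        np.array([[0.46, 0.54],
                  [0.47, 0.53],
                  [0.60, 0.40]]))

# 1-D array -> one column, (n, 2) array -> two more columns: shape (n, 3)
stacked = np.column_stack(data)
df = pd.DataFrame(stacked, columns=['class_pred', 'prob_0', 'prob_1'])
print(df)
```

Because column_stack converts everything to one common dtype, class_pred comes out as float here; df['class_pred'].astype(int) converts it back if integers are needed.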

Find the sum of a pair of columns starting from first and the middle one?

My input:
a b c d e f
g h i j k l
My output should have three sets of solutions like this:
sqrt(a**2 + d**2) + sqrt(g**2 + j**2)
sqrt(b**2 + e**2) + sqrt(h**2 + k**2)
sqrt(c**2 + f**2) + sqrt(i**2 + l**2)
My actual text file has many rows and columns and no header. This is what I have so far:
import os
import math
import numpy as np
for file in os.listdir("directory"):
    if file.endswith(".txt"):
        fin=open(file, 'r')
        total = 0
        for line in fin:
            str = [float(x) for x in line.split()]
            for i in range(len(str[0:5])):
                str[i]=float(str[i])
                sum=np.sum((math.pow(str[i],2)+math.pow(str[i+3],2))**0.5
                total += sum
        fin.close()
With a file:
1 2 3 4 5 6
11 12 13 14 15 16
Correcting the indentation and the range:
with open('stack53269737.txt') as f:
    total = 0
    for line in f:
        str = [float(x) for x in line.split()]
        for i in range(3):
            str[i]=float(str[i])
            sum=np.sum((math.pow(str[i],2)+math.pow(str[i+3],2))**0.5)
            total += sum
In [111]: total
Out[111]: 73.84586902040324
With further cleanup:
with open('stack53269737.txt') as f:
    total = 0
    for line in f:
        alist = [float(x) for x in line.split()]
        for i in range(3):
            total += (alist[i]**2+alist[i+3]**2)**0.5
We don't need to convert to float twice; we don't need math for simple squares.
A numpy approach:
Load it with NumPy's text reader, np.genfromtxt:
In [126]: data = np.genfromtxt('stack53269737.txt')
In [127]: data
Out[127]:
array([[ 1., 2., 3., 4., 5., 6.],
[11., 12., 13., 14., 15., 16.]])
Reshape the array to express how each row is split into two halves:
In [128]: data1 = data.reshape(2,2,3)
In [129]: data1
Out[129]:
array([[[ 1., 2., 3.],
[ 4., 5., 6.]],
[[11., 12., 13.],
[14., 15., 16.]]])
Now we can square all values, sum over the axis that pairs each value with its partner, take the square root, and sum again:
In [130]: np.sum(np.sum(data1**2, axis=1)**.5)
Out[130]: 73.84586902040324
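The reshape above hard-codes two rows and collapses everything into one total, while the question asks for one value per column pair. A minimal sketch that handles any number of rows and keeps the per-pair sums, assuming every row has an even number of columns (the sample file is inlined as a StringIO):

```python
import numpy as np
from io import StringIO

# Same two-row sample file as above, inlined so the sketch is self-contained
text = StringIO("1 2 3 4 5 6\n11 12 13 14 15 16\n")
data = np.genfromtxt(text)

# Split every row into its first and second half: shape (rows, 2, half)
# (assumes an even number of columns)
half = data.shape[1] // 2
pairs = data.reshape(data.shape[0], 2, half)

# sqrt(first**2 + second**2) for every row and column pair: shape (rows, half)
norms = np.sqrt((pairs ** 2).sum(axis=1))

per_pair = norms.sum(axis=0)   # one value per column pair, summed over rows
total = per_pair.sum()         # grand total, as in the answers
print(per_pair)
print(total)
```

per_pair[0] corresponds to sqrt(a**2 + d**2) + sqrt(g**2 + j**2) in the question's notation, and total matches the 73.8458... computed above.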
If you wish to do it without numpy, you could try the following:
import math
with open("data.txt", "r") as infile:
    # Split the file into lines (splitlines() ignores a trailing newline),
    # then split the numbers on each line.
    lines = list(map(str.split, infile.read().splitlines()))
# Use zip to transpose: lines[c] now holds column c from every row.
lines = list(zip(*lines))
# Number of columns (the length of each original line) and half of it.
lineLength = len(lines)
half = lineLength // 2
# Pair column i with column i+half in every row and add up all the roots.
total = sum([math.sqrt(float(lines[i][j])**2 + float(lines[i+half][j])**2)
             for i in range(half) for j in range(len(lines[0]))])
print(total)
Given a file with the following data:
1 2 3 4 5 6
7 8 9 10 11 12
The result is:
57.02450048972068

ValueError: number of diagonals (1) does not match the number of offsets (3)

In Python 2.7, I'm trying to create a sparse matrix with 3 diagonals. The matrix should look like this:
[[ 10 0 0 0 -19 0 0 0 10 0 0 0 ... 0]
[ 0 10 0 0 0 -19 0 0 0 10 0 0 ... 0]
[ 0 0 10 0 0 0 -19 0 0 0 10 0 ... 0]
[ -1 1 1 0 0 0 0 0 0 0 0 0 ... 0]]
My code is
import numpy as np
import numpy.matlib
import scipy.sparse
Dk = np.array([[ 10.], [ 10.],[ 10.]])
Ns = 3
N = 100
z = np.array([[-1.],
[ 1.],
[ 1.]])
dg0 = np.array([numpy.matlib.repmat(1-2*Dk,1,1)])
dgn = np.array([numpy.matlib.repmat(Dk,1,1)])
dgp = np.array([numpy.matlib.repmat(Dk,1,1)])
B = np.zeros((Ns+1,N))
dg0 = np.append(dg0,0)
dgn = np.append(dgn,0)
dgp = np.append(dgp,0)
zerosN1 = np.zeros((1,Ns+1))
zerosN2 = np.zeros((1,2*(Ns+1)))
dg0 = np.append(zerosN1,dg0)
dgp = np.append(zerosN2,dgp)
data0 = np.array([dgn,dg0,dgp])
diags0 = np.array([0,Ns+1,2*(Ns+1)])
B = scipy.sparse.spdiags(data0, diags0, Ns+1, N)
B = scipy.sparse.lil_matrix(B)
zerosN = np.zeros((1,N-Ns))
B[3] = np.append(z,zerosN)
I am getting an error: ValueError: number of diagonals (1) does not match the number of offsets (3)
I don't understand what's wrong. I'd appreciate any help.
Your problem is that data0 looks like this:
array([array([ 10., 10., 10., 0.]),
array([ 0., 0., 0., 0., -19., -19., -19., 0.]),
array([ 0., 0., 0., 0., 0., 0., 0., 0., 10., 10., 10.,
0.])], dtype=object)
I'm not sure what you're doing with repmat (or with many other parts of your program, for that matter). To create a sparse matrix from diagonals, all you have to do is provide one diagonal per offset, each with the right length for its offset (here all three have length Ns). Thus, this should suffice:
flattened_Dk = Dk.ravel()
data0 = [flattened_Dk, 1-2*flattened_Dk, flattened_Dk]
B = scipy.sparse.diags(data0, diags0, shape=(Ns,N))
The array data0 is a 1-D object array holding three arrays of different lengths:
data0.shape # (3,)
As such, spdiags sees data0 as having only one dimension, and therefore one diagonal, but you're providing 3 offsets in diags0, hence the error.
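To see the difference, compare a ragged collection of diagonals (lengths 4, 8 and 12, as in the question) with the same diagonals zero-padded to a common length. A minimal NumPy-only sketch; the padding step is an illustration, not part of the original answers. (Recent NumPy versions raise an error instead of silently building such a ragged array unless dtype=object is requested.)

```python
import numpy as np

# Diagonals of different lengths, like dgn, dg0 and dgp in the question
dgn = np.array([10., 10., 10., 0.])
dg0 = np.array([0., 0., 0., 0., -19., -19., -19., 0.])
dgp = np.array([0.] * 8 + [10., 10., 10., 0.])

# A ragged collection can only be stored as a 1-D object array
ragged = np.empty(3, dtype=object)
for k, d in enumerate((dgn, dg0, dgp)):
    ragged[k] = d
print(ragged.shape)    # (3,)  -> a single "row" of diagonal data

# Zero-padding every diagonal to the same length gives a real 2-D array
width = max(len(d) for d in (dgn, dg0, dgp))
padded = np.vstack([np.pad(d, (0, width - len(d))) for d in (dgn, dg0, dgp)])
print(padded.shape)    # (3, 12) -> one row per offset
```

Each row of padded holds one diagonal in the column-aligned layout that spdiags uses, so it should be accepted together with the three offsets in diags0.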
You can use a simple list comprehension to build one sparse matrix from each element of data0 separately and then add them together, something like this:
B = [scipy.sparse.spdiags(data0[i], diags0[i], Ns+1, N) for i in range(len(data0))]
B = scipy.sparse.lil_matrix(sum(B))  # add the three single-diagonal matrices
zerosN = np.zeros((1,N-Ns))
B[3] = np.append(z,zerosN)
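Alternatively, the whole (Ns+1) x N matrix can be assembled from the scipy.sparse.diags result plus the extra z row using scipy.sparse.vstack. A minimal sketch, assuming a reasonably recent SciPy; Dk and z are the question's arrays flattened to 1-D:

```python
import numpy as np
import scipy.sparse

Ns, N = 3, 100
Dk = np.array([10., 10., 10.])   # the question's Dk, flattened
z = np.array([-1., 1., 1.])      # the question's z, flattened

# Three diagonals of length Ns at offsets 0, Ns+1 and 2*(Ns+1)
offsets = [0, Ns + 1, 2 * (Ns + 1)]
band = scipy.sparse.diags([Dk, 1 - 2 * Dk, Dk], offsets, shape=(Ns, N))

# Last row: z followed by zeros, as in the question's B[3] assignment
last = scipy.sparse.csr_matrix(np.concatenate([z, np.zeros(N - Ns)]))

# Stack the band and the extra row, then convert to LIL for row edits
B = scipy.sparse.vstack([band, last]).tolil()
print(B.shape)
print(B.toarray()[:, :10])
```

Keeping the band and the special last row separate avoids the zero-padding that spdiags needs for positive offsets.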

Matrix in python: transposing 2nd column into matrix row then populating with data in 3rd column

New to python here (but have experience in R, SQL).
I tried googling this, but was unable to find any new ideas.
My main purpose is to generate a matrix using my csv data, but I'd like to transpose my 2nd column into a row for the matrix. I'd then like to populate that matrix with the data in my 3rd column, but wasn't able to get anywhere.
After a couple of days, I came up with this code:
import csv
def readcsv(csvfile_name):
    with open(csvfile_name) as csvfile:
        file=csv.reader(csvfile, delimiter=",")
        #remove rubbish data in first few rows
        skiprows = int(input('Number of rows to skip? '))
        for i in range(skiprows):
            _ = next(file)
        #change strings into integers/floats
        for z in file:
            z[:2]=map(int, z[:2])
            z[2:]=map(float, z[2:])
            print(z[:2])
    return
This just cleans up my data, but what I'd really like to do is turn the data into a matrix. The data I have looks like this (imagine x, y, z, d and the other letters are floats):
1 1 x
1 2 y
1 3 z
1 4 d
. . .
. . .
I'd like to turn this data into a matrix like the one below, i.e. populate the matrix with the data in the 3rd column (letters are used here only to make it easier to read) and turn the 2nd column into the header row of the matrix. In essence, the first and second columns of the CSV file are coordinates into the matrix.
1 2 3 4 . .
1 x y z d
1 a b c u
1 e f e y
.
.
I tried learning NumPy, but it seems to require my data to already be in matrix form.
If you want to use NumPy, you have two options depending on how your data is stored.
If it is GUARANTEED that your keys increase consistently, e.g.:
THIS      NOT THIS
------    --------
1 1 a     1 1 a
1 2 b     1 3 b
1 3 c     2 1 c
1 4 d     3 1 d
2 1 e     1 2 e
2 2 f     1 4 f
2 3 g     8 8 g
2 4 h     2 2 h
Then simply put all the values from the far-right column into a flat NumPy array and reshape it according to the maximum values in the left and middle columns.
import numpy as np
m = np.array(right_column)
# For the sake of example:
#: array([1., 2., 3., 4., 5., 6., 7., 8.])
m = m.reshape(max(left_column), max(middle_column))
#: array([[1., 2., 3., 4.],
#: [5., 6., 7., 8.]])
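The snippet above assumes left_column, middle_column and right_column already exist. One way to get them straight from the CSV is np.loadtxt with unpack=True. A minimal sketch in which a StringIO with made-up contents stands in for the file and a fixed skiprows=2 replaces the question's interactive row-skipping:

```python
import numpy as np
from io import StringIO

# Made-up stand-in for the CSV: two rubbish rows, then row,column,value triples
csv_text = StringIO("junk header\nmore junk\n1,1,1.0\n1,2,2.0\n2,1,3.0\n2,2,4.0\n")

# unpack=True returns one array per column instead of one row per line
left, middle, right = np.loadtxt(csv_text, delimiter=",", skiprows=2, unpack=True)
left_column = left.astype(int)      # keys are read as floats, convert back
middle_column = middle.astype(int)
right_column = right

m = np.array(right_column).reshape(left_column.max(), middle_column.max())
print(m)
```

This reshape still relies on the rows being ordered consistently, as described above.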
If it is not guaranteed, you could either sort it so that it is (probably easiest), OR create a zero array of the correct shape and cycle through each element.
# Example data
left_column = [1, 2, 1, 2, 1, 2, 1, 2]
middle_column = [1, 1, 3, 3, 2, 2, 4, 4]
right_column = [1., 5., 3., 7., 2., 6., 4., 8.]
import numpy as np
m = np.zeros((max(left_column), max(middle_column)), dtype=float)
for x, y, z in zip(left_column, middle_column, right_column):
    x -= 1 # Because the indices are 1-based
    y -= 1 # Need to be 0-based
    m[x, y] = z
print(m)
#: [[1. 2. 3. 4.]
#:  [5. 6. 7. 8.]]
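Both options can also be written without a Python loop. A minimal sketch on the same example data: np.lexsort for the "sort it first" option, and a single fancy-indexing assignment in place of the loop over elements (these NumPy calls are a swapped-in technique, not from the answer above):

```python
import numpy as np

left_column = [1, 2, 1, 2, 1, 2, 1, 2]
middle_column = [1, 1, 3, 3, 2, 2, 4, 4]
right_column = [1., 5., 3., 7., 2., 6., 4., 8.]

rows = np.asarray(left_column) - 1     # 1-based keys -> 0-based indices
cols = np.asarray(middle_column) - 1
values = np.asarray(right_column)
shape = (rows.max() + 1, cols.max() + 1)

# Option 1: sort by (row, column), then reshape.
# Only valid if every (row, column) pair occurs exactly once.
order = np.lexsort((cols, rows))       # the last key (rows) is the primary key
m_sorted = values[order].reshape(shape)

# Option 2: fill a zero array in one assignment; missing pairs stay 0.
m_filled = np.zeros(shape)
m_filled[rows, cols] = values

print(m_sorted)
print(m_filled)
```

Option 2 is the more forgiving one, since it does not require every coordinate to be present.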
