Read file elements into 3 different arrays - python

I have a file that is space delimited with values for x,y,x. I need to visualise the data so I guess I need so read the file into 3 separate arrays (X,Y,Z) and then plot them. How do I read the file into 3 seperate arrays I have this so far which removes the white space element at the end of every line.
def fread(f=None):
"""Reads in test and training CSVs."""
X = []
Y = []
Z = []
if (f==None):
print("No file given to read, exiting...")
sys.exit(1)
read = csv.reader(open(f,'r'),delimiter = ' ')
for line in read:
line = line[:-1]
I tried to add something like:
for x,y,z in line:
X.append(x)
Y.append(y)
Z.append(z)
But I get an error like "ValueError: too many values to unpack"
I have done lots of googling but nothing seems to address having to read in a file into a separate array every element.
I should add my data isn't sorted nicely into rows/columns it just looks like this
"107745590026 2 0.02934046648 0.01023879368 3.331810236 2 0.02727724425 0.07867902517 3.319272757 2 0.01784882881"......
Thanks!

EDIT: If your data isn't actually separated into 3-element lines (and is instead one long space-separated list of values), you could use python list slicing with stride to make this easier:
X = read[::3]
Y = read[1::3]
Z = read[2::3]
This error might be happening because some of the lines in read contain more than three space-separated values. It's unclear from your question exactly what you'd want to do in these cases. If you're using python 3, you could put the first element of a line into X, the second into Y, and all the rest of that line into Z with the following:
for x, y, *z in line:
X.append(x)
Y.append(y)
for elem in z:
Z.append(elem)
If you're not using python 3, you can perform the same basic logic in a slightly more verbose way:
for i, elem in line:
if i == 0:
X.append(elem)
elif i == 1:
Y.append(elem)
else:
Z.append(elem)

Related

Counting the number of rows filled with tuples

As part of parsing a PDB file, I've extracted a set of coordinates (x, y, z) for particular atoms that I want to exist as floats. However, I also need to know how many sets of coordinates I have extracted.
Below is my code through the coordinate extraction, and what I thought would produce the count of how many sets of three coordinates I've extracted.
When using len(coordinates), I unfortunately get back that each set of coordinates contains 3 tuples (the x, y, and z coordinates.
Any insight into how to properly count the number of sets would be helpful. I'm quite new to Python and am still in the stage of being unsure about if I am even asking this correctly!
from sys import argv
with open(argv[1]) as pbd:
print()
for line in pbd:
if line[:4] == 'ATOM':
atom_type = line[13:16]
if atom_type == "CA" or "N" or "C":
x = float(line[31:38])
y = float(line[39:46])
z = float(line[47:54])
coordinates = (x, y, z)
# printing (coordinates) gives
# (36.886, 53.177, 21.887)
# (38.323, 52.817, 21.996)
# (38.493, 51.553, 22.83)
# (37.73, 51.314, 23.77)
print(len(coordinates))
# printing len(coordinates)) gives
# 3
# 3
# 3
# 3
Thank you for any insight!
If you want to count the number of specific atoms in your file, try this one
from sys import argv
with open(argv[1]) as pbd:
print()
atomCount = 0
for line in pbd:
if line[:4] == 'ATOM':
atom_type = line[13:16]
if atom_type == "CA" or "N" or "C":
atomCount += 1
print(atomCount)
What it does is basically, you traverse your whole pbd file and check the type of each atom(seems fourth column in your data). Each time you encounter your desired atom types you increase a counter variable by 1.
Your coordinates variable is a tuple, tuples are ordered and unchangeable. Use lists is better.
coordinates=[]
for ....:
coordinates.append([x,y,z])
len(coordinates) # should be 4 I guess.

Using split in python on excel to get two parameter

I'm getting really confused with all the information on here using 'split' in python. Basically I want to write a code which opens a spreadsheet (with two columns in it) and the function I write will use the first column as x's and the second column as y's and then it will plot it in the x-y plane.
I thought I would use line.splitlines to cut each line in excel into (x,y) but I keep getting
'ValueError: need more than 1 value to unpack'
I don't know what this means?
Below is what I've written so far, (xdir is an initial condition for a different part of my question):
def plotMo(filename, xdir):
infile = open(filename)
data = []
for line in infile:
x,y = line.splitlines()
x = float(x)
y = float(y)
data.append([x,y])
infile.close()
return data
plt.plot(x,y)
For example with
0 0.049976
0.01 0.049902
0.02 0.04978
0.03 0.049609
0.04 0.04939
0.05 0.049123
0.06 0.048807
I would want to the first point in my plane to be (0, 0.049976) and the second plot to be (0.01, 0.049902).
x,y = line.splitlines() tries to split the current line into several lines.
Since splitlines returns only 1 element, there's an error because python cannot find a value to assign to y.
What you want is x,y = line.split() which will split the line according to 1 or more spaces (like awk would do) if no parameter is specified.
However it depends of the format: if there are blank lines you'll get the "unpack" problem at some point, so to be safe and skip malformed lines, write:
items = line.split()
if len(items)==2: x,y = items
To sum it up, a more pythonic, shorter & safer way of writing your routine would be:
def plotMo(filename):
with open(filename) as infile:
data = []
for line in infile:
items = line.split()
if len(items)==2:
data.append([float(e) for e in items])
return data
(maybe it could be condensed more, but that's good for starters)

Extract multiple arrays from .DAT file with undefined size

I have a device that stores three data sets in a .DAT file, they always have the same heading and number of columns, but the number of rows vary.
They are (n x 4), (m x 4), (L x 3).
I need to extract the three data sets into seperate arrays for plotting.
I have been trying to use numpy.genfromtxt and numpy.loadtxt, but the only way I can get them to work for this format is to manually define the row which each data set starts.
As I will regularly need to deal with this format I have been trying to automate it.
If someone could suggest a method which might work I would greatly appreciate it. I have attached an example file.
example file
Just a quck and dirty solution. At your file size, you might run into performance issues. If you know m, n and L, initialize the output vectors with the respective length.
here is the strategy: Load the whole File in a variable. Read the variable line by line. As soon as you discover a keyword, raise a flag that you are in the specific block. In the next line, read out the line to the correct variables.
isblock1 = isblock2 = isblock3 = False
fout = [] # construct also all the other variables that you want to collect.
with open(file, 'r') as file:
lines = file.readlines() #read all the lines
for line in lines:
if isblock1:
(f, psd, ipj, itj) = line.split()
fout.append(f) #do this also with the other variables
if isblock2:
(t1, p1, p2, p12) = line.split()
if isblock3:
(t2, v1, v2) = line.split()
if 'Frequency' is in line:
isblock1 = True
isblock2 = isblock3 = False
if 'Phasor' is in line:
isblock2 = True
isblock1 = isblock3 = False
if 'Voltage' is in line:
isblock3 = True
isblock1 = isblock2 = False
Hope that helps.

Arranging distinct number of floats in an 2d array

First of all I am quite a newbie on python, so please forgive me if I don't see the wood for the trees. My question is on reading a huge file of float numbers and storing them in an array for fast mathematical postprocessing.
Lets assume the file looks similar to this:
!!
-3.2297390 0.4474691 3.5690145 3.5976372 6.9002712 7.7787466 14.2159269 14.3291490
16.7660723 17.1258704 18.9469059 19.1716808 20.0700721 21.4088414
-3.2045361 0.4123081 3.5625981 3.5936954 6.8901539 7.7543415 14.2764611 14.3623976
16.7955934 17.1560337 18.9527369 19.1251184 20.0700709 21.3515145
-3.2317597 0.4494166 3.5799182 3.6005429 6.8838705 7.7661897 14.2576455 14.3295731
16.7550357 17.0986678 19.0187779 19.1687722 20.0288587 21.3818250
-3.1921346 0.3949598 3.5636878 3.5892085 6.8833690 7.7404542 14.3061281 14.3855389
16.8063645 17.1697110 18.9549920 19.1134580 20.0613223 21.3196066
here there are 4 (nb) blocks of 14 (nk) float numbers each. I want them to be arranged in an array elements[nb][nk] so that I can access easily looping over certain floats of the blocks.
Here is what I thought it should look like, but it doesn't work at all:
nb=4
nk=14
with open("datafile") as file:
elements = []
n = 0
while '!!' not in file:
while n <= (nb-1):
elements.append([])
current = map(float,file.read().split()) # here I would need something to assure only 14 (nk) floats are read in
elements[n].append(current)
n += 1
print(elements[0][1])
It would be great if had some ideas and suggestions. Thanks!
EDIT:
here an datafile where the numbers follow after each other with no clear seperator after a block nb. Here it is nb=2 and nk=160. How to split the read in floats after each 160th number?
!!
-7.2578105433 -7.2578105433 -6.7774609392 -6.7774609392 -6.3343986693 -6.3343986693 -5.8537216826 -5.8537216826
-5.6031029888 -5.6031029888 -2.9103190893 -2.9103190893 -1.7962279174 -1.7962279174 -0.8136720023 -0.8136720023
-0.1418500769 -0.1418500769 2.9923464558 2.9923464558 3.5797768050 3.5797768050 3.8793240270 3.8793240270
4.0774192689 4.0774192689 4.2378755781 4.2378755781 4.2707165126 4.2707165126 4.3290523910 4.3290523910
4.4487102661 4.4487102661 4.5341883539 4.5341883539 4.7946098470 4.7946098470 4.9518205998 4.9518205998
4.9592549825 4.9592549825 5.1648268937 5.1648268937 5.2372127454 5.2372127454 5.9377062691 5.9377062691
6.2971992823 6.2971992823 6.6324702419 6.6324702419 6.7948808733 6.7948808733 7.0835270703 7.0835270703
7.6252686579 7.6252686579 7.7886279100 7.7886279100 7.8514022664 7.8514022664 7.9188180854 7.9188180854
7.9661386138 7.9661386138 8.2830991934 8.2830991934 8.4581462733 8.4581462733 8.5537201519 8.5537201519
10.2738010533 10.2738010533 11.4495306517 11.4495306517 11.4819579346 11.4819579346 11.5788238984 11.5788238984
11.9411469341 11.9411469341 12.5006172267 12.5006172267 12.5055546075 12.5055546075 12.6659410418 12.6659410418
12.8741094000 12.8741094000 12.9560279595 12.9560279595 12.9780521671 12.9780521671 13.2195973082 13.2195973082
13.2339969658 13.2339969658 13.3594047155 13.3594047155 13.4530024795 13.4530024795 13.4556342387 13.4556342387
13.5784994631 13.5784994631 14.6887369915 14.6887369915 14.9019726334 14.9019726334 15.1279383300 15.1279383300
15.1953349879 15.1953349879 15.3209538297 15.3209538297 15.4042612992 15.4042612992 15.4528348692 15.4528348692
15.4542742538 15.4542742538 15.5291462589 15.5291462589 15.5415591416 15.5415591416 16.0741610117 16.0741610117
16.1117432607 16.1117432607 16.3566675522 16.3566675522 17.7569123657 17.7569123657 18.4416346230 18.4416346230
18.9525843134 18.9525843134 19.0591624486 19.0591624486 19.1069867477 19.1069867477 19.1853525353 19.1853525353
19.4020021909 19.4020021909 19.4718240723 19.4718240723 19.6384650104 19.6384650104 19.6919638323 19.6919638323
19.7044699790 19.7044699790 19.8851141335 19.8851141335 20.6132283388 20.6132283388 21.4074471478 21.4074471478
-7.2568288331 -7.2568280628 -6.7765483088 -6.7765429702 -6.3336003082 -6.3334841531 -5.8529872639 -5.8528369047
-5.6024822566 -5.6024743589 -2.9101060346 -2.9100930470 -1.7964872791 -1.7959333994 -0.8153333579 -0.8144924713
-0.1440078470 -0.1421444935 2.9869228390 2.9935342026 3.5661875018 3.5733148387 3.8777649741 3.8828300867
4.0569348321 4.0745074351 4.2152251981 4.2276050415 4.2620483420 4.2649182323 4.3401804124 4.3402590222
4.4446178512 4.4509411587 4.5139270348 4.5526439516 4.7788285567 4.7810706248 4.9282976775 4.9397807768
4.9737752749 4.9900180286 5.1456209436 5.1507667583 5.2528363215 5.2835144984 5.9252188817 5.9670441193
6.2699491148 6.3270140700 6.5912060019 6.6576016532 6.7976670773 6.7982056614 7.0789050974 7.1023337244
7.6182108739 7.6309688587 7.7678148773 7.7874194913 7.8544608005 7.8594983757 7.9019395451 7.9100447766
7.9872550937 7.9902791771 8.2617740182 8.3147140843 8.4533756827 8.4672364683 8.5556163680 8.5558640539
10.2756173692 10.2760227976 11.4344757209 11.4355375519 11.4737803653 11.4760186102 11.5914333288 11.5953932241
11.9369518613 11.9380900159 12.4973099542 12.5002401499 12.5030167542 12.5031963862 12.6629548222 12.6634150863
12.8719844312 12.8728126622 12.9541436501 12.9568445777 12.9762780998 12.9764840239 13.2074024551 13.2108294169
13.2279146175 13.2308902307 13.3780648962 13.3839050348 13.4634576072 13.4650575047 13.4701414823 13.4718238883
13.5901622459 13.5971076111 14.6735704782 14.6840793519 14.8963924604 14.8968395615 15.1163287408 15.1219631271
15.1791724308 15.1817299995 15.2628531102 15.3027136606 15.3755066968 15.3802521520 15.3969012144 15.4139294088
15.5131322524 15.5315039463 15.5465532500 15.5629105034 15.5927166831 15.5966393750 16.0841067052 16.0883417123
16.1224821534 16.1226510159 16.3646268213 16.3665839987 17.7654543366 17.7657216551 18.4305335335 18.4342292730
18.9110142692 18.9215889808 18.9821593138 18.9838270736 19.1633959849 19.1637558341 19.2040877093 19.2056062802
19.3760597529 19.3846323861 19.4323552578 19.4329488797 19.6494790293 19.6813374885 19.6943820824 19.7202356536
19.7381237231 19.7414645409 19.9056461663 19.9197428869 20.6239183178 20.6285756411 21.4127637743 21.4128909767
This should work:
elements = []
with open("datafile") as file:
next(file)
for line in file:
elements.append([float(x) for x in line.split()])
next(line) reads the first line. Then for line in file: iterates over all other lines. The list comprehension [float(x) for x in line.split()] goes through all entries in the line split by whitespace. Finally, elements.append() appends this list to elements, which becomes a list of lists that you can call an 2D array.
Access the first entry in the first line:
>>> elements[0][0]
-3.229739
or the last entry in the last line:
>>> elements[3][13]
21.319606
alternatively:
>>> elements[-1][-1]
21.319606
Update
This reads the file into a list of lists without taking line breaks as special:
nb = 2
nk = 160
with open("datafile") as fobj:
all_values = iter(x for x in fobj.read().split())
next(all_values)
elements = []
for x in range(nb):
elements.append([float(next(all_values)) for counter in range(nk)])
If you like nested list comprehensions:
with open("datafile") as fobj:
all_values = iter(x for x in fobj.read().split())
next(all_values)
elements = [[float(next(all_values)) for counter in range(nk)] for x in range(nb)]

Free up numbers from string

I have a very annoying output format from a program for my x,y,r values, namely:
circle(201.5508,387.68505,2.298685) # text={1}
circle(226.21442,367.48613,1.457215) # text={2}
circle(269.8067,347.73605,1.303065) # text={3}
circle(343.29599,287.43024,6.5938) # text={4}
is there a way to get the 3 numbers out into an array without doing manual labor?
So I want the above input to become
201.5508,387.68505,2.298685
226.21442,367.48613,1.457215
269.8067,347.73605,1.303065
343.29599,287.43024,6.5938
if you mean that the circle(...) construct is the output you want to parse. Try something like this:
import re
a = """circle(201.5508,387.68505,2.298685) # text={1}
circle(226.21442,367.48613,1.457215) # text={2}
circle(269.8067,347.73605,1.303065) # text={3}
circle(343.29599,287.43024,6.5938) # text={4}"""
for line in a.split("\n"):
print [float(x) for x in re.findall(r"\d+(?:\.\d+)?", line)]
Otherwise, you might mean that you want to call circle with numbers taken from an array containing 3 numbers, which you can do as:
arr = [343.29599,287.43024,6.5938]
circle(*arr)
A bit unorthodox, but as the format of your file is valid Python code and there are probably no security risks regarding untrusted code, why not just simply define a circle function which puts all the circles into a list and execute the file like:
circles = []
def circle(x, y, r):
circles.append((x, y, r))
execfile('circles.txt')
circles is now list containing triplets of x, y and r:
[(201.5508, 387.68505, 2.298685),
(226.21442, 367.48613, 1.457215),
(269.8067, 347.73605, 1.303065),
(343.29599, 287.43024, 6.5938)]

Categories

Resources