First of all I am quite a newbie on python, so please forgive me if I don't see the wood for the trees. My question is on reading a huge file of float numbers and storing them in an array for fast mathematical postprocessing.
Lets assume the file looks similar to this:
!!
-3.2297390 0.4474691 3.5690145 3.5976372 6.9002712 7.7787466 14.2159269 14.3291490
16.7660723 17.1258704 18.9469059 19.1716808 20.0700721 21.4088414
-3.2045361 0.4123081 3.5625981 3.5936954 6.8901539 7.7543415 14.2764611 14.3623976
16.7955934 17.1560337 18.9527369 19.1251184 20.0700709 21.3515145
-3.2317597 0.4494166 3.5799182 3.6005429 6.8838705 7.7661897 14.2576455 14.3295731
16.7550357 17.0986678 19.0187779 19.1687722 20.0288587 21.3818250
-3.1921346 0.3949598 3.5636878 3.5892085 6.8833690 7.7404542 14.3061281 14.3855389
16.8063645 17.1697110 18.9549920 19.1134580 20.0613223 21.3196066
here there are 4 (nb) blocks of 14 (nk) float numbers each. I want them to be arranged in an array elements[nb][nk] so that I can access easily looping over certain floats of the blocks.
Here is what I thought it should look like, but it doesn't work at all:
nb=4
nk=14
with open("datafile") as file:
elements = []
n = 0
while '!!' not in file:
while n <= (nb-1):
elements.append([])
current = map(float,file.read().split()) # here I would need something to assure only 14 (nk) floats are read in
elements[n].append(current)
n += 1
print(elements[0][1])
It would be great if had some ideas and suggestions. Thanks!
EDIT:
here an datafile where the numbers follow after each other with no clear seperator after a block nb. Here it is nb=2 and nk=160. How to split the read in floats after each 160th number?
!!
-7.2578105433 -7.2578105433 -6.7774609392 -6.7774609392 -6.3343986693 -6.3343986693 -5.8537216826 -5.8537216826
-5.6031029888 -5.6031029888 -2.9103190893 -2.9103190893 -1.7962279174 -1.7962279174 -0.8136720023 -0.8136720023
-0.1418500769 -0.1418500769 2.9923464558 2.9923464558 3.5797768050 3.5797768050 3.8793240270 3.8793240270
4.0774192689 4.0774192689 4.2378755781 4.2378755781 4.2707165126 4.2707165126 4.3290523910 4.3290523910
4.4487102661 4.4487102661 4.5341883539 4.5341883539 4.7946098470 4.7946098470 4.9518205998 4.9518205998
4.9592549825 4.9592549825 5.1648268937 5.1648268937 5.2372127454 5.2372127454 5.9377062691 5.9377062691
6.2971992823 6.2971992823 6.6324702419 6.6324702419 6.7948808733 6.7948808733 7.0835270703 7.0835270703
7.6252686579 7.6252686579 7.7886279100 7.7886279100 7.8514022664 7.8514022664 7.9188180854 7.9188180854
7.9661386138 7.9661386138 8.2830991934 8.2830991934 8.4581462733 8.4581462733 8.5537201519 8.5537201519
10.2738010533 10.2738010533 11.4495306517 11.4495306517 11.4819579346 11.4819579346 11.5788238984 11.5788238984
11.9411469341 11.9411469341 12.5006172267 12.5006172267 12.5055546075 12.5055546075 12.6659410418 12.6659410418
12.8741094000 12.8741094000 12.9560279595 12.9560279595 12.9780521671 12.9780521671 13.2195973082 13.2195973082
13.2339969658 13.2339969658 13.3594047155 13.3594047155 13.4530024795 13.4530024795 13.4556342387 13.4556342387
13.5784994631 13.5784994631 14.6887369915 14.6887369915 14.9019726334 14.9019726334 15.1279383300 15.1279383300
15.1953349879 15.1953349879 15.3209538297 15.3209538297 15.4042612992 15.4042612992 15.4528348692 15.4528348692
15.4542742538 15.4542742538 15.5291462589 15.5291462589 15.5415591416 15.5415591416 16.0741610117 16.0741610117
16.1117432607 16.1117432607 16.3566675522 16.3566675522 17.7569123657 17.7569123657 18.4416346230 18.4416346230
18.9525843134 18.9525843134 19.0591624486 19.0591624486 19.1069867477 19.1069867477 19.1853525353 19.1853525353
19.4020021909 19.4020021909 19.4718240723 19.4718240723 19.6384650104 19.6384650104 19.6919638323 19.6919638323
19.7044699790 19.7044699790 19.8851141335 19.8851141335 20.6132283388 20.6132283388 21.4074471478 21.4074471478
-7.2568288331 -7.2568280628 -6.7765483088 -6.7765429702 -6.3336003082 -6.3334841531 -5.8529872639 -5.8528369047
-5.6024822566 -5.6024743589 -2.9101060346 -2.9100930470 -1.7964872791 -1.7959333994 -0.8153333579 -0.8144924713
-0.1440078470 -0.1421444935 2.9869228390 2.9935342026 3.5661875018 3.5733148387 3.8777649741 3.8828300867
4.0569348321 4.0745074351 4.2152251981 4.2276050415 4.2620483420 4.2649182323 4.3401804124 4.3402590222
4.4446178512 4.4509411587 4.5139270348 4.5526439516 4.7788285567 4.7810706248 4.9282976775 4.9397807768
4.9737752749 4.9900180286 5.1456209436 5.1507667583 5.2528363215 5.2835144984 5.9252188817 5.9670441193
6.2699491148 6.3270140700 6.5912060019 6.6576016532 6.7976670773 6.7982056614 7.0789050974 7.1023337244
7.6182108739 7.6309688587 7.7678148773 7.7874194913 7.8544608005 7.8594983757 7.9019395451 7.9100447766
7.9872550937 7.9902791771 8.2617740182 8.3147140843 8.4533756827 8.4672364683 8.5556163680 8.5558640539
10.2756173692 10.2760227976 11.4344757209 11.4355375519 11.4737803653 11.4760186102 11.5914333288 11.5953932241
11.9369518613 11.9380900159 12.4973099542 12.5002401499 12.5030167542 12.5031963862 12.6629548222 12.6634150863
12.8719844312 12.8728126622 12.9541436501 12.9568445777 12.9762780998 12.9764840239 13.2074024551 13.2108294169
13.2279146175 13.2308902307 13.3780648962 13.3839050348 13.4634576072 13.4650575047 13.4701414823 13.4718238883
13.5901622459 13.5971076111 14.6735704782 14.6840793519 14.8963924604 14.8968395615 15.1163287408 15.1219631271
15.1791724308 15.1817299995 15.2628531102 15.3027136606 15.3755066968 15.3802521520 15.3969012144 15.4139294088
15.5131322524 15.5315039463 15.5465532500 15.5629105034 15.5927166831 15.5966393750 16.0841067052 16.0883417123
16.1224821534 16.1226510159 16.3646268213 16.3665839987 17.7654543366 17.7657216551 18.4305335335 18.4342292730
18.9110142692 18.9215889808 18.9821593138 18.9838270736 19.1633959849 19.1637558341 19.2040877093 19.2056062802
19.3760597529 19.3846323861 19.4323552578 19.4329488797 19.6494790293 19.6813374885 19.6943820824 19.7202356536
19.7381237231 19.7414645409 19.9056461663 19.9197428869 20.6239183178 20.6285756411 21.4127637743 21.4128909767
This should work:
elements = []
with open("datafile") as file:
next(file)
for line in file:
elements.append([float(x) for x in line.split()])
next(line) reads the first line. Then for line in file: iterates over all other lines. The list comprehension [float(x) for x in line.split()] goes through all entries in the line split by whitespace. Finally, elements.append() appends this list to elements, which becomes a list of lists that you can call an 2D array.
Access the first entry in the first line:
>>> elements[0][0]
-3.229739
or the last entry in the last line:
>>> elements[3][13]
21.319606
alternatively:
>>> elements[-1][-1]
21.319606
Update
This reads the file into a list of lists without taking line breaks as special:
nb = 2
nk = 160
with open("datafile") as fobj:
all_values = iter(x for x in fobj.read().split())
next(all_values)
elements = []
for x in range(nb):
elements.append([float(next(all_values)) for counter in range(nk)])
If you like nested list comprehensions:
with open("datafile") as fobj:
all_values = iter(x for x in fobj.read().split())
next(all_values)
elements = [[float(next(all_values)) for counter in range(nk)] for x in range(nb)]
Related
I have a program that's looking for certain values in a log file and listing them out. Essentially, one line of a 50000 line file would look like this:
Step Elapsed Temp Press Volume TotEng KinEng PotEng E_mol E_pair Pxx Pyy Pzz Pxz Pxy Pyz
0 0 298 -93.542117 448382.78 -67392.894 17986.81 -85379.704 12349.955 -97729.659 -313.09273 44.936408 -12.47003 100.97953 -215.4029 254.07517
10 10 301.05619 -14.956923 448382.78 -66191.142 18171.277 -84362.419 12474.283 -96836.702 -56.794471 103.79453 -91.870824 300.09707 -27.638439 196.2738
The bit of code that's doing the searching and appending looks like this:
line=fp.readline()
while line:
line=fp.readline()
words = line.split()
if (words[0]=="Step"):
break
numcol = len(words)
header = words
data = numpy.zeros((numcol,100000))
ln = 0
while line:
line=fp.readline()
words=line.split()
if(words[0]=="Loop"):
break
for i in range(numcol):
data[i][ln]=(float(words[i]))
ln_original = ln
ln = ln +1
Currently, I'm specifying the number of columns in my array. I can't seem to figure out how to get appending to work. Any ideas as to what I could change so that the array can be dynamic for log files of various lengths instead of specifying something like 1,000,000 lines in the array to begin with?
make a list of lists and append items to those lists. when you get to the end of the file cast the list of lists to a np.ndarray.
change
data = numpy.zeros((numcol,100000))
to
data = [[] for i in range(numcol)]
and change
data[i][ln]=(float(words[i]))
to
data[i].append(float(words[i]))
at the end of the code add
data = np.array(data)
I'm getting really confused with all the information on here using 'split' in python. Basically I want to write a code which opens a spreadsheet (with two columns in it) and the function I write will use the first column as x's and the second column as y's and then it will plot it in the x-y plane.
I thought I would use line.splitlines to cut each line in excel into (x,y) but I keep getting
'ValueError: need more than 1 value to unpack'
I don't know what this means?
Below is what I've written so far, (xdir is an initial condition for a different part of my question):
def plotMo(filename, xdir):
infile = open(filename)
data = []
for line in infile:
x,y = line.splitlines()
x = float(x)
y = float(y)
data.append([x,y])
infile.close()
return data
plt.plot(x,y)
For example with
0 0.049976
0.01 0.049902
0.02 0.04978
0.03 0.049609
0.04 0.04939
0.05 0.049123
0.06 0.048807
I would want to the first point in my plane to be (0, 0.049976) and the second plot to be (0.01, 0.049902).
x,y = line.splitlines() tries to split the current line into several lines.
Since splitlines returns only 1 element, there's an error because python cannot find a value to assign to y.
What you want is x,y = line.split() which will split the line according to 1 or more spaces (like awk would do) if no parameter is specified.
However it depends of the format: if there are blank lines you'll get the "unpack" problem at some point, so to be safe and skip malformed lines, write:
items = line.split()
if len(items)==2: x,y = items
To sum it up, a more pythonic, shorter & safer way of writing your routine would be:
def plotMo(filename):
with open(filename) as infile:
data = []
for line in infile:
items = line.split()
if len(items)==2:
data.append([float(e) for e in items])
return data
(maybe it could be condensed more, but that's good for starters)
I have a file with data like:
POTENTIAL
TYPE 1
-5.19998150116627E+07 -5.09571848744513E+07 -4.99354600752570E+07 -4.89342214499422E+07 -4.79530582388520E+07
-4.69915679183017E+07 -4.60493560354389E+07 -4.51260360464197E+07 -4.42212291578282E+07 -4.33345641712756E+07
-4.24656773311163E+07 -4.16142121752159E+07 -4.07798193887125E+07 -3.99621566607090E+07 -3.91608885438409E+07
-3.83756863166569E+07
-8.99995987594328E+07 -8.81884626368405E+07 -8.64137733336537E+07 -8.46747974037847E+07 -8.29708161608188E+07
-8.13011253809965E+07 -7.96650350121689E+07 -7.80618688886128E+07 -7.64909644515842E+07 -7.49516724754953E+07
-7.34433567996002E+07 -7.19653940650832E+07 -7.05171734574350E+07 -6.90980964540154E+07 -6.77075765766936E+07
-6.63450391494693E+07
Note as per Nsh's comment these data are not single line. They always have 5 data per line, and as per this example, 4 row, with only one data in 4th row. So, I have 16 float spread over 4 line. I always know the total number (i.e. 16 in this case)
My aim is to read them as a list (please let me know if there is better things). The row with the single entry denotes end of a list (e.g. the list[1] ends with -3.83756863166569E+07).
I tried to read it as:
if line.startswith("POTENTIAL"):
lines = f.readline()
if lines.startswith("TYPE "):
lines=f.readline()
lines=lines.split()
lines = [float(i) for i in lines]
pots.append(lines)
print(pots)
which gives result:
[[-51999815.0116627, -50957184.8744513, -49935460.075257, -48934221.4499422, -47953058.238852]]
i.e. just the first line from the list, and not going any further.
My aim is to get them as different list (possibly) as:
pots[1]=[-5.19998150116627E+07....-3.83756863166569E+07]
pots[2]=[-8.99995987594328E+07....-6.63450391494693E+07]
I have read searched google extensively (the present state itself is from another SO question), but due to my inexperience, I cant solve my problem.
Kindly help.
use + instead of append.
It will append the elements of lines to pots.
pots = pots + lines
I didn't see in the start:
pots = []
It is needed in this case...
ITEMS_PER_LIST = 16
lists = [[]] # list of lists with initialized first sublist
with open('data.txt') as f:
for line in f:
if line.startswith(("POTENTIAL", "TYPE")):
continue
if len(lists[-1]) == ITEMS_PER_LIST:
lists.append([]) # create new list
lists[-1].extend([float(i) for i in line.split()])
Additional tweaks are required to validate headers.
I have a file that is space delimited with values for x,y,x. I need to visualise the data so I guess I need so read the file into 3 separate arrays (X,Y,Z) and then plot them. How do I read the file into 3 seperate arrays I have this so far which removes the white space element at the end of every line.
def fread(f=None):
"""Reads in test and training CSVs."""
X = []
Y = []
Z = []
if (f==None):
print("No file given to read, exiting...")
sys.exit(1)
read = csv.reader(open(f,'r'),delimiter = ' ')
for line in read:
line = line[:-1]
I tried to add something like:
for x,y,z in line:
X.append(x)
Y.append(y)
Z.append(z)
But I get an error like "ValueError: too many values to unpack"
I have done lots of googling but nothing seems to address having to read in a file into a separate array every element.
I should add my data isn't sorted nicely into rows/columns it just looks like this
"107745590026 2 0.02934046648 0.01023879368 3.331810236 2 0.02727724425 0.07867902517 3.319272757 2 0.01784882881"......
Thanks!
EDIT: If your data isn't actually separated into 3-element lines (and is instead one long space-separated list of values), you could use python list slicing with stride to make this easier:
X = read[::3]
Y = read[1::3]
Z = read[2::3]
This error might be happening because some of the lines in read contain more than three space-separated values. It's unclear from your question exactly what you'd want to do in these cases. If you're using python 3, you could put the first element of a line into X, the second into Y, and all the rest of that line into Z with the following:
for x, y, *z in line:
X.append(x)
Y.append(y)
for elem in z:
Z.append(elem)
If you're not using python 3, you can perform the same basic logic in a slightly more verbose way:
for i, elem in line:
if i == 0:
X.append(elem)
elif i == 1:
Y.append(elem)
else:
Z.append(elem)
This is my first post even though I've been reading SO for a while.
I'm a Python beginner and I'd need your help.
I'm processing a very big file (more than 2 million of lines) but I'll show you a much smaller example (24 lines rather than 74513). So let's say I've got 24 lines, each one with a floating point number, after that 3 numbers on the same line, then again 24 lines, line with 3 numbers and so on for 29 times.
56.71739
56.67950
56.65762
56.63320
56.61648
56.60323
56.63215
56.74365
56.98378
57.34681
57.78903
58.27959
58.81514
59.38853
59.98271
60.58515
-1.00000
56.09566
56.05496
56.02777
56.00158
55.98341
55.96830
55.99615
1 1 1
56.34692
56.70977
57.15187
57.64234
58.17782
58.75118
59.34534
59.94779
-1.00000
55.47366
55.42963
55.39739
55.36958
55.35020
55.33404
55.36098
55.47148
55.71110
56.07384
56.51588
57.00632
57.54180
58.11517
58.70937
2 1 1
It's quite easy to create an array with the first 24 lines:
import numpy
def ttarray_tms (traveltimes):
'''It defines the 3-D array, organized as I want.'''
with open (traveltimes, 'r') as file_in:
newarray = file_in.readlines()
ttarray = np.array(newarray)
ttarray.shape = (2,3,4)
ttarray = np.swapaxes(ttarray,1,2)
ttarray = np.swapaxes(ttarray,0,2)
return ttarray
PLEASE NOTE: There's no blank line between each number. It's a simple colon-vector file. For some reason I had to post like that.
What I want is to basically get 29 arrays, so I should loop over the 24 lines and get an array, then loop again over the next 24 lines (jumping the line with 3 numbers, I don't really need them) and get another array and so on. I think my main problem is how to skip the line with the 3 numbers and start again a new loop for a new array.
Have you got any good idea?
Thanks very much!
You can use readline() to read a single line 24 times then use another readline() to skip a line and so on.
With your code:
import numpy
def mk_array(elems):
'''Makes the nparray from an array of 24 numbers'''
ttarray = np.array(elems) # perhaps [ float(a) for a in elems ] is needed
ttarray.shape = (2,3,4)
ttarray = np.swapaxes(ttarray,1,2)
ttarray = np.swapaxes(ttarray,0,2)
return ttarray
def ttarray_tms(traveltimes):
'''It defines the 3-D array, organized as I want.'''
arrays = list()
with open (traveltimes, 'r') as file_in:
ret = "." # force the loop
while ret != "":
newarray = [ file_in.readline() for i in range(24) ]
ret = file_in.realine()
if ret != "": # avoid an empty array
ttarray = mk_array(newarray)
arrays.append(ttarray)
return arrays
Not tested.
The numbers in the three set line are following an incrementing pattern. So why don't you keep track of that pattern by keeping the last two numbers in two variables and if the three correspond to the pattern drop them and continue? It is kind of a sliding window approach.