I would like to process the following line (output of a Fortran program) from a file, with Python:
74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540
and obtain an array such as:
[74,0.4131493371345440e-3,-0.4592776407685850E-03,-0.1725046324754540]
My previous attempts do not work. In particular, if I do the following :
with open(filename,"r") as myfile:
    line=np.array(re.findall(r"[-+]?\d*\.*\d+",myfile.readline())).astype(float)
I have the following error :
ValueError: could not convert string to float: 'E-03'
Steps:
Get list of strings (str.split(' '))
Get rid of "\n" (del arr[-1])
Turn list of strings into numbers (Converting a string (with scientific notation) to an int in Python)
Code:
import decimal # you may also leave this out and use `float` instead of `decimal.Decimal()`
arr = "74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540 \n"
arr = arr.split(' ')
del arr[-1]
arr = [decimal.Decimal(x) for x in arr]
# do your np stuff
Result:
>>> print(arr)
[Decimal('74'), Decimal('0.0004131493371345440'), Decimal('-0.0004592776407685850'), Decimal('-0.1725046324754540')]
PS:
I don't know if you wrote the file that gives the output in the first place, but if you did, you could just think about outputting an array of float() / decimal.Decimal() from that file instead.
Here is a possible solution:
import numpy as np

# Initial data
a = "74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540 \n"
# Given the structure of the initial data, we can proceed as follows:
# - split the string at each space; this produces a list whose last
#   element is "\n"
# - drop that last element, convert each remaining element to a float,
#   and store the results in a numpy array.
line = np.array([float(i) for i in a.split(" ")[:-1]])
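A slightly more tolerant variant of the two answers above, as a minimal sketch: calling split() with no argument splits on any run of whitespace and ignores the trailing newline, so nothing has to be deleted by hand. The regular expression shown second is an assumption about the number format (optional sign, digits, optional E exponent), not something taken from the Fortran program. The names sample and number_re are illustrative.

```python
import re

sample = "74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540 \n"

# split() with no argument splits on any whitespace and drops the trailing "\n"
values = [float(tok) for tok in sample.split()]

# If the line may contain other text, match the numbers explicitly,
# including an optional exponent such as E-03 (assumed format)
number_re = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")
values_from_regex = [float(tok) for tok in number_re.findall(sample)]
```

Either list can be passed to np.array(...) to obtain the array asked for in the question.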
I'm trying to make a time series plot, and I have data points every second for about 50 seconds of time (which in my case is in UTC). Python is yelling at me about my array of data in the x axis of my plot, which is as follows:
%run "C:/Users/Jeff/Desktop/Python/STEPS_data.py"
File "C:\Users\Jeff\Desktop\Python\STEPS_data.py", line 3
x = [23:13:51,23:13:52,23:13:53,23:13:54,23:13:55,23:13:56,23:13:57,23:13:58,23:13:59,23:14:00,23:14:01,23:14:02,23:14:03,23:14:04,23:14:05,23:14:06,23:14:07,23:14:08,23:14:09,23:14:10,23:14:11,23:14:12,23:14:13,23:14:14,23:14:15,23:14:16,23:14:17,23:14:18,23:14:19,23:14:20,23:14:21,23:14:22,23:14:23,23:14:24,23:14:25,23:14:26,23:14:27,23:14:28,23:14:29,23:14:30,23:14:31,23:14:32,23:14:33,23:14:34,23:14:35,23:14:36]
^
SyntaxError: invalid syntax
There's a bunch of other info about the plot after this, but it gets hung up on this line, where it says that I have an invalid syntax error at the first colon in the array element 23:14:23, which doesn't really make sense to me. I tried making the array its own variable x1 and just saying x = x1, but that only pushed the syntax error point back by one character.
This seems like a really stupid problem but I'm stumped.
The problem is that : is not allowed everywhere, for example:
>>> a = 10:2
File "<ipython-input-12-63c21fb7e990>", line 1
a = 10:2
^
SyntaxError: invalid syntax
I think you wanted them as strings (in strings the : are allowed):
l = ['23:13:51', '23:13:52', '23:13:53', '23:13:54', '23:13:55', '23:13:56',
'23:13:57', '23:13:58', '23:13:59', '23:14:00', '23:14:01', '23:14:02', '23:14:03',
'23:14:04', '23:14:05', '23:14:06', '23:14:07', '23:14:08', '23:14:09', '23:14:10',
'23:14:11', '23:14:12', '23:14:13', '23:14:14', '23:14:15', '23:14:16', '23:14:17',
'23:14:18', '23:14:19', '23:14:20', '23:14:21', '23:14:22', '23:14:23', '23:14:24',
'23:14:25', '23:14:26', '23:14:27', '23:14:28', '23:14:29', '23:14:30', '23:14:31',
'23:14:32', '23:14:33', '23:14:34', '23:14:35', '23:14:36']
In case you don't want to add all these '' manually just wrap the whole thing as a string and split it:
>>> l = "[23:13:51,23:13:52,23:13:53,23:13:54,23:13:55,23:13:56,23:13:57,23:13:58,23:13:59,23:14:00,23:14:01,23:14:02,23:14:03,23:14:04,23:14:05,23:14:06,23:14:07,23:14:08,23:14:09,23:14:10,23:14:11,23:14:12,23:14:13,23:14:14,23:14:15,23:14:16,23:14:17,23:14:18,23:14:19,23:14:20,23:14:21,23:14:22,23:14:23,23:14:24,23:14:25,23:14:26,23:14:27,23:14:28,23:14:29,23:14:30,23:14:31,23:14:32,23:14:33,23:14:34,23:14:35,23:14:36]"
>>> l[1:-1].split(',')
or did you want them as datetimes?
>>> import datetime
>>> [datetime.datetime.strptime(t, '%H:%M:%S') for t in l[1:-1].split(',')]
or times?
>>> [datetime.datetime.strptime(t, '%H:%M:%S').time() for t in l[1:-1].split(',')]
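Since the points are one second apart, the timestamps could also be generated rather than typed out. This is a minimal sketch, assuming the series runs from 23:13:51 to 23:14:36 as in the question. The date is a placeholder, because only the time of day is given.

```python
from datetime import datetime, timedelta

# Placeholder date; only the time of day comes from the question
start = datetime(2000, 1, 1, 23, 13, 51)
end = datetime(2000, 1, 1, 23, 14, 36)

# One datetime per second, both endpoints included
n_points = int((end - start).total_seconds()) + 1
times = [start + timedelta(seconds=i) for i in range(n_points)]

# String labels in the same HH:MM:SS form as the original list
labels = [t.strftime("%H:%M:%S") for t in times]
```

The times list holds datetime objects and labels holds the matching strings, so either can serve as the x values depending on what the plotting code expects.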
I was constructing a database for a deep learning algorithm. The points I'm interested in are these:
with open(fname, 'a+') as f:
    f.write("intens: " + str(mean_intensity_this_object) + "," + "\n")
    f.write("distances: " + str(dists_this_object) + "," + "\n")
Where mean_intensity_this_object is a list and dists_this_object is a numpy.array, something I didn't pay enough attention to to begin with. After I opened the file, I found out that the second variable, distances, looks very different to intens: The former is
distances: [430.17802963 315.2197058 380.33997833 387.46190951 41.93648858
221.5210474 488.99452579],
and the latter
intens: [0.15381262,..., 0.13638344],
The important bit is that the latter is a standard list, while the former is very hard to read: multiple lines without delimiters and unclear rules for starting a new line. Essentially as a result I had to rerun the whole tracking algorithm and change str(dists_this_object) to str(dists_this_object.tolist()) which increased the file size.
So, my question is: why does this happen? Is it possible to save np.array objects in a more readable format, like lists?
In an interactive Python session:
>>> import numpy as np
>>> x = np.arange(10)/.33 # make an array of floats
>>> x
array([ 0. , 3.03030303, 6.06060606, 9.09090909,
12.12121212, 15.15151515, 18.18181818, 21.21212121,
24.24242424, 27.27272727])
>>> print(x)
[ 0. 3.03030303 6.06060606 9.09090909 12.12121212
15.15151515 18.18181818 21.21212121 24.24242424 27.27272727]
>>> print(x.tolist())
[0.0, 3.0303030303030303, 6.0606060606060606, 9.09090909090909, 12.121212121212121, 15.15151515151515, 18.18181818181818, 21.21212121212121, 24.242424242424242, 27.27272727272727]
A list is displayed with [] and comma separators. An array is displayed without commas. If an array has more than 1000 items (the default print threshold), its display is abbreviated with an ellipsis:
>>> print(x)
[ 0. 3.03030303 6.06060606 ..., 3024.24242424
3027.27272727 3030.3030303 ]
while the list display continues to show every value.
In this line, did you add the ..., or is that part of the print?
intens: [0.15381262,..., 0.13638344],
Or doing the same with a file write:
In [299]: with open('test.txt', 'w') as f:
     ...:     f.write('array:'+str(x)+'\n')
     ...:     f.write('list:'+str(x.tolist())+'\n')
In [300]: cat test.txt
array:[ 0. 3.33333333 6.66666667 10. 13.33333333
16.66666667 20. 23.33333333 26.66666667 30. ]
list:[0.0, 3.3333333333333335, 6.666666666666667, 10.0, 13.333333333333334, 16.666666666666668, 20.0, 23.333333333333336, 26.666666666666668, 30.0]
np.savetxt gives more control over the formatting of an array, for example:
In [312]: np.savetxt('test.txt',[x], fmt='%10.6f',delimiter=',')
In [313]: cat test.txt
0.000000, 3.333333, 6.666667, 10.000000, 13.333333, 16.666667, 20.000000, 23.333333, 26.666667, 30.000000
The default array print is aimed mainly at interactive work, where you want to see enough of the values to see whether they are right or not, but you don't intend to reload them. The savetxt/loadtxt pair are better for that.
savetxt does, roughly:
for row in x:
    f.write(fmt%tuple(row))
where fmt is constructed from your input parameter and the number of items in the row, e.g. ', '.join(['%10.6f']*10)+'\n'
In [320]: print('[%s]'%', '.join(['%10.6f']*10)%tuple(x))
[ 0.000000, 3.333333, 6.666667, 10.000000, 13.333333, 16.666667, 20.000000, 23.333333, 26.666667, 30.000000]
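To illustrate the savetxt/loadtxt pairing mentioned above, here is a small round-trip sketch. The temporary path is a placeholder, and the array is rebuilt with a divisor of 0.3, which is an assumption made to match the values shown in the file output.

```python
import os
import tempfile
import numpy as np

x = np.arange(10) / 0.3  # matches the values shown in the file output above

# Write to a temporary file (placeholder path) as one comma-separated row
path = os.path.join(tempfile.mkdtemp(), "test.txt")
np.savetxt(path, [x], fmt="%10.6f", delimiter=",")

# Read the values back; a single row comes back as a 1-D array
loaded = np.loadtxt(path, delimiter=",")
```

The reloaded values agree with the originals only up to the six decimals kept by the format string, so a format with more digits is needed if full precision matters.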
Actually Python converts both in the same way: str(obj) calls the object's __str__() method, which falls back to __repr__() if the class does not define its own. From that point it is the responsibility of the object to provide its string representation.
Python lists and numpy arrays are different objects, designed and implemented by different people to serve different needs so it is to be expected that their __str__ and __repr__ methods do not behave the same.
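If the goal is a single-line, comma-separated text form of an array, two options are sketched below under the assumption that the array is one-dimensional and small enough to write in full. The sample values are the ones from the question; the names as_text, as_json and restored are illustrative.

```python
import json
import numpy as np

dists = np.array([430.17802963, 315.2197058, 380.33997833, 387.46190951,
                  41.93648858, 221.5210474, 488.99452579])

# Option 1: keep NumPy's formatting but ask for comma separators.
# A large max_line_width avoids the automatic line wrapping, and
# threshold=dists.size prevents the "..." abbreviation of long arrays.
as_text = np.array2string(dists, separator=", ",
                          max_line_width=10_000, threshold=dists.size)

# Option 2: convert to a plain list and serialise it as JSON,
# which can be parsed back without any custom code
as_json = json.dumps(dists.tolist())
restored = np.array(json.loads(as_json))
```

Option 2 writes every digit of each float, which is why the file grows compared with the default array display.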
I have a very annoying output format from a program for my x,y,r values, namely:
circle(201.5508,387.68505,2.298685) # text={1}
circle(226.21442,367.48613,1.457215) # text={2}
circle(269.8067,347.73605,1.303065) # text={3}
circle(343.29599,287.43024,6.5938) # text={4}
is there a way to get the 3 numbers out into an array without doing manual labor?
So I want the above input to become
201.5508,387.68505,2.298685
226.21442,367.48613,1.457215
269.8067,347.73605,1.303065
343.29599,287.43024,6.5938
If you mean that the circle(...) construct is the output you want to parse, try something like this:
import re
a = """circle(201.5508,387.68505,2.298685) # text={1}
circle(226.21442,367.48613,1.457215) # text={2}
circle(269.8067,347.73605,1.303065) # text={3}
circle(343.29599,287.43024,6.5938) # text={4}"""
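for line in a.split("\n"):
    # Only look between the parentheses so the number in text={N} is ignored
    inside = line[line.index("(") + 1:line.index(")")]
    print([float(x) for x in inside.split(",")])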
Otherwise, you might mean that you want to call circle with numbers taken from an array containing 3 numbers, which you can do as:
arr = [343.29599,287.43024,6.5938]
circle(*arr)
A bit unorthodox, but as the format of your file is valid Python code and there are probably no security risks regarding untrusted code, why not just simply define a circle function which puts all the circles into a list and execute the file like:
circles = []
def circle(x, y, r):
    circles.append((x, y, r))
# execfile() only exists in Python 2; in Python 3, read the file and exec() it
exec(open('circles.txt').read())
circles is now a list containing triplets of x, y and r:
[(201.5508, 387.68505, 2.298685),
(226.21442, 367.48613, 1.457215),
(269.8067, 347.73605, 1.303065),
(343.29599, 287.43024, 6.5938)]
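For input that should not be executed, a regular expression anchored on the circle(...) call does the same job. This is a minimal sketch, assuming every line has exactly three comma-separated numbers between the parentheses; circle_re is an illustrative name.

```python
import re

text = """circle(201.5508,387.68505,2.298685) # text={1}
circle(226.21442,367.48613,1.457215) # text={2}
circle(269.8067,347.73605,1.303065) # text={3}
circle(343.29599,287.43024,6.5938) # text={4}"""

# Capture only what sits between "circle(" and ")"
circle_re = re.compile(r"circle\(([^)]*)\)")

circles = []
for match in circle_re.finditer(text):
    x, y, r = (float(v) for v in match.group(1).split(","))
    circles.append((x, y, r))
```

Because float() does the conversion, negative values and exponents inside the parentheses would be handled as well, and the text={N} label is never touched.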
I need help formatting my matrix when I write it to a file. I am using the numpy method called tofile().
It takes 3 arguments: 1) the name of the file, 2) the separator (must be a string), 3) the format (also a string).
I don't know a lot about formatting, but I am trying to format the file so that there is a new line every 9 characters (not including spaces). The output is a 9x9 sudoku game, so I need it to be formatted 9x9.
finished = M.tofile("soduku_solved.txt", " ", "")
Where M is a matrix
My first argument is the name of the file and the second is a space, but I don't know what format argument I need to make it 9x9.
I could be wrong, but I don't think that's possible with the numpy tofile function. I think the format argument only controls how each individual item is written; it doesn't consider the items as a group.
You could do something like:
import numpy as np

M = np.random.randint(1, 10, (9, 9))  # random digits 1-9 (upper bound is exclusive)
each_item_fmt = '{:>3}'
each_row_fmt = ' '.join([each_item_fmt] * 9)
fmt = '\n'.join([each_row_fmt] * 9)
as_string = fmt.format(*M.flatten())
It's not a very nice way to build up the format string and there's bound to be a better way of doing it. You'll see the final result (print(fmt)) is a big block of '{:>3}', which basically says, put a bit of data in here with a fixed width of 3 characters, right aligned.
EDIT Since you're putting it directly into a file you could write it line by line:
M = np.random.randint(1, 10, (9, 9))  # random digits 1-9 (upper bound is exclusive)
fmt = ('{:>3} ' * 9).strip()
with open('soduku_solved.txt', 'w') as f:
    for m in M:
        f.write(fmt.format(*m) + '\n')
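Another option is np.savetxt, which writes one array row per line and so gives the 9x9 layout directly. This is a minimal sketch, assuming M is a 9x9 integer array; the grid below is a stand-in rather than a solved puzzle, and the temporary path is a placeholder.

```python
import os
import tempfile
import numpy as np

# Stand-in for the solved grid: a 9x9 integer array
M = np.arange(81).reshape(9, 9) % 9 + 1

path = os.path.join(tempfile.mkdtemp(), "soduku_solved.txt")
# One array row per line; "%d" formats each cell as an integer
np.savetxt(path, M, fmt="%d", delimiter=" ")

# Read the file back to check the layout
with open(path) as f:
    rows = [line.split() for line in f]
```

Changing fmt to something like "%3d" would give fixed-width columns similar to the '{:>3}' format used above.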