For debugging purposes, my program writes its Armadillo matrices to text files in raw-ASCII format, i.e. complex numbers are written as (1,1). The resulting files are larger than 3 GByte.
I would like to "plot" those matrices (representing fields) such that I can look at different points within the field for debugging. What would be the best way of doing that?
When directly plotting my file with gnuplot using
plot "matrix_file.txt" matrix with image
I get the response
warning: matrix contains missing or undefined values
Warning: empty cb range [0:0], adjusting to [-1:1]
I also could use Matplotlib, iterate over each row in the file and convert the values into appropriate python values, but I assume reading the full file doing that will be rather time-consuming.
Thus, are there other reasonably fast options for plotting my matrix, or is there a way to tell gnuplot how to treat my complex numbers properly?
A part of the first line looks like
(0.0000000000000000e+00,0.0000000000000000e+00) (8.6305562282169946e-07,6.0526580514090297e-07) (1.2822974500623326e-05,1.1477679031930141e-05) (5.8656372718492336e-05,6.6626342814082442e-05) (1.6183121649896915e-04,2.3519364967920469e-04) (3.2919257507746272e-04,6.2745022681547850e-04) (5.3056616247733281e-04,1.3949688132772061e-03) (6.7714688179733437e-04,2.7240206117506108e-03) (6.0083005524875425e-04,4.8217990806492588e-03) (3.6759450038482363e-05,7.8957232784174231e-03) (-1.3887302495780910e-03,1.2126758313515496e-02) (-4.1629396217170980e-03,1.7638346107957101e-02) (-8.8831593853181175e-03,2.4463072133103888e-02) (-1.6244140097742808e-02,3.2509486873735290e-02) (-2.7017231109227786e-02,4.1531431496659221e-02) (-4.2022691198292300e-02,5.1101686500864850e-02) (-6.2097364532786636e-02,6.0590740956970250e-02) (-8.8060067117896060e-02,6.9150058884242055e-02) (-1.2067637255414780e-01,7.5697648270160053e-02) (-1.6062285417043359e-01,7.8902435158400494e-02) (-2.0844826713055306e-01,7.7163461035715558e-02) (-2.6452596415873003e-01,6.8580842184681204e-02) (-3.2898869195273894e-01,5.0918234150147214e-02) (-4.0163477687695504e-01,2.1561405580661022e-02) (-4.8179470918233597e-01,-2.2515842273449008e-02) (-5.6815035401912617e-01,-8.4759639628930100e-02) (-6.5850621484774385e-01,-1.6899215347429869e-01) (-7.4952345707877654e-01,-2.7928561041518252e-01) (-8.3644196044174313e-01,-4.1972419090890900e-01) (-9.1283160402230334e-01,-5.9403043419268908e-01) (-9.7042844114238713e-01,-8.0504703287094281e-01) (-9.9912107865273936e-01,-1.0540865412492695e+00) (-9.8715384989307420e-01,-1.3401890190155983e+00) (-9.2160320921981831e-01,-1.6593576679224276e+00) (-7.8916051033438095e-01,-2.0038702251062159e+00) (-5.7721850912406181e-01,-2.3617835609973805e+00) (-2.7521347260072193e-01,-2.7167550691449942e+00)
Ideally, I would like to be able to choose if I plot only the real part, the imaginary part or the abs()-value.
Here is a gnuplot only version.
Actually, I haven't seen (yet) a gnuplot example about how to plot complex numbers from a datafile.
Here, the idea is to split the data into columns at the characters ( and , and ) via:
set datafile separator '(,)'
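Then you can address the i-th real and imaginary parts of a row via column(3*i-1) and column(3*i), respectively. Column 1 is the empty field before the first parenthesis, and every third column after that holds the blank between ) and (.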
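To see where the indices 3*i-1 and 3*i come from, here is a small Python sketch. It only imitates the splitting with re.split for illustration (the helper column() is a made-up name that counts fields from 1, like gnuplot does); it is not how gnuplot works internally.

```python
import re

line = "(0.1,0.1) (0.2,1.2) (0.3,2.3)"

# Split at every '(', ',' or ')' character, as the separator setting describes
fields = re.split(r"[(,)]", line)

def column(n):
    """Return the n-th field, counted from 1 (hypothetical helper)."""
    return fields[n - 1]

# Field 1 is the empty text before the first '(' and every third field after
# that is the blank between ')' and '(', so entry i sits in fields 3*i-1 and 3*i.
pairs = [(float(column(3 * i - 1)), float(column(3 * i))) for i in (1, 2, 3)]
print(pairs)
```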
You are creating a new dataset via plotting the data many times in a double loop, which is ok for small data. However, my guess would be that this solution might become pretty slow for large datasets, especially if you are plotting from a file. I assume if you have your data once in a datablock (instead of a file) it might be faster. Check gnuplot: load datafile 1:1 into datablock. In general, maybe it is more efficient to use another tool, e.g. Python, awk, etc. to prepare the data.
Just a thought: if you have approx. 3e9 Bytes of data and (according to your example) approx. 48-50 Bytes per datapoint and if you want to plot it as a square graph, then the number of pixels on a side would be sqrt(3e9/50)=7746 pixels. I doubt that you have a display which can display this at once.
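If you prepare the data with Python instead, a minimal sketch could look like the following. It assumes every row consists of whitespace-separated (re,im) tokens as in the question; the function names and the row_step/col_step thinning parameters are my own invention, added because a display cannot show several thousand pixels per side anyway.

```python
import numpy as np

def parse_complex_row(line):
    """Convert one row of '(re,im)' tokens into a 1-D complex array."""
    # Replace the punctuation by blanks so that only the numbers remain
    cleaned = line.replace("(", " ").replace(")", " ").replace(",", " ")
    flat = np.array(cleaned.split(), dtype=float)
    return flat[0::2] + 1j * flat[1::2]  # even entries: real, odd entries: imaginary

def load_complex_matrix(lines, row_step=1, col_step=1):
    """Keep every row_step-th row and every col_step-th column of the input."""
    rows = []
    for index, line in enumerate(lines):
        if index % row_step == 0 and line.strip():
            rows.append(parse_complex_row(line)[::col_step])
    return np.vstack(rows)

sample = [
    "(0.1,0.1) (0.2,1.2) (0.3,2.3) (0.4,3.4)\n",
    "(1.1,0.1) (1.2,1.2) (1.3,2.3) (1.4,3.4)\n",
    "(2.1,0.1) (2.2,1.2) (2.3,2.3) (2.4,3.4)\n",
]
field = load_complex_matrix(sample, row_step=2, col_step=2)
print(field.real)     # real part
print(field.imag)     # imaginary part
print(np.abs(field))  # absolute value
```

Passing an open file object instead of the sample list reads the file line by line, so only the thinned rows stay in memory. The three arrays can then be handed to an image plot such as Matplotlib's imshow.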
Edit:
The modified version below now uses set print to a datablock and is much faster than the original version (which used a double loop of plot ... every ...). I can already see the speed improvement with my small data example. Good luck with your huge dataset ;-).
Just for reference and comparison, the old version is listed again here:
# create a new datablock with row,col,Real,Imag,Abs
# using plot ...with table (pretty slow and inefficient)
set table $Data2
set datafile separator '(,)' # now, split your data at these characters
myReal(i) = column(3*i-1)
myImag(i) = column(3*i)
myAbs(i) = sqrt(myReal(i)**2 + myImag(i)**2)
plot for [row=0:rowMax-1] for [col=1:colMax] $Data u (row):(col):(myReal(col)):(myImag(col)):(myAbs(col)) every ::row::row w table
set datafile separator whitespace # set separator back to whitespace
unset table
Code: (modified using set print)
### plotting complex numbers
reset session
$Data <<EOD
(0.1,0.1) (0.2,1.2) (0.3,2.3) (0.4,3.4) (0.5,4.5)
(1.1,0.1) (1.2,1.2) (1.3,2.3) (1.4,3.4) (1.5,4.5)
(2.1,0.1) (2.2,1.2) (2.3,2.3) (2.4,3.4) (2.5,4.5)
(3.1,0.1) (3.2,1.2) (3.3,2.3) (3.4,3.4) (3.5,4.5)
(4.1,0.1) (4.2,1.2) (4.3,2.3) (4.4,3.4) (4.5,4.5)
(5.1,0.1) (5.2,1.2) (5.3,2.3) (5.4,3.4) (5.5,4.5)
(6.1,0.1) (6.2,1.2) (6.3,2.3) (6.4,3.4) (6.5,4.5)
(7.1,0.1) (7.2,1.2) (7.3,2.3) (7.4,3.4) (7.5,4.5)
EOD
stats $Data u 0 nooutput # get number of columns and rows, separator is whitespace
colMax = STATS_columns
rowMax = STATS_records
# create a new datablock with row,col,Real,Imag,Abs
# using print to datablock
set print $Data2
myCmplx(row,col) = word($Data[row+1],col)
myReal(row,col) = (s=myCmplx(row,col),s[2:strstrt(s,',')-1])
myImag(row,col) = (s=myCmplx(row,col),s[strstrt(s,',')+1:strlen(s)-1])
myAbs(row,col) = sqrt(myReal(row,col)**2 + myImag(row,col)**2)
do for [row=0:rowMax-1] {
    do for [col=1:colMax] {
        print sprintf("%d %d %s %s %g",row-1,col,myReal(row,col),myImag(row,col),myAbs(row,col))
    }
}
set print
set key box opaque
set multiplot layout 2,2
plot $Data2 u 1:2:3 w image ti "Real part"
plot $Data2 u 1:2:4 w image ti "Imaginary part"
set origin 0.25,0
plot $Data2 u 1:2:5 w image ti "Absolute value"
unset multiplot
### end of code
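Result: (image not preserved; a multiplot with the panels "Real part", "Imaginary part" and "Absolute value")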
Maybe not what you asked for, but I think it is neat to plot directly from your code, and it is simple to change what you want to show: abs(x), real(x), ... Here is a simple snippet to plot an Armadillo matrix as an image in gnuplot (Linux):
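#include <armadillo>
#include <cstdio>   // FILE, fputs; popen/pclose are POSIX extensions
#include <string>   // std::string, std::to_string
using namespace std;
using namespace arma;
// Send a real-valued matrix to gnuplot as inline data and draw it as an image
void plot_image(const mat& x, FILE* cmd_pipe)
{
    fputs("set nokey;set yrange [*:*] reverse\n", cmd_pipe);
    fputs("plot '-' matrix with image\n", cmd_pipe);
    for(uword r=0; r<x.n_rows; r++){
        for(uword c=0; c<x.n_cols; c++){
            string str=to_string(x(r,c))+" ";
            fputs(str.c_str(), cmd_pipe);
        }
        fputs("\n", cmd_pipe);
    }
    fputs("e\n", cmd_pipe);   // end of the inline data
}
int main()
{
    FILE* gnuplot_pipe = popen("gnuplot -persist","w");
    if(!gnuplot_pipe) return 1;   // gnuplot could not be started
    mat x={{1,2,3,4,5},
           {2,2,3,4,5},
           {3,3,3,4,5},
           {4,4,4,4,5},
           {5,5,9,9,9}};
    plot_image(x,gnuplot_pipe);
    pclose(gnuplot_pipe);   // flush the pipe and close it
    return 0 ;
}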
The output is: (image not preserved; the 5x5 matrix drawn as an image by gnuplot)
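For comparison, the same inline-data idea can be sketched in Python, with the choice between real part, imaginary part and absolute value made explicit. This is only an illustration: the function name and the part argument are assumptions of mine, and the gnuplot commands are simply the ones the C++ snippet sends.

```python
import numpy as np

def gnuplot_matrix_commands(field, part="abs"):
    """Build the gnuplot input that draws one view of a complex matrix as an image."""
    views = {"real": np.real, "imag": np.imag, "abs": np.abs}
    values = views[part](np.asarray(field))
    lines = ["set nokey;set yrange [*:*] reverse", "plot '-' matrix with image"]
    # One text row per matrix row, values separated by blanks
    lines += [" ".join(f"{v:.9g}" for v in row) for row in values]
    lines.append("e")  # end of the inline data, as in the C++ version
    return "\n".join(lines) + "\n"

field = np.array([[3 + 4j, 1j], [2, 0]])
text = gnuplot_matrix_commands(field, part="abs")
print(text)
```

The returned text can be written to a gnuplot process, for example through subprocess.run(["gnuplot", "-persist"], input=text, text=True), provided gnuplot is installed.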
I have a string of ndarray. I want to convert it back to ndarray.
I tried newval = np.fromstring(val, dtype=float). But it gives ValueError: string size must be a multiple of element size
Also I tried newval = ast.literal_eval(val). This gives
File "<unknown>", line 1
[-1.45181984e-01 1.51671678e-01 1.59053639e-01 -1.02861412e-01
^
SyntaxError: invalid syntax
String of ndarray
'[-1.45181984e-01 1.51671678e-01 1.59053639e-01 -1.02861412e-01
-9.70948339e-02 -1.75551832e-01 -7.24434480e-02 1.19182713e-01
-4.54084426e-02 -9.23779532e-02 8.87222588e-02 1.05331177e-02
-1.31792471e-01 3.50326337e-02 -6.58577830e-02 1.02670217e+00
-5.29987812e-02 2.09167395e-02 -1.19845152e-01 2.30511073e-02
2.89404951e-02 4.17387672e-02 -2.08203331e-01 2.34342851e-02]'
How can I convert this back to ndarray?
To expand upon my comment:
If you're trying to parse a human-readable string representation of a NumPy array you've acquired from somewhere, you're already doing something you shouldn't.
Instead use numpy.save() and numpy.load() to persist NumPy arrays in an efficient binary format.
Maybe use .savetxt() if you need human readability at the expense of precision and processing speed... but never consider str(arr) to be something you can ever parse again.
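As a rough illustration of the difference, here is a sketch that round-trips a small array through both formats. The file names and the "%.6e" text format are arbitrary choices for the demo, not something from the question.

```python
import os
import tempfile
import numpy as np

original = np.array([-1.45181984e-01, 1.0 / 3.0, 2.0 / 7.0])

with tempfile.TemporaryDirectory() as folder:
    binary_path = os.path.join(folder, "vector.npy")
    text_path = os.path.join(folder, "vector.txt")

    np.save(binary_path, original)               # binary .npy file
    from_binary = np.load(binary_path)

    np.savetxt(text_path, original, fmt="%.6e")  # text with 7 significant digits
    from_text = np.loadtxt(text_path)

binary_exact = np.array_equal(original, from_binary)  # every bit is preserved
text_exact = np.array_equal(original, from_text)      # digits were rounded away
text_close = np.allclose(original, from_text, rtol=1e-6)
print(binary_exact, text_exact, text_close)
```

How much precision the text file keeps depends entirely on the fmt you pass to savetxt.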
However, to answer your question, if you're absolutely desperate and don't have a way to get the array into a better format...
>>> data = '''
... [-1.45181984e-01 1.51671678e-01 1.59053639e-01 -1.02861412e-01
... -9.70948339e-02 -1.75551832e-01 -7.24434480e-02 1.19182713e-01
... -4.54084426e-02 -9.23779532e-02 8.87222588e-02 1.05331177e-02
... -1.31792471e-01 3.50326337e-02 -6.58577830e-02 1.02670217e+00
... -5.29987812e-02 2.09167395e-02 -1.19845152e-01 2.30511073e-02
... 2.89404951e-02 4.17387672e-02 -2.08203331e-01 2.34342851e-02]
... '''.strip()
>>> list_of_floats = [float(x) for x in data.strip('[]').split(None)]
>>> list_of_floats
[-0.145181984, 0.151671678, 0.159053639, -0.102861412, -0.0970948339, -0.175551832, -0.072443448, 0.119182713, -0.0454084426, -0.0923779532, 0.0887222588, 0.0105331177, -0.131792471, 0.0350326337, -0.065857783, 1.02670217, -0.0529987812, 0.0209167395, -0.119845152, 0.0230511073, 0.0289404951, 0.0417387672, -0.208203331, 0.0234342851]
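Since the question asks for an ndarray rather than a list, the same idea can go straight into NumPy. A minimal sketch, assuming a one-dimensional array whose string was not shortened with "..." when it was printed (the shortened val below is just demo data):

```python
import numpy as np

val = """[-1.45181984e-01  1.51671678e-01  1.59053639e-01 -1.02861412e-01
 -9.70948339e-02 -1.75551832e-01]"""

# Drop the surrounding brackets, then split on any whitespace (newlines included)
newval = np.array(val.strip().strip("[]").split(), dtype=float)
print(newval.shape, newval.dtype)
```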
EDIT: For the case OP mentioned in the comments,
I am storing these arrays in LevelDB as key value pairs. The arrays are fasttext vectors. In levelDB vector (value) for each ngram (key) are stored. Is what you mentioned above applicable here?
Yes – you'd use BytesIO from the io module to emulate an in-memory "file" NumPy can write into, then put that buffer into LevelDB, and reverse the process (read from LevelDB into an empty BytesIO and pass it to NumPy) to read:
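import io
import numpy as np
# write: serialize the array into an in-memory buffer and store its bytes
bio = io.BytesIO()
np.save(bio, my_array)
ldb.put('my-key', bio.getvalue())
# ...
# read: wrap the stored bytes in a buffer and let NumPy decode them
bio = io.BytesIO(ldb.get('my-key'))
my_array = np.load(bio)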
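Here is a self-contained sketch of that round trip. A plain dict stands in for the LevelDB handle so the example runs without a database; fake_db, array_to_bytes and bytes_to_array are hypothetical names used only for this demo.

```python
import io
import numpy as np

def array_to_bytes(array):
    """Serialize an array to the .npy format in memory."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getvalue()

def bytes_to_array(raw):
    """Rebuild the array from bytes produced by array_to_bytes."""
    return np.load(io.BytesIO(raw))

fake_db = {}  # hypothetical stand-in for the key-value store
vector = np.linspace(-1.0, 1.0, 5, dtype=np.float32)

fake_db[b"my-ngram"] = array_to_bytes(vector)
restored = bytes_to_array(fake_db[b"my-ngram"])
print(restored, restored.dtype)
```

Shape and dtype travel with the bytes, so nothing besides the value itself has to be stored.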
I would like to process the following line (output of a Fortran program) from a file, with Python:
74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540
and obtain an array such as:
[74,0.4131493371345440e-3,-0.4592776407685850E-03,-0.1725046324754540]
My previous attempts did not work. In particular, if I do the following:
with open(filename,"r") as myfile:
    line=np.array(re.findall(r"[-+]?\d*\.*\d+",myfile.readline())).astype(float)
I get the following error:
ValueError: could not convert string to float: 'E-03'
Steps:
Get list of strings (str.split(' '))
Get rid of "\n" (del arr[-1])
Turn list of strings into numbers (Converting a string (with scientific notation) to an int in Python)
Code:
import decimal # you may also leave this out and use `float` instead of `decimal.Decimal()`
arr = "74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540 \n"
arr = arr.split(' ')
del arr[-1]
arr = [decimal.Decimal(x) for x in arr]
# do your np stuff
Result:
>>> print(arr)
[Decimal('74'), Decimal('0.0004131493371345440'), Decimal('-0.0004592776407685850'), Decimal('-0.1725046324754540')]
PS:
I don't know if you wrote the file that gives the output in the first place, but if you did, you could just think about outputting an array of float() / decimal.Decimal() from that file instead.
@ant.kr Here is a possible solution:
import numpy as np
# Initial data
a = "74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540 \n"
# Given the structure of the initial data, we can proceed as follows:
# - split the initial string at each space; this produces a **list** whose
#   last element is **\n**
# - convert each remaining list element to a floating-point number and store
#   the results in a numpy array.
line = np.array([float(i) for i in a.split(" ")[:-1]])
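Both answers rely on exactly one space between the values. If the Fortran output is padded with several spaces, a sketch like the following may be more forgiving. The padded sample line and the regular expression are my own; the pattern simply extends the one from the question with an optional exponent.

```python
import re
import numpy as np

line = "   74  0.4131493371345440E-03 -0.4592776407685850E-03  -0.1725046324754540\n"

# split() without arguments splits on any run of whitespace and drops the newline
values = np.array(line.split(), dtype=float)

# Regex variant: optional sign, digits with optional fraction, optional exponent
number = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")
values_re = np.array(number.findall(line), dtype=float)

print(values)
print(values_re)
```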
I need help formatting my matrix when I write it to a file. I am using the NumPy method tofile().
It takes 3 arguments: 1. the name of the file, 2. the separator (must be a string), 3. the format (also a string).
I don't know a lot about formatting, but I am trying to format the file so that there is a new line after every 9 characters (not including spaces). The output is a 9x9 Sudoku game, so I need it to be formatted as 9x9.
finished = M.tofile("soduku_solved.txt", " ", "")
where M is a matrix.
My first argument is the name of the file and the second is a space, but I don't know what format argument I need to make it 9x9.
I could be wrong, but I don't think that's possible with the NumPy tofile function. I think the format argument only controls how each individual item is formatted; it doesn't treat the items as a group.
You could do something like:
import numpy as np
M = np.random.randint(1, 10, (9, 9))  # random digits 1..9 (the upper bound is exclusive)
each_item_fmt = '{:>3}'
each_row_fmt = ' '.join([each_item_fmt] * 9)
fmt = '\n'.join([each_row_fmt] * 9)
as_string = fmt.format(*M.flatten())
It's not a very nice way to build up the format string and there's bound to be a better way of doing it. You'll see the final result (print(fmt)) is a big block of '{:>3}', which basically says, put a bit of data in here with a fixed width of 3 characters, right aligned.
EDIT Since you're putting it directly into a file you could write it line by line:
M = np.random.randint(1, 10, (9, 9))  # random digits 1..9 (the upper bound is exclusive)
fmt = ('{:>3} ' * 9).strip()
with open('soduku_solved.txt', 'w') as f:
    for m in M:
        f.write(fmt.format(*m) + '\n')
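Another option that might be simpler is numpy.savetxt, which writes one text line per matrix row. A minimal sketch, with a placeholder grid instead of a real solved puzzle and an in-memory buffer instead of the file:

```python
import io
import numpy as np

M = np.arange(81).reshape(9, 9) % 9 + 1  # placeholder 9x9 grid of digits 1..9

buffer = io.StringIO()  # a file name such as "soduku_solved.txt" works here too
np.savetxt(buffer, M, fmt="%d", delimiter=" ")
text = buffer.getvalue()
print(text)
```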
Basically, I have to dump a series of temperature readings into a text file. This is a space-delimited list of elements, where each row represents something (I don't know what; it just gets fed into a Fortran model, shudder). I am more or less handling it from our group's side, which means extracting those temperature readings and dumping them into a text file.
A quick example: I have a list like this (but with a lot more elements):
temperature_readings = [ [1.343, 348.222, 484844.3333], [12349.000002, -2.43333]]
In the past we just dumped this into a file. Unfortunately, there are some people who have an irritating knack for wanting to look directly at the text file, picking out certain columns and changing some things (for testing, I don't really know). But they always complain about the columns not lining up properly; they pretty much want the above list to be printed like this:
    1.343     348.222     484844.333
12349.000002   -2.433333
So those wonderful decimals line up. Is there an easy way to do this?
You can right-pad like this:
s = '%-10f' % val
To left-pad:
s = '%10f' % val
Or, in combination, pad and set the precision to 4 decimal places:
s = '%-10.4f' % val
For example:
import sys
rows = [[1.343, 348.222, 484844.3333], [12349.000002, -2.43333]]
for row in rows:
    for val in row:
        sys.stdout.write('%20f' % val)
    sys.stdout.write("\n")
            1.343000          348.222000       484844.333300
        12349.000002           -2.433330
The % string-formatting operator is still supported (it is not deprecated), but str.format is the more flexible alternative.
You can use str.format to do pretty printing in Python.
Something like this might work for you:
for row in temperature_readings:
    for temp in row:
        print("{0:10.4f}\t".format(temp), end="")
    print()
Which prints out the following (tabs shown expanded):
    1.3430        348.2220      484844.3333
12349.0000         -2.4333
You can read more about this here: http://docs.python.org/tutorial/inputoutput.html#fancier-output-formatting
If you also want to display a fixed number of decimals (which probably makes sense if the numbers are really temperature readings), something like this gives quite nice output:
for line in temperature_readings:
    for value in line:
        print('%10.2f' % value, end=' ')
    print()
Output:
      1.34     348.22  484844.33
  12349.00      -2.43
In Python 2.*,
for sublist in temperature_readings:
    for item in sublist:
        print '%15.6f' % item,
    print
emits
       1.343000      348.222000   484844.333300
   12349.000002       -2.433330
for your example. Tweak the lengths and number of decimals as you prefer, of course!
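For Python 3, the same fixed-width idea can be written with f-strings. This is a minimal sketch; format_rows and its width/decimals parameters are names I made up, and 15 and 6 are just the values from the example above.

```python
temperature_readings = [[1.343, 348.222, 484844.3333], [12349.000002, -2.43333]]

def format_rows(rows, width=15, decimals=6):
    """Return one string per row with right-aligned, fixed-precision columns."""
    return ["".join(f"{value:{width}.{decimals}f}" for value in row) for row in rows]

lines = format_rows(temperature_readings)
print("\n".join(lines))
```

Because every value gets the same width and the same number of decimals, the decimal points end up in the same text column on every line.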