Read array of values export in blocks of formatted LaTeX? - python

I'm trying to figure out if there is a way to read a table like this:
x1 y1 z1
x2 y2 z2
.. .. ..
xn yn zn
And then have my code print a text file that looks like this:
\object{x1}
\ra{y1}
\dec{z1}
\object{x2}
\ra{y2}
\dec{z2}
\object{..}
\ra{..}
\dec{..}
\object{xn}
\ra{yn}
\dec{zn}
Thus far, I have a code that reads in these arrays just fine, but I do not know how to save them to a text file that's anything other than exactly what was read in. Is there a way I can have each of these lines printed in some customized format, like above?
I've tried
np.savetxt('data.txt',zip(x,y,z),fmt='(messing with formatting options here)')
but I've had no luck and I'm not sure if savetxt is even the right route. Thanks very much in advance for any help you can provide!

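If you have an array arr and want your output in a file, you can try:
arr = [['x1','y1','z1'],
       ['x2','y2','z2'],
       ['x3','y3','z3']]
# 'a' appends to latex.txt; use 'w' to overwrite it instead
with open('latex.txt', 'a') as myfile:
    for row in arr:
        # every backslash is doubled so that \o and \d are not read as escape sequences
        myfile.write('\\object{'+row[0]+'}\n\\ra{'+row[1]+'}\n\\dec{'+row[2]+'}\n\n')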

This should do the job:
import numpy as np
txt = np.genfromtxt('input.txt', dtype='str')
# python >= 3.6
with open('outfile.txt', 'w') as file:
    file.write("\n".join(
        [f'\\object{{{row[0]}}}\n'
         f'\\ra{{{row[1]}}}\n'
         f'\\dec{{{row[2]}}}\n'
         '\\color{red}\n'  # static line
         for row in txt]
    ))
# python < 3.6
with open('outfile.txt', 'w') as file:
    file.write("\n".join(
        ['\\object{{{row[0]}}}\n'
         '\\ra{{{row[1]}}}\n'
         '\\dec{{{row[2]}}}\n'
         '\\color{{red}}\n'.format(row=row)  # static line
         for row in txt]
    ))

You need to split the reading and writing processes. One solution uses the csv module to read delimiter-separated values from a file (in this case the delimiter is a space, not a comma).
Here data.txt is the file with the table, out.txt the file with the format you want.
import csv
with open('data.txt') as rr:
    reader = csv.reader(rr, delimiter=' ')
    with open('out.txt', 'w') as oo:
        for line in reader:
            oo.write(f"\\object{{{line[0]}}}\n")
            oo.write(f"\\ra{{{line[1]}}}\n")
            oo.write(f"\\dec{{{line[2]}}}\n\n")
Notice the triple curly braces in the formatted string literals: two braces to print a literal brace, and one to mark the variable whose value is substituted.
If your python version is < 3.6, use instead:
oo.write("\\object{{{}}}\n".format(line[0]))
oo.write("\\ra{{{}}}\n".format(line[1]))
oo.write("\\dec{{{}}}\n\n".format(line[2]))
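The brace rule described above can be checked in isolation. This is only a small sketch; the variable name is a made-up stand-in for one cell of the table:

```python
# In both f-strings and str.format, "{{" and "}}" produce literal braces,
# while a single "{...}" pair marks a replacement field.
name = "x1"  # hypothetical value standing in for one table cell

with_fstring = f"\\object{{{name}}}"         # Python >= 3.6
with_format = "\\object{{{}}}".format(name)  # works on older versions too

print(with_fstring)
print(with_format)
```

Both lines print the same LaTeX command, so the choice between them only depends on the Python version you have to support.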
EDIT after comment
To add extra lines to each block, simply write them with an extra call to oo.write inside the for loop. For example:
oo.write(f"\\object{{{line[0]}}}\n")
oo.write(f"\\ra{{{line[1]}}}\n")
oo.write(f"\\dec{{{line[2]}}}\n")
oo.write("\\color{red}\n\n")  # no need to use format here, so single curly braces are enough
The last line does not depend on line, so it will be the same on each iteration of the for loop.
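If the same conversion is needed in several places, the reading and writing steps can be wrapped in one function. The following is only a sketch: the function name table_to_latex and its extra_lines parameter are my own, not something from the answers above, and it assumes every valid row has exactly three whitespace-separated fields.

```python
def table_to_latex(text, extra_lines=()):
    """Turn whitespace-separated rows into \\object/\\ra/\\dec blocks."""
    blocks = []
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        name, ra, dec = fields
        lines = [f"\\object{{{name}}}", f"\\ra{{{ra}}}", f"\\dec{{{dec}}}"]
        lines.extend(extra_lines)  # static lines repeated in every block
        blocks.append("\n".join(lines))
    return "\n".join(blocks) + "\n"


sample = "x1 y1 z1\nx2 y2 z2\n"
print(table_to_latex(sample, extra_lines=["\\color{red}"]), end="")
```

To use it on real files, read the input with open('data.txt').read() and write the returned string to the output file.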

Related

Use of readline()?

I have a question about this program:
%%file data.csv
x1,x2,y
0.4946,5.7661,0
4.7206,5.7661,1
1.2888,5.3433,0
4.2898,5.3433,1
1.4293,4.5592,0
4.2286,4.5592,1
1.1921,5.8563,0
3.1454,5.8563,1
f = open('data.csv')
data = []
f.readline()
for line in f:
    (x1,x2,y) = line.split(',')
    x1 = float(x1)
    x2 = float(x2)
    y = int(y)
    data.append((x1,x2,y))
What is the purpose of readline here? I have seen different examples, but here it seems to delete the first line.
Python reads the file sequentially, so once a line has been read, the next read starts at the following line. The f.readline() call reads the first line, so the loop never sees it.
That's precisely the point: to skip the first line. If you notice, the file has the names of the columns as its first line (x1,x2,y), and the program wants to ignore that line.
Calling readline() before looping over the lines of the file
is equivalent to:
for line in f.readlines()[1:]:
    ...
For example, that may be used to skip a table header.
In your file, converting x1 to float would raise a ValueError on the first iteration, because x1 would hold the non-numeric string "x1". To avoid that error, readline() advances the iterator to the second line, which contains only numbers.
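A small sketch of the same idea, using io.StringIO as a stand-in for the open file so it can be run without data.csv on disk (only the first two data rows from the question are included). It also shows csv.DictReader, which consumes the header by itself:

```python
import csv
import io

# Stand-in for open('data.csv'); the contents are copied from the question.
f = io.StringIO("x1,x2,y\n0.4946,5.7661,0\n4.7206,5.7661,1\n")

header = next(f).strip().split(",")  # consumes the header, like f.readline()
data = []
for line in f:  # iteration resumes at the second line
    x1, x2, y = line.split(",")
    data.append((float(x1), float(x2), int(y)))

print(header)
print(data)

# csv.DictReader reads the header line itself and uses it as the keys.
rows = list(csv.DictReader(io.StringIO("x1,x2,y\n0.4946,5.7661,0\n")))
print(rows[0]["x1"])
```

Note that DictReader leaves every value as a string, so the float and int conversions are still needed afterwards.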

Parsing a text file with line breaks in python

I have a text file with about 20 entries. They look like this:
~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
...
etc.
I would like to take these entries and turn them into a CSV.
There is a '~' separating each entry. I'm scratching my head trying to figure out how to go thru line by line and create the CSV values for each country. Can anyone give me a clue on how to go about this?
Use the libraries luke :)
I'm assuming your data is well formatted. Most real world data isn't that way. So, here goes a solution.
>>> content.split('~')
['\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n', '\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n']
For writing the CSV, Python has standard library functions. Here entries is the list returned by content.split('~'), without any empty strings.
>>> import csv
>>> csvfile = open('foo.csv', 'w', newline='')  # on Python 2: open('foo.csv', 'wb')
>>> fieldnames = ['Country', 'Link', 'Capital']
>>> writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
>>> for entry in entries:
...     cols = entry.strip().splitlines()
...     writer.writerow({'Country': cols[0], 'Link': cols[1].split(': ')[1], 'Capital': cols[2].split(': ')[1]})
...
>>> csvfile.close()
If your data is more semi structured or badly formatted, consider using a library like PyParsing.
Edit:
Second column contains URLs, so we need to handle the splits well.
>>> cols[1]
'Link: http://imgur.com/foobar2.jpg'
>>> cols[1].split(':')[1]
' http'
>>> cols[1].split(': ')[1]
'http://imgur.com/foobar2.jpg'
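As a side note, two other standard string methods avoid the URL problem shown above without relying on the exact ': ' separator. This is just a sketch using the sample line from the question:

```python
line = "Link: http://imgur.com/foobar2.jpg"

# maxsplit=1 stops after the first colon, so the colon inside the URL is untouched.
key, value = line.split(":", 1)
value = value.strip()

# str.partition splits on the first occurrence only and never raises
# if the separator is missing (it then returns the whole string as the head).
key2, sep, value2 = line.partition(": ")

print(key, value)
print(key2, value2)
```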
The way that I would do that would be to use the open() function with the syntax:
f = open('NameOfFile.extensionType', 'r')
Where "r" is read mode, which is all that parsing needs. "a" is append mode: the file is not overwritten and new data is added at the end. "w" truncates the file. A "+" after the letter opens the file for both reading and writing. It is "a" and "w" (with or without the "+") that create the file if it does not exist, while "r" and "r+" require it to exist. Avoid "a+" for this task: in Python 3 it starts at the end of the file, so a loop over the lines would read nothing.
After that I would use a for loop like this:
data = []
tmp = []
for line in f:
    line = line.strip()  # strip() returns a new string without the trailing newline
    if line == '~':
        if tmp:  # skip the empty entry before the first '~'
            data.append(tmp)
        tmp = []
    elif line:
        tmp.append(line)
if tmp:  # the last entry has no closing '~'
    data.append(tmp)
Now you have all of the data stored in a list, but you could also reformat it as a class object using a slightly different algorithm.
I have never edited CSV files using python, but I believe you can use a loop like this to add the data:
f2 = open('CSVfileName.csv', 'w') #Can change "w" for other needs i.e "a+"
for entry in data:
    for subentry in entry:
        f2.write(str(subentry) + '\n') #Use '\n' to create a new line
From my knowledge of CSV that loop would create a single column of all of the data. At the end remember to close the files in order to save the changes:
f.close()
f2.close()
You could combine the two loops into one in order to save space, but for the sake of explanation I have not.
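Putting the pieces from both answers together, a complete run from the '~'-separated text to CSV rows could look like the sketch below. The function name parse_entries is my own invention, io.StringIO stands in for the real output file, and the code assumes every entry has the country on its first line followed by "Key: value" lines as in the question.

```python
import csv
import io

content = """~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
"""


def parse_entries(text):
    """Return one dict per '~'-separated entry (hypothetical helper)."""
    rows = []
    for chunk in text.split("~"):
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if not lines:
            continue  # the text before the first '~' is empty
        row = {"Country": lines[0]}
        for ln in lines[1:]:
            key, _, value = ln.partition(": ")
            row[key] = value
        rows.append(row)
    return rows


rows = parse_entries(content)

out = io.StringIO()  # stand-in for open('foo.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=["Country", "Link", "Capital"],
                        extrasaction="ignore")  # drop unexpected keys
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Building a dict per entry means the order of the Link and Capital lines inside an entry does not matter.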

what is a quick way to import a text file in python?

I have a plain text file with a sequence of numbers, one on each line. I need to import those values into a list. I'm currently learning python and I'm not sure of which is a fast or even "standard" way of doing this (also, I come from R so I'm used to the scan or readLines functions that makes this task a breeze).
The file looks like this (note: this isn't a csv file, commas are decimal points):
204,00
10,00
10,00
10,00
10,00
11,00
70,00
276,00
58,00
...
Since it uses commas instead of '.' for decimal points, I guess the task's a little harder, but it should be more or less the same, right?
This is my current solution, which I find quite cumbersome:
f = open("some_file", "r")
data = f.read().replace('\n', '|')
data = data[0:(len(data) - 2)].replace(',', '.')
data = data.split('|')
x = range(len(data))
for i in range(len(data)):
    x[i] = float(data[i])
Thanks in advance.
UPDATE
I didn't realize the comma was the decimal separator. If the locale is set correctly (after import locale and a locale.setlocale call for a locale that uses a decimal comma), something like this should work:
lines = [locale.atof(line.strip()) for line in open(filename)]
If not, you could do
lines = [float(line.strip().replace(',','.')) for line in open(filename)]
To read the lines as plain strings:
lines = [line.strip() for line in open(filename)]
If you want the data as numbers ...
lines = [list(map(float,line.strip().split(','))) for line in open(filename)]
edited as per first two comments below
bsoist's answer is good if locale is set correctly. If not, you can simply read the entire file in and split on the line breaks (\n), then use a list comprehension for replacements.
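with open('some_file.txt', 'r') as datafile:
    data = datafile.read()
# splitlines() plus the "if" filter avoids calling float('') on a trailing empty line
x = [float(value.replace(",", ".")) for value in data.splitlines() if value.strip()]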
For a simpler way you could just do
Read = []
with open('File.txt', 'r') as File:
    Read = File.readlines()
for A in Read:
    print(A)
The "with open()" statement opens the file and closes it automatically when the block ends. This is good practice.
Then the for loop just loops over Read and prints out the lines.
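For completeness, here is a sketch of the replace-based conversion as a reusable function. The name read_decimal_comma is made up for this example, and the sample list stands in for the lines of the real file:

```python
def read_decimal_comma(lines):
    """Convert lines such as '204,00' to floats, skipping blank lines."""
    return [float(s.strip().replace(",", ".")) for s in lines if s.strip()]


# Stand-in for the lines of the real file, including one blank line.
sample = ["204,00\n", "10,00\n", "\n", "276,00\n"]
values = read_decimal_comma(sample)
print(values)
```

Because an open file is itself an iterable of lines, the same function can be called as read_decimal_comma(f) inside a with open(...) as f block.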

parse a csv file into a text file

I am a second year EE student.
I just started learning python for my project.
I intend to parse a csv file with a format like
3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9,1040597,979330,630.1763,3972.4,1
2466023,"Montréal (Que.)",V ,F,1620693,1583590,T,F,2.3,787060,743204,365.1303,4438.7,2
5915022,"Vancouver (B.C.)",CY ,F,578041,545671,F,F,5.9,273804,253212,114.7133,5039.0,8
3519038,"Richmond Hill (Ont.)",T ,F,162704,132030,F,F,23.2,53028,51000,100.8917,1612.7,28
into a text file like the following
Toronto 2503281
Montreal 1620693
Vancouver 578041
I am extracting the 1st and 5th column and save it into a text file.
This is what i have so far.
import csv
file = open('raw.csv')
reader = csv.reader(file)
f = open('NicelyDone.text','w')
for line in reader:
    f.write("%s %s"%line[1],%line[5])
This is not working for me, I was able to extract the data from the csv file as line[1],line[5]. (I am able to print it out)
But I dont know how to write it to a .text file in the format i wanted.
Also, I have to process the first column eg, "Toronto (Ont.)" into "Toronto".
I am familiar with the function find(), I assume that i could extract Toronto out of Toronto(Ont.) using "(" as the stopping character,
but based on my research , I have no idea how to use it and ask it to return me the string(Toronto).
Here is my question:
What is the data format for line[1]?
If it is string how come f.write() does not work?
If it is not string, how do i convert it to a string?
How do i extract the word Toronto out of Toronto(Ont) into a string form using find() or other methods.
My thinking is that I could add those 2 string together like c = a+ ' ' + b, that would give me the format i wanted.
So i can use f.write() to write into a file :)
Sorry if my questions sounds too easy or stupid.
Thanks ahead
Zhen
All the data you get from csv.reader are strings.
There are a variety of ways to extract the city name, but the simplest would be to split on ( and strip away any whitespace:
>>> a = 'Toronto (Ont.)'
>>> b = a.split('(')
>>> b
Out[16]: ['Toronto ', 'Ont.)']
>>> c = b[0]
>>> c
Out[18]: 'Toronto '
>>> c.strip()
Out[19]: 'Toronto'
or in one line:
>>> print('Toronto (Ont.)'.split('(')[0].strip())
Another option would be to use a regular expression (the re module).
The specific problem in your code lies here:
f.write("%s %s"%line[1],%line[5])
Using the % syntax to format your string, you have to provide either a single value or a tuple of values (one per placeholder). In your case this should be:
f.write("%s %s" % (line[1], line[5]))
Another way to do the exact same thing, is to use the format method.
f.write('{} {}'.format(line[1], line[5]))
This is a flexible way of formatting strings, and I recommend that you read about it in the docs.
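The difference between the formatting styles can be tried on a shortened sample row. This is only a sketch; the list below is a cut-down copy of the first row in the question, and in it index 4 holds the number that appears in the desired output:

```python
line = ["3520005", "Toronto (Ont.)", "C ", "F", "2503281"]  # shortened sample row

with_percent = "%s %s" % (line[1], line[4])     # values grouped in a tuple
with_format = "{} {}".format(line[1], line[4])  # positional arguments
with_fstring = f"{line[1]} {line[4]}"           # Python >= 3.6

# Passing a single string for two placeholders fails with a TypeError.
try:
    "%s %s" % line[1]
    error_raised = False
except TypeError:
    error_raised = True

print(with_percent)
print(error_raised)
```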
Regarding your code, there are a couple of things you should consider.
Always remember to close your file handles. If you use with open(...) as fp, this is taken care of for you.
with open('myfile.txt') as ifile:
    # Do stuff
    pass
# The file is closed here
Don't use built-in names as your variable names. file is such a name in Python 2, and by using it for something else (shadowing it), you may cause problems later on in your code.
To write your data, you can use csv.writer:
with open('myfile.txt', 'w', newline='') as ofile:  # on Python 2, use mode 'wb'
    writer = csv.writer(ofile)
    writer.writerow(['my', 'data'])
From Python 2.7 (and 3.1) onwards, you can combine multiple with statements in one statement:
with open('raw.csv') as ifile, open('NicelyDone.text','w') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile)
Combining this knowledge, your script can be rewritten to something like:
import csv
with open('raw.csv') as ifile, open('NicelyDone.text', 'w', newline='') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile, delimiter=' ')
    for row in reader:
        # row[4] is the fifth column, which holds the number shown in the desired output
        city, num = row[1].split('(')[0].strip(), row[4]
        writer.writerow([city, num])
I don't recall csv that well, so I don't know if it's a string or not. What error are you getting? In any case, assuming it is a string, your line should be:
f.write("%s %s\n" % (line[1], line[5]))
In other words, you need a set of parentheses. Also, you should end your string with a newline so that each record goes on its own line.
A somewhat hackish but concise way to do this is: line[1].split("(")[0]
This will create a list that splits on the ( symbol, and then you extract the first element.
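To see the whole pipeline on the sample data, here is a sketch that uses io.StringIO instead of raw.csv so that it runs on its own. The rows are shortened copies of the ones in the question, the helper name city_and_population is made up, and index 4 is assumed to be the wanted number because that is the value shown in the desired output:

```python
import csv
import io

# Stand-in for open('raw.csv'); rows shortened from the question.
raw = io.StringIO(
    '3520005,"Toronto (Ont.)",C ,F,2503281,2481494\n'
    '2466023,"Montréal (Que.)",V ,F,1620693,1583590\n'
)


def city_and_population(reader):
    """Yield (city, population) pairs from parsed CSV rows (hypothetical helper)."""
    for row in reader:
        city = row[1].split("(")[0].strip()  # drop the "(Ont.)" part
        yield city, row[4]


out_lines = ["%s %s\n" % pair for pair in city_and_population(csv.reader(raw))]
print("".join(out_lines), end="")
```

The csv module takes care of the quoted city names, so the comma-separated fields are split correctly even if a quoted field contains a comma.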

Python: Concise / elegant way to reformat a set of text files?

I have written a python script to process a set of ASCII files within a given dir. I wonder if there is a more concise and/or "pythonesque" way to do it, without losing readability?
Python Code
import os
import fileinput
import glob
import string
indir='./'
outdir='./processed/'
for filename in glob.glob(indir+'*.asc'): # get a list of input ASCII files to be processed
    fin=open(indir+filename,'r') # input file
    fout=open(outdir+filename,'w') # out: processed file
    lines = iter(fileinput.input([indir+filename])) # iterator over all lines in the input file
    fout.write(next(lines)) # just copy the first line (the header) to output
    for line in lines:
        val=iter(string.split(line,' '))
        fout.write('{0:6.2f}'.format(float(val.next()))), # first value in the line has its own format
        for x in val: # iterate over the rest of the numbers in the line
            fout.write('{0:10.6f}'.format(float(val.next()))), # the rest of the values in the line have a different format
        fout.write('\n')
    fin.close()
    fout.close()
An example:
Input:
;;; This line is the header line
-5.0 1.090074154029272 1.0034662411357929 0.87336062116561186 0.78649408279093869 0.65599958665017222 0.4379879132749317 0.26310799350679176 0.087808018565486673
-4.9900000000000002 1.0890770415316042 1.0025480136545413 0.87256100700428996 0.78577373527626004 0.65539842673645277 0.43758616966566649 0.26286647978335914 0.087727357602906453
-4.9800000000000004 1.0880820021223023 1.0016316956763136 0.87176305623792771 0.78505488659611744 0.65479851808106115 0.43718526271594083 0.26262546925502467 0.087646864773454014
-4.9700000000000006 1.0870890372077564 1.0007172884938402 0.87096676998908273 0.78433753775986659 0.65419986152386733 0.4367851929843618 0.26238496225635727 0.087566540188423345
-4.9600000000000009 1.086098148170821 0.99980479337809591 0.87017214936140763 0.78362168975984026 0.65360245789061966 0.4363859610200459 0.26214495911617541 0.087486383957276398
Processed:
;;; This line is the header line
-5.00 1.003466 0.786494 0.437988 0.087808
-4.99 1.002548 0.785774 0.437586 0.087727
-4.98 1.001632 0.785055 0.437185 0.087647
-4.97 1.000717 0.784338 0.436785 0.087567
-4.96 0.999805 0.783622 0.436386 0.087486
Other than a few minor changes, due to how Python has changed through time, this looks fine.
You're mixing two different styles of next(); the old way was it.next() and the new is next(it). You should use the string method split() instead of going through the string module (that module is there mostly for backwards compatibility with Python 1.x). There's no need to go through the almost useless "fileinput" module, since open file handles are also iterators (that module comes from a time before Python's file handles were iterators).
Edit: As @codeape pointed out, glob() returns the full path. Your code would not have worked if indir was something other than "./". I've changed the following to use the correct listdir/os.path.join solution. I'm also more familiar with the "%" string interpolation than string formatting.
Here's how I would write this in more idiomatic modern Python:
import os
def reformat(fin, fout):
    fout.write(next(fin)) # just copy the first line (the header) to output
    for line in fin:
        fields = line.split(' ')
        # Make a format string specific to the number of fields
        fmt = '%6.2f' + ('%10.6f' * (len(fields)-1)) + '\n'
        fout.write(fmt % tuple(map(float, fields)))
basenames = os.listdir(indir) # get a list of input ASCII files to be processed
for basename in basenames:
    if not basename.endswith('.asc'):
        continue # skip other files and the output directory itself
    input_filename = os.path.join(indir, basename)
    output_filename = os.path.join(outdir, basename)
    with open(input_filename, 'r') as fin, open(output_filename, 'w') as fout:
        reformat(fin, fout)
The Zen of Python is "There should be one-- and preferably only one --obvious way to do it". It's interesting how functions which, at some point during the last 10+ years, were "obviously" the right solution no longer are. :)
fin=open(indir+filename,'r') # input file
fout=open(outdir+filename,'w') # out: processed file
#code
fin.close()
fout.close()
can be written as:
with open(indir+filename,'r') as fin, open(outdir+filename,'w') as fout:
    #code
In Python 2.6, where a single with statement cannot open several files, you can use:
with open(indir+filename,'r') as fin:
    with open(outdir+filename,'w') as fout:
        #code
And the line
lines = iter(fileinput.input([indir+filename]))
is unnecessary. You can just iterate over an open file (fin in your case).
You can also do line.split(' ') instead of string.split(line, ' ')
If you change those things, there is no need to import string and fileinput.
Edit: I didn't know you can use inline code. That's cool
In my build script, I have this code:
inFile = open(sourceFile,'r')
outFile = open(targetFile,'w')
for line in inFile:
    line = doKeywordSubstitution(line)
    outFile.write(line)
inFile.close()
outFile.close()
I don't know of a way to make this any more concise. Putting the line-changing logic in a different function looks neater to me though.
I may be missing the point of your code, but I don't understand why you have lines = iter(fileinput.input([indir+filename])).
I don't understand why you use string.split(line, ' ') instead of just line.split(' ').
Well maybe I would write the string-processing part like this:
values = line.split(' ')
values[0] = '{0:6.2f}'.format(float(values[0]))
values[1:] = ['{0:10.6f}'.format(float(v)) for v in values[1:]]
fout.write(' '.join(values) + '\n') # float() discards the line's own newline, so add one back
At least for me this looks better but this might be subjective :)
Instead of indir I would use os.curdir. Instead of "./processed" I would do: os.path.join(os.curdir, 'processed').
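One detail worth noting: the "Processed" sample in the question keeps the first value and then only every second value after it, which is what the original loop does because it calls val.next() inside for x in val. The sketch below assumes that this skipping is intended and makes it explicit with a slice; the function name reformat_line and its keep_every_second flag are my own, not part of any answer above.

```python
def reformat_line(line, keep_every_second=True):
    """Format the first number as 6.2f and the kept numbers as 10.6f."""
    values = [float(v) for v in line.split()]
    # values[2::2] picks indices 2, 4, 6, ... like the question's loop does
    rest = values[2::2] if keep_every_second else values[1:]
    fmt = "{:6.2f}" + "{:10.6f}" * len(rest)
    return fmt.format(values[0], *rest)


sample = ("-5.0 1.090074154029272 1.0034662411357929 0.87336062116561186 "
          "0.78649408279093869 0.65599958665017222 0.4379879132749317 "
          "0.26310799350679176 0.087808018565486673")
print(reformat_line(sample))
```

Passing keep_every_second=False formats every value instead, which is what the rewritten versions in the answers do.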
