Python csv delimiter doesn't work properly

I am trying to write Python code to extract the DV/DL values from the input. Here is the truncated input:
A V E R A G E S O V E R 50000 S T E P S
NSTEP = 50000 TIME(PS) = 300.000 TEMP(K) = 300.05 PRESS = -70.0
Etot = -89575.9555 EKtot = 23331.1725 EPtot = -112907.1281
BOND = 759.8213 ANGLE = 2120.6039 DIHED = 4231.4019
1-4 NB = 940.8403 1-4 EEL = 12588.1950 VDWAALS = 13690.9435
EELEC = -147238.9339 EHBOND = 0.0000 RESTRAINT = 0.0000
DV/DL = 13.0462
EKCMT = 10212.3016 VIRIAL = 10891.5181 VOLUME = 416404.8626
Density = 0.9411
Ewald error estimate: 0.6036E-04
R M S F L U C T U A T I O N S
NSTEP = 50000 TIME(PS) = 300.000 TEMP(K) = 1.49 PRESS = 129.9
Etot = 727.7890 EKtot = 115.7534 EPtot = 718.8344
BOND = 23.1328 ANGLE = 36.1180 DIHED = 19.9971
1-4 NB = 12.7636 1-4 EEL = 37.3848 VDWAALS = 145.7213
EELEC = 739.4128 EHBOND = 0.0000 RESTRAINT = 0.0000
DV/DL = 3.7510
EKCMT = 76.6138 VIRIAL = 1195.5824 VOLUME = 43181.7604
Density = 0.0891
Ewald error estimate: 0.4462E-04
Here is what the script should do. The real input contains many DV/DL values (more than the truncated sample above shows), and we only want the last two. So we read all of them into a list, keep the last two, and write those two into a CSV file. The desired output is
13.0462, 3.7510
However, the following script (Python 2.7) produces the output below instead. Could anyone explain why? Thanks.
13.0462""3.7510""
Here is the script:
import os
import csv
DVDL=[]
filename="input.out"
file=open(filename,'r')
with open("out.csv",'wb') as outfile: # define output name
    line=file.readlines()
    for a in line:
        if ' DV/DL =' in a:
            DVDL.append(line[line.index(a)].split(' ')[1]) # Extract DVDL number
    print DVDL[-2:] # We only need the last two DVDL
    yeeha="".join(str(a) for a in DVDL[-2:])
    print yeeha
    writer = csv.writer(outfile, delimiter=',',lineterminator='\n')#Output the list into a csv file called "outfile"
    writer.writerows(yeeha)

Since the commenter who proposed this approach has not had the chance to post code for it, here is how I'd suggest doing it. (Edited to allow optionally signed floating-point numbers with optional exponents, as suggested by an answer to "Python regular expression that matches floating point numbers".)
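import re
# Match "DV/DL = <number>"; the number may be signed and may carry an exponent
pat = re.compile(r"DV/DL += +([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)")
values = []
with open("input.out","r") as infile:
    for line in infile:
        m = pat.search(line)
        if m:
            values.append(m.group(1))
# Write only the last two values, separated by a comma
with open("out.csv","w") as outfile:
    outfile.write(",".join(values[-2:]))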
Having run this script:
$ cat out.csv
13.0462,3.7510
I haven't used the csv module in this case because it isn't really necessary for a simple output file like this. However, adding the following lines to the script will use csv to write the same data into out1.csv:
import csv
writer = csv.writer(open("out1.csv","w"))
writer.writerow(values[-2:])
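As for why the original script misbehaved, a likely explanation is that writerows() expects a sequence of rows, so handing it a single joined string makes every character its own row, whereas writerow() takes one list of fields. The sketch below illustrates the difference. It is written for Python 3 (the question used 2.7), and the values list is a stand-in for DVDL[-2:], not data read from the real input file.

```python
import csv
import io

values = ["13.0462", "3.7510"]  # stand-in for DVDL[-2:]

# Joining the values and passing the string to writerows() makes the
# writer iterate over the string, so each character becomes one row.
wrong = io.StringIO()
csv.writer(wrong, lineterminator="\n").writerows("".join(values))

# Passing the list itself to writerow() emits one comma-separated row.
right = io.StringIO()
csv.writer(right, lineterminator="\n").writerow(values)

print(repr(wrong.getvalue()))
print(repr(right.getvalue()))
```

The second writer is the call used in the csv variant of the answer above.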

Related

Python reading rows from csv, operating and organizing rows of numbers

I am a geographer, not a programmer. I have heard of some programming concepts, but I am very much a newbie :-)
I need to read six rows of environmental data, 1000 lines at most each time.
Each row holds two-digit numbers (0 to 99), a summer issue, only positive numbers.
Once I have read them, I need to display the numbers 0 to 99 vertically, with the number of occurrences in the reading for each of the six rows:
0 = 230.....0 = 3........0 = 230......0 = 123......0 = 223......0 = 334
1 = 67......1 = 657......1 = 627......1 = 767......1 = 467......1 = 337
2 = 762.....2 = 328......2 = 987......2 = 326......2 = 32.......2 = 123
.
.
99 = 3.....99 = 34.......99 = 1.......99 = 89......99 = 78......99 = 123
If I can get this far I will feel great. Once I learn how to do this and I can look at the data I can decide what makes sense to run next; excel, graphs, statistics, statistics in R, get the numbers into a matrix to manipulate from there, etc. First time so I am figuring this out as I go.
Any help will be much appreciated,
Adolfo
I am working in the research for the restoration of Quebrada Verde watershed in Valparaiso, Chile.
from array import array
import sys
if len(sys.argv) > 1:
    count = array('H', [0]*100)
    file = open(sys.argv[1], 'r')
    if file:
        for line in file:
            count[int(line)]+=1
        file.close()
        for a in range (100):
            print(a, count[a], sep='\t')
    else:
        print('unable to open the file')
else:
    print('usage: python', sys.argv[0], ' file')
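The script above counts a single series with one number per line. For the six side-by-side series in the question, one possible extension is to keep one counter per series. The sketch below assumes the six series are stored as comma-separated columns, which the question does not actually state, and count_columns and the sample data are hypothetical names and values used only for illustration.

```python
import csv
import io
from collections import Counter

def count_columns(lines, n_columns=6):
    """Count how often each value occurs in each of n_columns columns."""
    counters = [Counter() for _ in range(n_columns)]
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        for counter, cell in zip(counters, row):
            counter[int(cell)] += 1
    return counters

# Made-up sample standing in for the real data file.
sample = io.StringIO("0,1,2,0,1,2\n0,1,99,3,1,2\n")
counters = count_columns(sample)

# One output line per value 0..99, one tab-separated count per column.
for value in range(100):
    print(value, *(c[value] for c in counters), sep="\t")
```

A Counter returns 0 for values that never occur, so no pre-filled array of 100 zeros is needed.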

Print strings with line break Python

import csv
import datetime
with open('soundTransit1_remote_rawMeasurements_15m.txt','r') as infile, open('soundTransit1.txt','w') as outfile:
    inr = csv.reader(infile,delimiter='\t')
    #ouw = csv.writer(outfile,delimiter=' ')
    for row in inr:
        d = datetime.datetime.strptime(row[0],'%Y-%m-%d %H:%M:%S')
        s = 1
        p = int(row[5])
        nr = [format(s,'02')+format(d.year,'04')+format(d.month,'02')+format(d.day,'02')+format(d.hour,'02')+format(d.minute,'02')+format(int(p*0.2),'04')]
        outfile.writelines(nr+'/n')
Using the above script, I have read in a .txt file and reformatted it as 'nr' so it looks like this:
['012015072314000000']
['012015072313450000']
['012015072313300000']
['012015072313150000']
['012015072313000000']
['012015072312450000']
['012015072312300000']
['012015072312150000']
..etc.
I need to now print it onto my new .txt file, but Python is not allowing me to print 'nr' with line breaks after each entry, I think because the data is in strings. I get this error:
TypeError: can only concatenate list (not "str") to list
Is there another way to do this?
You are trying to combine a list with a string, which cannot work. Simply don't create a list in nr.
import csv
import datetime
with open('soundTransit1_remote_rawMeasurements_15m.txt','r') as infile, open('soundTransit1.txt','w') as outfile:
    inr = csv.reader(infile,delimiter='\t')
    #ouw = csv.writer(outfile,delimiter=' ')
    for row in inr:
        d = datetime.datetime.strptime(row[0],'%Y-%m-%d %H:%M:%S')
        s = 1
        p = int(row[5])
        nr = "{:02d}{:%Y%m%d%H%M}{:04d}\n".format(s,d,int(p*0.2))
        outfile.write(nr)
There is no need to put your string into a list; just use outfile.write() here and build a string without a list:
nr = format(s,'02') + format(d.year,'04') + format(d.month, '02') + format(d.day, '02') + format(d.hour, '02') + format(d.minute, '02') + format(int(p*0.2), '04')
outfile.write(nr + '\n')
Rather than use 7 separate format() calls, use str.format():
nr = '{:02}{:%Y%m%d%H%M}{:04}\n'.format(s, d, int(p * 0.2))
outfile.write(nr)
Note that I formatted the datetime object with one formatting operation, and I included the newline into the string format.
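To check that the single str.format() call really matches the seven chained format() calls, here is a small comparison. The timestamp is taken from the first line of the sample output in the question, and p = 0 is an assumed reading chosen to reproduce its trailing 0000.

```python
import datetime

d = datetime.datetime.strptime("2015-07-23 14:00:00", "%Y-%m-%d %H:%M:%S")
s = 1
p = 0  # assumed raw reading, not taken from the real data file

# Seven separate format() calls, as in the question.
piecewise = (format(s, '02') + format(d.year, '04') + format(d.month, '02')
             + format(d.day, '02') + format(d.hour, '02')
             + format(d.minute, '02') + format(int(p * 0.2), '04'))

# One str.format() call; the datetime is formatted by its own spec.
combined = '{:02}{:%Y%m%d%H%M}{:04}'.format(s, d, int(p * 0.2))

print(piecewise)
print(combined)
```

Both lines print 012015072314000000.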
You appear to have hard-coded the s value; you may as well put that into the format directly:
nr = '01{:%Y%m%d%H%M}{:04}\n'.format(d, int(p * 0.2))
outfile.write(nr)
Together, that updates your script to:
import csv
import datetime

with open('soundTransit1_remote_rawMeasurements_15m.txt', 'r') as infile,\
        open('soundTransit1.txt','w') as outfile:
    inr = csv.reader(infile, delimiter='\t')
    for row in inr:
        d = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
        p = int(int(row[5]) * 0.2)
        nr = '01{:%Y%m%d%H%M}{:04}\n'.format(d, p)
        outfile.write(nr)
Take into account that the csv module works better if you follow the guidelines about opening files; in Python 2 you need to open the file in binary mode ('rb'), in Python 3 you need to set the newline parameter to ''. That way the module can control newlines correctly and supports including newlines in column values.
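A minimal Python 3 sketch of that guideline follows. The file name demo.csv and the sample rows are made up for the illustration; the point is only that with newline='' a value containing a line break survives a write and read round trip.

```python
import csv
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
rows = [["id", "note"], ["1", "line one\nline two"]]

# newline='' lets the csv module handle line endings itself.
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path, "r", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)
```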

Sympy refuses to calculate values

I'm trying to implement a markdown-like language to do math with. The basic idea is to have a file where you can write down your math, then have a python-script do the calculations and spit out tex.
However, I'm facing the problem that Sympy refuses to spit out values; it only gives me back the equation. Even weirder, it DOES spit out values in an alternate test script that is essentially the same code.
This is the working code:
import sympy as sp
m = sp.symbols('m')
kg = sp.symbols('kg')
s = sp.symbols('s')
g = sp.sympify(9.80665*m/s**2)
mass = sp.sympify(0.2*kg)
acc = sp.sympify(g)
F = sp.sympify(mass*acc)
print F
Output:
1.96133*kg*m/s**2
This is the code that does not work:
import re
import sympy as sp
print 'import sympy as sp'
#read units
mymunits = 'units.mymu'
with open(mymunits) as mymu:
    mymuinput = mymu.readlines()
for lines in mymuinput:
    lines = re.sub('\s+','',lines).split()
    if lines != []:
        if lines[0][0] != '#':
            unit = lines[0].split('#')[0]
            globals()[unit] = sp.symbols(unit)
            print unit+' = sp.symbols(\''+unit+'\')'
#read constants
mymconstants = 'constants.mymc'
with open(mymconstants) as mymc:
    mymcinput = mymc.readlines()
for lines in mymcinput:
    lines = re.sub('\s+','',lines).split()
    if lines != []:
        if lines[0][0] != '#':
            constant = lines[0].split('#')[0].split(':=')
            globals()[constant[0]] = sp.sympify(constant[1])
            print constant[0]+' = sp.sympify('+constant[1]+')'
#read file
mymfile = 'test.mym'
with open(mymfile) as mym:
    myminput = mym.readlines()
#create equations by removing spaces and splitting lines
for line in myminput:
    line = line.replace(' ','').strip().split(';')
    for eqstr in line:
        if eqstr != '':
            eq = re.split(':=',eqstr)
            globals()[eq[0]] = sp.sympify(eq[1])
            print eq[0]+' = sp.sympify('+eq[1]+')'
print 'print F'
print F
It outputs this:
acc*mass
It SHOULD output a value, just like the test-script.
The same script also outputs the code that is used in the test-script. The only difference is, that in the not-working script, I try to generate the code from an input-file, which looks like that:
mass := 0.2*kg ; acc := g
F := mass*acc
as well as files for units:
#SI
m #length
kg #mass
s #time
and constants:
#constants
g:=9.80665*m/s**2 #standard gravity
The whole code is also to be found on github.
What I don't get is why the one version works, while the other doesn't. Any ideas are welcomed.
Thank you.
Based on Evert's comment, I came up with this solution:
change:
sp.sympify(eq[1])
to:
sp.sympify(eval(eq[1]))

How do I convert integers into high-resolution times in Python? Or how do I keep Python from dropping zeros?

Currently, I'm using this to calculate the time between two messages and listing the times if they are above 20 seconds.
def time_deltas(infile):
    entries = (line.split() for line in open(INFILE, "r"))
    ts = {}
    for e in entries:
        if " ".join(e[2:5]) == "OuchMsg out: [O]":
            ts[e[8]] = e[0]
        elif " ".join(e[2:5]) == "OuchMsg in: [A]":
            in_ts, ref_id = e[0], e[7]
            out_ts = ts.pop(ref_id, None)
            yield (float(out_ts),ref_id[1:-1],(float(in_ts)*10000 - float(out_ts)*10000))
            n = (float(in_ts)*10000 - float(out_ts)*10000)
            if n> 20:
                print float(out_ts),ref_id[1:-1], n
INFILE = 'C:/Users/klee/Documents/text.txt'
import csv
with open('output_file1.csv', 'w') as f:
    csv.writer(f).writerows(time_deltas(INFILE))
However, there are two major errors. First, Python drops the leading zero when the time is before 10:00, e.g. 0900. Second, it drops trailing zeros, which makes the time difference inaccurate.
It looks like:
130203.08766
when it should be:
130203.087660
You are yielding floats, so the csv writer turns those floats into strings as it pleases.
If you want your output values to be a certain format, yield a string in that format.
Perhaps something like this?
print "%04.0f" % (900) # prints 0900
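Applied to the timestamps in the question, that idea could look like the sketch below. It assumes the timestamps have the form HHMMSS followed by a fractional part and that six decimals are wanted, as in the 130203.087660 example; format_hhmmss is a hypothetical helper name, not part of the original script.

```python
def format_hhmmss(value, decimals=6):
    """Render an HHMMSS.ffffff timestamp as a fixed-width string.

    Zero padding on the left keeps times before 10:00 at six digits,
    and the fixed number of decimals keeps trailing zeros.
    """
    width = 6 + 1 + decimals  # HHMMSS + '.' + fractional digits
    return "{:0{width}.{decimals}f}".format(
        float(value), width=width, decimals=decimals)

print(format_hhmmss("130203.08766"))
print(format_hhmmss("090000.5"))
```

Yielding format_hhmmss(out_ts) instead of float(out_ts) from the generator would then hand the csv writer a ready-made string.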

numerical calculations with the items in a list

I am back again with another python query. I have been trying to do some calculations with the items present in a list. Here is the code:
import math
def Usage() :
    print "Usage :python beznew.py transcriptionsFile"
if __name__ == "__main__" :
    if len(sys.argv) != 2 :
        Usage()
    else :
        transcriptionFile = sys.argv[1]
        tFile = open(transcriptionFile, "r")
        for line in iter(tFile) :
            list = line.split()
            # changing the unit of time from 100 nano seconds to seconds
            list[0] = list[0] / 100000000
            list[1] = list[1] / 100000000
            # duration of each phoneme
            list = list[1] - list[0]
            # extracting the start time of each phoneme
            newlist = list.pop[0]
            print list
            print newlist
        close.tFile
The input file looks like the following:
000000 1200000 pau
1200000 1600000 dh
1600000 2000000 ih
2000000 3100000 k
3100000 3400000 aa
3400000 3800000 r
I am trying to change the numerical values to seconds, and also to get the difference between the first and second numbers. Python will not let me divide, and I don't understand what I am doing wrong. Thank you.
First, don't use list as a variable name. Every time you do that, a kitten dies.
Second, you should convert the strings you've extracted from your file to a number, preferably a Decimal if you value the precision. Currently you're trying to divide a string.
Third, nanoseconds are billionths of a second, not millionths.
Fourth, it's tFile.close(), not close.tFile.
Fifth, use for line in tFile:. A file object is already an iterator.
Sixth, use with open(transcriptionFile, "r") as tFile: and be done with having to close it.
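Putting the Decimal suggestion into practice might look like the sketch below. It assumes the timestamps are in units of 100 nanoseconds, as the comment in the question's code says, so there are 10,000,000 units per second. parse_line is a hypothetical helper, and the sample list reuses the first two lines of the question's input in place of the real file.

```python
from decimal import Decimal

# Assumption: timestamps are in 100 ns units, i.e. 10,000,000 per second.
UNITS_PER_SECOND = Decimal(10000000)

def parse_line(line):
    """Return (start in seconds, duration in seconds, phoneme)."""
    start, end, phoneme = line.split()
    start_s = Decimal(start) / UNITS_PER_SECOND
    # Subtract first, then divide once.
    duration_s = (Decimal(end) - Decimal(start)) / UNITS_PER_SECOND
    return start_s, duration_s, phoneme

sample = ["000000 1200000 pau", "1200000 1600000 dh"]
results = [parse_line(line) for line in sample]
for start_s, duration_s, phoneme in results:
    print(start_s, duration_s, phoneme)
```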
You can simplify your code as follows:
transcriptionFile = 'calculus.txt'
with open(transcriptionFile, "r") as tFile:
    for line in tFile :
        li = line.split()
        if li:
            new = ((int(li[1]) - int(li[0]))/10000000. , li[2])
            print li,' ',new
The condition if li: is there to skip any empty lines.
Important points:
don't name a list list, because list is the name of a built-in Python type
in Python 2, 10/100 produces 0 because dividing two integers truncates; put a dot on one operand to get the right result: 10./100 or 10/100.
do the subtraction list[1] - list[0] before dividing by 10000000; it is more precise
with open(....) as handle: is the better way to open files
Personally, I would do
transcriptionFile = 'calculus.txt'
with open(transcriptionFile, "r") as tFile:
    gen = (line.split() for line in tFile if line.strip())
    li = [((int(t2)-int(t1))/10000000.,phon) for (t1,t2,phon) in gen]
    print '\n'.join(map(str,li))
Note that I used 10000000. to divide: if 1600000 - 1200000 = 400000 is in a unit which is 100 nanoseconds, then 400000 / 10000000 is 0.04 second
Edit 1
transcriptionFile = 'calculus.txt'
with open(transcriptionFile, "r") as tFile:
    gen = (line.split() for line in tFile if line.strip())
    firstVals, lapTimes = [],[]
    for (t1,t2,phon) in gen:
        firstVals.append( (int(t1)/10000000.,phon) )
        lapTimes.append( ((int(t2)-int(t1))/10000000.,phon) )
line.split() returns a list of strings. Try list[0] = float(list[0]) / 100000000.
This converts each string to a number which supports division before you do your calculations.
You do not convert the strings to numerical values. In order to perform mathematical operations on your data, you have to convert them to either int or float objects. Note that in Python 2 an int divided by an int is truncated, so the divisor is written as a float here:
valueA = int(list[0]) / 100000000.0
valueB = int(list[1]) / 100000000.0
