I'm trying to read a specifically formatted file (namely, a Butcher tableau) in Python 3.5.
The file looks like this (tab separated):
S
a1 b11 b12 ... b1S
a2 b21 b22 ... b2S
...
aS bS1 bS2 ... bSS
0.0 c1 c2 ... cS
[tolerance]
For example (tab separated):
2
0.0 0.0 0.0
1.0 0.5 0.5
0.0 0.5 0.5
0.0001
My code looks like I'm writing C. Is there a more Pythonic approach to parsing this file? Maybe there are NumPy methods that could be used here?
# the data from the .dat file
S = 0  # method order, first char in .dat file
a = []  # S-dim left column of Butcher tableau
b = []  # S-dim matrix
c = []  # S-dim lower row
tolerance = 0  # for implicit methods

def parse_method(file_name):
    'read the file_name, process lines, produce a Method object'
    try:
        with open('methods\\' + file_name) as file:
            global S
            S = int(next(file))
            temp = []
            for line in file:
                temp.append([float(x) for x in line.replace('\n', '').split('\t')])
            for i in range(S):
                a.append(temp[i].pop(0))
                b.append(temp[i])
            global c
            c = temp[S][1:]
            global tolerance
            tolerance = temp[-1][0] if len(temp)>S+1 else 0
    except OSError as ioerror:
        print('File Error: ' + str(ioerror))
My suggestion using Numpy:
import numpy as np

def read_butcher(filename):
    with open(filename, 'rb') as fh:
        S = int(fh.readline())
        array = np.fromfile(fh, float, (S+1)**2, '\t')
        rest = fh.read().strip()
    array.shape = (S+1, S+1)
    a = array[:-1, 0]
    b = array[:-1, 1:]
    c = array[ -1, 1:]
    tolerance = float(rest) if rest else 0.0
    return a, b, c, tolerance
Although I'm not entirely sure how consistently numpy.fromfile advances the file pointer... There are no guarantees in the documentation.
Handling of file exceptions should probably be done outside of the parsing method.
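One way to sidestep the file-pointer question is to read the text lines yourself and only hand the numbers to NumPy afterwards. This is a minimal sketch, assuming the whitespace-separated layout from the question; the name read_butcher_lines is made up for illustration, and the function deliberately lets OSError propagate so the caller can handle it:

```python
import numpy as np

def read_butcher_lines(filename):
    """Parse a Butcher tableau file (hypothetical helper, not a NumPy API)."""
    with open(filename) as fh:
        # Skip blank lines so a trailing newline does not break parsing.
        rows = [line.split() for line in fh if line.strip()]
    S = int(rows[0][0])
    # Rows 1..S+1 form the (S+1) x (S+1) block: a | b on top, 0 | c below.
    table = np.array([[float(x) for x in row] for row in rows[1:S + 2]])
    a = table[:-1, 0]
    b = table[:-1, 1:]
    c = table[-1, 1:]
    # The tolerance line is optional, as in the question.
    tolerance = float(rows[S + 2][0]) if len(rows) > S + 2 else 0.0
    return S, a, b, c, tolerance
```

The caller can then wrap the call in try/except OSError, which keeps the parsing function free of printing and error policy.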
Code -
from collections import namedtuple

def parse_file(file_name):
    # open the file that was passed in rather than a hard-coded name
    with open(file_name, 'r') as f:
        file_content = f.readlines()
    file_content = [line.strip('\n') for line in file_content]
    s = int(file_content[0])
    a = [float(file_content[i].split()[0]) for i in range(1, s + 1)]
    b = [list(map(float, file_content[i].split()[1:]))
         for i in range(1, s + 1)]
    # assumes the tolerance line is present, so c is the second-to-last line
    c = list(map(float, file_content[-2].split()))
    tolerance = float(file_content[-1])
    ButcherTableau = namedtuple('ButcherTableau', 's a b c tolerance')
    bt = ButcherTableau(s, a, b, c, tolerance)
    return bt

p = parse_file('a.txt')
print('S :', p.s)
print('a :', p.a)
print('b :', p.b)
print('c :', p.c)
print('tolerance :', p.tolerance)
Output -
S : 2
a : [0.0, 1.0]
b : [[0.0, 0.0], [0.5, 0.5]]
c : [0.0, 0.5, 0.5]
tolerance : 0.0001
Here's a bunch of suggestions you should consider:
from collections import namedtuple
import csv

# for convenience, create a namedtuple type to hold the result
ButcherTableau = namedtuple('ButcherTableau', 'a b c order tolerance')

def parse_method(file_name):
    a, b, c, tolerance = [], [], [], 0
    idx = None
    # advice ①: do not assume the file path in a function; make assumptions as close to your main function as possible (to make it easier to parameterize later on)
    # advice ②: do not call your file object "file", so you're not shadowing the built-in "file" type (a built-in in Python 2)
    with open(file_name, 'r', newline='') as f:
        # read the first line alone to set up your "method order" value before reading all the tab-separated values
        order = int(f.readline())
        # create a csv reader with tabs as the cell separator
        # and enumerate it to have an index for each line
        for idx, line in enumerate(csv.reader(f, delimiter='\t')):
            row = [float(x) for x in line]
            # instead of iterating again, you can just check the index
            # and build your a and b values
            if idx < order:
                a.append(row.pop(0))
                b.append(row)
            elif idx == order:
                # the row after the matrix holds c (after its leading 0.0)
                c = row[1:]
            elif row:
                # an optional extra row holds the tolerance
                tolerance = row[0]
    # if idx is still None (as set before the for), we did not iterate, so make it an error
    if idx is None:
        raise Exception("File is empty. Could not parse {}".format(file_name))
    # avoid the globals; return the namedtuple instead and use the results in the caller function
    return ButcherTableau(a, b, c, order, tolerance)
This code is untested (it is just a rework of your code as I read it), so it might not work as is, but you might want to take the good ideas and make them your own.
I'm reading a text file with several hundred lines of data in Python. The text file contains data written as a tuple assignment. For example, the data looks exactly like this in the text file:
d1: p,h,t,m= 74.15 18 6 0.1 ign: 0.0003
d2: p,h,t,m= 54. 378 -0.14 0.1 ign: 0.0009
How can I separate the data as such:
p = 20
t = 15
etc.
Then, how can I perform calculations on the tuple assignment? For example calculate:
p*p = 20*15?
I am not sure if I should convert the tuple assignment to an array; I tried, but was not successful. In addition, I do not know how to get rid of the d1: and d2: prefixes, which are there to identify which data set I am looking at.
I have read the data and picked out the lines that contain it (ignoring the "First Set" line and the "Data Given as" line).
The results that I need would be:
p (from first set of data d1)*p(from first set of data d2) = 20*15 = 300
p (from second set of data d1)*p(from second set of data d2) = 12*5 = 60
I believe I would need to do this over some kind of loop so that I can separate the data in all the lines in the file.
I would appreciate any help on this! I couldn't find anything pertaining to my question. I would only find how to deal with tuples in the simplest manner but nothing on how to extract variables and performing calculations on a tuple assignment contained in a text file.
EDIT:
After looking at the answer to this question given by @JArunMani, I went back to see if I could understand each line of code. I understand that we need to create a dictionary that fills in the respective values for p, q, etc.
When I try to rewrite the code to how I understand it, I have:
with open("d.txt") as fp: # Opens the file
# The database kinda thing here
line = fp.readline() # Read the file's first line
number, _,cont = line.partition(":")#separates m1 from p, m, h, n =..."
print(cont)
data, _,ignore = cont.partition("int") #separates int from p, m, h, n =..."
print(data) #prints tuple assignment needed
keys, _,values = data.partition("=")
print(keys) #prints p, m, h, n
print(values) #prints values (all numbers after =)
thisdict = {} #creating an empty dictionary to fill with keys and values
thisdict[keys] = values
print(thisdict)
if "m" in thisdict:
print("Yes")
print(thisdict) gives me the Output: {' p,m,h,n': ' 76 6818 2.2 1 '}
However, if "m" in thisdict: did not print anything. I do not understand why m is not in the dictionary, yet print(thisdict) shows that thisdict = {} has been filled. Also, is it necessary to add the for loop in the answer given below?
Thank you.
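The membership test fails because the dictionary holds a single key, the whole string ' p,m,h,n', and the in operator compares keys for exact equality. A small sketch of the difference, using the strings from the printed output above (the variable names are only for illustration):

```python
keys = " p,m,h,n"
values = " 76 6818 2.2 1 "

# One entry: the entire key string maps to the entire value string.
single = {keys: values}
print("m" in single)      # the only key is ' p,m,h,n', so this is False

# Split both sides first, then pair the pieces up one by one.
names = [k.strip() for k in keys.split(",")]
numbers = [float(v) for v in values.split()]
separate = dict(zip(names, numbers))
print("m" in separate)    # now 'm' is a key of its own
print(separate["m"])
```

Some form of loop (or zip, as here) is what pairs each variable name with its own value, so that step cannot simply be dropped.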
EDIT 2
I am now making my second attempt at this problem. I am combining both answers to write the code, using what I understand from each:
def DataExtract(self):
with open("muonsdata.txt") as fp: # Opens the file
line = fp.readline() # Read the file's first line
number, _,cont = line.partition(":")#separates m1 from pt, eta, phi, m =..."
print(cont)
data, _,ignore = cont.partition("dptinv") #separates dptinv from pt, eta, phi, m =..."
print(data) #prints tuple assignment needed
keys, _,values = data.partition("=")
print(keys) #prints pt, eta, phi, m
print(values) #prints values (all numbers after =)
key = [k for k in keys.split(",")]
value = [v for v in values.strip().split(" ")]
print(key)
print(value)
thisdict = {}
data = {}
for k, v in zip(key, value): #creating an empty dictionary to fill with keys and values
thisdict[k] = v
print(thisdict)
if "m" in thisdict:
print("Yes")
x = DataExtract("C:/Users/username/Desktop/data.txt")
mul_p = x['m1']['p'] * x['d2']['p']
print(mul_p)
However, this gives me the error: Traceback (most recent call last):
File "read.py", line 29, in
mul_p = x['d1']['p'] * x['d2']['p']
TypeError: 'NoneType' object is not subscriptable
EDIT 3
I have code made from a combination of answers 1 and 2, and it runs, BUT...
why doesn't the while loop continue until the end of the file? I only get one answer, calculated from the values in the first two lines; what about the remaining lines? It also seems that the d2 data lines are not being read (or that line = fp.readline() is not doing anything), because when I try to calculate m, I get this error: Traceback (most recent call last):
File "read.py", line 37, in
m = math.cosh(float(data[" m2"]["eta"])) * float(data["m1"][" pt"])
KeyError: ' m2'
Here is my code that I have:
import math
with open("d.txt") as fp: # Opens the file
data ={} #final dictionary
line = fp.readline() # Read the file's first line
while line: #continues to end of file
name, _,cont = line.partition(":")#separates d1 from p, m, h, t =..."
#print(cont)
numbers, _,ignore = cont.partition("ign") #separates ign from p, m, h, t =..."
#print(numbers) #prints tuple assignment needed
keys, _,values = numbers.partition("=")
#print(keys) #prints p, m, h, t
#print(values) #prints values (all numbers after =)
key = [k for k in keys.split(",")]
value = [v for v in values.strip().split(" ")]
#print(key) #prints pt, eta, phi, m
#print(value)
thisdict = {}
for k, v in zip(key, value): #creating an empty dictionary to fill with keys and values
#thisdict[k] = v
#print(thisdict)
#data[name]=thisdict
line = fp.readline()#read next lines, not working I think
thisdict[k] = v
data[name]=thisdict
print(thisdict)
#if " m2" in thisdict:
#print("Yes")
#print(data)
#mul_p = float(data["d1"][" p"])*float(data["d1"]["m"])
m = math.cosh(float(data[" d2"]["m"])) * float(data["m1"][" p"])
#m1 = float(data["d1"][" p"]) * float(2)
print(m)
#print(mul_p)
If I replace the d2's with d1 the code runs fine, except it skips the last d1. I do not know what I am doing wrong. Would appreciate any input or guidance.
So the following function returns a dictionary with values of 'p', 'q' and other variables. But I leave it to you to find out how to multiply or perform operations on them ^^
def DataExtract(path):  # 'path' is the path to the data file
    data = {}  # The database kinda thing here
    with open(path) as fp:  # Opens the file and closes it again when done
        for line in fp:  # This goes on till we reach end of file (EOF)
            if not line.strip():
                continue  # Skip blank lines
            name, _, cont = line.partition(":")  # So this gives 'd1', ':', 'p,h,t,m= ...'
            cont, _, _ = cont.partition("ign")  # Drop the trailing 'ign: ...' part
            keys, _, values = cont.partition("=")  # Now we split the text into LHS and RHS
            keys = keys.split(",")  # Split the variables by ',' as separator
            values = values.split()  # The values are separated by spaces
            temp_d = {}  # Dict for variables
            for i in range(len(keys)):
                key = keys[i].strip()  # Get the item at the index and remove left-right spaces
                val = values[i].strip()  # Same
                temp_d[key] = float(val)  # Store it in dictionary but as number
            data[name.strip()] = temp_d  # Store the temp_d itself in main dict
    return data  # Return the data
I used simple methods, to make it easy for you. Now to access data, you have to do something like this:
x = DataExtract("your_file_path")
mul_p = x['d1']['p'] * x['d2']['p']
print(mul_p) # Tadaaa !
Feel free to comment...
This answer is quite similar to @JArunMani's, but it is a bit shorter and it runs successfully on the sample lines.
The idea is to return your data as a dictionary.
lines = "d1: p,h,t,m= 74.15 18 6 0.1 ign: 0.0003\nd2: p,h,t,m= 54. 378 -0.14 0.1 ign: 0.0009".split("\n") # lines=open("d.txt",'r').read().split("\n")
data = {}
for line in lines:
    l = line.split("ign")[0] # remove "ign:.."
    name_dict, vals_dict = l.split(":") #['d1',' p,h,t,m= 74.15 18 6 0.1']
    keys_str, values_str = vals_dict.split("=") #[' p,h,t,m',' 74.15 18 6 0.1']
    keys=[k for k in keys_str.strip().split(',')] #['p','h','t','m']
    values=[float(v) for v in values_str.strip().split(' ')] #[74.15, 18, 6, 0.1]
    sub_dict = {}
    for k,v in zip(keys, values):
        sub_dict[k]=v
    data[name_dict]=sub_dict
Result:
>>>data
{'d1': {'p': 74.15, 'h': 18.0, 't': 6.0, 'm': 0.1}, 'd2': {'p': 54.0, 'h': 378.0, 't': -0.14, 'm': 0.1}}
>>>data['d1']['p']*data['d2']['p']
4004.1000000000004
I have a text file that contains data. A snippet of the text file looks like this:
d1: p,h,t,m= 74.15 18 6 0.1 ign: 0.0003
d2: p,h,t,m= 54. 378 -0.14 0.1 ign: 0.0009
d1: p,h,t,m= 715 8 16 0.1 ign: 0.0003
d2: p,h,t,m= 50 78 4 0.1 ign: 0.0009
(where there is a space before d2). The text file contains several hundred lines.
What I am trying to do is extract the data from d1 and d2 like:
p = 74.15
t = 18
etc
I have done this by creating a dictionary.
Then, I want to perform a calculation on the data as such, for example,
p (from d1)* p(d2) + t(from d1)
and repeat the calculation throughout the txt file.
Here is the code I have:
import math
with open("d.txt") as fp: # Opens the file
data ={} #final dictionary
line = fp.readline() # Read the file's first line
while line: #continues to end of file
name, _,cont = line.partition(":")#separates m1 from pt, eta, phi, m =..."
#print(cont)
numbers, _,ignore = cont.partition("dptinv") #separates dptinv from pt, eta, phi, m =..."
#print(numbers) #prints tuple assignment needed
keys, _,values = numbers.partition("=")
#print(keys) #prints pt, eta, phi, m
#print(values) #prints values (all numbers after =)
key = [k for k in keys.split(",")]
value = [v for v in values.strip().split(" ")]
#print(key) #prints pt, eta, phi, m
#print(value)
thisdict = {}
for k, v in zip(key, value): #creating an empty dictionary to fill with keys and values
#thisdict[k] = v
#print(thisdict)
#data[name]=thisdict
line = fp.readline()#read next lines
thisdict[k] = v
data[name]=thisdict
print(thisdict)
#if " m2" in thisdict:
#print("Yes")
#print(data)
#mul_p = float(data["m1"][" pt"])*float(data["m1"]["eta"])
m = math.cosh(float(data[" m2"]["eta"])) * float(data["m1"][" pt"])
#m1 = float(data["m1"][" pt"]) * float(2)
print(m)
I put this code together from a combination of answers to my previous question, BUT...
One problem is that the while loop reads through the entire file except the last two lines:
d1:...
d2:...
The second problem is that it seems the d2 data lines are not being read (or that line = fp.readline() #read next lines is not doing anything), because when I try to calculate m, I get the error
Traceback (most recent call last):
File "read.py", line 37, in
m = math.cosh(float(data[" m2"]["eta"])) * float(data["m1"][" pt"])
KeyError: ' m2'
I asked about this on another forum and I am still trying to understand what is wrong with how I wrote the code, and what I need to do to fix it. Any help and guidance is much appreciated! Thank you!
You should try to reorganize your reading process
and use a more readable data structure.
As far as I can see,
the data in your text file is grouped in pairs of lines,
so my suggested process would be:
# do your init outside of the loop
# the 4 lists should end up with the same length
d1p = []
d2p = []
d1t = []
d2t = []
with open("muonsdata.txt") as fp:  # Opens the file
    d1line = fp.readline()  # Read one line, supposed to have d1
    d2line = fp.readline()  # Read the second line, supposed to have d2
    # do more split stuff
    # extract numbers and append to the associated list
for i in range(len(d1p)):
    m = d1p[i]*d2p[i] + d1t[i]
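The placeholder steps above could be filled in roughly as follows. This is only a sketch: it assumes every line follows the "dN: p,h,t,m= ... ign: ..." layout shown in the question and that lines always come in d1/d2 pairs, and parse_line and pair_products are made-up names:

```python
def parse_line(line):
    """Return (name, {variable: value}) for one 'd1: p,h,t,m= ... ign: ...' line."""
    name, _, rest = line.partition(":")
    fields, _, _ignored = rest.partition("ign")     # drop the 'ign: ...' tail
    keys, _, values = fields.partition("=")
    names = [k.strip() for k in keys.split(",")]
    numbers = [float(v) for v in values.split()]    # split() copes with repeated spaces
    return name.strip(), dict(zip(names, numbers))

def pair_products(lines):
    """Compute p(d1) * p(d2) + t(d1) for each consecutive d1/d2 pair of lines."""
    rows = [parse_line(line) for line in lines if line.strip()]
    results = []
    # Step through the parsed rows two at a time: even rows are d1, odd rows are d2.
    for (_, d1), (_, d2) in zip(rows[0::2], rows[1::2]):
        results.append(d1["p"] * d2["p"] + d1["t"])
    return results

sample = [
    "d1: p,h,t,m= 74.15 18 6 0.1 ign: 0.0003",
    " d2: p,h,t,m= 54. 378 -0.14 0.1 ign: 0.0009",
    "d1: p,h,t,m= 715 8 16 0.1 ign: 0.0003",
    " d2: p,h,t,m= 50 78 4 0.1 ign: 0.0009",
]
print(pair_products(sample))
```

Because a file object yields its lines when iterated, pair_products(fp) should work directly inside a with open(...) as fp: block. Stripping the name also removes the leading space before d2 that caused the KeyError in the question.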
I have a program that creates a 2D array in Python, but how do I save it as a CSV file? The program is:
value_a = int(input("Type in a value for a: "))
value_b = int(input("Now a value for b: "))
value_c = int(input("And a value for c: "))
d = value_a + value_b + value_c
result = [[value_a, value_b, value_c, d]] # put the initial values into the array
number_of_loops = int(input("type in the number of loops the program must execute: "))
def loops(a, b, c, n):
global result
for i in range(n):
one_loop = [] # assign an empty array for the result of one loop
temp_a = a
a = ((a + 1) * 2) # This adds 1 to a and then multiplies by 2
one_loop.append(str(a))
b = b * 2
one_loop.append(b)
c = (temp_a + b)
one_loop.append(c)
d = a + b + c
one_loop.append(d)
result.append(one_loop)
print(result)
loops(value_a, value_b, value_c, number_of_loops)
print(result)
It prints OK, but how do I save the array as a CSV file?
Use csvwriter.writerows:
import csv

# newline='' stops the csv module from writing blank rows on Windows (Python 3)
with open(filename, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(result)
If you're able to use third-party libraries and you're going to be working with 2d (or more) arrays in Python, I'd recommend you use a library like numpy or pandas. Numpy includes a method to write out arrays as csv files called savetxt. Good luck!
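A minimal sketch of the savetxt route, assuming every entry in the rows is a number (the question's code appends str(a), which would have to stay an integer for this to work). The sample rows and the file name result.csv are placeholders:

```python
import numpy as np

# Example rows standing in for the `result` list from the question.
result = [[1, 2, 3, 6], [4, 4, 5, 13], [10, 8, 12, 30]]

# fmt="%d" writes integers; delimiter="," turns the output into a CSV file.
np.savetxt("result.csv", np.array(result), fmt="%d", delimiter=",")
```

For floating-point data, a format such as fmt="%.6f" would be the corresponding choice.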
Python comes with CSV writing and reading functionality. See The Python Standard Library » 13.1csv — CSV File Reading and Writing for fuller documentation, but here is a quick example taken from that page and adapted to your problem:
import csv

# In Python 3, open in text mode with newline='' (the 'wb' mode is for Python 2)
with open('eggs.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for row in result:
        spamwriter.writerow(row)
The following is my code. Since I have not received any comments, I am adding it here.
filenames2 = ['BROWN1_L1.txt', 'BROWN1_M1.txt', 'BROWN1_N1.txt', 'BROWN1_P1.txt', 'BROWN1_R1.txt']
with open("C:/Python27/L1_R1_TRAINING.txt", 'w') as outfile:
for fname in filenames2:
with open(fname) as infile:
for line in infile:
outfile.write(line)
b = open("C:/Python27/L1_R1_TRAINING.txt", 'rU')
filenames3 =[]
for path, dirs, files in os.walk("C:/Python27/Reutertest"):
for file in files:
file = os.path.join(path, file)
filenames3.append(file)
with open("C:/Python27/REUTER.txt", 'w') as outfile:
for fname in filenames3:
with open(fname) as infile:
for line in infile:
outfile.write(line)
c = open("C:/Python27/REUTER.txt", 'rU')
def Cross_Entropy(x,y):
filecontents1 = x.read()
filecontents2 = y.read()
sentence1 = filecontents1.upper()
sentence2 = filecontents2.upper()
count_A1 = sentence1.count('A')
count_B1 = sentence1.count('B')
count_C1 = sentence1.count('C')
count_all1 = len(sentence1)
prob_A1 = count_A1 / count_all1
prob_B1 = count_B1 / count_all1
prob_C1 = count_C1 / count_all1
count_A2 = sentence2.count('A')
count_B2 = sentence2.count('B')
count_C2 = sentence2.count('C')
count_all2 = len(sentence2)
prob_A2 = count_A2 / count_all2
prob_B2 = count_B2 / count_all2
prob_C2 = count_C2 / count_all2
Cross_Entropy = -(prob_A1 * math.log(prob_A2, 2) + prob_B1 * math.log(prob_B2, 2) + prob_C1 * math.log(prob_C2, 2)
Cross_Entropy(b, c)
Now I get the error "prob_A1 = count_A1 / count_all1
ZeroDivisionError: division by zero". What's wrong with my code? Is my syntax wrong?
I'm not quite sure what is behind your failure to read your strings from the files, but your cross-entropy can be computed much more succinctly:
import math

def crossEntropy(s1,s2):
    s1 = s1.upper()
    s2 = s2.upper()
    probsOne = (s1.count(c)/float(len(s1)) for c in 'ABC')
    probsTwo = (s2.count(c)/float(len(s2)) for c in 'ABC')
    return -sum(p*math.log(q,2) for p,q in zip(probsOne,probsTwo))
For example,
>>> crossEntropy('abbcabcba','abbabaccc')
1.584962500721156
If this is what you want to compute -- you can now concentrate on assembling the strings to pass to crossEntropy. I would recommend getting rid of the read-write-read logic (unless you need the two files that you are trying to create) and just directly read the files in the two directories into two arrays, joining them into two big strings which are stripped of all white space and then passed to crossEntropy
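Reading a directory straight into one whitespace-free string might look like the sketch below. The function name read_directory_text is made up, and it assumes all files are plain text in the default encoding:

```python
import os

def read_directory_text(directory):
    """Concatenate every file under `directory` into one string without whitespace."""
    pieces = []
    for path, _dirs, files in os.walk(directory):
        for name in sorted(files):
            with open(os.path.join(path, name)) as infile:
                # split() followed by join() removes spaces, tabs and newlines
                pieces.append("".join(infile.read().split()))
    return "".join(pieces)
```

The two resulting strings can then be passed to crossEntropy directly, with no intermediate files to write and reopen.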
Another possible approach. If all you want are the counts of 'A', 'B', 'C' in the two directories -- just create two dictionaries, one for each directory, both keyed by 'A', 'B', and 'C', and iterate through the files in each directory, reading each file in turn, iterating through but not saving the resulting string, just getting the counts of those three characters, and creating a version of crossEntropy which is expecting two dictionaries.
Something like:
import math

def crossEntropy(d1,d2):
    countOne = sum(d1[c] for c in 'ABC')
    countTwo = sum(d2[c] for c in 'ABC')
    probsOne = (d1[c]/float(countOne) for c in 'ABC')
    probsTwo = (d2[c]/float(countTwo) for c in 'ABC')
    return -sum(p*math.log(q,2) for p,q in zip(probsOne,probsTwo))
For example,
>>> d1 = {'A':3,'B':5,'C':2}
>>> d2 = {'A':2,'B':5,'C':3}
>>> crossEntropy(d1,d2)
1.54397154729945
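The dictionaries themselves could be built with something like the following sketch, which counts letters file by file without keeping the text around. count_letters is a hypothetical helper, and the files are assumed to be readable as text:

```python
import os
from collections import Counter

def count_letters(directory, letters="ABC"):
    """Count how often each of `letters` occurs (case-insensitively) under `directory`."""
    counts = Counter()
    for path, _dirs, files in os.walk(directory):
        for name in files:
            with open(os.path.join(path, name)) as infile:
                for line in infile:
                    upper = line.upper()
                    for letter in letters:
                        counts[letter] += upper.count(letter)
    # Return a plain dict with every requested letter present, even if its count is 0.
    return {letter: counts[letter] for letter in letters}
```

Note that if a letter never occurs in the second directory, its probability is 0 and math.log raises ValueError, so that case needs handling before calling crossEntropy.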
I'm trying to use Python to parse data from a text file which is formatted like this:
<event>
A 0.8
B 0.4 0.3 -0.5 0.3
</event>
<event>
A 0.2
B 0.3 0.2 -0.5 0.8
C 0.1 0.3 -0.3 0.2
C -0.2 0.4 -0.1 0.9
</event>
<event>
A 0.4
B 0.4 0.3 -0.5 0.3
C 0.3 0.7 0.6 0.5
</event>
Variables A & B are always present in each event, but as you can see, the C variable can occur up to two times in one event and sometimes doesn't occur at all. There are about 10,000+ events in total.
I'd like to format all of this so I can call each piece of data individually (i.e. column 2 for variable B from event 3), as well as in groups (i.e. plotting variable A, column 0 for all the events) but the repeating C variable is tripping me up a bit. I would ideally like to have a column of data for C variable #1 and C variable #2, where the data can simply be 0 when there is only one or zero C variables in an event.
My code is far from elegant at the moment and the output format isn't quite what it needs to be, so I'd love suggestions on how to simplify and improve this.
M = 10000 # number of events
file = open('data.txt')
a_lines = open('a.txt','w')
b_lines = open('b.txt','w')
c1_lines = open('c1.txt','w')
c2_lines = open('c2.txt','w')
c1 = []
c2 = []
for i in range(M):
for line in file:
if not line.strip():
continue
if line.startswith("</event>"):
break
elif line.startswith("<event>"):
a = file.next()
print >>a_lines,i,a
for i in range(M):
for line in file:
if line.startswith("B"):
print >>b_lines,i,line.strip()
nextline=file.next().strip()
c1.append(nextline)
nextline2=file.next().strip()
c2.append(nextline2)
break
# Parsing the duplicate C columns...
# I've formatted it so the 0 is aligned with the other data
for i in range(M):
if "C" in c1[i]:
print >>c1_lines, i, c1[i]
else:
print >>c1_lines, i, "C 0"
for i in range(M):
if "C" in c2[i]:
print >>c2_lines, i, c2[i]
else:
print >>c2_lines, i, "C 0"
# Sample variable formatting attempt:
b_event_num,b_0,b_1,b_2,b_3=loadtxt("b.txt",usecols=(0,1,2,3,4),unpack=True)
b_0=array(b_0)
b_1=array(b_1)
b_2=array(b_2)
b_3=array(b_3)
b_0=b_0.reshape((len(b_0)),1)
b_1=b_1.reshape((len(b_1)),1)
b_2=b_2.reshape((len(b_2)),1)
b_3=b_3.reshape((len(b_3)),1)
b_points=np.hstack((b_0,b_1,b_2,b_3))
The extracted data itself looks okay, but when I try to load in the columns, I'm getting the following error, and I don't know why:
vals = [vals[i] for i in usecols]
IndexError: list index out of range
Any help would be appreciated; thanks!
The IndexError is coming from trying to access vals[0] when vals = []. If you expand your code the error might make more sense:
vals = []
for i in usecols:
vals[i] = i
The error happens on the first pass through the loop because vals[0] isn't in the list. I would suggest a fix, but I'm not sure what you're trying to do. If you just want vals to be the list [0,1,2,3,4] you can just use
vals = range(5)
Edit:
On a side note I don't think that saving it in a separate file is necessary. It would be a lot better to just save it directly into the array, like:
M = 10000 # number of events
file = open('data.txt')
a = []
b = []
c1 = []
c2 = []

def parseLine(line, section):
    line = line.split()
    line = line[1:] # To take out the letter at the start
    section.append(line)

next(file) # Skip the first <event> line
for i in range(M):
    parseLine(next(file), a)
    parseLine(next(file), b)
    nextLine = next(file)
    if nextLine.startswith("C"):
        parseLine(nextLine, c1)
        nextLine = next(file)
        if nextLine.startswith("C"):
            parseLine(nextLine, c2)
            next(file) # To get to the end of the event
        else:
            c2.append([0])
    else:
        c1.append([0])
        c2.append([0])
    next(file, None) # Skip the next <event> line (returns None at end of file)
file.close()
Be careful, though: to get the 2nd element of the 8th event for b you would use b[7][1], so it's b[event-1][column-1].
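If NumPy arrays are the end goal, the same idea can be written without a hard-coded event count. The sketch below is one possible shape, assuming every event has exactly one A line and one B line, at most two C lines, and four values per B or C line; parse_events is a made-up name:

```python
import numpy as np

def parse_events(lines):
    """Return arrays A (n, 1), B (n, 4), C1 (n, 4), C2 (n, 4); missing C rows become zeros."""
    a_rows, b_rows, c1_rows, c2_rows = [], [], [], []
    event = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue                      # ignore blank lines
        if tokens[0] == "<event>":
            event = {"A": [], "B": [], "C": []}
        elif tokens[0] == "</event>":
            zeros = [0.0] * 4
            c_found = event["C"] + [zeros, zeros]   # pad so two C rows always exist
            a_rows.append(event["A"][0])
            b_rows.append(event["B"][0])
            c1_rows.append(c_found[0])
            c2_rows.append(c_found[1])
            event = None
        elif event is not None:
            # First token is the variable letter, the rest are its values.
            event[tokens[0]].append([float(x) for x in tokens[1:]])
    return (np.array(a_rows), np.array(b_rows),
            np.array(c1_rows), np.array(c2_rows))
```

With A, B, C1, C2 = parse_events(open('data.txt')), A[:, 0] gives variable A for all events and B[2, 2] gives column 2 of variable B for the third event (indices are zero-based).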