Parsing HL7 messages line by line using Python

I want to read my HL7 messages from a file line by line and parse them using Python. I'm able to read the file, but my problem is in parsing: it parses only the first line of the file and prints up to the second line, but does not parse any further, because it decides that my second line is not HL7. The error shown is:
h=hl7.parse(line)
  File "C:\Python27\lib\site-packages\hl7\parser.py", line 45, in parse
    plan = create_parse_plan(strmsg, factory)
  File "C:\Python27\lib\site-packages\hl7\parser.py", line 88, in create_parse_plan
    assert strmsg[:3] in ('MSH')
AssertionError
for this code:
with open('example.txt', 'r') as f:
    for line in f:
        print line
        print hl7.isfile(line)
        h = hl7.parse(line)
So how do I make my file a valid one? This is the example.txt file:
MSH|^~\&|AcmeMed|Lab|Main HIS|St.Micheals|20130408031655||ADT^A01|6306E85542000679F11EEA93EE38C18813E1C63CB09673815639B8AD55D6775|P|2.6|
EVN||20050622101634||||20110505110517|
PID|||231331||Garland^Tracy||19010201|F||EU|147 Yonge St.^^LA^CA^58818|||||||28-457-773|291-697-644|
NK1|1|Smith^Sabrina|Second Cousin|
NK1|2|Fitzgerald^Sabrina|Second Cousin|
NK1|3|WHITE^Tracy|Second Cousin|
OBX|||WT^WEIGHT||78|pounds|
OBX|||HT^HEIGHT||57|cm|

I had a similar issue and came up with this solution, which works for me.
In short, accumulate all your lines into one string and then parse that string.
(Obviously you can clean up the way I check whether the string has been created yet, but I was going for an easy-to-read example.)
a = 0
with open('example.txt', 'r') as f:
    for line in f:
        if a == 0:
            message = line
            a = 1
        else:
            message += line

h = hl7.parse(message)
Now you will have to clean up some \r\n characters, depending on how the file's line endings are encoded, but the message is accepted as valid and you can parse to your heart's content.
for line in h:
    print(line)
MSH|^~\&|AcmeMed|Lab|Main HIS|St.Micheals|20130408031655||ADT^A01|6306E85542000679F11EEA93EE38C18813E1C63CB09673815639B8AD55D6775|P|2.6|
EVN||20050622101634||||20110505110517|
PID|||231331||Garland^Tracy||19010201|F||EU|147 Yonge St.^^LA^CA^58818|||||||28-457-773|291-697-644|
NK1|1|Smith^Sabrina|Second Cousin|
NK1|2|Fitzgerald^Sabrina|Second Cousin|
NK1|3|WHITE^Tracy|Second Cousin|
OBX|||WT^WEIGHT||78|pounds|
OBX|||HT^HEIGHT||57|cm|
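For completeness, here is a minimal sketch of the same idea end to end, assuming python-hl7 is installed and example.txt is the file from the question: read the file, strip the line endings, join the segments with the \r separator that HL7 expects, and parse the result. The segment() lookup at the end is my assumption about the python-hl7 accessors; drop it if your version doesn't have it.
import hl7

# rebuild the message with \r between segments
with open('example.txt', 'r') as f:
    segments = [line.rstrip('\r\n') for line in f if line.strip()]

message = '\r'.join(segments)
h = hl7.parse(message)

print(len(h))                # 8 segments for the sample message above
print(h.segment('PID')[5])   # Garland^Tracy, assuming the segment() helper exists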

Tagging onto #jtweeder's answer, the following code worked for prepping my HL7 data.
In Notepad++, I noticed that each line ended with LF but did not have a CR. It seems as though this hl7 library requires \r, not \n.
filename = "TEST.dat"
with open(filepath + filename, "r") as f:   # filepath defined elsewhere
    lines = [line.rstrip('\n') for line in f]
h = '\r'.join(lines)
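With h rebuilt that way, the parse from the question should now go through; a quick check might look like this (assuming python-hl7 is importable as hl7):
import hl7

msg = hl7.parse(h)
print(len(msg))   # number of segments found in TEST.dat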

Related

Writing nested list to file per line: UnsupportedOperation: not writable

I tried to write code that removes any line from a file that starts with a number smaller than T, and then writes the remaining lines to another file.
def filter(In, Out, T):
    with open(In, 'r') as In:
        with open(Out, 'r') as Out:
            lines = In.readlines()
            lines = [[e for e in line.split()] for line in lines]
            lines = [line for line in lines if int(line[0]) >= T]
            for line in lines:
                for word in line:
                    Out.write(f"{word} ")
    return None
I thought the code would probably write the words in one long line instead of putting them one per line, but it just raised UnsupportedOperation: not writable, and I don't understand why.
It seems like your code had a few bugs:
You opened the file for reading with the line with open(Out,'r') as Out: and then tried to write to that file, so I changed the 'r' to 'w'.
You tried to write to the output file while you were still in the middle of reading the input, so I moved the code that writes to the file to after you have finished reading.
You opened the file for reading and gave it the name In, but that name is already the name of an argument that your function receives.
This should do the trick:
def filter(In_name, Out, T):
    with open(In_name, 'r') as In:
        lines = In.readlines()
        lines = [[e for e in line.split()] for line in lines]
        lines = [line for line in lines if int(line[0]) >= T]
    with open(Out, 'w') as Out:
        for line in lines:
            for word in line:
                Out.write(f"{word} ")
            Out.write("\n")  # keep one input line per output line
    return None
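For example, with a hypothetical input file numbers.txt whose lines each start with an integer (the file names here are only illustrative), the call below keeps only the lines whose leading number is 5 or more:
# numbers.txt (hypothetical contents):
#   3 apples
#   7 pears
#   12 plums

filter("numbers.txt", "filtered.txt", 5)

# filtered.txt now contains "7 pears" and "12 plums",
# each word followed by a space and each line ended with a newline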

Read a large text file and write to another file with Python

I am trying to convert a large text file (5 GB+) but got a MemoryError.
Following this post, I managed to convert the encoding of a text file into a readable format with this:
import os

path = 'path/to/file'
des_path = 'path/to/store/file'

for filename in os.listdir(path):
    with open('{}/{}'.format(path, filename), 'r+', encoding='iso-8859-11') as f:
        t = open('{}/{}'.format(des_path, filename), 'w')
        string = f.read()
        t.write(string)
        t.close()
The problem is that when I try to convert a text file with a large size (5 GB+), I get this error:
Traceback (most recent call last):
  File "Desktop/convertfile.py", line 12, in <module>
    string = f.read()
  File "/usr/lib/python3.6/encodings/iso8859_11.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
MemoryError
which I take to mean that it cannot read a file this large in one go. I found from several links that I can do it by reading line by line.
So, how can I adapt my code to make it read line by line? What I understand about reading line by line here is that I need to read a line from f and write it to t until the end of the file, right?
You can iterate over the lines of an open file.
for filename in os.listdir(path):
    inp, out = open_files(filename)
    for line in inp:
        out.write(line)
    inp.close()
    out.close()
Note that I've hidden the complexity of the different paths, encodings and modes in a function that I suggest you actually write (a sketch follows below).
Regarding buffering, i.e. reading/writing larger chunks of the text: Python does its own buffering under the hood, so this shouldn't be much slower than a more complex solution.
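A minimal sketch of what that helper might look like, under the same assumptions as the question (source files encoded in ISO-8859-11, path and des_path defined as above; the name open_files is just the placeholder used in this answer):
import os

def open_files(filename):
    # open the source file with the legacy encoding and the destination
    # file with the platform default encoding, both in text mode
    inp = open(os.path.join(path, filename), 'r', encoding='iso-8859-11')
    out = open(os.path.join(des_path, filename), 'w')
    return inp, out

for filename in os.listdir(path):
    inp, out = open_files(filename)
    for line in inp:        # reads one line at a time, never the whole file
        out.write(line)
    inp.close()
    out.close()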

Python read lines from input file and write back to the file

I want to read the input file, line by line, then modify one line and write the changes back to the same file.
The problem is that, after writing back, I lose the newlines and I end up with all the data on one line.
with open(bludescFilePath, 'a+') as blu:
    blu_file_in_lines = blu.readlines()
    for line in blu_file_in_lines:
        if "Length" in line:
            blu_file_in_lines[13] = line.replace("0x8000", str(size))

with open(bludescFilePath, 'w') as blu:
    blu.write(str(blu_file_in_lines))
EDIT
OK, what was missing was the for loop:
with open(bludescFilePath, 'w') as blu:
    for line in blu_file_in_lines:
        blu.write(str(line))
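Putting the whole read-modify-write cycle together (bludescFilePath and size as in the question, and using enumerate instead of the hard-coded index 13), a sketch along these lines avoids both the str(list) write and the lost newlines, because every element returned by readlines() still carries its own \n:
# read all lines; each one still ends with '\n'
with open(bludescFilePath, 'r') as blu:
    blu_file_in_lines = blu.readlines()

# modify the matching line in place
for i, line in enumerate(blu_file_in_lines):
    if "Length" in line:
        blu_file_in_lines[i] = line.replace("0x8000", str(size))

# write everything back, one line at a time
with open(bludescFilePath, 'w') as blu:
    for line in blu_file_in_lines:
        blu.write(line)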

Reading gzipped text file line-by-line for processing in python 3.2.6

I'm a complete newbie when it comes to Python, but I've been tasked with getting a piece of code running on a machine that has a different version of Python (3.2.6) than the one the code was originally built for.
I've come across an issue with reading a gzipped text file line by line (and processing it depending on the first character). The code (which obviously is written for Python > 3.2.6) is:
for line in gzip.open(input[0], 'rt'):
    if line[:1] != '>':
        out.write(line)
        continue
    chromname = match2chrom(line[1:-1])
    seqname = line[1:].split()[0]
    print('>{}'.format(chromname), file=out)
    print('{}\t{}'.format(seqname, chromname), file=mappingout)
(For those who know: this splits gzipped FASTA genome files into headers (starting with ">") and sequences, and writes the lines to two different files accordingly.)
I have found https://bugs.python.org/issue13989, which states that mode 'rt' cannot be used for gzip.open in Python 3.2 and suggests using something along the lines of:
import io
with io.TextIOWrapper(gzip.open(input[0], "r")) as fin:
    for line in fin:
        if line[:1] != '>':
            out.write(line)
            continue
        chromname = match2chrom(line[1:-1])
        seqname = line[1:].split()[0]
        print('>{}'.format(chromname), file=out)
        print('{}\t{}'.format(seqname, chromname), file=mappingout)
but the above code does not work:
UnsupportedOperation in line <4> of /path/to/python_file.py:
read1
How can I rewrite this routine to do exactly what I want: reading the gzip file line by line into the variable "line" and processing based on the first character?
EDIT: the traceback from the first version of this routine is (Python 3.2.6):
Mode rt not supported
File "/path/to/python_file.py", line 79, in __process_genome_sequences
File "/opt/python-3.2.6/lib/python3.2/gzip.py", line 46, in open
File "/opt/python-3.2.6/lib/python3.2/gzip.py", line 157, in __init__
Traceback from the second version is:
UnsupportedOperation in line 81 of /path/to/python_file.py:
read1
File "/path/to/python_file.py", line 81, in __process_genome_sequences
with no further traceback (the extra two lines in the line count are the import io and with io.TextIOWrapper(gzip.open(input[0], "r")) as fin: lines).
It appears I have actually solved the problem.
In the end I had to use shell("gunzip {input[0]}") to ensure that the decompressed file could be read in text mode, and then read the resulting file using:
for line in open(' *< resulting file >* ', 'r'):
    if line[:1] != '>':
        out.write(line)
        continue
    chromname = match2chrom(line[1:-1])
    seqname = line[1:].split()[0]
    print('>{}'.format(chromname), file=out)
    print('{}\t{}'.format(seqname, chromname), file=mappingout)
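If decompressing on disk is undesirable, an alternative that should also work on Python 3.2 is to keep the stream gzipped, iterate over it in binary mode and decode each line yourself. This is only a sketch, assuming the FASTA file is ASCII-encoded and that out, mappingout and match2chrom are defined as in the original routine:
import gzip

with gzip.open(input[0], 'rb') as fin:     # binary mode is fine on 3.2
    for raw in fin:
        line = raw.decode('ascii')         # decode each line by hand
        if line[:1] != '>':
            out.write(line)
            continue
        chromname = match2chrom(line[1:-1])
        seqname = line[1:].split()[0]
        print('>{}'.format(chromname), file=out)
        print('{}\t{}'.format(seqname, chromname), file=mappingout)
Wrapping the stream as io.TextIOWrapper(io.BufferedReader(gzip.open(input[0], 'rb'))) is another workaround sometimes suggested for the missing read1() in 3.2's GzipFile.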

How to load a formatted txt file into Python to be searched

I have a file that is formatted with varying indentation and is several hundred lines long, and I have tried various methods to load it into Python as a file and a variable, but have not been successful. What would be an efficient way to load the file? My end goal is to load the file and search it for a specific line of text.
with open('''C:\Users\Samuel\Desktop\raw.txt''') as f:
    for line in f:
        if line == 'media_url':
            print line
        else:
            print "void"
Error:
Traceback (most recent call last):
  File "<pyshell#35>", line 1, in <module>
    with open('''C:\Users\Samuel\Desktop\raw''') as f:
IOError: [Errno 22] invalid mode ('r') or filename: 'C:\\Users\\Samuel\\Desktop\raw
If you're trying to search for a specific line, then it's much better to avoid loading the whole file in:
with open('filename.txt') as f:
    for line in f:
        if line == 'search string':  # or perhaps: if 'search string' in line:
            pass  # do something
If you're trying to search for the presence of a specific line while ignoring indentation, you'll want to use
if line.strip() == 'search string'.strip():
in order to strip off the leading (and trailing) whitespace before comparing.
The following is the standard way of reading a file's contents into a variable:
with open("filename.txt", "r") as f:
    contents = f.read()
Use the following if you want a list of lines instead of the whole file in a string:
with open("filename.txt", "r") as f:
    contents = f.readlines()
You can then search for text with
if any("search string" in line for line in contents):
    print 'line found'
Python uses the backslash as an escape character. For Windows paths, this means giving the path as a "raw string" with an r prefix.
Lines have newlines attached; to compare, strip them.
with open(r'C:\Users\Samuel\Desktop\raw.txt') as f:
    for line in f:
        if line.rstrip() == 'media_url':
            print line
        else:
            print "void"
