Python with open file error - python

My Python program gets mail log data via SSH. When I try to go over it line by line with
with text as f:
    for line in f:
        try:
            .... regex stuff....
I get the error:
Traceback (most recent call last):
  File "xxxxxxxxxxxxxxxxxxxxxxxx", line 90, in <module>
    start()
  File "xxxxxxxxxxxxxxxxxxxxxxxx", line 64, in start
    with text as f:
AttributeError: __exit__
That doesn't work. The only solution that works for me is saving the text to a file and opening it again, as below. But the file is about 1.24 MB, which slows the program down unnecessarily. Does anyone know how I can get rid of the extra save?
....
stdin, stdout, stderr = ssh.exec_command('sudo cat /var/log/mailing.log')
text = ''.join(stdout.readlines())
text_file = open('mail.log', 'w')
text_file.write(text)
text_file.close()
ssh.close()
with open('mail.log') as f:
    for line in f:
        try:
            .... regex stuff....

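text is a plain string, not a file object, so you can't use it in a with statement. with only works on context managers (objects that define __enter__ and __exit__), which is why you get AttributeError: __exit__. Instead of opening it, just loop over its lines:
for line in text.splitlines():
    try:
        .... regex stuff....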
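A minimal, self-contained sketch of that idea, assuming the log is already in memory as a string (as text is after ''.join(stdout.readlines())). The function name find_matches and the regular expression are hypothetical stand-ins for the elided regex stuff:

```python
import re

# Hypothetical pattern standing in for the elided "regex stuff":
# capture the recipient address from lines like "... to=<user@example.com> ..."
RECIPIENT = re.compile(r"to=<([^>]+)>")


def find_matches(text, pattern=RECIPIENT):
    """Loop over an in-memory string line by line; no temporary file needed."""
    results = []
    for line in text.splitlines():
        match = pattern.search(line)
        if match:
            results.append(match.group(1))
    return results


sample = "Jun  1 mail to=<a@example.com> status=sent\nJun  1 other line\n"
print(find_matches(sample))  # ['a@example.com']
```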

You probably want to look at StringIO from the standard library (io.StringIO in Python 3), which makes a string look more or less like a file.
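A sketch of that approach in Python 3, assuming text holds the log as a str. io.StringIO supports the with statement and line-by-line iteration, so the original loop works almost unchanged; note that each line keeps its trailing "\n":

```python
import io

text = "first line\nsecond line\n"  # stand-in for the data read over SSH

lines = []
with io.StringIO(text) as f:  # a file-like object wrapping the string
    for line in f:
        lines.append(line.rstrip("\n"))

print(lines)  # ['first line', 'second line']
```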
Or, you could just say
for line in text.split('\n'):
    try:
        <regex stuff>
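One difference worth knowing when choosing between the two: split('\n') leaves a trailing empty string when the text ends with a newline and keeps any "\r" from Windows line endings, while splitlines() handles both. A small illustration:

```python
text = "line one\r\nline two\n"

print(text.split("\n"))   # ['line one\r', 'line two', '']
print(text.splitlines())  # ['line one', 'line two']
```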

Related

Read a large text file and write to another file with Python

I am trying to convert a large text file (5 GB+) but got a MemoryError.
Following this post, I managed to convert the encoding of a text file into a readable format with this:
path = 'path/to/file'
des_path = 'path/to/store/file'
for filename in os.listdir(path):
    with open('{}/{}'.format(path, filename), 'r+', encoding='iso-8859-11') as f:
        t = open('{}/{}'.format(des_path, filename), 'w')
        string = f.read()
        t.write(string)
        t.close()
The problem is that when I try to convert a large text file (5 GB+), I get this error:
Traceback (most recent call last):
File "Desktop/convertfile.py", line 12, in <module>
string = f.read()
File "/usr/lib/python3.6/encodings/iso8859_11.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
MemoryError
I understand that it cannot read a file this large into memory at once, and several links say I can do it by reading line by line instead.
So how can I apply that to my code so that it reads line by line? As I understand it, I need to read a line from f and write it to t, repeating until the end of the file, right?
You can iterate over the lines of an open file:
for filename in os.listdir(path):
    inp, out = open_files(filename)  # open_files is the helper described below
    for line in inp:
        out.write(line)
    inp.close()
    out.close()
Note that I've hidden the complexity of the different paths, encodings and modes in a function that I suggest you actually write.
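One way that helper could look, as a sketch: it assumes the target encoding should be UTF-8 (the question does not say), and convert_file / convert_folder are hypothetical names. Because it iterates over the input file object, only the current line is held in memory:

```python
import os


def convert_file(src_path, dst_path, src_encoding="iso-8859-11", dst_encoding="utf-8"):
    """Hypothetical helper: re-encode src_path into dst_path one line at a time."""
    # newline="" keeps the original line endings untouched on both sides
    with open(src_path, "r", encoding=src_encoding, newline="") as inp, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as out:
        for line in inp:  # only the current line is held in memory
            out.write(line)


def convert_folder(path, des_path):
    """Convert every file in path, writing the results under des_path."""
    for filename in os.listdir(path):
        convert_file(os.path.join(path, filename), os.path.join(des_path, filename))
```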
Regarding buffering, i.e. reading/writing larger chunks of text: Python does its own buffering under the hood, so this shouldn't be much slower than a more complex solution.
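If you do want explicit control over the chunk size, a sketch using shutil.copyfileobj, which copies by repeatedly calling read(length) and write(); with text-mode files, length counts characters. The 1 MiB default, the UTF-8 target and the name convert_in_chunks are assumptions for illustration:

```python
import shutil


def convert_in_chunks(src_path, dst_path, chunk_chars=1024 * 1024):
    """Hypothetical helper: re-encode a file by copying fixed-size chunks of text."""
    # Assumed encodings: ISO-8859-11 in, UTF-8 out
    with open(src_path, "r", encoding="iso-8859-11", newline="") as inp, \
         open(dst_path, "w", encoding="utf-8", newline="") as out:
        # copyfileobj loops: read chunk_chars characters, write them, repeat until EOF
        shutil.copyfileobj(inp, out, chunk_chars)
```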

How to resume python program skipping data files that have faulty line format?

I need to clean the *.txt files in a folder. Some files do not have the right data format, and Python throws a traceback. How do I skip the improper files and carry on with the others, without having to remove them and re-run the program?
for filename in glob.glob('datafolder/*.txt'):
    inputfile = open(filename, 'r')
    npv = []
    for i, line in enumerate(inputfile):
        npv.append(int(line[34:36]))  # the site of fault
The error is:
Traceback (most recent call last):
File "dataprep.py", line 51, in <module>
npv.append(int(line[34:36]))
ValueError: invalid literal for int() with base 10: ''
I want to skip the current file and go ahead with the next one.
Catch the ValueError and break out of the inner for loop; the outer loop then moves on to the next file.
import glob
for filename in glob.glob('datafolder/*.txt'):
    npv = []
    with open(filename, 'r') as inputfile:
        for i, line in enumerate(inputfile):
            try:
                npv.append(int(line[34:36]))  # the site of fault
            except ValueError:
                break  # stop reading this file; the outer loop continues with the next one
Alternatively, you could skip just the bad line (use continue instead of break) and keep reading the rest of the file, if that makes sense for your task.
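If the goal is to drop a bad file completely, rather than keep the values read before the fault, one option is a helper that raises on the first bad line and a caller that catches the error per file. The column slice [34:36] comes from the question; read_npv and read_folder are hypothetical names:

```python
import glob


def read_npv(filename):
    """Return the integers in columns 34-35 of every line; raise ValueError on a bad line."""
    npv = []
    with open(filename, "r") as inputfile:
        for line in inputfile:
            npv.append(int(line[34:36]))
    return npv


def read_folder(pattern="datafolder/*.txt"):
    """Parse every matching file, collecting good results and skipped file names."""
    results, skipped = {}, []
    for filename in glob.glob(pattern):
        try:
            results[filename] = read_npv(filename)
        except ValueError:
            skipped.append(filename)  # discard this file entirely and continue
    return results, skipped
```

Because read_npv only returns after the whole file parses, no partial list from a faulty file ends up in results.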

Parsing hl7 message line by line using python

I want to read my HL7 messages from a file line by line and parse them using Python. Reading works, but parsing does not: it parses only the first line of the file and prints up to the second line, but goes no further because it says my second line is not HL7. The error shown is:
h=hl7.parse(line)
File "C:\Python27\lib\site-packages\hl7\parser.py", line 45, in parse
plan = create_parse_plan(strmsg, factory)
File "C:\Python27\lib\site-packages\hl7\parser.py", line 88, in create_parse_plan
assert strmsg[:3] in ('MSH')
AssertionError
for the code:
with open('example.txt','r') as f:
    for line in f:
        print line
        print hl7.isfile(line)
        h=hl7.parse(line)
So how do I make my file a valid one? This is the example.txt file:
MSH|^~\&|AcmeMed|Lab|Main HIS|St.Micheals|20130408031655||ADT^A01|6306E85542000679F11EEA93EE38C18813E1C63CB09673815639B8AD55D6775|P|2.6|
EVN||20050622101634||||20110505110517|
PID|||231331||Garland^Tracy||19010201|F||EU|147 Yonge St.^^LA^CA^58818|||||||28-457-773|291-697-644|
NK1|1|Smith^Sabrina|Second Cousin|
NK1|2|Fitzgerald^Sabrina|Second Cousin|
NK1|3|WHITE^Tracy|Second Cousin|
OBX|||WT^WEIGHT||78|pounds|
OBX|||HT^HEIGHT||57|cm|
I had a similar issue and came up with this solution that works for me.
In short, put all your lines into one string and then parse that string.
(Obviously you can clean up the way I check whether the string has been started yet, but I was going for an easy-to-read example.)
a = 0
with open('example.txt','r') as f:
    for line in f:
        if a == 0:
            message = line
            a = 1
        else:
            message += line
h=hl7.parse(message)
Depending on how the file encodes its line endings, you may have to clean up some \r\n characters, but the parser accepts the message as valid and you can parse it to your heart's content.
for line in h:
    print(line)
MSH|^~\&|AcmeMed|Lab|Main HIS|St.Micheals|20130408031655||ADT^A01|6306E85542000679F11EEA93EE38C18813E1C63CB09673815639B8AD55D6775|P|2.6|
EVN||20050622101634||||20110505110517|
PID|||231331||Garland^Tracy||19010201|F||EU|147 Yonge St.^^LA^CA^58818|||||||28-457-773|291-697-644|
NK1|1|Smith^Sabrina|Second Cousin|
NK1|2|Fitzgerald^Sabrina|Second Cousin|
NK1|3|WHITE^Tracy|Second Cousin|
OBX|||WT^WEIGHT||78|pounds|
OBX|||HT^HEIGHT||57|cm|
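Building on jtweeder's answer, the following code worked for prepping my HL7 data.
In Notepad++, I noticed that each line ended with LF but had no CR. It seems this hl7 library requires \r, not \n, which matches HL7 v2's use of a carriage return as the segment separator.
filename = "TEST.dat"
# filepath (the folder holding the file) is defined elsewhere
with open(filepath + filename, "r") as f:
    lines = f.read().splitlines()  # drop the LF line endings
h = '\r'.join(lines)  # rejoin the segments with CR, as the hl7 parser expects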
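Putting both answers together, a sketch that reads text which may hold several messages, starts a new message at each MSH segment, and joins segments with "\r". The helper name split_hl7_messages is hypothetical; each returned string can then be passed to hl7.parse():

```python
def split_hl7_messages(text):
    """Group segment lines into messages, one per MSH header, joined with CR."""
    messages, current = [], []
    for line in text.splitlines():  # handles LF, CRLF and CR endings
        segment = line.strip()
        if not segment:
            continue  # skip blank lines between messages
        if segment.startswith("MSH") and current:
            messages.append("\r".join(current))  # close the previous message
            current = []
        current.append(segment)
    if current:
        messages.append("\r".join(current))
    return messages


# Usage (requires the python-hl7 package):
# import hl7
# with open("example.txt") as f:
#     for raw in split_hl7_messages(f.read()):
#         h = hl7.parse(raw)
```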

How to load a formatted txt file into Python to be searched

I have a file with varying indentation that is several hundred lines long. I have tried various methods to load it into Python as a file and as a variable, but have not been successful. What would be an efficient way to load the file? My end goal is to load the file and search it for a specific line of text.
with open('''C:\Users\Samuel\Desktop\raw.txt''') as f:
    for line in f:
        if line == 'media_url':
            print line
        else:
            print "void"
Error:
Traceback (most recent call last):
  File "<pyshell#35>", line 1, in <module>
    with open('''C:\Users\Samuel\Desktop\raw''') as f:
IOError: [Errno 22] invalid mode ('r') or filename: 'C:\\Users\\Samuel\\Desktop\raw
If you're trying to search for a specific line, it's much better to avoid loading the whole file into memory:
with open('filename.txt') as f:
    for line in f:
        if line.rstrip('\n') == 'search string':  # or perhaps: if 'search string' in line:
            pass  # do something with the matching line
If you're searching for a specific line while ignoring indentation, use
if line.strip() == 'search string'.strip():
to strip off the leading and trailing whitespace (including the newline) before comparing.
The following is the standard way of reading a file's contents into a variable:
with open("filename.txt", "r") as f:
    contents = f.read()
Use the following if you want a list of lines instead of the whole file in one string:
with open("filename.txt", "r") as f:
    contents = f.read().splitlines()
You can then search for text with
if any("search string" in line for line in contents):
    print 'line found'
Python uses the backslash as an escape character, so in 'C:\Users\Samuel\Desktop\raw.txt' the \r is read as a carriage return. For Windows paths, give the path as a "raw string", with an r prefix.
Lines read from a file keep their trailing newline. To compare them, strip it off.
with open(r'C:\Users\Samuel\Desktop\raw.txt') as f:
    for line in f:
        if line.rstrip() == 'media_url':
            print line
        else:
            print "void"
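A Python 3 sketch combining those fixes: a raw-string path, line-by-line reading, and stripping before comparing. The function name find_line is hypothetical; it returns the 1-based numbers of the matching lines:

```python
def find_line(path, target):
    """Return 1-based line numbers whose stripped text equals target."""
    matches = []
    with open(path, "r") as f:
        for number, line in enumerate(f, start=1):
            if line.strip() == target.strip():  # ignore indentation and newline
                matches.append(number)
    return matches


# Usage with a Windows path written as a raw string:
# print(find_line(r"C:\Users\Samuel\Desktop\raw.txt", "media_url"))
```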

Python script to unzip and print one line of a file

I am trying a simple example of retrieving data from a file and printing only one line of the output. I get a semicolon error around encoded and 'r'.
import gzip
data = gzip.open('pagecounts-20130601-000000.gz', 'r')
encoded=data.read()
print encoded[2]
It gives this error:
Traceback (most recent call last):
  File "filter_articles.scpt", line 4, in <module>
    encoded=data.read()
  File "/usr/lib/python2.7/gzip.py", line 249, in read
    self._read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 308, in _read
    self._add_read_data( uncompress )
  File "/usr/lib/python2.7/gzip.py", line 326, in _add_read_data
    self.extrabuf = self.extrabuf[offset:] + data
MemoryError
I guess this is because the file is huge and cannot be read into memory all at once? What would be a better way to print a few lines of the file?
I am assuming that:
You meant to have quotes around the file name in your script.
You actually want the third line (as your post suggests) and not the third character (as your script suggests)
In this case the following should work:
import gzip
data = gzip.open('pagecounts-20130601-000000.gz', 'r')
data.readline()
data.readline()
print data.readline()
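A Python 3 variant, as a sketch: open the archive in text mode ('rt') and let itertools.islice skip ahead, so the file is decompressed as a stream and never loaded whole. The UTF-8 encoding and the name nth_line are assumptions:

```python
import gzip
from itertools import islice


def nth_line(path, n, encoding="utf-8"):
    """Return line n (0-based) of a gzip-compressed text file, or None if it is too short."""
    with gzip.open(path, "rt", encoding=encoding) as data:
        return next(islice(data, n, n + 1), None)


# Usage: print the third line, as in the question.
# print(nth_line("pagecounts-20130601-000000.gz", 2), end="")
```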
