python zipfile module with TextIOWrapper - python

I wrote the following piece of code to read a text file inside of a zipped directory. Since I don't want the output in bytes I added the TextIOWrapper to display the output as a string. Assuming that this is the right way to read a zip file line by line (if it isn't let me know), then why does the output print a blank line? Is there any way to get rid of it?
import zipfile
import io
def test():
    zf = zipfile.ZipFile(r'C:\Users\test\Desktop\zip1.zip')
    for filename in zf.namelist():
        words = io.TextIOWrapper(zf.open(filename, 'r'))
        for line in words:
            print (line)
    zf.close()
test()
>>>
This is a test line...
This is a test line...
>>>
The two lines in the file inside of the zipped folder are:
This is a test line...
This is a test line...
Thanks!

ZipFile.open opens the zipped file in binary mode, which doesn't strip out carriage returns (i.e. '\r'), and neither did the defaults for TextIOWrapper in my test. Try configuring TextIOWrapper explicitly for universal newlines (i.e. newline=None, which is also its documented default):
import zipfile
import io
zf = zipfile.ZipFile('data/test_zip.zip')
for filename in zf.namelist():
    with zf.open(filename, 'r') as f:
        words = io.TextIOWrapper(f, newline=None)
        for line in words:
            print(repr(line))
Output:
'This is a test line...\n'
'This is a test line...'
The normal behavior when iterating a file by line in Python is to retain the newline at the end. The print function also adds a newline, so you'll get a blank line. To just print the file you could instead use print(words.read()). Or you could use the end option of the print function: print(line, end='').
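Putting the two points above together, here is a minimal Python 3 sketch. The helper name read_zip_lines, the demo archive, and the utf-8 encoding are assumptions made for illustration; they are not part of the original question.

```python
import io
import os
import tempfile
import zipfile


def read_zip_lines(zip_path, encoding="utf-8"):
    """Return every text line of every member in the archive, newline stripped."""
    lines = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # ZipFile.open() yields bytes; TextIOWrapper decodes them and,
            # with newline=None, translates '\r\n' and '\r' to '\n'.
            with zf.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding=encoding, newline=None):
                    lines.append(line.rstrip("\n"))
    return lines


# Build a small throwaway archive with Windows line endings to try it out.
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("test.txt", b"This is a test line...\r\nThis is a test line...")

for line in read_zip_lines(demo_zip):
    print(line)  # print() supplies the only newline, so no blank lines appear
```

Stripping the trailing newline before printing has the same effect as print(line, end=''); which one reads better is a matter of taste.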

Related

replace new line in a different file with an underscore (without using with)

I posted a question yesterday along similar lines but didn't quite get the response I wanted because I wasn't specific enough. Basically the function takes a .txt file as the argument and returns a string with all \n characters replaced with an '_' on the same line. I want to do this without using with. I thought I did this correctly but when I run it and check the file, nothing has changed. Any pointers?
This is what I did:
def one_line(filename):
    wordfile = open(filename)
    text_str = wordfile.read().replace("\n", "_")
    wordfile.close()
    return text_str
one_line("words.txt")
but to no avail. I open the text file and it remains the same.
The contents of the textfile are:
I like to eat
pancakes every day
and the output that's supposed to be shown is:
>>> one_line("words.txt")
'I like to eat_pancakes every day_'
The fileinput module in the Python standard library allows you to do this in one fell swoop.
import fileinput
for line in fileinput.input(filename, inplace=True):
    line = line.replace('\n', '_')
    print(line, end='')
The requirement to avoid a with statement is trivial but rather pointless. Anything which looks like
with open(filename) as handle:
    stuff
can simply be rewritten as
handle = open(filename)
try:
    stuff
finally:
    handle.close()
If you take out the try/finally you have a bug which leaves handle open if an error happens. The purpose of the with context manager for open() is to simplify this common use case.
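To see the try/finally guarantee in action, here is a small Python 3 sketch. The function names and the simulated ValueError are hypothetical and only serve the demonstration.

```python
import os
import tempfile


def read_first_line(path):
    """Read one line without 'with', closing the file even if reading fails."""
    handle = open(path)  # if open() itself fails there is nothing to close
    try:
        return handle.readline()
    finally:
        handle.close()  # runs on normal return and on exceptions alike


def fail_while_open(path, seen):
    """Raise while the file is open; 'seen' lets the caller inspect the handle."""
    handle = open(path)
    seen.append(handle)
    try:
        raise ValueError("simulated error while processing")
    finally:
        handle.close()


# Create a throwaway file with the contents from the question.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"I like to eat\npancakes every day\n")
os.close(fd)

handles = []
try:
    fail_while_open(demo_path, handles)
except ValueError:
    pass

print(handles[0].closed)  # True: the finally block closed the file
```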
You are missing some steps. After you obtain the updated string, you need to write it back to the file. The example below does this without using with:
def one_line(filename):
    wordfile = open(filename)
    text_str = wordfile.read().replace("\n", "_")
    wordfile.close()
    return text_str
def write_line(s):
    # Open the file in write mode
    wordfile = open("words.txt", 'w')
    # Write the updated string to the file
    wordfile.write(s)
    # Close the file
    wordfile.close()
s = one_line("words.txt")
write_line(s)
Or using with
with open("file.txt",'w') as wordfile:
    # Write the updated string to the file
    wordfile.write(s)
With pathlib you could achieve what you want this way:
from pathlib import Path
path = Path(filename)
contents = path.read_text()
contents = contents.replace("\n", "_")
path.write_text(contents)

file open() , readLines()

import os.path
os.path.exists('~/fileToExperiment.txt')
myfile = open('~/fileToExperiment.txt','r')
myfile.readlines()
for line in myfile:
    print line
So I am trying to run this very simple Python code, but it does not output anything, nor does it raise any errors.
The fileToExperiment.txt file is not empty.
What's wrong here? Could someone point it out?
By calling myfile.readlines() you have already read the entire file. Then, when you try to iterate over your file object, you are already at the end of the file.
A better practice is to do:
with open('~/fileToExperiment.txt','r') as myfile:
    for line in myfile:
        print line
myfile.readlines() will store the whole content of the file in memory. If you do not need the entire content at once, it is best to read line by line.
If you do need the entire content, you can use
with open('~/fileToExperiment.txt','r') as myfile:
    content = myfile.read() ## or content = myfile.readlines()
Also note the use of the with statement, which is recommended when handling files (no need to close the file afterwards).
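The "already at the end of the file" behaviour described in this answer is easy to reproduce. The sketch below uses Python 3 and a throwaway temporary file (both assumptions; the question itself uses Python 2 print statements), and shows that seek(0) makes the lines available again.

```python
import os
import tempfile

# Create a small throwaway file to read from.
fd, path = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"Hello World\nWelcome to Python\n")
os.close(fd)

with open(path) as myfile:
    first_pass = myfile.readlines()   # consumes the whole file
    second_pass = list(myfile)        # position is at EOF, so this is empty
    myfile.seek(0)                    # rewind to the start
    third_pass = list(myfile)         # lines are available again

print(first_pass)
print(second_pass)
print(third_pass)
```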
You didn't store the lines in a variable. So try this:
lines = myfile.readlines()
for line in lines:
    print line
You can use either readlines() or a loop over the file object to read or print the lines of a file.
readlines() - returns the complete file as a list of strings, one per line, each still ending in \n
for example,
code:
print myfile.readlines()
output:
['Hello World\n', 'Welcome to Python\n', 'End of line\n']
Looping file object - You can loop over the file object for reading lines from a file. This is memory efficient, fast, and leads to simple code. For example,
code:
myfile = open('newfile.txt', 'r')
for line in myfile:
    print line
output:
Hello World
Welcome to Python
End of line

Python failing to read lines properly

I'm supposed to open a file, read it line per line and display the lines out.
Here's the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import re
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"
csv_read_line = open(in_path, "rb").read().split("\n")
line_number = 0
for line in csv_read_line:
    line_number+=1
    print str(line_number) + line
Here's the contents of the input file:
12345^67890^abcedefg
random^test^subject
this^sucks^crap
And here's the result:
this^sucks^crapjectfg
Some weird combo of all three. In addition to this, the result of line_number is missing. Printing out the result of len(csv_read_line) outputs 1, for some reason, no matter how many lines are in the input file. Changing the split type from \n to ^ gives the expected output, though, so I'm assuming the problem is probably with the input file.
I'm using a Mac, and did both the python code and the input file (on Sublime Text) on the Mac itself.
Am I missing something?
You seem to be splitting on "\n" which isn't necessary, and could be incorrect depending on the line terminators used in the input file. Python includes functionality to iterate over the lines of a file one at a time. The advantages are that it will worry about processing line terminators in a portable way, as well as not requiring the entire file to be held in memory at once.
Further, note that you are opening the file in binary mode (the b character in your mode string) when you actually intend to read the file as text. This can cause problems similar to the one you are experiencing.
Also, you do not close the file when you are done with it. In this case that isn't a problem, but you should get in the habit of using with blocks when possible to make sure the file gets closed at the earliest possible time.
Try this:
with open(in_path, "r") as f:
    line_number = 0
    for line in f:
        line_number += 1
        print str(line_number) + line.rstrip('\r\n')
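The symptoms in the question (a single "line" and text that overwrites itself on screen) are consistent with a file that uses bare carriage returns as line endings. Assuming that is the case, the following Python 3 sketch shows the difference between splitting raw bytes on '\n' and letting text mode handle the line endings. The temporary file is a stand-in for the real CSV; in Python 2 the equivalent would be opening with mode "rU".

```python
import os
import tempfile

# Simulate a file saved with classic Mac line endings (bare '\r').
fd, in_path = tempfile.mkstemp(suffix=".csv")
os.write(fd, b"12345^67890^abcedefg\rrandom^test^subject\rthis^sucks^crap\r")
os.close(fd)

# Splitting raw bytes on '\n' finds no separators, so everything is one "line".
with open(in_path, "rb") as f:
    raw_parts = f.read().split(b"\n")
print(len(raw_parts))  # 1

# Python 3 text mode (newline=None is the default) treats '\r', '\n' and
# '\r\n' as line endings and hands each line back ending in '\n'.
with open(in_path, "r", newline=None) as f:
    numbered = [f"{number}: {line.rstrip()}" for number, line in enumerate(f, start=1)]

for entry in numbered:
    print(entry)
```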
So your example just works for me.
But then, I just copied your text into a text editor on Linux, and did it that way, so any carriage returns will have been wiped out.
Try this code though:
import os
in_path = "input.txt"
with open(in_path, "rU") as inputFile:
    for lineNumber, line in enumerate(inputFile):
        print lineNumber, line.strip()
It's a little cleaner, and the for line in file style deals with line breaks for you in a system independent way: in Python 2 the "U" in the mode string turns on universal newline support (Python 3 text mode has it by default).
I'd try the following Pythonic code:
#!/usr/bin/env python
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"
with open(in_path, 'rb') as f:
    for i, line in enumerate(f):
        print(str(i) + line)
There are several improvements that can be made here to make it more idiomatic Python.
import csv
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"
# Let's open the file and make sure that it closes when we unindent
with open(in_path, "rb") as input_file:
    # Create a csv reader object that will parse the input for us
    reader = csv.reader(input_file, delimiter="^")
    # Enumerate over the rows (these will be lists of strings) and keep track
    # of the line number using Python's built-in enumerate function
    for line_num, row in enumerate(reader):
        # You can process whatever you would like here. But for now we will just
        # print out what you were originally printing
        print str(line_num) + "^".join(row)
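The snippet above is Python 2, where the csv module wants a file opened in binary mode. In Python 3 the csv documentation asks for a text-mode file opened with newline='' instead. A minimal Python 3 sketch of the same idea, using a temporary file as a stand-in for the real input:

```python
import csv
import os
import tempfile

# Throwaway input file with the caret-delimited rows from the question.
fd, in_path = tempfile.mkstemp(suffix=".csv")
os.write(fd, b"12345^67890^abcedefg\nrandom^test^subject\nthis^sucks^crap\n")
os.close(fd)

# In Python 3 the csv module expects a text-mode file opened with newline=''.
with open(in_path, newline="") as input_file:
    reader = csv.reader(input_file, delimiter="^")
    rows = list(reader)  # each row is a list of strings

for line_num, row in enumerate(rows, start=1):
    print(line_num, row)
```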

Open a file for input and output in Python

I have the following code which is intended to remove specific lines of a file. When I run it, it prints the two filenames that live in the directory, then deletes all information in them. What am I doing wrong? I'm using Python 3.2 under Windows.
import os
files = [file for file in os.listdir() if file.split(".")[-1] == "txt"]
for file in files:
    print(file)
    input = open(file,"r")
    output = open(file,"w")
    for line in input:
        print(line)
        # if line is good, write it to output
    input.close()
    output.close()
open(file, 'w') wipes the file. To prevent that, open it in r+ mode (read+write/don't wipe), then read it all at once, filter the lines, and write them back out again. Something like
with open(file, "r+") as f:
    lines = f.readlines()              # read entire file into memory
    f.seek(0)                          # go back to the beginning of the file
    f.writelines(filter(good, lines))  # dump the filtered lines back
    f.truncate()                       # wipe the remains of the old file
I've assumed that good is a function telling whether a line should be kept.
If your file fits in memory, the easiest solution is to open the file for reading, read its contents to memory, close the file, open it for writing and write the filtered output back:
with open(file_name) as f:
    lines = list(f)
# filter lines
with open(file_name, "w") as f: # This removes the file contents
    f.writelines(lines)
Since you are not intermingling read and write operations, the advanced file modes like "r+" are unnecessary here, and only complicate things.
If the file does not fit into memory, the usual approach is to write the output to a new, temporary file, and move it back to the original file name after processing is finished.
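The temporary-file approach mentioned above could look roughly like this in Python 3. The function name filter_file_in_place and the keep predicate are hypothetical, and creating the temporary file in the same directory as the original is an assumption made so that the final rename stays on one filesystem.

```python
import os
import tempfile


def filter_file_in_place(path, keep):
    """Rewrite 'path' keeping only lines for which keep(line) is true."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as dst, open(path) as src:
            for line in src:          # one line in memory at a time
                if keep(line):
                    dst.write(line)
        os.replace(tmp_path, path)    # swap the filtered copy into place
    except BaseException:
        os.remove(tmp_path)           # do not leave a stray temp file behind
        raise


# Try it on a small throwaway file.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "testtext.txt")
with open(demo_path, "w") as f:
    f.write("one\ntwo\nthree\nfour\n")

filter_file_in_place(demo_path, lambda line: not line.startswith("t"))

with open(demo_path) as f:
    result = f.read()
print(result)
```

Because the original is only replaced after the filtered copy has been written completely, an error part-way through leaves the original file untouched.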
One way is to use the fileinput stdlib module. Then you don't have to worry about open/closing and file modes etc...
import fileinput
from contextlib import closing
import os
fnames = [fname for fname in os.listdir() if fname.split(".")[-1] == "txt"]  # os.path.splitext would be more robust
with closing(fileinput.input(fnames, inplace=True)) as fin:
    for line in fin:
        # some condition
        if 'z' not in line:  # your condition here
            print(line, end='')  # the line already ends with a newline
When using inplace=True, fileinput redirects stdout to the file currently being processed. The original is moved to a backup file (extension '.bak' by default), which is deleted once the output file is closed unless you pass the backup argument, e.g. backup='.bak', to keep it.
jon@minerva:~$ cat testtext.txt
one
two
three
four
five
six
seven
eight
nine
ten
After running the above with a condition of not line.startswith('t'):
jon@minerva:~$ cat testtext.txt
one
four
five
six
seven
eight
nine
You're deleting everything when you open the file to write to it. Opening the same file with "w" while you still have it open for reading truncates it before anything is read. Use open(file,"r+") instead, and then save all the lines to another variable before writing anything.
You should not open the same file for reading and writing at the same time.
"w" means create an empty file for writing. If the file already exists, its data will be deleted.
So you can use a different file name for writing.

How can I use readline method?

I have this trivial code:
from sys import argv
script, input_file = argv
def fma(f):
    f.readline()
current_file = open(input_file)
fma(current_file)
The contents of the txt file is:
Hello this is a test.\n
I like cheese and macaroni.\n
I love to drink juice.\n
\n
\n
I put the \n chars so you know I hit enter in my text editor.
What I want to accomplish is to get back every single line and every \n character.
The problem is, when running the script I get nothing back. What am I doing wrong and how can I fix it in order to run as I stated above?
Your function reads a line, but does nothing with it:
def fma(f):
    f.readline()
You'd need to return the string that f.readline() gives you. Keep in mind, in the interactive prompt, the last value produced is printed automatically, but that isn't how Python code in a .py file works.
Pretty certain what you actually want is f.readlines, not f.readline.
# module start
from __future__ import with_statement # needed only on Python 2.5
def readfile(path):
    with open(path) as src:
        return src.readlines()
if __name__ == '__main__':
    import sys
    print readfile(sys.argv[1])
# module end
Note that I am using the with context manager to open your file more safely (it does the job of closing the file for you). In Python 2.6 and later you don't need the fancy import statement at the top to use it, but I have a habit of including it for anyone still using older Python.
def fma(f):
    f.readline()
f.readline() is a function call; it returns a value, in this case a line from the file, and you need to "do" something with that value, like:
def fma(f):
    print f.readline()
Your function is not returning anything.
def fma(f):
    data = f.readline()
    return data
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline.
The script should be as follows:
from sys import argv
script, input_file = argv
def fma(f):
    line = f.readline()
    return line
current_file = open(input_file)
print fma(current_file)
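The script above returns only the first line. To get back every line together with its \n, one option is to keep calling readline() until it returns an empty string, which signals end of file. A minimal Python 3 sketch, with io.StringIO standing in for the opened text file (an assumption made so the example is self-contained):

```python
import io

# io.StringIO stands in for the opened text file from the question.
current_file = io.StringIO(
    "Hello this is a test.\n"
    "I like cheese and macaroni.\n"
    "I love to drink juice.\n"
    "\n"
    "\n"
)

lines = []
while True:
    line = current_file.readline()
    if line == "":      # empty string means end of file
        break
    lines.append(line)  # a blank line in the file comes back as "\n"

for line in lines:
    print(repr(line))   # repr() makes the trailing newline visible
```

Note the difference between a blank line, which readline() returns as "\n", and end of file, which it returns as "".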
