StringIO with binary files?

I seem to get different outputs:
from StringIO import *
file = open('1.bmp', 'r')
print file.read(), '\n'
print StringIO(file.read()).getvalue()
Why? Is it because StringIO only supports text strings or something?

When you call file.read(), it will read the entire file into memory. Then, if you call file.read() again on the same file object, it will already have reached the end of the file, so it will only return an empty string.
Instead, try e.g. reopening the file:
from StringIO import *
file = open('1.bmp', 'r')
print file.read(), '\n'
file.close()
file2 = open('1.bmp', 'r')
print StringIO(file2.read()).getvalue()
file2.close()
You can also use the with statement to make that code cleaner:
from StringIO import *
with open('1.bmp', 'r') as file:
    print file.read(), '\n'
with open('1.bmp', 'r') as file2:
    print StringIO(file2.read()).getvalue()
As an aside, I would recommend opening binary files in binary mode: open('1.bmp', 'rb')

The second file.read() actually returns just an empty string. You should do file.seek(0) to rewind the internal file offset.
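A minimal Python 3 sketch of the rewind. It uses io.BytesIO as a stand-in for the opened file, and the sample bytes are made up for illustration:

```python
import io

# Stand-in for an opened binary file; the bytes are invented for this sketch.
stream = io.BytesIO(b"BM\x00\x01\x02")

first = stream.read()    # reads everything and leaves the offset at the end
second = stream.read()   # nothing left to read, so this is empty
stream.seek(0)           # rewind the offset to the start
third = stream.read()    # the full contents again

print(first == third, second)
```

The same seek(0) call works on a real file object returned by open().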

Shouldn't you be using "rb" to open instead of just "r"? Text mode assumes you are reading text: on Windows it translates line endings and stops at the first EOF (Ctrl-Z) byte, either of which can mangle binary data.

Related

Replacing a line in an already opened file python [duplicate]

I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.
What is the best way to do this, within the following code?
f = open(file)
for line in f:
    if 'foo' in line:
        newline = line.replace('foo', 'bar')
        # how to write this newline back to the file

The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:
import fileinput
for line in fileinput.input("test.txt", inplace=True):
    print('{} {}'.format(fileinput.filelineno(), line), end='') # for Python 3
    # print "%d: %s" % (fileinput.filelineno(), line), # for Python 2
What happens here is:
The original file is moved to a backup file
The standard output is redirected to the original file within the loop
Thus any print statements write back into the original file
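The steps above can be observed with a small Python 3 sketch. The sample file and its contents are made up; passing backup=".bak" asks fileinput to keep the moved original instead of deleting it when it finishes:

```python
import fileinput
import os
import tempfile

# Sample file, created only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as f:
    f.write("foo\nbaz\n")

# backup=".bak" keeps the moved original as test.txt.bak.
with fileinput.input(path, inplace=True, backup=".bak") as lines:
    for line in lines:
        # stdout is redirected here, so this is written into test.txt
        print(line.replace("foo", "bar"), end="")

with open(path) as f:
    new_text = f.read()
with open(path + ".bak") as f:
    old_text = f.read()
```

After the loop, new_text holds the rewritten contents and old_text the untouched original.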
fileinput has more bells and whistles. For example, it can be used to automatically operate on all files in sys.argv[1:], without your having to iterate over them explicitly. Starting with Python 3.2 it also provides a convenient context manager for use in a with statement.
While fileinput is great for throwaway scripts, I would be wary of using it in real code because admittedly it's not very readable or familiar. In real (production) code it's worthwhile to spend just a few more lines of code to make the process explicit and thus make the code readable.
There are two options:
The file is not overly large, and you can just read it wholly to memory. Then close the file, reopen it in writing mode and write the modified contents back.
The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line, writing back into the original file. Note that this requires twice the storage.

I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:
from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove
def replace(file_path, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh,'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    #Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    #Remove original file
    remove(file_path)
    #Move new file
    move(abs_path, file_path)
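A possible Python 3 variation on the same idea, offered as a sketch rather than a drop-in replacement. It creates the temporary file in the same directory as the target and swaps it in with os.replace, so there is no separate remove step. The name replace_in_place is hypothetical:

```python
import os
import shutil
import tempfile


def replace_in_place(file_path, pattern, subst):
    """Rewrite file_path with every literal `pattern` replaced by `subst`."""
    directory = os.path.dirname(os.path.abspath(file_path))
    # Create the temp file next to the original so both are on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as new_file, open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
        shutil.copymode(file_path, tmp_path)  # keep the original permissions
        os.replace(tmp_path, file_path)       # swap the new file into place
    except Exception:
        os.remove(tmp_path)  # clean up the temp file if anything failed
        raise
```

Usage would look like replace_in_place("config.txt", "foo", "bar"), where config.txt is a placeholder file name.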

Here's another example that was tested. It matches and replaces literal substrings (the in operator and str.replace do not interpret regular expressions):
import fileinput
import sys
def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)
Example use:
replaceAll("/fooBar.txt","Hello World!","Goodbye World.")
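Note that str.replace treats its first argument as a literal substring, while re.sub interprets it as a regular expression. A small Python 3 sketch of the difference, using a made-up sample line:

```python
import re

line = "Hello World!\n"

# str.replace looks for the literal characters backslash and "s": no match here.
literal = line.replace(r"Hello\sWorld!", "Goodbye World.")

# re.sub treats \s as "any whitespace character", so this one matches.
regex = re.sub(r"Hello\sWorld!", "Goodbye World.", line)

print(repr(literal))
print(repr(regex))
```

If regular expressions are wanted inside a helper like replaceAll, re.sub would have to take the place of line.replace.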

This should work: (inplace editing)
import fileinput
# Does a list of files, and
# redirects STDOUT to the file in question
for line in fileinput.input(files, inplace = 1):
    print line.replace("foo", "bar"),

Based on the answer by Thomas Watnedal.
This does not exactly answer the line-by-line part of the original question, but the function can still replace on a line-by-line basis.
This implementation replaces the file contents without using temporary files; as a consequence, file permissions remain unchanged.
Using re.sub instead of replace also allows regex replacement instead of plain-text replacement only.
Reading the file as a single string instead of line by line allows for multiline match and replacement.
import re
def replace(file, pattern, subst):
    # Read contents from file as a single string
    file_handle = open(file, 'r')
    file_string = file_handle.read()
    file_handle.close()
    # Use RE package to allow for replacement (also allowing for (multiline) REGEX)
    file_string = (re.sub(pattern, subst, file_string))
    # Write contents to file.
    # Using mode 'w' truncates the file.
    file_handle = open(file, 'w')
    file_handle.write(file_string)
    file_handle.close()

As lassevk suggests, write out the new file as you go, here is some example code:
fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
    fout.write( line.replace('foo', 'bar') )
fin.close()
fout.close()

If you want a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you're a fan of regexes:
import re
def replace( filePath, text, subs, flags=0 ):
    with open( filePath, "r+" ) as file:
        fileContents = file.read()
        textPattern = re.compile( re.escape( text ), flags )
        fileContents = textPattern.sub( subs, fileContents )
        file.seek( 0 )
        file.truncate()
        file.write( fileContents )

A more pythonic way would be to use context managers like the code below:
from tempfile import mkstemp
from shutil import move
from os import fdopen, remove
def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    # Wrap the descriptor returned by mkstemp so it is closed with the file
    with fdopen(fh, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)

fileinput is quite straightforward as mentioned on previous answers:
import fileinput
def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as file:
        for line in file:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')
Explanation:
fileinput can accept multiple files, but I prefer to close each file as soon as it has been processed, so I pass a single file_path in the with statement.
The print statement does not print anything to the console when inplace=True, because STDOUT is redirected to the original file.
end='' in the print statement prevents extra blank lines, since each line already ends with a newline.
You can use it as follows:
file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')

Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.

Expanding on @Kiran's answer, which I agree is more succinct and Pythonic, this adds codecs to support the reading and writing of UTF-8:
import codecs
from tempfile import mkstemp
from shutil import move
from os import close, remove
def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    close(fh)  # only the path is needed; close the descriptor from mkstemp
    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)

Using hamishmcn's answer as a template, I was able to search for lines in a file that match my regex and replace the matches with an empty string.
import re
fin = open("in.txt", 'r') # in file
fout = open("out.txt", 'w') # out file
p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]') # pattern, compiled once
for line in fin:
    newline = p.sub('',line) # replace matching strings with empty string
    print newline
    fout.write(newline)
fin.close()
fout.close()

If you remove the indent on the line as shown below, it will search and replace in multiple lines.
See below for an example.
from tempfile import mkstemp
from shutil import move
from os import close, remove
def replace(file, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    print fh, abs_path
    new_file = open(abs_path,'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    #close temp file
    new_file.close()
    close(fh)
    old_file.close()
    #Remove original file
    remove(file)
    #Move new file
    move(abs_path, file)

Read txt files, results in empty lines

I have a problem opening and reading a txt file in Python. The txt file contains text (cat text.txt works fine in the terminal), but in Python I only get 5 empty lines.
print open('text.txt').read()
Do you know why?

I solved it. It was a UTF-16 file.
print open('text.txt').read().decode('utf-16-le')
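In Python 3 the decoding is usually requested when opening the file rather than afterwards. A minimal sketch, assuming a sample file written with a byte order mark (the file and its contents are made up; for a file without a BOM, 'utf-16-le' as in the line above would be the codec to pass):

```python
import os
import tempfile

# Write a small UTF-16 sample file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "text.txt")
with open(path, "w", encoding="utf-16") as f:
    f.write("hello\nworld\n")

# Python 3: pass the encoding to open() instead of calling .decode() afterwards.
with open(path, encoding="utf-16") as f:
    text = f.read()

print(text)
```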

If this prints the lines in your file, then perhaps the file your program is selecting is empty? I don't know, but try this:
import tkinter as tk
from tkinter import filedialog
import os
def fileopen():
    GUI=tk.Tk()
    filepath=filedialog.askopenfilename(parent=GUI,title='Select file to print lines.')
    (GUI).destroy()
    return (filepath)
filepath = fileopen()
filepath = os.path.normpath(filepath)
with open (filepath, 'r') as fh:
    print (fh.read())
or alternatively, using this method of printing lines:
fh = open(filepath, 'r')
for line in fh:
    line=line.rstrip('\n')
    print (line)
fh.close()
or if you want the lines loaded into a list of strings:
lines = []
fh = open(filepath, 'r')
for line in fh:
    line=line.rstrip('\n')
    lines.append(line)
fh.close()
for line in lines:
    print (line)

When you open a file, I think it is clearer to specify how you want to open it (reading, 'r', is already the default mode). In your example you would open it for reading like this:
print open('text.txt',"r").read()
Hope this does the trick.

How can I tell python to edit another python file?

Right now, I have file.py and it prints the word "Hello" into text.txt.
f = open("text.txt", "w")
f.write("Hello")
f.close()
I want to do the same thing, but I want to print the word "Hello" into a Python file. Say I wanted to do something like this:
f = open("list.py", "w")
f.write("a = 1")
f.close()
When I opened the file list.py, would it have a variable a with a value 1? How would I go about doing this?

If you want to append a new line to the end of a file:
with open("file.py", "a") as f:
    f.write("\na = 1")
If you want to write a line to the beginning of a file, read the existing lines and rewrite the file:
with open("file.py") as f:
    lines = f.readlines()
with open("file.py", "w") as f:
    lines.insert(0, "a = 1\n")  # readlines() keeps each line's trailing newline
    f.writelines(lines)

with open("list.py","a") as f:
    f.write("a=1")
This is simple, as you can see. You have to open the file in append mode ("a"), which writes at the end of the file and creates it if it does not exist. Using with open() is also safer and clearer.
Example (starting from an empty or missing list.py):
with open("list.py","a") as f:
    f.write("a=1")
    f.write("\nprint(a+1)")
list.py
a=1
print(a+1)
Output from list.py:
>>>
2
>>>
As you can see, list.py now contains a variable called a equal to 1.

I would recommend you specify the opening mode when you open a file for reading, writing, etc. For example:
For reading:
with open('afile.txt', 'r') as f: # 'r' is the reading mode
    text = f.read()
For writing:
with open('afile.txt', 'w') as f: # 'w' is the writing mode
    f.write("Some text")
If you open a file in 'w' (writing) mode, the old file content is removed. To avoid that, use appending mode:
with open('afile.txt', 'a') as f: # 'a' is the appending mode
    f.write("additional text")
For more information, please read the documentation.
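A short Python 3 sketch of how the modes interact, using a temporary file so nothing real is overwritten (the file name and contents are made up):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "afile.txt")

with open(path, "w") as f:   # 'w' creates the file
    f.write("first\n")
with open(path, "w") as f:   # 'w' again: the previous content is discarded
    f.write("second\n")
with open(path, "a") as f:   # 'a': written after the existing content
    f.write("third\n")

with open(path, "r") as f:   # 'r': read back what is left
    result = f.read()

print(result)
```

Only the text from the second 'w' write and the 'a' write survives.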

Replace and overwrite instead of appending

I have the following code:
import re
#open the xml file for reading:
file = open('path/test.xml','r+')
#convert to string:
data = file.read()
file.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
file.close()
where I'd like to replace the old content that's in the file with the new content. However, when I execute my code, the file "test.xml" is appended to, i.e. I have the old content followed by the new "replaced" content. What can I do in order to delete the old stuff and only keep the new?

You need to seek to the beginning of the file before writing, and then use file.truncate() if you want to do an in-place replace:
import re
myfile = "path/test.xml"
with open(myfile, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
The other way is to read the file then open it again with open(myfile, 'w'):
with open(myfile, "r") as f:
    data = f.read()
with open(myfile, "w") as f:
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
Neither truncate nor open(..., 'w') will change the inode number of the file (I tested twice, once with Ubuntu 12.04 NFS and once with ext4).
By the way, this is not really related to Python. The interpreter calls the corresponding low level API. The method truncate() works the same in the C programming language: See http://man7.org/linux/man-pages/man2/truncate.2.html
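To see why the original code appended, here is a small Python 3 sketch that contrasts writing with and without the rewind. The helper name rewrite and the sample contents are hypothetical:

```python
import os
import tempfile


def rewrite(path, new_text, rewind):
    """Read the file in r+ mode, then write new_text, optionally rewinding first."""
    with open(path, "r+") as f:
        f.read()              # the offset is now at the end of the file
        if rewind:
            f.seek(0)         # go back to the start before writing
        f.write(new_text)
        if rewind:
            f.truncate()      # drop whatever is left of the old content


path = os.path.join(tempfile.mkdtemp(), "test.xml")

with open(path, "w") as f:
    f.write("old content here")
rewrite(path, "new", rewind=False)
with open(path) as f:
    appended = f.read()       # old content followed by the new text

with open(path, "w") as f:
    f.write("old content here")
rewrite(path, "new", rewind=True)
with open(path) as f:
    replaced = f.read()       # only the new text
```

Without truncate(), a replacement shorter than the original would leave the tail of the old content behind.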

file='path/test.xml'
with open(file, 'w') as filetowrite:
    filetowrite.write('new content')
Open the file in 'w' mode; this replaces its current text, so the file is saved with only the new contents.

Using truncate(), the solution could be
import re
#open the xml file for reading:
with open('path/test.xml','r+') as f:
    #convert to string:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
    f.truncate()

import os  # must import this library
if os.path.exists('TwitterDB.csv'):
    os.remove('TwitterDB.csv')  # this deletes the file
else:
    print("The file does not exist")  # add this to prevent errors
I had a similar problem, and instead of overwriting my existing file using the different 'modes', I just deleted the file before using it again, so that it would be as if I was appending to a new file on each run of my code.

See How to Replace String in File, which works in a simple way and is an answer that uses replace:
fin = open("data.txt", "rt")
fout = open("out.txt", "wt")
for line in fin:
    fout.write(line.replace('pyton', 'python'))
fin.close()
fout.close()

In my case the following code did the trick:
import json
with open("output.json", "w+") as outfile: # w+ mode creates the file if it does not exist and overwrites any existing content
    json.dump(result_plot, outfile)

Using python3 pathlib library:
import re
from pathlib import Path
import shutil
shutil.copy2("/tmp/test.xml", "/tmp/test.xml.bak") # create backup
filepath = Path("/tmp/test.xml")
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
A similar method using a different approach to backups:
import re
from pathlib import Path
filepath = Path("/tmp/test.xml")
backup = filepath.with_suffix('.bak')
filepath.rename(backup) # different approach to backups: move the original to /tmp/test.bak
content = backup.read_text() # the original path no longer exists after the rename
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))

Python command isn't reading a .txt file

Trying to follow the guide here, but it's not working as expected. I'm sure I'm missing something.
http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
file = open("C:/Test.txt", "r");
print file
file.read()
file.read()
file.read()
file.read()
file.read()
file.read()
Using the readline() method gives the same results.
file.readline()
The output I get is:
<open file 'C:/Test.txt', mode 'r' at 0x012A5A18>
Any suggestions on what might be wrong?

Nothing's wrong there. file is an object, which you are printing.
Try this:
file = open('C:/Test.txt', 'r')
for line in file.readlines(): print line,

print file invokes the file object's __repr__() function, which in this case is defined to return just what is printed. To print the file's contents, you must read() the contents into a variable (or pass it directly to print). Also, file is a built-in type in Python, and by using file as a variable name, you shadow the built-in, which is almost certainly not what you want. What you want is this:
infile = open('C:/test.txt', 'r')
print infile.read()
infile.close()
Or
infile = open('C:/test.txt', 'r')
file_contents = infile.read()
print file_contents
infile.close()
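For Python 3, a minimal sketch of the same distinction between the file object and its contents. The sample file is made up, and the exact repr text is an implementation detail of CPython:

```python
import os
import tempfile

# Sample file, created only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as f:
    f.write("line one\nline two\n")

with open(path, "r") as infile:
    description = repr(infile)   # what printing the object itself shows
    contents = infile.read()     # the actual text in the file

print(description)
print(contents)
```

The with block also closes the file, so no explicit close() call is needed.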

print file.read()

You have to read the file first!
file = open("C:/Test.txt", "r")
foo = file.read()
print(foo)
You can also write:
file = open("C:/Test.txt", "r").read()
print(file)
