Write() starts halfway through - python

I am new to Python and I really don't understand why this is happening: when I run my code, lower() is only applied to half (or less) of the text file. How can I fix this?
import glob, os, string, re

list_of_files = glob.glob("/Users/louis/Downloads/assignment/data2/**/*.txt")
for file_name in list_of_files:
    f = open(file_name, 'r+')
    for line in f:
        line = line.lower()
        f.write(line)

The problem is most probably that you are reading and writing through the same file handle at the same time. You also need to return to the start of the file to write in place of the original content. Try this:
for file_name in list_of_files:
    with open(file_name, 'r+') as f:
        content = f.read().lower()
        f.seek(0, 0)  # return to the start of the file
        f.write(content)
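This works here because lower() never changes the length of the text. As an aside, if the rewritten content could end up shorter than the original, you would also want to truncate after writing; a minimal sketch of that variant:
for file_name in list_of_files:
    with open(file_name, 'r+') as f:
        content = f.read().lower()
        f.seek(0)
        f.write(content)
        f.truncate()  # drop any leftover bytes if the new content is shorter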

Related

how to save output file.txt after hashing sha256 in python

I am trying to compute SHA-256 hashes using Python.
I get results like the ones below, but I don't know how to save the hash results to file.txt. Please help me edit the code.
import hashlib

with open('x.txt') as f:
    for line in f:
        line = line.strip()
        print(hashlib.sha256(line.encode()).hexdigest())
9869f9826306f436e1a8581b7ce467d38bab6e280839fd88fd51d45de39b6409
b51d80a47274161a0eeb44cfa1586ee9c4bc3d33740895a4d688f9090e24d8c2
f0f2ea3096f72e0d6916f9febd912a17fd9c91e83dd9e558967e21329dfbe393
4799d169d99c206ae68fe96c67736d88b6976c1a47ce2383ced8de9edf41ade9
2a68d417af41750b17a1b65b2002c5875b2b40232d54f7566e1fc51f5395b9f9
826c4d573dc5766eb44461f76ce0ca08487e9d5894583214b7c52bdf032039c4
The results look like this (screenshot): https://i.stack.imgur.com/DBj1p.png
You just need to open the output file with the 'w' flag and write to it.
See the documentation on Reading and Writing Files.
import hashlib

with open('x.txt') as f_in, open('file.txt', 'w') as f_out:
    for line in f_in:
        line = line.strip()
        f_out.write(hashlib.sha256(line.encode()).hexdigest() + '\n')  # newline so each hash goes on its own line
P.S. You're not hashing the lines exactly as they appear in the file: line.strip() means each line is hashed without its leading/trailing spaces, tabs, and newlines.
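If the goal is actually a single hash of the whole file, rather than one hash per stripped line, a minimal sketch that reads the file as raw bytes (so nothing is lost to stripping) could look like this:
import hashlib

h = hashlib.sha256()
with open('x.txt', 'rb') as f_in:
    # read in chunks so large files do not have to fit in memory
    for chunk in iter(lambda: f_in.read(8192), b''):
        h.update(chunk)

with open('file.txt', 'w') as f_out:
    f_out.write(h.hexdigest() + '\n')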
You have one file ('x.txt') opened to read line by line, so you'll need to open another file to write your output.
You could write line by line:
import hashlib

with open("x.txt") as f:
    with open("file.txt", "w") as outfile:
        for line in f:
            line = line.strip()
            hash = hashlib.sha256(line.encode()).hexdigest()
            outfile.write(hash + "\n")
Or you could define a list, append every line, and write once:
import hashlib

hashes = []
with open("x.txt") as f:
    for line in f:
        line = line.strip()
        hash = hashlib.sha256(line.encode()).hexdigest()
        hashes.append(hash + "\n")

with open("file.txt", "w") as outfile:
    outfile.writelines(hashes)
Note that you need to append "\n" so each hash ends up on its own line.
import hashlib

txt_to_write = ""
with open('x.txt') as f:
    for line in f:
        line = line.strip()
        txt_to_write += hashlib.sha256(line.encode()).hexdigest() + "\n"

with open('readme.txt', 'w') as f:
    f.write(txt_to_write)
You have to create a file and write into it.
You put the result of your hash in a variable and then you write it.
Here the file will be created in your current working directory.

How to replace a string in every file in a directory

I'm trying to use Python to replace the string "XXXXXXXXXXX" with a new string, "ILoveStackOverflow", in every file in a particular folder (all the files in the folder are XML).
My code is as follows:
import os, fnmatch

for filename in os.listdir("C:/Users/Francesco.Borri/Desktop/passivo GME"):
    if filename.endswith('.xml'):
        with open(os.path.join("C:/Users/Francesco.Borri/Desktop/passivo GME", filename)) as f:
            content = f.read()
        content = content.replace("XXXXXXXXXXX", "ILoveStackOverflow")
        with open(os.path.join("C:/Users/Francesco.Borri/Desktop/passivo GME", filename), mode="w") as f:  # Long Pierre-André answer
            f.write(content)
The next step would be to replace a different string, "YYYY", with a number that increases every time. If there are 10 files in my directory and I set the starting number to 1, "YYYY" in the first file will be replaced with 1, in the second file with 2, and so on up to 10.
You are close. When you open the file the second time, you have to open it in write mode to be able to write the content.
with open(os.path.join("C:/Users/Francesco.Borri/Desktop/passivo GME", filename), 'w') as f:
    f.write(content)
Once you fix this, I think the second part of your question is just a matter of maintaining a variable whose value you increment every time you replace the string. You could do it manually (iterate over the string), or use the replace function in a for loop:
with open(os.path.join("C:/Users/Francesco.Borri/Desktop/passivo GME", filename)) as f:
    content = f.read()

for i in range(content.count("YYYY")):
    content = content.replace("YYYY", str(i), 1)  # or str(i + 1); reassign, since strings are immutable
with open(os.path.join("C:/Users/Francesco.Borri/Desktop/passivo GME", filename), 'w') as f:
    f.write(content)
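If the intent is instead one number per file (the first file gets 1, the second gets 2, and so on), a minimal sketch, assuming the same folder path as above, could keep the counter outside the loop over files:
import os

folder = "C:/Users/Francesco.Borri/Desktop/passivo GME"
counter = 1  # starting number

for filename in sorted(os.listdir(folder)):
    if filename.endswith('.xml'):
        path = os.path.join(folder, filename)
        with open(path) as f:
            content = f.read()
        content = content.replace("XXXXXXXXXXX", "ILoveStackOverflow")
        content = content.replace("YYYY", str(counter))  # same number everywhere in this file
        with open(path, 'w') as f:
            f.write(content)
        counter += 1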
with open(os.path.join("C:/Users/Francesco.Borri/Desktop/passivo GME", filename), mode="w") as f:
You must open the file in write mode.

On opening a data file to read the lines included in that file

I am using the following code segment to partition a data file into two parts:
def shuffle_split(infilename, outfilename1, outfilename2):
    with open(infilename, 'r') as f:
        lines = f.readlines()
    lines[-1] = lines[-1].rstrip('\n') + '\n'
    shuffle(lines)
    with open(outfilename1, 'w') as f:
        f.writelines(lines[:90000])
    with open(outfilename2, 'w') as f:
        f.writelines(lines[90000:])
    outfilename1.close()
    outfilename2.close()

shuffle_split(data_file, training_file, validation_file)
Running this code segment causes the following error:
in shuffle_split
    with open(infilename, 'r') as f:
TypeError: coercing to Unicode: need string or buffer, file found
What's wrong with the way of opening the data_file for input?
Whatever you're passing in as infilename is already a file, rather than a file's path name.
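In other words, shuffle_split should be called with path strings, not already-opened file objects. A minimal sketch (the file names below are just placeholders):
# Pass path strings, not file objects, into shuffle_split.
data_file = "data.txt"            # hypothetical paths
training_file = "training.txt"
validation_file = "validation.txt"

shuffle_split(data_file, training_file, validation_file)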

Replacing a line in an already opened file python [duplicate]

I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.
What is the best way to do this, within the following code?
f = open(file)
for line in f:
    if line.contains('foo'):
        newline = line.replace('foo', 'bar')
        # how to write this newline back to the file
The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:
import fileinput

for line in fileinput.input("test.txt", inplace=True):
    print('{} {}'.format(fileinput.filelineno(), line), end='')  # for Python 3
    # print "%d: %s" % (fileinput.filelineno(), line),           # for Python 2
What happens here is:
The original file is moved to a backup file
The standard output is redirected to the original file within the loop
Thus any print statements write back into the original file
fileinput has more bells and whistles. For example, it can be used to automatically operate on all files in sys.argv[1:], without your having to iterate over them explicitly. Starting with Python 3.2 it also provides a convenient context manager for use in a with statement.
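For example, a minimal sketch of that form, numbering the lines of every file named on the command line in place:
import fileinput

# With no file list, fileinput.input() falls back to the files in sys.argv[1:].
with fileinput.input(inplace=True) as f:
    for line in f:
        print('{} {}'.format(fileinput.filelineno(), line), end='')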
While fileinput is great for throwaway scripts, I would be wary of using it in real code because admittedly it's not very readable or familiar. In real (production) code it's worthwhile to spend just a few more lines of code to make the process explicit and thus make the code readable.
There are two options:
The file is not overly large, and you can just read it wholly into memory. Then close the file, reopen it in write mode and write the modified contents back (see the sketch after this list).
The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line while writing back into the original file. Note that this requires twice the storage.
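A minimal sketch of the first option, assuming the file fits comfortably in memory (the file name and the search/replace strings are placeholders):
# Option 1: read the whole file, then rewrite it.
with open("test.txt") as f:
    content = f.read()

content = content.replace("foo", "bar")

with open("test.txt", "w") as f:  # 'w' truncates the file before writing
    f.write(content)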
I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:
from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove

def replace(file_path, pattern, subst):
    # Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh, 'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    # Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    # Remove original file
    remove(file_path)
    # Move new file
    move(abs_path, file_path)
Here's another example that was tested, and will match search & replace patterns:
import fileinput
import sys

def replaceAll(file, searchExp, replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp, replaceExp)
        sys.stdout.write(line)
Example use:
replaceAll("/fooBar.txt","Hello\sWorld!$","Goodbye\sWorld.")
This should work (in-place editing):
import fileinput

# Operates on a list of files and
# redirects stdout to the file in question
for line in fileinput.input(files, inplace=1):
    print(line.replace("foo", "bar"), end='')  # Python 2 form: print line.replace("foo", "bar"),
Based on the answer by Thomas Watnedal.
However, this does not exactly answer the line-to-line part of the original question; the function can still replace on a line-by-line basis.
This implementation replaces the file contents without using temporary files; as a consequence, file permissions remain unchanged.
Also, using re.sub instead of replace allows regex replacement instead of plain-text replacement only.
Reading the file as a single string instead of line by line allows for multiline matching and replacement.
import re

def replace(file, pattern, subst):
    # Read contents from file as a single string
    file_handle = open(file, 'r')
    file_string = file_handle.read()
    file_handle.close()

    # Use the re module to allow for replacement (also allowing for (multiline) regex)
    file_string = re.sub(pattern, subst, file_string)

    # Write contents back to the file.
    # Using mode 'w' truncates the file.
    file_handle = open(file, 'w')
    file_handle.write(file_string)
    file_handle.close()
As lassevk suggests, write out the new file as you go; here is some example code:
fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
fout.write( line.replace('foo', 'bar') )
fin.close()
fout.close()
If you want a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you're a fan of regexes:
import re

def replace(filePath, text, subs, flags=0):
    with open(filePath, "r+") as file:
        fileContents = file.read()
        textPattern = re.compile(re.escape(text), flags)
        fileContents = textPattern.sub(subs, fileContents)
        file.seek(0)
        file.truncate()
        file.write(fileContents)
A more pythonic way would be to use context managers like the code below:
from tempfile import mkstemp
from shutil import move
from os import remove

def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    with open(target_file_path, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
fileinput is quite straightforward, as mentioned in previous answers:
import fileinput

def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as file:
        for line in file:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')
Explanation:
fileinput can accept multiple files, but I prefer to close each file as soon as it has been processed, so I pass a single file_path to the with statement.
The print statement does not print anything to the console when inplace=True, because stdout is redirected to the original file.
end='' in the print statement eliminates intermediate blank lines.
You can use it as follows:
file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')
Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.
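A minimal sketch of that approach (the file names and the search/replace strings below are just placeholders):
# Copy lines from the old file to a new file, replacing as you go.
with open("old.txt") as src, open("new.txt", "w") as dst:
    for line in src:
        dst.write(line.replace("foo", "bar"))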
Expanding on #Kiran's answer, which I agree is more succinct and Pythonic, this adds codecs to support the reading and writing of UTF-8:
import codecs
from tempfile import mkstemp
from shutil import move
from os import remove

def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
Using hamishmcn's answer as a template, I was able to search for lines in a file that match my regex and replace them with an empty string.
import re

fin = open("in.txt", 'r')    # input file
fout = open("out.txt", 'w')  # output file

p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]')  # pattern

for line in fin:
    newline = p.sub('', line)  # replace matching strings with an empty string
    print(newline)
    fout.write(newline)

fin.close()
fout.close()
If you remove the indentation as shown below, it will search and replace across multiple lines.
See the example below.
from tempfile import mkstemp
from shutil import move
from os import close, remove

def replace(file, pattern, subst):
    # Create temp file
    fh, abs_path = mkstemp()
    print(fh, abs_path)
    new_file = open(abs_path, 'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    # Close temp file
    new_file.close()
    close(fh)
    old_file.close()
    # Remove original file
    remove(file)
    # Move new file
    move(abs_path, file)

Replace and overwrite instead of appending

I have the following code:
import re
#open the xml file for reading:
file = open('path/test.xml','r+')
#convert to string:
data = file.read()
file.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
file.close()
where I'd like to replace the old content that's in the file with the new content. However, when I execute my code, the file "test.xml" is appended to, i.e. I have the old content followed by the new "replaced" content. What can I do in order to delete the old stuff and keep only the new?
You need to seek to the beginning of the file before writing, and then use file.truncate() if you want to replace the content in place:
import re

myfile = "path/test.xml"

with open(myfile, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
The other way is to read the file then open it again with open(myfile, 'w'):
with open(myfile, "r") as f:
data = f.read()
with open(myfile, "w") as f:
f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
Neither truncate nor open(..., 'w') will change the inode number of the file (I tested this twice, once with Ubuntu 12.04 over NFS and once with ext4).
By the way, this is not really related to Python; the interpreter calls the corresponding low-level API. The truncate() method works the same in the C programming language: see http://man7.org/linux/man-pages/man2/truncate.2.html
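If you want to verify this yourself, a minimal sketch using os.stat (the path is a placeholder):
import os

# Compare the inode number before and after rewriting the file in 'w' mode.
ino_before = os.stat("path/test.xml").st_ino
with open("path/test.xml", "w") as f:
    f.write("new content")
ino_after = os.stat("path/test.xml").st_ino

print(ino_before == ino_after)  # True: the file was truncated in place, not recreated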
file = 'path/test.xml'

with open(file, 'w') as filetowrite:
    filetowrite.write('new content')
Open the file in 'w' mode and you will be able to replace its current text, saving the file with the new contents.
Using truncate(), the solution could be:
import re

# open the xml file for reading:
with open('path/test.xml', 'r+') as f:
    # convert to string:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
import os  # must import this library

if os.path.exists('TwitterDB.csv'):
    os.remove('TwitterDB.csv')  # this deletes the file
else:
    print("The file does not exist")  # add this to prevent errors
I had a similar problem, and instead of overwriting my existing file using the different 'modes', I just deleted the file before using it again, so that it would be as if I was appending to a new file on each run of my code.
See How to Replace String in File; it works in a simple way and is an answer that works with replace:
fin = open("data.txt", "rt")
fout = open("out.txt", "wt")
for line in fin:
fout.write(line.replace('pyton', 'python'))
fin.close()
fout.close()
In my case, the following code did the trick:
with open("output.json", "w+") as outfile: #using w+ mode to create file if it not exists. and overwrite the existing content
json.dump(result_plot, outfile)
Using the Python 3 pathlib library:
import re
import shutil
from pathlib import Path

shutil.copy2("/tmp/test.xml", "/tmp/test.xml.bak")  # create backup

filepath = Path("/tmp/test.xml")
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
A similar method using a different approach to backups:
import re
from pathlib import Path

filepath = Path("/tmp/test.xml")
backup = filepath.with_suffix('.bak')  # different approach to backups
filepath.rename(backup)
content = backup.read_text()  # read from the backup, since the original path no longer exists after the rename
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
