Python - Translate a file and keep original paragraph spacing

I have this project I am working on but need help. My main goal is to make the translated text file look the same as the original file with the exception of the translated words.
Here is what a snippet of the original file looks like:
Original Text File
Here is my python code:
# Step 1: Import the english.txt file
import json
english_text = open('/home/jovyan/english_to_lolspeak_fellow/english.txt', 'r')
text = english_text.readlines()
english_text.close()
# Step 2: Import the glossary (the tranzlashun.json file)
with open('/home/jovyan/english_to_lolspeak_fellow/tranzlashun.json') as translationFile:
    data = json.load(translationFile)
# Step 3: Translate the English text into Lolspeak
translated_text = ''
for line in text:
    for word in line.split():
        if word in data:
            translated_text += data[word.lower()] + " "
        else:
            translated_text += word.lower() + " "
            pass
# Step 4: Save the translated text as the "lolcat.txt" file
with open('/home/jovyan/english_to_lolspeak_fellow/lolcat.txt', 'w') as lolcat_file:
    lolcat_file.write(translated_text)
lolcat_file.close()
And lastly, here is what my output looks like:
Output Translated File
As you can see, I was able to translate the file but the original spacing is ignored. How do I change my code to keep the spacing as it was before?

You can keep the spaces by reading one line at a time.
with open('lolcat.txt', 'w') as fw, open('english.txt') as fp:
    for line in fp:
        for word in line.split():
            line = line.replace(word, data.get(word.lower(), word))
        fw.write(line)
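One caveat with str.replace on the whole line is that it also rewrites matches inside longer words (translating "cat" would alter "catalog"). Here is a minimal sketch of an alternative, assuming the glossary is a plain dict with lowercase keys as in the question. The names translate_line and glossary are made up for illustration. It splits each line on runs of whitespace with a capturing group, so the original spacing comes back unchanged when the pieces are joined.

```python
import re

def translate_line(line, glossary):
    """Translate each word of `line`, leaving all whitespace untouched."""
    # re.split with a capturing group keeps the whitespace runs in the result
    parts = re.split(r'(\s+)', line)
    return ''.join(
        # whitespace runs and empty pieces pass through; words are looked up
        part if part.isspace() or not part else glossary.get(part.lower(), part)
        for part in parts
    )

glossary = {"hello": "oh hai", "cat": "kitteh"}  # hypothetical sample glossary
print(repr(translate_line("Hello   my cat\n", glossary)))
```

Words that are not in the glossary keep their original case here. Call .lower() on the fallback if you want the lowercasing from the question's code.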

I'd suggest combining steps 3 and 4: translate each line, write it, and then write \n to start the next line.
I haven't run the following, so you might have to modify it to get it to work.
Note that I changed 'w' to 'a' so the file is appended to instead of overwritten, and as far as I know using 'with' closes the file for you, so you don't need the explicit close().
for line in text:
    translated_line = ""
    for word in line.split():
        if word.lower() in data:
            translated_line += data[word.lower()] + " "
        else:
            translated_line += word.lower() + " "
    with open('/home/jovyan/english_to_lolspeak_fellow/lolcat.txt', 'a') as lolcat_file:
        lolcat_file.write(translated_line)
        lolcat_file.write("\n")

Related

how to write a line to read a certain sentence in a text file?

I am currently learning how to code in Python and I need help with something.
Is there a way to make the script read only the lines that start with Text = ..?
The text file has a lot of other lines, but I only want the program to focus on the lines that start with Text = .. and print them out, ignoring everything else in the file.
for example,
in text file:
min = 32.421
text = " Hello I am Robin and I am hungry"
max = 233341.42
how I want my output to be:
Hello I am Robin and I am hungry
I want the output to just solely be the sentence so without the " " and text =
This is my code so far after reading through comments!
import os
import sys
import glob
from english_words import english_words_set
try:
    print('Finding file...')
    file = glob.glob(sys.argv[1])
    print("Found " + str(len(file)) + " file!")
    print('LOADING NOW...')
    with open(file) as f:
        lines = f.read()
        for line in lines:
            if line.startswith('Text = '):
                res = line.split('"')[1]
                print(res)

You can read the text file and read its lines like so :
# open file
with open('text_file.txt') as f:
    # store the list of lines contained in file
    lines = f.readlines()
for line in lines:
    # find match
    if line.startswith('text ='):
        # store the string inside double quotes
        res = line.split('"')[1]
        print(res)
This should print your expected output.

You can open the file, check whether a line begins with the word "text", and then extract the value like this:
file = open("file.txt", "r") # open the file for reading; change file.txt to the file's path
for line in file: # for each line in file
    if line.startswith("text"): # check whether the line begins with "text"
        text = line.strip() # remove leading and trailing whitespace
        text = text.replace("text = \"", "") # remove the part before the string
        text = text.replace("\"", "") # remove the closing quote
        print(text)
Or you could convert the file to a format such as YAML or TOML. TOML can be parsed with the standard-library tomllib module (Python 3.11+), while YAML needs a third-party package such as PyYAML. Either way the data is loaded as a dictionary instead of a string, which is simpler than parsing the text yourself.

List comprehensions in python:
https://www.youtube.com/watch?v=3dt4OGnU5sM
Using list comprehension with files:
https://www.youtube.com/watch?v=QHFWb_6fHOw
First learn list comprehensions, then the idea is this:
listOutput = ['''min = 32.421
text = "Hello I am Robin and I am hungry"
max = 233341.42''']
myText = ''.join(listOutput)
indexFirst= myText.find("text") + 8 # add 8 to this index to discard {text = "}
indexLast = myText.find('''"''', indexFirst) # locate next quote since indexFirst position
print(myText[indexFirst:indexLast])
Output:
Hello I am Robin and I am hungry

with open(file) as f:
    lines = f.read().split("\n")
prefix = "text = "
for line in lines:
    if line.startswith(prefix):
        # replaces the first occurrence of prefix and assigns it to result
        result = line.replace(prefix, '', 1)
        print(result)
Alternatively, you could use result = line.removeprefix(prefix), but removeprefix is only available in Python 3.9 and later.
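Removing only the prefix still leaves the surrounding double quotes, which the question wanted gone. A small sketch that handles both, assuming each value sits on a single line in the form text = "...". The helper name extract_text is hypothetical, and slicing is used instead of removeprefix so it also runs on Python versions before 3.9.

```python
def extract_text(lines, prefix="text = "):
    """Yield the value of every line that starts with `prefix`, without quotes."""
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            # slice off the prefix, then drop surrounding quotes and spaces
            yield line[len(prefix):].strip('"').strip()

sample = [
    'min = 32.421\n',
    'text = " Hello I am Robin and I am hungry"\n',
    'max = 233341.42\n',
]
for sentence in extract_text(sample):
    print(sentence)
```

Because a file object iterates line by line, extract_text(open_file) works the same way as with the sample list.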

How to save a string in a file as a quoted string?

I have a file file.md that I want to read and get it as a string.
Then I want to take that string and save it in another file, but as a string with quotes (and all). The reason is I want to transfer the content of my markdown file to a markdown string so that I can include it in html using the javascript marked library.
How can I do that using a python script?
Here's what I have tried so far:
with open('file.md', 'r') as md:
    text=""
    lines = md.readlines()
    for line in lines:
        line = "'" + line + "'" + '+'
        text = text + line
with open('file.txt', 'w') as txt:
    txt.write(text)
Input file.md
This is one line of markdown
This is another line of markdown
This is another one
Desired output: file.txt
"This is one line of markdown" +
"This is another line of markdown" +
(what should come here by the way to encode an empty line?)
"This is another one"

There are two things you need to pay attention to here.
First, it is clearer not to reassign the loop variable line while iterating over lines; build the result in a separate string instead.
Second, if you add more characters at the end of each line, they are placed after the end-of-line character and thus end up on the next line when you write to the new file. Instead, skip the last character of each line and add the line break manually.
If I understand you right, this should give you the wanted output:
with open('file.md', 'r') as md:
    text = ""
    lines = md.readlines()
    for line in lines:
        if line[-1] == "\n":
            text += "'" + line[:-1] + "'+\n"
        else:
            text += "'" + line + "'"
with open('file.txt', 'w') as txt:
    txt.write(text)
Note how the last line is treated differently from the others (no eol-char and no + sign).
text += ... adds more characters to the existing string.
This also works and might be a bit nicer, because it avoids the if-statement. You can remove the newline-character right at reading the content from file.md. In the end you skip the last two characters of your content, which is the + and the \n.
with open('file.md', 'r') as md:
    text = ""
    lines = [line.rstrip('\n') for line in md]
    for line in lines:
        text += "'" + line + "' +\n"
with open('file.txt', 'w') as txt:
    txt.write(text[:-2])
...and with using a formatter:
text += "'{}' +\n".format(line)
...checking for empty lines as you asked in the comments:
for line in lines:
    if line == '':
        text += '\n'
    else:
        text += "'{}' +\n".format(line)
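Since the goal is a JavaScript string, another option is to let json.dumps do the quoting, because it also escapes embedded quotes and backslashes that would break hand-built literals. This is only a sketch under a few assumptions: the function name to_js_concat is made up, each markdown line becomes one double-quoted literal, and the line break is kept as an escaped \n inside the literal (which is also one way to encode an empty line).

```python
import json

def to_js_concat(markdown_text):
    """Return a JavaScript string-concatenation expression for `markdown_text`."""
    lines = markdown_text.split('\n')
    # json.dumps produces a double-quoted literal with quotes/backslashes escaped;
    # every line except the last gets its newline back so the markdown survives
    literals = [
        json.dumps(line + '\n') if i < len(lines) - 1 else json.dumps(line)
        for i, line in enumerate(lines)
    ]
    return ' +\n'.join(literals)

md = 'This is one line\n\nSay "hi"'
print(to_js_concat(md))
```

To use it with files, read file.md with f.read() and write the returned string to file.txt.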

This works:
>>> a = '''This is one line of markdown
... This is another line of markdown
...
... This is another one'''
>>> lines = a.split('\n')
>>> lines = [ '"' + i + '" +' if len(i) else i for i in lines]
>>> lines[-1] = lines[-1][:-2] # drop the '+' at the end of the last line
>>> print '\n'.join( lines )
"This is one line of markdown" +
"This is another line of markdown" +
"This is another one"
You may add reading/writing to files yourself.

Python Splitting Text file based on a keyword

I am trying to write a python program that will constantly read a text file line by line and each time it comes across a line with the word 'SPLIT' it will write the contents to a new text file.
Please could someone point me in the right direction of writing a new text file each time the script comes across the word 'split'. I have no problem reading a text file with Python, I'm unsure how to split on the keyword and create an individual text file each time.
THE SCRIPT BELOW WORKS IN 2.7.13
file_counter = 0
done = False
with open('test.txt') as input_file:
    # with open("test"+str(file_counter)+".txt", "w") as out_file:
    while not done:
        for line in input_file:
            if "SPLIT" in line:
                done = True
                file_counter += 1
            else:
                print(line)
                out_file = open("test"+str(file_counter)+".txt", "a")
                out_file.write(line)
                #out_file.write(line.strip()+"\n")
print file_counter

You need to have two loops: one that iterates over the names of the output files, and another inside it that writes the input contents to the currently active output file until "SPLIT" is found:
out_n = 0
done = False
with open("test.txt") as in_file:
    while not done: #loop over output file names
        with open(f"out{out_n}.txt", "w") as out_file: #generate an output file name
            while not done: #loop over lines in input file and write to output file
                try:
                    line = next(in_file).strip() #strip whitespace for consistency
                except StopIteration:
                    done = True
                    break
                if "SPLIT" in line: #more robust than 'if line == "SPLIT\n":'
                    break
                else:
                    out_file.write(line + '\n') #must add back in newline because we stripped it out earlier
        out_n += 1 #increment output file name integer
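The same idea can also be written with a single loop that opens the next output file lazily. The following is a sketch, not a drop-in replacement: split_file, its parameters and the out0.txt naming are assumptions for illustration, and unlike the version above it keeps the original whitespace and does not create an empty file when the input ends with a SPLIT line.

```python
import os
import tempfile

def split_file(in_path, out_dir, keyword="SPLIT"):
    """Write the lines of `in_path` to numbered files, starting a new one at each keyword line."""
    out_paths = []
    out_file = None
    try:
        with open(in_path) as in_file:
            for line in in_file:
                if keyword in line:
                    # close the current part; the next content line opens a new one
                    if out_file:
                        out_file.close()
                        out_file = None
                    continue
                if out_file is None:
                    path = os.path.join(out_dir, f"out{len(out_paths)}.txt")
                    out_file = open(path, "w")
                    out_paths.append(path)
                out_file.write(line)
    finally:
        if out_file:
            out_file.close()
    return out_paths

# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.txt")
    with open(src, "w") as f:
        f.write("a\nb\nSPLIT\nc\n")
    parts = split_file(src, tmp)
    print([os.path.basename(p) for p in parts])
```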

for line in text.splitlines():
    if " SPLIT " in line:
        # write in new file.
        pass
To write in new file check here:
https://www.tutorialspoint.com/python/python_files_io.htm
or
https://docs.python.org/3.6/library/functions.html#open

Search the word, and replace the whole line containing the word in a file in Python using fileinput

I want to search for a particular word in a text file. For each line where the word is present, I want to replace the whole line with new text.
I want to achieve this using Python's fileinput module. I am seeing two different behaviours with the following variations:
Code piece 1 :-
text = "mov9 = " # if any line contains this text, I want to modify the whole line.
new_text = "mov9 = Alice in Wonderland"
x = fileinput.input(files="C:\Users\Admin\Desktop\DeletedMovies.txt", inplace=1)
for line in x:
    if text in line:
        line = new_text
        print line,
x.close()
The above piece of code wipes out all the content of the file, and writes the new_text i.e. the file content is only
mov9 = Alice in Wonderland
Code Piece 2 :-
text = "mov9 = " # if any line contains this text, I want to modify the whole line.
new_text = "mov9 = Alice in Wonderland"
x = fileinput.input(files="C:\Users\Admin\Desktop\DeletedMovies.txt", inplace=1)
for line in x:
    if text in line:
        line = line.replace(text, new_text)
    print line,
x.close()
The above piece of code adds the needed text, i.e. new_text, where text is found, but it doesn't delete the rest of the line; the previous data is kept as well.
That is, if the line was earlier:
mov9 = Fast & Furios
after running the above piece of code it becomes:
mov9 = Alice in WonderlandFast & Furios
The other content of the file remains untouched, not deleted as with Code piece 1.
But my goal is to find mov9 = and replace the whole line, whatever else it contains, with mov9 = Alice in Wonderland.
How can I achieve that? Thanks in advance.

I realized that I was wrong by just an indentation. In Code piece 1 from the question, if I move 'print line,' out of the scope of the if, i.e. if I outdent it, then this is solved.
Because this line was inside the scope of the if, only new_text was being written to the file and the other lines were not, so the file was left with only new_text. So the code piece should be as follows:
text = "mov9 = " # if any line contains this text, I want to modify the whole line.
new_text = "mov9 = Alice in Wonderland"
x = fileinput.input(files="C:\Users\Admin\Desktop\DeletedMovies.txt", inplace=1)
for line in x:
    if text in line:
        line = new_text
    print line,
x.close()
Also, the second solution given by Rolf of Saxony and the first solution by Padraic Cunningham are similar.

You empty your file because you only write when you find a match; you need to always write the lines:
import fileinput
import sys
text = "mov9 = " # if any line contains this text, I want to modify the whole line.
new_text = "mov9 = Alice in Wonderland\n"
# raw string so the backslashes in the Windows path are not treated as escapes
x = fileinput.input(files=r"C:\Users\Admin\Desktop\DeletedMovies.txt", inplace=1)
for line in x:
    if text in line:
        line = new_text
    sys.stdout.write(line)
If you find a match the line will be set to new_text, so either sys.stdout.write(line) will write the original line or new_text. Also if you actually want to find lines starting with text use if line.startswith(text):
You could also write to a tempfile and replace the original:
from shutil import move
from tempfile import NamedTemporaryFile
text = "mov9 = " # if any line contains this text, I want to modify the whole line.
new_text = "mov9 = Alice in Wonderland\n"
# raw strings so the backslashes in the Windows path are not treated as escapes
with open(r"C:\Users\Admin\Desktop\DeletedMovies.txt") as f, NamedTemporaryFile("w", dir=".", delete=False) as tmp:
    for line in f:
        if text in line:
            line = new_text
        tmp.write(line)
move(tmp.name, r"C:\Users\Admin\Desktop\DeletedMovies.txt")
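The temp-file approach can be wrapped in a reusable function. This is a rough Python 3 sketch with a hypothetical name, replace_matching_lines, and hypothetical parameters. It creates the temporary file in the same directory as the target, on the assumption that os.replace should not have to cross filesystems.

```python
import os
import tempfile

def replace_matching_lines(path, marker, new_line):
    """Rewrite `path`, replacing every line that contains `marker` with `new_line`."""
    directory = os.path.dirname(os.path.abspath(path))
    # create the temp file next to the target so os.replace stays on one filesystem
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:
            tmp.write(new_line + "\n" if marker in line else line)
    # both files are closed here; swap the rewritten copy into place
    os.replace(tmp.name, path)

# Demonstration on a throwaway file
with tempfile.TemporaryDirectory() as tmp_dir:
    demo = os.path.join(tmp_dir, "DeletedMovies.txt")
    with open(demo, "w") as f:
        f.write("mov8 = Up\nmov9 = Fast & Furios\n")
    replace_matching_lines(demo, "mov9 = ", "mov9 = Alice in Wonderland")
    with open(demo) as f:
        print(f.read())
```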

As vyscond suggested:
import re
with open('file.txt') as f:
    yourTxt = f.read()
# match every whole line that contains "mov9 = " and replace it
yourTxt = re.sub(r'^.*mov9 = .*$', 'mov9 = Alice in Wonderland', yourTxt, flags=re.MULTILINE)
# Save the new text on disk
with open('newTxt.txt', 'w') as f:
    f.write(yourTxt)

I would suggest using two files at once: read from the original, write the replaced data to a second file, and then delete the original.
path = "Hello.txt"
path2 = "Hello2.txt"
file1_open = open(path, "r")
file2_open = open(path2, "w")
for line in file1_open:
    if "a" in line:
        print("found")
        line = line.replace('a', "replaced")
        print(line)
    file2_open.write(line)
file1_open.close()
file2_open.close()
Now you can delete the original file (path).
It's small and simple, though it may be slow with large files.

It looks like this may not be the only solution on this theme, now that I have re-read the answers but as I knocked the code up here goes.
Keep it simple:
import os
text = "mov9 = " # if any line contains this text, I want to modify the whole line.
new_text = "mov9 = Alice in Wonderland"
new = open("new.txt", 'w')
with open('DeletedMovies.txt', 'r') as f:
    for line in f:
        if text in line:
            line = new_text + "\n" # replace the whole line, not just the matched text
            print(line)
        new.write(line)
new.close()
os.rename('DeletedMovies.txt', 'DeletedMovies.txt.old')
os.rename('new.txt', 'DeletedMovies.txt')
It rather depends on just how big you expect this file to get.
Edit:
If you are determined to use fileinput despite the fact that it will be confusing for anyone unfamiliar with that module, as it is not at all clear what is happening, then you were almost there with your existing code.
The following should work:
import fileinput
text = "mov9 = " # if any line contains this text, I want to modify the whole line.
new_text = "mov9 = Alice in Wonderland "
for line in fileinput.input("DeletedMovies.txt", inplace = 1):
    if text in line:
        print(line.replace(line,new_text))
    else:
        print(line.strip())

This is a slight improvement on Rolf of Saxony's answer: you can run the script without hard-coded values.
import fileinput
import sys
filename=sys.argv[1]
old_value=sys.argv[2]
new_value=sys.argv[3]
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        if old_value in line:
            print(line.replace(line,new_value))
        else:
            print(line.strip())
Now run it from the command line as replace.py filename.txt old_value New_Value

Printing to a file via Python

Hopefully this is an easy fix. I'm trying to edit one field of a file we use for import, however when I run the following code it leaves the file blank and 0kb. Could anyone advise what I'm doing wrong?
import re #import regex so we can use the commands
name = raw_input("Enter filename:") #prompt for file name, press enter to just open test.nhi
if len(name) < 1 : name = "test.nhi"
count = 0
fhand = open(name, 'w+')
for line in fhand:
    words = line.split(',') #obtain individual words by using split
    words[34] = re.sub(r'\D', "", words[34]) #remove non-numeric chars from string using regex
    if len(words[34]) < 1 : continue # If the 34th field is blank go to the next line
    elif len(words[34]) == 2 : "{0:0>3}".format([words[34]]) #Add leading zeroes depending on the length of the field
    elif len(words[34]) == 3 : "{0:0>2}".format([words[34]])
    elif len(words[34]) == 4 : "{0:0>1}".format([words[34]])
    fhand.write(words) #write the line
fhand.close() # Close the file after the loop ends

I took the text below as the input file 'a.txt' and modified your code. Please check whether it works for you.
#Initial content of a.txt
This,program,is,Java,program
This,program,is,12Python,programs
Modified code as follows:
import re
# Read from the file and update the values
fhand = open('a.txt', 'r')
tmp_list = []
for line in fhand:
    # Split the line on ','
    words = line.split(',')
    # Remove non-numeric chars from the field at index 3 using regex
    words[3] = re.sub(r'\D', "", words[3])
    # Update the field at index 3
    if len(words[3]) < 1:
        # No 'continue' here: the line still has to be rebuilt and written back to the file
        print("Field empty. Continue...")
    elif len(words[3]) >= 1 and len(words[3]) < 5:
        # format() as used in the question won't add leading zeros; zfill(5) pads words[3] to five digits
        words[3] = words[3].zfill(5)
    # After updating the field, build the line again from the words list
    tmp_str = ",".join(words)
    tmp_list.append(tmp_str)
fhand.close()
# Write back to the same file
whand = open("a.txt", 'w')
for val in tmp_list:
    whand.write(val)
whand.close()
File content after running code
This,program,is,,program
This,program,is,00012,programs

The file mode 'w+' truncates your file to 0 bytes, so you'll only be able to read lines that you've written.
Look at Confused by python file mode "w+" for more information.
An idea would be to read the whole file first, close it, and then re-open it to write the lines back to it.

Not sure which OS you're on but I think reading and writing to the same file has undefined behaviour.
I guess internally the file object holds the position (try fhand.tell() to see where it is). You could probably adjust it back and forth as you went using fhand.seek(last_read_position) but really that's asking for trouble.
Also, I'm not sure how the script would ever end as it would end up reading the stuff it had just written (in a sort of infinite loop).
Best bet is to read the entire file first:
with open(name, 'r') as f:
    lines = f.read().splitlines()
with open(name, 'w') as f:
    for l in lines:
        # ....
        f.write(something)
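To make the read-first, write-second pattern concrete, here is a sketch that fills in the elided part for the zero-padding case from the question. The helper names pad_field and rewrite_file are hypothetical, and the width of 5 is an assumption borrowed from the zfill(5) used in another answer. Lines whose field has no digits are written back with that field empty.

```python
import re

def pad_field(line, index, width=5):
    """Strip non-digits from one comma-separated field and zero-pad the result."""
    words = line.split(',')
    digits = re.sub(r'\D', '', words[index])
    # only pad when something is left; an empty field stays empty
    words[index] = digits.zfill(width) if digits else digits
    return ','.join(words)

def rewrite_file(path, index, width=5):
    """Read every line first, then reopen the file for writing and write them back."""
    with open(path) as f:
        lines = f.read().splitlines()
    with open(path, 'w') as f:
        for line in lines:
            f.write(pad_field(line, index, width) + '\n')

print(pad_field("This,program,is,12Python,programs", 3))
```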

For 'Printing to a file via Python' you can use the file argument of print (Python 3). The file has to be opened for writing:
ofile = open("test.txt", "w")
print("Some text...", file=ofile)
ofile.close()
