I'm trying to build a subtitle translator using DeepL, but it isn't running perfectly. I managed to translate the subtitles; the problem is replacing the lines. I can see that the lines are translated because the program prints them, but it doesn't replace them: whenever I run it, the output file is the same as the original.
This is the code responsible:
def translate(input, output, languagef, languaget):
    file = open(input, 'r').read()
    fileresp = open(output, 'r+')
    subs = list(srt.parse(file))
    for sub in subs:
        try:
            linefromsub = sub.content
            translationSentence = pydeepl.translate(linefromsub, languaget.upper(), languagef.upper())
            print(str(sub.index) + ' ' + translationSentence)
            for line in fileresp.readlines():
                newline = fileresp.write(line.replace(linefromsub, translationSentence))
        except IndexError:
            print("Error parsing data from deepl")
This is how the file looks:
1
00:00:02,470 --> 00:00:04,570
- Yes, I do.
- (laughs)
2
00:00:04,605 --> 00:00:07,906
My mom doesn't want
to babysit everyday
3
00:00:07,942 --> 00:00:09,274
or any day.
4
00:00:09,310 --> 00:00:11,977
But I need
my mom's help sometimes.
5
00:00:12,013 --> 00:00:14,046
She's just gonna
have to be grandma today.
Help will be appreciated :)
Thanks.
You are opening fileresp in r+ mode. When you call readlines(), the file position is set to the end of the file, so subsequent calls to write() append to the file. If you want to overwrite the original contents rather than append, try this instead:
allLines = fileresp.readlines()
fileresp.seek(0)     # Set position to the beginning
fileresp.truncate()  # Delete the contents
for line in allLines:
    fileresp.write(...)
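For completeness, here is a minimal self-contained sketch of that read/seek/truncate pattern. The file name and replacement strings are placeholders, not from the original code:

# Hypothetical example of rewriting a file in place with r+ mode
with open('subtitles.srt', 'r+') as f:
    all_lines = f.readlines()  # position is now at the end of the file
    f.seek(0)                  # set position back to the beginning
    f.truncate()               # delete the old contents
    for line in all_lines:
        f.write(line.replace('old text', 'new text'))  # placeholder strings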
Update
It's difficult to see what you're trying to accomplish with r+ mode here, but it seems you have two separate input and output files. If that's the case, consider:
def translate(input, output, languagef, languaget):
    file = open(input, 'r').read()
    fileresp = open(output, 'w')  # Use w mode instead
    subs = list(srt.parse(file))
    for sub in subs:
        try:
            linefromsub = sub.content
            translationSentence = pydeepl.translate(linefromsub, languaget.upper(), languagef.upper())
            print(str(sub.index) + ' ' + translationSentence)
            fileresp.write(translationSentence)  # Write the translated sentence
        except IndexError:
            print("Error parsing data from deepl")
Related
I am trying to find and replace several lines of plain text in multiple files with input(), but when I enter '\n' characters to represent where the newline characters would be in the text, it doesn't find the text and doesn't replace it.
I tried to use raw_strings but couldn't get them to work.
Is this a job for regular expressions?
python 3.7
import os
import re
import time

start = time.time()

# enter path and check input for standard format
scan_folder = input('Enter the absolute path to scan:\n')
validate_path_regex = re.compile(r'[a-z,A-Z]:\\?(\\?\w*\\?)*')
mo = validate_path_regex.search(scan_folder)
if mo is None:
    print('Path is not valid. Please re-enter path.\n')
    import sys
    sys.exit()
os.chdir(scan_folder)

# get find/replace strings, and then confirm that inputs are correct.
find_string = input('Enter the text you wish to find:\n')
replace_string = input('Enter the text to replace:\n')
permission = input('\nPlease confirm you want to replace '
                   + find_string + ' with '
                   + replace_string + ' in ' + scan_folder
                   + ' directory.\n\nType "yes" to continue.\n')
if permission == 'yes':
    change_count = 0
    # Context manager for results file
    with open('find_and_replace.txt', 'w') as results:
        for root, subdirs, files in os.walk(scan_folder):
            for file in files:
                # ignore files that don't end with '.mpr'
                if os.path.join(root, file).endswith('.mpr'):
                    fullpath = os.path.join(root, file)
                    # context manager for each file opened
                    with open(fullpath, 'r+') as f:
                        text = f.read()
                        # only add to change_count if find_string is in text
                        if find_string in text:
                            change_count += 1
                        # move cursor back to beginning of the file
                        f.seek(0)
                        f.write(text.replace(find_string, replace_string))
        results.write(str(change_count)
                      + ' files have been modified to replace '
                      + find_string + ' with ' + replace_string + '.\n')
    print('Done with replacement')
else:
    print('Find and replace has not been executed')
end = time.time()
print('Program took ' + str(round((end - start), 4)) + ' secs to complete.\n')
find_string = BM="LS"\nTI="12"\nDU="7"
replace_string = BM="LSL"\nDU="7"
The original file looks like
BM="LS"
TI="12"
DU="7"
and I would like it to change to
BM="LSL"
DU="7"
but the file doesn't change.
So, the misconception is the distinction between source code, which understands escape sequences like "this is a string \n with two lines" and things like "raw strings" (a concept that doesn't apply in this context), and the data you are providing as user input. The input function processes data coming in from the standard input device. When you provide data to standard input, it is interpreted as raw bytes, and input assumes it is meant to be text (decoded using whatever your system settings imply), so a typed \n arrives as two literal characters: a backslash and an n. There are two approaches that allow a user to input newlines. The first is to use sys.stdin; however, this will require you to provide an EOF, probably using Ctrl + D:
>>> import sys
>>> x = sys.stdin.read()
here is some text and i'm pressing return
to make a new line. now to stop input, press control d>>> x
"here is some text and i'm pressing return\nto make a new line. now to stop input, press control d"
>>> print(x)
here is some text and i'm pressing return
to make a new line. now to stop input, press control d
This is not very user-friendly. You have to either pass a newline and an EOF, i.e. Return + Ctrl + D, or press Ctrl + D twice, and this depends on the system, I believe.
A better approach would be to allow the user to input escape sequences, and then decode them yourself:
>>> x = input()
I want this to\nbe on two lines
>>> x
'I want this to\\nbe on two lines'
>>> print(x)
I want this to\nbe on two lines
>>> x.encode('utf8').decode('unicode_escape')
'I want this to\nbe on two lines'
>>> print(x.encode('utf8').decode('unicode_escape'))
I want this to
be on two lines
>>>
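Applied to the find-and-replace script above, the decoding step would only touch the two input lines, something like this (a sketch):

# Decode user-typed escape sequences such as \n into real newlines
find_string = input('Enter the text you wish to find:\n')
find_string = find_string.encode('utf8').decode('unicode_escape')
replace_string = input('Enter the text to replace:\n')
replace_string = replace_string.encode('utf8').decode('unicode_escape')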
Hi everyone. I need help opening and reading the file.
Got this txt file - https://yadi.sk/i/1TH7_SYfLss0JQ
It is a dictionary
{"id0":"url0", "id1":"url1", ..., "idn":"urln"}
But it was written to a txt file using json.
#This is how I dump the data into a txt
json.dump(after,open(os.path.join(os.getcwd(), 'before_log.txt'), 'a'))
So, the file structure is
{"id0":"url0", "id1":"url1", ..., "idn":"urln"}{"id2":"url2", "id3":"url3", ..., "id4":"url4"}{"id5":"url5", "id6":"url6", ..., "id7":"url7"}
And it is all one string.
I need to open it, check for repeated IDs, delete them, and save the file again.
But json.loads raises ValueError: Extra data.
Tried these:
How to read line-delimited JSON from large file (line by line)
Python json.loads shows ValueError: Extra data
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 190)
But I'm still getting that error, just in a different place.
Right now I got as far as:
with open('111111111.txt', 'r') as log:
    before_log = log.read()
    before_log = before_log.replace('}{', ', ').split(', ')
    mu_dic = []
    for i in before_log:
        mu_dic.append(i)
This eliminates the problem of several {}{}{} dictionaries/JSONs in a row.
Maybe there is a better way to do this?
P.S. This is how the file is made:
json.dump(after,open(os.path.join(os.getcwd(), 'before_log.txt'), 'a'))
Your file size is 9.5 MB, so it'll take you a while to open it and debug it manually.
So, using the head and tail tools (normally found in any GNU/Linux distribution), you'll see that:
# You can use Python as well to read chunks from your file
# and see the nature of it and what it's causing a decode problem
# but i prefer head & tail because they're ready to be used :-D
$> head -c 217 111111111.txt
{"1933252590737725178": "https://instagram.fiev2-1.fna.fbcdn.net/vp/094927bbfd432db6101521c180221485/5CC0EBDD/t51.2885-15/e35/46950935_320097112159700_7380137222718265154_n.jpg?_nc_ht=instagram.fiev2-1.fna.fbcdn.net",
$> tail -c 219 111111111.txt
, "1752899319051523723": "https://instagram.fiev2-1.fna.fbcdn.net/vp/a3f28e0a82a8772c6c64d4b0f264496a/5CCB7236/t51.2885-15/e35/30084016_2051123655168027_7324093741436764160_n.jpg?_nc_ht=instagram.fiev2-1.fna.fbcdn.net"}
$> head -c 294879 111111111.txt | tail -c 12
net"}{"19332
So the first guess is that your file is a malformed series of JSON data, and the best approach is to separate each }{ with a \n for further manipulation.
So, here is an example of how you can solve your problem using Python:
import json

input_file = '111111111.txt'
output_file = 'new_file.txt'

data = ''
with open(input_file, mode='r', encoding='utf8') as f_file:
    # this with statement part can be replaced by
    # using sed under your OS like this example:
    # sed -i 's/}{/}\n{/g' 111111111.txt
    data = f_file.read()
    data = data.replace('}{', '}\n{')

seen, total_keys, to_write = set(), 0, {}
# split the lines of the in-memory data
for elm in data.split('\n'):
    # convert the line to a valid Python dict
    converted = json.loads(elm)
    # loop over the keys
    for key, value in converted.items():
        total_keys += 1
        # if the key is not seen then add it for further manipulations
        # else ignore it
        if key not in seen:
            seen.add(key)
            to_write.update({key: value})

# write the dict's keys & values into a new file as JSON
with open(output_file, mode='a+', encoding='utf8') as out_file:
    out_file.write(json.dumps(to_write) + '\n')

print(
    'found duplicated key(s): {seen} from {total}'.format(
        seen=total_keys - len(seen),
        total=total_keys
    )
)
Output:
found duplicated key(s): 43836 from 45367
And finally, the output file will be a valid JSON file and the duplicated keys will be removed with their values.
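Since the output now holds a single JSON object per run, it can be read back with one json.load call to verify it (a quick check, assuming the same output file name):

import json

# re-open the deduplicated file and confirm it parses cleanly
with open('new_file.txt', mode='r', encoding='utf8') as f:
    data = json.load(f)
print('unique keys kept:', len(data))

Note that because the script opens the output in a+ mode, running it twice would append a second object and re-introduce the Extra data error; w mode avoids that.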
The basic difference between the file's structure and actual JSON format is the missing commas between objects, and that the objects are not enclosed within [ and ]. So the same can be achieved with the code snippet below:
import json

with open('json_file.txt') as f:
    # Read complete file
    a = f.read()

# Convert into a single-line string
b = ''.join(a.splitlines())
# Add , after each object
b = b.replace("}", "},")
# Add opening and closing brackets, dropping the last comma added in the previous step
b = '[' + b[:-1] + ']'
x = json.loads(b)
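Alternatively, the string surgery can be avoided entirely: json.JSONDecoder.raw_decode parses one object and reports where it stopped, so concatenated objects can be consumed in a loop. A sketch under the same file layout:

import json

def iter_concatenated_json(text):
    # Yield each top-level object from a string shaped like {...}{...}{...}
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # skip any stray whitespace between objects
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

with open('111111111.txt', encoding='utf8') as f:
    merged = {}
    for chunk in iter_concatenated_json(f.read()):
        for key, value in chunk.items():
            merged.setdefault(key, value)  # keep the first occurrence of each id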
I am trying to perform a nested loop to combine data from two files into one line, matching on the MAC addresses present in both.
I am able to run the loop fine without the regex; however, when using the search regex below, it loops through MAC_Lines only once, prints the correct result for the first entry, and stops. I'm unsure how to make it go on to the next line of MAC_Lines and repeat the process for all of the entries.
try:
    for mac in MAC_Lines:
        MAC_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', mac, re.I)
        MAC_address_final = MAC_address.group()
        for arp in ARP_Lines:
            ARP_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', arp, re.I)
            ARP_address_final = ARP_address.group()
            if MAC_address_final == ARP_address_final:
                print mac + arp
                continue
except Exception:
    print 'completed.'
Results:
13,64,00:0c:29:36:9f:02,giga-swx 0/213,172.20.13.70, 00:0c:29:36:9f:02, vlan 64
completed.
I learned that the issue was how I opened the files: the inner file was read to the end on the first pass and never reset. Opening both files with the with open(...) as ... context manager, re-opening the ARP file inside the loop, lets every outer iteration read it again from the start. Below is the code I was looking for:
with open('MAC_List.txt', 'r') as read0:
    for items0 in read0:
        MAC_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', items0, re.I)
        if MAC_address:
            mac_addy = MAC_address.group().upper()
            with open('ARP_List.txt', 'r') as read1:
                for items1 in read1:
                    ARP_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', items1, re.I)
                    if ARP_address:
                        # upper-case both sides so the comparison is case-insensitive
                        arp_addy = ARP_address.group().upper()
                        if mac_addy == arp_addy:
                            print(items0.strip() + ' ' + items1.strip())
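A side note on efficiency: re-opening and re-scanning ARP_List.txt for every line of MAC_List.txt works, but it is quadratic. Reading the ARP file once into a dict keyed by address makes the join a single pass over each file. A sketch using the same regex and file names:

import re

MAC_RE = re.compile(r'([a-fA-F0-9]{2}[:\-]?){6}')

# one pass over the ARP file: address -> full line
arp_by_addr = {}
with open('ARP_List.txt', 'r') as arp_file:
    for line in arp_file:
        match = MAC_RE.search(line)
        if match:
            arp_by_addr[match.group().upper()] = line.strip()

# one pass over the MAC file, looking each address up in the dict
with open('MAC_List.txt', 'r') as mac_file:
    for line in mac_file:
        match = MAC_RE.search(line)
        if match:
            arp_line = arp_by_addr.get(match.group().upper())
            if arp_line:
                print(line.strip() + ' ' + arp_line)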
I have a file named sample.txt which looks like below
ServiceProfile.SharediFCList[1].DefaultHandling=1
ServiceProfile.SharediFCList[1].ServiceInformation=
ServiceProfile.SharediFCList[1].IncludeRegisterRequest=n
ServiceProfile.SharediFCList[1].IncludeRegisterResponse=n
My requirement here is to remove the brackets and the integer, and then form OS commands from the result:
ServiceProfile.SharediFCList.DefaultHandling=1
ServiceProfile.SharediFCList.ServiceInformation=
ServiceProfile.SharediFCList.IncludeRegisterRequest=n
ServiceProfile.SharediFCList.IncludeRegisterResponse=n
I am quite a newbie in Python. This is my first attempt. I have used this code to remove the brackets:
#!/usr/bin/python
import re
import os
import sys
f = os.open("sample.txt", os.O_RDWR)
ret = os.read(f, 10000)
os.close(f)
print ret
var1 = re.sub("[\(\[].*?[\)\]]", "", ret)
print var1
f = open("removed.cfg", "w+")
f.write(var1)
f.close()
After this using the file as input I want to form application specific commands which looks like this:
cmcli INS "DefaultHandling=1 ServiceInformation="
and the next set as
cmcli INS "IncludeRegisterRequest=n IncludeRegisterRequest=y"
So basically I now want all the output bunched into sets of two so I can execute the commands on the operating system.
Is there any way I could bunch them up in sets of two?
Reading 10,000 bytes of text into a string is really not necessary when your file is line-oriented text, and it isn't scalable either. And you need a very good reason to be using os.open() instead of open().
So, treat your data as the lines of text that it is, and every two lines, compose a single line of output.
from __future__ import print_function
import re

command = [None, None]
cmd_id = 1
bracket_re = re.compile(r".+\[\d\]\.(.+)")

# This doesn't just remove the brackets: what you actually seem to want is
# to pick out everything after [1]. and ignore the rest.

with open("removed_cfg", "w") as outfile:
    with open("sample.txt") as infile:
        for line in infile:
            m = bracket_re.match(line)
            cmd_id = 1 - cmd_id  # gives 0, 1, 0, 1
            command[cmd_id] = m.group(1)
            if cmd_id == 1:  # we have a pair
                output_line = """cmcli INS "{0} {1}" """.format(*command)
                print(output_line, file=outfile)
This gives the output
cmcli INS "DefaultHandling=1 ServiceInformation="
cmcli INS "IncludeRegisterRequest=n IncludeRegisterResponse=n"
The second line doesn't correspond to your sample output. I don't know how the input IncludeRegisterResponse=n is supposed to become the output IncludeRegisterRequest=y. I assume that's a mistake.
Note that this code depends on your input data being precisely as you describe it and has no error checking whatsoever. So if the format of the input is in reality more variable than that, then you will need to add some validation.
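As an aside, pairing consecutive items can also be done with the zip-on-one-iterator idiom, which avoids toggling an index by hand. A sketch over the same input, with the same caveat of no error checking:

import re

bracket_re = re.compile(r".+\[\d\]\.(.+)")

with open("sample.txt") as infile:
    # a generator of the captured field from each line
    fields = (bracket_re.match(line).group(1) for line in infile)
    # zip(fields, fields) pulls two items per step, yielding pairs
    for first, second in zip(fields, fields):
        print('cmcli INS "{0} {1}"'.format(first, second))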
I have this code
with open('ip.txt') as ip:
    ips = ip.readlines()
with open('user.txt') as user:
    usrs = user.readlines()
with open('pass.txt') as passwd:
    passwds = passwd.readlines()
with open('prefix.txt') as pfx:
    pfxes = pfx.readlines()
with open('time.txt') as timer:
    timeout = timer.readline()
with open('phone.txt') as num:
    number = num.readline()
which opens all those files and joins the values in this shape:
result = ('Server:{0} # U:{1} # P:{2} # Pre:{3} # Tel:{4}\n{5}\n'.format(b,c,d,a,number,ctime))
print (result)
cmd = ("{0}{1}#{2}".format(a,number,b))
print (cmd)
I supposed it would print like this:
Server:x.x.x.x # U:882 # P:882 # Pre:900 # Tel:456123456789
900456123456789#x.x.x.x
but the output was like this
Server:x.x.x.x
# U:882 # P:882 # Pre:900
# Tel:456123456789
900
456123456789#187.191.45.228
New output:
Server:x.x.x.x # U:882 # P:882 # Pre:900 # Tel:['456123456789']
900['456123456789']#x.x.x.x
How can I solve this?
Maybe you should remove the newline using strip().
Example
with open('ip.txt') as ip:
    ips = ip.readline().strip()
readline() will read one line at a time, whereas readlines() will read the entire file as a list of lines.
I am guessing from your limited example that b has a newline embedded. That's because of readlines(). The Python idiom to use here is ip.read().splitlines(), where ip is one of your file handles.
See more splitlines options in the Python docs.
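The difference is easy to see side by side (a small demonstration, assuming ip.txt contains one address per line):

with open('ip.txt') as f:
    print(f.readlines())          # e.g. ['1.1.1.1\n', '2.2.2.2\n'] - newlines kept
with open('ip.txt') as f:
    print(f.read().splitlines())  # e.g. ['1.1.1.1', '2.2.2.2'] - newlines stripped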
Apart from the other great answers, for completeness' sake I am going to post an alternative using string.translate. It covers the case where a \n has been accidentally inserted into the middle of your string, like '123\n456\n78', a corner case that rstrip or strip would miss.
Server:x.x.x.x # U:882 # P:882 # Pre:900 # Tel:['456123456789']
900['456123456789']#x.x.x.x
You get this because you're printing a list; to resolve it, join the strings in your list number.
Altogether, the solution will be something like this:
import string

# prepare for string translation to get rid of newlines
# (string.maketrans and two-argument translate() are the Python 2 APIs)
tbl = string.maketrans("", "")

result = ('Server:{0} # U:{1} # P:{2} # Pre:{3} # Tel:{4}\n{5}\n'.format(b, c, d, a, ''.join(number), ctime))
# this will translate all newlines to ""
print (result.translate(tbl, "\n"))

cmd = ("{0}{1}#{2}".format(a, ''.join(number), b))
print (cmd.translate(tbl, "\n"))
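In Python 3 the same deletion is expressed through str.maketrans instead; a minimal sketch:

# Python 3 equivalent: the third argument to str.maketrans lists
# characters to delete, so translate() needs no second argument
tbl = str.maketrans('', '', '\n')
text = '900\n456123456789#x.x.x.x'  # example string with an embedded newline
print(text.translate(tbl))          # -> 900456123456789#x.x.x.x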