I am trying to implement Caesar cipher in Python where my program would take input from a text file i.e. input_file.txt, and write the encrypted text as an output to another text file named output_file.txt. The input file contains:
Attack On Titans
4
where "Attack On Titans" is the string to be encrypted and 4 is the key to the encryption algorithm. The correct output for this string should be
Exxego Sr Xmxerw
but my program gives me
Exxego Sr Xmxerwv
i.e. an extra character v. Here is my code for review:
data = open("input_file.txt", "r")
text = data.readline()
print(text)
key = int(data.readline())
def encrypt(text, key):
    result = ""
    for i in range(len(text)):
        char = text[i]
        if char == ' ':
            result += ' '
        elif char.isupper():
            result += chr((ord(char) + key - 65) % 26 + 65)
        else:
            result += chr((ord(char) + key - 97) % 26 + 97)
    return result
ex = open("output_file.txt", "w")
ex.write(encrypt(text, key))
print(encrypt(text, key))
I just wanted to know why I am getting this incorrect output, although I know I can correct it by changing the for statement to this:
for i in range(len(text)-1):
Please excuse the amateurish code; I am not good at this yet and want to improve. Thanks.
data.readline() will give you the trailing newline character \n. You need to call text.strip() before passing the text to the encrypt function to get rid of it.
It looks like you have a trailing newline character in the file you are reading in.
Testing it in the python interpreter:
>>> a = '\n'
>>> (ord(a)+4-97) % 26 + 97
118
>>> chr(118)
'v'
Remove leading and trailing whitespace by calling text.strip() before passing it to your encrypt function.
As an aside, you should either explicitly close your files, e.g. ex.close(), or wrap the file operations in a with block like this, to prevent file corruption.
with open('output_file.txt', 'w') as ex:
    ex.write('bar')
data.readline() keeps the '\n' (newline) character at the end of the line. It's the reason why you have an extra character in your output.
To remove it you can replace
text = data.readline()
by
text = data.readline().rstrip('\n')
which will remove the '\n' at the end.
text.strip() (see the other answers) will remove all whitespace characters from both ends of the string. If that is not the behaviour you want, use .rstrip('\n'), which removes only '\n' at the end of the string.
You should also add
ex.close()
after
ex.write(encrypt(text,key))
to commit the change to the file.
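Putting the answers above together, a corrected version of the program could look roughly like the sketch below. It assumes the same file layout as in the question (text on the first line, key on the second). The helper name encrypt_file is made up for illustration, and passing every non-letter character through unchanged is an assumption that goes slightly beyond the original code, which only special-cased spaces.

```python
def encrypt(text, key):
    """Caesar-shift ASCII letters by `key`; leave every other character alone."""
    result = ""
    for char in text:
        if "A" <= char <= "Z":
            result += chr((ord(char) - 65 + key) % 26 + 65)
        elif "a" <= char <= "z":
            result += chr((ord(char) - 97 + key) % 26 + 97)
        else:
            # Assumption: spaces, digits, punctuation and newlines are kept as-is.
            result += char
    return result


def encrypt_file(in_path, out_path):
    # Hypothetical helper: reads the text and key, writes the encrypted text.
    with open(in_path, "r") as data:
        text = data.readline().rstrip("\n")  # drop the trailing newline
        key = int(data.readline())
    with open(out_path, "w") as out:  # the file is closed automatically on exit
        out.write(encrypt(text, key))


if __name__ == "__main__":
    print(encrypt("Attack On Titans", 4))  # Exxego Sr Xmxerw
```

With this version a stray newline can no longer turn into a 'v', because only letters are ever shifted.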
I ran into some problems trying to work with a string read from a text file.
I was given a file, sqroot2_10kdigits.txt.
Its contents are below:
1.4142135623 7309504880 1688724209 6980785696 7187537694 8073176679 7379907324 7846210703 8850387534 3276415727 3501384623 0912297024
9248360558 5073721264 4121497099 9358314132 2266592750 5592755799
9505011527 8206057147 0109559971 6059702745 3459686201 4728517418
6408891986 0955232923 0484308714 3214508397 6260362799 5251407989
6872533965 4633180882 9640620615 2583523950 5474575028 7759961729
8355752203 3753185701 1354374603 4084988471
My code is below:
myfile = open("sqroot2_10kdigits.txt")
txt = myfile.read()
print(txt)
myfile.close()
Q2: Make a new empty string called sqroot_2_string. Note that there's a space between every 10 digits. Instead of using the .rstrip() method, try using .replace(" ", "") to remove all the spaces in the file and save the result in the empty string I just made. Check the length of the string as well; it should be 10002. Then print the first 10 digits followed by .... Here's an example:
The first 10 digit of square root of 2 is 1.4142135623... My code is below:
def sqroot_2_string(string):
    count = 0
    list = []
    for i in xrange(len(string)):
        if string[i] != ' ':
            list.append(string[i])
    return toString(list)
# Utility Function
def toString(List):
    return ''.join(List)
# Driver program
string = myfile
print sqroot_2_string(string)
Can anyone check my code for Q2? I don't know how to use .replace(" ", "") to remove all the spaces in the file and save the result in the empty string.
You can just do
def sqroot_2_string(string):
    return string.replace(" ", "")
Also note that you should call
print(sqroot_2_string(txt))
so that you are using the text read from the file instead of the file handle.
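As a rough end-to-end sketch of the exercise, the snippet below runs the same idea on a short sample string instead of the real file. The helper name strip_spaces is made up for illustration, and removing line breaks as well as spaces is an assumption based on the wrapped lines in the file excerpt shown in the question.

```python
def strip_spaces(text):
    # Remove spaces; also drop newlines (assumption: the file wraps its lines).
    return text.replace(" ", "").replace("\n", "")


# Short stand-in for the contents of sqroot2_10kdigits.txt.
sample = "1.4142135623 7309504880\n1688724209"

sqroot_2_string = strip_spaces(sample)
print(len(sqroot_2_string))  # 32 for this sample
# "1." plus the first 10 digits is 12 characters.
print("The first 10 digits of the square root of 2 are " + sqroot_2_string[:12] + "...")
```

For the real file you would pass in the txt variable read earlier and compare the length against 10002.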
I am trying to find and replace several lines of plain text in multiple files with input() but when I enter '\n' characters to represent where the new line chars would be in the text, it doesn't find it and doesn't replace it.
I tried to use raw_strings but couldn't get them to work.
Is this a job for regular expressions?
python 3.7
import os
import re
import time
start = time.time()
# enter path and check input for standard format
scan_folder = input('Enter the absolute path to scan:\n')
validate_path_regex = re.compile(r'[a-z,A-Z]:\\?(\\?\w*\\?)*')
mo = validate_path_regex.search(scan_folder)
if mo is None:
    print('Path is not valid. Please re-enter path.\n')
    import sys
    sys.exit()
os.chdir(scan_folder)
# get find/replaceStrings, and then confirm that inputs are correct.
find_string = input('Enter the text you wish to find:\n')
replace_string = input('Enter the text to replace:\n')
permission = input('\nPlease confirm you want to replace '
                   + find_string + ' with '
                   + replace_string + ' in ' + scan_folder
                   + ' directory.\n\nType "yes" to continue.\n')
if permission == 'yes':
    change_count = 0
    # Context manager for results file
    with open('find_and_replace.txt', 'w') as results:
        for root, subdirs, files in os.walk(scan_folder):
            for file in files:
                # ignore files that don't endwith '.mpr'
                if os.path.join(root, file).endswith('.mpr'):
                    fullpath = os.path.join(root, file)
                    # context manager for each file opened
                    with open(fullpath, 'r+') as f:
                        text = f.read()
                        # only add to changeCount if find_string is in text
                        if find_string in text:
                            change_count += 1
                        # move cursor back to beginning of the file
                        f.seek(0)
                        f.write(text.replace(find_string, replace_string))
        results.write(str(change_count)
                      + ' files have been modified to replace '
                      + find_string + ' with ' + replace_string + '.\n')
    print('Done with replacement')
else:
    print('Find and replace has not been executed')
end = time.time()
print('Program took ' + str(round((end - start), 4)) + ' secs to complete.\n')
find_string = BM="LS"\nTI="12"\nDU="7"
replace_string = BM="LSL"\nDU="7"
The original file looks like
BM="LS"
TI="12"
DU="7"
and I would like it to change to
BM="LSL"
DU="7"
but the file doesn't change.
So, the misconception here is about the distinction between source code, where string literals understand escape sequences like "this is a string \n with two lines" (and where "raw strings" exist, a concept that doesn't make sense in this context), and the data you are providing as user input. The input function simply reads data from the standard input device. What you type arrives as raw bytes, and input decodes it as text using whatever your system settings imply; it does not interpret escape sequences, so typing \n gives you a backslash followed by the letter n. There are two ways to let a user enter newlines. The first is to read from sys.stdin, but this requires the user to signal EOF, probably with ctrl + D:
>>> import sys
>>> x = sys.stdin.read()
here is some text and i'm pressing return
to make a new line. now to stop input, press control d>>> x
"here is some text and i'm pressing return\nto make a new line. now to stop input, press control d"
>>> print(x)
here is some text and i'm pressing return
to make a new line. now to stop input, press control d
This is not very user-friendly. You have to either pass a newline and an EOF, i.e. return + ctrl + D or do ctrl + D twice, and this depends on the system, I believe.
A better approach would be to allow the user to input escape sequences, and then decode them yourself:
>>> x = input()
I want this to\nbe on two lines
>>> x
'I want this to\\nbe on two lines'
>>> print(x)
I want this to\nbe on two lines
>>> x.encode('utf8').decode('unicode_escape')
'I want this to\nbe on two lines'
>>> print(x.encode('utf8').decode('unicode_escape'))
I want this to
be on two lines
>>>
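Applied to the find-and-replace script from the question, the decoding step could be sketched as follows. The helper name decode_escapes is made up for illustration, and the raw strings stand in for what input() would return after the user types the text with literal \n sequences. One caveat: the unicode_escape codec treats the bytes as Latin-1, so this approach is only safe as written when the typed text is plain ASCII.

```python
def decode_escapes(raw):
    # Turn typed sequences such as backslash + n into the real characters.
    # Assumption: `raw` is ASCII-only; non-ASCII text would be mangled.
    return raw.encode("utf8").decode("unicode_escape")


# Stand-ins for the values returned by input().
typed_find = r'BM="LS"\nTI="12"\nDU="7"'
typed_replace = r'BM="LSL"\nDU="7"'

# Stand-in for the contents of one .mpr file.
text = 'BM="LS"\nTI="12"\nDU="7"\n'

new_text = text.replace(decode_escapes(typed_find), decode_escapes(typed_replace))
print(new_text)
```

Without the decoding step the search string contains a backslash and an n rather than a newline, which is why nothing in the file matches.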
I want to make a simple Python script that will map each Arabic letter to phoneme sound symbols. I have a file that has a bunch of words that the script will read to convert them to phonemes, and I have the following dictionary in my code:
Content in my .txt file:
السلام عليكم
السلام عليكم و رحمة الله
السلام عليكم و رحمة الله و بركاته
الحمد لله
كيف حالك
كيف الحال
The dictionary in my code:
ar_let_phon_maplist = {u'ﺍ':'A:', u'ﺏ':'B', u'ﺕ':'T', u'ﺙ':'TH', u'ﺝ':'J', u'ﺡ':'H', u'ﺥ':'KH', u'ﻩ':'H', u'ﻉ':'(ayn) ’', u'ﻍ':'GH', u'ﻑ':'F', u'ﻕ':'q', u'ﺹ':u'ṣ', u'ﺽ':u'ḍ', u'ﺩ':'D', u'ﺫ':'DH', u'ﻁ':u'ṭ', u'ﻙ':'K', u'ﻡ':'M', u'ﻥ':'N', u'ﻝ':'L', u'ﻱ':'Y', u'ﺱ':'S', u'ﺵ':'SH', u'ﻅ':u'ẓ', u'ﺯ':'Z', u'ﻭ':'W', u'ﺭ':'R'}
I have a nested loop where I'm reading each line, converting each character:
with codecs.open(sys.argv[1], 'r', encoding='utf-8') as file:
    lines = file.readlines()
line_counter = 0
for line in lines:
    print "Phonetics In Line " + str(line_counter)
    print line + " ",
    for word in line:
        for character in word:
            if character == '\n':
                print ""
            elif character == ' ':
                print " "
            else:
                print ar_let_phon_maplist[character] + " ",
    line_counter +=1
And this is the error I'm getting:
Phonetics In Line 0
السلام عليكم
Traceback (most recent call last):
File "grapheme2phoneme.py", line 25, in <module>
print ar_let_phon_maplist[character] + " ",
KeyError: u'\u0627'
And then I checked if the file type is UTF-8 using the Linux command:
file words.txt
The output I got:
words.txt: UTF-8 Unicode text
Is there a solution for this problem? Why is the character not mapping to the Unicode key that is in the dictionary, given that the character I'm using as the key in the ar_let_phon_maplist[character] line is also Unicode?
Is there something wrong with my code?
The first thing that catches the eye is the KeyError: your dictionary simply does not know about some of the symbols encountered in the file. Looking ahead, it does not know about ANY of the submitted characters, not only the first.
What can we do about it? We could just add all of the symbols from the Arabic block of the Unicode table to the dictionary. Simple? Yes. Clear? No.
If you want to actually understand the reasons for this 'strange' behaviour, you need to know more about Unicode. In short, there are a lot of letters that look similar but have different code points. Moreover, the same letter can sometimes be represented in multiple forms. So comparing Unicode characters is not a trivial task.
So, if I were allowed to use Python 3.3+, I would solve the task as follows. First, I would normalize the keys in the ar_let_phon_maplist dictionary:
import unicodedata
ar_let_phon_maplist = {unicodedata.normalize('NFKD', k): v
                       for k, v in ar_let_phon_maplist.items()}
And then we iterate over the lines in the file, the words in each line, and the characters in each word, like this:
for index, line in enumerate(lines):
    print('Phonetics in line {0}, total {1} symbols'.format(index, len(line)))
    unknown = []  # Symbols that we haven't found in the dict are stored here
    words = line.split()
    for word in words:
        print(word, ': ', sep='', end='')
        for character in word:
            c = unicodedata.normalize('NFKD', character).casefold()
            try:
                print(ar_let_phon_maplist[c], sep='', end='')
            except KeyError:
                print('_', sep='', end='')
                if c not in unknown:
                    unknown.append(c)
        print()
    if unknown:
        print('Unrecognized symbols: {0}, total {1} symbols'.format(', '.join(unknown),
                                                                    len(unknown)))
The script will produce something like this:
Phonetics in line 4, total 9 symbols
كيف: KYF
حالك: HA:LK
It looks like you forgot that character in the dictionary. You have ﺍ (u'\ufe8d', ARABIC LETTER ALEF ISOLATED FORM), which looks similar, but you don't have ا (u'\u0627', ARABIC LETTER ALEF).
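The difference between the two code points, and the effect of normalization on them, can be checked directly with the standard unicodedata module. This is only a small illustration in Python 3; the variable names are made up.

```python
import unicodedata

isolated_alef = "\ufe8d"  # the presentation form used as a dictionary key
plain_alef = "\u0627"     # the character actually read from the file

print(unicodedata.name(isolated_alef))  # ARABIC LETTER ALEF ISOLATED FORM
print(unicodedata.name(plain_alef))     # ARABIC LETTER ALEF

# The two look alike but are different characters, hence the KeyError.
print(isolated_alef == plain_alef)  # False

# Compatibility normalization maps the presentation form to the plain letter.
print(unicodedata.normalize("NFKD", isolated_alef) == plain_alef)  # True
```

This is why normalizing the dictionary keys, or retyping them with the plain letters, makes the lookups succeed.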
I'm trying to write a program that opens a text file, and shifts each of the characters in the file 5 characters to the right. It should only do this for alphanumeric characters, and leave nonalphanumerics as they are. (ex: C becomes H) I'm supposed to be using the ASCII table to do this, and I'm having an issue when the characters wrap around. ex: w should become b, but my program gives me a character that's in the ASCII table. Another issue I'm having is that all the characters are printing on separate lines and I'd like them all to print on the same line.
I can't use lists or dictionaries.
This is what I have, I'm not sure how to do the final if statement
def main():
    fileName = input('Please enter the file name: ')
    encryptFile(fileName)
def encryptFile(fileName):
    f = open(fileName, 'r')
    line = 1
    while line:
        line = f.readline()
        for char in line:
            if char.isalnum():
                a = ord(char)
                b = a + 5
                # if number wraps around, how to correct it
                if
                    print(chr(c))
                else:
                    print(chr(b))
            else:
                print(char)
Using str.translate:
In [24]: import string
In [25]: string.uppercase
Out[25]: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
In [26]: string.uppercase[5:]+string.uppercase[:5]
Out[26]: 'FGHIJKLMNOPQRSTUVWXYZABCDE'
In [27]: table = string.maketrans(string.uppercase, string.uppercase[5:]+string.uppercase[:5])
In [28]: 'CAR'.translate(table)
Out[28]: 'HFW'
In [29]: 'HELLO'.translate(table)
Out[29]: 'MJQQT'
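The session above uses Python 2 names. Since the question's code appears to be Python 3 (it calls print as a function), a rough Python 3 equivalent is sketched below, using string.ascii_uppercase and the built-in str.maketrans. Like the session above, it only covers uppercase letters.

```python
import string

upper = string.ascii_uppercase
# Map each uppercase letter to the one 5 places later, wrapping around.
table = str.maketrans(upper, upper[5:] + upper[:5])

print("CAR".translate(table))    # HFW
print("HELLO".translate(table))  # MJQQT
```

Characters that are not in the table, such as spaces or punctuation, are left unchanged by translate.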
First, it matters whether the character is lower or upper case. I am going to assume here that all the characters are lower case (if they aren't, it would be easy enough to convert them).
if b > 122:  # z = 122
    b = b - 122
    c = b + 96  # a = 97
w = 119 and z = 122 in ASCII (decimal), so 119 + 5 = 124 and 124 - 122 = 2, which is our new b. We then add that to a - 1 = 96 (this takes care of the case where we get a 1 back): 2 + 96 = 98, and 98 is b.
To print on the same line, instead of printing each character as soon as you have it, I would append the characters to a list and then create a string from that list.
e.g. instead of
    print(chr(c))
else:
    print(chr(b))
I would do
    someList.append(chr(c))
else:
    someList.append(chr(b))
then join each element of the list together into one string.
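Since the question rules out lists and dictionaries, here is a sketch of the same wrap-around idea using only ord, chr, the modulo operator and string concatenation. The function names are made up for illustration, and wrapping digits within 0-9 (so 7 becomes 2) is an assumption, because the question does not say how digits should wrap.

```python
def shift_char(char, shift=5):
    """Shift one ASCII letter or digit, wrapping inside its own range."""
    if "a" <= char <= "z":
        base, size = ord("a"), 26
    elif "A" <= char <= "Z":
        base, size = ord("A"), 26
    elif "0" <= char <= "9":
        base, size = ord("0"), 10  # assumption: digits wrap within 0-9
    else:
        return char  # non-alphanumeric characters are left as they are
    # Position within the range, shifted, wrapped with %, then moved back.
    return chr((ord(char) - base + shift) % size + base)


def shift_text(text, shift=5):
    # Build the result as a string so everything ends up on one line.
    result = ""
    for char in text:
        result += shift_char(char, shift)
    return result


print(shift_text("Caw 7!"))  # Hfb 2!
```

Printing the finished string once, rather than calling print for each character, also fixes the one-character-per-line output.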
You could create a dictionary to handle it:
import string
s = string.ascii_lowercase + string.ascii_uppercase + string.digits + string.ascii_lowercase[:5]
encryptionKey = {s[i]: s[i+5] for i in range(len(s)-5)}
The final addend to s (+ string.ascii_lowercase[:5]) adds the first 5 letters onto the end, so the last characters have something to wrap around to. Then, we use a simple dictionary comprehension to create a key for the encryption.
Put into your code (I also changed it so that you iterate through the lines rather than using f.readline()):
import string
def main():
    fileName = input('Please enter the file name: ')
    encryptFile(fileName)
def encryptFile(fileName):
    s = string.ascii_lowercase + string.ascii_uppercase + string.digits + string.ascii_lowercase[:5]
    encryptionKey = {s[i]: s[i+5] for i in range(len(s)-5)}
    f = open(fileName, 'r')
    for line in f:
        for char in line:
            if char.isalnum():
                print(encryptionKey[char])
            else:
                print(char)
I have done this operation millions of times, just using the + operator! I have no idea why it is not working this time, it is overwriting the first part of the string with the new one! I have a list of strings and just want to concatenate them in one single string! If I run the program from Eclipse it works, from the command-line it doesn't!
The list is:
["UNH+1+XYZ:08:2:1A+%CONVID%'&\r", "ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&\r", "DUM'&\r"]
I want to discard the first and the last elements, the code is:
ediMsg = ""
count = 1
print "extract_the_info, lineList ",lineList
print "extract_the_info, len(lineList) ",len(lineList)
while (count < (len(lineList)-1)):
    temp = ""
    # ediMsg = ediMsg+str(lineList[count])
    # print "Count "+str(count)+" ediMsg ",ediMsg
    print "line value : ",lineList[count]
    temp = lineList[count]
    ediMsg += " "+temp
    print "ediMsg : ",ediMsg
    count += 1
    print "count ",count
Look at the output:
extract_the_info, lineList ["UNH+1+XYZ:08:2:1A+%CONVID%'&\r", "ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&\r", "DUM'&\r"]
extract_the_info, len(lineList) 8
line value : ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&
ediMsg : ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&
count 2
line value : DUM'&
DUM'& : ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&
count 3
Why is it doing so!?
While the two answers are correct (use " ".join()), your problem (besides very ugly python code) is this:
Your strings end in "\r", which is a carriage return. Everything is fine, but when you print to the console, "\r" will make printing continue from the start of the same line, hence overwrite what was written on that line so far.
You should use the following and forget about this nightmare:
''.join(list_of_strings)
The problem is not with the concatenation of the strings (although that could use some cleaning up), but in your printing. The \r in your string has a special meaning and will overwrite previously printed strings.
Use repr(), as such:
...
print "line value : ", repr(lineList[count])
temp = lineList[count]
ediMsg += " "+temp
print "ediMsg : ", repr(ediMsg)
...
to print out your result; that will make sure any special characters don't mess up the output.
'\r' is the carriage return character. When you're printing out a string, a '\r' will cause the next characters to go at the start of the line.
Change this:
print "ediMsg : ",ediMsg
to:
print "ediMsg : ",repr(ediMsg)
and you will see the embedded \r values.
And while your code works, please change it to the one-liner:
ediMsg = ' '.join(lineList[1:-1])
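A small Python 3 sketch of both suggestions, using the three-element list from the question. The snake_case variable names are made up, and stripping the trailing '\r' before joining is an extra assumption: it only makes sense if the carriage returns are not needed in the final message.

```python
line_list = [
    "UNH+1+XYZ:08:2:1A+%CONVID%'&\r",
    "ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&\r",
    "DUM'&\r",
]

# Drop the first and last elements and join the rest with spaces.
joined = " ".join(line_list[1:-1])
print(repr(joined))  # repr shows the embedded \r instead of acting on it

# Assumption: the carriage returns are unwanted, so strip them first.
edi_msg = " ".join(line.rstrip("\r") for line in line_list[1:-1])
print(repr(edi_msg))
```

Printing the repr makes it clear that the concatenation itself was always correct and only the console display was misleading.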
Your problem is printing, and it is not string manipulation. Try using '\n' as last char instead of '\r' in each string in:
lineList = [
"UNH+1+TCCARQ:08:2:1A+%CONVID%'&\r",
"ORG+1A+77499505:PARAF0103+++A+FR:EUR++11730788+1A'&\r",
"DUM'&\r",
"FPT+CC::::::::N'&\r",
"CCD+CA:5132839000000027:0450'&\r",
"CPY+++AF'&\r",
"MON+712:1.00:EUR'&\r",
"UNT+8+1'\r"
]
I just gave it a quick look. It seems your problem arises when you are printing the text. I haven't done such things for a long time, but probably you only get the last line when you print. If you check the actual variable, I'm sure you'll find that the value is correct.
By last line, I'm talking about the \r you got in the text strings.