Encrypting the lines in a file - python

I'm trying to write a program that opens a text file, and shifts each of the characters in the file 5 characters to the right. It should only do this for alphanumeric characters, and leave nonalphanumerics as they are. (ex: C becomes H) I'm supposed to be using the ASCII table to do this, and I'm having an issue when the characters wrap around. ex: w should become b, but my program gives me a character that's in the ASCII table. Another issue I'm having is that all the characters are printing on separate lines and I'd like them all to print on the same line.
I can't use lists or dictionaries.
This is what I have; I'm not sure how to write the final if statement:
def main():
    fileName = input('Please enter the file name: ')
    encryptFile(fileName)
def encryptFile(fileName):
    f = open(fileName, 'r')
    line = 1
    while line:
        line = f.readline()
        for char in line:
            if char.isalnum():
                a = ord(char)
                b = a + 5
                # if number wraps around, how to correct it
                if
                    print(chr(c))
                else:
                    print(chr(b))
            else:
                print(char)

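Using str.translate (the session below is Python 2; in Python 3, string.uppercase is string.ascii_uppercase and string.maketrans is str.maketrans):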
In [24]: import string
In [25]: string.uppercase
Out[25]: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
In [26]: string.uppercase[5:]+string.uppercase[:5]
Out[26]: 'FGHIJKLMNOPQRSTUVWXYZABCDE'
In [27]: table = string.maketrans(string.uppercase, string.uppercase[5:]+string.uppercase[:5])
In [28]: 'CAR'.translate(table)
Out[28]: 'HFW'
In [29]: 'HELLO'.translate(table)
Out[29]: 'MJQQT'
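For Python 3, the same idea might look like the sketch below. The helper names rotate and make_shift_table are made up for illustration, and it assumes digits should wrap within 0-9 and letters within their own case, which the question does not spell out. Note that str.maketrans builds a dict internally, so this may not fit a "no dictionaries" constraint.

```python
import string

def rotate(alphabet, shift):
    """Return alphabet rotated left by shift positions."""
    shift %= len(alphabet)
    return alphabet[shift:] + alphabet[:shift]

def make_shift_table(shift=5):
    """Build a translation table that shifts letters and digits by shift.

    Assumption: each group (upper, lower, digits) wraps around on its own.
    """
    groups = (string.ascii_uppercase, string.ascii_lowercase, string.digits)
    src = "".join(groups)
    dst = "".join(rotate(g, shift) for g in groups)
    return str.maketrans(src, dst)

table = make_shift_table(5)
print("CAR".translate(table))    # HFW
print("w, 7!".translate(table))  # b, 2!
```

Characters that are not in the table (spaces, punctuation, newlines) pass through translate unchanged.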

First, it matters whether the character is lower or upper case. I am going to assume here that all the characters are lower case (if they aren't, it would be easy enough to convert them).
if b > 122:        # z = 122
    b = b - 122
    c = b + 96     # a = 97
In ASCII (decimal), w = 119 and z = 122, so 119 + 5 = 124 and 124 - 122 = 2, which is our new b. We then add that to 96 (the code for a, minus 1, so that a result of 1 maps to a): 2 + 96 = 98, and 98 is b.
To print on the same line, instead of printing each character as you get it, I would append the characters to a list and then create a string from that list.
e.g. instead of
    print(chr(c))
else:
    print(chr(b))
I would do
    someList.append(chr(c))
else:
    someList.append(chr(b))
then join the elements of the list together into one string with ''.join(someList).
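Since the question rules out lists and dictionaries, here is a minimal sketch of the wrap-around using modular arithmetic and plain string concatenation instead. The function names shift_char and encrypt_text are hypothetical, and it assumes digits wrap within 0-9 and letters within their own case.

```python
def shift_char(char, shift=5):
    """Shift one ASCII letter or digit, wrapping around within its own range."""
    if 'a' <= char <= 'z':
        base, size = ord('a'), 26
    elif 'A' <= char <= 'Z':
        base, size = ord('A'), 26
    elif '0' <= char <= '9':
        base, size = ord('0'), 10
    else:
        return char  # non-alphanumeric characters are left as they are
    # Work with an offset from the start of the range so % handles the wrap.
    return chr((ord(char) - base + shift) % size + base)

def encrypt_text(text, shift=5):
    """Build the encrypted string one character at a time (no list needed)."""
    result = ""
    for char in text:
        result += shift_char(char, shift)
    return result

print(encrypt_text("Cw 7!"))  # Hb 2!
```

If you would rather print each character as you go, print(shift_char(char), end='') keeps the output on one line in Python 3.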

You could create a dictionary to handle it:
import string
s = string.ascii_lowercase + string.ascii_uppercase + string.digits + string.ascii_lowercase[:5]
encryptionKey = {s[i]: s[i+5] for i in range(len(s)-5)}
The final addend to s (+ string.ascii_lowercase[:5]) appends the first 5 lowercase letters so that the last 5 characters (the digits 5-9) wrap around to a-e. Then, we use a simple dictionary comprehension to create a key for the encryption. Note that this treats letters and digits as one continuous sequence, so v-z map to A-E and V-Z map to 0-4.
Put into your code (I also changed it so you iterate through the lines rather than using f.readline()):
import string
def main():
    fileName = input('Please enter the file name: ')
    encryptFile(fileName)
def encryptFile(fileName):
    # string.ascii_lowercase / ascii_uppercase exist in both Python 2 and 3
    s = string.ascii_lowercase + string.ascii_uppercase + string.digits + string.ascii_lowercase[:5]
    encryptionKey = {s[i]: s[i+5] for i in range(len(s)-5)}
    f = open(fileName, 'r')
    for line in f:
        for char in line:
            if char.isalnum():
                print(encryptionKey[char])
            else:
                print(char)

Related

Python: how to avoid the space in dna calculated?

I am using python 2.7.
I want to find the length of a DNA sequence, but I can't see where the mistake is. The length is supposed to be 283, but it comes out as 345.
The sequence printed on a single line looks correct; only the length is wrong.
I think the spaces are being counted too. How can I get the length of the DNA without including the spaces?
Thank you.
import re
singleSeq = ""
fh = open("seq.embl.txt")
lines = fh.readlines()
for line in lines:
    lines = line.strip()
    m = re.match(r"\s+(.[^\d]+)\s+\d+", line)
    if m:
        print(m.group(0))
        seqline = m.group(1)
        print(seqline)
        singleSeq += seqline
print("\nSequence in a single line: ")
# print(line.strip(singleSeq))
print(singleSeq)
print("\nSequence length: ", len(singleSeq))
Output
Sequence in a single line:
cccatgtccc agcggcgtat tgctttgcat cgcgaacgca ctttcaatgt cccagcggcg tattgcttct attttataag taccagctaa attttttttt tttttttata agtaccagct aaaatttttt tttttttttt ttataagtac cagctaaaat tttttttttt tttttttata agtaccagct aaaatttttt ttttttttta taagttccag cggcgtattg ctttctgaaa tttaaaaaaa aaaaaaaatt tttttttaat aatatattat ata
Sequence length: 345
This should do the trick
# Python3 code to remove whitespace
def remove(string):
    return string.replace(" ", "")
# Driver Program
string = ' t e s t '
print(remove(string))
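If the sequence might also contain tabs or newlines, a variant that drops every kind of whitespace is sketched below. The name strip_whitespace and the sample string are made up for illustration; in the question's code you would apply it to singleSeq before calling len().

```python
def strip_whitespace(seq):
    """Remove every whitespace character (spaces, tabs, newlines) from seq."""
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(seq.split())

# Hypothetical sample: three 10-base groups separated by different whitespace.
sample = "cccatgtccc agcggcgtat\ttgctttgcat\n"
clean = strip_whitespace(sample)
print(clean)
print(len(clean))  # 30
```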
It seems you are reinventing the wheel here. I strongly suggest you try BioPython for this:
from Bio import SeqIO
record = SeqIO.read("seq.embl.txt", "embl")
print("\nSequence length: ", len(record))

String for text issue in python

I ran into some problems trying to work with the text of a file as a string.
Here is the provided file, sqroot2_10kdigits.txt.
Its contents are below:
1.4142135623 7309504880 1688724209 6980785696 7187537694 8073176679 7379907324 7846210703 8850387534 3276415727 3501384623 0912297024
9248360558 5073721264 4121497099 9358314132 2266592750 5592755799
9505011527 8206057147 0109559971 6059702745 3459686201 4728517418
6408891986 0955232923 0484308714 3214508397 6260362799 5251407989
6872533965 4633180882 9640620615 2583523950 5474575028 7759961729
8355752203 3753185701 1354374603 4084988471
My code is below:
myfile = open("sqroot2_10kdigits.txt")
txt = myfile.read()
print(txt)
myfile.close()
Q2: Make a new empty string called sqroot_2_string. Note that there's a space between every 10 digits. Instead of using the .rstrip() method, try using .replace(" ", "") to remove all the spaces in the file and save the result in the empty string you just made. Check the length of the string as well; it should be 10002. Then print the first 10 digits followed by "...". Here's an example:
The first 10 digit of square root of 2 is 1.4142135623...
My code is below:
def sqroot_2_string(string):
    count = 0
    list = []
    for i in xrange(len(string)):
        if string[i] != ' ':
            list.append(string[i])
    return toString(list)
# Utility Function
def toString(List):
    return ''.join(List)
# Driver program
string = myfile
print sqroot_2_string(string)
Can anyone check my code for Q2? I don't know how to use .replace(" ", "") to remove all the spaces in the file and save the result in the empty string.
You can just do
def sqroot_2_string(string):
    return string.replace(" ", "")
Also note that you should do
print(sqroot_2_string(txt))
so that you are passing the text read from the file rather than the file handle.
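Put together, the whole exercise could look roughly like this. It is only a sketch: the sample string stands in for the text read from sqroot2_10kdigits.txt, and it assumes line breaks have to be removed as well as spaces, since the excerpt in the question spans several lines and the expected length of 10002 would not leave room for them.

```python
# Hypothetical stand-in for txt = myfile.read(); the real file is much longer.
sample = ("1.4142135623 7309504880 1688724209\n"
          "9248360558 5073721264\n")

# Remove the spaces between digit groups, then the line breaks (assumption).
sqroot_2_string = sample.replace(" ", "").replace("\n", "")

print(len(sqroot_2_string))
# "1." plus the first 10 digits is 12 characters.
print("The first 10 digits of the square root of 2 are "
      + sqroot_2_string[:12] + "...")
```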

Taking input from a text file for implementing Caesar Cipher

I am trying to implement Caesar cipher in Python where my program would take input from a text file i.e. input_file.txt, and write the encrypted text as an output to another text file named output_file.txt. The input file contains:
Attack On Titans
4
where "Attack On Titans" is the string to be encrypted and 4 is the key to the encryption algorithm. The correct output for this string should be
Exxego Sr Xmxerw
but my program gives me
Exxego Sr Xmxerwv
i.e an extra character v. Here is my code for review:
data = open("input_file.txt", "r")
text = data.readline()
print(text)
key = int(data.readline())
def encrypt(text,key):
    result = ""
    for i in range(len(text)):
        char = text[i]
        if char == ' ':
            result += ' '
        elif char.isupper():
            result += chr((ord(char) + key-65) % 26 + 65)
        else:
            result += chr((ord(char) + key - 97) % 26 + 97)
    return result
ex= open("output_file.txt","w")
ex.write(encrypt(text,key))
print(encrypt(text , key))
I just wanted to know why am I getting this incorrect output although I know I can make it correct if I change the for statement by doing this:
for i in range(len(text)-1)
Please don't mind this amateurish coding since I am not good at it and want to improve it. Thanks.
data.readline() will give you the trailing newline character \n. You need to call text.strip() before passing the text to the encrypt function to get rid of it.
It looks like you have a trailing newline character in the file you are reading in.
Testing it in the python interpreter:
>>> a = '\n'
>>> (ord(a)+4-97) % 26 + 97
118
>>> chr(118)
'v'
Remove leading and trailing whitespace by calling text.strip() before passing the text to your encrypt function.
As an aside, you should either explicitly close your files, e.g. ex.close(), or wrap the file handling in a with block like this, to prevent file corruption.
with open('output_file.txt', 'w') as ex:
    ex.write(encrypt(text, key))
data.readline() keeps the '\n' (newline) character at the end of the line. It's the reason why you have an extra character in your output.
To remove it you can replace
text = data.readline()
by
text = data.readline().rstrip('\n')
which will remove the '\n' at the end.
text.strip() (see other answers) will remove all whitespace characters from both ends of the string. If that's not the behaviour you want, use .rstrip('\n'), which removes only '\n' at the end of the string.
You should also add
ex.close()
after
ex.write(encrypt(text,key))
to commit the change to the file.
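As a rough illustration of the points above, here is a variant of encrypt that only shifts letters and passes everything else through, so a stray newline can no longer turn into a v. The two hard-coded strings are stand-ins for the lines read from input_file.txt; this is a sketch, not the only way to structure it.

```python
def encrypt(text, key):
    """Caesar-shift letters by key; leave every other character unchanged."""
    result = ""
    for char in text:
        if char.isupper():
            result += chr((ord(char) - 65 + key) % 26 + 65)
        elif char.islower():
            result += chr((ord(char) - 97 + key) % 26 + 97)
        else:
            result += char  # spaces, digits, punctuation, newlines
    return result

# Stand-ins for the two readline() results, trailing newlines included.
first_line, second_line = "Attack On Titans\n", "4\n"
text = first_line.rstrip("\n")
key = int(second_line)  # int() tolerates the trailing newline
print(encrypt(text, key))  # Exxego Sr Xmxerw
```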

Determine ROT encoding

I want to determine which type of ROT encoding is used and based off that, do the correct decode.
Also, I have found the following code, which does decode the rot13 string "sbbone" to "foobar" correctly:
import codecs
codecs.decode('sbbone', 'rot_13')
The thing is I'd like to run this python file against an existing file which has rot13 encoding. (for example rot13.py encoded.txt).
Thank you!
To answer the second part of your first question, decoding something in ROT-x, you can use the following code:
def encode(s, ROT_number=13):
    """Encodes a string (s) using ROT (ROT_number) encoding."""
    ROT_number %= 26  # To avoid IndexErrors
    alpha = "abcdefghijklmnopqrstuvwxyz" * 2
    alpha += alpha.upper()
    def get_i():
        for i in range(26):
            yield i  # indexes of the lowercase letters
        for i in range(52, 78):
            yield i  # indexes of the uppercase letters
    ROT = {alpha[i]: alpha[i + ROT_number] for i in get_i()}
    return "".join(ROT.get(i, i) for i in s)
def decode(s, ROT_number=13):
    """Decodes a string (s) using ROT (ROT_number) encoding."""
    return encode(s, abs(ROT_number % 26 - 26))
To answer the first part of your first question, finding the ROT encoding of an arbitrarily encoded string, you probably want to brute-force it: try all 26 ROT encodings and check which one makes the most sense. A quick(-ish) way to do this is to get a newline-delimited file (e.g. cat\ndog\nmouse\nsheep\nsay\nsaid\nquick\n... where \n is a newline) containing the most common words in the English language, and then check which encoding produces the most of those words.
with open("words.txt") as f:
    words = frozenset(f.read().lower().split("\n"))
    # frozenset for speed
def get_most_likely_encoding(s, delimiter=" "):
    alpha = "abcdefghijklmnopqrstuvwxyz" + delimiter
    for punctuation in "\n\t,:; .()":
        s = s.replace(punctuation, delimiter)  # str.replace returns a new string
    s = "".join(c for c in s if c.lower() in alpha)
    word_count = [sum(w.lower() in words for w in encode(
        s, enc).split(delimiter)) for enc in range(26)]
    return word_count.index(max(word_count))
A file on Unix machines that you could use is /usr/dict/words, which can also be found here
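For a quick feel of the brute-force idea without a word file, here is a compact, self-contained sketch. The names rot, guess_shift and COMMON_WORDS are made up for illustration, and the tiny word set is only a stand-in for a real word list, so it will only recognise text that uses those words.

```python
import string

def rot(text, shift):
    """Rotate ASCII letters in text by shift positions; other characters pass through."""
    shift %= 26
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift])
    return text.translate(table)

# Hypothetical stand-in for a real word list such as words.txt.
COMMON_WORDS = frozenset({"the", "quick", "brown", "fox", "and", "dog", "over"})

def guess_shift(ciphertext):
    """Return the shift that, applied to ciphertext, yields the most known words."""
    def score(shift):
        candidate = rot(ciphertext, shift).lower().split()
        return sum(word in COMMON_WORDS for word in candidate)
    return max(range(26), key=score)

secret = rot("The quick brown fox", 13)
shift = guess_shift(secret)
print(shift, rot(secret, shift))  # 13 The quick brown fox
```

With a realistic word list the scoring works the same way; ties are resolved in favour of the smallest shift because max returns the first maximum.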
Well, you can read the file line by line and decode it.
The output should go to an output file:
import codecs
import sys
def main(filename):
    output_file = open('output_file.txt', 'w')
    with open(filename) as f:
        for line in f:
            output_file.write(codecs.decode(line, 'rot_13'))
    output_file.close()
if __name__ == "__main__":
    _filename = sys.argv[1]
    main(_filename)

Extracting Data from Multiple TXT Files and Creating a Summary CSV File in Python

I have a folder with about 50 .txt files containing data in the following format.
=== Predictions on test data ===
inst# actual predicted error distribution (OFTd1_OF_Latency)
1 1:S 2:R + 0.125,*0.875 (73.84)
I need to write a program that combines the following: my index number (i), the letter of the true class (R or S), the letter of the predicted class, and each of the distribution predictions (the decimals less than 1.0).
I would like it to look like the following when finished, but preferably as a .csv file.
ID True Pred S R
1 S R 0.125 0.875
2 R R 0.105 0.895
3 S S 0.945 0.055
. . . . .
. . . . .
. . . . .
n S S 0.900 0.100
I'm a beginner and a bit fuzzy on how to get all of that parsed and then concatenated and appended. Here's what I was thinking, but feel free to suggest another direction if that would be easier.
for i in range(1, n):
    s = str(i)
    readin = open('mydata/output/output'+s+'out','r')
    #The files are all named the same but with different numbers associated
    output = open("mydata/summary.csv", "a")
    storage = []
    for line in readin:
        #data extraction/concatenation here
        if line.startswith('1'):
            id = i
            true = # split at the ':' and take the letter after it
            pred = # split at the second ':' and take the letter after it
            #some have error '+'s and some don't so I'm not exactly sure what to do to get the distributions
            ds = # split at the ',' and take the string of 5 digits before it
            if pred == 'R':
                dr = #skip the character after the comma but take the five characters after
            else:
                #take the five characters after the comma
            lineholder = id+' , '+true+' , '+pred+' , '+ds+' , '+dr
        else: continue
    output.write(lineholder)
I think using the indexes would be another option, but it might complicate things if the spacing is off in any of the files and I haven't checked this for sure.
Thank you for your help!
Well, first of all, if you want to write CSV, you should use the csv module that comes with Python. More about this module here: https://docs.python.org/2.7/library/csv.html I won't demonstrate how to use it, because it's pretty simple.
As for reading the input data, here's my suggestion how to break down every line of the data itself. I assume that lines of data in the input file have their values separated by spaces, and each value cannot contain a space:
def process_line(id_, line):
    pieces = line.split()  # Now we have a list of values
    true = pieces[1].split(':')[1]  # split at the ':' and take the letter after it
    pred = pieces[2].split(':')[1]  # split at the second ':' and take the letter after it
    if len(pieces) == 6:  # There was an error, the + is there
        p4 = pieces[4]
    else:  # There was no '+', only spaces
        p4 = pieces[3]
    ds, dr = p4.split(',')  # the values before and after the ','
    # Assumption: the '*' marks the predicted class, as in the example line
    if pred == 'R':
        dr = dr[1:]  # skip the '*' after the comma
    else:
        ds = ds[1:]  # the '*' is on the first value instead
    return str(id_)+' , '+true+' , '+pred+' , '+ds+' , '+dr
What I mainly used here was split function of strings: https://docs.python.org/2/library/stdtypes.html#str.split and in one place this simple syntax of str[1:] to skip the first character of the string (strings are arrays after all, we can use this slicing syntax).
Keep in mind that my function won't handle any errors or lines formatted differently from the one you posted as an example. If the values in every line are separated by tabs and not spaces, you should replace this line: pieces = line.split() with pieces = line.split('\t').
I think you can extract the floats and then combine them with the class strings using the re module, as follows:
import re
with open('sample.txt', 'r') as file:
    lines = file.readlines()
strings = [re.findall(r'\d+\.\d+', line) for line in lines]
print(strings)
num = [re.findall(r'\w+:\w+', line) for line in lines]
print(num)
s = num + strings
print(s)  # [['1:S', '2:R'], ['0.125', '0.875', '73.84']] for a one-line file
The output in the comment is for a file containing a single line; the comprehensions loop over every line, so the same code handles multiple lines as well.
Contents of sample.txt:
1 1:S 2:R + 0.125,*0.875 (73.84)
2 1:S 2:R + 0.15,*0.85 (69.4)
When you run the program, the result will be:
[['1:S', '2:R'], ['1:S', '2:R'], ['0.125', '0.875', '73.84'], ['0.15', '0.85', '69.4']]
Then simply concatenate them.
This uses regular expressions and the csv module.
import re
import csv
# Python's re module has no POSIX classes such as [[:blank:]], so \s is used instead
matcher = re.compile(r'\s*1\s.*?:(.).*?:(.).*?\*?([\d.]+),\*?([\d.]+)')
filenametemplate = 'mydata/output/output%iout'
output = csv.writer(open('mydata/summary.csv', 'w'))
for i in range(1, n):  # n as in the question
    for line in open(filenametemplate % i):
        m = matcher.match(line)
        if m:
            output.writerow([i] + list(m.groups()))  # csv writers provide writerow, not write
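To try the parsing without the 50 files, here is a small self-contained sketch that writes to an in-memory buffer. The names LINE_RE, parse_prediction and samples are made up, and the three sample lines are hypothetical, written to reproduce the first rows of the table in the question; it assumes every prediction line follows the layout of the one example given.

```python
import csv
import io
import re

# Matches e.g. "1 1:S 2:R + 0.125,*0.875 (73.84)"; the '+' and '*' are optional.
LINE_RE = re.compile(
    r'\s*\d+\s+\d+:(\w)\s+\d+:(\w)\s+(?:\+\s+)?\*?([\d.]+),\*?([\d.]+)')

def parse_prediction(line):
    """Return (true, pred, s_prob, r_prob), or None if the line is not a prediction."""
    m = LINE_RE.match(line)
    return m.groups() if m else None

# Hypothetical stand-ins for the contents of the numbered output files.
samples = {
    1: "inst# actual predicted error distribution\n1 1:S 2:R + 0.125,*0.875 (73.84)",
    2: "1 2:R 2:R   0.105,*0.895 (73.84)",
    3: "1 1:S 1:S   *0.945,0.055 (73.84)",
}

# Swap the buffer for open('mydata/summary.csv', 'w', newline='') to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ID", "True", "Pred", "S", "R"])
for file_id, text in samples.items():
    for line in text.splitlines():
        row = parse_prediction(line)
        if row:  # header lines and blanks are skipped
            writer.writerow([file_id, *row])

print(buffer.getvalue())
```

Keeping the regular expression strict means malformed lines are silently skipped, so it may be worth counting the rows written per file if the inputs are not uniform.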
