Rotate list for every letter in string - python

I am attempting to make a Caesar cipher that changes the key each letter, I currently have a working cipher that scrambles the entire string once, running 1-25 however I would like it to do it for each letter, as in the string "ABC" would shift A by 1, B by 2 and C by 3, resulting in BDF
I already have a working cipher, and am just not sure how to have it change each letter.
upper = collections.deque(string.ascii_uppercase)
lower = collections.deque(string.ascii_lowercase)
upper.rotate(number_to_rotate_by)
lower.rotate(number_to_rotate_by)
upper = ''.join(list(upper))
lower = ''.join(list(lower))
return rotate_string.translate(str.maketrans(string.ascii_uppercase, upper)).translate(str.maketrans(string.ascii_lowercase, lower))
#print (caesar("This is simple", 2))
our_string = "ABC"
for i in range(len(string.ascii_uppercase)):
print (i, "|", caesar(our_string, i))
Outcome is this:
0 | ABC
1 | ZAB
2 | YZA
3 | XYZ
4 | WXY
5 | VWX
6 | UVW
7 | TUV
8 | STU
9 | RST
10 | QRS
11 | PQR
12 | OPQ
13 | NOP
14 | MNO
15 | LMN
16 | KLM
17 | JKL
18 | IJK
19 | HIJ
20 | GHI
21 | FGH
22 | EFG
23 | DEF
24 | CDE
25 | BCD
What I would like is to have it a shift of 1 or 0 for the first letter, then 2 for the second, and so on.

Good effort! Note that the mapping doesn't only rearrange letters in the alphabet, so it's never achieved by rotating the alphabet. In your example, upper would become the following mapping:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
BDFHJLNPRTVXZBDFHJLNPRTVXZ
Also note this cipher is not easily reversible, i.e. it's not clear whether to reverse 'B'->'A' or 'B'->'N'.
(Side note: If we treat letters ZABCDEFGHIJKLMNOPQRSTUVWXY as numbers 0-25, this cipher multiplies by two (in modulo 26): (x*2)%26. If instead of 2, we multiply by any number not divisible by 2 and 13, the resulting cipher will always be reversible. Can you see why? Hints: [1], [2].)
When you feel confused about a piece of code, often it's a good sign it's time to refactor a part of it into a separate function, e.g. like this:
(Playground: https://ideone.com/wNSADR)
import string
def letter_index(letter):
"""Determines the position of the given letter in the English alphabet
'a' -> 0
'A' -> 0
'z' -> 25
"""
if letter not in string.ascii_letters:
raise ValueError("The argument must be an English letter")
if letter in string.ascii_lowercase:
return ord(letter) - ord('a')
return ord(letter) - ord('A')
def caesar(s):
"""Ciphers the string s by shifting 'A'->'B', 'B'->'D', 'C'->'E', etc
The shift is cyclic, i.e. 'A' comes after 'Z'.
"""
ret = ""
for letter in s:
index = letter_index(letter)
new_index = 2*index + 1
if new_index >= len(string.ascii_lowercase):
# The letter is shifted farther than 'Z'
new_index %= len(string.ascii_lowercase)
new_letter = chr(ord(letter) - index + new_index)
ret += new_letter
return ret
print('caesar("ABC"):', caesar("ABC"))
print('caesar("abc"):', caesar("abc"))
print('caesar("XYZ"):', caesar("XYZ"))
Output:
caesar("ABC"): BDF
caesar("abc"): bdf
caesar("XYZ"): VXZ
Resources:
chr
ord

Related

Unify different separators in column

I have a dataframe df with ~450000 rows and 4 columns like "HK" as in the example:
df = pd.DataFrame(
{
"HK": [
"19000000-ac-;ghj-;qrs",
"19000000- abcd-",
"19000000 -abc;klm-",
"19000000 - abc-;",
"19000000 a-",
]
}
)
df.head()
| HK
| -------------
| 19000000-ac-;ghj-;qrs
| 19000000- abcd-
| 19000000 -abc-;klm-
| 19000000 - abc-;
| 19000000 a-
I always have 8 digits followed by a value. The digits and the value are separated through different forms of "-" (no whitespace inbetween digits and value, whitespace left, whitespace right, whitespace left and right or only a whitespace without a "-").
I would like to get a unified presentation whith "$digits$ - $value$" so that my column looks like this:
| HK
| -------------
| 19000000 - ac-;ghj-;qrs
| 19000000 - abcd-
| 19000000 - abc-;klm-
| 19000000 - abc-;
| 19000000 - a-
Using pd.Series.str.replace with a regular expression:
>>> df['HK'].str.replace(r'(?<=\d{8})[\s-]+(?=\w)', ' - ', regex=True)
0 19000000 - ac-;ghj-;qrs
1 19000000 - abcd-
2 19000000 - abc;klm-
3 19000000 - abc-;
4 19000000 - a-
Name: HK, dtype: object
Explaining the regular expression. There is a lookback (?<=\d{8}) requiring that there are eight digits immediately before the main section. The main section is [\s-]+ which requires one or more characters which are whitespace or hyphens. Then there is a lookahead (?=\w) requiring that immediately after this is a word character (in this case, something like a).

Removing rows contains non-english words in Pandas dataframe

I have a pandas data frame that consists of 4 rows, the English rows contain news titles, some rows contain non-English words like this one
**She’s the Hollywood Power Behind Those ...**
I want to remove all rows like this one, so all rows that contain at least non-English characters in the Pandas data frame.
If using Python >= 3.7:
df[df['col'].map(lambda x: x.isascii())]
where col is your target column.
Data:
df = pd.DataFrame({
'colA': ['**She’s the Hollywood Power Behind Those ...**',
'Hello, world!', 'Cainã', 'another value', 'test123*', 'âbc']
})
print(df.to_markdown())
| | colA |
|---:|:------------------------------------------------------|
| 0 | **She’s the Hollywood Power Behind Those ...** |
| 1 | Hello, world! |
| 2 | Cainã |
| 3 | another value |
| 4 | test123* |
| 5 | âbc |
Identifying and filtering strings with non-English characters (see the ASCII printable characters):
df[df.colA.map(lambda x: x.isascii())]
Output:
colA
1 Hello, world!
3 another value
4 test123*
Original approach was to use a user-defined function like this:
def is_ascii(s):
try:
s.encode(encoding='utf-8').decode('ascii')
except UnicodeDecodeError:
return False
else:
return True
You can use regex to do that.
Installation documentation is here. (just a simple pip install regex)
import re
and use [^a-zA-Z] to filter it.
to break it down:
^: Not
a-z: small letter
A-Z: Capital letters

Calculating the percentage of values based on the values in other columns

I am trying to create a column that includes a percentage of values based on the values in other columns in python. For example, let's assume that we have the following dataset.
+------------------------------------+------------+--------+
| Teacher | grades | counts |
+------------------------------------+------------+--------+
| Teacher1 | 1 | 1 |
| | 2 | 2 |
| | 3 | 1 |
| Teacher2 | 2 | 1 |
| Teacher3 | 3 | 2 |
| Teacher4 | 2 | 2 |
| | 3 | 2 |
+------------------------------------+------------+--------+
As you can see we have teachers in the first columns, grades that teacher gives (1,2 and 3) in the second column, and the number of given corresponding grade in third columns. Here, I am trying to get the percentage of grade numbers 1 and 2 in total given grade for each teacher. For instance, teacher 1 gave one grade 1, two grade 2, and one grade 3. In this case, the percentage of given grade numbers 1 and 2 in the total grade is 75%. Teacher 2 gave only 1 grade 2 so the percentage is 100%. Similarly, teacher 3 gave two grade 3 so the percentage 0% because he/she did not give any grades 1 and 2. So these percentages should be added to the new column in the dataset. Honestly, I couldn't even think about anything to try and I didn't find anything about it when I search it in here. Could you please help me to get the column.
I am not sure this is the most efficient way, but I find it quite readable and easy to follow.
percents = {} #store Teacher:percent
for t, g in df.groupby('Teacher'): #t,g is short for teacher,group
total = g.counts.sum()
one_two = g.loc[g.grades.isin([1,2])].counts.sum() #consider only 1&2
percent = (one_two/total)*100
#print(t, percent)
percents[t] = [percent]
xf = pd.DataFrame(percents).T.reset_index() #make a df from the dic
xf.columns = ['Teacher','percent'] #rename columns
df = df.merge(xf) #merge with initial df
print(df)
Teacher grades counts percent
0 Teacher1 1 1 75.0
1 Teacher1 2 2 75.0
2 Teacher1 3 1 75.0
3 Teacher2 2 1 100.0
4 Teacher3 3 2 0.0
5 Teacher4 2 2 50.0
6 Teacher4 3 2 50.0
I believe this will solve your query
y=0
data['Percentage']='None'
for teacher in teachers:
x=data[data['Teachers']==teacher]
total=sum(x['Counts'])
condition1= 1 in set(x['Grades'])
condition2= 2 in set(x['Grades'])
if (condition1==True or condition2==True):
for i in range(y,y+len(x)):
data['Percentage'].iloc[i]=(data['Counts'].iloc[i]/total)*100
else:
for i in range(y,y+len(x)):
data['Percentage'].iloc[i]=0
y=y+len(x)
Output:
Teachers Grades Counts Percentage
0 Teacher1 1 1 25
1 Teacher1 2 2 50
2 Teacher1 3 1 25
3 Teacher2 2 1 100
4 Teacher3 3 2 0
5 Teacher4 2 2 50
6 Teacher4 3 2 50
I have made used of boolean comprehension to segregate the data on
basis on each teacher. Most of the code is self explanatory. For any
other clarification please fill free to leave a comment.

how do I print parts of regex

I cannot print components of matched regex.
I am learning python3 and I need to verify that output of my command matches my needs. I have following short code:
#!/usr/bin/python3
import re
text_to_search = '''
1 | 27 23 8 |
2 | 21 23 8 |
3 | 21 23 8 |
4 | 21 21 21 |
5 | 21 21 21 |
6 | 27 27 27 |
7 | 27 27 27 |
'''
pattern = re.compile('(.*\n)*( \d \| 2[17] 2[137] [ 2][178] \|)')
matches = pattern.finditer(text_to_search)
for match in matches:
print (match)
print ()
print ('matched to group 0:' + match.group(0))
print ()
print ('matched to group 1:' + match.group(1))
print ()
print ('matched to group 2:' + match.group(2))
and following output:
<_sre.SRE_Match object; span=(0, 140), match='\n 1 | 27 23 8 |\n 2 | 21 23 8 |\n 3 >
matched to group 0:
1 | 27 23 8 |
2 | 21 23 8 |
3 | 21 23 8 |
4 | 21 21 21 |
5 | 21 21 21 |
6 | 27 27 27 |
7 | 27 27 27 |
matched to group 1: 6 | 27 27 27 |
matched to group 2: 7 | 27 27 27 |
please explain me:
1) why "print (match)" prints only beginning of match, does it have some kind of limit to trim output if its bigger than some threshold?
2) Why group(1) is printed as "6 | 27 27 27 |" ? I was hope (.*\n)* is as greedy as possible so it consumes everything from 1-6 lines, leaving last line of text_to_search to be matched against group(2), but seems (.*\n)* took only 6-th line. Why is that? Why lines 1-5 are not printed when printing group(1)?
3) I was trying to go through regex tutorial but failed to understand those tricks with (?...). How do I verify if numbers in last row are equal (so 27 27 27 is ok, but 21 27 27 is not)?
1) The print(match) only shows an outline of the object. match is an SRE_Match object, so in order to get information from it you need to do something like match.group(0), which is accessing a value stored in the object.
2) to capture lines 1-6 you need to change (.*\n)* to ((?:.*\n)*) according to this regex tester,
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
3) to match specific numbers you need to make it more specific and include these numbers into a seperate group at the end.

'ValueError: need more than 1 value to unpack' when creating a grid?

I have been tring to make a more difficult grid with 16 words. I have made a grid with 9 words but am unable to do 16 words. I keep on getting the 'ValueError: need more than 1 value to unpack' in my code?
#(Hard) This is the part of the program which puts the words in a Grid.
with open('WordsExt.txt') as f:
wordshard = random.sample([x.rstrip() for x in f],16)
gridhard = [wordshard[i:i + 3] for i in range(0, len(wordshard), 3)]
for x,y,z in gridhard:
print (x,y,z)
Obviously, the error occurs because gridhard does contain less than 3 elements.
The last value of iterator i in the third line of your code example is 15, but wordshard is not longer than 16. In that case, gridhard will only contain 1 letter, and hence, can not be unpacked into three values.
You're doing a 4x4 grid, so those 3s need to become 4s.
Also, you could use the .join method to build each row of your word grid, which makes the output format a little more flexible:
wordshard = [c*4 for c in 'ABCDEFGHIJKLMNOP']
gridhard = [wordshard[i:i + 4] for i in range(0, len(wordshard), 4)]
for row in gridhard:
print(' '.join(row))
output
AAAA BBBB CCCC DDDD
EEEE FFFF GGGG HHHH
IIII JJJJ KKKK LLLL
MMMM NNNN OOOO PPPP
If we change the last line to print(' | '.join(row)) the output becomes:
AAAA | BBBB | CCCC | DDDD
EEEE | FFFF | GGGG | HHHH
IIII | JJJJ | KKKK | LLLL
MMMM | NNNN | OOOO | PPPP
Alternatively, we can get the same output by using the * "splat" unpacking operator, and specifying a separator string in the print call:
print(*row, sep=' | ')

Categories

Resources