Pandas Series Conditionally Change String - python

I am trying to change the strings in a Pandas Series based on a condition. If the string is, say, 'A', it should become 'AA'. The code snippet below works, but it is inelegant and inefficient. I pass a Pandas Series as the argument. Is there a better way to accomplish this?
def conditions(x):
    if x == 'A':
        return "AA"
    elif x == 'B':
        return "BB"
    elif x == 'C':
        return "CC"
    elif x == 'D':
        return "DD"
    elif x == 'E':
        return "EE"
    elif x == 'F':
        return "FF"
    elif x == 'G':
        return "GG"
    elif x == 'H':
        return "HH"
    elif x == 'I':
        return "I"
    elif x == 'J':
        return "JJ"
    elif x == 'K':
        return "KK"
    elif x == 'L':
        return 'LL'
func = np.vectorize(conditions)
test = func(rfqs["client"])

If you are just trying to repeat a given string, you can add the string to itself across all rows at once. If you have some other condition, you can specify that condition and add the string to itself only for rows that meet the criteria. See this toy example:
df = pd.DataFrame({'client': ['A', 'B', 'Z']})
df.loc[df['client'].str.contains('[A-L]'), 'client'] = df['client'] * 2
to get
client
0 AA
1 BB
2 Z

You can use a dictionary to avoid all those if elses:
d = {i:2*i for i in ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L')}
test = rfqs["client"].apply(lambda x: d[x] if x in d else x)

Here is another way:
l = list('ABCDEFGHIJKL')
df['col'] = df['col'].mask(df['col'].isin(l),df['col'].str.repeat(2))
With np.where()
df['col'].mul(np.where(df['col'].isin(l),2,1))
With map()
df['A'].mul(df['A'].map({i:2 for i in l}).fillna(1).astype(int))
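When the replacements are not simply the doubled string, an explicit lookup table may be easier to maintain. The following is a minimal sketch, assuming a small stand-in Series named clients (hypothetical, in place of rfqs["client"]) and a table named mapping (also hypothetical):

```python
import pandas as pd

# Hypothetical data standing in for rfqs["client"] from the question.
clients = pd.Series(["A", "B", "Z"])

# Explicit lookup table: only the values listed here are changed.
mapping = {letter: letter * 2 for letter in "ABCDEFGHIJKL"}

# map() yields NaN for values missing from the table;
# fillna() puts the original value back in those rows.
result = clients.map(mapping).fillna(clients)
print(result.tolist())  # ['AA', 'BB', 'Z']
```

The table can hold arbitrary replacements, so the same pattern covers cases where the new value is unrelated to the old one.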

Related

Python- Function that checks the string in input and for each letter (correspond a score) return the total [duplicate]

This question already has answers here:
Why does "a == x or y or z" always evaluate to True? How can I compare "a" to all of those?
I'm new to python and what I'm trying to do is:
scores per letter:
1 point: e, a, i, o, n, r, t, l, s, u
2 point: d, g
3 point: b, c, m, p
4 point: f, h, v, w, y
5 point: k
8 point: j, x
10 point: q, z
considering the string 'hello' the value in outcome should be: 7 (h= 4, 1= e, 1= l, 1 = o).
my code :
def points_calc2(string):
    points = 0
    for i in string:
        if i == 'e' or 'a' or 'i' or 'o' or 'n' or 'r' or 't' or 'l' or 's' or 'u':
            points += 1
        elif i == 'd' or 'g':
            points += 2
        elif i == 'b' or 'c' or 'm' or 'p':
            points += 3
        elif i == 'f' or 'h' or 'v' or 'w' or 'y':
            points += 4
        elif i == 'k':
            points += 5
        elif i == 'j' or 'x':
            points += 8
        elif i == 'q' or 'z':
            points += 10
    return points
thanks for the help!
You can make a few lists which would contain certain letters, for example a list which contains letters that count as one point:
tab_onepoint = ['e','a','i','o','n','r','t','l','s','u']
A list which contains letters that count as two points:
tab_twopoint = ['d', 'g']
And so on, and so on.
Then you can iterate through the string and check whether each character i is in the one-point list; if it is, add 1 to the score. With only that list, iterating through the string "Hello" gives a score of 4, because the uppercase "H" is not in it. You can then add the other lists with matching elif branches, so that every letter group adds its own points to the score.
Full code:
tab_onepoint = ['e','a','i','o','n','r','t','l','s','u']
tab_twopoint = ['d', 'g']
string = "Hello"
score = 0
for i in string:
    if i in tab_onepoint:
        score += 1
    elif i in tab_twopoint:
        score += 2
//edit:
Here is also a solution for a dictionary (as one person in comments suggested):
points = {**dict.fromkeys(['e', 'a', 'i', 'o', 'n', 'r', 't', 'l', 's', 'u'], 1), **dict.fromkeys(['d', 'g'], 2), **dict.fromkeys(['b', 'c', 'm', 'p'], 3), **dict.fromkeys(['f', 'h', 'v', 'w', 'y'], 4), 'k': 5, **dict.fromkeys(['j', 'x'], 8), **dict.fromkeys(['q', 'z'], 10)}
string = "Hello"
score = 0
converted_string = string.lower()
for i in converted_string:
    if i in points:
        score = score + points[i]
print(score)
dict.fromkeys builds a dictionary in which every key from the given list maps to the same value, and ** unpacks each of those dictionaries into the outer one. So every letter of 'e', 'a', 'i', 'o', 'n', 'r', 't', 'l', 's', 'u' becomes a key with the value 1. The same goes for the rest of the keys and their values. Note that I convert the input string to lowercase so that it matches the dictionary's keys.
You just have to wrap this solution in a function, and there you go. It would also be fine to make the dictionary a global.
Also - you've made a mistake in your spelling. "Hello" gets us a score of 8, you wrote "Helo", which gets us a score of 7.
The code below uses a dict to look up the points per character. It also uses the built-in sum function.
lookup = {'e': 1, 'a': 1, 'q': 10, 'z': 10} # TODO add more
my_string = 'Hello python zozo'
points = sum(lookup.get(x, 0) for x in my_string)
print(points)
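The lookup table above is left incomplete (see the TODO). A possible completed version is sketched below, using the letter groups listed in the question; the names GROUPS, LOOKUP and points_calc are hypothetical, and lowercasing the input is an assumption:

```python
# Letter groups copied from the question; any other character scores 0.
GROUPS = {
    1: "eaionrtlsu",
    2: "dg",
    3: "bcmp",
    4: "fhvwy",
    5: "k",
    8: "jx",
    10: "qz",
}

# Invert the groups into a flat {letter: points} lookup.
LOOKUP = {letter: pts for pts, letters in GROUPS.items() for letter in letters}

def points_calc(text):
    """Sum the points of every character; unknown characters count as 0."""
    # Assumption: scoring is case-insensitive, so the text is lowercased first.
    return sum(LOOKUP.get(ch, 0) for ch in text.lower())

print(points_calc("hello"))  # 8
print(points_calc("helo"))   # 7
```

Keeping the groups in one place means a scoring change only touches the GROUPS table, not the loop.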
The above answers are probably correct, but I notice you wrote
considering the string 'hello' the value in outcome should be: 7 (h= 4, 1= e, 1= l, 1 = o).
Either it's a typo and you meant the outcome should be 8 (counting the two l's), or you only want to take unique letters into account. In that case:
def points_calc2(string):
    points = 0
    for i in set(string):
        if i in ['e' , 'a' , 'i' , 'o' , 'n' , 'r' , 't' , 'l' , 's' , 'u']:
            points += 1
        elif i in ['d' , 'g']:
            points += 2
        elif i in ['b' , 'c' , 'm' , 'p']:
            points += 3
        elif i in ['f' , 'h' , 'v' , 'w' , 'y']:
            points += 4
        elif i == 'k':
            points += 5
        elif i in ['j' , 'x']:
            points += 8
        elif i in ['q' , 'z']:
            points += 10
    return points
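For context, the reason the original comparisons misbehave is that an expression such as i == 'e' or 'a' is parsed as (i == 'e') or 'a', and a non-empty string is always truthy. A small sketch of the difference, with hypothetical function names:

```python
def is_a_or_e_buggy(ch):
    # Parsed as (ch == 'a') or 'e': the non-empty string 'e' is truthy,
    # so the whole expression is truthy for every input.
    return bool(ch == 'a' or 'e')

def is_a_or_e_fixed(ch):
    # A membership test compares ch against each candidate.
    return ch in ('a', 'e')

print(is_a_or_e_buggy('z'))  # True
print(is_a_or_e_fixed('z'))  # False
```

This is why, in the original function, the first branch is taken for every letter and each character adds exactly one point.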

Loop over letters in a string that contains the alphabet to determine which are missing from a dictionary

I am very new to python and trying to find the solution to this for a class.
I need the function missing_letters to take a list, check the letters using histogram and then loop over the letters in alphabet to determine which are missing from the input parameter. Finally I need to print the letters that are missing, in a string.
alphabet = "abcdefghijklmnopqrstuvwxyz"
test = ["one","two","three"]
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
def missing_letter(s):
    for i in s:
        checked = (histogram(i))
As you can see I haven't gotten very far, at the moment missing_letters returns
{'o': 1, 'n': 1, 'e': 1}
{'t': 1, 'w': 1, 'o': 1}
{'t': 1, 'h': 1, 'r': 1, 'e': 2}
I now need to loop over alphabet to check which characters are missing and print. Any help and direction will be much appreciated. Many thanks!
You can use Python's set operations, which are fast and efficient:
alphabet = set('abcdefghijklmnopqrstuvwxyz')
s1 = 'one'
s2 = 'two'
s3 = 'three'
list_of_missing_letters = set(alphabet) - set(s1) - set(s2) - set(s3)
print(list_of_missing_letters)
Or like this:
from functools import reduce
alphabet = set('abcdefghijklmnopqrstuvwxyz')
list_of_strings = ['one', 'two', 'three']
list_of_missing_letters = set(alphabet) - \
    reduce(lambda x, y: set(x).union(set(y)), list_of_strings)
print(list_of_missing_letters)
Or using your own histogram function:
alphabet = "abcdefghijklmnopqrstuvwxyz"
test = ["one", "two", "three"]
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
def missing_letter(t):
    test_string = ''.join(t)
    counts = histogram(test_string)  # build the histogram once, not once per letter
    result = []
    for l in alphabet:
        if l not in counts:
            result.append(l)
    return result
print(missing_letter(test))
Output:
['a', 'b', 'c', 'd', 'f', 'g', 'i', 'j', 'k', 'l', 'm', 'p', 'q', 's', 'u', 'v', 'x', 'y', 'z']
from string import ascii_lowercase
words = ["one","two","three"]
letters = [l.lower() for w in words for l in w]
# all letters not in alphabet
letter_str = "".join(x for x in ascii_lowercase if x not in letters)
Output:
'abcdfgijklmpqsuvxyz'
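The same idea can be wrapped in a function that returns the missing letters as a single string, which is what the question asks to print. This is only a sketch; the name missing_letters and the choice to ignore case are assumptions:

```python
from string import ascii_lowercase

def missing_letters(words):
    """Return the lowercase letters that appear in none of the words, as one string."""
    # Assumption: case does not matter, so everything is lowercased.
    seen = set("".join(words).lower())
    # Iterating over ascii_lowercase keeps the result in alphabetical order.
    return "".join(ch for ch in ascii_lowercase if ch not in seen)

print(missing_letters(["one", "two", "three"]))  # abcdfgijklmpqsuvxyz
```

Building the set once makes each membership check cheap, compared with scanning a list of letters for every character of the alphabet.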
The question is not the easiest to understand, but from what I can gather you need every letter of the alphabet that is not in the input to be printed to the console.
So a plain loop, as opposed to the functions that have already been shown, would be:
def output(checked):
    result = ""
    for i in alphabet:
        # keep the letter only if the histogram has no entry for it
        if i not in checked:
            result += i
    print(result)
Sidenote: checked is the histogram dictionary. Either pass it in as an argument, as done here, or make it a global variable so that this function can use it.

Python Function that receives a letter and rotates that letter 13 places to the right

I'm trying to create a Python function that uses the Caesar cipher to encrypt a message.
So far, the code I have is
letter = input("Enter a letter: ")
def alphabet_position(letter):
    alphabet_pos = {'A':0, 'a':0, 'B':1, 'b':1, 'C':2, 'c':2, 'D':3,
                    'd':3, 'E':4, 'e':4, 'F':5, 'f':5, 'G':6, 'g':6,
                    'H':7, 'h':7, 'I':8, 'i':8, 'J':9, 'j':9, 'K':10,
                    'k':10, 'L':11, 'l':11, 'M':12, 'm':12, 'N': 13,
                    'n':13, 'O':14, 'o':14, 'P':15, 'p':15, 'Q':16,
                    'q':16, 'R':17, 'r':17, 'S':18, 's':18, 'T':19,
                    't':19, 'U':20, 'u':20, 'V':21, 'v':21, 'W':22,
                    'w':22, 'X':23, 'x':23, 'Y':24, 'y':24, 'Z':25, 'z':25 }
    pos = alphabet_pos[letter]
    return pos
When I try to run my code, it will ask for the letter but it doesn't return anything after that
Please help if you have any suggestions.
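You can look the letter up with dict.get, which returns None instead of raising a KeyError for characters that are not in the dictionary:
    pos = alphabet_pos.get(letter)
    return pos
Your code also never calls the function, which is why nothing is shown after the prompt. Call it and print the result:
print(alphabet_position(letter))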
You can define two dictionaries, one the reverse of the other. You need to be careful on a few aspects:
Whether case is important. If it's not, use str.casefold as below.
What happens when you roll off the end of the alphabet, e.g. 13th letter after "z". Below we assume you start from the beginning again.
Don't type out the alphabet manually. You can use the string module.
Here's a demo:
letter = input("Enter a letter: ")
from string import ascii_lowercase
def get_next(letter, n):
    pos_alpha = dict(enumerate(ascii_lowercase))
    alpha_pos = {v: k for k, v in pos_alpha.items()}
    # wrap around after "z": take the modulo of the shifted position
    return pos_alpha[(alpha_pos[letter.casefold()] + n) % 26]
get_next(letter, 13)
Enter a letter: a
'n'
If you need an entirely new encoded dict:
import string
letters = string.ascii_uppercase
d = dict(zip(letters, range(len(letters))))
encoded_dic = {}
def get_caesar_value(v, by=13):
    return (v + by) % 26
for k, v in d.items():
    encoded_dic[k] = chr(65 + get_caesar_value(v))  # 65 == ord('A')
print(encoded_dic)
Output:
{'A': 'N', 'C': 'P', 'B': 'O', 'E': 'R', 'D': 'Q', 'G': 'T', 'F': 'S', 'I': 'V', 'H': 'U', 'K': 'X', 'J': 'W', 'M': 'Z', 'L': 'Y', 'O': 'B', 'N': 'A', 'Q': 'D', 'P': 'C', 'S': 'F', 'R': 'E', 'U': 'H', 'T': 'G', 'W': 'J', 'V': 'I', 'Y': 'L', 'X': 'K', 'Z': 'M'}
The code you have only maps letters to a position. We'll rewrite it and make a rotate function.
Code
import string
import itertools as it
LOOKUP = {
    **{x:i for i, x in enumerate(string.ascii_lowercase)},
    **{x:i for i, x in enumerate(string.ascii_uppercase)}
}
def abc_position(letter):
    """Return the alpha position of a letter."""
    return LOOKUP[letter]
def rotate(letter, shift=13):
    """Return a letter shifted some positions to the right; recycle at the end."""
    iterable = it.cycle(string.ascii_lowercase)
    start = it.dropwhile(lambda x: x != letter.casefold(), iterable)
    # Advance the iterator
    for i, x in zip(range(shift+1), start):
        res = x
    if letter.isupper():
        return res.upper()
    return res
Tests
func = abc_position
assert func("a") == 0
assert func("A") == 0
assert func("c") == 2
assert func("z") == 25
func = rotate
assert func("h") == "u"
assert func("a", 0) == "a"
assert func("A", 0) == "A"
assert func("a", 2) == "c"
assert func("c", 3) == "f"
assert func("A", 2) == "C"
assert func("a", 26) == "a"
# Restart after "z"
assert func("z", 1) == "a"
assert func("Z", 1) == "A"
Demo
>>> letter = input("Enter a letter: ")
Enter a letter: h
>>> rot = rotate(letter, 13)
>>> rot
'u'
>>> abc_position(rot)
20
Here we rotated the letter "h" 13 positions, got a letter and then determined the position of this resultant letter in the normal string of abc's.
Details
abc_position()
This function was rewritten to look up the position of a letter. It merges two dictionaries:
one that enumerates the lowercase ASCII letters
one that enumerates the uppercase ASCII letters
The string module already provides these letters.
rotate()
This function only rotates lowercase letters; uppercase letters are translated from the lowercase position. The string of letters is rotated by making an infinite cycle (an iterator) of lowercase letters.
The cycle is first advanced to start at the desired letter. This is done by dropping all letters that don't look like the one passed in.
Then it is advanced in a loop some number of times equal to shift. The loop is just one way to consume or move the iterator ahead. We only care about the last letter, not the ones in between. This letter is returned, either lower or uppercase.
Since a letter is returned (not a position), you can now use your abc_position() function to find its normal position.
Alternatives
Other rotation functions can substitute rotate():
import codecs
def rot13(letter):
    return codecs.encode(letter, "rot13")
def rot13(letter):
    table = str.maketrans(
        "ABCDEFGHIJKLMabcdefghijklmNOPQRSTUVWXYZnopqrstuvwxyz",
        "NOPQRSTUVWXYZnopqrstuvwxyzABCDEFGHIJKLMabcdefghijklm")
    return str.translate(letter, table)
However, these options are constrained to rot13, while rotate() can be shifted by any number. Note: rot26 will cycle back to the beginning, e.g. rotate("a", 26) -> a.
See also this post on how to make a true rot13 cipher.
See also docs on itertools.cycle and itertools.dropwhile.
You can do it with quick calculations from ord and chr functions instead:
def encrypt(letter):
    return chr((ord(letter.lower()) - ord('a') + 13) % 26 + ord('a'))
so that:
print(encrypt('a'))
print(encrypt('o'))
outputs:
n
b
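The ord/chr arithmetic above always returns a lowercase letter and is fixed at 13 places. A possible generalisation is sketched below; the name shift_letter, the shift parameter and the decision to leave non-letters untouched are assumptions, not part of the answer:

```python
def shift_letter(letter, shift=13):
    """Rotate one ASCII letter by `shift` places, wrapping around and keeping its case."""
    if not (letter.isascii() and letter.isalpha()):
        return letter  # assumption: characters that are not ASCII letters pass through
    # Pick the start of the matching alphabet so the case is preserved.
    base = ord('A') if letter.isupper() else ord('a')
    # Position in the alphabet, shifted, wrapped with modulo, mapped back to a character.
    return chr((ord(letter) - base + shift) % 26 + base)

print(shift_letter('a'))     # n
print(shift_letter('Z', 1))  # A
```

Because 13 is half of 26, applying the default shift twice returns the original letter, so the same function can also decode.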

define a loop which returns different possibilities

Hello I am pretty new to python. I have the following problem:
I want to write a script that, given a (DNA) sequence with ambiguities, writes all possible sequences if there are fewer than 100; if there are more than 100 possible sequences, an appropriate error message is printed.
For DNA nucleotide ambiguities: http://www.bioinformatics.org/sms/iupac.html
Example: for the sequence “AYGH” the script’s output would be “ACGA”, “ACGC”, “ACGT”, “ATGA”, “ATGC”, and “ATGT”. A, C, G and T are the default nucleotides. ALL others can have different values (see link).
So i wrote this:
def possible_sequences(seq):
    poss_seq = ''
    for i in seq:
        if i == 'A' or i == 'C' or i == 'G' or i == 'T':
            poss_seq += i
        else:
            if i == 'R':
                poss_seq += 'A' # OR 'G', how should i implement this?
            elif i == 'Y':
                poss_seq += 'C' # OR T
            elif i == 'S':
                poss_seq += 'G' # OR C
            elif i == 'W':
                poss_seq += 'A' # OR T
            elif i == 'K':
                poss_seq += 'G' # OR T
            elif i == 'M':
                poss_seq += 'A' # OR C
            elif i == 'B':
                poss_seq += 'C' # OR G OR T
            elif i == 'D':
                poss_seq += 'A' # OR G OR T
            elif i == 'H':
                poss_seq += 'A' # OR C OR T
            elif i == 'V':
                poss_seq += 'A' # OR C OR G
            elif i == 'N':
                poss_seq += 'A' # OR C OR G OR T
            elif i == '-' or i == '.':
                poss_seq += ' '
    return poss_seq
when I test my function:
possible_sequences ('ATRY-C')
i got:
'ATAC C'
but I should have got:
'ATAC C'
'ATAT C'
'ATGC C'
'ATGT C'
Can somebody please help me? I understand that I have to write a second poss_seq when there is an ambiguity present, but I don't know how...
You can use itertools.product to generate the possibilities:
from itertools import product
# List possible nucleotides for each possible item in sequence
MAP = {
    'A': 'A',
    'C': 'C',
    'G': 'G',
    'T': 'T',
    'R': 'AG',
    'Y': 'CT',
    'S': 'GC',
    'W': 'AT',
    'K': 'GT',
    'M': 'AC',
    'B': 'CGT',
    'D': 'AGT',
    'H': 'ACT',
    'V': 'ACG',
    'N': 'ACGT',
    '-': ' ',
    '.': ' '
}
def possible_sequences(seq):
    return (''.join(c) for c in product(*(MAP[c] for c in seq)))
print(list(possible_sequences('AYGH')))
print(list(possible_sequences('ATRY-C')))
Output:
['ACGA', 'ACGC', 'ACGT', 'ATGA', 'ATGC', 'ATGT']
['ATAC C', 'ATAT C', 'ATGC C', 'ATGT C']
In the code above we first iterate over the items in the given sequence and get the possible nucleotides for each item:
possibilities = [MAP[c] for c in 'ATRY-C']
print(possibilities)
# ['A', 'T', 'AG', 'CT', ' ', 'C']
Then the iterable is unpacked as arguments given to product which will return the cartesian product:
products = list(product(*['A', 'T', 'AG', 'CT', ' ', 'C']))
print(products)
# [('A', 'T', 'A', 'C', ' ', 'C'), ('A', 'T', 'A', 'T', ' ', 'C'),
# ('A', 'T', 'G', 'C', ' ', 'C'), ('A', 'T', 'G', 'T', ' ', 'C')]
Finally each one of the products is turned to a string with join:
print(list(''.join(p) for p in products))
# ['ATAC C', 'ATAT C', 'ATGC C', 'ATGT C']
Note that possible_sequences returns a generator instead of constructing all the possible sequences at once, so you can easily stop the iteration whenever you want instead of having to wait for every sequence to be generated.
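The question also asks for an error when there are more than 100 possible sequences. One way to handle that, sketched here under assumptions, is to count the expansions before generating any of them: the total is the product of the number of choices at each position. The names MAX_SEQUENCES, count_sequences and limited_sequences are hypothetical, and raising ValueError is just one way to report the error:

```python
from itertools import product
from math import prod

# Same IUPAC table as in the answer above.
MAP = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': 'AG', 'Y': 'CT', 'S': 'GC', 'W': 'AT', 'K': 'GT', 'M': 'AC',
    'B': 'CGT', 'D': 'AGT', 'H': 'ACT', 'V': 'ACG', 'N': 'ACGT',
    '-': ' ', '.': ' ',
}

MAX_SEQUENCES = 100  # assumption: the limit mentioned in the question

def count_sequences(seq):
    """Number of expansions: the product of the choices at each position."""
    return prod(len(MAP[c]) for c in seq)

def limited_sequences(seq, limit=MAX_SEQUENCES):
    """Return all expansions, or raise ValueError if there are more than `limit`."""
    total = count_sequences(seq)
    if total > limit:
        raise ValueError(f"{seq!r} expands to {total} sequences (limit {limit})")
    return [''.join(p) for p in product(*(MAP[c] for c in seq))]

print(limited_sequences('AYGH'))
# ['ACGA', 'ACGC', 'ACGT', 'ATGA', 'ATGC', 'ATGT']
```

Counting first avoids building a large list only to discard it; for example, 'NNNN' would expand to 4 * 4 * 4 * 4 = 256 sequences and is rejected immediately.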

encoding using a random cipher

I'm trying to write a program that takes a long string of letters and characters, and creates a dictionary of {original character:random character}. It should remove characters that have already been assigned a random value.
This is what I have:
import random
all_chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
def make_encoder(all_chars):
    all_chars=list(all_chars)
    encoder = {}
    for c in range (0,len(all_chars)):
        e = random.choice(all_chars)
        all_chars.remove(e)
        key = all_chars[c]
        encoder[key] = e
    return encoder
I keep getting index out of range: 33 on line 10 key = all_chars[c]
Here's my whole code, with the first problem fixed:
import random
all_chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
def make_encoder(all_chars):
    list_chars= list(all_chars)
    all_chars= list(all_chars)
    encoder = {}
    i=0
    while len(encoder) < len(all_chars):
        e = random.choice(all_chars)
        key = all_chars[i]
        if key not in encoder.keys():
            encoder[key] = e
        i += 1
    return encoder
def encode_message(encoder,msg):
    encoded_msg = ""
    for x in msg:
        c = encoder[x]
        encoded_msg = encoded_msg + c
def make_decoder(encoder):
    decoder = {}
    for k in encoder:
        v = encoder[k]
        decoder[v] = k
    return decoder
def decode_message(decoder,msg):
    decoded_msg = ""
    for x in msg:
        c = decoder[x]
        decoded_msg = decoded_msg + c
def main():
    alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ,.!?"
    e = make_encoder(alphabet)
    d = make_decoder(e)
    print(e)
    print(d)
    phrase = input("enter a phrase")
    print(phrase)
    encoded = encode_message(e,phrase)
    print(encoded)
    decoded = decode_message(d,encoded)
    print(decoded)
I now get TypeError: iteration over non-sequence of type NoneType for the line for x in msg:
You are altering the list while iterating over it. Rule of thumb: never alter a list while you iterate over it.
The line for c in range (0,len(all_chars)): iterates up to the original length of the list, but at the same time you are removing elements, so the list shrinks. That is why you get "list index out of range".
Try it like this:
import random
all_chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
def make_encoder(all_chars):
    all_char = list(all_chars)
    encoder = {}
    i=0
    while len(encoder) < len(all_char):
        e = random.choice(all_char)
        key = all_char[i]
        if key not in encoder.keys():
            encoder[key] = e
        i += 1
    return encoder
output:
>>> make_encoder(all_chars)
{'!': '3', ',': 'l', '.': 'J', '1': 'y', '0': 'l', '3': 'G', '2': ',', '5': '6', '4': 'f', '7': 'f', '6': 'C', '9': 'F', '8': 'y', '?': 'S', 'A': 'm', 'C': 'z', 'B': 'b', 'E': 'J', 'D': '0', 'G': 'S', 'F': 'v', 'I': 'v', 'H': '?', 'K': 'd', 'J': 'X', 'M': 'o', 'L': 'O', 'O': 'Q', 'N': 'P', 'Q': 'Z', 'P': '8', 'S': 'r', 'R': 'h', 'U': 'o', 'T': 'M', 'W': 'l', 'V': '.', 'Y': 'R', 'X': 'C', 'Z': 'a', 'a': 's', 'c': 'Y', 'b': 'X', 'e': 's', 'd': 'd', 'g': 'L', 'f': 'G', 'i': 'm', 'h': 'k', 'k': 'f', 'j': '1', 'm': 'J', 'l': 'L', 'o': '2', 'n': 'N', 'q': 'n', 'p': 'l', 's': 'W', 'r': '7', 'u': 'y', 't': 'S', 'w': 'J', 'v': 'E', 'y': 'r', 'x': 'C', 'z': 'i'}
You're modifying the list as you iterate over it:
for c in range(0,len(all_chars)):
    e = random.choice(all_chars)
    all_chars.remove(e)
The range item range(0,len(all_chars)) is only generated when the for loop starts. That means it will always assume its length is what it started as.
After you remove a character, all_chars.remove(e), now the list is one item shorter than when the for loop started, leading to the eventual over-run.
How about this instead:
while all_chars: # While there are chars left in the list
    ...
You should never modify an iterable while you are iterating over it.
Think about it: you told Python to loop from 0 to the length of the list all_chars, which is 66 in the beginning. But you are constantly shrinking this length with all_chars.remove(e). So, the loop still loops 66 times, but all_chars only has 66 items for the first iteration. Afterwards, it has 65, then 64, then 63, etc.
Eventually, you will run into an IndexError as soon as c is greater than or equal to the length of the list. This first happens at c == 33, when only 32 items remain. Note that an index equal to the length is already out of range, because Python indexes start at 0:
>>> [1, 2, 3][3] # There is no index 3 because 0 is the first index
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> [1, 2, 3][2] # 2 is the greatest index
3
>>>
To fix the problem, you can either:
Stop removing elements from all_chars inside the loop. That way, its length will always be 66.
Use a while True: loop and break when all_chars is empty (you run out of characters).
I would recommend making two lists, or at least keeping the characters you iterate over separate from the ones you remove.
import random
all_chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
def make_encoder(all_chars):
    list_chars = list(all_chars)
    all_chars = list(all_chars)  # <--- EDIT
    encoder = {}
    for c in all_chars:
        e = random.choice(list_chars)
        list_chars.remove(e)
        key = c  # <--- EDIT
        encoder[key] = e
    return encoder  # <--- EDIT, unindented this line.
That was your issue: you were removing items from the list you were iterating through. Making two lists, although a little messy, is the best way.
You don't have to remove it from the initial string (it's bad practice to change an item while iterating over it).
Just check that the item isn't already in the dictionary.
import random
all_chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
encoder = {}
n = 0
while len(all_chars) != len(encoder):
    rand = random.choice(all_chars)
    # only use a random character that has not been assigned yet
    if rand not in encoder.values():
        encoder[all_chars[n]] = rand
        n += 1
for k, v in sorted(encoder.items()):
    print(k, v)
By the way, your encoder may work fine this way, but you can only decode a message if you still have the same encoder, because it is built randomly on every run. You can make it reproducible by calling random.seed('KEY') before building it.
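The second error in the question (TypeError on for x in msg:) comes from encode_message and decode_message never returning their result, so the caller receives None. The sketch below puts the pieces together under some assumptions: it swaps the pick-and-remove loop for a shuffle, which gives every character a distinct partner, and the seed parameter and the name ALL_CHARS are hypothetical:

```python
import random

ALL_CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ,.!?'

def make_encoder(chars, seed=None):
    """Map every character to a distinct random character (a permutation)."""
    shuffled = list(chars)
    # A private Random instance; a fixed seed makes the mapping reproducible.
    random.Random(seed).shuffle(shuffled)
    return dict(zip(chars, shuffled))

def make_decoder(encoder):
    """Invert the encoder; this works because no value is used twice."""
    return {v: k for k, v in encoder.items()}

def translate(table, msg):
    # The result must be returned: a function without `return` yields None.
    return ''.join(table[ch] for ch in msg)

encoder = make_encoder(ALL_CHARS, seed='KEY')
decoder = make_decoder(encoder)
secret = translate(encoder, 'Hello, world!')
print(translate(decoder, secret))  # Hello, world!
```

Since the encoder is a permutation, the same translate function serves for both encoding and decoding; only the table differs.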
