string replace in Python 2.7 - python

I'm using Python 2.7 and working on the string replace problem below. Are there any better ideas in terms of space complexity and time complexity?
I create an additional list to hold the result, since strings in Python 2.7 are immutable, and I also create an additional dictionary to speed up look-ups in the character replacement table.
In the example, From: "lod" and To: "xpf" means: replace l with x, o with p, and d with f.
'''
Given "data", "from", and "to" fields, replaces all occurrences of the characters in the "from" field in the "data" field, with their counterparts in the "to" field.
Example:
Input:
Data: "Hello World"
From: "lod"
To: "xpf"
Output:
"Hexxp Wprxf"
'''
from collections import defaultdict
def map_strings(from_field, to_field, data):
    char_map = defaultdict(str)
    result = []
    # Build the lookup table: from_field[i] -> to_field[i]
    for i,v in enumerate(from_field):
        char_map[v]=to_field[i]
    # Copy unmapped characters, substitute mapped ones
    for v in data:
        if v not in char_map:
            result.append(v)
        else:
            result.append(char_map[v])
    return ''.join(result)
if __name__ == "__main__":
    print map_strings('lod', 'xpf', 'Hello World')

There's efficient machinery in the standard modules for this. You first build a translation table using string.maketrans, then call the str.translate method:
import string
trans = string.maketrans('lod', 'xpf')
print "Hello World".translate(trans)
output
Hexxp Wprxf
But if you want to do it manually, here's a way that's a little more efficient than your current code:
def map_strings(from_field, to_field, data):
    char_map = dict(zip(from_field, to_field))
    return ''.join([char_map.get(c, c) for c in data])
s = map_strings('lod', 'xpf', 'Hello World')
print s
Note that in Python 3 the string.maketrans function no longer exists. There's now a str.maketrans method, with slightly different behaviour.
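For reference, a minimal Python 3 sketch of the same translation. Here str.maketrans accepts two equal-length strings (or a single dict), plus an optional third string of characters to delete; it returns a plain dict keyed by code points, which str.translate consumes.

```python
# Python 3: build the table with the str.maketrans static method
table = str.maketrans('lod', 'xpf')
print('Hello World'.translate(table))  # Hexxp Wprxf

# The table is an ordinary dict mapping code points (ints) to code points
print(table == {ord('l'): ord('x'), ord('o'): ord('p'), ord('d'): ord('f')})  # True

# An optional third argument lists characters to delete
print('Hello World'.translate(str.maketrans('lod', 'xpf', ' ')))  # HexxpWprxf
```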

You can also use replace:
def map_strings(from_field, to_field, data):
    for f, t in zip(from_field, to_field):
        data = data.replace(f, t)
    return data
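One caveat with chained replace: each call rescans the whole string (roughly len(from_field) passes over data), and later passes see the output of earlier ones, so overlapping mappings can cascade. A small sketch contrasting it with a single-pass lookup; the function names and the 'abba' input are made up for illustration:

```python
def map_strings_replace(from_field, to_field, data):
    # Sequential replace: each pass sees the previous pass's output
    for f, t in zip(from_field, to_field):
        data = data.replace(f, t)
    return data

def map_strings_single_pass(from_field, to_field, data):
    # Each input character is looked up exactly once
    char_map = dict(zip(from_field, to_field))
    return ''.join(char_map.get(c, c) for c in data)

# Swapping 'a' and 'b' shows the difference
print(map_strings_replace('ab', 'ba', 'abba'))      # aaaa
print(map_strings_single_pass('ab', 'ba', 'abba'))  # baab
```

For the question's non-overlapping example both give "Hexxp Wprxf"; the difference only appears when a target character is also a source character.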

Related

How to replace all T with U in an input string of DNA?

So, the task is quite simple. I just need to replace all "T"s with "U"s in an input string of DNA. I have written the following code:
def transcribe_dna_to_rna(s):
    base_change = {"t":"U", "T":"U"}
    replace = "".join([base_change(n,n) for n in s])
    return replace.upper()
and for some reason, I get the following error:
TypeError: 'dict' object is not callable
Why is it that my dictionary is not callable? What should I change in my code?
Thanks for any tips in advance!
To correctly convert DNA to RNA nucleotides in string s, combine str.maketrans and str.translate, which replace thymine with uracil while preserving case. For example:
s = 'ACTGactgACTG'
s = s.translate(str.maketrans("tT", "uU"))
print(s)
# ACUGacugACUG
Note that in bioinformatics, case (lower or upper) often carries meaning and should be preserved, so map both t -> u and T -> U. See, for example:
Uppercase vs lowercase letters in reference genome
SEE ALSO:
Character Translation using Python (like the tr command)
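To make the case point concrete, a quick Python 3 comparison on the same sample string: uppercasing before replacing discards the lowercase information, while a two-case translation table keeps it.

```python
s = 'ACTGactgACTG'

# Uppercasing first, then replacing, loses the lowercase information
print(s.upper().replace('T', 'U'))  # ACUGACUGACUG

# Translating both cases keeps it
print(s.translate(str.maketrans('tT', 'uU')))  # ACUGacugACUG
```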
Note that there are specialized bioinformatics tools specifically for handling biological sequences.
For example, BioPython offers transcribe:
from Bio.Seq import Seq
my_seq = Seq('ACTGactgACTG')
my_seq = my_seq.transcribe()
print(my_seq)
# ACUGacugACUG
To install BioPython, use conda install biopython or conda create --name biopython biopython.
The error tells you that base_change(n,n) uses base_change as if it were a function, when in fact it is a dictionary, and dictionaries can't be called.
I guess what you wanted to say was
def transcribe_dna_to_rna(s):
    base_change = {"t":"U", "T":"U"}
    replace = "".join([base_change.get(n, n) for n in s])
    return replace.upper()
where the function is the dictionary's .get(x, y) method, which returns the value for key x if it is present, and otherwise y (so here it returns the original n when n is not in the dictionary).
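A short illustration of that .get() behaviour with the question's dictionary; the input "GATtaca" is just an arbitrary example string:

```python
base_change = {"t": "U", "T": "U"}

# Key present: the mapped value comes back
print(base_change.get("T", "T"))  # U

# Key missing: the second argument (the default) comes back
print(base_change.get("A", "A"))  # A

# Applied to every character, as in the corrected function (before .upper())
print("".join([base_change.get(n, n) for n in "GATtaca"]))  # GAUUaca
```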
But this is overcomplicating things; Python very easily lets you replace characters in strings.
def transcribe_dna_to_rna(s):
    return s.upper().replace("T", "U")
(Stole the reordering that puts .upper() first from @norie's answer; thanks!)
If your real dictionary were much larger, your original approach might make more sense, as long chains of .replace().replace().replace()... are unattractive and eventually inefficient when you have a lot of them.
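For a larger mapping, Python 3's str.maketrans also accepts a single dict, so the whole table is applied in one pass instead of a chain of .replace() calls. A sketch using the question's dictionary; the module-level name BASE_CHANGE is a hypothetical choice so the table is built only once:

```python
# Built once; keys must be single characters, values may be
# strings of any length (or None to delete the character)
BASE_CHANGE = str.maketrans({"t": "U", "T": "U"})

def transcribe_dna_to_rna(s):
    # One pass over s regardless of how many entries the table has
    return s.translate(BASE_CHANGE).upper()

print(transcribe_dna_to_rna('ACTGactg'))  # ACUGACUG
```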
In Python 3, use str.translate:
dna = "ACTG"
rna = dna.translate(str.maketrans("T", "U")) # "ACUG"
Change s to upper and then do the replacement.
def transcribe_dna_to_rna(s):
    return s.upper().replace("T", "U")

Efficient group substring search in Python?

Let's say I've loaded some information from a file into a Python 3 dict and the result looks like this:
d = {
'hello' : ['hello', 'hi', 'greetings'],
'goodbye': ['bye', 'goodbye', 'adios'],
'lolwut': ['++$(#$(#%$(##*', 'ASDF #!## TOW']
}
Let's say I'm going to analyze a bunch, I mean an absolute ton, of strings. If a string contains any of the values for a given key of d, then I want to categorize it as being in that key.
For example...
'My name is DDP, greetings' => 'hello'
Obviously I can loop through the keys and values like this...
def classify(s, d):
    for k, v in d.items():
        if any([x in s for x in v]):
            return k
    return ''
But I want to know if there's a more efficient algorithm for this kind of bulk searching; more efficient than my naive loop. Is anyone aware of such an algorithm?
You can use a regex to avoid extra operations. Join the escaped words with a pipe character and pass the pattern to re.search(). Since the order and the exact word don't matter to you, this tells you whether any of those values occurs in the given string.
import re
def classify(s, d):
    for k, v in d.items():
        # Escape each value on its own so '|' stays an alternation operator
        regex = re.compile('|'.join(map(re.escape, v)))
        if regex.search(s):
            return k
    return ''
Also note that instead of returning k you can yield it to get an iterator over all matching keys, or store the matches in a dictionary, etc.
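When classifying a very large number of strings, recompiling the patterns for every call is wasted work. A minimal sketch, assuming the dict d is fixed up front: compile one pattern per key once, then reuse it, yielding every matching key as suggested above. The names build_patterns and classify_all are hypothetical.

```python
import re

d = {
    'hello': ['hello', 'hi', 'greetings'],
    'goodbye': ['bye', 'goodbye', 'adios'],
    'lolwut': ['++$(#$(#%$(##*', 'ASDF #!## TOW'],
}

def build_patterns(d):
    # Escape each value separately, join with '|', compile once per key
    return {k: re.compile('|'.join(map(re.escape, v))) for k, v in d.items()}

def classify_all(s, patterns):
    # Yield every key whose pattern occurs anywhere in s
    for k, rx in patterns.items():
        if rx.search(s):
            yield k

patterns = build_patterns(d)
print(list(classify_all('My name is DDP, greetings', patterns)))  # ['hello']
```

Like the question's `x in s`, this is plain substring matching, so 'hi' also matches inside words such as "this".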

A better way to rewrite multiple appended replace methods using an input array of strings in python?

I have a really ugly command where I use many appended "replace()" methods to replace/substitute/scrub many different strings from an original string. For example:
newString = originalString.replace(' ', '').replace("\n", '').replace('()', '').replace('(Deployed)', '').replace('(BeingAssembled)', '').replace('ilo_', '').replace('ip_', '').replace('_ilop', '').replace('_ip', '').replace('backupnetwork', '').replace('_ilo', '').replace('prod-', '').replace('ilo-','').replace('(EndofLife)', '').replace('lctcvp0033-dup,', '').replace('newx-', '').replace('-ilo', '').replace('-prod', '').replace('na,', '')
As you can see, it's a very ugly statement and makes it very difficult to know what strings are in the long command. It also makes it hard to reuse.
What I'd like to do is define an input array of many replacement pairs, where a replacement pair looks like [<ORIGINAL_SUBSTRING>, <NEW_SUBSTRING>], and the greater array looks something like:
replacementArray = [
[<ORIGINAL_SUBSTRING>, <NEW_SUBSTRING>],
[<ORIGINAL_SUBSTRING>, <NEW_SUBSTRING>],
[<ORIGINAL_SUBSTRING>, <NEW_SUBSTRING>],
[<ORIGINAL_SUBSTRING>, <NEW_SUBSTRING>]
]
AND, I'd like to pass that replacementArray, along with the original string that needs to be scrubbed to a function that has a structure something like:
def replaceAllSubStrings(originalString, replacementArray):
    newString = ''
    for each pair in replacementArray:
        perform the substitution
    return newString
MY QUESTION IS: What is the right way to write the function's code block to apply each pair in the replacementArray? Should I be using the "replace()" method? The "sub()" method? I'm confused as to how to restructure the original code into a nice clean function.
Thanks, in advance, for any help you can offer.
You have the right idea. Use sequence unpacking to iterate over each pair of values:
def replaceAllSubStrings(originalString, replacementArray):
    for in_rep, out_rep in replacementArray:
        originalString = originalString.replace(in_rep, out_rep)
    return originalString
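A usage sketch for that function with a few of the pairs from the question written as data; the sample input string is made up. Because each pair is applied in turn, order matters: the question's chain removes '_ilop' before '_ilo', and reversing them leaves a stray 'p'.

```python
def replaceAllSubStrings(originalString, replacementArray):
    # Apply each [old, new] pair in order; each pass sees the previous result
    for in_rep, out_rep in replacementArray:
        originalString = originalString.replace(in_rep, out_rep)
    return originalString

# A subset of the question's substitutions, listed as data
replacementArray = [
    [' ', ''],
    ['\n', ''],
    ['(Deployed)', ''],
    ['ilo_', ''],
    ['-prod', ''],
]

print(replaceAllSubStrings('ilo_web01-prod (Deployed)\n', replacementArray))  # web01

# Order matters when one pattern contains another
print(replaceAllSubStrings('x_ilop', [['_ilop', ''], ['_ilo', '']]))  # x
print(replaceAllSubStrings('x_ilop', [['_ilo', ''], ['_ilop', '']]))  # xp
```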
How about using re?
import re
def make_xlat(*args, **kwds):
    adict = dict(*args, **kwds)
    rx = re.compile('|'.join(map(re.escape, adict)))
    def one_xlat(match):
        return adict[match.group(0)]
    def xlat(text):
        return rx.sub(one_xlat, text)
    return xlat
replaces = {
    "a": "b",
    "well": "hello"
}
replacer = make_xlat(replaces)
replacer("a well?")
# b hello?
You can add as many items to replaces as you want.
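One refinement worth considering with this approach: regex alternation takes the first alternative that matches at a position, not the longest, so when one key is a prefix of another (say 'ilo' and 'ilo_'), the result depends on dict order. A sketch of the same make_xlat with keys sorted longest-first; the sample keys are illustrative only. Because it runs as a single pass, replacements are also never re-scanned, so a swap like a <-> b works.

```python
import re

def make_xlat(*args, **kwds):
    adict = dict(*args, **kwds)
    # Longest keys first so that e.g. 'ilo_' is tried before 'ilo'
    keys = sorted(adict, key=len, reverse=True)
    rx = re.compile('|'.join(map(re.escape, keys)))

    def one_xlat(match):
        # Look up the replacement for the exact text that matched
        return adict[match.group(0)]

    def xlat(text):
        # Single pass over text; output is never re-scanned
        return rx.sub(one_xlat, text)

    return xlat

replacer = make_xlat({'ilo': 'X', 'ilo_': ''})
print(replacer('ilo_web ilo'))  # web X
print(make_xlat({'a': 'b', 'b': 'a'})('ab'))  # ba
```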

Returning multiple variables from a single list

Using Python 3
This is very basic, I'm sure. The code should return the country code from each price provided; essentially, I need the first two letters of each comma-separated item in the input.
The code I've written so far only outputs the first country code:
def get_country_codes(prices):
    c = prices.split(',')
    for char in c:
        return char[:2]
print(get_country_codes("NZ$300, KR$1200, DK$5"))
output:
NZ
Wanted output:
NZ, KR, DK
This is easy to do as a one-liner:
>>> def get_country_codes(prices):
...     return [cc.strip()[:2] for cc in prices.split(',')]
...
>>> print(get_country_codes("NZ$300, KR$1200, DK$5"))
['NZ', 'KR', 'DK']
>>>
What your program was doing was executing the for loop, but when return is called, it terminates the function; your implementation looked as though you wanted a generator (i.e. using yield) which is doable, but probably more cumbersome than necessary for this.
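For comparison, the generator version that answer mentions is short; a minimal Python 3 sketch in which yield hands back each code in turn instead of ending the function, with the caller joining the results into the wanted "NZ, KR, DK" form:

```python
def get_country_codes(prices):
    # yield produces one code per item instead of returning after the first
    for item in prices.split(','):
        yield item.strip()[:2]

codes = get_country_codes("NZ$300, KR$1200, DK$5")
print(', '.join(codes))  # NZ, KR, DK
```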
def get_country_codes(prices):
    values = []
    price_codes = prices.split(',')
    for price_code in price_codes:
        values.append(price_code.strip()[0:2])
    return values  # output: ['NZ', 'KR', 'DK']
    # or, to get a single string: return ', '.join(values)  # output: NZ, KR, DK
print(get_country_codes("NZ$300, KR$1200, DK$5"))
output:
['NZ', 'KR', 'DK']
Basically, your function was returning only the first value from the split list.
You need to iterate over the split list, save each value in another list, and return that list.
Another approach:
country_price_values = "NZ$300, KR$1200, DK$5"
country_codes = [val.strip()[0:2] for val in country_price_values.split(',')]

How do I remove the spaces when Python prints in for loop?

I'm new to Python. Trying to make a simple function that converts a string input to braille via dict values (with '1' indicating a bump and 0 no bump).
I'm sure there are faster/better ways to do this but what I have is almost working (another way of saying it doesn't work at all).
alphabet = {
'a': '100000','b': '110000','c': '100100','d': '100110'
#etc
}
def answer(plaintext):
    for i in str(plaintext):
        for key, value in alphabet.iteritems():
            if i in key:
                print value,
answer('Kiwi')
This prints:
000001101000 010100 010111 010100
My question is how do I remove the spaces? I need it to print as:
000001101000010100010111010100
It's printing as a tuple so I can't use .strip().
For what I'd consider a pythonic way:
def answer(plaintext):
    alphabet = { ... }
    return "".join([alphabet[c] for c in plaintext])
(this assumes that all letters in plaintext are in alphabet)
brail = []
for i in str(...):
    if alphabet.get(i):
        brail.append(alphabet[i])
print ''.join(brail)
In the first line of the function, create a container l = []. Change the last statement inside the loops to l.append(value). Outside the loops, but still inside the function, do return ''.join(l), and print the result of the call. It should work now.
You can also turn the function into a generator and pass it to join():
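Those steps applied to the question's function give roughly the following sketch. It uses dict.items() so it also runs on Python 3 (the question's .iteritems() is Python 2 only), and the alphabet is the partial one from the question:

```python
# Partial table, as in the question
alphabet = {'a': '100000', 'b': '110000', 'c': '100100', 'd': '100110'}

def answer(plaintext):
    l = []  # collect each braille cell instead of printing it
    for i in str(plaintext):
        for key, value in alphabet.items():  # .iteritems() on Python 2
            if i in key:
                l.append(value)
    return ''.join(l)  # no separator between cells

print(answer('abc'))  # 100000110000100100
```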
alphabet = {'a': '100000','b': '110000','c': '100100','d': '100110'}
def answer(a = ''):
    for i in a:
        for key, value in alphabet.iteritems():
            if i in key:
                yield value
print ''.join(answer('abdaac'))
Output:
>>> 100000110000100110100000100000100100
One way to control printing in Python 3 is the end parameter of print(), which defaults to a newline character. (On Python 2 this requires from __future__ import print_function.)
print("foo")
print("bar")
will print like this:
foo
bar
In order to make these two statements print on the same line, we can do this:
print("foo",end='')
print("bar",end='')
This will print like this:
foobar
Hopefully this is a helpful way to solve problems such as this.
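Applied to the braille question, a Python 3 sketch might look like this, using the question's partial alphabet and a direct dictionary lookup in place of the inner loop:

```python
# Partial table, as in the question
alphabet = {'a': '100000', 'b': '110000', 'c': '100100', 'd': '100110'}

def answer(plaintext):
    for i in plaintext:
        if i in alphabet:
            # end='' suppresses the newline; no separator is inserted
            print(alphabet[i], end='')
    print()  # finish the line once all cells are printed

answer('bad')  # prints 110000100000100110
```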
