Convert not_camel_case to notCamelCase and/or NotCamelCase in Python? - python

Basically, the reverse of this. Here's my attempt, but it's not working.
def titlecase(value):
    s1 = re.sub('(_)([a-z][A-Z][0-9]+)', r'\2'.upper(), value)
    return s1

def titlecase(value):
    return "".join(word.title() for word in value.split("_"))
Python is more readable than regex, and easier to fix when it's not doing what you want.
If you want the first letter lowercase as well, I would use a second function that calls the function above to do most of the work, then just lowercases the first letter:
def titlecase2(value):
    return value[:1].lower() + titlecase(value)[1:]

You have an error with your regex. Instead of
([a-z][A-Z][0-9]+) # would match 'oN3' but not 'one'
use
([a-zA-Z0-9]+) # matches any alphanumeric word
However, this also won't work because r'\2'.upper() can't be used that way. Instead, try:
s1 = re.sub('(_)([a-zA-Z0-9]+)', lambda p: p.group(2).capitalize(), value)
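Putting the two fixes together, a minimal runnable sketch could look like the following. The names to_camel and to_pascal are only illustrative (they are not from the question), and the pattern assumes words made of ASCII letters and digits.

```python
import re

def to_camel(value):
    # Replace each "_word" with "Word"; the callable receives the match object.
    return re.sub(r'_([a-zA-Z0-9]+)', lambda m: m.group(1).capitalize(), value)

def to_pascal(value):
    # Same as to_camel, but the very first character is uppercased too.
    camel = to_camel(value)
    return camel[:1].upper() + camel[1:]

print(to_camel("not_camel_case"))   # notCamelCase
print(to_pascal("not_camel_case"))  # NotCamelCase
```

Note that capitalize() also lowercases the rest of each word, which may or may not be what you want for input that already contains uppercase letters.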

@kindall provided a good solution (credit goes to him).
But if you want the "myCamel" syntax, where the first word is not capitalized, you have to change it a bit:
def titlecase(value):
    rest = value.split("_")
    return rest[0] + "".join(word.title() for word in rest[1:])

For NotCamelCase, using a regex or a loop sounds like overkill.
value.title().replace("_", "")
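One thing to watch with the title() shortcut: title() starts a new "word" after any non-letter, including digits, so it can differ from splitting on underscores yourself. A small comparison, with hypothetical helper names chosen just for this sketch:

```python
def pascal_with_title(value):
    # title() uppercases every letter that follows a non-letter character.
    return value.title().replace("_", "")

def pascal_with_capitalize(value):
    # capitalize() only touches the first character of each underscore-separated word.
    return "".join(word.capitalize() for word in value.split("_"))

print(pascal_with_title("not_camel_case"))       # NotCamelCase
print(pascal_with_capitalize("not_camel_case"))  # NotCamelCase
print(pascal_with_title("foo_2bar"))             # Foo2Bar
print(pascal_with_capitalize("foo_2bar"))        # Foo2bar
```

For plain lowercase snake_case names the two agree; they only diverge once digits or other non-letters appear inside a word.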

Like jtbandes said, you should mash the character classes together like
([a-zA-Z0-9]+)
The next trick is what you do with the replacement. When you say
r'\2'.upper()
the upper() actually happens before sub is called, so sub only ever sees the unchanged string r'\2'. But you can use another feature of sub: you can pass a function to handle the match:
re.sub('(_)([a-zA-Z0-9]+)', lambda match: match.group(2).capitalize(), value)
Now your lambda will get called with each match. Note that sub already replaces every non-overlapping match; subn does the same but returns a tuple of the new string and the number of substitutions made, which is why the [0] is needed:
re.subn('(_)([a-zA-Z0-9]+)', lambda match: match.group(2).capitalize(), value)[0]
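As a quick check of what sub and subn each return, here is a small sketch. The helper name capitalize_word is made up for the example, and the pattern is a simplified version of the one above without the extra group around the underscore.

```python
import re

PATTERN = r'_([a-zA-Z0-9]+)'

def capitalize_word(match):
    # group(1) is the word that follows the underscore.
    return match.group(1).capitalize()

# sub returns just the new string, with every match replaced.
print(re.sub(PATTERN, capitalize_word, "not_camel_case"))
# notCamelCase

# subn returns (new_string, number_of_substitutions).
print(re.subn(PATTERN, capitalize_word, "not_camel_case"))
# ('notCamelCase', 2)
```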

Related

How to split strings with special characters without removing those characters?

I'm writing a function that needs to return an abbreviated version of a str. The returned str must contain the first letter, the number of characters removed, and the last letter; it must be abbreviated per word and not per sentence. After that I need to join the words again in the same format, including the special characters. I tried using re.findall(), but it drops the special characters, so I can't use " ".join() because it leaves them out.
Here's my code:
import re
def abbreviate(wrd):
    return " ".join([i if len(i) < 4 else i[0] + str(len(i[1:-1])) + i[-1] for i in re.findall(r"[\w']+", wrd)])
print(abbreviate("elephant-rides are really fun!"))
The output would be:
e6t r3s are r4y fun
But the output should be:
e6t-r3s are r4y fun!
No need for str.join. Might as well take full advantage of what the re module has to offer.
re.sub accepts a string or a callable object (like a function or lambda), which takes the current match as an input and must return a string with which to replace the current match.
import re
pattern = "\\b[a-z]([a-z]{2,})[a-z]\\b"
string = "elephant-rides are really fun!"
def replace(match):
    return f"{match.group(0)[0]}{len(match.group(1))}{match.group(0)[-1]}"
abbreviated = re.sub(pattern, replace, string)
print(abbreviated)
Output:
e6t-r3s are r4y fun!
Maybe someone else can improve upon this answer with a cuter pattern, or any other suggestions. The way the pattern is written now, it assumes that you're only dealing with lowercase letters, so that's something to keep in mind - but it should be pretty straightforward to modify it to suit your needs. I'm not really a fan of the repetition of [a-z], but that's just the quickest way I could think of for capturing the "inner" characters of a word in a separate capturing group. You may also want to consider what should happen with words/contractions like "don't" or "shouldn't".
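One possible way to drop the repeated [a-z] and the lowercase-only assumption is to match whole runs of four or more letters and do the slicing in the callback. This is only a sketch: it assumes ASCII letters, and the inner function name shorten is arbitrary.

```python
import re

def abbreviate(text):
    def shorten(match):
        word = match.group(0)
        # Keep first and last letter, replace the middle with its length.
        return f"{word[0]}{len(word) - 2}{word[-1]}"
    # Only runs of 4+ letters are touched; everything else is left in place.
    return re.sub(r"[A-Za-z]{4,}", shorten, text)

print(abbreviate("Elephant-rides are really fun!"))  # E6t-r3s are r4y fun!
```

With this pattern a contraction such as "shouldn't" is treated as two pieces ("shouldn" and "t"), so you would still need to decide how apostrophes should be handled.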
Thank you for viewing my question. After a few more searches and some trial and error, I finally found a way to make my code work without changing it too much. I simply substituted re.findall(r"[\w']+", wrd) with re.split(r'([\W\d\_])', wrd) and also removed the whitespace in "".join(), because it was no longer needed.
import re
def abbreviate(wrd):
    return "".join([i if len(i) < 4 else i[0] + str(len(i[1:-1])) + i[-1] for i in re.split(r'([\W\d\_])', wrd)])
print(abbreviate("elephant-rides are not fun!"))
Output:
e6t-r3s are not fun!
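The reason this works is that re.split keeps the delimiters in the result when the pattern contains a capturing group. A tiny illustration with a simplified pattern (just \W rather than the full character class above):

```python
import re

text = "a-b c!"

# Without a capturing group the delimiters are discarded.
print(re.split(r"\W", text))    # ['a', 'b', 'c', '']

# With a capturing group they are returned as separate items.
parts = re.split(r"(\W)", text)
print(parts)                    # ['a', '-', 'b', ' ', 'c', '!', '']

# Joining with the empty string therefore restores the original text.
print("".join(parts) == text)   # True
```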

I want to split a string by a character on its first occurrence, which belongs to a list of characters. How to do this in Python?

Basically, I have a list of special characters. I need to split a string by a character if it belongs to this list and exists in the string. Something on the lines of:
def find_char(string):
    if string.find("some_char"):
        #do xyz with some_char
    elif string.find("another_char"):
        #do xyz with another_char
    else:
        return False
and so on. The way I think of doing it is:
def find_char_split(string):
    char_list = [",","*",";","/"]
    for my_char in char_list:
        if string.find(my_char) != -1:
            my_strings = string.split(my_char)
            break
    else:
        my_strings = False
    return my_strings
Is there a more Pythonic way of doing this, or would the above procedure be fine? Please help, I'm not very proficient in Python.
(EDIT): I want it to split on the first occurrence of the character, which is encountered first. That is to say, if the string contains multiple commas, and multiple stars, then I want it to split by the first occurrence of the comma. Please note, if the star comes first, then it will be broken by the star.
I would favor using the re module for this because the expression for splitting on multiple arbitrary characters is very simple:
r'[,*;/]'
The brackets create a character class that matches anything inside of them. The code is like this:
import re
results = re.split(r'[,*;/]', my_string, maxsplit=1)
The maxsplit argument makes it so that the split only occurs once.
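To see how this behaves on the cases from the question, here is a short sketch. The wrapper name split_first is made up for the example.

```python
import re

def split_first(text):
    # Split once, at whichever of , * ; / appears first in the string.
    return re.split(r'[,*;/]', text, maxsplit=1)

print(split_first("a*b,c*d"))  # ['a', 'b,c*d']
print(split_first("a,b*c"))    # ['a', 'b*c']
print(split_first("abc"))      # ['abc']
```

When no delimiter is present you get a one-element list rather than False, so the caller can check len(result) == 1 if that case needs special handling.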
If you are doing the same split many times, you can compile the regex and search on that same expression a little bit faster (but see Jon Clements' comment below):
c = re.compile(r'[,*;/]')
results = c.split(my_string)
If this speed-up is important (it probably isn't), you can avoid recompiling on every call by building the pattern once inside a factory function that returns a closure holding the compiled expression:
def split_chars(chars, maxsplit=0, flags=0, string=None):
    # see note about the + symbol below
    c = re.compile('[{}]+'.format(''.join(chars)), flags=flags)
    def f(string, maxsplit=maxsplit):
        return c.split(string, maxsplit=maxsplit)
    return f if string is None else f(string)
Then:
special_split = split_chars(',*;/', maxsplit=1)
result = special_split(my_string)
But also:
result = split_chars(',*;/', string=my_string, maxsplit=1)
The purpose of the + character is to treat multiple delimiters as one if that is desired (thank you Jon Clements). If this is not desired, you can just use re.compile('[{}]'.format(''.join(chars))) above. Note that with maxsplit=1 the + only makes a difference when the first delimiter is immediately followed by another one.
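If the delimiter list can contain characters that are special inside a character class (such as -, ^ or ]), building the class with plain string formatting can produce a different pattern than intended. A hedged variant of the factory idea that passes the characters through re.escape first; make_splitter is a hypothetical name, not part of the answer above:

```python
import re

def make_splitter(chars, maxsplit=0):
    # re.escape keeps characters like '-', '^' and ']' literal inside [...].
    pattern = re.compile("[" + re.escape(chars) + "]")

    def split(text):
        return pattern.split(text, maxsplit=maxsplit)

    return split

split_odd = make_splitter("-^]")
print(split_odd("a-b^c]d"))  # ['a', 'b', 'c', 'd']

split_once = make_splitter(",*;/", maxsplit=1)
print(split_once("a*b,c"))   # ['a', 'b,c']
```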
Finally: have a look at this talk for a quick introduction to regular expressions in Python, and this one for a much more information-packed journey.

str.replace() or re.sub() continually until substring no longer present

Let's say I have the following string: 'streets are shiny.' I wish to find every occurrence of the string 'st' and replace it with 'ts'. So the result should read 'tsreets are shiny.'.
I know this can be done using re.sub() or str.replace(). However, say I have the following strings:
'st'
'sts'
'stst'
I want them to change to 'ts','tss' and 'ttss' respectively, as I want all occurrences of 'st' to change to 'ts'.
What is the best way to replace these strings with optimal runtime? I know I could continually perform a check to see if "st" in string until this returns False, but is there a better way?
I think that a while loop that just checks if the 'st' is in the string is best in this case:
def recursive_replace(s, sub, new):
    while sub in s:
        s = s.replace(sub, new)
    return s
tests = ['st', 'sts', 'stst']
print([recursive_replace(test, 'st', 'ts') for test in tests])
#OUT: ['ts', 'tss', 'ttss']
While the looping solutions are probably the simplest, you can actually write a re.sub call with a custom function to do all the transformations at once.
The key insight for this is that your rule (changing st to ts) will end up moving every s in a block of mixed s and t characters to the right of all the t characters. We can simply count the s and t characters and make an appropriate replacement:
import re
def sub_func(match):
    text = match.group(1)
    return "t"*text.count("t") + "s"*text.count("s")
re.sub(r'(s[st]*t)', sub_func, text)
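To convince yourself that the single-pass regex and the repeated-replace loop agree, you could run both side by side. This is a self-contained sketch; replace_loop and replace_regex are names invented for the comparison.

```python
import re

def replace_loop(s):
    # Keep swapping until no "st" is left.
    while "st" in s:
        s = s.replace("st", "ts")
    return s

def replace_regex(s):
    def sort_block(match):
        block = match.group(0)
        # All t's first, then all s's, same counts as in the matched block.
        return "t" * block.count("t") + "s" * block.count("s")
    # A block starts at an s and ends at the last t of a run of s/t characters.
    return re.sub(r"s[st]*t", sort_block, s)

for sample in ["st", "sts", "stst", "sstt", "streets are shiny."]:
    assert replace_loop(sample) == replace_regex(sample)
    print(repr(sample), "->", repr(replace_regex(sample)))
```

The loop may scan the string many times for long runs of s and t, while the regex version handles each block once.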
You can do that with a pretty simple while loop:
s = "stst"
while 'st' in s:
    s = s.replace("st", "ts")
print(s)
ttss
If you want to keep checking until no match is left, the other answers work well (bear in mind that with something like stt you would get stt->tst->tts). I don't know if you want that.
I think, however, that you are trying to replace multiple occurrences of st with ts in a single pass. If that is the case, you should definitely use str.replace, which replaces every occurrence of a substring, or only the first count occurrences if you pass a limit.
This should be faster according to this.
str.replace(old, new[, count])
(In Python 2 the same operation was also available as the module-level function string.replace(s, old, new[, maxreplace]); that function was removed in Python 3.)
example:
>>> st = 'streets are shiny.streets are shiny.streets are shiny.'
>>> st.replace('st', 'ts')
'tsreets are shiny.tsreets are shiny.tsreets are shiny.'
Naively you could do:
>>> ['t'*s.count('t')+'s'*s.count('s') for s in ['st', 'sts', 'stst']]
['ts', 'tss', 'ttss']

How do you filter a string to only contain letters?

How do I make a function where it will filter out all the non-letters from the string? For example, letters("jajk24me") will return back "jajkme". (It needs to be a for loop) and will string.isalpha() function help me with this?
My attempt:
def letters(input):
    valids = []
    for character in input:
        if character in letters:
            valids.append( character)
    return (valids)
If it needs to be in that for loop, and a regular expression won't do, then this small modification of your loop will work:
def letters(input):
    valids = []
    for character in input:
        if character.isalpha():
            valids.append(character)
    return ''.join(valids)
(The ''.join(valids) at the end takes all of the characters that you have collected in a list, and joins them together into a string. Your original function returned that list of characters instead)
You can also filter out characters from a string:
def letters(input):
    return ''.join(filter(str.isalpha, input))
or with a list comprehension:
def letters(input):
    return ''.join([c for c in input if c.isalpha()])
or you could use a regular expression, as others have suggested.
import re
valids = re.sub(r"[^A-Za-z]+", '', my_string)
EDIT: If it needs to be a for loop, something like this should work:
output = ''
for character in input:
    if character.isalpha():
        output += character
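One difference between the isalpha() versions and the [^A-Za-z] regular expression above is worth knowing: str.isalpha() accepts any Unicode letter, while the regex keeps ASCII letters only. A short sketch with illustrative function names:

```python
import re

def letters_unicode(text):
    # str.isalpha() is True for any Unicode letter, e.g. accented characters.
    return "".join(c for c in text if c.isalpha())

def letters_ascii(text):
    # The character class keeps only the 52 ASCII letters.
    return re.sub(r"[^A-Za-z]+", "", text)

print(letters_unicode("jajk24me"))     # jajkme
print(letters_ascii("jajk24me"))       # jajkme
print(letters_unicode("caf\u00e924"))  # café
print(letters_ascii("caf\u00e924"))    # caf
```

Which behaviour is right depends on whether "letters" in your data means ASCII only.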
See re.sub; for performance, consider re.compile to compile the pattern once.
Below is a short version which matches all characters not in the range from A to Z and replaces them with the empty string. The re.I flag makes the match case-insensitive, so lowercase letters (a-z) are excluded from the match as well and are kept.
import re
def charFilter(myString):
    return re.sub('[^A-Z]+', '', myString, flags=re.I)
If you really need that loop, there are many answers explaining that specifically. However, you might want to give a reason why you need a loop.
If you want to operate on the number sequences and that's the reason for the loop, consider replacing the replacement string parameter with a function like:
import re
def numberPrinter(match):
    # called once for each run of non-letter characters
    print(match.group(0))
    return ''
def charFilter(myString):
    return re.sub('[^A-Z]+', numberPrinter, myString, flags=re.I)
The method string.isalpha() checks whether string consists of alphabetic characters only. You can use it to check if any modification is needed.
As to the other part of the question, pst is just right. You can read about regular expressions in the python doc: http://docs.python.org/library/re.html
They might seem daunting but are really useful once you get the hang of them.
Of course you can use isalpha. Also, valids can be a string.
Here you go:
def letters(input):
    valids = ""
    for character in input:
        if character.isalpha():
            valids += character
    return valids
Not using a for-loop. But that's already been thoroughly covered.
Might be a little late, and I'm not sure about performance, but I just thought of this solution which seems pretty nifty:
set(x).intersection(y)
You could use it like:
from string import ascii_letters
def letters(string):
    return ''.join(set(string).intersection(ascii_letters))
NOTE:
This will not preserve the original order, and repeated letters are collapsed into one because sets hold unique elements. In my use case that is fine, but be warned.
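If you like the set idea but need the original order and the repeated letters, one option is to use the set only for the membership test. A minimal sketch, assuming ASCII letters are what you want to keep (ALLOWED is a name introduced here):

```python
from string import ascii_letters

# A frozenset gives fast membership checks for each character.
ALLOWED = frozenset(ascii_letters)

def letters(text):
    # Walk the string in order, so position and duplicates are preserved.
    return "".join(c for c in text if c in ALLOWED)

print(letters("jajk24me"))  # jajkme
```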

Right-to-left string replace in Python?

I want to do a string replace in Python, but only do the first instance going from right to left. In an ideal world I'd have:
myStr = "mississippi"
print(myStr.rreplace("iss","XXX",1))
> missXXXippi
What's the best way of doing this, given that rreplace doesn't exist?
rsplit and join could be used to simulate the effects of an rreplace
>>> 'XXX'.join('mississippi'.rsplit('iss', 1))
'missXXXippi'
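Wrapped into a reusable helper, the rsplit/join idea might look like this. The function name and the count default are choices made for this sketch, not an existing str method.

```python
def rreplace(s, old, new, count=1):
    # rsplit cuts at the last `count` occurrences of `old`;
    # joining the pieces with `new` puts the replacement in those spots.
    return new.join(s.rsplit(old, count))

print(rreplace("mississippi", "iss", "XXX"))     # missXXXippi
print(rreplace("mississippi", "iss", "XXX", 2))  # mXXXXXXippi
print(rreplace("mississippi", "zzz", "XXX"))     # mississippi
```

If old does not occur, rsplit returns a single-element list, so the string comes back unchanged.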
>>> myStr[::-1].replace("iss"[::-1], "XXX"[::-1], 1)[::-1]
'missXXXippi'
>>> import re
>>> re.sub(r'(.*)iss', r'\1XXX', myStr)
'missXXXippi'
The regex engine consumes the whole string and then starts backtracking until iss is found. Then it replaces the found string with the needed pattern.
Some speed tests
The solution with [::-1] turns out to be faster.
The solution with re was only faster for long strings (longer than 1 million symbols).
You can reverse a string like so:
myStr[::-1]
To replace, add .replace, remembering that the search and replacement strings have to be reversed as well:
print(myStr[::-1].replace("iss"[::-1], "XXX"[::-1], 1))
However, now your string is backwards, so re-reverse it:
myStr[::-1].replace("iss"[::-1], "XXX"[::-1], 1)[::-1]
and you're done.
If your search and replacement strings are static, just write them already reversed in the source (for example "ssi" instead of "iss") to reduce overhead. If not, the [::-1] trick shown above works on them too.
def rreplace(s, old, new):
    try:
        place = s.rindex(old)
        return ''.join((s[:place],new,s[place+len(old):]))
    except ValueError:
        return s
You could also use str.rpartition() which splits the string by the specified separator from right and returns a tuple:
myStr = "mississippi"
first, sep, last = myStr.rpartition('iss')
print(first + 'XXX' + last)
# missXXXippi
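One caveat with rpartition: when the separator is not found it returns two empty strings followed by the whole string, so blindly concatenating would prepend the replacement. A guarded sketch (rreplace_once is a hypothetical helper name):

```python
def rreplace_once(s, old, new):
    head, sep, tail = s.rpartition(old)
    # sep is empty when `old` does not occur; return the input untouched then.
    if not sep:
        return s
    return head + new + tail

print("mississippi".rpartition("zzz"))             # ('', '', 'mississippi')
print(rreplace_once("mississippi", "iss", "XXX"))  # missXXXippi
print(rreplace_once("mississippi", "zzz", "XXX"))  # mississippi
```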
Using the package fishhook (available through pip), you can add this functionality.
from fishhook import hook
@hook(str)
def rreplace(self, old, new, count=-1):
    return self[::-1].replace(old[::-1], new[::-1], count)[::-1]
print('abxycdxyef'.rreplace('xy', '--', count=1))
# 'abxycd--ef'
It's kind of a dirty hack, but you could reverse the string and do the replacement with reversed strings as well. (str has no reverse() method, so use slicing.)
"mississippi"[::-1].replace('iss'[::-1], 'XXX'[::-1], 1)[::-1]
