capitalizing the first letter in a text after split - python

Is there an easy way to capitalize the first letter of each word after "-" in a string and leave the rest of string intact?
x="hi there-hello world from Python - stackoverflow"
expected output is
x="Hi there-Hello world from Python - Stackoverflow"
What I tried is:
"-".join([i.title() for i in x.split("-")])  # this capitalizes the first letter of every word; I only want the first word after each split
Note: "-" isn't always surrounded by spaces

You can do this with a regular expression:
import re
x = "hi there-hello world from Python - stackoverflow"
y = re.sub(r'(^|-\s*)(\w)', lambda m: m.group(1) + m.group(2).upper(), x)
print(y)

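Try this (note that str.capitalize() also lowercases the rest of each piece, so "Python" becomes "python", and a piece that starts with a space, such as " stackoverflow", is left unchanged):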
"-".join([i.capitalize() for i in x.split("-")])

Basically what @Milad Barazandeh did, but done another way that leaves the rest of each piece untouched:
answer = "-".join([i[0].upper() + i[1:] for i in x.split("-")])
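Neither split-based version above capitalizes a word that follows "- " (a dash plus a space), and i[0] raises IndexError on an empty piece such as the one produced by "--". A minimal sketch that handles both cases without a regex; the helper name capitalize_after_dash is hypothetical and not from the answers above:

```python
def capitalize_after_dash(text):
    """Uppercase the first non-space character of each '-'-separated piece."""
    pieces = []
    for piece in text.split("-"):
        stripped = piece.lstrip()
        # Keep the leading whitespace exactly as it was.
        lead = piece[:len(piece) - len(stripped)]
        # Slicing with [:1] is safe on an empty piece, unlike [0].
        pieces.append(lead + stripped[:1].upper() + stripped[1:])
    return "-".join(pieces)


x = "hi there-hello world from Python - stackoverflow"
print(capitalize_after_dash(x))
# Hi there-Hello world from Python - Stackoverflow
```

Only the first character of each piece is changed, so existing capitals such as "Python" survive.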

Related

Removing numbers and _ symbol from a parsed URL using re module and sub() function in Python

I'm trying to exclude "numbers" and the symbols "-" and "_" from a string that I got parsing a URL.
For example,
string1 = 'historical-fiction_4'
string_cleaned = re.sub("[^a-z]", "", string1)
print(string1)
print(string_cleaned)
historical-fiction_4
historicalfiction
With re.sub("[^a-z]") I got just the strings from a to z but instead of getting the string "historicalfiction" I would like to get "Historical Fiction".
More or less all my data is collected with this structure "name1-name2_number".
If anyone can help me improve my re.sub() call, I'd really appreciate it. Thanks a lot!
You can use str.title() to capitalize every word:
import re
string1 = "historical-fiction_4"
string1 = re.sub(r"[^a-z]", " ", string1).strip().title()
print(string1)
Prints:
Historical Fiction
It appears to me that your logic is that you want to replace dashes with spaces, but completely strip off underscore and digits. If so, then use two separate calls to replace:
import re
inp = "historical-fiction_4"
output = re.sub(r'[0-9_]+', '', inp.replace("-", " "))
print(output) # historical fiction
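If the data really does always follow the "name1-name2_number" structure mentioned in the question, the steps can be folded into one small function. This is only a sketch under that assumption; clean_label is a hypothetical name, and inputs with a different shape would need different handling:

```python
import re


def clean_label(raw):
    """Turn 'historical-fiction_4' into 'Historical Fiction'."""
    # Drop a trailing '_<digits>' suffix, if present.
    name = re.sub(r"_\d+$", "", raw)
    # Dashes separate the words; capitalize each non-empty word.
    return " ".join(word.capitalize() for word in name.split("-") if word)


print(clean_label("historical-fiction_4"))  # Historical Fiction
```

Note that str.capitalize() lowercases the rest of each word, which is harmless for all-lowercase URL slugs like the one in the question.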

How to capitalize specific letters in a string given certain rules

I am massaging strings so that the 1st letter of the string and the first letter following either a dash or a slash needs to be capitalized.
So the following string:
test/string - this is a test string
Should look like so:
Test/String - This is a test string
So in trying to solve this problem my 1st idea seems like a bad idea - iterate the string and check every character and using indexing etc. determine if a character follows a dash or slash, if it does set it to upper and write out to my new string.
def correct_sentence_case(test_phrase):
    corrected_test_phrase = ''
    firstLetter = True
    for char in test_phrase:
        if firstLetter:
            corrected_test_phrase += char.upper()
            firstLetter = False
        #elif char == '/':
        else:
            corrected_test_phrase += char
    return corrected_test_phrase
This just seems VERY un-pythonic. What is a pythonic way to handle this?
Something along the lines of the following would be awesome but I can't pass in both a dash and a slash to the split:
corrected_test_phrase = ' - '.join(i.capitalize() for i in test_phrase.split(' - '))
Which I got from this SO:
Convert UPPERCASE string to sentence case in Python
Any help will be appreciated :)
I was able to accomplish the desired transformation with a regular expression:
import re
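phrase = 'test/string - this is a test string'
capitalized = re.sub(
    r'(^|[-/])\s*([A-Za-z])', lambda match: match[0].upper(), phrase)
print(capitalized)  # Test/String - This is a test string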
The expression says: "wherever you match the start of the string (^) or a dash or slash, followed by optional whitespace and a letter, replace the whole match with its uppercase form." Uppercasing leaves the dash, slash, and spaces unchanged, so only the letter is affected.
If you don't want to go with a messy splitting-joining logic, go with a regex:
import re
string = 'test/string - this is a test string'
print(re.sub(r'(^([a-z])|(?<=[-/])\s?([a-z]))',
             lambda match: match.group(1).upper(), string))
# Test/String - This is a test string
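Using a double split (caveat: str.capitalize() lowercases everything after the first character, so the second pass turns "Test/String" back into "Test/string"):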
import re
' - '.join([i.strip().capitalize() for i in re.split(' - ','/'.join([i.capitalize() for i in re.split('/',test_phrase)]))])
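Because str.capitalize() lowercases the remainder of each piece, one way to keep the split-and-join idea is to uppercase only the first non-space character of each piece. A sketch of that variant; the helper names upper_first and correct_case are hypothetical:

```python
def upper_first(piece):
    """Uppercase the first non-space character, leave everything else alone."""
    stripped = piece.lstrip()
    lead = piece[:len(piece) - len(stripped)]
    return lead + stripped[:1].upper() + stripped[1:]


def correct_case(phrase):
    # Split on '/' first, then on '-', rejoining with the same separators
    # so that the original spacing around them is preserved.
    return "/".join(
        "-".join(upper_first(part) for part in chunk.split("-"))
        for chunk in phrase.split("/")
    )


print(correct_case("test/string - this is a test string"))
# Test/String - This is a test string
```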
I'm using this:
import string
last = 'pierre-GARCIA'
if last not in [None, '']:
    last = last.strip()
    if '-' in last:
        last = string.capwords(last, sep='-')
    else:
        last = string.capwords(last, sep=None)

How to remove special characters except space from a file in python? [duplicate]

I have a huge corpus of text (line by line) and I want to remove special characters while preserving the spaces and the structure of each line.
hello? there A-Z-R_T(,**), world, welcome to python.
this **should? the next line#followed- by# an#other %million^ %%like $this.
should be
hello there A Z R T world welcome to python
this should be the next line followed by another million like this
You can use this pattern, too, with regex:
import re
a = '''hello? there A-Z-R_T(,**), world, welcome to python.
this **should? the next line#followed- by# an#other %million^ %%like $this.'''
for k in a.split("\n"):
    print(re.sub(r"[^a-zA-Z0-9]+", ' ', k))
    # Or:
    # final = " ".join(re.findall(r"[a-zA-Z0-9]+", k))
    # print(final)
Output:
hello there A Z R T world welcome to python
this should the next line followed by an other million like this
Edit:
Otherwise, you can store the final lines into a list:
final = [re.sub(r"[^a-zA-Z0-9]+", ' ', k) for k in a.split("\n")]
print(final)
Output:
['hello there A Z R T world welcome to python ', 'this should the next line followed by an other million like this ']
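Since the question is about a large file, the same substitution can be applied while streaming the file line by line instead of loading it all at once. This is a sketch only; clean_line, clean_file, and the path parameters are hypothetical names, and UTF-8 encoding is an assumption:

```python
import re


def clean_line(line):
    """Replace runs of non-alphanumeric characters with a single space."""
    # strip() also removes the trailing space left by the final '.'.
    return re.sub(r"[^a-zA-Z0-9]+", " ", line).strip()


def clean_file(src_path, dst_path):
    # Stream line by line so a large corpus never has to fit in memory.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(clean_line(line) + "\n")


print(clean_line("hello? there A-Z-R_T(,**), world, welcome to python."))
# hello there A Z R T world welcome to python
```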
I think nfn neil's answer is great, but I would just add a simpler regex that removes all non-word characters. Note, however, that it treats the underscore as part of a word:
print(re.sub(r'\W+', ' ', string))
>>> hello there A Z R_T world welcome to python
You can try this:
import re
sentence = '''hello? there A-Z-R_T(,**), world, welcome to python. this **should? the next line#followed- by# an#other %million^ %%like $this.'''
res = re.sub('[!,*)##%(&$_?.^]', '', sentence)
print(res)
Add any other symbols you want to remove inside the square brackets of the pattern.
A more elegant solution would be
print(re.sub(r"\W+|_", " ", string))
>>> hello there A Z R T world welcome to python this should the next line followed by an other million like this
Here,
re is the regex module in Python
re.sub substitutes each match of the pattern with a space, i.e. " "
r'' marks a raw string literal, so backslashes such as the one in \W reach the regex engine unchanged
\W matches any non-word character, i.e. special characters such as *&^%$ but not the underscore _, which counts as a word character
+ matches one or more repetitions (unlike *, which matches zero or more)
| is logical OR
_ is the literal underscore
Create a dictionary mapping special characters to None
d = {c:None for c in special_characters}
Make a translation table using the dictionary. Read the entire text into a variable and use str.translate on the entire text.
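The translation-table steps above can be sketched as follows. The answer does not say what special_characters contains, so using string.punctuation (ASCII punctuation only) is an assumption here, as is the second table that maps to spaces:

```python
import string

# Assumption: "special characters" means ASCII punctuation.
special_characters = string.punctuation

# Mapping a character to None deletes it.
d = {c: None for c in special_characters}
delete_table = str.maketrans(d)

text = "hello? there A-Z-R_T(,**), world, welcome to python."
print(text.translate(delete_table))
# hello there AZRT world welcome to python

# Deleting glues "A-Z-R_T" together. Mapping to a space instead and then
# collapsing whitespace keeps the pieces as separate words.
space_table = str.maketrans({c: " " for c in special_characters})
print(" ".join(text.translate(space_table).split()))
# hello there A Z R T world welcome to python
```

Note that split() with no arguments also collapses newlines, so for a multi-line corpus the second variant would need to be applied per line.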

Capitalize first character of a word in a string

How is one of the following versions different from the other?
The following code capitalizes the first letter of each word in the string:
s = ' '.join(i[0].upper() + i[1:] for i in s.split())
The following code prints only the last word, with every character separated by a space:
for i in s.split():
    s = ' '.join(i[0].upper() + i[1:])
print(s)
For completeness and for people who find this question via a search engine, the proper way to capitalize the first letter of every word in a string is to use the title method.
>>> capitalize_me = 'hello stackoverlow, how are you?'
>>> capitalize_me.title()
'Hello Stackoverlow, How Are You?'
for i in s.split():
At this point i is a word.
s = ' '.join(i[0].upper() + i[1:])
Here, i[0] is the first character of the word, and i[1:] is the rest of the word, so the argument passed to join() is a single capitalized string. In other words, this is shorthand for s = ' '.join(capitalized_word).
The str.join() method takes a single iterable as its argument. In this case the iterable is a string, but that makes no difference: for something such as ' '.join("this"), str.join() iterates over each element of the iterable (each character of the string) and puts a space between them. Result: t h i s
There is, however, an easier way to do what you want: s = s.title()
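The difference between the two versions can be seen side by side in a short sketch. The sample string and the variable names spaced and capitalized are made up for illustration:

```python
s = "hello stack overflow"

# Inside the loop, join() receives a single string, so it iterates over
# that string's characters, and the result is overwritten on every pass.
for word in s.split():
    spaced = " ".join(word[0].upper() + word[1:])
print(spaced)  # O v e r f l o w

# The generator expression hands join() one capitalized word at a time.
capitalized = " ".join(word[0].upper() + word[1:] for word in s.split())
print(capitalized)  # Hello Stack Overflow
```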

Python: Remove First Character of each Word in String

I am trying to figure out how to remove the first character of each word in a string.
My program reads in a string.
Suppose the input is :
this is demo
My intention is to remove the first character of each word of the string (that is, t, i and d), leaving his s emo.
I have tried
Using a for loop and traversing the string
Checking for space in the string using isspace() function.
Storing the index of the letter which is encountered after the
space, i = char + 1, where char is the index of space.
Then, trying to remove the empty space using str_replaced = str[i:].
But it removed the entire string except the last one.
List comprehensions are your friend. This is the most basic version, in just one line:
text = "this is demo"
print(" ".join([x[1:] for x in text.split(" ")]))
output:
his s emo
In case the input string can have not only spaces, but also newlines or tabs, I'd use regex.
In [1]: inp = '''Suppose we have a
   ...: multiline input...'''
In [2]: import re
In [3]: print(re.sub(r'(?<=\b)\w', '', inp))
uppose e ave
ultiline nput...
You can simply use a list comprehension:
text = 'this is demo'
mstr = ' '.join([s[1:] for s in text.split(' ')])
The variable mstr then contains 'his s emo'.
This method is a bit long, but easy to understand. The loop starts at index 1, which skips the first character of the string. The flag variable records whether the previous character was a space; if it was, the current letter is skipped.
s = "alpha beta charlie"
t = ""
flag = 0
for x in range(1, len(s)):
    if flag == 0:
        t += s[x]
    else:
        flag = 0
    if s[x] == " ":
        flag = 1
print(t)
output
lpha eta harlie
