I have to write a single function that should return the first word in the following strings:
("Hello world") -> return "Hello"
(" a word ") -> return "a"
("don't touch it") -> return "don't"
("greetings, friends") -> return "greetings"
("... and so on ...") -> return "and"
("hi") -> return "hi"
All have to return the first word and as you can see some start with a whitespace, have apostrophes or end with commas.
I've used the following options:
return text.split()[0]
return re.split(r'\w*, text)[0]
Both error at some of the strings, so who can help me???
Try the below code. I tested with all your inputs and it works fine.
import re
text=["Hello world"," a word ","don't touch it","greetings, friends","... and so on ...","hi"]
for i in text:
rgx = re.compile("(\w[\w']*\w|\w)")
out=rgx.findall(i)
print out[0]
Output:
Hello
a
don't
greetings
and
hi
It is tricky to distinguish apostrophes which are supposed to be part of a word and single quotes which are punctuation for the syntax. But since your input examples do not show single quotes, I can go with this:
re.match(r'\W*(\w[^,. !?"]*)', text).groups()[0]
For all your examples, this works. It won't work for atypical stuff like "'tis all in vain!", though. It assumes that words end on commas, dots, spaces, bangs, question marks, and double quotes. This list can be extended on demand (in the brackets).
try this one:
>>> def pm(s):
... p = r"[a-zA-Z][\w']*"
... m = re.search(p,s)
... print m.group(0)
...
test result:
>>> pm("don't touch it")
don't
>>> pm("Hello w")
Hello
>>> pm("greatings, friends")
greatings
>>> pm("... and so on...")
and
>>> pm("hi")
hi
A non-regex solution: stripping off leading punctation/whitespace characters, splitting the string to get the first word, then removing trailing punctuation/whitespace:
from string import punctuation, whitespace
def first_word(s):
to_strip = punctuation + whitespace
return s.lstrip(to_strip).split(' ', 1)[0].rstrip(to_strip)
tests = [
"Hello world",
"a word",
"don't touch it",
"greetings, friends",
"... and so on ...",
"hi"]
for test in tests:
print('#{}#'.format(first_word(test)))
Outputs:
#Hello#
#a#
#don't#
#greetings#
#and#
#hi#
You can try something like this:
import re
pattern=r"[a-zA-Z']+"
def first_word(words_tuple):
match=re.findall(pattern,words_tuple)
for i in match:
if i[0].isalnum():
return i
print(first_word(("don't touch it")))
output:
don't
I've done this by using the first occurrence of white space to stop the "getting" of the first word. Something like this:
stringVariable = whatever sentence
firstWord = ""
stringVariableLength = len(stringVariable)
for i in range(0, stringVariableLength):
if stringVariable[i] != " ":
firstWord = firstWord + stringVariable[i]
else:
break
This code will parse through the string variable that you want to get the first word of, and add it into a new variable called firstWord, until it gets to the first occurance of white space. I'm not exactly sure how you would put that into a function as I'm pretty new to this whole thing, but I'm sure it could be done!
Related
Unclear on how to frame the following function correctly:
Creating a function that will take in a string and return the string in camel case without spaces (or pascal case if the first letter was already capital), removing special characters
text = "This-is_my_test_string,to-capitalize"
def to_camel_case(text):
# Return 1st letter of text + all letters after
return text[:1] + text.title()[1:].replace(i" ") if not i.isdigit()
# Output should be "ThisIsMyTestStringToCapitalize"
the "if" statement at the end isn't working out, and I wrote this somewhat experimentally, but with a syntax fix, could the logic work?
Providing the input string does not contain any spaces then you could do this:
from re import sub
def to_camel_case(text, pascal=False):
r = sub(r'[^a-zA-Z0-9]', ' ', text).title().replace(' ', '')
return r if pascal else r[0].lower() + r[1:]
ts = 'This-is_my_test_string,to-capitalize'
print(to_camel_case(ts, pascal=True))
print(to_camel_case(ts))
Output:
ThisIsMyTestStringToCapitalize
thisIsMyTestStringToCapitalize
Here is a short solution using regex. First it uses title() as you did, then the regex finds non-alphanumeric-characters and removes them, and finally we take the first character to handle pascal / camel case.
import re
def to_camel_case(s):
s1 = re.sub('[^a-zA-Z0-9]+', '', s.title())
return s[0] + s1[1:]
text = "this-is2_my_test_string,to-capitalize"
print(to_camel_case(text)) # ThisIsMyTestStringToCapitalize
The below should work for your example.
Splitting apart your example by anything that isn's alphanumeric or a space. Then capitalizing each word. Finally, returning the re-joined string.
import re
def to_camel_case(text):
words = re.split(r'[^a-zA-Z0-9\s]', text)
return "".join([word.capitalize() for word in words])
text_to_camelcase = "This-is_my_test_string,to-capitalize"
print(to_camel_case(text_to_camelcase))
use the split function to split between anything that is not a letter or a whitespace and the function .capitalize() to capitalize single words
import re
text_to_camelcase = "This-is_my_test_string,to-capitalize"
def to_camel_case(text):
split_text = re.split(r'[^a-zA-Z0-9\s]', text)
cap_string = ''
for word in split_text:
cap_word = word.capitalize()
cap_string += cap_word
return cap_string
print(to_camel_case(text_to_camelcase))
using function
def make_cap(sentence):
return sentence.title()
tryining out
make_cap("hello world")
'Hello World'
# it workd but when I have world like "aren't" and 'isn't". how to write function for that
a = "I haven't worked hard"
make_cap(a)
"This Isn'T A Right Thing" # it's wrong I am aware of \ for isn\'t but confused how to include it in function
This should work:
def make_cap(sentence):
return " ".join(word[0].title() + (word[1:] if len(word) > 1 else "") for word in sentence.split(" "))
It manually splits the word by spaces (and not by any other character), and then capitalizes the first letter of each token. It does this by separating that first letter out, capitalizing it, and then concatenating the rest of the word. I used a ternary if statement to avoid an IndexError if the word is only one letter long.
Use .capwords() from the string library.
import string
def make_cap(sentence):
return string.capwords(sentence)
Demo: https://repl.it/repls/BlankMysteriousMenus
I found this method to be very helpful for formatting all different types of texts as titles.
from string import capwords
text = "I can't go to the USA due to budget concerns"
title = ' '.join([capwords(w) if w.islower() else w for w in text.split()])
print(title) # I Can't Go To The USA Due To Budget Concerns
I'm writing a program in python that'll split off the contents after the last space in a string. e.g. if a user enters "this is a test", I want it to return "test". I'm stuck on how to do this?
Easy and efficient with str.rsplit.
>>> x = 'this is a test'
>>> x.rsplit(None, 1)[-1] # Splits at most once from right on whitespace runs
'test'
Alternate:
>>> x.rpartition(' ')[-1] # Splits on the first actual space found
'test'
string = "this is a test"
lastWord = string.rsplit()[-1]
print lastWord
'test'
The fastest and most efficient way:
>>> "this is a test".rpartition(' ')[-1]
'test'
>>> help(str.rpartition)
Help on method_descriptor:
rpartition(...)
S.rpartition(sep) -> (head, sep, tail)
Search for the separator sep in S, starting at the end of S, and return
the part before it, the separator itself, and the part after it. If the
separator is not found, return two empty strings and S.
string= "im fine.gds how are you"
if '.gds' or '.cdl' in string :
a=string.split("????????")
the above string may contain .gds or .cdl extension. I want to split the string based on the extension.
here how the parameters can be passed to split function.(EX if .gds is present in string then it should take as split(".gds")
if .cdl is there in string then it should get split(".cdl"))
I think you have to split the if statements:
if '.gds' in string:
a = string.split('.gds')
elif '.cdl' in string:
a = string.split('.cdl')
else:
a = string # this is a fallback in case none of the patterns is in the string
Furthermore, your in statement is incorrect; it should have been
if '.gds' in string or '.cdl' in string:
Note that this solution assumes that only one of the patterns will be in the string. If both patterns can occur on the same string, see Vikas's answer.
Use regular expression module re to split either by pattern1 or pattern2
import re
re.split('\.gds|\.cdl', your_string)
Example:
>>> re.split('\.gds|\.cdl', "im fine.gds how are you")
['im fine', ' how are you']
>>> re.split('\.gds|\.cdl', "im fine.cdl how are you")
['im fine', ' how are you']
>>> re.split('\.gds|\.cdl', "im fine.cdl how are.gds you")
['im fine', ' how are', ' you']
You can try to define a function like:
def split_on_extensions(string, *extensions):
for ext in extensions:
if ext in string:
return string.split(ext)
return string
Of course, the order in which you give the extensions is critical, as you'll split on the first one...
Are you guaranteed that one of the two of them will be there?
a = next( string.split(v) for v in ('.gds','.cdl') if v in string )
If you're not positive it will be there, you can catch the StopIteration that is raised in next:
try:
a = next( string.split(v) for v in ('.gds','.cdl') if v in string )
except StopIteration:
a = string #????
Another option is to use the BIF str.partition. This is how it works:
sring= "im fine.gds how are you"
three_parts_of_sring = sring.partition('.gds')
>>> three_parts_of_sring
('im fine', '.gds', ' how are you')
Put it into a little function and your set.
The tags is captured into the first backreference. The question mark in the regex makes the star lazy, to make sure it stops before the first closing tag, rather than before the last, like a greedy star would do.
This regex will not properly match tags nested inside themselves, like in <TAG>one<TAG>two</TAG>one</TAG>.
You can iter separators:
string= "im fine.gds how are you"
separators = ['.gds', '.cdl']
for separator in separators:
if separator in string:
a = string.split(separator)
break
else:
a = []
I am working with a text where all "\n"s have been deleted (which merges two words into one, like "I like bananasAnd this is a new line.And another one.") What I would like to do now is tell Python to look for combinations of a small letter followed by capital letter/punctuation followed by capital letter and insert a whitespace.
I thought this would be easy with reg. expressions, but it is not - I couldnt find an "insert" function or anything, and the string commands seem not to be helpful either. How do I do this?
Any help would be greatly appreciated, I am despairing over here...
Thanks, patrick
Try the following:
re.sub(r"([a-z\.!?])([A-Z])", r"\1 \2", your_string)
For example:
import re
lines = "I like bananasAnd this is a new line.And another one."
print re.sub(r"([a-z\.!?])([A-Z])", r"\1 \2", lines)
# I like bananas And this is a new line. And another one.
If you want to insert a newline instead of a space, change the replacement to r"\1\n\2".
Using re.sub you should be able to make a pattern that grabs a lowercase and uppercase letter and substitutes them for the same two letters, but with a space in between:
import re
re.sub(r'([a-z][.?]?)([A-Z])', '\\1\n\\2', mystring)
You're looking for the sub function. See http://docs.python.org/library/re.html for documentation.
Hmm, interesting. You can use regular expressions to replace text with the sub() function:
>>> import re
>>> string = 'fooBar'
>>> re.sub(r'([a-z][.!?]*)([A-Z])', r'\1 \2', string)
'foo Bar'
If you really don't have any caps except at the beginning of a sentence, it will probably be easiest to just loop through the string.
>>> import string
>>> s = "a word endsA new sentence"
>>> lastend = 0
>>> sentences = list()
>>> for i in range(0, len(s)):
... if s[i] in string.uppercase:
... sentences.append(s[lastend:i])
... lastend = i
>>> sentences.append(s[lastend:])
>>> print sentences
['a word ends', 'A new sentence']
Here's another approach, which avoids regular expressions and does not use any imported libraries, just built-ins...
s = "I like bananasAnd this is a new line.And another one."
with_whitespace = ''
last_was_upper = True
for c in s:
if c.isupper():
if not last_was_upper:
with_whitespace += ' '
last_was_upper = True
else:
last_was_upper = False
with_whitespace += c
print with_whitespace
Yields:
I like bananas And this is a new line. And another one.