Splitting strings in Python, but with spaces in substrings

Splitting strings in Python, but with spaces in substrings - python

I have a string that I want to split into a list of certain types. For example, I want to split Starter Main Course Dessert to [Starter, Main Course, Dessert]
I cannot use split() because it will split up the Main Course type. How can I do the splitting? Is regex needed?

If you have a list of acceptable words, you could use a regex union :
import re
acceptable_words = ['Starter', 'Main Course', 'Dessert', 'Coffee', 'Aperitif']
pattern = re.compile("("+"|".join(acceptable_words)+")", re.IGNORECASE)
# "(Starter|Main Course|Dessert|Coffee|Aperitif)"
menu = "Starter Main Course NotInTheList dessert"
print pattern.findall(menu)
# ['Starter', 'Main Course', 'dessert']
If you just want to specify which special substrings should be matched, you could use :
acceptable_words = ['Main Course', '\w+']

I think it's more practical to specify 'special' two-words tokens only.
special_words = ['Main Course', 'Something Special']
sentence = 'Starter Main Course Dessert Something Special Date'
words = sentence.split(' ')
for i in range(len(words) - 1):
try:
idx = special_words.index(str(words[i]) + ' ' + words[i+1])
words[i] = special_words[idx]
words[i+1] = None
except ValueError:
pass
words = list(filter(lambda x: x is not None, words))
print(words)

Related

regex match word and what comes after it

I need some help with a regex I am writing. I have a list of words that I want to match and words that might come after them (words meaning [A-Za-z/\s]+) I.e no parenthesis,symbols, numbers.
words = ['qtr','hard','quarter'] # keywords that must exist
test=['id:12345 cli hard/qtr Mix',
'id:12345 cli qtr 90%',
'id:12345 cli hard (red)',
'id:12345 cli hard work','Hello world']
excepted output is
['hard/qtr Mix', 'qtr', 'hard', 'hard work', None]
What I have tried so far
re.search(r'((hard|qtr|quarter)(?:[[A-Za-z/\s]+]))',x,re.I)

The problem with the pattern you have i.e.'((hard|qtr|quarter)(?:[[A-Za-z/\s]+]))', you have \s inside squared brackets [] which means to match the characters individually i.e. either \ or s, instead, you can just use space character i.e.
You can join all the words in words list by | to create the pattern '((qtr|hard|quarter)([a-zA-Z/ ]*))', then search for the pattern in each of strings in the list, if the match is found, take the group 0 and append it to the resulting list, else, append None:
pattern = re.compile('(('+'|'.join(words)+')([a-zA-Z/ ]*))')
result = []
for x in test:
groups = pattern.search(x)
if groups:
result.append(groups.group(0))
else:
result.append(None)
OUTPUT:
result
['hard/qtr Mix', 'qtr ', 'hard ', 'hard work', None]
And since you are including the space characters, you may end up with some values that has space at the end, you can just strip off the white space characters later.

Idea extracted from the existing answer and made shorter :
>>> pattern = re.compile('(('+'|'.join(words)+')([a-zA-Z/ ]*))')
>>> [pattern.search(x).group(0) if pattern.search(x) else None for x in test])
['hard/qtr Mix', 'qtr ', 'hard ', 'hard work', None]
As mentioned in comment :
But it is quite inefficient, because it needs to search for same pattern twice, once for pattern.search(x).group(0) and the other one for if pattern.search(x), and list-comprehension is not the best way to go about in such scenarios.
We can try this to overcome that issue :
>>> [v.group(0) if v else None for v in (pattern.search(x) for x in test)]
['hard/qtr Mix', 'qtr ', 'hard ', 'hard work', None]

You can put all needed words in or expression and put your word definition after that
import re
words = ['qtr','hard','quarter']
regex = r"(" + "|".join(words) + ")[A-Za-z\/\s]+"
p = re.compile(regex)
test=['id:12345 cli hard/qtr Mix(qtr',
'id:12345 cli qtr 90%',
'id:12345 cli hard (red)',
'id:12345 cli hard work','Hello world']
for string in test:
result = p.search(string)
if result is not None:
print(p.search(string).group(0))
else:
print(result)
Output:
hard/qtr Mix
qtr
hard
hard work
None

Comparing strings to a text to set punctuation marks in the right places

So there's a text and phrases from this text we need to match punctuation marks to:
text = 'i like plums, apples, and donuts. if i had a donut, i would eat it'
phrases = [['apples and donuts'], ['a donut i would']]
The output I need is:
output = [['apples, and donuts'], ['a donut, i would']]
I'm a beginner, so I was thinking about using .replace() but I don't know how to slice a string and access the exact part I need from the text. Could you help me with that? (I'm not allowed to use any libraries)

You can try regex for that
import re
text = 'i like plums, apples, and donuts. if i had a donut, i would eat it'
phrases = [['apples and donuts'], ['a donut i would']]
print([re.findall(i[0].replace(" ", r"\W*"), text) for i in phrases])
Output
[['apples, and donuts'], ['a donut, i would']]
By iterating over the phrases list and replacing the space with \W* the regex findall method will be able to detect the search word and ignoring the punctuation.

you can remove all the punctuation in the text and then just use the plain substring search. Your only problem then is how to restore, or to map, the text you've found to the original.
You can do it by remembering the original position in the text of each letter that you keep in the search text. Here's an example. I just removed the nested list around each phrase as it looks useless, you can easily account for it if you need.
from pprint import pprint
text = 'i like plums, apples, and donuts. if i had a donut, i would eat it'
phrases = ['apples and donuts', 'a donut i would']
def find_phrase(text, phrases):
clean_text, indices = prepare_text(text)
res = []
for phr in phrases:
i = clean_text.find(phr)
if i != -1:
res.append(text[indices[i] : indices[i+len(phr)-1]+1])
return res
def prepare_text(text, punctuation='.,;!?'):
s = ''
ind = []
for i in range(len(text)):
if text[i] not in punctuation:
s += text[i]
ind.append(i)
return s, ind
if __name__ == "__main__":
pprint(find_phrase(text, phrases))
['apples, and donuts.', 'a donut, i would']

I'm trying to solve for the other Acronyms

I have the following strings that I need to make acronyms for:
Institute of Electrical and Electronics Engineers
As Soon As Possible
University of California San Diego
Self Contained Underwater Breathing Apparatus
This is my code
my_string = input()
my_string2 = my_string.upper()
for x in range(0, 1, len(my_string2)):
print(my_string2[0::15])
but it only worked for the first input. There are three more examples that this code doesn't cover. What I need is for this code to be modified in such a way where it will create an Acronym out of any input.The first Acronym is called "Institute of Electrical and Electronics Engineers" and once it's placed into the input it returns IEEE as the output. Basically all of the first letters that are capitalized are kept and no lower cased words remain.

I'm new to programming so I bet the way I did it is a bit funky, but this worked for me on my zybooks lab:
Name = input()
AStart = Name.split()
AFinal = ''
for string in AStart:
if string[0].isupper():
AFinal += string[0] + '.'
print(AFinal)

Here's a regex based solution which looks for words that start with a capital letter and extracts their starting letter, then joins all them together to make the acronym:
import re
strings = [
'As Soon As Possible',
'Institute of Electrical and Electronics Engineers',
'University of California San Diego',
'Self Contained Underwater Breathing Apparatus'
]
for s in strings:
acronym = ''.join(re.findall(r'\b[A-Z]', s))
print(acronym)
If you don't want to use regex, you can just split the strings and test the first character of each word to see if it is uppercase:
for s in strings:
acronym = ''.join(w[0] for w in s.split(' ') if w[0].isupper())
print(acronym)
In either case the output is:
ASAP
IEEE
UCSD
SCUBA
To run from input, use this code:
import re
s = input()
acronym = ''.join(re.findall(r'\b[A-Z]', s))
print(acronym)
Or:
s = input()
acronym = ''.join(w[0] for w in s.split(' ') if w[0].isupper())
print(acronym)
Demo on ideone.com

try this:
full_string = input("Enter Text: ")
string_list = full_string.split()
acronym = ""
for string in string_list:
acronym += f"{string[0].upper()}"
print(acronym)
output:
Enter Text: This is a long string please be kind
TIALSPBK

This is what I used.
phrase = str(input()).rstrip() #gets the phrase and makes string sanitized
for char in phrase: #goes through every char
x = char #Did this to make it easier to keep track
if x.isupper() == True: #The char loop check if the value is true or not
print(x, end='') #print the true uppercase, end print on 1 line.

Python - How to use re.finditer with multiple patterns

I want to search 3 Words in a String and put them in a List
something like:
sentence = "Tom once got a bike which he had left outside in the rain so it got rusty"
pattern = ['had', 'which', 'got' ]
and the answer should look like:
['got', 'which','had','got']
I haven't found a way to use re.finditer in such a way. Sadly im required to use finditer
rather that findall

You can build the pattern from your list of searched words, then build your output list with a list comprehension from the matches returned by finditer:
import re
sentence = "Tom once got a bike which he had left outside in the rain so it got rusty"
pattern = ['had', 'which', 'got' ]
regex = re.compile(r'\b(' + '|'.join(pattern) + r')\b')
# the regex will be r'\b(had|which|got)\b'
out = [m.group() for m in regex.finditer(sentence)]
print(out)
# ['got', 'which', 'had', 'got']

The idea is to combine the entries of the pattern list to form a regular expression with ors.
Then, you can use the following code fragment:
import re
sentence = 'Tom once got a bike which he had left outside in the rain so it got rusty. ' \
'Luckily, Margot and Chad saved money for him to buy a new one.'
pattern = ['had', 'which', 'got']
regex = re.compile(r'\b({})\b'.format('|'.join(pattern)))
# regex = re.compile(r'\b(had|which|got)\b')
results = [match.group(1) for match in regex.finditer(sentence)]
print(results)
The result is ['got', 'which', 'had', 'got'].

Replace a word in a String by indexing without "string replace function" -python

Is there a way to replace a word within a string without using a "string replace function," e.g., string.replace(string,word,replacement).
[out] = forecast('This snowy weather is so cold.','cold','awesome')
out => 'This snowy weather is so awesome.
Here the word cold is replaced with awesome.
This is from my MATLAB homework which I am trying to do in python. When doing this in MATLAB we were not allowed to us strrep().
In MATLAB, I can use strfind to find the index and work from there. However, I noticed that there is a big difference between lists and strings. Strings are immutable in python and will likely have to import some module to change it to a different data type so I can work with it like how I want to without using a string replace function.

just for fun :)
st = 'This snowy weather is so cold .'.split()
given_word = 'awesome'
for i, word in enumerate(st):
if word == 'cold':
st.pop(i)
st[i - 1] = given_word
break # break if we found first word
print(' '.join(st))

Here's another answer that might be closer to the solution you described using MATLAB:
st = 'This snow weather is so cold.'
given_word = 'awesome'
word_to_replace = 'cold'
n = len(word_to_replace)
index_of_word_to_replace = st.find(word_to_replace)
print st[:index_of_word_to_replace]+given_word+st[index_of_word_to_replace+n:]

You can convert your string into a list object, find the index of the word you want to replace and then replace the word.
sentence = "This snowy weather is so cold"
# Split the sentence into a list of the words
words = sentence.split(" ")
# Get the index of the word you want to replace
word_to_replace_index = words.index("cold")
# Replace the target word with the new word based on the index
words[word_to_replace_index] = "awesome"
# Generate a new sentence
new_sentence = ' '.join(words)

Using Regex and a list comprehension.
import re
def strReplace(sentence, toReplace, toReplaceWith):
return " ".join([re.sub(toReplace, toReplaceWith, i) if re.search(toReplace, i) else i for i in sentence.split()])
print(strReplace('This snowy weather is so cold.', 'cold', 'awesome'))
Output:
This snowy weather is so awesome.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Splitting strings in Python, but with spaces in substrings - python

I have a string that I want to split into a list of certain types. For example, I want to split Starter Main Course Dessert to [Starter, Main Course, Dessert] I cannot use split() because it will split up the Main Course type. How can I do the splitting? Is regex needed?

Related

regex match word and what comes after it

Comparing strings to a text to set punctuation marks in the right places

I'm trying to solve for the other Acronyms

Python - How to use re.finditer with multiple patterns

Replace a word in a String by indexing without "string replace function" -python

Categories

Resources