Python 3: Split string under certain condition


I have difficulty splitting a string into specific parts in Python 3.
The string is basically a list with a colon (:) as the delimiter.
Only when the colon is prefixed with a backslash (\) does it
not count as a delimiter; it is then part of the list item.
Example:
String --> I:would:like:to:find\:out:how:this\:works
Converted List --> ['I', 'would', 'like', 'to', 'find\:out', 'how', 'this\:works']
Any idea how this could work?
@Bertrand I was trying to give you some code and managed to figure out a workaround, but it is probably not the most beautiful solution:
text = r"I:would:like:to:find\:out:how:this\:works"  # raw string: "\:" stays a literal backslash + colon
values = text.split(":")
new = []
temp = None  # item still being built because its colon was escaped
for element in values:
    # the element ends with a backslash, so the colon after it was escaped:
    # start a pending item, or extend the one already pending
    if element.endswith("\\"):
        temp = element if temp is None else temp + ":" + element
    # the previous colon was escaped: finish the pending item and append it
    elif temp is not None:
        new.append(temp + ":" + element)
        temp = None
    # an ordinary element: append it unchanged
    else:
        new.append(element)
# a text ending in a backslash leaves an item pending
if temp is not None:
    new.append(temp)
print(new)
Output:
['I', 'would', 'like', 'to', 'find\\:out', 'how', 'this\\:works']

You should use re.split and perform a negative lookbehind to check for the backslash character.
import re
pattern = r'(?<!\\):'
s = r'I:would:like:to:find\:out:how:this\:works'  # raw string keeps "\:" literal
print(re.split(pattern, s))
Output:
['I', 'would', 'like', 'to', 'find\\:out', 'how', 'this\\:works']

You can replace "\:" with a placeholder (just make sure it is something that doesn't occur anywhere else in the string; a long, unusual token works), then split by ":" and replace the placeholder back.
[x.replace("$", "\\:") for x in str1.replace("\\:", "$").split(":")]
Explanation:
str1 = r'I:would:like:to:find\:out:how:this\:works'
Replace "\:" with "$" (or something else):
str1.replace("\\:", "$")
Out: 'I:would:like:to:find$out:how:this$works'
Now split by ":":
str1.replace("\\:", "$").split(":")
Out: ['I', 'would', 'like', 'to', 'find$out', 'how', 'this$works']
Then replace "$" with "\:" in every element:
[x.replace("$", "\\:") for x in str1.replace("\\:", "$").split(":")]
Out: ['I', 'would', 'like', 'to', 'find\\:out', 'how', 'this\\:works']
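Here is a minimal sketch of the placeholder approach as a reusable function. It assumes a NUL character ("\x00") never occurs in the input. The helper name split_unescaped is hypothetical. It raises an error rather than silently mis-splitting when the placeholder is already present:

```python
def split_unescaped(text, placeholder="\x00"):
    """Split text on ':' but keep '\\:' inside items (hypothetical helper)."""
    # The placeholder must not appear in the input, or it would be turned
    # into an escaped colon on the way back.
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    protected = text.replace("\\:", placeholder)  # hide the escaped colons
    parts = protected.split(":")                  # split on the remaining colons
    return [part.replace(placeholder, "\\:") for part in parts]  # restore them


print(split_unescaped(r"I:would:like:to:find\:out:how:this\:works"))
# ['I', 'would', 'like', 'to', 'find\\:out', 'how', 'this\\:works']
```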

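Use re.split with a lookbehind that splits only on colons preceded by a word character (the escaped colons are preceded by a backslash, which is not a word character).
Ex:
import re
s = r"I:would:like:to:find\:out:how:this\:works"
print(re.split(r"(?<=\w):", s))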
Output:
['I', 'would', 'like', 'to', 'find\\:out', 'how', 'this\\:works']
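The two lookbehinds are not equivalent. (?<=\w): splits only after a word character, whereas (?<!\\): splits after anything except a backslash. A small comparison on a made-up input with an empty field shows the difference:

```python
import re

s = "a::b"  # hypothetical input: the second field is empty

# split on every colon that is not preceded by a backslash
print(re.split(r"(?<!\\):", s))  # ['a', '', 'b']

# split only on colons preceded by a word character
print(re.split(r"(?<=\w):", s))  # ['a', ':b']
```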

Related

Remove a specific repeated word using python regex? [duplicate]

This question already has answers here:
Removing duplicates in lists
(56 answers)
Closed 1 year ago.
I have a string like:
'hi', 'what', 'are', 'are', 'what', 'hi'
I want to remove a specific repeated word. For example:
'hi', 'what', 'are', 'are', 'what'
Here I am removing only the repeated occurrence of hi and keeping the rest of the repeated words.
How can I do this using regex?

Regex is used for text search. You have structured data, so this is unnecessary.
def remove_all_but_first(iterable, removeword='hi'):
    remove = False
    for word in iterable:
        if word == removeword:
            if remove:
                continue
            else:
                remove = True
        yield word
Note that this returns a generator, not a list. Wrap the result in list() if you need a list.
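Here is a short usage sketch. It assumes the input arrives as the string shown in the question. ast.literal_eval turns that string into a real list, and list() materializes the generator:

```python
import ast


def remove_all_but_first(iterable, removeword='hi'):
    # yield the first occurrence of removeword and skip any later ones
    seen = False
    for word in iterable:
        if word == removeword:
            if seen:
                continue
            seen = True
        yield word


text = "['hi', 'what', 'are', 'are', 'what', 'hi']"
words = ast.literal_eval(text)  # parse the string into a list of str
print(list(remove_all_but_first(words)))  # ['hi', 'what', 'are', 'are', 'what']
```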

You can do this:
import re
s = "['hi', 'what', 'are', 'are', 'what', 'hi']"
# convert the string to a list: drop the first and last characters, then remove quotes and spaces
s = s[1:-1].replace("'", '').replace(' ', '').split(',')
remove = 'hi'
# store the index of the first occurrence so it can be re-inserted after removing all occurrences
firstIndex = s.index(remove)
# regex that removes all occurrences of the whole word (case-sensitive);
# \b keeps it from touching words that merely contain it, such as 'this'
regex = re.compile(r'\b' + re.escape(remove) + r'\b')
op = regex.sub("", '|'.join(s)).split('|')
# clean up the list by removing empty items
op = [item for item in op if item]
# re-insert the removed word at the index of its first occurrence
op.insert(firstIndex, remove)
print(op)

You don't need regex for that. Convert the string to a list, find the index of the first occurrence of the word, and filter the word out of the slice that follows it:
import ast
lst = "['hi', 'what', 'are', 'are', 'what', 'hi']"
lst = ast.literal_eval(lst)
word = 'hi'
index = lst.index(word) + 1
lst = lst[:index] + [x for x in lst[index:] if x != word]
print(lst)  # ['hi', 'what', 'are', 'are', 'what']

How Can I Remove Newline and Add All Words To a List

I have a .txt file that contains 4 lines (like a poem).
I want to add all of its words to one list.
For example, for a poem like this:
I am done with you,
Don't love me anymore
I want it like this: ['I', 'am', 'done', 'with', 'you', 'dont', 'love', 'me', 'anymore']
But I cannot get rid of the line break after the first line; my code gives me 2 separate lists.
romeo = open(r'd:\romeo.txt')
list = []
for line in romeo:
    line = line.rstrip()
    line = line.split()
    list = list + [line]
print(list)

with open(r'd:\romeo.txt', 'r') as msg:
    data = msg.read().replace("\n", " ")
    data = [x for x in data.split() if x.strip()]

Even shorter (split() with no arguments already splits on newlines as well as spaces):
with open(r'd:\romeo.txt', 'r') as msg:
    words = msg.read().split()
Or, removing the commas as well:
with open(r'd:\romeo.txt', 'r') as msg:
    words = msg.read().replace(',', ' ').split()

You can use a regular expression like this:
import re
poem = ''  # your poem
split = re.split(r'\040|\n', poem)
print(split)
In the regular expression, \040 (the octal escape for a space) matches a space and \n matches a newline.
The output is:
['I', 'am', 'done', 'with', 'you,', "Don't", 'love', 'me', 'anymore']
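Here is a sketch that avoids the regex and also strips the trailing comma. It assumes that punctuation at the edges of a word (string.punctuation) should be dropped. io.StringIO stands in for the opened file so the example runs without d:\romeo.txt. Note that it keeps "Don't" as written rather than lowercasing it:

```python
import io
import string

# stand-in for open(r'd:\romeo.txt'); any iterable of lines works the same way
romeo = io.StringIO("I am done with you,\nDon't love me anymore\n")

words = []
for line in romeo:
    # split() with no arguments splits on any whitespace, including '\n'
    for word in line.split():
        words.append(word.strip(string.punctuation))  # drop edge punctuation

print(words)
# ['I', 'am', 'done', 'with', 'you', "Don't", 'love', 'me', 'anymore']
```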

How to find all words in a string that begin with an uppercase letter, for multiple strings in a list

I have a list of strings, each about 10 sentences long. I want to find all words in each string that begin with a capital letter, preferably excluding the first word of each sentence. I am using re.findall to do this. When I manually set string = '...', I have no trouble doing this; however, when I use a for loop over the entries in my list, I get a different output.
for i in list_3:
    string = i
    test = re.findall(r"(\b[A-Z][a-z]*\b)", string)
print(test)
output:
['I', 'I', 'As', 'I', 'University', 'Illinois', 'It', 'To', 'It', 'I', 'One', 'Manu', 'I', 'I', 'Once', 'And', 'Through', 'I', 'I', 'Most', 'Its', 'The', 'I', 'That', 'I', 'I', 'I', 'I', 'I', 'I']
When I manually input the string value:
txt = 0
for i in list_3:
    string = list_3[txt]
    test = re.findall(r"(\b[A-Z][a-z]*\b)", string)
print(test)
output:
['Remember', 'The', 'Common', 'App', 'Do', 'Your', 'Often', 'We', 'Monica', 'Lannom', 'Co', 'Founder', 'Campus', 'Ventures', 'One', 'Break', 'Campus', 'Ventures', 'Universities', 'Undermatching', 'Stanford', 'Yale', 'Undermatching', 'What', 'A', 'Yale', 'Lannom', 'There', 'During', 'Some', 'The', 'Lannom', 'That', 'It', 'Lannom', 'Institutions', 'University', 'Chicago', 'Boston', 'College', 'These', 'Students', 'If', 'Lannom', 'Recruiting', 'Elite', 'Campus', 'Ventures', 'Understanding', 'Campus', 'Ventures', 'The', 'For', 'Lannom', 'What', 'I', 'Wish', 'I', 'Knew', 'Before', 'Starting', 'Company', 'I', 'Even', 'I', 'Lannom', 'The', 'There']
But I can't seem to write a for loop that correctly prints the output for each of the 5 items in the list. Any ideas?

The easiest way to do that is to write a for loop that checks whether the first letter of each element of the list is capitalized. If it is, the element is appended to the output list.
output = []
for i in list_3:
    if i[0] == i[0].upper():
        output.append(i)
print(output)
We can also use a list comprehension and do it in one line, again checking whether the first letter of each element is a capital letter:
output = [x for x in list_3 if x[0].upper() == x[0]]
print(output)
EDIT
You want each sentence to be an element of the list, so here is the solution. We iterate over list_3, then over every word by using the split() function, and check whether the word is capitalized. If it is, it is added to the output.
list_3 = ["Remember your college application process? The tedious Common App applications, hours upon hours of research, ACT/SAT, FAFSA, visiting schools, etc. Do you remember who helped you through this process? Your family and guidance counselors perhaps, maybe your peers or you may have received little to no help"]
output = []
for i in list_3:
    for j in i.split():
        if j[0].isupper():
            output.append(j)
print(output)

Assuming sentences are separated by one space, you could use re.findall with the following regular expression.
r'(?m)(?<!^)(?<![.?!] )[A-Z][A-Za-z]*'
Python's regex engine performs the following operations.
(?m) : set multiline mode so that ^ and $ match the beginning
and the end of a line
(?<!^) : negative lookbehind asserts current location is not
at the beginning of a line
(?<![.?!] ) : negative lookbehind asserts current location is not
preceded by '.', '?' or '!', followed by a space
[A-Z] : match an uppercase letter
[A-Za-z]* : match 0+ letters
If sentences can be separated by one or two spaces, insert a second negative lookbehind containing two spaces, (?<![.?!]  ), after (?<![.?!] ).
If the PyPI regex module were used, one could use the variable-length lookbehind (?<![.?!] +) instead.
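Here is a minimal sketch that applies that pattern to every string in a list and keeps one result list per string. The two sample strings are made up for illustration:

```python
import re

# words starting with a capital letter, except at the start of a line
# or directly after '.', '?' or '!' followed by one space
PATTERN = re.compile(r'(?m)(?<!^)(?<![.?!] )[A-Z][A-Za-z]*')

list_3 = [
    "My House is small. Does Annie like Cats?",
    "Hello World",
]

results = [PATTERN.findall(item) for item in list_3]
print(results)  # [['House', 'Annie', 'Cats'], ['World']]
```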

As I understand it, you have a list like this:
list_3 = [
'First sentence. Another Sentence',
'And yet one another. Sentence',
]
You are iterating over the list, but every iteration overwrites the test variable, so you only see the result for the last item. You either have to accumulate the results in an additional variable or print them right away, in every iteration (regexp stands for whichever pattern you use):
acc = []
for item in list_3:
    acc.extend(re.findall(regexp, item))
print(acc)
or
for item in list_3:
    print(re.findall(regexp, item))
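As for a regexp that ignores the first word of each sentence, you can use:
re.findall(r'(?<!\A)(?<!\.)\s+[A-Z]\w+', s)
(?<!\A) - not at the beginning of the string
(?<!\.) - not the first word after a dot
\s+ - one or more whitespace characters before the word
Note that [A-Z]\w+ needs at least two characters, so single-letter words such as I are not matched, and only a dot is treated as the end of a sentence.
You'll receive words potentially prefixed by whitespace, so here's a final example: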
acc = []
for item in list_3:
    words = [w.strip() for w in re.findall(r'(?<!\A)(?<!\.)\s+[A-Z]\w+', item)]
    acc.extend(words)
print(acc)

As I really like regexes, try this one:
#!/usr/bin/env python3
import re
PATTERN = re.compile(r'[A-Z][A-Za-z0-9]*')
all_sentences = [
    "My House! is small",
    "Does Annie like Cats???"
]
def flat_list(sentences):
    for sentence in sentences:
        yield from PATTERN.findall(sentence)
upper_words = list(flat_list(all_sentences))
print(upper_words)
# Result: ['My', 'House', 'Does', 'Annie', 'Cats']

Turn a list, that is in another list, into a string, then reverse the string

I'm new to programming in Python (and programming in general) and we were asked to develop a function to encrypt a string by rearranging the text. We were given this as a test:
encrypt('THE PRICE OF FREEDOM IS ETERNAL VIGILENCE', 5)
'SI MODEERF FO ECIRP EHT ECNELIGIV LANRETE'
We have to make sure it works for any string of any length though. I got as far as this before getting stuck:
##Define encrypt
def encrypt(text, encrypt_value):
    ##Split string into list
    text_list = text.split()
    ##group text_list according to encrypt_value
    split_list = [text_list[index:index+encrypt_value]
                  for index in range(0, len(text_list), encrypt_value)]
If I printed the result now, this would give me:
encrypt("I got a jar of dirt and you don't HA", 3)
[['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
So I need to combine each of the lists in the list into a string (which I think is ' '.join(text)?), reverse it with [::-1], before joining the whole thing together into one string. But how in the world do I do that?
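The question already lays out the plan: join each group with ' '.join, reverse it with [::-1], then join the groups. Here is a minimal sketch of the complete function built on that plan. It uses range, so it runs on Python 3:

```python
def encrypt(text, encrypt_value):
    words = text.split()
    # group the words into chunks of encrypt_value words each
    groups = [words[i:i + encrypt_value] for i in range(0, len(words), encrypt_value)]
    # join each chunk into a string, reverse that string, then join the chunks
    return ' '.join(' '.join(group)[::-1] for group in groups)


print(encrypt('THE PRICE OF FREEDOM IS ETERNAL VIGILENCE', 5))
# SI MODEERF FO ECIRP EHT ECNELIGIV LANRETE
```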

To combine your elements, you can try using reduce (in Python 3 it has to be imported from functools):
from functools import reduce
l = [['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
result = reduce(lambda prev, cur: prev + ' ' + reduce(lambda subprev, word: subprev + ' ' + word, cur, ''), l, '')
It will result in:
"  I got a  jar of dirt  and you don't  HA"
If you want to remove the extra spaces:
result.replace('  ', ' ').strip()
This use of reduce can easily be modified to reverse each sublist right before combining its elements:
result = reduce(lambda prev, cur: prev + ' ' + reduce(lambda subprev, word: subprev + ' ' + word, cur[::-1], ''), l, '')
Or to reverse the combined substrings just before joining them all together:
result = reduce(lambda prev, cur: prev + ' ' + reduce(lambda subprev, word: subprev + ' ' + word, cur, '')[::-1], l, '')

You can do what you're looking for fairly simply with a few nested list comprehensions.
For example, you already have
split_list = [['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
What you want now is to reverse each triplet of words with a list comprehension, e.g. like so:
reversed_sublists = [sublist[::-1] for sublist in split_list]
# [['a', 'got', 'I'], ['dirt', 'of', 'jar'], ["don't", 'you', 'and'], ['HA']]
Then reverse each string in each reversed sublist:
reversed_strings = [[substr[::-1] for substr in sublist] for sublist in reversed_sublists]
# [['a', 'tog', 'I'], ['trid', 'fo', 'raj'], ["t'nod", 'uoy', 'dna'], ['AH']]
And then join them all up, as you said, with ' '.join(), e.g.
' '.join([' '.join(sublist) for sublist in reversed_strings])
# "a tog I trid fo raj t'nod uoy dna AH"
But nothing says you can't do all of those things at the same time with some nesting:
' '.join([' '.join([substring[::-1] for substring in sublist[::-1]]) for sublist in split_list])
# "a tog I trid fo raj t'nod uoy dna AH"
I personally prefer the aesthetic of this (and the fact you don't need to go back to strip spaces), but I'm not sure whether it performs better than Pablo's solution.

b = [['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
print("".join([j[::-1] + ' ' for i in b for j in reversed(i)]))
Output (with a trailing space):
a tog I trid fo raj t'nod uoy dna AH
Is this what you wanted?

Is there any reason you are trying to do it in one list comprehension?
It's probably easier to conceptualize (and implement) by breaking it down into parts:
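def encrypt(text, encrypt_value):
    reversed_words = [w[::-1] for w in text.split()]
    rearranged_words = reversed_words[encrypt_value:] + reversed_words[:encrypt_value]
    return ' '.join(rearranged_words[::-1])
Example output:
In [6]: encrypt('THE PRICE OF FREEDOM IS ETERNAL VIGILENCE', 5)
Out[6]: 'SI MODEERF FO ECIRP EHT ECNELIGIV LANRETE'
Note that this rotation gives the same result as reversing each group only when the text has at most two groups of encrypt_value words. With three or more groups, the groups after the first come out in reverse order.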

python: how to replace a string in a list of string with a list of strings?

Okay, here is the example:
data = ['This', 'is', 'a', 'test', 'of', 'the', 'list']
replaceText = 'test'
replaceData = ['new', 'test']
I tried data.replace(replaceText, replaceData), but it doesn't work. How can I replace a string in a list of strings with a list of strings? Any help will be appreciated.
Edit:
The exact condition is to replace or split the words that contain "s", so I put it in a loop. The end result should be:
data = ['Thi', 'i', 'a', 'te', 't', 'of', 'the', 'li', 't']

In a list, find the position of the text with .index(), then replace it using slice assignment:
pos = data.index(replaceText)
data[pos:pos+1] = replaceData
This replaces only one occurrence of replaceText at a time. Demo:
>>> data = ['This', 'is', 'a', 'test', 'of', 'the', 'list']
>>> replaceText = 'test'
>>> replaceData = ['new', 'test']
>>> pos = data.index(replaceText)
>>> data[pos:pos+1] = replaceData
>>> data
['This', 'is', 'a', 'new', 'test', 'of', 'the', 'list']
To replace all occurrences, advance pos by the length of replaceData after each replacement. The next search then starts after the inserted items; otherwise the 'test' inside replaceData would be found again and again:
pos = 0
while True:
    try:
        pos = data.index(replaceText, pos)
    except ValueError:
        break
    data[pos:pos+1] = replaceData
    pos += len(replaceData)
If you need to loop over data while modifying it, iterate over a copy instead:
for n in data[:]:
    # manipulate data here
    ...
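The same loop can be wrapped in a small function (the name replace_all is hypothetical) so it can be reused and tested. It modifies the list in place and also returns it for convenience. The sample data has a second 'test' appended purely for illustration:

```python
def replace_all(data, old, new_items):
    """Replace every item equal to old with the items in new_items, in place."""
    pos = 0
    while True:
        try:
            # search only from pos onward, i.e. after earlier replacements
            pos = data.index(old, pos)
        except ValueError:
            break  # no further occurrences
        data[pos:pos + 1] = new_items  # slice assignment can grow the list
        pos += len(new_items)  # skip past the inserted items
    return data


data = ['This', 'is', 'a', 'test', 'of', 'the', 'list', 'test']
print(replace_all(data, 'test', ['new', 'test']))
# ['This', 'is', 'a', 'new', 'test', 'of', 'the', 'list', 'new', 'test']
```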

You can use the list's index() method to find the position p of replaceText:
p = data.index(replaceText)
and then use the construct
data[start:end] = another_list
to replace the elements from p to p+1 (end is not inclusive) with replaceData:
data[p:p+1] = replaceData
Note that index() raises ValueError if replaceText does not exist in data:
try:
    p = data.index(replaceText)
    data[p:p+1] = replaceData
except ValueError:
    # replaceText is not present in data; handle appropriately
    pass

Yeah, the actual condition needs me to replace or split any string
that contains the character 's'; say, 'test' will be replaced by 'te'
and 't' in the list
In Python 3, filter() returns an iterator, so wrap it in list():
>>> from itertools import chain
>>> data = ['This', 'is', 'a', 'test', 'of', 'the', 'list']
>>> list(filter(None, chain.from_iterable(el.split('s') for el in data)))
['Thi', 'i', 'a', 'te', 't', 'of', 'the', 'li', 't']
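The same result can also be written as a nested list comprehension without itertools. This is an equivalent alternative, not a different technique: each word is split on 's' and the empty pieces are dropped.

```python
data = ['This', 'is', 'a', 'test', 'of', 'the', 'list']

# split every word on 's' and keep only the non-empty pieces
result = [piece for word in data for piece in word.split('s') if piece]
print(result)  # ['Thi', 'i', 'a', 'te', 't', 'of', 'the', 'li', 't']
```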

Develop Reference
