Python Split Strings While Preserving Order?


I have a list of strings in Python, and I need to split some of them while preserving the order of the list.
A string should be split if, after the first occurrence of :, there is at least one character that is not whitespace (space, newline, or tab).
For example, this must be split:
'example: Test' to ['example:', 'Test']
while these stay the same: 'example: ' and 'IGNORE_ME_EXAMPLE'
Given an input like this:
['example: Test', 'example: ', 'IGNORE_ME_EXAMPLE']
I'm expecting:
['example:', 'Test', 'example: ', 'IGNORE_ME_EXAMPLE']
Note that the two parts of a split string stay next to each other and follow the original order.
Also, once I split a string, I don't want to check the resulting parts again. In other words, I don't want to check 'Test' after I split it off.
To make it clearer, given an input like this:
['example: Test::YES']
I'm expecting:
['example:', 'Test::YES']

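You can use regular expressions for that:
import re
lines = ['example: Test', 'example: ', 'IGNORE_ME_EXAMPLE']
# Group 1: everything up to and including the first ':' (non-greedy).
# Group 2: the rest, starting at the first non-whitespace character.
pattern = re.compile(r"(.*?:)\s*(\S.*)", re.DOTALL)
result = []
for line in lines:
    match = pattern.match(line)
    if match:
        result.append(match.group(1))
        result.append(match.group(2))
    else:
        result.append(line)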

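You can use a nested list comprehension over the input list:
l = ['example: Test::YES']
# Split each string once at the first ':' and drop parts that are empty after stripping.
l1 = [j.strip() for i in l for j in i.split(":", 1) if j.strip() != '']
print(l1)
Output:
['example', 'Test::YES']
Note that this drops the ':' from the first part, and a string such as 'example: ' comes out as 'example' rather than unchanged.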

You need to iterate over your list of words and, for each word, check whether ':' is present. If it is, split the word into two parts: the part up to and including the first ':' and the part after it. Append both parts to the final list, provided the second part contains a non-whitespace character. Otherwise, add the word to the result list unchanged and skip the remaining steps for that word.
wordlist = ['example: Test', 'example: ', 'IGNORE_ME_EXAMPLE']
result = []
for word in wordlist:
    if ':' not in word:
        result.append(word)
        continue
    index = word.index(':')  # position of the first ':'
    part1, part2 = word[:index+1], word[index+1:]
    if part2.strip():  # something other than whitespace follows the ':'
        result.append(part1)
        result.append(part2.lstrip())
    else:
        result.append(word)  # nothing to split off, keep the word as is
print(result)
Output:
['example:', 'Test', 'example: ', 'IGNORE_ME_EXAMPLE']
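The per-string check described in the question can also be written with str.partition, which splits at the first ':' only. This is a minimal sketch, not a definitive implementation: the function name split_after_first_colon is hypothetical, and it assumes that leading whitespace of the second part should be dropped, as in the expected output.

```python
def split_after_first_colon(strings):
    """Split each string after its first ':' when non-whitespace text follows."""
    result = []
    for s in strings:
        # partition returns (before, separator, after); separator is '' if ':' is absent
        head, sep, tail = s.partition(":")
        if sep and tail.strip():
            result.append(head + sep)     # keep the ':' on the first part
            result.append(tail.lstrip())  # assumption: drop leading whitespace
        else:
            result.append(s)              # no ':' or only whitespace after it
    return result


print(split_after_first_colon(['example: Test', 'example: ', 'IGNORE_ME_EXAMPLE']))
print(split_after_first_colon(['example: Test::YES']))
```

Because each input string is visited once and its parts are appended directly, the split parts are never checked again and the original order is kept.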

Related

Python - Trying to replace words in a list of strings but having problems with single letter words

I have a list of strings such as
words = ['Twinkle Twinkle', 'How I wonder']
I am trying to create a function that finds and replaces words in the original list. It works, except when the user inputs single-letter words such as 'I' or 'a'.
Current function:
def sub(old: str, new: str, words: list):
    words[:] = [w.replace(old, new) for w in words]
With old = 'I' and new = 'ASD':
Current output: ['TwASDnkle TwASDnkle', 'How ASD wonder']
Intended output: ['Twinkle Twinkle', 'How ASD wonder']
This is my first post here and I have only been learning Python for a few months, so I would appreciate any help. Thank you.

str.replace is not the right tool here: it replaces every occurrence of the substring, not only whole words.
Instead, split each string into words, replace the words that match, and join them again:
l = ['Twinkle Twinkle', 'How I wonder']
def sub(old: str, new: str, words: list):
    words[:] = [' '.join(new if w == old else w for w in x.split()) for x in words]
sub('I', 'ASD', l)
Output: ['Twinkle Twinkle', 'How ASD wonder']
Or use a regex with word boundaries:
import re
def sub(old, new, words):
    words[:] = [re.sub(fr'\b{re.escape(old)}\b', new, w) for w in words]
l = ['Twinkle Twinkle', 'How I wonder']
sub('I', 'ASD', l)
# ['Twinkle Twinkle', 'How ASD wonder']
NB. As @re-za pointed out, it might be better practice to return a new list rather than mutate the input. Just be aware of it.
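For the note about returning a new list, here is a minimal sketch of a non-mutating variant. The name sub_words is hypothetical, and passing a function as the replacement is my own choice so that new is inserted literally, without backslash processing by re.sub.

```python
import re


def sub_words(old, new, words):
    """Return a new list with whole-word occurrences of `old` replaced by `new`."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    # A function replacement inserts `new` literally (no backslash-escape handling).
    return [pattern.sub(lambda m: new, w) for w in words]


words = ['Twinkle Twinkle', 'How I wonder']
replaced = sub_words('I', 'ASD', words)
print(replaced)  # ['Twinkle Twinkle', 'How ASD wonder']
print(words)     # ['Twinkle Twinkle', 'How I wonder'] (input left untouched)
```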

It seems like you are replacing letters and not words. I recommend splitting each sentence (string) into words at the ' ' (space) character.
First, create a list to hold the results:
output = []
Then take each string from the list like this:
for string in words:
Next, split the string into a list of words like this:
    temp_string = ''  # a temp string we will use later to reconstruct the sentence
    for word in string.split(' '):
Then check whether the word is the one we are looking for by comparing it to old, and replace it with new if it matches:
        if word == old:
            temp_string += new + ' '
        else:
            temp_string += word + ' '
Now that each word has been copied or replaced into temp_string, we can append each temp_string to the output list like this:
    output.append(temp_string[:-1])  # [:-1] means we omit the space at the end
It should finally look like this:
def sub(old: str, new: str, words: list):
    output = []
    for string in words:
        temp_string = ''  # a temp string we will use later to reconstruct the sentence
        for word in string.split(' '):
            if word == old:
                temp_string += new + ' '
            else:
                temp_string += word + ' '
        output.append(temp_string[:-1])  # [:-1] means we omit the space at the end
    return output

How to compare reverse strings in list of strings with the original list of strings in python?

Given an input string, check whether any word in it has its reverse present in the same string. If so, print that word; otherwise print $.
I split the string into a list of words and then reversed each word. After that, I couldn't figure out how to compare the two lists.
str = input()
x = str.split()
for i in x: # printing i shows the words in the list
    str1 = i[::-1] # printing str1 shows the reverse of each word
    # now how to check if any word of the new list matches to any word of the old list
    if(i==str):
        print(i)
        break
    else:
        print('$')
Input: suman is a si boy.
Output: is ( since reverse of 'is' is present in the same string)

You almost have it, just need to add another loop to compare each word against each inverted word. Try using the following
str = input()
x = str.split()
for i in x:
    str1 = i[::-1]
    for j in x: # <-- this is the new nested loop you are missing
        if j == str1: # compare each inverted word against each regular word
            if len(str1) > 1: # Potential condition if you would like to not include single letter words
                print(i)
Update
To only print the first occurrence of a match, you could, in the second loop, only check the elements that come after. We can do this by keeping track of the index:
str = input()
x = str.split()
for index, i in enumerate(x):
    str1 = i[::-1]
    for j in x[index+1:]: # <-- only consider words that are ahead
        if j == str1:
            if len(str1) > 1:
                print(i)
Note that I used index+1 in order to not consider single word palindromes a match.

a = 'suman is a si boy'
# Construct the list of words
words = a.split(' ')
# Construct the list of reversed words
reversed_words = [word[::-1] for word in words]
# Get an intersection of these lists converted to sets
print(set(words) & set(reversed_words))
will print:
{'si', 'is', 'a'}

Another way to do this is just in a list comprehension:
string = 'suman is a si boy'
output = [x for x in string.split() if x[::-1] in string.split()]
print(output)
The split on string creates a list split on spaces. Then the word is included only if the reverse is in the string.
Output is:
['is', 'a', 'si']
One note: you have a variable named str. It is best not to do that, because str is a built-in Python type and shadowing it could cause other issues in your code later on.
If you want only words more than one letter long, you can do:
string = 'suman is a si boy'
output = [x for x in string.split() if x[::-1] in string.split() and len(x) > 1]
print(output)
this gives:
['is', 'si']
Final Answer...
And for the final thought, in order to get just the 'is':
string = 'suman is a si boy'
seen = []
output = [x for x in string.split() if x[::-1] not in seen and not seen.append(x) and x[::-1] in string.split() and len(x) > 1]
print(output)
output is:
['is']
BUT, this is not necessarily a good way to do it, I don't believe. Basically you are storing information in seen during the list comprehension AND referencing that same list. :)

This approach does not report 'a', and it reports the pair 'is'/'si' only once.
str = input() #get input string
x = str.split() #returns list of words
y = [] #list of matched words
while len(x) > 0 :
    a = x.pop(0) #removes first item from list and returns it, then assigns it to a
    if a[::-1] in x: #checks if the reversed word is in the list of words
        #the list doesn't contain that word anymore so 'a' that doesn't show twice wouldn't be returned
        #and 'is' that is present with 'si' will be evaluated once
        y.append(a)
print(y) # ['is']
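To match the question's requirement of printing one word or $, the idea of only looking at later words can be wrapped in a small function. This is a sketch under assumptions: the name first_reversed_match is hypothetical, and single-letter words are ignored, as in the answers above.

```python
def first_reversed_match(text):
    """Return the first word whose reverse appears later in the text, else '$'."""
    words = text.split()
    for index, word in enumerate(words):
        # Only look at words after the current one, and skip single letters.
        if len(word) > 1 and word[::-1] in words[index + 1:]:
            return word
    return "$"


print(first_reversed_match("suman is a si boy"))  # is
print(first_reversed_match("no match here"))      # $
```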

Get substring between strings from a python list

How can I get the content between the strings &quot and autoRefresh, which here is /commander/link/jobDetails/jobs/a2537f238-8622-11ee-a1a0-f0921c14c828?, from a list like the one below? I just need the first match (there could be multiple matches).
['something', 'something', ' something top.window.location.href = "/commander/link/jobDetails/jobs/a2537f238-8622-11ee-a1a0-f0921c14c828?autoRefresh=0&s=Jobs";">','something']
I tried:
link = re.search('"(.*?)autoRefresh', big_list)
print link.group(1)
and got TypeError: expected string or buffer

You need to iterate over the list, checking each string:
import re
big_list = ['something', 'something', ' something top.window.location.href = "/commander/link/jobDetails/jobs/a2537f238-8622-11ee-a1a0-f0921c14c828?autoRefresh=0&s=Jobs";">','something']
def get_all_subs(lst, pat, grp=0):
    patt = re.compile(pat)
    for s in lst:
        m = patt.search(s)  # search the whole string; grp only selects the group to return
        if m:
            yield m.group(grp)
print(list(get_all_subs(big_list, '"(.*?)autoRefresh', 1)))
Or call str.join on the list and use findall:
print(re.findall('"(.*?)autoRefresh', "".join(big_list)))

You may use the following:
re.search(r'(?<=&quot).*?(?=autoRefresh)', ''.join(YourList))
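Since the question asks only for the first match, a generator combined with next() can stop at the first string that matches. This is a minimal sketch: the function name first_between and its start/end parameters are hypothetical, and it assumes the delimiter in the data is a literal double quote, as in the list shown in the question.

```python
import re


def first_between(strings, start='"', end="autoRefresh"):
    """Return the text between `start` and `end` in the first matching string, or None."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end))
    # The generator is lazy, so searching stops at the first string that matches.
    matches = (m.group(1) for m in map(pattern.search, strings) if m)
    return next(matches, None)


big_list = ['something', 'something', ' something top.window.location.href = "/commander/link/jobDetails/jobs/a2537f238-8622-11ee-a1a0-f0921c14c828?autoRefresh=0&s=Jobs";">', 'something']
print(first_between(big_list))
```

Passing the list itself to re.search is what raised the TypeError in the question; here the pattern is applied to one string at a time.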

Iterating through a string word by word

I wanted to know how to iterate through a string word by word.
string = "this is a string"
for word in string:
    print (word)
The above gives an output:
t
h
i
s
i
s
a
s
t
r
i
n
g
But I am looking for the following output:
this
is
a
string

When you do:
for word in string:
you are not iterating through the words in the string; you are iterating through its characters. To iterate through the words, you first need to split the string into words using str.split(), and then iterate over the result. For example:
my_string = "this is a string"
for word in my_string.split():
    print(word)
Please note that str.split(), when called without arguments, splits on any run of whitespace (spaces, tabs, newlines, etc.).
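The difference between calling split() with no argument and with an explicit space can be seen in a short example. The sample text is made up for illustration.

```python
text = "this  is\ta string\n"

# No argument: any run of whitespace separates words, and empty strings are dropped.
print(text.split())     # ['this', 'is', 'a', 'string']

# Explicit ' ': only single spaces separate, so tabs and newlines stay in the words
# and consecutive spaces produce empty strings.
print(text.split(" "))  # ['this', '', 'is\ta', 'string\n']
```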

This is one way to do it:
string = "this is a string"
ssplit = string.split()
for word in ssplit:
    print (word)
Output:
this
is
a
string

for word in string.split():
    print(word)

Using nltk.
from nltk.tokenize import sent_tokenize, word_tokenize
sentences = sent_tokenize("This is a string.")
words_in_each_sentence = [word_tokenize(sentence) for sentence in sentences]  # word_tokenize expects a string, not a list
You may use TweetTokenizer for parsing casual text with emoticons and such.

One way to do this is using a dictionary. The problem with the code above is that it iterates over each letter in the string instead of each word. To solve this, first turn the string into a list of words with the split() method, then loop over that list. The code below counts how many times each word appears in the string and stores the counts in a dictionary.
s = input('Enter a string to see if strings are repeated: ')
d = dict()
p = s.split()
for word in p:
    if word not in d:
        d[word] = 1
    else:
        d[word] += 1
print (d)
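The same word count can be sketched with collections.Counter from the standard library, which does the dictionary bookkeeping itself. The sample sentence is made up for illustration.

```python
from collections import Counter

text = "the cat and the dog and the bird"
counts = Counter(text.split())  # maps each word to the number of times it appears

print(counts["the"])          # 3
print(counts["and"])          # 2
print(counts.most_common(1))  # [('the', 3)]
```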

s = 'hi how are you'
l = s.split()  # split() already returns a list, so no map() is needed
print(l)
Output: ['hi', 'how', 'are', 'you']

You can try this method also:
sentence_1 = "This is a string"
words = sentence_1.split()  # avoid naming a variable list, which shadows the built-in
for i in words:
    print (i)

Extracting multiple substring from a string

I have a complicated string and would like to extract multiple substrings from it.
The string consists of a set of items separated by commas. Each item has an identifier (id-n) followed by a pair of words enclosed in brackets. I want to get only the word inside the brackets that has a number attached to its end (e.g. 'This-1'). The number indicates the position the word should take when the words are arranged after extraction.
#Example of how the individual items would look like
id1(attr1, is-2) #The number 2 here indicates word 'is' should be in position 2
id2(attr2, This-1) #The number 1 here indicates word 'This' should be in position 1
id3(attr3, an-3) #The number 3 here indicates word 'an' should be in position 3
id4(attr4, example-4) #The number 4 here indicates word 'example' should be in position 4
id5(attr5, example-4) #This is a duplicate of the word 'example'
#Example of string - this is how the string with the items looks like
string = "id1(attr1, is-1), id2(attr2, This-2), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
#This is how the result should look after extraction
result = 'This is an example'
Is there an easier way to do this? Regex doesn't work for me.

A trivial/naive approach:
>>> from collections import defaultdict
>>> s = "id1(attr1, is-1), id2(attr2, This-2), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
>>> z = [x.split(',')[1].strip().strip(')') for x in s.split('),')]
>>> d = defaultdict(list)
>>> for i in z:
...     b = i.split('-')
...     d[b[1]].append(b[0])
...
>>> ' '.join(' '.join(d[t]) for t in sorted(d.keys(), key=int))
'is This an example example'
You have duplicated positions for example in your sample string, which is why example is repeated in the output.
However, your sample string does not match your requirements either; this result follows your description, with the words arranged according to their position indicators.
Now, if you want to get rid of duplicates:
>>> ' '.join(e for t in sorted(d.keys(), key=int) for e in set(d[t]))
'is This an example'

Why not regex? This works.
In [43]: import re
In [44]: s = "id1(attr1, is-2), id2(attr2, This-1), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
In [45]: z = [(m.group(2), m.group(1)) for m in re.finditer(r'(\w+)-(\d+)\)', s)]
In [46]: [x for y, x in sorted(set(z))]
Out[46]: ['This', 'is', 'an', 'example']
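One caveat with sorting the positions as strings is that '10' sorts before '2'. A minimal sketch that converts the positions to integers first is shown here. The function name extract_ordered_words is hypothetical, and keeping the first word seen for a repeated position is my own assumption about how duplicates should be handled.

```python
import re


def extract_ordered_words(s):
    """Collect word-N pairs that end an item and join the words by numeric position."""
    by_position = {}
    for word, pos in re.findall(r"(\w+)-(\d+)\)", s):
        # Assumption: a repeated position keeps the first word seen.
        by_position.setdefault(int(pos), word)
    return " ".join(by_position[p] for p in sorted(by_position))


s = "id1(attr1, is-2), id2(attr2, This-1), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
print(extract_ordered_words(s))  # This is an example
```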

OK, how about this:
sample = ("id1(attr1, is-2), id2(attr2, This-1), "
          "id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)")
def make_cryssie_happy(s):
    words = {}  # we will use this dict later
    ll = s.split(',')[1::2]
    # we only want items like This-1, an-3, etc.
    for item in ll:
        tt = item.replace(')', '').lstrip()
        (word, pos) = tt.split('-')
        words[int(pos)] = word
        # there can only be one word at a particular position
        # using a dict with the numbers as position keys
        # is an alternative to using sets
    res = [words[i] for i in sorted(words)]
    # sort the keys numerically, dicts are not ordered by key!
    # create a list of the values of the dict in sorted order
    return ' '.join(res)
    # return a nice string
print(make_cryssie_happy(sample))
