Getting substring between nth and mth occurrence of some sequence


I want to search a string that I know contains several occurrences of a particular character sequence and retrieve what lies between two specific occurrences, identified by number, or preferably by number counted from the end. I also want this to be as compact as possible, since it goes inside a list comprehension.
Let's say I have the following string:
s = "foofoo\tBergen, Norway\tAffluent\tDonkey"
I want to retrieve the substring of s that lies between the penultimate and the last occurrence of "\t".
So in this very example: "Affluent"
Here is the comprehension I am currently using (without having pruned the string):
data = [(entries[i], entries[i+1]) for i in range(0, len(entries), 3)]
For every entry in data, it is the string entries[i] that I want to prune.

rsplit splits the string starting from the right side:
a = "foofoo\tBergen, Norway\tAffluent\tDonkey"
word = a.rsplit('\t', 2)
if len(word) > 2:
    print(word[-2])
# output: Affluent

Assuming that the beginning of your string is treated as the 0th occurrence of the delimiter symbol:
def concat_strings(strs):
    result = ""
    for substr in strs:
        result = result + substr
    return result
def find_section(s, delim, n, m):
    tokens = s.split(delim)
    return concat_strings(tokens[n:m])

You could split the string by your character sequence and join the desired pieces back together, using the same character sequence as the joining string.
Update: for the example cited:
"\t".join(s.split("\t")[-2:-1])

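The split-and-slice idea from the answers above can be wrapped in a small helper that counts occurrences from the end. This is only a sketch: the name between_from_end and its 1-based counting from the right (1 = last occurrence, 2 = penultimate) are assumptions chosen for illustration, not something from the question.

```python
def between_from_end(s, delim, n, m):
    """Return the text between the m-th last and the n-th last occurrence of delim.

    Counting is 1-based from the right (a hypothetical convention): n=1 is
    the last occurrence, m=2 the penultimate one. Requires 1 <= n < m.
    """
    if not 1 <= n < m:
        raise ValueError("expected 1 <= n < m")
    # Split at most m times from the right; the leftmost piece keeps the rest.
    parts = s.rsplit(delim, m)
    if len(parts) <= m:
        raise ValueError("delim occurs fewer than m times")
    # The n-th last delimiter sits just before parts[-n], so this slice
    # covers everything between the two chosen occurrences.
    return delim.join(parts[-m:-n])


s = "foofoo\tBergen, Norway\tAffluent\tDonkey"
print(between_from_end(s, "\t", 1, 2))  # Affluent
```

Inside the comprehension from the question, entries[i] would then become between_from_end(entries[i], "\t", 1, 2).
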
Related

Swap last two characters in a string, make it lowercase, and add a space

I'm trying to take the last two letters of a string, swap them, make them lowercase, and put a space between them. For some reason the output has whitespace before the word.
For example, if the input is APPLE, the output should be e l
It would also be nice to ignore non-letter characters, so if the word is App3e the output should be e p
def last_Letters(word):
    last_two = word[-2:]
    swap = last_two[-1:] + last_two[:1]
    for i in swap:
        if i.isupper():
            swap = swap.lower()
    return swap[0]+ " " +swap[1]
word = input(" ")
print(last_Letters(word))

You can try with the following function:
import re
def last_Letters(word):
    letters = re.sub(r'\d', '', word)
    if len(letters) > 1:
        return letters[-1].lower() + ' ' + letters[-2].lower()
    return None
It follows these steps:
removes all the digits
if there are at least two characters left:
    lowercases the last two characters
    builds the required string by concatenating the last letter, a space and the second-to-last letter
    and returns the string
otherwise returns None

Since I said there was a simpler way, here's what I would write:
text = input()
result = ' '.join(reversed([ch.lower() for ch in text if ch.isalpha()][-2:]))
print(result)
How this works:
[ch.lower() for ch in text] creates a list of lowercase characters from some iterable text
adding if ch.isalpha() filters out anything that isn't an alphabetical character
adding [-2:] selects the last two from the preceding sequence
and reversed() takes the sequence and returns an iterable with the elements in reverse
' '.join(some_iterable) will join the characters in the iterable together with spaces in between.
So, result is set to be the last two characters of all of the alphabetical characters in text, in reverse order, separated by a space.
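To make the individual steps easier to follow, here is the same one-liner unrolled into a function with one statement per step. The function name swap_last_two is made up for this sketch; the logic is meant to match the expression above.

```python
def swap_last_two(text):
    # Step 1: keep only alphabetical characters, lowercased.
    letters = [ch.lower() for ch in text if ch.isalpha()]
    # Step 2: take the last two of them (fewer if the input is short).
    last_two = letters[-2:]
    # Step 3: reverse their order and join them with a space.
    return ' '.join(reversed(last_two))


print(swap_last_two("APPLE"))  # e l
print(swap_last_two("App3e"))  # e p
```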
Part of what makes Python so powerful and popular is that, once you learn to read the syntax, the code tells you quite naturally what it is doing. If you read out the statement, it is self-describing.

I want to replace letters/words, but I am facing challenges in one aspect of my code

I will be using lloll as an example word.
Here's my code:
mapping = {'ll':'o','o':'ll'}
string = 'lloll'
out = ' '.join(mapping.get(s,s) for s in string.split())
print(out)
The output should be ollo, but I get lloll. When I write ll o ll, it works, but I don't want spaces in between the ll and the o, and I don't want to do something like mapping = {'lloll':'ollo'}.

Not sure if this accounts for all edge cases (what about overlapping matches?), but my current idea is to re.split the string by the mapping's keys and then apply the mapping.
import re
mapping = {'ll':'o','o':'ll'}
string = 'lloll'
choices = f'({"|".join(mapping)})'
result = ''.join(mapping.get(s, s) for s in re.split(choices, string))
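A possible variation on the same idea, sketched with re.sub instead of re.split: every match is replaced in a single pass, so replaced text is never scanned again. The helper name replace_all is hypothetical, and escaping the keys and trying longer keys first are my own additions to cope with special characters and with keys that are prefixes of each other.

```python
import re


def replace_all(text, mapping):
    # Nothing to replace; also avoids building an empty pattern.
    if not mapping:
        return text
    # Longest keys first, so 'll' is tried before 'l' when both are keys.
    # Keys are assumed to be non-empty strings.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # Each match is looked up in the mapping and replaced exactly once.
    return pattern.sub(lambda match: mapping[match.group(0)], text)


print(replace_all('lloll', {'ll': 'o', 'o': 'll'}))  # ollo
```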

Using Template.substitute
I would find each occurrence of the substrings to be replaced and prefix it with a $. Then use substitute(mapping) from string.Template to effectively do a global find and replace on them all.
Most importantly in this approach, you can control the way that potentially overlapping mappings are handled by using sorted() or reversed() with a key to sort the order in which they are applied.
findall then split_string
This way you also get a couple of extra generator functions: findall, which finds all occurrences of a substring, and split_string, which splits a string into segments at given indices. They may help with whatever larger task you are performing. If they aren't valuable, there is a shorter version at the bottom.
Given that it uses generators throughout it should be pretty fast and memory efficient.
from itertools import chain, repeat
from string import Template
def CharReplace(string: str, map):
    if ("$" in map) or ("$" in string):
        raise ValueError("Cannot handle anything with a $ sign")
    for old in map: #"old" is the key in the mapping
        locations = findall(string, old) #index of each occurrence of "old"
        bits = split_string(string, locations) #the string split into segments, each starting with "old"
        template = zip(bits, repeat("$")) #tuples: (segment of the string, "$")
        string = ''.join(chain(*template))[:-1] #use chain(*) to unpack template so we get a new string with a $ in front of each occurrence of "old" and strip the last "$" from the end
    template = Template(string)
    string = template.substitute(map) #substitute replaces substrings which follow "$" based on a mapping
    return string
def findall(string: str, sub: str):
    i = string.find(sub)
    while i != -1:
        yield i
        i = string.find(sub, i + len(sub))
def split_string(string: str, indices):
    start = 0 #indices are absolute positions, so remember where the previous cut was made
    for i in indices:
        yield string[start:i]
        start = i
    yield string[start:]
This approach will not handle any strings with "$" in them without some extra code to escape them.
It will run through the string from front to back, one key at a time in whatever order the dict iterates them. You could add some form of sorted() on the keys in the line for old in map in order to handle keys in a specific order (eg longest first, alphabetically).
It will handle repeated occurrences of a key such that llll will be recognised as ll ll and lll as ll l.
In your original case this turns the original string first into $llo$ll, then into $ll$o$ll, before using substitute to get ollo.
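The last step of that walkthrough can be tried in isolation. The snippet below is only an illustration of how string.Template treats the marked-up string, plus one limitation that I believe is worth knowing: a placeholder name runs on through any letters, digits or underscores that follow it, so a key directly followed by another letter is looked up under a longer name.

```python
from string import Template

mapping = {'ll': 'o', 'o': 'll'}

# The marked-up string from the walkthrough: each key is preceded by "$".
marked = '$ll$o$ll'
print(Template(marked).substitute(mapping))  # ollo

# A placeholder name extends over every following letter, digit or
# underscore, so "$llx" is looked up as "llx", which is not in the mapping.
try:
    Template('$llx').substitute(mapping)
    outcome = 'substituted'
except KeyError:
    outcome = 'KeyError'
print(outcome)  # KeyError

# Braces mark where the placeholder name ends.
print(Template('${ll}x').substitute(mapping))  # ox
```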
Single Generator to add delimiters
If you prefer your code to be shorter:
def CharReplace2(string: str, map):
    if ("$" in map) or ("$" in string):
        raise ValueError("Cannot handle anything with a $ sign")
    for old in map: #"old" is the key in the mapping
        string = ''.join(add_delimiters(string, old)) #add a $ before each occurrence of old
    template = Template(string)
    string = template.substitute(map) #substitute replaces substrings which follow "$" based on a mapping
    return string
def add_delimiters(string: str, sub: str):
    i = string.find(sub)
    while i != -1:
        yield string[:i] #string up to next occurrence of sub
        string = ''.join(('$',string[i:])) #add a dollar to the start of the rest of the string (starts with sub)
        i = string.find(sub, 1 + len(sub)) #and find the next occurrence, skipping the "$" and the sub just marked
    yield string #don't forget to yield the last bit of the string

How to avoid .replace replacing a word that was already replaced

Given a string, I have to reverse every word, but keeping them in their places.
I tried:
def backward_string_by_word(text):
    for word in text.split():
        text = text.replace(word, word[::-1])
    return text
But if I have the string Ciao oaiC, when it tries to reverse the second word, that word is identical to the first one after it has already been reversed, so it replaces it again. How can I avoid this?

You can do it in one line with join and a generator expression:
text = "test abc 123"
text_reversed_words = " ".join(word[::-1] for word in text.split())

s.replace(x, y) is not the correct method to use here:
It does two things:
find x in s
replace it with y
But you do not really need to find anything here, since you already have the word you want to replace. The problem is that replace starts searching for x from the beginning of the string each time, not from the position you are currently at, so it finds the word you have already replaced, not the one you want to replace next.
The simplest solution is to collect the reversed words in a list, and then build a new string out of this list by concatenating all reversed words. You can concatenate a list of strings and separate them with spaces by using ' '.join().
def backward_string_by_word(text):
    reversed_words = []
    for word in text.split():
        reversed_words.append(word[::-1])
    return ' '.join(reversed_words)
If you have understood this, you can also write it more concisely by skipping the intermediate list with a generator expression:
def backward_string_by_word(text):
    return ' '.join(word[::-1] for word in text.split())
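One thing to be aware of: split() followed by ' '.join() collapses runs of whitespace into single spaces. If the words really must stay exactly where they are, a possible variant is to reverse each run of non-whitespace characters in place with re.sub. The function name below is made up for this sketch.

```python
import re


def backward_string_by_word_keep_spacing(text):
    # \S+ matches each run of non-whitespace characters, i.e. each word;
    # everything between the matches is left untouched.
    return re.sub(r'\S+', lambda match: match.group(0)[::-1], text)


print(backward_string_by_word_keep_spacing("Ciao oaiC"))  # oaiC Ciao
```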

Splitting a string converts it to a list. You can just reassign each value of that list to the reverse of that item. See below:
text = "The cat tac in the hat"
def backwards(text):
    split_word = text.split()
    for i in range(len(split_word)):
        split_word[i] = split_word[i][::-1]
    return ' '.join(split_word)
print(backwards(text))

Pandas is not recognizing words, but only letters. How can I do so when I slice, it gives me the words not the letters?

When I apply this function to my text, it gets cleaned but when I search for a specific word in a cell, it only gives me the letter, not the word.
import re
def clean_text(x):
    txt = re.sub(r'https?://\S+', '', x)
    txt = re.sub('[^A-Za-z]+', ' ', txt) #operate on txt, not x, so the URL removal is kept
    txt = ' '.join(txt.split())
    return txt
When I try to get the first word (which should be 'future') in the following manner: df_clean.iloc[0,][0], I only get an 'F'.
How can I find words by their index in the cell?

df_clean.iloc[0,][0] returns the 0th element of df_clean.iloc[0,]. Since df_clean.iloc[0,] is a string, that is its first letter. You don't want the 0th element, you want the 0th word. If the cell held a list, what you are doing would work.
Two solutions:
Return txt.split() from clean_text(), so that each cell holds a list of words.
Or keep returning a string as you do now, and slice up to the first space: with cell = df_clean.iloc[0,], use cell[:cell.index(' ')].
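The difference between indexing a string and indexing a list of words can be seen without pandas at all. The sketch below uses a cleaned-up variant of clean_text and a made-up sample sentence; cell stands in for the value of one DataFrame cell.

```python
import re


def clean_text(x):
    txt = re.sub(r'https?://\S+', '', x)   # drop URLs
    txt = re.sub('[^A-Za-z]+', ' ', txt)   # replace non-letters with spaces
    return ' '.join(txt.split())           # normalise whitespace


# Hypothetical sample text standing in for one cell of the DataFrame.
cell = clean_text("Future of AI: https://example.com is here!")

print(cell[0])                 # F       (first character of the string)
print(cell.split()[0])         # Future  (first word of the list of words)
print(cell[:cell.index(' ')])  # Future  (slice up to the first space)
```

Note that cell.index(' ') raises ValueError when the cell contains a single word, so cell.split()[0] is the more forgiving of the two.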

How might I create an acronym by splitting a string at the spaces, taking the character indexed at 0, joining it together, and capitalizing it?

My code
beginning = input("What would you like to acronymize? : ")
second = beginning.upper()
third = second.split()
fourth = "".join(third[0])
print(fourth)
I can't seem to figure out what I'm missing. The code is supposed to take the phrase the user inputs, put it all in caps, split it into words, join the first character of each word together, and print it. I feel like there should be a loop somewhere, but I'm not entirely sure if that's right or where to put it.

Say input is "Federal Bureau of Agencies"
Typing third[0] gives you the first element of the split, which is "FEDERAL" (the input has already been upper-cased). You want the first character of each element in the split. Use a generator expression or list comprehension to apply [0] to each item in the list:
val = input("What would you like to acronymize? ")
print("".join(word[0] for word in val.upper().split()))
In Python, it would not be idiomatic to use an explicit loop here. Generator expressions are shorter and easier to read, and do not require an explicit accumulator variable.

When you run the code third[0], Python will index the variable third and give you the first part of it.
The results of .split() are a list of strings. Thus, third[0] is a single string, the first word (all capitalized).
You need some sort of loop to get the first letter of each word, or else you could do something with regular expressions. I'd suggest the loop.
Try this:
fourth = "".join(word[0] for word in third)
There is a little for loop inside the call to .join(). Python calls this a "generator expression". The variable word will be set to each word from third, in turn, and then word[0] gets you the char you want.
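To see that the generator expression really is just a compact loop, here are both forms side by side. The sample phrase is the one used in another answer in this thread; the variable names with_loop and with_generator are only for illustration.

```python
third = "FEDERAL BUREAU OF AGENCIES".split()

# Explicit loop with an accumulator list.
letters = []
for word in third:
    letters.append(word[0])
with_loop = "".join(letters)

# The same thing as a generator expression inside join().
with_generator = "".join(word[0] for word in third)

print(with_loop, with_generator)  # FBOA FBOA
```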

works for me this way:
>>> a = "What would you like to acronymize?"
>>> a.split()
['What', 'would', 'you', 'like', 'to', 'acronymize?']
>>> ''.join([i[0] for i in a.split()]).upper()
'WWYLTA'
>>>

One intuitive approach would be:
get the sentence using input (or raw_input in Python 2)
split the sentence into a list of words
get the first letter of each word
join the letters with a space as separator
Here is the code:
sentence = input('What would you like to acronymize?: ')
words = sentence.split() #split the sentence into words
just_first_letters = [] #a list containing just the first letter of each word
#traverse the list of words, adding the first letter of
#each word into just_first_letters
for word in words:
    just_first_letters.append(word[0])
result = " ".join(just_first_letters) #join the list of first letters
print(result)

#acronym2.py
#illustrating how to design an acronym
import string
def main():
    sent = input("Enter the sentence: ") #take input sentence with spaces
    #capitalize each word, then split the sentence so each word becomes a string
    for word in string.capwords(sent).split():
        #print the first letter of each word on the same line to get your acronym
        print(word[0], end=" ")
main()

name = input("Enter a name with uppercase and lowercase letters: ")
print(f'the original string = {name}')
def uppercase(name):
    res = [char for char in name if char.isupper()]
    print("The uppercase characters in string are : " + "".join(res))
uppercase(name)
