Removing a 'range' of characters at specific indexes in string - python

I have looked over similar questions, but I still have trouble figuring this one out.
I have two Lists of strings, one of which consists of characters like 'abcdefg' and another one consisting of strings which consist of white spaces and a special character. The special character indicates where I should remove characters from my 'abcdefg' string. The special character's position in the list would be the same position I would need to remove a character from the first list. I also need to remove the adjacent characters.
EDIT: I want to remove a character (and the adjacent characters) at the same position the '*' char is located in airstrikes, but in reinforces. Does this make sense?
reinforces = ["abcdefg", "hijklmn"]
airstrikes = [" * "]
battlefield = reinforces[0]
bomb_range = []
count = 0
if range(len(airstrikes)) != 0:
    for airstrike in airstrikes:
        for char in airstrike:
            print(count)
            count = count + 1
            if (char == '*'):
                bomb_range.append(count-1)
                bomb_range.append(count)
                bomb_range.append(count+1)
                break
#Trying to hardcode it initially just to get it to work. Some kind of looping is needed though.
battlefield = battlefield[:bomb_range[0]] + battlefield[bomb_range[1]:]
battlefield = battlefield[:bomb_range[1]] + battlefield[bomb_range[2]:]
#battlefield = battlefield[:bomb_range[2]] + battlefield[bomb_range[3]:] #Will not work of course. But how could I achieve what I want?
I am sorry about the nested loops. If it hurts looking at it, feel free to bash and correct me. I am sorry if I missed any answers on this forum which could have helped me find a solution. Know that I did try to find one.

Use index to find where to strike, then remove the character the usual way:
>>> reinforce = "abcdefg"
>>> airstrike = " * "
>>> strike_at = airstrike.index('*')
>>> reinforce[:strike_at]+reinforce[strike_at+1:]
'abcefg'
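Note that slicing with strike_at+1 is always safe, even when the strike hits the last character, but airstrike.index('*') raises ValueError when there is no '*' in the string (see try and except).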
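The answer above removes only the struck character. Here is a minimal sketch that also removes the adjacent characters, assuming an airstrike string may contain several '*' markers and that positions falling outside the battlefield are simply ignored. The name apply_airstrike is made up for illustration.

```python
def apply_airstrike(battlefield, airstrike):
    """Remove every character at a '*' position, plus its left and right neighbours."""
    doomed = set()
    for pos, char in enumerate(airstrike):
        if char == '*':
            doomed.update((pos - 1, pos, pos + 1))
    # Indexes outside the battlefield (such as -1) never match, so they are ignored.
    return ''.join(ch for i, ch in enumerate(battlefield) if i not in doomed)


print(apply_airstrike("abcdefg", " * "))  # defg
```

Collecting all doomed indexes first avoids the shifting-index problem that appears when characters are cut out one slice at a time.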

Related

Python replace all characters in a string except for 3 characters with underscores

I'm working on a simple hangman app and I am trying to replace all the chars but three, in a word users have to guess, with underscores randomly.
So for example: America to A_M___C_
I have most of it down, but my code sometimes leaves more than 3 characters unconverted to underscores. Here is my code:
topics = {
    "movie": ["Star-Trek", "Shang-Chi"],
    "place": ["PARIS", "AMERICA", "ENGLAND"]
}
topic = random.choice(["movie", "place"])
solution = random.choice(topics[topic])
question = ""
length = len(solution)
underscores = length - 3
for char in solution:
    for i in range(underscores):
        add = question + random.choice([char, "_"])
    question = question + add
Let's rehash your problem statement:
Given a string of unspecified length with more than three different characters, you want to keep three of these unique characters, and replace all the others.
Notice that the string could contain duplicates, so if your string is "abracadabra" and a is one of the letters you want to keep, that already gets you a__a_a_a__a.
import random
topics = {
    "movie": ["Star-Trek", "Shang-Chi"],
    "place": ["PARIS", "AMERICA", "ENGLAND"]
}
topic = random.choice(["movie", "place"])
solution = random.choice(topics[topic])
characters = set(solution)
keep = random.sample(list(characters), 3)
question = solution
for char in characters.difference(keep):
    question = question.replace(char, '_')
set(solution) produces a set of the unique characters in the string, and random.sample(list(characters), 3) picks three of them (converting the set back to a list is required because sampling directly from a set is deprecated since Python 3.9 and raises TypeError from Python 3.11 on). We then use set.difference() to loop over the remaining unique characters, and replace all occurrences of them with underscores.
Your attempt doesn't normalize characters to upper or lower case; I would assume this is something you would actually want to do, but then probably normalize the questions to upper case already in the source. I'm leaving this as an exercise; this code simply regards a and A as different characters.
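One possible take on the case-normalization exercise mentioned above, as a sketch only. It assumes that upper and lower case should count as the same letter and that the word has at least keep_count distinct characters (random.sample raises ValueError otherwise). The name mask_word and its parameters are made up for illustration.

```python
import random


def mask_word(solution, keep_count=3):
    """Keep keep_count distinct characters (case-insensitively), mask the rest."""
    # sorted() gives a list, which random.sample requires on Python 3.11+.
    unique = sorted(set(solution.upper()))
    keep = set(random.sample(unique, keep_count))
    return ''.join(ch if ch.upper() in keep else '_' for ch in solution)


print(mask_word("Abracadabra"))
```

Like the answer above, this keeps every occurrence of a chosen character, so more than three positions may stay visible.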
You have a few problems in your code. First of all, the inner loop restarts for every character, but that's not what you want. You want to put x underscores. Secondly, you only need to add your random choice, and not the whole question. So, below is a working program:
# get len(solution) - 3 unique numbers
indices = random.sample(range(len(solution)), len(solution) - 3)
question = list(solution) # convert to a list so we can index
for i in indices:
    question[i] = "_" # change them to underscores
question = "".join(question) # convert the list back to a string
The reason your solution (and my previous solution) doesn't work is that the number of non-underscore characters isn't limited. You could keep a counter of the non-underscore characters and, once it reaches 3, only replace characters with underscores, but you would still need to track the number of underscores as well. So this solution is better.
Here is one function that takes in a string and optional number to save letters, so 3 will save 3 letters, 5 will save 5 letters. There is no error check to see if save_letters is larger than the word and so forth. The code is supposed to show you exactly what is happening, not to be a clever one liner.
import random
def change_word(word, save_letters = 3):
    length = len(word)
    randies = []
    ret = ""
    while len(randies) < save_letters:
        x = random.randint(0,length - 1)
        if not x in randies:
            randies.append(x)
    for i in range(0, len(word)):
        if i in randies:
            ret += word[i]
            continue
        ret += "_"
    return ret
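The function above loops forever when save_letters is larger than the word, because it can never collect enough distinct positions. Here is a hedged sketch of the same idea with that check added, using random.sample to pick the positions in one call. The name mask_positions is made up for illustration.

```python
import random


def mask_positions(word, save_letters=3):
    """Keep save_letters randomly chosen positions and mask every other one."""
    if not 0 <= save_letters <= len(word):
        raise ValueError("save_letters must be between 0 and len(word)")
    # Sample distinct positions, so exactly save_letters characters survive.
    keep = set(random.sample(range(len(word)), save_letters))
    return ''.join(ch if i in keep else '_' for i, ch in enumerate(word))


print(mask_positions("AMERICA"))
```

Because positions are sampled rather than characters, a repeated letter may be shown in one place and hidden in another.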

Python for loop to find a character and add substrings before/after that character

I am new to programming and am trying to work through computer science circles by the University of Waterloo and am stuck on this exercise: https://i.stack.imgur.com/ltVu9.png
The code in the image is what I have come up with so far.
The exercise wants me to take the substrings from before and after the "+" character and add them together using a for loop and I can't figure out how to get the substring. So far I've tried
print(S[0:len(char)])
To get the substring of the characters before and after the '+' symbol, you need to get the characters before the current position of the '+' char and after.
S = '12+5'
for pos in range(len(S)):
    char = S[pos]
    if char == '+':
        sub_1 = S[:pos] # Get string before pos
        sub_2 = S[pos + 1:] # Get string after pos
        print('{} + {}'.format(sub_1, sub_2))
        # Output: 12 + 5
However, if you just want the simplest solution without working out how to do it manually, then as others have said, using .split() makes things easy, since .split() splits a string into a list of strings separated by a specific character.
Using .split() the code can become like this:
S = '12+5'
split_S = S.split('+') # Creates a list of ['12', '5']
# Make sure our list has 2 items in it to print
if len(split_S) == 2:
    print('{} + {}'.format(split_S[0], split_S[1]))
    # Output: 12 + 5
I would recommend this:
nums = S.split('+')
print(int(nums[0])+int(nums[1]))
sum([int(c) for c in input().split('+')])
5+12
Out: 17
Epicdaface25 is right, you could simply use
nums = S.split('+')
print(int(nums[0])+int(nums[1]))
But if you need to use a for loop, Karl's answer is the better option. However, you would not
print('{} + {}'.format(sub_1, sub_2))
you would need to print int(sub_1) + int(sub_2) to have Python actually add the two numbers and display the sum, as opposed to the mathematical expression.
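Putting the for loop and the int() conversion together could look like the sketch below. It assumes the input contains exactly one '+' with whole numbers on both sides; the name add_terms is made up for illustration.

```python
def add_terms(expression):
    """Find the '+' with a for loop and add the numbers on either side of it."""
    for pos in range(len(expression)):
        if expression[pos] == '+':
            left = expression[:pos]        # substring before the '+'
            right = expression[pos + 1:]   # substring after the '+'
            return int(left) + int(right)
    raise ValueError("no '+' found in expression")


print(add_terms('12+5'))  # 17
```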

Python inserting spaces in string

Alright, I'm working on a little project for school, a 6-frame translator. I won't go into too much detail, I'll just describe what I wanted to add.
The normal output would be something like:
TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD
The important part of this string are the M and the _ (the start and stop codons, biology stuff). What I wanted to do was highlight these like so:
TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSD
Now here is where (for me) it gets tricky, I got my output to look like this (adding a space and a ' to highlight the start and stop). But it only does this once, for the first start and stop it finds. If there are any other M....._ combinations it won't highlight them.
Here is my current code, attempting to make it highlight more than once:
def start_stop(translation):
    index_2 = 0
    while True:
        if 'M' in translation[index_2::1]:
            index_1 = translation[index_2::1].find('M')
            index_2 = translation[index_1::1].find('_') + index_1
            new_translation = translation[:index_1] + " '" + \
                              translation[index_1:index_2 + 1] + "' " +\
                              translation[index_2 + 1:]
        else:
            break
    return new_translation
I really thought this would do it, guess not. So now I find myself being stuck.
If any of you are willing to try and help, here is a randomly generated string with more than one M....._ set:
'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLYMPPARRLATKSRFLTPVISSG_DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI'
Thank you to anyone willing to help :)
Regular expressions are pretty handy here:
import re
sequence = "TTCP...."
highlighted = re.sub(r"(M\w*?_)", r" '\1' ", sequence)
# Output:
"TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLY 'MPPARRLATKSRFLTPVISSG_' DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI"
Regex explanation:
We look for an M followed by any number of "word characters" \w* then an _, using the ? to make it a non-greedy match (otherwise it would just make one group from the first M to the last _).
The replacement is the matched group (\1 indicates "first group", there's only one), but surrounded by spaces and quotes.
You only need a little slicing; you don't need any external module.
Python strings have a method called 'index'; just use it:
string_1='TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'
before=string_1.index('M')
after=string_1[before:].index('_')
print('{} {} {}'.format(string_1[:before],string_1[before:before+after+1],string_1[before+after+1:]))
output:
TTCPTISPALGLAWS_DLGTLGF MSYSANTASGETLVSLYQLGLFEM_ VVSYGRTKYYLICP_LFHLSVGFVPSD
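The index-based answer above handles only the first M..._ pair and leaves out the quotes. Here is a sketch of how the same slicing idea could be extended to every pair without regular expressions, assuming an M with no following _ should be left untouched. The name highlight_all is made up for illustration.

```python
def highlight_all(translation):
    """Wrap every M..._ stretch in quotes using only str.find and slicing."""
    pieces = []
    pos = 0
    while True:
        start = translation.find('M', pos)
        if start == -1:
            break
        stop = translation.find('_', start)
        if stop == -1:
            break  # a start codon without a stop codon is not highlighted
        pieces.append(translation[pos:start])
        pieces.append(" '" + translation[start:stop + 1] + "' ")
        pos = stop + 1  # continue searching after the stop codon
    pieces.append(translation[pos:])
    return ''.join(pieces)


print(highlight_all("AAMBB_CCMDD_EEM"))  # AA 'MBB_' CC 'MDD_' EEM
```

Searching the original string from a moving position avoids the index drift that happens when the quotes are inserted into the string being searched.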

Most Frequent Character - User Submitted String without Dictionaries or Counters

Currently, I am in the midst of writing a program that calculates all of the non white space characters in a user submitted string and then returns the most frequently used character. I cannot use collections, a counter, or the dictionary. Here is what I want to do:
Split the string so that white space is removed. Then count each character and return a value. I would have something to post here but everything I have attempted thus far has been met with critical failure. The closest I came was this program here:
strin=input('Enter a string: ')
fc=[]
nfc=0
for ch in strin:
    i=0
    j=0
    while i<len(strin):
        if ch.lower()==strin[i].lower():
            j+=1
        i+=1
    if j>nfc and ch!=' ':
        nfc=j
        fc=ch
print('The most frequent character in string is: ', fc )
If you can fix this code or tell me a better way of doing it that meets the required criteria that would be helpful. And, before you say this has been done a hundred times on this forum please note I created an account specifically to ask this question. Yes there are a ton of questions like this but some that are reading from a text file or an existing string within the program. And an overwhelmingly large amount of these contain either a dictionary, counter, or collection which I cannot presently use in this chapter.
Just do it "the old way". Create a list (okay it's a collection, but a very basic one so shouldn't be a problem) of 26 zeroes and increase according to position. Compute max index at the same time.
strin="lazy cat dog whatever"
l=[0]*26
maxindex=-1
maxvalue=0
for c in strin.lower():
    pos = ord(c)-ord('a')
    if 0<=pos<=25:
        l[pos]+=1
        if l[pos]>maxvalue:
            maxindex=pos
            maxvalue = l[pos]
print("max count {} for letter {}".format(maxvalue,chr(maxindex+ord('a'))))
result:
max count 3 for letter a
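When the input contains no letters from a to z, the snippet above still prints a character, because maxindex stays at -1. Here is a sketch of the same 26-slot idea wrapped in a function that reports that case explicitly. The name most_frequent_letter and the choice of returning None are assumptions made for illustration.

```python
def most_frequent_letter(text):
    """Return (letter, count) for the most frequent a-z letter, or None."""
    counts = [0] * 26
    best_index = -1
    best_count = 0
    for c in text.lower():
        pos = ord(c) - ord('a')
        if 0 <= pos <= 25:  # ignore spaces, digits and anything outside a-z
            counts[pos] += 1
            if counts[pos] > best_count:
                best_index, best_count = pos, counts[pos]
    if best_index == -1:
        return None
    return chr(best_index + ord('a')), best_count


print(most_frequent_letter("lazy cat dog whatever"))  # ('a', 3)
```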
As an alternative to Jean's solution that does not use a list (at the cost of the single pass over the string that the list allows), you could just use str.count here, which does pretty much what you're trying to do:
strin = input("Enter a string: ").strip()
maxcount = float('-inf')
maxchar = ''
for char in strin:
    c = strin.count(char) if not char.isspace() else 0
    if c > maxcount:
        maxcount = c
        maxchar = char
print("Char {}, Count {}".format(maxchar, maxcount))
If lists are available, I'd use Jean's solution. He doesn't use a O(N) function N times :-)
P.s: you could compact this with one line if you use max:
max(((strin.count(i), i) for i in strin if not i.isspace()))
To keep track of several counts for different characters, you have to use a collection (even if it is a global namespace implemented as a dictionary in Python).
To print the most frequent non-space character while supporting arbitrary Unicode strings:
import sys
text = input("Enter a string (case is ignored)").casefold() # default caseless matching
# count non-space character frequencies
counter = [0] * (sys.maxunicode + 1)
for nonspace in map(ord, ''.join(text.split())):
    counter[nonspace] += 1
# find the most common character
print(chr(max(range(len(counter)), key=counter.__getitem__)))
A similar list in Cython was the fastest way to find frequency of each character.

Python script to insert space between different character types: Why is this *so* slow?

I'm working with some text that has a mix of languages, which I've already done some processing on and is in the form of a list of single characters (called "letters"). I can tell which language each character is by simply testing if it has case or not (with a small function called "test_lang"). I then want to insert a space between characters of different types, so I don't have any words that are a mix of character types. At the same time, I want to insert a space between words and punctuation (which I defined in a list called "punc"). I wrote a script that does this in a very straightforward way that made sense to me (below), but apparently is the wrong way to do it, because it is incredibly slow.
Can anyone tell me what the better way to do this is?
# Add a space between Arabic/foreign mixes, and between words and punc
cleaned = ""
i = 0
while i <= len(letters)-2: #range excludes last letter to avoid Out of Range error for i+1
    cleaned += letters[i]
    # words that have case are Latin; otherwise Arabic
    if test_lang(letters[i]) != test_lang(letters[i+1]):
        cleaned += " "
    if letters[i] in punc or letters[i+1] in punc:
        cleaned += " "
    i += 1
cleaned += letters[len(letters)-1] # add in last letter
There are a few things going on here:
You call test_lang() on every letter in the string twice, this is probably the main reason this is slow.
Concatenating strings in Python isn't very efficient, you should instead use a list or generator and then use str.join() (most likely, ''.join()).
Here is the approach I would take, using itertools.groupby():
from itertools import groupby
def keyfunc(letter):
    return (test_lang(letter), letter in punc)
cleaned = ' '.join(''.join(g) for k, g in groupby(letters, keyfunc))
This will group the letters into consecutive letters of the same language and whether or not they are punctuation, then ''.join(g) converts each group back into a string, then ' '.join() combines these strings adding a space between each string.
Also, as noted in comments by DSM, make sure that punc is a set.
Every time you perform a string concatenation, a new string is created. The longer the string gets, the longer each concatenation takes.
http://en.wikipedia.org/wiki/Schlemiel_the_Painter's_algorithm
You might be better off declaring a list big enough to store the characters of the output, and joining them at the end.
I suggest an entirely different solution that should be very fast:
import re
cleaned = re.sub(r"(?<!\s)\b(?!\s)", " ", "".join(letters))
This inserts a space at every word boundary (defining words as sequences of alphanumeric characters; in Python 3, str patterns match Unicode word characters by default, and the re.LOCALE flag is only allowed with bytes patterns), unless it's a word boundary next to whitespace.
This should split between Latin and Arabic characters as well as between Latin and punctuation.
Assuming test_lang is not the bottleneck, I'd try:
''.join(
    x + ' '
    if x in punc or y in punc or test_lang(x) != test_lang(y)
    else x
    for x, y in zip(letters[:-1], letters[1:])
) + letters[-1]  # zip stops one short, so add the last letter back
Here is a solution that uses yield. I would be interested to know whether this runs any faster than your original solution.
This avoids all the indexing in the original. It just iterates through the input, holding onto a single previous character.
This should be easy to modify if your requirements change in the future.
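ch_sep = ' '
def _sep_chars_by_lang(s_input):
    itr = iter(s_input)
    ch_prev = next(itr)
    yield ch_prev
    # A for loop ends cleanly when the input is exhausted; a StopIteration escaping
    # from next() inside a generator becomes a RuntimeError since Python 3.7 (PEP 479).
    for ch in itr:
        if test_lang(ch_prev) != test_lang(ch) or ch_prev in punc:
            yield ch_sep
        yield ch
        ch_prev = ch
def sep_chars_by_lang(s_input):
    return ''.join(_sep_chars_by_lang(s_input))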
Keeping the basic logic of the OP's original code, we speed it up by not doing all that [i] and [i+1] indexing. We use a prev and next reference that scan through the string, maintaining prev one character behind next:
# Add a space between Arabic/foreign mixes, and between words and punc
cleaned = ''
prev = letters[0]
for next in letters[1:]:
    cleaned += prev
    if test_lang(prev) != test_lang(next):
        cleaned += ' '
    if prev in punc or next in punc:
        cleaned += ' '
    prev = next
cleaned += next
Testing on a string of 10 million characters shows this is about twice the speed of the OP code. The "string concatenation is slow" complaint is obsolete, as others have pointed out. Running the test again using the ''.join(...) idiom shows a slightly slower execution than using string concatenation.
Further speedup may come through not calling the test_lang() function but by inlining some simple code. Can't comment as I don't really know what test_lang() does :).
Edit: removed a 'return' statement that should not have been there (testing remnant!).
Edit: Could also speedup by not calling test_lang() twice on the same character (on next in one loop and then prev in the following loop). Cache the test_lang(next) result.
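The caching idea from the last edit could be sketched as follows. This is not a measured improvement, just an illustration: test_lang and punc are passed in as parameters because the question does not show them, and the name separate is made up for illustration.

```python
def separate(letters, test_lang, punc):
    """Same spacing rules as the loop above, calling test_lang once per character."""
    if not letters:
        return ''
    out = []
    prev = letters[0]
    prev_lang = test_lang(prev)
    for cur in letters[1:]:
        cur_lang = test_lang(cur)  # computed once, reused on the next iteration
        out.append(prev)
        if prev_lang != cur_lang:
            out.append(' ')
        if prev in punc or cur in punc:
            out.append(' ')
        prev, prev_lang = cur, cur_lang
    out.append(prev)
    return ''.join(out)


def has_case(ch):
    """Hypothetical stand-in for test_lang: True for characters that have case."""
    return ch.lower() != ch.upper()


print(separate(list("ab,cd"), has_case, {','}))
```

As in the original loop, a language change next to punctuation produces two spaces, because both conditions add one.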
