Python: Get single words from a string - python

I'm trying to make a string analyzer in Python. I'm starting with this input as an example:
toAnalyze= "Hello!!gyus-- lol\n"
and as output I want something like this:
>Output: ['Hello', '!!', 'gyus', '--', ' ', 'lol']
I want every group kept in its original order.
My idea was to scan every character of the original string up to the "\n" character, and I came up with this solution:
toAnalyze= "Hello!!gyus-- lol\n"
final = ""
for char in toAnalyze:
    if char != " \n\t" and char != " " and char != "\n" and char != "\n\t":
        final += char
    elif char == " " or char == "\n" or char == "\n\t" or char == " \n\t":
        if not final.isalnum():
            word= ""
            thing = ""
            for l in final:
                if l.isalnum():
                    word += l
                else:
                    thing += l
            print("word: " + word)
            print("thing: " + thing )
And my current output is:
>Output: thing: !!-- word: Hellogyus lol
Do you have any idea?
The output I want:
>Output: ['Hello', '!!', 'gyus', '--', ' ', 'lol']
Thanks in advance and have a nice day.

I'm not a Python guy, but I want to help you get started. This is a working solution which you can try to improve so that it becomes more Pythonic:
toAnalyze= 'Hello!!gyus-- lol\n'
word = ''
separator = ''
tokens = []
for ch in toAnalyze:
    if ch.isalnum():
        word += ch
    # we met the first character of a separator, so save a word
    if not ch.isalnum() and word:
        tokens.append(word)
        word = ''
    # 1. we met the first alphanumeric after a separator, so save the separator or
    # 2. we met a new separator right after another one, also save the old separator
    if ch.isalnum() and separator or separator and separator[-1] != ch:
        tokens.append(separator)
        separator = ''
    if not ch.isalnum():
        separator += ch
# note: whatever is still pending when the loop ends (here the trailing "\n") is not appended
print(tokens)
The output for your example is:
['Hello', '!!', 'gyus', '--', ' ', 'lol']
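The same grouping can also be sketched with itertools.groupby. This is only one possible alternative, not the answerer's method; the function name tokenize and the choice to strip the trailing newline before tokenizing are assumptions made for illustration.

```python
from itertools import groupby


def tokenize(text):
    """Split text into runs of alphanumerics and runs of one repeated separator."""
    def key(ch):
        # All alphanumeric characters share one key, so they group into words;
        # every other character is its own key, so only identical neighbours merge.
        return (True, "") if ch.isalnum() else (False, ch)

    return ["".join(group) for _, group in groupby(text, key)]


# Assumption: the trailing "\n" is not wanted as a token, so it is stripped first.
print(tokenize("Hello!!gyus-- lol\n".rstrip("\n")))
# ['Hello', '!!', 'gyus', '--', ' ', 'lol']
```

Unlike the loop above, this version also keeps a word that ends the string without a following separator.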

Related

Is there a way to simplify my deep string of "if" statements? None of them actually repeat they are all just similar

I have written some code to help with my GCSE revision (exams in the UK taken at age 16) which converts a string into just the first letter of every word but leaves everything else intact (i.e. special characters at the ends of words, capitalisation, etc.).
For example:
If I input >>> "These are some words (now they're in brackets!)"
I would want it to output >>> "T a s w (n t i b!)"
I feel as though there must be an easier way to do this than my string of similar "if" statements. For reference, I am reasonably new to Python, but I can't seem to find an answer online. Thanks in advance!
Code:
line = input("What text would you like to memorise?\n")
words = line.split()
letters=''
spec_chars=[
    '(',')',',','.','“','”','"',"‘","’","'",'!','¡','?','¿','…'
]
for word in words:
    if word[0] in spec_chars:
        if word[-1] in spec_chars:
            if word[-2] in spec_chars:
                if word[1] in spec_chars:
                    letters += word[0] + word[1] + word[2] + word[-2] + word[-1] + " "
                else:
                    letters += word[0] + word[1] + word[-2] + word[-1] + " "
            else:
                if word[1] in spec_chars:
                    letters += word[0] + word[1] + word[2] + word[-1] + " "
                else:
                    letters += word[0] + word[1] + word[-1] + " "
        else:
            if word[1] in spec_chars:
                letters += word[0] + word[1] + word[2] + " "
            else:
                letters += word[0] + word[1] + " "
    else:
        if word[-1] in spec_chars:
            if word[-2] in spec_chars:
                letters += word[0] + word[-2] + word[-1] + " "
            else:
                letters += word[0] + word[-1] + " "
        else:
            letters += word[0] + " "
output=("".join(letters))
print(output)
Here's one alternative. We keep every punctuation character except the apostrophe, and we keep only the first letter encountered in each word.
words = "These are some words (now they're in brackets!)"
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzé'"
output = []
for word in words.split():
    output.append( '' )
    found = False
    for i in word:
        if i in alphabet:
            # letters (and apostrophes) after the first one are dropped
            if not found:
                found = True
                output[-1] += i
        else:
            output[-1] += i
print(' '.join(output))
Output:
T a s w (n t i b!)
This might be somewhat overwhelming for now, but I'd still like to point out a solution that allows for a much more concise solution using regular expressions, because it's quite instructional in terms of how to approach problems like this.
TL;DR: It can be done in one line
import re
' '.join(re.sub(r"(\w)[\w']*\w", r'\1', word) for word in text.split())
If you look at the words individually after using .split(), it appears that what you need to do is basically remove all letters (and word-internal apostrophe) after the first letter occurring in each word.
[
'"These', # remove 'hese'
'are', # 're'
'some', # 'ome'
'words', # 'ords'
'(now', # 'ow'
"they're", # "hey're"
'in', # 'n'
'brackets!)"' # 'rackets'
]
Another way to think about it is to find sequences consisting of
A letter x
A sequence of 1 or more letters
and replace the sequence with x. E.g., in '"These', replace 'These' with 'T' to arrive at '"T'; in 'brackets!)"', replace 'brackets' with 'b', etc.
In regular expression syntax, this becomes:
(\w): A letter is matched by \w (which also matches digits and the underscore), but we want to refer to it later, so we need to put it in a group; hence the parentheses.
A sequence of 1 or more (indicated by +) letters is \w+. We also want to include apostrophe, so we want a class indicated by [], i.e., [\w']+, which means "match one or more instances of a letter or apostrophe".
To replace/substitute substrings matched by the pattern we use re.sub(pattern, replacement, string). In the replacement string we can tell it to insert the group we defined before by using the reference \1.
Putting it all together:
# import the re module
import re
# define the regular expression
pattern = r"(\w)[\w']+"
# some test data
texts = ["\"These are some words (now they're in brackets!)\"",
         "¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'",
         "The kids' favourite teacher"]
# testing the pattern
for text in texts:
    words = text.split()
    print(text)
    print(' '.join(re.sub(pattern, r'\1', word) for word in words))
    print()
Result:
"These are some words (now they're in brackets!)"
"T a s w (n t i b!)"
¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'
¿Q e l m a? '(¡N e d!!)'
The kids' favourite teacher
T k f t
To keep a word-final apostrophe in the output, modify the pattern to
pattern = r"(\w)[\w']*\w"
so that the letter-apostrophe sequence must end with a letter.
In other words, we now match
a group consisting of a letter (\w), followed by
zero or more (indicated by *) instances of letter or apostrophe, and
a letter \w.
The result is exactly the same as above, except the last sentence becomes "T k' f t".
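As a small sketch, the second pattern can be wrapped in a reusable function. The names PATTERN and first_letters are made up for this example and are not part of the answer above.

```python
import re

# Pattern from the answer above: the match must end in a word character,
# so a word-final apostrophe is left in place.
PATTERN = re.compile(r"(\w)[\w']*\w")


def first_letters(text):
    """Reduce each whitespace-separated word to its first letter,
    keeping the punctuation around it."""
    return " ".join(PATTERN.sub(r"\1", word) for word in text.split())


print(first_letters("These are some words (now they're in brackets!)"))
# T a s w (n t i b!)
print(first_letters("The kids' favourite teacher"))
# T k' f t
```

One-letter words such as "a" are not matched by the pattern at all (it needs at least two word characters), so they pass through unchanged.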
The code below works for me.
It simply checks the left and right ends of each word in the given sentence.
Let me know if anything needs clarifying.
words = "¿Qué es lo mejor asignatura? '(¡No es dibujo!!)'"
spec_chars = ['(', ')', ',', '.', '“', '”', '"', "‘",
              "’", "'", '!', '¡', '?', '¿', '…']
s_lst = words.split(' ')
tmp, rev_tmp = '', ''
for i in range(len(s_lst)):
    for l in s_lst[i]:
        if l in spec_chars:
            # leading special characters are kept as they are
            tmp += l
        else:
            # first real letter: keep it, then collect trailing special characters
            tmp += l
            for j in s_lst[i][::-1]:
                if j in spec_chars:
                    rev_tmp += j
                else:
                    tmp += rev_tmp[::-1]
                    break
            s_lst[i] = tmp
            tmp = ''
            rev_tmp = ''
            break
print(' '.join(s_lst))
Since you mentioned that you are at entry level, you can use a for loop to simplify your if statements. It is not perfect, but it could solve the question you have raised.
line = input("What text would you like to memorise?\n")
words = line.split()
spec_chars=['(',')',',','.','“','”','"',"‘","’","'",'!','¡','?','¿','…']
letters=''
for word in words:
    letters+=word[0]
    if word[0] in spec_chars:
        letters+=word[1]
    elif word[-2] in spec_chars:
        letters+=word[-2]+word[-1]
    elif word[-1] in spec_chars:
        letters+=word[-1]
print(letters)

Remove every second character (must be alphabetical or numerical) of a string without affecting spaces in python

For instance, if my initial string is "hello world! how are you? 0" I would like for the resulting string to be "hlo ol! hw r yu?". So far I have the following code:
s = "hello world! how are you? 0"
for char in s:
    if char.isalpha() == True:
        i = 0
s2 = ""
for char in s:
    if char.isalpha() or char.isnumeric():
        if (i % 2) == 0:
            s2 += char
        i += 1
    else:
        s2 += char
the output string s2 will be:
# s2 = 'hlo ol! hw r yu? '
Try this:
>>> s = "hello world! how are you? 0"
>>> ' '.join(j[::2] if i%2==0 else j[1::2] for i,j in enumerate(''.join(k for k in s if k.isalpha() or k==' ').split()))
'hlo ol hw r yu'
First we remove every character that is neither a letter nor a space with ''.join(k for k in s if k.isalpha() or k==' '). This produces 'hello world how are you '. Then we split it and get ['hello', 'world', 'how', 'are', 'you']. Now, for each word in this list, we keep every second character: words at an even position (index) in the list keep the characters at indices 0, 2, 4, ... (j[::2]), and words at an odd position keep the characters at indices 1, 3, 5, ... (j[1::2]). Note that this drops the punctuation and the digit.
This is equivalent to :
s1 = ''.join(k for k in s if k.isalpha() or k==' ') #'hello world how are you'
s1_list = s1.split() #['hello', 'world', 'how', 'are', 'you']
s2_list = [j[::2] if i%2==0 else j[1::2] for i,j in enumerate(s1_list)] #['hlo', 'ol', 'hw', 'r', 'yu']
s3 = ' '.join(s2_list) #'hlo ol hw r yu'
If you don't strictly need to use char.isalpha():
s = "hello world! how are you? 0"
s2 = ""
i = 0
for char in s:
    if char == " " or (i % 2) != 0:
        s2 += char
    i += 1
s = s2
This keeps the spaces and the characters at odd positions; if you want the characters at even positions instead, change != to == in the if statement.
Or amend the above by moving the test out of the loop into a small helper function (named keep_char here purely for illustration):
def keep_char(char, i):
    return char == " " or (i % 2) != 0
s = "hello world! how are you? 0"
s2 = ""
for i, char in enumerate(s):
    if keep_char(char, i):
        s2 += char
s = s2
Personally I would opt for the first version because it's less confusing, particularly if you don't need to use the function elsewhere.
Ok, just playing around.. I like the c[::2] operator method.
Problem here is that the count starts with the first letter in the word and doesn't include spaces.. but it was fun.
import re
import string
s = "hello world! how are you? 0"
# raw f-string plus re.escape, so characters such as ] and \ are safe inside the character class
split_by_punc = re.findall(rf"\w+|[{re.escape(string.punctuation)}]", s)
result = ' '.join(c[::2] if c[::2].isalnum() else c for c in split_by_punc)
for punc in string.punctuation:
    result = result.replace(f' {punc}', punc) # remove extra spaces before punctuation
print(result)
# hlo wrd! hw ae yu? 0
You have used isalpha(); use isalnum() instead to include both letters and digits.
word = "hello world! how are you? 0"
index = 0
result = ""
for letter in word:
    if not letter.isalnum():
        # punctuation and spaces are always kept
        result += letter
    elif index == 0:
        # keep this alphanumeric character and skip the next one
        result += letter
        index = 1
    else:
        index = 0
print(result)
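The counting idea can also be sketched as a reusable function. The name drop_every_second_alnum is an assumption for this example; the logic mirrors the question's own loop (count only alphanumeric characters, keep those with an even count).

```python
def drop_every_second_alnum(text):
    """Remove every second alphanumeric character; everything else is kept."""
    kept = []
    count = 0  # number of alphanumeric characters seen so far
    for ch in text:
        if ch.isalnum():
            if count % 2 == 0:
                kept.append(ch)
            count += 1
        else:
            kept.append(ch)
    return "".join(kept)


print(repr(drop_every_second_alnum("hello world! how are you? 0")))
# 'hlo ol! hw r yu? '
```

The trailing "0" is the twentieth alphanumeric character, so it is dropped while the space before it stays.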

Replace Duplicate String Characters

I need to convert a string so that each character that appears only once in the word appears as '(' in the new string, and any character that appears more than once is replaced with ')'.
My code is below:
def duplicate_encode(word):
    new_word = ''
    for char in word:
        if len(char) > 1:
            new_word += ')'
        else:
            new_word += '('
    return new_word
The test I'm not passing is as follows:
'((((((' should equal '()()()'
This would suggest that, if for example, the input is "recede," the output should read ()()().
Your code is close; it just needs a few changes:
def duplicate_encode(word):
    """
    Replace each duplicated letter in a string with ")".
    A letter that occurs only once is replaced with "(".
    """
    word_dict = {} # initialize a dictionary
    new_word = ""
    for i in set(word): # this loop counts how often each letter occurs
        word_count = word.count(i)
        word_dict[i] = word_count # add letter and count of the letter to dictionary
    for i in word:
        if word_dict[i] > 1:
            new_word += ")"
        else:
            new_word += "("
    return new_word
print(duplicate_encode("recede"))
I think you got the answer :)
Just because (it's late and) it's possible:
def duplicate_encode(word):
    return (lambda w: ''.join(('(', ')')[c in w[:i] + w[i+1:]] for i, c in enumerate(w)))(word.lower())
print(duplicate_encode("rEcede"))
OUTPUT
> python3 test.py
()()()
>
Seems like your result is based on the number of occurrences of a character in the word, you can use Counter to keep track of that:
def duplicate_encode(word):
    from collections import Counter
    word = word.lower() # to disregard case
    counter = Counter(word)
    new_word = ''
    for char in word:
        if counter[char] > 1: # if the character appears more than once in the word
            # translate it to )
            new_word += ')'
        else:
            new_word += '('
    return new_word
duplicate_encode('recede')
# '()()()'
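For comparison, a more compact sketch of the same Counter idea, using a generator expression instead of string concatenation. Ignoring case is an assumption carried over from the answers above, not something the question states.

```python
from collections import Counter


def duplicate_encode(word):
    """Map characters that occur once to '(' and repeated characters to ')'."""
    lowered = word.lower()  # assumption: case is ignored
    counts = Counter(lowered)
    return "".join(")" if counts[ch] > 1 else "(" for ch in lowered)


print(duplicate_encode("recede"))
# ()()()
```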

How to find start and end of a string

The intention is to write a function that reverses the order of the words in a string, so that if the input is "I am a student" the output is "student a am I".
I have the following code in Python which first reverses all the characters in a string and then loops the reversed sentence to reverse the words and prints them to a "final sentence" variable.
Because the condition I am checking for is just a space, the first word doesn't get printed i.e. if the input is " I am a student" my code works (notice the space before "I") ... however if the input is "I am a student" then the output is just "student a am"
I need to know how can I modify my IF statement so it doesn't miss the first word
def reverse(sentence):
    count = 0
    new_sentence = ''
    final_sentence = ''
    counter = 0
    word = ''
    for char in sentence[::-1]:
        new_sentence = new_sentence + char
    for char in new_sentence:
        if char != " ":
            count = count + 1
            continue
        else:
            for i in new_sentence[count-1::-1]:
                if i != " ":
                    word = word + i
                else:
                    break
        count = count + 1
        final_sentence = final_sentence + " " + word
        word = ''
    print(final_sentence)
reverse("I am a student")
I'm not sure why you are doing such complicated loops? You can just split the sentence, reverse and then join it again:
>>> ' '.join('I am a student'.split(' ')[::-1])
'student a am I'
To translate that into a function:
def reverse_sentence(sentence):
    return ' '.join(sentence.split(' ')[::-1])
You're doing several strange things in your code. For example:
new_sentence = ''
for char in sentence[::-1]:
    new_sentence = new_sentence + char
The string you're building through concatenation is already present in sentence[::-1]. You could've just done new_sentence = sentence[::-1].
You can check for the first word by using enumerate() and checking whether there is a space prior to that point in the sentence:
for idx,char in enumerate(new_sentence):
    if char != " " or ' ' not in new_sentence[:idx]:
However, the easiest way to accomplish your actual goal is with split(), splitting the sentence by whitespace automatically. Use join() to put it back together once you've reversed it.
def reverse(sentence):
    return ' '.join(sentence.split()[::-1])
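The two split variants used in the answers differ when the input has leading, trailing, or repeated spaces. A minimal sketch of the difference (the function name reverse_words is made up for this example):

```python
def reverse_words(sentence):
    """Reverse word order; split() without arguments ignores extra whitespace."""
    return " ".join(sentence.split()[::-1])


messy = " I am  a student"

# split(' ') keeps an empty string for every extra space
print(messy.split(" "))
# ['', 'I', 'am', '', 'a', 'student']

# split() collapses runs of whitespace and drops leading/trailing whitespace
print(messy.split())
# ['I', 'am', 'a', 'student']

print(reverse_words(messy))
# student a am I
```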

How to format my string

def main():
    print('Please enter a sentence without spaces and each word has ' + \
          'a capital letter.')
    sentence = input('Enter your sentence: ')
    for ch in sentence:
        if ch.isupper():
            capital = ch
            sentence = sentence.replace(capital, ' ' + capital)
    print(sentence)
main()
Ex: sentence = 'ExampleSentenceGoesHere'
I need this to print as: Example sentence goes here
As of right now, it prints as: Example Sentence Goes Here (with a space at the beginning)
You can iterate over the string character by character and replace every upper case letter with a space and appropriate lower case letter:
>>> s = 'ExampleSentenceGoesHere'
>>> "".join(' ' + i.lower() if i.isupper() else i for i in s).strip().capitalize()
'Example sentence goes here'
Note that isupper() checks whether a character is upper case. Calling strip() and capitalize() takes care of the leading space and the first letter.
Also see relevant threads:
Elegant Python function to convert CamelCase to snake_case?
How to check if a character is upper-case in Python?
You need to convert each uppercase letter to a lowercase one using capital.lower(). You should also skip the first letter of the sentence so that it stays capitalised and doesn't get a space in front of it. You can do this using a flag:
is_first_letter = True
for ch in sentence:
    if is_first_letter:
        is_first_letter = False
        continue
    if ch.isupper():
        capital = ch
        sentence = sentence.replace(capital, ' ' + capital.lower())
I'd probably use re and re.split("[A-Z]", text) but I'm assuming you can't do that because this looks like homework. How about:
def main():
    text = input(">>")
    newtext = ""
    for character in text:
        if character.isupper():
            ch = " " + character.lower()
        else:
            ch = character
        newtext += ch
    # newtext starts with " e..." here, so restore the original first letter
    text = text[0]+newtext[2:]
    print(text)
You could also do:
transdict = {letter:" "+letter.lower() for letter in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'}
transtable = str.maketrans(transdict)
text.translate(transtable).strip().capitalize()
But again I think that's outside the scope of the assignment
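For completeness, a regex version might look like the sketch below. Note that re.split("[A-Z]", text) on its own would discard the capital letters it splits on, so this sketch uses re.sub instead; the function name split_camel_case is made up for illustration.

```python
import re


def split_camel_case(text):
    """Insert a space before every capital except the first character,
    and lowercase that capital."""
    # (?<!^) prevents a match at the very start of the string,
    # so the leading capital is left untouched.
    return re.sub(r"(?<!^)([A-Z])", lambda m: " " + m.group(1).lower(), text)


print(split_camel_case("ExampleSentenceGoesHere"))
# Example sentence goes here
```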
