I'm working on PythonChallenge #3. I've got a huge block of text that I have to sort through. I am trying to find a sequence in which the first and last three letters are caps, and the middle one is lowercase.
My function loops through the text. The variable block stores the seven letters that are currently being looped through. There's a variable, toPrint, which gets turned on and off based on whether the letters in block correspond to my pattern (AAAaAAA). Based on the last block printed according to my function, my loop stops early in my text. I have no idea why this is happening and if you could help me figure this out, that would be great.
text = """kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJ"""
words = []
for i in text:
    toPrint = True
    block = text[text.index(i):text.index(i)+7]
    for b in block[:3]:
        if b.isupper() == False:
            toPrint = False
    for b in block[3]:
        if b.islower() == False:
            toPrint = False
    for b in block[4:]:
        if b.isupper() == False:
            toPrint = False
    if toPrint == True and block not in words:
        words.append(block)
    print (block)
print (words)
With Regex:
This is a really good time to use regex: it's fast, clearer, and doesn't require a bunch of nested if statements.
import re
text = """kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJ"""
print(re.search(r"[A-Z]{3}[a-z][A-Z]{3}", text).group(0))
Explanation of regex:
[A-Z]{3} ---> matches any 3 uppercase letters
[a-z] -------> matches a single lowercase letter
[A-Z]{3} ---> matches 3 more uppercase letters
Without Regex:
If you really don't want to use regex this is how you could do it:
text = """kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJ"""
for i, _ in enumerate(text[:-6]): # loop through the index of each char (not including the last 6)
    sevenCharacters = text[i:i+7] # create chunk of seven characters
    shouldBeCapital = sevenCharacters[0:3] + sevenCharacters[4:7] # combine all the chars that should be capital into one string
    if all(char.isupper() for char in shouldBeCapital): # make sure all those characters are indeed capital
        if sevenCharacters[3].islower(): # make sure the middle character is lowercase
            print(sevenCharacters)
I think your first problem is that you are using str.index(). Like find(), the .index() method of a string returns the index of the first match that is found.
Thus, in your example, whenever you search for 'x' you will get the index of the first 'x' found, and so on. Any character that occurs more than once sends you back to its first occurrence, so the blocks that start at the later occurrences are never examined.
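To make the str.index() behaviour concrete, here is a minimal sketch on a made-up string ("abcabc" is only an illustration, not part of the challenge text). It compares the positions str.index() reports with the positions enumerate() supplies:

```python
text = "abcabc"  # hypothetical sample with repeated characters

# str.index always reports the first occurrence, even for the second "b".
first_hits = [text.index(ch) for ch in text]
print(first_hits)  # [0, 1, 2, 0, 1, 2]

# enumerate supplies the real position of each character instead.
real_positions = [i for i, _ in enumerate(text)]
print(real_positions)  # [0, 1, 2, 3, 4, 5]
```

Every block built from text.index(i) for a repeated character is therefore a copy of an earlier block, which is why the loop appears to stop early.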
In order to keep the same structure (which isn't necessary- there is an answer posted using enumerate that I prefer myself) I implemented a queuing approach with your block variable. Each iteration, a character is dropped from the front of block, while the new character is appended to the end.
I also cleaned up some of your needless comparisons with False. You will find that this is not only inefficient, it is frequently wrong, because many of the "boolean" activities you perform will not be on actual boolean values. Get out of the habit of spelling out True/False. Just use if c or if not c.
Here's the result:
text = """kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJ"""
words = []
block = '.' + text[0:6]
for i in text[6:]:
    block = block[1:] + i # Drop 1st char, append 'i'
    toPrint = True
    for b in block[:3]:
        if not b.isupper():
            toPrint = False
    if not block[3].islower():
        toPrint = False
    for b in block[4:]:
        if not b.isupper():
            toPrint = False
    if toPrint and block not in words:
        words.append(block)
print (words)
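On the point about comparing with True/False: a small sketch of why the comparison can give a different answer from a plain truth test. The two helper names are hypothetical and exist only for this illustration.

```python
def is_true_by_comparison(value):
    # Compares against the bool True, which equals the integer 1.
    return value == True

def is_true_by_truthiness(value):
    # What "if value:" actually tests.
    return bool(value)

for sample in (1, 2, "abc", [0]):
    print(repr(sample), is_true_by_comparison(sample), is_true_by_truthiness(sample))
```

Only 1 passes the comparison, while all four samples are truthy, so "if c == True" and "if c" are not interchangeable once c is not a real boolean.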
If I understood your question correctly, there is no need for a loop. This simple code finds the required sequences.
# Use this code
text = """kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJ"""
import re
print(re.findall("[A-Z]{3}[a-z][A-Z]{3}", text))
Task
Write a program that will decode the secret message by reversing text
between square brackets. The message may contain nested brackets (that
is, brackets within brackets, such as One[owT[Three[ruoF]]]). In
this case, innermost brackets take precedence, similar to parentheses
in mathematical expressions, e.g. you could decode the aforementioned
example like this:
One[owT[Three[ruoF]]]
One[owT[ThreeFour]]
One[owTruoFeerhT]
OneThreeFourTwo
In order to make your own task slightly easier and less tricky, you
have already replaced all whitespaces in the original text with
underscores (“_”) while copying it from the paper version.
Input description
The first and only line of the standard input
consists of a non-empty string of up to 2 · 10^6 characters which may
be letters, digits, basic punctuation (“,.?!’-;:”), underscores (“_”)
and square brackets (“[]”). You can safely assume that all square
brackets are paired correctly, i.e. every opening bracket has exactly
one closing bracket matching it and vice versa.
Output description
The standard output should contain one line – the
decoded secret message without any square brackets.
Example
For sample input:
A[W_[y,[]]oh]o[dlr][!]
the correct output is:
Ahoy,_World!
Explanation
This example contains empty brackets. Of course, an empty string, when
reversed, remains empty, so we can simply ignore them. Then, as
previously, we can decode this example in stages, first reversing the
innermost brackets to obtain A[W_,yoh]o[dlr][!]. Afterwards, there
are no longer any nested brackets, so the remainder of the task is
trivial.
Below is my program that doesn't quite work
word = input("print something: ")
word_reverse = word[::-1]
while("[" in word and "]" in word):
    open_brackets_index = word.index("[")
    close_brackets_index = word_reverse.index("]")*(-1)-1
    # print(word)
    # print(open_brackets_index)
    # print(close_brackets_index)
    reverse_word_into_quotes = word[open_brackets_index+1:close_brackets_index:][::-1]
    word = word[:close_brackets_index]
    word = word[:open_brackets_index]
    word = word+reverse_word_into_quotes
    word = word.replace("[","]").replace("]","[")
    print(word)
print(word)
Unfortunately my code only works with one pair of brackets and I don't know how to fix it.
Thank you in advance for your help
Assuming the re module can be used, this code does the job:
import re
text = 'A[W_[y,[]]oh]o[dlr][!]'
# This scary regular expression does all the work:
# It says find a sequence that starts with [ and ends with ] and
# contains anything BUT [ and ]
pattern = re.compile(r'\[([^\[\]]*)\]')
while True:
    m = re.search(pattern, text)
    if m:
        # Here a single pattern like [String], if any, is replaced with gnirtS
        text = re.sub(pattern, m[1][::-1], text, count=1)
    else:
        break
print(text)
Which prints this line:
Ahoy,_World!
I realize that my previous answer has been accepted but, for completeness, I'm submitting a second solution that does NOT use the re module:
text = 'A[W_[y,[]]oh]o[dlr][!]'
def find_pattern(text):
    # Find [...] and return the locations of [ (start) ] (end)
    # and the in-between str (content)
    content = ''
    for i,c in enumerate(text):
        if c == '[':
            content = ''
            start = i
        elif c == ']':
            end = i
            return start, end, content
        else:
            content += c
    return None, None, None
while True:
    start, end, content = find_pattern(text)
    if start is None:
        break
    # Replace the content between [] with its reverse
    text = "".join((text[:start], content[::-1], text[end+1:]))
print(text)
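Both answers above rescan the text from the beginning for every bracket pair, which may be slow for inputs near the stated limit of 2 · 10^6 characters. As an alternative, here is a minimal sketch of a stack-based version that walks the message once; the function name decode is an assumption made for this illustration, and it relies on the task's guarantee that brackets are paired correctly.

```python
def decode(message):
    # Each stack entry collects the characters of one open bracket level;
    # the bottom entry is the text outside all brackets.
    stack = [[]]
    for ch in message:
        if ch == '[':
            stack.append([])          # start a new nested level
        elif ch == ']':
            inner = stack.pop()       # finished level, already decoded inside
            inner.reverse()           # reverse it as the task requires
            stack[-1].extend(inner)   # hand it to the enclosing level
        else:
            stack[-1].append(ch)
    return ''.join(stack[0])

print(decode('A[W_[y,[]]oh]o[dlr][!]'))  # Ahoy,_World!
print(decode('One[owT[Three[ruoF]]]'))   # OneThreeFourTwo
```

Deeply nested input still costs one reversal per level, so this is a sketch of the idea rather than a tuned solution.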
I'm trying to go through a list to find letter combinations that don't exist in English. After a fair amount of arguing, I have a word list that I can mess with. Each word is listed as 'word\n' since each word is on a line. If I want to find, say, the word 'winter', testing with in works, but only if I look for 'winter\n'. I can't look for just 'winter', so I can't find individual letter pairs, which is the goal.
There's over a quarter million items, so I can't cycle through the list every time, it would take ages. I don't care about index, I just need a true/false of if a letter pair is anywhere in the list.
Sorry if this was a bit rambly, I hope I got my point across. Thanks!
Assuming you don't want to alter your wordlist, it sounds like you're looking for something like this:
def search(word_list, word): # word_list is your list of words, word is the word you're searching for
    for w in word_list: # iterate over the list
        if w.startswith(word): # check if any of them start with the word you're looking for
            return True # return true if a match is found
    return False # return false if no matches are found
If you instead want to find a substring anywhere in a word instead of at the beginning, replace w.startswith(word) with word in w.
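Since the stated goal is letter pairs and the list has over a quarter million entries, one option is to walk the list once, collect every adjacent pair into a set, and then answer each true/false question with a set lookup. The following is a minimal sketch under the assumption that only the 26 lowercase ASCII letters matter; the function names and the two-word sample list are hypothetical.

```python
from itertools import product
from string import ascii_lowercase

def seen_pairs(word_list):
    # Collect every adjacent two-letter combination that occurs in any word.
    pairs = set()
    for raw in word_list:
        word = raw.strip().lower()    # drop the trailing '\n'
        for a, b in zip(word, word[1:]):
            pairs.add(a + b)
    return pairs

def missing_pairs(word_list):
    # All of the 26 * 26 lowercase pairs that never occur in the list.
    seen = seen_pairs(word_list)
    return [a + b for a, b in product(ascii_lowercase, repeat=2)
            if a + b not in seen]

words = ['winter\n', 'summer\n']      # stand-in for the real word list
pairs = seen_pairs(words)
print('nt' in pairs)  # True
print('qz' in pairs)  # False
```

The list is scanned only once, and each later check is a constant-time membership test on the set.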
There are several ways to do that but the easiest one is like this:
flag = True
STRING = 'YOUR STRING'
def check(letter):
    # True if letter is a printable, non-space ASCII character
    for k in range(33, 127):
        if chr(k) == letter:
            return True
    return False
for i in STRING:
    if not check(i):
        flag = False  # found a character outside the range
        break
The reason for 33 and 127 in the for loop is that codes 33 through 126 are the ASCII codes for English letters and other printable characters (such as ?, !, *, (, ), etc.).
Notice: This code is just for one string!
You can also use the re library to do that.
You can create a variable called pattern like this:
pattern = '[A-Za-z]'
This pattern matches any English letter.
And then:
import re
new_string = re.sub(pattern, '', STRING)
if new_string == '':
    flag = True
else:
    flag = False
The sub method works like replace: you give it a pattern, the replacement, and the string to search.
So we replace all of the English letters in the string with '', and when there is nothing left it means that the string is made only of English letters.
I'm not sure about the exact syntax for re, so take a look at the docs.
If you are looking for a fast algorithm, do not use the first way, because it runs up to 94 comparisons for every character of a single string (not a list).
I'm a newbie in Python, so I have a question. I want to replace a letter in a word if it is the first letter appearing again later in the word. I also want to use input to get the word from the user. I'll present the problem using an example:
word = 'restart'
After changes the word should be like this:
word = 'resta$t'
I tried a couple of ideas but I always got stuck. Is there a simple solution for this?
Thanks in advance.
EDIT: In response to Simas Joneliunas
It's not my homework. I've just finished reading some basic Python tutorials and I found some questions that I couldn't solve on my own. My first thought was to split the word into single letters and then find the position of the letter I want to replace with "$". I wrote the code below, but I couldn't come up with a way to get to a specific position and replace the letter there.
word = 'restart'
how_many = {}
for x in word:
    if x in how_many:
        how_many[x] += 1
    else:
        how_many[x] = 1
for y in how_many:
    if how_many[y] > 0:
        print(y,how_many[y])
Using str.replace:
s = "restart"
new_s = s[0] + s[1:].replace(s[0], "$")
Output:
'resta$t'
Try:
"".join(["$" if ch in word[:i] else ch for i, ch in enumerate(word)])
enumerate iterates through the string (i.e. a list of characters) and keeps a running index of the iteration
word[:i] checks the list of chars until the current index, i.e. previously appeared characters
"$" if ch in word[:i] else ch replaces the character at the current position with $ if it has appeared earlier; otherwise it keeps the character. Note that this masks every repeated character, not only the first letter, so 'restart' becomes 'resta$$'.
"".join() joins the list of characters into a single string.
This is where the Python console is handy, because it lets you experiment. Since you have to keep track of which letters have been seen, for a good visual I would put the alphabet in a list. Then, in the loop, remove the current letter from the list. If the letter is no longer in the list, replace it with $.
So the first thing to do in the loop is check whether the letter is in the list: if it is, remove it from the list; if it isn't, replace the letter with $ as in the example above.
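The alphabet-list idea described above can be sketched as follows. This is only an illustration under the assumption that the word consists of lowercase ASCII letters; the function name mask_repeats is hypothetical. Note that it masks every repeated letter, so it differs from the str.replace answer, which masks only repeats of the first letter.

```python
from string import ascii_lowercase

def mask_repeats(word):
    remaining = list(ascii_lowercase)   # letters that have not been seen yet
    out = []
    for ch in word:
        if ch in remaining:
            remaining.remove(ch)        # first sighting: keep the letter
            out.append(ch)
        else:
            out.append('$')             # seen before (or not a-z): mask it
    return ''.join(out)

print(mask_repeats('restart'))  # resta$$
```

To read the word from the user, the call could be mask_repeats(input()).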
I want to find out what words can be formed using the names of musical notes.
This question is very similar: Python code that will find words made out of specific letters. Any subset of the letters could be used
But my alphabet also contains "fis","cis" and so on.
letters = ["c","d","e","f","g","a","h","c","fis","cis","dis"]
I have a really long word list with one word per line and want to use
with open(...) as f:
    for line in f:
        if
to check if each word is part of that "language" and then save it to another file.
my problem is how to alter
>>> import re
>>> m = re.compile('^[abilrstu]+$')
>>> m.match('australia') is not None
True
>>> m.match('dummy') is not None
False
>>> m.match('australian') is not None
False
so it also matches with "fis","cis" and so on.
e.g. "fish" is a match but "ifsh" is not a match.
I believe ^(fis|cis|dis|[abcfhg])+$ will do the job.
Some deconstruction of what's going on here:
| works like an OR conjunction
[...] denotes "any symbol from what's inside the brackets"
^ and $ stand for beginning and end of line, respectively
+ stands for "1 or more times"
( ... ) stands for grouping, needed to apply the +/*/{} modifiers. Without grouping, such modifiers apply to the closest expression on the left
Altogether this "reads" as "the whole string is one or more repetitions of fis/cis/dis or one of abcfhg"
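The same kind of pattern can also be built from a list of units instead of being typed by hand. Below is a minimal sketch that uses the letters list from the question; the helper name build_matcher is an assumption made for this illustration.

```python
import re

letters = ["c", "d", "e", "f", "g", "a", "h", "c", "fis", "cis", "dis"]

def build_matcher(units):
    # Longer units are listed first so they are tried first; the regex
    # engine backtracks anyway, so this ordering is a convenience only.
    ordered = sorted(set(units), key=len, reverse=True)
    alternation = "|".join(re.escape(u) for u in ordered)
    return re.compile(rf"^(?:{alternation})+$")

matcher = build_matcher(letters)
print(matcher.match("fish") is not None)   # True
print(matcher.match("ifsh") is not None)   # False
```

re.escape keeps the sketch safe if a unit ever contains a character that is special in regular expressions.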
This function works, and it doesn't use any external libraries:
def func(word, letters):
    # Remove the longest units first, then check whether anything is left
    for l in sorted(letters, key=len, reverse=True):
        word = word.replace(l, "")
    return not word
It works because if word == "", then it has been decomposed into your letters.
Update:
It seems that my explanation wasn't clear. WORD.replace(LETTER, "") replaces the note/LETTER in WORD with nothing. Here is an example:
func("banana", {'na'})
it will replace every 'na' in "banana" by nothing ('')
the result after this is "ba", which is not a note
not "" is True and not "ba" is False, because an empty string is falsy.
here is another example :
func("banana", {'na', 'chicken', 'b', 'ba'})
it will replace every 'chicken' in "banana" by nothing ('')
the result after this is "banana"
it will replace every 'ba' in "banana" by nothing ('')
the result after this is "nana"
it will replace every 'na' in "nana" by nothing ('')
the result after this is ""
it will replace every 'b' in "" by nothing ('')
the result after this is ""
not "" is True ==> HURRAY IT IS A MELODY !
Note: the units are sorted by length because otherwise the second example would not have worked. Deleting "b" first would eventually leave "a", which can't be decomposed into notes.
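One caveat with removing units by replacement is that a removal can join two fragments into a new unit: removing 'fis' from 'dfisis' leaves 'dis', although 'dfisis' itself cannot be split into notes. As an alternative, here is a minimal sketch that checks position by position whether the word can be split; the name can_decompose is hypothetical, and the units are the ones from the question.

```python
def can_decompose(word, units):
    # reachable[i] is True when word[:i] can be written as a sequence of units.
    reachable = [True] + [False] * len(word)
    for i in range(len(word)):
        if not reachable[i]:
            continue
        for unit in units:
            if word.startswith(unit, i):
                reachable[i + len(unit)] = True
    return reachable[len(word)]

units = {"c", "d", "e", "f", "g", "a", "h", "fis", "cis", "dis"}
print(can_decompose("fish", units))    # True
print(can_decompose("dfisis", units))  # False
```

An empty word counts as decomposable in this sketch; filter empty lines beforehand if that is not wanted.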
You can calculate the number of letters of all units (names of musical notes), which are in the word, and compare this number to the length of the word.
from collections import Counter
units = {"c","d","e","f","g","a","h", "fis","cis","dis"}
def func(word, units=units):
    letters_count = Counter()
    for unit in units:
        num_of_units = word.count(unit)
        letters_count[unit] += num_of_units * len(unit)
        if len(unit) == 1:
            continue
        # if the unit consists of more than 1 letter (e.g. dis)
        # check if these letters are in one letter units
        # if yes, subtract the number of repeating letters
        for letter in unit:
            if letter in units:
                letters_count[letter] -= num_of_units
    return len(word) == sum(letters_count.values())
print(func('disc'))
print(func('disco'))
# True
# False
A solution with tkinter window opening to choose file:
import re
from tkinter import filedialog as fd
m = re.compile('^(fis|ges|gis|as|ais|cis|des|es|dis|[abcfhg])+$')
matches = list()
filename = fd.askopenfilename()
with open(filename) as f:
    for line in f:
        if m.match(str(line).lower()) is not None:
            matches.append(line[:-1])
print(matches)
This answer was posted as an edit to the question find all words in a certain alphabet with multi character letters by the OP Nivatius under CC BY-SA 4.0.
I have a sentence that I want to parse to check for some conditions:
a) If there is a period and it is followed by a whitespace followed by a lowercase letter
b) If there is a period internal to a sequence of letters with no adjacent whitespace (i.e. www.abc.com)
c) If there is a period followed by a whitespace followed by an uppercase letter and preceded by a short list of titles (i.e. Mr., Dr. Mrs.)
Currently I am iterating through the string (line) and using the next() function to see whether the next character is a space or lowercase, etc. And then I just loop through the line. But how would I check to see what the next, next character would be? And how would I find the previous ones?
line = "This is line.1 www.abc.com. Mr."
t = iter(line)
b = next(t)
for i in line[:len(line)-1]:
    a = next(t)
    if i == "." and (a.isdigit()): #for example, this checks to see if the value after the period is a number
        print("True")
Any help would be appreciated. Thank you.
Regular expressions is what you want.
Since you're going to check for a pattern in a string, you can make use of Python's built-in support for regular expressions through the re library.
Example:
# To check if there is a period internal to a sequence of letters with no adjacent whitespace
import re
text = 'www.google.com'
pattern = r'.*\..*'
obj = re.compile(pattern)
if obj.search(text):
    print("Pattern matched")
Similarly generate patterns for the conditions you want to check in your string.
# If there is a period and it is followed by a whitespace followed by a lowercase letter
regex = r'.*\. [a-z].*'
You can generate and test your regular expressions online using this simple tool
Read more extensively about re library here
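For the three conditions in the question, somewhat tighter patterns could look like the sketch below. The pattern names, the check function, and the title list (Mr, Mrs, Dr) are assumptions made for illustration; extend the list to whatever titles you need.

```python
import re

# a) period, one whitespace character, then a lowercase letter
PERIOD_THEN_LOWER = re.compile(r"\.\s[a-z]")
# b) period with a letter directly on both sides (e.g. www.abc.com)
INTERNAL_PERIOD = re.compile(r"[A-Za-z]\.[A-Za-z]")
# c) a known title, period, whitespace, then an uppercase letter
TITLE_THEN_UPPER = re.compile(r"\b(?:Mr|Mrs|Dr)\.\s[A-Z]")

def check(line):
    # Report which of the three conditions occur somewhere in the line.
    return {
        "a": bool(PERIOD_THEN_LOWER.search(line)),
        "b": bool(INTERNAL_PERIOD.search(line)),
        "c": bool(TITLE_THEN_UPPER.search(line)),
    }

print(check("Visit www.abc.com. then call Dr. Smith."))
print(check("See Mr. Jones"))
```

Using search rather than match lets each pattern hit anywhere in the line, so the leading and trailing .* are not needed.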
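You can use multiple next operations, each on its own iterator, to look further ahead
line = "This is line.1 www.abc.com. Mr."
t = iter(line[1:]) # runs one character ahead of i
u = iter(line[2:]) # runs two characters ahead of i
for i in line:
    a = next(t, "") # the next character, or "" past the end
    c = next(u, "") # the next-next character, or "" past the end
    if i == "." and (a.isdigit()): #for example, this checks to see if the value after the period is a number
        print("True")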
You can get previous ones by saving your iterations to a temporary list
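Saving the previous characters can be sketched with a bounded collections.deque in place of a plain temporary list, with slicing used for the look-ahead. The function name periods_with_context and its before/after parameters are hypothetical.

```python
from collections import deque

def periods_with_context(line, before=2, after=2):
    """Yield (previous_chars, next_chars) for every period in line."""
    history = deque(maxlen=before)   # remembers only the last few characters
    for index, ch in enumerate(line):
        if ch == ".":
            following = line[index + 1:index + 1 + after]
            yield "".join(history), following
        history.append(ch)

for prev, nxt in periods_with_context("This is line.1 www.abc.com. Mr."):
    print(repr(prev), repr(nxt))
```

Because slicing never raises past the end of the string, the last period simply gets an empty next_chars value.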