I am currently trying to build a Caesar cipher decrypter that automatically tries all the possible shifts and compares the results to a big list of words to see whether they are real words, so some sort of dictionary attack, I guess.
I found a list with a lot of German words, already split so that each word is on its own line. Currently, I am struggling to compare my candidate sentence with the whole word list: when the program sees that a word in my sentence is also in the word list, it should print that this is a real word and possibly the right sentence.
This is how far I am. I have not included the code that tries all 26 shifts, only the part that looks through the word list and compares it to a sentence. Maybe someone can tell me what I am doing wrong and why it doesn't work:
No idea why it doesn't work. I have also tried it with regular expressions, but nothing works. The list is really long (166k words).
There is a \n at the end of each word in the list you created from the file, so the words will never be equal to what they are compared to.
Remove the newline character before appending; you can use, for example, wordlist.append(line.rstrip()).
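As a minimal sketch of that fix, the snippet below strips each line before storing it and keeps the words in a set for fast lookups. The function names load_wordlist and real_words_in are hypothetical, and an in-memory io.StringIO stands in for the real word-list file, which is not shown in the question.

```python
import io

def load_wordlist(lines):
    """Return a set of lowercase words, one per input line, without newlines."""
    return {line.strip().lower() for line in lines if line.strip()}

def real_words_in(sentence, wordlist):
    """Return the words of `sentence` that also appear in `wordlist`."""
    return [word for word in sentence.lower().split() if word in wordlist]

# Stand-in for an opened word-list file (assumption: one word per line).
fake_file = io.StringIO("hallo\nwelt\nhaus\n")
wordlist = load_wordlist(fake_file)

print(real_words_in("Hallo Welt xyz", wordlist))
```

A set makes each membership test roughly constant time, which should help with a list of 166k words.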
I'm new to Python. After scouring the internet and going back over my study material, I still cannot work out how to find duplicates of a word within multiple sentences. My aim is to count how many times the word python occurs within these strings. I have tried the split() method and count("python"), and even tried to build a dictionary and a word_counter, which I was taught as part of the basics, but nothing in my studies has shown me anything like this before. I need to be able to display the frequency of the word, e.g. "python occurs 4 times". Any help would be very much appreciated.
python_occurs = ["welcome to our Python program", "Python is my favorite language!", "I am afraid of Pythons", "I love Python"]
A straightforward approach is to iterate over every word using split. Each word is converted to lowercase, and the number of times "python" occurs in it is counted using count.
I guess the reason your approach did not work might be that you forgot to convert the letters to lowercase.
python_occurs = ["welcome to our Python program", "Python is my favorite language!", "I am afraid of Pythons", "I love Python"]
count = 0
for sentence in python_occurs:
    for word in sentence.split():
        # lower is necessary because we want to be case-insensitive
        count += word.lower().count("python")
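Note that counting substrings also counts "Pythons". The sketch below contrasts that with whole-word counting using collections.Counter; the variable names substring_hits and word_counts are made up for illustration, and which of the two numbers is wanted depends on whether "Pythons" should count.

```python
from collections import Counter
from string import punctuation

python_occurs = ["welcome to our Python program", "Python is my favorite language!",
                 "I am afraid of Pythons", "I love Python"]

# Substring counting: "Pythons" also contains "python".
substring_hits = sum(sentence.lower().count("python") for sentence in python_occurs)

# Whole-word counting: strip surrounding punctuation, then tally exact words.
word_counts = Counter(
    word.strip(punctuation).lower()
    for sentence in python_occurs
    for word in sentence.split()
)

print(substring_hits)           # substring matches, including "Pythons"
print(word_counts["python"])    # exact-word matches only
```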
I am trying to check whether a keyword occurs in a sentence and, if so, record that keyword. I managed to write this solution, but it only works if the search term is a single word (the keyword itself). How can I improve it so that it also works when the keyword occurs inside a sentence? Here is my code:
keyword = []
for i in keywords['keyword']:
    keyword.append(i)  # this was in a dataframe after reading an xlsx file with Pandas, so I made it a list
hit = []
for i in phrase['Search term']:
    if i in keyword:
        hit.append(i)
    else:
        hit.append("blank")
phrase['Keyword'] = hit
This only works when the search term in "Phrase" is a single keyword, like "cat", but it won't work if the word "cat" is part of a sentence. Any pointers on how to improve it?
Thank you all in advance
I am not sure what you are trying to achieve here. However, I'm going to point out an issue that might help you.
In your comment you said that keyword is a list of words and phrase['Search term'] is a list of sentences.
for i in phrase['Search term']:
    if i in keyword:
        hit.append(i)
    ...
In this part of your code you are checking whether an entire sentence i is one of the single words in keyword. That logic is flawed: you need to check whether a word exists in the sentence, not the other way around.
Something like this:
for i in phrase['Search term']:
    for j in keyword:
        if j in i:
            hit.append(i)
    ...
This is an example you will need to adjust to your purpose, since now it will check word for word.
The code above may lead to undesirable behavior since it checks if a smaller string(word) exists inside a larger string(sentence). It doesn't really check for words. For example, if looking for cat in a sentence like:
this patient is catatonic
this will trigger your if statement as True. One way to reduce this is to split your sentence into a list of words and check whether the keyword is in that list, like this:
for i in phrase['Search term']:
    for j in keyword:
        if j in i.split(" "):
            hit.append(i)
    ...
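The word-level check can be wrapped in a small helper that returns exactly one result per sentence, so the result list stays the same length as the column it is assigned to. This is only a sketch: the name first_keyword and the sample data are hypothetical, and plain lists stand in for the DataFrame columns from the question.

```python
def first_keyword(sentence, keywords):
    """Return the first keyword found as a whole word in `sentence`, else "blank"."""
    words = set(sentence.lower().split())
    for keyword in keywords:
        if keyword.lower() in words:
            return keyword
    return "blank"

# Sample data standing in for keywords['keyword'] and phrase['Search term'].
keywords = ["cat", "dog"]
search_terms = ["cat", "my cat is asleep", "this patient is catatonic"]

hits = [first_keyword(term, keywords) for term in search_terms]
print(hits)
```

With the DataFrame from the question, something like phrase['Keyword'] = phrase['Search term'].apply(lambda s: first_keyword(s, keyword)) should produce the same column, although that variant has not been run against the original data.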
I'm trying to count the number of occurrences of verbal contractions in some speeches I've gathered. One particular speech looks like this:
speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")
So, in this case, I'd like to count four (4) contractions. I have a list of contractions, and here are some of the first few terms:
contractions = {"ain't": "am not; are not; is not; has not; have not",
"aren't": "are not; am not",
"can't": "cannot",...}
My code looks something like this, to begin with:
count = 0
for word in speech:
    if word in contractions:
        count = count + 1
print(count)
I'm not getting anywhere with this, however, as the code's iterating over every single letter, as opposed to whole words.
Use str.split() to split your string on whitespace:
for word in speech.split():
This will split on arbitrary whitespace; this means spaces, tabs, newlines, and a few more exotic whitespace characters, and any number of them in a row.
You may need to lowercase your words using str.lower() (otherwise Ain't won't be found, for example), and strip punctuation:
from string import punctuation
count = 0
for word in speech.lower().split():
    word = word.strip(punctuation)
    if word in contractions:
        count += 1
I use the str.strip() method here; it removes everything found in the string.punctuation string from the start and end of a word.
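Put together as a function, the approach above might look like the sketch below. The name count_contractions and the three-entry sample dictionary are assumptions made for illustration; the real contractions dictionary from the question is much longer.

```python
from string import punctuation

def count_contractions(text, contractions):
    """Count the whitespace-separated words of `text` that are keys of `contractions`."""
    count = 0
    for word in text.lower().split():
        # Strip punctuation from both ends; the apostrophe inside the word stays.
        if word.strip(punctuation) in contractions:
            count += 1
    return count

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")

# Hypothetical sample entries, not the full dictionary from the question.
sample_contractions = {"i've": "I have", "we're": "we are", "you've": "you have"}

print(count_contractions(speech, sample_contractions))
```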
You're iterating over a string, so the items are characters. To get the words from a string you can use a naive method like str.split(), which does this for you: you can then iterate over a list of strings (the words, split on the argument of str.split(); the default is to split on whitespace). There is also re.split(), which is more powerful, but I don't think you need to split the text with regexes.
At the very least you have to lowercase your string with str.lower(), or else put all possible variants (including those with capital letters) in the dictionary. I strongly recommend the first alternative; the latter isn't really practical. Removing the punctuation is also necessary. But this is still naive. If you need a more sophisticated method, you have to split the text with a word tokenizer. NLTK is a good starting point for that; see the NLTK tokenizer. But I strongly feel that this is not your main problem and doesn't really affect solving your question. :)
speech = """I've changed the path of the economy, and I've increased jobs in our own home state. We're headed in the right direction - you've all been a great help."""
# Maybe this dict makes more sense (lists as values). But for your question it doesn't matter.
contractions = {"ain't": ["am not", "are not", "is not", "has not", "have not"], "aren't": ["are not", "am not"], "i've": ["i have"]}  # ...
# With re you can define advanced regexes, but maybe
# from string import punctuation (the suggestion from Martijn Pieters' answer)
# is still enough for you.
import re

def abbreviation_counter(input_text, abbreviation_dict):
    count = 0
    # What you want is a list of words. str.split() does this job for you.
    # Splitting on whitespace is the default, so you can also omit the " ".
    # But if you really need better methods (see the answer text above), you
    # have to use a word tokenizer tool or write your own.
    for word in input_text.split(" "):
        # Also clean the word (remove ',', ';', ...) afterwards. The advantage of
        # using re over `from string import punctuation` is that you have more
        # control over what you want to remove, so you can easily add or
        # remove any punctuation mark. It could be very handy. It could also
        # be overpowered. If the latter is the case, just stick to Martijn Pieters'
        # solution.
        if re.sub(',|;', '', word).lower() in abbreviation_dict:
            count += 1
    return count

print(abbreviation_counter(speech, contractions))
# 2  (yeah, it worked - I've included "i've" in your list :)
It's a little bit frustrating to post an answer at the same time as Martijn Pieters does ;), but I hope I have still provided some value for you. That's why I've edited my answer to add some hints for future work.
A for loop in Python iterates over all elements in an iterable. In the case of strings the elements are the characters.
You need to split the string into a list (or tuple) of strings that contain the words. You can use .split(delimiter) for this.
Your problem is quite common, so Python has a shortcut: speech.split() splits at any number of spaces/tabs/newlines, so you only get your words in the list.
So your code should look like this:
count = 0
for word in speech.split():
    if word in contractions:
        count = count + 1
print(count)
speech.split(" ") works too, but it splits only on single space characters, not on tabs or newlines, and if there are double spaces you get empty strings in the resulting list.
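The difference between the two calls is easy to see on a short made-up string that contains a double space, a tab and a newline:

```python
text = "two  spaces\tand a tab\nand a newline"

# No argument: split on any run of whitespace (spaces, tabs, newlines).
words_default = text.split()

# Explicit " ": split on every single space only.
words_space = text.split(" ")

print(words_default)
print(words_space)
```

The first list contains only clean words, while the second contains an empty string from the double space and keeps the tab and newline inside two of its items.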
My friend and I are studying words and their meanings, and I need help extracting the word that we want to learn about. We type a command word from a list into the command line to tell the computer that we want the meaning or history of a word. My problem is finding out which word we typed in. Here is my code so far:
Meaning = {"meaning", "define", "history"}  # lowercase, because the tokens are lowercased below
text1 = raw_input("Text: ")
tokens = text1.lower().split()
if Meaning.intersection(tokens):
    # Here is the problem, because I don't know what word we might type,
    # so I need it to check the word after the Meaning. How can I do this?
I am using the program to type into a website and get information on the word, so I can't store each word with its definition; it would be too much work to do that for every word I know. How can I get the word that comes after the command word (e.g. after "meaning")? Thank you.
Set up your dictionary as a set of key/value pairs.
terms = {'word1': 'meaning', 'word2': 'othermeaning'}
print(terms['word1'])
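If the goal is instead to find the word typed right after a command word, as the question asks, one possible approach is to walk through the tokens and return the token that follows the first command. This is only a sketch: the names COMMANDS and word_after_command are hypothetical, and it assumes the word of interest always comes directly after the command.

```python
COMMANDS = {"meaning", "define", "history"}

def word_after_command(text, commands=COMMANDS):
    """Return (command, following_word) for the first command in `text`, or None."""
    tokens = text.lower().split()
    # Skip the last token: a command at the very end has no word after it.
    for index, token in enumerate(tokens[:-1]):
        if token in commands:
            return token, tokens[index + 1]
    return None

print(word_after_command("define serendipity"))
```

The returned word could then be passed on to whatever code queries the website.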