So, this has been my project for a long time, and I have finally made an anagram solver in Python 3.4, except it's supposed to find anagrams of the word plus one extra letter. I have worked out all the error messages; there are no more errors, it just doesn't do anything. All help appreciated. I have had a lot of helpful comments, but it is still not working. I have updated the code in the question. (Here is the file I used, with all the words of the dictionary on separate lines; it's really helpful and I had to look for something like this for months.)
file = open("C:\\my stuff\\brit-a-z.txt","r")

def anagram(word):
    for alpha in range(ord('a'), ord('z') + 1):
        newletter = chr(alpha)
        for line in file:
            ref = line.strip()
            word1 = list(word)
            list.append(newletter)
            word1_list.sort()
            ref_list = list(line)
            ref_list.sort()
            if word1_list == ref_list:
                print(line)

while True:
    inp = input()
    anagram(inp)
This should do what you need.
with open("C:\\my_folders_are_in_here\\brit-a-z.txt", 'r') as f:
    check_list = [x.strip() for x in f.readlines()]

def anagram(word):
    for alpha in range(ord('a'), ord('z') + 1):
        newletter = chr(alpha)
        for line in check_list:
            word1_list = list(word + newletter)
            word1_list.sort()
            ref_list = list(line)
            ref_list.sort()
            if word1_list == ref_list:
                print(line)

while True:
    inp = input()
    anagram(inp)
I took advantage of the chr() and ord() built-in functions to remove the long if chain that converts alpha into newletter.
Reading lines from a file in Python also includes the newline characters.
So if one line in the file is the word "the", for example, then after assigning ref = line, ref will equal "the\n" (or "the\r\n"). Your sorted ref_list then becomes ['\n', 'e', 'h', 't'].
Reading from the keyboard with input(), however, does not include a newline character. Your word1_list never contains a '\n', thus word1_list and ref_list will never be equal.
Fix: change ref = line to ref = line.strip() to remove the newline characters.
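A minimal sketch of the difference (the word "the" here is just an example):

```python
# Lines read from a file keep their trailing newline; input() does not.
line = "the\n"   # as read from a file
word = "the"     # as typed at the keyboard

print(sorted(line))                          # ['\n', 'e', 'h', 't'] -- the newline tags along
print(sorted(line) == sorted(word))          # False
print(sorted(line.strip()) == sorted(word))  # True once the newline is stripped
```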
I'd like to create a program in Python 3 to find how many times a specific word appears in txt files, and then to build an Excel table with these values.
I made this function, but at the end, when I call the function with my input, the program doesn't work; it fails with this message: unindent does not match any outer indentation level
def wordcount(filename, listwords):
    try:
        file = open(filename, "r")
        read = file.readlines()
        file.close()
        for x in listwords:
            y = x.lower()
            counter = 0
            for z in read:
                line = z.split()
                for ss in line:
                    l = ss.lower()
                    if y == l:
                        counter += 1
            print(y, counter)
Now I try to call the function with a txt file and the word to find:
wordcount("aaa.txt", 'word')
As output I'd like to see:
word 4
Thanks to everybody!
Here is an example you can use to find the number of times a specific word appears in a text file:
def searching(filename, word):
    counter = 0
    with open(filename) as f:
        for line in f:
            if word in line:
                print(word)
                counter += 1
    return counter
x = searching("filename","wordtofind")
print(x)
The output will be the word you are trying to find and the number of lines it occurs in.
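Note that `word in line` is a substring test, so searching for "cat" would also match "catalog", and a line containing the word twice is still counted once. If whole-word occurrence counts are wanted instead, a sketch (with a hypothetical filename) could compare whitespace-separated words:

```python
def searching_whole_words(filename, word):
    # Count whole-word occurrences, not substring hits or matching lines.
    counter = 0
    with open(filename) as f:
        for line in f:
            counter += line.split().count(word)
    return counter
```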
As short as possible:
def wordcount(filename, listwords):
    with open(filename) as file_object:
        file_text = file_object.read()
    return {word: file_text.count(word) for word in listwords}

for word, count in wordcount('aaa.txt', ['a', 'list', 'of', 'words']).items():
    print("Count of {}: {}".format(word, count))
Getting back to mij's comment about passing listwords as an actual list: if you pass a string to code that expects a list, Python will iterate the string as a list of characters, which can be confusing if this behaviour is unfamiliar.
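A quick illustration of that pitfall, using a toy count function similar to the one above:

```python
def wordcount_text(text, listwords):
    # Assumes listwords is a list of words; a plain string
    # would be iterated character by character instead.
    return {word: text.count(word) for word in listwords}

text = "spam and eggs"
print(wordcount_text(text, ["spam", "eggs"]))  # {'spam': 1, 'eggs': 1}
print(wordcount_text(text, "spam"))  # per character: {'s': 2, 'p': 1, 'a': 2, 'm': 1}
```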
Good afternoon guys,
Today I have been asked to write the following function:
def compareurl(url1,url2,enc,n)
This function compares two urls and returns a list containing:
[word, occ_in_url1, occ_in_url2]
where:
word ---> a word of length n
occ_in_url1 ---> times word occurs in url1
occ_in_url2 ---> times word occurs in url2
So I started writing the function; this is what I have written so far:
def compare_url(url1,url2,enc,n):
    from urllib.request import urlopen
    with urlopen('url1') as f1:
        readpage1 = f1.read()
        decodepage1 = readpage1.decode('enc')
    with urlopen('url2') as f2:
        readpage2 = f2.read()
        decodepage2 = readpage2.decode('enc')
    all_lower1 = decodepage1.lower()
    all_lower2 = decodepage2.lower()
    import string
    all_lower1nopunctuation = "".join(l for l in all_lower1 if l not in string.punctuation)
    all_lower2nopunctuation = "".join(l for l in all_lower2 if l not in string.punctuation)
    for word1 in all_lower1nopunctuation:
        if len(word1) == k:
            all_lower1nopunctuation.count(word1)
    for word2 in all_lower2nopunctuation:
        if len(word2) == k:
            all_lower2opunctuation.count(word2)
    return(word1,all_lower1nopunctuation.count(word1),all_lower2opunctuation.count(word2))
    return(word2,all_lower1nopunctuation.count(word1),all_lower2opunctuation.count(word2))
But this code doesn't work the way I thought it would; actually, it doesn't work at all.
I would also like to:
sort the returned list in decreasing order (starting from the word which occurs the most times)
if 2 words occur the same number of times, they must be returned in alphabetical order
There are some typos in your code (watch out for those in the future), but there are some Python problems (or things that can be improved) as well.
First of all, your imports should come at the top of the file:
from urllib.request import urlopen
import string
You should call urlopen with a string, and that's what you are doing, but this string is 'url1' and not 'http://...'. You don't use variables inside quotes:
with urlopen(url1) as f1:  # remove quotes
    readpage1 = f1.read()
    decodepage1 = readpage1.decode(enc)  # remove quotes
with urlopen(url2) as f2:  # remove quotes
    readpage2 = f2.read()
    decodepage2 = readpage2.decode(enc)  # remove quotes
You need to improve your all_lower1nopunctuation initialization. You are turning stackoverflow.com into stackoverflowcom, when it should actually become stackoverflow com.
# all_lower1nopunctuation = "".join(l for l in all_lower1 if l not in string.punctuation)
# the if statement should be after 'l' and before 'for'
# you should include 'else' to replace the punctuation with a space
all_lower1nopunctuation = ''.join(l if l not in string.punctuation
                                  else ' ' for l in all_lower1)
all_lower2nopunctuation = ''.join(l if l not in string.punctuation
                                  else ' ' for l in all_lower2)
Merge both for loops into one. Also add each found word to a set (a collection of unique elements).
all_lower1nopunctuation.count(word1) returns the number of times word1 appears in all_lower1nopunctuation. It doesn't increment a counter.
for word1 in all_lower1nopunctuation doesn't work as intended because all_lower1nopunctuation is a string (not a list of words), so the loop iterates over characters. Turn it into a list of words with .split(' ').
.replace('\n', '') removes all line breaks, otherwise they would be counted as words too.
# for word1 in all_lower1nopunctuation:
#     if len(word1) == k:  # also, this should be == n, not == k
#         all_lower1nopunctuation.count(word1)
# for word2 in all_lower2nopunctuation:
#     if len(word2) == k:
#         all_lower2opunctuation.count(word2)
word_set = set([])
for word in all_lower1nopunctuation.replace('\n', '').split(' '):
    if len(word) == n and word in all_lower2nopunctuation:
        word_set.add(word)  # set uses .add() instead of .append()
Now that you have a set of words that appear in both urls, you need to count how many times each word appears in each url.
The following code will build the list of tuples you asked for:
count_list = []
for final_word in word_set:
    count_list.append((final_word,
                       all_lower1nopunctuation.count(final_word),
                       all_lower2nopunctuation.count(final_word)))
Returning means the function is finished and the interpreter continues wherever it was before the function was called, so whatever comes after the return is irrelevant.
As said by RemcoGerlich.
Your code will always stop at the first return statement, so you need to merge both returns into one.
# return(word1, all_lower1nopunctuation.count(word1), all_lower2opunctuation.count(word2))
# return(word2, all_lower1nopunctuation.count(word1), all_lower2opunctuation.count(word2))
return count_list  # a list of tuples with all the words and their counts
TL;DR
from urllib.request import urlopen
import string

def compare_url(url1, url2, enc, n):
    with urlopen(url1) as f1:
        readpage1 = f1.read()
        decodepage1 = readpage1.decode(enc)
    with urlopen(url2) as f2:
        readpage2 = f2.read()
        decodepage2 = readpage2.decode(enc)
    all_lower1 = decodepage1.lower()
    all_lower2 = decodepage2.lower()
    all_lower1nopunctuation = ''.join(l if l not in string.punctuation
                                      else ' ' for l in all_lower1)
    all_lower2nopunctuation = ''.join(l if l not in string.punctuation
                                      else ' ' for l in all_lower2)
    word_set = set([])
    for word in all_lower1nopunctuation.replace('\n', '').split(' '):
        if len(word) == n and word in all_lower2nopunctuation:
            word_set.add(word)
    count_list = []
    for final_word in word_set:
        count_list.append((final_word,
                           all_lower1nopunctuation.count(final_word),
                           all_lower2nopunctuation.count(final_word)))
    return count_list

url1 = 'https://www.tutorialspoint.com/python/list_count.htm'
url2 = 'https://stackoverflow.com/a/128577/7067541'
for word_count in compare_url(url1, url2, 'utf-8', 5):
    print(word_count)
So I'm trying to do this problem
Write a program that reads a file named text.txt and prints the following to the
screen:
The number of characters in that file
The number of letters in that file
The number of uppercase letters in that file
The number of vowels in that file
I have gotten this far, but I am stuck on step 2. This is what I have so far:
file = open('text.txt', 'r')
lineC = 0
chC = 0
lowC = 0
vowC = 0
capsC = 0
for line in file:
    for ch in line:
        words = line.split()
        lineC += 1
        chC += len(ch)
for letters in file:
    for ch in line:
        print("Charcter Count = " + str(chC))
print("Letter Count = " + str(num))
You can do this using regular expressions: find all occurrences of your pattern as a list, then take the length of that list.
import re

with open('text.txt') as f:
    text = f.read()

characters = len(re.findall(r'\S', text))
letters = len(re.findall('[A-Za-z]', text))
uppercase = len(re.findall('[A-Z]', text))
vowels = len(re.findall('[AEIOUYaeiouy]', text))
The answer above uses regular expressions, which are very useful and worth learning about if you haven't used them before. Bunji's code is also more efficient, as looping through characters in a string in Python is relatively slow.
However, if you want to try doing this using just Python, take a look at the code below. A couple of points: First, wrap your open() inside a with statement, which will automatically call close() on the file when you are finished. Next, notice that Python lets you use the in keyword in all kinds of interesting ways. Anything that is a sequence can be tested with in, including strings. You could replace all of the string.xxx lines with your own strings if you would like.
import string

chars = []
with open("notes.txt", "r") as f:
    for c in f.read():
        chars.append(c)

num_chars = len(chars)
num_upper = 0
num_vowels = 0
num_letters = 0
vowels = "aeiouAEIOU"

for c in chars:
    if c in vowels:
        num_vowels += 1
    if c in string.ascii_uppercase:
        num_upper += 1
    if c in string.ascii_letters:
        num_letters += 1

print(num_chars)
print(num_letters)
print(num_upper)
print(num_vowels)
I have got this Python program which reads through a wordlist file and checks for the suffix endings, which are given in another file, using the endswith() method.
The suffixes to check for are saved in the list suffixList[].
The counts are kept in suffixCount[].
The following is my code:
fd = open(filename, 'r')
print 'Suffixes: '
x = len(suffixList)
for line in fd:
    for wordp in range(0, x):
        if word.endswith(suffixList[wordp]):
            suffixCount[wordp] = suffixCount[wordp] + 1
for output in range(0, x):
    print "%-6s %10i" % (prefixList[output], prefixCount[output])
fd.close()
The output is this :
Suffixes:
able 0
ible 0
ation 0
the program never reaches the body of this if statement:
if word.endswith(suffixList[wordp]):
You need to strip the newline:
word = ln.rstrip().lower()
The words are coming from a file so each line ends with a newline character. You are then trying to use endswith which always fails as none of your suffixes end with a newline.
I would also change the function to return the values you want:
def store_roots(start, end):
    with open("rootsPrefixesSuffixes.txt") as fs:
        lst = [line.split()[0] for line in map(str.strip, fs)
               if '#' not in line and line]
    return lst, dict.fromkeys(lst[start:end], 0)

lst, sfx_dict = store_roots(22, 30)  # list, suffix dict
Then slice from the end and see if the substring is in the dict:
with open('longWordList.txt') as fd:
    print('Suffixes: ')
    # lengths of the longest and shortest suffixes
    mx, mn = len(max(sfx_dict, key=len)), len(min(sfx_dict, key=len))
    for ln in map(str.rstrip, fd):
        for i in range(mx, mn - 1, -1):
            suf = ln[-i:]
            if suf in sfx_dict:
                sfx_dict[suf] += 1

for k, v in sfx_dict.items():
    print("Suffix = {} Count = {}".format(k, v))
Slicing the end of the string incrementally should be faster than checking every suffix string, especially if you have numerous suffixes of the same length. It does at most mx - mn + 1 lookups per word: only one substring of each length can match, so a single slice and dict lookup covers all suffixes of that length at once. If you had 20 four-character suffixes, one lookup would cover all 20.
You could use a Counter to count the occurrences of suffix:
from collections import Counter

with open("rootsPrefixesSuffixes.txt") as fp:
    List = [line.strip() for line in fp if line and '#' not in line]

suffixes = List[22:30]  # ?

with open('longWordList.txt') as fp:
    c = Counter(s for word in fp for s in suffixes
                if word.rstrip().lower().endswith(s))

print(c)
Note: add .split()[0] if lines can contain more than one word and you want to ignore everything after the first; otherwise it is unnecessary.
Write a program that reads the contents of a random text file. The program should create a dictionary in which the keys are individual words found in the file and the values are the number of times each word appears.
How would I go about doing this?
def main():
    c = 0
    dic = {}
    words = set()
    inFile = open('text2', 'r')
    for line in inFile:
        line = line.strip()
        line = line.replace('.', '')
        line = line.replace(',', '')
        line = line.replace("'", '')  # strips the punctuation
        line = line.replace('"', '')
        line = line.replace(';', '')
        line = line.replace('?', '')
        line = line.replace(':', '')
        words = line.split()
        for x in words:
            for y in words:
                if x == y:
                    c += 1
            dic[x] = c
    print(dic)
    print(words)
    inFile.close()

main()
Sorry for the vague question; I've never asked any questions here before. This is what I have so far. Also, this is the first programming I've ever done, so I expect it to be pretty terrible.
with open('path/to/file') as infile:
    # code goes here
That's how you open a file
for line in infile:
    # code goes here
That's how you read a file line-by-line
line.strip().split()
That's how you split a line into (white-space separated) words.
some_dictionary['abcd']
That's how you access the key 'abcd' in some_dictionary.
Questions for you:
What does it mean if you can't access the key in a dictionary?
What error does that give you? Can you catch it with a try/except block?
How do you increment a value?
Is there some function that GETS a default value from a dict if the key doesn't exist?
For what it's worth, there's also a type in the collections module that does almost exactly this, but since this is pretty obviously homework, it won't fulfill your assignment requirements anyway. If you're interested, try to figure out what it is :)
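As a hint for the questions above (not the full assignment), the two standard increment patterns look like this: dict.get supplies a default when a key is missing, and the KeyError raised by a missing key can be caught with try/except:

```python
counts = {}
for word in "the cat and the hat".split():
    counts[word] = counts.get(word, 0) + 1  # get() returns 0 for unseen words
print(counts)  # {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}

# The same increment written with try/except:
try:
    counts['dog'] += 1
except KeyError:  # 'dog' was never seen, so start it at 1
    counts['dog'] = 1
```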
There are at least three different approaches to adding a new word to the dictionary and counting the number of occurrences in this file.
def add_element_check1(my_dict, elements):
    for e in elements:
        if e not in my_dict:
            my_dict[e] = 1
        else:
            my_dict[e] += 1

def add_element_check2(my_dict, elements):
    for e in elements:
        if e not in my_dict:
            my_dict[e] = 0
        my_dict[e] += 1

def add_element_except(my_dict, elements):
    for e in elements:
        try:
            my_dict[e] += 1
        except KeyError:
            my_dict[e] = 1

my_words = {}
with open('pathtomyfile.txt', 'r') as in_file:
    for line in in_file:
        words = [word.strip().lower() for word in line.strip().split()]
        add_element_check1(my_words, words)
        # or add_element_check2(my_words, words)
        # or add_element_except(my_words, words)
If you are wondering which is the fastest, the answer is: it depends. It depends on how often a given word occurs in the file. The try/except version only pays a penalty when the exception is actually raised, which happens the first time each word is seen, so it tends to win when most words repeat many times; if most words occur only a few times, the frequent KeyErrors make it the slower choice.
I have done some simple benchmarks here
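If you want to reproduce such a benchmark yourself, here is a rough sketch using the standard timeit module (the numbers vary by machine and data; this only illustrates the setup, it is not a definitive measurement):

```python
import timeit

# Repetitive sample data: most words are already in the dict,
# which is the favourable case for the try/except approach.
words = ("the quick brown fox jumps over the lazy dog " * 200).split()

def with_check():
    d = {}
    for w in words:
        if w not in d:
            d[w] = 0
        d[w] += 1
    return d

def with_except():
    d = {}
    for w in words:
        try:
            d[w] += 1
        except KeyError:
            d[w] = 1
    return d

assert with_check() == with_except()  # both strategies agree on the counts
print("membership check:", timeit.timeit(with_check, number=100))
print("try/except      :", timeit.timeit(with_except, number=100))
```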
This is a perfect job for the built-in Python collections module. From it, you can import Counter, which is a dictionary subclass made for just this.
How you want to process your data is up to you. One way to do it would be something like this:
from collections import Counter

# Open your file and read its contents
with open("yourfile.txt", "r") as infile:
    textData = infile.read()

# Replace characters you don't want with empty strings,
# then split on whitespace
textData = textData.replace(".", "")
textData = textData.replace(",", "")
textList = textData.split(" ")

# Put your data into the Counter container datatype
dic = Counter(textList)

# Print out the results
for key, value in dic.items():
    print("Word: %s\n Count: %d\n" % (key, value))
Hope this helps!
Matt