Print frequency of words in a sentence on a single line - python

I have a sentence "The quick fox jumps over the lazy dog", and I have counted the number of times each word occurs in this sentence. The output should be like this:
brown:1,dog:1,fox:1,jumps:1,lazy:1,over:1,quick:1,the:2
There should be no spaces in the output, and each word:count pair should be separated by a comma.
The output from my program looks like this:
,brown:1,dog:1,fox:1,jumps:1,lazy:1,over:1,quick:1,the:2
I find that there is a comma placed before 'brown'. Is there an easier way to print this?
import os
import sys

filename = os.path.basename(path)
with open(filename, 'r+') as f:
    fline = f.read()
fwords = fline.split()
allwords = [word.lower() for word in fwords]
sortwords = list(set(allwords))
r = sorted(sortwords, key=str.lower)
finalwords = ','.join(r)
sys.stdout.write(str(finalwords))
print '\n'
# count each word
countlist = {}
for word in allwords:
    try: countlist[word] += 1
    except KeyError: countlist[word] = 1
# print the counts, sorted by word
for c, num in sorted(countlist.items()):
    sys.stdout.write(",{:}:{:}".format(c, num))

A couple of alternate ways of building the count dictionary. First, a one-liner:
countlist = {word:allwords.count(word) for word in allwords}
As pointed out by DSM, that method can be slow with long lists. An alternative is to use a defaultdict:
from collections import defaultdict
countlist = defaultdict(int)
for word in allwords:
    countlist[word] += 1
For output, join the individual word:count strings with ',', which avoids the leading comma:
sys.stdout.write(",".join(["{:}:{:}".format(key, value) for key, value in countlist.items()]))

Related

Python treat words with commas the same as those without in a dictionary

I am making a program that reads a file and builds a dictionary showing how many times each word has been used:
filename = 'for_python.txt'
with open(filename) as file:
    contents = file.read().split()
dict = {}
for word in contents:
    if word not in dict:
        dict[word] = 1
    else:
        dict[word] += 1
dict = sorted(dict.items(), key=lambda x: x[1], reverse=True)
for i in dict:
    print(i[0], i[1])
It works, but it treats words that have commas attached to them as different words, which I do not want. Is there a simple and efficient way to fix this?
Remove all the commas before splitting:
filename = 'for_python.txt'
with open(filename) as file:
    contents = file.read().replace(",", "").split()
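If it helps to see it in context, here is a hedged sketch of the full program with only that read line changed; the dictionary is named counts here so it doesn't shadow the dict builtin, as another answer below suggests:
filename = 'for_python.txt'
with open(filename) as file:
    # strip commas before splitting so "word," and "word" count as the same word
    contents = file.read().replace(",", "").split()

counts = {}
for word in contents:
    counts[word] = counts.get(word, 0) + 1

# print words in descending order of frequency
for word, count in sorted(counts.items(), key=lambda x: x[1], reverse=True):
    print(word, count)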
You are splitting the data on whitespace but not doing the same for commas. You can split each word further on commas. Here's how:
...
for word in contents:
    new_words = word.split(',')
    for new_word in new_words:
        if new_word not in dict:
            dict[new_word] = 1
        else:
            dict[new_word] += 1
...
I'd suggest you strip() the various punctuation characters from each word before using it. Also, don't use the builtin name dict; it's the dictionary constructor.
import string

words = {}
for word in contents:
    word = word.strip(string.punctuation)
    if word not in words:
        words[word] = 1
    else:
        words[word] += 1
For what it's worth, collections.Counter does exactly this job:
import string
from collections import Counter

filename = 'test.txt'
with open(filename) as file:
    contents = file.read().split()

words = Counter(word.strip(string.punctuation) for word in contents)

for k, v in words.most_common():   # all words, in descending order of occurrence count
    print(k, v)
for k, v in words.most_common(5):  # only the 5 most common words
    print(k, v)

How to count one specific word in Python?

I want to count a specific word in the file.
For example how many times does 'apple' appear in the file.
I tried this:
#!/usr/bin/env python
import re
logfile = open("log_file", "r")
wordcount = {}
for word in logfile.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for k, v in wordcount.items():
    print k, v
and tried replacing 'word' with 'apple', but it still counts every word in the file.
Any advice would be greatly appreciated. :)
You could just use str.count() since you only care about occurrences of a single word:
with open("log_file") as f:
contents = f.read()
count = contents.count("apple")
However, to avoid some corner cases, such as erroneously counting words like "applejack", I suggest that you use a regex:
import re

with open("log_file") as f:
    contents = f.read()
count = sum(1 for match in re.finditer(r"\bapple\b", contents))
\b in the regex ensures that the pattern begins and ends on a word boundary (as opposed to a substring within a longer string).
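As a quick, hedged illustration of the difference on a toy in-memory string (not from the original question):
import re

text = "apple applejack pineapple apple"

print(text.count("apple"))                  # 4 -- substring matches, including applejack/pineapple
print(len(re.findall(r"\bapple\b", text)))  # 2 -- only whole-word matches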
If you only care about one word then you do not need to create a dictionary to keep track of every word count. You can just iterate over the file line-by-line and find the occurrences of the word you are interested in.
#!/usr/bin/env python
logfile = open("log_file", "r")
wordcount = 0
my_word = "apple"
for line in logfile:
    # count every occurrence on the line, not just whether the line contains it
    wordcount += line.split().count(my_word)
print my_word, wordcount
However, if you also want to count all the words and just print the count for the one you are interested in, then these minor changes to your code should work:
#!/usr/bin/env python
logfile = open("log_file", "r")
wordcount = {}
for word in logfile.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
# print only the count for my_word instead of iterating over the entire dictionary
my_word = "apple"
print my_word, wordcount[my_word]
You can use a Counter for this:
from collections import Counter

with open("log_file", "r") as logfile:
    word_counts = Counter(logfile.read().split())
print word_counts.get('apple')
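One caveat worth noting (not from the original answer): Counter.get('apple') returns None if the word never appears. Indexing the Counter directly, or passing a default to get(), gives 0 instead:
from collections import Counter

word_counts = Counter("pear plum pear".split())

print(word_counts.get('apple'))     # None -- 'apple' is absent
print(word_counts.get('apple', 0))  # 0 -- explicit default
print(word_counts['apple'])         # 0 -- a Counter returns 0 for missing keys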
This is an example of counting a word in a list of words. Reading from a file would be pretty much the same.
def count(word, array):
    n = 0
    for x in array:
        if x == word:
            n += 1
    return n

text = 'apple orange kiwi apple orange grape kiwi apple apple'
ar = text.split()
print(count('apple', ar))
def Freq(x, y):  # x is the file name; y is unused
    d = {}
    open_file = open(x, "r")
    lines = open_file.readlines()
    for line in lines:
        word = line.lower()
        words = word.split()
        for i in words:
            if i in d:
                d[i] = d[i] + 1
            else:
                d[i] = 1
    print(d)
fi=open("text.txt","r")
cash=0
visa=0
amex=0
for line in fi:
k=line.split()
print(k)
if 'Cash' in k:
cash=cash+1
elif 'Visa' in k:
visa=visa+1
elif 'Amex' in k:
amex=amex+1
print("# persons paid by cash are:",cash)
print("# persons paid by Visa card are :",visa)
print("#persons paid by Amex card are :",amex)
fi.close()
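Purely as a hedged alternative sketch, the same tally can be kept with collections.Counter, still assuming each line names at most one payment method, as the elif chain above does:
from collections import Counter

payments = Counter()
with open("text.txt", "r") as fi:
    for line in fi:
        words = line.split()
        # count the first payment keyword found on the line, if any
        for method in ('Cash', 'Visa', 'Amex'):
            if method in words:
                payments[method] += 1
                break

print("# persons paid by cash are:", payments['Cash'])
print("# persons paid by Visa card are:", payments['Visa'])
print("# persons paid by Amex card are:", payments['Amex'])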

How to create a dictionary for a text file

My program opens a file and counts the words contained in it, but I want to create a dictionary consisting of all the unique words in the text.
For example, if the word 'computer' appears three times, I want that to count as one unique word.
def main():
    file = input('Enter the name of the input file: ')
    infile = open(file, 'r')
    file_contents = infile.read()
    infile.close()
    words = file_contents.split()
    number_of_words = len(words)
    print("There are", number_of_words, "words contained in this paragraph")

main()
Use a set. This will only include unique words:
words = set(words)
If you don't care about case, you can do this:
words = set(word.lower() for word in words)
This assumes there is no punctuation. If there is, you will need to strip the punctuation.
import string
words = set(word.lower().strip(string.punctuation) for word in words)
If you need to keep track of how many of each word you have, just replace set with Counter in the examples above:
import string
from collections import Counter
words = Counter(word.lower().strip(string.punctuation) for word in words)
This will give you a dictionary-like object that tells you how many of each word there is.
You can also get the number of unique words from this (although it is slower if that is all you care about):
import string
from collections import Counter
words = Counter(word.lower().strip(string.punctuation) for word in words)
nword = len(words)
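A brief usage sketch of the resulting Counter, on a made-up in-memory string rather than a file:
import string
from collections import Counter

text = "Computer, computer. A computer!"
words = Counter(word.lower().strip(string.punctuation) for word in text.split())

print(words)                 # Counter({'computer': 3, 'a': 1})
print(words['computer'])     # 3 -- occurrences of one word
print(len(words))            # 2 -- number of unique words
print(words.most_common(1))  # [('computer', 3)]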
@TheBlackCat's solution works, but it only tells you how many unique words are in the string/file. This solution also shows how many times each one occurs.
dictionaryName = {}
for word in words:
    if word not in dictionaryName:
        dictionaryName[word] = 1
    else:
        dictionaryName[word] = dictionaryName.get(word) + 1
print dictionaryName
tested with:
words = "Foo", "Bar", "Baz", "Baz"
output: {'Foo': 1, 'Bar': 1, 'Baz': 2}
Probably a cleaner and quicker solution:
words_dict = {}
for word in words:
    word_count = words_dict.get(word, 0)
    words_dict[word] = word_count + 1

How to return unique words from the text file using Python

How do I return all the unique words from a text file using Python?
For example:
I am not a robot
I am a human
Should return:
I
am
not
a
robot
human
Here is what I've done so far:
def unique_file(input_filename, output_filename):
    input_file = open(input_filename, 'r')
    file_contents = input_file.read()
    input_file.close()
    word_list = file_contents.split()
    file = open(output_filename, 'w')
    for word in word_list:
        if word not in word_list:
            file.write(str(word) + "\n")
    file.close()
The text file that Python creates has nothing in it. I'm not sure what I am doing wrong.
for word in word_list:
    if word not in word_list:
every word is in word_list, by definition from the first line.
Instead of that logic, use a set:
unique_words = set(word_list)
for word in unique_words:
    file.write(str(word) + "\n")
Sets only hold unique members, which is exactly what you're trying to achieve.
Note that order won't be preserved, but you didn't specify if that's a requirement.
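If first-seen order does matter, here is a hedged sketch using dict.fromkeys, which preserves insertion order on Python 3.7+:
word_list = "I am not a robot I am a human".split()

# dict.fromkeys keeps only the first occurrence of each word, in order
unique_in_order = list(dict.fromkeys(word_list))
print(unique_in_order)  # ['I', 'am', 'not', 'a', 'robot', 'human']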
Simply iterate over the lines in the file and use a set to keep only the unique words.
from itertools import chain

def unique_words(lines):
    return set(chain(*(line.split() for line in lines if line)))
Then simply do the following to read all unique words from a file and print them:
with open(filename, 'r') as f:
    print(unique_words(f))
This seems to be a typical application for the collections module:
...
import collections

d = collections.OrderedDict()
for word in wordlist:
    d[word] = None
# use this if you also want to count the words:
# for word in wordlist: d[word] = d.get(word, 0) + 1
for k in d.keys():
    print k
You could also use a collections.Counter(), which would also count the elements you feed in. The order of the words would get lost, though. I added a commented line for counting while keeping the order.
string = "I am not a robot\n I am a human"
list_str = string.split()
print list(set(list_str))
def unique_file(input_filename, output_filename):
    input_file = open(input_filename, 'r')
    file_contents = input_file.read()
    input_file.close()
    duplicates = []
    word_list = file_contents.split()
    file = open(output_filename, 'w')
    for word in word_list:
        if word not in duplicates:
            duplicates.append(word)
            file.write(str(word) + "\n")
    file.close()
This code loops over every word and, if it is not already in the duplicates list, appends it and writes it to the file.
Using Regex and Set:
import re

words = re.findall(r'\w+', text.lower())
uniq_words = set(words)
Another way is creating a dict and inserting the words as keys:
dict_word = {}
for i in range(len(doc)):
    frase = doc[i].split(" ")
    for palavra in frase:
        if palavra not in dict_word:
            dict_word[palavra] = 1
print dict_word.keys()
The problem with your code is that word_list already contains every word of the input file, so when iterating you are checking whether a word in word_list is not present in itself, which is always false. This should work (note that it will also preserve the order):
def unique_file(input_filename, output_filename):
    z = []
    with open(input_filename, 'r') as fileIn, open(output_filename, 'w') as fileOut:
        for line in fileIn:
            for word in line.split():
                if word not in z:
                    z.append(word)
                    fileOut.write(word + '\n')
Use a set. You don't need to import anything to do this.
# Open the file
my_File = open(file_Name, 'r')
# Read the file
read_File = my_File.read()
# Split the words
words = read_File.split()
# Using a set will only save the unique words
unique_words = set(words)
# You can then print the set as a whole or loop through it, etc.
for word in unique_words:
    print(word)
try:
    with open("gridlex.txt", mode="r", encoding="utf-8") as india:
        for data in india:
            # print the number of characters on each line
            print("no of chars", len(data))
except IOError:
    print("sorry")

python dictionary function, textfile

I would like to define a function scaryDict() which takes one parameter (a text file) and returns the words from the text file in alphabetical order, basically producing a dictionary, but it should not print any one- or two-letter words.
Here is what I have so far; it isn't much, but I don't know the next step:
def scaryDict(fileName):
    inFile = open(fileName, 'r')
    lines = inFile.read()
    line = lines.split()
    myDict = {}
    for word in inFile:
        myDict[words] = []
    # I am not sure what goes between the line above and below
    for x in lines:
        print(word, end='\n')
You are doing fine till line = lines.split(). But your for loop must loop through the line array, not the inFile.
for word in line:
    if len(word) > 2:  # make sure to check the word length!
        myDict[word] = 'something'
I'm not sure what you want in the dictionary (maybe the word count?), but once you have it, you can get the words you added to it with:
allWords = list(myDict.keys())  # so allWords is now a list of words
And then you can sort allWords to get them in alphabetical order.
allWords.sort()
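Putting those pieces together, a hedged sketch of the whole function along the lines of this answer, using an occurrence count as the 'something' stored in the dictionary (one reasonable choice, not specified above):
def scaryDict(fileName):
    inFile = open(fileName, 'r')
    lines = inFile.read()
    inFile.close()
    line = lines.split()

    myDict = {}
    for word in line:
        if len(word) > 2:  # skip one- and two-letter words
            myDict[word] = myDict.get(word, 0) + 1

    allWords = list(myDict.keys())
    allWords.sort()        # alphabetical order
    for word in allWords:
        print(word, myDict[word])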
I would store all of the words into a set (to eliminate dups), then sort that set:
#!/usr/bin/python3
def scaryDict(fileName):
    with open(fileName) as inFile:
        return sorted(set(word
                          for line in inFile
                          for word in line.split()
                          if len(word) > 2))

scaryWords = scaryDict('frankenstein.txt')
print('\n'.join(scaryWords))
Also keep in mind that, as of Python 2.5, file objects support the with statement (they define __enter__ and __exit__ methods), which can prevent issues such as the file never getting closed:
with open(...) as f:
    for line in f:
        <do something with line>
To recap: build the unique set, sort the set, and now you can put it all together.
Sorry that I am 3 years late :) Here is my version:
def scaryDict():
    infile = open('filename', 'r')
    content = infile.read()
    infile.close()
    # replace common punctuation characters with spaces
    table = str.maketrans('.`/()|,\';!:"?=-', 15 * ' ')
    content = content.translate(table)
    words = content.split()
    new_words = list()
    for word in words:
        if len(word) > 2:
            new_words.append(word)
    new_words = list(set(new_words))
    new_words.sort()
    for word in new_words:
        print(word)
