I am trying to count the elements in a text file. I know I am missing an obvious part, but I can't put my finger on it. This is what I currently have, which just produces a count of the letter "f", not of the file:
filename = open("output3.txt")
f = open("countoutput.txt", "w")
import collections

for line in filename:
    for number in line.split():
        print(collections.Counter("f"))
        break
import collections

counts = collections.Counter()  # create a new counter
with open("output3.txt") as infile:  # open the file for reading
    for line in infile:
        for number in line.split():
            counts.update((number,))
            print("Now there are {} instances of {}".format(counts[number], number))
print(counts)
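Note that Counter.update accepts any iterable, so the inner loop can also be collapsed; a minimal sketch (the sample string is made up):

```python
import collections

counts = collections.Counter()
counts.update("apple banana apple".split())  # update with an iterable of words
counts.update(("banana",))                   # or with a 1-tuple, as above
print(counts)  # Counter({'apple': 2, 'banana': 2})
```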
I want to split a file based on a specific word: whenever the word is found, I want to take the line above it and write that line, the word, and the content that follows into a new file, stopping each segment when the line above the next occurrence of the word is reached. Please help.
This is my code:
import collections
import itertools
import sys
count = 0
done = False
with open("file.txt") as in_file:
    before = collections.deque(maxlen=3)
    while not done:
        with open(f"newfile{count}.txt", "w") as out_file:
            while not done:
                try:
                    line = next(in_file).strip()
                except StopIteration:
                    done = True
                    break
                if "X-IronPort-RCPT-TO" in line:
                    out_file.write(line)
                    before.append('\n')
                    break
                else:
                    out_file.writelines(before)
                    out_file.write('\n')
                    out_file.write(line)
        count += 1
Not sure if this is what you want:
with open("file.txt") as in_file:
    lines = in_file.readlines()
for l in range(len(lines)):
    if "X-IronPort-RCPT-TO" in lines[l]:
        line_above = lines[l - 1]
I'm trying to create a game of hangman. When I try to read the file with a list of words, it returns an empty list.
Code:
# Making a game of hangman
import random

def rand_word(file_name):
    # Retrieves a random word from a file
    f = open(file_name, "r")
    for num_lines, words in enumerate(f):
        pass
    num_lines += 1
    print(num_lines)
    rand_line = random.randint(1, num_lines)
    print(rand_line)
    file = f.readlines()
    print(file)
    f.close()

rand_word("words.txt")
You exhausted the file when looping over it in your for loop, so when you use readlines, you're at the end of the file and there's nothing left to read.
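The exhaustion is easy to demonstrate with an in-memory file (a minimal sketch; seek(0) would also rewind a real file):

```python
import io

f = io.StringIO("apple\nbanana\ncherry\n")
for _ in f:               # consume the whole stream
    pass
print(f.readlines())      # [] -- nothing left to read
f.seek(0)                 # rewind to the start
print(f.readlines())      # ['apple\n', 'banana\n', 'cherry\n']
```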
You should read your words in a list first, then choose a random one from the list:
import random
def rand_word(filename):
    with open(filename) as f:
        words = f.readlines()
    word = random.choice(words).strip()
    return word
Say that I have a file of restaurant names and that I need to search through said file for a particular string like "Italian". How would the code look if I searched the file for the string and printed out the number of restaurants containing it?
f = open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt", "r")
content = f.read()
f.close()
lines = content.split("\n")
with open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt") as f:
    print("There are", len(f.readlines()), "restaurants in the dataset")
with open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt") as f:
    searchlines = f.readlines()
for i, line in enumerate(searchlines):
    if "GREEK" in line:
        for l in searchlines[i:i + 3]:
            print(l, end="")
        print()
You could count all the words using a Counter dict and then do lookups for certain words:
from collections import Counter
from string import punctuation

f_name = "/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt"

with open(f_name) as f:
    # sum(1 for _ in f) -> counts lines
    print("There are", sum(1 for _ in f), "restaurants in the dataset")
    # reset the file pointer back to the start
    f.seek(0)
    # count how many times each word appears, at most once per line
    cn = Counter(word.strip(punctuation).lower() for line in f for word in set(line.split()))

print(cn["italian"])  # no KeyError if missing, will be 0
We use set(line.split()) so that if a word appears twice for a certain restaurant, we only count it once. That looks for exact matches; if you also want to match partials, like foo in foobar, it gets more complex to build a dataset where you can efficiently look up multiple words.
If you really just want to count one word, all you need to do is sum how many lines contain the substring:
f_name = "/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt"

with open(f_name) as f:
    print("There are", sum(1 for _ in f), "restaurants in the dataset")
    f.seek(0)
    sub = "italian"
    count = sum(sub in line.lower() for line in f)
If you want exact matches, you would need the split logic again or to use a regex with word boundaries.
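A minimal sketch of the word-boundary approach (the sample restaurant lines are made up):

```python
import re

lines = [
    "Luigi's Italian Kitchen",
    "Italiano Pizza",            # contains "italian" as a substring, but not as a whole word
    "Little Italy Italian Cafe",
]

# \b anchors the match at word boundaries, so partials like "Italiano" are skipped
pattern = re.compile(r"\bitalian\b", re.IGNORECASE)
count = sum(bool(pattern.search(line)) for line in lines)
print(count)  # 2
```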
You read the whole file into a string, then use the count method of strings.
Code:
# Read the whole file into the string s1
with open("/home/ubuntu/ipynb/NYU_Notes/2-Introduction_to_Python/data/restaurant-names.txt") as f:
    s1 = f.read().lower()
print(s1.count("italian"))
I have a script that opens a text file, reads it, separates every word, and makes a list of those words. I made a Counter to count how many times each word in the list appears. Then I want to export each row to a .csv file, something like this:
word hello appears 10 times
word house appears 5 times
word tree appears 3 times
...and so on
Can you show me what I need to change here to make the script work?
from collections import Counter
import re
import csv

cnt = Counter()
writefile = open('test1.csv', 'wb')
writer = csv.writer(writefile)
with open('screenplay.txt') as file:  # Open .txt file with text
    text = file.read().lower()
    file.close()
text = re.sub('[^a-z\ \']+', " ", text)
words = list(text.split())  # Making list of each word
for word in words:
    cnt[word] += 1  # Counting how many times word appear
    for key, count in cnt.iteritems():
        key = text
        writer.writerow([cnt[word]])
The big issue is that your second for-loop is happening for every occurrence of every word, not just once for each unique word. You will need to de-dent the whole loop so that it executes after you have finished your counting. Try something like this:
from collections import Counter
import re
import csv

cnt = Counter()
writefile = open('test1.csv', 'wb')
writer = csv.writer(writefile)
with open('screenplay.txt') as file:
    text = file.read().lower()
text = re.sub('[^a-z\ \']+', " ", text)
words = list(text.split())
for word in words:
    cnt[word] += 1
for key, count in cnt.iteritems():  # De-dent this block
    writer.writerow([key, count])   # Output both the key and the count
writefile.close()  # Make sure to close your file to guarantee it gets flushed
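The code above is Python 2. In Python 3 the same fix looks like this (iteritems() is gone and the csv writer wants a text-mode file opened with newline=''). The one-line sample input is made up so the sketch runs as-is:

```python
from collections import Counter
import re
import csv

# Tiny stand-in input so the example is self-contained
with open('screenplay.txt', 'w') as demo:
    demo.write("Hello hello, house! Tree house.\n")

with open('screenplay.txt') as infile:
    text = infile.read().lower()
text = re.sub(r"[^a-z ']+", " ", text)

cnt = Counter(text.split())  # count every word in one pass

# csv in Python 3 wants a text-mode file opened with newline=''
with open('test1.csv', 'w', newline='') as writefile:
    writer = csv.writer(writefile)
    for key, count in cnt.items():  # items() replaces Python 2's iteritems()
        writer.writerow([key, count])
```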
I have two Python scripts to count words and their frequencies:
import io
import collections
import codecs
from collections import Counter

with io.open('JNb.txt', 'r', encoding='utf8') as infh:
    words = infh.read().split()
with open('e1.txt', 'a') as f:
    for word, count in Counter(words).most_common(10):
        f.write(u'{} {}\n'.format(word, count).encode('utf8'))
import io
import collections
import codecs
from collections import Counter

with io.open('JNb.txt', 'r', encoding='utf8') as infh:
    for line in infh:
        words = line.split()
        with open('e1.txt', 'a') as f:
            for word, count in Counter(words).most_common(10):
                f.write(u'{} {}\n'.format(word, count).encode('utf8'))
Neither of them produces the expected output.
The code contains no syntax errors.
Output
താത്കാലിക 1
- 1
ഒഴിവ് 1
അധ്യാപക 1
വാര്ത്തകള് 1
ആലപ്പുഴ 1
ഇന്നത്തെപരിപാടി 1
വിവാഹം 1
അമ്പലപ്പുഴ 1
The actual file contains 100 occurrences of these words.
I am not printing anything; I am writing everything to a file (e1.txt).
Update: I tried another version and got a result:
import io
import collections
import codecs
from collections import Counter

with io.open('JNb.txt', 'r', encoding='utf8') as infh:
    words = infh.read().split()
with open('file.txt', 'wb') as f:
    for word, count in Counter(words).most_common(10000000):
        f.write(u'{} {}\n'.format(word, count).encode('utf8'))
It can count words in files of up to 2 GB on a machine with 4 GB of RAM.
What is the problem here?
I coded up the task and here is my solution.
I have tested the program with a 5.1 GB text file; it finished in about 20 minutes on a MBP6.2.
Let me know if anything is unclear or if you have suggestions. Best of luck.
from collections import Counter
import io
import sys

cnt = Counter()

if len(sys.argv) < 2:
    print("Provide an input file as argument")
    sys.exit()

try:
    with io.open(sys.argv[1], 'r', encoding='utf-8') as f:
        for line in f:
            for word in line.split():
                cnt[word] += 1
except FileNotFoundError:
    print("File not found")
    sys.exit(1)

with sys.stdout as f:
    total_word_count = sum(cnt.values())
    for word, count in cnt.most_common(30):
        f.write('{:<6} {:<7.2%} {}\n'.format(
            count, count / total_word_count, word))
Output:
~ python countword.py CSW07.txt
79619 4.58% [n]
63717 3.67% a
56783 3.27% of
42341 2.44% to
40156 2.31% the
39295 2.26% [v]
38231 2.20% [n
36592 2.11% -S]
35250 2.03% or
17113 0.98% in
You need to read each line, split it into words, and then update the counter. Otherwise you are only counting the words of each line separately. Even if the file is very big, since you are only storing the individual words, you can process it line by line.
Try this version instead:
import collections
import io

c = collections.defaultdict(int)

with io.open('somefile.txt', encoding='utf-8') as f:
    for line in f:
        if len(line.strip()):
            for word in line.split():  # split on any whitespace, so newlines don't stick to words
                c[word] += 1

with io.open('out.txt', 'w', encoding='utf-8') as f:
    for word, count in c.items():
        f.write(u'{} {}\n'.format(word, count))
You are counting words for each line separately.
Try reading the whole file, splitting it into words, and making a single Counter call.
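A minimal sketch of that whole-file approach (the two-line sample input stands in for the real JNb.txt from the question):

```python
from collections import Counter
import io

# Tiny stand-in for JNb.txt so the sketch is runnable as-is
with io.open('JNb.txt', 'w', encoding='utf8') as demo:
    demo.write(u'hello house hello\ntree hello\n')

# Read the whole file, split it into words, and make one Counter call
with io.open('JNb.txt', 'r', encoding='utf8') as f:
    cnt = Counter(f.read().split())

print(cnt.most_common(3))  # [('hello', 3), ('house', 1), ('tree', 1)]
```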
Edit: if you don't have enough memory to read the whole file, but enough to store all the distinct words:
import io
import collections
import codecs
from collections import Counter
def count(file):
    cnt = Counter()
    f = open(file, 'r')
    for line in f:  # iterate line by line; readlines() would load the whole file into memory
        words = line.split()
        for word in words:
            cnt[word] += 1
    f.close()
    return cnt
Now take the counter it returns and print the data you want to a file.
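For instance, a sketch of that final step (the sample counts and the wordcounts.txt filename are made up; in practice the counter would come from count()):

```python
from collections import Counter

cnt = Counter({'hello': 10, 'house': 5, 'tree': 3})  # made-up counts

# Write "word count" lines, most frequent first
with open('wordcounts.txt', 'w', encoding='utf8') as out:
    for word, count in cnt.most_common():
        out.write('{} {}\n'.format(word, count))
```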