So I am working with a set of dictionaries of the form
{'Summery':["00","01","02"],'Location':["03","04"],'Name':["05"]}
where each number corresponds to the position of a word on each line.
In the text file I have, the lines are formatted like so:
Fun Reading Afterschool 50°N 50°E Library
Education Learning Study 51°N 51°E School
Exercise Play Social 52°N 52°E Playground
How can I convert input.txt to the desired output?
output.txt
{'Summery':["Fun","Reading","Afterschool"],'Location':["50°N","50°E"],'Name':["Library"]}
{'Summery':["Education","Learning","Study"],'Location':["51°N","51°E"],'Name':["School"]}
{'Summery':["Exercise","Play","Social"],'Location':["52°N","52°E"],'Name':["Playground"]}
So far I have:
file = open("input.txt", 'r')
lines = file.readlines()
list_word = []
for l in lines:
    list_word.append(l.split(" "))
my_list = [line.split(' , ') for line in open("test")]
string1 = "".join(map(str, my_list))
print(string1)
new_main = open("output.txt", 'w')
new_main.write(string1)
new_main.close()
which prints the following and writes it to output.txt:
['Fun Reading Afterschool 50°N 50°E Library\n']['Education Learning Study 51°N 51°E School\n']['Exercise Play Social 52°N 52°E Playground']
Assuming the summary is always 3 words, the location 2 words and the name 1 word (each separated by one whitespace), you can pick out the wanted words by their indices.
for line in lines:  # iterate over the lines read from input.txt, not over the joined string
    splits = line.split()
    summary = splits[0:3]
    location = splits[3:5]
    name = splits[5:6]
    print(f"Summary: {summary}, location: {location}, name: {name}")
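To get all the way to the dictionaries the question asks for, one option is to treat the numbers in the template as word positions. This is only a sketch: the names template and fill_template are made up for illustration, and it assumes every line has at least six whitespace-separated words.

```python
# Template from the question: each string is a zero-padded word position.
template = {'Summery': ["00", "01", "02"], 'Location': ["03", "04"], 'Name': ["05"]}

def fill_template(template, line):
    """Return a new dict with each position replaced by the word found there."""
    words = line.split()
    return {key: [words[int(pos)] for pos in positions]
            for key, positions in template.items()}

def convert(lines):
    """Build one dict per non-empty input line."""
    return [fill_template(template, line) for line in lines if line.strip()]
```

Writing str(d) plus a newline for each dict returned by convert(lines) would then give one dictionary per line in output.txt.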
I want to find the words that start with "CHAPTER" and add them to a dictionary.
I have written some code, but it gives me 0 as output every time:
def wordcount(filename, listwords):
    try:
        file = open(filename, "r")
        read = file.readlines()
        file.close()
        for word in listwords:
            lower = word.lower()
            count = 0
            for sentence in read:
                line = sentence.split()
                for each in line:
                    line2=each.lower()
                    line2=line2.strip("")
                    if lower == line2:
                        count += 1
            print(lower, ":", count)
    except FileExistError:
        print("The file is not there ")
wordcount("dad.txt", ["CHAPTER"])
The txt file is here.
EDIT:
The problem was the encoding type and I solved it, but the new question is: how can I add these words to a dictionary?
And how can I make this code case sensitive? I mean, when I type wordcount("dad.txt", ["CHAPTER"]) I want it to find only the upper-case CHAPTER words.
It cannot work because of this line:
if lower == line2:
You can use this line to find the words that start with "CHAPTER":
if line2.startswith(lower):
I notice that you need to check whether a word starts with one of the words in listwords rather than testing for equality (lower == line2). Hence, you should use the startswith method.
The code can also be simpler, something like this:
def wordcount(filename, listwords):
    listwords = [s.lower() for s in listwords]
    wordCount = {s:0 for s in listwords} # A dict to store the counts
    with open(filename,"r") as f:
        for line in f.readlines():
            for word in line.split():
                for s in listwords:
                    if word.lower().startswith(s):
                        wordCount[s]+=1
    return wordCount
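For the case-sensitive variant asked about in the edit, a minimal sketch is to drop the lower() calls so that only exact-case prefixes match. The name count_prefixes is hypothetical, and the function takes any iterable of lines (an open file works) rather than a filename.

```python
def count_prefixes(lines, prefixes):
    """Count words starting with each prefix, matching case exactly."""
    counts = {p: 0 for p in prefixes}  # a dict to store the counts
    for line in lines:
        for word in line.split():
            for p in prefixes:
                # No .lower() here, so "chapter" does not match "CHAPTER".
                if word.startswith(p):
                    counts[p] += 1
    return counts
```

With the file from the question this could be called inside a with open("dad.txt", encoding="utf-8") as f: block as count_prefixes(f, ["CHAPTER"]); the utf-8 encoding is an assumption, since the question only says the encoding was the original problem.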
If the goal is to find chapters and paragraphs, don't try to count words or split any lines.
Start simpler, for example. Since chapters are in numeric order, you only need a list, not a dictionary:
chapters = [] # list of chapters
chapter = "" # store one chapter
with open(filename, encoding="UTF-8") as f:
    for line in f.readlines():
        # TODO: should skip to the first line that starts with "CHAPTER", otherwise 'chapters' variable gets extra, header information
        if line.startswith("CHAPTER"):
            print("Found chapter: " + line)
            # Save off the most recent, non empty chapter text, and reset
            if chapter:
                chapters.append(chapter)
            chapter = ""
        else:
            # up to you if you want to skip empty lines
            chapter += line # don't manipulate any data yet
# Capture the last chapter at the end of the file
if chapter:
    chapters.append(chapter)
del chapter # no longer needed
# del chapters[0] if you want to remove the header information before the first chapter header
# Done reading the file, now work with strings in your lists
print(len(chapters)) # find how many chapters there are
If you actually did want the text following "CHAPTER", you can split that line in the first if statement. Note, however, that the chapter numbers repeat between volumes, and this solution assumes the volume header is part of a chapter.
If you want to count the paragraphs, start by finding the empty lines (for example, split each element on '\n\n').
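The paragraph-counting suggestion could look roughly like this. It is a sketch that assumes paragraphs are separated by at least one blank line and that lines end in plain '\n'; count_paragraphs is a made-up name.

```python
def count_paragraphs(chapter_text):
    """Count blocks of text separated by one or more blank lines."""
    blocks = chapter_text.split("\n\n")
    # Runs of several blank lines produce empty blocks, so skip those.
    return sum(1 for block in blocks if block.strip())
```

Applied to the list built above, [count_paragraphs(c) for c in chapters] would give one count per chapter.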
So, I have a nested list written to file.txt:
students = []
info = []
name = input("Name: ")
age = input("Age: ")
info.append(name)
info.append(age)
students.append(info)
my_file = open("file.txt", "a")
for data in students:
    my_file.write("%s\n" % data)
my_file.close()
The contents in the file are in this format:
['john', '19']
['nick', '20']
Afterwards, I'm using a nested loop to access the contents of file.txt:
my_file = open("file.txt", "r")
search_keyword = input("Please Enter Student Name: ")
for students in my_file:
    for info in students:
        print(info)
Expected output:
john
19
nick
20
Actual output:
j
o
h
n
1
9
n
i
c
k
2
0
Can someone explain why the inner list is lost after reading it back from the file, so that the loop treats each individual character as an element?
Uh, I'm not 100% sure but maybe try: my_file.writelines("")
This seems a good candidate for pickle since you have a Python data structure you want to preserve.
import pickle
lol = [['john', '19'],['nick', '20']]
with open('lol.pickle','wb') as f:
    pickle.dump(lol,f)
del lol
# print(lol) at this point would raise:
# NameError: name 'lol' is not defined
with open('lol.pickle','rb') as f:
    lol = pickle.load(f)
print(lol)
# [['john', '19'], ['nick', '20']]
You have read the data as a string, so when you iterate you are iterating through elements of a string, not elements of a list.
I am assuming that you are learning Python. The current program works, but it is not the best way to do it. You should try using csv or pickle. However, it is always good to start from the basics! :D
students = []
info = []
name = input("Name: ")
age = input("Age: ")
info.append(name)
info.append(age)
students.append(info)
with open("file.txt", "a") as my_file:
    for data in students:
        my_file.write("%s,%s\n" % (data[0], data[1])) # this will write in file like name,age
After that you can retrieve the data like this:
search_keyword = input("Please Enter Student Name: ")
with open("file.txt", "r") as my_file:
    for students in my_file:
        # Strip the trailing newline, then split on commas to get the desired output
        for info in students.strip().split(','):
            print(info)
The mistake you were making is that with for info in students you were iterating over the characters of a string rather than over the actual pieces of information.
Notice how we have used with open: it closes the file automatically when the block ends.
Iterating over file-like objects produces lines
for students in my_file:
On each iteration students will be a line of text.
Iterating over text produces individual characters.
for info in students:
On each iteration info will be a character.
If your text files have string representations of Python objects, you can use ast.literal_eval to evaluate the objects.
import ast
with open('file.txt') as f:
    for line in f:
        thing = ast.literal_eval(line)
        print(thing)
        for item in thing:
            print(item)
As the docs describe it, ast.literal_eval will "safely evaluate an expression node or a string", with the emphasis on safe: it only accepts Python literals and containers, so it should not execute destructive or unwanted Python statements.
You would be better off using one of the built-in data persistence modules or perhaps json or even xml to save your data.
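As a rough illustration of the json route, the nested list can be written and read back without any string parsing. The function names and the idea of storing the whole list in one file are assumptions for this sketch, not part of the original program.

```python
import json

def save_students(path, students):
    """Write the whole nested list to path as JSON, replacing any old contents."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(students, f)

def load_students(path):
    """Read the nested list back; inner items are real lists again."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

After loading, for name, age in load_students(path) iterates over students rather than over characters. Unlike the append-mode file in the question, this rewrites the file each time, so to add a student you would load, append and save again.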
I am not sure how to get my file to sort and display the top 5 scores from my text file.
Text file below :
24fred
23alan
24bert
28dan
11orange
17purple
16dave
22andy
This is the code I am using to write to the file.
I tried using sort but can't get it to display only the top 5 scores.
file = open("Score.txt", "a")
file.write(str(name))
file.write(str(Score))
file.write("\n")
file.close()
The file should be printed out sorted, showing only the top 5.
You can use the following sample:
import re
pat = re.compile(r"^(\d+)(\D+)$")
def sort_crit(i):
    m = pat.match(i)
    return - int(m.group(1)), m.group(2)
with open("Score.txt",'r') as f:
    lines = [line.rstrip() for line in f]
lines.sort(key = sort_crit)
with open('Sorted_score.txt', 'w') as f:
    for item in lines:
        f.write("%s\n" % item)
input:
$ more Score.txt
24fred
23alan
24bert
28dan
28abc
11orange
17purple
16dave
22andy
output:
$ more Sorted_score.txt
28abc
28dan
24bert
24fred
23alan
22andy
17purple
16dave
11orange
Explanations:
re.compile(r"^(\d+)(\D+)$") will be used to extract individually the score and the name
sort_crit(i) will return the double sorting criteria based first on the score in reverse order (note the -), followed by the name in alphabetic order
You open the input file and store all the lines in a list
You sort the lines using the sorting function you have defined
You write them out to a new file
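The sample above sorts every line; to show only the top 5 as the question asks, the sorted result can be sliced. A minimal sketch, assuming each line is a score immediately followed by a name as in the sample file (top_scores and SCORE_LINE are made-up names):

```python
import re

SCORE_LINE = re.compile(r"^(\d+)(\D+)$")  # digits for the score, then the name

def top_scores(lines, n=5):
    """Return the n best (score, name) pairs: highest score first, ties by name."""
    entries = []
    for line in lines:
        m = SCORE_LINE.match(line.strip())
        if m:  # skip blank or malformed lines
            entries.append((int(m.group(1)), m.group(2)))
    entries.sort(key=lambda entry: (-entry[0], entry[1]))
    return entries[:n]
```

Passing an open Score.txt to top_scores and printing each pair would display just the five best entries.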
I am trying to stem words from a file that contains about 90000 lines (each line has three to several hundred words).
I want to append the lines to a list after stemming the words. I was able to insert the stemmed words into a list, but it ends up as one single line. I want to insert the words into the list while keeping the 90000 lines. Any ideas?
clean_sentence = []
with open(folder_path+text_file_name, 'r', encoding='utf-8') as f:
    for line in f:
        sentence = line.split()
        for word in sentence:
            if word.endswith('er'):
                clean_sentence.append(word[:-2])
            else:
                clean_sentence.append(word)
x = ' '.join(clean_sentence)
with open('StemmingOutFile.txt','w', encoding="utf8") as StemmingOutFile:
    StemmingOutFile.write(x)
The file is not in English, but here is an example that illustrates the issue at hand: current code yields:
why don't you like to watch TV? are there any more fruits? why not?
I want the output file to be:
why don't you like to watch TV?
are there any more fruits?
why not?
Read the file into a list of lines:
with open('file.txt','r', encoding='utf-8') as f:
    lines = f.read().splitlines()
and then do the stemming:
new_lines = []
for line in lines:
    new_lines.append(' '.join(stemmed(word) for word in line.split()))
where stemmed is a function as follows:
def stemmed(word):
    return word[:-2] if word.endswith('er') else word
Then write each line of new_lines to StemmingOutFile.txt.
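Put together, the per-line stemming and the final write step might look like the sketch below. The helper names stem_lines and write_lines are invented here, and the 'er' rule is just the toy stemmer from the question.

```python
def stemmed(word):
    """Toy stemmer from the question: drop a trailing 'er'."""
    return word[:-2] if word.endswith('er') else word

def stem_lines(lines):
    """Stem every word but keep one output line per input line."""
    return [' '.join(stemmed(word) for word in line.split()) for line in lines]

def write_lines(path, lines):
    """Write the lines to path, one per line, as UTF-8."""
    with open(path, 'w', encoding='utf-8') as out:
        out.write('\n'.join(lines) + '\n')
```

Because the join happens once per line instead of once for the whole file, the 90000 input lines stay 90000 output lines.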
Example of data in txt file:
apple
orange
banana
lemon
pears
Code for filtering 5-letter words without a dictionary:
def numberofletters(n):
    file = open("words.txt","r")
    lines = file.readlines()
    file.close()
    for line in lines:
        if len(line) == 6:
            print(line)
    return
print("===================================================================")
print("This program can use for identify and print out all words in 5 letters from words.txt")
n = input("Please Press enter to start filtering words")
print("===================================================================")
numberofletters(n)
My question is: how do I create a dictionary whose keys are integers and whose values are the English words with that many letters, and then use the dictionary to identify and print out all the 5-letter words?
Imagine doing this with a huge list of words.
Sounds like a job for a defaultdict.
>>> from collections import defaultdict
>>> length2words = defaultdict(set)
>>>
>>> with open('file.txt') as f:
...     for word in f: # one word per line
...         word = word.strip()
...         length2words[len(word)].add(word)
...
>>> length2words[5]
set(['lemon', 'apple', 'pears'])
If you care about duplicates and insertion order, use a defaultdict(list) and append instead of add.
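The list-based variant mentioned in the last sentence could be sketched as follows. group_by_length is a hypothetical name, and it accepts any iterable of words, one per line.

```python
from collections import defaultdict

def group_by_length(words):
    """Map each word length to the words of that length, in input order."""
    groups = defaultdict(list)
    for word in words:
        word = word.strip()  # drop the trailing newline before measuring
        if word:
            groups[len(word)].append(word)
    return groups
```

Here group_by_length(open('words.txt'))[5] keeps duplicates and the original order, which a set would not.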
You can write your for loop like this:
dicword = {}
for line in lines:
    word = line.strip()  # drop the trailing newline so it isn't counted
    word_len = len(word)
    if word_len not in dicword:
        dicword[word_len] = [word]
    else:
        dicword[word_len].append(word)
Then you can get the 5-letter words by just doing dicword[5].
If I understood correctly, you need to filter your document and write the result to a file. For that you can write a CSV file with DictWriter (https://docs.python.org/2/library/csv.html).
DictWriter: "Create an object which operates like a regular writer but maps dictionaries onto output rows."
That way, you will be able to store and structure your document:
import csv
def numberofletters(n):
    file = open("words.txt","r")
    lines = file.readlines()
    file.close()
    dicword = {}
    # filename must be an open, writable file object; fieldnames a list of column names
    writer = csv.DictWriter(filename, fieldnames=fieldnames)
    writer.writeheader()
    for line in lines:
        if len(line) == 6:
            writer.writerow({'param_label': line, [...]})
    return
I hope that helps.
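A runnable version of the DictWriter idea might look like this. The column names "length" and "word" and the function name are assumptions for illustration only, since the answer above leaves its field names unspecified.

```python
import csv

def write_words_by_length(words, out, length=5):
    """Write the words with the given number of letters as CSV rows to out."""
    writer = csv.DictWriter(out, fieldnames=["length", "word"])
    writer.writeheader()
    for word in words:
        word = word.strip()  # ignore the trailing newline when measuring
        if len(word) == length:
            writer.writerow({"length": len(word), "word": word})
```

Here out is any writable text file object; when writing to a real file, the csv docs recommend opening it with newline="".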