How to compare individual words in two text files in Python

I am attempting to compare two files, A and B. The purpose is to find all the words that are in A but not in B. For example,
File A
my: 2
hello: 5
me: 1
File B
my
name
is
output
hello
me
The code I have so far is
inFile = "fila.txt"
lexicon = "fileb.xml"
with open(inFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]
with open(lexicon) as File:
    lexicon_file = File.readlines()
lexicon_file = [x.strip() for x in lexicon_file]
ordered_dict = {}
for line in content:
    key = line.split(":")[0].strip()
    value = int(line.split(":")[1].strip())
    ordered_dict[key] = value
for entry in lexicon_file:
    for (key, val) in ordered_dict.items():
        if entry == key:
            continue
        else:
            print(key)
However, this takes too long because of the nested loops, and it also prints duplicate words. How do I make this efficient?

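Convert both collections into sets and do a subtraction. Compare the keys of ordered_dict rather than the raw lines, because the lines of file A still include the counts:
content_wo_lexicon = list(set(ordered_dict) - set(lexicon_file))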
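As a rough, self-contained sketch of the set subtraction, the snippet below uses in-memory lists instead of the two files. The names file_a_lines, file_b_lines, words_a and words_b are made up for illustration, and the "extra: 3" line is an added assumption so that the result is not empty.

```python
# Hypothetical in-memory stand-ins for the two files in the question.
# The "extra: 3" entry is an assumption added so the difference is non-empty.
file_a_lines = ["my: 2", "hello: 5", "me: 1", "extra: 3"]
file_b_lines = ["my", "name", "is", "output", "hello", "me"]

# Keep only the word before the colon from each non-blank line of file A.
words_a = {line.split(":")[0].strip() for line in file_a_lines if line.strip()}
words_b = {line.strip() for line in file_b_lines}

# Set difference: words present in A but absent from B.
missing = sorted(words_a - words_b)
print(missing)  # ['extra']
```

Because sets hold each word once, duplicates disappear, and each membership test takes roughly constant time instead of a scan over the whole list.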

Related

How do I use elements of list to set the key and value of dictionary?

I have to use element 0 of words as a dictionary key and set the value of to_nato for that key to words element 1.
I have this:
natofile = "nato-alphabet.txt"
to_nato = {}  # creates an empty dict
fh = open(natofile)  # opens natofile
for line in fh:
    clean = line.strip()
    lowerl = clean.lower()
    words = lowerl.split()
    to_nato = {words[0]:words[1]}
    print(to_nato)
nato-alphabet is a text file that looks like this:
A Alfa
B Bravo
C Charlie
D Delta
E Echo
F Foxtrot
G Golf
H Hotel
I India
My code returns a list of dictionaries instead of one dictionary.
Directly set the key value with dict_object[key] = value:
to_nato[words[0]] = words[1]
This can be written more concisely using the dict constructor and a generator expression.
to_nato = dict(line.lower().split() for line in fh)
Try this:
natofile = "nato-alphabet.txt"
to_nato = {}  # creates an empty dict
fh = open(natofile)  # opens natofile
for line in fh:
    clean = line.strip()
    lowerl = clean.lower()
    words = lowerl.split()
    to_nato[words[0]] = words[1]
fh.close()
print(to_nato)
This sets the element of to_nato with key words[0] to value words[1] for each pair in the file.
dict() can convert any list of pairs of values into a dict:
with open('nato-alphabet.txt') as fh:
    lines = fh.read().lower().splitlines()
pairs = [line.strip().split() for line in lines]
my_dict = dict(pairs)
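A minimal sketch of the dict() approach that can be run without the real file. It assumes every non-blank line holds exactly two whitespace-separated tokens, and it uses io.StringIO as a stand-in for nato-alphabet.txt.

```python
import io

# Hypothetical stand-in for nato-alphabet.txt so the sketch runs without a file.
fh = io.StringIO("A Alfa\nB Bravo\nC Charlie\n\n")

# Skip blank lines, lowercase each line, and split it into a (letter, word) pair.
# dict() then consumes the stream of two-item lists directly.
to_nato = dict(line.lower().split() for line in fh if line.strip())
print(to_nato)  # {'a': 'alfa', 'b': 'bravo', 'c': 'charlie'}
```

If a line had one token or more than two, dict() would raise a ValueError, so this only suits files with a strict two-column layout.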

Compare lines in two files efficiently in Python

I am trying to compare the lines of two files and capture the lines that match each other. For example,
file1.txt contains
my
sure
file2.txt contains
my : 2
mine : 5
sure : 1
and I am trying to output
my : 2
sure : 1
I have the following code so far
inFile = "file1.txt"
dicts = "file2.txt"
with open(inFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]
with open(dicts) as fd:
    inDict = fd.readlines()
inDict = [x.strip() for x in inDict]
ordered_dict = {}
for line in inDict:
    key = line.split(":")[0].strip()
    value = int(line.split(":")[1].strip())
    ordered_dict[key] = value
for (key, val) in ordered_dict.items():
    for entry in content:
        if entry == key:
            print(key, val)
        else:
            continue
However, this is very inefficient because the nested loops compare every dictionary entry with every line, so it is not practical for large files. How can I make this workable for large files?
You don't need nested loops. One loop to read in file2 and translate to a dict, and another loop to read file1 and look up the results.
inFile = "file1.txt"
dicts = "file2.txt"
ordered_dict = {}
with open(dicts) as fd:
    for line in fd:
        a, b = line.strip().split(' : ')  # strip the newline before splitting
        ordered_dict[a] = b
with open(inFile) as f:
    for line in f:
        line = line.strip()
        if line in ordered_dict:
            print(line, ":", ordered_dict[line])
The first loop can be replaced by a generator expression passed to dict():
with open(dicts) as fd:
    ordered_dict = dict(line.strip().split(' : ') for line in fd)
Here is a solution with one for loop:
inFile = "file1.txt"
dicts = "file2.txt"
with open(inFile) as f:
    content_set = set(map(str.strip, f))  # a set gives fast membership tests
ordered_dict = {}
with open(dicts) as fd:
    for dline in fd:
        key, val = dline.strip().split(" : ")
        if key in content_set:
            ordered_dict[key] = val
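The dictionary-lookup idea from the answers above can be tried without any files. This is only a sketch: io.StringIO objects stand in for file1.txt and file2.txt, the names counts and matches are illustrative, and str.partition is swapped in for split(' : ') so that uneven spacing around the colon is tolerated.

```python
import io

# Hypothetical stand-ins for file1.txt and file2.txt from the question.
file1 = io.StringIO("my\nsure\n")
file2 = io.StringIO("my : 2\nmine : 5\nsure:1\n")

# Build the lookup table once; partition tolerates uneven spacing around ':'.
counts = {}
for line in file2:
    key, sep, value = line.partition(":")
    if sep:  # ignore lines that contain no colon
        counts[key.strip()] = int(value)  # int() ignores surrounding whitespace

# One pass over file1 with constant-time dictionary lookups.
matches = [(w, counts[w]) for w in map(str.strip, file1) if w in counts]
for word, count in matches:
    print(word, ":", count)
```

Each file is read once, so the work grows with the combined size of the two files rather than with their product.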

How to make a dictionary entry from each line in a text file?

So, I have this text file which contains this information:
student_num1 student_name1 student_grade1
student_num2 student_name2 student_grade2
student_num3 student_name3 student_grade3
What I want to do is take each line of this text file as a dictionary entry with this format:
students = { student_num1: [student_name1, student_grade1], student_num2: [student_name2, student_grade2], student_num3: [student_name3, student_grade3] }
Basically, the first string of the line should be the key and the two strings next to it should be the value. But I don't know how to make Python separate the strings in each line and assign them as the key and value for the dictionary.
EDIT:
So, I've tried some code. (I saw all your solutions, and I think they'll all definitely work, but I also want to learn to create my own solution, so I would really appreciate it if you could check mine!)
for line in fh:
    line = line.split(";")
    student_num = line[0]
    student_name = line[1]
    student_grade = line[2]
    count =+ 1
    direc[student_num] = [student_name,student_grade]
    student_num = "student_num" + str(count)
    student_grade = "student_grade" + str(count)
    student_name = "student_name" + str(count)
print(direc)
The problem is that I get a "list index out of range" error on line 10, at this part: student_name = line[1]
EDIT: THANK YOU EVERYONE! Every single one of your suggested solutions works! I've also fixed my own solution. This is the fixed one (as suggested by @norok2):
for line in fh:
    line = line.split(" ")
    student_num = line[0]
    student_name = line[1]
    student_grade = line[2]
    count += 1  # "=+" would assign +1 instead of incrementing
    direc[student_num] = [student_name,student_grade]
    student_num = "student_num" + str(count)
    student_grade = "student_grade" + str(count)
    student_name = "student_name" + str(count)
As a dict comprehension:
with open("data.txt", "r") as f:
    students = {k:v for k, *v in map(str.split, f)}
Explanation:
The file object f is already an iterator that yields each line. We want to split the lines, so we can use map(str.split, f) or (line.split() for line in f).
After that, we know that the first item is the key of the dictionary and the remaining items are the values. We can use unpacking for that. An unpacking example:
>>> a, *b = [1,2,3]
>>> a
1
>>> b
[2, 3]
Then we use a comprehension to build the dict with the values we are capturing in the unpacking.
A dict comprehension is an expression that builds a dictionary, for example:
>>> {x:x+1 for x in range(5)}
{0: 1, 1: 2, 2: 3, 3: 4, 4: 5}
Example,
File data.txt:
student_num1 student_name1 student_grade1
student_num2 student_name2 student_grade2
student_num3 student_name3 student_grade3
Reading it
>>> with open("data.txt", "r") as f:
...     students = {k:v for k, *v in map(str.split, f)}
...
>>> students
{'student_num1': ['student_name1', 'student_grade1'], 'student_num2': ['student_name2', 'student_grade2'], 'student_num3': ['student_name3', 'student_grade3']}
My approach opens the file in read mode and reads its lines. For each line, it removes the trailing newline and surrounding whitespace and splits the line at spaces to create a list. It then uses unpacking to store the first value as the key and a list of the remaining two values as the value, and adds the pair to the dictionary.
temp.txt
student_num1 student_name1 student_grade1
student_num2 student_name2 student_grade2
student_num3 student_name3 student_grade3
main.py
d = dict()
with open("temp.txt", "r") as f:
    for line in f.readlines():
        key, *values = line.strip().split(" ")
        d[key] = values
print(d)
Output
{'student_num1': ['student_name1', 'student_grade1'], 'student_num2': ['student_name2', 'student_grade2'], 'student_num3': ['student_name3', 'student_grade3']}
with open('data.txt') as f:
    lines = f.readlines()
d = {}
for line in lines:
    tokens = line.split()
    d[tokens[0]] = tokens[1:]
print(d)
I hope this is understandable. To split the lines into the different tokens, we use the split function.
The reason why your solution is giving you that error is that it seems your lines do not contain the character ;, yet you try to split by that character with line = line.split(";").
You should replace that with:
line = line.split(" ") to split by the space character
or
line = line.split() to split by any blank character
However, for a more elegant solution, see here.
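To make the difference between the three split calls concrete, here is a small sketch on a made-up line that contains a double space and a tab. The variable names are illustrative only.

```python
# A made-up line with a double space, a tab, and a trailing newline.
line = "student_num1  student_name1\tstudent_grade1\n"

# Splitting on a single space keeps empty strings, the tab, and the newline.
by_space = line.split(" ")
# Splitting with no argument collapses any run of whitespace.
by_any = line.split()
# Splitting on a character that is absent returns the whole line as one item,
# which is why line[1] raised "list index out of range" in the question.
by_semicolon = line.split(";")

print(by_space)           # ['student_num1', '', 'student_name1\tstudent_grade1\n']
print(by_any)             # ['student_num1', 'student_name1', 'student_grade1']
print(len(by_semicolon))  # 1
```

For whitespace-separated data, the no-argument form is usually the safer choice because it also drops the trailing newline.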
Have you tried something as simple as this:
d = {}
with open('students.txt') as f:
    for line in f:
        key, *rest = line.split()
        d[key] = rest
print(d)
# {'student_num1': ['student_name1', 'student_grade1'], 'student_num2': ['student_name2', 'student_grade2'], 'student_num3': ['student_name3', 'student_grade3']}
file.txt:
student_num1 student_name1 student_grade1
student_num2 student_name2 student_grade2
student_num3 student_name3 student_grade3
Main.py:
def main():
    students = {}
    with open('file.txt', 'r') as file:
        for line in file:
            fields = line.split(" ")
            fields[2] = fields[2].replace("\n", "")
            # Key by student number, as the question asks
            students[fields[0]] = [fields[1], fields[2]]
    print(students)
main()
Output:
{'student_num1': ['student_name1', 'student_grade1'], 'student_num2': ['student_name2', 'student_grade2'], 'student_num3': ['student_name3', 'student_grade3']}

Python Beginning Program Dictionary and List Issue

Write a program that reads the contents of a random text file. The program should create a dictionary in which the keys are individual words found in the file and the values are the number of times each word appears.
How would I go about doing this?
def main():
    c = 0
    dic = {}
    words = set()
    inFile = open('text2', 'r')
    for line in inFile:
        line = line.strip()
        line = line.replace('.', '')
        line = line.replace(',', '')
        line = line.replace("'", '') #strips the punctuation
        line = line.replace('"', '')
        line = line.replace(';', '')
        line = line.replace('?', '')
        line = line.replace(':', '')
        words = line.split()
        for x in words:
            for y in words:
                if x == y:
                    c += 1
                    dic[x] = c
    print(dic)
    print(words)
    inFile.close()
main()
Sorry for the vague question. Never asked any questions here before. This is what I have so far. Also, this is the first ever programming I've done so I expect it to be pretty terrible.
with open('path/to/file') as infile:
    # code goes here
That's how you open a file
for line in infile:
    # code goes here
That's how you read a file line-by-line
line.strip().split()
That's how you split a line into (white-space separated) words.
some_dictionary['abcd']
That's how you access the key 'abcd' in some_dictionary.
Questions for you:
What does it mean if you can't access the key in a dictionary?
What error does that give you? Can you catch it with a try/except block?
How do you increment a value?
Is there some function that GETS a default value from a dict if the key doesn't exist?
For what it's worth, there's also a function that does almost exactly this, but since this is pretty obviously homework it won't fulfill your assignment requirements anyway. It's in the collections module. If you're interested, try and figure out what it is :)
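One possible way to put these hints together is sketched below, assuming dict.get is the default-value lookup the questions point at. The io.StringIO text is a made-up stand-in for the assignment's file, and the punctuation handling is deliberately simple.

```python
import io

# Hypothetical stand-in for the text file in the assignment.
infile = io.StringIO("the cat saw the dog.\nThe dog ran,\n")

counts = {}
for line in infile:
    for word in line.strip().split():
        # Trim punctuation at the edges of the word and normalise the case.
        word = word.strip('.,;:?!"\'').lower()
        if word:
            # dict.get returns the default (0) when the key is missing.
            counts[word] = counts.get(word, 0) + 1
print(counts)  # {'the': 3, 'cat': 1, 'saw': 1, 'dog': 2, 'ran': 1}
```

Accessing counts[word] directly for a new word would raise KeyError, which is what the try/except hint refers to.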
There are at least three different approaches to add a new word to the dictionary and count the number of occurrences in this file.
def add_element_check1(my_dict, elements):
    for e in elements:
        if e not in my_dict:
            my_dict[e] = 1
        else:
            my_dict[e] += 1
def add_element_check2(my_dict, elements):
    for e in elements:
        if e not in my_dict:
            my_dict[e] = 0
        my_dict[e] += 1
def add_element_except(my_dict, elements):
    for e in elements:
        try:
            my_dict[e] += 1
        except KeyError:
            my_dict[e] = 1
my_words = {}
with open('pathtomyfile.txt', 'r') as in_file:
    for line in in_file:
        words = [word.strip().lower() for word in line.strip().split()]
        add_element_check1(my_words, words)
        #or add_element_check2(my_words, words)
        #or add_element_except(my_words, words)
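If you are wondering which is the fastest, the answer is: it depends. It depends on how often a given word occurs in the file. If most words repeat many times, the try-except version tends to be the best choice, because the exception is rarely raised. If most words occur only a few times, the explicit membership check is usually faster, because raising and catching KeyError is comparatively expensive.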
I have done some simple benchmarks here
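The original benchmark link is not reproduced here, so the following is only a rough sketch of how such a comparison could be run with timeit. The function names count_check and count_except and the two word lists are made up for illustration, and the timings will vary by machine and Python version.

```python
import timeit

def count_check(words):
    """Count words with an explicit membership check."""
    counts = {}
    for w in words:
        if w not in counts:
            counts[w] = 0
        counts[w] += 1
    return counts

def count_except(words):
    """Count words by catching KeyError for unseen words."""
    counts = {}
    for w in words:
        try:
            counts[w] += 1
        except KeyError:
            counts[w] = 1
    return counts

# Hypothetical workloads: a few words repeated often vs. every word distinct.
repeated = ["spam", "eggs", "ham"] * 10_000
distinct = [str(i) for i in range(30_000)]

for label, words in (("repeated", repeated), ("distinct", distinct)):
    t_check = timeit.timeit(lambda: count_check(words), number=5)
    t_except = timeit.timeit(lambda: count_except(words), number=5)
    print(f"{label}: check={t_check:.3f}s except={t_except:.3f}s")
```

Both functions return the same counts, so the only thing being compared is the cost of the check against the cost of the exception path.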
This is a perfect job for the collections module in the Python standard library. From it, you can import Counter, which is a dictionary subclass made for just this.
How you want to process your data is up to you. One way to do this would be something like this:
from collections import Counter
# Open your file and read its contents
with open("yourfile.txt","r") as infile:
    textData = infile.read()
# Replace characters you don't want with empty strings
textData = textData.replace(".","")
textData = textData.replace(",","")
# Split on any whitespace, including newlines
textList = textData.split()
# Put your data into the counter container datatype
dic = Counter(textList)
# Print out the results
for key,value in dic.items():
    print("Word: %s\n Count: %d\n" % (key,value))
Hope this helps!
Matt
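For larger files, a Counter can also be filled one line at a time instead of reading the whole text at once. This is a small sketch with a made-up io.StringIO input in place of a real file.

```python
import io
from collections import Counter

# Hypothetical stand-in for a text file.
infile = io.StringIO("a b a\nb a c\n")

counts = Counter()
for line in infile:
    # Counter.update adds one to the count of every word in the iterable.
    counts.update(line.lower().split())

# most_common(n) returns the n highest counts as (word, count) pairs.
print(counts.most_common(2))  # [('a', 3), ('b', 2)]
```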

How to create a dictionary from a file?

I'm trying to write Python code that will take in a text file and read it line by line. In each line, the words should go into the dictionary as keys, and the numbers should be the assigned values, as a list.
The file 'topics.txt' will be composed of hundreds of lines that have the same format as this:
1~cocoa
2~
3~
4~
5~grain~wheat~corn~barley~oat~sorghum
6~veg-oil~linseed~lin-oil~soy-oil~sun-oil~soybean~oilseed~corn~sunseed~grain~sorghum~wheat
7~
8~
9~earn
10~acq
and so on..
I need to create a dictionary entry for each word. For example, the name "grain" would ideally be a key in the dictionary, and its value would be dict[grain]: [5,6,..].
Similarly, "cocoa" would be another key, and its value would be dict[cocoa]: [1,..].
This is not much, but it is what I have so far:
with open("topics.txt", "r") as fi: # Data read from a text file is a string
    d = {}
    for i in fi.readlines():
        temp = i.split()
        #i am lost here
        num = temp[0]
        d[name] = [map(int, num)]
http://docs.python.org/3/library/collections.html#collections.defaultdict
import collections
with open('topics.txt') as f:
    d = collections.defaultdict(list)
    for line in f:
        value, *keys = line.strip().split('~')
        for key in filter(None, keys):
            d[key].append(int(value))  # store the line number as an int
value, *keys = ... is Extended Iterable Unpacking which is only available in Python 3.x.
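with open("topics.txt", "r") as file: # Data read from a text file is a string
    topics = {}  # named topics to avoid shadowing the built-in dict
    for fullLine in file:
        splitLine = fullLine.strip().split("~")
        num = int(splitLine[0])
        for name in splitLine[1:]:
            if not name:
                continue  # skip lines such as "2~" that have no topics
            if name in topics:
                topics[name] = topics[name] + (num,)
            else:
                topics[name] = (num,)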
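To check the expected result on a few lines from the question, here is a self-contained sketch. It assumes the "number~topic~topic" layout shown in the question, uses io.StringIO in place of topics.txt, and the name index is illustrative.

```python
import io
from collections import defaultdict

# A few shortened lines in the question's format, as a stand-in for topics.txt.
sample = io.StringIO("1~cocoa\n2~\n5~grain~wheat~corn\n6~veg-oil~corn~grain\n")

index = defaultdict(list)
for line in sample:
    number, *names = line.strip().split("~")
    for name in names:
        if name:  # "2~" yields one empty name, which is skipped
            index[name].append(int(number))

print(index["grain"])  # [5, 6]
print(index["cocoa"])  # [1]
```

defaultdict(list) creates the empty list the first time a topic is seen, which removes the need for the if/else branch.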
