I'm trying to write Python code that reads a text file line by line. In each line, the words should become keys in a dictionary, and the line numbers should be the assigned values, stored as a list.
The file 'topics.txt' is composed of hundreds of lines that have the same format as this:
1~cocoa
2~
3~
4~
5~grain~wheat~corn~barley~oat~sorghum
6~veg-oil~linseed~lin-oil~soy-oil~sun-oil~soybean~oilseed~corn~sunseed~grain~sorghum~wheat
7~
8~
9~earn
10~acq
and so on..
I need to create a dictionary entry for each word.
For example:
Ideally, the name "grain" would be a key in the dictionary, and the values would be dict[grain]: [5,6,..].
Similarly,
"cocoa" would be another key and values would be
dict[cocoa]:[1,..]
Not much, but so far:
with open("topics.txt", "r") as fi:  # Data read from a text file is a string
    d = {}
    for i in fi.readlines():
        temp = i.split()
        #i am lost here
        num = temp[0]
        d[name] = [map(int, num)]
http://docs.python.org/3/library/collections.html#collections.defaultdict
import collections
with open('topics.txt') as f:
    d = collections.defaultdict(list)
    for line in f:
        value, *keys = line.strip().split('~')
        for key in filter(None, keys):  # skip the empty string left by lines like "2~"
            d[key].append(int(value))  # store the number as an int, as in [5, 6]
value, *keys = ... is Extended Iterable Unpacking which is only available in Python 3.x.
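To see the unpacking at work without needing topics.txt on disk, here is a minimal sketch that runs on a few in-memory lines. The helper name build_index and the sample lines are made up for the example; they are not part of the original answer.

```python
from collections import defaultdict

def build_index(lines):
    """Map each topic word to the list of numbers of the lines it appears on."""
    index = defaultdict(list)
    for line in lines:
        # The first field is the number, the rest (possibly empty) are the words.
        number, *words = line.strip().split('~')
        for word in filter(None, words):  # drop the empty string from "2~"
            index[word].append(int(number))
    return dict(index)

# Hypothetical sample in the same format as topics.txt
sample = ["1~cocoa", "2~", "5~grain~wheat~corn", "6~veg-oil~corn~grain"]
index = build_index(sample)
print(index["grain"])  # [5, 6]
```

Converting back to a plain dict at the end is optional; it just means a lookup of a missing word raises KeyError instead of silently creating an empty list.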
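with open("topics.txt", "r") as file:  # Data read from a text file is a string
    topics = {}  # not named "dict", which would shadow the built-in
    for fullLine in file:
        splitLine = fullLine.strip().split("~")  # strip the trailing newline first
        num = int(splitLine[0])
        for name in splitLine[1:]:
            if not name:  # lines such as "2~" have no words
                continue
            if name in topics:
                topics[name] = topics[name] + (num,)
            else:
                topics[name] = (num,)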
I have to use element 0 of words as a dictionary key and set the value of to_nato for that key to words element 1.
I have this:
natofile = "nato-alphabet.txt"
to_nato = {}  # creates an empty dict
fh = open(natofile)  # opens natofile
for line in fh:
    clean = line.strip()
    lowerl = clean.lower()
    words = lowerl.split()
    to_nato = {words[0]:words[1]}
    print(to_nato)
nato-alphabet is a text file that looks like this:
A Alfa
B Bravo
C Charlie
D Delta
E Echo
F Foxtrot
G Golf
H Hotel
I India
My code returns a list of dictionaries instead of one dictionary.
Directly set the key value with dict_object[key] = value:
to_nato[words[0]] = words[1]
This can be written more concisely using the dict constructor and a generator expression.
to_nato = dict(line.lower().split() for line in fh)
Try this:
natofile = "nato-alphabet.txt"
to_nato = {}  # creates an empty dict
fh = open(natofile)  # opens natofile
for line in fh:
    clean = line.strip()
    lowerl = clean.lower()
    words = lowerl.split()
    to_nato[words[0]] = words[1]
fh.close()
print(to_nato)
This sets the element of to_nato with key words[0] to value words[1] for each pair in the file.
dict() can convert any list of pairs of values into a dict:
with open('nato-alphabet.txt') as fh:
    lines = fh.read().lower().splitlines()
lines = [line.strip().split() for line in lines]
my_dict = dict(lines)
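As a quick check of the pairs-to-dict idea, here is a small sketch that uses io.StringIO as a stand-in for nato-alphabet.txt (an assumption, so the example runs without the file). Note that dict() raises ValueError if any line does not split into exactly two words, for example an empty line.

```python
import io

# In-memory stand-in for nato-alphabet.txt (first three lines only)
fh = io.StringIO("A Alfa\nB Bravo\nC Charlie\n")

# Each line becomes a two-element list: [letter, code word]
pairs = [line.lower().split() for line in fh]

# dict() accepts any iterable of two-element sequences
to_nato = dict(pairs)
print(to_nato)  # {'a': 'alfa', 'b': 'bravo', 'c': 'charlie'}
```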
I am trying to compare two files and capture the lines that match each other. For example,
file1.txt contains
my
sure
file2.txt contains
my : 2
mine : 5
sure : 1
and I am trying to output
my : 2
sure : 1
I have the following code so far
inFile = "file1.txt"
dicts = "file2.txt"
with open(inFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]
with open(dicts) as fd:
    inDict = fd.readlines()
inDict = [x.strip() for x in inDict]
ordered_dict = {}
for line in inDict:
    key = line.split(":")[0].strip()
    value = int(line.split(":")[1].strip())
    ordered_dict[key] = value
for (key, val) in ordered_dict.items():
    for entry in content:
        if entry == key:  # compare the word with the key, not with the whole list
            print(key, val)
        else:
            continue
However, this is very inefficient because the loops are nested: the whole word list is scanned once for every dictionary entry. Therefore, this is not ideal when it comes to large files. How can I make this workable for large files?
You don't need nested loops. One loop to read in file2 and translate to a dict, and another loop to read file1 and look up the results.
inFile = "file1.txt"
dicts = "file2.txt"
ordered_dict = {}
with open(dicts) as fd:
    for line in fd:
        a,b = line.strip().split(' : ')  # strip so b does not keep the newline
        ordered_dict[a] = b
with open(inFile) as f:
    for line in f:
        line = line.strip()
        if line in ordered_dict:
            print( line, ":", ordered_dict[line] )
The first loop can be written as a generator expression passed to dict():
with open(dicts) as fd:
    ordered_dict = dict( line.strip().split(' : ') for line in fd )
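The two-pass idea (build a dict once, then do one lookup per word) can be sketched on in-memory data as follows. The function name matching_lines and the sample lists are assumptions for the example; with real files you would pass the open file objects instead of lists.

```python
def matching_lines(words, dict_lines):
    """Return 'key : value' strings for every word that has an entry."""
    # Pass 1: build the lookup table once.
    lookup = dict(line.strip().split(' : ') for line in dict_lines)
    # Pass 2: one dict lookup per word, no nested loop.
    return [f"{w} : {lookup[w]}" for w in map(str.strip, words) if w in lookup]

file1 = ["my\n", "sure\n"]
file2 = ["my : 2\n", "mine : 5\n", "sure : 1\n"]
result = matching_lines(file1, file2)
print(result)  # ['my : 2', 'sure : 1']
```

Because dict membership tests take roughly constant time on average, the total work grows with the sum of the two file lengths rather than their product.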
Here is a solution with one for loop:
inFile = "file1.txt"
dicts = "file2.txt"
with open(inFile) as f:
    content = set(map(str.strip, f))  # a set makes the membership test fast
ordered_dict = {}
with open(dicts) as fd:
    for dline in fd:
        key, val = dline.strip().split(" : ")
        if key in content:
            ordered_dict[key] = val
I am attempting to compare 2 files, A and B. The purpose is to find all the words A has but that are not in B. For example,
File A
my: 2
hello: 5
me: 1
File B
my
name
is
output
hello
me
The code I have so far is
inFile = "fila.txt"
lexicon = "fileb.xml"
with open(inFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]
with open(lexicon) as File:
    lexicon_file = File.readlines()
lexicon_file = [x.strip() for x in lexicon_file]
ordered_dict = {}
for line in content:
    key = line.split(":")[0].strip()
    value = int(line.split(":")[1].strip())
    ordered_dict[key] = value
for entry in lexicon_file:
    for (key, val) in ordered_dict.items():
        if entry == key:
            continue
        else:
            print(key)
However, this takes too long because of the nested loops, and it also prints duplicate words. How do I make this efficient?
Convert the dictionary keys and the lexicon list into sets and just do a subtraction:
content_wo_lexicon = list(set(ordered_dict) - set(lexicon_file))
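A small sketch of the set approach on the sample data from the question (the in-memory lists stand in for the two files, which is an assumption for the example). One caveat: a set difference does not preserve order, so an order-preserving variant is shown as well.

```python
content = ["my: 2", "hello: 5", "me: 1"]  # lines of file A
lexicon = ["my", "name", "is"]            # lines of file B

# Only the word before the colon matters for the comparison.
keys = [line.split(":")[0].strip() for line in content]
lexicon_set = set(lexicon)

# Set difference: each missing word appears once, in arbitrary order.
missing = set(keys) - lexicon_set

# Same result, but in the order the words appear in file A.
missing_in_order = [k for k in keys if k not in lexicon_set]
print(missing_in_order)  # ['hello', 'me']
```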
So, I have this text file which contains this information:
student_num1 student_name1 student_grade1
student_num2 student_name2 student_grade2
student_num3 student_name3 student_grade3
What I want to do is take each line of this text file as a dictionary entry with this format:
students = { student_num1: [student_name1, student_grade1], student_num2: [student_name2, student_grade2], student_num3: [student_name3, student_grade3] }
Basically, the first string of the line should be the key and the 2 strings next to it should be the value. But I don't know how to make Python separate the strings in each line and assign them as the key and value for the dictionary.
EDIT:
So, I've tried some code: (I saw all your solutions, and I think they'll all definitely work, but I also want to learn to create my own solution, so I would really appreciate it if you could check mine!)
for line in fh:
    line = line.split(";")
    student_num = line[0]
    student_name = line[1]
    student_grade = line[2]
    count =+ 1
    direc[student_num] = [student_name,student_grade]
    student_num = "student_num" + str(count)
    student_grade = "student_grade" + str(count)
    student_name = "student_name" + str(count)
print(direc)
The problem is I get a "list index out of range" error on line 10, at this part: "student_name = line[1]"
EDIT: THANK YOU EVERYONE! Every single one of your suggested solutions works! I've also fixed my own solution. This is the fixed one (as suggested by #norok2):
for line in fh:
    line = line.split(" ")
    student_num = line[0]
    student_name = line[1]
    student_grade = line[2]
    count =+ 1
    direc[student_num] = [student_name,student_grade]
    student_num = "student_num" + str(count)
    student_grade = "student_grade" + str(count)
    student_name = "student_name" + str(count)
As a dict comprehension:
with open("data.txt", "r") as f:
    students = {k:v for k, *v in map(str.split, f)}
Explanation:
The file object f is already an iterator (that yields each line). We want to split the lines, so we can use map(str.split, f) or (line.split() for line in f).
After that, we know that the first item is the key of the dictionary and the remaining items are the values. We can use unpacking for that. An unpacking example:
>>> a, *b = [1,2,3]
>>> a
1
>>> b
[2, 3]
Then we use a comprehension to build the dict with the values we are capturing in the unpacking.
A dict comprehension is an expression that builds a dictionary, for example:
>>> {x:x+1 for x in range(5)}
{0: 1, 1: 2, 2: 3, 3: 4, 4: 5}
Example,
File data.txt:
student_num1 student_name1 student_grade1
student_num2 student_name2 student_grade2
student_num3 student_name3 student_grade3
Reading it
>>> with open("data.txt", "r") as f:
...     students = {k:v for k, *v in map(str.split, f)}
...
>>> students
{'student_num1': ['student_name1', 'student_grade1'], 'student_num2': ['student_name2', 'student_grade2'], 'student_num3': ['student_name3', 'student_grade3']}
My current approach opens the file in read mode and reads its lines. For each line, it strips the trailing newline and surrounding whitespace, then splits at the space character to create a list. It then uses unpacking to store the first value as the key and a list of the 2 remaining values as the value, and adds them to the dictionary.
temp.txt
student_num1 student_name1 student_grade1
student_num2 student_name2 student_grade2
student_num3 student_name3 student_grade3
main.py
d = dict()
with open("temp.txt", "r") as f:
    for line in f.readlines():
        key, *values = line.strip().split(" ")
        d[key] = values
print(d)
Output
{'student_num1': ['student_name1', 'student_grade1'], 'student_num2': ['student_name2', 'student_grade2'], 'student_num3': ['student_name3', 'student_grade3']}
with open('data.txt') as f:
    lines = f.readlines()
d = {}
for line in lines:
    tokens = line.split()
    d[tokens[0]] = tokens[1:]
print(d)
I hope this is understandable. To split the lines into the different tokens, we use the split function.
The reason why your solution is giving you that error is that it seems your lines do not contain the character ;, yet you try to split by that character with line = line.split(";").
You should replace that with:
line = line.split(" ") to split by the space character
or
line = line.split() to split by any blank character
However, for a more elegant solution, see here.
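The difference between the three calls is easy to see on a single line. This is only an illustrative sketch; the sample line (with a deliberate double space and a trailing newline) is made up for the example.

```python
line = "student_num1 student_name1  student_grade1\n"  # note the double space

by_semicolon = line.split(";")  # no ';' in the line, so the whole line is one item
by_space = line.split(" ")      # splits on every single space, keeps the newline
by_blank = line.split()         # splits on runs of whitespace, drops the newline

print(len(by_semicolon))  # 1, so by_semicolon[1] raises IndexError
print(by_space)           # ['student_num1', 'student_name1', '', 'student_grade1\n']
print(by_blank)           # ['student_num1', 'student_name1', 'student_grade1']
```

For whitespace-separated data, split() with no argument is usually the safer choice because repeated spaces, tabs, and the trailing newline do not end up in the result.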
Have you tried something as simple as this:
d = {}
with open('students.txt') as f:
    for line in f:
        key, *rest = line.split()
        d[key] = rest
print(d)
# {'student_num1': ['student_name1', 'student_grade1'], 'student_num2': ['student_name2', 'student_grade2'], 'student_num3': ['student_name3', 'student_grade3']}
file.txt:
student_num1 student_name1 student_grade1
student_num2 student_name2 student_grade2
student_num3 student_name3 student_grade3
Main.py:
def main():
    students = {}
    with open('file.txt', 'r') as file:
        for line in file:
            fields = line.split()  # split() also removes the trailing newline
            # The first field (the student number) is the key, as the question asks
            students[fields[0]] = [fields[1], fields[2]]
    print(students)

main()
Output:
{'student_num1': ['student_name1', 'student_grade1'], 'student_num2': ['student_name2', 'student_grade2'], 'student_num3': ['student_name3', 'student_grade3']}
I need this to print the corresponding line numbers from the text file.
def index (filename, lst):
    infile = open('raven.txt', 'r')
    lines = infile.readlines()
    words = []
    dic = {}
    for line in lines:
        line_words = line.split(' ')
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:
                dic[words[i][j]] = i
    return dic
The result:
In: index('raven.txt',['raven', 'mortal', 'dying', 'ghost', 'ghastly', 'evil', 'demon'])
Out: {'dying': 8, 'mortal': 29, 'raven': 77, 'ghost': 8}
(The words above appear in several lines, but it only prints one line per word, and for some words it doesn't print anything.
Also, it does not count the empty lines in the text file. So 8 should actually be 9 because there's an empty line which it is not counting.)
Please tell me how to fix this.
def index (filename, lst):
    with open(filename, 'r') as infile:  # use the parameter instead of a hardcoded name
        lines = infile.readlines()
    words = []
    dic = {}
    for line in lines:
        line_words = line.split()  # split() also drops the trailing newline
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:
                if words[i][j] not in dic:
                    dic[words[i][j]] = set()
                dic[words[i][j]].add(i + 1) #range starts from 0
    return dic
Using a set instead of a list is useful in cases where the word is present several times in the same line.
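To make the set-versus-list point concrete, here is a tiny sketch with a made-up line in which the searched word occurs twice. The line number 9 is an arbitrary value chosen for the example.

```python
line_words = "the raven said the raven".split()
lineno = 9  # hypothetical line number for this line

as_list, as_set = [], set()
for word in line_words:
    if word == "raven":
        as_list.append(lineno)  # records the line once per occurrence
        as_set.add(lineno)      # records the line only once

print(as_list)         # [9, 9]
print(sorted(as_set))  # [9]
```

The trade-off is that a set is unordered, so sort it (as above) if the line numbers need to be reported in order.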
Use defaultdict to create a list of line numbers for each word:
from collections import defaultdict
def index(filename, lst):
    with open(filename, 'r') as infile:
        lines = [line.split() for line in infile]
    word2linenumbers = defaultdict(list)
    for linenumber, line in enumerate(lines, 1):
        for word in line:
            if word in lst:
                word2linenumbers[word].append(linenumber)
    return word2linenumbers
You can also use dict.setdefault to either start a new list for each word or append to an existing list if that word has already been found:
def index(filename, lst):
    # For larger lists, checking membership will be asymptotically faster using a set.
    lst = set(lst)
    dic = {}
    with open(filename, 'r') as fobj:
        for lineno, line in enumerate(fobj, 1):
            words = line.split()
            for word in words:
                if word in lst:
                    dic.setdefault(word, []).append(lineno)
    return dic
Your two main problems can be fixed as follows:
1.) Multiple indices: you need to initialize a list as the dict value instead of just a single int. Otherwise, each word is reassigned a new index every time a new line is found with that word.
2.) Empty lines SHOULD be read as a line, so I think it's just an indexing issue. Your first line is indexed as 0 since the first number in a range is 0.
You can simplify your program as follows:
def index (filename, lst):
    wordinds = {key:[] for key in lst} #initializes an empty list for each word
    with open(filename,'r') as infile: #why use filename param if you hardcoded the open....
        #the with statement is useful. trust.
        for linenum,line in enumerate(infile):
            for word in line.rstrip().split(): #strip new line and split into words
                if word in wordinds:
                    wordinds[word].append(linenum)
    return {word: nums for word, nums in wordinds.items() if nums} #filters empty lists
This simplifies everything into one for loop that is enumerated for each line. If you want the first line to be 1 and the second line to be 2, you would have to change wordinds[word].append(linenum) to ....append(linenum + 1).
EDIT: someone made a good point in another answer: use enumerate(infile,1) to start your enumeration at index 1. That's way cleaner.
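A short sketch of the enumerate(infile, 1) idea, using io.StringIO as a stand-in for the text file (an assumption so it runs without raven.txt). It also shows that an empty line still advances the counter when you iterate over the file directly.

```python
import io

# Three lines; the second one is empty.
text = io.StringIO("first raven\n\nsecond raven here\n")

# Start counting at 1 so the numbers match what a text editor shows.
hits = [lineno for lineno, line in enumerate(text, 1) if "raven" in line.split()]
print(hits)  # [1, 3]
```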