Text to dictionary doesn't work - python

I have the following text file in the same folder as my Python code:
78459581
Black Ballpoint Pen
12345670
Football
49585922
Perfume
83799715
Shampoo
I have written this Python code.
file = open("ProductDatabaseEdit.txt", "r")
d = {}
for line in file:
    x = line.split("\n")
    a=x[0]
    b=x[1]
    d[a]=b
print(d)
This is the result I receive.
b=x[1] # IndexError: list index out of range
My dictionary should appear as follows:
{"78459581" : "Black Ballpoint Pen",
 "12345670" : "Football",
 "49585922" : "Perfume",
 "83799715" : "Shampoo"}
What am I doing wrong?

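A line is terminated by a linebreak, and for line in file hands you one line at a time, so line.split("\n") can never give you the next line: at most it gives you the current line plus an empty string left over after the trailing \n.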
You could cheat and do:
for first_line in file:
    second_line = next(file)

You can simplify your solution by passing a generator expression to dict(); this is probably the most Pythonic solution I can think of:
>>> with open("in.txt") as f:
...     my_dict = dict((line.strip(), next(f).strip()) for line in f)
...
>>> my_dict
{'12345670': 'Football', '49585922': 'Perfume', '78459581': 'Black Ballpoint Pen', '83799715': 'Shampoo'}
Where in.txt contains the data as described in the problem. It is necessary to strip() each line otherwise you would be left with a trailing \n character for your keys and values.

You need to strip the \n, not split on it:
file = open("products.txt", "r")
d = {}
for line in file:
    a = line.strip()
    b = next(file).strip()  # next(file) works in Python 2.6+ and 3.x; file.next() is Python 2 only
    d[a]=b
file.close()
print(d)
{'12345670': 'Football', '49585922': 'Perfume', '78459581': 'Black Ballpoint Pen', '83799715': 'Shampoo'}
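One case the loop above does not handle is a file with an odd number of lines: the final next(file) then raises StopIteration. A minimal sketch that passes a default value to next() instead, with io.StringIO standing in for products.txt (the sample data and the "report and stop" behaviour are illustrative choices, not part of the original answer):

```python
import io

# io.StringIO behaves like an open text file; the last code has no name after it
file = io.StringIO("78459581\nBlack Ballpoint Pen\n12345670\nFootball\n99999999\n")

d = {}
for line in file:
    value = next(file, None)  # returns the default None instead of raising StopIteration
    if value is None:
        print("No value for key", line.strip())
        break
    d[line.strip()] = value.strip()

print(d)  # {'78459581': 'Black Ballpoint Pen', '12345670': 'Football'}
```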

What's going on
When you open a file you get an iterator, which will give you one line at a time when you use it in a for loop.
Your code iterates over the file and splits every line into a list with \n as the delimiter, but each iteration only ever sees one line. Splitting "78459581\n" on "\n" gives ['78459581', ''], so b is just an empty string, not the product name on the next line. On the last line, if it has no trailing newline (such as "Shampoo"), the split gives a list with only one item, and accessing the second item raises IndexError: list index out of range.
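To see what line.split("\n") actually returns, you can try it on two sample lines: one with a trailing newline, and one without (the second case assumes the file's last line has no newline at the end):

```python
# A line read from a file keeps its trailing newline (the last line may not have one)
with_newline = "78459581\n"
without_newline = "Shampoo"

parts_a = with_newline.split("\n")     # ['78459581', '']
parts_b = without_newline.split("\n")  # ['Shampoo'], so parts_b[1] raises IndexError

print(parts_a)
print(parts_b)

# strip() is what removes the newline (and any surrounding spaces)
print(repr(with_newline.strip()))  # '78459581'
```

So x[1] is either an empty string or missing; in neither case is it the value on the following line.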
How to fix it
What you need is this:
file = open('products.txt','r')
d = {}
for line in file:
    d[line.strip()] = next(file).strip()
In every loop you add a new key to the dictionary (by assigning a value to a key that didn't exist yet) and assign the next line as the value. The next() function is just telling the file iterator "please move on to the next line". So, to drive the point home: in the first loop iteration you set the first line as a key and assign the second line as the value; in the second iteration, you set the third line as a key and assign the fourth line as the value; and so on.
You need to call .strip() every time because each line read from the file ends with a newline character (\n), and your example file also seems to have a space at the end of every line; .strip() removes all leading and trailing whitespace.
Or...
You can also get the same result using a dictionary comprehension:
file = open('products.txt','r')
d = {line.strip():next(file).strip() for line in file}
Basically, it's a shorter version of the same code above. It's shorter but less readable, which is not necessarily what you want (a matter of taste).
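Another common idiom is to zip the file iterator with itself: because both arguments are the same iterator, each tuple that zip() produces holds two consecutive lines. A sketch with io.StringIO in place of products.txt; note that zip() silently drops a final unpaired line rather than raising an error:

```python
import io

# io.StringIO stands in for products.txt
f = io.StringIO("78459581\nBlack Ballpoint Pen\n12345670\nFootball\n")

# zip(f, f) draws both items of each pair from the same iterator,
# so every tuple is (key line, value line)
d = {key.strip(): value.strip() for key, value in zip(f, f)}

print(d)  # {'78459581': 'Black Ballpoint Pen', '12345670': 'Football'}
```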

In my solution I tried not to use any explicit loops, so I first load the text data with pandas:
import pandas as pd
file = pd.read_csv("test.txt", header = None)
Then I separate the keys and values for the dict like so:
keys, values = file[0::2].values, file[1::2].values
Then, we can directly zip these two as lists and create a dict:
result = dict(zip(list(keys.flatten()), list(values.flatten())))
To create this solution I used the information as provided in [question]: How to remove every other element of an array in python? (The inverse of np.repeat()?) and in [question]: Map two lists into a dictionary in Python
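The same slicing idea works on a plain list without pandas: [0::2] takes every other item starting at the first, and [1::2] every other item starting at the second. A sketch with the lines hard-coded as an assumption (in practice they could come from file.read().splitlines()):

```python
# Lines as they might come from file.read().splitlines()
lines = ["78459581", "Black Ballpoint Pen", "12345670", "Football",
         "49585922", "Perfume", "83799715", "Shampoo"]

keys = lines[0::2]    # items at even indices: the product codes
values = lines[1::2]  # items at odd indices: the product names
result = dict(zip(keys, values))

print(result)
```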

You can loop over a list two items at a time:
file = open("ProductDatabaseEdit.txt", "r")
data = [line.strip() for line in file.readlines()]  # strip the trailing newlines
file.close()
d = {}
for i in range(0, len(data), 2):
    d[data[i]] = data[i+1]

Try this code (where the data is in /tmp/tmp5.txt):
#!/usr/bin/env python3
d = dict()
iskey = True
with open("/tmp/tmp5.txt") as infile:
    for line in infile:
        if iskey:
            _key = line.strip()
        else:
            _value = line.strip()
            d[_key] = _value
        iskey = not iskey
print(d)
Which gives you:
{'12345670': 'Football', '49585922': 'Perfume', '78459581': 'Black Ballpoint Pen', '83799715': 'Shampoo'}

Related

Index error iterating over list in python

So I have this file that contains two words on each line. It looks like this:
[/lang:F </lang:foreign>
[lipmack] [lipsmack]
[Fang:foreign] <lang:foreign>
The first word is incorrectly formatted and the second one is correctly formatted. I am trying to put them in a dictionary. Below is my code.
textFile = open("hinditxt1.txt", "r+")
textFile = textFile.readlines()
flist = []
for word in textFile:
    flist.append(word.split())
fdict = dict()
for num in range(len(flist)):
    fdict[flist[num][0]] = flist[num][1]
First I split each line, then I try to put the words in a dictionary. But for some reason I get "IndexError: list index out of range" when doing so. What can I do to fix it? Thanks!
In Python it is better to iterate over the items of a list rather than over a range of indices. My guess is that the IndexError is coming from a line in the input file that is blank or does not contain any spaces.
with open("input.txt", 'r') as f:
    flist = [line.split() for line in f]
fdict = {}
for k, v in flist:
    fdict[k] = v
print(fdict)
The code above avoids needing to access elements of the list using an index by simply iterating over the items of the list itself. We can further simplify this by using a dict comprehension:
with open("input.txt", 'r') as f:
    flist = [line.split() for line in f]
fdict = {k: v for k, v in flist}
print(fdict)
Alternatively, you can add new key-value pairs with the dictionary's .update() method. It would look like:
for num in range(len(flist)):
    fdict.update({flist[num][0] : flist[num][1]})
A full example without file reading would look like:
in_words = ["[/lang:F </lang:foreign>",
            "[lipmack] [lipsmack]",
            "[Fang:foreign] <lang:foreign>"]
flist = []
for word in in_words:
    flist.append(word.split())
fdict = dict()
for num in range(len(flist)):
    fdict.update({flist[num][0]: flist[num][1]})
print(fdict)
Yielding:
{'[lipmack]': '[lipsmack]', '[Fang:foreign]': '<lang:foreign>', '[/lang:F': '</lang:foreign>'}
Your output order may differ: before Python 3.7, dictionaries did not preserve insertion order.
As @Alex points out, the IndexError most likely comes from improperly formatted data (i.e. a line with only one or zero items on it). I suspect the most likely cause is an extra \n at the end of your file, which leaves the last line(s) blank.
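If blank or malformed lines are the cause, one option is to skip them explicitly and report where they are. A sketch with the sample lines inlined (including a deliberately blank last line, as an assumption) rather than read from hinditxt1.txt:

```python
# Sample input; the final empty string mimics a trailing blank line
in_lines = ["[/lang:F </lang:foreign>",
            "[lipmack] [lipsmack]",
            "[Fang:foreign] <lang:foreign>",
            ""]

fdict = {}
for line_no, line in enumerate(in_lines, start=1):
    parts = line.split()
    if len(parts) != 2:
        # Blank line or wrong number of words: report it instead of crashing
        print("Skipping line", line_no, repr(line))
        continue
    wrong, right = parts
    fdict[wrong] = right

print(fdict)
```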

Reading text file into dic file results in incomplete dic file

The following file, 2016_01_22_Reps.txt, is a list of expanded contractions that I want to load into a Python dictionary:
“can't":"cannot","could've":"could have","could've":"could have","didn't":"did not","doesn't":"does not", “don't":"do not"," hadn't":"had not", "hasn't":"has not","haven't":"have not","I'll":"I will","I'm":"I am","I've":"I have","isn't":"is not","I'll":"I
Note that the contents are a single line, not multiple lines.
My code is as follows:
reps = open('2016_01_22_Reps.txt', 'r')
Reps1dic={}
for line in reps:
    x=line.split(",")
    a=x[0]
    b=x[1]
    c=len(b)-1
    b=b[0:c]
    Reps1dic[a]=b
print (Reps1dic)
The output in Reps1dic stops after the first two pairs of contractions. Its contents are as follows:
{‘2016_01_22Reps = {“can\’t”:”cannot”‘ : ‘”could\’ve”:”could have’}
Instructions and an explanation of why the complete file contents are not written to the dictionary would be most appreciated.
The problem is that your values are all on one line, so for line in reps runs only one iteration, and in that iteration you only use the first two comma-separated items (x[0] and x[1]), which become a single key and value. Do something like this:
with open('2016_01_22_Reps.txt', 'r') as reps:
    Reps1dic={}
    contents = reps.read()
    pairs = contents.split(',')
    for pair in pairs:
        if ':' not in pair:
            continue  # skip empty or truncated entries (e.g. after a trailing comma)
        parts = pair.split(':', 1)
        a = parts[0].replace('"', '').strip()
        b = parts[1].replace('"', '').strip()
        Reps1dic[a] = b
print(Reps1dic)
Here you split the file contents and iterate over that list instead of over the lines in the file. I also used the with statement to open your file; it's much better practice because it closes the file for you.
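If the file really is a sequence of quoted key:value pairs, a regular expression can pull out every pair in one pass and also copes with stray spaces after the commas. A sketch assuming straight double quotes (the curly quotes visible in the question would need to be added to the pattern or replaced first), with a hypothetical excerpt of the file inlined:

```python
import re

# Hypothetical excerpt of the file contents: one line of "key":"value" pairs
contents = '''"can't":"cannot","could've":"could have", "didn't":"did not"'''

# Capture the text inside the double quotes on either side of each colon
pair_pattern = re.compile(r'"([^"]*)"\s*:\s*"([^"]*)"')
Reps1dic = dict(pair_pattern.findall(contents))  # findall returns (key, value) tuples

print(Reps1dic)
```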

Python reading file problems

highest_score = 0
g = open("grades_single.txt","r")
arrayList = []
for line in highest_score:
    if float(highest_score) > highest_score:
        arrayList.extend(line.split())
g.close()
print(highest_score)
Hello, I wondered if anyone could help me; I'm having problems here. I have to read in a file which contains 3 lines. The first line is of no use, and nor is the 3rd. The second contains a list of letters which I have to pull out (for instance all the As, all the Bs, all the Cs, all the way up to G); there are multiple letters of each. I have to be able to count how many of each there are with this program. I'm very new to this, so please bear with me if the code is wrong. I just wondered if anyone could point me in the right direction on how to pull out these letters on the second line and count them. I then have to do a mathematical function with these letters, but I hope to work that out for myself.
Sample of the data:
GTSDF60000
ADCBCBBCADEBCCBADGAACDCCBEDCBACCFEABBCBBBCCEAABCBB
*
You do not read the contents of the file. To do so, use the .read() or .readlines() method on your opened file. .readlines() returns a list with each line of the file as a separate string, like so:
g = open("grades_single.txt","r")
filecontent = g.readlines()
Since it is good practice to close your file as soon as you have read its contents, follow this directly with:
g.close()
Another option would be:
with open("grades_single.txt","r") as g:
    content = g.readlines()
The with statement closes the file for you (so you don't need to call the .close() method this way).
Since you need the contents of the second line only you can choose that one directly:
content = g.readlines()[1]
.readlines() doesn't strip the newline (usually \n) from each line, so you still have to do so yourself:
content = g.readlines()[1].strip('\n')
The .count()-method lets you count items in a list or in a string. So you could do:
dct = {}
for item in content:
    dct[item] = content.count(item)
This can be written more concisely with a dictionary comprehension:
dct = {item:content.count(item) for item in content}
Finally, you can get the highest score and print it:
highest_score = max(dct.values())
print(highest_score)
.values() returns the values of a dictionary, and max(), well, returns the maximum value of an iterable such as a list.
Thus the code that does what you're looking for could be:
with open("grades_single.txt","r") as g:
    content = g.readlines()[1].strip('\n')
dct = {item:content.count(item) for item in content}
highest_score = max(dct.values())
print(highest_score)
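Calling content.count(item) for every character rescans the whole string each time. collections.Counter from the standard library counts all letters in a single pass; a sketch using the sample second line from the question in place of the file contents:

```python
from collections import Counter

# Second line of grades_single.txt, as shown in the question
content = "ADCBCBBCADEBCCBADGAACDCCBEDCBACCFEABBCBBBCCEAABCBB"

counts = Counter(content)  # maps each letter to how often it occurs
highest_score = max(counts.values())

print(counts.most_common(3))  # the three most frequent letters with their counts
print(highest_score)
```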
highest_score = 0
arrayList = []
with open("grades_single.txt") as f:
    arrayList.extend(f.readlines()[1].strip())
print(arrayList)
This extends arrayList with the characters of the second line of the file (one list item per letter); then you can do whatever you want with that list.
import re
# opens the file in read mode (and closes it automatically when done)
with open('my_file.txt', 'r') as opened_file:
    # Temporarily stores all lines of the file here.
    all_lines_list = []
    for line in opened_file.readlines():
        all_lines_list.append(line)
# This is the selected pattern.
# It basically means "match a single character from a to g"
# and ignores upper or lower case
pattern = re.compile(r'[a-g]', re.IGNORECASE)
# Which line I want to choose (assuming you only need one line chosen)
line_num_i_need = 2
# (1 is deducted since the first element in Python has index 0)
matches = re.findall(pattern, all_lines_list[line_num_i_need-1])
print('\nMatches found:')
print(matches)
print('\nTotal matches:')
print(len(matches))
You might want to check regular expressions in case you need some more complex pattern.
To count the occurrences of each letter I used a dictionary instead of a list. With a dictionary, you can access each letter count later on.
d = {}
g = open("grades_single.txt", "r")
for i,line in enumerate(g):
    if i == 1:
        holder = list(line.strip())
g.close()
for letter in holder:
    d[letter] = holder.count(letter)
for key,value in d.items():  # d.iteritems() exists only in Python 2
    print("{},{}".format(key,value))
Outputs (the order of the lines may differ between Python versions):
A,9
C,15
B,15
E,4
D,5
G,1
F,1
One can treat the first line specially (and in this case ignore it) with next inside try: except StopIteration:. In this case, where you only want the second line, follow with another next instead of a for loop.
with open("grades_single.txt") as f:
    try:
        next(f)  # discard 1st line
        line = next(f)
    except StopIteration:
        raise ValueError('file does not even have two lines')
    # now use line

Adding characters from an external file to a dictionary

In Python, how would I select a single character from a txt document that contains the following:
A#
M*
N%
(on separate lines) ...and then update a dictionary with the letter as the key and the symbol as the value?
The closest I have got is:
ftwo = open ("clues.txt", "r")
for lines in ftwo.readlines():
    for char in lines:
I'm pretty new to coding, so I can't work it out!
Supposing that each line contains exactly two characters, not counting the newline (first the key, then the value):
with open('clues.txt', 'r') as f:
    myDict = {a[0]: a[1] for a in f}
If you have empty lines in your input file, you can filter these out:
with open('clues.txt', 'r') as f:
    myDict = {a[0]: a[1] for a in f if a.strip()}
First, you'll want to read each line one at a time:
my_dict = {}
with open ("clues.txt", "r") as ftwo:
    for line in ftwo:
        # Then, you'll want to put your elements in a dict
        my_dict[line[0]] = line[1]
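Both answers assume every line is exactly a letter followed by a symbol. A slightly more defensive sketch strips the newline first and skips lines that are not exactly two characters long; io.StringIO stands in for clues.txt, and the blank line in the sample data is an assumption added to show the skipping:

```python
import io

clues = io.StringIO("A#\nM*\n\nN%\n")  # includes a blank line on purpose

my_dict = {}
for line in clues:
    line = line.rstrip("\n")
    if len(line) != 2:
        continue  # skip blank or malformed lines
    letter, symbol = line  # unpack the two characters of the string
    my_dict[letter] = symbol

print(my_dict)  # {'A': '#', 'M': '*', 'N': '%'}
```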

compare two file and find matching words in python

I have two files. The first one includes terms and their frequencies:
table 2
apple 4
pencil 89
The second file is a dictionary:
abroad
apple
bread
...
I want to check whether the first file contains any words from the second file. For example, both the first file and the second file contain "apple".
I am new to Python.
I tried something but it does not work. Could you help me? Thank you
for line in dictionary:
    words = line.split()
    print words[0]
for line2 in test:
    words2 = line2.split()
    print words2[0]
Something like this:
with open("file1") as f1, open("file2") as f2:
    words = set(line.strip() for line in f1)  # create a set of words from the dictionary file
    # why sets? set lookups are O(1) on average, so the overall complexity is O(N)
    # now loop over each line of the other file (word, freq file)
    for line in f2:
        word, freq = line.split()  # fetch word, freq
        if word in words:  # if word is found in the words set, print it
            print(word)
output:
apple
This may help you:
with open('file1.txt') as f1, open('file2.txt') as f2:
    file1 = set(line.split()[0] for line in f1 if line.strip())  # first column only: the term, not its frequency
    file2 = set(line.strip() for line in f2)
for line in file1 & file2:
    if line:
        print(line)
Here's what you should do:
First, you need to put all the dictionary words in some place where you can easily look them up. If you don't do that, you'd have to read the whole dictionary file every time you want to check one single word in the other file.
Second, you need to check if each word in the file is in the words you extracted from the dictionary file.
For the first part, you need to use either a list or a set. The difference between the two is that a list keeps the order you put the items in, while a set is unordered, so it doesn't matter which word you read first from the dictionary file. A set is also faster when you look up an item, because that's what it is designed for.
To see if an item is in a set, you can do: item in my_set which is either True or False.
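Following those two steps, a minimal sketch: build a set from the dictionary file, then test the first word of each line of the term-frequency file against it. The file contents are inlined with io.StringIO as an assumption, so the variable names here are illustrative:

```python
import io

# Stand-ins for the two files from the question
freq_file = io.StringIO("table 2\napple 4\npencil 89\n")
dict_file = io.StringIO("abroad\napple\nbread\n")

# Step 1: put all dictionary words in a set for fast membership tests
dictionary_words = {line.strip() for line in dict_file if line.strip()}

# Step 2: check the term (first column) of each line against that set
matches = []
for line in freq_file:
    parts = line.split()
    if parts and parts[0] in dictionary_words:
        matches.append(parts[0])

print(matches)  # ['apple']
```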
I put your first file (word and frequency) in try.txt and the single-word list in try_match.txt:
f = open('try.txt', 'r')
f_match = open('try_match.txt', 'r')
dictionary = []
for line in f:
    a, b = line.split()
    dictionary.append(a)
for line in f_match:
    if line.split()[0] in dictionary:
        print(line.split()[0])
