Reading text file into dic file results in incomplete dic file

The file 2016_01_22_Reps.txt, shown below, is a list of contractions and their expansions that I want to load into a Python dictionary:
“can't":"cannot","could've":"could have","could've":"could have","didn't":"did not","doesn't":"does not", “don't":"do not"," hadn't":"had not", "hasn't":"has not","haven't":"have not","I'll":"I will","I'm":"I am","I've":"I have","isn't":"is not","I'll":"I
Note that the contents are a single line, not multiple lines.
My code is as follows:
reps = open('2016_01_22_Reps.txt', 'r')
Reps1dic={}
for line in reps:
    x=line.split(",")
    a=x[0]
    b=x[1]
    c=len(b)-1
    b=b[0:c]
    Reps1dic[a]=b
print (Reps1dic)
Reps1dic ends up holding only the first two pairs of contractions. Its contents are as follows:
{‘2016_01_22Reps = {“can\’t”:”cannot”‘ : ‘”could\’ve”:”could have’}
An explanation of why the complete file contents do not end up in the dictionary, and instructions for fixing it, would be much appreciated.

The problem is that your values are all on one line, so your for line in reps loop runs only once. Within that single iteration you only ever use x[0] and x[1], so everything after the second comma-separated item is discarded. Do something like this:
with open('2016_01_22_Reps.txt', 'r') as reps:
    Reps1dic = {}
    contents = reps.read()
    pairs = contents.split(',')
    for pair in pairs:
        parts = pair.split(':')
        a = parts[0].replace('"', '').strip()
        b = parts[1].replace('"', '').strip()
        Reps1dic[a] = b
print(Reps1dic)
This splits the line into pairs and iterates over that list instead of over the lines in the file. I also used the with statement to open your file; it is better practice because the file is closed automatically.
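To try the split-then-iterate idea without the original file, here is a minimal sketch that works on an in-memory string. The function name parse_pairs and the sample string are assumptions made for illustration, and skipping chunks that have no colon (such as a truncated final entry) is a choice of this sketch, not something the answer above does.

```python
def parse_pairs(text):
    """Turn '"key":"value","key2":"value2"' into a dict (hypothetical helper)."""
    result = {}
    for pair in text.split(','):
        # partition() splits on the first colon only and never raises
        key, sep, value = pair.partition(':')
        if not sep:
            continue  # no colon in this chunk: skip it instead of failing
        result[key.strip().strip('"')] = value.strip().strip('"')
    return result


# Made-up sample in the same single-line layout as the question's file
sample = '"can\'t":"cannot","could\'ve":"could have", "don\'t":"do not"'
contractions = parse_pairs(sample)
print(contractions)
```

Like the answer's code, this only strips straight double quotes; if the file really contains curly quotes, they would stay in the keys and values.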

Related

Reading a text file and replacing it to value in dictionary

I have a dictionary in Python and a text file where each line is a different word. I want to check each line of the text file against the keys of the dictionary and, if a line matches a key, write that key's value to an output file. Is there an easy way to do this? Is it even possible?
For example, I am reading my file in like this:
test = open("~/Documents/testfile.txt").read()
Then I tokenise it, and I want to look up each word token in a dictionary. My dictionary is set up like this:
dic = {"a": ["ah0", "ey1"], "a's": ["ey1 z"], "a.": ["ey1"], "a.'s": ["ey1 z"]}
If I come across the letter 'a' in my file, I want it to output ["ah0", "ey1"].

You can try:
for line in all_lines:
    for val in dic:
        if line.count(val) > 0:  # substring test: true if the key occurs anywhere in the line
            print(dic[val])
This looks through all the lines of the file and, if a line contains a key from dic, prints the items associated with that key. You first need the lines in a list; since test already holds the whole file as a string, all_lines = test.splitlines() does that. dic[val] is the list assigned to the key (for example ["ah0", "ey1"]), so you do not have to print it; you can use it elsewhere as well.

You can give this a try:
# Dictionary whose keys are matched against the words in the text file
dic = {"a": ["ah0", "ey1"], "a's": ["ey1 z"], "a.": ["ey1"], "a.'s": ["ey1 z"]}
# Read all lines from the text file
with open('sampletext.txt', 'r') as open_file:
    lines = open_file.readlines()
# Look up each word; if it is a key in the dictionary, write its list to the output file
with open('outputfile.txt', 'w') as write_to_file:
    for word in lines:
        word = word.strip()  # drop the trailing newline so the lookup can match
        if word in dic:
            write_to_file.write(str(dic[word]) + '\n')
Note: the output file is opened once, before the loop, because opening it in 'w' mode for every match would overwrite the earlier results. The strip() call is needed because each line read from the file still ends with "\n".
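The question also mentions tokenising the text, so here is a small sketch of an exact-token lookup, as opposed to a substring test. The helper name lookup_tokens and the sample text are made up for illustration, and splitting on whitespace is an assumption about what "tokenising" means here.

```python
dic = {"a": ["ah0", "ey1"], "a's": ["ey1 z"], "a.": ["ey1"], "a.'s": ["ey1 z"]}


def lookup_tokens(text, table):
    """Return (token, value) pairs for every whitespace-separated token found in table."""
    found = []
    for token in text.split():  # split() with no argument also discards newlines
        if token in table:      # exact key match, not a substring test
            found.append((token, table[token]))
    return found


# Hypothetical file contents
sample_text = "a dog\na's bone\n"
print(lookup_tokens(sample_text, dic))
```

Because the test is token in table, the word "dog" is not matched even though no key is involved, and a word such as "cat" would not match the key "a" the way a substring test would.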

Faster method for replacing multiple words in a file

I am making a mini-translator for Japanese words for a given file.
The script has an expandable dictionary file with 13k+ lines in this format:
JapaneseWord<:to:>EnglishWord
So I have to pick a line from the dictionary, then do a .split('<:to:>') to make a list in this format:
[JapaneseWord,EnglishWord]
Then I have to pick a line from the given file, find the first item of this list in that line and replace it with its English equivalent, repeating this within the same line as many times as the Japanese word appears, which I determine with the .count() function.
The problem is that this takes a long time, because I have to go through the file again and again, 14k+ times, and it will get worse as the dictionary grows.
I tried looking for a way to load the whole dictionary into memory and then apply all the entries to the given file in one go, so that I would only have to read the file once, but I couldn't do it.
Here's the function I am using right now. It takes a variable that holds the file's lines as a list, obtained with file.readlines():
def replacer(text):
    #Current Dictionary.
    cdic = open(argv[4], 'r', encoding='utf-8')
    #Part To Replace.
    for ptorep in cdic:
        ptorep = ptorep.strip('\n')
        ptorep = ptorep.split('<:to:>')
        for line in text:
            for clone in range(0, line.count(ptorep[0])):
                line = line.replace(ptorep[0], ptorep[1])
    text = ''.join(text)
    return text
This takes around 1 min for a single small file.

Dictionary Method:
import re
from sys import argv
with open(argv[4], 'r', encoding='utf-8') as file:
    translations = [line.strip('\n').split('<:to:>') for line in file]
translations = {t[0]: t[1] for t in translations}  # Convert to a dictionary where the key is the Japanese word and the value is its English translation
output = []
# `text` is the string to translate. Split it into words (may require tweaking); the capture group keeps the separators so they survive the join
for word in re.split(r'(\W+)', text):
    output.append(translations.get(word, word))  # Look up the key `word`; if it does not exist, keep `word` unchanged
output = ''.join(output)
Original Method:
Maybe keep the full dictionary in memory as a list:
cdic = open(argv[4], 'r', encoding='utf-8')
translations = []
for line in cdic.readlines():
    translations.append(line.strip('\n').split('<:to:>'))
# Note: I would use a list comprehension for this
with open(argv[4], 'r', encoding='utf-8') as file:
    translations = [line.strip('\n').split('<:to:>') for line in file.readlines()]
And make the replacements off of that:
def replacer(text, translations):
    for entry in translations:
        text = text.replace(entry[0], entry[1])
    return text
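Another way to avoid one pass over the text per dictionary entry is to compile all the keys into a single regular expression and substitute in one pass. This is a different technique from the two methods above, offered only as a sketch: the name build_replacer and the three sample entries are made up for illustration, and it has not been timed against a 13k-line dictionary.

```python
import re


def build_replacer(translations):
    """Return a function that applies every translation in a single pass (hypothetical helper)."""
    if not translations:
        return lambda text: text  # nothing to replace
    # Longest keys first, so a longer word wins over a shorter word it contains
    keys = sorted(translations, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: translations[m.group(0)], text)


# Made-up sample entries standing in for the dictionary file
translations = {'猫': 'cat', '犬': 'dog', '子猫': 'kitten'}
replace_all = build_replacer(translations)
print(replace_all('猫と犬と子猫'))
```

Unlike repeated str.replace calls, the text produced by one replacement is never scanned again, so an English word that was just inserted cannot be altered by a later entry.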

Text to dictionary doesn't work

I have the following text file in the same folder as my Python Code.
78459581
Black Ballpoint Pen
12345670
Football
49585922
Perfume
83799715
Shampoo
I have written this Python code.
file = open("ProductDatabaseEdit.txt", "r")
d = {}
for line in file:
    x = line.split("\n")
    a=x[0]
    b=x[1]
    d[a]=b
print(d)
This is the result I receive.
b=x[1] # IndexError: list index out of range
My dictionary should appear as follows:
{"78459581" : "Black Ballpoint Pen",
"12345670" : "Football",
"49585922" : "Perfume",
"83799715" : "Shampoo"}
What am I doing wrong?

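Each iteration of the loop gives you a single line, terminated by a linebreak, so line.split("\n") can never give you the next line. All you get is the line's own text, plus an empty string if the line ends with a linebreak; a final line without one yields a single-item list, and x[1] then raises the IndexError.
You could cheat and do:
for first_line in file:
    second_line = next(file)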

You can simplify your solution by passing a generator expression to dict(); this is probably the most pythonic solution I can think of:
>>> with open("in.txt") as f:
...     my_dict = dict((line.strip(), next(f).strip()) for line in f)
...
>>> my_dict
{'12345670': 'Football', '49585922': 'Perfume', '78459581': 'Black Ballpoint Pen', '83799715': 'Shampoo'}
Here in.txt contains the data as described in the problem. It is necessary to strip() each line; otherwise you would be left with a trailing \n character on your keys and values.

You need to strip the \n, not split on it:
file = open("products.txt", "r")
d = {}
for line in file:
    a = line.strip()
    b = next(file).strip()  # next(file) works in Python 2.6+ and 3.x; file.next() exists only in Python 2
    d[a] = b
print(d)
{'12345670': 'Football', '49585922': 'Perfume', '78459581': 'Black Ballpoint Pen', '83799715': 'Shampoo'}

What's going on
When you open a file you get an iterator, which will give you one line at a time when you use it in a for loop.
Your code iterates over the file and splits every line into a list with \n as the delimiter. That never reaches the next line: it only gives you the line's own text, followed by an empty string when the line ends with a newline. A final line with no trailing newline yields a single-item list, so accessing the second item fails. That's why you get the IndexError: list index out of range.
How to fix it
What you need is this:
file = open('products.txt','r')
d = {}
for line in file:
    d[line.strip()] = next(file).strip()
On every iteration you add a new key to the dictionary (by assigning a value to a key that didn't exist yet) and assign the next line as its value. The next() function just tells the file iterator "please move on to the next line". So, to drive the point home: in the first iteration you set the first line as a key and assign the second line as the value; in the second iteration you set the third line as a key and assign the fourth line as the value; and so on.
The reason you need the .strip() method every time is that each line read from the file still ends with its newline character (your example file also had a space at the end of every line), and .strip() removes both.
Or...
You can also get the same result using a dictionary comprehension:
file = open('products.txt','r')
d = {line.strip(): next(file).strip() for line in file}
Basically, this is a shorter version of the code above. It's shorter but less readable, which is not necessarily what you want (a matter of taste).
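A related idiom pairs up consecutive lines by zipping an iterator with itself. The sketch below uses io.StringIO as a stand-in for the opened product file, and the name pairs_to_dict is made up for illustration. One difference worth knowing: if the input has an odd number of lines, calling next() inside the loop raises an exception, whereas zip() simply drops the unpaired last line.

```python
import io


def pairs_to_dict(handle):
    """Pair up consecutive lines: 1st, 3rd, ... become keys and 2nd, 4th, ... values."""
    stripped = (line.strip() for line in handle)
    # zip() pulls from the same iterator twice per pair, so the lines are
    # consumed two at a time; an unpaired final line is dropped.
    return dict(zip(stripped, stripped))


# Stand-in for the file; note the deliberately unpaired last line
sample = io.StringIO("78459581\nBlack Ballpoint Pen\n12345670\nFootball\n49585922\n")
products = pairs_to_dict(sample)
print(products)
```

Whether silently dropping an unpaired line is acceptable depends on the data; for a product database it may be better to detect it and report the problem.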

In my solution I tried not to use any loops. Therefore, I first load the text data with pandas:
import pandas as pd
file = pd.read_csv("test.txt", header = None)
Then I separate the keys and values for the dict:
keys, values = file[0::2].values, file[1::2].values
Then, we can directly zip these two as lists and create a dict:
result = dict(zip(list(keys.flatten()), list(values.flatten())))
To create this solution I used the information provided in the questions "How to remove every other element of an array in python? (The inverse of np.repeat()?)" and "Map two lists into a dictionary in Python".

You can loop over a list two items at a time:
file = open("ProductDatabaseEdit.txt", "r")
data = file.readlines()
d = {}
for i in range(0, len(data), 2):
    d[data[i].strip()] = data[i+1].strip()  # strip the trailing newlines

Try this code (where the data is in /tmp/tmp5.txt):
#!/usr/bin/env python3
d = dict()
iskey = True
with open("/tmp/tmp5.txt") as infile:
    for line in infile:
        if iskey:
            _key = line.strip()
        else:
            _value = line.strip()
            d[_key] = _value
        iskey = not iskey
print(d)
Which gives you:
{'12345670': 'Football', '49585922': 'Perfume', '78459581': 'Black Ballpoint Pen', '83799715': 'Shampoo'}

Adding characters from an external file to a dictionary

In Python, how would I select a single character from a txt document that contains the following:
A#
M*
N%
(on separate lines) ...and then update a dictionary with the letter as the key and the symbol as the value.
The closest I have got is:
ftwo = open ("clues.txt", "r")
for lines in ftwo.readlines():
    for char in lines:
I'm pretty new to coding so I can't work it out!

Supposing that each line contains exactly two characters (first the key, then the value):
with open('clues.txt', 'r') as f:
    myDict = {a[0]: a[1] for a in f}
If you have empty lines in your input file, you can filter these out:
with open('clues.txt', 'r') as f:
    myDict = {a[0]: a[1] for a in f if a.strip()}

First, you'll want to read each line one at a time:
my_dict = {}
with open("clues.txt", "r") as ftwo:
    for line in ftwo:
        # Then, you'll want to put your elements in a dict
        my_dict[line[0]] = line[1]
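If the file might contain blank lines or lines that are not exactly two characters long, indexing line[1] can fail or pick up the newline. A slightly more defensive sketch, using io.StringIO as a stand-in for clues.txt and skipping any line that is not two characters after stripping (that skipping rule is an assumption, not part of the question):

```python
import io

# Stand-in for clues.txt, with a blank line added on purpose
sample = io.StringIO("A#\nM*\n\nN%\n")

clues = {}
for line in sample:
    line = line.strip()
    if len(line) != 2:
        continue  # skip blank or malformed lines
    letter, symbol = line  # unpack the two characters
    clues[letter] = symbol

print(clues)
```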

compare two file and find matching words in python

I have two files. The first one contains terms and their frequencies:
table 2
apple 4
pencil 89
The second file is a dictionary:
abroad
apple
bread
...
I want to check whether the first file contains any words from the second file. For example, both the first file and the second file contain "apple".
I am new to Python.
I tried something but it does not work. Could you help me? Thank you.
for line in dictionary:
    words = line.split()
    print(words[0])
for line2 in test:
    words2 = line2.split()
    print(words2[0])

Something like this:
with open("file1") as f1, open("file2") as f2:
    words = set(line.strip() for line in f1)  # create a set of words from the dictionary file
    # why sets? sets provide an O(1) lookup, so overall complexity is O(N)
    # now loop over each line of the other file (word, freq file)
    for line in f2:
        word, freq = line.split()  # fetch word, freq
        if word in words:  # if word is found in the words set then print it
            print(word)
output:
apple

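It may help you:
# file1.txt is the term/frequency file, so keep only the first word of each line
file1 = set(line.split()[0] for line in open('file1.txt') if line.strip())
file2 = set(line.strip() for line in open('file2.txt'))
for line in file1 & file2:
    if line:
        print(line)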

Here's what you should do:
First, you need to put all the dictionary words in some place where you can easily look them up. If you don't do that, you'd have to read the whole dictionary file every time you want to check one single word in the other file.
Second, you need to check if each word in the file is in the words you extracted from the dictionary file.
For the first part, you need to use either a list or a set. The difference between the two is that a list keeps the order in which you put the items into it, while a set is unordered, so it doesn't matter which word you read first from the dictionary file. A set is also faster when you look up an item, because that's what it is for.
To see if an item is in a set, you can do: item in my_set which is either True or False.
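The two steps described above can be sketched with in-memory data. The lists below stand in for the lines of the two files from the question, and the variable names are assumptions made for illustration.

```python
# Stand-ins for the lines of the two files in the question
freq_lines = ["table 2", "apple 4", "pencil 89"]
dictionary_lines = ["abroad", "apple", "bread"]

# Step 1: put every dictionary word into a set for fast membership tests
dictionary_words = set(word.strip() for word in dictionary_lines)

# Step 2: check the first field of each term/frequency line against the set
matches = []
for line in freq_lines:
    fields = line.split()
    if fields and fields[0] in dictionary_words:
        matches.append(fields[0])

print(matches)
```

Collecting the matches in a list keeps them in the order they appear in the frequency file, which a set intersection would not guarantee.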

I put your two-column list in try.txt and the single-column list in try_match.txt:
f = open('try.txt', 'r')
f_match = open('try_match.txt', 'r')
dictionary = []
for line in f:
    a, b = line.split()
    dictionary.append(a)
for line in f_match:
    if line.split()[0] in dictionary:
        print(line.split()[0])
