How to convert a file into a dictionary? - python

I have a file comprising two columns, i.e.,
1 a
2 b
3 c
I wish to read this file to a dictionary such that column 1 is the key and column 2 is the value, i.e.,
d = {1:'a', 2:'b', 3:'c'}
The file is small, so efficiency is not an issue.

d = {}
with open("file.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[int(key)] = val

This will leave the key as a string:
with open('infile.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)

You can also use a dict comprehension like:
with open("infile.txt") as f:
    d = {int(k): v for line in f for (k, v) in [line.strip().split(None, 1)]}

def get_pair(line):
    key, sep, value = line.strip().partition(" ")
    return int(key), value

with open("file.txt") as fd:
    d = dict(get_pair(line) for line in fd)

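By dictionary comprehension:
with open("file.txt") as f:
    d = {line.split()[0]: line.split()[1] for line in f}
Or by pandas:
import pandas as pd
# index_col=0 turns the first column into the index; the remaining column keeps the label 1
d = pd.read_csv("file.txt", delimiter=" ", header=None, index_col=0)[1].to_dict()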

Simple Option
Most methods for storing a dictionary use JSON, Pickle, or line reading. Provided you're not editing the dictionary outside of Python, this simple method should suffice even for complex dictionaries, although Pickle will be better for larger ones.
x = {1:'a', 2:'b', 3:'c'}
f = 'file.txt'
print(x, file=open(f,'w')) # file.txt >>> {1:'a', 2:'b', 3:'c'}
y = eval(open(f,'r').read())
print(x==y) # >>> True
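Since eval executes whatever expression the file contains, a more cautious variant of the same round trip could use ast.literal_eval, which only accepts Python literals. This is a minimal sketch, assuming the dictionary holds only literal keys and values (numbers, strings, tuples, lists, dicts). The helper names save_dict and load_dict and the temporary file location are made up for illustration:

```python
import ast
import os
import tempfile


def save_dict(d, path):
    """Write the repr of a dict of plain literals to a text file."""
    with open(path, "w") as f:
        f.write(repr(d))


def load_dict(path):
    """Read the file back; literal_eval rejects anything that is not a literal."""
    with open(path) as f:
        return ast.literal_eval(f.read())


# Round trip through a temporary file (hypothetical location).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
x = {1: 'a', 2: 'b', 3: 'c'}
save_dict(x, path)
y = load_dict(path)
print(x == y)
```

Unlike eval, load_dict raises an exception if the file contains a function call or any other non-literal expression.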

If you love one liners, try:
d=eval('{'+re.sub('\'[\s]*?\'','\':\'',re.sub(r'([^'+input('SEP: ')+',]+)','\''+r'\1'+'\'',open(input('FILE: ')).read().rstrip('\n').replace('\n',',')))+'}')
Input FILE = Path to file, SEP = Key-Value separator character
Not the most elegant or efficient way of doing it, but quite interesting nonetheless :)

IMHO a bit more pythonic to use generators (probably you need 2.7+ for this):
with open('infile.txt') as fd:
    pairs = (line.split(None) for line in fd)
    res = {int(pair[0]):pair[1] for pair in pairs if len(pair) == 2 and pair[0].isdigit()}
This will also filter out lines not starting with an integer or not containing exactly two items

I needed to read values from a text file and use them as key-value pairs. The file contains lines of the form key = value, so I used the split method with "=" as the separator and wrote the code below:
d = {}
with open("filename.txt") as file:
    for x in file:
        f = x.split("=")
        d.update({f[0].strip(): f[1].strip()})
The strip method removes any spaces before or after the "=" separator, so the dictionary holds the expected data.
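One thing to watch with x.split("="): a value that itself contains "=" is cut into more than two pieces, and a blank line raises IndexError on f[1]. A minimal sketch of a variant that splits on the first separator only, using str.partition. The function name parse_key_value and the sample lines are assumptions for illustration:

```python
def parse_key_value(lines, sep="="):
    """Build a dict from 'key = value' lines, splitting on the first separator only."""
    result = {}
    for line in lines:
        key, found, value = line.partition(sep)
        if not found:  # no separator on this line (e.g. a blank line): skip it
            continue
        result[key.strip()] = value.strip()
    return result


# Any iterable of lines works, including an open file object.
sample = ["name = Alice\n", "url = http://example.com/?a=b\n", "\n"]
settings = parse_key_value(sample)
print(settings)
```

Passing an open file (with open("filename.txt") as f: parse_key_value(f)) works the same way as the list used here.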

import re
d = {}
with open('file.txt', 'r') as my_file:
    for i in my_file:
        g = re.search(r'(\d+)\s+(.*)', i)  # match a line containing an int followed by a string
        if g:  # skip lines that do not match
            d[int(g.group(1))] = g.group(2)

Here's another option...
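import csv, os
events = {}
# path is assumed to be defined elsewhere; on Python 3 the csv module expects text mode with newline=''
with open(os.path.join(path, 'events.txt'), newline='') as f:
    for line in csv.reader(f):
        if not line or line[0].startswith("#"):  # skip blank lines and comments
            continue
        events[line[0]] = line[1] if len(line) == 2 else line[1:]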

Related

How can i add these sentences to a dictionary?

I have a directory of files named 45-1.txt, 1-17.txt, etc. Each name is basically two numbers separated by a '-', with .txt at the end.
I also have a dataset that looks like this, but with thousands of lines:
values/test/10/blueprint-0.png,2089.0,545.0,2100.0,546.0
values/test/10/blueprint-0.png,2112.0,545.0,2136.0,554.0
values/test/45/blueprint-1.png,112.0,45.0,36.0,654.0
The values that I care about in these lines are the first two numbers of each line, so 10-0, 10-0, 45-1, etc.
What I want to do is copy the lines whose two numbers, let's say 10-0, match the name of one of the files above. In this example the 45-1 line should be copied.
My approach:
import os,csv,re
my_dict = {}
source_dir = '/home/ubuntu/Desktop/EAST/testing_txts/'
for element in os.listdir(source_dir):
    my_dict[element] = ''
# print(my_dict)
with open('/home/ubuntu/Desktop/EAST/ground_truth.txt') as f:
    reader = csv.reader(f)
    for key in my_dict:
        for filename, *numbers in reader:
            k1, k2 = re.findall(r'\d+', filename)
            k3,k4 = re.findall(r'\d+', key)
            if k3 == k1 and k2 == k4:
                my_dict[key].append(filename)
To explain what I did: I read all the file names in my directory and made them keys in a dictionary. Then I read my file line by line for each key, and if a matching line exists I append the entire line to that key. So assuming the directory contains 25-1.txt, 45-1.txt and 1-0.txt, and the other file contains:
values/test/10/blueprint-0.png,2089.0,545.0,2100.0,546.0
values/test/10/blueprint-0.png,2112.0,545.0,2136.0,554.0
values/test/45/blueprint-1.png,112.0,45.0,36.0,654.0
values/test/45/blueprint-1.png,2.0,5.0,6.0,54.0
the end result will be three keys, with only 45-1 having elements in it: values/test/45/blueprint-1.png,112.0,45.0,36.0,654.0 and values/test/45/blueprint-1.png,2.0,5.0,6.0,54.0 (a list of elements).
The issue with my code above is that I can't append the full line properly and get my keys with their elements: I get an error saying append can't be used with strings. When I used my_dict[key] = filename as a test (knowing that it is wrong and overwrites), only my first key had any element in it; the rest were empty, even though they should have elements as well.
Edit:
After fixing the list issue thanks to a helpful answer and making some quick adjustments, my code became:
import os,csv,re
my_dict = {}
source_dir = '/home/ubuntu/Desktop/EAST/testing_txts/'
for element in os.listdir(source_dir):
    my_dict[element] = []
# print(my_dict)
with open('/home/ubuntu/Desktop/EAST/ground_truth.txt') as f:
    reader = csv.reader(f)
    for key in my_dict:
        for filename in reader:
            print(filename)
            k = []
            k.append(re.findall(r'\d+', str(filename)))
            k1,k2 = k[0][0],k[0][1]
            k3,k4 = re.findall(r'\d+', key)
            if k3 == k1 and k2 == k4:
                my_dict[key].append(filename)
print(my_dict)
However, my main issue persists: not every key gets its elements, and many keys stay empty.
for element in os.listdir(source_dir):
    my_dict[element] = ''
You have initialized the values of my_dict as strings, so calling append raises AttributeError: you can't append to a string.
Approach 1 is to make the values lists and join them into a string when reading them back. append will not throw an error in this case:
for element in os.listdir(source_dir):
    my_dict[element] = []
Approach 2 is to use string concatenation:
my_dict[key] += filename
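Issue 2
I am not really sure, but I'm guessing it is caused by looping over the dict: csv.reader is an iterator that can be consumed only once, so the inner loop runs for the first key and finds nothing left for the others. Looping over the file once and building the key from each row avoids that:
with open('/home/ubuntu/Desktop/EAST/ground_truth.txt') as f:
    reader = csv.reader(f)
    for row in reader:
        # row[0] is the image path, e.g. values/test/45/blueprint-1.png
        k1, k2 = re.findall(r'\d+', row[0])
        key = k1 + "-" + k2 + ".txt"
        if key in my_dict:
            my_dict[key].append(row)
print(my_dict)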
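The one-pass behaviour of csv.reader can be reproduced in a few lines. The sketch below uses an in-memory file (io.StringIO) in place of ground_truth.txt; the sample rows and the wanted dictionary are taken from the question's example, and the variable names are made up for illustration:

```python
import csv
import io
import re

data = (
    "values/test/10/blueprint-0.png,2089.0,545.0,2100.0,546.0\n"
    "values/test/45/blueprint-1.png,112.0,45.0,36.0,654.0\n"
    "values/test/45/blueprint-1.png,2.0,5.0,6.0,54.0\n"
)

# A csv.reader can be consumed only once.
reader = csv.reader(io.StringIO(data))
first_pass = list(reader)
second_pass = list(reader)  # the reader is already exhausted here
print(len(first_pass), len(second_pass))

# Single pass: derive the key from each row instead of looping over the keys.
wanted = {"25-1.txt": [], "45-1.txt": [], "1-0.txt": []}
for row in csv.reader(io.StringIO(data)):
    k1, k2 = re.findall(r"\d+", row[0])  # digits in the path column only
    key = "{}-{}.txt".format(k1, k2)
    if key in wanted:
        wanted[key].append(",".join(row))  # store the full line again
print(wanted)
```

Reading the file once also keeps the work proportional to the number of lines rather than lines times keys.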
import os,csv,re
my_dict = {}
source_dir = 'source'
for element in os.listdir(source_dir):
    my_dict[element] = []
# print(my_dict)
with open('readme.txt') as f:
    reader = f.readlines()
for key in my_dict:
    for line in reader:
        k1= re.findall(r'\d+', line)
        k1 = k1[0] + k1[1]
        key_stripped = key.replace('-','').replace('.txt', '')
        if k1 == key_stripped:
            my_dict[key].append(line)
print(my_dict)

how to read file line by line in python

I'm trying to read the text file below in Python. I'm struggling to get key-value output, and it's not working as expected:
test.txt
productId1 ProdName1,ProdPrice1,ProdDescription1,ProdDate1
productId2 ProdName2,ProdPrice2,ProdDescription2,ProdDate2
productId3 ProdName3,ProdPrice3,ProdDescription3,ProdDate3
productId4 ProdName4,ProdPrice4,ProdDescription4,ProdDate4
myPython.py
import sys
with open('test.txt') as f
    lines = list(line.split(' ',1) for line in f)
    for k,v in lines.items();
        print("Key : {0}, Value: {1}".format(k,v))
I'm trying to parse the text file and trying to print key and value separately. Looks like I'm doing something wrong here. Need some help to fix this?
Thanks!
You're needlessly storing a list.
Loop, split and print
with open('test.txt') as f:
    for line in f:
        k, v = line.rstrip().split(' ',1)
        print("Key : {0}, Value: {1}".format(k,v))
This should work, with a list comprehension:
with open('test.txt') as f:
    lines = [line.split(' ',1) for line in f]
for k, v in lines:
    print("Key: {0}, Value: {1}".format(k, v))
You can make a dict right off the bat with a dict comprehension and then iterate over its items to print what you wanted. What you had done was create a list, which does not have an items() method.
with open('notepad.txt') as f:
    d = {line.split(' ')[0]:line.split(' ')[1] for line in f}
for k,v in d.items():
    print("Key : {0}, Value: {1}".format(k,v))
lines is a list of lists, so the good way to finish the job is:
import sys
with open('test.txt') as f:
    lines = list(line.split(' ',1) for line in f)
for k,v in lines:
    print("Key : {0}, Value: {1}".format(k,v))
Perhaps I am reading too much into your description, but I see one key, a space, and a comma-delimited list of names of other fields. If I interpret that as there being data for those items that is comma-delimited, then I would conclude you want a dictionary of dictionaries. That would lead to code like:
data_keys = 'ProdName', 'ProdPrice', 'ProdDescription', 'ProdDate'
with open('test.txt') as f:
    for line in f:
        id, values = line.strip().split() # automatically on white space
        keyed_values = list(zip(data_keys, values.split(',')))  # list() so it can be concatenated on Python 3
        print(dict([('key', id)] + keyed_values))
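If the goal really is a dictionary of dictionaries, the per-line dicts can be collected under the product id instead of printed. A minimal sketch under that assumption; the names FIELDS, parse_products and catalog are made up for illustration, and the sample lines are copied from test.txt in the question:

```python
FIELDS = ("ProdName", "ProdPrice", "ProdDescription", "ProdDate")


def parse_products(lines):
    """Map each product id to a dict of its comma-separated fields."""
    products = {}
    for line in lines:
        if not line.strip():  # ignore blank lines
            continue
        product_id, values = line.strip().split(None, 1)
        products[product_id] = dict(zip(FIELDS, values.split(",")))
    return products


sample = [
    "productId1 ProdName1,ProdPrice1,ProdDescription1,ProdDate1\n",
    "productId2 ProdName2,ProdPrice2,ProdDescription2,ProdDate2\n",
]
catalog = parse_products(sample)
print(catalog["productId2"]["ProdPrice"])
```

Splitting with maxsplit 1 keeps the whole comma-separated part together even if a description contains spaces.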
You can use f.readlines(), which returns a list of the lines in the file f. I changed the code to call f.readlines() in line 3 and to loop over the list directly, since a list has no items() method.
import sys
with open('test.txt') as f:
    lines = list(line.split(' ',1) for line in f.readlines())
for k,v in lines:
    print("Key : {0}, Value: {1}".format(k,v))

Python read line from file and sort int descending

So I got this text file looking like this:
PID TTY TIME CMD
1000 pts/2 00:00:00 aash
9000 pts/2 00:00:00 bash
3000 pts/2 00:00:00 cash
What I want to end up with is some kind of dictionary where I save |(PID,CMD)| sorted by PID descending.
So it would look like this:
[(9000,bash),(3000,cash),(1000,aash)]
Any Ideas?
This is how I read the file and save in dictionary.
dict = {}
with open('newfile.txt') as f:
    next(f) #skipping first line
    for line in f:
        result[line.split()[3]] = int(line.split()[0])
Appreciate any kind of help! Thanks in advance !
So this is the solution:
import collections
result = {}
with open('newfile.txt') as f:
    next(f)
    for line in f:
        result[line.split()[3]] = int(line.split()[0])
print(collections.OrderedDict(sorted(result.items(), key=lambda t: t[1])))
This is what it prints out:
OrderedDict([('aash', 1000), ('cash', 3000), ('bash', 9000)])
If you need to end up with a list, it is best to read the data into a list and then sort it. Here is how:
lst = []
with open('newfile.txt') as f:
    next(f)
    for line in f:
        if line.strip(): # watch out for empty lines
            a, b, c, d = line.split()
            lst.append((int(a), d))
lst = sorted(lst)
print(lst)
====
[(1000, 'aash'), (3000, 'cash'), (9000, 'bash')]
sorted() compares tuples item by item, starting with the first, so you can use it in its basic form.
If what you need is a dictionary where the keys are sorted, then you can use OrderedDict, just import it and add another line to the code:
from collections import OrderedDict
and then
d = OrderedDict(lst)
print(d)
And here is the result:
OrderedDict([(1000, 'aash'), (3000, 'cash'), (9000, 'bash')])
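The outputs above are in ascending order, while the question asks for PID descending. A minimal sketch of the descending variant, assuming the four-column layout shown in the question; the function name pid_cmd_pairs is made up for illustration:

```python
def pid_cmd_pairs(lines):
    """Return (PID, CMD) tuples sorted by PID, largest first."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # skips the header row and blank lines
        pairs.append((int(fields[0]), fields[3]))
    return sorted(pairs, reverse=True)


sample = [
    "PID TTY TIME CMD\n",
    "1000 pts/2 00:00:00 aash\n",
    "9000 pts/2 00:00:00 bash\n",
    "3000 pts/2 00:00:00 cash\n",
]
print(pid_cmd_pairs(sample))
# prints [(9000, 'bash'), (3000, 'cash'), (1000, 'aash')]
```

Passing the open file object instead of sample works the same way, and the header line no longer needs next(f) because the isdigit() check skips it.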

Python: Read text file into dict and ignore comments

I am trying to put the following text file into a dictionary but I would like any section starting with '#' or empty lines ignored.
My text file looks something like this:
# This is my header info followed by an empty line
Apples 1 # I want to ignore this comment
Oranges 3 # I want to ignore this comment
#~*~*~*~*~*~*~*Another comment~*~*~*~*~*~*~*~*~*~*
Bananas 5 # I want to ignore this comment too!
My desired output would be:
myVariables = {'Apples': 1, 'Oranges': 3, 'Bananas': 5}
My Python code reads as follows:
filename = "myFile.txt"
myVariables = {}
with open(filename) as f:
    for line in f:
        if line.startswith('#') or not line:
            next(f)
        key, val = line.split()
        myVariables[key] = val
        print "key: " + str(key) + " and value: " + str(val)
The error I get:
Traceback (most recent call last):
File "C:/Python27/test_1.py", line 11, in <module>
key, val = line.split()
ValueError: need more than 1 value to unpack
I understand the error but I do not understand what is wrong with the code.
Thank you in advance!
Given your text:
text = """
# This is my header info followed by an empty line
Apples 1 # I want to ignore this comment
Oranges 3 # I want to ignore this comment
#~*~*~*~*~*~*~*Another comment~*~*~*~*~*~*~*~*~*~*
Bananas 5 # I want to ignore this comment too!
"""
We can do this in two ways: using regex, or using Python generators. I would choose the latter (described below), as regex is not particularly fast(er) in such cases.
To open the file:
with open('file_name.xyz', 'r') as file:
    # everything else below. Just substitute `for line in lines` with
    # `for line in file`
To simulate that here, we split the text into lines and create a list:
lines = text.split('\n') # as if read from a file using `open`.
Here is how we do all you want in a couple of lines:
# Discard all comments and empty values.
comment_less = filter(None, (line.split('#')[0].strip() for line in lines))
# Separate items and totals.
separated = {item.split()[0]: int(item.split()[1]) for item in comment_less}
Let's test:
>>> print(separated)
{'Apples': 1, 'Oranges': 3, 'Bananas': 5}
Hope this helps.
This doesn't exactly reproduce your error, but there's a problem with your code:
>>> x = "Apples\t1\t# This is a comment"
>>> x.split()
['Apples', '1', '#', 'This', 'is', 'a', 'comment']
>>> key, val = x.split()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
Instead try:
key = line.split()[0]
val = line.split()[1]
Edit: I think your "need more than 1 value to unpack" error is coming from the blank lines. Also, I'm not familiar with using next() like this. I guess I would do something like:
if line.startswith('#') or line == "\n":
    pass
else:
    key = line.split()[0]
    val = line.split()[1]
To strip comments, you could use str.partition(), which works whether or not the comment sign is present in the line:
for line in file:
    line, _, comment = line.partition('#')
    if line.strip(): # non-blank line
        key, value = line.split()
Unpacking line.split() may raise an exception in this code too: it happens if there is a non-blank line that does not contain exactly two whitespace-separated words. What you want to do in this case (ignore such lines, print a warning, etc.) is application-dependent.
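One way to combine the partition approach with a policy for malformed lines is sketched below. It assumes such lines should simply be skipped and that every value is an integer, as in the desired output of the question; the function name parse_variables is made up for illustration:

```python
def parse_variables(lines):
    """Parse 'name value  # comment' lines, skipping anything malformed."""
    variables = {}
    for line in lines:
        content, _, _comment = line.partition("#")
        parts = content.split()
        if len(parts) != 2:  # blank, comment-only, or malformed line
            continue
        key, value = parts
        variables[key] = int(value)  # assumes integer values
    return variables


sample = [
    "# This is my header info followed by an empty line\n",
    "\n",
    "Apples 1 # I want to ignore this comment\n",
    "Oranges 3 # I want to ignore this comment\n",
    "#~*~*~*Another comment~*~*~*\n",
    "Bananas 5 # I want to ignore this comment too!\n",
]
print(parse_variables(sample))
# prints {'Apples': 1, 'Oranges': 3, 'Bananas': 5}
```

Replacing the continue with a warning or an exception is the place to change the policy if silently skipping lines is not acceptable.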
You need to ignore empty lines and lines starting with #, then split the remaining lines after removing the comment, either by splitting on # or by using rfind to slice the string as below. An empty line still contains a newline, so you need `and line.strip()` to check for one. You cannot just split on whitespace and unpack, because the comment leaves you with more than two elements after splitting:
with open("in.txt") as f:
    d = dict(line[:line.rfind("#")].split() for line in f
             if not line.startswith("#") and line.strip())
print(d)
Output:
{'Apples': '1', 'Oranges': '3', 'Bananas': '5'}
Another option is to split twice and slice:
with open("in.txt") as f:
    d = dict(line.split(None,2)[:2] for line in f
             if not line.startswith("#") and line.strip())
print(d)
Or splitting twice and unpacking using an explicit loop:
with open("in.txt") as f:
    d = {}
    for line in f:
        if not line.startswith("#") and line.strip():
            k, v = line.split(None, 2)[:2]  # slice so lines without a comment also unpack
            d[k] = v
You can also use itertools.groupby to group the lines you want.
from itertools import groupby
with open("in.txt") as f:
    grouped = groupby(f, lambda x: not x.startswith("#") and x.strip())
    d = dict(next(v).split(None, 2)[:2] for k, v in grouped if k)
print(d)
To handle where we have multiple words in single quotes we can use shlex to split:
import shlex
with open("in.txt") as f:
    d = {}
    for line in f:
        if not line.startswith("#") and line.strip():
            data = shlex.split(line)
            d[data[0]] = data[1]
print(d)
So changing the Banana line to:
Bananas 'north-side disabled' # I want to ignore this comment too!
We get:
{'Apples': '1', 'Oranges': '3', 'Bananas': 'north-side disabled'}
And the same will work for the slicing:
with open("in.txt") as f:
    d = dict(shlex.split(line)[:2] for line in f
             if not line.startswith("#") and line.strip())
print(d)
If the format of the file is correctly defined you can try a solution with regular expressions.
Here's just an idea:
import re
fruits = {}
with open('fruits_list.txt', mode='r') as f:
    for line in f:
        match = re.match(r"([a-zA-Z0-9]+)\s+([0-9]+).*", line)
        if match:
            fruit_name, fruit_amount = match.groups()
            fruits[fruit_name] = fruit_amount
print(fruits)
UPDATED:
I changed the way lines are read to take care of large files. Now I read line by line rather than all at once, which reduces memory usage.

Python- how to convert lines in a .txt file to dictionary elements?

Say I have a file "stuff.txt" that contains the following on separate lines:
q:5
r:2
s:7
I want to read each of these lines from the file, and convert them to dictionary elements, the letters being the keys and the numbers the values.
So I would like to get
y ={"q":5, "r":2, "s":7}
I've tried the following, but it just prints an empty dictionary "{}"
y = {}
infile = open("stuff.txt", "r")
z = infile.read()
for line in z:
    key, value = line.strip().split(':')
    y[key].append(value)
print(y)
infile.close()
Try this:
d = {}
with open('text.txt') as f:
    for line in f:
        key, value = line.strip().split(':')
        d[key] = int(value)
You are appending to y[key] as if it were a list. What you want is to assign to it directly, as above.
Also, using with to open the file is good practice, as it automatically closes the file after the code in the with block is executed.
There are some possible improvements to be made. The first is using a context manager for file handling, that is, with open(...); if an exception occurs, it still closes the file for you.
Second, you have a small mistake in your dictionary assignment: values are assigned using the = operator, such as dict[key] = value.
y = {}
with open("stuff.txt", "r") as infile:
    for line in infile:
        key, value = line.strip().split(':')
        y[key] = int(value)  # convert to int to match the desired output
print(y)
Python3:
s = []  # collects every key:value item found in the file
with open('input.txt', 'r', encoding="utf-8") as f:
    for line in f.readlines():
        for i in line.split():  # a line may hold one or several items
            s.append(i)
d = dict(x.strip().split(":") for x in s)  # convert the list of items to a dictionary
e = {a: int(x) for a, x in d.items()}  # dictionary comprehension: convert the values from str to int
print(e)
