Python: convert text file to dict

I would like to convert a text file like the one below into a dict, but I have gotten an error.

A 123 132 21
B 34 293 91

d = {}
with open("ab.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[(key)] = val
print(d)

You probably got a "ValueError: too many values to unpack (expected 2)" error.
I am assuming you are trying to create a dictionary where the results will be:
d = {'A': [123, 132, 21], 'B': [34, 293, 91]}
If that is the case, you need:
d = {}
with open("ab.txt") as f:
    for line in f:
        (key, *val) = line.split()  # this will allow val to become a list of the remaining values
        d[key] = list(map(lambda x: int(x), val))
print(d)
You were just missing the * before val, which triggers iterable unpacking: its operand must be an iterable, and the iterable is expanded into a sequence of items, which are included in the new tuple, list, or set, at the site of the unpacking.
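For example, a quick check in the interactive interpreter, using the first line of the file as input, shows how the starred name collects the leftover fields:
>>> key, *val = 'A 123 132 21'.split()
>>> key
'A'
>>> val
['123', '132', '21']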
Update
From the Python manual concerning the map(function, iterable, ...) function:
Return an iterator that applies function to every item of iterable, yielding the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted. For cases where the function inputs are already arranged into argument tuples, see itertools.starmap().
In this case we use a lambda function as the first argument to map. It takes one argument and converts it to an int.
Then we apply the list() constructor to the iterator returned by the map function to create a list of int items.
Crank up the interactive interpreter (you can just type python with no arguments):
>>> val = ['123', '132', '21']
>>> m = map(lambda x: int(x), val)
>>> m
<map object at 0x00000279A94F1358>
>>> list(m)
[123, 132, 21]
Or:
>>> val = ['123', '132', '21']
>>> m = map(lambda x: int(x), val)
>>> for x in m: print(x)
...
123
132
21
And:
>>> l = lambda x, y: x + y
>>> l(7,9)
16
>>>
You can also accomplish the conversion with a list comprehension. Most people find the list comprehension easier to read/understand than map, but I wanted to show you both ways:
d = {}
with open("ab.txt") as f:
    for line in f:
        (key, *val) = line.split()  # this will allow val to become a list of the remaining values
        d[key] = [int(x) for x in val]
print(d)

Depending on what you want to achieve, either:
replace line.split() with line.split(maxsplit=1) - this way the values of your dict will be strings like "123 132 21", or
replace (key, val) = line.split() with (key, *val) = line.split() - this way the values of your dict will be lists like ['123', '132', '21'] (see the sketch below).
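A minimal sketch of both variants, assuming the same ab.txt file as in the question:
# Variant 1: maxsplit=1 keeps everything after the key as one string
d1 = {}
with open("ab.txt") as f:
    for line in f:
        key, val = line.split(maxsplit=1)
        d1[key] = val.strip()   # {'A': '123 132 21', 'B': '34 293 91'}

# Variant 2: starred unpacking collects the remaining fields into a list
d2 = {}
with open("ab.txt") as f:
    for line in f:
        key, *val = line.split()
        d2[key] = val           # {'A': ['123', '132', '21'], 'B': ['34', '293', '91']}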

The built-in function str.split() splits a string wherever it finds whitespace when no separator is specified. So, using your first line as an example, the following happens:
>>> 'A 123 132 21'.split()
['A', '123', '132', '21']
Thus, you get too many values to unpack in your definition of the tuple. To fix it, you could use the star * operator (I don't know the proper name for that operator) in front of your variable definition
d = {}
with open("ab.txt") as f:
    for line in f:
        (key, *val) = line.split()
        d[(key)] = val
print(d)
# Prints {'A': ['123', '132', '21'], 'B': ['34', '293', '91']}
This is explained in the Python tutorial.
Another option is to use the named maxsplit parameter of the split() function:
d = {}
with open("ab.txt") as f:
    for line in f:
        (key, val) = line.split(maxsplit=1)
        d[(key)] = val.strip()
print(d)
Note that I had to append strip() when setting the entry to the dictionary to get rid of a trailing newline \n.

You can do it this way:
fichier = open("ab.txt", 'r')
d = {}
for line in fichier.readlines():
    d[line[0]] = line[2:].rstrip().rsplit(' ')
print(d)
Output :
{'A': ['123', '132', '21'], 'B': ['34', '293', '91']}
Ask me if you don't understand something :).

Related

Converting a string to a dictionary

I am trying to convert a string into a dictionary in Python. I have tried the following but am getting the error
'dict' object is not callable. Please help me solve this problem.
l=[]
str='Vijay=23,Ganesh=20,Lakshmi=19,Nikhil=22'
for x in str.split(','):
    y=x.split('=')
    l.append(y)
d=dict(l)
for k in d:
    print('{:1s}---{:1s}'.format(k,d[k]))
s = 'Vijay=23,Ganesh=20,Lakshmi=19,Nikhil=22'
dct = {n: a for n, a in (p.split("=") for p in s.split(","))} #Credits to JL0PD
print(dct)
#{'Vijay': '23', 'Ganesh': '20', 'Lakshmi': '19', 'Nikhil': '22'}
l=[]
str='Vijay=23,Ganesh=20,Lakshmi=19,Nikhil=22'
for x in str.split(','):
    y=x.split('=')
    l.append(y)
result = {}
for item in l:
    result[item[0]] = item[1]
print(result)
You are overriding the built-in name str.
In Python, str is the name of the string class.
When you wrote str = 'Vijay=23,Ganesh=20,Lakshmi=19,Nikhil=22',
you rebound the name str to that string, shadowing the built-in class.
run this code:
l = []
string = 'Vijay=23,Ganesh=20,Lakshmi=19,Nikhil=22'
for x in string.split(','):
    y = x.split('=')
    l.append(y)
d = dict(l)
for k in d:
    print('{:1s}---{:1s}'.format(k,d[k]))
Same code, but following PEP 8 rules:
pars_list = []
row_data = 'Vijay=23,Ganesh=20,Lakshmi=19,Nikhil=22'
for i in row_data.split(','):
    pars_list.append(i.split('='))
data = dict(pars_list)
for k in data.keys():
    print('{:1s}---{:1s}'.format(k, data[k]))
Give this a try.
Step1
Iterate through ['Vijay=23', 'Ganesh=20', 'Lakshmi=19', 'Nikhil=22']
Step2
Split each person on '=', e.g. ['Nikhil', '22']
Step3
Add the first element to the dict as a key and the age to the dict as a value
peopleDict = {people.split('=')[0]:int(people.split('=')[1]) for people in string.split(',')}
output
{'Vijay': 23, 'Ganesh': 20, 'Lakshmi': 19, 'Nikhil': 22}
#Note1: Make sure you do not name your variables str, int, input, etc. These are all names built into Python; don't override them.
#Note2: my output turns all the ages into integers just in case you need to manipulate them later on in your code.

Python - How do I build a dictionary from a text file?

For the class Data Structures and Algorithms at Tilburg University, I got a question in an in-class test:
build a dictionary from testfile.txt with only unique keys, where if a product appears again, its value should be added to the running total for that product class.
The text file looked like this (it was not a .csv file):
apples,1
pears,15
oranges,777
apples,-4
oranges,222
pears,1
bananas,3
So apples will be -3 and the output would be {"apples": -3, "oranges": 999...}
In the exams I am not allowed to import any external packages besides the normal ones: pcinput, math, etc. I am also not allowed to use the internet.
I have no idea how to accomplish this, and this seems to be a big problem in my development of Python skills, because this is a question that is not covered in a 'dictionaries in Python' video on YouTube (it would be too hard maybe), but also not covered in an expert course, because there this question would be too simple.
Hope you guys can help!
from collections import Counter
from sys import exit
from os.path import exists, isfile

## i did not finish it, but what i wanted to achieve was to build a list of the
## strings and their belonging integers, then use the Counter method to add
## them together, by splitting the string by marking the comma as the split point.

filename = input("filename voor input: ")
if not isfile(filename):
    print(filename, "bestaat niet")
    exit()

keys = []
values = []
with open(filename) as f:
    xs = f.read().split()
for i in xs:
    keys.append([i])
print(keys)

my_dict = {}
for i in range(len(xs)):
    my_dict[xs[i]] = xs.count(xs[i])
print(my_dict)

word_and_integers_dict = dict(zip(keys, values))
print(word_and_integers_dict)

values2 = my_dict.split(",")
for j in values2:
    print( value2 )
The output is this:
[['schijndel,-3'], ['amsterdam,0'], ['tokyo,5'], ['tilburg,777'], ['zaandam,5']]
{'zaandam,5': 1, 'tilburg,777': 1, 'amsterdam,0': 1, 'tokyo,5': 1, 'schijndel,-3': 1}
{}
So I got the dictionary from it, but I did not separate the values.
The error message is this:
28 values2 = my_dict.split(",") <-- here was the error
29 for j in values2:
30 print( value2 )
AttributeError: 'dict' object has no attribute 'split'
I don't understand what your code is actually doing; I think you don't know what your variables contain. But this is an easy problem to solve in Python. Split into a list, split each item again, and count:
>>> input = "apples,1 pears,15 oranges,777 apples,-4 oranges,222 pears,1 bananas,3"
>>> parts = input.split()
>>> parts
['apples,1', 'pears,15', 'oranges,777', 'apples,-4', 'oranges,222', 'pears,1', 'bananas,3']
Then split again. Behold the list comprehension. This is an idiomatic way to transform one list into another in Python. Note that the numbers are strings, not ints yet.
>>> pairs = [s.split(',') for s in parts]
>>> pairs
[['apples', '1'], ['pears', '15'], ['oranges', '777'], ['apples', '-4'], ['oranges', '222'], ['pears', '1'], ['bananas', '3']]
Now you want to iterate over pairs, and sum all the same fruits. This calls for a dict:
>>> result = {}
>>> for fruit, countstr in pairs:
...     if fruit not in result:
...         result[fruit] = 0
...     result[fruit] += int(countstr)
...
>>> result
{'pears': 16, 'apples': -3, 'oranges': 999, 'bananas': 3}
This pattern of adding an element if it doesn't exist comes up frequently. You should check out defaultdict in the collections module. If you use that, you don't even need the if.
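For instance, a minimal sketch of the same loop with collections.defaultdict, reusing the pairs list from above:
>>> from collections import defaultdict
>>> result = defaultdict(int)          # missing keys start at 0
>>> for fruit, countstr in pairs:
...     result[fruit] += int(countstr)
...
>>> dict(result)
{'apples': -3, 'pears': 16, 'oranges': 999, 'bananas': 3}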
Let's walk through what you need to do. First, check if the file exists and read the contents into a variable. Second, parse each line: split the line on the comma, convert the number from a string to an integer, and then put the values into a dictionary. In this case I would recommend using defaultdict from collections, but we can also do it with a standard dictionary.
from os.path import exists, isfile
from collections import defaultdict

filename = input("filename voor input: ")
if not isfile(filename):
    print(filename, "bestaat niet")
    exit()

# this reads the file to a list, removing newline characters
with open(filename) as f:
    line_list = [x.strip() for x in f]

# create a dictionary
my_dict = {}

# update the value in the dictionary if it already exists,
# otherwise add it to the dictionary
for line in line_list:
    k, v_str = line.split(',')
    if k in my_dict:
        my_dict[k] += int(v_str)
    else:
        my_dict[k] = int(v_str)

# print the dictionary
table_str = '{:<30}{}'
print(table_str.format('Item','Count'))
print('='*35)
for k,v in sorted(my_dict.items()):
    print(table_str.format(k,v))

Using `itertools.groupby()` to get lists of runs of strings that start with `A`?

The (abstracted) problem is this: I have a log file
A: 1
A: 2
A: 3
B: 4
B: 5
A: 6
C: 7
D: 8
A: 9
A: 10
A: 11
and I want to end up with a list of lists like this:
[["1", "2", "3"], ["6"], ["9", "10", "11"]]
where the file has been broken up into "runs" of strings starting with A. I know that I can use itertools.groupby to solve this, and right now I have this solution (where f is a list of the lines in the file).
starts_with_a = lambda x: x.startswith("A")
coalesced = [g for _, g in groupby(f, key=starts_with_a)]
runs = [re.sub(r'A: ', '', s) for s in coalesced if starts_with_a(s)]
So I use groupby, but then I have to filter out the stuff that doesn't start with "A". This is okay, and pretty terse, but is there a more elegant way to do it? I'd love a way that:
doesn't require two passes
is terser (and/or) is more readable
Help me harness the might of itertools!
Yes, filter out the lines that don't start with A but use the key produced by groupby() for each group returned. It's the return value of the key function, so it'll be True for those lines that do start with A. I'd use str.partition() here instead of a regular expression:
coalesce = (g for key, g in groupby(f, key=lambda x: x[:1] == "A") if key)
runs = [[res.partition(':')[-1].strip() for res in group] for group in coalesce]
Since your str.startswith() argument is a fixed-width string literal, you may as well use slicing; x[:1] slices off the first character and compares that to 'A', which gives you the same test as x.startswith('A').
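A quick check in the interpreter, using one of the log lines from the question:
>>> line = 'A: 1'
>>> line[:1] == 'A'
True
>>> line.startswith('A')
True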
I used a generator expression to do the groupby() filtering; you could also inline that into the one list comprehension:
runs = [[res.partition(':')[-1].strip() for res in group]
        for key, group in groupby(f, key=lambda x: x[:1] == "A") if key]
Demo:
>>> from itertools import groupby
>>> f = '''\
... A: 1
... A: 2
... A: 3
... B: 4
... B: 5
... A: 6
... C: 7
... D: 8
... A: 9
... A: 10
... A: 11
... '''.splitlines(True)
>>> coalesce = (g for key, g in groupby(f, key=lambda x: x[:1] == "A") if key)
>>> [[res.partition(':')[-1].strip() for res in group] for group in coalesce]
[['1', '2', '3'], ['6'], ['9', '10', '11']]
You want terse? OK, you got it.
>>> lst = ['A: 1', 'A: 2', 'A: 3', 'B: 4', 'B: 5', 'A: 6', 'C: 7', 'D: 8', 'A: 9', 'A: 10', 'A: 11']
>>> [[x[1] for x in group[1]] for group in itertools.groupby((line.split(': ') for line in lst), key=lambda a:a[0]) if group[0]=='A']
[['1', '2', '3'], ['6'], ['9', '10', '11']]
Breaking it down, from the inside out:
(line.split(': ') for line in lst)
This is a generator expression that splits each element into its alpha key and the associated string value.
for group in itertools.groupby(..., key=lambda a:a[0])
This simply groups the elements by the alpha key that was determined in the first step.
... if group[0]=='A'
This simply excludes any results that don't match the criteria specified in the question. You could also use if group[0].startswith('A') if the key string isn't a single character.
[x[1] for x in group[1]] for ...]
This is a list comprehension that builds a list from the results of groupby that match the earlier condition. groupby returns an iterator as the second return value (group[1]), so we simply turn that iterator into a list with a list comprehension. x[0] is the key value, and x[1] is the string that follows it.
[...]
The desired output is a list, so a list comprehension makes it so. The whole operation occurs with a single pass over the input.
Probably not so Pythonic, but here is a way to do it in one loop without itertools:
lines = '''
A: 1
A: 2
A: 3
B: 4
B: 5
A: 6
C: 7
D: 8
A: 9
A: 10
A: 11
'''
res = []
cont_last = []
for line in lines.splitlines():
    if line.startswith('A: '):
        cont_last.append(line.replace('A: ', ''))
    else:
        if cont_last:
            res.append(cont_last)
            cont_last = []
if cont_last:
    res.append(cont_last)
print(res)
Without needing itertools, this does the full file with only one iteration:
lines = open('logfile.txt','r').readlines()
out_list = []
temp_list = []
for line in lines:
    if line.split(':')[0].strip() == 'A':
        temp_list.append(line.split(':')[1].strip())
    elif temp_list:
        out_list.append(temp_list)
        temp_list = []
if temp_list:
    out_list.append(temp_list)
    temp_list = []
print (out_list)
I know you asked for itertools; I just don't have it handy, so I couldn't have debugged an itertools version. Hope this helps.

Using a list as key-value pair to be inserted

Hello, I have a list that I wish to insert into a dictionary - however, not with each element becoming a new entry in the dictionary - the list itself is 2 items long and should be used as a "key-value" pair.
Or maybe this isn't even necessary (knowing Python, there are dozens of ways to do things). The base problem is that I wish to split a string into 2 parts around a delimiter and use the left part as the "key" and the right part as the "value":
for line in file:
    if "=" in line:
        tpair = line.split("=",1)
        constantsMap.update(tpair)
Of course I could do a manual split like:
for line in file:
    if "=" in line:
        p = line.find("=")
        constantsMap[line[:p]] = line[p+1:]
But that doesn't seem to be idiomatic Python, so I was wondering if there's a cleaner way?
You can use sequence unpacking here:
key,val = line.split("=", 1)
constantsMap[key] = val
See a demonstration below:
>>> line = "a=1"
>>> constantsMap = {}
>>> key,val = line.split("=", 1)
>>> constantsMap[key] = val
>>> constantsMap
{'a': '1'}
>>>

How can I loop through blocks of lines in a file?

I have a text file that looks like this, with blocks of lines separated by blank lines:
ID: 1
Name: X
FamilyN: Y
Age: 20
ID: 2
Name: H
FamilyN: F
Age: 23
ID: 3
Name: S
FamilyN: Y
Age: 13
ID: 4
Name: M
FamilyN: Z
Age: 25
How can I loop through the blocks and process the data in each block? Eventually I want to gather the name, family name and age values into three columns, like so:
Y X 20
F H 23
Y S 13
Z M 25
Here's another way, using itertools.groupby.
The function groupby iterates through the lines of the file and calls isa_group_separator(line) for each line. isa_group_separator returns either True or False (called the key), and itertools.groupby then groups all the consecutive lines that yielded the same True or False result.
This is a very convenient way to collect lines into groups.
import itertools

def isa_group_separator(line):
    return line=='\n'

with open('data_file') as f:
    for key,group in itertools.groupby(f,isa_group_separator):
        # print(key,list(group))  # uncomment to see what itertools.groupby does.
        if not key:               # however, this will make the rest of the code not work
            data={}               # as it exhausts the `group` iterator
            for item in group:
                field,value=item.split(':')
                value=value.strip()
                data[field]=value
            print('{FamilyN} {Name} {Age}'.format(**data))
# Y X 20
# F H 23
# Y S 13
# Z M 25
Use a generator.
def blocks( iterable ):
    accumulator= []
    for line in iterable:
        if start_pattern( line ):
            if accumulator:
                yield accumulator
            accumulator= []
        # elif other significant patterns
        else:
            accumulator.append( line )
    if accumulator:
        yield accumulator
import re
result = re.findall(
    r"""(?mx)           # multiline, verbose regex
    ^ID:.*\s*           # Match ID: and anything else on that line
    Name:\s*(.*)\s*     # Match name, capture all characters on this line
    FamilyN:\s*(.*)\s*  # etc. for family name
    Age:\s*(.*)$        # and age""",
    subject)
Result will then be
[('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]
which can be trivially changed into whatever string representation you want.
If your file is too large to read into memory all at once, you can still use a regular expressions based solution by using a memory mapped file, with the mmap module:
import sys
import re
import os
import mmap
block_expr = re.compile('ID:.*?\nAge: \d+', re.DOTALL)
filepath = sys.argv[1]
fp = open(filepath)
contents = mmap.mmap(fp.fileno(), os.stat(filepath).st_size, access=mmap.ACCESS_READ)
for block_match in block_expr.finditer(contents):
    print block_match.group()
The mmap trick will provide a "pretend string" to make regular expressions work on the file without having to read it all into one large string. And the finditer() method of the regular expression object will yield matches without creating an entire list of all matches at once (which findall() does).
I do think this solution is overkill for this use case however (still: it's a nice trick to know...)
If the file is not huge, you can read the whole file with:
content = open(filename).read()
Then you can split the content into blocks using:
blocks = content.split('\n\n')
Now you can create a function to parse a block of text. I would use split('\n') to get the lines from a block and split(':') to get the key and value, possibly with str.strip() or some help from regular expressions.
Without checking whether a block has the required data, the code can look like this:
f = open('data.txt', 'r')
content = f.read()
f.close()

for block in content.split('\n\n'):
    person = {}
    for l in block.split('\n'):
        k, v = l.split(': ')
        person[k] = v
    print('%s %s %s' % (person['FamilyN'], person['Name'], person['Age']))
import itertools
# Assuming input in file input.txt
data = open('input.txt').readlines()
records = (lines for valid, lines in itertools.groupby(data, lambda l : l != '\n') if valid)
output = [tuple(field.split(':')[1].strip() for field in itertools.islice(record, 1, None)) for record in records]
# You can change output to generator by
output = (tuple(field.split(':')[1].strip() for field in itertools.islice(record, 1, None)) for record in records)
# output = [('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]
#You can iterate and change the order of elements in the way you want
# [(elem[1], elem[0], elem[2]) for elem in output] as required in your output
This answer isn't necessarily better than what's already been posted, but as an illustration of how I approach problems like this it might be useful, especially if you're not used to working with Python's interactive interpreter.
I've started out knowing two things about this problem. First, I'm going to use itertools.groupby to group the input into lists of data lines, one list for each individual data record. Second, I want to represent those records as dictionaries so that I can easily format the output.
One other thing that this shows is how using generators makes breaking a problem like this down into small parts easy.
>>> # first let's create some useful test data and put it into something
>>> # we can easily iterate over:
>>> data = """ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23

ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25"""
>>> data = data.split("\n")
>>> # now we need a key function for itertools.groupby.
>>> # the key we'll be grouping by is, essentially, whether or not
>>> # the line is empty.
>>> # this will make groupby return groups whose key is True if we
>>> # care about them.
>>> def is_data(line):
        return True if line.strip() else False
>>> # make sure this really works
>>> "\n".join([line for line in data if is_data(line)])
'ID: 1\nName: X\nFamilyN: Y\nAge: 20\nID: 2\nName: H\nFamilyN: F\nAge: 23\nID: 3\nName: S\nFamilyN: Y\nAge: 13\nID: 4\nName: M\nFamilyN: Z\nAge: 25'
>>> # does groupby return what we expect?
>>> import itertools
>>> [list(value) for (key, value) in itertools.groupby(data, is_data) if key]
[['ID: 1', 'Name: X', 'FamilyN: Y', 'Age: 20'], ['ID: 2', 'Name: H', 'FamilyN: F', 'Age: 23'], ['ID: 3', 'Name: S', 'FamilyN: Y', 'Age: 13'], ['ID: 4', 'Name: M', 'FamilyN: Z', 'Age: 25']]
>>> # what we really want is for each item in the group to be a tuple
>>> # that's a key/value pair, so that we can easily create a dictionary
>>> # from each item.
>>> def make_key_value_pair(item):
        items = item.split(":")
        return (items[0].strip(), items[1].strip())
>>> make_key_value_pair("a: b")
('a', 'b')
>>> # let's test this:
>>> dict(make_key_value_pair(item) for item in ["a:1", "b:2", "c:3"])
{'a': '1', 'c': '3', 'b': '2'}
>>> # we could conceivably do all this in one line of code, but this
>>> # will be much more readable as a function:
>>> def get_data_as_dicts(data):
        for (key, value) in itertools.groupby(data, is_data):
            if key:
                yield dict(make_key_value_pair(item) for item in value)
>>> list(get_data_as_dicts(data))
[{'FamilyN': 'Y', 'Age': '20', 'ID': '1', 'Name': 'X'}, {'FamilyN': 'F', 'Age': '23', 'ID': '2', 'Name': 'H'}, {'FamilyN': 'Y', 'Age': '13', 'ID': '3', 'Name': 'S'}, {'FamilyN': 'Z', 'Age': '25', 'ID': '4', 'Name': 'M'}]
>>> # now for an old trick: using a list of column names to drive the output.
>>> columns = ["Name", "FamilyN", "Age"]
>>> print "\n".join(" ".join(d[c] for c in columns) for d in get_data_as_dicts(data))
X Y 20
H F 23
S Y 13
M Z 25
>>> # okay, let's package this all into one function that takes a filename
>>> def get_formatted_data(filename):
        with open(filename, "r") as f:
            columns = ["Name", "FamilyN", "Age"]
            for d in get_data_as_dicts(f):
                yield " ".join(d[c] for c in columns)
>>> print "\n".join(get_formatted_data("c:\\temp\\test_data.txt"))
X Y 20
H F 23
S Y 13
M Z 25
Use a dict, namedtuple, or custom class to store each attribute as you come across it, then append the object to a list when you reach a blank line or EOF.
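A rough sketch of that idea with a plain dict, assuming the input sits in a file called data.txt (the variable names are only illustrative):
records = []
current = {}
with open('data.txt') as f:
    for line in f:
        line = line.strip()
        if not line:               # blank line ends the current block
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(':')
        current[key.strip()] = value.strip()
if current:                        # don't forget the last block at EOF
    records.append(current)

for r in records:
    print(r['FamilyN'], r['Name'], r['Age'])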
simple solution:
result = []
for record in content.split('\n\n'):
    try:
        id, name, familyn, age = map(lambda rec: rec.split(' ', 1)[1], record.split('\n'))
    except ValueError:
        pass
    except IndexError:
        pass
    else:
        result.append((familyn, name, age))
Along with the half-dozen other solutions I already see here, I'm a bit surprised that no one has been so simple-minded (that is, generator-, regex-, map-, and read-free) as to propose, for example,
fp = open(fn)

def get_one_value():
    line = fp.readline()
    if not line:
        return None
    parts = line.split(':')
    if 2 != len(parts):
        return ''
    return parts[1].strip()

# The result is supposed to be a list.
result = []

while 1:
    # We don't care about the ID.
    if get_one_value() is None:
        break
    name = get_one_value()
    familyn = get_one_value()
    age = get_one_value()
    result.append((name, familyn, age))
    # We don't care about the block separator.
    if get_one_value() is None:
        break

for item in result:
    print item
Re-format to taste.
