Making a raw dictionary sane - python

I have a dict brought in from a csv: {'0ca6f08e': '1111', '89b2e9ab': '2222', '0c2e5b6d': '3333', '07287d73': '4444'}
and what is needed is something like:
{'id': '0ca6f08e', 'thing': '1111'}, {'id': '89b2e9ab', 'thing': '2222'}, {'id': '0c2e5b6d', 'thing': '3333'}
This is to bring order to the dict so I can operate later with sanity. I'm not clear on how to take a csv like:
0ca6f08e,1111
89b2e9ab,2222
0c2e5b6d,3333
and inject the keys for sanity and later use.

We can use a list comprehension to solve this:
>>> original = {'0ca6f08e': '1111', '89b2e9ab': '2222', '0c2e5b6d': '3333', '07287d73': '4444'}
>>> parsed = [{'id': key, 'thing': value} for key, value in original.items()]
>>> parsed
[{'thing': '1111', 'id': '0ca6f08e'}, {'thing': '2222', 'id': '89b2e9ab'}, {'thing': '3333', 'id': '0c2e5b6d'}, {'thing': '4444', 'id': '07287d73'}]
We're essentially grabbing each key and corresponding value in the original dict, and converting it into a list of dicts.
Note that it may be cleaner to just use the items method of a dict to grab the key and the value directly, and loop over that:
>>> original.items()
[('0ca6f08e', '1111'), ('89b2e9ab', '2222'), ('0c2e5b6d', '3333'), ('07287d73', '4444')]

If you are reading the file for the first time, you can fix the results like this:
with open('foo.csv') as f:
    lines = []
    for line in f:
        a, b = line.strip().split(',')  # one "id,thing" pair per line
        lines.append({'id': a, 'thing': b})
If you want to fix the results from the dictionary:
lines = [{'id': a, 'thing': b} for a, b in big_dict.items()]

You can use the csv module's DictReader to read the csv file.
Here is an example:
import csv
with open('example.csv') as csvfile:
    for csv_dict in csv.DictReader(csvfile, fieldnames=["id", "thing"]):
        # Now you can use the csv_dict as a normal dictionary
        print(csv_dict["id"])
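A self-contained sketch of the same idea, assuming the two-column, header-less layout from the question. Here `io.StringIO` stands in for the real file, and the variable names are only illustrative.

```python
import csv
import io

# Stand-in for the CSV file from the question (assumed: two columns, no header row).
sample = io.StringIO("0ca6f08e,1111\n89b2e9ab,2222\n0c2e5b6d,3333\n")

# fieldnames supplies the keys, since the file has no header line of its own.
rows = list(csv.DictReader(sample, fieldnames=["id", "thing"]))

for row in rows:
    print(row["id"], row["thing"])
```

Each row behaves like an ordinary dictionary, so the result is already the list of `{'id': ..., 'thing': ...}` records the question asks for.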

Related

Convert structured content of text file to list with dictionaries

I'm reading a text file like this:
ATTACHMENT1=:1.xlsm
ATTACHMENT1RNG1=:Entity
ATTACHMENT1VRNG1=:TOT^^ENT1
ATTACHMENT1RNG2=:country
ATTACHMENT1VRNG2=:A
ATTACHMENT2=:2.xlsm
ATTACHMENT2RNG1=:Entity
ATTACHMENT2VRNG1=:TOT
ATTACHMENT2RNG2=:dept
ATTACHMENT2VRNG2=:F0008
and want to load it in list with dictionaries as in:
[
{'File': ['1.xlsm'], 'Entity': ['TOT', 'ENT1'], 'country': ['A']},
{'File': ['2.xlsm'], 'Entity': ['TOT'], 'dept': ['F0008']}
]
'File' is a fixed prefix for ATTACHMENT1 and ATTACHMENT2.
For the other lines I would like to have the value of RNGx as dictionary keys and the values of VRNGx as dictionary values.
I know I can split lines on '=:', I can also split a string based on a separator, but I cannot figure out how to create this data structure myself.
Any guidance would be very much appreciated.
Thanks in advance.
Assuming you can rely on the ordering, this is pretty easy to do with a state machine that just looks at the presence of the different suffixes:
with open("file.txt") as f:
    data = []
    key = ""
    for line in f:
        k, v = line.strip().split("=:")
        if "RNG" not in k:
            data.append({'File': [v]})
        elif "VRNG" not in k:
            key = v
        else:
            data[-1][key] = v.split("^^")
print(data)
[{'File': ['1.xlsm'], 'Entity': ['TOT', 'ENT1'], 'country': ['A']}, {'File': ['2.xlsm'], 'Entity': ['TOT'], 'dept': ['F0008']}]

How to create/read nested dictionary from file?

Here is the text file1 content
name = test1,description=None,releaseDate="2020-02-27"
name = test2,description=None,releaseDate="2020-02-28"
name = test3,description=None,releaseDate="2020-02-29"
I want a nested dictionary like this. How to create this?
{ 'test1': {'description':'None','releaseDate':'2020-02-27'},
'test2': {'description':'None','releaseDate':'2020-02-28'},
'test3': {'description':'None','releaseDate':'2020-02-29'}}
After this I want to append these values in the following line of code through "for" loop for a list of projects.
Example: For a project="IJTP2" want to go through each name in the dictionary like below
project.create(name="test1", project="IJTP2", description=None, releaseDate="2020-02-27")
project.create(name="test2", project="IJTP2", description=None, releaseDate="2020-02-28")
project.create(name="test3", project="IJTP2", description=None, releaseDate="2020-02-29")
Now to the next project:
List of projects is stored in another file as below
IJTP1
IJTP2
IJTP3
IJTP4
I just started working on Python and have never worked on the nested dictionaries.
I assume that:
each file line has comma-separated columns
each column has only one = and key on its left, value on its right
only first column is special(name)
Of course, as @Alex Hall mentioned, I recommend JSON or CSV, too.
Anyway, I wrote code for your case.
d = {}
with open('test-200229.txt') as f:
    for line in f:
        (_, name), *rest = (
            tuple(value.strip() for value in column.split('='))
            for column in line.split(',')
        )
        d[name] = dict(rest)
print(d)
output:
{'test1': {'description': 'None', 'releaseDate': '"2020-02-27"'}, 'test2': {'description': 'None', 'releaseDate': '"2020-02-28"'}, 'test3': {'description': 'None', 'releaseDate': '"2020-02-29"'}}
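The values above still carry the double quotes from the file, and the question also asks how to loop over the projects. A minimal sketch of both steps follows, assuming the same line format. `parse_line` and `create_version` are hypothetical names (`create_version` merely stands in for the asker's `project.create`), and the sample data is inlined instead of being read from the two files.

```python
def parse_line(line):
    """Split 'name = x,key=value,...' into (name, {key: value, ...})."""
    pairs = []
    for column in line.split(','):
        key, value = column.split('=', 1)
        # Drop surrounding whitespace and any double quotes around the value.
        pairs.append((key.strip(), value.strip().strip('"')))
    (_, name), *rest = pairs
    return name, dict(rest)


def create_version(**kwargs):
    """Hypothetical stand-in for project.create(); it just returns its arguments."""
    return kwargs


lines = [
    'name = test1,description=None,releaseDate="2020-02-27"',
    'name = test2,description=None,releaseDate="2020-02-28"',
]
projects = ["IJTP1", "IJTP2"]  # would normally be read from the projects file

versions = dict(parse_line(line) for line in lines)

calls = []
for project in projects:
    for name, fields in versions.items():
        # fields unpacks into description=... and releaseDate=... keyword arguments.
        calls.append(create_version(name=name, project=project, **fields))
```

Note that `description` ends up as the string `'None'`, not the Python object `None`; convert it explicitly if the real API needs the latter.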

Write list of dictionary values to file [duplicate]

I have a list of dictionaries such as:
values = [{'Name': 'John Doe', 'Age': 26, 'ID': '1279abc'},
{'Name': 'Jane Smith', 'Age': 35, 'ID': 'bca9721'}
]
What I'd like to do is print this list of dictionaries to a tab delimited text file to look something like this:
Name Age ID
John Doe 26 1279abc
Jane Smith 35 bca9721
However, I am unable to wrap my head around simply printing the values, as of right now I'm printing the entire dictionary per row via:
for i in values:
    f.write(str(i))
    f.write("\n")
Perhaps I need to iterate through each dictionary now? I've seen people use something like:
for i, n in iterable:
    pass
But I've never understood this. Anyone able to shed some light into this?
EDIT:
It appears that I could use something like this, unless someone has a more Pythonic way (perhaps someone can explain "for i, n in iterable"?):
for dic in values:
    for entry in dic:
        f.write(dic[entry])
This is simple enough to accomplish with a DictWriter. Its purpose is to write column-separated data, but if we specify our delimiter to be that of tabs instead of commas, we can make this work just fine.
from csv import DictWriter
values = [{'Name': 'John Doe', 'Age': 26, 'ID': '1279abc'},
          {'Name': 'Jane Smith', 'Age': 35, 'ID': 'bca9721'}]
keys = values[0].keys()
with open("new-file.tsv", "w") as f:
    dict_writer = DictWriter(f, keys, delimiter="\t")
    dict_writer.writeheader()
    for value in values:
        dict_writer.writerow(value)
Or write the rows by hand (with f being the open output file):
f.write('Name\tAge\tID\n')
for value in values:
    f.write('\t'.join([value.get('Name'), str(value.get('Age')), value.get('ID')]) + '\n')
You're probably thinking of the items() method. This will return the key and value for each entry in the dictionary. http://www.tutorialspoint.com/python/dictionary_items.htm
for k, v in values.items():
    pass
# assuming your dictionary is in values
import csv
with open('out.txt', 'w') as fout:
    writer = csv.DictWriter(fout, fieldnames=list(values.keys()), delimiter="\t")
    writer.writeheader()
    writer.writerow(values)
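Since the question asks what `for i, n in iterable` means: each element of the iterable must itself be a pair, and the loop unpacks that pair into the two names. Here is a small sketch using the data from the question. Note that `values` there is a list, so `items()` has to be called on each dictionary inside it, not on the list itself.

```python
values = [{'Name': 'John Doe', 'Age': 26, 'ID': '1279abc'},
          {'Name': 'Jane Smith', 'Age': 35, 'ID': 'bca9721'}]

# enumerate() yields (index, element) pairs, unpacked here into i and person.
for i, person in enumerate(values):
    # items() yields (key, value) pairs, unpacked into key and value.
    for key, value in person.items():
        print(i, key, value)

# Fixing the column order explicitly gives one tab-separated line per dictionary.
columns = ['Name', 'Age', 'ID']
lines = ['\t'.join(str(person[c]) for c in columns) for person in values]
```

Listing the columns explicitly keeps the output order independent of how each dictionary happens to be ordered.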

In Python, what is the easiest way to add a list consisting of keyword pairs to a dictionary?

I have a homework problem in Python.
I am using Python version 3.4.0 on Linux.
The design document states that I am to read a CSV file using built in functions, specified as names.dat, that is in the format:
name:name2, name:name3, name2:name4, name3:name5\n (etc)
I am then to add these keyword pairs to a dictionary, which is the part I'm stuck on.
The code I have thus far is this:
dictionary = dict()
database = open('names.dat', 'r')
data = database.read()
data = data.rstrip('\n')
data = data.split(',')
for item in range(len(data)):
    dictionary.update(data[item-1])
My thinking being that if I have a list element in the format "name:name2", and I call the dictionary update function with that element as an argument, it will properly map to a keyword pair in the dictionary.
However, this is not the case, as I get this error when I run this script:
File "MyName.py", line 7, in <module>
dictionary.update(data[item-1])
ValueError: dictionary update sequence element #0 has length 1; 2 is required
This and This seem similar, but I feel that this is enough of a different question to warrant a separate response.
What am I doing wrong here, and how can I fix it?
Is there a simpler way to do this?
@Paulo Scardine has a great answer if you want to create an exact dataset from the given csv. If you want to combine the values based on the key, you could use this:
changes = {}
with open('test.csv', 'r') as f:
    for row in f:
        for e in row.rstrip('\n').split(", "):  # split lines by column
            print(e)  # just to show what is being generated here
            (k, v) = e.split(":")  # split further into key, value pairs
            changes.setdefault(k, []).append(v)
            # creates empty list if new key, adds value to list
print(changes)
Data will look like:
{'name3': ['name5'], 'name2': ['name4', 'name6', 'name5'], 'name1': ['name', 'name4'], 'name': ['name2', 'name3']}
This could be further simplified, but I think it gives a good example that someone learning can follow.
Edit: added the setdefault method following @Paulo Scardine's comment
Try this:
data = []
with open('names.dat') as database:
    for line in database:
        if line.strip():  # skip blank lines
            data.append(
                dict(i.split(":") for i in line.rstrip('\n').split(","))
            )
If your file is:
name:name2,name:name3,name2:name4,name3:name5
name:name2,name:name3,name2:name4,name3:name5
name:name2,name:name3,name2:name4,name3:name5
name:name2,name:name3,name2:name4,name3:name5
data will be:
[{'name': 'name3', 'name2': 'name4', 'name3': 'name5'},
{'name': 'name3', 'name2': 'name4', 'name3': 'name5'},
{'name': 'name3', 'name2': 'name4', 'name3': 'name5'},
{'name': 'name3', 'name2': 'name4', 'name3': 'name5'}]
Perhaps you want a dict of list instead of a list of dict:
data = {}
with open('names.dat') as database:
    for line in database:
        if line.strip():  # skip blank lines
            for k, v in (i.split(":") for i in line.rstrip('\n').split(",")):
                data.setdefault(k, []).append(v)
Resulting:
{'name': [ 'name2', 'name3', 'name2', 'name3', 'name2', 'name3', 'name2', 'name3'],
'name2': ['name4', 'name4', 'name4', 'name4'],
'name3': ['name5', 'name5', 'name5', 'name5']}
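As for why the original `dictionary.update(data[item-1])` fails: when `update` is given something that is not a mapping, it expects an iterable of two-element pairs. A plain string such as `"name:name2"` is iterated character by character instead, hence "element #0 has length 1". A minimal sketch of the fix, assuming the comma-and-colon format from the question (the sample line is inlined rather than read from names.dat):

```python
raw = "name:name2, name:name3, name2:name4, name3:name5\n"

dictionary = {}
for item in raw.rstrip('\n').split(','):
    # Each item looks like "name:name2"; split it into a key and a value.
    key, value = item.strip().split(':')
    # update() accepts a list of (key, value) pairs, not a bare string.
    dictionary.update([(key, value)])

print(dictionary)
```

Because `name` appears twice as a key, the later pair overwrites the earlier one; use the `setdefault` approach above if every value has to be kept.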

Python: Extract info from xml to dictionary

I need to extract information from an xml file, isolate it from the xml tags before and after, store the information in a dictionary, then loop through the dictionary to print a list. I am an absolute beginner so I'd like to keep it as simple as possible and I apologize if how I've described what I'd like to do doesn't make much sense.
here is what i have so far.
for line in open("/people.xml"):
    if "name" in line:
        print(line)
    if "age" in line:
        print(line)
Current Output:
<name>John</name>
<age>14</age>
<name>Kevin</name>
<age>10</age>
<name>Billy</name>
<age>12</age>
Desired Output
Name Age
John 14
Kevin 10
Billy 12
edit- So using the code below I can get the output:
{'Billy': '12', 'John': '14', 'Kevin': '10'}
Does anyone know how to get from this to a chart with headers like my desired output?
try xmldict (Convert xml to python dictionaries, and vice-versa.):
>>> xmldict.xml_to_dict('''
... <root>
... <persons>
... <person>
... <name first="foo" last="bar" />
... </person>
... <person>
... <name first="baz" last="bar" />
... </person>
... </persons>
... </root>
... ''')
{'root': {'persons': {'person': [{'name': {'last': 'bar', 'first': 'foo'}}, {'name': {'last': 'bar', 'first': 'baz'}}]}}}
# Converting dictionary to xml
>>> xmldict.dict_to_xml({'root': {'persons': {'person': [{'name': {'last': 'bar', 'first': 'foo'}}, {'name': {'last': 'bar', 'first': 'baz'}}]}}})
'<root><persons><person><name><last>bar</last><first>foo</first></name></person><person><name><last>bar</last><first>baz</first></name></person></persons></root>'
or try xmlmapper (list of python dictionary with parent-child relationship):
>>> myxml='''<?xml version='1.0' encoding='us-ascii'?>
<slideshow title="Sample Slide Show" date="2012-12-31" author="Yours Truly" >
<slide type="all">
<title>Overview</title>
<item>Why
<em>WonderWidgets</em>
are great
</item>
<item/>
<item>Who
<em>buys</em>
WonderWidgets1
</item>
</slide>
</slideshow>'''
>>> x = xml_to_dict(myxml)
>>> for s in x:
...     print(s)
...
{'text': '', 'tail': None, 'tag': 'slideshow', 'xmlinfo': {'ownid': 1, 'parentid': 0}, 'xmlattb': {'date': '2012-12-31', 'author': 'Yours Truly', 'title': 'Sample Slide Show'}}
{'text': '', 'tail': '', 'tag': 'slide', 'xmlinfo': {'ownid': 2, 'parentid': 1}, 'xmlattb': {'type': 'all'}}
{'text': 'Overview', 'tail': '', 'tag': 'title', 'xmlinfo': {'ownid': 3, 'parentid': 2}, 'xmlattb': {}}
{'text': 'Why', 'tail': '', 'tag': 'item', 'xmlinfo': {'ownid': 4, 'parentid': 2}, 'xmlattb': {}}
{'text': 'WonderWidgets', 'tail': 'are great', 'tag': 'em', 'xmlinfo': {'ownid': 5, 'parentid': 4}, 'xmlattb': {}}
{'text': None, 'tail': '', 'tag': 'item', 'xmlinfo': {'ownid': 6, 'parentid': 2}, 'xmlattb': {}}
{'text': 'Who', 'tail': '', 'tag': 'item', 'xmlinfo': {'ownid': 7, 'parentid': 2}, 'xmlattb': {}}
{'text': 'buys', 'tail': 'WonderWidgets1', 'tag': 'em', 'xmlinfo': {'ownid': 8, 'parentid': 7}, 'xmlattb': {}}
The code above returns a generator. Iterating over it yields one dict per element, with the keys tag, text, xmlattb and tail, plus additional information in xmlinfo. The root element has a parentid of 0.
Use an XML parser for this. For example,
import xml.etree.ElementTree as ET
doc = ET.parse('people.xml')
names = [name.text for name in doc.findall('.//name')]
ages = [age.text for age in doc.findall('.//age')]
people = dict(zip(names,ages))
print(people)
# {'Billy': '12', 'John': '14', 'Kevin': '10'}
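For the follow-up about printing a chart with headers, one possible sketch pairs the names and ages in a list instead of a dict, which keeps file order and tolerates duplicate names. The XML is inlined here, and its `<people>` and `<person>` wrapper elements are assumptions, since the question does not show the full file.

```python
import xml.etree.ElementTree as ET

# Assumed layout: the question only shows the <name> and <age> lines.
xml_text = """<people>
  <person><name>John</name><age>14</age></person>
  <person><name>Kevin</name><age>10</age></person>
  <person><name>Billy</name><age>12</age></person>
</people>"""

root = ET.fromstring(xml_text)
names = [name.text for name in root.findall('.//name')]
ages = [age.text for age in root.findall('.//age')]
people = list(zip(names, ages))

# Left-align the first column in a fixed width so the headers line up.
rows = ['{:<8}{}'.format('Name', 'Age')]
rows += ['{:<8}{}'.format(name, age) for name, age in people]
print('\n'.join(rows))
```

The column width of 8 is arbitrary; widen it if the names are longer.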
It seems to me that this is an exercise in learning how to parse this XML manually rather than simply pulling a library out of the bag to do it for you. If I am wrong, I suggest watching the udacity video by Steve Huffman that can be found here: http://www.udacity.com/view#Course/cs253/CourseRev/apr2012/Unit/362001/Nugget/365002. He explains how to use the minidom module to parse lightweight xml files such as these.
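For reference, a minimal sketch of what a minidom-based version might look like. This is not taken from the video; the wrapper elements in the inlined XML are assumed, and `texts` is a hypothetical helper.

```python
from xml.dom import minidom

# Assumed layout: the question only shows the <name> and <age> lines.
xml_text = """<people>
  <person><name>John</name><age>14</age></person>
  <person><name>Kevin</name><age>10</age></person>
</people>"""

doc = minidom.parseString(xml_text)


def texts(tag):
    """Return the text inside every <tag> element, in document order."""
    return [node.firstChild.data for node in doc.getElementsByTagName(tag)]


pairs = list(zip(texts('name'), texts('age')))
print(pairs)
```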
Now, the first point I want to make in my answer is that you don't want to create a Python dictionary to print all of these values. A Python dictionary is simply a set of keys that correspond to values. Before Python 3.7 there was no guaranteed ordering to them, so traversal in the order they appeared in the file was a pain in the butt, and in any version two people with the same name would overwrite each other. You are trying to print out all of the names together with their corresponding ages, so a data structure like a list of tuples would probably be better suited to collating your data.
It seems like the structure of your xml file is such that each name tag is succeeded by an age tag that corresponds to it. There also seems to only be a single name tag per line. This makes matters fairly simple. I'm not going to write the most efficient or universal solution to this problem, but instead I will try to make the code as simple to understand as I can.
So let's first create a list to store the data:
a_list = []
Now open your file, and initialize a couple of variables to hold each name and age:
from __future__ import with_statement
with open("/people.xml") as f:
    name, age = None, None  # initialize a name and an age variable to be used during traversals.
    for line in f:
        name = extract_name(line, name)  # This function will be defined later.
        age = extract_age(line)  # So will this one.
        if age:  # We know that if age is defined, we can add a person to our list and reset our variables
            a_list.append((name, age))  # and now we can re-initialize our variables.
            name, age = None, None  # otherwise simply read the next line until age is defined.
Now, for each line in the file, we want to determine whether it contains a name. If it does, we want to extract it. Let's create a function to do this:
def extract_name(a_line, name):  # we pass in the line as well as the name value that we defined before beginning our traversal.
    if name:  # if the name is already set, we simply want to keep it at its current value. (we can clear it upon encountering the corresponding age.)
        return name
    if "<name>" not in a_line:  # if no "<name>" in a_line, return. otherwise, extract new name.
        return
    name_pos = a_line.find("<name>") + 6
    end_pos = a_line.find("</name>")
    return a_line[name_pos:end_pos]
Now, we must create a function to parse the line for a user's age. We can do this in a similar way to the previous function, but we know that once we have an age, it will be added into the list immediately. As such, we never need to concern ourselves with age's previous value. The function can therefore look like this:
def extract_age(a_line):
    if "<age>" not in a_line:  # if no "<age>" in a_line:
        return
    age_pos = a_line.find("<age>") + 5  # else extract age from line and return it.
    end_pos = a_line.find("</age>")
    return a_line[age_pos:end_pos]
Finally, you want to print the list. You might do it as follows:
for item in a_list:
    print('\t'.join(item))
Hope this helped. I haven't tested out my code, so it might still be slightly buggy. The concepts are there, though. :)
Here's another way using lxml library:
from lxml import objectify
def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(objectify.fromstring(xml_str))
# bytes literal: on Python 3, lxml rejects str input that carries an encoding declaration
xml_string = b"""<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""
print(xml_to_dict(xml_string))
To preserve the parent node, use this instead:
def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}
And if you want to only return a subtree and convert it to dict, you can use Element.find() :
xml_obj.find('.//') # lxml.objectify.ObjectifiedElement instance
See lxml documentation.
