Convert structured content of text file to list with dictionaries


I'm reading a text file like this:
ATTACHMENT1=:1.xlsm
ATTACHMENT1RNG1=:Entity
ATTACHMENT1VRNG1=:TOT^^ENT1
ATTACHMENT1RNG2=:country
ATTACHMENT1VRNG2=:A
ATTACHMENT2=:2.xlsm
ATTACHMENT2RNG1=:Entity
ATTACHMENT2VRNG1=:TOT
ATTACHMENT2RNG2=:dept
ATTACHMENT2VRNG2=:F0008
and want to load it in list with dictionaries as in:
[
{'File': ['1.xlsm'], 'Entity': ['TOT', 'ENT1'], 'country': ['A']},
{'File': ['2.xlsm'], 'Entity': ['TOT'], 'dept': ['F0008']}
]
'File' is a fixed prefix for ATTACHMENT1 and ATTACHMENT2.
For the other lines I would like to have the value of RNGx as dictionary keys and the values of VRNGx as dictionary values.
I know I can split lines on '=:', I can also split a string based on a separator, but I cannot figure out how to create this data structure myself.
Any guidance would be very much appreciated.
Thanks in advance.

Assuming you can rely on the ordering, this is pretty easy to do with a state machine that just looks at the presence of the different suffixes:
with open("file.txt") as f:
    data = []
    key = ""
    for line in f:
        k, v = line.strip().split("=:")
        if "RNG" not in k:
            data.append({'File': [v]})
        elif "VRNG" not in k:
            key = v
        else:
            data[-1][key] = v.split("^^")
print(data)
[{'File': ['1.xlsm'], 'Entity': ['TOT', 'ENT1'], 'country': ['A']}, {'File': ['2.xlsm'], 'Entity': ['TOT'], 'dept': ['F0008']}]
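If the ordering cannot be relied on, one possible variation is to parse the attachment and range numbers out of each key and assemble the dictionaries at the end. This is only a sketch: the function name parse_attachments and the regular expression are my own assumptions about the key format (ATTACHMENT<n>, ATTACHMENT<n>RNG<m>, ATTACHMENT<n>VRNG<m>), not something given in the question.

```python
import re
from collections import defaultdict

# Assumed key layout: ATTACHMENT<n>, ATTACHMENT<n>RNG<m> or ATTACHMENT<n>VRNG<m>
LINE_RE = re.compile(r"ATTACHMENT(\d+)(?:(V?)RNG(\d+))?=:(.*)")


def parse_attachments(lines):
    """Build one dict per attachment, regardless of the order of the lines."""
    files = {}                    # attachment number -> file name
    names = defaultdict(dict)     # attachment number -> {range number: key name}
    values = defaultdict(dict)    # attachment number -> {range number: list of values}
    for line in lines:
        match = LINE_RE.fullmatch(line.strip())
        if match is None:
            continue              # skip blank or unrecognised lines
        att, is_value, rng, payload = match.groups()
        att = int(att)
        if rng is None:           # plain ATTACHMENT<n> line
            files[att] = payload
        elif is_value:            # VRNG line: the values, separated by ^^
            values[att][int(rng)] = payload.split("^^")
        else:                     # RNG line: the dictionary key
            names[att][int(rng)] = payload
    result = []
    for att in sorted(files):
        record = {"File": [files[att]]}
        for rng in sorted(names[att]):
            record[names[att][rng]] = values[att].get(rng, [])
        result.append(record)
    return result
```

Calling parse_attachments(open("file.txt")) should give the same list as above for the sample file; a RNG line without a matching VRNG line ends up with an empty list.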

Related

Python: How to read a CSV file in which each line is a string?

I am trying to read a CSV file. I need to access the keys and values in each line.
"{id:495981,start:""2020-09-23"",end:""2020-09-23"",something:point({srid:4326, x:10.96791704, y:49.7989944})}"
"{id:49963,start:""2020-09-23"",end:""2020-09-23"",something:point({srid:4326, x:10.96791704, y:49.7989944})}"
As shown above, each line is a string. What I want to do is read the value of id in each line. Reading the file with "pandas.read_csv" returns something like this:
{id:495981 end:""2020-09-23"" start:""2020-09-23"" \
0 {id:49963 end:""2020-09-23"" start:""2020-09-23""
...
something:point({srid:4326 x:7.138 y:51.594})}
0 something:point({srid:4326 x:10.96791704 y:49.7989944})}
[31264 rows x 6 columns]
Any suggestions??

You could use a regex here to pull each value out of the string, since splitting would keep the extra characters that I assume you want to exclude.
import re
data = {}
with open('mycsvfile.csv', 'r') as file:
    for line in file:
        line_id = re.search('(?<=id:)[0-9]*(?=,)', line).group(0)
        line_data = {'start': re.search('(?<=start:"").*(?="",end)', line).group(0),
                     'end': re.search('(?<=end:"").*(?="",something)', line).group(0),
                     'something': re.search('(?<=something:).*(?=}")', line).group(0),
                     }
        data[line_id] = line_data
print(data)
This results in a dict keyed by id, where each value is another dict holding the remaining fields from that line.
{'495981': {'start': '2020-09-23', 'end': '2020-09-23', 'something': 'point({srid:4326, x:10.96791704, y:49.7989944})'},
'49963': {'start': '2020-09-23', 'end': '2020-09-23', 'something': 'point({srid:4326, x:10.96791704, y:49.7989944})'}}
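The four separate searches could also be folded into a single compiled pattern with named groups, which additionally lets you skip lines that do not match instead of failing on them. This is a sketch under the assumption that every line has exactly the id, start, end, something layout shown in the question; ROW_RE and parse_rows are names I made up.

```python
import re

# One pattern for the whole line; the doubled quotes are the CSV-escaped quotes
ROW_RE = re.compile(
    r'\{id:(?P<id>\d+),'
    r'start:""(?P<start>[^"]*)"",'
    r'end:""(?P<end>[^"]*)"",'
    r'something:(?P<something>.*)\}"'
)


def parse_rows(lines):
    """Return {id: {'start': ..., 'end': ..., 'something': ...}}."""
    data = {}
    for line in lines:
        match = ROW_RE.search(line)
        if match is None:
            continue  # ignore lines that do not follow the assumed layout
        fields = match.groupdict()
        data[fields.pop("id")] = fields
    return data
```

It can be used as parse_rows(open('mycsvfile.csv')) and should produce the same nested dict as above.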

How to create/read nested dictionary from file?

Here is the text file1 content
name = test1,description=None,releaseDate="2020-02-27"
name = test2,description=None,releaseDate="2020-02-28"
name = test3,description=None,releaseDate="2020-02-29"
I want a nested dictionary like this. How to create this?
{ 'test1': {'description':'None','releaseDate':'2020-02-27'},
'test2': {'description':'None','releaseDate':'2020-02-28'},
'test3': {'description':'None','releaseDate':'2020-02-29'}}
After this, I want to pass these values to the following call in a "for" loop over a list of projects.
Example: for project="IJTP2", I want to go through each name in the dictionary like below
project.create(name="test1", project="IJTP2", description=None, releaseDate="2020-02-27")
project.create(name="test2", project="IJTP2", description=None, releaseDate="2020-02-28")
project.create(name="test3", project="IJTP2", description=None, releaseDate="2020-02-29")
Now to the next project:
List of projects is stored in another file as below
IJTP1
IJTP2
IJTP3
IJTP4
I have just started working with Python and have never worked with nested dictionaries.

I assume that:
each line of the file has comma-separated columns
each column contains exactly one =, with the key on its left and the value on its right
only the first column (name) is special
Of course, as @Alex Hall mentioned, I would recommend JSON or CSV, too.
Anyway, I wrote code for your case.
d = {}
with open('test-200229.txt') as f:
    for line in f:
        (_, name), *rest = (
            tuple(value.strip() for value in column.split('='))
            for column in line.split(',')
        )
        d[name] = dict(rest)
print(d)
output:
{'test1': {'description': 'None', 'releaseDate': '"2020-02-27"'}, 'test2': {'description': 'None', 'releaseDate': '"2020-02-28"'}, 'test3': {'description': 'None', 'releaseDate': '"2020-02-29"'}}
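Note that the output above still carries the double quotes around releaseDate. A possible variation that strips them and also prepares the keyword arguments for each project is sketched below. The helper names parse_line and build_calls are hypothetical, and since I don't know what project.create is, the sketch only builds the argument dicts rather than calling it.

```python
def parse_line(line):
    """Split 'name = x,key=value,...' into (name, {key: value})."""
    pairs = (column.split("=", 1) for column in line.strip().split(","))
    # Strip whitespace and any surrounding double quotes from the values
    cleaned = [(key.strip(), value.strip().strip('"')) for key, value in pairs]
    (_, name), *rest = cleaned
    return name, dict(rest)


def build_calls(lines, projects):
    """Return one kwargs dict per (project, name) combination."""
    versions = dict(parse_line(line) for line in lines if line.strip())
    return [
        {"name": name, "project": project, **fields}
        for project in projects
        for name, fields in versions.items()
    ]
```

Each resulting dict could then be passed on with something like project.create(**kwargs). Keep in mind that description is the string 'None' here, not the None object, so it may need converting first.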

python 2.7:iterate dictionary and map values to a file

I have a list of dictionaries which I build from .xml file:
list_1=[{'lat': '00.6849879', 'phone': '+3002201600', 'amenity': 'restaurant', 'lon': '00.2855850', 'name': 'Telegraf'},{'lat': '00.6850230', 'addr:housenumber': '6', 'lon': '00.2844493', 'addr:city': 'XXX', 'addr:street': 'YYY.'},{'lat': '00.6860304', 'crossing': 'traffic_signals', 'lon': '00.2861978', 'highway': 'crossing'}]
My aim is to build a text file with values (not keys) in such order:
lat,lon,'addr:street','addr:housenumber','addr:city','amenity','crossing' etc...
00.6849879,00.2855850, , , ,restaurant, ,'\n'00.6850230,00.2844493,YYY,6,XXX, , ,'\n'00.6860304,00.2861978, , , , ,traffic_signals,'\n'
If a value does not exist, there should be an empty space in its place.
I tried to loop with for loop:
for i in list_1:
    line= i['lat'],i['lon']
    print line
Problem occurs if I add value which does not exist in some cases:
for i in list_1:
    line= i['lat'],i['lon'],i['phone']
    print line
I also tried to loop and use the map() function, but the results do not seem correct:
for i in list_1:
    line=map(lambda x1,x2:x1+','+x2+'\n',i['lat'],i['lon'])
    print line
Also tried:
for i in list_1:
    for k,v in i.items():
        if k=='addr:housenumber':
            print v
This time I think there might be too many if/else conditions to write.
It seems like the solution is somewhere close, but I can't figure it out or find the optimal way to do it.

I would look to use the csv module, in particular DictWriter. The fieldnames dictate the order in which the dictionary information is written out. Actually writing the header is optional:
import csv
fields = ['lat','lon','addr:street','addr:housenumber','addr:city','amenity','crossing',...]
with open('<file>', 'w') as f:
    writer = csv.DictWriter(f, fields)
    #writer.writeheader() # If you want a header
    writer.writerows(list_1)
If you really don't want to use the csv module, then you can simply iterate over the list of the fields you want, in the order you want them:
fields = ['lat','lon','addr:street','addr:housenumber','addr:city','amenity','crossing',...]
for row in list_1:
    print(','.join(row.get(field, '') for field in fields))
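One thing to watch for with DictWriter: by default it raises a ValueError when a row contains a key that is not in the field list (for example 'phone' in the first dictionary, if it is left out of fields). A minimal sketch of how that could be handled, assuming Python 3 and writing to an in-memory buffer so it can be tried without a file; rows_to_csv is a name I made up:

```python
import csv
import io


def rows_to_csv(rows, fields):
    """Write only the listed fields, in order; missing values become empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        restval="",             # value used when a row lacks a field
        extrasaction="ignore",  # silently drop keys that are not in fields
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()
```

Both restval and extrasaction are documented DictWriter parameters. Replacing the buffer with an open file object gives the same result on disk.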

If you can't or don't want to use csv, you can do something like this (assuming f is a file object already opened for writing):
order = ['lat','lon','addr:street','addr:housenumber',
         'addr:city','amenity','crossing']
for entry in list_1:
    f.write(", ".join([entry.get(x, "") for x in order]) + "\n")
This builds a list of the values from each entry dictionary, in the order given by the order list, defaulting to "" when a key is not present.
If your output is a CSV file, I strongly recommend using the csv module, because it also escapes values correctly and handles other CSV-specific details that are easy to overlook.

Thanks guys
I found the solution. Maybe it is not so elegant but it works.
I made a list of node keys, then looked each of them up in every dictionary of the other list to get the values.
key_list=['lat','lon','addr:street','addr:housenumber','amenity','source','name','operator']
list=[{'lat': '00.6849879', 'phone': '+3002201600', 'amenity': 'restaurant', 'lon': '00.2855850', 'name': 'Telegraf'},{'lat': '00.6850230', 'addr:housenumber': '6', 'lon': '00.2844493', 'addr:city': 'XXX', 'addr:street': 'YYY.'},{'lat': '00.6860304', 'crossing': 'traffic_signals', 'lon': '00.2861978', 'highway': 'crossing'}]
Solution:
final_list=[]
for i in list:
    line=str()
    for ii in key_list:
        if ii in i:
            x=ii
            line=line+str(i[x])+','
        else:
            line=line+' '+','
    final_list.append(line)

Write list of dictionary values to file [duplicate]

I have a list of dictionaries such as:
values = [{'Name': 'John Doe', 'Age': 26, 'ID': '1279abc'},
{'Name': 'Jane Smith', 'Age': 35, 'ID': 'bca9721'}
]
What I'd like to do is print this list of dictionaries to a tab delimited text file to look something like this:
Name Age ID
John Doe 26 1279abc
Jane Smith 35 bca9721
However, I am unable to wrap my head around simply printing the values. Right now I'm printing the entire dictionary per row via:
for i in values:
    f.write(str(i))
    f.write("\n")
Perhaps I need to iterate through each dictionary now? I've seen people use something like:
for i, n in iterable:
    pass
But I've never understood this. Anyone able to shed some light into this?
EDIT:
It appears that I could use something like this, unless someone has a more Pythonic way (perhaps someone can explain "for i, n in iterable"?):
for dic in values:
    for entry in dic:
        f.write(dic[entry])

This is simple enough to accomplish with a DictWriter. Its purpose is to write comma-separated data, but if we specify the delimiter to be a tab instead of a comma, we can make this work just fine.
from csv import DictWriter
values = [{'Name': 'John Doe', 'Age': 26, 'ID': '1279abc'},
          {'Name': 'Jane Smith', 'Age': 35, 'ID': 'bca9721'}]
keys = values[0].keys()
with open("new-file.tsv", "w") as f:
    dict_writer = DictWriter(f, keys, delimiter="\t")
    dict_writer.writeheader()
    for value in values:
        dict_writer.writerow(value)

f.write('Name\tAge\tID\n')
for value in values:
    f.write('\t'.join([value.get('Name'), str(value.get('Age')), value.get('ID')]) + '\n')

You're probably thinking of the items() method. This returns the key and value for each entry in a dictionary. Since values is a list of dictionaries, call it on each dictionary in turn. http://www.tutorialspoint.com/python/dictionary_items.htm
for dic in values:
    for k,v in dic.items():
        pass
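To spell out what "for i, n in iterable" does: each element of the iterable is itself a pair (for example a (key, value) tuple from items()), and the two names on the left are filled in by unpacking that pair. A small sketch using the values list from the question; describe and to_tsv are names I made up, and it assumes Python 3.7+ where dictionaries keep insertion order:

```python
values = [
    {'Name': 'John Doe', 'Age': 26, 'ID': '1279abc'},
    {'Name': 'Jane Smith', 'Age': 35, 'ID': 'bca9721'},
]


def describe(row):
    """items() yields (key, value) tuples; 'key, value' unpacks each one."""
    return ["{}={}".format(key, value) for key, value in row.items()]


def to_tsv(rows):
    """Build the tab-delimited text, using the first row's keys as columns."""
    columns = list(rows[0])
    lines = ["\t".join(columns)]
    for row in rows:
        # str() is needed because Age is an int
        lines.append("\t".join(str(row[column]) for column in columns))
    return "\n".join(lines) + "\n"
```

Writing the file is then just f.write(to_tsv(values)) on a file opened for writing.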

# assuming values is a single dictionary
import csv
with open('out.txt', 'w') as fout:
    writer = csv.DictWriter(fout, fieldnames=list(values.keys()), delimiter="\t")
    writer.writeheader()
    writer.writerow(values)

Making a raw dictionary sane

I have a dict brought in from a csv: {'0ca6f08e': '1111', '89b2e9ab': '2222', '0c2e5b6d': '3333', '07287d73': '4444'}
and what is needed is something like:
{'id': '0ca6f08e', 'thing': '1111'}, {'id': '89b2e9ab', 'thing': '2222'}, {'id': '0c2e5b6d', 'thing': '3333'}
This is to bring order to the dict so I can operate later with sanity. I'm not clear on how to take a csv like:
0ca6f08e,1111
89b2e9ab,2222
0c2e5b6d,3333
and inject the keys for sanity and later use.

We can use a list comprehension to solve this:
>>> original = {'0ca6f08e': '1111', '89b2e9ab': '2222', '0c2e5b6d': '3333', '07287d73': '4444'}
>>> parsed = [{'id': key, 'thing': value} for key, value in original.items()]
>>> parsed
[{'thing': '1111', 'id': '0ca6f08e'}, {'thing': '2222', 'id': '89b2e9ab'}, {'thing': '3333', 'id': '0c2e5b6d'}, {'thing': '4444', 'id': '07287d73'}]
We're essentially grabbing each key and corresponding value in the original dict, and converting it into a list of dicts.
Note that it may be cleaner to just use the items method of a dict to grab the key and the value directly, and loop over that:
>>> original.items()
[('0ca6f08e', '1111'), ('89b2e9ab', '2222'), ('0c2e5b6d', '3333'), ('07287d73', '4444')]

If you are reading the file for the first time, you can fix the results like this:
lines = []
with open('foo.csv') as f:
    for line in f:
        a, b = line.strip().split(',')
        lines.append({'id': a, 'thing': b})
If you want to fix the results from the dictionary:
lines = [{'id': a, 'thing': b} for a,b in big_dict.iteritems()]

You can use the csv module's DictReader to read the csv file.
Here is an example:
import csv
with open('example.csv') as csvfile:
    for csv_dict in csv.DictReader(csvfile, fieldnames=["id", "thing"]):
        # Now you can use the csv_dict as a normal dictionary
        print(csv_dict["id"])
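If what you want in the end is the list of {'id': ..., 'thing': ...} dictionaries from the question, the reader's rows can be collected directly. A small sketch, assuming Python 3 and the two-column layout shown in the question; it reads from a string so it can be tried without a file, and read_id_thing is a name I made up:

```python
import csv
import io


def read_id_thing(text):
    """Parse 'id,thing' lines into a list of plain dictionaries."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=["id", "thing"])
    # dict(row) turns each row into a regular dict
    return [dict(row) for row in reader]
```

Passing an open file object to csv.DictReader instead of io.StringIO(text) works the same way.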
