Converting a list of dicts to wiki-format - python

I have a list of dicts that might look like this (I really have no idea upfront what data they contain):
data = [
    {'k1': 'v1-a', 'k2': 'v2-a', 'k3': 'v3-a'},
    {'k1': 'v1-b', 'k3': 'v3-b'},
    {'k1': 'v1-c', 'k2': 'v2-c', 'k3': 'v3-c'},
    {'k1': 'v1-d', 'k2': 'v2-d', 'k3': 'v3-d'}
]
The goal is to make it into a string that looks like this:
||k1||k2||k3||
|v1-a|v2-a|v3-a|
|v1-b||v3-b|
|v1-c|v2-c|v3-c|
|v1-d|v2-d|v3-d|
This is for the Confluence wiki format.
The problem in itself is not that complicated, but the solution I come up with is so ugly that I almost don't want to use it.
What I got currently is this:
from pandas import DataFrame
# data = ...
df = DataFrame.from_dict(data).fillna('')
body = '||{header}||\n{data}'.format(
    header='||'.join(df.columns.values.tolist()),
    data='\n'.join(['|{}|'.format('|'.join(i)) for i in df.values.tolist()])
)
This isn't just ugly, it also depends on pandas, which is huge (I don't want to depend on that library just for this)!
The solution above would work without pandas if there were a good way of getting the list of headers and the list of lists of values from the dicts. But Python 2 doesn't guarantee dict order, so I can't count on .values() giving me the correct info.
Is there anything in itertools or collections I've been missing out on?

This works for me in Python 3 and 2.7. Try it: https://repl.it/repls/VividMediumturquoiseAlbino
all_keys = sorted({key for dic in data for key in dic.keys()})
header = "||" + "||".join(all_keys) + "||"
lines = [header]
for row in data:
    elems_on_row = [row.get(key, "") for key in all_keys]
    current_row = "|" + "|".join(elems_on_row) + "|"
    lines.append(current_row)
wikistr = "\n".join(lines)
print(wikistr)

One approach would be to use csv.DictWriter to handle the formatting, with StringIO to collect the output and defaultdict to do a bit of creative cheating. Whether or not this is prettier is up for debate.
from StringIO import StringIO  # io.StringIO on Python 3
from collections import defaultdict
from csv import DictWriter

output = StringIO()
keys = sorted(set(key for datum in data for key in datum.keys()))
header = '||{}||'.format('||'.join(keys))
output.write(header + '\n')
fields = [''] + keys + ['']  # provides empty fields for the starting and ending |
writer = DictWriter(output, fields, delimiter='|', lineterminator='\n')
for row in data:
    writer.writerow(defaultdict(str, **row))  # fills in the empty fields
output.seek(0)
result = output.read()
How it works
Create the list of headers by making a set containing all keys that are in any one of your dictionaries.
Make a DictWriter that uses '|' for its delimiter, to get the pipes between entries.
Add empty-string headers at the beginning and end, so that the beginning and ending pipes will get written.
Use a defaultdict to supply the empty beginning and ending values, since they're not in the dictionaries.
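With the sample data from the question (and the keys sorted, as in the snippet above), printing the result should reproduce the target layout:
print(result)
# ||k1||k2||k3||
# |v1-a|v2-a|v3-a|
# |v1-b||v3-b|
# |v1-c|v2-c|v3-c|
# |v1-d|v2-d|v3-d|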

The pure-Python answer is to walk the list, and therefore every dictionary, twice.
In the first pass you collect all distinct keys, and in the second pass you build the wiki-formatted output string.
Let's start by collecting the keys, using a set as storage:
keys = set()
for dict_ in data:
    keys.update(dict_.keys())
keys = sorted(keys)
Now that we have the set of unique keys, we can run through the list again for the output:
wiki_output = '||' + '||'.join(keys) + '||\n'
for dict_ in data:
    for key in keys:
        wiki_output += '|' + dict_.get(key, '')
    wiki_output += '|\n'
There we go...

Related

Matching variable number of occurrences of token using regex in python

I am trying to match a token multiple times, but I only get back the last occurrence, which I understand is the normal behavior as per this answer, but I haven't been able to apply the solution presented there to my example.
My text looks something like this:
&{dict1_name}= key1=key1value key2=key2value
&{dict2_name}= key1=key1value
So basically multiple lines, each with a starting string, spaces, then a variable number of key pairs. If you are wondering where this comes from, it is a robot framework variables file that I am trying to transform into a python variables file.
I will be iterating per line to match the key pairs and construct a python dictionary from them.
My current regex pattern is:
&{([^ ]+)}=[ ]{2,}(?:[ ]{2,}([^\s=]+)=([^\s=]+))+
This correctly gets me the dict name but the key pairs only match the last occurrence, as mentioned above. How can I get it to return a tuple containing: ("dict1_name","key1","key1value"..."keyn","keynvalue") so that I can then iterate over this and construct the python dictionary like so:
dict1_name= {"key1": "key1value",..."keyn": "keynvalue"}
Thanks!
As you point out, you will need to work around the fact that capture groups will only catch the last match. One way to do so is to take advantage of the fact that lines in a file are iterable, and to use two patterns: one for the "line name", and one for its multiple keyvalue pairs:*
import re
dname = re.compile(r'^&{(?P<name>\w+)}=')
keyval = re.compile(r'(?P<key>\w+)=(?P<val>\w+)')
data = {}
with open('input/keyvals.txt') as f:
    for line in f:
        name = dname.search(line)
        if name:
            name = name.group('name')
            data[name] = dict(keyval.findall(line))
*Admittedly, this is a tad inefficient since you're conducting two searches per line. But for moderately sized files, you should be fine.
Result:
>>> from pprint import pprint
>>> pprint(data)
{'d5': {'key1': '28f_s', 'key2': 'key2value'},
'name1': {'key1': '5', 'key2': 'x'},
'othername2': {'key1': 'key1value', 'key2': '7'}}
Note that \w matches Unicode word characters.
Sample input, keyvals.txt:
&{name1}= key1=5 key2=x
&{othername2}= key1=key1value key2=7
&{d5}= key1=28f_s key2=aaa key2=key2value
You could use two regexes, one for the names and another for the items, applying the one for the items after the first space:
import re
lines = ['&{dict1_name}= key1=key1value key2=key2value',
         '&{dict2_name}= key1=key1value']
name = re.compile('^&\{(\w+)\}=')
item = re.compile('(\w+)=(\w+)')
for line in lines:
    n = name.search(line).group(1)
    i = '{{{}}}'.format(','.join("'{}' : '{}'".format(m.group(1), m.group(2)) for m in item.finditer(' '.join(line.split()[1:]))))
    exec('{} = {}'.format(n, i))
    print(locals()[n])
Output
{'key2': 'key2value', 'key1': 'key1value'}
{'key1': 'key1value'}
Explanation
The '^&\{(\w+)\}=' matches an '&' followed by a word (\w+) surrounded by curly braces '\{', '\}'. The second regex matches any words joined by a '='. The line:
i = '{{{}}}'.format(','.join("'{}' : '{}'".format(m.group(1), m.group(2)) for m in item.finditer(' '.join(line.split()[1:]))))
builds the dictionary literal as a string; finally you create a dictionary with the required name using exec. You can access the value of the dictionary by querying locals().
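If exec feels too heavy-handed, a variant of the same parsing (a sketch, not the original answer's approach) can collect everything into a plain dict of dicts instead of creating top-level variables:
import re

lines = ['&{dict1_name}= key1=key1value key2=key2value',
         '&{dict2_name}= key1=key1value']
name = re.compile(r'^&\{(\w+)\}=')
item = re.compile(r'(\w+)=(\w+)')

dicts = {}
for line in lines:
    n = name.search(line).group(1)
    # build the inner dict directly instead of a dict literal string
    dicts[n] = {m.group(1): m.group(2)
                for m in item.finditer(' '.join(line.split()[1:]))}
print(dicts['dict1_name'])  # {'key1': 'key1value', 'key2': 'key2value'}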
Use two expressions in combination with a dict comprehension:
import re
junkystring = """
lorem ipsum
&{dict1_name}= key1=key1value key2=key2value
&{dict2_name}= key1=key1value
lorem ipsum
"""
rx_outer = re.compile(r'^&{(?P<dict_name>[^{}]+)}(?P<values>.+)', re.M)
rx_inner = re.compile(r'(?P<key>\w+)=(?P<value>\w+)')
result = {m_outer.group('dict_name'): {m_inner.group('key'): m_inner.group('value')
                                       for m_inner in rx_inner.finditer(m_outer.group('values'))}
          for m_outer in rx_outer.finditer(junkystring)}
print(result)
Which produces
{'dict1_name': {'key1': 'key1value', 'key2': 'key2value'},
'dict2_name': {'key1': 'key1value'}}
With the two expressions being
^&{(?P<dict_name>[^{}]+)}(?P<values>.+)
# the outer format
See a demo on regex101.com. And the second
(?P<key>\w+)=(?P<value>\w+)
# the key/value pairs
See a demo for the latter on regex101.com as well.
The rest is simply sorting the different expressions in the dict comprehension.
Building off of Brad's answer, I made some modifications. As mentioned in my comment on his reply, it failed on empty lines or comment lines, so I modified it to ignore these and continue. I also added handling of spaces: it now matches spaces in dictionary names but replaces them with underscores, since Python cannot have spaces in variable names. Keys are left untouched since they are strings.
import re

def robot_to_python(filename):
    """
    This function can be used to convert robot variable files containing dicts to a python
    variables file containing python dict that can be imported by both python and robot.
    """
    dname = re.compile(r"^&{(?P<name>.+)}=")
    keyval = re.compile(r"(?P<key>[\w|:]+)=(?P<val>[\w|:]+)")
    data = {}
    with open(filename + '.robot') as f:
        for line in f:
            n = dname.search(line)
            if n:
                name = n.group("name").replace(" ", "_")
                if name:
                    data[name] = dict(keyval.findall(line))
    with open(filename + '.py', 'w') as out_file:
        for dict_name, keyvals in data.items():
            out_file.write(dict_name + " = { \n")
            for k in sorted(keyvals.keys()):
                out_file.write("'%s':'%s', \n" % (k, keyvals[k]))
            out_file.write("}\n\n")
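A minimal usage sketch (the file name is made up): given a variables.robot file containing lines like the ones above, calling
robot_to_python('variables')
should read variables.robot and write variables.py containing the dict definitions, which both Python and Robot Framework can then import.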

How to convert a dictionary with multiple values for a single key across multiple rows into a single row with multiple values per key

Below is my requirement. This is the data present in the JSON file:
{"[a]":" text1","[b]":" text2","[a]":" text3","[c]":" text4","[c]":" Text5"}
The final output should look like this:
{"[a]":[" text1","text3"],"[b]":" text2","[c]":[" text4"," Text5"]}
I tried below code:
data_in = ["[a]"," text1","[b]"," text2","[a]"," text3","[c]"," text4","[c]"," text5"]
data_pairs = zip(data_in[::2], data_in[1::2])
data_dict = {}
for x in data_pairs:
    data_dict.setdefault(x[0], []).append(x[1])
print data_dict
But the input it takes is more in the form of a list than a dictionary.
Please advise.
Or is there a way I can convert my original dictionary into a list with multiple values, since a dictionary will only take unique keys? Please let me know the code as well; I am very new to Python and still learning it. TIA.
Keys are unique within a dictionary while values may not be.
You can try
>>> l = ["[a]"," text1","[b]"," text2","[a]"," text3","[c]"," text4","[c]"," text5"]
>>> dict_data = {}
>>> for i in range(0,len(l),2):
        if dict_data.has_key(l[i]):
            continue
        else:
            dict_data[l[i]] = []
>>> for i in range(1,len(l),2):
        dict_data[l[i-1]].append(l[i])
>>> print dict_data
{'[c]': [' text4', ' text5'], '[a]': [' text1', ' text3'], '[b]': [' text2']}
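For reference, the same grouping can also be written with collections.defaultdict, which avoids the separate key-initializing loop (a sketch that works on Python 2 and 3):
from collections import defaultdict

data_in = ["[a]", " text1", "[b]", " text2", "[a]", " text3", "[c]", " text4", "[c]", " text5"]
data_dict = defaultdict(list)
for key, value in zip(data_in[::2], data_in[1::2]):
    data_dict[key].append(value)
print(dict(data_dict))
# {'[a]': [' text1', ' text3'], '[b]': [' text2'], '[c]': [' text4', ' text5']}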

appending regex matches to a dictionary

I have a file in which there is the following info:
dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088
elephants_3535.txt:37.092592592592595
fish_1407.txt:24.132530120481928
fish_2078.txt:23.470588235294116
fish_2041.txt:23.564705882352943
fish_666.txt:23.17241379310345
fish_840.txt:21.77173913043478
I'm looking for a way to match the colon and append whatever appears afterwards (the numbers) to a dictionary whose keys are the names of the animals at the beginning of each line.
Actually, regular expressions are unnecessary, provided that your data is well formatted and contains no surprises.
Assuming that data is a variable containing the string that you listed above:
dict(item.split(":") for item in data.split())
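Note that this keys on the full filename (e.g. 'dogs_3351.txt'). If you want the animal name as the key, with the numbers grouped into lists, a small extension of the same idea might look like this (a sketch):
grouped = {}
for item in data.split():
    filename, number = item.split(":")
    grouped.setdefault(filename.split("_")[0], []).append(number)
# e.g. grouped['fish'] -> ['24.132530120481928', '23.470588235294116', ...]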
t = """
dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088
elephants_3535.txt:37.092592592592595
fish_1407.txt:24.132530120481928
fish_2078.txt:23.470588235294116
fish_2041.txt:23.564705882352943
fish_666.txt:23.17241379310345
fish_840.txt:21.77173913043478
"""
import re
d = {}
for p, q in re.findall(r'^(.+?)_.+?:(.+)', t, re.M):
    d.setdefault(p, []).append(q)
print d
Why don't you use the Python find method to locate the index of the colon, which you can then use to slice the string?
>>> x='dogs_3351.txt:34.13559322033898'
>>> key_index = x.find(':')
>>> key = x[:key_index]
>>> key
'dogs_3351.txt'
>>> value = x[key_index+1:]
>>> value
'34.13559322033898'
>>>
Read in each line of the file as text and process the lines individually as above.
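Putting that together for a whole file (a sketch; the file name animals.txt is made up):
scores = {}
with open('animals.txt') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        key_index = line.find(':')
        filename = line[:key_index]       # e.g. 'dogs_3351.txt'
        value = line[key_index + 1:]      # e.g. '34.13559322033898'
        animal = filename.split('_')[0]   # 'dogs'
        scores.setdefault(animal, []).append(value)
print(scores)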
Without regex and using defaultdict:
from collections import defaultdict
data = """dogs_3351.txt:34.13559322033898
cats_1875.txt:23.25581395348837
cats_2231.txt:22.087912087912088
elephants_3535.txt:37.092592592592595
fish_1407.txt:24.132530120481928
fish_2078.txt:23.470588235294116
fish_2041.txt:23.564705882352943
fish_666.txt:23.17241379310345
fish_840.txt:21.77173913043478"""
dictionary = defaultdict(list)
for l in data.splitlines():
    animal = l.split('_')[0]
    number = l.split(':')[-1]
    dictionary[animal].append(number)
Just make sure your data is well formatted

How to append lists in a dictionary in Python?

Hey everyone, this code is working fine, but there is one thing left to deal with: it overwrites the multiple entries against a key. I need to avoid the overwriting and save all of those entries. Can you help me with this, please?
#!/usr/bin/python
import sys
import fileinput
# tries to create dictionary from african country list
dic = {}
for line in sys.stdin:
    lst = line.split('|')
    links = lst[4].split()
    new = links[0] + ' ' + links[len(links)-1]
    dic[new] = lst[4]  # WARNING: overwrites multiple entries at the moment! :)

for newline in fileinput.input('somefile.txt'):
    asn = newline.split()
    new = asn[0] + ' ' + asn[1]
    if new in dic:
        print "found from " + asn[0] + " to " + asn[1]
        print dic[new]
Note: sys.stdin takes a string in the following format:
1.0.0.0/24|US|158.93.182.221|GB|7018 1221 3359 3412 2543 1873
You've got a number of problems with your code. The simplest way to do what you describe is to use a defaultdict, which gets rid of the explicit if and has_key (which you should replace by new in dic anyway):
# tries to create dictionary from african country list
from collections import defaultdict

dic = defaultdict(list)  # automatically creates lists to append to in the dict
for line in sys.stdin:
    mylist = line.split('|')  # call it mylist instead of list
    links = mylist[4].split()
    new = links[0] + ' ' + links[-1]  # can just use -1 to reference last item
    dic[new].append(mylist[4])  # append the item to the list in the dict;
                                # automatically creates an empty list if needed
See eryksun's comment on Gerrat's answer if you're on an old version of Python without defaultdict.
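If defaultdict isn't an option, dict.setdefault gives the same grouping with a plain dict (a sketch of that fallback):
import sys

dic = {}
for line in sys.stdin:
    mylist = line.split('|')
    links = mylist[4].split()
    new = links[0] + ' ' + links[-1]
    dic.setdefault(new, []).append(mylist[4])  # creates the list on first use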
There is no method called appendlist; use append:
dic[dic[new]].append(list[4])
Also, it's inadvisable to use list as a variable name; it is a builtin in Python.
Also, this entire section of code:
if ( dic.has_key(new))
    dic[dic[new]].appendlist(list[4])
else:
    dic[dic[new]] = [list[4]]
should instead probably be:
if new in dic:  # this is the preferable way to test this
    dic[new].append(list[4])
else:
    dic[new] = [list[4]]

python string manipulation

Going to re-word the question.
Basically I'm wondering what is the easiest way to manipulate a string formatted like this:
Safety/Report/Image/489
or
Safety/Report/Image/490
And sectioning off each word separated by a slash (/), and storing each section (token) into a store so I can call it later. (Reading in about 1200 cells from a CSV file.)
The answer to your question:
>>> mystring = "Safety/Report/Image/489"
>>> mystore = mystring.split('/')
>>> mystore
['Safety', 'Report', 'Image', '489']
>>> mystore[2]
'Image'
>>>
If you want to store data from more than one string, then you have several options depending on how you want to organize it. For example:
liststring = ["Safety/Report/Image/489",
              "Safety/Report/Image/490",
              "Safety/Report/Image/491"]
dictstore = {}
for line, string in enumerate(liststring):
    dictstore[line] = string.split('/')
print dictstore[1][3]
print dictstore[2][3]
prints:
490
491
In this case you can use either a dictionary or a list (a list of lists) for storage in the same way. If each string has a special identifier (one better than the line number), then the dictionary is the option to choose.
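For example, if the trailing number is a unique identifier, you could key the dictionary on it instead of the line number (a sketch along those lines):
liststring = ["Safety/Report/Image/489",
              "Safety/Report/Image/490",
              "Safety/Report/Image/491"]
dictstore = {}
for string in liststring:
    parts = string.split('/')
    dictstore[parts[-1]] = parts  # key on the trailing id, e.g. '489'
print dictstore['490'][2]  # prints: Image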
I don't quite understand your code and don't have too much time to study it, but I thought that the following might be helpful, at least if order isn't important ...
in_strings = ['Safety/Report/Image/489',
              'Safety/Report/Image/490',
              'Other/Misc/Text/500'
              ]
out_dict = {}
for in_str in in_strings:
    level1, level2, level3, level4 = in_str.split('/')
    out_dict.setdefault(level1, {}).setdefault(
        level2, {}).setdefault(
        level3, []).append(level4)
print out_dict
{'Other': {'Misc': {'Text': ['500']}}, 'Safety': {'Report': {'Image': ['489', '490']}}}
If your csv is line separated:
# do something to load the csv
split_lines = [x.strip() for x in csv_data.split('\n')]
for line_data in split_lines:
    split_parts = [x.strip() for x in line_data.split('/')]
    # do something with individual part data
    # such as some_variable = split_parts[1] etc
    # if using indexes, I'd be sure to catch for index errors in case you
    # try to go to index 3 of something with only 2 parts
Check out the Python csv module for some importing help (I'm not too familiar with it).
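For what it's worth, a minimal sketch using the csv module (the file name data.csv is made up, and it assumes one slash-separated path per row):
import csv

with open('data.csv') as f:
    for row in csv.reader(f):
        for cell in row:
            parts = [x.strip() for x in cell.split('/')]
            # parts is now e.g. ['Safety', 'Report', 'Image', '489']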
