Python Config Parser (Duplicate Key Support) - python

So I recently started writing a config parser for a Python project I'm working on. I initially avoided configparser and configobj, because I wanted to support a config file like so:
key=value
key2=anothervalue
food=burger
food=hotdog
food=cake icecream
In short, this config file is going to be edited often via the command line over SSH. So I don't want to deal with tabs or be finicky about spacing (like YAML), but I also want to avoid keys with multiple values (easily 10 or more) being line wrapped in vi. This is why I would like to support duplicate keys.
In my ideal world, when I ask the Python config object for food, it would give me a list back with ['burger', 'hotdog', 'cake', 'icecream']. If there wasn't a food value defined, it would look in a defaults config file and give me that value or those values.
I have already implemented the above.
However, my troubles started when I realized I wanted to support preserving inline comments and such. The way I handle reading and writing the config files is to decode the file into a dict in memory, read values from or write values to that dict, and then dump the dict back out into a file. This isn't really nice for preserving line order, comments and such, and it's bugging the crap out of me.
A) ConfigObj looks like it has everything I need except support for duplicate keys. Instead it wants me to make a list, which is going to be a pain to edit manually in vi over SSH due to line wrapping. Can I make ConfigObj more SSH/vi friendly?
B) Is my homebrew solution wrong? Is there a better way of reading/writing/storing my config values? Is there any easy way to handle changing a key value in a config file by just modifying that line and rewriting the entire config file from memory?
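For reference, the behaviour described in this question (duplicate keys and space-separated values merged into one list, with a fallback to a defaults file) could be sketched roughly as follows. This is only a minimal illustration, not the asker's actual implementation; the names parse_config and get_option and the defaults contents are hypothetical.

```python
def parse_config(text):
    """Parse key=value lines; repeated keys and space-separated values
    accumulate in a single list per key."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        values.setdefault(key.strip(), []).extend(value.split())
    return values


def get_option(config, defaults, key):
    """Return the list for key, falling back to the defaults config."""
    if key in config:
        return config[key]
    return defaults.get(key, [])


config = parse_config(
    "key=value\nkey2=anothervalue\nfood=burger\nfood=hotdog\nfood=cake icecream\n"
)
defaults = parse_config("drink=water\nfood=bread\n")  # made-up defaults file

print(get_option(config, defaults, "food"))   # ['burger', 'hotdog', 'cake', 'icecream']
print(get_option(config, defaults, "drink"))  # ['water']
```

Note that this sketch discards comments and line order, which is exactly the limitation the question goes on to describe.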

Well I would certainly try to leverage what is in the standard library if I could.
The signature for the config parser classes looks like this:
class ConfigParser.SafeConfigParser([defaults[, dict_type[, allow_no_value]]])
Notice the dict_type argument. When provided, it is used to construct the dictionary objects for the list of sections, for the options within a section, and for the default values. It defaults to collections.OrderedDict. Perhaps you could pass something in there to get your desired multiple-key behavior, and then reap all the advantages of ConfigParser. You might have to write your own class to do this, or you could possibly find one written for you on PyPI or in the ActiveState recipes. Try looking for a bag or multiset class.
I'd either go that route or just suck it up and make a list:
foo = value1, value2, value3

Crazy idea: store your dictionary values as lists of 3-tuples holding the line number, the column number and the value itself, and add a special key for comments.
CommentSymbol = ';'
def readConfig(filename):
    def addValue(config, key, lineIdx, colIdx, value):
        # Each key maps to a list of (line, column, value) tuples.
        config.setdefault(key, []).append((lineIdx, colIdx, value))
    res = {}
    with open(filename, 'r') as f:
        for i, line in enumerate(f):
            line = line.rstrip('\n')
            idx = line.find(CommentSymbol)
            if idx != -1:
                # Store the comment text under the special comment key.
                addValue(res, CommentSymbol, i, idx, line[idx + 1:])
                line = line[:idx]
            # Split on the first '=' only, so values may contain '='.
            pair = [x.strip() for x in line.split('=', 1)]
            if len(pair) == 2:
                addValue(res, pair[0], i, 0, pair[1])
    return res
def writeConfig(config, filename):
    # Flatten to (line, column, key, value) so that entries belonging to
    # different keys are written back in their original file order.
    entries = sorted((lineIdx, colIdx, key, value)
                     for key, values in config.items()
                     for lineIdx, colIdx, value in values)
    with open(filename, 'w') as f:
        current = 0
        for lineIdx, colIdx, key, value in entries:
            # Emit newlines until we reach the line this entry came from.
            f.write('\n' * (lineIdx - current))
            current = lineIdx
            if key == CommentSymbol:
                f.write('{0}{1}'.format(CommentSymbol, value))
            else:
                f.write('{0} = {1}'.format(key, value))
        f.write('\n')
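A different way to get at the "just modify that line" part of the question is to keep the raw lines of the file and rewrite only the one that holds the key. The sketch below is an alternative technique, not the tuple-based approach of this answer; set_value is a hypothetical helper, and it assumes ';' starts a comment and that only the first matching line should change.

```python
def set_value(lines, key, new_value, comment_symbol=";"):
    """Return a copy of lines with the first `key = ...` line rewritten.
    Any inline comment on that line is kept; if the key is missing,
    a new line is appended."""
    result = list(lines)
    for i, line in enumerate(result):
        body, sep, comment = line.partition(comment_symbol)
        name, eq, _ = body.partition("=")
        if eq and name.strip() == key:
            updated = "{0} = {1}".format(key, new_value)
            if sep:
                updated += " " + sep + comment  # re-attach the inline comment
            result[i] = updated
            return result
    result.append("{0} = {1}".format(key, new_value))
    return result


lines = ["; main settings", "key = value ; keep me", "food = burger"]
print(set_value(lines, "key", "other"))
# ['; main settings', 'key = other ; keep me', 'food = burger']
```

Because untouched lines are passed through verbatim, line order, blank lines and comments survive the round trip.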

Related

Creating a dictionary in Python + working with that dictionary

I am quite new to Python and am just trying to get my head around some basics.
I was wondering if anyone could show me how to perform the following tasks. I have a text file with multiple lines, those lines are as follows:
name1, variable, variable, variable
name2, variable, variable, variable
name3, variable, variable, variable
I want to store these items in a dictionary so they can be easily called. I want the name to be the key. I then want to be able to call the variables like this: key[0] or key[1]
The code I have at the moment does not do this:
d = {}
with open("servers.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[int(key)] = val
Once this is done, I would like to be able to take an input from a user and then check the array to see if this item is present in the array. I have found a few threads on Stack Overflow, but none seem to do what I require.
There is a Similar Question asked here.
Any assistance you can provide would be amazing. I am new to this but I hope to learn fast & start contributing to threads myself in the near future :)
Cheers!
You're nearly there. Assuming that .split() actually splits the lines correctly (which it wouldn't do if there are actual commas between the values), you just need an additional unpacking operator (*):
d = {}
with open("servers.txt") as f:
    for line in f:
        key, *val = line.split()  # first element -> key, rest -> val[0], val[1] etc.
        d[int(key)] = val
If you want to check if a user-entered key exists, you can do something like
ukey = int(input("Enter key number: "))
values = d.get(ukey)
if values is not None:
    print(values)  # do something with the values
else:
    print("That key doesn't exist.")
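If the lines really are comma-separated and the keys are names rather than numbers, as in the sample from the question, a variant along these lines may be closer to what is wanted. It is only a sketch: load_servers is a hypothetical helper, and io.StringIO stands in for the real servers.txt with made-up contents.

```python
import io


def load_servers(lines):
    """Map the first comma-separated field of each line to the remaining fields."""
    servers = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, *values = [field.strip() for field in line.split(",")]
        servers[name] = values
    return servers


# Stand-in for open("servers.txt"); the contents are invented for illustration.
sample = io.StringIO("name1, a, b, c\nname2, d, e, f\n")
servers = load_servers(sample)

print(servers["name1"][0])  # a
print("name3" in servers)   # False
```

Membership can be tested with `in`, so no int() conversion of the user's input is needed when the keys are strings.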
Suppose that your file my_file.csv looks like:
name1, variable, variable, variable
name2, variable, variable, variable
name3, variable, variable, variable
Use pandas to do the work:
import pandas as pd
result = pd.read_csv('my_file.csv', index_col=0, header=None)
print(result)
print(result.loc['name1'])
Note that pandas is a third-party library, so you need to install it first, for example with pip.

Python3 dictionary values being overwritten

I’m having a problem with a dictionary. I’m using Python 3. I’m sure there’s something easy that I’m just not seeing.
I’m reading lines from a file to create a dictionary. The first 3 characters of each line are used as keys (they are unique). From there, I create a list from the information in the rest of the line. Each 4 characters make up a member of the list. Once I’ve created the list, I write to the dictionary with the list being the value and the first three characters of the line being the key.
The problem is, each time I add a new key:value pair to the dictionary, it seems to overlay (or update) the values in the previously written dictionary entries. The keys are fine, just the values are changed. So, in the end, all of the keys have a value equivalent to the list made from the last line in the file.
I hope this is clear. Any thoughts would be greatly appreciated.
A snippet of the code is below
formatDict = dict()
sectionList = list()
for usableLine in formatFileHandle:
    lineLen = len(usableLine)
    section = usableLine[:3]
    x = 3
    sectionList.clear()
    while x < lineLen:
        sectionList.append(usableLine[x:x+4])
        x += 4
    formatDict[section] = sectionList
for k, v in formatDict.items():
    print ("for key= ", k, "value =", v)
formatFileHandle.close()
You always clear, then append and then insert the same sectionList, that's why it always overwrites the entries - because you told the program it should.
Always remember: In Python assignment never makes a copy!
Simple fix
Just insert a copy:
formatDict[section] = sectionList.copy() # changed here
Instead of inserting a reference:
formatDict[section] = sectionList
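To see why the copy matters, here is a small self-contained sketch of both variants. The keys and sample values are made up for illustration; only the clear/insert pattern mirrors the question.

```python
rows = [("aaa", [1, 2]), ("bbb", [3, 4])]  # invented sample data

shared = []
aliased = {}
for key, items in rows:
    shared.clear()
    shared.extend(items)
    aliased[key] = shared         # every key refers to the same list object

copied = {}
for key, items in rows:
    shared.clear()
    shared.extend(items)
    copied[key] = shared.copy()   # each key gets its own independent list

print(aliased)  # {'aaa': [3, 4], 'bbb': [3, 4]}
print(copied)   # {'aaa': [1, 2], 'bbb': [3, 4]}
print(aliased["aaa"] is aliased["bbb"])  # True
```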
Complicated fix
There are lots of things going on, and you could make the code "better" by using functions for subtasks like the grouping. Files should also be opened with a with statement so that the file is closed automatically even if an exception occurs, and while loops should be avoided when the number of iterations is known in advance.
Personally I would use code like this:
def groups(seq, width):
    """Group a sequence (seq) into width-sized blocks. The last block may be shorter."""
    length = len(seq)
    for i in range(0, length, width):  # range supports a step argument!
        yield seq[i:i+width]
# Printing the dictionary could be useful in other places as well -> so
# I also created a function for this.
def print_dict_line_by_line(dct):
    """Print dictionary where each key-value pair is on one line."""
    for key, value in dct.items():
        print("for key =", key, "value =", value)
def mytask(filename):
    formatDict = {}
    with open(filename) as formatFileHandle:
        # I don't "strip" each line (remove leading and trailing whitespaces/newlines)
        # but if you need that you could also use:
        # for usableLine in (line.strip() for line in formatFileHandle):
        # instead.
        for usableLine in formatFileHandle:
            section = usableLine[:3]
            sectionList = list(groups(usableLine[3:], 4))  # blocks of 4 characters
            formatDict[section] = sectionList
    # upon exiting the "with" scope the file is closed automatically!
    print_dict_line_by_line(formatDict)
if __name__ == '__main__':
    mytask('insert your filename here')
You could simplify your code here by using a with statement to auto close the file and chunk the remainder of the line into groups of four, avoiding the re-use of a single list.
from itertools import islice
with open('somefile') as fin:
    stripped = (line.strip() for line in fin)
    format_dict = {
        line[:3]: list(iter(lambda it=iter(line[3:]): ''.join(islice(it, 4)), ''))
        for line in stripped
    }
for key, value in format_dict.items():
    print('key=', key, 'value=', value)

How to read 2 different text files line by line and make another file containing a dictionary using python?

I have two text files named weburl.txt and imageurl.txt. weburl.txt contains website URLs and imageurl.txt contains image URLs. I want to create a dictionary that uses each line of weburl.txt as a key and the corresponding line of imageurl.txt as its value.
weburl.txt
url1
url2
url3
url4
url5......
imageurl.txt
imgurl1
imgurl2
imgurl3
imgurl4
imgurl5
required output is
{'url1': imgurl1, 'url2': imgurl2, 'url3': imgurl3......}
I am using this code
with open('weburl.txt') as f :
    key = f.readlines()
with open('imageurl.txt') as g:
    value = g.readlines()
dict[key] = [value]
print dict
I am not getting the required results
You can write something like:
with open('weburl.txt') as f, \
        open('imageurl.txt') as g:
    # we use the `str.strip` method
    # to remove newline characters
    keys = (line.strip() for line in f)
    values = (line.strip() for line in g)
    result = dict(zip(keys, values))
print(result)
More information about zip is available in the Python documentation.
There are problems with the statement dict[key] = [value] on so many levels that I get a kind of vertigo as we drill down through them:
The apparent intention to use a variable called dict (a bad idea because it would overshadow Python's builtin reference to the dict class). Let's call it d instead.
Not initializing the dictionary instance first. If you had called it something like d, this oversight would earn you an easy-to-understand NameError. However, since you're calling it dict, Python will actually be attempting to set items in the dict class itself (which doesn't support __setitem__) instead of inside a dict instance, so you'll get a different, more confusing error.
Attempting to make a dict entry assignment where the key is a non-hashable type (key is a list). You could convert the list to the hashable type tuple easily enough, but that's not what you want because you'd still be...
Attempting to assign a bunch of values to their respective keys all at once. This can't be done with d[key] = value syntax. It could be done all in one relatively simple statement, i.e. d=dict(zip(key,value)), but unfortunately that doesn't get around the fact that you're...
Not stripping the newline character off the end of each key and value.
Instead, this line:
d = dict((k.strip(), v.strip()) for k, v in zip(key, value))
will do what you appear to want.

Which of these is more python-like?

I'm doing some exploring of various languages I hadn't used before, using a simple Perl script as a basis for what I want to accomplish. I have a couple of versions of something, and I'm curious which is the preferred method when using Python -- or if neither is, what is?
Version 1:
workflowname = []
paramname = []
value = []
for line in lines:
    wfn, pn, v = line.split(",")
    workflowname.append(wfn)
    paramname.append(pn)
    value.append(v)
Version 2:
workflowname = []
paramname = []
value = []
i = 0;
for line in lines:
    workflowname.append("")
    paramname.append("")
    value.append("")
    workflowname[i], paramname[i], value[i] = line.split(",")
    i = i + 1
Personally, I prefer the second, but, as I said, I'm curious what someone who really knows Python would prefer.
A Pythonic solution might look a bit like @Bogdan's, but using zip and argument unpacking:
workflowname, paramname, value = zip(*[line.split(',') for line in lines])
If you're determined to use a for construct, though, the 1st is better.
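To make the zip-with-unpacking idiom concrete, here is a short sketch on made-up input lines. zip(*rows) transposes the list of rows into one tuple per column.

```python
lines = ["wf1,paramA,1", "wf2,paramB,2"]  # invented sample data

rows = [line.split(",") for line in lines]
workflowname, paramname, value = zip(*rows)

print(workflowname)  # ('wf1', 'wf2')
print(paramname)     # ('paramA', 'paramB')
print(value)         # ('1', '2')
```

Note that the results are tuples rather than lists, and that every line must have exactly three fields for the unpacking to succeed.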
Of your two attempts, the 2nd one doesn't make any sense to me. Maybe in other languages it would. So of your two proposed approaches, the 1st one is better.
Still I think the pythonic way would be something like Matt Luongo suggested.
Bogdan's answer is best. In general, if you need a loop counter (which you don't in this case), you should use enumerate instead of incrementing a counter:
for index, value in enumerate(lines):
    # do something with the value and the index
Version 1 is definitely better than version 2 (why put something in a list if you're just going to replace it?) but depending on what you're planning to do later, neither one may be a good idea. Parallel lists are almost never more convenient than lists of objects or tuples, so I'd consider:
# list of (workflow,paramname,value) tuples
items = []
for line in lines:
    items.append( tuple(line.split(",")) )
Or:
class WorkflowItem(object):
    def __init__(self,workflow,paramname,value):
        self.workflow = workflow
        self.paramname = paramname
        self.value = value
# list of objects
items = []
for line in lines:
    items.append( WorkflowItem(*line.split(",")) )
(Also, nitpick: 4-space tabs are preferable to 8-space.)

Parsing files with python

My input file is going to be something like this
key "value"
key "value"
... the above lines repeat
What I do is read the file contents, populate an object with the data and return it. There are only a set number of keys that can be present in the file. Since I am a beginner in Python, I feel that my code to read the file is not that good.
My code is something like this
objInstance = CustomClass()
fields = ['key1', 'key2', 'key3']
for field in fields:
    for line in f:
        if line.find(field) >= 0:
            if field == 'key1':
                objInstance.DataOne = get_value_using_re(line)
            elif field == 'key2':
                objInstance.DataTwo = get_value_using_re(line)
return objInstance;
The function "get_value_using_re" is very simple, it looks for a string in between the double quotes and returns it.
I fear that I will have multiple if elif statements and I don't know if this is the right way or not.
Am I doing the right thing here?
A normal approach in Python would be something like:
for line in f:
    mo = re.match(r'^(\S+)\s+"(.*?)"\s*$',line)
    if not mo: continue
    key, value = mo.groups()
    setattr(objInstance, key, value)
If the key is not the right attribute name, then in the last line, in lieu of key, you might use something like translate.get(key, 'other') for some appropriate dict translate.
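Putting the regex loop and the translate idea together, a runnable sketch might look like this. The Record class, the contents of translate and the sample lines are assumptions made for illustration; only the regular expression and the setattr call come from the answer above.

```python
import re


class Record:
    """Hypothetical stand-in for the asker's CustomClass."""


# Hypothetical mapping from keys in the file to attribute names.
translate = {"key1": "DataOne", "key2": "DataTwo"}


def parse(lines):
    obj = Record()
    for line in lines:
        mo = re.match(r'^(\S+)\s+"(.*?)"\s*$', line)
        if not mo:
            continue  # ignore lines that are not of the form: key "value"
        key, value = mo.groups()
        # Unknown keys all end up on the catch-all attribute 'other'.
        setattr(obj, translate.get(key, "other"), value)
    return obj


rec = parse(['key1 "first"', 'key2 "second value"', "garbage"])
print(rec.DataOne)  # first
print(rec.DataTwo)  # second value
```

This avoids the growing if/elif chain: supporting a new key only means adding an entry to the dict.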
I'd suggest looking at the YAML parser for Python (PyYAML). It can conveniently read a file very similar to that and load it into a Python dictionary. With the YAML parser:
import yaml
with open(filename) as f:
    config = yaml.safe_load(f)
Then you can access it like a normal dictionary, with config[key] returning the value. The YAML files would look like this:
key1: 'value'
key2: 'value'
This does require that all the keys be unique.
