Parsing files with python

My input file is going to be something like this
key "value"
key "value"
... the above lines repeat
What I do is read the file contents, populate an object with the data and return it. There are only a set number of keys that can be present in the file. Since I am a beginner in python, I feel that my code to read the file is not that good
My code is something like this
objInstance = CustomClass()
fields = ['key1', 'key2', 'key3']
for field in fields:
    for line in f:
        if line.find(field) >= 0:
            if field == 'key1':
                objInstance.DataOne = get_value_using_re(line)
            elif field == 'key2':
                objInstance.DataTwo = get_value_using_re(line)
return objInstance
The function "get_value_using_re" is very simple, it looks for a string in between the double quotes and returns it.
I fear that I will have multiple if elif statements and I don't know if this is the right way or not.
Am I doing the right thing here?

A normal approach in Python would be something like:
import re
for line in f:
    # Match a bare key, some whitespace, then a double-quoted value.
    mo = re.match(r'^(\S+)\s+"(.*?)"\s*$', line)
    if not mo:
        continue
    key, value = mo.groups()
    setattr(objInstance, key, value)
If the key is not the right attribute name, in the last line you might use something like translate.get(key, 'other') in lieu of key, for some appropriate dict translate.
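As a rough sketch of the translate idea, the snippet below maps file keys to attribute names before calling setattr. The Record class, the contents of translate, and the sample input are all hypothetical stand-ins, not part of the original question.

```python
import io
import re

class Record:
    """Hypothetical stand-in for the question's CustomClass."""

# Hypothetical mapping from keys in the file to attribute names on the object.
translate = {"key1": "DataOne", "key2": "DataTwo"}

def parse(lines, translate):
    record = Record()
    for line in lines:
        mo = re.match(r'^(\S+)\s+"(.*?)"\s*$', line)
        if not mo:
            continue  # skip lines that are not of the form: key "value"
        key, value = mo.groups()
        # Unknown keys all land on the catch-all attribute 'other'.
        setattr(record, translate.get(key, "other"), value)
    return record

# io.StringIO stands in for an open file object.
sample = io.StringIO('key1 "alpha"\nkey2 "beta"\nnot a valid line\nkey9 "gamma"\n')
record = parse(sample, translate)
print(record.DataOne, record.DataTwo, record.other)
```

Adding a new key then only means adding an entry to translate, rather than another elif branch.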

I'd suggest looking at a YAML parser for Python, such as PyYAML. It can conveniently read a file very similar to that into a Python dictionary. With PyYAML:
import yaml
with open(filename) as fh:
    data = yaml.safe_load(fh)
Then you can access it like a normal dictionary, with data[key] returning the value. The YAML file would look like this:
key1: 'value'
key2: 'value'
This does require that all the keys be unique.

Related

Dynamically building a dictionary based on variables

I am trying to build a dictionary based on a larger input of text. From this input, I will create nested dictionaries which will need to be updated as the program runs. The structure ideally looks like this:
nodes = {
    node_name: {
        inc_name: inc_capacity,
        inc_name: inc_capacity,
        inc_name: inc_capacity,
    }
}
Because of the nature of this input, I would like to use variables to dynamically create dictionary keys (or access them if they already exist). But I get KeyError if the key doesn't already exist. I assume I could do a try/except, but was wondering if there was a 'cleaner' way to do this in python. The next best solution I found is illustrated below:
test_dict = {}
inc_color = 'light blue'
inc_cap = 2
test_dict[f'{inc_color}'] = inc_cap
# test_dict returns >>> {'light blue': 2}

Try this code for large-scale input, for example input read from a file.
Here is an example of what I think you are aiming for.
File.txt
Person1: 115.5
Person2: 128.87
Person3: 827.43
Person4:'18.9
Numerical Validation Function
def is_number(a):
    try:
        float(a)
    except ValueError:
        return False
    else:
        return True
Code for dictionary File.txt
adict = {}
with open("File.txt") as data:
    for line in data:
        # Split on the first ':' only; the text after it is the candidate value.
        key, _, value = line.partition(':')
        value = value.strip()
        if is_number(value):
            adict[key] = value
print(adict)
Output
{'Person1': '115.5', 'Person2': '128.87', 'Person3': '827.43'}
For more explanation, see the solution to the related question "How to fix the errors in my code for making a dictionary from a file".

As already mentioned in the comments, you can use setdefault.
Here's how I would implement it.
Assume I want to add values to a dict called node_name, and I have the keys and values in two lists: the keys are in inc_names and the values are in inc_ccity. Then I can use the code below to load them. Note that the key inc_name2 appears twice in the key list, so its second occurrence is not entered into the dictionary.
node_name = {}
inc_names = ['inc_name1','inc_name2','inc_name3','inc_name2']
inc_ccity = ['inc_capacity1','inc_capacity2','inc_capacity3','inc_capacity4']
for i,names in enumerate(inc_names):
    node = node_name.setdefault(names, inc_ccity[i])
    if node != inc_ccity[i]:
        print ('Key=',names,'already exists with value',node, '. New value=', inc_ccity[i], 'skipped')
print ('\nThe final list of values in the dict node_name are :')
print (node_name)
The output of this will be:
Key= inc_name2 already exists with value inc_capacity2 . New value= inc_capacity4 skipped
The final list of values in the dict node_name are :
{'inc_name1': 'inc_capacity1', 'inc_name2': 'inc_capacity2', 'inc_name3': 'inc_capacity3'}
This way you can add values into a dictionary using variables.
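For the nested structure in the question (node name, then inc name, then capacity), a minimal sketch along the same lines might look like the following. The records list is made-up sample data; the two options shown are setdefault on a plain dict and collections.defaultdict, both of which avoid the KeyError without try/except.

```python
from collections import defaultdict

# Hypothetical (node_name, inc_name, inc_capacity) records parsed from the input text.
records = [
    ("node_a", "light blue", 2),
    ("node_a", "dark red", 5),
    ("node_b", "light blue", 1),
]

# Option 1: a plain dict; setdefault creates the inner dict on first use.
nodes = {}
for node_name, inc_name, inc_capacity in records:
    nodes.setdefault(node_name, {})[inc_name] = inc_capacity

# Option 2: defaultdict(dict) creates the inner dict automatically on first access.
nodes_dd = defaultdict(dict)
for node_name, inc_name, inc_capacity in records:
    nodes_dd[node_name][inc_name] = inc_capacity

print(nodes)
print(dict(nodes_dd))
```

One difference worth knowing: merely reading a missing key from a defaultdict inserts it, whereas the plain dict with setdefault only grows when you call setdefault.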

Best way to manipulate variables inside a JSON config file in Python3

I want to have a JSON config file where I can reference values internally. For example, consider the JSON config file below:
{
"hdfs-base":"/user/SOME_HDFS_USER/SOME_PROJECT"
,"incoming-path":"$hdfs-base/incoming"
,"processing-path":"$hdfs-base/processing"
,"processed-path":"$hdfs-base/processed"
}
The main idea is to reuse values already stored in the JSON object, in this case replacing '$hdfs-base' with the value of the 'hdfs-base' attribute. Do you know of something that already does that? I don't want to use the ConfigParser module because I want to use JSON.
Thanks!

Loop over your values, and substitute the keys if there's a match:
import json
with open('input.json') as fh:
    data = json.load(fh)
for k, v in data.items():
    for key, target in data.items():
        # Replace each "$key" reference with that key's value.
        v = v.replace("$" + key, target)
    data[k] = v
BEFORE
hdfs-base /user/SOME_HDFS_USER/SOME_PROJECT
incoming-path $hdfs-base/incoming
processing-path $hdfs-base/processing
processed-path $hdfs-base/processed
AFTER
hdfs-base /user/SOME_HDFS_USER/SOME_PROJECT
incoming-path /user/SOME_HDFS_USER/SOME_PROJECT/incoming
processing-path /user/SOME_HDFS_USER/SOME_PROJECT/processing
processed-path /user/SOME_HDFS_USER/SOME_PROJECT/processed
REPL: https://repl.it/repls/NumbMammothTelephone
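A slightly more defensive variant, sketched under the assumption that every value in the config is a string, uses a regular expression built from the keys themselves. The resolve function is a hypothetical helper, not an existing library call; it tries longer key names first (so a key that is a prefix of another does not match early) and repeats until nothing changes, which also handles references to values that themselves contain references.

```python
import json
import re

def resolve(config):
    """Return a copy of config with every $key reference expanded."""
    # Longest keys first so that "$hdfs-base-tmp" is not read as "$hdfs-base" + "-tmp".
    keys = sorted(config, key=len, reverse=True)
    pattern = re.compile(r"\$(" + "|".join(re.escape(k) for k in keys) + r")")
    resolved = dict(config)
    # Bounded number of passes, so a circular reference cannot loop forever.
    for _ in range(len(resolved)):
        changed = False
        for k, v in resolved.items():
            new = pattern.sub(lambda m: resolved[m.group(1)], v)
            if new != v:
                resolved[k] = new
                changed = True
        if not changed:
            break
    return resolved

raw = """
{
"hdfs-base":"/user/SOME_HDFS_USER/SOME_PROJECT"
,"incoming-path":"$hdfs-base/incoming"
,"processing-path":"$hdfs-base/processing"
,"processed-path":"$hdfs-base/processed"
}
"""
config = resolve(json.loads(raw))
print(config["incoming-path"])
```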

Creating a dictionary in Python + working with that dictionary

I am quite new to Python and am just trying to get my head around some basics.
I was wondering if anyone could show me how to perform the following tasks. I have a text file with multiple lines, those lines are as follows:
name1, variable, variable, variable
name2, variable, variable, variable
name3, variable, variable, variable
I want to store these items in a dictionary so they can be easily called. I want the name to be the key. I then want to be able to call the variables like this: key[0] or key[1].
The code I have at the moment does not do this:
d = {}
with open("servers.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[int(key)] = val
Once this is done, I would like to be able to take an input from a user and then check the dictionary to see if this item is present. I have found a few threads on Stack Overflow, but none seem to do what I require.
There is a Similar Question asked here.
Any assistance you can provide would be amazing. I am new to this but I hope to learn fast & start contributing to threads myself in the near future :)
Cheers!

You're nearly there. Assuming that .split() actually splits the lines correctly (which it wouldn't do if there are actual commas between the values), you just need an additional unpacking operator (*):
d = {}
with open("servers.txt") as f:
    for line in f:
        key, *val = line.split() # first element -> key, rest -> val[0], val[1] etc.
        d[int(key)] = val
If you want to check if a user-entered key exists, you can do something like
ukey = int(input("Enter key number: "))
values = d.get(ukey)
if values is not None:
    print(values)  # do something with the values
else:
    print("That key doesn't exist.")
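If the lines really are comma-separated and the keys are names rather than numbers, as in the question's sample, a sketch along these lines may be closer to what is wanted. The sample data and the lookup helper are assumptions for illustration; io.StringIO stands in for the opened servers.txt.

```python
import io

# Stand-in for open("servers.txt"); the contents mirror the question's layout.
sample = io.StringIO(
    "name1, a, b, c\n"
    "name2, d, e, f\n"
)

d = {}
for line in sample:
    if not line.strip():
        continue  # ignore blank lines
    # Split on commas and trim the spaces around each field.
    key, *val = [part.strip() for part in line.split(",")]
    d[key] = val  # keys stay as strings, so no int() conversion

def lookup(d, name):
    """Hypothetical helper: return the list for name, or None if it is absent."""
    return d.get(name)

print(d["name1"][0])
print(lookup(d, "name9"))
```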

Suppose that your file my_file.csv looks like:
name1, variable, variable, variable
name2, variable, variable, variable
name3, variable, variable, variable
Use pandas to do the work:
import pandas as pd
result = pd.read_csv('my_file.csv', index_col=0, header=None, skipinitialspace=True)  # skipinitialspace drops the space after each comma
print(result)
print(result.loc['name1'])
Note that pandas is a third-party library; you need to install it first, for example with pip.

How to read 2 different text files line by line and make another file containing a dictionary using python?

I have two text files named weburl.txt and imageurl.txt. weburl.txt contains website URLs and imageurl.txt contains the image URLs. I want to create a dictionary that uses each line of weburl.txt as a key and the corresponding line of imageurl.txt as its value.
weburl.txt
url1
url2
url3
url4
url5......
imageurl.txt
imgurl1
imgurl2
imgurl3
imgurl4
imgurl5
required output is
{'url1': imgurl1, 'url2': imgurl2, 'url3': imgurl3......}
I am using this code
with open('weburl.txt') as f :
    key = f.readlines()
with open('imageurl.txt') as g:
    value = g.readlines()
dict[key] = [value]
print dict
I am not getting the required results

You can write something like:
with open('weburl.txt') as f, \
        open('imageurl.txt') as g:
    # we use the `str.strip` method
    # to remove newline characters
    keys = (line.strip() for line in f)
    values = (line.strip() for line in g)
    # the generators are lazy, so build the dict while both files are still open
    result = dict(zip(keys, values))
print(result)
More info about zip is available in the docs.

There are problems with the statement dict[key] = [value] on so many levels that I get a kind of vertigo as we drill down through them:
1. The apparent intention to use a variable called dict (a bad idea because it would overshadow Python's builtin reference to the dict class). Let's call it d instead.
2. Not initializing the dictionary instance first. If you had called it something like d, this oversight would earn you an easy-to-understand NameError. However, since you're calling it dict, Python will actually be attempting to set items in the dict class itself (which doesn't support __setitem__) instead of inside a dict instance, so you'll get a different, more confusing error.
3. Attempting to make a dict entry assignment where the key is a non-hashable type (key is a list). You could convert the list to the hashable type tuple easily enough, but that's not what you want because you'd still be...
4. Attempting to assign a bunch of values to their respective keys all at once. This can't be done with d[key] = value syntax. It could be done all in one relatively simple statement, i.e. d = dict(zip(key, value)), but unfortunately that doesn't get around the fact that you're...
5. Not stripping the newline character off the end of each key and value.
Instead, this line:
d = dict((k.strip(), v.strip()) for k, v in zip(key, value))
will do what you appear to want.

Python Config Parser (Duplicate Key Support)

So I recently started writing a config parser for a Python project I'm working on. I initially avoided configparser and configobj, because I wanted to support a config file like so:
key=value
key2=anothervalue
food=burger
food=hotdog
food=cake icecream
In short, this config file is going to be edited via the command line over SSH often. So I don't want to fuss with tabs or be finicky about spacing (like YAML), but I also want to avoid keys with multiple values (easily 10 or more) being line wrapped in vi. This is why I would like to support duplicate keys.
In my ideal world, when I ask the Python config object for food, it would give me a list back with ['burger', 'hotdog', 'cake', 'icecream']. If there wasn't a food value defined, it would look in a defaults config file and give me that value or those values.
I have already implemented the above.
However, my troubles started when I realized I wanted to support preserving inline comments and such. The way I handle reading and writing the config files is to decode the file into a dict in memory, read values from the dict or write values to it, and then dump that dict back out into a file. This isn't really nice for preserving line order and commenting and such, and it's bugging the crap out of me.
A) ConfigObj looks like it has everything I need except support for duplicate keys. Instead it wants me to make a list, which is going to be a pain to edit manually in vi over SSH due to line wrapping. Can I make ConfigObj more SSH/vi friendly?
B) Is my homebrew solution wrong? Is there a better way of reading/writing/storing my config values? Is there any easy way to handle changing a key value in a config file by just modifying that line and rewriting the entire config file from memory?

Well I would certainly try to leverage what is in the standard library if I could.
The signature for the config parser classes looks like this:
class ConfigParser.SafeConfigParser([defaults[, dict_type[, allow_no_value]]])
Notice the dict_type argument. When provided, this will be used to construct the dictionary objects for the list of sections, for the options within a section, and for the default values. It defaults to collections.OrderedDict. Perhaps you could pass something in there to get your desired multiple-key behavior, and then reap all the advantages of ConfigParser. You might have to write your own class to do this, or you could possibly find one written for you on PyPI or in the ActiveState recipes. Try looking for a bag or multiset class.
I'd either go that route or just suck it up and make a list:
foo = value1, value2, value3
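As a small sketch of the "make a list" route, here is how such a line could be read and split with the Python 3 configparser module. The [main] section header is an assumption added for the example, since configparser expects options to live inside a section; the question's file has none.

```python
import configparser

# Sample config text; the [main] section is an assumption for this sketch.
text = "[main]\nfoo = value1, value2, value3\n"

parser = configparser.ConfigParser()
parser.read_string(text)

# The option comes back as one string; split it on commas and trim each item.
raw = parser.get("main", "foo")
values = [item.strip() for item in raw.split(",") if item.strip()]
print(values)
```

The splitting is done by hand here because configparser itself treats the whole right-hand side as a single string.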

Crazy idea: store each dictionary value as a list of 3-tuples holding the line number, the column number and the value itself, and add a special key for comments.
CommentSymbol = ';'
def addValue(config, key, lineIdx, colIdx, value):
    # Append a (line, column, value) triple, creating the list on first use.
    config.setdefault(key, []).append((lineIdx, colIdx, value))
def readConfig(filename):
    res = {}
    with open(filename) as f:
        for i, line in enumerate(f):
            line = line.rstrip('\n')
            idx = line.find(CommentSymbol)
            if idx != -1:
                # Everything after the comment symbol is stored under the special key.
                addValue(res, CommentSymbol, i, idx, line[idx + 1:])
                line = line[:idx]
            # Split on the first '=' only, so values may themselves contain '='.
            pair = [x.strip() for x in line.split('=', 1)]
            if len(pair) == 2:
                addValue(res, pair[0], i, 0, pair[1])
    return res
def writeConfig(config, filename):
    # Flatten to (line, column, key, value) entries and sort them back into file order.
    entries = sorted((ln, col, k, v) for k, vals in config.items() for ln, col, v in vals)
    with open(filename, 'w') as f:
        current = 0
        for ln, col, k, v in entries:
            if ln > current:
                # Emit the newlines needed to reach this entry's original line.
                f.write('\n' * (ln - current))
                current = ln
            if k == CommentSymbol:
                f.write('{0}{1}'.format(CommentSymbol, v))
            else:
                f.write('{0} = {1}'.format(k, v))
        f.write('\n')
