Python Advice for a beginner. Regex, Dictionaries etc? - python

I'm writing my second Python script to parse the contents of a config file and would like some beginner advice. I'm not sure whether regex is the best way to parse the file, since each entry spans multiple lines. I've also been reading about dictionaries and wondered if using one would be good practice. I'm not necessarily looking for code, just a push in the right direction.
Example: My config file looks like this.
Job {
Name = "host.domain.com-foo"
Client = host.domain.com-fd
JobDefs = "DefaultJob"
FileSet = "local"
Write Bootstrap = "/etc/foo/host.domain.com-foo.bsr"
Pool = storage-disk1
}
Should I use regex, line splitting, or maybe a module? If I had multiple jobs in my config file, would I use a dictionary to correlate a job to a pool?

If you can change the configuration file format, you can directly write your file as a Python file.
config.py
job = {
'Name' : "host.domain.com-foo",
'Client' : "host.domain.com-fd",
'JobDefs' : "DefaultJob",
'FileSet' : "local",
'Write Bootstrap' : "/etc/foo/host.domain.com-foo.bsr",
'Pool' : 'storage-disk1'
}
yourscript.py
from config import job
print(job['Name'])

There are numerous existing alternatives for this task: json, pickle and yaml, to name three. Unless you really want to implement this yourself, you should use one of these. Even if you do roll your own, following the format of one of the above is still a good idea.
Also, it's a much better idea to use a parser generator or a similar tool to do the parsing; regexes are going to be harder to maintain and less efficient for this type of task.
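As a rough illustration of the json route, here is a minimal sketch that keeps several jobs in one JSON document and builds a job-to-pool dictionary from it. The "jobs" wrapper key, the second job name and the storage-disk2 pool are hypothetical and only there for the example.

```python
import json

# Hypothetical JSON version of the config; the second job is made up.
config_text = """
{
  "jobs": [
    {"Name": "host.domain.com-foo", "Client": "host.domain.com-fd", "Pool": "storage-disk1"},
    {"Name": "host.domain.com-bar", "Client": "host.domain.com-fd", "Pool": "storage-disk2"}
  ]
}
"""

config = json.loads(config_text)

# Correlate each job name with its pool.
pool_by_job = {job["Name"]: job["Pool"] for job in config["jobs"]}
print(pool_by_job["host.domain.com-foo"])
```

In a real script the text would come from a file via json.load() instead of an inline string.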

If your config file can be turned into a Python file, just make it a dictionary and import the module.
Job = { "Name" : "host.domain.com-foo",
"Client" : "host.domain.com-fd",
"JobDefs" : "DefaultJob",
"FileSet" : "local",
"Write BootStrap" : "/etc/foo/host.domain.com-foo.bsr",
"Pool" : "storage-disk1" }
You can access the options by simply indexing the dictionary: Job["Name"], etc.
The ConfigParser module is easy to use as well. You can create a text file that looks like this:
[Job]
Name=host.domain.com-foo
Client=host.domain.com-fd
JobDefs=DefaultJob
FileSet=local
Write BootStrap=/etc/foo/host.domain.com-foo.bsr
Pool=storage-disk1
Just keep it simple like one of the above.
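For the [Job] file above, reading it back might look like the following sketch. It assumes Python 3, where the module is named configparser (ConfigParser in Python 2), and it reads from an inline string only to stay self-contained.

```python
import configparser

# Inline copy of the [Job] file so the example is self-contained.
ini_text = """\
[Job]
Name=host.domain.com-foo
Client=host.domain.com-fd
JobDefs=DefaultJob
FileSet=local
Write BootStrap=/etc/foo/host.domain.com-foo.bsr
Pool=storage-disk1
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)  # use parser.read("path/to/file") for a real file

job = parser["Job"]
name = job["Name"]                 # option names are case-insensitive by default
bootstrap = job["write bootstrap"]  # names containing spaces work too
print(name, job["Pool"], bootstrap)
```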

The ConfigParser module from the standard library is probably the most Pythonic and straightforward way to parse a configuration file that your Python script is using.
If you are restricted to the particular format you have outlined, then pyparsing is a pretty good option.

I don't think a regex is adequate for parsing something like this. You could look at a true parser, such as pyparsing. Or if the file format is within your control, you might consider XML. There are standard Python libraries for parsing that.
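If the format cannot change and pulling in a parsing library feels like too much, plain line splitting can be enough for simple files. The sketch below is not pyparsing; it is a hand-rolled reader that assumes blocks are never nested, that every option sits on its own line as key = value, and that # starts a comment line. The function name parse_blocks is made up for the example.

```python
def parse_blocks(text):
    """Parse 'Type { key = value ... }' blocks into (type, options) pairs.

    Assumes flat (non-nested) blocks with one 'key = value' per line.
    """
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.endswith("{"):
            current = (line[:-1].strip(), {})  # start of a new block
        elif line == "}":
            if current is not None:
                blocks.append(current)  # block finished
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip().strip('"')
    return blocks


sample = """
Job {
Name = "host.domain.com-foo"
Client = host.domain.com-fd
Write Bootstrap = "/etc/foo/host.domain.com-foo.bsr"
Pool = storage-disk1
}
"""

jobs = [options for kind, options in parse_blocks(sample) if kind == "Job"]
print(jobs[0]["Name"], jobs[0]["Pool"])
```

Each job ends up as an ordinary dictionary, so several Job blocks in one file simply produce a longer list.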

Related

Python parser. Need to read the "Name and author" out of a text file and output all of the collected names into another text file

I'm trying to take two things out of text files that are in folders and output them into a neat list in a single text file. I've never done something like this before, and all of the online resources are either too simple or too complex for my task. I have a feeling this task is specific to what I'm trying to do.
[Info]
name = "bridget"
displayname = "BRIDGET"
versiondate = 04,13,2002
mugenversion = 04,14,2001
author = "[fraya]"
pal.defaults = 1
All I'm trying to do is take the "displayname" and "author" text fields and output them to a file as a list with the format "(displayname) by (author)".
A parser was the first thing that came to mind when I wanted to try this (and I heard Python was a good choice for it).
So if anyone could point me in the right direction or give me some building blocks, that would be helpful.
You don't need to write a parser; this is (almost) standard .ini file format, which can be read by the configparser module. You'll just need to strip the quotes when you output the values.
To get you started:
import configparser
c = configparser.ConfigParser()
c.read(['myfilename.ini'])
info = c['Info']
displayname = info['displayname'].strip('"')
author = info['author'].strip('"')
print("{} by {}".format(displayname, author))
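To cover the "files in folders" part, the same idea can be wrapped in a loop over a directory tree. This is only a sketch: the *.ini pattern, the credits.txt output name and the collect_credits helper are assumptions, and files that fail to parse are silently skipped.

```python
import configparser
import tempfile
from pathlib import Path


def collect_credits(root, pattern="*.ini"):
    """Return '(displayname) by (author)' lines for matching files under root."""
    lines = []
    for path in sorted(Path(root).rglob(pattern)):
        # interpolation=None keeps a literal % in values from raising errors
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        try:
            parser.read(path)
        except (configparser.Error, UnicodeDecodeError):
            continue  # not a readable ini-style file; skip it
        if not parser.has_section("Info"):
            continue
        info = parser["Info"]
        name = info.get("displayname", "").strip('"')
        author = info.get("author", "").strip('"')
        if name and author:
            lines.append("{} by {}".format(name, author))
    return lines


# Demo in a throwaway directory so the example is runnable as-is.
with tempfile.TemporaryDirectory() as root:
    char_dir = Path(root) / "bridget"
    char_dir.mkdir()
    (char_dir / "bridget.ini").write_text(
        '[Info]\nname = "bridget"\ndisplayname = "BRIDGET"\nauthor = "[fraya]"\n'
    )
    found = collect_credits(root)
    out_file = Path(root) / "credits.txt"
    out_file.write_text("\n".join(found) + "\n")
    written = out_file.read_text()
```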

How can I handle reading a .json file in it that has comments with python?

Firstly, I understand that comments aren't valid json. That said, for some reason this .json file I have to process has comments at the start of lines and at the end of lines.
How can I handle this in Python and basically load the .json file but ignore the comments so that I can process it? I am currently doing the following:
import json
with open('/home/sam/Lean/Launcher/bin/Debug/config.json', 'r') as f:
    config_data = json.load(f)
But this crashes at the json.load(f) command because the file f has comments in it.
I thought this would be a common problem but I can't find much online RE how to handle it in python. Someone suggested commentjson but that makes my script crash saying
ImportError: cannot import name 'dump'
When I import commentjson
Thoughts?
Edit:
Here is a snippet of the JSON file I must process.
{
// this configuration file works by first loading all top-level
// configuration items and then will load the specified environment
// on top, this provides a layering affect. environment names can be
// anything, and just require definition in this file. There's
// two predefined environments, 'backtesting' and 'live', feel free
// to add more!
"environment": "backtesting",// "live-paper", "backtesting", "live-interactive", "live-interactive-iqfeed"
// algorithm class selector
"algorithm-type-name": "BasicTemplateAlgorithm",
// Algorithm language selector - options CSharp, FSharp, VisualBasic, Python, Java
"algorithm-language": "CSharp"
}
Switch to json5. JSON5 is a very small superset of JSON that supports comments and a few other features that you can simply ignore.
import json5 as json
# and the rest is the same
The json5 package is beta, and it is slower, but if you just need to read a short configuration once when starting the program, it can probably be considered an option. It is better to switch to another standard than not to follow any.
Kind of a hack (it will fail if // appears within the JSON data itself), but simple enough for most cases:
import json,re
s = """{
// this configuration file works by first loading all top-level
// configuration items and then will load the specified environment
// on top, this provides a layering affect. environment names can be
// anything, and just require definition in this file. There's
// two predefined environments, 'backtesting' and 'live', feel free
// to add more!
"environment": "backtesting",// "live-paper", "backtesting", "live-interactive", "live-interactive-iqfeed"
// algorithm class selector
"algorithm-type-name": "BasicTemplateAlgorithm",
// Algorithm language selector - options CSharp, FSharp, VisualBasic, Python, Java
"algorithm-language": "CSharp"
}
"""
result = json.loads(re.sub("//.*","",s,flags=re.MULTILINE))
print(result)
gives:
{'environment': 'backtesting', 'algorithm-type-name': 'BasicTemplateAlgorithm', 'algorithm-language': 'CSharp'}
This applies a regular expression to every line, removing the double slash and everything that follows it.
A state machine that parses each line would be better, to make sure the // isn't inside quotes, but that's slightly more complex (though doable).
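A minimal sketch of such a state machine follows. It only handles // comments (not /* */ blocks), tracks whether the scanner is inside a double-quoted string, and honours backslash escapes. The function name strip_line_comments and the sample document are made up for the example.

```python
import json


def strip_line_comments(text):
    """Remove // comments that are not inside a JSON string."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False      # this character was escaped; ignore it
            elif ch == "\\":
                escaped = True       # next character is escaped
            elif ch == '"':
                in_string = False    # closing quote
            i += 1
        elif ch == '"':
            in_string = True         # opening quote
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)  # skip to end of line, keep the newline
            i = len(text) if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = """{
  // a full-line comment
  "url": "http://example.com/a", // a trailing comment
  "environment": "backtesting"
}"""

config = json.loads(strip_line_comments(sample))
print(config["url"])
```

Unlike the plain regex, the // inside the URL string survives because the scanner knows it is inside quotes.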
I haven't used it personally, but you can have a look at the JSONComment Python package, which supports parsing a JSON file with comments. It wraps the standard json module, so you use it in place of the regular JSON parser:
import json
from jsoncomment import JsonComment
parser = JsonComment(json)
parsed_object = parser.loads(jsonString)
You can take out the comments with the following:
import re
data = re.sub(r"//.*?\n", "", data)  # line comments
data = re.sub(r"/\*.*?\*/", "", data, flags=re.DOTALL)  # block comments, including multi-line ones
This should remove all comments from the data. It could cause problems if there are // or /* inside your strings.

Most Efficient way to parse Evtx files for specific content

I have hundreds of gigs of Evtx security event logs that I want to parse for specific Event IDs (4624) and usernames (joe) based on the Event IDs. I have attempted to use a PowerShell cmdlet like the one below:
get-winevent -filterhashtable @{Path="mypath.evtx"; providername="securitystuffprovider"; id=4624}
I know I can pass a variable containing a list of all my Evtx files to the Path parameter, but I am unable to filter on a subset of the event message. Also, this takes an incredibly long time to parse just one Evtx file, much less 150 or so. I know there is a Python package for parsing Evtx, but I am not sure how that would look, as the python-evtx parser doesn't provide great examples of importing and using the package itself. I cannot extract all of the data into CSV, as that would take too much disk space. Any ideas would be amazing. Thanks.
Use -Path with the -FilterXPath parameter, and then filter using an XPath expression like so:
$Username = 'jdoe'
$XPathFilter = "*[System[(EventID=4624)] and EventData[Data[@Name='SubjectUserName'] and (Data='$Username')]]"
Get-WinEvent -Path C:\path\to\log\files\*.evtx -FilterXPath $XPathFilter
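For the Python route mentioned in the question, the filtering step itself can be sketched with the standard library, assuming each event is available as an XML string in the usual Windows event schema. The helper name is_match and the sample event are made up; how the XML is obtained is left open, and the python-evtx calls shown in the trailing comment are an assumption about that package's API rather than something verified here.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def is_match(event_xml, event_id, username, field="SubjectUserName"):
    """Return True if the event has the given ID and user name in `field`."""
    root = ET.fromstring(event_xml)
    found_id = root.findtext("e:System/e:EventID", namespaces=NS)
    if found_id is None or found_id.strip() != str(event_id):
        return False
    for data in root.iterfind("e:EventData/e:Data", NS):
        value = (data.text or "").strip()
        if data.get("Name") == field and value.lower() == username.lower():
            return True
    return False


# Made-up sample event, trimmed to the fields the filter looks at.
sample_event = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>4624</EventID></System>
  <EventData>
    <Data Name="SubjectUserName">jdoe</Data>
  </EventData>
</Event>
"""

print(is_match(sample_event, 4624, "jdoe"))

# Assumed wiring with python-evtx (API not verified here):
# import Evtx.Evtx as evtx
# with evtx.Evtx("mypath.evtx") as log:
#     for record in log.records():
#         if is_match(record.xml(), 4624, "jdoe"):
#             print(record.xml())
```

Handling one record at a time keeps memory use flat and avoids writing an intermediate CSV.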

Saving and loading simple data in Python in a convenient way

I'm currently working on a simple Python 3.4.3 and Tkinter game.
I struggle with saving/reading data now, because I'm a beginner at coding.
What I do now is use .txt files to store my data, but I find this extremely counter-intuitive, as saving/reading more than one line of data requires me to write additional code to catch any newlines.
Skipping a line would be terrible too.
I've googled it, but I either find .txt save/file options or way too complex ones for saving large-scale data.
I only need to save some strings right now and be able to access them (if possible) by key like in a dictionary key:value .
Do you know of any file format/method to help me accomplish that?
Also: If possible, should work on Win/iOS/Linux.
It sounds like using json would be best for this; it comes as part of the Python standard library in Python 2.6+.
import json
data = {'username':'John', 'health':98, 'weapon':'warhammer'}
# serialize the data to user-data.txt
with open('user-data.txt', 'w') as fobj:
    json.dump(data, fobj)
# read the data back in
with open('user-data.txt', 'r') as fobj:
    data = json.load(fobj)
print(data)
# outputs (on Python 3; key order may vary on older versions):
# {'username': 'John', 'health': 98, 'weapon': 'warhammer'}
A popular alternative is yaml, which is actually a superset of json and produces slightly more human readable results.
You might want to try Redis.
http://redis.io/
I'm not totally sure it'll meet all your needs, but it would probably be better than a flat file.

Parameter with dictionary path

I am very new to Python and am not very familiar with the data structures in Python.
I am writing an automatic JSON parser in Python, the JSON message is read into a dictionary using Ultra-JSON:
jsonObjs = ujson.loads(data)
Now, if I try something like:
jsonObjs[param1][0][param2], it works fine.
However, I need to get the path from an external source (I read it from the DB). We initially thought we would just store this in the DB:
myPath = [param1][0][param2]
and then try to access:
jsonObjs[myPath]
But after a couple of failures I realized I'm trying to access:
jsonObjs[[param1][0][param2]]
Is there a way to fix this without parsing myPath?
Many thanks for your help and advice
Store the keys in a format that preserves type information, e.g. JSON (so the path becomes a list of keys and indexes), and then use reduce() to apply the keys one after another to the structure.
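A minimal sketch of that suggestion, using the standard json module instead of ujson so it runs without extra packages. The sample document, the stored path string and the get_by_path helper name are all made up for the example.

```python
import json
from functools import reduce
from operator import getitem


def get_by_path(obj, path):
    """Follow a list of keys/indexes into nested dicts and lists."""
    return reduce(getitem, path, obj)


# Hypothetical message and a path as it might be stored in the DB (JSON text).
json_objs = json.loads('{"orders": [{"id": 7, "status": "shipped"}]}')
stored_path = '["orders", 0, "status"]'

path = json.loads(stored_path)  # -> ['orders', 0, 'status'], types preserved
value = get_by_path(json_objs, path)
print(value)
```

Because the path is stored as JSON, the integer index 0 stays an integer and the string keys stay strings, so no custom parsing of the path text is needed.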
