Importance of __getitem___ : HTML Parsing using Python

Importance of __getitem___ : HTML Parsing using Python - python

class mainprogram():
def getData(self, file, begin, end):
parser = MyHTMLParser()
f = open(file);
rawcontent = f.read()
#Get main content
content = rawcontent.split('<div id="header"')[1];
content = content.split('</html>')[0];
del parsed_data[:]
html = content.split(begin)[1];
html = html.split(end)[0];
parser.feed(html);
result = list(parsed_data);
return result;
I'm in the stage of practising Python and while I was doing an assignment related to Python I was stuck. The above code Snippet uses htmlparser for parsing .msg file to convert into csv format.
Could any one explain me the what does [1] [0] signify in these below lines
content = rawcontent.split('<div id="header"')[1];
content = content.split('</html>')[0];
Presently I'm using Python community version for development, when I highlight that particular [1] or [0] its showing up as
class list
def __getitem__(self, y)

As described in special method names, the method name __getitem__() allows a class to override the foo[bar] syntax. Lists do this to provide both subscripting (e.g. foo[5]) and slicing (foo[1:5]). Dictionaries do this to provide key lookup.

Related

Python unit testing with PyTest, stuck on more advanced testing

I have been using PyTest for some time now to write some simple tests (like the ones you find in tutorials and youtube video's) and I thought now it was time to start writing actual test for our python scripts. The scripts are way more advanced than any shown in tutorials so I am getting a bit stuck. I do not want the entire correct answer, but rather a nudge in the right direction if possible. Here is my issue:
We have a script that reads a .md text file and converts it to a pdf file based on an external template. Part of the script is here below (I removed most of it because I first just want to have 1 running test)
class DocumentationEngine:
def __init__(self, title, subtitle, series, style='TIIStyle_Digital_Aug_2020', templateFile='template.docet', tableOfContents=True, listOfFigures=False, listOfTables=False):
self.title = title
self.subtitle = subtitle
self.series = series
self.style = style
self.template = {}
self.hasTOC = tableOfContents
self.hasLOF = listOfFigures
self.hasLOT = listOfTables
self.loadTemplate(templateFile)
def loadTemplate(self, file='template.docet'):
with open(file, "r") as templatefile:
lines = templatefile.readlines()
key = "dummy"
value = ""
for line in lines:
line = line.strip()
if line.startswith('[') and line.endswith(']'):
self.template[key] = value
key = line[1:-1]
value = ""
else:
value += line + '\n'
def build(self, versions=[], content='', filename='Documenter\\_Autogenerated'):
document = self.template["doc"]
document = document.replace("%%style%%", self.style)
document = document.replace("%%body%%",
self.buildFirstPage() +
self.buildTableOfContents() +
self.buildListOfFigures() +
self.buildListOfTables() +
self.buildVersionTable(versions, filename) +
self.buildContentPages(content=content) +
self.buildLastPage()
)
return document
def buildLastPage(self):
return self.template["last_page"]
I am trying to write a simple unit test for the buildLastPage method and have been stuck for several days now.
I am not sure whether or not I need to mock the template file, use a fixture and/or if I can actually test only that method with all dependencies.
I started with the following:
from doceng import DocumentationEngine
import pytest
class Test:
def test_buildLastPage(self):
build_last_page = DocumentationEngine()
assert build_last_page.template(1) == 1
which gives me an error regarding 3 required arguments. When adding the arguments like this:
from doceng import DocumentationEngine
import pytest
class Test:
def test_buildLastPage(self, title, subtitle, series):
build_last_page = DocumentationEngine()
assert build_last_page.template(1) == 1
which gives me an error that the fixture is not found.
I added a fixture in conftest.py file like this:
import pytest
from doceng import DocumentationEngine
#pytest.fixture
def title(title):
return title("test")
which will get me another error, recursive dependency involving fixture 'title' detected
I'm quite stuck so any nudge in the right direction for a newbie would be highly appreciated

The error of the fixtures is regarding your test function test_buildLastPage. The way you are using it, it only needs the self argument.
A test function in pytest without any decorators always expects to find fixtures, that have the same name as the arguments. You did not define any fixtures and also do not use the arguments in your function. Therefore, you can remove them safely.
The actual error points DocumentationEngine(). The class expect 3 arguments when initializing the object. You set no arguments. Check your __init__ function again to find the proper arguments.

Using python class with spark DataFrame to parse URL's

I'm trying to process URL's in a pyspark dataframe using a class that I've written and a udf. I'm aware of urllib and other url parsing libraries but for this case I need to use my own code.
In order to get the tld of a url I cross check it against the iana public suffix list.
Here's a simplification of my code
class Parser:
# list of available public suffixes for extracting top level domains
file = open("public_suffix_list.txt", 'r')
data = []
for line in file:
if line.startswith("//") or line == '\n':
pass
else:
data.append(line.strip('\n'))
def __init__(self, url):
self.url = url
#the code here extracts port,protocol,query etc.
#I think this bit below is causing the error
matches = [r for r in self.data if r in self.hostname]
#extra functionality in my actual class
i = matches.index(self.string)
try:
self.tld = matches[i]
# logic to find tld if no match
The class works in pure python so for example I can run
import Parser
x = Parser("www.google.com")
x.tld #returns ".com"
However when I try to do
import Parser
from pyspark.sql.functions import udf
parse = udf(lambda x: Parser(x).url)
df = sqlContext.table("tablename").select(parse("column"))
When I call an action I get
File "<stdin>", line 3, in <lambda>
File "<stdin>", line 27, in __init__
TypeError: 'in <string>' requires string as left operand
So my guess is that it's failing to interpret the data as a list of strings?
I've also tried to use
file = sc.textFile("my_file.txt")\
.filter(lambda x: not x.startswith("//") or != "")\
.collect()
data = sc.broadcast(file)
to open my file instead, but that causes
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Any ideas?
Thanks in advance
EDIT: Apologies, I didn't have my code to hand so my test code didn't explain very well the problems I was having. The error I initially reported was a result of the test data I was using.
I've updated my question to be more reflective of the challenge I'm facing.

Why do you need a class in this case (the code for defining your class is incorrect, you never declared self.data before using it in the init method) the only relevant line that affects the output you want is self.string=string, so you are basically passing the identity function as udf.
The UnicodeDecodeError is due to an encoding issue in your file, it has nothing to do with your definition of the class.
The second error is in the line sc.broadcast(file) , details of which can be found here : Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion
EDIT 1
I would redefine your class structure as follows. You basically need to create the instance self.data by calling self.data = data before you can use it. Also anything that you write before the init method is executed irrespective of whether you call that class or not. So moving out the file parsing part will not have any effect.
# list of available public suffixes for extracting top level domains
file = open("public_suffix_list.txt", 'r')
data = []
for line in file:
if line.startswith("//") or line == '\n':
pass
else:
data.append(line.strip('\n'))
class Parser:
def __init__(self, url):
self.url = url
self.data = data
#the code here extracts port,protocol,query etc.
#I think this bit below is causing the error
matches = [r for r in self.data if r in self.hostname]
#extra functionality in my actual class
i = matches.index(self.string)
try:
self.tld = matches[i]
# logic to find tld if no match

Retrieve XML parent and child attributes using Python and lxml

I'm trying to process an XML file using XPATH in Python / lxml.
I can pull out the values at a particular level of the tree using this code:
file_name = input('Enter the file name, including .xml extension: ') # User inputs file name
print('Parsing ' + file_name)
from lxml import etree
parser = etree.XMLParser()
tree = etree.parse(file_name, parser)
r = tree.xpath('/dataimport/programmelist/programme')
print (len(r))
with open(file_name+'.log', 'w', encoding='utf-8') as f:
for r in tree.xpath('/dataimport/programmelist/programme'):
progid = (r.get("id"))
print (progid)
It returns a list of values as expected. I also want to return the value of a 'child' (where it exists), but I can't work out how (I can only get it to work as a separate list, but I need to maintain the link between them).
Note: I will be writing the values out to a log file, but since I haven't been successful in getting everything out that I want, I haven't added the 'write out' code yet.
This is the structure of the XML:
<dataimport dtdversion="1.1">
<programmelist>
<programme id="eid-273168">
<imageref idref="img-1844575"/>
How can I get Python to return the id + idref?
The previous examples I have worked with had namespaces, but this file doesn't.

Since xpath() method returns tree, you can use xpath again to get idref list you want:
for r in tree.xpath('/dataimport/programmelist/programme')
progid = r.get("id")
ref_list = r.xpath('imageref/#idref')
print progid, ref_lis

Python: Update XML-file using ElementTree while conserving layout as much as possible

I have a document which uses an XML namespace for which I want to increase /group/house/dogs by one: (the file is called houses.xml)
<?xml version="1.0"?>
<group xmlns="http://dogs.house.local">
<house>
<id>2821</id>
<dogs>2</dogs>
</house>
</group>
My current result using the code below is: (the created file is called houses2.xml)
<ns0:group xmlns:ns0="http://dogs.house.local">
<ns0:house>
<ns0:id>2821</ns0:id>
<ns0:dogs>3</ns0:dogs>
</ns0:house>
</ns0:group>
I would like to fix two things (if it is possible using ElementTree. If it isn´t, I´d be greatful for a suggestion as to what I should use instead):
I want to keep the <?xml version="1.0"?> line.
I do not want to prefix all tags, I´d like to keep it as is.
In conclusion, I don´t want to mess with the document more than I absolutely have to.
My current code (which works except for the above mentioned flaws) generating the above result follows.
I have made a utility function which loads an XML file using ElementTree and returns the elementTree and the namespace (as I do not want to hard code the namespace, and am willing to take the risk it implies):
def elementTreeRootAndNamespace(xml_file):
from xml.etree import ElementTree
import re
element_tree = ElementTree.parse(xml_file)
# Search for a namespace on the root tag
namespace_search = re.search('^({\S+})', element_tree.getroot().tag)
# Keep the namespace empty if none exists, if a namespace exists set
# namespace to {namespacename}
namespace = ''
if namespace_search:
namespace = namespace_search.group(1)
return element_tree, namespace
This is my code to update the number of dogs and save it to the new file houses2.xml:
elementTree, namespace = elementTreeRootAndNamespace('houses.xml')
# Insert the namespace before each tag when when finding current number of dogs,
# as ElementTree requires the namespace to be prefixed within {...} when a
# namespace is used in the document.
dogs = elementTree.find('{ns}house/{ns}dogs'.format(ns = namespace))
# Increase the number of dogs by one
dogs.text = str(int(dogs.text) + 1)
# Write the result to the new file houses2.xml.
elementTree.write('houses2.xml')

An XML based solution to this problem is to write a helper class for ElementTree which:
Grabs the XML-declaration line before parsing as ElementTree at the time of writing is unable to write an XML-declaration line without also writing an encoding attribute(I checked the source).
Parses the input file once, grabs the namespace of the root element. Registers that namespace with ElementTree as having the empty string as prefix. When that is done the source file is parsed using ElementTree again, with that new setting.
It has one major drawback:
XML-comments are lost. Which I have learned is not acceptable for this situation(I initially didn´t think the input data had any comments, but it turns out it has).
My helper class with example:
from xml.etree import ElementTree as ET
import re
class ElementTreeHelper():
def __init__(self, xml_file_name):
xml_file = open(xml_file_name, "rb")
self.__parse_xml_declaration(xml_file)
self.element_tree = ET.parse(xml_file)
xml_file.seek(0)
root_tag_namespace = self.__root_tag_namespace(self.element_tree)
self.namespace = None
if root_tag_namespace is not None:
self.namespace = '{' + root_tag_namespace + '}'
# Register the root tag namespace as having an empty prefix, as
# this has to be done before parsing xml_file we re-parse.
ET.register_namespace('', root_tag_namespace)
self.element_tree = ET.parse(xml_file)
def find(self, xpath_query):
return self.element_tree.find(xpath_query)
def write(self, xml_file_name):
xml_file = open(xml_file_name, "wb")
if self.xml_declaration_line is not None:
xml_file.write(self.xml_declaration_line + '\n')
return self.element_tree.write(xml_file)
def __parse_xml_declaration(self, xml_file):
first_line = xml_file.readline().strip()
if first_line.startswith('<?xml') and first_line.endswith('?>'):
self.xml_declaration_line = first_line
else:
self.xml_declaration_line = None
xml_file.seek(0)
def __root_tag_namespace(self, element_tree):
namespace_search = re.search('^{(\S+)}', element_tree.getroot().tag)
if namespace_search is not None:
return namespace_search.group(1)
else:
return None
def __main():
el_tree_hlp = ElementTreeHelper('houses.xml')
dogs_tag = el_tree_hlp.element_tree.getroot().find(
'{ns}house/{ns}dogs'.format(
ns=el_tree_hlp.namespace))
one_dog_added = int(dogs_tag.text.strip()) + 1
dogs_tag.text = str(one_dog_added)
el_tree_hlp.write('hejsan.xml')
if __name__ == '__main__':
__main()
The output:
<?xml version="1.0"?>
<group xmlns="http://dogs.house.local">
<house>
<id>2821</id>
<dogs>3</dogs>
</house>
</group>
If someone has an improvement to this solution please don´t hesitate to grab the code and improve it.

Round-tripping, unfortunately, isn't a trivial problem. With XML, it's generally not possible to preserve the original document unless you use a special parser (like DecentXML but that's for Java).
Depending on your needs, you have the following options:
If you control the source and you can secure your code with unit tests, you can write your own, simple parser. This parser doesn't accept XML but only a limited subset. You can, for example, read the whole document as a string and then use Python's string operations to locate <dogs> and replace anything up to the next <. Hack? Yes.
You can filter the output. XML allows the string <ns0: only in one place, so you can search&replace it with < and then the same with <group xmlns:ns0=" → <group xmlns=". This is pretty safe unless you can have CDATA in your XML.
You can write your own, simple XML parser. Read the input as a string and then create Elements for each pair of <> plus their positions in the input. That allows you to take the input apart quickly but only works for small inputs.

when Save xml add default_namespace argument is easy to avoid ns0, on my code
key code: xmltree.write(xmlfiile,"utf-8",default_namespace=xmlnamespace)
if os.path.isfile(xmlfiile):
xmltree = ET.parse(xmlfiile)
root = xmltree.getroot()
xmlnamespace = root.tag.split('{')[1].split('}')[0] //get namespace
initwin=xmltree.find("./{"+ xmlnamespace +"}test")
initwin.find("./{"+ xmlnamespace +"}content").text = "aaa"
xmltree.write(xmlfiile,"utf-8",default_namespace=xmlnamespace)

etree from lxml provides this feature.
elementTree.write('houses2.xml',encoding = "UTF-8",xml_declaration = True) helps you in not omitting the declaration
While writing into the file it does not change the namespaces.
http://lxml.de/parsing.html is the link for its tutorial.
P.S : lxml should be installed separately.

Properties file in python (similar to Java Properties)

Given the following format (.properties or .ini):
propertyName1=propertyValue1
propertyName2=propertyValue2
...
propertyNameN=propertyValueN
For Java there is the Properties class that offers functionality to parse / interact with the above format.
Is there something similar in python's standard library (2.x) ?
If not, what other alternatives do I have ?

I was able to get this to work with ConfigParser, no one showed any examples on how to do this, so here is a simple python reader of a property file and example of the property file. Note that the extension is still .properties, but I had to add a section header similar to what you see in .ini files... a bit of a bastardization, but it works.
The python file: PythonPropertyReader.py
#!/usr/bin/python
import ConfigParser
config = ConfigParser.RawConfigParser()
config.read('ConfigFile.properties')
print config.get('DatabaseSection', 'database.dbname');
The property file: ConfigFile.properties
[DatabaseSection]
database.dbname=unitTest
database.user=root
database.password=
For more functionality, read: https://docs.python.org/2/library/configparser.html

For .ini files there is the configparser module that provides a format compatible with .ini files.
Anyway there's nothing available for parsing complete .properties files, when I have to do that I simply use jython (I'm talking about scripting).

I know that this is a very old question, but I need it just now and I decided to implement my own solution, a pure python solution, that covers most uses cases (not all):
def load_properties(filepath, sep='=', comment_char='#'):
"""
Read the file passed as parameter as a properties file.
"""
props = {}
with open(filepath, "rt") as f:
for line in f:
l = line.strip()
if l and not l.startswith(comment_char):
key_value = l.split(sep)
key = key_value[0].strip()
value = sep.join(key_value[1:]).strip().strip('"')
props[key] = value
return props
You can change the sep to ':' to parse files with format:
key : value
The code parses correctly lines like:
url = "http://my-host.com"
name = Paul = Pablo
# This comment line will be ignored
You'll get a dict with:
{"url": "http://my-host.com", "name": "Paul = Pablo" }

A java properties file is often valid python code as well. You could rename your myconfig.properties file to myconfig.py. Then just import your file, like this
import myconfig
and access the properties directly
print myconfig.propertyName1

if you don't have multi line properties and a very simple need, a few lines of code can solve it for you:
File t.properties:
a=b
c=d
e=f
Python code:
with open("t.properties") as f:
l = [line.split("=") for line in f.readlines()]
d = {key.strip(): value.strip() for key, value in l}

If you have an option of file formats I suggest using .ini and Python's ConfigParser as mentioned. If you need compatibility with Java .properties files I have written a library for it called jprops. We were using pyjavaproperties, but after encountering various limitations I ended up implementing my own. It has full support for the .properties format, including unicode support and better support for escape sequences. Jprops can also parse any file-like object while pyjavaproperties only works with real files on disk.

This is not exactly properties but Python does have a nice library for parsing configuration files. Also see this recipe: A python replacement for java.util.Properties.

i have used this, this library is very useful
from pyjavaproperties import Properties
p = Properties()
p.load(open('test.properties'))
p.list()
print(p)
print(p.items())
print(p['name3'])
p['name3'] = 'changed = value'

Here is link to my project: https://sourceforge.net/projects/pyproperties/. It is a library with methods for working with *.properties files for Python 3.x.
But it is not based on java.util.Properties

This is a one-to-one replacement of java.util.Propeties
From the doc:
def __parse(self, lines):
""" Parse a list of lines and create
an internal property dictionary """
# Every line in the file must consist of either a comment
# or a key-value pair. A key-value pair is a line consisting
# of a key which is a combination of non-white space characters
# The separator character between key-value pairs is a '=',
# ':' or a whitespace character not including the newline.
# If the '=' or ':' characters are found, in the line, even
# keys containing whitespace chars are allowed.
# A line with only a key according to the rules above is also
# fine. In such case, the value is considered as the empty string.
# In order to include characters '=' or ':' in a key or value,
# they have to be properly escaped using the backslash character.
# Some examples of valid key-value pairs:
#
# key value
# key=value
# key:value
# key value1,value2,value3
# key value1,value2,value3 \
# value4, value5
# key
# This key= this value
# key = value1 value2 value3
# Any line that starts with a '#' is considerered a comment
# and skipped. Also any trailing or preceding whitespaces
# are removed from the key/value.
# This is a line parser. It parses the
# contents like by line.

You can use a file-like object in ConfigParser.RawConfigParser.readfp defined here -> https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.readfp
Define a class that overrides readline that adds a section name before the actual contents of your properties file.
I've packaged it into the class that returns a dict of all the properties defined.
import ConfigParser
class PropertiesReader(object):
def __init__(self, properties_file_name):
self.name = properties_file_name
self.main_section = 'main'
# Add dummy section on top
self.lines = [ '[%s]\n' % self.main_section ]
with open(properties_file_name) as f:
self.lines.extend(f.readlines())
# This makes sure that iterator in readfp stops
self.lines.append('')
def readline(self):
return self.lines.pop(0)
def read_properties(self):
config = ConfigParser.RawConfigParser()
# Without next line the property names will be lowercased
config.optionxform = str
config.readfp(self)
return dict(config.items(self.main_section))
if __name__ == '__main__':
print PropertiesReader('/path/to/file.properties').read_properties()

If you need to read all values from a section in properties file in a simple manner:
Your config.properties file layout :
[SECTION_NAME]
key1 = value1
key2 = value2
You code:
import configparser
config = configparser.RawConfigParser()
config.read('path_to_config.properties file')
details_dict = dict(config.items('SECTION_NAME'))
This will give you a dictionary where keys are same as in config file and their corresponding values.
details_dict is :
{'key1':'value1', 'key2':'value2'}
Now to get key1's value :
details_dict['key1']
Putting it all in a method which reads that section from config file only once(the first time the method is called during a program run).
def get_config_dict():
if not hasattr(get_config_dict, 'config_dict'):
get_config_dict.config_dict = dict(config.items('SECTION_NAME'))
return get_config_dict.config_dict
Now call the above function and get the required key's value :
config_details = get_config_dict()
key_1_value = config_details['key1']
-------------------------------------------------------------
Extending the approach mentioned above, reading section by section automatically and then accessing by section name followed by key name.
def get_config_section():
if not hasattr(get_config_section, 'section_dict'):
get_config_section.section_dict = dict()
for section in config.sections():
get_config_section.section_dict[section] =
dict(config.items(section))
return get_config_section.section_dict
To access:
config_dict = get_config_section()
port = config_dict['DB']['port']
(here 'DB' is a section name in config file
and 'port' is a key under section 'DB'.)

create a dictionary in your python module and store everything into it and access it, for example:
dict = {
'portalPath' : 'www.xyx.com',
'elementID': 'submit'}
Now to access it you can simply do:
submitButton = driver.find_element_by_id(dict['elementID'])

My Java ini files didn't have section headers and I wanted a dict as a result. So i simply injected an "[ini]" section and let the default config library do its job.
Take a version.ini fie of the eclipse IDE .metadata directory as an example:
#Mon Dec 20 07:35:29 CET 2021
org.eclipse.core.runtime=2
org.eclipse.platform=4.19.0.v20210303-1800
# 'injected' ini section
[ini]
#Mon Dec 20 07:35:29 CET 2021
org.eclipse.core.runtime=2
org.eclipse.platform=4.19.0.v20210303-1800
The result is converted to a dict:
from configparser import ConfigParser
#staticmethod
def readPropertyFile(path):
# https://stackoverflow.com/questions/3595363/properties-file-in-python-similar-to-java-properties
config = ConfigParser()
s_config= open(path, 'r').read()
s_config="[ini]\n%s" % s_config
# https://stackoverflow.com/a/36841741/1497139
config.read_string(s_config)
items=config.items('ini')
itemDict={}
for key,value in items:
itemDict[key]=value
return itemDict

This is what I'm doing in my project: I just create another .py file called properties.py which includes all common variables/properties I used in the project, and in any file need to refer to these variables, put
from properties import *(or anything you need)
Used this method to keep svn peace when I was changing dev locations frequently and some common variables were quite relative to local environment. Works fine for me but not sure this method would be suggested for formal dev environment etc.

import json
f=open('test.json')
x=json.load(f)
f.close()
print(x)
Contents of test.json:
{"host": "127.0.0.1", "user": "jms"}

I have created a python module that is almost similar to the Properties class of Java ( Actually it is like the PropertyPlaceholderConfigurer in spring which lets you use ${variable-reference} to refer to already defined property )
EDIT : You may install this package by running the command(currently tested for python 3).
pip install property
The project is hosted on GitHub
Example : ( Detailed documentation can be found here )
Let's say you have the following properties defined in my_file.properties file
foo = I am awesome
bar = ${chocolate}-bar
chocolate = fudge
Code to load the above properties
from properties.p import Property
prop = Property()
# Simply load it into a dictionary
dic_prop = prop.load_property_files('my_file.properties')

Below 2 lines of code shows how to use Python List Comprehension to load 'java style' property file.
split_properties=[line.split("=") for line in open('/<path_to_property_file>)]
properties={key: value for key,value in split_properties }
Please have a look at below post for details
https://ilearnonlinesite.wordpress.com/2017/07/24/reading-property-file-in-python-using-comprehension-and-generators/

you can use parameter "fromfile_prefix_chars" with argparse to read from config file as below---
temp.py
parser = argparse.ArgumentParser(fromfile_prefix_chars='#')
parser.add_argument('--a')
parser.add_argument('--b')
args = parser.parse_args()
print(args.a)
print(args.b)
config file
--a
hello
--b
hello dear
Run command
python temp.py "#config"

You could use - https://pypi.org/project/property/
eg - my_file.properties
foo = I am awesome
bar = ${chocolate}-bar
chocolate = fudge
long = a very long property that is described in the property file which takes up \
multiple lines can be defined by the escape character as it is done here
url=example.com/api?auth_token=xyz
user_dir=${HOME}/test
unresolved = ${HOME}/files/${id}/${bar}/
fname_template = /opt/myapp/{arch}/ext/{objid}.dat
Code
from properties.p import Property
## set use_env to evaluate properties from shell / os environment
prop = Property(use_env = True)
dic_prop = prop.load_property_files('my_file.properties')
## Read multiple files
## dic_prop = prop.load_property_files('file1', 'file2')
print(dic_prop)
# Output
# {'foo': 'I am awesome', 'bar': 'fudge-bar', 'chocolate': 'fudge',
# 'long': 'a very long property that is described in the property file which takes up multiple lines
# can be defined by the escape character as it is done here', 'url': 'example.com/api?auth_token=xyz',
# 'user_dir': '/home/user/test',
# 'unresolved': '/home/user/files/${id}/fudge-bar/',
# 'fname_template': '/opt/myapp/{arch}/ext/{objid}.dat'}

I did this using ConfigParser as follows. The code assumes that there is a file called config.prop in the same directory where BaseTest is placed:
config.prop
[CredentialSection]
app.name=MyAppName
BaseTest.py:
import unittest
import ConfigParser
class BaseTest(unittest.TestCase):
def setUp(self):
__SECTION = 'CredentialSection'
config = ConfigParser.ConfigParser()
config.readfp(open('config.prop'))
self.__app_name = config.get(__SECTION, 'app.name')
def test1(self):
print self.__app_name % This should print: MyAppName

This is what i had written to parse file and set it as env variables which skips comments and non key value lines added switches to specify
hg:d
-h or --help print usage summary
-c Specify char that identifies comment
-s Separator between key and value in prop file
and specify properties file that needs to be parsed eg : python
EnvParamSet.py -c # -s = env.properties
import pipes
import sys , getopt
import os.path
class Parsing :
def __init__(self , seprator , commentChar , propFile):
self.seprator = seprator
self.commentChar = commentChar
self.propFile = propFile
def parseProp(self):
prop = open(self.propFile,'rU')
for line in prop :
if line.startswith(self.commentChar)==False and line.find(self.seprator) != -1 :
keyValue = line.split(self.seprator)
key = keyValue[0].strip()
value = keyValue[1].strip()
print("export %s=%s" % (str (key),pipes.quote(str(value))))
class EnvParamSet:
def main (argv):
seprator = '='
comment = '#'
if len(argv) is 0:
print "Please Specify properties file to be parsed "
sys.exit()
propFile=argv[-1]
try :
opts, args = getopt.getopt(argv, "hs:c:f:", ["help", "seprator=","comment=", "file="])
except getopt.GetoptError,e:
print str(e)
print " possible arguments -s <key value sperator > -c < comment char > <file> \n Try -h or --help "
sys.exit(2)
if os.path.isfile(args[0])==False:
print "File doesnt exist "
sys.exit()
for opt , arg in opts :
if opt in ("-h" , "--help"):
print " hg:d \n -h or --help print usage summary \n -c Specify char that idetifes comment \n -s Sperator between key and value in prop file \n specify file "
sys.exit()
elif opt in ("-s" , "--seprator"):
seprator = arg
elif opt in ("-c" , "--comment"):
comment = arg
p = Parsing( seprator, comment , propFile)
p.parseProp()
if __name__ == "__main__":
main(sys.argv[1:])

Lightbend has released the Typesafe Config library, which parses properties files and also some JSON-based extensions. Lightbend's library is only for the JVM, but it seems to be widely adopted and there are now ports in many languages, including Python: https://github.com/chimpler/pyhocon

You can use the following function, which is the modified code of #mvallebr. It respects the properties file comments, ignores empty new lines, and allows retrieving a single key value.
def getProperties(propertiesFile ="/home/memin/.config/customMemin/conf.properties", key=''):
"""
Reads a .properties file and returns the key value pairs as dictionary.
if key value is specified, then it will return its value alone.
"""
with open(propertiesFile) as f:
l = [line.strip().split("=") for line in f.readlines() if not line.startswith('#') and line.strip()]
d = {key.strip(): value.strip() for key, value in l}
if key:
return d[key]
else:
return d

this works for me.
from pyjavaproperties import Properties
p = Properties()
p.load(open('test.properties'))
p.list()
print p
print p.items()
print p['name3']

I followed configparser approach and it worked quite well for me. Created one PropertyReader file and used config parser there to ready property to corresponding to each section.
**Used Python 2.7
Content of PropertyReader.py file:
#!/usr/bin/python
import ConfigParser
class PropertyReader:
def readProperty(self, strSection, strKey):
config = ConfigParser.RawConfigParser()
config.read('ConfigFile.properties')
strValue = config.get(strSection,strKey);
print "Value captured for "+strKey+" :"+strValue
return strValue
Content of read schema file:
from PropertyReader import *
class ReadSchema:
print PropertyReader().readProperty('source1_section','source_name1')
print PropertyReader().readProperty('source2_section','sn2_sc1_tb')
Content of .properties file:
[source1_section]
source_name1:module1
sn1_schema:schema1,schema2,schema3
sn1_sc1_tb:employee,department,location
sn1_sc2_tb:student,college,country
[source2_section]
source_name1:module2
sn2_schema:schema4,schema5,schema6
sn2_sc1_tb:employee,department,location
sn2_sc2_tb:student,college,country

You can try the python-dotenv library. This library reads key-value pairs from a .env (so not exactly a .properties file though) file and can set them as environment variables.
Here's a sample usage from the official documentation:
from dotenv import load_dotenv
load_dotenv() # take environment variables from .env.
# Code of your application, which uses environment variables (e.g. from `os.environ` or
# `os.getenv`) as if they came from the actual environment.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Importance of getitem_ : HTML Parsing using Python - python

As described in special method names, the method name getitem() allows a class to override the foo[bar] syntax. Lists do this to provide both subscripting (e.g. foo[5]) and slicing (foo[1:5]). Dictionaries do this to provide key lookup.

Related

Python unit testing with PyTest, stuck on more advanced testing

Using python class with spark DataFrame to parse URL's

Retrieve XML parent and child attributes using Python and lxml

Python: Update XML-file using ElementTree while conserving layout as much as possible

Properties file in python (similar to Java Properties)

Categories

Resources