PyYAML: expected NodeEvent, but got DocumentEndEvent - Python

I'm trying to dump a custom object that is a kind of list of objects, so I overrode the to_yaml method of the YAMLObject class which I set my class to inherit from:
@classmethod
def to_yaml(cls, dumper, data):
    """ This method defines how to save this class to a yml
    file """
    passage_list = []
    for passage in data:
        passage_dict = {
            'satellite': passage.satellite.name,
            'ground_station': passage.ground_station.name,
            'aos': passage.aos,
            'los': passage.los,
            'tca': passage.tca,
        }
        passage_list.append(passage_dict)
    passage_list_dict = {
        'passages': passage_list
    }
    return dumper.represent(passage_list_dict)
When I call the yaml.dump method, the output file is created correctly with the correct data:
if save_to_file:
    with open(save_to_file, 'w') as f:
        yaml.dump(all_passages, f, default_flow_style=False)
but at the end of the execution I get an EmitterError: expected NodeEvent, but got DocumentEndEvent()
I believe it's related to not closing the YAML document correctly, because when I was debugging my code I was getting save_to_file files that were missing the newline at the end of the document. Could it be? Or is it something else?

Your code does not work because dumper.represent doesn't return anything. You want to use dumper.represent_data instead.
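For illustration, a minimal sketch of the corrected to_yaml (same fields as in the question); the only change is the final call:

@classmethod
def to_yaml(cls, dumper, data):
    """ This method defines how to save this class to a yml
    file """
    passage_list = []
    for passage in data:
        passage_list.append({
            'satellite': passage.satellite.name,
            'ground_station': passage.ground_station.name,
            'aos': passage.aos,
            'los': passage.los,
            'tca': passage.tca,
        })
    # represent_data builds and returns the node the emitter expects;
    # dumper.represent() emits directly and returns None.
    return dumper.represent_data({'passages': passage_list})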

Related

PyYAML: Custom/default constructor for nodes without tags

I have a YAML file for storing constants. Some of the entries have custom tags such as !HANDLER or !EXPR and these are easily handled by adding constructors to the YAML loader.
However, I also want to have a custom constructor for non-tagged nodes, the reason being that I want to add these non-tagged values to a dictionary for use elsewhere. These values need to be available before parsing finishes, so I can't just let parsing finish and then update the dictionary.
So with a YAML file like
sample_rate: 16000
input_file: !HANDLER
  handler_fn: file_loader
  handler_input: path/to/file
  mode: w
I have a handler constructor
def file_handler_loader(loader, node):
    params = loader.construct_mapping(node)
    handler_fn = params.pop('handler_fn')  # pop once; popping the same key twice would raise KeyError
    module = __import__('handlers.file_handlers', fromlist=[handler_fn])
    func = getattr(module, handler_fn)
    handler_input = params.pop('handler_input')
    return func(handler_input, **params)
And a function initialize_constants
def _get_loader():
    loader = FullLoader
    loader.add_constructor('!HANDLER', file_handler_loader)
    loader.add_constructor('!EXPR', expression_loader)
    return loader

def initialize_constants(path_to_yaml: str) -> None:
    try:
        with open(path_to_yaml, 'r') as yaml_file:
            constants = yaml.load(yaml_file, Loader=_get_loader())
    except FileNotFoundError as ex:
        LOGGER.error(ex)
        exit(-1)
The goal is then to have a constructor for non-tagged entries in the YAML. I haven't been able to figure out, though, how to add one. Ideally, the code would look like this:
def default_constructor(loader, node):
    param = loader.construct_scalar(node)
    constants[node_name] = param
I've also attempted to add a resolver to solve the problem. The code below was tested but didn't work as expected.
loader.add_constructor('!DEF', default_constructor)
loader.add_implicit_resolver('!DEF', re.compile('.*'), first=None)

def default_constructor(loader, node):
    # do stuff
In this case, what happened was that the node contained the value sample_rate and not 16000 as expected.
Thanks in advance :)
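A note on the observed behavior: an implicit resolver registered with the pattern .* applies to every untagged plain scalar, including mapping keys, so the !DEF constructor fires for the key node sample_rate before the value is ever reached (and 16000 still resolves to the built-in int tag, which is checked first). A minimal sketch demonstrating this, assuming FullLoader:

import re
import yaml

def default_constructor(loader, node):
    print('!DEF fired for:', node.value)
    return loader.construct_scalar(node)

# Registering on FullLoader directly affects every later load;
# subclassing FullLoader would keep the change contained.
yaml.FullLoader.add_constructor('!DEF', default_constructor)
yaml.FullLoader.add_implicit_resolver('!DEF', re.compile('.*'), first=None)

yaml.load('sample_rate: 16000', Loader=yaml.FullLoader)
# prints: !DEF fired for: sample_rate
# (16000 matches the built-in int resolver and never reaches !DEF)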

Having trouble mocking a request in a class method using patch context manager in Python

I am trying to unit test a class method that makes an API request and loads the returned JSON into a variable as a dictionary. The request returns the JSON as expected. Another test that I would like to implement is that it is accessing the correct link. I am using patch as a context manager from the mock module.
In summary, my app snowReport.py accesses a weather API that returns weather JSON, and then it accesses that JSON to determine if there will be snow in the forecast. The class is called Resort because it is meant specifically for ski resorts.
In my module, this is my __init__ function.
class Resort():
    # args lets the user pass in "96hr", "realtime", and/or "360min"
    def __init__(self, resortKey, *args):
        # Check if you are in the current directory; if not, set it to the current directory
        currentDir = os.getcwd()
        if currentDir != D_NAME:
            os.chdir(D_NAME)
        else:
            pass
        # Checks if the user enters arguments to initiate json files or not
        self.dataJSON = SKI_RESORT_JSON
        self.args = args
        if not args:
            raise Exception("Invalid arg passed. Function arguments must contain one of '360min' and/or '96hr' and/or 'realtime'")  # Note there is a hidden arg called "test" that does not access the api, for mock testing
        # Opens json file to get location parameters
        with open(SKI_RESORT_JSON, "r") as f:
            resortDictList = json.load(f)
        resortDict = resortDictList[resortKey]
        self.name = resortDict["name"]
        self.lon = resortDict["lon"]
        self.lat = resortDict["lat"]
        self.country = resortDict["country"]
        self.weatherJsonRealTime = {}
        self.weatherJson360Min = {}
        self.weatherJson96hr = {}
The purpose of this __init__ function is to initialize variables and access a .json file from which it pulls the location data.
The function I am trying to test is a class method that is the following:
def requestRealtime(self):
    querystring = {
        "lat": str(self.lat),
        "lon": str(self.lon),
        "unit_system": "si",
        "fields": "precipitation,precipitation_type,temp,feels_like,wind_speed,wind_direction,sunrise,sunset,visibility,cloud_cover,cloud_base,weather_code",
        "apikey": CLIMACELL_KEY,
    }
    response = requests.request("GET", URL_REALTIME, params=querystring)
    if response.ok:
        self.weatherJsonRealTime = json.loads(response.text)
    else:
        return "Bad response"
    self.nowTime = localTime(self.weatherJsonRealTime["observation_time"]["value"])
    self.nowTemp = self.weatherJsonRealTime["temp"]["value"]
    self.nowFeelsLike = self.weatherJsonRealTime["feels_like"]["value"]
    self.nowPrecipitation = self.weatherJsonRealTime["precipitation"]["value"]
    self.nowPrecipitationType = self.weatherJsonRealTime["precipitation_type"]["value"]
    self.nowWindSpeed = self.weatherJsonRealTime["wind_speed"]["value"]
    self.nowWindDirection = self.weatherJsonRealTime["wind_direction"]["value"]
    self.nowCloudCover = self.weatherJsonRealTime["cloud_cover"]["value"]
    return self.weatherJsonRealTime
I am trying to test the request; more specifically, I am trying to test what the request returns. To do so, I am using the patch context manager to mock requests.request and set the return value to a static test json file. My test code is as follows:
with open(".\\Resources\\test_realtimeJson.json", "r") as f:
testrealtimeDict = json.load(f)
#classmethod
def setUpClass(cls):
cls.testResort = snowReport.Resort("test_Location (Banff)", "96hr", "realtime", "360min")
os.chdir(D_NAME) # Set the directory back to D_NAME because that the snowReport.Resort class changes it's class
def test_requestRealtime(self):
with patch("snowApp.snowReport.requests.request") as mocked_request:
mocked_request.return_value.ok = True
mocked_request.return_value.request = testrealtimeDict
self.testResort.requestRealtime()
Assuming the mock works, I would then like to use the assertEqual function to check that the attributes created in requestRealtime have the expected values based on the dictionary that I pass in.
When I run the test script, it throws me the error:
File "c:\users\steve\appdata\local\programs\python\python39\lib\json\__init__.py", line 339, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not MagicMock
Since I am mocking the request and setting the return value to the testrealtimeDict that I have created, shouldn't self.weatherJsonRealtime = testrealtimeDict? Why is it throwing the type error? Also - is the unit test that I am planning appropriate for this application or is there a better or easier way to complete this?
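For what it's worth, a sketch of where the MagicMock comes from (assuming the method parses response.text, as the traceback suggests): the test sets mocked_request.return_value.request, but requestRealtime never reads .request; it reads .text, which is an auto-created MagicMock attribute, and that MagicMock is what reaches json.loads. Setting .text to a real JSON string makes the parse succeed:

import json
from unittest.mock import patch

def test_requestRealtime(self):
    with patch("snowApp.snowReport.requests.request") as mocked_request:
        mocked_request.return_value.ok = True
        # .text must be the raw JSON string the code under test parses
        mocked_request.return_value.text = json.dumps(testrealtimeDict)
        result = self.testResort.requestRealtime()
        # The method returns self.weatherJsonRealTime, so a round-trip
        # through json.dumps/json.loads should give back the same dict
        self.assertEqual(result, testrealtimeDict)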

BDD behave Python need to create a World map to hold values

I'm not too familiar with Python, but I have set up a BDD framework using Python behave. I now want to create a World map class that holds data and is retrievable throughout all scenarios.
For instance, I will have a World class where I can use:
World w
w.key.add('key', api.response)
In one scenario and in another I can then use:
World w
key = w.key.get('key')
Edit:
Or, if there is a built-in way of using context or similar in behave where the attributes are saved and retrievable throughout all scenarios, that would be good.
Like lettuce where you can use world http://lettuce.it/tutorial/simple.html
I've tried this between scenarios but it doesn't seem to be picking it up
class World(dict):
    def __setitem__(self, key, item):
        self.__dict__[key] = item
        print(item)

    def __getitem__(self, key):
        return self.__dict__[key]
Setting the item in one step in scenario A: w.__setitem__('key', response)
Getting the item in another step in scenario B: w.__getitem__('key')
This shows me an error though:
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python\lib\site-packages\behave\model.py", line 1456, in run
    match.run(runner.context)
  File "C:\Program Files (x86)\Python\lib\site-packages\behave\model.py", line 1903, in run
    self.func(context, *args, **kwargs)
  File "steps\get_account.py", line 14, in step_impl
    print(w.__getitem__('appToken'))
  File "C:Project\steps\world.py", line 8, in __getitem__
    return self.__dict__[key]
KeyError: 'key'
It appears that the World does not hold values between the steps that are run.
Edit:
I'm unsure how to use environment.py, but I can see it has a way of running code before the steps. How can I make my call to a SOAP client within environment.py and then pass the result to a particular step?
Edit:
I have made the request in environment.py and hardcoded the values; how can I pass variables to environment.py and back?
It's called "context" in the python-behave jargon. The first argument of your step definition function is an instance of the behave.runner.Context class, in which you can store your world instance. Please see the appropriate part of the tutorial.
Have you tried the simple approach of using a global var? For instance:
def before_all(context):
    global response
    response = api.response

def before_scenario(context, scenario):
    global response
    w.key.add('key', response)
I guess the feature can be accessed from the context, for instance:
def before_feature(context, feature):
    feature.response = api.response

def before_scenario(context, scenario):
    w.key.add('key', context.feature.response)
You are looking for:
Class variable: A variable that is shared by all instances of a class.
Your code in the question uses a class instance variable.
Read about: python_classes_objects
For instance:
class World(dict):
    __class_var = {}

    def __setitem__(self, key, item):
        World.__class_var[key] = item

    def __getitem__(self, key):
        return World.__class_var[key]

# Scenario A
A = World()
A['key'] = 'test'
print('A[\'key\']=%s' % A['key'])
del A

# Scenario B
B = World()
print('B[\'key\']=%s' % B['key'])
Output:
A['key']=test
B['key']=test
Tested with Python:3.4.2
Come back and Flag your Question as answered if this is working for you or comment why not.
Defining a global var in the before_all hook, as mentioned by @stovfl, did not work for me, but defining a global var within one of my steps did work. Instead, as Szabo Peter mentioned, use the context:
context.your_variable_name = api.response
and just use context.your_variable_name anywhere the value is to be used.
For this I actually used a config file [config.py]. I then added the variables in there and retrieved them using getattr. See below:
WSDL_URL = 'wsdl'
USERNAME = 'name'
PASSWORD = 'PWD'
Then retrieve them like:

import config
getattr(config, 'USERNAME', 'username not found')

Using a Python class with a Spark DataFrame to parse URLs

I'm trying to process URLs in a PySpark dataframe using a class that I've written and a udf. I'm aware of urllib and other URL-parsing libraries, but for this case I need to use my own code.
To get the TLD of a URL, I cross-check it against the IANA public suffix list.
Here's a simplification of my code:
class Parser:
    # list of available public suffixes for extracting top level domains
    file = open("public_suffix_list.txt", 'r')
    data = []
    for line in file:
        if line.startswith("//") or line == '\n':
            pass
        else:
            data.append(line.strip('\n'))

    def __init__(self, url):
        self.url = url
        # the code here extracts port, protocol, query etc.
        # I think this bit below is causing the error
        matches = [r for r in self.data if r in self.hostname]
        # extra functionality in my actual class
        i = matches.index(self.string)
        try:
            self.tld = matches[i]
        except IndexError:
            # logic to find tld if no match
            pass
The class works in pure Python, so for example I can run:

import Parser
x = Parser("www.google.com")
x.tld  # returns ".com"
However, when I try to do:

import Parser
from pyspark.sql.functions import udf

parse = udf(lambda x: Parser(x).url)
df = sqlContext.table("tablename").select(parse("column"))
When I call an action I get
File "<stdin>", line 3, in <lambda>
File "<stdin>", line 27, in __init__
TypeError: 'in <string>' requires string as left operand
So my guess is that it's failing to interpret the data as a list of strings?
I've also tried to use
file = sc.textFile("my_file.txt")\
    .filter(lambda x: not x.startswith("//") and x != "")\
    .collect()
data = sc.broadcast(file)
to open my file instead, but that causes
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Any ideas?
Thanks in advance
EDIT: Apologies, I didn't have my code to hand, so my test code didn't explain the problems I was having very well. The error I initially reported was a result of the test data I was using.
I've updated my question to be more reflective of the challenge I'm facing.
Why do you need a class in this case? (The code defining your class is incorrect: you never declared self.data before using it in the __init__ method.) The only relevant line that affects the output you want is self.string = string, so you are basically passing the identity function as the udf.
The UnicodeDecodeError is due to an encoding issue in your file, it has nothing to do with your definition of the class.
The second error is in the line sc.broadcast(file) , details of which can be found here : Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion
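For reference, a sketch of the usual pattern for that second error (the suffix lookup here is illustrative, not the full Parser logic): do the file I/O with plain Python on the driver, broadcast the list once, and reference only the broadcast's .value inside the udf, never sc itself.

from pyspark.sql.functions import udf

# Driver side: plain file I/O, no SparkContext involved
with open("public_suffix_list.txt") as f:
    suffixes = [line.strip('\n') for line in f
                if not line.startswith("//") and line != '\n']

bc = sc.broadcast(suffixes)

# Worker side: the closure captures bc, and only bc.value is touched
find_tld = udf(lambda url: next((s for s in bc.value if s in url), None))
df = sqlContext.table("tablename").select(find_tld("column"))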
EDIT 1
I would redefine your class structure as follows. You basically need to create the attribute self.data by assigning self.data = data before you can use it. Also, anything you write in the class body outside of a method is executed as soon as the class is defined, irrespective of whether you ever instantiate the class, so moving the file-parsing part out of the class will not have any effect on when it runs.
# list of available public suffixes for extracting top level domains
file = open("public_suffix_list.txt", 'r')
data = []
for line in file:
    if line.startswith("//") or line == '\n':
        pass
    else:
        data.append(line.strip('\n'))

class Parser:
    def __init__(self, url):
        self.url = url
        self.data = data
        # the code here extracts port, protocol, query etc.
        matches = [r for r in self.data if r in self.hostname]
        # extra functionality in my actual class
        i = matches.index(self.string)
        try:
            self.tld = matches[i]
        except IndexError:
            # logic to find tld if no match
            pass

Use decorators to retrieve JSON data if the file exists, otherwise run the method and then store the output as JSON?

I've read a little bit about decorators without my puny brain understanding them fully, but I believe this is one of the cases where they would be of use.
I have a main method running some other methods:
def run_pipeline():
    gene_sequence_fasta_files_made = create_gene_sequence_fasta_files()
    # ...several other methods taking one input argument and having one output argument.
Since each method takes a long time to run, I'd like to store the result in a json object for each method. If the json file exists I load it, otherwise I run the method and store the result. My current solution looks like this:
def run_pipeline():
    gene_sequence_fasta_files_made = _load_or_make(create_gene_sequence_fasta_files, "/home/myfolder/ff.json", method_input=None)
    ...
Problem is, I find this really ugly and hard to read. If it is possible, how would I use decorators to solve this problem?
Ps. sorry for not showing my attempts. I haven't tried anything since I'm working against a deadline for a client and do not have the time (I could deliver the code above; I just find it aesthetically displeasing).
Psps. definition of _load_or_make() appended:
def _load_or_make(method, filename, method_input=None):
    try:
        with open(filename, 'r') as input_handle:
            data = json.load(input_handle)
    except IOError:
        if method_input == None:
            data = method()
        else:
            data = method(method_input)
        with open(filename, 'w+') as output_handle:
            json.dump(data, output_handle)
    return data
Here's a decorator that tries loading json from the given filename, and if it can't find the file or the json load fails, it runs the original function, writes the result as json to disk, and returns.
def load_or_make(filename):
    def decorator(func):
        def wraps(*args, **kwargs):
            try:
                with open(filename, 'r') as f:
                    return json.load(f)
            except Exception:
                data = func(*args, **kwargs)
                with open(filename, 'w') as out:
                    json.dump(data, out)
                return data
        return wraps
    return decorator

@load_or_make(filename)
def your_method_with_arg(arg):
    # do stuff
    return data

@load_or_make(other_filename)
def your_method():
    # do stuff
    return data
Note that there is an issue with this approach: if the decorated method returns different values depending on the arguments passed to it, the cache won't behave properly. It looks like that isn't a requirement for you, but if it is, you'd need to pick a different filename depending on the arguments passed in (or use pickle-based serialization, and just pickle a dict of args -> results). Here's an example of how to do it using a pickle approach, (very) loosely based on the memoized decorator Christian P. linked to:
import pickle

def load_or_make(filename):
    def decorator(func):
        def wrapped(*args, **kwargs):
            # Make a key for the arguments. Use a sorted tuple of the
            # kwargs items so the key has a chance of being hashable.
            key = (args, tuple(sorted(kwargs.items())))
            try:
                hash(key)
            except TypeError:
                # Don't try to use cache if there's an
                # unhashable argument.
                return func(*args, **kwargs)
            try:
                with open(filename, "rb") as f:
                    cache = pickle.load(f)
            except Exception:
                cache = {}
            if key in cache:
                return cache[key]
            else:
                value = func(*args, **kwargs)
                cache[key] = value
                with open(filename, "wb") as f:
                    pickle.dump(cache, f)
                return value
        return wrapped
    return decorator
Here, instead of saving the result as json, we pickle the result as a value in a dict, where the key is the arguments provided to the function. Note that you would still need to use a different filename for every function you decorate to ensure you never got incorrect results from the cache.
Do you want to save the results to disk, or is in-memory okay? If in-memory is fine, you can use the memoize decorator / pattern found here: https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize
For each set of unique input arguments, it saves the result from the function in memory. If the function is then called again with the same arguments, it returns the result from memory rather than trying to run the function again.
It can also be altered to allow for a timeout (depending on how long your program runs for) so that if called after a certain time, it should re-run and re-cache the results.
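A minimal in-memory sketch of that idea (the timeout handling is an illustration, not the linked recipe):

import time
import functools

def memoize(ttl_seconds=None):
    def decorator(func):
        cache = {}  # maps the args tuple to (timestamp, result)

        @functools.wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            if args in cache:
                stamp, result = cache[args]
                # Reuse the stored result unless it has expired
                if ttl_seconds is None or now - stamp < ttl_seconds:
                    return result
            result = func(*args)
            cache[args] = (now, result)
            return result
        return wrapper
    return decorator

@memoize(ttl_seconds=3600)
def create_gene_sequence_fasta_files():
    ...  # the slow work from the question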
A decorator is simply a callable that takes a function (or a class) as an argument, does something with/to it, and returns something (usually the function in a wrapper, or the class modified or registered). Since flat is better than nested, I like to use classes if the decorator is at all complex:
class GetData(object):
    def __init__(self, filename):
        # this is called on the @decorator line
        self.filename = filename

    def __call__(self, func):
        # this is called by Python with the completed def
        def wrapper(*args, **kwds):
            try:
                with open(self.filename) as stored:
                    data = json.load(stored)
            except IOError:
                data = func(*args, **kwds)
                with open(self.filename, 'w+') as stored:
                    json.dump(data, stored)
            return data
        return wrapper
and in use:
@GetData('/path/to/some/file')
def create_gene_sequence_fasta_files(this, that, these='those'):
    pass

@GetData('/path/to/some/other/file')
def create_gene_sequence_fastb_files():
    pass
I am no expert in Python decorators; I just learned them from a tutorial, but I think this can help you, although you may not gain more readability from it. A decorator is a way to give your different functions a similar solution for dealing with things, without making your code a mess or losing readability. It feels transparent to the rest of your code.
def _load_or_make(filename):
    def _deco(method):
        def __deco(method_input=None):
            try:
                with open(filename, 'r') as input_handle:
                    data = json.load(input_handle)
                    return data
            except IOError:
                if method_input == None:
                    data = method()
                else:
                    data = method(method_input)
                with open(filename, 'w+') as output_handle:
                    json.dump(data, output_handle)
                return data
        return __deco
    return _deco

@_load_or_make(filename)
def method(arg):
    # things need to be done
    return data
