Import dictionary in loop in Python

My question is related to this one. I have a config_file consisting of dictionaries as shown below:
config_1 = {
    'folder': 'raw1',
    'start_date': '2019-07-01'
}
config_2 = {
    'folder': 'raw2',
    'start_date': '2019-08-01'
}
config_3 = {
    'folder': 'raw3',
    'start_date': '2019-09-01'
}
I then have a separate python file that imports each config and does some stuff:
from config_file import config_1 as cfg1
# do some stuff using 'folder' and 'start_date'

from config_file import config_2 as cfg2
# do some stuff using 'folder' and 'start_date'

from config_file import config_3 as cfg3
# do some stuff using 'folder' and 'start_date'
I would like to put this in a loop rather than have it listed 3 times in the python file. How can I do that?

If I understand your question correctly, you can just use importlib. In a nutshell, what in plain Python you write as:
from package import module as alias_mod
in importlib it becomes:
alias_mod = importlib.import_module('package.module')
or, equivalently, using the relative import form:
alias_mod = importlib.import_module('.module', 'package')
for example:
from numpy import random as rm
in importlib:
rm = importlib.import_module('numpy.random')
Another interesting option is the code proposed in this post, which allows you to import not only modules and packages but also functions and other attributes directly:
def import_from(module, name):
    module = __import__(module, fromlist=[name])
    return getattr(module, name)
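For example, applied to the config_file module from the question, a small usage sketch could look like this:
config_1 = import_from('config_file', 'config_1')
print(config_1['folder'])  # 'raw1'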
For your specific case, this code should work:
import importlib

# config_1, config_2, ... are attributes of the config_file module, not submodules,
# so import the module once and look each dictionary up by name
config_file = importlib.import_module('config_file')
n_conf = 3
for i in range(1, n_conf + 1):
    conf = getattr(config_file, 'config_' + str(i))
    # do something with conf
However, if I can offer some advice, I think the best option for you is to build a JSON configuration file and read that file instead of importing modules. It's much more convenient. For example, in your case you can create a config.json file like this:
{
    "config_1": {
        "folder": "raw1",
        "start_date": "2019-07-01"
    },
    "config_2": {
        "folder": "raw2",
        "start_date": "2019-08-01"
    },
    "config_3": {
        "folder": "raw3",
        "start_date": "2019-09-01"
    }
}
Read the json file as follows:
import json

with open('config.json') as json_data_file:
    conf = json.load(json_data_file)
Now you have in memory a simple python dictionary with the configuration settings that interest you:
conf['config_1']
# output: {'folder': 'raw1', 'start_date': '2019-07-01'}
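And if you still want to process every configuration in a loop, a minimal sketch over the loaded dictionary could be:
for name, cfg in conf.items():
    # e.g. name == 'config_1' and cfg == {'folder': 'raw1', 'start_date': '2019-07-01'}
    print(name, cfg['folder'], cfg['start_date'])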

You can use the inspect module to get all the config dictionaries from config_file, like the following:
import config_file
import inspect

configs = [member[1] for member in inspect.getmembers(config_file) if 'config_' in member[0]]
configs
And then you can iterate over all the configs. Is this the behavior you wanted?
You can read more about inspect in the Python documentation.
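For instance, a minimal sketch of that iteration (each member is one of the config_N dictionaries from the question):
for cfg in configs:
    # each cfg is a plain dict such as {'folder': 'raw1', 'start_date': '2019-07-01'}
    print(cfg['folder'], cfg['start_date'])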

Based on @MikeMajara's comment, the following solution worked for me:
package = 'config_file'
configs = ['config_1', 'config_2', 'config_3']

for i in configs:
    cfg = getattr(__import__(package, fromlist=[i]), i)
    # do some stuff using cfg['folder'] and cfg['start_date']

Related

Creating local modules using cdktf

I am trying to create a Terraform module in CDKTF following the documentation at https://developer.hashicorp.com/terraform/cdktf/concepts/modules
I created the module as shown in the code:
#!/usr/bin/env python
from constructs import Construct
from cdktf import App, TerraformStack, TerraformOutput

from imports.aws.provider import AwsProvider
from imports.aws.s3_bucket import S3Bucket


class MyStack(TerraformStack):
    def __init__(self, scope: Construct, id: str):
        super().__init__(scope, id)

        AwsProvider(self, "AWS", region="eu-west-1", profile="<MY-PROFILE>")

        bucket_name = '<MY-BUCKET-NAME>'

        # define resources here
        s3_bucket = S3Bucket(
            self, 'testBucket',
            bucket=bucket_name
        )

        TerraformOutput(self, "my_output", value=s3_bucket.arn)


app = App()
MyStack(app, "s3_module")
app.synth()
Then, I created another folder for the main stack and added the module in the 'cdktf.json' file
{
  "language": "python",
  "app": "pipenv run python main.py",
  "projectId": "e2b44a02-65b2-42de-ab52-3863d211c94c",
  "sendCrashReports": "true",
  "terraformProviders": [
    "hashicorp/aws@~>4.0"
  ],
  "terraformModules": [{
    "name": "s3_module",
    "source": "../s3_module"
  }],
  "codeMakerOutput": "imports",
  "context": {
    "excludeStackIdFromLogicalIds": "true",
    "allowSepCharsInLogicalIds": "true"
  }
}
I also called the module in main.py:
#!/usr/bin/env python
from constructs import Construct
from cdktf import App, TerraformStack

from imports.s3_module import S3Module
from imports.aws.provider import AwsProvider


class MyStack(TerraformStack):
    def __init__(self, scope: Construct, id: str):
        super().__init__(scope, id)

        AwsProvider(self, "AWS", region="eu-west-1", profile="<MY-PROFILE>")

        # define resources here
        my_module = S3Module(self, 's3_module')


app = App()
MyStack(app, "service_v2")
app.synth()
However, when I run "cdktf deploy", Terraform does not add any resources.
Can you help me understand where the error is? Thank you
My understanding is that the terraformModules field in the cdktf.json file is for communicating which Terraform modules you want to use, i.e. directories that contain .tf files.
This is what allows the cdktf get command to know which Python bindings you need so that it can generate them into your imports/ directory, enabling the import statements to work.
If instead you would like to factor out parts of your infrastructure configuration into separate Python modules, you could adopt a strategy where you subclass from Construct, as follows:
Example module: s3_bucket.py:
import constructs
import cdktf

from imports.aws.provider import AwsProvider
from imports.aws.s3_bucket import S3Bucket


class S3BucketInfra(constructs.Construct):
    def __init__(self, scope: constructs.Construct,
                 construct_id: str,
                 bucket_name: str,
                 provider: AwsProvider):
        super().__init__(scope=scope, id=construct_id)

        self.bucket = S3Bucket(self,
                               'bucket',
                               bucket=bucket_name,
                               force_destroy=False,
                               provider=provider)
Example main module: main.py:
import constructs
import cdktf

from imports.aws.provider import AwsProvider
from s3_bucket import S3BucketInfra


class SomeExampleStack(cdktf.TerraformStack):
    def __init__(self, scope: constructs.Construct, construct_id: str):
        super().__init__(scope=scope, id=construct_id)

        self.some_bucket = S3BucketInfra(scope=self,
                                         construct_id='some-bucket',
                                         bucket_name='testBucket',
                                         provider=AwsProvider(self, "AWS", region="eu-west-1", profile="<MY-PROFILE>"))


app = cdktf.App()
SomeExampleStack(scope=app, construct_id="service_v2")
app.synth()
I am pretty new to CDKTF myself and the documentation is a little thin, but this is what worked for me :)

create utils.py in AWS lambda

I had a hello() function in my home/file.py file. I created a home/common/utils.py file and moved the function there.
Now, I want to import it in file.py.
I imported it like this: from utils import hello, and also as from common.utils import hello, and the import in my file doesn't throw an error. However, when I run it on AWS Lambda, I get this error:
Runtime.ImportModuleError: Unable to import module 'file': No module named 'utils'
How can I fix this, without having to use EC2 or something...
data "archive_file" "file_zip" {
type = "zip"
source_file = "${path.module}/src/file.py"
output_file_mode = "0666"
output_path = "${path.module}/bin/file.zip"
}
The deployment package that you're uploading only contains your main Python script (file.py). Specifically, it does not include any dependencies such as common/utils.py. That's why the import fails when the code runs in Lambda.
Modify the creation of your deployment package (file.zip) so that it includes all needed dependencies.
For example:
data "archive_file" "file_zip" {
type = "zip"
output_file_mode = "0666"
output_path = "${path.module}/bin/file.zip"
source {
content = file("${path.module}/src/file.py")
filename = "file.py"
}
source {
content = file("${path.module}/src/common/utils.py")
filename = "common/utils.py"
}
}
If all of your files happen to be in a single folder then you can use source_dir instead of indicating the individual files.
Note: I don't use Terraform so the file(...) with embedded interpolation may not be 100% correct, but you get the idea.
First of all, follow the standard documentation at https://docs.aws.amazon.com/lambda/latest/dg/python-package.html (refer to the section titled "Deployment package with dependencies").
Now, if you notice, at the end of that section the function code is added with:
zip -g my-deployment-package.zip lambda_function.py
Follow the same command for your utils file:
zip -g my-deployment-package.zip common/
zip -g my-deployment-package.zip common/utils.py
Ensure that, in lambda_function, you are using the proper import statement, like:
from common.utils import util_function_name
Now, you can upload this zip and test for yourself. It should run.
Hope this helps.

Export all Airflow variables

I have a problem with downloading all Airflow variables from code.
There is an option to export them from the UI, but I haven't found any way to do it programmatically.
I have only discovered the Variable.get('variable_name') method, which returns one Airflow variable.
There seems to be no variant for getting the list of all Airflow variables.
Searching the source code didn't help either.
Do you know of an easy way?
Thank you in advance.
You can use the Airflow CLI to export variables to a file and then read it from your Python code.
airflow variables --export FILEPATH
Programmatically you can use the BashOperator to achieve this.
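For instance, a minimal sketch of such a task, assuming the legacy airflow variables --export CLI syntax used elsewhere in this thread and a hypothetical /tmp/variables.json target path:
from airflow.operators.bash_operator import BashOperator

# defined inside a DAG context
export_vars = BashOperator(
    task_id='export_airflow_variables',
    bash_command='airflow variables --export /tmp/variables.json',
)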
I like the answer above about using the Airflow CLI, but it is also possible to extract all variables from a purely Python point of view (so there's no need for weird tricks to get them).
Use this code snippet:
from airflow.utils.db import create_session
from airflow.models import Variable

# a db.Session object is used to run queries against;
# the create_session() method will create (yield) a session
with create_session() as session:
    # By calling .query() with Variable, we are asking the airflow db
    # session to return all variables (select * from variables).
    # The result of this is an iterable item similar to a dict but with a
    # slightly different signature (object.key, object.val).
    airflow_vars = {var.key: var.val for var in session.query(Variable)}
The above method will query the Airflow sql database and return all variables.
Using a simple dictionary comprehension will allow you to remap the return values to a 'normal' dictionary.
The db.session.query will raise a sqlalchemy.exc.OperationalError if it is unable to connect to a running Airflow db instance.
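If some of your variables hold JSON, a small follow-up sketch (purely illustrative) could deserialize those string values:
import json

parsed_vars = {}
for key, raw in airflow_vars.items():
    try:
        # values stored as JSON come back from the metadata db as strings
        parsed_vars[key] = json.loads(raw)
    except (TypeError, ValueError):
        # plain string values are kept as-is
        parsed_vars[key] = raw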
If you (for whatever reason) wish to mock create_session as part of a unittest, this snippet can be used:
from unittest import TestCase
from unittest.mock import patch, MagicMock
import contextlib
import json

mock_data = {
    "foo": {
        "bar": "baz"
    }
}
airflow_vars = ...  # reference to an output (dict) of aforementioned method


class TestAirflowVariables(TestCase):
    @contextlib.contextmanager
    def create_session(self):
        """Helper that mocks airflow.settings.Session().query() result signature
        This is achieved by yielding a mocked airflow.settings.Session() object
        """
        session = MagicMock()
        session.query.return_value = [
            # for the purpose of this test mock_data is converted to json where
            # dicts are encountered.
            # You will have to modify the above method to parse data from airflow
            # correctly (it will send json objects, not dicts)
            MagicMock(key=k, val=json.dumps(v) if isinstance(v, dict) else v)
            for k, v in mock_data.items()
        ]
        yield session

    @patch("airflow.utils.db")
    def test_data_is_correctly_parsed(self, db):
        db.create_session = self.create_session
        self.assertDictEqual(airflow_vars, mock_data)
Note: you will have to change the patch to however you are importing the create_session method in the file you are referencing. I only got it to work by importing up until airflow.utils.db and calling db.create_session in the aforementioned method.
Hope this helps!
Good luck :)
Taking into account all the suggestions listed above, here is a code snippet that can be used to export all Airflow variables and store them in GCS:
import datetime
import pendulum
import os

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

local_tz = pendulum.timezone("Europe/Paris")

default_dag_args = {
    'start_date': datetime.datetime(2020, 6, 18, tzinfo=local_tz),
    'email_on_failure': False,
    'email_on_retry': False
}

with DAG(dag_id='your_dag_id',
         schedule_interval='00 3 * * *',
         default_args=default_dag_args,
         catchup=False,
         user_defined_macros={
             'env': os.environ
         }) as dag:

    start = DummyOperator(
        task_id='start',
    )

    export_task = BashOperator(
        task_id='export_var_task',
        bash_command='airflow variables --export variables.json; gsutil cp variables.json your_cloud_storage_path',
    )

    start >> export_task
I had issues using the BashOperator for this use case, so I copied the result of the bash command to a variable and used it inside my program.
import subprocess
output = (subprocess.check_output("airflow variables", shell=True)).decode('utf-8').split('pid=')[1].split()[1:-1]
print(output)

how to write my AWS lambda function?

I am new to AWS Lambda and I am trying to add my existing code to it. My existing code looks like:
import boto3
import slack
import slack.chat
import time
import itertools
from slacker import Slacker

ACCESS_KEY = ""
SECRET_KEY = ""

slack.api_token = ""
slack_channel = "#my_test_channel"

def gather_info_ansible():
    .
    .

def call_snapshot_creater(data):
    .
    .

def call_snapshot_destroyer(data):
    .
    .

if __name__ == '__main__':
    print "Calling Ansible Box Gather detail Method first!"
    ansible_box_info = gather_info_ansible()
    print "Now Calling the Destroyer of SNAPSHOT!! BEHOLD THIS IS HELL!!"
    call_snapshot_destroyer(ansible_box_info)
    #mapping = {i[0]: [i[1], i[2]] for i in data}
    print "Now Calling the Snapshot Creater!"
    call_snapshot_creater(ansible_box_info)
Now I try to create a Lambda function from scratch on the AWS Console as follows (a hello world):
from __future__ import print_function
import json

print('Loading function')

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    print("value1 = " + event['key1'])
    print("value2 = " + event['key2'])
    print("value3 = " + event['key3'])
    print("test")
    return event['key1']  # Echo back the first key value
    #raise Exception('Something went wrong')
and the sample test event on the AWS console is:
{
    "key3": "value3",
    "key2": "value2",
    "key1": "value1"
}
I am really not sure how to put my code in AWS Lambda, because even if I add the modules in the Lambda console and run it, it throws this error:
Unable to import module 'lambda_function': No module named slack
How to solve this and import my code in lambda?
You have to make a zipped package consisting of the Python script containing your lambda function and all the modules that you are importing in that script. Upload the zipped package to AWS.
Whatever module you want to import, you have to include that module in the zip package. Only then the import statements will work.
For example your zip package should consist of
test_package.zip
|-test.py (script containing the lambda_handler function)
|-boto3 (module folder)
|-slack (module folder)
|-slacker (module folder)
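As a side note, Lambda will not run the if __name__ == '__main__': block from your script; a hypothetical handler wrapping the same calls from the question (a sketch, not a definitive adaptation) could look like:
def lambda_handler(event, context):
    # reuse the existing workflow functions from the question
    ansible_box_info = gather_info_ansible()
    call_snapshot_destroyer(ansible_box_info)
    call_snapshot_creater(ansible_box_info)
    return {'status': 'done'}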
You receive an error because AWS lambda does not have any information about a module called slack.
A module is a set of .py files that are stored somewhere on a computer.
In the case of Lambda, you should provide all your libraries by creating a deployment package.
Here is another question that describes a similar case and provides several solutions:
AWS Lambda questions

Python import from parent package

I'm having some trouble with imports in Python.
Here is a simple example of what's going wrong.
I have a directory structure like this:
app
|---__init__.py
|---sub_app
|---__init__.py
The code:
app/__init__.py
shared_data = {
    'data': 123
}
from sub_app import more_shared_data
print more_shared_data
app/sub_app/__init__.py
more_shared_data = {
    'data': '12345'
}
from app import shared_data
print shared_data
However I get the error:
ImportError: No module named app
How do I import the shared_data dict, into app/sub_app/__init__.py?
You can use relative imports for this. Example -
In your app/sub_app/__init__.py -
more_shared_data = {
    'data': '12345'
}
from .. import shared_data
print shared_data
This should work for the simple example you have provided, but it does lead to a circular import: app is importing sub_app and sub_app is importing app.
For more complex use cases you can end up with errors, if you import sub_app before defining specific elements and then, in sub_app/__init__.py, you try to import app and use elements that are only defined after the import statement for sub_app. A very simple example where it would cause an issue -
app/__init__.py -
from .sub_app import more_shared_data
print(more_shared_data)
shared_data = {
    'data': 123
}
app/sub_app/__init__.py -
more_shared_data = {
    'data': '12345'
}
from .. import shared_data
print(shared_data)
Now, if you try to import app , you will get the error -
>>> import app
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<some file>\__init__.py", line 1, in <module>
    from .sub_app import more_shared_data
  File "<some file>\sub_app\__init__.py", line 4, in <module>
    from .. import shared_data
ImportError: cannot import name 'shared_data'
You should rethink whether shared_data really belongs in app/__init__.py, or whether it can be moved to sub_app/__init__.py (or into a separate module) and then imported from there in app.
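For illustration, here is a minimal sketch of one such refactor; the app/shared.py module name is hypothetical and not taken from your project:
# app/shared.py (hypothetical new home for the shared values)
shared_data = {
    'data': 123
}

# app/__init__.py
from .shared import shared_data
from .sub_app import more_shared_data

# app/sub_app/__init__.py
more_shared_data = {
    'data': '12345'
}
from ..shared import shared_data  # no app <-> sub_app cycle any more
print(shared_data)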
You have a few problems here, one of which is hidden.
It looks like you are trying to invoke your program like:
python app/__init__.py
python app/sub_app/__init__.py
This causes problems in that the directory of the main file is considered to be the root directory of the program; that is why sub_app cannot see app.
You can instead invoke your programs like such
python -m app.sub_app
This way Python assumes the current directory is the root directory and looks for the module app.sub_app under this directory. This causes another problem, though: to be able to run a package, you need to provide a __main__.py in the package (as well as __init__.py). If the modules do not import each other, the order of execution will be app/__init__.py, app/sub_app/__init__.py, and then app/sub_app/__main__.py.
app
|---__init__.py
|---sub_app
|---__init__.py
|---__main__.py
app/sub_app/__main__.py can do stuff like:
from app import shared_data
from . import more_shared_data
# or
from app.sub_app import more_shared_data
Finally, the hidden problem is that you have circular imports. That is, app depends on sub_app and sub_app depends on app: each needs the other to be loaded before it can load itself, which is of course impossible. You should refactor your code to avoid circular imports.
