I am working on writing some data tests. Super simple, nothing crazy.
Here is what my current directory looks like.
.
├── README.md
├── hive_tests
│   ├── __pycache__
│   ├── schema_checks_hive.py
│   ├── test_schema_checks_hive.py
│   └── yaml
│       └── job_output.address_stats.yaml
└── postgres
    ├── __pycache__
    ├── schema_checks_pg.py
    ├── test_schema_checks_pg.py
    └── yaml
When I cd into postgres and run pytest, all my tests pass.
When I cd into hive_tests and run pytest, I get an import error.
Here is my schema_checks_hive.py file.
from pyhive import hive
import pandas as pd
import numpy as np
import os, sys
import yaml
def check_column_name_hive(schema, table):
    query = "DESCRIBE {0}.{1}".format(schema, table)
    df = pd.read_sql_query(query, conn)
    print(df.columns)
    return df.columns
check_column_name_hive('myschema', 'mytable')
Here is my test_schema_checks_hive.py file where the tests are located.
import schema_checks_hive as sch
import pandas as pd
import yaml
import sys, os
def test_column_names_hive():
    for filename in os.listdir('yaml'):
        data = ""
        with open("yaml/{0}".format(filename), 'r') as stream:
            data = yaml.safe_load(stream)
        schema = data['schema']
        table = data['table']
        cols = data['columns']
        df = sch.check_column_name_hive(schema, table)
        assert len(cols) == len(df)
        assert cols == df.tolist()
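For reference, each file under yaml/ holds the expected schema, table, and column list for one table. A placeholder sketch of what job_output.address_stats.yaml looks like, with the three keys the test reads (the values below are made up for illustration, not my real columns):
schema: myschema
table: mytable
columns:
  - address_id
  - city
  - state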
When I run pytest I get an error that says:
ImportError while importing test module '/Usersdata/tests/hive_tests/test_schema_checks_hive.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
test_schema_checks_hive.py:1: in <module>
    import schema_checks_hive as sch
schema_checks_hive.py:1: in <module>
    from pyhive import hive
E   ModuleNotFoundError: No module named 'pyhive'
I would love any help! Thanks so much.
Related: this question is exactly like this Stack Overflow question, and I am hoping to get a different answer with this one. I have been trying to achieve this for over a year but can't seem to manage it.
A stripped-down version of my Ansible directory looks like this:
[root@python-test ansible]# tree
.
├── files
├── inventory
│   ├── prod
│   │   ├── group_vars
│   │   │   └── all
│   │   └── hosts -> ../../scripts/inventory.py
│   └── staging
│       ├── group_vars
│       │   └── all
│       └── hosts -> ../../scripts/inventory.py
├── roles
│   ├── ant
│   ├── build
│   ├── jdk
│   └── python
├── scripts
│   └── inventory.py
├── templates
└── vars
    └── all.yaml
I would like to use some of the variables declared in the group_vars/all file. It has some endpoint details that I can use during script execution. A stripped-down version of this group_vars/all looks like this:
inv_cloudprovider: aws
inv_environment: prod
inv_environment_type: production
inv_vpc_cidr: 10.0.0.0/16
inv_build_url: 'https://build.{{inv_environment}}.local'
inv_cloud_vpc_name: '{{inv_environment}}-vpc'
inv_vpc_id: '{{inv_environment_type}}-{{inv_cloud_vpc_name}}'
inv_glassfish_version: 4
At this point I am loading this file using yaml.safe_load() and then using the values during script execution. The problem with that is that some of the variables are recursive Jinja templates, and I am having a hard time getting those variables to resolve. So I was wondering if I can use Ansible to do this for me.
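For illustration, this is roughly what my current approach looks like and why it falls short (a minimal sketch; the path is the prod inventory from the tree above):
import yaml
# plain safe_load returns the raw strings, so nested Jinja expressions
# like {{inv_environment}} are never resolved
with open('inventory/prod/group_vars/all') as f:
    data = yaml.safe_load(f)
print(data['inv_vpc_id'])  # '{{inv_environment_type}}-{{inv_cloud_vpc_name}}'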
I am using ansible==2.2.3.0 and python 2.7.
The closest I have come to achieving this in Python is by doing this:
from ansible.parsing.dataloader import DataLoader
from ansible.vars import VariableManager
from ansible.inventory import Inventory
inventory_path = '/root/ansible/inventory/prod/hosts'
inventory = Inventory(DataLoader(), VariableManager(), inventory_path)
group = inventory.get_group('all')
group_vars = group.get_vars()
This is probably not the correct way, because the inventory hosts file is a symlink to this very script, so the script ends up trying to execute itself recursively.
Is it possible to parse the group_vars variables using Ansible? If not, how do I best get the final values of the variables from that file?
A stripped-down version of the code that does what this question asks is:
#!/usr/bin/env python
import json
import os
import sys
import yaml
from jinja2 import Environment
class VarLoader(yaml.SafeLoader):
    @staticmethod
    def construct_python_string(text):
        return str(text.value)
    def yaml_remove_parse(self, text):
        # keep values such as version numbers as plain strings instead of floats
        return self.construct_python_string(text)
env_name = os.environ.get('ENV_NAME')
script_path = os.path.dirname(os.path.abspath(__file__))
ansible_dir_path = os.path.abspath(script_path + '/../')
global_vars_path = ansible_dir_path + '/vars/all.yaml'
inventory_vars_path = ansible_dir_path + '/inventory/' + env_name + '/group_vars/all'
with open(global_vars_path) as f1, open(inventory_vars_path) as f2:
    global_vars_data = f1.read()
    inventory_vars_data = f2.read()
all_vars_data = global_vars_data + '\n' + inventory_vars_data
VarLoader.add_constructor('tag:yaml.org,2002:float', VarLoader.yaml_remove_parse)
data = yaml.load(all_vars_data, Loader=VarLoader)
jinja_env = Environment()
template_file = jinja_env.from_string(all_vars_data)
# first pass: resolve every value that is itself a Jinja template
for key in data:
    val = data[key]
    if isinstance(val, str) and '{{' in val:
        template = jinja_env.from_string(val)
        data[key] = template.render(**data)
# second pass: render the whole file with the resolved values and re-load it
yaml_data = yaml.load(template_file.render(data), Loader=VarLoader)
print(json.dumps(yaml_data))
This works with both python3 and python2. Output:
[root@python-test ansible]# python scripts/inventory.py | jq .
{
  "inv_environment": "prod",
  "inv_environment_type": "production",
  "inv_cloudprovider": "aws",
  "gv_redis_port": 6379,
  "inv_glassfish_version": 4,
  "inv_vpc_id": "production-prod-vpc",
  "gv_redis_version": "4.0.11",
  "inv_build_url": "https://build.prod.local",
  "gv_glassfish_admin_user": "admin",
  "inv_vpc_cidr": "10.0.0.0/16",
  "gv_glassfish_asadmin_path": "/usr/local/glassfish/bin/asadmin",
  "gv_java_home": "/usr/local/java/default",
  "inv_cloud_vpc_name": "prod-vpc",
  "gv_tomcat_home": "/usr/local/tomcat",
  "gv_java_path": "/usr/local/java/default/bin/java"
}
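For completeness, one way another script could consume the rendered variables (a sketch; the ENV_NAME handling and the relative path are assumptions based on the script above):
import json
import os
import subprocess
# run the inventory script and parse its JSON output
env = dict(os.environ, ENV_NAME='prod')  # inventory.py reads ENV_NAME to pick the inventory
out = subprocess.check_output(['python', 'scripts/inventory.py'], env=env)
rendered = json.loads(out)
print(rendered['inv_build_url'])  # https://build.prod.local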
I am very new to Python and still learning how it all works. I'm building a tiny card game for a Python class, and I decided to build it using Sublime Text and GitHub to teach myself those tools as well.
The (very small) directory looks like this:
/GitHub
├── python-practice
└── war
    ├── __init_.py
    ├── __main__.py
    ├── config
    │   ├── __init__.py
    │   ├── __pycache__
    │   │   ├── card.cpython-38.pyc
    │   │   ├── deck.cpython-38.pyc
    │   │   └── player.cpython-38.pyc
    │   ├── card.py
    │   ├── card.pyc
    │   ├── deck.py
    │   ├── deck.pyc
    │   ├── deck_test.py
    │   └── player.py
    └── game
        ├── __init__.py
        └── logic.py
But no matter whether I run logic.py from the terminal or with the system interpreter in Sublime, it always comes back with:
Traceback (most recent call last):
  File "game/logic.py", line 1, in <module>
    import card, deck, player
ModuleNotFoundError: No module named 'card'
...even though it's clearly there. I've tried from config import card, deck, player and I've also tried import config.
It would be easy and simple to just put all my modules in one folder but I'm trying to teach myself directories so it feels important that I figure this out. How do I get my script to correctly import modules from another folder adjacent to it in the same directory? What am I missing?
edit: adding my code from the modules below.
card.py
values = {'Two':2, 'Three':3, 'Four':4, 'Five':5, 'Six':6, 'Seven':7, 'Eight':8, 'Nine':9, 'Ten':10, 'Jack':11, 'Queen':12, 'King':13, 'Ace':14}
suits = ('Hearts', 'Diamonds', 'Spades', 'Clubs')
ranks = ('Two','Three','Four','Five','Six','Seven','Eight','Nine','Ten','Jack','Queen','King','Ace')
class Card():
    def __init__(self,suit,rank):
        self.suit = suit
        self.rank = rank
        self.value = values[rank]
    def __str__(self):
        return self.rank+" of "+ self.suit
logic.py (unfinished, WIP)
from config.card import Card
from config.deck import Deck
from config.player import Player
#Game setup
player_one = Player("One")
player_two = Player("Two")
new_deck = Deck()
new_deck.shuffle()
for x in range(26):
    player_one.add_cards(new_deck.deal_one())
    player_two.add_cards(new_deck.deal_one())
game_on = True
round_num = 0
while game_on:
    if len(player_one.all_cards) == 0:
        print('Player One, out of cards! Player Two Wins!')
        game_on = False
        break
    elif len(player_two.all_cards) == 0:
        print('Player Two, out of cards! Player One Wins!')
        game_on = False
        break
    else:
        round_num += 1
        print(f'Round {round_num}')
        #NEW ROUND
        player_one_cards = []
        player_one_cards.append(player_one.remove_one())
        player_two_cards = []
        player_two_cards.append(player_two.remove_one())
        pass
Something like this should work:
import sys
from pathlib import Path
sys.path.append(str(Path('.').absolute().parent))
from config.card import Card
from config.deck import Deck
from config.player import Player
This basically allows you to import stuff from the parent directory, which is what you need.
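If the working directory is not guaranteed, a variant that anchors on the file's own location should work the same way (a sketch, assuming the snippet sits at the top of war/game/logic.py):
import sys
from pathlib import Path
# put the war/ directory (the parent of game/) on sys.path regardless of the cwd
sys.path.append(str(Path(__file__).resolve().parent.parent))
from config.card import Card
from config.deck import Deck
from config.player import Player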
I am working with Python 3.7 with Anaconda and Visual Studio Code.
I have a project folder called providers.
Inside providers I have these folders:
├── Providers
│   ├── __init__.py
│   ├── _config
│   │   ├── database_connection.py
│   │   └── __init__.py
│   ├── src
│   │   ├── overview.py
│   │   └── __init__.py
│   └── utils
│       ├── pandas_functions.py
│       └── __init__.py
I want to import a class named DatabaseConnection, which lives in the file database_connection.py, into the file overview.py:
# Overview.py
from config.database_connection import DatabaseConnection
This works as expected.
I want to run some tests using pytest, and the test script looks like this:
from unittest.mock import patch, Mock
import pandas as pd
from config.database_connection import DatabaseConnection
from providers.utils.pandas_functions import get_df
@patch("providers.utils.pandas_functions.pd.read_sql")
def test_get_df(read_sql: Mock):
    read_sql.return_value = pd.DataFrame({"foo_id": [1, 2, 3]})
    results = get_df()
    read_sql.assert_called_once()
    pd.testing.assert_frame_equal(results, pd.DataFrame({"bar_id": [1, 2, 3]}))
But it is giving me this error:
plugins: hypothesis-5.5.4, arraydiff-0.3, astropy-header-0.1.2, doctestplus-0.5.0, mock-3.4.0, openfiles-0.4.0, remotedata-0.3.2
collected 0 items / 1 error
===================================================================================================== ERRORS =====================================================================================================
________________________________________________________________________________ ERROR collecting tests/test_pandas_functions.py _________________________________________________________________________________
ImportError while importing test module 'C:\Users\jordi_adm\Documents\GitHub\mcf-pipelines\cpke-cash-advance\providers\tests\test_pandas_functions.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests\test_pandas_functions.py:3: in <module>
    from config.database_connection import DatabaseConnection
E   ModuleNotFoundError: No module named 'config'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================================================================ 1 error in 0.35s ================================================================================================
It cannot find the module config, although if I run the overview.py file directly it imports it without problems.
If I write the import as from providers.config.database_connection import DatabaseConnection, with providers at the beginning, pytest is able to collect and run the test:
from unittest.mock import patch, Mock
import pandas as pd
from providers.config.database_connection import DatabaseConnection
from providers.utils.pandas_functions import get_df
@patch("providers.utils.pandas_functions.pd.read_sql")
def test_get_df(read_sql: Mock):
    read_sql.return_value = pd.DataFrame({"foo_id": [1, 2, 3]})
    results = get_df()
    read_sql.assert_called_once()
    pd.testing.assert_frame_equal(results, pd.DataFrame({"bar_id": [1, 2, 3]}))
plugins: hypothesis-5.5.4, arraydiff-0.3, astropy-header-0.1.2, doctestplus-0.5.0, mock-3.4.0, openfiles-0.4.0, remotedata-0.3.2
collected 1 item
tests\test_pandas_functions.py . [100%]
=============================================================================================== 1 passed in 0.29s ================================================================================================
If I try to run a script inside this project with providers at the beginning, for example by modifying overview.py:
from providers.config.database_connection import DatabaseConnection
I obtain this error:
from providers.config.database_connection import DatabaseConnection
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
c:\Users\jordi_adm\Documents\GitHub\mcf-pipelines\cpke-cash-advance\providers\run.py in
----> 2 from providers.config.database_connection import DatabaseConnection
ModuleNotFoundError: No module named 'providers'
What is the reason pytest needs providers to be specified at the beginning?
I have this structure:
├── app
│   ├── __init__.py
│   └── views.py
├── requirements.txt
├── sources
│   └── passport
│       ├── field_mapping.
│       ├── listener.py
│       ├── main.py
This is my __init__.py file:
from flask import Flask
app = Flask(__name__)
from app import views
Here is my views file. Is this the best way to send plain text?
from app import app
from flask import Response
from sources.app_metrics import meters
# from sources.passport.main import subscription_types
@app.route('/metrics')
def metrics():
    def generateMetrics():
        metrics = ""
        for subscription in ["something", "some other thing"]:
            metrics += "thing_{}_count {}\n".format(subscription, meters[subscription].get()['count'])
        return metrics
    print(generateMetrics())
    return Response(generateMetrics(), mimetype='text/plain')
My sources/passport/main file looks like this:
subscription_types = ["opportunity", "account", "lead"]
if __name__ == "__main__":
loop = asyncio.get_event_loop()
...
for subscription in subscription_types():
I also ran export FLASK_ENV=app/__init__.py before running the Flask app.
When I visit /metrics I get an error that looks like some kind of circular dependency error.
When I uncomment that import in my views file, the error occurs.
Pulling out subscription_types into a variable and importing it seems to be causing the problem.
My stack trace:
File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 235, in locate_app
__import__(module_name)
File "/Users/jwan/extract/app/__init__.py", line 5, in <module>
from app import views
File "/Users/jwan//extract/app/views.py", line 5, in <module>
from sources.passport.main import subscription_types
File "/Users/jwan/extract/sources/passport/main.py", line 3, in <module>
from sources.passport.listener import subscribe, close_subscriptions
File "/Users/jwan/extract/sources/passport/listener.py", line 18, in <module>
QUEUE = boto3.resource("sqs").get_queue_by_name(QueueName=CONFIG["assertions_queue"][ENV])
botocore.errorfactory.QueueDoesNotExist: An error occurred (AWS.SimpleQueueService.NonExistentQueue) when calling the GetQueueUrl operation: The specified queue does not exist for this wsdl versio
My sources/passport/listener file has this on line 18:
import gzip
import log
from os import getenv
from sources.passport.normalizer import normalize_message
from sources.app_metrics import meters
QUEUE = boto3.resource("sqs").get_queue_by_name(QueueName=CONFIG["assertions_queue"][ENV])
I have code that connects to a MongoDB client and I'm trying to test it. For testing, I don't want to connect to the actual client, so I'm trying to figure out how to make a fake one for testing purposes. The basic flow of the code is: I have a function somewhere that creates a pymongo client, then queries it and builds a dict that is used elsewhere.
I want to write some tests using pytest that will test different functions and classes that will call get_stuff. My problem is that get_stuff calls mongo() which is what actually makes the connection to the database. I was trying to just use pytest.fixture(autouse=True) and mongomock.MongoClient() to replace mongo().
But this isn't replacing the mongo_stuff.mongo(). Is there some way I can tell pytest to replace a function so my fixture is called instead of the actual function? I thought making the fixture would put my testing mongo() higher priority in the namespace than the function in the actual module.
Here is an example file structure with my example:
.
├── project
│   ├── __init__.py
│   ├── mongo_stuff
│   │   ├── __init__.py
│   │   └── mongo_stuff.py
│   └── working_class
│       ├── __init__.py
│       └── somewhere_else.py
└── testing
    ├── __init__.py
    └── test_stuff.py
mongo_stuff.py
import pymongo
def mongo():
    return pymongo.MongoClient(connection_params)
def get_stuff():
    db = mongo()  # Makes the connection using another function
    stuff = query_function(db)  # Does the query and makes a dict
    return stuff
somewhere_else.py
from project.mongo_stuff import mongo_stuff
mongo_dict = mongo_stuff.get_stuff()
test_stuff.py
import pytest
import mongomock
@pytest.fixture(autouse=True)
def patch_mongo(monkeypatch):
    db = mongomock.MongoClient()
    def fake_mongo():
        return db
    monkeypatch.setattr('project.mongo_stuff.mongo', fake_mongo)
from project.working_class import working_class  # This starts by calling project.mongo_stuff.mongo_stuff.get_stuff()
And this currently gives me a connection error, since the connection params in mongo_stuff.py are only made to work in the production environment. If I put the import statement from test_stuff.py into a test function, then it works fine and the mongomock db is used in the testing environment. I also tried changing the setattr to monkeypatch.setattr('project.working_class.mongo_stuff.mongo', fake_mongo), which also does not work.
You're halfway there: you have created a mock for the db client, now you have to patch the mongo_stuff.mongo function to return the mock instead of a real connection:
@pytest.fixture(autouse=True)
def patch_mongo(monkeypatch):
    db = mongomock.MongoClient()
    def fake_mongo():
        return db
    monkeypatch.setattr('mongo_stuff.mongo', fake_mongo)
Edit:
The reason why you get the connection error is that you are importing somewhere_else at module level in test_stuff, and somewhere_else also runs its connection code at module level. So patching via fixtures comes too late and has no effect. You have to patch the mongo client before the import of somewhere_else if you want to import at module level. This avoids the error being raised, but is extremely ugly:
from project.mongo_stuff import mongo_stuff
import mongomock
import pytest
from unittest.mock import patch
with patch.object(mongo_stuff, 'mongo', return_value=mongomock.MongoClient()):
    from project.working_class import somewhere_else
@patch.object(mongo_stuff, 'mongo', return_value=mongomock.MongoClient())
def test_db1(mocked_mongo):
    mongo_stuff.mongo()
    assert True
@patch.object(mongo_stuff, 'mongo', return_value=mongomock.MongoClient())
def test_db2(mocked_mongo):
    somewhere_else.foo()
    assert True
You should rather avoid running code at module level when possible, or run the imports that execute code at module level inside the tests (as you already found out in the comments).
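For example, a minimal sketch of that "import inside the test" variant (the patch target here assumes the module path from your tree, project.mongo_stuff.mongo_stuff):
import pytest
import mongomock
@pytest.fixture(autouse=True)
def patch_mongo(monkeypatch):
    # replace the real connection factory before any test body runs
    monkeypatch.setattr('project.mongo_stuff.mongo_stuff.mongo',
                        lambda: mongomock.MongoClient())
def test_import_somewhere_else():
    # the import triggers somewhere_else's module-level get_stuff() call,
    # which now hits the mongomock client instead of a real MongoDB
    from project.working_class import somewhere_else
    assert hasattr(somewhere_else, 'mongo_dict')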