Python Pandas unit test is not recognized

I have an issue with a unit test definition. I am testing data frames, and I do not understand why the following result is returned.
Result:
Ran 0 tests in 0.000s
OK
Script:
import unittest
import pandas as pd
from pandas._testing import assert_frame_equal

def df_minus(df_main: pd.DataFrame, df_subset: pd.DataFrame) -> pd.DataFrame:
    return df_main

class TestDataFrameMinus(unittest.TestCase):
    def df_minus_equal(self):
        df_A = pd.DataFrame(data={'col1': [1, 2, 3, 4]})
        df_B = pd.DataFrame(data={'col1': [1, 2, 3]})
        df_result = pd.DataFrame(data={'col1': [1, 2, 3]})
        assert_frame_equal(df_minus(df_A, df_B), df_result)

if __name__ == '__main__':
    unittest.main()
Do you have any idea why the test is not visible?

You should name your test methods with a test_ prefix; unittest's default loader only collects methods whose names start with test_:
def test_df_minus_equal(self):
    pass
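As a quick sanity check, the default loader only picks up methods whose names match the test* pattern; a minimal sketch (the Demo class below is hypothetical):

```python
import unittest

class Demo(unittest.TestCase):
    def helper_check(self):   # ignored: name does not start with test_
        self.fail("never runs")

    def test_check(self):     # collected by the default loader
        self.assertEqual(1 + 1, 2)

# Only the test_-prefixed method is discovered.
print(unittest.TestLoader().getTestCaseNames(Demo))  # ['test_check']
```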


Pytest AssertionError: how to format the input and output parameters properly

I've seen similar questions to this, but I can't figure it out for my own example. I have this code:
import ast
import pytest
import re

def autocomplete1(str, list_name):
    return [i for i in list(set(list_name)) if i.startswith(str)]

def autocomplete2(str, list_name):
    return list(filter(lambda x: x.startswith(str), list(set(list_name))))

def autocomplete3(str, list_name):
    return [i for i in list_name if re.match(str, i)]

# fix the list
@pytest.mark.parametrize('input1, input2, output1', [
    ('de', ['dog', 'deer', 'deal'], ['deer', 'deal']),
    ('ear', ['earplug', 'earphone', 'airplane'], ['earplug', 'earphone']),
])
def test_function(input1, input2, output1):
    assert autocomplete1(input1, input2) == output1
    assert autocomplete2(input1, input2) == output1
    assert autocomplete3(input1, input2) == output1
The Error is:
start_query_string.py FF [100%]
============================================================================================= FAILURES ==============================================================================================
________________________________________________________________________________ test_function[de-input20-output10] _________________________________________________________________________________
input1 = 'de', input2 = ['dog', 'deer', 'deal'], output1 = ['deer', 'deal']
@pytest.mark.parametrize('input1, input2, output1', [('de',['dog','deer','deal'],['deer','deal']), ('ear',['earplug','earphone','airplane'],['earplug','earphone'])])
def test_function(input1,input2,output1):
> assert autocomplete1(input1,input2) == output1
E AssertionError: assert ['deal', 'deer'] == ['deer', 'deal']
E At index 0 diff: 'deal' != 'deer'
E Use -v to get the full diff
start_query_string.py:27: AssertionError
________________________________________________________________________________ test_function[ear-input21-output11] ________________________________________________________________________________
input1 = 'ear', input2 = ['earplug', 'earphone', 'airplane'], output1 = ['earplug', 'earphone']
@pytest.mark.parametrize('input1, input2, output1', [('de',['dog','deer','deal'],['deer','deal']), ('ear',['earplug','earphone','airplane'],['earplug','earphone'])])
def test_function(input1,input2,output1):
> assert autocomplete1(input1,input2) == output1
E AssertionError: assert ['earphone', 'earplug'] == ['earplug', 'earphone']
E At index 0 diff: 'earphone' != 'earplug'
E Use -v to get the full diff
start_query_string.py:27: AssertionError
I've tried slightly editing the code in different ways (e.g. turning the input into a tuple), but I'd like to understand how to get this version working so I know what I'm doing wrong. Could someone show me what's wrong?
The point is that in autocomplete1 and autocomplete2, set is an unordered type, so as I see it there are two ways for the function to return predictable results:
Sort the list after all manipulations are done (by the way, there is no need for list(set(list_name)); you can iterate over the set directly)
If you need a specific order, you can use OrderedDict:
from collections import OrderedDict

l = [1, 1, 2, 3, 4, 5, 5, 2, 1]
result = list(OrderedDict.fromkeys(l))
print(result)  # [1, 2, 3, 4, 5]
Full working code is
import pytest
import re
from collections import OrderedDict

def autocomplete1(str, list_name):
    return [i for i in list(OrderedDict.fromkeys(list_name)) if i.startswith(str)]

def autocomplete2(str, list_name):
    return list(filter(lambda x: x.startswith(str), list(OrderedDict.fromkeys(list_name))))

def autocomplete3(str, list_name):
    return [i for i in list_name if re.match(str, i)]

@pytest.mark.parametrize('input1, input2, output1', [
    ('de', ['dog', 'deer', 'deal'], ['deer', 'deal']),
    ('ear', ['earplug', 'earphone', 'airplane'], ['earplug', 'earphone']),
])
def test_function(input1, input2, output1):
    assert autocomplete1(input1, input2) == output1
    assert autocomplete2(input1, input2) == output1
    assert autocomplete3(input1, input2) == output1
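Since Python 3.7, a plain dict also preserves insertion order, so dict.fromkeys gives the same order-preserving deduplication without the import; a quick sketch (the helper name is my own):

```python
def dedupe_keep_order(items):
    # dict preserves insertion order in Python 3.7+,
    # so this removes duplicates without reordering
    return list(dict.fromkeys(items))

print(dedupe_keep_order(['dog', 'deer', 'deal', 'deer']))  # ['dog', 'deer', 'deal']
print(dedupe_keep_order([1, 1, 2, 3, 4, 5, 5, 2, 1]))     # [1, 2, 3, 4, 5]
```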

Pyspark pass function as a parameter to UDF

I'm trying to create a UDF which takes another function as a parameter. But the execution ends up with an exception.
The code I run:
import pandas as pd
from pyspark import SparkConf, SparkContext, SQLContext
from pyspark.sql.types import MapType, DataType, StringType
from pyspark.sql.functions import udf, struct, lit
import os

sc = SparkContext.getOrCreate(conf=conf)
sqlContext = SQLContext(sc)

df_to_test = sqlContext.createDataFrame(
    pd.DataFrame({
        'inn': ['111', '222', '333'],
        'field1': [1, 2, 3],
        'field2': ['a', 'b', 'c']
    }))

def foo_fun(row, b) -> str:
    return 'a' + b()

def bar_fun():
    return 'I am bar'

foo_fun_udf = udf(foo_fun, StringType())

df_to_test.withColumn(
    'foo',
    foo_fun_udf(struct([df_to_test[x] for x in df_to_test.columns]), bar_fun)
).show()
The exception:
Invalid argument, not a string or column: <function bar_fun at 0x7f0e69ce6268> of type <class 'function'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
I tried to wrap bar_fun into a udf with no success. Is there a way to pass a function as a parameter?
You are not so far from the solution. Here is how I would do it:
def foo_fun_udf(func):
    def foo_fun(row) -> str:
        return 'a' + func()
    out_udf = udf(foo_fun, StringType())
    return out_udf

df_to_test.withColumn(
    'foo',
    foo_fun_udf(bar_fun)(struct([df_to_test[x] for x in df_to_test.columns]))
).show()
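The reason this works: a UDF only accepts columns as runtime arguments, but a closure can capture func at definition time, before Spark ever sees it. A plain-Python sketch of the same factory pattern, with no Spark involved (function names here are illustrative):

```python
def make_row_fun(func):
    # func is captured by the closure; the returned function
    # only takes the "column-like" argument, matching the UDF shape
    def row_fun(row):
        return 'a' + func()
    return row_fun

def bar_fun():
    return 'I am bar'

fn = make_row_fun(bar_fun)
print(fn({'inn': '111'}))  # aI am bar
```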

Python How to hide traceback in unittest

I have a unit test where I don't want to raise an exception as soon as a failure is found, but instead collect failures and raise at the end:
from unittest import TestCase

class TestA(TestCase):
    def setUp(self):
        pass

    def tearDown(self):
        pass

    def test_lst(self):
        a = [1, 2, 3, 4, 5]
        b = [1, 3, 3, 5, 5]
        total_errs_count = 0
        total_errs_msg = []
        for i in range(5):
            try:
                self.assertEqual(a[i], b[i])
            except AssertionError:
                total_errs_count += 1
                total_errs_msg.append(f'Index {i}, Expected {a[i]}, Get {b[i]}')
        if total_errs_count > 0:
            for m in total_errs_msg:
                print(m)
            raise AssertionError("Test Failed")

test = TestA()
test.test_lst()
I got:
Index 1, Expected 2, Get 3
Index 3, Expected 4, Get 5
----------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-5-b70dc996c844> in <module>
27
28 test = TestA()
---> 29 test.test_lst()
<ipython-input-5-b70dc996c844> in test_lst(self)
24 for m in total_errs_msg:
25 print(m)
---> 26 raise AssertionError("Test Failed")
27
28 test = TestA()
AssertionError: Test Failed
However, the desired output is to hide the traceback:
Index 1, Expected 2, Get 3
Index 3, Expected 4, Get 5
----------------------------------------------------
AssertionError: Test Failed
How can I hide the traceback in this case? Another post suggested catching the exception with unittest_exception = sys.exc_info(), but here I don't want to throw the exceptions immediately; I want to wait for all the checks to finish.
Any suggestion ?
Thanks
Try it this way, with subTest:
from unittest import TestCase
import unittest

class TestA(TestCase):
    def setUp(self):
        pass

    def tearDown(self):
        pass

    def test_lst(self):
        a = [1, 2, 3, 4, 5]
        b = [1, 3, 3, 5, 5]
        for i in range(len(a)):
            with self.subTest(i=i):
                self.assertEqual(a[i], b[i])

if __name__ == '__main__':
    unittest.main()

Trying to get a weighted average out of a dictionary of grades data

I am trying to return the weighted average of a student's grades based on the last definition. I have the dictionaries defined, but I think my attempt to pull the numbers out is incorrect.
def Average(lst):
    return sum(lst) / len(lst)

# Driver Code
lst = [1, 2, 3, 4, 5]
average = Average(lst)
print("Average of the list =", average)

def get_weighted_average(student):
    return average('homework')*0.10 + average('quizzes')*0.30 + average('tests')*.60

# driver code
students = [steve, alice, tyler]
print(get_weighted_average('steve'))
How to get a weighted average out of a dictionary of grades above?
What is the primary source of your data? Text? Anyway, it looks like you have something like this in mind.
Imperative approach
1 - Your "database"
students_marks = {
    'steve': {
        'homework': [1, 2, 3, 4, 5],
        'quizzes':  [5, 4, 3, 2, 1],
        'tests':    [0, 0, 0, 0, 0],
    },
    'alice': {
        'homework': [5, 4, 3, 2, 1],
        'quizzes':  [0, 0, 0, 0, 0],
        'tests':    [1, 2, 3, 4, 5],
    },
}
use case:
>>> students_marks['steve']
{'homework': [1, 2, 3, 4, 5], 'quizzes': [5, 4, 3, 2, 1], 'tests': [0, 0, 0, 0, 0]}
>>> students_marks['steve']['homework']
[1, 2, 3, 4, 5]
2 - The definition of average and get_weighted_average
def average(lst):
    return sum(lst) / len(lst)           # Python 3
    # return sum(lst) / float(len(lst))  # Python 2

def get_weighted_average(student_name):
    student_marks = students_marks[student_name]
    return round(
        average(student_marks['homework'])*.1
        + average(student_marks['quizzes'])*.3
        + average(student_marks['tests'])*.6
        , 2)
use case:
>>> get_weighted_average('steve')
1.2
>>> get_weighted_average('alice')
2.1
or using list
>>> students_names = ['steve', 'alice']
>>> [get_weighted_average(name) for name in students_names]
[1.2, 2.1]
or using dict
>>> {name:get_weighted_average(name) for name in students_names}
{'steve': 1.2, 'alice': 2.1}
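The same computation can be written once over a weights mapping, which avoids repeating the three average(...) terms (the WEIGHTS name is my own, not from the original answer):

```python
def average(lst):
    return sum(lst) / len(lst)

WEIGHTS = {'homework': .1, 'quizzes': .3, 'tests': .6}

def get_weighted_average(marks):
    # sum of (category average * category weight) over all categories
    return round(sum(average(marks[cat]) * w for cat, w in WEIGHTS.items()), 2)

steve = {'homework': [1, 2, 3, 4, 5], 'quizzes': [5, 4, 3, 2, 1], 'tests': [0, 0, 0, 0, 0]}
print(get_weighted_average(steve))  # 1.2
```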
Object-Oriented (OO) approach
All this being shown, what you want to do would probably be better done by programming in an OO manner. A quick example
class Student(object):
    homeworks_weight = .1
    quizzes_weight = .3
    tests_weight = .6

    def __init__(self, name, homeworks_marks, quizzes_marks, tests_marks):
        self.name = name
        self.homeworks_marks = homeworks_marks
        self.quizzes_marks = quizzes_marks
        self.tests_marks = tests_marks

    @staticmethod
    def average(marks):
        return sum(marks) / len(marks)

    def get_gpa(self, rd=2):
        return round(
            self.average(self.homeworks_marks)*self.homeworks_weight
            + self.average(self.quizzes_marks)*self.quizzes_weight
            + self.average(self.tests_marks)*self.tests_weight
            , rd)
use case:
>>> steve = Student(
...     name='Steve',
...     homeworks_marks=[1, 2, 3, 4, 5],
...     quizzes_marks=[5, 4, 3, 2, 1],
...     tests_marks=[0, 0, 0, 0, 0]
... )
>>> steve.get_gpa()
1.2
>>> steve.homeworks_marks
[1, 2, 3, 4, 5]

How do you use a string declared in the project file in sys.argv?

Let's say I have this code in test.py:
import sys
a = 'alfa'
b = 'beta'
c = 'gamma'
d = 'delta'
print(sys.argv[1])
Running python test.py a would then return a. How can I make it return alfa instead?
Using a dictionary that maps to those strings:
mapping = {'a': 'alfa', 'd': 'delta', 'b': 'beta', 'c': 'gamma'}
Then when you get your sys.argv[1] just access the value from your dictionary as:
print(mapping.get(sys.argv[1]))
Demo:
File: so_question.py
import sys
mapping = {'a': 'alfa', 'd': 'delta', 'b': 'beta', 'c': 'gamma'}
user_var = sys.argv[1]
user_var_value = mapping.get(user_var)
print("user_var_value is: {}".format(user_var_value))
In a shell:
▶ python so_question.py a
user_var_value is: alfa
You can also use the globals or locals:
import sys
a = 'alfa'
b = 'beta'
c = 'gamma'
d = 'delta'
print(globals().get(sys.argv[1]))
# or
print(locals().get(sys.argv[1]))
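One caveat with either approach: .get returns None for unknown keys, so you may want an explicit fallback (falling back to the raw argument, as below, is my own suggestion, not from the answer):

```python
mapping = {'a': 'alfa', 'b': 'beta', 'c': 'gamma', 'd': 'delta'}

def resolve(arg):
    # return the mapped value, or the raw argument when it is not in the mapping
    return mapping.get(arg, arg)

print(resolve('a'))  # alfa
print(resolve('z'))  # z  (unknown key, returned unchanged)
```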
