I have two dictionaries like the following:
dict1 =
{'a': [67.0, 24.0, 45.0, 45.0, 45.0, 23.0, 21.0, 45.0],
'b': [0.9, 0.5, 9.0, 4.5, 54.0, 0.0, 0.0, 0.0],
'c': [1.0, 5.0, 40.0, 30.0, 20.0, 0.0, 10.0, 50.0],
'd': [60.0, 80.0, 56.0, 34.0, 78.0, 13.0, 0.0, 70.0]}
dict2 =
{'a': 0.897,'c': 3.4, 'd': 34.567}
I want all the values in dict1 to be shifted right by one position. The keys of dict1 and dict2 are compared. If a value exists for a matching key in dict2, that value is put as the first element of the corresponding list in dict1. If no value exists in dict2, the first element is 0.0. For example:
When the two dictionaries are compared, dict2 contains values for the keys 'a', 'c', and 'd'. So the values for these keys are put as the first element of the corresponding lists in dict1, while the other elements of each list are shifted to the right. The size of each list is maintained. For keys that have no value in dict2, 0.0 is put as the first element, as shown below:
dict1 =
{'a': [0.897, 67.0, 24.0, 45.0, 45.0, 45.0, 23.0, 21.0],
'b': [0.0, 0.9, 0.5, 9.0, 4.5, 54.0, 0.0, 0.0],
'c': [3.4, 1.0, 5.0, 40.0, 30.0, 20.0, 0.0, 10.0],
'd': [34.567, 60.0, 80.0, 56.0, 34.0, 78.0, 13.0, 0.0]}
You can iterate over dict1, pop the last element of each list, and insert the value from dict2 at index 0, falling back to 0.0 via dict.get when the key doesn't exist in dict2.
for k, v in dict1.items():
    dict1[k].pop()  # remove the last element from each list in dict1
    dict1[k].insert(0, dict2.get(k, 0.0))  # insert the element from dict2 at index 0, or 0.0 if the key doesn't exist in dict2
print(dict1)
{
'a': [0.897, 67.0, 24.0, 45.0, 45.0, 45.0, 23.0, 21.0],
'b': [0.0, 0.9, 0.5, 9.0, 4.5, 54.0, 0.0, 0.0],
'c': [3.4, 1.0, 5.0, 40.0, 30.0, 20.0, 0.0, 10.0],
'd': [34.567, 60.0, 80.0, 56.0, 34.0, 78.0, 13.0, 0.0]
}
If this operation is done repeatedly, I suggest you use a deque: appending to the left of a deque is O(1), whereas inserting at the front of a list is O(n).
from collections import deque
dict1 = {k: deque(v, maxlen=len(v)) for k, v in dict1.items()}
for key, value in dict1.items():
    value.appendleft(dict2.get(key, 0.0))
print(dict1)
Output
{'a': deque([0.897, 67.0, 24.0, 45.0, 45.0, 45.0, 23.0, 21.0], maxlen=8),
'b': deque([0.0, 0.9, 0.5, 9.0, 4.5, 54.0, 0.0, 0.0], maxlen=8),
'c': deque([3.4, 1.0, 5.0, 40.0, 30.0, 20.0, 0.0, 10.0], maxlen=8),
'd': deque([34.567, 60.0, 80.0, 56.0, 34.0, 78.0, 13.0, 0.0], maxlen=8)}
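Because each deque was created with maxlen equal to the length of the original list, appendleft drops the rightmost element automatically, so no explicit pop is needed. If you need plain lists again afterwards, a small sketch of the conversion back:
dict1 = {k: list(v) for k, v in dict1.items()}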
I'm attempting to fine-tune the distilbert-base-uncased model for token classification with custom entities. The dataset has the annotated tags in IOB format.
I imported and created a Hugging Face DatasetDict following the documentation. The Features of each dataset are defined as below:
Features({
'id': Value(dtype='int32', id=None),
'tokens': Sequence(feature=Value(dtype='string', id=None), id=None),
'ner_tags': Sequence(feature=ClassLabel(num_classes=ner_tags_num_classes,
names=ner_tags_names, id=None), id=None)
})
And the NER tag mapping is the following:
{0: 'O', 1: 'B-product', 2: 'I-product', 3: 'B-field', 4: 'I-field',
5: 'B-task', 6: 'I-task', 7: 'B-researcher', 8: 'I-researcher', 9: 'B-university',
10: 'B-programlang', 11: 'B-algorithm', 12: 'I-algorithm', 13: 'B-misc', 14: 'I-misc',
15: 'I-university', 16: 'B-metrics', 17: 'B-organisation', 18: 'I-organisation', 19: 'I-metrics',
20: 'B-conference', 21: 'I-conference', 22: 'B-country', 23: 'I-programlang', 24: 'B-location',
25: 'B-person', 26: 'I-person', 27: 'I-country', 28: 'I-location'}
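For reference, a minimal sketch of how a mapping like this is typically wired into the model config so that predictions come back with readable labels; id2label and label2id are names I am assuming here, built from the mapping above:
from transformers import AutoModelForTokenClassification

id2label = {0: 'O', 1: 'B-product', 2: 'I-product'}  # extend with the full mapping shown above
label2id = {v: k for k, v in id2label.items()}

model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
)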
I followed this tutorial for every step.
However, when it comes to computing the metric using seqeval on the test set, this is the output that I get:
/opt/conda/lib/python3.7/site-packages/seqeval/metrics/v1.py:57: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
{'algorithm': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 191},
'conference': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 391},
'country': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 57},
'field': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 93},
'location': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 412},
'metrics': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 521},
'misc': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 181},
'organisation': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 67},
'person': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 219},
'product': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 177},
'programlang': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 201},
'researcher': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 207},
'task': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 44},
'university': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 183},
'overall_precision': 0.0,
'overall_recall': 0.0,
'overall_f1': 0.0,
'overall_accuracy': 0.6539285236246821}
I have absolutely no idea how to solve this problem.
Is the model performing so badly that I get an ill-defined precision and F-score?
Did I commit some error when I created the dataset?
Do I have to look at the fine-tuning part of the code or only the evaluation part?
Is there another way to evaluate the model using a test set which is a tensorflow dataset?
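For reference, here is a minimal sketch of how predictions and labels are usually aligned before being handed to seqeval in this kind of recipe: ids are mapped back to tag strings and the -100 positions on special tokens are skipped. The function and variable names (align_predictions, label_list, logits, labels) are my own, not from the tutorial.
import numpy as np
from seqeval.metrics import classification_report

def align_predictions(logits, labels, label_list):
    # label_list holds the tag strings in id order, e.g. label_list[0] == 'O'
    preds = np.argmax(logits, axis=-1)
    true_tags, pred_tags = [], []
    for pred_row, label_row in zip(preds, labels):
        t_row, p_row = [], []
        for p, l in zip(pred_row, label_row):
            if l == -100:  # special tokens / padding are ignored by the metric
                continue
            t_row.append(label_list[l])
            p_row.append(label_list[p])
        true_tags.append(t_row)
        pred_tags.append(p_row)
    return true_tags, pred_tags

# true_tags, pred_tags = align_predictions(logits, labels, label_list)
# print(classification_report(true_tags, pred_tags))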
If I have the following type of data - a list of dictionaries, how can I extract some key values from it?
comps = [
{
"name":'Test1',
"p_value":0.02,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test2',
"p_value":0.05,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test3',
"p_value":0.03,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test4',
"p_value":0.07,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test5',
"p_value":0.03,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test6',
"p_value":0.02,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test7',
"p_value":0.01,
"group0_null": 0.0,
"group1_null": 0.0,
}]
Result
From the data above, let's say I only want name and p_value. How can I get this result?
[{
"name":'Test1',
"p_value":0.02,
},{
"name":'Test2',
"p_value":0.05,
},{
"name":'Test3',
"p_value":0.03,
},{
"name":'Test4',
"p_value":0.07,
},{
"name":'Test5',
"p_value":0.03,
},{
"name":'Test6',
"p_value":0.02,
},{
"name":'Test7',
"p_value":0.01,
}]
This shows everything:
[c for c in comps]
This shows only the names:
[c['name'] for c in comps]
But if I do this:
[c['name','p_value'] for c in comps ]
I get the error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-94-b29459f7b089> in <module>
----> 1 [c['name','p_value'] for c in comps['continuous_explainers'] ]
2
3 # cont_comps = []
4
5 # for c in comps['continuous_explainers']:
<ipython-input-94-b29459f7b089> in <listcomp>(.0)
----> 1 [c['name','p_value'] for c in comps['continuous_explainers'] ]
2
3 # cont_comps = []
4
5 # for c in comps['continuous_explainers']:
KeyError: ('name', 'p_value')
The real data dictionary is much larger than this. I want to do this so that I can have a list of just the things that are needed.
UPDATE
Since some have pointed out that the structure of the data I showed is different from what I receive from the server, here's the code that I used to pull the data.
# get all comparisons
comps = source.get_comparison(name='Pr1 vs. Rest')
# only take the continuous explainers
comps['continuous_explainers'][1:5]
DATA
[{'name': 'Gender',
'column_index': 2,
'ks_score': 0.0022329709328575142,
'p_value': 1.0,
'quartiles': [[0.0, 0.0, 1.0, 1.0, 2.0], [0.0, 0.0, 1.0, 1.0, 2.0]],
't_test_p_value': 0.8341377317414621,
'diff_means': 0.0014959875249118681,
'primary_group_mean': 0.6312769010043023,
'secondary_group_mean': 0.6297809134793905,
'ks_sign': '+',
'group0_percent_null': 0.0,
'group1_percent_null': 0.0},
{'name': 'Gender_Missing_color',
'column_index': 3,
'ks_score': 2.220446049250313e-16,
'p_value': 1.0,
'quartiles': [[1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0]],
't_test_p_value': 1.0,
'diff_means': 0.0,
'primary_group_mean': 1.0,
'secondary_group_mean': 1.0,
'ks_sign': '0',
'group0_percent_null': 0.9966523194643712,
'group1_percent_null': 0.9959153360564427},
{'name': 'Gender_Missing',
'column_index': 4,
'ks_score': 0.0007369834078797544,
'p_value': 1.0,
'quartiles': [[0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 1.0]],
't_test_p_value': 0.40301091478187256,
'diff_means': -0.0007369834079284866,
'primary_group_mean': 0.0033476805356288893,
'secondary_group_mean': 0.004084663943557376,
'ks_sign': '-',
'group0_percent_null': 0.0,
'group1_percent_null': 0.0},
{'name': 'Male',
'column_index': 5,
'ks_score': 0.0029699543407862294,
'p_value': 0.9999999999915384,
'quartiles': [[0.0, 0.0, 1.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0, 1.0]],
't_test_p_value': 0.6740956861786738,
'diff_means': 0.0029699543407684104,
'primary_group_mean': 0.6245815399330444,
'secondary_group_mean': 0.621611585592276,
'ks_sign': '+',
'group0_percent_null': 0.0,
'group1_percent_null': 0.0}]
This is the output I get. As mentioned above, I only need some data from this list of dictionaries.
You could create a new dict for each object in comparisons and initialize it with only the name and p_value keys.
ex = [{'name': d['name'], 'p_value': d['p_value']} for d in comparisons]
I'm still not sure how to make the answers above work for me. However, I figured out another way to do this:
import pandas as pd

test = [(c['name'], c['p_value'], c['group0_percent_null']) for c in comps]
pd.DataFrame(test)
0 1 2
0 ID 5.374590e-13 0.000000
1 Gender 1.000000e+00 0.000000
2 Gender_Missing_color 1.000000e+00 0.996652
3 Gender_Missing 1.000000e+00 0.000000
4 Male 1.000000e+00 0.000000
... ... ... ...
It gave me the result I was looking for.
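As a small follow-up, passing column names to pd.DataFrame makes the result easier to read (same list of tuples as above):
pd.DataFrame(test, columns=['name', 'p_value', 'group0_percent_null'])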
Try:
[{'name':c['name'], 'p_value':c['p_value']} for c in comps]
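If the set of wanted keys grows, the same idea can be driven by a list of key names; a small sketch, where wanted is just a name I picked:
wanted = ['name', 'p_value']
result = [{k: c[k] for k in wanted} for c in comps]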
This is what the transaction matrix (a DataFrame) looks like:
{'Avg. Winter temp 0-10C': {0: 1.0, 1: 1.0},
'Avg. Winter temp < 0C': {0: 0.0, 1: 0.0},
'Avg. Winter temp > 10C': {0: 0.0, 1: 0.0},
'Avg. summer temp 11-20C': {0: 0.0, 1: 1.0},
'Avg. summer temp 20-25C': {0: 1.0, 1: 0.0},
'Avg. summer temp > 25C': {0: 0.0, 1: 0.0},
'GENDER_DESC:F': {0: 0.0, 1: 1.0},
'GENDER_DESC:M': {0: 1.0, 1: 0.0},
'MODEL_TYPE:FED EMP': {0: 0.0, 1: 0.0},
'MODEL_TYPE:HCPROV': {0: 0.0, 1: 0.0},
'MODEL_TYPE:IPA': {0: 0.0, 1: 0.0},
'MODEL_TYPE:MED A': {0: 0.0, 1: 0.0},
'MODEL_TYPE:MED ADVG': {0: 0.0, 1: 0.0},
'MODEL_TYPE:MED B': {0: 1.0, 1: 0.0},
'MODEL_TYPE:MED SNPG': {0: 0.0, 1: 0.0},
'MODEL_TYPE:MED UNSP': {0: 0.0, 1: 1.0},
'MODEL_TYPE:MEDICAID': {0: 0.0, 1: 0.0},
'MODEL_TYPE:MEDICARE': {0: 0.0, 1: 0.0},
'MODEL_TYPE:PPO': {0: 0.0, 1: 0.0},
'MODEL_TYPE:TPA': {0: 0.0, 1: 0.0},
'MODEL_TYPE:UNSPEC': {0: 0.0, 1: 0.0},
'MODEL_TYPE:WORK COMP': {0: 0.0, 1: 0.0},
'Multiple_Cancer_Flag:No': {0: 1.0, 1: 1.0},
'Multiple_Cancer_Flag:Yes': {0: 0.0, 1: 0.0},
'PATIENT_AGE_GROUP 30-65': {0: 0.0, 1: 0.0},
'PATIENT_AGE_GROUP 65-69': {0: 0.0, 1: 0.0},
'PATIENT_AGE_GROUP 69-71': {0: 1.0, 1: 0.0},
'PATIENT_AGE_GROUP 71-77': {0: 0.0, 1: 0.0},
'PATIENT_AGE_GROUP 77-85': {0: 0.0, 1: 1.0},
'PATIENT_LOCATION:ARIZONA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:CALIFORNIA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:CONNECTICUT': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:DELAWARE': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:FLORIDA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:GEORGIA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:IOWA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:KANSAS': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:KENTUCKY': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:LOUISIANA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MARYLAND': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MASSACHUSETTS': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MICHIGAN': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MINNESOTA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MISSISSIPPI': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MISSOURI': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:NEBRASKA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:NEW JERSEY': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:NEW MEXICO': {0: 1.0, 1: 0.0},
'PATIENT_LOCATION:NEW YORK': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:OKLAHOMA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:OREGON': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:PENNSYLVANIA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:SOUTH CAROLINA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:TENNESSEE': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:TEXAS': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:VIRGINIA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:WASHINGTON': {0: 0.0, 1: 1.0},
'PAYER_TYPE:Commercial': {0: 0.0, 1: 0.0},
'PAYER_TYPE:Managed Medicaid': {0: 0.0, 1: 0.0},
'PAYER_TYPE:Medicare': {0: 1.0, 1: 0.0},
'PAYER_TYPE:Medicare D': {0: 0.0, 1: 1.0},
'PLAN_NAME:ALL OTHER THIRD PARTY': {0: 0.0, 1: 0.0},
'PLAN_NAME:BCBS FL UNSPECIFIED': {0: 0.0, 1: 0.0},
'PLAN_NAME:BCBS MI MEDICARE D GENERAL (MI)': {0: 0.0, 1: 0.0},
'PLAN_NAME:BCBS TEXAS GENERAL (TX)': {0: 0.0, 1: 0.0},
'PLAN_NAME:BLUE CARE (MS)': {0: 0.0, 1: 0.0},
'PLAN_NAME:BLUE PREFERRED PPO (AZ)': {0: 0.0, 1: 0.0},
'PLAN_NAME:CMMNWLTH CRE MED SNP GENERAL(MA)': {0: 0.0, 1: 0.0},
'PLAN_NAME:DEPT OF VETERANS AFFAIRS': {0: 0.0, 1: 0.0},
'PLAN_NAME:EMBLEMHEALTH/HIP/GHI UNSPEC': {0: 0.0, 1: 0.0},
'PLAN_NAME:ESSENCE MED ADV GENERAL (MO)': {0: 0.0, 1: 0.0},
'PLAN_NAME:HEALTH NET MED D GENERAL (OR)': {0: 0.0, 1: 0.0},
'PLAN_NAME:HIGHMARK UNSPECIFIED': {0: 0.0, 1: 0.0},
'PLAN_NAME:HUMANA MED D GENERAL(MN)': {0: 0.0, 1: 0.0},
'PLAN_NAME:HUMANA-UNSPECIFIED': {0: 0.0, 1: 0.0},
'PLAN_NAME:KEYSTONE FIRST (PA)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE A': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE A KENTUCKY (KY)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE A MINNESOTA (MN)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B ARIZONA (AZ)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B IOWA (IA)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B KANSAS (KS)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B NEW MEXICO (NM)': {0: 1.0, 1: 0.0},
'PLAN_NAME:MEDICARE B PENNSYLVANIA (PA)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B TEXAS (TX)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B VIRGINIA (VA)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE UNSP': {0: 0.0, 1: 0.0},
'PLAN_NAME:MOLINA HEALTHCARE (FL)': {0: 0.0, 1: 0.0},
'PLAN_NAME:OPTUMHEALTH PHYSICAL HEALTH': {0: 0.0, 1: 0.0},
'PLAN_NAME:PACIFICSOURCE HP MED ADV GNRL': {0: 0.0, 1: 0.0},
'PLAN_NAME:PAI PLANNED ADMIN INC (SC)': {0: 0.0, 1: 0.0},
'PLAN_NAME:PEOPLES HLTH NETWORK': {0: 0.0, 1: 0.0},
'PLAN_NAME:THE COVENTRY CORP UNSPECIFIED': {0: 0.0, 1: 0.0},
'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (FL)': {0: 0.0, 1: 0.0},
'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (MD)': {0: 0.0, 1: 0.0},
'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (NY)': {0: 0.0, 1: 0.0},
'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (TX)': {0: 0.0, 1: 0.0},
'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (WA)': {0: 0.0, 1: 1.0},
'PLAN_NAME:UNITED HLTHCARE-(CT) CT PPO': {0: 0.0, 1: 0.0},
'PLAN_NAME:UNITED HLTHCARE-(NE) MIDLANDS': {0: 0.0, 1: 0.0},
'PLAN_NAME:UNITED HLTHCARE-UNSPECIFIED': {0: 0.0, 1: 0.0},
'PLAN_NAME:UNITED MEDICAL RESOURCES/UMR': {0: 0.0, 1: 0.0},
'PLAN_NAME:WORKERS COMP - EMPLOYER': {0: 0.0, 1: 0.0},
'PRI_SPECIALTY_DESC:DERMATOLOGY': {0: 0.0, 1: 0.0},
'PRI_SPECIALTY_DESC:HEMATOLOGY/ONCOLOGY': {0: 1.0, 1: 0.0},
'PRI_SPECIALTY_DESC:INTERNAL MEDICINE': {0: 0.0, 1: 0.0},
'PRI_SPECIALTY_DESC:MEDICAL ONCOLOGY': {0: 0.0, 1: 1.0},
'PRI_SPECIALTY_DESC:NURSE PRACTITIONER': {0: 0.0, 1: 0.0},
'PRI_SPECIALTY_DESC:OBSTETRICS & GYNECOLOGY': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:ARIZONA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:CALIFORNIA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:CONNECTICUT': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:DELAWARE': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:FLORIDA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:IOWA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:KANSAS': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:KENTUCKY': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:LOUISIANA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:MASSACHUSETTS': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:MICHIGAN': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:MINNESOTA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:MISSISSIPPI': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:MISSOURI': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:NEBRASKA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:NEW MEXICO': {0: 1.0, 1: 0.0},
'PROVIDER_LOCATION:NEW YORK': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:OREGON': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:PENNSYLVANIA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:SOUTH CAROLINA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:TENNESSEE': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:TEXAS': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:VIRGINIA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:WASHINGTON': {0: 0.0, 1: 1.0},
'PROVIDER_TYP_DESC:PROFESSIONAL': {0: 1.0, 1: 1.0},
'Region:MIDWEST': {0: 0.0, 1: 0.0},
'Region:NORTHEAST': {0: 0.0, 1: 0.0},
'Region:SOUTH': {0: 0.0, 1: 0.0},
'Region:WEST': {0: 1.0, 1: 1.0},
'Vials Consumption == 1': {0: 0.0, 1: 0.0},
'Vials_Consumption_GROUP 1-2': {0: 0.0, 1: 0.0},
'Vials_Consumption_GROUP 12-91': {0: 0.0, 1: 0.0},
'Vials_Consumption_GROUP 2-3': {0: 0.0, 1: 0.0},
'Vials_Consumption_GROUP 3-6': {0: 0.0, 1: 1.0},
'Vials_Consumption_GROUP 6-12': {0: 1.0, 1: 0.0},
'keytruda_flag:No': {0: 1.0, 1: 1.0},
'keytruda_flag:Yes': {0: 0.0, 1: 0.0},
'libtayo_flag:No': {0: 0.0, 1: 0.0},
'libtayo_flag:Yes': {0: 1.0, 1: 1.0},
'optivo_flag:No': {0: 1.0, 1: 1.0},
'optivo_flag:Yes': {0: 0.0, 1: 0.0}}
This is a transaction matrix. Frequent itemsets are created from it using this:
from mlxtend.frequent_patterns import apriori
frequent_itemsets = apriori(train_bucket, min_support=0.2, use_colnames=True)
print (frequent_itemsets)
And rules are created from the itemsets using this:
from mlxtend.frequent_patterns import association_rules
association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
print (len(rules["antecedents"]))
It gives 10k rules. I need to be able to visualize these. I tried using this:
https://intelligentonlinetools.com/blog/2018/02/10/how-to-create-data-visualization-for-association-rules-in-data-mining/
I tried the networkX example and it gives this:
If I plot all, it becomes cluttered.
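One option to reduce the clutter would be to keep only the strongest rules before building the graph; a minimal sketch, assuming the rules DataFrame produced by association_rules above:
# keep the 50 rules with the highest lift for plotting
top_rules = rules.sort_values('lift', ascending=False).head(50)
print(top_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])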
I thought of applying t-SNE, but that doesn't quite make sense to use on the initial transaction matrix. I tried it this way:
import numpy as np
from sklearn.manifold import TSNE
from matplotlib import pyplot as plt
import seaborn as sns

# embed the transaction matrix into two dimensions
X = train_bucket
X_embedded = TSNE(n_components=2).fit_transform(X)
X_embedded.shape

# scatter plot of the embedding
sns.set(rc={'figure.figsize': (11.7, 8.27)})
palette = sns.color_palette("bright", 10)
sns.scatterplot(x=X_embedded[:, 0], y=X_embedded[:, 1], legend='full', palette=palette)
I have no idea how to make sense of it. What are some options that I can explore?
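One option worth exploring is to treat each rule as a point and plot support against confidence, coloured by lift, which scales better to thousands of rules than a node-link layout; a minimal sketch, again assuming the rules DataFrame from above:
import matplotlib.pyplot as plt

# each point is one rule; colour encodes lift
plt.figure(figsize=(8, 6))
sc = plt.scatter(rules['support'], rules['confidence'], c=rules['lift'], alpha=0.5, cmap='viridis')
plt.colorbar(sc, label='lift')
plt.xlabel('support')
plt.ylabel('confidence')
plt.title('Association rules: support vs. confidence')
plt.show()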
I have a list in the below format. How can I create another list from the existing one with just selected elements?
[{'UserDiscount': 0.0, 'CostTotalInvEOPAmount': 940.0, 'WeekEndingData': u'2016-10-08', 'WeeksOnHand': 0.0, 'UnitTotalInvEOPQuantity': 250.0, 'WeeksOfSales': 0.0, 'UnitCostAmount': 3.76, 'Week': u'2016 Wk 36', 'CostReceiptAmount': 940.0, 'UnitSalesQuantity': 0.0, 'UnitReceiptQuantity': 250.0, 'Demand': 0.0, 'InventoryBOP': 0.0, 'PEMDiscount': 0.0, 'ElasticLift': 0.0, 'StoreCount': 0, 'PriceStatus': 4, 'UnitOnOrderQuantity': None, 'ReceiptSizeContributions': [{u'sizeId': u'e1656ac7-1cc1-40ce-b485-989bba9d758d', u'contribution': 1.0}], 'CostSalesAmount': 0.0, 'LifeCycleProperties': {u'IsAtRegularPrice': False, u'IsAtMarkdown': False, u'IsFinished': False, u'IsPreSeason': True}, 'MardownDiscount': 0.0, 'RecommendedReceipt': 250.0, 'RecommendedReceiptSizeContributions': [{u'sizeId': u'e1656ac7-1cc1-40ce-b485-989bba9d758d', u'contribution': 1.0}], 'UnitTotalInvBOPQuantity': 0.0, 'CostOnOrderAmount': None, 'InventoryEOP': 250.0, 'CostTotalInvBOPAmount': 0.0, 'Receipt': 250.0, 'Sales': 0.0, 'LostSales': 0.0, 'TotalDiscount': 0.0, 'RetailSalesAmount': 0.0},
{'UserDiscount': 0.0, 'CostTotalInvEOPAmount': 940.0, 'WeekEndingData': u'2016-10-15', 'WeeksOnHand': 0.0, 'UnitTotalInvEOPQuantity': 250.0, 'WeeksOfSales': 15.784951285314385, 'UnitCostAmount': 3.76, 'Week': u'2016 Wk 37', 'CostReceiptAmount': 0.0, 'UnitSalesQuantity': 0.0, 'UnitReceiptQuantity': 0.0, 'Demand': 0.0, 'InventoryBOP': 250.0, 'PEMDiscount': 0.0, 'ElasticLift': 0.0, 'StoreCount': 0, 'PriceStatus': 4, 'UnitOnOrderQuantity': None, 'ReceiptSizeContributions': [], 'CostSalesAmount': 0.0, 'LifeCycleProperties': {u'IsAtRegularPrice': False, u'IsAtMarkdown': False, u'IsFinished': False, u'IsPreSeason': True}, 'MardownDiscount': 0.0, 'RecommendedReceipt': 0.0, 'RecommendedReceiptSizeContributions': [], 'UnitTotalInvBOPQuantity': 250.0, 'CostOnOrderAmount': None, 'InventoryEOP': 250.0, 'CostTotalInvBOPAmount': 940.0, 'Receipt': 0.0, 'Sales': 0.0, 'LostSales': 0.0, 'TotalDiscount': 0.0, 'RetailSalesAmount': 0.0}]
My new list will have the below elements.
[{'UserDiscount': 0.0, 'CostTotalInvEOPAmount': 940.0, 'WeekEndingData': u'2016-10-08', 'WeeksOnHand': 0.0, 'UnitTotalInvEOPQuantity': 250.0, 'WeeksOfSales': 0.0, 'UnitCostAmount': 3.76, 'Week': u'2016 Wk 36', 'CostReceiptAmount': 940.0, 'UnitSalesQuantity': 0.0, 'UnitReceiptQuantity': 250.0, 'Demand': 0.0, 'InventoryBOP': 0.0, 'PEMDiscount': 0.0, 'ElasticLift': 0.0, 'StoreCount': 0, 'PriceStatus': 4, 'UnitOnOrderQuantity': None, 'CostSalesAmount': 0.0, 'RecommendedReceipt': 250.0, 'RetailSalesAmount': 0.0},
{'UserDiscount': 0.0, 'CostTotalInvEOPAmount': 940.0, 'WeekEndingData': u'2016-10-15', 'WeeksOnHand': 0.0, 'UnitTotalInvEOPQuantity': 250.0, 'WeeksOfSales': 15.784951285314385, 'UnitCostAmount': 3.76, 'Week': u'2016 Wk 37', 'CostReceiptAmount': 0.0, 'UnitSalesQuantity': 0.0, 'UnitReceiptQuantity': 0.0, 'Demand': 0.0, 'InventoryBOP': 250.0, 'PEMDiscount': 0.0, 'ElasticLift': 0.0, 'StoreCount': 0, 'PriceStatus': 4, 'UnitOnOrderQuantity': None, 'CostSalesAmount': 0.0, 'RecommendedReceipt': 0.0, 'RetailSalesAmount': 0.0}]
You have a list with two dictionaries. To filter the dictionaries you can try:
keep = ['key1', 'key2']  # keys you want to keep
newlist = []
for item in mylist:
    # build a new dict containing only the wanted keys
    d = dict((key, value) for key, value in item.items() if key in keep)
    newlist.append(d)
del mylist  # optionally drop the original list
Alternatively, using funcy you can do:
import funcy
mydict = {1: 1, 2: 2, 3: 3}
keep = [1, 2]
funcy.project(mydict, keep)
# => {1: 1, 2: 2}
which is much prettier imho.
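Applied to the list of dictionaries from the question, the same idea becomes a one-liner (a sketch; mylist and keep follow the names used in the first snippet above):
newlist = [funcy.project(item, keep) for item in mylist]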
You could use a list comprehension: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
[l for l in your_list if l['UserDiscount'] >= 1]
[{'UserDiscount': l['UserDiscount'], 'CostTotalInvEOPAmount': l['CostTotalInvEOPAmount']} for l in your_list]
This way you can filter the elements in your list and change the structure of the dicts in it.