How to visualize a transactional matrix - Python

This is what the transaction matrix (a pandas DataFrame, shown here as a dict of two rows) looks like:
{'Avg. Winter temp 0-10C': {0: 1.0, 1: 1.0},
'Avg. Winter temp < 0C': {0: 0.0, 1: 0.0},
'Avg. Winter temp > 10C': {0: 0.0, 1: 0.0},
'Avg. summer temp 11-20C': {0: 0.0, 1: 1.0},
'Avg. summer temp 20-25C': {0: 1.0, 1: 0.0},
'Avg. summer temp > 25C': {0: 0.0, 1: 0.0},
'GENDER_DESC:F': {0: 0.0, 1: 1.0},
'GENDER_DESC:M': {0: 1.0, 1: 0.0},
'MODEL_TYPE:FED EMP': {0: 0.0, 1: 0.0},
'MODEL_TYPE:HCPROV': {0: 0.0, 1: 0.0},
'MODEL_TYPE:IPA': {0: 0.0, 1: 0.0},
'MODEL_TYPE:MED A': {0: 0.0, 1: 0.0},
'MODEL_TYPE:MED ADVG': {0: 0.0, 1: 0.0},
'MODEL_TYPE:MED B': {0: 1.0, 1: 0.0},
'MODEL_TYPE:MED SNPG': {0: 0.0, 1: 0.0},
'MODEL_TYPE:MED UNSP': {0: 0.0, 1: 1.0},
'MODEL_TYPE:MEDICAID': {0: 0.0, 1: 0.0},
'MODEL_TYPE:MEDICARE': {0: 0.0, 1: 0.0},
'MODEL_TYPE:PPO': {0: 0.0, 1: 0.0},
'MODEL_TYPE:TPA': {0: 0.0, 1: 0.0},
'MODEL_TYPE:UNSPEC': {0: 0.0, 1: 0.0},
'MODEL_TYPE:WORK COMP': {0: 0.0, 1: 0.0},
'Multiple_Cancer_Flag:No': {0: 1.0, 1: 1.0},
'Multiple_Cancer_Flag:Yes': {0: 0.0, 1: 0.0},
'PATIENT_AGE_GROUP 30-65': {0: 0.0, 1: 0.0},
'PATIENT_AGE_GROUP 65-69': {0: 0.0, 1: 0.0},
'PATIENT_AGE_GROUP 69-71': {0: 1.0, 1: 0.0},
'PATIENT_AGE_GROUP 71-77': {0: 0.0, 1: 0.0},
'PATIENT_AGE_GROUP 77-85': {0: 0.0, 1: 1.0},
'PATIENT_LOCATION:ARIZONA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:CALIFORNIA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:CONNECTICUT': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:DELAWARE': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:FLORIDA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:GEORGIA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:IOWA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:KANSAS': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:KENTUCKY': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:LOUISIANA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MARYLAND': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MASSACHUSETTS': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MICHIGAN': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MINNESOTA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MISSISSIPPI': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:MISSOURI': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:NEBRASKA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:NEW JERSEY': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:NEW MEXICO': {0: 1.0, 1: 0.0},
'PATIENT_LOCATION:NEW YORK': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:OKLAHOMA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:OREGON': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:PENNSYLVANIA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:SOUTH CAROLINA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:TENNESSEE': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:TEXAS': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:VIRGINIA': {0: 0.0, 1: 0.0},
'PATIENT_LOCATION:WASHINGTON': {0: 0.0, 1: 1.0},
'PAYER_TYPE:Commercial': {0: 0.0, 1: 0.0},
'PAYER_TYPE:Managed Medicaid': {0: 0.0, 1: 0.0},
'PAYER_TYPE:Medicare': {0: 1.0, 1: 0.0},
'PAYER_TYPE:Medicare D': {0: 0.0, 1: 1.0},
'PLAN_NAME:ALL OTHER THIRD PARTY': {0: 0.0, 1: 0.0},
'PLAN_NAME:BCBS FL UNSPECIFIED': {0: 0.0, 1: 0.0},
'PLAN_NAME:BCBS MI MEDICARE D GENERAL (MI)': {0: 0.0, 1: 0.0},
'PLAN_NAME:BCBS TEXAS GENERAL (TX)': {0: 0.0, 1: 0.0},
'PLAN_NAME:BLUE CARE (MS)': {0: 0.0, 1: 0.0},
'PLAN_NAME:BLUE PREFERRED PPO (AZ)': {0: 0.0, 1: 0.0},
'PLAN_NAME:CMMNWLTH CRE MED SNP GENERAL(MA)': {0: 0.0, 1: 0.0},
'PLAN_NAME:DEPT OF VETERANS AFFAIRS': {0: 0.0, 1: 0.0},
'PLAN_NAME:EMBLEMHEALTH/HIP/GHI UNSPEC': {0: 0.0, 1: 0.0},
'PLAN_NAME:ESSENCE MED ADV GENERAL (MO)': {0: 0.0, 1: 0.0},
'PLAN_NAME:HEALTH NET MED D GENERAL (OR)': {0: 0.0, 1: 0.0},
'PLAN_NAME:HIGHMARK UNSPECIFIED': {0: 0.0, 1: 0.0},
'PLAN_NAME:HUMANA MED D GENERAL(MN)': {0: 0.0, 1: 0.0},
'PLAN_NAME:HUMANA-UNSPECIFIED': {0: 0.0, 1: 0.0},
'PLAN_NAME:KEYSTONE FIRST (PA)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE A': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE A KENTUCKY (KY)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE A MINNESOTA (MN)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B ARIZONA (AZ)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B IOWA (IA)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B KANSAS (KS)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B NEW MEXICO (NM)': {0: 1.0, 1: 0.0},
'PLAN_NAME:MEDICARE B PENNSYLVANIA (PA)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B TEXAS (TX)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE B VIRGINIA (VA)': {0: 0.0, 1: 0.0},
'PLAN_NAME:MEDICARE UNSP': {0: 0.0, 1: 0.0},
'PLAN_NAME:MOLINA HEALTHCARE (FL)': {0: 0.0, 1: 0.0},
'PLAN_NAME:OPTUMHEALTH PHYSICAL HEALTH': {0: 0.0, 1: 0.0},
'PLAN_NAME:PACIFICSOURCE HP MED ADV GNRL': {0: 0.0, 1: 0.0},
'PLAN_NAME:PAI PLANNED ADMIN INC (SC)': {0: 0.0, 1: 0.0},
'PLAN_NAME:PEOPLES HLTH NETWORK': {0: 0.0, 1: 0.0},
'PLAN_NAME:THE COVENTRY CORP UNSPECIFIED': {0: 0.0, 1: 0.0},
'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (FL)': {0: 0.0, 1: 0.0},
'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (MD)': {0: 0.0, 1: 0.0},
'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (NY)': {0: 0.0, 1: 0.0},
'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (TX)': {0: 0.0, 1: 0.0},
'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (WA)': {0: 0.0, 1: 1.0},
'PLAN_NAME:UNITED HLTHCARE-(CT) CT PPO': {0: 0.0, 1: 0.0},
'PLAN_NAME:UNITED HLTHCARE-(NE) MIDLANDS': {0: 0.0, 1: 0.0},
'PLAN_NAME:UNITED HLTHCARE-UNSPECIFIED': {0: 0.0, 1: 0.0},
'PLAN_NAME:UNITED MEDICAL RESOURCES/UMR': {0: 0.0, 1: 0.0},
'PLAN_NAME:WORKERS COMP - EMPLOYER': {0: 0.0, 1: 0.0},
'PRI_SPECIALTY_DESC:DERMATOLOGY': {0: 0.0, 1: 0.0},
'PRI_SPECIALTY_DESC:HEMATOLOGY/ONCOLOGY': {0: 1.0, 1: 0.0},
'PRI_SPECIALTY_DESC:INTERNAL MEDICINE': {0: 0.0, 1: 0.0},
'PRI_SPECIALTY_DESC:MEDICAL ONCOLOGY': {0: 0.0, 1: 1.0},
'PRI_SPECIALTY_DESC:NURSE PRACTITIONER': {0: 0.0, 1: 0.0},
'PRI_SPECIALTY_DESC:OBSTETRICS & GYNECOLOGY': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:ARIZONA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:CALIFORNIA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:CONNECTICUT': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:DELAWARE': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:FLORIDA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:IOWA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:KANSAS': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:KENTUCKY': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:LOUISIANA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:MASSACHUSETTS': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:MICHIGAN': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:MINNESOTA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:MISSISSIPPI': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:MISSOURI': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:NEBRASKA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:NEW MEXICO': {0: 1.0, 1: 0.0},
'PROVIDER_LOCATION:NEW YORK': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:OREGON': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:PENNSYLVANIA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:SOUTH CAROLINA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:TENNESSEE': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:TEXAS': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:VIRGINIA': {0: 0.0, 1: 0.0},
'PROVIDER_LOCATION:WASHINGTON': {0: 0.0, 1: 1.0},
'PROVIDER_TYP_DESC:PROFESSIONAL': {0: 1.0, 1: 1.0},
'Region:MIDWEST': {0: 0.0, 1: 0.0},
'Region:NORTHEAST': {0: 0.0, 1: 0.0},
'Region:SOUTH': {0: 0.0, 1: 0.0},
'Region:WEST': {0: 1.0, 1: 1.0},
'Vials Consumption == 1': {0: 0.0, 1: 0.0},
'Vials_Consumption_GROUP 1-2': {0: 0.0, 1: 0.0},
'Vials_Consumption_GROUP 12-91': {0: 0.0, 1: 0.0},
'Vials_Consumption_GROUP 2-3': {0: 0.0, 1: 0.0},
'Vials_Consumption_GROUP 3-6': {0: 0.0, 1: 1.0},
'Vials_Consumption_GROUP 6-12': {0: 1.0, 1: 0.0},
'keytruda_flag:No': {0: 1.0, 1: 1.0},
'keytruda_flag:Yes': {0: 0.0, 1: 0.0},
'libtayo_flag:No': {0: 0.0, 1: 0.0},
'libtayo_flag:Yes': {0: 1.0, 1: 1.0},
'optivo_flag:No': {0: 1.0, 1: 1.0},
'optivo_flag:Yes': {0: 0.0, 1: 0.0}}
This is a transactional matrix. Frequent itemsets are generated from it like this:
from mlxtend.frequent_patterns import apriori
frequent_itemsets = apriori(train_bucket, min_support=0.2, use_colnames=True)
print(frequent_itemsets)
And the rules are created using this:
from mlxtend.frequent_patterns import association_rules
association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
print(len(rules["antecedents"]))
This gives about 10k rules, and I need to be able to visualize them. I tried the approach from this post:
https://intelligentonlinetools.com/blog/2018/02/10/how-to-create-data-visualization-for-association-rules-in-data-mining/
I tried the NetworkX example from that post, but if I plot all of the rules the resulting graph becomes far too cluttered to read.
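One idea I had for reducing the clutter is to draw only the strongest rules instead of all 10k. A rough sketch of what I mean (assuming rules is the DataFrame returned by association_rules above; the cut-off of 25 rules is arbitrary):
import networkx as nx
from matplotlib import pyplot as plt

# keep only the top 25 rules by lift so the graph stays readable
top_rules = rules.sort_values("lift", ascending=False).head(25)

G = nx.DiGraph()
for _, row in top_rules.iterrows():
    # antecedents/consequents are frozensets of items; join them into readable labels
    antecedent = ", ".join(row["antecedents"])
    consequent = ", ".join(row["consequents"])
    G.add_edge(antecedent, consequent, weight=row["lift"])

pos = nx.spring_layout(G, seed=42)
nx.draw_networkx(G, pos, node_size=300, font_size=7, arrows=True)
plt.show()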
I also thought of applying t-SNE, but that doesn't quite make sense to apply to the initial transactional matrix. I tried it this way:
import numpy as np
from sklearn.manifold import TSNE
from matplotlib import pyplot as plt
import seaborn as sns

# embed the one-hot transaction matrix into 2-D
X = train_bucket
X_embedded = TSNE(n_components=2).fit_transform(X)
print(X_embedded.shape)

sns.set(rc={'figure.figsize': (11.7, 8.27)})
palette = sns.color_palette("bright", 10)  # only takes effect if a hue column is supplied
sns.scatterplot(x=X_embedded[:, 0], y=X_embedded[:, 1], legend='full')
plt.show()
I have no idea how to make sense of it. What are some options that I can explore?
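One option might be to visualize the rules themselves rather than the raw transactions, e.g. a support-versus-confidence scatter of all rules coloured by lift. A minimal sketch, assuming rules is the association_rules output from above:
from matplotlib import pyplot as plt

# each point is one rule; colour encodes lift, so dense clusters and outliers stand out
plt.figure(figsize=(8, 6))
sc = plt.scatter(rules["support"], rules["confidence"], c=rules["lift"], cmap="viridis", s=10, alpha=0.6)
plt.colorbar(sc, label="lift")
plt.xlabel("support")
plt.ylabel("confidence")
plt.title("Association rules: support vs confidence")
plt.show()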

Related

Creating a categorical variable from two dummy variables

I have the following data:
{'ID': {0: 5531.0, 1: 2658.0, 2: 5365.0, 3: 4468.0, 4: 3142.0},
'FEMALE': {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0},
'MALE': {0: 0.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 0.0},
'AGE': {0: 45.0, 1: 40.0, 2: 38.0, 3: 43.0, 4: 38.0},
'S': {0: 12.0, 1: 12.0, 2: 15.0, 3: 13.0, 4: 18.0}}
Where MALE is a dummy equal to 1 if the individual is male and 0 otherwise; the same applies to FEMALE.
I want to create a new variable, Gender, which is categorical: if MALE == 1 then Gender = Male, and if FEMALE == 1 then Gender = Female. The purpose is to allow for a clear twoway scatter plot separated by gender. I can do this currently, but the legend is hard to understand.
I tried the following:
import numpy as np
import pandas as pd
stata_data_P1 = pd.DataFrame({'ID': {0: 5531.0, 1: 2658.0, 2: 5365.0, 3: 4468.0, 4: 3142.0}, 'FEMALE': {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}, 'MALE': {0: 0.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 0.0}, 'AGE': {0: 45.0, 1: 40.0, 2: 38.0, 3: 43.0, 4: 38.0}, 'S': {0: 12.0, 1: 12.0, 2: 15.0, 3: 13.0, 4: 18.0}})
stata_data_P1['Gender'] = np.where(stata_data_P1['MALE'] == '1', 'Female', 'Male')
stata_data_P1.head()
But from stata_data_P1.head() we can see that it doesn't seem to have applied my condition for the true and false values.
Any help would be greatly appreciated.
First use the assign method to create a new column, then use idxmax on just the MALE and FEMALE columns to return the column name with the maximum value in each row.
Code:
stata_data_P1.assign(GENDER=lambda df_: df_.loc[:, ["MALE", "FEMALE"]].idxmax(axis=1))
Documentation:
Pandas - idxmax
Pandas - assign
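For what it's worth, the original np.where attempt also works once the comparison is made against the numeric value 1 rather than the string '1' (the labels were also swapped in the attempt above); a minimal corrected sketch:
import numpy as np

# MALE holds floats (0.0/1.0), so compare against 1, not the string '1'
stata_data_P1['Gender'] = np.where(stata_data_P1['MALE'] == 1, 'Male', 'Female')
print(stata_data_P1.head())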

Fine-tuning Token Classification with custom entities: “UndefinedMetricWarning: Precision and F-score are ill-defined”

I’m attempting to fine-tune the distilbert-base-uncased model for token classification with custom entities. The dataset has the annotated tags in IOB format.
I imported and created a Hugging Face DatasetDict following the documentation. Each dataset's Features are defined as below:
Features({
    'id': Value(dtype='int32', id=None),
    'tokens': Sequence(feature=Value(dtype='string', id=None), id=None),
    'ner_tags': Sequence(feature=ClassLabel(num_classes=ner_tags_num_classes,
                                            names=ner_tags_names, id=None), id=None)
})
And the NER tag mapping is the following:
{0: 'O', 1: 'B-product', 2: 'I-product', 3: 'B-field', 4: 'I-field',
5: 'B-task', 6: 'I-task', 7: 'B-researcher', 8: 'I-researcher', 9: 'B-university',
10: 'B-programlang', 11: 'B-algorithm', 12: 'I-algorithm', 13: 'B-misc', 14: 'I-misc',
15: 'I-university', 16: 'B-metrics', 17: 'B-organisation', 18: 'I-organisation', 19: 'I-metrics',
20: 'B-conference', 21: 'I-conference', 22: 'B-country', 23: 'I-programlang', 24: 'B-location',
25: 'B-person', 26: 'I-person', 27: 'I-country', 28: 'I-location'}
I followed this tutorial for every step.
However, when it comes to computing the metrics using seqeval on the test set, this is the output that I get:
/opt/conda/lib/python3.7/site-packages/seqeval/metrics/v1.py:57: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
{'algorithm': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 191},
'conference': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 391},
'country': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 57},
'field': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 93},
'location': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 412},
'metrics': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 521},
'misc': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 181},
'organisation': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 67},
'person': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 219},
'product': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 177},
'programlang': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 201},
'researcher': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 207},
'task': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 44},
'university': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 183},
'overall_precision': 0.0,
'overall_recall': 0.0,
'overall_f1': 0.0,
'overall_accuracy': 0.6539285236246821}
I have absolutely no idea how to solve this problem.
Is the model performing so badly that I get ill-defined precision and F-score?
Did I commit some error when I created the dataset?
Do I have to look at the fine-tuning part of the code or only the evaluation part?
Is there another way to evaluate the model using a test set which is a tensorflow dataset?
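One quick diagnostic worth trying (my suggestion, not from the tutorial): count the predicted tags to see whether the model is emitting only 'O'. Predictions that are all 'O' reproduce exactly this warning and the all-zero entity scores, while token-level accuracy stays moderate because it still counts the correctly predicted 'O' tokens. A minimal sketch using seqeval directly:
from collections import Counter
from seqeval.metrics import classification_report

# toy example: the gold labels contain entities, but the predictions are all 'O'
y_true = [['B-product', 'I-product', 'O', 'B-person']]
y_pred = [['O', 'O', 'O', 'O']]

# counting predicted tags shows whether any entity label is ever emitted
print(Counter(tag for seq in y_pred for tag in seq))

# this raises the same UndefinedMetricWarning and gives 0.0 precision/recall/F1
print(classification_report(y_true, y_pred))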

Extracting the values from nested dictionary in python

d= {0: {'Name': 'MN', 'volt': 1.0, 'an': 0.0},
1: {'Name': 'LT', 'volt': 1.0, 'an': 5.8},
2: {'Name': 'CK', 'volt': 1.0, 'an': 2.72},
3: {'Name': 'QL', 'volt': 1.0, 'an': 0.33}}
I am trying to create a matrix from the above nested dictionary that would result in the following output:
[MN 1.0 0.0
LT 1.0 5.8
CK 1.0 2.72
QL 1.0 0.33]
Simply using the .values() function in Python returns the inner dictionaries, so the key names come along as well.
I guess you can put them in a list like this:
>>> d = {0: {'Name': 'MN', 'volt': 1.0, 'an': 0.0},
1: {'Name': 'LT', 'volt': 1.0, 'an': 5.8},
2: {'Name': 'CK', 'volt': 1.0, 'an': 2.72},
3: {'Name': 'QL', 'volt': 1.0, 'an': 0.33}}
>>> [list(i.values()) for i in d.values()]
[['MN', 1.0, 0.0],
['LT', 1.0, 5.8],
['CK', 1.0, 2.72],
['QL', 1.0, 0.33]]
You iterate over the values of the outer dictionary and, for each inner dictionary i, extract its values via i.values(), collecting them into a list with a list comprehension.
If you want them flattened into a single list:
>>> sum([list(i.values()) for i in d.values()], [])
['MN', 1.0, 0.0, 'LT', 1.0, 5.8, 'CK', 1.0, 2.72, 'QL', 1.0, 0.33]
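If an actual array is the end goal rather than a list of lists (my assumption about the intent), pandas can do the same reshaping, for example:
import pandas as pd

# outer keys become the row index, inner keys become the columns
df = pd.DataFrame.from_dict(d, orient='index')
matrix = df[['Name', 'volt', 'an']].to_numpy()
print(matrix)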

Applying a lambda function with three arguments within a Group By

I am currently attempting to create a function where I divide columns in my DataFrame, DF_1, and group them by a dimension column in the same DataFrame.
The code below attempts to achieve this by first grouping by the dimension column and then applying a lambda function to each pair of columns that I am trying to divide, in order to get the average of each metric, i.e. cost per conversion or cost per click.
Unfortunately, I am unsure how to accomplish this. The code below raises TypeError: <lambda>() takes 2 positional arguments but 3 were given.
calc_1 = DF_1[['Conversions_10D', 'Total_Revenue', 'Total_Revenue', 'Clicks', 'Spend']]
calc_2 = DF_1[['Impressions', 'Spend', 'Conversions_10D', 'Impressions', 'Clicks']]

def agg_avg(df, group_field, list_a, list_b):
    grouped = df.groupby(group_field, as_index=False).apply(lambda x, y: x/y, list_a, list_b)
    grouped = pd.DataFrame(grouped).reset_index(drop=True)
    return grouped
{'Date': {0: '2018-02-28', 1: '2018-02-28', 2: '2018-02-28', 3: '2018-02-28', 4: '2018-02-28'}, 'Audience_Category': {0: 'Affinity', 1: 'Affinity', 2: 'Affinity', 3: 'Affinity', 4: 'Affinity'},
'Demo': {0: 'F25-34', 1: 'F25-34', 2: 'F25-34', 3: 'F25-34', 4: 'F25-34'}, 'Gender': {0: 'Female', 1: 'Female', 2: 'Female', 3: 'Female', 4: 'Female'},
'Device': {0: 'Android', 1: 'Android', 2: 'Android', 3: 'Android', 4: 'Android'},
'Creative': {0: 'Bubble:15', 1: 'Bubble:30', 2: 'Wide :15', 3: 'Oscar :15', 4: 'Oscar :30'},
'Impressions': {0: 3834, 1: 3588, 2: 3831, 3: 3876, 4: 3676},
'Clicks': {0: 2.0, 1: 0.0, 2: 4.0, 3: 2.0, 4: 1.0},
'Conversions_10D': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'Total_Revenue': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'Spend': {0: 28.600707059999991, 1: 25.95319236000001, 2: 28.29383795999998, 3: 29.287063200000013, 4: 26.514734159999968},
'Demo_Category': {0: 'Narrow', 1: 'Broad', 2: 'Narrow', 3: 'Broad', 4: 'Narrow'},
'CPM_Efficiency': {0: 'Low CPM', 1: 'Low CPM', 2: 'Low CPM', 3: 'Low CPM', 4: 'Low CPM'}}
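A note on the error itself (my reading, not stated in the original post): GroupBy.apply forwards any extra positional arguments to the function, so the lambda receives the group plus list_a plus list_b, i.e. three arguments against a two-argument signature. A minimal sketch of one way to get per-group averages of the ratios instead (the column pairings and group field here are illustrative assumptions):
import pandas as pd

def agg_avg(df, group_field, list_a, list_b):
    # build one ratio column per (numerator, denominator) pair, then average per group
    ratios = pd.DataFrame({
        f"{a}_per_{b}": df[a] / df[b]
        for a, b in zip(list_a, list_b)
    })
    ratios[group_field] = df[group_field]
    # note: zero denominators (e.g. zero clicks) will produce inf in the mean
    return ratios.groupby(group_field, as_index=False).mean()

result = agg_avg(DF_1, 'Creative', ['Spend', 'Conversions_10D'], ['Clicks', 'Impressions'])
print(result)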

How to create a list from existing list in python

I have a list in the format below. How can I create another list from the existing one with just selected elements?
[{'UserDiscount': 0.0, 'CostTotalInvEOPAmount': 940.0, 'WeekEndingData': u'2016-10-08', 'WeeksOnHand': 0.0, 'UnitTotalInvEOPQuantity': 250.0, 'WeeksOfSales': 0.0, 'UnitCostAmount': 3.76, 'Week': u'2016 Wk 36', 'CostReceiptAmount': 940.0, 'UnitSalesQuantity': 0.0, 'UnitReceiptQuantity': 250.0, 'Demand': 0.0, 'InventoryBOP': 0.0, 'PEMDiscount': 0.0, 'ElasticLift': 0.0, 'StoreCount': 0, 'PriceStatus': 4, 'UnitOnOrderQuantity': None, 'ReceiptSizeContributions': [{u'sizeId': u'e1656ac7-1cc1-40ce-b485-989bba9d758d', u'contribution': 1.0}], 'CostSalesAmount': 0.0, 'LifeCycleProperties': {u'IsAtRegularPrice': False, u'IsAtMarkdown': False, u'IsFinished': False, u'IsPreSeason': True}, 'MardownDiscount': 0.0, 'RecommendedReceipt': 250.0, 'RecommendedReceiptSizeContributions': [{u'sizeId': u'e1656ac7-1cc1-40ce-b485-989bba9d758d', u'contribution': 1.0}], 'UnitTotalInvBOPQuantity': 0.0, 'CostOnOrderAmount': None, 'InventoryEOP': 250.0, 'CostTotalInvBOPAmount': 0.0, 'Receipt': 250.0, 'Sales': 0.0, 'LostSales': 0.0, 'TotalDiscount': 0.0, 'RetailSalesAmount': 0.0},
{'UserDiscount': 0.0, 'CostTotalInvEOPAmount': 940.0, 'WeekEndingData': u'2016-10-15', 'WeeksOnHand': 0.0, 'UnitTotalInvEOPQuantity': 250.0, 'WeeksOfSales': 15.784951285314385, 'UnitCostAmount': 3.76, 'Week': u'2016 Wk 37', 'CostReceiptAmount': 0.0, 'UnitSalesQuantity': 0.0, 'UnitReceiptQuantity': 0.0, 'Demand': 0.0, 'InventoryBOP': 250.0, 'PEMDiscount': 0.0, 'ElasticLift': 0.0, 'StoreCount': 0, 'PriceStatus': 4, 'UnitOnOrderQuantity': None, 'ReceiptSizeContributions': [], 'CostSalesAmount': 0.0, 'LifeCycleProperties': {u'IsAtRegularPrice': False, u'IsAtMarkdown': False, u'IsFinished': False, u'IsPreSeason': True}, 'MardownDiscount': 0.0, 'RecommendedReceipt': 0.0, 'RecommendedReceiptSizeContributions': [], 'UnitTotalInvBOPQuantity': 250.0, 'CostOnOrderAmount': None, 'InventoryEOP': 250.0, 'CostTotalInvBOPAmount': 940.0, 'Receipt': 0.0, 'Sales': 0.0, 'LostSales': 0.0, 'TotalDiscount': 0.0, 'RetailSalesAmount': 0.0}]
My new list will having below elements.
[{'UserDiscount': 0.0, 'CostTotalInvEOPAmount': 940.0, 'WeekEndingData': u'2016-10-08', 'WeeksOnHand': 0.0, 'UnitTotalInvEOPQuantity': 250.0, 'WeeksOfSales': 0.0, 'UnitCostAmount': 3.76, 'Week': u'2016 Wk 36', 'CostReceiptAmount': 940.0, 'UnitSalesQuantity': 0.0, 'UnitReceiptQuantity': 250.0, 'Demand': 0.0, 'InventoryBOP': 0.0, 'PEMDiscount': 0.0, 'ElasticLift': 0.0, 'StoreCount': 0, 'PriceStatus': 4, 'UnitOnOrderQuantity': None, 'CostSalesAmount': 0.0, 'RecommendedReceipt': 250.0, 'RetailSalesAmount': 0.0},
{'UserDiscount': 0.0, 'CostTotalInvEOPAmount': 940.0, 'WeekEndingData': u'2016-10-15', 'WeeksOnHand': 0.0, 'UnitTotalInvEOPQuantity': 250.0, 'WeeksOfSales': 15.784951285314385, 'UnitCostAmount': 3.76, 'Week': u'2016 Wk 37', 'CostReceiptAmount': 0.0, 'UnitSalesQuantity': 0.0, 'UnitReceiptQuantity': 0.0, 'Demand': 0.0, 'InventoryBOP': 250.0, 'PEMDiscount': 0.0, 'ElasticLift': 0.0, 'StoreCount': 0, 'PriceStatus': 4, 'UnitOnOrderQuantity': None, 'CostSalesAmount': 0.0, 'RecommendedReceipt': 0.0, 'RetailSalesAmount': 0.0}]
You have a list with two dictionaries. To filter the dictionaries you can try:
keep = ['key1', 'key2']  # keys you want to keep
newlist = []
for item in mylist:
    # keep only the wanted key/value pairs from each dictionary
    d = dict((key, value) for key, value in item.items() if key in keep)
    newlist.append(d)
del mylist
Also, using funcy you can do:
import funcy
mydict={1:1,2:2,3:3}
keep=[1,2]
funcy.project(mydict,keep)
=> {1: 1, 2: 2}
which is much prettier imho.
You could use a list comprehension (https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions):
[l for l in your_list if l['UserDiscount'] >= 1 ]
[{'UserDiscount': l['UserDiscount'],'CostTotalInvEOPAmount': l['CostTotalInvEOPAmount']} for l in your_list ]
This way you can both filter the elements in your list and change the structure of the dicts it contains.
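Combining the two ideas, a short sketch that keeps only a chosen set of keys from every dictionary (the key list below is just an illustrative subset of the ones in the desired output):
keep = ['UserDiscount', 'CostTotalInvEOPAmount', 'WeekEndingData', 'Week',
        'RecommendedReceipt', 'RetailSalesAmount']
new_list = [{k: d[k] for k in keep if k in d} for d in your_list]
print(new_list)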
