Neatest ways to extract pairs from pandas DataFrame - python

Given the following pandas DataFrame:
mydf = pd.DataFrame([{'Campaign': 'Campaign X', 'Date': '24-09-2014', 'Spend': 1.34, 'Clicks': 241}, {'Campaign': 'Campaign Y', 'Date': '24-08-2014', 'Spend': 2.89, 'Clicks': 12}, {'Campaign': 'Campaign X', 'Date': '24-08-2014', 'Spend': 1.20, 'Clicks': 1}, {'Campaign': 'Campaign Z2', 'Date': '24-08-2014', 'Spend': 4.56, 'Clicks': 13}] )
I wish to extract Campaign-Spend pairs, first summing the spend when a campaign has multiple entries (as is the case for Campaign X in this example). With minimal pandas knowledge, I find myself doing:
summed = mydf.groupby('Campaign', as_index=False).sum()
campaignspends = zip(summed['Campaign'], summed['Spend'])
campaignspends = dict(campaignspends)
I'm guessing pandas or python itself has a one-liner for this?

You can pull out the column of interest from a groupby object using ["Spend"]:
>>> campaignspends
{'Campaign Y': 2.8900000000000001, 'Campaign Z2': 4.5599999999999996, 'Campaign X': 2.54}
>>> mydf.groupby("Campaign")["Spend"].sum()
Campaign
Campaign X 2.54
Campaign Y 2.89
Campaign Z2 4.56
Name: Spend, dtype: float64
>>> mydf.groupby("Campaign")["Spend"].sum().to_dict()
{'Campaign Y': 2.8900000000000001, 'Campaign Z2': 4.5599999999999996, 'Campaign X': 2.54}
Here I've added the to_dict() call (wrapping the Series in dict(...) also works), but depending on what you plan to do next, you might not need to convert the Series to a dictionary at all. For example,
>>> s = mydf.groupby("Campaign")["Spend"].sum()
>>> s["Campaign Z2"]
4.5599999999999996
works as you'd expect.
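A minimal self-contained sketch of the one-liner, rebuilding mydf from the question. Selecting ["Spend"] before .sum() also means the string Date column is never summed; depending on the pandas version, summing the whole group would either drop that column or concatenate its strings.

```python
import pandas as pd

mydf = pd.DataFrame([
    {'Campaign': 'Campaign X', 'Date': '24-09-2014', 'Spend': 1.34, 'Clicks': 241},
    {'Campaign': 'Campaign Y', 'Date': '24-08-2014', 'Spend': 2.89, 'Clicks': 12},
    {'Campaign': 'Campaign X', 'Date': '24-08-2014', 'Spend': 1.20, 'Clicks': 1},
    {'Campaign': 'Campaign Z2', 'Date': '24-08-2014', 'Spend': 4.56, 'Clicks': 13},
])

# Group by campaign, keep only the Spend column, sum per group, then convert
# the resulting Series (index = campaign, values = total spend) to a dict.
spend_by_campaign = mydf.groupby('Campaign')['Spend'].sum().to_dict()
print(spend_by_campaign)
```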

Related

How to find a single entry in a Python dictionary rather than a whole line?

I'm looking for a way to return a single entry from a dictionary (or similar). For example, I have a dictionary that looks something like this:
StockTable = {
"Code 1": ["Description 1", 100, 1.25],
"Code 2": ["Description 2", 200, 2.25],
}
For reference the columns are Code / Description / Price / Weight
I want to be able to use single aspects of that data (say description), in multiple ways, elsewhere; for example:
TreeView.insert("", END, values=(Value1, StockTable["Code 1"], Value3))
In the above I only want to carry over the Description. Is there a way of indexing a cell within a dictionary and returning a single value?
Thanks
Use a better dictionary.
>>> StockTable = {k: dict(zip(('Description', 'Price', 'Weight'), v)) for k, v in StockTable.items()}
>>> StockTable
{'Code 1': {'Description': 'Description 1', 'Price': 100, 'Weight': 1.25}, 'Code 2': {'Description': 'Description 2', 'Price': 200, 'Weight': 2.25}}
>>> StockTable['Code 1']['Description']
'Description 1'
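If you would rather keep the original list-valued dictionary, plain positional indexing already returns a single cell. The sketch below shows that next to the dict-of-dicts conversion from this answer; stock_rows and FIELDS are illustrative names, not taken from the question.

```python
# Original layout from the question: code -> [description, price, weight]
stock_rows = {
    "Code 1": ["Description 1", 100, 1.25],
    "Code 2": ["Description 2", 200, 2.25],
}

# Option 1: index the inner list by position (0 = Description).
description_by_position = stock_rows["Code 1"][0]

# Option 2: convert each row to a dict so fields can be addressed by name.
FIELDS = ("Description", "Price", "Weight")  # hypothetical name for the column labels
stock_table = {code: dict(zip(FIELDS, row)) for code, row in stock_rows.items()}
description_by_name = stock_table["Code 1"]["Description"]

print(description_by_position, description_by_name)
```

In the TreeView call, either expression can be passed as the middle value in place of the whole row.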
... or pandas.
>>> import pandas as pd
>>> df = pd.DataFrame(StockTable).T
>>> df
Description Price Weight
Code 1 Description 1 100 1.25
Code 2 Description 2 200 2.25
>>> df.at['Code 1', 'Description']
'Description 1'
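A variant that skips the intermediate dict-of-dicts: DataFrame.from_dict with orient='index' and explicit column names builds the same table straight from the list-valued dictionary. This is a minimal sketch; unlike the transposed version, it should keep a separate dtype per column (integers for Price, floats for Weight) instead of object.

```python
import pandas as pd

stock_rows = {
    "Code 1": ["Description 1", 100, 1.25],
    "Code 2": ["Description 2", 200, 2.25],
}

# Each key becomes a row label; each list is spread across the named columns.
df = pd.DataFrame.from_dict(
    stock_rows, orient="index", columns=["Description", "Price", "Weight"]
)

# .at is the scalar accessor for a single (row label, column label) pair.
print(df.at["Code 1", "Description"])
```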

Columns with mixed datatype to be saved as str and jsonb in postgres with python

I need your advice, but please don't be horrified by the code below.
Situation: I call an API to retrieve the sales information. The response looks like the following:
[{'Id': 123,
'Currency': 'USD',
'SalesOrder': [{'Price': 2,
'Subitem': 1,
'Discount': 0.0,
'OrderQuantity': 1.0},
{'Price': 3,
'Subitem': 2,
'Discount': 0.0,
'OrderQuantity': 2.0}],
'Tax': 18},
{'Id': 124,
'Currency': 'USD',
'SalesOrder': [{'Price': 2,
'Subitem': 1,
'Discount': 0.0,
'OrderQuantity': 1.0},
{'Price': 3,
'Subitem': 2,
'Discount': 0.0,
'OrderQuantity': 2.0}],
'Tax': 18}]
Expected outcome: 1. 'Id' is a stand-alone column and 'Currency' is a stand-alone column. 2. As there could be a varying number of 'Subitems', I thought of storing 'SalesOrder' as a JSON blob in Postgres and then querying the JSON column. Thus, the end result is a Postgres table with three columns.
import pandas as pd
from sqlalchemy import create_engine

# df holds the API response (the list of dicts shown above)
ids = []
currency = []
salesOrder = []
#extracting values
for item in df:
    ids.append(item.get("Id"))
    currency.append(item.get("Currency"))
    salesOrder.append(item.get("SalesOrder"))
#converting to a pandas df
df_id = pd.DataFrame(ids)
df_currency = pd.DataFrame(currency)
df_sales_order = pd.DataFrame(salesOrder)
#concatenating cols
df_row = pd.concat([df_id, df_currency, df_sales_order], axis=1)
#outputting results to a table
engine = create_engine('postgresql+psycopg2://username:password@endpoint/db')
with engine.connect() as conn, conn.begin():
    df_row.to_sql('tbl', con=conn, schema='schema', if_exists='append', index=False)
Doubts: 1. If I run the code above, the 'SalesOrder' list gets split into X columns. Why? How can I avoid that and keep it together?
2. I am not sure how to handle the mixture of data types (str + jsonb). Should I load the non-JSON columns first and then update the table with the JSON column?
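Instead of doing df_sales_order = pd.DataFrame(salesOrder), just create a column in df_currency, e.g. df_currency["sales_order"], and fill it with the item.get("SalesOrder") values. pd.DataFrame treats each inner list of sub-item dicts as a row, which is why every order gets spread over several columns; assigning the lists to a single column keeps each order in one cell. This should solve the issue.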

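A minimal sketch of that suggestion, which also touches the mixed-type doubt: build one DataFrame with a column per field, keep each SalesOrder list in a single cell, and tell to_sql which column should be jsonb through its dtype mapping, so everything can be loaded in one go. It assumes SQLAlchemy's PostgreSQL JSONB type serializes the Python lists on insert; write_to_postgres, the table and schema names, and the URL are placeholders, and the second sample order is shortened to one sub-item to show a varying count.

```python
import pandas as pd

# Sample API response based on the question (second order shortened).
response = [
    {"Id": 123, "Currency": "USD",
     "SalesOrder": [{"Price": 2, "Subitem": 1, "Discount": 0.0, "OrderQuantity": 1.0},
                    {"Price": 3, "Subitem": 2, "Discount": 0.0, "OrderQuantity": 2.0}],
     "Tax": 18},
    {"Id": 124, "Currency": "USD",
     "SalesOrder": [{"Price": 2, "Subitem": 1, "Discount": 0.0, "OrderQuantity": 1.0}],
     "Tax": 18},
]

# One row per order; the whole SalesOrder list stays in a single object cell.
df_row = pd.DataFrame({
    "id": [item.get("Id") for item in response],
    "currency": [item.get("Currency") for item in response],
    "sales_order": [item.get("SalesOrder") for item in response],
})


def write_to_postgres(frame, url):
    """Hypothetical helper: append frame to schema.tbl with sales_order stored as jsonb."""
    # Imported lazily so the DataFrame part above runs without SQLAlchemy installed.
    from sqlalchemy import create_engine
    from sqlalchemy.dialects.postgresql import JSONB

    engine = create_engine(url)
    with engine.begin() as conn:
        frame.to_sql("tbl", con=conn, schema="schema", if_exists="append",
                     index=False, dtype={"sales_order": JSONB})


print(df_row)
```

Usage would be something like write_to_postgres(df_row, 'postgresql+psycopg2://username:password@endpoint/db'); the id and currency columns get their types inferred as usual.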
Merge dict in for loop

I want to create a new data frame using a for loop.
for (name, series) in Quantitative.items():  # .iteritems() was removed in pandas 2.0
    data = {'Name': pd.Series(name), 'Count': pd.Series(Quantitative[name].size),
            '% Miss.': pd.Series((sum(Quantitative[name].isnull().values.ravel()) / Quantitative[name].size) * 100),
            'Card.': pd.Series(Quantitative[name].unique().size), 'Min': pd.Series(Quantitative[name].min()),
            '1st Qrt.': pd.Series(Quantitative[name].quantile(0.25)), 'Mean': pd.Series(Quantitative[name].mean()),
            'Median': pd.Series(Quantitative[name].median()), '3rd Qrt.': pd.Series(Quantitative[name].quantile(0.75)),
            'Max': pd.Series(Quantitative[name].max()), 'Std.': pd.Series(Quantitative[name].std())}
    dt = pd.DataFrame(data)
    print(pd.DataFrame(dt))
However, this creates a separate one-row DataFrame for each column. How can I merge them together?
You don't need to create a DataFrame for each column and then merge them. Use a list comprehension to build a list of rows and create the DataFrame from it:
rows = [[
    name, series.size, (sum(series.isnull().values.ravel()) / series.size) * 100,
    series.unique().size, series.min(), series.quantile(0.25), series.mean(),
    series.median(), series.quantile(0.75), series.max(), series.std()
] for name, series in Quantitative.items()]
dt = pd.DataFrame(
    rows,
    columns=['Name', 'Count', '% Miss.', 'Card.', 'Min', '1st Qrt.',
             'Mean', 'Median', '3rd Qrt.', 'Max', 'Std.'])
print(dt)
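A runnable version of this answer on a small, made-up Quantitative frame (the column names and values are illustrative only). It iterates with .items(), since iteritems() no longer exists in pandas 2.0, and uses series.isnull().mean() * 100 as a shorter equivalent of the missing-percentage formula.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data standing in for the question's Quantitative frame.
Quantitative = pd.DataFrame({
    "age": [23, 35, np.nan, 41],
    "income": [1200.0, 3400.0, 2500.0, 1800.0],
})

# One list per column, in the same order as the column labels below.
rows = [[
    name, series.size, series.isnull().mean() * 100,
    series.unique().size, series.min(), series.quantile(0.25), series.mean(),
    series.median(), series.quantile(0.75), series.max(), series.std()
] for name, series in Quantitative.items()]

dt = pd.DataFrame(
    rows,
    columns=['Name', 'Count', '% Miss.', 'Card.', 'Min', '1st Qrt.',
             'Mean', 'Median', '3rd Qrt.', 'Max', 'Std.'])
print(dt)
```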

How to create a dict of dicts from pandas dataframe?

I have a dataframe df
id price date zipcode
u734 8923944 2017-01-05 AERIU87
uh72 9084582 2017-07-28 BJDHEU3
u029 299433 2017-09-31 038ZJKE
I want to create a dictionary with the following structure
{'id': xxx, 'data': {'price': xxx, 'date': xxx, 'zipcode': xxx}}
What I have done so far
ids = df['id']
prices = df['price']
dates = df['date']
zips = df['zipcode']
d = {'id':idx, 'data':{'price':p, 'date':d, 'zipcode':z} for idx,p,d,z in zip(ids,prices,dates,zips)}
>>> SyntaxError: invalid syntax
but I get the error above.
What would be the correct way to do this, using either a list comprehension or pandas .to_dict()?
Bonus points: what is the complexity of the algorithm, and is there a more efficient way to do this?
I'd suggest the list comprehension.
v = df.pop('id')
data = [
{'id' : i, 'data' : j}
for i, j in zip(v, df.to_dict(orient='records'))
]
Or a compact version,
data = [dict(id=i, data=j) for i, j in zip(df.pop('id'), df.to_dict(orient='records'))]
Note that, if you're popping id inside the expression, it has to be the first argument to zip.
print(data)
[{'data': {'date': '2017-01-05',
'price': 8923944,
'zipcode': 'AERIU87'},
'id': 'u734'},
{'data': {'date': '2017-07-28',
'price': 9084582,
'zipcode': 'BJDHEU3'},
'id': 'uh72'},
{'data': {'date': '2017-09-31',
'price': 299433,
'zipcode': '038ZJKE'},
'id': 'u029'}]
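A pandas-only variant, sketched here under the assumption that the ids are unique (to_dict(orient='index') refuses a non-unique index): set id as the index, let to_dict build the inner dicts, then wrap them. Unlike df.pop, it leaves df unchanged. On the bonus question, both versions visit each cell once, so the work should grow roughly linearly with rows × columns.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["u734", "uh72", "u029"],
    "price": [8923944, 9084582, 299433],
    "date": ["2017-01-05", "2017-07-28", "2017-09-31"],
    "zipcode": ["AERIU87", "BJDHEU3", "038ZJKE"],
})

# orient="index" maps each index label (here the id) to a dict of the other columns.
by_id = df.set_index("id").to_dict(orient="index")

# Wrap each entry in the requested {'id': ..., 'data': {...}} shape, keeping row order.
data = [{"id": i, "data": row} for i, row in by_id.items()]
print(data)
```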

Numeric sort of list of dictionary objects

I am very new to python programming and have yet to buy a textbook on the matter (I am buying one from the store or Amazon today). In the meantime, can you help me with the following problem I have encountered?
I have a list of dictionary objects like this:
stock = [
{ 'date': '2012', 'amount': '1.45', 'type': 'one'},
{ 'date': '2012', 'amount': '1.4', 'type': 'two'},
{ 'date': '2011', 'amount': '1.35', 'type': 'three'},
{ 'date': '2012', 'amount': '1.35', 'type': 'four'}
]
I would like to sort the list by the date column and then by the amount column so that the sorted list looks like this:
stock = [
{ 'date': '2011', 'amount': '1.35', 'type': 'three'},
{ 'date': '2012', 'amount': '1.35', 'type': 'four'},
{ 'date': '2012', 'amount': '1.4', 'type': 'two'},
{ 'date': '2012', 'amount': '1.45', 'type': 'one'}
]
I now think I need to use sorted(), but as a beginner I am having difficulty understanding the concepts I see.
I tried this:
from operator import itemgetter
all_amounts = itemgetter("amount")
stock.sort(key = all_amounts)
but this resulted in a list that was sorted alphanumerically rather than numerically.
Can someone please tell me how to achieve this seemingly simple sort? Thank you!
Your sorting condition is too complicated for operator.itemgetter, because both fields have to be converted from strings to numbers before they are compared. You will have to use a lambda function:
stock.sort(key=lambda x: (int(x['date']), float(x['amount'])))
or
all_amounts = lambda x: (int(x['date']), float(x['amount']))
stock.sort(key=all_amounts)
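A quick check of the tuple key on the question's data. Since the question mentions sorted(), this sketch uses it: sorted() returns a new list and leaves stock as it was, whereas stock.sort() reorders the list in place and returns None.

```python
stock = [
    {'date': '2012', 'amount': '1.45', 'type': 'one'},
    {'date': '2012', 'amount': '1.4', 'type': 'two'},
    {'date': '2011', 'amount': '1.35', 'type': 'three'},
    {'date': '2012', 'amount': '1.35', 'type': 'four'},
]

# The key converts both fields to numbers and returns a tuple; tuples compare
# element by element, so date decides first and amount only breaks ties.
ordered = sorted(stock, key=lambda x: (int(x['date']), float(x['amount'])))

print([x['type'] for x in ordered])  # ['three', 'four', 'two', 'one']
```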
Start by converting your data into a proper format:
stock = [
{ 'date': int(x['date']), 'amount': float(x['amount']), 'type': x['type']}
for x in stock
]
Now stock.sort(key=itemgetter('date', 'amount')) will sort the list correctly, because the values themselves are numbers.
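A runnable sketch of this answer. The name raw for the original string-valued list is an assumption, used only to keep the input separate from the converted data.

```python
from operator import itemgetter

raw = [
    {'date': '2012', 'amount': '1.45', 'type': 'one'},
    {'date': '2012', 'amount': '1.4', 'type': 'two'},
    {'date': '2011', 'amount': '1.35', 'type': 'three'},
    {'date': '2012', 'amount': '1.35', 'type': 'four'},
]

# Convert once, up front, so the data itself carries numeric types.
stock = [
    {'date': int(x['date']), 'amount': float(x['amount']), 'type': x['type']}
    for x in raw
]

# With several names, itemgetter returns a (date, amount) tuple per item.
stock.sort(key=itemgetter('date', 'amount'))
print([(x['date'], x['amount']) for x in stock])
# [(2011, 1.35), (2012, 1.35), (2012, 1.4), (2012, 1.45)]
```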
As you appear to be new to programming, here's a word of general advice, if I may:
Proper data structure is 90 percent of success. Do not try to work around broken data by writing more code. Create a structure adequate to your task and write as little code as possible.
You can also use the fact that python's sort is stable:
stock.sort(key=lambda x: float(x["amount"]))
stock.sort(key=lambda x: int(x["date"]))
Since the items with the same key keep their relative positions when sorting (they're never swapped), you can build up a complicated sort by sorting multiple times.
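A small check of the two-pass approach. The less significant key (amount) is sorted first and the most significant key (date) last. Amount is converted with float, because int('1.45') would raise ValueError.

```python
stock = [
    {'date': '2012', 'amount': '1.45', 'type': 'one'},
    {'date': '2012', 'amount': '1.4', 'type': 'two'},
    {'date': '2011', 'amount': '1.35', 'type': 'three'},
    {'date': '2012', 'amount': '1.35', 'type': 'four'},
]

# Pass 1: secondary key.
stock.sort(key=lambda x: float(x["amount"]))
# Pass 2: primary key; items with equal dates keep their order from pass 1.
stock.sort(key=lambda x: int(x["date"]))

print([x["type"] for x in stock])  # ['three', 'four', 'two', 'one']
```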
