python pandas converting dataframe(s) to lists - python

I have an Excel file with 2 data frames: one for the score card and the other for the consolidation basis.
import pandas as pd
df_scr_crd = {'Subject': ['MATH', 'MATH', 'MATH', 'MATH', 'PSY', 'PSY', 'PSY', 'PSY'],
'SCR_STRT': [10, 20, 30, 99999, 'A', 'B', 'C', 'D'],
'POINTS': [100, 200, 300, 500, 10, 20, 30, 40]}
df_scr_crd_d = pd.DataFrame(df_scr_crd, columns = ['Subject', 'SCR_STRT', 'POINTS'])
df_scr_cns = {'Subject': ['MATH', 'PSY'],
'CNS': ['min', 'max']}
df_scr_cns_d = pd.DataFrame(df_scr_cns, columns = ['Subject', 'CNS'])
df_scr_crd_d
df_scr_cns_d
I want to generate lists/variable assignments from this data frame
The expected output is
MATH_df_scr_crd_bin = [10, 20, 30, 99999]
MATH_df_scr_crd_val = [100, 200, 300, 500]
PSY_df_scr_crd_bin = ['A', 'B', 'C', 'D']
PSY_df_scr_crd_val = [10, 20, 30, 40]
MATH_df_scr_cns = 'min'
PSY_df_scr_cns = 'max'
Is there an easy way to convert a data frame to lists?
Thanks in advance
Vittal

You can simply use .tolist() on the relevant series, e.g.:
>>> df_scr_crd_d.loc[df_scr_crd_d.Subject == 'MATH', 'SCR_STRT'].tolist()
[10, 20, 30, 99999]
>>> df_scr_crd_d.loc[df_scr_crd_d.Subject == 'MATH', 'POINTS'].tolist()
[100, 200, 300, 500]
For the whole dataframe, you can convert it to a dictionary keyed on the column names as follows:
>>> df_scr_crd_d.to_dict('list')
{'POINTS': [100, 200, 300, 500, 10, 20, 30, 40],
'SCR_STRT': [10, 20, 30, 99999, 'A', 'B', 'C', 'D'],
'Subject': ['MATH', 'MATH', 'MATH', 'MATH', 'PSY', 'PSY', 'PSY', 'PSY']}
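To get one pair of lists per subject without repeating the .loc filter by hand, a groupby loop is one option. The sketch below is only an illustration: it stores the results in plain dictionaries keyed by the variable names from the question rather than creating real variables, and the names lists_by_subject and cns_by_subject are made up for this example.

```python
import pandas as pd

df_scr_crd_d = pd.DataFrame({
    'Subject': ['MATH', 'MATH', 'MATH', 'MATH', 'PSY', 'PSY', 'PSY', 'PSY'],
    'SCR_STRT': [10, 20, 30, 99999, 'A', 'B', 'C', 'D'],
    'POINTS': [100, 200, 300, 500, 10, 20, 30, 40],
})
df_scr_cns_d = pd.DataFrame({'Subject': ['MATH', 'PSY'], 'CNS': ['min', 'max']})

# Hypothetical container: one entry per "<subject>_df_scr_crd_bin/val" name
lists_by_subject = {}
for subject, grp in df_scr_crd_d.groupby('Subject'):
    lists_by_subject[f'{subject}_df_scr_crd_bin'] = grp['SCR_STRT'].tolist()
    lists_by_subject[f'{subject}_df_scr_crd_val'] = grp['POINTS'].tolist()

# Hypothetical container: subject -> consolidation rule
cns_by_subject = dict(zip(df_scr_cns_d['Subject'], df_scr_cns_d['CNS']))

print(lists_by_subject)
print(cns_by_subject)
```

A dictionary lookup such as lists_by_subject['MATH_df_scr_crd_bin'] then plays the role of the variable in the expected output.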

Related

How to create conditional group-by with pandas?

Suppose I have a dataframe like this:
data = [['A', 'HIGH', 120, 200],
['A', 'MID', 350, 200],
['B', 'HIGH', 130, 100],
['B', 'HIGH', 70, 100],
['A', 'MID', 130, 200]]
df = pd.DataFrame(data, columns=['Category', 'Range', 'Total', 'Avg'])
Now I want to create a group-by such that when the category is A it groups by Category and Range, while when it is B it groups only by Category.
Is it possible to do?
Thanks!
Check the code below. It also works when B has multiple ranges.
import pandas as pd
import numpy as np
data = [['A', 'HIGH', 120, 200],
['A', 'MID', 350, 200],
['A', 'MID', 130, 200],
['B', 'HIGH', 130, 100],
['B', 'MID', 70, 100],
['B', 'MID', 70, 100]
]
df = pd.DataFrame(data, columns=['Category', 'Range', 'Total', 'Avg'])
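# Group A rows by Category+Range and all other rows by Category alone
group_col = np.where(df['Category'] == 'A', df['Category'] + df['Range'], df['Category'])
# Select columns with a list; indexing a groupby with a bare tuple of names fails on recent pandas
df[['Total_New', 'Avg_New']] = df.groupby(group_col)[['Total', 'Avg']].transform('sum')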
df
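The output image from this answer is not available here. As a rough check of what the grouped sums look like on the sample data, the sketch below builds the conditional key with Series.where instead of string concatenation. The names range_key and totals are introduced only for this illustration.

```python
import pandas as pd

data = [['A', 'HIGH', 120, 200],
        ['A', 'MID', 350, 200],
        ['A', 'MID', 130, 200],
        ['B', 'HIGH', 130, 100],
        ['B', 'MID', 70, 100],
        ['B', 'MID', 70, 100]]
df = pd.DataFrame(data, columns=['Category', 'Range', 'Total', 'Avg'])

# Keep Range for category A; every other row gets the same empty placeholder,
# so those rows are effectively grouped by Category alone.
range_key = df['Range'].where(df['Category'] == 'A', '')

totals = df.groupby(['Category', range_key])[['Total', 'Avg']].transform('sum')
print(totals)
```

On this data the two A/MID rows share the sum 480 for Total, while all three B rows share 270 regardless of their Range.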

How to create a pie-chart from pandas DataFrame?

I have a dataframe, with Count arranged in descending order, that looks something like this:
df = pd.DataFrame({'Topic': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M'],
'Count': [80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20]})
But with more than 50 rows.
I would like to create a pie chart for the top 10 topics, with the rest summed up and shown as a percentage under the label "Others". Is it possible to leave the labels off the individual slices and list them separately in a legend?
Thanking in anticipation
Use Series.where to replace Topic with 'Other' for rows outside the top N, aggregate with sum, then plot with Series.plot.pie:
import matplotlib.pyplot as plt
N = 10
df['Topic'] = df['Topic'].where(df['Count'].isin(df['Count'].nlargest(N)), 'Other')
s = df.groupby('Topic')['Count'].sum()
# plot the aggregated Series, not the original frame
pie = s.plot.pie(legend=False)
#https://stackoverflow.com/a/44076433/2901002
# multiply by 100 so the legend shows percentages rather than fractions
labels = [f'{topic}, {pct:0.1f}%' for topic, pct in zip(s.index, s / s.sum() * 100)]
plt.legend(bbox_to_anchor=(0.85, 1), loc='upper left', labels=labels)
You need to craft a new dataframe. Assuming your counts are sorted in descending order (if not, use df.sort_values(by='Count', ascending=False, inplace=True)):
TOP = 10
df2 = df.iloc[:TOP]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
other = pd.DataFrame({'Topic': ['Other'], 'Count': [df['Count'].iloc[TOP:].sum()]})
df2 = pd.concat([df2, other], ignore_index=True)
df2.set_index('Topic').plot.pie(y='Count', legend=False)
Percentages in the legend:
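N = 5
df2 = df.iloc[:N]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
other = pd.DataFrame({'Topic': ['Other'], 'Count': [df['Count'].iloc[N:].sum()]})
df2 = pd.concat([df2, other], ignore_index=True)
df2.set_index('Topic').plot.pie(y='Count', legend=False)
# convert counts to percentages of the total for the legend labels
pct = df2['Count'] / df2['Count'].sum() * 100
leg = plt.legend(labels=[f'{p:0.1f}%' for p in pct])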
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Topic': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M'],
'Count': [80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20]})
df.index = df.Topic
plot = df.plot.pie(y='Count', figsize=(5, 5))
plt.show()
See the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.pie.html
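Putting the pieces of the question together (top N topics, the remainder summed into one slice, no labels on the slices, percentages in a legend), a self-contained sketch could look like the following. It assumes current pandas and matplotlib; the helper name top_n_with_other is invented for this example, and the Agg backend is chosen only so that the script runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd


def top_n_with_other(df, n, label='Other'):
    """Hypothetical helper: keep the n largest counts, sum the rest into one row."""
    ranked = df.sort_values('Count', ascending=False)
    top = ranked.iloc[:n]
    other = pd.DataFrame({'Topic': [label], 'Count': [ranked['Count'].iloc[n:].sum()]})
    return pd.concat([top, other], ignore_index=True)


df = pd.DataFrame({'Topic': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M'],
                   'Count': [80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20]})

pie_df = top_n_with_other(df, 10)
pct = pie_df['Count'] / pie_df['Count'].sum() * 100

fig, ax = plt.subplots(figsize=(6, 6))
# No labels argument, so the slices themselves stay unlabelled
wedges, _ = ax.pie(pie_df['Count'])
legend_labels = [f'{topic}: {p:.1f}%' for topic, p in zip(pie_df['Topic'], pct)]
ax.legend(wedges, legend_labels, bbox_to_anchor=(1, 1), loc='upper left')
# fig.savefig(...) or plt.show() would display the result
```

Passing the wedge handles to ax.legend explicitly keeps the legend entries tied to the slices in the same order as the rows of pie_df.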

Reshaping dataframe with multiple columns to row groups

Input dataframe:
df = pd.DataFrame({'Loc': ['Hyd', 'Hyd','Bang','Bang'],
'Item': ['A', 'B', 'A', 'B'],
'Month' : ['May','May','June','June'],
'Sales': [100, 100, 200, 200],
'Values': [1000, 1000, 2000, 2000]
})
My expected output
df = pd.DataFrame({'Loc': ['Hyd', 'Hyd','Hyd','Hyd','Bang','Bang','Bang','Bang'],
'Item': ['A', 'A', 'B', 'B','A', 'A', 'B', 'B'],
'VAR' : ['Sales','Values','Sales','Values','Sales','Values','Sales','Values'],
'May': [100, 1000, 100, 1000, 100, 1000, 100, 1000],
'June': [200, 2000, 200, 2000, 200, 2000, 200, 2000]
})
I have tried multiple solutions using melt and pivot, but nothing seems to work. I'm not sure what I am missing.
Here's my code
dem.melt(['Part','IBU','Date1']).pivot_table(index=['Part','IBU','variable'],columns=['Date1'])
Any help would be much appreciated
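You can use the melt and pivot_table functions in pandas. First melt Sales and Values into long format:
df_melted = pd.melt(df, id_vars=["Loc", "Item", "Month"], value_vars=["Sales", "Values"])
This yields one row per (Loc, Item, Month, variable), with the numbers in a "value" column. Then pivot Month back into columns:
df_pivot = df_melted.pivot_table(index=["Loc", "Item", "variable"], columns="Month", values="value")
The result is indexed by Loc, Item and variable, with one column per month.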
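The result images are missing from this page, so here is a runnable sketch of the same melt-then-pivot idea that also flattens the result into ordinary columns. The names long_df and wide are introduced for illustration, and aggfunc='sum' is an assumption (the sample data has no duplicate keys, so any aggregation gives the same numbers). Note that with the sample input, Hyd only has May rows and Bang only has June rows, so the other month comes out as NaN rather than the fully populated table in the question.

```python
import pandas as pd

df = pd.DataFrame({'Loc': ['Hyd', 'Hyd', 'Bang', 'Bang'],
                   'Item': ['A', 'B', 'A', 'B'],
                   'Month': ['May', 'May', 'June', 'June'],
                   'Sales': [100, 100, 200, 200],
                   'Values': [1000, 1000, 2000, 2000]})

# Long format: one row per (Loc, Item, Month, VAR)
long_df = df.melt(id_vars=['Loc', 'Item', 'Month'],
                  value_vars=['Sales', 'Values'],
                  var_name='VAR', value_name='value')

# Wide format: months become columns again
wide = (long_df
        .pivot_table(index=['Loc', 'Item', 'VAR'], columns='Month',
                     values='value', aggfunc='sum')
        .reset_index())
wide.columns.name = None                             # drop the leftover 'Month' axis label
wide = wide[['Loc', 'Item', 'VAR', 'May', 'June']]   # calendar order instead of alphabetical
print(wide)
```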

Merge function will only work for ordered list

I have these 2 lists as input:
list1 = [['A', 14, 'I', 10, 20], ['B', 15, 'S', 30, 40], ['C', 16, 'F', 50, 60]]
list2 = [['A', 14, 'Y', 0, 200], ['B', 15, 'M', 0, 400], ['C', 17, 'G', 0, 600]]
and my desired output will be this:
finalList = [['A', 14, 'Y', 10, 200], ['B', 15, 'M', 30, 400], ['C', 16, 'F', 50, 60],['C', 17, 'G', 0, 600]]
Using this function:
def custom_merge(list1, list2):
    finalList = []
    for sub1, sub2 in zip(list1, list2):
        if sub1[1]==sub2[1]:
            out = sub1.copy()
            out[2] = sub2[2]
            out[4] = sub2[4]
            finalList.append(out)
        else:
            finalList.append(sub1)
            finalList.append(sub2)
    return finalList
I do get my desired output, but what if I swap two items (list2[1] and list2[2]) so that list2 becomes:
list2 = [['A', 14, 'Y', 0, 200], ['C', 17, 'G', 0, 600], ['B', 15, 'M', 0, 400]]
Then the output will be this:
[['A', 14, 'Y', 10, 200], ['B', 15, 'S', 30, 40], ['C', 17, 'G', 0, 600], ['C', 16, 'F', 50, 60], ['B', 15, 'M', 0, 400]]
(notice the extra ['B', 15, 'M', 0, 400])
What do I have to modify in my function to get my first desired output when the sublists are in a different order? I use Python 3. Thank you!
LATER EDIT:
Merge rules:
When list1[listindex][1] == list2[listindex][1] (e.g. 14 == 14), replace elements 2 and 4 of the list1 sublist with list2's values (e.g. 'Y' and 200). Otherwise, add the unmatched sublist from list2 as it is (as in my desired output), and also keep the list1 sublists that aren't matched (e.g. ['C', 16, 'F', 50, 60]).
Note that list1 and list2 can have different lengths (list1 can have more sublists than list2, or vice versa).
EDIT.2
I found this:
def combine(list1,list2):
    combined_list = list1 + list2
    final_dict = {tuple(i[:2]):tuple(i[2:]) for i in combined_list}
    merged_list = [list(k) + list (final_dict[k]) for k in final_dict]
    return merged_list
^^ That could work, still testing!
You can sort the lists by the first element in the sublists before merging them.
def custom_merge(list1, list2):
    finalList = []
    for sub1, sub2 in zip(sorted(list1), sorted(list2)):
        if sub1[1]==sub2[1]:
            out = sub1.copy()
            out[2] = sub2[2]
            out[4] = sub2[4]
            finalList.append(out)
        else:
            finalList.append(sub1)
            finalList.append(sub2)
    return finalList
tests:
list1 = [['A', 14, 'I', 10, 20], ['B', 15, 'S', 30, 40], ['C', 16, 'F', 50, 60]]
list2 = [['A', 14, 'Y', 0, 200], ['C', 17, 'G', 0, 600], ['B', 15, 'M', 0, 400]]
custom_merge(list1, list2)
# returns:
[['A', 14, 'Y', 10, 200],
['B', 15, 'M', 30, 400],
['C', 16, 'F', 50, 60],
['C', 17, 'G', 0, 600]]
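Sorting still pairs items by position, so it relies on both lists lining up once sorted. An alternative that does not depend on order or on equal lengths is to index list1 by the value being compared (element 1) and then walk list2. This is only a sketch: merge_by_key is a name made up here, and it assumes the key at index 1 is unique within each list.

```python
def merge_by_key(list1, list2, key_index=1):
    """Hypothetical helper: merge rows on row[key_index], independent of order."""
    merged = {}
    # Start from copies of the list1 rows, keyed by the compared element
    for row in list1:
        merged[row[key_index]] = row.copy()
    for row in list2:
        key = row[key_index]
        if key in merged:
            # Match: take elements 2 and 4 from list2, keep the rest from list1
            merged[key][2] = row[2]
            merged[key][4] = row[4]
        else:
            # No match: add the list2 row as it is
            merged[key] = row.copy()
    # Dicts keep insertion order (Python 3.7+): list1 rows first, then new list2 rows
    return list(merged.values())


list1 = [['A', 14, 'I', 10, 20], ['B', 15, 'S', 30, 40], ['C', 16, 'F', 50, 60]]
list2 = [['A', 14, 'Y', 0, 200], ['C', 17, 'G', 0, 600], ['B', 15, 'M', 0, 400]]
result = merge_by_key(list1, list2)
print(result)
```

Because the lookup is by key rather than by position, extra rows in either list are simply kept, which covers the different-length case mentioned in the question.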

Join operation of 2 lists of lists based on index

I have these 2 lists of lists:
list1 = [['A', 14, 'I', 10, 20], ['B', 15, 'S', 30, 40], ['C', 16, 'F', 50, 60]]
list2 = [['A', 14, 'Y', 0, 200], ['B', 15, 'M', 0, 400], ['C', 17, 'G', 0, 600]]
(this is just a sample with only 3 lists, I have more lists but they are on the exact same format and apply same rules)
And this will be my desired output:
finalList = [['A', 14, 'Y', 10, 200], ['B', 15, 'M', 30, 400], ['C', 16, 'F', 50, 60],['C', 17, 'G', 0, 600]]
This is the rule how I compute finalList:
When list1[listindex][1] == list2[listindex][1] (e.g. 14 == 14), replace elements 2 and 4 of the list1 sublist with list2's values (e.g. 'Y' and 200). Otherwise, add the unmatched sublist from list2 as it is (as in my desired output), and also keep the list1 sublists that aren't matched (e.g. ['C', 16, 'F', 50, 60]).
How can I do this in a Python 3 function? I would like a simple and straightforward function for this. Thank you so much for your time!
You can apply all of your rules using if statements in a function.
def custom_merge(list1, list2):
    finalList = []
    for sub1, sub2 in zip(list1, list2):
        if sub1[1]==sub2[1]:
            out = sub1.copy()
            out[2] = sub2[2]
            out[4] = sub2[4]
            finalList.append(out)
        else:
            finalList.append(sub1)
            finalList.append(sub2)
    return finalList
To work on the two lists simultaneously you can use zip().
For example:
for value in zip(list1, list2):
    print (value[0], value[1])
will return:
['A', 14, 'I', 10, 20] ['A', 14, 'Y', 0, 200]
['B', 15, 'S', 30, 40] ['B', 15, 'M', 0, 400]
['C', 16, 'F', 50, 60] ['C', 17, 'G', 0, 600]
so using zip you can work on both your lists at the same time.
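One thing to keep in mind is that zip() stops at the end of the shorter input, so rows beyond that point are silently dropped. If the lists can differ in length, itertools.zip_longest from the standard library pads the shorter one instead. A small sketch, with the shortened list2 chosen purely for illustration:

```python
from itertools import zip_longest

list1 = [['A', 14, 'I', 10, 20], ['B', 15, 'S', 30, 40], ['C', 16, 'F', 50, 60]]
list2 = [['A', 14, 'Y', 0, 200]]  # deliberately shorter than list1

# zip() yields only as many pairs as the shorter list has items
pairs = list(zip(list1, list2))

# zip_longest() keeps going and fills the missing side with None
padded = list(zip_longest(list1, list2))

print(len(pairs), len(padded))
for sub1, sub2 in padded:
    print(sub1, sub2)
```

Code that consumes the padded pairs has to handle the None placeholder explicitly before indexing into it.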
Here's one approach that converts the lists into a dict, and takes advantage of the fact that overlapping items from list2 will just overwrite their list1 counterparts:
combined_list = list1 + list2
final_dict = {tuple(i[:2]):tuple(i[2:]) for i in combined_list}
> {('A', 14): ('Y', 0, 200),
('B', 15): ('M', 0, 400),
('C', 16): ('F', 50, 60),
('C', 17): ('G', 0, 600)}
merged_list = [list(k) + list (final_dict[k]) for k in final_dict]
> [['C', 16, 'F', 50, 60],
['B', 15, 'M', 0, 400],
['C', 17, 'G', 0, 600],
['A', 14, 'Y', 0, 200]]
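If the ordering of the list is important, you can just sort at the end or use an OrderedDict to create the merge in the first place (on Python 3.7+ a plain dict already preserves insertion order). Note that this approach takes every value after the key from list2, including index 3 (0 rather than 10 for 'A'), so it differs slightly from the desired output.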
Here's one way to do it using a list comprehension:
lst = [i for x, y in zip(list1, list2)
       for i in (([*x[:2], y[2], x[3], y[4]],) if x[1] == y[1] else (x, y))]
print(lst)
# [['A', 14, 'Y', 10, 200], ['B', 15, 'M', 30, 400], ['C', 16, 'F', 50, 60], ['C', 17, 'G', 0, 600]]
The construction of the inner list for the matching case makes it slightly unreadable. It would be much more readable in a 'deflattened' form with a for loop:
def merge_lists(list1, list2):
    lst = []
    for x, y in zip(list1, list2):
        if x[1] == y[1]:
            lst.append([*x[:2], y[2], x[3], y[4]])
        else:
            lst.extend((x, y))
    return lst
Your "join" algorithm can work on each item independently and is straightforward:
def join_list(item1, item2):
    if item1[1] == item2[1]:
        result = item1[:]
        result[2] = item2[2]
        result[4] = item2[4]
        return (result,)
    return item1[:], item2[:]
This function always returns tuples: singletons in the case of equality or couples in the general case.
You can apply this function to your two lists list1 and list2 using the map() function. The result is a sequence of tuples (in Python 3, map() returns a lazy iterator rather than a list), so you need to flatten it:
list1 = [['A', 14, 'I', 10, 20], ['B', 15, 'S', 30, 40], ['C', 16, 'F', 50, 60]]
list2 = [['A', 14, 'Y', 0, 200], ['B', 15, 'M', 0, 400], ['C', 17, 'G', 0, 600]]
joined = [x
          for row in map(join_list, list1, list2)
          for x in row]
print(joined)
You get what you expect.
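The flattening step can also be written with itertools.chain.from_iterable from the standard library, which concatenates the tuples lazily. The sketch below repeats join_list from this answer so that it runs on its own:

```python
from itertools import chain


def join_list(item1, item2):
    # Same logic as above: one merged row on a match, both rows otherwise
    if item1[1] == item2[1]:
        result = item1[:]
        result[2] = item2[2]
        result[4] = item2[4]
        return (result,)
    return item1[:], item2[:]


list1 = [['A', 14, 'I', 10, 20], ['B', 15, 'S', 30, 40], ['C', 16, 'F', 50, 60]]
list2 = [['A', 14, 'Y', 0, 200], ['B', 15, 'M', 0, 400], ['C', 17, 'G', 0, 600]]

# map() pairs the rows positionally; chain.from_iterable flattens the tuples
joined = list(chain.from_iterable(map(join_list, list1, list2)))
print(joined)
```

Like the list comprehension, this still pairs rows by position, so it assumes both lists are in matching order.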
