How to append new columns to a pivot table? (pandas) - python

import numpy as np
import math
import pandas as pd
# making an example DataFrame
data = pd.DataFrame({'cust_id': ['c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c3', 'c3',
                                 'c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c3', 'c3'],
                     'step_seq': ['123', '123', '123', '123', '123', '123', '123', '123', '123',
                                  '456', '456', '456', '456', '456', '456', '456', '456', '456'],
                     'grade': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B',
                               'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'D'],
                     'pch_amt': [1, 2, 3, 4, 5, 6, 7, 8, 9,
                                 1, 2, 3, 4, 5, 6, 7, 8, 9]})
print(data)
data = pd.pivot_table(data, index='step_seq', columns='pch_amt', values='grade', aggfunc=np.sum)
a = data.iloc[0,:].tolist()
b = set(a)
len(b)
for i in range(len(data.index)):
    a = data.iloc[i,:].tolist()
    print(a)
    b = set(a)
    # Question 1 related
    print(b)
    print(len(b))
    data.loc[i,'Number of types']=len(b)
data
# Question 2 related
Thank you in advance for your help.
I have two questions about the code above:
Q1) Why does the second set contain 'nan', and how can I remove it?
Q2) How can I append 'Number of types' as a new column of the pivot table?
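One possible approach, offered as a sketch rather than an accepted answer: the loop addresses rows by position `i`, but the pivot table's index holds the `step_seq` labels ('123', '456'). `data.loc[i, 'Number of types']` therefore adds a new row labelled 0 along with the new column, and the NaN values that fill the gaps end up in the next set. Counting distinct grades per row with `nunique(axis=1)`, which skips NaN by default, avoids the loop altogether. The name `pivot` and the choice of `aggfunc='first'` (each cell holds exactly one grade in this sample) are assumptions made for this illustration.

```python
import pandas as pd

# Same sample data as in the question, written more compactly.
data = pd.DataFrame({
    'cust_id': ['c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c3', 'c3'] * 2,
    'step_seq': ['123'] * 9 + ['456'] * 9,
    'grade': ['A'] * 6 + ['B'] * 3 + ['C'] * 8 + ['D'],
    'pch_amt': list(range(1, 10)) * 2,
})

# One row per step_seq, one column per pch_amt, each cell holding a grade.
pivot = pd.pivot_table(data, index='step_seq', columns='pch_amt',
                       values='grade', aggfunc='first')

# Count the distinct grades in each row before adding the new column,
# so the new column never takes part in its own count.
type_counts = pivot.nunique(axis=1)
pivot['Number of types'] = type_counts

print(pivot)
```

Assigning a whole column by name keeps the existing index intact, so no extra row appears.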

Related

Function to concat an undefined number of dataframes

I'd like to create a function that takes an arbitrary number of arrays, turns them into data frames, concatenates them column-wise, and returns the merged dataframe.
Example:
# Suppose we have 3 arrays:
data1 = {
'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
'C': ['C1', 'C2', 'C3', 'C4', 'C5'],
}
data2 = {
'D': ['D1', 'D2', 'D3', 'D4', 'D5'],
'E': ['E1', 'E2', 'E3', 'E4', 'E5'],
'F': ['F1', 'F2', 'F3', 'F4', 'F5'],
}
data3 = {
'G': ['G1', 'G2', 'G3', 'G4', 'G5'],
'H': ['H1', 'H2', 'H3', 'H4', 'H5'],
'I': ['I1', 'I2', 'I3', 'I4', 'I5'],
}
# We could convert them into data frames using:
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
# And finally join them with:
df4 = pd.concat([df1, df2, df3], axis=1)
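The output dataframe would look like this:
    A   B   C   D   E   F   G   H   I
0  A1  B1  C1  D1  E1  F1  G1  H1  I1
1  A2  B2  C2  D2  E2  F2  G2  H2  I2
2  A3  B3  C3  D3  E3  F3  G3  H3  I3
3  A4  B4  C4  D4  E4  F4  G4  H4  I4
4  A5  B5  C5  D5  E5  F5  G5  H5  I5
I would like to create a function that can do this, but with an unspecified number of arrays, for example: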
func(data1, data2)
func(data1, data2, data3)
func(data1, data2, data...n)
This is a short answer using list comprehension, provided by Ch3steR.
It works and is a very compact answer.
def func(*args):
    d = [pd.DataFrame(dc) for dc in args]
    return pd.concat(d, axis=1)
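As a quick check of the variadic version, the sketch below calls it with two and then three dictionaries. The shortened sample dictionaries and the names `two` and `three` are made up for this illustration; the function body follows the compact answer above.

```python
import pandas as pd


def func(*args):
    # Build one dataframe per input dictionary, then join them side by side.
    frames = [pd.DataFrame(dc) for dc in args]
    return pd.concat(frames, axis=1)


# Hypothetical, shortened inputs (two rows each).
data1 = {'A': ['A1', 'A2'], 'B': ['B1', 'B2']}
data2 = {'C': ['C1', 'C2']}
data3 = {'D': ['D1', 'D2'], 'E': ['E1', 'E2']}

two = func(data1, data2)
three = func(data1, data2, data3)
print(two.shape, three.shape)
```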
In the end I went for a longer and slower solution, but one that I will easily understand when looking at my code in the future:
def add_df(*args):
    """Concatenate the columns of any number of CSV files into one dataframe."""
    frames = []  # named `frames` to avoid shadowing the built-in `list`
    for file in args:
        df = pd.read_csv(file)
        frames.append(df)
    return pd.concat(frames, axis=1)

Split data in list based on condition

I have the following list:
data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
I want to split the list such that
split1 = ['A1', 'C3', 'B2', 'A2', 'C2', 'A3', 'C1', 'B1', 'B3']
split2 = ['D3', 'D2', 'D1']
The constraint is that items with the same prefix (A, B, etc.) must not end up in different lists. The data can be split in any ratio, such as 50-50 or 80-20.
Here you go:
import numpy as np
data = np.array(['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3'])
# define some condition
condition = ['B', 'D']
boolean_selection = [np.any([ c in d for c in condition]) for d in data]
split1 = data[boolean_selection]
split2 = data[np.logical_not(boolean_selection)]
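The same idea can be written without NumPy by testing only the first character of each item, which matches prefixes rather than substrings. This is a minimal sketch, assuming the prefix is always a single leading character; `split_by_prefix` and `held_out_prefixes` are hypothetical names introduced here.

```python
def split_by_prefix(items, held_out_prefixes):
    """Split items into two lists so that each prefix lands in only one list."""
    held_out = set(held_out_prefixes)
    split1, split2 = [], []
    for item in items:
        # Assumption: the prefix is the first character of the item.
        if item[0] in held_out:
            split2.append(item)
        else:
            split1.append(item)
    return split1, split2


data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
split1, split2 = split_by_prefix(data, {'D'})
print(split1)
print(split2)
```

Choosing which prefixes go into `held_out_prefixes` controls the ratio; holding out `{'D'}` reproduces the split shown in the question.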

Concat DataFrame under specific condition

For the following dataframes, which are stored in a list of lists, I want to concatenate each inner list only if it holds more than one dataframe:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
fr_list = [[] for x in range(2)]
fr_list[0].append(df1)
fr_list[0].append(df1)
fr_list[1].append(df1)
for x in range(2):
    df = pd.concat(fr_list[x] if len(fr_list[x]) > 1) # <-- here is the problem
The syntax you want is probably:
...
df = pd.concat((fr for fr in fr_list[x] if len(fr) > 1))
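Note that the generator expression filters on the number of rows in each dataframe. If the intent was instead to concatenate only when an inner list holds more than one dataframe, a plain `if` around the call is one way to express it. This is a sketch under that assumption; the `results` list and the shortened `df1` are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']},
                   index=[0, 1, 2, 3])

# First inner list holds two frames, the second holds one.
fr_list = [[df1, df1], [df1]]

results = []
for frames in fr_list:
    if len(frames) > 1:
        # More than one frame: stack them row-wise.
        results.append(pd.concat(frames))
    else:
        # Zero or one frame: nothing to concatenate.
        results.append(frames[0] if frames else None)

print([len(r) for r in results])
```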

Convert dictionary to list with some data omitted

I'm trying to convert a dictionary of the format:
d = {'A1': ['a', 'a', 'A2 (A3-)', 'a'],
'B1': ['b', 'b', 'B2 (B3-)', 'b'],
'C1': ['c', 'c', 'C2 (C3)-', 'c']}
To a list of the form:
e = [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3'], ['C1', 'C2', 'C3']]
I know I should use regex to get the A2 and A3 data, but I'm having trouble putting this all together...
import re
regex = re.compile(r'(\w+) \((\w+)-.*')
# I suppose that you meant (C3-) and not (C3)-
d = {'A1': ['a', 'a', 'A2 (A3-)', 'a'], 'B1': ['b', 'b', 'B2 (B3-)', 'b'], 'C1': ['c', 'c', 'C2 (C3-)', 'c']}
out = []
for key, values_list in d.items():
    v2, v3 = regex.match(values_list[2]).groups()
    out.append([key, v2, v3])
print(out)
# [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3'], ['C1', 'C2', 'C3']]
Note that on Python 3.7+ a dict preserves insertion order, so the result follows the order of the keys in d. On older versions the order is arbitrary, because the dict is unordered there.
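If the `(C3)-` in the question is not a typo, a slightly looser pattern that stops right after the word inside the parentheses handles both spellings. This is a sketch, assuming Python 3.7+ (for key order) and that the first value matching the pattern is the one of interest; `PATTERN` and the inner search loop are additions for this illustration.

```python
import re

# Matches "X (Y" and ignores whatever follows, so "(C3-)" and "(C3)-" both work.
PATTERN = re.compile(r'(\w+) \((\w+)')

d = {'A1': ['a', 'a', 'A2 (A3-)', 'a'],
     'B1': ['b', 'b', 'B2 (B3-)', 'b'],
     'C1': ['c', 'c', 'C2 (C3)-', 'c']}

e = []
for key, values in d.items():
    for value in values:
        match = PATTERN.match(value)
        if match:
            # Keep the key plus the two captured words, then move to the next key.
            e.append([key, *match.groups()])
            break

print(e)
```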

How can I make a list from a list of tuples

I have this
d = \
[('a', {'b': 'c1', 'd': 'f1'}),
('a', {'bb': 'c2', 'dd': 'f2'}),
('a', {'bbb': 'c3', 'ddd': 'f3'})]
I want the output to look like this
['c1', 'f1', 'f2', 'c2', 'c3', 'f3']
I have tried this
In [51]: [a.values() for k,a in d]
Out[51]: [['c1', 'f1'], ['f2', 'c2'], ['c3', 'f3']]
I want to do that in the simplest and shortest possible way
>>> d = \
[('a', {'b': 'c1', 'd': 'f1'}),
('a', {'bb': 'c2', 'dd': 'f2'}),
('a', {'bbb': 'c3', 'ddd': 'f3'})]
>>> [y for x in d for y in x[1].values()]
['c1', 'f1', 'f2', 'c2', 'c3', 'f3']
You can use itertools.chain:
>>> d=[('a', {'b': 'c1', 'd': 'f1'}),
('a', {'bb': 'c2', 'dd': 'f2'}),
('a', {'bbb': 'c3', 'ddd': 'f3'})]
>>> from itertools import chain
>>> list(chain.from_iterable( x[1].values() for x in d ))
['c1', 'f1', 'f2', 'c2', 'c3', 'f3']
Just an alternative answer using reduce:
import operator
from functools import reduce  # reduce is not a built-in on Python 3
reduce(operator.add, (list(a.values()) for k, a in d))
Maybe not the best idea, but it works. Essentially equivalent to Blender's
sum([list(a.values()) for k, a in d], [])
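The outputs shown in this thread appear to come from a Python 2 session, where dict ordering was arbitrary. On Python 3.7+, dicts keep insertion order and `values()` returns a view rather than a list, so the approaches can be compared as in the sketch below. The variable names `flat_comprehension`, `flat_chain` and `flat_reduce` are introduced for this illustration.

```python
from functools import reduce
from itertools import chain
import operator

d = [('a', {'b': 'c1', 'd': 'f1'}),
     ('a', {'bb': 'c2', 'dd': 'f2'}),
     ('a', {'bbb': 'c3', 'ddd': 'f3'})]

# Nested comprehension: walk each tuple, then each value of its dict.
flat_comprehension = [value for _, mapping in d for value in mapping.values()]

# itertools.chain: lazily join the per-dict value views.
flat_chain = list(chain.from_iterable(mapping.values() for _, mapping in d))

# reduce: views do not support +, so each one is converted to a list first.
flat_reduce = reduce(operator.add, (list(mapping.values()) for _, mapping in d))

print(flat_comprehension)
```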
