How to append new columns to a pivot table? (pandas)

import numpy as np
import math
import pandas as pd

# making an example DataFrame
data = pd.DataFrame({'cust_id': ['c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c3', 'c3',
                                 'c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c3', 'c3'],
                     'step_seq': ['123', '123', '123', '123', '123', '123', '123', '123', '123',
                                  '456', '456', '456', '456', '456', '456', '456', '456', '456'],
                     'grade': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B',
                               'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'D'],
                     'pch_amt': [1, 2, 3, 4, 5, 6, 7, 8, 9,
                                 1, 2, 3, 4, 5, 6, 7, 8, 9]})
print(data)
data = pd.pivot_table(data, index='step_seq', columns='pch_amt', values='grade', aggfunc=np.sum)
a = data.iloc[0, :].tolist()
b = set(a)
len(b)
for i in range(len(data.index)):
    a = data.iloc[i, :].tolist()
    print(a)
    b = set(a)
    # Question 1 related
    print(b)
    print(len(b))
    data.loc[i, 'Number of types'] = len(b)
data
# Question 2 related
Before asking, thank you for all your help.
I have two questions, marked in the code above:
Q1) Why does the second set contain 'nan', and how can I remove it?
Q2) How can I append 'Number of types' as a column of the pivot table?
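A likely explanation for Q1, offered as a sketch rather than a confirmed answer: data.loc[i, 'Number of types'] uses the integer i as a label, but the pivot's index holds the strings '123' and '456'. pandas therefore adds a new row labeled 0 together with the new column, which is NaN for the existing rows, and the second pass of the loop picks that NaN up. The example below counts the distinct grades per row with nunique(axis=1), which skips NaN by default, and assigns the result as a column in one step. It assumes aggfunc="first" as a stand-in for np.sum (each cell holds a single grade in this data) and leaves out cust_id because the pivot does not use it.

```python
import pandas as pd

# Same shape as the example data in the question, without the unused cust_id column.
data = pd.DataFrame({
    "step_seq": ["123"] * 9 + ["456"] * 9,
    "grade": ["A"] * 6 + ["B"] * 3 + ["C"] * 8 + ["D"],
    "pch_amt": list(range(1, 10)) * 2,
})

# Assumption: aggfunc="first" replaces np.sum; every cell holds exactly one grade here.
pivot = pd.pivot_table(data, index="step_seq", columns="pch_amt",
                       values="grade", aggfunc="first")

# Count distinct grades per row BEFORE adding the new column,
# so the new column is not counted itself. nunique ignores NaN by default.
n_types = pivot.nunique(axis=1)
pivot["Number of types"] = n_types

print(pivot)
```

Because the assignment uses a whole Series aligned on the existing index, no extra rows are created.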

Related

Function to concat undefined number of dataframes

I'd like to create a function that accepts an arbitrary number of arrays, turns each into a DataFrame, concatenates them column-wise, and returns the merged DataFrame.
Example:
# Suppose we have 3 arrays:
data1 = {
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
    'C': ['C1', 'C2', 'C3', 'C4', 'C5'],
}
data2 = {
    'D': ['D1', 'D2', 'D3', 'D4', 'D5'],
    'E': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'F': ['F1', 'F2', 'F3', 'F4', 'F5'],
}
data3 = {
    'G': ['G1', 'G2', 'G3', 'G4', 'G5'],
    'H': ['H1', 'H2', 'H3', 'H4', 'H5'],
    'I': ['I1', 'I2', 'I3', 'I4', 'I5'],
}
# We could convert them into data frames using:
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
# And finally join them with:
df4 = pd.concat([df1, df2, df3], axis=1)
The output DataFrame would have the nine columns A through I side by side, with five rows.
I would like to create a function that can do this, but with an unspecified number of arrays, for example:
func(data1, data2)
func(data1, data2, data3)
func(data1, data2, data...n)

This is a short answer using list comprehension, provided by Ch3steR.
It works and is a very compact answer.
def func(*args):
    d = [pd.DataFrame(dc) for dc in args]
    return pd.concat(d, axis=1)
In the end I went for a longer and slower solution, but one that I will easily understand when looking at my code in the future. Note that it reads each argument as a CSV file path:
def add_df(*args):
    """Concatenate the columns of any number of DataFrames read from CSV files."""
    frames = []  # avoid shadowing the built-in name `list`
    for file in args:
        df = pd.read_csv(file)
        frames.append(df)
    return pd.concat(frames, axis=1)
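One detail worth seeing in action: pd.concat with axis=1 aligns the inputs on their index, so inputs with different row counts are padded with NaN instead of raising an error. The following sketch uses small made-up dictionaries (short and long are hypothetical names, not from the question) to show this.

```python
import pandas as pd

def func(*dicts):
    """Turn each dict into a DataFrame and join them column-wise."""
    frames = [pd.DataFrame(d) for d in dicts]
    return pd.concat(frames, axis=1)

# Hypothetical inputs with different lengths, for illustration only.
short = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
long = {"C": ["C1", "C2", "C3"]}

merged = func(short, long)
print(merged)
```

The third row of columns A and B is NaN because short has no row with index 2.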

Split data in list based on condition

I have the following list:
data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
I want to split the list such that:
split1 = ['A1', 'C3', 'B2', 'A2', 'C2', 'A3', 'C1', 'B1', 'B3']
split2 = ['D3', 'D2', 'D1']
The constraint is that items with the same prefix (A, B, etc.) must not end up in different lists. The data can be split in any ratio, such as 50-50 or 80-20.

Here you go:
import numpy as np
data = np.array(['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3'])
# define some condition
condition = ['B', 'D']
boolean_selection = [np.any([ c in d for c in condition]) for d in data]
split1 = data[boolean_selection]
split2 = data[np.logical_not(boolean_selection)]
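If the prefixes should be chosen from a target ratio instead of being listed by hand, one possible approach is to group the items by prefix and fill the first split group by group until it reaches the requested share. This is only a sketch: split_by_prefix and first_share are hypothetical names, and it assumes the prefix is the first character of each item. The achieved ratio is approximate because whole groups are never divided.

```python
from collections import defaultdict

def split_by_prefix(items, first_share=0.75):
    """Split items into two lists so that no prefix appears in both."""
    # Group items by their first character, keeping first-seen order.
    groups = defaultdict(list)
    for item in items:
        groups[item[0]].append(item)

    # Add whole groups to the first split until the target size is reached.
    target = first_share * len(items)
    first_prefixes = set()
    count = 0
    for prefix, members in groups.items():
        if count >= target:
            break
        first_prefixes.add(prefix)
        count += len(members)

    # Preserve the original item order within each split.
    split1 = [x for x in items if x[0] in first_prefixes]
    split2 = [x for x in items if x[0] not in first_prefixes]
    return split1, split2

data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
split1, split2 = split_by_prefix(data, first_share=0.75)
print(split1)
print(split2)
```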

Concat DataFrame under specific condition

The following DataFrames are stored in a list of lists. I want to concatenate each inner list only if it holds more than one DataFrame:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
fr_list = [[] for x in range(2)]
fr_list[0].append(df1)
fr_list[0].append(df1)
fr_list[1].append(df1)
for x in range(2):
    df = pd.concat(fr_list[x] if len(fr_list[x]) > 1) # <-- here is the problem

The syntax you want is probably:
...
df = pd.concat((fr for fr in fr_list[x] if len(fr) > 1))
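Note that the generator above filters on len(fr), which is the number of rows in each DataFrame. If the intent is instead to skip inner lists that hold fewer than two DataFrames, which is one reading of the question, a plain if statement around the call may be clearer. A minimal sketch under that assumption (results is a hypothetical name):

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])

fr_list = [[df1, df1], [df1]]

results = []
for frames in fr_list:
    # Only concatenate when there is more than one DataFrame to join.
    if len(frames) > 1:
        results.append(pd.concat(frames))

print(len(results))
print(results[0].shape)
```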

Convert dictionary to list with some data omitted

I'm trying to convert a dictionary of the format:
d = {'A1': ['a', 'a', 'A2 (A3-)', 'a'],
'B1': ['b', 'b', 'B2 (B3-)', 'b'],
'C1': ['c', 'c', 'C2 (C3)-', 'c']}
To a list of the form:
e = [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3'], ['C1', 'C2', 'C3']]
I know I should use regex to get the A2 and A3 data, but I'm having trouble putting this all together...

import re
regex = re.compile(r'(\w+) \((\w+)-.*')
# I suppose that you meant (C3-) and not (C3)-
d = {'A1': ['a', 'a', 'A2 (A3-)', 'a'], 'B1': ['b', 'b', 'B2 (B3-)', 'b'], 'C1': ['c', 'c', 'C2 (C3-)', 'c']}
out = []
for key, values_list in d.items():
    v2, v3 = regex.match(values_list[2]).groups()
    out.append([key, v2, v3])
print(out)
# [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3'], ['C1', 'C2', 'C3']]  (Python 3.7+)
Note that on Python versions before 3.7 the order is arbitrary, because dicts there are unordered; from 3.7 on, dicts keep insertion order.
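If the '(C3)-' entry in the question is not a typo, a slightly looser pattern can handle both spellings. The sketch below assumes the two wanted tokens are always the word before the opening parenthesis and the word right after it; the names pattern and e are chosen for illustration.

```python
import re

# Match "X (Y" and stop at the first non-word character after Y,
# so both "A2 (A3-)" and "C2 (C3)-" are accepted.
pattern = re.compile(r"(\w+) \((\w+)")

d = {'A1': ['a', 'a', 'A2 (A3-)', 'a'],
     'B1': ['b', 'b', 'B2 (B3-)', 'b'],
     'C1': ['c', 'c', 'C2 (C3)-', 'c']}

e = [[key, *pattern.match(values[2]).groups()] for key, values in d.items()]
print(e)
```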

How can I make a list from a list of tuples

I have this
d = \
[('a', {'b': 'c1', 'd': 'f1'}),
('a', {'bb': 'c2', 'dd': 'f2'}),
('a', {'bbb': 'c3', 'ddd': 'f3'})]
I want the output to look like this
['c1', 'f1', 'f2', 'c2', 'c3', 'f3']
I have tried this
In [51]: [a.values() for k,a in d]
Out[51]: [['c1', 'f1'], ['f2', 'c2'], ['c3', 'f3']]
I want to do that in the simplest and shortest possible way

>>> d = \
[('a', {'b': 'c1', 'd': 'f1'}),
('a', {'bb': 'c2', 'dd': 'f2'}),
('a', {'bbb': 'c3', 'ddd': 'f3'})]
>>> [y for x in d for y in x[1].values()]
['c1', 'f1', 'f2', 'c2', 'c3', 'f3']

You can use itertools.chain:
>>> d=[('a', {'b': 'c1', 'd': 'f1'}),
('a', {'bb': 'c2', 'dd': 'f2'}),
('a', {'bbb': 'c3', 'ddd': 'f3'})]
>>> from itertools import chain
>>> list(chain.from_iterable( x[1].values() for x in d ))
['c1', 'f1', 'f2', 'c2', 'c3', 'f3']
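The outputs shown above appear to come from an older Python, where dict ordering was arbitrary (hence 'f2' before 'c2'). On Python 3.7 and later, dicts keep insertion order, so both approaches return the values in the order they were written. A short sketch to confirm this, with variable names chosen for illustration:

```python
from itertools import chain

d = [('a', {'b': 'c1', 'd': 'f1'}),
     ('a', {'bb': 'c2', 'dd': 'f2'}),
     ('a', {'bbb': 'c3', 'ddd': 'f3'})]

# Nested list comprehension: walk every tuple, then every value of its dict.
flat_comprehension = [value for _, mapping in d for value in mapping.values()]

# itertools.chain: lazily concatenate the value views.
flat_chain = list(chain.from_iterable(mapping.values() for _, mapping in d))

print(flat_comprehension)
print(flat_chain)
```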

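Just an alternative answer using reduce (on Python 3, reduce lives in functools and values() must be turned into a list before adding):
import operator
from functools import reduce
reduce(operator.add, (list(a.values()) for k, a in d))
Maybe not the best idea, but it works. Essentially equivalent to Blender's
sum([list(a.values()) for k, a in d], [])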
