Append the same Series to each DataFrame column - python

I have the following dataframe:
data=pd.DataFrame(data=[[8,4,2,6,0],[3,4,5,6,7]],columns=["a","b","c","d","e"])
Output is like this:
a b c d e
0 8 4 2 6 0
1 3 4 5 6 7
I also have the following Series:
a=pd.Series([3,4])
I want to attach the Series (a) below each of the columns in data. I tried a few things with concat, but I never seem to get it right.
Expected result is:
a b c d e
0 8 4 2 6 0
1 3 4 5 6 7
2 3 3 3 3 3
3 4 4 4 4 4
Thanks in advance

You can do:
out=data.append(pd.concat([a]*data.shape[1],axis=1,keys=data.columns),ignore_index=True)
a b c d e
0 8 4 2 6 0
1 3 4 5 6 7
2 3 3 3 3 3
3 4 4 4 4 4
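Note that DataFrame.append was removed in pandas 2.0, so on current versions the same idea can be sketched with pd.concat alone (names data, a, out follow the question):

```python
import pandas as pd

data = pd.DataFrame([[8, 4, 2, 6, 0], [3, 4, 5, 6, 7]],
                    columns=["a", "b", "c", "d", "e"])
a = pd.Series([3, 4])

# Repeat the Series once per column, label the copies with the column
# names, then concatenate the result below the original frame.
tail = pd.concat([a] * data.shape[1], axis=1, keys=data.columns)
out = pd.concat([data, tail], ignore_index=True)
print(out)
```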

Here is a method using a for loop:
for _, y in a.items():
    data.loc[data.index[-1] + 1] = y
data
Out[106]:
a b c d e
0 8 4 2 6 0
1 3 4 5 6 7
2 3 3 3 3 3
3 4 4 4 4 4

pandas.DataFrame.apply
with pandas.Series.append
I like this because it's pretty
data.apply(pd.Series.append, to_append=a, ignore_index=True)
a b c d e
0 8 4 2 6 0
1 3 4 5 6 7
2 3 3 3 3 3
3 4 4 4 4 4
A golfier answer
data.apply(pd.Series.append, args=(a, 1))
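On pandas 2.x, where Series.append is gone, the same per-column idea can be sketched with apply plus pd.concat (this is a hedged equivalent, not the answer's original code):

```python
import pandas as pd

data = pd.DataFrame([[8, 4, 2, 6, 0], [3, 4, 5, 6, 7]],
                    columns=["a", "b", "c", "d", "e"])
a = pd.Series([3, 4])

# apply walks the frame column by column; each column Series gets the
# extra values concatenated onto its end.
out = data.apply(lambda s: pd.concat([s, a], ignore_index=True))
print(out)
```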
numpy.row_stack
Very similar to rafaelc's answer
pd.DataFrame(np.row_stack([
    data,
    a.to_numpy()[:, None].repeat(data.shape[1], axis=1)
]), columns=data.columns)
a b c d e
0 8 4 2 6 0
1 3 4 5 6 7
2 3 3 3 3 3
3 4 4 4 4 4

Using broadcast_to
data.append(pd.DataFrame(np.broadcast_to(a.to_frame(), (len(a), data.shape[1])), columns=data.columns), ignore_index=True)
a b c d e
0 8 4 2 6 0
1 3 4 5 6 7
2 3 3 3 3 3
3 4 4 4 4 4
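Since np.row_stack and DataFrame.append are both gone from current numpy/pandas, a minimal sketch of the broadcasting idea using np.vstack and np.broadcast_to:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[8, 4, 2, 6, 0], [3, 4, 5, 6, 7]],
                    columns=["a", "b", "c", "d", "e"])
a = pd.Series([3, 4])

# Broadcast the Series into a (len(a), n_cols) block, then stack it
# below the frame's values.
tail = np.broadcast_to(a.to_numpy()[:, None], (len(a), data.shape[1]))
out = pd.DataFrame(np.vstack([data.to_numpy(), tail]),
                   columns=data.columns)
print(out)
```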

Related

pandas - pivot column names as values

I have a dataframe like this:
1 2 3 4 5 6
Ax Ax Ax Ax Ax Ax
delta delta delta delta delta delta
0 6 4 1 5 3 2
1 6 1 5 3 2 4
2 6 1 5 3 2 4
3 6 1 5 3 2 4
4 6 1 5 3 2 4
5 6 1 5 3 2 4
6 6 1 5 3 2 4
7 6 1 5 3 2 4
8 6 1 5 3 2 4
9 6 1 5 3 2 4
I would like to pivot this so that the values become the columns and the columns become the values.
So the first two rows would become the following:
1 2 3 4 5 6
0 3 6 5 2 4 1
1 3 6 2 5 4 1
I hope that this makes sense. I have tried using pivot() and pivot_table() but it doesn't seem possible with that.
Try:
df1 = df.copy()
df1.columns = df1.columns.droplevel([1,2])
df1.stack().reset_index().pivot(index='level_0', columns=0)
Slice the columns by the sorted indices:
import numpy as np
import pandas as pd
cols = df.columns.get_level_values(0).to_numpy()
pd.DataFrame(cols[np.argsort(df.to_numpy(), 1)],
             columns=list(range(1, df.shape[1]+1)))
1 2 3 4 5 6
0 3 6 5 2 4 1
1 2 5 4 6 3 1
2 2 5 4 6 3 1
3 2 5 4 6 3 1
4 2 5 4 6 3 1
5 2 5 4 6 3 1
6 2 5 4 6 3 1
7 2 5 4 6 3 1
8 2 5 4 6 3 1
9 2 5 4 6 3 1
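A self-contained sketch of this argsort idea on a toy frame (single-level columns for brevity; the question's frame has extra 'Ax'/'delta' header levels that would first be reduced with get_level_values(0)):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's data, first two rows only.
df = pd.DataFrame([[6, 4, 1, 5, 3, 2],
                   [6, 1, 5, 3, 2, 4]], columns=[1, 2, 3, 4, 5, 6])

cols = df.columns.to_numpy()
# argsort gives, for each rank position, the index of the column that
# holds that value; indexing cols with it yields the column labels.
out = pd.DataFrame(cols[np.argsort(df.to_numpy(), axis=1)],
                   columns=range(1, df.shape[1] + 1))
print(out)
```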

categorize numerical series with python

I'm trying to work out how to assign categories from an increasing enumeration column. Here is an example of my dataframe:
df = pd.DataFrame({'A':[1,1,1,1,1,1,2,2,3,3,3,3,3],'B':[1,2,3,12,13,14,1,2,5,6,7,8,50]})
This produces:
df
Out[9]:
A B
0 1 1
1 1 2
2 1 3
3 1 12
4 1 13
5 1 14
6 2 1
7 2 2
8 3 5
9 3 6
10 3 7
11 3 8
12 3 50
The column B holds an increasing numerical series, but sometimes the series is interrupted and continues with other numbers, or starts again. My desired output is:
Out[11]:
A B C
0 1 1 1
1 1 2 1
2 1 3 1
3 1 12 2
4 1 13 2
5 1 14 2
6 2 1 3
7 2 2 3
8 3 5 3
9 3 6 4
10 3 7 4
11 3 8 4
12 3 50 5
I'd appreciate your suggestions, because I cannot find a clean way to do it. Thanks
Is this what you need?
df.B.diff().ne(1).cumsum()
Out[463]:
0 1
1 1
2 1
3 2
4 2
5 2
6 3
7 3
8 4
9 4
10 4
11 4
12 5
Name: B, dtype: int32
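Assembled into a full assignment, the one-liner above can be sketched like this (reproducing the answer's output, which differs slightly from the asker's hand-written expectation at row 8):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
                   'B': [1, 2, 3, 12, 13, 14, 1, 2, 5, 6, 7, 8, 50]})

# A new group starts wherever B does not increase by exactly 1;
# cumsum over those break points numbers the groups.
df['C'] = df['B'].diff().ne(1).cumsum()
print(df)
```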

Efficient conversion of dataframe distinct values in Python

I have a data like this:
republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
democrat,y,y,y,n,y,y,n,n,n,n,y,?,y,y,y,y
democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y
democrat,n,y,n,y,y,y,n,n,n,n,n,n,?,y,y,y
republican,n,y,n,y,y,y,n,n,n,n,n,n,y,y,?,y
from source.
I would like to change all the distinct values in the data (dataframe) into numeric values in the most efficient way.
In the above mentioned example I would like to transform republican-> 1 and democrat -> 2, y ->3, n->4 and ? -> 5 (or NULL).
I tried to use the following:
# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup
However, I'm not sure whether using Pandas would be more efficient, or whether there are other, better solutions. (This should be generic to any source of data.)
Here is the transform of data into dataframe using Pandas:
import pandas as pd
file_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/voting-records/house-votes-84.data'
df = pd.read_csv(file_path, header=None)
v = df.values
f = pd.factorize(v.ravel())[0].reshape(v.shape)
pd.DataFrame(f)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 0 1 2 1 2 2 2 1 1 1 2 3 2 2 2 1 2
1 0 1 2 1 2 2 2 1 1 1 1 1 2 2 2 1 3
2 4 3 2 2 3 2 2 1 1 1 1 2 1 2 2 1 1
3 4 1 2 2 1 3 2 1 1 1 1 2 1 2 1 1 2
4 4 2 2 2 1 2 2 1 1 1 1 2 3 2 2 2 2
5 4 1 2 2 1 2 2 1 1 1 1 1 1 2 2 2 2
6 4 1 2 1 2 2 2 1 1 1 1 1 1 3 2 2 2
7 0 1 2 1 2 2 2 1 1 1 1 1 1 2 2 3 2
Use replace on the whole dataframe to make the mappings. You could first pass a dictionary of known mappings for values that need to remain consistent, then collect the remaining distinct values in the dataset and map them to, say, values from 100 upwards.
For example, the ? here is not mapped, so it gets the value 100:
mappings = {'republican':1, 'democrat':2, 'y':3, 'n':4}
unknown = set(pd.unique(df.values.ravel())) - set(mappings.keys())
mappings.update([v, c] for c, v in enumerate(unknown, start=100))
df.replace(mappings, inplace=True)
Giving you:
republican n n.1 n.2 n.3 n.4 n.5 n.6 n.7 n.8 n.9 ? n.10 n.11 n.12 n.13 n.14
0 1 4 3 4 3 3 3 4 4 4 3 100 3 3 3 4 3
1 1 4 3 4 3 3 3 4 4 4 4 4 3 3 3 4 100
2 2 100 3 3 100 3 3 4 4 4 4 3 4 3 3 4 4
3 2 4 3 3 4 100 3 4 4 4 4 3 4 3 4 4 3
4 2 3 3 3 4 3 3 4 4 4 4 3 100 3 3 3 3
5 2 4 3 3 4 3 3 4 4 4 4 4 4 3 3 3 3
6 2 4 3 4 3 3 3 4 4 4 4 4 4 100 3 3 3
7 1 4 3 4 3 3 3 4 4 4 4 4 4 3 3 100 3
A more generalized version would be:
mappings = {v:c for c, v in enumerate(sorted(set(pd.unique(df.values.ravel()))), start=1)}
df.replace(mappings, inplace=True)
You can use:
v = df.values
a, b = v.shape
f = pd.factorize(v.T.ravel())[0].reshape(b,a).T
df = pd.DataFrame(f)
print (df)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 0 2 4 2 4 4 4 2 2 2 4 3 4 4 4 2 4
1 0 2 4 2 4 4 4 2 2 2 2 2 4 4 4 2 3
2 1 3 4 4 3 4 4 2 2 2 2 4 2 4 4 2 2
3 1 2 4 4 2 3 4 2 2 2 2 4 2 4 2 2 4
4 1 4 4 4 2 4 4 2 2 2 2 4 3 4 4 4 4
5 1 2 4 4 2 4 4 2 2 2 2 2 2 4 4 4 4
6 1 2 4 2 4 4 4 2 2 2 2 2 2 3 4 4 4
7 0 2 4 2 4 4 4 2 2 2 2 2 2 4 4 3 4
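A minimal, self-contained sketch of this transposed-factorize idea on a toy stand-in frame (the real data comes from the UCI URL above):

```python
import pandas as pd

# Toy stand-in for the voting data.
df = pd.DataFrame([['republican', 'n', 'y', '?'],
                   ['democrat', 'y', 'n', 'n']])

# Ravel column by column so codes are assigned in column order,
# factorize once over everything, then restore the original shape.
v = df.values
codes = pd.factorize(v.T.ravel())[0].reshape(v.T.shape).T
out = pd.DataFrame(codes)
print(out)
```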

Count distinct strings in rolling window using pandas

How do I count the number of unique strings in a rolling window of a pandas dataframe?
a = pd.DataFrame(['a','b','a','a','b','c','d','e','e','e','e'])
a.rolling(3).apply(lambda x: len(np.unique(x)))
Output, same as original dataframe:
0
0 a
1 b
2 a
3 a
4 b
5 c
6 d
7 e
8 e
9 e
10 e
Expected:
0
0 1
1 2
2 2
3 2
4 2
5 3
6 3
7 3
8 2
9 1
10 1
I think you first need to convert the values to numeric, either with factorize or with rank. The min_periods parameter is also necessary to avoid NaN at the start of the column:
a[0] = pd.factorize(a[0])[0]
print (a)
0
0 0
1 1
2 0
3 0
4 1
5 2
6 3
7 4
8 4
9 4
10 4
b = a.rolling(3, min_periods=1).apply(lambda x: len(np.unique(x))).astype(int)
print (b)
0
0 1
1 2
2 2
3 2
4 2
5 3
6 3
7 3
8 2
9 1
10 1
Or:
a[0] = a[0].rank(method='dense')
0
0 1.0
1 2.0
2 1.0
3 1.0
4 2.0
5 3.0
6 4.0
7 5.0
8 5.0
9 5.0
10 5.0
b = a.rolling(3, min_periods=1).apply(lambda x: len(np.unique(x))).astype(int)
print (b)
0
0 1
1 2
2 2
3 2
4 2
5 3
6 3
7 3
8 2
9 1
10 1
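Put together end to end, the encode-then-roll idea can be sketched as one short pipeline (rolling only handles numeric data, hence the factorize step):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(['a', 'b', 'a', 'a', 'b', 'c', 'd', 'e', 'e', 'e', 'e'])

# Encode the strings as integer codes, then count the distinct codes
# inside each rolling window of width 3.
codes = pd.Series(pd.factorize(a[0])[0])
b = (codes.rolling(3, min_periods=1)
          .apply(lambda x: len(np.unique(x)))
          .astype(int))
print(b)
```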

Converting one to many mapping dictionary to Dataframe

I have a dictionary as follows:
d = {1: np.array([2,3]), 2: np.array([8,4,5]), 3: np.array([6,7,8,9])}
As depicted, the values for each key are variable-length arrays.
Now I want to convert it to a DataFrame, so the output looks like:
A B
1 2
1 3
2 8
2 4
2 5
3 6
3 7
3 8
3 9
I used pd.DataFrame(d), but it does not handle one-to-many mappings. Any help would be appreciated.
Use the Series constructor with str.len to get the lengths of the arrays.
Then create a new DataFrame with numpy.repeat, numpy.concatenate and Index.values:
d = {1:np.array([2,3]), 2:np.array([8,4,5]), 3:np.array([6,7,8,9])}
print (d)
a = pd.Series(d)
l = a.str.len()
df = pd.DataFrame({'A':np.repeat(a.index.values, l), 'B': np.concatenate(a.values)})
print (df)
A B
0 1 2
1 1 3
2 2 8
3 2 4
4 2 5
5 3 6
6 3 7
7 3 8
8 3 9
pd.DataFrame(
    [[k, v] for k, a in d.items() for v in a.tolist()],
    columns=['A', 'B']
)
A B
0 1 2
1 1 3
2 2 8
3 2 4
4 2 5
5 3 6
6 3 7
7 3 8
8 3 9
Setup
d = {1: np.array([2,3]), 2: np.array([8,4,5]), 3: np.array([6,7,8,9])}
Here's my version:
(pd.DataFrame.from_dict(d, orient='index').rename_axis('A')
   .stack()
   .reset_index(name='B')
   .drop('level_1', axis=1)
   .astype('int'))
Out[63]:
A B
0 1 2
1 1 3
2 2 8
3 2 4
4 2 5
5 3 6
6 3 7
7 3 8
8 3 9
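On pandas 0.25 and later, Series.explode condenses the whole transform into one chain; a brief sketch of that alternative:

```python
import numpy as np
import pandas as pd

d = {1: np.array([2, 3]), 2: np.array([8, 4, 5]), 3: np.array([6, 7, 8, 9])}

# explode emits one row per array element, repeating the dict key
# in the index; reset_index turns that key into column A.
out = (pd.Series(d).explode()
         .rename_axis('A')
         .reset_index(name='B')
         .astype(int))
print(out)
```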
