I would like to create a new column in a pandas DataFrame just like I would using Python f-strings or the format function.
Here is an example:
df = pd.DataFrame({"str": ["a", "b", "c", "d", "e"],
"int": [1, 2, 3, 4, 5]})
print(df)
str int
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
I would like to obtain:
str int concat
0 a 1 a-01
1 b 2 b-02
2 c 3 c-03
3 d 4 d-04
4 e 5 e-05
So something like:
concat = f"{str}-{int:02d}"
but directly with the elements of the pandas columns. I imagine the solution uses pandas map, apply or agg, but I have had no success.
Many thanks for your help.
Use a list comprehension with f-strings:
df['concat'] = [f"{a}-{b:02d}" for a, b in zip(df['str'], df['int'])]
Or it is possible to use apply:
df['concat'] = df.apply(lambda x: f"{x['str']}-{x['int']:02d}", axis=1)
Or the solution from the comments with Series.str.zfill:
df["concat"] = df["str"] + "-" + df["int"].astype(str).str.zfill(2)
print(df)
str int concat
0 a 1 a-01
1 b 2 b-02
2 c 3 c-03
3 d 4 d-04
4 e 5 e-05
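All three approaches produce the same column; a quick sanity check (a minimal sketch, rebuilding the example frame from the question):

```python
import pandas as pd

df = pd.DataFrame({"str": ["a", "b", "c", "d", "e"],
                   "int": [1, 2, 3, 4, 5]})

# list comprehension with f-strings (usually the fastest of the three)
a = [f"{s}-{i:02d}" for s, i in zip(df["str"], df["int"])]

# row-wise apply
b = df.apply(lambda x: f"{x['str']}-{x['int']:02d}", axis=1).tolist()

# vectorized string concatenation with zero padding
c = (df["str"] + "-" + df["int"].astype(str).str.zfill(2)).tolist()

assert a == b == c == ["a-01", "b-02", "c-03", "d-04", "e-05"]
```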
You could use a list comprehension to build the concat column:
import pandas as pd
df = pd.DataFrame({"str": ["a", "b", "c", "d", "e"],
"int": [1, 2, 3, 4, 5]})
df['concat'] = [f"{s}-{i:02d}" for s, i in df[['str', 'int']].values]
print(df)
Output
str int concat
0 a 1 a-01
1 b 2 b-02
2 c 3 c-03
3 d 4 d-04
4 e 5 e-05
I also just discovered that positional indexing works on the rows passed to apply:
df["concat"] = df.apply(lambda x: f"{x[0]}-{x[1]:02d}", axis=1)
print(df)
str int concat
0 a 1 a-01
1 b 2 b-02
2 c 3 c-03
3 d 4 d-04
4 e 5 e-05
Looks very sleek.
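One caveat: `x[0]` on the row Series relies on the fallback from labels to positions, which newer pandas versions deprecate (a FutureWarning in 2.x). A sketch making the positional intent explicit with iloc:

```python
import pandas as pd

df = pd.DataFrame({"str": ["a", "b", "c", "d", "e"],
                   "int": [1, 2, 3, 4, 5]})

# x.iloc[0] / x.iloc[1] access by position explicitly and avoid
# the deprecated label-fallback of x[0] on a labelled Series
df["concat"] = df.apply(lambda x: f"{x.iloc[0]}-{x.iloc[1]:02d}", axis=1)
print(df["concat"].tolist())
# ['a-01', 'b-02', 'c-03', 'd-04', 'e-05']
```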
You can use pandas' string concatenation method str.cat. Note that sep='-0' only happens to work here because every integer is a single digit; zero-padding with str.zfill is more robust:
df['concat'] = df['str'].str.cat(df['int'].astype(str).str.zfill(2), sep='-')
str int concat
0 a 1 a-01
1 b 2 b-02
2 c 3 c-03
3 d 4 d-04
4 e 5 e-05
Related
I have a dict in python like this:
d = {"a": [1,2,3], "b": [4,5,6]}
I want to transform in a dataframe like this:
letter number
a      1
a      2
a      3
b      4
b      5
b      6
I have tried this code:
df = pd.DataFrame.from_dict(d, orient='index').T
but this gave me:
   a  b
0  1  4
1  2  5
2  3  6
You can always read your data in as you already have and then .melt it:
When passed no id_vars or value_vars, melt turns each of your columns into their own rows.
import pandas as pd
d = {"a": [1,2,3], "b": [4,5,6]}
out = pd.DataFrame(d).melt(var_name='letter', value_name='value')
print(out)
letter value
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
To use 'letter' and 'number' as column labels you could use:
a2 = [[key, val] for key, x in d.items() for val in x]
dict2 = pd.DataFrame(a2, columns = ['letter', 'number'])
which gives
letter number
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
Yet another possible solution:
(pd.Series(d, index=d.keys(), name='numbers')
.rename_axis('letters').reset_index()
.explode('numbers', ignore_index=True))
Output:
letters numbers
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
This will yield what you want (there might be a simpler way though):
import pandas as pd
my_dict = {"a": [1,2,3], "b": [4,5,6]}
my_list = [[key, val] for key in my_dict for val in my_dict[key] ]
df = pd.DataFrame(my_list, columns=['letter','number'])
df
# Out[106]:
# letter number
# 0 a 1
# 1 a 2
# 2 a 3
# 3 b 4
# 4 b 5
# 5 b 6
Env: Python 3.9.6, Pandas 1.3.5
I have a DataFrame and a Series like below
df = pd.DataFrame({"C1" : ["A", "B", "C", "D"]})
sr = pd.Series(data = [1, 2, 3, 4, 5],
index = ["A", "A", "B", "C", "D"])
"""
[DataFrame]
C1
0 A
1 B
2 C
3 D
[Series]
A 1
A 2
B 3
C 4
D 5
"""
What I tried:
df["C2"] = df["C1"].map(sr)
But InvalidIndexError occurred because the series has duplicate keys ("A").
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Is there any method to make DF like below?
C1 C2
0 A 1
1 A 2
2 B 3
3 C 4
4 D 5
or
C1 C2
0 A 1
1 B 3
2 C 4
3 D 5
4 A 2
Row indices do not matter.
The question was heavily edited and now has a very different meaning.
You want a simple merge:
df.merge(sr.rename('C2'),
left_on='C1', right_index=True)
Output:
C1 C2
0 A 1
0 A 2
1 B 3
2 C 4
3 D 5
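If some C1 values might be missing from the Series, how='left' keeps those rows with NaN, mirroring what map would do (a minimal sketch; the extra key "E" is invented here for illustration):

```python
import pandas as pd

df = pd.DataFrame({"C1": ["A", "B", "C", "D", "E"]})  # "E" has no match in sr
sr = pd.Series([1, 2, 3, 4, 5], index=["A", "A", "B", "C", "D"])

# how='left' preserves unmatched left rows; C2 becomes float because of the NaN
out = df.merge(sr.rename("C2"), left_on="C1", right_index=True, how="left")
print(out.reset_index(drop=True))
#   C1   C2
# 0  A  1.0
# 1  A  2.0
# 2  B  3.0
# 3  C  4.0
# 4  D  5.0
# 5  E  NaN
```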
old answer
First, I don't reproduce your issue (tested with 3M rows on pandas 1.3.5).
Then why do you use slicing and not map? This would have the advantage of systematically outputting the correct number of rows (NaN if the key is absent):
Example:
import numpy as np
sr = pd.Series({10: "A", 13: "B", 16: "C", 18: "D"})
df = pd.DataFrame({"C1": np.random.randint(10, 20, size=3000000)})
df['C2'] = df['C1'].map(sr)
print(df.head())
output:
C1 C2
0 10 A
1 18 D
2 10 A
3 13 B
4 15 NaN
I have a pandas DataFrame like below:
df = pd.DataFrame({"type": ["A", "B", "C"],
"A": [0, 0, 12],
"B": [1, 3, 0],
"C": [0, 1, 1]}
)
I want to transform this to a DataFrame that is N X 2, where I concatenate the column and type values with " - " as delimiter. The output should look like this:
pair value
A - A 0
A - B 0
A - C 12
B - A 1
B - B 3
B - C 0
C - A 0
C - B 1
C - C 1
I don't know if there is a name for what I want to accomplish (I thought about pivoting but I believe that is something else), so that didn't help me in googling the solution for this. How to solve this problem efficiently?
First set the index to type, then unstack and convert the result to a DataFrame:
x = df.set_index('type').unstack().to_frame('value')
x.index = x.index.map(' - '.join)
res = x.rename_axis('pair').reset_index()
res:
pair value
0 A - A 0
1 A - B 0
2 A - C 12
3 B - A 1
4 B - B 3
5 B - C 0
6 C - A 0
7 C - B 1
8 C - C 1
First melt on the type column, then join the variable and type columns with a hyphen -, and take the required columns only:
>>> out = df.melt(id_vars='type')
>>> out.assign(pair=out['variable']+'-'+out['type'])[['pair', 'value']]
pair value
0 A-A 0
1 A-B 0
2 A-C 12
3 B-A 1
4 B-B 3
5 B-C 0
6 C-A 0
7 C-B 1
8 C-C 1
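The same melt can be written as a single chain, here using " - " as the delimiter the question asked for (a minor variation on the answer above):

```python
import pandas as pd

df = pd.DataFrame({"type": ["A", "B", "C"],
                   "A": [0, 0, 12],
                   "B": [1, 3, 0],
                   "C": [0, 1, 1]})

# melt to long form, build the pair label, keep only the two wanted columns
out = (df.melt(id_vars='type')
         .assign(pair=lambda d: d['variable'] + ' - ' + d['type'])
         [['pair', 'value']])
print(out['pair'].tolist())
# ['A - A', 'A - B', 'A - C', 'B - A', 'B - B', 'B - C', 'C - A', 'C - B', 'C - C']
```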
For a given DataFrame, sorted by b and index reset:
df = pd.DataFrame({'a': list('abcdef'),
'b': [0, 2, 7, 3, 9, 15]}
).sort_values('b').reset_index(drop=True)
a b
0 a 0
1 b 2
2 d 3
3 c 7
4 e 9
5 f 15
and a list, v
v = list('adf')
I would like to pull out just the rows in v and the following row (if there is one), similar to grep -A1:
a b
0 a 0
1 b 2
2 d 3
3 c 7
5 f 15
I can do this by concatenating the index from isin and the index from isin plus one, like so:
df[df.index.isin(
np.concatenate(
(df[df['a'].isin(v)].index,
df[df['a'].isin(v)].index + 1)))]
But this is long and not too easy to understand. Is there a better way?
You can combine the isin condition and the shift (next row) to create the boolean you needed:
df[df.a.isin(v).pipe(lambda x: x | x.shift())]
# a b
#0 a 0
#1 b 2
#2 d 3
#3 c 7
#5 f 15
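The shift trick generalizes to grep -A n by OR-ing together n shifted copies of the mask (a sketch; grep_after is a made-up helper name):

```python
import pandas as pd

df = pd.DataFrame({'a': list('abcdef'),
                   'b': [0, 2, 7, 3, 9, 15]}).sort_values('b').reset_index(drop=True)
v = list('adf')

def grep_after(df, mask, n=1):
    """Keep rows where mask is True, plus the n following rows (like grep -A n)."""
    keep = mask.copy()
    for k in range(1, n + 1):
        keep |= mask.shift(k, fill_value=False)
    return df[keep]

out = grep_after(df, df['a'].isin(v), n=1)
print(out)
#    a   b
# 0  a   0
# 1  b   2
# 2  d   3
# 3  c   7
# 5  f  15
```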
Let's say I have a DataFrame that looks like this:
df = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
df
Out[92]:
A B
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
Assuming that this DataFrame already exists, how can I simply add a level 'C' to the column index to get this:
df
Out[92]:
A B
C C
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
I saw SO answers like this one: python/pandas: how to combine two dataframes into one with hierarchical column index? But that concatenates different dataframes instead of adding a column level to an already existing dataframe.
As suggested by @StevenG himself, a better answer:
df.columns = pd.MultiIndex.from_product([df.columns, ['C']])
print(df)
# A B
# C C
# a 0 0
# b 1 1
# c 2 2
# d 3 3
# e 4 4
option 1
set_index and T
import numpy as np
df.T.set_index(np.repeat('C', df.shape[1]), append=True).T
option 2
pd.concat, keys, and swaplevel
pd.concat([df], axis=1, keys=['C']).swaplevel(0, 1, 1)
A solution which adds a name to the new level and is easier on the eyes than other answers already presented:
df['newlevel'] = 'C'
df = df.set_index('newlevel', append=True).unstack('newlevel')
print(df)
# A B
# newlevel C C
# a 0 0
# b 1 1
# c 2 2
# d 3 3
# e 4 4
You could just assign the columns like:
>>> df.columns = [df.columns, ['C', 'C']]
>>> df
A B
C C
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
>>>
Or for unknown length of columns:
>>> df.columns = [df.columns.get_level_values(0), np.repeat('C', df.shape[1])]
>>> df
A B
C C
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
>>>
Another way for a MultiIndex (appending 'E'; this assumes the columns already form a MultiIndex, e.g. ('A', 'C') and ('B', 'D')):
df.columns = pd.MultiIndex.from_tuples(map(lambda x: (x[0], 'E', x[1]), df.columns))
A B
E E
C D
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
I like it explicit (using MultiIndex) and chain-friendly (.set_axis):
df.set_axis(pd.MultiIndex.from_product([df.columns, ['C']]), axis=1)
This is particularly convenient when merging DataFrames with different column level numbers, where Pandas (1.4.2) raises a FutureWarning (FutureWarning: merging between different levels is deprecated and will be removed ... ):
import pandas as pd
df1 = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
df2 = pd.DataFrame(index=list('abcde'), data=range(10, 15), columns=pd.MultiIndex.from_tuples([("C", "x")]))
# df1:
A B
a 0 0
b 1 1
# df2:
C
x
a 10
b 11
# merge while giving df1 another column level:
pd.merge(df1.set_axis(pd.MultiIndex.from_product([df1.columns, ['']]), axis=1),
df2,
left_index=True, right_index=True)
# result:
A B C
x
a 0 0 10
b 1 1 11
Another method, but using a list comprehension of tuples as the arg to pandas.MultiIndex.from_tuples():
df.columns = pd.MultiIndex.from_tuples([(col, 'C') for col in df.columns])
df
A B
C C
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
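For the reverse operation, droplevel removes the added level again (a minimal sketch):

```python
import pandas as pd

df = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})

# add the level ...
df.columns = pd.MultiIndex.from_product([df.columns, ['C']])

# ... and drop it again, leaving the original flat columns
back = df.droplevel(1, axis=1)
print(back.columns.tolist())  # ['A', 'B']
```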