I'm having an issue when trying to replace a string with a value from another column.
I want to replace 'Length' in the Formula column with the value from df['Length'].
df["Formula"] = df["Formula"].replace('Length', df['Length'], regex=True)
Below is my data.
Input:
Formula Length
Length 5
Length+1.5 6
Length-2.5 5
Length 4
5 5
Expected Output:
Formula Length
5 5
6+1.5 6
5-2.5 5
4 4
5 5
However, with the code above it replaces the entire cell instead of just 'Length'. I found this is because df['Length'] is passed as the replacement value; if I use a plain string as the replacement instead, the trailing offset (e.g. -2.5) is not wiped out.
I'm getting the output below:
Formula Length
5 5
6 6
5 5
4 4
5 5
Is there a replace method that can use values from another column?
Thank you.
If you want to replace using values from another column, you need DataFrame.apply:
df["Formula"]= df.apply(lambda x: x['Formula'].replace('Length', str(x['Length'])), axis=1)
print (df)
Formula Length
0 5 5
1 6+1.5 6
2 5-2.5 5
3 4 4
4 5 5
Or a list comprehension:
df["Formula"]= [x.replace('Length', str(y)) for x, y in df[['Formula','Length']].to_numpy()]
Just wanted to add that the list comprehension is much faster, of course:
df = pd.DataFrame({'a': ['aba'] * 1000000, 'c': ['c'] * 1000000})
%timeit df.apply(lambda x: x['a'].replace('b', x['c']), axis=1)
# 1 loop, best of 5: 11.8 s per loop
%timeit [x.replace('b', str(y)) for x, y in df[['a', 'c']].to_numpy()]
# 1 loop, best of 5: 1.3 s per loop
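As a further note (not from the original answer), zipping the two columns avoids the intermediate object-dtype array that to_numpy() builds; a minimal sketch assuming the same df as above:
# Sketch: iterate the two columns directly with zip instead of converting to a NumPy array.
df["Formula"] = [f.replace('Length', str(l)) for f, l in zip(df['Formula'], df['Length'])]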
I have a list of 5 elements (which could be 50,000). I want to sum all pair combinations from the same list and create a DataFrame from the results, so I wrote the following code:
import pandas as pd

x = list(range(1, 5))
t = []
for i in x:
    for j in x:
        t.append((i, j, i + j))
df = pd.DataFrame(t)
The code above generates the correct results, but it takes very long to execute when the list has more elements. I'm looking for the fastest way to do the same thing.
The combinations can be obtained through pandas.merge() without using explicit loops:
import numpy as np
import pandas as pd

x = np.arange(1, 5+1)
df = pd.DataFrame(x, columns=['x']).merge(pd.Series(x, name='y'), how='cross')
df['sum'] = df.x.add(df.y)
print(df)
x y sum
0 1 1 2
1 1 2 3
2 1 3 4
3 1 4 5
4 1 5 6
5 2 1 3
6 2 2 4
...
Option 2: with itertools.product()
import itertools
num = 5
df = pd.DataFrame(itertools.product(range(1,num+1),range(1,num+1)))
df['sum'] = df[0].add(df[1])
print(df)
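For larger inputs, a fully vectorized NumPy sketch (an addition, not part of the original answer) avoids Python-level loops entirely; it produces the same ordered pairs as the loop version:
import numpy as np
import pandas as pd

num = 5
x = np.arange(1, num + 1)
# Broadcast x against itself to get every ordered pair (i, j) and its sum.
i, j = np.meshgrid(x, x, indexing='ij')
df = pd.DataFrame({'x': i.ravel(), 'y': j.ravel(), 'sum': (i + j).ravel()})
print (df)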
A list comprehension can make it faster. You can use t = [(i, j, i+j) for i in x for j in x] instead of the explicit loops, since appending inside a traditional for loop is slower than a list comprehension, and a nested loop even more so. Here is the updated code, replacing the nested loops:
x = list(range(1, 5))
t = [(i, j, i + j) for i in x for j in x]
df = pd.DataFrame(t)
I have an input pandas Series of comma-separated player positions. I would like to remove the duplicates in each row. For example, change M,S,S to M,S.
I tried
fifa22['player_positions'] = fifa22['player_positions'].str.split(',').apply(pd.unique)
But the result is a Series of ndarrays.
I would like to convert the results to plain strings, without the square brackets. Wondering what to do, thanks!
If it's only this one column, you can use map.
import pandas as pd

df = pd.DataFrame({
    'player_positions': "M,S,S S S,M M,M M,M M M,S S,M,M,S".split(' ')
})
print(df)
player_positions
0 M,S,S
1 S
2 S,M
3 M,M
4 M,M
5 M
6 M,S
7 S,M,M,S
out = df['player_positions'].map(lambda x: ','.join(set(x.split(','))))
print(out)
0 M,S
1 S
2 M,S
3 M
4 M
5 M
6 M,S
7 M,S
If you want to join with anything other than a comma, just change the ',' in ','.join(...) to something else.
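Note that set() does not keep the original order of the positions (for example, S,M,M,S becomes M,S in the output above). If the first-seen order matters, a small sketch using dict.fromkeys (an assumption, not part of the original answer) preserves it:
# Sketch: dict.fromkeys keeps insertion order, so the first occurrence order is preserved.
out = df['player_positions'].map(lambda x: ','.join(dict.fromkeys(x.split(','))))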
I am just starting to use Python and I'm trying to learn some general things about it. As I was playing around with it, I wanted to see if I could make a DataFrame that shows a starting number compounded by a return. Sorry if this description doesn't make much sense, but I basically want a DataFrame x rows long that shows me:
number * (1 + return)^(row number) in each row
So, for example, say the number is 10 and the return is 10%; I would like the DataFrame to give me the series
1 11
2 12.1
3 13.3
4 14.6
5 ...
6 ...
Thanks so much in advance!
Let us try
import numpy as np
import pandas as pd

val = 10
det = 0.1
n = 4
out = val * ((1 + det) ** np.arange(n))
s = pd.Series(out)
s
Out[426]:
0 10.00
1 11.00
2 12.10
3 13.31
dtype: float64
Notice that here I use the index starting from 0, since 1.1**0 yields the original value.
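If you want the series to start at row 1 with the first compounded value, as in the expected output, a small variation (a sketch reusing val, det and n from above) shifts both the exponent and the index:
# Sketch: start the exponent and the index at 1 instead of 0.
s = pd.Series(val * (1 + det) ** np.arange(1, n + 1), index=range(1, n + 1))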
I think this does what you want:
df = pd.DataFrame({'returns': [x for x in range(1, 10)]})
df.index = df.index + 1
df.returns = df.returns.apply(lambda x: (10 * (1.1**x)))
print(df)
Out:
returns
1 11.000000
2 12.100000
3 13.310000
4 14.641000
5 16.105100
6 17.715610
7 19.487171
8 21.435888
9 23.579477
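As a side note (not part of the answer above), the apply call can be replaced by a vectorized expression over the index; a minimal sketch assuming the same df:
import numpy as np

# Sketch: exponentiate the index values in one vectorized step instead of a per-row lambda.
df['returns'] = 10 * 1.1 ** df.index.to_numpy()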
I am combining the values of two columns from an Excel file into a new column, but the combined values turn into decimal numbers. Here is my code.
The DataFrame whose columns I wish to combine:
cable_block pair
1 10
1 11
3 123
3 222
I insert a new column that combines the two with a / delimiter, so here is my code:
df['new_col'] = df[['cable_block', 'pair']].apply(lambda x: '/'.join(x.astype(str)), axis=1)
The result I get is:
cable_block pair new_col
1 10 1.0/10.0
1 11 1.0/11.0
3 123 3.0/123.0
3 222 3.0/222.0
After searching, I found good answers here by Psidom and Skirrebattie. So I tried:
df['new_col'] = df['new_col'].applymap(str)
and
df['new_col'] = df['new_col'].astype(str)
But it doesn't work the way it should. Looking at the code, it seems like it should work, and I find it weird that it doesn't.
Is there another work around?
First, to remove the trailing .0, ensure that the data is int:
df = df.astype(int)
Then you can do:
df['cable_block'].astype(str) + '/' + df['pair'].astype(str)
0 1/10
1 1/11
2 3/123
3 3/222
dtype: object
Another option to ensure correct formatting could be:
df.apply(lambda x: "%d/%d" %(x['cable_block'], x['pair']), axis=1)
0 1/10
1 1/11
2 3/123
3 3/222
dtype: object
Why not use astype?
df.astype(str).apply('/'.join,1)
Out[604]:
0 1/10
1 1/11
2 3/123
3 3/222
dtype: object
df['cable_block'].astype(int).astype(str) + '/' + df['pair'].astype(int).astype(str)
The data in your dataframe is probably floats, not ints.
You can use a list comprehension and f-strings:
df['new_col'] = [f'{cable_block}/{pair}' for cable_block, pair in df.values]
print(df)
cable_block pair new_col
0 1 10 1/10
1 1 11 1/11
2 3 123 3/123
3 3 222 3/222
The approach compares reasonably well versus the alternatives:
df = pd.concat([df]*10000, ignore_index=True)
%timeit df['cable_block'].astype(str) + '/' + df['pair'].astype(str) # 62.8 ms
%timeit [f'{cable_block}/{pair}' for cable_block, pair in df.values] # 85.1 ms
%timeit list(map('/'.join, map(list, df.values.astype(str)))) # 157 ms
%timeit df.astype(str).apply('/'.join,1) # 1.11 s
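As a final note on the root cause (not from the answers above): the floats usually come from how the Excel file is read, so forcing integer-friendly dtypes at read time avoids the problem entirely. A sketch, where the file name is a placeholder:
import pandas as pd

# Sketch: read with nullable integer dtypes so no trailing '.0' ever appears.
# 'cables.xlsx' is a placeholder, not the actual file from the question.
df = pd.read_excel('cables.xlsx', dtype={'cable_block': 'Int64', 'pair': 'Int64'})
df['new_col'] = df['cable_block'].astype(str) + '/' + df['pair'].astype(str)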
I have a task that is completely driving me mad. Let's suppose we have this df:
import pandas as pd
k = {'random_col':{0:'a',1:'b',2:'c'},'isin':{0:'ES0140074008', 1:'ES0140074008ES0140074010', 2:'ES0140074008ES0140074016ES0140074024'},'n_isins':{0:1,1:2,2:3}}
k = pd.DataFrame(k)
What I want to do is duplicate or triplicate each row a number of times governed by the column n_isins, which is obtained by dividing the length of the isin string by 12, as ISINs are always 12-character strings.
So I need row 0 once, row 1 twice, and row 2 three times. My real numbers go up to 6, so it is a hard task. I started with booleans and slicing the isin column, but that got me nowhere. Hopefully my explanation is good enough. I also need the isin column sliced like [0:12] + ' ' + [12:24]..., splitting every 12 characters (each chunk starts with 'ES'), but I think I know how to do that; I just mention it because it is the criterion that governs how many times each row has to be copied. Thanks in advance!
I think you need numpy.repeat with loc, then reset the duplicated index with reset_index. Finally, for the new column, use a custom splitting function together with numpy.concatenate:
import numpy as np

n = np.repeat(k.index, k['n_isins'])
k = k.loc[n].reset_index(drop=True)
print (k)
isin n_isins random_col
0 ES0140074008 1 a
1 ES0140074008ES0140074010 2 b
2 ES0140074008ES0140074010 2 b
3 ES0140074008ES0140074016ES0140074024 3 c
4 ES0140074008ES0140074016ES0140074024 3 c
5 ES0140074008ES0140074016ES0140074024 3 c
#https://stackoverflow.com/a/7111143/2901002
def chunks(s, n):
    """Produce `n`-character chunks from `s`."""
    for start in range(0, len(s), n):
        yield s[start:start+n]
s = np.concatenate(k['isin'].apply(lambda x: list(chunks(x, 12))))
k['new'] = pd.Series(s, index=k.index)
print (k)
isin n_isins random_col new
0 ES0140074008 1 a ES0140074008
1 ES0140074008ES0140074010 2 b ES0140074008
2 ES0140074008ES0140074010 2 b ES0140074010
3 ES0140074008ES0140074016ES0140074024 3 c ES0140074008
4 ES0140074008ES0140074016ES0140074024 3 c ES0140074016
5 ES0140074008ES0140074016ES0140074024 3 c ES0140074024
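On recent pandas versions, a more compact alternative (a sketch, not part of the original answer) splits each isin into 12-character chunks with str.findall and then uses DataFrame.explode, which repeats the other columns automatically:
# Sketch: assumes the original three-row k, before the numpy.repeat step above.
# findall('.{12}') yields the 12-character chunks per row; explode then repeats
# each row once per chunk, so the repetition count comes for free.
k['new'] = k['isin'].str.findall('.{12}')
out = k.explode('new', ignore_index=True)
print (out)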