I want to replace a value in a CSV file with the contents of a variable. The row to change is also given by a variable. The following code is my attempt:
import pandas as pd
# sample dataframe
df = pd.DataFrame({'A': ['a','b','c'], 'B':['b','c','d']})
print("Original DataFrame:\n", df)
x = 1
y = 12698
df_rep = df.replace([int(x),1], y)
print("\nAfter replacing:\n", df_rep)
This can be done with positional indexing in pandas, e.g. df.iloc[row_num, col_num]:
#update df
df.iloc[x,1]=y
#print df
print(df)
A B
0 a b
1 b 12698
2 c d
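If the data really lives in a CSV file, the same idea extends to a read, modify, write round trip. The following is only a minimal sketch: the CSV content, the column name B and the variable names row, col and new_value are made up for illustration, and an in-memory buffer stands in for the file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a path passed to read_csv
csv_text = "A,B\n10,20\n30,40\n50,60\n"
df = pd.read_csv(io.StringIO(csv_text))

row = 1            # zero-based row position, held in a variable
col = "B"          # column to change (assumed name)
new_value = 12698  # replacement value, held in a variable

# iloc needs integer positions, so translate the column name first
df.iloc[row, df.columns.get_loc(col)] = new_value

# Write the modified table back out as CSV text
buffer = io.StringIO()
df.to_csv(buffer, index=False)
result_csv = buffer.getvalue()
print(result_csv)
```

With a real file, the two StringIO objects would be replaced by the input and output paths.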
After processing some data I got df. Now I need to get the three largest values in each row of the data frame, together with their column names.
data=[[4.12,3,2],[1.0123123,-6.12312,5.123123],[-3.123123,-8.512323,6.12313]]
df = pd.DataFrame(data,columns =['a','b','c'],index=['aa','bb','cc'])
df
output:
a b c
aa 4.120000 3.000000 2.000000
bb 1.012312 -6.123120 5.123123
cc -3.123123 -8.512323 6.123130
Now I have tagged each value with its column name:
df1 = df.astype(str).apply(lambda x:x+'='+x.name)
a b c
aa 4.12=a 3.0=b 2.0=c
bb 1.0123123=a -6.12312=b 5.123123=c
cc -3.123123=a -8.512323=b 6.12313=c
I need the values in each row ordered from largest to smallest. I have tried to sort the data frame but could not get the output.
What I need is:
final_df
max=1 max=2 max=3
aa 4.12=a 3.0=b 2.0=c
bb 5.123123=c 1.0123123=a -6.12312=b
cc 6.12313=c -3.123123=a -8.512323=b
I suggest you proceed as follows:
import pandas as pd
data=[[4.12,3,2],[1.0123123,-6.12312,5.123123],[-3.123123,-8.512323,6.12313]]
df = pd.DataFrame(data,columns =['a','b','c'],index=['aa','bb','cc'])
# first sort values in descending order
df.values[:, ::-1].sort(axis=1)
# then rename row values
df1 = df.astype(str).apply(lambda x: x + '=' + x.name)
# rename columns
df1.columns = [f"max={i}" for i in range(1, len(df.columns)+1)]
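Result (the values are sorted, but the suffixes now follow the new column positions rather than the original column names):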
max=1 max=2 max=3
aa 4.12=a 3.0=b 2.0=c
bb 5.123123=a 1.0123123=b -6.12312=c
cc 6.12313=a -3.123123=b -8.512323=c
The solution proposed by @GuglielmoSanchini does not give the expected result. The following does:
# Imports
import pandas as pd
import numpy as np
# Data
data=[[4.12,3,2],[1.0123123,-6.12312,5.123123],[-3.123123,-8.512323,6.12313]]
df = pd.DataFrame(data,columns =['a','b','c'],index=['aa','bb','cc'])
df1 = df.astype(str).apply(lambda x:x+'='+x.name)
data = []
for index, row in df1.iterrows():
    # the indices of the numbers sorted in descending order
    # (item[:-2] strips the '=x' suffix, so this assumes one-character column names)
    indices_max = np.argsort([float(item[:-2]) for item in row])[::-1]
    # We add the new values sorted
    data.append(row.iloc[indices_max].values.tolist())
# We create the new dataframe with values sorted
df = pd.DataFrame(data, columns = [f"max={i}" for i in range(1, len(df1.columns)+1)])
df.index = df1.index
print(df)
Here is the result:
max=1 max=2 max=3
aa 4.12=a 3.0=b 2.0=c
bb 5.123123=c 1.0123123=a -6.12312=b
cc 6.12313=c -3.123123=a -8.512323=b
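For larger frames the row loop can be avoided by sorting the numbers first and attaching the column names afterwards. This is only a sketch of that alternative, not the method from either answer; the names order, sorted_values, sorted_names and final_df are made up here.

```python
import numpy as np
import pandas as pd

data = [[4.12, 3, 2], [1.0123123, -6.12312, 5.123123], [-3.123123, -8.512323, 6.12313]]
df = pd.DataFrame(data, columns=['a', 'b', 'c'], index=['aa', 'bb', 'cc'])

values = df.to_numpy()
# Column positions that sort each row in descending order
order = np.argsort(-values, axis=1)
# Reorder the values and the column names with the same positions
sorted_values = np.take_along_axis(values, order, axis=1)
sorted_names = df.columns.to_numpy()[order]

# Build the 'value=name' strings only after sorting
labels = [
    [f"{value}={name}" for value, name in zip(value_row, name_row)]
    for value_row, name_row in zip(sorted_values, sorted_names)
]
final_df = pd.DataFrame(
    labels,
    index=df.index,
    columns=[f"max={i}" for i in range(1, df.shape[1] + 1)],
)
print(final_df)
```

Because the names are looked up with the same positions as the values, this also works for column names longer than one character.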
I have to retrieve all rows from w_loaded_updated_iod.xlsx where the column Waived equals Yes.
I have tried this:
import pandas as pd
excel1 = 'C:/Users/gopoluri/Desktop/Latest/w_loaded_updated_iod.xlsx'
df1 = pd.read_excel(excel1)
values1 = df1[0 : 7]
dataframes = [values1]
df1.loc[df1['Waived'] == 'Yes'].to_excel("output11.xlsx")
But this gives me all columns. I need all matching rows, but only from column 2, column 3, column 5 and column 8. Can anyone please correct my code if anything is wrong?
Like below:
You can get columns x, y, z from your dataframe by selecting them with a list of column names:
df = df[["x", "y", "z"]]
Example:
df = pd.DataFrame(dict(a=[1,2,3],b=[3,4,5],c=[5,6,7]))
df = df[["a","b"]]
df # prints output:
a b
0 1 3
1 2 4
2 3 5
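Applied to the question, the row filter and the column selection can be combined in a single .loc call. The sketch below uses a small made-up frame instead of the Excel file; the column names col1 to col8, the placement of Waived as the last column, and the reading of "column 2, 3, 5, 8" as one-based positions are all assumptions.

```python
import pandas as pd

# Stand-in for pd.read_excel(...): eight made-up data columns plus 'Waived'
df1 = pd.DataFrame(
    [[r * 10 + c for c in range(1, 9)] for r in range(4)],
    columns=[f"col{c}" for c in range(1, 9)],
)
df1["Waived"] = ["Yes", "No", "Yes", "No"]

# Columns 2, 3, 5 and 8 counted from 1 are positions 1, 2, 4 and 7 counted from 0
wanted_columns = df1.columns[[1, 2, 4, 7]]

# Boolean mask picks the rows, the label list picks the columns
result = df1.loc[df1["Waived"] == "Yes", wanted_columns]
print(result)
```

The result can then be written out with result.to_excel(...) as in the question.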
I have a dataframe with comma-separated values in the Language column:
Name Language
0 A French,Espanol
1 B Deutsch,English
I wish to transform the above dataframe as below
Name Language
0 A French
1 A Espanol
2 B Deutsch
3 B English
I tried the code below but it did not produce this result:
df=df.join(df.pop('Language').str.extractall(',$')[0] .reset_index(level=1,drop=True) .rename('Language')) .reset_index(drop=True)
pandas.DataFrame.explode should be suited for that task. Combine it with pandas.DataFrame.assign to get the desired column:
import pandas as pd
df = pd.DataFrame({'Name':['A', 'B'], 'Language': ['French,Espanol', 'Deutsch,English']})
df = df.assign(Language=df['Language'].str.split(',')).explode('Language')
# Name Language
# 0 A French
# 0 A Espanol
# 1 B Deutsch
# 1 B English
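Note that explode repeats the original index (0, 0, 1, 1), while the desired output in the question is numbered 0 to 3. A minimal sketch of one way to get that numbering is to reset the index afterwards; the variable name result is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B'], 'Language': ['French,Espanol', 'Deutsch,English']})

result = (
    df.assign(Language=df['Language'].str.split(','))  # strings -> lists
      .explode('Language')                             # one row per list element
      .reset_index(drop=True)                          # renumber rows 0..n-1
)
print(result)
```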
Alternatively, split the second column row by row, collect one entry per name/language pair, and build a new dataframe with the same columns. (DataFrame.append is avoided here because it was removed in pandas 2.0.)
import pandas as pd
csv_df = pd.DataFrame([['1', '2,3'], ['2', '4,5']], columns=['Name', 'Language'])
rows = []
for index, row in csv_df.iterrows():
    name = row['Name']
    # split the comma-separated string into individual values
    for x in row['Language'].split(','):
        rows.append([name, x])
# build the result once, reusing the original column names
df = pd.DataFrame(rows, columns=csv_df.columns)
print(df)
Let's say I have a dataframe with columns A, B, C, D:
import pandas as pd
import numpy as np
## create dataframe 100 by 4
df = pd.DataFrame(np.random.randn(100,4), columns=list('ABCD'))
df.head(10)
I would like to create a new column, "max_bcd", and this column will say 'b','c','d', indicating that for that particular row, one of those three columns contains the largest value.
Does anyone know how to accomplish that?
Try idxmax with axis=1; it returns, for each row, the name of the column that holds the largest value:
>>> df.idxmax(axis=1)
0 B
1 C
2 D
dtype: object
import pandas as pd
import numpy as np
cols = ['B', 'C', 'D']
## create dataframe 100 by 4
df = pd.DataFrame(np.random.randn(100,4), columns=list('ABCD'))
df.head(10)
# idxmax and max already work row-wise with axis=1, so no apply is needed
df['max_BCD_name'] = df[cols].idxmax(axis=1) # column name
df['max_BCD_value'] = df[cols].max(axis=1) # value
print(df)
Edit: Just saw your requirement of only B, C and D. Added code for that.
Output:
A B C D max_BCD_name max_BCD_value
0 -0.653010 -1.479903 3.415286 -1.246829 C 3.415286
1 0.343084 1.243901 0.502271 -0.467752 B 1.243901
2 0.099207 1.257792 -0.997121 -1.559208 B 1.257792
3 -0.646787 1.053846 -2.663767 1.022687 B 1.053846
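The question asks for lower-case labels in a column called max_bcd. A minimal sketch with a small fixed frame (the numbers are made up so that the outcome is easy to check by eye):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [9.0, 0.0, 1.0],
    'B': [1.0, 5.0, 2.0],
    'C': [2.0, 4.0, 8.0],
    'D': [3.0, 1.0, 7.0],
})
cols = ['B', 'C', 'D']

# Restrict to B, C, D first so that a large value in A is ignored,
# then lower-case the resulting column names
df['max_bcd'] = df[cols].idxmax(axis=1).str.lower()
print(df)
```

In the first row A holds the largest value overall, but the label is still taken from B, C and D only.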
Is it possible to append to an empty data frame that doesn't contain any indices or columns?
I have tried to do this, but keep getting an empty dataframe at the end.
e.g.
import pandas as pd
df = pd.DataFrame()
data = ... # some kind of data here; I have already checked that its type is DataFrame
df.append(data)
The result looks like this:
Empty DataFrame
Columns: []
Index: []
This should work (on pandas versions before 2.0, where DataFrame.append still exists):
>>> df = pd.DataFrame()
>>> data = pd.DataFrame({"A": range(3)})
>>> df = df.append(data)
>>> df
A
0 0
1 1
2 2
Since append doesn't happen in place, you'll have to store the output if you want it:
>>> df = pd.DataFrame()
>>> data = pd.DataFrame({"A": range(3)})
>>> df.append(data) # without storing
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df = df.append(data)
>>> df
A
0 0
1 1
2 2
And if you want to add a row, you can use a dictionary:
df = pd.DataFrame()
df = df.append({'name': 'Zed', 'age': 9, 'height': 2}, ignore_index=True)
which gives you:
age height name
0 9 2 Zed
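DataFrame.append was removed in pandas 2.0, so on current versions the dictionary has to be turned into a one-row dataframe and concatenated. A minimal sketch of that, where new_row is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame()
new_row = {'name': 'Zed', 'age': 9, 'height': 2}

# Wrap the dictionary in a list so it becomes a one-row dataframe,
# then concatenate and keep the returned object
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```

As with append, concat returns a new object, so the result has to be assigned back to df.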
You can concat the data in this way:
InfoDF = pd.DataFrame()
tempDF = pd.DataFrame(rows,columns=['id','min_date'])
InfoDF = pd.concat([InfoDF,tempDF])
The answers are very useful, but since pandas.DataFrame.append was deprecated (as already mentioned by various users), and the answers using pandas.concat are not "Runnable Code Snippets" I would like to add the following snippet:
import pandas as pd
df = pd.DataFrame(columns =['name','age'])
row_to_append = pd.DataFrame([{'name':"Alice", 'age':"25"},{'name':"Bob", 'age':"32"}])
df = pd.concat([df,row_to_append])
So df is now:
name age
0 Alice 25
1 Bob 32
pandas.DataFrame.append has been deprecated since version 1.4.0 (and was removed in pandas 2.0): use concat() instead.
Therefore:
df = pd.DataFrame() # empty dataframe
df2 = pd.DataFrame(...) # some dataframe with data
df = pd.concat([df, df2])
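When the pieces arrive in a loop, a common pattern is to collect them in a plain list and call concat once at the end, rather than growing a dataframe step by step. A minimal sketch, with made-up columns id and value:

```python
import pandas as pd

frames = []
for i in range(3):
    # Each iteration produces some dataframe; here a one-row frame for illustration
    frames.append(pd.DataFrame({'id': [i], 'value': [i * i]}))

# A single concat at the end avoids copying the accumulated data on every iteration
result = pd.concat(frames, ignore_index=True)
print(result)
```

This also sidesteps the question of starting from an empty dataframe altogether.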