change specific columns of data frame using pandas - python

I have a data frame with 3 columns: ["date", "volume", "ID"], where ID takes the values 0, 1, 2, ..., 15.
I would like to create a new data frame that keeps all rows with ID=5 unchanged.
The other rows should also be kept, but with row["volume"] set to 0.

First copy your dataframe:
df_new = df.copy()
Then, using pd.DataFrame.loc, set volume to 0 for your criteria:
df_new.loc[df_new['ID'] != 5, 'volume'] = 0
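For reference, here is a minimal runnable sketch of the two steps above. The sample values in `df` are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "volume": [10, 20, 30],
    "ID": [5, 3, 5],
})

# Work on a copy so the original frame is left untouched
df_new = df.copy()

# Boolean mask selects the rows, 'volume' selects the column to overwrite
df_new.loc[df_new["ID"] != 5, "volume"] = 0

print(df_new)
```

The copy matters here: without it, the assignment would modify `df` itself.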

Related

Find values in a Pandas dataframe and insert the data in a column of another Pandas dataframe

I have a dataframe that I need to convert the Custom Field column rows to columns in a second dataframe. This part I have managed to do and it works fine.
The problem is that I need to add the corresponding values from the id column to the respective columns of the second dataframe.
Here is an example:
This is first dataframe:
This is the second dataframe, with the columns already converted.
But I would like to add the values corresponding to the id column of the first dataframe to the second dataframe:
Attached is the code:
import pandas as pd
data = {
    "Custom Field": ["CF1", "CF2", "CF3"],
    "id": [50, 40, 45],
    "Name": ["Wilson", "Junior", "Otavio"]
}
### create the dataframe ###
df = pd.DataFrame(data)
print(df)
### add new columns from a list ###
columns_list = []
for x in df['Custom Field']:
    ### create multiple columns with x ##
    columns_list.append(x)
### convert list to new columns ###
df2 = pd.DataFrame(df,columns=columns_list)
df2["Name"] = df["Name"]
print(df2)
### If Name in df2 equals Name in df and the df2 column equals Custom Field of df, then get the id from df and insert the value into the corresponding column of df2. ###
#### First unsuccessful attempt ###
df2_columns_names = list(df2.columns.values)
for df2_name in df2['Name']:
    for df2_cf in df2_columns_names:
        for df_name in df['Name']:
            for df_cf in df['Custom Field']:
                for df_id in df['id']:
                    if df2_name == df_name and df2_cf == df_cf:
                        df2.loc[df2_name, df2_cf] = df_id
print(df2)
Any suggestions?
Thanks in advance.
Use pivot_table
df.pivot_table(index=['Name'], columns=['Custom Field'])
As a general rule of thumb, if you are doing for loops and changing cells manually, you're using pandas wrong. Explore the methods of the framework in the docs, it can be very powerful :)
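As a rough illustration of the pivot_table suggestion, here is a sketch on the question's own data. Passing `values="id"` explicitly is my addition; it keeps the result columns as plain CF1, CF2, CF3 rather than a two-level header.

```python
import pandas as pd

# Data from the question
data = {
    "Custom Field": ["CF1", "CF2", "CF3"],
    "id": [50, 40, 45],
    "Name": ["Wilson", "Junior", "Otavio"],
}
df = pd.DataFrame(data)

# One row per Name, one column per Custom Field, cells filled from 'id'.
# Combinations that do not occur in df are left as NaN.
wide = df.pivot_table(index="Name", columns="Custom Field", values="id")

print(wide)
```

Note that pivot_table aggregates with the mean by default, so the ids come back as floats; with one id per Name/Custom Field pair the values themselves are unchanged.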

How to merge columns interspersing the data?

I'm new to Python and pandas and am trying to create a pandas MultiIndex from two independent variables, flow and head, for 27 different design points. The data is currently organized in a single dataframe with a column for each variable and a row for each design point.
Here's how I created the MultiIndex:
flow = df.loc[0, ["Mass_Flow_Rate", "Mass_Flow_Rate.1", "Mass_Flow_Rate.2"]]
dp = df.loc[:, "Design Point"]
index = pd.MultiIndex.from_product([dp, flow], names=['DP', 'Flows'])
I then created three columns of data:
df0 = df.loc[:,"Head2D"]
df1 = df.loc[:,"Head2D.1"]
df2 = df.loc[:,"Head2D.1"]
And want to merge these into a single column of data such that I can use this command:
pc = pd.DataFrame(data, index=index)
Using the three columns with the same indexes for the rows (0-27), I want to merge the columns into a single column such that the data is interspersed. If I call the columns col1, col2 and col3 and I denote the index in parentheses such that col1(0) indicates column1 index 0, I want the data to look like:
col1(0)
col2(0)
col3(0)
col1(1)
col2(1)
col3(1)
col1(2)...
The question is a bit confusing, but as I understand it, you are trying to do this:
flow = df.loc[0, ["Mass_Flow_Rate", "Mass_Flow_Rate.1", "Mass_Flow_Rate.2"]]
dp = df.loc[:, "Design Point"]
index = pd.MultiIndex.from_product([dp, flow], names=['DP', 'Flows'])
df0 = df.loc[:, "Head2D"]
df1 = df.loc[:, "Head2D.1"]
df2 = df.loc[:, "Head2D.2"]  # assumed name of the third head column
# put the columns side by side, then flatten row by row so the values
# come out as col1(0), col2(0), col3(0), col1(1), ...
data = pd.concat([df0, df1, df2], axis=1).to_numpy().ravel()
pc = pd.DataFrame(data=data, index=index)
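To check the interspersed ordering on something small, here is a sketch with made-up data: 3 design points instead of 27, invented flow and head values, and an assumed "Head2D.2" name for the third head column. Flattening the three head columns row by row gives the col1(0), col2(0), col3(0), col1(1), ... order asked for, which lines up with the order produced by `MultiIndex.from_product`.

```python
import pandas as pd

# Hypothetical stand-in for the real frame (values are invented)
df = pd.DataFrame({
    "Design Point": ["DP0", "DP1", "DP2"],
    "Mass_Flow_Rate": [1.0, 1.0, 1.0],
    "Mass_Flow_Rate.1": [2.0, 2.0, 2.0],
    "Mass_Flow_Rate.2": [3.0, 3.0, 3.0],
    "Head2D": [10, 11, 12],
    "Head2D.1": [20, 21, 22],
    "Head2D.2": [30, 31, 32],
})

flow_cols = ["Mass_Flow_Rate", "Mass_Flow_Rate.1", "Mass_Flow_Rate.2"]
head_cols = ["Head2D", "Head2D.1", "Head2D.2"]

# Flow values are taken from the first row, as in the question
flow = df.loc[0, flow_cols]
index = pd.MultiIndex.from_product([df["Design Point"], flow],
                                   names=["DP", "Flows"])

# Row-major flattening interleaves the columns: one row of heads at a time
interleaved = df[head_cols].to_numpy().ravel()

pc = pd.DataFrame({"Head": interleaved}, index=index)
print(pc)
```

Concatenating the columns end to end instead would put all of col1 first, then all of col2, which does not match the (DP, Flows) index order.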

Pandas : Data Frame Pruning

I have a data frame as given below:
data = [['1','tom',1,0],['1','tom',0,1],['2','lynda',0,1],['2','lynda',0,1]]
df = pd.DataFrame(data, columns = ['ID','NAME', 'A','B'])
df.head()
I want to transform the dataframe to look like the below:
Here a logical OR is taken over columns A and B within each group. ID and NAME always appear as the same pair, no matter how many times they occur, but columns A and B can vary (00, 10, 11, 01).
So at the end I want ID,NAME,A,B.
You can always sum and compare to 0.
import pandas as pd
data = [['1','tom',1,0],['1','tom',0,1],['2','lynda',0,1],['2','lynda',0,1]]
df = pd.DataFrame(data, columns = ['ID','NAME', 'A','B'])
# a group sum above 0 means at least one row in the group had a 1
g_df = (df.groupby(['ID', 'NAME']).sum() >0).astype(float)
g_df.reset_index()
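A possible alternative, assuming A and B only ever hold 0 or 1: the group-wise maximum is then the same as a logical OR, and it keeps the integer dtype. This swaps in `max()` for the sum-and-compare approach above; it is a sketch, not the answer's method.

```python
import pandas as pd

data = [['1', 'tom', 1, 0], ['1', 'tom', 0, 1],
        ['2', 'lynda', 0, 1], ['2', 'lynda', 0, 1]]
df = pd.DataFrame(data, columns=['ID', 'NAME', 'A', 'B'])

# For 0/1 flags, the maximum within a group is 1 exactly when
# at least one row in that group has a 1 (i.e. a logical OR)
result = df.groupby(['ID', 'NAME'], as_index=False)[['A', 'B']].max()

print(result)
```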

Appending a column to data frame using Pandas in python

I'm trying some operations on an Excel file using pandas. I want to extract some columns from the file, add another column to the extracted ones, and write all of them to a new Excel file. To do this I have to append the new column to the old columns.
Here is my code-
import pandas as pd
#Reading ExcelFIle
#Work.xlsx is input file
ex_file = 'Work.xlsx'
data = pd.read_excel(ex_file,'Data')
#Create subset of columns by extracting columns D,I,J,AU from the file
data_subset_columns = pd.read_excel(ex_file, 'Data', parse_cols="D,I,J,AU")
#Compute new column 'Percentage'
#'Num Labels' and 'Num Tracks' are two different columns in given file
data['Percentage'] = data['Num Labels'] / data['Num Tracks']
data1 = data['Percentage']
print(data1)
#Here I'm trying to append data['Percentage'] to data_subset_columns
Final_data = data_subset_columns.append(data1)
print(Final_data)
Final_data.to_excel('111.xlsx')
No error is shown. But Final_data is not giving me expected results. ( Data not getting appended)
There is no need to explicitly append columns in pandas. When you calculate a new column, it is included in the dataframe. When you export it to excel, the new column will be included.
Try this, assuming 'Num Labels' and 'Num Tracks' are in "D,I,J,AU" [otherwise add them]:
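import pandas as pd
ex_file = 'Work.xlsx'
# usecols replaces the older parse_cols argument in current pandas versions
data_subset = pd.read_excel(ex_file, sheet_name='Data', usecols="D,I,J,AU")
data_subset['Percentage'] = data_subset['Num Labels'] / data_subset['Num Tracks']
data_subset.to_excel('111.xlsx')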
The append function of a dataframe adds rows, not columns to the dataframe. Well, it does add columns if the appended rows have more columns than in the source dataframe.
DataFrame.append(other, ignore_index=False, verify_integrity=False)
Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.
I think you are looking for something like concat.
Combine DataFrame objects horizontally along the x axis by passing in axis=1.
>>> df1 = pd.DataFrame([['a', 1], ['b', 2]],
...                    columns=['letter', 'number'])
>>> df4 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']],
...                    columns=['animal', 'name'])
>>> pd.concat([df1, df4], axis=1)
  letter  number  animal    name
0      a       1    bird   polly
1      b       2  monkey  george
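Applied to the question's situation, the concat approach might look like the sketch below. It uses a small in-memory frame with invented values instead of reading Work.xlsx, and the "Other" column is a hypothetical stand-in for the extracted columns. Also worth knowing: DataFrame.append was removed in pandas 2.0, so pd.concat is the way to go on current versions either way.

```python
import pandas as pd

# Hypothetical stand-in for the sheet read from Excel
data = pd.DataFrame({
    "Num Labels": [5, 3],
    "Num Tracks": [10, 4],
    "Other": ["x", "y"],
})

# The extracted subset of columns
subset = data[["Other"]]

# The computed column, named so it gets a proper header after concat
percentage = (data["Num Labels"] / data["Num Tracks"]).rename("Percentage")

# axis=1 places the Series next to the subset as a new column,
# aligning the two on their shared row index
final_data = pd.concat([subset, percentage], axis=1)

print(final_data)
```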

Python Pandas create new column in loop

I'm attempting to create a new column for each column by dividing two columns. df is a pandas dataframe...
columns = list(df.columns.values)
for column_1 in columns:
    for column_2 in columns:
        new_column = '-'.join([column_1,column_2])
        df[new_column] = df[column_1] / df[column_2]
Getting an error: NotImplementedError: operator '/' not implemented for bool dtypes
Any thoughts would be appreciated.
Like Brian said, you're definitely trying to divide non-numeric columns. Here's a working example of dividing two columns to create a third:
import pandas as pd
name = ['bob','sam','joe']
age = [25,32,50]
wage = [50000, 75000, 32000]
people = {}
for i in range(0,3):
    people[i] = {'name':name[i], 'age':age[i],'wage':wage[i]}
# you should now have a data frame where each row is a person
# you have one string column (name), and two numerics (age and wage)
df = pd.DataFrame(people).transpose()
df['wage_per_year'] = df['wage']/df['age']
print(df)
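One way to avoid the error in the original loop is to restrict it to numeric columns first. The sketch below uses `select_dtypes(include="number")`, which leaves out bool and string columns; the sample frame and its column names are invented for illustration. It also skips dividing a column by itself, which is a choice of mine rather than something the question asked for.

```python
import pandas as pd
from itertools import permutations

# Hypothetical frame with numeric, bool and string columns
df = pd.DataFrame({
    "a": [1.0, 2.0],
    "b": [4.0, 8.0],
    "flag": [True, False],
    "name": ["x", "y"],
})

# Keep only numeric columns; bool and object columns are excluded
numeric_cols = df.select_dtypes(include="number").columns

# Build every ordered pair of distinct numeric columns
ratios = {
    f"{c1}-{c2}": df[c1] / df[c2]
    for c1, c2 in permutations(numeric_cols, 2)
}

# Attach all ratio columns at once instead of one assignment per loop step
result = pd.concat([df, pd.DataFrame(ratios)], axis=1)

print(result)
```

Taking the column list before adding anything also means the loop never iterates over the newly created ratio columns.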
