Creating new column in Pandas based on values from another column - python

The task is the following:
Add a new column to df called income10. It should contain the same
values as income with all 0 values replaced with 1.
I have tried the following code:
df['income10'] = np.where(df['income']==0, df['income10'],1)
but I keep getting an error:

You can apply a function to each value in your column:
df['income10'] = df['income'].apply(lambda x: 1 if x == 0 else x)

You are trying to reference a column which does not exist yet.
df['income10'] = np.where(df['income']==0, df['income10'], 1)  # df['income10'] is used here before it exists
In your np.where, you need to reference the column the values come from. Try this instead:
df['income10'] = np.where(df['income']==0, 1, df['income'])
Edit: corrected order of arguments
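A minimal, runnable check of the corrected np.where call, using a small made-up income column (the values are hypothetical). Series.replace(0, 1) is an extra alternative not mentioned in the answers above; it swaps the value directly without any condition:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real df
df = pd.DataFrame({"income": [0, 2500, 0, 4100]})

# np.where(condition, value_if_true, value_if_false):
# 1 where income is 0, otherwise keep the original income
df["income10"] = np.where(df["income"] == 0, 1, df["income"])

# Equivalent alternatives that give the same values
via_apply = df["income"].apply(lambda x: 1 if x == 0 else x)
via_replace = df["income"].replace(0, 1)

print(df["income10"].tolist())  # [1, 2500, 1, 4100]
```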

Related

Writing a loop with an integer in python

I have a dataframe as such:
data = [[0xD8E3ED, 2043441], [0xF7F4EB, 912788],[0x000000,6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])
I am attempting to loop through c_code to get an integer value. The following code works to obtain the integer:
hex_val = '0xFF9B3B'
print(int(hex_val, 0))
16751419
But when I try to loop through the column I run into an issue. I currently have this running but am just overwriting every value.
for i in range(len(df)):
    df['value'] = int((df['c_code'].iloc[i]), 0)
Ideal output would be a df with a value column that reflects the value of c_code. The image below shows the desired format, but notice that the value is the same for all rows. I believe that I need to append rows, but I am unsure how to do that.
I believe that you can modify the type of your column c_code and assign this to a new column.
import pandas as pd
data = [['0xD8E3ED', 2043441], ['0xF7F4EB', 912788],['0x000000',6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])
df['value'] = df['c_code'].apply(int, base=16)
Also, I had to put the hexadecimal numbers in as strings; otherwise Python itself parses literals like 0xD8E3ED as integers before pandas ever sees them.
I get this result:
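You are assigning the entire column to a new value at each step in the loop:
df["value"] = ...
To set a single row, use df.loc[i, "value"] = ... instead (with the default RangeIndex, i is also the row label). Chained indexing such as df["value"][i] = ... may write to a temporary copy and leave df unchanged.
However, you shouldn't have to loop through each value in pandas. Assuming c_code holds strings such as '0xD8E3ED', try:
df["value"] = df["c_code"].apply(lambda s: int(s, 0))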
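A sketch of both fixes on the question's three codes, stored as strings as in the first answer. int(s, 0) reads the 0x prefix, and .loc writes one cell per iteration. The column name value_loop is made up for the comparison. If the codes are instead written as unquoted literals such as 0xD8E3ED, Python has already turned them into integers, so that column needs no conversion at all.

```python
import pandas as pd

# Codes stored as strings, as in the first answer
df = pd.DataFrame(
    [["0xD8E3ED", 2043441], ["0xF7F4EB", 912788], ["0x000000", 6169]],
    columns=["c_code", "occurence"],
)

# Element-wise: call int(s, 0) on every string; base 0 honours the "0x" prefix
df["value"] = df["c_code"].apply(lambda s: int(s, 0))

# Loop version (hypothetical column name), writing one cell per iteration with .loc
df["value_loop"] = 0
for i in df.index:
    df.loc[i, "value_loop"] = int(df.loc[i, "c_code"], 0)

print(df)
```

Note that apply still calls int() once per row in Python; it simply hides the loop and avoids the overwrite bug.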

Figuring out if an entire column in a Pandas dataframe is the same value or not

I have a pandas dataframe that works just fine. I am trying to figure out how to tell whether a column, whose label I know is correct, does not contain all the same values.
The code below errors out for some reason when I want to check whether the column contains -1 in every cell:
# column = "TheColumnLabelThatIsCorrect"
# df = "my correct dataframe"
# I get a "() takes 1 or 2 arguments but 3 is passed" error
if (not df.loc(column, estimate.eq(-1).all())):
I just learned about .eq() and .all() and hopefully I am using them correctly.
It's a syntax issue; see the docs for .loc/indexing. Specifically, you want to use [] instead of ().
You can do something like:
if not df[column].eq(-1).all():
    ...
If you want to use .loc specifically, you'd do something similar:
if not df.loc[:, column].eq(-1).all():
    ...
Also, note that you don't need to use .eq(); you can just write (df[column] == -1).all() if you prefer.
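A small runnable check of the three equivalent spellings on a hypothetical frame; "estimate" stands in for the real column label:

```python
import pandas as pd

# Hypothetical frame; "estimate" is a placeholder for the real column label
df = pd.DataFrame({"estimate": [-1, -1, -1], "other": [1, 2, 3]})
column = "estimate"

all_minus_one = df[column].eq(-1).all()    # True: every cell is -1
same_check = (df[column] == -1).all()      # same result without .eq()
via_loc = df.loc[:, column].eq(-1).all()   # same result via .loc with []

if not all_minus_one:
    print("column contains something other than -1")
else:
    print("column is all -1")
```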
You could drop duplicates; if only one record remains, all records are the same.
import pandas as pd
df = pd.DataFrame({'col': [1, 1, 1, 1]})
len(df['col'].drop_duplicates()) == 1
> True
The question isn't entirely clear, but let's try the following.
Contains only -1 in each cell
df['estimate'].eq(-1).all()
Contains -1 in any cell
df['estimate'].eq(-1).any()
Select the rows where the value is -1, keeping all columns
df.loc[df['estimate'].eq(-1),:]
df['column'].value_counts() gives you a Series of all unique values in a column and their counts. To check whether all the values are the same, remove the duplicates (for example with a set) and check that exactly one value remains:
len(set(df['column'])) == 1
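A small comparison of the "are all values identical?" checks from the answers above, on hypothetical columns. nunique() is an extra option not mentioned above; by default it ignores NaN, so pass dropna=False if a missing value should count as a different value:

```python
import numpy as np
import pandas as pd

s_same = pd.Series([1, 1, 1, 1])
s_mixed = pd.Series([1, 1, 2, 1])
s_with_nan = pd.Series([1.0, np.nan, 1.0])

def all_same(s):
    # drop_duplicates keeps one entry per distinct value
    return len(s.drop_duplicates()) == 1

print(all_same(s_same), all_same(s_mixed))    # True False
print(len(set(s_same)) == 1)                  # True
print(s_same.nunique() == 1)                  # True
# NaN is ignored by default, so this reports "all the same":
print(s_with_nan.nunique() == 1)              # True
print(s_with_nan.nunique(dropna=False) == 1)  # False
```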

Making a new column based on 2 other columns

I am trying to calculate a new column, labeled in the code as "Sulphide-S(calc)-C_%S". This column can be calculated in one of two ways (see the code below). The two sulphate columns won't both be filled at the same time, so I want the result calculated from whichever column has data present. Presently I have the following, but the second equation overwrites the first:
df["Sulphide-S(calc)-C_%S"] = df["Total-S_%S"] - df["Sulphate-S(HCL Leachable)_%S"]
df.head()
df["Sulphide-S(calc)-C_%S"] = df["Total-S_%S"]- df["Sulphate-S_%S"]
df.head()
You can use the apply function in pandas to create a new column based on other columns; it returns a Series that you can add to your original dataframe. Without knowing what your dataframe looks like, the following code might not work directly: you may need to adjust the if condition so that it correctly detects an empty cell.
import pandas as pd
def create_sulfide_col(row):
    # Use whichever sulphate column is filled in for this row
    if pd.isna(row["Sulphate-S(HCL Leachable)_%S"]):
        val = row["Total-S_%S"] - row["Sulphate-S_%S"]
    else:
        val = row["Total-S_%S"] - row["Sulphate-S(HCL Leachable)_%S"]
    return val
df["Sulphide-S(calc)-C_%S"] = df.apply(create_sulfide_col, axis='columns')
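A vectorised alternative to the row-wise apply, assuming the unused sulphate column holds NaN in each row (the sample values below are made up). fillna fills the gaps in the HCl-leachable column from the other sulphate column, and a single subtraction then gives the result:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: each one has only one of the two sulphate columns filled
df = pd.DataFrame({
    "Total-S_%S": [2.0, 3.5, 1.25],
    "Sulphate-S(HCL Leachable)_%S": [0.5, np.nan, 0.25],
    "Sulphate-S_%S": [np.nan, 1.0, np.nan],
})

# Take the HCl-leachable value where present, otherwise the plain sulphate value
sulphate = df["Sulphate-S(HCL Leachable)_%S"].fillna(df["Sulphate-S_%S"])
df["Sulphide-S(calc)-C_%S"] = df["Total-S_%S"] - sulphate

print(df["Sulphide-S(calc)-C_%S"].tolist())  # [1.5, 2.5, 1.0]
```

If both columns could be empty in the same row, the result for that row stays NaN rather than raising an error.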
If I'm understanding what you're saying correctly, the second equation overwrites the first because they have the same column name. Try changing the column name in one or both of the "Sulphide-S(calc)-C_%S" to something else like "Sulphide-S(calc)-C_%S_A" and "Sulphide-S(calc)-C_%S_B":
df["Sulphide-S(calc)-C_%S_A"] = df["Total-S_%S"] - df["Sulphate-S(HCL Leachable)_%S"]
df.head()
df["Sulphide-S(calc)-C_%S_B"] = df["Total-S_%S"]- df["Sulphate-S_%S"]
df.head()

ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

I'm using Pandas 0.20.3 with Python 3.x. I want to add a column to one pandas data frame from another pandas data frame. Both data frames contain 51 rows, so I used the following code:
class_df['phone']=group['phone'].values
I got following error message:
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
class_df.dtypes gives me:
Group_ID object
YEAR object
Terget object
phone object
age object
and type(group['phone']) returns pandas.core.series.Series
Can you suggest what changes I need to make to get rid of this error?
The first 5 rows of group['phone'] are given below:
0 [735015372, 72151508105, 7217511580, 721150431...
1 []
2 [735152771, 7351515043, 7115380870, 7115427...
3 [7111332015, 73140214, 737443075, 7110815115...
4 [718218718, 718221342, 73551401, 71811507...
Name: phoen, dtype: object
In most cases, this error comes up when apply() is run on an empty dataframe and the result is assigned to a column. The approach that worked best for me was to check whether the dataframe is empty before calling apply():
if len(df) != 0:
    df['indicator'] = df.apply(assign_indicator, axis=1)
You have a column of ragged lists. Your only option is to assign a list of lists, not an array of lists (which is what .values gives):
class_df['phone'] = group['phone'].tolist()
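A minimal sketch of the tolist() fix with hypothetical stand-ins for class_df and group (shortened phone lists; the empty list mirrors row 1 in the question). Both frames are assumed to have the same number of rows in the same order, because a plain list is assigned by position rather than aligned on the index:

```python
import pandas as pd

# Hypothetical stand-ins for class_df and group
class_df = pd.DataFrame({"Group_ID": ["a", "b", "c"]})
group = pd.DataFrame({"phone": [[735015372, 7217511580], [], [735152771]]})

# A list of lists gives one list per row, even when the lists are ragged
class_df["phone"] = group["phone"].tolist()

print(class_df)
```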
The error from the question title
"ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series"
can also occur if, for whatever reason, the table has no rows.
Instead of using an if-statement, you can set the result_type argument of apply() to "reduce":
df['new_column'] = df.apply(func, axis=1, result_type='reduce')
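A sketch of both guards from the answers above, run on a hypothetical empty frame and a non-empty one. assign_indicator and the score column are made-up stand-ins for your own row function and data:

```python
import pandas as pd

def assign_indicator(row):
    # Hypothetical row function: flag rows whose score exceeds 10
    return "high" if row["score"] > 10 else "low"

empty = pd.DataFrame({"score": pd.Series([], dtype="float64")})
full = pd.DataFrame({"score": [5.0, 12.0]})

# Option 1: only call apply() when there are rows
if len(empty) != 0:
    empty["indicator"] = empty.apply(assign_indicator, axis=1)

# Option 2: ask apply() to reduce to a Series, so the result can be
# assigned as a column even when the frame has no rows
empty["new_column"] = empty.apply(assign_indicator, axis=1, result_type="reduce")
full["new_column"] = full.apply(assign_indicator, axis=1, result_type="reduce")

print(full)
```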
The data assigned to a column in a DataFrame must be a one-dimensional array. For example, consider an array num_arr that should be added to a DataFrame:
num_arr.shape
(1, 126)
For num_arr to be added as a DataFrame column, it must first be reshaped to one dimension:
num_arr = num_arr.reshape(-1, )
num_arr.shape
(126,)
Now it can be set as a DataFrame column:
df = pd.DataFrame()
df['numbers'] = num_arr

Iterate and input data into a column in a pandas dataframe

I have a pandas dataframe with a column containing a small set of strings. Let's call the column 'A'; every value in it is one of string_1, string_2 or string_3.
Now, I want to add another column and fill it with numeric values that correspond to the strings.
I created a dictionary:
d = { 'string_1' : 1, 'string_2' : 2, 'string_3': 3}
I then initialized the new column:
df['B'] = pd.Series(index=df.index)
Now, I want to fill it with the integer values. I can call the values associated with the strings in the dictionary by:
for s in df['A']:
    n = d[s]
That works fine, but using plain df['B'] = n inside the for-loop to fill the new column doesn't work, and I haven't been able to figure out the indexing in pandas.
If I understand you correctly you can just call map:
df['B'] = df['A'].map(d)
This will perform the lookup and fill the values you are looking for.
Rather than initializing an empty column and filling it, you can simply populate the column with apply:
df['B'] = df['A'].apply(d.get)
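A runnable comparison of the two answers on a made-up column A, plus the per-row loop the question was attempting, rewritten to write one cell at a time with .loc. The extra column names C and D exist only for the comparison:

```python
import pandas as pd

d = {"string_1": 1, "string_2": 2, "string_3": 3}
df = pd.DataFrame({"A": ["string_2", "string_1", "string_3", "string_2"]})

# map looks up every value of A in the dict in one call
df["B"] = df["A"].map(d)

# apply with d.get performs the same lookup one value at a time
df["C"] = df["A"].apply(d.get)

# Hypothetical loop version: write one cell per row with .loc
df["D"] = 0
for idx, s in df["A"].items():
    df.loc[idx, "D"] = d[s]

print(df)
```

map is usually the simplest choice here; the loop is only shown to explain why df['B'] = n overwrote the whole column.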
