How to make the columns of a dataframe variable - python

I want the columns of Salary_Data_split to vary depending on Sal_name (type: list), where:
Sal_name = ['Success_S_1', 'Failure_S_1', 'Success_S_2', 'Failure_S_2','Success_S_4', 'Failure_S_4','Success_S_7', 'Failure_S_7','Success_S_8', 'Failure_S_8']
Salary_Data_split must contain Salary plus the columns listed in Sal_name, like this:
Salary_Data_split = data[["Salary",'Success_S_1', 'Failure_S_1', 'Success_S_2', 'Failure_S_2','Success_S_4', 'Failure_S_4','Success_S_7', 'Failure_S_7','Success_S_8', 'Failure_S_8']]
I have tried this code, but it doesn't work:
Salary_Data_split = data[["Salary", Sal_name]]

Please always include example data in your posts. It's also important to always include error messages. That way, your question is a lot clearer. I am guessing data is your dataframe with columns Sal_name and Salary, which you want to combine in sal_Data_Split?
data['sal_Data_Split'] = [data['Salary'], data['Sal_name']]
This will put the columns Salary and Sal_name in a list, resulting in a nested list if data['Sal_name'] is a list itself. The way you assigned Salary_Data_split = data[["Salary", Sal_name]] in your original post, it just indexes two columns of the dataframe at once. You also forgot the quotation marks around Sal_name, if that is what you meant.
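If Sal_name is a plain Python list of column names, as in the question, one way to get the desired selection is to concatenate it with ["Salary"] so that a single flat list of labels is passed to the indexer. A minimal sketch, assuming a small made-up dataframe (the values, the "Other" column, and the shortened Sal_name are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror the question.
data = pd.DataFrame({
    "Salary": [1000, 2000],
    "Success_S_1": [1, 0],
    "Failure_S_1": [0, 1],
    "Other": ["x", "y"],
})
Sal_name = ["Success_S_1", "Failure_S_1"]

# Build one flat list of labels: "Salary" followed by everything in Sal_name.
Salary_Data_split = data[["Salary"] + Sal_name]
print(Salary_Data_split.columns.tolist())
```

Because the list is built at runtime, changing Sal_name changes which columns are selected without touching the indexing line.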

Related

Create a column in dataframe with name of an existing array (initial 4 letters of array name)

I would like to create a column in a dataframe named after an array. For example, if the name of the array is "customer", then the name of the column should be "cust_prop" (the initial 4 letters of the array's name). Is there any way to do this?
Your question is a bit unclear, but presuming that you are asking how to turn the string "customer" into "cust_prop", that's easy enough:
Str = "customer"
NewStr = Str[0:4] + "_prop"
You might need some extra checking for shorter strings, but I don't know what behaviour you would want there.
If you mean something else, please post some code examples of what you have tried.
You didn't really describe from where you get an array name, so I'll just assume you have it in a variable:
array_name = 'customer'
To slice only the first four characters and use them:
new_col_name = f'{array_name[0:4]}_prop'
df[new_col_name] = 1
Here I "created" a new column in the existing dataframe df and set the entire column to 1. Instead, you can create a series with any values you want:
series = pd.Series(name=new_col_name, data=array_customer)
Here I created a series with the desired name, assuming you have an array_customer variable that holds the array.
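Putting the pieces above together, here is a minimal runnable sketch. It assumes the array name is available as a string and that the array has the same length as the dataframe. The contents of array_customer and df are hypothetical.

```python
import pandas as pd

array_name = "customer"               # assumed to be known as a string
array_customer = [10, 20, 30]         # hypothetical array contents

df = pd.DataFrame({"id": [1, 2, 3]})  # hypothetical existing dataframe

# First four characters of the name plus the "_prop" suffix.
new_col_name = f"{array_name[:4]}_prop"
df[new_col_name] = array_customer

# Slicing a string shorter than four characters does not raise;
# it simply returns the whole string.
short_name = f"{'ab'[:4]}_prop"
print(new_col_name, short_name)
```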

Cannot use replace method on python

I wanted to change the values in a specific column using dictionary values, but they do not change. I tried several approaches, but none of them work. My dataset has 47k rows and my dictionary has 30 different words, so I will show some.
My dataset:
Dictionary:
rolechange = {"\\Adv":"Adversary",
"\\Sci":"Scientist",
"\\Inn":"Innocent",
"\\Und":"Undetermined"}
I'm trying
movies_df["Role Type"].replace(rolechange, inplace=True)
It does not give an error, but the result is the same. I couldn't find a similar question here; sorry if it's a duplicate.
You just have to create raw strings (prefix 'r')
rolechange = {r"\\Adv":"Adversary",
r"\\Sci":"Scientist",
r"\\Inn":"Innocent",
r"\\Und":"Undetermined"}
>>> df['Role Type'].replace(rolechange)
0 Scientist
1 Innocent
2 Undetermined
Name: Role Type, dtype: object
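To see why the prefix matters: in a normal string literal "\\" is an escape for a single backslash, while in a raw string both backslashes are kept. The sketch below assumes that the column values really contain two literal backslashes (the original dataset is not shown, so the sample series is made up):

```python
import pandas as pd

# Hypothetical sample: each value starts with two literal backslashes.
s = pd.Series([r"\\Sci", r"\\Inn", r"\\Und"], name="Role Type")

plain_keys = {"\\Sci": "Scientist", "\\Inn": "Innocent", "\\Und": "Undetermined"}
raw_keys = {r"\\Sci": "Scientist", r"\\Inn": "Innocent", r"\\Und": "Undetermined"}

# The plain literal is one character shorter than the raw one.
print(len("\\Sci"), len(r"\\Sci"))

unchanged = s.replace(plain_keys)  # keys hold one backslash: no exact match
replaced = s.replace(raw_keys)     # keys hold two backslashes: match
print(replaced.tolist())
```

If the data held only a single backslash, the plain-string dictionary from the question would be the one that matches, so it is worth checking a raw value with repr() first.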

Iterate through list of dataframes, performing calculations on certain columns of each dataframe, resulting in new dataframe of the results

Newbie here. Just as the title says, I have a list of dataframes (each dataframe is a class of students). All dataframes have the same columns. I have made certain columns global.
BINARY_CATEGORIES = ['Gender', 'SPED', '504', 'LAP']
for example. These are yes/no or male/female categories, and I have already changed all of the data to be 1's and 0's for these columns. There are several other columns which I want to ignore as I iterate.
I am trying to accept the list of classes (dataframes) into my function and perform calculations on each dataframe using only my BINARY_CATEGORIES list of columns. This is what I've got, but it isn't making it through all of the classes and/or all of the columns.
def bal_bin_cols(classes):
    i = 0
    c = 0
    for x in classes:
        total_binary = classes[c][BINARY_CATEGORIES[i]].sum()
        print(total_binary)
        i+=1
        c+=1
Eventually I need a new dataframe from this with all of the sums corresponding to the categories and the respective classes. print(total_binary) is just a placeholder/debugger. I don't yet have the code that will populate the dataframe from the results of the above code, but I'd like it to have the classes as the index and the total calculations as the columns.
I know there's probably a vectorized way to do this, or enum, or groupby, but I will take a fix to my loop. I've been stuck forever. Please help.
Try something like:
Firstly create a dictionary:
d={
'male':1,
'female':0,
'yes':1,
'no':0
}
Finally, use replace() with the dictionary, which matches whole cell values (so 'male' is not matched inside 'female'):
df[BINARY_CATEGORIES]=df[BINARY_CATEGORIES].replace(d)
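For the question as asked (one row of sums per class), a minimal sketch follows. It assumes the binary columns already hold 1's and 0's and iterates over the dataframes directly instead of keeping manual counters. The two sample classes and the "Name" column are invented for illustration.

```python
import pandas as pd

BINARY_CATEGORIES = ['Gender', 'SPED', '504', 'LAP']

def bal_bin_cols(classes):
    # One Series of column sums per class, restricted to the binary columns.
    totals = [cls[BINARY_CATEGORIES].sum() for cls in classes]
    # Each Series becomes one row; the row label is the class position.
    result = pd.DataFrame(totals)
    result.index.name = 'class'
    return result

# Hypothetical sample classes.
class_a = pd.DataFrame({'Name': ['a', 'b', 'c'],
                        'Gender': [1, 0, 1], 'SPED': [0, 0, 1],
                        '504': [0, 1, 0], 'LAP': [1, 1, 1]})
class_b = pd.DataFrame({'Name': ['d', 'e'],
                        'Gender': [0, 0], 'SPED': [1, 1],
                        '504': [0, 0], 'LAP': [1, 0]})

result = bal_bin_cols([class_a, class_b])
print(result)
```

Selecting all binary columns at once means no column index has to be tracked, which is where a loop with separate counters for class and column tends to go wrong.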

How to modify cells in column conditionally in pandas?

I have a csv dataset which for whatever reason has an extra asterisk (*) at the end of some names. I am trying to remove them, but I'm having trouble. I just want to replace the name in the case where it ends with a *, otherwise keep it as-is.
I have tried a couple variations of the following, but with little success.
import pandas as pd
people = pd.read_csv("people.csv")
people.loc[people["name"].str[-1] == "*"] = people["name"].str[:-1]
Here I am getting the following error:
ValueError: Must have equal len keys and value when setting with an iterable
I understand why this is wrong, but I'm not sure how else to reference the values I want to change.
I could instead do something like:
starred = people.loc[people["name"].str[-1] == "*"]
starred["name"] = starred["name"].str[:-1]
I get a warning here, but this kind of works. The problem is that it only contains the previously starred people, not all of them.
I'm kind of new to this, so apologies if this is simple. I feel like it shouldn't be too hard, there should be some function to do this, but I don't know what it is.
Your syntax for pd.DataFrame.loc needs to include a column label:
df = pd.DataFrame({'name': ['John*', 'Rose', 'Summer', 'Mark*']})
df.loc[df['name'].str[-1] == '*', 'name'] = df['name'].str[:-1]
print(df)
name
0 John
1 Rose
2 Summer
3 Mark
If you only specify the first part of the indexer, you will be filtering by row label only and return a dataframe. You cannot assign a series to a dataframe.
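An alternative that avoids the boolean mask altogether is a regular-expression replacement anchored at the end of the string. This is a different technique from the .loc assignment, sketched here with the same sample names:

```python
import pandas as pd

df = pd.DataFrame({'name': ['John*', 'Rose', 'Summer', 'Mark*']})

# r'\*$' matches a literal asterisk only at the end of the string;
# names without a trailing asterisk are left unchanged.
df['name'] = df['name'].str.replace(r'\*$', '', regex=True)
print(df['name'].tolist())
```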

Using dicts to look up values for DataFrame variables

I have a pandas DataFrame with columns Teacher_ID and Student_ID. I also have dicts for each, TDict and SDict, giving, say, the grade in which each teacher teaches and the grade each student is enrolled in, with their ID numbers as the keys.
I want to create a new column in my DataFrame referencing the information in the dicts. But when I try to create a column with a formula something like TDict[Teacher_ID] + SDict[Student_ID], I get an error message telling me that "'Series' objects are mutable, thus they cannot be hashed."
What's the approved way around this? Do I have to copy the ID's into new columns, replace the values in those columns with the dict values, and then work from there? I'm guessing there's a better way....
If I understand you correctly then you can simply call map:
df['Teaching_grade'] = df['Teacher_ID'].map(TDict)
df['Student_grade'] = df['Student_ID'].map(SDict)
This will perform the lookup and assign the value to the new column
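As a small end-to-end illustration of the map approach, the sketch below uses made-up IDs and grades. The Grade_sum column is a hypothetical name standing in for the TDict[Teacher_ID] + SDict[Student_ID] formula from the question.

```python
import pandas as pd

# Hypothetical lookup tables keyed by ID.
TDict = {'T1': 3, 'T2': 5}
SDict = {'S1': 3, 'S2': 4, 'S3': 5}

df = pd.DataFrame({'Teacher_ID': ['T1', 'T2', 'T2'],
                   'Student_ID': ['S1', 'S2', 'S3']})

# map looks up each ID in the dict, element by element.
df['Teaching_grade'] = df['Teacher_ID'].map(TDict)
df['Student_grade'] = df['Student_ID'].map(SDict)

# The mapped columns are ordinary Series, so they can be combined directly.
df['Grade_sum'] = df['Teaching_grade'] + df['Student_grade']
print(df)
```

IDs that are missing from the dict come back as NaN rather than raising an error, so a quick check with isna() can be useful after mapping.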
