assign values to columns pandas dataframe [duplicate] - python

I have a temporary dataframe, temp (shown below), sliced from a larger dataframe.
I would appreciate help assigning the item_price value of each row to the column that matches that row's model, as shown below:
Note: the original, larger dataframe contains brands, prices and models. Some rows share a brand name but differ in model and price, so I slice those records into the temp dataframe and try to assign each record's price to the column for its model.
Thanks in advance!

If I were you, I would delete the columns 'Sedan', 'Sport' and 'SUV' and use pivot.
In your case you would do the following.
Create a new DataFrame called df1 like so:
df1 = df.pivot(index='brand', columns='model', values='item_price')
Then join your original DataFrame df with df1:
df = df.join(df1, on='brand')
This will give you the result you are looking for.
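The pivot-and-join steps above can be sketched end to end as follows. The sample rows and the name wide are made up for illustration, since the asker's data is not shown; the sketch also assumes each (brand, model) pair occurs only once, because pivot raises a ValueError on duplicate pairs.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's temp dataframe.
df = pd.DataFrame({
    "brand": ["Acme", "Acme", "Zenith"],
    "model": ["Sedan", "SUV", "Sedan"],
    "item_price": [78.0, 95.0, 60.0],
})

# One row per brand, one column per model, prices as the cell values.
# pivot raises ValueError if a (brand, model) pair appears more than once.
wide = df.pivot(index="brand", columns="model", values="item_price")

# Attach the per-model price columns to every original row of that brand.
result = df.join(wide, on="brand")
print(result)
```

A brand that has no row for a given model ends up with NaN in that model's column.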

You can write a function that returns the value based on a condition, like this.
I'm using df as the name of the dataframe; you can rename it to temp.
def set_item_price(model):
    if model == "Sedan":
        return 78.00
    return 0

df["item_price"] = [
    set_item_price(a) for a in df['model']
]
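If the goal is to copy each row's own price into the column named after its model, rather than hard-coding one price, a vectorized variant with Series.where may be worth considering. This is a different technique from the function above, not a rewrite of it, and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "brand": ["Acme", "Acme", "Zenith"],
    "model": ["Sedan", "SUV", "Sport"],
    "item_price": [78.0, 95.0, 60.0],
})

for model in ["Sedan", "Sport", "SUV"]:
    # Keep the price where the row's model matches, use 0 everywhere else.
    df[model] = df["item_price"].where(df["model"] == model, 0)

print(df)
```

Replacing the 0 with the default (omit the second argument) leaves NaN in the non-matching cells instead.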

Related

Pandas: Sort Dataframe if Column Value Exists in another Dataframe

I have a database with two columns of unique numbers. This is my reference dataframe (df_reference). From another dataframe (df_data) I want to get the rows whose values in one column also exist in this reference dataframe. I tried things like:
df_new = df_data[df_data['ID'].isin(df_reference)]
However, this returns no results. What am I doing wrong here?
From what I see, you are passing the whole dataframe to the .isin() method.
Try:
df_new = df_data[df_data['ID'].isin(df_reference['ID'])]
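Alternatively, convert the ID column to the index of the df_data dataframe. Then you could do:
df_data = df_data.set_index('ID')
matching_index = df_reference['ID']
df_new = df_data.loc[matching_index, :]
This should solve the issue, provided every ID in df_reference is present in df_data; in current pandas versions, .loc raises a KeyError for labels missing from the index.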
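Both answers can be tried on a small example. The data below is invented, and the index-based variant uses Index.intersection so that IDs missing from df_data are skipped rather than causing an error; that guard is an addition, not part of the original answer.

```python
import pandas as pd

# Hypothetical reference and data tables.
df_reference = pd.DataFrame({"ID": [1, 2, 3], "other_ID": [10, 20, 30]})
df_data = pd.DataFrame({"ID": [2, 3, 4], "value": ["a", "b", "c"]})

# Boolean mask: compare against one column, not the whole dataframe.
df_new = df_data[df_data["ID"].isin(df_reference["ID"])]

# Index-based variant: look up only the labels present in both tables.
indexed = df_data.set_index("ID")
common = indexed.index.intersection(pd.Index(df_reference["ID"]))
by_index = indexed.loc[common]

print(df_new)
print(by_index)
```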

changing row values in a dataframe by looking into another dataframe [duplicate]

I have a lookup table as a dataframe (1,000 rows) consisting of codes and labels. I have another dataframe (200,000 rows) consisting of codes and geometries.
I need to get the label name for each code by looking it up in the lookup dataframe.
The output should be a dataframe.
I tried it as follows.
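import pandas as pd

df = pd.read_csv(filepath)   # lookup table: codes and labels
codes = df['codes'].values
labels = df['labels'].values
df2 = pd.read_csv(filepath)  # data table: codes and geometries
print(df2.shape)
for ix in df2.index:
    code = df2.loc[ix, 'code']
    # Take the first label whose code matches this row's code.
    df2.loc[ix, 'label'] = labels[codes == code][0]
print(df2)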
The result is correct, but the for loop is very slow.
Can you help me?
You should use the merge method of DataFrames (https://pandas.pydata.org/docs/reference/api/pandas.merge.html). It lets you join two dataframes on a common column. Your code should look like this:
df2 = df2.merge(df, left_on="code", right_on="codes", how="left")
# Check labels using df2["labels"]
The key columns are specified with the parameters left_on and right_on. The parameter how='left' means that all rows from df2 are preserved, even those whose code has no match in df.
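Here is a small sketch of the left merge on invented data; the names lookup and data are placeholders for the two CSV-backed dataframes. It also shows Series.map as an alternative that adds only the label column. That alternative is not from the answer above and assumes the lookup codes are unique.

```python
import pandas as pd

# Hypothetical lookup table (codes -> labels) and data table.
lookup = pd.DataFrame({"codes": [1, 2, 3],
                       "labels": ["forest", "water", "urban"]})
data = pd.DataFrame({"code": [3, 1, 1, 9],
                     "geometry": ["g0", "g1", "g2", "g3"]})

# Left merge keeps every row of data; unmatched codes get NaN labels.
merged = data.merge(lookup, left_on="code", right_on="codes", how="left")
# The merge keeps both key columns, so drop the redundant one.
merged = merged.drop(columns="codes")

# Alternative: map each code through a Series indexed by code.
mapped = data["code"].map(lookup.set_index("codes")["labels"])

print(merged)
```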

Is there a function that can remove multiple rows based on multiple specific column values in a pandas dataframe?

I have a particular Pandas dataframe that has multiple different string categories in a particular column - 'A'. I want to create a new dataframe with only rows that contain 7 separate categories from column A out of about 15.
I know that I can individually remove/add categories using:
df1 = df[df.Category != 'a']
but I also tried using a list to do it in a single line, like so:
df1 = df[df.Category = ['x','y','z']]
but that gave me a syntax error. Is there any way to do this?
Try:
df1 = df[df.Category.isin(['x','y','z'])]
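As a quick illustration on made-up data, isin builds a boolean mask that keeps the listed categories, and negating the mask with ~ removes them instead. The column values and the name keep are assumptions.

```python
import pandas as pd

# Hypothetical data with a handful of categories.
df = pd.DataFrame({"Category": ["x", "a", "y", "b", "z"],
                   "value": [1, 2, 3, 4, 5]})
keep = ["x", "y", "z"]

# Rows whose category is in the list.
kept = df[df["Category"].isin(keep)]
# Rows whose category is NOT in the list.
dropped = df[~df["Category"].isin(keep)]

print(kept)
print(dropped)
```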

Create multiple column pandas from single column and feed in values

I have a dataframe that looks like this (df1):
I want to reshape the following dataframe (df2) to look like df1:
The number of years in df2 goes up to 2020.
So, essentially for each row in df2, a new row for each year should be created. Then, new columns should be created for each month. Finally, the value for % in each row should be copied to the column corresponding to the month in the "Month" column.
Any ideas?
Many thanks.
This is a pivot:
(df2.assign(Year=df2.Month.str[:4],
            Month=df2.Month.str[5:])
    .pivot(index='Year', columns='Month', values='%')
)
More details about pivoting a dataframe here.
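The question's tables are not shown, so the sketch below assumes the Month column holds strings such as "2019-01" (four-digit year, a separator, then the month), which is what the slices str[:4] and str[5:] imply. The values are invented.

```python
import pandas as pd

# Hypothetical long-format data; "Month" is assumed to look like "YYYY-MM".
df2 = pd.DataFrame({
    "Month": ["2019-01", "2019-02", "2020-01", "2020-02"],
    "%": [1.5, 2.0, 0.5, 3.0],
})

wide = (
    # Split the combined string into separate Year and Month columns.
    df2.assign(Year=df2["Month"].str[:4], Month=df2["Month"].str[5:])
       # One row per year, one column per month.
       .pivot(index="Year", columns="Month", values="%")
)

print(wide)
```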

pandas max function results in inoperable DataFrame

I have a DataFrame with four columns and want to generate a new DataFrame with only one column containing the maximum value of each row.
Using df2 = df1.max(axis=1) gave me the correct results, but the column is titled 0 and is not operable, meaning I cannot check its data type or change its name, which is critical for further processing. Does anyone know what is going on here? Or better yet, is there a better way to generate this new DataFrame?
The result is a Series; for a one-column DataFrame, use Series.to_frame:
df2 = df1.max(axis=1).to_frame('maximum')
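A short sketch with invented numbers shows the difference: the row-wise max comes back as a Series, which has a dtype and can be renamed directly, and to_frame turns it into a named one-column DataFrame.

```python
import pandas as pd

# Hypothetical four-column dataframe.
df1 = pd.DataFrame({"a": [1, 5], "b": [4, 2], "c": [3, 3], "d": [0, 9]})

# Row-wise maximum: this is a Series, not a DataFrame.
row_max = df1.max(axis=1)
print(type(row_max), row_max.dtype)

# Convert to a one-column DataFrame with a chosen column name.
df2 = row_max.to_frame("maximum")
print(df2)
```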
