Pandas - Appending DataFrame [duplicate] - python

This question already has answers here:
df.append() is not appending to the DataFrame
(2 answers)
Closed 1 year ago.
When appending to a pandas DataFrame, the appended value doesn't get added to the DataFrame.
I am trying to make an empty DataFrame, and then be able to add more rows onto it, later in my code.
import pandas
df = pandas.DataFrame(columns=["A"])
df.append(pandas.DataFrame([[1]]))
print(df)
Output:
Empty DataFrame
Columns: [A]
Index: []
Any ideas what I might be doing wrong?
According to the documentation, this should add a new row with the value 1 under column A. However, as shown above, no row is appended.

As @HenryEcker mentioned, append does not modify the DataFrame in place; it returns a copy of the DataFrame with the new values, so you have to assign the result back. Your code should be:
import pandas
df = pandas.DataFrame(columns=["A"])
df = df.append(pandas.DataFrame([1], columns=['A']))
print(df)
Output:
A
0 1
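Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a current installation the code above raises an AttributeError. A minimal sketch of the same idea with pandas.concat, assuming the rows arrive as small DataFrames that can be collected in a list (the name frames is only illustrative):

```python
import pandas as pd

# Collect the pieces first, then concatenate once.
# pd.concat returns a new DataFrame, so the result must be assigned.
frames = []
frames.append(pd.DataFrame([1], columns=["A"]))
frames.append(pd.DataFrame([2], columns=["A"]))

df = pd.concat(frames, ignore_index=True)
print(df)
```

Concatenating once at the end also avoids copying the whole DataFrame for every added row.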

Related

how to modify dataframe when referring them through a list loop [duplicate]

This question already has answers here:
Why isn't Pandas .fillna() filling values in DataFrame?
(2 answers)
Closed last year.
I have many dataframes and I store them in a list.
Now I'd like to do simple fillna(0) to each dataframe, so I do the following, but it didn't work:
df_list = [df_1,df_2,df_3,df_4]
for df in df_list:
    df = df.fillna(0)
    df.index = df.index.strftime('%Y-%m-%d')
I think the df on the left-hand side inside the loop is not the same object as the original dataframe. How can I modify the originals?
In the first line of the loop, you bind the name df to the new dataframe returned by fillna and never store it back in the list, so the original dataframes are unchanged.
Instead, you can use inplace=True to do the work on the existing dataframe without creating a new one.
for df in df_list:
    df.fillna(0, inplace=True)
    df.index = df.index.strftime('%Y-%m-%d')
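If you would rather not use inplace, another option is to write each result back into the list by position. This is a sketch with made-up data; df_1 and df_2 here are hypothetical stand-ins for the dataframes in the question.

```python
import pandas as pd

# Hypothetical sample data with a DatetimeIndex and some missing values
idx = pd.to_datetime(["2021-01-01", "2021-01-02"])
df_1 = pd.DataFrame({"x": [1.0, None]}, index=idx)
df_2 = pd.DataFrame({"x": [None, 2.0]}, index=idx)
df_list = [df_1, df_2]

# Rebinding the loop variable does not change what the list holds
for df in df_list:
    df = df.fillna(0)
still_has_nan = df_list[0]["x"].isna().any()

# Storing the new DataFrame back by position does
for i, df in enumerate(df_list):
    df = df.fillna(0)
    df.index = df.index.strftime("%Y-%m-%d")
    df_list[i] = df
```

Here the list ends up holding the filled dataframes, while the names df_1 and df_2 still refer to the untouched originals.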

assign values to columns pandas dataframe [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 1 year ago.
I have a temporary dataframe temp (as shown below) sliced from a larger dataframe.
I would appreciate help assigning the item_price value of each row to the column that corresponds to its model, as shown below:
Note: original and larger dataframe contains brands, prices and models which some of the rows have a similar brand name with different model and price, so I slice those similar records into temp dataframe and try to assign price to related columns associated with model for each record.
Thanks in advance!
If I were you I would delete the columns 'Sedan', 'Sport' and 'SUV' and use pivot
In your case you would want to do the following:
Create a new Dataframe called df1 like so:
df1 = df.pivot(index='brand', columns='model', values='item_price')
And then join your original DataFrame df with df1.
df = df.join(df1, on='brand')
This will give you the result you are looking for.
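A small worked example of the pivot-and-join approach, using made-up brands and prices (the data is hypothetical, and the frame is assumed not to contain the 'Sedan', 'Sport' and 'SUV' columns yet). Note that pivot requires each brand/model pair to appear at most once.

```python
import pandas as pd

# Hypothetical stand-in for the temp dataframe from the question
df = pd.DataFrame({
    "brand": ["Acme", "Acme", "Zeta"],
    "model": ["Sedan", "SUV", "Sport"],
    "item_price": [78.0, 95.0, 120.0],
})

# One row per brand, one column per model, prices as values
df1 = df.pivot(index="brand", columns="model", values="item_price")

# Attach the per-model price columns to every row of the same brand
result = df.join(df1, on="brand")
print(result)
```

Models that a brand does not have show up as NaN in the joined columns.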
You can create a function that returns the value based on a condition, like this:
I'm using df as the name of the dataframe; you can rename it to temp.
def set_item_price(model):
    if model == "Sedan":
        return 78.00
    return 0
df["item_price"] = [
    set_item_price(a) for a in df['model']
]

How to set a new index [duplicate]

This question already has answers here:
How to convert index of a pandas dataframe into a column
(9 answers)
Closed 1 year ago.
My df has the columns 'Country' and 'Country Code' as the current index. How can I remove this index and create a new one that just counts the rows? I'll leave the picture of how it's looking. All I want to do is add a new index next to Country. Thanks a lot!
If you are using a pandas DataFrame and your DataFrame is called df:
df = df.reset_index(drop=False)
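As an illustration, here is a sketch with a made-up two-level index (the country names and the Population column are hypothetical). With drop=False, which is also the default, the old index levels become ordinary columns and the rows get a fresh integer index.

```python
import pandas as pd

# Hypothetical dataframe indexed by 'Country' and 'Country Code'
df = pd.DataFrame(
    {"Population": [10, 20]},
    index=pd.MultiIndex.from_tuples(
        [("Aland", "AL"), ("Borduria", "BO")],
        names=["Country", "Country Code"],
    ),
)

# Move both index levels into columns and number the rows from 0
flat = df.reset_index()
print(flat)
```

Passing drop=True instead would discard 'Country' and 'Country Code' rather than keep them as columns.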

Get the specified set of columns from pandas dataframe [duplicate]

This question already has answers here:
How to select all columns whose names start with X in a pandas DataFrame
(11 answers)
Closed 2 years ago.
I manually select the columns in a pandas dataframe using
df_final = df[['column1','column2'.......'column90']]
Instead, I build the list of column names with
dp_col = [col for col in df if col.startswith('column')]
But I am not sure how to use this list to get only that set of columns from the source dataframe.
You can use this as the list of columns to select, so:
df_final = df[[col for col in df if col.startswith('column')]]
The "origin" of the list of strings is of no importance: as long as you pass a list of existing column names to the subscript, this will work.
Use loc access with boolean masking:
df.loc[:, df.columns.str.startswith('column')]
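Both approaches select the same columns. A quick check on a toy dataframe (the column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"column1": [1], "column2": [2], "other": [3]})

# Approach 1: build a list of names and pass it to the subscript
dp_col = [col for col in df if col.startswith("column")]
by_list = df[dp_col]

# Approach 2: boolean mask over the column labels
by_mask = df.loc[:, df.columns.str.startswith("column")]

print(by_list.equals(by_mask))
```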

How to get only the columns I want to move to new dataframe with this code? [duplicate]

This question already has answers here:
How to take column-slices of dataframe in pandas
(11 answers)
Closed 3 years ago.
I am trying to select only 2 columns from a csv file: Body and CreatedDate.
CreatedDate looks like this: 2018-08-07T12:36:11.000Z.
Body is just text of work being done. Some Body cells are empty so I only want the ones with data in it.
I have tried using the code below to just get only the 2 desired columns:
import pandas as pd
df = pd.read_csv("file.csv")
df1= df['CreatedDate'].map(str) + ' ' + df['Body'].map(str)
print(df1)
I am getting the entire df printed twice. I see this:
[10 rows x 15 columns] & [15 rows x 10 columns]
at the bottom of each print. I am expecting to only see my 2 chosen columns. Why am I seeing all of df twice on the console?
There are many options for indexing a dataframe. This particular one can be done on a single line.
import pandas as pd
# read the csv into df
df = pd.read_csv("file.csv")
# take only the rows where 'Body' has a value and only columns ['Body', 'CreatedDate']
df = df.loc[df['Body'].notnull(),['Body', 'CreatedDate']]
print(df)
You may also want to read up on pandas.DataFrame.dropna.
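For comparison, a sketch of the dropna route. It reads an in-memory CSV so that it runs without file.csv; the sample rows and the extra Other column are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for file.csv
csv_text = (
    "CreatedDate,Body,Other\n"
    "2018-08-07T12:36:11.000Z,Fixed pump,x\n"
    "2018-08-08T09:00:00.000Z,,y\n"
)

# Load only the two columns of interest
df = pd.read_csv(io.StringIO(csv_text), usecols=["Body", "CreatedDate"])

# Drop the rows where 'Body' is missing
df = df.dropna(subset=["Body"])
print(df)
```

Using usecols keeps the other columns from being loaded at all, which can matter for wide files.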
