Put a new column inside a Dataframe Python - python

The DataFrame I am working with has a column called "Brand" that contains the value "SEAT " with trailing white space. I managed to strip the white space, but I don't know how to put the new column back into the original DataFrame. I need this because I want to filter the original DataFrame by "SEAT" and show those rows.
I tried this:
import pandas as pd
brand_reviewed = df_csv2.Brand.str.rstrip()
brand_correct = 'SEAT'
brand_reviewed.loc[brand_reviewed['Brand'].isin(brand_correct)]
Thank you very much!

As far as I understand,
you're trying to return rows that match the pattern "SEAT".
You are not forced to create a new column. You can directly do the following:
df2 = df_csv2[df_csv2.Brand.str.rstrip() == "SEAT"]
print(df2)

You have done great. I will also mention another way to clean the white space. And if you just want to add a new column to your current DataFrame, the last line of this code is all you need.
import pandas as pd
brand_reviewed = pd.read_csv("df_csv2.csv")
# strip() removes both leading and trailing white space
data2 = brand_reviewed["Brand"].str.strip()
# assigning to a new key adds the cleaned values as a new column
brand_reviewed["New Column"] = data2
If you have another query, let me know.
Octavio Velázquez
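Putting the two answers together, here is a minimal sketch of cleaning the column in place and then filtering on it. The small DataFrame and its "Model" column are made-up stand-ins for df_csv2, not the asker's data. Note that isin expects a list-like argument, so a bare string such as 'SEAT' has to be wrapped in a list.

```python
import pandas as pd

# Hypothetical stand-in for df_csv2; the real data comes from the asker's CSV.
df_csv2 = pd.DataFrame({
    "Brand": ["SEAT ", "AUDI", "SEAT ", "BMW "],
    "Model": ["Ibiza", "A3", "Leon", "X1"],
})

# Overwrite the column with its stripped version so the DataFrame itself is clean.
df_csv2["Brand"] = df_csv2["Brand"].str.strip()

# Filter with a plain equality comparison...
seat_rows = df_csv2[df_csv2["Brand"] == "SEAT"]

# ...or with isin, which needs a list-like rather than a single string.
seat_rows_isin = df_csv2[df_csv2["Brand"].isin(["SEAT"])]

print(seat_rows)
```

Assigning back to df_csv2["Brand"] replaces the original column. Use a different key, as in the answer above, if the untouched values should be kept as well.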

Related

Dropping rows and finding the average of a specific column

I am trying to remove specific rows from the dataset and find the average of a specific column after the rows are removed, without changing the original dataset.
import pandas as pd
import numpy as np
df = pd.read_csv(r"C:\Users\User\Downloads\nba.CSV")
NBA = pd.read_csv(r"C:\Users\User\Downloads\nba.CSV")
NBA.drop([25,72,63],axis=0)
I need to find the average of a specific column like "Age".
However, this isn't working: Nba.drop([25,72,63],axis=0),['Age'].mean()
Neither is the query command or the .loc command.
Can you try this? I think there was a typo in your code (the comma before ['Age']):
NBA.drop([25,72,63],axis=0)['Age'].mean()
Your code to drop the rows is correct.
NBA_clean = NBA.drop([25,72,63],axis=0)
will give you a new dataframe with some rows removed.
To find the average of a specific column, you can use index notation, which will return a Series containing that specific column:
NBA_Age = NBA_clean["Age"]
Finally, to return the mean, you simply call the mean() method with:
NBA_mean_age = NBA_Age.mean()
It is not clear what the specific mistake in your code is, but I will present two possibilities:
You are not saving the result of NBA.drop([25,72,63],axis=0) into a variable. This operation is not done in place; if you want it done in place, you must pass the inplace=True argument: NBA.drop([25,72,63], axis=0, inplace=True).
There is an unnecessary comma in Nba.drop([25,72,63],axis=0),['Age'].mean(). Remove it to get the correct syntax, NBA.drop([25,72,63],axis=0)['Age'].mean(). Note also that the DataFrame was assigned to NBA, and Python names are case-sensitive, so Nba refers to a different (undefined) name. I suspect the error message obtained when running this code would have hinted at the unnecessary comma.
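As a minimal sketch of the whole sequence, the following uses a small made-up DataFrame in place of nba.CSV (the names, ages, and row labels are illustrative assumptions). It shows that drop returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical stand-in for the NBA dataset.
nba = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Age": [25, 30, 22, 35, 28],
})

# drop returns a new DataFrame; nba itself is not modified.
nba_clean = nba.drop([1, 3], axis=0)

# Select the column, then take its mean.
mean_age = nba_clean["Age"].mean()

print(mean_age)   # mean of the remaining ages 25, 22 and 28
print(len(nba))   # the original still has all 5 rows
```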

Adding rows using timestamp

I saw this code in
"combine rows and add up value in dataframe",
but I want to add up the values in cells for the same day, i.e. sum all the data for a day. How do I modify the code to achieve this?
Check the code below:
import pandas as pd
df = pd.DataFrame({'Price':[10000,10000,10000,10000,10000,10000],
                   'Time':['2012.05','2012.05','2012.05','2012.06','2012.06','2012.07'],
                   'Type':['Q','T','Q','T','T','Q'],
                   'Volume':[10,20,10,20,30,10]
                   })
df.assign(daily_volume = df.groupby('Time')['Volume'].transform('sum'))
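If the Time column holds full timestamps rather than the period-like strings above, one possible approach is to group on the calendar day. This is only a sketch: the timestamps and volumes below are invented for illustration, and it assumes the column can be parsed with pd.to_datetime.

```python
import pandas as pd

# Hypothetical sample data: several timestamps per day.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2012-05-01 09:30", "2012-05-01 14:00",
        "2012-05-02 10:15", "2012-05-02 16:45",
        "2012-05-03 11:00",
    ]),
    "Volume": [10, 20, 10, 25, 30],
})

# normalize() sets the time of day to midnight, so rows from the same day share a key.
day = df["Time"].dt.normalize()

# One row per day, holding the summed volume.
daily = df.groupby(day)["Volume"].sum()

# Or keep every row and attach the daily total, as transform('sum') does above.
df = df.assign(daily_volume=df.groupby(day)["Volume"].transform("sum"))

print(daily)
```

groupby(...).sum() collapses each day to a single row, while transform("sum") keeps the original shape and repeats the total on every row of that day.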

I have to extract all the rows in a .csv corresponding to the rows with 'watermelon' through pandas

I am using this code, but instead of a new file with just the required rows, I'm getting an empty .csv with just the header.
import pandas as pd
df = pd.read_csv("E:/Mac&cheese.csv")
newdf = df[df["fruit"]=="watermelon"+"*"]
newdf.to_csv("E:/Mac&cheese(2).csv",index=False)
I believe the problem is in how you select the rows containing the word "watermelon". Instead of:
newdf = df[df["fruit"]=="watermelon"+"*"]
Try:
newdf = df[df["fruit"].str.contains("watermelon")]
In your example, pandas is literally looking for cells that are exactly equal to the string "watermelon*"; the == comparison does not treat * as a wildcard.
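A small sketch of the difference, using made-up fruit values rather than the asker's file. The case=False and na=False arguments are optional extras: the first makes the match case-insensitive, and the second treats missing values as non-matches so the mask can be used for indexing.

```python
import pandas as pd

# Hypothetical data standing in for the "fruit" column of the CSV.
df = pd.DataFrame({
    "fruit": ["watermelon", "watermelon (seedless)", "apple", None, "Watermelon juice"],
})

# Exact comparison against "watermelon*" matches nothing.
exact = df[df["fruit"] == "watermelon" + "*"]

# Substring match: True wherever the cell contains "watermelon".
mask = df["fruit"].str.contains("watermelon", case=False, na=False)
newdf = df[mask]

print(len(exact))
print(newdf)
```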
You are missing the underscore in pd.read_csv on the first call. Also, it looks like the actual file location is incorrect: the // is missing from the file location.

Column in Pandas Series is appearing in a row above the rest

import pandas as pd
from IPython.display import display_html
df = pd.DataFrame({'Name': ['Mike','George', 'George']})
name_series = df.groupby("Name").size()
name_series.name = ''
name_dataframe = name_series.to_frame()
name_styler = name_dataframe.style.set_table_attributes("style='display:inline'")
display_html(name_styler.render(), raw = True)
I have this cell where I want to display a DataFrame in this fashion so that I can place more tables directly beside it, rather than displaying DataFrames one after another.
When I display it with the name set to blank, the data looks like this:
When I set the name to anything, the column name appears on the row above the rest of the table:
I checked my code on another website, and the tables all look like this if I don't set a column name, which means another row is being created for some reason:
How can I place the new column name and the existing column name on the same row?
edit:
After applying name_dataframe.reset_index(inplace=True)
edit 2:
Applying name_series.index.name=None gives:
Is there a way to combine answers to just show both titles and columns ONLY, "Name" and "Count" ?
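One possible way to get both titles on a single header row is sketched below. It assumes a plain two-column table is acceptable and uses DataFrame.to_html instead of the Styler from the question, so the inline style attribute is not reproduced here. The extra header row appears because "Name" is the index name while the count is a regular column. Turning the index into a column with reset_index(name=...) puts both labels at the same level.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Mike", "George", "George"]})

# reset_index(name="Count") moves "Name" out of the index and labels the sizes "Count".
counts = df.groupby("Name").size().reset_index(name="Count")

# index=False leaves out the 0, 1, ... row labels, so only "Name" and "Count" are shown.
html = counts.to_html(index=False)

# In a notebook, this string could be passed to display_html(html, raw=True).
print(counts)
```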

Python: Create dataframe with 'uneven' column entries

I am trying to create a dataframe where the column lengths are not equal. How can I do this?
I was trying to use groupby. But I think this will not be the right way.
import pandas as pd
data = {'filename':['file1','file1'], 'variables':['a','b']}
df = pd.DataFrame(data)
grouped = df.groupby('filename')
print(grouped.get_group('file1'))
Above is my sample code. The output of which is:
What can I do to just have one entry of 'file1' under 'filename'?
Eventually I need to write this to a csv file.
Thank you
If you only have one entry in a column, the other will be NaN. So you could just filter out the NaNs by doing something like df = df[df["filename"].notnull()]
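To show 'file1' only once before writing the CSV, one option is to blank out the repeated values. This is a sketch of a different technique from the NaN filter above, not the answerer's method: it replaces duplicates in the filename column with NaN, which to_csv writes as an empty cell by default.

```python
import pandas as pd

data = {"filename": ["file1", "file1"], "variables": ["a", "b"]}
df = pd.DataFrame(data)

# duplicated() is True for every repeat after the first occurrence;
# mask() replaces those entries with NaN.
df["filename"] = df["filename"].mask(df["filename"].duplicated())

# With no path given, to_csv returns the CSV text; NaN becomes an empty field.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file path to to_csv instead would write the same content to disk.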
