Can someone please help me with this? I want to look up rows by name, so I used set_index on the first column of the dataframe to index the rows by name instead of by integer position.
# Set 'Name' column as index on a Dataframe
df1 = df1.set_index("Name", inplace = True)
df1
Output:
AttributeError: 'NoneType' object has no attribute 'set_index'
Then I run the following code:
result = df1.loc["ABC4"]
result
Output:
AttributeError: 'NoneType' object has no attribute 'loc'
I don't usually run a second piece of code that depends on the first before fixing the error, but originally I ran them together in one Jupyter notebook cell. Now I see that both code cells have problems.
Please let me know where I went wrong. Thank you!
Maybe you should define your dataframe?
import pandas as pd
df1 = pd.DataFrame("here's your dataframe")
df1.set_index("Name")
or just
import pandas as pd
df1 = pd.DataFrame("here's your dataframe").set_index("Name")
df1
Your variable "df1" is not defined anywhere before doing something with it.
Try this:
# Set 'Name' column as index on a Dataframe
df1 = ''
df1 = df1.set_index("Name", inplace = True)
If its defined before, its value is NONE. So check this variable first.
The rest of the code "SHOULD" work afterwards.
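For reference, here is a minimal self-contained sketch with made-up data (the "Name" value "ABC4" mirrors the question) showing the whole flow:
import pandas as pd

# Toy data standing in for the real dataframe
df1 = pd.DataFrame({"Name": ["ABC4", "XYZ1"], "Score": [10, 20]})

df1 = df1.set_index("Name")   # set_index returns a new dataframe; assign it back
result = df1.loc["ABC4"]      # rows can now be looked up by name
print(result)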
The following code is supposed to create a dataframe df2 with two columns: the first storing the name of each column of df and the second storing the max length of each column of df. But I'm getting the error shown below:
Question: What am I doing wrong here, and how can I fix the error?
NameError: name 'row' is not defined
from pyspark.sql.functions import col, length, max
from pyspark.sql import Row
df = df.select([max(length(col(name))).alias(name) for name in df.schema.names])
df2 = spark.createDataFrame([Row(col=name, length=row[name]) for name in df.schema.names], ['col', 'length'])
Apologies Nam, please find the working snippet below. There was a line missing in the original answer; I've updated it.
df = df.select([max(length(col(name))).alias(name) for name in df.schema.names])
row = df.first().asDict()  # the single aggregated row as a dict: {column name: max length}
df2 = spark.createDataFrame([Row(col=name, length=row[name]) for name in df.schema.names], ['col', 'length'])
Let me know if you face any other issues.
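For completeness, a self-contained sketch with made-up data (column names c1 and c2 are illustrative, and a local SparkSession is assumed):
from pyspark.sql import SparkSession, Row
from pyspark.sql.functions import col, length, max

spark = SparkSession.builder.getOrCreate()

# Toy dataframe standing in for df
df = spark.createDataFrame([("abc", "de"), ("fghij", "k")], ["c1", "c2"])

# One-row dataframe holding the max string length of each column
df = df.select([max(length(col(name))).alias(name) for name in df.schema.names])

# Pull that single row out as a dict, then reshape it into (col, length) pairs
row = df.first().asDict()
df2 = spark.createDataFrame([Row(col=name, length=row[name]) for name in df.schema.names],
                            ['col', 'length'])
df2.show()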
I have a data set as below:
I want to remove the rows where the ['Grad Intention'] column is 'Undecided'. For this, I created a copy of the DataFrame and am using the following code:
df_copy=df_copy.drop([df_copy['Grad Intention'] =='Undecided'], axis=1)
However, this is giving me an error.
How can I remove the row with 'Undecided'? Also, what's wrong with my code?
Your drop call passes a boolean Series as if it were a list of column labels, and axis=1 tells drop to remove columns rather than rows, which is why it fails. To filter out the 'Undecided' rows you could simply use:
df = df[df['Grad Intention'] != 'Undecided']
or
df.drop(df[df['Grad Intention'] == 'Undecided'].index, inplace = True)
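Either way works; here is a quick sketch with made-up data to show both:
import pandas as pd

df = pd.DataFrame({"Grad Intention": ["Yes", "Undecided", "No"], "Age": [22, 23, 24]})

# Option 1: keep only the rows that are not 'Undecided'
kept = df[df["Grad Intention"] != "Undecided"]

# Option 2: drop the matching rows by index, in place
df.drop(df[df["Grad Intention"] == "Undecided"].index, inplace=True)

print(kept)
print(df)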
I am quite new to Python programming.
I am working with the following dataframe:
Before
Note that in the column "FBgn" there is a mix of FBgn and FBtr string values. I would like to replace the FBtr values with the FBgn values provided in the adjacent column called "## FlyBase_FBgn", while keeping the existing FBgn values in the "FBgn" column. Keep in mind that I am showing only a portion of the dataframe (it actually has 1432 rows). How would I do that? I tried the replace() method from pandas, but it did not work.
This is actually what I would like to have:
After
Thanks a lot!
With Pandas, you could try:
df.loc[df["FBgn"].str.contains("FBtr"), "FBgn"] = df["## FlyBase_FBgn"]
Welcome to Stack Overflow. Next time, please provide more info, including your code; it is always helpful.
Please see the code below; I think you need something similar.
import pandas as pd

# ignore dict1, I just wanted to recreate your df
dict1 = {"FBgn": ['FBtr389394949', 'FBgn3093840', 'FBtr000025'],
         "FBtr": ['FBgn546466646', '', 'FBgn15565555']}
df = pd.DataFrame(dict1)  # recreating your dataframe

# print df
print(df)

# function to replace the values row by row
def replace_values(df):
    for i in range(len(df)):
        if 'tr' in df.loc[i, 'FBgn']:
            df.loc[i, 'FBgn'] = df.loc[i, 'FBtr']
    return df

df = replace_values(df)

# print new df
print(df)
I have a list of dataframes which I wish to convert to multiple csv.
Example:
List_Df = [df1,df2,df3,df4]
for i in List_Df:
    i.to_csv("C:\\Users\\Public\\Downloads\\"+i+".csv")
Expected output: Having 4 csv files with the names df1.csv,df2.csv ...
But I am facing two problems:
First problem:
AttributeError: 'list' object has no attribute 'to_csv'
Second problem:
("C:\\Users\\Public\\Downloads\\"+ **i** +".csv") <- **i** returns the object
as it's suppose to but I wish for python to automatically take the
object_name and use it with .csv
Any help will be greatly appreciated as I am new to Python and SOF.
Thank you :)
Try this:
import pandas as pd
List_Df = [df1, df2, df3, df4]
for i, e in enumerate(List_Df, 1):   # start=1 so the files are named df1.csv .. df4.csv
    df = pd.DataFrame(e)
    df.to_csv("C:\\Users\\Public\\Downloads\\" + "df" + str(i) + ".csv")
For your second problem, you would have to name the dataframes first, e.g.:
for j, df in enumerate(List_Df, 1):
    df.name = 'df' + str(j)
    df.to_csv("C:\\Users\\Public\\Downloads\\%s.csv" % df.name)
or even just build the name from a string and the index, without naming the dataframes first:
for j, df in enumerate(List_Df, 1):
    name = 'df' + str(j)
    df.to_csv("C:\\Users\\Public\\Downloads\\%s.csv" % name)
I have a csv file with column titles: name, mfr, type, calories, protein, fat, sodium, fiber, carbo, sugars, vitamins, rating. When I try to drop the sodium column, I don't understand why I'm getting a 'NoneType' object has no attribute 'drop' error.
I've tried
df.drop(['sodium'],axis=1)
df = df.drop(['sodium'],axis=1)
df = df.drop (['sodium'], 1, inplace=True)
Here's your problem:
df = df.drop (['sodium'], 1, inplace=True)
This returns None (see the drop documentation) due to the inplace flag, so you no longer have a reference to your dataframe. df is now None, and None has no drop attribute.
My expectation is that you have done this (or something like it, perhaps dropping another column?) at some prior point in your code.
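A minimal sketch of the two correct variants, using a made-up frame with a 'sodium' column:
import pandas as pd

df = pd.DataFrame({"name": ["A", "B"], "sodium": [130, 15], "rating": [68.4, 33.9]})

# Either assign the copy that drop() returns...
df = df.drop(['sodium'], axis=1)

# ...or drop in place and do NOT assign the (None) result
df2 = pd.DataFrame({"name": ["A"], "sodium": [130]})
df2.drop(['sodium'], axis=1, inplace=True)

print(df.columns)
print(df2.columns)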
There is a similar question you should have a look at:
Delete column from pandas DataFrame using del df.column_name
According to the answer,
`df = df.drop (['sodium'], 1, inplace=True)`
should rather be
df.drop(['sodium'], axis=1, inplace=True)
Although the first snippet,
df = df.drop(['sodium'],axis=1)
should work fine. If there is still an error, try
print(df.columns)
to make sure that the columns were actually read from the csv file.
Use pd.read_csv(r'File_Path_with_name') and this should sort it out, as the issue may be with how the csv file is being read.