I am trying to call a function from a different module as below:
module1 - func1: returns a dataframe
module1 - func2(p_df_in_fromfunc1)
function 2:
for i in range(0,len(p_df_in_fromfunc1):
# Trying to retrieve row values of individual columns and assign to variables
v_tmp = p_df_in_fromfunc1.loc[i,"Col1"]
When trying to run the above code, I get the error:
KeyError 0
Could the issue be because I don't have a zero numbered row?
Without knowing much of you're code, well my guess is, for positional indexing try using iloc instead of loc, if you're interesed in going index-wise.
Something like:
v_tmp = p_df_in_fromfunc1.iloc[i,"Col1"]
You may have a missed to close the quote in the loc function after Col1 ?
v_tmp = p_df_in_fromfunc1.loc[i,"Col1"]
For retrieving a row for specific columns do:
columns = ['Col1', 'Col2']
df[columns].iloc[index]
If you only want one column, you can simplify it to: df['Col1'].iloc(index)
As per your comment, you do not need to reset the index, you can iterate over the values of your index array: df.index
Related
i was reading through the pandas documentation (10 minutes to pandas) and came across this example:
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4),
index=dates, columns=['A', 'B', 'C', 'D'])
s = df['A']
s[dates[5]]
# Out[5]: -0.6736897080883706
It's quite logic, but if I try it on my own and set the indexname afterwards (example follows), then i can't select data with s[dates[5]]. Does someone know why?
e.g.
df = pd.read_csv("xyz.csv").head(100)
s = df['price'] # series with unnamed int index + price
s = s.rename_axis('indexName')
s[indexName[5]] # NameError: name 'indexName' is not defined
Thanks in advance!
Edit: s.index.name returns indexName, despite not working with the call of s[indexName[5]]
You are confusing the name of the index, and the index values.
In your example, the first code chunk runs because dates is a variable, so when you call dates[5] it actually returns the 5th value from the dates object, which is a valid index value in the dataframe.
In your own attempt, you are referring to indexName inside your slice (ie. when you try to run s[indexName[5]]), but indexName is not a variable in your environment, so it will throw an error.
The correct way to subset parts of your series or dataframe, is to refer to the actual values of the index, not the name of the axis. For example, if you have a series as below:
s = pd.Series(range(5), index=list('abcde'))
Then the values in the index are a through e, therefore to subset that series, you could use:
s['b']
or:
s.loc['b']
Also note, if you prefer to access elements by location rather than index value, you can use the .iloc method. So to get the second element, you would use:
s.iloc[1] # locations 0 is the first element
Hope it helps to clarify. I would recommend you continue to work through some introductory pandas tutorials to build up a basic understanding.
First of all lets understand the example:
df[index] is used to select a row having that index.
This is the s dataframe:
The indexes are the dates.
The dates[5] is equal to '2000-01-06'which is the index of the 5th row of the s df. so, the result is the row having that index.
in your code:
indexName is not defined. so, indexName[5] is not representing an index of your df.
I have a pandas dataframe that works just fine. I am trying to figure out how to tell if a column with a label that I know if correct does not contain all the same values.
The code
below errors out for some reason when I want to see if the column contains -1 in each cell
# column = "TheColumnLabelThatIsCorrect"
# df = "my correct dataframe"
# I get an () takes 1 or 2 arguments but 3 is passed in error
if (not df.loc(column, estimate.eq(-1).all())):
I just learned about .eq() and .all() and hopefully I am using them correctly.
It's a syntax issue - see docs for .loc/indexing. Specifically, you want to be using [] instead of ()
You can do something like
if not df[column].eq(-1).all():
...
If you want to use .loc specifically, you'd do something similar:
if not df.loc[:, column].eq(-1).all():
...
Also, note you don't need to use .eq(), you can just do (df[column] == -1).all()) if you prefer.
You could drop duplicates and if you get only one record it means all records are the same.
import pandas as pd
df = pd.DataFrame({'col': [1, 1, 1, 1]})
len(df['col'].drop_duplicates()) == 1
> True
Question not as clear. Lets try the following though
Contains only -1 in each cell
df['estimate'].eq(-1).all()
Contains -1 in any cell
df['estimate'].eq(-1).any()
Filter out -1 and all columns
df.loc[df['estimate'].eq(-1),:]
df['column'].value_counts() gives you a list of all unique values and their counts in a column. As for checking if all the values are a specific number, you can do that by dropping duplicates and checking the length to be 1.
len(set(df['column'])) == 1
I am quite new to Python programming.
I am working with the following dataframe:
Before
Note that in column "FBgn", there is a mix of FBgn and FBtr string values. I would like to replace the FBtr-containing values with FBgn values provided in the adjacent column called "## FlyBase_FBgn". However, I want to keep the FBgn values in column "FBgn". Maybe keep in mind that I am showing only a portion of the dataframe (reality: 1432 rows). How would I do that? I tried the replace() method from Pandas, but it did not work.
This is actually what I would like to have:
After
Thanks a lot!
With Pandas, you could try:
df.loc[df["FBgn"].str.contains("FBtr"), "FBgn"] = df["## FlyBase_FBgn"]
Welcome to stackoverflow. Please next time provide more info including your code. It is always helpful
Please see the code below, I think you need something similar
import pandas as pd
#ignore the dict1, I just wanted to recreate your df
dict1= {"FBgn": ['FBtr389394949', 'FBgn3093840', 'FBtr000025'], "FBtr": ['FBgn546466646', '', 'FBgn15565555']}
df = pd.DataFrame(dict1) #recreating your dataframe
#print df
print(df)
#function to replace the values
def replace_values(df):
for i in range(0, (df.size//2)):
if 'tr' in df['FBgn'][i]:
df['FBgn'][i] = df['FBtr'][i]
return df
df = replace_values(df)
#print new df
print(df)
I am using the below code to remove a row in a dataframe called petrol. By doing this my expected row got removed but its index say 2 also is missing in my dataframe. Do I need to reset the index value or should proceed with my current indexing ?
petrol[(petrol_attrib>Lower_Inner_Fence) & (petrol_attrib<Upper_Inner_Fence)]
You can redefine your index as
new_df = petrol[(petrol_attrib>Lower_Inner_Fence) & (petrol_attrib<Upper_Inner_Fence)]
new_df.index=range(0,len(petrol)) #you can define your range as required
OR
new_df = petrol[(petrol_attrib>Lower_Inner_Fence) & (petrol_attrib<Upper_Inner_Fence)].reset_index(inplace=True,drop=True)
You can using reset_index()
newdf=petrol[(petrol_attrib>Lower_Inner_Fence) & (petrol_attrib<Upper_Inner_Fence)]
newdf.reset_index(drop=True,inplace=True)
city state neighborhoods categories
Dravosburg PA [asas,dfd] ['Nightlife']
Dravosburg PA [adad] ['Auto_Repair','Automotive']
I have above dataframe I want to convert each element of a list into column for eg:
city state asas dfd adad Nightlife Auto_Repair Automotive
Dravosburg PA 1 1 0 1 1 0
I am using following code to do this :
def list2columns(df):
"""
to convert list in the columns
of a dataframe
"""
columns=['categories','neighborhoods']
for col in columns:
for i in range(len(df)):
for element in eval(df.loc[i,"categories"]):
if len(element)!=0:
if element not in df.columns:
df.loc[:,element]=0
else:
df.loc[i,element]=1
How to do this in more efficient way?
Why still there is below warning when I am using df.loc already
SettingWithCopyWarning: A value is trying to be set on a copy of a slice
from a DataFrame.Try using .loc[row_indexer,col_indexer] = value instead
Since you're using eval(), I assume each column has a string representation of a list, rather than a list itself. Also, unlike your example above, I'm assuming there are quotes around the items in the lists in your neighborhoods column (df.iloc[0, 'neighborhoods'] == "['asas','dfd']"), because otherwise your eval() would fail.
If this is all correct, you could try something like this:
def list2columns(df):
"""
to convert list in the columns of a dataframe
"""
columns = ['categories','neighborhoods']
new_cols = set() # list of all new columns added
for col in columns:
for i in range(len(df[col])):
# get the list of columns to set
set_cols = eval(df.iloc[i, col])
# set the values of these columns to 1 in the current row
# (if this causes new columns to be added, other rows will get nans)
df.iloc[i, set_cols] = 1
# remember which new columns have been added
new_cols.update(set_cols)
# convert any un-set values in the new columns to 0
df[list(new_cols)].fillna(value=0, inplace=True)
# if that doesn't work, this may:
# df.update(df[list(new_cols)].fillna(value=0))
I can only speculate on an answer to your second question, about the SettingWithCopy warning.
It's possible (but unlikely) that using df.iloc instead of df.loc will help, since that is intended to select by row number (in your case, df.loc[i, col] only works because you haven't set an index, so pandas uses the default index, which matches the row number).
Another possibility is that the df that is passed in to your function is already a slice from a larger dataframe, and that is causing the SettingWithCopy warning.
I've also found that using df.loc with mixed indexing modes (logical selectors for rows and column names for columns) produces the SettingWithCopy warning; it's possible that your slice selectors are causing similar problems.
Hopefully the simpler and more direct indexing in the code above will solve any of these problems. But please report back (and provide code to generate df) if you are still seeing that warning.
Use this instead
def list2columns(df):
"""
to convert list in the columns
of a dataframe
"""
df = df.copy()
columns=['categories','neighborhoods']
for col in columns:
for i in range(len(df)):
for element in eval(df.loc[i,"categories"]):
if len(element)!=0:
if element not in df.columns:
df.loc[:,element]=0
else:
df.loc[i,element]=1
return df