import pandas as pd
df_prices = pd.read_csv('data/prices.csv', delimiter=',')
# sample data from prices.csv
# date,symbol,open,close,low,high,volume
# 2010-01-04,PCLN,222.320007,223.960007,221.580002,225.300003,863200.0
# 2010-01-04,PDCO,29.459999,28.809999,28.65,29.459999,1519900.0
# 2010-01-04,PEG,33.139999,33.630001,32.889999,33.639999,5130400.0
# 2010-01-04,PEP,61.189999,61.240002,60.639999,61.52,6585900.0
# 2010-01-04,PFE,18.27,18.93,18.24,18.940001,52086000.0
# 2010-01-04,PFG,24.110001,25.0,24.1,25.030001,3470900.0
# 2010-01-04,PG,61.110001,61.119999,60.630001,61.310001,9190800.0
df_latest_prices = df_prices.groupby('symbol').last()
df_latest_prices.iloc[115]
# date 2014-02-07
# open 54.26
# close 55.28
# low 53.63
# high 55.45
# volume 3.8587e+06
# Name: CTXS, dtype: object
df_latest_prices.iloc[115].volume
# 3858700.0
df_latest_prices.iloc[115].Name
# ---------------------------------------------------------------------------
# AttributeError Traceback (most recent call last)
# <ipython-input-8-6385f0b6e014> in <module>
# ----> 1 df_latest_prices.iloc[115].Name
I have a dataframe called 'df_latest_prices' which was obtained by doing a groupby on another dataframe.
I am able to access the columns of df_latest_prices as shown above, but I am not able to access the column that was used in the groupby (i.e. 'symbol').
What do I do to get the 'symbol' from a particular row of this Dataframe ?
Use the name attribute:
df_latest_prices.iloc[115].name
Sample:
s = pd.Series([1,2,3], name='CTXS')
print (s.name)
CTXS
I think your problem is twofold: first, you are using 'Name' instead of 'name', as @jezrael points out; second, when you use .iloc on a Series with single brackets [] and a single integer position, you get back the scalar value at that location, not a Series.
To fix this, I'd use double brackets to return a slice of the pd.Series or pd.DataFrame.
Using jezrael's setup.
s = pd.Series([1,2,3], name='CTXS')
s.iloc[[1]].name
Output:
'CTXS'
Note:
type(s.iloc[1])
Returns
numpy.int64
Whereas
type(s.iloc[[1]])
Returns
pandas.core.series.Series
which has the 'name' attribute
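To see the same distinction on a DataFrame, here is a minimal sketch (with made-up symbols and prices) showing that after a groupby the grouping key becomes the index, and `.iloc[i]` returns a Series whose `.name` is that index label:

```python
import pandas as pd

# hypothetical data mimicking the grouped result: 'symbol' is the index
df = pd.DataFrame(
    {'close': [55.28, 61.12], 'volume': [3858700.0, 9190800.0]},
    index=pd.Index(['CTXS', 'PG'], name='symbol'),
)

row = df.iloc[0]          # single brackets -> a pd.Series; .name is the label
print(row.name)           # CTXS

slice_ = df.iloc[[0]]     # double brackets -> a one-row DataFrame
print(slice_.index[0])    # CTXS
```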
Related
I have a really easy dataset with just one column, and I would like to have a for loop over each row of the dataframe so that for each row it calculates the log of current_close_price/first_row_close_price. Whatever I do, it says:
TypeError: 'numpy.float64' object is not callable
import pandas as pd
import numpy as np
price.head()
Close
Date
2010-07-19 107.290001
2010-07-20 108.480003
2010-07-21 107.070000
2010-07-22 109.459999
2010-07-23 110.410004
for index, row in price.iterrows():
    first_row_price = price.iloc[0, 0]
    current_price = price.iloc[index, 0]
    log_rt = np.log(current_price / reference_price)
Assuming the table is in a file a.csv with two columns, Date and Close, the fix is to write first_row_price (the variable you actually defined) instead of the undefined reference_price:
with open("a.csv", 'r') as a:
    price = pd.read_csv(a, usecols=[1])  # reads only the 'Close' column
for index, row in price.iterrows():
    first_row_price = price.iloc[0, 0]
    current_price = price.iloc[index, 0]
    log_rt = np.log(current_price / first_row_price)
    print(log_rt)
This code prints the following output:
0.0
0.011030393877764241
-0.002052631799009411
0.020023718610826604
0.02866528771045947
I want my code to:
read data from a CSV and make a dataframe: "source_df"
see if the dataframe contains any columns specified in a list: "possible_columns"
call a unique function to replace the values in each column whose header is found in the "possible_columns" list, then insert the modified values in a new dataframe: "destination_df"
Here it is:
import pandas as pd
#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)
#creates destination_df
blanklist = []
destination_df = pd.DataFrame(blanklist)
#create the column header lists for comparison in the while loop
columns = source_df.head(0)
possible_columns = ['yes/no','true/false']
#establish the functions list and define the functions to replace column values
fix_functions_list = ['yes_no_fix()','true_false_fix()']
def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")
def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')
'''use the counter to call a unique function from the function list to replace the values in each column whose header is found in the "possible_columns" list, insert the modified values in "destination_df", then advance the counter'''
counter = 0
while counter < len(possible_columns):
    if possible_columns[counter] in columns:
        destination_df.insert(counter, possible_columns[counter], source_df[possible_columns[counter]])
        fix_functions_list[counter]
    counter = counter + 1
#see if it works
print(destination_df.head(10))
When I print(destination_df), I see the unmodified column values from source_df. When I call the functions independently they work, which makes me think something is going wrong in my while loop.
Your issue is that you are trying to call a function that is stored in a list as a string.
fix_functions_list[counter]
This does not actually run the function; it just accesses the string value. I would store the function objects themselves instead of their names:
def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")
def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')
fix_functions_list = {0:yes_no_fix,1:true_false_fix}
and change the function call as below:
fix_functions_list[counter]()
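The pattern of storing callables (not strings) and invoking them by key can be sketched in isolation, with two made-up functions:

```python
# minimal sketch of dispatching functions stored in a dict
def double(x):
    return x * 2

def square(x):
    return x * x

dispatch = {0: double, 1: square}   # store the function objects themselves

# dispatch[i] looks up the function; the trailing () actually calls it
results = [dispatch[i](5) for i in range(2)]
print(results)  # [10, 25]
```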
#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)
possible_columns = ['yes/no','true/false']
mapping_dict = {'yes/no': {"No": "0", "Yes": "1"},
                'true/false': {'False': '1', 'True': '0'}}
old_columns = [column for column in source_df.columns if column not in possible_columns]
existed_columns = [column for column in source_df.columns if column in possible_columns]
new_df = source_df[existed_columns]
for column in new_df.columns:
    new_df[column] = new_df[column].map(mapping_dict[column])
new_df[old_columns] = source_df[old_columns]
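A nested dict can also be passed straight to DataFrame.replace, which handles all the listed columns at once without a loop. A minimal sketch with made-up data:

```python
import pandas as pd

# hypothetical source data with the two columns from the question
source_df = pd.DataFrame({'yes/no': ['Yes', 'No'],
                          'true/false': ['True', 'False']})

# {column: {old_value: new_value}} replaces per column in one call
mapping_dict = {'yes/no': {'No': '0', 'Yes': '1'},
                'true/false': {'False': '1', 'True': '0'}}

destination_df = source_df.replace(mapping_dict)
print(destination_df)
```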
When I want to remove some elements which satisfy a particular condition, Python throws the following error:
TypeError Traceback (most recent call last)
<ipython-input-25-93addf38c9f9> in <module>()
4
5 df = pd.read_csv('fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv;
----> 6 df = filter(df,~('-02-29' in df['Date']))
7 '''tmax = []; tmin = []
8 for dates in df['Date']:
TypeError: 'int' object is not iterable
The following is the code :
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv');
df = filter(df,~('-02-29' in df['Date']))
What could I be doing wrong?
Following is sample data
Sample Data
Use df.filter() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html)
Also please attach the csv so we can run it locally.
Another way to do this is to use one of pandas' string methods for Boolean indexing:
df = df[~ df['Date'].str.contains('-02-29')]
You will still have to make sure that all the dates are actually strings first.
Edit:
Seeing the picture of your data, maybe this is what you want (slashes instead of hyphens):
df = df[~ df['Date'].str.contains('/02/29')]
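A minimal, self-contained sketch of this Boolean-indexing pattern, using made-up dates with a leap day to drop:

```python
import pandas as pd

# hypothetical data: one leap-day row that should be removed
df = pd.DataFrame({'Date': ['2008-02-28', '2008-02-29', '2008-03-01']})

# str.contains builds a Boolean mask; ~ negates it, keeping the other rows
filtered = df[~df['Date'].str.contains('-02-29')]
print(filtered['Date'].tolist())  # ['2008-02-28', '2008-03-01']
```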
Below is the problem, the code and the error that arises. top_10_movies has two columns, which are rating and name.
import babypandas as bpd
top_10_movies = bpd.DataFrame().assign(
    Rating=top_10_movie_ratings,
    Name=top_10_movie_names
)
top_10_movies
You can use the assign method to add a column to an already-existing
table, too. Create a new DataFrame called with_ranking by adding a
column named "Ranking" to the table in top_10_movies
import babypandas as bpd
Ranking = my_ranking
with_ranking = top_10_movies.assign(Ranking)
TypeError Traceback (most recent call last)
<ipython-input-41-a56d9c05ae19> in <module>
1 import babypandas as bpd
2 Ranking = my_ranking
----> 3 with_ranking = top_10_movies.assign(Ranking)
TypeError: assign() takes 1 positional argument but 2 were given
While using assign, it needs a keyword naming the column to assign to; you can do:
with_ranking = top_10_movies.assign(ranking = Ranking)
Here's a simple example to check:
df = pd.DataFrame({'col': ['a','b']})
ranks = [1, 2]
df.assign(ranks) # causes the same error
df.assign(rank = ranks) # works
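If the column name is stored in a variable or isn't a valid Python identifier, keyword unpacking achieves the same thing (shown with plain pandas; babypandas' assign is assumed to behave the same way):

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 'b']})
ranks = [1, 2]

col_name = 'Ranking'                  # hypothetical dynamic column name
out = df.assign(**{col_name: ranks})  # equivalent to df.assign(Ranking=ranks)
print(out.columns.tolist())  # ['col', 'Ranking']
```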
I am working on a script that imports an Excel file, iterates through a column called "Title," and returns False if a certain keyword is present in "Title." The script runs until I get to the part where I want to export another csv file that gives me a separate column. My error is as follows: AttributeError: 'int' object has no attribute 'lower'
Based on this error, I changed df.Title to a string using df['Title'].astype(str), but I get the same error.
import pandas as pd
data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx')
df = pd.DataFrame(data, columns=['Date Added', 'Track Item', 'Retailer Item ID', 'UPC',
                                 'Title', 'Manufacturer', 'Brand', 'Client Product Group',
                                 'Category', 'Subcategory', 'Amazon Sub Category',
                                 'Segment', 'Platform'])
df['Title'].astype(str)
df['Retailer Item ID'].astype(str)
excludes = ['chainsaw', 'pail', 'leaf blower', 'HYOUJIN', 'brush', 'dryer', 'genie',
            'Genuine Joe', 'backpack', 'curling iron', 'dog', 'cat', 'wig', 'animal',
            'dryer', ':', 'tea', 'Adidas', 'Fila', 'Reebok', 'Puma', 'Nike', 'basket',
            'extension', 'extensions', 'batteries', 'battery', '[EXPLICIT]']
my_excludes = [set(x.lower().split()) for x in excludes]
match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
def is_match(title, excludes=my_excludes):
    if any(keywords.issubset(title.lower().split()) for keywords in my_excludes):
        return True
    return False
This is the part that returns the error:
df['match_titles'] = df['Title'].apply(is_match)
result = df[df['match_titles']]['Retailer Item ID']
print(df)
df.to_csv('Asin_List(9.18.19).csv',index=False)
Use the following code to import your file:
data = pd.read_excel(r'C:/Users/Downloads/61_MONDAY_PROCESS_9.16.19.xlsx',
                     dtype='str')
For pandas.read_excel, you can pass an optional parameter dtype.
You can also use it to pass multiple data types for different columns:
e.g. dtype={'Retailer Item ID': int, 'Title': str}
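read_excel is hard to demonstrate without an .xlsx file on disk, but read_csv accepts the same dtype parameter, so the effect can be sketched with made-up in-memory data:

```python
import io
import pandas as pd

# hypothetical CSV standing in for the Excel file
csv = io.StringIO("Retailer Item ID,Title\n101,chainsaw pail\n102,wig\n")

df = pd.read_csv(csv, dtype=str)  # every column is read in as strings
print(df['Retailer Item ID'].tolist())  # ['101', '102']
```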
At the line where you wrote
match_titles = [e for e in df.Title.astype(str)
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
the comprehension itself already converts each value, so it is not where the error comes from. The error is raised later, in df['Title'].apply(is_match): the earlier df['Title'].astype(str) call returned a converted copy without changing the column stored in df, so apply still passes the original integers to is_match, and title.lower() fails. Assign the conversion back before applying the function:
df['Title'] = df['Title'].astype(str)
df['match_titles'] = df['Title'].apply(is_match)
The main idea is that astype returns a new Series; if you don't assign the result back to the column, the DataFrame keeps its original contents.
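A minimal, self-contained sketch (with a made-up 'Title' column) showing that astype returns a copy and the reassignment is what makes the conversion stick:

```python
import pandas as pd

# hypothetical mixed column: one int, one string
df = pd.DataFrame({'Title': [123, 'chainsaw pail']})

df['Title'].astype(str)                # returns a converted copy; df unchanged
print(type(df['Title'].iloc[0]))       # still int

df['Title'] = df['Title'].astype(str)  # reassigning stores the conversion
print(type(df['Title'].iloc[0]))       # now str
```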