Using the assign method to add a column to an already-existing table - python

Below are the problem, the code, and the error that arises. top_10_movies has two columns, Rating and Name.
import babypandas as bpd
top_10_movies = bpd.DataFrame().assign(
    Rating = top_10_movie_ratings,
    Name = top_10_movie_names
)
top_10_movies
You can use the assign method to add a column to an already-existing
table, too. Create a new DataFrame called with_ranking by adding a
column named "Ranking" to the table in top_10_movies
import babypandas as bpd
Ranking = my_ranking
with_ranking = top_10_movies.assign(Ranking)
TypeError Traceback (most recent call last)
<ipython-input-41-a56d9c05ae19> in <module>
1 import babypandas as bpd
2 Ranking = my_ranking
----> 3 with_ranking = top_10_movies.assign(Ranking)
TypeError: assign() takes 1 positional argument but 2 were given

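assign accepts keyword arguments only: each keyword becomes the name of the new column, and its value becomes the column's data. Passing Ranking positionally is what raises the TypeError. You can do:
with_ranking = top_10_movies.assign(Ranking = my_ranking)
Here's a simple example to check:
import pandas as pd
df = pd.DataFrame({'col': ['a','b']})
ranks = [1, 2]
df.assign(ranks) # causes the same error
df.assign(rank = ranks) # works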
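The keyword-only behaviour described above can be sketched as follows. This is a minimal illustration that assumes plain pandas rather than babypandas, and the ratings, names, and rankings are made-up placeholder values, not the asker's data.

```python
import pandas as pd

# Placeholder data standing in for the asker's table (values are invented).
top_10_movies = pd.DataFrame({
    "Rating": [9.2, 9.0],
    "Name": ["Movie A", "Movie B"],
})
my_ranking = [1, 2]

# A positional argument is rejected: assign only accepts keywords.
try:
    top_10_movies.assign(my_ranking)
except TypeError as err:
    error_message = str(err)
    print(error_message)

# The keyword becomes the column name; assign returns a new DataFrame
# and leaves the original table unchanged.
with_ranking = top_10_movies.assign(Ranking=my_ranking)
print(list(with_ranking.columns))
print(list(top_10_movies.columns))

# A column name that is not a valid Python identifier can be passed
# by unpacking a dictionary into keyword arguments.
with_spaces = top_10_movies.assign(**{"My Ranking": my_ranking})
print(list(with_spaces.columns))
```

Because assign returns a copy, the result has to be bound to a name (here with_ranking) for the new column to be kept.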

Related

Inner function with "not enough values to unpack"

I have been stuck on an attempt at an inner function, and after a lot of changes I still see the same error when I run it.
My function code is below.
def test(name,df,col='',col2=''):
    format_type = list(df[col])
    d = {name: pd.DataFrame() for name in format_type} #create two sub_dataframe
    for name, df, col2 in d.items(): #look at one sub_df at the time
        df['revenue_share']= (df[col2]/df[col2].sum())*100 #calculate revenue_share of each line
        print(df['revenue_share'])
        def function(df,col3='revenue_share'): #function to separate the companies within several groups depending on their rev_share
            if (df[col3] < 0.5):
                return 'longtail'
            else:
                return df['company_name']
        df['company_name'] = df.apply(lambda x: function(x[col2],x),axis=1) #create a new column with the group name
    return df
and the error code when I run test(print,format_company_df,col='format',col2='buyer_spend') :
ValueError Traceback (most recent call last)
<ipython-input-42-19c1a2b58a26> in <module>
----> 1 test(display,miq,col='format',col2='buyer_spend')
2
<ipython-input-41-5380164aff21> in test(name, df, col, col2)
5 d = {name: pd.DataFrame() for name in format_type} #create two sub_dataframe - filtered by format (display or video)
6
----> 7 for name, df, col2 in d.items(): #look at display or video df
8
9 df['revenue_share']= (df[col2]/df[col2].sum())*100 #calculate revenue_share of each line
ValueError: not enough values to unpack (expected 3, got 2)
Thanks a lot for your help!
d is a dictionary, and d.items() yields one (key, value) pair at a time, so each item can be unpacked into only two variables.
for name, df, col2 in d.items():
Here you are trying to unpack each pair into three variables, which is exactly what the error message says (expected 3, got 2).
for df_name, sub_df in d.items():
This should work.
The error has nothing to do with inner functions.
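The unpacking rule in this answer can be checked without pandas. The sketch below uses a hypothetical dictionary of spend figures per format (the numbers are invented) and computes each entry's share of its format's total.

```python
# Hypothetical spend per format; plain lists stand in for the sub-dataframes.
d = {"display": [120.0, 80.0], "video": [50.0, 150.0]}

revenue_share = {}
# d.items() yields (key, value) pairs, so exactly two names are needed.
for name, spend in d.items():
    total = sum(spend)
    revenue_share[name] = [100 * s / total for s in spend]
print(revenue_share)

# Asking for three names per item reproduces the error from the question.
try:
    for name, spend, extra in d.items():
        pass
except ValueError as err:
    unpack_error = str(err)
    print(unpack_error)
```

Any extra value needed inside the loop, such as the column name col2, is already available as a function argument and does not need to come from d.items().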

AttributeError: 'SimpleTable' object has no attribute 'column'

I am trying to export the summary of my multiple regression models in a table.
results = {'A':result.summary(),
           'B': result1.summary(), 'C': result2.summary(), 'D': result3.summary(), 'E' : result4.summary()}
df2 = pd.DataFrame({'Model':[], 'Param':[], 'Value':[]})
for mod in results.keys():
    for col in results[mod].tables[0].columns:
        if col % 2 == 0:
            df2 = df2.append(pd.DataFrame({'Model': [mod]*results[mod].tables[0][col].size,
                                           'Param':results[mod].tables[0][col].values,
                                           'Value':results[mod].tables[0][col+1].values}))
print(df2)
When I run the code, it gives me this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-280-952fff354224> in <module>
3 df2 = pd.DataFrame({'Model':[], 'Param':[], 'Value':[]})
4 for mod in results.keys():
----> 5 for col in results[mod].tables[0].column:
6 if col % 2 == 0:
7 df2 = df2.append(pd.DataFrame({'Model': [mod]*results[mod].tables[0][col].size,
AttributeError: 'SimpleTable' object has no attribute 'column'
The SimpleTable in this context is statsmodels.iolib.table.SimpleTable. You can use pandas.DataFrame.from_records to convert it to a DataFrame, after which the columns are easy to access.
Assume the SimpleTable is held in a variable named "t":
df = pd.DataFrame.from_records(t.data)
header = df.iloc[0] # grab the first row for the header
df = df[1:] # take the data less the header row
df.columns = header
print(df.shape)
df['your_col_name'] # select a column by its header text
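The header-promotion steps above can be tried without fitting a model. In this sketch, rows is a hypothetical stand-in for t.data (a list of rows whose first row holds the headers), and the coefficient values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for t.data: first row is the header, the rest are data.
rows = [
    ["", "coef", "std err"],
    ["const", "1.50", "0.20"],
    ["x1", "0.75", "0.10"],
]

df = pd.DataFrame.from_records(rows)
header = df.iloc[0].tolist()         # first row becomes the column labels
df = df[1:].reset_index(drop=True)   # keep only the data rows
df.columns = header

# Cell values may arrive as strings, so convert before doing arithmetic.
df["coef"] = df["coef"].astype(float)
print(df)
```

When the rows are already a plain list like this, pd.DataFrame(rows[1:], columns=rows[0]) builds the same table in one step.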
It's hard to tell without seeing how you're creating result.summary() et al, but it's likely that the SimpleTable API follows similar/related pandas APIs, in which case you're looking for the columns attribute (note the plural 's').

Not able to access dataframe column after groupby

import pandas as pd
df_prices = pd.read_csv('data/prices.csv', delimiter = ',')
# sample data from prices.csv
# date,symbol,open,close,low,high,volume
# 2010-01-04,PCLN,222.320007,223.960007,221.580002,225.300003,863200.0
# 2010-01-04,PDCO,29.459999,28.809999,28.65,29.459999,1519900.0
# 2010-01-04,PEG,33.139999,33.630001,32.889999,33.639999,5130400.0
# 2010-01-04,PEP,61.189999,61.240002,60.639999,61.52,6585900.0
# 2010-01-04,PFE,18.27,18.93,18.24,18.940001,52086000.0
# 2010-01-04,PFG,24.110001,25.0,24.1,25.030001,3470900.0
# 2010-01-04,PG,61.110001,61.119999,60.630001,61.310001,9190800.0
df_latest_prices = df_prices.groupby('symbol').last()
df_latest_prices.iloc[115]
# date 2014-02-07
# open 54.26
# close 55.28
# low 53.63
# high 55.45
# volume 3.8587e+06
# Name: CTXS, dtype: object
df_latest_prices.iloc[115].volume
# 3858700.0
df_latest_prices.iloc[115].Name
# ---------------------------------------------------------------------------
# AttributeError Traceback (most recent call last)
# <ipython-input-8-6385f0b6e014> in <module>
# ----> 1 df_latest_prices.iloc[115].Name
I have a dataframe called 'df_latest_prices' which was obtained by doing a groupby on another dataframe.
I am able to access the columns of df_latest_prices as shown above, but I am not able to access the column that was used in the groupby (i.e. 'symbol').
What do I do to get the 'symbol' from a particular row of this DataFrame?
Use name attribute:
df_latest_prices.iloc[115].name
Sample:
s = pd.Series([1,2,3], name='CTXS')
print (s.name)
CTXS
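I think your problem is twofold. First, you are using 'Name' instead of 'name', as @jezrael points out. Second, it helps to know what .iloc returns. On a Series, single brackets [] with a single integer position return the scalar value at that location, which has no name attribute. On a DataFrame such as df_latest_prices, .iloc[115] returns the row as a Series, so .name works there and holds the index label (here, the symbol).
To keep a Series when indexing a Series, I'd use double brackets, which return a slice of the pd.Series or pd.DataFrame.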
Using jezrael's setup.
s = pd.Series([1,2,3], name='CTXS')
s.iloc[[1]].name
Output:
'CTXS'
Note:
type(s.iloc[1])
Returns
numpy.int64
Whereas,
type(s.iloc[[1]])
Returns
pandas.core.series.Series
which has the 'name' attribute
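To tie this back to the grouped DataFrame in the question, here is a small sketch. The column names follow the question's prices.csv, but the rows are a shortened, partly invented sample, so treat the numbers as placeholders.

```python
import pandas as pd

# Shortened sample in the shape of prices.csv (second-day values are invented).
df_prices = pd.DataFrame({
    "date": ["2010-01-04", "2010-01-05", "2010-01-04", "2010-01-05"],
    "symbol": ["PEP", "PEP", "PFE", "PFE"],
    "close": [61.24, 61.98, 18.93, 18.66],
})

# After groupby(...).last(), 'symbol' is the index, not a column.
df_latest_prices = df_prices.groupby("symbol").last()

row = df_latest_prices.iloc[0]     # one row, returned as a Series
print(row.name)                    # the row's index label, i.e. its symbol
print(df_latest_prices.index[0])   # same label, read from the index directly

# reset_index() turns the group key back into an ordinary column.
flat = df_latest_prices.reset_index()
print(flat.iloc[0]["symbol"])
```

Which form to prefer depends on later use: keeping symbol as the index makes lookups like df_latest_prices.loc["PEP"] possible, while reset_index() is convenient when symbol should behave like any other column.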

Populating an object from dataframe

I am currently trying to implement a genetic algorithm. I have built a Python class Gene, and I am trying to load Gene objects from a dataframe df:
class Gene:
    def __init__(self,id,nb_trax,nb_days):
        self.id=id
        self.nb_trax=nb_trax
        self.nb_days=nb_days
I then have a second class, Chromosome, which holds 20 Gene objects as a property:
class Chromosome(object):
    def __init__(self):
        self.port = [Gene() for id in range(20)]
This is the dataframe
ID nb_obj nb_days
ECGYE 10259 62.965318
NLRTM 8007 46.550562
I successfully loaded the Gene using
tester=df.apply(lambda row: Gene(row['Injection Port'],row['Avg Daily Injection'],random.randint(1,10)), axis=1)
But I cannot load the Chromosome class using
f=Chromosome(tester)
I get this error
Traceback (most recent call last):
File "chrom.py", line 27, in <module>
f=Chromosome(tester)
TypeError: __init__() takes 1 positional argument but 2 were given
Any help please?
The error message counts self: Chromosome.__init__ is defined to take only self, but Chromosome(tester) passes two arguments (self and tester), hence "takes 1 positional argument but 2 were given".
Secondly, what you get from the apply operation on df in tester is actually a Series indexed like df, with a Gene object as each value.
To solve this, you would have to change the code along these lines:
class Chromosome(object):
    def __init__(self, genes):
        # genes: an iterable of Gene objects, such as the Series in tester
        self.port = list(genes)
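The idea of passing the genes into the constructor can be sketched in plain Python. The list rows below is a hypothetical stand-in for the dataframe (only the two rows shown in the question), and the genes parameter name is an assumption made for this example.

```python
import random


class Gene:
    def __init__(self, id, nb_trax, nb_days):
        self.id = id
        self.nb_trax = nb_trax
        self.nb_days = nb_days


class Chromosome:
    def __init__(self, genes):
        # Accept any iterable of Gene objects (a list, or a pandas Series).
        self.port = list(genes)


# Stand-in for the dataframe rows: (ID, nb_obj) pairs from the question.
rows = [("ECGYE", 10259), ("NLRTM", 8007)]
genes = [Gene(port, trax, random.randint(1, 10)) for port, trax in rows]

chrom = Chromosome(genes)
print(len(chrom.port), chrom.port[0].id)
```

Taking the genes as an argument, rather than building them inside __init__, keeps Chromosome independent of where the Gene objects come from.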

Issue calling a function

I have a dataframe called ro which holds all claims for automotive parts. What I want now is to create a function called part_dataframe that subsets the original ro into a new dataframe containing only a particular part, say the compressor, with the subset named comp_claims.
My function is:
def part_dataframe(first_frame, subset, type_number, number):
    subset = first_frame.loc[first_frame[type_number] == number]
    subset = subset.reset_index(drop=True)
    subset['word'] = subset.Comment.str.split().apply(lambda x: pd.value_counts(x).to_dict())
When I tried to call the function:
part_dataframe(ro, comp_claims, 'Part No.', '97701')
I get the following error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-17-65cf8428af26> in <module>()
----> 1 part_dataframe(ro, comp_claims, 'Part No.', '97701')
NameError: name 'comp_claims' is not defined
How can I fix that?
Thank you in advance
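The name comp_claims does not exist yet when you pass it in, which is what raises the NameError. Have the function return the subset and assign the result at the call site instead:
import numpy as np
import pandas as pd
ro = pd.DataFrame(
    {'Part No.': np.arange(10)}
)
def part_dataframe(first_frame, type_number, number):
    return first_frame.loc[first_frame[type_number] == number]
subset = part_dataframe(ro, 'Part No.', 3)
subset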
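A fuller sketch of the same return-based pattern, including the word-count column from the question, might look like this. The claim rows are invented, and collections.Counter is swapped in for pd.value_counts as a plain-Python way to count words, which is an assumption of this example rather than the asker's method.

```python
from collections import Counter

import pandas as pd

# Invented claims data; only the column names follow the question.
ro = pd.DataFrame({
    "Part No.": ["97701", "97701", "12345"],
    "Comment": ["compressor noise noise", "compressor leak", "belt worn"],
})


def part_dataframe(first_frame, type_number, number):
    """Return the rows for one part, with a per-comment word count column."""
    subset = first_frame.loc[first_frame[type_number] == number]
    subset = subset.reset_index(drop=True)
    # Split each comment into words and count them with Counter.
    words = subset["Comment"].str.split().apply(lambda w: dict(Counter(w)))
    return subset.assign(word=words)


# The new name is created by assignment, not passed in as an argument.
comp_claims = part_dataframe(ro, "Part No.", "97701")
print(comp_claims)
```

Note that the part number is compared as a string here because the sample column holds strings; if the real column is numeric, the value passed in should be numeric too.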
