I have a dataframe called ro which holds all claims for automotive parts. Now I want to create a function called part_dataframe that subsets the original ro into a new dataframe containing only one particular part, say the compressor, with the subset named comp_claims.
My function is:
def part_dataframe(first_frame, subset, type_number, number):
    subset = first_frame.loc[first_frame[type_number] == number]
    subset = subset.reset_index(drop=True)
    subset['word'] = subset.Comment.str.split().apply(lambda x: pd.value_counts(x).to_dict())
When I tried to call the function:
part_dataframe(ro, comp_claims, 'Part No.', '97701')
I get the following error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-17-65cf8428af26> in <module>()
----> 1 part_dataframe(ro, comp_claims, 'Part No.', '97701')
NameError: name 'comp_claims' is not defined
How can I fix that?
Thank you in advance
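comp_claims doesn't exist yet when you call the function, so Python raises the NameError before part_dataframe even runs (your function also never returns subset). Don't pass the name of the new dataframe in; return the subset from the function and assign the result to whatever name you like:
import numpy as np
import pandas as pd
ro = pd.DataFrame(
    {'Part No.': np.arange(10)}
)
def part_dataframe(first_frame, type_number, number):
    return first_frame.loc[first_frame[type_number] == number]
subset = part_dataframe(ro, 'Part No.', 3)
subset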
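A slightly fuller sketch that keeps the word-count column from the question and names the result comp_claims. The sample rows are hypothetical, and it swaps the deprecated top-level pd.value_counts for Series.value_counts.

```python
import pandas as pd

# Hypothetical sample data; the real `ro` has many more rows and columns.
ro = pd.DataFrame({
    "Part No.": ["97701", "12345", "97701"],
    "Comment": ["compressor noisy noisy", "door squeak", "compressor leak"],
})

def part_dataframe(first_frame, type_number, number):
    """Return rows whose `type_number` column equals `number`, plus word counts."""
    subset = first_frame.loc[first_frame[type_number] == number].reset_index(drop=True)
    # One dict per row mapping each word in Comment to how often it appears.
    subset["word"] = subset["Comment"].str.split().apply(
        lambda words: pd.Series(words).value_counts().to_dict()
    )
    return subset

# The caller chooses the name by assigning the return value.
comp_claims = part_dataframe(ro, "Part No.", "97701")
print(comp_claims)
```

If Part No. is stored as integers rather than strings, pass 97701 without quotes, otherwise the comparison matches nothing.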
I'm very new to programming.
I'm doing a small project with pandas, and I need to create a function that, using a dataframe and 2 columns from that dataframe, outputs two dataframes.
dataframe = df
def string_filter(dataframe, dataframest1, dataframest2):
    dataframe0 = dataframe[dataframe[dataframest2].notnull()]
    dataframe0[ dataframest1 + ' refined '] = dataframe0[dataframest1].str.len()
    dataframe0[ dataframest2 + ' refined '] = dataframe0[dataframest2].str.len()
    print(dataframe0)
    x == dataframe0[ dataframest1 + ' refined ']
    z == dataframe0[ dataframest2 + ' refined ']
    dataframe1 = dataframe0[x | z != 1]
    dataframe2 = dataframe0[x | z == 1]
    return(dataframe1, dataframe2)
string_filter(dataframe, 'c1', 'c2')
Whenever I pass c1 and c2 as strings, I always get this error message:
KeyError: ('c1', 'c2')
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Input In [28], in <cell line: 1>()
----> 1 string_filter(dataframe, "c1","c2")
How do I fix it so that passing dataframest1 and dataframest2 as strings returns two dataframes?
Thanks
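Fixed version, which builds one boolean mask and splits the frame on it:
def string_filter(df, col1, col2):
    # True where both columns hold exactly one character
    flt = (df[col1].str.len() == 1) & (df[col2].str.len() == 1)
    df1 = df[flt]   # rows matching the mask
    df2 = df[~flt]  # all remaining rows
    return df1, df2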
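A quick check of the fixed function on a small, hypothetical frame. Two things in the question's version explain why it could not work as intended: `x == ...` compares instead of assigning, so x and z are never defined, and `x | z != 1` parses as `(x | z) != 1` (a bitwise OR of the lengths), because `|` binds tighter than comparisons. Note also that the fixed version no longer drops rows with nulls first; a missing value gives NaN from str.len(), which is never equal to 1, so such rows land in the second frame.

```python
import pandas as pd

def string_filter(df, col1, col2):
    # True where both columns hold exactly one character.
    flt = (df[col1].str.len() == 1) & (df[col2].str.len() == 1)
    # Return (rows matching the mask, all other rows).
    return df[flt], df[~flt]

# Hypothetical sample data, including a missing value in c1.
df = pd.DataFrame({"c1": ["a", "bb", "c", None], "c2": ["x", "y", "zz", "w"]})

# Unpack the returned tuple into two separate DataFrames.
df1, df2 = string_filter(df, "c1", "c2")
print(df1)
print(df2)
```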
In the following code, I define a function and then apply it to a dataframe to reset the geozone column.
import pandas as pd
testdata ={'country': ['USA','AUT','CHE','ABC'], 'geozone':[0,0,0,0]}
d =pd.DataFrame.from_dict(testdata, orient = 'columns')
def setgeozone(dataframe, dcountry, dgeozone):
    dataframe.loc[dataframe['dcountry'].isin(['USA','CAN']),'dgeozone'] =1
    dataframe.loc[dataframe['dcountry'].isin(['AUT','BEL']),'dgeozone'] =2
    dataframe.loc[dataframe['dcountry'].isin(['CHE','DNK']),'dgeozone'] =3
setgeozone(d, country, geozone)
I got an error message saying:
Traceback (most recent call last):
File "<ipython-input-56-98dad4781f73>", line 1, in <module>
setgeozone(d, country, geozone)
NameError: name 'country' is not defined
Can someone help me understand what I did wrong?
Many thanks.
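The NameError occurs because country and geozone are unquoted at the call site, so Python looks for variables with those names. Inside the function, 'dcountry' and 'dgeozone' are quoted too, so they are treated as literal column names rather than as the parameters. Since the column names are fixed, you don't need to pass parameters other than the DataFrame itself. Try this:
def setgeozone(df):
    df.loc[df['country'].isin(['USA','CAN']),'geozone'] = 1
    df.loc[df['country'].isin(['AUT','BEL']),'geozone'] = 2
    df.loc[df['country'].isin(['CHE','DNK']),'geozone'] = 3
setgeozone(d)  # d is the DataFrame from the question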
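If you do want the column names to be parameters, a minimal sketch: quote the names where you call the function, and use the parameters without quotes inside it.

```python
import pandas as pd

def setgeozone(dataframe, dcountry, dgeozone):
    # dcountry and dgeozone hold column *names*, so they are used unquoted here.
    dataframe.loc[dataframe[dcountry].isin(["USA", "CAN"]), dgeozone] = 1
    dataframe.loc[dataframe[dcountry].isin(["AUT", "BEL"]), dgeozone] = 2
    dataframe.loc[dataframe[dcountry].isin(["CHE", "DNK"]), dgeozone] = 3

d = pd.DataFrame({"country": ["USA", "AUT", "CHE", "ABC"], "geozone": [0, 0, 0, 0]})

# Column names are passed as strings, not as bare (undefined) names.
setgeozone(d, "country", "geozone")
print(d)
```

The function modifies d in place, so "ABC", which matches no list, keeps its original 0.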
Here are two other (and better) ways to accomplish what you need:
Use map:
df["geozone"] = df["country"].map({"USA": 1, "CAN": 1, "AUT": 2, "BEL": 2, "CHE": 3, "DNK": 3})
Use numpy.select:
import numpy as np
df["geozone"] = np.select(
    [df["country"].isin(["USA", "CAN"]),
     df["country"].isin(["AUT", "BEL"]),
     df["country"].isin(["CHE", "DNK"])],
    [1, 2, 3],
)
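The two alternatives differ for countries that match nothing, such as "ABC" in the question's data: map returns NaN for keys missing from the dict, while np.select falls back to its default (0 unless you pass something else). A sketch on the question's data that keeps 0 in both cases; the geozone_map and geozone_select column names are just for comparison.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({"country": ["USA", "AUT", "CHE", "ABC"], "geozone": [0, 0, 0, 0]})

zones = {"USA": 1, "CAN": 1, "AUT": 2, "BEL": 2, "CHE": 3, "DNK": 3}

# map(): countries missing from the dict become NaN ...
mapped = d["country"].map(zones)
# ... so fill them from the existing geozone column and restore the int dtype.
d["geozone_map"] = mapped.fillna(d["geozone"]).astype(int)

# np.select(): `default` is used for rows that match no condition.
conditions = [
    d["country"].isin(["USA", "CAN"]),
    d["country"].isin(["AUT", "BEL"]),
    d["country"].isin(["CHE", "DNK"]),
]
d["geozone_select"] = np.select(conditions, [1, 2, 3], default=0)
print(d)
```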
I have a dataset (df) as below:
I want to drop the rows where SKU is "abc" and Packing is either "1KG" or "5KG".
I have tried the following code:
df.drop( df[ (df['SKU'] == "abc") & (df['Packing'] == "10KG") & (df['Packing'] == "5KG") ].index, inplace=True)
I get the following error when running the code above:
NameError Traceback (most recent call last)
<ipython-input-1-fb4743b43158> in <module>
----> 1 df.drop( df[ (df['SKU'] == "abc") & (df['Packing'] == "10KG") & (df['Packing'] == "5KG") ].index, inplace=True)
NameError: name 'df' is not defined
Any help on this will be greatly appreciated. Thanks.
I suggest trying this:
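df = df.loc[~((df['SKU'] == 'abc') & (df['Packing'].isin(['1KG', '5KG'])))]
The .loc selects rows by the condition, and ~ means 'NOT', so this keeps every row except the ones you want to drop. Your original condition can never be true, because a single Packing value cannot equal both "10KG" and "5KG" at once; isin matches either value. The NameError itself means df has not been created in the current session, so load your data into df first.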
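A self-contained check of that filter. The question's actual table was not shown, so the rows below are hypothetical; only the SKU and Packing column names come from the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for the question's dataset.
df = pd.DataFrame({
    "SKU": ["abc", "abc", "abc", "xyz"],
    "Packing": ["1KG", "5KG", "10KG", "1KG"],
})

# Rows to drop: SKU is "abc" AND Packing is either "1KG" or "5KG".
to_drop = (df["SKU"] == "abc") & df["Packing"].isin(["1KG", "5KG"])

# Keep everything else; ~ negates the boolean mask.
df = df.loc[~to_drop].reset_index(drop=True)
print(df)
```

Naming the mask first keeps the parentheses manageable; df.drop(df[to_drop].index) would give the same rows without the index reset.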
Below are the problem, the code, and the error it raises. top_10_movies has two columns, Rating and Name.
import babypandas as bpd
top_10_movies = bpd.DataFrame().assign(
Rating = top_10_movie_ratings,
Name = top_10_movie_names
)
top_10_movies
You can use the assign method to add a column to an already-existing table, too. Create a new DataFrame called with_ranking by adding a column named "Ranking" to the table in top_10_movies.
import babypandas as bpd
Ranking = my_ranking
with_ranking = top_10_movies.assign(Ranking)
TypeError Traceback (most recent call last)
<ipython-input-41-a56d9c05ae19> in <module>
1 import babypandas as bpd
2 Ranking = my_ranking
----> 3 with_ranking = top_10_movies.assign(Ranking)
TypeError: assign() takes 1 positional argument but 2 were given
assign needs a keyword argument whose name becomes the new column's name, so you can do:
with_ranking = top_10_movies.assign(Ranking=my_ranking)
Here's a simple example to check:
import pandas as pd
df = pd.DataFrame({'col': ['a','b']})
ranks = [1, 2]
df.assign(ranks)  # causes the same TypeError
df.assign(rank = ranks)  # works
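The same fix in plain pandas, with hypothetical values standing in for top_10_movie_ratings, top_10_movie_names and my_ranking. It assumes babypandas' assign takes keyword arguments the same way, which the "1 positional argument" error suggests. A column name that isn't a valid Python identifier can still be passed by unpacking a dict.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data.
top_10_movies = pd.DataFrame({
    "Rating": [9.3, 9.2, 9.0],
    "Name": ["Movie A", "Movie B", "Movie C"],
})
my_ranking = [1, 2, 3]

# The keyword becomes the column name, so this adds a column called "Ranking".
with_ranking = top_10_movies.assign(Ranking=my_ranking)

# Names with spaces can't be keywords, but ** unpacking of a dict works.
with_spaces = top_10_movies.assign(**{"My Ranking": my_ranking})
print(with_ranking)
```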
I am trying to export the summaries of my multiple regression models to a table.
results = {'A': result.summary(),
           'B': result1.summary(), 'C': result2.summary(), 'D': result3.summary(), 'E': result4.summary()}
df2 = pd.DataFrame({'Model':[], 'Param':[], 'Value':[]})
for mod in results.keys():
    for col in results[mod].tables[0].columns:
        if col % 2 == 0:
            df2 = df2.append(pd.DataFrame({'Model': [mod]*results[mod].tables[0][col].size,
                                           'Param': results[mod].tables[0][col].values,
                                           'Value': results[mod].tables[0][col+1].values}))
print(df2)
When I run the code, I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-280-952fff354224> in <module>
3 df2 = pd.DataFrame({'Model':[], 'Param':[], 'Value':[]})
4 for mod in results.keys():
----> 5 for col in results[mod].tables[0].column:
6 if col % 2 == 0:
7 df2 = df2.append(pd.DataFrame({'Model': [mod]*results[mod].tables[0][col].size,
AttributeError: 'SimpleTable' object has no attribute 'column'
The SimpleTable in this context is statsmodels.iolib.table.SimpleTable. We can use pandas.DataFrame.from_records to convert its data to a DataFrame, and from there you can access the columns easily.
Assuming the SimpleTable is stored in a variable named t (simple_table_to_df is just an illustrative name):
import pandas as pd
def simple_table_to_df(t):
    df = pd.DataFrame.from_records(t.data)
    header = df.iloc[0]  # grab the first row for the header
    df = df[1:]  # take the data less the header row
    df.columns = header
    print(df.shape)
    return df['your_col_name']  # replace with the column you need
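If the goal is the long Model/Param/Value table from the question, here is a minimal sketch that works on plain lists of strings. It assumes, as the question's col/col+1 loop does, that the first summary table alternates label and value columns; fake_tables and summary_table_to_long are hypothetical, and with real models you would pass results[mod].tables[0].data instead (which may need extra cleanup). It uses pd.concat because DataFrame.append was deprecated and then removed in pandas 2.0.

```python
import pandas as pd

def summary_table_to_long(model_name, rows):
    """Turn [label, value, label, value, ...] rows into Model/Param/Value records."""
    frame = pd.DataFrame.from_records(rows)
    pieces = []
    # Even columns hold labels; the next (odd) column holds the matching values.
    for col in range(0, frame.shape[1] - 1, 2):
        pieces.append(pd.DataFrame({
            "Model": model_name,
            "Param": frame[col].str.strip().str.rstrip(":"),
            "Value": frame[col + 1].str.strip(),
        }))
    return pd.concat(pieces, ignore_index=True)

# Hypothetical stand-ins for results[mod].tables[0].data.
fake_tables = {
    "A": [["Dep. Variable:", "y", "R-squared:", "0.912"],
          ["Model:", "OLS", "Adj. R-squared:", "0.905"]],
    "B": [["Dep. Variable:", "y", "R-squared:", "0.871"],
          ["Model:", "OLS", "Adj. R-squared:", "0.860"]],
}

# Build one long frame with a single concat instead of repeated append calls.
df2 = pd.concat(
    [summary_table_to_long(name, rows) for name, rows in fake_tables.items()],
    ignore_index=True,
)
print(df2)
```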
It's hard to tell without seeing how you're creating result.summary() et al, but it's likely that the SimpleTable API follows similar/related pandas APIs, in which case you're looking for the columns attribute (note the plural 's').