Missing 1 required positional argument: 'self' - python

I'm having an issue with this:
I'm getting the error "bar() missing 1 required positional argument: 'self'".
I've tried fiddling with classes (using them and not using them) and with the self variable, but I've got nothing. The function bar() comes from the pandas library, which I've imported, as does the DataFrame (df) object. I've attached the main function of my code and the function in which the error occurs.
def createDataframe(assessments):
    df = pd.DataFrame
    for review in assessments:
        for skills in review.skillList:
            for skill in skills:
                tmp = pd.DataFrame({str(skill[:2]): [skill[3:]]})
                df.merge(tmp, how = 'right', right=tmp)
    return df

def plotData(df):
    ax = df.plot.bar(x='1.')
    plt.show()

def main():
    # Ensure proper CMD Line Arg
    if len(sys.argv) > 3:
        print("Error!")
        return 1
    assessments = dataParse()
    df = createDataframe(assessments)
    plotData(df)
Any help is welcome! Let me know!
EDIT:
As tdy said in a comment below, I needed to add parentheses to create an instance. Now I get no errors, but df is empty when I print it and nothing shows when I plot it.

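pandas DataFrames do not have an in-place merge. merge() returns a new DataFrame, so in your code you need to assign the result back to df. Also pass tmp only once: it is already the first positional argument, so repeating it as right=tmp raises a TypeError. Like so:
df = df.merge(tmp, how='right')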
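A minimal sketch of both points, not taken from the thread: the frames `left` and `right` and their column names are made up for illustration, and they share a `key` column so the merge has something to join on.

```python
import pandas as pd

# pd.DataFrame without parentheses is the class itself, so a method looked up
# on it has no instance to fill in `self`.
not_an_instance = pd.DataFrame
try:
    not_an_instance.head()
except TypeError as exc:
    class_error = str(exc)  # mentions the missing 'self' argument

# merge() returns a new DataFrame and leaves the original untouched.
left = pd.DataFrame({"key": [1, 2], "x": ["a", "b"]})   # hypothetical data
right = pd.DataFrame({"key": [1, 2], "y": ["c", "d"]})  # hypothetical data

left.merge(right, on="key")         # result is discarded
unchanged = list(left.columns)      # still only 'key' and 'x'

left = left.merge(right, on="key")  # assign the result back
merged = list(left.columns)         # now includes 'y'
```

The same applies inside the loop in the question: without the assignment, every merge result is thrown away and df stays empty.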

Related

Pandas pipe throws error that df to be passed as an argument

Ideally pipe should take the dataframe as argument by default which is not happening in my case.
class Summary:
    def get_src_base_df(self):
        <do stuff>
        return df

    @staticmethod
    def sum_agg(df):
        cols = 'FREQUENCY_ID|^FLAG_'
        df = (df.filter(regex=cols).fillna(0)
              .groupby('FREQUENCY_ID').agg(lambda x: x.astype(int).sum()))
        return df

    # few other @staticmethods

    def get_src_df(self):
        df = self.get_src_base_df().pipe(self.sum_agg()) #pipe chain continues
        # --> error: sum_agg() missing 1 required positional argument: 'df'
        # but the below line works
        # df = self.get_src_base_df().pipe((lambda x: self.sum_agg(x))) #pipe chain continues
By doing self.sum_agg(), you're calling the sum_agg function (@staticmethods in Python are pretty much indistinguishable from functions), and since no argument is passed in that call, it rightfully fails. You need to pass the function object itself, not the value returned by calling it.
Do this instead:
def get_src_df(self):
    df = self.get_src_base_df().pipe(self.sum_agg) # note: no parentheses
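Here is a small runnable sketch of the difference, under some assumptions: the sample frame `base` and its values are invented, get_src_df takes the frame as a parameter instead of building it, and the aggregation is simplified to a plain groupby sum.

```python
import pandas as pd

class Summary:
    @staticmethod
    def sum_agg(df):
        # Keep FREQUENCY_ID and the FLAG_ columns, then sum the flags per id.
        cols = "FREQUENCY_ID|^FLAG_"
        return df.filter(regex=cols).fillna(0).groupby("FREQUENCY_ID").sum()

    def get_src_df(self, base):
        # Pass the function object; pipe() supplies the frame as its argument.
        return base.pipe(self.sum_agg)

# Hypothetical input data.
base = pd.DataFrame({
    "FREQUENCY_ID": [1, 1, 2],
    "FLAG_A": [1.0, 1.0, None],
    "OTHER": [9, 9, 9],
})
result = Summary().get_src_df(base)

# Calling sum_agg() with parentheses and no argument fails before pipe() runs.
try:
    Summary.sum_agg()
except TypeError as exc:
    call_error = str(exc)  # mentions the missing 'df' argument
```

If the function needs extra arguments, pipe() forwards them as well, e.g. `base.pipe(func, extra)` calls `func(base, extra)`.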

Location of a row with error as a decorator

I have seen a few other answers on Stack Overflow on how to find the location of a pandas row with an error.
I want to turn that approach into a decorator so I can put it on the functions inside a class.
It would make troubleshooting much easier, and I could reuse it over and over again.
Here is some sample data for test.txt
2021-07-30 13:29:45, this, is, my test log
2021-07-30 13:29:57, foo, bar, ham spam and eggs
2021-07-30 13:30:45, cheesy, eggs, with, foo bar
FAKE_ERROR_FOR_TESTING, this, is, to break the code
This is the decorator and some cleaning code I attempted to write:
import pandas as pd

def row_error_number(func):
    """
    row error number returns the rows where an error occurred.
    """
    from functools import wraps

    @wraps(func)
    def wrapper_decorator(*args, **kwargs):
        for i, item in enumerate(*args):
            try:
                value = func(*args, **kwargs)
            except ValueError:
                print('ERROR at index {}: {!r}'.format(i, item))
        return value
    return wrapper_decorator

@row_error_number
def parse_log(log_file: str, N: int = 100):
    """parse_log summary."""
    with open(log_file) as myfile:
        head = [next(myfile, None) for x in range(N)]
    df = pd.Series(head)
    df = df.str.split(",", n=3, expand=True) # expand log into new columns
    df = df.rename(columns={0: "time_stamp"})
    df['time_stamp'] = pd.to_datetime(df['time_stamp'], format='%Y-%m-%d %H:%M:%S.%f')
    return df

if __name__ == '__main__':
    df = parse_log(log_file="test.txt", N=100)
    print(df.head())
The issue is that I'm not sure how to pass the df back into the enumerate function as an iterable.
TypeError: enumerate() missing required argument 'iterable' (pos 1)
I want the decorator to be reusable so it will tell me what rows are wrong. A way to limit the number of error rows that it prints out would be really helpful too.
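Not from the original thread: the TypeError arises because parse_log is called with keyword arguments only, so *args is empty and enumerate(*args) receives nothing. A decorator around the whole function also cannot see individual rows, because pd.to_datetime parses the entire column in one call. One possible alternative, which swaps the decorator for a plain helper, is to parse with errors="coerce" and report the rows that came back as NaT. The names bad_timestamp_rows, fmt, and limit are hypothetical, and the sketch assumes timestamps without fractional seconds, as in the sample data, so the format drops the .%f.

```python
import pandas as pd

def bad_timestamp_rows(series, fmt="%Y-%m-%d %H:%M:%S", limit=5):
    """Return up to `limit` (index, value) pairs that fail to parse with `fmt`."""
    # errors="coerce" turns unparseable values into NaT instead of raising.
    parsed = pd.to_datetime(series, format=fmt, errors="coerce")
    bad = series[parsed.isna()]
    return [(int(i), value) for i, value in bad.head(limit).items()]

# First column of the sample log from the question.
stamps = pd.Series([
    "2021-07-30 13:29:45",
    "2021-07-30 13:29:57",
    "2021-07-30 13:30:45",
    "FAKE_ERROR_FOR_TESTING",
])
bad_rows = bad_timestamp_rows(stamps)
print(bad_rows)
```

The limit parameter caps how many offending rows are reported, which covers the last request in the question.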

Is it possible to use a keyword name from **kwargs to filter my data frame?

Apologies if the title is a bit obscure; I am happy to change it.
Problem: I am trying to use a keyword name in the following code to filter by column name in a dataframe using pandas.
@staticmethod
def filter_json(json, col_filter, **kwargs):
    '''
    Convert and filter a JSON object into a dataframe
    '''
    df = pd.read_json(json).drop(col_filter, axis=1)
    for arg in kwargs:
        df = df[(df.arg.isin(kwargs[arg]))]
    return df
However, I get the error AttributeError: 'DataFrame' object has no attribute 'arg' at the line df = df[(df.arg.isin(kwargs[arg]))], because arg is not a valid column name (which makes sense).
I am calling the method with the following...
filter_json(json_obj, MY_COL_FILTERS, IsOpen=['false', 0])
Meaning df.arg should essentially be df.IsOpen
Question: Is there a way to use arg as my column name (IsOpen) here, rather than having to type it manually as df.IsOpen?
You can access columns with dataframe[columnname] notation as well: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
Try:
for arg in kwargs: # arg is 'IsOpen'
    df = df[(df[arg].isin(kwargs[arg]))] # df['IsOpen'] is same as df.IsOpen
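A self-contained sketch of the same idea, assuming a made-up frame and leaving out the JSON loading: filter_frame, frame, and the column values are illustrative names and data, not part of the original code.

```python
import pandas as pd

def filter_frame(df, **kwargs):
    """Keep rows whose value in each named column is in the allowed list."""
    for column, allowed in kwargs.items():
        # Bracket notation looks the column up by the string held in `column`.
        df = df[df[column].isin(allowed)]
    return df

# Hypothetical data.
frame = pd.DataFrame({
    "IsOpen": ["false", "true", "false"],
    "Owner": ["ann", "bob", "cy"],
})
result = filter_frame(frame, IsOpen=["false"], Owner=["ann", "bob"])
print(result)
```

Bracket notation also works for column names that are not valid Python identifiers (for example names with spaces), which attribute access cannot express.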

__init__() got multiple values for argument 'use_technical_indicator' - error

I can't figure out why I am getting this error. If you can figure it out, I'd appreciate it. If you can provide specific instruction, I'd appreciate it. This code is in one module; there are 7 modules total.
Python 3.7, Mac OS, code from www.finrl.org
# Perform Feature Engineering:
df = FeatureEngineer(df.copy(),
                     use_technical_indicator=True,
                     use_turbulence=False).preprocess_data()

# add covariance matrix as states
df=df.sort_values(['date','tic'],ignore_index=True)
df.index = df.date.factorize()[0]
cov_list = []
# look back is one year
lookback=252
for i in range(lookback,len(df.index.unique())):
    data_lookback = df.loc[i-lookback:i,:]
    price_lookback=data_lookback.pivot_table(index = 'date',columns = 'tic', values = 'close')
    return_lookback = price_lookback.pct_change().dropna()
    covs = return_lookback.cov().values
    cov_list.append(covs)
df_cov = pd.DataFrame({'date':df.date.unique()[lookback:],'cov_list':cov_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date','tic']).reset_index(drop=True)
df.head()
The function definition statement for FeatureEngineer.__init__ is:
def __init__(
    self,
    use_technical_indicator=True,
    tech_indicator_list=config.TECHNICAL_INDICATORS_LIST,
    use_turbulence=False,
    user_defined_feature=False,
):
As you can see, there is no argument (other than self, which you should not provide) before use_technical_indicator. Python therefore binds the positional df.copy() to use_technical_indicator, and the keyword then supplies that argument a second time, hence the error. You should remove df.copy() from before use_technical_indicator in your line 2.
Checking the current FeatureEngineer class, you must instead pass df.copy() to the preprocess_data() method.
So your code has to look like this:
# Perform Feature Engineering:
df = FeatureEngineer(use_technical_indicator=True,
                     tech_indicator_list = config.TECHNICAL_INDICATORS_LIST,
                     use_turbulence=True,
                     user_defined_feature = False).preprocess_data(df.copy())
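The mechanism can be reproduced without FinRL. The sketch below uses a stand-in class, Engineer, which is hypothetical and only mimics the shape of the signature above; it is not the real FeatureEngineer.

```python
class Engineer:
    """Hypothetical stand-in with the same argument layout as the class above."""

    def __init__(self, use_technical_indicator=True, use_turbulence=False):
        self.use_technical_indicator = use_technical_indicator
        self.use_turbulence = use_turbulence

    def preprocess_data(self, data):
        # The data is passed here, not to the constructor.
        return list(data)

rows = [1, 2, 3]

# The positional `rows` lands in use_technical_indicator, and the keyword
# then tries to set the same parameter again.
try:
    Engineer(rows, use_technical_indicator=True)
except TypeError as exc:
    message = str(exc)

# Correct call: configure the object first, then hand over the data.
processed = Engineer(use_technical_indicator=True).preprocess_data(rows)
```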

Spyder charts in the code are not working. What is wrong?

I am new to Spyder and am working with the KDD1999 data. I am trying to create charts based on the dataset, such as total amounts of srv_error rates. However, when I try to create these charts, errors pop up, and there are a few I can't solve. I have commented the code. Does anyone know what is wrong with it?
#Used to import all packages and/or libraries you will be using
#pd loads and creates the data table or dataframe
import pandas as pd
####Section for loading data
#If the data file extension is xlsx then the read_excel function should be used. If csv then read_csv should be used
#As this is stored in the same area the absolute path can remain unchanged
df = pd.read_csv('kddcupdata1.csv')
#Pulls specific details
#Pulls first five rows
df.head()
#Pulls first three rows
df.head(3)
#Setting column names
df.columns = ['duration', 'protocol_type', 'service', 'flag', 'src_bytes', 'dst_bytes', 'land', 'wrong_fragment', 'urgent', 'hot', 'num_failed_logins', 'logged_in', 'lnum_compromised', 'lroot_shell', 'lsu_attempted', 'lnum_root', 'lnum_file_creations', 'lnum_shells', 'lnum_access_files', 'lnum_outbound_cmds', 'is_host_login', 'is_guest_login', 'count', 'srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 'dst_host_srv_rerror_rate', 'label']
#Scatter graph for number of failed logins caused by srv serror rate
df.plot(kind='scatter',x='num_failed_logins',y='srv_serror_rate',color='red')
#This works
#Total num_failed_logins caused by srv_error_rate
# making a dict of list
info = {'Attack': ['dst_host_same_srv_rate', 'dst_host_srv_rerror_rate'],
        'Num' : [0, 1]}
otd = pd.DataFrame(info)
# sum of all salary stored in 'total'
otd['total'] = otd['Num'].sum()
print(otd)
##################################################################################
#Charts that do not work
import matplotlib.pyplot as plt
#1 ERROR MESSAGE - AttributeError: 'list' object has no attribute 'lsu_attempted'
#Bar chart showing total 1su attempts
df['lsu_attempted'] = df['lsu_attempted'].astype(int)
df = ({'lsu_attempted':[1]})
df['lsu_attempted'].lsu_attempted(sort=0).plot.bar()
ax = df.plot.bar(x='super user attempts', y='Total of super user attempts', rot=0)
df.from_dict('all super user attempts', orient='index')
df.transpose()
#2 ERROR MESSAGE - TypeError: plot got an unexpected keyword argument 'x'
#A simple line plot
plt.plot(kind='bar',x='protocol_type',y='lsu_attempted')
#3 ERROR MESSAGE - TypeError: 'set' object is not subscriptable
df['lsu_attempted'] = df['lsu_attempted'].astype(int)
df = ({'lsu_attempted'})
df['lsu_attempted'].lsu_attempted(sort=0).plot.bar()
ax = df.plot.bar(x='protocol_type', y='lsu_attempted', rot=0)
df.from_dict('all super user attempts', orient='index')
df.transpose()
#5 ERROR MESSAGE - TypeError: 'dict' object is not callable
#Bar chart showing total of chosen protocols used
Data = {'protocol_types': ['tcp','icmp'],
        'number of protocols used': [10,20,30]
        }
bar = df(Data,columns=['protocol_types','number of protocols used'])
bar.plot(x ='protocol_types', y='number of protocols used', kind = 'bar')
df.show()
Note: if anyone has a clear explanation of what is going on, that would also be helpful. Please link sources if possible.
Your first error is in this snippet:
df['lsu_attempted'] = df['lsu_attempted'].astype(int)
df = ({'lsu_attempted':[1]})
df['lsu_attempted'].lsu_attempted(sort=0).plot.bar()
ax = df.plot.bar(x='super user attempts', y='Total of super user attempts', rot=0)
df.from_dict('all super user attempts', orient='index')
df.transpose()
The error you get, AttributeError: 'list' object has no attribute 'lsu_attempted', is a result of line two above.
Initially df is a pandas data frame (line 1 above), but from line 2, df = ({'lsu_attempted':[1]}), df is a dictionary with one key, 'lsu_attempted', whose value is a list with one element.
So in line 3, when you do df['lsu_attempted'] (the first part of that statement), this evaluates to that single-element list, and a list doesn't have an lsu_attempted attribute.
I have no idea what you were trying to achieve, but it is my strong guess that you did not intend to replace your data frame with a single-key dictionary.
Your 2nd error is easy: you are calling plt.plot incorrectly. x is not a keyword argument; x and y are positional arguments. See the matplotlib.pyplot.plot page in the Matplotlib 3.2.1 documentation.
Your remaining errors have the same kind of cause. In the snippet labelled #3, df = ({'lsu_attempted'}) makes df a set, and a set can't be indexed with df['lsu_attempted']. The 'dict' object is not callable message results from the first snippet: you made df a dictionary, and you can't call dictionaries.
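The three messages can be reproduced with plain Python objects, with no data frame involved. In this sketch the helper error_text and the names as_dict and as_set are made up for illustration; they stand for what df becomes after each reassignment in the question.

```python
def error_text(action):
    """Run `action` and return the text of the exception it raises, if any."""
    try:
        action()
    except (AttributeError, TypeError) as exc:
        return str(exc)
    return None

as_dict = {"lsu_attempted": [1]}  # what df is after df = ({'lsu_attempted':[1]})
as_set = {"lsu_attempted"}        # what df is after df = ({'lsu_attempted'})

# A list has no attribute named after the column.
list_error = error_text(lambda: as_dict["lsu_attempted"].lsu_attempted)

# A set cannot be indexed with square brackets.
set_error = error_text(lambda: as_set["lsu_attempted"])

# A dictionary cannot be called like a function.
call_error = error_text(lambda: as_dict({"protocol_types": ["tcp", "icmp"]}))

print(list_error, set_error, call_error, sep="\n")
```

A likely fix, assuming the intent was to build a new table, is to keep df bound to the data frame, give the dictionaries their own names, and construct frames with pd.DataFrame(...). Note that the lists in Data have different lengths (2 and 3), which pd.DataFrame would reject. For charts with x= and y= keywords, the pandas method df.plot(kind='bar', x=..., y=...) accepts them, unlike plt.plot.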
