I am stuck. I have looked up other people's solutions to this, but I don't quite understand them. In my code I have a giant matrix in a CSV file, and I want to iterate over the data in my 4th column only. It is called 'MovementTime'. I thought that by calling it the way shown below I could iterate over my data and therefore sort it. I am getting the error
'str' object has no attribute 'values'
Can someone explain to me why I'm getting this error?
Thank you!
bigdata = pd.read_csv(r'Assetslog_912021_11.csv')
data = pd.DataFrame(bigdata)
#create a function to analyze data
def analytics(data):
    data.columns = ['Time', 'Fixed Delta', 'Movement Time', 'MovementNumber', 'Rest Flag', 'DistortionDigit', 'RobotForceX','RobotForceY','RobotForceZ', 'PrevPositionX','PrevPositionY','PrevPositionZ', 'TargetPosZ', 'TargetPosY', 'TargetPosZ', 'PlayerPosX', 'PlayerPosX', 'PlayerPosY', 'PlayerPosZ', 'RobotVelX','RobotVelY','RobotVelZ', 'LocalPosX', 'LocalPosY', 'LocalPosZ', 'PerpError', 'ExtError']
    i = np.iterable(data.columns)
    for i in set(data['MovementNumber']):
        print("Plot for Movement Number " + str(i))
        data2 = data.loc[['MovementNumber'] == i]
    ax = plt.axes(projection = '3d')
    xdata = data2['PlayerPosX'].values
    ydata = data2['PlayerPosY'].values
    zdata = data2['PlayerPosZ'].values
    plot1 =ax.scatter3D(xdata,ydata,zdata, c=zdata)
    plt.show(plot1)
This line is not right:
data2 = data.loc[['MovementNumber'] == i]
That's going to compare a list containing a string to an integer, which will always be false. I believe you want
data2 = data[data['MovementNumber'] == i]
That assigns to data2 all the rows where MovementNumber is i.
And, by the way, your indentation is wrong. I assume you want one plot per movement number, so all the lines starting with ax = ... need to be indented, so they are inside the loop.
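As a minimal sketch of the difference between the two comparisons, assuming a small hypothetical DataFrame in place of the CSV (the values below are made up for illustration), the filter and the per-movement loop could look like this:

```python
import pandas as pd

# Hypothetical stand-in for the CSV data; only the columns used here.
data = pd.DataFrame({
    "MovementNumber": [1, 1, 2, 2, 2],
    "PlayerPosX": [0.0, 0.1, 0.5, 0.6, 0.7],
    "PlayerPosY": [1.0, 1.1, 1.5, 1.6, 1.7],
    "PlayerPosZ": [2.0, 2.1, 2.5, 2.6, 2.7],
})

# A list compared with an int is a single Python bool, not a row mask.
wrong_mask = (["MovementNumber"] == 1)

# Comparing the column itself gives one True/False value per row.
row_mask = data["MovementNumber"] == 1

subsets = {}
for number in sorted(data["MovementNumber"].unique()):
    # Keep only the rows that belong to this movement.
    subset = data[data["MovementNumber"] == number]
    subsets[number] = subset
    # The plotting calls would go here, inside the loop,
    # so that each movement gets its own plot.
    xdata = subset["PlayerPosX"].values
    print("Movement", number, "has", len(xdata), "points")
```

Because the subset is taken inside the loop, everything that uses it (including the plotting calls) has to be indented to the same level.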
I want to print the name of the DataFrame in a for loop, but I can't get it right.
When I iterate over the datasets list I get the dataset itself. If I try str(d), for example, I get the whole dataset as a string. d.name() doesn't work either.
What can I do to print just the name of the DataFrame as a string?
Thanks in advance!
PS: I get this error: "AttributeError: 'DataFrame' object has no attribute 'name'"
# Define lists
datasets = [train_data, test_data]
features = ['Age', 'Fare']
# Create function
fig, outliers = plt.subplots(figsize=(20,10), ncols=4)
row, col = 0, 0
for f in features:
    for d in datasets:
        sns.boxplot(x=d[f], orient='v', color=pal_titanic[3], ax=outliers[col])
        outliers[col].set_title(f + 'in' + d)
        col = col + 1
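A DataFrame does not store the name of the variable it was assigned to, which is why d.name fails. One possible workaround, sketched here with hypothetical miniature frames in place of the real train_data and test_data, is to keep each label next to its frame in a dict:

```python
import pandas as pd

# Hypothetical miniature frames standing in for train_data and test_data.
train_data = pd.DataFrame({"Age": [22, 38], "Fare": [7.25, 71.28]})
test_data = pd.DataFrame({"Age": [34, 47], "Fare": [7.83, 7.0]})

# Keep each label next to its frame instead of using a bare list.
datasets = {"train_data": train_data, "test_data": test_data}
features = ["Age", "Fare"]

titles = []
for f in features:
    for name, d in datasets.items():
        # `name` is a plain string, so it can be concatenated into a title;
        # `d` is still the DataFrame to hand to the plotting call.
        titles.append(f + " in " + name)

print(titles)
```

In the original loop, the title line would then use name instead of d, while d[f] is still passed to sns.boxplot.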
I am new to Spyder and am working with the KDD1999 data. I am trying to create charts based on the dataset, such as total amounts of srv_error rates. However, when I try to create these charts, errors pop up, and there are a few I can't solve. I have commented the code. Does anyone know what is wrong with the code?
#Used to import all packages and/or libraries you will be using
#pd loads and creates the data table or dataframe
import pandas as pd
####Section for loading data
#If the data file has the extension xlsx then the read_excel function should be used. If csv then read_csv should be used
#As this is stored in the same area the absolute path can remain unchanged
df = pd.read_csv('kddcupdata1.csv')
#Pulls specific details
#Pulls first five rows
df.head()
#Pulls first three rows
df.head(3)
#Setting column names
df.columns = ['duration', 'protocol_type', 'service', 'flag', 'src_bytes', 'dst_bytes', 'land', 'wrong_fragment', 'urgent', 'hot', 'num_failed_logins', 'logged_in', 'lnum_compromised', 'lroot_shell', 'lsu_attempted', 'lnum_root', 'lnum_file_creations', 'lnum_shells', 'lnum_access_files', 'lnum_outbound_cmds', 'is_host_login', 'is_guest_login', 'count', 'srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 'dst_host_srv_rerror_rate', 'label']
#Scatter graph for number of failed logins caused by srv serror rate
df.plot(kind='scatter',x='num_failed_logins',y='srv_serror_rate',color='red')
#This works
#Total num_failed_logins caused by srv_error_rate
# making a dict of list
info = {'Attack': ['dst_host_same_srv_rate', 'dst_host_srv_rerror_rate'],
'Num' : [0, 1]}
otd = pd.DataFrame(info)
# sum of all salary stored in 'total'
otd['total'] = otd['Num'].sum()
print(otd)
##################################################################################
#Charts that do not work
import matplotlib.pyplot as plt
#1 ERROR MESSAGE - AttributeError: 'list' object has no attribute 'lsu_attempted'
#Bar chart showing total 1su attempts
df['lsu_attempted'] = df['lsu_attempted'].astype(int)
df = ({'lsu_attempted':[1]})
df['lsu_attempted'].lsu_attempted(sort=0).plot.bar()
ax = df.plot.bar(x='super user attempts', y='Total of super user attempts', rot=0)
df.from_dict('all super user attempts', orient='index')
df.transpose()
#2 ERROR MESSAGE - TypeError: plot got an unexpected keyword argument 'x'
#A simple line plot
plt.plot(kind='bar',x='protocol_type',y='lsu_attempted')
#3 ERROR MESSAGE - TypeError: 'set' object is not subscriptable
df['lsu_attempted'] = df['lsu_attempted'].astype(int)
df = ({'lsu_attempted'})
df['lsu_attempted'].lsu_attempted(sort=0).plot.bar()
ax = df.plot.bar(x='protocol_type', y='lsu_attempted', rot=0)
df.from_dict('all super user attempts', orient='index')
df.transpose()
#5 ERROR MESSAGE - TypeError: 'dict' object is not callable
#Bar chart showing total of chosen protocols used
Data = {'protocol_types': ['tcp','icmp'],
'number of protocols used': [10,20,30]
}
bar = df(Data,columns=['protocol_types','number of protocols used'])
bar.plot(x ='protocol_types', y='number of protocols used', kind = 'bar')
df.show()
Note: Also, if anyone has a clear explanation of what this is about, that would be helpful. Please link sources if possible.
Your first error is in this snippet:
df['lsu_attempted'] = df['lsu_attempted'].astype(int)
df = ({'lsu_attempted':[1]})
df['lsu_attempted'].lsu_attempted(sort=0).plot.bar()
ax = df.plot.bar(x='super user attempts', y='Total of super user attempts', rot=0)
df.from_dict('all super user attempts', orient='index')
df.transpose()
The error you get, AttributeError: 'list' object has no attribute 'lsu_attempted', is a result of line 2 above.
Initially df is a pandas DataFrame (line 1 above), but after line 2, df = ({'lsu_attempted':[1]}), df is a dictionary with one key, 'lsu_attempted', whose value is a list with one element.
So in line 3, when you evaluate df['lsu_attempted'] (the first part of that statement), you get that single-element list, and a list doesn't have an lsu_attempted attribute.
I have no idea what you were trying to achieve, but my strong guess is that you did not intend to replace your data frame with a single-key dictionary.
Your 2nd error is easy: you are calling plt.plot incorrectly. x is not a keyword argument; x and y are positional arguments (see matplotlib.pyplot.plot in the Matplotlib 3.2.1 documentation).
The error labelled #3 has the same root cause as the first: df = ({'lsu_attempted'}) makes df a set, and a set cannot be indexed. Your last error (labelled #5, 'dict' object is not callable) results from the code snippet above: you made df a dictionary, and you can't call dictionaries.
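To make the rebinding problem concrete, here is a small sketch. The rows are hypothetical and the names shadowed and totals are introduced only for illustration; the point is that derived results get their own names so that df stays a DataFrame:

```python
import pandas as pd

# Hypothetical rows; only the two columns discussed above.
df = pd.DataFrame({
    "protocol_type": ["tcp", "tcp", "icmp", "udp", "tcp"],
    "lsu_attempted": [1, 0, 0, 0, 1],
})

# Assigning a dict literal to a name gives a plain dict, not a DataFrame.
shadowed = {"lsu_attempted": [1]}
print(type(shadowed["lsu_attempted"]))  # a list, with no DataFrame methods

# Keep df as it is and store the derived result under a new name.
totals = df.groupby("protocol_type")["lsu_attempted"].sum()
print(totals)

# With matplotlib installed, the bar chart could then be drawn with:
# totals.plot.bar(rot=0)
```

Here the plotting goes through the pandas .plot accessor, which accepts column-style arguments, rather than through plt.plot, which expects the x and y data positionally.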
I am trying to build a heat map using Bokeh, but I keep getting the same error. I'll include both my code and the error below. Please help me out!
I assumed that the error is mainly about NaNs in my data, so I've added the necessary if statements to the code to make sure that this issue is addressed. I even tried to fill any possible NaNs with zero in the following lists: 'user', 'module', 'ratio', 'color', and 'alpha'. However, none of these changes helped.
colors = ['#ff0000','#ff1919','#ff4c4c','#ff7f7f','#99cc99','#7fbf7f','#4ca64c','#329932','#008000']
sorted_userlist = list(total_checks_sorted.index)
user = []
module = []
ratio = []
color = []
alpha = []
for m_id in ol_module_ids:
    pset = m_id.split('/')[-1]
    col_name1 = m_id + '_ratio'
    col_name2 = m_id + '_total'
    min_checks = min(check_matrix[col_name2].values)
    max_checks = max(check_matrix[col_name2].values)
    for i, u in enumerate(sorted_userlist):
        module.append(pset)
        user.append(str(i+1))
        ratio_value = check_matrix[col_name1][u]
        ratio.append(ratio_value)
        al= math.sqrt((check_matrix[col_name2][u]-min_checks+0.0001)/float(max_checks))
        if ratio_value>0.16:
            al = min(al*100,1)
        alpha.append(al)
        if np.isnan(ratio_value):
            color.append(colors[0])
        else:
            color.append(colors[int(ratio_value*8)])
#fill NAs in source lists with zeroes
pd.Series(ratio).fillna(0).tolist()
col_source = ColumnDataSource(data = dict(module = module, user = user, color=color, alpha=alpha, ratio = ratio))
#source = source.fillna('')
#TOOLS = "resize,hover,save,pan,box_zoom,wheel_zoom"
TOOLS = "reset,hover,save,pan,box_zoom,wheel_zoom"
p=figure(title="Ratio of Correct Checks Each Student Each Online Homework Problem",
x_range=pset,
#y_range = list(reversed(sorted_userlist)),
y_range=list(reversed(list(map(str, range(1,475))))),
x_axis_location="above", plot_width=900, plot_height=4000,
toolbar_location="left", tools=TOOLS)
#axis_line_color = None)
#outline_line_color = None)#
p.rect("module", "user", 1, 1, source=col_source,
color="color", alpha = 'alpha', line_color=None)
show(p)
NaN values are not JSON serializable (this is a glaring deficiency in the JSON standard). You mentioned there are NaN values in the ratio list, which you are putting in the ColumnDataSource here:
col_source = ColumnDataSource(data=dict(..., ratio=ratio))
Since it is in the CDS, Bokeh will try to serialize it, resulting in the error. You have two options:
If you don't actually need the numeric ratio values in the plot for some reason (e.g. to drive a hover tool or custom mapper or something), then just leave it out of the data source.
If you do need to send the ratio values, then you must put the data into a NumPy array. Bokeh serializes NumPy arrays using a different, non-JSON approach, so it is then possible to send NaNs successfully.
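The JSON limitation can be seen without Bokeh at all. The sketch below uses Python's standard json module as a stand-in (it is not Bokeh's own serializer) and a hypothetical three-element ratio list; it shows the strict encoder rejecting NaN and the first option, leaving the column out, passing:

```python
import json

# Hypothetical column values; the middle ratio is missing.
ratio = [0.5, float("nan"), 0.9]
columns = {"module": ["p1", "p1", "p1"], "user": ["1", "2", "3"], "ratio": ratio}

# By default Python emits the token NaN, which is not valid JSON.
lenient = json.dumps(ratio)

# In strict mode the same list is rejected outright.
try:
    json.dumps(columns, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

# Option 1 from above: leave the NaN-bearing column out of the data.
without_ratio = {k: v for k, v in columns.items() if k != "ratio"}
payload = json.dumps(without_ratio, allow_nan=False)

# Option 2 would instead pass the values as a NumPy array, e.g.
# np.array(ratio, dtype=float), when building the data source.
print(lenient, strict_ok, payload)
```

Note that a statement such as pd.Series(ratio).fillna(0).tolist() returns a new list; unless the result is assigned back to ratio, the original list still contains the NaNs.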
When doing data analysis with pandas module in python, I was trying to create a function that can apply the following process to a list of data frames. (Note: P1_Assessment is one of the dataframes that I would like to analyse.)
P1_Assessment[P1_Assessment > 1].sum(axis=0).astype(int).sort_values(ascending = False).plot(kind = 'bar')
So to analyse a list of data frames in one block of code, I tried to create a function as follows:
def assess_rep(dataframe):
    for i in dataframe:
        a = i[i > 1].sum(axis= 0).astype(int).sort_values(ascending = False)
        a.plot(kind = 'bar')
    return
But when I used the function on a list of dataframes, only the analysed result of the last dataframe was returned.
I tried to search on similar topics on stackoverflow but didn't come across anything, maybe I missed out. Any help on this is greatly appreciated!!
Your problem is that, by default, every call to plot inside the loop draws onto the same current axes, so each new bar chart is drawn over the previous one and you end up with a single plot. What you want to do is keep every plot in a list, or save each one to a file with:
ax = a.plot(kind='bar')
fig = ax.get_figure()
fig.savefig("filename.png")
Check out savefig and DataFrame.plot. Edit: adapted from "How to save Pandas pie plot to a file?"
I listed two options.
First option is to plot all dataframes in one figure:
def assess_rep(dataframe_list):
    for df in dataframe_list:
        a = df[df > 1].sum(axis= 0).astype(int).sort_values(ascending = False)
        ax = a.plot(kind = 'bar')
    return ax
You can save the figure as a PNG file with:
ax = assess_rep(dataframe_list)
ax.get_figure().savefig('all_dataframe.png')
Second option is to plot every dataframe separately and save each figure during the process:
import matplotlib.pyplot as plt
def assess_rep(dataframe_list):
    ax_list = []
    counter = 1
    for df in dataframe_list:
        print(counter)
        # create a fresh figure and axes for this dataframe
        fig, ax = plt.subplots(num=counter)
        a = df[df > 1].sum(axis= 0).astype(int).sort_values(ascending = False)
        # Series.plot takes the target axes via ax=, not a fig= argument
        a.plot(kind='bar', ax=ax)
        ax_list.append(ax)
        fig.savefig('single_df_%i.png'%counter)
        counter += 1
    return ax_list
I have created a python function to draw a violin plot using plotly. The code for that is:
def print_violin_plot(data1,data2,type1,type2,fig_title):
    data = pd.DataFrame(columns=('Group', 'Number'))
    for i in data1:
        x = pd.DataFrame({'Group' : [type1],
                          'Number' : [i],})
        data = data.append(x, ignore_index=True)
    for i in data2:
        x = pd.DataFrame({'Group' : [type2],
                          'Number' : [i],})
        data = data.append(x, ignore_index=True)
    fig = FF.create_violin(data, data_header='Number',
                           group_header='Group', height=500, width=800, title= fig_title)
    py.iplot(fig, filename="fig_title")
I then call the function
print_violin_plot(eng_t['grade'],noneng_t['grade'],'English Speaking','Non-English Speaking', 'Grades')
The function runs fine, without any errors, but I don't see any output. Why?
When you loop over data1 and data2 you assign a new pd.DataFrame to the variable x, and each iteration overwrites the previous x. The append therefore has to happen inside the loop body; if it runs after the loop, only the last x is added. Also note that DataFrame.append does not work like list.append: it never modified data in place. It returned a new DataFrame, so the result had to be assigned back, as in data = data.append(...). The method was deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat is the replacement. With pd.concat, the code that builds data would look like this:
data = pd.DataFrame(columns=('Group', 'Number'))
for i in data1:
    x = pd.DataFrame({'Group' : [type1],
                      'Number' : [i],})
    # append inside the loop and keep the returned frame
    data = pd.concat([data, x], ignore_index=True)
for i in data2:
    x = pd.DataFrame({'Group' : [type2],
                      'Number' : [i],})
    data = pd.concat([data, x], ignore_index=True)
To be clear, this only concerns how data is built; I cannot say from the code shown whether it is the reason you see no output.
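For reference, DataFrame.append in pandas returned a new DataFrame rather than None, and it was removed in pandas 2.0. A minimal sketch of building the same Group/Number table without any row-by-row appending is shown below; the helper name build_long_frame and the sample grades are hypothetical:

```python
import pandas as pd

def build_long_frame(data1, data2, type1, type2):
    """Stack two sequences of numbers into Group/Number rows."""
    # One frame per group; the scalar label is repeated for every row.
    first = pd.DataFrame({"Group": type1, "Number": list(data1)})
    second = pd.DataFrame({"Group": type2, "Number": list(data2)})
    # A single concat replaces the repeated append calls.
    return pd.concat([first, second], ignore_index=True)

# Hypothetical grades standing in for eng_t['grade'] and noneng_t['grade'].
data = build_long_frame([70, 85], [60, 75, 90],
                        "English Speaking", "Non-English Speaking")
print(data)
```

Concatenating once also avoids copying the growing frame on every iteration, which matters when data1 and data2 are long.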