violin plot in python function and plotly - python

I have created a python function to draw a violin plot using plotly. The code for that is:
def print_violin_plot(data1, data2, type1, type2, fig_title):
    data = pd.DataFrame(columns=('Group', 'Number'))
    for i in data1:
        x = pd.DataFrame({'Group' : [type1],
                          'Number' : [i],})
    data = data.append(x, ignore_index=True)
    for i in data2:
        x = pd.DataFrame({'Group' : [type2],
                          'Number' : [i],})
    data = data.append(x, ignore_index=True)
    fig = FF.create_violin(data, data_header='Number',
                           group_header='Group', height=500, width=800, title=fig_title)
    py.iplot(fig, filename="fig_title")
I then call the function:
print_violin_plot(eng_t['grade'], noneng_t['grade'], 'English Speaking', 'Non-English Speaking', 'Grades')
The function runs fine, without any errors, but I don't see any output. Why?

When you loop over data1 and data2, you assign a new pd.DataFrame to the variable x on every iteration, overwriting the previous x. If the append happens after the loop, only the last value of each group reaches data, so the call to data.append(x) has to be inside the for-loop. Note also that DataFrame.append() does not work like list.append(): it leaves data unchanged and returns a new DataFrame, so you must keep the assignment data = data.append(...). (DataFrame.append() was removed in pandas 2.0; pd.concat([data, x], ignore_index=True) is the replacement.) Your code should look like this:
data = pd.DataFrame(columns=('Group', 'Number'))
for i in data1:
    x = pd.DataFrame({'Group' : [type1],
                      'Number' : [i],})
    data = data.append(x, ignore_index=True)
for i in data2:
    x = pd.DataFrame({'Group' : [type2],
                      'Number' : [i],})
    data = data.append(x, ignore_index=True)
To be clear, I am not very familiar with pd.DataFrame, so I cannot say whether this alone explains the missing plot, but an append outside the loop is certainly a flaw in your code.
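As an aside, the long-form table that the function builds row by row can be assembled in one step, which avoids the loop and append altogether and also works on recent pandas versions. A minimal sketch, assuming data1 and data2 are lists or Series of numbers; the helper name build_violin_frame and the sample values are made up for illustration:

```python
import pandas as pd

def build_violin_frame(data1, data2, type1, type2):
    # Hypothetical helper: one row per observation, labelled with its group.
    # The scalar group label is broadcast to the length of the value list.
    first = pd.DataFrame({'Group': type1, 'Number': list(data1)})
    second = pd.DataFrame({'Group': type2, 'Number': list(data2)})
    return pd.concat([first, second], ignore_index=True)

# Made-up sample grades, standing in for eng_t['grade'] and noneng_t['grade']
frame = build_violin_frame([70, 85, 90], [60, 75],
                           'English Speaking', 'Non-English Speaking')
print(frame)
```

The resulting frame has the same 'Group' and 'Number' columns that the question passes to the violin plot.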

Related

Is there a non-looping way to perform text searching in a data frame

I have a huge list of ngrams to search for. For each one, I want to know how often it occurs in my historic dataframe and the mean of a numeric variable over the rows where it occurs. I have a really ugly way of doing it that works, but because the list of ngrams is huge, it is really slow.
I am trying to avoid the loop, as I guess it is the main reason for the speed problem, but I don't see how to do it.
Any ideas?
output = pd.DataFrame()
ngrams = ['ngram1', 'ngram2', 'ngram3', ..., 'ngram350000']
for i in list(ngrams):
    temp = pd.DataFrame(data={'ngram' : [i],
                              'count' : historic_df['text_variable'].str.contains(i, na=False).sum(),
                              'mean' : historic_df[historic_df['text_variable'].str.contains(i, na=False)]['numeric_variable'].mean()})
    output = pd.concat([output, temp], axis=0)
Try Series.apply(), which calls a function once per ngram and collects the results into a data frame. It still visits every ngram internally, so it tidies the code more than it speeds it up:
def func(ngram):
    # Boolean mask of the rows whose text contains this ngram
    mask = historic_df['text_variable'].str.contains(ngram, na=False)
    return pd.Series({'ngram' : ngram,
                      'count' : mask.sum(),
                      'mean' : historic_df.loc[mask, 'numeric_variable'].mean()})
ngrams = pd.Series(['ngram1', 'ngram2', 'ngram3', ..., 'ngram350000'])
output = ngrams.apply(func)
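Whichever looping construct is used, two costs in the original loop can be cut: the str.contains mask is computed twice per ngram, and pd.concat inside the loop copies the growing result on every pass. A rough sketch, using made-up sample data in place of historic_df and passing regex=False on the assumption that the ngrams are literal strings rather than patterns:

```python
import pandas as pd

# Made-up stand-in for the real historic_df
historic_df = pd.DataFrame({
    'text_variable': ['red apple pie', 'green apple', 'blue sky', None],
    'numeric_variable': [1.0, 3.0, 10.0, 5.0],
})
ngrams = ['apple', 'sky', 'pear']

rows = []
for ngram in ngrams:
    # Compute the mask once and reuse it for both the count and the mean
    mask = historic_df['text_variable'].str.contains(ngram, na=False, regex=False)
    rows.append({'ngram': ngram,
                 'count': int(mask.sum()),
                 'mean': historic_df.loc[mask, 'numeric_variable'].mean()})

# Build the result once, instead of concatenating inside the loop
output = pd.DataFrame(rows)
print(output)
```

An ngram that never occurs gets a count of 0 and a mean of NaN, as in the original loop.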

'str' object has no attribute 'values'

I am stuck. I have looked up other solutions to this, but I don't quite understand them. In my code I have a giant matrix in a csv file, and I want to iterate over the data in my 4th column only. It is called 'MovementTime'. I thought that by calling it the way shown below I could iterate over my data and therefore sort it. I am getting the error
'str' object has no attribute 'values'
Can someone explain to me why I'm getting this error?
Thank you!
bigdata = pd.read_csv(r'Assetslog_912021_11.csv')
data = pd.DataFrame(bigdata)
#create a function to analyze data
def analytics(data):
    data.columns = ['Time', 'Fixed Delta', 'Movement Time', 'MovementNumber', 'Rest Flag', 'DistortionDigit', 'RobotForceX','RobotForceY','RobotForceZ', 'PrevPositionX','PrevPositionY','PrevPositionZ', 'TargetPosZ', 'TargetPosY', 'TargetPosZ', 'PlayerPosX', 'PlayerPosX', 'PlayerPosY', 'PlayerPosZ', 'RobotVelX','RobotVelY','RobotVelZ', 'LocalPosX', 'LocalPosY', 'LocalPosZ', 'PerpError', 'ExtError']
    i = np.iterable(data.columns)
    for i in set(data['MovementNumber']):
        print("Plot for Movement Number " + str(i))
        data2 = data.loc[['MovementNumber'] == i]
    ax = plt.axes(projection = '3d')
    xdata = data2['PlayerPosX'].values
    ydata = data2['PlayerPosY'].values
    zdata = data2['PlayerPosZ'].values
    plot1 =ax.scatter3D(xdata,ydata,zdata, c=zdata)
    plt.show(plot1)
This line is not right:
data2 = data.loc[['MovementNumber'] == i]
That's going to compare a list containing a string to an integer, which will always be false. I believe you want
data2 = data[data['MovementNumber'] == i]
That assigns to data2 all the rows where MovementNumber is i.
And, by the way, your indentation is wrong. I assume you want one plot per movement number, so all the lines starting with ax = ... need to be indented, so they are inside the loop.
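Putting both points together, the filter and the per-movement work both sit inside the loop. A small sketch with made-up data; the column names follow the question, the counts dictionary is only there for illustration, and the plotting calls are left out so the example stays self-contained:

```python
import pandas as pd

# Made-up stand-in for the real csv data
data = pd.DataFrame({
    'MovementNumber': [1, 1, 2, 2, 2],
    'PlayerPosX': [0.0, 0.1, 1.0, 1.1, 1.2],
    'PlayerPosY': [0.0, 0.2, 2.0, 2.1, 2.2],
    'PlayerPosZ': [0.0, 0.3, 3.0, 3.1, 3.2],
})

counts = {}  # illustrative only: number of points found per movement
for i in sorted(set(data['MovementNumber'])):
    # Boolean mask: keeps only the rows whose MovementNumber equals i
    data2 = data[data['MovementNumber'] == i]
    xdata = data2['PlayerPosX'].values
    ydata = data2['PlayerPosY'].values
    zdata = data2['PlayerPosZ'].values
    # The 3D scatter plot for this movement would be drawn here
    counts[int(i)] = len(xdata)
    print("Movement Number " + str(i) + " has " + str(len(xdata)) + " points")
```

Because data2 is a data frame here, data2['PlayerPosX'].values is a NumPy array.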

Why won't my Plotly express figure work? - "ValueError: Plotly Express cannot process wide-form data with columns of different type."

So I'm trying to do something here (I'm obviously not very experienced at all) where I read in data from a Wikipedia table and then plot it in a rudimentary way on a bar chart.
It's pretty messy, but what I'm essentially trying to do is make a dictionary containing the two variables and then use that as my "DataFrame" for the visualisation.
source = requests.get('https://en.wikipedia.org/wiki/List_of_countries_by_alcohol_consumption_per_capita').text
soup = BeautifulSoup(source, 'lxml')
body = soup.find('body')
table = body.find('table')
tablemain = table.find('table', class_ ='wikitable nowrap sortable mw-datatable')
countrylist = []
for data in tablemain.find_all('tbody'):
    rows = data.find_all('tr')
    rows.pop(0)
    for row in rows:
        country = row.find('a')
        countrylist.append(country)
countrynames = []
for x in countrylist:
    names = x.get('title')
    countrynames.append(names)
total = []
for data in tablemain.find_all('tbody'):
    rows = data.find_all('tr')
    rows.pop(0)
    for row in rows:
        tot = row.find_all('td')[1]
        total.append(tot.text)
floatal = []
for i in total:
    i = float(i)
    floatal.append(i)
countries = tuple(countrynames)
dictionary = {'Countries':countrynames,'Scores':floatal}
fig = px.bar(dictionary)
fig.show()
I keep getting this error:
ValueError: Plotly Express cannot process wide-form data with columns of different type.
So I imagine there's something subtly wrong with the way I understand data types or plotly or something (in my head, the data that I'm feeding in seems like it should be easily plottable).
Any help someone could give would be really appreciated.
Hard to know for sure without all your imports and a complete code snippet, but if your data handling process is sound, then this should work:
df = pd.DataFrame(dictionary)
fig = px.bar(df, x= 'Countries', y = 'Scores')
fig.show()
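The reason naming the columns helps: when px.bar is given data without x and y, it treats every column as a series to plot (wide form), and here one column holds strings while the other holds floats. A quick way to see the mismatch with pandas alone, using made-up values in place of the scraped ones:

```python
import pandas as pd

# Made-up values standing in for the scraped country names and scores
dictionary = {'Countries': ['A', 'B', 'C'], 'Scores': [12.1, 9.8, 7.5]}
df = pd.DataFrame(dictionary)

# Check each column: only 'Scores' is numeric, so the two columns
# cannot be plotted together as wide-form series
numeric = {col: pd.api.types.is_numeric_dtype(df[col]) for col in df.columns}
print(numeric)
```

Passing x='Countries' and y='Scores' tells Plotly Express which column labels the bars and which one gives their heights.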

Rename a data frame name by adding the iteration value as suffix in a for loop (Python)

I have run the following Python code :
array = ['AEM000', 'AID017']
USA_DATA_1D = USA_DATA10.loc[USA_DATA10['JOBSPECIALTYCODE'].isin(array)]
I run a regression model and extract the log-likelihood value for each item of this array in a for loop:
for item in array:
    USA_DATA_1D = USA_DATA10.loc[USA_DATA10['JOBSPECIALTYCODE'] == item]
    formula = "WEIGHTED_BASE_MEDIAN_FINAL_MEAN ~ YEAR"
    response, predictors = dmatrices(formula, USA_DATA_1D, return_type='dataframe')
    mod1 = sm.GLM(response, predictors, family=sm.genmod.families.family.Gaussian()).fit()
    LLF_NG = {'model': ['Standard Gaussian'],
              'llf_value': mod1.llf
              }
    df_llf = pd.DataFrame(LLF_NG , columns = ['model', 'llf_value'])
Now I would like to rename the dataframe df_llf to df_llf_(name of the item), i.e. df_llf_AEM000 when running the loop on the first item and df_llf_AID017 when running it on the second one.
I need some help to know how to do that.
If you want to rename the data frame, you need to use the copy method so that the original data frame does not get altered.
df_llf_AEM000 = df_llf.copy()
If you want to save iteratively several different versions of the original data frame, you can do something like this:
allDataframes = []
for i in range(10):
    df = df_original.copy()
    allDataframes.append(df)
print(allDataframes[0])
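For the loop in the question, a dictionary keyed by the item name is a common alternative to creating variables such as df_llf_AEM000 dynamically. A sketch in which a placeholder number stands in for mod1.llf (the real model fit is omitted, and the names llf_frames and fake_llf are hypothetical):

```python
import pandas as pd

array = ['AEM000', 'AID017']
llf_frames = {}  # hypothetical name: maps each item to its own data frame

for position, item in enumerate(array):
    # Placeholder for mod1.llf; this is not a real log-likelihood
    fake_llf = -100.0 - position
    llf_frames[item] = pd.DataFrame({'model': ['Standard Gaussian'],
                                     'llf_value': [fake_llf]})

# Look up the frame for one item by its name
print(llf_frames['AEM000'])
```

Each frame is then available as llf_frames['AEM000'], llf_frames['AID017'], and so on, without needing a separate variable per item.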

Spyder charts in the code are not working. What is wrong?

I am new to Spyder and am working with the KDD1999 data. I am trying to create charts based on the dataset such as total amounts of srv_error rates. However when I try to create these charts errors pop up and I have a few I can't solve. I have commented the code. Does anyone know what is wrong with the code?
#Used to import all packages and/or libraries you will be using
#pd loads and creates the data table or dataframe
import pandas as pd
####Section for loading data
#If the datafile extension is xlsx then the read_excel function should be used. If csv then read_csv should be used
#As this is stored in the same area the absolute path can remain unchanged
df = pd.read_csv('kddcupdata1.csv')
#Pulls specific details
#Pulls first five rows
df.head()
#Pulls first three rows
df.head(3)
#Setting column names
df.columns = ['duration', 'protocol_type', 'service', 'flag', 'src_bytes', 'dst_bytes', 'land', 'wrong_fragment', 'urgent', 'hot', 'num_failed_logins', 'logged_in', 'lnum_compromised', 'lroot_shell', 'lsu_attempted', 'lnum_root', 'lnum_file_creations', 'lnum_shells', 'lnum_access_files', 'lnum_outbound_cmds', 'is_host_login', 'is_guest_login', 'count', 'srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 'dst_host_srv_rerror_rate', 'label']
#Scatter graph for number of failed logins caused by srv serror rate
df.plot(kind='scatter',x='num_failed_logins',y='srv_serror_rate',color='red')
#This works
#Total num_failed_logins caused by srv_error_rate
# making a dict of list
info = {'Attack': ['dst_host_same_srv_rate', 'dst_host_srv_rerror_rate'],
'Num' : [0, 1]}
otd = pd.DataFrame(info)
# sum of all salary stored in 'total'
otd['total'] = otd['Num'].sum()
print(otd)
##################################################################################
#Charts that do not work
import matplotlib.pyplot as plt
#1 ERROR MESSAGE - AttributeError: 'list' object has no attribute 'lsu_attempted'
#Bar chart showing total 1su attempts
df['lsu_attempted'] = df['lsu_attempted'].astype(int)
df = ({'lsu_attempted':[1]})
df['lsu_attempted'].lsu_attempted(sort=0).plot.bar()
ax = df.plot.bar(x='super user attempts', y='Total of super user attempts', rot=0)
df.from_dict('all super user attempts', orient='index')
df.transpose()
#2 ERROR MESSAGE - TypeError: plot got an unexpected keyword argument 'x'
#A simple line plot
plt.plot(kind='bar',x='protocol_type',y='lsu_attempted')
#3 ERROR MESSAGE - TypeError: 'set' object is not subscriptable
df['lsu_attempted'] = df['lsu_attempted'].astype(int)
df = ({'lsu_attempted'})
df['lsu_attempted'].lsu_attempted(sort=0).plot.bar()
ax = df.plot.bar(x='protocol_type', y='lsu_attempted', rot=0)
df.from_dict('all super user attempts', orient='index')
df.transpose()
#5 ERROR MESSAGE - TypeError: 'dict' object is not callable
#Bar chart showing total of chosen protocols used
Data = {'protocol_types': ['tcp','icmp'],
'number of protocols used': [10,20,30]
}
bar = df(Data,columns=['protocol_types','number of protocols used'])
bar.plot(x ='protocol_types', y='number of protocols used', kind = 'bar')
df.show()
Note: if anyone has a clear explanation of what this is about, that would also be helpful. Please link sources if possible.
Your first error in this snippet :
df['lsu_attempted'] = df['lsu_attempted'].astype(int)
df = ({'lsu_attempted':[1]})
df['lsu_attempted'].lsu_attempted(sort=0).plot.bar()
ax = df.plot.bar(x='super user attempts', y='Total of super user attempts', rot=0)
df.from_dict('all super user attempts', orient='index')
df.transpose()
The error you get AttributeError: 'list' object has no attribute 'lsu_attempted' is as a result of line two above.
Initially df is a pandas data frame (line 1 above), but from line 2 df = ({'lsu_attempted':[1]}), df is now a dictionary with one key - ‘lsu_attempted’ - which has a value of a list with one element.
so in line 3 when you do df['lsu_attempted'] (as the first part of that statement) this equates to that single element list, and a list doesn’t have the lsu_attempted attribute.
I have no idea what you were trying to achieve but it is my strong guess that you did not intend to replace your data frame with a single key dictionary.
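Your 2nd error is easy: you are calling plt.plot incorrectly. x is not a keyword argument; x and y are positional arguments (see matplotlib.pyplot.plot in the Matplotlib 3.2.1 documentation).
Your 3rd error message, TypeError: 'set' object is not subscriptable, has the same kind of cause: df = ({'lsu_attempted'}) makes df a set, and a set cannot be indexed with df['lsu_attempted']. The last error, TypeError: 'dict' object is not callable, results from the first code snippet above: you made df a dictionary, and you can't call dictionaries.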
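A sketch of how the summaries behind such charts might be built without rebinding df: keep the data frame under its own name and derive each summary from it. The tiny frame below is made up and only mimics two of the KDD columns; the plotting call is left as a comment so the example runs without matplotlib.

```python
import pandas as pd

# Made-up stand-in for the KDD data frame
df = pd.DataFrame({
    'protocol_type': ['tcp', 'icmp', 'tcp', 'udp', 'tcp'],
    'lsu_attempted': [0, 0, 1, 0, 1],
})

# Number of rows per protocol
protocol_counts = df['protocol_type'].value_counts()

# Total super-user attempts per protocol
attempts = df.groupby('protocol_type')['lsu_attempted'].sum()

print(protocol_counts)
print(attempts)
# protocol_counts.plot.bar() would draw the first summary (requires matplotlib)
```

Throughout, df stays a pandas data frame, so df['lsu_attempted'] keeps returning a Series rather than a list.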
