Explain Function Mistake

I managed to write my first function. However, I do not understand it :-)
I approached my real problem with a simplified one. See the following code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from pyXSteam.XSteam import XSteam
steamTable = XSteam(XSteam.UNIT_SYSTEM_MKS)
T1_T_in = [398,397,395]
T1_p_in = [29,29,29]
T1_mPkt_in = [2.2,3,3.5]
def Power(druck,temp,menge):
    H = []
    Q = []
    for i in range(len(druck)):
        H.append(steamTable.h_pt(druck[i],temp[i]))
        Q.append(H[i]*menge[i])
    return Q
t1Q=Power(T1_p_in,T1_T_in,T1_mPkt_in)
t3Q = Power(T3_p_in,T3_T_in,T3_mPkt_in)
print(t1Q)
print(t3Q)
It works. My real problem differs in that I read the data from an Excel file. I got an error message and, following what I learned from this site :-), I added ".tolist()" inside the function, and now it works. I do not understand why I need to convert the data to a list. Can anybody explain it to me? Thank you for your help.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from pyXSteam.XSteam import XSteam
steamTable = XSteam(XSteam.UNIT_SYSTEM_MKS)
pfad="XXX.xlsx"
df = pd.read_excel(pfad)
T1T_in = df.iloc[2:746,1]
T1p_in = df.iloc[2:746,2]
T1mPkt_in = df.iloc[2:746,3]
def Power(druck,temp,menge):
    H = []
    Q = []
    for i in range(len(druck)):
        H.append(steamTable.h_pt(druck.tolist()[i],temp.tolist()[i]))
        Q.append(H[i]*menge.tolist()[i])
    return Q
t1Q=Power(T1p_in,T1T_in,T1mPkt_in)
t1Q[0:10]

Your first example works because you pass T1_mPkt_in into the menge parameter as a plain Python list:
T1_mPkt_in = [2.2,3,3.5]
Your second example fails because you pass T1mPkt_in into the menge parameter as a pandas Series, not a list:
T1mPkt_in = df.iloc[2:746,3]
If you print the type of T1mPkt_in, you will get:
<class 'pandas.core.series.Series'>
When you write menge[i] on a Series with an integer index, pandas looks the value up by index label, not by position. The slice df.iloc[2:746,3] keeps the original row labels, so with the default index that read_excel creates, the labels start at 2 and menge[0] raises a KeyError. Calling .tolist() copies the values into a list, which is indexed by position starting at 0. Using menge.iloc[i] on the Series would also work.
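As a minimal sketch of the difference, the snippet below uses a small made-up DataFrame in place of the Excel file and a hypothetical stand-in function, fake_enthalpy, in place of steamTable.h_pt; neither comes from the question. It shows the label lookup failing, positional access working, and a variant of Power (named power here) that needs no manual conversion because zip iterates over the values directly.

```python
import pandas as pd

# Made-up data standing in for the Excel sheet (assumption, not the real file).
df = pd.DataFrame({
    "T": [400.0, 399.0, 398.0, 397.0, 395.0],
    "p": [29.0, 29.0, 29.0, 29.0, 29.0],
    "m": [1.0, 1.5, 2.2, 3.0, 3.5],
})

temp = df.iloc[2:5, 0]
druck = df.iloc[2:5, 1]
menge = df.iloc[2:5, 2]

# The slice keeps the original row labels, so there is no label 0.
index_labels = list(menge.index)

try:
    menge[0]  # label-based lookup on an integer index
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

first_by_position = menge.iloc[0]    # positional access on the Series
first_from_list = menge.tolist()[0]  # positional access on a list copy


def fake_enthalpy(p, t):
    """Hypothetical stand-in for steamTable.h_pt; not a real steam property."""
    return 2.0 * t + p


def power(druck, temp, menge, enthalpy=fake_enthalpy):
    # zip walks the values in order, so lists and Series behave the same.
    return [enthalpy(p, t) * m for p, t, m in zip(druck, temp, menge)]


q_from_series = power(druck, temp, menge)
q_from_lists = power(druck.tolist(), temp.tolist(), menge.tolist())
print(index_labels, label_lookup_failed, first_by_position)
print(q_from_series)
```

If you prefer to keep the index-based loop from the question, converting each Series once before the loop avoids rebuilding the list on every iteration.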

Related

How to apply a function that takes multiple arguments to a pandas DataFrame

I want to create two functions, apply them to the DataFrame, and store the result in the column interval_ratio.
import seaborn as sns
import pandas as pd
import numpy as np
max_testing_data = sns.load_dataset('geyser')
max_testing_data = max_testing_data[max_testing_data.groupby('waiting')['duration'].transform('max') == max_testing_data['duration']]
median = max_testing_data.groupby('kind', as_index=False)['waiting'].median()
print(median)
def short_modifier(waiting, duration):
    max_testing_data['interval_ratio'] = max_testing_data['duration']/max_testing_data['waiting']
def long_modifier(duration, waiting):
    max_testing_data['interval_ratio'] = max_testing_data['waiting']/max_testing_data['duration']
max_testing_data.apply(short_modifier, axis=0)
max_testing_data.apply(long_modifier, axis=0)
I am getting an error:
short_modifier() missing 1 required positional argument: 'duration'
How can I fix this?

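You haven't said which line the error comes from, but am I correct to believe it is the penultimate one? As Nathan points out, the problem is that the call is missing the "duration" argument for your function: with axis=0, .apply() passes each column to the function as a single Series, so only the first parameter is filled. The pandas documentation for DataFrame.apply() has more information.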
Without needing to use .apply(), you could instead write the following:
max_testing_data["interval_ratio"] = max_testing_data["duration"] / max_testing_data["waiting"]
This will produce the same result as the function short_modifier().
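To illustrate how apply hands data to a function, here is a small sketch on a made-up three-row frame (the values are invented, not taken from the geyser dataset). With axis=1 the function receives one row at a time as a single Series, so it takes one parameter and reads the columns from it. The vectorized division gives the same numbers.

```python
import pandas as pd

# Invented sample values; only the column names mirror the question.
df = pd.DataFrame({"duration": [1.8, 4.5, 3.6], "waiting": [54, 85, 79]})


def short_modifier(row):
    # row is a Series holding one row; columns are read by name.
    return row["duration"] / row["waiting"]


def long_modifier(row):
    return row["waiting"] / row["duration"]


short_ratio = df.apply(short_modifier, axis=1)
long_ratio = df.apply(long_modifier, axis=1)

# Column arithmetic does the same work without calling a function per row.
vectorized_short = df["duration"] / df["waiting"]
df["interval_ratio"] = vectorized_short
print(df)
```

The row-wise form is mainly useful when the calculation cannot be written as column arithmetic; for a plain ratio the vectorized line is shorter and usually faster.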

Format pd.Interval categories when plotting

I have a question similar to an earlier one.
I am referring to the code from that post:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
my_list = [1,2,3,4,5,7,8,9,11,23,56,78,3,3,5,7,9,12]
new_list = pd.Series(my_list)
df1 = pd.DataFrame({'Range1':new_list.value_counts().index, 'Range2':new_list.value_counts().values})
df1.sort_values(by=["Range1"],inplace=True)
df2 = df1.groupby(pd.cut(df1["Range1"], [0,1,2,3,4,5,6,7,8,9,10,11,df1['Range1'].max()])).sum()
objects = df2['Range2'].index
y_pos = np.arange(len(df2['Range2'].index))
But I want the following sequence on the x-axis.
Expected output:
(00,01] (01,02] (02,03] (03,04]......
Any help in getting the expected output?

It isn't straightforward, but it is doable. You will need to format the left and right intervals separately.
l = df2.index.categories.left.map("{:02d}".format)
r = df2.index.categories.right.map("{:02d}".format)
plt.bar(range(len(df2)), df2['Range2'].values, tick_label='('+l+', '+r+']')
plt.xticks(fontsize=6)
plt.show()
where
print('('+l+', '+r+']')
Index(['(00, 01]', '(01, 02]', '(02, 03]', '(03, 04]', '(04, 05]', '(05, 06]',
       '(06, 07]', '(07, 08]', '(08, 09]', '(09, 10]', '(10, 11]', '(11, 78]'],
      dtype='object')
You may have to change the brackets depending on whether your intervals are closed on the left, on the right, or neither.
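The bracket choice can also be derived from each interval instead of being hard-coded. The following is a minimal sketch, assuming integer bin edges; the helper name format_interval and the sample bins are made up for illustration. It reads the closed attribute of each pd.Interval to pick the brackets and zero-pads both endpoints.

```python
import pandas as pd


def format_interval(iv, width=2):
    """Zero-pad both endpoints and pick brackets from the interval's closed side."""
    left_bracket = "[" if iv.closed in ("left", "both") else "("
    right_bracket = "]" if iv.closed in ("right", "both") else ")"
    return (
        f"{left_bracket}{int(iv.left):0{width}d}, "
        f"{int(iv.right):0{width}d}{right_bracket}"
    )


# Illustrative values and bins (not the ones from the question).
values = pd.Series([1, 2, 3, 4, 5, 7, 8, 9, 11, 23])
binned = pd.cut(values, [0, 1, 2, 3, 4, 5, 10, 100])

# binned.cat.categories is an IntervalIndex; iterating yields pd.Interval objects.
labels = [format_interval(iv) for iv in binned.cat.categories]
print(labels)
```

The resulting list can be passed as tick_label to plt.bar in the same way as the concatenated Index above.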

Normalize columns in a numpy array- results in typeerror

I want to do a simple normalization of the data in a numpy ndarray, specifically (X - mu) / sigma. I tried the exact code that I found in earlier questions and kept getting the error "TypeError: cannot perform reduce with flexible type". I gave up and tried a simpler normalization method, (X - mu) / X.ptp, and got the same error.
import csv
import numpy as np
from numpy import *
import urllib.request
#Import comma separated data from the UCI archive
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data'
urllib.request.urlretrieve(url,'F:/Python/Wine Dataset/wine_data')
#open file
filename = 'F:/Python/Wine Dataset/wine_data';
raw_data = open(filename,'rt');
#Put raw_data into a numpy.ndarray
reader = csv.reader(raw_data);
x = list(reader);
data = np.array(x)
#First column is classification, other columns are features
y= data[:,0];
X_raw = data[:,1:13];
# Attempt at normalizing data- really wanted X-mu/sigma gave up
# even this simplified version doesn't work
# latest error is TypeError cannot perform reduce with flexible type?????
X = (X_raw - X_raw.min(0)) / X_raw.ptp(0);
print(X);

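Finally figured it out. The line data = np.array(x) returned an array of strings, because csv.reader yields every field as text.
Was:
data = np.array(x)
Changed to:
data = np.array(x).astype(float)
(The original fix used np.float, an alias for the built-in float that has since been removed from NumPy.)
After that everything worked. A simple issue that cost me hours.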
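A minimal sketch of the fix, using three invented rows in the same comma-separated layout in place of the downloaded file (the numbers are illustrative only). It shows that the array built from csv.reader has a string dtype, converts it to float, and then applies both normalizations mentioned in the question.

```python
import csv
import io

import numpy as np

# Invented sample rows: class label first, then two feature columns.
raw_text = "1,14.23,1.71\n1,13.20,1.78\n2,12.37,0.94\n"

rows = list(csv.reader(io.StringIO(raw_text)))
data_str = np.array(rows)
print(data_str.dtype)  # a string dtype, which is why the arithmetic fails

data = data_str.astype(float)

y = data[:, 0]       # first column is the classification
X_raw = data[:, 1:]  # remaining columns are features

# Standardization: (X - mu) / sigma, computed per column.
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Min-max scaling; np.ptp is the function form of the ptp call in the question.
X_minmax = (X_raw - X_raw.min(axis=0)) / np.ptp(X_raw, axis=0)
print(X_std)
print(X_minmax)
```

Loading the file with np.loadtxt(filename, delimiter=',') instead of csv.reader would return a float array directly.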

Print numpy array without ellipsis

I want to print a numpy array without truncation. I have seen other solutions but those don't seem to work.
Here is the code snippet:
total_list = np.array(total_list)
np.set_printoptions(threshold=np.inf)
print(total_list)
And this is what the output looks like:
22 A
23 G
24 C
25 T
26 A
27 A
28 A
29 G
..
232272 G
232273 T
232274 G
232275 C
232276 T
232277 C
232278 G
232279 T
This is the entire code. I might be making a mistake in type casting.
import csv
import pandas as pd
import numpy as np
seqs = pd.read_csv('BAP_GBS_BTXv2_imp801.hmp.csv')
plts = pd.read_csv('BAP16_PlotPlan.csv')
required_rows = np.array([7,11,14,19,22,31,35,47,50,55,58,63,66,72,74,79,82,87,90,93,99])
total_list = []
for i in range(len(required_rows)):
    curr_row = required_rows[i];
    print(curr_row)
    for j in range(len(plts.RW)):
        if(curr_row == plts.RW[j]):
            curr_plt = plts.PI[j]
            curr_range = plts.RA1[j]
            curr_plt = curr_plt.replace("_", "").lower()
            if curr_plt in seqs.columns:
                new_item = [curr_row,curr_range,seqs[curr_plt]]
                total_list.append(new_item)
                print(seqs[curr_plt])
total_list = np.array(total_list)
'''
np.savetxt("foo.csv", total_list[:,2], delimiter=',',fmt='%s')
total_list[:,2].tofile('seqs.csv',sep=',',format='%s')
'''
np.set_printoptions(threshold='nan')
print(total_list)

Use the following snippet to print the array without an ellipsis.
import numpy
import sys
numpy.set_printoptions(threshold=sys.maxsize)
EDIT:
If you have a pandas.DataFrame use the following snippet to print your array:
def print_full(x):
    pd.set_option('display.max_rows', len(x))
    print(x)
    pd.reset_option('display.max_rows')
Or you can use the pandas.DataFrame.to_string() method to get the desired result.
EDIT 2:
An earlier version of this post suggested the option below:
numpy.set_printoptions(threshold='nan')
Technically, this might work; however, the numpy documentation specifies int and None as allowed types. Reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html.

You can get around the weird NumPy repr/print behavior by converting the array to a list:
print(list(total_list))
This should print out your list of 2-element np arrays.

You are not printing numpy arrays: the elements being truncated are pandas Series, so the pandas display options apply.
Add the following line after the imports:
pd.set_option('display.max_rows', 100000)

#for a 2d array
def print_full(x):
    dim = x.shape
    pd.set_option('display.max_rows', dim[0])#dim[0] = len(x)
    pd.set_option('display.max_columns', dim[1])
    print(x)
    pd.reset_option('display.max_rows')
    pd.reset_option('display.max_columns')

It appears that as of Python 3, threshold='nan' no longer works, because a string cannot be compared with an integer.
Therefore, the recommended option is:
import numpy
import sys
numpy.set_printoptions(threshold=sys.maxsize)
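The answers above touch two separate truncation mechanisms, one in NumPy and one in pandas. The sketch below, on made-up data, shows both side by side. It uses the context-manager forms np.printoptions and pd.option_context so the settings are restored afterwards, which is a design choice of this example rather than something the answers require.

```python
import sys

import numpy as np
import pandas as pd

# NumPy: arrays above the default threshold are summarized with "...".
arr = np.arange(2000)
truncated_text = str(arr)
with np.printoptions(threshold=sys.maxsize):
    full_text = str(arr)

# pandas: a long Series is shortened by display.max_rows, not by NumPy.
series = pd.Series(range(100))
truncated_repr = repr(series)
full_series_text = series.to_string()
with pd.option_context("display.max_rows", None):
    full_repr = repr(series)

print(len(truncated_text), len(full_text))
print(len(truncated_repr.splitlines()), len(full_repr.splitlines()))
```

In the question, each element of total_list holds a pandas Series, so it is the pandas setting (or to_string()) that removes the ".." from the printed output.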

Python sklearn.datasets.dump_svmlight_file failed to output the right index of column

I want to run SVM light and SVM rank, so I need to convert my data into the SVM light format.
But I ran into a big problem.
My Python code is below:
import pandas as pd
import numpy as np
from sklearn.datasets import dump_svmlight_file
self.df = pd.DataFrame()
self.df['patent_id'] = patent_id_list
self.df['Target'] = class_list
self.df['backward_citation'] = backward_citation_list
self.df['uspc_originality'] = uspc_originality_list
self.df['science_linkage'] = science_linkage_list
self.df['sim_bc_structure'] = sim_bc_structure_list
self.df['claim_num'] = claim_num_list
self.qid = dataset_list
X = self.df[np.setdiff1d(self.df.columns, ['patent_id','Target'])]
y = self.df.Target
dump_svmlight_file(X,y,'test.dat',zero_based=False, query_id=self.qid,multilabel=False)
The output file "test.dat" looks like this:
But the real data looks like this:
I got the wrong indices.
Take the first instance as an example: the value of column 1 is 7, the values of columns 2~4 are zero, and the value of column 5 is 2.
So my expected result looks like this:
1 qid:1 1:7 5:2
But the column indices in the output file are totally wrong,
and unfortunately I cannot figure out where the problem occurs.
I have been unable to fix this problem for a long time.
Thank you for your help!

I changed the data structure and used np.array to produce array-like input.
Finally, it succeeded!
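One thing that may be worth checking in the question's code, offered as a possibility rather than a confirmed diagnosis: np.setdiff1d returns its result sorted, so the feature columns it selects come back in alphabetical order rather than in DataFrame order. The sketch below uses invented values (only the column names come from the question) to show the difference, and one way to build array input that keeps the original column order.

```python
import numpy as np
import pandas as pd

# Invented values; only the column names are taken from the question.
df = pd.DataFrame({
    "patent_id": [101, 102],
    "Target": [1, 0],
    "backward_citation": [7, 3],
    "uspc_originality": [0.0, 0.5],
    "science_linkage": [0, 1],
    "sim_bc_structure": [0.0, 0.2],
    "claim_num": [2, 9],
})
excluded = ["patent_id", "Target"]

# np.setdiff1d sorts its output, which reorders the feature columns.
sorted_cols = np.setdiff1d(list(df.columns), excluded).tolist()

# A list comprehension keeps the columns in their DataFrame order.
ordered_cols = [c for c in df.columns if c not in excluded]

# Plain arrays in the intended order, e.g. as input for dump_svmlight_file.
X = df[ordered_cols].to_numpy()
y = df["Target"].to_numpy()
print(sorted_cols)
print(ordered_cols)
```

With the ordered selection, the first made-up row has 7 in feature 1 and 2 in feature 5, which matches the layout the question expected.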

If you're interested in loading into a numpy array, try:
X = clicks_train[:,0:2]
y = clicks_train[:,2]
where 2 is the index of the target column
