statsmodels summary_col getting a LaTeX KeyError? - python

I've been getting a KeyError when using the summary_col function:
import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col
Y = [0,1,0,0,1,1,1]
X = [5,10,15,20,25,2,7]
logit = sm.Logit(Y,X)
fit = logit.fit()
print(fit.summary())
logit_output = summary_col([fit],stars=True)
print(logit_output.as_latex())
This gets me "KeyError: '\m'". Surprisingly, fit.summary().as_latex() does not raise this error.

I was reading the code a bit, and I think you are triggering a bug in statsmodels.
Here is my rough explanation. The function summary_col returns an object of the Summary class and sets _merge_latex = True. In .as_latex(), the following if-clause is then entered (this is the relevant code from the statsmodels source):
if self._merge_latex:
    # create single tabular object for summary_col
    tab = re.sub(to_replace, r'\\midrule\n', tab)
If you call fit.summary().as_latex(), then _merge_latex is False by default, so you never enter this branch and don't get the same error.
Right now I am not sure what exactly is wrong. I can think of two possibilities:
re.sub() is only called once, and there is a leftover of the pattern that was supposed to be replaced.
r'\\midrule\n' is wrong in this line, and it should be '\\midrule\n' instead.
To narrow this down, one would have to build a minimal example. To check whether I am on the right track, add
logit_output._merge_latex = False
before
print(logit_output.as_latex())
and rerun your code to see whether the error changes. Note that this may generate output you don't want.
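Putting the workaround together with the original example gives the following minimal sketch (my own assembly of the snippets above; note that disabling _merge_latex means you lose the single merged tabular that summary_col normally produces):
import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col

Y = [0, 1, 0, 0, 1, 1, 1]
X = [5, 10, 15, 20, 25, 2, 7]

fit = sm.Logit(Y, X).fit()
logit_output = summary_col([fit], stars=True)
# workaround: skip the buggy re.sub() branch in as_latex()
logit_output._merge_latex = False
print(logit_output.as_latex())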


"AssertionError: Cannot handle batch sizes > 1 if no padding token is > defined" and pad_token = eos_token

I am trying to finetune a pre-trained GPT2-model. When applying the respective tokenizer, I originally got the error message:
Using pad_token, but it is not set yet.
Thus, I changed my code to:
GPT2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
GPT2_tokenizer.pad_token = GPT2_tokenizer.eos_token
When calling the trainer.train() later, I end up with the following error:
AssertionError: Cannot handle batch sizes > 1 if no padding token is defined.
Since I specifically defined the pad_token above, I expect these errors (or rather my fix of the original error and this new error) to be related, although I could be wrong. Is it a known problem that eos_token and pad_token somehow interfere? Is there an easy workaround?
Thanks a lot!
I've been running into a similar problem, producing the same error message you were receiving. I can't be sure if your problem and my problem were caused by the same issue, since I can't see your full stack trace, but I'll post my solution in case it can help you or someone else who comes along.
You were totally correct to fix the first issue you described with your tokenizer by setting its pad token with the code provided. However, I also had to set the pad_token_id of my model's configuration to get my GPT2 model to function properly. I did this in the following way:
from transformers import GPT2Config, GPT2Tokenizer, GPT2ForSequenceClassification

# instantiate the configuration for your model; this can be imported from transformers
configuration = GPT2Config()
# set up your tokenizer, just like you described, and set the pad token
GPT2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
GPT2_tokenizer.pad_token = GPT2_tokenizer.eos_token
# instantiate the model; from_pretrained is a classmethod, so call it on the
# class itself (model_name and device are assumed to be defined elsewhere)
model = GPT2ForSequenceClassification.from_pretrained(model_name, config=configuration).to(device)
# set the pad token of the model's configuration
model.config.pad_token_id = model.config.eos_token_id
I suppose this is because the tokenizer and the model function separately, and both need knowledge of the ID being used for the pad token. I can't tell if this will fix your problem (since this post is 6 months old, it may not matter anyway), but hopefully my answer may be able to help someone else.
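As a quick sanity check (my addition, not part of the original answer), you can confirm that the tokenizer and the model now agree on the padding ID before calling trainer.train():
# both lines should print the same ID (50256, the eos token, for the stock "gpt2" checkpoint)
print(GPT2_tokenizer.pad_token_id)
print(model.config.pad_token_id)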

Printing inside jupyter notebook custom loss function with Keras/TF

In Keras, if you write a custom loss function in a Jupyter notebook, you cannot print anything. For instance, if you have:
def loss_func(true_label, NN_output):
    true_cat = true_label[:,0]
    pred_cat = NN_output[:,0]
    indicator = NN_output[:,1]
    print("Hi!")
    custom_term = K.mean(K.abs(indicator))
    return binary_crossentropy(true_cat, pred_cat) + custom_term
Nothing will print when the function is evaluated.
As a workaround when debugging, I have found that I can write to a file from inside the cost function, which is useful when I want to print something standard like an int or a string.
However, trying to write out a tensor like indicator to a file gives the unbelievably helpful output:
Tensor("loss_103/model_105_loss/Print:0", shape=(512,), dtype=float32)
I know TF provides a tf.Print() method to print the value of a tensor, but I don't understand how that interacts with Jupyter. Other answers have said that tf.Print() writes to stderr, which means trying
sys.stderr = open('test.txt', 'w')
should theoretically allow me to get my output from a file, but unfortunately this doesn't work (at least in Jupyter).
Is there any general method to get a representation of my tensor as a string? How do people generally get around this barrier to seeing what their code does? If I come up with something fancier than computing a mean, I want to see exactly what's going on at each step of my calculation to verify that it works as intended.
Thanks!
You can do something like the code below:
def loss_func(true_label, NN_output):
    true_cat = true_label[:,0]
    true_cat = tf.Print(true_cat, [true_cat], message="true_cat: ")  # added line
    pred_cat = NN_output[:,0]
    pred_cat = tf.Print(pred_cat, [pred_cat], message="pred_cat: ")  # added line
    indicator = NN_output[:,1]
    custom_term = K.mean(K.abs(indicator))
    return binary_crossentropy(true_cat, pred_cat) + custom_term
Basically, I have added two lines to print the values of true_cat and pred_cat. To print something, you have to include the print operation in the TF graph, which is what the added lines do: tf.Print is an identity op that prints the given tensors as a side effect whenever it is evaluated. The catch, however, is that the output goes to the console where you launched the notebook server, not to the notebook itself.
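A Keras-level alternative (my addition, not part of the original answer) is keras.backend.print_tensor, which wraps the same mechanism and carries the same console-vs-notebook caveat. A minimal sketch using the loss from the question:
from keras import backend as K
from keras.losses import binary_crossentropy

def loss_func(true_label, NN_output):
    # print_tensor returns an identity of its input that prints when evaluated
    true_cat = K.print_tensor(true_label[:,0], message="true_cat: ")
    pred_cat = K.print_tensor(NN_output[:,0], message="pred_cat: ")
    indicator = NN_output[:,1]
    custom_term = K.mean(K.abs(indicator))
    return binary_crossentropy(true_cat, pred_cat) + custom_term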
References:
How to print the value of a Tensor object in TensorFlow?
Printing the loss during TensorFlow training
https://www.tensorflow.org/api_docs/python/tf/Print

Maxscript Python addModifier

I'm writing MaxScript in Python, and the following code throws a type error:
import MaxPlus
res = MaxPlus.Core.GetRootNode()
# As an example, I just use the first child.
child = MaxPlus.INode.GetChild(res,0)
morpherFP = MaxPlus.FPValue()
MaxPlus.Core.EvalMAXScript("Morpher()", morpherFP)
morpher = MaxPlus.FPValue.Get(morpherFP)
MaxPlus.INode.AddModifier(child, morpher)
And from the MaxScript Listener I always receive the following error:
type 'exceptions.TypeError' in method 'INode_AddModifier', argument 2 of type 'Autodesk::Max::Modifier'
while the type of morpher is Animatable(Morpher) and Animatable is a subclass of Modifier. Could someone help me with this?
Thank you in advance
I think I found a possible solution (the only thing I know for certain is that the MaxScript Listener doesn't throw an error):
import MaxPlus

res = MaxPlus.Core.GetRootNode()
# I use the first child as an example
child = MaxPlus.INode.GetChild(res, 0)
morpher = MaxPlus.Factory.CreateObjectModifier(MaxPlus.ClassIds.Morpher)
MaxPlus.INode.AddModifier(child, morpher)
# the following also seems to work, i.e. it does not throw any errors
child.InsertModifier(morpher, 1)
Let me know if this is not correct, or if there is an easier or more understandable way.
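For reuse, the working approach can be wrapped in a small helper (a sketch of my own, built only from the calls above; going through MaxPlus.Factory avoids the FPValue round-trip that produced the Animatable/Modifier type mismatch):
import MaxPlus

def add_morpher(node):
    # CreateObjectModifier returns a real Modifier, which AddModifier accepts
    morpher = MaxPlus.Factory.CreateObjectModifier(MaxPlus.ClassIds.Morpher)
    node.AddModifier(morpher)
    return morpher

root = MaxPlus.Core.GetRootNode()
morpher = add_morpher(root.GetChild(0))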

The maxscript function "WM3_MC_BuildFromNode" in Python

I used the following code, based on the information given on help.autodesk.com for executing MaxScript in Python:
import MaxPlus
test = MaxPlus.FPValue()
# The target node has only one morpher, and I want to retrieve it using .morpher[1]
bool = MaxPlus.Core.EvalMAXScript("WM3_MC_BuildFromNode $node.morpher[1] 1 $target", test)
print bool
If I print the boolean, it always prints "false". However, the following code works (i.e. the print statement shows "true"):
import MaxPlus
test = MaxPlus.FPValue()
# The target node has only one morpher
bool = MaxPlus.Core.EvalMAXScript("WM3_MC_BuildFromNode $node.morpher 1 $target", test)
print bool
However, I cannot use the latter code, since in my case a node may have multiple morphers.
Is there a better way to do this using the Python API for MaxScript (I didn't find a method), or can anyone suggest how the first snippet can be improved?
Thank you
The solution to my problem is:
MaxPlus.Core.EvalMAXScript("WM3_MC_BuildFromNode (for mod in $node.modifiers where isKindOf mod Morpher collect mod)[1] 3 $target")
This solution was found by Swordslayer on the Autodesk forum for 3ds Max.
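To target an arbitrary morpher rather than the first one, the index can be spliced into the MaxScript string from Python. This is my own sketch, reusing only EvalMAXScript and FPValue from above; it assumes $node and $target refer to the same scene nodes as in the question (MaxScript arrays are 1-based):
import MaxPlus

def build_from_morpher(index, channel):
    script = ("WM3_MC_BuildFromNode "
              "(for mod in $node.modifiers where isKindOf mod Morpher collect mod)[%d] "
              "%d $target" % (index, channel))
    result = MaxPlus.FPValue()
    # returns True if the MaxScript expression evaluated successfully
    return MaxPlus.Core.EvalMAXScript(script, result)

print build_from_morpher(1, 3)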

Python statsmodels trouble getting fitted model parameters

I'm using an AR model to fit my data, and I think I have done that successfully, but now I want to see what the fitted model parameters actually are, and I am running into some trouble. Here is my code:
model = ar.AR(df['price'], freq='M')
ar_res = model.fit(maxlags=50, ic='bic')
which runs without any error. However, when I try to print the model parameters with the following code
print ar_res.params
I get the error
AssertionError: Index length did not match values
I am unable to reproduce this with current master.
import statsmodels.api as sm
from pandas.util import testing

df = testing.makeTimeDataFrame()
mod = sm.tsa.AR(df['A'])
res = mod.fit(maxlags=10, ic='bic')
res.params  # displays the fitted coefficients without an AssertionError
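If you still see the error with code like this, it is worth checking which library versions are installed; a mismatch of this kind between an index and its values is typically a pandas/statsmodels version incompatibility rather than a modeling problem (my assumption, not confirmed in the thread):
import pandas
import statsmodels

# if these are old or mismatched, upgrading both packages is the first thing to try
print(statsmodels.__version__)
print(pandas.__version__)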
