Pandas df.to_latex() output gets truncated - python

Problem: I tried to export a pandas.DataFrame to LaTeX using .to_latex().
However, the output for long values (in my case long strings) gets truncated.
Steps to reproduce:
import pandas as pd
df = pd.DataFrame(['veryLongString' * i for i in range(1, 5)], dtype='string')
print(df.to_latex())
Output:
\begin{tabular}{ll}
\toprule
{} & 0 \\
\midrule
0 & veryLongString \\
1 & veryLongStringveryLongString \\
2 & veryLongStringveryLongStringveryLongString \\
3 & veryLongStringveryLongStringveryLongStringvery... \\
\bottomrule
\end{tabular}
As you can see, the last row gets truncated (with ...).
I already tried the col_space parameter, but it does not change the behavior as expected.
It simply shifts the table cells, as follows:
\begin{tabular}{ll}
\toprule
{} & 0 \\
\midrule
0 & veryLongString \\
1 & veryLongStringveryLongString \\
2 & veryLongStringveryLongStringveryLongString \\
3 & veryLongStringveryLongStringveryLongStringvery... \\
\bottomrule
\end{tabular}
How do I get the full content of the DataFrame exported to LaTeX?

You can use the option_context context manager in a with statement to temporarily change the max column width:
with pd.option_context("display.max_colwidth", 1000):
    print(df.to_latex())
Output:
\begin{tabular}{ll}
\toprule
{} & 0 \\
\midrule
0 & veryLongString \\
1 & veryLongStringveryLongString \\
2 & veryLongStringveryLongStringveryLongString \\
3 & veryLongStringveryLongStringveryLongStringveryLongString \\
\bottomrule
\end{tabular}
This behaviour is also described here.

After spending some time trying out other parameters from to_latex() as well as other export options, e.g., to_csv(), I was sure that this is not a problem of to_latex().
I found the solution in the pandas documentation: the display.max_colwidth option limits the column width.
So the solution is to set this option to None, which removes the limit on the output (globally).
pd.set_option('display.max_colwidth', None)
Source: https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html
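The difference in scope between the two answers above can be sketched as follows. This is a minimal illustration, assuming a pandas version in which to_latex() honors display.max_colwidth (as in the question); the helper name to_latex_untruncated is hypothetical and not part of pandas.

```python
import pandas as pd

# Remember the session-wide setting before touching anything.
width_before = pd.get_option("display.max_colwidth")

# option_context changes the option only inside the with block.
with pd.option_context("display.max_colwidth", None):
    width_inside = pd.get_option("display.max_colwidth")  # None means "no limit"

# Leaving the block restores the previous value automatically.
width_after = pd.get_option("display.max_colwidth")


def to_latex_untruncated(df: pd.DataFrame) -> str:
    """Hypothetical helper: export df without changing the global option."""
    with pd.option_context("display.max_colwidth", None):
        return df.to_latex()
```

By contrast, pd.set_option('display.max_colwidth', None) stays in effect for the rest of the session, until pd.reset_option('display.max_colwidth') is called.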

Related

Transposed dataframe to LaTeX

I am not able to change the number format in the LaTeX output of the library Pandas.
Consider this example:
import pandas as pd
values = [ { "id":"id1", "c1":1e-10, "c2":int(1000) }]
df = pd.DataFrame.from_dict(values).set_index("id")
print(df)
with output:
c1 c2
id
id1 1.000000e-10 1000
Let's say that I want c1 formatted with two decimal places, c2 as an integer:
s = df.style
s.clear()
s.format({ "c1":"{:.2f}", "c2":"{:d}" })
print(s.to_latex())
with output:
\begin{tabular}{lrr}
& c1 & c2 \\
id & & \\
id1 & 0.00 & 1000 \\
\end{tabular}
However, I do not need a LaTeX table for df but for df.T.
Question: since I can specify the styles only for the columns (at least it seems so in the docs), how can I specify the row-based output format for df.T?
If I simply write this:
dft = df.T
s2 = dft.style
# s2.clear() # nothing changes with this instruction
print(s2.to_latex())
it is even worse, as I get:
\begin{tabular}{lr}
id & id1 \\
c1 & 0.000000 \\
c2 & 1000.000000 \\
\end{tabular}
where even the integer (the one with value int(1000)) became a float using the default style/format.
I played with the subset parameter and various slices with no success.
As a workaround, you can apply the correct format to each value before transposing:
fmt = { "c1":"{:.2f}", "c2":"{:d}" }
df_print = df.apply(lambda x: [fmt[x.name].format(v) for v in x])
print(df_print.T.style.to_latex())
Output:
\begin{tabular}{ll}
{id} & {id1} \\
c1 & 0.00 \\
c2 & 1000 \\
\end{tabular}
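Why the integer turns into a float after transposing, and why formatting before transposing avoids it, can be checked with a short sketch. It reuses the question's data; the names fmt and formatted are only illustrative, and the dtype reported for the formatted strings may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame([{"id": "id1", "c1": 1e-10, "c2": 1000}]).set_index("id")

# Each column has its own dtype, but after transposing, the single column
# "id1" has to hold both values, so pandas upcasts the integer to float64.
transposed_dtype = df.T["id1"].dtype

# Formatting column by column first turns every value into a string,
# so nothing is left to upcast when the frame is transposed afterwards.
fmt = {"c1": "{:.2f}", "c2": "{:d}"}
formatted = df.apply(lambda col: col.map(fmt[col.name].format))

print(df.T)
print(formatted.T)
```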

Converting a string from long to short based on logic in python

I have multiple long values like the following in a column of a pandas DataFrame (an example):
((Type=Food & Value1=Fruit & Value2=Apple) or (Type=Food & Value1=Fruit & Value2=Banana) or (Type=Food & Value1=Vegetable & Value2=Carrot) or (Type=Food & Value1=Vegetable & Value2=Tomato))
I want to convert it to -
((Type=Food & Value1=Fruit & Value2 = Apple|Banana) or (Type=Food & Value1=Vegetable & Value2= Carrot|Tomato))
How can I do it? I could not find anything that helps with this.
((Type=Food & Value1=Fruit & Value2 = Apple|Banana) =>
((Type=Food & Value1=Fruit & ((Value2 = Apple) or (Value2 = Banana))
Is this helpful?
OK, I think you need something like this:
fruits = ['banana','apple']
print('banana' in fruits)
print('value' in fruits)
output
True
False
for your case:
((Type=Food & Value1=Fruit & Value2 in [Apple,Banana])
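The conversion asked for in the question can be sketched in plain Python. This is only one possible approach, under the assumptions that every inner clause is a flat list of key=value pairs joined by "&" and that clauses should be merged when they differ only in the value of their last key; the function name condense is hypothetical, and spacing around "=" is normalized in the result.

```python
import re


def condense(expr: str) -> str:
    """Merge clauses that differ only in the value of their last key."""
    # Each innermost "(...)" group is one clause.
    clauses = re.findall(r"\(([^()]+)\)", expr)

    groups = {}  # (leading pairs, last key) -> list of last values
    for clause in clauses:
        pairs = [
            tuple(part.strip() for part in item.split("=", 1))
            for item in clause.split("&")
        ]
        *prefix, (last_key, last_value) = pairs
        groups.setdefault((tuple(prefix), last_key), []).append(last_value)

    parts = []
    for (prefix, last_key), values in groups.items():
        items = [f"{key}={value}" for key, value in prefix]
        items.append(f"{last_key}={'|'.join(values)}")
        parts.append("(" + " & ".join(items) + ")")
    return "(" + " or ".join(parts) + ")"


example = (
    "((Type=Food & Value1=Fruit & Value2=Apple) or "
    "(Type=Food & Value1=Fruit & Value2=Banana) or "
    "(Type=Food & Value1=Vegetable & Value2=Carrot) or "
    "(Type=Food & Value1=Vegetable & Value2=Tomato))"
)
print(condense(example))
```

For a whole column, the same function could be applied with something like df["rule"].map(condense), where "rule" is a placeholder column name.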

Convert statsmodel table to latex style .png with python

I think I am quite close to solving a problem a lot of statsmodels users have had in the past. However, my nonexistent LaTeX experience is slowing down my progress by a lot. :D
Let's jump directly into the problem:
In the following I am using the example dataset from https://www.statsmodels.org/stable/regression.html.
The output is therefore the following regression table:
OLS Regression Results
I have tried several packages to convert the output to a latex style .png. The most promising method seems to be to use the sympy.printing.preview.preview function:
In:
from sympy.printing.preview import preview
preamble = "\\documentclass[12pt]{article}\n" \
"\\usepackage{booktabs,amsmath,amsfonts}\\begin{document}"
preview(res.summary().as_latex(), output='png', filename='output.png', preamble=preamble)
Out:
Latex style .png 1
As you can see, the second and third table of the regression as well as the warning below the table are not in line with the first table.
To fix this, I have tried to use the tabularx package as you can see in the following:
preamble = "\\documentclass[12pt]{article}\n" \
"\\usepackage{booktabs,amsmath,amsfonts, tabularx}\\begin{document}"
latex = (res.summary().as_latex()
         .replace("\\begin{tabular}", "\\begin{tabularx}{\\textwidth}")
         .replace("\\end{tabular}", "\\end{tabularx}"))
preview(latex, output='png', filename='output.png', preamble=preamble)
Output:
Latex style .png 2
The horizontal lines of tables two and three are now aligned with those of table one; however, the contents of the tables as well as the warning aren't.
Does anybody know how to continue from here? How can I get the original output table in a latex style saved as a .png with python?
Thank you guys in advance!
Edit1:
Since res.summary().as_latex() outputs the code shown at the end of the post, I have managed to align the tables more or less manually with:
preview(res.summary().as_latex().replace("\\begin{tabular}", "\\begin{tabularx}{20cm}").replace("\\end{tabular}", "\\end{tabularx}").replace("{lclc}", "{>{\\hsize=.3\\hsize}X >{\\hsize=.2\\hsize}X >{\\hsize=.3\\hsize}X >{\\hsize=.2\\hsize}X}").replace("{lcccccc}", "{XXXXXX p{3.6cm}}"), output='png', filename='output.png', preamble=preamble)
The new output looks like that. This shouldn't and can't be the final solution, so I would like to extend the question: does anybody know a more dynamic (non-manual) way to do this?
Full latex code of the three tables:
\begin{center}
\begin{tabular}{lclc}
\toprule
\textbf{Dep. Variable:} & y & \textbf{ R-squared: } & 0.416 \\
\textbf{Model:} & OLS & \textbf{ Adj. R-squared: } & 0.353 \\
\textbf{Method:} & Least Squares & \textbf{ F-statistic: } & 6.646 \\
\textbf{Date:} & Thu, 04 Feb 2021 & \textbf{ Prob (F-statistic):} & 0.00157 \\
\textbf{Time:} & 18:38:15 & \textbf{ Log-Likelihood: } & -12.978 \\
\textbf{No. Observations:} & 32 & \textbf{ AIC: } & 33.96 \\
\textbf{Df Residuals:} & 28 & \textbf{ BIC: } & 39.82 \\
\textbf{Df Model:} & 3 & \textbf{ } & \\
\bottomrule
\end{tabular}
\begin{tabular}{lcccccc}
& \textbf{coef} & \textbf{std err} & \textbf{t} & \textbf{P$> |$t$|$} & \textbf{[0.025} & \textbf{0.975]} \\
\midrule
\textbf{x1} & 0.4639 & 0.162 & 2.864 & 0.008 & 0.132 & 0.796 \\
\textbf{x2} & 0.0105 & 0.019 & 0.539 & 0.594 & -0.029 & 0.050 \\
\textbf{x3} & 0.3786 & 0.139 & 2.720 & 0.011 & 0.093 & 0.664 \\
\textbf{const} & -1.4980 & 0.524 & -2.859 & 0.008 & -2.571 & -0.425 \\
\bottomrule
\end{tabular}
\begin{tabular}{lclc}
\textbf{Omnibus:} & 0.176 & \textbf{ Durbin-Watson: } & 2.346 \\
\textbf{Prob(Omnibus):} & 0.916 & \textbf{ Jarque-Bera (JB): } & 0.167 \\
\textbf{Skew:} & 0.141 & \textbf{ Prob(JB): } & 0.920 \\
\textbf{Kurtosis:} & 2.786 & \textbf{ Cond. No. } & 176. \\
\bottomrule
\end{tabular}
%\caption{OLS Regression Results}
\end{center}
Notes: \newline
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

How to extract data from LaTeX table using Python

I have a table written in LaTeX in a .tex file:
\begin{tabular}{cccccccc}
\hline
\hline
$ \beta$ & $T\times L^3$ & $am_{ud}^\bare$ & $am_s^\bare$ & $Z_S (am_{ud})$ & $a\Mss$ & $a M_\pi$ & $aF_\pi/Z_A$ \\ \hline
%48\_24\_3.5\_ud-0.041\_s-0.006
& $48\times 24^3$ & -0.041 & -0.006
& 0.01475(33) & 0.3415(5)(2) & 0.19188(50)(6) & 0.05491(34)(0) \\
%48\_24\_3.5\_ud-0.0437\_s-0.006
& $48\times 24^3$ & -0.0437 & -0.006
& 0.01188(27) & 0.3396(5)(2) & 0.17238(49)(3) & 0.05263(34)(0) \\
%64\_24\_3.5\_ud-0.041\_s-0.012
& $64 \times 24^3$ & -0.041 & -0.012
& 0.01428(33) & 0.3175(95)(4) & 0.18790(90)(30) & 0.05384(84)(6) \\
%64\_32\_3.5\_ud-0.0463\_s-0.012
& $64 \times 32^3$ & -0.0463 & -0.012
& 0.00853(20) & 0.3134(10)(7) & 0.14440(70)(60) & 0.05004(62)(6) \\
%64\_32\_3.5\_ud-0.048\_s-0.0023
3.5 & $64 \times 32^3$ & -0.048 & -0.0023
& 0.00726(17) & 0.3496(75)(5) & 0.13480(70)(20) & 0.04982(59)(1) \\
%64\_32\_3.5\_ud-0.049\_s-0.006
& $64 \times 32^3$ & -0.049 & -0.006
& 0.00579(15) & 0.3339(10)(5) & 0.12100(9)(3) & 0.04837(84)(3) \\
%64\_32\_3.5\_ud-0.049\_s-0.012
& $64 \times 32^3$ & -0.049 & -0.012
& 0.00560(14) & 0.3103(69)(9) & 0.11733(64)(3) & 0.04800(68)(2) \\
%64\_48\_3.5\_ud-0.0515\_s-0.012
& $64 \times 48^3$ & -0.0515 & -0.012
& 0.00288(7) & 0.3079(9)(1) & 0.08410(60)(20) & 0.04628(58)(3) \\
%64\_64\_3.5\_ud-0.05294\_s-0.006
& $64 \times 64^3$ & -0.05294 & -0.006
& 0.00149(5) & 0.3281(9)(5) & 0.06126(60)(9) & 0.04440(75)(6) \\
\hline
%48\_32\_3.61\_ud-0.028\_s0.0045
& $48 \times 32^3$ & -0.028 & 0.0045
& 0.01008(23) & 0.2955(6)(3) & 0.14852(49)(2) & 0.04408(34)(2) \\
%48\_32\_3.61\_ud-0.03\_s0.0045
& $48 \times 32^3$ & -0.03 & 0.0045
& 0.00808(18) & 0.2929(7)(3) & 0.13217(50)(9) & 0.04262(39)(1) \\
%48\_32\_3.61\_ud-0.03\_s-0.0042
& $48 \times 32^3$ & -0.03 & -0.0042
& 0.00783(18) & 0.2602(7)(2) & 0.12943(59)(4) & 0.04207(39)(1) \\
%48\_48\_3.61\_ud-0.03121\_s0.0045
3.61 & $48 \times 48^3$ & -0.03121 & 0.0045
\end{tabular}
I obviously only want the numbers but I'm having trouble even getting Python to read the lines. If I for example define:
file=open('lattice-data.tex','r')
and try file.read() or file.readline() I only get '' in return.
Extract the information using regular expressions (regex). You can read about how to use regex in Python at W3Schools, for example.
At the core, you should look for the column delimiter, the ampersand '&', and the row terminator, the double backslash '\\'. After prying the table apart, you can 'decode' each entry with a regex for each type of data. (I cannot see a clear pattern in the unformatted source.)
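The splitting described above can be sketched as follows. This is a minimal illustration, not a general LaTeX parser: it assumes a single tabular environment, treats every unescaped % as the start of a comment, and the function names parse_latex_table and central_value are hypothetical. The sample is a shortened stand-in for the table in the question; for a real file, the text could be read with open('lattice-data.tex').read() inside a with statement.

```python
import re
from typing import Optional


def parse_latex_table(tex: str) -> list:
    """Split the body of a tabular environment into rows of cell strings."""
    match = re.search(
        r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", tex, re.DOTALL
    )
    body = match.group(1) if match else tex
    body = re.sub(r"(?<!\\)%.*", "", body)  # drop comments up to end of line
    body = body.replace("\\hline", "")      # drop horizontal rules
    rows = []
    for raw_row in body.split("\\\\"):      # rows end with a double backslash
        cells = [cell.strip() for cell in raw_row.split("&")]
        if any(cells):                      # skip rows with no content at all
            rows.append(cells)
    return rows


def central_value(cell: str) -> Optional[float]:
    """Return the number in front of '(..)' error groups, or None."""
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)(?:\(\d+\))*", cell.strip())
    return float(match.group(1)) if match else None


sample = r"""
\begin{tabular}{cccc}
\hline
$\beta$ & $T\times L^3$ & $am_{ud}$ & $aM_\pi$ \\ \hline
%48\_24\_3.5\_ud-0.041
& $48\times 24^3$ & -0.041
& 0.19188(50)(6) \\
3.5 & $64 \times 32^3$ & -0.048 & 0.13480(70)(20) \\
\end{tabular}
"""
rows = parse_latex_table(sample)
for row in rows[1:]:
    print([central_value(cell) for cell in row])
```

Cells that are empty or contain math such as $48\times 24^3$ come back as None from central_value, so they can be handled separately.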

Statsmodel summary_col latex format error

I am using statsmodels' summary_col to give me an output table which summarizes the output of two regressions.
The code for this is
res3 = summary_col([res1,res2],stars=True,float_format='%0.2f',
info_dict={'R2':lambda x: "{:.2f}".format(x.rsquared)})
f = open('res3.tex', 'w')
f.write(res3.as_latex())
f.close()
I use the res3.tex file as input for another tex file which then generates the results. The problem arises when I convert the table to LaTeX format using as_latex(). The table header shifts to the side in the tex file and looks like this.
The res3.tex file has the following latex code
\begin{table}
\caption{}
\begin{center}
\begin{tabular}{lcc}
\hline
& investment I & investment II \\
\hline
\hline
\end{tabular}
\begin{tabular}{lll}
GDP & 1.35*** & 1.19*** \\
& (0.24) & (0.23) \\
bsent & 0.28*** & 0.26*** \\
& (0.06) & (0.06) \\
rate & -0.22* & -0.65*** \\
& (0.13) & (0.19) \\
research & & 0.80*** \\
& & (0.27) \\
R2 & 0.76 & 0.80 \\
\hline
\end{tabular}
\end{center}\end{table}
The problem seems to arise due to multiple tabular environments. Is there a way to get the investment header on top of the table without manually changing the res3 file (intermediary file)?
You first have to add a couple of things to the LaTeX preamble.
You can probably find an answer to your problem here: "Outputting Regressions as Table in Python (similar to outreg in stata)?".
res3 = summary_col([res1,res2],stars=True,float_format='%0.2f',
info_dict={'R2':lambda x: "{:.2f}".format(x.rsquared)})
beginningtex = """\\documentclass{report}
\\usepackage{booktabs}
\\begin{document}"""
endtex = "\\end{document}"
f = open('myreg.tex', 'w')
f.write(beginningtex)
f.write(res3.as_latex())
f.write(endtex)
f.close()
All credit to @BKay
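The wrapping step from the answer above can also be written as a small reusable function. This is only a sketch: the name wrap_in_document and the stand-in fragment are hypothetical, and in the answer's setting the fragment would be the string returned by res3.as_latex().

```python
def wrap_in_document(table_tex: str, packages=("booktabs",)) -> str:
    """Wrap a LaTeX fragment in a minimal standalone document."""
    lines = ["\\documentclass{report}"]
    lines += [f"\\usepackage{{{name}}}" for name in packages]
    lines += ["\\begin{document}", table_tex, "\\end{document}"]
    return "\n".join(lines) + "\n"


# Stand-in fragment used only for illustration.
fragment = "\\begin{tabular}{lc}\nGDP & 1.35 \\\\\n\\end{tabular}"
document = wrap_in_document(fragment)
print(document)
```

Writing the result inside a with open('myreg.tex', 'w') block closes the file even if an error occurs, which the explicit f.close() call does not guarantee.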
