Python Tabulate: format only one float column

I'm using the tabulate module to print a fixed-width file, and I have one column that I need formatted so that there are 19 places to the left of the decimal point and 2 places to the right of it.
import pandas as pd
from tabulate import tabulate
df = pd.DataFrame.from_dict({'A':['x','y','z'],
'B':[1,1.1,11.21],'C':[34.2334,81.1,11]})
df
Out[4]:
A B C
0 x 1.00 34.2334
1 y 1.10 81.1000
2 z 11.21 11.0000
df['C'] = df['C'].apply(lambda x: format(x,'0>22.2f'))
df
Out[6]:
A B C
0 x 1.00 0000000000000000034.23
1 y 1.10 0000000000000000081.10
2 z 11.21 0000000000000000011.00
print(tabulate(df))
- - ----- -----
0 x 1 34.23
1 y 1.1 81.1
2 z 11.21 11
- - ----- -----
Is there any way I can preserve the formatting in column C without affecting the formatting in column B? I know I could use floatfmt='0>22.2f', but I don't want column B to look that way, only column C.
According to the tabulate documentation, strings that look like decimals are automatically converted to numbers. If I could suppress this and then format my table before printing (as in the example above), that would solve it for me as well.

The documentation at GitHub is more up-to-date and it states that with floatfmt "every column may have different number formatting". Here is an example using your data:
import pandas as pd
from tabulate import tabulate
df = pd.DataFrame.from_dict({'A':['x','yy','zzz'],
'B':[1,1.1,11.21],'C':[34.2334,81.1,11]})
print(tabulate(df, floatfmt=(None, None, '.2f', '0>22.2f',)))
The result is:
- --- ----- ----------------------
0 x 1.00 0000000000000000034.23
1 yy 1.10 0000000000000000081.10
2 zzz 11.21 0000000000000000011.00
- --- ----- ----------------------
Additionally, as you suggested, there is the disable_numparse option, which disables the automatic conversion from strings to numbers. You can then format each field manually, but this requires more coding. The colalign option may come in handy in that case: it lets you specify a different alignment for each column, so strings and numbers (which you would also have converted to formatted strings) can be aligned differently.

Do you absolutely need tabulate for this? You can achieve a similar effect (minus the dashed rules) with:
In [18]: print(df.__repr__().split('\n',1)[1])
0 x 1.00 0000000000000000034.23
1 y 1.10 0000000000000000081.10
2 z 11.21 0000000000000000011.00
df.__repr__() returns the representation of df, i.e. what you see when you just type df in a cell. I then remove the header line by splitting on the first newline character and taking the second half of the split.
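If the goal is just to drop the header line, `DataFrame.to_string(header=False)` does that directly, and its `formatters` argument can format column C without modifying `df`. Unlike `repr`, `to_string` does not truncate long frames by default. A minimal sketch with the question's data:

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'z'],
                   'B': [1, 1.1, 11.21],
                   'C': [34.2334, 81.1, 11]})

# header=False drops the column-name line; formatters applies a callable
# only to column C, leaving A and B on pandas' default formatting
text = df.to_string(header=False, formatters={'C': '{:0>22.2f}'.format})
print(text)
```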
Also, if you are writing it in a machine-readable form, you might want to use tabs (this needs import sys):
In [8]: df.to_csv(sys.stdout, sep='\t', header=False)
0 x 1.0 0000000000000000034.23
1 y 1.1 0000000000000000081.10
2 z 11.21 0000000000000000011.00
How it looks on screen depends on your tab-rendering settings, but if you write it to a file, you get literal tab characters.

Related

Load data from txt

I am loading a txt file containing complex numbers. The data are formatted in this way:
How can I create two separate arrays, one for the real part and one for the imaginary part?
I tried to create a pandas dataframe using e-01 as a separator, but that way I lose this information.
df = pd.read_fwf(r'c:\test\complex.txt', header=None)
df[['real','im']] = df[0].str.extract(r'\(([-.\de]+)([+-]\d\.[\de\-j]+)')
print(df)
0 real im
0 (9.486832980505137680e-01-3.162277660168379412... 9.486832980505137680e-01 -3.162277660168379412e-01j
1 (9.486832980505137680e-01+9.486832980505137680... 9.486832980505137680e-01 +9.486832980505137680e-01j
2 (-9.486832980505137680e-01+9.48683298050513768... -9.486832980505137680e-01 +9.486832980505137680e-01j
3 (-3.162277660168379412e-01+3.16227766016837941... -3.162277660168379412e-01 +3.162277660168379412e-01j
4 (-3.162277660168379412e-01+9.48683298050513768... -3.162277660168379412e-01 +9.486832980505137680e-01j
5 (9.486832980505137680e-01-3.162277660168379412... 9.486832980505137680e-01 -3.162277660168379412e-01j
6 (-3.162277660168379412e-01+3.16227766016837941... -3.162277660168379412e-01 +3.162277660168379412e-01j
7 (9.486832980505137680e-01-9.486832980505137680... 9.486832980505137680e-01 -9.486832980505137680e-01j
8 (9.486832980505137680e-01-9.486832980505137680... 9.486832980505137680e-01 -9.486832980505137680e-01j
9 (-3.162277660168379412e-01+3.16227766016837941... -3.162277660168379412e-01 +3.162277660168379412e-01j
10 (3.162277660168379412e-01-9.486832980505137680... 3.162277660168379412e-01 -9.486832980505137680e-01j
I never knew how annoyingly involved it is to read complex numbers with pandas. This is a slightly different solution from @Алексей's; I prefer to avoid regular expressions when not absolutely necessary.
# Read the file, pandas defaults to string type for contents
df = pd.read_csv('complex.txt', header=None, names=['string'])
# Convert string representation to complex.
# Use of `eval` is ugly but works.
df['complex'] = df['string'].map(eval)
# Alternatively...
#df['complex'] = df['string'].map(lambda c: complex(c.strip('()')))
# Separate real and imaginary parts
df['real'] = df['complex'].map(lambda c: c.real)
df['imag'] = df['complex'].map(lambda c: c.imag)
df
is...
string complex \
0 (9.486832980505137680e-01-3.162277660168379412... 0.948683-0.316228j
1 (9.486832980505137680e-01+9.486832980505137680... 0.948683+0.948683j
2 (-9.486832980505137680e-01+9.48683298050513768... -0.948683+0.000000j
3 (-3.162277660168379412e-01+3.16227766016837941... -0.316228+0.316228j
4 (-3.162277660168379412e-01+9.48683298050513768... -0.316228+0.948683j
5 (9.486832980505137680e-01-3.162277660168379412... 0.948683-0.316228j
6 (3.162277660168379412e-01+3.162277660168379412... 0.316228+0.316228j
7 (9.486832980505137680e-01-9.486832980505137680... 0.948683-0.948683j
real imag
0 0.948683 -3.162278e-01
1 0.948683 9.486833e-01
2 -0.948683 9.486833e-01
3 -0.316228 3.162278e-01
4 -0.316228 9.486833e-01
5 0.948683 -3.162278e-01
6 0.316228 3.162278e-01
7 0.948683 -9.486833e-01
df.dtypes
prints:
string object
complex complex128
real float64
imag float64
dtype: object
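Python's built-in `complex()` accepts strings such as `'(1+2j)'`, surrounding parentheses included, so `eval` is not needed at all. A minimal sketch; the two input lines are reconstructed from the real/imaginary values shown in the first answer's output, and `io.StringIO` stands in for the actual file:

```python
import io

import pandas as pd

# Two lines in the file's "(real±imagj)" layout, reconstructed from the output above
raw = io.StringIO(
    "(9.486832980505137680e-01-3.162277660168379412e-01j)\n"
    "(-3.162277660168379412e-01+9.486832980505137680e-01j)\n"
)
df = pd.read_csv(raw, header=None, names=['string'])

# complex() parses each string directly, parentheses included
z = df['string'].str.strip().map(complex).to_numpy(dtype=complex)

# Separate NumPy arrays for the real and imaginary parts
real, imag = z.real, z.imag
print(real, imag)
```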

Convert the string into a float value

I have copied a table with three columns from a pdf file. I am attaching the screenshot from the PDF here:
The values in the padj column are in exponential notation; however, when you copy from the PDF into Excel and then open the file with pandas, they come through as strings (object dtype). Hence, these values cannot be parsed as floats or numeric values. I need these values as floats, not as strings. Can someone help me with some suggestions?
So far this is what I have tried.
The Excel or CSV file is then opened in Python using the unicode_escape encoding in order to circumvent the UnicodeDecodeError:
## open the file
df = pd.read_csv("S2_GSE184956.csv",header=0,sep=',',encoding='unicode_escape')[["DEGs","LFC","padj"]]
df.head()
DEGs padj LFC
0 JUNB 1.5 ×10-8 -1.273329
1 HOOK2 2.39×10-7 -1.109320
2 EGR1 3.17×10-6 -4.187828
3 DUSP1 3.95×10-6 -3.251030
4 IL6 3.95×10-6 -3.415500
5 ARL4C 5.06×10-6 -2.147519
6 NR4A2 2.94×10-4 -3.001167
7 CCL3L1 4.026×10-4 -5.293694
# Convert the string to float by replacing the x10- with exponential sign
df['padj'] = df['padj'].apply(lambda x: (unidecode(x).replace('x10-','x10-e'))).astype(float)
That threw an error,
ValueError: could not convert string to float: '1.5 x10-e8'
Any suggestions would be appreciated. Thanks
Given the dataframe shared in the question's latest edit, the following, using pandas.Series.str.replace and pandas.Series.astype, will do the job:
df['padj'] = df['padj'].str.replace('×10','e').str.replace(' ', '').astype(float)
The goal is to get the cells to look like the following: 1.500000e-08.
Notes:
Depending on the rest of the dataframe, additional adjustments might still be required, such as removing stray apostrophes (') that might exist in some of the cells. For that, one can use pandas.Series.str.replace as follows:
df['padj'] = df['padj'].str.replace("'", '')
Considering your sample (column padj), the code below should work; it handles both a plain x and the × sign:
f_value = float(str_float.replace('×10', 'e').replace('x10', 'e').replace(' ', ''))
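A more defensive element-wise variant is sketched below: it accepts either the × sign or a plain x, allows whole-number mantissas such as `3×10-6`, and raises a clear error on anything it does not recognise instead of silently producing a wrong value. `parse_times_ten` is a hypothetical helper name, not part of pandas or any other library.

```python
import re

# mantissa, optional spaces, × or x, "10", then a signed exponent
_TIMES_TEN = re.compile(r'\s*([-+]?\d+(?:\.\d+)?)\s*[×x]\s*10\s*([-+]?\d+)\s*')


def parse_times_ten(text):
    """Convert strings like '1.5 ×10-8' or '2.39x10-7' to float (hypothetical helper)."""
    match = _TIMES_TEN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised value: {text!r}")
    mantissa, exponent = match.groups()
    # Rebuild standard scientific notation and let float() do the parsing
    return float(f"{mantissa}e{exponent}")


print(parse_times_ten("1.5 ×10-8"))  # 1.5e-08
```

With a dataframe, it can be applied column-wise with `df['padj'] = df['padj'].map(parse_times_ten)`.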
Updated based on the data you provided above. The most significant point is that the x is actually a multiplication sign (×):
import pandas as pd
DEGs = ["JUNB", "HOOK2", "EGR1", "DUSP1", "IL6", "ARL4C", "NR4A2", "CCL3L1"]
padj = ["1.5 ×10-8", "2.39×10-7", "3.17×10-6", "3.95×10-6", "3.95×10-6", "5.06×10-6", "2.94×10-4", "4.026×10-4"]
LFC = ["-1.273329", "-1.109320", "-4.187828", "-3.251030", "-3.415500", "-2.147519", "-3.001167", "-5.293694"]
df = pd.DataFrame({'DEGs': DEGs, 'padj': padj, 'LFC': LFC})
# change to python-friendly float format
df['padj'] = df['padj'].str.replace(' ×10-', 'e-', regex=False)
df['padj'] = df['padj'].str.replace('×10-', 'e-', regex=False)
# convert padj from string to float
df['padj'] = df['padj'].astype(float)
This leaves padj as a float64 column.
If you want a vectorized numeric solution, you can use:
df['float'] = (df['padj'].str.extract(r'(\d+(?:\.\d+))\s*×10(.?\d+)')
.apply(pd.to_numeric).pipe(lambda d: d[0].mul(10.**d[1]))
)
output:
DEGs padj LFC float
0 JUNB 1.5 ×10-8 -1.273329 1.500000e-08
1 HOOK2 2.39×10-7 -1.109320 2.390000e-07
2 EGR1 3.17×10-6 -4.187828 3.170000e-06
3 DUSP1 3.95×10-6 -3.251030 3.950000e-06
4 IL6 3.95×10-6 -3.415500 3.950000e-06
5 ARL4C 5.06×10-6 -2.147519 5.060000e-06
6 NR4A2 2.94×10-4 -3.001167 2.940000e-04
7 CCL3L1 4.026×10-4 -5.293694 4.026000e-04
Intermediate:
df['padj'].str.extract(r'(\d+(?:\.\d+))\s*×10(.?\d+)')
0 1
0 1.5 -8
1 2.39 -7
2 3.17 -6
3 3.95 -6
4 3.95 -6
5 5.06 -6
6 2.94 -4
7 4.026 -4

How to convert string into datetime?

I'm quite new to Python and I'm encountering a problem.
I have a dataframe where one of the columns is the departure time of flights. These times are given in the following format: 1100.0, 525.0, 1640.0, etc.
This is a pandas Series, which I want to transform into a datetime series such as: S = [11.00, 5.25, 16.40,...]
What I have tried already :
Transforming my objects into strings:
S = [str(x) for x in S]
Using datetime.strptime :
S = [datetime.strptime(x,'%H%M.%S') for x in S]
But since they are not all in the same format, it doesn't work.
Using parser from dateutil :
S = [parser.parse(x) for x in S]
I got the error :
'Unknown string format'
Using pandas' to_datetime:
S = pd.to_datetime(S)
This doesn't give me the expected result.
Thanks for your answers !
Since it's a column within a dataframe (a Series), keeping it that way while transforming should work just fine.
S = [1100.0, 525.0, 1640.0]
se = pd.Series(S) # Your column
# se:
0 1100.0
1 525.0
2 1640.0
dtype: float64
setime = se.astype(int).astype(str).apply(lambda x: x[:-2] + ":" + x[-2:])
This transforms the floats into correctly formatted strings:
0 11:00
1 5:25
2 16:40
dtype: object
And then you can simply do:
df["your_new_col"] = pd.to_datetime(setime)
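Slicing the string breaks for departures before 01:00: a value such as 25.0 (00:25) becomes '25' and then ':25', which is not a valid time string. A sketch that splits HHMM arithmetically with integer division and modulo instead; the 25.0 entry is a hypothetical value added to show that case, and the result holds `datetime.time` objects rather than full datetimes.

```python
import datetime as dt

import pandas as pd

se = pd.Series([1100.0, 525.0, 1640.0, 25.0])  # 25.0 is a hypothetical 00:25 departure

hhmm = se.astype(int)
hours = hhmm // 100    # e.g. 1640 -> 16
minutes = hhmm % 100   # e.g. 1640 -> 40

# One datetime.time per row; int() turns NumPy integers into plain Python ints
times = pd.Series([dt.time(int(h), int(m)) for h, m in zip(hours, minutes)])
print(times)
```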
How about this?
(I added an if statement because some entries have four digits before the decimal point and some have three; the value 125.0 is included to cover that case.)
from datetime import datetime
S = [1100.0, 525.0, 1640.0, 125.0]
for x in S:
    # Pad three-digit times such as 525.0 to 0525.0 so %H%M can parse them
    if str(x).find(".") == 3:
        x = "0" + str(x)
    print(datetime.strftime(datetime.strptime(str(x), "%H%M.%S"), "%H:%M:%S"))
You might give it a go as follows:
# Sample data matching the question
st = ["1100.0", "525.0", "1640.0"]
dfObj = pd.DataFrame(st)
# Casting the string column to float
dfObj_num = dfObj[0].astype(float)
# Getting the hour representation out of the number
df1 = dfObj_num.floordiv(100)
# Getting the minutes
df2 = dfObj_num.mod(100)
# Moving the minutes on the right-hand side of the decimal point
df3 = df2.mul(0.01)
# Combining the two Series
df4 = df1.add(df3)
# At this point df4 can be cast to other types
Result:
0 11.00
1 5.25
2 16.40
You can run this example to verify the steps for yourself; you can also wrap it in a function. Make slight variations if needed to tweak it to your precise requirements.
It might be useful to read this article about pandas Series:
https://www.geeksforgeeks.org/python-pandas-series/
There must be a better way to do this, but this works for me.
df=pd.DataFrame([1100.0, 525.0, 1640.0], columns=['hour'])
df['hour_dt']=((df['hour']/100).apply(str).str.split('.').str[0]+'.'+
df['hour'].apply((lambda x: '{:.2f}'.format(x/100).split('.')[1])).apply(str))
print(df)
hour hour_dt
0 1100.0 11.00
1 525.0 5.25
2 1640.0 16.40

Output of column in Pandas dataframe from float to currency (negative values)

I have the following data frame (consisting of both negative and positive numbers):
df.head()
Out[39]:
Prices
0 -445.0
1 -2058.0
2 -954.0
3 -520.0
4 -730.0
I am trying to change the 'Prices' column to display as currency when I export it to an Excel spreadsheet. The following command I use works well:
df['Prices'] = df['Prices'].map("${:,.0f}".format)
df.head()
Out[42]:
Prices
0 $-445
1 $-2,058
2 $-954
3 $-520
4 $-730
Now my question is: what would I do if I wanted the output to have the negative sign BEFORE the dollar sign? In the output above, the dollar signs are before the negative signs. I am looking for something like this:
-$445
-$2,058
-$954
-$520
-$730
Please note there are also positive numbers as well.
You can use np.where to test whether each value is negative and, if so, put a minus sign in front of the dollar sign, casting the Series to strings with astype:
In [153]:
df['Prices'] = np.where( df['Prices'] < 0, '-$' + df['Prices'].astype(str).str[1:], '$' + df['Prices'].astype(str))
df['Prices']
Out[153]:
0 -$445.0
1 -$2058.0
2 -$954.0
3 -$520.0
4 -$730.0
Name: Prices, dtype: object
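The np.where version drops the thousands separators that the question's `"${:,.0f}"` format produced. One way to keep them is to format the absolute value and prepend the sign yourself. A minimal sketch; `to_currency` is a hypothetical helper name, and the positive 1234.0 is an added sample value.

```python
import pandas as pd


def to_currency(value):
    """Format a number as e.g. -$2,058 or $1,234 (hypothetical helper)."""
    sign = '-' if value < 0 else ''
    # Format the magnitude with thousands separators and no decimals
    return f"{sign}${abs(value):,.0f}"


df = pd.DataFrame({'Prices': [-445.0, -2058.0, -954.0, 1234.0]})
df['Prices'] = df['Prices'].map(to_currency)
print(df)
```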
You can use the locale module and the _override_localeconv dict. It's not well documented, but it's a trick I found in another answer that has helped me before.
import pandas as pd
import locale
locale.setlocale( locale.LC_ALL, 'English_United States.1252')
# Made an assumption with that locale. Adjust as appropriate.
locale._override_localeconv = {'n_sign_posn':1}
# Load dataframe into df
df['Prices'] = df['Prices'].map(locale.currency)
This creates a dataframe that looks like this:
Prices
0 -$445.00
1 -$2058.00
2 -$954.00
3 -$520.00
4 -$730.00

Output different precision by column with pandas.DataFrame.to_csv()?

Question
Is it possible to specify a float precision specifically for each column to be printed by the Python pandas package method pandas.DataFrame.to_csv?
Background
If I have a pandas dataframe that is arranged like this:
In [53]: df_data[:5]
Out[53]:
year month day lats lons vals
0 2012 6 16 81.862745 -29.834254 0.0
1 2012 6 16 81.862745 -29.502762 0.1
2 2012 6 16 81.862745 -29.171271 0.0
3 2012 6 16 81.862745 -28.839779 0.2
4 2012 6 16 81.862745 -28.508287 0.0
There is the float_format option, which can be used to specify a precision, but it applies that precision to all columns of the dataframe when printed.
When I use that like so:
df_data.to_csv(outfile, index=False,
header=False, float_format='%11.6f')
I get the following, where vals is given more decimal places than I want:
2012,6,16, 81.862745, -29.834254, 0.000000
2012,6,16, 81.862745, -29.502762, 0.100000
2012,6,16, 81.862745, -29.171270, 0.000000
2012,6,16, 81.862745, -28.839779, 0.200000
2012,6,16, 81.862745, -28.508287, 0.000000
Change the type of column "vals" prior to exporting the data frame to a CSV file:
df_data['vals'] = df_data['vals'].map(lambda x: '%2.1f' % x)
df_data.to_csv(outfile, index=False, header=False, float_format='%11.6f')
A more modern, str.format version of hknust's first line would be:
df_data['vals'] = df_data['vals'].map(lambda x: '{0:.1}'.format(x))
To print without scientific notation (the general format above renders 0.0 as 0e+00):
df_data['vals'] = df_data['vals'].map(lambda x: '{0:.1f}'.format(x))
This question is a bit old, but I'd like to contribute what I think is a better answer:
formats = {'lats': '{:10.5f}', 'lons': '{:.3E}', 'vals': '{:2.1f}'}
for col, f in formats.items():
df_data[col] = df_data[col].map(lambda x: f.format(x))
I tried the solution here, but it didn't work for me, so I experimented with combining the previous solutions given here with the one from the link above.
You can use the round method on the dataframe before saving it to the file.
df_data = df_data.round(6)
df_data.to_csv('myfile.dat')
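`DataFrame.round` also accepts a dict mapping column names to numbers of decimal places, which gives per-column precision without converting anything to strings. A minimal sketch with two rows shaped like the question's data (the extra input digits are made up to show the rounding). Note that rounding only trims digits: unlike `float_format='%11.6f'`, it does not pad values with zeros or spaces.

```python
import io

import pandas as pd

df_data = pd.DataFrame({
    'year': [2012, 2012], 'month': [6, 6], 'day': [16, 16],
    'lats': [81.862745098, 81.862745098],
    'lons': [-29.834254144, -29.502762431],
    'vals': [0.0, 0.1],
})

# Per-column decimals; columns not listed (year, month, day) are left unchanged
rounded = df_data.round({'lats': 6, 'lons': 6, 'vals': 1})

buf = io.StringIO()
rounded.to_csv(buf, index=False, header=False)
print(buf.getvalue())
```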
You can do this with to_string. It has a formatters argument that takes a dict mapping column names to formatting functions. Then you can use a regular expression to replace the default column separators with your delimiter of choice.
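A sketch of that approach, assuming no value contains whitespace (the regular expression treats every run of spaces as a column break): `to_string` renders each column with its own formatter, and `re.sub` turns the padding into commas. The sample values are taken from the question's first rows.

```python
import re

import pandas as pd

df_data = pd.DataFrame({
    'lats': [81.862745, 81.862745],
    'lons': [-29.834254, -29.502762],
    'vals': [0.0, 0.1],
})

# One formatting callable per column
formatters = {
    'lats': '{:.6f}'.format,
    'lons': '{:.6f}'.format,
    'vals': '{:.1f}'.format,
}
text = df_data.to_string(index=False, header=False, formatters=formatters)

# to_string pads columns with spaces; collapse each run of spaces into one comma
rows = [re.sub(r'\s+', ',', line.strip()) for line in text.splitlines()]
print('\n'.join(rows))
```

The original dataframe is left untouched, which is the main advantage over mapping the columns to strings in place.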
The to_string approach suggested by @mattexx looks better to me, since it doesn't modify the dataframe.
It also generalizes well to Jupyter notebooks, where the to_html method gives pretty HTML output. Here we set a new default precision of 4 and override it to get 5 digits for one particular column, wider:
from IPython.display import HTML
from IPython.display import display
pd.set_option('display.precision', 4)
display(HTML(df.to_html(formatters={'wider': '{:,.5f}'.format})))
