Table conversion in python - python

I have a table. Example input:
Energy1 Energy2
-966.463549649 -966.463549649
-966.463608088 -966.463585840
I need a script that sums the two energies E1 and E2, converts the result with a factor of 627.51 (hartree to kcal/mol), and finally truncates the number to 4 digits.
I never attempted this with Python. I've always written this in Julia, but I think it should be simple.
Do you know how I can find an example of reading the table and then doing operations with the numbers in it?
something like:
import numpy
data = numpy.loadtxt('table.tab')
print(data[?:,?].sum())

You can use pandas for this if you convert the table to a csv file. Add the two columns directly, then use the apply function with a lambda to multiply each element by the conversion factor. To truncate to 4 digits, you can change the global pandas display setting so that floats are shown in scientific notation with 1 digit before the decimal point and 3 after it.
import pandas as pd
df = pd.read_csv('something.csv')
pd.set_option('display.float_format', '{:.3E}'.format)
df['Sum Energies'] = (df['Energy1'] + df['Energy2']).apply(lambda x: x*627.51)
print(df)
This outputs:
      Energy1     Energy2  Sum Energies
0  -9.665E+02  -9.665E+02    -1.213E+06
1  -9.665E+02  -9.665E+02    -1.213E+06
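If you would rather stay with numpy and skip the csv conversion, here is a minimal sketch. It assumes the table is whitespace-delimited with a single header line, and that "4 digits" means 4 decimal places (rounded, not strictly truncated). The in-memory `table` stands in for your 'table.tab' file, and the name `HARTREE_TO_KCAL_PER_MOL` is just an illustrative constant.

```python
import io

import numpy as np

# Conversion factor quoted in the question (hartree -> kcal/mol).
HARTREE_TO_KCAL_PER_MOL = 627.51

# Stand-in for 'table.tab'; pass the file name to loadtxt instead.
table = io.StringIO(
    "Energy1 Energy2\n"
    "-966.463549649 -966.463549649\n"
    "-966.463608088 -966.463585840\n"
)

# skiprows=1 skips the header line; the result is a 2-D float array.
data = np.loadtxt(table, skiprows=1)

# Sum Energy1 + Energy2 for each row, then convert the units.
total_kcal = data.sum(axis=1) * HARTREE_TO_KCAL_PER_MOL

# Assumption: "4 digits" means 4 decimal places.
rounded = np.round(total_kcal, 4)

for value in rounded:
    print(f"{value:.4f}")
```

If you need the sum over a single column instead, index it first, for example `data[:, 0].sum()`.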

Related

Rounding numbers with pandas

I have a pandas dataframe with a column that contains the numbers:
[4.534000e-01, 6.580000e-01, 1.349300e+00, 2.069180e+01, 3.498000e-01,...]
I want to round this column to 3 decimal places, for which I use the round() function; however, I have noticed that pandas gives me the following:
[0.453, 0.658, 1.349, 20.692, 0.35,...]
where the last element doesn't have three digits after the decimal.
I would like to have all the numbers rounded with the same amount of digits, for example, like: [0.453, 0.658, 1.349, 20.692, 0.350,...].
How can this be done within pandas?
You can use pandas.DataFrame.round to specify a precision.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.round.html
import pandas as pd
# instantiate dataframe
dataframe = pd.DataFrame({'column_to_round': [4.534000e-01, 6.580000e-01, 1.349300e+00, 2.069180e+01, 3.498000e-01,]})
# create a new column with this new precision
dataframe['set_decimal_level'] = dataframe.round({'column_to_round': 3})
import pandas as pd
df = pd.DataFrame([4.534000e-01, 6.580000e-01, 1.349300e+00, 2.069180e+01, 3.498000e-01], columns=['numbers'])
df.round(3)
Prints:
0.453 0.658 1.349 20.692 0.350
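Note that rounding changes the stored values, while the trailing zero in 0.350 is purely a display matter. Here is a small sketch that keeps the two apart, assuming you are fine with the formatted column holding strings. The column names `numbers_3dp` and `numbers_text` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'numbers': [0.4534, 0.658, 1.3493, 20.6918, 0.3498]})

# Numeric rounding: the values stay floats, so 0.35 has no trailing zero.
df['numbers_3dp'] = df['numbers'].round(3)

# Fixed-width text: every entry gets exactly 3 digits after the decimal point.
df['numbers_text'] = df['numbers'].map('{:.3f}'.format)

print(df['numbers_text'].tolist())
# ['0.453', '0.658', '1.349', '20.692', '0.350']
```

Keep the float column for calculations and use the text column only for output.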

How to separate the whole number and decimal parts into separate columns using Python csv?

I saw many similar questions, but none of them worked with a csv file in Python. Basically, I have a column of decimal numbers and I want to write code that creates 2 new columns: one for just the whole number and the other for the decimal part. I turned the column into numeric using the code below.
df['hour_num'] = pd.to_numeric(df['total_time'])
I already have the columns 'total_time' and 'hour_num'. I want to know how to get the columns 'Whole number' and 'Decimal'.
Here is a picture to help illustrate: [picture]
You can convert the numbers to strings, split on '.', expand the parts into a DataFrame, and assign them back to the original DataFrame.
df = pd.DataFrame({'col1':[2.123, 3.557, 0.123456]})
df[['whole number', 'decimal']] = df['col1'].astype(str).str.split('.').apply(pd.Series)
df['decimal'] = ('0.' + df['decimal']).astype(float)
df['whole number'] = df['whole number'].astype(int)
Output:
       col1  whole number   decimal
0  2.123000             2  0.123000
1  3.557000             3  0.557000
2  0.123456             0  0.123456
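A purely numeric alternative, sketched here on the assumption that the column is already float, is numpy's modf, which returns the fractional and integral parts in one call. The fractional part can carry ordinary floating-point error, and for negative numbers both parts keep the sign.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [2.123, 3.557, 0.123456]})

# np.modf returns (fractional part, integral part) as two float arrays.
fractional, whole = np.modf(df['col1'].to_numpy())

df['whole number'] = whole.astype(int)
df['decimal'] = fractional

print(df)
```

This avoids the round trip through strings, which can misbehave for values that print in scientific notation.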

How to add two columns of a dataframe as Decimals?

I am trying to add two columns together using the Decimal module in Python but can't seem to get the syntax right for this. I have 2 columns called month1 and month2 and do not want these to become floats at any point in the outcome as division and then rounding will later be required.
The month1 and month2 columns are already to several decimals as they are averages and I need to preserve this accuracy in the addition.
I can see guidance online for how to add numbers together using Decimal but not how to apply it to columns in a pandas dataframe. I've tried things like:
df['MonthTotal'] = Decimal.decimal(df['Month1']) + Decimal.decimal(df['Month1'])
What is the solution?
from decimal import Decimal
def convert_decimal(row):
    row["monthtotal"] = Decimal(row["month1"]) + Decimal(row["month2"])
    return row
df = df.apply(convert_decimal, axis=1)
decimal.Decimal is designed to accept a single value, not a pandas.Series of them. Assuming that your columns hold strings representing numeric values, you can use .applymap to apply decimal.Decimal element-wise (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map), i.e.:
import decimal
import pandas as pd
df = pd.DataFrame({'x':['0.1','0.1','0.1'],'y':['0.1','0.1','0.1'],'z':['0.1','0.1','0.1']})
df_decimal = df.applymap(decimal.Decimal)
df_decimal["total"] = df_decimal.x + df_decimal.y + df_decimal.z
print(df_decimal.total[0])
print(type(df_decimal.total[0]))
output
0.3
<class 'decimal.Decimal'>
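Applied to just the two columns from the question, a minimal sketch could look like the following. It assumes month1 and month2 are stored as strings; the sample values are made up. If the columns are already floats, Decimal(str(v)) is usually closer to what you want than Decimal(v), because Decimal(v) on a float reproduces the float's binary approximation exactly.

```python
from decimal import Decimal

import pandas as pd

# Made-up sample data; the values are kept as strings so no float is involved.
df = pd.DataFrame({'month1': ['0.10', '1.25'], 'month2': ['0.20', '2.50']})

# Convert each column element-wise; the result is an object-dtype Series of Decimals.
month1 = df['month1'].map(lambda v: Decimal(v))
month2 = df['month2'].map(lambda v: Decimal(v))

# Adding two object-dtype Series adds the Decimal objects pairwise.
df['monthtotal'] = month1 + month2

print(df['monthtotal'].tolist())
```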

How do I add a " unit symbol after each number in a column in a pandas data frame?

I am taking over a project that is built in a pandas data frame where there is a large amount of measurements in this format: 6x6 , 52x14
I need to go in and add a quote (") inches unit symbol after each number in two specific columns that have this type of measurement data, the desired outcomes in the above examples would look like this 6"x6" , 52"x14"
How could I concisely write a code segment to add these quotes after each numeric value in those two columns? Another challenging piece is that there is other measurement data in these columns like the word large, small etc. but the only thing I am concerned with is adding the inch mark after each number.
Here's how to do the string replacement for units with a regex (but depending on your use-case, it might make more sense to split them into separate (numeric) columns width, length; see below):
import pandas as pd
df = pd.DataFrame({'measurements': ['6x6', '52x14']})
df['measurements'].str.replace(r'(\d+)', r'\1"', regex=True)
0 6"x6"
1 52"x14"
whereas if you want separate (numeric) length, width columns:
df[['length','width']] = df['measurements'].str.partition('x')[[0,2]].astype(int)
  measurements  length  width
0          6x6       6      6
1        52x14      52     14
Separate numeric columns are way cleaner if you'll be doing any calculations (e.g. df['area'] = df['length'] * df['width']).
You could then add your custom units formatting via:
globally, override pd.options.display.float_format = '{:.2f}"'.format (although your dimensions are ints, not floats). Note that this hack overrides the display of all float columns in all dataframes.
or, on a column- and dataframe-specific basis, use the pandas Styling API (using CSS)
or here's a total hack to override and monkey-patch pandas.io.formats.format.IntArrayFormatter for floats, since pandas bizarrely doesn't have an equivalent of pd.options.display.float_format for ints
Until pandas natively supports units, the 'right' way to do this is to use the pint package; see "How can I manage units in pandas data?".
Note:
in df[['length','width']] = df['measurements'].str.partition('x')[[0,2]].astype(int), we had to do the [[0,2]] subscripting to exclude the 'x' symbol itself that partition returned. Also we had to do .astype(int) to cast from string/pandas 'object' to int.
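Since the question mentions that the columns also contain words like large and small, here is a quick sketch showing that the regex only touches runs of digits. The extra 'large' row and the `with_units` column name are made up for illustration.

```python
import pandas as pd

# 'large' stands in for the non-numeric entries mentioned in the question.
df = pd.DataFrame({'measurements': ['6x6', '52x14', 'large']})

# (\d+) captures each run of digits; \1" writes it back followed by an inch mark.
# regex=True is required on pandas 2.x, where str.replace defaults to literal matching.
df['with_units'] = df['measurements'].str.replace(r'(\d+)', r'\1"', regex=True)

print(df['with_units'].tolist())
# ['6"x6"', '52"x14"', 'large']
```

Entries with no digits pass through unchanged, so there is no need to filter them out first.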

Suppress Scientific Format in a Dataframe Column

I have a column called accountnumber with values similar to 4.11889000e+11 in a pandas dataframe. I want to suppress the scientific notation and display the values as 411889000000. I tried the following, but it did not work.
df = pd.read_csv('data.csv')
pd.options.display.float_format = '{:,.3f}'.format
Please recommend.
You don't need the thousand separators "," and the 3 decimals for the account numbers.
Use the following instead.
pd.options.display.float_format = '{:.0f}'.format
I assume the exponential notation for the account numbers must come from the data file. If I create a small csv with the full account numbers, pandas will interpret them as integers.
acct_num
0 4118890000
1 9876543210
df['acct_num'].dtype
Out[51]: dtype('int64')
However, if the account numbers in the csv are represented in exponential notation then pandas will read them as floats.
acct_num
0 4.118890e+11
1 9.876543e+11
df['acct_num'].dtype
Out[54]: dtype('float64')
You have 2 options. First, correct the process that creates the csv so the account numbers are written out correctly. The second is to change the data type of the acct_num column to integer.
df['acct_num'] = df['acct_num'].astype('int64')
df
Out[66]:
acct_num
0 411889000000
1 987654321000
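To reproduce the second case end to end, here is a small self-contained sketch. The csv content is made up and held in memory rather than read from data.csv, and it assumes every account number is a whole number with no missing values (astype('int64') raises on NaN).

```python
import io

import pandas as pd

# Made-up csv where the account numbers were written in exponential notation.
csv_text = io.StringIO("acct_num\n4.11889e+11\n9.87654321e+11\n")

df = pd.read_csv(csv_text)
print(df['acct_num'].dtype)  # float64, because of the exponent notation

# Cast to integer so the values display without scientific notation.
df['acct_num'] = df['acct_num'].astype('int64')
print(df)
```

Keep in mind that float64 only represents integers exactly up to 2**53, so very long account numbers are better fixed at the source that writes the csv.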
