Drop decimals and add commas Pandas - python

I want a column in my DataFrame to have no decimals but have commas. It's for a bar chart. Every time I add the commas I get the decimals back, even if I convert the column to integers first. Here is the DataFrame and what I tried that isn't working:
df = pd.read_csv('https://github.com/ngpsu22/indigenous-peoples-day/raw/main/native_medians_means')
summary.med_resources_per_person.astype(int)
summary["med_resources_per_person"] = (summary["med_resources_per_person"].apply(lambda x : "{:,}".format(x)))

You're not actually changing the dtype to int inside the DataFrame; astype returns a new Series, so you need to assign the result back to the column:
df = pd.read_csv('https://github.com/ngpsu22/indigenous-peoples-day/raw/main/native_medians_means')
df["med_resources_per_person"] = df["med_resources_per_person"].astype(int)
df["med_resources_per_person"] = df["med_resources_per_person"].apply(lambda x : "{:,}".format(x))
Or a little bit more concise:
df = pd.read_csv('https://github.com/ngpsu22/indigenous-peoples-day/raw/main/native_medians_means')
df["med_resources_per_person"] = df["med_resources_per_person"].astype(int).apply("{:,}".format)
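As a quick check, here is the same pipeline on a couple of made-up values (the real column comes from the linked CSV, so the numbers below are only illustrative):

```python
import pandas as pd

# Hypothetical stand-in for the linked CSV's column
df = pd.DataFrame({"med_resources_per_person": [1234.56, 9876543.21]})

# Truncate to int, then format each value with a thousands separator
df["med_resources_per_person"] = (
    df["med_resources_per_person"].astype(int).apply("{:,}".format)
)
print(df["med_resources_per_person"].tolist())  # ['1,234', '9,876,543']
```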

Related

Writing a loop with an integer in python

I have a dataframe as such:
data = [[0xD8E3ED, 2043441], [0xF7F4EB, 912788],[0x000000,6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])
I am attempting to loop through c_code to get an integer value. The following code works to obtain the integer
hex_val = '0xFF9B3B'
print(int(hex_val, 0))
16751419
But when I try to loop through the column I run into an issue. I currently have this running, but it just overwrites every value:
for i in range(len(df)):
    df['value'] = int(df['c_code'].iloc[i], 0)
Ideal output would be a df with a value column that reflects the value of the c_code. The image below shows the desired format but notice that the value is the same for all rows. I believe that I need to append rows but I am unsure of how to do that
I believe that you can modify the type of your column c_code and assign this to a new column.
import pandas as pd
data = [['0xD8E3ED', 2043441], ['0xF7F4EB', 912788],['0x000000',6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])
df['value'] = df['c_code'].apply(int, base=16)
Also, I had to put the hexadecimal numbers as strings; otherwise pandas converts them to integers directly.
You are assigning the entire column to a new value at each step in the loop
df["value"] = ...
To target a single row you would need df.loc[i, "value"] = ... (chained indexing like df["value"][i] = ... may fail to update the frame).
However, you shouldn't have to loop through each value in pandas. Note that int() won't accept a whole Series, but apply runs it once per element.
Try:
df["value"] = df["c_code"].apply(lambda x: int(x, 0))
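Putting it together with the question's data, a minimal sketch (base=0 makes int() infer base 16 from the '0x' prefix, just like the question's print example):

```python
import pandas as pd

data = [['0xD8E3ED', 2043441], ['0xF7F4EB', 912788], ['0x000000', 6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])

# apply calls int(x, 0) once per cell instead of once on the whole Series
df['value'] = df['c_code'].apply(lambda x: int(x, 0))
print(df['value'].tolist())  # [14214125, 16250091, 0]
```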

Pandas Removing Leading Zeros

I have a short script to pivot data. The first column is a 9 digit ID number, often beginning with zeros such as 000123456
Here is the script:
df = pd.read_csv('source')
new_df = df.pivot_table(index = 'id', columns = df.groupby('id').cumcount().add(1), values = ['prog_id', 'prog_type'], aggfunc='first').sort_index(axis=1,level=1)
new_df.columns = [f'{x}_{y}' for x,y in new_df.columns]
new_df.to_csv('destination')
print(new_df)
Although the CSV is being read with an id such as 000123456, the output only contains 123456
Even when setting an explicit dtype, pandas removes the leading zeros. Is there a workaround for telling pandas to leave the leading zeros?
Per comment on original post, set dtype as string (use the builtin str; np.str is deprecated and removed in newer NumPy versions):
df = pd.read_csv('source', dtype={'id': str})
You could use pandas' zfill() method right after reading your csv file "source". Basically, you pad the values of your "id" column with as many zeros as you would like, in this particular case making the number 9 digits long (3 zeros + 6 original digits). So we would have:
df = pd.read_csv('source')
df['id'] = df['id'].astype(str).str.zfill(9)
# (...)
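A self-contained sketch of the dtype approach, using an in-memory CSV in place of 'source' (the column layout here is an assumption for illustration):

```python
import io
import pandas as pd

# Stand-in for the real 'source' file
csv = io.StringIO("id,prog_id\n000123456,7\n000000042,8\n")

# Reading the id column as str keeps the leading zeros intact
df = pd.read_csv(csv, dtype={'id': str})
print(df['id'].tolist())  # ['000123456', '000000042']
```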

Python Pandas Changing Column String Values to Float values in a new column

I have a dataframe which contains a column of strings representing float values (negative and positive) in the format .000 (3 dp). The number is represented as a string in column1, and I would like to add a column2 to the DataFrame as a float, converting the string representation to a float value while preserving the 3 dp. I have had problems trying to do this and get the error message "ValueError: could not convert string to float:". Grateful for any help.
Code
dataframe4['column2'] = ''
dataframe4['column2']=dataframe4['column1'].astype('float64')
#Round column1, column2 float columns to 3 decimal places
dataframe4.round({'column1': 3, 'column2': 3})
I don't know if I totally understood your question but you can try
dataframe4['column2'] = dataframe4['column1'].apply(lambda x : float(x))
Edit : If there are some numbers with commas, you can try:
dataframe4['column2'] = dataframe4['column1'].apply(lambda x : float(x.replace(",","")))
The problem appears to be that you have commas in your floats, e.g. '9,826.000'
You can fix like below
import re
re.sub(r",", "", "1,1000.20")
# returns '11000.20', and now the float conversion works
float(re.sub(r",", "", "1,1000.20"))
# use apply to convert all the numbers in the DataFrame
df["new_col"] = df["old_col"].apply(lambda x: float(re.sub(r",", "", x)))
To still show the resulting floats with commas afterwards in pandas, you can change the display setting for float as described here.
It depends on how you want to output these, but e.g. in the to_excel function you can specify a float_format, cf here, or re-format the column before output, similar to the above. See this answer for some ideas.
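For this case pandas' own string methods avoid the regex entirely; a small sketch with made-up values shaped like the '9,826.000' example:

```python
import pandas as pd

# Hypothetical strings in the question's format
df = pd.DataFrame({"column1": ["9,826.000", "-1,234.500"]})

# Strip the thousands separators, then cast the whole column at once
df["column2"] = df["column1"].str.replace(",", "", regex=False).astype("float64")
print(df["column2"].tolist())  # [9826.0, -1234.5]
```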

How do I either remove index of the data frame or replace it?

This is my current dataframe.
df = pd.DataFrame({'Observation': index, 'x': x, 'y': y,
                   'dx': np.round(dx, decimals=2), 'vx': np.round(vx, decimals=2),
                   'dy': np.round(dy, decimals=2), 'vy': np.round(vy, decimals=2),
                   'pxy': np.round(pxy, decimals=2)})
df = df.reindex_axis(['Observation', 'x', 'y', 'dx', 'vx', 'dy', 'vy', 'pxy'], axis=1)
df.loc['SUM'] = df.sum()
df
I would like my Observation column to be the index of the dataframe. How can I do it? Also, is it possible for the values in columns dx, dy, vx, vy and pxy to be shown to 2 decimal places?
You can use the pandas set_index() method to change the index of the data frame to your column values (assign the result back, since set_index returns a new DataFrame by default):
df = df.set_index('Observation')
API Documentation
numpy.round() rounds a float value but does not append 0 after the decimal point. To achieve that, you need the Python built-in function format(). So your code should be:
format(x, '.2f')
Follow this stack overflow thread for more info: Add zeros to a float after the decimal point in Python
I hope I understood you correctly
this will clear the previous index:
df.reset_index(inplace=True)
this will set the index:
df.set_index('Observation', inplace=True)
this will show 2 decimal places:
df.round(2)
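Both steps together on a toy frame (column values invented; note that round, like set_index, returns a new DataFrame unless you use inplace or assign it back):

```python
import pandas as pd

# Invented observation data
df = pd.DataFrame({'Observation': [1, 2], 'dx': [0.12345, 0.6789]})

df.set_index('Observation', inplace=True)
df = df.round(2)  # round returns a new frame, so assign it back
print(df['dx'].tolist())  # [0.12, 0.68]
```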
This worked for me.
This removes the column names:
df.columns = ['' for index in range(len(df.columns))]
This removes the [0, 1, 2...] default index:
df.index = ['' for index in range(len(df.index))]
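For example, on a tiny frame (column names invented):

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2]})

# Blank out both the column labels and the row labels
df.columns = ['' for _ in range(len(df.columns))]
df.index = ['' for _ in range(len(df.index))]
print(list(df.columns), list(df.index))  # ['', ''] ['']
```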

When aggregating on an empty dataframe, Pandas agg function converts result column type to float64. How can I make sure the type stays consistent?

import pandas
from decimal import Decimal
base_data = pandas.DataFrame(data = {'name':'Sarah', 'balance': Decimal(1)}, index = [0])
## drop the first row and aggregate
summary_data = base_data.drop(0).groupby('name').agg({'balance' : 'sum'})
summary_data.balance.dtype
yields
dtype('float64')
instead of Decimal or dtype('O') as it should.
This problem causes a type error later in my code, when I do a left join, fill with zeros, and try to add another Decimal to the float64.
Recast your dataframe with astype. Note that passing base_data.dtypes wholesale would raise a KeyError, since 'name' has become the group index rather than a column, so select only the balance dtype:
summary_data = base_data.drop(0).groupby('name').agg({'balance' : 'sum'}).astype({'balance': base_data['balance'].dtype})
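A runnable sketch of the recast, selecting just the balance dtype since 'name' becomes the group index:

```python
import pandas as pd
from decimal import Decimal

base_data = pd.DataFrame({'name': ['Sarah'], 'balance': [Decimal(1)]})

# Aggregating the emptied frame falls back to float64, so recast afterwards
summary_data = (
    base_data.drop(0)
    .groupby('name')
    .agg({'balance': 'sum'})
    .astype({'balance': base_data['balance'].dtype})
)
print(summary_data['balance'].dtype)  # object
```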
