Conversion of series to Dataframe Issue - python

After using groupby on a DataFrame to group and sum column data into a Series, I converted the result back to a DataFrame with the .to_frame method and then to HTML for output to a file. This appears to work well, except that the header row has a zero in the last column which I am unable to delete - any ideas? The output header looks like this:
                                                         0
Board Type  NE Type  Hardware Version  Software Version
The code:
NE_3 = NE_2.groupby(NE_2.columns.tolist(), as_index=False).size()
NE_3 = NE_3.to_frame()
NE_2 = NE_2.drop_duplicates()
NE_3 = NE_3.drop(columns='NE Type') # This doesn't work due to the '0' corrupting the header row
html_txt = NE_3.to_html()
tfile.write(html_txt)
tfile.write('<br/>')

Try NE_3 = NE_3.drop([0], axis=1) if the name of the last column is the integer 0.
In case the name of the last column is 'Version0', you could try this -
cols = list(NE_3.columns)
cols[-1] = cols[-1].replace('0', '')
NE_3.columns = cols

The easiest method is to write the DataFrame back out as a CSV file and then re-read it - this resolves the displacement in the header row. The '0' column can then simply be renamed -
NE_3 = NE_3.rename(columns={'0':'Total'})
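A round trip through CSV isn't strictly necessary, though. A minimal sketch, assuming the stray header cell is the integer label 0 that .to_frame() gives to an unnamed Series (not the string '0'):
NE_3 = NE_2.groupby(NE_2.columns.tolist()).size().reset_index(name='Total')
# or keep the original code and rename the integer label in place
NE_3 = NE_3.rename(columns={0: 'Total'})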

Writing a loop with an integer in python

I have a dataframe as such:
data = [[0xD8E3ED, 2043441], [0xF7F4EB, 912788],[0x000000,6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])
I am attempting to loop through c_code to get an integer value. The following code works to obtain the integer
hex_val = '0xFF9B3B'
print(int(hex_val, 0))
16751419
But when I try to loop through the column I run into an issue. I currently have this running but am just overwriting every value.
for i in range(len(df)):
    df['value'] = int((df['c_code'].iloc[i]), 0)
The ideal output would be a df with a value column that reflects the integer value of each c_code; with the loop above, the value ends up the same for every row. I believe that I need to append rows, but I am unsure of how to do that.
I believe you can convert your c_code column and assign the result to a new column:
import pandas as pd
data = [['0xD8E3ED', 2043441], ['0xF7F4EB', 912788],['0x000000',6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])
df['value'] = df['c_code'].apply(int, base=16)
Also, I had to put the hexadecimal numbers in as strings, otherwise they arrive in the column as plain ints already. The result is a value column holding the decimal integer for each code.
You are assigning the entire column to a new value at each step in the loop:
df["value"] = ...
To target a single row you would need df.loc[i, "value"] = ... (chained indexing such as df["value"][i] may not write back to the original frame).
However, you shouldn't have to loop through each value in pandas. int() can't be applied to a whole Series, so convert element-wise instead:
df["value"] = df["c_code"].apply(int, base=0)

How to remove column with number as index name?

I have the following dataframe:
I tried to drop the -1 column using
df = df.drop(columns=['-1'])
However, it is giving me the following error:
I was able to drop a column whose name is a text label using this same approach, but not one named with a number. What am I doing wrong?
You can check the real column names by converting them to a list:
print (df.columns.tolist())
I think you need to drop the number -1 instead of the string '-1':
df = df.drop(columns=[-1])
Or another solution with the same output:
df = df.drop(-1, axis=1)
EDIT:
If you need to select all columns except the first one, use DataFrame.iloc to select by position: the first ':' selects all rows and the '1:' selects all columns, omitting the first:
df = df.iloc[:, 1:]
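To see the difference in practice, here is a small sketch with a made-up frame whose first column is labelled with the integer -1 rather than the string '-1':
import pandas as pd

df = pd.DataFrame({-1: [1, 2], 0: [3, 4]})
print(df.columns.tolist())    # [-1, 0] -> the labels are ints, not strings

df = df.drop(columns=[-1])    # works; df.drop(columns=['-1']) would raise a KeyError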
If you are just trying to remove the first column, another approach that would be independent of the column name is this:
df = df[df.columns[1:]]
You can do it simply with the following steps.
First, check the column names:
df.columns
Then, if the output looks like:
Index(['-1', '0'], dtype='object')
use the drop command to delete the column:
df.drop(['-1'], axis =1, inplace = True)
This should help in the future as well.

Making a new column based on 2 other columns

I am trying to calculate a new column labeled in the code as "Sulphide-S(calc)-C_%S"; this column can be calculated from one of two options (see below in the code). Both of these columns won't be filled at the same time, so I want it to calculate from whichever column has data present. Presently I have this, but the second equation overwrites the first.
df["Sulphide-S(calc)-C_%S"] = df["Total-S_%S"] - df["Sulphate-S(HCL Leachable)_%S"]
df.head()
df["Sulphide-S(calc)-C_%S"] = df["Total-S_%S"]- df["Sulphate-S_%S"]
df.head()
You can use the apply function in pandas to create a new column based on other columns, resulting in a Series that you can add to your original dataframe. Without knowing exactly what your dataframe looks like, the following might need the condition adjusted to however the empty cells are represented (NaN is assumed here):
def create_sulfide_col(row):
    # If the HCL-leachable sulphate is missing, fall back to the plain sulphate column.
    if pd.isna(row["Sulphate-S(HCL Leachable)_%S"]):
        val = row["Total-S_%S"] - row["Sulphate-S_%S"]
    else:
        val = row["Total-S_%S"] - row["Sulphate-S(HCL Leachable)_%S"]
    return val

df["Sulphide-S(calc)-C_%S"] = df.apply(create_sulfide_col, axis='columns')
If I'm understanding what you're saying correctly, the second equation overwrites the first because both assign to the same column name. Try changing the column name in one or both assignments to something else, like "Sulphide-S(calc)-C_%S_A" and "Sulphide-S(calc)-C_%S_B":
df["Sulphide-S(calc)-C_%S_A"] = df["Total-S_%S"] - df["Sulphate-S(HCL Leachable)_%S"]
df.head()
df["Sulphide-S(calc)-C_%S_B"] = df["Total-S_%S"]- df["Sulphate-S_%S"]
df.head()
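If a single combined column is the goal, a vectorised sketch (assuming the unfilled entries are NaN) avoids the row-wise function entirely: fill the HCL-leachable sulphate with the plain sulphate wherever it is missing, then subtract once.
sulphate = df["Sulphate-S(HCL Leachable)_%S"].fillna(df["Sulphate-S_%S"])
df["Sulphide-S(calc)-C_%S"] = df["Total-S_%S"] - sulphate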

Problem when processing from CSV to CSV with a row count

I am trying to process a CSV file into a new CSV file with only the columns of interest, removing rows that contain the unfit value -1. Unfortunately I get unexpected results: column 0 (the old ID) is automatically included in the new CSV file without the script being asked to do so (it is not listed in cols = [..]).
How could I change these values to a new row count? For example, when row 9 with id=9 is removed, the dataset ids currently run as [..7, 8, 10...] instead of being renumbered as [..7, 8, 9, 10...]. I hope someone has a solution for this.
import pandas as pd
# take only specific columns from dataset
cols = [1, 5, 6]
data = pd.read_csv('data_sample.csv', usecols=cols, header=None)
data.columns = ["url", "gender", "age"]
# remove rows from dataset with undefined values of -1
data = data[data['gender'] != -1]
data = data[data['age'] != -1]
""" Additional working solution
indexGender = data[data['gender'] == -1].index
indexAge = data[data['age'] == -1].index
# Delete the rows indexes from dataFrame
data.drop(indexGender,inplace=True)
data.drop(indexAge, inplace=True)
"""
data.to_csv('data_test.csv')
Thank you in advance.
I solved the problem with a simple line after the data drop:
data.reset_index(drop=True, inplace=True)
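For context, the unexpected leading column in the output is the DataFrame index, which to_csv writes by default; after the reset it simply becomes the new 0..n-1 row count. A short sketch of the end of the script (file name as in the question):
# renumber rows 0..n-1 after the drops; to_csv then writes this fresh index as the id column
data.reset_index(drop=True, inplace=True)
data.to_csv('data_test.csv')
If no id column is wanted at all, pass index=False to to_csv instead.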

Editing a specific cell with Pandas [Python]

I am having a problem with Pandas; I have looked everywhere but think I am overlooking something.
I have a csv file that I import to pandas, which has an ID column and another column I will call Column 2. I want to:
1. Input an ID to python.
2. Search this ID in the ID column with Pandas, and put a 1 on the adjacent cell, in Column 2.
import pandas
csvfile = pandas.read_csv('document1.csv')
#Convert everything to string for simplicity
csvfile['ID'] = csvfile['ID'].astype(str)
#Fill in all missing NaN
csvfile = csvfile.fillna('missing')
#Looking for the row that the ID '10099870.0' is in.
indexid = csvfile.loc[csvfile['ID'] == '10099870.0'].index
# Important part! I think this selects the column 2, row 'indexid' and replaces missing with 1.
csvfile['Column 2'][indexid].replace('missing', '1')
I know this is a simple question but thanks for all your help!
Mauricio
This is what I'd do:
cond = csvfile.ID == '10099870.0'
col = 'Column 2'
csvfile.loc[cond, col] = csvfile.loc[cond, col].replace('missing', '1')
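The reason the original attempt changed nothing is that csvfile['Column 2'][indexid].replace('missing', '1') returns a new Series without writing it back (and chained indexing may operate on a copy). Assigning through .loc with a boolean mask, as above, modifies the frame in place; the result can then be saved, for example:
csvfile.to_csv('document1.csv', index=False)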
