I am having a problem with Pandas; I have looked everywhere but think I am overlooking something.
I have a CSV file I import into pandas, which has an ID column and another column I will call Column 2. I want to:
1. Input an ID to python.
2. Search this ID in the ID column with Pandas, and put a 1 on the adjacent cell, in Column 2.
import pandas
csvfile = pandas.read_csv('document1.csv')
#Convert everything to string for simplicity
csvfile['ID'] = csvfile['ID'].astype(str)
#Fill in all missing NaN
csvfile = csvfile.fillna('missing')
#looking for the row in which the ID '10099870.0' is in.
indexid = csvfile.loc[csvfile['ID'] == '10099870.0'].index
# Important part! I think this selects Column 2 at row 'indexid' and replaces 'missing' with '1', but nothing changes.
csvfile['Column 2'][indexid].replace('missing', '1')
I know this is a simple question but thanks for all your help!
Mauricio
This is what I'd do:
cond = csvfile.ID == '10099870.0'
col = 'Column 2'
csvfile.loc[cond, col] = csvfile.loc[cond, col].replace('missing', '1')
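On a small made-up frame (the IDs and values below are assumptions mirroring the question), the boolean-mask assignment behaves like this:

```python
import pandas as pd

# Hypothetical data standing in for document1.csv
csvfile = pd.DataFrame({
    'ID': ['10099870.0', '10099871.0'],
    'Column 2': ['missing', 'missing'],
})

cond = csvfile.ID == '10099870.0'
col = 'Column 2'
# .loc with a boolean mask writes back into the original frame,
# unlike chained indexing such as csvfile[col][indexid]
csvfile.loc[cond, col] = csvfile.loc[cond, col].replace('missing', '1')
```

Only the matching row changes; the other row keeps 'missing'.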
Related
I want to know if I can access the second-to-last row of this CSV file.
I am able to access the very last row using:
pd.DataFrame(file1.iloc[-1:,:].values)
But how can I access the one right before the last?
Here is the code I have so far:
import pandas as pd
import csv
url1 = r"https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/country_data/Austria.csv"
file1 = pd.read_csv(url1)
df1 = pd.DataFrame(file1.iloc[:,:].values)
df1 = pd.DataFrame(file1.iloc[-1:,:].values)
Austria_date = df1.iloc[:,1]
Austria_cum = df1.iloc[:, 4].map('{:,}'.format)
if ( Austria_cum.iloc[0] == 'nan' ):
Essentially I am checking whether the row at that specific column is 'nan' (which is True), after which I want to get the data from the row right before the last. How would this be done?
Thank you
As simple as that:
df1.iloc[-2,:]
To access the data frame at any index you can use the following
df.iloc[i]
For your case you need to use index -2, which would look like this:
df.iloc[-2]
Just use negative indices like in your example, but pull out the second to last with -2 instead of the last:
df.iloc[-2,:]
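On a small stand-in frame (the real data comes from the OWID URL, so these values are assumptions), negative indexing with iloc works like this:

```python
import pandas as pd

# Hypothetical rows standing in for the downloaded vaccination CSV
df = pd.DataFrame({
    'date': ['2021-01-08', '2021-01-09', '2021-01-10'],
    'total': [200, 300, None],
})

last = df.iloc[-1]          # the very last row (total is NaN here)
second_last = df.iloc[-2]   # the row right before the last
```

If the last row's value is NaN, you can fall back to `second_last` exactly as described in the question.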
I am working on an automation task in Python. I want a column 'result' that contains '1' if the column 'Battery' contains 'Discharge' and the previous row contains 'None'; otherwise it should contain '0'.
The first row should contain 0 by default.
The excel formula is
=IF(AND(AD2="None",AD3="Discharge"),1,0)
IIUC you can use np.where() for setting the condition, and then force the first row to 0:
import numpy as np
df['result'] = np.where(
    (df.Battery == 'Discharge') & (df.Battery.shift() == 'None'),
    1,
    0
)
df.iloc[0, df.columns.get_loc('result')] = 0  # avoids the chained-assignment warning of df['result'].iloc[0] = 0
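On a hypothetical Battery log (the values below are assumptions), the shift-based condition produces the same result as the Excel formula; note that `shift()` yields NaN for the first row, so the comparison is already False there:

```python
import numpy as np
import pandas as pd

# Hypothetical Battery column similar to the question's data
df = pd.DataFrame({'Battery': ['None', 'Discharge', 'Discharge', 'None', 'Discharge']})

# 1 where the current row is 'Discharge' and the previous row is 'None'
df['result'] = np.where(
    (df.Battery == 'Discharge') & (df.Battery.shift() == 'None'),
    1,
    0
)
```

Rows 1 and 4 get a 1 (a 'Discharge' directly after a 'None'); every other row, including the first, gets 0.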
After using groupby on a DataFrame to group and sum column data into a Series, I converted the result back to a DataFrame using the .to_frame method, which I then converted to HTML for output to a file. This appears to work well except that the header row has a zero in the last column which I am unable to delete. Any ideas? See here:
                                                       0
Board Type  NE Type  Hardware Version  Software Version
Here is my code:
NE_3 = NE_2.groupby(NE_2.columns.tolist(), as_index=False).size()
NE_3 = NE_3.to_frame()
NE_2 = NE_2.drop_duplicates()
NE_3 = NE_3.drop(columns='NE Type') # This doesn't work due to the '0' corrupting the header row
html_txt = NE_3.to_html()
tfile.write(html_txt)
tfile.write('<br/>')
Try NE_2 = NE_2.drop([0], axis=1) if the name of the last column is 0.
In case, the name of the last column is Version0, you could try this -
cols = NE_2.columns
cols = list(cols[:-1]) + [cols[-1].replace('0', '')]
NE_2.columns = cols
The easiest method is to write the DataFrame back to a CSV file and then re-read it; this resolves the displacement in the header row. The '0' column can then simply be renamed:
NE_3 = NE_3.rename(columns={'0':'Total'})
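The stray 0 is just the default label of the unnamed counts column that `groupby(...).size().to_frame()` produces, and it can be renamed directly without the CSV round trip. Note it is the integer 0, not the string '0', until the frame has been written out and re-read. A small sketch on made-up data (the values are assumptions; the column names come from the question):

```python
import pandas as pd

# Hypothetical inventory frame with two of the question's columns
NE_2 = pd.DataFrame({'Board Type': ['A', 'A', 'B'],
                     'NE Type': ['X', 'X', 'Y']})

# size() returns a Series; to_frame() labels its single column 0
NE_3 = NE_2.groupby(NE_2.columns.tolist()).size().to_frame()

# Rename with the integer key 0, no CSV round trip needed
NE_3 = NE_3.rename(columns={0: 'Total'})
```

After the rename, `NE_3.to_html()` shows 'Total' in the header instead of 0.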
I am trying to check if the last cell in a pandas data-frame column contains a 1 or a 2 (these are the only options). If it is a 1, I would like to delete the whole row, if it is a 2 however I would like to keep it.
import pandas as pd
df1 = pd.DataFrame({'number':[1,2,1,2,1], 'name': ['bill','mary','john','sarah','tom']})
df2 = pd.DataFrame({'number':[1,2,1,2,1,2], 'name': ['bill','mary','john','sarah','tom','sam']})
In the above example I would want to delete the last row of df1 (so the final row is 'sarah'), however in df2 I would want to keep it exactly as it is.
So far, I have thought to try the following but I am getting an error
if df1['number'].tail(1) == 1:
    df = df.drop(-1)
DataFrame.drop removes rows based on labels (the actual values of the index). While this is possible with df1.drop(df1.index[-1]), it is problematic with a duplicated index. The last row can be selected with iloc, or a single value with .iat:
if df1['number'].iat[-1] == 1:
    df1 = df1.iloc[:-1, :]
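Applied to df1 from the question, the check sees a 1 in the last row, drops the final 'tom' row, and leaves 'sarah' as the last row:

```python
import pandas as pd

# df1 as given in the question
df1 = pd.DataFrame({'number': [1, 2, 1, 2, 1],
                    'name': ['bill', 'mary', 'john', 'sarah', 'tom']})

if df1['number'].iat[-1] == 1:   # .iat gives a plain scalar, safe for `if`
    df1 = df1.iloc[:-1, :]       # keep every row except the last
```

Run on df2, the same code changes nothing, because df2's last 'number' value is 2.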
You can check if the value of number in the last row is equal to one:
check = df1['number'].tail(1).values == 1
# Or check entire row with
# check = 1 in df1.tail(1).values
If that condition holds, you can select all rows, except the last one and assign back to df1:
if check:
    df1 = df1.iloc[:-1, :]
if df1['number'].iloc[-1] == 1:
    df1.drop(df1.index[-1], inplace=True)
You can use the same tail function
df.drop(df.tail(n).index,inplace=True) # drop last n rows
I am trying to process a CSV file into a new CSV file with only the columns of interest, removing rows with unfit values of -1. Unfortunately I get unexpected results, as the script automatically includes column 0 (the old ID) in the new CSV file without being asked to (it is not listed in cols = [..]).
How could I change these values into a new row count? That way, when for example we remove row 9 with id=9, the dataset IDs currently go [..7, 8, 10...] instead of being renumbered as [..7, 8, 9, 10...]. I hope someone has a solution.
import pandas as pd
# take only specific columns from dataset
cols = [1, 5, 6]
data = pd.read_csv('data_sample.csv', usecols=cols, header=None)
data.columns = ["url", "gender", "age"]
# remove rows from dataset with undefined values of -1
data = data[data['gender'] != -1]
data = data[data['age'] != -1]
""" Additional working solution
indexGender = data[data['gender'] == -1].index
indexAge = data[data['age'] == -1].index
# Delete the rows indexes from dataFrame
data.drop(indexGender,inplace=True)
data.drop(indexAge, inplace=True)
"""
data.to_csv('data_test.csv')
Thank you in advance.
I solved the problem with a simple line after dropping the rows:
data.reset_index(drop=True, inplace=True)
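On a small hypothetical frame (the values below are assumptions, since data_sample.csv is not shown), filtering leaves gaps in the index and reset_index renumbers it from 0:

```python
import pandas as pd

# Hypothetical rows standing in for data_sample.csv
data = pd.DataFrame({'gender': [1, -1, 0, 1], 'age': [25, 30, -1, 40]})

# Remove rows with undefined values of -1, as in the question
data = data[data['gender'] != -1]
data = data[data['age'] != -1]

# Index is now [0, 3]; renumber it 0..n-1
data.reset_index(drop=True, inplace=True)
```

With drop=True the old index is discarded instead of being kept as a new column, so the renumbered IDs are consecutive when written out with to_csv.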