Is there any way to remove the index numbers in pandas? - python

This is a simple script that pulls out part of a dataframe using two input dates.
It works, but my issue has suddenly appeared once more.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import datetime
plt.rc('font', family = 'Malgun Gothic')
df = pd.read_csv('seoul.csv', encoding = 'cp949', index_col=False)
df.style.hide_index()
del df['지점']  # drop the station column
a = input("날짜 입력 yyyy-mm-dd: ")  # start date (prompt: "enter date yyyy-mm-dd")
b = input("날짜 입력 yyyy-mm-dd: ")  # end date (same prompt)
df['날짜'] = pd.to_datetime(df['날짜'])
mask = (df['날짜']>=a) & (df['날짜']<=b)
df.loc[mask]
And this is the result.
How can I remove these numbers? (the rows I point out with a red box)
Edit: changing to index_col=0 does not work, since some of the rows are at a different level.

The index is how the rows are identified; you can't remove it.
You can only reset it, e.g. after making a selection, if you want to renumber your dataframe.
df = df.reset_index(drop=True)
If the drop argument is set to False, the old index is kept as an additional column named index.
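A minimal sketch of the difference, with made-up data: a boolean selection leaves gaps in the index, and reset_index(drop=True) renumbers the rows from 0.

```python
import pandas as pd

# A filter keeps the original row labels, leaving gaps.
df = pd.DataFrame({"temp": [1.2, 3.4, 5.6, 7.8]})
subset = df[df["temp"] > 2]             # index is 1, 2, 3
subset = subset.reset_index(drop=True)  # index becomes 0, 1, 2
print(subset.index.tolist())  # [0, 1, 2]
```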

Try df.to_csv(filename, index=False) –
tbhaxor
Jan 14, 2020 at 9:27
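The comment above can be sketched as follows (column name made up): passing index=False keeps the row numbers out of the written file entirely.

```python
import io
import pandas as pd

# index=False omits the index column from the CSV output.
df = pd.DataFrame({"value": [1, 2, 3]})
buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue().splitlines()[0])  # "value" -- no leading index column
```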

Related

How to turn column headers into row in order to plot in chart?

This is what happens when I try df.T.plot, and it is pulling from the wrong dataframe:
df1 = open_res[['Name','6-Jun','16-Jun','26-Jun','6-Jul','16-Jul','26-Jul','5-Aug','15-Aug','4-Sep','14-Sep','24-Sep','30-Aug','4-Oct','14-Sep','24-Oct','3-Nov','13-Nov','23-Nov','3-Dec']]
df2 = df1.loc[df1['Name'] == 'Global']
df2
The returned data shows each date, in the format seen above, as a column header. How can I change this so that the dates are plotted along the x-axis?
The data seen in the picture is cleaned up because I just want the Global row.
You get that behaviour because the first column is a string while your other columns are numeric, and when you transpose, everything is converted to strings. Using some example data like yours:
import pandas as pd
import numpy as np
open_res = pd.DataFrame(np.random.uniform(0, 1, (2, 19)),
                        columns=['6-Jun', '16-Jun', '26-Jun', '6-Jul', '16-Jul', '26-Jul',
                                 '5-Aug', '15-Aug', '4-Sep', '14-Sep', '24-Sep', '30-Aug',
                                 '4-Oct', '14-Sep', '24-Oct', '3-Nov', '13-Nov', '23-Nov', '3-Dec'])
open_res['Name'] = ['Global', 'x']
df1 = open_res[['Name', '6-Jun', '16-Jun', '26-Jun', '6-Jul', '16-Jul', '26-Jul',
                '5-Aug', '15-Aug', '4-Sep', '14-Sep', '24-Sep', '30-Aug',
                '4-Oct', '14-Sep', '24-Oct', '3-Nov', '13-Nov', '23-Nov', '3-Dec']]
df2 = df1.loc[df1['Name'] == 'Global']
We transpose:
df2.T.dtypes
0 object
You can do:
df2.set_index('Name').T.plot()
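A quick check with made-up numbers shows why this works: transposing a frame that mixes a string column with numeric columns produces object-dtype columns, while moving 'Name' into the index first keeps the values numeric.

```python
import pandas as pd

# Mixed string/numeric columns become object dtype after .T;
# setting the string column as the index avoids that.
df = pd.DataFrame({"Name": ["Global", "x"],
                   "6-Jun": [0.1, 0.2],
                   "16-Jun": [0.3, 0.4]})
print(df.T.dtypes.iloc[0])                      # object
print(df.set_index("Name").T.dtypes["Global"])  # float64
```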

How to drop a row by index from my pandas dataframe to prevent it appearing in my bar chart

I am using df.drop, however when I run my code I'm still seeing the "total" row at index 10 in my plot. I want it removed.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv ("https://raw.githubusercontent.com/ryan-j-hope/Shielding-models/master/2020%20Uk%20Population%20By%20Age%20Demographics.csv", encoding='UTF')
df.drop([10])
print(df)
ag = df["Age Group"]
pop = df["Population (%)"]
plt.bar(ag, pop)
plt.show()
You don't need the brackets. Also, you need to specify inplace:
df.drop(10, inplace=True)
df.drop([10]) returns a copy of df with the row dropped; the original df is unchanged. Assign the result to a new DataFrame:
df2 = df.drop([10])
then extract the columns from df2. Or use the inplace argument to modify df in place:
df.drop([10], inplace=True)
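A minimal sketch of that difference, with made-up data:

```python
import pandas as pd

# df.drop returns a new frame by default; the original is untouched.
df = pd.DataFrame({"Age Group": ["0-9", "10-19", "Total"],
                   "Population (%)": [12.0, 11.5, 100.0]})
df2 = df.drop([2])           # copy without row 2
print(len(df), len(df2))     # 3 2
df.drop([2], inplace=True)   # now df itself is modified
print(len(df))               # 2
```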
Make a small change in your code: write df.drop(10, inplace=True) when dropping.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv ("https://raw.githubusercontent.com/ryan-j-hope/Shielding-models/master/2020%20Uk%20Population%20By%20Age%20Demographics.csv", encoding='UTF')
df.drop(10,inplace = True)
print(df)
ag = df["Age Group"]
pop = df["Population (%)"]
plt.bar(ag, pop)
plt.show()

Pandas: need to remove rows that contain a string, but my condition is not working

from chainer import datasets
from chainer.datasets import tuple_dataset
import numpy as np
import matplotlib.pyplot as plt
import chainer
import pandas as pd
import math
I have a CSV file containing 40300 rows.
df = pd.read_csv("Myfile.csv", header = None)
In this part I remove the unneeded rows and columns:
columns = [0,1]
rows = [0,1,2]
df.drop(columns, axis = 1, inplace = True)  # drop the first two columns, which the code does not need
df.drop(rows, axis = 0, inplace = True)  # drop the first three rows, which the code does not need
In this part I want to remove a row whenever it contains the string, but it's not working:
df[~df.E.str.contains("Intf Shut")]  # this part is not working for me
df.to_csv('summary.csv', index = False, header = False)
df.head()
You have to assign the filtered result back to df:
df = df[~df.E.str.contains("Intf Shut")]
Since the file was read with header = None, the columns are labelled by number, so refer to the third column by its integer label. You can also first define a variable to_drop holding the exact strings to remove:
to_drop = ['My text 1', 'My text 2']
df = df[~df[2].isin(to_drop)]
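Both filters can be sketched with made-up data; note that str.contains matches substrings, while isin requires the full cell value to match.

```python
import pandas as pd

# The file was read with header=None, so columns are integers.
df = pd.DataFrame({2: ["ok", "Intf Shut down", "fine", "My text 1"]})

# Substring match -- must be assigned back to take effect.
df = df[~df[2].str.contains("Intf Shut")]
print(len(df))  # 3

# Exact match against a list of full cell values.
to_drop = ["My text 1", "My text 2"]
df = df[~df[2].isin(to_drop)]
print(len(df))  # 2
```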

Sum all columns in each column in csv using Pandas

The program I have written has generally done what I wanted, for the most part: adding a total for each column. My dataframe comes from a csv file. My code is below:
import pandas as pd
import matplotlib.pyplot
class ColumnCalculation:
    """This houses the functions for all the column manipulation calculations"""
    def max_electricity(self):
        df.set_index('Date', inplace=True)
        df.loc['Total'] = df.sum()
        print(df)
df = pd.read_csv("2011-onwards-city-elec-consumption.csv")
ColumnCalculation.max_electricity(df)
Also, here is my dataset (I didn't know how to format it properly).
The code nicely adds a total at the bottom of each column, except for the last column (2017) (image below):
I am not sure what the program is doing; I've tried different indexing options like .iloc or .ix, but they don't seem to make a difference. I have also tried adding each column individually (below):
def max_electricity(self):
    df.set_index('Date', inplace=True)
    df.loc['Total', '2011'] = df['2011'].sum()
    df.loc['Total', '2012'] = df['2012'].sum()
    df.loc['Total', '2013'] = df['2013'].sum()
    df.loc['Total', '2014'] = df['2014'].sum()
    df.loc['Total', '2015'] = df['2015'].sum()
    df.loc['Total', '2016'] = df['2016'].sum()
    df.loc['Total', '2017'] = df['2017'].sum()
    print(df)
But I receive an error; I assume this approach is too verbose? I've been trying to figure this out for a good hour and a bit.
Your last column isn't being parsed as floats, but as strings.
To fix this, cast the values to numeric before summing:
import locale
locale.setlocale(locale.LC_NUMERIC, '')
df['2017'] = df['2017'].map(locale.atoi)
Better still, try reading in the data as numeric data. For example:
df = pd.read_csv('file.csv', sep='\t', thousands=',')
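For example, with a small inline CSV (made-up values), the thousands separator is the difference between an object column and an integer column:

```python
import io
import pandas as pd

data = 'Date,2017\nJan,"1,234"\nFeb,"2,345"\n'

# Without the separator hint, "1,234" stays a string.
bad = pd.read_csv(io.StringIO(data))
print(bad["2017"].dtype)   # object

# With thousands=",", the column parses as integers and sums correctly.
good = pd.read_csv(io.StringIO(data), thousands=",")
print(good["2017"].dtype)  # int64
print(good["2017"].sum())  # 3579
```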

How to index a pandas data frame starting at n?

Is it possible to start the index from n in a pandas dataframe?
I have some datasets saved as csv files and would like to add an index column with row numbers starting from where the previous file's numbering ended.
For example, for the first file I'm using the following code which works fine, so I got an output csv file with rows starting at 1 to 1048574, as expected:
yellow_jan['index'] = range(1, len(yellow_jan) + 1)
I would like to do same for the yellow_feb file, but starting the row index at 1048575 and so on.
Appreciate any help!
df["new_index"] = range(10, 20)  # the range length must match the number of rows
df = df.set_index("new_index")
df
If your plan is to concat the dataframes, you can just use
import pandas as pd
import numpy as np
df1 = pd.DataFrame({"a": np.arange(10)})
df2 = pd.DataFrame({"a": np.arange(10,20)})
df = pd.concat([df1, df2],ignore_index=True)
otherwise, shift the second frame's index by the length of the first:
df2.index += len(df1)
You may also just reset the index at the end, or define a local variable and use it in the `arange` function, updating the variable with the number of rows of each file you read.
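Putting that together, one way is to carry a running offset across files (file contents and the column name "index" here are made up to stand in for the monthly csv files):

```python
import pandas as pd

# Number rows continuously across several frames, starting at 1,
# the way yellow_jan / yellow_feb would be numbered back to back.
frames = [pd.DataFrame({"a": range(3)}), pd.DataFrame({"a": range(4)})]
offset = 0
for df in frames:
    df["index"] = range(offset + 1, offset + 1 + len(df))
    offset += len(df)
combined = pd.concat(frames, ignore_index=True)
print(combined["index"].tolist())  # [1, 2, 3, 4, 5, 6, 7]
```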
