Why can't I drop any columns in dataframe? [duplicate] - python

This question already has answers here:
How to get rid of "Unnamed: 0" column in a pandas DataFrame read in from CSV file?
(11 answers)
Closed 2 years ago.
I don't know how the 'Unnamed: 0' column got there when I reversed the index, and for the life of me I can't drop or del it. It will NOT go away, no matter what I try: by index, by every string variation from 'Unnamed: 0' to just '0', by passing columns= and by .drop(df.columns, ...). I've tried everything already in my code, such as drop=True. Then I tried dropping other columns, and that wouldn't work either.
import pandas as pd

# set csv file as constant
TRADER_READER = pd.read_csv('TastyTrades.csv')

# change date format, make date into timestamp object, set date as index, write changes to csv file
def clean_date():
    # TRADER_READER['Date'] = TRADER_READER['Date'].replace({'T': ' ', '-0500': '', '-0400': ''}, regex=True)
    # TRADER_READER['Date'] = pd.to_datetime(TRADER_READER['Date'], format="%Y-%m-%d %H:%M:%S")
    TRADER_READER.set_index('Date', inplace=True, drop=True)
    # TRADER_READER.iloc[::-1].reset_index(drop=True)
    print(TRADER_READER)
    # TRADER_READER.to_csv('TastyTrades.csv')

clean_date()
Unnamed: 0 Type ... Strike Price Call or Put
Date ...
2020-04-01 11:00:05 0 Trade ... 21.0 PUT
2020-04-01 11:00:05 1 Trade ... NaN NaN
2020-03-31 17:00:00 2 Receive Deliver ... 22.0 PUT
2020-03-31 17:00:00 3 Receive Deliver ... NaN NaN
2020-03-27 16:15:00 4 Receive Deliver ... 7.5 PUT
... ... ... ... ... ...
2019-12-12 10:10:22 617 Trade ... 11.0 PUT
2019-12-12 10:10:21 618 Trade ... 45.0 CALL
2019-12-12 10:10:21 619 Trade ... 32.5 PUT
2019-12-12 09:45:42 620 Trade ... 18.0 CALL
2019-12-12 09:45:42 621 Trade ... 13.0 PUT
[622 rows x 16 columns]
Process finished with exit code 0

I think the problem comes from the CSV itself, which includes an unnamed index column. To fix it, read the CSV specifying the first column as the index, and then set the 'Date' index:
TRADER_READER = pd.read_csv('TastyTrades.csv', index_col=0)
TRADER_READER.set_index('Date', inplace=True, drop=True)
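For context, the unnamed column usually appears because an earlier to_csv call wrote the default integer index back into the file. A minimal sketch of a round trip that avoids recreating it (file name taken from the question):
import pandas as pd

# Read, treating the leftover unnamed column as the index so it disappears
TRADER_READER = pd.read_csv('TastyTrades.csv', index_col=0)
TRADER_READER.set_index('Date', inplace=True, drop=True)

# Writing now stores 'Date' as a named column; reading it back with
# index_col='Date' restores the frame with no 'Unnamed: 0' column.
TRADER_READER.to_csv('TastyTrades.csv')
roundtrip = pd.read_csv('TastyTrades.csv', index_col='Date')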

Related

appending pandas columns data

why can't the pandas data frame append appropriately to form one data frame in this loop?
# Produce the overall data frame
def processed_data(data1_, f_loc, open, close):
    """data1_: is the csv file to be modified
    f_loc: is the location of csv files to be processed
    open and close: are the columns to undergo computations
    returns a new dataframe of modified columns"""
    main_file = drop_col(data1_)  # Dataframe to append more data columns to
    for i in files_path(f_loc):
        data = get_data_frame(i[0])  # takes the csv file path and returns the data frame
        perc = perc_df(data, open, close, i[1])  # Dataframe to append
        copy_data = main_file.append(perc)
    return copy_data
Here's the output:
Date WTRX-USD
0 2021-05-27 NaN
1 2021-05-28 NaN
2 2021-05-29 NaN
3 2021-05-30 NaN
4 2021-05-31 NaN
.. ... ...
79 NaN -2.311576
80 NaN 5.653349
81 NaN 5.052950
82 NaN -2.674435
83 NaN -3.082957
[450 rows x 2 columns]
My intention is to return something like this (where each append operation adds a column):
Date Open High Low Close Adj Close Volume
0 2021-05-27 0.130793 0.136629 0.124733 0.128665 0.128665 70936563
1 2021-05-28 0.128659 0.129724 0.111244 0.113855 0.113855 71391441
2 2021-05-29 0.113752 0.119396 0.108206 0.111285 0.111285 62049940
3 2021-05-30 0.111330 0.115755 0.107028 0.112185 0.112185 70101821
4 2021-05-31 0.112213 0.126197 0.111899 0.125617 0.125617 83502219
.. ... ... ... ... ... ... ...
361 2022-05-23 0.195637 0.201519 0.185224 0.185231 0.185231 47906144
362 2022-05-24 0.185242 0.190071 0.181249 0.189553 0.189553 33312065
363 2022-05-25 0.189550 0.193420 0.183710 0.183996 0.183996 33395138
364 2022-05-26 0.184006 0.186190 0.165384 0.170173 0.170173 57218888
365 2022-05-27 0.170636 0.170660 0.165052 0.166864 0.166864 63560568
[366 rows x 7 columns]
pandas.concat
pandas.DataFrame.append has been deprecated; use pandas.concat instead. Combine the DataFrame objects horizontally along the columns axis by passing axis=1:
copy_data = pd.concat([copy_data, perc], axis=1)
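Putting it together, a minimal sketch of the corrected loop, assuming the question's helpers (drop_col, files_path, get_data_frame, perc_df) behave as its comments describe and that files_path yields (path, name) pairs; open/close are renamed to avoid shadowing the builtins:
import pandas as pd

def processed_data(data1_, f_loc, open_col, close_col):
    """Build the overall dataframe, adding one computed column per file."""
    copy_data = drop_col(data1_)  # base frame to grow column by column
    for path, name in files_path(f_loc):
        data = get_data_frame(path)
        perc = perc_df(data, open_col, close_col, name)
        # Concatenate side by side; append would stack the rows underneath
        copy_data = pd.concat([copy_data, perc], axis=1)
    return copy_data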

How do I delete specific dataframe rows based on a columns value?

I have a pandas dataframe with 2 columns ("Date" and "Gross Margin"). I want to delete rows based on the value in the "Date" column. This is my dataframe:
Date Gross Margin
0 2021-03-31 44.79%
1 2020-12-31 44.53%
2 2020-09-30 44.47%
3 2020-06-30 44.36%
4 2020-03-31 43.69%
.. ... ...
57 2006-12-31 49.65%
58 2006-09-30 52.56%
59 2006-06-30 49.86%
60 2006-03-31 46.20%
61 2005-12-31 40.88%
I want to delete every row where the "Date" value doesn't end with "12-31". I read some similar posts on this, and the pandas drop() function seemed to be the solution, but I haven't figured out how to use it for this specific case.
Please leave any suggestions as to what I should do.
You can try the following code, where you match the day and month:
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df = df[df['Date'].dt.strftime('%m-%d') == '12-31']
Assuming you have the date formatted as year-month-day:
df = df[df['Date'].str.endswith('12-31')]
If the dates are using a consistent format, you can do it like this:
df = df[df['Date'].str.contains("12-31", regex=False)]
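A quick end-to-end check of the string filter, using a few of the sample rows from the question:
import pandas as pd

df = pd.DataFrame({'Date': ['2021-03-31', '2020-12-31', '2006-09-30', '2005-12-31'],
                   'Gross Margin': ['44.79%', '44.53%', '52.56%', '40.88%']})

# Keep only the year-end rows
df = df[df['Date'].str.endswith('12-31')]
print(df)  # only 2020-12-31 and 2005-12-31 remain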

How can I slice this dataframe?

I have a dataframe 'data' that looks like this:
<bound method NDFrame.head of Close ... Volume
A AA TSLA ... A AA TSLA
Date ...
2020-06-24 86.378616 11.14 960.849976 ... 1806600 7562700 10959600
2020-06-25 87.077148 11.83 985.979980 ... 1350100 6728600 9254500
2020-06-26 85.720001 10.93 959.739990 ... 2225800 25817600 8854900
2020-06-29 87.290001 10.99 1009.349976 ... 1302500 7397600 9026400
2020-06-30 88.370003 11.24 1079.810059 ... 1920200 5796600 16881600
[5 rows x 15 columns]>
Now, from this dataframe, I would like to get all the data for 'A' into a single dataframe.
I can do this via:
df2['Open'] = data['Open']['A']
df2['High'] = data['High']['A']
df2['Low'] = data['Low']['A']
etc.
And that works fine... However, there must be a smarter way to do this, right?
All help appreciated!
Sure, use DataFrame.xs for selecting in a MultiIndex:
df2 = data.xs('A', axis=1, level=1)
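For illustration, a self-contained sketch on a tiny frame shaped like the question's (values are a subset of the posted output):
import pandas as pd

# Columns are a MultiIndex of (field, ticker), as in a multi-ticker download
cols = pd.MultiIndex.from_product([['Close', 'Volume'], ['A', 'AA', 'TSLA']])
data = pd.DataFrame(
    [[86.378616, 11.14, 960.849976, 1806600, 7562700, 10959600]],
    index=pd.Index(['2020-06-24'], name='Date'),
    columns=cols,
)

# Cross-section: every field for ticker 'A', flattened into plain columns
df2 = data.xs('A', axis=1, level=1)
print(df2)  # columns are now just 'Close' and 'Volume'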

Write to a dataframe or excel/csv file without overlapping in loop

Basically my algorithm creates rows such as:
[1 rows x 84 columns]
Date 1990-12-31 1991-09-30 1991-12-31 1992-03-31 1992-06-30 ... 2017-06-30 2018-12-31 2019-09-30 2019-12-31 2020-03-31
AEP 28.0 30.625 34.25 30.75 31.875 ... 69.470001 74.739998 93.690002 94.510002 79.980003
[1 rows x 84 columns]
Date 1990-12-31 1991-09-30 1991-12-31 1992-03-31 1992-06-30 ... 2017-06-30 2018-12-31 2019-09-30 2019-12-31 2020-03-31
HON 6.435244 8.639912 10.457272 12.03629 12.810903 ... 127.751709 132.119995 169.199997 177.0 133.789993
[1 rows x 84 columns]
Date 1990-12-31 1991-09-30 1991-12-31 1992-03-31 1992-06-30 ... 2017-06-30 2018-12-31 2019-09-30 2019-12-31 2020-03-31
BMY 15.942265 19.689886 20.998581 18.14325 15.674578 ... 55.720001 51.98 50.709999 64.190002 55.740002
My issue is appending these rows together into one dataframe or Excel file.
The function that creates these rows is called by a loop over a list of tickers. The problem is that every time I try to append or write something to a file, it overwrites the previous ticker, so in the end I'm left with just variations of the BMY ticker.
This is the loop code; the function is ticker:
list=["CAT","CVX","BA","AEP","HON","BMY"]
for i in list:
ticker(i)
def ticker(tick):
    df = pd.read_csv(r"C:/Users/NAME/Desktop/S&P data/Data Compilation.csv")
    df1 = df.set_index(["Company Ticker"])
    abt = pd.read_csv(r"C:/Users/NAME/Desktop/S&P data/" + tick + "/" + tick + ".csv")
    abt1 = abt[['Close', "Date"]]
    # I tried a lot of methods to join; I manually inputted the dates I need.
    # The code then appends the ticker data Close & price into a new sheet in Data Compilation
    output = abt1.join(df1, how='left')
    output = output[output["Date"].isin([
        '2020-03-31',
        '2019-12-31', '2019-09-30', '2019-06-30', '2019-03-31',
        '2018-12-31', '2018-09-30', '2018-06-30', '2018-03-31',
        '2017-12-31', '2017-09-30', '2017-06-30', '2017-03-31',
        '2016-12-31', '2016-09-30', '2016-06-30', '2016-03-31',
        '2015-12-31', '2015-09-30', '2015-06-30', '2015-03-31',
        '2014-12-31', '2014-09-30', '2014-06-30', '2014-03-31',
        '2013-12-31', '2013-09-30', '2013-06-30', '2013-03-31',
        '2012-12-31', '2012-09-30', '2012-06-30', '2012-03-31',
        '2011-12-31', '2011-09-30', '2011-06-30', '2011-03-31',
        '2010-12-31', '2010-09-30', '2010-06-30', '2010-03-31',
        '2009-12-31', '2009-09-30', '2009-06-30', '2009-03-31',
        '2008-12-31', '2008-09-30', '2008-06-30', '2008-03-31',
        '2007-12-31', '2007-09-30', '2007-06-30', '2007-03-31',
        '2006-12-31', '2006-09-30', '2006-06-30', '2006-03-31',
        '2005-12-31', '2005-09-30', '2005-06-30', '2005-03-31',
        '2004-12-31', '2004-09-30', '2004-06-30', '2004-03-31',
        '2003-12-31', '2003-09-30', '2003-06-30', '2003-03-31',
        '2002-12-31', '2002-09-30', '2002-06-30', '2002-03-31',
        '2001-12-31', '2001-09-30', '2001-06-30', '2001-03-31',
        '2000-12-31', '2000-09-30', '2000-06-30', '2000-03-31',
        '1999-12-31', '1999-09-30', '1999-06-30', '1999-03-31',
        '1998-12-31', '1998-09-30', '1998-06-30', '1998-03-31',
        '1997-12-31', '1997-09-30', '1997-06-30', '1997-03-31',
        '1996-12-31', '1996-09-30', '1996-06-30', '1996-03-31',
        '1995-12-31', '1995-09-30', '1995-06-30', '1995-03-31',
        '1994-12-31', '1994-09-30', '1994-06-30', '1994-03-31',
        '1993-12-31', '1993-09-30', '1993-06-30', '1993-03-31',
        '1992-12-31', '1992-09-30', '1992-06-30', '1992-03-31',
        '1991-12-31', '1991-09-30', '1991-06-30', '1991-03-31',
        '1990-12-31', '1990-09-30', '1990-06-30', '1990-03-31',
    ])]
    output = output.pivot_table(values='Close', columns='Date', aggfunc='first')
    output = output.rename(index={"Close": tick})
    print(output)
    return output
If you want to merge the rows into one dataframe with the same columns, the code below may do the work:
df = pd.DataFrame()
list = ["CAT", "CVX", "BA", "AEP", "HON", "BMY"]
for i in list:
    responseDf = ticker(i)
    df = df.append(responseDf)
print(df)
df is your main dataframe; in each iteration, the result dataframe from the ticker function is added to the main dataframe by the append function.

Get data using row / col reference from two column values in another data frame

df1
Date APA AR BP-GB CDEV ... WLL WPX XEC XOM CL00-USA
0 2018-01-01 42.22 19.00 5.227 19.80 ... 26.48 14.07 122.01 83.64 60.42
1 2018-01-02 44.30 19.78 5.175 20.00 ... 27.37 14.31 125.51 85.03 60.37
2 2018-01-03 45.33 19.78 5.242 20.33 ... 27.99 14.39 126.20 86.70 61.63
3 2018-01-04 46.84 19.80 5.300 20.37 ... 28.11 14.44 128.66 86.82 62.01
4 2018-01-05 46.39 19.44 5.296 20.12 ... 27.79 14.24 127.82 86.75 61.44
df2
Date Ticker Event_Type Event_Description Price add
0 2018-11-19 XEC M&A REN 88.03 1
1 2018-03-28 CXO M&A RSPP 143.25 1
2 2018-08-14 FANG M&A EGN 133.75 1
3 2019-05-09 OXY M&A APC 56.33 1
4 2019-08-26 PDCE M&A SRCI 29.65 1
My goal is to update df2['add'] by using df2['Ticker'] and df2['Date'] to pull the value from df1 ... so, for example, the first row in df2 is XEC on 2018-11-19; the code needs to first look at df1['XEC'] and then pull the value that matches the 2018-11-19 row in df1['Date'].
My attempt was:
df_Events['add'] = df_Prices.loc[[df_Prices['Date']==df_Events['Date']],[df_Prices.columns==df_Events['Ticker']]]
Try:
df2 = df2.drop(columns='add').merge(
    df1.melt(id_vars='Date', var_name='Ticker', value_name='add'),
    on=['Date', 'Ticker'], how='left')
This melts df1's ticker columns into a single 'Ticker' column and then merges the melted values into df2 on Date and Ticker; dropping df2's placeholder 'add' column first lets the merged values take its place.
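A self-contained demo of the melt-and-merge on tiny frames built from the question's sample values (the two off-diagonal prices are invented filler):
import pandas as pd

df1 = pd.DataFrame({'Date': ['2018-11-19', '2018-03-28'],
                    'XEC': [88.03, 120.00],    # 88.03 is from the question
                    'CXO': [150.00, 143.25]})  # 143.25 is from the question
df2 = pd.DataFrame({'Date': ['2018-11-19', '2018-03-28'],
                    'Ticker': ['XEC', 'CXO'],
                    'add': [1, 1]})

# Wide -> long: one (Date, Ticker, add) row per price cell of df1
long = df1.melt(id_vars='Date', var_name='Ticker', value_name='add')
df2 = df2.drop(columns='add').merge(long, on=['Date', 'Ticker'], how='left')
print(df2)  # 'add' now holds 88.03 for XEC and 143.25 for CXO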
One more approach may be as below (I had started looking at it, so I am putting it here even though you have accepted the other answer).
First, convert the dates into datetime objects in both dataframes and set the date as the index only in the first one (code below):
df1['Date']=pd.to_datetime(df1['Date'])
df1.set_index('Date',inplace=True)
df2['Date']=pd.to_datetime(df2['Date'])
Then use apply to look up the value for each row:
df2['add'] = df2.apply(lambda x: df1.loc[x['Date'], x['Ticker']], axis=1)
This will work only if the dates and tickers in df2 all exist in df1; otherwise it will throw a KeyError.
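If some (Date, Ticker) pairs might be missing from df1, a guarded variant (a sketch using a small helper, not part of the original answer) returns NaN instead of raising:
import numpy as np

def safe_lookup(row):
    # Fall back to NaN when the date or ticker is absent from df1
    try:
        return df1.loc[row['Date'], row['Ticker']]
    except KeyError:
        return np.nan

df2['add'] = df2.apply(safe_lookup, axis=1)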
