Related
I'm trying to run multiple event studies in Python with the eventstudy package, but I keep getting the same date error no matter what I try (different date formats, naming or not naming the columns, setting the date_format parameter, ...).
Would you know what is wrong, or how else I could run these multiple event studies?
This is my current code:
import eventstudy as es
import pandas as pd
returns = "C:/Users/Artur Andrade/OneDrive/Documents/_Others/Monografia/Eu/Base de dados/RETURNS.csv"
events = "C:/Users/Artur Andrade/OneDrive/Documents/_Others/Monografia/Eu/Base de dados/TESTE.csv"
es.Single.import_returns("C:/Users/Artur Andrade/OneDrive/Documents/_Others/Monografia/Eu/Base de dados/RETURNS.csv", is_price=True)
energy = es.Multiple.from_csv(
    path = "C:/Users/Artur Andrade/OneDrive/Documents/_Others/Monografia/Eu/Base de dados/TESTE.csv",
    event_study_model = es.Single.market_model,
    event_window = (-5,+10),
    estimation_size = 100,
    buffer_size = 30,
    ignore_errors = True
)
energy.results()
energy.plot()
And I keep getting this error:
Traceback (most recent call last):
File "C:\Users\Artur Andrade\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'date'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Artur Andrade\OneDrive\Documents\_Others\Monografia\Eu\Base de dados\codigo\evento.py", line 11, in <module>
es.Single.import_returns("C:/Users/Artur Andrade/OneDrive/Documents/_Others/Monografia/Eu/Base de dados/RETURNS.csv", is_price=True)
File "C:\Users\Artur Andrade\AppData\Local\Programs\Python\Python310\lib\site-packages\eventstudy\single.py", line 327, in import_returns
data = read_csv(path, format_date=True, date_format=date_format)
File "C:\Users\Artur Andrade\AppData\Local\Programs\Python\Python310\lib\site-packages\eventstudy\utils.py", line 112, in read_csv
df[date_column] = pd.to_datetime(df[date_column], format=date_format)
File "C:\Users\Artur Andrade\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\Artur Andrade\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 'date'
My data files currently look like this:
RETURNS.csv:
Date,IBOV,IEE,AESB3,EGIE3,ENBR3,EQTL3,MEGA3,NEOE3,ENEV3
2017-01-02,"59,589","36,062",14, ,11.18,10.53, , ,2.97
.
.
.
2022-09-23,"111,716","82,969",9.64,40.7,23.71,26.97,10.72,16.48,15.34
2022-09-26,"109,114","80,913",9.6,39.7,23.16,26.71,10.42,16.11,14.81
2022-09-27,"108,376","79,121",9.55,39.07,22.72,26.15,10.36,15.72,14.37
2022-09-28,"108,451","78,427",9.44,38.54,22.04,26.1,10.59,15.35,14.26
2022-09-29,"107,664","77,950",9.37,38.29,21.72,26.01,10.38,15.23,14.45
TESTE.csv:
security_ticker,market_ticker,event_date
AESB3,IBOV,2017-01-13
.
.
.
ENBR3,IBOV,2022-04-20
EQTL3,IBOV,2021-06-02
MEGA3,IBOV,2022-07-04
ENEV3,IBOV,2021-12-15
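One possible reading of the traceback above: the package looks up a column named `date` (lower case), while the header in RETURNS.csv is `Date`. The sketch below uses only pandas to rewrite such a file with a lower-case `date` header and numeric price columns. The helper name `normalise_returns_csv` and the sample rows are made up for illustration, and treating the comma in "111,716" as a thousands separator is an assumption about the data. Whether eventstudy accepts the rewritten file is not verified here.

```python
import io

import pandas as pd


def normalise_returns_csv(source, target):
    """Rewrite a price CSV with a lower-case `date` header and numeric columns."""
    # thousands="," parses quoted values such as "111,716" as 111716;
    # skipinitialspace=True turns blank " " fields into empty (missing) fields.
    df = pd.read_csv(source, thousands=",", skipinitialspace=True)
    df = df.rename(columns={"Date": "date"})
    # Coerce anything that still is not numeric to NaN instead of failing.
    value_cols = df.columns.drop("date")
    df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")
    df.to_csv(target, index=False)
    return df


# Hypothetical two-row sample in the same layout as RETURNS.csv.
sample = io.StringIO(
    'Date,IBOV,IEE,AESB3,EGIE3\n'
    '2022-09-23,"111,716","82,969",9.64,40.7\n'
    '2022-09-26,"109,114","80,913",9.6, \n'
)
cleaned = io.StringIO()
frame = normalise_returns_csv(sample, cleaned)
print(cleaned.getvalue())
```

With real files, `source` and `target` would be paths, and the rewritten file would be the one passed to `es.Single.import_returns`.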
I want to create a Polygon from a list of coordinates:
import pandas as pd
from shapely.geometry import Point, Polygon
data = pd.read_csv('path.csv', sep=';')
The data is in the following format:
Suburb  features_geometry_x  features_geometry_y
1       50.941840            6.9595637
1       50.941845            6.9595698
3       50.94182             6.9595632
4       50.9418837           6.9595958
with several rows each for suburbs 1, 3 and 4.
#create a polygon
I = data.loc[data['Suburb'] == 1]
I['coordinates'] = list(zip(I['features_geometry_x'], I['features_geometry_y']))
poly_i = Polygon(I['coordinates'])
The code above works fine, but if I do the same thing for suburbs 3 and 4, it yields the following error:
L = data.loc[data['Suburb'] == 3]
L['coordinates'] = list(zip(L['features_geometry_x'], L['features_geometry_y']))
poly_l = Polygon(L['coordinates'])
File "shapely/speedups/_speedups.pyx", line 252, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5487, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute '__array_interface__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/folders/j6/wgg72kmx145f3krf14nzjfq40000gn/T/ipykernel_4092/214655495.py", line 3, in <module>
poly_l = Polygon(Lindenthal['coordinates'])
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 261, in __init__
ret = geos_polygon_from_py(shell, holes)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 539, in geos_polygon_from_py
ret = geos_linearring_from_py(shell)
File "shapely/speedups/_speedups.pyx", line 344, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 942, in __getitem__
return self._get_value(key)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
Please help :)
I think the issue here is that you need more than one data point to create a polygon (at least three distinct points), whereas in your sample suburbs 3 and 4 each have only a single point.
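The `KeyError: 0` in the traceback is raised inside `Series.__getitem__`, which hints at a second possible cause: the Series is being indexed by the label 0, and rows filtered for suburb 3 or 4 keep their original index labels, so the label 0 may not exist. A minimal sketch, assuming that reading is right, is to hand shapely a plain list instead of a Series and to check the point count first. The helper name `suburb_coordinates` and the sample values are made up for illustration.

```python
import pandas as pd


def suburb_coordinates(data, suburb):
    """Return the (x, y) tuples of one suburb as a plain Python list."""
    rows = data.loc[data["Suburb"] == suburb]
    coords = list(zip(rows["features_geometry_x"], rows["features_geometry_y"]))
    if len(coords) < 3:
        raise ValueError(
            f"suburb {suburb} has {len(coords)} point(s); a polygon needs at least 3"
        )
    return coords


# Hypothetical data: three points each for suburbs 1 and 3, one for suburb 4.
data = pd.DataFrame({
    "Suburb": [1, 1, 1, 3, 3, 3, 4],
    "features_geometry_x": [50.9418, 50.9419, 50.9420, 50.9430, 50.9431, 50.9432, 50.9440],
    "features_geometry_y": [6.9595, 6.9597, 6.9594, 6.9600, 6.9603, 6.9599, 6.9610],
})

coords_3 = suburb_coordinates(data, 3)
print(coords_3)
# With shapely installed, the list can be passed on directly:
# poly_l = Polygon(coords_3)
```

Because `coords_3` is a list, positional access such as `coords_3[0]` no longer depends on the DataFrame's index labels.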
I'm trying to calculate some variables from yfinance data using the column df['Close'],
but I'm getting an error that I have not seen before. Here is the code:
import os
import pandas as pd
import plotly.graph_objects as go
# Symbols whose data should be printed
symbols = ['AAPL']
for filename in os.listdir('datasets/'):
    # print(filename)
    symbol = filename.split('.')[0]
    # print(symbol)
    df = pd.read_csv('datasets/{}'.format(filename))
    if df.empty:
        continue
    # 20-period simple moving average and rolling standard deviation
    df['20_sma'] = df['Close'].rolling(window=20).mean()
    df['stddev'] = df['Close'].rolling(window=20).std()
    # The lower band lies below the average, the upper band above it
    df['lowerband'] = df['20_sma'] - (2 * df['stddev'])
    df['upperband'] = df['20_sma'] + (2 * df['stddev'])
    if symbol in symbols:
        print(df)
And here is the error message:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Close'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/Kit/Documents/TTM_squeezer/squeeze.py", line 16, in <module>
df['20_sma'] = df['Close'].rolling(window=20).mean()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 2906, in __getitem__
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
raise KeyError(key) from err
KeyError: 'Close'
It seems the 'Close' column is causing this error, but I just can't figure out why.
Many thanks.
It turns out there was an error in the step where the local file was saved.
Case closed, thanks all.
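Since the cause was a badly saved file, a check on the column before computing anything can make this kind of problem easier to spot. The following is a minimal sketch, not the poster's code: the function name `add_bollinger_bands` and its parameters are made up for illustration, and the sample prices are synthetic.

```python
import pandas as pd


def add_bollinger_bands(df, column="Close", window=20, width=2):
    """Return a copy of df with moving average, std dev and band columns."""
    if column not in df.columns:
        # Show what the file actually contains instead of a bare KeyError.
        raise KeyError(f"{column!r} not found; columns are {list(df.columns)}")
    out = df.copy()
    rolling = out[column].rolling(window=window)
    out[f"{window}_sma"] = rolling.mean()
    out["stddev"] = rolling.std()
    out["lowerband"] = out[f"{window}_sma"] - width * out["stddev"]
    out["upperband"] = out[f"{window}_sma"] + width * out["stddev"]
    return out


# Synthetic prices 1.0 .. 25.0, only to exercise the function.
prices = pd.DataFrame({"Close": [float(i) for i in range(1, 26)]})
bands = add_bollinger_bands(prices)
print(bands.tail())

# A frame without a 'Close' column now fails with a descriptive message.
try:
    add_bollinger_bands(pd.DataFrame({"close": [1.0, 2.0]}))
except KeyError as exc:
    print("Problem:", exc)
```

Printing `list(df.columns)` right after `pd.read_csv` gives the same information for a single file.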
Following script:
import pandas as pd
import numpy as np
import math
A = pd.DataFrame(np.array([[1,2,3,4],[5,6,7,8]]))
Floor1 = math.floor(A.min()[1]/2)*2
names = np.array([ 0. , 0.635, 1.27 , 1.905])
A.columns = names
Floor2 = math.floor(A.min()[1]/2)*2
Floor1 is computed correctly, but Floor2, which uses the same DataFrame with renamed columns, is not. I get a KeyError:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 385, in pandas._libs.hashtable.Float64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 392, in pandas._libs.hashtable.Float64HashTable.get_item
KeyError: 1.0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Desktop\Python\untitled0.py", line 13, in <module>
Floor2 = math.floor(A.min()[1]/2)*2
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
result = self.index.get_value(self, key)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\numeric.py", line 449, in get_value
loc = self.get_loc(k)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\numeric.py", line 508, in get_loc
return super().get_loc(key, method=method, tolerance=tolerance)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 385, in pandas._libs.hashtable.Float64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 392, in pandas._libs.hashtable.Float64HashTable.get_item
KeyError: 1.0
I know there is a similar question: After rename column get keyerror
But I didn't really understand the answer or, more importantly, how to solve the problem.
Before renaming, list(A.columns) returns [0, 1, 2, 3], so you can index the result of A.min() with the key 1. After renaming, the column labels are 0.0, 0.635, 1.27 and 1.905, so the key 1 no longer exists.
A.min() finds the minimum along axis=0 by default, that is, one minimum per column, and the result is indexed by the column labels.
Once the column names have changed, you cannot access the label 1, because no column has that name.
If your intention is to find the minimum of each row, you can use A.min(axis=1); its result is indexed by the row labels 0 and 1, which the renaming does not touch.
You can write the code like this:
import pandas as pd
import numpy as np
import math
A = pd.DataFrame(np.array([[1,2,3,4],[5,6,7,8]]))
Floor1 = math.floor(A.min(axis=1)[1]/2)*2
names = np.array([ 0. , 0.635, 1.27 , 1.905])
A.columns = names
Floor2 = math.floor(A.min(axis=1)[1]/2)*2
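If the goal is instead to keep the original meaning (the minimum of the second column), positional access with .iloc, or access by the new label, are two options. A minimal sketch using the same DataFrame as the question; the variable names `by_position` and `by_label` are made up for illustration.

```python
import math

import numpy as np
import pandas as pd

A = pd.DataFrame(np.array([[1, 2, 3, 4], [5, 6, 7, 8]]))
A.columns = np.array([0.0, 0.635, 1.27, 1.905])

col_min = A.min()  # one minimum per column, indexed by the column labels

# Positional access: second column, whatever its label is.
by_position = math.floor(col_min.iloc[1] / 2) * 2

# Label access: use the new column name instead of the old key 1.
by_label = math.floor(col_min[0.635] / 2) * 2

print(by_position, by_label)
```

Both expressions read the minimum of the second column, so they match the value that Floor1 had before the rename.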
Thank you
I want to download all S&P 500 stocks into a folder in CSV format.
Scanning the S&P 500 mostly works, but in some cases the 'Date' index seems to be missing, either because the stock does not exist or because it has no data for the requested period. I tried changing the start and end dates, but it had no effect. In an earlier post I was told to filter those cases with an exception handler, but since Python is new territory for me, I did not understand how. Can someone help me?
This is the error that occurs:
/home/mu351i/PycharmProjects/untitled/venv/bin/python /home/mu351i/PycharmProjects/untitled/get_sp500_beautifulsoup_intro.py
Traceback (most recent call last):
File "/home/mu351i/PycharmProjects/untitled/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Date'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mu351i/PycharmProjects/untitled/get_sp500_beautifulsoup_intro.py", line 44, in get_data_from_yahoo
df = web.DataReader (ticker, 'yahoo', start, end)
File "/home/mu351i/PycharmProjects/untitled/venv/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
return func(*args, **kwargs)
File "/home/mu351i/PycharmProjects/untitled/venv/lib/python3.7/site-packages/pandas_datareader/data.py", line 387, in DataReader
session=session,
File "/home/mu351i/PycharmProjects/untitled/venv/lib/python3.7/site-packages/pandas_datareader/base.py", line 251, in read
df = self._read_one_data(self.url, params=self._get_params(self.symbols))
File "/home/mu351i/PycharmProjects/untitled/venv/lib/python3.7/site-packages/pandas_datareader/yahoo/daily.py", line 165, in _read_one_data
prices["Date"] = to_datetime(to_datetime(prices["Date"], unit="s").dt.date)
File "/home/mu351i/PycharmProjects/untitled/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2995, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/mu351i/PycharmProjects/untitled/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Date'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mu351i/PycharmProjects/untitled/get_sp500_beautifulsoup_intro.py", line 57, in <module>
get_data_from_yahoo()
File "/home/mu351i/PycharmProjects/untitled/get_sp500_beautifulsoup_intro.py", line 48, in get_data_from_yahoo
except RemoteDataError:
NameError: name 'RemoteDataError' is not defined
Process finished with exit code 1
How would you avoid this by changing the code below?
import datetime as dt
import os
import pickle
import bs4 as bs
import pandas_datareader.data as web
import requests
def safe_sp500_tickers():
    # Scrape the ticker symbols from the Wikipedia table and cache them in a pickle file
    resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text.strip()
        tickers.append(ticker)
    with open('sp500tickers.pickle', 'wb') as f:
        pickle.dump(tickers, f)
    return tickers
safe_sp500_tickers()
def get_data_from_yahoo(reload_sp500=False):
    if reload_sp500:
        tickers = safe_sp500_tickers()
    else:
        with open('sp500tickers.pickle', 'rb') as f:
            tickers = pickle.load(f)
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')
    start = dt.datetime(1999, 1, 1)
    end = dt.datetime(2019, 12, 19)
    for ticker in tickers:
        try:
            # Download only the tickers that have not been saved yet
            if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
                df = web.DataReader(ticker, 'yahoo', start, end)
                df.to_csv('stock_dfs/{}.csv'.format(ticker))
            else:
                print("Ticker from {} already available".format(ticker))
        except RemoteDataError:
            print("No information for ticker '%s'" % ticker)
            continue
        except KeyError:
            print("no Date for Ticker: " + ticker)
            continue
get_data_from_yahoo()
A commenter asked for a data sample; this is the data from TSLA.csv:
Date,High,Low,Open,Close,Volume,Adj Close
2010-06-29,25.0,17.540000915527344,19.0,23.889999389648438,18766300,23.889999389648438
2010-06-30,30.420000076293945,23.299999237060547,25.790000915527344,23.829999923706055,17187100,23.829999923706055
2010-07-01,25.920000076293945,20.270000457763672,25.0,21.959999084472656,8218800,21.959999084472656
2010-07-02,23.100000381469727,18.709999084472656,23.0,19.200000762939453,5139800,19.200000762939453
2010-07-06,20.0,15.829999923706055,20.0,16.110000610351562,6866900,16.110000610351562
2010-07-07,16.6299991607666,14.979999542236328,16.399999618530273,15.800000190734863,6921700,15.800000190734863
2010-07-08,17.520000457763672,15.569999694824219,16.139999389648438,17.459999084472656,7711400,17.459999084472656
2010-07-09,17.899999618530273,16.549999237060547,17.579999923706055,17.399999618530273,4050600,17.399999618530273
2010-07-12,18.06999969482422,17.0,17.950000762939453,17.049999237060547,2202500,17.049999237060547
2010-07-13,18.639999389648438,16.899999618530273,17.389999389648438,18.139999389648438,2680100,18.139999389648438
2010-07-14,20.149999618530273,17.760000228881836,17.940000534057617,19.84000015258789,4195200,19.84000015258789
2010-07-15,21.5,19.0,19.940000534057617,19.889999389648438,3739800,19.889999389648438
2010-07-16,21.299999237060547,20.049999237060547,20.700000762939453,20.639999389648438,2621300,20.639999389648438
Please provide constructive feedback, because I'm new here.
Thanks :)
You are missing an import.
Add the following import at the top of your script:
from pandas_datareader._utils import RemoteDataError
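With the import in place, the loop can skip tickers that fail instead of stopping. The sketch below shows that pattern without any network access: `download_all`, `fetch` and `fake_fetch` are hypothetical names, and `fetch` stands in for a call such as `web.DataReader(ticker, 'yahoo', start, end)`.

```python
def download_all(tickers, fetch, skip_on=(KeyError,)):
    """Call fetch(ticker) for every ticker and skip the ones that raise."""
    frames = {}
    skipped = []
    for ticker in tickers:
        try:
            frames[ticker] = fetch(ticker)
        except skip_on as exc:
            # Report the failure and move on to the next ticker.
            print("Skipping {}: {!r}".format(ticker, exc))
            skipped.append(ticker)
    return frames, skipped


def fake_fetch(ticker):
    """Stand-in for the real download; fails the way a missing 'Date' does."""
    if ticker == "BAD":
        raise KeyError("Date")
    return "data for " + ticker


frames, skipped = download_all(["AAPL", "BAD", "TSLA"], fake_fetch)
print(sorted(frames), skipped)
```

In the original script, `skip_on` would presumably be `(KeyError, RemoteDataError)` so that both failure modes from the traceback are handled.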
import pandas as pd
df = pd.read_html(
"https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")[0]
sort = pd.DataFrame(df).sort_values(by=['Date first added'])
sort['Date first added'] = pd.to_datetime(sort['Date first added'])
start_date = '1-1-1999'
end_date = '11-12-2019'
mask = (sort['Date first added'] > start_date) & (
sort['Date first added'] <= end_date)
sort = sort.loc[mask]
pd.DataFrame(sort).to_csv('result.csv', index=False)