pandas.DataFrame error when analyzing stock data - python

I was coding a Stock Analyzer program following this guide: https://towardsdatascience.com/in-12-minutes-stocks-analysis-with-pandas-and-scikit-learn-a8d8a7b50ee7
I got stuck on the part of the code which said
dfreg = df.loc[:,['Adj Close','Volume']]
dfreg['HL_PCT'] = (df['High'] - df['Low']) / df['Close'] * 100.0
dfreg['PCT_change'] = (df['Close'] - df['Open']) / df['Open'] * 100.0
First, it gave this error:
NameError: name 'df' is not defined
I changed it to pandas.DataFrame and it gave me this error:
TypeError: 'property' object is not subscriptable
I don't know how to fix this. Please help.

Did you run this first?
import pandas as pd
import datetime
import pandas_datareader.data as web
from pandas import Series, DataFrame
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2017, 1, 11)
df = web.DataReader("AAPL", 'yahoo', start, end)
df.tail()
If df.tail() doesn't show you the DataFrame, clear your workspace and try again. The NameError suggests that the DataFrame df was never loaded correctly.
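To see why swapping df for pandas.DataFrame produced the second error, here is a minimal offline sketch. The prices are made up for illustration and stand in for what DataReader would return; the point is only that .loc works on a DataFrame instance, while on the class itself it is a plain property object.

```python
import pandas as pd

# Made-up prices for illustration only; real data would come from DataReader.
df = pd.DataFrame(
    {
        "High": [11.0, 12.0],
        "Low": [9.0, 10.0],
        "Open": [8.0, 10.0],
        "Close": [10.0, 10.0],
        "Adj Close": [10.0, 10.0],
        "Volume": [1000, 1500],
    }
)

# Works: df is a DataFrame instance, so .loc is an indexer.
dfreg = df.loc[:, ["Adj Close", "Volume"]].copy()
dfreg["HL_PCT"] = (df["High"] - df["Low"]) / df["Close"] * 100.0
dfreg["PCT_change"] = (df["Close"] - df["Open"]) / df["Open"] * 100.0
print(dfreg)

# Fails: on the class itself, .loc is just a property object.
try:
    pd.DataFrame.loc[:, ["Adj Close", "Volume"]]
    class_error = None
except TypeError as exc:
    class_error = str(exc)
print(class_error)
```

So the fix is to make sure df is actually created (the DataReader call above) before the dfreg lines run, not to replace df with the class name.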

Related

Error while subtracting 2 date columns in pandas

I have a dataframe and a function to get random dates..
from datetime import date, timedelta
import pandas as pd
import random
def dates(start_date, end_date):
    start_date = date(start_date[0], start_date[1], start_date[2])
    end_date = date(end_date[0], end_date[1], end_date[2])
    days_delta = (end_date - start_date).days
    return start_date + timedelta(days=random.randrange(days_delta))
df = pd.DataFrame(index=range(100))
df['MOVE_OUT_DATE'] = date(9999, 12, 31)
df['MOVE_IN_DATE'] = [dates((2021, 1, 1), (2021, 6, 30)) for _ in range(df.shape[0])]
To get the difference in days I do this,
df['days_diff'] = df['MOVE_OUT_DATE'] - df['MOVE_IN_DATE']
and this works fine in VS Code. But in Databricks it raises "Python int too large to convert to C long".
Any help or suggestion is appreciated. Thank you.
I was able to get everything to work, and I believe this is what you are trying to accomplish with your code:
df = pd.DataFrame(pd.date_range('2021-01-01', '2021-06-01', freq = 'D'), columns = ['START_DATE'])
df['MOVE_OUT_DATE'] = '2260-12-31'
df['START_DATE'] = pd.to_datetime(df['START_DATE'])
df['MOVE_OUT_DATE'] = pd.to_datetime(df['MOVE_OUT_DATE'])
df['DAYS_DIFF'] = df['MOVE_OUT_DATE'] - df['START_DATE']
df
However, notice that 'MOVE_OUT_DATE' is only set to the year 2260, as anything much later produced an error for being too large (pandas timestamps with nanosecond resolution cannot go past 2262-04-11). Could you take this and generate the results you want, for example by converting it into a function?
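If the 9999-12-31 sentinel has to stay, one possible workaround is to keep the values as plain datetime.date objects and subtract them in Python instead of converting to pandas timestamps. This is a sketch under that assumption, using two hand-picked move-in dates rather than the random ones from the question.

```python
from datetime import date

import pandas as pd

# datetime64[ns] values are 64-bit nanosecond counts, so the largest
# representable pandas timestamp falls in the year 2262.
print(pd.Timestamp.max)

# Hand-picked dates for illustration; the column keeps object dtype.
df = pd.DataFrame({"MOVE_IN_DATE": [date(2021, 1, 1), date(2021, 6, 30)]})
sentinel = date(9999, 12, 31)

# Subtract plain date objects one at a time. Python's timedelta can hold
# a span of several thousand years, unlike a nanosecond-based Timedelta.
df["days_diff"] = [(sentinel - d).days for d in df["MOVE_IN_DATE"]]
print(df)
```

The result is an ordinary integer column, at the cost of a Python-level loop instead of a vectorized subtraction.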

AssertionError: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>

I got an assertion error for the following code:
import pandas as pd
import datetime as dt
import pandas_datareader.data as web
stocks = ['AMZN','TCEHY']
start = dt.datetime(2019, 6, 1)
end = dt.datetime(2020, 6, 1)
data = web.DataReader(stocks,data_source='yahoo',start=start, end= end)['Adj Close']
This is the error message that I got:
> AssertionError: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
Any feedback is appreciated. Thanks.

KeyError 'Date' when using pandas datareader

I am trying to get the value of Bitcoin from yahoo finance using pandas data reader, and then save this data to a csv file. Where is the error here, and how do I fix it?
import pandas as pd
import pandas_datareader.data as web
start = dt.datetime(2017, 1, 1)
end = dt.datetime(2019, 11, 30)
df = web.DataReader('BTC', 'yahoo', start, end)
df.to_csv('BTC.csv')
print(df.head())
This was coded in Spyder with Python 3.7, if that is relevant.
This should work. Use 'BTC-USD' as the ticker symbol (the code below also adds the missing datetime import):
import pandas as pd
import pandas_datareader.data as web
import datetime as dt
start = dt.datetime(2017, 1, 1)
end = dt.datetime(2019, 11, 30)
df = web.DataReader('BTC-USD', 'yahoo', start, end)
df.to_csv('BTC.csv')
print(df.head())
or
df = web.get_data_yahoo('BTC-USD', start, end)
I received the same KeyError 'Date' when using pandas datareader and found two errors in my script. Fixing them resolved the issue:
The name of the entity was incorrect, for example using 'APPL' instead of 'AAPL'.
There was no data for the date parameters I was using.
Hope this helps!
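Since both causes above surface as the same bare KeyError, a thin wrapper can make the failure easier to read. The sketch below is only an illustration: fetch_prices and fake_fetch are hypothetical names, and fake_fetch stands in for the real downloader so that the example runs offline.

```python
import datetime as dt


def fetch_prices(fetch, symbol, start, end):
    """Call fetch(symbol, start, end) and report a clearer error on failure.

    `fetch` is any callable with that signature (hypothetical helper).
    """
    if start >= end:
        raise ValueError(f"start ({start}) must be earlier than end ({end})")
    try:
        return fetch(symbol, start, end)
    except KeyError as exc:
        raise ValueError(
            f"No data came back for {symbol!r} between {start:%Y-%m-%d} and "
            f"{end:%Y-%m-%d}; check the ticker spelling and the date range"
        ) from exc


# Stand-in for the real downloader so the sketch runs offline (hypothetical).
def fake_fetch(symbol, start, end):
    raise KeyError("Date")


try:
    fetch_prices(fake_fetch, "APPL", dt.datetime(2017, 1, 1), dt.datetime(2019, 11, 30))
except ValueError as err:
    message = str(err)
    print(message)
```

With the real library, the call would look like fetch_prices(web.get_data_yahoo, 'BTC-USD', start, end), assuming that function accepts the symbol, start and end positionally as in the answer above.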

Python error: cannot add integral value to Timestamp without freq

I am trying to calculate the difference between two dates as an integer number of days, but I get the following error: "Cannot add integral value to Timestamp without freq". Here is the code:
from __future__ import print_function
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None
import os
import datetime
import pandas_datareader.data as web
import numpy as np
import pandas as pd
def main():
    count = 0
    df = pd.DataFrame([])
    start = datetime.datetime(2017, 10, 11)
    end = datetime.datetime(2017, 10, 27)
    index_date = datetime.datetime(2017, 10, 11)
    symbols_list = ['ORCL', 'TSLA', 'IBM','YELP', 'MSFT']
    length = len(symbols_list)
    for num, ticker in enumerate(symbols_list, start=1):
        f = web.DataReader(ticker, 'yahoo', start, end)['Adj Close']
        f.ix[index_date]
        if count == 0:
            f = f.to_frame().reset_index()
            df = f
            df.columns = ['Date', ticker]
            length_df = len(df)
            sDate = df.iloc[:,-2] # Date data list
            print ('sDate[0] is: ', (sDate[0]))
            j = 0
            while j < len(sDate[j] - 1):
                date_delta = timedelta(sDate[j] - index_date)
                j += 1
It crashes at the last line:
date_delta = timedelta(sDate[j] - index_reference_date)
The error message is: "Cannot add integral value to Timestamp without freq".
I cannot understand what the problem is. The data types are:
sDate[0] is: 2017-10-06 00:00:00, and
index_date is: 2017-10-11 00:00:00
index_date type is: <type 'datetime.datetime'>
But note that:
sDate[0] type is: <class 'pandas._libs.tslib.Timestamp'>
So: Maybe the problem is here? Thanks for any help!
There is a typo on this line:
while j < len(sDate[j] - 1):
sDate is a list of dates, so sDate[j] is a single date (probably of type pandas.tslib.Timestamp). Subtracting 1 from it is what raises the error, and taking its length does not make sense either. You probably want something like:
while j < len(sDate) - 1:
Maybe it's more appropriate to use a for loop, something like:
for dat in sDate[:-1]:
Edit: and then you need the things I wrote in my comment on the first answer.
The important thing may be the type of the difference sDate[j] - index_reference_date and how to pass it to the timedelta constructor.
I believe this could be the solution (note that .delta is expressed in nanoseconds, so it has to be divided by 1000 before being passed as microseconds):
date_delta = timedelta(microseconds=(sDate[j] - index_reference_date).delta / 1000)
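As an alternative that avoids both the loop and the timedelta constructor, the subtraction can be done on the whole column at once. This is a sketch with three hand-written dates standing in for the downloaded 'Date' column; index_date is the reference date from the question.

```python
import datetime

import pandas as pd

index_date = datetime.datetime(2017, 10, 11)

# Hand-written dates standing in for the 'Date' column from the download.
sDate = pd.Series(pd.to_datetime(["2017-10-06", "2017-10-11", "2017-10-13"]))

# Timestamp minus datetime already yields a Timedelta; .days is the integer part.
first_delta = (sDate[0] - index_date).days

# The same subtraction applied to the whole column, without an explicit loop.
day_deltas = (sDate - index_date).dt.days
print(first_delta)
print(day_deltas.tolist())
```

Each difference here is a pandas Timedelta, so its .days attribute gives the integer day count directly.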

Get the Week Number between two Dates Pandas

I have a basic code snippet that I need to recreate in pandas:
import datetime as dt
date1 = dt.date(2016,10,10)
date2 = dt.date.today()
print('Week#', round((date2 - date1).days / 7 +.5))
# output: Week# 36
I have a datetime64[ns] datatype column and I cannot crack it. Using this basic example I'm stumped:
import pandas as pd
import numpy as np
import datetime as dt
dfp = pd.DataFrame({'A' : [dt.date(2016,10,6)]})
dfp['A'] = pd.to_datetime(dfp['A'])
def week(col):
    print((col.dt.date - dt.date(2015, 10, 6)))
week(dfp['A']) #output: 366 days
When I try to re-create the week number calculation I run into multiple errors. For example, print((col.dt.date - dt.date(2015, 10, 6)).days) raises AttributeError: 'Series' object has no attribute 'days'
I'd like to try and solve this on my own so I can learn from the pain.
How do I return the pandas column values as a number of days, that is, as an int like in the first code snippet? (i.e., instead of 366 days, just 366)
If you're feeling adventurous, how could I get the Week# xxx output in a more efficient way?
I think you forgot .dt:
dfp = pd.DataFrame({'A' : [date2]})
dfp['A'] = pd.to_datetime(dfp['A'])
print (dfp)
print (((dfp['A'].dt.date - dt.date(2016, 10, 10)).dt.days / 7 + .5).round().astype(int))
0 36
Name: A, dtype: int32
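The same calculation can also be wrapped in a small function that stays in datetime64 throughout, which skips the conversion to Python date objects. This is a sketch with fixed example dates so the result does not depend on today's date; week_number is a hypothetical helper name.

```python
import pandas as pd


def week_number(col, start):
    """Weeks elapsed since `start`, using the rounding formula from the question."""
    days = (col - pd.Timestamp(start)).dt.days  # integer days per row
    return (days / 7 + 0.5).round().astype(int)


# Fixed example dates instead of dt.date.today(), for reproducibility.
dfp = pd.DataFrame({"A": pd.to_datetime(["2016-10-12", "2016-10-20", "2017-06-15"])})
weeks = week_number(dfp["A"], "2016-10-10")
print(weeks.tolist())
```

The .dt.days accessor is what turns the timedelta column into plain integers, which answers the "366 instead of 366 days" part of the question as well.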
