Work with data in python and numpy/pandas

I started learning how to work with data in Python and wanted to load multiple securities, but I get an error that I cannot fix. Could someone tell me what the problem is?
import numpy as np
import pandas as pd
from pandas_datareader import data as wb
import matplotlib.pyplot as plt
tickers = ['PG', 'MSFT', 'F', 'GE']
mydata = pd.DataFrame()
for t in tickers:
    mydata[t] = wb.DataReader(t, data_source='yahoo', start='1955-1-1')

You need two fixes here:
1) 1955 is too early for this data source; try 1971 or later.
2) wb.DataReader(t, data_source='yahoo', start='1971-1-1') returns a DataFrame with several columns (High, Low, Open, Close, Volume, Adj Close), so it cannot be assigned to mydata[t] as a single Series. Use a dictionary as in the other answer, or keep only the closing prices:
mydata[t] = wb.DataReader(t, data_source='yahoo', start='2010-1-1')['Close']

First of all please do not share information as images unless absolutely necessary.
See: this link
Now here is a solution to your problem. You are using the year 1955, but data may not be available that far back, or there may be some other issue; with a suitable start year it will work. Also, DataReader returns a DataFrame, so it cannot be assigned to a single column of another DataFrame. Instead of making mydata a DataFrame, make it a dictionary and store each ticker's DataFrame in it.
Here is the improved code (choose the start year carefully):
import numpy as np
import pandas as pd
from pandas_datareader import data as wb
import matplotlib.pyplot as plt
from datetime import datetime as dt
tickers = ['PG', 'MSFT', 'F', 'GE']
mydata = {}
for t in tickers:
    mydata[t] = wb.DataReader(t, data_source='yahoo', start=dt(2019, 1, 1), end=dt.now())
Output
mydata['PG']
                 High        Low       Open      Close      Volume  Adj Close
Date
2018-12-31  92.180000  91.150002  91.629997  91.919998   7239500.0  88.877655
2019-01-02  91.389999  89.930000  91.029999  91.279999   9843900.0  88.258835
2019-01-03  92.500000  90.379997  90.940002  90.639999   9820200.0  87.640022
2019-01-04  92.489998  90.370003  90.839996  92.489998  10565700.0  89.428787
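
If a single table of closing prices is the goal, a dictionary of per-ticker frames like mydata can be collapsed afterwards. A minimal sketch, assuming each frame has a 'Close' column and a date index as in the output shown; the sample frames and the helper name close_prices are made up for illustration so the snippet runs without a network call:

```python
import pandas as pd

def close_prices(frames):
    """Return one DataFrame with a 'Close' column per ticker (hypothetical helper)."""
    # pd.DataFrame aligns the Series on their date index; missing dates become NaN
    return pd.DataFrame({ticker: frame["Close"] for ticker, frame in frames.items()})

# Made-up stand-ins for the frames returned by DataReader
dates = pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-04"])
sample = {
    "PG": pd.DataFrame({"Close": [91.28, 90.64, 92.49], "Volume": [1, 2, 3]}, index=dates),
    "MSFT": pd.DataFrame({"Close": [101.12, 97.40]}, index=dates[:2]),
}
closes = close_prices(sample)
print(closes)
```

Tickers with shorter histories simply show NaN for the dates they lack, which avoids the length mismatch that column-by-column assignment can run into.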

Related

Pandas groupby using only year and month

I have a Python program using Pandas, which reads two dataframes, obtained in the following links:
Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias: https://datamexico.org/es/profile/geo/san-nicolas-de-los-garza#covid19-evolucion
Denuncias-segun-bien-afectado-en-San-Nicolas-de-los-GarzaClic-en-el-grafico-para-seleccionar: https://datamexico.org/es/profile/geo/san-nicolas-de-los-garza#seguridad-publica-denuncias
What I currently want to do is a groupby on the "covid" dataframe so that rows with the same date are summed. However, no method has worked so far; each attempt prints an error indicating that I should be using the syntax for "PeriodIndex". Does anyone have a suggestion or solution? Thanks in advance.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
#csv for the covid cases
covid = pd.read_csv('Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias.csv')
#csv for complaints
comp = pd.read_csv('Denuncias-segun-bien-afectado-en-San-Nicolas-de-los-GarzaClic-en-el-grafico-para-seleccionar.csv')
#cleaning data in both dataframes
#keeping only the relevant columns
covid = covid[['Month','Daily Cases']]
comp = comp[['Month','Affected Legal Good', 'Value']]
#changing the labels from spanish to english
comp['Affected Legal Good'].replace({'Patrimonio': 'Heritage', 'Familia':'Family', 'Libertad y Seguridad Sexual':'Sexual Freedom and Safety', 'Sociedad':'Society', 'Vida e Integridad Corporal':'Life and Bodily Integrity', 'Libertad Personal':'Personal Freedom', 'Otros Bienes Jurídicos Afectados (Del Fuero Común)':'Other Affected Legal Assets (Common Jurisdiction)'}, inplace=True, regex=True)
#changing the month types to dates
covid['Month'] = pd.to_datetime(covid['Month'])
covid['Month'] = covid['Month'].dt.to_period('M')
covid

You can simply use a groupby statement. TimeGrouper converts it to datetime by default.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
#csv for the covid cases
covid = pd.read_csv('Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias.csv')
covid = covid.groupby(['Month'])['Daily Cases'].sum()
covid = covid.reset_index()
# #changing the month types to dates
covid['Month'] = pd.to_datetime(covid['Month'])
covid['Month'] = covid['Month'].dt.to_period('M')
covid
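
A sketch of the monthly aggregation with made-up data, assuming the 'Month' column holds parseable date strings and 'Daily Cases' is numeric (the real CSV layout may differ). Converting to a monthly period before grouping means every row of the same year and month falls into one group:

```python
import pandas as pd

# Made-up rows standing in for the CSV; column names follow the question
covid = pd.DataFrame({
    "Month": ["2020-03-30", "2020-03-31", "2020-04-01", "2020-04-02"],
    "Daily Cases": [2, 3, 5, 7],
})

# Parse the strings, then reduce each date to its year-month period
covid["Month"] = pd.to_datetime(covid["Month"]).dt.to_period("M")

# Sum the cases per year-month; reset_index turns the result back into columns
monthly = covid.groupby("Month")["Daily Cases"].sum().reset_index()
print(monthly)
```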

Python-3, Pandas datareader and Yahoo Error RemoteDataError: Unable to read URL

I am trying to use yahoo finance for importing stock data
I am using the code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
plt.style.use("fivethirtyeight")
%matplotlib inline
# For reading stock data from yahoo
from pandas_datareader.data import DataReader
# For time stamps
from datetime import datetime
It is running fine.
from pandas_datareader import data as pdr
import yfinance as yf
yf.pdr_override() # <== that's all it takes :-)
# download dataframe
# The tech stocks we'll use for this analysis
tech_list = ['WIPRO.BO', 'INFY.BO', 'TCS.BO', 'HAPPSTMNDS.BO']
# Set up End and Start times for data grab
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)
# For loop for grabbing Yahoo Finance data and setting as a dataframe
for stock in tech_list:
    # Set DataFrame as the Stock Ticker
    globals()[stock] = pdr.get_data_yahoo(stock, start, end)
While running the code below, I get an error:
company_list = ['WIPRO.BO', 'INFY.BO', 'TCS.BO', 'HAPPSTMNDS.BO']
company_name = ["Wipro", "Infosys", "Tata_Consultancy_Services", "Happiest_Minds_Technologies"]
for company, com_name in zip(company_list, company_name):
    company["company_name"] = com_name
df = pd.concat(company_list, axis=0)
df.tail(10)
Error Message:
TypeError                                 Traceback (most recent call last)
<ipython-input-6-4753fcd8a7a3> in <module>
      3
      4 for company, com_name in zip(company_list, company_name):
----> 5     company["company_name"] = com_name
      6
      7 df = pd.concat(company_list, axis=0)
TypeError: 'str' object does not support item assignment
Please help me in solving this.
Thanks a lot ^_^

ARIMA is used for forecasting univariate time-series data, and I am not sure which feature you want to forecast. I came up with this:
# For loop for grabbing Yahoo Finance data and setting as a dataframe
lt = []
for stock in tech_list:
    # Download one ticker and move the Date index into a column
    temp_df = pdr.get_data_yahoo(stock, start, end)
    temp_df = temp_df.reset_index()
    lt.append(temp_df)
# Each element in the list is a DataFrame
df = pd.concat(lt, axis=0)
df = df.reset_index(drop=True)
print(df.head())
Output:
         Date        Open        High         Low       Close   Adj Close   Volume
0  2020-07-09  224.850006  224.850006  219.800003  221.600006  221.103027   198245
1  2020-07-10  221.600006  223.449997  219.449997  222.000000  221.502121   109461
2  2020-07-13  224.000000  229.000000  222.750000  227.550003  227.039673   385205
3  2020-07-14  229.000000  231.600006  224.199997  225.050003  224.545288   449975
4  2020-07-15  237.000000  265.500000  233.800003  262.950012  262.360291  6313161

The name of the fix_yahoo_finance package has been changed to yfinance. So please try this code.
from datetime import datetime
from pandas_datareader import data as pdr
import yfinance as yf
yf.pdr_override() # <== that's all it takes :-)
# download dataframe
# The tech stocks we'll use for this analysis
tech_list = ['WIPRO.BO', 'INFY.BO', 'TCS.BO', 'HAPPSTMNDS.BO']
# Set up End and Start times for data grab
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)
# For loop for grabbing Yahoo Finance data and setting as a dataframe
for stock in tech_list:
    # Set DataFrame as the Stock Ticker
    globals()[stock] = pdr.get_data_yahoo(stock, start, end)
The get_data_yahoo() method returns a pandas DataFrame, so depending on what you want to do, you can build a list of DataFrames and concatenate them.
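
As a sketch of that list-and-concatenate approach, and of what the loop in the question seems to be aiming for: keep the DataFrames themselves in a dict, tag each with its company name, then concatenate. The small frames below are invented placeholders for the get_data_yahoo results, so their values are assumptions:

```python
import pandas as pd

company_names = {"WIPRO.BO": "Wipro", "INFY.BO": "Infosys"}

# Placeholder frames; in the real script these would come from pdr.get_data_yahoo
dates = pd.to_datetime(["2020-07-09", "2020-07-10"])
frames = {
    "WIPRO.BO": pd.DataFrame({"Close": [221.6, 222.0]}, index=dates),
    "INFY.BO": pd.DataFrame({"Close": [780.0, 783.5]}, index=dates),
}

tagged = []
for ticker, frame in frames.items():
    # assign returns a copy, so the downloaded frame is left unchanged
    tagged.append(frame.assign(company_name=company_names[ticker]))

# Stack the per-company frames on top of each other
df = pd.concat(tagged, axis=0)
print(df.tail(10))
```

Looping over the ticker strings themselves, as in the question, is what triggers the TypeError: a str cannot take a new key, whereas a DataFrame can take a new column.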

Unable to add values to a pandas DataFrame

I am trying to compute the MACD (Moving Average Convergence Divergence) for a few stocks. I am using the pandas_ta, yfinance and pandas libraries. But when I try to add the MACD values to the dataframe I get this error:
IndexError: iloc cannot enlarge its target object
My code is :
import pandas as pd
import pandas_ta as ta
import yfinance as yf
import datetime as dt
import matplotlib.pyplot as plt
start=dt.datetime.today()-dt.timedelta(365)
end=dt.datetime.today()
zscore=pd.DataFrame()
rsi=pd.DataFrame()
tickers=['2060.SR' , '2160.SR', '3002.SR', '4007.SR', '3005.SR', '3004.SR' , '2150.SR']
macd=pd.DataFrame()
for i in tickers:
    df=pd.DataFrame(yf.download(i, start=start, end=end, interval="1mo"))
    df.columns = map(str.lower, df.columns)
    macd=df.ta.macd()
Can someone let me know where my mistake is and how to solve this error? Thanks.

I am not sure which line gave you this error.
But note that in the loop you are not accumulating data; you overwrite df on every iteration:
for i in tickers:
    df=pd.DataFrame(yf.download(i, start=start, end=end, interval="1mo"))
If you want to accumulate the downloads, collect them in a list and concatenate (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0):
frames = []
for i in tickers:
    df=pd.DataFrame(yf.download(i, start=start, end=end, interval="1mo"))
    frames.append(df)
agg_df = pd.concat(frames)

df=df.merge(macd, on="Date")
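
To attach indicator columns to the price frame rather than overwrite a variable, one option is to join on the shared date index. The sketch below does not call pandas_ta; it computes a MACD line with plain pandas exponential moving averages using the common 12/26/9 spans, which is an assumption about the settings wanted, and the price series is made up:

```python
import pandas as pd

def macd_columns(close, fast=12, slow=26, signal=9):
    """Plain-pandas MACD sketch (hypothetical helper, not the pandas_ta API)."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd = ema_fast - ema_slow
    macd_signal = macd.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame({"macd": macd, "macd_signal": macd_signal,
                         "macd_hist": macd - macd_signal})

# Made-up monthly closes in place of a yf.download result
idx = pd.date_range("2022-01-01", periods=6, freq="MS", name="Date")
df = pd.DataFrame({"close": [10.0, 11.0, 12.0, 11.5, 12.5, 13.0]}, index=idx)

# join aligns on the Date index, so no row-by-row assignment is needed
df = df.join(macd_columns(df["close"]))
print(df)
```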

I used df.append(row) in the past, but it has been deprecated since pandas 1.4.
The most logical approach for me is:
df.loc[len(df)] = ['list', 'of', 'elements'] # needs len(df.columns) elements
Other methods are described here: https://sparkbyexamples.com/pandas/how-to-append-row-to-pandas-dataframe/
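
A small illustration of that row-append idiom, with invented column names and values. It assumes the frame has the default 0..n-1 integer index, since len(df) is used as the new row label; with any other index the label could collide with an existing row:

```python
import pandas as pd

df = pd.DataFrame({"ticker": ["2060.SR"], "macd": [0.12]})

# The list must have exactly len(df.columns) elements, in column order
df.loc[len(df)] = ["2160.SR", -0.05]
print(df)
```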

Select several years pandas dataframe

I am trying to select several years from a dataframe in monthly resolution.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import netCDF4 as nc
#-- open net-cdf and read in variables
data = nc.Dataset('test.nc')
time = nc.num2date(data.variables['Time'][:],
                   data.variables['Time'].units)
df = pd.DataFrame(data.variables['mgpp'][:,0,0], columns=['mgpp'])
df['dates'] = time
df = df.set_index('dates')
print(df.head())
This is what the head looks like:
mgpp
dates
1901-01-01 0.040735
1901-02-01 0.041172
1901-03-01 0.053889
1901-04-01 0.066906
Now I managed to extract one year:
df_cp = df[df.index.year == 2001]
But how would I extract several years, say 1997, 2001 and 2007, and have them stored in the same dataframe? Is there a one- or two-line solution? My only idea for now is to iterate and then merge the dataframes, but maybe there is a better solution!
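
One possible approach, assuming the index is a pandas DatetimeIndex (which the working single-year filter suggests): the year attribute supports isin, so several years can be selected with one boolean mask. The frame below is synthetic and only mimics the monthly layout shown in the question:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the netCDF data
dates = pd.date_range("1995-01-01", "2008-12-01", freq="MS")
df = pd.DataFrame({"mgpp": np.linspace(0.04, 0.07, len(dates))}, index=dates)

years = [1997, 2001, 2007]
# Boolean mask: True where the row's year is one of the requested years
df_sel = df[df.index.year.isin(years)]
print(df_sel.index.year.unique().tolist())
```

This keeps the rows in their original order in a single dataframe, so no loop or merge is needed.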

Pandas: 52 week high from yahoo or google finance

Does anyone know if you can get the 52 week high in pandas from either yahoo or google finance? Thanks.

It is possible, please check out pandas documentation. Here's an example:
from pandas_datareader import data as web  # pandas.io.data moved to the pandas-datareader package
import datetime
symbol = 'aapl'
end = datetime.datetime.now()
start = end - datetime.timedelta(weeks=52)
df = web.DataReader(symbol, 'yahoo', start, end)
highest_high = df['High'].max()
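
If a longer price history is already loaded, the same figure can be taken from a slice of it. A sketch with synthetic data, assuming a DatetimeIndex and a 'High' column as in the frame above:

```python
import numpy as np
import pandas as pd

# Synthetic two-year daily history; real data would come from the downloader
idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"High": np.arange(len(idx), dtype=float)}, index=idx)
# An old spike that lies outside the last 52 weeks and must be ignored
df.loc[pd.Timestamp("2020-06-15"), "High"] = 10_000.0

end = df.index.max()
start = end - pd.Timedelta(weeks=52)

# Keep only rows from the trailing 52 weeks, then take the maximum high
high_52w = df.loc[df.index >= start, "High"].max()
print(high_52w)
```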

One can also use yfinance (data from Yahoo):
pip install yfinance
import yfinance as yf
stock = "JNJ"
dataframe = yf.download(stock, period="1y", auto_adjust=True, prepost=True, threads=True)
highest_high = dataframe['High'].max()

You could also use other libraries such as yahoo_fin. This one works better sometimes, it would depend on what you want to do, but it's good to bear in mind other possibilities : )
import yfinance as yf
import yahoo_fin.stock_info as si
stock = 'AAPL'
df = yf.download(stock, period="1y")
print("$",round(df['High'].max(), 2))
df2 = si.get_data(stock, interval="1mo")
print("$",round(df2['high'].tail(12).max(), 2))
Output:
$ 182.94
$ 182.94

You can use the info attribute to get lots of aggregated data such as the P/E ratio, 52-week high, etc. It is a dictionary, so use key access:
import yfinance as yf
ticker = "AAPL"  # example symbol
data = yf.Ticker(ticker).info
print(data['fiftyTwoWeekHigh'])
