Pandas DataFrame not working as intended - python

I am new to Python and I'm trying to use it for finance, specifically plotting stock prices. I am using pandas and its DataFrame object, but for some reason I cannot obtain the data I need. The web.DataReader method works, as I tried it in another program, but my code does not. Here is my code:
import numpy as np
import pandas as pd
import pandas.io.data as web
symbols = ['AAPL', 'MSFT', 'GLD']
data = pd.DataFrame()
for sym in symbols:
    data[sym] = web.DataReader(sym, data_source='yahoo', start='4/14/2014', end='01/30/2015')['Adj Close']
data.columns = symbols
print(data['AAPL'])
The output is an empty DataFrame and I am not sure why, because DataReader itself works when I try it elsewhere.

Updating pandas to version 0.17.1 or later should solve your problem. If you use conda (recommended):
conda update pandas
is enough.
After the update you will get a deprecation warning for pandas.io.data.
To avoid it, install pandas-datareader:
conda install pandas-datareader
and change:
import pandas.io.data as web
into:
from pandas_datareader import data as web
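If the frame still comes out empty, one option that does not depend on how a particular pandas version handles assignment into an empty DataFrame is to collect one Series per symbol and build the frame in a single step. The sketch below is only an illustration: fake_adj_close is a hypothetical stand-in that generates synthetic prices so the example runs offline. In real code it would be replaced by the web.DataReader(...)['Adj Close'] call from the question.

```python
import numpy as np
import pandas as pd


def fake_adj_close(symbol, start, end):
    """Hypothetical stand-in for web.DataReader(...)['Adj Close'].

    Returns synthetic prices on business days so the example needs no network.
    """
    index = pd.bdate_range(start=start, end=end)
    rng = np.random.default_rng(sum(map(ord, symbol)))  # deterministic per symbol
    prices = 100 + rng.standard_normal(len(index)).cumsum()
    return pd.Series(prices, index=index, name=symbol)


symbols = ['AAPL', 'MSFT', 'GLD']

# Collect one Series per symbol, then build the DataFrame once;
# pandas aligns the Series on their shared date index.
series_by_symbol = {
    sym: fake_adj_close(sym, '2014-04-14', '2015-01-30') for sym in symbols
}
data = pd.DataFrame(series_by_symbol)

print(data['AAPL'].head())
```

Building the frame from a dict also makes the later data.columns = symbols line unnecessary, since the dict keys become the column names.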

Related

Attempting to extract stock prices using python

I am trying to extract stock prices for a list of stock codes stored in a CSV file, using pandas and yfinance.
I need to do this for 145 companies. Is there a way of doing it? I have been trying for 5 days without success.
I just need to know whether it is possible and what you would recommend to achieve it.
yfinance.Ticker(ticker).history(start=start_date) gets you the data you want.
If you have a large CSV with a "ticker" field, you can build a single pandas DataFrame like this:
import pandas as pd
import yfinance
def read_create_giant_df(file_in):
    df = pd.read_csv(file_in)
    out = []
    for item in df["ticker"]:
        ticker_df = yfinance.Ticker(item).history(start="1930-01-01")
        ticker_df["ticker"] = item
        out.append(ticker_df)
    return pd.concat(out)
The code below should work. If a module is missing, install it with pip:
pip install yfinance
pip install yahoofinancials
Run the code below to get the data for Amazon (AMZN):
import pandas as pd
import yfinance as yf
from yahoofinancials import YahooFinancials
amzn_df = yf.download('AMZN',
                      start='2019-01-01',
                      end='2019-12-31',
                      progress=False)
amzn_df.head()
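With 145 tickers, a single bad symbol or a failed request can abort the whole loop, so it may help to record failures per ticker and keep going. The following is a minimal sketch of that idea, not part of either answer above: collect_histories and its fetch parameter are hypothetical names, and the demonstration uses an in-memory CSV and a fake fetcher so that it runs offline. In real use, fetch could be something like lambda t: yfinance.Ticker(t).history(start="2019-01-01").

```python
import io

import pandas as pd


def collect_histories(tickers, fetch):
    """Fetch one DataFrame per ticker with `fetch`, tag it, and stack the results.

    `fetch` is a hypothetical callable: it takes a ticker string and returns
    a DataFrame of prices for that ticker.
    """
    frames, failed = [], []
    for ticker in tickers:
        try:
            ticker_df = fetch(ticker)
        except Exception as exc:  # keep going if one symbol fails
            failed.append((ticker, str(exc)))
            continue
        if ticker_df.empty:
            failed.append((ticker, "no data returned"))
            continue
        ticker_df = ticker_df.copy()
        ticker_df["ticker"] = ticker
        frames.append(ticker_df)
    combined = pd.concat(frames) if frames else pd.DataFrame()
    return combined, failed


# Offline demonstration: an in-memory CSV with a "ticker" column.
csv_text = "ticker\nAAA\nBBB\nBAD\n"
tickers = pd.read_csv(io.StringIO(csv_text))["ticker"]


def fake_fetch(ticker):
    """Fake fetcher standing in for a real download call."""
    if ticker == "BAD":
        raise ValueError("unknown symbol")
    index = pd.date_range("2019-01-01", periods=3)
    return pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=index)


prices, failed = collect_histories(tickers, fake_fetch)
print(prices)
print(failed)
```

The failed list can then be inspected or retried separately without losing the data that did download.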

Is it possible to change similar libraries (Data Analysis) in Python within the same code?

I use the Modin library for multiprocessing.
While the library is great for faster processing, it fails at merge, so I would like to revert to default pandas partway through the code.
I understand that under PEP 8 (E402) imports should be declared once, at the top of the file, but my case needs otherwise.
import pandas as pd
import modin.pandas as mpd
import os
import ray
ray.init()
os.environ["MODIN_ENGINE"] = "ray"
df = mpd.read_csv()
# do stuff
Then I would like to revert to default pandas within the same code.
How would I run the lines below in pandas? There does not seem to be a clear way to switch between pd and mpd here, and unfortunately Modin seems to take precedence over pandas.
df = df.loc[:, df.columns.intersection(['col1', 'col2'])]
df = df.drop_duplicates()
df = df.sort_values(['col1', 'col2'], ascending=[True, True])
Is it possible?
If so, how?
You can simply do the following:
import modin.pandas as mpd
import pandas as pd
This way both Modin and the original pandas are available under separate names, and you can switch between them as needed.
Several answers have been posted, but for this particular case, as pointed out by @Nin17 and by this comment from the Modin GitHub, you can convert from Modin to pandas for single-core processing of operations such as df.merge:
import pandas as pd
import modin.pandas as mpd
import os
import ray
ray.init()
os.environ["MODIN_ENGINE"] = "ray"
df_modin = mpd.read_csv()  # read the DataFrame into Modin for parallel processing
df_pandas = df_modin._to_pandas()  # convert the Modin DataFrame to pandas for single-core processing
If you would like to convert the DataFrame back to a Modin DataFrame for parallel processing:
df_modin = mpd.DataFrame(df_pandas)
You can try the pandarallel package instead of Modin. It is based on a similar concept: https://pypi.org/project/pandarallel/#description
Pandarallel benchmarks: https://libraries.io/pypi/pandarallel
As @Nin17 said in a comment on the question, this comment from the Modin GitHub describes how to convert a Modin dataframe to pandas. Once you have a pandas dataframe, you can call any pandas method on it. This other comment from the same issue describes how to convert the pandas dataframe back to a Modin dataframe.
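Once the data is a plain pandas DataFrame (for example after the conversion described in the answers), the three lines from the question run unchanged. A small sketch with made-up data, using only pandas so it can be tried without Modin installed; the column values are invented for illustration:

```python
import pandas as pd

# Made-up data standing in for the frame converted from Modin.
df = pd.DataFrame({
    'col1': [2, 1, 2, 1],
    'col2': ['b', 'a', 'b', 'c'],
    'col3': [0.1, 0.2, 0.3, 0.4],
})

# Keep only the requested columns that actually exist in the frame.
df = df.loc[:, df.columns.intersection(['col1', 'col2'])]
# Remove repeated (col1, col2) pairs.
df = df.drop_duplicates()
# Sort by col1, then col2, both ascending.
df = df.sort_values(['col1', 'col2'], ascending=[True, True])

print(df)
```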

How to apply pandas.set_option (Python) to pandas.style objects

I have noticed that when we set display options for pandas DataFrames, such as pandas.set_option('display.max_rows', 10), they work perfectly for DataFrame objects.
However, they have no effect on Styler objects.
Check the following code:
import pandas as pd
import numpy as np
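data = np.zeros((10, 20))
# Use the full option names; the short form 'max_rows' can match more than one option
pd.set_option('display.max_rows', 4)
pd.set_option('display.max_columns', 10)
df = pd.DataFrame(data)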
display(df)
display(df.style)
With this code, the DataFrame display is truncated but the Styler display is not.
I do not know how to set the corresponding properties for a Styler object.
Thanks.
Styler is developing its own options. The current version of pandas, 1.3.0, does not have many, perhaps only styler.render.max_elements.
Some recent pull requests to the GitHub repo are adding these features, but they will be Styler's own versions.
As @attack69 mentioned, Styler has its own options under development.
However, I was able to mimic set_option('max_rows') and set_option('max_columns') for Styler objects.
Check the following code:
import numpy as np
import pandas as pd
from IPython.display import display

data = np.zeros((10, 20))
mx_rw = 4
mx_cl = 10
pd.set_option('display.max_rows', mx_rw)
pd.set_option('display.max_columns', mx_cl)
df = pd.DataFrame(data)
display(df)
print(type(df))
# Switch to object dtype so the '...' placeholders can be stored next to the numbers
df = df.astype(object)
half_rw = mx_rw // 2
half_cl = mx_cl // 2
# Mark the row and the column where the truncation happens
df.loc[half_rw, :] = '...'
df.loc[:, half_cl] = '...'
# Relabel the marker row and column as '...'
df.index = list(range(half_rw)) + ['...'] + list(range(half_rw + 1, data.shape[0]))
df.columns = list(range(half_cl)) + ['...'] + list(range(half_cl + 1, data.shape[1]))
# Drop the hidden middle rows and columns, keeping the head and the tail
df = df.drop(index=list(range(half_rw + 1, data.shape[0] - half_rw)))
df = df.drop(columns=list(range(half_cl + 1, data.shape[1] - half_cl)))
styled = df.style.format(precision=1)
display(styled)
print(type(styled))
With this, the DataFrame and the Styler object display the same thing.

How do I import datasets in Python?

I am trying to import some datasets in my code. I need help, because I have tried a lot of tutorials and web pages and I am still getting errors. I use the Spyder IDE and Python 3.7:
import numpy as np
import pandas as pd
import tensorflow as tf
import os
dts1=pd.read_csv(r"C:\Users\Cucu\Desktop\sample_submission.csv")
dts1
This works for me. If you are still experiencing errors, please post them.
import pandas as pd
# Read data from the file 'sample_submission.csv'
# (the raw string keeps the backslashes in the Windows path literal)
# Delimiters, rows and column names can be controlled with read_csv arguments
data = pd.read_csv(r"C:\Users\Cucu\Desktop\sample_submission.csv")
# Preview the first 5 lines of the loaded data
print(data.head())
Try one of these alternative ways of writing the path:
pd.read_csv("C:\\Users\\Cucu\\Desktop\\sample_submission.csv")
pd.read_csv("C:/Users/Cucu/Desktop/sample_submission.csv")
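The three spellings used in these answers (raw string, doubled backslashes, forward slashes) all describe the same Windows path. A short sketch that checks this without touching the file system; PureWindowsPath is used only so the comparison behaves the same on any operating system:

```python
from pathlib import PureWindowsPath

raw = r"C:\Users\Cucu\Desktop\sample_submission.csv"
escaped = "C:\\Users\\Cucu\\Desktop\\sample_submission.csv"
forward = "C:/Users/Cucu/Desktop/sample_submission.csv"

# The raw string and the escaped string are the same sequence of characters.
print(raw == escaped)  # True

# Windows accepts both separators, so the parsed paths compare equal.
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True

# The file name component is the same in every spelling.
print(PureWindowsPath(raw).name)  # sample_submission.csv
```

Writing the path as a plain string with single backslashes, without the r prefix, is what usually fails: Python then tries to interpret \U as the start of a Unicode escape and raises a SyntaxError.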

Set matplotlib backend from Pandas

I am currently facing the following issue. I have a couple of Python scripts that plot some useful information using pandas, which uses Matplotlib for plotting.
As far as I understand, Matplotlib lets you set its backend, as described in the accepted answer to this question.
I would like to set the Matplotlib backend from pandas:
Is it possible?
How can I do it?
EDIT 1:
By the way my code looks like:
import pandas as pd
from pandas import DataFrame, Series
class MyPlotter():
    def plot_from_file(self, stats_file_name, f_name_out, names,
                       title='TITLE', x_label='x label', y_label='y label'):
        df = pd.read_table(stats_file_name, index_col=0, parse_dates=True,
                           names=names)
        plot = df.plot(lw=2, colormap='jet', marker='.', markersize=10, title=title, figsize=(20, 15))
        plot.set_xlabel(x_label)
        plot.set_ylabel(y_label)
        fig = plot.get_figure()
        fig.savefig(f_name_out)
        plot.cla()
I've just applied the solution posted on this question and it worked.
In other words, my imports originally looked like this:
import pandas as pd
from pandas import DataFrame, Series
After applying the solution, the imports look like this:
import pandas as pd
from pandas import DataFrame, Series
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt
I know I am answering my own question, but I am doing so in case someone can find it useful.
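For anyone who wants to try this without an input file, here is a minimal sketch of the same approach. The data is made up and the plot is written to an in-memory buffer instead of a file; the only essential part is that matplotlib.use('pdf') is called before matplotlib.pyplot is imported.

```python
import io

import matplotlib
matplotlib.use('pdf')  # select the non-interactive PDF backend before importing pyplot
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for the contents of the stats file.
df = pd.DataFrame({'a': [1, 3, 2], 'b': [2, 1, 4]})

# pandas draws through Matplotlib, so it uses whatever backend is active.
ax = df.plot(lw=2, marker='.', title='TITLE')
ax.set_xlabel('x label')
ax.set_ylabel('y label')

# Save to an in-memory buffer; a file name works the same way.
buffer = io.BytesIO()
fig = ax.get_figure()
fig.savefig(buffer, format='pdf')
plt.close(fig)

print(matplotlib.get_backend())
```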
