With the following code, I'm trying to get historical prices from Yahoo for a symbol. However, when I run it, I get this error message:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
How can I fix this issue?
# import modules
import pandas
from pandas_datareader import data as pdr
import yfinance as yfin
yfin.pdr_override()
import matplotlib.pyplot as plt
# initializing Parameters
start = "2020-01-01"
end = "2021-01-01"
symbols = ["AAPL"]
# Getting the data
data = pdr.get_data_yahoo(symbols, start, end)
# Display
plt.figure(figsize = (20,10))
plt.title('Opening Prices from {} to {}'.format(start, end))
plt.plot(data['Open'])
plt.show()
Related
I am trying to get my code to show the pie chart, but I don't see the visualization after execution.
import pandas as pd
import numpy as np
import plotly.express as px
data = pd.read_csv("C:\\Users\\nasir\\credit card.csv")
print(data.head())
print(data.isnull().sum())
# Exploring transaction type
print(data.type.value_counts())
type = data["type"].value_counts()
transactions = type.index
quantity = type.values
figure = px.pie(data, values=quantity, names=transactions, hole=0.5, title="Distribution of Transaction Type")
figure.show()
I am working on a project to create an algorithmic trader. However, I want to remove the weekends from my data frame, as they ruin the data. I have tried some things I found on Stack Overflow, but I get an error that the type is Timestamp, so I can't use that technique. The date also isn't a column in the data frame. I'm new to Python so I'm not very sure, but I think it's the index, since hist.index shows me the date and time. I'm sorry if these are stupid questions but I am new to Python and pandas.
Here is my code:
#import all the libraries
import nsetools as ns
import pandas as pd
import numpy
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
plt.style.use('fivethirtyeight')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
print(hist)
plt.figure(figsize=(12.5, 4.5))
plt.plot(hist['Close'], label=a)
plt.title('close price history')
plt.xlabel("13 Nov 2020 to 13 Dec 2020")
plt.ylabel("Close price")
plt.legend(loc='upper left')
plt.show()
EDIT: On the suggestion of a user, I added refinedlist = hist[hist.index.dayofweek<5] and modified my code to this:
plt.style.use('fivethirtyeight')
a = input("Enter the ticker name you wish to apply strategy to")
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
refinedlist = hist[hist.index.dayofweek<5]
print (refinedlist)
And graphed that, but the graph still includes the weekends on the x axis.
To begin with, there is no stock market data for weekends and public holidays, because the market is closed on those days. In addition, your data is sampled by time (15-minute intervals), so there is also no data between the time the market closes and the time it opens the next day.
For example, I graphed the first 50 results. (The x-axis doesn't seem to be correct.)
plt.plot(hist['Close'][:50], label=a)
As an example, if you reindex the data to include weekends and public holidays, and fill the times when the market is not open with missing values, you get the following graph.
new_idx = pd.date_range(hist.index[0], hist.index[-1], freq='15min')
hist = hist.reindex(new_idx, fill_value=np.nan)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
# plt.style.use('fivethirtyeight')
# a = input("Enter the ticker name you wish to apply strategy to")
a = 'AAPL'
ticker = yf.Ticker(a)
hist = ticker.history(period="1mo", interval="15m")
new_idx = pd.date_range(hist.index[0], hist.index[-1], freq='15min')
hist = hist.reindex(new_idx, fill_value=np.nan)
plt.figure(figsize=(12.5, 4.5))
plt.plot(hist['Close'], label=a)
plt.title('close price history')
plt.xlabel("13 Nov 2020 to 13 Dec 2020")
plt.ylabel("Close price")
plt.legend(loc='upper left')
plt.show()
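If the goal is to drop the closed-market gaps from the x axis altogether, one option is to plot against the row position instead of the timestamp and then label a few ticks with the dates. This is only a minimal sketch: the data is a synthetic stand-in for hist['Close'], and tick_positions_and_labels and plot_without_gaps are hypothetical helper names, not part of pandas or matplotlib.

```python
import numpy as np
import pandas as pd


def tick_positions_and_labels(index, n_ticks=6):
    """Pick evenly spaced row positions and format their timestamps as labels."""
    positions = np.linspace(0, len(index) - 1, n_ticks, dtype=int)
    labels = [index[p].strftime("%d %b %H:%M") for p in positions]
    return positions, labels


def plot_without_gaps(series, label):
    """Plot values against row position so closed-market periods take no space."""
    # Imported here so the helper above can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(12.5, 4.5))
    ax.plot(np.arange(len(series)), series.to_numpy(), label=label)
    positions, labels = tick_positions_and_labels(series.index)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=30)
    ax.set_ylabel("Close price")
    ax.legend(loc="upper left")
    return fig


# Synthetic stand-in for hist['Close']: 15-minute bars on weekdays only,
# from 09:30 to 15:45, with made-up prices.
full = pd.date_range("2020-11-13 09:30", "2020-11-20 15:45", freq="15min")
close = pd.Series(0.0, index=full).between_time("09:30", "15:45")
close = close[close.index.dayofweek < 5]
close[:] = 100 + np.arange(len(close)) * 0.01

positions, labels = tick_positions_and_labels(close.index)
print(positions, labels)
```

With real data, calling plot_without_gaps(hist['Close'], a) followed by plt.show() should draw the prices as one continuous line, since every bar is one step wide regardless of how much clock time separates it from the previous bar.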
I was looking at a sample table of student information and wanted to see what days were the most popular for students to enroll on a course. The script worked fine the first day I ran it and I left it. A few days later I returned to take another look at it but started getting the ValueError message.
Why did it stop working? No new information was added to the dataset since it worked the first time. The code now fails at:
df["EnrolmentDate"] = pd.to_datetime(df.EnrolmentDate)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import seaborn as sns
colors = sns.cubehelix_palette(28, rot=-0.4)
df = pd.read_csv("data.csv")
#print(df.dtypes)
# To change the format of the data type from object to datetime. Has to be run at start of script or the format returns to object.
#df['Enrolment Date'] = pd.to_datetime(df['Enrolment Date'])
print(df.EnrolmentDate.str.slice(0, 10))
df["EnrolmentDate"] = pd.to_datetime(df.EnrolmentDate)
print(df.head())
print(df.dtypes)
#Tells us what day of the week the enrolment date was. Can also use .dayofyear. Google Pandas API Reference, search for .dt., datetime properties
print(df.EnrolmentDate.dt.weekday_name)
#Shows the latest or greatest enrolment date
print(df.EnrolmentDate.max())
print(df.EnrolmentDate.min())
print(df.EnrolmentDate.max()-df.EnrolmentDate.min())
df["EnrolmentDay"] = df.EnrolmentDate.dt.weekday_name
print(df.head())
print(df.EnrolmentDay.value_counts())
print(df.EnrolmentDay.value_counts().plot())
#print(df.Day.value_counts().sort_index())
#df.EnrolmentDay.value_counts().sort_index().plot()
# naming the x axis
plt.xlabel('Day')
# naming the y axis
plt.ylabel('No. of Enrolments')
plt.show()
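One way to track down what pd.to_datetime is rejecting is to coerce unparseable values to NaT and then look at the rows that failed. This is a sketch under the assumption that the ValueError comes from one or more malformed values in the column; the sample data below is made up and stands in for data.csv. Separately, .dt.weekday_name was removed in pandas 1.0, so on a newer pandas the later lines need .dt.day_name() instead.

```python
import pandas as pd

# Made-up sample standing in for data.csv; the last value is deliberately malformed.
df = pd.DataFrame({"EnrolmentDate": ["2019-01-07", "2019-01-08", "not a date"]})

# Coerce unparseable values to NaT instead of raising ValueError.
parsed = pd.to_datetime(df["EnrolmentDate"], errors="coerce")

# Rows whose original value could not be parsed.
bad_rows = df[parsed.isna()]
print(bad_rows)

# Replace the column and derive the day name with the current accessor.
df["EnrolmentDate"] = parsed
df["EnrolmentDay"] = df["EnrolmentDate"].dt.day_name()
print(df["EnrolmentDay"].value_counts())
```

If bad_rows comes back empty on the real file, the cause is more likely a change in the environment (for example a different pandas version) than a change in the data.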
I'm trying to import data from both iex and FRED. Although both time series cover the same time period, when I graph them together the data does not show up correctly on the same x axis. I suspect this is due to differences between how the iex dates are formatted and how the FRED dates are formatted.
Code below:
import matplotlib.pyplot as plt
import pandas as pd
from pandas_datareader.data import DataReader
from datetime import date
start = date(2016,1,1)
end = date(2016,12,31)
ticker = 'AAPL'
data_source = 'iex'
stock_prices = DataReader(ticker, data_source, start, end)
print(stock_prices.head())
stock_prices.info()
stock_prices['close'].plot(title=ticker)
plt.show()
series = 'DCOILWTICO'
start = date(2016,1,1)
end = date(2016,12,31)
oil = DataReader(series,'fred',start,end)
print(oil.head())
oil.info()
data = pd.concat([stock_prices[['close']],oil],axis=1)
print(data.head())
data.columns = ['AAPL','Oil Price']
data.plot()
plt.show()
Using join instead of pd.concat will give you what you want:
data = stock_prices[['close']].join(oil)
The main issue with pd.concat is that the indexes of your two data sets are not aligned, hence the oddly stitched DataFrame. DataFrame.join will take care of the misalignment.
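If the two indexes hold different types, as the question suspects (date strings on one side, a DatetimeIndex on the other), it may also be necessary to convert one of them before combining. The following is a sketch with made-up values standing in for the two downloads, not the actual iex or FRED output:

```python
import pandas as pd

# Hypothetical stand-ins for the two downloads: one frame indexed by date
# strings, the other by a DatetimeIndex. The prices are made up.
stock_prices = pd.DataFrame(
    {"close": [105.35, 102.71, 100.70]},
    index=["2016-01-04", "2016-01-05", "2016-01-06"],
)
oil = pd.DataFrame(
    {"DCOILWTICO": [36.81, 35.97, 33.97]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"]),
)

# Convert the string index so both frames share the same index type.
stock_prices.index = pd.to_datetime(stock_prices.index)

# With matching index types, the rows line up by date.
data = stock_prices[["close"]].join(oil)
data.columns = ["AAPL", "Oil Price"]
print(data)
```

Checking stock_prices.index and oil.index (or the output of .info()) on the real data shows whether this conversion is needed.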
I am trying to plot real-time data as it is loaded into a DataFrame. However, my attempts produce multiple blank graph frames in response to the dynamic data feed, instead of plotting the data in a single graph frame.
I am implementing a solution to perform sentiment analysis on a live Twitter stream. I am able to stream the tweets, put them into a DataFrame and apply the required sentiment analysis algorithm to them one by one. I created a column in the DataFrame which holds the compound value generated by that algorithm for an individual tweet.
This DataFrame is updated dynamically as the tweets stream in, and the intent is to plot this compound value against time as it updates.
I have tried plotting the graph following the usual advice of using plt.ion() and plt.draw() instead of plt.show(). But instead of plotting one frame which gets updated with the values, the program starts printing multiple frames one after another as the data gets updated in the DataFrame.
import pandas as pd
import csv
from bs4 import BeautifulSoup
import re
import tweepy
import ast
from pytz import timezone
from datetime import datetime
import matplotlib.pyplot as plt
import time
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
from textblob import TextBlob
from unidecode import unidecode
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
ckey= '#######'
csecret= '#######'
atoken= '#########'
asecret= '#########'
class listener(StreamListener):
    def on_data(self,data):
        try:
            global df
            data=json.loads(data)
            time = data["created_at"]
            tweet = unidecode(data["text"])
            tweet1 = BeautifulSoup(tweet,"lxml").get_text()
            df = pd.DataFrame(columns = ['time','tweet'])
            df['time'] = pd.Series(time)
            df['tweet'] = pd.Series(tweet1)
            def convert_time(time):
                eastern = timezone('US/Eastern')
                utc = timezone('UTC')
                created_at = datetime.strptime(time, '%a %b %d %H:%M:%S %z %Y')
                est_created_at = created_at.astimezone(eastern)
                return (est_created_at)
            df['time'] = df['time'].apply(convert_time)
            def hour(time):
                hour = pd.DatetimeIndex(time).hour
                return hour
            df['hour'] = df['time'].apply(hour)
            def sentiment_analysis(tweet):
                sid = SentimentIntensityAnalyzer()
                return (sid.polarity_scores(tweet)['compound'])
            df['compound'] = df['tweet'].apply(sentiment_analysis)
            #print(df['compound'])
            #print(df['time'])
            plt.ion()
            fig, ax = plt.subplots()
            df.plot(y='compound', ax=ax)
            ax.clear()
            ax.axis([ 0, 24, -5,5])
            plt.xlabel('Time')
            plt.ylabel('Sentiment')
            plt.draw()
            plt.pause(0.2)
        except KeyError as e:
            print(str(e))
        return (True)
auth=OAuthHandler(ckey,csecret)
auth.set_access_token(atoken,asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["######"])
Expected Result - One frame of graph getting updated and plotting the real-time data.
Actual Result - Multiple blank graphs
I apologize if I have missed any information or point.
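In the code above, plt.subplots() runs on every call to on_data, so each tweet opens a new figure, and ax.clear() then wipes whatever was just drawn. A common pattern is to create the figure and line once and only update the line's data as values arrive. The sketch below illustrates that pattern with made-up scores; LivePlot and demo are hypothetical names, and no tweepy code is involved.

```python
import matplotlib.pyplot as plt


class LivePlot:
    """Keeps one figure and one line, and updates them as points arrive."""

    def __init__(self):
        # Create the figure once, outside any streaming callback.
        self.fig, self.ax = plt.subplots()
        (self.line,) = self.ax.plot([], [])
        self.ax.set_xlabel("Time")
        self.ax.set_ylabel("Sentiment")
        self.ax.set_ylim(-1, 1)  # VADER compound scores lie in [-1, 1]
        self.xs, self.ys = [], []

    def add(self, x, y):
        """Append one point and refresh the existing line."""
        self.xs.append(x)
        self.ys.append(y)
        self.line.set_data(self.xs, self.ys)
        # Rescale only the x axis; the y limits stay fixed.
        self.ax.relim()
        self.ax.autoscale_view(scalex=True, scaley=False)
        self.fig.canvas.draw_idle()


def demo():
    """Feed a few made-up scores into the plot, pausing so the window redraws."""
    plt.ion()
    live = LivePlot()
    for i, score in enumerate([0.1, -0.4, 0.6, 0.0]):
        live.add(i, score)
        plt.pause(0.2)
```

In the streaming script, this would mean creating one LivePlot before starting the stream and calling its add method (followed by plt.pause) from on_data, rather than calling plt.subplots() there. Whether the GUI updates reliably from the stream's callback depends on the backend and on which thread the callback runs in, so treat this as a starting point.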