Real-time plotting of two columns of a dynamic DataFrame


I am trying to plot real-time data as it is loaded into a DataFrame. My attempts so far open multiple blank figures in response to the dynamic data feed, instead of plotting the data in a single figure.
I am implementing a solution to perform sentiment analysis on a live Twitter stream. I am able to stream the tweets, put them into a DataFrame and apply the required sentiment analysis algorithm to them one by one. I created a column in the DataFrame that holds the compound value generated by that algorithm for each individual tweet.
This DataFrame is updated dynamically as the tweets stream in, and the intent is to plot the updated compound value against time in real time.
I have tried the commonly advised approach of using plt.ion() and plt.draw() instead of plt.show(). But instead of drawing one figure that is updated with the new values, the program opens one figure after another as the data in the DataFrame is updated.
import pandas as pd
import csv
from bs4 import BeautifulSoup
import re
import tweepy
import ast
from pytz import timezone
from datetime import datetime
import matplotlib.pyplot as plt
import time
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
from textblob import TextBlob
from unidecode import unidecode
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
ckey= '#######'
csecret= '#######'
atoken= '#########'
asecret= '#########'
class listener(StreamListener):
    def on_data(self,data):
        try:
            global df
            data=json.loads(data)
            time = data["created_at"]
            tweet = unidecode(data["text"])
            tweet1 = BeautifulSoup(tweet,"lxml").get_text()
            df = pd.DataFrame(columns = ['time','tweet'])
            df['time'] = pd.Series(time)
            df['tweet'] = pd.Series(tweet1)
            def convert_time(time):
                eastern = timezone('US/Eastern')
                utc = timezone('UTC')
                created_at = datetime.strptime(time, '%a %b %d %H:%M:%S %z %Y')
                est_created_at = created_at.astimezone(eastern)
                return (est_created_at)
            df['time'] = df['time'].apply(convert_time)
            def hour(time):
                hour = pd.DatetimeIndex(time).hour
                return hour
            df['hour'] = df['time'].apply(hour)
            def sentiment_analysis(tweet):
                sid = SentimentIntensityAnalyzer()
                return (sid.polarity_scores(tweet)['compound'])
            df['compound'] = df['tweet'].apply(sentiment_analysis)
            #print(df['compound'])
            #print(df['time'])
            plt.ion()
            fig, ax = plt.subplots()
            df.plot(y='compound', ax=ax)
            ax.clear()
            ax.axis([ 0, 24, -5,5])
            plt.xlabel('Time')
            plt.ylabel('Sentiment')
            plt.draw()
            plt.pause(0.2)
        except KeyError as e:
            print(str(e))
        return (True)
auth=OAuthHandler(ckey,csecret)
auth.set_access_token(atoken,asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["######"])
Expected result: one figure that is updated as the real-time data arrives.
Actual result: multiple blank figures.
I apologize if I have left out any information.
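In the code above, plt.subplots() runs inside on_data, so every tweet opens a new figure, and ax.clear() then wipes what was just plotted. A minimal sketch of the usual alternative follows: create the figure and line once, then only update the line's data per tweet. LiveSentimentPlot and add_point are hypothetical names, the x-axis is assumed to be seconds since the first tweet, and the y-range assumes VADER compound scores, which lie in [-1, 1].

```python
import matplotlib.pyplot as plt


class LiveSentimentPlot:
    """Hypothetical helper: one figure and one line, updated in place."""

    def __init__(self):
        self.xs, self.ys = [], []
        # Create the figure and the line exactly once.
        self.fig, self.ax = plt.subplots()
        (self.line,) = self.ax.plot([], [], marker="o")
        self.ax.set_ylim(-1, 1)  # assumed range: VADER compound score
        self.ax.set_xlabel("Seconds since first tweet")
        self.ax.set_ylabel("Sentiment")

    def add_point(self, seconds, compound):
        """Append one observation and redraw the existing line."""
        self.xs.append(seconds)
        self.ys.append(compound)
        self.line.set_data(self.xs, self.ys)
        self.ax.relim()  # recompute data limits from the line
        self.ax.autoscale_view(scalex=True, scaley=False)  # rescale x only
        self.fig.canvas.draw_idle()
        self.fig.canvas.flush_events()
```

In the listener, one instance would be created before twitterStream.filter(...) is called (with plt.ion() called once beforehand), and on_data would only compute the compound score and call add_point.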

Related

Reading from SQLite gives only one set of data and then stops

My sensor writes updated data continuously, as I can see in the SQLite database. However, when I read from it for display I only get one set of rows, and then it stops.
import sqlite3
import time
import datetime
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil import parser
from matplotlib import style
style.use('fivethirtyeight')
conn = sqlite3.connect('sqlite2.db')
c = conn.cursor()
def graph_data():
    c.execute('SELECT time, data FROM sensor')
    time = []
    data = []
    for row in c.fetchall():
        date = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
        print(row[0])
graph_data()
c.close()
conn.close()
How do I keep getting live data when I print from c.fetchall()?
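One reading of the code above is that graph_data() runs a single query, so only the rows present at that moment are printed. A minimal sketch of a polling loop follows, assuming the sensor table has the time and data columns used in the query and that the timestamps are '%Y-%m-%d %H:%M:%S' strings, which sort correctly as text. fetch_new_rows and poll are hypothetical names.

```python
import sqlite3
import time


def fetch_new_rows(conn, last_seen):
    """Return rows of `sensor` whose timestamp is later than `last_seen`."""
    cur = conn.execute(
        "SELECT time, data FROM sensor WHERE time > ? ORDER BY time",
        (last_seen,),
    )
    return cur.fetchall()


def poll(conn, interval=1.0, max_polls=None):
    """Print new rows every `interval` seconds; stop after `max_polls` if given."""
    last_seen = ""  # the empty string sorts before any timestamp text
    polls = 0
    while max_polls is None or polls < max_polls:
        for ts, value in fetch_new_rows(conn, last_seen):
            print(ts, value)
            last_seen = ts  # remember the newest row already printed
        polls += 1
        time.sleep(interval)
```

It could be started with poll(sqlite3.connect('sqlite2.db')). Because the comparison is strict, a row that arrives later with the same timestamp as the last printed row would be skipped; an auto-incrementing id column, if the table has one, would be a safer cursor.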

Python - Get historical prices from Yahoo -Error message

I'm trying to get the historical prices for a symbol from Yahoo. When I run the code below, I get this error message:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
How to fix this issue?
# import modules
import pandas
from pandas_datareader import data as pdr
import yfinance as yfin
yfin.pdr_override()
import matplotlib.pyplot as plt
# initializing Parameters
start = "2020-01-01"
end = "2021-01-01"
symbols = ["AAPL"]
# Getting the data
data = pdr.get_data_yahoo(symbols, start, end)
# Display
plt.figure(figsize = (20,10))
plt.title('Opening Prices from {} to {}'.format(start, end))
plt.plot(data['Open'])
plt.show()

Get the median and average results of currency conversion using Elasticsearch

I already have code that fetches the information and turns it into JSON.
I am not sure how to compute the median and average conversion rates and print them on the screen along with the conversion rates.
#import libraries to handle request to api
import requests
import json
import pprint
import pandas as pd
import matplotlib.pyplot as plt
#base currency or reference currency
base="USD"
#required currency for plot
out_curr="ILS"
#exchange data from a date
start_date="2020-01-01"
#exchange data till a date
end_date="2021-12-31"
#api url for request
url = 'https://api.exchangerate.host/timeseries?base={0}&start_date={1}&end_date={2}&symbols={3}'.format(base,start_date,end_date,out_curr)
response = requests.get(url)
#retrieve response in json format
data = response.json()
pprint.pprint(data["rates"])
#create an empty list to store dates and exchange rates
rates=[]
#extract the date and rate from each item of the "rates" dictionary and append them to the list created above
for i, j in data["rates"].items():
    rates.append([i, j[out_curr]])
print(rates)
#create a data frame
df=pd.DataFrame(rates)
#define column names explicitly
df.columns=["date","rate"]
df
#Put dates on the x-axis
x = df['date']
#Put exchange rates on the y-axis
y = df['rate']
#Specify the width and height of a figure in unit inches
fig = plt.figure(figsize=(15, 6))
#Rotate the date ticks on the x-axis by degrees
plt.xticks(rotation=90)
#Set title on the axis
plt.xlabel('Date', fontsize=12)
plt.ylabel('Exchange Rates', fontsize=12)
#Plot the data
plt.plot(x,y)
plt.show()
I'm unable to find a way to show that information.
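The average and median asked about above can be computed directly from the rates list of [date, rate] pairs that the code already builds. A minimal sketch using only the standard library follows; summarize_rates is a hypothetical helper, and the sample values are made up rather than taken from the API.

```python
from statistics import mean, median


def summarize_rates(rates):
    """Return (average, median) of the rate in each [date, rate] pair."""
    values = [rate for _date, rate in rates]
    return mean(values), median(values)


# Made-up sample values in the same [date, rate] shape as the `rates` list.
sample_rates = [["2020-01-01", 3.40], ["2020-01-02", 3.50], ["2020-01-03", 3.45]]
average, middle = summarize_rates(sample_rates)
print(f"average: {average:.4f}  median: {middle:.4f}")
```

Once the DataFrame exists, df['rate'].mean() and df['rate'].median() give the same two numbers.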

Error ValueError: day is out of range for month

I was looking at a sample table of student information and wanted to see what days were the most popular for students to enroll on a course. The script worked fine the first day I ran it and I left it. A few days later I returned to take another look at it but started getting the ValueError message.
Why did it stop working? No new information was added to the dataset since it worked the first time. The code now fails at:
df["EnrolmentDate"] = pd.to_datetime(df.EnrolmentDate)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import seaborn as sns
colors = sns.cubehelix_palette(28, rot=-0.4)
df = pd.read_csv("data.csv")
#print(df.dtypes)
# To change the format of the data type from object to datetime. Has to be run at start of script or the format returns to object.
#df['Enrolment Date'] = pd.to_datetime(df['Enrolment Date'])
print(df.EnrolmentDate.str.slice(0, 10))
df["EnrolmentDate"] = pd.to_datetime(df.EnrolmentDate)
print(df.head())
print(df.dtypes)
#Tells us what day of the week the enrolment date was. Can also use .dayofyear. Google Pandas API Reference, search for .dt., datetime properties
print(df.EnrolmentDate.dt.weekday_name)
#Shows the latest or greatest enrolment date
print(df.EnrolmentDate.max())
print(df.EnrolmentDate.min())
print(df.EnrolmentDate.max()-df.EnrolmentDate.min())
df["EnrolmentDay"] = df.EnrolmentDate.dt.weekday_name
print(df.head())
print(df.EnrolmentDay.value_counts())
print(df.EnrolmentDay.value_counts().plot())
#print(df.Day.value_counts().sort_index())
#df.EnrolmentDay.value_counts().sort_index().plot()
# naming the x axis
plt.xlabel('Day')
# naming the y axis
plt.ylabel('No. of Enrolments')
plt.show()
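When no format is given, pd.to_datetime has to guess the layout of each string, so passing the format explicitly removes one source of surprises. A minimal sketch follows, assuming the EnrolmentDate column holds day/month/year strings; the actual layout of data.csv is not shown, so the format string is a guess to adjust. parse_enrolment_dates is a hypothetical helper and the sample values are made up.

```python
import pandas as pd


def parse_enrolment_dates(column, fmt="%d/%m/%Y"):
    """Parse strings with an explicit format; unparseable values become NaT."""
    return pd.to_datetime(column, format=fmt, errors="coerce")


# Made-up sample values, not taken from data.csv.
raw = pd.Series(["31/01/2018", "01/02/2018", "not a date"])
parsed = parse_enrolment_dates(raw)

print(raw[parsed.isna()])                    # rows that failed to parse
print(parsed.dt.day_name().value_counts())   # enrolments per weekday
```

The sketch uses .dt.day_name() because .dt.weekday_name was removed in pandas 1.0.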

Two time series won't graph on same x axis (date format issue)?

I'm trying to import data from both iex and FRED. Although both time series cover the same time period, when I graph them together the data does not show up correctly on the same x axis. I suspect this is due to differences between how the iex dates and the FRED dates are formatted.
Code below:
import matplotlib.pyplot as plt
import pandas as pd
from pandas_datareader.data import DataReader
from datetime import date
start = date(2016,1,1)
end = date(2016,12,31)
ticker = 'AAPL'
data_source = 'iex'
stock_prices = DataReader(ticker, data_source, start, end)
print(stock_prices.head())
stock_prices.info()
stock_prices['close'].plot(title=ticker)
plt.show()
series = 'DCOILWTICO'
start = date(2016,1,1)
end = date(2016,12,31)
oil = DataReader(series,'fred',start,end)
print(oil.head())
oil.info()
data = pd.concat([stock_prices[['close']],oil],axis=1)
print(data.head())
data.columns = ['AAPL','Oil Price']
data.plot()
plt.show()

Using join instead of pd.concat will give you what you want:
data = stock_prices[['close']].join(oil)
The main issue with pd.concat is that the indexes of your two data sets are not aligned, hence the oddly stitched DataFrame. DataFrame.join will take care of the misalignment.
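One way indexes can end up misaligned is when one holds date strings and the other is a DatetimeIndex. That is an assumption here, since the output of stock_prices.info() is not shown. A minimal sketch of converting the index before joining follows, using made-up sample data; align_on_dates is a hypothetical helper.

```python
import pandas as pd

# Made-up sample data; the string index stands in for the iex result (an assumption).
stock_prices = pd.DataFrame(
    {"close": [105.35, 102.71]},
    index=["2016-01-04", "2016-01-05"],
)
oil = pd.DataFrame(
    {"DCOILWTICO": [36.81, 35.97]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05"]),
)


def align_on_dates(left, right):
    """Convert the left index to datetimes, then join on the shared dates."""
    left = left.copy()
    left.index = pd.to_datetime(left.index)
    return left.join(right)


data = align_on_dates(stock_prices[["close"]], oil)
data.columns = ["AAPL", "Oil Price"]
print(data)
```

With both indexes of the same type, each date appears once and data.plot() draws the two series on a shared x axis.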
