How to connect a MySQL database to an AFS server - Python

I have a Python script that scrapes real-time data from Yahoo Finance (using the yfinance library), converts it into a DataFrame, and then stores the data in a MySQL table (using the sqlalchemy library). The code runs perfectly fine on my laptop and stores the data in the desired table on my local system. Now I am running this code on an AFS remote server (for those unfamiliar with AFS, it is a remote server setup that universities use). The problem I am facing is that I cannot set up a MySQL server on AFS, nor can I connect MySQL to AFS. Here's my code:
import pandas as pd
import yfinance as yf
import requests
from sqlalchemy import create_engine

# Fetch the ticker list from a remote text file
link = "link from where the ticker list is taken"
f = requests.get(link)
ticker = f.text
ticker = ticker.strip('[]').split('\n')
ticker = list(filter(None, ticker))

# Download one day of 1-minute bars for all tickers, grouped by ticker
data = yf.download(tickers=ticker, period='1d', interval='1m', group_by='Ticker')

# Flatten the per-ticker column groups into one long DataFrame with a Ticker column
column_list = data.columns
comp_list = list(set([clm[0] for clm in column_list]))
temp_df = data[comp_list[0]].copy()
temp_df.insert(0, 'Ticker', comp_list[0])
for comp_name in comp_list[1:]:
    new_df = data[comp_name].copy()
    new_df['Ticker'] = comp_name
    temp_df = pd.concat([temp_df, new_df], axis=0)

final_data = temp_df.reset_index()
final_data = final_data.sort_values(by='Datetime', ascending=False)
final_data = final_data.dropna()

# The separator before the host must be '@', not '#'
engine = create_engine('mysql+pymysql://username:password@localhost/database')
final_data.to_sql('stocks', con=engine, if_exists='replace', index=False)
print('task_done')
Please help me connect MySQL to AFS and save the data in a database table. Any leads would be appreciated.
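Since AFS is essentially a shared filesystem/login environment where you normally cannot run your own MySQL daemon, one workaround (a minimal sketch, not an AFS-specific recipe - the host, credentials, and paths below are placeholders) is to point SQLAlchemy at a MySQL server that already runs somewhere reachable from AFS, or to fall back to a SQLite file stored in your AFS home directory:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder DataFrame standing in for final_data from the script above
final_data = pd.DataFrame({'Ticker': ['AAPL'], 'Close': [190.0]})

# Option 1: connect to a MySQL server that already exists elsewhere
# (hostname and credentials are placeholders)
# engine = create_engine('mysql+pymysql://username:password@db.example.edu:3306/database')

# Option 2: if no MySQL server is reachable, use a SQLite file in your AFS home
# directory (path is a placeholder); SQLAlchemy and to_sql work the same way
engine = create_engine('sqlite:////afs/your.site/home/youruser/stocks.db')

final_data.to_sql('stocks', con=engine, if_exists='replace', index=False)
print('task_done')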

Related

Connecting excel file to pgAdmin table using Python

I have an Excel file with 7 columns of daily information. I'm trying to load the Excel file into a pgAdmin (PostgreSQL) table, but I can't get it to work. Please help me.
My database name is "exceltodatabase". The table name is "daily" and it has 7 columns. The hostname is "localhost", the port is 5432, and the password is 1234.
import pandas as pd
from sqlalchemy import create_engine
import psycopg2

# The separator before the host must be '@', not '#'
engine = create_engine('postgresql+psycopg2://postgres:1234@localhost:5432/exceltodatabase')

# Read the workbook and append its rows to the "daily" table
with pd.ExcelFile('C:/Users/Administrator/PycharmProjects/TelegramBot/ActivateBot/masters.xlsx') as xls:
    df = pd.read_excel(xls)
    df.to_sql(name='daily', con=engine, if_exists='append', index=False)
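A quick way to check whether the rows actually arrived (a minimal sketch, assuming the corrected '@' connection string above) is to read the table back with pandas:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://postgres:1234@localhost:5432/exceltodatabase')

# Read back a few rows from the "daily" table to confirm the append worked
check = pd.read_sql_query('SELECT * FROM daily LIMIT 5', engine)
print(check)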

Reading a MySQL query with Python where output is empty

I'm trying to connect MySQL with Python in order to automate some reports. For now, I'm just testing the connection. It seems to be working, but here's the problem: the output from my Python code is different from the one I get in MySQL.
Here I attach the query used and the output that I can find in MySQL:
The testing query for the Python connection:
SELECT accountID
FROM Account
WHERE accountID in ('340','339','343');
The output from MySQL (using DBeaver). For this test, the column chosen contains integers:
accountID
1 339
2 340
3 343
Here I attach the actual output from my Python code:
today:
20200811
Will return true if the connection works:
True
Empty DataFrame
Columns: [accountID]
Index: []
In order to help you understand the problem, please find attached my python code:
import pandas as pd
import json
import pymysql
import paramiko
from datetime import date
from time import time  # time() here is meant for elapsed-time measurement
tiempo_inicial = time()
today = date.today()
today= today.strftime("%Y%m%d")
print('today:')
print(today)
#from paramiko import SSHClient
from sshtunnel import SSHTunnelForwarder
**(part that contains all the connection information, due to data protection this part can't be shared)**
print('will return true if connection works:')
print(conn.open)
query = '''SELECT accountId
FROM Account
WHERE accountID in ('340','339','343');'''
data = pd.read_sql_query(query, conn)
print(data)
conn.close()
From my point of view, this output doesn't make sense, since the connection is working and the query was previously tested in MySQL with a positive result. I tried with other columns that contain names or dates and the result doesn't change.
Any idea why I'm getting this "Empty DataFrame" output?
Thanks
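One thing worth ruling out (a sketch under assumptions, since the real connection block was redacted above) is that the SSH tunnel forwards to a different MySQL instance or default schema than the one you query in DBeaver. A typical sshtunnel + pymysql setup looks like the following; every host, user, and key path here is a hypothetical placeholder, and the SELECT DATABASE() check shows which schema the Python connection is actually using:

import pandas as pd
import pymysql
from sshtunnel import SSHTunnelForwarder

# All hosts, users, and key paths below are hypothetical placeholders
with SSHTunnelForwarder(
        ('ssh.example.com', 22),
        ssh_username='ssh_user',
        ssh_pkey='/path/to/private_key',
        remote_bind_address=('127.0.0.1', 3306)) as tunnel:
    conn = pymysql.connect(
        host='127.0.0.1',
        port=tunnel.local_bind_port,
        user='db_user',
        password='db_password',
        database='your_database')  # must match the schema used in DBeaver
    # Confirm which schema the tunnelled connection actually points at
    print(pd.read_sql_query('SELECT DATABASE();', conn))
    print(pd.read_sql_query(
        "SELECT accountID FROM Account WHERE accountID IN ('340','339','343');", conn))
    conn.close()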

Importing a .sql file in python

I have just started learning SQL and I'm having some difficulty importing my .sql file in Python.
The .sql file is on my desktop, as is my .py file.
That's what I tried so far:
import codecs
from codecs import open
import pandas as pd
sqlfile = "countries.sql"
sql = open(sqlfile, mode='r', encoding='utf-8-sig').read()
pd.read_sql_query("SELECT name FROM countries")
But I got the following message error:
TypeError: read_sql_query() missing 1 required positional argument: 'con'
I think I have to create some kind of connection, but I can't find a way to do that. Converting my data to an ordinary pandas DataFrame would help me a lot.
Thank you
This code snippet, taken from https://www.dataquest.io/blog/python-pandas-databases/, should help.
import pandas as pd
import sqlite3
conn = sqlite3.connect("flights.db")
df = pd.read_sql_query("select * from airlines limit 5;", conn)
Do not read a database as an ordinary file. It has a specific binary format, and a dedicated client should be used.
With such a client you can create a connection that can handle SQL queries and can be passed to read_sql_query.
Refer to the documentation often: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html
You need a database connection. I don't know what SQL flavor you are using, but suppose you want to run your query against SQL Server:
import pyodbc
con = pyodbc.connect(driver='{SQL Server}', server='yourserverurl',
                     database='yourdb', trusted_connection='yes')
then pass the connection instance to pandas
pd.read_sql_query("SELECT name FROM countries", con)
more about pyodbc here
And if you want to query an SQLite database
import sqlite3
con = sqlite3.connect('pathto/example.db')
More about sqlite here
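If countries.sql is a plain-text dump containing CREATE TABLE and INSERT statements, a minimal sketch (assuming the dump uses SQLite-compatible syntax, which is not guaranteed for dumps produced by other engines) is to execute it into an in-memory SQLite database and then query it with pandas:

import sqlite3
import pandas as pd

# Read the SQL script and execute it into a throwaway in-memory database
with open("countries.sql", mode="r", encoding="utf-8-sig") as f:
    script = f.read()

con = sqlite3.connect(":memory:")
con.executescript(script)  # runs all CREATE TABLE / INSERT statements

# Now the tables exist, so read_sql_query has a connection to work with
df = pd.read_sql_query("SELECT name FROM countries", con)
print(df.head())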

Can't Save Dataframe to Local Mac Machine

I am using a Databricks notebook and trying to export my dataframe as CSV to my local machine after querying it. However, it does not save my CSV to my local machine. Why?
Connect to Database
#SQL Connector
import pandas as pd
import psycopg2
import numpy as np
from pyspark.sql import *
#Connection
cnx = psycopg2.connect(dbname= 'test', host='test', port= '1234', user= 'test', password= 'test')
cursor = cnx.cursor()
SQL Query
query = """
SELECT * from products;
"""
# Execute the query
try:
    cursor.execute(query)
except psycopg2.OperationalError as msg:
    print("Command skipped: ", msg)

# Fetch all rows from the result
rows = cursor.fetchall()

# Convert into a Pandas DataFrame
df = pd.DataFrame([[ij for ij in i] for i in rows])
Exporting Data as CSV to Local Machine
df.to_csv('test.csv')
It does NOT give any error, but when I use my Mac's search to look for "test.csv", it doesn't exist. I presume the operation did not work, and the file was never saved from the Databricks cloud server to my local machine... Does anybody know how to fix this?
Select from SQL Server:
import pypyodbc

cnxn = pypyodbc.connect("Driver={SQL Server Native Client 11.0};"
                        "Server=Server_Name;"
                        "Database=TestDB;"
                        "Trusted_Connection=yes;")

#cursor = cnxn.cursor()
#cursor.execute("select * from Actions")
cursor = cnxn.cursor()
cursor.execute('SELECT * FROM Actions')
for row in cursor:
    print('row = %r' % (row,))
From SQL Server to Excel:
import pyodbc
import pandas as pd

# cnxn = pyodbc.connect("Driver={SQL Server};SERVER=xxx;Database=xxx;UID=xxx;PWD=xxx")
cnxn = pyodbc.connect(r"Driver={SQL Server};SERVER=EXCEL-PC\SQLEXPRESS;Database=NORTHWND;")
data = pd.read_sql('SELECT * FROM Orders', cnxn)
data.to_excel('C:\\your_path_here\\foo.xlsx')
Since you are using Databricks, you are most probably working on a remote machine. As was already mentioned, saving the way you do won't work (the file will be saved to the machine your notebook's master node is on). Try running:
import os
os.listdir(os.getcwd())
This will list all the files in the directory the notebook is running from (at least that is how Jupyter notebooks work). You should see the saved file there.
However, I would think that Databricks provides utility functions to its clients for easy data download from the cloud. Also, try using Spark to connect to the database - it might be a little more convenient; a sketch follows below.
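For example, a minimal sketch of reading the table through Spark's JDBC data source instead of psycopg2 (reusing the hypothetical host, port, and credentials from the question; the PostgreSQL JDBC driver must be available on the cluster):

# Read the products table directly into a Spark DataFrame over JDBC
jdbc_df = (spark.read.format("jdbc")
           .option("url", "jdbc:postgresql://test:1234/test")
           .option("dbtable", "products")
           .option("user", "test")
           .option("password", "test")
           .load())

display(jdbc_df)  # Databricks renders the result with a download option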
I think these two links should be useful for you:
Similar question on databricks forums
Databricks documentation
Because you're running this in a Databricks notebook, when you're using Pandas to save your file to test.csv, this is being saved to the Databricks driver node's file directory. A way to test this out is the following code snippet:
# Within Databricks, there are sample files ready to use within
# the /databricks-datasets folder
df = spark.read.csv("/databricks-datasets/samples/population-vs-price/data_geo.csv", inferSchema=True, header=True)
# Converting the Spark DataFrame to a Pandas DataFrame
import pandas as pd
pdDF = df.toPandas()
# Save the Pandas DataFrame to disk
pdDF.to_csv('test.csv')
The location of your test.csv is within the /databricks/driver/ folder of your Databricks' cluster driver node. To validate this:
# Run the following shell command to see the results
%sh cat test.csv
# The output directory is shown here
%sh pwd
# Output
# /databricks/driver
To save the file to your local machine (i.e. your Mac), you can view the Spark DataFrame using the display command within your Databricks notebook. From there, you can click on the "Download to CSV" button shown alongside the rendered results.
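Another option (a sketch, assuming the workspace has DBFS FileStore enabled) is to write the CSV into the /FileStore folder of DBFS, which Databricks exposes for browser download via the workspace's /files/ URL:

# Write the Pandas DataFrame into DBFS through the /dbfs fuse mount on the driver
pdDF.to_csv('/dbfs/FileStore/test.csv', index=False)

# The file can then be downloaded in a browser from
#   https://<your-databricks-workspace>/files/test.csv
# (the workspace URL is a placeholder for your own deployment)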

BigQuery insert dates into 'DATE' type field using Python Google Cloud library

I'm using Python 2.7 and the Google Cloud Client Library for Python (v0.27.0) to insert data into a BigQuery table (using table.insert_data()).
One of the fields in my table has type 'DATE'.
In my Python script I've formatted the date-data as 'YYYY-MM-DD', but unfortunately the Google Cloud library returns an 'Invalid date:' error for that field.
I've tried formatting the date field in many ways (e.g. 'YYYYMMDD', a timestamp, etc.), but no luck so far...
Unfortunately the API docs (https://googlecloudplatform.github.io/google-cloud-python/latest/) don't mention anything about the required date format/type/object in Python.
This is my code:
from google.cloud import bigquery
import pandas as pd
import json
from pprint import pprint
from collections import OrderedDict
# Using a pandas dataframe 'df' as input
# Converting date field to YYYY-MM-DD format
df['DATE_VALUE_LOCAL'] = df['DATE_VALUE_LOCAL'].apply(lambda x: x.strftime('%Y-%m-%d'))
# Converting pandas dataframe to json
json_data = df.to_json(orient='records',date_format='iso')
# Instantiates a client
bigquery_client = bigquery.Client(project="xxx")
# The name for the new dataset
dataset_name = 'dataset_name'
table_name = 'table_name'
def stream_data(dataset_name, table_name, json_data):
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(table_name)
    data = json.loads(json_data, object_pairs_hook=OrderedDict)
    # Reload the table to get the schema.
    table.reload()
    errors = table.insert_data(data)
    if not errors:
        print('Loaded 1 row into {}:{}'.format(dataset_name, table_name))
    else:
        print('Errors:')
        pprint(errors)

stream_data(dataset_name, table_name, json_data)
What is the required Python date format/type/object to insert my dates into a BigQuery DATE field?
I just simulated your code here and everything worked fine. Here's what I've simulated:
import pandas as pd
import json
import os
from collections import OrderedDict
from google.cloud.bigquery import Client
# Sample data with a DATE column formatted as 'YYYY-MM-DD' strings
d = {'ed': ['3', '5'],
     'date': ['2017-10-11', '2017-11-12']}
df = pd.DataFrame(d)

json_data = df.to_json(orient='records', date_format='iso')
json_data = json.loads(json_data, object_pairs_hook=OrderedDict)

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/key.json'
bc = Client()
ds = bc.dataset('dataset name')
table = ds.table('table I just created')
table = bc.get_table(table)
bc.create_rows(table, json_data)
It's using version 0.28.0, but the methods are the same as in previous versions.
You probably have a mistake in some step that converts the date to a format BigQuery can't recognize. Try using this script as a reference to see where the mistake might be happening in your code.
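For what it's worth, on newer versions of the client library (an assumption, since the question targets 0.27.0) the same streaming insert can be done with insert_rows_json, and a DATE column still expects plain 'YYYY-MM-DD' strings; the field names below just mirror the question's columns and are placeholders for your actual schema:

from google.cloud import bigquery

client = bigquery.Client(project="xxx")
table = client.get_table("dataset_name.table_name")

# DATE fields accept ISO 'YYYY-MM-DD' strings in streaming inserts
rows = [{"ed": "3", "DATE_VALUE_LOCAL": "2017-10-11"}]
errors = client.insert_rows_json(table, rows)
print(errors or "Loaded {} row(s)".format(len(rows)))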
