Pandas dataframe: No numeric data to plot - python

I have a table stored in a MySQL database.
I fetched the results using the MySQL connector and copied them into a DataFrame; there are no problems up to that point.
Since the results come back from MySQL as strings, I converted the CONFIRMED_CASES values to int and left STATE_NAME as str.
Now I want to plot a bar graph with CONFIRMED_CASES as the numeric data and STATE_NAME as the state names, but it shows me this error:
Traceback (most recent call last):
File "c:\Users\intel\Desktop\Covid Tracker\get_sql_data.py", line 66, in <module>
fd.get_graph()
File "c:\Users\intel\Desktop\Covid Tracker\get_sql_data.py", line 59, in get_graph
ax = df.plot(y='STATE_NAME', x='CONFIRMED_CASES',
...
raise TypeError("no numeric data to plot")
TypeError: no numeric data to plot
Here's my code:
from operator import index
from sqlalchemy import create_engine
import mysql.connector
import pandas as pd
import matplotlib.pyplot as plt

mydb = mysql.connector.connect(
    host="localhost",
    user="abhay",
    password="1234",
    database="covid_db"
)
mycursor = mydb.cursor()

class fetch_data():
    def __init__(self):
        ...

    def get_data(self, cmd):
        ...

    def get_graph(self):
        mydb = mysql.connector.connect(
            host="localhost",
            user="abhay",
            password="1234",
            database="covid_db"
        )
        mycursor = mydb.cursor(buffered=True)
        mycursor.execute(
            "select CONFIRMED_CASES from india;")
        query = mycursor.fetchall()
        # the data was like 10,000 so I removed the ',' and converted it to int
        query = [int(x[0].replace(',', '')) for x in query]
        print(query)
        query2 = mycursor.execute(
            "select STATE_NAME from india;")
        query2 = mycursor.fetchall()
        # this is the query for state name and I kept it as str only...
        query2 = [x[0] for x in query2]
        print(query2)
        df = pd.DataFrame({
            'CONFIRMED_CASES': [query],
            'STATE_NAME': [query2],
        })
        # it gives me the error here...
        ax = df.plot(y='STATE_NAME', x='CONFIRMED_CASES',
                     kind='bar')
        ax.ticklabel_format(style='plain', axis='y')
        plt.show()

fd = fetch_data()
fd.get_graph()
I don't know why there is no numeric value. I set it to int, but still...

Your dataframe is being built from lists of a list: query and query2 are already lists, so the extra brackets turn each column into a single cell holding a whole list. Define your dataframe as:
df = pd.DataFrame({
    'CONFIRMED_CASES': query,   # no brackets
    'STATE_NAME': query2,       # no brackets
})

Use the code below to convert the data to integers before plotting the graph.
df['CONFIRMED_CASES'] = df['CONFIRMED_CASES'].astype('int64')
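Putting the two answers together, a minimal sketch of the working flow, with made-up sample values standing in for the MySQL results; note that for a bar chart the categorical column goes on x and the numeric column on y, the reverse of the call in the question:
import pandas as pd
import matplotlib.pyplot as plt

# Plain lists, one value per row -- no extra brackets.
cases = ['1,01,000', '52,000', '37,000']    # sample strings standing in for the query result
states = ['Maharashtra', 'Delhi', 'Kerala']

df = pd.DataFrame({'CONFIRMED_CASES': cases, 'STATE_NAME': states})
# Strip the thousands separators and convert to integers before plotting.
df['CONFIRMED_CASES'] = df['CONFIRMED_CASES'].str.replace(',', '').astype('int64')

ax = df.plot(x='STATE_NAME', y='CONFIRMED_CASES', kind='bar')
ax.ticklabel_format(style='plain', axis='y')
plt.show()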

Heyy guys,
It's solved.
I used query.values and it worked...

Related

Year is Out of Range Cx_Oracle Python

The code below usually lets me query data without problems, except this time...
import cx_Oracle
import pandas as pd

def oracle(user, pwd, dsn, sql, columns):
    # Connection to databases
    con = cx_Oracle.connect(user=user, password=pwd, dsn=dsn, encoding="UTF-8")
    cur = con.cursor()
    # Check Connection
    print('Connected')
    # Create DF
    df = pd.DataFrame(cur.execute(sql).fetchall(), columns=columns, dtype='str')
    print('Shape:', df.shape)
    return df
Below is the error.
ValueError Traceback (most recent call last)
<timed exec> in <module>
<timed exec> in oracle_aml(user, pwd, dsn, sql)
<timed exec> in oracle(user, pwd, dsn, sql, columns)
ValueError: year -7 is out of range
Question: How can I get past this error? It says that some date columns contain the value -7, which is due to bad data in the DB.
I thought adding the expression below would make it ignore the column types, but it was not really helpful.
dtype = 'str'
Thanks to anyone helping!
Thanks to this link, I have been able to solve my problem. Below is the full code I used (it worked for me):
import cx_Oracle
import datetime
import os
import pandas as pd

os.environ['NLS_DATE_FORMAT'] = 'YYYY-MM-DD HH24:MI:SS'

def DateTimeConverter(value):
    if value.startswith('9999'):
        return None
    return datetime.datetime.strptime(value, '%Y-%m-%d %H:%M:%S')

def OutputHandler(cursor, name, defaulttype, length, precision, scale):
    # Fetch DATE/TIMESTAMP columns as strings and convert them ourselves,
    # so out-of-range values never reach datetime.
    if defaulttype == cx_Oracle.DATETIME:
        return cursor.var(cx_Oracle.STRING, arraysize=cursor.arraysize, outconverter=DateTimeConverter)

def oracle(user, pwd, dsn, sql, columns):
    # Connection to databases
    con = cx_Oracle.connect(user=user, password=pwd, dsn=dsn, encoding="UTF-8")
    con.outputtypehandler = OutputHandler
    # Cursor allows Python code to execute SQL commands in a database session
    cur = con.cursor()
    # Check Connection
    print('Connected')
    # Create DF
    df = pd.DataFrame(cur.execute(sql).fetchall(), columns=columns, dtype='object')
    print('Shape:', df.shape)
    return df
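For reference, a hypothetical call to this function; the credentials, DSN, query, and column list below are placeholders, not values from the question:
df = oracle(user='scott', pwd='tiger', dsn='localhost/XEPDB1',
            sql='SELECT * FROM some_table',
            columns=['ID', 'CREATED_AT', 'STATUS'])
print(df.dtypes)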

Is it possible to write a python list into a database with pymssql?

I am trying to write a Python list into a database table via pymssql.
I am aiming to write each list entry into a different row of the same column.
When trying this I get the error:
ValueError: 'params' arg () can be only a tuple or a dictionary.
Is there a way to do this with pymssql, or should I use something else?
My code:
from bs4 import BeautifulSoup as bs
import re
import pandas as pd
from collections.abc import Iterable
import pymssql

conn = pymssql.connect(
    host='x',
    port=x,
    user='x',
    password='x',
    database='x'
)
cursor = conn.cursor()
cursor.execute('SELECT x FROM x')
text = cursor.fetchall()
conn.close()

raw = []
raw.append(text)
raw1 = str(raw)
soup = bs(raw1, 'html.parser')
autor = soup.get_text()

clear = []
s = autor.replace('\\n', '')
clear.append(s)

conn = pymssql.connect(
    host='x',
    port=x,
    user='x',
    password='x',
    database='x'
)
cursor = conn.cursor()
cursor.execute('INSERT INTO mytablename (columnname) VALUES (?);', [','.join(clear)])
conn.close()
You can use executemany. pymssql substitutes %s placeholders and expects a sequence of parameter tuples, one per row, and the insert has to be committed:
cursor.executemany('INSERT INTO mytablename (columnname) VALUES (%s);', [(entry,) for entry in clear])
conn.commit()

How to store data in MySQL database via python

This is the script that I wrote so far. The first blocker I hit is that I am not able to install the MySQLdb package - maybe I could use a different module?
import soundcloud
import pandas as pd
from pandas import DataFrame
import MySQLdb

client = soundcloud.Client(client_id='696b5ca70f5401cc46c9011c78831877')
userId = '110652450'
tracks = client.get('/users/' + userId + '/tracks')

data = []
for x in tracks:
    data.append({'Track_Name': x.title, 'plays': str(x.playback_count)})
df = pd.DataFrame(data)

database = MySQLdb.connect(host="127.0.0.1", user="root", passwd="XXX", db="soundcloudstore")
cursor = database.cursor()
query = """INSERT INTO Tracks (Track_Name, Plays) VALUES (%s,%s)"""
for x in df:
    Track_Name = df[['Track_Name']].value
    Plays = df[['plays']].value
    values = (Track_Name, Plays)
    cursor.execute(query, values)
cursor.close()
database.commit()
database.close()
Download the adapter here: https://dev.mysql.com/doc/connector-python/en/connector-python-installation.html
Then you would use it like so:
import mysql.connector

data = []
for x in tracks:
    data.append((x.title, str(x.playback_count)))

conn = mysql.connector.connect(user='', password='',
                               host='',
                               database='')
cursor = conn.cursor()
q = """INSERT INTO Tracks (Track_Name, Plays) VALUES (%s,%s)"""
cursor.executemany(q, data)
conn.commit()  # without a commit the inserts are not persisted
This saves you from loading the data into a DataFrame for no reason, and executemany is optimized for inserts.

creating a pandas dataframe from a database query that uses bind variables

I'm working with an Oracle database. I can do this much:
import pandas as pd
import pandas.io.sql as psql
import cx_Oracle as odb
conn = odb.connect(_user + '/' + _pass + '@' + _dbenv)
sqlStr = "SELECT * FROM customers"
df = psql.frame_query(sqlStr, conn)
But I don't know how to handle bind variables, like so:
sqlStr = """SELECT * FROM customers
WHERE id BETWEEN :v1 AND :v2
"""
I've tried these variations:
params = (1234, 5678)
params2 = {"v1":1234, "v2":5678}
df = psql.frame_query((sqlStr,params), conn)
df = psql.frame_query((sqlStr,params2), conn)
df = psql.frame_query(sqlStr,params, conn)
df = psql.frame_query(sqlStr,params2, conn)
The following works:
curs = conn.cursor()
curs.execute(sqlStr, params)
df = pd.DataFrame(curs.fetchall())
df.columns = [rec[0] for rec in curs.description]
but this solution is just... inelegant. If I can, I'd like to do this without creating the cursor object. Is there a way to do the whole thing using just pandas?
Try using pandas.io.sql.read_sql_query. I used it with pandas version 0.20.1 and it worked:
import pandas as pd
import pandas.io.sql as psql
import cx_Oracle as odb
conn = odb.connect(_user +'/'+ _pass +'#'+ _dbenv)
sqlStr = """SELECT * FROM customers
WHERE id BETWEEN :v1 AND :v2
"""
pars = {"v1":1234, "v2":5678}
df = psql.read_sql_query(sqlStr, conn, params=pars)
As far as I can tell, pandas expects that the SQL string be completely formed prior to passing it along. With that in mind, I would (and always do) use string interpolation:
params = (1234, 5678)
sqlStr = """
SELECT * FROM customers
WHERE id BETWEEN %d AND %d
""" % params
print(sqlStr)
which gives
SELECT * FROM customers
WHERE id BETWEEN 1234 AND 5678
So that should feed into psql.frame_query just fine (it does in my experience with postgres, mysql, and sql server).
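For completeness, modern pandas can also pass the bind variables through directly, which avoids the injection risk of interpolation. A minimal sketch, assuming the cx_Oracle connection from the question:
import pandas as pd

sqlStr = """SELECT * FROM customers
            WHERE id BETWEEN :v1 AND :v2"""
df = pd.read_sql_query(sqlStr, conn, params={"v1": 1234, "v2": 5678})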

How to convert SQL Query result to PANDAS Data Structure?

Any help on this problem will be greatly appreciated.
So basically I want to run a query against my SQL database and store the returned data as a Pandas data structure.
I have attached the code for the query.
I am reading the documentation on Pandas, but I have trouble identifying the return type of my query.
I tried to print the query result, but it doesn't give any useful information.
Thanks!!!!
from sqlalchemy import create_engine

engine2 = create_engine('mysql://THE DATABASE I AM ACCESSING')
connection2 = engine2.connect()

dataid = 1022
resoverall = connection2.execute("""
    SELECT
        sum(BLABLA) AS BLA,
        sum(BLABLABLA2) AS BLABLABLA2,
        sum(SOME_INT) AS SOME_INT,
        sum(SOME_INT2) AS SOME_INT2,
        100*sum(SOME_INT2)/sum(SOME_INT) AS ctr,
        sum(SOME_INT2)/sum(SOME_INT) AS cpc
    FROM daily_report_cooked
    WHERE campaign_id = '%s'
""" % dataid)
So I sort of want to understand the format/datatype of my variable "resoverall" and how to get it into a pandas data structure.
Here's the shortest code that will do the job:
from pandas import DataFrame
df = DataFrame(resoverall.fetchall())
df.columns = resoverall.keys()
You can go fancier and parse the types as in Paul's answer.
Edit: Mar. 2015
As noted below, pandas now uses SQLAlchemy to both read from (read_sql) and insert into (to_sql) a database. The following should work
import pandas as pd
df = pd.read_sql(sql, cnxn)
Previous answer:
Via mikebmassey from a similar question
import pyodbc
import pandas.io.sql as psql
cnxn = pyodbc.connect(connection_info)
cursor = cnxn.cursor()
sql = "SELECT * FROM TABLE"
df = psql.frame_query(sql, cnxn)
cnxn.close()
If you are using SQLAlchemy's ORM rather than the expression language, you might find yourself wanting to convert an object of type sqlalchemy.orm.query.Query to a Pandas data frame.
The cleanest approach is to get the generated SQL from the query's statement attribute, and then execute it with pandas's read_sql() method. E.g., starting with a Query object called query:
df = pd.read_sql(query.statement, query.session.bind)
Edit 2014-09-30:
pandas now has a read_sql function. You definitely want to use that instead.
Original answer:
I can't help you with SQLAlchemy -- I always use pyodbc, MySQLdb, or psycopg2 as needed. But when doing so, a function as simple as the one below tends to suit my needs:
import datetime
import decimal
import pyodbc  # just corrected a typo here
import numpy as np
import pandas

def __processCursor(cur, dataframe=False, index=None):
    '''
    Processes a database cursor with data on it into either
    a structured numpy array or a pandas dataframe.
    input:
    cur - a pyodbc cursor that has just received data
    dataframe - bool. if false, a numpy record array is returned
                if true, return a pandas dataframe
    index - list of column(s) to use as index in a pandas dataframe
    '''
    datatypes = []
    colinfo = cur.description
    for col in colinfo:
        if col[1] == unicode:  # Python 2 only; use str on Python 3
            datatypes.append((col[0], 'U%d' % col[3]))
        elif col[1] == str:
            datatypes.append((col[0], 'S%d' % col[3]))
        elif col[1] in [float, decimal.Decimal]:
            datatypes.append((col[0], 'f4'))
        elif col[1] == datetime.datetime:
            datatypes.append((col[0], 'O4'))
        elif col[1] == int:
            datatypes.append((col[0], 'i4'))
    data = []
    for row in cur:
        data.append(tuple(row))
    array = np.array(data, dtype=datatypes)
    if dataframe:
        output = pandas.DataFrame.from_records(array)
        if index is not None:
            output = output.set_index(index)
    else:
        output = array
    return output

cnn, cur = myConnectToDBfunction()
cmd = "SELECT * FROM myTable"
cur.execute(cmd)
dataframe = __processCursor(cur, dataframe=True)
1. Using MySQL-connector-python
# pip install mysql-connector-python
import mysql.connector
import pandas as pd
mydb = mysql.connector.connect(
    host='host',
    user='username',
    passwd='pass',
    database='db_name'
)
query = 'select * from table_name'
df = pd.read_sql(query, con = mydb)
print(df)
2. Using SQLAlchemy
# pip install pymysql
# pip install sqlalchemy
import pandas as pd
import sqlalchemy
engine = sqlalchemy.create_engine('mysql+pymysql://username:password@localhost:3306/db_name')
query = '''
select * from table_name
'''
df = pd.read_sql_query(query, engine)
print(df)
MySQL Connector
For those who work with the mysql connector, you can use this code as a start. (Thanks to @Daniel Velkov.)
Used refs:
Querying Data Using Connector/Python
Connecting to MYSQL with Python in 3 steps
import pandas as pd
import mysql.connector
# Setup MySQL connection
db = mysql.connector.connect(
    host="<IP>",            # your host, usually localhost
    user="<USER>",          # your username
    password="<PASS>",      # your password
    database="<DATABASE>"   # name of the data base
)
# You must create a Cursor object. It will let you execute all the queries you need
cur = db.cursor()
# Use all the SQL you like
cur.execute("SELECT * FROM <TABLE>")
# Put it all to a data frame
sql_data = pd.DataFrame(cur.fetchall())
sql_data.columns = cur.column_names
# Close the session
db.close()
# Show the data
print(sql_data.head())
Here's the code I use. Hope this helps.
import pandas as pd
from sqlalchemy import create_engine
def getData():
    # Parameters
    ServerName = "my_server"
    Database = "my_db"
    UserPwd = "user:pwd"
    Driver = "driver=SQL Server Native Client 11.0"

    # Create the connection
    engine = create_engine('mssql+pyodbc://' + UserPwd + '@' + ServerName + '/' + Database + "?" + Driver)

    sql = "select * from mytable"
    df = pd.read_sql(sql, engine)
    return df

df2 = getData()
print(df2)
This is a short and crisp answer to your problem:
from __future__ import print_function
import MySQLdb
import numpy as np
import pandas as pd
import xlrd
# Connecting to MySQL Database
connection = MySQLdb.connect(
    host="hostname",
    port=0000,
    user="userID",
    passwd="password",
    db="table_documents",
    charset='utf8'
)
print(connection)
#getting data from database into a dataframe
sql_for_df = 'select * from tabledata'
df_from_database = pd.read_sql(sql_for_df , connection)
Like Nathan, I often want to dump the results of a sqlalchemy or sqlsoup Query into a Pandas data frame. My own solution for this is:
query = session.query(tbl.Field1, tbl.Field2)
DataFrame(query.all(), columns=[column['name'] for column in query.column_descriptions])
resoverall is a sqlalchemy ResultProxy object. You can read more about it in the sqlalchemy docs, which explain the basic usage of working with Engines and Connections. Important here is that resoverall is dict-like.
Pandas likes dict-like objects when creating its data structures; see the online docs.
Good luck with sqlalchemy and pandas.
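A minimal sketch of that idea, assuming resoverall came from connection2.execute() as in the question:
import pandas as pd

# A ResultProxy exposes the column names via .keys().
rows = resoverall.fetchall()
df = pd.DataFrame(rows, columns=resoverall.keys())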
Simply use pandas and pyodbc together. You'll have to modify your connection string (connstr) according to your database specifications.
import pyodbc
import pandas as pd
# MSSQL Connection String Example
connstr = "Server=myServerAddress;Database=myDB;User Id=myUsername;Password=myPass;"
# Query Database and Create DataFrame Using Results
df = pd.read_sql("select * from myTable", pyodbc.connect(connstr))
I've used pyodbc with several enterprise databases (e.g. SQL Server, MySQL, MariaDB, IBM).
This question is old, but I wanted to add my two cents. I read the question as "I want to run a query against my [My]SQL database and store the returned data as a Pandas data structure [DataFrame]."
From the code it looks like you mean a mysql database and assume you mean a pandas DataFrame.
import MySQLdb as mdb
import pandas.io.sql as sql
from pandas import *
conn = mdb.connect('<server>','<user>','<pass>','<db>');
df = sql.read_frame('<query>', conn)
For example,
conn = mdb.connect('localhost','myname','mypass','testdb');
df = sql.read_frame('select * from testTable', conn)
This will import all rows of testTable into a DataFrame.
Long time since the last post, but maybe it helps someone...
A shorter way than Paul H's:
records = query.all()
my_df = pandas.DataFrame.from_records(records)
Here is mine, just in case you are using pymysql:
import pymysql
from pandas import DataFrame
host = 'localhost'
port = 3306
user = 'yourUserName'
passwd = 'yourPassword'
db = 'yourDatabase'
cnx = pymysql.connect(host=host, port=port, user=user, passwd=passwd, db=db)
cur = cnx.cursor()
query = """ SELECT * FROM yourTable LIMIT 10"""
cur.execute(query)
field_names = [i[0] for i in cur.description]
get_data = [xx for xx in cur]
cur.close()
cnx.close()
df = DataFrame(get_data)
df.columns = field_names
pandas.io.sql.write_frame is DEPRECATED.
https://pandas.pydata.org/pandas-docs/version/0.15.2/generated/pandas.io.sql.write_frame.html
You should change to pandas.DataFrame.to_sql:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
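A minimal sketch of the replacement API; the SQLite engine here is a placeholder, and the table name is borrowed from the question:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///example.db')  # placeholder engine
df = pd.DataFrame({'campaign_id': [1022], 'clicks': [42]})

# Write the frame to a table, replacing it if it already exists.
df.to_sql('daily_report_cooked', engine, if_exists='replace', index=False)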
There is another solution, described in this question:
PYODBC to Pandas - DataFrame not working - Shape of passed values is (x,y), indices imply (w,z)
As of Pandas 0.12 (I believe) you can do:
import pandas
import pyodbc
sql = 'select * from table'
cnn = pyodbc.connect(...)
data = pandas.read_sql(sql, cnn)
Prior to 0.12, you could do:
import pandas
from pandas.io.sql import read_frame
import pyodbc
sql = 'select * from table'
cnn = pyodbc.connect(...)
data = read_frame(sql, cnn)
The way I do this:
db = db_class()  # your own database wrapper class
db.execute(query)
mydata = [x for x in db.fetchall()]
df = pd.DataFrame(data=mydata)
If the result type is a ResultSet, you should convert it to a dictionary first; the DataFrame columns will then be collected automatically.
This works in my case:
df = pd.DataFrame([dict(r) for r in resoverall])
Here is a simple solution I like:
Put your DB connection info in a YAML file in a secure location (do not version it in the code repo).
---
host: 'hostname'
port: port_number_integer
database: 'databasename'
user: 'username'
password: 'password'
Then load the conf in a dictionary, open the db connection and load the result set of the SQL query in a data frame:
import yaml
import pymysql
import pandas as pd
db_conf_path = '/path/to/db-conf.yaml'
# Load DB conf
with open(db_conf_path) as db_conf_file:
    db_conf = yaml.safe_load(db_conf_file)
# Connect to the DB
db_connection = pymysql.connect(**db_conf)
# Load the data into a DF
query = '''
SELECT *
FROM my_table
LIMIT 10
'''
df = pd.read_sql(query, con=db_connection)
