I can create a BigQuery view by calling client.create_table, but I couldn't find a way to update the SQL of an existing view.
To create:
table = bigquery.Table(table_ref)
table.view_query = view_query
client.create_table(table)
To update? (does not work)
table = client.get_table(table_ref)
table.view_query = view_query
client.update_table(table, [])
Thoughts?
The second argument to update_table is a list of fields to update in the API. By passing an empty list you are saying: don't update anything. Instead, pass in ['view_query'] as the update properties list.
table = client.get_table(table_ref)
table.view_query = view_query
client.update_table(table, ['view_query'])
Or, as Elliot suggested in the comments, you can use DDL to do this. I used a CREATE OR REPLACE VIEW statement:
job = client.query(
    'CREATE OR REPLACE VIEW `{}.{}.{}` AS {}'.format(
        client.project, dataset, view_name, view_query
    )
)
job.result()
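If you want to confirm the change took effect, a quick sketch (reusing the client and table_ref from above) is to read the view back:

# Read the view back and check that its SQL was replaced.
view = client.get_table(table_ref)
print(view.view_query)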
I want to choose a field to update in my sqlite3 db using Postman, utilizing request.data. However, I receive this error: OperationalError at / near "?": syntax error. I tried this code:
def put(self, request, *args, **kwargs):
    connection = sqlite3.connect('/Users/lambda_school_loaner_182/Documents/job-search-be/jobsearchbe/db.sqlite3')
    cursor = connection.cursor()
    req = request.data
    for key in req:
        if key == id:
            pass
        else:
            print(key)
            cursor.execute("UPDATE users SET ? = ? WHERE id = ?;", (key, req[key], req['id']))
            connection.commit()
    cursor.execute("SELECT * FROM users WHERE id=?", (request.data['id'],))
    results = cursor.fetchall()
    data = []
    # if request.data['id']
    for row in results:
        object1 = {}
        col_name_list = [tuple[0] for tuple in cursor.description]
        for x in range(0, len(col_name_list)):
            object1[col_name_list[x]] = row[x]
        data.append(object1)
    cursor.close()
    # serializer = PostSerializer(data=request.data)
    # if serializer.is_valid():
    #     serializer.save()
    return Response(data)
You won't be able to use ? for identifiers (the database structures, like table and column names). You will need to use string interpolation to put in the column name:
f"UPDATE users SET {key} = ? WHERE id = ?"
? placeholders are basically for values (user-supplied data).
https://docs.python.org/3/library/sqlite3.html
Usually your SQL operations will need to use values from Python variables. You shouldn’t assemble your query using Python’s string operations because doing so is insecure; it makes your program vulnerable to an SQL injection attack (see https://xkcd.com/327/ for humorous example of what can go wrong).
Instead, use the DB-API’s parameter substitution. Put ? as a placeholder wherever you want to use a value, and then provide a tuple of values as the second argument to the cursor’s execute() method. (Other database modules may use a different placeholder, such as %s or :1.)
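Because the column name must be interpolated, it's worth validating it first. A minimal sketch, assuming a hypothetical allowlist of updatable columns on the users table:

import sqlite3

ALLOWED_COLUMNS = {"name", "email", "title"}  # hypothetical; adjust to your schema

def update_user_field(connection, column, value, user_id):
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column!r}")
    # The identifier is interpolated only after validation; values stay as ? parameters.
    connection.execute(f"UPDATE users SET {column} = ? WHERE id = ?", (value, user_id))
    connection.commit()

connection = sqlite3.connect('db.sqlite3')
update_user_field(connection, 'email', 'new@example.com', 1)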
When I make a query in SQLAlchemy, I noticed that the generated SQL uses the AS keyword for each column, setting alias_name = column_name for every column.
For example, if I run the command print(session.query(DefaultLog)), it returns:
Note: DefaultLog is my table object.
SELECT default_log.id AS default_log_id, default_log.msg AS default_log_msg, default_log.logger_time AS default_log_logger_time, default_log.logger_line AS default_log_logger_line, default_log.logger_filepath AS default_log_logger_filepath, default_log.level AS default_log_level, default_log.logger_name AS default_log_logger_name, default_log.logger_method AS default_log_logger_method, default_log.hostname AS default_log_hostname
FROM default_log
Why does it use an alias = original name? Is there some way I can disable this behavior?
Thank you in advance!
Query.statement:
The full SELECT statement represented by this Query.
The statement by default will not have disambiguating labels applied
to the construct unless with_labels(True) is called first.
Using this model:
import sqlalchemy as sa

class DefaultLog(Base):
    __tablename__ = 'defaultlog'

    id = sa.Column(sa.Integer, primary_key=True)
    msg = sa.Column(sa.String(128))
    logger_time = sa.Column(sa.DateTime)
    logger_line = sa.Column(sa.Integer)
print(session.query(DefaultLog).statement) shows:
SELECT defaultlog.id, defaultlog.msg, defaultlog.logger_time, defaultlog.logger_line
FROM defaultlog
print(session.query(DefaultLog).with_labels().statement) shows:
SELECT defaultlog.id AS defaultlog_id, defaultlog.msg AS defaultlog_msg, defaultlog.logger_time AS defaultlog_logger_time, defaultlog.logger_line AS defaultlog_logger_line
FROM defaultlog
You asked:
Why does it use an alias = original name?
From Query.with_labels docs:
...this is commonly used to disambiguate columns from multiple tables which have the same name.
So if you want to issue a single query that calls upon multiple tables, there is nothing stopping those tables having columns that share the same name.
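For illustration, a sketch with two hypothetical models that both define a name column (reusing sa, Base and session from above); without labels, the joined SELECT would produce two indistinguishable name columns:

class User(Base):
    __tablename__ = 'user'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(64))

class Address(Base):
    __tablename__ = 'address'
    id = sa.Column(sa.Integer, primary_key=True)
    user_id = sa.Column(sa.ForeignKey('user.id'))
    name = sa.Column(sa.String(64))

# With labels, each column is prefixed with its table name, so nothing collides:
print(session.query(User, Address).join(Address).with_labels().statement)
# SELECT user.id AS user_id, user.name AS user_name,
#        address.id AS address_id, address.name AS address_name, ...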
Is there some way I can disable this behavior?
Also from the Query.with_labels docs:
When the Query actually issues SQL to load rows, it always uses column
labeling.
All of the methods that retrieve rows (get(), one(), one_or_none(), all() and iterating over the Query) route through the Query.__iter__() method:
def __iter__(self):
    context = self._compile_context()
    context.statement.use_labels = True
    if self._autoflush and not self._populate_existing:
        self.session._autoflush()
    return self._execute_and_instances(context)
... where this line hard codes the label usage: context.statement.use_labels = True. So it is "baked in" and can't be disabled.
You can execute the statement without labels:
session.execute(session.query(DefaultLog).statement)
... but that takes the ORM out of the equation.
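For illustration, a small sketch of the difference: the rows come back as plain result rows, not DefaultLog instances, so the identity map, relationships, and so on no longer apply.

# Rows are tuple-like result rows, not ORM objects.
result = session.execute(session.query(DefaultLog).statement)
for row in result:
    print(row.id, row.msg)  # attribute access on the raw row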
It is possible to hack the SQLAlchemy Query class so that it does not add labels. But be aware that this breaks when a table is used twice in the query, for example a self join or a join through another table.
from sqlalchemy.orm import Query

class MyQuery(Query):
    def __iter__(self):
        """Patch to disable auto labels"""
        context = self._compile_context(labels=False)
        context.statement.use_labels = False
        if self._autoflush and not self._populate_existing:
            self.session._autoflush()
        return self._execute_and_instances(context)
And then use it as in mtth's answer:
sessionmaker(bind=engine, query_cls=MyQuery)
Printing an SQLAlchemy query is tricky and produces output that isn't human-friendly: not only the columns but also the bind params come out in an awkward form.
Here's how to do it correctly:
qry = session.query(SomeTable)
compiled = qry.statement.compile(dialect=session.bind.dialect, compile_kwargs={"literal_binds": True})
print(compiled)
Here's how to fix it for all your future work:
from sqlalchemy.orm import Query

class MyQuery(Query):
    def __str__(self):
        dialect = self.session.bind.dialect
        compiled = self.statement.compile(dialect=dialect, compile_kwargs={"literal_binds": True})
        return str(compiled)
To use:
session = sessionmaker(bind=engine, query_cls=MyQuery)()
I am fetching data from MySQL using PySpark, but only for one table. I want to fetch all tables from the MySQL db without calling the JDBC connection again and again; see the code below.
Is it possible to simplify my code? Thank you in advance.
url = "jdbc:mysql://localhost:3306/dbname"
table_df=sqlContext.read.format("jdbc").option("url",url).option("dbtable","table_name").option("user","root").option("password", "root").load()
sqlContext.registerDataFrameAsTable(table_df, "table1")
table_df_1=sqlContext.read.format("jdbc").option("url",url).option("dbtable","table_name_1").option("user","root").option("password", "root").load()
sqlContext.registerDataFrameAsTable(table_df_1, "table2")
You need some way to acquire the list of tables you have in MySQL: either find a SQL command that returns it, or manually create a file containing them all.
Then, assuming you can build a Python list of table names, tablename_list, you can simply loop over it like this:
url = "jdbc:mysql://localhost:3306/dbname"
reader = (
    sqlContext.read.format("jdbc")
    .option("url", url)
    .option("user", "root")
    .option("password", "root")
)
for tablename in tablename_list:
    reader.option("dbtable", tablename).load().createTempView(tablename)
This will create a temporary view with the same table name. If you want another name, you can change the initial tablename_list to a list of tuples (tablename_in_mysql, tablename_in_spark).
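For example, a sketch with hypothetical table names:

# Hypothetical (mysql_table, spark_view) pairs.
tablename_list = [("customers", "customers_v"), ("orders", "orders_v")]
for mysql_name, spark_name in tablename_list:
    reader.option("dbtable", mysql_name).load().createTempView(spark_name)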
@Steven already gave a perfect answer. As he said, in order to build that Python list of table names, you can use:
# List of the tables in the server
tables_df = (
    spark.read.format('jdbc')
    .options(
        url='jdbc:postgresql://localhost:5432/',  # database url (local or remote)
        dbtable='information_schema.tables',
        user='YOUR_USERNAME',
        password='YOUR_PASSWORD',
        driver='org.postgresql.Driver',
    )
    .load()
    .filter("table_schema = 'public'")
    .select("table_name")
)
# tables_df is a DataFrame[table_name: string]

# tables_df.collect()
# [Row(table_name='employee'), Row(table_name='bonus')]

table_names_list = [row.table_name for row in tables_df.collect()]
print(table_names_list)
# ['employee', 'bonus']
Note that this example targets PostgreSQL; you can easily change the url and driver arguments.
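Since the question was about MySQL, here is a hedged sketch of the same lookup against MySQL (information_schema.tables exists there too; filter on your database name instead of 'public'):

tables_df = (
    spark.read.format('jdbc')
    .options(
        url='jdbc:mysql://localhost:3306/dbname',
        dbtable='information_schema.tables',
        user='root',
        password='root',
        driver='com.mysql.cj.jdbc.Driver',  # assumes the MySQL Connector/J jar is on the classpath
    )
    .load()
    .filter("table_schema = 'dbname'")  # the database name from the url
    .select("table_name")
)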
I have the following query
INSERT INTO `min01_aggregated_data_800` (`datenum`,`Timestamp`,`QFlag_R6_WYaw`) VALUES ('734970.002777778','2012-04-11 00:04:00.000','989898') ON DUPLICATE KEY UPDATE `datenum`=VALUES(`datenum`);
INSERT INTO `min01_aggregated_data_100` (`datenum`,`Timestamp`,`QFlag_R6_WYaw`) VALUES ('734970.002777778','2012-04-11 00:04:00.000','989898') ON DUPLICATE KEY UPDATE `datenum`=VALUES(`datenum`);
INSERT INTO `min01_aggregated_data_300` (`datenum`,`Timestamp`,`QFlag_R6_WYaw`) VALUES ('734970.002777778','2012-04-11 00:04:00.000','989898') ON DUPLICATE KEY UPDATE `datenum`=VALUES(`datenum`);
I'm using the mysql.connector package to insert the data into MySQL:
self.db = mysql.connector.Connect(host=self.m_host, user=self.m_user, password=self.m_passwd,
                                  database=self.m_db, port=int(self.m_port))
self.con = self.db.cursor(cursor)
self.con.execute(query)
self.db.commit()
self.db.close()
self.con.close()
But I'm getting the following error: "Use multi=True when executing multiple statements".
I tried to use multi=True; in that case I don't get any exception, but the data isn't inserted into MySQL either. How can I insert multiple rows?
I see three options:
Send every query to the DB separately:
[...]
self.con.execute(query1)
self.con.execute(query2)
self.con.execute(query3)
[...]
[removed as it didn't apply here]
I am not very familiar with this multi=True; however, it might be possible that there is a solution which calls self.con.nextset() repeatedly. According to the docs this is only for multiple result sets, but perhaps it is needed on a multi-query request as well.
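For completeness, a hedged sketch of the multi=True pattern as documented for mysql.connector (in the versions that support it, execute() then returns an iterator of per-statement results that must be consumed before committing):

# Each statement only takes effect once its result is consumed.
for result in self.con.execute(query, multi=True):
    if result.with_rows:
        result.fetchall()  # drain any rows a statement produced
self.db.commit()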
You have three separate queries, so each one should be run separately, i.e:
self.con.execute(query1)
self.con.execute(query2)
self.con.execute(query3)
I am migrating some data from other databases, so I am using raw SQL queries to insert data into the database. But I don't know how to get the last inserted id from raw SQL queries in Django. I have tried this
affected_count1 = cursor2.execute("table')")
and
SELECT IDENT_CURRENT('MyTable')
but it gives me the error "(1305, 'FUNCTION pydev.SCOPE_IDENTITY does not exist')".
So please tell me how I can get the last inserted id from raw SQL queries in Django.
You can get the latest created object like this:
obj = Foo.objects.latest('id')
More info in the Django QuerySet docs for latest().
Try this
last_insert_id = TableName.objects.last().id
In Django 1.6
obj = Foo.objects.latest('id')
obj = Foo.objects.earliest('id')
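Since the question is specifically about raw SQL, a hedged sketch using the underlying DB-API cursor's lastrowid may be closer to what was asked (MySQL backend assumed; the table and column names here are hypothetical):

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("INSERT INTO mytable (col) VALUES (%s)", ["value"])
    last_id = cursor.lastrowid  # id generated by this INSERT on MySQL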