Google BigQuery Python client library SQL SELECT regex error - python

I'm trying to query a google bigquery table using the regex from this blog post. Here it is, slightly modified:
pd\.([^”,\.\(\,\`) \’:\[\]\/\\={}]*)
regex101 example of its usage
It does not, however, work in my google bigquery python client SQL query:
query_results = client.run_sync_query(
    """
    SELECT
      REGEXP_EXTRACT(SPLIT(content, '\n'),
        r'pd\.([^”,\.\(\,\`) \’:\[\]\/\\={}]*)')
    FROM
      [fh-bigquery:github_extracts.contents_py]
    LIMIT 10
    """)
query_results.run()
data = query_results.fetch_data()
data
BadRequest: BadRequest: 400 Failed to parse regular expression "pd.([^”,.(\,`) \’:[]/\={}]*)": invalid escape sequence: \’

The problem here is that BigQuery uses the re2 library for its regex operations. If you try the same regex with the Golang flavor you will see the exact same error (Golang also uses re2).
So if you just remove the backslash before the ’ character, the regex should work (when I tested it, it seemed to work properly).
Another issue you might find is that the result of the SPLIT operation is an ARRAY, which means BigQuery will refuse the query because the signature of REGEXP_EXTRACT does not accept ARRAY<STRING> as input. You could use REGEXP_REPLACE instead:
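As a quick sanity check, the unescaped version of the pattern can be tried with Python's re module (its flavor differs slightly from re2, but it accepts this pattern and shows what the capture group extracts):

```python
import re

# The pattern from the answer with the backslash before ’ removed;
# re2 rejects \’ as an invalid escape sequence.
pattern = re.compile(r"pd\.([^”,\.\(\,\`) ’:\[\]\/\\={}]*)")

line = "df = pd.read_csv('data.csv')"
match = pattern.search(line)
print(match.group(1))  # read_csv
```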
"""
SELECT
REGEXP_EXTRACT(REGEXP_REPLACE(content, r'.*(\\n)', ''),
r'pd\.([^”,\.\(\,\`) ’:\[\]\/\\={}]*)')
FROM
[fh-bigquery:github_extracts.contents_py]
LIMIT 10
"""
Each match of '.*(\n)' (a line together with its trailing newline) is replaced by "" in this operation, so the result is a STRING rather than an ARRAY.
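The effect of that replacement can be sketched with Python's re.sub (the flavor differs from re2, but this pattern behaves the same way):

```python
import re

# A small sketch of what REGEXP_REPLACE(content, r'.*(\n)', '') does:
# "." does not match a newline, so each match is one line plus its
# trailing newline, and each match is removed.
content = "import pandas as pd\npd.read_csv"
flattened = re.sub(r".*(\n)", "", content)
print(flattened)  # pd.read_csv
```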

Related

Python Cassandra update statement error "mismatched input 're' expecting K_WHERE"?

I am trying to update a Cassandra database using the Python client as follows:
def update_content(session, id, content):
    update_statement = """
    UPDATE mytable SET content='{}' WHERE id={}
    """
    session.execute(update_statement.format(content, id))
It works in most cases, but in some scenarios the content is a string of the form
content = "Content Message -'[re]...)"
which results in the error Exception calling application: <Error from server: code=2000 [Syntax error in CQL query] message="line 2:61 mismatched input 're' expecting K_WHERE (
and I am not sure why this is happening.
Is Cassandra somehow trying to interpret the string as a regex?
I tried printing the data before the update and it seems fine:
"UPDATE mytable SET content='Content Message -'[re]...)' WHERE id=2"
To avoid such problems you should stop using .format to build CQL statements and start using prepared statements, which allow you to:
avoid problems with unescaped special characters, like '
do basic type checking
get better performance, because the query is parsed once and only the data is sent over the wire
get token-aware query routing, meaning the query is sent directly to one of the replicas that holds the data for the partition
Your code needs to be modified as follows:
prep_statement = session.prepare('UPDATE mytable SET content=? WHERE id=?')

def update_content(session, id, content):
    session.execute(prep_statement, [content, id])
Please note that the statement needs to be prepared only once, because preparation involves a round-trip to the cluster nodes to parse the query.
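The original failure can be reproduced without Cassandra at all; plain Python string formatting shows how the embedded quote terminates the CQL string literal early (the values are the ones from the question):

```python
content = "Content Message -'[re]...)"

# The same interpolation the question's code performs.
statement = "UPDATE mytable SET content='{}' WHERE id={}".format(content, 2)
print(statement)
# UPDATE mytable SET content='Content Message -'[re]...)' WHERE id=2
```

The CQL parser reads the literal as 'Content Message -' and then hits [re], which is why it reports mismatched input 're' where it expected WHERE. A prepared statement sends the content out of band, so no such re-parsing of the data can happen.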

Not able to do escape query

I am new to Cassandra. I want to run a SELECT query using the Cassandra Python client, but I am not able to escape special characters. Can anyone help?
Below is the query I am trying, which gives a syntax error:
SELECT pmid FROM chemical WHERE mentions=$$
N,N'-((1Z,3Z)-1,4-bis(4-methoxyphenyl)buta-1,3-diene-2,3-diyl)diformamide
$$ AND pmid=31134000 ALLOW FILTERING;
it is giving me error
Error from server: code=2000 [Syntax error in CQL query] message="line 1:118 mismatched input '-' expecting ')' (...,source) VALUES ('be75372a-c311-11e9-ac2c-0a0df85af938','N,N'[-]...)"
Based on the syntax provided, it looks like a single quote is missing in your query.
Suggestion
Avoid ALLOW FILTERING where possible, as it scans the whole table, which is a performance issue.
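A sketch of the quoting fix: in CQL a single quote inside a string literal is escaped by doubling it, so the chemical name can be placed in an ordinary quoted literal (building the statement in Python here just to show the escaping; a prepared statement would avoid this entirely):

```python
mention = "N,N'-((1Z,3Z)-1,4-bis(4-methoxyphenyl)buta-1,3-diene-2,3-diyl)diformamide"

# CQL escapes a single quote inside a string literal by doubling it.
escaped = mention.replace("'", "''")
query = (
    "SELECT pmid FROM chemical WHERE mentions='%s' "
    "AND pmid=31134000 ALLOW FILTERING;" % escaped
)
print(query)
```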

Django: Using named parameters on a raw SQL query

I'm trying to execute a raw query that is built dynamically.
To assure that the parameters are inserted in the valid position I'm using named parameters.
This seems to work for Sqlite without any problems. (all my tests succeed)
But when I'm running the same code against MariaDB it fails...
A simple example query:
SELECT u.*
FROM users_gigyauser AS u
WHERE u.email like :u_email
GROUP BY u.id
ORDER BY u.last_login DESC
LIMIT 60 OFFSET 0
Parameters are:
{'u_email': '%test%'}
The error I get is a default syntax error as the parameter is not replaced.
I tried using '%' as an indicator, but this resulted in SQL trying to parse
%u[_email]
and that returned a type error.
I'm executing the query like this:
raw_queryset = GigyaUser.objects.raw(
    self.sql_fetch, self._query_object['params']
)
Or when counting:
cursor.execute(self.sql_count, self._query_object['params'])
Both give the same error on MariaDB but work on Sqlite (using the ':' indicator)
Now, what am I missing?
edit:
The format needs an s suffix, as follows:
%(u_email)s
If you are using SQLite3, for some reason the %(name)s syntax will not work. You have to use the :name syntax instead if you want to pass your params as a {"name": "value"} dictionary. This is contrary to the documentation, which states that the first syntax should work with all DB engines. Here's the source of the issue:
https://code.djangoproject.com/ticket/10070#comment:18
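A minimal sketch of the working MariaDB form, with the original query rewritten to use pyformat placeholders. Under the hood, MySQL drivers escape each value and then apply Python's %-with-dict formatting, which can be imitated roughly like this (the quoting here is simplified for illustration):

```python
# The query rewritten with a pyformat placeholder, as the MySQL/MariaDB
# backend expects; parameters are passed as a dict.
sql = (
    "SELECT u.* FROM users_gigyauser AS u "
    "WHERE u.email LIKE %(u_email)s "
    "GROUP BY u.id ORDER BY u.last_login DESC LIMIT 60 OFFSET 0"
)
params = {'u_email': '%test%'}

# Rough imitation of what the driver does: quote/escape each value,
# then substitute with Python's %-formatting. Note that % signs inside
# the *values* are never reinterpreted -- only those in the template.
quoted = {k: "'" + v.replace("'", "''") + "'" for k, v in params.items()}
rendered = sql % quoted
print(rendered)
```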

Peewee execute_sql with escaped characters

I have written a query which does some string replacements. I am trying to update a URL in a table, but the URL contains % signs, which causes a "tuple index out of range" exception.
If I print the query and run it manually it works fine, but through peewee it causes an issue. How can I get around this? I'm guessing it is because of the percentage signs?
query = """
update table
set url = '%s'
where id = 1
""" % 'www.example.com?colour=Black%26white'
db.execute_sql(query)
The code you are currently sharing is incredibly unsafe, probably for the same reason that is causing your bug. Please do not use it in production, or you will be hacked.
Generally: you practically never want to use normal string operations like %, +, or .format() to construct a SQL query. Rather, you should use your SQL API/ORM's built-in mechanism for providing dynamic values to a query. In your case of SQLite in peewee, that looks like this:
query = """
update table
set url = ?
where id = 1
"""
values = ('www.example.com?colour=Black%26white',)
db.execute_sql(query, values)
The database engine will automatically take care of any special characters in your data, so you don't need to worry about them. If you ever find yourself encountering issues with special characters in your data, it is a very strong warning sign that some kind of security issue exists.
This is mentioned in the Security and SQL Injection section of peewee's docs.
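The same pattern can be verified with the standard library's sqlite3 module, which peewee wraps for SQLite databases (the table name here is made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT)")
conn.execute("INSERT INTO pages (id, url) VALUES (1, 'old')")

# The parameterized update: the driver handles the % signs in the data,
# so no "tuple index out of range" error occurs.
url = 'www.example.com?colour=Black%26white'
conn.execute("UPDATE pages SET url = ? WHERE id = 1", (url,))

stored = conn.execute("SELECT url FROM pages WHERE id = 1").fetchone()[0]
print(stored)  # www.example.com?colour=Black%26white
```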
Also, peewee supports updates directly, so you don't need raw SQL here at all:
Table.update(url=new_url).where(Table.id == some_id).execute()

MySQL LOAD DATA LOCAL INFILE example in python?

I am looking for a syntax definition, example, sample code, wiki, etc. for
executing a LOAD DATA LOCAL INFILE command from python.
I believe I could use mysqlimport as well if that is available, so any feedback (and code snippets) on which is the better route is welcome. A Google search is not turning up much in the way of current info.
The goal in either case is the same: automate loading hundreds of files with a known naming convention and date structure into a single MySQL table.
David
Well, using Python's MySQLdb, I use this:
connection = MySQLdb.Connect(host='**', user='**', passwd='**', db='**')
cursor = connection.cursor()
query = "LOAD DATA INFILE '/path/to/my/file' INTO TABLE sometable FIELDS TERMINATED BY ';' ENCLOSED BY '\"' ESCAPED BY '\\\\'"
cursor.execute(query)
connection.commit()
replacing the host/user/passwd/db values as appropriate for your needs. This is based on the MySQL docs; the exact LOAD DATA INFILE statement will depend on your specific requirements (note that the FIELDS TERMINATED BY, ENCLOSED BY, and ESCAPED BY clauses will be specific to the type of file you are trying to read).
You can also get the results for the import by adding the following lines after your query:
results = connection.info()
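To automate the hundreds of files mentioned in the question, one approach is to glob the files matching the naming convention and issue one statement per file. A sketch (the path pattern, naming convention, and table name are hypothetical, and the connection must be opened with local_infile=1 for LOCAL to be allowed):

```python
import glob

def build_load_statement(path, table="sometable"):
    # LOCAL makes the server read the file from the client machine
    # rather than the server's own filesystem.
    return (
        "LOAD DATA LOCAL INFILE '%s' INTO TABLE %s "
        "FIELDS TERMINATED BY ';' ENCLOSED BY '\"' ESCAPED BY '\\\\'"
        % (path, table)
    )

# Hypothetical naming convention: data_YYYYMMDD.csv
for path in sorted(glob.glob('/data/incoming/data_*.csv')):
    print(build_load_statement(path))
```

Each generated statement would then be passed to cursor.execute() as in the answer above, with a commit at the end.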
