I am trying to create a Telegram bot that creates notes in Notion. For this I use:
notion-py
pyTelegramBotAPI
I connected to my Notion account by adding my token_v2; the bot then receives the data about the note that I want to save.
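For reference, the notion-py client setup this implies is just the following (a minimal sketch; the token string is a placeholder for your own token_v2):

from notion.client import NotionClient

client = NotionClient(token_v2="<my_token_v2>")  # placeholder token

At the end I save the note to Notion like this: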
def make_notion_row():
    collection_view = client.get_collection_view(list_url[temporary_category])  # take the collection
    print(temporary_category)
    print(temporary_name)
    print(temporary_link)
    print(temporary_subcategory)
    print(temporary_tag)
    row = collection_view.collection.add_row()  # make a row
    row.ssylka = temporary_link  # this is the link
    row.nazvanie_zametki = temporary_name  # this is the name
    if temporary_category == 0:  # this is the category where I want to save the note
        row.stil = temporary_subcategory  # this is the subcategory
        tags = temporary_tag.split(',')  # temporary_tag is text with many tags separated by commas; I want them as an array
        for tag_one in tags:
            # "Теги" is the Tags column in Russian; here tag_one takes values like 'my_hero_academia', 'midoria'
            add_new_multi_select_value("Теги", tag_one)
    else:
        row.kategoria = temporary_subcategory
This script works, but the problem is filling in the Tags column, which is of type multi-select.
Since the notion-py README says nothing about filling in a multi-select, I used the function by bkiac from https://github.com/jamalex/notion-py/issues/51
Here is the function, slightly modified by me:
art_tags = ['ryuko_matoi', 'kill_la_kill']

def add_new_multi_select_value(prop, value, style=None):
    # choice comes from the random module, uuid1 from the uuid module;
    # collection_view and collection are defined in the enclosing scope
    global temporary_prop_schema
    if style is None:
        style = choice(art_tags)
    collection_schema = collection_view.collection.get(["schema"])
    prop_schema = next(
        (v for k, v in collection_schema.items() if v["name"] == prop), None
    )
    if not prop_schema:
        raise ValueError(
            f'"{prop}" property does not exist on the collection!'
        )
    if prop_schema["type"] != "multi_select":
        raise ValueError(f'"{prop}" is not a multi select property!')
    dupe = next(
        (o for o in prop_schema["options"] if o["value"] == value), None
    )
    if dupe:
        raise ValueError(f'"{value}" already exists in the schema!')
    temporary_prop_schema = prop_schema
    prop_schema["options"].append(
        {"id": str(uuid1()), "value": value, "style": style}
    )
    collection.set("schema", collection_schema)
But it turned out that this function does not work, and gives the following error:
add_new_multi_select_value("Теги", "my_hero_academia")
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
add_new_multi_select_value("Теги", "my_hero_academia")
File "C:\Users\laere\OneDrive\Documents\Programming\Other\notion-bot\program\notionbot\test.py", line 53, in add_new_multi_select_value
collection.set("schema", collection_schema)
File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\records.py", line 115, in set
self._client.submit_transaction(
File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\client.py", line 290, in submit_transaction
self.post("submitTransaction", data)
File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\client.py", line 260, in post
raise HTTPError(
requests.exceptions.HTTPError: Unsaved transactions: Not allowed to edit column: schema
This is my table image: link
This is my Telegram chat with the bot: link
Honestly, I don't know how to solve this problem. The question is: how do I fill a column of type multi-select?
I solved this problem using this command:
row.set_property("Категория", temporary_subcategory)
And don't be afraid if you get an "options ..." error; it can be solved by adding the options in the multi-select field's settings.
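The same set_property call also appears to work for the multi-select column when passed a list of values. A minimal sketch, assuming the option values already exist in the column settings ("Теги" is the Tags column from the question):

tags = temporary_tag.split(',')   # e.g. ['my_hero_academia', 'midoria']
row.set_property("Теги", tags)    # a list of values sets a multi-select column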
Problem
For a Markdown document I want to filter out all sections whose header titles are not in the list to_keep. A section consists of a header and the body until the next header or the end of the document. For simplicity, let's assume that the document only has level 1 headers.
When I make a simple case distinction on whether the current element has been preceded by a header in to_keep and do either return None or return [], I get an error. That is, for pandoc --filter filter.py -o output.pdf input.md I get TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list" (code, example file and complete error message at the end).
I use Python 3.7.4, panflute 1.12.5, and pandoc 2.2.3.2.
Question
If I make a finer-grained distinction on when to return [], it works (function action_working). My question is: why is this finer-grained distinction necessary? My solution seems to work, but it might well be accidental... How can I get this to work properly?
Files
error
Traceback (most recent call last):
File "filter.py", line 42, in <module>
main()
File "filter.py", line 39, in main
return run_filter(action_not_working, doc=doc)
File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 266, in run_filter
return run_filters([action], *args, **kwargs)
File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 253, in run_filters
dump(doc, output_stream=output_stream)
File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 132, in dump
raise TypeError(msg)
TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
Error running filter filter.py:
Filter returned error status 1
input.md
# English
Some cool english text this is!
# Deutsch
Dies ist die deutsche Übersetzung!
# Sources
Some source.
# Priority
**Medium** *[Low | Medium | High]*
# Status
**Open for Discussion** *\[Draft | Open for Discussion | Final\]*
# Interested Persons (mailing list)
- Franz, Heinz, Karl
filter.py
from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False

def action_not_working(elem, doc):
    '''For every element we check if it occurs in a section we wish to keep.
    If it does, we keep it and return None (indicating to keep the element unchanged).
    Otherwise we remove the element (return []).'''
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        return []

def action_working(elem, doc):
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        if isinstance(elem, Header):
            return []
        elif isinstance(elem, Para):
            return []
        elif isinstance(elem, BulletList):
            return []

def update_keep(elem):
    '''If the element is a header, we update keep_current.'''
    global to_keep, keep_current
    if isinstance(elem, Header):
        # keep if the title of the section is in to_keep
        keep_current = stringify(elem) in to_keep

def main(doc=None):
    return run_filter(action_not_working, doc=doc)

if __name__ == '__main__':
    main()
I think what happens is that panflute calls the action on all elements, including the Doc root element. If keep_current is False when walking the Doc element, the root is replaced by a list. This leads to the error message you are seeing, as panflute expects the root node to always be there.
The updated filter only acts on Header, Para, and BulletList elements, so the Doc root node is left untouched. You'll probably want to use something more generic like isinstance(elem, Block) instead, as sketched below.
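A minimal sketch of that more generic variant, reusing update_keep from the question (my illustration, not the original filter's code):

def action_generic(elem, doc):
    update_keep(elem)
    if keep_current:
        return None
    # Doc is not a Block, so the root element is never removed
    if isinstance(elem, Block):
        return []
    return None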
An alternative approach could be to use panflute's load and dump functions directly: load the document into a Doc element, manually iterate over the top-level blocks in doc.content, remove all that are unwanted, then dump the resulting doc back into the output stream.
from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False

def keep(block):
    # headers switch the current section on or off
    global keep_current
    if isinstance(block, Header):
        keep_current = stringify(block) in to_keep
    return keep_current

doc = load()
doc.content = [b for b in doc.content if keep(b)]  # drop unwanted blocks
dump(doc)
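Either filter is run the same way as in the question:

pandoc --filter filter.py -o output.pdf input.md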
I want to remove many of the standard filter properties which appear in "Add custom filter..." and "Add custom group..." on a tree view (here: hr.employee.tree).
The filter properties offered for selection are obviously all fields of the tree view's associated model, but I don't need all of them.
I figured out a very promising approach, which actually works for removing these properties from the filter/group-by lists, but raises exceptions when saving a create/edit on the form view of the same model:
## These are the fields I want to keep in "Filter by"/"Group by"
filerable_groupable_fields = ['name', 'phone', 'private_email', 'gender', 'department_id', 'work_email', 'work_phone', 'birthday']

@api.model
def fields_get(self, allfields=None, attributes=None):
    res = super(HrEmployee, self).fields_get(allfields, attributes=attributes)
    not_filerable_groupable_fields = set(self._fields.keys()) - set(self.filerable_groupable_fields)
    for field in not_filerable_groupable_fields:
        res[field]['selectable'] = False  ## Remove from "Filter by"
        res[field]['sortable'] = False    ## Remove from "Group by"
    return res
Exception when saving on the form view, for basically every field I touched in the loop above:
[...]
File "/usr/lib/python3/dist-packages/odoo/addons/hr/models/hr_employee.py", line 244, in create
employee = super(HrEmployeePrivate, self).create(vals)
File "<decorator-gen-105>", line 2, in create
File "/usr/lib/python3/dist-packages/odoo/api.py", line 343, in _model_create_multi
return create(self, [arg])
File "/usr/lib/python3/dist-packages/odoo/addons/mail/models/mail_thread.py", line 297, in create
tracked_fields = self._get_tracked_fields()
File "/usr/lib/python3/dist-packages/odoo/addons/mail/models/mail_thread.py", line 554, in _get_tracked_fields
return self.fields_get(tracked_fields)
File "/mnt/extra-addons/custom_swaf_hr/models/hr_employee.py", line 165, in fields_get
res[field]['selectable'] = False ## Remove FilterBy
KeyError: 'mobile_phone'
It seems like the exception occurs for tracked fields (mail_thread.py).
Any ideas?
I figured it out already. This is the solution:
@api.model
def fields_get(self, allfields=None, attributes=None):
    res = super(HrEmployee, self).fields_get(allfields, attributes=attributes)
    not_filerable_groupable_fields = set(self._fields.keys()) - set(self.filerable_groupable_fields)
    for field in not_filerable_groupable_fields:
        if field in res:
            res[field]['selectable'] = False  ## Remove from "Filter by"
            res[field]['sortable'] = False    ## Remove from "Group by"
    return res
Maybe this is of help to other people, too.
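The underlying reason for the KeyError is that fields_get can be called with an allfields subset (as mail_thread.py does for tracked fields), in which case res only contains the requested fields. For context, a hedged sketch of how the override might sit in the inheriting model; the class name, imports, and _inherit value are my assumptions based on the traceback:

from odoo import api, models

class HrEmployee(models.Model):
    _inherit = 'hr.employee'

    ## These are the fields I want to keep in "Filter by"/"Group by"
    filerable_groupable_fields = ['name', 'phone', 'private_email', 'gender',
                                  'department_id', 'work_email', 'work_phone', 'birthday']

    @api.model
    def fields_get(self, allfields=None, attributes=None):
        res = super(HrEmployee, self).fields_get(allfields, attributes=attributes)
        hidden = set(self._fields.keys()) - set(self.filerable_groupable_fields)
        for field in hidden:
            if field in res:  # res may hold only a subset when allfields is given
                res[field]['selectable'] = False  ## Remove from "Filter by"
                res[field]['sortable'] = False    ## Remove from "Group by"
        return res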
I am trying to write a Python program which does some processing in a PostgreSQL database using the multiprocessing module and peewee.
In single-core mode the code works; however, when I try to run it with multiple cores I run into an SSL error.
I would like to post the structure of my program in the hope that somebody can advise how to set it up properly. Currently I have chosen an object-oriented approach in which I make one connection which is shared in a pool. To clarify what I have done, I will now show the source code I have so far.
I have three files: main.py, models.py and parser.py. The contents are the following.
models.py defines the peewee PostgreSQL table and makes a connection to the postgres server:
import peewee as pw
from playhouse.pool import PooledPostgresqlExtDatabase

KVK_KEY = "id_number"
NAME_KEY = "name"
N_VOWELS_KEY = "n_vowels"

# initialise the database
database = PooledPostgresqlExtDatabase(
    "testdb", user="postgres", host="localhost", port=5432, password="xxxx",
    max_connections=8, stale_timeout=300)

class BaseModel(pw.Model):
    class Meta:
        database = database
        only_save_dirty = True

# this class describes the format of the sql database table
class Company(BaseModel):
    id_number = pw.IntegerField(primary_key=True)
    name = pw.CharField(null=True)
    n_vowels = pw.IntegerField(default=-1)
    processor = pw.IntegerField(default=-1)

def connect_database(database_name, reset_database=False):
    """ connect the database """
    database.connect()
    if reset_database:
        database.drop_tables([Company])
        database.create_tables([Company])
parser.py contains the CompanyParser class, which is used as the engine of the code to do all the processing. It generates some artificial data which is stored in the PostgreSQL database, and the run method is then used to do some processing on the data already stored in the database:
import multiprocessing as mp
import random
import string

import numpy as np
import pandas as pd
import peewee as pw

from models import (Company, database, KVK_KEY, NAME_KEY)

MAX_SQL_CHUNK = 1000

np.random.seed(0)

def random_name(size=8, chars=string.ascii_lowercase):
    """ Create a random character string of 'size' characters """
    return "".join(random.choice(chars) for _ in range(size))

def vowel_count(characters):
    """
    Count the number of vowels in the string 'characters' and return as an integer
    """
    count = 0
    for char in characters:
        if char in list("aeiou"):
            count += 1
    return count

class CompanyParser(mp.Process):
    def __init__(self, number_of_companies=100, i_proc=None,
                 number_of_procs=1,
                 first_id=None, last_id=None):
        if i_proc is not None and number_of_procs > 1:
            mp.Process.__init__(self)

        self.i_proc = i_proc
        self.number_of_procs = number_of_procs
        self.n_companies = number_of_companies
        self.data_df: pd.DataFrame = None

        self.first_id = first_id
        self.last_id = last_id

    def generate_data(self):
        """ Create a dataframe with fake company data and id's """
        id_list = np.random.randint(1000000, 9999999, self.n_companies)
        company_list = np.array([random_name() for _ in range(self.n_companies)])
        self.data_df = pd.DataFrame(data=np.vstack([id_list, company_list]).T,
                                    columns=[KVK_KEY, NAME_KEY])
        self.data_df.sort_values([KVK_KEY], inplace=True)

    def store_to_database(self):
        """
        Store the company data to a sql database
        """
        record_list = list(self.data_df.to_dict(orient="index").values())
        n_batch = int(len(record_list) / MAX_SQL_CHUNK) + 1
        with database.atomic():
            for cnt, batch in enumerate(pw.chunked(record_list, MAX_SQL_CHUNK)):
                print(f"writing {cnt}/{n_batch}")
                Company.insert_many(batch).execute()

    def run(self):
        print("Making query at {}".format(self.i_proc))
        query = (Company.
                 select().
                 where(Company.id_number.between(self.first_id, self.last_id)))
        print("Found {} companies".format(query.count()))

        for cnt, company in enumerate(query):
            print("Processing # {} - {}: company {}/{}".format(self.i_proc, cnt,
                                                               company.id_number,
                                                               company.name))
            number_of_vowels = vowel_count(company.name)
            company.n_vowels = number_of_vowels
            company.processor = self.i_proc
            print(f"storing number of vowels: {number_of_vowels}")
            company.save()
Finally, my main script loads the classes defined in models.py and parser.py and launches the code:
from models import (Company, connect_database)
from parser import CompanyParser

number_of_processors = 2

connect_database(None, reset_database=True)

# init an object of the CompanyParser and use it to create the database
parser = CompanyParser()
company_ids = Company.select(Company.id_number)
parser.generate_data()
parser.store_to_database()

n_companies = company_ids.count()
n_comp_per_proc = int(n_companies / number_of_processors)
print("Found {} companies: {} per proc".format(n_companies, n_comp_per_proc))

for i_proc in range(number_of_processors):
    i_start = i_proc * n_comp_per_proc
    first_id = company_ids[i_start]
    last_id = company_ids[i_start + n_comp_per_proc - 1]

    print(f"Running proc {i_proc} for id {first_id} until id {last_id}")
    sub_parser = CompanyParser(first_id=first_id, last_id=last_id,
                               i_proc=i_proc,
                               number_of_procs=number_of_processors)

    if number_of_processors > 1:
        sub_parser.start()
    else:
        sub_parser.run()
In case number_of_processors = 1 this script works perfectly fine. It generates the artificial data, stores it in the PostgreSQL database, and does some processing on the data (it counts the number of vowels in each name and stores the result in the n_vowels column).
However, when I try to run this with 2 cores (number_of_processors = 2), I run into the following error:
/opt/miniconda3/bin/python /home/eelco/PycharmProjects/multiproc_peewee/main.py
writing 0/1
Found 100 companies: 50 per proc
Running proc 0 for id 1020737 until id 5295565
Running proc 1 for id 5302405 until id 9891087
Making query at 0
Found 50 companies
Processing # 0 - 0: company 1020737/wqrbgxiu
storing number of vowels: 2
Making query at 1
Process CompanyParser-1:
Processing # 0 - 1: company 1086107/lkbagrbc
storing number of vowels: 1
Processing # 0 - 2: company 1298367/nsdjsqio
storing number of vowels: 2
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2714, in execute_sql
cursor.execute(sql, params or ())
psycopg2.OperationalError: SSL error: sslv3 alert bad record mac
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/eelco/PycharmProjects/multiproc_peewee/parser.py", line 82, in run
company.save()
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 5748, in save
rows = self.update(**field_dict).where(self._pk_expr()).execute()
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1625, in inner
return method(self, database, *args, **kwargs)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1696, in execute
return self._execute(database)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2121, in _execute
cursor = database.execute(self)
File "/opt/miniconda3/lib/python3.7/site-packages/playhouse/postgres_ext.py", line 468, in execute
cursor = self.execute_sql(sql, params, commit=commit)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2721, in execute_sql
self.commit()
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2512, in __exit__
reraise(new_type, new_type(*exc_args), traceback)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 186, in reraise
raise value.with_traceback(tb)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2714, in execute_sql
cursor.execute(sql, params or ())
peewee.OperationalError: SSL error: sslv3 alert bad record mac
Process CompanyParser-2:
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2714, in execute_sql
cursor.execute(sql, params or ())
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/eelco/PycharmProjects/multiproc_peewee/parser.py", line 72, in run
print("Found {} companies".format(query.count()))
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1625, in inner
return method(self, database, *args, **kwargs)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1881, in count
return Select([clone], [fn.COUNT(SQL('1'))]).scalar(database)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1625, in inner
return method(self, database, *args, **kwargs)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1866, in scalar
row = self.tuples().peek(database)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1625, in inner
return method(self, database, *args, **kwargs)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1853, in peek
rows = self.execute(database)[:n]
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1625, in inner
return method(self, database, *args, **kwargs)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1696, in execute
return self._execute(database)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1847, in _execute
cursor = database.execute(self)
File "/opt/miniconda3/lib/python3.7/site-packages/playhouse/postgres_ext.py", line 468, in execute
cursor = self.execute_sql(sql, params, commit=commit)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2721, in execute_sql
self.commit()
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2512, in __exit__
reraise(new_type, new_type(*exc_args), traceback)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 186, in reraise
raise value.with_traceback(tb)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2714, in execute_sql
cursor.execute(sql, params or ())
peewee.OperationalError: SSL error: decryption failed or bad record mac
Process finished with exit code 0
Somehow something goes wrong as soon as the second process starts to do something with the database. Does somebody have advice on how to get this code working? I have tried the following already:

- Using PooledPostgresqlDatabase and the normal PostgresqlDatabase to connect to the database. This leads to the same error.
- Using sqlite instead of postgres. This works for 2 cores, but only if the two processes are not interfering too much; otherwise I get some locking problems. I was under the impression that postgres would be better suited for multiprocessing than sqlite (is that true?).
- Putting a break after launching the first process (so effectively using only one core). Then the code works, showing that the start method is called correctly.
Hopefully somebody can advise.
Regards
Eelco
After some searching on the internet today I found the solution for my problem here: github.com/coleifer. As coleifer mentions, you apparently first have to set up all the forks before you start connecting to the database. A pooled connection inherited across a fork means two processes share one SSL socket, which corrupts the TLS stream and produces exactly these "bad record mac" errors. Based on this idea I have modified my code, and it is working now.
For those interested I will post my Python scripts again so you can see how I did it. There are not many explicit examples of this out there, so perhaps it may help others.
First of all, all the database and peewee setup is now moved into initialization functions which are only called inside the constructor of the CompanyParser class.
So models.py now looks like this:
import peewee as pw
from playhouse.pool import PooledPostgresqlDatabase

KVK_KEY = "id_number"
NAME_KEY = "name"
N_VOWELS_KEY = "n_vowels"

def init_database():
    db = PooledPostgresqlDatabase(
        "testdb", user="postgres", host="localhost", port=5432, password="xxxxx",
        max_connections=8, stale_timeout=300)
    return db

def init_models(db, reset_tables=False):

    class BaseModel(pw.Model):
        class Meta:
            database = db

    # this class describes the format of the sql database table
    class Company(BaseModel):
        id_number = pw.IntegerField(primary_key=True)
        name = pw.CharField(null=True)
        n_vowels = pw.IntegerField(default=-1)
        processor = pw.IntegerField(default=-1)

    if db.is_closed():
        db.connect()
    if reset_tables and Company.table_exists():
        db.drop_tables([Company])
    db.create_tables([Company])

    return Company
Then the worker class CompanyParser is defined in the parser.py script and looks like this:
import multiprocessing as mp
import random
import string

import numpy as np
import pandas as pd
import peewee as pw

from models import (KVK_KEY, NAME_KEY, init_database, init_models)

MAX_SQL_CHUNK = 1000

np.random.seed(0)

def random_name(size=32, chars=string.ascii_lowercase):
    """ Create a random character string of 'size' characters """
    return "".join(random.choice(chars) for _ in range(size))

def vowel_count(characters):
    """
    Count the number of vowels in the string 'characters' and return as an integer
    """
    count = 0
    for char in characters:
        if char in list("aeiou"):
            count += 1
    return count

class CompanyParser(mp.Process):
    def __init__(self, reset_tables=False,
                 number_of_companies=100, i_proc=None,
                 number_of_procs=1, first_id=None, last_id=None):
        if i_proc is not None and number_of_procs > 1:
            mp.Process.__init__(self)

        self.i_proc = i_proc
        self.reset_tables = reset_tables
        self.number_of_procs = number_of_procs
        self.n_companies = number_of_companies
        self.data_df: pd.DataFrame = None

        self.first_id = first_id
        self.last_id = last_id

        # initialise the database and models
        self.database = init_database()
        self.Company = init_models(self.database, reset_tables=self.reset_tables)

    def generate_data(self):
        """ Create a dataframe with fake company data and id's and return the array of id's """
        id_list = np.random.randint(1000000, 9999999, self.n_companies)
        company_list = np.array([random_name() for _ in range(self.n_companies)])
        self.data_df = pd.DataFrame(data=np.vstack([id_list, company_list]).T,
                                    columns=[KVK_KEY, NAME_KEY])
        self.data_df.drop_duplicates([KVK_KEY], inplace=True)
        self.data_df.sort_values([KVK_KEY], inplace=True)
        return self.data_df[KVK_KEY].values

    def store_to_database(self):
        """
        Store the company data to a sql database
        """
        record_list = list(self.data_df.to_dict(orient="index").values())
        n_batch = int(len(record_list) / MAX_SQL_CHUNK) + 1
        with self.database.atomic():
            for cnt, batch in enumerate(pw.chunked(record_list, MAX_SQL_CHUNK)):
                print(f"writing {cnt}/{n_batch}")
                self.Company.insert_many(batch).execute()

    def run(self):
        query = (self.Company.
                 select().
                 where(self.Company.id_number.between(self.first_id, self.last_id)))

        for cnt, company in enumerate(query):
            print("Processing # {} - {}: company {}/{}".format(self.i_proc, cnt, company.id_number,
                                                               company.name))
            number_of_vowels = vowel_count(company.name)
            company.n_vowels = number_of_vowels
            company.processor = self.i_proc
            try:
                company.save()
            except (pw.OperationalError, pw.InterfaceError) as err:
                print("failed save for {} {}: {}".format(self.i_proc, cnt, err))
Finally, the main.py script which launches the processes:
import time

from parser import CompanyParser

def main():
    number_of_processors = 2
    number_of_companies = 10000

    parser = CompanyParser(number_of_companies=number_of_companies, reset_tables=True)
    company_ids = parser.generate_data()
    parser.store_to_database()

    n_companies = company_ids.size
    n_comp_per_proc = int(n_companies / number_of_processors)
    print("Found {} companies: {} per proc".format(n_companies, n_comp_per_proc))
    if not parser.database.is_closed():
        parser.database.close()

    processes = list()
    for i_proc in range(number_of_processors):
        i_start = i_proc * n_comp_per_proc
        first_id = company_ids[i_start]
        last_id = company_ids[i_start + n_comp_per_proc - 1]

        print(f"Running proc {i_proc} for id {first_id} until id {last_id}")
        sub_parser = CompanyParser(first_id=first_id, last_id=last_id, i_proc=i_proc,
                                   number_of_procs=number_of_processors)
        if number_of_processors > 1:
            sub_parser.start()
        else:
            sub_parser.run()

        processes.append(sub_parser)

    # this blocks the script until all processes are done
    for job in processes:
        job.join()

    # make sure all the connections are closed
    for i_proc in range(number_of_processors):
        db = processes[i_proc].database
        if not db.is_closed():
            db.close()

    print("Goodbye!")

if __name__ == "__main__":
    start = time.time()
    main()
    duration = time.time() - start
    print(f"Done in {duration} s")
As you can see, the database connection is made per process, inside the class.
This works and is a full example of multiprocessing + peewee + PostgreSQL. Hopefully it may help others. In case you have any comments or suggestions for improvement, please let me know.
I got this error too, but with flask + peewee + rq on Heroku. Below is how I solved it.
If you have a simple app that you use with RQ, I would suggest using SimpleWorker.
RQ suggests using rq.worker.HerokuWorker, but I still received an SSL error with it.
The error appeared in a case where I had created follow-up (chained) tasks, where the execution of one depends on another task's success.
I am using flask-rq2, but this applies to normal rq usage as well, as shown below:
# app.py
import os
from flask import Flask
from flask_rq2 import RQ

app = Flask(__name__)
app.config['RQ_WORKER_CLASS'] = os.getenv('RQ_WORKER_CLASS', 'rq.worker.Worker')
rq = RQ(app)
I solved it by changing the following in the Heroku config:
Set RQ_WORKER_CLASS to rq.worker.SimpleWorker.
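With the Heroku CLI that is, for example:

heroku config:set RQ_WORKER_CLASS=rq.worker.SimpleWorker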
I am trying to use func in an update statement, but I'm getting an InvalidRequestError (stack trace below). Here's the code:
session.query(C).filter(
    z == True,
).update({
    C.x: func.sqrt((C.y1 + C.y2) * 0.675)
})
And here's the stacktrace:
File "/var/xxx/vagrant-env/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 3288, in update
update_op.exec_()
File "/var/xxx/vagrant-env/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1167, in exec_
self._do_pre_synchronize()
File "/var/xxx/vagrant-env/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1236, in _do_pre_synchronize
"Could not evaluate current criteria in Python. "
sqlalchemy.exc.InvalidRequestError: Could not evaluate current criteria in Python. Specify 'fetch' or False for the synchronize_session parameter.
I tried with both func.sqrt and func.round but they both give the error above. Any thoughts on what I'm doing wrong?
You have to specify a synchronize_session strategy: False or 'fetch' (the default, 'evaluate', cannot evaluate SQL functions such as func.sqrt in Python, which is what the error says). Just add it as a parameter, for example:
session.query(C).filter(
    z == True,
).update({
    C.x: func.sqrt((C.y1 + C.y2) * 0.675)
}, synchronize_session='fetch')
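If you don't need the in-memory session state to reflect the update (for example, because you commit or expire the session right afterwards), synchronize_session=False skips the synchronization entirely and avoids the extra SELECT that the 'fetch' strategy performs.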