I am running an example to learn the Model-View-Controller pattern in Python, but the code is giving an error. I tried to debug it, but I couldn't find the root cause. Removing the connection-closing code makes it work, but what is actually wrong with the code? Can you advise me?
# Filename: mvc.py
import sqlite3
import types

class DefectModel:
    def getDefectList(self, component):
        query = '''select ID from defects where Component = '%s' ''' % component
        defectlist = self._dbselect(query)
        list = []
        for row in defectlist:
            list.append(row[0])
        return list

    def getSummary(self, id):
        query = '''select summary from defects where ID = '%d' ''' % id
        summary = self._dbselect(query)
        for row in summary:
            return row[0]

    def _dbselect(self, query):
        connection = sqlite3.connect('example.db')
        cursorObj = connection.cursor()
        results = cursorObj.execute(query)
        connection.commit()
        cursorObj.close()
        return results

class DefectView:
    def summary(self, summary, defectid):
        print("#### Defect Summary for defect# %d ####\n %s" % (defectid, summary))

    def defectList(self, list, category):
        print("#### Defect List for %s ####\n" % category)
        for defect in list:
            print(defect)

class Controller:
    def __init__(self): pass

    def getDefectSummary(self, defectid):
        model = DefectModel()
        view = DefectView()
        summary_data = model.getSummary(defectid)
        return view.summary(summary_data, defectid)

    def getDefectList(self, component):
        model = DefectModel()
        view = DefectView()
        defectlist_data = model.getDefectList(component)
        return view.defectList(defectlist_data, component)
This is the related run.py:
#run.py
import mvc
controller = mvc.Controller()
# Displaying Summary for defect id # 2
print(controller.getDefectSummary(2))
# Displaying defect list for 'ABC' Component
print(controller.getDefectList('ABC'))
If you need to create the database, it is available here:
# Filename: database.py
import sqlite3
import types
# Create a database in RAM
db = sqlite3.connect('example.db')
# Get a cursor object
cursor = db.cursor()
cursor.execute("drop table defects")
cursor.execute("CREATE TABLE defects(id INTEGER PRIMARY KEY, Component TEXT, Summary TEXT)")
cursor.execute("INSERT INTO defects VALUES (1,'XYZ','File doesn‘t get deleted')")
cursor.execute("INSERT INTO defects VALUES (2,'XYZ','Registry doesn‘t get created')")
cursor.execute("INSERT INTO defects VALUES (3,'ABC','Wrong title gets displayed')")
# Save (commit) the changes
db.commit()
# We can also close the connection if we are done with it.
# Just be sure any changes have been committed or they will be lost.
db.close()
My error is as below:
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

PS E:\Projects\test> & python e:/Projects/test/mvc.py
Traceback (most recent call last):
  File "e:/Projects/test/mvc.py", line 56, in <module>
    import mvc
  File "e:\Projects\test\mvc.py", line 65, in <module>
    cursor.execute("drop table defects")
sqlite3.OperationalError: no such table: defects

PS E:\Projects\test> & python e:/Projects/ramin/mvc.py
Traceback (most recent call last):
  File "e:/Projects/test/mvc.py", line 56, in <module>
    import mvc
  File "e:\Projects\test\mvc.py", line 80, in <module>
    print(controller.getDefectSummary(2))
  File "e:\Projects\test\mvc.py", line 44, in getDefectSummary
    summary_data = model.getSummary(defectid)
  File "e:\Projects\test\mvc.py", line 18, in getSummary
    for row in summary:
sqlite3.ProgrammingError: Cannot operate on a closed cursor.
PS E:\Projects\test>
I suspect that the problem is this line: cursor.execute("drop table defects")
Maybe you dropped that table in a previous run, and since it's no longer there, sqlite3 raises an OperationalError exception.
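If you want the database script to be re-runnable, a minimal fix (keeping the file-based database) is to make the drop conditional, so a fresh run does not fail:

# Drop the table only if it already exists, so the first run succeeds too.
cursor.execute("DROP TABLE IF EXISTS defects")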
In your code there is a comment that says that you are using an in-memory sqlite database, but you are not. This is how you create an in-memory db:
db = sqlite3.connect(':memory:')
If you use an in-memory db you don't need to drop anything, since you are creating the db on the fly when you run your script.
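The second traceback ("Cannot operate on a closed cursor") has a separate cause: _dbselect closes the cursor and then returns it, so the callers iterate over a closed cursor. That also explains why removing the close call "works". A minimal sketch of a fix is to fetch the rows into a list before closing:

def _dbselect(self, query):
    # Fetch all rows into a plain list before closing, so callers
    # never touch the closed cursor object itself.
    connection = sqlite3.connect('example.db')
    cursorObj = connection.cursor()
    cursorObj.execute(query)
    results = cursorObj.fetchall()
    cursorObj.close()
    connection.close()
    return results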
Note: last year I wanted to understand MVC better, so I wrote a series of articles about it, including one where I use SQLite as the storage backend for my Model.
Related
File: init.py
import sys
import time
import importlib
create_database = importlib.import_module("create-database")
from settings import *
create_database.import_tables_structure()
time.sleep(2)
import_settings()
time.sleep(2)
File: create-database.py
import mysql.connector
import sys
import os
import traceback
def import_tables_structure():
    try:
        cnx = mysql.connector.connect(user='root', password='', host='localhost')
        cursor = cnx.cursor()
        sql_file = open("papinhio-player.sql", "r", encoding="utf-8").read()
        cursor.execute(sql_file, multi=True)
        cnx.commit()
        cursor.close()
        cnx.close()
    except Exception as e:
        error_message = traceback.format_exc()
        print(error_message)

    # make folders
    try:
        os.mkdir(os.path.abspath("C:\\Users\\chris\\My Projects\\papinhio-player\\disket-box\\reports\\history-report"))
        os.mkdir(os.path.abspath("C:\\Users\\chris\\My Projects\\papinhio-player\\disket-box\\reports\\listeners-statistics-report"))
        os.mkdir(os.path.abspath("C:\\Users\\chris\\My Projects\\papinhio-player\\disket-box\\reports\\schedule-transmition-report"))
        os.mkdir(os.path.abspath("C:\\Users\\chris\\My Projects\\papinhio-player\\disket-box\\reports\\week-report"))
        os.mkdir(os.path.abspath("C:\\Users\\chris\\My Projects\\papinhio-player\\disket-box\\records\\ip-calls"))
    except:
        pass

#import_tables_structure()
File: papinhio-player.sql
-- phpMyAdmin SQL Dump
-- version 5.1.1deb5ubuntu1
-- https://www.phpmyadmin.net/
--
-- Host: localhost:3306
-- Generation Time: Aug 02, 2022 at 07:57 PM
-- Server version: 10.6.7-MariaDB-2ubuntu1.1
-- PHP Version: 8.1.2
SET SQL_MODE = "";
START TRANSACTION;
--
-- Database: `papinhio-player`
--
DROP DATABASE IF EXISTS `papinhio-player`;
CREATE DATABASE IF NOT EXISTS `papinhio-player` COLLATE utf8_general_ci;
-- --------------------------------------------------------
-- Table: settings
-- Used for saving program settings
-- such as auto_dj value (0,1)
-- Easy table (only 3 columns)
CREATE TABLE `papinhio-player`.`settings` (
`id` int(11) NOT NULL,
`setting` varchar(255) NOT NULL,
`value` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
--
-- Indexes for table `settings`
--
ALTER TABLE `papinhio-player`.`settings`
ADD PRIMARY KEY (`id`),
ADD UNIQUE KEY `setting` (`setting`);
--
-- AUTO_INCREMENT for dumped tables
--
--
-- AUTO_INCREMENT for table `settings`
--
ALTER TABLE `papinhio-player`.`settings`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
--
-- Constraints for dumped tables
--
COMMIT;
File: settings.py
import os
import sys
import mysql.connector
def import_settings():
    system_sound_volume = 50
    #default_font = "Times New Roman"
    default_font = "Calibri"
    default_font_size = 18
    default_font_color = "#000000"
    default_background_color = "#F1F1F1"
    default_buttons_background = "#F5F5F5"
    default_buttons_font_color = "#000000"
    #default_style = "WindowsVista"
    default_style = "Fusion"
    #default_custome_theme = "light_blue.xml"
    default_custome_theme = ""

    sql = """ INSERT INTO `settings` (`keyword`, `current_value`) VALUES ( ?, ?) """

    settings = []
    settings.append(("input_device_sound_volume",100))
    settings.append(("input_device_normalize",0))
    settings.append(("input_device_pan",0))
    settings.append(("input_device_low_frequency",20))
    settings.append(("input_device_high_frequency",20000))
    settings.append(("general_deck_sound_volume",50))
    settings.append(("general_deck_normalize",0))
    settings.append(("general_deck_pan",0))
    settings.append(("general_deck_low_frequency",20))
    settings.append(("general_deck_high_frequency",20000))
    settings.append(("player_list_display","Προβολή λίστας"))
    settings.append(("deck_1_relative_type",""))
    settings.append(("deck_1_relative_number",0))
    settings.append(("deck_1_current_duration_milliseconds",0))
    settings.append(("deck_1_repeats",0))
    settings.append(("deck_1_status",0))
    settings.append(("deck_1_total_time_milliseconds",0))
    settings.append(("deck_2_relative_type",""))
    settings.append(("deck_2_relative_number",0))
    settings.append(("deck_2_current_duration_milliseconds",0))
    settings.append(("deck_2_repeats",0))
    settings.append(("deck_2_status",0))
    settings.append(("deck_2_total_time_milliseconds",0))
    settings.append(("music_clip_deck_relative_type",""))
    settings.append(("music_clip_deck_relative_number",0))
    settings.append(("music_clip_deck_current_duration_milliseconds",0))
    settings.append(("music_clip_deck_repeats",0))
    settings.append(("music_clip_deck_status",0))
    settings.append(("music_clip_deck_total_time_milliseconds",0))
    settings.append(("default_font",default_font))
    settings.append(("default_font_size",default_font_size))
    settings.append(("default_font_color",default_font_color))
    settings.append(("default_background_color",default_background_color))
    settings.append(("default_button_background",default_buttons_background))
    settings.append(("default_button_font_color",default_buttons_font_color))
    settings.append(("default_style",default_style))
    settings.append(("default_custome_theme",default_custome_theme))
    settings.append(("player_field_change_position","1"))
    settings.append(("player_field_play","1"))
    settings.append(("player_field_title","1"))
    settings.append(("player_field_last_play","1"))
    settings.append(("player_field_next_play","1"))
    settings.append(("player_field_image","1"))
    settings.append(("player_field_prepare","1"))
    settings.append(("player_field_play_now","1"))
    settings.append(("player_field_remove","1"))
    settings.append(("player_field_duration","1"))
    settings.append(("player_field_artist","1"))
    settings.append(("player_field_album","1"))
    settings.append(("player_field_author","1"))
    settings.append(("player_field_composer","1"))
    settings.append(("player_field_year","1"))
    settings.append(("player_field_description","1"))
    settings.append(("player_field_from","1"))
    settings.append(("player_field_rating","1"))
    settings.append(("player_field_volume","1"))
    settings.append(("player_field_normalize","1"))
    settings.append(("player_field_pan","1"))
    settings.append(("player_field_frequencies","1"))
    settings.append(("player_field_repeat","1"))
    settings.append(("player_field_open_file","1"))
    settings.append(("player_fade_in","0"))
    settings.append(("player_fade_out","0"))
    settings.append(("program_component_tool_bar","1"))
    settings.append(("program_component_time_lines","1"))
    settings.append(("program_component_general_deck","1"))
    settings.append(("program_component_deck_1","1"))
    settings.append(("program_component_deck_2","1"))
    settings.append(("program_component_music_clip_deck","1"))
    settings.append(("program_component_speackers_deck","1"))
    settings.append(("program_component_ip_calls","0"))
    settings.append(("program_component_player_list","1"))
    settings.append(("program_component_web_sites","1"))
    settings.append(("program_component_scheduled_transmitions","1"))
    settings.append(("repeat_player_list","1"))
    settings.append(("auto_dj","1"))
    settings.append(("current-working-directory",os.path.abspath(".")))

    cnx = mysql.connector.connect(user='root', password='', host='localhost', database='papinhio-player')
    cursor = cnx.cursor()

    cursor.execute("SET GLOBAL sql_mode='';")
    cnx.commit()

    cursor.execute("USE `papinhio-player`;")
    cnx.commit()

    for setting in settings:
        sql = "INSERT INTO `papinhio-player`.`settings` (`setting`,`value`) VALUES (%s, %s);"
        cursor.execute(sql, (str(setting[0]), str(setting[1])))
        cnx.commit()

    cursor.close()
    cnx.close()
If I run init.py the output is:
Traceback (most recent call last):
  File "C:\Users\chris\My Projects\papinhio-player\database\create-database.py", line 12, in import_tables_structure
    cnx.commit()
  File "C:\python\lib\site-packages\mysql\connector\connection_cext.py", line 425, in commit
    self._cmysql.commit()
_mysql_connector.MySQLInterfaceError: Commands out of sync; you can't run this command now

Traceback (most recent call last):
  File "C:\python\lib\site-packages\mysql\connector\connection_cext.py", line 538, in cmd_query
    query_attrs=self._query_attrs)
_mysql_connector.MySQLInterfaceError: Table 'papinhio-player.settings' doesn't exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "init_2.py", line 10, in <module>
    import_settings()
  File "C:\Users\chris\My Projects\papinhio-player\database\settings.py", line 166, in import_settings
    cursor.execute(sql,(str(setting[0]),str(setting[1])))
  File "C:\python\lib\site-packages\mysql\connector\cursor_cext.py", line 271, in execute
    raw_as_string=self._raw_as_string)
  File "C:\python\lib\site-packages\mysql\connector\connection_cext.py", line 541, in cmd_query
    sqlstate=exc.sqlstate)
mysql.connector.errors.ProgrammingError: 1146 (42S02): Table 'papinhio-player.settings' doesn't exist
The database and table structure are created, but nothing is inserted into the settings table.
If I then run python settings.py on its own, the insertions succeed.
What's the problem?
Note: I tried multi=True for the multi-statement query, but then no database gets built in phpMyAdmin at all.
File: create-database.py
import mysql.connector
import sys
import os
import traceback
import re
def import_tables_structure():
    try:
        cnx = mysql.connector.connect(user='root', password='', host='localhost')
        cursor = cnx.cursor()
        exec_sql_file(cursor, "papinhio-player.sql")
    except Exception as e:
        error_message = traceback.format_exc()
        print(error_message)

    # make folders
    try:
        os.mkdir(os.path.abspath("C:\\Users\\chris\\My Projects\\papinhio-player\\disket-box\\reports\\history-report"))
        os.mkdir(os.path.abspath("C:\\Users\\chris\\My Projects\\papinhio-player\\disket-box\\reports\\listeners-statistics-report"))
        os.mkdir(os.path.abspath("C:\\Users\\chris\\My Projects\\papinhio-player\\disket-box\\reports\\schedule-transmition-report"))
        os.mkdir(os.path.abspath("C:\\Users\\chris\\My Projects\\papinhio-player\\disket-box\\reports\\week-report"))
        os.mkdir(os.path.abspath("C:\\Users\\chris\\My Projects\\papinhio-player\\disket-box\\records\\ip-calls"))
    except:
        pass

def exec_sql_file(cursor, sql_file):
    statement = ""
    for line in open(sql_file, "r", encoding="utf-8"):
        if re.match(r'--', line):  # ignore sql comment lines
            continue
        if not re.search(r';$', line):  # keep appending lines that don't end in ';'
            statement = statement + line
        else:  # when you get a line ending in ';' then exec statement and reset for next statement
            statement = statement + line
            #print("\n\n[DEBUG] Executing SQL statement:\n%s" % (statement))
            try:
                cursor.execute(statement)
            except Exception as e:
                print(e)
                #print("\n[WARN] MySQLError during execute statement \n\tArgs: '%s'" % (str(e.args)))
            statement = ""

#import_tables_structure()
The problem may have been that a single multi-statement execution does not give MySQL enough time to apply the changes. (It is also worth noting that with mysql-connector, cursor.execute(..., multi=True) returns an iterator of result sets, and the statements are only executed as that iterator is consumed, which likely explains the original failure.)
In this answer I use a SQL file parser to execute one single statement per execution.
Now the settings insertions are done with no mistake.
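For reference, a minimal alternative sketch (untested, and assuming a mysql-connector version that still supports multi=True) is to drain the iterator so every statement actually runs:

# Sketch: each iteration executes one statement from the script.
for result in cursor.execute(sql_file, multi=True):
    if result.with_rows:
        result.fetchall()  # drain any result set before the next statement runs
cnx.commit()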
I'm working on automating some query extraction using Python and pyodbc, then converting the results to Parquet format and sending them to AWS S3.
My script solution is working fine so far, but I have hit a problem. I have a schema, let us call it SCHEMA_A, and inside of it several tables: TABLE_1, TABLE_2, ..., TABLE_N.
All those tables inside that schema are accessible with the same credentials.
So I'm using a script like this one to automate the task.
def get_stream(cursor, batch_size=100000):
    while True:
        row = cursor.fetchmany(batch_size)
        if row is None or not row:
            break
        yield row

cnxn = pyodbc.connect(driver='pyodbc driver here',
                      host='host name',
                      database='schema name',
                      user='user name',
                      password='password')
print('Connection established ...')

cursor = cnxn.cursor()
print('Initializing cursor ...')

if len(sys.argv) > 1:
    table_name = sys.argv[1]
    cursor.execute('SELECT * FROM {}'.format(table_name))
else:
    exit()
print('Query fetched ...')

row_batch = get_stream(cursor)
print('Getting iterator ...')

cols = cursor.description
cols = [col[0] for col in cols]
print('Initializing batch data frame ...')
df = pd.DataFrame(columns=cols)

start_time = time.time()
for rows in row_batch:
    tmp = pd.DataFrame.from_records(rows, columns=cols)
    df = df.append(tmp, ignore_index=True)
    tmp = None
    print("--- Batch inserted in %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
I run code similar to this inside Airflow tasks, and it works just fine for all the other tables. But then I have two tables, let's call them TABLE_I and TABLE_II, that yield the following error when I execute cursor.fetchmany(batch_size):
ERROR - ('ODBC SQL type -151 is not yet supported. column-index=16 type=-151', 'HY106')
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1310, in _execute_task
result = task_copy.execute(context=context)
File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 117, in execute
return_value = self.execute_callable()
File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 128, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/home/ubuntu/prea-ninja-airflow/jobs/plugins/extract/fetch.py", line 58, in fetch_data
for rows in row_batch:
File "/home/ubuntu/prea-ninja-airflow/jobs/plugins/extract/fetch.py", line 27, in stream
row = cursor.fetchmany(batch_size)
Inspecting those tables with SQLElectron and querying the first few lines, I realized that both TABLE_I and TABLE_II have a column called 'Geolocalizacao'. When I use SQL Server to find the DATA TYPE of that column with:
SELECT DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = 'TABLE_I' AND
COLUMN_NAME = 'Geolocalizacao';
It yields:
DATA_TYPE
geography
Searching here on Stack Overflow I found this solution: python pyodbc SQL Server Native Client 11.0 cannot return geometry column.
By that user's description, it seems to work fine by adding:
def unpack_geometry(raw_bytes):
    # adapted from SSCLRT information at
    # https://learn.microsoft.com/en-us/openspecs/sql_server_protocols/ms-ssclrt/dc988cb6-4812-4ec6-91cd-cce329f6ecda
    tup = struct.unpack('<i2b3d', raw_bytes)
    # tup contains: (unknown, Version, Serialization_Properties, X, Y, SRID)
    return tup[3], tup[4], tup[5]
and then:
cnxn.add_output_converter(-151, unpack_geometry)
after creating the connection. But it's not working for the GEOGRAPHY data type: when I use this code (adding import struct to the script), it gives me the following error:
Traceback (most recent call last):
File "benchmark.py", line 79, in <module>
for rows in row_batch:
File "benchmark.py", line 39, in get_stream
row = cursor.fetchmany(batch_size)
File "benchmark.py", line 47, in unpack_geometry
tup = struct.unpack('<i2b3d', raw_bytes)
struct.error: unpack requires a buffer of 30 bytes
An example of the values in this column follows the given template:
{"srid":4326,"version":1,"points":[{}],"figures":[{"attribute":1,"pointOffset":0}],"shapes":[{"parentOffset":-1,"figureOffset":0,"type":1}],"segments":[]}
I honestly don't know how to adapt the code for this structure; can someone help me? It's been working fine for all other tables, but these two tables with this column are giving me a lot of headache.
Hi, this is what I have done:
from binascii import hexlify

def _handle_geometry(geometry_value):
    return f"0x{hexlify(geometry_value).decode().upper()}"
and then on connection:
cnxn.add_output_converter(-151, _handle_geometry)
This will return the value formatted the same way SSMS displays it.
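Putting it together with the earlier fetch loop, a minimal sketch (the connection string is a placeholder) looks like:

from binascii import hexlify
import pyodbc

def _handle_geometry(geometry_value):
    # Render the raw geography/geometry bytes the way SSMS displays them.
    return f"0x{hexlify(geometry_value).decode().upper()}"

cnxn = pyodbc.connect('your connection string here')
cnxn.add_output_converter(-151, _handle_geometry)  # -151 is the ODBC type of geometry/geography
cursor = cnxn.cursor()
cursor.execute('SELECT * FROM TABLE_I')
rows = cursor.fetchmany(10)  # no longer raises 'ODBC SQL type -151 is not yet supported'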
I keep getting the error
Traceback (most recent call last):
File "C:\Users\max\AppData\Local\Programs\Python\Python37\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "G:\computing project\add to db.py", line 114, in get_items
C.execute(sql ,vari)
sqlite3.OperationalError: table inventory has no column named car_make
when attempting to insert values taken from tkinter into my database.
I have tried changing the names of the columns to see if that would solve the issue, but nothing has changed. I am using DB Browser to edit my database.
from tkinter import *
import sqlite3

conn = sqlite3.connect("G:\computing project\database of cars.db")
C = conn.cursor()

import tkinter.messagebox

def get_items(self, *args, **kwargs):  # this function gets the items from the entry boxes
    self.carmake = self.carmake_e.get()
    self.carmodel = self.carmodel_e.get()
    self.regi = self.regi_e.get()
    self.colour = self.colour_e.get()
    self.cost = self.cost_e.get()
    self.tcost = self.tcost_e.get()
    self.sellprice = self.sellprice_e.get()
    self.assumedprofit = self.assumedprofit_e.get()
    self.assumedprofit = float(self.sellprice) - float(self.tcost)

    if self.carmake == '' or self.carmodel == '' == self.colour == '':
        print("WRONG")
        tkinter.messagebox.showinfo("error", "please enter values for car make, model and colour")
    else:
        print("solid m8 ")
        sql = "INSERT INTO inventory(car_make,car_model,registration_plate,colour,cost,total_cost,selling_price,assumed_profit) VALUES (?,?,?,?,?,?,?)"
        vari = (self.carmake, self.carmodel, self.regi, self.colour, self.cost, self.tcost, self.sellprice, self.assumedprofit)
        C.execute(sql, vari)
        # C.execute(sql(self.name,self.carmake,self.carmodel,self.regi,self.colour,self.cost,self.tcost,self.sellprice,self.assumedprofit))
        conn.commit()
        tkinter.messagebox.showinfo("success", "succesfully added to databse")
I am trying to write a Python model which is capable of doing some processing in a PostgreSQL database using the multiprocessing module and peewee.
In single-core mode the code works; however, when I try to run it with multiple cores I run into an SSL error.
I would like to post the structure of my model in the hope that somebody can advise how to set up my model properly. Currently I have chosen an object-oriented approach in which I make one connection which is shared in a pool. To clarify what I have done, I will now show the source code I have so far.
I have three files: main.py, models.py and parser.py. The contents are the following.
models.py defines the peewee PostgreSQL table and makes a connection to the Postgres server:
import peewee as pw
from playhouse.pool import PooledPostgresqlExtDatabase

KVK_KEY = "id_number"
NAME_KEY = "name"
N_VOWELS_KEY = "n_vowels"

# initialise the data base
database = PooledPostgresqlExtDatabase(
    "testdb", user="postgres", host="localhost", port=5432, password="xxxx",
    max_connections=8, stale_timeout=300)

class BaseModel(pw.Model):
    class Meta:
        database = database
        only_save_dirty = True

# this class describes the format of the sql data base
class Company(BaseModel):
    id_number = pw.IntegerField(primary_key=True)
    name = pw.CharField(null=True)
    n_vowels = pw.IntegerField(default=-1)
    processor = pw.IntegerField(default=-1)

def connect_database(database_name, reset_database=False):
    """ connect the database """
    database.connect()
    if reset_database:
        database.drop_tables([Company])
    database.create_tables([Company])
parser.py contains the CompanyParser class, which is the engine of the code doing all the processing. It generates some artificial data which is stored in the PostgreSQL database, and the run method then does some processing on the data already stored in the database:
import pandas as pd
import numpy as np
import random
import string
import peewee as pw
from models import (Company, database, KVK_KEY, NAME_KEY)
import multiprocessing as mp

MAX_SQL_CHUNK = 1000

np.random.seed(0)

def random_name(size=8, chars=string.ascii_lowercase):
    """ Create a random character string of 'size' characters """
    return "".join(random.choice(chars) for _ in range(size))

def vowel_count(characters):
    """
    Count the number of vowels in the string 'characters' and return as an integer
    """
    count = 0
    for char in characters:
        if char in list("aeiou"):
            count += 1
    return count

class CompanyParser(mp.Process):
    def __init__(self, number_of_companies=100, i_proc=None,
                 number_of_procs=1,
                 first_id=None, last_id=None):
        if i_proc is not None and number_of_procs > 1:
            mp.Process.__init__(self)

        self.i_proc = i_proc
        self.number_of_procs = number_of_procs
        self.n_companies = number_of_companies
        self.data_df: pd.DataFrame = None

        self.first_id = first_id
        self.last_id = last_id

    def generate_data(self):
        """ Create a dataframe with fake company data and id's """
        id_list = np.random.randint(1000000, 9999999, self.n_companies)
        company_list = np.array([random_name() for _ in range(self.n_companies)])
        self.data_df = pd.DataFrame(data=np.vstack([id_list, company_list]).T,
                                    columns=[KVK_KEY, NAME_KEY])
        self.data_df.sort_values([KVK_KEY], inplace=True)

    def store_to_database(self):
        """
        Store the company data to a sql database
        """
        record_list = list(self.data_df.to_dict(orient="index").values())
        n_batch = int(len(record_list) / MAX_SQL_CHUNK) + 1
        with database.atomic():
            for cnt, batch in enumerate(pw.chunked(record_list, MAX_SQL_CHUNK)):
                print(f"writing {cnt}/{n_batch}")
                Company.insert_many(batch).execute()

    def run(self):
        print("Making query at {}".format(self.i_proc))
        query = (Company.
                 select().
                 where(Company.id_number.between(self.first_id, self.last_id)))
        print("Found {} companies".format(query.count()))

        for cnt, company in enumerate(query):
            print("Processing # {} - {}: company {}/{}".format(self.i_proc, cnt,
                                                               company.id_number,
                                                               company.name))
            number_of_vowels = vowel_count(company.name)
            company.n_vowels = number_of_vowels
            company.processor = self.i_proc
            print(f"storing number of vowels: {number_of_vowels}")
            company.save()
Finally, my main script loads the classes from models.py and parser.py and launches the code:
from models import (Company, connect_database)
from parser import CompanyParser

number_of_processors = 2

connect_database(None, reset_database=True)

# init an object of the CompanyParser and use the create database
parser = CompanyParser()
company_ids = Company.select(Company.id_number)

parser.generate_data()
parser.store_to_database()

n_companies = company_ids.count()
n_comp_per_proc = int(n_companies / number_of_processors)
print("Found {} companies: {} per proc".format(n_companies, n_comp_per_proc))

for i_proc in range(number_of_processors):
    i_start = i_proc * n_comp_per_proc
    first_id = company_ids[i_start]
    last_id = company_ids[i_start + n_comp_per_proc - 1]

    print(f"Running proc {i_proc} for id {first_id} until id {last_id}")
    sub_parser = CompanyParser(first_id=first_id, last_id=last_id,
                               i_proc=i_proc,
                               number_of_procs=number_of_processors)

    if number_of_processors > 1:
        sub_parser.start()
    else:
        sub_parser.run()
In case number_of_processors = 1 this script works perfectly fine. It generates artificial data, stores it in the PostgreSQL database, and does some processing on the data (it counts the number of vowels in each name and stores the result in the n_vowels column).
However, when I try to run this with 2 cores (number_of_processors = 2), I run into the following error:
/opt/miniconda3/bin/python /home/eelco/PycharmProjects/multiproc_peewee/main.py
writing 0/1
Found 100 companies: 50 per proc
Running proc 0 for id 1020737 until id 5295565
Running proc 1 for id 5302405 until id 9891087
Making query at 0
Found 50 companies
Processing # 0 - 0: company 1020737/wqrbgxiu
storing number of vowels: 2
Making query at 1
Process CompanyParser-1:
Processing # 0 - 1: company 1086107/lkbagrbc
storing number of vowels: 1
Processing # 0 - 2: company 1298367/nsdjsqio
storing number of vowels: 2
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2714, in execute_sql
cursor.execute(sql, params or ())
psycopg2.OperationalError: SSL error: sslv3 alert bad record mac
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/eelco/PycharmProjects/multiproc_peewee/parser.py", line 82, in run
company.save()
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 5748, in save
rows = self.update(**field_dict).where(self._pk_expr()).execute()
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1625, in inner
return method(self, database, *args, **kwargs)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1696, in execute
return self._execute(database)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2121, in _execute
cursor = database.execute(self)
File "/opt/miniconda3/lib/python3.7/site-packages/playhouse/postgres_ext.py", line 468, in execute
cursor = self.execute_sql(sql, params, commit=commit)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2721, in execute_sql
self.commit()
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2512, in __exit__
reraise(new_type, new_type(*exc_args), traceback)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 186, in reraise
raise value.with_traceback(tb)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2714, in execute_sql
cursor.execute(sql, params or ())
peewee.OperationalError: SSL error: sslv3 alert bad record mac
Process CompanyParser-2:
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2714, in execute_sql
cursor.execute(sql, params or ())
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/eelco/PycharmProjects/multiproc_peewee/parser.py", line 72, in run
print("Found {} companies".format(query.count()))
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1625, in inner
return method(self, database, *args, **kwargs)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1881, in count
return Select([clone], [fn.COUNT(SQL('1'))]).scalar(database)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1625, in inner
return method(self, database, *args, **kwargs)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1866, in scalar
row = self.tuples().peek(database)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1625, in inner
return method(self, database, *args, **kwargs)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1853, in peek
rows = self.execute(database)[:n]
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1625, in inner
return method(self, database, *args, **kwargs)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1696, in execute
return self._execute(database)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 1847, in _execute
cursor = database.execute(self)
File "/opt/miniconda3/lib/python3.7/site-packages/playhouse/postgres_ext.py", line 468, in execute
cursor = self.execute_sql(sql, params, commit=commit)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2721, in execute_sql
self.commit()
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2512, in __exit__
reraise(new_type, new_type(*exc_args), traceback)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 186, in reraise
raise value.with_traceback(tb)
File "/opt/miniconda3/lib/python3.7/site-packages/peewee.py", line 2714, in execute_sql
cursor.execute(sql, params or ())
peewee.OperationalError: SSL error: decryption failed or bad record mac
Process finished with exit code 0
Somehow something goes wrong as soon as the second process starts to do something with the database. Does somebody have advice on getting this code working? I have tried the following already:
- Using PooledPostgresqlDatabase and the normal PostgresqlDatabase to connect to the database. This leads to the same error.
- Using sqlite instead of postgres. This works for 2 cores, but only if the two processes are not interfering too much; otherwise I get some locking problems. I was under the impression that postgres would be better suited to multiprocessing than sqlite (is that true?).
- Putting a break after launching the first process (so effectively using only one core): then the code works, showing that the start method is called correctly.
Hopefully somebody can advise.
Regards
Eelco
After some searching on the internet today I found the solution to my problem here: github.com/coleifer. As coleifer mentions, you apparently first have to set up all the forks before you start connecting to the database. Based on this idea I have modified my code and it is working now.
For those interested, I will post my Python scripts again so you can see how I did it, because there are not many explicit examples out there; perhaps it may help others.
First of all, all the database and peewee initialisation is now moved into functions which are only called inside the constructor of the CompanyParser class.
So models.py looks like:
import peewee as pw
from playhouse.pool import PooledPostgresqlExtDatabase, PostgresqlDatabase, PooledPostgresqlDatabase

KVK_KEY = "id_number"
NAME_KEY = "name"
N_VOWELS_KEY = "n_vowels"

def init_database():
    db = PooledPostgresqlDatabase(
        "testdb", user="postgres", host="localhost", port=5432, password="xxxxx",
        max_connections=8, stale_timeout=300)
    return db

def init_models(db, reset_tables=False):
    class BaseModel(pw.Model):
        class Meta:
            database = db

    # this class describes the format of the sql data base
    class Company(BaseModel):
        id_number = pw.IntegerField(primary_key=True)
        name = pw.CharField(null=True)
        n_vowels = pw.IntegerField(default=-1)
        processor = pw.IntegerField(default=-1)

    if db.is_closed():
        db.connect()
    if reset_tables and Company.table_exists():
        db.drop_tables([Company])
    db.create_tables([Company])

    return Company
Then, the worker class CompanyParser is defined in the parser.py script and looks like this:
import multiprocessing as mp
import random
import string

import numpy as np
import pandas as pd
import peewee as pw

from models import (KVK_KEY, NAME_KEY, init_database, init_models)

MAX_SQL_CHUNK = 1000

np.random.seed(0)

def random_name(size=32, chars=string.ascii_lowercase):
    """ Create a random character string of 'size' characters """
    return "".join(random.choice(chars) for _ in range(size))

def vowel_count(characters):
    """
    Count the number of vowels in the string 'characters' and return as an integer
    """
    count = 0
    for char in characters:
        if char in list("aeiou"):
            count += 1
    return count

class CompanyParser(mp.Process):
    def __init__(self, reset_tables=False,
                 number_of_companies=100, i_proc=None,
                 number_of_procs=1, first_id=None, last_id=None):
        if i_proc is not None and number_of_procs > 1:
            mp.Process.__init__(self)

        self.i_proc = i_proc
        self.reset_tables = reset_tables
        self.number_of_procs = number_of_procs
        self.n_companies = number_of_companies
        self.data_df: pd.DataFrame = None

        self.first_id = first_id
        self.last_id = last_id

        # initialise the database and models
        self.database = init_database()
        self.Company = init_models(self.database, reset_tables=self.reset_tables)

    def generate_data(self):
        """ Create a dataframe with fake company data and id's and return the array of id's """
        id_list = np.random.randint(1000000, 9999999, self.n_companies)
        company_list = np.array([random_name() for _ in range(self.n_companies)])
        self.data_df = pd.DataFrame(data=np.vstack([id_list, company_list]).T,
                                    columns=[KVK_KEY, NAME_KEY])
        self.data_df.drop_duplicates([KVK_KEY], inplace=True)
        self.data_df.sort_values([KVK_KEY], inplace=True)
        return self.data_df[KVK_KEY].values

    def store_to_database(self):
        """
        Store the company data to a sql database
        """
        record_list = list(self.data_df.to_dict(orient="index").values())
        n_batch = int(len(record_list) / MAX_SQL_CHUNK) + 1
        with self.database.atomic():
            for cnt, batch in enumerate(pw.chunked(record_list, MAX_SQL_CHUNK)):
                print(f"writing {cnt}/{n_batch}")
                self.Company.insert_many(batch).execute()

    def run(self):
        query = (self.Company.
                 select().
                 where(self.Company.id_number.between(self.first_id, self.last_id)))

        for cnt, company in enumerate(query):
            print("Processing # {} - {}: company {}/{}".format(self.i_proc, cnt, company.id_number,
                                                               company.name))
            number_of_vowels = vowel_count(company.name)
            company.n_vowels = number_of_vowels
            company.processor = self.i_proc
            try:
                company.save()
            except (pw.OperationalError, pw.InterfaceError) as err:
                print("failed save for {} {}: {}".format(self.i_proc, cnt, err))
            else:
                pass
Finally, the main.py script which launches the processes:
from parser import CompanyParser
import time

def main():
    number_of_processors = 2
    number_of_companies = 10000

    parser = CompanyParser(number_of_companies=number_of_companies, reset_tables=True)
    company_ids = parser.generate_data()
    parser.store_to_database()

    n_companies = company_ids.size
    n_comp_per_proc = int(n_companies / number_of_processors)
    print("Found {} companies: {} per proc".format(n_companies, n_comp_per_proc))

    if not parser.database.is_closed():
        parser.database.close()

    processes = list()
    for i_proc in range(number_of_processors):
        i_start = i_proc * n_comp_per_proc
        first_id = company_ids[i_start]
        last_id = company_ids[i_start + n_comp_per_proc - 1]

        print(f"Running proc {i_proc} for id {first_id} until id {last_id}")
        sub_parser = CompanyParser(first_id=first_id, last_id=last_id, i_proc=i_proc,
                                   number_of_procs=number_of_processors)

        if number_of_processors > 1:
            sub_parser.start()
        else:
            sub_parser.run()

        processes.append(sub_parser)

    # this blocks the script until all processes are done
    for job in processes:
        job.join()

    # make sure all the connections are closed
    for i_proc in range(number_of_processors):
        db = processes[i_proc].database
        if not db.is_closed():
            db.close()

    print("Goodbye!")

if __name__ == "__main__":
    start = time.time()
    main()
    duration = time.time() - start
    print(f"Done in {duration} s")
As you can see, the database connection is made per process, inside the class.
This example works and is a complete example of multiprocessing + peewee + PostgreSQL. Hopefully it may help others. In case you have any comments or suggestions for improvement, please let me know.
I got this error too, but with flask + peewee + rq on Heroku. Below is how I solved it:
If you have a simple app that you use with RQ, I would suggest using SimpleWorker.
RQ suggests using rq.worker.HerokuWorker, but I still received an SSL error with it.
The error appeared in a case where I had created follow-up (chained) tasks, where the execution of one depends on another task's success.
Also, I am using flask-rq2, but this applies to normal usage as well, as shown below:
# app.py
app = Flask(__name__)
app.config['RQ_WORKER_CLASS'] = os.getenv('RQ_WORKER_CLASS', 'rq.worker.Worker')
rq = RQ(app)
I solved it by changing the following in the Heroku config:
set your RQ_WORKER_CLASS to rq.worker.SimpleWorker
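For example, assuming the standard Heroku CLI:

heroku config:set RQ_WORKER_CLASS=rq.worker.SimpleWorker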
I have a problem with a Python 2.7 project.
I'm trying to set a variable to a value retrieved from an sqlite3 database, but I'm having trouble. Here is my code thus far, and the error I'm receiving. Yes, the connection opens just fine, and the table, columns, and indicated row are there as they should be.
import sqlite3
import Trailcrest

conn = sqlite3.connect('roster.paw')
c = conn.cursor()

def Lantern(AI):
    """Pulls all of the data for the selected user."""
    Trailcrest.FireAutoHelp = c.execute("""select fireautohelp
                                           from roster
                                           where index = ?;""", (AI,)).fetchall()
The error is:
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    Lantern(1)
  File "C:\Users\user\MousePaw Games\Word4Word\PYM\Glyph.py", line 20, in Lantern
    Trailcrest.FireAutoHelp = c.execute("""select fireautohelp from roster where index = ?;""", (AI,)).fetchall()
OperationalError: near "index": syntax error
As Thomas K mentions in a comment, index is a SQL keyword.
You can either rename that column, or enclose it in backticks:
Trailcrest.FireAutoHelp = c.execute("""select fireautohelp
from roster
where `index` = ?;""", (AI,) ).fetchall()
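Since SQLite also accepts standard SQL double quotes (and square brackets) around identifiers, this equivalent form works as well:

Trailcrest.FireAutoHelp = c.execute("""select fireautohelp
                                       from roster
                                       where "index" = ?;""", (AI,)).fetchall()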