I have ClickHouse and python3 installed on a VM, and I am stress testing cache dictionaries. After a certain number of entries, the query sent through clickhouse_driver fails with the error
Unexpected EOF while reading bytes
Is this error caused by the driver/Python, or by the cache being maxed out on the system? For example, it happens with a file of 203 columns and 10000 rows on a machine with 32 GB of RAM and a 256 GB SSD. The CSV file is around 66 MB, which seems quite small for such an error. The query I'm running is:
SELECT
    dictGet('CacheDictionary', 'date', toUInt64(number)) AS date,
    SUM(dictGet('CacheDictionary', 'filterColumn', toUInt64(number))) AS val,
    AVG(dictGet('CacheDictionary', 'filterColumn', toUInt64(number))) AS avg
FROM numbers(1, 10000)
GROUP BY date
An example entry of the csv file is:
I've posted the part of the code that tries to find the maximum number of cache items stored, along with the queries executed for each. In selectBenchmark, the string corresponds to the query above. The parameters are fairly self-explanatory (xmlFile is the dictionary created in /etc/lib/clickhouse-server).
import csv
import os

def cacheMaxItems(csvRead, xmlFile, benchmarkType, columnStepSize, rowStepSize):
    maxCache = []
    os.system('rm -f ' + csvRead)
    os.system('bash /root/restartCH.sh')
    for j in range(1, 13):
        outputCSV = '/root/results' + benchmarkType + '/cacheResults' + str(j*columnStepSize) + '.csv'
        with open(outputCSV, 'w') as fp:
            wr = csv.writer(fp)
            wr.writerow([benchmarkType + ': Number of rows', 'Loading time', 'Mean', 'Variance', 'Skewness', 'Number of Columns: ' + str(j*columnStepSize)])
        for i in range(1, 10000):
            # restart the server every fifth iteration
            if i%5 == 0:
                os.system('bash /root/restartCH.sh')
            try:
                createCSV(10000, j*columnStepSize, csvRead)
                clickhouseDictionary(rowStepSize*i*j*columnStepSize, j*columnStepSize, xmlFile, csvRead, 'Cache')
                if benchmarkType == 'Random':
                    results = selectBenchmark(i*rowStepSize, j*columnStepSize, 'Random', 'Cache')
                elif benchmarkType == 'Consecutive':
                    results = selectBenchmark(i*rowStepSize, j*columnStepSize, 'Consecutive', 'Cache')
                elif benchmarkType == 'CPU':
                    results = selectBenchmark(i*rowStepSize, j*columnStepSize, 'CPU', 'Cache')
                results.insert(0, i*rowStepSize)
                with open(outputCSV, 'a') as fp:
                    wr = csv.writer(fp)
                    wr.writerow(results)
                print('Successfully loaded and queried cache of size ' + str(rowStepSize*i*j*columnStepSize) + '.')
            except Exception as ex:
                # the previous row count was the last one that worked
                os.system('rm -f ' + csvRead)
                os.system('bash /root/restartCH.sh')
                maxCache.append([j*columnStepSize, (i-1)*rowStepSize])
                break
    return maxCache
import uuid
import numpy as np
from clickhouse_driver import Client

def selectBenchmark(numberOfRows, numberOfColumns, benchmarkType, dictType):
    client = Client('localhost', port=9000, database='system')
    client.execute('SYSTEM RELOAD DICTIONARY ' + dictType + 'Dictionary')
    loadingTime = client.last_query.elapsed
    client.execute('SELECT dictGet(\'' + dictType + 'Dictionary\', \'random0\', toUInt64(1))', query_id=str(uuid.uuid4()))
    loadingTime += client.last_query.elapsed
    loop = True
    counter = 0
    while loop:
        times = []
        for i in range(0, 31):
            query_id = str(uuid.uuid4())
            string = stringGen(numberOfRows, numberOfColumns, benchmarkType, dictType)
            client.execute(string, query_id=query_id)
            times.append(client.last_query.elapsed)
        if max(times) > loadingTime:
            loadingTime = max(times)
        stats = transformedMLE(times)
        # keep only the timings within three standard deviations of the mean
        redactedTimes = [x for x in times if (stats[0]-3*np.sqrt(stats[1])) < x < (stats[0]+3*np.sqrt(stats[1]))]
        counter += 1
        if len(times) - len(redactedTimes) <= 3:
            loop = False
        elif counter > 15:
            print('High variance query')
            loop = False
    result = transformedMLE(redactedTimes)
    loadingTime = loadingTime - result[0]
    result.insert(0, loadingTime)
    return result
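The three-sigma filter that selectBenchmark applies to the timings can be looked at in isolation. This is only a minimal sketch: transformedMLE is not shown in the post, so the hypothetical helper trim_outliers below uses the plain mean and population variance from the standard library instead, which is an assumption about what stats[0] and stats[1] hold.

```python
import math
import statistics


def trim_outliers(times, k=3.0):
    """Keep the timings strictly within k standard deviations of the mean."""
    mean = statistics.mean(times)
    std = math.sqrt(statistics.pvariance(times))
    return [x for x in times if (mean - k * std) < x < (mean + k * std)]


# 30 fast runs and one very slow run: only the slow run is dropped.
sample = [1.0] * 30 + [100.0]
kept = trim_outliers(sample)
```

Note that the comparison is strict, as in the original list comprehension, so if every timing is identical the standard deviation is zero and the filter returns an empty list.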
The restartCH.sh file is
service clickhouse-server forcerestart
as the cache overflow often blocks the restart command.
There is no output in the server error logs, which suggests that this is a problem with the Python driver, perhaps when reading the large amount of data being returned. I also get the 'Killed' output from Python, which also points towards cache issues; that is to be expected, as I'm benchmarking cache dictionaries.
Unexpected EOF while reading bytes is a Python driver error.
Check clickhouse-server.log for the real error. Also, your ClickHouse version is out of support; please upgrade.
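To tell a dropped connection apart from other failures on the client side, the query call can be wrapped. The sketch below is only an illustration: execute_with_retry is a hypothetical helper, and it assumes the driver surfaces this error as Python's built-in EOFError. It takes any callable, so with clickhouse_driver you would pass client.execute.

```python
import time


def execute_with_retry(execute, query, retries=3, delay=0.0):
    """Call execute(query), retrying when the connection drops mid-read."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return execute(query)
        except EOFError as exc:
            # The client only sees the closed socket; the cause is on the server.
            last_error = exc
            print('attempt %d failed (%s); check clickhouse-server.log' % (attempt, exc))
            time.sleep(delay)
    raise last_error
```

If every attempt fails, the last EOFError is re-raised, so the benchmark loop still sees the failure.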
I ran into a similar problem on Ubuntu when starting the server binary directly with "2>&1 /dev/null &" to suppress the output from stderr and stdout. The Python driver threw this error, but the server kept working when I connected with the clickhouse client command-line binary. The issue was resolved by changing the server startup script to redirect only stderr with "2> /dev/null &" (see https://www.baeldung.com/linux/silencing-bash-output for the difference between 2> and 2>&1).
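What the shell does with each of these redirections can be checked with a harmless command. The sketch below uses echo in place of the server binary and a hypothetical helper run_shell; it assumes a POSIX /bin/sh. It only shows how the redirections are parsed, not why the driver failed.

```python
import subprocess


def run_shell(command):
    """Run command through the shell and return (stdout, stderr) as text."""
    done = subprocess.run(command, shell=True, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return done.stdout, done.stderr


# "2>&1 /dev/null": stderr is merged into stdout and /dev/null is just
# another argument to the command, so nothing is discarded.
merged = run_shell('echo hello 2>&1 /dev/null')

# "2> /dev/null": only stderr is discarded, stdout is untouched.
stderr_only = run_shell('echo hello 2> /dev/null')

# "> /dev/null 2>&1": stdout goes to /dev/null and stderr follows it.
silenced = run_shell('echo hello > /dev/null 2>&1')
```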
I'm trying to run my code with a multiprocessing function, but mongo keeps returning
"MongoClient opened before fork. Create MongoClient with
connect=False, or create client after forking."
I really don't understand how I can adapt my code to this.
Basically the structure is:
db = MongoClient().database
db.authenticate('user', 'password', mechanism='SCRAM-SHA-1')
collectionW = db['words']
collectionT = db['sinMemo']
collectionL = db['sinLogic']

def findW(word):
    rows = collectionW.find({"word": word})
    ind = 0
    for row in rows:
        ind += 1
        id = row["_id"]
    if ind == 0:
        a = ind
    else:
        a = id
    return a

def trainAI(stri):
    # (the code that derives word from stri is not shown)
    if findW(word) == 0:
        _id = db['words'].insert(
            {"_id": getNextSequence(db.counters, "nodeid"), "word": word})
        story = _id
    else:
        story = findW(word)

def train(index):
    # searching progress
    progFile = "./train/progress{0}.txt".format(index)
    trainFile = "./train/small_file_{0}".format(index)
    if os.path.exists(progFile):
        f = open(progFile, "r")
        ind = f.read().strip()
        if ind != "":
            i = int(ind)
        else:
            pprint("No progress saved or progress lost!")
            i = 0
    else:
        i = 0
    #get the number of line of the file
    rangeC = rawbigcount(trainFile)
    #fix unicode
    non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
    files = io.open(trainFile, "r", encoding="utf8")
    str1 = ""
    str2 = ""
    filex = open(progFile, "w")
    with progressbar.ProgressBar(max_value=rangeC) as bar:
        for line in files:
            line = line.replace("\n", "")
            if i % 2 == 0:
                str1 = line.translate(non_bmp_map)
            else:
                str2 = line.translate(non_bmp_map)
                trainAI(str1 + " " + str2)
            i += 1

#multiprocessing function
maxProcess = 3

def f(l, i):
    train(i + 1)

if __name__ == '__main__':
    lock = Lock()
    for num in range(maxProcess):
        pprint("start " + str(num))
        Process(target=f, args=(lock, num)).start()
This code is meant to read 4 different files in 4 different processes and insert the data into the database at the same time.
I copied only part of the code so that you can understand its structure.
I've tried adding connect=False to this code, but it made no difference:
db = MongoClient(connect=False).database
db.authenticate('user', 'password', mechanism='SCRAM-SHA-1')
collectionW = db['words']
collectionT = db['sinMemo']
collectionL = db['sinLogic']
Then I tried moving it into the f function (right before train()), but the program then can't find collectionW, collectionT and collectionL.
I'm not very experienced with Python or MongoDB, so I hope this is not a silly question.
The code is running under Ubuntu 16.04.2 with python 2.7.12
db.authenticate has to connect to the mongo server, so it opens a connection. Even though connect=False is being used, db.authenticate requires a connection to be open.
Why don't you create the mongo client instance after the fork? That looks like the easiest solution.
Since db.authenticate must open the MongoClient and connect to the server, it creates connections which won't work in the forked subprocess. Hence, the error message. Try this instead:
db = MongoClient('mongodb://user:password@localhost', connect=False).database
Also, delete the Lock: it is passed to f but never acquired, so it has no effect.
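Another way to guarantee that no client crosses a fork is to create it lazily and remember which process created it. The following is a minimal sketch, not pymongo API: PerProcessClient is a hypothetical helper, and the factory argument stands for whatever builds your client, for example lambda: MongoClient('mongodb://user:password@localhost').

```python
import os


class PerProcessClient(object):
    """Hand out one client per process, created on first use in that process."""

    def __init__(self, factory):
        self._factory = factory  # zero-argument callable that builds a client
        self._client = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        # After a fork the stored pid no longer matches, so build a fresh client.
        if self._client is None or self._pid != pid:
            self._client = self._factory()
            self._pid = pid
        return self._client
```

Functions such as findW would then call holder.get() and look up the database and collection on the result instead of using module-level collection objects created before the fork.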
Here is how I did it for my problem:
import pathos.pools as pp
import time
import db_access

class MultiprocessingTest(object):

    def __init__(self):
        pass

    def test_mp(self):
        data = [[form,'form_number','client_id'] for form in range(5000)]
        pool = pp.ProcessPool(4)
        pool.map(db_access.insertData, data)

if __name__ == '__main__':
    time_i = time.time()
    mp = MultiprocessingTest()
    mp.test_mp()
    time_f = time.time()
    print 'Time Taken: ', time_f - time_i
Here is db_access.py:
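from pymongo import MongoClient

def insertData(form):
    # the client is created inside the worker, i.e. after the fork
    client = MongoClient()
    db = client['TEST_001']
    # NOTE: the original insert call was lost; the collection name 'forms' is a placeholder
    db['forms'].insert_one({
        "form": form[0],
        "form_number": form[1],
        "client_id": form[2]
    })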
This happens in your code because you instantiate MongoClient() once for all the subprocesses, and MongoClient is not fork-safe. Instantiating it inside the function that each worker runs works. Let me know if there are other solutions.
I am writing a program that requires the use of XMODEM to transfer data from a sensor device. I'd like to avoid having to write my own XMODEM code, so I was wondering if anyone knew if there was a python XMODEM module available anywhere?
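# XMODEM control bytes
SOH, EOT, ACK, NAK = '\x01', '\x04', '\x06', '\x15'

def xmodem_send(serial, file):
    t, anim = 0, '|/-\\'
    # wait for the receiver's initial NAK, giving up after 60 reads
    while 1:
        if serial.read(1) != NAK:
            t = t + 1
            print anim[t%len(anim)],'\r',
            if t == 60 : return False
        else:
            break
    p = 1
    s = file.read(128)
    while s:
        # pad the last block to 128 bytes and compute the 8-bit checksum
        s = s + '\xFF'*(128 - len(s))
        chk = 0
        for c in s:
            chk += ord(c)
        # resend the block until it is acknowledged
        while 1:
            serial.write(SOH)
            serial.write(chr(p))
            serial.write(chr(255 - p))
            serial.write(s)
            serial.write(chr(chk%256))
            answer = serial.read(1)
            if answer == NAK: continue
            if answer == ACK: break
            return False
        s = file.read(128)
        p = (p + 1)%256
        print '.',
    serial.write(EOT)
    return True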
There is an xmodem module on PyPI. It handles both sending and receiving of data with XMODEM. Below is a sample of its usage:
import serial
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO
from xmodem import XMODEM, NAK
from time import sleep

def readUntil(char = None):
    def serialPortReader():
        while True:
            tmp = port.read(1)
            if not tmp or (char and char == tmp):
                break
            yield tmp
    return ''.join(serialPortReader())

def getc(size, timeout=1):
    return port.read(size)

def putc(data, timeout=1):
    port.write(data)
    sleep(0.001) # give device time to prepare new buffer and start sending it

port = serial.Serial(port='COM5',parity=serial.PARITY_NONE,bytesize=serial.EIGHTBITS,stopbits=serial.STOPBITS_ONE,timeout=0,xonxoff=0,rtscts=0,dsrdtr=0,baudrate=115200)
port.write("command that initiates xmodem send from device\r\n")
sleep(0.02) # give device time to handle command and start sending response
buffer = StringIO()
XMODEM(getc, putc).recv(buffer, crc_mode = 0, quiet = 1)
contents = buffer.getvalue()
I think you’re stuck with rolling your own.
You might be able to use sz, which implements X/Y/ZMODEM. You could call out to the binary, or port the necessary code to Python.
Here is a link to XMODEM documentation that will be useful if you have to write your own. It has detailed description of the original XMODEM, XMODEM-CRC and XMODEM-1K.
You might also find this c-code of interest.
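If you do end up writing your own, the block layout of the original XMODEM and of XMODEM-CRC is small enough to sketch. The helper names below (checksum8, crc16_xmodem, build_packet) are hypothetical, and the padding byte 0x1A is an assumption based on common practice; the hand-written sender earlier in this thread pads with 0xFF instead.

```python
SOH = b'\x01'  # start-of-header byte for a 128-byte block


def checksum8(data):
    """Original XMODEM trailer: sum of the data bytes modulo 256."""
    return sum(data) % 256


def crc16_xmodem(data):
    """XMODEM-CRC trailer: CRC-16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_packet(block_number, payload, use_crc=False, pad=b'\x1a'):
    """Return SOH, block number, its complement, 128 data bytes and the trailer."""
    if len(payload) > 128:
        raise ValueError('payload must be at most 128 bytes')
    data = payload.ljust(128, pad)
    seq = block_number % 256
    header = SOH + bytes([seq, 255 - seq])
    if use_crc:
        trailer = crc16_xmodem(data).to_bytes(2, 'big')
    else:
        trailer = bytes([checksum8(data)])
    return header + data + trailer
```

A checksum packet is 132 bytes long and a CRC packet is 133 bytes long; the sender repeats the packet on NAK and moves to the next block on ACK.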
You can try using SWIG to create Python bindings for the C libraries linked above (or any other C/C++ libraries you find online). That will allow you to use the same C API directly from Python.
The actual implementation will of course still be in C/C++, since SWIG merely creates bindings to the functions of interest.
There is a python module that you can use -> https://pypi.python.org/pypi/xmodem
You can see the transfer protocol in http://pythonhosted.org//xmodem/xmodem.html