Error shows up when using df.to_parquet("filename") - python

I want to save the data set as a parquet file called power.parquet, and I use df.to_parquet(<filename>). But it gives me this error: "ValueError: Error converting column "Global_reactive_power" to bytes using encoding UTF8. Original error: bad argument type for built-in operation". I have the fastparquet package installed.
from fastparquet import write, ParquetFile
dat.to_parquet("power.parquet")
df_parquet = ParquetFile("power.parquet").to_pandas()
df_parquet.head() # Test your final value
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 259, in convert
out = array_encode_utf8(data)
File "fastparquet/speedups.pyx", line 50, in fastparquet.speedups.array_encode_utf8
TypeError: bad argument type for built-in operation
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/folders/4f/bm2th1p56tz4rq_zffc8g3940000gn/T/ipykernel_85477/3080656655.py", line 1, in <module>
dat.to_parquet("power.parquet", compression="GZIP")
File "/opt/anaconda3/lib/python3.9/site-packages/dask/dataframe/core.py", line 4560, in to_parquet
return to_parquet(self, path, *args, **kwargs)
File "/opt/anaconda3/lib/python3.9/site-packages/dask/dataframe/io/parquet/core.py", line 732, in to_parquet
return compute_as_if_collection(
File "/opt/anaconda3/lib/python3.9/site-packages/dask/base.py", line 315, in compute_as_if_collection
return schedule(dsk2, keys, **kwargs)
File "/opt/anaconda3/lib/python3.9/site-packages/dask/threaded.py", line 79, in get
results = get_async(
File "/opt/anaconda3/lib/python3.9/site-packages/dask/local.py", line 507, in get_async
raise_exception(exc, tb)
File "/opt/anaconda3/lib/python3.9/site-packages/dask/local.py", line 315, in reraise
raise exc
File "/opt/anaconda3/lib/python3.9/site-packages/dask/local.py", line 220, in execute_task
result = _execute_task(task, data)
File "/opt/anaconda3/lib/python3.9/site-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/opt/anaconda3/lib/python3.9/site-packages/dask/utils.py", line 35, in apply
return func(*args, **kwargs)
File "/opt/anaconda3/lib/python3.9/site-packages/dask/dataframe/io/parquet/fastparquet.py", line 1167, in write_partition
rg = make_part_file(
File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 716, in make_part_file
rg = make_row_group(f, data, schema, compression=compression,
File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 701, in make_row_group
chunk = write_column(f, coldata, column,
File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 554, in write_column
repetition_data, definition_data, encode[encoding](data, selement), 8 * b'\x00'
File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 354, in encode_plain
out = convert(data, se)
File "/opt/anaconda3/lib/python3.9/site-packages/fastparquet/writer.py", line 284, in convert
raise ValueError('Error converting column "%s" to bytes using '
ValueError: Error converting column "Global_reactive_power" to bytes using encoding UTF8. Original error: bad argument type for built-in operation
I tried adding object_coding = "bytes", but that did not help. How can I solve this problem?
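The traceback suggests fastparquet is treating "Global_reactive_power" as an object (string) column and failing to UTF-8 encode it, which typically happens when the column mixes numbers with placeholder strings such as "?". A minimal sketch of one thing to try, assuming dat is the dask DataFrame from the traceback, is to coerce the column to numeric before writing:
import pandas as pd

# Coerce the offending column to numbers; non-numeric strings such as "?" become NaN.
dat["Global_reactive_power"] = dat["Global_reactive_power"].map_partitions(
    pd.to_numeric, errors="coerce"
)
dat.to_parquet("power.parquet", engine="fastparquet")
If dat is a plain pandas DataFrame instead, pd.to_numeric(dat["Global_reactive_power"], errors="coerce") does the same thing without map_partitions.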

Related

File contains data in an unknown format. (m4a load from librosa)

So I am currently working on a DNN that takes in m4a files. I have ffmpeg installed; training creates a few batches and then dies with this error:
Traceback (most recent call last):
File "/users/work/s163838/./main.py", line 126, in <module>
File "/users/work/s163838/./main.py", line 96, in main
print("e")
File "/apl/tryton/python/3.9.5/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/apl/tryton/python/3.9.5/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/apl/tryton/python/3.9.5/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/apl/tryton/python/3.9.5/lib/python3.9/site-packages/torch/_utils.py", line 425, in reraise
raise self.exc_type(msg)
EOFError: Caught EOFError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/users/kdm/s163838/.local/lib/python3.9/site-packages/librosa/core/audio.py", line 164, in load
y, sr_native = __soundfile_load(path, offset, duration, dtype)
File "/users/kdm/s163838/.local/lib/python3.9/site-packages/librosa/core/audio.py", line 195, in __soundfile_load
context = sf.SoundFile(path)
File "/users/kdm/s163838/.local/lib/python3.9/site-packages/soundfile.py", line 629, in __init__
self._file = self._open(file, mode_int, closefd)
File "/users/kdm/s163838/.local/lib/python3.9/site-packages/soundfile.py", line 1183, in _open
_error_check(_snd.sf_error(file_ptr),
File "/users/kdm/s163838/.local/lib/python3.9/site-packages/soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'vox2/dev/aac/id08194/QnBYPze-x9A/00079.m4a': File contains data in an unknown format.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/apl/tryton/python/3.9.5/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/apl/tryton/python/3.9.5/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/apl/tryton/python/3.9.5/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/users/work/s163838/vox_celeb_loader.py", line 53, in __getitem__
load(speaker2utt1, self.num_samples)
File "/users/work/s163838/vox_celeb_loader.py", line 13, in load
wav, sr = librosa.load(path, sr=16000)
File "/users/kdm/s163838/.local/lib/python3.9/site-packages/librosa/util/decorators.py", line 88, in inner_f
return f(*args, **kwargs)
File "/users/kdm/s163838/.local/lib/python3.9/site-packages/librosa/core/audio.py", line 170, in load
y, sr_native = __audioread_load(path, offset, duration, dtype)
File "/users/kdm/s163838/.local/lib/python3.9/site-packages/librosa/core/audio.py", line 226, in __audioread_load
reader = audioread.audio_open(path)
File "/users/kdm/s163838/.local/lib/python3.9/site-packages/audioread/__init__.py", line 111, in audio_open
return BackendClass(path)
File "/users/kdm/s163838/.local/lib/python3.9/site-packages/audioread/rawread.py", line 65, in __init__
self._file = aifc.open(self._fh)
File "/apl/tryton/python/3.9.5/lib/python3.9/aifc.py", line 917, in open
return Aifc_read(f)
File "/apl/tryton/python/3.9.5/lib/python3.9/aifc.py", line 358, in __init__
self.initfp(f)
File "/apl/tryton/python/3.9.5/lib/python3.9/aifc.py", line 314, in initfp
chunk = Chunk(file)
File "/apl/tryton/python/3.9.5/lib/python3.9/chunk.py", line 63, in __init__
raise EOFError
EOFError
I am loading the audio with this call:
wav, sr = librosa.load(path, sr=16000)
Is it just a broken file? If so, how do I skip such files? Or is it something about loading an m4a file, even though ffmpeg is installed and I get the expected output when testing on a single m4a file?
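Since the EOFError is raised inside the DataLoader worker when a single clip cannot be decoded, one hedged way to skip unreadable files is to wrap the load call and fall back to another sample. The helper below is only a sketch, and load_waveform is a made-up name, not part of the existing loader:
import librosa

def load_waveform(path, sr=16000):
    # Return (wav, sr), or None if the file cannot be decoded.
    try:
        return librosa.load(path, sr=sr)
    except (EOFError, RuntimeError) as err:  # soundfile/audioread decode failures
        print(f"Skipping unreadable file {path}: {err}")
        return None
Inside the dataset's __getitem__ you could then move on to the next index whenever load_waveform returns None, so one corrupt file does not kill the whole worker.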

Change the wrapper object to a different class in pywinauto

I'm trying to set a value on this dropdown but am having trouble doing so. I've tried using the .select(index) method and that doesn't do anything. I've tried doing .type_keys("{DOWN 2}"), which usually works, but I think because another dropdown is selected beforehand or something, it's not working. My usual workaround for this is to do .expand(), then .type_keys("{DOWN 2}"), and then .type_keys("{ENTER}"). I can't use this last workaround because the control is not being wrapped as a combobox, as it should be.
Is there a way I can change its wrapper? I tried:
from pywinauto import controls
test = controls.uia_controls.ComboBoxWrapper(formJobStock.child_window(auto_id='cboStockSize'))
test.expand()
but I get:
Traceback (most recent call last):
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\application.py", line 250, in __resolve_control
ctrl = wait_until_passes(
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\timings.py", line 458, in wait_until_passes
raise err
pywinauto.timings.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\controls\uia_controls.py", line 143, in expand
if self.is_expanded():
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\controls\uiawrapper.py", line 561, in is_expanded
state = self.get_expand_state()
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\controls\uia_controls.py", line 188, in get_expand_state
return super(ComboBoxWrapper, self).get_expand_state()
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\controls\uiawrapper.py", line 556, in get_expand_state
return self.iface_expand_collapse.CurrentExpandCollapseState
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\controls\uiawrapper.py", line 132, in __get__
value = self.fget(obj)
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\controls\uiawrapper.py", line 210, in iface_expand_collapse
return uia_defs.get_elem_interface(elem, "ExpandCollapse")
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\uia_defines.py", line 233, in get_elem_interface
cur_ptrn = element_info.GetCurrentPattern(ptrn_id)
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\application.py", line 379, in __getattribute__
ctrls = self.__resolve_control(self.criteria)
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\application.py", line 261, in __resolve_control
raise e.original_exception
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\timings.py", line 436, in wait_until_passes
func_val = func(*args, **kwargs)
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\application.py", line 222, in __get_ctrl
ctrl = self.backend.generic_wrapper_class(findwindows.find_element(**ctrl_criteria))
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\findwindows.py", line 84, in find_element
elements = find_elements(**kwargs)
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\findwindows.py", line 305, in find_elements
elements = findbestmatch.find_best_control_matches(best_match, wrapped_elems)
File "C:\Users\mflanagan\AppData\Local\Programs\Python\Python39\lib\site-packages\pywinauto\findbestmatch.py", line 536, in find_best_control_matches
raise MatchError(items = name_control_map.keys(), tofind = search_text)
pywinauto.findbestmatch.MatchError: Could not find 'element' in 'dict_keys(['Edit', 'Edit0', 'Edit1', 'Edit2', 'Open', 'Button', 'OpenButton'])'
where formJobStock is a window specification:
formJobStock = window.child_window(auto_id='frmJobStock')
and I know it's found because I've done formJobStock.wrapper_object() and it comes up correctly.
It looks like I'm not passing the element parameter correctly to the controls.uia_controls.ComboBoxWrapper class init call... any idea on how to do that correctly?
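The MatchError suggests the wrapper is being handed the window specification itself rather than a resolved element. A hedged sketch of one way around that, assuming the usual pywinauto pattern of resolving the specification first and passing its element_info to the wrapper class:
from pywinauto.controls import uia_controls

# Resolve the specification to a live control, then wrap its element_info.
ctrl = formJobStock.child_window(auto_id='cboStockSize').wrapper_object()
combo = uia_controls.ComboBoxWrapper(ctrl.element_info)
combo.expand()
This is untested against your application; if expand() still fails, the control may genuinely not expose the UIA ExpandCollapse pattern.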

error on search image in python image_match library

I'm using the Python image_match library. I need to use the search_image method of this library, but when I use this method I get the error below:
Traceback (most recent call last):
File "/var/www/html/Panel/test2.py", line 16, in <module>
ses.search_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
File "/usr/local/lib/python3.10/site-packages/image_match/signature_database_base.py", line 268, in search_image
transformed_record = make_record(img, self.gis, self.k, self.N)
File "/usr/local/lib/python3.10/site-packages/image_match/signature_database_base.py", line 356, in make_record
signature = gis.generate_signature(path)
File "/usr/local/lib/python3.10/site-packages/image_match/goldberg.py", line 161, in generate_signature
im_array = self.preprocess_image(path_or_image, handle_mpo=self.handle_mpo, bytestream=bytestream)
File "/usr/local/lib/python3.10/site-packages/image_match/goldberg.py", line 257, in preprocess_image
return rgb2gray(image_or_path)
File "/usr/local/lib/python3.10/site-packages/skimage/_shared/utils.py", line 394, in fixed_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/skimage/color/colorconv.py", line 875, in rgb2gray
rgb = _prepare_colorarray(rgb)
File "/usr/local/lib/python3.10/site-packages/skimage/color/colorconv.py", line 140, in _prepare_colorarray
raise ValueError(msg)
ValueError: the input array must have size 3 along `channel_axis`, got (1024, 687)
Can you please help me?
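The ValueError says the decoded array has shape (1024, 687), i.e. a single-channel grayscale image, while rgb2gray expects three channels. As a hedged workaround, you could download the image yourself, force it to RGB with Pillow, and hand search_image a local path instead of the URL (this assumes requests and Pillow are available, that ses is your existing SignatureES object, and that search_image accepts a local file path the same way it accepts a URL):
import io
import requests
from PIL import Image

url = ('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/'
       'Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/'
       '687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')

# Force the image to 3-channel RGB and save a local copy before searching.
img = Image.open(io.BytesIO(requests.get(url).content)).convert('RGB')
img.save('mona_lisa_rgb.jpg')

ses.search_image('mona_lisa_rgb.jpg')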

Problem with adding Excel files at Pandas | wrapper return func

Hi everybody, I have a problem loading an Excel file with Pandas.
The file comes from an archive; if I load it directly it gives me an error, but if I copy and paste the contents into a new Excel file there is no problem.
The code is very easy:
data = pd.read_excel(r"C:\Users\obett\Desktop\Corporate Governance\pandas.xlsx")
and this is the error:
Traceback (most recent call last):
File "C:/Users/obett/PycharmProjects/pythonProject6/main.py", line 24, in <module>
data = pd.read_excel(r"C:\Users\obett\Desktop\Corporate Governance\Aida_Export_67.xlsx")
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\pandas\io\excel\_base.py", line 344, in read_excel
data = io.parse(
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\pandas\io\excel\_base.py", line 1170, in parse
return self._reader.parse(
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\pandas\io\excel\_base.py", line 492, in parse
data = self.get_sheet_data(sheet, convert_float)
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\pandas\io\excel\_openpyxl.py", line 549, in get_sheet_data
converted_row = [self._convert_cell(cell, convert_float) for cell in row]
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\pandas\io\excel\_openpyxl.py", line 549, in <listcomp>
converted_row = [self._convert_cell(cell, convert_float) for cell in row]
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\pandas\io\excel\_openpyxl.py", line 514, in _convert_cell
elif cell.is_date:
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\openpyxl\cell\read_only.py", line 101, in is_date
return Cell.is_date.__get__(self)
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\openpyxl\cell\cell.py", line 256, in is_date
self.data_type == 'n' and is_date_format(self.number_format)
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\openpyxl\cell\read_only.py", line 66, in number_format
_id = self.style_array.numFmtId
File "C:\Users\obett\PycharmProjects\pythonProject6\venv\lib\site-packages\openpyxl\cell\read_only.py", line 56, in style_array
return self.parent.parent._cell_styles[self._style_id]
IndexError: list index out of range
Thank you very much
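The IndexError comes from openpyxl's read-only cell-style lookup, which points at style records in the exported file that openpyxl cannot resolve; that also fits the observation that a copied-and-re-saved file loads fine. Re-saving the file in Excel or LibreOffice is the simplest fix. As a hedged workaround, not guaranteed to work if the styles part is truly corrupt, you could bypass pandas' read-only path and build the DataFrame from the raw cell values yourself:
import pandas as pd
from openpyxl import load_workbook

# Load the workbook in normal (non read-only) mode, cell values only.
wb = load_workbook(r"C:\Users\obett\Desktop\Corporate Governance\pandas.xlsx",
                   data_only=True)
ws = wb.active
rows = list(ws.values)
data = pd.DataFrame(rows[1:], columns=rows[0])  # assumes the first row is the header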

How do I reformat this json for database import?

I have these "json" files that I would like to insert into my mongodb database.
An example of one is:
http://s.live.ksmobile.net/cheetahlive/de/ff/15201023827214369775/15201023827214369775.json
The problem is that each file is formatted like this:
{ "channelType":"TEMPGROUP", ... } # line 1
{ "channelType":"TEMPGROUP", ... } # line 2
So instead of it being inserted as 1 document in the DB, every single line gets inserted as 1 entry. That means what should be 3 documents from 3 "json" files ends up as 1189 documents in the database instead.
How can I insert the whole content of the ".json" into one document?
My code is:
import json
import urllib.parse
import urllib.request
import requests

replay_url = "http://live.ksmobile.net/live/getreplayvideos?"
userid = 969730808384462848
url2 = replay_url + urllib.parse.urlencode({'userid': userid}) + '&page_size=1000'
raw_replay_data = requests.get(url2).json()

# messages is an existing pymongo collection
for i in raw_replay_data['data']['video_info']:
    url3 = i['msgfile']
    raw_message_data = urllib.request.urlopen(url3)
    for line in raw_message_data:
        json_data = json.loads(line)
        messages.insert_one(json_data)
        print(json_data)
Update with more information:
messages.insert(json_data) gives this error:
Traceback (most recent call last):
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/collection.py", line 633, in _insert
blk.execute(concern, session=session)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/bulk.py", line 432, in execute
return self.execute_command(generator, write_concern, session)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/bulk.py", line 329, in execute_command
raise BulkWriteError(full_result)
pymongo.errors.BulkWriteError: batch op errors occurred
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/import_messages_dev.py", line 43, in <module>
messages.insert(json_data)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/collection.py", line 2941, in insert
check_keys, manipulate, write_concern)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/collection.py", line 635, in _insert
_raise_last_error(bwe.details)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/helpers.py", line 220, in _raise_last_error
_raise_last_write_error(write_errors)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/helpers.py", line 188, in _raise_last_write_error
raise DuplicateKeyError(error.get("errmsg"), 11000, error)
pymongo.errors.DuplicateKeyError: E11000 duplicate key error index: liveme.messages.$_id_ dup key: { : ObjectId('5aa2fc6f5d60126499060949') }
messages.insert_one(json_data) gives me this error:
Traceback (most recent call last):
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/import_messages_dev.py", line 43, in <module>
messages.insert_one(json_data)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/collection.py", line 676, in insert_one
common.validate_is_document_type("document", document)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/common.py", line 434, in validate_is_document_type
"collections.MutableMapping" % (option,))
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
messages.insert_many(json_data) gives me this error:
Traceback (most recent call last):
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/import_messages_dev.py", line 43, in <module>
messages.insert_many(json_data)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/collection.py", line 742, in insert_many
blk.execute(self.write_concern.document, session=session)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/bulk.py", line 432, in execute
return self.execute_command(generator, write_concern, session)
File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/bulk.py", line 329, in execute_command
raise BulkWriteError(full_result)
pymongo.errors.BulkWriteError: batch op errors occurred
messages.insert and messages.insert_many both insert 1 line and throw the error.
These files obviously do not contain properly formatted json - rather they contain a separate object on each line.
To turn them into valid json, you probably want a list of objects, i.e.:
[{ "channelType":"TEMPGROUP", ... },
{ "channelType":"TEMPGROUP", ... }]
You can achieve this by doing:
for i in raw_replay_data['data']['video_info']:
    url3 = i['msgfile']
    raw_message_data = urllib.request.urlopen(url3)
    json_data = []
    for line in raw_message_data:
        json_data.append(json.loads(line))
    # insert_one expects a mapping, so wrap the list of objects in a single document
    messages.insert_one({'messages': json_data})
    print(json_data)
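If you later decide you do want one document per chat line after all, a hedged alternative (assuming every line really is a standalone JSON object) is to collect the lines and bulk-insert them:
docs = [json.loads(line) for line in raw_message_data]
messages.insert_many(docs)  # one document per line, inserted in a single call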
