How to open a .pkl file - python

Although there are already a lot of threads related to my question, the answers are usually not understandable to me, as I am just a beginner at writing scripts in Python.
Here is my situation :
There's a machine learning software that writes models in a .pkl format at the end of its learning phase. I would like to make those model.pkl files openable by an operator to check what there is inside the model. Thus I began to write a script that would use the pickle.load method and write the data contained in my model.pkl into a .txt file. Here's what I wrote to begin with:
import pickle
model_path = input("Model Path = ")
with open(model_path, "rb") as model:
    load = pickle.load(model, encoding='utf-8')
new_model_path = model_path.split('.pkl')[0] + '.txt'
print("creating new file at : ", new_model_path)
# open the text file for writing ('w'); 'rt' would open it read-only
with open(new_model_path, 'w') as model_readable:
    # write() needs a string, so write the object's text representation
    model_readable.write(str(load))
print("writing model as readable : ", load)
If I try to run it here's the output :
python3.7 unpickler.py
Model Path = /home/ouriacc/Desktop/workspace/SESAM/Base_de_tests/Anomalie_1/Models/OCSVM/EyeSat/CI_HEATER_CAMERA_VOLTAGE.pkl
Traceback (most recent call last):
  File "unpickler.py", line 7, in <module>
    load = pickle.load(model, encoding='utf-8')
_pickle.UnpicklingError: invalid load key, '_'.
I couldn't find any explanation of this error that didn't involve an incomplete or corrupted download, which can't be my case here, as the model.pkl files are not modified once they've been created by the AI software.
Could someone help me solve the error, or point me to another method to achieve my goal? All I need is a script that lets a user see what the .pkl file contains.
Thank you very much !

So I figured out why #wundermahn asked about scikit-learn. It seems my model.pkl files were generated by joblib and not by the pickle library itself. Apparently this is why it wouldn't work. I changed my code by replacing pickle.load() with joblib.load() and it works better!
Thank you !
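For anyone landing here with the same error, here is a minimal sketch of the fix described above. It assumes the file may be either a plain pickle or a joblib dump; the helper names `load_model` and `dump_readable` are made up for illustration, and joblib is a third-party package that is only imported if the plain pickle attempt fails. Only unpickle files you trust, since loading a pickle can run arbitrary code.

```python
import pickle
import pprint


def load_model(path):
    """Load a model file, trying plain pickle first and joblib second."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except pickle.UnpicklingError:
        # Assumption: the file was written by joblib.dump, as in the
        # question above. joblib must be installed for this branch.
        import joblib
        return joblib.load(path)


def dump_readable(path, txt_path):
    """Write a text description of the loaded object to txt_path."""
    obj = load_model(path)
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write(f"type: {type(obj)!r}\n")
        # scikit-learn estimators expose get_params(); any other object
        # falls back to its pretty-printed representation.
        params = obj.get_params() if hasattr(obj, "get_params") else obj
        out.write(pprint.pformat(params) + "\n")
    return obj
```

Calling `dump_readable("model.pkl", "model.txt")` would then give the operator a text file with the object's type and, for scikit-learn style models, its parameters.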

Related

RedVox Python SDK | Not Reading in .rdvxz Files

I'm attempting to read in a series of files for processing contained in a single directory using RedVox:
input_directory = "/home/ben/Documents/Data/F1D1/21" # file location
rdvx_data = DataWindow(input_dir=input_directory, apply_correction=False, debug=True) # using RedVox to read in the files
print(os.listdir(input_directory)) # verifying the files actually exist...
# returns "['file1.rdvxz', 'file2.rdvxz', 'file3.rdvxz', ...etc]", they exist
# write audio portion to file
rdvx_data.to_json_file(base_dir=output_rpd_directory,
                       file_name=output_filename)
# this never runs, because rdvx_data.stations = [] (verified through debugging)
for station in rdvx_data.stations:
    # some code here
Enabling debugging through the arguments, as seen above, does not provide any extra details. In fact, there is no error message whatsoever. It writes the JSON file and the pickle to disk, but the JSON file is full of null values and the pickle object is just a shell with no contents. So the files definitely exist and os.listdir() sees them, but RedVox does not.
I assume this is some very silly error or lack of understanding on my part. Any help is greatly appreciated. I have not worked with RedVox previously, nor do I have much understanding of what these files contain other than some audio data and some other data. I've simply been tasked with opening them to work on a model to analyze the data within.
SOLVED: Not sure why the previous code doesn't work (it was handed to me), however, I worked around the DataWindow call and went straight to calling the "redvox.api900.reader" object:
from glob import glob
from redvox.api900 import reader
dataset_dir = "/home/*****/Documents/Data/F1D1/21/"
rdvx_files = glob(dataset_dir + "*.rdvxz")
for file in rdvx_files:
    wrapped_packet = reader.read_rdvxz_file(file)
From here I can view all of the sensor data within:
if wrapped_packet.has_microphone_sensor():
    microphone_sensor = wrapped_packet.microphone_sensor()
    print("sample_rate_hz", microphone_sensor.sample_rate_hz())
Hope this helps anyone else who's confused.

Trouble reading npy file

I am new to python and am having trouble reading a *.npy file that somebody else saved. If I use the following commands:
import numpy as np
np.load('lat.npy')
I get the following error:
ValueError: Cannot load file containing pickled data when allow_pickle=False
So, I set allow_pickle=True:
np.load('lat.npy',allow_pickle=True)
Then, I get a different error:
OSError: Failed to interpret file 'lat.npy' as a pickle
Maybe it is relevant that I am on a PC, and the other file was written on a Mac.
Am I doing something wrong? (I am sorry if this question has been asked already.) Thank you!
I learned that my colleague's data file was written in python 2, while I am using python 3. Using the np.load command with the following options will work:
np.load('lat.npy',allow_pickle=True,fix_imports=True,encoding='latin1')
It seems I need to set all of those options, but the 'encoding' argument seems especially important. The doc for numpy.load says about the encoding argument, "Only useful when loading Python 2 generated pickled files in Python 3, which includes npy/npz files containing object arrays."
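To see what the encoding argument does, here is a small sketch that uses only the standard pickle module, to which np.load passes encoding and fix_imports when a file contains pickled objects. The byte string PY2_PICKLE is hand-built for illustration and stands in for data saved under Python 2; it is not taken from the lat.npy file in the question.

```python
import pickle

# Hand-built protocol-2 pickle of the Python 2 byte string 'caf\xe9'
# (the Latin-1 bytes for "café"). Illustrative data only.
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."


def load_py2(data, encoding):
    """Unpickle Python 2 data, decoding its 8-bit strings with `encoding`."""
    return pickle.loads(data, fix_imports=True, encoding=encoding)


# With the default encoding ('ASCII'), the non-ASCII byte cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
except UnicodeDecodeError as exc:
    print("default encoding failed:", exc)

# 'latin1' maps every byte to a character, so decoding always succeeds.
print(load_py2(PY2_PICKLE, "latin1"))

# 'bytes' skips decoding and returns the raw bytes object instead.
print(load_py2(PY2_PICKLE, "bytes"))
```

Because 'latin1' can decode any byte value, it is the usual choice when the original encoding of Python 2 strings is unknown.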

What happened when I used pandas to read csv files for multiple time in kaggle's notebook?

I am participating in Kaggle's NCAA March Madness Analytics competition. I used pandas to read the data from CSV files but ran into the following problem:
seeds = pd.read_csv('/kaggle/input/march-madness-analytics-2020/2020DataFiles/2020DataFiles/2020-Womens-Data/WDataFiles_Stage1/WNCAATourneySeeds.csv')
seeds
Here the output is empty. So I tried this next:
rank = seeds.merge(teams)
That raised an error:
NameError: name 'seeds' is not defined.
I can't figure out what happened. I also tried it offline, and the problem did not occur there. Am I missing something, and how can I fix it? Note that this was not the first time I used read_csv() to read data from a CSV file in this notebook, though I can't tell whether that is related to the problem.
One option is to put the CSV file in the notebook's working directory.
Run this to find out the destination:
%pwd
Put the file in the destination and run this:
seeds = pd.read_csv('WNCAATourneySeeds.csv')
You can also run this:
seeds = pd.read_csv(r'C:\Users....\WNCAATourneySeeds.csv')
Replace "C" with the drive where your file is saved and "...." with the path to the file on your computer. The r prefix keeps the backslashes literal; forward slashes also work in Python on Windows.
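If you are not sure whether the path is right, a small check like the one below can help. This is only a sketch: `locate_csv` is a made-up helper name, and `os.getcwd()` is the plain-Python counterpart of the %pwd command above.

```python
import os
from pathlib import Path


def locate_csv(path):
    """Return the resolved path of an existing file, or raise with context."""
    candidate = Path(path)
    if not candidate.is_file():
        # Include the working directory so a wrong relative path is easy to spot.
        raise FileNotFoundError(
            f"{candidate} not found; working directory is {os.getcwd()}"
        )
    return candidate.resolve()
```

You could then write `seeds = pd.read_csv(locate_csv('WNCAATourneySeeds.csv'))`, which fails with a clear message if the file is not where you expect.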
I finally found the problem. I didn't notice that I was writing my code in a Markdown cell, so it was never executed. Stupid me!

Phonebook in Python

I am writing a test console program, a phonebook, in Python. My IDE is JetBrains PyCharm. I have 5 functions: Search contact, Enter contact, Delete contact, All phones and Exit. My question is: how can I make the program save its information to a text file, so that the information is still there the next time I run it?
You could write the data to a csv file. https://docs.python.org/3/library/csv.html
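The csv suggestion could look roughly like this. It is a minimal sketch that assumes the phonebook is kept in a dict mapping names to phone numbers; the function names and the "name"/"phone" column headers are made up for illustration.

```python
import csv


def save_contacts(path, contacts):
    """Write a {name: phone} dict to a CSV file, one contact per row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "phone"])  # header row
        for name, phone in contacts.items():
            writer.writerow([name, phone])


def load_contacts(path):
    """Read the CSV file back into a {name: phone} dict."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["name"]: row["phone"] for row in csv.DictReader(f)}
```

Call `load_contacts` when the program starts and `save_contacts` after each change (or on Exit), so the data survives between runs.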
This should help : https://docs.python.org/2/tutorial/inputoutput.html
I think for your case the easiest would be to write data to a python file as follows:
contact_names = [contact_name1, contact_name2]  # list every contact name here
with open('PathToAFile/MyFile.py', 'w') as f:
    # repr() writes the names with quotes, so MyFile.py is valid Python
    f.write('contact_names = ' + repr(contact_names) + '\n')
This makes it easy to load the data later without having to parse it (as you would with a csv).
In the example code I provide, I'm saving your contact names to a list called 'contact_names' in a Python file named 'MyFile.py'. When you execute 'MyFile.py' you will have access to the 'contact_names' variable.

Reading MSWord file at run time

The structure of the file is not important to me, so, following a previous solution that suggested "converting them to plain text and importing them with readLines", I changed the file type from ".doc/.docx" to ".txt" and ended up with a warning:
file_list = list.files("D:/R/New",pattern="*.txt",full.names=F)
obj_list <- lapply(file_list,readLines)
Warning messages:
1: In FUN(c("adityar.txt":
incomplete final line found on 'adityar.txt'
I have also tried reading the files with a corpus but didn't get good results. The second solution there talks about PDF and Unix. Is there a better and faster approach? I am working on Windows. Any help is appreciated.
Using Python, you can do this:
from docx import *
import json
document = opendocx("path_to_your_docx")
res = getdocumenttext(document)
You can save your script and call it from R using system().
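Note that renaming a .docx file to .txt does not convert it: a .docx file is a zip archive whose main text lives in word/document.xml. If the docx package is not available, the text can be pulled out with the standard library alone. This is only a sketch: `docx_to_text` is a made-up name, it handles .docx but not the older binary .doc format, and it ignores tables, headers and footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the paragraph text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each run of text inside a paragraph lives in a <w:t> element.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

The script can write the returned string to a real .txt file, which readLines can then read.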
