Working with Python and pickle to save complex data in object - python

I am experimenting with Python to write a script for a program that works with Python, and I need to save an object (with custom classes and arrays inside) to a file so that I can read it back later. That way I don't have to rebuild the object every time, which takes hours.
I have read in many forums that the easiest way to do that is to use pickle, but I am making a mistake somewhere and I don't understand where.
Here is the code.
First I define this class:
class Issue_class:
    Title_ID = None
    Publisher_ID = None
    Imprint_ID = None
    Volume = None
    Format = None
    Color = None
    Original = None
    Rating = None
    Issue_Date_Month = None
    Issue_Date_Year = None
    Reprint = None
    Pages = None
    Issue_Title = None
    Number = None
    Number_str = None
    Synopsis = None
    Characters_ID = None
    Groups_ID = None
    Writer_ID = None
    Inker_ID = None
    Colorist_ID = None
    Letterer_ID = None
    CoverArtist_ID = None
    Penciller_ID = None
    Editor_ID = None
    Alternatives_ID = None
    Reprints_ID = None
    Story_ID = None
    Multi = None
    Multistories = None
Then I define a list to hold instances of this class:
Issuesdata = []
Then, inside a loop, I create an instance and fill in its fields:
Issuedata = Issue_class()
Issuedata.Color = "unknown"
Issuedata.Tagline = "none"
Issuedata.Synopsis = "none"
Issuedata.Format = "none"
Issuedata.Publisher_ID = "none"
Issuedata.Imprint_ID = -1
Issuedata.Title_ID = -1
Issuedata.Volume = "none"
Issuedata.Number = -1
Issuedata.Number_str = "none"
Issuedata.Issue_Title = "none"
Issuedata.Rating = -1
Issuedata.Pages = -1
Issuedata.Issue_Date_Year = 0
Issuedata.Issue_Date_Month = 0
Issuedata.Original = True
Issuedata.Reprint = False
Issuedata.Multi = True
Issuedata.Letterer_ID = []
Issuedata.Characters_ID = []
Issuedata.Story_ID = []
Issuedata.Groups_ID = []
Issuedata.Writer_ID = []
Issuedata.Penciller_ID = []
Issuedata.Alternatives_ID = []
Issuedata.Reprints_ID = []
Issuedata.Inker_ID = []
Issuedata.Colorist_ID = []
Issuedata.Editor_ID = []
Issuedata.CoverArtist_ID = []
Issuedata.Multistories = []
Then I work with the data inside the object, and when it is complete, I append it to the list:
Issuesdata.append(Issuedata)
After that I print some info inside one of the objects in the list to be sure everything is ok:
print Issuesdata[3].Title_ID
print Issuesdata[3].Publisher_ID
print Issuesdata[3].Imprint_ID
print Issuesdata[3].Volume
print Issuesdata[3].Format
etc...
And everything is OK; the printed data is correct.
Now, I try to save the list to a file with:
filehandler = open("data.dat","wb")
pickle.dump(Issuesdata,filehandler)
filehandler.close()
This creates the file with data inside, but when I try to read it with:
file = open("data.dat",'rb')
Issuesdat = pickle.load(file)
file.close()
the Python console tells me "'module' object has no attribute 'Issue_class'".
My first thought was that I was reading the file wrong. But then I opened the saved file in Notepad and it was full of "wrong data", such as file names and names of classes from outside my code, which makes me suspect I am dumping the data to the file incorrectly.
Am I using pickle wrong?

OK, I found the problem. It seems the class of your object has to be defined in the main module for pickle to find it. I had it defined in the module I was working in, the one from which I was calling pickle.
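For context on why the location of the class matters: pickle does not store the class code, only a reference to it (module name plus class name), so the class has to be findable under that same name when the file is loaded. Here is a minimal Python 3 sketch of a round trip. The Issue class is a hypothetical, trimmed-down stand-in for Issue_class, and the data is serialized to bytes instead of to data.dat to keep the example self-contained.

```python
import pickle


class Issue:
    """Hypothetical, trimmed-down stand-in for Issue_class."""

    def __init__(self, title_id=-1, color="unknown"):
        self.title_id = title_id
        self.color = color
        self.writer_ids = []


issues = [Issue(title_id=n) for n in range(3)]

# Serialize to bytes; pickle.dump(issues, filehandler) writes the same bytes to a file.
payload = pickle.dumps(issues)

# Loading works here because Issue is defined in the module that does the loading.
# The payload only names the class; it does not contain the class code.
restored = pickle.loads(payload)
print(restored[2].title_id)  # prints 2
```

If the class lives in its own module, importing that module under the same name before calling pickle.load should have the same effect.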

Try using the pandas library, which has simple functions for this:
DataFrame.to_pickle(file-path) saves a pandas DataFrame to a pickle file.
pandas.read_pickle(file-path) reads a pickle file back.
See the pandas reference for to_pickle and read_pickle.

Related

What is the best way to return a variable or call a function to maximize code reuse?

I was wondering if I could get some input from some seasoned Python experts. I have a couple of questions.
I am extracting data from an API request and calculating the total vulnerabilities.
What is the best way to return this data so that I can use it in another function?
How can I add up all the vulnerabilities? Right now it only adds them up 500 at a time; I'd like the sum over every vulnerability.
def _request():
    third_party_patching_filer = {
        "asset": "asset.agentKey IS NOT NULL",
        "vulnerability" : "vulnerability.categories NOT IN ['microsoft patch']"}
    headers = _headers()
    print(headers)
    url1 = f"https://us.api.insight.rapid7.com/vm/v4/integration/assets"
    resp = requests.post(url=url1, headers=headers, json=third_party_patching_filer, verify=False).json()
    jsonData = resp
    #print(jsonData)
    has_next_cursor = False
    nextKey = ""
    if "cursor" in jsonData["metadata"]:
        has_next_cursor = True
        nextKey = jsonData["metadata"]["cursor"]
    while has_next_cursor:
        url2 = f"https://us.api.insight.rapid7.com/vm/v4/integration/assets?&size=500&cursor={nextKey}"
        resp2 = requests.post(url=url2, headers=headers, json=third_party_patching_filer, verify=False).json()
        cursor = resp2["metadata"]
        print(cursor)
        if "cursor" in cursor:
            nextKey = cursor["cursor"]
            print(f"next key {nextKey}")
            #print(desktop_support)
            for data in resp2["data"]:
                for tags in data['tags']:
                    total_critical_vul_osswin = []
                    total_severe_vul_osswin = []
                    total_modoer_vuln_osswin = []
                    if tags["name"] == 'OSSWIN':
                        print("OSSWIN")
                        critical_vuln_osswin = data['critical_vulnerabilities']
                        severe_vuln_osswin = data['severe_vulnerabilities']
                        modoer_vuln_osswin = data['moderate_vulnerabilities']
                        total_critical_vul_osswin.append(critical_vuln_osswin)
                        total_severe_vul_osswin.append(severe_vuln_osswin)
                        total_modoer_vuln_osswin.append(modoer_vuln_osswin)
                        print(sum(total_critical_vul_osswin))
                        print(sum(total_severe_vul_osswin))
                        print(sum(total_modoer_vuln_osswin))
                    if tags["name"] == 'DESKTOP_SUPPORT':
                        print("Desktop")
                        total_critical_vul_desktop = []
                        total_severe_vul_desktop = []
                        total_modorate_vuln_desktop = []
                        critical_vuln_desktop = data['critical_vulnerabilities']
                        severe_vuln_desktop = data['severe_vulnerabilities']
                        moderate_vuln_desktop = data['moderate_vulnerabilities']
                        total_critical_vul_desktop.append(critical_vuln_desktop)
                        total_severe_vul_desktop.append(severe_vuln_desktop)
                        total_modorate_vuln_desktop.append(moderate_vuln_desktop)
                        print(sum(total_critical_vul_desktop))
                        print(sum(total_severe_vul_desktop))
                        print(sum(total_modorate_vuln_desktop))
                    else:
                        pass
        else:
            has_next_cursor = False
If you have a lot of values to pass around, consider combining them in a dict. Then you can just return the dict and pass it along to the next function that needs that data. Another approach would be to create a class and either access the variables directly or have helper methods that do so. The latter is a cleaner solution than a dict, since with a dict you have to quote every variable name, and with a class you can easily add functionality beyond just being a container for a bunch of instance variables.
If you want the total across all the data, you should put these initializations:
total_critical_vul_osswin = []
total_severe_vul_osswin = []
total_modoer_vuln_osswin = []
before the while has_next_cursor loop (and similarly for the desktop totals). As your code is currently written, they are re-initialized inside the loop, so the sums never span more than one page of results (i.e., 500 samples, based on the URL).
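Both suggestions can be combined in one small function. The following is only a sketch under some assumptions: total_vulnerabilities, SEVERITIES, and the fake pages are hypothetical names and data, and the paginated responses are passed in as an iterable instead of being fetched with requests, so the example runs without network access. The field names are the ones used in the question.

```python
from collections import Counter

SEVERITIES = ("critical", "severe", "moderate")


def total_vulnerabilities(pages, tag_names=("OSSWIN", "DESKTOP_SUPPORT")):
    """Sum vulnerability counts per tag across every page of results.

    `pages` is any iterable of already-decoded JSON responses.
    """
    # Created once, before any loop, so the counts accumulate across pages.
    totals = {name: Counter() for name in tag_names}
    for page in pages:
        for asset in page["data"]:
            for tag in asset["tags"]:
                if tag["name"] in totals:
                    for severity in SEVERITIES:
                        totals[tag["name"]][severity] += asset[f"{severity}_vulnerabilities"]
    return totals


# Two fake pages standing in for the paginated API responses.
pages = [
    {"data": [{"tags": [{"name": "OSSWIN"}], "critical_vulnerabilities": 2,
               "severe_vulnerabilities": 5, "moderate_vulnerabilities": 1}]},
    {"data": [{"tags": [{"name": "OSSWIN"}, {"name": "DESKTOP_SUPPORT"}],
               "critical_vulnerabilities": 3, "severe_vulnerabilities": 0,
               "moderate_vulnerabilities": 4}]},
]

totals = total_vulnerabilities(pages)
print(totals["OSSWIN"]["critical"])  # prints 5
```

Because the function returns the dict instead of printing, any other function can take the result as an argument and reuse it.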

django.core.exceptions.ValidationError: ['“TRUE” value must be either True or False.']

I am trying to load data into the Django ORM with a script, for which I have written this:
for index, row in df.iterrows():
    allocated = row['is_allocated']
    delivery_required_on = row['delivery_required_on']
    linked = row['linked']
    raised_by = row['raised_by']
    raised_for = Company.objects.get(pk=row['raised_for']) ### double check
    rejected = row['is_rejected']
    reason = row['reason']
    remarks = row['remarks']
    created = row['created_at']
    owner = User.objects.get(pk=row['New owner'])
    j = literal_eval(row['flows'])
    flows = []
    mobj = MaterialRequest.objects.create(owner=owner, is_allocated=allocated,
                                          delivery_required_on=delivery_required_on, linked=linked,
                                          raised_by=raised_by, raised_for=raised_for, is_rejected=rejected,
                                          reason=reason, remarks=remarks, created=created)
It is running fine when the data is something like the following:
But as soon as is_allocated reaches False, it shows the following error:
django.core.exceptions.ValidationError: ['“TRUE” value must be either True or False.']
I am unable to find anything related to this.
It seems the is_allocated field of your model is a boolean, so you should assign a boolean value to it. But the column values in your DataFrame are the strings TRUE and FALSE.
Replacing this line
allocated = row['is_allocated']
with
allocated = (row['is_allocated'] == 'TRUE')
might help.
If the column can contain None in addition to true and false values, handle that case too.
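As a rough sketch of handling the None case mentioned above: parse_bool is a hypothetical helper, not part of Django or pandas. It assumes missing cells arrive as None or NaN, and that you would rather get an error on unexpected text than have it silently coerced.

```python
import math


def parse_bool(raw_value):
    """Convert a spreadsheet-style cell to True, False, or None.

    Hypothetical helper: missing cells (None or NaN) become None, and
    unrecognized text raises instead of being silently coerced.
    """
    if raw_value is None:
        return None
    if isinstance(raw_value, float) and math.isnan(raw_value):
        return None
    if isinstance(raw_value, bool):
        return raw_value
    text = str(raw_value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean value: {raw_value!r}")


print(parse_bool("TRUE"), parse_bool(" false "), parse_bool(float("nan")))
```

Note that a None result can only be saved if the model field allows nulls; otherwise you would need to pick a default before calling create.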
This happens because you are trying to store a string in a boolean field. One solution is to convert the string to a boolean first. A function like the following might solve your problem:
def to_boolean(raw_value: str) -> bool:
    if not isinstance(raw_value, str):
        raw_value = str(raw_value)
    raw_value = raw_value.strip()
    return {'true': True, 'false': False}.get(raw_value.lower(), False)
Then use it in your loop, wherever the field type is boolean:
for index, row in df.iterrows():
    allocated = to_boolean(row['is_allocated'])
    delivery_required_on = row['delivery_required_on']
    linked = row['linked']
    raised_by = row['raised_by']
    raised_for = Company.objects.get(pk=row['raised_for']) ### double check
    rejected = to_boolean(row['is_rejected'])
    reason = row['reason']
    remarks = row['remarks']
    created = row['created_at']
    owner = User.objects.get(pk=row['New owner'])
    j = literal_eval(row['flows'])
    flows = []
    mobj = MaterialRequest.objects.create(owner=owner, is_allocated=allocated,
                                          delivery_required_on=delivery_required_on, linked=linked,
                                          raised_by=raised_by, raised_for=raised_for, is_rejected=rejected,
                                          reason=reason, remarks=remarks, created=created)

empty files in python

I am trying to create a file with all the magnetic information in my lists. I've used this code before as well. For some reason the file it produces is empty, and I'm not sure why.
Here is my code:
magnetosheath_Bx = JSS_Bx[12339:13795]
magnetosheath_By = JSS_By[12339:13795]
magnetosheath_Bz = JSS_Bz[12339:13795]
magnetosheath_B = JSS_Bmag[12339:13795]
magnetosheath_time = epochtime_magdata[12339:13795]
magnetosheath_r = new_RJSE[12339:13795]
Magnetosheath_data = zip(magnetosheath_Bx, magnetosheath_By, magnetosheath_Bz, magnetosheath_B, magnetosheath_time, magnetosheath_r)
filenew = open('Ulysses_Magnetoseath.txt', 'w')
filenew.write("hello")
for magnetic_data in Magnetosheath_data:
    filenew.write('{} {} {} {} {} {}\n'.format(magnetic_data[0], magnetic_data[1], magnetic_data[2], magnetic_data[3], magnetic_data[4], magnetic_data[5]))
filenew.write("hello")
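One possible cause, which the post does not confirm, is that the file is never closed: writes are buffered, so if the file is inspected while the script or interactive session is still holding it open, nothing may have reached the disk yet. Here is a minimal sketch using a with-block, which closes (and therefore flushes) the file automatically. The short lists and the temporary path are hypothetical stand-ins for the sliced data and the output file in the question.

```python
import os
import tempfile

# Hypothetical stand-ins for the sliced data lists in the question.
bx = [1.0, 2.0, 3.0]
by = [0.1, 0.2, 0.3]
times = [10, 20, 30]

path = os.path.join(tempfile.mkdtemp(), "magnetosheath.txt")

# The with-block closes the file on exit, which flushes buffered writes to disk.
with open(path, "w") as out:
    for row in zip(bx, by, times):
        out.write("{} {} {}\n".format(*row))

# Read the file back to confirm the rows were written.
with open(path) as check:
    lines = check.read().splitlines()

print(len(lines))  # prints 3
```

Calling filenew.close() after the last write in the original code should have the same effect.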

Get file details in python

I'm trying to extract the details of a file using Python. Specifically, when I right click a photo and select properties, under the Details tab of the menu that pops up, there's a whole bunch of details about the file. What I really need is the contents of the "People" detail.
This is the menu in question:
Is there a good way of getting that People detail in a string or something?
Some people have suggested using ExifRead. I tried that, but it didn't pull the People tag out of the Exif data.
This is not EXIF data but rather data that Windows populates for different types of objects in the Windows Property System.
The one you are concerned with is called System.Photo.PeopleNames:
propertyDescription
name = System.Photo.PeopleNames
shellPKey = PKEY_Photo_PeopleNames
formatID = E8309B6E-084C-49B4-B1FC-90A80331B638
propID = 100
searchInfo
inInvertedIndex = true
isColumn = true
isColumnSparse = true
columnIndexType = OnDemand
maxSize = 128
mnemonics = people|people tag|people tags
labelInfo
label = People
sortDescription
invitationText = Add a people tag
hideLabel = false
typeInfo
type = String
groupingRange = Discrete
isInnate = true
canBePurged = true
multipleValues = true
isGroup = false
aggregationType = Union
isTreeProperty = false
isViewable = true
isQueryable (Vista) = false
includeInFullTextQuery (Vista) = false
searchRawValue (Windows 7) = true
conditionType = String
defaultOperation = Equal
aliasInfo
sortByAlias = None
additionalSortByAliases = None
displayInfo
defaultColumnWidth = 11
displayType = String
alignment = Left
relativeDescriptionType = General
defaultSortDirection = Ascending
stringFormat
formatAs = General
booleanFormat
formatAs = YesNo
numberFormat
formatAs = General
formatDurationAs = hh:mm:ss
dateTimeFormat
formatAs = General
formatTimeAs = ShortTime
formatDateAs = ShortDate
enumeratedList
defaultText
useValueForDefault = False
enum
value
text
enumRange
minValue
setValue
text
drawControl
control = Default
editControl
control = Default
filterControl
control = Default
queryControl
control = Default
To access this information in Python, use win32com.propsys.

Python help, reading and writing to a txt file

I have posted the relevant part of my code below. Before that are just load functions, which I am pretty sure have no errors.
I am receiving this error:
IndexError: list index out of range (on the line namestaj["Naziv"] = deon[1])
Does anyone see something out of order?
# load furniture from a txt file
def ucitajNamestaj():
    listaNamestaja = open("namestaj.txt", "r").readlines()
    namestaj = []
    for red in listaNamestaja:
        namestaj.append(stringToNamestaj(red))
    return namestaj

# String to Furniture, dictionary
def stringToNamestaj(red):
    namestaj = {}
    deon = red.strip().split("|")
    namestaj["Sifra"] = deon[0]
    namestaj["Naziv"] = deon[1]
    namestaj["Boja"] = deon[2]
    namestaj["Kolicina"] = int(deon[3])
    namestaj["Cena"] = float(deon[4])
    namestaj["Kategorija"] = deon[5]
    namestaj["Dostupan"] = deon[6]
    return namestaj
A couple of things first: always try to provide an MCVE (minimal, complete, verifiable example), and make sure you use the SO code formatting properly; otherwise your question is unreadable.
Now, what is probably happening is that your file has some empty lines and you are not skipping them. Try this:
def ucitajNamestaj():
    listaNamestaja = open("namestaj.txt", "r").readlines()
    namestaj = []
    for red in listaNamestaja:
        # skip empty lines
        if red.strip() == "":
            continue
        namestaj.append(stringToNamestaj(red))
    return namestaj

def stringToNamestaj(red):
    namestaj = {}
    deon = red.strip().split("|")
    namestaj["Sifra"] = deon[0]
    namestaj["Naziv"] = deon[1]
    namestaj["Boja"] = deon[2]
    namestaj["Kolicina"] = int(deon[3])
    namestaj["Cena"] = float(deon[4])
    namestaj["Kategorija"] = deon[5]
    namestaj["Dostupan"] = deon[6]
    return namestaj
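Skipping blank lines still leaves the same IndexError if a line has too few fields. Here is a sketch of a variant that checks the field count and reports the offending line number. parse_furniture and FIELD_COUNT are hypothetical names, and the sample data is made up; an in-memory file stands in for namestaj.txt so the example runs on its own.

```python
import io

FIELD_COUNT = 7  # Sifra, Naziv, Boja, Kolicina, Cena, Kategorija, Dostupan


def parse_furniture(lines):
    """Parse pipe-separated records, skipping blank lines."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        parts = line.strip().split("|")
        if parts == [""]:
            continue  # blank line
        if len(parts) != FIELD_COUNT:
            raise ValueError(
                f"line {line_number}: expected {FIELD_COUNT} fields, got {len(parts)}"
            )
        records.append({
            "Sifra": parts[0],
            "Naziv": parts[1],
            "Boja": parts[2],
            "Kolicina": int(parts[3]),
            "Cena": float(parts[4]),
            "Kategorija": parts[5],
            "Dostupan": parts[6],
        })
    return records


# An in-memory file standing in for namestaj.txt (sample data is made up).
sample = io.StringIO("1|Stolica|crna|4|25.5|Kuhinja|True\n\n2|Sto|bela|1|99.0|Kuhinja|False\n")
furniture = parse_furniture(sample)
print(len(furniture))  # prints 2
```

With a real file, the same function can be called as parse_furniture(f) inside a with open("namestaj.txt") as f: block.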
