I would like to combine two pickle files, both produced by face recognition, into a new one. I have a main actors.pkl file that holds a lot of actors, and I would like to update this file without re-running the image-set update method. So I am creating a new pickle file, actorupdate.pkl, which contains the registered faces that are not yet in actors.pkl.
I am able to open and read both files, and they both work perfectly (recognition-wise). How can I join both files into a new one (or append actorupdate.pkl to actors.pkl)?
import pickle
data = pickle.loads(open("encodings/actors.pkl", "rb").read())
data2 = pickle.loads(open("encodings/actorupdate.pkl", "rb").read())
Verifying the datasets (names):
data['names'] = ['ben_afflek', 'ben_afflek', 'ben_afflek', 'ben_afflek', 'ben_afflek', 'ben_afflek', 'ben_afflek', 'ben_afflek', ........
data2['names'] = ['Gal_Gadot', 'Gal_Gadot', 'Gal_Gadot', 'Gal_Gadot', 'Gal_Gadot', 'Gal_Gadot']
I am trying to use this method:
filename = "picklefile"
with open(filename, 'wb') as fp:
    pickle.dump(data, fp)
    pickle.dump(data2, fp)
But in that case it writes a new file to my disk in which the information from data2 is missing; it behaves as just a copy of data. I would also like to use the file directly rather than create another pickle file on disk.
You can simply append the second lists to the first ones and then dump the merged dictionary.
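A minimal sketch of that idea, assuming both files hold a dict with 'encodings' and 'names' lists as shown above, and writing the result to a new file merged.pkl (a hypothetical name):
import pickle

# Load both datasets (each is a dict with "encodings" and "names" lists).
with open("encodings/actors.pkl", "rb") as fp:
    data = pickle.load(fp)
with open("encodings/actorupdate.pkl", "rb") as fp:
    data2 = pickle.load(fp)

# Concatenate the lists and dump a single merged dictionary.
merged = {
    "encodings": data["encodings"] + data2["encodings"],
    "names": data["names"] + data2["names"],
}
with open("encodings/merged.pkl", "wb") as fp:  # hypothetical output path
    pickle.dump(merged, fp)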
I found the following worked (I had to loop over the contents of data2):
import pickle

data1 = pickle.load(open('encodings/actors.pkl', 'rb'))
data2 = pickle.load(open('encodings/actorupdate.pkl', 'rb'))

new_encodings = data1['encodings']
new_names = data1['names']
for encode in data2['encodings']:
    new_encodings.append(encode)
for name in data2['names']:
    new_names.append(name)

data = {"encodings": new_encodings, "names": new_names}
with open('encodings/actors.pkl', 'wb') as fp:
    pickle.dump(data, fp)
data now holds the contents of both pickle files and allows faces from both sets to be detected; the original actors.pkl file on disk is overwritten with the combined dataset.
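A quick sanity check, assuming the combined data was written back to encodings/actors.pkl as above:
import pickle

with open('encodings/actors.pkl', 'rb') as fp:
    merged = pickle.load(fp)

# Both actors should now appear, and the two lists should be the same length.
print(len(merged['encodings']), len(merged['names']))
print(set(merged['names']))  # e.g. {'ben_afflek', 'Gal_Gadot'}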
Related
I just want to add a new facial encoding to the pickle file. I've tried the following method, but it's not working.
Creating the pickle file:
import face_recognition
import pickle

all_face_encodings = {}

img1 = face_recognition.load_image_file("ex.jpg")
all_face_encodings["ex"] = face_recognition.face_encodings(img1)[0]

img2 = face_recognition.load_image_file("ex2.jpg")
all_face_encodings["ex2"] = face_recognition.face_encodings(img2)[0]

with open('dataset_faces.dat', 'wb') as f:
    pickle.dump(all_face_encodings, f)
Appending new data to the pickle file.
import pickle

img3 = face_recognition.load_image_file("ex3.jpg")
all_face_encodings["ex3"] = face_recognition.face_encodings(img3)[0]

with open('dataset_faces.dat', 'wb') as f:
    pickle.dump(img3, f)
    pickle.dump(all_face_encodings["ex3"], f)
But It's not working. Is there a way to append it?
I guess your steps should be:
1. Load the old pickle data from disk into the all_face_encodings dictionary.
2. Add the new encodings to the dictionary.
3. Dump the whole dictionary to the pickle file again.
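A minimal sketch of those steps, assuming dataset_faces.dat already exists and ex3.jpg contains exactly one face:
import pickle
import face_recognition

# 1. Load the existing encodings from disk.
with open('dataset_faces.dat', 'rb') as f:
    all_face_encodings = pickle.load(f)

# 2. Add the new encoding to the dictionary.
img3 = face_recognition.load_image_file("ex3.jpg")
all_face_encodings["ex3"] = face_recognition.face_encodings(img3)[0]

# 3. Dump the whole dictionary back to the pickle file.
with open('dataset_faces.dat', 'wb') as f:
    pickle.dump(all_face_encodings, f)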
I have a number of JSON files saved in a folder. I would like to parse each JSON file, flatten it with the flatten library, and save it as a separate JSON file.
I have managed to do this with one JSON file, but I am struggling to process several JSON files at once without merging the data, and then save each result.
I think I need a loop that loads a JSON file, flattens it and saves it, until there are no more JSON files in the folder. Is this possible?
This still seems to only parse one json file.
path_to_json = 'json_test/'
for file in [file for file in os.listdir(path_to_json) if file.endswith('.json')]:
    with open(path_to_json + file) as json_file:
        data1 = json.load(json_file)
Any help would be much appreciated thanks!
On every loop iteration, data1 is reassigned to the next JSON file, so you only end up with one result.
Instead, append to a new list.
import os
import json

# Flatten not supported on 3.8.3
path = 'X:test folder/'
file_list = [p for p in os.listdir(path) if p.endswith('.json')]
flattened = []

for file in file_list:
    with open(path + file) as json_file:
        # flatten json here, can't install from pip.
        flattened.append(json.load(json_file))

for file, flat_json in zip(file_list, flattened):
    json.dump(flat_json, open(path + file + '_flattened.json', "w"), indent=2)
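Since the flattening itself is left as a placeholder in the loop above, here is one possible hand-rolled helper you could use in its place (a sketch; flatten_dict is a hypothetical name, not the flatten library's own API):
def flatten_dict(obj, parent_key='', sep='.'):
    # Recursively flatten nested dicts into a single-level dict
    # with dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}.
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten_dict(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

# Usage inside the read loop above:
#     flattened.append(flatten_dict(json.load(json_file)))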
# Can you try this out?
# https://stackoverflow.com/questions/23520542/issue-with-merging-multiple-json-files-in-python
import glob

read_files = glob.glob("*.json")
with open("merged_file.json", "w") as outfile:
    outfile.write('[{}]'.format(
        ','.join(open(f).read() for f in read_files)))
I'm using TweetScraper to scrape tweets with certain keywords. Right now, each tweet gets saved to a separate JSON file in a set folder, so I end up with thousands of JSON files. Is there a way to make each new tweet append to one big JSON file? If not, how do I process/work with thousands of small JSON files in Python?
Here's the part of settings.py that handles saving data:
# settings for where to save data on disk
SAVE_TWEET_PATH = './Data/tweet/'
SAVE_USER_PATH = './Data/user/'
I would read all the files, put the data in a list, and save it again as one JSON file.
import os
import json

folder = '.'
all_tweets = []

# --- read ---
for filename in sorted(os.listdir(folder)):
    if filename.endswith('.json'):
        fullpath = os.path.join(folder, filename)
        with open(fullpath) as fh:
            tweet = json.load(fh)
        all_tweets.append(tweet)

# --- save ---
with open('all_tweets.json', 'w') as fh:
    json.dump(all_tweets, fh)
There's a CSV file in an S3 bucket that I want to parse and turn into a dictionary in Python. Using Boto3, I called the s3.get_object(<bucket_name>, <key>) function, which returns a dictionary that includes a "Body": StreamingBody() key-value pair apparently containing the data I want.
In my Python file, I've added import csv. In the examples I see online on how to read a CSV file, you pass a file name, such as:
with open(<csv_file_name>, mode='r') as file:
    reader = csv.reader(file)
However, I'm not sure how to retrieve the CSV file name from the StreamingBody, if that's even possible. If not, is there a better way for me to read the CSV file in Python? Thanks!
Edit: Wanted to add that I'm doing this in AWS Lambda and there are documented issues with using pandas in Lambda, so this is why I wanted to use the csv library and not pandas.
csv.reader does not require a file. It can use anything that iterates through lines, including files and lists.
So you don't need a filename. Just pass the lines from response['Body'] directly into the reader. One way to do that is
lines = response['Body'].read().decode('utf-8').splitlines(True)
reader = csv.reader(lines)
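Put together with the get_object call from the question, a minimal sketch might look like this (the bucket and key names are placeholders):
import csv
import boto3

s3 = boto3.client('s3')
response = s3.get_object(Bucket='your-bucket-name', Key='path/to/file.csv')

# response['Body'] is a StreamingBody; read it, decode, and split into lines.
lines = response['Body'].read().decode('utf-8').splitlines(True)
for row in csv.reader(lines):
    print(row)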
To retrieve and read CSV file from s3 bucket, you can use the following code:
import csv
import boto3
from django.conf import settings

bucket_name = "your-bucket-name"
file_name = "your-file-name-exists-in-that-bucket.csv"

s3 = boto3.resource('s3', aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
bucket = s3.Bucket(bucket_name)
obj = bucket.Object(key=file_name)
response = obj.get()

lines = response['Body'].read().decode('utf-8').splitlines(True)
reader = csv.DictReader(lines)
for row in reader:
    # csv_header_key1/csv_header_key2 are the header keys defined in your CSV header
    print(row['csv_header_key1'], row['csv_header_key2'])
I'm working on a reporting application for my Django powered website. I want to run several reports and have each report generate a .csv file in memory that can be downloaded in batch as a .zip. I would like to do this without storing any files to disk. So far, to generate a single .csv file, I am following the common operation:
mem_file = StringIO.StringIO()
writer = csv.writer(mem_file)
writer.writerow(["My content", my_value])
mem_file.seek(0)
response = HttpResponse(mem_file, content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename=my_file.csv'
This works fine, but only for a single, unzipped .csv. If I had, for example, a list of .csv files created with a StringIO stream:
firstFile = StringIO.StringIO()
# write some data to the file
secondFile = StringIO.StringIO()
# write some data to the file
thirdFile = StringIO.StringIO()
# write some data to the file
myFiles = [firstFile, secondFile, thirdFile]
How could I return a compressed file that contains all objects in myFiles and can be properly unzipped to reveal three .csv files?
zipfile is a standard library module that does exactly what you're looking for. For your use case, the meat and potatoes is a method called "writestr", which takes the name of a file and the data you'd like stored in it inside the zip.
In the code below, I've used a sequential naming scheme for the files when they're unzipped, but this can be switched to whatever you'd like.
import zipfile
import StringIO

zipped_file = StringIO.StringIO()
with zipfile.ZipFile(zipped_file, 'w') as zip:
    for i, file in enumerate(files):
        file.seek(0)
        zip.writestr("{}.csv".format(i), file.read())
zipped_file.seek(0)
If you want to future-proof your code (hint hint Python 3 hint hint), you might want to switch over to using io.BytesIO instead of StringIO, since Python 3 is all about the bytes. Another bonus is that explicit seeks are not necessary with io.BytesIO before reads (I haven't tested this behavior with Django's HttpResponse, so I've left that final seek in there just in case).
import io
import zipfile

zipped_file = io.BytesIO()
with zipfile.ZipFile(zipped_file, 'w') as f:
    for i, file in enumerate(files):
        f.writestr("{}.csv".format(i), file.getvalue())
zipped_file.seek(0)
The stdlib comes with the module zipfile, and the main class, ZipFile, accepts a file or file-like object:
from zipfile import ZipFile

temp_file = StringIO.StringIO()
zipped = ZipFile(temp_file, 'w')

# create temp csv_files = [(name1, data1), (name2, data2), ... ]
for name, data in csv_files:
    data.seek(0)
    zipped.writestr(name, data.read())

zipped.close()
temp_file.seek(0)
# etc. etc.
I'm not a user of StringIO so I may have the seek and read out of place, but hopefully you get the idea.
def zipFiles(files):
    outfile = StringIO()  # io.BytesIO() for Python 3
    with zipfile.ZipFile(outfile, 'w') as zf:
        for n, f in enumerate(files):
            zf.writestr("{}.csv".format(n), f.getvalue())
    return outfile.getvalue()

zipped_file = zipFiles(myFiles)
response = HttpResponse(zipped_file, content_type='application/octet-stream')
response['Content-Disposition'] = 'attachment; filename=my_file.zip'
StringIO has a getvalue method, which returns the entire contents. You can compress the zip file with zipfile.ZipFile(outfile, 'w', zipfile.ZIP_DEFLATED). The default compression value is ZIP_STORED, which creates the zip file without compressing.
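For example, switching the function above to actual compression is a one-argument change (a sketch, assuming Python 3 and io.BytesIO):
import io
import zipfile

def zipFiles(files):
    outfile = io.BytesIO()
    # ZIP_DEFLATED compresses the entries; ZIP_STORED (the default) only stores them.
    with zipfile.ZipFile(outfile, 'w', zipfile.ZIP_DEFLATED) as zf:
        for n, f in enumerate(files):
            zf.writestr("{}.csv".format(n), f.getvalue())
    return outfile.getvalue()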