I am extracting the face embedding of an image and appending it to an existing pickle file. But it looks like it's not working: when I unpickle the file, it does not contain the new data. Below is the code:
file = client_dir + '\embeddings.pickle'
data = {"embeddings": known_embeddings, "names": known_names}
with open(file, 'ab+') as fp:
    pickle.dump(data, fp)
    fp.close()
log("[INFO] Data appended to embeddings.pickle ")
Current pickle file contains below data:
{'embeddings': [array([-0.03656099, 0.11354745, -0.00438912, 0.0367547 , 0.06391761,
0.18440282, 0.06150107, -0.17380905, 0.03094344, -0.00182147,
0.00969766, 0.06890091, 0.04974053, -0.0502388 , -0.03414046,
-0.13550822, -0.02251128, 0.14556041, -0.04045469, 0.06500552,
0.0726142 , -0.04139924, -0.04662199, 0.08869533, -0.00061307,
-0.11912274, 0.13141112, -0.00648551, 0.00296356, 0.03682912,
-0.15076959, 0.03989822, 0.02799555, 0.03429572, 0.09865954,
0.14113557, -0.08355764, 0.09193961, -0.00819231, -0.01184336,
-0.12519744, 0.00668721, 0.0816237 , 0.00464355, -0.00339399,
0.07501812, 0.11679655, -0.09211859, 0.06211261, -0.00543289,
0.10347278, 0.06651585, -0.01512023, 0.09477805, 0.09886038,
-0.03837246, 0.02265131, -0.14867221, 0.00781244, 0.04845129,
-0.0363168 , -0.00186919, -0.16163988, 0.09539618, 0.14983718,
0.09159472, -0.05315595, -0.05073383, 0.01501674, -0.03789762,
0.07116041, 0.07650694, -0.02975985], dtype=float32)], 'names': ['rock']}
The new data I am trying to append is below:
{'embeddings': [array([-0.03656099, 0.11354745, -0.00438912, 0.0367547 , 0.06391761,
0.18440282, 0.06150107, -0.17380905, 0.03094344, -0.00182147,
0.00969766, 0.06890091, 0.04974053, -0.0502388 , -0.03414046,
0.07501812, 0.11679655, -0.09211859, 0.06211261, -0.00543289,
-0.13550822, -0.02251128, 0.14556041, -0.04045469, 0.06500552,
0.0726142 , -0.04139924, -0.04662199, 0.08869533, -0.00061307,
-0.11912274, 0.13141112, -0.00648551, 0.00296356, 0.03682912,
-0.15076959, 0.03989822, 0.02799555, 0.03429572, 0.09865954,
0.14113557, -0.08355764, 0.09193961, -0.00819231, -0.01184336,
-0.12519744, 0.00668721, 0.0816237 , 0.00464355, -0.00339399,
0.10347278, 0.06651585, -0.01512023, 0.09477805, 0.09886038,
-0.03837246, 0.02265131, -0.14867221, 0.00781244, 0.04845129,
-0.0363168 , -0.00186919, -0.16163988, 0.09539618, 0.14983718,
0.09159472, -0.05315595, -0.05073383, 0.01501674, -0.03789762,
0.07116041, 0.07650694, -0.02975985], dtype=float32)], 'names': ['john']}
But when I unpickle the file, it only has the data for rock and not for john. Can anyone please tell me what I am doing wrong? Below is the code I am using to unpickle and check what data was added. Maybe the way I am unpickling the file is wrong, because when I append the data I can see the file size increasing.
import pickle
file = open('G:\\output\\embeddings.pickle', 'rb')
data = pickle.load(file)
file.close()
print(data)
Please help. Thanks
Updated code:
file_path = client_dir + '\embeddings.pickle'
file = open(file_path, 'rb')
old_data = pickle.load(file)
new_embeddings = old_data['embeddings']
new_names = old_data['names']
new_embeddings.append(known_embeddings[0])
new_names.append(known_names[0])
data1 = {"embeddings": new_embeddings, "names": new_names}
with open(file_path, 'ab+') as fp:
    pickle.dump(data1, fp)
    fp.close()
log.error("[INFO] Data appended to embeddings.pickle ")
In the above code, I first load the data from the pickle file into lists, then append the new data to the lists, and then write all the data (old + new) back to the pickle file. Can anyone please tell me if this is the correct way of doing it?
After this as well, when I unpickle the file, I am not getting all the data. Thanks
file_path = client_dir + '\embeddings.pickle'
file = open(file_path, 'rb')
old_data = pickle.load(file)
new_embeddings = old_data['embeddings']
new_names = old_data['names']
new_embeddings.append(known_embeddings[0])
new_names.append(known_names[0])
data1 = {"embeddings": new_embeddings, "names": new_names}
with open(file_path, 'ab+') as fp:
    pickle.dump(data1, fp)
    fp.close()
log.error("[INFO] Data appended to embeddings.pickle ")
This looks pretty close to correct to me. You successfully load the pickled data and add new elements to it. The problem appears to be the with open(file_path, 'ab+') as fp: call. If you open the file in "a" mode, the pickle data you write gets added to the end, after the old pickle data. Then, on subsequent executions of your program, pickle.load will only load the old pickle data.
Try overwriting the old pickle data completely with your new pickle data. You can do this by opening in "w" mode instead.
with open(file_path, 'wb') as fp:
    pickle.dump(data1, fp)
Incidentally, you don't need that fp.close() call. A with statement automatically closes the opened file at the end of the block.
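Putting the whole cycle together (load, modify, overwrite), here is a minimal sketch using the names from the question:

import pickle

file_path = client_dir + '\embeddings.pickle'

# Load the existing data
with open(file_path, 'rb') as fp:
    data = pickle.load(fp)

# Append the new embedding and name from the question's variables
data['embeddings'].append(known_embeddings[0])
data['names'].append(known_names[0])

# Overwrite so the file always holds exactly one up-to-date pickle
with open(file_path, 'wb') as fp:
    pickle.dump(data, fp)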
It can be done without loading the data first, which improves speed:
Use mode='ab' to create a new file if the file doesn't exist, or append data if it does:
pickle.dump(data, open('data folder/' + filename2save + '.pkl', 'ab'))
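One caveat with this append approach: the file then contains several pickles back to back, so a single pickle.load only returns the first one. A sketch of reading them all back, reusing filename2save from the snippet above:

import pickle

records = []
with open('data folder/' + filename2save + '.pkl', 'rb') as fp:
    while True:
        try:
            records.append(pickle.load(fp))  # one load per appended dump
        except EOFError:
            break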
I'm attempting to convert Yelp's data set from JSON to CSV format. The new CSV file that is created is empty.
I've tried different ways to iterate through the JSON but they all give me a zero-byte file.
The json file looks like this:
{"business_id":"1SWheh84yJXfytovILXOAQ","name":"Arizona Biltmore Golf Club","address":"2818 E Camino Acequia Drive","city":"Phoenix","state":"AZ","postal_code":"85016","latitude":33.5221425,"longitude":-112.0184807,"stars":3.0,"review_count":5,"is_open":0,"attributes":{"GoodForKids":"False"},"categories":"Golf, Active Life","hours":null}
import json
import csv

infile = open("business.json", "r")
outfile = open("business2.csv", "w")
data = json.load(infile)
infile.close()
out = csv.writer(outfile)
out.writerow(data[0].keys())
for row in data:
    out.writerow(row.values())
I get an "extra data" message when the code runs. The new business2 csv file is empty and the size is zero bytes.
If your JSON has only one row, then try this:
infile = open("business.json","r")
outfile = open("business2.csv","w")
data = json.load(infile)
infile.close()
out = csv.writer(outfile)
#print(data.keys())
out.writerow(data.keys())
out.writerow(data.values())
Hi, please try the code below. By using the with statement, the file will automatically get closed when control moves out of the scope of the with block.
infile = open("business.json","r")
outfile = open("business2.csv","w")
data = json.load(infile)
infile.close()
headers = list(data.keys())
values = list(data.values())
with open("business2.csv","w") as outfile:
out = csv.writer(outfile)
out.writerow(headers)
out.writerow(values)
You need to use with to close the file.
import json
import csv

infile = open("business.json", "r")
data = json.load(infile)
infile.close()
with open("business2.csv", "w") as outfile:
    out = csv.writer(outfile)
    out.writerow(list(data.keys()))
    out.writerow(list(data.values()))
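If the "extra data" error comes from the file holding one JSON object per line (the Yelp dumps are usually newline-delimited JSON), then json.load on the whole file will always fail. A sketch that parses line by line instead, assuming that format:

import csv
import json

with open("business.json", "r") as infile:
    # each non-empty line is assumed to be a complete JSON object
    rows = [json.loads(line) for line in infile if line.strip()]

with open("business2.csv", "w", newline="") as outfile:
    out = csv.writer(outfile)
    out.writerow(rows[0].keys())
    for row in rows:
        out.writerow(row.values())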
Here's my code, really simple stuff...
import csv
import json
csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("FirstName", "LastName", "IDNumber", "Message")
reader = csv.DictReader(csvfile, fieldnames)
out = json.dumps([row for row in reader])
jsonfile.write(out)
Declare some field names; the reader uses CSV to read the file, and the field names to dump the file to a JSON format. Here's the problem...
Each record in the CSV file is on a different row. I want the JSON output to be the same way. The problem is it dumps it all on one giant, long line.
I've tried using something like for line in csvfile: and then running my code below that with reader = csv.DictReader(line, fieldnames), which loops through each line, but it dumps the entire file on one line, then dumps the entire file again on another line... and continues until it runs out of lines.
Any suggestions for correcting this?
Edit: To clarify, currently I have: (every record on line 1)
[{"FirstName":"John","LastName":"Doe","IDNumber":"123","Message":"None"},{"FirstName":"George","LastName":"Washington","IDNumber":"001","Message":"Something"}]
What I'm looking for: (2 records on 2 lines)
{"FirstName":"John","LastName":"Doe","IDNumber":"123","Message":"None"}
{"FirstName":"George","LastName":"Washington","IDNumber":"001","Message":"Something"}
Not each individual field indented/on a separate line, but each record on its own line.
Some sample input.
"John","Doe","001","Message1"
"George","Washington","002","Message2"
The problem with your desired output is that it is not a valid JSON document; it's a stream of JSON documents!
That's okay, if it's what you need, but that means that for each document you want in your output, you'll have to call json.dumps.
Since the newline you want separating your documents is not contained in those documents, you're on the hook for supplying it yourself. So we just need to pull the loop out of the call to json.dump and interpose newlines for each document written.
import csv
import json

csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("FirstName", "LastName", "IDNumber", "Message")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    json.dump(row, jsonfile)
    jsonfile.write('\n')
You can use Pandas DataFrame to achieve this, with the following Example:
import pandas as pd
csv_file = pd.read_csv("path/to/file.csv", sep=",", header=0, index_col=False)
csv_file.to_json("/path/to/new/file.json", orient="records", date_format="epoch", double_precision=10, force_ascii=True, date_unit="ms", default_handler=None)
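If you want one record per line (the newline-delimited style asked about here) rather than a single JSON array, to_json also accepts lines=True together with orient="records". A minimal variant of the call above:

import pandas as pd

csv_file = pd.read_csv("path/to/file.csv")
# lines=True writes one JSON object per line instead of one array
csv_file.to_json("/path/to/new/file.json", orient="records", lines=True)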
import csv
import json

file = 'csv_file_name.csv'
json_file = 'output_file_name.json'

# Read CSV file
def read_CSV(file, json_file):
    csv_rows = []
    with open(file) as csvfile:
        reader = csv.DictReader(csvfile)
        field = reader.fieldnames
        for row in reader:
            csv_rows.extend([{field[i]: row[field[i]] for i in range(len(field))}])
        convert_write_json(csv_rows, json_file)

# Convert csv data into json
def convert_write_json(data, json_file):
    with open(json_file, "w") as f:
        f.write(json.dumps(data, sort_keys=False, indent=4, separators=(',', ': ')))  # pretty-printed
        # f.write(json.dumps(data))  # or compact, but don't write both

read_CSV(file, json_file)
Documentation of json.dumps()
I took @SingleNegationElimination's response and simplified it into a three-liner that can be used in a pipeline:
import csv
import json
import sys

for row in csv.DictReader(sys.stdin):
    json.dump(row, sys.stdout)
    sys.stdout.write('\n')
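For example, if the three-liner is saved as csv2json.py (name assumed), running python csv2json.py < file.csv > file.json converts a whole file in one pass.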
You can try this
import csvmapper

# how does the object look
mapper = csvmapper.DictMapper([
    [
        {'name': 'FirstName'},
        {'name': 'LastName'},
        {'name': 'IDNumber', 'type': 'int'},
        {'name': 'Messages'}
    ]
])

# parser instance
parser = csvmapper.CSVParser('sample.csv', mapper)
# conversion service
converter = csvmapper.JSONConverter(parser)
print(converter.doConvert(pretty=True))
Edit:
Simpler approach
import csvmapper

fields = ('FirstName', 'LastName', 'IDNumber', 'Messages')
parser = csvmapper.CSVParser('sample.csv', csvmapper.FieldMapper(fields))
converter = csvmapper.JSONConverter(parser)
print(converter.doConvert(pretty=True))
I see this is old, but I needed the code from SingleNegationElimination; however, I had an issue with the data containing non-UTF-8 characters. These appeared in fields I was not overly concerned with, so I chose to ignore them, but that took some effort. I am new to Python, so with some trial and error I got it to work. The code is a copy of SingleNegationElimination's with extra handling of UTF-8. I tried to do it with https://docs.python.org/2.7/library/csv.html but in the end gave up. The code below worked.
import csv, json

csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')

fieldnames = ("Scope", "Comment", "OOS Code", "In RMF", "Code", "Status", "Name", "Sub Code", "CAT", "LOB", "Description", "Owner", "Manager", "Platform Owner")
reader = csv.DictReader(csvfile, fieldnames)
code = ''
for row in reader:
    try:
        print('+' + row['Code'])
        for key in row:
            row[key] = row[key].decode('utf-8', 'ignore').encode('utf-8')
        json.dump(row, jsonfile)
        jsonfile.write('\n')
    except:
        print('-' + row['Code'])
        raise
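In Python 3, strings are already Unicode, so the decode/encode dance above won't run. A sketch of the equivalent idea is to let open() drop the bad bytes instead; errors='ignore' is one policy among several:

import csv
import json

fieldnames = ("Scope", "Comment", "OOS Code", "In RMF", "Code", "Status", "Name", "Sub Code", "CAT", "LOB", "Description", "Owner", "Manager", "Platform Owner")

# errors='ignore' silently discards undecodable bytes while reading
with open('file.csv', 'r', encoding='utf-8', errors='ignore') as csvfile, \
        open('file.json', 'w') as jsonfile:
    for row in csv.DictReader(csvfile, fieldnames):
        json.dump(row, jsonfile)
        jsonfile.write('\n')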
Add the indent parameter to json.dumps
data = {'this': ['has', 'some', 'things'],
'in': {'it': 'with', 'some': 'more'}}
print(json.dumps(data, indent=4))
Also note that you can simply use json.dump with the open jsonfile:
json.dump(data, jsonfile)
Use pandas and the json library:
import pandas as pd
import json

filepath = "inputfile.csv"
output_path = "outputfile.json"

df = pd.read_csv(filepath)

# Create a multiline json
json_list = json.loads(df.to_json(orient="records"))
with open(output_path, 'w') as f:
    for item in json_list:
        f.write(json.dumps(item) + "\n")  # json.dumps keeps each line valid JSON
How about using Pandas to read the csv file into a DataFrame (pd.read_csv), then manipulating the columns if you want (dropping them or updating values) and finally converting the DataFrame back to JSON (pd.DataFrame.to_json).
Note: I haven't checked how efficient this will be but this is definitely one of the easiest ways to manipulate and convert a large csv to json.
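A minimal sketch of that approach, with file names assumed and one example column manipulation:

import pandas as pd

df = pd.read_csv('file.csv')
df = df.drop(columns=['Message'])  # example manipulation; column name taken from the question
df.to_json('file.json', orient='records')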
As a slight improvement to @MONTYHS's answer, iterating through a tuple of fieldnames:
import csv
import json

csvfilename = 'filename.csv'
jsonfilename = csvfilename.split('.')[0] + '.json'

csvfile = open(csvfilename, 'r')
jsonfile = open(jsonfilename, 'w')

reader = csv.DictReader(csvfile)
fieldnames = ('FirstName', 'LastName', 'IDNumber', 'Message')

output = []
for each in reader:
    row = {}
    for field in fieldnames:
        row[field] = each[field]
    output.append(row)

json.dump(output, jsonfile, indent=2, sort_keys=True)
import csv
import json

def read():
    noOfElem = 200  # number of records you want to import
    csv_file_name = "hashtag_donaldtrump.csv"  # csv file name
    json_file_name = "hashtag_donaldtrump.json"  # json file name
    with open(csv_file_name, mode='r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        with open(json_file_name, 'w') as json_file:
            json_file.write("[")
            for i, row in enumerate(csv_reader):
                if i == noOfElem:
                    break
                if i > 0:
                    json_file.write(",")
                json_file.write(json.dumps(row))
            json_file.write("]")  # close the array even if the csv has fewer rows
Change the three parameters above and everything will be done.
import csv
import json

csvfile = csv.DictReader(open('filename.csv', 'r'))

output = []
for each in csvfile:
    row = {}
    row['FirstName'] = each['FirstName']
    row['LastName'] = each['LastName']
    row['IDNumber'] = each['IDNumber']
    row['Message'] = each['Message']
    output.append(row)

json.dump(output, open('filename.json', 'w'), indent=4, sort_keys=False)
I have data in a file and I need to write it to a CSV file in a specific column. The data in the file is like this:
002100
002077
002147
My code is this:
import csv

f = open("file.txt", "r")
with open("watout.csv", "w") as output:
    c = csv.writer(output)
    for line in f:
        c.writerows(line)
It always writes to the first column. How can I resolve this?
Thanks.
This is how I solved the problem
import csv

f1 = open("inFile", "r")  # open input file for reading

with open('out.csv', 'w', newline="") as f:  # output csv file
    writer = csv.writer(f)
    with open('in.csv', 'r') as csvfile:  # input csv file
        reader = csv.reader(csvfile, delimiter=',')
        for row in reader:
            row[7] = f1.readline()  # edit the 8th column
            writer.writerow(row)

f1.close()
Python 2 users: replace
with open('out.csv', 'w', newline="") as f:
by
with open('out.csv', 'wb') as f:
Currently I'm using this:
f = open(filename, 'r+')
text = f.read()
text = re.sub('foobar', 'bar', text)
f.seek(0)
f.write(text)
f.close()
But the problem is that the old file is larger than the new file. So I end up with a new file that has a part of the old file on the end of it.
If you don't want to close and reopen the file, to avoid race conditions, you could truncate it:
f = open(filename, 'r+')
text = f.read()
text = re.sub('foobar', 'bar', text)
f.seek(0)
f.write(text)
f.truncate()
f.close()
The functionality will likely also be cleaner and safer using open as a context manager, which will close the file handle, even if an error occurs!
with open(filename, 'r+') as f:
    text = f.read()
    text = re.sub('foobar', 'bar', text)
    f.seek(0)
    f.write(text)
    f.truncate()
The fileinput module has an inplace mode for writing changes to the file you are processing without using temporary files etc. The module nicely encapsulates the common operation of looping over the lines in a list of files, via an object which transparently keeps track of the file name, line number etc if you should want to inspect them inside the loop.
from fileinput import FileInput

for line in FileInput("file", inplace=1):
    line = line.replace("foobar", "bar")
    print(line, end='')  # line already ends with '\n'
Probably it would be easier and neater to close the file after text = re.sub('foobar', 'bar', text), re-open it for writing (thus clearing old contents), and write your updated text to it.
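A sketch of that close-and-reopen approach, using the names from the question:

import re

with open(filename) as f:
    text = f.read()

text = re.sub('foobar', 'bar', text)

# reopening in 'w' mode truncates the file, so no stale tail is left behind
with open(filename, 'w') as f:
    f.write(text)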
I find it easier to remember to just read it and then write it.
For example:
with open('file') as f:
    data = f.read()
with open('file', 'w') as f:
    f.write('hello')
To anyone who wants to read and overwrite by line, refer to this answer.
https://stackoverflow.com/a/71285415/11442980
filename = input("Enter filename: ")
with open(filename, 'r+') as file:
    lines = file.readlines()
    file.seek(0)
    for line in lines:
        value = int(line)
        file.write(str(value + 1) + '\n')  # keep one value per line
    file.truncate()
Honestly you can take a look at this class that I built which does basic file operations. The write method overwrites and append keeps old data.
class IO:
    def read(self, filename):
        toRead = open(filename, "rb")
        out = toRead.read()
        toRead.close()
        return out

    def write(self, filename, data):
        toWrite = open(filename, "wb")
        toWrite.write(data)
        toWrite.close()

    def append(self, filename, data):
        append = self.read(filename)
        self.write(filename, append + data)
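A usage sketch, with the file name assumed; note the class reads and writes bytes, so data must be bytes objects:

io = IO()
io.write('notes.txt', b'hello')    # overwrites any old content
io.append('notes.txt', b' world')  # keeps old data, adds new
print(io.read('notes.txt'))        # b'hello world'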
Try writing it to a new file:
f = open(filename, 'r+')
f2 = open(filename2, 'a+')
text = f.read()
text = re.sub('foobar', 'bar', text)
f.close()
f2.write(text)
f2.close()