I have a project that was using Scraperwiki to write to their sqlite store, but I need to just write a CSV. The catch is that all the data is stored in a dict, which works fine for writing to sqlite:
scraperwiki.sqlite.save(unique_keys=['somekey'], data=data, table_name='fancy')
I run that after I scrape each row. There's no inherent order to a dict, so I can't just write the values out to CSV. I've been looking over csv.DictWriter and collections.defaultdict, and I'm still wrapping my head around how I'd refactor my code so that I can write data, which is a dictionary, to CSV instead. Here's an example of my code as structured now:
def store_exception(exception, line_number, some_string):
    data = {
        'timestamp': datetime.now(),
        'line_number': line_number,
        'message': exception,
        'string': some_string
    }
    scraperwiki.sqlite.save(unique_keys=['timestamp'], data=data, table_name='error_log')
I think I want something like this though:
def store_exception(exception, line_number, some_string):
    data = {
        'timestamp': datetime.now(),
        'line_number': line_number,
        'message': exception,
        'string': some_string
    }
    d = defaultdict(lambda: "")
    d_order = d['timestamp'], d['line_number'], d['message'], d['string']
    with open('some/path.csv', 'w') as csvfile:
        linewriter = csv.DictWriter(csvfile, d_order, delimiter='|',
                                    quotechar='"', quoting=csv.QUOTE_MINIMAL)
        linewriter.writerow(data)
That seems inefficient, though. Do I need both collections.defaultdict and csv.DictWriter?
This should do it. You don't need the defaultdict at all; d_order just needs to be a list of field names, which in this case are the dict keys:
import csv
from datetime import datetime

def store_exception(exception, line_number, some_string):
    data = {
        'timestamp': datetime.now(),
        'line_number': line_number,
        'message': exception,
        'string': some_string
    }
    d_order = ['timestamp', 'line_number', 'message', 'string']
    with open('some/path.csv', 'w', newline='') as csvfile:
        linewriter = csv.DictWriter(csvfile, d_order, delimiter='|',
                                    quotechar='"', quoting=csv.QUOTE_MINIMAL)
        linewriter.writerow(data)
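One caveat: mode 'w' truncates the file on every call, and no header row is ever written, so each exception overwrites the last. A minimal sketch of an append-style variant that writes the header only once (append_exception_row is a hypothetical helper; the path and field list are placeholders carried over from the question, not a tested drop-in):

import csv
import os

CSV_PATH = 'some/path.csv'  # placeholder path from the question
FIELDS = ['timestamp', 'line_number', 'message', 'string']

def append_exception_row(data):
    # Only write the header the first time the file is created.
    write_header = not os.path.exists(CSV_PATH)
    with open(CSV_PATH, 'a', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, FIELDS, delimiter='|',
                                quotechar='"', quoting=csv.QUOTE_MINIMAL)
        if write_header:
            writer.writeheader()
        writer.writerow(data)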
I tried to import a CSV file in an Odoo custom module, but my logic breaks at the point where I decode the file object. Below is my code:
def import_csv(self, csv_file):
    reader = csv.reader(csv_file)
    next(reader)
    for row in reader:
        record = {
            'name': row[0],
            'component_name': row[1],
            'percentage': row[2],
            'processing_start_date': row[3],
            'finished_real_date': row[4],
        }
        self.env['item.master'].create(record)
def action_import_csv(self):
    outfile = open('test.csv', 'r')
    data_record = outfile.read()
    ir_values = {
        'name': 'test.csv',
        'datas': data_record,
    }
    data_id = self.env['ir.attachment'].sudo().create(ir_values)
    self.import_csv(data_id)
It raises an error:
binascii.Error: Invalid base64-encoded string: number of data
characters (141) cannot be 1 more than a multiple of 4
What is actually wrong in my code?
I've also tried putting this line:
data_record = base64.b64encode(outfile.read())
right after opening the file, but then a different error is raised:
TypeError: a bytes-like object is required, not 'str'
When saving an attachment, you need to base64-encode it; likewise when retrieving it, it must be base64-decoded.
Here is how you might create an attachment instance (in the Odoo 14 shell):
>>> import base64, csv, io
>>> # Example csv data.
>>> data = """a,b,c\n1,2,3\n4,5,6"""
>>> Att = env['ir.attachment']
>>> # Encoding as UTF-8 is not required if the data is already bytes, for example if
>>> # you read the csv file in binary mode ('rb').
>>> att = Att.create({'name': 'foo', 'datas': base64.b64encode(data.encode('utf-8')), 'mimetype': 'text/csv'})
>>> att.datas
b'YSxiLGMKMSwyLDMKNCw1LDY='
>>> env.cr.commit()
Here is how you can retrieve the data, and pass it to the csv reader.
>>> # Load the decoded data into a file-like object that csv.reader can use.
>>> buf = io.StringIO(base64.b64decode(att.datas).decode('utf-8'))
>>> reader = csv.reader(buf)
>>> list(reader)
[['a', 'b', 'c'], ['1', '2', '3'], ['4', '5', '6']]
>>> buf.close()
Your code might look like this (untested):
import base64
import csv
import io

def import_csv(self, attachment):
    # The correct encoding is whatever was used to encode the original file.
    # Modern systems will use UTF-8, but some Windows systems could use
    # UTF-8-SIG, UTF-16, or a legacy 8-bit encoding like cp1252.
    csv_data = base64.b64decode(attachment.datas).decode('utf-8')
    csv_file = io.StringIO(csv_data)
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for row in reader:
        record = {
            'name': row[0],
            'component_name': row[1],
            'percentage': row[2],
            'processing_start_date': row[3],
            'finished_real_date': row[4],
        }
        self.env['item.master'].create(record)
def action_import_csv(self):
    # Read the file in binary mode so base64.b64encode receives bytes.
    with open('test.csv', 'rb') as csv_source:
        data_record = csv_source.read()
    ir_values = {
        'name': 'test.csv',
        'datas': base64.b64encode(data_record),
    }
    attachment = self.env['ir.attachment'].sudo().create(ir_values)
    self.import_csv(attachment)
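If the source file's encoding is not known in advance, one defensive option is to try a few likely encodings in order. This is only a sketch under that assumption; decode_csv_bytes is a hypothetical helper, not part of the original answer:

def decode_csv_bytes(raw):
    # Hypothetical helper: try BOM-aware UTF-8 first, then plain UTF-8,
    # then a common Windows legacy encoding. Adjust the list as needed.
    for encoding in ('utf-8-sig', 'utf-8', 'cp1252'):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError('Could not decode CSV data with any known encoding')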
I have a CSV file generated by LabVIEW code that I wrote; when opened in Excel, all of the data appears under column A. The data looks like this:
RPM,Load Current,Battery Output,Power Capacity
1200,30,12,37
1600,88,18,55
I want to create a JSON file in this format:
{
    "power_capacity_data": {
        "rpm": ["1200", "1600"],
        "load_curr": ["30", "88"],
        "batt_output": ["12", "18"],
        "power_cap": ["37", "55"]
    }
}
This is my code:
import csv
import json

def main():
    # created a dictionary so that i can append data to it afterwards
    power_data = {"rpm": [], "load_curr": [], "batt_output": [], "power_cap": []}
    with open('power1.lvm') as f:
        reader = csv.reader(f)
        # trying to append the data of column "RPM" to dictionary
        rowcount = 0
        for row in reader:
            if rowcount == 0:
                # trying to skip the first row
                rowcount = rowcount + 1
            else:
                power_data['rpm'].append(row[0])
            print(row)
    json_report = {}
    json_report['pwr_capacity_data'] = power_data
    with open('LVMJSON', "w") as f1:
        f1.write(json.dumps(json_report, sort_keys=False, indent=4,
                            separators=(',', ': '), encoding="utf-8", ensure_ascii=False))

if __name__ == "__main__":
    main()
The output JSON file that I am getting is this (please ignore the print(row) statement in my code):
{
    "pwr_capacity_data": {
        "load_curr": [],
        "rpm": [
            "1200,30,12.62,37.88",
            "1600,88,18.62,55.88"
        ],
        "batt_output": [],
        "power_cap": []
    }
}
The whole row is getting saved in the list, but I just want the values under the RPM column to be saved. Can someone help me out with what I may be doing wrong? Thanks in advance.
You could use Python's collections.defaultdict to make it a bit easier, along with a dictionary to map all of your header values:
from collections import defaultdict
import csv
import json

power_data = defaultdict(list)

header_mappings = {
    'RPM': 'rpm',
    'Load Current': 'load_curr',
    'Battery Output': 'batt_output',
    'Power Capacity': 'power_cap'}

with open('power1.lvm', newline='') as f_input:
    csv_input = csv.DictReader(f_input)
    for row in csv_input:
        for key, value in row.items():
            power_data[header_mappings[key]].append(value)

with open('LVMJSON.json', 'w') as f_output:
    json.dump({'power_capacity_data': power_data}, f_output, indent=2)
Giving you an output JSON file looking like:
{
  "power_capacity_data": {
    "batt_output": [
      "12",
      "18"
    ],
    "power_cap": [
      "37",
      "55"
    ],
    "load_curr": [
      "30",
      "88"
    ],
    "rpm": [
      "1200",
      "1600"
    ]
  }
}
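For comparison, the defaultdict(list) is what lets the loop append without first checking whether a key exists. With a plain dict, the same accumulation would use setdefault; a minimal equivalent sketch, not part of the original answer:

import csv

header_mappings = {
    'RPM': 'rpm',
    'Load Current': 'load_curr',
    'Battery Output': 'batt_output',
    'Power Capacity': 'power_cap'}

power_data = {}  # plain dict instead of defaultdict(list)
with open('power1.lvm', newline='') as f_input:
    for row in csv.DictReader(f_input):
        for key, value in row.items():
            # setdefault inserts an empty list the first time a key is seen.
            power_data.setdefault(header_mappings[key], []).append(value)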
I am new to JSON and Python, and I am trying to achieve the following. I need to parse this JSON:
{
    "id": "12345abc",
    "codes": [
        "BSVN1FKW3JKKNNMN",
        "HJYYUKJJL999OJR",
        "DFTTHJJJJ0099JUU",
        "FGUUKHKJHJGJJYGJ"
    ],
    "ctr": {
        "source": "xyz",
        "user_id": "1234"
    }
}
Expected output, normalized on the "codes" values:
ID~CODES~USER_ID
12345abc~BSVN1FKW3JKKNNMN~1234
12345abc~HJYYUKJJL999OJR~1234
12345abc~DFTTHJJJJ0099JUU~1234
12345abc~FGUUKHKJHJGJJYGJ~1234
I started with the code below, but I need help getting to my desired output. The "codes" array can hold any number of comma-separated values. The code below throws "TypeError: string indices must be integers":
#!/usr/bin/python
import os
import json
import csv

f = open('rspns.csv', 'w')
writer = csv.writer(f, delimiter='~')
headers = ['ID', 'CODES', 'USER_ID']
default = ''
writer.writerow(headers)

string = open('sample.json').read().decode('utf-8')
json_obj = json.loads(string)

#print json_obj['id']
#print json_obj['codes']
#print json_obj['codes'][0]
#print json_obj['codes'][1]
#print json_obj['codes'][2]
#print json_obj['codes'][3]
#print json_obj['ctr']['user_id']

for keyword in json_obj:
    row = []
    row.append(str(keyword['id']))
    row.append(str(keyword['codes']))
    row.append(str(keyword['ctr']['user_id']))
    writer.writerow(row)
If your json_obj looks exactly like that, that is, it is a dictionary, then the issue is that when you do:
for keyword in json_obj:
you are iterating over the keys of json_obj, so when you try to access ['id'] on one of those keys (a string), it errors out with string indices must be integers.
You should first get the id and user_id before looping, then loop over json_obj['codes'] and write each row to the CSV, combining the previously fetched id and user_id with the current value from the codes list. Example:
import json
import csv

# json.load reads and decodes the file in one step (Python 3).
with open('sample.json') as source:
    json_obj = json.load(source)

with open('rspns.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='~')
    headers = ['ID', 'CODES', 'USER_ID']
    writer.writerow(headers)
    id = json_obj['id']
    user_id = json_obj['ctr']['user_id']
    for code in json_obj['codes']:
        writer.writerow([id, code, user_id])
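Run against the sample JSON from the question, rspns.csv then contains exactly the expected output:

ID~CODES~USER_ID
12345abc~BSVN1FKW3JKKNNMN~1234
12345abc~HJYYUKJJL999OJR~1234
12345abc~DFTTHJJJJ0099JUU~1234
12345abc~FGUUKHKJHJGJJYGJ~1234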
You don't want to iterate through json_obj, as that is a dictionary and iterating through it yields its keys. The TypeError is caused by trying to index into the keys ('id', 'codes', and 'ctr'), which are strings, as if they were dictionaries.
Instead, you want a separate row for each code in json_obj['codes'] and to use the json_obj dictionary for your lookups:
for code in json_obj['codes']:
    row = []
    row.append(json_obj['id'])
    row.append(code)
    row.append(json_obj['ctr']['user_id'])
    writer.writerow(row)
How can I get a nested dictionary, where both the keys and the subkeys are precisely in the same order as in the csv file?
I tried
import csv
from collections import OrderedDict

filename = "test.csv"

aDict = OrderedDict()
with open(filename, 'r') as f:
    csvReader = csv.DictReader(f)
    for row in csvReader:
        key = row.pop("key")
        aDict[key] = row
where test.csv looks like
key,number,letter
eins,1,a
zwei,2,b
drei,3,c
But the sub-dictionaries are not ordered (the number and letter entries come back in a different order). So how can I populate aDict[key] in an ordered manner?
You have to build the dictionaries and sub-dictionaries yourself from the rows returned by csv.reader, which are plain sequences, instead of using csv.DictReader. Fortunately that's fairly easy:
import csv
from collections import OrderedDict

filename = 'test.csv'

aDict = OrderedDict()
with open(filename, 'r', newline='') as f:
    csvReader = csv.reader(f)
    fields = next(csvReader)
    for row in csvReader:
        temp = OrderedDict(zip(fields, row))
        key = temp.pop("key")
        aDict[key] = temp

import json  # just to create output
print(json.dumps(aDict, indent=4))
Output:
{
    "eins": {
        "number": "1",
        "letter": "a"
    },
    "zwei": {
        "number": "2",
        "letter": "b"
    },
    "drei": {
        "number": "3",
        "letter": "c"
    }
}
This is one way:
import csv
from collections import OrderedDict

filename = "test.csv"

aDict = OrderedDict()
with open(filename, 'r') as f:
    order = next(csv.reader(f))[1:]
    f.seek(0)
    csvReader = csv.DictReader(f)
    for row in csvReader:
        key = row.pop("key")
        aDict[key] = OrderedDict((k, row[k]) for k in order)
csv.DictReader (on Python versions before 3.6) loads the rows into a regular dict and not an ordered one. You'll have to read the csv manually into an OrderedDict to get the order you need:
from collections import OrderedDict

filename = "test.csv"

dictRows = []
with open(filename, 'r') as f:
    rows = (line.strip().split(',') for line in f)
    # read column names from first row
    columns = next(rows)
    for row in rows:
        dictRows.append(OrderedDict(zip(columns, row)))
You can take advantage of the existing csv.DictReader class, but alter the rows it returns. To do that, add the following class to the beginning of your script:
class OrderedDictReader(csv.DictReader):
    def __next__(self):
        # Get a row using csv.DictReader
        row = super().__next__()
        # Create a new row using OrderedDict
        new_row = OrderedDict((k, row[k]) for k in self.fieldnames)
        return new_row
Then, use this class in place of csv.DictReader:
csvReader = OrderedDictReader(f)
The rest of your code remains the same.
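A side note, not from the original answers: on Python 3.7+ plain dicts preserve insertion order, and csv.DictReader yields its rows in file order, so on a current interpreter the code from the question produces ordered sub-dictionaries as-is. A minimal sketch under that assumption:

import csv
import json

# Python 3.7+: plain dicts keep insertion order, so both the outer
# dict and the per-row dicts follow the CSV column order.
aDict = {}
with open("test.csv", newline='') as f:
    for row in csv.DictReader(f):
        key = row.pop("key")
        aDict[key] = dict(row)

print(json.dumps(aDict, indent=4))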
I took a Python beginners course last year. Now I am trying to write a CSV-to-JSON converter. I searched for quite some time and adapted and changed some of the code I found, until the output looked similar to what I want. I am using Python 3.4.2.
@kvorobiev this is an excerpt of my CSV, but it will do for this case. The first conversion works; from the second run onward you will see that the order of the headings changes within the JSON file.
The csv file looks like this:
Document;Item;Category
4;10;C
What I am getting in the output file as of now (after applying the changes from kvorobiev):
[
    {
        "Item": "10",
        "Category": "C",
        "Document": "4"
    };
]
The json string I want to get in the output file should look like:
[
    {
        "Document": "4",
        "Item": "10",
        "Category": "C"
    },
]
You will notice the headings are in the wrong order.
Here is the code:
import json
import csv

csvfile = open('file1.csv', 'r')
jsonfile = open('file1.csv'.replace('.csv', '.json'), 'w')
jsonfile.write('[' + '\n' + ' ')
fieldnames = csvfile.readline().replace('\n', '').split(';')
num_lines = sum(1 for line in open('file.csv')) - 1
reader = csv.DictReader(csvfile, fieldnames)
i = 0
for row in reader:
    i += 1
    json.dump(row, jsonfile, indent=4, sort_keys=False)
    if i < num_lines:
        jsonfile.write(',')
    jsonfile.write('\n')
jsonfile.write(' ' + ']')
print('Done')
Thanks for helping.
Replace line
reader = csv.DictReader(csvfile, fieldnames)
with
reader = csv.DictReader(csvfile, fieldnames, delimiter=';')
Also, you open file1.csv but later count the lines from file.csv:
num_lines = sum(1 for line in open('file.csv')) - 1
Your solution could be reduced to:
import json
import csv

csvfile = open('file1.csv', 'r')
jsonfile = open('file1.csv'.replace('.csv', '.json'), 'w')
jsonfile.write('[\n')
fieldnames = csvfile.readline().replace('\n', '').split(';')
reader = csv.DictReader(csvfile, fieldnames, delimiter=';')
for row in reader:
    json.dump(row, jsonfile, indent=4)
    jsonfile.write(',\n')
jsonfile.write(']\n')
If you want to keep the order of the columns from the csv, you could use collections.OrderedDict:
from collections import OrderedDict
...
for row in reader:
    json.dump(OrderedDict([(f, row[f]) for f in fieldnames]), jsonfile, indent=4)
    jsonfile.write(',\n')
jsonfile.write(']\n')
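Note that both variants above still leave a trailing comma before the closing bracket, matching the desired output in the question but rejected by strict JSON parsers. A sketch of an alternative along the same lines, not from the original answer, that collects the rows first and lets json.dump emit one valid document while preserving column order:

import csv
import json
from collections import OrderedDict

with open('file1.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=';')
    fieldnames = next(reader)
    # One ordered dict per data row, keeping the CSV column order.
    rows = [OrderedDict(zip(fieldnames, row)) for row in reader]

with open('file1.json', 'w') as jsonfile:
    json.dump(rows, jsonfile, indent=4)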