I'm writing a program that obtains data from a database using pyodbc, the end goal being to analyze this data with pandas.
As it stands, my program connects to the database and collects the data I need quite well. However, I'm having trouble organizing or formatting this data in a way that lets me analyze it with pandas, or simply write it out cleanly to a .csv file (I know I can do that with pandas as well).
Here is the basis of my simple program:
from Logger import Logger
import pyodbc
from configparser import ConfigParser
from connectDB import connectDatabase, disconnectDatabase
config = ConfigParser()
config.read('config.ini')
getNeedlesPlaintiffs = config.get('QUERIES', 'pullNeedlesPlaintiffs')
getNeedlesDefendants = config.get('QUERIES', 'pullNeedlesDefendants')
def pullNeedlesData():
    Logger.writeAndPrintLine("Connecting to needles db...", 0)
    cnxn = connectDatabase()
    if cnxn:
        cursor = cnxn.cursor()
        Logger.writeAndPrintLine("Connection successful. Getting Plaintiffs...", 0)
        cursor.execute(getNeedlesPlaintiffs)
        with open('needlesPlaintiffs.csv', 'w') as f:
            for row in cursor.fetchall():
                f.write(str(row))
        Logger.writeAndPrintLine("Plaintiffs written to file, getting Defendants...", 0)
        cursor.execute(getNeedlesDefendants)
        with open('needlesDefendants.csv', 'w') as d:
            for row in cursor.fetchall():
                d.write(str(row))
        disconnectDatabase(cnxn)
        Logger.writeAndPrintLine("Defendants obtained, written to file.", 0)
    else:
        Logger.writeAndPrintLine("Connection to Needles DB Failed.", 2)

if __name__ == "__main__":
    pullNeedlesData()
However, the output I'm getting in the .csv (and console) is simply unworkable. I would like to parse my data into a list of dictionaries, so that I can more easily use it for analysis with pandas.
For example, something like this (which I can then json.loads() into a pandas dataframe):
text_data = '[{"lname": "jones", "fname": "matt", "dob": "01-02-1990", "addr1": "28 sheffield dr"},\
{"lname": "kalinski", "fname": "fred", "dob": "01-02-1980", "addr1": "28 purple st"}, \
{"lname": "kyle", "fname": "ken", "dob": "05-01-1978", "addr1": "28 carlisle dr"}, \
{"lname": "jones", "fname": "matt", "dob": "01-02-1990", "addr1": "new address"}, \
{"lname": "kalinski", "fname": "fred", "dob": "01-02-1980", "addr1": "28 purple st"}, \
{"lname": "kyle", "fname": "ken", "dob": "05-01-1979", "addr1": "other address"}]'
Where I am now, I'm simply at a loss for how one would go about parsing this data from pyodbc.fetchall() into what I know I can work with: a list of dictionaries. Additionally, I would eventually like to write the results to CSV in a readable way.
My data is currently returned in a format like this:
(238384, 'Mr. Nathan Brown', 'Person', datetime.date(1989, 2, 3), '41 Fake Rd 1 \r\nTownName, State 13827')(283928, 'Mr. Logan Green', 'Person', datetime.date(2003, 5, 18), '36 county rd \r\nTownName, State 14432')(38272, 'Mrs. Penellope Blue', 'Person', datetime.date(1988, 1, 27), '123 fake st \r\nTownName, State 14280')(...)
I realize I need to create an empty list object, then parse each row into a dictionary and add it to the list, but I've never had to work with data on this scale, and I'm wondering if there's a library or something that makes this type of work easier to accomplish.
Thank you for any insights.
Why not just import the data directly into pandas?
df = pd.read_sql_query(sql_query, db.connection)
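A minimal sketch reusing the connection helper and query name from your script (the to_csv and to_dict calls just illustrate both of your goals):

import pandas as pd

cnxn = connectDatabase()
if cnxn:
    # pandas runs the query and builds the DataFrame in one step
    df = pd.read_sql_query(getNeedlesPlaintiffs, cnxn)
    df.to_csv('needlesPlaintiffs.csv', index=False)  # clean CSV output
    records = df.to_dict(orient='records')           # list of dictionaries, if you still want one
    disconnectDatabase(cnxn)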
Hello everyone, and happy new year! I'm posting here for the first time since I can't find an answer to my problem; I've been searching for two days now and can't figure out what to do, and it's driving me crazy. Here's my problem:
I'm using the SteamSpy API to download data on the 100 most played games in the last two weeks on Steam. I'm able to dump the downloaded data into a .json file (dumped_data.json in this case) and to import it back into my Python code (I can print it). And while I can print the whole thing, it's not really usable, as it's 1900 lines of more or less useful data, so I want to be able to print only specific items of the dumped data. In this project, I would like to print only the name, developer, owners, and price of the first 10 games on the list. It's my first Python project, and I can't figure out how to do this...
Here's the Python code:
import steamspypi
import json
#gets data using the steamspy api
data_request = dict()
data_request["request"] = "top100in2weeks"
#dumps the data into json file
steam_data = steamspypi.download(data_request)
with open("dumped_data.json", "w") as jsonfile:
json.dump(steam_data, jsonfile, indent=4)
#print values from dumped file
f = open("dumped_data.json")
data= json.load(f)
print(data)
And here's an example of the first item in the JSON file:
{
    "570": {
        "appid": 570,
        "name": "Dota 2",
        "developer": "Valve",
        "publisher": "Valve",
        "score_rank": "",
        "positive": 1378093,
        "negative": 267290,
        "userscore": 0,
        "owners": "100,000,000 .. 200,000,000",
        "average_forever": 35763,
        "average_2weeks": 1831,
        "median_forever": 1079,
        "median_2weeks": 1020,
        "price": "0",
        "initialprice": "0",
        "discount": "0",
        "ccu": 553462
    },
Thanks in advance to everyone that's willing to help me, it would mean a lot.
The following prints the values you desire from the first 10 games:
for game in list(data.values())[:10]:
    print(game["name"], game["developer"], game["owners"], game["price"])
I have a text file that is 26 GB. The line format is as follows:
/type/edition /books/OL10000135M 4 2010-04-24T17:54:01.503315 {"publishers": ["Bernan Press"], "physical_format": "Hardcover", "subtitle": "9th November - 3rd December, 1992", "key": "/books/OL10000135M", "title": "Parliamentary Debates, House of Lords, Bound Volumes, 1992-93", "identifiers": {"goodreads": ["6850240"]}, "isbn_13": ["9780107805401"], "languages": [{"key": "/languages/eng"}], "number_of_pages": 64, "isbn_10": ["0107805405"], "publish_date": "December 1993", "last_modified": {"type": "/type/datetime", "value": "2010-04-24T17:54:01.503315"}, "authors": [{"key": "/authors/OL2645777A"}], "latest_revision": 4, "works": [{"key": "/works/OL7925046W"}], "type": {"key": "/type/edition"}, "subjects": ["Government - Comparative", "Politics / Current Events"], "revision": 4}
I'm trying to get only the last column, which is JSON, and from that JSON I'm only trying to save the "title", "isbn_13", and "isbn_10" fields.
I was able to save only the last column with this code:
import csv
import sys

csv.field_size_limit(sys.maxsize)

# File names: to read in from and read out to
input_file = '../inputFile/ol_dump_editions_2019-10-31.txt'
output_file = '../outputFile/output.txt'

## ==================== ##
##  Using module 'csv'  ##
## ==================== ##
with open(input_file) as to_read:
    with open(output_file, "w") as tmp_file:
        reader = csv.reader(to_read, delimiter="\t")
        writer = csv.writer(tmp_file)
        desired_column = [4]  # text column
        for row in reader:  # read one row at a time
            myColumn = list(row[i] for i in desired_column)  # build the output row (process)
            writer.writerow(myColumn)  # write it
but this doesn't return a proper JSON object; instead it returns everything with doubled quotation marks next to each quote. Also, how would I extract certain values from the JSON as a new JSON object?
EDIT:
"{""publishers"": [""Bernan Press""], ""physical_format"": ""Hardcover"", ""subtitle"": ""9th November - 3rd December, 1992"", ""key"": ""/books/OL10000135M"", ""title"": ""Parliamentary Debates, House of Lords, Bound Volumes, 1992-93"", ""identifiers"": {""goodreads"": [""6850240""]}, ""isbn_13"": [""9780107805401""], ""languages"": [{""key"": ""/languages/eng""}], ""number_of_pages"": 64, ""isbn_10"": [""0107805405""], ""publish_date"": ""December 1993"", ""last_modified"": {""type"": ""/type/datetime"", ""value"": ""2010-04-24T17:54:01.503315""}, ""authors"": [{""key"": ""/authors/OL2645777A""}], ""latest_revision"": 4, ""works"": [{""key"": ""/works/OL7925046W""}], ""type"": {""key"": ""/type/edition""}, ""subjects"": [""Government - Comparative"", ""Politics / Current Events""], ""revision"": 4}"
EDIT 2:
So I'm trying to read this file, which is tab-separated with the following columns:
type - type of record (/type/edition, /type/work etc.)
key - unique key of the record. (/books/OL1M etc.)
revision - revision number of the record
last_modified - last modified timestamp
JSON - the complete record in JSON format
I'm trying to read the JSON column, and from that JSON I only want to get the "title", "isbn_13", and "isbn_10" values as a JSON object and save it to the file as a row, so every row should look like the original but with only those keys and values.
Here's a straightforward way of doing it. You would need to repeat this and extract the desired data from each line of the file as it's being read, line by line (the default way text-file reading is handled in Python); a sketch of that full loop follows the sample output below.
import json
line = '/type/edition\t/books/OL10000135M\t4\t2010-04-24T17:54:01.503315\t{"publishers": ["Bernan Press"], "physical_format": "Hardcover", "subtitle": "9th November - 3rd December, 1992", "key": "/books/OL10000135M", "title": "Parliamentary Debates, House of Lords, Bound Volumes, 1992-93", "identifiers": {"goodreads": ["6850240"]}, "isbn_13": ["9780107805401"], "languages": [{"key": "/languages/eng"}], "number_of_pages": 64, "isbn_10": ["0107805405"], "publish_date": "December 1993", "last_modified": {"type": "/type/datetime", "value": "2010-04-24T17:54:01.503315"}, "authors": [{"key": "/authors/OL2645777A"}], "latest_revision": 4, "works": [{"key": "/works/OL7925046W"}], "type": {"key": "/type/edition"}, "subjects": ["Government - Comparative", "Politics / Current Events"], "revision": 4}'
csv_cols = line.split('\t')
json_data = json.loads(csv_cols[4])
#print(json.dumps(json_data, indent=4))
desired = {key: json_data[key] for key in ("title", "isbn_13", "isbn_10")}
result = json.dumps(desired, indent=4)
print(result)
Output from sample line:
{
    "title": "Parliamentary Debates, House of Lords, Bound Volumes, 1992-93",
    "isbn_13": [
        "9780107805401"
    ],
    "isbn_10": [
        "0107805405"
    ]
}
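A minimal sketch of that line-by-line loop, assuming the input and output paths from your code, and writing one JSON object per output row; .get avoids a KeyError for records missing an ISBN:

import json

with open('../inputFile/ol_dump_editions_2019-10-31.txt') as to_read, \
        open('../outputFile/output.txt', 'w') as tmp_file:
    for line in to_read:
        json_data = json.loads(line.split('\t', 4)[4])  # maxsplit keeps the JSON column intact
        desired = {key: json_data.get(key) for key in ("title", "isbn_13", "isbn_10")}
        tmp_file.write(json.dumps(desired) + '\n')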
So given that your current code returns the following:
result = '{""publishers"": [""Bernan Press""], ""physical_format"": ""Hardcover"", ""subtitle"": ""9th November - 3rd December, 1992"", ""key"": ""/books/OL10000135M"", ""title"": ""Parliamentary Debates, House of Lords, Bound Volumes, 1992-93"", ""identifiers"": {""goodreads"": [""6850240""]}, ""isbn_13"": [""9780107805401""], ""languages"": [{""key"": ""/languages/eng""}], ""number_of_pages"": 64, ""isbn_10"": [""0107805405""], ""publish_date"": ""December 1993"", ""last_modified"": {""type"": ""/type/datetime"", ""value"": ""2010-04-24T17:54:01.503315""}, ""authors"": [{""key"": ""/authors/OL2645777A""}], ""latest_revision"": 4, ""works"": [{""key"": ""/works/OL7925046W""}], ""type"": {""key"": ""/type/edition""}, ""subjects"": [""Government - Comparative"", ""Politics / Current Events""], ""revision"": 4}'
Looks like what you need to do first is replace those doubled double-quotes with regular double quotes; otherwise the string is not parsable:
res = result.replace('""','"')
Now res is convertible to a JSON object:
import json
my_json = json.loads(res)
my_json now looks like this:
{'authors': [{'key': '/authors/OL2645777A'}],
'identifiers': {'goodreads': ['6850240']},
'isbn_10': ['0107805405'],
'isbn_13': ['9780107805401'],
'key': '/books/OL10000135M',
'languages': [{'key': '/languages/eng'}],
'last_modified': {'type': '/type/datetime',
'value': '2010-04-24T17:54:01.503315'},
'latest_revision': 4,
'number_of_pages': 64,
'physical_format': 'Hardcover',
'publish_date': 'December 1993',
'publishers': ['Bernan Press'],
'revision': 4,
'subjects': ['Government - Comparative', 'Politics / Current Events'],
'subtitle': '9th November - 3rd December, 1992',
'title': 'Parliamentary Debates, House of Lords, Bound Volumes, 1992-93',
'type': {'key': '/type/edition'},
'works': [{'key': '/works/OL7925046W'}]}
You can conveniently get any field you want from this object:
my_json['title']
# 'Parliamentary Debates, House of Lords, Bound Volumes, 1992-93'
my_json['isbn_10'][0]
# '0107805405'
Especially because your example is so large, I'd recommend using a specialized library such as pandas, which has a read_csv method, or even dask, which supports out-of-memory operations.
Both of these systems will automatically parse out the quoting for you, and dask will do so in "pieces" directly from disk, so you never have to try to load 26 GB into RAM.
In both libraries, you can then access the columns you want like this:
import pandas as pd

data = pd.read_csv(PATH)
data["ColumnName"]
You can then parse these rows either using json.loads() (import json) or you can use the pandas/dask json implementations. If you can give some more details of what you're expecting, I can help you draft a more specific code example.
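For instance, a rough sketch that streams the file in chunks with pandas (assuming the tab-separated layout and dump file name from the question; the column names here are just convenience labels):

import csv
import json
import pandas as pd

cols = ['type', 'key', 'revision', 'last_modified', 'json']
chunks = pd.read_csv('ol_dump_editions_2019-10-31.txt', sep='\t', names=cols,
                     quoting=csv.QUOTE_NONE, chunksize=100000)
for chunk in chunks:
    # each chunk is an ordinary DataFrame, so only one slice is in RAM at a time
    for record in chunk['json'].map(json.loads):
        print(record['title'], record.get('isbn_13'), record.get('isbn_10'))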
Good luck!
I saved your data to a file to see if I could read just the rows; let me know if this works:
import json

# zzread holds the whole file contents as one string
lines = zzread.split('\n')
temp = []
for to_read in lines:
    if len(to_read) == 0:
        break
    new_to_read = '{' + to_read.split('{', 1)[1]
    temp.append(json.loads(new_to_read))
for row in temp:
    print(row['isbn_13'])
If that works this should create a json for you:
lines = zzread.split('\n')
temp = []
for to_read in lines:
    if len(to_read) == 0:
        break
    new_to_read = '{' + to_read.split('{', 1)[1]
    temp.append(json.loads(new_to_read))

new_json = []
for row in temp:
    new_json.append({'title': row['title'], 'isbn_13': row['isbn_13'], 'isbn_10': row['isbn_10']})
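And to save it with one object per row, as you described (a small sketch, assuming an output path):

with open('output.txt', 'w') as out_file:
    for row in new_json:
        out_file.write(json.dumps(row) + '\n')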
I'm building a small API to interact with our database for other projects. I've built the database and have the API functioning fine, however, the data I get back isn't structured how I want it.
I am using Python with Flask/Flask-Restful for the API.
Here is a snippet of my Python that handles the interaction:
class Address(Resource):
    def get(self, store):
        print('Received a request at ADDRESS for Store ' + store)
        conn = sqlite3.connect('store-db.db')
        cur = conn.cursor()
        addresses = cur.execute('SELECT * FROM Sites WHERE StoreNumber like ' + store)
        for adr in addresses:
            return (adr, 200)
If I make a request to the /sites/42 endpoint, where 42 is the site id, this is what I'll receive:
[
    "42",
    "5000 Robinson Centre Drive",
    "",
    "Pittsburgh",
    "PA",
    "15205",
    "(412) 787-1330",
    "(412) 249-9161",
    "",
    "Dick's Sporting Goods"
]
Here is how it is structured in the database (the original post included a screenshot of the table schema).
Ultimately I'd like to use the column name as the Key in the JSON that's received, but I need a bit of guidance in the right direction so I'm not Googling ambiguous terms hoping to find something.
Here is an example of what I'd like to receive after making a request to that endpoint:
{
    "StoreNumber": "42",
    "Street": "5000 Robinson Centre Drive",
    "StreetSecondary": "",
    "City": "Pittsburgh",
    "State": "PA",
    "ZipCode": "15205",
    "ContactNumber": "(412) 787-1330",
    "XO_TN": "(412) 249-9161",
    "RelocationStatus": "",
    "StoreType": "Dick's Sporting Goods"
}
I'm just looking to get some guidance on if I should change how my data is structured in the database (i.e. I've seen some just put the JSON in their database, but I think that's messy) or if there's a more intuitive method I could use to control my data.
Updated Code using Accepted Answer
class Address(Resource):
    def get(self, store):
        print('Received a request at ADDRESS for Store ' + store)
        conn = sqlite3.connect('store-db.db')
        cur = conn.cursor()
        addresses = cur.execute('SELECT * FROM Sites WHERE StoreNumber like ' + store)
        for r in addresses:
            column_names = ["StoreNumber", "Street", "StreetSecondary", "City", "State", "ZipCode",
                            "ContactNumber", "XO_TN", "RelocationStatus", "StoreType"]
            data = [r[0], r[1], r[2], r[3], r[4], r[5], r[6], r[7], r[8], r[9]]
            datadict = {column_names[itemindex]: item for itemindex, item in enumerate(data)}
            return (datadict, 200)
You could just convert your list to a dict and then parse it to a JSON string before passing it back out.
# These are the names of the columns in your database
>>> column_names = ["storeid", "address", "etc"]
# This is the data coming from the database.
# All data is passed as you are using SELECT * in your query.
>>> data = [42, "1 the street", "blah"]
# This is a quick notation for creating a dict from a list:
# enumerate gives us a list index and a list item;
# as the columns are in the same order as the data, we can use the list index to pull out the column_name.
>>> datadict = {column_names[itemindex]: item for itemindex, item in enumerate(data)}
# This just prints datadict in my terminal
>>> datadict
We now have a named dict containing your data and the column names.
{'etc': 'blah', 'storeid': 42, 'address': '1 the street'}
Now dump the datadict to a string so that it can be sent to the frontend.
>>> import json
>>> json.dumps(datadict)
The dict has now been converted to a string.
'{"etc": "blah", "storeid": 42, "address": "1 the street"}'
This would require no change to your database but the script would need to know about the column names or retrieve them dynamically using some SQL.
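For the dynamic route, sqlite3 exposes the column names on the cursor after a query via cursor.description. A sketch (it also uses a parameterized query, which is safer than concatenating store into the SQL):

import sqlite3

conn = sqlite3.connect('store-db.db')
cur = conn.cursor()
cur.execute('SELECT * FROM Sites WHERE StoreNumber = ?', (store,))
row = cur.fetchone()
# the first element of each cursor.description entry is the column name
column_names = [desc[0] for desc in cur.description]
datadict = dict(zip(column_names, row))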
If the data in the database is in the correct format for passing to the frontend then you shouldn't need to change the database structure. If it was not in the correct format then you could either change the way it was stored or change your SQL query to manipulate it.
I have lists of tuples, where some of the tuples contain lists as the first element. I need to convert this list into json array.
I have also read up on this issue on StackOverflow, such as here, here, and here, among many others; however, none of them directly address the issue.
I have tried the following approach, as found here
In this approach, each list that I have is generated by iterating through each line of a txt file and performing an operation on each line.
Example code:
infile:
Dalmations are white and yes they are pets who live at home.
Huskies tend to be grey and can be pets who also live at home.
Pitbulls usually are beige and can be pets who sometimes live at home.
sample code:
inFile = '/data/sample.txt'
f = open(inFile, 'r').readlines()
def Convert(tup, di):
for a, b in tup:
di.setdefault(a[0]).append(b)
return di
dictionary = {}
for line in f:
keyTerms = extractTerms(line)
print keyTerms
Result of extractTerms:
[(u'dalmations', u'dog'), (u'white', u'color'), (u'yes', u'pet'), (u'house', u'location')]
[(u'huskies', u'dog'), (u'grey', u'color'), (u'yes', u'pet'),(u'house',u'location')]
[(u'pitbulls', u'dog'), (u'beige', u'color'), (u'yes', u'pet'),(u'house',u'location')]
allTerms = [(expandAllKeyTerms(a), b) for (a,b) in keyTerms]
print allTerms
[([u'dalmations', u'dalmation', u'dalmashun', u'dalmationz'], u'dog'), ([u'white'], u'color'), ([u'yes'], u'pet'), ([u'home'], u'location')]
[([u'huskies', u'husky', u'huskies'], u'dog'), ([u'grey'], u'color'), ([u'yes'], u'pet'), ([u'home'], u'location')]
[([u'pitbulls'], u'dog'), ([u'beige'], u'color'), ([u'yes'], u'pet'), ([u'home'], u'location')]
new = Convert(allTerms, dictionary)
print new
sample (wrong) final output:
{u'dog': [u'dalmations', u'huskies', u'pitbulls'], u'color': [u'white', u'grey', u'beige'], u'pet': [u'yes', u'yes', u'yes'], u'location': [u'home', u'home', u'home']}
I have also tried using import json with json.dumps(dictionary); however, it also associates all of the values with the one corresponding key instead of maintaining each individual line as its own entry.
My goal is to arrive at the following format
[{u'dog': [u'dalmations'], u'color': [u'white'], u'pet': [u'yes'], u'location': [u'home']},
 {u'dog': [u'huskies'], u'color': [u'grey'], u'pet': [u'yes'], u'location': [u'home']},
 {u'dog': [u'pitbulls'], u'color': [u'beige'], u'pet': [u'yes'], u'location': [u'home']}]
Is there a way to arrive at my desired output using the json library or another list comprehension?
The code below seems to work. Might work for your whole data set depending on what you're looking to do (how flexible you need things to be).
#!/usr/bin/env python3
import json
a = [
    [
        ('dalmations', 'dog'),
        ('white', 'color'),
        ('yes', 'pet'),
        ('house', 'location'),
    ],
    [
        ('huskies', 'dog'),
        ('grey', 'color'),
        ('yes', 'pet'),
        ('house', 'location'),
    ],
    [
        ('pitbulls', 'dog'),
        ('beige', 'color'),
        ('yes', 'pet'),
        ('house', 'location'),
    ],
]

b = []
for row in a:
    data = {}
    for word in row:
        data[word[1]] = [word[0]]
    b.append(data)

print(json.dumps(b))
This gives:
[
    {
        "color": ["white"],
        "pet": ["yes"],
        "dog": ["dalmations"],
        "location": ["house"]
    },
    {
        "color": ["grey"],
        "pet": ["yes"],
        "dog": ["huskies"],
        "location": ["house"]
    },
    {
        "color": ["beige"],
        "pet": ["yes"],
        "dog": ["pitbulls"],
        "location": ["house"]
    }
]
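As a side note, since each tuple is (value, label), the translation loop can also be written as a nested comprehension:

b = [{label: [value] for value, label in row} for row in a]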
I'm trying to convert a JSON file to CSV format (in memory), so that I can pass it to another Transformer in Mulesoft. Here is a snippet of the JSON:
[
    {
        "observationid": 1,
        "fkey_observation": 1,
        "value": 1,
        "participantid": null,
        "uom": "ppb",
        "finishtime": 1008585047000,
        "starttime": 1008581447000,
        "observedproperty": "NO2",
        "measuretime": 1008581567000,
        "measurementid": 1,
        "longitude": 3.1415,
        "identifier": "Test-1",
        "latitude": 10
    },
    {
        "observationid": 1,
        "fkey_observation": 1,
        "value": 12,
        "participantid": null,
        "uom": "ppb",
        "finishtime": 1008585047000,
        "starttime": 1008581447000,
        "observedproperty": "SO2",
        "measuretime": 1008582047000,
        "measurementid": 2,
        "longitude": 5,
        "identifier": "Test-1",
        "latitude": 11
    }
]
Essentially, this should create a CSV (in memory) with 2 rows, that looks like this:
1,1,1,N,ppb,1008585047000,1008581447000,NO2,1008581567000,1,3.1415,Test-1,10
1,1,12,N,ppb,1008585047000,1008581447000,SO2,1008582047000,2,5,Test-1,11
Currently, the output comes out like this, which is wrong:
[1 1 1 None u'ppb' 1008585047000L 1008581447000L u'NO2' 1008581567000L 1 3.1415 u'Test-1' 10]
[1 1 12 None u'ppb' 1008585047000L 1008581447000L u'SO2' 1008582047000L 2 5 u'Test-1' 11]
I believe the 'u' bit refers to Unicode, but I don't know how to change the encoding.
Any help would be greatly appreciated!
Here is the Python code I have so far:
import json
import cStringIO

f = open('test.json')
data = json.load(f)
f.close()

output = cStringIO.StringIO()
for item in data:
    output.write(str([item['observationid'], item['fkey_observation'], item['value'], item['participantid'], item['uom'], item['finishtime'], item['starttime'], item['observedproperty'], item['measuretime'], item['measurementid'], item['longitude'], item['identifier'], item['latitude']]) + '\n')

contents = output.getvalue()
print contents
EDIT
Hi guys, slight change of plan.
Essentially, I have a string object, but it is actually structured like a JSON file:
"[{observationid=1, fkey_observation=1, value=1, participantid=null, uom=ppb, finishtime=2001-12-17 10:30:47.0, starttime=2001-12-17 09:30:47.0, observedproperty=NO2, measuretime=2001-12-17 09:32:47.0, measurementid=1, longitude=3.1415, identifier=CITISENSE-Test-00000001, latitude=10}, {observationid=1, fkey_observation=1, value=12, participantid=null, uom=ppb, finishtime=2001-12-17 10:30:47.0, starttime=2001-12-17 09:30:47.0, observedproperty=SO2, measuretime=2001-12-17 09:40:47.0, measurementid=2, longitude=5, identifier=CITISENSE-Test-00000001, latitude=11}, {observationid=1, fkey_observation=1, value=7000, participantid=null, uom=ppb, finishtime=2001-12-17 10:30:47.0, starttime=2001-12-17 09:30:47.0, observedproperty=NO2, measuretime=2001-12-17 09:52:47.0, measurementid=3, longitude=6, identifier=CITISENSE-Test-00000001, latitude=9}, {observationid=2, fkey_observation=2, value=5, participantid=null, uom=ppb, finishtime=2001-12-18 10:30:47.0, starttime=2001-12-18 09:30:47.0, observedproperty=SO2, measuretime=2001-12-18 09:32:47.0, measurementid=4, longitude=7, identifier=CITISENSE-Test-00000001, latitude=8}, {observationid=2, fkey_observation=2, value=6, participantid=null, uom=ppb, finishtime=2001-12-18 10:30:47.0, starttime=2001-12-18 09:30:47.0, observedproperty=PM10, measuretime=2001-12-18 09:34:47.0, measurementid=5, longitude=8, identifier=CITISENSE-Test-00000001, latitude=10}, {observationid=3, fkey_observation=3, value=10000, participantid=null, uom=ppb, finishtime=2001-12-19 10:30:47.0, starttime=2001-12-19 09:30:47.0, observedproperty=SO2, measuretime=2001-12-19 09:38:47.0, measurementid=6, longitude=9, identifier=CITISENSE-Test-00000001, latitude=11.2}]"
How do I go about converting this to CSV? I can't use the json module as it is not a JSON file.
Here is my approach: use csv.DictWriter to handle converting from a dictionary to a row of CSV data:
import csv
import json
from cStringIO import StringIO

with open('test.json') as f:
    my_data = json.load(f)

headers = [
    'observationid', 'fkey_observation', 'value',
    'participantid', 'uom', 'finishtime', 'starttime',
    'observedproperty', 'measuretime', 'measurementid',
    'longitude', 'identifier', 'latitude']

buffer = StringIO()
writer = csv.DictWriter(buffer, headers)
for row in my_data:
    writer.writerow(row)

print buffer.getvalue()
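If you also want a header row (your expected output doesn't have one), DictWriter can emit it before the loop:

writer.writeheader()  # writes the headers list as the first CSV row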
Here's a little snippet I wrote up; I think it should handle your scenario and give you a list of lists. Ereli is onto something with that module though; it might make your life easier. But in the meantime, maybe this will help.
import json

myFile = open('myJson.json', 'r+')
myData = json.load(myFile)
myFile.close()

myList = []
for x in range(0, len(myData)):
    myList.append([])
    for key in myData[x].keys():
        value = myData[x][key]
        if isinstance(value, (str, unicode)):
            value = value.encode('ascii', 'ignore')
        myList[x].append(value)

print myList
You should probably consider using something like csv.writer. It will handle the escaping and delimiter settings for you.
See this example for Python 3:
import csv

with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for line in data:
        writer.writerow(line)
It can also be used with cStringIO.
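For the in-memory requirement from the question, the same writer works over a string buffer. A sketch in Python 2 to match the question's code (note that None values come out as empty fields rather than the 'N' shown in the expected rows):

import csv
from cStringIO import StringIO

output = StringIO()
writer = csv.writer(output)
fields = ['observationid', 'fkey_observation', 'value', 'participantid',
          'uom', 'finishtime', 'starttime', 'observedproperty',
          'measuretime', 'measurementid', 'longitude', 'identifier',
          'latitude']
for item in data:
    writer.writerow([item[k] for k in fields])
contents = output.getvalue()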