I am trying to parse a "complicated" JSON string that is returned to me by an API.
It looks like this:
{
  "data": [
    ["Distance to last strike", "23.0", "miles"],
    ["Time of last strike", "1/14/2022 9:23:42 AM", ""],
    ["Number of strikes today", "1", ""]
  ]
}
While the end goal is to extract the distance, date/time, and count, for now I am just trying to successfully get the distance.
My Python script is:
import requests
import json
response_API = requests.get('http://localhost:8998/api/extra/lightning.json')
data = response_API.text
parse_json = json.loads(data)
value = parse_json['Distance to last strike']
print(value)
This does not work. If I change the value line to
value = parse_json['data']
then the entire string I listed above is returned.
I am hoping it's just a simple formatting issue. Suggestions?
You have an object with a list of lists. If you fetch
value = parse_json['data']
then you will have a list containing three lists. So:
print(value[0][1])
will print "23.0".
Hi. So basically I have 2 arrays. For the sake of simplicity, the following:
array_notepad = []
array_images = []
Some magic happens and they are populated, i.e. data is loaded: array_notepad is read from a notepad file, whilst array_images is populated with the RGB values from a folder containing images.
How do I use array_notepad as a label of array_images?
i.e. the label of array_images[0] is array_notepad[0], array_images[1] is array_notepad[1], array_images[2] is array_notepad[2], and so on until array_images[999] is array_notepad[999].
If it makes any difference I am using glob and cv2 to read the image data, whilst normal python file reader to read the content in the notepad.
Thanks a lot for your help!
Your question isn't entirely clear on what your expected output should be. You mention 'label' - to me it sounds like you're describing key-value pairs i.e. a dictionary.
In which case you should be able to use the zip function as described in this question: Convert two lists into a dictionary
It sounds like you want to create a dictionary from the 2 lists. If so, you could do as follows.
array_notepad = ['label1', 'label2', 'label3']
array_images = ['rgb1', 'rgb2', 'rgb3']
d = { label: value for label, value in zip(array_notepad, array_images) }
d
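Applied to the image/label use case, a minimal sketch (the array contents here are made up for illustration):

```python
array_notepad = ["cat", "dog", "bird"]  # hypothetical labels read from the text file
array_images = [[255, 0, 0], [0, 255, 0], [0, 0, 255]]  # hypothetical per-image RGB data

# Pair each image with its label by position:
labeled = list(zip(array_images, array_notepad))
print(labeled[0])  # ([255, 0, 0], 'cat')

# If the labels are unique, dict(zip(...)) is shorthand for the comprehension above:
by_label = dict(zip(array_notepad, array_images))
print(by_label["dog"])  # [0, 255, 0]
```

Note that `zip` stops at the shorter list, so mismatched lengths (e.g. a missing label) will silently drop images rather than raise an error.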
I have a series of files, each with a JSON, that I'd like to read into Pandas. This is pretty straightforward:
import json
from pandas import json_normalize

data_unfiltered = [json.load(open(jf)) for jf in json_input_files]
# next call used to be df = pandas.DataFrame(data_unfiltered)
# instead, json_normalize flattens nested dicts
df = json_normalize(data_unfiltered)
Now, a new wrinkle. Some of these input files no longer contain just a plain JSON object but instead a JSON array of objects: [ { JSON }, { JSON }... ].
json.load is pretty great because it reads a whole file and parses it straight into Python objects; I don't have to parse the file at all. How would I now turn a list of files, some of which contain a single JSON object and some of which contain a list of JSON objects, into a flat list of parsed JSON objects?
Bonus question: I used to be able to add the filename into each JSON with
df['filename'] = pandas.Series(json_input_files).values
but now I can't do that any more since now one input file might correspond to many JSONs in the output list. How can I add the filenames to the JSONs before I put them into a pandas dataframe?
Edit: Work in progress, per request in comments:
data_unfiltered = []
for jf_file in json_input_files:
    jf = open(jf_file)
    if isinstance(jf, list):  # this is obviously wrong
        for j in jf:
            d = json.load(j)  # this is what I need to fix
            d['details'] = jf_file
            data_unfiltered.append(d)
    else:  # not a list, assume dict
        d = json.load(jf)
        d['details'] = jf_file
        data_unfiltered.append(d)
but json.load() worked perfectly for what I wanted (file object to JSON) and has no equiv for arrays. I figure I have to manually parse the file into a list of blobs and then do json.loads() on each blob? That's pretty kludgey though.
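One way to make that loop work without any manual parsing (a sketch, with `load_json_records` as a hypothetical helper name): call json.load once per file, then check isinstance on the parsed value rather than the file object, wrapping single objects in a one-element list so both cases go through the same path:

```python
import json

def load_json_records(json_input_files):
    """Parse each file once; wrap single objects so everything becomes a list of dicts."""
    records = []
    for path in json_input_files:
        with open(path) as fh:
            parsed = json.load(fh)  # may be a dict or a list, depending on the file
        if not isinstance(parsed, list):  # test the parsed value, not the file object
            parsed = [parsed]
        for d in parsed:
            d['details'] = path  # keep the source filename on every record
            records.append(d)
    return records
```

This also answers the bonus question: the filename is attached per record before the list goes into `json_normalize`, so a file containing many objects contributes many rows, each tagged with its source.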
High-level description of what I want: I want to receive a JSON response detailing certain values of fields/features, say {a: 1, b: 2, c: 3}, as a Flask (JSON) request. Then I want to convert the resulting Python dict into an R dataframe with rpy2 (a single row), and feed it into a model in R that expects input where each column is a factor. I usually use Python for this sort of thing and serialize a vectorizer object from sklearn, but this particular analysis needs to be done in R.
So here is what I'm doing so far.
import os
import rpy2.robjects as robjects
from rpy2.robjects.packages import STAP

model = os.path.join('model', 'rsource_file.R')
with open(model, 'r') as f:
    string = f.read()
model = STAP(string, "model")
data_r = robjects.DataFrame(data)
data_factored = model.prepdata(data_r)
result = model.predict(data_factored)
The relevant R functions from rsource_file.R are:
prepdata = function(row){
  for(v in vars) if(typeof(row[,v])=="character") row[,v] = as.factor(row[,v], levs[0,v])
  modm2 = model.matrix(frm, data=tdz2, contrasts.arg = c1, xlev = levs)
}
where the contrasts and levels have been pre-extracted from an existing dataset like so:
#vars = vector of columns of interest
load(data.Rd)
for(v in vars) if(typeof(data[,v])=="character") data[,v] = as.factor(data[,v])
frm = ~ weightedsum_of_things #function mapped, causes no issue
modm= (model.matrix(frm,data=data))
levs = lapply(data, levels)
c1 = attributes(modm)$contrasts
Calling prepdata does not give me what I want, which is for the new dataframe (built from the JSON request as data_r) to be properly turned into a vector of "factors" with the same encoding by which the elements of the data.Rd dataset were transformed.
Thank you for your assistance, will upvote.
More detail: So what my code is attempting to do is map the labels() method over the dataset to extract a list of lists of possible "levels" for a factor -- and then, for matching values in the new input, call factor() with the new data row as well as the corresponding set of levels, levs[0,v].
This throws an error that you can't use factor if there isn't more than one level. I think this might have something to do with the labels/levels difference? I'm calling levs[,v] to get the element of the return value of lapply(data, levels) corresponding to the "title" v (a string). I extracted the levels from the dataset -- but referencing them in the body of prepdata this way doesn't seem to work. Do I need to extract labels instead? If so, how can I do that?
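As a language-agnostic illustration of the fixed-levels idea (a plain-Python sketch with made-up column names, not rpy2 or the asker's R code): encode each categorical value against a level list extracted from the training data, so new rows get exactly the same integer codes the model was trained with, and unseen levels fail loudly instead of silently shifting the encoding.

```python
# Levels pre-extracted from the training data (analogous to levs = lapply(data, levels) in R)
levels = {"color": ["blue", "green", "red"], "size": ["L", "M", "S"]}

def encode_row(row, levels):
    """Map each categorical value to its position in the training-time level list."""
    encoded = {}
    for col, value in row.items():
        if col in levels:
            if value not in levels[col]:
                raise ValueError(f"unseen level {value!r} for column {col!r}")
            encoded[col] = levels[col].index(value)
        else:
            encoded[col] = value  # non-categorical columns pass through unchanged
    return encoded

print(encode_row({"color": "red", "size": "M", "weight": 1.5}, levels))
# {'color': 2, 'size': 1, 'weight': 1.5}
```

The R equivalent of the "unseen level" guard is passing the stored levels via factor(x, levels = ...) / the xlev argument, rather than re-deriving levels from the single new row (which is what makes factor complain about having only one level).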
I have a file that has columns that look like this:
Column1,Column2,Column3,Column4,Column5,Column6
1,2,3,4,5,6
1,2,3,4,5,6
1,2,3,4,5,6
1,2,3,4,5,6
1,2,3,4,5,6
1,2,3,4,5,6
Column1,Column3,Column2,Column6,Column5,Column4
1,3,2,6,5,4
1,3,2,6,5,4
1,3,2,6,5,4
Column2,Column3,Column4,Column5,Column6,Column1
2,3,4,5,6,1
2,3,4,5,6,1
2,3,4,5,6,1
The columns randomly re-order in the middle of the file, and the only way to know the order is to look at the last set of headers right before the data (Column1,Column2, etc.). (I've also simplified the data so that it's easier to picture. In real life, there is no way to tell the data apart, as they are all large integer values that could really go into any column.)
Obviously this isn't very SQL Server friendly when it comes to using BULK INSERT, so I need to find a way to arrange all of the columns in a consistent order that matches my table's column order in my SQL database. What's the best way to do this? I've heard Python is the language to use, but I have never worked with it. Any suggestions/sample scripts in any language are appreciated.
A solution in Python:
I would read line by line and look for headers. When I find a header, I use it to figure out the order (somehow). Then I pass that order to itemgetter, which does the magic of reordering elements:
from operator import itemgetter

def header_parse(line, order_dict):
    header_info = line.split(',')
    indices = [None] * len(header_info)
    for i, col_name in enumerate(header_info):
        indices[order_dict[col_name]] = i
    return indices

def fix(fname, foutname):
    with open(fname) as f, open(foutname, 'w') as fout:
        # Assume the first line is a "header" and gives the order to use
        # for the rest of the file
        line = f.readline()
        order_dict = dict((name, i) for i, name in enumerate(line.strip().split(',')))
        reorder_magic = itemgetter(*header_parse(line.strip(), order_dict))
        for line in f:
            if line.startswith('Column'):  # somehow determine if this is a "header"
                reorder_magic = itemgetter(*header_parse(line.strip(), order_dict))
            else:
                fout.write(','.join(reorder_magic(line.strip().split(','))) + '\n')

if __name__ == '__main__':
    import sys
    fix(sys.argv[1], sys.argv[2])
Now you can call it as:
python fixscript.py badfile goodfile
Since you didn't mention a specific problem, I'm going to assume you're having problems coming up with an algorithm.
For each row,
    Parse the row into fields.
    If it's the first header line,
        Output the header.
        Create a map of field names to positions.
            %map = map { $fields[$_] => $_ } 0..$#fields;
        Create a map of original positions to new positions.
            @map = @map{ @fields };
    If it's a header line other than the first,
        Update the map of original positions to new positions.
            @map = @map{ @fields };
    If it's not a header line,
        Reorder the fields.
            @fields[ @map ] = @fields;
        Output the row.
(Snippets are in Perl.)
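For comparison, the same algorithm sketched in Python (reorder_rows is a hypothetical name; header detection relies on the sample data's "Column" prefix, and the first line is assumed to be a header):

```python
def reorder_rows(lines):
    """Return rows reordered to match the first header's column order."""
    out = []
    first_header = None
    positions = None
    for line in lines:
        fields = line.strip().split(',')
        if fields[0].startswith('Column'):  # header detection, as in the sample data
            if first_header is None:
                first_header = fields
                out.append(','.join(fields))  # output the first header only
            # For each output column name, find its position in the current header
            name_to_pos = {name: i for i, name in enumerate(fields)}
            positions = [name_to_pos[name] for name in first_header]
        else:
            out.append(','.join(fields[p] for p in positions))
    return out
```

Later headers are dropped from the output (handy before a BULK INSERT), and every data row comes out in the first header's order.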
This can be fixed easily in two steps:
detect each new header line as it appears in the file
re-map and re-output the rows that follow it in a consistent (alphabetically sorted) column order
Here is an example of how you can go about it:
def is_header(line):
    return line.find('Column') >= 0

def process(lines):
    headers_map = None
    for line in lines:
        line = line.strip()
        if is_header(line):
            headers = list(enumerate(line.split(",")))
            headers_map = dict(headers)
            headers.sort(key=lambda iv: headers_map[iv[0]])
            print(",".join(h for i, h in headers))
            continue
        values = list(enumerate(line.split(",")))
        values.sort(key=lambda iv: headers_map[iv[0]])
        print(",".join(v for i, v in values))

if __name__ == "__main__":
    import sys
    process(open(sys.argv[1]))
You can also change the is_header function to correctly identify headers in real-world cases.