I'm trying to handle the JSON response from a Python Requests call to an API. Python is a language I'm still learning.
Here's the structure of the sample returned JSON data:
{"sports":[{"searchtype":"seasonal", "sports":["'baseball','football','softball','soccer','summer','warm'","'hockey','curling','luge','snowshoe','winter','cold'"]}]}
Currently, I'm parsing and writing output to a file like this:
import pprint

output = response.json()
results = output['sports'][0]['sports']
if results:
    with open(filename, "w") as fileout:
        fileout.write(pprint.pformat(results))
Giving me this as my file:
[u"'baseball','football','softball','soccer','summer','warm'",
"'hockey','curling','luge','snowshoe','winter','cold'"]
Since I'm basically creating double-quoted JSON arrays of comma-separated strings, how can I manipulate each string to print only the comma-separated values I want? In this case, everything except the fifth column, which represents the season.
[u"'baseball','football','softball','soccer','warm'",
"'hockey','curling','luge','snowshoe','cold'"]
Ultimately, I'd like to strip away the unicode markers too, since I have no non-ASCII characters. I currently do this after the fact with a language I'm more familiar with (AWK). My desired output is really:
'baseball','football','softball','soccer','warm'
'hockey','curling','luge','snowshoe','cold'
Your results is actually a list of strings. To get your desired output, you can do it like this, for example:
if results:
    with open(filename, "w") as fileout:
        for line in results:
            parts = line.split(',')
            del parts[4]  # drop the fifth column (the season)
            fileout.write(','.join(parts) + '\n')
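Note that writing the strings themselves, rather than pprint.pformat of the whole list, also takes care of the unicode markers: the u'' prefix is just Python 2's repr of a unicode string, not part of your data, so no AWK post-processing is needed.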
I have a large JSON file to parse with Python, but it's incomplete: the closing brackets at the end are missing. The file consists of one big JSON object that contains smaller JSON objects inside. Every object inside the outer object is complete; only the final closing brackets are missing.
For example, its structure is like this:
{bigger_json_head:value, another_key:[{small_complete_json1},{small_complete_json2}, ...,{small_complete_json_n},
So the final "]}" is missing. However, each small object forms a single row, so when I tried to print each line of the file, I got each JSON object as a single string.
So I've tried:
line_arr = []
with open("file.json", "r", encoding="UTF-8") as f:
    for line in f.readlines():
        line_arr.append(line)
I expected to get a list with each line's JSON object string as an element, and then I tried this on the result:
for json_line in line_arr:
    try:
        json_str = json.loads(json_line)
        print(json_str)
    except json.decoder.JSONDecodeError:
        continue
I expected this block to print every JSON string to the console except the first and last ones. However, it printed nothing and only raised the decode error.
Has anyone solved a similar problem? Please help, and thank you.
If the faulty JSON file is only missing the final "]}", then you can actually fix the string before parsing it.
Here is an example code to illustrate:
with open("file.json","r",encoding="UTF-8") as f:
faulty_json_str = f.read()
fixed_json_str = faulty_json_str + ']}'
json_obj = json.loads(fixed_json_str)
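If you are not sure exactly which closers are missing, here is a sketch of a more general repair. This is my own addition: close_json is a hypothetical helper, and it assumes only trailing ']' and '}' characters are absent from an otherwise well-formed file.

def close_json(s):
    """Append whichever closers are still missing at the end of s.

    Assumption: s is valid JSON except for trailing ']'/'}' characters.
    """
    stack = []          # expected closers, innermost last
    in_string = False
    escaped = False
    for ch in s:
        if in_string:
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in '{[':
            stack.append('}' if ch == '{' else ']')
        elif ch in '}]':
            stack.pop()
    return s.rstrip().rstrip(',') + ''.join(reversed(stack))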
In my Python script, I am struggling with XML files. I am using urllib to download XML files and convert them to a string. Next, I'd like to parse the XML file.
Sample link of a typical file
import urllib.request
data = urllib.request.urlopen(link).read()
data = str(data)
data2 = data.replace('\n', '')
I wanted to strip data of \n, but data2 is not stripped of the \n characters. Sample output for data2 looks like this:
SwapInvolved>\n </transactionCoding>\n <transactionTimeliness>\n <value></value>\n
Why?
Also, since the file I pull is XML, I would like to parse it with ElementTree, but I get an error:
e = xml.etree.ElementTree.parse(data).getroot()
OSError: [Errno 36] File name too long:
In the end, I want to fetch the XML from the link and parse it, but I am clearly doing something wrong.
Your first problem is that you need to escape the '\n' in str.replace(), because your string contains literal backslash-and-n sequences: calling str() on the bytes returned by urlopen() gives you string representations of linefeeds, not actual linefeeds.
Do this instead: data2 = data.replace(r"\n", "")
Your second problem is that xml.etree.ElementTree.parse() is expecting a filename, not a string. Use xml.etree.ElementTree.fromstring() instead.
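Putting both fixes together, here is a minimal sketch of the cleaner flow, assuming link holds the URL from the question and the response is UTF-8 encoded: decode the bytes instead of calling str() on them, then parse the resulting string directly.

import urllib.request
import xml.etree.ElementTree as ET

data = urllib.request.urlopen(link).read()  # bytes
text = data.decode("utf-8")                 # a real str with real newlines
root = ET.fromstring(text)                  # parse the XML string, no temp file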
I have downloaded a huge JSON array file which needs to be split into smaller files, but I need the smaller files in the format below, with a newline for each new object in the array (the original JSON is also in the same format):
[
{"a":"a1","b":"b1","c":"c1"},
{"a":"a2","b":"b2","c":"c2"},
{"a":"a3","b":"b3","c":"c3"}
]
I used json.dump, but it just prints the smaller array on a single line, and using the indent option also doesn't give me the output in the above format.
Although I don't know what your original JSON looks like, you basically want something like this:
import json

lines = []
for something in original_json:
    line = {something['a']: something['aa']}  # whatever you need to do to get your values
    lines.append(line)
    # alternatively, simplify this to lines.append({something['a']: something['aa'], ...})

with open('myfile.json', 'w') as f1:
    f1.write("[\n")
    # json.dumps gives valid double-quoted JSON; joining avoids a trailing comma
    f1.write(",\n".join(json.dumps(line) for line in lines))
    f1.write("\n]")
I have a huge HTML file that I have converted to a text file (the file is the source of Facebook's home page). Assume the text file has a specific keyword in some places, for example: "some_keyword: [bla bla]". How would I print all the different bla blas that follow some_keyword?
{id:"1126830890",name:"Hillary Clinton",firstName:"Hillary"}
Imagine there are 50 different names in this format on the page. How would I print all the values that follow "name:", considering the text is so large that it crashes when you read() it or try to search through its lines?
Sample File:
shortProfiles:{"100000094503825":{id:"100000094503825",name:"Bla blah",firstName:"Blah",vanity:"blah",thumbSrc:"https://scontent-lax3-1.xx.fbcdn.net/v/t1.0-1/c19.0.64.64/p64x64/10354686_10150004552801856_220367501106153455_n.jpg?oh=3b26bb13129d4f9a482d9c4115b9eeb2&oe=5883062B",uri:"https://www.facebook.com/blah",gender:2,i18nGender:16777216,type:"friend",is_friend:true,mThumbSrcSmall:null,mThumbSrcLarge:null,dir:null,searchTokens:["Bla"],alternateName:"",is_nonfriend_messenger_contact:false},"1347968857":
Based on your comment, since you are the person responsible for writing the data to the file: write the data in JSON format and read it back from the file using json.loads(), as in:
import json

with open('/path/to/your_file') as json_file:
    json_str = json_file.read()
json_data = json.loads(json_str)
for item in json_data:
    print(item['name'])
Explanation:
Let's say data is the variable storing
{id:"1126830890",name:"Hillary Clinton",firstName:"Hillary"}
and that it changes dynamically within your code where you perform the write operation on the file. Instead, append each value to a list, as in:
a = []
for item in page_content:
    # data = some xy logic on the HTML file
    a.append(data)
Now write this list to the file using json.dump():
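A minimal sketch of that final step, where a is the list built above and the path is illustrative:

import json

with open('/path/to/your_file', 'w') as json_file:
    json.dump(a, json_file)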
I just wanted to throw this out there. I agree with all the comments about dealing with the HTML directly or using Facebook's API (probably the safest way), but an open file object in Python can be used as a generator that yields lines without reading the entire file into memory, and the re module can be used to extract information from text.
This can be done like so:
import re

# capture whatever sits between "some_keyword: [" and the closing "]"
regex = re.compile(r"(?:some_keyword:\s\[)(.*?)\]")
with open("filename.txt", "r") as fp:
    for line in fp:
        for match in regex.findall(line):
            print(match)
Of course this only works if the file is in a "line-based" format, but the end effect is that only the line you are on is loaded into memory at any one time.
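For instance, run against a line in the format from the question (a made-up sample string), the pattern captures just the bracketed value:

import re

regex = re.compile(r"(?:some_keyword:\s\[)(.*?)\]")
print(regex.findall("some_keyword: [bla bla]"))  # -> ['bla bla']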
Here are the Python 2 docs for the re module.
Here are the Python 3 docs for the re module.
I cannot find documentation detailing the generator capabilities of file objects in Python; it seems to be one of those well-known secrets. Please feel free to edit and remove this paragraph if you know where in the Python docs this is detailed.
I've created a very simple piece of code to read in tweets in JSON format from text files, determine whether they contain an id and coordinates, and if so, write these attributes to a CSV file. This is the code:
import csv
import glob
import simplejson

f = csv.writer(open('GeotaggedTweets/ListOfTweets.csv', 'w', newline=''))
all_files = glob.glob('SampleTweets/*.txt')
for filename in all_files:
    with open(filename, 'r') as file:
        data = simplejson.load(file)
        if 'text' in data and 'coordinates' in data:
            f.writerow([data['id'], data['geo']['coordinates']])
I've been having some difficulties, but with the help of the excellent JSON Lint website I have realised my mistake: I have multiple JSON objects, and from what I read these need to be separated by commas, with square brackets added at the start and end of the file.
How can I achieve this? I've seen some examples online where each individual line is read and brackets are added to the first and last lines, but as I load the whole file I'm not entirely sure how to do this.
You have a file that either contains too many newlines (in the JSON values themselves) or too few (no newlines between the tweets at all).
You can still repair this by using some creative re-stitching. The following generator function should do it:
import json

def read_objects(filename):
    decoder = json.JSONDecoder()
    with open(filename, 'r') as inputfile:
        line = next(inputfile).strip()
        while line:
            try:
                obj, index = decoder.raw_decode(line)
                yield obj
                # drop the decoded object plus any separating whitespace
                line = line[index:].lstrip()
            except ValueError:
                # Assume we didn't have a complete object yet
                line += next(inputfile).strip()
            if not line:
                try:
                    line = next(inputfile).strip()
                except StopIteration:
                    return  # end of file reached
This should be able to read all your JSON objects in sequence:
for filename in all_files:
    for data in read_objects(filename):
        if 'text' in data and 'coordinates' in data:
            f.writerow([data['id'], data['geo']['coordinates']])
It is otherwise fine to have multiple JSON strings written to one file, but you need to make sure the entries are clearly separated somehow. Writing JSON entries that do not use newlines, then putting newlines between them, for example, makes sure you can later read them back one by one and process them sequentially without all this hassle.
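That separation scheme is essentially the JSON Lines convention. A minimal sketch, where the tweets list and file name are illustrative, not from the question:

import json

tweets = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]  # illustrative

# write: one compact JSON object per line
with open('tweets.jsonl', 'w') as f:
    for tweet in tweets:
        f.write(json.dumps(tweet) + '\n')

# read: each line parses independently
with open('tweets.jsonl') as f:
    for line in f:
        data = json.loads(line)
        print(data['id'])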