I want to store data about tweets in the database in raw format, and figured out that you can pull the JSON out of a tweepy.Status for this purpose like this:
status._json
How can I parse json back to the tweepy.Status object?
I've found a non-elegant solution to my problem. All you need is this:
tweepy.Status().parse(None, status_json)
where None stands in for a tweepy.api.API object, which is not needed for parsing at all.
You can also compare the result with the original status as a self-check. In my case this returns True:
tweepy.Status().parse(None, status_json) == status
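For context, a minimal round-trip sketch (assuming status is an existing tweepy.Status and you store its JSON as a string, e.g. in a database column) could look like this:
import json
import tweepy

raw = json.dumps(status._json)        # serialize the raw tweet data for storage

status_json = json.loads(raw)         # later: load the stored string back into a dict
restored = tweepy.Status().parse(None, status_json)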
Here you go! It took me forever to get this going! This gets all info out of the status object.
all_tweets_loaded is a list of tweepy.Status objects
import pandas as pd
from pandas.io.json import json_normalize

dfflat = pd.DataFrame()
for tweet in all_tweets_loaded:
    df_for_tweet = json_normalize(tweet._json)  # flatten the nested tweet JSON into a single row
    dfflat = dfflat.append(df_for_tweet, ignore_index=True, sort=True)
dfflat.columns.tolist()  # to check it has all the columns
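If you are on a newer pandas where DataFrame.append has been removed, an equivalent sketch (same idea, just collecting the rows and concatenating once) might be:
import pandas as pd
from pandas import json_normalize  # moved out of pandas.io.json in newer versions

rows = [json_normalize(tweet._json) for tweet in all_tweets_loaded]  # one flattened row per tweet
dfflat = pd.concat(rows, ignore_index=True, sort=True)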
I have this very long JSON here: https://textup.fr/601885q4 and would like to read data from one of the "payment_token_contract" objects, specifically those with "id": 1.
My problem is that I don't understand how to address that specific dictionary, since they all have the same name. Is this even possible? I'm not used to manipulating such complex objects, as I'm a beginner.
I would have tried something like:
["orders][x]["id":1]["base_price"]
with x coming from a for loop that iterates through each entry in "orders".
But I can't manage to put it all together. Thanks for your help!
You can use a for loop to iterate over the orders, check the value of the payment token contract id, and if it is 1, print the base price for that order:
import json

jdata = "yourjson"
jdict = json.loads(jdata)

for order in jdict["orders"]:
    if order['payment_token_contract']['id'] == 1:
        print(order["base_price"])
I have omitted the JSON data as it is too long, but you can imagine jdata is the string of your JSON.
OUTPUT
149000000000000000000
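If the JSON is saved to a file rather than held in a string (which may be easier given how long it is), json.load does the same job; a small sketch assuming a hypothetical local file orders.json:
import json

with open('orders.json') as f:
    jdict = json.load(f)          # parse the file contents into a dict

for order in jdict["orders"]:
    if order['payment_token_contract']['id'] == 1:
        print(order["base_price"])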
I am working on a program that reads the content of a RESTful API from ImportIO. The connection works, and data is returned, but it's a jumbled mess. I'm trying to clean it to return only ASINs.
I have tried using split with a delimiter, with no success.
import requests

stuff = requests.get('https://data.import.io/extractor***')
stuff.content
I get the content, but I want to extract only the ASINs.
While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. The response object will do that for you when you access .text:
response.text
Because the decoding of bytes to str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:
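For example (a minimal sketch; 'utf-8' here is just an assumption, use whatever encoding the API actually sends):
import requests

response = requests.get('https://data.import.io/extractor***')  # same endpoint as above (truncated)
response.encoding = 'utf-8'   # set before touching .text so requests doesn't have to guess
text = response.text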
If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():
response.json()
The type of the return value of .json() is a dictionary, so you can access values in the object by key.
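For example (the key name is a placeholder, since the exact shape of the ImportIO response isn't shown here):
data = response.json()        # dict built from the JSON body
value = data['some_key']      # replace 'some_key' with a key from your actual response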
You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.
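For example:
print(response.headers['Content-Type'])   # e.g. 'application/json' or 'text/csv'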
For More Info: https://realpython.com/python-requests/
What format is the returned information in? Typically RESTful APIs will return the data as JSON, so you will likely have luck parsing it as a JSON object.
https://realpython.com/python-requests/#content
stuff_dictionary = stuff.json()
With that, the content is returned as a dictionary and you will have a much easier time.
EDIT:
Since I don't have the full URL to test, I can't give an exact answer. Given the content type is CSV, using a pandas DataFrame is pretty easy. With a quick StackOverflow search, I found the following answer: https://stackoverflow.com/a/43312861/11530367
So I tried the following in the terminal and got a dataframe from it
from io import StringIO
import pandas as pd
pd.read_csv(StringIO("HI\r\ntest\r\n"))
So you should be able to perform the following
from io import StringIO
import pandas as pd

df = pd.read_csv(StringIO(stuff.text))  # .text decodes the response bytes to str for StringIO
If that doesn't work, consider dropping the first three bytes at the start of your response: b'\xef\xbb\xbf'. Check the answer from Mark Tolonen below for how to parse this.
After that, selecting the ASIN (your second column) from your dataframe should be easy.
asins = df.loc[:, 'ASIN']
asins_arr = asins.array
The response is the byte string of CSV content encoded in UTF-8. The first three escaped byte codes are a UTF-8-encoded BOM signature. So stuff.content.decode('utf-8-sig') should decode it. stuff.text may also work if the encoding was returned correctly in the response headers.
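Putting that together with the DataFrame approach above, a sketch could be:
from io import StringIO
import pandas as pd

decoded = stuff.content.decode('utf-8-sig')   # strips the BOM while decoding
df = pd.read_csv(StringIO(decoded))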
I'm trying to learn Python and have the following problem:
I get an error while running this as it cannot see the 'name' attribute in data.
It works when I grab items from the JSON one by one. However, when I want to do it in a loop, it fails.
I assume my error is a wrong request, i.e., that it cannot read the JSON correctly and see the attributes.
import requests
import json

def main():
    req = requests.get('http://pokeapi.co/api/v2/pokemon/')
    print("HTTP Status Code: " + str(req.status_code))
    print(req.headers)

    json_obj = json.loads(req.content)
    for i in json_obj['name']:
        print(i)

if __name__ == '__main__':
    main()
You want to access the name attribute of the results attribute in your json_object like this:
for pokemon in json_obj['results']:
    print(pokemon['name'])
I was able to guess that you want to access the results key because I looked at the result of
json_obj.keys()
that is
dict_keys(['count', 'previous', 'results', 'next'])
All the pokemons are saved in a list under the key results, so you first need to get that list and then iterate over it.
for result in json_obj['results']:
    print(result['name'])
A couple of things: as mentioned above, iterating through json_obj['name'] doesn't really make sense; use json_obj['results'] instead.
Also, you can use req.json(), a method that comes with the requests library by default. That will turn the response into a dictionary, which you can then iterate through as usual (.iteritems() or .items(), depending on whether you're using Python 2 or 3).
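Putting both points together, a minimal sketch could look like this:
import requests

req = requests.get('http://pokeapi.co/api/v2/pokemon/')
json_obj = req.json()                  # same result as json.loads(req.content)
for pokemon in json_obj['results']:
    print(pokemon['name'])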
I'm using the Google Visualization Library for python (gviz) to generate chart objects. This works great for generating JSON that can be read by the Google Charts using the DataTable.ToJSon method. What I'm trying to do now, however, is add multiple Google Chart data tables to one JSON dictionary. In other words, what I'm making now is this:
Chart_Data = JSON_Chart_Data_1
and what I want to make is this:
Chart_Data = {'Chart_1' : JSON_Chart_Data_1,
'Chart_2' : JSON_Chart_Data_2,}
Where Chart_Data is converted into a JSON string in both cases.
I'm pretty sure I can do this by converting the JSON strings from gviz back into Python dictionaries, compiling them in a container dictionary as necessary, and then converting that container dictionary back into JSON, but that doesn't seem like a very elegant way to do it. Is there a better way? What I'm picturing is a .ToPythonObject method equivalent to .ToJSon, but there doesn't appear to be one in the library.
Thanks a lot,
Alex
I ended up going with my original, inelegant solution to the problem, using this function:
import json

def CombineJson(JSON_List):
    # JSON_List should be a list of tuples, each with the dictionary key and the JSON string to go with it,
    # e.g. [('json1', 'some json string'), ('json2', 'some other json string')]
    Python_Dict = {}
    for tup in JSON_List:
        parsed_json = json.loads(tup[1])  # deserialize each JSON string back into a Python object
        Python_Dict[tup[0]] = parsed_json
    BigJson = json.dumps(Python_Dict)     # serialize the combined dictionary into one JSON string
    return BigJson
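Hypothetical usage, assuming JSON_Chart_Data_1 and JSON_Chart_Data_2 are the strings produced by DataTable.ToJSon():
Chart_Data = CombineJson([('Chart_1', JSON_Chart_Data_1),
                          ('Chart_2', JSON_Chart_Data_2)])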
Thanks guys,
Alex
I've written some code that converts a JSON object to an iCalendar (.ics) object and now I am trying to test it. The problem is that I can't figure out how to create a generic JSON object to use as the parameter. Some of my attempts are as follows:
# 1
obj_json = u'sample json data in string form'
obj = json.loads(obj_json)
# 2
# I'm not sure about this very first line. My supervisor told me to put it in but he
# has a very heavy accent so I definitely could have heard him incorrectly.
input.json
with open('input.json') as f:
    obj = json.loads(f.read())
Try,
import json

some_dict = {'id': 83, 'text': 'A dummy text'}
dummy_json = json.dumps(some_dict)
Now, feed your dummy JSON to your function. dummy_json will look like this:
'{"text": "A dummy text", "id": 83}'
You can do dumps with a string object too.
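For example:
import json

json.dumps('A dummy text')   # -> '"A dummy text"'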
See pnv's answer, but you probably don't need to dump it. Just use a dictionary, as pnv did, and pass that into whatever you need to. Unless you are about to pass your json object over the wire to something, I don't know why you'd want to dump it.
I would've added this as a comment, but no rep, yet. :)