Getting specific comments Facebook user downloaded comments.json - python

Facebook allows us to download our own content, and one option is to export it as a JSON file. I want to parse that file to pull specific comments I've made in a specific Facebook group. I have the comments.json file and a short snippet of test code that can get the top layers of data. The lowest layer, where the group names and the actual comments are, does not parse.
This is on Windows 10 using the IDLE python IDE (python version 3.5.2).
Here is a short sample of the json file -- anonymized:
{
"comments": [
{
"timestamp": 1564971950,
"data": [
{
"comment": {
"timestamp": 1564971950,
"comment": "Some Text Here",
"author": "My Name",
"group": "Group 1 Name"
}
}
],
"title": "My Name commented on Other Person's post."
},
{
"timestamp": 1564968688,
"data": [
{
"comment": {
"timestamp": 1564968688,
"comment": "Some More Text Here",
"author": "My Name",
"group": "Group 2 Name"
}
}
],
"title": "My Name replied to their own comment."
}
]
}
I want to select on the [comments][data][comment][group]. Here is the short test python file code I tried:
import json
from datetime import datetime
with open('sample.json', 'r') as json_file:
    data = json.load(json_file)
for j1 in data["comments"]:
    for j2 in j1["data"]:
        print(datetime.utcfromtimestamp(j1['timestamp']))
##        for j3 in j2['comment']:
        print(j2)
Which results in this output
2019-08-05 02:25:50
{'comment': {'group': 'Group 1 Name', 'comment': 'Some Text Here', 'author': 'My Name', 'timestamp': 1564971950}}
2019-08-05 01:31:28
{'comment': {'group': 'Group 2 Name', 'comment': 'Some More Text Here', 'author': 'My Name', 'timestamp': 1564968688}}
You can see the data is pulled in to j2. When I tried to grab that last level of data, the keys are grabbed, but not the values. The code for this:
import json
from datetime import datetime
with open('sample.json', 'r') as json_file:
    data = json.load(json_file)
for j1 in data["comments"]:
    for j2 in j1["data"]:
        print(datetime.utcfromtimestamp(j1['timestamp']))
        for j3 in j2['comment']:
            print(j3)
And the output:
2019-08-05 02:25:50
group
timestamp
comment
author
2019-08-05 01:31:28
group
timestamp
comment
author
If I try to grab a specific key (like j3[group]), I get an error - TypeError: string indices must be integers
Which means the json library doesn't recognize this last level as keys and values properly. I can add the square brackets before and after that farthest right set of curly brackets in my sample file and get what I want to get with this code:
import json
from datetime import datetime
with open('sample2.json', 'r') as json_file:
    data = json.load(json_file)
for j1 in data["comments"]:
    for j2 in j1["data"]:
        for j3 in j2['comment']:
            if j3['group'] == "Group 1 Name":
                print(datetime.utcfromtimestamp(j3['timestamp']))
                print(j3['comment'])
Which, given I only ask for "Group 1 Name" I get this:
2019-08-05 02:25:50
Some Text Here
What I'd like to do, since I really don't want to manually edit a 56000 line json file to add all the missing square brackets, is there a way to parse j2 to pull the key/value pairs, as such, from that "comment" set.
import json
from datetime import datetime
with open('sample2.json', 'r') as json_file:
    data = json.load(json_file)
for j1 in data["comments"]:
    for j2 in j1["data"]:
        for j3 in j2['comment']:
            if j3['group'] == "Group 1 Name":
                print(datetime.utcfromtimestamp(j3['timestamp']))
                print(j3['comment'])
I expect to pull the data for comments in a specific facebook group from the user downloaded json file and having it output with the timestamp and comment text.
When I try to access that lowest level key/value set I get the error: TypeError: string indices must be integers

In Python, iterating over a dictionary directly iterates over its keys, exactly as if you had called dict.keys().
This means that this statement:
for j3 in j2['comment']:
    print(j3)
is actually equivalent to this:
for key in j2['comment'].keys():
    print(key)
The reason you received TypeError is because the action j3['group'] was called on a string (dictionary key) and not a dictionary.
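To make this concrete, here is a minimal sketch that reproduces the behavior without the file. The j2 dictionary is a stand-in copied from the sample data in the question, not something read from comments.json.

```python
# A stand-in for one entry of j1["data"], shaped like the sample file.
j2 = {"comment": {"timestamp": 1564971950, "comment": "Some Text Here",
                  "author": "My Name", "group": "Group 1 Name"}}

# Iterating a dict yields its keys, so each j3 is a plain string.
keys = [j3 for j3 in j2["comment"]]

# Indexing a string with another string is what raises the TypeError.
try:
    keys[0]["group"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Indexing the dict itself works, with no extra loop needed.
group = j2["comment"]["group"]
print(keys)
print(error_message)
print(group)
```

The json library parsed the file correctly; the extra loop was simply walking over key names instead of the dictionary that holds them.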
You bypassed this exception by changing the value of the comment key from a dictionary to a list, so iterating over j2['comment'] returned each element of a one-element list, and that element is a dictionary:
for dictionary in j2['comment']:
    print(dictionary['your_key'])
You can iterate over the key value pairs of j2 without changing the original JSON file by doing something like this:
import json
from datetime import datetime
with open('sample.json', 'r') as json_file:
    data = json.load(json_file)
for j1 in data["comments"]:
    for j2 in j1["data"]:
        for k, v in j2['comment'].items():
            print('Key: {0}, Value: {1}'.format(k,v))
If all you want is to only print comments from a specific group, based on your example, you don't really need to go into another nested loop, e.g:
import json
from datetime import datetime
with open('sample.json', 'r') as json_file:
    data = json.load(json_file)
for j1 in data["comments"]:
    for j2 in j1["data"]:
        if j2['comment']['group'] == 'Group 1 Name':
            print(j2['comment']['comment'])
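Since the goal was to output both the timestamp and the comment text, the same idea can be wrapped in a small helper. This is only a sketch: comments_in_group is a hypothetical name, and the use of .get() assumes (without confirmation from the question) that some entries in a real export may lack a "comment" or "group" key.

```python
import json
from datetime import datetime, timezone

def comments_in_group(data, group_name):
    """Yield (utc_datetime, text) for each comment made in group_name."""
    for entry in data.get("comments", []):
        for item in entry.get("data", []):
            # Fall back to an empty dict if this item has no "comment" key.
            comment = item.get("comment", {})
            if comment.get("group") == group_name:
                when = datetime.fromtimestamp(comment["timestamp"], tz=timezone.utc)
                yield when, comment.get("comment", "")

# Inline sample shaped like the question's file; the last entry has no group.
sample = json.loads("""
{"comments": [
  {"timestamp": 1564971950,
   "data": [{"comment": {"timestamp": 1564971950, "comment": "Some Text Here",
                         "author": "My Name", "group": "Group 1 Name"}}]},
  {"timestamp": 1564968688,
   "data": [{"comment": {"timestamp": 1564968688, "comment": "Some More Text Here",
                         "author": "My Name", "group": "Group 2 Name"}}]},
  {"timestamp": 1564968000,
   "data": [{"comment": {"timestamp": 1564968000, "comment": "No group here",
                         "author": "My Name"}}]}
]}
""")

for when, text in comments_in_group(sample, "Group 1 Name"):
    print(when.strftime("%Y-%m-%d %H:%M:%S"), text)
```

For the real file, replace the inline sample with the result of json.load on comments.json.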

Related

Running sentiment analysis for facebook data in json format

I would first like to say that I am very new to coding and know the very basics at best
I was tasked with scraping data from facebook and running a sentiment analysis on it. I got the data using scraping-bot.io and I have it on a json file with the following format
{
"owner_url": "https://www.facebook.com/########",
"url": "https://www.facebook.com/post",
"name": "Page name",
"date": "date",
"post_text": "Post title",
"media_url": "media url attached",
"likes": ###,
"shares": ###,
"num_comments": ###,
"scrape_time": "date",
"comments": [
{
"author_name": "Name",
"text": "Comment text",
"created": "Date"
},
The posts are in spanish and so I looked up for a library to run the analysis with. I settled on https://pypi.org/project/sentiment-analysis-spanish/ (not sure if it's the best one, so I'm open to suggestions on that front as well)
Ideally I would like to be able to open the json file, run the sentiment analysis on "text" and then save that data into the same or a new file to visualize in another program.
This is what I have so far
from sentiment_analysis_spanish import sentiment_analysis
sentiment = sentiment_analysis.SentimentAnalysisSpanish()
f = open('C:/Users/vnare/Documents/WebScraping.json', encoding='utf-8-sig')
data = json.load(f)
for i in range(len('text')):
    print(sentiment.sentiment(i))
Currently it gives me the following error AttributeError: 'int' object has no attribute 'lower'
But I'm sure there's far more that I'm doing wrong there.
I appreciate any help provided
AttributeError: 'int' object has no attribute 'lower' means integer cannot be lower-cased. This means that somewhere in your code, you are trying to call the lower() string method on an integer.
If you take a look at the documentation for the sentiment analysis library you linked, you will see that print(sentiment.sentiment("something")) evaluates the sentiment of "something" and gives you a score between 0 and 1.
My guess is that when you call sentiment.sentiment("some text") it will use lower() to convert whatever text is passed through to all lowercase. This would be fine if you were passing a string, but you are currently passing an integer!
By using for i in range(), you are indicating that you would like to take a range of numbers from 0 to the end number. This means that your i will always be an integer!
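A quick way to see this is to print what the loop in the question actually iterates over. The snippet below is only an illustration and does not touch the JSON data at all.

```python
# len('text') is the length of the four-character string "text",
# so the loop variable takes these integer values, not comment strings.
indices = list(range(len('text')))
print(indices)
```

Each of those integers is what was being passed to sentiment.sentiment().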
You need to instead loop through your JSON data to access the key/value pairs. "text" cannot be accessed directly as you've done above, but from within the JSON data, it can be!
https://www.geeksforgeeks.org/json-with-python/
The important thing to look at is the format of the JSON data that you are trying to access. First, you need to access a dictionary key named "comments". However, what is inside of 'comments'?
[{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]
It's actually another dictionary of key-value pairs inside of a list. Given that list indices start at 0 and there is only one list element (the dictionary) in your example, we need to next use the index 0 to access the dictionary inside. Now, we will look for the key 'text' as you were initially.
When you are learning python, I highly recommend using a lot of print statements when trying to debug! This helps you see what your program sees so you know where the errors are.
import json
from sentiment_analysis_spanish import sentiment_analysis
sentiment = sentiment_analysis.SentimentAnalysisSpanish()
f = open('WebScraping.json', encoding='utf-8-sig')
data = json.load(f)
print(data)
comments = data['comments']
print(comments)
text = comments[0]['text']
print(text)
sentimentScore = sentiment.sentiment(text)
print(sentimentScore)
When you run this, the output will show you what is inside 'data', what is inside 'comments', what is inside 'text', and what the sentiment score is.
{'owner_url': 'https://www.facebook.com/########', 'url': 'https://www.facebook.com/post', 'name': 'Page name', 'date': 'date', 'post_text': 'Post title', 'media_url': 'media url attached', 'likes': 234, 'shares': 500, 'num_comments': 100, 'scrape_time': 'date', 'comments': [{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]}
[{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]
Comment text
0.49789225920557484
This is what helped me see that inside of 'comments' was a dictionary within a list.
Now that you understand how it works, here is a more efficient way to run the code without all the extra prints! You can see I am now implementing the for loop you used earlier, as there may be multiple comments in a real-life scenario.
import json
from sentiment_analysis_spanish import sentiment_analysis
sentiment = sentiment_analysis.SentimentAnalysisSpanish()
f = open('WebScraping.json', encoding='utf-8-sig')
data = json.load(f)
comments = data['comments']
for i in range(len(comments)):
    comment = comments[i]['text']
    sentimentScore = sentiment.sentiment(comment)
    print(f"The sentiment score of this comment is {sentimentScore}.")
    print(f"The comment was: '{comment}'.")
This results in the following output.
The sentiment score of this comment is 0.49789225920557484.
The comment was: 'Comment 1'.
The sentiment score of this comment is 0.49789225920557484.
The comment was: 'Comment 2'.
This is the file that I used for reference.
{
"owner_url": "https://www.facebook.com/########",
"url": "https://www.facebook.com/post",
"name": "Page name",
"date": "date",
"post_text": "Post title",
"media_url": "media url attached",
"likes": 234,
"shares": 500,
"num_comments": 100,
"scrape_time": "date",
"comments": [
{
"author_name": "Name",
"text": "Comment 1",
"created": "Date"
},
{
"author_name": "Name",
"text": "Comment 2",
"created": "Date"
}
]
}
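The question also asked about saving the scores to a new file for visualization. One possible sketch is below. The names score_comments, save_scores, and the output layout are hypothetical, and the scoring function is passed in as a parameter so the example runs without the sentiment library installed.

```python
import json

def score_comments(data, scorer):
    """Return one dict per comment with its author, text, and score."""
    results = []
    for comment in data.get("comments", []):
        text = comment.get("text", "")
        results.append({
            "author_name": comment.get("author_name"),
            "text": text,
            "sentiment": scorer(text),
        })
    return results

def save_scores(results, path):
    """Write the scored comments to a new JSON file."""
    with open(path, "w", encoding="utf-8") as out_file:
        # ensure_ascii=False keeps accented Spanish characters readable.
        json.dump(results, out_file, ensure_ascii=False, indent=2)

# Demo with a stand-in scorer that always returns 0.5.
sample = {"comments": [{"author_name": "Name", "text": "Comment 1", "created": "Date"},
                       {"author_name": "Name", "text": "Comment 2", "created": "Date"}]}
demo = score_comments(sample, lambda text: 0.5)
print(demo)
```

With the real library, the call would look like save_scores(score_comments(data, sentiment.sentiment), 'scored.json'), where scored.json is just an example file name.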

How to convert JSON into a table using python?

Hi, I'm a beginner with Python.
I have a csv file which I retrieve from my database. The table has several columns, one of which contains data in json. I have difficulty converting json data into a table to be saved in pdf.
First I load the csv. Then I take the column that has the data in json and I convert them.
df = pd.DataFrame(pd.read_csv('database.csv',sep=';'))
print(df.head())
for index, row in df.iterrows():
    str = row["json_data"]
    val = ast.literal_eval(str)
    val1 = json.loads(str)
A sample of my JSON looks like this:
{
"title0": {
"id": "1",
"ex": "90",
},
"title1": {
"name": "Alex",
"surname": "Boris",
"code": "RBAMRA4",
},
"title2": {
"company": "LA",
"state": "USA",
"example": "9090",
}
}
I'm trying to create a table like this
--------------------------------------------
title0
--------------------------------------------
id 1
ex 90
--------------------------------------------
title1
--------------------------------------------
name Alex
surname Boris
code RBAMRA4
--------------------------------------------
title2
--------------------------------------------
company LA
state USA
example 9090
You can use the Python JSON library to achieve this.
import json
my_str = open("outfile").read()
val1 = json.loads(my_str)
for key in val1:
    print("--------------------\n"+key+"\n--------------------")
    for k in val1[key]:
        print(k, val1[key][k])
Pass the JSON string to the json.loads function; this deserializes the string and converts it to a Python object (here the whole object becomes a dict).
Then you loop through the dict the way you want.
--------------------
title0
--------------------
id 1
ex 90
--------------------
title1
--------------------
name Alex
surname Boris
code RBAMRA4
--------------------
title2
--------------------
company LA
state USA
example 9090
Read about parsing a dict, then you will understand the for loop.
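To apply the same loop to every row of the json_data column from the question, it may help to put it in a function that returns the text instead of printing it. This is a sketch only; format_sections and rule_width are hypothetical names, and the input string here is a shortened, valid-JSON version of the sample.

```python
import json

def format_sections(nested, rule_width=20):
    """Render a dict of dicts as titled sections of 'key value' lines."""
    rule = "-" * rule_width
    lines = []
    for title, fields in nested.items():
        # Each top-level key becomes a section header between two rules.
        lines.extend([rule, title, rule])
        for name, value in fields.items():
            lines.append("{} {}".format(name, value))
    return "\n".join(lines)

row_value = '{"title0": {"id": "1", "ex": "90"}, "title1": {"name": "Alex"}}'
print(format_sections(json.loads(row_value)))
```

The returned string can then be handed to whatever PDF writer you choose.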

How can I use jsonpath in python to change an element value in the json object

I have the following json object (Say car_details.json):
{
"name":"John",
"age":30,
"cars":
[
{
"car_model": "Mustang",
"car_brand": "Ford"
},
{
"car_model": "cx-5",
"car_brand": "Mazda"
}
}
I want to change the value of car_model from cx-5 to cx-9 through python code.
I am providing the json path to this element, through an external file. The json-path expression is basically represented as a string. Something like this:
'cars[2].car_model'
And the new value is also provided through an external file as a string:
'cx-9'
Now how do I parse through car_details.json using the jsonpath expression, and change its value to the one provided as string, and finally return the modified json object
P.S I want to do this through python code
This is an approach without using the json module. Load your data into a variable, then iterate over the key/value pairs of each entry in cars. When the key is car_model and its value is the one you are looking for, set it to the new value.
Also note: you need to close your array block, otherwise your above json is not valid. Generally I use an online json parser to check if my data is valid etc. (may be helpful in future).
data = {
    "name":"John",
    "age":30,
    "cars":
    [
        {
            "car_model": "Mustang",
            "car_brand": "Ford"
        },
        {
            "car_model": "cx-5",
            "car_brand": "Mazda"
        }
    ]
}
for cars in data['cars']:
    for key, value in cars.items():
        if key == "car_model" and value == "cx-5":
            cars[key] = "cx-9"
print(data)
If you want to load your json object from a file, let's assume it is called "data.json" and is in the same directory as the python script you are going to run:
import json
with open('data.json') as json_data:
    data = json.load(json_data)
for cars in data['cars']:
    for key, value in cars.items():
        if key == "car_model" and value == "cx-5":
            cars[key] = "cx-9"
print(data)
If you'd like to write the result back to the original file or to a new file, use json.dump. Here the new value is read from external.txt and the result is written to a file called "newdata.json":
import json
with open('data.json') as json_data:
    data = json.load(json_data)
print(data)
with open('external.txt') as f:
    # strip the trailing newline so it is not stored in the JSON value
    content = f.read().strip()
print(content)
for cars in data['cars']:
    for key, value in cars.items():
        if key == "car_model" and value == "cx-5":
            cars[key] = content
with open('newdata.json', 'w') as outfile:
    json.dump(data, outfile)
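The loops above find the element by its current value rather than by the path string from the question. If the path itself has to drive the update, a small hand-rolled resolver is one option. The sketch below is not a JSONPath implementation: parse_path and set_by_path are hypothetical helpers that only understand dotted key names and integer indices in brackets. Indices are assumed to be zero-based, so the Mazda entry is cars[1], not cars[2].

```python
import re

# Matches either a key name or a bracketed integer index,
# e.g. "cars", "[1]", "car_model" in "cars[1].car_model".
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")

def parse_path(path):
    """Split a path string into a list of dict keys and list indices."""
    tokens = []
    for name, index in _TOKEN.findall(path):
        tokens.append(int(index) if index else name)
    return tokens

def set_by_path(obj, path, new_value):
    """Walk obj along path and replace the final element with new_value."""
    *parents, last = parse_path(path)
    target = obj
    for token in parents:
        target = target[token]
    target[last] = new_value
    return obj

data = {"name": "John", "age": 30,
        "cars": [{"car_model": "Mustang", "car_brand": "Ford"},
                 {"car_model": "cx-5", "car_brand": "Mazda"}]}
set_by_path(data, "cars[1].car_model", "cx-9")
print(data["cars"][1])
```

For full JSONPath syntax (wildcards, filters, and so on) a dedicated library would be needed; this sketch only covers the simple form shown in the question.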

Python for loop syntax

I started learning to code recently, and I had a query about some for loop syntax in python. I've been having a look at the NPR API module on codecademy (which, I realize, is not a great environment for learning anything) and the way a for loop is presented has me confused. The part in question:
from urllib2 import urlopen
from json import load
url = "http://api.npr.org/query?apiKey="
key = "API_KEY"
url += key
url += "&numResults=3&format=json&id="
npr_id = raw_input("Which NPR ID do you want to query?")
url += npr_id
print url
response = urlopen(url)
json_obj = load(response)
for story in json_obj["list"]["story"]:
    print story["title"]["$text"]
I'm confused about the
for story in json_obj["list"]["story"]:
    print story["title"]["$text"]
lines. Is it some kind of nested list?
Think of a JSON object as a Python dictionary.
Square-bracket notation is how you access the values in it.
json_obj["list"]["story"] is a list of dictionaries stored inside a nested dictionary; the loop would read more naturally if the key were named "stories".
json_obj has a key "list", the value of json_obj["list"] has a key "story", and each story in that list has a "title".
There is an example here of parsing json: Parsing values from a JSON file using Python?
Here is the structure of what the json object would look like based on how you have written it:
json_obj = {
    'list': {
        # this is the array that is being iterated
        'story': [
            {'title': {
                '$text': 'some title1'
                }
            },
            {'title': {
                '$text': 'some title2'
                }
            },
            {'title': {
                '$text': 'some title3'
                }
            },
        ],
    },
}
So the for loop:
for story in json_obj["list"]["story"]:
    # on each iteration, story becomes one element of the list, e.g.
    # story = {'title': {'$text': 'some title2'}}
    print story["title"]["$text"]
Which is similar to:
print json_obj['list']['story'][0]['title']['$text']
print json_obj['list']['story'][1]['title']['$text']
print json_obj['list']['story'][2]['title']['$text']
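The snippets above use Python 2 print statements, as in the question. For readers on Python 3, here is the same idea as a runnable sketch, using the illustrative json_obj structure from this answer rather than a real API response.

```python
json_obj = {
    "list": {
        "story": [
            {"title": {"$text": "some title1"}},
            {"title": {"$text": "some title2"}},
            {"title": {"$text": "some title3"}},
        ],
    },
}

# Looping over the list gives one story dictionary per iteration.
titles = []
for story in json_obj["list"]["story"]:
    titles.append(story["title"]["$text"])

# Indexing the list by position reaches the same values.
by_index = [json_obj["list"]["story"][i]["title"]["$text"] for i in range(3)]
print(titles)
```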

Python - JSON to CSV table?

I was wondering how I could import a JSON file, and then save that to an ordered CSV file, with header row and the applicable data below.
Here's what the JSON file looks like:
[
{
"firstName": "Nicolas Alexis Julio",
"lastName": "N'Koulou N'Doubena",
"nickname": "N. N'Koulou",
"nationality": "Cameroon",
"age": 24
},
{
"firstName": "Alexandre Dimitri",
"lastName": "Song-Billong",
"nickname": "A. Song",
"nationality": "Cameroon",
"age": 26,
etc. etc. + } ]
Note there are multiple 'keys' (firstName, lastName, nickname, etc.). I would like to create a CSV file with those as the header, then the applicable info beneath in rows, with each row having a player's information.
Here's the script I have so far for Python:
import urllib2
import json
import csv
writefilerows = csv.writer(open('WCData_Rows.csv',"wb+"))
api_key = "xxxx"
url = "http://worldcup.kimonolabs.com/api/players?apikey=" + api_key + "&limit=1000"
json_obj = urllib2.urlopen(url)
readable_json = json.load(json_obj)
list_of_attributes = readable_json[0].keys()
print list_of_attributes
writefilerows.writerow(list_of_attributes)
for x in readable_json:
    writefilerows.writerow(x[list_of_attributes])
But when I run that, I get a "TypeError: unhashable type:'list'" error. I am still learning Python (obviously I suppose). I have looked around online (found this) and can't seem to figure out how to do it without explicitly stating what key I want to print...I don't want to have to list each one individually...
Thank you for any help/ideas! Please let me know if I can clarify or provide more information.
Your TypeError occurs because x[list_of_attributes] tries to index a dictionary, x, with a list, list_of_attributes. A list is unhashable, so it cannot be used as a dictionary key. Iterating over readable_json already gives you one dictionary per iteration, so there is no need to pull the values out of it before writing them.
csv.DictWriter should give you what you're looking for.
import csv
[...]
def encode_dict(d, out_encoding="utf8"):
    '''Encode dictionary to desired encoding, assumes incoming data in unicode'''
    encoded_d = {}
    for k, v in d.iteritems():
        k = k.encode(out_encoding)
        v = unicode(v).encode(out_encoding)
        encoded_d[k] = v
    return encoded_d
list_of_attributes = readable_json[0].keys()
# sort fields in desired order
list_of_attributes.sort()
with open('WCData_Rows.csv',"wb+") as csv_out:
    writer = csv.DictWriter(csv_out, fieldnames=list_of_attributes)
    writer.writeheader()
    for data in readable_json:
        writer.writerow(encode_dict(data))
Note:
This assumes that each entry in readable_json has the same fields.
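The code above is Python 2 (iteritems, unicode, binary file mode). On Python 3 the manual encoding step is not needed, so a rough equivalent could look like the sketch below. rows_to_csv is a hypothetical helper name, the two player dictionaries are copied from the question instead of being fetched from the API, and an in-memory buffer stands in for the output file.

```python
import csv
import io

def rows_to_csv(rows, out_file):
    """Write a list of same-shaped dicts as CSV with a sorted header row."""
    fieldnames = sorted(rows[0].keys())
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

players = [
    {"firstName": "Nicolas Alexis Julio", "lastName": "N'Koulou N'Doubena",
     "nickname": "N. N'Koulou", "nationality": "Cameroon", "age": 24},
    {"firstName": "Alexandre Dimitri", "lastName": "Song-Billong",
     "nickname": "A. Song", "nationality": "Cameroon", "age": 26},
]

buffer = io.StringIO()
rows_to_csv(players, buffer)
print(buffer.getvalue())
```

When writing to a real file on Python 3, open it in text mode with newline='' (for example open('WCData_Rows.csv', 'w', newline='', encoding='utf-8')) so the csv module controls the line endings.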
Maybe pandas could do this, but I have never tried to read JSON with it.
import pandas as pd
df = pd.read_json( ... )
df.to_csv( ... )
pandas.DataFrame.to_csv
pandas.io.json.read_json
EDIT:
data = ''' [
{
    "firstName": "Nicolas Alexis Julio",
    "lastName": "N'Koulou N'Doubena",
    "nickname": "N. N'Koulou",
    "nationality": "Cameroon",
    "age": 24
},
{
    "firstName": "Alexandre Dimitri",
    "lastName": "Song-Billong",
    "nickname": "A. Song",
    "nationality": "Cameroon",
    "age": 26
}
]'''
import pandas as pd
df = pd.read_json(data)
print df
df.to_csv('results.csv')
result:
   age             firstName            lastName  nationality     nickname
0   24  Nicolas Alexis Julio  N'Koulou N'Doubena     Cameroon  N. N'Koulou
1   26     Alexandre Dimitri        Song-Billong     Cameroon      A. Song
With pandas you can save it in csv, excel, etc (and maybe even directly in database).
And you can do some operations on data in table and show it as graph.
