Running sentiment analysis on Facebook data in JSON format - Python

I would first like to say that I am very new to coding and know the basics at best.
I was tasked with scraping data from Facebook and running a sentiment analysis on it. I got the data using scraping-bot.io, and I have it in a JSON file with the following format:
{
    "owner_url": "https://www.facebook.com/########",
    "url": "https://www.facebook.com/post",
    "name": "Page name",
    "date": "date",
    "post_text": "Post title",
    "media_url": "media url attached",
    "likes": ###,
    "shares": ###,
    "num_comments": ###,
    "scrape_time": "date",
    "comments": [
        {
            "author_name": "Name",
            "text": "Comment text",
            "created": "Date"
        },
The posts are in Spanish, so I looked for a library to run the analysis with. I settled on https://pypi.org/project/sentiment-analysis-spanish/ (not sure if it's the best one, so I'm open to suggestions on that front as well).
Ideally I would like to be able to open the JSON file, run the sentiment analysis on "text", and then save that data into the same or a new file to visualize in another program.
This is what I have so far:
from sentiment_analysis_spanish import sentiment_analysis
sentiment = sentiment_analysis.SentimentAnalysisSpanish()
f = open('C:/Users/vnare/Documents/WebScraping.json', encoding='utf-8-sig')
data = json.load(f)
for i in range(len('text')):
    print(sentiment.sentiment(i))
Currently it gives me the following error: AttributeError: 'int' object has no attribute 'lower'
But I'm sure there's far more that I'm doing wrong there.
I appreciate any help provided.

AttributeError: 'int' object has no attribute 'lower' means that an integer cannot be lower-cased: somewhere in your code, you are trying to call the lower() string method on an integer.
If you take a look at the documentation for the sentiment analysis library you chose, you will see that print(sentiment.sentiment("something")) will evaluate the sentiment of "something" and give you a score between 0 and 1.
My guess is that when you call sentiment.sentiment("some text") it will use lower() to convert whatever text is passed through to all lowercase. This would be fine if you were passing a string, but you are currently passing an integer!
By using for i in range(), you are asking for a range of numbers from 0 up to the end number. This means that your i will always be an integer!
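You can see this with a quick print loop:
for i in range(len('text')):
    print(i, type(i))
# 0 <class 'int'>, 1 <class 'int'>, 2 <class 'int'>, 3 <class 'int'>
len('text') is just 4 (the length of the literal string 'text', not of your data), so the loop hands the integers 0 through 3 to sentiment.sentiment(), which then fails trying to call .lower() on them.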
You need to instead loop through your JSON data to access the key/value pairs. "text" cannot be accessed directly as you've done above, but from within the JSON data, it can be!
https://www.geeksforgeeks.org/json-with-python/
The important thing to look at is the format of the JSON data that you are trying to access. First, you need to access a dictionary key named "comments". However, what is inside of 'comments'?
[{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]
It's actually another dictionary of key-value pairs inside of a list. Given that list indices start at 0 and there is only one list element (the dictionary) in your example, we need to next use the index 0 to access the dictionary inside. Now, we will look for the key 'text' as you were initially.
When you are learning python, I highly recommend using a lot of print statements when trying to debug! This helps you see what your program sees so you know where the errors are.
import json
from sentiment_analysis_spanish import sentiment_analysis
sentiment = sentiment_analysis.SentimentAnalysisSpanish()
f = open('WebScraping.json', encoding='utf-8-sig')
data = json.load(f)
print(data)
comments = data['comments']
print(comments)
text = comments[0]['text']
print(text)
sentimentScore = sentiment.sentiment(text)
print(sentimentScore)
When you run this, the output will show you what is inside 'data', what is inside 'comments', what is inside 'text', and what the sentiment score is.
{'owner_url': 'https://www.facebook.com/########', 'url': 'https://www.facebook.com/post', 'name': 'Page name', 'date': 'date', 'post_text': 'Post title', 'media_url': 'media url attached', 'likes': 234, 'shares': 500, 'num_comments': 100, 'scrape_time': 'date', 'comments': [{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]}
[{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]
Comment text
0.49789225920557484
This is what helped me see that inside of 'comments' was a dictionary within a list.
Now that you understand how it works, here is a more efficient way to run the code without all the extra prints! You can see I am now implementing the for loop you used earlier, as there may be multiple comments in a real-life scenario.
import json
from sentiment_analysis_spanish import sentiment_analysis

sentiment = sentiment_analysis.SentimentAnalysisSpanish()
f = open('WebScraping.json', encoding='utf-8-sig')
data = json.load(f)

comments = data['comments']
for i in range(len(comments)):
    comment = comments[i]['text']
    sentimentScore = sentiment.sentiment(comment)
    print(f"The sentiment score of this comment is {sentimentScore}.")
    print(f"The comment was: '{comment}'.")
This results in the following output.
The sentiment score of this comment is 0.49789225920557484.
The comment was: 'Comment 1'.
The sentiment score of this comment is 0.49789225920557484.
The comment was: 'Comment 2'.
This is the file that I used for reference.
{
    "owner_url": "https://www.facebook.com/########",
    "url": "https://www.facebook.com/post",
    "name": "Page name",
    "date": "date",
    "post_text": "Post title",
    "media_url": "media url attached",
    "likes": 234,
    "shares": 500,
    "num_comments": 100,
    "scrape_time": "date",
    "comments": [
        {
            "author_name": "Name",
            "text": "Comment 1",
            "created": "Date"
        },
        {
            "author_name": "Name",
            "text": "Comment 2",
            "created": "Date"
        }
    ]
}
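Finally, since the goal was to save the scores and visualize them in another program, here is a minimal sketch of that last step (the sentiment_score key and the output filename are my own choices, not part of the scraped format):
import json
from sentiment_analysis_spanish import sentiment_analysis

sentiment = sentiment_analysis.SentimentAnalysisSpanish()
with open('WebScraping.json', encoding='utf-8-sig') as f:
    data = json.load(f)

for comment in data['comments']:
    # attach the score alongside the original comment fields
    comment['sentiment_score'] = sentiment.sentiment(comment['text'])

# write the enriched data to a new file for the visualization step
with open('WebScrapingScored.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)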

Related

Python retrieve specified nested JSON value

I have a .json file with many entries looking like this:
{
    "name": "abc",
    "time": "20220607T190731.442",
    "id": "123",
    "relatedIds": [
        {
            "id": "456",
            "source": "sourceA"
        },
        {
            "id": "789",
            "source": "sourceB"
        }
    ]
}
I am saving each entry in a Python object; however, I only need the related ID from sourceA. The problem is that the sourceA entry is not always in first place in that nested list.
So data['relatedIds'][0]['id'] does not reliably yield the right ID.
Currently I am solving the issue like this:
import json

with open("filepath", 'r') as file:
    data = json.load(file)

for value in data['relatedIds']:
    if value['source'] == 'sourceA':
        id_from_a = value['id']

entry = Entry(data['name'], data['time'], data['id'], id_from_a)
I don't think this approach is the optimal solution, though, especially if the relatedIds list gets longer and more entries are appended to the JSON file.
Is there a more sophisticated way of singling out this 'id' value for a specified source without looping through all entries in that nested list?
For a cleaner solution, you could try using python's filter() function with a simple lambda:
import json

with open("filepath", 'r') as file:
    data = json.load(file)

filtered_data = filter(lambda a: a["source"] == "sourceA", data["relatedIds"])
id_from_a = next(filtered_data)['id']
entry = Entry(data['name'], data['time'], data['id'], id_from_a)
Correct me if I misunderstand how your json file looks, but it seems to work for me.
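One caveat: next() raises StopIteration if nothing matches, so if sourceA might be absent from an entry you can pass a default (the None fallback here is my own addition):
match = next(filter(lambda a: a["source"] == "sourceA", data["relatedIds"]), None)
id_from_a = match['id'] if match else None  # no crash when sourceA is missing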
One step at a time, in order to get to all entries:
>>> data["relatedIds"]
[{'id': '789', 'source': 'sourceB'}, {'id': '456', 'source': 'sourceA'}]
Next, in order to get only those entries with source=sourceA:
>>> [e for e in data["relatedIds"] if e["source"] == "sourceA"]
[{'id': '456', 'source': 'sourceA'}]
Now, since you don't want the whole entry, but just the ID, we can go a little further:
>>> [e["id"] for e in data["relatedIds"] if e["source"] == "sourceA"]
['456']
From there, just grab the first ID:
>>> [e["id"] for e in data["relatedIds"] if e["source"] == "sourceA"][0]
'456'
Can you get whatever generates your .json file to produce the relatedIds as an object rather than a list?
{
    "name": "abc",
    "time": "20220607T190731.442",
    "id": "123",
    "relatedIds": {
        "sourceA": "456",
        "sourceB": "789"
    }
}
If not, I'd say you're stuck looping through the list until you find what you're looking for.
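That said, even with the list form you only have to loop once per entry: build a source-to-id mapping and then do constant-time lookups (a small sketch, assuming each source appears at most once per entry):
related = {e["source"]: e["id"] for e in data["relatedIds"]}
id_from_a = related.get("sourceA")  # returns None if sourceA is absent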

Google Docs API Batch Update - multiple updates to same template file?

I am trying to use the Google Docs API to auto generate a document based on values from a JSON file.
I have created a template Google Doc which is structured like this:
I went to the shop and bought the following item: {{foodtype}}.
It cost {{price}}.
The JSON looks like this:
{
    "food": "apples",
    "price": "10"
}
So using the Google Docs replaceAllText method I can use a Python script to update the doc like this:
def update_doc(food_json, docs):
    requests = [
        {
            'replaceAllText': {
                'containsText': {
                    'text': '{{foodtype}}',
                    'matchCase': 'true'
                },
                'replaceText': food_json['food'],
            }
        },
    ]
    doc_id = 'xxxxxxxxxxxxx'
    result = docs.documents().batchUpdate(documentId=doc_id, body={'requests': requests}).execute()
This works perfectly. The {{foodtype}} tag is replaced in the document with the string from the JSON.
My problem is that I cannot do this when I want to apply multiple updates to a document. Sometimes there might be 5, 10 or even 50 items to add, so manually adding tags to a template is not possible.
For example if my JSON looked like this:
{
{basket:
{'food':'apples','price':'10'},
{'food':'bread', 'price':'15'}
{'food':'bananas', 'price': '5'}
}
etc etc etc
}
I want to be able to write a for loop to iterate through the JSON and write a report for every item in the JSON e.g.:
I went to the shop and bought the following item: apple.
It cost 10 cents.
I went to the shop and bought the following item: bread.
It cost 15 cents.
I went to the shop and bought the following item: bananas.
It cost 5 cents.
The code would look something like this:
requests = []
for f in food_json['basket']:
    requests = [
        {
            'replaceAllText': {
                'containsText': {
                    'text': '{{foodtype}}',
                    'matchCase': 'true'
                },
                'replaceText': f['food'],
            }
        },
        {
            'replaceAllText': {
                'containsText': {
                    'text': '{{price}}',
                    'matchCase': 'true'
                },
                'replaceText': f['price'],
            }
        },
    ]
*Push the update to the Google docs*
However this fails because on the first iteration of the for loop, the text matching the {{foodtype}} tag is replaced and then there is nothing to replace on the next iteration.
I'm stuck as to how to proceed.
I have previously used MS Word and the awesome python-docx-template library to publish documents in this way. It was possible to add Jinja tags in the Word template and then use a for loop directly in the document to achieve what I wanted e.g. just put something like this in a template:
{% for b in basket %}
I went to the shop and bought:
{{ b.food }}
The price was:
{{ b.price }}
{% endfor %}
This worked perfectly, but I cannot see how to reproduce the same result with the Google Docs API and Python. Any suggestions?
I believe your situation and goal are as follows.
In your situation, there are several occurrences of the same text {{foodtype}} in a Document, and the number of {{foodtype}} occurrences is the same as the number of food values.
You want to replace the 1st {{foodtype}} with the 1st value of food.
You want to achieve this using googleapis for python.
If my understanding is correct, unfortunately, at the current stage, when replaceAllText of the Docs API is used, all occurrences of {{foodtype}} are replaced with the 1st value of food in one batch request. I thought that this might be the reason for your issue.
Flow:
In order to achieve your goal using Docs API with python, as a workaround, I would like to propose the following flow.
Search for each {{foodtype}} and {{price}} in the text of the Google Document.
In this case, the documents.get method is used.
Add a counter to the texts of {{foodtype}} and {{price}}, giving {{foodtype1}}, {{price1}}, {{foodtype2}}, {{price2}} and so on.
Each numbered placeholder is then replaced with its corresponding value using a replaceAllText request.
In this flow, everything is sent in a single batchUpdate call. The sample script is as follows.
Sample script:
# Please set the values you want to replace.
sample = {'basket': [
    {'food': 'apples', 'price': '10'},
    {'food': 'bread', 'price': '15'},
    {'food': 'bananas', 'price': '5'}
]}
documentId = '###'  # Please set your document ID.

docs = build('docs', 'v1', credentials=creds)
obj = docs.documents().get(documentId=documentId, fields='body').execute()
content = obj.get('body').get('content')

foodCount = 0
priceCount = 0
requests = []
for c in content:
    if 'paragraph' in c:
        p = c.get('paragraph')
        for e in p.get('elements'):
            textRun = e.get('textRun')
            if textRun:
                text = textRun.get('content')
                if '{{foodtype}}' in text:
                    foodCount += 1
                    # Replace the renumbered placeholder {{foodtypeN}} with the N-th food value.
                    requests.append(
                        {
                            "replaceAllText": {
                                "replaceText": sample['basket'][foodCount - 1]['food'],
                                "containsText": {
                                    "text": '{{foodtype' + str(foodCount) + '}}',
                                    "matchCase": True
                                }
                            }
                        }
                    )
                    # Insert the counter into the placeholder, turning {{foodtype}} into {{foodtypeN}}.
                    requests.append({
                        "insertText": {
                            "location": {
                                "index": e['startIndex'] + text.find('{{foodtype}}') + len('{{foodtype')
                            },
                            "text": str(foodCount)
                        }
                    })
                if '{{price}}' in text:
                    priceCount += 1
                    requests.append(
                        {
                            "replaceAllText": {
                                "replaceText": sample['basket'][priceCount - 1]['price'],
                                "containsText": {
                                    "text": '{{price' + str(priceCount) + '}}',
                                    "matchCase": True
                                }
                            }
                        }
                    )
                    requests.append({
                        "insertText": {
                            "location": {
                                "index": e['startIndex'] + text.find('{{price}}') + len('{{price')
                            },
                            "text": str(priceCount)
                        }
                    })
if requests != []:
    # Reverse so that requests near the end of the document run first;
    # otherwise each insertText would shift the indexes of later requests.
    requests.reverse()
    docs.documents().batchUpdate(documentId=documentId, body={'requests': requests}).execute()
Note:
In your question, the following values are shown as a sample value.
{
{basket:
{'food':'apples','price':'10'},
{'food':'bread', 'price':'15'}
{'food':'bananas', 'price': '5'}
}
etc etc etc
}
But, I thought that this might not be correct. So in my sample script, I used the following sample value. Please modify each value for your actual situation.
sample = {'basket': [
{'food': 'apples', 'price': '10'},
{'food': 'bread', 'price': '15'},
{'food': 'bananas', 'price': '5'}
]}
In this sample script, as an example for explaining my workaround, only the paragraphs in the Google Document are searched. If you want to search the text in tables as well, please modify the above script.
This sample script supposes that you have already been able to get and put values in a Google Document using the Docs API. Please be careful about this.
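For reference, a minimal sketch of the setup this script assumes (the credential file name and scope are placeholders, not from the original answer):
from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    'service_account.json',  # hypothetical credentials file
    scopes=['https://www.googleapis.com/auth/documents'])
docs = build('docs', 'v1', credentials=creds)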
References:
Method: documents.get
Method: documents.batchUpdate
ReplaceAllTextRequest

How to parse JSON data using Python?

I've been messing around with JSON for some time. I want to get the values of "box" and "text" from this format using Python. Can someone help me out with how to do that? Example output: [92, 197, 162, 215, AUTHORS, ...]
{ "form": [ { "box": [ 92,162,197,215], "text": "AUTHORS", "label": "question", "words": [ { "box": [ 92,197,162,215 ],"text": "AUTHORS"} ], "linking": [[0,13]],"id": 0 },
import os
import json

# Directory name consisting of json
file = open('033.json')
data = json.load(file)
result = []
for value in data['form']:
    my_dict = []
    my_dict = value.get('box')
    print(my_dict)
    result.append(my_dict)
Probably like this:
collector = []
for obj in data["form"]:
    collector.append({"box": obj["box"], "text": obj["text"]})
print(collector)
Okay, a few issues with your code:
Why is your list named my_dict? A name should indicate what the object is or what it contains. Your name does the opposite, and if someone works with that code in the future it will most likely confuse them.
Why are you initializing a list right before overwriting it with value.get('box')?
As for the solution, it only takes a couple of lines:
result = []
for form_dict in data['form']:
    result.append(tuple(form_dict[key]
                        for key in ('box', 'text') if key in form_dict))
That piece of code would result in this: [([92, 162, 197, 215], 'AUTHORS')] based on the data you provided.
This is assuming that there can be more items in the data['form'] list, otherwise the for loop is not needed.
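If you want the flat shape from the question ([92, 162, 197, 215, AUTHORS, ...]) rather than tuples, a small variation (the flattening is my own addition):
flat = []
for form_dict in data['form']:
    flat.extend(form_dict.get('box', []))  # spread the four box coordinates
    flat.append(form_dict.get('text'))     # then the text value
print(flat)  # [92, 162, 197, 215, 'AUTHORS']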

The best way to transform a response into the JSON format shown in the example

I'd appreciate your help with the best way to transform a result into JSON as below.
We have a result like the one below, containing information on the employees and the companies. In the result, somehow, some of the properties (but not all) are prefixed with an enum-like T.
[
    {
        "T.id": "Employee_11",
        "T.category": "Employee",
        "node_id": ["11"]
    },
    {
        "T.id": "Company_12",
        "T.category": "Company",
        "node_id": ["12"],
        "employeecount": 800
    },
    {
        "T.id": "id~Employee_11_to_Company_12",
        "T.category": "WorksIn"
    },
    {
        "T.id": "Employee_13",
        "T.category": "Employee",
        "node_id": ["13"]
    },
    {
        "T.id": "Parent_Company_14",
        "T.category": "ParentCompany",
        "node_id": ["14"],
        "employeecount": 900,
        "childcompany": "Company_12"
    },
    {
        "T.id": "id~Employee_13_to_Parent_Company_14",
        "T.category": "Contractorin"
    }
]
We need to transform this result into a different structure, grouping by category: if the category is Employee, Company or ParentCompany, the entry should go under the node_properties object; otherwise it belongs in edge_properties. Apart from the common properties (property_id, property_category and node), different properties are to be added when the category is Company or ParentCompany. There is some further logic as well, where we have to derive the from and to properties of the edge object from its id. The expected response is:
"node_properties":[
{
"property_id":"Employee_11",
"property_category":"Employee",
"node":{node_id: "11"}
},
{
"property_id":"Company_12",
"property_category":"Company",
"node":{node_id: "12"},
"employeecount":800
},
{
"property_id":"Employee_13",
"property_category":"Employee",
"node":{node_id: "13"}
},
{
"property_id":"Company_14",
"property_category":"ParentCompany",
"node":{node_id: "14"},
"employeecount":900,
"childcompany":"Company_12"
}
],
"edge_properties":[
{
"from":"Employee_11",
"to":"Company_12",
"property_id":"Employee_11_to_Company_12",
},
{
"from":"Employee_13",
"to":"Parent_Company_14",
"property_id":"Employee_13_to_Parent_Company_14",
}
]
In Java, we would use an enhanced for loop, a switch, etc. How can we write Python code to produce the structure above from the initial result structure? (I am new to Python.) Thank you in advance.
Here is a method that I quickly made; you can adjust it to your requirements. You can use a regex or your own function to get the IDs for the edge_properties and then assign them to an object the way I did for the nodes. I am not so sure of your full requirements, but if the list you gave covers all the categories, then this will be sufficient.
def transform(input_list):
    node_properties = []
    edge_properties = []
    for input_obj in input_list:
        new_obj = {}
        if input_obj['T.category'] in ('Employee', 'Company', 'ParentCompany'):
            new_obj['property_id'] = input_obj['T.id']
            new_obj['property_category'] = input_obj['T.category']
            new_obj['node'] = {'node_id': input_obj['node_id'][0]}
            if "employeecount" in input_obj:
                new_obj['employeecount'] = input_obj['employeecount']
            if "childcompany" in input_obj:
                new_obj['childcompany'] = input_obj['childcompany']
            node_properties.append(new_obj)
        else:  # You can do elif == as well, based on your requirements, if there are other outliers
            # You can use regex or whichever method here to split the T.id
            # string and add the from/to values (see the sketch below)
            edge_properties.append(new_obj)
    return [node_properties, edge_properties]
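The edge branch above is left as a stub; one hedged way to fill it in is to split IDs like "id~Employee_11_to_Company_12" on the "~" and "_to_" markers (this split logic is my own reading of the sample data, not part of the original answer):
def parse_edge(edge_id):
    # "id~Employee_11_to_Company_12" -> from/to/property_id fields
    path = edge_id.split('~', 1)[1]        # drop the leading "id~"
    source, target = path.split('_to_', 1)
    return {'from': source, 'to': target, 'property_id': path}

print(parse_edge('id~Employee_13_to_Parent_Company_14'))
# {'from': 'Employee_13', 'to': 'Parent_Company_14', 'property_id': 'Employee_13_to_Parent_Company_14'}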

Unable to convert a JSON file to a dataframe

I'm trying to convert a huge JSON file to a data frame in order to preprocess it for sentiment analysis, but I am unable to convert it.
The problem is at pd.read_json:
import json
import pandas as pd

with open("/content/drive/My Drive/timeline_1.jsonl") as f:
    data = f.readlines()

data_json_str = "[" + ','.join(data) + "]"
data_df = pd.read_json(data_json_str)
ValueError: Unmatched ''"' when decoding 'string'
Your data is probably corrupted, at least in one place (maybe more).
One method to find such a place is to run your code, not on the whole file,
but on chunks of it.
For example, run your code on:
the first half of your file,
the second half.
If any part runs OK, then it is free from errors.
The next step is to repeat the above procedure on each "failed" chunk.
Another method: Look thoroughly at your StackTrace, maybe somewhere there is
the line number in the source file (do not confuse it with the line number
of Python code).
For now you assemble the whole text into a single line, so even if the StackTrace contains such a number, it is most likely just 1.
To ease your investigation, change your code in such a way that each
source line is in a separate line in the joined text. Something like:
data_json_str = "[" + ',\n'.join(data) + "]"
Then execute your code again and read the number shown (where the error occurred), now equal to the number of the source line.
Then look at that line, correct it, and your code should run with no error.
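A related shortcut, since each row of the source file is its own JSON document: validate the lines one by one with the json module and report the offenders directly (a small sketch, not part of the original answer):
import json

with open('timeline_1.jsonl', encoding='utf-8') as f:
    for lineno, line in enumerate(f, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            print(f'line {lineno}: {err}')  # pinpoints the corrupted rows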
Edit after your comment with source data
In your data I noticed that:
it contains two JSON objects (rows),
but without any comma between them.
I made the following completions and changes:
added [ and ] at the beginning / end,
added a comma after the first {...}.
so that the input string was:
data_json_str = '''[
{"id": "99014576299056245", "created_at": "2017-11-16T14:28:53.919Z",
"sensitive": false, "spoiler_text": "", "language": "en",
"uri": "mastodon.gamedev.place/users/jaggy/statuses/99014576299056245",
"instance": "mastodon.gamedev.place",
"content": "<p>Coding a cheeky skill before bed. Not as much as I&apos;d like but had drinks with co-workers after work so shrug ^_^</p>",
"account_id": "434", "tag_list": [], "media_attachments": [], "emojis": [], "mentions": []},
{"id": "99014544879467317", "created_at": "2017-11-16T14:20:54.462Z", "sensitive": false}
]'''
Then performed your instruction to read this string:
data_df = pd.read_json(data_json_str)
and got a DataFrame with 2 rows (no error).
Initially I suspected &apos; as a possible source of the error, but read_json coped with this case too.
But when I deleted the comma after the first {...}, I got error:
ValueError: Unexpected character found when decoding array value (2)
(other error than yours).
I use Python 3.7.0 and Pandas 0.25.
If you have some older version of either Python or Pandas, maybe you should
upgrade them?
The real problem is probably connected with some "weak point" in the JSON parser (I'm not sure whether it is a part of Python or Pandas).
Before you upgrade, perform another test: Drop the mentioned &apos; from
the input string and attempt read_json again.
If you get this time a proper result, this will confirm my suspicion
that the JSON parser in your installation has flaws and will be an
important support of my advice to upgrade your software.
Use pandas.io.json.json_normalize:
Data:
Given the data as a list of dicts in a file named test.json
[{
    "id": "99014576299056245",
    "created_at": "2017-11-16T14:28:53.919Z",
    "sensitive": false,
    "spoiler_text": "",
    "language": "en",
    "uri": "mastodon.gamedev.place/users/jaggy/statuses/99014576299056245",
    "instance": "mastodon.gamedev.place",
    "content": "<p>Coding a cheeky skill before bed. Not as much as I&apos;d like but had drinks with co-workers after work so shrug ^_^</p>",
    "account_id": "434",
    "tag_list": [],
    "media_attachments": [],
    "emojis": [],
    "mentions": []
}, {
    "id": "99014544879467317",
    "created_at": "2017-11-16T14:20:54.462Z",
    "sensitive": false,
    "spoiler_text": "",
    "language": "en",
    "uri": "mastodon.gamedev.place/users/jaggy/statuses/99014544879467317",
    "instance": "mastodon.gamedev.place",
    "content": "<p>Coding a cheeky skill before bed. Not as much as I&apos;d like but had drinks with co-workers after work so shrug ^_^</p>",
    "account_id": "434",
    "tag_list": [],
    "media_attachments": [],
    "emojis": [],
    "mentions": []
}]
Code to read the data:
import pandas as pd
import json
from pathlib import Path
from pandas.io.json import json_normalize

# path to file
p = Path(r'c:\some_directory_with_data\test.json')

# read the file in and load using the json module
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())

# create a dataframe
df = json_normalize(data)
# dataframe view
id created_at sensitive spoiler_text language uri instance content account_id tag_list media_attachments emojis mentions
99014576299056245 2017-11-16T14:28:53.919Z False en mastodon.gamedev.place/users/jaggy/statuses/99014576299056245 mastodon.gamedev.place <p>Coding a cheeky skill before bed. Not as much as I&apos;d like but had drinks with co-workers after work so shrug ^_^</p> 434 [] [] [] []
99014544879467317 2017-11-16T14:20:54.462Z False en mastodon.gamedev.place/users/jaggy/statuses/99014544879467317 mastodon.gamedev.place <p>Coding a cheeky skill before bed. Not as much as I&apos;d like but had drinks with co-workers after work so shrug ^_^</p> 434 [] [] [] []
Option 2:
Data
The data is in a file, in the form of rows of dicts
Not in a list
Separated with a newline
This is not a valid JSON file
{"id": "99014576299056245", "created_at": "2017-11-16T14:28:53.919Z", "sensitive": false, "spoiler_text": "", "language": "en", "uri": "mastodon.gamedev.place/users/jaggy/statuses/99014576299056245", "instance": "mastodon.gamedev.place", "content": "<p>Coding a cheeky skill before bed. Not as much as I&apos;d like but had drinks with co-workers after work so shrug ^_^</p>", "account_id": "434", "tag_list": [], "media_attachments": [], "emojis": [], "mentions": []}
{"id": "99014544879467317", "created_at": "2017-11-16T14:20:54.462Z", "sensitive": false, "spoiler_text": "", "language": "en", "uri": "mastodon.gamedev.place/users/jaggy/statuses/99014544879467317", "instance": "mastodon.gamedev.place", "content": "<p>Coding a cheeky skill before bed. Not as much as I&apos;d like but had drinks with co-workers after work so shrug ^_^</p>", "account_id": "434", "tag_list": [], "media_attachments": [], "emojis": [], "mentions": []}
Code to read this data
Reading the file in with the following code
data will be a list of str, where each row of the file, is a str in the list
Use ast.literal_eval to convert the str back to a dict
literal_eval won't work if there are invalid values in the str (e.g. false instead of False, true instead of True).
This will cause a ValueError: malformed node or string: <_ast.Name object at 0x000002B7240B7888>, which isn't a particularly helpful error
I've added a try-except block to print any row that causes an issue, add to the values_to_fix dict until you get them all.
import pandas as pd
import json
from pathlib import Path
from pandas.io.json import json_normalize
from ast import literal_eval

# path to file
p = Path(r'c:\some_directory_with_data\test.json')

list_of_dicts = list()
with p.open('r', encoding='utf-8') as f:
    data = f.readlines()

for x in data:
    values_to_fix = {'false': 'False',
                     'true': 'True',
                     'none': 'None'}
    for k, v in values_to_fix.items():
        x = x.replace(k, v)
    try:
        x = literal_eval(x)
        list_of_dicts.append(x)
    except ValueError as e:
        print(e)
        print(x)

df = json_normalize(list_of_dicts)
# this output is the same as that shown above
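As a final note: since the file is line-delimited JSON (.jsonl), pandas can also read it directly with the lines=True argument to pd.read_json, provided the individual lines are valid JSON (the same corruption caveats above apply):
import pandas as pd

# one JSON object per line, read straight into a DataFrame
df = pd.read_json('timeline_1.jsonl', lines=True)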
