Unexpected behavior with open() and numpy.savetext() functions - python

Problem
I am trying to output statistics about a table, followed by more table data using Pandas and numpy.
When I execute the following code:
import pandas as pd
import numpy as np
data = pd.read_csv(r'c:\Documents\DS\CAStateBuildingMetrics.csv')
waterUsage = data["Water Use (All Water Sources) (kgal)"]
dept = data[["Department Name", "Property Id"]]
mean = str(waterUsage.mean())
median = str(waterUsage.median())
most = str(waterUsage.mode())
hw1 = open(r'c:\Documents\DS\testFile', "a")
hw1.write("Mean Water Usage Median Water Usage Most Common Usage Amounts\n")
hw1.write(mean+' '+median+' '+most)
np.savetxt(r'c:\Documents\DS\testFile', dept.values, fmt='%s')
The table output by np.savetext is written into c:\Documents\DS\testFile before the statistics about Mean, Median, and Mode water usage are written into the file. Below is the output I am describing:
Here is a sample of the table output, which ends up to be 1700 rows.
Capitol Area Development Authority 1259182
Capitol Area Development Authority 1259200
Capitol Area Development Authority 1259218
California Department of Forestry and Fire Protection 3939905
California Department of Forestry and Fire Protection 3939906
California Department of Forestry and Fire Protection 3939907
After this, the script outputs the statistics in this format
Mean Water Usage Median Water Usage Most Common Usage Amounts
6913.1633414932685 182.35 0 165.0
Type: float64
Question
How do I adjust the behavior to guarantee that the statistics appear before the table?

The issue, as pointed out by #hpaulj, is that the same open file is not being referenced.
Replacing
np.savetxt(r'c:\Documents\DS\testFile', dept.values, fmt='%s')
With
np.savetxt(hw1, dept.values, fmt='%s')
hw1.close()
Will write all information in the expected order in the same file. Closing it follows best practices of handling files in Python.

Related

How to load complex data using Pyspark

I have a CSV dataset that looks like the below:
Also, PFB data in form of text:
Timestamp,How old are you?,What industry do you work in?,Job title,What is your annual salary?,Please indicate the currency,Where are you located? (City/state/country),How many years of post-college professional work experience do you have?,"If your job title needs additional context, please clarify here:","If ""Other,"" please indicate the currency here: "
4/24/2019 11:43:21,35-44,Government,Talent Management Asst. Director,75000,USD,"Nashville, TN",11 - 20 years,,
4/24/2019 11:43:26,25-34,Environmental nonprofit,Operations Director,"65,000",USD,"Madison, Wi",8 - 10 years,,
4/24/2019 11:43:27,18-24,Market Research,Market Research Assistant,"36,330",USD,"Las Vegas, NV",2 - 4 years,,
4/24/2019 11:43:27,25-34,Biotechnology,Senior Scientist,34600,GBP,"Cardiff, UK",5-7 years,,
4/24/2019 11:43:29,25-34,Healthcare,Social worker (embedded in primary care),55000,USD,"Southeast Michigan, USA",5-7 years,,
4/24/2019 11:43:29,25-34,Information Management,Associate Consultant,"45,000",USD,"Seattle, WA",8 - 10 years,,
4/24/2019 11:43:30,25-34,Nonprofit ,Development Manager ,"51,000",USD,"Dallas, Texas, United States",2 - 4 years,"I manage our fundraising department, primarily overseeing our direct mail, planned giving, and grant writing programs. ",
4/24/2019 11:43:30,25-34,Higher Education,Student Records Coordinator,"54,371",USD,Philadelphia,8 - 10 years,equivalent to Assistant Registrar,
4/25/2019 8:35:51,25-34,Marketing,Associate Product Manager,"43,000",USD,"Cincinnati, OH, USA",5-7 years,"I started as the Marketing Coordinator, and was given the ""Associate Product Manager"" title as a promotion. My duties remained mostly the same and include graphic design work, marketing, and product management.",
Now, I tried the below code to load the data:
df = spark.read.option("header", "true").option("multiline", "true").option(
"delimiter", ",").csv("path")
It gives me the output as below for the last record which divides the columns and also the output is not as expected:
The value should be null for the last column i.e "If ""Other,"" please indicate the currency here: " and the entire string should be wrapped up in the earlier column which is "If your job title needs additional context, please clarify here:"
I also tried .option('quote','/"').option('escape','/"') but didn't work too.
However, when I tried to load this file using Pandas, it was loaded correctly. I was surprised how Pandas can identify where the new column name starts and all. Maybe I can define a String schema for all the columns and load it back to the spark data frame but since I am using the lower spark version it won't work in a distributed manner hence I was exploring a way how Spark can handle this efficiently.
Any help is much appreciated.
Main issue is consecutive double quotes in your csv file.
you have to escape extra double quotes in your csv file
Like this :
4/24/2019 11:43:30,25-34,Higher Education,Student Records Coordinator,"54,371",USD,Philadelphia,8 - 10 years,equivalent to Assistant Registrar,
4/25/2019 8:35:51,25-34,Marketing,Associate Product Manager,"43,000",USD,"Cincinnati, OH, USA",5-7 years,"I started as the Marketing Coordinator, and was given the \\" \ " Associate Product Manager \\" \ " title as a promotion. My duties remained mostly the same and include graphic design work, marketing, and product management.",
After this it is generating result as expected :
df2 = spark.read.option("header",True).csv("sample1.csv")
df2.show(10,truncate=False)
******** Output ********
|4/25/2019 8:35:51 |25-34 |Marketing |Associate Product Manager |43,000 |USD |Cincinnati, OH, USA |5-7 years |I started as the Marketing Coordinator, and was given the ""Associate Product Manager"" title as a promotion. My duties remained mostly the same and include graphic design work, marketing, and product management.|null |null |
Or you can use blow code
df2 = spark.read.option("header",True).option("multiline","true").option("escape","\"").csv("sample1.csv")

How to get a value/row from pandas read_csv

I am using a CSV file with news about crypto. My goal is to practice string manipulation and methods. The CSV looks something like this :
publishdate headlinetext
20130504 COnSTELlATIon DaG iS nOW liStEd On kucoiN eXC?haNGE
20130511 ItA*lys cRypTOCUrREnCy BITgrAil suspeNds OpERatIOnS
20130511 THe diffeRENCe bETWEEn sHarEs aNd cRYpToCUrReN€CiES
20130512 fedS seIzE 47 mIlLION In bItCoinS in FAke ID ST=ing
20130514 ThE diG sTarteD ASiCboOST neTwORK AnD b#ItcoIN cAsH
20130516 BINAncE far atualiZAO progRaMadA NEsTa QuarTAFEIR?a
20130516 tHe EUropeaN UniOn IS pLaNninG tO rEgULAtE Bi=TcOIn
20130516 i!BeROBiT HELpiNG bItcOIn To GO MAinStream IN sPaIN
20130521 EuropES sMALLEr €banKS WELComE CrypTOcuRRENCy uSerS
20130604 BiTcoIn btc hIghER bTc= price BRiNgs mOrE Btc SCAmS
20130610 BITCOin brEAkS 9000 iN latEST LANDmARK Pr#iCE pOinT
20130613 ubcoiN mArkEt movEs ItS HEAdQUArTERS To €SiNgaPorE
20130624 reeds jEwelErs TaKinG bitCOiN ONLiNe And IN Sto$Res
20130705 CoNtrOvE!RSY turnS to cLoSuRe as LItePAy SHUts DowN
20130709 bUll rESiSTAncE BITCOIn pRicE nEeDs brEAK AbOve 9K*
20130714 DIVoRcE DISpUte co#Uple fIghtS ovEr 830k Of BITcoin
20130718 10K agaIn For BitcoiN buT oT!her CRyptOs OUTperfORM
20130724 FACebOoKS liBRa crYptoCUrrency wHER$E aRE ThE BANkS
20130726 COULd eNjIn coiN ReacH A neW AlltiM=E hIGh in APRIL
20130827 the GReaT Tug of WaR betWE=eN bItCOiNS anD AltCOINs
20130827 The SacRAMento kINgs mINE EthEreuM ETh for C#HArItY
20130905 cryPtOCuRREncY aTMs tHE KEY T*o WIdeSPRead ADoPtiOn
20130909 GraySCales EtHereUM TRusT pRICE= VaLUeS Eth aT 6000
(...)
Then I used pandas to read the CSV.
import pandas as pd
news_headlines = pd.read_csv('/content/sample_data/crypto_headlines.csv')
news_headlines
Now I need to get the strings to work with them and change them to lower or upper case and then remove special characters.
However , I don't know which method I should use to extract a string from this variable I created called news_headlines.
Let's say I wanted to extract the 2nd row, with the publish date on 20130511.
Any help ?
Thanks in advance
You can use iloc(), as in the example below, to extract the string related to the second row:
news_headlines.iloc[1, 1]

Want to extract text from a text or pdf file as different paragraphs

Check the following text piece
IN THE HIGH COURT OF GUJARAT AT AHMEDABAD
R/CRIMINAL APPEAL NO. 251 of 2009
FOR APPROVAL AND SIGNATURE:
HONOURABLE MR.JUSTICE R.P.DHOLARIA
==========================================================
1 Whether Reporters of Local Papers may be allowed to see the judgment ?
2 To be referred to the Reporter or not ?
3 Whether their Lordships wish to see the fair copy of the judgment ?
4 Whether this case involves a substantial question of law as to the interpretation of the Constitution of India or any order made thereunder ?
========================================================== STATE OF GUJARAT,S M RAO,FOOD INSPECTOR,OFFICE OF THE Versus DHARMESHBHAI NARHARIBHAI GANDHI ========================================================== Appearance: MS HB PUNANI, APP (2) for the Appellant(s) No. 1 MR DK MODI(1317) for the Opponent(s)/Respondent(s) No. 1 ==========================================================
CORAM: HONOURABLE MR.JUSTICE R.P.DHOLARIA
Date : 12/03/2019
ORAL JUDGMENT
1. The appellant State of Gujarat has
preferred the present appeal under section 378(1)
(3) of the Code of Criminal Procedure, 1973
against the judgment and order of acquittal dated
Page 1 of 12
R/CR.A/251/2009 JUDGMENT
17.11.2008 rendered by learned 2nd Additional
Civil Judge and Judicial Magistrate, First Class,
Nadiad in Food Case No.1 of 2007.
The short facts giving rise to the
present appeal are that on 10.11.2006 at about
18.00 hours, the complainant visited the place of
the respondent accused situated at Juna
Makhanpura, Rabarivad, Nadiad along with panch
witness and the respondent was found dealing in
provisional items. The complainant identified
himself as a Food Inspector and after giving
intimation in Form No.6 has purchased muddamal
sample of mustard seeds in the presence of the
panchas for the purpose of analysis. Thereafter,
the complainant Food Inspector has divided the
said sample in equal three parts and after
completing formalities of packing and sealing
obtained signatures of the vendor and panchas and
out of the said three parts, one part was sent to
the Public Analyst, Vadodara for analysis and
remaining two parts were sent to the Local Health
Authority, Gandhinagar. Thereafter, the Public
Analyst forwarded his report. In the said report,
it is stated that the muddamal sample of mustard
seeds is misbranded which is in breach of the
provisions of the Food Adulteration Act, 1954
(for short “the Act”) and the Rules framed
thereunder. It is alleged that, therefore, the
sample of mustard seeds was misbranded and,
thereby, the accused has committed the offence.
**Page 2 of 12
R/CR.A/251/2009* JUDGMENT*
Hence, the complaint came to be lodged against
the respondent accused.
I want to be able to write a program such that it follows the given constraints. Be wary of the fact that this is only a single file i have like 40k files and it should run on all the files. All the files have some difference but the basic format for every file is the same.
Constraints.
It should start the text extraction process from after the "metadata" . Metadata is the data about the file from the starting of the file i.e " In the high court of gujarat" till Oral Judgment. In all the files i have , there are various POINTS after the string ends. So i need all these points as a separate paragraph ( see the text has 2 points , i need it in different paragraphs ).
Check the lines in italics, these are the panes in the text/pdf file. I need to remove these as these donot have any meaning to the text content i want.
These files are both available in TEXT or PDF format so i can use either. But i am new to python so i dont know how and where to start. I just have basic knowledge in python.
This data is going to be made into a "corpus" for further processes in building a huge expert system so you know what needs to be done i hope.
Read the official python docs!
Start with python's basic str type and its methods. One of its methods, find, will find substrings in your text.
Use the python slicing notation to extract the portion of text you need, e.g.
text = """YOUR TEXT HERE..."""
meta_start = 'In the high court of gujarat'
meta_end = 'ORAL JUDGMENT'
pos1 = text.find(meta_start)
pos2 = text.find(meta_end)
if pos2 > pos1 and pos1 > -1:
# text is found, extract it
text1 = text[meta_start + len(meta_start):meta_end - 1]
After that, you can go ahead and save your extracted text to a database.
Of course, a better and more complicated solution would be to use regular expressions, but that's another story -- try finding the right way for yourself!
As to italics and other text formatting, you won't ever be able to mark it out in plain text (unless you have some 'meta' markers, like e.g. [i] tags).

How to get the country code and national phone number after parsing using the phonenumbers package?

I'm making an application to clean cellphone number. I'm using the phonenumbers package. So I used phonenumbers.parse(Cell No, Country Initials)
and on console it looks like this:
Country Code: ### National Number: ###
I was planning to just delete some text to get the area code and national number but I remembered that the area code and national number character length is different in other countries.
Is there a way to get just the area code and national numbers seperatly
phonenumbers.parse returns a PhoneNumber object.
After y = phonenumbers.parse("020 8366 1177", "GB"), you can access the attributes by e.g. y.country_code or set them by y.country_code = 49 etc.
Extraction of the area code is a bit tricky as not all countries have the concept of an area code. See this code snippet of how to correctly get the area code and national subscriber number separately.

how to read nested json file in pandas dataframe?

I learned how to load and read json file in pandas dataframe. However, I have multiple json files about news and each json file hold a rather complicated nested structure to represent news content and its metadata. I need to read them in pandas dataframe for next downstream analysis. So I figured out how to load and read json file in python. However, the solution that I learned for my json file doesn't work for me. Here is example json data snippet on the fly: example json file and here is what I tried:
import os, json
import pandas as pd
path_to_json = 'FakeNewsNetData/BuzzFeed/FakeNewsContent/' // multiple json files
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
with open('json_files[0]') as f:
data = pd.DataFrame(json.loads(line) for line in f)
but I didn't get expected pandas dataframe. How can I read json file with nested structure into pandas dataframe nicely? Is there anyone take a look example json data snippet and provide a possible idea to make this work in pandas dataframe? Any thoughts? Thanks
source of json data:
I used json data from this github repository: FakeNewsNet Dataset, so you can browse the how original data looks like and create neat pandas dataframe from it. Any idea to get this done easily? Thanks again
update 2:
I tried following solution but it didn't work for me:
import json
import pandas as pd
with open('FakeNewsContent/BuzzFeed_Fake_1-Webpage.json', 'r') as f:
data = json.load(f)
df = pd.DataFrame(data)
ValueError: arrays must all be same length
import os
import glob
import json
from pandas.io.json import json_normalize
path_to_json = 'FakeNewsNetData/BuzzFeed/FakeNewsContent/'
json_paths = glob.glob(os.path.join(path_to_json, "*.json"))
df = pd.concat((json_normalize(json.load(open(p))) for p in json_paths), axis=0)
df = df.reset_index(drop=True) # Optionally reset index.
This will load all your json files into a single dataframe.
It will also flatten the nested json hierarchy by adding '.' between the keys.
You will probably need to perform further data cleaning, for e.g., by replacing the NaNs with appropriate values. This can be done with the the dataframe's fillna, or by applying a function to transform individual values.
Edit
As I mentioned in the comment, the data is actually messy, so words such as "View All Post" can be one of the values for "authors". See the JSON "BuzzFeed_Fake_26-Webpage.json" for an example.
To remove these entries and possibly others,
# This will be a set of entries you wish to remove.
# Here we only consider "View All Posts".
invalid_entries = {"View All Posts"}
import functools
def fix(x, invalid):
if isinstance(x, list):
return [i for i in x if i not in invalid]
else:
# You can optionally choose to return [] here to fix the NaNs
# and to standardize the types of the values in this column
return x
fix_author = functools.partial(fix, invalid=invalid_entries)
df["authors"] = df.authors.apply(fix_author)
You need to orient your dataframe. Try the below code to update your Update 2 Approach:
x = {"top_img": "http://eaglerising.com/wp-content/uploads/2016/09/terrorism-2.jpg", "text": "On Saturday, September 17 at 8:30 pm EST, an explosion rocked West 23 Street in Manhattan, in the neighborhood commonly referred to as Chelsea, injuring 29 people, smashing windows and initiating street closures. There were no fatalities. Officials maintain that a homemade bomb, which had been placed in a dumpster, created the explosion. The explosive device was removed by the police at 2:25 am and was sent to a lab in Quantico, Virginia for analysis. A second device, which has been described as a \u201cpressure cooker\u201d device similar to the device used for the Boston Marathon bombing in 2013, was found on West 27th Street between the Avenues of the Americas and Seventh Avenue. By Sunday morning, all 29 people had been released from the hospital. The Chelsea incident came on the heels of an incident Saturday morning in Seaside Heights, New Jersey where a bomb exploded in a trash can along a route where thousands of runners were present to run a 5K Marine Corps charity race. There were no casualties. By Sunday afternoon, law enforcement had learned that the NY and NJ explosives were traced to the same person.\n\nGiven that we are now living in a world where acts of terrorism are increasingly more prevalent, when a bomb goes off, our first thought usually goes to the possibility of terrorism. After all, in the last year alone, we have had several significant incidents with a massive number of casualties and injuries in Paris, San Bernardino California, Orlando Florida and Nice, to name a few. And of course, last week we remembered the 15th anniversary of the September 11, 2001 attacks where close to 3,000 people were killed at the hands of terrorists. However, we also live in a world where political correctness is the order of the day and the fear of being labeled a racist supersedes our natural instincts towards self-preservation which, of course, includes identifying the evil-doers. Isn\u2019t that how crimes are solved? Law enforcement tries to identify and locate the perpetrators of the crime or the \u201cbad guys.\u201d Unfortunately, our leadership \u2013 who ostensibly wants to protect us \u2013 finds their hands and their tongues tied. They are not allowed to be specific about their potential hypotheses for fear of offending anyone.\n\nNew York City Mayor Bill de Blasio \u2013 who famously ended \u201cstop-and-frisk\u201d profiling in his city \u2013 was extremely cautious when making his first remarks following the Chelsea neighborhood explosion. \u201cThere is no specific and credible threat to New York City from any terror organization,\u201d de Blasio said late Saturday at the news conference. \u201cWe believe at this point in this time this was an intentional act. I want to assure all New Yorkers that the NYPD and \u2026 agencies are at full alert\u201d, he said. Isn\u2019t \u201can intentional act\u201d terrorism? We may not know whether it is from an international terrorist group such as ISIS, or a homegrown terrorist organization or a deranged individual or group of individuals. It is still terrorism. It is not an accident. James O\u2019Neill, the New York City Police Commissioner had already ruled out the possibility that the explosion was caused by a natural gas leak at the time the Mayor made his comments. New York\u2019s Governor Andrew Cuomo was a little more direct than de Blasio saying that there was no evidence of international terrorism and that no specific groups had claimed responsibility. However, he did say that it is a question of how the word \u201cterrorism\u201d is defined. \u201cA bomb exploding in New York is obviously an act of terrorism.\u201d Cuomo hit the nail on the head, but why did need to clarify and caveat before making his \u201cobvious\u201d assessment?\n\nThe two candidates for president Hillary Clinton and Donald Trump also weighed in on the Chelsea explosion. Clinton was very generic in her response saying that \u201cwe need to do everything we can to support our first responders \u2013 also to pray for the victims\u201d and that \u201cwe need to let this investigation unfold.\u201d Trump was more direct. \u201cI must tell you that just before I got off the plane a bomb went off in New York and nobody knows what\u2019s going on,\u201d he said. \u201cBut boy we are living in a time\u2014we better get very tough folks. We better get very, very tough. It\u2019s a terrible thing that\u2019s going on in our world, in our country and we are going to get tough and smart and vigilant.\u201d\n\nUnfortunately, an incident like the Chelsea explosion reminds us how vulnerable our country is particularly in venues defined as \u201csoft targets.\u201d Now more than ever, America needs strong leadership which is laser-focused on protecting her citizens from terrorist attacks of all genres and is not afraid of being politically incorrect.\n\nThe views expressed in this opinion article are solely those of their author and are not necessarily either shared or endorsed by EagleRising.com", "authors": ["View All Posts", "Leonora Cravotta"], "keywords": [], "meta_data": {"description": "\u201cWe believe at this point in this time this was an intentional act,\" de Blasio said. Isn\u2019t \u201can intentional act\u201d terrorism?", "og": {"site_name": "Eagle Rising", "description": "\u201cWe believe at this point in this time this was an intentional act,\" de Blasio said. Isn\u2019t \u201can intentional act\u201d terrorism?", "title": "Another Terrorist Attack in NYC...Why Are we STILL Being Politically Correct", "locale": "en_US", "image": "http://eaglerising.com/wp-content/uploads/2016/09/terrorism-2.jpg", "updated_time": "2016-09-22T10:49:05+00:00", "url": "http://eaglerising.com/36942/another-terrorist-attack-in-nyc-why-are-we-still-being-politically-correct/", "type": "article"}, "robots": "noimageindex", "fb": {"app_id": 256195528075351, "pages": 135665053303678}, "article": {"section": "Political Correctness", "tag": "terrorism", "published_time": "2016-09-22T07:10:30+00:00", "modified_time": "2016-09-22T10:49:05+00:00"}, "viewport": "initial-scale=1,maximum-scale=1,user-scalable=no", "googlebot": "noimageindex"}, "canonical_link": "http://eaglerising.com/36942/another-terrorist-attack-in-nyc-why-are-we-still-being-politically-correct/", "images": ["http://constitution.com/wp-content/uploads/2017/08/confederatemonument_poll_pop.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46772-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/2016/03/eagle-rising-logo3-1.png", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46729-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46764-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46731-featured-300x130.jpg", "http://pixel.quantserve.com/pixel/p-52ePUfP6_NxQ_.gif", "http://0.gravatar.com/avatar/9b4601287436c60e1c7c5b65d725151f?s=112&d=mm&r=g", "http://b.scorecardresearch.com/p?c1=2&c2=22315475&cv=2.0&cj=1", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46784-featured-300x130.png", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/2016/09/terrorism-2-800x300.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/2016/09/coup-375x195.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/2017/04/crtv_300x600_1.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46774-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/2016/09/superstar-375x195.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46763-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46612-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46761-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46642-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46735-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46750-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46755-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46752-featured-300x130.png", "http://eaglerising.com/wp-content/uploads/2016/09/terrorism-2.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46743-featured-300x130.jpg", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46712-featured-300x130.jpg", "http://0.gravatar.com/avatar/9b4601287436c60e1c7c5b65d725151f?s=100&d=mm&r=g", "http://2lv0hm3wvpix464wwy2zh7d1.wpengine.netdna-cdn.com/wp-content/uploads/wordpress-popular-posts/46757-featured-300x130.png"], "title": "Another Terrorist Attack in NYC\u2026Why Are we STILL Being Politically Correct \u2013 Eagle Rising", "url": "http://eaglerising.com/36942/another-terrorist-attack-in-nyc-why-are-we-still-being-politically-correct/", "summary": "", "movies": [], "publish_date": {"$date": 1474528230000}, "source": "http://eaglerising.com"}
import pandas as pd
df = pd.DataFrame.from_dict(x, orient='index')
print df
Reading from JSON file:
import json
import pandas as pd
with open('FakeNewsContent/BuzzFeed_Fake_1-Webpage.json', 'r') as f:
data = json.load(f)
df = pd.DataFrame.from_dict(data, orient='index')
print df

Categories

Resources