Convert the following JSON to CSV using Python

{"a":"1","b":"1","c":"1"}
{"a":"2","b":"2","c":"2"}
{"a":"3","b":"3","c":"3"}
{"a":"4","b":"4","c":"4"}
I have tried the following code, but it gives an error:
from nltk.twitter import Twitter
from nltk.twitter.util import json2csv

with open('C:/Users/Archit/Desktop/raw_tweets.json', 'r') as infile:
    # Variable for building our JSON block
    json_block = []
    for line in infile:
        # Add the line to our JSON block
        json_block.append(line)
        # Check whether we closed our JSON block
        if line.startswith('{'):
            # Do something with the JSON dictionary
            json2csv(json_block, 'tweets.csv', ['id','text','created_at','in_reply_to_user_id','in_reply_to_screen_name','in_reply_to_status_id','user.id','user.screen_name','user.name','user.location','user.friends_count','user.followers_count','source'])
            # Start a new block
            json_block = []
Error:
File "C:\Python34\lib\json\decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)

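import csv, json

data = []
# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r'C:\Users\Shahriar\Desktop\T.txt') as data_file:
    # One JSON object per line
    for line in data_file:
        data.append(json.loads(line))

keys = data[0].keys()
# Python 3: open the CSV in text mode with newline='' (not 'wb')
with open('data.csv', 'w', newline='') as csvF:
    csvWriter = csv.DictWriter(csvF, fieldnames=keys)
    csvWriter.writeheader()
    for d in data:
        csvWriter.writerow(d)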
Output:
a,c,b
1,1,1
2,2,2
3,3,3
4,4,4
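The same DictWriter approach can be sketched so it runs without any files, using io.StringIO in place of T.txt and data.csv (an assumption made only so the example is self-contained). The helper name json_lines_to_csv is hypothetical. Passing an explicit fieldnames list fixes the column order; older Python versions did not preserve dict key order, which is a likely reason the header above reads a,c,b.

```python
import csv
import io
import json

def json_lines_to_csv(lines, out, fieldnames=None):
    """Write one CSV row per JSON object in `lines`; return the number of rows."""
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        return 0
    # Fall back to the first record's keys if no explicit column order is given
    writer = csv.DictWriter(out, fieldnames=fieldnames or list(records[0]),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return len(records)

src = io.StringIO(
    '{"a":"1","b":"1","c":"1"}\n'
    '{"a":"2","b":"2","c":"2"}\n'
    '{"a":"3","b":"3","c":"3"}\n'
    '{"a":"4","b":"4","c":"4"}\n'
)
out = io.StringIO()
rows = json_lines_to_csv(src, out, fieldnames=["a", "b", "c"])
print(out.getvalue())
```

With real files, open the output with open('data.csv', 'w', newline='') and pass that file object as out.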

This is very late, but I also ran into some errors today. It turns out you have to import json2csv from nltk.twitter.common instead of nltk.twitter.util. I hope this helps others who stumble upon this thread.

# Read json
filename = 'C:/Users/Archit/Desktop/raw_tweets.json'
with open(filename) as json_file:
    # Crude text substitution (not a real JSON parser); strip the existing newline
    lines = [line.rstrip("\n").replace("{", "").replace("}", "").replace(":", ",") for line in json_file]
# Write csv
with open('out.csv', 'w') as csv_file:
    for line in lines:
        csv_file.write("%s\n" % line)

Related

json.decoder.JSONDecodeError: Extra data: line 1 column 139811 (char 139810)

What I'm trying to do here is open values.json (which holds the response from https://www.rolimons.com/itemapi/itemdetails), but I get an error on
values = json.load(file)
r = requests.get("https://www.rolimons.com/itemapi/itemdetails")
with open("values.json", "r+") as f:
    f.write(r.text)
    f.close()
file = open("values.json")
values = json.load(file)
This seems to work:
import requests
import json
req = requests.get("https://www.rolimons.com/itemapi/itemdetails")
values = json.loads(req.content)
print(values)
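One plausible cause of the "Extra data" error (an inference, not confirmed in the thread): mode "r+" writes from the start of the existing file but does not truncate it, so if the new response is shorter than what values.json held before, the old tail stays after the new JSON and json.load sees extra data after the first complete document. This sketch reproduces that with a temporary file and hypothetical data; write_then_load is an illustrative helper, not part of any library.

```python
import json
import os
import tempfile

def write_then_load(path, text, mode):
    """Write `text` to `path` using `mode`, then parse the file back as JSON."""
    with open(path, mode) as f:
        f.write(text)
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "values.json")

# Hypothetical data: an earlier, longer response already saved in the file
write_then_load(path, json.dumps({"items": list(range(100))}), "w")

shorter = json.dumps({"items": [1, 2]})
try:
    # "r+" overwrites from offset 0 but keeps whatever was after the new text
    write_then_load(path, shorter, "r+")
    r_plus_error = None
except json.JSONDecodeError as exc:
    r_plus_error = exc.msg
print("r+ :", r_plus_error)

# "w" truncates the file first, so only the new document remains
print("w  :", write_then_load(path, shorter, "w"))
```

Opening with "w", or skipping the file entirely and parsing the response directly as in the answer above, avoids the leftover data.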

Reading and writing a CSV file in Python

I just started learning Python, and I am trying to do the following:
- Read a .csv file
- Write the filtered data to a new file, keeping only rows where column 7 is not blank/empty
When I print my results, the Python shell shows the right output, but when I check the data in the .csv file it is not correct (it differs from what the print function shows).
Any suggestions for my code?
Thank you in advance.
file = open("station.csv", "r")
writeFile = open("stations-filtered.csv", "w")
for line in file:
    line2 = line.split(",")
    if line2[7] != "":
        print(line)
        writeFile.write(line)
I agree with @user513093 that you can use the csv module, like this:
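import csv

with open("station.csv", "r", newline="") as file, \
        open("stations-filtered.csv", "w", newline="") as writeFile:
    reader = csv.reader(file)
    writer = csv.writer(writeFile, delimiter=',')
    for line2 in reader:
        # keep rows whose column 7 (0-based) exists and is not empty
        if len(line2) > 7 and line2[7] != "":
            print(line2)
            # write the parsed fields, not the raw string (a string would be split into characters)
            writer.writerow(line2)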
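That said, pandas is a good fit here as well:
import pandas as pd

file = pd.read_csv("station.csv", sep=",", header=None)
# read_csv turns empty cells into NaN, so test with notna() instead of != ""
file = file[file[7].notna()]
# index=False and header=False keep the output in the same shape as the input
file.to_csv("stations-filtered.csv", index=False, header=False)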
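To see why the csv module is safer than line.split(","), here is a small in-memory sketch with made-up sample rows and hypothetical column names. csv.reader keeps a quoted field such as "Main St, North" intact, and the last column comes back without its trailing newline, so an empty column 7 really is "". With split, the quoted comma shifts the columns, and an empty last column is "\n", never "", so the row is never filtered out.

```python
import csv
import io

def filter_nonempty(src, dst, col=7):
    """Copy CSV rows from src to dst when column `col` (0-based) is non-blank."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    kept = 0
    for row in reader:
        # strip() also treats whitespace-only cells as blank
        if len(row) > col and row[col].strip():
            writer.writerow(row)
            kept += 1
    return kept

# Hypothetical sample: column 7 ("target") is empty in the last row
sample = (
    "id,name,a,b,c,d,e,target\n"
    '1,"Main St, North",x,x,x,x,x,yes\n'
    "2,Harbour,x,x,x,x,x,\n"
)

out = io.StringIO()
kept = filter_nonempty(io.StringIO(sample), out)
print(out.getvalue())

# The naive approach from the question: the empty last column is "\n", not ""
print(repr("2,Harbour,x,x,x,x,x,\n".split(",")[7]))
```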

Print csv to console

I am reading a CSV file with ten lines in order to convert it to JSON; the console output is below, and the Python code is attached. The first step is to print the CSV data to the console, and the error below occurs.
Syntax errors were occurring, but after I fixed them this error began.
data = {}
with open(csvFilePath) as csvFile:
    csvReader = csv.DictReader(csvFile)
    for csvRow in csvReader:
        hmid = csvRow["hmid"]
        data[hmid] = csvRow
Console output:
python csvjson.py
Traceback (most recent call last):
File "csvjson.py", line 12, in <module>
hmid = csvRow["hmid"]
KeyError: 'hmid'
Expected output:
The CSV data printed to the console.
The KeyError exception means that the key you are requesting does not exist in that dictionary.
If the column "hmid" does not exist in every row of the CSV, consider using the dict.get() method, which returns None when the key does not exist instead of raising KeyError.
Alternatively, you can catch the KeyError and skip the row. That would look something like this:
data = {}
with open(csvFilePath) as csvFile:
    csvReader = csv.DictReader(csvFile)
    for csvRow in csvReader:
        try:
            data[csvRow["hmid"]] = csvRow
        except KeyError:
            pass
Or check if the key is in the dictionary before proceeding.
data = {}
with open(csvFilePath) as csvFile:
    csvReader = csv.DictReader(csvFile)
    for csvRow in csvReader:
        if "hmid" not in csvRow:
            continue
        data[csvRow["hmid"]] = csvRow
I made the following file, test.csv:
hmid,first_name,last_name,email,gender,passport_number,departure_city,arrival_city,aircraft_type
1,Lotstring,Duobam,anatwick0@samsung.com,Female,7043833787,Changtang,Tours,B737
2,Rover,Red,rr@nowhere.com,Female,7043833787,Changtang,Tours,B737
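I pasted your code into Python 2.7, and it worked fine; data has two rows.
Maybe your file has an issue with line terminators.
Your CSV file probably starts with a BOM (byte order mark). If the file is read without stripping it, the BOM characters stay glued to the first header, so the column is named something like '\ufeffhmid' instead of 'hmid' and csvRow["hmid"] raises KeyError. Specify encoding='utf-8-sig' when opening the file so the BOM is removed. Here is your code, corrected: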
data = {}
with open(csvFilePath, encoding='utf-8-sig') as csvFile:
    csvReader = csv.DictReader(csvFile)
    for csvRow in csvReader:
        hmid = csvRow["hmid"]
        data[hmid] = csvRow
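A minimal in-memory reproduction of the BOM explanation, using hypothetical file contents and a hypothetical helper load_by_hmid: decoded as plain utf-8, the first header comes back as '\ufeffhmid', so looking up "hmid" raises KeyError; decoded as utf-8-sig, the BOM is stripped and the lookup works.

```python
import csv
import io

# Hypothetical file contents: a UTF-8 BOM followed by a normal header and rows
RAW = "\ufeffhmid,first_name\n1,Lotstring\n2,Rover\n".encode("utf-8")

def load_by_hmid(raw_bytes, encoding):
    """Parse CSV bytes with the given encoding and index the rows by 'hmid'."""
    stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline="")
    reader = csv.DictReader(stream)
    print("headers:", reader.fieldnames)
    return {row["hmid"]: row for row in reader}

try:
    load_by_hmid(RAW, "utf-8")  # headers: ['\ufeffhmid', 'first_name']
except KeyError as exc:
    print("KeyError:", exc)

data = load_by_hmid(RAW, "utf-8-sig")  # headers: ['hmid', 'first_name']
print(sorted(data))
```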

Read a range of lines from the Yelp dataset with Python

I want to change this code to read specifically from line 1400001 to line 1450000. What modification do I need?
The file is composed of a single object type, one JSON object per line.
I also want to save the output to a .csv file. What should I do?
revu=[]
with open("review.json", 'r',encoding="utf8") as f:
    for line in f:
        revu = json.loads(line[1400001:1450000)
If it is JSON per line:
revu=[]
with open("review.json", 'r',encoding="utf8") as f:
    # expensive statement, depending on your filesize this might
    # let you run out of memory
    revu = [json.loads(s) for s in f.readlines()[1400001:1450000]]
If you try it on the /etc/passwd file, it is easy to test (that file is not JSON, of course, so json.loads is left out):
revu = []
with open("/etc/passwd", 'r') as f:
    # expensive statement
    revu = [s for s in f.readlines()[5:10]]
print(revu)  # lines at 0-based indices 5 to 9, i.e. the 6th to 10th lines
Or iterate over the lines one at a time, which avoids the memory issue:
import json

revu = []
with open("...", 'r') as f:
    for i, line in enumerate(f):
        if 1400001 <= i <= 1450000:
            revu.append(json.loads(line))
        elif i > 1450000:
            break  # past the range; no need to read the rest of the file
# process revu
To write the result to CSV:
import pandas as pd
import json

def mylines(filename, _from, _to):
    with open(filename, encoding="utf8") as f:
        for i, line in enumerate(f):
            if i >= _from and i <= _to:
                yield json.loads(line)

df = pd.DataFrame([r for r in mylines("review.json", 1400001, 1450000)])
df.to_csv("/tmp/whatever.csv")
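An alternative sketch using only the standard library: itertools.islice skips to the start line and stops reading once the end is reached, and csv.DictWriter streams the rows out, so neither the whole file nor pandas is needed. The helper names, field names, and sample lines are hypothetical stand-ins for review.json; islice uses 0-based, end-exclusive indices like a list slice.

```python
import csv
import io
import itertools
import json

def json_lines_slice(lines, start, stop):
    """Yield parsed JSON objects for lines[start:stop] (0-based, stop exclusive)."""
    for line in itertools.islice(lines, start, stop):
        yield json.loads(line)

def write_csv(records, out):
    """Write dict records to `out` as CSV, using the first record's keys as columns."""
    records = iter(records)
    first = next(records, None)
    if first is None:
        return 0
    writer = csv.DictWriter(out, fieldnames=list(first), lineterminator="\n")
    writer.writeheader()
    writer.writerow(first)
    count = 1
    for record in records:
        writer.writerow(record)
        count += 1
    return count

# Hypothetical stand-in for review.json: one JSON object per line
src = io.StringIO("".join(
    json.dumps({"review_id": i, "stars": i % 5 + 1}) + "\n" for i in range(10)
))
out = io.StringIO()
written = write_csv(json_lines_slice(src, 3, 6), out)
print(out.getvalue())
```

For the real file, write_csv(json_lines_slice(f, 1400001, 1450001), out), with f opened on review.json and out opened with newline='', would cover the same inclusive range as the answers above. DictWriter raises ValueError if a later record has keys the first one lacked.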

Python read in file: ERROR: line contains NULL byte

I would like to parse a .ubx file (my input file). This file contains many different NMEA sentences as well as raw receiver data. The output file should contain only information from GGA sentences. This works fine as long as the .ubx file does not contain any raw messages. However, if it contains raw data,
I get the following error:
Traceback (most recent call last):
File "C:...myParser.py", line 25, in
for row in reader:
Error: line contains NULL byte
My code looks like this:
import csv
from datetime import datetime
import math

# adapt this to your file
INPUT_FILENAME = 'Rover.ubx'
OUTPUT_FILENAME = 'out2.csv'

# open the input file in read mode
with open(INPUT_FILENAME, 'r') as input_file:
    # open the output file in write mode
    with open(OUTPUT_FILENAME, 'wt') as output_file:
        # create a csv reader object from the input file (nmea files are basically csv)
        reader = csv.reader(input_file)
        # create a csv writer object for the output file
        writer = csv.writer(output_file, delimiter=',', lineterminator='\n')
        # write the header line to the csv file
        writer.writerow(['Time','Longitude','Latitude','Altitude','Quality','Number of Sat.','HDOP','Geoid separation','diffAge'])
        # iterate over all the rows in the nmea file
        for row in reader:
            if row[0].startswith('$GNGGA'):
                time = row[1]
                # merge the time and date columns into one Python datetime object (usually more convenient than having both separately)
                date_and_time = datetime.strptime(time, '%H%M%S.%f')
                date_and_time = date_and_time.strftime('%H:%M:%S.%f')[:-6]
                writer.writerow([date_and_time])
My .ubx file looks like this:
$GNGSA,A,3,16,25,29,20,31,26,05,21,,,,,1.30,0.70,1.10*10
$GNGSA,A,3,88,79,78,81,82,80,72,,,,,,1.30,0.70,1.10*16
$GPGSV,4,1,13,02,08,040,17,04,,,47,05,18,071,44,09,02,348,24*49
$GPGSV,4,2,13,12,03,118,24,16,12,298,36,20,15,118,30,21,44,179,51*74
$GPGSV,4,3,13,23,06,324,35,25,37,121,47,26,40,299,48,29,60,061,49*73
$GPGSV,4,4,13,31,52,239,51*42
$GLGSV,3,1,10,65,07,076,24,70,01,085,,71,04,342,34,72,13,029,35*64
$GLGSV,3,2,10,78,35,164,41,79,75,214,48,80,34,322,46,81,79,269,49*64
$GLGSV,3,3,10,82,28,235,52,88,39,043,43*6D
$GNGLL,4951.69412,N,00839.03672,E,124610.00,A,D*71
$GNGST,124610.00,12,,,,0.010,0.010,0.010*4B
$GNZDA,124610.00,03,07,2016,00,00*79
[several lines of raw binary UBX data, unreadable as text]
$GNRMC,124611.00,A,4951.69413,N,00839.03672,E,0.009,,030716,,,D*62
$GNVTG,,T,,M,0.009,N,0.016,K,D*36
$GNGNS,124611.00,4951.69413,N,00839.03672,E,RR,15,0.70,162.5,47.6,1.0,0000*42
$GNGGA,124611.00,4951.69413,N,00839.03672,E,4,12,0.70,162.5,M,47.6,M,1.0,0000*6A
$GNGSA,A,3,16,25,29,20,31,26,05,21,,,,,1.31,0.70,1.10*11
$GNGSA,A,3,88,79,78,81,82,80,72,,,,,,1.31,0.70,1.10*17
$GPGSV,4,1,13,02,08,040,18,04,,,47,05,18,071,44,09,02,348,21*43
$GPGSV,4,2,13,12,03,118,24,16,
I already searched for similar problems, but I was not able to find a solution that works for me.
I ended up with code like this:
import csv

def unfussy_reader(csv_reader):
    while True:
        try:
            yield next(csv_reader)
        except csv.Error:
            # log the problem or whatever
            print("Problem with some row")
            continue

if __name__ == '__main__':
    #
    # Generate malformed csv file for
    # demonstration purposes
    #
    with open("temp.csv", "w") as fout:
        fout.write("abc,def\nghi\x00,klm\n123,456")
    #
    # Open the malformed file for reading, fire up a
    # conventional CSV reader over it, wrap that reader
    # in our "unfussy" generator and enumerate over that
    # generator.
    #
    with open("Rover.ubx") as fin:
        reader = unfussy_reader(csv.reader(fin))
        for n, row in enumerate(reader):
            fout.write(row[0])
However, with the code above I was not able to write a file containing all the rows read through the unfussy_reader wrapper.
I would be glad if you could help me.
Here is an image of how the .ubx file looks in Notepad++.
Thanks!
I am not quite sure, but your file looks binary. You should try opening it in binary mode:
with open(INPUT_FILENAME, 'rb') as input_file:
It seems you did not open the file with the correct encoding, so the raw messages cannot be read correctly.
If the file is UTF-8 encoded, you need to open it with the encoding option:
with open(INPUT_FILENAME, 'r', newline='', encoding='utf8') as input_file:
If anyone else has this problem reading NMEA sentences from u-blox .ubx files, this Python code worked for me:
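def read_in():
    # Binary mode: the .ubx file mixes NMEA text with binary UBX frames.
    # In Python 3 each line is bytes, so compare against bytes prefixes and
    # open the output files in binary mode as well.
    with open('GNGGA.txt', 'wb') as GNGGA:
        with open('GNRMC.txt', 'wb') as GNRMC:
            with open('rover.ubx', 'rb') as f:
                for line in f:
                    if line.startswith(b'$GNGGA'):
                        GNGGA.write(line)
                    if line.startswith(b'$GNRMC'):
                        GNRMC.write(line)

read_in()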
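A sentence can also follow binary bytes on the same line, in which case startswith misses it. A minimal sketch, assuming GGA sentences use the $GNGGA talker ID as in the sample and that checksums do not need verifying, searches each binary line with a regular expression instead and writes selected fields to CSV under the column names from the question. The helper names are hypothetical, and latitude/longitude are left in raw NMEA ddmm.mmmmm / dddmm.mmmmm form.

```python
import csv
import io
import re

# A $GNGGA sentence anywhere in a line of bytes: capture the fields before "*checksum"
GGA_RE = re.compile(rb"\$GNGGA,([^*\r\n]*)\*[0-9A-Fa-f]{2}")

HEADER = ["Time", "Longitude", "Latitude", "Altitude", "Quality",
          "Number of Sat.", "HDOP", "Geoid separation", "diffAge"]

def gga_fields(stream):
    """Yield the field list of every $GNGGA sentence found in a binary stream."""
    for line in stream:
        for match in GGA_RE.finditer(line):
            yield match.group(1).decode("ascii").split(",")

def ubx_gga_to_csv(stream, out):
    """Write selected GGA fields to `out` as CSV; return the number of rows written."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    count = 0
    for f in gga_fields(stream):
        if len(f) < 13:
            continue  # skip truncated sentences
        # GGA order: time, lat, N/S, lon, E/W, quality, sats, HDOP, alt, M, sep, M, age, station
        writer.writerow([f[0], f[3], f[1], f[8], f[5], f[6], f[7], f[10], f[12]])
        count += 1
    return count

# Sample bytes: an NMEA line, a binary run containing NUL bytes, then a GGA
# sentence preceded by binary bytes on the same line
sample = (
    b"$GNGLL,4951.69412,N,00839.03672,E,124610.00,A,D*71\r\n"
    b"\xb5b\x02\x15\x00\x00\xff\xfe\r\n"
    b"A\xdd$GNGGA,124611.00,4951.69413,N,00839.03672,E,4,12,0.70,162.5,M,47.6,M,1.0,0000*6A\r\n"
)
out = io.StringIO()
rows = ubx_gga_to_csv(io.BytesIO(sample), out)
print(out.getvalue())
```

With the real file, pass open('Rover.ubx', 'rb') as the stream and open('out2.csv', 'w', newline='') as out. Because the file is read as bytes and never handed to csv.reader, NUL bytes in the binary sections do not cause errors.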
You could also use the gnssdump command-line utility, which is installed with the PyGPSClient and pygnssutils Python packages.
For example:
gnssdump filename=Rover.ubx msgfilter=GNGGA
See gnssdump -h for help.
Alternatively, if you want a simple Python script, you could use the pyubx2 Python package, for example:
from pyubx2 import UBXReader

with open("Rover.ubx", "rb") as stream:
    ubr = UBXReader(stream)
    for (_, parsed_data) in ubr.iterate():
        if parsed_data.identity in ("GNGGA", "GNRMC"):
            print(parsed_data)
