Convert a stream of JSONs to CSV - python

I want to convert a stream of JSONs (nearly 10,000) pasted in a file to a CSV file with a particular format for headers and values.
I have the following stream of JSON data:
{"shortUrlClicks":"594","longUrlClicks":"594","countries":[{"count":"125","id":"IQ"},{"count":"94","id":"US"},{"count":"56","id":"TR"},{"count":"50","id":"SA"},{"count":"29","id":"DE"},{"count":"24","id":"TN"},{"count":"20","id":"DZ"},{"count":"14","id":"EG"},{"count":"13","id":"MA"},{"count":"12","id":"PS"}],"browsers":[{"count":"350","id":"Chrome"},{"count":"100","id":"Firefox"},{"count":"46","id":"Safari"},{"count":"35","id":"Mobile"},{"count":"20","id":"Mobile Safari"},{"count":"20","id":"SamsungBrowser"},{"count":"8","id":"MSIE"},{"count":"6","id":"Opera"},{"count":"3","id":"OS;FBSV"},{"count":"2","id":"Maxthon"}],"platforms":[{"count":"227","id":"Android"},{"count":"221","id":"Windows"},{"count":"67","id":"iPhone"},{"count":"30","id":"X11"},{"count":"25","id":"Macintosh"},{"count":"8","id":"iPad"},{"count":"2","id":"Android 4.2.2"},{"count":"1","id":"Android 4.1.2"},{"count":"1","id":"Android 4.3"},{"count":"1","id":"Android 5.0.1"}],"referrers":[{"count":"340","id":"unknown"},{"count":"193","id":"t.co"},{"count":"38","id":"m.facebook.com"},{"count":"12","id":"addpost.it"},{"count":"4","id":"plus.google.com"},{"count":"3","id":"www.facebook.com"},{"count":"1","id":"goo.gl"},{"count":"1","id":"l.facebook.com"},{"count":"1","id":"lm.facebook.com"},{"count":"1","id":"plus.url.google.com"}]}
{"shortUrlClicks":"594","longUrlClicks":"594","countries":[{"count":"125","id":"IQ"},{"count":"94","id":"US"},{"count":"56","id":"TR"},{"count":"50","id":"SA"},{"count":"29","id":"DE"},{"count":"24","id":"TN"},{"count":"20","id":"DZ"},{"count":"14","id":"EG"},{"count":"13","id":"MA"},{"count":"12","id":"PS"}],"browsers":[{"count":"350","id":"Chrome"},{"count":"100","id":"Firefox"},{"count":"46","id":"Safari"},{"count":"35","id":"Mobile"},{"count":"20","id":"Mobile Safari"},{"count":"20","id":"SamsungBrowser"},{"count":"8","id":"MSIE"},{"count":"6","id":"Opera"},{"count":"3","id":"OS;FBSV"},{"count":"2","id":"Maxthon"}],"platforms":[{"count":"227","id":"Android"},{"count":"221","id":"Windows"},{"count":"67","id":"iPhone"},{"count":"30","id":"X11"},{"count":"25","id":"Macintosh"},{"count":"8","id":"iPad"},{"count":"2","id":"Android 4.2.2"},{"count":"1","id":"Android 4.1.2"},{"count":"1","id":"Android 4.3"},{"count":"1","id":"Android 5.0.1"}],"referrers":[{"count":"340","id":"unknown"},{"count":"193","id":"t.co"},{"count":"38","id":"m.facebook.com"},{"count":"12","id":"addpost.it"},{"count":"4","id":"plus.google.com"},{"count":"3","id":"www.facebook.com"},{"count":"1","id":"goo.gl"},{"count":"1","id":"l.facebook.com"},{"count":"1","id":"lm.facebook.com"},{"count":"1","id":"plus.url.google.com"}]}
... and so on.
I want to convert it into CSV, using the keys (shortUrlClicks, longUrlClicks, etc.) as the headers.
I would be thankful if you could help me with this. Any code in Python or any other language would be useful.

You can use the json module from the Python Standard Library to parse each line, and the built-in open() function to read and write the files.
It would be something like this:
import json
csv_lines = []
with open('file.json', 'r') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        item = json.loads(line)  # one JSON object per line
        # get columns somehow; as an example, take two top-level fields
        columns = [item['shortUrlClicks'], item['longUrlClicks']]
        # Finished row: join the columns with commas.
        csv_lines.append(",".join(columns))
# write csv file
with open('out.csv', 'w') as out:
    out.write("\n".join(csv_lines) + "\n")
PS: I didn't run this code before posting. It is only meant to give you an idea.
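To make the "get columns somehow" step concrete, here is a minimal sketch using only the standard library. The column naming is an assumption, since the question does not show the target layout: each nested {"count", "id"} entry becomes a column named "<key>.<id>" (for example countries.IQ). The names flatten and json_lines_to_csv are hypothetical helpers, not part of any library.

```python
import csv
import io
import json


def flatten(record):
    """Flatten one record: scalars stay as-is, lists of
    {"count": ..., "id": ...} become "<key>.<id>" columns (assumed layout)."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, list):
            for entry in value:
                flat["{}.{}".format(key, entry["id"])] = entry["count"]
        else:
            flat[key] = value
    return flat


def json_lines_to_csv(lines, out):
    """Write one CSV row per non-blank JSON line to the file-like `out`."""
    rows = [flatten(json.loads(line)) for line in lines if line.strip()]
    # Collect headers in first-seen order, since records may differ in their ids.
    headers = []
    for row in rows:
        for name in row:
            if name not in headers:
                headers.append(name)
    # restval fills columns that a given record does not have.
    writer = csv.DictWriter(out, fieldnames=headers, restval="")
    writer.writeheader()
    writer.writerows(rows)


sample = [
    '{"shortUrlClicks":"594","countries":[{"count":"125","id":"IQ"}]}',
    '{"shortUrlClicks":"12","countries":[{"count":"3","id":"US"}]}',
]
buffer = io.StringIO()
json_lines_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

For a real file, pass an open file object as lines and a file opened with newline='' as out. All rows are held in memory so the full header set is known before writing, which should be fine for roughly 10,000 records.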

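You can use pandas to do this for you.
Read your JSON file like this (lines=True tells pandas that the file holds one JSON object per line):
import pandas
df = pandas.read_json('filename.json', lines=True)
Write it to CSV:
df.to_csv('filename.csv', index=False) # index=False leaves out the index column if you don't need it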
Example:
http://hayd.github.io/2013/pandas-json
REF:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html

Related

Pandas Dataframe.to_csv - insert value of variable into beginning of csv file

Python 3.8.5 Pandas 1.1.3
I'm using the following to loop through json files and create csv files:
import os
import glob
import json
import pandas as pd
def stuff():
    results_list = []
    for filepath in glob.iglob('/Users/me/data/*.json'):
        filename = str(filepath)
        with open(filepath, "r") as file:
            data = json.load(file)  # json_normalize needs parsed JSON, not a string
        df = pd.json_normalize(data, 'main')
        df.to_csv(filename + '.csv')
        results_list.append(data)
    return results_list
The format of the resulting csv files fits my requirements exactly without having to pass any additional params to the to_csv method - when viewing the csv file in Excel, row 1 is the keys as the headers, and column 1 is the index numbers. Exactly what I need. Cell A1 is blank.
One final step that I need to accomplish is to write the filename variable value to the csv file. Ideally I'd like to put it in cell A1, if possible. Can I accomplish this solely with to_csv or am I going to need to get into csv.writer world?
You can exploit the index name for that purpose, since to_csv writes it as the header of the index column (cell A1):
df.rename_axis(filename).to_csv(filename + '.csv')
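A small sketch of the index-name approach, to show where the name ends up. The DataFrame contents and the filename value are made-up stand-ins for the question's data.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data and filename.
filename = "report.json"
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.rename_axis(filename).to_csv()

# The index name takes the place of the blank first header cell.
first_line = csv_text.splitlines()[0]
```

rename_axis returns a new DataFrame, so the original df keeps its unnamed index.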

Grab values from separate csv file and replace the values of columns in a pipe delimited file

Trying to whip this out in Python. Long story short, I have a CSV file that contains column data I need to inject into another file that is pipe delimited. My understanding is that Python can't replace values in place, so I have to rewrite the whole file with the new values.
data file(csv):
value1,value2,iwantthisvalue3
source file(txt, | delimited)
value1|value2|iwanttoreplacethisvalue3|value4|value5|etc
fixed file(txt, | delimited)
samevalue1|samevalue2| replacedvalue3|value4|value5|etc
I can't figure out how to accomplish this. This is my latest attempt(broken code):
import re
import csv
result = []
row = []
with open("C:\data\generatedfixed.csv","r") as data_file:
    for line in data_file:
        fields = line.split(',')
        result.append(fields[2])
with open("C:\data\data.txt","r") as source_file, with open("C:\data\data_fixed.txt", "w") as fixed_file:
    for line in source_file:
        fields = line.split('|')
        n=0
        for value in result:
            fields[2] = result[n]
            n=n+1
            row.append(line)
    for value in row
        fixed_file.write(row)
I would highly suggest you use the pandas package here. It makes handling tabular data very easy and would help you a lot in this case. Once you have installed pandas, import it with:
import pandas as pd
To read the files, simply use (raw strings keep the backslashes in the Windows paths literal):
data_file = pd.read_csv(r"C:\data\generatedfixed.csv")
source_file = pd.read_csv(r"C:\data\data.txt", delimiter="|")
After that, manipulating these two files is easy. I'm not exactly sure how many values or which ones you want to replace, but if the "iwantthisvalue3" and "iwanttoreplacethisvalue3" columns have the same number of rows, then this should do the trick:
source_file['iwanttoreplacethisvalue3'] = data_file['iwantthisvalue3']
Now all you need to do is save the dataframe (the table that we just updated) to a file. Since you want a .txt file with "|" as the delimiter, this line does that (you can customize how it is saved in many ways):
source_file.to_csv(r"C:\data\data_fixed.txt", sep='|', index=False)
Let me know if everything works and this helped you. I would also encourage you to read up (or watch some videos) on pandas if you're planning to work with tabular data; it is an awesome library with great documentation and functionality.
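If the files have no header row, as the samples in the question suggest, the same replacement can be sketched with the standard csv module alone. This assumes the rows of both files line up one-to-one; replace_third_field is a hypothetical helper, and the in-memory strings stand in for the real files.

```python
import csv
import io


def replace_third_field(data_lines, source_lines):
    """Return the pipe-delimited source rows with field 3 taken from the CSV rows."""
    replacements = [row[2] for row in csv.reader(data_lines)]
    fixed = []
    # zip pairs the rows up in order and stops at the shorter input.
    for row, new_value in zip(csv.reader(source_lines, delimiter="|"), replacements):
        row[2] = new_value  # overwrite the third field only
        fixed.append(row)
    return fixed


data_csv = io.StringIO("a1,a2,NEW1\nb1,b2,NEW2\n")
source_txt = io.StringIO("a1|a2|old1|a4|a5\nb1|b2|old2|b4|b5\n")

out = io.StringIO()
csv.writer(out, delimiter="|", lineterminator="\n").writerows(
    replace_third_field(data_csv, source_txt)
)
fixed_text = out.getvalue()
```

With real files, open all three with newline='' and pass the file objects in place of the StringIO buffers.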

Python: How to create a new dataframe with first row when a specific value

I am reading csv files into python using:
df = pd.read_csv(r"C:\csvfile.csv")
But the file has some summary data at the top, and the raw data starts where the value "valx" is found. If "valx" is not found, the file is useless. I would like to create new dataframes that start where "valx" is found. I have been trying for a while with no success. Any help on how to achieve this is greatly appreciated.
Unfortunately, the skiprows argument of pandas.read_csv selects rows by position, not by content, so it cannot search for a value. You might want to parse the file before creating the dataframe.
As an example:
import csv
data = None  # stays None if 'valx' is never found
with open(r"C:\csvfile.csv", "r", newline='') as f:
    lines = csv.reader(f)
    # any() stops at the first row containing 'valx'
    if any('valx' in row for row in lines):
        data = list(lines)  # the rows after it, read while the file is open
Using the Standard Library csv module, you can read the file and check whether valx is in it. If it is found, the rows that follow it are stored in the data variable.
From there you can use the data variable to create your dataframe.
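A variant of the same idea, as a sketch: itertools.dropwhile skips rows until one contains the marker and keeps that row too, which helps if the "valx" row is the header of the raw data (an assumption, since the question does not say). rows_from_marker is a hypothetical helper name.

```python
import csv
import io
from itertools import dropwhile


def rows_from_marker(lines, marker):
    """Return the CSV rows starting at the first row that has a cell equal to `marker`.

    Returns an empty list when the marker never appears."""
    reader = csv.reader(lines)
    return list(dropwhile(lambda row: marker not in row, reader))


text = "summary,1\ntotal,2\nvalx,valy\n10,20\n30,40\n"
rows = rows_from_marker(io.StringIO(text), "valx")
missing = rows_from_marker(io.StringIO("summary,1\n"), "valx")
```

If the marker row is indeed the header, pd.DataFrame(rows[1:], columns=rows[0]) would build the dataframe. An empty result signals a file without the marker.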

How would I transfer CSV "words" into Python as strings

So I am quite the beginner in Python, but what I'm trying to do is to download each CSV file for the NYSE. In an excel file I have every symbol. The Yahoo API allows you to download the CSV file by adding the symbol to the base url.
My first instinct was to use pandas, but pandas doesn't store strings.
So what I have
import urllib
strs = ["" for x in range(3297)]
#strs makes the blank string array for each symbol
#somehow I need to be able to iterate each symbol into the blank array spots
while y < 3297:
    strs[y] = "symbol of each company from csv"
    y = y+1
#loop for downloading each file from the link with strs[y].
while i < 3297:
    N = urllib.URLopener()
    N.retrieve('http://ichart.yahoo.com/table.csv?s='strs[y],'File')
    i = i+1
Perhaps the solution is simpler than what I am doing.
From what I can see in this question, you are unsure how to connect your list of stock symbols to the way you read the data in pandas. For example, 'http://ichart.yahoo.com/table.csv?s='strs[y] is not valid syntax.
Valid syntax for this is
pd.read_csv('http://ichart.yahoo.com/table.csv?s={}'.format(strs[y]))
It would be helpful if you could add a few sample lines from your csv file to the question. Guessing at your structure you would do something like:
import pandas as pd
symbol_df = pd.read_csv('path_to_csv_file')
for stock_symbol in symbol_df.symbol_column_name:
    df = pd.read_csv('http://ichart.yahoo.com/table.csv?s={}'.format(stock_symbol))
    # process your dataframe here
Assuming you take that Excel file with the symbols and export it as a CSV, you can use Python's built-in csv reader to parse it:
import csv
from urllib.request import urlopen  # Python 3; in Python 2 use urllib.urlopen
base_url = 'http://ichart.yahoo.com/table.csv?s={}'
with open('symbols.csv', newline='') as f:
    for row in csv.reader(f):
        symbol = row[0]
        data_csv = urlopen(base_url.format(symbol)).read()
        # save out to file, or parse with CSV library...
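The URL-building step can be tried without any network access. The sketch below assumes the symbols sit in the first column of the exported CSV and uses the endpoint from the question; symbol_urls is a hypothetical helper, and the sample symbols are only illustrations.

```python
import csv
import io
from urllib.parse import quote

BASE_URL = "http://ichart.yahoo.com/table.csv?s={}"  # endpoint taken from the question


def symbol_urls(lines):
    """Yield (symbol, url) pairs for the first column of each non-empty CSV row."""
    for row in csv.reader(lines):
        if row and row[0].strip():
            symbol = row[0].strip()
            # quote() escapes characters such as '^' that are not URL-safe.
            yield symbol, BASE_URL.format(quote(symbol))


symbols_csv = io.StringIO("AAPL\nBRK.B\n\n^GSPC\n")
pairs = list(symbol_urls(symbols_csv))
```

Each URL can then be handed to urlopen or pd.read_csv inside the loop, so no pre-sized array of 3297 blank strings is needed.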

Dynamically read csv files python pandas

I would like to iteratively read data from a set of csv files in a for loop.
The csv files are named (1.csv, 2.csv and so on)
The normal way to read the data will be
data = pd.read_csv('1.csv')
Please can someone suggest how to replace 1 by i when using a for loop.
I tried data = pd.read_csv(i+'.csv') and data = pd.read_csv(i'.csv') but they did not work.
Use either percent formatting
pd.read_csv('%d.csv' % i)
or format
pd.read_csv('{0}.csv'.format(i))
Why not build the filename as a separate string?
If i is indeed an integer, using:
filename = str(i) + '.csv'
data = pd.read_csv(filename)
or even:
data = pd.read_csv(str(i) + '.csv')
should be fine :)
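A short sketch of why the attempts in the question failed and what the loop could look like. It assumes the files are numbered from 1; numbered_csv_names is a hypothetical helper.

```python
def numbered_csv_names(count):
    """Return ['1.csv', '2.csv', ...] for the integers 1..count."""
    return ["{0}.csv".format(i) for i in range(1, count + 1)]


names = numbered_csv_names(3)

# Adding an int to a str raises TypeError, which is why i + '.csv' did not work.
try:
    1 + ".csv"
    concat_failed = False
except TypeError:
    concat_failed = True
```

Each name can then be passed to pd.read_csv inside a for loop over names.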
