Python convert dictionary to CSV - python

I am trying to convert a dictionary to CSV so that each value is readable under its respective key.
import csv
import json
from urllib.request import urlopen
x = 0
id_num = [848649491, 883560475, 431495539, 883481767, 851341658, 42842466, 173114302, 900616370, 1042383097, 859872672]
for bilangan in id_num:
    with urlopen("https://shopee.com.my/api/v2/item/get?itemid=" + str(bilangan) + "&shopid=1883827") as response:
        source = response.read()
        data = json.loads(source)
    # print(json.dumps(data, indent=2))
    data_list = {x: {'title': productName(), 'price': price(), 'description': description(), 'preorder': checkPreorder(),
                     'estimate delivery': estimateDelivery(), 'variation': variation(), 'category': categories(),
                     'brand': brand(), 'image': image_link()}}
    # print(data_list[x])
    x += 1
I store the data under the key x, so it loops from 0 to 1, 2 and so on. I have tried many things but still cannot find a way to make the output look like this, or close to it:
https://i.stack.imgur.com/WoOpe.jpg

Using DictWriter from the csv module
Demo:
import csv
data_list ={'x':{'title':'productName()','price':'price()','description':'description()','preorder':'checkPreorder()',
'estimate delivery':'estimateDelivery()','variation': 'variation()', 'category':'categories()',
'brand':'brand()','image':'image_link()'}}
filename = "output.csv"  # hypothetical output path
with open(filename, "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, fieldnames=data_list["x"].keys())
    writer.writeheader()
    writer.writerow(data_list["x"])
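The demo above writes a single row. As a rough sketch of how it might extend to the question's shape, where each counter value maps to one product dict, the following writes one row per inner dict. The sample values and the helper name dict_of_dicts_to_csv are made up for illustration, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical sample shaped like the question's data: {row_number: {field: value}}
data_list = {
    0: {"title": "Item A", "price": 10, "brand": "Acme"},
    1: {"title": "Item B", "price": 25, "brand": "Globex"},
}

def dict_of_dicts_to_csv(rows_by_key, out):
    """Write every inner dict as one CSV row, using the first row's keys as the header."""
    rows = list(rows_by_key.values())
    fieldnames = list(rows[0].keys())
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

buffer = io.StringIO()  # stands in for open("output.csv", "w", newline="")
dict_of_dicts_to_csv(data_list, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

This assumes every inner dict has the same keys; DictWriter raises ValueError by default if a row contains a key that is not in fieldnames.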

Maybe you just want to merge some cells, the way Excel does?
If so, I don't think that is possible in CSV, because the CSV format does not carry cell style information the way Excel does.
Some possible solutions:
Use openpyxl to generate an Excel file instead of a CSV; then you can merge cells with the "worksheet.merge_cells()" function.
Do not try to merge cells; just keep the title, price and other fields on every line. The data format would look like:
first line: {'title':'test_title', 'price': 22, 'image': 'image_link_1'}
second line: {'title':'test_title', 'price': 22, 'image': 'image_link_2'}
Do not try to merge cells, but set the title, price and other fields to a blank string on the repeated lines, so they will not show in your CSV file.
Use line breaks to control the format; that merges multiple lines with the same title into a single line.
Hope that helps.
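To illustrate the "blank string" option from the list above, here is a minimal sketch using only the csv module. The helper name blank_repeats and the sample rows are assumptions for the example, not part of the original code.

```python
import csv
import io

# Hypothetical rows: one product with two images
rows = [
    {"title": "test_title", "price": 22, "image": "image_link_1"},
    {"title": "test_title", "price": 22, "image": "image_link_2"},
]

def blank_repeats(rows, group_fields):
    """Blank the group fields on a row when they match the previous row's values."""
    result = []
    previous = None
    for row in rows:
        current = tuple(row[f] for f in group_fields)
        out = dict(row)
        if current == previous:
            for f in group_fields:
                out[f] = ""
        previous = current
        result.append(out)
    return result

buffer = io.StringIO()  # stands in for a real file opened with newline=""
writer = csv.DictWriter(buffer, fieldnames=["title", "price", "image"])
writer.writeheader()
writer.writerows(blank_repeats(rows, ["title", "price"]))
csv_text = buffer.getvalue()
print(csv_text)
```

Opened in a spreadsheet, the second line shows only the image link, which looks similar to a merged cell without needing any Excel-specific formatting.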

If I were you, I would do this a bit differently. I do not like that you are calling so many functions when this website offers a beautiful JSON response :) Moreover, I would use the pandas library so that I have total control over my data. I am not a CSV lover. This is a silly prototype:
import requests
import pandas as pd
# Create our dictionary with one list per field
data_list = {'title': [], 'price': [], 'description': [], 'preorder': [],
             'estimate delivery': [], 'variation': [], 'categories': [],
             'brand': [], 'image': []}
# API url
url = 'https://shopee.com.my/api/v2/item/get'
id_nums = [848649491, 883560475, 431495539, 883481767, 851341658,
           42842466, 173114302, 900616370, 1042383097, 859872672]
shop_id = 1883827
# Loop through id_nums and return the goodies
for id_num in id_nums:
    params = {
        'itemid': id_num,  # take values from id_nums
        'shopid': shop_id}
    r = requests.get(url, params=params)
    # Check if we got something :)
    if r.ok:
        data_json = r.json()
        # This web site returns a beautiful JSON we can slice :)
        product = data_json['item']
        # Let's populate our data_list with the items we got. We could simply
        # create one function to do this, but for now this will do
        data_list['title'].append(product['name'])
        data_list['price'].append(product['price'])
        data_list['description'].append(product['description'])
        data_list['preorder'].append(product['is_pre_order'])
        data_list['estimate delivery'].append(product['estimated_days'])
        data_list['variation'].append(product['tier_variations'])
        data_list['categories'].append([c['display_name'] for c in product['categories']])
        data_list['brand'].append(product['brand'])
        data_list['image'].append(product['image'])
    else:
        # Do something if we hit a connection error or similar,
        # maybe retry or ignore
        pass
# Put the dictionary into a DataFrame and order the columns :)
df = pd.DataFrame(data_list)
df = df[['title', 'price', 'description', 'preorder', 'estimate delivery',
         'variation', 'categories', 'brand', 'image']]
# df.to ...? There are dozens of different ways to store your data
# that are far better than CSV, e.g. MongoDB, HDF5 or compressed pickle
df.to_csv('my_data.csv', sep=';', encoding='utf-8', index=False)

Related

How to turn multiple rows of dictionaries from a file into a dataframe

I have a script that I use to fire orders from a csv file, to an exchange using a for loop.
data = pd.read_csv('orderparameters.csv')
df = pd.DataFrame(data)
for i in range(len(df)):
    order = Client.new_order(...
    ...)
    file = open('orderData.txt', 'a')
    original_stdout = sys.stdout
    with file as f:
        sys.stdout = f
        print(order)
    file.close()
    sys.stdout = original_stdout
I put the response from the exchange in a txt file like this...
I want to turn the multiple responses into 1 single dataframe. I would hope it would look something like...
(I did that manually).
I tried:
data = pd.read_csv('orderData.txt', header=None)
dfData = pd.DataFrame(data)
print(dfData)
but I got:
I have also tried
data = pd.read_csv('orderData.txt', header=None)
organised = data.apply(pd.Series)
print(organised)
but I got the same output.
I can print order['symbol'] within the loop etc.
I'm not certain whether I should be populating this dataframe within the loop, or by capturing and writing the response and processing it afterwards. Appreciate your advice.
It looks like you are getting JSON strings back. You could read the JSON objects into dictionaries and then create a dataframe from them. Perhaps try something like this (it no longer needs a file):
import json
import pandas as pd
data = pd.read_csv('orderparameters.csv')
df = pd.DataFrame(data)
response_data = []
for i in range(len(df)):
    order_json = Client.new_order(...
    ...)
    # json.loads parses a JSON string without executing it, unlike eval
    response_data.append(json.loads(order_json))
response_dataframe = pd.DataFrame(response_data)
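Since the question mentions that order['symbol'] already works inside the loop, the client may be returning dicts rather than JSON strings. A small sketch that copes with either case is below; fake_new_order is a hypothetical stand-in for Client.new_order, and the field names and values are invented for the example.

```python
import json

def fake_new_order(symbol, quantity):
    """Hypothetical stand-in for Client.new_order; returns a JSON string."""
    return json.dumps({"symbol": symbol, "origQty": quantity, "status": "NEW"})

def to_record(response):
    """Return a dict whether the client handed back a dict or a JSON string."""
    if isinstance(response, dict):
        return response
    return json.loads(response)

order_parameters = [("BTCUSDT", "0.01"), ("ETHUSDT", "0.5")]  # made-up rows
records = [to_record(fake_new_order(s, q)) for s, q in order_parameters]
print(records)
# With pandas installed, pd.DataFrame(records) would give one row per order.
```

Collecting the records in a list and building the dataframe once at the end avoids both the text file and the stdout redirection.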
If I understand your question correctly, you can simply do the following:
import pandas as pd
orders = pd.read_csv('orderparameters.csv')
responses = pd.DataFrame(Client.new_order(...) for _ in range(len(orders)))

API causing multiple JSON arrays in file

I am writing an API that needs to read a list of URLs from a file and run each URL through the code, which I have working. The only downside is that I now have a single JSON file with multiple JSON arrays in it and cannot get it to convert to a CSV. Any help greatly appreciated.
import requests
import json
import pandas as pd
import csv
from pathlib import Path
Path("/test2/test2").mkdir(parents=True, exist_ok=True)
links = pd.read_csv('file.csv')
test = []
for url in links:
    response = requests.get(url, headers={'CERT': 'cert'}).json()
    test.append(response[:])
    json2 = json.dumps(test)
    f = open('/test2/test2/data.json', 'a')
    f.write(json2)
    f.close()
df = pd.read_json('/test2/test2/data.json', lines=True)
df.to_csv('/test2/test2/data.csv')
df = pd.read_csv('/test2/test2/data.csv')
test = df['ID']
test2 = df['Code']
test3 = df['Name']
header = ['ID', 'Code', 'Name']
df.to_csv('/test2/test2/test.csv', columns=header)
I've tried adding code such as json3 = json2.replace('}][{', '}, {'), as well as trying
testList = []
with open('/test2/test2/data.json') as f:
    for jsonObj in f:
        testDict = json.loads(jsonObj)
        testList.append(testDict)
and have had no luck. Technically I can open the file in Notepad and change }][{ to }, { but I would like to do it programmatically, as this will be an automated API. Any help greatly appreciated.
EDIT:
Sample Output:
[{"ID": 5, "OldID": 1, "Code": 5, "Name": "Jeff"}][{"ID": 2, "OldID": 4, "Code": 0, "Name": "James"}]
That's a scrubbed-down version. The output goes onto one line, which works fine when running the code with just one URL, but with two URLs it causes the issue. For some reason I cannot get replace to correct the ']['. With one URL, having it as a list/array [] doesn't bother the conversion; it's just the start of the new list/array that does.
I never could stop this from combining JSON arrays in the file, but I was able to merge those arrays into one JSON array so that I could convert and parse the file. Here is the code that did its magic and completed my process.
import fileinput
with fileinput.FileInput('data.json', inplace=True, backup='.bak') as file:
    for line in file:
        # With inplace=True, print() writes back into data.json
        print(line.replace('][', ', '), end='')
When I added this between my first JSON write and my pandas call to read the file, it found the troublemaking characters and replaced them.
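An alternative that avoids string replacement is to decode the arrays one after another with json.JSONDecoder.raw_decode, which returns each parsed value together with the position where it ended. This is only a sketch; the function name parse_concatenated_json is made up, and the sample string is the scrubbed output from the question.

```python
import json

def parse_concatenated_json(text):
    """Decode back-to-back JSON documents and flatten any top-level lists."""
    decoder = json.JSONDecoder()
    records = []
    position = 0
    while position < len(text):
        # Skip whitespace between documents; raw_decode does not accept leading whitespace
        if text[position].isspace():
            position += 1
            continue
        value, position = decoder.raw_decode(text, position)
        if isinstance(value, list):
            records.extend(value)
        else:
            records.append(value)
    return records

sample = '[{"ID": 5, "OldID": 1, "Code": 5, "Name": "Jeff"}][{"ID": 2, "OldID": 4, "Code": 0, "Name": "James"}]'
records = parse_concatenated_json(sample)
print(records)
```

Unlike replacing '][' in the raw text, this does not get confused if the same two characters happen to appear inside a string value. The resulting list of dicts can be passed to pd.DataFrame or csv.DictWriter.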

webjobs json data to pandas dataframe

I am trying to read Azure WebJobs log data (JSON) using the REST API. I am able to get the data into a dataframe with columns, but I need the data in the latest_run column, which is in key:value format as shown below, to be available in tabular format.
Example:
latest_run
0,"{'id': '202011160826295419', 'name': '202011160826295419','job_name': 'failjob','}"
1,"{'id': '202011160826295419', 'name': '202011160826295419','job_name': 'passjob','}"
Now I want to display all the id and job_name values in a dataframe format. Any help please; thanks in advance.
Below is my code
data = response.json()
# print(data)
df = pd.read_json(json.dumps(data), orient='records')
# df = json.loads(json.dumps(data))
df = pd.DataFrame(df)
df = df["latest_run"]
df.to_csv('file1.csv')
print(df)
Data:
First things first, your JSON is not formatted properly (it isn't correct JSON). There is an extra opening quote at the end, and JSON should have double quotes all over. Pretending it's correct JSON for now, this is how you could load it:
# NOTE: You can also open a CSV file directly
import io
csv_content = """latest_run
0,"{'id': '202011160826295419', 'name': '202011160826295419','job_name': 'failjob'}"
1,"{'id': '202011160826295419', 'name': '202011160826295419','job_name': 'passjob'}"
"""
csv_file = io.StringIO(csv_content)
import csv
import json
import pandas
# Create a CSV reader (this can also be a file using 'open("myfile.csv", "r")' )
reader = csv.reader(csv_file, delimiter=",")
# Skip the first line (header)
next(reader)
# Load the rest of the data
data = [row for row in reader]
# Create a dataframe, loading the JSON as it goes in
df = pandas.DataFrame([json.loads(row[1].replace("'", "\"")) for row in data], index=[int(row[0]) for row in data])
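Because the cells look like Python dict literals rather than JSON, another option is ast.literal_eval, which parses single-quoted strings directly and does not break when a value contains an apostrophe. A minimal sketch, assuming the cleaned-up sample above with an empty header cell for the index column:

```python
import ast
import csv
import io

# Same shape as the sample above: an index column and a dict-like string column
csv_content = """,latest_run
0,"{'id': '202011160826295419', 'name': '202011160826295419', 'job_name': 'failjob'}"
1,"{'id': '202011160826295419', 'name': '202011160826295419', 'job_name': 'passjob'}"
"""

reader = csv.reader(io.StringIO(csv_content))
next(reader)  # skip the header row
# literal_eval parses Python literals (single-quoted strings included) without running code
records = [ast.literal_eval(cell) for _index, cell in reader]
job_names = [record["job_name"] for record in records]
print(job_names)
```

The records list has one dict per row, so it can be handed to pandas.DataFrame to get id, name and job_name as separate columns.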

Python Response API JSON to CSV table

Below you can see the code that I use to collect some data via IBM's API. However, I have some problems saving the output from Python to a CSV table.
These are the columns that I want (and their values):
emotion__document__emotion__anger emotion__document__emotion__joy
emotion__document__emotion__sadness emotion__document__emotion__fear
emotion__document__emotion__disgust sentiment__document__score
sentiment__document__label language entities__relevance
entities__text entities__type entities__count concepts__relevance
concepts__text concepts__dbpedia_resource usage__text_characters
usage__features usage__text_units retrieved_url
This is my code that I use to collect the data:
response = natural_language_understanding.analyze(
    url=url,
    features=[
        Features.Emotion(),
        Features.Sentiment(),
        Features.Concepts(limit=1),
        Features.Entities(limit=1)
    ]
)
data = json.load(response)
rows_list = []
cols = []
for ind, row in enumerate(data):
    if ind == 0:
        cols.append(["usage__{}".format(i) for i in row["usage"].keys()])
        cols.append(["emotion__document__emotion__{}".format(i) for i in row["emotion"]["document"]["emotion"].keys()])
        cols.append(["sentiment__document__{}".format(i) for i in row["sentiment"]["document"].keys()])
        cols.append(["concepts__{}".format(i) for i in row["concepts"].keys()])
        cols.append(["entities__{}".format(i) for i in row["entities"].keys()])
        cols.append(["retrieved_url"])
    d = OrderedDict()
    d.update(row["usage"])
    d.update(row["emotion"]["document"]["emotion"])
    d.update(row["sentiment"]["document"])
    d.update(row["concepts"])
    d.update(row["entities"])
    d.update({"retrieved_url": row["retrieved_url"]})
    rows_list.append(d)
df = pd.DataFrame(rows_list)
df.columns = [i for subitem in cols for i in subitem]
df.to_csv("featuresoutput.csv", index=False)
Changing
cols.append(["concepts__{}".format(i) for i in row["concepts"][0].keys()])
cols.append(["entities__{}".format(i) for i in row["entities"][0].keys()])
did not solve the problem.
If you get it from an API, the response will be in JSON format. You can write it to a CSV with:
import csv, json
response = ...  # the JSON response you get from the API
attributes = ['emotion__document__emotion__anger', 'emotion__document__emotion__joy', ...]  # the attributes you want
data = json.load(response)
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for attribute in attributes:
        writer.writerow(data[attribute][0])
Make sure data is a dict and not a string; Python 3.6 should return a dict. Print out a few rows to see how your required data is stored.
This line assigns a string to data:
data=(json.dumps(datas, indent=2))
So here you iterate over the characters of a string:
for ind,row in enumerate(data):
In this case row will be a string, and not a dictionary. So, for example, row["usage"] would give you such an error in this case.
Maybe you wanted to iterate over datas?
Update
The code has a few other issues, such as:
cols.append(["concepts__{}".format(i) for i in row["concepts"].keys()])
In this case, you would want row["concepts"][0].keys() to get the keys of the first element, because row["concepts"] is an array.
I'm not very familiar with pandas, but I would suggest looking at json_normalize, included in pandas, which can help flatten the JSON structure. One issue you might face is that concepts and entities contain arrays of documents. That means you would have to include the same document at least max(len(concepts), len(entities)) times.
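As a rough illustration of the flattening idea without pandas, the sketch below builds the double-underscore column names from the question by walking the nested response. The flatten helper and the sample values are made up for the example and are not real API output. It keeps only the first element of each list, which matches limit=1 in the question but would drop data for longer lists.

```python
def flatten(value, prefix="", sep="__"):
    """Flatten nested dicts into one dict; a list is represented by its first element."""
    flat = {}
    if isinstance(value, dict):
        for key, inner in value.items():
            name = f"{prefix}{sep}{key}" if prefix else key
            flat.update(flatten(inner, name, sep))
    elif isinstance(value, list):
        if value:
            flat.update(flatten(value[0], prefix, sep))
    else:
        flat[prefix] = value
    return flat

# Made-up response fragment; the values are illustrative, not real API output
response = {
    "emotion": {"document": {"emotion": {"anger": 0.1, "joy": 0.6}}},
    "sentiment": {"document": {"score": 0.4, "label": "positive"}},
    "entities": [{"text": "IBM", "type": "Company", "relevance": 0.9, "count": 2}],
    "language": "en",
}
row = flatten(response)
print(sorted(row))
```

Each flattened dict is one CSV row, so a list of them can go straight into csv.DictWriter or pd.DataFrame.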

Python - Selecting Specific Results to Place in Excel

EDIT: I'm making a lot of headway. I am now trying to parse out single columns from my JSON file, not the entire row. However, I get an error whenever I try to manipulate my DataFrame to get the results I want.
The error is:
line 52, in <module>
df = pd.DataFrame.from_dict(mlbJson['stats_sortable_player']['queryResults']['name_display_first_last'])
KeyError: 'name_display_first_last'
It only happens when I try to add another parameter; for instance, I took out ['row'] and added ['name_display_first_last'] to get the first and last name of each player. If I leave in ['row'] it compiles, but gives me all the data; I only want certain snippets.
Any help would be greatly appreciated! Thanks.
import requests
import pandas as pd
from bs4 import BeautifulSoup
# Scraping data from MLB.com
target = [MLB JSON][1]
mlbResponse = requests.get(target)
mlbJson = mlbResponse.json()
# Placing response in variable
# Collecting data and giving it names in pandas
data = {'Team': team, 'Line': line}
# places data table format, frames the data
table = pd.DataFrame(data)
# Creates excel file named Scrape
writer = pd.ExcelWriter('Scrape.xlsx')
# Moves table to excel taking in Parameters , 'Name of DOC and Sheet on that Doc'
table.to_excel(writer, 'Lines 1')
#stats = {'Name': name, 'Games': games, 'AtBats': ab, 'Runs': runs, 'Hits': hits, 'Doubles': doubles, 'Triples': triples, 'HR': hr, 'RBI': rbi, 'Walks': walks, 'SB': sb}
df = pd.DataFrame.from_dict(mlbJson['stats_sortable_player']['queryResults']['row'])
df.to_excel(writer, 'Batting 2')
# Saves File
writer.save()
It looks like the website loads the data asynchronously through another request to a different URL. The response you're getting has an empty <datagrid></datagrid> tag, and soup2.select_one("#datagrid").find_next("table") returns None.
You can use the developer tools in your browser, under the Network tab, to find the URL that actually loads the data. It looks like:
http://mlb.mlb.com/pubajax/wf/flow/stats.splayer?season=2016&sort_order=%27desc%27&sort_column=%27avg%27&stat_type=hitting&page_type=SortablePlayer&game_type=%27R%27&player_pool=ALL&season_type=ANY&sport_code=%27mlb%27&results=1000&recSP=1&recPP=50
You can modify your code to make a request to this URL, which returns json
mlbResponse = requests.get(url)  # url is the address shown above
mlbJson = mlbResponse.json()  # python 3, otherwise use json.loads(mlbResponse.content)
df = pd.DataFrame(mlbJson['stats_sortable_player']['queryResults']['row'])
The DataFrame has 54 columns, so I can't display it here, but you should be able to pick and rename the columns you need.
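The KeyError in the question comes from the nesting: 'row' is a list of per-player dicts, so a field such as name_display_first_last lives inside each element of that list, not directly under queryResults. A small sketch of picking and renaming fields is below; the payload is invented and only mirrors the nesting, and the 'hr' field name is an assumption.

```python
# Made-up payload with the same nesting as the question; names and numbers are illustrative
mlb_json = {
    "stats_sortable_player": {
        "queryResults": {
            "row": [
                {"name_display_first_last": "Player One", "hr": "30", "rbi": "95", "team": "AAA"},
                {"name_display_first_last": "Player Two", "hr": "12", "rbi": "40", "team": "BBB"},
            ]
        }
    }
}

# 'row' is a list of per-player dicts, so per-player fields live one level below it
rows = mlb_json["stats_sortable_player"]["queryResults"]["row"]

# Original field name -> column name wanted in the output
wanted = {"name_display_first_last": "Name", "hr": "HR"}

selected = [{new: row[old] for old, new in wanted.items()} for row in rows]
print(selected)
```

With the DataFrame from the answer, the equivalent would be df[list(wanted)].rename(columns=wanted).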
