I am working on a web scraping project and am trying to save the results into a CSV file:
data = {
    'address_area': area,
    'address_district': district,
    'website': website,
    'branch_rating': rating,
    'facilities_delivery': delivery,
    'facilities_banquet': banquet,
    'facilites_shisha': shisha,
    'faciliteis_minumum': minumium,
    'facilites_reservation': reservation,
    'facilities_free_wifi': wifi,
    'facilities_smoking_permited': smoking,
    'facilities_eat_out': eat_out,
    'facilities_private_parking': parking,
    'facilities_price_range': price_range,
    'facilities_kids_ares': kids_ares,
    'branch_no': branch_no}
mainlist.append(data)
with open('filetest.csv', 'w') as f:
    writer = csv.writer(f)
    for value in mainlist:
        writer.writerow([value])
I want to save the keys of the dictionary as columns and the values as a row.
(Keep in mind that each value in the dictionary refers to a variable that holds data extracted from a website.)
Here is a solution that can work for you. I added a second data item to your code (data2) and renamed the initial data element to data1.
mainlist = []
#initial data item
data1 = {
    'address_area': 'area1',
    'address_district': 'district1',
    'website': 'website1',
    'branch_rating': 'rating1',
    'facilities_delivery': 'delivery1',
    'facilities_banquet': 'banquet1',
    'facilites_shisha': 'shisha1',
    'faciliteis_minumum': 'minumium1',
    'facilites_reservation': 'reservation1',
    'facilities_free_wifi': 'wifi1',
    'facilities_smoking_permited': 'smoking1',
    'facilities_eat_out': 'eat_out1',
    'facilities_private_parking': 'parking1',
    'facilities_price_range': 'price_range1',
    'facilities_kids_ares': 'kids_aresa1',
    'branch_no': 'branch_no1'}
mainlist.append(data1)
#second data item
data2 = {
    'address_area': 'area2',
    'address_district': 'district2',
    'website': 'website2',
    'branch_rating': 'rating2',
    'facilities_delivery': 'delivery2',
    'facilities_banquet': 'banquet2',
    'facilites_shisha': 'shisha2',
    'faciliteis_minumum': 'minumium2',
    'facilites_reservation': 'reservation2',
    'facilities_free_wifi': 'wifi2',
    'facilities_smoking_permited': 'smoking2',
    'facilities_eat_out': 'eat_out2',
    'facilities_private_parking': 'parking2',
    'facilities_price_range': 'price_range2',
    'facilities_kids_ares': 'kids_aresa2',
    'branch_no': 'branch_no2'}
mainlist.append(data2)
filename = 'filetest.csv'
headers = ",".join(data1.keys())
with open(filename, 'w') as f:
    f.write(headers + '\n')
    for item in mainlist:
        f.write(','.join(str(item[key]) for key in item) + '\n')
print("All done. Check this file for results:",filename)
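If any scraped value might contain a comma or a quote, joining the fields with ',' by hand would produce a malformed row. Here is a minimal sketch of the same output using the standard library's csv.DictWriter, assuming every dictionary in mainlist has the same keys. The helper name rows_to_csv and the shortened sample values are made up for illustration:

```python
import csv
import io

def rows_to_csv(rows, stream):
    """Write a list of same-keyed dicts to stream: keys as header, values as rows."""
    if not rows:
        return
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical sample data; the real values would come from the scraper.
mainlist = [
    {'address_area': 'area1', 'website': 'website1', 'branch_no': 'branch_no1'},
    {'address_area': 'Main St, Block 2', 'website': 'website2', 'branch_no': 'branch_no2'},
]

buffer = io.StringIO()  # stands in for the output file
rows_to_csv(mainlist, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open('filetest.csv', 'w', newline='') in place of the StringIO buffer.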
The following situation:
movies.csv
movieId,title,genres
tags.csv
userId,movieId,tag,timestamp
I want to read the tags from tags.csv and append them to the "tags" list stored in each movie's dictionary. A tag should only be appended to the movie whose movieId matches, and the list should not contain duplicates.
Here is the code:
import csv
reader = csv.reader(open('movies1.csv'))
dict = {}
header = next(reader)
# Check file as empty
if header != None:
    for row in reader:
        key = row[0]
        value = {
            "id": row[0],
            "title": row[1][:-6],
            "year": row[1][-5:-1],
            "average_rating": 0,
            "ratings": [],
            "tags": [], #the list that should be filled with tags
            "genres": row[2].split('|')
        }
        dict[key] = value
tags={}
with open('tags1.csv', mode='r') as infile:
    reader = csv.reader(infile)
    header = next(reader)
    # Check file as empty
    if header != None:
        for col in reader:
            if col[1] == dict[key]['id']:
                dict[key]['tags'].append(col[2])
print(dict)
My result:
I get all the tags for the last movie. The rest of the tags are just empty.
What am I doing wrong?
So I made it work. I created a second dictionary for the tags and then looped over both of them:
for tag in tags:
    for movie in dict:
        if tags[tag]['movieId'] == dict[movie]['id']:
            if tags[tag]['tag'] not in dict[movie]['tags']:
                dict[movie]['tags'].append(tags[tag]['tag'])
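The nested loop compares every tag with every movie. Since the movies are already keyed by their id, each tag row can probably be attached with a direct lookup instead. This is a sketch that assumes tags.csv has the columns userId,movieId,tag,timestamp shown above; the in-memory sample rows and the name movies are illustrative, not taken from the real files:

```python
import csv
import io

# Illustrative stand-ins for movies.csv and tags.csv.
movies_csv = (
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Children\n"
    "2,Jumanji (1995),Adventure\n"
)
tags_csv = (
    "userId,movieId,tag,timestamp\n"
    "10,1,pixar,100\n"
    "11,1,pixar,101\n"
    "12,2,board game,102\n"
    "13,1,fun,103\n"
)

movies = {}
for row in csv.DictReader(io.StringIO(movies_csv)):
    movies[row["movieId"]] = {"id": row["movieId"], "title": row["title"], "tags": []}

for row in csv.DictReader(io.StringIO(tags_csv)):
    movie = movies.get(row["movieId"])  # direct lookup by movieId, no inner loop
    if movie is not None and row["tag"] not in movie["tags"]:
        movie["tags"].append(row["tag"])  # skip duplicates, keep first-seen order

print(movies["1"]["tags"])
```

This also avoids the original problem, where dict[key] always referred to the last movie read, because the key comes from the tag row itself.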
In my current code, it seems to only take into account one value for my Subject key when there should be more (you can only see Economics in my JSON tree and not Maths). I've tried for hours and I can't get it to work.
Here is my sample dataset - I have many more subjects in my full data set:
ID,Name,Date,Subject,Start,Finish
0,Ladybridge High School,01/11/2019,Maths,05:28,06:45
0,Ladybridge High School,02/11/2019,Maths,05:30,06:45
0,Ladybridge High School,01/11/2019,Economics,11:58,12:40
0,Ladybridge High School,02/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,01/11/2019,Maths,05:28,06:45
1,Loreto Sixth Form,02/11/2019,Maths,05:30,06:45
1,Loreto Sixth Form,01/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,02/11/2019,Economics,11:58,12:40
Here is my Python code:
timetable = {"Timetable": []}
with open("C:/Users/kspv914/Downloads/Personal/Project Dawn/Timetable Sample.csv") as f:
    csv_data = [{k: v for k, v in row.items()} for row in csv.DictReader(f, skipinitialspace=True)]
name_array = []
for name in [row["Name"] for row in csv_data]:
    name_array.append(name)
name_set = set(name_array)
for name in name_set:
    timetable["Timetable"].append({"Name": name, "Date": {}})
for row in csv_data:
    for entry in timetable["Timetable"]:
        if entry["Name"] == row["Name"]:
            entry["Date"][row["Date"]] = {}
            entry["Date"][row["Date"]][row["Subject"]] = {
                "Start": row["Start"],
                "Finish": row["Finish"]
            }
Here is my JSON tree:
You are resetting the date dict to empty on every row and then adding a subject, so only the last subject for each date survives.
Do something like this:
import csv
timetable = {"Timetable": []}
with open("a.csv") as f:
    csv_data = [{k: v for k, v in row.items()} for row in csv.DictReader(f, skipinitialspace=True)]
name_array = []
for name in [row["Name"] for row in csv_data]:
    name_array.append(name)
name_set = set(name_array)
for name in name_set:
    timetable["Timetable"].append({"Name": name, "Date": {}})
for row in csv_data:
    for entry in timetable["Timetable"]:
        if entry["Name"] == row["Name"]:
            # Create the date dict only once, so earlier subjects are kept
            if row["Date"] not in entry["Date"]:
                entry["Date"][row["Date"]] = {}
            entry["Date"][row["Date"]][row["Subject"]] = {
                "Start": row["Start"],
                "Finish": row["Finish"]
            }
I've just added an if condition before assigning {} to entry["Date"][row["Date"]], so an existing date entry is no longer reset.
With that change, each date keeps all of its subjects.
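The same "create only if missing" check can also be written with dict.setdefault, which returns the existing inner dict or stores an empty one the first time. This is a sketch that uses a few rows of the sample data inline instead of the CSV file; the names sample and by_name are illustrative:

```python
import csv
import io

sample = """ID,Name,Date,Subject,Start,Finish
0,Ladybridge High School,01/11/2019,Maths,05:28,06:45
0,Ladybridge High School,01/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,01/11/2019,Maths,05:28,06:45
"""

by_name = {}
for row in csv.DictReader(io.StringIO(sample)):
    dates = by_name.setdefault(row["Name"], {})    # one dict of dates per school
    subjects = dates.setdefault(row["Date"], {})   # created once per date, never reset
    subjects[row["Subject"]] = {"Start": row["Start"], "Finish": row["Finish"]}

timetable = {"Timetable": [{"Name": n, "Date": d} for n, d in by_name.items()]}
print(timetable)
```

Grouping by name in a dict first also removes the need for the separate name_set pass and the inner loop over timetable["Timetable"].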
You are overwriting your dict entries with entry["Date"][row["Date"]][row["Subject"]] =. The first time "math" is met, the entry is created. The second time it is overwritten.
Your expected result should be a list, not a dict. Every entry should be appended to the list with timetable_list.append().
Here is simple code that converts the whole CSV file into JSON without losing data:
import csv
import json
data = []
with open("ex1.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        data.append(row)
print(json.dumps({"Timetable": data}, indent=4))
The following code is giving me the error:
Traceback (most recent call last):
  File "AMZGetPendingOrders.py", line 66, in <module>
    item_list.append(item['SellerSKU'])
TypeError: string indices must be integers
The code:
from mws import mws
import time
import json
import xmltodict
access_key = 'xx' #replace with your access key
seller_id = 'yy' #replace with your seller id
secret_key = 'zz' #replace with your secret key
marketplace_usa = '00'
orders_api = mws.Orders(access_key, secret_key, seller_id)
orders = orders_api.list_orders(marketplaceids=[marketplace_usa], orderstatus=('Pending'), fulfillment_channels=('MFN'), created_after='2018-07-01')
#save as XML file
filename = 'c:order.xml'
with open(filename, 'w') as f:
    f.write(orders.original)
#Convert XML to JSON
dictString = json.dumps(xmltodict.parse(orders.original))
#Write new JSON to file
with open("output.json", 'w') as f:
    f.write(dictString)
#Read JSON and parse our order number
with open('output.json', 'r') as jsonfile:
    data = json.load(jsonfile)
#initialize empty list
id_list = []
for order in data['ListOrdersResponse']['ListOrdersResult']['Orders']['Order']:
    id_list.append(order['AmazonOrderId'])
#This "gets" the orderitem info - this code actually is similar to the initial Amazon "get" though it has fewer switches
orders_api = mws.Orders(access_key, secret_key, seller_id)
#opens and empties the orderitem.xml file
open('c:orderitem.xml', 'w').close()
#iterated through the list of AmazonOrderIds and writes the item information to orderitem.xml
for x in id_list:
    orders = orders_api.list_order_items(amazon_order_id = x)
    filename = 'c:orderitem.xml'
    with open(filename, 'a') as f:
        f.write(orders.original)
#Convert XML to JSON
amz_items_pending = json.dumps(xmltodict.parse(orders.original))
#Write new JSON to file
with open("pending.json", 'w') as f:
    f.write(amz_items_pending)
#read JSON and parse item_no and qty
with open('pending.json', 'r') as jsonfile1:
    data1 = json.load(jsonfile1)
#initialize empty list
item_list = []
for item in data1['ListOrderItemsResponse']['ListOrderItemsResult']['OrderItems']['OrderItem']:
    item_list.append(item['SellerSKU'])
#print(item)
#print(id_list)
#print(data1)
#print(item_list)
time.sleep(10)
I don't understand why Python thinks this is a list and not a dictionary. When I print id_list it looks like a dictionary (curly braces, single quotes, colons, etc)
print(data1) shows my dictionary
{
    'ListOrderItemsResponse':{
        '@xmlns':'https://mws.amazonservices.com/Orders/2013-09-01',
        'ListOrderItemsResult':{
            'OrderItems':{
                'OrderItem':{
                    'QuantityOrdered':'1',
                    'Title':'Delta Rothko Rolling Bicycle Stand',
                    'ConditionId':'New',
                    'IsGift':'false',
                    'ASIN':'B00XXXXTIK',
                    'SellerSKU':'9934638',
                    'OrderItemId':'49624373726506',
                    'ProductInfo':{
                        'NumberOfItems':'1'
                    },
                    'QuantityShipped':'0',
                    'ConditionSubtypeId':'New'
                }
            },
            'AmazonOrderId':'112-9XXXXXX-XXXXXXX'
        },
        'ResponseMetadata':{
            'RequestId':'8XXXXX8-0866-44a4-96f5-XXXXXXXXXXXX'
        }
    }
}
Any ideas?
That is because 'OrderItem' holds a single dictionary here, so your for loop iterates over the keys of that dictionary:
{'QuantityOrdered': '1', 'Title': 'Delta Rothko Rolling Bicycle Stand', 'ConditionId': 'New', 'IsGift': 'false', 'ASIN': 'B00XXXXTIK', 'SellerSKU': '9934638', 'OrderItemId': '49624373726506', 'ProductInfo': {'NumberOfItems': '1'}, 'QuantityShipped': '0', 'ConditionSubtypeId': 'New'}
The first value of item is therefore the string 'QuantityOrdered', and item['SellerSKU'] tries to index that string as if it were a dictionary.
You can just do:
item_list.append(data1['ListOrderItemsResponse']['ListOrderItemsResult']['OrderItems']['OrderItem']['SellerSKU'])
and avoid the for loop over the dictionary.
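To see why the loop fails, note that iterating over a dict yields its keys, which are plain strings. Here is a small reproduction with a trimmed, made-up copy of the OrderItem dictionary:

```python
# Trimmed, illustrative copy of the single OrderItem dictionary.
order_item = {'QuantityOrdered': '1', 'SellerSKU': '9934638'}

keys_seen = [item for item in order_item]  # iterating a dict yields its keys
print(keys_seen)

try:
    keys_seen[0]['SellerSKU']  # indexing the string 'QuantityOrdered' with a string
    error_raised = False
except TypeError:
    error_raised = True

sku = order_item['SellerSKU']  # direct key access on the dict works
print(error_raised, sku)
```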
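I guess you are trying to iterate over the order items and collect their SellerSKU values. xmltodict returns a single dictionary when there is one OrderItem and a list when there are several, so wrap the single case in a list before looping:
items = data1['ListOrderItemsResponse']['ListOrderItemsResult']['OrderItems']['OrderItem']
for item in (items if isinstance(items, list) else [items]):
    item_list.append(item['SellerSKU'])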
I'm using the Poloniex trader API to get real-time market ticker info. It works perfectly in the console. Whenever I request the market ticker, I get this in return: {'last': '0.07269671', 'high24hr': '0.07379970', 'quoteVolume': '71582.56540639', 'isFrozen': '0', 'lowestAsk': '0.07277290', 'percentChange': '-0.00551274', 'id': 148, 'low24hr': '0.07124645', 'highestBid': '0.07269671', 'baseVolume': '5172.41552885'}
The problem is that the CSV file only stores the item names, such as low24hr, lowestAsk, highestBid, etc., and not their values, like low24hr: 0.07124645.
polo = Poloniex()
ticker_data = (polo('returnTicker')['BTC_ETH'])
out = csv.writer(open("myfile.csv","w"), delimiter=',',quoting=csv.QUOTE_ALL,)
out.writerow(ticker_data)
print(ticker_data)
Here is what my csv file looks like-
Your problem is that out.writerow(ticker_data) takes only the keys of the dictionary and writes them to the file. Try to use a csv.DictWriter:
with open('myfile.csv', 'w', newline='') as csv_file:
    # Pass the keys of the `ticker_data` as the fieldnames.
    writer = csv.DictWriter(csv_file, fieldnames=ticker_data, quoting=csv.QUOTE_ALL)
    # Write the fieldnames.
    writer.writeheader()
    # Write the values.
    writer.writerow(ticker_data)
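The difference is easy to check in memory: csv.writer.writerow iterates over its argument, and iterating a dict yields only the keys. This is a sketch with a shortened, illustrative ticker dictionary and io.StringIO standing in for the file:

```python
import csv
import io

ticker_data = {'last': '0.07269671', 'id': 148}  # shortened sample

# Plain writer: the dict is treated as a sequence of its keys.
plain = io.StringIO()
csv.writer(plain).writerow(ticker_data)
keys_only = plain.getvalue()

# DictWriter: header from the keys, then one row with the values.
table = io.StringIO()
writer = csv.DictWriter(table, fieldnames=list(ticker_data))
writer.writeheader()
writer.writerow(ticker_data)
header_and_values = table.getvalue()

print(keys_only)
print(header_and_values)
```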
with open('my_file.csv', 'w') as f:
    for key, value in ticker_data.items():
        f.write('{0},{1}\n'.format(key, value))
From here.
I have a file that takes information and writes it to a csv file. I am having trouble getting it to format the way I want. It is looping 10 times and the information is there I can confirm. I am including the code to show you the exact setup of the csv writing part.
Here is my code:
outfile = open('Accounts_Details.csv', 'a')
for i in range(0, 11):
    #Calling all the above functions
    soc_auth_requests()
    create_account()
    config_admin_create()
    account_user_create()
    account_activate()
    account_config_DNS_create()
    #Creating the dictionary for the CSV file with the data fields made and modified from before
    #It is necessary this be done after the above method calls to ensure the data field values are correct
    data = {
        'Account_Name': acc_name,
        'Account_Id': acc_id,
        'User_Email': user_email,
        'User_id': user_id
    }
    #Creating a csv file and writing the dictionary titled "data" to it
    for key, value in sorted(data.items()):
        outfile.write('\t' + str(value))
    outfile.write('\n')
So I have four bits of data in the dict and I want the format to be laid out in the csv file so that the four bits of info are put on one line and when it loops through the for loop it moves to the next line and does the same there.
Ex.
name1, id1, email1, uId1
name2, id2, email2, uId2
name3, id3, email3, uId3
I assume it has to do with how I open the file, but I am not sure and can't figure it out.
Thanks for the help!
Here is the current output I am getting. I want all the 1's to be on one line and then move down.
name1
id1
email1
uID1
name2
id2
email2
uID2
Try removing the last write statement (the newline) from the loop:
for key, value in sorted(data.items()):
    outfile.write('\t' + str(value))
    # outfile.write('\n')
# modified for
outfile.close()
Please let me know how it went and whether this worked!
:)
I don't think your file opening arguments are the issue; I wasn't able to replicate it. However, you could probably streamline the code and remove any possibility of an issue by joining the values into a single line per record:
for i in range(12):
    data = {
        'Account_Name': 'AccountName',
        'Account_Id': '#12345',
        'User_Email': 'a#b',
        'User_id': 'LOGGER'
    }
    with open("test.txt", "a") as f:
        f.write('{}\n'.format(', '.join(data.values())))
Just use the csv module which will automagically format the line for you:
import csv
outfile = open('Accounts_Details.csv', 'a', newline='')
writer = csv.DictWriter(outfile, [ 'Account_Name', 'Account_Id', 'User_Email', 'User_id' ])
# optionally, if you want a header line:
writer.writeheader()
for i in range(0, 11):
    ...
    data = {
        'Account_Name': acc_name,
        'Account_Id': acc_id,
        'User_Email': user_email,
        'User_id': user_id
    }
    writer.writerow(data)
outfile.close()
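Because the file is opened in append mode, calling writeheader() on every run would repeat the header line. Here is a sketch that writes the header only when the file is new or empty, and opens the file with newline='' as the csv documentation recommends. The helper name append_rows, the temporary path, and the sample values are illustrative and not part of the original script:

```python
import csv
import os
import tempfile

FIELDS = ['Account_Name', 'Account_Id', 'User_Email', 'User_id']

def append_rows(path, rows):
    """Append dict rows to path, writing the header only if the file is empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=FIELDS)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)

# Hypothetical location and values, standing in for the real account data.
path = os.path.join(tempfile.mkdtemp(), 'Accounts_Details.csv')
for i in range(1, 3):
    append_rows(path, [{
        'Account_Name': f'name{i}',
        'Account_Id': f'id{i}',
        'User_Email': f'email{i}',
        'User_id': f'uId{i}',
    }])

with open(path, newline='') as f:
    result = list(csv.reader(f))
print(result)
```

Each call adds one line per record, so the four values of an account end up on the same row.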