How to read multiple .csv files and analyse the data?


The situation is as follows. I have two files:
movies.csv
movieId,title,genres
tags.csv
userId,movieId,tag,timestamp
I want to take the tags from tags.csv and append them to each movie's dictionary, which contains a list where all of that movie's tags should be stored. A tag should only be appended when its movieId matches the movie's id, and the list should not contain duplicates.
Here is the code:
import csv
reader = csv.reader(open('movies1.csv'))
dict = {}
header = next(reader)
# Check file as empty
if header != None:
    for row in reader:
        key = row[0]
        value = {
            "id": row[0],
            "title": row[1][:-6],
            "year": row[1][-5:-1],
            "average_rating": 0,
            "ratings": [],
            "tags": [], #the list that should be filled with tags
            "genres": row[2].split('|')
        }
        dict[key] = value
tags={}
with open('tags1.csv', mode='r') as infile:
    reader = csv.reader(infile)
    header = next(reader)
    # Check file as empty
    if header != None:
        for col in reader:
            if col[1] == dict[key]['id']:
                dict[key]['tags'].append(col[2])
print(dict)
My result:
I get all the tags for the last movie; the tag lists of all the other movies are empty.
What am I doing wrong?

So I made it work: I created a second dictionary and looped over both of them.
for tag in tags:
    for movie in dict:
        if tags[tag]['movieId'] == dict[movie]['id']:
            if tags[tag]['tag'] not in dict[movie]['tags']:
                dict[movie]['tags'].append(tags[tag]['tag'])
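In the question's code, the tag loop compares each tag only against dict[key], and after the first loop has finished, key still holds the id of the last movie read, which is why only that movie receives tags. The nested loop above works but compares every tag with every movie. A minimal sketch of a single-pass alternative, assuming the column layouts shown in the question: it looks each tag's movie up directly by movieId and uses a set to skip duplicates. The function name attach_tags and the in-memory sample data are hypothetical, for illustration only.

```python
import csv
import io


def attach_tags(movies_file, tags_file):
    """Build {movieId: movie_dict} and attach each movie's unique tags."""
    movies = {}
    for row in csv.DictReader(movies_file):
        movies[row["movieId"]] = {
            "id": row["movieId"],
            "title": row["title"],
            "genres": row["genres"].split("|"),
            "tags": [],
        }

    # Tags already added per movie, for O(1) duplicate checks
    seen = {movie_id: set() for movie_id in movies}
    for row in csv.DictReader(tags_file):
        movie_id = row["movieId"]
        # Direct lookup by id instead of scanning every movie
        if movie_id in movies and row["tag"] not in seen[movie_id]:
            seen[movie_id].add(row["tag"])
            movies[movie_id]["tags"].append(row["tag"])  # keeps first-seen order
    return movies


# Hypothetical sample data standing in for movies.csv and tags.csv
movies_csv = io.StringIO(
    "movieId,title,genres\n"
    "1,Movie A (1995),Adventure|Animation\n"
    "2,Movie B (1995),Adventure|Fantasy\n"
)
tags_csv = io.StringIO(
    "userId,movieId,tag,timestamp\n"
    "15,1,pixar,1139045764\n"
    "20,1,pixar,1139045800\n"
    "15,2,game,1141391765\n"
)
result = attach_tags(movies_csv, tags_csv)
print(result["1"]["tags"], result["2"]["tags"])  # ['pixar'] ['game']
```

With real files, pass open file objects (for example open('tags.csv', newline='')) instead of the StringIO samples.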

Related

Writing Dictionaries into a CSV

I am working on a web scraping project, and I am trying to save the results into a CSV file:
data = {
    'address_area': area,
    'address_district': district,
    'website': website,
    'branch_rating': rating,
    'facilities_delivery': delivery,
    'facilities_banquet': banquet,
    'facilites_shisha': shisha,
    'faciliteis_minumum': minumium,
    'facilites_reservation': reservation,
    'facilities_free_wifi': wifi,
    'facilities_smoking_permited': smoking,
    'facilities_eat_out': eat_out,
    'facilities_private_parking': parking,
    'facilities_price_range': price_range,
    'facilities_kids_ares': kids_ares,
    'branch_no': branch_no}
mainlist.append(data)
with open('filetest.csv', 'w') as f:
    writer = csv.writer(f)
    for value in mainlist:
        writer.writerow([value])
I want to save the dictionary keys as the column headers and the values as rows.
(Keep in mind that each value in the dictionary is a variable holding data extracted from a website.)

Here is a solution that can work for you. I added a second data item to your code (data2) and renamed the initial data element to data1.
mainlist = []
#initial data item
data1 = {
    'address_area': 'area1',
    'address_district': 'district1',
    'website': 'website1',
    'branch_rating': 'rating1',
    'facilities_delivery': 'delivery1',
    'facilities_banquet': 'banquet1',
    'facilites_shisha': 'shisha1',
    'faciliteis_minumum': 'minumium1',
    'facilites_reservation': 'reservation1',
    'facilities_free_wifi': 'wifi1',
    'facilities_smoking_permited': 'smoking1',
    'facilities_eat_out': 'eat_out1',
    'facilities_private_parking': 'parking1',
    'facilities_price_range': 'price_range1',
    'facilities_kids_ares': 'kids_aresa1',
    'branch_no': 'branch_no1'}
mainlist.append(data1)
#second data item
data2 = {
    'address_area': 'area2',
    'address_district': 'district2',
    'website': 'website2',
    'branch_rating': 'rating2',
    'facilities_delivery': 'delivery2',
    'facilities_banquet': 'banquet2',
    'facilites_shisha': 'shisha2',
    'faciliteis_minumum': 'minumium2',
    'facilites_reservation': 'reservation2',
    'facilities_free_wifi': 'wifi2',
    'facilities_smoking_permited': 'smoking2',
    'facilities_eat_out': 'eat_out2',
    'facilities_private_parking': 'parking2',
    'facilities_price_range': 'price_range2',
    'facilities_kids_ares': 'kids_aresa2',
    'branch_no': 'branch_no2'}
mainlist.append(data2)
filename = 'filetest.csv'
headers = ",".join(data1.keys())
with open(filename, 'w') as f:
    f.write(headers + '\n')
    for item in mainlist:
        f.write(','.join(str(item[key]) for key in item) + '\n')
print("All done. Check this file for results:",filename)
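Joining values with ',' by hand breaks as soon as a scraped value itself contains a comma, a quote, or a newline. A sketch using the standard library's csv.DictWriter, which writes the dictionary keys as the header row and quotes values where needed; the helper name write_rows and the two sample rows are placeholders, not real scraped data.

```python
import csv


def write_rows(filename, rows):
    """Write a list of dicts to CSV, using the first dict's keys as columns."""
    fieldnames = list(rows[0].keys())
    # newline='' lets the csv module control line endings (avoids blank lines on Windows)
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()    # keys become the header row
        writer.writerows(rows)  # one row per dict; values containing commas get quoted


# Placeholder rows standing in for the scraped mainlist
mainlist = [
    {"address_area": "area1", "website": "site1", "branch_no": "1"},
    {"address_area": "area2, north", "website": "site2", "branch_no": "2"},
]
write_rows("filetest.csv", mainlist)
```

DictWriter raises a ValueError if a dict contains a key that is not in fieldnames, which catches rows whose keys drift from the first one.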

How would I adjust my Python code for a nested .JSON tree (Python dictionary) to include multiple keys?

In my current code, only one value seems to be kept for my Subject key when there should be more: you can only see Economics in my JSON tree, not Maths. I've tried for hours and I can't get it to work.
Here is my sample dataset (I have many more subjects in my full data set):
ID,Name,Date,Subject,Start,Finish
0,Ladybridge High School,01/11/2019,Maths,05:28,06:45
0,Ladybridge High School,02/11/2019,Maths,05:30,06:45
0,Ladybridge High School,01/11/2019,Economics,11:58,12:40
0,Ladybridge High School,02/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,01/11/2019,Maths,05:28,06:45
1,Loreto Sixth Form,02/11/2019,Maths,05:30,06:45
1,Loreto Sixth Form,01/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,02/11/2019,Economics,11:58,12:40
Here is my Python code:
import csv

timetable = {"Timetable": []}
with open("C:/Users/kspv914/Downloads/Personal/Project Dawn/Timetable Sample.csv") as f:
    csv_data = [{k: v for k, v in row.items()} for row in csv.DictReader(f, skipinitialspace=True)]
name_array = []
for name in [row["Name"] for row in csv_data]:
    name_array.append(name)
name_set = set(name_array)
for name in name_set:
    timetable["Timetable"].append({"Name": name, "Date": {}})
for row in csv_data:
    for entry in timetable["Timetable"]:
        if entry["Name"] == row["Name"]:
            entry["Date"][row["Date"]] = {}
            entry["Date"][row["Date"]][row["Subject"]] = {
                "Start": row["Start"],
                "Finish": row["Finish"]
            }
Here is my JSON tree:

You're resetting the date dict to empty for every row and then adding a subject, so each new subject for a date replaces the previous one.
Do something like this:
import csv

timetable = {"Timetable": []}
with open("a.csv") as f:
    csv_data = [{k: v for k, v in row.items()} for row in csv.DictReader(f, skipinitialspace=True)]
name_array = []
for name in [row["Name"] for row in csv_data]:
    name_array.append(name)
name_set = set(name_array)
for name in name_set:
    timetable["Timetable"].append({"Name": name, "Date": {}})
for row in csv_data:
    for entry in timetable["Timetable"]:
        if entry["Name"] == row["Name"]:
            # Only create the date dict the first time this date is seen
            if row["Date"] not in entry["Date"]:
                entry["Date"][row["Date"]] = {}
            entry["Date"][row["Date"]][row["Subject"]] = {
                "Start": row["Start"],
                "Finish": row["Finish"]
            }
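I've just added an if condition before assigning {} to entry["Date"][row["Date"]], so the dictionary for a date is created only once and later subjects are added to it instead of replacing it.
With this change, each date keeps all of its subjects (for example, both Maths and Economics).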
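The same fix can be written more compactly with dict.setdefault, which returns the existing inner dictionary or inserts an empty one the first time a name or date is seen. This sketch groups rows by name in a plain dict and only then builds the Timetable list, which also avoids scanning every entry for each row; it reads a few sample rows from the question out of an in-memory string so that it is self-contained.

```python
import csv
import io
import json

# A few rows from the question's sample dataset
sample = """ID,Name,Date,Subject,Start,Finish
0,Ladybridge High School,01/11/2019,Maths,05:28,06:45
0,Ladybridge High School,01/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,01/11/2019,Maths,05:28,06:45
"""

by_name = {}
for row in csv.DictReader(io.StringIO(sample), skipinitialspace=True):
    # setdefault returns the existing dict, or inserts {} and returns it
    dates = by_name.setdefault(row["Name"], {})
    subjects = dates.setdefault(row["Date"], {})
    subjects[row["Subject"]] = {"Start": row["Start"], "Finish": row["Finish"]}

timetable = {"Timetable": [{"Name": name, "Date": dates} for name, dates in by_name.items()]}
print(json.dumps(timetable, indent=4))
```

Because dicts preserve insertion order (Python 3.7+), the schools appear in the order they first occur in the file, unlike the set-based version, whose order is arbitrary.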

You are overwriting your dict entries with entry["Date"][row["Date"]][row["Subject"]] =. The first time "Maths" is met, the entry is created; the second time, it is overwritten.
Your expected result should be a list, not a dict. Every entry should be appended to the list with timetable_list.append().
Here is a simple piece of code that converts the whole CSV file into JSON without losing data:
import csv
import json
data = []
with open("ex1.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        data.append(row)
print(json.dumps({"Timetable": data}, indent=4))

Not getting expected output in python when converting a csv to json

I have an Excel file in which data is saved in CSV format, as shown below, under column A. (The CSV file is generated by LabVIEW code that I wrote to generate data.) I have also attached an image of the CSV file for reference at the end of my question.
RPM,Load Current,Battery Output,Power Capacity
1200,30,12,37
1600,88,18,55
I want to create a JSON file in this format:
{
    "power_capacity_data" :
    {
        "rpm" : ["1200","1600"],
        "load_curr" : ["30","88"],
        "batt_output" : ["12","18"],
        "power_cap" : ["37","55"]
    }
}
This is my code:
import csv
import json
def main():
    #created a dictionary so that i can append data to it afterwards
    power_data = {"rpm":[],"load_curr":[],"batt_output":[],"power_cap":[]}
    with open('power1.lvm') as f:
        reader = csv.reader(f)
        #trying to append the data of column "RPM" to dictionary
        rowcount = 0
        for row in reader:
            if rowcount == 0:
                #trying to skip the first row
                rowcount = rowcount + 1
            else:
                power_data['rpm'].append(row[0])
                print(row)
    json_report = {}
    json_report['pwr_capacity_data'] = power_data
    with open('LVMJSON', "w") as f1:
        f1.write(json.dumps(json_report, sort_keys=False, indent=4, separators=(',', ': '),encoding="utf-8",ensure_ascii=False))
        f1.close()
if __name__ == "__main__":
    main()
The output JSON file that I am getting is this (please ignore the print(row) statement in my code):
{
    "pwr_capacity_data":
    {
        "load_curr": [],
        "rpm": [
            "1200,30,12.62,37.88",
            "1600,88,18.62,55.88"
        ],
        "batt_output": [],
        "power_cap": []
    }
}
The whole row is getting saved in the list, but I only want the values under the RPM column to be saved. Can someone help me with what I may be doing wrong? Thanks in advance. I have attached an image of the CSV file just in case it helps.

You could use Python's defaultdict to make it a bit easier, along with a dictionary that maps each of your header values to its JSON key.
from collections import defaultdict
import csv
import json
power_data = defaultdict(list)
header_mappings = {
    'RPM' : 'rpm',
    'Load Current' : 'load_curr',
    'Battery Output' : 'batt_output',
    'Power Capacity' : 'power_cap'}
with open('power1.lvm', newline='') as f_input:
    csv_input = csv.DictReader(f_input)
    for row in csv_input:
        for key, value in row.items():
            power_data[header_mappings[key]].append(value)
with open('LVMJSON.json', 'w') as f_output:
    json.dump({'power_capacity_data' : power_data}, f_output, indent=2)
Giving you an output JSON file looking like:
{
  "power_capacity_data": {
    "batt_output": [
      "12",
      "18"
    ],
    "power_cap": [
      "37",
      "55"
    ],
    "load_curr": [
      "30",
      "88"
    ],
    "rpm": [
      "1200",
      "1600"
    ]
  }
}
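Another option, assuming the file is plain comma-separated text with a single header row, is to read all rows with csv.reader and transpose them with zip(*rows), so that each column becomes one tuple whose first element is the header. The header-to-key mapping is the same as in the answer above; the inline sample string stands in for power1.lvm.

```python
import csv
import io
import json

header_mappings = {
    "RPM": "rpm",
    "Load Current": "load_curr",
    "Battery Output": "batt_output",
    "Power Capacity": "power_cap",
}

# Stand-in for the contents of power1.lvm
sample = "RPM,Load Current,Battery Output,Power Capacity\n1200,30,12,37\n1600,88,18,55\n"

rows = list(csv.reader(io.StringIO(sample)))
# zip(*rows) turns rows into columns: ('RPM', '1200', '1600'), ...
columns = zip(*rows)
power_data = {header_mappings[col[0]]: list(col[1:]) for col in columns}

print(json.dumps({"power_capacity_data": power_data}, indent=2))
```

Note that zip stops at the shortest row, so this assumes every row has the same number of fields; a short row would silently truncate every column.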

Python : Normalize Json response (array)

I am new to JSON and Python, and I am trying to achieve the following.
I need to parse the JSON below:
{
    "id": "12345abc",
    "codes": [
        "BSVN1FKW3JKKNNMN",
        "HJYYUKJJL999OJR",
        "DFTTHJJJJ0099JUU",
        "FGUUKHKJHJGJJYGJ"
    ],
    "ctr": {
        "source": "xyz",
        "user_id": "1234"
    }
}
Expected output, normalized on the "codes" value:
ID~CODES~USER_ID
12345abc~BSVN1FKW3JKKNNMN~1234
12345abc~HJYYUKJJL999OJR~1234
12345abc~DFTTHJJJJ0099JUU~1234
12345abc~FGUUKHKJHJGJJYGJ~1234
I started with the code below, but I need help getting to my desired output.
The "codes" array can hold any number of values, separated by commas.
The code below throws the error "TypeError: string indices must be integers".
#!/usr/bin/python
import os
import json
import csv
f = open('rspns.csv','w')
writer = csv.writer(f,delimiter = '~')
headers = ['ID','CODES','USER_ID']
default = ''
writer.writerow(headers)
string = open('sample.json').read().decode('utf-8')
json_obj = json.loads(string)
#print json_obj['id']
#print json_obj['codes']
#print json_obj['codes'][0]
#print json_obj['codes'][1]
#print json_obj['codes'][2]
#print json_obj['codes'][3]
#print json_obj['ctr']['user_id']
for keyword in json_obj:
    row = []
    row.append(str(keyword['id']))
    row.append(str(keyword['codes']))
    row.append(str(keyword['ctr']['user_id']))
    writer.writerow(row)

If your json_obj looks exactly like that, i.e. it is a dictionary, then the issue is this line:
for keyword in json_obj:
Here you are iterating over the keys of json_obj, which are strings, so accessing ['id'] on one of those keys fails with "string indices must be integers".
Instead, get the id and user_id first, before the loop. Then loop over json_obj['codes'] and write each code to the CSV as a row, together with the previously computed id and user_id.
Example:
import json
import csv
# Python 3: open the file with an explicit encoding instead of calling .decode()
with open('sample.json', encoding='utf-8') as fp:
    json_obj = json.load(fp)
with open('rspns.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='~')
    headers = ['ID', 'CODES', 'USER_ID']
    writer.writerow(headers)
    # Look up the shared values once, before looping over the codes
    record_id = json_obj['id']
    user_id = json_obj['ctr']['user_id']
    for code in json_obj['codes']:
        writer.writerow([record_id, code, user_id])

You don't want to iterate through json_obj: it is a dictionary, and iterating over it yields its keys. The TypeError is caused by indexing into those keys ('id', 'codes', and 'ctr'), which are strings, as if they were dictionaries.
Instead, you want a separate row for each code in json_obj['codes'] and to use the json_obj dictionary for your lookups:
for code in json_obj['codes']:
    row = []
    row.append(json_obj['id'])
    row.append(code)
    row.append(json_obj['ctr']['user_id'])
    writer.writerow(row)
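The title mentions an array, so the response might sometimes be a list of such objects rather than a single one. A sketch that handles both cases, assuming every object has the id, codes, and ctr.user_id fields shown in the question; the helper name normalize_rows is hypothetical, and the output goes to an in-memory buffer for demonstration.

```python
import csv
import io
import json


def normalize_rows(data):
    """Yield one [id, code, user_id] row per code; accepts a dict or a list of dicts."""
    records = data if isinstance(data, list) else [data]
    for record in records:
        user_id = record["ctr"]["user_id"]
        for code in record["codes"]:
            yield [record["id"], code, user_id]


# A shortened version of the question's JSON
response = json.loads('{"id": "12345abc", "codes": ["BSVN1FKW3JKKNNMN", "HJYYUKJJL999OJR"],'
                      ' "ctr": {"source": "xyz", "user_id": "1234"}}')

out = io.StringIO()
writer = csv.writer(out, delimiter="~")
writer.writerow(["ID", "CODES", "USER_ID"])
writer.writerows(normalize_rows(response))
print(out.getvalue())
```

To write a real file instead, replace the StringIO with open('rspns.csv', 'w', newline='').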

How to build a nested ordered dict from a csv?

How can I get a nested dictionary, where both the keys and the subkeys are precisely in the same order as in the csv file?
I tried:
import csv
from collections import OrderedDict
filename = "test.csv"
aDict = OrderedDict()
with open(filename, 'r') as f:
    csvReader = csv.DictReader(f)
    for row in csvReader:
        key = row.pop("key")
        aDict[key] = row
where test.csv looks like this:
key,number,letter
eins,1,a
zwei,2,b
drei,3,c
But the sub-dictionaries are not ordered (the columns letter and number are swapped). So how can I populate aDict[key] so that its keys stay in the CSV column order?

You have to build the dictionaries and sub-dictionaries yourself from the rows returned by csv.reader, which are sequences, instead of using csv.DictReader.
Fortunately that's fairly easy:
import csv
from collections import OrderedDict
filename = 'test.csv'
aDict = OrderedDict()
# In Python 3, open CSV files in text mode with newline='' (Python 2 used 'rb')
with open(filename, 'r', newline='') as f:
    csvReader = csv.reader(f)
    fields = next(csvReader)
    for row in csvReader:
        temp = OrderedDict(zip(fields, row))
        key = temp.pop("key")
        aDict[key] = temp
import json # just to create output
print(json.dumps(aDict, indent=4))
Output:
{
    "eins": {
        "number": "1",
        "letter": "a"
    },
    "zwei": {
        "number": "2",
        "letter": "b"
    },
    "drei": {
        "number": "3",
        "letter": "c"
    }
}

This is one way:
import csv
from collections import OrderedDict
filename = "test.csv"
aDict = OrderedDict()
with open(filename, 'r') as f:
    order = next(csv.reader(f))[1:]
    f.seek(0)
    csvReader = csv.DictReader(f)
    for row in csvReader:
        key = row.pop("key")
        aDict[key] = OrderedDict((k, row[k]) for k in order)

In older Python versions (before 3.6), csv.DictReader loads the rows into a regular dict, not an ordered one. You'll have to read the CSV manually into an OrderedDict to get the order you need:
from collections import OrderedDict
filename = "test.csv"
dictRows = []
with open(filename, 'r') as f:
    rows = (line.strip().split(',') for line in f)
    # read column names from first row (next(rows) replaces Python 2's rows.next())
    columns = next(rows)
    for row in rows:
        dictRows.append(OrderedDict(zip(columns, row)))

You can take advantage of the existing csv.DictReader class, but alter the rows it returns. To do that, add the following class to the beginning of your script:
import csv
from collections import OrderedDict

class OrderedDictReader(csv.DictReader):
    def __next__(self):  # Python 3 name; this method was next() in Python 2
        # Get a row using csv.DictReader
        row = super().__next__()
        # Create a new row using OrderedDict, in header order
        new_row = OrderedDict((k, row[k]) for k in self.fieldnames)
        return new_row
Then, use this class in place of csv.DictReader:
csvReader = OrderedDictReader(f)
The rest of your code remains the same.
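The ordering problem is specific to older Python versions. csv.DictReader returns OrderedDict rows from Python 3.6 and plain dict rows from Python 3.8, and plain dicts preserve insertion order from Python 3.7 onward, so on a current interpreter the question's original DictReader code already keeps the column order. A sketch, reading the question's sample from an in-memory string:

```python
import csv
import io

# The question's test.csv contents
sample = "key,number,letter\neins,1,a\nzwei,2,b\ndrei,3,c\n"

aDict = {}  # plain dicts keep insertion order on Python 3.7+
for row in csv.DictReader(io.StringIO(sample)):
    key = row.pop("key")  # the remaining keys stay in column order
    aDict[key] = row

print(aDict)
```

On Python 3.6 and 3.7 the inner rows print as OrderedDict objects, but the key order is the same.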
