How can I get a nested dictionary, where both the keys and the subkeys are precisely in the same order as in the csv file?
I tried
import csv
from collections import OrderedDict
filename = "test.csv"
aDict = OrderedDict()
with open(filename, 'r') as f:
    csvReader = csv.DictReader(f)
    for row in csvReader:
        key = row.pop("key")
        aDict[key] = row
where test.csv looks like
key,number,letter
eins,1,a
zwei,2,b
drei,3,c
But the sub-dictionaries are not ordered (the columns letter and number are swapped). So how can I populate aDict[key] in an ordered manner?
Instead of using csv.DictReader, you have to build the dictionaries and sub-dictionaries yourself from the rows returned by csv.reader, which are plain sequences.
Fortunately that's fairly easy:
import csv
from collections import OrderedDict
filename = 'test.csv'
aDict = OrderedDict()
# Python 3: text mode with newline=''; on Python 2 open with 'rb' instead
with open(filename, 'r', newline='') as f:
    csvReader = csv.reader(f)
    fields = next(csvReader)  # header row defines the column order
    for row in csvReader:
        temp = OrderedDict(zip(fields, row))
        key = temp.pop("key")
        aDict[key] = temp
import json # just to create output
print(json.dumps(aDict, indent=4))
Output:
{
    "eins": {
        "number": "1",
        "letter": "a"
    },
    "zwei": {
        "number": "2",
        "letter": "b"
    },
    "drei": {
        "number": "3",
        "letter": "c"
    }
}
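On Python 3.7 and later, a plain dict preserves insertion order, and csv.DictReader yields rows whose keys follow the header order, so the approach from the question may already behave as wanted there. A minimal sketch, assuming a recent interpreter; the io.StringIO object is only an in-memory stand-in for test.csv:

```python
import csv
import io

# In-memory stand-in for test.csv (assumption: same content as in the question).
csv_text = "key,number,letter\neins,1,a\nzwei,2,b\ndrei,3,c\n"

nested = {}
with io.StringIO(csv_text) as f:
    for row in csv.DictReader(f):
        row = dict(row)       # plain dict; keeps the header order on Python 3.7+
        key = row.pop("key")  # remove the key column, keep the remaining columns
        nested[key] = row

print(nested)
```

On older interpreters the OrderedDict version above remains the safer choice.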
This is one way:
import csv
from collections import OrderedDict
filename = "test.csv"
aDict = OrderedDict()
with open(filename, 'r') as f:
    order = next(csv.reader(f))[1:]
    f.seek(0)
    csvReader = csv.DictReader(f)
    for row in csvReader:
        key = row.pop("key")
        aDict[key] = OrderedDict((k, row[k]) for k in order)
On Python versions before 3.6, csv.DictReader loads each row into a regular dict, which is not ordered. You'll have to read the CSV manually into an OrderedDict to get the order you need:
from collections import OrderedDict
filename = "test.csv"
dictRows = []
with open(filename, 'r') as f:
    rows = (line.strip().split(',') for line in f)
    # read column names from first row
    columns = next(rows)
    for row in rows:
        dictRows.append(OrderedDict(zip(columns, row)))
You can take advantage of the existing csv.DictReader class, but alter the rows it returns. To do that, add the following class near the top of your script, after the imports:
class OrderedDictReader(csv.DictReader):
    # Python 3 names the iterator method __next__ (Python 2 used next)
    def __next__(self):
        # Get a row using csv.DictReader
        row = super().__next__()
        # Create a new row using OrderedDict, in header order
        new_row = OrderedDict((k, row[k]) for k in self.fieldnames)
        return new_row
Then, use this class in place of csv.DictReader:
csvReader = OrderedDictReader(f)
The rest of your code remains the same.
How can I extract the T3 Period, Year and maximum value?
file.json
[
{"Fecha":"2022-08-01T00:00:00.000+02:00", "T3_TipoDato":"Avance", "T3_Periodo":"M08", "Anyo":2022, "value":10.4},
{"Fecha":"2022-07-01T00:00:00.000+02:00", "T3_TipoDato":"Definitivo", "T3_Periodo":"M07", "Anyo":2022, "value":10.8},
{"Fecha":"2022-06-01T00:00:00.000+02:00", "T3_TipoDato":"Definitivo", "T3_Periodo":"M06", "Anyo":2022, "value":10.2}
]
My code:
import json
with open("file.json") as f:
    distros_dict = json.load(f)
print(distros_dict)
Here is my suggestion:
Load the data from the file into a list.
Loop through every dict in the list to edit it.
(In my example, I delete two keys from every dict in the list.)
Then sort the list by value in descending order and take the first item.
import json
distros_dict = []
with open('file.json', "r", encoding='utf-8') as f:
    distros_dict.extend(json.load(f))
for item in distros_dict:
    item.pop('Fecha')
    item.pop('T3_TipoDato')
# after sorting in descending order, the first element has the maximum value
distros_dict = sorted(distros_dict, key = lambda i: i['value'], reverse=True)[0]
Try this:
from json import load
with open("file.json") as f:
    dictionary_max = max(load(f), key=lambda x: x["value"])
result = {
    "T3_Periodo": dictionary_max["T3_Periodo"],
    "Anyo": dictionary_max["Anyo"],
    "value": dictionary_max["value"],
}
print(result)
output:
{'T3_Periodo': 'M07', 'Anyo': 2022, 'value': 10.8}
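Note that max returns only the first record it encounters when several share the maximum. If ties matter, a minimal sketch like the following collects every record with the top value; the names raw, wanted and winners are illustrative only, and the data is copied from file.json above:

```python
import json

# Records copied from file.json in the question.
raw = """[
{"Fecha":"2022-08-01T00:00:00.000+02:00", "T3_TipoDato":"Avance", "T3_Periodo":"M08", "Anyo":2022, "value":10.4},
{"Fecha":"2022-07-01T00:00:00.000+02:00", "T3_TipoDato":"Definitivo", "T3_Periodo":"M07", "Anyo":2022, "value":10.8},
{"Fecha":"2022-06-01T00:00:00.000+02:00", "T3_TipoDato":"Definitivo", "T3_Periodo":"M06", "Anyo":2022, "value":10.2}
]"""

records = json.loads(raw)
wanted = ("T3_Periodo", "Anyo", "value")  # fields to keep in the result

# First find the maximum, then keep every record that reaches it.
max_value = max(r["value"] for r in records)
winners = [{k: r[k] for k in wanted} for r in records if r["value"] == max_value]

print(winners)
```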
The following situation:
movies.csv
movieId,title,genres
tags.csv
userId,movieId,tag,timestamp
I want to read the tags from tags.csv and append them to a list stored in each movie's dictionary entry. A tag should only be appended when its movieId matches the movie's id, and the list should not contain duplicates.
Here is the code:
import csv
reader = csv.reader(open('movies1.csv'))
dict = {}
header = next(reader)
# Check file as empty
if header != None:
    for row in reader:
        key = row[0]
        value = {
            "id": row[0],
            "title": row[1][:-6],
            "year": row[1][-5:-1],
            "average_rating": 0,
            "ratings": [],
            "tags": [], #the list that should be filled with tags
            "genres": row[2].split('|')
        }
        dict[key] = value
tags={}
with open('tags1.csv', mode='r') as infile:
    reader = csv.reader(infile)
    header = next(reader)
    # Check file as empty
    if header != None:
        for col in reader:
            if col[1] == dict[key]['id']:
                dict[key]['tags'].append(col[2])
print(dict)
My result:
Only the last movie gets its tags. The tag lists of all the other movies stay empty.
What am I doing wrong?
So I made it work. I read the tags into a second dictionary and then looped over both of them.
for tag in tags:
    for movie in dict:
        if tags[tag]['movieId'] == dict[movie]['id']:
            if tags[tag]['tag'] not in dict[movie]['tags']:
                dict[movie]['tags'].append(tags[tag]['tag'])
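Because the movies are already keyed by their id, the inner loop over all movies can be replaced by a direct dictionary lookup. The following is only a sketch under assumptions: the CSV rows are made-up sample data, the in-memory strings stand in for movies.csv and tags.csv, and only a few of the fields from the question are kept.

```python
import csv
import io

# Made-up sample rows, for illustration only; they stand in for the real files.
movies_csv = (
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation\n"
    "2,Jumanji (1995),Adventure|Children\n"
)
tags_csv = (
    "userId,movieId,tag,timestamp\n"
    "10,1,pixar,100\n"
    "11,1,pixar,101\n"
    "12,2,board game,102\n"
    "13,1,fun,103\n"
)

# Build the movie dictionary keyed by movieId.
movies = {}
for row in csv.DictReader(io.StringIO(movies_csv)):
    movies[row["movieId"]] = {
        "id": row["movieId"],
        "title": row["title"],
        "tags": [],
        "genres": row["genres"].split("|"),
    }

# Look each tag's movie up directly instead of looping over all movies.
for row in csv.DictReader(io.StringIO(tags_csv)):
    movie = movies.get(row["movieId"])
    if movie is not None and row["tag"] not in movie["tags"]:
        movie["tags"].append(row["tag"])

print(movies["1"]["tags"])
```

The lookup with movies.get also skips tags whose movieId does not appear in the movie file.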
In my current code, it seems to only take into account one value for my Subject key when there should be more (you can only see Economics in my JSON tree and not Maths). I've tried for hours and I can't get it to work.
Here is my sample dataset - I have many more subjects in my full data set:
ID,Name,Date,Subject,Start,Finish
0,Ladybridge High School,01/11/2019,Maths,05:28,06:45
0,Ladybridge High School,02/11/2019,Maths,05:30,06:45
0,Ladybridge High School,01/11/2019,Economics,11:58,12:40
0,Ladybridge High School,02/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,01/11/2019,Maths,05:28,06:45
1,Loreto Sixth Form,02/11/2019,Maths,05:30,06:45
1,Loreto Sixth Form,01/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,02/11/2019,Economics,11:58,12:40
Here is my Python code:
import csv
timetable = {"Timetable": []}
with open("C:/Users/kspv914/Downloads/Personal/Project Dawn/Timetable Sample.csv") as f:
    csv_data = [{k: v for k, v in row.items()} for row in csv.DictReader(f, skipinitialspace=True)]
name_array = []
for name in [row["Name"] for row in csv_data]:
    name_array.append(name)
name_set = set(name_array)
for name in name_set:
    timetable["Timetable"].append({"Name": name, "Date": {}})
for row in csv_data:
    for entry in timetable["Timetable"]:
        if entry["Name"] == row["Name"]:
            entry["Date"][row["Date"]] = {}
            entry["Date"][row["Date"]][row["Subject"]] = {
                "Start": row["Start"],
                "Finish": row["Finish"]
            }
Here is my JSON tree:
You're resetting the date dict to {} on every row and only then adding a subject, so only the last subject for each date survives.
Do something like this:
import csv
timetable = {"Timetable": []}
with open("a.csv") as f:
    csv_data = [{k: v for k, v in row.items()} for row in csv.DictReader(f, skipinitialspace=True)]
name_array = []
for name in [row["Name"] for row in csv_data]:
    name_array.append(name)
name_set = set(name_array)
for name in name_set:
    timetable["Timetable"].append({"Name": name, "Date": {}})
for row in csv_data:
    for entry in timetable["Timetable"]:
        if entry["Name"] == row["Name"]:
            # only create the date dict the first time this date is seen
            if row["Date"] not in entry["Date"]:
                entry["Date"][row["Date"]] = {}
            entry["Date"][row["Date"]][row["Subject"]] = {
                "Start": row["Start"],
                "Finish": row["Finish"]
            }
I've just added an if condition before assigning {} to entry["Date"][row["Date"]], so an existing date entry is no longer reset.
With this change, each date keeps both Maths and Economics in the output.
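The same "create only if missing" logic can also be expressed with dict.setdefault, which avoids both the explicit if and the inner loop over timetable entries. This is a sketch rather than the answerer's method; by_name is a hypothetical helper name, and the CSV text is the sample data from the question held in memory:

```python
import csv
import io

# Sample data from the question, held in memory instead of a file.
csv_text = """ID,Name,Date,Subject,Start,Finish
0,Ladybridge High School,01/11/2019,Maths,05:28,06:45
0,Ladybridge High School,02/11/2019,Maths,05:30,06:45
0,Ladybridge High School,01/11/2019,Economics,11:58,12:40
0,Ladybridge High School,02/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,01/11/2019,Maths,05:28,06:45
1,Loreto Sixth Form,02/11/2019,Maths,05:30,06:45
1,Loreto Sixth Form,01/11/2019,Economics,11:58,12:40
1,Loreto Sixth Form,02/11/2019,Economics,11:58,12:40
"""

by_name = {}  # hypothetical helper: school name -> {date -> {subject -> times}}
for row in csv.DictReader(io.StringIO(csv_text), skipinitialspace=True):
    dates = by_name.setdefault(row["Name"], {})   # created once per school
    subjects = dates.setdefault(row["Date"], {})  # created once per date
    subjects[row["Subject"]] = {"Start": row["Start"], "Finish": row["Finish"]}

timetable = {
    "Timetable": [{"Name": name, "Date": dates} for name, dates in by_name.items()]
}
print(timetable["Timetable"][0]["Date"]["01/11/2019"])
```

Unlike iterating over a set of names, this keeps the schools in the order they first appear in the file (on Python 3.7+).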
You are overwriting your dict entries with entry["Date"][row["Date"]][row["Subject"]] =. The first time "Maths" is met, the entry is created. The second time, it is overwritten.
Your expected result should be a list, not a dict. Every entry should be appended to the list with timetable_list.append().
Here is some simple code that converts the whole CSV file into JSON without losing data:
import csv
import json
data = []
with open("ex1.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        data.append(row)
print(json.dumps({"Timetable": data}, indent=4))
I took a Python beginners' course last year. Now I am trying to write a CSV to JSON converter. I searched for quite some time and adapted some of the code I found until the output looked similar to what I want. I am using Python 3.4.2.
@kvorobiev this is an excerpt of my CSV, but it will do for this case. The first conversion works. From the second run on, you will see that the order of the headings changes within the JSON file.
The csv file looks like this
Document;Item;Category
4;10;C
What I am getting in the output file as of now (after applying the changes from kvorobiev):
[
    {
        "Item": "10",
        "Category": "C",
        "Document": "4"
    };
]
The json string I want to get in the output file should look like:
[
    {
        "Document": "4",
        "Item": "10",
        "Category": "C"
    },
]
You will notice the headings are in the wrong order.
Here is the code:
import json
import csv
csvfile = open('file1.csv', 'r')
jsonfile = open('file1.csv'.replace('.csv','.json'), 'w')
jsonfile.write('[' + '\n' + ' ')
fieldnames = csvfile.readline().replace('\n','').split(';')
num_lines = sum(1 for line in open('file.csv')) -1
reader = csv.DictReader(csvfile, fieldnames)
i = 0
for row in reader:
    i += 1
    json.dump(row, jsonfile, indent=4,sort_keys=False)
    if i < num_lines:
        jsonfile.write(',')
    jsonfile.write('\n')
jsonfile.write(' ' + ']')
print('Done')
Thanks for helping.
Replace the line
reader = csv.DictReader(csvfile, fieldnames)
with
reader = csv.DictReader(csvfile, fieldnames, delimiter=';')
Also, you open file1.csv but later count the lines in file.csv:
num_lines = sum(1 for line in open('file.csv')) -2
Your solution could be reduced to
import json
import csv
csvfile = open('file1.csv', 'r')
jsonfile = open('file1.csv'.replace('.csv','.json'), 'w')
jsonfile.write('{\n[\n')
fieldnames = csvfile.readline().replace('\n','').split(';')
reader = csv.DictReader(csvfile, fieldnames, delimiter=';')
for row in reader:
    json.dump(row, jsonfile, indent=4)
    jsonfile.write(';\n')
jsonfile.write(']\n}')
If you want to preserve the column order from the CSV, you could use
from collections import OrderedDict
...
for row in reader:
    json.dump(OrderedDict([(f, row[f]) for f in fieldnames]), jsonfile, indent=4)
    jsonfile.write(';\n')
jsonfile.write(']\n}')
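As an alternative sketch, the rows can be collected into a list and serialized with a single json.dumps call, which leaves the brackets and separators to the json module and yields a valid JSON array. This assumes Python 3.7 or later, where dict keeps the header order; the in-memory string stands in for file1.csv, and rows and json_text are illustrative names:

```python
import csv
import io
import json

# In-memory stand-in for file1.csv (content taken from the question).
csv_text = "Document;Item;Category\n4;10;C\n"

# DictReader takes the field names from the header row; delimiter=';' matches the file.
reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
rows = [dict(row) for row in reader]  # plain dicts keep column order on Python 3.7+

# One call writes the whole array, so no manual '[' / ',' / ']' handling is needed.
json_text = json.dumps(rows, indent=4)
print(json_text)
```

To write to a file instead, json.dump(rows, jsonfile, indent=4) can be used with an open file object.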
I want to import a CSV into multiple dictionaries in python
queryInclude,yahoo,value1
queryInclude,yahoo,value2
queryInclude,yahoo,value3
queryExclude,yahoo,value4
queryExclude,yahoo,value5
queryInclude,google,value6
queryExclude,google,value7
My ideal result would have row[0]=dictionary, row[1]=key, and row[2]=value or list of values
queryInclude = {
"yahoo": ["value1", "value2", "value3"],
"google": ["value6"] }
queryExclude = {
"yahoo": ["value4", "value5"],
"google": ["value7"] }
Here's my code:
import csv
queryList=[]
queryDict={}
with open('dictionary.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in reader:
        queryDict[row[1]] = queryList.append(row[2])
        print(queryDict)
Output:
{'yahoo': None}
{'yahoo': None}
{'yahoo': None}
{'yahoo': None}
{'yahoo': None}
{'google': None, 'yahoo': None}
{'google': None, 'yahoo': None}
I have the flexibility to change the CSV format if needed. My ideal result posted above is what I already had hard-coded into my app. I'm trying to make it easier to add more values down the road. I've spent many hours researching this and will continue to update if I make any more progress. My thought process looks like this... not sure how close I am to understanding how to structure my loops and combine like values while iterating through the CSV rows...
for row in reader:
    where row[0] = queryInclude:
        create a dictionary combining keys into a list of values
    where row[0] = queryExclude:
        create a dictionary combining keys into a list of values
Using defaultdict removes the need to special-case the first element added under a key. Its argument must be a callable that creates the default value whenever a missing key is accessed:
#! python3
import csv
from io import StringIO
from collections import defaultdict
from pprint import pprint
data = StringIO('''\
queryInclude,yahoo,value1
queryInclude,yahoo,value2
queryInclude,yahoo,value3
queryExclude,yahoo,value4
queryExclude,yahoo,value5
queryInclude,google,value6
queryExclude,google,value7
''')
D = defaultdict(lambda: defaultdict(list))
for d,k,v in csv.reader(data):
    D[d][k].append(v)
pprint(D)
Output:
{'queryExclude': {'google': ['value7'],
                  'yahoo': ['value4', 'value5']},
 'queryInclude': {'google': ['value6'],
                  'yahoo': ['value1', 'value2', 'value3']}}
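To get back the two separately named dictionaries from the question, the nested defaultdict can be split afterwards. A minimal sketch, assuming the same three-column layout; converting to plain dict is optional and only makes later missing-key lookups raise KeyError again:

```python
import csv
import io
from collections import defaultdict

# Sample data from the question, held in memory.
data = io.StringIO(
    "queryInclude,yahoo,value1\n"
    "queryInclude,yahoo,value2\n"
    "queryInclude,yahoo,value3\n"
    "queryExclude,yahoo,value4\n"
    "queryExclude,yahoo,value5\n"
    "queryInclude,google,value6\n"
    "queryExclude,google,value7\n"
)

# Outer key: dictionary name; inner key: provider; value: list of values.
D = defaultdict(lambda: defaultdict(list))
for name, provider, value in csv.reader(data):
    D[name][provider].append(value)

# Split into the two plain dictionaries the application expects.
queryInclude = dict(D.get("queryInclude", {}))
queryExclude = dict(D.get("queryExclude", {}))

print(queryInclude)
print(queryExclude)
```

Adding another dictionary name to the first CSV column then requires no change to the loop.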
Does this help?
import csv
from io import StringIO  # Python 3; on Python 2 use the StringIO module
csvfile = StringIO("""queryInclude,yahoo,value1
queryInclude,yahoo,value2
queryInclude,yahoo,value3
queryExclude,yahoo,value4
queryExclude,yahoo,value5
queryInclude,google,value6
queryExclude,google,value7""")
reader = csv.reader(csvfile, delimiter=',', quotechar='|')
dict1={}
for row in reader:
    key1, provider, value1 = row
    # dict.has_key() was removed in Python 3; use the in operator instead
    if key1 not in dict1:
        dict1[key1] = {}
    if provider not in dict1[key1]:
        dict1[key1][provider] = []
    dict1[key1][provider].append(value1)