I have a long dict that was created by merging lists of tuples. The dict maps value → key, and the entries come in a rough order, like
value:key1, value:key2, value:key3, value:key1, value:key2, value:key3
But that's not a strict rule: in some places key2 is missing, and in others there is, for example, a key4, so the values have different keys.
It actually looks more like this:
value:key1, value:key2, value:key1, value:key2, value:key4
I would like to create a CSV file from this data: loop over the dict, look at the keys, add each key to the CSV header if it isn't already there, and write the value under that key (or None if there is no value).
So I have this
{'www.example1.com': 'url', 'FAILURE TO APPEAR (FOR FELONY OFFENSE) - FELONY': 'Charge', 'SIMULTANEOUS POSSESSION OF DRUGS AND FIREARMS - FELONY': 'Offense Date', 'POSSESSION WITH INTENT TO DELIVER METHAMPHETAMINE OR COCAINE': 'Court Type', 'Count=3': 'Court Date', '10-30-2019': 'Bond', '11-16-2019': 'Charging Agency', '': 'DEGREE', '181680713': 'ID', '24': 'Age', 'H': 'Race', 'M': 'Sex', 'BRO': 'Eye Color', 'BLK': 'Hair Color', '175 lb (79 kg)': 'Weight', '5′ 10″ (1.78 m)': 'Height', 'address example': 'Address', '11/16/2019 at 22:07': 'Admit Date', 'Benton Co': 'Confining Agency',
'www.example2.com': 'url', '32-5a-191.4': 'STATUTE', '000-0000 (ALABAMA STATE TROOPERS)': 'COURT CASE NUMBER', 'IGNITION INTERLOCK VIOLATION': 'Description', 'V': 'LEVEL', '$1000.00': 'Bond Set Amount', '181727213': 'ID', 'name example': 'Name', 'W': 'Race', 'MALE': 'Gender', 'Released': 'Inmate Status', 'some number': 'Booking No', 'some number': 'Inmate Number', '11/18/2019 at 16:49': 'Booking Date', '11/18/2019 at 20:35': 'Release Date', '33': 'Arrest Age', 'some address': 'Address Given'}
and I would like to have a csv file like this
url | Charge | Statute
1 www.example1.com SIMULTANEOUS none
2 www.example2.com none 32-5a-191.4
order in header is not important.
I tried this code, but it overwrites the data in the first row instead of appending...
from collections import defaultdict
import numpy as np
import pandas as pd

res = defaultdict(list)
d = dict((y, x) for x, y in my_dict.items())  # swap keys and values
for key, val in sorted(d.items()):
    res[val].append(key)
df = pd.DataFrame.from_dict(res, orient='index').fillna(np.nan).T
df.to_csv("file.csv")
In your example I see that every new record starts with the url key, so I think this code can do it:
import pandas as pd

my_dict = {
    'www.example1.com': 'url',
    'FAILURE TO APPEAR (FOR FELONY OFFENSE) - FELONY': 'Charge',
    'SIMULTANEOUS POSSESSION OF DRUGS AND FIREARMS - FELONY': 'Offense Date',
    'POSSESSION WITH INTENT TO DELIVER METHAMPHETAMINE OR COCAINE': 'Court Type',
    'Count=3': 'Court Date',
    '10-30-2019': 'Bond',
    '11-16-2019': 'Charging Agency',
    '': 'DEGREE',
    '181680713': 'ID',
    '24': 'Age',
    'H': 'Race',
    'M': 'Sex',
    'BRO': 'Eye Color',
    'BLK': 'Hair Color',
    '175 lb (79 kg)': 'Weight',
    '5′ 10″ (1.78 m)': 'Height',
    'address example': 'Address',
    '11/16/2019 at 22:07': 'Admit Date',
    'Benton Co': 'Confining Agency',
    'www.example2.com': 'url',
    '32-5a-191.4': 'STATUTE',
    '000-0000 (ALABAMA STATE TROOPERS)': 'COURT CASE NUMBER',
    'IGNITION INTERLOCK VIOLATION': 'Description',
    'V': 'LEVEL',
    '$1000.00': 'Bond Set Amount',
    '181727213': 'ID',
    'name example': 'Name',
    'W': 'Race',
    'MALE': 'Gender',
    'Released': 'Inmate Status',
    'some number': 'Booking No',
    'some number': 'Inmate Number',  # duplicate key: in a literal, the second entry wins
    '11/18/2019 at 16:49': 'Booking Date',
    '11/18/2019 at 20:35': 'Release Date',
    '33': 'Arrest Age',
    'some address': 'Address Given'
}
items = []
curr_dict = None
for key in my_dict.keys():
    new_key = my_dict[key]
    new_value = key if key else 'None'
    if new_key == 'url':
        curr_dict = {}
        items.append(curr_dict)
    curr_dict[new_key] = new_value

df = pd.DataFrame(items).fillna('None')
df.to_csv("file.csv", index=False)
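To see why this works on a tiny input: each occurrence of the 'url' column name starts a new record, and pandas aligns the differing keys into columns. A small self-contained check (hypothetical values, same value-to-column-name structure as the question):

```python
import pandas as pd

# hypothetical values; the dict maps value -> column name, as in the question
sample = {'www.a.com': 'url', 'x': 'Charge', 'www.b.com': 'url', 'y': 'STATUTE'}

items = []
curr = None
for value, key in sample.items():
    if key == 'url':     # each 'url' entry starts a new record
        curr = {}
        items.append(curr)
    curr[key] = value or 'None'

df = pd.DataFrame(items).fillna('None')
print(df)
```

The second record has no 'Charge' value, so pandas leaves NaN there and fillna turns it into 'None', exactly the behavior the question asks for.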
I have a list called 'insights' containing dict1 {subdict1, subdict2}, dict2 {subdict1, subdict2}, and dict3 (which has no subdicts). I need to create a gsheet file for each dict in 'insights', with a sheet for each subdict. This is what is inside 'insights':
[{'city': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'city',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'gender': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'gender',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'country': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'country',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'age': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'age',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''}},
{'city': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'city',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'gender': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'gender',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'country': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'country',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'age': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'age',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''}},
{'name': 'follower_count',
'period': 'day',
'title': 'Follower Count',
'description': 'Total number of unique accounts following this profile',
'df': ''}]
As you can see, in summary the list is the following:
insights = [
follower_demographics,
reached_demographics,
followers_count
]
Each dictionary in the list has the structure below; in the case of 'follower_demographics' it breaks down into a dictionary keyed by ['city', 'gender', 'country', 'age'], where each entry looks like this:
demographics = {
'name': '',
'period': '',
'title': '',
'type': '',
'description': '',
'df': ''
}
So I wrote the function below to create a file for each of the 3 dictionaries in 'insights'. The problem is that it creates 4 separate 'follower_demographics' files, each containing only one of the dataframes.
def create_gsheet(insights, folder_id):
    try:
        # create a list to store the created files
        files = []
        # iterate over the items in the insights dictionary
        for idx, (key, value) in enumerate(insights.items()):
            # check if the value is a dictionary
            if isinstance(value, dict):
                # create a new file with the name taken from the 'title' key
                file = gc.create(value['title'], folder=folder_id)
                print(f"Creating {value['title']} - {idx}/{len(insights)}")
                # add the file to the list
                files.append(file)
                # create a new sheet within the file, named from 'type' and 'name'
                sheet = file.add_worksheet(value['type'] + '_' + value['name'])
                # set the sheet data to the df provided in the dictionary
                sheet.set_dataframe(value['df'], (1, 1), encoding='utf-8', fit=True)
                sheet.frozen_rows = 1
        # delete the default sheet1 from all the created files
        for file in files:
            file.del_worksheet(file.sheet1)
    except Exception as error:
        print(f'An error occurred: {error}')
        sheet = None
The result I want is, for example, a single 'follower_demographics' file with the sub-sheets 'city_follower_demographics', 'gender_follower_demographics', and so on, each holding its respective dataframe.
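Since the function creates a file every time it sees a dict, one way to restructure it is to create the file once per insight and add one worksheet per sub-dict. The grouping logic can be sketched and tested separately from the gsheets API (`plan_gsheet_files` is a hypothetical helper, not part of any library):

```python
def plan_gsheet_files(insights):
    """Return {file_title: [worksheet_names]}: one file per insight,
    one worksheet per sub-dict (or a single sheet for a flat dict)."""
    plan = {}
    for insight in insights:
        first = next(iter(insight.values()))
        if isinstance(first, dict):
            # nested case: every sub-dict shares the same 'title'
            plan[first['title']] = [sub['type'] + '_' + sub['name']
                                    for sub in insight.values()]
        else:
            # flat case like 'follower_count': one sheet named after 'name'
            plan[insight['title']] = [insight['name']]
    return plan
```

The real function would then call `gc.create` once per title and `add_worksheet` once per planned sheet name, setting each sub-dict's 'df' inside that inner loop; the key fix is moving the file creation out of the per-sub-dict iteration.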
Using this dictionary, is there a way I can only extract the Name, Last Name, and Age of the boys?
myDict = {'boy1': {'Name': 'JM', 'Last Name':'Delgado', 'Middle Name':'Goneza', 'Age':'21',
'Birthday':'8/22/2001', 'Gender':'Male'},
'boy2': {'Name': 'Ralph', 'Last Name':'Tubongbanua', 'Middle Name':'Castro',
'Age':'21', 'Birthday':'9/5/2001', 'Gender':'Male'},}
for required in myDict.values():
    print(required['Name', 'Last Name', 'Age'])
The output is:
JM
Ralph
What I have in mind is
JM Delgado 21
Ralph Tubongbanua 21
You have to extract the keys one by one:
myDict = {'boy1': {'Name': 'JM', 'Last Name':'Delgado', 'Middle Name':'Goneza', 'Age':'21',
'Birthday':'8/22/2001', 'Gender':'Male'},
'boy2': {'Name': 'Ralph', 'Last Name':'Tubongbanua', 'Middle Name':'Castro',
'Age':'21', 'Birthday':'9/5/2001', 'Gender':'Male'},}
for required in myDict.values():
    print(required['Name'], required['Last Name'], required['Age'])
This could be a solution:
myDict = {'boy1': {'Name': 'JM', 'Last Name': 'Delgado', 'Middle Name': 'Goneza', 'Age': '21',
                   'Birthday': '8/22/2001', 'Gender': 'Male'},
          'boy2': {'Name': 'Ralph', 'Last Name': 'Tubongbanua', 'Middle Name': 'Castro',
                   'Age': '21', 'Birthday': '9/5/2001', 'Gender': 'Male'}}
for required in myDict.values():
    print(required['Name'], required['Last Name'], required['Age'])
When printing multiple values separated with commas, a space will automatically appear between them.
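As a further alternative (not from the answers above), `operator.itemgetter` from the standard library can pull several keys at once:

```python
from operator import itemgetter

# abbreviated version of the dict from the question
myDict = {'boy1': {'Name': 'JM', 'Last Name': 'Delgado', 'Age': '21'},
          'boy2': {'Name': 'Ralph', 'Last Name': 'Tubongbanua', 'Age': '21'}}

get_fields = itemgetter('Name', 'Last Name', 'Age')  # returns a tuple of values
for required in myDict.values():
    print(*get_fields(required))  # unpack so values print space-separated
```

This keeps the list of wanted fields in one place instead of repeating `required[...]` three times.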
I'm using Python to translate a txt file into JSON. However, when iterating over the lines of the txt file, the result is a series of separate lists, and I've failed to merge them into dicts with zip(). Can anyone help me figure it out? I've been stuck for a couple of hours. Thanks.
import re

with open(path, encoding='utf-8-sig') as f:
    seq = re.compile("[|]")
    for line_num, line in enumerate(f.readlines()):
        result = seq.split(line.strip("\n"))
        print(result)
This is the output:
['Delivery', 'Customer Name', 'Shipment Priority', 'Creation Date', 'Customer Number', 'Batch Name', 'Release Date']
['69328624', 'Zhidi Feng', 'Standard Priority', '13-OCT-20', '432579', '19657423', '13-OCT-20 00:01:07']
['69328677', 'Zhengguo Huang', 'Standard Priority', '13-OCT-20', '429085', '19657425', '13-OCT-20 00:01:34']
Something like this?
>>> lists = [["12", "Abc", "def"], ["34", "Ghi", "jkl"]]
>>> fields = ["id", "lastname", "firstname"]
>>> dicts = []
>>> for l in lists:
... d = {}
... for i in range(3):
... d[fields[i]] = l[i]
... dicts.append(d)
>>> dicts
[{'id': '12', 'lastname': 'Abc', 'firstname': 'def'},
{'id': '34', 'lastname': 'Ghi', 'firstname': 'jkl'}]
Edit: included in your existing code:
dicts = []
for line_num, line in enumerate(f.readlines()):
    result = seq.split(line.strip("\n"))
    if line_num == 0:
        keys = result
    else:
        d = {}
        for i, key in enumerate(keys):
            d[key] = result[i]
        dicts.append(d)
(I didn't test this since I don't have the file you're using)
You can use zip to make a dictionary like this (I separate the keys and the value rows into two lists):
import re

with open(path, encoding='utf-8-sig') as f:
    seq = re.compile("[|]")
    lines = f.readlines()
    keys = seq.split(lines[0].strip("\n"))  # setting keys from the header line
    dict_list = []
    for line in lines[1:]:
        result = seq.split(line.strip("\n"))
        dict_list.append(dict(zip(keys, result)))  # make a dict and append it to the list
>>> total
[['Delivery', 'Customer Name', 'Shipment Priority', 'Creation Date', 'Customer Number', 'Batch Name', 'Release Date'], ['69328624', 'Zhidi Feng', 'Standard Priority', '13-OCT-20', '432579', '19657423', '13-OCT-20 00:01:07'], ['69328677', 'Zhengguo Huang', 'Standard Priority', '13-OCT-20', '429085', '19657425', '13-OCT-20 00:01:34']]
>>> keys = total[0]
>>>
>>> values = total[1:]
>>> wanted = [ dict(zip(keys, value)) for value in values]
>>> wanted
[{'Delivery': '69328624', 'Customer Name': 'Zhidi Feng', 'Shipment Priority': 'Standard Priority', 'Creation Date': '13-OCT-20', 'Customer Number': '432579', 'Batch Name': '19657423', 'Release Date': '13-OCT-20 00:01:07'}, {'Delivery': '69328677', 'Customer Name': 'Zhengguo Huang', 'Shipment Priority': 'Standard Priority', 'Creation Date': '13-OCT-20', 'Customer Number': '429085', 'Batch Name': '19657425', 'Release Date': '13-OCT-20 00:01:34'}]
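As a side note, the stdlib `csv` module can parse pipe-delimited files directly, and `csv.DictReader` does the zip-with-header step for you. A minimal sketch, with an inline sample standing in for the real `open(path, encoding='utf-8-sig')`:

```python
import csv
import io

# inline sample data; in the real code this would be the open file object
data = "Delivery|Customer Name\n69328624|Zhidi Feng\n69328677|Zhengguo Huang\n"
reader = csv.DictReader(io.StringIO(data), delimiter='|')
rows = [dict(row) for row in reader]
print(rows)
```

This removes the need for the `re` split and the manual header/body bookkeeping.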
I have 4 dictionaries, and some of the fields are not present in all 4. I get this error:
ValueError: dict contains fields not in fieldnames: 'from', 'Command'
After reading some docs I found the extrasaction='ignore' parameter for DictWriter, but it leaves some fields out of the CSV file.
I need all the fields in the CSV file even when they are empty for some of the dictionaries; from what I understand, it is only printing the common fields.
The dictionaries are:
{'Command': 'DELETE', 'table': 'abc', 'from': 'abc', 'where_expr': 'c = book'}
{'Command': 'SELECT', 'columns': 'a b', 'table': 'tab1', 'from': 'tab1'}
{'Command': 'INSERT', 'table': 'xyz', 'into': 'xyz', 'columns': 'pencil pens', 'values': '200 20'}
{'Command': 'UPDATE', 'table': 'Student', 'columns': 'NAME', 'values': 'PRATIK', 'where_expr': 'a = 100'}
I have included extrasaction because of the error.
with open('del.csv', 'w') as f:
    write = csv.DictWriter(f, dict4.keys(), extrasaction='ignore')
    write.writeheader()
    write.writerow(dict5)
    write.writerow(dict6)
    write.writerow(dict7)
    write.writerow(dict8)
Output looks something like:
(I have used commas to separate the fields. Empty spaces mean empty fields(they have no value))
table| columns|values| where_expr
abc, , ,c = book
tab1,a b, , ,
xyz, pencil pens,200 20,
Student ,NAME, PRATIK, a = 100
Edited: The required output is:
(Empty spaces mean empty fields) I wish I could post the CSV file.
Command|Table|Columns|From|Where_expr|Into|Values
DELETE,abc, ,abc,c = book, , ,
SELECT,tab1, a b,tab1, , , ,
INSERT,xyz,pencil pens, , ,xyz, 200 20
UPDATE, student,name, ,a = 100, , PRATIK
With a set.union operation (for the case of a dynamic number of columns):
import csv

dict5 = {'Command': 'DELETE', 'table': 'abc', 'from': 'abc', 'where_expr': 'c = book'}
dict6 = {'Command': 'SELECT', 'columns': 'a b', 'table': 'tab1', 'from': 'tab1'}
dict7 = {'Command': 'INSERT', 'table': 'xyz', 'into': 'xyz', 'columns': 'pencil pens', 'values': '200 20'}
dict8 = {'Command': 'UPDATE', 'table': 'Student', 'columns': 'NAME', 'values': 'PRATIK', 'where_expr': 'a = 100'}

with open('names.csv', 'w', newline='') as csvfile:
    # `initial_fieldnames` are your dict4 keys
    initial_fieldnames = ["table", "columns", "values", "where_expr"]
    fieldnames = sorted(set(initial_fieldnames).union(*[dict5, dict6, dict7, dict8]))
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
    writer.writeheader()
    writer.writerows([dict5, dict6, dict7, dict8])
names.csv contents:
Command,columns,from,into,table,values,where_expr
DELETE,,abc,,abc,,c = book
SELECT,a b,tab1,,tab1,,
INSERT,pencil pens,,xyz,xyz,200 20,
UPDATE,NAME,,,Student,PRATIK,a = 100
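Worth noting about the approach above: `DictWriter` fills missing keys with its `restval` argument (empty string by default), so once `fieldnames` covers the union of all keys, no manual padding is needed. A minimal self-contained sketch:

```python
import csv
import io

rows = [{'a': '1'}, {'b': '2'}]
fieldnames = sorted(set().union(*rows))  # union of all dict keys -> ['a', 'b']
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)   # missing keys come out as empty cells
print(buf.getvalue())
```

Each row gets a cell for every fieldname, with `restval` standing in wherever the dict has no entry.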
You can load all the dictionaries from a list into a pandas DataFrame and write the CSV file from that DataFrame. Try this:
import pandas as pd

d1 = {'Command': 'DELETE', 'table': 'abc', 'from': 'abc', 'where_expr': 'c = book'}
d2 = {'Command': 'SELECT', 'columns': 'a b', 'table': 'tab1', 'from': 'tab1'}
d3 = {'Command': 'INSERT', 'table': 'xyz', 'into': 'xyz', 'columns': 'pencil pens', 'values': '200 20'}
d4 = {'Command': 'UPDATE', 'table': 'Student', 'columns': 'NAME', 'values': 'PRATIK', 'where_expr': 'a = 100'}

dicts = [d1, d2, d3, d4]
rows = []  # separate name so the dict d2 above isn't shadowed
col = ['Command', 'table', 'columns', 'from', 'where_expr', 'into', 'values']
for i in dicts:
    temp = {}
    for c in col:
        temp[c] = i[c] if c in i else ''
    rows.append(temp)

df = pd.DataFrame(rows, columns=col)
df.columns = [c.capitalize() for c in col]
df.to_csv('test21.csv', index=False, sep=',')
Output (csv file content) :
Command,Table,Columns,From,Where_expr,Into,Values
DELETE,abc,,abc,c = book,,
SELECT,tab1,a b,tab1,,,
INSERT,xyz,pencil pens,,,xyz,200 20
UPDATE,Student,NAME,,a = 100,,PRATIK
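The padding loop can be shortened further: `pd.DataFrame` already aligns dict keys into columns, leaving NaN where a dict lacks a key, and `fillna('')` then blanks those out. A sketch with two abbreviated dicts:

```python
import pandas as pd

dicts = [{'Command': 'DELETE', 'table': 'abc'},
         {'Command': 'SELECT', 'columns': 'a b'}]
df = pd.DataFrame(dicts).fillna('')            # missing keys -> NaN -> ''
df.columns = [c.capitalize() for c in df.columns]
print(df.to_csv(index=False))
```

Columns appear in order of first occurrence across the dicts, so pass an explicit column list (as the answer above does) if a fixed order matters.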
Input :
{'Name': 'A','Blood Group': 'O +ve', 'Age': '1', 'Sex': 'M','Phone Number': '01234567', 'Mobile Number': '9876543210', 'Date of Birth': '01-01-95'}
1.
d.update({'Contact Info': {'Mobile Number': d['Mobile Number'],
                           'Phone Number': d['Phone Number']}})
2.
d['Contact Info'] = {}
d['Contact Info']['Mobile Number'] = d['Mobile Number']
Can you suggest a better or different way to create a dictionary key whose value is itself a dict?
Original Code:
import csv
import copy
from collections import namedtuple

d = {}
ls = []

def nest():
    with open("details.csv", 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            d.update(row)
            PersonalDetails = namedtuple('PersonalDetails', 'blood_group age sex')
            ContactInfo = namedtuple('ContactInfo', 'phone_number mobile_number')
            d1 = copy.deepcopy(d)
            ls.append(d1)
    print(ls)

nest()
This is how I would update my dict of dicts:
I would create a function that takes 3 arguments (the key of the sub-dict, the subkey within that sub-dict, and the value you want to set), checks that the key exists, and then updates that value:
d = {
    'Name': 'A',
    'Personal Details': {'Blood Group': 'O +ve', 'Age': '1', 'Sex': 'M'},
    'Contact Info': {'Phone Number': '01234567', 'Mobile Number': '9876543210'},
    'Date of Birth': '01-01-95'
}

def updateInfo(toBeUpdated, subkey, ValueToUpdate):
    if toBeUpdated in d:
        tempdict = d[toBeUpdated]
        tempdict[subkey] = ValueToUpdate
        d[toBeUpdated] = tempdict
        print(d)
    else:
        print("No %s to update" % toBeUpdated)

updateInfo('Contact Info', 'Mobile Number', '999 999 9999')
The result I get from this:
{'Name': 'A', 'Personal Details': {'Blood Group': 'O +ve', 'Age': '1', 'Sex': 'M'}, 'Contact Info': {'Phone Number': '01234567', 'Mobile Number': '999 999 9999'}, 'Date of Birth': '01-01-95'}
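For the original question of building the nested key, a dict comprehension combined with `pop` is a compact third option that also removes the flat copies of the moved fields (a sketch using a few of the fields from the question):

```python
d = {'Name': 'A', 'Phone Number': '01234567', 'Mobile Number': '9876543210'}

# move the two contact fields under one nested key in a single statement;
# pop() removes each flat entry as it is copied into the sub-dict
contact_keys = ('Phone Number', 'Mobile Number')
d['Contact Info'] = {k: d.pop(k) for k in contact_keys}
print(d)
```

Unlike options 1 and 2 in the question, this leaves no duplicate top-level 'Phone Number' / 'Mobile Number' entries behind.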