Excel parsing in Python and forming a dictionary - python

I have an Excel file with a lot of data in it. I would like to get the data as dictionaries. For example, the header of each column would be the key and the rest of the column would supply the values.
Excel:
No name lastname hobby
1 jhon g fishing
2 mike a boxing
3 tom v sking
Is it possible to have it like this:
dict = {No:1, name:jhon, lastname:g, hobby:fishing},
dict = {No:2, name:mike, lastname:a, hobby:boxing},
I tried converting the Excel file to CSV and using csv.DictReader, but it did not work for me. Is there any other way?

Given the following CSV file:
No,name,lastname,hobby
1,jhon,g,fishing
2,mike,a,boxing
3,tom,v,sking
The following code appears to do what you're asking for:
In [1]: import csv
In [2]: with open('file.txt', newline='') as f:
   ...:     for d in csv.DictReader(f):
   ...:         print(d)
   ...:
{'No': '1', 'name': 'jhon', 'lastname': 'g', 'hobby': 'fishing'}
{'No': '2', 'name': 'mike', 'lastname': 'a', 'hobby': 'boxing'}
{'No': '3', 'name': 'tom', 'lastname': 'v', 'hobby': 'sking'}
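As a minimal sketch of the same idea, the snippet below reads the CSV from an in-memory string so it runs without a file on disk. The names CSV_TEXT and read_rows are hypothetical, and converting No to an integer is an assumption about what the asker wants, since csv.DictReader always yields strings.

```python
import csv
import io

# Hypothetical in-memory stand-in for the CSV file shown above.
CSV_TEXT = """No,name,lastname,hobby
1,jhon,g,fishing
2,mike,a,boxing
3,tom,v,sking
"""


def read_rows(handle):
    """Return one dict per data row, keyed by the header row."""
    rows = []
    for row in csv.DictReader(handle):
        row = dict(row)
        # DictReader yields every value as a string; convert the number column.
        row["No"] = int(row["No"])
        rows.append(row)
    return rows


rows = read_rows(io.StringIO(CSV_TEXT))
for row in rows:
    print(row)
```

If pandas is installed, pandas.read_excel followed by to_dict('records') is another route that skips the CSV conversion, though it is not shown here.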

Related

How to access specific attributes and output information in the object

Hello there, I have a JSON dataset with a whole bunch of entries like this:
{'12-1-2021': [{'initials': 'IS',
                'name': 'Sam',
                'age': 23,
                'address': 'Freedom Dr',
                'city': 'Seattle',
                'state': 'WA'},
               {'initials': 'SD',
                'name': 'Sed',
                'age': 21,
                'address': 'Washington Dr',
                'city': 'Phoenix',
                'state': 'AZ'}]}
There are multiple entries for each date. They all look the same, just with different dates and different people added to each date.
So I can get information by using
json_content['12-1-2021'][0]['name']
I want to iterate through the dataset and select, for instance, all the people who live in Seattle, without using the date (I may add the date later; I'm not sure about that requirement yet). But I can't see how to do that without specifying the date up front.
Thank you!
You definitely can. Since you don't care about the keys of the dictionary just go through the values:
names = [
    person['name']
    for date in json_content.values()
    for person in date if person['city'] == 'Seattle'
]
If you don't want to make any assumptions about how valid the structure of the JSON is, you can check it explicitly along the way, in addition to checking the city:
[
    person['name']
    for date in json_content.values() if isinstance(date, list)
    for person in date if all([
        isinstance(person, dict),
        'name' in person,
        'city' in person,
        person['city'] == 'Seattle'])
]
Both of these get you ['Sam'] for your sample json.
If you want the complete person record, and may need to extend this for the date later, I suggest the code below.
for person_records in json_content.values():
    if isinstance(person_records, list):
        for person in person_records:
            if person.get("city") == "Seattle":
                print(person)
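If the date does turn out to be needed, one option is to iterate over items() instead of values(), so each match carries its date along. This is only a sketch: the helper name people_in_city is hypothetical, and the sample data is copied from the question.

```python
# Sample data copied from the question.
json_content = {
    "12-1-2021": [
        {"initials": "IS", "name": "Sam", "age": 23,
         "address": "Freedom Dr", "city": "Seattle", "state": "WA"},
        {"initials": "SD", "name": "Sed", "age": 21,
         "address": "Washington Dr", "city": "Phoenix", "state": "AZ"},
    ]
}


def people_in_city(content, city):
    """Return (date, person) pairs for everyone living in `city`."""
    matches = []
    for date, people in content.items():
        if not isinstance(people, list):
            continue  # skip malformed entries instead of failing
        for person in people:
            if isinstance(person, dict) and person.get("city") == city:
                matches.append((date, person))
    return matches


seattle = people_in_city(json_content, "Seattle")
for date, person in seattle:
    print(date, person["name"])
```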

Traverse through a JSON file

Hi, I have a large JSON file. I'm reading the data from the JSON file and storing it in a list. I need to extract some elements from the JSON file, so I wrote this code:
l = len(alldata_json)
for i in range(l):
    df_school_us.loc[i, 'schoolName'] = alldata_json[i].get('schoolName')
    data_address = alldata_json[i].get('addressLocations')
    df_school_us.loc[i, 'Latitude'] = data_address[0].get('Location').get('latitude')
    df_school_us.loc[i, 'Longitude'] = data_address[0].get('Location').get('longitude')
    print("i= ", i)
len(alldata_json) returns 87598, and alldata_json contains my JSON data. I feel that running a for loop over this many rows is not an optimized approach. Can you suggest how to do it without a for loop?
df = pd.DataFrame(alldata_json)
df2 = pd.concat([df.drop('addressLocations', axis=1),
                 df['addressLocations'].apply(pd.Series)], axis=1)
Extracting countryCode, latitude, and longitude:
import pandas as pd
data = [{'locationType': 'ab',
         'address': {'countryCode': 'IN',
                     'city': 'Mumbai',
                     'zipCode': '5000',
                     'schoolNumber': '2252'},
         'Location': {'latitude': 19.0760,
                      'longitude': 72.8777},
         'names': [{'languageCode': 'IN', 'name': 'DPS'},
                   {'languageCode': 'IN', 'name': 'DPS'}]}]
df = pd.DataFrame(data)
# Expand each nested dict into columns, keep only the wanted ones, and join them side by side
df2 = pd.concat([df['address'].apply(pd.Series)['countryCode'],
                 df['Location'].apply(pd.Series)[['latitude', 'longitude']]],
                axis=1)
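The slow part of the original loop is most likely the repeated df.loc assignment rather than the iteration itself. A minimal sketch of an alternative, in plain Python so it runs without pandas, is to build all the rows first and create the DataFrame once at the end. The field names come from the question; the helper to_record and the sample entries are hypothetical.

```python
# Hypothetical sample entries using the field names from the question.
alldata_json = [
    {"schoolName": "DPS",
     "addressLocations": [{"Location": {"latitude": 19.0760,
                                        "longitude": 72.8777}}]},
    {"schoolName": "No address", "addressLocations": []},
]


def to_record(entry):
    """Pull the three wanted fields out of one entry, tolerating gaps."""
    locations = entry.get("addressLocations") or [{}]
    location = locations[0].get("Location") or {}
    return {
        "schoolName": entry.get("schoolName"),
        "Latitude": location.get("latitude"),
        "Longitude": location.get("longitude"),
    }


records = [to_record(entry) for entry in alldata_json]
print(records)
# With pandas available, pd.DataFrame(records) would then build the
# whole frame in a single call instead of one .loc write per cell.
```

This still iterates over every entry, but it touches the DataFrame only once.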

Convert Nested JSON into Dataframe

I have a nested JSON like below. I want to convert it into a pandas dataframe. As part of that, I also need to parse the weight value only. I don't need the unit.
I also want the number values converted from string to numeric.
Any help would be appreciated. I'm relatively new to Python. Thank you.
JSON Example:
{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'},
'gender': 'male'}
Sample output below:
id name weight gender
123 joe 100 male
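Use json_normalize (from pandas.io.json import json_normalize; in pandas 1.0 and later it is also available directly as pd.json_normalize). It flattens the nested weight field into separate columns: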
id name weight.number weight.unit gender
123 joe 100 lbs male
If you want to discard the weight unit, just flatten the JSON first:
temp = {'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}
temp['weight'] = temp['weight']['number']
Then turn it into a dataframe (the dict is wrapped in a list because all of its values are scalars):
pd.DataFrame([temp])
Something like this should do the trick:
json_data = [{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}]
# convert the data to a DataFrame
df = pd.DataFrame.from_records(json_data)
# convert id to an int
df['id'] = df['id'].apply(int)
# get the 'number' field of weight and convert it to an int
df['weight'] = df['weight'].apply(lambda x: int(x['number']))
df
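The same flattening and type conversion can be sketched in plain Python before any DataFrame is built, which keeps the pandas step trivial. The helper name flatten_record is hypothetical, and treating id and weight as integers is an assumption based on the sample values.

```python
# Sample record copied from the question.
json_data = [
    {"id": "123", "name": "joe",
     "weight": {"number": "100", "unit": "lbs"},
     "gender": "male"},
]


def flatten_record(record):
    """Return a flat copy with numeric id and weight, dropping the unit."""
    flat = dict(record)  # shallow copy so the input is left untouched
    flat["id"] = int(record["id"])
    flat["weight"] = int(record["weight"]["number"])
    return flat


flat_records = [flatten_record(record) for record in json_data]
print(flat_records)
# With pandas available, pd.DataFrame(flat_records) would give the
# id / name / weight / gender table requested in the question.
```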

Filtering through a list with embedded dictionaries

I've got a JSON-format list of dictionaries, each with a nested dictionary inside. It looks like the following:
[{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
The number of entries in the list can be up to 100. I plan to present the 'name' for each entry, one result at a time, for those that have London as the town. The rest are of no use to me. I'm a beginner at Python, so I would appreciate a suggestion on how to go about this efficiently. I initially thought it would be best to remove all entries that don't have London, and then go through the rest one by one.
I also wondered if it might be quicker not to filter, but to cycle through the entire JSON and select the names of entries that have London as the town.
You can use filter:
data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
london_dicts = filter(lambda d: d['venue']['town'] == 'London', data)
for d in london_dicts:
    print(d)
This is efficient because:
The loop itself runs in C (in CPython), although the lambda is still called once per item.
filter returns an iterator (in Python 3), which means that the results are produced one at a time as they are requested, rather than being built into a list up front.
One way is to use list comprehension:
>>> data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
...         {"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
...         {"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
...         {"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
>>> [d for d in data if d['venue']['town'] == 'London']
[{'id': 17,
'name': 'Alfred',
'venue': {'id': 456, 'town': 'London'},
'month': 'February'},
{'id': 17,
'name': 'Mary',
'venue': {'id': 56, 'town': 'London'},
'month': 'December'}]
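Since the question asks for the names one result at a time, a generator is another way to sketch this: it yields each matching name lazily, much like filter does. The function name names_in_town is hypothetical, and the use of .get() to tolerate entries without a venue is an assumption about the data.

```python
# Sample data copied from the question.
data = [
    {"id": 13, "name": "Albert", "venue": {"id": 123, "town": "Birmingham"}, "month": "February"},
    {"id": 17, "name": "Alfred", "venue": {"id": 456, "town": "London"}, "month": "February"},
    {"id": 20, "name": "David", "venue": {"id": 14, "town": "Southampton"}, "month": "June"},
    {"id": 17, "name": "Mary", "venue": {"id": 56, "town": "London"}, "month": "December"},
]


def names_in_town(entries, town):
    """Yield the name of each entry whose venue is in `town`, one at a time."""
    for entry in entries:
        # .get() avoids a KeyError if an entry has no venue or town.
        if entry.get("venue", {}).get("town") == town:
            yield entry["name"]


london_names = names_in_town(data, "London")
first = next(london_names)  # fetch a single result on demand
rest = list(london_names)   # or consume whatever remains
print(first, rest)
```

With at most 100 entries, any of these approaches should be fast enough; the choice is mostly about readability.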

Convert dataframe to dictionary as shown

My dataframe is shown below:
name   key     value
john   A223    390309
jason  B439    230943
peter  A5388   572039
john   D23902  238939
jason  F2390   23930
I want to convert the dataframe above into a dictionary in the format shown below.
{'john': {'key':'A223', 'value':'390309', 'key':'A5388', 'value':'572039'},
'jason': {'key':'B439','value':'230943', 'key':'F2390', 'value':'23930'},
'peter': {'key':'A5388' ,'value':'572039'}}
I tried a = dict(zip(dataframe['key'], dataframe['value'])),
but that doesn't give me the dataframe column headers.
Dictionary keys must be unique, so the repeated 'key' and 'value' entries in your desired output cannot all be stored in one dictionary.
Assuming you want to keep only the row with the first instance of each name, you can drop the later duplicates and then use to_dict with orient='index' (recent pandas versions raise a ValueError for orient='index' when the index is not unique):
res = df.drop_duplicates('name').set_index('name').to_dict('index')
print(res)
{'john': {'key': 'A223', 'value': 390309},
 'jason': {'key': 'B439', 'value': 230943},
 'peter': {'key': 'A5388', 'value': 572039}}
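If every row should be kept rather than only the first per name, one possible shape is a list of key/value dicts under each name. The sketch below uses plain Python tuples as a stand-in for the dataframe rows so that it runs without pandas; this output shape is an assumption, not something the asker confirmed.

```python
from collections import defaultdict

# Rows copied from the dataframe in the question, as (name, key, value).
rows = [
    ("john", "A223", 390309),
    ("jason", "B439", 230943),
    ("peter", "A5388", 572039),
    ("john", "D23902", 238939),
    ("jason", "F2390", 23930),
]

# Collect every key/value pair under its name instead of overwriting.
grouped = defaultdict(list)
for name, key, value in rows:
    grouped[name].append({"key": key, "value": value})

result = dict(grouped)
print(result)
```

With a real dataframe, the same tuples could be obtained from df.itertuples(index=False), assuming the columns are in the order name, key, value.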
