Python: CSV to dictionary using the csv or pandas module

I am using Python's csv.DictReader to read values from a CSV file into a dictionary, where the keys come from the first (header) row and the remaining rows supply the values. It works as expected and I get a dictionary, but I only want certain keys in the dictionary rather than every column. What is the best way to do this? I tried csv.reader, but I don't think it has this functionality. Maybe this can be achieved with pandas?
Here is the code I was using with the csv module, where Fieldnames held the keys I wanted to retain in my dict. I have since realized that fieldnames isn't used for what I described above.
import csv

with open(target_path + target_file) as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=Fieldnames)
    for i in reader:
        print(i)
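For reference, the filtering you describe can be done with the csv module alone by dropping unwanted keys from each row. A minimal sketch with made-up data (the column names and the io.StringIO stand-in for the file are hypothetical):

```python
import csv
import io

# Hypothetical in-memory stand-in for open(target_path + target_file)
csvfile = io.StringIO("name,age,city\nAlice,30,Paris\nBob,25,Rome\n")

wanted = {'name', 'city'}  # the keys to retain
reader = csv.DictReader(csvfile)  # headers come from the first row
rows = [{k: v for k, v in row.items() if k in wanted} for row in reader]
print(rows)  # [{'name': 'Alice', 'city': 'Paris'}, {'name': 'Bob', 'city': 'Rome'}]
```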

You can do this very simply using pandas.
import pandas as pd
# get only the columns you want from the csv file
df = pd.read_csv(target_path + target_file, usecols=['Column Name1', 'Column Name2'])
result = df.to_dict(orient='records')
Sources:
pandas.read_csv
pandas.DataFrame.to_dict
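As a quick end-to-end check, here is the same idea run against an in-memory CSV (the column names and data are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical CSV contents; 'Extra' is a column we do not want
csv_text = "Column Name1,Column Name2,Extra\n1,a,x\n2,b,y\n"
df = pd.read_csv(io.StringIO(csv_text), usecols=['Column Name1', 'Column Name2'])
result = df.to_dict(orient='records')
print(result)  # [{'Column Name1': 1, 'Column Name2': 'a'}, {'Column Name1': 2, 'Column Name2': 'b'}]
```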

You can use the to_dict method to get a list of dicts:
import pandas as pd

df = pd.read_csv(target_path + target_file, names=Fieldnames)
records = df.to_dict(orient='records')
for row in records:
    print(row)
The to_dict documentation:

In [67]: df.to_dict?
Signature: df.to_dict(orient='dict')
Docstring:
Convert DataFrame to dictionary.

Parameters
----------
orient : str {'dict', 'list', 'series', 'split', 'records', 'index'}
    Determines the type of the values of the dictionary.

    - dict (default) : dict like {column -> {index -> value}}
    - list : dict like {column -> [values]}
    - series : dict like {column -> Series(values)}
    - split : dict like
      {index -> [index], columns -> [columns], data -> [values]}
    - records : list like
      [{column -> value}, ... , {column -> value}]
    - index : dict like {index -> {column -> value}}

      .. versionadded:: 0.17.0

    Abbreviations are allowed. `s` indicates `series` and `sp`
    indicates `split`.

Returns
-------
result : dict like {column -> {index -> value}}

File: /usr/local/lib/python2.7/dist-packages/pandas/core/frame.py
Type: instancemethod
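To make the orientations concrete, here is a small sketch comparing the three most common ones on a toy DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# 'dict' (default): {column -> {index -> value}}
print(df.to_dict())                  # {'a': {0: 1, 1: 2}, 'b': {0: 3, 1: 4}}
# 'list': {column -> [values]}
print(df.to_dict(orient='list'))     # {'a': [1, 2], 'b': [3, 4]}
# 'records': one dict per row
print(df.to_dict(orient='records'))  # [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]
```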


Pandas dataframe map to a new list of objects [duplicate]

I am new to Python, so every tip will be helpful :)
I have a pandas DataFrame with multiple columns, and I need to convert it to a new list of objects. Among all of the DataFrame's columns there are two (lat, lon) that I want in my new objects as attributes.
index  city    lat    lon
0      London  42.33  55.44
1      Rome    92.44  88.11
My new list of objects needs to look something like this:
[
{'lat': 42.33, 'lon': 55.44},
{'lat': 92.44, 'lon': 88.11}
]
More specifically I need this for Machine Learning with ML Studio.
Thanks!
Use pandas.DataFrame.to_dict(orient) to convert a DataFrame into a dictionary. There are multiple dictionary orientations; for your case, use orient='records'.
You also want to only select the lat & lon columns, like this:
df[['lat','lon']].to_dict(orient='records')
This will give you your result:
[{'lat': 42.33, 'lon': 55.44}, {'lat': 92.44, 'lon': 88.11}]
Here are some other orientations you could try out:
‘dict’ (default) : dict like {column -> {index -> value}}
‘list’ : dict like {column -> [values]}
‘series’ : dict like {column -> Series(values)}
‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
‘records’ : list like [{column -> value}, … , {column -> value}]
‘index’ : dict like {index -> {column -> value}}
You can choose the columns you want and then use to_dict with orient='records' to get the required result:
df[["lat", "lon"]].to_dict(orient='records')

Creating List Comprehension using pandas dataframe

I am new to pandas, and I would appreciate any help. I have a pandas DataFrame that comes from a CSV file. The data contains two columns: dates and cashflows. Is it possible to convert these columns into a list of tuples (e.g., with a list comprehension)? Here is how my dataset looks:
2021/07/15 4862.306832
2021/08/15 3474.465543
2021/09/15 7121.260118
The desired output is :
[(2021/07/15, 4862.306832),
(2021/08/15, 3474.465543),
(2021/09/15, 7121.260118)]
Use apply with a lambda function:

import pandas as pd

data = {
    "date": ["2021/07/15", "2021/08/15", "2021/09/15"],
    "value": ["4862.306832", "3474.465543", "7121.260118"],
}
df = pd.DataFrame(data)
pairs = df.apply(lambda x: (x["date"], x["value"]), axis=1).tolist()
Output:
[('2021/07/15', '4862.306832'),
('2021/08/15', '3474.465543'),
('2021/09/15', '7121.260118')]
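A row-wise apply works, but for just two columns a plain zip over the column values is simpler (and avoids a Python function call per row). A sketch with the same data:

```python
import pandas as pd

data = {
    "date": ["2021/07/15", "2021/08/15", "2021/09/15"],
    "value": ["4862.306832", "3474.465543", "7121.260118"],
}
df = pd.DataFrame(data)

# Pair up the two columns element-wise
pairs = list(zip(df["date"], df["value"]))
print(pairs)  # [('2021/07/15', '4862.306832'), ('2021/08/15', '3474.465543'), ('2021/09/15', '7121.260118')]
```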

dataframe using list vs dictionary

import pandas as pd
pincodes = [800678,800456]
numbers = [2567890, 256757]
labels = ['R','M']
first = pd.DataFrame({'Number': numbers, 'Pincode': pincodes},
                     index=labels)
print(first)
The above code gives me the following (correct) dataframe.
    Number  Pincode
R  2567890   800678
M   256757   800456
But, when I use this statement,
second = pd.DataFrame([numbers, pincodes],
                      index=labels, columns=['Number', 'Pincode'])
print(second)
then I get the following (incorrect) output.
    Number  Pincode
R  2567890   256757
M   800678   800456
As you can see, the two Data Frames are different. Why does this happen? What's so different in this dictionary vs list approach?
The constructor of pd.DataFrame() includes this documentation.
Init signature: pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Docstring:
...
Parameters
----------
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
Dict can contain Series, arrays, constants, or list-like objects
.. versionchanged :: 0.23.0
If data is a dict, column order follows insertion-order for
Python 3.6 and later.
.. versionchanged :: 0.25.0
If data is a list of dicts, column order follows insertion-order
for Python 3.6 and later.
The key word is column. In the first approach, you correctly tell pandas that numbers is the column with label 'Number'. But in the second approach, you tell pandas that the columns are 'Number' and 'Pincode' and to take the data from the list of lists [numbers, pincodes]. A list of lists is read row by row: numbers becomes the first row and pincodes the second, so the first element of each inner list ends up in the 'Number' column and the second in the 'Pincode' column.
If you want to enter your data this way (not as a dictionary), you need to transpose the list of lists.
>>> import numpy as np
>>> # old way
>>> pd.DataFrame([numbers, pincodes],
...              index=labels, columns=['Number', 'Pincode'])
    Number  Pincode
R  2567890   256757
M   800678   800456
>>> # Transpose the data instead so the rows become the columns.
>>> pd.DataFrame(np.transpose([numbers, pincodes]),
...              index=labels, columns=['Number', 'Pincode'])
    Number  Pincode
R  2567890   800678
M   256757   800456

How can I flatten a nested JSON with single quotes with Pandas?

I need to analyze metadata from here: http://jmcauley.ucsd.edu/data/amazon/links.html
However, the metadata JSON files there are nested and use single quotes, not double quotes, so I can't use json_normalize to flatten the data into a pandas DataFrame.
Example:
{'A':'1', 'B':{'c':['1','2'], 'd':['3','4']}}
I need to flatten this into a Pandas data frame with objects A B.c B.d
Following the guideline given in the link, I used eval to get A and B, but I can't get B.c and B.d.
Could you please suggest a way to do this?
That's a Python dict, not JSON. If you want to convert it to a DataFrame, just do this:
d = {'A':'1', 'B':{'c':['1','2'], 'd':['3','4']}}
df = pd.DataFrame(d)
   A       B
c  1  [1, 2]
d  1  [3, 4]
If your problem is loading this text into a Python dict, you can try a couple of things:
replace the single quotes: json.loads(data.replace("'", '"'))
read it as a Python dict: eval(data)
JSON cannot have keys or values enclosed in single quotes.
If you have to parse a string with single quotes as a dict, then you can use ast.literal_eval (safer than eval) and normalize the result:

import ast
from pandas.io.json import json_normalize

data = str({'A':'1', 'B':{'c':['1','2'], 'd':['3','4']}})
data_dict = ast.literal_eval(data)
data_normalized = json_normalize(data_dict)
https://stackoverflow.com/a/21154138/13561487
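Putting the two steps together, here is a runnable sketch (using pd.json_normalize, the modern spelling of json_normalize) that parses one single-quoted record and flattens the nested keys into dotted column names:

```python
import ast

import pandas as pd

# A single-quoted metadata record, as in the question
raw = "{'A':'1', 'B':{'c':['1','2'], 'd':['3','4']}}"

record = ast.literal_eval(raw)    # safely parse the Python-literal dict
flat = pd.json_normalize(record)  # nested keys become 'B.c', 'B.d'
print(sorted(flat.columns))       # ['A', 'B.c', 'B.d']
```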

Construct a MultiIndex pandas DataFrame from a nested Python dictionary

I would like to construct a MultiIndex DataFrame from a deeply-nested dictionary of the form
md = {'50': {'100': {'col1': ('0.100', '0.200', '0.300', '0.400'),
                     'col2': ('6.263E-03', '6.746E-03', '7.266E-03', '7.825E-03')},
             '101': {'col1': ('0.100', '0.200', '0.300', '0.400'),
                     'col2': ('6.510E-03', '7.011E-03', '7.553E-03', '8.134E-03')},
             '102': ...},
      '51': ...}
I've tried
df = pd.DataFrame.from_dict({(i, j): md[i][j][v]
                             for i in md.keys()
                             for j in md[i].keys()
                             for v in md[i][j]}, orient='index')
following Construct pandas DataFrame from items in nested dictionary, but I get a DataFrame with 1 row and many columns.
Bonus:
I'd also like to label the MultiIndex keys and the columns 'col1' and 'col2', as well as convert the strings to int and float, respectively.
How can I reconstruct my original dictionary from the dataframe?
I tried df.to_dict('list').
Check out this answer: https://stackoverflow.com/a/24988227/9404057. This method unpacks the keys and values of the dictionary and reshapes the data into a format that is easy to build a MultiIndex DataFrame from. Note that if you are using Python 3.5+, you will need to use .items() rather than .iteritems() as shown in the linked answer:
>>> import pandas as pd
>>> reform = {(firstKey, secondKey, thirdKey): values
...           for firstKey, middleDict in md.items()
...           for secondKey, innerdict in middleDict.items()
...           for thirdKey, values in innerdict.items()}
>>> df = pd.DataFrame(reform)
To replace the string labels 'col1' and 'col2' with an int and a float, you can then use pandas.DataFrame.rename() and specify any values you want (note that this changes the column labels, not the dtype of the data):
df.rename({'col1': 1, 'col2': 2.5}, axis=1, level=2, inplace=True)
Also, if you'd rather have the levels on the index rather than the columns, you can use pandas.DataFrame.T.
If you wanted to reconstruct your dictionary from this MultiIndex, you could do something like this:
>>> md2 = {}
>>> for i in df.columns:
...     if i[0] not in md2.keys():
...         md2[i[0]] = {}
...     if i[1] not in md2[i[0]].keys():
...         md2[i[0]][i[1]] = {}
...     md2[i[0]][i[1]][i[2]] = tuple(df[i[0]][i[1]][i[2]].values)
