Flattening into multiple rows in Python

What is the best way to flatten a nested dictionary in Python? The JSON column value I have has the following inferred type, i.e. a list of dicts whose values are either nested dicts (str -> str or str -> int) or plain strings:

list[dict[str, Union[dict[str, str], dict[str, int], str]]]

I tried a recursive loop function and json_normalize, but both of them produce a single row with many columns. Ideally this should become multiple rows, one per dict in the list.
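For what it's worth, a minimal sketch of one way to get one row per dict, assuming the column value really is a list of dicts like the type above suggests (the field names here are made up):

import pandas as pd

# Hypothetical records matching the type shown above: a list of dicts whose
# values are either nested dicts or plain strings
records = [
    {"name": "a", "details": {"score": 10}, "status": "ok"},
    {"name": "b", "details": {"score": 20}, "status": "fail"},
]

# Passing the whole list (not a single dict) to json_normalize gives one row
# per list element, with nested keys expanded into dotted columns
df = pd.json_normalize(records)
print(df)
#   name status  details.score
# 0    a     ok             10
# 1    b   fail             20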

Related

Convert pandas Series to dict with type conversion?

I have a file containing miscellaneous parameters, stored as csv. I'm loading it into Python using pandas.read_csv, which very conveniently returns a DataFrame with one column. But I don't need any of the fancy pandas features, so I immediately convert my data to a dict.
My next problem is that the parameters all have different types: some are integers, some may be floats, and some are strings. pandas usually loads them with dtype=object, and they go into the dict as strings. (Except sometimes they're all numeric, so I get a numeric dtype right away.) I wrote a simple function that attempts to identify numeric types and convert them appropriately.
def typecast(value):
    """Convert a string to a numeric type if possible.

    >>> def cast_with_type(s):
    ...     return (result := typecast(s), type(result))
    >>> cast_with_type(123)
    (123, <class 'int'>)
    >>> cast_with_type('foo')
    ('foo', <class 'str'>)
    >>> cast_with_type('123')
    (123, <class 'int'>)
    >>> cast_with_type('123.4')
    (123.4, <class 'float'>)
    """
    try:
        if '.' in value:
            return float(value)
        else:
            return int(value)
    except (TypeError, ValueError):
        return value

header = {key: typecast(value)
          for (key, value) in dict(header['Value']).items()}
Is there a built-in feature that does this better than what I already have?
The way I understand it, you want to convert every column to a numeric dtype and then turn the result into a dictionary. For that you can use the astype() method
(https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html) to convert to the required dtype and then convert the DataFrame to a dictionary.
df = df.astype(float)
my_dict = df.to_dict()
The dict can be oriented any way you want; see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html for the available orient options.
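For instance, on a toy single-column frame standing in for the asker's header data (a sketch that assumes every value is a numeric string; genuinely non-numeric entries would make astype(float) raise):

import pandas as pd

# Toy stand-in for the one-column frame from the question; all values are numeric strings
df = pd.DataFrame({"Value": ["1", "2.5", "3"]}, index=["a", "b", "c"])

df = df.astype(float)
my_dict = df.to_dict()
print(my_dict)   # {'Value': {'a': 1.0, 'b': 2.5, 'c': 3.0}}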

Datatype detection cell by cell in dataframe

I am trying to retrieve the datatypes of the values in each cell of a pandas dataframe using the code below:
import pandas as pd

dfin = pd.read_csv(path, dtype=object)

d = {"<class 'datetime.datetime'>": 'DateTime.Type',
     "<class 'int'>": 'int',
     "<class 'float'>": 'float',
     "<class 'str'>": 'str'}

dftypes = dfin.applymap(type).astype(str).replace(d)
My dataframe contains mixed-type columns, and the dtype=object argument is intended to keep the cell values from being coerced to a single type per column.
This code generates and maps the proper datatypes when dfin is read from an xlsx file (pd.read_excel()), but not when it is read from a standard csv file (pd.read_csv()).
I want to be able to read in the data from a csv and then determine the datatypes cell by cell, but everything is detected as str or null (float). Is there a fix here, or can you recommend another method to get this result?
Example:
Given dfin:
Column A   Column B   Column C
1.4        4          NaN
'yes'      3.2        5
I want to return dftypes:
Column A   Column B   Column C
float      int        float
str        float      int
(works with read_excel())
With read_csv() the actual return is:
Column A   Column B   Column C
str        str        float
str        str        str
You could use a try/except block: try to convert the string to float, then to int, and return 'int' or 'float' if that succeeds, or 'str' if it doesn't.
e.g.
def get_data_type(value):
    try:
        float(value)
    except ValueError:
        return 'str'        # not numeric at all
    else:
        try:
            int(value)
            return 'int'    # parses cleanly as an integer
        except ValueError:
            return 'float'  # numeric but not an integer (includes NaN)

dfin.applymap(get_data_type)
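As a quick check on the example data (a sketch; dfin is rebuilt here the way read_csv(..., dtype=object) would deliver it, with every non-missing cell a string and the missing cell a float NaN):

import numpy as np
import pandas as pd

# Mirrors read_csv(dtype=object): strings everywhere, NaN for the missing cell
dfin = pd.DataFrame({
    "Column A": ["1.4", "'yes'"],
    "Column B": ["4", "3.2"],
    "Column C": [np.nan, "5"],
})

# get_data_type is the function defined above
print(dfin.applymap(get_data_type))
#   Column A Column B Column C
# 0    float      int    float
# 1      str    float      int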

How to select columns based on their data types in pydatatable?

I'm creating a datatable as follows,
spotify_songs_dt = dt.fread('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')
and its column types are,
spotify_songs_dt.stypes
Here I would like to take out only the numeric fields of the frame. How can that be achieved in a datatable way? In pandas DataFrames there is a function select_dtypes() for this.
If you have a frame DT, then the most straightforward way to select columns of a specific type is to use the type itself in the DT[:,j] selector:
DT[:, bool] # all boolean columns
DT[:, int] # all integer columns
DT[:, float] # all floating columns
DT[:, str] # string columns
DT[:, dt.int32] # columns with stype int32
DT[:, dt.ltype.int] # columns with ltype `int`, same as DT[:, int]
It is also possible to provide a list of types to select:
DT[:, [int, float]] # integer and floating columns
DT[:, [dt.int32, dt.int64]] # int32 and int64 columns
Sometimes it may also be useful to delete the columns of an undesired type instead of selecting the ones you need:
del DT[:, str]
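Applied to the frame from the question, pulling out just the numeric columns would look something like this (a sketch; which columns qualify depends on the types fread infers from the CSV):

import datatable as dt

spotify_songs_dt = dt.fread(
    'https://raw.githubusercontent.com/rfordatascience/tidytuesday/'
    'master/data/2020/2020-01-21/spotify_songs.csv'
)

# datatable's analogue of pandas' select_dtypes(include='number'):
# keep only the integer and floating-point columns
numeric_dt = spotify_songs_dt[:, [int, float]]
print(numeric_dt.names)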

Pandas multiple dtype at import

How can I specify dtypes for each column when doing pd.DataFrame(data)? The documentation says "Only a single dtype is allowed", but I have multiple columns with different types.
How can I do this?
df = pd.DataFrame(ag, dtype={'float_col': float, "int_col": int, "other": object})
Without getting this error?
TypeError: data type not understood
IIUC, use pandas.DataFrame.astype:
df = pd.DataFrame(ag).astype({'float_col': float, "int_col": int, "other": object})
print(df.dtypes)
Output:
float_col float64
int_col int32
other object
dtype: object
Unlike the pandas.DataFrame constructor, astype accepts a dict of column name -> data type.
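For example, with a made-up ag (the original data isn't shown in the question, so this stand-in is hypothetical):

import pandas as pd

# Hypothetical stand-in for `ag`
ag = [("1.5", "2", "x"), ("3.0", "4", "y")]

df = pd.DataFrame(ag, columns=["float_col", "int_col", "other"]).astype(
    {"float_col": float, "int_col": int, "other": object}
)
print(df.dtypes)
# float_col    float64
# int_col        int64   (int32 on Windows, as in the output above)
# other         object
# dtype: object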

python csv to dictionary using csv or pandas module

I am using Python's csv.DictReader to read in values from a CSV file to create a dictionary where keys are first row or headers in the CSV and other rows are values. It works perfectly as expected and I am able to get a dictionary, but I only want certain keys to be in the dictionary rather than all of the column values. What is the best way to do this? I tried using csv.reader but I don't think it has this functionality. Maybe this can be achieved using pandas?
Here is the code I was using with the csv module, where Fieldnames is the list of keys I wanted to keep in my dict. I realized the fieldnames argument isn't used for what I described above.
import csv

with open(target_path + target_file) as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=Fieldnames)
    for i in reader:
        print(i)
You can do this very simply using pandas.
import pandas as pd
# get only the columns you want from the csv file
df = pd.read_csv(target_path + target_file, usecols=['Column Name1', 'Column Name2'])
result = df.to_dict(orient='records')
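A small illustration with made-up file contents (hypothetical; only the columns passed to usecols end up in each row's dict):

import io
import pandas as pd

# Hypothetical CSV contents standing in for target_path + target_file
csv_text = "Column Name1,Column Name2,Ignored\n1,a,x\n2,b,y\n"

df = pd.read_csv(io.StringIO(csv_text), usecols=['Column Name1', 'Column Name2'])
print(df.to_dict(orient='records'))
# one dict per row, e.g. [{'Column Name1': 1, 'Column Name2': 'a'}, {'Column Name1': 2, 'Column Name2': 'b'}]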
Sources:
pandas.read_csv
pandas.DataFrame.to_dict
You can use the to_dict method to get a list of dicts:
import pandas as pd

df = pd.read_csv(target_path + target_file, names=Fieldnames)
records = df.to_dict(orient='records')
for row in records:
    print(row)
to_dict documentation:
In [67]: df.to_dict?
Signature: df.to_dict(orient='dict')
Docstring:
Convert DataFrame to dictionary.
Parameters
----------
orient : str {'dict', 'list', 'series', 'split', 'records', 'index'}
Determines the type of the values of the dictionary.
- dict (default) : dict like {column -> {index -> value}}
- list : dict like {column -> [values]}
- series : dict like {column -> Series(values)}
- split : dict like
{index -> [index], columns -> [columns], data -> [values]}
- records : list like
[{column -> value}, ... , {column -> value}]
- index : dict like {index -> {column -> value}}
.. versionadded:: 0.17.0
Abbreviations are allowed. `s` indicates `series` and `sp`
indicates `split`.
Returns
-------
result : dict like {column -> {index -> value}}
File: /usr/local/lib/python2.7/dist-packages/pandas/core/frame.py
Type: instancemethod
