What is the best way to flatten a nested dictionary in Python? This is the JSON file column value I have, whose inferred type is:

list[Union[dict[str, Union[dict[str, str], str]], dict[str, Union[dict[str, str], str]], dict[str, Union[dict[str, int], str]],
     dict[str, Union[dict[str, int], str]], dict[str, Union[dict[str, int], str]], dict[str, Union[dict[str, str], str]],
     dict[str, Union[dict[str, int], str]]]]

I tried a recursive loop function and the json_normalize function, but both convert it into a single row with multiple columns. Ideally, it should become multiple rows, one per dict value.
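A minimal sketch of one approach, assuming the cell holds a Python list of dicts like the type above and pandas >= 1.0 (where json_normalize is a top-level function); the sample records here are hypothetical. Calling pd.json_normalize on the list itself, rather than on the enclosing row, yields one row per dict:

import pandas as pd

# Hypothetical records matching the shape described above: a list of
# dicts whose values are strings/ints or nested dicts.
records = [
    {"name": "a", "meta": {"unit": "mm"}},
    {"name": "b", "meta": {"count": 3}},
]

# Normalizing the list itself yields one row per dict;
# nested keys become dotted column names.
df = pd.json_normalize(records)
print(df)
#   name meta.unit  meta.count
# 0    a        mm         NaN
# 1    b       NaN         3.0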
I have a file containing miscellaneous parameters, stored as csv. I'm loading it into Python using pandas.read_csv, which very conveniently returns a DataFrame with one column. But I don't need any of the fancy pandas features, so I immediately convert my data to a dict.
My next problem is that the parameters all have different types: some are integers, some may be floats, and some are strings. pandas usually loads them with dtype=object, and they go into the dict as strings. (Except sometimes they're all numeric, so I get a numeric dtype right away.) I wrote a simple function that attempts to identify numeric types and convert them appropriately.
header = {key: typecast(value)
          for (key, value) in dict(header['Value']).items()}

def typecast(value):
    """Convert a string to a numeric type if possible

    >>> def cast_with_type(s):
    ...     return (result := typecast(s), type(result))
    >>> cast_with_type(123)
    (123, <class 'int'>)
    >>> cast_with_type('foo')
    ('foo', <class 'str'>)
    >>> cast_with_type('123')
    (123, <class 'int'>)
    >>> cast_with_type('123.4')
    (123.4, <class 'float'>)
    """
    try:
        if '.' in value:
            return float(value)
        else:
            return int(value)
    except TypeError:
        return value
    except ValueError:
        return value
Is there a built-in feature that does this better than what I already have?
The way I understand it, you want to convert every column to a numeric dtype and then turn the frame into a dictionary. For this you can use the astype() method
(https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html) to convert it into the required dtype, and then convert your dataframe to a dictionary.
df = df.astype(float)
my_dict = df.to_dict()
Your dict can be oriented any way you want; refer to https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html.
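For example, a minimal sketch (the column values here are made up):

import pandas as pd

# hypothetical single-column frame of string-encoded numbers
df = pd.DataFrame({'Value': ['1', '2.5', '3']})
df = df.astype(float)

print(df.to_dict())                  # {column -> {index -> value}}, the default
print(df.to_dict(orient='records'))  # one dict per row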
I am trying to retrieve the datatypes of the values in each cell of a pandas dataframe using the code below:
import pandas as pd

dfin = pd.read_csv(path, dtype=object)
d = {"<class 'datetime.datetime'>": 'DateTime.Type',
     "<class 'int'>": 'int',
     "<class 'float'>": 'float',
     "<class 'str'>": 'str'}
dftypes = dfin.applymap(type).astype(str).replace(d)
My dataframe contains mixed-type columns, and the dtype=object parameter is intended to keep the cell values' types from being inferred on a per-column basis.
This code generates and maps the proper datatypes when dfin is read from an xlsx file (pd.read_excel()), but not when read from a standard csv file (pd.read_csv()).
I want to be able to read in the data from a csv and then determine the datatypes cell by cell, but everything is detected as str or null (float). Is there a fix here, or can you recommend another method to get this result?
Example:
Given dfin:
Column A    Column B    Column C
1.4         4           NaN
'yes'       3.2         5
I want to return dftypes:
Column A    Column B    Column C
float       int         float
str         float       int
(works with read_excel())
With read_csv() the actual return is:
Column A    Column B    Column C
str         str         float
str         str         str
Could you use a try/except block to convert the string first to float and then to int, returning 'int' if both conversions succeed, 'float' if only the float conversion succeeds, and 'str' if neither does?
e.g.
def get_data_type(value):
    try:
        float(value)
    except ValueError:
        return 'str'
    else:
        try:
            int(value)
            return 'int'
        except ValueError:
            return 'float'

dfin.applymap(get_data_type)
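A quick check against the example data from the question; the in-memory CSV below stands in for the real file. The empty Column C cell comes back as a float NaN even with dtype=object, so it is classified as 'float':

import pandas as pd
from io import StringIO

csv_text = "Column A,Column B,Column C\n1.4,4,\nyes,3.2,5\n"
dfin = pd.read_csv(StringIO(csv_text), dtype=object)

print(dfin.applymap(get_data_type))
#   Column A Column B Column C
# 0    float      int    float
# 1      str    float      int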
I'm creating a datatable as follows,
spotify_songs_dt = dt.fread('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')
and its column types are,
spotify_songs_dt.stypes
Here I would like to take out only the numeric columns of the DT. How can that be achieved the datatable way? In a pandas dataframe we have select_dtypes() for this.
If you have a frame DT, then the most straightforward way to select columns of a specific type is to use the type itself in the DT[:,j] selector:
DT[:, bool] # all boolean columns
DT[:, int] # all integer columns
DT[:, float] # all floating columns
DT[:, str] # string columns
DT[:, dt.int32] # columns with stype int32
DT[:, dt.ltype.int] # columns with ltype `int`, same as DT[:, int]
It is also possible to provide a list of types to select:
DT[:, [int, float]] # integer and floating columns
DT[:, [dt.int32, dt.int64]] # int32 and int64 columns
Sometimes it may also be useful to delete the columns of an unwanted type instead of selecting the ones you need:
del DT[:, str]
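Applied to the frame from the question, a minimal sketch:

import datatable as dt

spotify_songs_dt = dt.fread('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')

# keep only the integer and floating-point columns
numeric_dt = spotify_songs_dt[:, [int, float]]
print(numeric_dt.names)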
How can I specify dtypes for each column when doing pd.DataFrame(data)? The documentation says "Only a single dtype is allowed", but I have multiple columns with different types.
How can I do this?
df = pd.DataFrame(ag, dtype={'float_col': float, "int_col": int, "other": object})
Without getting this error?
TypeError: data type not understood
IIUC, use pandas.DataFrame.astype:
df = pd.DataFrame(ag).astype({'float_col': float, "int_col": int, "other": object})
print(df.dtypes)
Output:
float_col float64
int_col int32
other object
dtype: object
As opposed to the pandas.DataFrame constructor, astype can handle a dict of column name -> data type.
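A self-contained version of the same idea, with made-up data standing in for the asker's ag (the exact integer width in the output depends on the platform):

import pandas as pd

# hypothetical input data in place of `ag`
ag = {'float_col': ['1.5', '2.5'], 'int_col': ['1', '2'], 'other': ['a', 'b']}

df = pd.DataFrame(ag).astype({'float_col': float, 'int_col': int, 'other': object})
print(df.dtypes)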
I am using Python's csv.DictReader to read values from a CSV file into a dictionary, where the keys come from the first (header) row and the remaining rows supply the values. It works as expected and I get a dictionary, but I only want certain keys in the dictionary rather than every column. What is the best way to do this? I tried csv.reader, but I don't think it has this functionality. Maybe this can be achieved using pandas?
Here is the code I was using with the csv module, where Fieldnames is the list of keys I wanted to retain in my dict. I have since realized that the fieldnames argument isn't used for what I described above.
import csv

with open(target_path + target_file) as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=Fieldnames)
    for i in reader:
        print(i)
You can do this very simply using pandas.
import pandas as pd
# get only the columns you want from the csv file
df = pd.read_csv(target_path + target_file, usecols=['Column Name1', 'Column Name2'])
result = df.to_dict(orient='records')
Sources:
pandas.read_csv
pandas.DataFrame.to_dict
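A self-contained check with an in-memory CSV (the file contents here are made up); only the requested columns appear as keys in each row's dict:

import pandas as pd
from io import StringIO

csv_text = "Column Name1,Column Name2,Extra\n1,a,x\n2,b,y\n"
df = pd.read_csv(StringIO(csv_text), usecols=['Column Name1', 'Column Name2'])
print(df.to_dict(orient='records'))
# [{'Column Name1': 1, 'Column Name2': 'a'}, {'Column Name1': 2, 'Column Name2': 'b'}]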
You can use the to_dict method to get a list of dicts:
import pandas as pd

df = pd.read_csv(target_path + target_file, names=Fieldnames)
records = df.to_dict(orient='records')
for row in records:
    print(row)
to_dict documentation:
In [67]: df.to_dict?
Signature: df.to_dict(orient='dict')
Docstring:
Convert DataFrame to dictionary.

Parameters
----------
orient : str {'dict', 'list', 'series', 'split', 'records', 'index'}
    Determines the type of the values of the dictionary.

    - dict (default) : dict like {column -> {index -> value}}
    - list : dict like {column -> [values]}
    - series : dict like {column -> Series(values)}
    - split : dict like
      {index -> [index], columns -> [columns], data -> [values]}
    - records : list like
      [{column -> value}, ... , {column -> value}]
    - index : dict like {index -> {column -> value}}

      .. versionadded:: 0.17.0

    Abbreviations are allowed. `s` indicates `series` and `sp`
    indicates `split`.

Returns
-------
result : dict like {column -> {index -> value}}

File: /usr/local/lib/python2.7/dist-packages/pandas/core/frame.py
Type: instancemethod