How to convert a pandas DataFrame and NumPy array into a dictionary? - Python

I have the following pandas dataframe which looks like,
code comp name
0 A292340 디비자산운용 마이티 200커버드콜ATM레버리지
1 A291630 키움투자자산운용 KOSEF 코스닥150선물레버리지
2 A278240 케이비자산운용 KBSTAR 코스닥150선물레버리지
3 A267770 미래에셋자산운용 TIGER 200선물레버리지
4 A267490 케이비자산운용 KBSTAR 미국장기국채선물레버리지(합성 H)
And I would like to make a dictionary out of this, which will look like:
{'20180408': {'A292340': {'comp': '디비자산운용', 'name': '마이티 200커버드콜ATM레버리지'}}}
Sorry that the data is in a foreign language, but please let me ask.
What I tried is like,
values = [comp, name]
names = ['comp', 'name']
tmp = {names:values for names, values in zip(names, values)}
tpm = {code:values for values in zip(*tmp)}
aaaa = {date:c for c in zip(*tpm)}
print(aaaa)
aaaa is what I am trying to get, and date is just a simple list of dates, from a prior date to today. But when I run this, I get the error
TypeError: unhashable type: 'numpy.ndarray'
Thank you in advance for your answer.

Is this what you want? First, set the "code" column as the index, then use to_dict with orient="index".
df.set_index("code").to_dict("index")
{'A267490': {'comp': '케이비자산운용', 'name': 'KBSTAR 미국장기국채선물레버리지(합성 H)'},
'A267770': {'comp': '미래에셋자산운용', 'name': 'TIGER 200선물레버리지'},
'A278240': {'comp': '케이비자산운용', 'name': 'KBSTAR 코스닥150선물레버리지'},
'A291630': {'comp': '키움투자자산운용', 'name': 'KOSEF 코스닥150선물레버리지'},
'A292340': {'comp': '디비자산운용', 'name': '마이티 200커버드콜ATM레버리지'}}
Using the argument "index" will give the layout:
{index -> {columnName -> valueOfTheColumn}}
Here since we set code as the index, we have
code -> {"comp" -> comp's value, "name" -> name's value}
'A267490': {'comp': '케이비자산운용', 'name': 'KBSTAR 미국장기국채선물레버리지(합성 H)'}
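If you also need the date as the outermost key, as in your desired output, you can simply wrap the result. A minimal sketch, using hypothetical English placeholder values instead of the original Korean data:

```python
import pandas as pd

# Hypothetical two-row stand-in for the question's dataframe
df = pd.DataFrame({
    "code": ["A292340", "A291630"],
    "comp": ["DB Asset Management", "Kiwoom Investment Asset Management"],
    "name": ["Mighty 200 Covered Call ATM Leverage", "KOSEF KOSDAQ150 Futures Leverage"],
})

date = "20180408"  # the outer date key from the desired output
# Wrap the per-code dict in one more dict keyed by the date
result = {date: df.set_index("code").to_dict("index")}
```

`result["20180408"]["A292340"]` then gives `{'comp': ..., 'name': ...}`, matching the nesting you described.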

Related

Detecting Excel column data types in Python Pandas

New to Python and Pandas here. I am trying to read an Excel file off of S3 (using boto3) and read the headers (first row of the spreadsheet) and determine what data type each header is, if this is possible to do. If it is, I need a map of key-value pairs where each key is the header name and value is its data type. So for example if the file I fetch from S3 has the following data in it:
Date,Name,Balance
02/01/2022,Jerry Jingleheimer,45.07
02/14/2022,Jane Jingleheimer,102.29
Then I would be looking for a map of KV pairs like so:
Key 1: "Date", Value 1: "datetime" (or whatever the appropriate data type is)
Key 2: "Name", Value 2: "string" (or whatever the appropriate data type is)
Key 3: "Balance", Value 3: "numeric" (or whatever the appropriate data type is)
So far I have:
import io
import boto3
import pandas as pd

s3Client = boto3.client('s3')  # get_object is a client method, not a resource method
obj = s3Client.get_object(Bucket="some-bucket", Key="some-key")
file_headers = pd.read_excel(io.BytesIO(obj['Body'].read()), engine="openpyxl").columns.tolist()
I'm just not sure about how to go about extracting the data types that Pandas has detected or how to generate the map.
Can anyone point me in the right direction please?
IIUC, you can use dtypes:
>>> df.dtypes.to_dict()
{'Date': dtype('<M8[ns]'), 'Name': dtype('O'), 'Balance': dtype('float64')}
>>> {k: v.name for k, v in df.dtypes.to_dict().items()}
{'Date': 'datetime64[ns]', 'Name': 'object', 'Balance': 'float64'}
I suggest you check this pandas tutorial.
Calling pandas.read_excel('my_file.xlsx').dtypes should give you the types of the columns.
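To get the friendlier labels from the question ("datetime", "string", "numeric") rather than raw dtype names, one hedged approach is to map the numpy dtype "kind" codes; the label strings below are just illustrative choices:

```python
import pandas as pd

# Hypothetical frame matching the sample CSV data from the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["02/01/2022", "02/14/2022"]),
    "Name": ["Jerry Jingleheimer", "Jane Jingleheimer"],
    "Balance": [45.07, 102.29],
})

# Map dtype.kind codes ('M' = datetime, 'O' = object, 'f'/'i' = numeric)
# to the simple labels the question asks for
kind_to_label = {"M": "datetime", "O": "string", "f": "numeric", "i": "numeric"}
header_types = {col: kind_to_label.get(dtype.kind, dtype.name)
                for col, dtype in df.dtypes.items()}
# {'Date': 'datetime', 'Name': 'string', 'Balance': 'numeric'}
```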

Flatten nested JSON and concatenate to dataframe using pandas

I have searched for a lot of similar topics online, but I have not found the solution yet.
My pandas dataframe looks like this:
index FOR
0 [{'id': '2766', 'name': '0803 Computer Softwar...
1 [{'id': '2766', 'name': '0803 Computer Softwar...
2 [{'id': '2766', 'name': '0803 Computer Softwar...
3 [{'id': '2766', 'name': '0803 Computer Softwar...
4 [{'id': '2766', 'name': '0803 Computer Softwar...
And I would like to flatten all the rows to look like the following dataframe (below is just the result for the first row):
index id name
0 2766 0803 Computer Software
I found a similar solution here. Unfortunately, I got a TypeError like the following:
TypeError: the JSON object must be str, bytes or bytearray, not 'list'
My code was:
dfs = []
for i in test['FOR']:
    data = json.loads(i)
    dfx = pd.json_normalize(data)
    dfs.append(dfx)
df = pd.concat(dfs).reset_index(drop=True)  # reset_index(inplace=True) would make df None
print(df)
Could anyone help me here?
Thank you very much!
Try using literal_eval from the ast standard library.
from ast import literal_eval
df_flattened = pd.json_normalize(df['FOR'].map(literal_eval))
Then drop duplicates:
print(df_flattened.drop_duplicates())
id name
0 2766 0803 Computer Software
After a few weeks of not touching related work, I encountered another similar case, and I think I have a solution for it. Please feel free to correct me or offer other ideas. I really appreciate all the help and generous support!
chunks = []
for i in range(len(test)):
    chunks.append(pd.json_normalize(test.iloc[i, :]['FOR']))
test_df = pd.concat(chunks)
And then drop the duplicated columns from test_df.
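The column-dedup step above can be sketched like this, using a hypothetical frame with a repeated column label standing in for test_df:

```python
import pandas as pd

# Hypothetical stand-in for test_df with a duplicated "id" column
test_df = pd.DataFrame([[2766, "0803 Computer Software", 2766]],
                       columns=["id", "name", "id"])

# Keep only the first occurrence of each column label
deduped = test_df.loc[:, ~test_df.columns.duplicated()]
# columns: ['id', 'name']
```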

Convert dataframe to dictionary as shown

My dataframe is as shown
name key value
john A223 390309
jason B439 230943
peter A5388 572039
john D23902 238939
jason F2390 23930
I want to convert the above generated dataframe into a dictionary in the below shown format.
{'john': {'key':'A223', 'value':'390309', 'key':'A5388', 'value':'572039'},
'jason': {'key':'B439','value':'230943', 'key':'F2390', 'value':'23930'},
'peter': {'key':'A5388' ,'value':'572039'}}
I tried a = dict(zip(dataframe['key'], dataframe['value'])), but it won't give me the dataframe column headers.
Dictionary keys must be unique, so the format you show is not possible as written.
Assuming, as in your desired output, you want to keep only rows with the first instance of each name, you can reverse row order and then use to_dict with orient='index':
res = df.iloc[::-1].set_index('name').to_dict('index')
print(res)
{'jason': {'key': 'B439', 'value': 230943},
'john': {'key': 'A223', 'value': 390309},
'peter': {'key': 'A5388', 'value': 572039}}
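If you instead want to keep every row per name (which a flat dict of keys cannot express), one sketch is to map each name to a list of its records via groupby:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "jason", "peter", "john", "jason"],
    "key": ["A223", "B439", "A5388", "D23902", "F2390"],
    "value": [390309, 230943, 572039, 238939, 23930],
})

# One list of {'key': ..., 'value': ...} records per name
res = {name: grp[["key", "value"]].to_dict("records")
       for name, grp in df.groupby("name")}
# res["john"] -> [{'key': 'A223', 'value': 390309},
#                 {'key': 'D23902', 'value': 238939}]
```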

How to create a Pandas DataFrame from a list of OrderedDicts?

I have the following list:
o_dict_list = [(OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
(OrderedDict([('StreetNamePreType', 'AVENUE'), ('StreetName', 'Washington')]), 'Ambiguous'),
(OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous')]
And like the title says, I am trying to take this list and create a pandas dataframe where the columns are: 'StreetNamePreType' and 'StreetName' and the rows contain the corresponding values for each key in the OrderedDict.
I have done some searching on StackOverflow to get some guidance on how to create a dataframe, see here, but I am getting an error when I run this code (I am trying to replicate what is going on in that response).
from collections import Counter, OrderedDict
import pandas as pd

col = Counter()
for k in o_dict_list:
    col.update(k)
df = pd.DataFrame([k.values() for k in o_dict_list], columns=col.keys())
When I run this code, the error I get is: TypeError: unhashable type: 'OrderedDict'
I looked up this error here; I get that there is a problem with the data types, but unfortunately I don't know enough about the inner workings of Python/Pandas to resolve this problem on my own.
I suspect that my list of OrderedDict is not exactly the same as in here which is why I am not getting my code to work. More specifically, I believe I have a list of sets, and each element contains an OrderedDict. The example, that I have linked to here seems to be a true list of OrderedDicts.
Again, I don't know enough about the inner workings of Python/Pandas to resolve this problem on my own and am looking for help.
I would use list comprehension to do this as follows.
pd.DataFrame([d for d, _ in o_dict_list])
See the output below.
StreetNamePreType StreetName
0 ROAD Coffee
1 AVENUE Washington
2 ROAD Quartz
Extracting the OrderedDict objects from your list and then using pd.DataFrame should work:
values = []
for i in range(len(o_dict_list)):
    values.append(o_dict_list[i][0])
pd.DataFrame(values)
StreetNamePreType StreetName
0 ROAD Coffee
1 AVENUE Washington
2 ROAD Quartz
Note that pd.DataFrame accepts a list of dicts directly, aligning on keys and filling missing entries with NaN:
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': 'february'},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
pd.DataFrame(d)

Parsing API response Data as a data frame in Python with Pandas

I am trying to parse data from an API call to an ERP system. I want to bring this data in as a data frame with Pandas so that I can work with it. Every attempt to parse it, with either json.loads/json.dumps or DataFrame.from_dict(), has failed for me.
My raw data file looks like:
Type: dict
String form: {0: [{'productID': 144194, 'name': 'XXXtentacion, ?, LP', 'code': '1039907', 'code2': '1210672', <...> Field4Title': 'Format Notes', 'extraField4ID': 0, 'extraField4Code': '', 'extraField4Name': ''}]}
Length: 1
The closest I get is calling:
pd.DataFrame.from_dict(data)
which returns:
0
0 {'productID': 144194, 'name': 'XXXtentacion, ?...
1 {'productID': 131605, 'name': 'Sufjan Stevens ...
2 {'productID': 143699, 'name': 'Sufjan Stevens ...
3 {'productID': 134277, 'name': 'Sufjan Stevens ...
4 {'productID': 135151, 'name': 'Sufjan Stevens ...
5 {'productID': 145844, 'name': 'Spearhead, Home...
but what I want is for the keys to be column headers (i.e., 'productID' should be the first column header).
I'm just starting out with Python so any help is greatly appreciated. I've looked around on similar topics and can't seem to find the solution.
Assuming your data is structured as Dict(key1: List(Dict(...)), key2: ...)
Try
data = {d:data[d][0] for d in data}
pd.DataFrame.from_dict(data)
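Since the string form suggests data[0] already holds the full list of product dicts, a simpler sketch (assuming that structure; the record values below are hypothetical stand-ins) is to pass the list straight to the DataFrame constructor, which promotes the dict keys to column headers:

```python
import pandas as pd

# Hypothetical stand-in for the API response shape: {0: [record, record, ...]}
data = {0: [{"productID": 144194, "name": "XXXtentacion, ?, LP"},
            {"productID": 131605, "name": "Sufjan Stevens ..."}]}

# A list of dicts becomes rows, with the dict keys as column headers
df = pd.DataFrame(data[0])
# columns: ['productID', 'name']; one row per record
```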