Storing a dataframe in dictionary, weird output in dictionary - python

I have a function that returns a dataframe to my main. I am trying to store these dataframes in a dictionary, in order to retrieve them again later.
When I run this:
sa_wp5 = get_SA_WP5_value('testfile.txt')
template_dict["SAWP5Country Name"] = sa_wp5
my output looks like the following:
{'SAWP5Country Name': 1 2
0 Australia 047}
where I would rather the output just be the variable itself containing the dataframe.
What am I doing wrong here?

Nothing is wrong here. The dictionary does hold the DataFrame object itself; what you see is only the default __str__() output of a DataFrame embedded in the printed dict. If that looks messy, print the dict this way instead:
for key, df in template_dict.items():
    print("%s:" % key)
    print(df.to_string())
    print("-------")
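A quick way to check that the dictionary stores the DataFrame itself rather than its printed form is sketched below. The sa_wp5 frame here is a hand-built, hypothetical stand-in, since get_SA_WP5_value is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by get_SA_WP5_value().
sa_wp5 = pd.DataFrame({1: ["Australia"], 2: ["047"]})

template_dict = {}
template_dict["SAWP5Country Name"] = sa_wp5

# The value stored under the key is the very same DataFrame object.
retrieved = template_dict["SAWP5Country Name"]
same_object = retrieved is sa_wp5

# It can be used like any other DataFrame (row label 0, column label 1).
first_country = retrieved.loc[0, 1]
```

Only print(template_dict) mixes the DataFrame's text representation into the dict's output; the stored object is unchanged.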

You can use Bunch to store all sorts of objects for easy retrieval.
from sklearn.utils import Bunch
Then create a variable by calling Bunch():
a = Bunch(df1 = df_template.copy(), df2 = df_other_df.copy())
Then you can simply call them as such:
a.df1
a.df1['col1']
df = a.df2
etc.
It's really effective for storage of objects.

Related

Assigning unique IDs to strings

I am trying to build an elegant solution to assigning IDs starting from 0 for the following data:
My first attempt, creating IDs for the 'Person' column, looks like this:
import pandas as pd
df = pd.DataFrame(
    {'Person': ['Tom Jones','Bill Smeegle','Silvia Geerea'],
     'PersonFriends': [['Bill Smeegle','Silvia Geerea'],['Tom Jones'],['Han Solo']]})
df['PersonID'] = (df['Person']).astype('category').cat.codes
which produces
Now I want to follow the same process but do this for the 'PersonFriends' column to get this result below. How can I apply the same functions to achieve this when I have a list of friends?
I have been able to do this via the hash() function on each name, but the ID generated is long and not very readable. Any help appreciated. Thanks.
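Create a dict that maps each name to its ID, then look up every friend's name in it:
id_map = dict(zip(df["Person"], df["PersonID"]))
df["FriendsID"] = df["PersonFriends"].apply(lambda x: [id_map.get(y) for y in x])
Names that do not appear in the 'Person' column, such as 'Han Solo', get None because id_map.get() finds no entry for them.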
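If friends who never appear in the 'Person' column should get an ID as well, one possible variation is to build the mapping from both columns at once. This is only a sketch and not part of the answer above: it uses pd.factorize, which numbers names in order of first appearance, so the IDs differ from the alphabetical category codes used in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Person": ["Tom Jones", "Bill Smeegle", "Silvia Geerea"],
    "PersonFriends": [["Bill Smeegle", "Silvia Geerea"], ["Tom Jones"], ["Han Solo"]],
})

# Collect every name from both columns; explode() turns each list into rows.
all_names = pd.concat([df["Person"], df["PersonFriends"].explode()])

# factorize() returns integer codes plus the unique names in first-seen order.
codes, uniques = pd.factorize(all_names)
id_map = {name: i for i, name in enumerate(uniques)}

df["PersonID"] = df["Person"].map(id_map)
df["FriendsID"] = df["PersonFriends"].apply(lambda names: [id_map[n] for n in names])
```

With this data, the three people get IDs 0 to 2 and 'Han Solo' gets 3.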

How to replace empty values with reference to another dataframe?

I have 2 data frames. One is reference table with columns: code and name. Other one is list of dictionaries. The second data frame has code filled up but some names as empty strings. I am thinking of performing 2 for loops to get to the dictionary. But, I am new to this so unsure how to get the value from reference table.
Started with something like this:
for i in sample:
    for j in i:
        if j['name']=='':
            (j['code'])
I am unsure how to proceed with the code. I think there is a very simple way with .map() function. Can someone help?
Reference table: (image not included)
Table that needs editing: (image not included)
It seems to me that in this particular case you're using Pandas only to work with Python data structures. If that's the case, it would make sense to ditch Pandas altogether and just use Python data structures - usually, it results in more idiomatic and readable code that often performs better than Pandas with dtype=object.
In any case, here's the code:
import pandas as pd
sample_name = pd.DataFrame(dict(code=[8, 1, 6],
                                name=['Human development',
                                      'Economic management',
                                      'Social protection and risk management']))
# We just need a Series.
sample_name = sample_name.set_index('code')['name']
sample = pd.Series([[dict(code=8, name='')],
                    [dict(code=1, name='')],
                    [dict(code=6, name='')]])
def fix_dict(d):
    # Fill in an empty name by looking up the code in the reference Series.
    if not d['name']:
        d['name'] = sample_name.at[d['code']]
    return d
def fix_dicts(dicts):
    return [fix_dict(d) for d in dicts]
result = sample.map(fix_dicts)
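The plain-Python alternative mentioned at the start of this answer could look roughly like the sketch below. It assumes the reference table fits in an ordinary dict keyed by code; names_by_code and fill_names are hypothetical names, and the sample data is modeled on the pandas version.

```python
# Reference table as a plain dict: code -> name.
names_by_code = {
    8: "Human development",
    1: "Economic management",
    6: "Social protection and risk management",
}

# Each row is a list of dicts; some names are empty strings.
sample = [
    [{"code": 8, "name": ""}],
    [{"code": 1, "name": ""}, {"code": 6, "name": "Already set"}],
]

def fill_names(rows, lookup):
    """Return new rows in which empty names are filled from the lookup."""
    return [
        [{**d, "name": d["name"] or lookup.get(d["code"], "")} for d in dicts]
        for dicts in rows
    ]

result = fill_names(sample, names_by_code)
```

Unlike fix_dict, this version builds new dicts and leaves the input untouched; codes missing from the lookup keep an empty name.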

How to add on to parameter names in functions?

def priceusd(df):
    return df['closeprice'][-1]*btcusdtclose[-1]
This function gives the price of a certain asset in USD by multiplying its price in Bitcoin by Bitcoin's price in USD, taking a dataframe as its parameter.
What I want to do is pass the name of the asset as the parameter instead of the dataframe that holds the price data. All my dataframes are named assetbtc, for example ethbtc or neobtc. I want to be able to pass eth into the function and get back ethbtc['closeprice'][-1]*btcusdtclose[-1].
For example,
def priceusd(eth):
    return ethbtc['close'][-1]*btcusdtclose[-1]
I tried this and it didn't work, but you can see what I am trying to do:
def priceusd(assetname): '{}btc'.format(assetname)['close'][-1]*btcusdtclose[-1]
Thank you very much.
It's not necessary to use eval in a situation like this. As @wwii says, store the DataFrames in a dictionary so that you can easily retrieve them by name.
E.g.
coins_to_btc = {
    'eth': ethbtc,
    'neo': neobtc,
}
Then,
def priceusd(name):
    df = coins_to_btc[name]
    # .iloc[-1] selects the last row by position, whatever the index type.
    return df['close'].iloc[-1]*btcusdtclose[-1]
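To try the dictionary approach end to end, here is a small self-contained sketch. The price values are made up, and ethbtc, neobtc and btcusdtclose are hypothetical stand-ins for the real data, which the question does not show.

```python
import pandas as pd

# Hypothetical price data; the real frames are not shown in the question.
ethbtc = pd.DataFrame({"close": [0.030, 0.031]})
neobtc = pd.DataFrame({"close": [0.0010, 0.0012]})
btcusdtclose = pd.Series([40000.0, 41000.0])

# Registry mapping the short asset name to its BTC-denominated frame.
coins_to_btc = {"eth": ethbtc, "neo": neobtc}

def priceusd(name):
    """Latest USD price: last BTC close times last BTC/USDT close."""
    df = coins_to_btc[name]
    return df["close"].iloc[-1] * btcusdtclose.iloc[-1]

eth_usd = priceusd("eth")
```

An unknown name raises a KeyError from the dictionary lookup, which is usually easier to diagnose than a failure inside eval.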
Rather than trying to use a str as the dataframe, use the str you built to fetch the dataframe from whatever contains it.
For example, assuming you have placed the priceusd function inside the same module that contains all your data frames:
import sys
abtc = df1()
bbtc = df2()
cbtc = df3()
# and so on...
def priceusd(asset):
    # priceusd.__module__ is only the module's name, so look up the module object itself.
    asset_container = sys.modules[priceusd.__module__]
    asset_name = f'{asset}btc'
    df = getattr(asset_container, asset_name)
    # now do whatever you want with your df (dataframe)
You can replace the code for getting the asset_container if the structure of your code is different from the one I assumed. But you should get the general idea.

How to convert Multilevel Dictionary with Irregular Data to Desired Format

Dict = {'Things' : {'Car':'Lambo', 'Home':'NatureVilla', 'Gadgets':{'Laptop':{'Programs':{'Data':'Excel', 'Officework': 'Word', 'Coding':{'Python':'PyCharm', 'Java':'Eclipse', 'Others': 'SublimeText'}, 'Wearables': 'SamsungGear', 'Smartphone': 'Nexus'}, 'clothes': 'ArmaaniSuit', 'Bags':'TravelBags'}}}}
d = {(i,j,k,l,m,n): Dict[i][j][k][l][m][n]
     for i in Dict.keys()
     for j in Dict[i].keys()
     for k in Dict[j].keys()
     for l in Dict[k].keys()
     for m in Dict[l].keys()
     for n in Dict[n].keys()
}
mux = pd.MultiIndex.from_tuples(d.keys())
df = pd.DataFrame(list(d.values()), index=mux)
print (df)
What I have already done:
I tried to MultiIndex this irregular data using pandas, but I am getting a KeyError at 'Car'. Then I tried to handle the exception and pass on it, but that results in a SyntaxError. So maybe I have lost direction. Is there any other module or way to index this irregular data and put it in a table? I have a chunk of raw data like this.
What I am trying to do:
I want to display this data in a QTableView from PyQt5 (I am making a program with a GUI).
Conditions:
This data is updated every hour from an API.
What I have thought of so far:
Maybe I can append all this data to MySQL. But when the data updates from the API, only the values change and the keys stay the same, so that would take up more space.
References:
How to convert a 3-level dictionary to a desired format?
How to build a MultiIndex Pandas DataFrame from a nested dictionary with lists
Any Help will be appreciated. Thanks for reading the question.
Your data is not actually a 6-level dictionary in the way the dictionary in the 3-level example you referenced is a 3-level one. The difference is that your dictionary holds data on several different levels: the 'Lambo' value is on the second level of the hierarchy with key ('Things','Car'), but the 'Eclipse' value is on the sixth level with key ('Things','Gadgets','Laptop','Programs','Coding','Java').
If you want to 'flatten' your structure, you will need to decide what to do with the 'missing' keys at the deeper levels for values like 'Lambo'.
Btw, this may not be the real solution to your problem; a more appropriate UI widget such as TreeView may suit this kind of hierarchical data better. But I will try to address your exact question directly.
Unfortunately, there seems to be no easy way to reference values on all the different levels uniformly in one simple dict or list comprehension.
Just look at your 'value extractor' (Dict[i][j][k][l][m][n]): no values of i,j,k,l,m,n exist that would get you 'Lambo', because to get a Lambo you just need Dict['Things']['Car'] (ironically, in real life it can also be difficult to get a Lambo :-) )
One straightforward way to solve your task is to extract the second-level data, then the third-level data, and so on, and combine them together.
E.g. to extract second level values you can write something like this:
val_level2 = {(k1,k2):Dict[k1][k2]
              for k1 in Dict
              for k2 in Dict[k1]
              if isinstance(Dict[k1],dict) and
              not isinstance(Dict[k1][k2],dict)}
But if you want to combine it later with the sixth-level values, you will need to add some padding to your key tuples:
val_level2 = {(k1,k2,'','','',''):Dict[k1][k2]
              for k1 in Dict
              for k2 in Dict[k1]
              if isinstance(Dict[k1],dict) and
              not isinstance(Dict[k1][k2],dict)}
Later you can combine them all with something like:
d = {}
d.update(val_level2)
d.update(val_level3)
But usually the most organic way to work with hierarchical data is to use some recursion, like this:
def flatten_dict(d,key_prefix,max_deep):
    return [(tuple(key_prefix+[k]+['']*(max_deep-len(key_prefix))),v)
            for k,v in d.items() if not isinstance(v,dict)] +\
           sum([flatten_dict(v,key_prefix+[k],max_deep)
                for k,v in d.items() if isinstance(v,dict)],[])
And later with code like this:
d={k:v for k,v in flatten_dict(Dict,[],5)}
mux = pd.MultiIndex.from_tuples(d.keys())
df = pd.DataFrame(list(d.values()), index=mux)
df.reset_index()
I actually get this result with your data:
P.S. According to https://www.python.org/dev/peps/pep-0008/#prescriptive-naming-conventions, lowercase_with_underscores is preferred for variable names, while CapWords is for classes. So src_dict would be a much better name than Dict in your case.
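The recursive flattening above needs the maximum depth passed in by hand. A possible variation that works the depth out from the data is sketched below; iter_leaves and flatten_padded are hypothetical helper names, and the sample dict is a shortened, made-up version of the one in the question.

```python
def iter_leaves(d, prefix=()):
    """Yield (key_path, value) for every non-dict value in a nested dict."""
    for key, value in d.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from iter_leaves(value, path)
        else:
            yield path, value

def flatten_padded(d, pad=""):
    """Flatten a nested dict, padding every key path to the deepest length."""
    leaves = list(iter_leaves(d))
    depth = max((len(path) for path, _ in leaves), default=0)
    return {path + (pad,) * (depth - len(path)): value for path, value in leaves}

# Small sample with values at different depths.
src_dict = {
    "Things": {
        "Car": "Lambo",
        "Gadgets": {"Laptop": {"Coding": {"Java": "Eclipse"}}},
    }
}
flat = flatten_padded(src_dict)
```

Because every key tuple in flat has the same length, the result can be passed to pd.MultiIndex.from_tuples in the same way as in the answer above.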
Your information looks a lot like JSON, which is probably what the API is returning. If that's the case and you are turning it into a dictionary, you might be better off using Python's json library or even pandas' built-in read_json function.
Pandas read json
Python's json

How to feed array of user_ids to flickr.people.getInfo()?

I have been working on extracting Flickr users' locations (the person's country, not latitude and longitude) from their user_ids. I have built a dataframe consisting of photo id, owner and a few other columns. My attempt was to pass each owner to the flickr.people.getInfo() query by iterating over the owner column of the dataframe. Here is my attempt:
for index, row in df.iterrows():
    A=np.array(df["owner"])
    for i in range(len(A)):
        B=flickr.people.getInfo(user_id=A[i])
Unfortunately, this gives only one result. After careful examination I found that it belongs to the last user in the dataframe. My dataframe has 250 observations, and I don't know how to extract the others.
Any help is appreciated.
It seems like you forgot to store the results while iterating over the dataframe. I haven't used the API, but I think this snippet should do it.
result_dict = {}
for idx, owner in df['owner'].items():
    result_dict[owner] = flickr.people.getInfo(user_id=owner)
The results are stored in a dictionary where the user id is the key.
EDIT:
Since it is a JSON you can use the read_json function to parse the result.
Example:
import json
from io import StringIO
result_list = []
for idx, owner in df['owner'].items():
    info = flickr.people.getInfo(user_id=owner)
    # typ='series' gives one Series per user; you may have to set the orient parameter.
    # Options for a Series are 'split', 'records' and 'index'; the default is 'index'.
    result_list.append(pd.read_json(StringIO(json.dumps(info)), typ='series', orient='index'))
Note: I switched from the dictionary to a list, since it is more convenient.
Afterwards you can concatenate the resulting pandas Series objects like this:
df = pd.concat(result_list, axis=1).transpose()
I added the transpose() since you probably want the ID as the index.
Afterwards you should be able to sort by the column 'location'.
Hope that helps.
The idiomatic way to achieve that is to use apply. It is more concise than an explicit loop, although the API is still called once per row.
import pandas as pd
import numpy as np
np.random.seed(0)
# A function to simulate the call to the API
def get_user_info(id):
    return np.random.randint(id, id + 10)
# Some test data
df = pd.DataFrame({'id': [0,1,2], 'name': ['Pierre', 'Paul', 'Jacques']})
# Here the call is made for each ID
df['info'] = df['id'].apply(get_user_info)
#    id     name  info
# 0   0   Pierre     5
# 1   1     Paul     1
# 2   2  Jacques     5
Note that another way to write the same thing is
df['info'] = df['id'].map(lambda x: get_user_info(x))
Before calling the method, set up the client with the following lines:
from flickrapi import FlickrAPI
flickr = FlickrAPI(FLICKR_KEY, FLICKR_SECRET, format='parsed-json')
