SQLAlchemy: is there a good automated way to rename columns? - python

I have been using the SQLAlchemy ORM for a few days, and I'm looking for a way to get the table name as a prefix on the column names in the results of Session.query().
For instance:
myId = 4
...
data = session.query(Email.address).filter(Email.id==str(myId)).one()
print(data.keys())
This displays:
("address",)
And I would like to get something like:
("Email.address",)
Is there any way to do this without changing the class attributes and the table column names?
This example is a bit contrived, but more generally I would like to prefix every column name with its table name in the results, so that the results always have the same format, even when the query contains joins.
I've read about aliased() and many posts here, but nothing satisfied me.
Can someone please enlighten me on this?
Thank you.
EDIT:
Thanks a lot for your answer @alecxe. I finally managed to do what I wanted. Here is a first draft of my code; there are probably many things to improve:
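query = self.session.query(Email.address, User.name)
# Map each result label (e.g. "address") to its qualified expression (e.g. "Email.address")
cols = [{str(column['name']): str(column['expr'])} for column in query.column_descriptions]
someone = query.filter(User.name == str(curName)).all()
r = []
for res in someone:
    p = {}
    for c in map(str, res.__dict__):
        # Skip SQLAlchemy's private attributes
        if not c.startswith('_'):
            for k in cols:
                # list() is needed on Python 3, where dict.keys() is not indexable
                if c == list(k.keys())[0]:
                    p[k[c]] = res.__dict__[c]
    r.append(p)
print(r)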
The output is:
[{'Email.address': u'john@foobaz.com', 'User.name': u'John'}]

Give column_descriptions a try:
query = session.query(Email.address)
print([str(column['expr']) for column in query.column_descriptions])  # should print ['Email.address']
data = query.filter(Email.id==str(myId)).one()
Hope that helps.
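Building on the column_descriptions idea, here is a minimal self-contained sketch that turns each result row into a dict keyed by "Class.attribute", which also covers the join case from the question. It assumes SQLAlchemy 1.4 or later (for `Session` as a context manager and `sqlalchemy.orm.declarative_base`), a throwaway in-memory SQLite database, and made-up `User`/`Email` models and rows; `qualified_rows` is a hypothetical helper, not part of SQLAlchemy's API.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Email(Base):
    __tablename__ = "emails"
    id = Column(Integer, primary_key=True)
    address = Column(String)
    user_id = Column(Integer, ForeignKey("users.id"))


def qualified_rows(query):
    """Hypothetical helper: return each row as a dict keyed by "Class.attribute"."""
    # column_descriptions has one entry per selected column; str() of an
    # ORM attribute such as Email.address renders as "Email.address"
    labels = [str(desc["expr"]) for desc in query.column_descriptions]
    return [dict(zip(labels, row)) for row in query.all()]


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        User(id=1, name="John"),
        Email(id=4, address="john@foobaz.com", user_id=1),
    ])
    session.commit()
    # A join, so the labels come from two different tables
    query = session.query(Email.address, User.name).join(User, Email.user_id == User.id)
    result = qualified_rows(query)

print(result)
```

Because the labels are computed once per query rather than per row, this avoids scanning each row's `__dict__`.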

Related

Filling 0's with Local Means

Hi, I am working on a dataset with a host_id column and two other columns: reviews_per_month and number_of_reviews. For every host_id, most of the values in these two columns are present, but some of them are zeros. For each column, I want to replace those zeros with the mean of the other (non-zero) values for that host_id. Here is the code I have tried:
import numpy as np
import pandas as pd
def process_rpm_nor(data):
    data['reviews_per_month'] = data['reviews_per_month'].fillna(0)
    data['number_of_reviews'] = data['number_of_reviews'].fillna(0)
    data_list = []
    for host_id in set(data['host_id']):
        # copy() avoids pandas' SettingWithCopyWarning when the slice is modified below
        data_temp = data[data['host_id'] == host_id].copy()
        nor_non_zero = np.mean(data_temp[data_temp['number_of_reviews'] > 0]['number_of_reviews'])
        rpm_non_zero = np.mean(data_temp[data_temp['reviews_per_month'] > 0]['reviews_per_month'])
        data_temp['number_of_reviews'] = data_temp['number_of_reviews'].replace(0, nor_non_zero)
        data_temp['reviews_per_month'] = data_temp['reviews_per_month'].replace(0, rpm_non_zero)
        data_list.append(data_temp)
    # Stack the per-host pieces back on top of each other (axis=0, not axis=1)
    return pd.concat(data_list, axis=0)
The code works, but it takes a long time to run. Could anyone suggest an alternative approach, or help me optimize my code? I'd really appreciate the help.
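The per-host loop can usually be replaced by a grouped transform, which computes each host's mean once and broadcasts it back to every row. This is a minimal sketch, assuming the goal is what the loop implements (treat NaN as 0, then replace exact zeros with the mean of that host's positive values); `fill_zeros_with_host_mean` is a hypothetical name and the sample data is made up.

```python
import pandas as pd


def fill_zeros_with_host_mean(data, cols=("reviews_per_month", "number_of_reviews")):
    """Replace zeros in each column with the mean of that host's positive values."""
    out = data.copy()  # leave the caller's frame untouched
    for col in cols:
        # Same first step as the loop version: treat missing values as 0
        values = out[col].fillna(0)
        # Hide non-positive values so they do not drag the mean down
        positive = values.where(values > 0)
        # Per-host mean of the positive values, broadcast back to every row
        host_mean = positive.groupby(out["host_id"]).transform("mean")
        # Keep existing values; swap exact zeros for the host mean
        out[col] = values.mask(values == 0, host_mean)
    return out


df = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 2],
    "reviews_per_month": [0.0, 2.0, 4.0, 1.0, None],
    "number_of_reviews": [10, 0, 20, 0, 6],
})
result = fill_zeros_with_host_mean(df)
print(result)
```

Unlike the loop, this keeps the original row order. A host whose values are all zero ends up with NaN, which matches what `np.mean` of an empty selection gives in the loop version.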

How to replace a string in a list of strings in a DataFrame (Python)?

I have a DataFrame with two separate columns whose cells contain lists of strings.
import pandas as pd
data = pd.DataFrame()
data["Website"] = [["google.com", "amazon.com"], ["google.com"], ["aol.com", "no website"]]
data["App"] = [["Ok Google", "Alexa"], ["Ok Google"], ["AOL App", "Generic Device"]]
This is what the DataFrame looks like.
I need to replace certain strings in the first column (here: "no website") with the corresponding string in the second column (here: "Generic Device"). The replacement string has the same index in its list as the string that needs to be replaced.
What has not worked so far:
I tried several forms of str.replace(x, y) on lists and DataFrames, and nothing worked. A simple replace(x, y) does not work either, as I need to replace several different strings. I think I can't get my head around the indexing.
I have already searched Google and Stack Overflow for two hours and haven't found a solution yet.
Many thanks in advance! Sorry for bad English or noob mistakes, I am still learning.
-Max
Define a replacement function and use apply to run it on each row:
import pandas as pd
def replacements(websites, apps):
    """Substitute items of replace_items found in websites with the app at the same index."""
    replace_items = ["no website"]  # can add to this list of keys that trigger replacement
    for i, k in enumerate(websites):
        # Check each item in websites for replacement
        if k in replace_items:
            # This is an item to be replaced
            websites[i] = apps[i]  # replace with corresponding item in apps
    return websites
# Create DataFrame
websites = [["google.com", "amazon.com"], ["google.com"], ["aol.com", "no website"]]
app = [["Ok Google", "Alexa"], ["Ok Google"], ["AOL App", "Generic Device"]]
data = list(zip(websites, app))
df = pd.DataFrame(data, columns=['Websites', 'App'])
# Perform replacement
df['Websites'] = df.apply(lambda row: replacements(row['Websites'], row['App']), axis=1)
print(df)
Output
Websites App
0 [google.com, amazon.com] [Ok Google, Alexa]
1 [google.com] [Ok Google]
2 [aol.com, Generic Device] [AOL App, Generic Device]
Try this. You can define the values to replace in a list and apply the function:
def f(x, items):
    for rep in items:
        if rep in list(x.Website):
            x.Website[list(x.Website).index(rep)] = list(x.App)[list(x.Website).index(rep)]
    return x
items = ["no website"]
data = data.apply(lambda x: f(x,items),axis=1)
Output:
Website App
0 [google.com, amazon.com] [Ok Google, Alexa]
1 [google.com] [Ok Google]
2 [aol.com, Generic Device] [AOL App, Generic Device]
First of all, Happy Holidays!
I wasn't really sure what your expected output was and I'm not really sure what you have tried previously, but I think that this may work:
data["Website"] = data["Website"].replace("no website", "Generic Device")
I really hope this helps!
You can create a function like this:
def f(replaced_value, col1, col2):
    def r(s):
        # Keep replacing until no occurrence of replaced_value is left in col1
        while replaced_value in s[col1]:
            idx = s[col1].index(replaced_value)
            s[col1][idx] = s[col2][idx]
        return s
    return r
and use apply:
df = df.apply(f("no website", "Website", "App"), axis=1)
print(df)
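The answers above modify the lists stored in the DataFrame in place. If you would rather build new lists, a list comprehension over zip also works. This is a sketch: `swap_placeholders` is a hypothetical name, and it assumes both lists in a row have the same length (zip silently stops at the shorter one).

```python
import pandas as pd


def swap_placeholders(websites, apps, placeholders=("no website",)):
    """Return a new list where each placeholder is replaced by the app at the same index."""
    # zip pairs items by position, so index i in websites meets index i in apps
    return [app if site in placeholders else site for site, app in zip(websites, apps)]


data = pd.DataFrame()
data["Website"] = [["google.com", "amazon.com"], ["google.com"], ["aol.com", "no website"]]
data["App"] = [["Ok Google", "Alexa"], ["Ok Google"], ["AOL App", "Generic Device"]]

# Build the new column from fresh lists instead of mutating the old ones
data["Website"] = pd.Series(
    [swap_placeholders(w, a) for w, a in zip(data["Website"], data["App"])],
    index=data.index,
)
print(data)
```

Several placeholder strings can be handled by passing a longer `placeholders` tuple.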

Formatting Multiple Columns in a Pandas Dataframe

I have a DataFrame with a large number of columns, and I'm trying to format them as efficiently as possible. I have a bunch of columns ending in .pct that need to be formatted as percentages, some ending in .cost that need to be formatted as currency, etc.
I know I can do something like this:
cost_calc.style.format({'c.somecolumn.cost' : "${:,.2f}",
'c.somecolumn.cost' : "${:,.2f}",
'e.somecolumn.cost' : "${:,.2f}",
'e.somecolumn.cost' : "${:,.2f}",...
and format each column individually, but I was hoping there was a way to do something similar to this:
cost_calc.style.format({'*.cost' : "${:,.2f}",
'*.pct' : "{:,.2%}",...
Any ideas? Thanks!
The first way doesn't seem bad if you can build that dictionary automatically. You can generate a list of all columns matching the *.cost pattern with something like:
costcols = [x for x in cost_calc.columns if x.endswith('.cost')]
then build your dict like:
formatdict = {}
for costcol in costcols: formatdict[costcol] = "${:,.2f}"
then as you suggested:
cost_calc.style.format(formatdict)
You can easily add the .pct cases similarly. Hope this helps!
I would use regular expressions with dict comprehensions:
import re
mylist = cost_calc.columns
# Match column names ending in ".cost"
r = re.compile(r'.*\.cost$')
cost_cols = {key: "${:,.2f}" for key in mylist if r.match(key)}
# Match column names ending in ".pct" and format them as percentages
r = re.compile(r'.*\.pct$')
pct_cols = {key: "{:,.2%}" for key in mylist if r.match(key)}
cost_calc.style.format({**cost_cols, **pct_cols})
note: the {**a, **b} dict-merge syntax requires Python 3.5 or later
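With several suffixes, one option is to drive the dictionary from a small suffix-to-format table. This is a sketch: `SUFFIX_FORMATS` and `build_format_dict` are hypothetical names, and the sample columns are made up. The resulting dict is what you would pass to `cost_calc.style.format()`; rendering a Styler needs jinja2, so that call is left as a comment.

```python
import pandas as pd

# Hypothetical suffix -> format-string table; add more suffixes as needed
SUFFIX_FORMATS = {".cost": "${:,.2f}", ".pct": "{:,.2%}"}


def build_format_dict(columns, suffix_formats=SUFFIX_FORMATS):
    """Map every column whose name ends with a known suffix to its format string."""
    formats = {}
    for col in columns:
        for suffix, fmt in suffix_formats.items():
            if str(col).endswith(suffix):
                formats[col] = fmt
                break  # first matching suffix wins
    return formats


cost_calc = pd.DataFrame({
    "c.somecolumn.cost": [1234.5],
    "e.somecolumn.pct": [0.125],
    "label": ["a"],
})
formats = build_format_dict(cost_calc.columns)
print(formats)
# With jinja2 installed, pass the dict straight to the Styler:
# styled = cost_calc.style.format(formats)
```

Columns with no known suffix (here `label`) are simply left out, so the Styler keeps its default formatting for them.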

Vectorized string interpolation in Pandas? Is this doable without iteration?

The set up
I want to add a new column containing a URL that is built from a base template, with values from each row interpolated into it.
What I would LOVE to be able to do
base_link = "https://www.vectorbase.org/Glossina_fuscipes/Location/View?r=%(scaffold)s:%(start)s-%(end)s"
# simplify getting column data from data_frame
start = operator.attrgetter('start')
end = operator.attrgetter('end')
scaffold = operator.attrgetter('seqname')
def get_links_to_genome_browser(data_frame):
    base_links = pd.Series([base_link] * len(data_frame.index))
    links = base_links % {"scaffold": scaffold(data_frame), "start": start(data_frame), "end": end(data_frame)}
    return links
I am answering my own question, since I finally figured it out and want to close this out and record the solution.
The solution is to use data_frame.apply(), but to change the indexing in the get_links_to_genome_browser function to Series syntax rather than DataFrame syntax, because apply with axis=1 passes each row as a Series.
def get_links_to_genome_browser(series):
    # Plain label lookups on the row; the old .ix indexer was removed in pandas 1.0
    link = base_link % {"scaffold": series['seqname'], "start": series['start'], "end": series['end']}
    return link
Then call it like:
df['url'] = df.apply(get_links_to_genome_browser, axis=1)
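I think I get what you're asking; let me know if not. Rather than filling a template, you can build the URL by concatenating columns, which pandas does element-wise:
base_link = "https://www.vectorbase.org/Glossina_fuscipes/Location/View?r="
then you can do something like this:
data_frame['url'] = base_link + data_frame['seqname'].astype(str) + ':' + data_frame['start'].astype(str) + '-' + data_frame['end'].astype(str)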
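For comparison, here are two ways to fill the URL column without `apply`, sketched on made-up sample rows (only the `seqname`/`start`/`end` column names come from the question). The list comprehension still loops in Python but keeps the original `%`-style template; the concatenation version is element-wise pandas string arithmetic.

```python
import pandas as pd

base_link = "https://www.vectorbase.org/Glossina_fuscipes/Location/View?r=%(scaffold)s:%(start)s-%(end)s"

# Made-up sample rows for illustration
df = pd.DataFrame({
    "seqname": ["scaffold1", "scaffold2"],
    "start": [100, 2500],
    "end": [200, 3000],
})

# Option 1: fill the %-template once per row; avoids building a Series per row like apply does
df["url"] = [
    base_link % {"scaffold": s, "start": a, "end": b}
    for s, a, b in zip(df["seqname"], df["start"], df["end"])
]

# Option 2: element-wise string concatenation on whole columns
prefix = "https://www.vectorbase.org/Glossina_fuscipes/Location/View?r="
df["url_concat"] = (
    prefix + df["seqname"].astype(str)
    + ":" + df["start"].astype(str)
    + "-" + df["end"].astype(str)
)
print(df["url"].tolist())
```

Both produce identical strings; the template version is easier to keep in sync if the URL format changes.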

DataFrame constructor not properly called! error

I am new to Python and I am having trouble creating a DataFrame in key/value format, i.e.:
data = [{'key': '[GlobalProgramSizeInThousands]', 'value': '1000'},]
Here is my code:
columnsss = ['key', 'value']
query = "select * from bparst_tags where tag_type = 1"
result = database.cursor(db.cursors.DictCursor)
result.execute(query)
result_set = result.fetchall()
data = "["
for row in result_set:
    data += "{'value': %s , 'key': %s }," % (repr(row["tag_expression"]), repr(row["tag_name"]))
data += "]"
df = DataFrame(data, columns=columnsss)
But when I pass data to DataFrame, it gives me:
pandas.core.common.PandasError: DataFrame constructor not properly called!
whereas if I print data and assign the printed value to the data variable, it works.
You are passing a string representation of a list of dicts to the DataFrame constructor, not the list itself. That is why you get this error.
So if you want to use your code, you could do:
df = DataFrame(eval(data))
But it would be better not to build the string in the first place, and to put the values directly into a list of dicts. Something roughly like:
data = []
for row in result_set:
    data.append({'value': row["tag_expression"], 'key': row["tag_name"]})
But even this is probably not needed: depending on what exactly is in your result_set, you could probably:
provide this directly to a DataFrame: DataFrame(result_set)
or use the pandas read_sql_query function to do this for you (see docs on this)
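Both suggestions can be tried end to end with Python's built-in sqlite3 module standing in for the MySQL connection. This is a sketch under that assumption: the in-memory `bparst_tags` table and its rows are made up, and `sqlite3.Row` plays the role of the `DictCursor` rows.

```python
import sqlite3

import pandas as pd

# Throwaway in-memory table standing in for bparst_tags (sample rows are made up)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bparst_tags (tag_name TEXT, tag_expression TEXT, tag_type INTEGER)")
conn.executemany(
    "INSERT INTO bparst_tags VALUES (?, ?, ?)",
    [("[GlobalProgramSizeInThousands]", "1000", 1), ("[Unused]", "5", 2)],
)

# Option A: let pandas run the query and build the frame itself
df_from_sql = pd.read_sql_query(
    'SELECT tag_name AS "key", tag_expression AS "value" FROM bparst_tags WHERE tag_type = ?',
    conn,
    params=(1,),
)

# Option B: build a real list of dicts (not a string) and hand it to DataFrame
conn.row_factory = sqlite3.Row  # rows now support row["column"], like a DictCursor
rows = conn.execute("SELECT tag_name, tag_expression FROM bparst_tags WHERE tag_type = 1").fetchall()
data = [{"key": row["tag_name"], "value": row["tag_expression"]} for row in rows]
df_from_list = pd.DataFrame(data, columns=["key", "value"])

print(df_from_sql)
```

Passing the filter value through `params` also avoids formatting it into the SQL string by hand.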
I just ran into the same error, but the above answer did not help me.
My code, which worked fine on my computer, looked like this:
test_dict = {'x': '123', 'y': '456', 'z': '456'}
df=pd.DataFrame(test_dict.items(),columns=['col1','col2'])
However, it did not work on another platform, where it gave the same error as in the original question. I simply wrapped the dictionary items in list(), and after that it worked smoothly (in Python 3, dict.items() returns a view object rather than a list, which some pandas versions do not accept):
df=pd.DataFrame(list(test_dict.items()),columns=['col1','col2'])
Hopefully this answer helps anyone who runs into a similar situation.
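A minimal sketch of two forms that hand pandas concrete data rather than a dict view; the dict and the column names are taken from the example above.

```python
import pandas as pd

test_dict = {"x": "123", "y": "456", "z": "456"}

# Materialise the dict view as a list of (key, value) tuples first
df_pairs = pd.DataFrame(list(test_dict.items()), columns=["col1", "col2"])

# Alternative: go through a Series, whose index (the dict keys) becomes col1
df_series = pd.Series(test_dict, name="col2").rename_axis("col1").reset_index()

print(df_pairs)
```

Both give the same two-column frame with one row per dict entry.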
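import json
import pandas as pd
# Open the JSON file and parse it into a Python dictionary
with open('data.json') as f:
    data1 = json.load(f)
# read_json expects a path or JSON text, not an already-parsed dict,
# so build the DataFrame from the dict directly
df = pd.DataFrame.from_dict(data1, orient='index')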
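To see what `orient='index'` does with an already-loaded dict, here is a small sketch; an `io.StringIO` with made-up content stands in for `data.json`.

```python
import io
import json

import pandas as pd

# Stand-in for data.json: top-level keys are row labels (made-up sample)
raw = io.StringIO('{"row1": {"a": 1, "b": 2}, "row2": {"a": 3, "b": 4}}')

data1 = json.load(raw)  # a plain dict after loading

# orient="index": outer keys become the index, inner keys become the columns
df = pd.DataFrame.from_dict(data1, orient="index")
print(df)
```

With the default `orient="columns"`, the same dict would be transposed: `row1`/`row2` would become the columns instead.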
