How to name the output dataframe generated by pandas-ply - python

How can I name the data frame generated by the following code?
import re
import os
import csv
import codecs
import numpy as np
import matplotlib as p
import pdb
import pandas as pd
from pandas_ply import install_ply, X, sym_call
install_ply(pd)
(data_merged
    .groupby('index')
    .ply_select(
        count = X.index.count(),
        p_avg = X.item_price.mean()
    ))

Looking at your example, I assume that you mean to name the output variable of the dataframe, which would be data_out in the following:
data_out = (data_merged
    .groupby('index')
    .ply_select(
        count = X.index.count(),
        p_avg = X.item_price.mean()
    )
)
Note that this is not actually giving the DataFrame a name; it just binds the result to the variable data_out. You could bind a second variable to the same object, and both names would refer to the same DataFrame, because Python variables are references to objects rather than copies of them.
A Series is named, and its name is stored in its name attribute. A DataFrame itself is not named, but its columns are, since each column is a Series.
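For illustration, the distinction looks like this (a minimal sketch with made-up data):

```python
import pandas as pd

# A Series carries its own name in the .name attribute.
s = pd.Series([1, 2, 3], name="item_price")
print(s.name)                    # item_price

# A DataFrame has no .name of its own, but each of its columns does,
# because each column is a Series.
df = pd.DataFrame({"item_price": [9.5, 3.2], "count": [2, 1]})
print(df["item_price"].name)     # item_price
```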

Related

How to change the attribute values in h5ad file?

There is an attribute in the h5ad file called var_names. I converted the values of var_names to lowercase. Now I want to save/rewrite the new values to the var_names attribute in the h5ad file. How can I do that?
#ipynb file
import scanpy as sc
import anndata as ad
import pandas as pd
import numpy as np
file1 = sc.read_h5ad('/Users/nitish/Downloads/human_all.h5ad')
file2 = sc.read_h5ad('/Users/nitish/Downloads/mouse_all.h5ad')
file1.var_names.str.lower()
file2.var_names.str.lower()
file1.var_names = file1.var_names.str.lower()
file2.var_names = file2.var_names.str.lower()
How to save new file1.var_name to the existing h5ad file?

Iterate through df and update based on prediction

I am not a Python programmer, so I am struggling with the following:
def py_model(df):
    import pickle
    import pandas as pd
    import numpy as np
    from pandas import Series, DataFrame
    filename = 'C:/aaaTENNIS-DATA/votingC.pkl'
    loaded_model = pickle.load(open(filename, 'rb'))
    for index, row in df.iterrows():
        ab = row[['abc', 'def', 'ghi', 'jkl']]
        input = np.array(ab)
        df['Prediction'] = pd.DataFrame(loaded_model.predict([input]))
        df['AccScore'] = ??
    return df
For each row of the dataframe, I wish to get a prediction and put it in df['Prediction'] and also get the model score and put it in another field.
You don't need to iterate
import pickle
filename = 'C:/aaaTENNIS-DATA/votingC.pkl'
loaded_model = pickle.load(open(filename,'rb'))
df['Prediction'] = loaded_model.predict(df[['abc','def','ghi','jkl']])
Tip #1: don't use input as a variable name; it shadows a built-in function in Python: https://docs.python.org/3/library/functions.html#input
Tip #2: don't put import statements in a function; put them all at the beginning of your file
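The AccScore column was not addressed above. If the pickled model is a scikit-learn classifier that supports predict_proba, one option is to store the probability of the predicted class per row, again without iterating. A hedged sketch with a toy fitted classifier standing in for votingC.pkl (the column names are the asker's):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the pickled model: any fitted classifier exposing
# predict() and predict_proba() works the same way.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

df = pd.DataFrame(rng.normal(size=(5, 4)), columns=['abc', 'def', 'ghi', 'jkl'])

# Vectorised: one call over all rows, no iterrows loop needed.
features = df[['abc', 'def', 'ghi', 'jkl']]
df['Prediction'] = model.predict(features)
# Probability of the class the model actually predicted, per row.
df['AccScore'] = model.predict_proba(features).max(axis=1)
```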

pandas read_csv add attributes by stdin issue

I want to add a new column to the dataframe. The new column depends on some rules.
This is my code:
#!/usr/bin/python3.6
# coding=utf-8
import sys
import pandas as pd
import numpy as np
import io
import csv
df = pd.read_csv(sys.stdin,sep=',',encoding='utf-8',engine="python")
col_0 = check
df['df_cal'] = df.groupby(col_0)[col_0].transform('count')
df['status'] = np.where(df['df_cal'] > 1, 'change', 'New')
df = df.drop_duplicates(subset=df.columns.difference(['keep']), keep=False)
df = df[(df.keep == '2')]
df.drop(['keep','df_cal'],axis = 1,inplace = True)
# print(sys.stdin)
df.to_csv(sys.stdout,encoding='utf-8',index = None)
sample csv:
VIP_number,keep
ab1,1
ab1,2
ab2,2
ab3,1
When I try to run this code, I write the command like this:
python3.6 nifi_python.py < test.csv check = VIP_number
and I get the error:
name 'check' is not defined
This still does not work because I don't know how to pass the column name for col_0 alongside stdin. col_0 should be 'VIP_number'. I don't want to hardcode the column name because the script will be reused later with different columns.
How can I add a new column in the dataframe by stdin?
Any help would be very much appreciated.
#!/usr/bin/python3.6
# coding=utf-8
import sys
import pandas as pd
import numpy as np
import io
import csv
if len(sys.argv) < 2:
    print("Usage: nifi_python.py check=<column>")
    sys.exit(1)
df = pd.read_csv(sys.stdin,sep=',',encoding='utf-8',engine="python")
col_0 = sys.argv[1].split('=')[1]
...
python nifi_python.py check=VIP_number < test.csv
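An equivalent sketch using argparse, which validates the argument and prints the usage message for you (the flag name check and the sample data are taken from the question; the stdin stream is replaced by an in-memory buffer so the sketch is self-contained):

```python
import argparse
import io
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument('--check', required=True,
                    help='column to count duplicates on')
# Normally you would call parser.parse_args() so argv is read from the
# command line; here the value is supplied directly for demonstration.
args = parser.parse_args(['--check', 'VIP_number'])

# Stand-in for pd.read_csv(sys.stdin, ...) using the question's sample csv.
csv_text = io.StringIO("VIP_number,keep\nab1,1\nab1,2\nab2,2\nab3,1\n")
df = pd.read_csv(csv_text)
df['df_cal'] = df.groupby(args.check)[args.check].transform('count')
print(df)
```

Invocation would then look like: python3.6 nifi_python.py --check VIP_number < test.csv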

Trouble using classes to call an instance on a dataframe object

Newbie at dealing with classes.
I have some dataframe objects I want to transform, but I'm having trouble manipulating them with classes. Below is an example. The goal is to transpose a dataframe and reassign it to its original variable name. In this case, the dataframe is assets.
import pandas as pd
from requests import get
import numpy as np

html = get("https://www.cbn.gov.ng/rates/Assets.asp").text
table = pd.read_html(html, skiprows=[0, 1])[2]
assets = table[1:13]

class Array_Df_Retitle:
    def __init__(self, df):
        self.df = df

    def change(self):
        self.df = self.df.transpose()
        self.df.columns = self.df[0]
        return self.df
However, calling assets = Array_Df_Retitle(assets).change() simply yields an error:
KeyError: 0
I'd like to know where I'm getting things wrong.
I made a few changes to your code. The problem comes from self.df[0]: that expression selects the column named 0, but after transposing you no longer have any column named 0; you have a row with that label instead.
import pandas as pd
from requests import get
import numpy as np

html = get("https://www.cbn.gov.ng/rates/Assets.asp").text
table = pd.read_html(html, skiprows=[0, 1])[2]
assets = table[1:13]

class Array_Df_Retitle:
    def __init__(self, df):
        self.df = df

    def change(self):
        self.df = self.df.dropna(how='all').transpose()
        self.df.columns = self.df.loc[0, :]
        return self.df.drop(0).reset_index(drop=True)
Array_Df_Retitle(assets).change()
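The row-to-header idiom behind .loc[0, :] can be seen on a small in-memory frame, without the network call (the data here is made up; integer column labels mimic what pd.read_html produces when no header row is parsed):

```python
import pandas as pd

# Columns are integers 0..2, as read_html yields without a parsed header.
table = pd.DataFrame([['Item', 'Cash', 'Bonds'],
                      ['Jan',  10,     20],
                      ['Feb',  30,     40]])

t = table.transpose()       # rows are now labeled 0..2 (the old column labels)
t.columns = t.loc[0, :]     # promote the first transposed row to the header
t = t.drop(0).reset_index(drop=True)   # drop that row, renumber the rest
print(t)
```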

How do I save my list to a dataframe keeping empty rows?

I'm trying to extract subject-verb-object triplets and then attach an ID. I am using a loop, so my list of extracted triplets keeps empty results for the rows where no triplet was found. It looks like:
[]
[trump,carried,energy]
[]
[clinton,doesn't,trust]
When I print mylist it looks as expected.
However, when I try to create a dataframe from mylist, I get an error caused by the empty rows:
`IndexError: list index out of range`
I tried to include an if statement to avoid this but the problem is the same. I also tried using reindex instead but the df2 came out empty.
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import spacy
import textacy
import csv, string, re
import numpy as np
import pandas as pd

# Import csv file with pre-processing already carried out
df = pd.read_csv("pre-processed_file_1.csv", sep=",")

# Prepare dataframe to be relevant columns and unicode
df1 = df[['text_1', 'id']].copy()

import StringIO
s = StringIO.StringIO()
tweets = df1.to_csv(encoding='utf-8')

nlp = spacy.load('en')
count = 0
df2 = pd.DataFrame()
for row in df1.iterrows():
    doc = nlp(unicode(row))
    text_ext = textacy.extract.subject_verb_object_triples(doc)
    tweetID = df['id'].tolist()
    mylist = list(text_ext)
    count = count + 1
    if mylist:
        df2 = df2.append(mylist, ignore_index=True)
    else:
        df2 = df2.append('0', '0', '0')
Any help would be very appreciated. Thank you!
You're supposed to pass a DataFrame-shaped object to append; passing the raw data doesn't work. So: df2 = df2.append([['0','0','0']], ignore_index=True)
You can also wrap your processing in a function process_row, then do df2 = pd.DataFrame([process_row(row) for row in df1.iterrows()]). Note that while append won't work with empty rows, the DataFrame constructor just fills them in with None. If you want empty rows to be ['0','0','0'], you have several options:
- Have your processing function return ['0','0','0'] for empty rows
- Change the list comprehension to [process_row(row) if process_row(row) else ['0','0','0'] for row in df1.iterrows()]
- Do df2 = df2.fillna('0')
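A small self-contained sketch of the constructor-plus-fillna approach, with made-up triplets matching the question's example:

```python
import pandas as pd

# Rows where extraction found nothing come back as empty lists.
rows = [[], ['trump', 'carried', 'energy'], [], ['clinton', "doesn't", 'trust']]

# The DataFrame constructor pads short/empty rows with None/NaN...
df2 = pd.DataFrame(rows)
df2.columns = ['subject', 'verb', 'object']
# ...which fillna can then replace with '0'.
df2 = df2.fillna('0')
print(df2)
```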
