How to make this DataFrame-updating loop faster? - python

list_nn = [k for k in list(df['job_keyword'].unique()) if not str(k).isdigit()]
i = 0
for k in list_nn:
    df.loc[df.job_keyword == k, 'job_keyword'] = i
    df.loc[df.user_keyword == k, 'user_keyword'] = i
    i += 1
It loops through the DataFrame column and, wherever a value matches the keyword, replaces it with a number.
It takes more than 3 minutes; is there a way to make this faster?
It scans the entire DataFrame on every iteration, which is what I want to avoid.

The loop is slow because every keyword triggers two full scans of the DataFrame. Instead, you can build the keyword-to-number mapping once and apply it in a single vectorized pass with replace:
mapping = {k: i for i, k in enumerate(list_nn)}
df['job_keyword'] = df['job_keyword'].replace(mapping)
df['user_keyword'] = df['user_keyword'].replace(mapping)
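A minimal runnable sketch of the mapping idea, with toy data (the column names come from the question; the keyword values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'job_keyword': ['python', 'sql', '123', 'python'],
    'user_keyword': ['sql', 'python', 'sql', '999'],
})

# non-numeric keywords, as in the question
list_nn = [k for k in df['job_keyword'].unique() if not str(k).isdigit()]

# one dict lookup per cell instead of one full DataFrame scan per keyword
mapping = {k: i for i, k in enumerate(list_nn)}
df['job_keyword'] = df['job_keyword'].replace(mapping)
df['user_keyword'] = df['user_keyword'].replace(mapping)
```

Values not present in the mapping (like the purely numeric ones) are left untouched by replace.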

Related

Python - I have a list of 9 DataFrames; I want to concatenate each group of 3 DataFrames

Input
mydfs= [df1,df2,df3,df4,df5,df6,df7,df8,df9]
My Code
import pandas as pd
df_1 = pd.concat([mydfs[0],mydfs[1],mydfs[2]])
df_m = df_1.merge(mydfs[2])
df_2 = pd.concat([mydfs[3],mydfs[4],mydfs[5]])
df_m1 = df_2.merge(mydfs[5])
df_3 = pd.concat([mydfs[6],mydfs[7],mydfs[8]])
df_m2 = df_3.merge(mydfs[8])
But I want to write this dynamically instead of doing it manually.
Is it possible using a for loop? The list of DataFrames may grow in the future.
You can use a dictionary comprehension:
N = 3
out_dfs = {f'df_{i//N+1}': pd.concat(mydfs[i:i+N])
for i in range(0, len(mydfs), N)}
output:
{'df_1': <concatenation result of ['df1', 'df2', 'df3']>,
'df_2': <concatenation result of ['df4', 'df5', 'df6']>,
'df_3': <concatenation result of ['df7', 'df8', 'df9']>,
}
You can also use a loop with globals() to iterate through mydfs and define two numbered variables each round (note that creating variables dynamically like this is usually discouraged; a dictionary, as above, is cleaner):
i = 0
k = 1
while i < len(mydfs):
    globals()["df_{}".format(k)] = pd.concat([mydfs[i], mydfs[i+1], mydfs[i+2]])
    globals()["df_m{}".format(k)] = globals()["df_{}".format(k)].merge(mydfs[i+2])
    i = i + 3
    k = k + 1
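A runnable sketch of the dictionary approach, extended to cover the merge step from the question; the nine DataFrames here are toy stand-ins for df1..df9:

```python
import pandas as pd

mydfs = [pd.DataFrame({'a': [i, i + 1]}) for i in range(9)]  # stand-ins for df1..df9
N = 3

# grouped concatenations: df_1, df_2, df_3
out_dfs = {f'df_{i//N + 1}': pd.concat(mydfs[i:i+N], ignore_index=True)
           for i in range(0, len(mydfs), N)}

# merged versions, mirroring df_m, df_m1, df_m2 from the question:
# each group is merged with the last DataFrame of that group
merged = {f'df_m{i//N + 1}': pd.concat(mydfs[i:i+N]).merge(mydfs[i+N-1])
          for i in range(0, len(mydfs), N)}
```

If the list grows to 12 or 15 DataFrames later, the same two comprehensions keep working unchanged.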

I get a runtime error when doing the Kattis oddmanout challenge

Hi, I'm new to Kattis. I've done the "oddmanout" assignment and it works when I run it locally, but I get a runtime error when submitting it via Kattis. I'm not sure why?
from collections import Counter
cases = int(input())
i = 0
case = 0
while cases > i:
    list = []
    i = 1 + i
    case = case + 1
    guests = int(input())
    f = 0
    while f < guests:
        f = f + 1
        invitation_number = int(input())
        list.append(invitation_number)
    d = Counter(list)
    res = [k for k, v in d.items() if v == 1]
    resnew = str(res)[1:-1]
    print(f'Case#{case}: {resnew}')
Looking at the input data on Kattis: invitation_number = int(input()) tries to read a single integer, but the third line of the input contains the whole list of invitation numbers at once, so int() raises a ValueError.
With invitation_numbers = list(map(int, input().split())), or alternatively invitation_numbers = [int(x) for x in input().split()], you get the desired format directly.
You may have to rework your approach afterwards, since the second while loop is no longer needed. You also don't strictly need a Counter: sorting the list and comparing neighbouring entries pairwise would find the odd one out as well.
Additionally, try to avoid naming variables after built-in types (list = ...), since that shadows the built-in list.
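A sketch of a corrected solution along those lines, written as a function over input lines so it can be tested without stdin (the exact "Case #k:" output format should be double-checked against the problem statement):

```python
from collections import Counter

def odd_man_out(lines):
    """For each test case, find the invitation number that appears only once."""
    it = iter(lines)
    cases = int(next(it))
    results = []
    for case in range(1, cases + 1):
        next(it)                       # number of guests; unused once we split the line
        numbers = [int(x) for x in next(it).split()]
        counts = Counter(numbers)
        odd = next(k for k, v in counts.items() if v == 1)
        results.append(f'Case #{case}: {odd}')
    return results
```

For an actual submission, the lines would come from sys.stdin instead of a list.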

How can I write a method or a for loop for very similar code pieces

I have a code sample below. The code works, but it isn't clean and takes too many lines. I believe it can be reduced with a method or a for loop, but I couldn't figure out how. The pieces are about 90% identical; only the variable names change. I've only included 2 of the pieces here, but my code has 5 pieces just like this.
#KFOLD-1
all_fold_X_1 = pd.DataFrame(columns=['Sentence_txt'])
index = 0
for k, i in enumerate(dfNew['Sentence_txt'].values):
    if k in kFoldsTrain1:
        all_fold_X_1 = all_fold_X_1.append({index:i}, ignore_index=True)
X_train1 = count_vect.fit_transform(all_fold_X_1[0].values)
Y_train1 = [i for k,i in enumerate(dfNew['Sentence_Polarity'].values) if k in kFoldsTrain1]
Y_train1 = np.asarray(Y_train1)
#KFOLD-2
all_fold_X_2 = pd.DataFrame(columns=['Sentence_txt'])
index = 0
for k, i in enumerate(dfNew['Sentence_txt'].values):
    if k in kFoldsTrain2:
        all_fold_X_2 = all_fold_X_2.append({index:i}, ignore_index=True)
X_train2 = count_vect.fit_transform(all_fold_X_2[0].values)
Y_train2 = [i for k,i in enumerate(dfNew['Sentence_Polarity'].values) if k in kFoldsTrain2]
Y_train2 = np.asarray(Y_train2)
A full example hasn't been provided, so I'm making some assumptions. Perhaps something along these lines:
def train(kFoldsTrain, dfNew):
    ret = {}
    index = 0
    dataVar = pd.DataFrame(columns=['Sentence_txt'])
    for k, i in enumerate(dfNew['Sentence_txt'].values):
        if k in kFoldsTrain:
            dataVar = dataVar.append({index: i}, ignore_index=True)
    ret['x'] = count_vect.fit_transform(dataVar[0].values)
    ret['y'] = np.asarray([i for k, i in enumerate(dfNew['Sentence_Polarity'].values)
                           if k in kFoldsTrain])
    return ret
#KFOLD-1
kfold1 = train(kFoldsTrain1, dfNew)
#KFOLD-2
kfold2 = train(kFoldsTrain2, dfNew)
You perhaps get the idea. You may not need the second argument in the function dependent on if the variable 'dfNew' is global. I'm also far from a Python expert! ;)
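A runnable sketch of the same refactoring idea with stand-in data; count_vect is dropped here so the example has no scikit-learn dependency, and the fold index sets are invented for illustration:

```python
import numpy as np
import pandas as pd

dfNew = pd.DataFrame({
    'Sentence_txt': ['good movie', 'bad plot', 'great cast', 'dull script'],
    'Sentence_Polarity': [1, 0, 1, 0],
})
folds = [{0, 1}, {2, 3}]  # hypothetical kFoldsTrain1, kFoldsTrain2

def make_fold(kFoldsTrain, df):
    """Collect the rows whose index is in the fold, as (texts, labels)."""
    mask = df.index.isin(list(kFoldsTrain))
    return (df.loc[mask, 'Sentence_txt'].tolist(),
            np.asarray(df.loc[mask, 'Sentence_Polarity']))

X_train1, Y_train1 = make_fold(folds[0], dfNew)
X_train2, Y_train2 = make_fold(folds[1], dfNew)
```

With five folds, the repeated blocks collapse into one call per fold, or a loop over the fold list.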

iterate over several collections in parallel

I am trying to create a list of objects (from a class defined earlier) through a loop. The structure looks something like:
ticker_symbols = ["AZN", "AAPL", "YHOO"]
stock_list = []
for i in ticker_symbols:
stock = Share(i)
pe = stock.get_price_earnings_ratio()
ps = stock.get_price_sales()
stock_object = Company(pe, ps)
stock_list.append(stock_object)
I would, however, like to add one more attribute to the Company objects (stock_object) through the loop. The attribute would be a value from another list, e.g. (arbitrary numbers) [5, 10, 20], where the first value goes to the first object, the second to the second object, and so on. Is it possible to do something like:
for i, j in ticker_symbols, list2:
    #dostuff
? I could not get this sort of loop to work on my own. Thankful for any help.
I believe all you have to do is change the for loop.
Instead of "for i in ticker_symbols:" you can loop like
"for i in range(len(ticker_symbols)):" and then use the index i to access the second list as well.
ticker_symbols = ["AZN", "AAPL", "YHOO"]
stock_list = []
for i in range(len(ticker_symbols)):
    stock = Share(ticker_symbols[i])
    pe = stock.get_price_earnings_ratio()
    ps = stock.get_price_sales()
    # And then you can take the extra value from the second list
    px = list2[i]
    stock_object = Company(pe, ps, px)
    stock_list.append(stock_object)
Some people say that iterating by index is not good practice, but it works fine here.
Try:
for i, j in zip(ticker_symbols, list2):
Or
for k, i in enumerate(ticker_symbols):
    j = list2[k]
Equivalently:
for index in range(len(ticker_symbols)):
    i = ticker_symbols[index]
    j = list2[index]
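A minimal runnable sketch of the zip pattern; Share and Company are replaced by a hypothetical stand-in class since their real definitions aren't shown:

```python
class Company:
    """Stand-in for the asker's Company class, extended with one extra attribute."""
    def __init__(self, ticker, extra):
        self.ticker = ticker
        self.extra = extra

ticker_symbols = ["AZN", "AAPL", "YHOO"]
list2 = [5, 10, 20]

# zip pairs the i-th ticker with the i-th extra value
stock_list = [Company(ticker, extra)
              for ticker, extra in zip(ticker_symbols, list2)]
```

zip stops at the shorter of the two lists, which silently drops items if the lists get out of sync; in Python 3.10+ you can pass strict=True to make that an error instead.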

looping dictionaries of {tuple:NumPy.array}

I have a set of dictionaries of the form {(i, j): numpy.array} whose arrays I want to loop over for a certain evaluation.
I made the dictionaries as follows:
datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
for i in range(len(datarr)): exec(datarr[i] + '={}')
so I can always change the set of data I want to evaluate in my bigger body of code by changing the original list of strings. However, this means I have to access my dictionaries as eval(k) for k in datarr.
As a result, the loop I want to do looks like this for the moment:
for i in filarr:
    for j in buiarr:
        for l in datarrdif:
            a = eval(l)[(i, j)]
            a[abs(a) < .01] = float('NaN')
            eval(l).update({(i, j): a})
but is there a much nicer way to write this? I tried the following, but it didn't work:
[eval(l)[(i, j)][abs(eval(l)[(i, j)])<.01 for i in filarr for j in buiarr for k in datarrdiff] = float('NaN')
Thanks in advance
datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
for i in range(len(datarr)): exec(datarr[i] + '={}')
Why don't you create them as a dictionary of dictionaries?
datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
data = dict((name, {}) for name in datarr)
Then you can avoid all the eval().
for i in filarr:
    for j in buiarr:
        for l in datarr:
            a = data[l][(i, j)]
            np.putmask(a, np.abs(a) < .01, np.nan)
            data[l].update({(i, j): a})
or probably just (note the values of data are themselves dictionaries, so you need a second level to reach the arrays):
for d in data.values():
    for arr in d.values():
        np.putmask(arr, np.abs(arr) < .01, np.nan)
if you want to set all elements of all dictionary values where abs(element) < .01 to NaN .
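A runnable sketch of the dictionary-of-dictionaries layout with toy arrays (the key tuples and values are invented for illustration):

```python
import numpy as np

datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
data = {name: {} for name in datarr}
data['PowUse'][(0, 0)] = np.array([0.005, 1.5, -0.002])
data['PowHea'][(0, 1)] = np.array([-0.5, 0.001])

# set every element with abs(element) < .01 to NaN, in place
for d in data.values():
    for arr in d.values():
        np.putmask(arr, np.abs(arr) < .01, np.nan)
```

Because putmask modifies the arrays in place, no update() call is needed afterwards.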
