Maximum length of an expression evaluated by eval() in Python

Consider the example
a = "( False or False ) and not ( False and True and False ) and not ( False and True and False ) "
print eval(a)
b = "( False or False or False or False or False or False or True or False or False or False or False or False or False or False or False or False or False or False ) and not False and not False and not ( False and False ) and not ( False and False ) and not ( False and False ) and not ( False and False ) and not ( False and False ) and not ( False and False ) and not ( False and True ) and not ( False and False ) and not ( False and False ) and not ( False and False ) and not ( False and False ) and not ( False and False) and not False"
print eval(b)
The first one gives the proper output, but the second, even though its syntax is correct, gives
SyntaxError: EOL while scanning string literal
apparently because of its length. I need to evaluate large expressions in my program. Any suggestions?

Try to find the limit empirically:
b = 'False or False'
while True:
    try:
        b = b + b[5:]
        print len(b), eval(b)
    except:
        print len(b)
        break
I stopped it at len(b) == 288 MiB without hitting an error, so length alone does not seem to be the limit. Interestingly, Python used up to 5.5 GiB of RAM at the 288 MiB level.
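As a cross-check in Python 3 syntax (the thread itself uses Python 2 print statements), the sketch below evaluates a flat expression of a few thousand terms and then compiles a source line whose string literal is broken by a line break. It rests on an assumption the thread does not confirm: that the long literal in the question was wrapped across lines in the asker's file, which is a common cause of an "EOL while scanning string literal" error. The names long_expr and broken_source are made up for illustration.

```python
# Build a flat boolean expression far longer than the one in the question
# and evaluate it; length alone does not raise an error here.
terms = ["False"] * 5000 + ["True"]
long_expr = " or ".join(terms)
long_result = eval(long_expr)
print(len(long_expr), long_result)

# Assumption: the asker's literal was split across two physical lines.
# Such a source line is rejected by the parser before eval() ever runs.
broken_source = 'b = "( False or False )\nand not False"\n'
try:
    compile(broken_source, "<example>", "exec")
    broken_error = None
except SyntaxError as exc:
    broken_error = exc
print(type(broken_error).__name__)
```

If a long expression has to span several lines in a source file, writing it as adjacent string literals inside parentheses keeps each literal on one line.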

Related

Sklearn KernelDensity gives identical results for two different models

I'm having trouble with KernelDensity from sklearn. I put in two completely different arrays to create two different models, but the two models have identical results (scores). They should have different results for different models, shouldn't they?
Here's my code:
from sklearn.neighbors import KernelDensity
import numpy as np
kde = KernelDensity(kernel="gaussian", bandwidth=15)
def reproducible_example():
    X1 = np.array([9,18,28,35,54,59,65,83,89,116,119,124,144])
    X2 = np.array([39,51,57,61,66,81,88,103,120,126,130,132,134])
    model1 = kde.fit(X1[:, np.newaxis])
    model2 = kde.fit(X2[:, np.newaxis])
    X_plot = np.linspace(0, 129, 130)
    score1 = model1.score_samples(X_plot[:, np.newaxis])
    score2 = model2.score_samples(X_plot[:, np.newaxis])
    print(np.exp(score1) == np.exp(score2))
reproducible_example()
The expected output is False, as the two different models should return two different scores. Instead, the output is this:
[ True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True]
Indicating that the two results are identical. How is that possible?
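A likely explanation, which the thread itself does not give: scikit-learn's fit() returns the estimator it was called on, so model1 and model2 are both the single kde object, and that object was last fitted on X2. The sketch below reproduces the effect with a hypothetical ToyEstimator class rather than sklearn, so it only illustrates the fit-returns-self pattern, not KernelDensity itself.

```python
class ToyEstimator:
    """Hypothetical stand-in that mimics the fit-returns-self convention."""

    def fit(self, data):
        # Only the most recent data is remembered.
        self.mean_ = sum(data) / len(data)
        return self

    def score(self):
        return self.mean_


shared = ToyEstimator()
model1 = shared.fit([9, 18, 28])
model2 = shared.fit([39, 51, 57])  # refits the very same object
print(model1 is model2, model1.score() == model2.score())    # True True

# Separate instances keep separate state.
model_a = ToyEstimator().fit([9, 18, 28])
model_b = ToyEstimator().fit([39, 51, 57])
print(model_a is model_b, model_a.score() == model_b.score())  # False False
```

Under that reading, creating a second KernelDensity(kernel="gaussian", bandwidth=15) for X2 should give the two models different scores.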

The pandas function any() doesn't return the result I want

I have the following DataFrame
import pandas as pd
df = pd.DataFrame(
    {
        'class': ['0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0'],
        'item': ['1','1','2','2','2','3','3','3','3','3','4','4','5','5','5','5','5','5','5'],
        'last_PO_code': ['103','103','103','104','103','103','104','105','106','103','103','104','103','103','104','105','105','106','1046'],
        'qty': [3,4,3,3,2,4,4,3,3,3,5,5,2,6,8,2,6,2,6],
    }
)
I apply the following rules for each unique item in the item column to this DataFrame:
last_PO_code has '103' only.
last_PO_code has ('103' & '104') and (qty column of '103' > qty column of '104')
last_PO_code has ('103' & '104' & '105' & '106') and (qty column of '105' == qty column of '106') and (qty column of '103' > qty column of '104')
last_PO_code does not have '103'
last_PO_code has ('103' & '104') and (qty column of '103' == qty column of '104')
last_PO_code has ('103' & '104' & '105' & '106') and (qty column of '105' == qty column of '106') and (qty column of '103' == qty column of '104')
I wrote the following code, but the result is not what I want.
regle1 = lambda x: True if x['last_PO_code'].eq('103').all() else False
regle2 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('103').sum() > x['last_PO_code'].eq('104').sum() \
    else False
regle3 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('105').any() \
    and x['last_PO_code'].eq('106').any() \
    and x['last_PO_code'].eq('103').sum() > x['last_PO_code'].eq('104').sum() \
    and x['last_PO_code'].eq('105').sum() == x['last_PO_code'].eq('106').sum() \
    else False
regle4 = lambda x: False if x['last_PO_code'].eq('103').any() else True
regle5 = lambda x: True if (x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any()) \
    and x['last_PO_code'].eq('103').sum() == x['last_PO_code'].eq('104').sum() \
    else False
regle6 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('105').any() \
    and x['last_PO_code'].eq('106').any() \
    and x['last_PO_code'].eq('103').sum() == x['last_PO_code'].eq('104').sum() \
    and x['last_PO_code'].eq('105').sum() == x['last_PO_code'].eq('106').sum() \
    else False
df2 = df.groupby(['class','item']).apply(lambda x: pd.Series({
    'regle1': regle1(x),
    'regle2': regle2(x),
    'regle3': regle3(x),
}))
Only regle1 does what I want for all items. For me the problem comes from the any() function. Either I use it badly or I don't understand it well.
What I have:
            regle1  regle2  regle3  regle4  regle5  regle6
class item
0     1       True   False   False   False   False   False
      2      False    True   False   False   False   False
      3      False    True    True   False   False   False
      4      False   False   False   False    True   False
      5      False    True    True   False   False   False
What I want:
            regle1  regle2  regle3  regle4  regle5  regle6
class item
0     1       True   False   False   False   False   False
      2      False    True   False   False   False   False
      3      False    True    True   False   False   False
      4      False   False   False   False    True   False
      5      False   False   False   False    True    True
All the mistakes I noticed were on item 5, but I don't understand why
The problem is that you are counting the rows that match each 'last_PO_code' instead of summing their 'qty'. In each lambda, you need:
(x['last_PO_code'].eq('103')*x['qty']).sum()
or as mozway suggested, even better:
x.loc[x['last_PO_code'].eq('103'), 'qty'].sum()
instead of:
x['last_PO_code'].eq('103').sum()
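To see the difference on the rows of item '5', here is a small plain-Python check, deliberately stripped of pandas so the arithmetic can be followed by hand. The helper names row_count and qty_total are invented for this illustration, and the last code is written as '106' on the assumption that '1046' in the question is a typo.

```python
# Rows of item '5' from the question, as (last_PO_code, qty) pairs.
# Assumption: the final code is '106', not '1046'.
item5 = [("103", 2), ("103", 6), ("104", 8), ("105", 2),
         ("105", 6), ("106", 2), ("106", 6)]


def row_count(rows, code):
    """What x['last_PO_code'].eq(code).sum() measures: matching rows."""
    return sum(1 for c, _ in rows if c == code)


def qty_total(rows, code):
    """What the corrected expression measures: summed qty of matching rows."""
    return sum(q for c, q in rows if c == code)


print(row_count(item5, "103"), row_count(item5, "104"))  # 2 1
print(qty_total(item5, "103"), qty_total(item5, "104"))  # 8 8
```

Counting rows gives 2 > 1, which is why regle2 came out True for item 5, while the quantities are actually equal (8 and 8), which is what regle5 should detect.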
The whole code:
regle1 = lambda x: True if x['last_PO_code'].eq('103').all() else False
regle2 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and (x['last_PO_code'].eq('103') * x['qty']).sum() > (x['last_PO_code'].eq('104') * x['qty']).sum() \
    else False
regle3 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('105').any() \
    and x['last_PO_code'].eq('106').any() \
    and (x['last_PO_code'].eq('103')*x['qty']).sum() > (x['last_PO_code'].eq('104')*x['qty']).sum() \
    and (x['last_PO_code'].eq('105')*x['qty']).sum() == (x['last_PO_code'].eq('106')*x['qty']).sum() \
    else False
regle4 = lambda x: False if x['last_PO_code'].eq('103').any() else True
regle5 = lambda x: True if (x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any()) \
    and (x['last_PO_code'].eq('103')*x['qty']).sum() == (x['last_PO_code'].eq('104')*x['qty']).sum() \
    else False
regle6 = lambda x: True if x['last_PO_code'].eq('103').any() \
    and x['last_PO_code'].eq('104').any() \
    and x['last_PO_code'].eq('105').any() \
    and x['last_PO_code'].eq('106').any() \
    and (x['last_PO_code'].eq('103')*x['qty']).sum() == (x['last_PO_code'].eq('104')*x['qty']).sum() \
    and (x['last_PO_code'].eq('105')*x['qty']).sum() == (x['last_PO_code'].eq('106')*x['qty']).sum() \
    else False
df2 = df.groupby(['class','item']).apply(lambda x: pd.Series({
    'regle1': regle1(x),
    'regle2': regle2(x),
    'regle3': regle3(x),
    'regle4': regle4(x),
    'regle5': regle5(x),
    'regle6': regle6(x),
}))
#             regle1  regle2  regle3  regle4  regle5  regle6
# class item
# 0     1       True   False   False   False   False   False
#       2      False    True   False   False   False   False
#       3      False    True    True   False   False   False
#       4      False   False   False   False    True   False
#       5      False   False   False   False    True    True
PS. At this point it may be time to use normal functions instead of lambdas, to get cleaner code :D. You also have repeated chunks of code in your lambdas, which could easily be factored out.
PS2. I assumed that your example data has a typo (there should be '106' instead of '1046').
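Following up on the PS, here is one possible way to replace the lambdas with normal functions. It is only a sketch: the helpers qty_of, has_all and rules are names made up here, the data uses '106' in place of '1046' per PS2, and the two needed columns are selected before apply() so the result does not depend on how a given pandas version treats the grouping columns.

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['0'] * 19,
    'item': ['1','1','2','2','2','3','3','3','3','3','4','4','5','5','5','5','5','5','5'],
    # Assumption: the last code is '106', not '1046' as in the question.
    'last_PO_code': ['103','103','103','104','103','103','104','105','106','103',
                     '103','104','103','103','104','105','105','106','106'],
    'qty': [3,4,3,3,2,4,4,3,3,3,5,5,2,6,8,2,6,2,6],
})


def qty_of(group, code):
    """Total qty of the rows whose last_PO_code equals `code`."""
    return group.loc[group['last_PO_code'].eq(code), 'qty'].sum()


def has_all(group, codes):
    """True if every code in `codes` occurs in the group."""
    return set(codes) <= set(group['last_PO_code'])


def rules(group):
    """Evaluate the six rules for one (class, item) group."""
    q = {code: qty_of(group, code) for code in ('103', '104', '105', '106')}
    pair = has_all(group, ['103', '104'])
    quad = has_all(group, ['103', '104', '105', '106'])
    return pd.Series({
        'regle1': bool(group['last_PO_code'].eq('103').all()),
        'regle2': bool(pair and q['103'] > q['104']),
        'regle3': bool(quad and q['103'] > q['104'] and q['105'] == q['106']),
        'regle4': not has_all(group, ['103']),
        'regle5': bool(pair and q['103'] == q['104']),
        'regle6': bool(quad and q['103'] == q['104'] and q['105'] == q['106']),
    })


result = df.groupby(['class', 'item'])[['last_PO_code', 'qty']].apply(rules)
print(result)
```

Each quantity sum is computed once per group and reused, so adding or changing a rule only touches one line of the dictionary.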

Pandas dropna not working as expected

I have a dataframe and I used dropna() on it successfully as shown:
proc_train.isnull().any()
id False
perc_premium_paid_by_cash_credit False
age_in_days False
Income False
Count_3-6_months_late False
Count_6-12_months_late False
Count_more_than_12_months_late False
application_underwriting_score False
no_of_premiums_paid False
premium False
renewal False
sourcing_channel_B False
sourcing_channel_C False
sourcing_channel_D False
sourcing_channel_E False
Urban/Rural False
prem_to_inc_ratio False
late36_612 False
late36_12more False
late612_12more False
perc_times_prem False
Then I try to take a selection of the data to use as input variables:
X_train = proc_train.loc[:, proc_train.columns != 'renewal']
X_train = X.loc[:, X.columns != 'id']
but it then gives all the null values back:
X_train.isnull().any()
perc_premium_paid_by_cash_credit False
age_in_days False
Income False
Count_3-6_months_late True
Count_6-12_months_late True
Count_more_than_12_months_late True
application_underwriting_score True
no_of_premiums_paid False
premium False
sourcing_channel_B False
sourcing_channel_C False
sourcing_channel_D False
sourcing_channel_E False
Urban/Rural False
prem_to_inc_ratio False
late36_612 True
late36_12more True
late612_12more True
perc_times_prem False
Why does this happen and what would be a better way to run this?
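This line reads from X, a different DataFrame that presumably still contains the null values, rather than from X_train: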
X_train = X.loc[:, X.columns != 'id']
should be
X_train = X_train.loc[:, X_train.columns != 'id']
which produces the same all-False result for isnull().any() as before.
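For the "better way" part of the question, one option is DataFrame.drop(columns=...), which removes both columns in a single call and leaves no intermediate variable to mix up. The small frame below is invented stand-in data, not the asker's proc_train:

```python
import pandas as pd

# Hypothetical stand-in for the raw data, with one missing value.
raw = pd.DataFrame({
    "id": [1, 2, 3],
    "Income": [100.0, float("nan"), 300.0],
    "renewal": [1, 0, 1],
})
proc_train = raw.dropna()

# Drop both non-feature columns in one step, starting from the cleaned frame.
X_train = proc_train.drop(columns=["renewal", "id"])

print(X_train.isnull().any())
```

Because X_train is derived directly from proc_train, the rows removed by dropna() cannot reappear.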

How can I hide columns in Openpyxl?

I'm hiding a bunch of columns in an Excel sheet. I'm getting this error: AttributeError: can't set attribute from this line worksheet.column_dimensions['B'].visible = False
Sorry if this is a super simple question. I just updated to a new version of Openpyxl/Pandas, so I'm now having to go through my code and make changes to fit the new version's documentation.
worksheet.column_dimensions['B'].visible = False
worksheet.column_dimensions['D'].visible = False
worksheet.column_dimensions['E'].visible = False
worksheet.column_dimensions['F'].visible = False
worksheet.column_dimensions['G'].visible = False
worksheet.column_dimensions['H'].visible = False
worksheet.column_dimensions['I'].visible = False
worksheet.column_dimensions['K'].visible = False
worksheet.column_dimensions['L'].visible = False
worksheet.column_dimensions['M'].visible = False
worksheet.column_dimensions['N'].visible = False
worksheet.column_dimensions['O'].visible = False
worksheet.column_dimensions['P'].visible = False
worksheet.column_dimensions['Q'].visible = False
worksheet.column_dimensions['R'].visible = False
worksheet.column_dimensions['S'].visible = False
worksheet.column_dimensions['T'].visible = False
worksheet.column_dimensions['U'].visible = False
worksheet.column_dimensions['V'].visible = False
worksheet.column_dimensions['W'].visible = False
worksheet.column_dimensions['X'].visible = False
worksheet.column_dimensions['Y'].visible = False
worksheet.column_dimensions['Z'].visible = False
worksheet.column_dimensions['AA'].visible = False
worksheet.column_dimensions['AB'].visible = False
worksheet.column_dimensions['AC'].visible = False
worksheet.column_dimensions['AD'].visible = False
worksheet.column_dimensions['AE'].visible = False
worksheet.column_dimensions['AF'].visible = False
worksheet.column_dimensions['AG'].visible = False
worksheet.column_dimensions['AH'].visible = False
worksheet.column_dimensions['AI'].visible = False
worksheet.column_dimensions['AJ'].visible = False
worksheet.column_dimensions['AK'].visible = False
worksheet.column_dimensions['AM'].visible = False
worksheet.column_dimensions['AN'].visible = False
worksheet.column_dimensions['AP'].visible = False
worksheet.column_dimensions['AQ'].visible = False
worksheet.column_dimensions['AR'].visible = False
worksheet.column_dimensions['AS'].visible = False
worksheet.column_dimensions['AT'].visible = False
worksheet.column_dimensions['AU'].visible = False
worksheet.column_dimensions['AV'].visible = False
worksheet.column_dimensions['AW'].visible = False
worksheet.column_dimensions['AX'].visible = False
worksheet.column_dimensions['AY'].visible = False
worksheet.column_dimensions['AZ'].visible = False
worksheet.column_dimensions['BA'].visible = False
worksheet.column_dimensions['BB'].visible = False
worksheet.column_dimensions['BC'].visible = False
worksheet.column_dimensions['BD'].visible = False
worksheet.column_dimensions['BE'].visible = False
worksheet.column_dimensions['BF'].visible = False
worksheet.column_dimensions['BH'].visible = False
worksheet.column_dimensions['BI'].visible = False
worksheet.column_dimensions['BJ'].visible = False
worksheet.column_dimensions['BK'].visible = False
worksheet.column_dimensions['BL'].visible = False
worksheet.column_dimensions['BM'].visible = False
worksheet.column_dimensions['BN'].visible = False
worksheet.column_dimensions['BO'].visible = False
worksheet.column_dimensions['BP'].visible = False
worksheet.column_dimensions['BQ'].visible = False
worksheet.column_dimensions['BR'].visible = False
worksheet.column_dimensions['BS'].visible = False
worksheet.column_dimensions['BT'].visible = False
worksheet.column_dimensions['BU'].visible = False
worksheet.column_dimensions['BV'].visible = False
worksheet.column_dimensions['BW'].visible = False
worksheet.column_dimensions['BX'].visible = False
worksheet.column_dimensions['BY'].visible = False
worksheet.column_dimensions['BZ'].visible = False
worksheet.column_dimensions['CA'].visible = False
worksheet.column_dimensions['CB'].visible = False
worksheet.column_dimensions['CC'].visible = False
worksheet.column_dimensions['CD'].visible = False
worksheet.column_dimensions['CE'].visible = False
worksheet.column_dimensions['CF'].visible = False
worksheet.column_dimensions['CG'].visible = False
worksheet.column_dimensions['CH'].visible = False
worksheet.column_dimensions['CI'].visible = False
worksheet.column_dimensions['CJ'].visible = False
worksheet.column_dimensions['CK'].visible = False
worksheet.column_dimensions['CL'].visible = False
worksheet.column_dimensions['CM'].visible = False
worksheet.column_dimensions['CN'].visible = False
worksheet.column_dimensions['CO'].visible = False
worksheet.column_dimensions['CP'].visible = False
worksheet.column_dimensions['CQ'].visible = False
worksheet.column_dimensions['CR'].visible = False
worksheet.column_dimensions['CS'].visible = False
worksheet.column_dimensions['CU'].visible = False
Also, if someone could tell me whether there's a more efficient way to hide the columns, which I'm certain there is, that would be great.
You should set the hidden attribute to True:
worksheet.column_dimensions['A'].hidden = True
In order to hide more than one column:
for col in ['A', 'B', 'C']:
    worksheet.column_dimensions[col].hidden = True
Columns can be grouped:
ws.column_dimensions.group(start='B', end='CU', hidden=True)
You can also use a loop on an already loaded workbook wb.
In this example I have 10 columns with data and want to hide all the remaining ones.
XFD is the last Excel column (index 16384), so the range below stops at 16385, i.e. that index + 1.
import openpyxl as op
ws = wb['Sheet1']
max_column = ws.max_column
last_column = op.utils.cell.column_index_from_string('XFD')
for idx in range(max_column + 1, last_column + 1):
    ws.column_dimensions[op.utils.get_column_letter(idx)].hidden = True
If you know the positions of your columns, this is easy.
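For the list in the question, which hides B through CU but skips a few letters, the loop can take a set of columns to keep. The sketch below is dependency-free so it can be run as-is: column_letter is a hand-written stand-in for openpyxl's get_column_letter, fake_ws imitates only the column_dimensions[...].hidden part of a worksheet, and the set of visible columns is inferred from the gaps in the question's list (an assumption).

```python
from collections import defaultdict
from types import SimpleNamespace


def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> 'A', 27 -> 'AA')."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def hide_columns(worksheet, first, last, keep=()):
    """Hide columns first..last (1-based, inclusive) except those in `keep`."""
    keep = set(keep)
    hidden = []
    for index in range(first, last + 1):
        letter = column_letter(index)
        if letter not in keep:
            worksheet.column_dimensions[letter].hidden = True
            hidden.append(letter)
    return hidden


# Stand-in for a worksheet so the sketch runs without openpyxl installed.
fake_ws = SimpleNamespace(
    column_dimensions=defaultdict(lambda: SimpleNamespace(hidden=False))
)

# Columns the question leaves visible (assumed from the gaps in its list).
visible = {"A", "C", "J", "AL", "AO", "BG", "CT"}
hidden = hide_columns(fake_ws, 1, 99, keep=visible)  # column 99 is 'CU'
print(len(hidden), hidden[:3], hidden[-1])
```

With a real workbook, passing the openpyxl worksheet in place of fake_ws should behave the same way, since only column_dimensions[letter].hidden is touched.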

New line character overwrites the following byte [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I've been writing code that inserts a newline character ('\n') at the end of each '#include' statement in C source.
Example:
[BEFORE]
#include <stdio.h> #include <stdlib.h>
[AFTER]
#include <stdio.h>
#include <stdlib.h>
So I made a function for that. The variable 'raw' is the string buffer of raw data, and idx is an index variable which moves through 'raw'.
def insert(raw, idx, new): # inserts string(new) into the original string(raw). Location of insertion can be specified by variable idx.
    return raw[:idx] + new + raw[idx:]
Using the function above, I also wrote a code like this:
includeFlag1 = False # be ready to detect '<' of the include statement.
includeFlag2 = False # Turned on when the '<' is detected. Turned off when the include statement ends.
elif raw[idx:idx+8] == '#include' : # include statement is detected.
    includeFlag1 = True
elif raw[idx] == '<' and includeFlag1 == True:
    includeFlag2 = True
elif raw[idx] == '>' and includeFlag1 == True:
    raw = insert(raw, idx+1, '\n') # adds the new-line character at the end of the include statement.
    includeFlag2 = False # the include statement has just ended.
    includeFlag1 = False
idx = idx + 1
The real problem is that when I run the code above, the new line character overwrites the '#' character, which is the starting point of the '#include <>' statement.
For more information, please look at the debugging log below.
The log is recorded like this:
raw[idx] [icdFlag] includeFlag1 includeFlag2
===================================================================
# [icdFlag] False False
i [icdFlag] True False
n [icdFlag] True False
c [icdFlag] True False
l [icdFlag] True False
u [icdFlag] True False
d [icdFlag] True False
e [icdFlag] True False
[icdFlag] True False
< [icdFlag] True False
s [icdFlag] True True
t [icdFlag] True True
d [icdFlag] True True
i [icdFlag] True True
o [icdFlag] True True
. [icdFlag] True True
h [icdFlag] True True
> [icdFlag] True True
[icdFlag] False False
i [icdFlag] False False
n [icdFlag] False False
c [icdFlag] False False
l [icdFlag] False False
u [icdFlag] False False
d [icdFlag] False False
e [icdFlag] False False
[icdFlag] False False
< [icdFlag] False False
s [icdFlag] False False
t [icdFlag] False False
d [icdFlag] False False
l [icdFlag] False False
i [icdFlag] False False
b [icdFlag] False False
. [icdFlag] False False
h [icdFlag] False False
> [icdFlag] False False
# [icdFlag] False False
i [icdFlag] True False
n [icdFlag] True False
c [icdFlag] True False
l [icdFlag] True False
u [icdFlag] True False
d [icdFlag] True False
e [icdFlag] True False
[icdFlag] True False
< [icdFlag] True False
s [icdFlag] True True
t [icdFlag] True True
r [icdFlag] True True
i [icdFlag] True True
n [icdFlag] True True
g [icdFlag] True True
. [icdFlag] True True
h [icdFlag] True True
> [icdFlag] True True
[icdFlag] False False
i [icdFlag] False False
n [icdFlag] False False
c [icdFlag] False False
l [icdFlag] False False
u [icdFlag] False False
d [icdFlag] False False
e [icdFlag] False False
[icdFlag] False False
< [icdFlag] False False
t [icdFlag] False False
i [icdFlag] False False
m [icdFlag] False False
e [icdFlag] False False
. [icdFlag] False False
h [icdFlag] False False
> [icdFlag] False False
v [icdFlag] False False
o [icdFlag] False False
i [icdFlag] False False
d [icdFlag] False False
[icdFlag] False False
g [icdFlag] False False
e [icdFlag] False False
t [icdFlag] False False
_ [icdFlag] False False
u [icdFlag] False False
s [icdFlag] False False
e [icdFlag] False False
r [icdFlag] False False
_ [icdFlag] False False
i [icdFlag] False False
n [icdFlag] False False
f [icdFlag] False False
o [icdFlag] False False
( [icdFlag] False False
c [icdFlag] False False
h [icdFlag] False False
a [icdFlag] False False
r [icdFlag] False False
[icdFlag] False False
* [icdFlag] False False
Of course, I've already checked whether the original C source code was missing the '#' characters, and it wasn't. I wonder why this kind of thing happens.
I'd probably just break it up into lines and split by the "include".
processedLines = [] # put into an array
lines = raw.split('\n')
for line in lines:
    if "#include" in line:
        includes = line.split("#include ") # "#include <foo.h> #include <bar.h>" becomes ["", "<foo.h> ", "<bar.h>"]
        for include in includes[1:]: # skip the blank at the beginning
            processedLines.append("#include " + include.strip()) # strip the space left between includes
    else:
        processedLines.append(line)
raw = '\n'.join(processedLines) # put the lines back together
Perhaps this?
#!/usr/bin/env python
import re
my_code = """
# Other stuff
#include <stdio.h> #include <stdlib.h> #include <other.h>
# Other stuff
"""
for line in my_code.splitlines():
    # Keep the leading '#' in the pattern so each printed directive is still valid C.
    include_list = re.findall(r"#include <\w+\.h>", line)
    if include_list:
        print("\n".join(include_list))
    else:
        print(line)
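Another option, sketched here rather than taken from the answers above, is a single re.sub that inserts the newline only between two directives sharing a line, leaving every other character of the source untouched. The function name split_includes and the INCLUDE pattern are assumptions made for this example; the pattern covers both the <...> and "..." forms.

```python
import re

# One #include directive, in angle-bracket or quoted form.
INCLUDE = r'#include\s*(?:<[^>\n]+>|"[^"\n]+")'


def split_includes(source):
    """Put a newline between #include directives that share a line."""
    # The lookahead leaves the next directive unconsumed, so runs of three
    # or more directives on one line are all separated.
    pattern = re.compile(r'(' + INCLUDE + r')[ \t]*(?=#include)')
    return pattern.sub(r'\1\n', source)


raw = ('#include <stdio.h> #include <stdlib.h> #include <string.h>\n'
       'int main(void) { return 0; }\n')
print(split_includes(raw))
```

Because the whole string is rewritten in one pass, there is no index to keep in sync with the growing buffer, which is where the character-by-character version seems to go wrong.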
