Regex replace using a function in Python

I'm trying to print a table from my database using:
pd.read_sql_query("SELECT name,duration FROM activity where (strftime('%W', date) = strftime('%W', 'now'))", conn)
This works and prints:
name duration
0 programmation 150
1 lecture 40
2 ctf 90
3 ceh 90
4 deep learning 133
5 vm capture the flag 100
but I would like to apply my function minuteToStr, which converts a duration in minutes to a string like "1h30", to the duration column.
I tried this code, but it doesn't work:
tableau = str(pd.read_sql_query("SELECT name,duration FROM activity\
where (strftime('%W', date) = strftime('%W', 'now'))", conn))
tableau = re.sub("([0-9]{2,})", minuteToStr(int("\\1")), tableau)
print(tableau)
Thanks

Make this easy, just use a little mathemagic and string formatting.
h = df.duration // 60
m = df.duration % 60
df['duration'] = h.astype(str) + 'h' + m.astype(str) + 'm'
df
name duration
0 programmation 2h30m
1 lecture 0h40m
2 ctf 1h30m
3 ceh 1h30m
4 deep learning 2h13m
5 vm capture the flag 1h40m

re.sub doesn't work this way. It expects a string, not a DataFrame.
Given that minuteToStr accepts an integer, you can simply use apply:
tableau['duration'] = tableau['duration'].apply(minuteToStr)
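For reference, the attempt in the question fails because minuteToStr(int("\\1")) is evaluated before re.sub ever runs, so int() receives the literal text \1 and raises an error. re.sub does accept a function as the replacement and calls it with a match object for each match. The sketch below shows that on a plain string; minute_to_str is a hypothetical stand-in for the asker's minuteToStr, which the post does not show.

```python
import re


def minute_to_str(minutes):
    """Hypothetical stand-in for minuteToStr: 90 -> '1h30'."""
    hours, rest = divmod(minutes, 60)
    return f"{hours}h{rest:02d}"


def replace_durations(text):
    # re.sub calls the lambda once per match and passes a match object,
    # so the conversion runs on the digits that were actually matched.
    return re.sub(r"[0-9]{2,}", lambda m: minute_to_str(int(m.group())), text)


print(replace_durations("ctf 90\nlecture 40"))
```

Note that this rewrites every run of two or more digits in the text, including any that appear in a name, which is one reason to prefer operating on the duration column directly.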

Just as you can pass a function to re.sub, pandas lets you pass a callable to str.replace. The callable receives a regex match object. Since pandas 2.0 the default is regex=False, and a callable replacement requires regex=True, so pass it explicitly.
If the duration column is of integer type:
tableau['duration'].astype(str).str.replace(r"([0-9]{2,})", minuteToStr, regex=True)
Otherwise:
tableau['duration'].str.replace(r"([0-9]{2,})", minuteToStr, regex=True)
To illustrate using a function inside replace (though I would recommend going with @colspeed's solution):
def minuteToStr(x):
    # x is a match object; group(1) holds the matched digits
    h = int(x.group(1)) // 60
    m = int(x.group(1)) % 60
    return str(h) + 'h' + str(m)
df['duration'].astype(str).str.replace(r"([0-9]{2,})", minuteToStr, regex=True)
name duration
0 programmation 2h30
1 lecture 0h40
2 ctf 1h30
3 ceh 1h30
4 deeplearning 2h13
5 vmcapturetheflag 1h40

Related

Trouble slicing based on function +1

I am trying to search c_item_number_one = (r'12" Pipe SA-106 GR. B SCH 40 WALL smls'.upper()) for " to pull both it and all information in front of it. i.e. I want 12"
I thought I could just search for what position " is in...
def find_nps_via_comma_item_one():
    nps = '"'
    print(c_item_number_one.find(nps))
find_nps_via_comma_item_one()
This prints 2. I then tried to slice off everything after it:
c_item_number_one = (r'12" Pipe SA-106 GR. B SCH 40 WALL smls'.upper())
def find_nps_via_comma_item_one():
    nps = '"'
    print(c_item_number_one.find(nps))
find_nps_via_comma_item_one()
item_one_nps = slice(3)
print(c_item_number_one[item_one_nps])
Issue: It is returning an error
print(c_item_number_one[item_one_nps])
TypeError: slice indices must be integers or None or have an __index__ method
How can I turn the results of my function into an integer? I've tried changing print(c_item_number_one.find(nps)) to return(c_item_number_one.find(nps)) but then it stopped giving a value entirely.
Lastly, the slice portion does not produce the full answer I am looking for 12". Even if I enter the value produced by the function 2
item_one_nps = slice(2)
print(c_item_number_one[item_one_nps])
It only gives me 12. I need to +1 the function results.
You could do
sep_char = "\""
c_item_number_one.split(sep_char)[0] + sep_char
A print statement writes a value to the console, whereas return hands the value back to the place where the function was called.
In your code you are only printing the value, not storing it. Even when you used return instead of print, you weren't making use of the returned value.
1 is added to the result because slicing in Python excludes the stop index, so to include the character at that index you add 1:
c_item_number_one = (r'12" Pipe SA-106 GR. B SCH 40 WALL smls'.upper())
def find_nps_via_comma_item_one():
    nps = '"'
    return(c_item_number_one.find(nps))
item_one_nps = slice(find_nps_via_comma_item_one()+1)
print(c_item_number_one[item_one_nps])
The following code is more verbose.
c_item_number_one = (r'12" Pipe SA-106 GR. B SCH 40 WALL smls'.upper())
def find_nps_via_comma_item_one():
    nps = '"'
    return(c_item_number_one.find(nps))
index = find_nps_via_comma_item_one()
item_one_nps = slice(index+1)
print(c_item_number_one[item_one_nps])
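As a variation on the code above, here is a minimal sketch that passes the string in as a parameter instead of reading a global, and makes the not-found case explicit (str.find returns -1 when the marker is absent). The name text_through is made up for this illustration.

```python
def text_through(text, marker='"'):
    """Return text up to and including the first marker, or '' if absent."""
    position = text.find(marker)
    if position == -1:
        return ""
    # The stop index of a slice is excluded, hence the + 1.
    return text[:position + 1]


item = '12" Pipe SA-106 GR. B SCH 40 WALL smls'.upper()
print(text_through(item))
```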

How to make a decent reduction (R) function for a rainbow table?

I'm trying to develop a rainbow table. The search function works well, but the problem is when I try to generate the table.
There are 54 possible characters and the passwords are 3 characters long (for now). I can generate a table with 1024 lines, and 153 columns. In theory, if there were no collisions, there would be a >95% chance that I crack the password (1024*153 ≈ 54^3).
But I'm getting 126662 collisions. Here's my reduction (R) function:
def reduct(length, hashData, k):
    ### Initializing variables and mapping ###
    while len(pwd) != length:
        pwdChar = (int(hashData[0], 16) + int(hashData[1], 16) + int(hashData[2], 16) + int(hashData[3], 16) - 7 + 3*k) % 54
        hashData = hashData[3:]
        pwd += mapping[pwdChar][1]
    return pwd
How can this function result in so many collisions? The maximum sum of the first 4 nibbles can be 60 so -7 ensures it's between 0 and 53 (equal chance for all chars). +3k makes it different for every column and % 54 to make sure it fits in the mapping.
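One thing worth checking, which the post does not confirm as the cause, is whether the sum of four nibbles really gives every character an equal chance. The sketch below enumerates all 16**4 combinations of four hex digits and counts how often each index comes out of the same arithmetic as reduct(), assuming the hash nibbles themselves are uniformly distributed. The name char_index is made up for this illustration.

```python
from collections import Counter
from itertools import product

ALPHABET_SIZE = 54  # number of possible characters, from the question


def char_index(nibbles, k=0):
    """Same arithmetic as reduct(): nibble sum - 7 + 3*k, modulo 54."""
    return (sum(nibbles) - 7 + 3 * k) % ALPHABET_SIZE


# Enumerate every possible group of four hex digits.
counts = Counter(char_index(n) for n in product(range(16), repeat=4))

total = sum(counts.values())
most_common_count = counts.most_common(1)[0][1]
least_common_count = min(counts.values())

print(f"a uniform mapping would give about {total / ALPHABET_SIZE:.0f} per index")
print(f"most frequent index: {most_common_count} combinations")
print(f"least frequent index: {least_common_count} combinations")
```

A sum of several independent digits clusters around its middle value, so the counts are far from equal; a skew like this makes some characters much more likely than others and would be expected to increase collisions.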

How to concatenate string from integers?

I'm very new to python.
How do I get a string like this:
53 (46.49 %)
But, I'm getting this:
1 53 (1 46.49 %)
I'm trying to get the last value from the count table and its proportion (I'm not sure what it's called in Python)
table = pd.value_counts(data[var].values, sort=False)
prop_table = (table/table.sum() * 100).round(2)
num = table[[1]].to_string()
prop = prop_table[[1]].to_string()
test = num + " (" + prop + " %)"
but it puts a 1 in front of each number.
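The leading 1 is the index label: table[[1]] returns a one-row Series, and to_string() prints the label along with the value. A minimal sketch of one way around this, assuming the label of interest is 1, is to select a scalar with .loc and format it directly. The sample data here is made up to reproduce the numbers in the question.

```python
import pandas as pd

# Made-up sample data: 53 ones out of 114 rows.
values = pd.Series([1] * 53 + [0] * 61)

table = values.value_counts(sort=False)
prop_table = (table / table.sum() * 100).round(2)

# .loc[1] selects by label and returns a scalar, so no index label
# ends up in the formatted string.
num = table.loc[1]
prop = prop_table.loc[1]
test = f"{num} ({prop:.2f} %)"
print(test)
```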

Solving the knapsack problem by brute force in Python

I'm currently trying to solve the knapsack problem by brute force. I know it's not efficient at all; I just want to implement it in Python.
The problem is that it takes a long time, in my opinion too long even for brute force. So maybe I made some mistakes in my code...
import time
from collections import namedtuple
from datetime import timedelta
from numpy import binary_repr

# Assumed definition; the post does not show how Item is declared
Item = namedtuple("Item", ["index", "value", "weight"])

def solve_it(input_data):
    # Start the counting clock
    start_time = time.time()
    # Parse the input
    lines = input_data.split('\n')
    firstLine = lines[0].split()
    item_count = int(firstLine[0])
    capacity = int(firstLine[1])
    items = []
    for i in range(1, item_count+1):
        line = lines[i]
        parts = line.split()
        items.append(Item(i-1, int(parts[0]), int(parts[1])))
    # a trivial greedy algorithm for filling the knapsack
    # it takes items in-order until the knapsack is full
    value = 0
    weight = 0
    best_value = 0
    my_list_combinations = list()
    our_range = 2 ** (item_count)
    print(our_range)
    # integer step so the progress check also works under Python 3
    report_every = max(1, our_range // 400)
    output = ""
    for i in range(our_range):
        # for example if item_count is 7 then 2 in binary is
        # 0000010
        binary = binary_repr(i, width=item_count)
        # print the value every 0.25%
        if i % report_every == 0:
            print("i : " + str(i) + '/' + str(our_range) + ' ' +
                  str((i * 100.0) / our_range) + '%')
            elapsed_time_secs = time.time() - start_time
            print("Execution: %s secs" %
                  timedelta(seconds=round(elapsed_time_secs)))
        my_list_combinations = (tuple(binary))
        sum_weight = 0
        sum_value = 0
        for item in items:
            sum_weight += int(my_list_combinations[item.index]) * \
                int(item.weight)
        if sum_weight <= capacity:
            for item in items:
                sum_value += int(my_list_combinations[item.index]) * \
                    int(item.value)
            if sum_value > best_value:
                best_value = sum_value
                output = 'The decision variable is : ' + \
                    str(my_list_combinations) + \
                    ' with a total value of : ' + str(sum_value) + \
                    ' for a weight of : ' + str(sum_weight) + '\n'
    return output
Here is the file containing the 30 objects :
30 100000 # 30 objects with a maximum weight of 100000
90000 90001
89750 89751
10001 10002
89500 89501
10252 10254
89250 89251
10503 10506
89000 89001
10754 10758
88750 88751
11005 11010
88500 88501
11256 11262
88250 88251
11507 11514
88000 88001
11758 11766
87750 87751
12009 12018
87500 87501
12260 12270
87250 87251
12511 12522
87000 87001
12762 12774
86750 86751
13013 13026
86500 86501
13264 13278
86250 86251
I don't show the code that reads the file because I think it's beside the point. For 19 objects I'm able to solve the problem by brute force in 14 seconds, but for 30 objects I have calculated that it would take roughly 15 hours. So I think there is a problem in my computation...
Any help would be appreciated :)
Astrus
Your issue, that solving the knapsack problem takes too long, is indeed frustrating, and it pops up in other places where algorithms are high-order polynomial or non-polynomial. You're seeing what it means for an algorithm to have exponential runtime ;) In other words, whether your python code is efficient or not, you can very easily construct a version of the knapsack problem that your computer won't be able to solve within your lifetime.
Exponential running time means that every time you add another object to your list, the brute-force solution will take twice as long. If you can solve for 19 objects in 14 seconds, that suggests that 30 objects will take 14 secs x (2**11) = 28672 secs = about 8 hours. To do 31 objects might take about 16 hours. Etc.
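The arithmetic in the paragraph above can be reproduced in a few lines. This is only a back-of-the-envelope extrapolation from the 14-second measurement quoted in the question, assuming the runtime exactly doubles with each extra item; the function name is made up for this illustration.

```python
def estimate_seconds(n_items, known_items=19, known_seconds=14):
    """Scale a measured brute-force runtime by a factor of 2 per extra item."""
    return known_seconds * 2 ** (n_items - known_items)


for n in (30, 31):
    secs = estimate_seconds(n)
    print(f"{n} items: {secs} s, about {secs / 3600:.1f} h")
```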
There are dynamic programming approaches to the knapsack problem which trade off runtime for memory (see the Wikipedia page), and there are numerical optimisers which are tuned to solve constraint-based problems very quickly (again, Wikipedia), but none of this really alters the fact that finding exact solutions to the knapsack problem is just plain hard.
TL;DR: You're able to solve for 19 objects in a reasonable amount of time, but not for 30 (or 100 objects). This is a property of the problem you're solving, and not a shortcoming with your Python code.
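For comparison, here is a minimal sketch of the dynamic programming approach mentioned above. It is not the asker's code; it assumes items are given as (value, weight) pairs with non-negative integer weights, and it returns only the best total value, not which items were chosen. Its running time grows with the number of items times the capacity rather than with 2**n, so for 30 items and a capacity of 100000 it needs on the order of three million table updates.

```python
def knapsack_dp(items, capacity):
    """Best total value for a 0/1 knapsack.

    items: list of (value, weight) pairs with non-negative integer weights.
    """
    # best[c] is the best value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for value, weight in items:
        # Go downwards so each item is used at most once.
        for cap in range(capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[capacity]


print(knapsack_dp([(60, 10), (100, 20), (120, 30)], 50))
```

The trade-off is memory and a dependence on the capacity: the table has one entry per unit of capacity, so very large capacities make this approach impractical as well.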

pandas and "re" - search for total and partial strings

This is an extension of a question from this topic. I would like to search strings for complete and partial matches, using keywords like those in the following Series "w":
rigour*
*demeanour*
centre*
*arbour
fulfil
This obviously means that I wanted to search for words like rigour and rigours, endemeanour and demeanours, centre and centres, harbour and arbour, and fulfil. So the keywords list I have is a mix of complete and partial strings to find. I would like to apply the search on this DataFrame "df":
ID;name
01;rigour
02;rigours
03;endemeanour
04;endemeanours
05;centre
06;centres
07;encentre
08;fulfil
09;fulfill
10;harbour
11;arbour
12;harbours
What I tried so far is the following:
r = re.compile(r'.*({}).*'.format('|'.join(w.values)), re.IGNORECASE)
then I've build a mask to filter the DataFrame:
mask = [m.group(1) if m else None for m in map(r.search, df['Tweet'])]
in order to get a new column with the Keyword found:
df['keyword'] = mask
What I'm expecting is the following resulting DataFrame:
ID;name;keyword
01;rigour;rigour
02;rigours;rigour
03;endemeanour;demeanour
04;endemeanours;demeanour
05;centre;centre
06;centres;centre
07;encentre;None
08;fulfil;fulfil
09;fulfill;None
10;harbour;arbour
11;arbour;arbour
12;harbours;None
This works with a keyword list w that contains no *. However, I ran into several issues formatting the keywords with the * wildcards so that re.compile handles them correctly.
Any help would be really appreciated.
It looks like your input series w needs to be adjusted so that it can be used as a regex pattern, like this:
rigour.*
.*demeanour.*
centre.*
\\b.*arbour\\b
\\bfulfil\\b
Note that * in a regex must follow something; it does not work on its own. It means that whatever precedes it can be repeated 0 or more times.
Note also that fulfil is a part of fulfill, so if you want a strict match you need to tell the regex so, for example with a word boundary, \b, which makes it match only the whole word.
Here is what your regex might look like to give you the results you need:
s = '({})'.format('|'.join(w.values))
r = re.compile(s, re.IGNORECASE)
r
re.compile(r'(rigour.*|.*demeanour.*|centre.*|\b.*arbour\b|\bfulfil\b)', re.IGNORECASE)
The new column can then be filled using the pandas .where method like this:
df['keyword'] = df.name.where(df.name.str.match(r), None)
df
ID name keyword
0 1 rigour rigour
1 2 rigours rigours
2 3 endemeanour endemeanour
3 4 endemeanours endemeanours
4 5 centre centre
5 6 centres centres
6 7 encentre None
7 8 fulfil fulfil
8 9 fulfill None
9 10 harbour harbour
10 11 arbour arbour
11 12 harbours None
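The output above stores the whole matched name in the keyword column, while the question asked for the keyword itself (rigour for rigours, arbour for harbour). Below is a minimal sketch of one way to get that, assuming a leading or trailing * means "any characters on that side" and everything else must match exactly. The helper names are made up for this illustration and it uses only the re module.

```python
import re


def wildcard_to_regex(keyword):
    """Turn a 'rigour*' or '*arbour' style keyword into (stem, compiled regex)."""
    stem = keyword.strip("*")
    prefix = ".*" if keyword.startswith("*") else ""
    suffix = ".*" if keyword.endswith("*") else ""
    return stem, re.compile(prefix + re.escape(stem) + suffix, re.IGNORECASE)


def find_keyword(name, keywords):
    """Return the stem of the first keyword matching the whole name, else None."""
    for keyword in keywords:
        stem, pattern = wildcard_to_regex(keyword)
        # fullmatch anchors both ends, so 'fulfil' does not match 'fulfill'.
        if pattern.fullmatch(name):
            return stem
    return None


keywords = ["rigour*", "*demeanour*", "centre*", "*arbour", "fulfil"]
names = ["rigours", "endemeanours", "encentre", "fulfill", "harbour", "harbours"]
for name in names:
    print(name, find_keyword(name, keywords))
```

With a DataFrame, the same function could be applied per row, for example df['keyword'] = df['name'].apply(lambda n: find_keyword(n, keywords)).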
