How to replace multiple files with scraped html whilst numbered [duplicate] - python

Is it possible to use variables in the format specifier in the format()-function in Python? I have the following code, and I need VAR to equal field_size:
def pretty_printer(*numbers):
    str_list = [str(num).lstrip('0') for num in numbers]
    field_size = max([len(string) for string in str_list])
    i = 1
    for num in numbers:
        print("Number", i, ":", format(num, 'VAR.2f'))  # VAR needs to equal field_size

You can use the str.format() method, which lets you interpolate other variables for things like the width:
'Number {i}: {num:{field_size}.2f}'.format(i=i, num=num, field_size=field_size)
Each {} is a placeholder, filling in named values from the keyword arguments (you can use numbered positional arguments too). The part after the optional : gives the format (the second argument to the format() function, basically), and you can use more {} placeholders there to fill in parameters.
Using numbered positions would look like this:
'Number {0}: {1:{2}.2f}'.format(i, num, field_size)
but you could also mix the two or pick different names:
'Number {0}: {1:{width}.2f}'.format(i, num, width=field_size)
If you omit the numbers and names, the fields are automatically numbered, so the following is equivalent to the preceding format:
'Number {}: {:{width}.2f}'.format(i, num, width=field_size)
Note that the whole string is a template, so things like the Number string and the colon are part of the template here.
You need to take into account that the field size includes the decimal point, however; you may need to adjust your size to add those 3 extra characters.
Demo:
>>> i = 3
>>> num = 25
>>> field_size = 7
>>> 'Number {i}: {num:{field_size}.2f}'.format(i=i, num=num, field_size=field_size)
'Number 3:   25.00'
Last but not least, as of Python 3.6 you can put the variables directly into the string literal by using a formatted string literal (f-string):
f'Number {i}: {num:{field_size}.2f}'
The advantage of using a regular string template and str.format() is that you can swap out the template; the advantage of f-strings is that they make for very readable and compact formatting, inline in the string literal itself.
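To illustrate the trade-off, here is a minimal sketch in which the template is picked at run time, which an f-string cannot do because it is evaluated where it is written. The TEMPLATES dictionary and the render() function are hypothetical names introduced only for this example.

```python
# Hypothetical templates; the keys and wording are made up for illustration.
TEMPLATES = {
    "plain": "Number {i}: {num:{width}.2f}",
    "table": "| {i:>3} | {num:{width}.2f} |",
}

def render(style, i, num, width):
    """Format one line using whichever template `style` selects."""
    return TEMPLATES[style].format(i=i, num=num, width=width)

print(render("plain", 3, 25, 7))  # Number 3:   25.00
print(render("table", 3, 25, 7))  # |   3 |   25.00 |
```

The same keyword arguments feed both templates, so switching the output layout only means switching the dictionary key.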

I prefer this (new 3.6) style:
name = 'Eugene'
f'Hello, {name}!'
or a multi-line string:
f'''
Hello,
{name}!!!
{a_number_to_format:.1f}
'''
which is really handy.
I find the old style formatting sometimes hard to read. Even concatenation could be more readable. See an example:
'{} {} {} {} which one is which??? {} {} {}'.format('1', '2', '3', '4', '5', '6', '7')

I just assigned field_size to VAR and changed the print statement to build the format specifier with an f-string. It works.
def pretty_printer(*numbers):
    str_list = [str(num).lstrip('0') for num in numbers]
    field_size = max([len(string) for string in str_list])
    VAR = field_size
    i = 1
    for num in numbers:
        print("Number", i, ":", format(num, f'{VAR}.2f'))
        i += 1  # advance the counter so each line gets its own number
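Putting the pieces together, a complete version might look like the sketch below. It is not the asker's original function: it assumes the width should be measured on the already formatted strings (so the decimal point and the two decimals are counted), uses enumerate() for the counter, and returns the lines instead of printing them. The name pretty_lines is made up for this example.

```python
def pretty_lines(*numbers):
    """Return one right-aligned 'Number i : value' line per input number."""
    # Measure the widest value *after* formatting, so '.00' is included.
    field_size = max((len(format(num, '.2f')) for num in numbers), default=0)
    return [
        f"Number {i} : {num:{field_size}.2f}"
        for i, num in enumerate(numbers, start=1)
    ]

for line in pretty_lines(1, 22.5, 333):
    print(line)
# Number 1 :   1.00
# Number 2 :  22.50
# Number 3 : 333.00
```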

Related

Reinstate lost leading zeroes in Python list

I have a list of geographical postcodes that take the format xxxx (a string of numbers).
However, in the process of gathering and treating the data, the leading zero has been lost in cases where the postcode begins with '0'.
I need to reinstate the leading '0' in such cases.
Postcodes either occur singularly as xxxx, or they occur as a range in my list, xxxx-xxxx.
Have:
v = ['821-322', '877', '2004-2218', '2022']
Desired output:
['0821-0322', '0877', '2004-2218', '2022']
  ^    ^       ^
Attempt:
for i in range(len(v)):
    v[i] = re.sub(pattern, '0' + pattern, v)
However, I'm struggling with the regex pattern, and how to simply get the desired result.
There is no requirement to use re.sub(). Any simple solution will do.
You should use f-string formatting instead!
Here is a one-liner to solve your problem:
>>> v = ['821-322', '877', '2004-2218', '2022']
>>> ["-".join([f'{i:0>4}' for i in x.split("-")]) for x in v]
['0821-0322', '0877', '2004-2218', '2022']
A more verbose example is this:
v = ['821-322', '877', '2004-2218', '2022']
newv = []
for number in v:
    num_holder = []
    # Split the numbers on "-", returns a list of one if no split occurs
    for num in number.split("-"):
        # Append the formatted string to num_holder
        num_holder.append(f'{num:0>4}')
    # After each number has been formatted correctly, create a
    # new string which is joined together with a "-" once again and append it to newv
    newv.append("-".join(num_holder))
print(newv)
You can read more about how f-strings work, and about the format specification "mini-language" used by the formatter, in the Python documentation.
The short version of the explanation is this:
f'{num:0>4}'
f tells the interpreter that a formatted string literal follows.
{} inside the string marks a replacement field whose contents are evaluated.
num inside the braces is a reference to a variable.
: tells the formatter that a format specifier follows.
0 is the fill character used to pad the string.
> is the alignment of num within the new string; > means right-aligned.
4 is the minimum number of characters the resulting string should have. If num is already 4 or more characters long, the formatter leaves it unchanged.
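As a quick check of the breakdown above, the sketch below applies the same 0>4 specifier through an f-string, str.format() and the built-in format(), and compares it with str.zfill(), which is an alternative not used in the answer. The helper name pad_postcode is hypothetical.

```python
def pad_postcode(code, width=4):
    """Left-pad each '-'-separated part of `code` with zeros to `width`."""
    return "-".join(f"{part:0>{width}}" for part in code.split("-"))

# The same fill/align/width specifier works in all three formatting APIs.
assert f"{'877':0>4}" == "{:0>4}".format("877") == format("877", "0>4") == "0877"

# str.zfill() gives the same result for plain digit strings.
assert "877".zfill(4) == "0877"

# Strings that are already wide enough are left unchanged.
print([pad_postcode(c) for c in ['821-322', '877', '2004-2218', '2022']])
# ['0821-0322', '0877', '2004-2218', '2022']
```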

Question related to formatting: usage of format() in Python

Could anyone explain how format() works in Python? Where should it be used, and how? I can't make any sense of this keyword.
You can regard it as a kind of string replacement.
Each {} in the string is replaced by the corresponding argument passed to string.format().
Definition: https://www.w3schools.com/python/ref_string_format.asp
A practical example could look like this:
base_url = 'www.xxxx.com/test?page={}'
for i in range(10):
    url = base_url.format(i)
    print(url)  # do something with the URL here
The format() method formats the specified value(s) and inserts them into the string's placeholders.
txt1 = "My name is {fname}, I'm {age}".format(fname = "John", age = 36)
Here, if you print txt1, fname will be replaced by John and age by 36.
Alternatively, you can use f-strings, e.g.:
fname = "John"
age = 36
print(f"My name is {fname}, I'm {age}")
This prints the same output.
format() is most often used as a str method: txt.format(...), where txt is a str.
This method inserts values into a string's placeholders. Placeholders are curly brackets {} placed inside a string, and format() returns a new string with the values plugged in.
It also lets you format different types of values in different ways. For example, a float with the value 0.0001 can be shown in fixed-point notation (0.0001) or in scientific notation (1e-04) using different specifiers.
Usage:
txt = "My name is {name}. I'm {age} years old."
print(txt.format(name="Dan", age=32))
Will output: 'My name is Dan. I'm 32 years old.'
You can use positional arguments as well:
txt = "My name is {}. I'm {} years old."
print(txt.format("Dan", 32))
Where the values are taken by their order.
This will output the same result.
To format with different formatting you can use specifiers:
txt = "Decimal numbers: {number:d}"
print(txt.format(number=8340))
txt = "Fixed-point numbers: {number:.2f}"
print(txt.format(number=3.1415))
There are other specifiers that have other formatting behavior like centering some value to match some desired width:
txt = "{center:^20}"
print(txt.format(center='center'))
This will output '       center       ', which contains exactly 20 characters.
There are many more formatting options, which you can browse in the Python documentation or in many other resources.
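A small sketch of the specifiers mentioned in this answer, using the same example values; the variable names are chosen for illustration only.

```python
value = 0.0001

fixed = "{:.4f}".format(value)        # fixed-point with 4 decimals
scientific = "{:.0e}".format(value)   # scientific notation, no decimals
centered = "{:^20}".format("center")  # centered in a 20-character field

print(fixed)           # 0.0001
print(scientific)      # 1e-04
print(repr(centered))  # '       center       '
print(len(centered))   # 20
```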

Replace each occurrence of sub-strings in the string with randomly generated values

I am having strings like:
"What is var + var?"
"Find the midpoint between (var,var) and (var,var)"
I want to change each occurrence of var in the above sentences to a different random integer. My current code is:
question = question.replace("var",str(random.randint(-10,10)))
This replaces every occurrence with the same randomly generated number, for example:
"Find the midpoint between (5,5) and (5,5)"
Since, as far as I'm aware, for loops cannot be used on a string, how can I change each substring "var" to a different value rather than that single generated number?
You may use str.format to achieve this as:
import random
my_str = "Find the midpoint between (var,var) and (var,var)"
var_count = my_str.count("var") # count of `var` sub-string
format_str = my_str.replace('var', '{}') # create valid formatted string
# replace each `{}` in formatted string with random `int`
new_str = format_str.format(*(random.randint(-10, 10) for _ in range(var_count)))
where new_str will hold something like:
'Find the midpoint between (6,-10) and (-5,2)'
Suggestion: It is better to use '{}' instead of 'var' in the original string (as python performs formatting based on {}). Hence, in the above solution you may skip the .replace() part.
References related to string formatting:
Python string formatting: % vs. .format
String formatting in Python
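As an alternative sketch (not part of the answer above), re.sub() accepts a function as the replacement, and that function is called once per match, so every var gets its own random number. The function name randomize_vars and its parameters are assumptions made for this example.

```python
import random
import re

def randomize_vars(question, low=-10, high=10, rng=random):
    """Replace each whole-word 'var' with an independently drawn integer."""
    # The callback runs once per match, so each 'var' gets a fresh value.
    return re.sub(r"\bvar\b", lambda m: str(rng.randint(low, high)), question)

print(randomize_vars("Find the midpoint between (var,var) and (var,var)"))
```

Passing a seeded random.Random instance as rng makes the output reproducible, which can help when testing.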
You can use following code to make it easier:
pattern = "Find the midpoint between (%s, %s) and (%s, %s)"
question = pattern % (str(1), str(2), str(3.0), str(4))
print(question)
# Output: Find the midpoint between (1, 2) and (3.0, 4)
You can use string formatting, like so:
"What is {} + {}?".format(random.randint(-10,10), random.randint(-10,10))
and:
four_random_numbers = [random.randint(-10, 10) for _ in range(4)]
"Find the midpoint between ({}, {}) and ({}, {})".format(*four_random_numbers)
You could easily re-write it as a function returning n-random numbers to use in your questions.
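One way to sketch that function, assuming the question templates already use {} placeholders. random_numbers and make_question are hypothetical names, and counting placeholders with str.count('{}') assumes the template contains only plain, unnumbered {} fields.

```python
import random

def random_numbers(n, low=-10, high=10):
    """Return a list of n independent random integers in [low, high]."""
    return [random.randint(low, high) for _ in range(n)]

def make_question(template):
    """Fill every '{}' in the template with its own random number."""
    return template.format(*random_numbers(template.count("{}")))

print(make_question("What is {} + {}?"))
print(make_question("Find the midpoint between ({}, {}) and ({}, {})"))
```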

Python Regex to match a string as a pattern and return number

I have some lines that represent some data in a text file. They are all of the following format:
s = 'TheBears SUCCESS Number of wins : 14'
They all begin with the name then whitespace and the text 'SUCCESS Number of wins : ' and finally the number of wins, n1. There are multiple strings each with a different name and value. I am trying to write a program that can parse any of these strings and return the name of the dataset and the numerical value at the end of the string. I am trying to use regular expressions to do this and I have come up with the following:
import re
def winnumbers(s):
    pattern = re.compile(r"""(?P<name>.*?)   #starting name
                         \s*SUCCESS          #whitespace and success
                         \s*Number\s*of\s*wins   #whitespace and strings
                         \s*\:\s*(?P<n1>.*?)""",re.VERBOSE)
    match = pattern.match(s)
    name = match.group("name")
    n1 = match.group("n1")
    return (name, n1)
So far, my program can return the name, but the trouble comes after that. They all have the text "SUCCESS Number of wins : " so my thinking was to find a way to match this text. But I realize that my method of matching an exact substring isn't correct right now. Is there any way to match a whole substring as part of the pattern? I have been reading quite a bit on regular expressions lately but haven't found anything like this. I'm still really new to programming and I appreciate any assistance.
Eventually, I will use float() to return n1 as a number, but I left that out because it doesn't properly find the number in the first place right now and would only return an error.
Try this one out:
((\S+)\s+SUCCESS Number of wins : (\d+))
These are the results:
>>> regex = re.compile(r"((\S+)\s+SUCCESS Number of wins : (\d+))")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0xc827cf478a56b350>
>>> regex.match(string)
<_sre.SRE_Match object at 0xc827cf478a56b228>
# List the groups found
>>> r.groups()
(u'TheBears SUCCESS Number of wins : 14', u'TheBears', u'14')
# List the named dictionary objects found
>>> r.groupdict()
{}
# Run findall
>>> regex.findall(string)
[(u'TheBears SUCCESS Number of wins : 14', u'TheBears', u'14')]
# So you can do this for the name and number:
>>> fullstring, name, number = r.groups()
If you don't need the full string, just remove the surrounding parentheses.
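If the named groups from the question are preferred, a minimal adjustment is sketched below. It assumes the team name contains no whitespace and the win count is a run of digits. The question's lazy (?P<n1>.*?) at the very end of the pattern is allowed to match the empty string, which is why no number was captured.

```python
import re

WIN_PATTERN = re.compile(r"""
    (?P<name>\S+)            # team name: a run of non-whitespace characters
    \s+SUCCESS               # whitespace, then the literal word SUCCESS
    \s+Number\s+of\s+wins    # the constant text, tolerant of extra spaces
    \s*:\s*                  # the colon, with optional whitespace around it
    (?P<n1>\d+)              # the number of wins
    """, re.VERBOSE)

def winnumbers(s):
    """Return (name, wins) for a matching line, or None if it does not match."""
    match = WIN_PATTERN.match(s)
    if match is None:
        return None
    return match.group("name"), float(match.group("n1"))

print(winnumbers('TheBears      SUCCESS Number of wins : 14'))  # ('TheBears', 14.0)
```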
I don't think a regex is actually needed here, so here is another option in case it is acceptable for you:
dict((line[:line.lower().index('success')].strip(), line[line.lower().index('wins :') + 6:].strip()) for line in text.split('\n') if 'success' in line.lower())
Or, if you are sure that all words are separated by single spaces:
output = {}
for line in text.split('\n'):
    if 'success' in line.lower():
        words = line.strip().split(' ')
        output[words[0]] = words[-1]
If the text in the middle is always constant, there is no need for a regular expression. The inbuilt string processing functions will be more efficient and easier to develop, debug and maintain. In this case, you can just use the inbuilt split() function to get the pieces, and then clean the two pieces as appropriate:
>>> def winnumber(s):
...     parts = s.split('SUCCESS Number of wins : ')
...     return (parts[0].strip(), int(parts[1]))
...
>>> winnumber('TheBears SUCCESS Number of wins : 14')
('TheBears', 14)
Note that I have output the number of wins as an integer (as presumably this will always be a whole number), but you can easily substitute float()- or any other conversion function - for int() if you desire.
Edit: Obviously this will only work for single lines - if you call the function with several lines it will give you errors. To process an entire file, I'd use map():
>>> list(map(winnumber, open(filename, 'r')))
[('TheBears', 14), ('OtherTeam', 6)]
Also, I'm not sure of your end use for this code, but you might find it easier to work with the outputs as a dictionary:
>>> dict(map(winnumber, open(filename, 'r')))
{'OtherTeam': 6, 'TheBears': 14}
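On Python 3 the same idea could be sketched with a small helper and a context manager, so the file is closed promptly. parse_wins is a hypothetical helper; it accepts any iterable of lines, so the example feeds it a list instead of a real file, and it assumes non-matching lines should simply be skipped.

```python
SEPARATOR = 'SUCCESS Number of wins : '

def parse_wins(lines):
    """Map team name -> wins for every line containing SEPARATOR."""
    results = {}
    for line in lines:
        name, sep, wins = line.partition(SEPARATOR)
        if sep:  # partition() returns an empty separator when not found
            results[name.strip()] = int(wins)
    return results

sample = ['TheBears SUCCESS Number of wins : 14\n',
          'OtherTeam SUCCESS Number of wins : 6\n',
          'a line without the marker\n']
print(parse_wins(sample))  # {'TheBears': 14, 'OtherTeam': 6}

# With a real file (the path is a placeholder):
# with open('results.txt') as handle:
#     wins_by_team = parse_wins(handle)
```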
