Python regex on numbers [duplicate]

This question already has answers here:
MongoDB Regex Search on Integer Value
(2 answers)
Closed 8 years ago.
Is it possible to use regex on a number instead of a string?
For example: I have a field in a mongodb that contains the numeric value 1234567 (not stored as a string for sorting purposes etc.).
Now I want to use regex to find parts of this number, i.e. 456.
On a database-field that contains a string "1234567" this is easy: I just pass re.compile("456") to my database query. However re.compile(456) gets me the following:
TypeError: first argument must be string or compiled pattern
Any hints on how to accomplish this? Storing my numbers as strings is not really an option, since I would lose lots of other possibilities (like gt/lt, sorting etc.).
Update:
Also, I'm passing the regex right into the db-query to filter results, so I cannot pull up an individual field, convert its content to a string and then use the regex on it.

You can convert a number to a string using the built-in str function:
str(456)
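As a minimal sketch of that conversion, the snippet below searches one number for a run of digits once both sides are strings. The variable names are illustrative only, and this applies only to values already loaded into Python, not to filtering inside the database query.

```python
import re

number = 1234567   # numeric value, as it would be stored in the field
fragment = 456     # the digits to look for

# re.compile() needs a string pattern, so convert the fragment first.
# re.escape() changes nothing for plain digits but keeps the pattern literal.
pattern = re.compile(re.escape(str(fragment)))

# The value being searched must be converted to a string as well.
match = pattern.search(str(number))
print(match.span() if match else None)
```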

Marking as duplicate: MongoDB Regex Search on Integer Value
db.test.find({ $where: "/^123.*/.test(this.example)" })
{ "_id" : ObjectId("4bfc3187fec861325f34b132"), "example" : 1234 }
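From Python, the same $where filter can be expressed as a plain dictionary. The sketch below only builds that dictionary; where_contains_digits is a hypothetical helper, not part of PyMongo, and the field name "example" comes from the shell query above. Note that $where evaluates JavaScript on the server for each document, cannot use indexes, and may be disabled on some deployments.

```python
def where_contains_digits(field, digits):
    """Build a $where filter matching documents whose numeric `field`
    contains `digits` when rendered as a string (hypothetical helper)."""
    # Validate both inputs because they are spliced into JavaScript source.
    if not field.isidentifier():
        raise ValueError("field must be a simple identifier")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("digits must contain only 0-9")
    return {"$where": f"/{digits}/.test(this.{field})"}


query = where_contains_digits("example", "456")
# With PyMongo this dictionary would be passed as: db.test.find(query)
```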

This isn't possible with MongoDB. Depending on your application, you might be able to store these numbers as string-typed values instead of numbers. In Python:
db.collection.insert_one({"my_number": "12345678"})
For phone numbers or zipcodes where arithmetic operations like $inc don't make sense, but where you want to use a regex to search your data, this could make sense.
An alternate approach could be to store each number both as a string and as a number:
db.collection.insert_one({"s": "12345678", "n": 12345678})
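A rough sketch of how the two-field layout might be used: the string field serves substring searches and the numeric field serves comparisons and sorting. make_number_doc is a hypothetical helper, and the field names "s" and "n" follow the example above. Only the documents and filters are built here; no database connection is assumed.

```python
def make_number_doc(n):
    """Store a number both as a string and as a number (hypothetical helper)."""
    return {"s": str(n), "n": n}


doc = make_number_doc(12345678)

# Substring search runs against the string copy ...
substring_query = {"s": {"$regex": "456"}}
# ... while range filters and sorting use the numeric copy.
range_query = {"n": {"$gt": 12000000}}
```

The cost of this design is that both fields have to be kept in sync on every write.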

Related

How to skip first two numbers in provide argument? (python) [duplicate]

This question already has answers here:
How to remove leading and trailing zeros in a string? Python
(7 answers)
Closed last year.
I'm having issues skipping or trimming the first two numbers in a provided argument.
As an example, I am passing the value "00123456" as 'id'. I want request.args.get to return 123456 instead of 00123456. Is there a function I can use to drop the zeros? I'm also relatively new to the Python world, so please advise if I need to provide more info.
@main.route('/test')
def test():
    """
    Test route for validating number
    example - /test?id=00123456
    """
    # Get number passed in id argument
    varNum = request.args.get('id')
You can convert the "00123456" to an int and it will remove all the zeros at the start of the string.
print(int("00123456"))
output:
123456
Edit:
Use this only if you want to remove all zeros at the start of the number. If you want to remove exactly the first two characters, use string slicing.
Also, use this only if you know for sure that the string contains only digits.
You can use string slicing, if you know that there are always two zeroes:
varNum = request.args.get('id')[2:]
Alternatively, you can use .lstrip(), if you don't know how many leading zeroes there are in advance:
varNum = request.args.get('id').lstrip('0')
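The three approaches from the answers above behave differently on edge cases, which the following sketch tries to make visible. It uses a plain string instead of a Flask request, and strip_leading_zeros is a hypothetical helper added only to handle an all-zero input.

```python
raw = "00123456"

as_int = int(raw)            # becomes a number; raises ValueError on non-digits
sliced = raw[2:]             # drops exactly two characters, whatever they are
stripped = raw.lstrip("0")   # drops every leading zero, stays a string


def strip_leading_zeros(value):
    """Like lstrip('0'), but keep a single '0' when the input is all zeros."""
    return value.lstrip("0") or "0"


print(as_int, sliced, stripped, strip_leading_zeros("000"))
```

If the id is later compared against other strings, the string-based variants avoid a round trip through int.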

Python string concatenation of multiple strings separated without comma [duplicate]

This question already has answers here:
String concatenation without '+' operator
(6 answers)
Closed 3 years ago.
Though it might seem a very trivial question, I still want to know the principle behind it. When we write multiple strings together without any comma, Python concatenates them. I was under the impression that it would throw an error. Below is a sample:
print('hello''world')
# This will output helloworld
Even if I write those multiple strings in the Python REPL, the output is the concatenated form of the strings. Can anyone please explain the logic behind this operation?
See https://docs.python.org/3.8/reference/lexical_analysis.html#string-literal-concatenation.
Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation.
Thus, "hello" 'world' is equivalent to "helloworld". This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings.
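The uses mentioned in the quoted documentation can be sketched as follows. The message text and the pattern are made up for illustration. The last lines show a common pitfall of the same rule: a forgotten comma in a list silently merges two items.

```python
import re

# Split a long string across lines; the literals are joined into one string.
message = (
    "Adjacent literals are joined into a single string, "
    'even when the quote styles differ, '
    "and across lines inside parentheses."
)

# Attach a comment to each part of a pattern.
phone_pattern = re.compile(
    r"(\d{3})"   # first three digits
    r"[-. ]?"    # optional separator
    r"(\d{4})"   # last four digits
)

# Pitfall: the missing comma after "a" merges it with "b".
items = ["a" "b", "c"]
print(len(items))
```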

How to use string slicing inside string.format [duplicate]

This question already has answers here:
Slicing strings in str.format
(6 answers)
Closed 6 years ago.
How can I do variable string slicing inside string.format like this.
"{0[:2]} Some text {0[2:4]} some text".format("123456")
I want a result like this:
12 Some text 34 some text
You can't. Best you can do is limit how many characters of a string are printed (roughly equivalent to specifying a slice end), but you can't specify arbitrary start or end indices.
Save the data to a named variable and pass the slices to the format method. It's more readable, more intuitive, and easier for the parser to identify errors when they occur:
mystr = "123456"
"{} Some text {} some text".format(mystr[:2], mystr[2:4])
You could move some of the work from that to the format string if you really wanted to, but it's not a huge improvement (and in fact, involves larger temporaries when a slice ends up being needed anyway):
"{:.2s} Some text {:.2s} some text".format(mystr, mystr[2:])

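As a side note to the answers above: on Python 3.6 and later, f-strings accept arbitrary expressions in the replacement fields, so slices can be written inline. The sketch below shows that alongside a small wrapper around str.format; format_slices is a hypothetical helper, not a standard function.

```python
mystr = "123456"

# f-strings evaluate the expression inside the braces, including slices.
with_fstring = f"{mystr[:2]} Some text {mystr[2:4]} some text"


def format_slices(template, text, *slices):
    """Fill the template's positional fields with slices of text."""
    return template.format(*(text[s] for s in slices))


with_format = format_slices(
    "{} Some text {} some text", mystr, slice(0, 2), slice(2, 4)
)
print(with_fstring)
print(with_format)
```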
Type inference of values contained in strings stored in a list

I am trying to figure out how to do some nice type inference on the columns of a CSV file.
Are there any libraries that might tell me, for example, that a column contains only integers?
All values are of course available in string format.
I will write my own tool if nothing of this sort already exists, but it seems weird to me that such a basic task does not have a library counterpart somewhere.
Why don't you do the straightforward approach?
if all values can be parsed as integers, the column is integers
otherwise, if all values can be parsed as doubles, the column is doubles
otherwise, the column is strings
The reason why there is no library for this is probably because it's trivial to implement using the existing string to int and string to double conversion functions.
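The fallback chain described above could be sketched like this. infer_column_type and _all_parse are hypothetical names. The sketch assumes an empty column or an empty cell should fall through to str, and it inherits the quirks of the built-in converters (for example, float() also accepts "nan" and "inf").

```python
def _all_parse(values, converter):
    """Return True if converter accepts every value without ValueError."""
    for value in values:
        try:
            converter(value)
        except ValueError:
            return False
    return True


def infer_column_type(values):
    """Return int, float, or str for a column of string cells."""
    values = list(values)
    if not values:
        return str          # assumption: an empty column is treated as strings
    if _all_parse(values, int):
        return int
    if _all_parse(values, float):
        return float
    return str


print(infer_column_type(["1", "2", "-3"]).__name__)
```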
Regular expressions are good for that. In Python, you could use something like this:
import re

# Optional minus sign, digits, then an optional decimal part
NUMBER_PATTERN = re.compile(r"^-?\d+(\.\d+)?$")

def str_is_num(s):
    # True for strings such as "42", "-7" or "3.14"
    return NUMBER_PATTERN.match(s) is not None
To check whether a cell is a number, evaluate str_is_num(cell).

Python, string (consisting of variable and strings, concatenated) used as new variable name?

I've been searching for this but am coming up a little short on exactly how to do what I am trying to do. I want to concatenate a string literal and a string variable, as below, and then use the result as the name of a list, which I index with another variable. I simplified my code to show just the relevant parts; it is part of a macro that replaces values:
toreplacetype = 'type'
toreplace_indx = 5
replacement_string = 'list'+toreplacetype[toreplace_indx]
so... I am trying to make the string on the last line equal to the actual variable name:
replacement_string = listtype[5]
Any advice on how to do this is appreciated
EDIT:
To explain further, this is for a macro that is a sort of template system: I mark things in a Python script that I want to replace with specific values, and I use regex to find them. When I match something, I want to replace it with a specific value from a list. For example, the template contains {{type}}, which I extract. I then need to manipulate it as above so that the extracted value "type" selects a value from a list called "listtype". There is more than one list, so I need to find the one called "listtype", and I want to build that name by concatenation from the value I extracted using regex.
This is not recommended. Use a dict instead.
vars['list%s' % toreplacetype][5] = ...
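A minimal sketch of the dict-based approach for the template use case described in the question: the lists are keyed by the name that appears between the braces, so no variable name has to be built from strings. The names lists and render, and the sample data, are illustrative assumptions.

```python
import re

# One dict holds every list, keyed by the placeholder name.
lists = {
    "type": ["a", "b", "c", "d", "e", "f"],
    "color": ["red", "green", "blue", "cyan", "pink", "gold"],
}


def render(template, lists, index):
    """Replace each {{name}} with lists[name][index]."""
    def replace(match):
        name = match.group(1)
        # Raises KeyError for an unknown placeholder name.
        return str(lists[name][index])

    return re.sub(r"\{\{(\w+)\}\}", replace, template)


print(render("value = {{type}}, shade = {{color}}", lists, 5))
```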
Hrm...
globals()['list%s'% toreplacetype][toreplace_indx]
replacement_string = 'list'+toreplacetype+'['+str(toreplace_indx)+']'
will yield listtype[5] when you print it.
You basically need to break it into 5 parts: 1 string variable, 3 string literals and an int cast to a string.
I think this is what you are asking?
