How can I use batch embeddings using OpenAI's API? - python

I am using the OpenAI API to get embeddings for a large number of sentences (thousands of them). Is there a way to make this faster, for example by computing the embeddings concurrently?
I tried looping through the sentences and sending one request per sentence, but that was very slow, and so was sending a list of sentences in a single request. In both cases I used this code:
import requests

# key holds your OpenAI API key
response = requests.post(
    "https://api.openai.com/v1/embeddings",
    json={
        "model": "text-embedding-ada-002",
        "input": ["text:This is a test", "text:This is another test", "text:This is a third test", "text:This is a fourth test", "text:This is a fifth test", "text:This is a sixth test", "text:This is a seventh test", "text:This is a eighth test", "text:This is a ninth test", "text:This is a tenth test", "text:This is a eleventh test", "text:This is a twelfth test", "text:This is a thirteenth test", "text:This is a fourteenth test", "text:This is a fifteenth test", "text:This is a sixteenth test", "text:This is a seventeenth test", "text:This is a eighteenth test", "text:This is a nineteenth test", "text:This is a twentieth test", "text:This is a twenty first test", "text:This is a twenty second test", "text:This is a twenty third test", "text:This is a twenty fourth test", "text:This is a twenty fifth test", "text:This is a twenty sixth test", "text:This is a twenty seventh test", "text:This is a twenty eighth test", "text:This is a twenty ninth test", "text:This is a thirtieth test", "text:This is a thirty first test", "text:This is a thirty second test", "text:This is a thirty third test", "text:This is a thirty fourth test", "text:This is a thirty fifth test", "text:This is a thirty sixth test", "text:This is a thirty seventh test", "text:This is a thirty eighth test", "text:This is a thirty ninth test", "text:This is a fourtieth test", "text:This is a forty first test", "text:This is a forty second test", "text:This is a forty third test", "text:This is a forty fourth test", "text:This is a forty fifth test", "text:This is a forty sixth test", "text:This is a forty seventh test", "text:This is a forty eighth test", "text:This is a forty ninth test", "text:This is a fiftieth test", "text:This is a fifty first test", "text:This is a fifty second test", "text:This is a fifty third test"],
    },
    headers={
        "Authorization": f"Bearer {key}"
    }
)
For the first test I sent a bunch of those requests one by one, and for the second I sent a single list. Should I send individual requests in parallel? Would that help? Thanks!

According to OpenAI's Create Embeddings API reference, you should be able to do this:
To get embeddings for multiple inputs in a single request, pass an array of strings or array of token arrays. Each input must not exceed 8192 tokens in length.
https://beta.openai.com/docs/api-reference/embeddings/create
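One way to combine both ideas from the question (lists of inputs and parallel requests) is to split the sentences into batches and send a few batches at a time. The following is only a sketch under stated assumptions: the batch size of 100, the 4 worker threads, and the helper names chunked, send_batch, embed_all and the sender parameter are all illustrative and not part of the OpenAI API. The endpoint and model are the ones used in the question, and your account's rate limits still apply.

```python
from concurrent.futures import ThreadPoolExecutor

API_URL = "https://api.openai.com/v1/embeddings"  # endpoint from the question
MODEL = "text-embedding-ada-002"                  # model from the question


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def send_batch(batch, key):
    """POST one batch of strings and return its embeddings in input order."""
    import requests  # third-party; imported here so the helpers work without it

    response = requests.post(
        API_URL,
        json={"model": MODEL, "input": batch},
        headers={"Authorization": f"Bearer {key}"},
        timeout=60,
    )
    response.raise_for_status()
    # Each item in "data" carries an "index" pointing back at its input position.
    data = sorted(response.json()["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in data]


def embed_all(sentences, key, batch_size=100, workers=4, sender=send_batch):
    """Embed all sentences, sending several batches concurrently.

    batch_size and workers are arbitrary defaults (assumptions), and
    `sender` is a hypothetical hook that makes the function easy to test.
    """
    batches = chunked(sentences, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `batches`.
        results = pool.map(lambda batch: sender(batch, key), batches)
        return [embedding for batch_result in results for embedding in batch_result]
```

Calling embed_all(sentences, key) would then return one embedding per sentence, in the original order. Threads are a reasonable fit here because the work is dominated by waiting on the network.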

Related

automatically appended string loop

I was supposed to build a program that automatically prints the lyrics of a song (The Twelve Days of Christmas), so that each verse repeats the previous one, extended by the new lyric for that day.
For instance:
verse1 = '''On the first day of Christmas
my true love sent to me:
A Partridge in a Pear Tree'''
verse2 = '''On the second day of Christmas
my true love sent to me:
2 Turtle Doves
and a Partridge in a Pear Tree'''
I get stuck with the loops and get "TypeError: '<' not supported between instances of 'str' and 'int'". What I do know is that I will have to use str.join(). Thank you in advance.
Here is a naive implementation (reference for the content):
verses = ['a partridge in a pear tree', 'two turtle doves', 'three French hens',
          'four calling birds', 'five gold rings', 'six geese a-laying',
          'seven swans a-swimming', 'eight maids a-milking', 'nine ladies dancing',
          'ten lords a-leaping', 'eleven pipers piping', 'twelve drummers drumming']
days = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth',
        'seventh', 'eighth', 'ninth', 'tenth', 'eleventh', 'twelfth']
for i, day in enumerate(days):
    print(f'On the {day} day of Christmas,\nmy true love sent to me:')
    # Walk backwards from today's gift down to the first one.
    for verse in verses[i::-1]:
        print(verse)
    if i == 0:
        # From the second day on, the last line starts with "and".
        verses[0] = 'and ' + verses[0]
    print()
Output:
On the first day of Christmas,
my true love sent to me:
a partridge in a pear tree
On the second day of Christmas,
my true love sent to me:
two turtle doves
and a partridge in a pear tree
On the third day of Christmas,
my true love sent to me:
three French hens
two turtle doves
and a partridge in a pear tree
On the fourth day of Christmas,
my true love sent to me:
four calling birds
three French hens
two turtle doves
and a partridge in a pear tree
On the fifth day of Christmas,
my true love sent to me:
five gold rings
four calling birds
three French hens
two turtle doves
and a partridge in a pear tree
On the sixth day of Christmas,
my true love sent to me:
six geese a-laying
five gold rings
four calling birds
three French hens
two turtle doves
and a partridge in a pear tree
On the seventh day of Christmas,
my true love sent to me:
seven swans a-swimming
six geese a-laying
five gold rings
four calling birds
three French hens
two turtle doves
and a partridge in a pear tree
On the eighth day of Christmas,
my true love sent to me:
eight maids a-milking
seven swans a-swimming
six geese a-laying
five gold rings
four calling birds
three French hens
two turtle doves
and a partridge in a pear tree
On the ninth day of Christmas,
my true love sent to me:
nine ladies dancing
eight maids a-milking
seven swans a-swimming
six geese a-laying
five gold rings
four calling birds
three French hens
two turtle doves
and a partridge in a pear tree
On the tenth day of Christmas,
my true love sent to me:
ten lords a-leaping
nine ladies dancing
eight maids a-milking
seven swans a-swimming
six geese a-laying
five gold rings
four calling birds
three French hens
two turtle doves
and a partridge in a pear tree
On the eleventh day of Christmas,
my true love sent to me:
eleven pipers piping
ten lords a-leaping
nine ladies dancing
eight maids a-milking
seven swans a-swimming
six geese a-laying
five gold rings
four calling birds
three French hens
two turtle doves
and a partridge in a pear tree
On the twelfth day of Christmas,
my true love sent to me:
twelve drummers drumming
eleven pipers piping
ten lords a-leaping
nine ladies dancing
eight maids a-milking
seven swans a-swimming
six geese a-laying
five gold rings
four calling birds
three French hens
two turtle doves
and a partridge in a pear tree
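Since the question mentions str.join(), here is a sketch of the same song built as strings instead of printed line by line. The function name build_verse is made up for illustration; the gift and day lists are copied from the answer above. Incidentally, the TypeError quoted in the question is what Python raises when a string is compared with an integer using <, so keeping the day as an integer index avoids it.

```python
verses = ['a partridge in a pear tree', 'two turtle doves', 'three French hens',
          'four calling birds', 'five gold rings', 'six geese a-laying',
          'seven swans a-swimming', 'eight maids a-milking', 'nine ladies dancing',
          'ten lords a-leaping', 'eleven pipers piping', 'twelve drummers drumming']
days = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth',
        'seventh', 'eighth', 'ninth', 'tenth', 'eleventh', 'twelfth']


def build_verse(i):
    """Return the full text of verse i (0-based) as a single string."""
    gifts = verses[i::-1]  # a new list: today's gift first, partridge last
    if i > 0:
        # Only the copy is modified, so `verses` itself stays untouched.
        gifts[-1] = 'and ' + gifts[-1]
    header = f'On the {days[i]} day of Christmas,\nmy true love sent to me:'
    return '\n'.join([header] + gifts)


# Join the twelve verses with a blank line between them.
song = '\n\n'.join(build_verse(i) for i in range(len(days)))
print(song)
```

Building the text first and printing once also makes the result easy to reuse, for example to write it to a file.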

How to collect a specific value from a text that is always a few lines below another text value?

To retrieve the values from the JSON file, I use this code:
import json

with open('Bla_Bla_Bla.json') as f:
    json_file = json.load(f)
master_data = json_file['messages']
for unique_message in master_data:
    print(unique_message['text'])
This is the JSON template:
{
  "name": "Bla Bla Bla",
  "type": "public_channel",
  "id": 123456789,
  "messages": [
    {
      "id": 12460,
      "type": "message",
      "date": "2022-02-07T04:14:51",
      "from": "Bla Bla Bla",
      "from_id": "channel1127646148",
      "text": "Yesterday's Tips\n \n First Half\n Tips: 2\n Win/Loss: 0/2\n Void: 0\n Win Ratio: 0%\n Invested: 2u\n Net Profit: -2u\n ROI: -100%\n \n Second Half\n Tips: 10\n Win/Loss: 4/4\n Void: 2\n Win Ratio: 40%\n Invested: 8u\n Net Profit: 1.145u\n ROI: 14.3%\n \n Total\n Tips: 12\n Win/Loss: 4/6\n Void: 2\n Win Ratio: 33.3%\n Invested: 10u\n Net Profit: -0.855u\n ROI: -8.6%\n \n Highest Odds won: 2.500"
    }
  ]
}
This is the output text:
Yesterday's Tips
First Half
Tips: 2
Win/Loss: 0/2
Void: 0
Win Ratio: 0%
Invested: 2u
Net Profit: -2u
ROI: -100%
Second Half
Tips: 10
Win/Loss: 4/4
Void: 2
Win Ratio: 40%
Invested: 8u
Net Profit: 1.145u
ROI: 14.3%
Total
Tips: 12
Win/Loss: 4/6
Void: 2
Win Ratio: 33.3%
Invested: 10u
Net Profit: -0.855u
ROI: -8.6%
Highest Odds won: 2.500
I'm trying to collect the value of Net Profit: that appears inside the Total section. The texts and their positions can change, but whenever there is a Total, there will be a Net Profit: after it.
The value I want to collect in this case is:
-0.855u
How can I go about getting that specific part of the text value?
For more complex cases, you may want to look into Regular Expressions (Regex), but for this simple case:
string = unique_message["text"] # the json data
string = string[string.find("Total"):] # cut all the part before "Total"
string = string[string.find("Net Profit") + 12:] # cut all the part before "Net Profit" (the + 12 removes the Net Profit bit too)
string = string[:string.find("\n")] # take the part of the string till the next line
print(string)
Here, we use the str.find() function which returns the index of a certain part of a string together with string slicing string[start:end] to find only the part of the string needed. The \n is a special character which denotes the end of a line.
If the Net Profit: -0.855u is always after the Total, you can first find the index of Total and then use that to further narrow down your search scope and look for Net Profit.
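# Your other logic
text = unique_message['text']
try:
    i = text.index("Total")
    # Index of "Net Profit:" relative to the start of the "Total" section
    profit_beginning = text[i:].index("Net Profit:")
except ValueError:
    # "Total" or "Net Profit:" is missing, so there is no profit to look for
    pass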
You can further refine the lookup by using a regular expression to capture only the number.
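As a sketch of the regular-expression approach mentioned in both answers, the pattern below looks for the first Net Profit: that follows Total and captures just the number. The sample text is a shortened version of the message from the question, and the pattern assumes the value is an optionally negative decimal followed by an optional "u".

```python
import re

# Shortened sample of the "text" field from the question.
text = (
    "Second Half\n Tips: 10\n Net Profit: 1.145u\n ROI: 14.3%\n \n"
    " Total\n Tips: 12\n Win/Loss: 4/6\n Net Profit: -0.855u\n ROI: -8.6%"
)

# After "Total", skip lazily (across newlines, thanks to re.DOTALL) to the
# first "Net Profit:" and capture an optional minus sign, the digits, and
# an optional unit letter.
pattern = re.compile(r"Total.*?Net Profit:\s*(-?\d+(?:\.\d+)?)(u?)", re.DOTALL)

match = pattern.search(text)
if match:
    amount = float(match.group(1))  # numeric part only
    unit = match.group(2)           # "u" if present
    print(match.group(1) + unit)    # prints -0.855u
```

Because the search starts at Total, the Net Profit lines of the earlier sections are ignored even though they appear first in the text.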

How to print entire Twelve Days of Christmas lyrics without loops and if-else

As stated in the title, I tried many ways, and the closest I got is this:
lyrics = ['A partridge in a pear tree', 'Two turtle doves, and', 'Three French hens',
          'Four colly birds', 'Five Gold Rings', 'Six geese a-laying',
          'Seven swans a-swimming', 'Eights maids a-milking', 'Nine ladies dancing',
          'Ten lords a-leaping', 'Elven piper piping', 'Twelve drummers drumming']
days = ['first', 'second', 'third', 'fourth', 'fifth', 'Sixth', 'Seventh', 'Eighth',
        'Nineth', 'Tenth', 'Eleventh', 'Twelveth']
x = 1
def base():
    print("On the " + days[0] + " day of christmas my true love sent to me")
    print(lyrics[0]+"\n")
def day_of_christmas(x):
    try:
        print("On the " + days[x] + " day of christmas my true love sent to me")
        y = count_days(x)
        day_of_christmas(y)
    except IndexError:
        return None
def count_days(day):
    try:
        print(str(lyrics[day]))
        print(str(lyrics[day-1]))
        print(str(lyrics[day-2]))
        print(str(lyrics[day-3]))
        print(str(lyrics[day-4]))
        print(str(lyrics[day-5]))
        print(str(lyrics[day-6]))
        print(str(lyrics[day-7]))
        print(str(lyrics[day-8]))
        print(str(lyrics[day-9]))
        print(str(lyrics[day-10]))
        print(str(lyrics[day-11]+"\n"))
    except IndexError:
        return None
    return day+1
base()
day_of_christmas(x)
My output is:
On the first day of christmas my true love sent to me
A partridge in a pear tree
On the second day of christmas my true love sent to me
Two turtle doves, and
A partridge in a pear tree
Twelve drummers drumming
Elven piper piping
Ten lords a-leaping
Nine ladies dancing
Eights maids a-milking
Seven swans a-swimming
Six geese a-laying
Five Gold Rings
Four colly birds
Three French hens
On the third day of christmas my true love sent to me
Three French hens
Two turtle doves, and
A partridge in a pear tree
Twelve drummers drumming
Elven piper piping
Ten lords a-leaping
Nine ladies dancing
Eights maids a-milking
Seven swans a-swimming
Six geese a-laying
Five Gold Rings
Four colly birds
On the fourth day of christmas my true love sent to me
Four colly birds
Three French hens
Two turtle doves, and
A partridge in a pear tree
Twelve drummers drumming
Elven piper piping
Ten lords a-leaping
Nine ladies dancing
Eights maids a-milking
Seven swans a-swimming
Six geese a-laying
Five Gold Rings
The output basically repeats itself (too long to display in full); only the 12th day has the correct output. I know I am forcing 12 lines for each day and that they repeat because of negative list indexing, but I need to solve this problem without loops and if-else.
I expected this output (in this order, up to the 12th day):
On the first day of christmas my true love sent to me
A partridge in a pear tree
On the second day of christmas my true love sent to me
Two turtle doves, and
A partridge in a pear tree
On the third day of christmas my true love sent to me
Three French hens
Two turtle doves, and
A partridge in a pear tree
On the fourth day of christmas my true love sent to me
Four colly birds
Three French hens
Two turtle doves, and
A partridge in a pear tree
I like @cricket_007's solution, but you can do it recursively too. This is a bit silly:
lyrics = [
    ("first", "A partridge in a pear tree"),
    ("second", "Two turtle doves, and"),
    ("third", "Three French hens"),
    ("fourth", "Four colly birds"),
    ("fifth", "Five Gold Rings")
]
def get_lyrics_for_day(n):
    current_lyrics = [lyrics[n][1]]
    if n != 0:
        previous_lyrics = get_lyrics_for_day(n-1)
        current_lyrics.extend(previous_lyrics)
    return current_lyrics
def print_lyrics(iteration):
    if iteration == len(lyrics):
        return
    all_lyrics = get_lyrics_for_day(iteration)
    nth = lyrics[iteration][0]
    print("\n".join([f"On the {nth} day of christmas my true love sent to me"] + all_lyrics), end="\n\n")
    print_lyrics(iteration+1)
print_lyrics(0)
Output:
On the first day of christmas my true love sent to me
A partridge in a pear tree
On the second day of christmas my true love sent to me
Two turtle doves, and
A partridge in a pear tree
On the third day of christmas my true love sent to me
Three French hens
Two turtle doves, and
A partridge in a pear tree
On the fourth day of christmas my true love sent to me
Four colly birds
Three French hens
Two turtle doves, and
A partridge in a pear tree
On the fifth day of christmas my true love sent to me
Five Gold Rings
Four colly birds
Three French hens
Two turtle doves, and
A partridge in a pear tree
Use corecursion (a fancy name for counting up from a starting point, rather than down to a base case), and stop by catching the IndexError raised when sing_day(12) tries to access days[12].
def sing_day(n):
    # This line raises an IndexError when n == 12
    print("On the {} day of ...".format(days[n]))
    print("\n".join(lyrics[n::-1]))
    print()
    sing_day(n+1)  # Corecurse on the next day
def print_lyrics():
    try:
        sing_day(0)  # Start the song, and keep going as long as you can
    except IndexError:
        pass  # We got to sing_day(12), we can stop now.
print_lyrics()
Or, abuse list(map(...)) for the side effect of calling sing_day:
def sing_day(n):
    print("On the ...")
    print("\n".join(...))
    print()
def print_lyrics():
    list(map(sing_day, range(12)))
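For reference, a filled-in version of the map-based outline might look like the sketch below. It assumes the lyrics and days lists from the question (with the spelling of a few entries corrected) and takes the header wording from the expected output; the slice lyrics[n::-1] is borrowed from the corecursive answer.

```python
lyrics = ['A partridge in a pear tree', 'Two turtle doves, and', 'Three French hens',
          'Four colly birds', 'Five Gold Rings', 'Six geese a-laying',
          'Seven swans a-swimming', 'Eight maids a-milking', 'Nine ladies dancing',
          'Ten lords a-leaping', 'Eleven pipers piping', 'Twelve drummers drumming']
days = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth',
        'seventh', 'eighth', 'ninth', 'tenth', 'eleventh', 'twelfth']


def sing_day(n):
    """Print verse n (0-based): the header, then gifts from day n down to day 0."""
    print("On the " + days[n] + " day of christmas my true love sent to me")
    print("\n".join(lyrics[n::-1]))
    print()


def print_lyrics():
    # map() is lazy in Python 3, so list() is what forces every call to run.
    list(map(sing_day, range(len(days))))


print_lyrics()
```

There is no explicit loop or if-else here, although map() is of course iterating internally.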
Construct a sequence of functions like so:
def the12th():
    print("On the twelfth day...")
    the11th()
def the11th():
    print("On the eleventh day...")
    the10th()
and so on. Then in your main, call them like this:
the1st()
the2nd()
the3rd()
and so on.
A loop over the 12 days would be preferred, but you can use list slicing to get the lyrics, then join them on a newline.
def twelve_days(day):
    print("On the {} day of christmas my true love sent to me".format(days[day]))
    print('\n'.join(reversed(lyrics[:day+1])))
twelve_days(0)
twelve_days(1)
twelve_days(2)
twelve_days(3)
Output
On the first day of christmas my true love sent to me
A partridge in a pear tree
On the second day of christmas my true love sent to me
Two turtle doves, and
A partridge in a pear tree
On the third day of christmas my true love sent to me
Three French hens
Two turtle doves, and
A partridge in a pear tree
On the fourth day of christmas my true love sent to me
Four colly birds
Three French hens
Two turtle doves, and
A partridge in a pear tree
Note that calling day_of_christmas(y) inside day_of_christmas(x) is effectively a loop, just implemented with recursion.
Check this out:
def print_lyrics(count):
    print("On the " + days[count] + " day of christmas my true love sent to me")
    # Newest gift first, down to the first day's gift
    print('\n'.join(lyrics[count::-1]), end="\n")
    print()
    count = count + 1
    if count < len(days):
        print_lyrics(count)  # recurse for the next day
count = 0
print_lyrics(count)

how to return numbers in word format from a string in python

EDIT: The "already answered" question is not about the same thing. My string already comes out in word format; I need to extract those number words from my string into a list.
I'm trying to work with phrase manipulation for a voice assistant in Python.
When I speak something like:
"What is 15,276 divided by 5?"
It comes out formatted like this:
"what is fifteen thousand two hundred seventy six divided by five"
I already have a way to change a string to an int for the math part, so is there a way to somehow get a list like this from the phrase?
['fifteen thousand two hundred seventy six','five']
Go through the list of words and check each one for membership in a set of numerical words. Add adjacent words to a temporary list. When you find a non-number word, create a new list. Remember to account for non-number words at the beginning of the sentence and number words at the end of the sentence.
result = []
group = []
nums = set('one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty thirty forty fifty sixty seventy eighty ninety hundred thousand million billion trillion quadrillion quintillion'.split())
for word in "what is fifteen thousand two hundred seventy six divided by five".split():
    if word in nums:
        group.append(word)
    else:
        if group:
            result.append(group)
        group = []
if group:
    result.append(group)
Result:
>>> result
[['fifteen', 'thousand', 'two', 'hundred', 'seventy', 'six'], ['five']]
To join each sublist into a single string:
>>> list(map(' '.join, result))
['fifteen thousand two hundred seventy six', 'five']
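The same grouping of adjacent number words can also be sketched with itertools.groupby, which collects consecutive items that share a key. The names NUMBER_WORDS and number_phrases are made up for this example, and the word set is the one from the answer above. Like the original, it assumes no filler words such as "and" appear inside a number.

```python
from itertools import groupby

NUMBER_WORDS = set(
    "one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen twenty thirty forty "
    "fifty sixty seventy eighty ninety hundred thousand million billion "
    "trillion quadrillion quintillion".split()
)


def number_phrases(sentence):
    """Return each run of consecutive number words as one string."""
    words = sentence.split()
    return [
        " ".join(run)
        # groupby yields (key, run) pairs; the key is True for number words.
        for is_number, run in groupby(words, key=NUMBER_WORDS.__contains__)
        if is_number
    ]


print(number_phrases("what is fifteen thousand two hundred seventy six divided by five"))
# ['fifteen thousand two hundred seventy six', 'five']
```

This also covers number words at the very start or end of the sentence without any special handling.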

Difference between unicode.isdigit() and unicode.isnumeric()

What is the difference between methods unicode.isdigit() and unicode.isnumeric()?
The Python 3 documentation is a little clearer than the Python 2 docs:
str.isdigit()
[...] Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.
str.isnumeric()
Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.
So isnumeric() tests additionally for Numeric_Type=Numeric. Quoting from a historic proposal for official numeric type definitions:
Numeric_Type=Decimal
Characters used in a positional decimal systems, which standard base-10 radix systems with contiguous digits 0..9, and are most-significant-digit first (backingstore order). These are coextensive by definition with General_Category=Decimal_Number.
Numeric_Type=Digit
Variants of positional decimal characters (Numeric_Type=Decimal) or sequences thereof. These include super/subscripts, enclosed, or decorated by the addition of characters such as parentheses, dots, or commas.
Numeric_Type=Numeric
Characters with numeric value, but that are neither Decimal nor Digit.
So any character that is numeric, but not decimal or a variation thereof. Think fractions, roman numerals, glyphs that combine digits, and any numbering system that is not decimal-based.
That includes:
>>> import unicodedata
>>> for codepoint in range(2**16):
...     chr = unichr(codepoint)
...     if chr.isnumeric() and not chr.isdigit():
...         print u'{:04x}: {} ({})'.format(codepoint, chr, unicodedata.name(chr, 'UNNAMED'))
...
00bc: ¼ (VULGAR FRACTION ONE QUARTER)
00bd: ½ (VULGAR FRACTION ONE HALF)
00be: ¾ (VULGAR FRACTION THREE QUARTERS)
09f4: ৴ (BENGALI CURRENCY NUMERATOR ONE)
09f5: ৵ (BENGALI CURRENCY NUMERATOR TWO)
09f6: ৶ (BENGALI CURRENCY NUMERATOR THREE)
09f7: ৷ (BENGALI CURRENCY NUMERATOR FOUR)
09f8: ৸ (BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR)
09f9: ৹ (BENGALI CURRENCY DENOMINATOR SIXTEEN)
0bf0: ௰ (TAMIL NUMBER TEN)
0bf1: ௱ (TAMIL NUMBER ONE HUNDRED)
0bf2: ௲ (TAMIL NUMBER ONE THOUSAND)
0c78: ౸ (TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR)
0c79: ౹ (TELUGU FRACTION DIGIT ONE FOR ODD POWERS OF FOUR)
0c7a: ౺ (TELUGU FRACTION DIGIT TWO FOR ODD POWERS OF FOUR)
0c7b: ౻ (TELUGU FRACTION DIGIT THREE FOR ODD POWERS OF FOUR)
0c7c: ౼ (TELUGU FRACTION DIGIT ONE FOR EVEN POWERS OF FOUR)
0c7d: ౽ (TELUGU FRACTION DIGIT TWO FOR EVEN POWERS OF FOUR)
0c7e: ౾ (TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR)
0d70: ൰ (MALAYALAM NUMBER TEN)
0d71: ൱ (MALAYALAM NUMBER ONE HUNDRED)
0d72: ൲ (MALAYALAM NUMBER ONE THOUSAND)
0d73: ൳ (MALAYALAM FRACTION ONE QUARTER)
0d74: ൴ (MALAYALAM FRACTION ONE HALF)
0d75: ൵ (MALAYALAM FRACTION THREE QUARTERS)
0f2a: ༪ (TIBETAN DIGIT HALF ONE)
0f2b: ༫ (TIBETAN DIGIT HALF TWO)
0f2c: ༬ (TIBETAN DIGIT HALF THREE)
0f2d: ༭ (TIBETAN DIGIT HALF FOUR)
0f2e: ༮ (TIBETAN DIGIT HALF FIVE)
0f2f: ༯ (TIBETAN DIGIT HALF SIX)
0f30: ༰ (TIBETAN DIGIT HALF SEVEN)
0f31: ༱ (TIBETAN DIGIT HALF EIGHT)
0f32: ༲ (TIBETAN DIGIT HALF NINE)
0f33: ༳ (TIBETAN DIGIT HALF ZERO)
1372: ፲ (ETHIOPIC NUMBER TEN)
1373: ፳ (ETHIOPIC NUMBER TWENTY)
1374: ፴ (ETHIOPIC NUMBER THIRTY)
1375: ፵ (ETHIOPIC NUMBER FORTY)
1376: ፶ (ETHIOPIC NUMBER FIFTY)
1377: ፷ (ETHIOPIC NUMBER SIXTY)
1378: ፸ (ETHIOPIC NUMBER SEVENTY)
1379: ፹ (ETHIOPIC NUMBER EIGHTY)
137a: ፺ (ETHIOPIC NUMBER NINETY)
137b: ፻ (ETHIOPIC NUMBER HUNDRED)
137c: ፼ (ETHIOPIC NUMBER TEN THOUSAND)
16ee: ᛮ (RUNIC ARLAUG SYMBOL)
16ef: ᛯ (RUNIC TVIMADUR SYMBOL)
16f0: ᛰ (RUNIC BELGTHOR SYMBOL)
17f0: ៰ (KHMER SYMBOL LEK ATTAK SON)
17f1: ៱ (KHMER SYMBOL LEK ATTAK MUOY)
17f2: ៲ (KHMER SYMBOL LEK ATTAK PII)
17f3: ៳ (KHMER SYMBOL LEK ATTAK BEI)
17f4: ៴ (KHMER SYMBOL LEK ATTAK BUON)
17f5: ៵ (KHMER SYMBOL LEK ATTAK PRAM)
17f6: ៶ (KHMER SYMBOL LEK ATTAK PRAM-MUOY)
17f7: ៷ (KHMER SYMBOL LEK ATTAK PRAM-PII)
17f8: ៸ (KHMER SYMBOL LEK ATTAK PRAM-BEI)
17f9: ៹ (KHMER SYMBOL LEK ATTAK PRAM-BUON)
2150: ⅐ (VULGAR FRACTION ONE SEVENTH)
2151: ⅑ (VULGAR FRACTION ONE NINTH)
2152: ⅒ (VULGAR FRACTION ONE TENTH)
2153: ⅓ (VULGAR FRACTION ONE THIRD)
2154: ⅔ (VULGAR FRACTION TWO THIRDS)
2155: ⅕ (VULGAR FRACTION ONE FIFTH)
2156: ⅖ (VULGAR FRACTION TWO FIFTHS)
2157: ⅗ (VULGAR FRACTION THREE FIFTHS)
2158: ⅘ (VULGAR FRACTION FOUR FIFTHS)
2159: ⅙ (VULGAR FRACTION ONE SIXTH)
215a: ⅚ (VULGAR FRACTION FIVE SIXTHS)
215b: ⅛ (VULGAR FRACTION ONE EIGHTH)
215c: ⅜ (VULGAR FRACTION THREE EIGHTHS)
215d: ⅝ (VULGAR FRACTION FIVE EIGHTHS)
215e: ⅞ (VULGAR FRACTION SEVEN EIGHTHS)
215f: ⅟ (FRACTION NUMERATOR ONE)
2160: Ⅰ (ROMAN NUMERAL ONE)
2161: Ⅱ (ROMAN NUMERAL TWO)
2162: Ⅲ (ROMAN NUMERAL THREE)
2163: Ⅳ (ROMAN NUMERAL FOUR)
2164: Ⅴ (ROMAN NUMERAL FIVE)
2165: Ⅵ (ROMAN NUMERAL SIX)
2166: Ⅶ (ROMAN NUMERAL SEVEN)
2167: Ⅷ (ROMAN NUMERAL EIGHT)
2168: Ⅸ (ROMAN NUMERAL NINE)
2169: Ⅹ (ROMAN NUMERAL TEN)
216a: Ⅺ (ROMAN NUMERAL ELEVEN)
216b: Ⅻ (ROMAN NUMERAL TWELVE)
216c: Ⅼ (ROMAN NUMERAL FIFTY)
216d: Ⅽ (ROMAN NUMERAL ONE HUNDRED)
216e: Ⅾ (ROMAN NUMERAL FIVE HUNDRED)
216f: Ⅿ (ROMAN NUMERAL ONE THOUSAND)
2170: ⅰ (SMALL ROMAN NUMERAL ONE)
2171: ⅱ (SMALL ROMAN NUMERAL TWO)
2172: ⅲ (SMALL ROMAN NUMERAL THREE)
2173: ⅳ (SMALL ROMAN NUMERAL FOUR)
2174: ⅴ (SMALL ROMAN NUMERAL FIVE)
2175: ⅵ (SMALL ROMAN NUMERAL SIX)
2176: ⅶ (SMALL ROMAN NUMERAL SEVEN)
2177: ⅷ (SMALL ROMAN NUMERAL EIGHT)
2178: ⅸ (SMALL ROMAN NUMERAL NINE)
2179: ⅹ (SMALL ROMAN NUMERAL TEN)
217a: ⅺ (SMALL ROMAN NUMERAL ELEVEN)
217b: ⅻ (SMALL ROMAN NUMERAL TWELVE)
217c: ⅼ (SMALL ROMAN NUMERAL FIFTY)
217d: ⅽ (SMALL ROMAN NUMERAL ONE HUNDRED)
217e: ⅾ (SMALL ROMAN NUMERAL FIVE HUNDRED)
217f: ⅿ (SMALL ROMAN NUMERAL ONE THOUSAND)
2180: ↀ (ROMAN NUMERAL ONE THOUSAND C D)
2181: ↁ (ROMAN NUMERAL FIVE THOUSAND)
2182: ↂ (ROMAN NUMERAL TEN THOUSAND)
2185: ↅ (ROMAN NUMERAL SIX LATE FORM)
2186: ↆ (ROMAN NUMERAL FIFTY EARLY FORM)
2187: ↇ (ROMAN NUMERAL FIFTY THOUSAND)
2188: ↈ (ROMAN NUMERAL ONE HUNDRED THOUSAND)
2189: ↉ (VULGAR FRACTION ZERO THIRDS)
2469: ⑩ (CIRCLED NUMBER TEN)
246a: ⑪ (CIRCLED NUMBER ELEVEN)
246b: ⑫ (CIRCLED NUMBER TWELVE)
246c: ⑬ (CIRCLED NUMBER THIRTEEN)
246d: ⑭ (CIRCLED NUMBER FOURTEEN)
246e: ⑮ (CIRCLED NUMBER FIFTEEN)
246f: ⑯ (CIRCLED NUMBER SIXTEEN)
2470: ⑰ (CIRCLED NUMBER SEVENTEEN)
2471: ⑱ (CIRCLED NUMBER EIGHTEEN)
2472: ⑲ (CIRCLED NUMBER NINETEEN)
2473: ⑳ (CIRCLED NUMBER TWENTY)
247d: ⑽ (PARENTHESIZED NUMBER TEN)
247e: ⑾ (PARENTHESIZED NUMBER ELEVEN)
247f: ⑿ (PARENTHESIZED NUMBER TWELVE)
2480: ⒀ (PARENTHESIZED NUMBER THIRTEEN)
2481: ⒁ (PARENTHESIZED NUMBER FOURTEEN)
2482: ⒂ (PARENTHESIZED NUMBER FIFTEEN)
2483: ⒃ (PARENTHESIZED NUMBER SIXTEEN)
2484: ⒄ (PARENTHESIZED NUMBER SEVENTEEN)
2485: ⒅ (PARENTHESIZED NUMBER EIGHTEEN)
2486: ⒆ (PARENTHESIZED NUMBER NINETEEN)
2487: ⒇ (PARENTHESIZED NUMBER TWENTY)
2491: ⒑ (NUMBER TEN FULL STOP)
2492: ⒒ (NUMBER ELEVEN FULL STOP)
2493: ⒓ (NUMBER TWELVE FULL STOP)
2494: ⒔ (NUMBER THIRTEEN FULL STOP)
2495: ⒕ (NUMBER FOURTEEN FULL STOP)
2496: ⒖ (NUMBER FIFTEEN FULL STOP)
2497: ⒗ (NUMBER SIXTEEN FULL STOP)
2498: ⒘ (NUMBER SEVENTEEN FULL STOP)
2499: ⒙ (NUMBER EIGHTEEN FULL STOP)
249a: ⒚ (NUMBER NINETEEN FULL STOP)
249b: ⒛ (NUMBER TWENTY FULL STOP)
24eb: ⓫ (NEGATIVE CIRCLED NUMBER ELEVEN)
24ec: ⓬ (NEGATIVE CIRCLED NUMBER TWELVE)
24ed: ⓭ (NEGATIVE CIRCLED NUMBER THIRTEEN)
24ee: ⓮ (NEGATIVE CIRCLED NUMBER FOURTEEN)
24ef: ⓯ (NEGATIVE CIRCLED NUMBER FIFTEEN)
24f0: ⓰ (NEGATIVE CIRCLED NUMBER SIXTEEN)
24f1: ⓱ (NEGATIVE CIRCLED NUMBER SEVENTEEN)
24f2: ⓲ (NEGATIVE CIRCLED NUMBER EIGHTEEN)
24f3: ⓳ (NEGATIVE CIRCLED NUMBER NINETEEN)
24f4: ⓴ (NEGATIVE CIRCLED NUMBER TWENTY)
24fe: ⓾ (DOUBLE CIRCLED NUMBER TEN)
277f: ❿ (DINGBAT NEGATIVE CIRCLED NUMBER TEN)
2789: ➉ (DINGBAT CIRCLED SANS-SERIF NUMBER TEN)
2793: ➓ (DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN)
2cfd: ⳽ (COPTIC FRACTION ONE HALF)
3007: 〇 (IDEOGRAPHIC NUMBER ZERO)
3021: 〡 (HANGZHOU NUMERAL ONE)
3022: 〢 (HANGZHOU NUMERAL TWO)
3023: 〣 (HANGZHOU NUMERAL THREE)
3024: 〤 (HANGZHOU NUMERAL FOUR)
3025: 〥 (HANGZHOU NUMERAL FIVE)
3026: 〦 (HANGZHOU NUMERAL SIX)
3027: 〧 (HANGZHOU NUMERAL SEVEN)
3028: 〨 (HANGZHOU NUMERAL EIGHT)
3029: 〩 (HANGZHOU NUMERAL NINE)
3038: 〸 (HANGZHOU NUMERAL TEN)
3039: 〹 (HANGZHOU NUMERAL TWENTY)
303a: 〺 (HANGZHOU NUMERAL THIRTY)
3192: ㆒ (IDEOGRAPHIC ANNOTATION ONE MARK)
3193: ㆓ (IDEOGRAPHIC ANNOTATION TWO MARK)
3194: ㆔ (IDEOGRAPHIC ANNOTATION THREE MARK)
3195: ㆕ (IDEOGRAPHIC ANNOTATION FOUR MARK)
3220: ㈠ (PARENTHESIZED IDEOGRAPH ONE)
3221: ㈡ (PARENTHESIZED IDEOGRAPH TWO)
3222: ㈢ (PARENTHESIZED IDEOGRAPH THREE)
3223: ㈣ (PARENTHESIZED IDEOGRAPH FOUR)
3224: ㈤ (PARENTHESIZED IDEOGRAPH FIVE)
3225: ㈥ (PARENTHESIZED IDEOGRAPH SIX)
3226: ㈦ (PARENTHESIZED IDEOGRAPH SEVEN)
3227: ㈧ (PARENTHESIZED IDEOGRAPH EIGHT)
3228: ㈨ (PARENTHESIZED IDEOGRAPH NINE)
3229: ㈩ (PARENTHESIZED IDEOGRAPH TEN)
3251: ㉑ (CIRCLED NUMBER TWENTY ONE)
3252: ㉒ (CIRCLED NUMBER TWENTY TWO)
3253: ㉓ (CIRCLED NUMBER TWENTY THREE)
3254: ㉔ (CIRCLED NUMBER TWENTY FOUR)
3255: ㉕ (CIRCLED NUMBER TWENTY FIVE)
3256: ㉖ (CIRCLED NUMBER TWENTY SIX)
3257: ㉗ (CIRCLED NUMBER TWENTY SEVEN)
3258: ㉘ (CIRCLED NUMBER TWENTY EIGHT)
3259: ㉙ (CIRCLED NUMBER TWENTY NINE)
325a: ㉚ (CIRCLED NUMBER THIRTY)
325b: ㉛ (CIRCLED NUMBER THIRTY ONE)
325c: ㉜ (CIRCLED NUMBER THIRTY TWO)
325d: ㉝ (CIRCLED NUMBER THIRTY THREE)
325e: ㉞ (CIRCLED NUMBER THIRTY FOUR)
325f: ㉟ (CIRCLED NUMBER THIRTY FIVE)
3280: ㊀ (CIRCLED IDEOGRAPH ONE)
3281: ㊁ (CIRCLED IDEOGRAPH TWO)
3282: ㊂ (CIRCLED IDEOGRAPH THREE)
3283: ㊃ (CIRCLED IDEOGRAPH FOUR)
3284: ㊄ (CIRCLED IDEOGRAPH FIVE)
3285: ㊅ (CIRCLED IDEOGRAPH SIX)
3286: ㊆ (CIRCLED IDEOGRAPH SEVEN)
3287: ㊇ (CIRCLED IDEOGRAPH EIGHT)
3288: ㊈ (CIRCLED IDEOGRAPH NINE)
3289: ㊉ (CIRCLED IDEOGRAPH TEN)
32b1: ㊱ (CIRCLED NUMBER THIRTY SIX)
32b2: ㊲ (CIRCLED NUMBER THIRTY SEVEN)
32b3: ㊳ (CIRCLED NUMBER THIRTY EIGHT)
32b4: ㊴ (CIRCLED NUMBER THIRTY NINE)
32b5: ㊵ (CIRCLED NUMBER FORTY)
32b6: ㊶ (CIRCLED NUMBER FORTY ONE)
32b7: ㊷ (CIRCLED NUMBER FORTY TWO)
32b8: ㊸ (CIRCLED NUMBER FORTY THREE)
32b9: ㊹ (CIRCLED NUMBER FORTY FOUR)
32ba: ㊺ (CIRCLED NUMBER FORTY FIVE)
32bb: ㊻ (CIRCLED NUMBER FORTY SIX)
32bc: ㊼ (CIRCLED NUMBER FORTY SEVEN)
32bd: ㊽ (CIRCLED NUMBER FORTY EIGHT)
32be: ㊾ (CIRCLED NUMBER FORTY NINE)
32bf: ㊿ (CIRCLED NUMBER FIFTY)
3405: 㐅 (CJK UNIFIED IDEOGRAPH-3405)
3483: 㒃 (CJK UNIFIED IDEOGRAPH-3483)
382a: 㠪 (CJK UNIFIED IDEOGRAPH-382A)
3b4d: 㭍 (CJK UNIFIED IDEOGRAPH-3B4D)
4e00: 一 (CJK UNIFIED IDEOGRAPH-4E00)
4e03: 七 (CJK UNIFIED IDEOGRAPH-4E03)
4e07: 万 (CJK UNIFIED IDEOGRAPH-4E07)
4e09: 三 (CJK UNIFIED IDEOGRAPH-4E09)
4e5d: 九 (CJK UNIFIED IDEOGRAPH-4E5D)
4e8c: 二 (CJK UNIFIED IDEOGRAPH-4E8C)
4e94: 五 (CJK UNIFIED IDEOGRAPH-4E94)
4e96: 亖 (CJK UNIFIED IDEOGRAPH-4E96)
4ebf: 亿 (CJK UNIFIED IDEOGRAPH-4EBF)
4ec0: 什 (CJK UNIFIED IDEOGRAPH-4EC0)
4edf: 仟 (CJK UNIFIED IDEOGRAPH-4EDF)
4ee8: 仨 (CJK UNIFIED IDEOGRAPH-4EE8)
4f0d: 伍 (CJK UNIFIED IDEOGRAPH-4F0D)
4f70: 佰 (CJK UNIFIED IDEOGRAPH-4F70)
5104: 億 (CJK UNIFIED IDEOGRAPH-5104)
5146: 兆 (CJK UNIFIED IDEOGRAPH-5146)
5169: 兩 (CJK UNIFIED IDEOGRAPH-5169)
516b: 八 (CJK UNIFIED IDEOGRAPH-516B)
516d: 六 (CJK UNIFIED IDEOGRAPH-516D)
5341: 十 (CJK UNIFIED IDEOGRAPH-5341)
5343: 千 (CJK UNIFIED IDEOGRAPH-5343)
5344: 卄 (CJK UNIFIED IDEOGRAPH-5344)
5345: 卅 (CJK UNIFIED IDEOGRAPH-5345)
534c: 卌 (CJK UNIFIED IDEOGRAPH-534C)
53c1: 叁 (CJK UNIFIED IDEOGRAPH-53C1)
53c2: 参 (CJK UNIFIED IDEOGRAPH-53C2)
53c3: 參 (CJK UNIFIED IDEOGRAPH-53C3)
53c4: 叄 (CJK UNIFIED IDEOGRAPH-53C4)
56db: 四 (CJK UNIFIED IDEOGRAPH-56DB)
58f1: 壱 (CJK UNIFIED IDEOGRAPH-58F1)
58f9: 壹 (CJK UNIFIED IDEOGRAPH-58F9)
5e7a: 幺 (CJK UNIFIED IDEOGRAPH-5E7A)
5efe: 廾 (CJK UNIFIED IDEOGRAPH-5EFE)
5eff: 廿 (CJK UNIFIED IDEOGRAPH-5EFF)
5f0c: 弌 (CJK UNIFIED IDEOGRAPH-5F0C)
5f0d: 弍 (CJK UNIFIED IDEOGRAPH-5F0D)
5f0e: 弎 (CJK UNIFIED IDEOGRAPH-5F0E)
5f10: 弐 (CJK UNIFIED IDEOGRAPH-5F10)
62fe: 拾 (CJK UNIFIED IDEOGRAPH-62FE)
634c: 捌 (CJK UNIFIED IDEOGRAPH-634C)
67d2: 柒 (CJK UNIFIED IDEOGRAPH-67D2)
6f06: 漆 (CJK UNIFIED IDEOGRAPH-6F06)
7396: 玖 (CJK UNIFIED IDEOGRAPH-7396)
767e: 百 (CJK UNIFIED IDEOGRAPH-767E)
8086: 肆 (CJK UNIFIED IDEOGRAPH-8086)
842c: 萬 (CJK UNIFIED IDEOGRAPH-842C)
8cae: 貮 (CJK UNIFIED IDEOGRAPH-8CAE)
8cb3: 貳 (CJK UNIFIED IDEOGRAPH-8CB3)
8d30: 贰 (CJK UNIFIED IDEOGRAPH-8D30)
9621: 阡 (CJK UNIFIED IDEOGRAPH-9621)
9646: 陆 (CJK UNIFIED IDEOGRAPH-9646)
964c: 陌 (CJK UNIFIED IDEOGRAPH-964C)
9678: 陸 (CJK UNIFIED IDEOGRAPH-9678)
96f6: 零 (CJK UNIFIED IDEOGRAPH-96F6)
a6e6: ꛦ (BAMUM LETTER MO)
a6e7: ꛧ (BAMUM LETTER MBAA)
a6e8: ꛨ (BAMUM LETTER TET)
a6e9: ꛩ (BAMUM LETTER KPA)
a6ea: ꛪ (BAMUM LETTER TEN)
a6eb: ꛫ (BAMUM LETTER NTUU)
a6ec: ꛬ (BAMUM LETTER SAMBA)
a6ed: ꛭ (BAMUM LETTER FAAMAE)
a6ee: ꛮ (BAMUM LETTER KOVUU)
a6ef: ꛯ (BAMUM LETTER KOGHOM)
a830: ꠰ (NORTH INDIC FRACTION ONE QUARTER)
a831: ꠱ (NORTH INDIC FRACTION ONE HALF)
a832: ꠲ (NORTH INDIC FRACTION THREE QUARTERS)
a833: ꠳ (NORTH INDIC FRACTION ONE SIXTEENTH)
a834: ꠴ (NORTH INDIC FRACTION ONE EIGHTH)
a835: ꠵ (NORTH INDIC FRACTION THREE SIXTEENTHS)
f96b: 參 (CJK COMPATIBILITY IDEOGRAPH-F96B)
f973: 拾 (CJK COMPATIBILITY IDEOGRAPH-F973)
f978: 兩 (CJK COMPATIBILITY IDEOGRAPH-F978)
f9b2: 零 (CJK COMPATIBILITY IDEOGRAPH-F9B2)
f9d1: 六 (CJK COMPATIBILITY IDEOGRAPH-F9D1)
f9d3: 陸 (CJK COMPATIBILITY IDEOGRAPH-F9D3)
f9fd: 什 (CJK COMPATIBILITY IDEOGRAPH-F9FD)
However, the distinction between Numeric_Type=Digit and Numeric_Type=Numeric is no longer considered useful, and Numeric_Type=Digit is no longer used for new characters since Unicode 6.3.0. Quoting Unicode Standard Annex #44:
Starting with Unicode 6.3.0, no newly encoded numeric characters will be given Numeric_Type=Digit, nor will existing characters with Numeric_Type=Numeric be changed to Numeric_Type=Digit. The distinction between those two types is not considered useful.
Thus, 🄌 (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO) and other characters that once would have been assigned Numeric_Type=Digit have instead been assigned Numeric_Type=Numeric, and they report False for isdigit:
>>> '🄌'.isdigit()
False
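The three numeric types described above can be compared directly in Python 3, where these methods live on str. This is a small sketch: the sample characters were picked for illustration, and str.isdecimal() is included as an addition because it tests for Numeric_Type=Decimal only.

```python
import unicodedata

# One sample per category; the labels are the Unicode character names.
samples = {
    "5": "DIGIT FIVE (Numeric_Type=Decimal)",
    "\u00b2": "SUPERSCRIPT TWO (Numeric_Type=Digit)",
    "\u00bd": "VULGAR FRACTION ONE HALF (Numeric_Type=Numeric)",
    "\u2167": "ROMAN NUMERAL EIGHT (Numeric_Type=Numeric)",
}

for ch, label in samples.items():
    print(
        f"U+{ord(ch):04X} {label}: "
        f"isdecimal={ch.isdecimal()} isdigit={ch.isdigit()} "
        f"isnumeric={ch.isnumeric()} value={unicodedata.numeric(ch)}"
    )
```

Each method accepts a superset of the previous one: isdecimal() is true only for the first sample, isdigit() for the first two, and isnumeric() for all four.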
unicode.isnumeric()
Return True if there are only numeric characters in S, False otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH.
str.isdigit()
Return true if all characters in the string are digits and there is at least one character, false otherwise.
For 8-bit strings, this method is locale-dependent.
From the manual
The method isnumeric() checks whether the string consists of only
numeric characters. This method is present only on unicode objects.
Digits include decimal characters and digits that need special
handling, such as the compatibility superscript digits. Formally, a
digit is a character that has the property value Numeric_Type=Digit or
Numeric_Type=Decimal.
The code snippet provided by @Martijn Pieters doesn't work on Python 3 (3.7 was the latest version at the time of writing this answer), because it relies on Python 2's unichr() and print statement.
Here is the updated code snippet.
import unicodedata
count = 0
for codepoint in range(2**16):
    ch = chr(codepoint)
    if ch.isnumeric() and not ch.isdigit():
        print(u'{:04x}: {} ({})'.format(codepoint, ch, unicodedata.name(ch, 'UNNAMED')))
        count = count + 1
print(f'Total Number of Numeric and Non-Digit Unicode Characters = {count}')
Output:
...
f9d1: 六 (CJK COMPATIBILITY IDEOGRAPH-F9D1)
f9d3: 陸 (CJK COMPATIBILITY IDEOGRAPH-F9D3)
f9fd: 什 (CJK COMPATIBILITY IDEOGRAPH-F9FD)
Total Number of Numeric and Non-Digit Unicode Characters = 335
NOTE: I am using f-strings for formatting. They were introduced in Python 3.6 by PEP 498 (Literal String Interpolation); see the PEP or the official documentation for details.
From Python's built-in docs:
>>> unicode.isdigit.__doc__
'S.isdigit() -> bool\n\nReturn True if all characters in S are digits\nand there is at least one character in S, False otherwise.'
>>> unicode.isnumeric.__doc__
'S.isnumeric() -> bool\n\nReturn True if there are only numeric characters in S,\nFalse otherwise.'
