Hello, my fellow coders!
I'm an absolute beginner to Python and to coding in general. Right now I'm writing a program that converts regular Arabic numerals to Roman numerals. For numbers larger than 3,999, the Romans usually wrote a line over a letter to make it a thousand times larger. For example, IV with a line over it represented 4,000. How is this possible in Python? I have understood that you can create an "overscore" by writing "\u203E". How can I make it appear over a letter instead of beside it?
Regards
You need to use the combining character U+0304 instead.
>>> print(u'a\u0304')
ā
U+0305 is probably a better choice (as viraptor suggests). You can also use the Unicode Roman numerals (U+2160 through U+217f) instead of regular uppercase Latin letters, although (at least in my terminal) they don't render as well with the overline.
>>> print(u'\u2163\u0305')
Ⅳ̅
>>> print(u'I\u0305V\u0305')
I̅V̅
(Or, as it renders for me: the overline is centered over, but does not completely cover, the single-character Roman numeral four.)
(Any pure text option will only be as good as the font and renderer used by the person running the code. Case in point, the I+V version does not even display consistently while I type this; sometimes the overbars are over the letters, sometimes they follow the letters.)
The combining overline is \u0305, and it works quite well with "IV". What you want is, for example, u'I\u0305V\u0305' (which gives I̅V̅).
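Putting the pieces together, here is a minimal sketch of the full conversion; the helper names and the 4,000 cut-off are assumptions, not something spelled out in the answers above:

# Sketch: convert an integer to Roman numerals, adding the combining
# overline U+0305 after each letter of the "thousands" part once the
# value reaches 4,000.
ROMAN = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
         (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
         (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]

def to_roman(n):
    out = []
    for value, letters in ROMAN:
        while n >= value:
            out.append(letters)
            n -= value
    return ''.join(out)

def to_roman_overlined(n):
    thousands, rest = divmod(n, 1000)
    if thousands < 4:  # up to 3,999 the plain notation is enough
        return to_roman(n)
    overlined = ''.join(c + '\u0305' for c in to_roman(thousands))
    return overlined + to_roman(rest)

print(to_roman_overlined(4000))   # I̅V̅
print(to_roman_overlined(12345))  # X̅I̅I̅CCCXLV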
I looked for something online but didn't find anything. The best workaround I can suggest would be the following:
def over(character):
    return "_\n" + character
Such as:
>>> print(over("M"))
_
M
>>>
I have the letter 'ᴇ' in a text, and when I check if 'ᴇ' == 'e' it returns False. How can I convert 'ᴇ' to 'E'?
I tried to encode it, but when I run:
>>> 'ᴇ'.encode("utf-16")
b'\xff\xfe\x07\x1d'
The problem is that what you're trying to do isn't "remove a font". You're trying to map one-or-more unicode characters into some other subset of characters. There doesn't seem to be an out-of-the-box way to do what you want, probably because there are so many different things you might "actually want".
All of the following, and many other distinct characters, are arguably "e"s: ᴇ, ꠄ, 𝔼, E, ⅇ, aͤ *, Ӭ, Ꭱ, e, Ⓔ, ℰ, é
Somehow you'll have to decide which of these you want to transform, and into what, and which you'd like to leave alone. Even assuming reasonable answers exist for every question that will come up, there are simply a lot of Unicode characters; don't actually try to cover all cases.
Depending on the scope of the transformation you have in mind, you may be able to use str.translate or something funky with codecs.register_error to perform an acceptable transformation on many possible inputs (see the sketch below).
*That's actually two characters, the "a" is just an "a".
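To make the str.translate route concrete, here is a minimal sketch; the particular set of look-alikes mapped below is only an assumption about what you want to transform:

# Map a hand-picked set of "e"-like characters to a plain "E";
# anything not listed in the table is left untouched.
LOOKALIKES = str.maketrans({
    '\u1d07': 'E',  # ᴇ LATIN LETTER SMALL CAPITAL E
    '\u24ba': 'E',  # Ⓔ CIRCLED LATIN CAPITAL LETTER E
    '\u2130': 'E',  # ℰ SCRIPT CAPITAL E
})

print('\u1d07' == 'e')                     # False
print('h\u1d07llo'.translate(LOOKALIKES))  # hEllo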
This is a rather generic question, but I have a text file that I want to edit using a script.
What are some ways to format text, so that it will visually stand out but still be recognized by my script?
It works fine when I use text_to_be_replaced, but it is hard to find when you have a large file.
Tried searching, and it seems that the common ways are:
%text_to_be_replaced%
<text_to_be_replaced>
$(text_to_be_replaced)
But maybe there is a commonly used/widely accepted way to format text for visibility?
The language the script is written in is Python, if that matters... but I'm looking for a more-or-less generic solution which will work 90% of the time.
I'm not aware of any generic standard here, but if it's meant to be replaced, you can use the new string formatting method as follows:
string = 'some text {add_text_here} some more text'
Then to replace it when you need to:
value = 'formatted'
string = string.format(add_text_here=value)
Now print it out:
>>> string
'some text formatted some more text'
In fact, this is quite neat, as the addition of curly {braces} around the text that needs to be replaced also makes it stand out a little.
At first I thought that {{curly braces}} would be fine, but then I went with $ALLCAPS.
First of all, caps really stand out, while lowercase may be confused with the rest of the code.
And while it $REALLYSTANDSOUT, it shouldn't cause any problems, since it's just a "bookmark" in a text file, and will be replaced with the appropriate stuff determined by the script.
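For what it's worth, the standard library's string.Template uses exactly this kind of $ALLCAPS placeholder, so the marker can double as something the script substitutes directly. A small sketch (the placeholder names here are made up):

from string import Template

# $NAME and $ORDER_ID are the visible "bookmarks" in the text.
text = Template('Dear $NAME, your order $ORDER_ID has shipped.')
print(text.substitute(NAME='Ada', ORDER_ID='42'))
# Dear Ada, your order 42 has shipped.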
I am making a program in which every character is associated with a number. To make my life easier I decided to use the characters' ordinal values (a=97, b=98, ..., z=122). The first step in my code would be to get a number out of a character. For example:
char = input("Write a character: ")
print(ord(char.lower()))
Afterwards, though, I need to know the total number of alphanumeric characters that exist, and nowhere have I found an answer...
Your question is not very clear. Why would you need the total number of alphanumeric characters?
The thing is, that number depends on the encoding in question. If ASCII is what you mean, then:
>>> import string
>>> len(string.ascii_letters + string.digits)
62
Which is something you could do by counting manually.
And even this is not really the total count; as soon as you go beyond plain ASCII (into Latin-1, for example), there are letters from other languages as well.
If Unicode is in question, well, then you will have to search the specification to see how many of them there are. I do not even know how many alphabets are crammed into Unicode.
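If you really want a number, one rough way is to ask Python itself; note that the result depends on the Unicode version your Python build ships with:

import sys

# Count every code point that Python considers alphanumeric.
# This walks the whole code space, so it takes a moment.
total = sum(chr(cp).isalnum() for cp in range(sys.maxunicode + 1))
print(total)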
If it is a question of recognizing alpha-numeric characters in a string, then Python has a nice method to do so:
>>> "A".isalnum()
>>> "0".isalnum()
>>> "[".isalnum()
So please, express yourself more clearly.
I am writing a Python 3 program that has to handle text in various writing systems, including Hangul (Korean), and I have problems with the comparison of the same character in different positions.
For those unfamiliar with Hangul (not that I know much about it, either), this script has the almost unique feature of combining the letters of a syllable into square blocks. For example 'ㅎ' is pronounced [h] and 'ㅏ' is pronounced [a], the syllable 'hah' is written '핳' (in case your system can't render Hangul: the first h is displayed in the top-left corner, the a is in the top-right corner and the second h is under them in the middle). Unicode handles this by having two different entries for each consonant, depending on whether it appears in the onset or the coda of a syllable. For example, the previous syllable is encoded as '\u1112\u1161\u11c2'.
My code needs to compare two chars, considering them as equal if they only differ for their positions. This is not the case with simple comparison, even applying Unicode normalizations. Is there a way to do it?
You will need to use a tailored version of the Unicode Collation Algorithm (UCA) that assigns equal weights to identical syllables. The UCA technical report describes the general problem for sorting Hangul.
Luckily, the ICU library has a set of collation rules that does exactly this: ko-u-co-search – Korean (General-Purpose Search), which you can try out on their demo page. To use this in Python, you will either need to use a library like PyICU, or one that implements the UCA and supports the ICU rule file format (or lets you write your own rules).
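With PyICU installed, loading that collator might look roughly like this (a sketch only; whether two particular code points end up comparing equal depends on the rules and the ICU version in use):

import icu  # PyICU

# Load the Korean "search" collation and compare at primary strength,
# so that position-only differences are more likely to be ignored.
collator = icu.Collator.createInstance(icu.Locale('ko@collation=search'))
collator.setStrength(icu.Collator.PRIMARY)

initial_h = '\u1112'  # HANGUL CHOSEONG HIEUH
final_h = '\u11c2'    # HANGUL JONGSEONG HIEUH
print(collator.compare(initial_h, final_h) == 0)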
I'm the developer for Python jamo (the Hangul letters are called jamo). An easy way to do this would be to cast all jamo code points to their respective Hangul compatibility jamo (HCJ) code points. HCJ is the display form of jamo characters, so initial and final forms of consonants are the same code point.
For example:
>>> import jamo
>>> initial, vowel, final = jamo.j2hcj('\u1112\u1161\u11c2')
>>> initial == final
True
The way this is done internally is with a lookup table copied from the Unicode specifications.
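The idea behind that table can be illustrated with a toy version covering just the three jamo from the example (the real table covers them all):

# Both the initial (U+1112) and final (U+11C2) forms of HIEUH map to the
# same compatibility jamo U+314E, so the positional difference disappears.
JAMO_TO_HCJ = {
    '\u1112': '\u314e',  # choseong hieuh  -> ㅎ
    '\u11c2': '\u314e',  # jongseong hieuh -> ㅎ
    '\u1161': '\u314f',  # jungseong a     -> ㅏ
}

def to_hcj(text):
    return ''.join(JAMO_TO_HCJ.get(ch, ch) for ch in text)

print(to_hcj('\u1112\u1161\u11c2'))          # ㅎㅏㅎ
print(to_hcj('\u1112') == to_hcj('\u11c2'))  # True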
Just wondering...
I find using escape characters too distracting. I'd rather do something like this (console code):
>>> print ^'Let's begin and end with sets of unlikely 2 chars and bingo!'^
Let's begin and end with sets of unlikely 2 chars and bingo!
Note the ' inside the string, and how this syntax would have no issue with it, or with whatever else is inside, for basically all cases. Too bad Markdown can't properly colorize it (yet), so I decided to <pre> it.
Sure, the ^ could be any other character; I'm not sure what would look or work better. It sounds good enough to me, though.
Probably some other language already has a similar solution. And, just maybe, Python already has such a feature and I overlooked it. I hope that is the case.
But if it isn't, would it be too hard to, somehow, change Python's interpreter and be able to select an arbitrary (or even standardized) syntax for notating the strings?
I realize there are many ways to change statements and the whole syntax in general by using pre-compilers, but this is far more specific. And going any of those routes is what I call "too hard". I don't really need to do this, so, again, I'm just wondering.
Python has this: use """ or ''' as the delimiters.
print('''Let's begin and end with sets of unlikely 2 chars and bingo''')
How often do you have both three ' in a row and three " in a row in the same string?
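Even a string that contains both kinds of quote is usually fine inside a triple-quoted literal, for example:

>>> print("""Let's begin and end with "unlikely" delimiters and bingo!""")
Let's begin and end with "unlikely" delimiters and bingo!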