Remove special characters and replace spaces with "-"

Remove special characters and replace spaces with "-" - python

I'm trying to make a directory that is okay to appear in a URL. I want to ensure that it doesn't contain any special characters and replace any spaces with hyphens.
from os.path import join as osjoin
def image_dir(self, filename):
categorydir = ''.join(e for e in str(self.title.lower()) if e.isalnum())
return "category/" + osjoin(categorydir, filename)
It's removing special characters however I'd like use .replace(" ", "-") to swap out spaces with hyphens

The best way is probably to use the slugify function which takes any string as input and returns an URL-compatible one, yours is not an URL but it will do the trick, eg:
>>> from django.utils.text import slugify
>>> slugify(' Joel is a slug ')
'joel-is-a-slug'

Why don't you use the quote function?
import urllib.parse
urlllib.parse.quote(filename.replace(" ", "-"), safe="")

You can create this functions and call remove_special_chars(s) to do it:
def __is_ascii__(c):
return (ord(c) < 128)
def remove_special_chars(s):
output = ''
for c in s:
if (c.isalpha() and __is_ascii__(c)) or c == ' ':
output = output + c
else:
if c in string.punctuation:
output = output + ' '
output = re.sub(' +', ' ', output)
output = output.replace(' ', '-')
return output
It will remove each non-ASCII character and each element in string.punctuation
EDIT:
This function will substitute each element in string.punctuation with a '-', if you want you can substitute ' ' with '' in the else statement to merge the two parts of the string before and after the punctuation element.

Related

Python challenge to convert a string to camelCase

I have a python challenge that if given a string with '_' or '-' in between each word such as the_big_red_apple or the-big-red-apple to convert it to camel case. Also if the first word is uppercase keep it as uppercase. This is my code. Im not allowed to use the re library in the challenge however but I didn't know how else to do it.
from re import sub
def to_camel_case(text):
if text[0].isupper():
text = sub(r"(_|-)+"," ", text).title().replace(" ", "")
else:
text = sub(r"(_|-)+"," ", text).title().replace(" ", "")
text = text[0].lower() + text[1:]
return print(text)

Word delimiters can be - dash or _ underscore.
Let's simplify, making them all underscores:
text = text.replace('-', '_')
Now we can break out words:
words = text.split('_')
With that in hand it's simple to put them back together:
text = ''.join(map(str.capitalize, words))
or more verbosely, with a generator expression,
assign ''.join(word.capitalize() for word in words).
I leave "finesse the 1st character"
as an exercise to the reader.
If you RTFM you'll find it contains a wealth of knowledge.
https://docs.python.org/3/library/re.html#raw-string-notation
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s
The effect of + is turn both
db_rows_read and
db__rows_read
into DbRowsRead.
Also,
Raw string notation (r"text") keeps regular expressions sane.
The regex in your question doesn't exactly
need a raw string, as it has no crazy
punctuation like \ backwhacks.
But it's a very good habit to always put
a regex in an r-string, Just In Case.
You never know when code maintenance
will tack on additional elements,
and who wants a subtle regex bug on their hands?

You can try it like this :
def to_camel_case(text):
s = text.replace("-", " ").replace("_", " ")
s = s.split()
if len(text) == 0:
return text
return s[0] + ''.join(i.capitalize() for i in s[1:])
print(to_camel_case('momo_es-es'))
the output of print(to_camel_case('momo_es-es')) is momoEsEs

r"..." refers to Raw String in Python which simply means treating backlash \ as literal instead of escape character.
And (_|-)[+] is a Regular Expression that match the string containing one or more - or _ characters.
(_|-) means matching the string that contains - or _.
+ means matching the above character (- or _) than occur one or more times in the string.
In case you cannot use re library for this solution:
def to_camel_case(text):
# Since delimiters can have 2 possible answers, let's simplify it to one.
# In this case, I replace all `_` characters with `-`, to make sure we have only one delimiter.
text = text.replace("_", "-") # the_big-red_apple => the-big-red-apple
# Next, we should split our text into words in order for us to iterate through and modify it later.
words = text.split("-") # the-big-red-apple => ["the", "big", "red", "apple"]
# Now, for each word (except the first word) we have to turn its first character to uppercase.
for i in range(1, len(words)):
# `i`start from 1, which means the first word IS NOT INCLUDED in this loop.
word = words[i]
# word[1:] means the rest of the characters except the first one
# (e.g. w = "apple" => w[1:] = "pple")
words[i] = word[0].upper() + word[1:].lower()
# you can also use Python built-in method for this:
# words[i] = word.capitalize()
# After this loop, ["the", "big", "red", "apple"] => ["the", "Big", "Red", "Apple"]
# Finally, we put the words back together and return it
# ["the", "Big", "Red", "Apple"] => theBigRedApple
return "".join(words)
print(to_camel_case("the_big-red_apple"))

Try this:
First, replace all the delimiters into a single one, i.e. str.replace('_', '-')
Split the string on the str.split('-') standardized delimiter
Capitalize each string in list, i.e. str.capitilize()
Join the capitalize string with str.join
>>> s = "the_big_red_apple"
>>> s.replace('_', '-').split('-')
['the', 'big', 'red', 'apple']
>>> ''.join(map(str.capitalize, s.replace('_', '-').split('-')))
'TheBigRedApple'
>> ''.join(word.capitalize() for word in s.replace('_', '-').split('-'))
'TheBigRedApple'
If you need to lowercase the first char, then:
>>> camel_mile = lambda x: x[0].lower() + x[1:]
>>> s = 'TheBigRedApple'
>>> camel_mile(s)
'theBigRedApple'
Alternative,
First replace all delimiters to space str.replace('_', ' ')
Titlecase the string str.title()
Remove space from string, i.e. str.replace(' ', '')
>>> s = "the_big_red_apple"
>>> s.replace('_', ' ').title().replace(' ', '')
'TheBigRedApple'
Another alternative,
Iterate through the characters and then keep a pointer/note on previous character, i.e. for prev, curr in zip(s, s[1:])
check if the previous character is one of your delimiter, if so, uppercase the current character, i.e. curr.upper() if prev in ['-', '_'] else curr
skip whitepace characters, i.e. if curr != " "
Then add the first character in lowercase, [s[0].lower()]
>>> chars = [s[0].lower()] + [curr.upper() if prev in ['-', '_'] else curr for prev, curr in zip(s, s[1:]) if curr != " "]
>>> "".join(chars)
'theBigRedApple'
Yet another alternative,
Replace/Normalize all delimiters into a single one, s.replace('-', '_')
Convert it into a list of chars, list(s.replace('-', '_'))
While there is still '_' in the list of chars, keep
find the position of the next '_'
replacing the character after '_' with its uppercase
replacing the '_' with ''
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
... where_underscore = s_list.index('_')
... s_list[where_underscore+1] = s_list[where_underscore+1].upper()
... s_list[where_underscore] = ""
...
>>> "".join(s_list)
'theBigRedApple'
or
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
... where_underscore = s_list.index('_')
... s_list[where_underscore:where_underscore+2] = ["", s_list[where_underscore+1].upper()]
...
>>> "".join(s_list)
'theBigRedApple'
Note: Why do we need to convert the string to list of chars? Cos strings are immutable, 'str' object does not support item assignment
BTW, the regex solution can make use of some group catching, e.g.
>>> import re
>>> s = "the_big_red_apple"
>>> upper_regex_group = lambda x: x.group(1).upper()
>>> re.sub("[_|-](\w)", upper_regex_group, s)
'theBigRedApple'
>>> re.sub("[_|-](\w)", lambda x: x.group(1).upper(), s)
'theBigRedApple'

Insert string before first occurence of character

So basically I have this string __int64 __fastcall(IOService *__hidden this);, and I need to insert a word in between __fastcall (this could be anything) and (IOService... such as __int64 __fastcall LmaoThisWorks(IOService *__hidden this);.
I've thought about splitting the string but this seems a bit overkill. I'm hoping there's a simpler and shorter way of doing this:
type_declaration_fun = GetType(fun_addr) # Sample: '__int64 __fastcall(IOService *__hidden this)'
if type_declaration_fun:
print(type_declaration_fun)
type_declaration_fun = type_declaration_fun.split(' ')
first_bit = ''
others = ''
funky_list = type_declaration_fun[1].split('(')
for x in range(0, (len(funky_list))):
if x == 0:
first_bit = funky_list[0]
else:
others = others + funky_list[x]
type_declaration_fun = type_declaration_fun[0] + ' ' + funky_list[0] + ' ' + final_addr_name + others
type_declaration_fun = type_declaration_fun + ";"
print(type_declaration_fun)
The code is not only crap, but it doesn't quite work. Here's a sample output:
void *__fastcall(void *objToFree)
void *__fastcall IOFree_stub_IONetworkingFamilyvoid;
How could I make this work and cleaner?
Notice that there could be nested parentheses and other weird stuff, so you need to make sure that the name is added just before the first parenthesis.

You can use the method replace():
s = 'ABCDEF'
ins = '$'
before = 'DE'
new_s = s.replace(before, ins + before, 1)
print(new_s)
# ABC$DEF

Once you find the index of the character you need to insert before, you can use splicing to create your new string.
string = 'abcdefg'
string_to_insert = '123'
insert_before_char = 'c'
for i in range(len(string)):
if string[i] == insert_before_char:
string = string[:i] + string_to_insert + string[i:]
break

What about this:
s = "__int64__fastcall(IOService *__hidden this);"
t = s.split("__fastcall",1)[0]+"anystring"+s.split("__fastcall",1)[1]
I get:
__int64__fastcallanystring(IOService *__hidden this);
I hope this is what you want. If not, please comment.

Use regex.
In [1]: import re
pattern = r'(?=\()'
string = '__int64 __fastcall(IOService *__hidden this);'
re.sub(pattern, 'pizza', string)
Out[1]: '__int64 __fastcallpizza(IOService *__hidden this);'
The pattern is a positive lookahead to match the first occurrence of (.

x='high speed'
z='new text'
y = x.index('speed')
x =x[:y] + z +x[y:]
print(x)
>>> high new textspeed
this a quick example, please be aware that y inclusuve after the new string.
be Aware that you are changing the original string, or instead just declare a new string.

Removing And Re-Inserting Spaces

What is the most efficient way to remove spaces from a text, and then after the neccessary function has been performed, re-insert the previously removed spacing?
Take this example below, here is a program for encoding a simple railfence cipher:
from string import ascii_lowercase
string = "Hello World Today"
string = string.replace(" ", "").lower()
print(string[::2] + string[1::2])
This outputs the following:
hlooltdyelwrdoa
This is because it must remove the spacing prior to encoding the text. However, if I now want to re-insert the spacing to make it:
hlool tdyel wrdoa
What is the most efficient way of doing this?

As mentioned by one of the other commenters, you need to record where the spaces came from then add them back in
from string import ascii_lowercase
string = "Hello World Today"
# Get list of spaces
spaces = [i for i,x in enumerate(string) if x == ' ']
string = string.replace(" ", "").lower()
# Set string with ciphered text
ciphered = (string[::2] + string[1::2])
# Reinsert spaces
for space in spaces:
ciphered = ciphered[:space] + ' ' + ciphered[space:]
print(ciphered)

You could use str.split to help you out. When you split on spaces, the lengths of the remaining segments will tell you where to split the processed string:
broken = string.split(' ')
sizes = list(map(len, broken))
You'll need the cumulative sum of the sizes:
from itertools import accumulate, chain
cs = accumulate(sizes)
Now you can reinstate the spaces:
processed = ''.join(broken).lower()
processed = processed[::2] + processed[1::2]
chunks = [processed[index:size] for index, size in zip(chain([0], cs), sizes)]
result = ' '.join(chunks)
This solution is not especially straightforward or efficient, but it does avoid explicit loops.

Using list and join operation,
random_string = "Hello World Today"
space_position = [pos for pos, char in enumerate(random_string) if char == ' ']
random_string = random_string.replace(" ", "").lower()
random_string = list(random_string[::2] + random_string[1::2])
for index in space_position:
random_string.insert(index, ' ')
random_string = ''.join(random_string)
print(random_string)

I think this might Help
string = "Hello World Today"
nonSpaceyString = string.replace(" ", "").lower()
randomString = nonSpaceyString[::2] + nonSpaceyString[1::2]
spaceSet = [i for i, x in enumerate(string) if x == " "]
for index in spaceSet:
randomString = randomString[:index] + " " + randomString[index:]
print(randomString)

string = "Hello World Today"
# getting index of ' '
index = [i for i in range(len(string)) if string[i]==' ']
# storing the non ' ' characters
data = [i for i in string.lower() if i!=' ']
# applying cipher code as mention in OP STATEMENT
result = data[::2]+data[1::2]
# inserting back the spaces in there position as they had in original string
for i in index:
result.insert(i, ' ')
# creating a string solution
solution = ''.join(result)
print(solution)
# output hlool tdyel wrdoa

You can make a new string with this small yet simple (kind of) code:
Note this doesn't use any libraries, which might make this slower, but less confusing.
def weird_string(string): # get input value
spaceless = ''.join([c for c in string if c != ' ']) # get spaceless version
skipped = spaceless[::2] + spaceless[1::2] # get new unique 'code'
result = list(skipped) # get list of one letter strings
for i in range(len(string)): # loop over strings
if string[i] == ' ': # if a space 'was' here
result.insert(i, ' ') # add the space back
# end for
s = ''.join(result) # join the results back
return s # return the result

How to remove or strip off white spaces without using strip() function?

Write a function that accepts an input string consisting of alphabetic
characters and removes all the leading whitespace of the string and
returns it without using .strip(). For example if:
input_string = " Hello "
then your function should return a string such as:
output_string = "Hello "
The below is my program for removing white spaces without using strip:
def Leading_White_Space (input_str):
length = len(input_str)
i = 0
while (length):
if(input_str[i] == " "):
input_str.remove()
i =+ 1
length -= 1
#Main Program
input_str = " Hello "
result = Leading_White_Space (input_str)
print (result)
I chose the remove function as it would be easy to get rid off the white spaces before the string 'Hello'. Also the program tells to just eliminate the white spaces before the actual string. By my logic I suppose it not only eliminates the leading but trailing white spaces too. Any help would be appreciated.

You can loop over the characters of the string and stop when you reach a non-space one. Here is one solution :
def Leading_White_Space(input_str):
for i, c in enumerate(input_str):
if c != ' ':
return input_str[i:]
Edit :
#PM 2Ring mentionned a good point. If you want to handle all types of types of whitespaces (e.g \t,\n,\r), you need to use isspace(), so a correct solution could be :
def Leading_White_Space(input_str):
for i, c in enumerate(input_str):
if not c.isspace():
return input_str[i:]

Here's another way to strip the leading whitespace, that actually strips all leading whitespace, not just the ' ' space char. There's no need to bother tracking the index of the characters in the string, we just need a flag to let us know when to stop checking for whitespace.
def my_lstrip(input_str):
leading = True
for ch in input_str:
if leading:
# All the chars read so far have been whitespace
if not ch.isspace():
# The leading whitespace is finished
leading = False
# Start saving chars
result = ch
else:
# We're past the whitespace, copy everything
result += ch
return result
# test
input_str = " \n \t Hello "
result = my_lstrip(input_str)
print(repr(result))
output
'Hello '
There are various other ways to do this. Of course, in a real program you'd simply use the string .lstrip method, but here are a couple of cute ways to do it using an iterator:
def my_lstrip(input_str):
it = iter(input_str)
for ch in it:
if not ch.isspace():
break
return ch + ''.join(it)
and
def my_lstrip(input_str):
it = iter(input_str)
ch = next(it)
while ch.isspace():
ch = next(it)
return ch + ''.join(it)

Use re.sub
>>> input_string = " Hello "
>>> re.sub(r'^\s+', '', input_string)
'Hello '
or
>>> def remove_space(s):
ind = 0
for i,j in enumerate(s):
if j != ' ':
ind = i
break
return s[ind:]
>>> remove_space(input_string)
'Hello '
>>>

Just to be thorough and without using other modules, we can also specify which whitespace to remove (leading, trailing, both or all), including tab and new line characters. The code I used (which is, for obvious reasons, less compact than other answers) is as follows and makes use of slicing:
def no_ws(string,which='left'):
"""
Which takes the value of 'left'/'right'/'both'/'all' to remove relevant
whitespace.
"""
remove_chars = (' ','\n','\t')
first_char = 0; last_char = 0
if which in ['left','both']:
for idx,letter in enumerate(string):
if not first_char and letter not in remove_chars:
first_char = idx
break
if which == 'left':
return string[first_char:]
if which in ['right','both']:
for idx,letter in enumerate(string[::-1]):
if not last_char and letter not in remove_chars:
last_char = -(idx + 1)
break
return string[first_char:last_char+1]
if which == 'all':
return ''.join([s for s in string if s not in remove_chars])

you can use itertools.dropwhile to remove all particualar characters from the start of you string like this
import itertools
def my_lstrip(input_str,remove=" \n\t"):
return "".join( itertools.dropwhile(lambda x:x in remove,input_str))
to make it more flexible, I add an additional argument called remove, they represent the characters to remove from the string, with a default value of " \n\t", then with dropwhile it will ignore all characters that are in remove, to check this I use a lambda function (that is a practical form of write short anonymous functions)
here a few tests
>>> my_lstrip(" \n \t Hello ")
'Hello '
>>> my_lstrip(" Hello ")
'Hello '
>>> my_lstrip(" \n \t Hello ")
'Hello '
>>> my_lstrip("--- Hello ","-")
' Hello '
>>> my_lstrip("--- Hello ","- ")
'Hello '
>>> my_lstrip("- - - Hello ","- ")
'Hello '
>>>
the previous function is equivalent to
def my_lstrip(input_str,remove=" \n\t"):
i=0
for i,x in enumerate(input_str):
if x not in remove:
break
return input_str[i:]

Python - How to clear spaces from a text

In Python, I have a lot of strings, containing spaces.
I would like to clear all spaces from the text, except if it is in quotation marks.
Example input:
This is "an example text" containing spaces.
And I want to get:
Thisis"an example text"containingspaces.
line.split() is not good, I think, because it clears all of spaces from the text.
What do you recommend?

For the simple case that only " are used as quotes:
>>> import re
>>> s = 'This is "an example text" containing spaces.'
>>> re.sub(r' (?=(?:[^"]*"[^"]*")*[^"]*$)', "", s)
'Thisis"an example text"containingspaces.'
Explanation:
[ ] # Match a space
(?= # only if an even number of spaces follows --> lookahead
(?: # This is true when the following can be matched:
[^"]*" # Any number of non-quote characters, then a quote, then
[^"]*" # the same thing again to get an even number of quotes.
)* # Repeat zero or more times.
[^"]* # Match any remaining non-quote characters
$ # and then the end of the string.
) # End of lookahead.

There is probably a more elegant solution than this, but:
>>> test = "This is \"an example text\" containing spaces."
>>> '"'.join([x if i % 2 else "".join(x.split())
for i, x in enumerate(test.split('"'))])
'Thisis"an example text"containingspaces.'
We split the text on quotes, then iterate through them in a list comprehension. We remove the spaces by splitting and rejoining if the index is odd (not inside quotes), and don't if it is even (inside quotes). We then rejoin the whole thing with quotes.

Using re.findall is probably the more easily understood/flexible method:
>>> s = 'This is "an example text" containing spaces.'
>>> ''.join(re.findall(r'(?:".*?")|(?:\S+)', s))
'Thisis"an example text"containingspaces.'
You could (ab)use the csv.reader:
>>> import csv
>>> ''.join(next(csv.reader([s.replace('"', '"""')], delimiter=' ')))
'Thisis"an example text"containingspaces.'
Or using re.split:
>>> ''.join(filter(None, re.split(r'(?:\s*(".*?")\s*)|[ ]', s)))
'Thisis"an example text"containingspaces.'

Use regular expressions!
import cStringIO, re
result = cStringIO.StringIO()
regex = re.compile('("[^"]*")')
text = 'This is "an example text" containing spaces.'
for part in regex.split(text):
if part and part[0] == '"':
result.write(part)
else:
result.write(part.replace(" ", ""))
return result.getvalue()

You can do this with csv as well:
import csv
out=[]
for e in csv.reader('This is "an example text" containing spaces. '):
e=''.join(e)
if e==' ': continue
if ' ' in e: out.extend('"'+e+'"')
else: out.extend(e)
print ''.join(out)
Prints Thisis"an example text"containingspaces.

'"'.join(v if i%2 else v.replace(' ', '') for i, v in enumerate(line.split('"')))

quotation_mark = '"'
space = " "
example = 'foo choo boo "blaee blahhh" didneid ei did '
formated_example = ''
if example[0] == quotation_mark:
inside_quotes = True
else:
inside_quotes = False
for character in example:
if inside_quotes != True:
formated_example += character
else:
if character != space:
formated_example += character
if character == quotation_mark:
if inside_quotes == True:
inside_quotes = False
else:
inside_quotes = True
print formated_example

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Remove special characters and replace spaces with "-" - python

The best way is probably to use the slugify function which takes any string as input and returns an URL-compatible one, yours is not an URL but it will do the trick, eg: >>> from django.utils.text import slugify >>> slugify(' Joel is a slug ') 'joel-is-a-slug'

Why don't you use the quote function? import urllib.parse urlllib.parse.quote(filename.replace(" ", "-"), safe="")

Related

Python challenge to convert a string to camelCase

Insert string before first occurence of character

Removing And Re-Inserting Spaces

How to remove or strip off white spaces without using strip() function?

Python - How to clear spaces from a text

Categories

Resources