Reading input using format string in python


Suppose I want to read a sequence of inputs, where each input is a tuple of the form <string> , <integer>, <string>. Additionally, there can be an arbitrary amount of whitespace around the commas. An easy way to do this in C/C++ is to use scanf with the format string "%s , %d , %s". What is the equivalent function in Python?
If we knew that each input was on a separate line, this would be easy to parse in Python using split and strip. But an input can span several lines, which complicates things. Furthermore, we could even have weird inputs such as
<s11>, <i1>
, <s12> <s21>,
<i2> , <s22>
Where s11, i1, s12 is the first input and s21, i2, s22 is the second. And scanf would still be able to handle this. How does one do it in python? I also don't want to take the entire input at once and parse it, since I know that there will be other inputs that don't fit this format later on, and I don't want to do the parsing manually.

You should be able to first strip the whitespace, then split on commas, then handle the resulting strings and integers however you want. The regular expression \s+ matches any nonzero amount of whitespace characters:
import re
input_string = " hello \n \t , 10 , world \n "
stripped_string = re.sub(r'\s+', '', input_string)
substrings = stripped_string.split(',')
string1 = substrings[0]
integer1 = int(substrings[1])
string2 = substrings[2]
You'd just have to put those last three lines inside a loop if you need to handle multiple s,i,s tuples in a row.
EDIT: I realize now you want to interpret any whitespace as a comma. I'm not sure how wise that is, but a hacky way to do it is to replace all the commas with whitespace, split on whitespace, and call it a day:
input_string = " hello \n \t , 10 world \n "
stripped_string = re.sub(',', ' ', input_string)
substrings = stripped_string.split()
string1 = substrings[0]
integer1 = int(substrings[1])
string2 = substrings[2]
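The comma-as-whitespace idea above can also be applied lazily, so the whole input is never read at once. The following is only a sketch under that assumption (commas and whitespace are interchangeable separators); iter_tokens and read_record are hypothetical helper names, not standard library functions.

```python
import io
import re

def iter_tokens(stream):
    """Yield tokens lazily, reading one line of the stream at a time.

    Commas and whitespace are both treated as separators.
    """
    for line in stream:
        for token in re.split(r"[,\s]+", line):
            if token:  # re.split can produce empty strings at the edges
                yield token

def read_record(tokens):
    """Consume exactly three tokens and return them as (str, int, str)."""
    first = next(tokens)
    number = int(next(tokens))
    last = next(tokens)
    return first, number, last

# The "weird" layout from the question: two records spread over three lines.
stream = io.StringIO("s11, 1\n, s12 s21,\n2 , s22\n")
tokens = iter_tokens(stream)
records = [read_record(tokens), read_record(tokens)]
print(records)  # [('s11', 1, 's12'), ('s21', 2, 's22')]
```

Because the generator only pulls a new line when it runs out of tokens, lines after the last record stay unread in the stream and can be handed to a different parser.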

For a delimited format it's pretty easy with the csv module.
You can plug in any kind of file-like input to it.
You then handle stripping whitespace and type casting downstream. Here's a sample to get you going:
In [25]: import fileinput
In [26]: import csv
In [28]: reader = csv.reader(fileinput.input())
In [29]: for l in reader:
    ...:     print(l)
    ...:
stdin input -> a,b, c, d
print output -> ['a', 'b', ' c', ' d ']
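As a rough illustration of the "downstream" step, the sketch below strips each field and casts the middle one to int. It assumes one <string>, <integer>, <string> record per line, which is stricter than the original question allows; parse_rows is a hypothetical helper name.

```python
import csv
import io

def parse_rows(lines):
    """Yield (str, int, str) tuples from comma-delimited lines."""
    # skipinitialspace drops whitespace right after each delimiter;
    # trailing whitespace is removed with strip() below.
    reader = csv.reader(lines, skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # csv.reader returns [] for blank lines
        first, number, last = (field.strip() for field in row)
        yield first, int(number), last

sample = io.StringIO("hello , 10 , world\nfoo,  7,bar \n")
rows = list(parse_rows(sample))
print(rows)  # [('hello', 10, 'world'), ('foo', 7, 'bar')]
```

Any iterable of lines works in place of the StringIO object, including fileinput.input() as in the session above.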

Related

Python String manipulation to store result in a string after removing comma delimiter

I have a string which I want to separate based on the ',' delimiter and store the result in a new string. Currently the split function stores the result in an array. How do I store the result in a string without the ',' delimiter? Also, I want to manipulate the positions of the string content. Are there ways in Python to do this?
code
string_in = "a,bcd,e1,20"
print (string_in.split())
output
['a,bcd,e1,20']
I want the below result to be stored in a string without the comma delimiter and manipulate the position of the string content as below.
string_out = a bcd 20 e1

You want to pass your delimiter as an argument to split, like so:
>>> split = string_in.split(",")
>>> split
['a', 'bcd', 'e1', '20']
That will give you a list of elements that you can manipulate as you wish. When you want to put them back into a space delimited string, you use join like so:
>>> " ".join(split)
'a bcd e1 20'
Take a look at the python documentation for split and join:
https://docs.python.org/3.8/library/stdtypes.html#str.split
https://docs.python.org/3.8/library/stdtypes.html#str.join

You are reinventing the wheel. In your case an ordinary search/replace on the source string suffices:
string_in = "a,bcd,e1,20"
result = string_in.replace(',', ' ')
If you want split/join then
string_in = "a,bcd,e1,20"
result = ' '.join(string_in.split(','))
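Neither version above changes the order of the fields, which the question also asks for. A minimal sketch of that step, assuming the goal is simply to swap the last two fields as in the requested output (the indices are specific to this example):

```python
string_in = "a,bcd,e1,20"
parts = string_in.split(",")

# Swap the third and fourth fields; a list can be reordered freely
# before it is joined back into a string.
parts[2], parts[3] = parts[3], parts[2]

string_out = " ".join(parts)
print(string_out)  # a bcd 20 e1
```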

How to deal with a UTF-8 string: I want to replace the — with a space in my text and then use the split function to separate the words

I am trying to put all the words from a page in a string and then replace — (UTF-8) with a space.
I am able to do so, but when I call split() the words that are now separated by a space are still taken together. For example, about—it becomes about it after the replace, but when I then split, the item I get in the list is 'about it', i.e. it is not split at the space.
I tried copying the — character from the UTF-8 text. The replace runs, but the result doesn't change.
book_content = book_content.replace('—', ' ')
book_content = book_content.replace('_', ' ')
book_text = book_content
book_text = book_text.replace("\n", " ")
list_of_contents = book_text.split()
the output required is:
about it (readable string)
and ['about' , 'it']
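A minimal sketch of the intended behaviour, assuming the character in the text really is the em dash (U+2014); if the page uses a different dash or a non-standard space, the same pattern applies with that character's escape instead. The sample text is made up for illustration.

```python
# Writing the character as an escape makes the exact code point explicit.
book_content = "about\u2014it was\nfine_then"

for separator in ("\u2014", "_"):
    book_content = book_content.replace(separator, " ")

# split() with no arguments splits on any run of whitespace,
# including newlines, so no separate "\n" replacement is needed.
list_of_contents = book_content.split()
print(list_of_contents)  # ['about', 'it', 'was', 'fine', 'then']
```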

Ignoring tabs and spaces in a python string

I need to compare two strings in Python: the first is read from an .xlsx file and the second is the output of stdout.readlines().
The code below gets the command output.
stdin, stdout, stderr = client.exec_command(testCommand)
op = stdout.readlines()
print("op =\n"+str(op))
str1 = "".join(op)
Some command outputs begin with \t or have \t in the middle.
For example, the command output below begins with \t, and there is a \t after LEN.
# PASS_MIN_LEN Minimum acceptable password length.
PASS_MIN_LEN 5
And xlsx file is having
# PASS_MIN_LEN Minimum acceptable password length.
PASS_MIN_LEN 5
As the .xlsx comparison string doesn't have \t, how can I ignore \t while comparing the two strings?
if cmdOutput== xlsxOutput:
is not working.
I tried to trim the \t from cmdOutput, but it didn't work.
Is there an approach I can follow?

If you just want to replace tabs with a space, perhaps str.replace is simple enough. But that doesn't remove the trailing newlines, so you might consider following the replacement with str.strip. For example:
op = [x.replace('\t', ' ').strip() for x in op]
print(op)
['# PASS_MIN_LEN Minimum acceptable password length.', 'PASS_MIN_LEN 5']
If you have other kinds of characters, or multiple characters (missing data, or the like), a more aggressive approach with regex may be considered:
import re
op = [re.sub(r'\s+', ' ', x).strip() for x in op]
print(op)
['# PASS_MIN_LEN Minimum acceptable password length.', 'PASS_MIN_LEN 5']

You can replace the tab in the command output string with a space.
For example:
cmdOutput.replace('\t', ' ') == xlsxOutput

Read the description for strip() method in official python documentation.
"Return a copy of the string with the leading and trailing characters removed."
So, the characters within the string remain unchanged. Using replace() method is the best solution for your problem.
>>> str1 = "PASS_MIN_LEN\t5"
>>> str2 = "PASS_MIN_LEN 5"
>>> str1.replace('\t', ' ') == str2
True
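If the two texts can differ in any kind of whitespace (tabs, repeated spaces, trailing newlines), comparing normalized copies may be more robust than replacing tabs alone. This is a sketch with made-up sample strings modelled on the question; normalized_lines is a hypothetical helper name.

```python
def normalized_lines(text):
    """Return the non-empty lines of text with whitespace runs collapsed."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]

# Sample data modelled on the question, not real command output.
cmd_output = "\t# PASS_MIN_LEN\tMinimum acceptable password length.\nPASS_MIN_LEN\t5\n"
xlsx_output = "# PASS_MIN_LEN Minimum acceptable password length.\nPASS_MIN_LEN 5"

same = normalized_lines(cmd_output) == normalized_lines(xlsx_output)
print(same)  # True
```

str.split() with no arguments splits on any whitespace and discards empty pieces, so joining the pieces with a single space both trims and collapses each line.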

How to delete some characters from a string by matching certain character in python

I am trying to delete a certain portion of a string if a match is found in the string, as below
string = 'Newyork, NY'
I want to delete all the characters after the comma, including the comma itself, if a comma is present in the string.
Can anyone let me know how to do this?

Use .split():
string = string.split(',', 1)[0]
We split the string on the comma once, to save python the work of splitting on more commas.
Alternatively, you can use .partition():
string = string.partition(',')[0]
Demo:
>>> 'Newyork, NY'.split(',', 1)[0]
'Newyork'
>>> 'Newyork, NY'.partition(',')[0]
'Newyork'
.partition() is the faster method:
>>> import timeit
>>> timeit.timeit("'one, two'.split(',', 1)[0]")
0.52929401397705078
>>> timeit.timeit("'one, two'.partition(',')[0]")
0.26499605178833008

You can split the string with the delimiter ",":
string.split(",")[0]
Example:
'Newyork, NY'.split(",") # ['Newyork', ' NY']
'Newyork, NY'.split(",")[0] # 'Newyork'

Try this :
s = "this, is"
m = s.index(',')
l = s[:m]

A few options:
string[:string.index(",")]
This will raise a ValueError if , cannot be found in the string. Here, we find the position of the character with .index then use slicing.
string.split(",")[0]
The split function will give you a list of the substrings that were separated by ,, and you just take the first element of the list. This will work even if , is not present in the string (as there'd be nothing to split in that case, we'd have string.split(...) == [string])
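The difference between these options only shows when the comma is missing. A small sketch of that case (before_comma is a hypothetical helper name):

```python
def before_comma(text):
    """Return the part of text before the first comma, or all of it if there is none."""
    head, _sep, _tail = text.partition(",")
    return head

print(before_comma("Newyork, NY"))  # Newyork
print(before_comma("Newyork"))      # Newyork

# index() raises when the comma is absent ...
try:
    "Newyork".index(",")
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# ... while find() returns -1, which silently cuts off the last character
# when it is used as a slice end.
print("Newyork"[:"Newyork".find(",")])  # Newyor
```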

Remove all special characters, punctuation and spaces from string

I need to remove all special characters, punctuation and spaces from a string so that I only have letters and numbers.

This can be done without regex:
>>> string = "Special $#! characters spaces 888323"
>>> ''.join(e for e in string if e.isalnum())
'Specialcharactersspaces888323'
You can use str.isalnum:
S.isalnum() -> bool
Return True if all characters in S are alphanumeric
and there is at least one character in S, False otherwise.
If you insist on using regex, other solutions will do fine. However note that if it can be done without using a regular expression, that's the best way to go about it.

Here is a regex to match a string of characters that are not letters or numbers:
[^A-Za-z0-9]+
Here is the Python command to do a regex substitution:
re.sub('[^A-Za-z0-9]+', '', mystring)

A shorter way:
import re
cleanString = re.sub(r'\W+', '', string)
If you want spaces between words and numbers substitute '' with ' '
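Note that \W is not quite the same as "not a letter or digit": it keeps underscores, and for str patterns in Python 3 it also keeps non-ASCII letters. A short sketch of the difference, using a made-up sample string:

```python
import re

text = "snake_case ø 42!"

# \W keeps underscores and, for str patterns, Unicode letters such as ø.
keeps_underscore = re.sub(r"\W+", "", text)
print(keeps_underscore)  # snake_caseø42

# Adding _ to the class removes underscores as well.
no_underscore = re.sub(r"[\W_]+", "", text)
print(no_underscore)  # snakecaseø42

# With re.ASCII, \w only matches [a-zA-Z0-9_], so ø is removed too.
ascii_only = re.sub(r"[\W_]+", "", text, flags=re.ASCII)
print(ascii_only)  # snakecase42
```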

TLDR
I timed the provided answers.
import re
re.sub(r'\W+', '', string)
is roughly 3x faster than the slowest answer timed below (Example 1), and about 1.5x to 2.3x faster than the next fastest (Example 2).
Caution should be taken when using this option. Some special characters (e.g. ø) may not be stripped using this method.
After seeing this, I was interested in expanding on the provided answers by finding out which executes in the least amount of time, so I went through and checked some of the proposed answers with timeit against two of the example strings:
string1 = 'Special $#! characters spaces 888323'
string2 = 'how much for the maple syrup? $20.99? That s ridiculous!!!'
Example 1
''.join(e for e in string if e.isalnum())
string1 - Result: 10.7061979771
string2 - Result: 7.78372597694
Example 2
import re
re.sub('[^A-Za-z0-9]+', '', string)
string1 - Result: 7.10785102844
string2 - Result: 4.12814903259
Example 3
import re
re.sub(r'\W+', '', string)
string1 - Result: 3.11899876595
string2 - Result: 2.78014397621
The above results are a product of the lowest returned result from an average of: repeat(3, 2000000)
Example 3 can be 3x faster than Example 1.
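To repeat a comparison like this on your own machine, something along the following lines should work. It is only a sketch: the iteration count is far smaller than the one quoted above, the sample string is a variation on string1, and absolute timings will vary with the Python version and hardware.

```python
import re
import timeit

sample = "Special $#! characters   spaces 888323"

# The three approaches from Examples 1 to 3, wrapped as callables.
candidates = {
    "isalnum join": lambda: "".join(ch for ch in sample if ch.isalnum()),
    "explicit class": lambda: re.sub(r"[^A-Za-z0-9]+", "", sample),
    r"\W+": lambda: re.sub(r"\W+", "", sample),
}

# Sanity check first: all three give the same output for this ASCII sample.
results = {name: func() for name, func in candidates.items()}

for name, func in candidates.items():
    # Take the best of three runs, as is conventional with timeit.repeat.
    best = min(timeit.repeat(func, repeat=3, number=20_000))
    print(f"{name}: {best:.3f} s")
```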

Python 2.*
I think just filter(str.isalnum, string) works
In [20]: filter(str.isalnum, 'string with special chars like !,#$% etcs.')
Out[20]: 'stringwithspecialcharslikeetcs'
Python 3.*
In Python 3, the filter() function returns an iterable object (instead of a string, as above). One has to join it back to get a string:
''.join(filter(str.isalnum, string))
or pass a list to join (not sure, but it may be a bit faster):
''.join([*filter(str.isalnum, string)])
note: unpacking in [*args] valid from Python >= 3.5

#!/usr/bin/python
import re
strs = "how much for the maple syrup? $20.99? That's ridiculous!!!"
print(strs)
# inside a character class the characters need no '|' between them
nstr = re.sub(r'[?$.!]', '', strs)
print(nstr)
# then drop everything except letters, digits and spaces
nestr = re.sub(r'[^a-zA-Z0-9 ]', '', nstr)
print(nestr)
You can add more special characters to the class, and they will be replaced by '' (nothing), i.e. they will be removed.

Differently than everyone else did using regex, I would try to exclude every character that is not what I want, instead of enumerating explicitly what I don't want.
For example, if I want only characters from 'a to z' (upper and lower case) and numbers, I would exclude everything else:
import re
s = re.sub(r"[^a-zA-Z0-9]","",s)
This means "substitute every character that is not a number, or a character in the range 'a to z' or 'A to Z' with an empty string".
In fact, if you insert the special character ^ at the first place of your regex, you will get the negation.
Extra tip: if you also need to lowercase the result, you can lowercase the input first, which makes the regex simpler and faster, since no uppercase characters will remain.
import re
s = re.sub(r"[^a-z0-9]","",s.lower())

string.punctuation contains following characters:
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
You can use translate and maketrans functions to map punctuations to empty values (replace)
import string
'This, is. A test!'.translate(str.maketrans('', '', string.punctuation))
Output:
'This is A test'

s = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", s)

Assuming you want to use a regex and you want/need Unicode-cognisant 2.x code that is 2to3-ready:
>>> import re
>>> rx = re.compile(u'[\W_]+', re.UNICODE)
>>> data = u''.join(unichr(i) for i in range(256))
>>> rx.sub(u'', data)
u'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb2 [snip] \xfe\xff'
>>>

The most generic approach is using the 'categories' of the unicodedata table which classifies every single character. E.g. the following code filters only printable characters based on their category:
import unicodedata
# strip unwanted characters (based on the Unicode database
# categorization:
# http://www.sql-und-xml.de/unicode-database/#kategorien)
PRINTABLE = {'Lu', 'Ll', 'Nd', 'Zs'}
def filter_non_printable(s):
    result = []
    for c in s:
        # keep letters, decimal digits and spaces; replace anything else with a space
        result.append(c if unicodedata.category(c) in PRINTABLE else u' ')
    return u''.join(result)
Look at the given URL above for all related categories. You also can of course filter
by the punctuation categories.
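Filtering by the punctuation categories could look roughly like this. The sketch assumes that "punctuation and special characters" means every Unicode category starting with P (punctuation) or S (symbol); strip_punctuation_and_symbols is a hypothetical helper name.

```python
import unicodedata

def strip_punctuation_and_symbols(text):
    """Drop characters whose Unicode category starts with P or S."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] not in ("P", "S")
    )

cleaned = strip_punctuation_and_symbols("Über-cool: 20€, ¿sí?")
print(cleaned)  # Übercool 20 sí
```

Unlike an ASCII-only character class, this keeps accented letters and removes non-ASCII punctuation such as ¿ and currency symbols such as €.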

For other languages like German, Spanish, Danish, French etc that contain special characters (like German "Umlaute" as ü, ä, ö) simply add these to the regex search string:
Example for German:
re.sub('[^A-ZÜÖÄa-z0-9]+', '', mystring)

This will remove all special characters, punctuation, and spaces from a string, leaving only numbers and letters.
import re
sample_str = "Hel&&lo %% Wo$#rl#d"
# using isalnum()
print("".join(k for k in sample_str if k.isalnum()))
# using regex
op2 = re.sub("[^A-Za-z0-9]", "", sample_str)
print("op2 =", op2)
special_char_list = ["$", "@", "#", "&", "%"]
# using list comprehension (removes only the listed characters, so spaces remain)
op1 = "".join([k for k in sample_str if k not in special_char_list])
print("op1 =", op1)
# using filter with a lambda function (same result as op1)
op3 = "".join(filter(lambda x: x not in special_char_list, sample_str))
print("op3 =", op3)

Use translate:
import string
def clean(instr):
    return instr.translate(None, string.punctuation + ' ')
Caveat: Only works on ascii strings.
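The two-argument translate(None, ...) call is the Python 2 str signature. In Python 3 the same idea can be written with str.maketrans, as in this sketch (the table name _DELETE is made up; the set of removed characters is still limited to ASCII punctuation plus the space):

```python
import string

# Translation table that maps every ASCII punctuation character and the
# space to None, i.e. deletes them.
_DELETE = str.maketrans("", "", string.punctuation + " ")

def clean(instr):
    return instr.translate(_DELETE)

print(clean("This, is. A test!"))  # ThisisAtest
```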

This will remove all non-alphanumeric characters except spaces.
string = "Special $#! characters spaces 888323"
''.join(e for e in string if (e.isalnum() or e.isspace()))
Special characters spaces 888323

import re
my_string = """Strings are amongst the most popular data types in Python. We can create the strings by enclosing characters in quotes. Python treats single quotes the
same as double quotes."""
# count the word "python" whether or not it is followed by ',' or '.'
# (text was undefined in the original; a lowercased word list is assumed here)
text = my_string.lower().split()
for count, word in enumerate(text):
    if word.endswith((".", ",")):
        text[count] = re.sub(r"^([a-z]+)[.,]$", r"\1", word)
print("The count of Python : ", text.count("python"))

Ten years on, below is what I consider the best solution.
You can remove/clean all special characters, punctuation, non-ASCII characters and spaces from the string.
from clean_text import clean
string = 'Special $#! characters spaces 888323'
new = clean(string,lower=False,no_currency_symbols=True, no_punct = True,replace_with_currency_symbol='')
print(new)
Output ==> 'Special characters spaces 888323'
You can also remove the spaces if you want.
update = new.replace(' ','')
print(update)
Output ==> 'Specialcharactersspaces888323'

function regexFuntion(st) {
  const regx = /[^\w\s]/gi; // allow: [a-zA-Z0-9_] and whitespace
  st = st.replace(regx, ''); // remove everything else
  st = st.replace(/\s\s+/g, ' '); // collapse runs of whitespace into one space
  return st;
}
console.log(regexFuntion('$Hello; # -world--78asdf+-===asdflkj******lkjasdfj67;'));
// Output: Hello world78asdfasdflkjlkjasdfj67

abc = "askhnl#$%askdjalsdk"
ddd = abc.replace("#$%","")
print (ddd)
and you should see the result:
askhnlaskdjalsdk
