How do I start reading from a certain character in a string? - python

I have a list of strings that look something like this:
"['id', 'thing: 1\nother: 2\n']"
"['notid', 'thing: 1\nother: 2\n']"
I would now like to read the value of 'other' out of each of them.
I did this by counting to a fixed position, but since that position varies I wondered if I could read from a certain character, like a comma, and say: read the character x positions after the comma. How would I do that?

Assuming that "other: " is always present in your strings, you can use it as a separator and split by it:
s = 'thing: 1\nother: 2'
_,number = s.split('other: ')
number
#'2'
(Use int(number) to convert the number-like string to an actual number.) If you are not sure that "other: " is present, enclose the above code in a try-except statement.
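A minimal sketch of the try-except variant mentioned above. The function name read_other and the choice to return None on failure are illustrative assumptions, not part of the original answer.

```python
def read_other(s, key='other: '):
    """Return the integer that follows `key` in `s`, or None if it cannot be read."""
    try:
        # Split once on the key; unpacking raises ValueError if the key is missing.
        _, rest = s.split(key, 1)
        # Keep only the text up to the end of that line, then convert it.
        return int(rest.splitlines()[0])
    except (ValueError, IndexError):
        return None


print(read_other('thing: 1\nother: 2\n'))  # 2
print(read_other('thing: 1\n'))            # None
```

Splitting with a maximum of one split keeps the unpacking safe even if the key text shows up again later in the string.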

Related

List of unique characters of a dataset

I have a dataset in a dataframe and I want to see the total number of characters and the list of unique characters.
As for the total number of characters, I have implemented the following code, which seems to be working well:
df["Preprocessed_Text"].str.len().sum()
Could you please let me know how to get a list with the unique characters (not including the space)?
Try this:
from string import ascii_letters
chars = set(''.join(df["Preprocessed_Text"])).intersection(ascii_letters)
If you need to work with a different alphabet, then simply replace ascii_letters with whatever you need.
If you want every character but the space then:
chars = set(''.join(df["Preprocessed_Text"]).replace(' ', ''))
unichars = list(''.join(df["Preprocessed_Text"]))
print(sorted(set(unichars), key=unichars.index))
unique = list(set([letter for letter in ''.join(df['Preprocessed_Text'].values) if letter != " "]))
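The snippets above can be combined into one helper that keeps first-appearance order and skips spaces. This is only a sketch: unique_chars and its skip parameter are hypothetical names, and it works on any iterable of strings, so a column such as df["Preprocessed_Text"] could be passed in, assuming every entry is a string.

```python
def unique_chars(texts, skip=' '):
    """Return the unique characters in order of first appearance, ignoring `skip`."""
    seen = {}
    for text in texts:
        for ch in text:
            if ch not in skip:
                # dict keys keep insertion order (Python 3.7+)
                seen.setdefault(ch, None)
    return list(seen)


sample = ['hello world', 'low key']  # stand-in for the dataframe column
print(unique_chars(sample))
# ['h', 'e', 'l', 'o', 'w', 'r', 'd', 'k', 'y']
```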

Leading zero for hex string

I read UART port with this:
print ("Attempt to Read")
readOut = serial_port.readline()
time.sleep(1)
in_hex = hex(int.from_bytes(readOut,byteorder='big'))
print ("Reading: ", in_hex)
and then it gives me output e.g. like:
Attempt to Read
Reading: 0x6011401011403
So there is one leading zero missing before the number 6. I can't find a way to add it without specifying the length of the whole string in the code (and I don't want to do that, because the string can have variable length, e.g. 0x056822). Could you please help?
You could inject a zero after 0x when the number of characters is odd:
if len(in_hex)%2: in_hex = in_hex.replace("0x","0x0")
An even number of digits is guaranteed if you convert each byte to two characters:
hexstr = ''.join(format(b,'02x') for b in bytesequence)
Just prepend "0x" if you want.
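As a sketch of the per-byte approach, Python 3.5+ also offers bytes.hex(), which always emits two digits per byte. The helper name to_hex and the sample bytes are assumptions for illustration; the sample stands in for whatever serial_port.readline() returns.

```python
def to_hex(data, prefix='0x'):
    """Format bytes as hex with two digits per byte, so leading zeros survive."""
    return prefix + data.hex()


# Stand-in for the bytes read from the UART port.
read_out = bytes([0x06, 0x01, 0x14, 0x01, 0x01, 0x14, 0x03])

print(hex(int.from_bytes(read_out, byteorder='big')))  # 0x6011401011403
print(to_hex(read_out))                                # 0x06011401011403
```

Going through int drops the leading zero because an integer has no notion of width; formatting the bytes directly avoids that.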

python the list has automatically unwanted special characters \n+

I am reading some data from a dataframe column and I do some manipulation on each value if the value contains a "-". These manipulations include splitting on the "-". However, I do not understand why each value in the list contains a "\n", as for instance
['2010\n1', '200\n2 450\n3', ..., '1239\n1000']
here is a sample of my code:
splited = []
wantedList = []
val = str(x)  # x is the value read from the dataframe column
print(val)  # val does not appear to contain those special characters
if val.find('-') != -1:
    splited = val.split('-')
    wantedList.append(splited[0])
print(splited)  # the splited list contains those special characters
print(wantedList)  # wantedList contains those special characters
I guess this has to do with the way I created the list or the way I am appending to it.
Does anyone know why something like this happens?
There is nothing in your code that could automagically add a newline character at some random position within your strings. I'd say the characters are already in the string, but print shows them as actual line breaks rather than as \n.
You can confirm that by printing the representation of the string:
print(repr(val))
If you want them out of your strings, you can remove them with a simple str.replace for all \n.
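A short sketch of the check and the cleanup, using one value from the list in the question as sample data. Printing a list shows the repr of each element, which is why the \n is visible there but not when the string itself is printed. The calls are written with parentheses so they run under both Python 2 and 3.

```python
val = '2010\n1'       # sample value taken from the list shown in the question

print(val)            # printed on two lines, so the newline is easy to miss
print(repr(val))      # '2010\n1' : the escape is shown explicitly
print([val])          # ['2010\n1'] : lists display the repr of their items

cleaned = val.replace('\n', '')
print(repr(cleaned))  # '20101'
```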

Add a number to the beginning of a string in particular locations

I have this string:
abc,12345,abc,abc,abc,abc,12345,98765443,xyz,zyx,123
What can I use to add a 0 to the beginning of each number in this string? So how can I turn that string into something like:
abc,012345,abc,abc,abc,abc,012345,098765443,xyz,zyx,0123
I've tried playing around with regex but I'm unsure how to use it effectively to get the result I want. I need it to match fields that contain only digits, so it should not turn something like
1234abc567 into 01234abc567, as that has letters in it. Each value is always separated by a comma.
Use re.sub,
re.sub(r'(^|,)(\d)', r'\g<1>0\2', s)
or
re.sub(r'(^|,)(?=\d)', r'\g<1>0', s)
or
re.sub(r'\b(\d)', r'0\1', s)
Try the following, which only matches runs of digits with a word boundary on both sides:
re.sub(r'\b(\d+)\b', r'0\g<1>', s)
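Since the question asks that only all-digit fields be changed, a stricter comma-aware pattern may be worth considering. The sketch below is one possible way to do it; the function name pad_numeric_fields is hypothetical.

```python
import re


def pad_numeric_fields(s):
    """Prefix '0' to every comma-separated field that consists only of digits."""
    # (?<![^,]) : the field starts at the beginning of the string or after a comma
    # (?![^,])  : the field ends at the end of the string or before a comma
    return re.sub(r'(?<![^,])(\d+)(?![^,])', r'0\1', s)


s = 'abc,12345,abc,abc,abc,abc,12345,98765443,xyz,zyx,123'
print(pad_numeric_fields(s))
# abc,012345,abc,abc,abc,abc,012345,098765443,xyz,zyx,0123
print(pad_numeric_fields('1234abc567,42'))
# 1234abc567,042
```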
If the numbers are always separated by commas in your string, you can use basic list methods to achieve the result you want.
Let's say your string is called x
y = x.split(',')
x = ''
for i in y:
    if i.isdigit():
        i = '0' + i
    x = x + i + ','
x = x[:-1]  # drop the trailing comma added by the loop
What this piece of code does is the following:
Splits your string into pieces depending on where you have commas and returns a list of the pieces.
Checks if the pieces are actually numbers, and if they are a 0 is added using string concatenation.
Finally your string is rebuilt by concatenating the pieces along with the commas.
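The three steps above can also be sketched with str.join, which avoids building the string piece by piece. The function name pad_numeric_fields_split is an illustrative assumption.

```python
def pad_numeric_fields_split(x):
    """Split on commas, prefix '0' to all-digit pieces, and join them back."""
    return ','.join(
        '0' + field if field.isdigit() else field
        for field in x.split(',')
    )


x = 'abc,12345,abc,abc,abc,abc,12345,98765443,xyz,zyx,123'
print(pad_numeric_fields_split(x))
# abc,012345,abc,abc,abc,abc,012345,098765443,xyz,zyx,0123
```

Because str.isdigit() is False for mixed pieces such as 1234abc567, those are left unchanged.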

Find Certain String Indices

I have this string and I need to get a specific number out of it.
E.G. encrypted = "10134585588147, 3847183463814, 18517461398"
How would I pull out only the second integer out of the string?
You are looking for the "split" method. Turn a string into a list by specifying a smaller part of the string on which to split.
>>> encrypted = '10134585588147, 3847183463814, 18517461398'
>>> encrypted_list = encrypted.split(', ')
>>> encrypted_list
['10134585588147', '3847183463814', '18517461398']
>>> encrypted_list[1]
'3847183463814'
>>> encrypted_list[-1]
'18517461398'
Then you can just access the indices as normal. Note that lists can be indexed forwards or backwards. By providing a negative index, we count from the right rather than the left, selecting the last index (without any idea how big the list is). Note this will produce IndexError if the list is empty, though. If you use Jon's method (below), there will always be at least one index in the list unless the string you start with is itself empty.
Edited to add:
What Jon is pointing out in the comment is that if you are not sure the string will be well-formatted (e.g., always separated by exactly one comma followed by exactly one space), then you can replace all the commas with spaces (encrypted.replace(',', ' ')), then call split without arguments, which will split on any run of whitespace characters. As usual, you can chain these together:
encrypted.replace(',', ' ').split()
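Putting the pieces together, a small helper might look like the sketch below. The name nth_number is hypothetical, and it assumes every piece is a valid integer; an out-of-range index still raises IndexError, as noted above.

```python
def nth_number(text, n):
    """Return the n-th (0-based) integer from a comma/whitespace separated string."""
    parts = text.replace(',', ' ').split()
    return int(parts[n])


encrypted = '10134585588147, 3847183463814, 18517461398'
print(nth_number(encrypted, 1))   # 3847183463814
print(nth_number(encrypted, -1))  # 18517461398
```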
