extract last two fields from split - python

I want to extract the last two field values from a variable of varying length. For example, consider the three values below:
fe80::e590:1001:7d11:1c7e
ff02::1:ff1f:fb6
fe80::7cbe:e61:f5ab:e62 ff02::1:ff1f:fb6
These three lines are of variable length. I want to extract only the last two field values when I split each line on the delimiter ':'
That is, from the three lines, I want:
7d11, 1c7e
ff1f, fb6
ff1f, fb6
Can this be done using split()? I am not getting any ideas.

If s is the string containing the IPv6 address, use
s.split(":")[-2:]
to get the last two components. The split() method will return a list of all components, and the [-2:] will slice this list to return only the last two elements.

You can use str.rsplit() to split from the right:
>>> ipaddress = 'fe80::e590:1001:7d11:1c7e'
>>> ipaddress.rsplit(':', 2) # splits at most 2 times from the right
['fe80::e590:1001', '7d11', '1c7e']
This avoids the unnecessary splitting of the first part of the address.
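As a rough sketch, the two approaches above can be compared side by side. The helper names last_two_split and last_two_rsplit are made up for illustration. Note that rsplit(':', 2) still returns the unsplit head as its first element, so a [-2:] slice is applied in both cases.

```python
def last_two_split(s):
    # Split on every ':' and keep the last two pieces.
    return s.split(':')[-2:]


def last_two_rsplit(s):
    # Split at most twice from the right, then drop the unsplit head.
    return s.rsplit(':', 2)[-2:]


for address in ['fe80::e590:1001:7d11:1c7e', 'ff02::1:ff1f:fb6']:
    print(last_two_split(address), last_two_rsplit(address))
```

Both helpers return the same list for these inputs; the rsplit version simply does less work on long strings.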

Related

Why is re not removing some values from my list?

I'm asking more out of curiosity at this point since I found a work-around, but it's still bothering me.
I have a list of dataframes (x) that all have the same column names. I'm trying to use pandas and re to make a list of the subset of column names that have the format
"D(number) S(number)"
so I wrote the following function:
def extract_sensor_columns(x):
    sensor_name = list(x[0].columns)
    for j in sensor_name:
        if bool(re.match('D(\d+)S(\d+)', j))==False:
            sensor_name.remove(j)
    return sensor_name
The list that I'm generating has 103 items (98 wanted items, 5 unwanted items). This function removes three of the five columns that I want to get rid of, but keeps the columns labeled 'Pos' and 'RH'. I generated the sensor_name list outside of the function and tested the truth value of
bool(re.match('D(\d+)S(\d+)', sensor_name[j]))
for all five of the items that I wanted to get rid of and they all gave the False value. The other thing I tried is changing the conditional to ==True, which even more strangely gave me 54 items (all of the unwanted column names and half of the wanted column names).
If I rewrite the function to add the column names that have a given format (rather than remove column names that don't follow the format), I get the list I want.
def extract_sensor_columns(x):
    sensor_name = []
    for j in list(x[0].columns):
        if bool(re.match('D(\d+)S(\d+)', j))==True:
            sensor_name.append(j)
    return sensor_name
Why is the first block of code acting so strangely?
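In general, do not modify a list while iterating over it. In the first (wrong) version you remove elements from the list you are looping over: each removal shifts the remaining elements one position to the left, so the loop's internal index skips the element that immediately follows the removed one. In the second (correct) version you only append matching names to a separate, initially empty list, so the iteration is not disturbed.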
Consider this:
arr = list(range(10))
for el in arr:
    print(el)
for i, el in enumerate(arr):
    print(el)
    arr.remove(arr[i+1])
The second loop prints only the even numbers, because each iteration removes the element that would have come next.
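The same effect can be reproduced on a small list of column names. This is only a sketch: the column names are invented for illustration, and remove_in_place and filter_copy are hypothetical helper names, not part of the original question.

```python
import re

# Hypothetical column names; only the D<number>S<number> ones are wanted.
columns = ['D1S1', 'Pos', 'RH', 'D2S1', 'Temp', 'Time', 'D3S2']
pattern = re.compile(r'D(\d+)S(\d+)')


def remove_in_place(names):
    names = list(names)  # work on a copy of the caller's list
    for name in names:
        if not pattern.match(name):
            # Removing shifts later items left, so the next one is skipped.
            names.remove(name)
    return names


def filter_copy(names):
    # Build a new list instead of mutating the one being iterated.
    return [name for name in names if pattern.match(name)]


print(remove_in_place(columns))  # 'RH' and 'Time' survive by being skipped
print(filter_copy(columns))      # only the matching names remain
```

In this sketch 'RH' directly follows 'Pos', and 'Time' directly follows 'Temp', so each is skipped when its predecessor is removed. Iterating over a copy (for name in list(names)) would also avoid the problem.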

Multiple set unions along with list comprehension

I am trying to understand this code:
edit_two_set = set()
edit_two_set = set.union(*[edit_two_set.union(edit_one_letter(w, allow_switches)) for w in one])
Here one is a set of strings. allow_switches is True.
edit_one_letter takes in one word and makes either one character insertion, deletion or one switch of corresponding characters.
I understand:
[edit_two_set.union(edit_one_letter(w, allow_switches)) for w in one]
is performing a list comprehension in which for every word in one we make one character edit and then take the union of the resulting set with the previous set.
I am mainly stuck at trying to understand what:
set.union(*[])
is doing?
Thanks!
You can refer to this:
https://docs.python.org/3/library/stdtypes.html#frozenset.union
The list comprehension returns a list of sets.
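The * unpacks that list so that each set is passed to set.union as a separate argument. set.union(a, b, c) is equivalent to a.union(b, c), so the call returns a new set that is the union of all the sets in the list.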
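A minimal sketch of the unpacking, using small made-up sets in place of the results of edit_one_letter:

```python
# Stand-ins for the sets produced by the list comprehension.
sets = [{'a', 'b'}, {'b', 'c'}, {'d'}]

# The * spreads the list into separate arguments: set.union(s0, s1, s2).
merged = set.union(*sets)

# The same call written out by hand.
same = sets[0].union(sets[1], sets[2])

# Starting from an empty set also works when the list is empty,
# whereas set.union(*[]) raises a TypeError (no set to call it on).
safe = set().union(*[])

print(merged == same, safe)
```

Since edit_two_set is empty when the comprehension runs, each edit_two_set.union(...) in the original line just returns a copy of the edit_one_letter result, so the comprehension is effectively a list of those result sets.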

Converting String Data Values with two commas from csv or txt files into float in python

I just received a dataset from an HPLC run, and the problem I ran into is that the txt export from the software contains values with two dot separators, for instance "31.456.234 min". I want to plot the data with matplotlib and numpy, but I can only see the data points whose values are not written with two commas. Every value smaller than 1 is written with one comma, like "0.765298"; the rest of the values are, as mentioned, written with two commas.
I tried to solve this with the .split() and .find() methods; however, this is rather inconvenient, and I was wondering whether there is a more elegant way, since in the end I need x and y values for plotting.
Many thanks in advance for any helpful answers.
The question is not very clear about commas versus dots.
For the decimal number you say there is a comma, but you show a dot: 0.765298
I assume dots cannot serve as both the thousands separator and the decimal point...
If the data uses English notation, I guess the numbers are:
"31,456,234 min" and "0.765298"
In that case you can use the replace method:
output = "31,456,234"
number = float(output.replace(',',''))
# result : 31456234.0
EDIT
I am not sure I have understood what you are looking for or the format of the numbers...
However, if the second dot in 31.456.234 is unwanted, here is a solution:
def conv(n):
    i = n.find('.')
    return float(n[:i]+'.'+n[i:].replace('.',''))
x = '31.456.234'
y = '0.765298'
print(conv(x)) # 31.456234
print(conv(y)) # 0.765298
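A possible variant, under the same assumption that only the first dot is the decimal point and any later dots are unwanted, uses str.partition. The name to_float is made up for this sketch, and dropping a trailing unit such as "min" is an extra assumption based on the example in the question. Unlike find, partition also copes with a value that contains no dot at all.

```python
def to_float(text):
    # Assumption: the value is the first whitespace-separated token,
    # optionally followed by a unit such as "min".
    number = text.split()[0]
    # Keep the first dot as the decimal point, drop any later dots.
    head, sep, tail = number.partition('.')
    return float(head + sep + tail.replace('.', ''))


print(to_float('31.456.234 min'))
print(to_float('0.765298'))
print(to_float('42'))
```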

What does this anonymous split function do?

narcoticsCrimeTuples = narcoticsCrimes.map(lambda x:(x.split(",")[0], x))
I have a CSV I am trying to parse by splitting on commas and the first entry in each array of strings is the primary key.
I would like to get the key on a separate line (or just separate) from the value when calling narcoticsCrimeTuples.first()[1]
My current understanding is 'split x by commas, take the first part of each split [0], and return that as the new x', but I'm pretty sure that middle part is not right because the number inside the [] can be anything and returns the same result.
Your variable is named "narcoticsCrimeTuples", so presumably you expect to get a tuple.
The two values of the tuple are the first column of the CSV, x.split(",")[0], and the entire line, x.
"I would like to get the key on a separate line"
It is not really clear why you want that...
"(or just separate) from the value when calling narcoticsCrimeTuples.first()[1]"
Well, when you call .first(), you get the entire tuple. [0] is the first column, and [1] is the corresponding line of the CSV, which also contains the [0] value.
If you call narcoticsCrimes.flatMap(lambda x: x.split(",")), then all the values will be separated.
For example, in the word count example...
textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1))
Judging by the syntax, it seems you are using PySpark. If so, you are mapping over your RDD and creating a (key, row) tuple for each row, the key being the first element of the comma-separated line. Calling narcoticsCrimeTuples.first() just gives you the first record.
See an example here:
https://gist.github.com/amirziai/5db698ea613c6857d72e9ce6189c1193
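The effect of the lambda can be sketched without Spark by applying it to a plain Python list. The CSV rows below are invented for illustration, and the list comprehension merely stands in for RDD.map. It also suggests why changing the index inside the [] seems to make no difference: the index only changes the key, while element [1] of the tuple is always the whole line.

```python
# Hypothetical CSV lines standing in for the narcoticsCrimes RDD.
lines = [
    "1001,NARCOTICS,2015-03-01",
    "1002,NARCOTICS,2015-03-02",
]

# Same lambda as in the question: (first column, whole line).
pairs = [(lambda x: (x.split(",")[0], x))(line) for line in lines]

# A different index changes the key but not the second element.
other = [(lambda x: (x.split(",")[2], x))(line) for line in lines]

# To keep the key apart from the remaining fields, split once.
separated = [tuple(line.split(",", 1)) for line in lines]

print(pairs[0])
print(other[0])
print(separated[0])
```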

Breaking 1 String into 2 Strings based on special characters using python

I am working with python and I am new to it. I am looking for a way to take a string and split it into two smaller strings. An example of the string is below
wholeString = '102..109'
And what I am trying to get is:
a = '102'
b = '109'
The information will always be separated by two periods as shown above, but the number of characters before and after can range anywhere from 1 to 10. I am writing a loop that counts characters before and after the periods and then makes a slice based on those counts, but I was wondering if there was a more elegant way that someone knew about.
Thanks!
Try this:
a, b = wholeString.split('..')
It'll put each value into the corresponding variables.
Look at the str.split method.
split_up = [s.strip() for s in wholeString.split("..")]
This code will also strip off leading and trailing whitespace so you are just left with the values you are looking for. split_up will be a list of these values.
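Combining the two answers above, a small sketch might look like the following. The function name split_range is made up, and limiting the split to one occurrence is an added precaution so that the two-name unpacking cannot fail on input containing a second '..'.

```python
def split_range(whole_string):
    # Split on the first '..' only, then trim surrounding whitespace.
    left, right = whole_string.split('..', 1)
    return left.strip(), right.strip()


a, b = split_range('102..109')
print(a, b)
```

If the input contains no '..' at all, the unpacking raises a ValueError, which may or may not be the behaviour you want.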
