Find specific character with python regex


I have a list of strings looking like this:
H PL->01 Tx=000/006 Ph=00/000 DGDD DDDR YDyD GRDD YGR Dets= 003,003,003,003,003,003,003,003,003,003,003,003, ports= 255,255,255,255,255,255,255,255,'
I want to extract the content that matches DGDD DDDR YDyD GRDD YGR (this changes, but it always consists of the letters D, G, R, Y, y, and its length may vary) and put it in a list without whitespace, like this:
['D', 'G', 'D', 'D', 'D', 'D', 'D', 'R', 'Y', 'D', 'y', 'D', 'G', 'R', 'D', 'D', 'Y', 'G', 'R']

If the criterion is groups of the letters DGRYy that are at least three characters long, you can use a regex to that effect and then "flatten" the matches into a list, e.g.:
import re
from itertools import chain
# data is one of the strings from your list
print(list(chain.from_iterable(re.findall('[DGRYy]{3,}', data))))
# ['D', 'G', 'D', 'D', 'D', 'D', 'D', 'R', 'Y', 'D', 'y', 'D', 'G', 'R', 'D', 'D', 'Y', 'G', 'R']
If the content always sits between the same two items, you can also extract it with the built-in string methods, e.g.:
print([ch for ch in data[data.index('Ph'):].partition('Dets=')[0].split(' ', 1)[1] if ch != ' '])
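
As a minimal Python 3 sketch of the regex approach, the pattern can be wrapped in a small helper. The name `extract_signals` is hypothetical, the sample string is a shortened version of the one in the question, and the sketch assumes every group of interest has at least three characters.

```python
import re
from itertools import chain

# Same pattern as in the answer above, compiled once for reuse.
SIGNAL_GROUP = re.compile(r'[DGRYy]{3,}')

def extract_signals(line):
    """Return every character of every D/G/R/Y/y group of length >= 3."""
    # findall returns the matched groups; chain.from_iterable splits them into characters.
    return list(chain.from_iterable(SIGNAL_GROUP.findall(line)))

# Shortened sample line (assumption: same layout as the question's data).
data = ("H PL->01 Tx=000/006 Ph=00/000 DGDD DDDR YDyD GRDD YGR "
        "Dets= 003,003,003, ports= 255,255,255,")
signals = extract_signals(data)
print(signals)
```

Note that "Dets" is not picked up because only its first letter belongs to the character class, so the three-character minimum is what keeps stray letters out.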

Related

I want to print unique strings from a list in python but it breaks the list

I am trying to fetch the different file paths through this:
for (i, imagePath) in enumerate(imagePaths):
    name = set(imagePath.split(os.path.sep)[-2])
It brings multiple paths that have the same names such as this:
Angelina Jolie
Angelina Jolie
Sam
Sam
Sam
What I want to do is print only the unique names, e.g. print Angelina Jolie only once. But whatever I try, whether it is a unique method or converting the list to a set, I get something like the output below, and I don't understand the logic behind it.
{'l', 'e', 'J', 'g', 'i', ' ', 'o', 'a', 'A', 'n'}
{'l', 'e', 'J', 'g', 'i', ' ', 'o', 'a', 'A', 'n'}
{'l', 'e', 'J', 'g', 'i', ' ', 'o', 'a', 'A', 'n'}
{'l', 'e', 'J', 'g', 'i', ' ', 'o', 'a', 'A', 'n'}
Please help me understand why this is happening and what solution I should look for.

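Calling set() on a single string gives you the set of its characters, which is the output you are seeing. Remove the duplicates from the whole list instead. There are two ways to do that:
list(dict.fromkeys(imagePaths).keys())
# or
list(set(imagePaths)) # If you don't care about order
You don't necessarily have to convert the result back to a list.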
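
The same idea applied to the folder names rather than the full paths might look like the sketch below. The example paths are hypothetical and assume a `<dataset>/<person name>/<file>` layout, which is what the `[-2]` index in the question suggests.

```python
import os

# Hypothetical example paths, shaped like <dataset>/<person name>/<file>.
imagePaths = [
    os.path.join("dataset", "Angelina Jolie", "001.jpg"),
    os.path.join("dataset", "Angelina Jolie", "002.jpg"),
    os.path.join("dataset", "Sam", "001.jpg"),
]

# Collect one folder name per path; each name stays a whole string.
names = [path.split(os.path.sep)[-2] for path in imagePaths]

# De-duplicate the collection of names, keeping first-seen order.
unique_names = list(dict.fromkeys(names))
print(unique_names)
```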

Conditional statement to pull out last names from a list

I have a list of full names (titled "FullNames") and I am trying to pull out the last names. The problem is that some of the full names include middle names (e.g., some of the items in the list are "Craig Nelson" while others are "Craig T. Nelson") which stops me from using a simple list comprehension statement such as:
LastNames = ([x.split()[1] for x in FullNames])
Instead, I am trying to loop through the list with this code:
LastNames = []
for item in FullNames:
    if '.' in FullNames:
        LastNames.appened(item[2])
    else:
        LastNames.append(item[1])
print(LastNames)
However, I am just getting a bunch of letters back:
['u', 'a', 'e', 'i', 'o', 'a', 't', 'h', 'r', 'e', 'e', 'r', 'e', 'h', 'a', 'i', 't', 'a', 'r', 'a', 'i', 'e', 'o', 'e', 'e', 'a', 'r', 'o', 'a', 'y', 'i', 'e', 'e', 'o', 'o', 'e', 'e', 'a', 'i', 'i', 'e', 'm', 'a', 'a', 'a', 'n', 'e', 'a', 'r']
Is there a simple way around this?

def get_last(name):
    return name.split(' ')[-1].split('.')[-1]

full_names = ["Craig T. Nelson", "Craig Nelson"]
output = list(map(get_last, full_names))
print(output)
# ['Nelson', 'Nelson']
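
To see why the loop in the question returned single letters, compare indexing a string with indexing the result of split(). This is only an illustrative sketch; the variable names `second_chars` and `last_names` are made up for the example.

```python
full_names = ["Craig T. Nelson", "Craig Nelson"]

# Indexing a string picks out one character, so item[1] is the second letter.
second_chars = [name[1] for name in full_names]
print(second_chars)

# Splitting first gives a list of words; index -1 is always the last word,
# whether or not a middle name is present.
last_names = [name.split()[-1] for name in full_names]
print(last_names)
```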

Why does unified_diff method from the difflib library in Python leave out some characters?

I am trying to check for differences between lines. This is my code:
from difflib import unified_diff
s1 = ['a', 'b', 'c', 'd', 'e', 'f']
s2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'i', 'k', 'l', 'm', 'n']
for line in unified_diff(s1, s2):
    print(line)
It prints:
---
+++
@@ -4,3 +4,9 @@
 d
 e
 f
+g
+i
+k
+l
+m
+n
What happened to 'a', 'b', and 'c'? Thanks!

If you look at the unified_diff source code, its docstring describes a parameter called n:
Unified diffs are a compact way of showing line changes and a few
lines of context. The number of context lines is set by 'n' which
defaults to three.
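Here the "lines" being compared are the list elements, so n is the number of unchanged elements shown around each change. With the default n=3, only 'd', 'e' and 'f' are printed as context before the added elements; 'a', 'b' and 'c' are still compared, they are just not displayed. Pass a larger n to show them. This code:
from difflib import unified_diff
s1 = ['a', 'b', 'c', 'd', 'e', 'f']
s2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'i', 'k', 'l', 'm', 'n']
for line in unified_diff(s1, s2, n=6):
    print(line)
Will generate:
---
+++
@@ -1,6 +1,12 @@
 a
 b
 c
 d
 e
 f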
+g
+i
+k
+l
+m
+n
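
A small Python 3 sketch comparing the default context with a larger one. Using `n=len(s1)` is just one way to guarantee that every unchanged element is shown, and `lineterm=''` is passed so that the header lines do not carry their own newline.

```python
from difflib import unified_diff

s1 = ['a', 'b', 'c', 'd', 'e', 'f']
s2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'i', 'k', 'l', 'm', 'n']

# Default: three context lines around the change.
default_context = list(unified_diff(s1, s2, lineterm=''))

# Enough context lines to cover the whole first sequence.
full_context = list(unified_diff(s1, s2, n=len(s1), lineterm=''))

for line in full_context:
    print(line)
```

The third element of each result is the hunk header, which shows where the displayed range starts and how many lines it covers.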

Control Structure in Python

I am wondering why this piece of code:
wordlist = ['cat','dog','rabbit']
letterlist=[]
for aword in wordlist:
    for aletter in aword:
        if aletter not in letterlist:
            letterlist.append(aletter)
print(letterlist)
prints ['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
while this code:
wordlist = ['cat','dog','rabbit']
letterlist=[]
for aword in wordlist:
    for aletter in aword:
        letterlist.append(aletter)
print(letterlist)
prints ['c', 'a', 't', 'd', 'o', 'g', 'r', 'a', 'b', 'b', 'i', 't']
I don't understand how the first version is evaluated: why doesn't it spell out all of 'rabbit', and why does it only add 'r', 'b', 'i'? Does anyone know what's going on?

You are adding each unique letter to letterlist with this if block:
if aletter not in letterlist:
    letterlist.append(aletter)
If the letter has already been seen, it does not get appended again. That means the second a (in 'rabbit'), the second b (in 'rabbit') and the second t (at the end of 'rabbit') aren't added to the list.

This part of the code, if aletter not in letterlist:, checks whether the letter has already been added to the list. If it has, you won't add it again.
So basically you won't add any repeated characters. That's why the output is ['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']. No repeated letters there.
The second piece of code just iterates over every letter of every word and appends it to letterlist no matter what. That's why all letters are added, and you get ['c', 'a', 't', 'd', 'o', 'g', 'r', 'a', 'b', 'b', 'i', 't'] as the result.
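
As a rough illustration, the first loop behaves like an order-preserving de-duplication. The sketch below reproduces it and compares it with `dict.fromkeys`, which relies on dict keys being unique and keeping insertion order (guaranteed from Python 3.7). The names `unique_letters` and `via_dict` are made up for the example.

```python
wordlist = ['cat', 'dog', 'rabbit']

# Append a letter only the first time it is seen.
unique_letters = []
for aword in wordlist:
    for aletter in aword:
        if aletter not in unique_letters:
            unique_letters.append(aletter)

# Same result: join the words, then let dict keys drop the repeats.
via_dict = list(dict.fromkeys(''.join(wordlist)))

print(unique_letters)
print(via_dict)
```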

Sorting a list using an alphabet string

I'm trying to sort a list containing only lowercase letters by using the string:
alphabet = "abcdefghijklmnopqrstuvwxyz"
that is, without using sort, and in O(n) time only.
I got here:
def sort_char_list(lst):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    new_list = []
    length = len(lst)
    for i in range(length):
        new_list.insert(alphabet.index(lst[i]),lst[i])
        print (new_list)
    return new_list
for this input :
m = list("emabrgtjh")
I get this:
['e']
['e', 'm']
['a', 'e', 'm']
['a', 'b', 'e', 'm']
['a', 'b', 'e', 'm', 'r']
['a', 'b', 'e', 'm', 'r', 'g']
['a', 'b', 'e', 'm', 'r', 'g', 't']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'j']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'h', 'j']
['a', 'b', 'e', 'm', 'r', 'g', 't', 'h', 'j']
It looks like something goes wrong along the way, and I can't work out why. If anyone can enlighten me, that would be great.

You are looking for a bucket sort. Here:
def sort_char_list(lst):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Here, create the 26 buckets
    new_list = [''] * len(alphabet)
    for letter in lst:
        # This is the bucket index
        # You could use `ord(letter) - ord('a')` in this specific case, but it is not mandatory
        index = alphabet.index(letter)
        new_list[index] += letter
    # Assemble the buckets
    return ''.join(new_list)
Note that this returns a string; wrap the result in list(...) if you need a list. As for complexity, since alphabet is a pre-defined fixed-size string, searching for a letter in it requires at most 26 operations, which qualifies as O(1). The overall complexity is therefore O(n).
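
The version in the question goes wrong because list.insert with an index past the end of the list simply appends, so a letter's position in the alphabet is not its position in the partially built output. A counting variant of the bucket idea, sketched below under the same assumption (lowercase letters only), returns a list; the `position` lookup table is an addition for illustration, not part of the answer above.

```python
def sort_char_list(lst):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Map each letter to its bucket index once, so each lookup is O(1).
    position = {letter: i for i, letter in enumerate(alphabet)}
    # One counter per letter of the alphabet.
    counts = [0] * len(alphabet)
    for letter in lst:
        counts[position[letter]] += 1
    # Walk the alphabet in order and emit each letter as often as it was counted.
    result = []
    for letter, count in zip(alphabet, counts):
        result.extend(letter * count)
    return result

sorted_letters = sort_char_list(list("emabrgtjh"))
print(sorted_letters)
```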
