I have a data set with 4 columns. I have already opened and read it, and made each column into a key of a dictionary. I am trying to filter out any data which begins with a certain letter: e.g. for key DA, any value in this key that starts with a certain letter (e.g. E) should result in the row being deleted. How can I go about doing this?
You can use the str.startswith method to check whether a string starts with a certain letter. In your case, it can be something like the following:
list_dictionary = [
    {'KeyYa': 'abc', 'KeyDa': 'def', 'KeyBa': 'ghi', 'KeySa': 'jkl'},
    {'KeyYa': 'abc', 'KeyDa': 'Edef', 'KeyBa': 'ghi', 'KeySa': 'jkl'},
    {'KeyYa': 'abc', 'KeyDa': 'Gdef', 'KeyBa': 'ghi', 'KeySa': 'jkl'},
    {'KeyYa': 'abc', 'KeyDa': 'Edef', 'KeyBa': 'ghi', 'KeySa': 'jkl'}
]
filtered = []
for line_dict in list_dictionary:
    # Keep only the rows whose 'KeyDa' value does not start with 'E'
    if not line_dict['KeyDa'].startswith('E'):
        filtered.append(line_dict)
print(filtered)
This prints (on Python 3.7+, where dicts preserve insertion order):
[{'KeyYa': 'abc', 'KeyDa': 'def', 'KeyBa': 'ghi', 'KeySa': 'jkl'}, {'KeyYa': 'abc', 'KeyDa': 'Gdef', 'KeyBa': 'ghi', 'KeySa': 'jkl'}]
If you're comfortable with filter and lambda, you can do the same thing more concisely:
filtered = list(filter(lambda line: not line['KeyDa'].startswith('E'), list_dictionary))
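If the rows come from a CSV file, the same idea can be applied directly to the records produced by csv.DictReader. The sketch below assumes hypothetical column names YA, DA, BA and SA and a small in-memory sample instead of the asker's real file; the helper drop_rows_starting_with is also hypothetical. It relies on the fact that str.startswith accepts a tuple of prefixes, so several letters can be excluded in one check.

```python
import csv
import io

# Hypothetical sample data: the column names (YA, DA, BA, SA) are assumptions
# standing in for the asker's four columns.
sample = io.StringIO(
    "YA,DA,BA,SA\n"
    "abc,def,ghi,jkl\n"
    "abc,Edef,ghi,jkl\n"
    "abc,Gdef,ghi,jkl\n"
)

def drop_rows_starting_with(rows, column, prefixes):
    """Keep only rows whose `column` value does not start with any of `prefixes`.

    str.startswith accepts a tuple, so several letters can be excluded at once.
    """
    return [row for row in rows if not row[column].startswith(prefixes)]

rows = list(csv.DictReader(sample))
kept = drop_rows_starting_with(rows, "DA", ("E",))
print([row["DA"] for row in kept])  # ['def', 'Gdef']
```

Passing ("E", "G") instead of ("E",) would drop both the 'Edef' and the 'Gdef' rows.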
I have a list which contains other lists. Each element of this list contains 5 elements, like so:
requests = [['abc', 'def', 'abc.def#email.com', 'GROUP1', '000'],
['ghi', 'jkl', 'ghi,jkl#email.com', 'GROUP4', '111'],
['mno', 'pqr', 'mno.pqr#email.com', 'GROUP4', '222'],
['stu', 'vxy', 'stu.vxy#email.com', 'GROUP2', '333'],
['123', '456', '123.456#email.com', 'GROUP4', '444'],
['A12', 'B34', 'A12.B34f#email.com', 'GROUP3', '555']]
I would like to sort this list in such a way that every inner list containing an element equal to GROUP4 is placed at the end of the list. So the output would look like this:
requests = [['abc', 'def', 'abc.def#email.com', 'GROUP1', '000'],
['stu', 'vxy', 'stu.vxy#email.com', 'GROUP2', '333'],
['A12', 'B34', 'A12.B34f#email.com', 'GROUP3', '555'],
['ghi', 'jkl', 'ghi,jkl#email.com', 'GROUP4', '111'],
['mno', 'pqr', 'mno.pqr#email.com', 'GROUP4', '222'],
['123', '456', '123.456#email.com', 'GROUP4', '444'],]
I managed to sort them based on that specific element using:
requests.sort(key=lambda x: x[3])
But this just sorts them alphabetically by the 4th element. This won't work if a GROUP5 or GROUP6 is present, as those would end up last.
Any ideas on how to do this?
Thank you very much!
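You can change the lambda to check whether the 4th item is 'GROUP4'. The key is then False for every other row and True for the GROUP4 rows, and since False sorts before True, the GROUP4 rows move to the end:
requests.sort(key=lambda x: x[3] == 'GROUP4')
If you want to sort the rest of the list as well, use a tuple with two items as the key: it sorts by the 'GROUP4' check first and then lexicographically by the group name:
requests.sort(key=lambda x: (x[3] == 'GROUP4', x[3]))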
Move the lists with GROUP4 at index 3 to the end by sorting on a boolean. Python's sort is stable, so the rows keep their original relative order within each half:
requests.sort(key=lambda el: el[3] == "GROUP4")
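To check that the tuple key also copes with the GROUP5/GROUP6 case from the question, here is a small sketch comparing both keys. The two extra rows (GROUP5 and GROUP6) are made up for illustration and are not part of the original data.

```python
requests = [
    ['abc', 'def', 'abc.def#email.com', 'GROUP1', '000'],
    ['ghi', 'jkl', 'ghi,jkl#email.com', 'GROUP4', '111'],
    ['xx1', 'yy1', 'xx1.yy1#email.com', 'GROUP6', '666'],  # hypothetical row
    ['mno', 'pqr', 'mno.pqr#email.com', 'GROUP4', '222'],
    ['stu', 'vxy', 'stu.vxy#email.com', 'GROUP2', '333'],
    ['xx2', 'yy2', 'xx2.yy2#email.com', 'GROUP5', '777'],  # hypothetical row
]

# Boolean-only key: False (not GROUP4) sorts before True (GROUP4).
# The sort is stable, so the other rows keep their original order.
by_flag = sorted(requests, key=lambda row: row[3] == 'GROUP4')
print([row[3] for row in by_flag])
# ['GROUP1', 'GROUP6', 'GROUP2', 'GROUP5', 'GROUP4', 'GROUP4']

# Tuple key: GROUP4 rows last, everything else ordered by group name.
by_flag_then_name = sorted(requests, key=lambda row: (row[3] == 'GROUP4', row[3]))
print([row[3] for row in by_flag_then_name])
# ['GROUP1', 'GROUP2', 'GROUP5', 'GROUP6', 'GROUP4', 'GROUP4']
```

The group names are compared as plain strings, so a hypothetical GROUP10 would sort before GROUP2.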
I have two strings where I want to isolate sequences of digits from everything else.
For example:
import re
s = 'abc123abc'
print(re.split(r'(\d+)', s))
s = 'abc123abc123'
print(re.split(r'(\d+)', s))
The output looks like this:
['abc', '123', 'abc']
['abc', '123', 'abc', '123', '']
Note that in the second case, there's a trailing empty string.
Obviously I can test for that and remove it if necessary but it seems cumbersome and I wondered if the RE can be improved to account for this scenario.
You can use filter to drop the empty strings, like below:
>>> s = 'abc123abc123'
>>> re.split(r'(\d+)', s)
['abc', '123', 'abc', '123', '']
>>> list(filter(None, re.split(r'(\d+)', s)))
['abc', '123', 'abc', '123']
Thanks to @chepner, you can also use a list comprehension like below:
>>> [x for x in re.split(r'(\d+)', s) if x]
['abc', '123', 'abc', '123']
It still works if your string contains symbols or other non-digit characters:
>>> s = '&^%123abc123$##123'
>>> list(filter(None, re.split(r'(\d+)', s)))
['&^%', '123', 'abc', '123', '$##', '123']
This has to do with how re.split() itself works, and you can't change it: when the pattern matches at the very end of the string, the remainder after that match is an empty string, and re.split() still includes it in the result (likewise, a match at the very start produces a leading empty string). The function splits at each match and simply keeps whatever is left as the next piece, even if it is empty.
If you don't want that empty string, you can get rid of it in various ways before collecting the results into a list. user1740577's answer is one example, but personally I prefer a list comprehension, since it's more idiomatic for simple filter/map operations:
parts = [part for part in re.split(r'(\d+)', s) if part]
I recommend against checking and getting rid of the element after the list has already been created, because it involves more operations and allocations.
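The behaviour described above is easy to see with inputs that start or end with digits. The sketch below also wraps the list-comprehension filter in a small helper; the name split_keep_digits is hypothetical.

```python
import re

DIGITS = re.compile(r'(\d+)')

# A match at the very start gives a leading '', a match at the very end a trailing ''.
print(DIGITS.split('123abc'))  # ['', '123', 'abc']
print(DIGITS.split('abc123'))  # ['abc', '123', '']

def split_keep_digits(s):
    """Split s into digit and non-digit runs, dropping the empty edge pieces."""
    return [part for part in DIGITS.split(s) if part]

print(split_keep_digits('123abc123'))  # ['123', 'abc', '123']
```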
A simple way to use regular expressions for this would be re.findall:
import re

def bits(s):
    return re.findall(r"(\D+|\d+)", s)

bits("abc123abc123")
# ['abc', '123', 'abc', '123']
But it seems easier and more natural with itertools.groupby. After all, you are chunking an iterable based on a single condition:
from itertools import groupby
def bits(s):
    return ["".join(g) for _, g in groupby(s, key=str.isdigit)]
bits("abc123abc123")
# ['abc', '123', 'abc', '123']
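As a quick sanity check, the two versions of bits above can be compared side by side. In the sketch below they are renamed bits_regex and bits_groupby, names chosen only for this comparison. For ASCII input they agree, including on the empty string; one caveat is that str.isdigit and the regex \d do not treat every Unicode character the same way (for example, the superscript '²' counts as a digit for str.isdigit but is not matched by \d).

```python
import re
from itertools import groupby

def bits_regex(s):
    # Alternating runs of non-digits and digits
    return re.findall(r"\D+|\d+", s)

def bits_groupby(s):
    # Group consecutive characters by whether they are digits
    return ["".join(g) for _, g in groupby(s, key=str.isdigit)]

for s in ["abc123abc123", "123abc", "", "&^%123abc123$##123"]:
    assert bits_regex(s) == bits_groupby(s)
    print(repr(s), bits_regex(s))

# The two definitions of "digit" differ for some Unicode characters:
print(bits_regex("x²"), bits_groupby("x²"))  # ['x²'] ['x', '²']
```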
Here is the code:
aList = ['0.01', 'xyz', 'J0.01', 'abc', 'xyz'];
aList.remove('0.01');
print("List : ", aList)
Here is the output:
List :  ['xyz', 'J0.01', 'abc', 'xyz']
How can I remove the 0.01 attached to 'J0.01'? I would like to keep the J. Thanks for your time! =)
It seems like you want:
>>> aList = ['0.01', 'xyz', 'J0.01', 'abc', 'xyz']
>>> [z.replace('0.01', '') for z in aList]
['', 'xyz', 'J', 'abc', 'xyz']
If you also want to remove empty or whitespace-only strings:
>>> [z.replace('0.01', '') for z in aList if z.replace('0.01', '').strip()]
['xyz', 'J', 'abc', 'xyz']
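Note that str.replace removes '0.01' wherever it occurs inside an item. If you only want to strip it when it appears at the end (as in 'J0.01'), str.removesuffix does exactly that; it requires Python 3.9 or newer. The extra item 'a0.01b' below is made up to show the difference.

```python
# Requires Python 3.9+ for str.removesuffix.
aList = ['0.01', 'xyz', 'J0.01', 'abc', 'xyz', 'a0.01b']  # 'a0.01b' is a hypothetical extra item

replaced = [z.replace('0.01', '') for z in aList]
suffix_only = [z.removesuffix('0.01') for z in aList]

print(replaced)     # ['', 'xyz', 'J', 'abc', 'xyz', 'ab']
print(suffix_only)  # ['', 'xyz', 'J', 'abc', 'xyz', 'a0.01b']

# Drop items that ended up empty or whitespace-only
print([z for z in suffix_only if z.strip()])  # ['xyz', 'J', 'abc', 'xyz', 'a0.01b']
```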
Using the re module:
import re
aList = ['0.01', 'xyz', 'J0.01', 'abc', 'xyz']
# Strip a trailing number from each item, then drop the items that became empty
print([s for s in (re.sub(r'\d+\.?\d*$', '', item) for item in aList) if s])
Prints:
['xyz', 'J', 'abc', 'xyz']
EDIT:
The regexp substitution with the pattern r'\d+\.?\d*$' replaces one or more digits, optionally followed by a dot and any number of further digits, with an empty string. The $ anchors the match to the end of the string.
So, e.g., the following matches are valid: "0.01", "0.", "0".
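To see what that pattern does to a few inputs, here is a sketch that applies it through a precompiled pattern. Apart from '0.01' and 'J0.01', the test strings are invented for illustration. Unlike the str.replace answer, this removes any trailing number, not just '0.01', and leaves digits in the middle of a string alone.

```python
import re

TRAILING_NUMBER = re.compile(r'\d+\.?\d*$')

for item in ['0.01', 'J0.01', 'J0.', 'J7', 'xyz', 'A1B2']:
    print(repr(item), '->', repr(TRAILING_NUMBER.sub('', item)))
# '0.01' -> ''
# 'J0.01' -> 'J'
# 'J0.' -> 'J'
# 'J7' -> 'J'
# 'xyz' -> 'xyz'
# 'A1B2' -> 'A1B'
```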
Something like this should work:
l = ['0.01', 'xyz', 'J0.01', 'abc', 'xyz']
string = '0.01'
result = []
for x in l:
    if string in x:
        substring = x.replace(string, '')
        if substring != "":
            result.append(substring)
    else:
        result.append(x)
print(result)
try it, regards.
How do you generate all combinations of lower- and upper-case characters in a word? E.g.:
'abc' → ['abc', 'ABC', 'Abc', 'ABc', 'aBC', 'aBc', 'abC', 'AbC']
'ab' → ['ab', 'AB', 'Ab', 'aB']
You can achieve this by zipping the upper and lower case letters and taking their cartesian product:
import itertools
chars = "abc"
results = list(map(''.join, itertools.product(*zip(chars.upper(), chars.lower()))))
print(results)
# ['ABC', 'ABc', 'AbC', 'Abc', 'aBC', 'aBc', 'abC', 'abc']
To visualise how this works:
1. Create axes: zip creates our 3 'axes', each with 2 points (the upper- and lower-case letter): [('A', 'a'), ('B', 'b'), ('C', 'c')].
2. Take the Cartesian product: product takes the Cartesian product of these axes, i.e. the 8 possible coordinates corresponding to the corners of the unit cube they span.
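The two steps can be printed one at a time for a shorter word. The helper case_variants at the end is a hypothetical extension, not part of the answer above: it drops the duplicate casing for characters that have no case, such as digits, which the plain zip/product version would otherwise repeat.

```python
import itertools

chars = "ab"

# Step 1: one axis per character, holding its upper- and lower-case form
axes = list(zip(chars.upper(), chars.lower()))
print(axes)  # [('A', 'a'), ('B', 'b')]

# Step 2: every corner of the 2 x 2 grid spanned by those axes
corners = list(itertools.product(*axes))
print(corners)  # [('A', 'B'), ('A', 'b'), ('a', 'B'), ('a', 'b')]
print([''.join(c) for c in corners])  # ['AB', 'Ab', 'aB', 'ab']

def case_variants(word):
    # dict.fromkeys keeps order and removes the duplicate when upper == lower
    per_char = [tuple(dict.fromkeys((c.upper(), c.lower()))) for c in word]
    return [''.join(p) for p in itertools.product(*per_char)]

print(case_variants("a1"))  # ['A1', 'a1']
```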
Using recursion:
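def foo(word):
    # Base cases: an empty word has one variant, a single character has two
    if not word:
        return ['']
    if len(word) == 1:
        return [word.lower(), word.upper()]
    # Combine each casing of the first character with every variant of the rest
    return [f"{j}{i}" for j in foo(word[0]) for i in foo(word[1:])]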
s = "abc"
print(foo(s))
# ['abc', 'abC', 'aBc', 'aBC', 'Abc', 'AbC', 'ABc', 'ABC']
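The same result can also be built iteratively. This is a sketch of an alternative, not the answer's own code, and the name case_combinations is hypothetical. It extends every partial result with the lower- and upper-case form of the next character, so it produces the strings in the same order as the recursive foo, and the empty word needs no special case.

```python
def case_combinations(word):
    results = ['']  # one way to case the empty prefix
    for ch in word:
        # Extend every prefix with both casings of the next character
        results = [prefix + variant
                   for prefix in results
                   for variant in (ch.lower(), ch.upper())]
    return results

print(case_combinations("abc"))
# ['abc', 'abC', 'aBc', 'aBC', 'Abc', 'AbC', 'ABc', 'ABC']
print(len(case_combinations("abcd")))  # 16, i.e. 2 ** 4
```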
How can I extract the text enclosed within the square brackets from the following string:
string = '{a=[], b=[abc, def], c=[ghi], d=[], e=[jkl], f=[mno, pqr, stu, vwx]}'
Expected Output is:
['abc','def','ghi','jkl','mno','pqr','stu','vwx']
Regex should help.
import re
string = '{a=[], b=[abc, def], c=[ghi], d=[], e=[jkl], f=[mno, pqr, stu, vwx]}'
res = []
for i in re.findall(r"\[(.*?)\]", string):
    res.extend(i.replace(",", "").split())
print(res)
Output:
['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx']
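If you also want to know which key each item came from, a variation of the same regex idea can capture the key and the bracket contents together. This sketch assumes keys are word characters (\w+) and that items never contain commas; the name groups is only for illustration. Splitting on ',' and stripping each piece also keeps items with inner spaces intact, which the replace/split approach above would break apart.

```python
import re

string = '{a=[], b=[abc, def], c=[ghi], d=[], e=[jkl], f=[mno, pqr, stu, vwx]}'

# Capture the key and the bracket contents of each key=[...] pair
pairs = re.findall(r'(\w+)=\[(.*?)\]', string)

# Split each body on commas, trimming whitespace and skipping empty brackets
groups = {key: [item.strip() for item in body.split(',') if item.strip()]
          for key, body in pairs}
print(groups)
# {'a': [], 'b': ['abc', 'def'], 'c': ['ghi'], 'd': [], 'e': ['jkl'], 'f': ['mno', 'pqr', 'stu', 'vwx']}

flat = [item for items in groups.values() for item in items]
print(flat)
# ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx']
```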
An alternative using the third-party regex module (installable with pip install regex), which supports \G, could be:
(?:\G(?!\A)|\[)([^][,]+)(?:,\s*)?
Broken down, this says:
(?:\G(?!\A)|\[)   # match either [ or at the end of the last match (but not at the start of the string)
([^][,]+)         # capture one or more characters that are not [, ] or ,
(?:,\s*)?         # optionally followed by a comma and any whitespace
See a demo on regex101.com.
In Python:
import regex as re
string = '{a=[], b=[abc, def], c=[ghi], d=[], e=[jkl], f=[mno, pqr, stu, vwx]}'
rx = re.compile(r'(?:\G(?!\A)|\[)([^][,]+)(?:,\s*)?')
output = rx.findall(string)
print(output)
# ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx']