Python set anomaly


I'm trying to understand how sets work when I build them from the values in a list.
When I run the code below
wordlist = ['hello',1,2,3]
wordSet = set(wordlist)
the output is
{3, 1, 2, 'hello'}
or something similar, because a set doesn't have an order.
But when I take only my list's first element, wordlist[0], and use its value to create a set
wordlist = ['hello',1,2,3]
wordSet = set(wordlist[0])
I was expecting the output to be
{'hello'}
but instead, I get
{'l', 'o', 'h', 'e'}
or the same characters in some other order.
My point is that when I pass my list to the set function directly, it uses the entire list to create a set, but when I want to create a set from only the first element of my list, it splits my string into characters.
Why does that happen?

Strings such as 'hello' are iterable; set() converts iterables into sets.
To clarify,
set(('1', '1', '2', '3')) == {'1', '2', '3'}
set(['1', '1', '2', '3']) == {'1', '2', '3'}
set('1123') == {'1', '2', '3'}

Calling set on an object will iterate over the object. Strings are iterable, yielding individual characters. If you want a set containing only the first element of wordlist, you need to pass an iterable which contains only that element:
set([wordlist[0]])
Or, more directly, just use the curly braces:
{wordlist[0]}
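As a quick check of the point above, here is a minimal sketch that compares the three spellings. It reuses the list from the question; the names chars, from_list and from_braces are only illustrative.

```python
wordlist = ['hello', 1, 2, 3]

# set() iterates over its argument; iterating a string yields its characters.
chars = set(wordlist[0])

# Wrapping the element in a list, or using a set display, keeps it whole.
from_list = set([wordlist[0]])
from_braces = {wordlist[0]}

print(chars == {'h', 'e', 'l', 'o'})          # True
print(from_list == from_braces == {'hello'})  # True
```

The duplicate 'l' disappears in chars because a set stores each value only once.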

Related

How do I collect values into a list in Python standard regex?

I have a string with repeated parts:
s = '[1][2][5] and [3][8]'
And I want to group the numbers into two lists using re.match. The expected result is:
{'x': ['1', '2', '5'], 'y': ['3', '8']}
I tried this expression that gives a wrong result:
re.match(r'^(?:\[(?P<x>\d+)\])+ and (?:\[(?P<y>\d+)\])+$', s).groupdict()
# {'x': '5', 'y': '8'}
It looks like re.match keeps only the last match. How do I collect all the parts into a list instead of only the last one?
Of course, I know that I could split the line on the ' and ' separator and use re.findall on the parts instead, but this approach is not general enough: it causes problems with more complex strings, so I would always need to think about correct splitting separately.

We can use regular expressions here. First, iterate the input string looking for matches of the type [3][8]. For each match, use re.findall to generate a list of number strings. Then, add a key whose value is that list. Note that we maintain a list of keys and pop each one when we use it.
import re
s = '[1][2][5] and [3][8]'
keys = ['x', 'y']
d = {}
# Each match is one run of adjacent bracketed numbers, e.g. '[1][2][5]'.
for m in re.finditer(r'(?:\[\d+\])+', s):
    d[keys.pop(0)] = re.findall(r'\d+', m.group())
print(d)  # {'x': ['1', '2', '5'], 'y': ['3', '8']}

If you want to use the named capture groups, you can write the pattern like this, with the repetition of the bracketed digits inside the named group.
Then, after first checking that the pattern matched, you can get the digits by running re.findall on each value of the groupdict:
^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$
Example
import re
s = '[1][2][5] and [3][8]'
m = re.match(r'^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$', s)
if m:
    dct = {k: re.findall(r"\d+", v) for k, v in m.groupdict().items()}
    print(dct)
Output
{'x': ['1', '2', '5'], 'y': ['3', '8']}
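To see why the original pattern returned only '5' and '8', it may help to compare the two patterns side by side. This is just an illustrative sketch using the string and patterns from this thread; the variable names are made up for the comparison.

```python
import re

s = '[1][2][5] and [3][8]'

# A group inside a repetition is overwritten on each pass, so only the
# final repetition is left in the match object.
inner = re.match(r'^(?:\[(?P<x>\d+)\])+ and (?:\[(?P<y>\d+)\])+$', s)
last_only = inner.groupdict()

# Moving the repetition inside the group captures the whole run instead.
outer = re.match(r'^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$', s)
whole_runs = outer.groupdict()

print(last_only)   # {'x': '5', 'y': '8'}
print(whole_runs)  # {'x': '[1][2][5]', 'y': '[3][8]'}
```

Once each group holds the whole run, a second pass with re.findall(r'\d+', ...) splits it into the individual numbers.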

Double Nesting defaultdict

Poked around but couldn't figure it out, probably a very simple solution but please help me understand.
Source (sample.txt):
1,1,2,3
2,3,2,4,4
This:
import csv
from collections import defaultdict
input = "sample.txt"
with open(input) as f:
    r = csv.reader(f)
    d = defaultdict(list)
    rlabel = 1
    for row in r:
        d[rlabel].append(row)
        rlabel += 1
print(d)
Gets This:
defaultdict(<class 'list'>, {1: [['1', '1', '2', '3']], 2: [['2', '3', '2', '4', '4']]})
Why are there double brackets around my lists?

Why are there double brackets around my lists?
Your code works exactly as expected. The key point is the difference between append and extend.
append adds the argument you pass as a single element. Each row is a list, and the default value of your defaultdict is also a list, so appending a row produces a list of lists.
extend iterates over its argument and adds each of its elements to the original list.
So, in this case, if you want the elements of each row stored directly under its key, you should use the list.extend method instead. Your output will then be:
defaultdict(<class 'list'>, {1: ['1', '1', '2', '3'], 2: ['2', '3', '2', '4', '4']})
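The difference can be seen on a single row without reading any file. This is a small sketch; nested and flat are illustrative names, and the row is the first line of sample.txt.

```python
from collections import defaultdict

row = ['1', '1', '2', '3']

nested = defaultdict(list)
nested[1].append(row)   # the row itself becomes one element

flat = defaultdict(list)
flat[1].extend(row)     # each item of the row becomes an element

print(nested[1])  # [['1', '1', '2', '3']]
print(flat[1])    # ['1', '1', '2', '3']
```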

With a defaultdict, a default value is associated with a key as soon as the key is first accessed:
>>> d = defaultdict(list)
>>> 'a' in d[1]
False
>>> d
defaultdict(<class 'list'>, {1: []})
Given that your row is a list, you are appending a list to the list associated with the key.
To add the elements instead, you can do
d[rlabel] += row
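If each label only ever receives one row, as in the question, a defaultdict may not be needed at all. The sketch below is one possible alternative, not what either answer proposes: it numbers the rows with enumerate and uses io.StringIO as a stand-in for sample.txt so that it runs on its own.

```python
import csv
import io

# Stand-in for open("sample.txt"); the contents match the question's sample.
sample = io.StringIO("1,1,2,3\n2,3,2,4,4\n")

# enumerate(..., start=1) yields (label, row) pairs, which dict() accepts.
rows_by_label = dict(enumerate(csv.reader(sample), start=1))

print(rows_by_label)  # {1: ['1', '1', '2', '3'], 2: ['2', '3', '2', '4', '4']}
```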

Getting elements with dynamic range of list in python

I am trying to get the elements of a list in Python with a dynamic range, i.e. if I have two lists, ['9','e','s','t','1','2','3'] and ['9','e','1','2','3','s','t'], I need to access the three numbers starting with 1. So what I did was look up 1, take its index, and use that index to extract the desired values, i.e.
s_point = valueList.index('1')
print (valueList[s_point::3]
but it does not seem to work. However, with constant values like
print (valueList[1::3])
it seems to work just fine. Is there a way I could dynamically pass a range of list elements to extract from a list?

If you want the three items starting at the s_point index, you don't need the step (the ::step part of the slice), because it does something different. Just change your line to this:
valueList[s_point:s_point+3]
Output:
['1', '2', '3']
This takes the sublist of valueList from index s_point up to, but not including, index s_point+3.
As for what the step does, as other websites put it:
The step is a difference between each number in the result. The
default value of the step is 1 if not specified
For example:
valueList[::2]
Result:
['9', 's', '1', '3']
As you can see, the items at odd indices are not in the result.
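The two slice forms can be compared directly on the lists from the question. This is a minimal sketch; window, strided and other are illustrative names.

```python
valueList = ['9', 'e', 's', 't', '1', '2', '3']
s_point = valueList.index('1')            # 4

window = valueList[s_point:s_point + 3]   # start:stop, three consecutive items
strided = valueList[s_point::3]           # start::step, every third item from s_point

print(window)   # ['1', '2', '3']
print(strided)  # ['1']

other = ['9', 'e', '1', '2', '3', 's', 't']
print(other[other.index('1')::3])         # ['1', 's']
```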

Looks like you need
lst = ['9', 'e', '1', '2', '3', 's', 't']
print(lst[lst.index('1'):lst.index('1') + 3])
lst1 = ['9', 'e', 's', 't', '1', '2', '3']
print(lst1[lst1.index('1'):lst1.index('1') +3])
Output:
['1', '2', '3']
['1', '2', '3']

There are three problems
1) s_point = valueList.index('1')
print (valueList[s_point::3]
but it does not seem to work,
This is simply because you missed the closing parenthesis of the print call.
2) However, on const values like 1 in this example
print (valueList[1::3]) works
but it does not give the desired output; it prints every third element starting from index 1 rather than three consecutive values.
3) I am assuming that, where the list valuelist is defined, the letters are in single quotes.
Now for the actual solution part.
If you need the value 1, or any dynamic value x, together with the values that follow it in the list, you can use a function (here an anonymous function, a lambda) that accepts the value and a count and returns that many items starting at the value, as shown below.
>>> valuelist = [9, 'e', 's', 't', 1, 2, 3]
>>> result = lambda x,y: valuelist[valuelist.index(x):valuelist.index(x)+y]
>>> result(1,3)
[1, 2, 3]
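The same idea can also be written as a named function, which leaves room to handle a value that is not in the list. This is only a sketch: take_from is a hypothetical name, and returning an empty list for a missing value is an assumption about the desired behaviour.

```python
def take_from(values, start_value, count):
    """Return `count` items of `values`, beginning at the first `start_value`."""
    try:
        start = values.index(start_value)   # look the index up only once
    except ValueError:
        return []                           # assumption: missing value gives an empty result
    return values[start:start + count]


print(take_from(['9', 'e', 's', 't', '1', '2', '3'], '1', 3))  # ['1', '2', '3']
print(take_from(['9', 'e', '1', '2', '3', 's', 't'], '1', 3))  # ['1', '2', '3']
```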

Remove wildcard string from list

I have a list which is a large recurring dataset with headers of the form:
array = ['header = 1','0','1','2',...,'header = 1','1','2','3',...,'header = 2','1','2','3']
The header string can vary between individual datasets, but the size of the individual datasets does not.
I would like to remove all of the headers so that I am left with:
array = ['0','1','2',...,'1','2','3',...,'1','2','3']
If the header string does not vary, then I can remove them with:
lookup = array[0]
while True:
    try:
        array.remove(lookup)
    except ValueError:
        break
However, if the header strings do change, then they are not caught, and I am left with:
array = ['0','1','2',...,'1','2','3',...,'header = 2','1','2','3']
Is there a way in which the sub-string "header" can be removed, regardless of what else is in the string?

Best use a list comprehension with a condition instead of repeatedly removing elements. Also, use startswith instead of using a fixed lookup to compare to.
>>> array = ['header = 1','0','1','2','header = 1','1','2','3','header = 2','1','2','3']
>>> [x for x in array if not x.startswith("header")]
['0', '1', '2', '1', '2', '3', '1', '2', '3']
Note that this does not modify the existing list but creates a new one. It should still be considerably faster, as each single remove has O(n) complexity.
If you do not know what the header string is, you can still determine it from the first element:
>>> lookup = array[0].split()[0] # use first part before space
>>> [x for x in array if not x.startswith(lookup)]
['0', '1', '2', '1', '2', '3', '1', '2', '3']
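If other code holds a reference to the same list and the change has to be visible there too, slice assignment is one way to keep the comprehension while still updating the existing list. A small sketch, with a shortened version of the example data and a hypothetical alias name:

```python
array = ['header = 1', '0', '1', '2', 'header = 2', '1', '2', '3']
alias = array  # a second reference to the same list object

# Slice assignment replaces the contents of the existing list in place.
array[:] = [x for x in array if not x.startswith("header")]

print(array)           # ['0', '1', '2', '1', '2', '3']
print(alias is array)  # True
```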

Using the find() method, you can determine whether or not the word "header" is contained in a list item and use that to decide whether or not to remove the item.
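One way this suggestion could look in code is sketched below, applied to every item of a shortened version of the example data. It assumes that no data value contains the word "header"; without_headers is an illustrative name.

```python
array = ['header = 1', '0', '1', '2', 'header = 2', '1', '2', '3']

# str.find returns the index of the first occurrence, or -1 if absent.
without_headers = [x for x in array if x.find("header") == -1]

print(without_headers)  # ['0', '1', '2', '1', '2', '3']
```

The condition "header" not in x expresses the same test and is the more common spelling when the position is not needed.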

how to match items between 2 different lists

I have 2 different lists:
['2', '1']
['equals', 'x']
I want to match the items so that 2 = "equals" and 1 = "x", in order to recreate the original sentence "x equals x". I also have a third list, which is:
['1', '2', '1']
I need the third list to recreate the original sentence, since it has all the positions. To do this I thought of making the numbers equal to the words, such as 1 = "x", and printing the list of numbers in order to get the full sentence. The problem is that I do not know how to make the numbers equal to the words. Thanks in advance for the help.

A dictionary, which maps keys to values, might be what you need here. You can create a dictionary from the first two lists by zipping them. With this dictionary, it should be fairly straightforward to map any list of numbers to words:
mapping = dict(zip(['2', '1'], ['equals', 'x']))
mapping
# {'1': 'x', '2': 'equals'}
[mapping.get(num) for num in ['1', '2', '1']]
# ['x', 'equals', 'x']
To make the list a sentence, use the join method:
" ".join(mapping.get(num) for num in ['1', '2', '1'])
# 'x equals x'
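The steps above can be wrapped into one small function. This is a sketch under the assumption that every position has a matching key; rebuild_sentence and its parameter names are hypothetical.

```python
def rebuild_sentence(positions, keys, words):
    """Map each position to its word and join the words with spaces."""
    mapping = dict(zip(keys, words))
    # mapping[p] raises KeyError for an unknown position instead of
    # passing None on to join().
    return " ".join(mapping[p] for p in positions)


sentence = rebuild_sentence(['1', '2', '1'], ['2', '1'], ['equals', 'x'])
print(sentence)  # x equals x
```

Indexing with mapping[p] rather than mapping.get(p) is a deliberate choice here: get would return None for a missing key, and join would then fail with a less helpful TypeError.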
