How can I break this one-line code into multiple descriptive lines? I am unable to understand it as a one-liner.
data = formatted_data + "|" + '|'.join(["{}".format(a) for b, a in sorted(values.items()) if a and b not in ['SecureHash']])
Is this correct? Can anyone help me:
for b, a in sorted(values.items()):
    if a and b not in ['SecureHash']:
        c = ["{}".format(a)]
        data = formatted_data + "|" + "|".join(c)
This code is collecting the string representation of each matching a, and then building another string from them.
You need to define the list separately, since that is what the list comprehension expression produces:
c = ["{}".format(a) for b, a in sorted(values.items()) if a and b not in ['SecureHash']]
Further, to break down how c is being assembled, you can expand the list comprehension:
c = []
for b, a in sorted(values.items()):
    if a and b not in ['SecureHash']:
        c.append('{}'.format(a))
Finally, just combine the three parts:
data = formatted_data + "|" + "|".join(c)
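As a quick sanity check, the sketch below runs the original one-liner and the expanded loop side by side and confirms they build the same string. The formatted_data and values used here are made-up stand-ins, not the real inputs.

```python
# Hypothetical sample inputs; the real formatted_data and values come from elsewhere.
formatted_data = "HEADER"
values = {"b_amount": 100, "a_id": "X1", "SecureHash": "abc123", "c_note": ""}

# Original one-liner
one_liner = formatted_data + "|" + '|'.join(
    ["{}".format(a) for b, a in sorted(values.items()) if a and b not in ['SecureHash']])

# Expanded version
c = []
for b, a in sorted(values.items()):  # items sorted by key
    # skip falsy values (e.g. "" or 0) and the SecureHash entry
    if a and b not in ['SecureHash']:
        c.append('{}'.format(a))
expanded = formatted_data + "|" + "|".join(c)

print(one_liner)              # HEADER|X1|100
print(one_liner == expanded)  # True
```

Note that uppercase letters sort before lowercase ones, so 'SecureHash' comes first in the sorted items and is then filtered out.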
Well, generally you can treat any opening bracket and the + operators in the expression as "breaking points". Working with your example:
data = formatted_data
data += "|"
data += '|'.join(["{}".format(a) for b, a in sorted(values.items()) if a and b
not in ['SecureHash']])
OK so now we need to unpack what's happening in that join:
data = formatted_data
data += "|"
jointmp = ["{}".format(a) for b, a in sorted(values.items()) if a and b not in ['SecureHash']]
data += '|'.join(jointmp)
OK, so we've got some string formatting and a list comprehension. Expanding the comprehension into a loop:
data = formatted_data
data += "|"
jointmp = []
for b, a in sorted(values.items()):
    if a and b not in ['SecureHash']:
        jointmp += ["{}".format(a)]  # Equivalent to str(a) for most types
data += '|'.join(jointmp)
To do the last stage, there was a lot of going back and forth as things were expanded. Those list comprehensions are quite terse...
There are some questions here though:
Where did values come from?
What's the "{}".format(a) for?
etc.
Your "expanded" code is not quite equivalent: if there are no matches (or values is empty), data is never assigned, and when there are matches you replace data on each iteration rather than growing it.
You might want to read up on list comprehensions.
Basically, it's making a list of things like this:
[item for item in iterable_thing]
So this one is making a list of strings ("{}".format(a)). I assume a is a hash, but let's pretend it's a number in a range:
["{}".format(a) for a in range(5)]
will make:
['0', '1', '2', '3', '4']
Comprehensions can become quite complicated with the addition of if statements, and whoever wrote this code is one of those a in b in i in j kind of people, it seems, so their code is hard to follow. Good variable names are SO important.
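To illustrate the point about names, here is a sketch of the same logic with descriptive names (join_fields, key, value and field_strings are our own choices, not from the original code). Two small simplifications are assumed: b not in ['SecureHash'] is the same as key != 'SecureHash', and for most value types "{}".format(value) gives the same result as str(value).

```python
def join_fields(formatted_data, values):
    """Append every truthy value except SecureHash, sorted by key, separated by '|'."""
    field_strings = [
        str(value)                          # same as "{}".format(value) for most types
        for key, value in sorted(values.items())
        if value and key != 'SecureHash'    # drop empty values and the hash itself
    ]
    return formatted_data + "|" + "|".join(field_strings)

print(join_fields("HEADER", {"amount": 100, "SecureHash": "abc123", "note": ""}))
# HEADER|100
```

With an empty values dictionary this returns "HEADER|", which is the case the hand-expanded loop in the question did not cover.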
How do you concat numerous lists altogether into one single list, when there's a collection of lists assigned to only one variable already?
Most advice online shows how to concatenate two or more variables, but I have only a single variable that is reassigned to many lists. I attempted a nested for loop, but it resulted in duplicates and incoherent lists. I also tried the extend and append methods with no success. Maybe I should approach this with a DataFrame?
Any help is much appreciated. If you have questions, feel free to ask.
Actual Code:
from bs4 import BeautifulSoup as bs
import requests
import re
from time import sleep
from random import randint
def price():
    baseURL='https://www.apartmentlist.com/ca/redwood-city'
    r=requests.get(baseURL)
    soup=bs(r.content,'html.parser')
    block=soup.find_all('div',class_='css-1u6cvl9 e1k7pw6k0')
    sleep(randint(2,10))
    for properties in block:
        priceBlock=properties.find_all('div',class_="css-q23zey e131nafx0")
        price=[price.text for price in priceBlock]
        strPrice=''.join(price) #Change from list to string type
        removed=r'[$]' #Select and remove $ characters
        removed2=r'Ask' #Select and remove Ask
        removed3=r'[,]' #Select and remove comma
        modPrice=re.sub(removed,' ',strPrice) #Substitute $ for '_'
        modPrice2=re.sub(removed2,' 0',modPrice) #Substitute Ask for _0
        modPrice3=re.sub(removed3,'',modPrice2) #Eliminate space within price
        segments=modPrice3.split() #Change string with updates into list, remain clustered
        for inserts in segments:
            newPrice=[inserts] #Returns values from string to list by brackets.
            print(newPrice)
price()
Actual Output:
#After executing the program
['2157']
['2805']
['0']
['1875']
['2800']
['2265']
['2735']
['3985']
...
...
Desired output:
['2157', '2805', '0', '1875', '2800', ...]  # all assigned to a single variable
Again, any help is appreciated.
The issue in your code is that the "for inserts in segments" loop takes each price, puts it in its own list, and then prints that one-element list. Instead, you need to add all prices to the same list and output it after the loop.
In your case you can use a list comprehension like this to achieve what you want:
from bs4 import BeautifulSoup as bs
import requests
import re
from time import sleep
from random import randint
def price():
    baseURL='https://www.apartmentlist.com/ca/redwood-city'
    r=requests.get(baseURL)
    soup=bs(r.content,'html.parser')
    block=soup.find_all('div',class_='css-1u6cvl9 e1k7pw6k0')
    sleep(randint(2,10))
    result = []
    for properties in block:
        priceBlock=properties.find_all('div',class_="css-q23zey e131nafx0")
        price=[price.text for price in priceBlock]
        strPrice=''.join(price) #Change from list to string type
        removed=r'[$]' #Select and remove $ characters
        removed2=r'Ask' #Select and remove Ask
        removed3=r'[,]' #Select and remove comma
        modPrice=re.sub(removed,' ',strPrice) #Substitute $ for '_'
        modPrice2=re.sub(removed2,' 0',modPrice) #Substitute Ask for _0
        modPrice3=re.sub(removed3,'',modPrice2) #Eliminate space within price
        segments=modPrice3.split() #Change string with updates into list, remain clustered
        result += [insert for insert in segments]
    print(result)
price()
(Hopefully I understood the problem)
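The key change in the answer above is creating result once, before the loop, and adding to it on every iteration. Here is a network-free sketch of just the cleaning-and-collecting step; clean_prices is a hypothetical helper, and the input strings are made-up examples of what the scraped price text might look like (the real page markup may differ).

```python
import re

def clean_prices(price_texts_per_property):
    """Collect every cleaned price from every property into one flat list."""
    result = []  # a single list shared by all iterations
    for price_texts in price_texts_per_property:
        str_price = ''.join(price_texts)
        mod_price = re.sub(r'[$]', ' ', str_price)   # '$' -> space, separating prices
        mod_price = re.sub(r'Ask', ' 0', mod_price)  # 'Ask' -> ' 0'
        mod_price = re.sub(r'[,]', '', mod_price)    # drop thousands separators
        result.extend(mod_price.split())             # add this property's prices
    return result

sample = [['$2,157', '$2,805'], ['Ask'], ['$1,875']]  # hypothetical scraped text
print(clean_prices(sample))
# ['2157', '2805', '0', '1875']
```

Using extend (or +=) adds the individual strings; append would add each segments list as a nested sublist instead.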
If each of the sublists is a variable, you can do one of the following to convert them into a single list:
a = ['2157']
b = ['2805']
c = ['0']
d = ['1875']
e = ['2800']
f = ['2265']
g = ['2735']
h = ['3985']
#Pythonic Way
test = [i[0] for i in [a, b, c, d, e, f, g, h]]
print(test)
#Detailed Way
check = []
for i in a, b, c, d, e, f, g, h:
    check.append(i[0])
print(check)
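If your function returns a list of lists, you would just modify the for loops to iterate over your function's return value (note that i[0] keeps only the first element of each sublist, which works here because every sublist has exactly one item):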
#Pythonic Way
test = [i[0] for i in YOUR_FUNCTION()]
print(test)
#Detailed Way
check = []
for i in YOUR_FUNCTION():
    check.append(i[0])
print(check)
How do you concat numerous lists altogether into one single list, when there's a collection of lists assigned to only one variable already?
In Python, it's common to flatten a list of lists either with a list comprehension or itertools.chain.
from itertools import chain
prices = [
['2157'],
['2805'],
['0'],
['1875'],
['2800'],
['2265'],
['2735'],
['3985'],
]
# list comprehension
[x for row in prices for x in row]
# ['2157', '2805', '0', '1875', '2800', '2265', '2735', '3985']
# itertools.chain returns a lazy iterator, not a list
chain.from_iterable(prices)
# <itertools.chain at 0x7f01573076a0>
# if you want a list back, call list
list(chain.from_iterable(prices))
# ['2157', '2805', '0', '1875', '2800', '2265', '2735', '3985']
For your code above the price function is only printing output and not returning an object. You could have the function create an empty list and add to the list each time you loop through the properties. Then return the list.
def price():
    # web scrape code
    new_price = []
    for properties in block:
        # processing code
        new_price.append(segments)  # one sublist per property
    return list(chain.from_iterable(new_price))  # flatten into a single list
I am trying to obtain a list of the names of selected nodes with Python in Nuke.
I have tried:
for s in nuke.selectedNodes():
    n = s['name'].value()
    print(n)
This gives me the names of the selected nodes, but as separate strings. There is nothing I can do to them that will combine each string. If I have three Merges selected, in the Nuke script editor I get:
Result: Merge3
Merge2
Merge1
If I wrap the last variable n in brackets, I get:
Result: ['Merge3']
['Merge2']
['Merge1']
That's how I know they are separate strings. I found one other way to return selected nodes. I used:
s = nuke.tcl("selected_nodes")
print(s)
I get odd names back like node3a7c000, but these names work in anything that calls a node, like nuke.toNode(), and they are all on one line. I tried to force these results into a list or a tuple, like so:
s = nuke.tcl("selected_nodes")
print(s)
Result: node3a7c000 node3a7c400 node3a7c800
s = nuke.tcl("selected_nodes")
s2 = s.replace(" ", "', '")
s3 = "(" + "'" + s2 + "'" + ")"
print(s3)
Result: ('node3a7c000', 'node3a7c400', 'node3a7c800')
My result looks like a standard tuple, but if I try to get the first value from it, I get a parenthesis back, as if my created tuple is still a string.
Is there anything I can do to gather a list or tuple of the selected nodes' names? I'm not sure what I am doing wrong, and it seems that my last solution should have worked.
As you iterate over each node, you'll want to add its name to a list ([]), and then return that. For instance:
names = []
for s in nuke.selectedNodes():
    n = s['name'].value()
    names.append(n)
print(names)
This will give you:
# Result: ['Merge3', 'Merge2', 'Merge1']
If you're familiar with list comprehensions, you can also use one to make names in one line:
names = [s['name'].value() for s in nuke.selectedNodes()]
nodename = list()
for node in nuke.selectedNodes():
    nodename.append(node.name())
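On the tuple attempt in the question: s3 is still a str, because adding quotes and parentheses to a string only produces another string, and indexing a string returns characters, which is why the first element is '('. Assuming nuke.tcl("selected_nodes") returns a plain space-separated string, as the printed output in the question suggests, str.split() turns it into a real list. This sketch uses a literal string in place of the Nuke call so it runs outside Nuke.

```python
# Stand-in for nuke.tcl("selected_nodes"); inside Nuke you would use the real call.
s = "node3a7c000 node3a7c400 node3a7c800"

# Building a string that *looks* like a tuple still gives a string
s3 = "(" + "'" + s.replace(" ", "', '") + "'" + ")"
print(type(s3).__name__, s3[0])  # str (

# split() on whitespace returns an actual list of names
names = s.split()
print(names)         # ['node3a7c000', 'node3a7c400', 'node3a7c800']
print(names[0])      # node3a7c000
print(tuple(names))  # a real tuple, if one is preferred
```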
I have a tab-delimited input.txt file like this:
A B C
A B D
E F G
E F T
E F K
These are tab-delimited.
I want to remove duplicates only when multiple rows have the same 1st and 2nd columns.
So, even though the 1st and 2nd rows differ in the 3rd column, they have the same 1st and 2nd columns, so I want to remove "A B D", which appears later.
So output.txt should look like this:
A B C
E F G
If I wanted to remove rows that are complete duplicates, I would just convert the rows into a set, and I'd be all set.
But now I am trying to remove duplicates using only some of the columns.
In Excel, it's just so easy:
Data -> Remove Duplicates -> Select columns
In MATLAB, it's easy, too:
import input.txt -> Use "unique" function with respect to 1st and 2nd columns -> Remove the rows numbered "1"
But in Python, I couldn't find how to do this, because all I knew about removing duplicates was using set.
===========================
This is what I tried, following undefined_is_not_a_function's answer.
I am not sure how to write the result to output.txt, or how to alter the code to let me specify the columns to use for duplicate removal (like 3 and 5).
import sys
input_path = sys.argv[1]  # path to input.txt
seen = set()
data = []
with open(input_path) as f:
    for line in f.read().splitlines():
        key = tuple(line.split(None, 2)[:2])  # first two columns
        if key not in seen:
            data.append(line)
            seen.add(key)
You should use itertools.groupby for this. Here I am grouping the data based on the first two columns and then using next() to get the first item from each group.
>>> from itertools import groupby
>>> s = '''A B C
... A B D
... E F G
... E F T
... E F K'''
>>> for k, g in groupby(s.splitlines(), key=lambda x: x.split()[:2]):
...     print(next(g))
...
A B C
E F G
Simply replace s.splitlines() with the file object if the input is coming from a file.
Note that the above solution works only if the data is sorted by the first two columns (more precisely, if rows with the same key are adjacent). If that's not the case, you'll have to use a set instead:
>>> from operator import itemgetter
>>> ig = itemgetter(0, 1)  # Pass any column numbers you want; note that indexing starts at 0
>>> s = '''A B C
... A B D
... E F G
... E F T
... E F K
... A B F'''
>>> seen = set()
>>> data = []
>>> for line in s.splitlines():
...     key = ig(line.split())
...     if key not in seen:
...         data.append(line)
...         seen.add(key)
...
>>> data
['A B C', 'E F G']
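The question also asked how to pick other columns (say the 3rd and 5th) and write the result to output.txt. Here is a sketch that combines the set approach with itemgetter and the csv module's tab delimiter; dedupe_by_columns, dedupe_file and the file names are assumptions, and column numbers are 0-based, so the 3rd and 5th columns are 2 and 4.

```python
import csv
from operator import itemgetter


def dedupe_by_columns(rows, columns):
    """Keep the first row for each distinct combination of the given 0-based columns."""
    key_of = itemgetter(*columns)  # builds the key from the chosen columns
    seen = set()
    kept = []
    for row in rows:
        key = key_of(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def dedupe_file(in_path, out_path, columns=(0, 1)):
    """Read a tab-delimited file, drop rows with repeated keys, write the rest."""
    with open(in_path, newline='') as fin:
        # skip blank lines, which csv.reader yields as empty lists
        rows = [row for row in csv.reader(fin, delimiter='\t') if row]
    with open(out_path, 'w', newline='') as fout:
        writer = csv.writer(fout, delimiter='\t', lineterminator='\n')
        writer.writerows(dedupe_by_columns(rows, columns))
```

For example, dedupe_file('input.txt', 'output.txt') uses the first two columns as the key, and dedupe_file('input.txt', 'output.txt', columns=(2, 4)) would use the 3rd and 5th columns, assuming every row has at least five columns.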
If you have access to a Unix system, sort is a nice utility made for exactly this problem (note that the output comes back sorted rather than in the original order).
sort -u -t$'\t' --key=1,2 filein.txt
I know this is a Python question, but sometimes Python is not the right tool for the task. And you can always embed a system call in your Python script.
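For instance, the sort call can be run from Python with subprocess. This is a sketch assuming a Unix-like system with sort on the PATH; since no shell is involved, a literal tab character is passed to -t instead of bash's $'\t'. The helper name sort_unique_first_two is hypothetical, and it feeds the text through standard input rather than naming a file.

```python
import subprocess

def sort_unique_first_two(text):
    """Run `sort -u` keyed on tab-separated fields 1-2 and return the output lines."""
    completed = subprocess.run(
        ['sort', '-u', '-t', '\t', '-k', '1,2'],  # '\t' here is an actual tab
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sort fails
    )
    return completed.stdout.splitlines()
```

To process a file directly, the path can be appended to the argument list instead of passing input=text.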
You can do it with the code below:
lst = []
with open('yourfile.txt') as file_:
    for each_line in file_.read().split('\n'):
        li = each_line.split()
        if li:  # skip blank lines, e.g. after a trailing newline
            lst.append(li)
dic = {}
for l in lst:
    if (l[0], l[1]) not in dic:
        dic[(l[0], l[1])] = l[2]
print(dic)
Sorry for the variable names.
Assuming that you have already read your file and that you have a list named rows (tell me if you need help with that), the following code should work:
entries = set()
keys = set()
for row in rows:
    key = (row[0], row[1])  # Only the first two columns
    if key not in keys:
        keys.add(key)
        entries.add((row[0], row[1], row[2]))
Please note that I am not an expert, but I still have ideas that may help you.
There is a csv module for CSV (and other delimited) files; it may be worth a look.
First, I would ask: how are you storing the data? In a list?
something like
[[A,B,C],
[A,B,D],
[E,F,G],...]
That could be suitable (maybe not the best choice).
Second, is it possible to go through the whole list? You can simply take a line and compare it to all the other lines.
I would do this, supposing rows contains the lists of letters:
to_remove = set()
for i in range(len(rows)):
    for j in range(i + 1, len(rows)):  # compare only with later rows
        if rows[i][0] == rows[j][0] and rows[i][1] == rows[j][1]:
            to_remove.add(j)  # row j repeats an earlier row's first two columns
for j in sorted(to_remove, reverse=True):  # pop from the end so earlier indices stay valid
    rows.pop(j)
This is the simplest idea for your task, though not the most efficient one: it performs a quadratic number of comparisons, so it will take a while on large inputs.
Edit: use pop, not remove.
I am fairly new to Python. I have a list as follows:
sorted_x = [('pvg-cu2', 50.349189), ('hkg-pccw', 135.14921), ('syd-ipc', 163.441705), ('sjc-inap', 165.722676)]
I am trying to write a regex that will remove everything after the '-' and before the ',', i.e. I need the same list to look as below:
[('pvg', 50.349189), ('hkg', 135.14921), ('syd', 163.441705), ('sjc', 165.722676)]
I have written a regex as follows:
import re
for i in range(len(sorted_x)):
    title_search = re.search(r'^\(\'(.*)-(.*)\', (.*)\)$', str(sorted_x[i]), re.IGNORECASE)
    if title_search:
        title = title_search.group(1)
        time = title_search.group(3)
But this would require me to create two new lists, whereas I want to change my original list.
Can you please suggest a simple way to modify my original list without creating a new list?
result = [(a.split('-', 1)[0], b) for a, b in sorted_x]
Example:
>>> sorted_x = [('pvg-cu2', 50.349189), ('hkg-pccw', 135.14921), ('syd-ipc', 163.441705), ('sjc-inap', 165.722676)]
>>> [(a.split('-', 1)[0], b) for a, b in sorted_x]
[('pvg', 50.349189), ('hkg', 135.14921), ('syd', 163.441705), ('sjc', 165.722676)]
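If the goal really is to change the original list object rather than bind a new name, slice assignment does that. The comprehension still builds a temporary list, and every element is replaced by a new tuple because tuples are immutable, but any other reference to the list sees the change. The name value below is our own choice.

```python
sorted_x = [('pvg-cu2', 50.349189), ('hkg-pccw', 135.14921),
            ('syd-ipc', 163.441705), ('sjc-inap', 165.722676)]
alias = sorted_x  # a second reference to the same list object

# Replace the list's contents in place; split('-', 1)[0] keeps the text before the first '-'
sorted_x[:] = [(name.split('-', 1)[0], value) for name, value in sorted_x]

print(sorted_x)
# [('pvg', 50.349189), ('hkg', 135.14921), ('syd', 163.441705), ('sjc', 165.722676)]
print(alias is sorted_x)  # True: the original list object was modified
```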
I'm new to these boards and understand there is a protocol; any critique is appreciated. I began Python programming a few days ago and am trying to play catch-up. The basis of the program is to read a file and convert each occurrence of a specific string into a dictionary of positions within the document. Issues abound; I'll take all responses.
Here is my code:
f = open('C:\CodeDoc\Mm9\sampleCpG.txt', 'r')
cpglist = f.read()
def buildcpg(cpg):
    return "\t".join(["%d" % (k) for k in cpg.items()])
lookingFor = 'CG'
i = 0
index = 0
cpgdic = {}
try:
    while i < len(cpglist):
        index = cpglist.index(lookingFor, i)
        i = index + 1
        for index in range(len(cpglist)):
            if index not in cpgdic:
                cpgdic[index] = index
        print (buildcpg(cpgdic))
except ValueError:
    pass
f.close()
The cpgdic is supposed to act as a dictionary of the positions obtained from index. Each index found should be entered into cpgdic as a new value, and print(buildcpg(cpgdic)) is my hunch for where the logic fails. I believe(??) it passes cpgdic into the buildcpg function, which should return all the positions of 'CG' as output; however, the error "TypeError: not all arguments converted during string formatting" shows up. Your turn!
P.S. This eats all of my 2 GB of memory; I need to improve with much more reading.
cpg.items() yields (key, value) tuples. As such, k is a tuple of length 2, and you're trying to format it with a single %d placeholder.
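A small reproduction, using a made-up cpg dictionary in place of the real one: formatting a 2-tuple with one %d raises exactly the error from the question, and one possible fix is to format only the keys (the positions).

```python
cpg = {0: 0, 5: 5, 10: 10}  # hypothetical positions

# Each item is a (key, value) tuple, so "%d" % (k) receives two values
# for one placeholder and raises TypeError.
try:
    "\t".join(["%d" % (k) for k in cpg.items()])
except TypeError as exc:
    print(exc)  # not all arguments converted during string formatting

def buildcpg(cpg):
    # Format only the keys; sorted() keeps the positions in ascending order
    return "\t".join("%d" % k for k in sorted(cpg))

print(buildcpg(cpg))  # 0, 5 and 10 separated by tabs
```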
As a side note, you can leave off the [ and ] in the join line, which turns your list comprehension into a generator expression. (For str.join specifically this saves little, since join builds a list from its argument internally anyway.) If you're on Python 2.x, you could use cpg.iteritems() instead of cpg.items() to avoid building an intermediate list.
It also makes little sense to store a dictionary where the keys and the values are the same. In this case, a simple list is probably more elegant. I would probably write the code this way:
with open(r'C:\CodeDoc\Mm9\sampleCpG.txt') as fin:
    cpgtxt = fin.read()
indices = [i for i, _ in enumerate(cpgtxt) if cpgtxt[i:i+2] == 'CG']
print('\t'.join(str(i) for i in indices))
Here it is in action:
>>> s = "CGFOOCGBARCGBAZ"
>>> indices = [i for i,_ in enumerate(s) if s[i:i+2] == 'CG']
>>> print(indices)
[0, 5, 10]
Note that
i for i, _ in enumerate(s)
is roughly the same thing as
i for i in range(len(s))
except that I don't like range(len(s)), and enumerate works with any iterable, not just sequences (though the slice s[i:i+2] in this particular comprehension still needs a sequence).
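The question's original idea, repeatedly searching from a moving start position, also works once each position found is appended to a list and the loop stops when no match remains. This sketch uses str.find, which returns -1 instead of raising ValueError the way str.index does; find_all is a hypothetical helper name. Unlike the enumerate version, it jumps straight from one match to the next instead of testing every position.

```python
def find_all(text, pattern):
    """Return every start index of pattern in text, including overlapping matches."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)  # resume one past the last match
    return positions

print(find_all("CGFOOCGBARCGBAZ", "CG"))  # [0, 5, 10]
```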