I have scraped a list of prices from a site and want to get their average. Correct me if I am wrong, but my assumption is that the data must not contain dollar signs before it can be added up to get the total, which is then used to compute the average price of the list.
My attempts include but are not limited to using a for loop to slice the 0 index off each list item.
for i in clean:
    i = i[1:]
I also originally tried just running it without creating a variable, but that does literally nothing to the output when printing the clean list:
for i in clean:
    i = i[1:]
Example of the list I currently have:
clean = ['$123.56', '$234.56', '$561.12']
What I would like the output of the cleaned up list to be:
[123.56, 234.56, 561.12]
You don't actually have to use enumerate. Here is a very simple solution to your problem.
clean = ['$123.56', '$234.56', '$561.12']
result = []
for i in clean:
    result.append(float(i[1:]))
print(result) # [123.56, 234.56, 561.12]
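The loop in the question changes nothing because `i = i[1:]` only rebinds the loop variable; the list itself is never touched. Once the values are floats, the average the question was after is a short step away. A minimal sketch, assuming every item starts with a single `$` and has no thousands separators (`prices` and `average` are names picked for this example):

```python
from statistics import mean

clean = ['$123.56', '$234.56', '$561.12']

# Drop the leading "$" from each string and convert the rest to float.
prices = [float(item[1:]) for item in clean]

# mean() adds the values up and divides by their count.
average = mean(prices)
print(round(average, 2))
```

`sum(prices) / len(prices)` would work just as well if you prefer to avoid the import.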
You should use the library 're':
import re
trim = re.compile(r'[^\d.,]+')
my_string = '$12.56' #works also with USD or other currency, with space or not
result = trim.sub('',my_string)
print(result)
>>> 12.56
For a list:
my_list = ['$123.56', '$234.56', '$561.12']
list_without_currency = [float(trim.sub('',e)) for e in my_list]
>>> [123.56, 234.56, 561.12]
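One thing to watch with the pattern above: it keeps commas, so a price such as `$1,234.56` would come out as `1,234.56`, which `float()` rejects. A hedged variant, assuming commas only ever appear as thousands separators (the sample values and the `to_number` helper are made up for illustration):

```python
import re

# Same idea as above, but commas are stripped along with everything else
# that is not a digit or a dot.
trim = re.compile(r'[^\d.]+')

def to_number(text):
    """Remove everything except digits and dots, then convert to float."""
    return float(trim.sub('', text))

samples = ['$1,234.56', 'USD 99.90', '561.12 $']
numbers = [to_number(s) for s in samples]
print(numbers)
```

This would not be right for locales that use a comma as the decimal mark, so check what the scraped site actually emits first.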
How do you concat numerous lists altogether into one single list, when there's a collection of lists assigned to only one variable already?
Most advice online shows two or more variables being concatenated, but mine is a single variable assigned to many lists. I attempted a nested for loop, but it resulted in duplicates and incoherent lists. I also tried the extend and append functions with no success. Maybe I should approach this with a DataFrame?
Any help is much appreciated. If you have questions, feel free to ask.
Actual Code:
from bs4 import BeautifulSoup as bs
import requests
import re
from time import sleep
from random import randint
def price():
    baseURL='https://www.apartmentlist.com/ca/redwood-city'
    r=requests.get(baseURL)
    soup=bs(r.content,'html.parser')
    block=soup.find_all('div',class_='css-1u6cvl9 e1k7pw6k0')
    sleep(randint(2,10))
    for properties in block:
        priceBlock=properties.find_all('div',class_="css-q23zey e131nafx0")
        price=[price.text for price in priceBlock]
        strPrice=''.join(price) #Change from list to string type
        removed=r'[$]' #Select and remove $ characters
        removed2=r'Ask' #Select and remove Ask
        removed3=r'[,]' #Select and remove comma
        modPrice=re.sub(removed,' ',strPrice) #Substitute $ for '_'
        modPrice2=re.sub(removed2,' 0',modPrice) #Substitute Ask for _0
        modPrice3=re.sub(removed3,'',modPrice2) #Eliminate space within price
        segments=modPrice3.split() #Change string with updates into list, remain clustered
        for inserts in segments:
            newPrice=[inserts] #Returns values from string to list by brackets.
            print(newPrice)
price()
Actual Output:
#After executing the program
['2157']
['2805']
['0']
['1875']
['2800']
['2265']
['2735']
['3985']
...
...
Attempt for:
['2157', '2805', '0', '2800',...] # all the while assigned to a single variable.
Again, any help is appreciated.
The issue in your code is that the "for inserts in segments" loop takes each price, puts it in its own list, and then prints that one-element list. Instead, you need to add all prices to the same list and print it once after the loop.
In your case you can use a list comprehension like this to achieve what you want:
from bs4 import BeautifulSoup as bs
import requests
import re
from time import sleep
from random import randint
def price():
    baseURL='https://www.apartmentlist.com/ca/redwood-city'
    r=requests.get(baseURL)
    soup=bs(r.content,'html.parser')
    block=soup.find_all('div',class_='css-1u6cvl9 e1k7pw6k0')
    sleep(randint(2,10))
    result = []
    for properties in block:
        priceBlock=properties.find_all('div',class_="css-q23zey e131nafx0")
        price=[price.text for price in priceBlock]
        strPrice=''.join(price) #Change from list to string type
        removed=r'[$]' #Select and remove $ characters
        removed2=r'Ask' #Select and remove Ask
        removed3=r'[,]' #Select and remove comma
        modPrice=re.sub(removed,' ',strPrice) #Substitute $ for '_'
        modPrice2=re.sub(removed2,' 0',modPrice) #Substitute Ask for _0
        modPrice3=re.sub(removed3,'',modPrice2) #Eliminate space within price
        segments=modPrice3.split() #Change string with updates into list, remain clustered
        result += [insert for insert in segments]
    print(result)
price()
(Hopefully I understood the problem)
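The text-cleaning part can also be tried without touching the network. The sketch below pulls those steps into a helper and collects everything into one list. `parse_prices` and the sample strings are invented for illustration, and it only assumes each block's text looks something like `$2,157$2,805Ask`. It uses `str.replace` in place of `re.sub`, since the patterns are literal text:

```python
def parse_prices(text):
    """Turn a string such as '$2,157$2,805Ask' into ['2157', '2805', '0']."""
    text = text.replace('$', ' ')      # "$" becomes a separator
    text = text.replace('Ask', ' 0')   # "Ask" (no price listed) becomes 0
    text = text.replace(',', '')       # drop thousands separators
    return text.split()

# Stand-in for the text scraped from each property block (hypothetical data).
blocks = ['$2,157$2,805Ask', '$1,875', '$2,800$2,265']

result = []
for block_text in blocks:
    # One shared list, extended on every iteration and printed once at the end.
    result.extend(parse_prices(block_text))

print(result)
```

Keeping the parsing separate from the scraping also makes it easier to check the cleanup logic when the site's markup changes.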
If each of the sublists is a variable, you can do one of the following to convert them into a single list:
a = ['2157']
b = ['2805']
c = ['0']
d = ['1875']
e = ['2800']
f = ['2265']
g = ['2735']
h = ['3985']
#Pythonic Way
test = [i[0] for i in [a, b, c, d, e, f, g, h]]
print(test)
#Detailed Way
check = []
for i in a,b,c,d,e,f,g,h:
    check.append(i[0])
print(check)
If your function creates lists, then you would just modify the for loops to reference your function:
#Pythonic Way
test = [i[0] for i in YOUR_FUNCTION()]
print(test)
#Detailed Way
check = []
for i in YOUR_FUNCTION():
    check.append(i[0])
print(check)
How do you concat numerous lists altogether into one single list, when there's a collection of lists assigned to only one variable already?
In Python, it's common to flatten a list of lists either with a list comprehension or itertools.chain.
from itertools import chain
prices = [
['2157'],
['2805'],
['0'],
['1875'],
['2800'],
['2265'],
['2735'],
['3985'],
]
# list comprehension
[x for row in prices for x in row]
>>> ['2157', '2805', '0', '1875', '2800', '2265', '2735', '3985']
# itertools.chain returns a lazy iterator, not a list
chain.from_iterable(prices)
>>> <itertools.chain at 0x7f01573076a0>
# if you want a list back call list
list(chain.from_iterable(prices))
>>> ['2157', '2805', '0', '1875', '2800', '2265', '2735', '3985']
For your code above the price function is only printing output and not returning an object. You could have the function create an empty list and add to the list each time you loop through the properties. Then return the list.
def price():
    # web scrape code
    new_price = []
    for properties in block:
        # processing code
        new_price += [x for x in segments]
    # new_price is already a flat list of strings, so return it as-is
    return new_price
I am trying to put my results into a list.
Here is my code:
from ise import ERS
l = ise.get_endpoints(groupID=my_group_id)['response']
Here is my output:
[('AA:BB:CD', 'cvr5667'), ('AA:BB:CC', '8888')]
Here is my desired output, which is a list of just the first element inside each pair of parentheses:
['AA:BB:CD','AA:BB:CC']
I am new to Python and to working with lists/dicts, so any suggestions would be appreciated. All I am trying to do is put the first element inside each pair of parentheses into one list, as shown above.
Using list comprehension (as suggested in comments too):
lst_1 = [('AA:BB:CD', 'cvr5667'), ('AA:BB:CC', '8888')]
lst_result = [i[0] for i in lst_1]
Something like this?
result = [('AA:BB:CD', 'cvr5667'), ('AA:BB:CC', '8888')]
first_elements_to_list = [tmp[0] for tmp in result]
print(first_elements_to_list)
This prints:
['AA:BB:CD', 'AA:BB:CC']
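Two other spellings of the same idea, in case they read better to you. This is only a sketch and assumes every item is a tuple of exactly two elements and that the list is not empty (`pairs`, `macs` and the column names are made up for the example):

```python
pairs = [('AA:BB:CD', 'cvr5667'), ('AA:BB:CC', '8888')]

# Unpack each tuple; "_" marks the second element as unused.
macs = [mac for mac, _ in pairs]

# zip(*pairs) transposes the pairs into two tuples, one per position.
first_column, second_column = zip(*pairs)

print(macs)
print(list(first_column))
```

The `zip(*pairs)` form is handy when you want both columns at once, but the unpacking into two names fails on an empty list, so the comprehension is the safer default.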
I am trying to write a Python program as following:
list_a = []
list_b = []
list_c = []
listName = str(input('insert list name: '))
listValue = int(input('insert value: '))
listName.append(listValue)
But unfortunately "listName.append()" won't work, because listName is a string rather than one of the lists.
Using only if statements, as in:
if listName == 'list_a':
list_a.append(listValue)
is impractical because I am actually working with 600+ lists...
Is there any other way I could do to make something like this work properly??
Help is much appreciated!
Thank you very much in advance!!
When you're tempted to use variable names to hold data — like the names of stocks — you should almost certainly be using a dictionary with the data as keys. You can still pre-populate the dictionary if you want to (although you don't have to).
You can change your existing code to something like:
# all the valid names
names = ['list_a', 'list_b', 'list_c']
# pre-populate a dictionary with them
names = {name:[] for name in names}
# later you can add data to the arrays:
listName = str(input('insert list name: '))
listValue = int(input('insert value: '))
# append value
names[listName].append(listValue)
With this, all your data is in one structure. It will be easy to loop through it and compute summary statistics like sums, averages, etc. Having a dictionary with 600 keys is no big deal, but having 600 individual variables is a recipe for a mess.
This will raise a KeyError if you try to add a name that doesn't exist, which you can catch in a try block if that's a possibility.
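Catching that error could look roughly like this. It is a sketch rather than the only way to do it; `add_value` is a hypothetical helper, and the values are passed in directly instead of being read with `input()` so the snippet runs on its own:

```python
# Pre-populate the dictionary with the valid names, as above.
names = {name: [] for name in ['list_a', 'list_b', 'list_c']}

def add_value(list_name, value):
    """Append value to the named list; return False if the name is unknown."""
    try:
        names[list_name].append(value)
    except KeyError:
        print(f'unknown list name: {list_name!r}')
        return False
    return True

add_value('list_a', 5)
add_value('list_z', 7)   # not a known name, so nothing is stored
print(names)
```

If unknown names should simply create a new list instead of being rejected, `collections.defaultdict(list)` or `dict.setdefault` avoids the try block altogether.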
Keep your lists in a dict. You could initialize your dict from a file, db, etc.
lists = {"GOOG": []}
listName = str(input('insert list name: '))
listValue = int(input('insert value: '))
lists.setdefault(listName,[]).append(listValue)
# Just a little output to see what you've got in there...
for list_name, list_value in lists.items():
    print(list_name, list_value)
So, following Mark Meyer's and MSlimmer's suggestions above, I am using a dictionary of lists to simplify data input, which has made this section of my program work flawlessly (thanks again, guys!).
However, I am experiencing problems with the next section of my code (which was working before when I had it all as lists haha). I have a dictionary as below:
names = {'list_a':[5, 7, -3], 'list_b':[10, 12, -10, -10]}
Now, I have to add up all positives and all negatives to each list. I want it to have the following result:
names_positives = {'list_a': 12, 'list_b': 22}
names_negatives = {'list_a': -3, 'list_b': -20}
I have tried three different ways, but none of them worked:
## first try:
names_positives = sum(a for a in names.values() if a > 0)
## second try:
names_positives = []
names_positives.append(a for a in names.values() if compras > 0)
## third try:
names_positives = dict(zip(names.keys(), [[sum(a)] for a in names.values() if compra > 0]))
To be honest, I have no idea how to proceed. I am getting errors due to mixing strings and integers in those lines, and I have not been able to work around the problem. Any help is much appreciated. The result could be a dictionary, a list, or even just the total sum (as in the first try, if nothing better is possible).
Cheers!
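The attempts above iterate over `names.values()`, which yields whole lists, so `a > 0` ends up comparing a list with a number. One possible fix is to filter inside each list and build the result with a dict comprehension. A minimal sketch using the sample data from the question:

```python
names = {'list_a': [5, 7, -3], 'list_b': [10, 12, -10, -10]}

# For every key, sum only the values on the wanted side of zero.
names_positives = {
    key: sum(v for v in values if v > 0) for key, values in names.items()
}
names_negatives = {
    key: sum(v for v in values if v < 0) for key, values in names.items()
}

print(names_positives)
print(names_negatives)
```

If only the grand totals are needed, `sum(names_positives.values())` and `sum(names_negatives.values())` give them from the same dictionaries.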
You can try this one:
I just added one line for your code to work.
list_a = ['a']
list_b = []
list_c = []
listName = str(input('insert list name: '))
listValue = int(input('insert value: '))
eval(listName).append(listValue)
The eval function evaluates the string as an actual Python expression. It should be noted, however, that calling eval on user input is a security risk, since it will run whatever expression is typed in. But for the exact question you asked, this is the easiest way that doesn't require much refactoring of the existing code.
I am trying to set up a data set that checks how often several different names are mentioned in a list of articles. So for each article, I want to know how often nameA, nameB and so forth are mentioned. However, I have troubles with iterating over the list.
My code is the following:
for element in list_of_names:
    for i in list_of_articles:
        list_of_namecounts = len(re.findall(element, i))
list_of_names = a string with several names [nameA nameB nameC]
list_of_articles = a list with 40.000 strings that are articles
Example of article in list_of_articles:
Index: 1
Type: str
Size: Amsterdam - de financiële ...
The error I get is: expected string or buffer
I thought that when iterating over the list of strings, the re.findall command should work on lists like this, but I am also fairly new to Python. Any idea how to solve my issue here?
Thank you!
If your list is ['apple', 'apple', 'banana'] and you want the result: number of apple = 2, then:
from collections import Counter
list_count = Counter(list_of_articles)
for element in list_of_names:
    list_of_namecounts = list_count[element]
And, assuming list_of_namecounts is meant to be a list:
list_of_namecounts = []
for element in list_of_names:
    list_of_namecounts.append(list_count[element])
See this for more understanding
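If the goal is instead to count how often each name occurs inside each article, keeping one result per article and name may be closer to what was asked. The sketch below assumes `list_of_names` is a real list of strings (for example from `.split()` on the original string) and that every article is a `str`; a non-string entry in the list is one likely cause of "expected string or buffer". The names and article texts are invented for illustration:

```python
import re

list_of_names = ['Amsterdam', 'Rotterdam']
list_of_articles = [
    'Amsterdam - de financiële sector in Amsterdam groeit.',
    'Rotterdam en Amsterdam werken samen.',
]

counts = []
for article in list_of_articles:
    row = {}
    for name in list_of_names:
        # re.escape keeps characters such as "." in a name from acting as regex syntax.
        row[name] = len(re.findall(re.escape(name), article))
    counts.append(row)  # one dict of name -> count per article

print(counts)
```

Because the question's loop assigns to the same variable on every pass, only the last count survives there; appending to a list keeps all of them.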
I have this part of code isolated for testing purposes and this question
noTasks = int(input())
noOutput = int(input())
outputClist = []
outputCList = []
for i in range(0, noTasks):
    for w in range(0, noOutput):
        outputChecked = str(input())
        outputClist.append(outputChecked)
    outputCList.append(outputClist)
    outputClist[:] = []
print(outputCList)
With this code, I get this output:
[[], []]
I can't figure out how to get the following output, and I must clear that sublist or I get something completely wrong...
[["test lol", "here can be more stuff"], ["test 2 lol", "here can be more stuff"]]
In Python everything is an object. A list is an object with elements. You only create one object, outputClist, and keep filling and clearing its contents. In the end, outputCList holds that same list multiple times, and since the last thing you do is clear the list, it is empty.
Instead, you have to create a new list for every task:
noTasks = int(input())
noOutput = int(input())
output = []
for i in range(noTasks):
    checks = []
    for w in range(noOutput):
        checks.append(input())
    output.append(checks)
print(output)
Instead of passing the contained elements in outputClist to outputCList (not the greatest naming practice either to just have one capitalization partway through be the only difference in variable names), you are passing a reference to the list itself. To get around this important and useful feature of Python that you don't want to make use of, you can pretty easily just pass a new list containing the elements of outputClist by changing this line
outputCList.append(outputClist)
to
outputCList.append(list(outputClist))
or equivalently, as @jonrsharpe states in his comment
outputCList.append(outputClist[:])
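The difference between appending the list itself and appending a copy can be seen side by side in a small experiment. The names `shared`, `aliased` and `copied` are made up for this sketch, and fixed batches stand in for the `input()` calls:

```python
shared = []
aliased = []
copied = []

for batch in (['a', 'b'], ['c', 'd']):
    shared.extend(batch)
    aliased.append(shared)        # stores a reference to the same list object
    copied.append(list(shared))   # stores an independent snapshot
    shared[:] = []                # empties the one shared list in place

print(aliased)
print(copied)
print(aliased[0] is aliased[1])
```

Every entry in `aliased` is the same object, so clearing it once clears it everywhere, while the copies in `copied` keep their contents.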