I am trying to set up a data set that checks how often several different names are mentioned in a list of articles. So for each article, I want to know how often nameA, nameB and so forth are mentioned. However, I have trouble iterating over the list.
My code is the following:
for element in list_of_names:
    for i in list_of_articles:
        list_of_namecounts = len(re.findall(element, i))
list_of_names = a string with several names [nameA nameB nameC]
list_of_articles = a list with 40.000 strings that are articles
Example of article in list_of_articles:
Index: 1
Type: str
Size: Amsterdam - de financiële ...
The error I get is: expected string or buffer
I thought that when iterating over the list of strings, the re.findall command would work on each string like this, but I am also fairly new to Python. Any idea how to solve my issue?
Thank you!
If your list is ['apple', 'apple', 'banana'] and you want the result: number of apple = 2, then:
from collections import Counter
list_count = Counter(list_of_articles)
for element in list_of_names:
    list_of_namecounts = list_count[element]
And assuming list_of_namecounts is meant to be a list:
list_of_namecounts = []
for element in list_of_names:
    list_of_namecounts.append(list_count[element])
See this for more understanding
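If the goal is to count mentions inside each article rather than whole-item matches, one possible sketch follows. It assumes list_of_names is a list of name strings (if it is a single space-separated string, split it first); count_names is a hypothetical helper, not something from the question. The "expected string or buffer" error is what re raises when it is handed something that is not a string, so non-string entries are treated as empty articles here, which is also an assumption.

```python
import re


def count_names(list_of_names, list_of_articles):
    """Return one dict per article mapping each name to its number of mentions."""
    # re.escape keeps characters such as "." in a name from acting as regex syntax
    patterns = {name: re.compile(re.escape(name)) for name in list_of_names}
    counts_per_article = []
    for article in list_of_articles:
        # Assumption: a non-string entry (e.g. None) counts as an empty article
        text = article if isinstance(article, str) else ""
        counts_per_article.append(
            {name: len(pattern.findall(text)) for name, pattern in patterns.items()}
        )
    return counts_per_article


articles = ["nameA met nameB, then nameA left", None]
result = count_names(["nameA", "nameB"], articles)
```

Each entry of result lines up with the article at the same index, so it can be turned into a table row per article.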
Hey (sorry for my bad English), I am going to try to make my question clearer. Say I have a function create_username_dict(name_list, username_list) that takes two lists: name_list, containing people's names, and username_list, containing usernames derived from those names. I want to take those two lists and combine them into a dictionary that maps each name to its username,
like this:
>>> name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
>>> username_list = ["alejon", "carli", "hanri"]
>>> create_username_dict(name_list, username_list)
{
"Ola Nordmann": "alejon",
"Kari Olsen": "carli",
"Roger Jensen": "hanri"
}
I have tried looking around for how to combine two different lists into one dictionary, but can't seem to find the right solution.
If both lists are in matching order, i.e. the i-th element of one list corresponds to the i-th element of the other, then you can use this
D = dict(zip(name_list, username_list))
Use zip to pair the lists.
d = {key: value for key,value in zip(name_list, username_list)}
print(d)
Output:
{'Ola Nordmann': 'alejon', 'Kari Olsen': 'carli', 'Roger Jensen': 'hanri'}
Assuming both lists have the same length and a one-to-one mapping:
name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
username_list = ["alejon", "carli", "hanri"]
result_stackoverflow = dict()
for index, name in enumerate(name_list):
    result_stackoverflow[name] = username_list[index]
print(result_stackoverflow)
>>> {'Ola Nordmann': 'alejon', 'Kari Olsen': 'carli', 'Roger Jensen': 'hanri'}
The answer by #alex does the same thing but may be too compact for a beginner, so this is the verbose version.
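For completeness, the zip approach can be wrapped in the function the question names. This is only a sketch: the length check is an added assumption about the desired behaviour, included because zip silently stops at the shorter list.

```python
def create_username_dict(name_list, username_list):
    """Pair the i-th name with the i-th username."""
    # Fail loudly instead of letting zip silently drop unmatched items
    if len(name_list) != len(username_list):
        raise ValueError("name_list and username_list must have the same length")
    return dict(zip(name_list, username_list))


name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
username_list = ["alejon", "carli", "hanri"]
usernames = create_username_dict(name_list, username_list)
```

If a name appears twice in name_list, the later username overwrites the earlier one, as with any dictionary.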
Could you help me with the following challenge I am currently facing:
I have multiple lists, each of which contains multiple strings. Each string has the following format:
"ID-Type" - where ID is a number and type is a common Python type. One such example can be found here:
["1-int", "2-double", "1-string", "5-list", "5-int"],
["3-string", "1-int", "1-double", "5-double", "5-string"]
Before calculating further, I now want to preprocess these lists to unify them in the following way:
Count how often each type is appearing in each list
Generate a new list, combining both results
Create a mapping from initial list to that new list
As an example
In the above lists, we have the following types:
List 1: 2 int, 1 double, 1 string, 1 list
List 2: 2 string, 2 double, 1 int
The resulting table should now contain:
2 int, 2 double, 2 string, 1 list (in order to be able to contain both lists), like this:
[
"int_1-int",
"int_2-int",
"double_1-double",
"double_2-double",
"string_1-string",
"string_2-string",
"list_1-list"
]
And lastly, in order to map input to output, the idea is to have a corresponding dictionary to map this transformation, e.g., for list_1:
{
"1-int": "int_1-int",
"2-double": "double_1-double",
"1-string": "string_1-string",
"5-list": "list_1-list",
"5-int": "int_2-int"
}
I want to avoid doing this with a nested loop and multiple iterations. Are there any libraries, or is there maybe a smart vectorized solution, to address this challenge?
Just add them:
Example :
['it'] + ['was'] + ['annoying']
You should read the Python tutorial to learn basic info like this.
Just another method....
import itertools
ab = itertools.chain(['it'], ['was'], ['annoying'])
list(ab)
In general, this approach doesn't really make sense unless you specifically need to have the items in the resulting list and dict in this exact format. But here's how you can do it:
def process_type_list(type_list):
    mapping = dict()
    for i in type_list:
        i_type = i.split('-')[1]
        n_occur = 1
        map_val = f'{i_type}_{n_occur}-{i_type}'
        while map_val in mapping.values():
            n_occur += 1
            map_val = f'{i_type}_{n_occur}-{i_type}'
        mapping[i] = map_val
    return mapping
l1 = ["1-int", "2-double", "1-string", "5-list", "5-int"]
l2 = ["3-string", "1-int", "1-double", "5-double", "5-string"]
l1_mapping = process_type_list(l1)
l2_mapping = process_type_list(l2)
Additionally, Python does not have a double type. Python's float is implemented with a C double (use decimal.Decimal if you need fine control over the precision).
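The question also asks for the combined list, which the function above does not build. A possible sketch using collections.Counter follows; type_of, combined_slots and build_mapping are hypothetical names, and the per-list mapping uses a running counter instead of rescanning mapping.values() for every item.

```python
from collections import Counter, defaultdict


def type_of(item):
    """Return the type part of an "ID-Type" string."""
    return item.split("-", 1)[1]


def combined_slots(*lists):
    """Build the combined list, keeping the highest count of each type over all lists."""
    max_counts = Counter()
    for lst in lists:
        # The | operator on Counters keeps the larger count for each key
        max_counts |= Counter(type_of(item) for item in lst)
    return [f"{t}_{n}-{t}" for t, count in max_counts.items() for n in range(1, count + 1)]


def build_mapping(lst):
    """Map each item to the next free slot of its type."""
    seen = defaultdict(int)
    mapping = {}
    for item in lst:
        t = type_of(item)
        seen[t] += 1
        mapping[item] = f"{t}_{seen[t]}-{t}"
    return mapping


l1 = ["1-int", "2-double", "1-string", "5-list", "5-int"]
l2 = ["3-string", "1-int", "1-double", "5-double", "5-string"]
slots = combined_slots(l1, l2)
l1_mapping = build_mapping(l1)
```

Each list is traversed once, so there is no nested loop over the data itself. Identical strings within one list would share a single dictionary key, which this sketch does not try to handle.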
I am pretty sure that this is what you want to do:
To make a joint list:
['any item'] + ['any item 2']
If you want to turn the list into a dictionary:
dict(zip(['key 1', 'key 2'], ['value 1', 'value 2']))
Another method of joining 2 lists:
a = ['list item', 'another list item']
a.extend(['another list item', 'another list item'])
I have three lists:
id = [1,3,4]
text = ["hello","hola","salut"]
date = ["20-12-2020","21-04-2018","15-04-2016"]
#I then combined it all in one list:
new_list = zip(id, text, date)
#which looks like [(1,"hello","20-12-2020"),(3,"hola","21-04-2018"),(4,"salut","15-04-2016")]
I want to delete the whole tuple if its text is not in English. To do this I installed langid and am using langid.classify.
I ran a loop over only the text and it works, but I am unsure how to delete the whole entry, such as (3,"hola","21-04-2018"), since hola is not English.
I am trying to get a new list that contains only the entries whose text is in English. I then want to write the output list to an XML file.
To do that I have made a sample XML file and am using the date as the parent key, since the same date can apply to multiple texts.
Try this simple for loop
new_list = [(1,"hello","20-12-2020"),(3,"hola","21-04-2018"),(4,"salut","15-04-2016")]
for x in new_list[:]:  # iterate over a copy so removing items is safe
    # condition to check if word or sentence is english
    if not isEnglishWord(x[1]):  # isEnglishWord is a placeholder for your own check
        new_list.remove(x)
Not sure how langid.classify works or what parameters it takes, but something like this should work:
for i in range(len(new_list) - 1, -1, -1):
    if langid.classify(new_list[i][1]) != 'english':
        new_list.pop(i)
In this case, I'm assuming langid.classify takes in a str and outputs which language the text belongs to (as a str).
I'm also iterating over the indices in reverse, so that removing an item does not shift the positions of the items still to be checked.
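Since the question also mentions writing the result to XML with the date as the parent, here is a rough sketch of filtering and grouping in one pass with the standard library. The element names (entries, date, entry) are assumptions, and is_english is a hypothetical stand-in for whatever language check is actually used.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def is_english(text):
    # Hypothetical stand-in for a real language check such as langid
    return text in {"hello"}


def build_xml(rows, keep):
    """Group the rows that pass `keep` under one <date> element per date."""
    by_date = defaultdict(list)
    for row_id, text, date in rows:
        if keep(text):
            by_date[date].append((row_id, text))
    root = ET.Element("entries")
    for date, items in by_date.items():
        date_el = ET.SubElement(root, "date", value=date)
        for row_id, text in items:
            entry = ET.SubElement(date_el, "entry", id=str(row_id))
            entry.text = text
    return root


rows = [(1, "hello", "20-12-2020"), (3, "hola", "21-04-2018"), (4, "salut", "15-04-2016")]
xml_text = ET.tostring(build_xml(rows, is_english), encoding="unicode")
```

Building a new structure from the rows that pass the check avoids deleting from a list while looping over it.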
I have scraped a list of prices from a site and want to get their average. Correct me if I am wrong, but my assumption is that the data must not contain dollar signs before it can be summed and used to compute the average price of the list.
My attempts include, but are not limited to, using a for loop to slice the character at index 0 off each list item.
for i in clean:
    i = i[1:]
I also originally tried just running it without creating a variable, but it does nothing to the output when I print the clean list
for i in clean:
    i = i[1:]
Example of the list I currently have:
clean = [$123.56, $234.56, $561.12]
What I would like the output of the cleaned up list to be:
[123.56, 234.56, 561.12]
You don't actually have to use enumerate. Here is a very simple solution to your problem.
clean = ['$123.56', '$234.56', '$561.12']
result = []
for i in clean:
    result.append(float(i[1:]))
print(result) # [123.56, 234.56, 561.12]
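On why the loop in the question has no effect: assigning to i inside the loop only rebinds the loop variable, it never writes back into clean. A minimal sketch that builds a new list and then averages it, assuming every price is a string like '$123.56' (parse_price is a hypothetical helper, and stripping thousands separators is an extra assumption that is not in the question):

```python
from statistics import mean


def parse_price(text):
    """Turn a string such as '$1,234.56' into a float."""
    # Drop the currency sign and any thousands separators before converting
    return float(text.replace("$", "").replace(",", "").strip())


clean = ["$123.56", "$234.56", "$561.12"]
prices = [parse_price(p) for p in clean]
average = mean(prices)
```

sum(prices) / len(prices) would work as well; statistics.mean just states the intent directly and raises an error on an empty list.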
You should use the library 're':
import re
trim = re.compile(r'[^\d.,]+')
my_string = '$12.56' #works also with USD or other currency, with space or not
result = trim.sub('',my_string)
print(result)
>>> 12.56
For a list:
my_list = ['$123.56', '$234.56', '$561.12']
list_without_currency = [float(trim.sub('',e)) for e in my_list]
>>> [123.56, 234.56, 561.12]
I'm trying to manipulate a list of items in Python, but I'm getting the error "AttributeError: 'list' object has no attribute 'split'"
I understand that a list does not support .split, but I don't know what else to do. Below is a copy-paste of the relevant part of my code.
tourl = 'http://data.bitcoinity.org/chart_data'
tovalues = {'timespan':'24h','resolution':'hour','currency':'USD','exchange':'all','mining_pool':'all','compare':'no','data_type':'price_volume','chart_type':'line_bar','smoothing':'linear','chart_types':'ccacdfcdaa'}
todata = urllib.urlencode(tovalues)
toreq = urllib2.Request(tourl, todata)
tores = urllib2.urlopen(toreq)
tores2 = tores.read()
tos = json.loads(tores2)
tola = tos["data"]
for item in tola:
    ting = item.get("values")
    ting.split(',')[2] <-----ERROR
    print(ting)
To understand what I'm trying to do, you will also need to see the JSON data. ting outputs this:
[
[1379955600000L, 123.107310846774], [1379959200000L, 124.092526428571],
[1379962800000L, 125.539504822835], [1379966400000L, 126.27024617931],
[1379970000000L, 126.723474983766], [1379973600000L, 126.242406356837],
[1379977200000L, 124.788410570987], [1379980800000L, 126.810084904632],
[1379984400000L, 128.270580796748], [1379988000000L, 127.892411269036],
[1379991600000L, 126.140579640523], [1379995200000L, 126.513705084746],
[1379998800000L, 128.695124951923], [1380002400000L, 128.709738051044],
[1380006000000L, 125.987767097378], [1380009600000L, 124.323433535528],
[1380013200000L, 123.359378559603], [1380016800000L, 125.963250678733],
[1380020400000L, 125.074618194444], [1380024000000L, 124.656345088853],
[1380027600000L, 122.411303435449], [1380031200000L, 124.145747100372],
[1380034800000L, 124.359452274881], [1380038400000L, 122.815357211394],
[1380042000000L, 123.057706915888]
]
[
[1379955600000L, 536.4739135], [1379959200000L, 1235.42506637],
[1379962800000L, 763.16329656], [1379966400000L, 804.04579319],
[1379970000000L, 634.84689741], [1379973600000L, 753.52716718],
[1379977200000L, 506.90632968], [1379980800000L, 494.473732950001],
[1379984400000L, 437.02095093], [1379988000000L, 176.25405034],
[1379991600000L, 319.80432715], [1379995200000L, 206.87212398],
[1379998800000L, 638.47226435], [1380002400000L, 438.18036666],
[1380006000000L, 512.68490443], [1380009600000L, 904.603705539997],
[1380013200000L, 491.408088450001], [1380016800000L, 670.275397960001],
[1380020400000L, 767.166941339999], [1380024000000L, 899.976089609997],
[1380027600000L, 1243.64963909], [1380031200000L, 1508.82429811],
[1380034800000L, 1190.18854705], [1380038400000L, 546.504592349999],
[1380042000000L, 206.84883264]
]
And ting[0] outputs this:
[1379955600000L, 123.187067936508]
[1379955600000L, 536.794013499999]
What I'm really trying to do is add up the values from ting[0-24] that come AFTER the second comma. This made me try a split, but that does not work.
You already have a list; the commas are put there by Python to delimit the values only when printing the list.
Just access element 2 directly:
print ting[2]
This prints:
[1379962800000, 125.539504822835]
Each of the entries in item['values'] (so ting) is a list of two numbers (a timestamp and a value), so you can address each of those with index 0 and 1:
>>> print ting[2][0]
1379962800000
>>> print ting[2][1]
125.539504822835
To get a list of all the second values, you could use a list comprehension:
second_vals = [t[1] for t in ting]
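Since the stated goal is to add those values up, the comprehension can feed sum directly. A short sketch in Python 3 syntax, using the first three pairs from the question as sample data (the L suffix on the timestamps is Python 2 only and is dropped here):

```python
# Sample data: the first three [timestamp, value] pairs from the question
ting = [
    [1379955600000, 123.107310846774],
    [1379959200000, 124.092526428571],
    [1379962800000, 125.539504822835],
]

# Take the second element of every pair, then add them up
second_vals = [pair[1] for pair in ting]
total = sum(second_vals)
```

To total only part of the list, slice before summing, for example sum(pair[1] for pair in ting[:24]).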
When you load the data with json.loads, it is already parsed into a real list that you can slice and index as normal. If you want the data starting with the third element, just use ting[2:]. (If you just want the third element by itself, just use ting[2].)