Python - intersect list of dictionaries

Python - intersect list of dictionaries - python

I have this list of dictionaries:
artist_and_tags = [{u'Yo La Tengo': ['indie', 'indie rock', 'seen live', 'alternative', 'indie pop', 'rock', 'post-rock', 'dream pop', 'shoegaze', 'noise pop', 'folk', 'experimental', 'alternative rock', 'american', 'lo-fi', 'pop', 'new jersey', 'yo la tengo', 'usa', 'noise rock', '90s', 'noise', '00s', 'ambient', 'post-punk', '80s', 'mellow', 'psychedelic', 'hoboken', 'experimental rock', 'singer-songwriter', 'post rock', 'electronic', 'female vocalists', 'alt-country', 'dreamy', 'matador', 'chillout', 'instrumental', 'favorites', 'punk', 'electronica', 'slowcore', 'folk rock', 'new wave', 'jazz', 'eclectic', 'new york', 'emo']}, {u'Radiohead': ['alternative', 'alternative rock', 'rock', 'indie', 'electronic', 'seen live', 'british', 'britpop', 'indie rock', 'experimental', 'radiohead', 'progressive rock', '90s', 'electronica', 'art rock', 'experimental rock', 'post-rock', 'psychedelic', 'uk', 'male vocalists', 'pop', '00s', 'ambient', 'chillout', 'progressive', 'favorites', 'melancholic', 'awesome', 'overrated', 'english', 'beautiful', 'classic rock', 'genius', 'melancholy', 'better than radiohead', 'trip-hop', 'idm', 'indie pop', 'emo']}, {u'Portishead': ['trip-hop', 'electronic', 'female vocalists', 'chillout', 'trip hop', 'alternative', 'electronica', 'seen live', 'downtempo', 'british', 'indie', 'portishead', 'experimental', 'ambient', 'female vocalist', 'alternative rock', '90s', 'lounge', 'mellow', 'bristol', 'jazz', 'psychedelic', 'chill', 'melancholic', 'triphop', 'uk', 'rock', 'bristol sound', 'acid jazz', 'lo-fi']}]
which I'm using to get relatedness between artists.
for that, I'm doing:
tags0 = set(artist_and_tags[0].values()[0])
tags1 = set(artist_and_tags[1].values()[0])
tags2 = set(artist_and_tags[2].values()[0])
then:
intersection1 = tags0 & tags1
intersection2 = tags0 & tags2
intersection3 = tags1 & tags2
so:
print (intersection1, len(intersection1), intersection2, len(intersection), intersection3, len(intersection3))
shows me that "Yo La Tengo" is closer to "Radiohead" than "Portishead", with 20 intersected tags.
this code seems a bit redundant, however...
QUESTION:
Is there a way to use this logic in a for loop (or wrapped in a simple function), so it works with a dictionary with n artists(keys)?

You can use itertools.combinations.
import itertools
import collections
ArtistTags = collections.namedtuple('ArtistTags', ('name', 'tags'))
tags = (ArtistTags(artist, set(tags))
for artists_dict in artist_and_tags
for artist, tags in artists_dict.items())
artist_pairings = itertools.combinations(tags, 2)
intersections = ((len(a.tags & b.tags), a, b) for a, b in artist_pairings)
for n, a, b in sorted(intersections, reverse=True):
print(n, a.name, b.name)
output:
20 Yo La Tengo Radiohead
16 Yo La Tengo Portishead
16 Radiohead Portishead

Related

Creating undirected unweighted graph from dictionary containing neighborhood relationship

I have a Python dictionary that looks like this:
{'Aitkin': ['Carlton', 'Cass', 'Crow Wing', 'Itasca',
'Kanabec', 'Mille Lacs', 'Pine', 'St. Louis'], 'Anoka':
['Chisago', 'Hennepin', 'Isanti', 'Ramsey', 'Sherburne',
'Washington'], 'Becker': ['Clay', 'Clearwater', 'Hubbard',
'Mahnomen', 'Norman', 'Otter Tail', 'Wadena'], 'Beltrami':
['Cass', 'Clearwater', 'Hubbard', 'Itasca', 'Koochiching',
'Lake of the Woods', 'Marshall', 'Pennington', 'Roseau'],
'Benton': ['Mille Lacs', 'Morrison', 'Sherburne', 'Stearns'], 'Big
Stone': ['Lac qui Parle', 'Stevens', 'Swift', 'Traverse'], 'Blue
Earth': ['Brown', 'Faribault', 'Le Sueur', 'Martin',
'Nicollet', 'Waseca', 'Watonwan'], 'Brown': ['Blue Earth',
'Cottonwood', 'Nicollet', 'Redwood', 'Renville', 'Watonwan'],
'Carlton': ['Aitkin', 'Pine', 'St. Louis'], 'Carver': ['Hennepin',
'McLeod', 'Scott', 'Sibley', 'Wright'], 'Cass': ['Aitkin',
'Beltrami', 'Crow Wing', 'Hubbard', 'Itasca', 'Morrison',
'Todd', 'Wadena'], 'Chippewa': ['Kandiyohi', 'Lac qui Parle',
'Renville', 'Swift', 'Yellow Medicine'], 'Chisago': ['Anoka',
'Isanti', 'Kanabec', 'Pine', 'Washington'], 'Clay': ['Becker',
'Norman', 'Otter Tail', 'Wilkin'], 'Clearwater': ['Becker',
'Beltrami', 'Hubbard', 'Mahnomen', 'Pennington', 'Polk'],
'Cook': ['Lake'], 'Cottonwood': ['Brown', 'Jackson', 'Murray',
'Nobles', 'Redwood', 'Watonwan'], 'Crow Wing': ['Aitkin', 'Cass',
'Mille Lacs', 'Morrison'], 'Dakota': ['Goodhue', 'Hennepin',
'Ramsey', 'Rice', 'Scott', 'Washington'], 'Dodge': ['Goodhue',
'Mower', 'Olmsted', 'Rice', 'Steele'], 'Douglas': ['Grant', 'Otter
Tail', 'Pope', 'Stearns', 'Stevens', 'Todd'], 'Faribault': ['Blue
Earth', 'Freeborn', 'Martin', 'Waseca'], 'Fillmore': ['Houston',
'Mower', 'Olmsted', 'Winona'], 'Freeborn': ['Faribault', 'Mower',
'Steele', 'Waseca'], 'Goodhue': ['Dakota', 'Dodge', 'Olmsted',
'Rice', 'Wabasha'], 'Grant': ['Douglas', 'Otter Tail', 'Pope',
'Stevens', 'Traverse', 'Wilkin'], 'Hennepin': ['Anoka', 'Carver',
'Dakota', 'Ramsey', 'Scott', 'Sherburne', 'Wright'],
'Houston': ['Fillmore', 'Winona'], 'Hubbard': ['Becker', 'Beltrami',
'Cass', 'Clearwater', 'Wadena'], 'Isanti': ['Anoka', 'Chisago',
'Kanabec', 'Mille Lacs', 'Pine', 'Sherburne'], 'Itasca': ['Aitkin',
'Beltrami', 'Cass', 'Koochiching', 'St. Louis'], 'Jackson':
['Cottonwood', 'Martin', 'Nobles', 'Watonwan'], 'Kanabec': ['Aitkin',
'Chisago', 'Isanti', 'Mille Lacs', 'Pine'], 'Kandiyohi': ['Chippewa',
'Meeker', 'Pope', 'Renville', 'Stearns', 'Swift'], 'Kittson':
['Marshall', 'Roseau'], 'Koochiching': ['Beltrami', 'Itasca', 'Lake
of the Woods', 'St. Louis'], 'Lac qui Parle': ['Big Stone',
'Chippewa', 'Swift', 'Yellow Medicine'], 'Lake': ['Cook', 'St.
Louis'], 'Lake of the Woods': ['Beltrami', 'Koochiching', 'Roseau'],
'Le Sueur': ['Blue Earth', 'Nicollet', 'Rice', 'Scott', 'Sibley',
'Waseca'], 'Lincoln': ['Lyon', 'Pipestone', 'Yellow Medicine'],
'Lyon': ['Lincoln', 'Murray', 'Pipestone', 'Redwood', 'Yellow
Medicine'], 'Mahnomen': ['Becker', 'Clearwater', 'Norman', 'Polk'],
'Marshall': ['Beltrami', 'Kittson', 'Pennington', 'Polk', 'Roseau'],
'Martin': ['Blue Earth', 'Faribault', 'Jackson', 'Watonwan'],
'McLeod': ['Carver', 'Meeker', 'Renville', 'Sibley', 'Wright'],
'Meeker': ['Kandiyohi', 'McLeod', 'Renville', 'Stearns', 'Wright'],
'Mille Lacs': ['Aitkin', 'Benton', 'Crow Wing', 'Isanti',
'Kanabec', 'Morrison', 'Sherburne'], 'Morrison': ['Benton',
'Cass', 'Crow Wing', 'Mille Lacs', 'Stearns', 'Todd'], 'Mower':
['Dodge', 'Fillmore', 'Freeborn', 'Olmsted', 'Steele'], 'Murray':
['Cottonwood', 'Lyon', 'Nobles', 'Pipestone', 'Redwood', 'Rock'],
'Nicollet': ['Blue Earth', 'Brown', 'Le Sueur', 'Renville', 'Sibley'],
'Nobles': ['Cottonwood', 'Jackson', 'Murray', 'Rock'], 'Norman':
['Becker', 'Clay', 'Mahnomen', 'Polk'], 'Olmsted': ['Dodge',
'Fillmore', 'Goodhue', 'Mower', 'Wabasha', 'Winona'], 'Otter Tail':
['Becker', 'Clay', 'Douglas', 'Grant', 'Wadena', 'Wilkin'],
'Pennington': ['Beltrami', 'Clearwater', 'Marshall', 'Polk', 'Red
Lake'], 'Pine': ['Aitkin', 'Carlton', 'Chisago', 'Isanti',
'Kanabec'], 'Pipestone': ['Lincoln', 'Lyon', 'Murray', 'Rock'],
'Polk': ['Clearwater', 'Mahnomen', 'Marshall', 'Norman',
'Pennington', 'Red Lake'], 'Pope': ['Douglas', 'Grant',
'Kandiyohi', 'Stearns', 'Stevens', 'Swift'], 'Ramsey': ['Anoka',
'Dakota', 'Hennepin', 'Washington'], 'Red Lake': ['Pennington',
'Polk'], 'Redwood': ['Brown', 'Cottonwood', 'Lyon', 'Murray',
'Renville', 'Yellow Medicine'], 'Renville': ['Brown', 'Chippewa',
'Kandiyohi', 'McLeod', 'Meeker', 'Nicollet', 'Redwood',
'Sibley', 'Yellow Medicine'], 'Rice': ['Dakota', 'Dodge',
'Goodhue', 'Le Sueur', 'Scott', 'Steele', 'Waseca'], 'Rock':
['Murray', 'Nobles', 'Pipestone'], 'Roseau': ['Beltrami', 'Kittson',
'Lake of the Woods', 'Marshall'], 'Scott': ['Carver', 'Dakota',
'Hennepin', 'Le Sueur', 'Rice', 'Sibley'], 'Sherburne': ['Anoka',
'Benton', 'Hennepin', 'Isanti', 'Mille Lacs', 'Stearns',
'Wright'], 'Sibley': ['Carver', 'Le Sueur', 'McLeod', 'Nicollet',
'Renville', 'Scott'], 'St. Louis': ['Aitkin', 'Carlton', 'Itasca',
'Koochiching', 'Lake'], 'Stearns': ['Benton', 'Douglas',
'Kandiyohi', 'Meeker', 'Morrison', 'Pope', 'Sherburne',
'Todd', 'Wright'], 'Steele': ['Dodge', 'Freeborn', 'Mower', 'Rice',
'Waseca'], 'Stevens': ['Big Stone', 'Douglas', 'Grant', 'Pope',
'Swift', 'Traverse'], 'Swift': ['Big Stone', 'Chippewa',
'Kandiyohi', 'Lac qui Parle', 'Pope', 'Stevens'], 'Todd':
['Cass', 'Douglas', 'Morrison', 'Otter Tail', 'Stearns', 'Wadena'],
'Traverse': ['Big Stone', 'Grant', 'Stevens', 'Wilkin'], 'Wabasha':
['Goodhue', 'Olmsted', 'Winona'], 'Wadena': ['Becker', 'Cass',
'Hubbard', 'Otter Tail', 'Todd'], 'Waseca': ['Blue Earth',
'Faribault', 'Freeborn', 'Le Sueur', 'Rice', 'Steele'],
'Washington': ['Anoka', 'Chisago', 'Dakota', 'Ramsey'], 'Watonwan':
['Blue Earth', 'Brown', 'Cottonwood', 'Jackson', 'Martin'], 'Wilkin':
['Clay', 'Grant', 'Otter Tail', 'Traverse'], 'Winona': ['Fillmore',
'Houston', 'Olmsted', 'Wabasha'], 'Wright': ['Carver', 'Hennepin',
'McLeod', 'Meeker', 'Sherburne', 'Stearns'], 'Yellow Medicine':
['Chippewa', 'Lac qui Parle', 'Lincoln', 'Lyon', 'Redwood',
'Renville']}
The keys in the dictionary represent the nodes, while the values(lists) represent nodes of neighbors of the key. This is an undirected unweighted graph.
Is there some function to implement this in networkx or some network analysis or graph-related libraries?

You can use the nx.Graph constructor -- it will add additional nodes if they don't appear as keys in the original dictionary (data represents the dictionary in the original question):
import networkx as nx
graph = nx.Graph(data)

Coloring Edges by Value networkX - error edge_color must be a color or list of one color per edge

I am getting the error "edge_color must be a color or list of one color per edge" using the code below. The dataframe genres_grouped is also included below.
# Build a dataframe with your connections
test = pd.DataFrame({'from': genres_grouped['genre_1'],
'to': genres_grouped['genre_2'],
'value': genres_grouped['owner']})
# Build your graph
G=nx.from_pandas_edgelist(test, 'from', 'to', create_using=nx.Graph())
# Custom the nodes:
nx.draw(G, with_labels=True, node_color='skyblue', node_size=1500,
edge_color=test['value'], width=10.0, edge_cmap=plt.cm.Blues)
Here is the dict version of my DataFrame:
{'from':
{4864: 'Rap', 2129: 'Indie Rock', 1394: 'Dance & House',
2123: 'Indie Rock', 2117: 'Indie Rock', 4827: 'Rap',
1251: 'Country & Folk', 2284: 'Indie Rock', 2383: 'Indie Rock',
3978: 'Pop', 4508: 'R&B', 3805: 'Pop', 3954: 'Pop',
122: 'Alternative', 1484: 'Dance & House', 1388: 'Dance & House',
1159: 'Country & Folk', 4480: 'R&B', 2739: 'Latin', 4030: 'Pop',
2204: 'Indie Rock', 2115: 'Indie Rock', 4854: 'Rap', 5814: 'Rock',
5598: 'Rock', 4595: 'Rap', 1397: 'Dance & House', 4858: 'Rap',
116: 'Alternative', 4816: 'Rap', 3626: 'Pop', 4856: 'Rap',
5209: 'Religious', 3823: 'Pop', 4039: 'Pop', 3967: 'Pop',
3616: 'Pop', 1821: 'Electronica', 1830: 'Electronica',
2303: 'Indie Rock', 4848: 'Rap', 306: 'Alternative', 4664: 'Rap',
1827: 'Electronica', 4671: 'Rap', 5592: 'Rock', 3637: 'Pop',
5718: 'Rock', 5583: 'Rock', 4849: 'Rap'}, 'to': {4864: 'R&B',
2129: 'Alternative', 1394: 'Electronica', 2123: 'Alternative',
2117: 'Alternative', 4827: 'Pop', 1251: 'Rock', 2284: 'Pop',
2383: 'Rock', 3978: 'Rap', 4508: 'Rap', 3805: 'IndieRock',
3954: 'R&B', 122: 'IndieRock', 1484: 'Pop', 1388: 'Electronica',
1159: 'Pop', 4480: 'Pop', 2739: '-', 4030: 'Rock', 2204: 'Electronica',
2115: 'Alternative', 4854: 'R&B', 5814: 'Pop', 5598: 'Alternative',
4595: '-', 1397: 'Electronica', 4858: 'R&B', 116: 'IndieRock',
4816: 'Pop', 3626: 'Alternative', 4856: 'R&B', 5209: '-',
3823: 'IndieRock', 4039: 'Rock', 3967: 'Rap', 3616: '-',
1821: 'Dance&House', 1830: 'Dance&House', 2303: 'Pop', 4848: 'R&B',
306: 'Rock', 4664: 'Dance&House', 1827: 'Dance&House',
4671: 'Dance&House', 5592: 'Alternative', 3637: 'Alternative',
5718: 'IndieRock', 5583: '-', 4849: 'R&B'
},
'value': {4864: 15477, 2129: 13102, 1394: 10800, 2123: 9981, 2117: 7233,
4827: 6895, 1251: 4550, 2284: 4480, 2383: 4366, 3978: 4299, 4508: 3620,
3805: 3339, 3954: 3075, 122: 2975, 1484: 2961, 1388: 2733, 1159: 2725,
4480: 2689, 2739: 2660, 4030: 2627, 2204: 2590, 2115: 2561, 4854: 2550,
5814: 2491, 5598: 2487, 4595: 2328, 1397: 2327, 4858: 2268, 116: 2265,
4816: 2201, 3626: 2133, 4856: 2129, 5209: 2113, 3823: 2063, 4039: 1895,
3967: 1837, 3616: 1812, 1821: 1759, 1830: 1726, 2303: 1683, 4848: 1671,
306: 1671, 4664: 1653, 1827: 1648, 4671: 1646, 5592: 1586, 3637: 1548,
5718: 1514, 5583: 1493, 4849: 1481
}}
Any idea why I'd be getting this error?

Though the text of my error message is a bit different, I believe you are getting the error because the shape of test['value'] which is (50, ) is different than the required shape. However you specify colors, the object you pass to needs to be the same length as the number of edges. len(G.edges) outputs 24.
The bit that you're missing is that when building the graph with nx.Graph(), networkx does not add a new edge if that edge is already present in the graph. Since you have duplicate pairs present in your 'from' and 'to' columns (for example, there are multiple rows with 'from' = 'Rap' and 'to' = 'Pop'), the duplicates will not be added as an additional edge.
So you have to decide two things:
Should duplicate rows result in an additional, parallel edge? If so, then you need to use create_using=nx.MultiGraph() when building your graph.
Does node order matter? I.e. is 'from' = 'Rap, 'to' = 'Pop' distinct from 'from' = 'Pop', 'to' = 'Rap'? If so, then you need to use create_using=nx.DiGraph() when building your graph.
If both 1 and 2 above are true then you need to use create_using=nx.MultiDiGraph(). Visualizing these more complex graph types deserves its own question, so I won't describe it here.
Once you've answered questions 1 and 2, however, you need to work with your input data to decide how to potentially group the 'to' and 'from' columns and sum the value column.

Capital city that starts with "a", and ends with "a". Doesn't matter if letter "a" is uppercase or lowercase

Start with a, and ends with a. I have been trying to output capital cities that start and end with the letter "a". Doesn't matter if they start with capital "A"
capitals = ('Kabul', 'Tirana (Tirane)', 'Algiers', 'Andorra la Vella', 'Luanda', "Saint John's", 'Buenos Aires', 'Yerevan', 'Canberra', 'Vienna', 'Baku', 'Nassau', 'Manama', 'Dhaka', 'Bridgetown', 'Minsk', 'Brussels', 'Belmopan', 'Porto Novo', 'Thimphu', 'Sucre', 'Sarajevo', 'Gaborone', 'Brasilia', 'Bandar Seri Begawan', 'Sofia', 'Ouagadougou', 'Gitega', 'Phnom Penh', 'Yaounde', 'Ottawa', 'Praia', 'Bangui', "N'Djamena", 'Santiago', 'Beijing', 'Bogota', 'Moroni', 'Kinshasa', 'Brazzaville', 'San Jose', 'Yamoussoukro', 'Zagreb', 'Havana', 'Nicosia', 'Prague', 'Copenhagen', 'Djibouti', 'Roseau', 'Santo Domingo', 'Dili', 'Quito', 'Cairo', 'San Salvador', 'London', 'Malabo', 'Asmara', 'Tallinn', 'Mbabana', 'Addis Ababa', 'Palikir', 'Suva', 'Helsinki', 'Paris', 'Libreville', 'Banjul', 'Tbilisi', 'Berlin', 'Accra', 'Athens', "Saint George's", 'Guatemala City', 'Conakry', 'Bissau', 'Georgetown', 'Port au Prince', 'Tegucigalpa', 'Budapest', 'Reykjavik', 'New Delhi', 'Jakarta', 'Tehran', 'Baghdad', 'Dublin', 'Jerusalem', 'Rome', 'Kingston', 'Tokyo', 'Amman', 'Nur-Sultan', 'Nairobi', 'Tarawa Atoll', 'Pristina', 'Kuwait City', 'Bishkek', 'Vientiane', 'Riga', 'Beirut', 'Maseru', 'Monrovia', 'Tripoli', 'Vaduz', 'Vilnius', 'Luxembourg', 'Antananarivo', 'Lilongwe', 'Kuala Lumpur', 'Male', 'Bamako', 'Valletta', 'Majuro', 'Nouakchott', 'Port Louis', 'Mexico City', 'Chisinau', 'Monaco', 'Ulaanbaatar', 'Podgorica', 'Rabat', 'Maputo', 'Nay Pyi Taw', 'Windhoek', 'No official capital', 'Kathmandu', 'Amsterdam', 'Wellington', 'Managua', 'Niamey', 'Abuja', 'Pyongyang', 'Skopje', 'Belfast', 'Oslo', 'Muscat', 'Islamabad', 'Melekeok', 'Panama City', 'Port Moresby', 'Asuncion', 'Lima', 'Manila', 'Warsaw', 'Lisbon', 'Doha', 'Bucharest', 'Moscow', 'Kigali', 'Basseterre', 'Castries', 'Kingstown', 'Apia', 'San Marino', 'Sao Tome', 'Riyadh', 'Edinburgh', 'Dakar', 'Belgrade', 'Victoria', 'Freetown', 'Singapore', 'Bratislava', 'Ljubljana', 'Honiara', 'Mogadishu', 'Pretoria, Bloemfontein, Cape Town', 'Seoul', 'Juba', 'Madrid', 'Colombo', 'Khartoum', 'Paramaribo', 'Stockholm', 'Bern', 'Damascus', 'Taipei', 'Dushanbe', 'Dodoma', 'Bangkok', 'Lome', "Nuku'alofa", 'Port of Spain', 'Tunis', 'Ankara', 'Ashgabat', 'Funafuti', 'Kampala', 'Kiev', 'Abu Dhabi', 'London', 'Washington D.C.', 'Montevideo', 'Tashkent', 'Port Vila', 'Vatican City', 'Caracas', 'Hanoi', 'Cardiff', "Sana'a", 'Lusaka', 'Harare')
This is my code:
for elem in capitals:
elem = elem.lower()
["".join(j for j in i if j not in string.punctuation) for i in capitals]
if (len(elem) >=4 and elem.endswith(elem[0])):
print(elem)
My output is:
andorra la vella
saint john's
asmara
addis ababa
accra
saint george's
nur-sultan
abuja
oslo
warsaw
apia
ankara
tashkent
My expected output is:
andorra la vella
asmara
addis ababa
accra
abuja
apia
ankara

You didn't check if the capital starts with 'a'. I also assumed you want to filter out punctuation based on your code, so this is what I ended up with:
import string
for elem in capitals:
elem = elem.lower()
for punct in string.punctuation:
elem = elem.replace(punct, '')
if elem.startswith('a') and elem.endswith('a'):
print(elem)

for elem in capitals:
elem = elem.lower()
if (elem.startswith('a') and elem.endswith('a')):
print(elem)

How to check frequency of every unique value from pandas data-frame?

If I have a data-frame of 2000 and in which let say brand have 142 unique values and i want to count frequency of every unique value form 1 to 142.values should change dynamically.
brand=clothes_z.brand_name
brand.describe(include="all")
unique_brand=brand.unique()
brand.describe(include="all"),unique_brand
Output:
(count 2613
unique 142
top Mango
freq 54
Name: brand_name, dtype: object,
array(['Jack & Jones', 'TOM TAILOR DENIM', 'YOURTURN', 'Tommy Jeans',
'Alessandro Zavetti', 'adidas Originals', 'Volcom', 'Pier One',
'Superdry', 'G-Star', 'SIKSILK', 'Tommy Hilfiger', 'Karl Kani',
'Alpha Industries', 'Farah', 'Nike Sportswear',
'Calvin Klein Jeans', 'Champion', 'Hollister Co.', 'PULL&BEAR',
'Nike Performance', 'Even&Odd', 'Stradivarius', 'Mango',
'Champion Reverse Weave', 'Massimo Dutti', 'Selected Femme Petite',
'NAF NAF', 'YAS', 'New Look', 'Missguided', 'Miss Selfridge',
'Topshop', 'Miss Selfridge Petite', 'Guess', 'Esprit Collection',
'Vero Moda', 'ONLY Petite', 'Selected Femme', 'ONLY', 'Dr.Denim',
'Bershka', 'Vero Moda Petite', 'PULL & BEAR', 'New Look Petite',
'JDY', 'Even & Odd', 'Vila', 'Lacoste', 'PS Paul Smith',
'Redefined Rebel', 'Selected Homme', 'BOSS', 'Brave Soul', 'Mind',
'Scotch & Soda', 'Only & Sons', 'The North Face',
'Polo Ralph Lauren', 'Gym King', 'Selected Woman', 'Rich & Royal',
'Rooms', 'Glamorous', 'Club L London', 'Zalando Essentials',
'edc by Esprit', 'OYSHO', 'Oasis', 'Gina Tricot',
'Glamorous Petite', 'Cortefiel', 'Missguided Petite',
'Missguided Tall', 'River Island', 'INDICODE JEANS',
'Kings Will Dream', 'Topman', 'Esprit', 'Diesel', 'Key Largo',
'Mennace', 'Lee', "Levi's®", 'adidas Performance', 'jordan',
'Jack & Jones PREMIUM', 'They', 'Springfield', 'Benetton', 'Fila',
'Replay', 'Original Penguin', 'Kronstadt', 'Vans', 'Jordan',
'Apart', 'New look', 'River island', 'Freequent', 'Mads Nørgaard',
'4th & Reckless', 'Morgan', 'Honey punch', 'Anna Field Petite',
'Noisy may', 'Pepe Jeans', 'Mavi', 'mint & berry', 'KIOMI', 'mbyM',
'Escada Sport', 'Lost Ink', 'More & More', 'Coffee', 'GANT',
'TWINTIP', 'MAMALICIOUS', 'Noisy May', 'Pieces', 'Rest',
'Anna Field', 'Pinko', 'Forever New', 'ICHI', 'Seafolly', 'Object',
'Freya', 'Wrangler', 'Cream', 'LTB', 'G-star', 'Dorothy Perkins',
'Carhartt WIP', 'Betty & Co', 'GAP', 'ONLY Tall', 'Next', 'HUGO',
'Violet by Mango', 'WEEKEND MaxMara', 'French Connection'],
dtype=object))
As it is showing only frequency of Mango "54" because it is top frequency and I want every value frequency like what is the frequency of Jack & Jones, TOM TAILOR DENIM and YOURTURN and so on... and values should change dynamically.

You could simply do,
clothes_z.brand_name.value_counts()
This would list down the unique values and would give you the frequency of every element in that Pandas Series.

from collections import Counter
ll = [...your list of brands...]
c = Counter(ll)
# you can do whatever you want with your counted values
df = pd.DataFrame.from_dict(c, orient='index', columns=['counted'])

XPath - extracting table data with irregular pattern

Extending an existing question and answer here, I am trying to extract player name and his position. The output would like:
playername, position
EJ Manuel, Quarterbacks
Tyrod Taylor, Quarterbacks
Anthony Dixon, Running backs
...
This is what I have done so far:
tree = html.fromstring(requests.get("https://en.wikipedia.org/wiki/List_of_current_AFC_team_rosters").text)
for h3 in tree.xpath("//table[#class='toccolours']//tr[2]"):
position = h3.xpath(".//b/text()")
players = h3.xpath(".//ul/li/a/text()")
print(position, players)
The above codes can deliver the following, but not in the format I need.
(['Quarterbacks', 'Running backs', 'Wide receivers', 'Tight ends', 'Offensive linemen', 'Defensive linemen', 'Linebackers', 'Defensive backs', 'Special teams', 'Reserve lists', 'Unrestricted FAs', 'Restricted FAs', 'Exclusive-Rights FAs'], ['EJ Manuel', 'Tyrod Taylor', 'Anthony Dixon', 'Jerome Felton', 'Mike Gillislee', 'LeSean McCoy', 'Karlos Williams', 'Leonard Hankerson', 'Marcus Easley', 'Marquise Goodwin', 'Percy Harvin', 'Dez Lewis', 'Walt Powell', 'Greg Salas', 'Sammy Watkins', 'Robert Woods', 'Charles Clay', 'Chris Gragg', "Nick O'Leary", 'Tyson Chandler', 'Ryan Groy', 'Seantrel Henderson', 'Cyrus Kouandjio', 'John Miller', 'Kraig Urbik', 'Eric Wood', 'T. J. Barnes', 'Marcell Dareus', 'Lavar Edwards', 'IK Enemkpali', 'Jerry Hughes', 'Kyle Williams', 'Mario Williams', 'Jerel Worthy', 'Jarius Wynn', 'Preston Brown', 'Randell Johnson', 'Manny Lawson', 'Kevin Reddick', 'Tony Steward', 'A. J. Tarpley', 'Max Valles', 'Mario Butler', 'Ronald Darby', 'Stephon Gilmore', 'Corey Graham', 'Leodis McKelvin', 'Jonathan Meeks', 'Merrill Noel', 'Nickell Robey', 'Sammy Seamster', 'Cam Thomas', 'Aaron Williams', 'Duke Williams', 'Dan Carpenter', 'Jordan Gay', 'Garrison Sanborn', 'Colton Schmidt', 'Blake Annen', 'Jarrett Boykin', 'Jonathan Dowling', 'Greg Little', 'Jacob Maxwell', 'Ronald Patrick', 'Cedric Reed', 'Cyril Richardson', 'Phillip Thomas', 'James Wilder, Jr.', 'Nigel Bradham', 'Ron Brooks', 'Alex Carrington', 'Cordy Glenn', 'Leonard Hankerson', 'Richie Incognito', 'Josh Johnson', 'Corbin Bryant', 'Stefan Charles', 'MarQueis Gray', 'Chris Hogan', 'Jordan Mills', 'Ty Powell', 'Bacarri Rambo', 'Cierre Wood'])
(['Quarterbacks', 'Running backs', 'Wide receivers', 'Tight ends', 'Offensive linemen', 'Defensive linemen', 'Linebackers', 'Defensive backs', 'Special teams', 'Reserve lists', 'Unrestricted FAs', 'Restricted FAs', 'Exclusive-Rights FAs'], ['Zac Dysert', 'Ryan Tannehill', 'Logan Thomas', 'Jay Ajayi', 'Jahwan Edwards', 'Damien Williams', 'Tyler Davis', 'Robert Herron', 'Greg Jennings', 'Jarvis Landry', 'DeVante Parker', 'Kenny Stills', 'Jordan Cameron', 'Dominique Jones', 'Dion Sims', 'Branden Albert', 'Jamil Douglas', "Ja'Wuan James", 'Vinston Painter', 'Mike Pouncey', 'Anthony Steen', 'Dallas Thomas', 'Billy Turner', 'Deandre Coleman', 'Quinton Coples', 'Terrence Fede', 'Dion Jordan', 'Earl Mitchell', 'Damontre Moore', 'Jordan Phillips', 'Ndamukong Suh', 'Charles Tuaau', 'Robert Thomas', 'Cameron Wake', 'Julius Warmsley', 'Jordan Williams', 'Neville Hewitt', 'Mike Hull', 'Jelani Jenkins', 'Terrell Manning', 'Chris McCain', 'Koa Misi', 'Zach Vigil', 'Walt Aikens', 'Damarr Aultman', 'Brent Grimes', 'Reshad Jones', 'Tony Lippett', 'Bobby McCain', 'Brice McCain', 'Tyler Patmon', 'Dax Swanson', 'Jamar Taylor', 'Matt Darr', 'John Denney', 'Andrew Franks', 'Louis Delmas', 'James-Michael Johnson', 'Rishard Matthews', 'Jacques McClendon', 'Lamar Miller', 'Matt Moore', 'Spencer Paysinger', 'Derrick Shelby', 'Kelvin Sheppard', 'Shelley Smith', 'Olivier Vernon', 'Michael Thomas', 'Brandon Williams', 'Shamiel Gary', 'Matt Hazel', 'Ulrick John', 'Jake Stoneburner'])
...
Any suggestions?

You can use nested loop for this task. First loop through the positions and then, for each position, loop through the corresponding players :
#loop through positions
for b in tree.xpath("//table[#class='toccolours']//tr[2]//b"):
#get current position text
position = b.xpath("text()")[0]
#get players that correspond to the current position
for a in b.xpath("following::ul[1]/li/a[not(*)]"):
#get current player text
player = a.xpath("text()")[0]
#print current position and player together
print(position, player)
Last part of the output :
.....
('Reserve lists', 'Chris Watt')
('Reserve lists', 'Eric Weddle')
('Reserve lists', 'Tourek Williams')
('Practice squad', 'Alex Bayer')
('Practice squad', 'Isaiah Burse')
('Practice squad', 'Richard Crawford')
('Practice squad', 'Ben Gardner')
('Practice squad', 'Michael Huey')
('Practice squad', 'Keith Lewis')
('Practice squad', 'Chuka Ndulue')
('Practice squad', 'Tim Semisch')
('Practice squad', 'Brad Sorensen')
('Practice squad', 'Craig Watts')

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python - intersect list of dictionaries - python

Related

Creating undirected unweighted graph from dictionary containing neighborhood relationship

Coloring Edges by Value networkX - error edge_color must be a color or list of one color per edge

Capital city that starts with "a", and ends with "a". Doesn't matter if letter "a" is uppercase or lowercase

How to check frequency of every unique value from pandas data-frame?

XPath - extracting table data with irregular pattern

Categories

Resources