I want function input is list's and each list save as one row.
My attempt to save a two-dimensional array:
my_list = [['\ufeffUser Name', 'First Name', 'Last Name', 'Display Name', 'Job Title', 'Department', 'Office Number', 'Office Phone', 'Mobile Phone', 'Fax', 'Address', 'City', 'State or Province', 'ZIP or Postal Code', 'Country or Region'], ['chris#contoso.com', 'Chris', 'Green', 'Chris Green', 'IT Manager', 'Information Technology', '123451', '123-555-1211', '123-555-6641', '123-555-9821', '1 Microsoft way', 'Redmond', 'Wa',
'98052', 'United States'], ['ben#contoso.com', 'Ben', 'Andrews', 'Ben Andrews', 'IT Manager', 'Information Technology', '123452', '123-555-1212', '123-555-6642', '123-555-9822', '1 Microsoft way', 'Redmond', 'Wa', '98052', 'United States'], ['david#contoso.com', 'David', 'Longmuir', 'David Longmuir', 'IT Manager', 'Information Technology', '123453', '123-555-1213', '123-555-6643', '123-555-9823', '1 Microsoft way', 'Redmond', 'Wa', '98052', 'United States'], ['cynthia#contoso.com', 'Cynthia', 'Carey', 'Cynthia Carey', 'IT Manager', 'Information Technology', '123454', '123-555-1214', '123-555-6644', '123-555-9824', '1 Microsoft way', 'Redmond', 'Wa', '98052', 'United States'], ['melissa#contoso.com', 'Melissa', 'MacBeth', 'Melissa MacBeth', 'IT Manager', 'Information Technology', '123455', '123-555-1215', '123-555-6645', '123-555-9825', '1 Microsoft way', 'Redmond', 'Wa', '98052', 'United States']]
unifile.dump.excel("file.csv", "array", "utf-8", my_list)
My attempt to save lists to CSV files:
l1 = ["aa", "ftyg"]
l2 = ["fgghg", "ftyfuv"]
unifile.dump.excel("file.csv", "list", "utf-8", l1, l2)
My function
def excel(file_path: str, mode: str = "array", t_encoding: str = "utf-8", write_mode: str = "r", *data: list):
try:
os.remove(file_path)
except:
pass
if mode == "list":
with open(file_path, 'w') as csv_file:
csv_writer = csv.writer(csv_file, delimiter=',')
for l in data:
csv_writer.writerow(l)
elif mode == "array":
with open(file_path, write_mode, encoding=t_encoding) as csv_file:
csv_writer = csv.writer(csv_file, delimiter=',')
for l in data:
csv_writer.writerow(l)
else:
pass
User Name,First Name,Last Name,Display Name,Job Title,Department,Office Number,Office Phone,Mobile Phone,Fax,Address,City,State or Province,ZIP or Postal Code,Country or Region
chris#contoso.com,Chris,Green,Chris Green,IT Manager,Information Technology,123451,123-555-1211,123-555-6641,123-555-9821,1 Microsoft way,Redmond,Wa,98052,United States
ben#contoso.com,Ben,Andrews,Ben Andrews,IT Manager,Information Technology,123452,123-555-1212,123-555-6642,123-555-9822,1 Microsoft way,Redmond,Wa,98052,United States
david#contoso.com,David,Longmuir,David Longmuir,IT Manager,Information Technology,123453,123-555-1213,123-555-6643,123-555-9823,1 Microsoft way,Redmond,Wa,98052,United States
cynthia#contoso.com,Cynthia,Carey,Cynthia Carey,IT Manager,Information Technology,123454,123-555-1214,123-555-6644,123-555-9824,1 Microsoft way,Redmond,Wa,98052,United States
melissa#contoso.com,Melissa,MacBeth,Melissa MacBeth,IT Manager,Information Technology,123455,123-555-1215,123-555-6645,123-555-9825,1 Microsoft way,Redmond,Wa,98052,United States
The problem is this code will replace all previous lines with end line.
The Pandas module is really useful. This worked for me:
import pandas as pd
my_list = pd.DataFrame(my_list)
my_list.to_csv("example.csv")
Related
I am making a code which takes in jumble word and returns a unjumbled word , the data.json contains a list and here take a word one-by-one and check if it contains all the characters of the word and later checking if the length is same , but the problem is when i enter a word as helol then the l is checked twice and giving me some other outputs including the main one(hello). i know why does it happen but i cant get a fix to it
import json
val = open("data.json")
val1 = json.load(val)#loads the list
a = input("Enter a Jumbled word ")#takes a word from user
a = list(a)#changes into list to iterate
for x in val1:#iterates words from list
for somethin in a:#iterates letters from list
if somethin in list(x):#checks if the letter is in the iterated word
continue
else:
break
else:#checks if the loop ended correctly (that means word has same letters)
if len(a) != len(list(x)):#checks if it has same number of letters
continue#returns
else:
print(x)#continues the loop to see if there are more like that
EDIT: many people wanted the json file so here it is
['Torres Strait Creole', 'good bye', 'agon', "queen's guard", 'animosity', 'price list', 'subjective', 'means', 'severe', 'knockout', 'life-threatening', 'entry into the war', 'dominion', 'damnify', 'packsaddle', 'hallucinate', 'lumpy', 'inception', 'Blankenese', 'cacophonous', 'zeptomole', 'floccinaucinihilipilificate', 'abashed', 'abacterial', 'ableism', 'invade', 'cohabitant', 'handicapped', 'obelus', 'triathlon', 'habitue', 'instigate', 'Gladstone Gander', 'Linked Data', 'seeded player', 'mozzarella', 'gymnast', 'gravitational force', 'Friedelehe', 'open up', 'bundt cake', 'riffraff', 'resourceful', 'wheedle', 'city center', 'gorgonzola', 'oaf', 'auf', 'oafs', 'galoot', 'imbecile', 'lout', 'moron', 'news leak', 'crate', 'aggregator', 'cheating', 'negative growth', 'zero growth', 'defer', 'ride back', 'drive back', 'start back', 'shy back', 'spring back', 'shrink back', 'shy away', 'abderian', 'unable', 'font manager', 'font management software', 'consortium', 'gown', 'inject', 'ISO 639', 'look up', 'cross-eyed', 'squinting', 'health club', 'fitness facility', 'steer', 'sunbathe', 'combatives', 'HTH', 'hope that helps', 'How The Hell', 'distributed', 'plum cake', 'liberalization', 'macchiato', 'caffè macchiato', 'beach volley', 'exult', 'jubilate', 'beach volleyball', 'be beached', 'affogato', 'gigabyte', 'terabyte', 'petabyte', 'undressed', 'decameter', 'sensual', 'boundary marker', 'poor man', 'cohabitee', 'night sleep', 'protruding ears', 'three quarters of an hour', 'spermophilus', 'spermophilus stricto sensu', "devil's advocate", 'sacred king', 'sacral king', 'myr', 'million years', 'obtuse-angled', 'inconsolable', 'neurotic', 'humiliating', 'mortifying', 'theological', 'rematch', 'varıety', 'be short', 'ontological', 'taxonomic', 'taxonomical', 'toxicology testing', 'on the job training', 'boulder', 'unattackable', 'inviolable', 'resinous', 'resiny', 'ionizing radiation', 'citrus grove', 'comic book shop', 'preparatory measure', 'written account', 'brittle', 'locker', 'baozi', 'bao', 'bau', 'humbow', 'nunu', 'bausak', 'pow', 'pau', 'yesteryear', 'fire drill', 'rotted', 'putto', 'overthrow', 'ankle monitor', 'somewhat stupid', 'a little stupid', 'semordnilap', 'pangram', 'emordnilap', 'person with a sunlamp tan', 'tittle', 'incompatible', 'autumn wind', 'dairyman', 'chesty', 'lacustrine', 'chronophotograph', 'chronophoto', 'leg lace', 'ankle lace', 'ankle lock', 'Babelfy', 'ventricular', 'recurrent', 'long-lasting', 'long-standing', 'long standing', 'sea bass', 'reap', 'break wind', 'chase away', 'spark', 'speckle', 'take back', 'Westphalian', 'Aeolic Greek', 'startup', 'abseiling', 'impure', 'bottle cork', 'paralympic', 'work out', 'might', 'ice-cream man', 'ice cream man', 'ice cream maker', 'ice-cream maker', 'traveling', 'special delivery', 'prizefighter', 'abs', 'ab', 'churro', 'pilfer', 'dehumanize', 'fertilize', 'inseminate', 'digitalize', 'fluke', 'stroke of luck', 'decontaminate', 'abandonware', 'manzanita', 'tule', 'jackrabbit', 'system administrator', 'system admin', 'springtime lethargy', 'Palatinean', 'organized religion', 'bearing puller', 'wheel puller', 'gear puller', 'shot', 'normalize', 'palindromic', 'lancet window', 'terminological', 'back of head', 'dragon food', 'barbel', 'Central American Spanish', 'basis', 'birthmark', 'blood vessel', 'ribes', 'dog-rose', 'dreadful', 'freckle', 'free of charge', 'weather verb', 'weather sentence', 'gipsy', 'gypsy', 'glutton', 'hump', 'low voice', 'meek', 'moist', 'river mouth', 'turbid', 'multitude', 'palate', 'peak of mountain', 'poetry', 'pure', 'scanty', 'spicy', 'spicey', 'spruce', 'surface', 'infected', 'copulate', 'dilute', 'dislocate', 'grow up', 'hew', 'hinder', 'infringe', 'inhabit', 'marry off', 'offend', 'pass by', 'brother of a man', 'brother of a woman', 'sister of a man', 'sister of a woman', 'agricultural farm', 'result in', 'rebel', 'strew', 'scatter', 'sway', 'tread', 'tremble', 'hog', 'circuit breaker', 'Southern Quechua', 'safety pin', 'baby pin', 'college student', 'university student', 'pinus sibirica', 'Siberian pine', 'have lunch', 'floppy', 'slack', 'sloppy', 'wishi-washi', 'turn around', 'bogeyman', 'selfish', 'Talossan', 'biomembrane', 'biological membrane', 'self-sufficiency', 'underevaluation', 'underestimation', 'opisthenar', 'prosody', 'Kumhar Bhag Paharia', 'psychoneurotic', 'psychoneurosis', 'levant', "couldn't-care-less attitude", 'noctambule', 'acid-free paper', 'decontaminant', 'woven', 'wheaten', 'waste-ridden', 'war-ridden', 'violence-ridden', 'unwritten', 'typewritten', 'spoken', 'abiogenetically', 'rasp', 'abstractly', 'cyclically', 'acyclically', 'acyclic', 'ad hoc', 'spare tire', 'spare wheel', 'spare tyre', 'prefabricated', 'ISO 9000', 'Barquisimeto', 'Maracay', 'Ciudad Guayana', 'San Cristobal', 'Barranquilla', 'Arequipa', 'Trujillo', 'Cusco', 'Callao', 'Cochabamba', 'Goiânia', 'Campinas', 'Fortaleza', 'Florianópolis', 'Rosario', 'Mendoza', 'Bariloche', 'temporality', 'papyrus sedge', 'paper reed', 'Indian matting plant', 'Nile grass', 'softly softly', 'abductive reasoning', 'abductive inference', 'retroduction', 'Salzburgian', 'cymotrichous', 'access point', 'wireless access point', 'dynamic DNS', 'IP address', 'electrolyte', 'helical', 'hydrometer', 'intranet', 'jumper', 'MAC address', 'Media Access Control address', 'nickel–cadmium battery', 'Ni-Cd battery', 'oscillograph', 'overload', 'photovoltaic', 'photovoltaic cell', 'refractor telescope', 'autosome', 'bacterial artificial chromosome', 'plasmid', 'nucleobase', 'base pair', 'base sequence', 'chromosomal deletion', 'deletion', 'deletion mutation', 'gene deletion', 'chromosomal inversion', 'comparative genomics', 'genomics', 'cytogenetics', 'DNA replication', 'DNA repair', 'DNA sequence', 'electrophoresis', 'functional genomics', 'retroviral', 'retroviral infection', 'acceptance criteria', 'batch processing', 'business rule', 'code review', 'configuration management', 'entity–relationship model', 'lifecycle', 'object code', 'prototyping', 'pseudocode', 'referential', 'reusability', 'self-join', 'timestamp', 'accredited', 'accredited translator', 'certify', 'certified translation', 'computer-aided design', 'computer-aided', 'computer-assisted', 'management system', 'computer-aided translation', 'computer-assisted translation', 'machine-aided translation', 'conference interpreter', 'freelance translator', 'literal translation', 'mother-tongue', 'whispered interpreting', 'simultaneous interpreting', 'simultaneous interpretation', 'base anhydride', 'binary compound', 'absorber', 'absorption coefficient', 'attenuation coefficient', 'active solar heater', 'ampacity', 'amorphous semiconductor', 'amorphous silicon', 'flowerpot', 'antireflection coating', 'antireflection', 'armored cable', 'electric arc', 'breakdown voltage','casing', 'facing', 'lining', 'assumption of Mary', 'auscultation']
Just a example and the dictionary is full of items
As I understand it you are trying to identify all possible matches for the jumbled string in your list. You could sort the letters in the jumbled word and match the resulting list against sorted lists of the words in your data file.
sorted_jumbled_word = sorted(a)
for word in val1:
if len(sorted_jumbled_word) == len(word) and sorted(word) == sorted_jumbled_word:
print(word)
Checking by length first reduces unnecessary sorting. If doing this repeatedly, you might want to create a dictionary of the words in the data file with their sorted versions, to avoid having to repeatedly sort them.
There are spaces and punctuation in some of the terms in your word list. If you want to make the comparison ignoring spaces then remove them from both the jumbled word and the list of unjumbled words, using e.g. word = word.replace(" ", "")
I am trying to scrape address information from https://www.smartystreets.com/products/single-address-iframe. I have a script that searches for the given address in its parameters. When I look at the website itself, one can see various fields like Carrier Route.
Using 3301 South Greenfield Rd Gilbert, AZ 85297 as a hypothetical example, when one goes to the page manually, one can see the Carrier Route: R109.
I am having trouble, however, finding the carrier route on Selenium to scrape it. Does have any recommendations for how to find the Carrier Route for any given address?
Starting code:
driver = webdriver.Chrome('chromedriver')
address = "3301 South Greenfield Rd Gilbert, AZ 85297\n"
url = 'https://www.smartystreets.com/products/single-address-iframe'
driver.get(url)
driver.find_element_by_id("lookup-select-button").click()
driver.find_element_by_id("lookup-select").find_element_by_id("address-freeform").click()
driver.find_element_by_id("freeform-address").send_keys(address)
# Find Carrier Route here
You can use driver.execute_script to provide input for the fields and to click the submission button:
from selenium import webdriver
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://www.smartystreets.com/products/single-address-iframe')
s = '3301 South Greenfield Rd Gilbert, AZ 85297'
a, a1 = s.split(' Rd ')
route = d.execute_script(f'''
document.querySelector('#address-line1').value = '{a}'
document.querySelector('#city').value = '{(j:=a1.split())[0][:-1]}'
document.querySelector('#state').value = '{j[1]}'
document.querySelector('#zip-code').value = '{j[2]}'
document.querySelector('#submit-request').click()
return document.querySelector('#us-street-metadata li:nth-of-type(2) .answer.col-sm-5.col-xs-3').textContent
''')
Output:
'R109'
To get a full display of all the parameter data, you can use BeautifulSoup:
from bs4 import BeautifulSoup as soup
... #selenium driver source here
cols = soup(d.page_source, 'html.parser').select('#us-street-output div')
data = {i.h4.text:{b.select_one('span:nth-of-type(1)').get_text(strip=True)[:-1]:b.select_one('span:nth-of-type(2)').get_text(strip=True)
for b in i.select('ul li')} for i in cols}
print(data)
print(data['Metadata']['Congressional District'])
Output:
{'Metadata': {'Building Default': 'default', 'Carrier Route': 'R109', 'Congressional District': '05', 'Latitude': '33.291248', 'Longitude': '-111.737427', 'Coordinate Precision': 'Rooftop', 'County Name': 'Maricopa', 'County FIPS': '04013', 'eLOT Sequence': '0160', 'eLOT Sort': 'A', 'Observes DST': 'default', 'RDI': 'Commercial', 'Record Type': 'S', 'Time Zone': 'Mountain', 'ZIP Type': 'Standard'}, 'Analysis': {'Vacant': 'N', 'DPV Match Code': 'Y', 'DPV Footnotes': 'AABB', 'General Footnotes': 'L#', 'CMRA': 'N', 'EWS Match': 'default', 'LACSLink Code': 'default', 'LACSLink Indicator': 'default', 'SuiteLink Match': 'default', 'Enhanced Match': 'default'}, 'Components': {'Urbanization': 'default', 'Primary Number': '3301', 'Street Predirection': 'S', 'Street Name': 'Greenfield', 'Street Postdirection': 'default', 'Street Suffix': 'Rd', 'Secondary Designator': 'default', 'Secondary Number': 'default', 'Extra Secondary Designator': 'default', 'Extra Secondary Number': 'default', 'PMB Designator': 'default', 'PMB Number': 'default', 'City': 'Gilbert', 'Default City Name': 'Gilbert', 'State': 'AZ', 'ZIP Code': '85297', '+4 Code': '2176', 'Delivery Point': '01', 'Check Digit': '2'}}
'05'
Ajax1234, here's the code and screenshot you asked for:
Start with a, and ends with a. I have been trying to output capital cities that start and end with the letter "a". Doesn't matter if they start with capital "A"
capitals = ('Kabul', 'Tirana (Tirane)', 'Algiers', 'Andorra la Vella', 'Luanda', "Saint John's", 'Buenos Aires', 'Yerevan', 'Canberra', 'Vienna', 'Baku', 'Nassau', 'Manama', 'Dhaka', 'Bridgetown', 'Minsk', 'Brussels', 'Belmopan', 'Porto Novo', 'Thimphu', 'Sucre', 'Sarajevo', 'Gaborone', 'Brasilia', 'Bandar Seri Begawan', 'Sofia', 'Ouagadougou', 'Gitega', 'Phnom Penh', 'Yaounde', 'Ottawa', 'Praia', 'Bangui', "N'Djamena", 'Santiago', 'Beijing', 'Bogota', 'Moroni', 'Kinshasa', 'Brazzaville', 'San Jose', 'Yamoussoukro', 'Zagreb', 'Havana', 'Nicosia', 'Prague', 'Copenhagen', 'Djibouti', 'Roseau', 'Santo Domingo', 'Dili', 'Quito', 'Cairo', 'San Salvador', 'London', 'Malabo', 'Asmara', 'Tallinn', 'Mbabana', 'Addis Ababa', 'Palikir', 'Suva', 'Helsinki', 'Paris', 'Libreville', 'Banjul', 'Tbilisi', 'Berlin', 'Accra', 'Athens', "Saint George's", 'Guatemala City', 'Conakry', 'Bissau', 'Georgetown', 'Port au Prince', 'Tegucigalpa', 'Budapest', 'Reykjavik', 'New Delhi', 'Jakarta', 'Tehran', 'Baghdad', 'Dublin', 'Jerusalem', 'Rome', 'Kingston', 'Tokyo', 'Amman', 'Nur-Sultan', 'Nairobi', 'Tarawa Atoll', 'Pristina', 'Kuwait City', 'Bishkek', 'Vientiane', 'Riga', 'Beirut', 'Maseru', 'Monrovia', 'Tripoli', 'Vaduz', 'Vilnius', 'Luxembourg', 'Antananarivo', 'Lilongwe', 'Kuala Lumpur', 'Male', 'Bamako', 'Valletta', 'Majuro', 'Nouakchott', 'Port Louis', 'Mexico City', 'Chisinau', 'Monaco', 'Ulaanbaatar', 'Podgorica', 'Rabat', 'Maputo', 'Nay Pyi Taw', 'Windhoek', 'No official capital', 'Kathmandu', 'Amsterdam', 'Wellington', 'Managua', 'Niamey', 'Abuja', 'Pyongyang', 'Skopje', 'Belfast', 'Oslo', 'Muscat', 'Islamabad', 'Melekeok', 'Panama City', 'Port Moresby', 'Asuncion', 'Lima', 'Manila', 'Warsaw', 'Lisbon', 'Doha', 'Bucharest', 'Moscow', 'Kigali', 'Basseterre', 'Castries', 'Kingstown', 'Apia', 'San Marino', 'Sao Tome', 'Riyadh', 'Edinburgh', 'Dakar', 'Belgrade', 'Victoria', 'Freetown', 'Singapore', 'Bratislava', 'Ljubljana', 'Honiara', 'Mogadishu', 'Pretoria, Bloemfontein, Cape Town', 'Seoul', 'Juba', 'Madrid', 'Colombo', 'Khartoum', 'Paramaribo', 'Stockholm', 'Bern', 'Damascus', 'Taipei', 'Dushanbe', 'Dodoma', 'Bangkok', 'Lome', "Nuku'alofa", 'Port of Spain', 'Tunis', 'Ankara', 'Ashgabat', 'Funafuti', 'Kampala', 'Kiev', 'Abu Dhabi', 'London', 'Washington D.C.', 'Montevideo', 'Tashkent', 'Port Vila', 'Vatican City', 'Caracas', 'Hanoi', 'Cardiff', "Sana'a", 'Lusaka', 'Harare')
This is my code:
for elem in capitals:
elem = elem.lower()
["".join(j for j in i if j not in string.punctuation) for i in capitals]
if (len(elem) >=4 and elem.endswith(elem[0])):
print(elem)
My output is:
andorra la vella
saint john's
asmara
addis ababa
accra
saint george's
nur-sultan
abuja
oslo
warsaw
apia
ankara
tashkent
My expected output is:
andorra la vella
asmara
addis ababa
accra
abuja
apia
ankara
You didn't check if the capital starts with 'a'. I also assumed you want to filter out punctuation based on your code, so this is what I ended up with:
import string
for elem in capitals:
elem = elem.lower()
for punct in string.punctuation:
elem = elem.replace(punct, '')
if elem.startswith('a') and elem.endswith('a'):
print(elem)
for elem in capitals:
elem = elem.lower()
if (elem.startswith('a') and elem.endswith('a')):
print(elem)
I have this read function where it reads a csv file using csv.DictReader. The file.csv is separated by commas and it fully reads. However, this part of my file has a column that contains multiple commas. My question is, how can I make sure that comma is counted as part of a column? I cannot alter my csv file to meet the criteria.
Text File:
ID,Name,University,Street,ZipCode,Country
12,Jon Snow,U of Winterfell,Winterfell #45,60434,Westeros
13,Steve Rogers,NYU,108, Chelsea St.,23333,United States
20,Peter Parker,Yale,34, Tribeca,32444,United States
34,Tyrion Lannister,U of Casterly Rock,Kings Landing #89, 43543,Westeros
The desired output is this:
{'ID': '12', 'Name': 'Jon Snow', 'University': 'U of Winterfell', 'Street': 'Winterfell #45', 'ZipCode': '60434', 'Country': 'Westeros'}
{'ID': '13', 'Name': 'Steve Rogers', 'University': 'NYU', 'Street': '108, Chelsea St.', 'ZipCode': '23333', 'Country': 'United States'}
{'ID': '20', 'Name': 'Peter Parker', 'University': 'Yale', 'Street': '34, Tribeca', 'ZipCode': '32444', 'Country': 'United States'}
{'ID': '34', 'Name': 'Tyrion Lannister', 'University': 'U of Casterly Rock', 'Street': 'Kings Landing #89', 'ZipCode': '43543', 'Country': 'Westeros'}
As you can tell the 'Street' has at least two commas due to the numbers:
13,Steve Rogers,NYU,108, Chelsea St.,23333,United States
20,Peter Parker,Yale,34, Tribeca,32444,United States
Note: Most of the columns being read splits by a str,str BUT under the 'Street' column it is followed by a str, str (there is an extra space after the comma). I hope this makes sense.
The options I tried looking out is using re.split, but I don't know how to implement it on my read file. I was thinking re.split(r'(?!\s),(?!\s)',x[:-1])? How can I make sure the format from my file will count as part of any column? I can't use pandas.
My current output looks like this right now:
{'ID': '12', 'Name': 'Jon Snow', 'University': 'U of Winterfell', 'Street': 'Winterfell #45', 'ZipCode': '60434', 'Country': 'Westeros'}
{'ID': '13', 'Name': 'Steve Rogers', 'University': 'NYU', 'Street': '108', 'ZipCode': 'Chelsea St.', 'Country': '23333', None: ['United States']}
{'ID': '20', 'Name': 'Peter Parker', 'University': 'Yale', 'Street': '34', 'ZipCode': 'Tribeca', 'Country': '32444', None: ['United States']}
{'ID': '34', 'Name': 'Tyrion Lannister', 'University': 'U of Casterly Rock', 'Street': 'Kings Landing #89', 'ZipCode': '43543', 'Country': 'Westeros'}
This is my read function:
import csv
list = []
with open('file.csv', mode='r') as csv_file:
csv_reader = csv.DictReader(csv_file, delimiter=",", skipinitialspace=True)
for col in csv_reader:
list.append(dict(col))
print(dict(col))
You can't use csv if the file isn't valid CSV format.
You need to call re.split() on ordinary lines, not on dictionaries.
list = []
with open('file.csv', mode='r') as csv_file:
keys = csv_file.readline().strip().split(',') # Read header line
for line in csv_file:
line = line.strip()
row = re.split(r'(?!\s),(?!\s)',line)
list.append(dict(zip(keys, row)))
The actual solution for the problem is modifying the script that generates the csv file.
If you have a chance to modify that output you can do 2 things
Use a delimiter other than a comma such as | symbol or ; whatever you believe it doesn't exist in the string.
Or enclose all columns with " so you'll be able to split them by , which are actual separators.
If you don't have a chance to modify the output.
And if you are sure about that multiple commas are only in the street column; then you should use csv.reader instead of DictReader this way you can get the columns by Indexes that you are already sure. for instance row[0] will be ID row[1] will be Name and row[-1] will be Country row[-2] will be ZipCode so row[2:-2] would give you what you need i guess. Indexes can be arranged but the idea is clear I guess.
Hope that helps.
Edit:
import csv
list = []
with open('file.csv', mode='r') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=",", skipinitialspace=True)
# pass the header row
next(csv_reader)
for row in csv_reader:
list.append({"ID": row[0],
"Name": row[1],
"University": row[2],
"Street": ' '.join(row[3:-2]),
"Zipcode": row[-2],
"Country": row[-1]})
print(list)
--
Here is the output (with pprint)
[{'Country': 'Westeros',
'ID': '12',
'Name': 'Jon Snow',
'Street': 'Winterfell #45',
'University': 'U of Winterfell',
'Zipcode': '60434'},
{'Country': 'United States',
'ID': '13',
'Name': 'Steve Rogers',
'Street': '108 Chelsea St.',
'University': 'NYU',
'Zipcode': '23333'},
{'Country': 'United States',
'ID': '20',
'Name': 'Peter Parker',
'Street': '34 Tribeca',
'University': 'Yale',
'Zipcode': '32444'},
{'Country': 'Westeros',
'ID': '34',
'Name': 'Tyrion Lannister',
'Street': 'Kings Landing #89',
'University': 'U of Casterly Rock',
'Zipcode': '43543'}]
-- second edit
edited the index on the street.
Regards.
I have lists like this:
['7801234567', 'Robert Post', '66 Hinton Road']
['7809876543', 'Farrukh Ahmed', '101 Edson Crest']
['7803214567', 'Md Toukir Imam', '34 Sherwood Park Avenue']
['7807890123', 'Elham Ahmadi', '8 Devon Place']
['7808907654', 'Rong Feng', '32 Spruce Street']
['7801236789', 'Nazanin Tahmasebi', '98 Albert Avenue']
['7804321098', 'Sayem Mohammad Siam', '56 Stony Place']
['7808765432', 'Amir Hossein Faghih Dinevari', '45 Beautiful Street']
How should I convert them to dictionaries with keys named: 'tel', 'name', and'address'?
Use zip to associate the keys with their values by index
keys = ('tel', 'name', 'address')
values = ['7801234567', 'Robert Post', '66 Hinton Road']
d = dict(zip(keys, values))
# {'tel': '7801234567', 'name': 'Robert Post', 'address': '66 Hinton Road'}
Edit:
Obviously, you can use this technique to create more complex structures.
info = [['7801234567', 'Robert Post', '66 Hinton Road'],
['7809876543', 'Farrukh Ahmed', '101 Edson Crest'],
['7803214567', 'Md Toukir Imam', '34 Sherwood Park Avenue'],
['7807890123', 'Elham Ahmadi', '8 Devon Place'],
['7808907654', 'Rong Feng', '32 Spruce Street'],
['7801236789', 'Nazanin Tahmasebi', '98 Albert Avenue'],
['7804321098', 'Sayem Mohammad Siam', '56 Stony Place'],
['7808765432', 'Amir Hossein Faghih Dinevari', '45 Beautiful Street']]
keys = ('tel', 'name', 'address')
dictionaries = [dict(zip(keys, values)) for values in info]
The [_ for _ in _] is called a list comprehension
We assume the data are given as:
info_row = [['7801234567', 'Robert Post', '66 Hinton Road'],
['7809876543', 'Farrukh Ahmed', '101 Edson Crest'],
['7803214567', 'Md Toukir Imam', '34 Sherwood Park Avenue'],
...,
]
and
info_type = ['tel', 'name', 'address']
We could first write a function to convert each "row" list.
def convert_row(row_name, row_content):
d = dict()
for i, j in enumerate(row_name):
d[j] = row_content[i]
return d
Then, we can use list comprehension to apply such function to the whole list of list.
expected_result = [convert_row(info_type, r) for r in info_row]
As another answer has pointed out, we can make use of zip. Using built-in routines is more than encouraged in most cases. So it is better to type
expected_result = [dict(zip(info_type, r)) for r in info_row]
You can do this in a list comprehension:
data = [['7801234567', 'Robert Post', '66 Hinton Road'],
['7809876543', 'Farrukh Ahmed', '101 Edson Crest'],
['7803214567', 'Md Toukir Imam', '34 Sherwood Park Avenue'],
['7807890123', 'Elham Ahmadi', '8 Devon Place'],
['7808907654', 'Rong Feng', '32 Spruce Street'],
['7801236789', 'Nazanin Tahmasebi', '98 Albert Avenue'],
['7804321098', 'Sayem Mohammad Siam', '56 Stony Place'],
['7808765432', 'Amir Hossein Faghih Dinevari', '45 Beautiful Street']]
new_data = [{'tel': tel, 'name': name, 'address': address} for tel, name, address in data]
print(new_data)
Which Outputs:
[{'tel': '7801234567', 'name': 'Robert Post', 'address': '66 Hinton Road'}, {'tel': '7809876543', 'name': 'Farrukh Ahmed', 'address': '101 Edson Crest'}, {'tel': '7803214567', 'name': 'Md Toukir Imam', 'address': '34 Sherwood Park Avenue'}, {'tel': '7807890123', 'name': 'Elham Ahmadi', 'address': '8 Devon Place'}, {'tel': '7808907654', 'name': 'Rong Feng', 'address': '32 Spruce Street'}, {'tel': '7801236789', 'name': 'Nazanin Tahmasebi', 'address': '98 Albert Avenue'}, {'tel': '7804321098', 'name': 'Sayem Mohammad Siam', 'address': '56 Stony Place'}, {'tel': '7808765432', 'name': 'Amir Hossein Faghih Dinevari', 'address': '45 Beautiful Street'}]