Making a list from list of lists - python

I have the below list of lists
[['Afghanistan,2.66171813,7.460143566,0.490880072,52.33952713,0.427010864,-0.106340349,0.261178523'], ['Albania,4.639548302,9.373718262,0.637698293,69.05165863,0.74961102,-0.035140377,0.457737535']]
I want to create a new list with only the country names, i.e.
['Afghanistan', 'Albania']
I am currently using this code:
with open(fileName, "r") as f:
    _ = next(f)
    row_lst = f.read().split()
countryLst = [[i] for i in row_lst]

Try this, using split(','), since the first element of each inner list is a comma-separated string.
>>> lst = [['Afghanistan,2.66171813,7.460143566,0.490880072,52.33952713,0.427010864,-0.106340349,0.261178523'], ['Albania,4.639548302,9.373718262,0.637698293,69.05165863,0.74961102,-0.035140377,0.457737535']]
Output:
>>> [el[0].split(',')[0] for el in lst]
['Afghanistan', 'Albania']
Explanation:
# el[0] gives the first element of each inner list, which is a string.
# .split(',') returns a list of the fields after splitting on `,`.
# [0] finally selects the first field, as required.
Edit-1:
Using regex,
import re
pattern = r'([a-zA-Z]+)'
new_lst = []
for el in lst:
    new_lst += re.findall(pattern, el[0])
>>> new_lst # output
['Afghanistan', 'Albania']
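One caveat with the regex variant: a letters-only pattern would split a name such as 'United States' into two matches. Here is a minimal sketch of an alternative, assuming the name is always the first comma-separated field. The list `rows` and its third row are made up for illustration.

```python
# Hypothetical sample data in the same shape as the question's list of lists.
rows = [
    ['Afghanistan,2.66171813,7.460143566'],
    ['Albania,4.639548302,9.373718262'],
    ['United States,6.9,10.9'],  # illustrative row, not from the question
]

# split(',', 1) stops after the first comma, so only the name is separated
# out and names containing spaces stay intact.
names = [row[0].split(',', 1)[0] for row in rows]
print(names)
```

Limiting the split to one cut also avoids building a list of all the numeric fields that are thrown away anyway.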

Looks like a CSV file. Use the csv module
Ex:
import csv
with open(fileName, "r") as f:
    reader = csv.reader(f)
    next(reader)  # Skip header
    country = [row[0] for row in reader]
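To try the csv approach without a file on disk, here is a small sketch that reads from an in-memory string. The header names and the sample text are assumptions made for illustration; only the first column matters.

```python
import csv
import io

# Hypothetical file contents: a header row followed by data rows.
sample = (
    "country,score,gdp\n"
    "Afghanistan,2.66171813,7.460143566\n"
    "Albania,4.639548302,9.373718262\n"
)

with io.StringIO(sample) as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    countries = [row[0] for row in reader if row]  # ignore blank lines

print(countries)
```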

Related

Joining two CSV files (inner join) based on a common column in Python without Pandas

I am trying to join two CSV files on one common column.
I read each CSV file into a list of tuples. My code:
def read_csv(path):
    file = open(path, "r")
    content_list = []
    for line in file.readlines():
        record = line.split(",")
        for i in range(len(record)):
            record[i] = record[i].replace("\n","")
        content_list.append(tuple(record))
    return content_list
a_list = read_csv("a.csv")
b_list = read_csv("b.csv")
This gives me a list with the CSV headers as the first tuple:
a_list
[('user_id', 'activeFl'),
('80c611f1-532a-4f7d-aa80-f28b472c0dbe', 'True'),
('4d04ab57-1b50-4474-bd12-b2b16ed2cca3', 'True'),
('0f37a42a-a984-4402-97bd-0eac95fa95d1', 'True'),
('dbe15b19-0128-4e3a-a82b-c8154d272c18', 'True'), ......]
b_list
[('id','date','user_id','blockedFl','amount','type'),
('b7819826-6468-4416-9953-e739d8046b81','2021-04-23','18a382ef-bd38-4884-8bf','True','9.04','6'), ....]
I would like to merge these two lists based on the user_id, but I am stuck at this point. What can I try next?
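The O(N^2) solution (user_id is column 0 in a_list and column 2 in b_list):
result = list()
for left in a_list[1:]:
    for right in b_list[1:]:
        if left[0] == right[2]:  # compare user_id with user_id
            result.append(right + left[1:])
            break  # stops at the first match; drop this to keep every matching row
O(N) using a dictionary:
result = list()
b_dict = {x[2]: x for x in b_list[1:]}  # key on user_id; a repeated user_id keeps only its last row
for left in a_list[1:]:
    if left[0] in b_dict:
        result.append(b_dict[left[0]] + left[1:])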
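This is one approach using the csv module and a dict.
Ex:
import csv
def read_csv(path, key_index):
    with open(path, newline="") as infile:
        reader = csv.reader(infile)
        header = next(reader)  # skip the header row
        content = {row[key_index]: row for row in reader}  # user_id as key
    return content
a_list = read_csv("a.csv", 0)  # user_id is column 0 in a.csv
b_list = read_csv("b.csv", 2)  # user_id is column 2 in b.csv
# Inner join: keep only the user_ids present in both files
merge_data = {k: v + a_list[k][1:] for k, v in b_list.items() if k in a_list}
print(merge_data) # OR print(list(merge_data.values()))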
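If b.csv can contain several rows for the same user_id, a dict that stores one row per key keeps only one of them. Here is a minimal sketch that groups the rows in lists instead. The function name `inner_join`, its parameters, and the sample rows are hypothetical. The column positions (0 in a, 2 in b) are taken from the headers shown in the question.

```python
from collections import defaultdict

def inner_join(a_rows, b_rows, a_key=0, b_key=2):
    """Join two lists of tuples (header row first) on the given key columns."""
    # Group b's data rows by key so that repeated keys are all kept.
    by_key = defaultdict(list)
    for row in b_rows[1:]:
        by_key[row[b_key]].append(row)

    # Header: all of b's columns, then a's columns except the key.
    a_header = a_rows[0]
    result = [b_rows[0] + a_header[:a_key] + a_header[a_key + 1:]]
    for left in a_rows[1:]:
        extra = left[:a_key] + left[a_key + 1:]
        for right in by_key.get(left[a_key], []):
            result.append(right + extra)
    return result

# Made-up sample data with shortened ids.
a_list = [('user_id', 'activeFl'), ('u1', 'True'), ('u2', 'False')]
b_list = [
    ('id', 'date', 'user_id', 'blockedFl', 'amount', 'type'),
    ('t1', '2021-04-23', 'u1', 'True', '9.04', '6'),
    ('t2', '2021-04-24', 'u1', 'False', '3.50', '2'),
    ('t3', '2021-04-25', 'u9', 'False', '1.00', '1'),
]
joined = inner_join(a_list, b_list)
for row in joined:
    print(row)
```

Building the lookup once keeps the join close to linear in the total number of rows, while still emitting one output row per matching pair.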

How can I pass a list comprehension results into a csv using python

I have two lists with variable lengths
list1 = ['x1','x2','x3','x4','x5']
list2 = ['x5','x4']
I try the following to find the missing elements
*[item for item in list1 if item not in list2], sep='\n'
but if I do
item = *[item for item in skuslist if item not in retaillist], sep='\n'
csvwriter.writerow(item)
I get the error "can't assign to list comprehension".
How can I pass the results to writerow?
You can try like this:
import csv
list1 = ['x1','x2','x3','x4','x5']
list2 = ['x5','x4']
with open('output.csv', 'w') as f:
    writer = csv.writer(f, delimiter='\n', quoting=csv.QUOTE_NONE)
    writer.writerow([item for item in list1 if item not in list2])
You need to use writerows to write one item per line, and put each item in a 1-element list:
list1 = ['x1','x2','x3','x4','x5']
list2 = {'x5','x4'}
import csv
with open("test.csv","w",newline="") as f:
    cw = csv.writer(f)
    cw.writerows([x] for x in list1 if x not in list2)
Detail: list2 is created as a set of the values to exclude, because set membership tests are faster than list lookups (this matters as the number of elements grows).
Here is another way to accomplish this task. This method creates a set from the differences between list1 and list2. Because a set is unordered, the values are sorted before being written to the CSV file.
import csv
list1 = ['x1','x2','x3','x4','x5']
list2 = ['x5','x4']
# Obtain the differences between list1 and list2
list_difference = list(set(list1).difference(list2))
# Uses a list comprehension to write the values to a CSV file.
# Uses sorted to write the values in sorted order to the CSV file.
with open('output.csv', 'w', newline='') as outfile:
    csv_writer = csv.writer(outfile)
    csv_writer.writerows([[x] for x in sorted(list_difference)])
# No explicit outfile.close() is needed: the with block closes the file.
You can also do it this way, computing the difference inline.
import csv
list1 = ['x1','x2','x3','x4','x5']
list2 = ['x5','x4']
# Obtain the differences between list1 and list2.
# Uses a list comprehension to write the values to a CSV file.
# Uses sorted to write the values in sorted order to the CSV file.
with open('output.csv', 'w', newline='') as outfile:
    csv_writer = csv.writer(outfile)
    csv_writer.writerows([[x] for x in sorted(set(list1).difference(list2))])
# No explicit outfile.close() is needed: the with block closes the file.
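Note that the set-based variants write the values in sorted order, which may differ from their order in list1. Here is a small sketch that keeps the original order while still using a set for fast lookups. It writes to an in-memory buffer instead of a file. The names `exclude`, `missing`, and `buffer`, and the reordered sample list, are illustrative.

```python
import csv
import io

list1 = ['x3', 'x1', 'x5', 'x2', 'x4']  # deliberately not in sorted order
list2 = ['x5', 'x4']

exclude = set(list2)                              # fast membership tests
missing = [x for x in list1 if x not in exclude]  # keeps list1's order

buffer = io.StringIO(newline='')
csv.writer(buffer).writerows([x] for x in missing)  # one item per row
print(buffer.getvalue())
```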

what does this mean? stuff = [i.split() for i in row]

what does this mean in this context?
stuff = [i.split() for i in row]
import csv
with open('AB.csv', 'r') as ABfile:
    AB=csv.reader(ABfile,csv.excel)
    for row in AB:
        print(row)
        stuff = [i.split() for i in row]
        print(stuff)
this is the output
['qqq', 'qqq', 'sd3 3ds', '12/12/2012']
[['qqq'], ['qqq'], ['sd3', '3ds'], ['12/12/2012']]
This is a list comprehension. It is building the same list as
stuff = []
for i in row:
    stuff.append(i.split())
It's just a convenient and pythonic way to build a list.
The split method splits a string into a list on whitespace, examples:
>>> 'qqq'.split()
['qqq']
>>> 'sd3 3ds'.split()
['sd3', '3ds']
For each element in row, split is called and the resulting list is added to stuff. That's why you end up with a list of lists for stuff.
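The equivalence can be checked directly. Here is a minimal sketch using the row from the output above. The variable names `with_loop` and `with_comprehension` are just for illustration.

```python
row = ['qqq', 'qqq', 'sd3 3ds', '12/12/2012']

# Explicit loop
with_loop = []
for i in row:
    with_loop.append(i.split())

# List comprehension
with_comprehension = [i.split() for i in row]

print(with_loop == with_comprehension)
print(with_comprehension)
```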

Adding column in CSV python and enumerating it

my CSV looks like
John,Bomb,Dawn
3,4,5
3,4,5
3,4,5
I want to add ID column in front like so:
ID,John,Bomb,Dawn
1,3,4,5
2,3,4,5
3,3,4,5
using enumerate function, but I don't know how. Here's my code so far:
import csv
with open("testi.csv", 'rb') as input, open('temp.csv', 'wb') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    all = []
    row = next(reader)
    row.append('ID')
    all.append(row)
    count = 0
    for row in reader:
        count += 1
        while count:
            all.append(row)
            row.append(enumerate(reader, 1))
            break
    writer.writerows(all)
And the output comes all wrong:
John,Bomb,Dawn,ID
3,4,5,<enumerate object at 0x7fb2a5728d70>
3,4,5,<enumerate object at 0x1764370>
3,4,5,<enumerate object at 0x17643c0>
So the ID comes in the end, when it should be in the start, and it doesn't even do the 1,2,3. Some weird error comes out.
I can suggest the code below to solve your question:
import csv
# Python 3: open in text mode with newline='' (the 'rb'/'wb' modes in the question are Python 2 style)
with open("testi.csv", newline='') as input, open('temp.csv', 'w', newline='') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    all = []
    row = next(reader)
    row.insert(0, 'ID')
    all.append(row)
    for k, row in enumerate(reader):
        all.append([str(k+1)] + row)
    writer.writerows(all)
More compact code can be:
all = [['ID'] + next(reader)] + [[str(k+1)] + row for k, row in enumerate(reader)]
UPDATE (some explanation):
You have misunderstood how enumerate works. enumerate should be used in a for loop: when you iterate over its result, you get a sequence of tuples, where the first item is the index of the element and the second is the element itself.
But enumerate itself returns an enumerate object, so when you try to convert it to a string, its __repr__ method is called and you get <enumerate object at ...>.
In other words, enumerate helps to avoid additional counters in loops, such as your count += 1 variable.
Also, you have some very strange code here:
while count:
    all.append(row)
    row.append(enumerate(reader, 1))
    break
The body of this loop can never run more than once per row, because of the unconditional break.
You should use insert() instead of append(). This will allow you to specify the index where you want to add the element.
Try this
import csv
# Python 3: open in text mode with newline='' (the 'rb'/'wb' modes in the question are Python 2 style)
with open("testi.csv", newline='') as input, open('temp.csv', 'w', newline='') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    all = []
    row = next(reader)
    row.insert(0, 'ID')
    all.append(row)
    count = 0
    for row in reader:
        count += 1
        row.insert(0, count)
        all.append(row)
    writer.writerows(all)
You can do something like this:
import csv
with open('testi.csv', newline='') as inp, open('temp.csv', 'w', newline='') as out:
    reader = csv.reader(inp)
    writer = csv.writer(out, delimiter=',')
    # No need to use `insert()` or `append()`; simply use `+` to concatenate two lists.
    writer.writerow(['ID'] + next(reader))
    # Iterate over the enumerate object of reader, passing 1 as the starting index.
    writer.writerows([i] + row for i, row in enumerate(reader, 1))
enumerate() returns an enumerate object that yields the index and the item as a tuple, one at a time, so you need to iterate over the enumerate object instead of writing it to the CSV file.
>>> lst = ['a', 'b', 'c']
>>> e = enumerate(lst)
>>> e
<enumerate object at 0x1d48f50>
>>> for ind, item in e:
...     print(ind, item)
...
0 a
1 b
2 c
Output:
>>> !cat temp.csv
ID,John,Bomb,Dawn
1,3,4,5
2,3,4,5
3,3,4,5
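The same idea can be tried without touching the file system. Here is a sketch, assuming Python 3, that reads the sample data from a string and writes to an in-memory buffer. The helper name `add_id_column` is hypothetical.

```python
import csv
import io

def add_id_column(infile, outfile):
    """Copy CSV rows from infile to outfile, prepending a 1-based ID column."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator='\n')
    writer.writerow(['ID'] + next(reader))  # header row
    writer.writerows([i] + row for i, row in enumerate(reader, 1))

source = io.StringIO("John,Bomb,Dawn\n3,4,5\n3,4,5\n3,4,5\n")
target = io.StringIO()
add_id_column(source, target)
print(target.getvalue())
```

Because rows are written as they are read, nothing has to be collected in an intermediate list first.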

output items in a list using curly braces

I have a text file with 'n' lines. I want to extract first word, second word, third word, ... of each line into a list1, list2, list3,...
Suppose input txt file contains:
a1#a2#a3
b1#b2#b3#b4
c1#c2
After reading the file, Output should be:
List1: {a1,b1,c1}
List2: {a2,b2,c2}
List3: {a3,b3}
List4: {b4}
The code:
f = open('path','r')
for line in f:
    List=line.split('#')
    List1 = List[0]
    print '{0},'.format(List1),
    List2 = List[1]
    print '{0},'.format(List2),
    List3 = List[2]
    print '{0},'.format(List3),
    List4 = List[3]
    print '{0},'.format(List4),
OUTPUT
a1,b1,c1,a2,b2,c2,a3,b3,b4
You really don't want to use separate lists here; just use a list of lists. Using the csv module here would make handling splitting a little easier:
import csv
columns = [[] for _ in range(4)] # 4 columns expected
with open('path', newline='') as f:  # Python 3 text mode; use 'rb' in Python 2
    reader = csv.reader(f, delimiter='#')
    for row in reader:
        for i, col in enumerate(row):
            columns[i].append(col)
or, if the number of columns needs to grow dynamically:
import csv
columns = []
with open('path', newline='') as f:  # Python 3 text mode; use 'rb' in Python 2
    reader = csv.reader(f, delimiter='#')
    for row in reader:
        while len(row) > len(columns):
            columns.append([])
        for i, col in enumerate(row):
            columns[i].append(col)
Or you can use itertools.zip_longest() (izip_longest() in Python 2) to transpose the CSV rows:
import csv
from itertools import zip_longest
with open('path', newline='') as f:
    reader = csv.reader(f, delimiter='#')
    # zip_longest pads short rows with None; filter(None, ...) drops the padding
    columns = [list(filter(None, column)) for column in zip_longest(*reader)]
In the end, you can then print your columns with:
for i, col in enumerate(columns, 1):
    print('List{}: {{{}}}'.format(i, ','.join(col)))
Demo:
>>> import csv
>>> from itertools import zip_longest
>>> data = '''\
... a1#a2#a3
... b1#b2#b3#b4
... c1#c2
... '''.splitlines(True)
>>> reader = csv.reader(data, delimiter='#')
>>> columns = [list(filter(None, column)) for column in zip_longest(*reader)]
>>> for i, col in enumerate(columns, 1):
...     print('List{}: {{{}}}'.format(i, ','.join(col)))
...
List1: {a1,b1,c1}
List2: {a2,b2,c2}
List3: {a3,b3}
List4: {b4}
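One thing to be aware of: filter(None, ...) also drops empty strings, so a genuinely empty field (as in a1##a3) would disappear along with the padding. Here is a sketch, assuming Python 3, that pads with a private sentinel instead. The names `_MISSING` and `transpose_ragged` and the sample lines are hypothetical.

```python
import csv
from itertools import zip_longest

_MISSING = object()  # sentinel that cannot collide with real field values

def transpose_ragged(lines, delimiter='#'):
    """Turn ragged delimited rows into columns, keeping empty fields."""
    reader = csv.reader(lines, delimiter=delimiter)
    return [
        [value for value in column if value is not _MISSING]
        for column in zip_longest(*reader, fillvalue=_MISSING)
    ]

lines = ['a1##a3', 'b1#b2#b3#b4', 'c1#c2']  # first row has an empty field
columns = transpose_ragged(lines)
for i, col in enumerate(columns, 1):
    print('List{}: {{{}}}'.format(i, ','.join(col)))
```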
