CSV split rows into lists

CSV split rows into lists - python

I would like to split each string in a list into multiple lists.
For example, rows[1] should be split into another list contained in a list m.
I saw this done here, and it should be accessible as m[0][0] to get the first item from the first list.
import csv
reader = csv.reader(open("alerts.csv"), delimiter=',')
rows=[]
for row in reader:
    rows.append(row)
num_lists=int(len(rows))
lists=[]
m=[]
for x in rows:
    m.append(x.split(';')[0])
Printing rows gives:
[['priority;status;time;object_class;host;app;inc;tool;msg'], ['P2;CLOSED;24-09-2016 20:06:41;nm;prod;;390949;HPNNM;call'], ['P2;CLOSED;24-09-2016 20:06:41;nm;prod;;390949;HPNNM;msg'], ['P2;CLOSED;24-09-2016 20:06:41;nm;prod;;390949;HPNNM;msg']]
The output should be such that m[0][0] returns priority.

You can do this pretty easily with pandas:
import pandas as pd
# the file in the question is semicolon-delimited, so pass sep=';'
A = pd.read_csv('yourfile.csv', sep=';')
for x in A.values:
    for y in x:
        print(y)
The print(y) statement accesses each element in the row, but after "for x in A.values" you can do just about anything with x.

Exact solution to your question; you almost got it right (note the delimiter value):
reader = csv.reader(open("alerts.csv"), delimiter=';')
table = [row for row in reader]
print(table[0][0])
>>> priority
For easy data handling, it is often nice to explicitly extract the header like so:
reader = csv.reader(open("alerts.csv"), delimiter=';')
header = next(reader)
table = [row for row in reader]
print(header[0])
print(table[0][0])
>>> priority
>>> P2

Here's how to do it:
import csv
with open('alerts.csv') as f:
    reader = csv.reader(f, delimiter=';')
    next(reader) # skip over the first header row
    rows = [row for row in reader]
>>> print(rows[0][0])
P2
This uses a list comprehension to read all rows from the CSV file into a list. The delimiter should be a semi-colon, not a comma; so use delimiter=';'. Also the first row is a header and is therefore skipped.
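As a self-contained sketch of the approach in the answers above, the sample rows from the question can be parsed from an in-memory string instead of alerts.csv. The SAMPLE string and the records list of dicts are illustrative additions, not part of the original answers.

```python
import csv
import io

# Stand-in for alerts.csv, copied from the rows printed in the question.
SAMPLE = (
    "priority;status;time;object_class;host;app;inc;tool;msg\n"
    "P2;CLOSED;24-09-2016 20:06:41;nm;prod;;390949;HPNNM;call\n"
    "P2;CLOSED;24-09-2016 20:06:41;nm;prod;;390949;HPNNM;msg\n"
)

# The data is semicolon-delimited, so the reader must be told so.
reader = csv.reader(io.StringIO(SAMPLE), delimiter=";")
m = list(reader)

# Optional: split the header off and pair it with each data row.
header, data = m[0], m[1:]
records = [dict(zip(header, row)) for row in data]

print(m[0][0])             # first cell of the header row
print(m[1][0])             # first cell of the first data row
print(records[0]["tool"])  # lookup by column name
```

Keeping the header in m[0] matches what the question asks for (m[0][0] is the first header cell); splitting it off is only a convenience when rows are looked up by column name.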

Related

Python modify column in a CSV to multiple columns

I'm trying to modify one column in a CSV, to change it to multiple columns.
Hence this CSV:
title,body,field_tag,field_titel
--------------------------------
"bladibla", "bla.....bla", "[""tag1"",""tag2"",""tag3"",""tag4""]", "bladiblabla"
"bladibla", "bla.....bla", "[""tag3"",""tag4"",""tag5"",""tag7"",""tag8"",""tag11""]", "bladiblabla"
What I want is this:
title,body,field_titel,field_tag,field_tag,field_tag,field_tag,field_tag,field_tag
--------------------------------
"bladibla","bla.....bla","bladiblabla","tag1","tag2","tag3","tag4"
"bladibla","bla.....bla","bladiblabla","tag3","tag4","tag5","tag7","tag8","tag11"
How to achieve this in Python?
This is what I've tried so far, but it does not give the result I want.
import csv
import numpy
with open('tester.csv','r') as csvinput:
    with open('testeroutput.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)
        all = []
        rij = next(reader)
        for row in reader:
            # print row['field_tag']
            strlist = row[3]
            #remove [ and ]
            strlist = (strlist.replace('[', ''))
            strlist = (strlist.replace(']', ''))
            text = strlist.split(',')
            #make string of list
            for tag in text:
                str1 = ''.join(tag)
                print str1
                print(type(str1))
                row.append('field_tag')
                all.append(row)
                row.append(str1)
                all.append(row)
        writer.writerows(all)
Hope that you can point me in a better direction.

Utilize this snippet:
import ast
for row in reader:
    row.extend(ast.literal_eval(row.pop(2)))
    writer.writerow(row)
row.pop(2) removes the third item from the row and returns it. ast.literal_eval() safely evaluates that third item as long as it contains "Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None."
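A fuller sketch of the same idea, run against the sample from the question held in an in-memory string. The expand_tags helper, the SAMPLE string, and the header handling are illustrative assumptions; skipinitialspace=True is used because the sample has a space after each comma, which would otherwise stop the csv module from recognising the quoted fields. The dashed separator line from the question is left out.

```python
import ast
import csv
import io

# Hypothetical stand-in for tester.csv, based on the sample in the question.
SAMPLE = (
    'title,body,field_tag,field_titel\n'
    '"bladibla", "bla.....bla", "[""tag1"",""tag2"",""tag3"",""tag4""]", "bladiblabla"\n'
    '"bladibla", "bla.....bla", "[""tag3"",""tag4"",""tag5"",""tag7"",""tag8"",""tag11""]", "bladiblabla"\n'
)

def expand_tags(text, tag_index=2):
    """Move the list-valued column to the end, one tag per cell."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader)
    tag_name = header.pop(tag_index)
    rows = []
    for row in reader:
        # literal_eval turns the text '["tag1","tag2"]' into a real list
        tags = ast.literal_eval(row.pop(tag_index))
        rows.append(row + list(tags))
    # repeat the tag column name once per tag in the longest row
    max_tags = max((len(r) - len(header) for r in rows), default=0)
    return header + [tag_name] * max_tags, rows

new_header, new_rows = expand_tags(SAMPLE)

out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(new_header)
writer.writerows(new_rows)
print(out.getvalue())
```

Rows with fewer tags than the longest row simply end up shorter than the header, which mirrors the desired output shown in the question.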

Check for unique elements of csv

I would like to check for duplicates in a .csv (structure below). Every value in this .csv has to be unique. For example, "a" appears three times, but it should be there only once.
###start
a
a;b;
d;e
f;g
h
i;
i
d;b
a
c;i
### end
The progress so far:
import os,glob
import csv
folder_path = "csv_entities/"
found_rows = set()
for filepath in glob.glob(os.path.join(folder_path, "*.csv")):
    with open(filepath) as fin, open("newfile.csv", "w") as fout:
        reader = csv.reader(fin, delimiter=";")
        writer = csv.writer(fout, delimiter=";")
        for row in reader:
            # delete empty list elements
            if "" in row:
                row = row[:-1]
            # delete empty row
            if not row:
                continue
            row = tuple(row) # make row hashable
            # don't write if row is there already!
            if row in found_rows:
                continue
            print(row)
            writer.writerow(row)
            found_rows.add(row)
Which results in this csv:
###start
a
a;b
d;e
f;g
h
i
d;b
c;i
###end
The most important question right now is: how can I get rid of the duplicate values?
E.g. in the second row there should be only "b" instead of "a;b", because "a" is already in the row before.

Your mistake is to treat the rows themselves as the unique elements. You have to treat the cells as the elements.
So use your marker set to mark cells, not rows.
Example with only one input file (using several input files with only one output file makes no sense):
import csv
found_values = set()
with open("input.csv") as fin, open("newfile.csv", "w",newline="") as fout:
    reader = csv.reader(fin, delimiter=";")
    writer = csv.writer(fout, delimiter=";")
    for row in reader:
        # delete empty list elements & filter out already seen elements
        new_row = [x for x in row if x and x not in found_values]
        # update marker set with row contents
        found_values.update(row)
        if new_row:
            # new row isn't empty: write it
            writer.writerow(new_row)
The resulting csv file is:
a
b
d;e
f;g
h
i
c
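The same cell-level filtering can be sketched as a small function over in-memory text, which makes it easy to try against the sample from the question. The unique_cells name and the SAMPLE string are illustrative assumptions. One deliberate difference from the answer above: the marker set is updated cell by cell, so a value repeated within a single row (for example "a;a") is also written only once.

```python
import csv
import io

# The sample from the question, without the ###start / ###end markers.
SAMPLE = "a\na;b;\nd;e\nf;g\nh\ni;\ni\nd;b\na\nc;i\n"

def unique_cells(text, delimiter=";"):
    """Return CSV text in which every non-empty cell value appears once."""
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        new_row = []
        for cell in row:
            # skip empty cells and anything already written earlier
            if cell and cell not in seen:
                seen.add(cell)
                new_row.append(cell)
        if new_row:
            # only write rows that still contain something
            writer.writerow(new_row)
    return out.getvalue()

result = unique_cells(SAMPLE)
print(result)
```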

Need help in finding the row of CSV which contains the values in array

I have an array LiveTick = ['ted3m index','US0003m index','USGG3m index'] and I am reading a CSV file book1.csv. I have to find the row which contains the values in csv.
For example, 15th row will contain ted3m index 500 | 600 and 20th row will contain US0003m index 800 | 900 and likewise.
I then have to get the values contained in the row and parse it for each value contained in array LiveTick. How do I proceed? Below is my sample code:
with open('C:\\blp\\book1.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf)
    for row in reader:
        for list in LiveTick:
            if list in row:
                print ('Found: {}'.format(row))

You can use pandas. It's pretty fast and handles the reading, writing and filtering for you out of the box:
import pandas as pd
df = pd.read_csv('C:\\blp\\book1.csv')
filtered_df = df[df['your_column_name'].isin(LiveTick)]
# now you can save it
filtered_df.to_csv('C:\\blp\\book_filtered.csv')

You have the right idea, but there are a few improvements you can make:
Instead of a nested for loop which doesn't short-circuit, use any to compare the first column to multiple values.
Write to your csv as you go along instead of just printing. This is memory-efficient, as you hold only one line in memory at any one time.
Define outf as an open object in your with statement.
Do not shadow built-in list. Use another identifier, e.g. i, for elements in LiveTick.
Here's a demo:
import csv
# text mode ('w'): newline='' cannot be combined with binary mode 'wb'
with open('in.csv', 'r', newline='') as f, open('out.csv', 'w', newline='') as outf:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf, delimiter=',')
    for row in reader:
        if any(i in row[0] for i in LiveTick):
            writer.writerow(row)
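To try the any() filter without touching the file system, here is a minimal sketch over in-memory text. The SAMPLE layout (name in the first column, then two values) and the matching_rows helper are assumptions, since the question does not show the real structure of book1.csv.

```python
import csv
import io

LIVE_TICK = ['ted3m index', 'US0003m index', 'USGG3m index']

# Hypothetical stand-in for book1.csv; the real column layout is unknown.
SAMPLE = (
    "name,bid,ask\n"
    "ted3m index,500,600\n"
    "other index,1,2\n"
    "US0003m index,800,900\n"
)

def matching_rows(text, wanted):
    """Return rows whose first cell contains any of the wanted strings."""
    reader = csv.reader(io.StringIO(text), delimiter=',')
    # `row and ...` skips blank lines, for which csv.reader yields [];
    # any() stops at the first wanted string that matches.
    return [row for row in reader if row and any(w in row[0] for w in wanted)]

found = matching_rows(SAMPLE, LIVE_TICK)
for row in found:
    print(row)
```

Note that `w in row[0]` is a substring test. If the first cell must equal one of the names exactly, `row[0] in wanted` would be the stricter check.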

Filtering data out of Excel file via CSV

Is there any alternative to writing multiple for loops?
I have an Excel file:
|col1|col2|col3|
|1   |x   |y   |
|2   |s   |r   |
|3   |o   |o   |
I want an output like this: When first column argument equals 1, print argument from column 3 from the same row.
import csv
reader = csv.reader(open("alerts.csv"), delimiter=',')
rows=[]
for row in reader:
    rows.append(row)
    for i in row:
        x"i"?=row[i].split(";")
I'm trying to figure out a function that would make another list with the split information from row[i], but I feel that won't work.

import pandas as pd
reader = pd.read_csv("alerts.csv")
print(reader[['col1','col3']].loc[reader['col1'] == 1])
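For comparison, a standard-library sketch of the same filter using csv.DictReader. It assumes the sheet was exported as a comma-delimited file with a col1,col2,col3 header row; the SAMPLE string and the column_where helper are hypothetical.

```python
import csv
import io

# Hypothetical CSV export of the table in the question.
SAMPLE = "col1,col2,col3\n1,x,y\n2,s,r\n3,o,o\n"

def column_where(text, match_col, match_value, want_col):
    """Return want_col for every row whose match_col equals match_value."""
    reader = csv.DictReader(io.StringIO(text))
    # csv yields strings, so match_value must be a string too ("1", not 1)
    return [row[want_col] for row in reader if row[match_col] == match_value]

result = column_where(SAMPLE, "col1", "1", "col3")
print(result)
```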

So I have a different idea: sort everything out into different lists.
import csv
reader = csv.reader(open("alerts.csv"), delimiter=',')
rows=[]
for row in reader:
    rows.append(row)
num_lists=int(len(rows))
lists = []
for p in range(num_lists):
    lists.append([])
for i in rows:
    lists[i][i]=row[i].split(";")
print (lists[0][0])
But that does not seem to work :<

Compare two CSV files and look for matches Python

I have two CSV files that are like
CSV1
H1,H2,H3
arm,biopsy,forearm
heart,leg biopsy,biopsy
organs.csv
arm
leg
forearm
heart
skin
I need to compare both the files and get an output list like this [arm,forearm,heart,leg] but the script that I'm currently working on doesn't give me any output (I want leg also in the output, though it is mixed with biopsy in the same cell). Here's the code so far. How can I get all the matched words?
import csv
import io
alist, blist = [], []
with open("csv1.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        alist.append(row)
with open("organs.csv", "rb") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist.append(row)
first_set = set(map(tuple, alist))
secnd_set = set(map(tuple, blist))
matches = set(first_set).intersection(secnd_set)
print matches

Try this:
import csv
alist, blist = [], []
# Python 3: open csv files in text mode with newline=""
with open("csv1.csv", newline="") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        for row_str in row:
            # split each cell into words, so "leg biopsy" yields "leg" too
            alist += row_str.strip().split()
with open("organs.csv", newline="") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist += row
first_set = set(alist)
second_set = set(blist)
print(first_set.intersection(second_set))
Basically, iterating over the file with csv.reader yields each row as a list of strings, like ['arm', 'biopsy', 'forearm'], so you have to concatenate those lists to collect all of the items.
Converting each list with set() once is enough to remove duplicates, and the intersection method returns another set with the common elements.

Change the part reading from csv1.csv to:
with open("csv1.csv", newline="") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        for cell in row:
            # append all words in cell
            alist.extend(cell.split())

I would treat the CSV files as text files, get a list of all the words in the first and in the second, then iterate over the first list to see whether each word exactly matches any word in the second list.
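A minimal sketch of that text-based approach, using the two samples from the question as in-memory strings. The words helper and the choice to split on commas and whitespace are assumptions about what counts as a "word" here.

```python
import re

# Contents of the two files from the question, as plain text.
CSV1_TEXT = "H1,H2,H3\narm,biopsy,forearm\nheart,leg biopsy,biopsy\n"
ORGANS_TEXT = "arm\nleg\nforearm\nheart\nskin\n"

def words(text):
    """Split raw text into words on commas and any whitespace."""
    return re.findall(r"[^\s,]+", text)

organ_words = set(words(ORGANS_TEXT))

# Walk the first list in order and keep each exact match once.
matches = []
for word in words(CSV1_TEXT):
    if word in organ_words and word not in matches:
        matches.append(word)

print(matches)
```

Because the cells are split into words before comparing, "leg biopsy" contributes "leg" as a match, which is what the question asks for.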
