I'm trying to modify one column in a CSV, changing it into multiple columns.
Given this CSV:
title,body,field_tag,field_titel
--------------------------------
"bladibla", "bla.....bla", "[""tag1"",""tag2"",""tag3"",""tag4""]", "bladiblabla"
"bladibla", "bla.....bla", "[""tag3"",""tag4"",""tag5"",""tag7"",""tag8"",""tag11""]", "bladiblabla"
What I want is this:
title,body,field_titel,field_tag,field_tag,field_tag,field_tag,field_tag,field_tag
--------------------------------
"bladibla","bla.....bla","bladiblabla","tag1,"tag2","tag3","tag4"
"bladibla","bla.....bla","bladiblabla","tag3,"tag4","tag5","tag7","tag8","tag11"
How can I achieve this in Python?
What I've tried so far is below, but it doesn't give the result I want.
import csv
import numpy

with open('tester.csv', 'r') as csvinput:
    with open('testeroutput.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)
        all = []
        rij = next(reader)
        for row in reader:
            # print row['field_tag']
            strlist = row[3]
            # remove [ and ]
            strlist = strlist.replace('[', '')
            strlist = strlist.replace(']', '')
            text = strlist.split(',')
            # make string of list
            for tag in text:
                str1 = ''.join(tag)
                print str1
                print(type(str1))
                row.append('field_tag')
                all.append(row)
                row.append(str1)
                all.append(row)
        writer.writerows(all)
Hope that you can point me in a better direction.
You can use this snippet:
import ast

for row in reader:
    row.extend(ast.literal_eval(row.pop(2)))
    writer.writerow(row)
row.pop(2) removes the third item from the row and returns it. ast.literal_eval() safely evaluates that third item as long as it contains "Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None."
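For reference, here is a complete runnable sketch built around that snippet; the filenames match the question, and skipinitialspace=True is an assumption to cope with the spaces after the commas in the sample input:

import ast
import csv

with open('tester.csv', 'r') as csvinput, open('testeroutput.csv', 'w') as csvoutput:
    reader = csv.reader(csvinput, skipinitialspace=True)
    writer = csv.writer(csvoutput, lineterminator='\n')
    header = next(reader)
    header.append(header.pop(2))  # move field_tag to the end to match the data rows
    writer.writerow(header)
    for row in reader:
        # replace the embedded list string with its individual tags
        row.extend(ast.literal_eval(row.pop(2)))
        writer.writerow(row)

Note that only one field_tag heading is written here, since the number of tags varies per row.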
Related
I have a list of approximately 500 strings that I want to check against a CSV file containing 25,000 rows. What I currently have seems to be getting stuck looping. I basically want to skip the row if it contains any of the strings in my string list and then extract other data.
stringList = []  # strings look like "AAA", "AAB", "AAC", etc.
with open('BadStrings.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    for row in filereader:
        stringToExclude = row[0]
        stringList.append(stringToExclude)

with open('OtherData.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    next(filereader, None)  # skip header row
    for row in filereader:
        for s in stringList:
            if s not in row:
                data1 = row[1]
Edit: Not an infinite loop, but looping is taking too long.
Following up on Niels' answer, I would change the second loop to iterate over the row itself and check whether the current row entry is inside the "bad" list:
for row in filereader:
    for s in row:
        if s not in stringList:
            data1 = row[0]
Also, I don't know what you want to do with data1, but note that you rebind it every time an item is not in stringList.
You could collect the items in a list instead with data1.append(item).
You could try something like this.
stringList = []  # strings look like "AAA", "AAB", "AAC", etc.
with open('BadStrings.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    for row in filereader:
        stringToExclude = row[0]
        stringList.append(stringToExclude)

data1 = []  # Right now you are overwriting data1 every time. You could, for example, collect all row[1] values in a list.
with open('OtherData.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    next(filereader, None)  # skip header row
    for row in filereader:
        found_s = False
        for s in stringList:
            if s in row:
                found_s = True
                break
        if not found_s:
            data1.append(row[1])  # Add row[1] to the list if no element of stringList is found in the row.
Still probably not a huge performance improvement, but at least the inner loop for s in stringList: will now stop as soon as a match is found.
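If that is still too slow, a further option is to turn stringList into a set, whose membership tests are O(1) on average, and test each whole row against it in one call. A minimal sketch of that idea, reusing stringList and data1 from the snippet above:

import csv

badStrings = set(stringList)

with open('OtherData.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    next(filereader, None)  # skip header row
    for row in filereader:
        # isdisjoint() is True when no cell of the row matches a bad string
        if badStrings.isdisjoint(row):
            data1.append(row[1])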
I have an array LiveTick = ['ted3m index','US0003m index','USGG3m index'] and I am reading a CSV file book1.csv. I have to find the rows in the CSV that contain these values.
For example, the 15th row will contain ted3m index 500 | 600, the 20th row will contain US0003m index 800 | 900, and so on.
I then have to get the values contained in each matching row and parse them for each value in the LiveTick array. How do I proceed? Below is my sample code:
with open('C:\\blp\\book1.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf)
    for row in reader:
        for list in LiveTick:
            if list in row:
                print('Found: {}'.format(row))
You can use pandas; it's pretty fast and will do all the reading, writing and filtering for you out of the box:
import pandas as pd
df = pd.read_csv('C:\\blp\\book1.csv')
filtered_df = df[df['your_column_name'].isin(LiveTick)]
# now you can save it
filtered_df.to_csv('C:\\blp\\book_filtered.csv')
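One caveat: isin() matches whole cell values exactly. If the LiveTick strings only appear as part of a longer cell (as in ted3m index 500 | 600 above), a substring filter with str.contains may be closer to what you need; your_column_name is still a placeholder here:

import re

pattern = '|'.join(re.escape(t) for t in LiveTick)  # regex alternation of the tickers
filtered_df = df[df['your_column_name'].str.contains(pattern, case=False, na=False)]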
You have the right idea, but there are a few improvements you can make:
Instead of a nested for loop, which doesn't short-circuit, use any() to compare the first column against multiple values.
Write to your csv as you go along instead of just printing. This is memory-efficient, as you hold only one line in memory at any one time.
Define outf as an open file object in your with statement.
Do not shadow built-in list. Use another identifier, e.g. i, for elements in LiveTick.
Here's a demo:
import csv

with open('in.csv', 'r') as f, open('out.csv', 'w', newline='') as outf:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf, delimiter=',')
    for row in reader:
        if any(i in row[0] for i in LiveTick):
            writer.writerow(row)
I would like to split the strings from my list into multiple lists.
For example, rows[1] should be split into another list contained in the list m.
I saw this approach here, and the result should be accessible as m[0][0] to get the first item of the first list.
import csv

reader = csv.reader(open("alerts.csv"), delimiter=',')
rows = []
for row in reader:
    rows.append(row)
num_lists = int(len(rows))
lists = []
m = []
for x in rows:
    m.append(x.split(';')[0])
Printing rows gives:
[['priority;status;time;object_class;host;app;inc;tool;msg'], ['P2;CLOSED;24-09-2016 20:06:41;nm;prod;;390949;HPNNM;call'], ['P2;CLOSED;24-09-2016 20:06:41;nm;prod;;390949;HPNNM;msg'], ['P2;CLOSED;24-09-2016 20:06:41;nm;prod;;390949;HPNNM;msg']]
and the output should look like this:
m[0][0] should return priority
You can do this pretty easily with pandas:
import pandas as pd

A = pd.read_csv('yourfile.csv')
for x in A.values:
    for y in x:
        print y
The print y statement accesses each element in the row, but after the for x in A.values loop you can do just about anything with the values.
Exact solution to your question; you almost got it right (note the delimiter value):
reader = csv.reader(open("alerts.csv"), delimiter=';')
table = [row for row in reader]
print(table[0][0])
>>> priority
For easy data handling, it is often nice to explicitly extract the header like so:
reader = csv.reader(open("alerts.csv"), delimiter=';')
header = next(reader)
table = [row for row in reader]
print(header[0])
print(table[0][0])
>>> priority
>>> P2
Here's how to do it:
import csv

with open('alerts.csv') as f:
    reader = csv.reader(f, delimiter=';')
    next(reader)  # skip over the first header row
    rows = [row for row in reader]
>>> print(rows[0][0])
P2
This uses a list comprehension to read all rows from the CSV file into a list. The delimiter should be a semicolon, not a comma, so use delimiter=';'. Also, the first row is a header and is therefore skipped.
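If you would rather access the columns by name instead of by index, csv.DictReader handles the header row for you. A small sketch along the same lines:

import csv

with open('alerts.csv') as f:
    rows = list(csv.DictReader(f, delimiter=';'))

print(rows[0]['priority'])  # P2
print(rows[0]['status'])    # CLOSED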
I have two CSV files that are like
CSV1
H1,H2,H3
arm,biopsy,forearm
heart,leg biopsy,biopsy
organs.csv
arm
leg
forearm
heart
skin
I need to compare both files and get an output list like this: [arm, forearm, heart, leg]. The script that I'm currently working on doesn't give me any output (I want leg in the output too, even though it shares a cell with biopsy). Here's the code so far. How can I get all the matched words?
import csv
import io

alist, blist = [], []

with open("csv1.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        alist.append(row)

with open("organs.csv", "rb") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist.append(row)

first_set = set(map(tuple, alist))
secnd_set = set(map(tuple, blist))
matches = set(first_set).intersection(secnd_set)
print matches
Try this:
import csv

alist, blist = [], []

with open("csv1.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        for row_str in row:
            alist += row_str.strip().split()

with open("organs.csv", "rb") as fileB:
    reader = csv.reader(fileB, delimiter=',')
    for row in reader:
        blist += row

first_set = set(alist)
second_set = set(blist)
print first_set.intersection(second_set)
Basically, iterating through the csv file via the csv reader returns each row as a list of strings, like ['arm', 'biopsy', 'forearm'], so you have to concatenate the lists to collect all of the items.
To remove duplicates, a single conversion via the set() function is enough, and the intersection() method returns another set with the common elements.
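To make those two steps concrete, here is what the lists hold for the sample files above and what the intersection returns (sets are unordered, so the display order may vary):

alist = ['H1', 'H2', 'H3', 'arm', 'biopsy', 'forearm', 'heart', 'leg', 'biopsy', 'biopsy']
blist = ['arm', 'leg', 'forearm', 'heart', 'skin']
print(set(alist).intersection(set(blist)))  # {'arm', 'leg', 'forearm', 'heart'}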
Change the part reading from csv1.csv to:
with open("csv1.csv", "rb") as fileA:
reader = csv.reader(fileA, delimiter=',')
for row in reader:
# append all words in cell
for word in row:
alist.append(word)
I would treat the CSV files as text files, get lists of all the words in the first and the second, then iterate over the first list to see if any exactly match any word in the second list.
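A sketch of that text-based approach, assuming words are only ever separated by commas, quotes or whitespace:

import re

def words_in(path):
    # split the whole file on commas, quotes and whitespace; drop empty strings
    with open(path) as f:
        return set(re.split(r'[,"\s]+', f.read())) - {''}

matches = words_in('csv1.csv') & words_in('organs.csv')
print(matches)  # {'arm', 'forearm', 'heart', 'leg'}, order may vary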
I have a text file containing key-value pairs, with the last two key-value pairs containing JSON-like objects that I would like to split out into columns and write with the other values, using the keys as column headings. The first three rows of the data file input.txt look like this:
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::44.6743867864386,Length3dCenterToCenter::44.6768028159989,Tag::<NULL>,{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::57.8689351603823,Length3dCenterToCenter::57.8700464193429,Tag::<NULL>,{StartPoint::7793.52927597915[%2C]1680.91224357457[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.43363070193163}
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::68.7161350545728,Length3dCenterToCenter::68.7172034962765,Tag::<NULL>,{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7793.52927597915[%2C]1680.91224357457[%2C]-3.45819643838485}
and we eventually came up with something that worked, but there must be a much better way:
import csv

with open('input.txt', 'rb') as fin, open('output.csv', 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, line in enumerate(reader):
        mysplit = [item.split('::') for item in line if item.strip()]
        if not mysplit:  # blank line
            continue
        keys, vals = zip(*mysplit)
        start_vals = [item.split('[%2C]') for item in mysplit[-2]]
        end_vals = [item.split('[%2C]') for item in mysplit[-1]]
        a = list(keys[0:-2])
        a.extend(['start1', 'start2', 'start3', 'end1', 'end2', 'end3'])
        b = list(vals[0:-2])
        b.append(start_vals[1][0])
        b.append(start_vals[1][1])
        b.append(start_vals[1][2][:-1])
        b.append(end_vals[1][0])
        b.append(end_vals[1][1])
        b.append(end_vals[1][2][:-1])
        if i == 0:
            # if first line: write header
            writer.writerow(a)
        writer.writerow(b)
which produces the output file output.csv that looks like this:
InnerDiameterOrWidth,InnerHeight,Length2dCenterToCenter,Length3dCenterToCenter,Tag,start1,start2,start3,end1,end2,end3
0.1,0.1,44.6743867864386,44.6768028159989,<NULL>,7858.35924983374,1703.69341358077,-3.075,7822.85045874375,1730.80294308742,-3.53962362760298
0.1,0.1,57.8689351603823,57.8700464193429,<NULL>,7793.52927597915,1680.91224357457,-3.075,7822.85045874375,1730.80294308742,-3.43363070193163
0.1,0.1,68.7161350545728,68.7172034962765,<NULL>,7858.35924983374,1703.69341358077,-3.075,7793.52927597915,1680.91224357457,-3.45819643838485
We don't want to write code like this in the future.
What is the best way to read data like this?
I'd use:
from itertools import chain
import csv

_header_translate = {
    'StartPoint': ('start1', 'start2', 'start3'),
    'EndPoint': ('end1', 'end2', 'end3')
}

def header(col):
    header = col.strip('{}').split('::', 1)[0]
    return _header_translate.get(header, (header,))

def cleancolumn(col):
    col = col.strip('{}').split('::', 1)[1]
    return col.split('[%2C]')

def chainedmap(func, row):
    return list(chain.from_iterable(map(func, row)))

with open('input.txt', 'rb') as fin, open('output.csv', 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, row in enumerate(reader):
        if not i:  # first row, write header first
            writer.writerow(chainedmap(header, row))
        writer.writerow(chainedmap(cleancolumn, row))
The cleancolumn function takes any of your columns and returns a list (possibly with only one value) after removing the braces, removing everything up to and including the first ::, and splitting on the embedded [%2C] comma marker. By using itertools.chain.from_iterable() we turn the series of lists generated from the columns into one flat list again for the csv writer.
When handling the first line we generate one header row from the same columns, replacing the StartPoint and EndPoint headers with the six expanded headers.
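To illustrate with the sample data above, here is what the two helpers return for a single StartPoint column:

col = '{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075}'
print(header(col))       # ('start1', 'start2', 'start3')
print(cleancolumn(col))  # ['7858.35924983374', '1703.69341358077', '-3.075']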