This is my CSV file:
name,country,code
Georgina,Saint Helena,ET
Brooks,Austria,LR
Rosaline,Peru,DZ
How can I get a particular row's data without looping through the whole CSV file?
I'm looking for something like the following syntax:
If searchName exists in the csv, extract the data:
import csv

searchName = 'Brooks'
with open('name.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        if row['name'] == searchName:
            print(row['name'] + ' >> ' + row['country'])
Thanks
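Since a plain CSV file has no index, some scanning is unavoidable, but you can stop reading at the first match instead of walking every row. A minimal sketch using next() on a generator (the inline data string stands in for name.csv):

```python
import csv
import io

# Inline sample standing in for name.csv
data = "name,country,code\nGeorgina,Saint Helena,ET\nBrooks,Austria,LR\nRosaline,Peru,DZ\n"

search_name = 'Brooks'
reader = csv.DictReader(io.StringIO(data))
# next() stops consuming the reader as soon as one row matches
match = next((row for row in reader if row['name'] == search_name), None)
if match is not None:
    print(match['name'], '>>', match['country'])
```

The reader is left positioned just after the matching row, so later rows are never parsed.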
Update: a pandas solution for those who are interested.
import pandas as pd

df = pd.read_csv('a.csv')
select_row = df.loc[df['name'] == 'Brooks']
if select_row.empty:
    print('No records')
else:
    print('Print Record')
    print(select_row.country)
Get first instance
import re

search_name = 'Brooks'
with open('name.csv', 'r') as file:
    output = re.search(f'{search_name}.*', file.read())
row = output.group().split(',')
print(row[0], '>>', row[1])
Get all instances
import re

search_name = 'Brooks'
with open('name.csv', 'r') as file:
    output = re.findall(f'{search_name}.*', file.read())
for row in output:
    items = row.split(',')
    print(items[0], '>>', items[1])
Using DataFrames
import pandas as pd
search_name = 'Brooks'
df = pd.read_csv('name.csv')
output = df[df.name == search_name].iloc[0]
print(output['name'], '>>', output['country'])
You could try using pandas to make your life easier. Try something like this:
import pandas as pd

df = pd.read_csv('name.csv')
if df.iloc[5, 6]:
    # execute condition
    pass
else:
    # execute another condition
    pass
I have given you an outline; you can build on this to come up with a solution for your issue.
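To make the outline concrete: a boolean mask (rather than positional iloc indexing) is the usual way to test whether a matching row exists. A sketch assuming the three-column file from the question, with the data inlined:

```python
import io
import pandas as pd

# Inline sample standing in for name.csv
data = "name,country,code\nGeorgina,Saint Helena,ET\nBrooks,Austria,LR\nRosaline,Peru,DZ\n"
df = pd.read_csv(io.StringIO(data))

# The mask keeps only the rows whose name matches
match = df[df['name'] == 'Brooks']
if not match.empty:
    result = match.iloc[0]['country']
    print(result)
else:
    result = None
    print('No records')
```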
Although a DataFrame seems to be the best option, if you treat the csv as a simple text file, this should help you:
import re

searchName = 'Brooks'
with open('name.csv', 'r') as f:
    foo = f.read()
items = re.findall(f"{searchName}.*$", foo, re.MULTILINE)
print(items)
Output:
['Brooks,Austria,LR']
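If you will be looking up many names against the same file, it can pay to read it once into a dict keyed by name; every later lookup is then O(1) with no rescan. A sketch assuming names are unique (the inline string stands in for name.csv):

```python
import csv
import io

# Inline sample standing in for name.csv
data = "name,country,code\nGeorgina,Saint Helena,ET\nBrooks,Austria,LR\nRosaline,Peru,DZ\n"

# One pass builds the index; later lookups never re-read the file
index = {row['name']: row for row in csv.DictReader(io.StringIO(data))}

row = index.get('Brooks')
if row is not None:
    print(row['name'], '>>', row['country'])
```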
Related
You may think of this as yet another redundant question, but I have tried to go through all the similar questions asked, with no luck so far. In my specific use case, I can't use pandas or any other similar library for this operation.
This is what my input looks like:
AttributeName,Value
Name,John
Gender,M
PlaceofBirth,Texas
Name,Alexa
Gender,F
SurName,Garden
This is my expected output
Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,
So far, I have tried to store my input in a dictionary and then write it out as a csv string. But it is failing, as I am not sure how to handle the missing-column cases. Here is my code so far:
reader = csv.reader(csvstring.split('\n'), delimiter=',')
csvdata = {}
csvfile = ''
for row in reader:
    if row[0] != '' and row[0] in csvdata and row[1] != '':
        csvdata[row[0]].append(row[1])
    elif row[0] != '' and row[0] in csvdata and row[1] == '':
        csvdata[row[0]].append(' ')
    elif row[0] != '' and row[1] != '':
        csvdata[row[0]] = [row[1]]
    elif row[0] != '' and row[1] == '':
        csvdata[row[0]] = [' ']

for key, value in csvdata.items():
    if value == ' ':
        csvdata[key] = []

csvfile += ','.join(csvdata.keys()) + '\n'
for row in zip(*csvdata.values()):
    csvfile += ','.join(row) + '\n'
For the above code as well, I took some help here. Thanks in advance for any suggestions/advice.
Edit #1: Updated the code to make clear that I am processing a csv string instead of a csv file.
What you need is something like this:
import csv

with open("in.csv") as infile:
    buffer = []
    item = {}
    lines = csv.reader(infile)
    for line in lines:
        if line[0] == 'Name':
            buffer.append(item.copy())
            item = {'Name': line[1]}
        else:
            item[line[0]] = line[1]
    buffer.append(item.copy())

for item in buffer[1:]:
    print(item)
If none of the attributes is mandatory, I think #framontb's solution needs to be rearranged in order to also work when the Name field is not given.
This is an import-free solution, and it's not super elegant.
I assume you already have the lines in this form, with these columns:
lines = [
    "Name,John",
    "Gender,M",
    "PlaceofBirth,Texas",
    "Gender,F",
    "Name,Alexa",
    "Surname,Garden"  # modified typo here: SurName -> Surname
]
cols = ["Name", "Gender", "Surname", "PlaceofBirth"]
We need to distinguish one record from another, and without mandatory fields the best I can do is start considering a new record when an attribute has already been seen.
To do this, I use a temporary list of attributes tempcols from which I remove elements until an error is raised, i.e. new record.
Code:
csvdata = {k: [] for k in cols}
tempcols = list(cols)
for line in lines:
    attr, value = line.split(",")
    try:
        csvdata[attr].append(value)
        tempcols.remove(attr)
    except ValueError:
        for c in tempcols:  # now tempcols has only "missing" attributes
            csvdata[c].append("")
        tempcols = [c for c in cols if c != attr]

for c in tempcols:
    csvdata[c].append("")
# write csv string with the code you provided
# write csv string with the code you provided
csvfile = ""
csvfile += ",".join(csvdata.keys()) + "\n"
for row in zip(*csvdata.values()):
    csvfile += ",".join(row) + "\n"
>>> print(csvfile)
Name,PlaceofBirth,Surname,Gender
John,Texas,,M
Alexa,,Garden,F
While, if you want to sort columns according to your desired output:
csvfile = ""
csvfile += ",".join(cols) + "\n"
for row in zip(*[csvdata[k] for k in cols]):
    csvfile += ",".join(row) + "\n"
>>> print(csvfile)
Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,
This works for me:
import csv

with open("in.csv") as infile, open("out.csv", "w", newline="") as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    next(incsv)  # skip the header row
    outcsv.writerows(zip(*incsv))
Update: For input and output as strings:
import csv, io

with io.StringIO(indata) as infile, io.StringIO() as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    next(incsv)  # skip the header row
    outcsv.writerows(zip(*incsv))
    print(outfile.getvalue())
I am trying to convert CSV file to JSON.
CSV File:
id,name,email
1,jim,test#gmail.com
1,jim,test2#gmail.com
2,kim,test3#gmail.com
Expected output
{"row" : {"id":1,"name":"jim","email": ["test#gmail.com","test2#gmail.com"]}},
{"row" : {"id":2,"name":"kim","email": "test3#gmail.com"}}
Here is a slightly bulky implementation:
import csv
import json

with open('data.csv') as csvfile:
    reader = csv.reader(csvfile)
    # Get headers
    headers = next(reader, None)
    result = {}
    for row in reader:
        # Combine header and line to get a dict
        data = dict(zip(headers, row))
        if data['id'] not in result:
            data.update({'email': [data.pop('email')]})
            result[data['id']] = data
        else:
            # Be aware: this asserts that the id and name fields are consistent
            assert data['name'] == result[data['id']]['name']
            result[data['id']]['email'].append(data['email'])

for rec in result.values():
    try:
        # try to unpack as a single value and if it fails leave as is
        rec['email'], = rec['email']
    except ValueError:
        pass
    print(json.dumps({'row': rec}))
You can use pandas to do this:
import pandas as pd
df = pd.read_csv('test.csv', index_col=None)
print(df)
#Output
   id name            email
0   1  jim   test#gmail.com
1   1  jim  test2#gmail.com
2   2  kim  test3#gmail.com
df1 = df.groupby(['id', 'name'])['email'].apply(list).reset_index()
df_json = df1.to_json(orient='index')
print(df_json)
#Output:
{"0":{"id":1,"name":"jim","email":["test#gmail.com","test2#gmail.com"]},"1":{"id":2,"name":"kim","email":["test3#gmail.com"]}}
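If you want the exact {"row": ...} wrapper from the question, with single-element email lists collapsed to a plain string, you can post-process the grouped records. A sketch with the question's data inlined:

```python
import io
import json
import pandas as pd

# Inline sample standing in for data.csv
data = "id,name,email\n1,jim,test#gmail.com\n1,jim,test2#gmail.com\n2,kim,test3#gmail.com\n"
df = pd.read_csv(io.StringIO(data))
df1 = df.groupby(['id', 'name'])['email'].apply(list).reset_index()

lines = []
for rec in df1.to_dict(orient='records'):
    rec['id'] = int(rec['id'])  # cast away any numpy integer type before json encoding
    # Collapse one-element lists to a bare string, as in the expected output
    if len(rec['email']) == 1:
        rec['email'] = rec['email'][0]
    lines.append(json.dumps({'row': rec}))
print('\n'.join(lines))
```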
I'm trying to convert a text file to an Excel sheet in Python. The txt file contains data in the format specified below.
Column names: reg no, zip code, loc id, emp id, lastname, first name. Each record has one or more error numbers, and each record has its column names listed above the values. I would like to create an Excel sheet containing reg no, firstname, lastname, and the errors listed in separate rows for each record.
How can I put the records in an Excel sheet? Should I be using regular expressions? And how can I insert the error numbers in different rows for the corresponding record?
Expected output:
Here is the link to the input file:
https://github.com/trEaSRE124/Text_Excel_python/blob/master/new.txt
Any code snippets or suggestions are kindly appreciated.
Here is a draft of the code. Let me know if any changes are needed:
import csv

with open('in.txt') as f:
    with open('out.csv', 'w', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
        # Remove initial clutter
        while "INPUT DATA" not in f.readline():
            continue
        header = ["REG NO", "ZIP CODE", "LOC ID", "EMP ID", "LASTNAME", "FIRSTNAME", "ERROR"]
        data = list()
        errors = list()
        spamwriter.writerow(header)
        print(header)
        while True:
            line = f.readline()
            errors = list()
            if "END" in line:
                break
            try:
                int(line.split()[0])
                data = line.strip().split()
                f.readline()  # get rid of \n
                line = f.readline()
                while "ERROR" in line:
                    errors.append(line.strip())
                    line = f.readline()
                spamwriter.writerow(data + errors)
                csvfile.flush()
            except (ValueError, IndexError):
                continue
This runs on Python 3. The errors are appended as subsequent columns; laying them out exactly the way you want is slightly more involved. I can fix it if still needed.
Output looks like:
You can do this using the openpyxl library which is capable of depositing items directly into a spreadsheet. This code shows how to do that for your particular situation.
NEW_PERSON, ERROR_LINE = 1, 2

def Line_items():
    with open('katherine.txt') as katherine:
        for line in katherine:
            line = line.strip()
            if not line:
                continue
            items = line.split()
            if items[0].isnumeric():
                yield NEW_PERSON, items
            elif items[:2] == ['ERROR', 'NUM']:
                yield ERROR_LINE, line
            else:
                continue
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws['A2'] = 'REG NO'
ws['B2'] = 'LASTNAME'
ws['C2'] = 'FIRSTNAME'
ws['D2'] = 'ERROR'
row = 2
for kind, data in Line_items():
    if kind == NEW_PERSON:
        row += 2
        ws['A{:d}'.format(row)] = int(data[0])
        ws['B{:d}'.format(row)] = data[-2]
        ws['C{:d}'.format(row)] = data[-1]
        first = True
    else:
        if first:
            first = False
        else:
            row += 1
        ws['D{:d}'.format(row)] = data
wb.save(filename='katherine.xlsx')
This is a screen snapshot of the result.
I have a csv file which contains columns like the ones below. I want to change the date format to 2014-1-10 and combine it with the time in a new column. I would like to do this without pandas ...
Date |Time
1/10/2014|0:09:31
1/10/2014|0:10:29
The result should look like this:
Date |Time |DateTime
1/10/2014|0:09:31|2014-1-10 0:09:31
1/10/2014|0:10:29|2014-1-10 0:10:29
I tried replace, matrix [][], etc., but somehow nothing has worked well so far. I will appreciate your help!
The easiest way is to use PETL:
import petl as etl
import datetime

t = etl.fromcsv('my.csv', delimiter='|')
t = etl.addfield(t, 'DateTime',
                 lambda row: datetime.datetime.strptime(row[0] + ' ' + row[1],
                                                        '%m/%d/%Y %H:%M:%S'))
etl.tocsv(t, 'mynew.csv', delimiter='|')
Using only Python built-in modules:
import csv
import os
import datetime

inFilePath = "C:\\Temp\\SO\\test.csv"
outFilePath = "C:\\Temp\\SO\\temp.csv"

inFile = open(inFilePath, "r", newline="")
outFile = open(outFilePath, "w", newline="")
reader = csv.reader(inFile, delimiter='|')
writer = csv.writer(outFile, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)
for row in reader:
    if "Date " in row:
        row.append("DateTime")  # extend the header row too
        writer.writerow(row)
        continue
    newDate = datetime.datetime.strptime(row[0], '%m/%d/%Y').strftime('%Y-%m-%d')
    newCell = newDate + " " + row[1]
    row.append(newCell)
    writer.writerow(row)
inFile.close()
outFile.close()
os.remove(inFilePath)
os.rename(outFilePath, inFilePath)
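The same transformation can also be written with csv.DictReader/DictWriter, which keeps the header handling explicit; note that strftime zero-pads, so this yields 2014-01-10 rather than 2014-1-10. A sketch on Python 3 with the sample rows inlined (string buffers stand in for the files):

```python
import csv
import datetime
import io

# Inline sample standing in for the pipe-delimited file
indata = "Date |Time\n1/10/2014|0:09:31\n1/10/2014|0:10:29\n"
infile = io.StringIO(indata)
outfile = io.StringIO()

reader = csv.DictReader(infile, delimiter='|')
writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames + ['DateTime'], delimiter='|')
writer.writeheader()
for row in reader:
    # Parse month/day/year, then glue the original time string back on
    d = datetime.datetime.strptime(row['Date '], '%m/%d/%Y')
    row['DateTime'] = d.strftime('%Y-%m-%d') + ' ' + row['Time']
    writer.writerow(row)
print(outfile.getvalue())
```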
I have a text file consisting of 100 records like
fname,lname,subj1,marks1,subj2,marks2,subj3,marks3.
I need to extract and print lname and marks1+marks2+marks3 in python. How do I do that?
I am a beginner in python.
Please help
When I used split, I got an error saying:
TypeError: Can't convert 'type' object to str implicitly.
The code was:
import sys

file_name = sys.argv[1]
file = open(file_name, 'r')
for line in file:
    fname = str.split(str=",", num=line.count(str))
    print fname
If you want to do it that way, you were close. Is this what you were trying?
file = open(file_name, 'r')
for line in file.readlines():
    fname = line.rstrip().split(',')  # using rstrip to remove the \n
    print(fname)
Note: this is not tested code, but it tries to solve your problem. Please give it a try:
import csv

with open(file_name, 'r') as csvfile:
    marksReader = csv.reader(csvfile)
    for row in marksReader:
        if len(row) < 8:  # 8 is the number of columns in your file.
            # row has some missing columns or is empty
            continue
        # Unpack the columns of the row; you can also do fname = row[0], lname = row[1], and so on ...
        (fname, lname, subj1, marks1, subj2, marks2, subj3, marks3) = row
        # you can use float in place of int if marks contain decimals
        totalMarks = int(marks1) + int(marks2) + int(marks3)
        print('%s %s scored: %s' % (fname, lname, totalMarks))
print('End.')
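For comparison, a csv.DictReader version addresses the columns by name, which avoids the long unpacking line. The sample rows below are hypothetical, since no real data was posted, and fieldnames is passed explicitly because the file has no header row:

```python
import csv
import io

# Hypothetical sample rows standing in for the real marks file (no header row)
data = "John,Smith,math,80,physics,75,chemistry,90\nJane,Doe,math,60,physics,70,chemistry,65\n"
fields = ['fname', 'lname', 'subj1', 'marks1', 'subj2', 'marks2', 'subj3', 'marks3']

totals = []
for row in csv.DictReader(io.StringIO(data), fieldnames=fields):
    # Sum the three marks columns by name
    total = int(row['marks1']) + int(row['marks2']) + int(row['marks3'])
    totals.append((row['lname'], total))
    print(row['lname'], total)
```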
"""
sample file content
poohpool#signet.com; meixin_kok#hotmail.com; ngai_nicole#hotmail.com; isabelle_gal#hotmail.com; michelle-878#hotmail.com;
valerietan98#gmail.com; remuskan#hotmail.com; genevieve.goh#hotmail.com; poonzheng5798#yahoo.com; burgergirl96#hotmail.com;
insyirah_powergals#hotmail.com; little_princess-angel#hotmail.com; ifah_duff#hotmail.com; tweety_butt#hotmail.com;
choco_ela#hotmail.com; princessdyanah#hotmail.com;
"""
import pandas as pd

emails = []
with open('emaildump.txt', 'r') as file:
    for line in file.readlines():
        # split on ';', strip whitespace, and drop empty trailing pieces
        emails.extend(e.strip() for e in line.split(';') if e.strip())

df1 = pd.DataFrame(emails, columns=['Email'])
print(df1)