I have the following information extracted from a JSON response and saved in variables. The variables and their contents are:
tickets = ['DC-830', 'DC-463', 'DC-631']
duration = ['5h 3m', '1h 7m', '3h 4m']
When I use writerow() and the JSON has only one value, for example tickets = 'DC-830', I am able to save the information in a CSV file. However, if it has two or more values, everything is written in the same row.
This is what I get:
Ticket | Duration
['DC-830', 'DC-463', 'DC-631'] | ['5h 3m', '1h 7m', '3h 4m']
Instead I need something like this:
Ticket | Duration
DC-830 | 5h 3m
DC-463 | 1h 7m
DC-631 | 3h 4m
This is the code:
import csv

issues_test = s_json['issues']
tickets, duration = [], []
for item in issues_test:
    tickets.append(item['key'])
    duration.append(item['fields']['customfield_154'])

header = ['Ticket', 'Duration']
with open('P1_6.7.csv', 'a') as arc:
    writer = csv.writer(arc)
    writer.writerow(header)
    writer.writerow([tickets, duration])
As the singular name suggests, writerow() just writes one row. The argument should be a list of strings or numbers. You're giving it a 2-dimensional list, and expecting it to pair them up and write each pair to a separate row.
To write multiple rows, use writerows() (notice the plural in the name). You can use zip() to pair up the elements of the two lists.
writer.writerows(zip(tickets, duration))
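Putting it together, a minimal sketch of the corrected script (assuming the s_json structure and the customfield_154 field name from the question; newline='' applies on Python 3):

import csv

issues_test = s_json['issues']  # s_json comes from the question's JSON response
tickets = [item['key'] for item in issues_test]
duration = [item['fields']['customfield_154'] for item in issues_test]

with open('P1_6.7.csv', 'a', newline='') as arc:
    writer = csv.writer(arc)
    writer.writerow(['Ticket', 'Duration'])
    # zip() pairs tickets[i] with duration[i]; writerows() writes one row per pair
    writer.writerows(zip(tickets, duration))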
I'm writing a short script to validate data in a CSV file, and I'm formatting the results to dump to stdout. For readability, I'm adding five spaces of padding on the left. Note: I'm NOT using str.format() because I don't want to justify the output.
Code:
five = ' ' * 5  # the five-space left padding described above (assumed definition)

def duplicate_data():
    dup_df = inventory_df[inventory_df.duplicated(['STORE_NO', 'SKU'], keep=False)]
    if dup_df.empty:
        print(five, 'INFO: No Duplicate Entries Found')
    else:
        print(five, 'WARN: Duplicate STORE_ID and SKU Data Found!')
        print(five, dup_df.to_string(index=False))
Results:
It all works great until it prints the data frame:
WARN: Duplicate STORE_ID and SKU Data Found!
Please Copy/Paste the following and send to the customer:
STORE_NO SKU ON_HAND_QTY
10000001 1000000000007 2
10000002 1000000000007 8
I could iterate over the rows but the formatting is worse than the example above.
for rows in dup_df.iterrows():
    print(five, rows)
Any thoughts as to how I can format the data frame output?
Not super nice, but you could do something like this:
def padlines(text, padding):
    return "\n".join(padding + l for l in text.splitlines())
And then padlines(df.to_string(), five)
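Applied to the question's duplicate_data(), that would look something like this (a sketch, assuming five is the five-space padding string):

def duplicate_data():
    dup_df = inventory_df[inventory_df.duplicated(['STORE_NO', 'SKU'], keep=False)]
    if dup_df.empty:
        print(five, 'INFO: No Duplicate Entries Found')
    else:
        print(five, 'WARN: Duplicate STORE_ID and SKU Data Found!')
        # padlines() prefixes every line of the rendered frame, not just the first
        print(padlines(dup_df.to_string(index=False), five))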
Take this invoice.txt for example:
Invoice Number
INV-3337
Order Number
12345
Invoice Date
January 25, 2016
Due Date
January 31, 2016
And this is what dict.txt looks like:
Invoice Date
Invoice Number
Due Date
Order Number
I am trying to find keywords from dict.txt in invoice.txt and then add each keyword, together with the text that comes after it (but before the next keyword), to a two-column table.
So it would look like:
col1 ----- col2
Invoice Number ----- INV-3337
Order Number ----- 12345
Here is what I have done till now
with open(r'C:\invoice.txt') as f:
    invoices = list(f)

with open(r'C:\dict.txt') as f:
    for line in f:
        keyword = line.strip()
        for invoice in invoices:
            if keyword in invoice:
                print invoice
This works, but the ordering is wrong (it follows dict.txt rather than invoice.txt).
i.e.
The output is
Invoice Date
Invoice Number
Due Date
Order Number
instead of the order in invoice.txt, which is
Invoice Number
Order Number
Invoice Date
Due Date
Can you help me with how I should proceed further ?
Thank You.
This should work. You can load your invoice data into a list, and your dict data into a set for easy lookup.
with open(r'C:\invoice.txt') as f:
    invoice_data = [line.strip() for line in f if line.strip()]

with open(r'C:\dict.txt') as f:
    dict_data = set(line.strip() for line in f if line.strip())

Now iterate over the invoice lines two at a time, and print the pairs whose first line is a keyword.
for i in range(0, len(invoice_data), 2):
    if invoice_data[i] in dict_data:
        print(invoice_data[i:i + 2])
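If you want the two-column result from the question instead of printed slices, a small extension could collect the matches in invoice order (a sketch; pairs is a name introduced here):

pairs = []
for i in range(0, len(invoice_data), 2):
    if invoice_data[i] in dict_data:
        # (keyword, value) pairs in the order they appear in invoice.txt
        pairs.append((invoice_data[i], invoice_data[i + 1]))

for col1, col2 in pairs:
    print(col1, '-----', col2)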
What I have at the moment is that I take in a CSV file and determine the relevant data between a given start time and end time. I write this relevant data into a different CSV file. All of this works correctly.
What I want to do is convert all the numerical data (not touching the date or time) from the original CSV file from bytes into kilobytes, keeping only one decimal place when presenting the kilobyte value. This altered numerical data is what I want written into the new CSV file.
The numerical data seems to be read as strings, so I'm a little unsure how to do this; any help would be appreciated.
The original CSV (when opened in excel) is presented like this:
Date:-------- | Title1:----- | Title2: | Title3: | Title4:
01/01/2016 | 32517293 | 45673 | 0.453 | 263749
01/01/2016 | 32721993 | 65673 | 0.563 | 162919
01/01/2016 | 33617293 | 25673 | 0.853 | 463723
But I want the new CSV to look something like this:
Date:-------- | Title1:--- | Title2: | Title3: | Title4:
01/01/2016 | 32517.2 | 45673 | 0.0 | 263.749
01/01/2016 | 32721.9 | 65673 | 0.0 | 162.919
01/01/2016 | 33617.2 | 25673 | 0.0 | 463.723
My Python function so far:
def edit_csv_file(Name, Start, End):
    # Open file to be written to
    f_writ = open(logs_folder + csv_file_name, 'a')
    # Open file to read from (i.e. the raw csv data from the windows machine)
    csvReader = csv.reader(open(logs_folder + edited_csv_file_name, 'rb'))
    # Remove double quotation marks when writing new file
    writer = csv.writer(f_writ, lineterminator='\n', quotechar='"')
    for row in csvReader:
        # Write the data relating to the modules greater than 10 seconds
        if get_sec(row[0][11:19]) >= get_sec(Start):
            if get_sec(row[0][11:19]) <= get_sec(End):
                writer.writerow(row)
    f_writ.close()
The following should do what you need:
import csv

with open('input.csv', 'rb') as f_input, open('output.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    csv_output.writerow(next(csv_input))    # copy the header row through unchanged
    for cols in csv_input:
        # Convert every column except the first (the date) from bytes to KB,
        # formatted to one decimal place.
        for col in range(1, len(cols)):
            try:
                cols[col] = "{:.1f}".format(float(cols[col]) / 1024.0)
            except ValueError:
                pass    # leave non-numeric cells untouched
        csv_output.writerow(cols)
Giving you the following output csv file:
Date:--------,Title1:-----,Title2:,Title3:,Title4:
01/01/2016,31755.2,44.6,0.0,257.6
01/01/2016,31955.1,64.1,0.0,159.1
01/01/2016,32829.4,25.1,0.0,452.9
Tested using Python 2.7.9
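On Python 3 the file modes change; a minimal adaptation might look like this (a sketch, not retested against the original data):

import csv

with open('input.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    csv_output.writerow(next(csv_input))  # copy the header row
    for cols in csv_input:
        for col in range(1, len(cols)):
            try:
                cols[col] = "{:.1f}".format(float(cols[col]) / 1024.0)
            except ValueError:
                pass
        csv_output.writerow(cols)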
int() is the standard way in Python to convert a string to an int. It is used like this:
int("5") + 1
This returns 6. Hope this helps.
Depending on what else you may find yourself working on, I'd be tempted to use pandas for this one. Given a file with the contents you describe, first import the pandas module:
import pandas as pd
Read in the CSV file (pandas automatically recognises that the first line is a header). The delimiter in your case may not need specifying if it's the default comma, but other delimiters are available; I'm a fan of the pipe '|' character.
csv = pd.read_csv("pandas_csv.csv",delimiter="|")
Then you can enrich/process your data as you like using the column names as references.
For example, to convert a column by some factor you might write:
csv['Title3'] = csv['Title3']/1024
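To match the one-decimal-place requirement from the question, Series.round() can be chained on (a sketch using the Title1 column):

csv['Title1'] = (csv['Title1'] / 1024).round(1)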
The datatypes are, again, determined automatically, so if a column is all numeric (as in the example) there's no need to convert between datatypes; 99% of the time pandas figures them out correctly based on the data in the file.
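You can check what pandas inferred with the dtypes attribute (the exact output depends on your file):

csv.dtypes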
Once you're happy with the edits, type
csv
To see a representation of the results, and then
csv.to_csv("pandas_csv.csv")
To save the results (in this case overwriting the original file), though you may want to write something more like:
csv.to_csv("pandas_csv_kilobytes.csv")
There are more useful/powerful functions available, but I know no easier method for manipulating tabular data than this - it's better and more reliable than Excel, and in years to come, you will celebrate the day you started using pandas!
In this case, you've opened, edited and saved the file using the following 4 lines of code:
import pandas as pd
csv = pd.read_csv("pandas_csv.csv",delimiter="|")
csv['Title3'] = csv['Title3']/1024
csv.to_csv("pandas_csv_kilobytes.csv")
That's about as powerful and convenient as it gets.
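One caveat: by default to_csv() also writes the DataFrame's index as an extra first column; if you don't want that, pass index=False:

csv.to_csv("pandas_csv_kilobytes.csv", index=False)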
And another solution, using a function (bytesto) from gist.github.com/shawnbutts/3906915:
import csv

def bytesto(bytes, to):
    a = {'k': 1, 'm': 2, 'g': 3, 't': 4, 'p': 5, 'e': 6}
    r = float(bytes)
    for i in range(a[to]):
        r = r / 1024
    return int(r)  # the original gist returns the float; int() truncates here

with open('csvfile.csv', 'rb') as csvfile:
    data = csv.reader(csvfile, delimiter='|', quotechar='|')
    next(data)  # skip the header row
    for row in data:
        print 'kb= ' + str(bytesto(row[1], 'k')), 'kb= ' + str(bytesto(row[2], 'k')), \
              'kb= ' + str(bytesto(row[3], 'k')), 'kb= ' + str(bytesto(row[4], 'k'))
Result:
kb= 31755 kb= 44 kb= 0 kb= 257
kb= 31955 kb= 64 kb= 0 kb= 159
kb= 32829 kb= 25 kb= 0 kb= 452
Hope this helps a bit.
If s is your string representing a byte value, you can convert it to a string representing a kilobyte value with a single decimal place like this:
'%.1f' % (float(s)/1024)
Alternatively:
str(round(float(s)/1024, 1))
EDIT:
To prevent errors for non-digit strings, you can make it conditional:
'%.1f' % (float(s)/1024) if s.isdigit() else ''
I am trying to parse a CSV file and extract a few of its columns.
ID | Code | Phase | FBB | AM | Development status | AN REMARKS | stem | year | IN-NAME | IN Year | Company
L2106538 | Rs124 | 4 | | | Unknown | | -pre- | 1982 | Domoedne | 1982 | XYZ
I would like to group and extract a few columns to upload them to different models.
For example, I would like to map the first three columns to one model, the next two to a different model, the first column plus columns 6 and 7 to another model, and so on.
I also need to keep the header of the file and store the data as key-value pairs, so that I know which column goes with which field in a model.
This is what I have so far.
def group_header_value(file):
    # DictReader keeps the header and yields each row as a key-value mapping
    reader = csv.DictReader(open(file, 'r'))
    all_results = []
    for row in reader:
        print row
        all_results.append(row)
    return all_results

def group_by_models(all_results):
    MD = range(1, 3)  # to get the required cols
    for every_row in all_results:
        contents = [(every_row[i] for i in MD)]
        print contents

def handle(self, *args, **options):
    database = options.get('database')
    filename = options.get('filename')
    all_results = group_header_value(filename)
    print 'grouped_bymodel', group_by_models(all_results)
This is what I get when I try to print the contents:
grouped_bymodel [<generator object <genexpr> at 0x7f9f5382e0f0>]
[<generator object <genexpr> at 0x7f9f5382e0a0>]
[<generator object <genexpr> at 0x7f9f5382e0f0>]
Is there a different approach to extracting particular columns with DictReader? How else can I extract the required columns using DictReader? Thanks.
(every_row[i] for i in MD) is a generator expression. The syntax for a generator expression is (mostly) the same as that for a list comprehension, except that a generator expression is enclosed by parentheses, (...), while a list comprehension uses brackets, [...].
[(every_row[i] for i in MD)] is a list containing one element, the generator expression.
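You can see the same thing at the interactive prompt: printing a generator expression shows its repr, which is exactly the output in the question (the memory address will differ):

>>> row = ['L2106538', 'Rs124', '4']
>>> [(row[i] for i in range(1, 3))]
[<generator object <genexpr> at 0x7f9f5382e0f0>]
>>> [row[i] for i in range(1, 3)]
['Rs124', '4']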
To fix your code with minimal changes, remove the parentheses:
def group_by_models(all_results):
    MD = range(1, 3)  # to get the required cols
    for every_row in all_results:
        contents = [every_row[i] for i in MD]
        print(contents)
You could also make group_by_models more reusable by making MD a parameter:
def group_by_models(all_results, MD=range(3)):
    for every_row in all_results:
        contents = [every_row[i] for i in MD]
        print(contents)
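Note that DictReader yields rows keyed by the header names rather than by position, so in practice MD is most naturally a sequence of column names. A usage sketch, with column names taken from the sample header in the question and a hypothetical filename:

all_results = group_header_value('data.csv')

# Index by header name, since DictReader rows are mappings, not lists
group_by_models(all_results, MD=['ID', 'Code', 'Phase'])
group_by_models(all_results, MD=['FBB', 'AM'])
group_by_models(all_results, MD=['ID', 'Development status', 'AN REMARKS'])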