I have written a fragment of code that is fully compatible with both Python 2 and Python 3. It parses data and builds the output as a list of CSV strings.
The script provides an option to:
- write the data to a CSV file, or
- display it on stdout.
While I could easily iterate through the list and replace , with \t when displaying to stdout (the second bullet option), the items are of arbitrary length, so they don't line up nicely because the tab stops vary.
I have done quite a bit of research, and I believe that string format options could accomplish what I'm after. That said, I can't seem to find an example that helps me get the syntax correct.
I would prefer to not use an external library. I am aware that there are many options available if I went that route, but I want the script to be as compatible and simple as possible.
Here is an example:
value1,somevalue2,value3,reallylongvalue4,value5,superlongvalue6
value1,value2,reallylongvalue3,value4,value5,somevalue6
Can you help me please? Any suggestion will be much appreciated.
import csv

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

rows = list(csv.reader(StringIO(
    '''value1,somevalue2,value3,reallylongvalue4,value5,superlongvalue6
value1,value2,reallylongvalue3,value4,value5,somevalue6''')))
# widest cell in each column
widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
for row in rows:
    print(' | '.join(cell.ljust(width) for cell, width in zip(row, widths)))
Output:
value1 | somevalue2 | value3           | reallylongvalue4 | value5 | superlongvalue6
value1 | value2     | reallylongvalue3 | value4           | value5 | somevalue6
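Since the question asks specifically about string format options, the same padding can also be written with str.format and a nested width field. This is only a sketch: format_table is a hypothetical helper name, and it assumes that every row has the same number of fields and that no field contains a quoted comma. It should run on both Python 2.7 and Python 3.

```python
def format_table(csv_lines, sep=' | '):
    """Return the CSV strings as lines whose columns are padded to equal width."""
    rows = [line.split(',') for line in csv_lines]
    # zip(*rows) transposes rows into columns so the widest cell can be measured
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        # '{:<{w}}' left-aligns the cell in a field that is w characters wide
        padded = ['{:<{w}}'.format(cell, w=w) for cell, w in zip(row, widths)]
        # drop the padding after the last column
        lines.append(sep.join(padded).rstrip())
    return lines


table = format_table([
    'value1,somevalue2,value3,reallylongvalue4,value5,superlongvalue6',
    'value1,value2,reallylongvalue3,value4,value5,somevalue6',
])
for line in table:
    print(line)
```

Returning the lines instead of printing them inside the helper keeps the formatting step separate from the choice between stdout and a file.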
def printCsvStringListAsTable(csvStrings):
    # convert to list of lists (a real list, so it can be indexed on Python 3 too)
    rows = [line.split(',') for line in csvStrings]
    # get max column widths for printing
    widths = []
    for idx in range(len(rows[0])):
        column = [row[idx] for row in rows]
        widths.append(len(max(column, key=len)))
    # print the csv strings
    for row in rows:
        cells = []
        for idx, col in enumerate(row):
            fmt = '%-' + str(widths[idx]) + 's'
            cells.append(fmt % col)
        print(' |'.join(cells))

if __name__ == '__main__':
    printCsvStringListAsTable([
        'col1,col2,col3,col4',
        'val1,val2,val3,val4',
        'abadfafdm,afdafag,aadfag,aadfaf',
    ])
Output:
col1      |col2    |col3   |col4
val1      |val2    |val3   |val4
abadfafdm |afdafag |aadfag |aadfaf
The answer by Alex Hall is definitely better: it is a terser form of the same code that I have written.
Related
I have a file with several sheets that have different column headings but the same structure. I want to convert it to JSON, but I already have a problem: how can I access the first column (whose heading differs from sheet to sheet) in pandas?
import pandas;
datapath = 'myfile.xlsx'
datasheet = 'testsheet'
data = pandas.read_excel(datapath, sheet_name=datasheet)
index_1 = data.columns[0]
# now my problem, in bash I would do it like:
chipset = data.$(echo $index_1)
print(chipset)
# can anyone give me please a solution?
I have an Excel file (sx) with these sheets:
s1:
s1 col1: | s1col2
sc11data1 | sc12data1
sc11data2 | sc12data2
---
s2:
s2 col1: | s2col2
sc21data | sc22data
--
I don't know the exact name of the heading in each sheet, but the first column is always the index in my JSON.
I don't seem to understand your question. Do you mean you want to set the first column as the index? Doesn't data.set_index(index_1,inplace=True) work?
Hi I'm totally new to Python but am hoping someone can show me the ropes.
I have a csv reference table which contains over 1000 rows with unique Find values, example of reference table:
|Find |Replace |
------------------------------
|D2-D32-dog |Brown |
|CJ-E4-cat |Yellow |
|MG3-K454-bird |Red |
I need to do a find and replace of text in another csv file. Example of Column in another file that I need to find and replace (over 2000 rows):
|Pets |
----------------------------------------
|D2-D32-dog |
|CJ-E4-cat, D2-D32-dog |
|MG3-K454-bird, D2-D32-dog, CJ-E4-cat |
|T2- M45 Pig |
|CJ-E4-cat, D2-D32-dog |
What I need is for python to find and replace, returning the following, and if no reference, return original value:
|Expected output |
---------------------
|Brown |
|Yellow, Brown |
|Red, Brown, Yellow |
|T2- M45 Pig |
|Yellow, Brown |
Thanking you in advance.
FYI - I don't have any programming experience, usually use Excel but was told that Python will be able to achieve this. So I have given it a go in hope to achieve the above - but it's returning invalid syntax error...
import pandas as pd
dfRef1 = pd.read_csv(r'C:\Users\Downloads\Lookup.csv')
#File of Find and Replace Table
df= pd.read_csv(r'C:\Users\Downloads\Data.csv')
#File that contains text I want to replace
dfCol = df['Pets'].tolist()
#converting Pets column to list from Data.csv file
for x in dfCol:
    Split = str(x).split(',')
#asking python to look at each element within row to find and replace
newlist=[]
for index,refRow in dfRef1.iteritems():
    newRow = []
    for i in Split:
        if i == refRow['Find']:
            newRow.append(refRow['Replace']
        else
            newRow.append(refRow['Find'])
    newlist.append(newRow)
newlist
#if match found replace, else return original text
#When run, the code is Returning - SyntaxError: invalid syntax
#I've also noticed that the dfRef1 dtype: object
Am I even on the right track? Any advice is greatly appreciated.
I understand the concept of Excel's VLOOKUP. However, because each cell contains multiple lookup items that I need to replace within the same cell, I'm unable to do this in Excel.
Thanks again.
You can save the Excel file as CSV to make your life easier,
then strip the file down so that it contains only the table, without any unnecessary information.
Load the CSV files into Python with pandas:
import pandas as pd
df_table1 = pd.read_csv("file/path/filename.csv")
df_table2 = pd.read_csv("file/path/other_filename.csv")
df_table1[['wanted_to_be_replaced_col_name']] = df_table2[['wanted_col_to_copy']]
For further information and more complex assignments, see the pandas documentation at https://pandas.pydata.org/
Hint: for a large number of columns, check the iloc function.
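Two notes on the code in the question. The SyntaxError comes from the missing closing parenthesis after refRow['Replace'] and the missing colon after else. Separately, DataFrame.iteritems iterates over columns, not rows. For the task itself, where one cell holds several comma-separated lookup keys, a dictionary lookup per item is enough. The following is a minimal sketch using only the standard library (Python 3). The helper names load_lookup and replace_cell are hypothetical, and the inline sample data stands in for Lookup.csv and the Pets column of Data.csv.

```python
import csv
import io


def load_lookup(lines):
    """Build a {find: replace} dict from CSV text with 'Find' and 'Replace' headers."""
    return {row['Find'].strip(): row['Replace'].strip()
            for row in csv.DictReader(lines)}


def replace_cell(cell, lookup):
    """Replace every comma-separated item that has a lookup entry; keep the rest."""
    items = [item.strip() for item in cell.split(',')]
    return ', '.join(lookup.get(item, item) for item in items)


# Inline sample data standing in for the two files from the question.
lookup_file = io.StringIO(
    'Find,Replace\n'
    'D2-D32-dog,Brown\n'
    'CJ-E4-cat,Yellow\n'
    'MG3-K454-bird,Red\n'
)
lookup = load_lookup(lookup_file)

pets = [
    'D2-D32-dog',
    'CJ-E4-cat, D2-D32-dog',
    'MG3-K454-bird, D2-D32-dog, CJ-E4-cat',
    'T2- M45 Pig',
    'CJ-E4-cat, D2-D32-dog',
]
replaced = [replace_cell(cell, lookup) for cell in pets]
for cell in replaced:
    print(cell)
```

With real files, the io.StringIO object would be replaced by a file opened with open(path, newline=''), and the Pets values would be read with csv.DictReader as well.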
I am new to Python programming, so pardon me if I make any mistakes. I am writing a Python script to read a CSV file and print a given cell of a row when that row meets a condition.
| A | B | C
---|----|---|---
1 | Re | Mg| 23
---|----|---|---
2 | Ra | Fe| 90
For example, I test column C for a value between 20 and 24. If the condition passes, the script should return cell A1 (Re) as the result.
At the moment, I only have the following, and I have no idea how to proceed from here.
f = open( 'imageResults.csv', 'rU' )
for line in f:
    cells = line.split( "," )
    if(cells[2] >= 20 and cells[2] <= 24):
f.close()
This might contain the answer to my question, but I can't seem to make it work.
UPDATE
If the first row is a header, how do I get it to work? I tried changing the condition to compare strings, but that doesn't work when I want to search for a range of values.
| A | B | C
---|----|---|---
1 |Name|Lat|Ref
---|----|---|---
2 | Re | Mg| 23
---|----|---|---
3 | Ra | Fe| 90
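You should use a csv reader. It's built into Python, so there are no dependencies to install. Then you need to tell Python that the third column is an integer. Something like this will do it:
import csv
# 'rb' is the Python 2 mode; on Python 3 use open('data.csv', newline='') instead
with open('data.csv', 'rb') as f:
    for line in csv.reader(f):
        # fields are read as text, so convert the third one before comparing
        if 20 <= int(line[2]) <= 24:
            print(line)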
With this data in data.csv:
Re,Mg,23
Ra,Fe,90
Ha,Ns,50
Ku,Rt,20
the output will be:
$ python script.py
['Re', 'Mg', '23']
['Ku', 'Rt', '20']
Update:
If in the [first] row, there is a header, how do i get it to work?
There's csv.DictReader, which is meant for that. It is also safer to work with DictReader, especially when the order of the columns might change or you insert a column before the third column. Given this data in data.csv:
Name,Lat,Ref
Re,Mg,23
Ra,Fe,90
Ha,Ns,50
Ku,Rt,20
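Then this is the Python script:
import csv
# 'rb' is the Python 2 mode; on Python 3 use open('data.csv', newline='') instead
with open('data.csv', 'rb') as f:
    for line in csv.DictReader(f):
        # look the column up by its header name instead of its position
        if 20 <= int(line['Ref']) <= 24:
            print(line)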
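The question asks for the first cell of each matching row, not the whole row. A minimal Python 3 sketch of that variant is shown here. The function name names_in_range is hypothetical, and the sample data is supplied inline through io.StringIO so the example runs without a file. Rows whose Ref value is not a whole number are skipped, which is an assumption about the desired behavior.

```python
import csv
import io


def names_in_range(lines, low=20, high=24):
    """Return the 'Name' of every row whose 'Ref' value lies in [low, high]."""
    matches = []
    for row in csv.DictReader(lines):
        try:
            ref = int(row['Ref'])
        except (TypeError, ValueError):
            continue  # skip rows whose 'Ref' is missing or not a whole number
        if low <= ref <= high:
            matches.append(row['Name'])
    return matches


# Same data as data.csv above, including the header row.
sample = io.StringIO('Name,Lat,Ref\nRe,Mg,23\nRa,Fe,90\nHa,Ns,50\nKu,Rt,20\n')
names = names_in_range(sample)
print(names)
```

To read from disk instead, pass a file opened with open('data.csv', newline='') in place of the io.StringIO object.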
P.S. Welcome to Python. It's a good language for learning to program.
What I have at the moment is that I take in a CSV file and determine the relevant data between a given start time and end time. I write this relevant data into a different CSV file. All of this works correctly.
What I want to do is convert all the numerical data (not touching the date or time) from the original CSV file from bytes into kilobytes, keeping only one decimal place in the kilobyte value. This altered numerical data is what I want written into the new CSV file.
The numerical data seems to be read as strings, so I'm a little unsure how to do this. Any help would be appreciated.
The original CSV (when opened in excel) is presented like this:
Date:-------- | Title1:----- | Title2: | Title3: | Title4:
01/01/2016 | 32517293 | 45673 | 0.453 |263749
01/01/2016 | 32721993 | 65673 | 0.563 |162919
01/01/2016 | 33617293 | 25673 | 0.853 |463723
But I want the new CSV to look something like this:
Date:-------- | Title1:--- | Title2: | Title3: | Title4:
01/01/2016 | 32517.2 | 45673 | 0.0 | 263.749
01/01/2016 | 32721.9 | 65673 | 0.0 | 162.919
01/01/2016 | 33617.2 | 25673 | 0.0 | 463.723
My Python function so far:
def edit_csv_file(Name,Start,End):
    #Open file to be written to
    f_writ = open(logs_folder+csv_file_name, 'a')
    #Open file to read from (i.e. the raw csv data from the windows machine)
    csvReader = csv.reader(open(logs_folder+edited_csv_file_name,'rb'))
    #Remove double quotation marks when writing new file
    writer = csv.writer(f_writ,lineterminator='\n', quotechar = '"')
    for row in csvReader:
        #Write the data relating to the modules greater than 10 seconds
        if get_sec(row[0][11:19]) >= get_sec(Start):
            if get_sec(row[0][11:19]) <= get_sec(End):
                writer.writerow(row)
    f_writ.close()
The following should do what you need:
import csv
with open('input.csv', 'rb') as f_input, open('output.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    csv_output.writerow(next(csv_input)) # write header
    for cols in csv_input:
        for col in range(1, len(cols)):
            try:
                cols[col] = "{:.1f}".format(float(cols[col]) / 1024.0)
            except ValueError:
                pass
        csv_output.writerow(cols)
Giving you the following output csv file:
Date:--------,Title1:-----,Title2:,Title3:,Title4:
01/01/2016,31755.2,44.6,0.0,257.6
01/01/2016,31955.1,64.1,0.0,159.1
01/01/2016,32829.4,25.1,0.0,452.9
Tested using Python 2.7.9
int() is the standard way in Python to convert a string to an int. It is used like this:
int("5") + 1
This returns 6. Hope this helps.
Depending on what else you may find yourself working on, I'd be tempted to use pandas for this one - given a file with the contents you describe, after importing the pandas module:
import pandas as pd
Read in the CSV file; pandas automatically treats the first line as a header. If your delimiter is the default comma you don't need to specify it, but other delimiters are available (I'm a fan of the pipe '|' character).
csv = pd.read_csv("pandas_csv.csv",delimiter="|")
Then you can enrich/process your data as you like using the column names as references.
For example, to convert a column by some factor you might write:
csv['Title3'] = csv['Title3']/1024
The datatypes are, again, determined automatically, so if a column is all numeric (as in the example) there's no need to convert from one datatype to another. In most cases pandas infers them correctly from the data in the file.
Once you're happy with the edits, type
csv
To see a representation of the results, and then
csv.to_csv("pandas_csv.csv")
To save the results. In this case that overwrites the original file, so you may want to write to a new file instead:
csv.to_csv("pandas_csv_kilobytes.csv")
There are more useful/powerful functions available, but I know no easier method for manipulating tabular data than this - it's better and more reliable than Excel, and in years to come, you will celebrate the day you started using pandas!
In this case, you've opened, edited and saved the file using the following 4 lines of code:
import pandas as pd
csv = pd.read_csv("pandas_csv.csv",delimiter="|")
csv['Title3'] = csv['Title3']/1024
csv.to_csv("pandas_csv_kilobytes.csv")
That's about as powerful and convenient as it gets.
And another solution using a function (bytesto) from: gist.github.com/shawnbutts/3906915
import csv
def bytesto(bytes, to):
    # number of times to divide by 1024 for each unit
    a = {'k' : 1, 'm': 2, 'g' : 3, 't' : 4, 'p' : 5, 'e' : 6 }
    r = float(bytes)
    for i in range(a[to]):
        r = r / 1024
    return(int(r)) # the original gist does not truncate to int
with open('csvfile.csv', 'rb') as csvfile:
    data = csv.reader(csvfile, delimiter='|', quotechar='|')
    next(data) # skip the title row
    for row in data:
        # convert columns 1 to 4 from bytes to kilobytes
        print(' '.join('kb= ' + str(bytesto(value, 'k')) for value in row[1:5]))
Result:
kb= 31755 kb= 44 kb= 0 kb= 257
kb= 31955 kb= 64 kb= 0 kb= 159
kb= 32829 kb= 25 kb= 0 kb= 452
Hope this helps a bit.
If s is your string representing a byte value, you can convert it to a string representing a kilobyte value with a single decimal place like this:
'%.1f' % (float(s)/1024)
Alternatively:
str(round(float(s)/1024, 1))
EDIT:
To prevent errors for non-digit strings, you can just make a conditional
'%.1f' % (float(s)/1024) if s.isdigit() else ''
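One caveat with the isdigit() check: it is False for strings such as "0.453", so decimal values like those in the Title3 column would be turned into empty strings. A hedged alternative is to attempt the conversion and fall back to the original text when it fails. The helper name bytes_to_kb is hypothetical, and returning the input unchanged for non-numeric text is an assumption about the desired behavior.

```python
def bytes_to_kb(s):
    """Return the byte count in s as kilobytes with one decimal place.

    Text that cannot be parsed as a number (a date, for example) is
    returned unchanged.
    """
    try:
        return '%.1f' % (float(s) / 1024)
    except ValueError:
        return s


converted = [bytes_to_kb(value) for value in ['32517293', '0.453', '01/01/2016']]
print(converted)
```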
I am trying to parse a CSV file and extract a few columns from it.
ID | Code | Phase |FBB | AM | Development status | AN REMARKS | stem | year | IN -NAME |IN Year |Company
L2106538 |Rs124 | 4 | | | Unknown | | -pre- | 1982 | Domoedne | 1982 | XYZ
I would like to group and extract a few columns so that I can upload them to different models.
For example, I would like to group the first 3 columns for one model, the next two for a different model, the first column together with columns 6 and 7 for another model, and so on.
I also need to keep the header of the file and store the data as key-value pairs, so that I know which column goes to which field in a model.
This is what I have so far.
def group_header_value(file):
    reader = csv.DictReader(open(file, 'r'))# to have the header and get the data as a key value pair.
    all_result= []
    for row in reader:
        print row
        all_result.append(row)
    return all_result

def group_by_models(all_results):
    MD = range(1,3) # to get the required cols.
    for every_row in all_results:
        contents = [(every_row[i] for i in MD)]
        print contents

def handle(self, *args, **options):
    database = options.get('database')
    filename = options.get('filename')
    all_results = group_header_value(filename)
    print 'grouped_bymodel', group_by_models(all_results)
This is what I get when I try to get the contents
grouped_by model: <generator object <genexpr> at 0x7f9f5382e0f0>
<generator object <genexpr> at 0x7f9f5382e0a0>
<generator object <genexpr> at 0x7f9f5382e0f0>
Is there a different approach to extracting particular columns with DictReader? How else can I extract the required columns? Thanks.
(every_row[i] for i in MD) is a generator expression. The syntax for a generator expression is (mostly) the same as that for a list comprehension, except that a generator expression is enclosed by parentheses, (...), while a list comprehension uses brackets, [...].
[(every_row[i] for i in MD)] is a list containing one element, the generator expression.
To fix your code with minimal changes, remove the parentheses:
def group_by_models(all_results):
    MD = range(1,3) # to get the required cols.
    for every_row in all_results:
        contents = [every_row[i] for i in MD]
        print(contents)
You could also make group_by_models more reusable by making MD a parameter:
def group_by_models(all_results, MD=range(3)):
    for every_row in all_results:
        contents = [every_row[i] for i in MD]
        print(contents)
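One further caveat: csv.DictReader yields rows keyed by header name, so indexing a row with integers, as every_row[i] does, raises KeyError once the list comprehension is actually evaluated. A minimal Python 3 sketch that selects columns by header name instead is given here. The helper name pick_columns and the shortened, comma-separated sample data are assumptions for illustration, not part of the original code.

```python
import csv
import io


def pick_columns(rows, names):
    """Return one dict per row that holds only the requested header names."""
    return [{name: row[name] for name in names} for row in rows]


# Shortened sample with the first five headers from the question.
sample = io.StringIO('ID,Code,Phase,FBB,AM\nL2106538,Rs124,4,,\n')
rows = list(csv.DictReader(sample))

# Each model gets its own group of columns, still keyed by header name.
first_model = pick_columns(rows, ['ID', 'Code', 'Phase'])
second_model = pick_columns(rows, ['FBB', 'AM'])
print(first_model)
print(second_model)
```

Because each result is still a dict keyed by header name, it is clear which value belongs to which model field.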