Splitting an array with \x00\t as the separator

I imported all the data from a csv file using:
import pandas as pd
# Import the data using pandas
df = pd.read_csv('ML_Cancelaciones_20190301.csv','rb', engine='python')
x = df.values
I passed 'rb' because I could not get the file to load with utf-8 or any other encoding I tried.
When I try utf-16 I get the following error:
ParserError: Expected 2 fields in line 6666, saw 3
I believe this might be due to a 'ñ' being present in this row.
Using 'rb' gives me an array with twice as many rows as the original csv file: every other row is empty, and the remaining rows contain all the columns stacked together in a single string. The result looks like this:
array([['\x00'],
[
'\x001\x000\x004\x000\x001\x007\x00H\x001\x00\t\x006\x003\x00\t\x002\x000\x001\x006\x000\x008\x001\x002\x00\t\x009\x009\x009\x009\x009\x009\x009\x009\x00\t\x002\x000\x001\x006\x000\x008\x001\x002\x00\t\x001\x004\x00:\x005\x008\x00\t\x002\x000\x001\x006\x000\x008\x002\x006\x00\t\x002\x000\x001\x006\x000\x008\x002\x009\x00\t\x000\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x000\x00\t\x00F\x00A\x00-\x00R\x00S\x00\t\x00F\x00a\x00c\x00t\x00u\x00r\x00a\x00d\x00a\x00 \x00(\x00N\x00)\x00\t\x00A\x00c\x00t\x00i\x00v\x00a\x00\t\x00E\x00s\x00t\x00a\x00d\x00o\x00s\x00 \x00U\x00n\x00i\x00d\x00o\x00s\x00\t\x002\x00\t\x00E\x00s\x00t\x00a\x00d\x00o\x00s\x00 \x00U\x00n\x00i\x00d\x00o\x00s\x00\t\x002\x00\t\x00E\x00.\x00E\x00.\x00U\x00.\x00U\x00.\x00\t\x00E\x00s\x00p\x00a\x00ñ\x00o\x00l\x00\t\x00E\x00S\x00\t\x00P\x00T\x00G\x00\t\x00P\x00r\x00e\x00s\x00t\x00i\x00g\x00e\x00\t\x00P\x00T\x00G\x00\t\x00P\x00r\x00e\x00s\x00t\x00i\x00g\x00e\x00\t\x00E\x00D\x00\t\x00E\x00-\x00D\x00i\x00s\x00t\x00r\x00i\x00b\x00u\x00t\x00i\x00o\x00n\x00\t\x00B\x002\x00B\x00\t\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00P\x00r\x00o\x00m\x00o\x00c\x00i\x00o\x00n\x00a\x00l\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00T\x00o\x00d\x00o\x00 \x00I\x00n\x00c\x00l\x00u\x00i\x00d\x00o\x00\t\x001\x000\x000\x00%\x00\t\x00J\x00u\x00n\x00e\x00 \x00S\x00a\x00l\x00e\x00 \x00\t\x00J\x00u\x00n\x00e\x00 \x00S\x00a\x00l\x00e\x00 \x00\t\x00V\x00\t\x00N\x00u\x00e\x00v\x00a\x00\t\x00D\x00o\x00u\x00b\x00l\x00e\x00\t\x00N\x00o\x00\t\x00-\x009\x009\x00\t\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00\t\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00N\x00o\x00\t\x00N\x00o\x00\t\x00\t\x00\t\x00S\x00í\x00\t\x002\x00\t\x00P\x00e\x00n\x00d\x00i\x00e\x00n\x00t\x00e\x00 \x00d\x00e\x00 
\x00C\x00o\x00b\x00r\x00o\x00\t\x00-\x009\x009\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00C\x00R\x00\t\x00C\x00r\x00é\x00d\x00i\x00t\x00o\x00\t\x003\x002\x001\x000\x009\x00\t\x001\x005\x000\x003\x001\x005\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00N\x00o\x00\t\x00E\x00.\x00E\x00.\x00U\x00.\x00U\x00.\x00\t\x00U\x00S\x00A\x00\t\x00P\x00A\x00B\x00L\x00O\x00\t\x00S\x00r\x00.\x00\t\x00\t\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00R\x00e\x00p\x00ú\x00b\x00l\x00i\x00c\x00a\x00 \x00D\x00o\x00m\x00i\x00n\x00i\x00c\x00a\x00n\x00a\x00\t\x003\x00\t\x006\x00\t\x002\x00\t\x002\x00\t\x000\x00\t\x002\x00\t\x004\x000\x008\x00,\x009\x006\x000\x000\x00'],
['\x00'],
...,
[ '\x00V\x003\x000\x004\x000\x001\x00H\x001\x00\t\x006\x001\x00\t\x002\x000\x001\x005\x000\x004\x001\x005\x00\t\x009\x009\x009\x009\x009\x009\x009\x009\x00\t\x002\x000\x001\x005\x000\x004\x001\x005\x00\t\x001\x006\x00:\x000\x000\x00\t\x002\x000\x001\x005\x000\x004\x001\x008\x00\t\x002\x000\x001\x005\x000\x004\x002\x006\x00\t\x000\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x000\x00\t\x00F\x00A\x00-\x00R\x00S\x00\t\x00F\x00a\x00c\x00t\x00u\x00r\x00a\x00d\x00a\x00 \x00(\x00N\x00)\x00\t\x00A\x00c\x00t\x00i\x00v\x00a\x00\t\x00E\x00s\x00t\x00a\x00d\x00o\x00s\x00 \x00U\x00n\x00i\x00d\x00o\x00s\x00\t\x002\x00\t\x00E\x00s\x00t\x00a\x00d\x00o\x00s\x00 \x00U\x00n\x00i\x00d\x00o\x00s\x00\t\x002\x00\t\x00E\x00.\x00E\x00.\x00U\x00.\x00U\x00.\x00\t\x00E\x00n\x00g\x00l\x00i\x00s\x00h\x00\t\x00E\x00N\x00\t\x00B\x002\x00C\x00 \x00M\x00\t\x00W\x00e\x00b\x00 \x00C\x00a\x00l\x00l\x00 \x00C\x00e\x00n\x00t\x00e\x00r\x00\t\x00B\x002\x00C\x00\t\x00W\x00e\x00b\x00 \x00C\x00l\x00i\x00e\x00n\x00t\x00e\x00\t\x00B\x002\x00C\x00\t\x00B\x00u\x00s\x00i\x00n\x00e\x00s\x00s\x00-\x00t\x00o\x00-\x00C\x00u\x00s\x00t\x00o\x00m\x00e\x00r\x00\t\x00B\x002\x00C\x00\t\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00P\x00r\x00o\x00m\x00o\x00c\x00i\x00o\x00n\x00a\x00l\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00T\x00o\x00d\x00o\x00 \x00I\x00n\x00c\x00l\x00u\x00i\x00d\x00o\x00\t\x001\x000\x000\x00%\x00\t\x00S\x00P\x00R\x00I\x00N\x00G\x00 \x00S\x00U\x00P\x00E\x00R\x00 \x00S\x00A\x00L\x00E\x00\t\x00S\x00P\x00R\x00I\x00N\x00G\x00 \x00S\x00U\x00P\x00E\x00R\x00 \x00S\x00A\x00L\x00E\x00\t\x00M\x00\t\x00M\x00o\x00d\x00i\x00f\x00i\x00c\x00a\x00d\x00a\x00\t\x00J\x00u\x00n\x00i\x00o\x00r\x00 
\x00S\x00u\x00i\x00t\x00e\x00\t\x00N\x00o\x00\t\x00-\x009\x009\x00\t\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00\t\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00N\x00o\x00\t\x00N\x00o\x00\t\x00\t\x00\t\x00N\x00o\x00\t\x002\x00\t\x00P\x00e\x00n\x00d\x00i\x00e\x00n\x00t\x00e\x00 \x00d\x00e\x00 \x00C\x00o\x00b\x00r\x00o\x00\t\x00-\x009\x009\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00T\x00C\x00\t\x00T\x00a\x00r\x00j\x00e\x00t\x00a\x00 \x00C\x00r\x00é\x00d\x00i\x00t\x00o\x00\t\x00-\x009\x009\x00\t\x00-\x009\x008\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00N\x00o\x00\t\x00D\x00e\x00s\x00c\x00o\x00n\x00o\x00c\x00i\x00d\x00o\x00\t\x00-\x009\x009\x00\t\x00D\x00I\x00L\x00L\x00O\x00N\x00\t\x00\t\x001\x009\x006\x001\x00-\x001\x002\x00-\x001\x007\x00 \x000\x000\x00:\x000\x000\x00:\x000\x000\x00\t\x00\t\x00E\x00.\x00E\x00.\x00U\x00.\x00U\x00.\x00\t\x00E\x00s\x00t\x00a\x00d\x00o\x00s\x00 \x00U\x00n\x00i\x00d\x00o\x00s\x00\t\x008\x00\t\x003\x002\x00\t\x004\x00\t\x003\x00\t\x001\x00\t\x000\x00\t\x003\x004\x001\x006\x00,\x000\x000\x000\x000\x00'],
['\x00'],
['\x00']], dtype=object)
I want to convert this array so that it has the same shape (rows x columns) as the original csv file, with the same entries as the original file, i.e. words and numbers.
How could I do this?
An example of the data is here:
csv file opened with word pad
The row: '\x001\x000\x004\x000\x001\x007\x00H\x001\x00\t\x006\x003\x00\t\x002\x000\x001\x006\x000\x008\x001\x002\x00\t\...'
should look like:
['104017H1', '63', '20160812',...]
So every character is preceded by a '\x00', and the columns are separated by '\x00\t'. Is there a way I can do this transformation?
Thank you very much

You can remove the '\x00' characters with replace and then split on the tab character:
a = '\x001\x000\x004\x000\x001\x007\x00H\x001\x00\t\x006\x003\x00\t\x002\x000\x001\x006\x000\x008\x001\x002\x00\t'
a.replace('\x00','').split('\t')
OUTPUT :
['104017H1', '63', '20160812', '']
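To apply the same idea to every row of the array, a minimal sketch could look like the following. It assumes each row is a one-element sequence holding the raw string, as in the question; the helper name clean_rows and the shortened sample data are made up for illustration. The second half rests on a further assumption: NUL characters between every character usually indicate UTF-16 text that was read with the wrong codec, so decoding the bytes as UTF-16 and splitting on tabs may avoid the cleanup entirely.

```python
import csv
import io


def clean_rows(rows):
    """Strip NUL characters and split each non-empty row on tabs."""
    cleaned = []
    for row in rows:
        text = row[0].replace('\x00', '')
        if text.strip():  # skip rows that held only '\x00'
            cleaned.append(text.split('\t'))
    return cleaned


# Shortened, hypothetical stand-in for the array in the question.
x = [['\x00'], ['\x001\x000\x004\x00\t\x006\x003\x00'], ['\x00']]
print(clean_rows(x))

# Assumption: the file is really UTF-16 with tab separators.
# These bytes are a made-up sample, not the asker's file.
sample = "104\t63\r\nV30\t61\r\n".encode("utf-16")
decoded = list(
    csv.reader(io.StringIO(sample.decode("utf-16"), newline=""), delimiter="\t")
)
print(decoded)
```

If the UTF-16 assumption holds for the real file, passing an explicit tab separator together with the utf-16 encoding to the reader is likely the cleaner fix, since the data never picks up the stray '\x00' characters in the first place.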

Related

pandas reading csv with one row spanning multiple lines

My csv starts out like this:
,index,spotify_id,artist_name,track_name,album_name,duration_ms,lyrics,lyrics_bert_embeddings
0,0,5Jk0vfT81ltt2rYyrWDzZ5,Hundred Waters,Xtalk - Kodak to Graph Remix,The Moon Rang Like a Bell,285327,not fetched,"[ 0.00722605 -0.23726921 0.15163635 -0.28774077 0.07081255 0.26606813
each row ends like this in a new line:
0.03439684 -0.29289168 0.13590978 0.2332756 -0.24305075 0.2034984 ]"
These values come from a big numpy array encoded with np.array2string() and span multiple lines in the csv.
With pd.read_csv I get "ParserError: Error tokenizing data. C error: EOF inside string starting at row 90607". With the parameter engine="python" I get "ParserError: unexpected end of data". With the separator sep='\t+', each line simply ends up in its own row of the dataframe. Reading the file with csv.reader (via with open(file_path) and iterating over the lines) gives the same result as sep='\t+'.
Is there a way to automatically append each continuation line to the row it belongs to, or do I have to preprocess the file by hand?

I tested this with your csv data. The code and its output are below:
import pandas as pd
import csv
data_path = 'dt.csv'
df = pd.read_csv(data_path, header = None, quoting=csv.QUOTE_NONE, encoding='utf-8')
dt_json = df.to_json()
print(dt_json)
As an example, I converted the data from CSV to JSON using a pandas DataFrame:
{"0":{"0":null,"1":0.0},
"1":{"0":"index","1":"0"},
"2":{"0":"spotify_id","1":"5Jk0vfT81ltt2rYyrWDzZ5"},
"3":{"0":"artist_name","1":"Hundred Waters"},
"4":{"0":"track_name","1":"Xtalk - Kodak to Graph Remix"},
"5":{"0":"album_name","1":"The Moon Rang Like a Bell"},
"6":{"0":"duration_ms","1":"285327"},
"7":{"0":"lyrics","1":"not fetched"},
"8":{"0":"lyrics_bert_embeddings","1":"[ 0.00722605 -0.23726921 0.15163635 -0.28774077 0.07081255 0.26606813\r\n 0.03439684 -0.29289168 0.13590978 0.2332756 -0.24305075 0.2034984 ]"}}
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html?highlight=read_csv
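For comparison, Python's csv module accepts quoted fields that contain line breaks, provided the quotes are balanced. The sketch below uses a small made-up sample rather than the asker's file, and the helper name parse_embedding is hypothetical. It reads a two-line quoted field as a single value and turns the embedded array text back into floats.

```python
import csv
import io

# Made-up sample: one header row and one record whose quoted
# last field spans two physical lines.
SAMPLE = 'index,lyrics_bert_embeddings\n0,"[ 0.1 -0.2\n 0.3 0.4 ]"\n'


def parse_embedding(text):
    """Convert '[ 0.1 -0.2\\n 0.3 0.4 ]' style text into a list of floats."""
    return [float(token) for token in text.strip().strip('[]').split()]


# newline='' leaves line endings untouched so the csv module can
# handle line breaks inside quoted fields itself.
rows = list(csv.reader(io.StringIO(SAMPLE, newline='')))
header, first = rows[0], rows[1]
print(len(rows))
print(parse_embedding(first[1]))
```

If a real file still fails with an "EOF inside string" error, that may point to an unbalanced quote somewhere in the data rather than to the line breaks themselves.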

How do I make a function that reads a csv, converts its lists into dictionaries to result in a certain output in python?

I have a csv file that contains strings and integers, such as:
Sunset,61,South,2002-12-28,48,3,25,168,42,13
Sunrise,42,North,2012-02-05,84,3,39,141,13,115
How would I write a function that opens and reads the csv and converts it into a dictionary (here named dict), so that I could call the function like this:
sun_state(dict, 61)
and get an output of:
[ '2002-12-28', '84', '4', '49', '308']
The output has the following format:
the date, the three integers after the date, and the sum of the last three integers.
If anything in my question is unclear, please tell me. I would appreciate any help, thank you.

You could simply read a csv file by using pandas and create a dictionary (although I could not understand your output given your explanation) as follows:
Code:
import pandas as pd
if __name__ == '__main__':
    df = pd.read_csv("data/id_to_date.csv", header=None, names=["Sunset","ID","Direction","Date","col1","col2","col3","col4","col5","col6"])
    sun_dict = {}
    for row_id, row in df.iterrows():
        values = [row["Date"], row["col1"], row["col2"], row["col3"], sum([row["col4"], row["col5"], row["col6"]])]
        sun_dict[row["ID"]] = values
    print(sun_dict[61])
Result:
['2002-12-28', 48, 3, 25, 223]
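The same result can be sketched with only the standard library and wrapped in a function shaped like the sun_state(dict, 61) call from the question. This is one possible reading of the requested format, not the asker's exact expected output. The names load_table and table are made up here, and the dictionary is deliberately not called dict so that it does not shadow the built-in type.

```python
import csv
import io

# The two sample lines from the question.
SAMPLE = """Sunset,61,South,2002-12-28,48,3,25,168,42,13
Sunrise,42,North,2012-02-05,84,3,39,141,13,115
"""


def load_table(lines):
    """Map the integer in the second column to [date, int, int, ...]."""
    table = {}
    for row in csv.reader(lines):
        if not row:  # ignore blank lines
            continue
        table[int(row[1])] = [row[3]] + [int(value) for value in row[4:]]
    return table


def sun_state(table, key):
    """Return the date, the next three integers, and the sum of the last three."""
    date, *numbers = table[key]
    return [date, *numbers[:3], sum(numbers[3:])]


table = load_table(io.StringIO(SAMPLE))
print(sun_state(table, 61))
```

With a real file, load_table could be given the object returned by open(path, newline='') instead of the in-memory sample.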

Pandas error reading csv with double quotes

I've read all related topics - like this, this and this - but couldn't get a solution to work.
I have an input csv file like this:
ItemId,Content
i0000008,{"Title":"Edison Kinetoscopic Record of a Sneeze","Year":"1894","Rated":"N/A"}
i0000010,{"Title":"Employees, Leaving the Lumiére, Factory","Year":"1895","Rated":"N/A"}
I've tried several different approaches but couldn't get it to work. I want to read this csv file into a Dataframe like this:
ItemId Content
-------- -------------------------------------------------------------------------------
i0000008 {"Title":"Edison Kinetoscopic Record of a Sneeze","Year":"1894","Rated":"N/A"}
i0000010 {"Title":"Employees, Leaving the Lumiére, Factory","Year":"1895","Rated":"N/A"}
With the following code (Python 3.9):
df = pd.read_csv('test.csv', sep=',', skipinitialspace = True, quotechar = '"')
As far as I understand, the commas inside the dictionary column, including those inside quotation marks, are being treated as regular separators, which raises the following error:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 6
Is it possible to produce the desired result? Thanks.

The problem is that the commas in the Content column are interpreted as separators. You can work around this with pd.read_fwf, manually setting the character positions at which to split (this works here because every ItemId in the sample is exactly 8 characters long):
df = pd.read_fwf('test.csv', colspecs=[(0, 8),(9,100)], header=0, names=['ItemId', 'Content'])
Result:
   ItemId    Content
0  i0000008  {"Title":"Edison Kinetoscopic Record of a Sneeze","Year":"1894","Rated":"N/A"}
1  i0000010  {"Title":"Employees, Leaving the Lumiére, Factory","Year":"1895","Rated":"N/A"}

I don't think you'll be able to read it normally with pandas because it has the delimiter used multiple times for a single value; however, reading it with python and doing some processing, you should be able to convert it to pandas dataframe:
def splitValues(x):
    index = x.find(',')
    return x[:index], x[index+1:].strip()
import pandas as pd
data = open('file.csv')
columns = next(data)
columns = columns.strip().split(',')
df = pd.DataFrame(columns=columns, data=(splitValues(row) for row in data))
OUTPUT:
ItemId Content
0 i0000008 {"Title":"Edison Kinetoscopic Record of a Sneeze","Year":"1894","Rated":"N/A"}
1 i0000010 {"Title":"Employees, Leaving the Lumiére, Factory","Year":"1895","Rated":"N/A"}
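Since the Content column in the sample is valid JSON, the split on the first comma can be combined with json.loads so that each record becomes a Python dictionary. This is a sketch under that assumption; the function name parse_items is made up, and the sample lines are the ones from the question.

```python
import io
import json

SAMPLE = '''ItemId,Content
i0000008,{"Title":"Edison Kinetoscopic Record of a Sneeze","Year":"1894","Rated":"N/A"}
i0000010,{"Title":"Employees, Leaving the Lumiére, Factory","Year":"1895","Rated":"N/A"}
'''


def parse_items(lines):
    """Split each line on its first comma only and decode the JSON part."""
    header = next(lines).strip().split(',')
    records = []
    for line in lines:
        if not line.strip():  # ignore blank lines
            continue
        item_id, content = line.rstrip('\n').split(',', 1)
        records.append({header[0]: item_id, header[1]: json.loads(content)})
    return records


records = parse_items(io.StringIO(SAMPLE))
print(records[1]['Content']['Title'])
```

The resulting list of dictionaries can be handed to the pandas DataFrame constructor if a dataframe is still wanted.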

Reading data with more columns than expected into a dataframe

I have a number of .csv files that I download into a directory.
Each .csv is supposed to have 3 columns of information. The head of one of these files looks like:
17/07/2014,637580,10.755
18/07/2014,61996,10.8497
21/07/2014,126758,10.8208
22/07/2014,520926,10.8201
23/07/2014,370843,9.2883
The code that I am using to read the .csv into a dataframe (df) is:
df = pd.read_csv(adj_directory+'\\'+filename, error_bad_lines=False,names=['DATE', 'PX', 'RAW'])
Where I name the three columns (DATE, PX and RAW).
This works fine when the file is formatted correctly. However I have noticed that sometimes the .csv has a slightly different format and can look like for example:
09/07/2014,26268315,,
10/07/2014,6601181,16.3857
11/07/2014,916651,12.5879
14/07/2014,213357,,
15/07/2014,205019,10.8607
where a column value is missing and an extra comma appears in its place. This means that the file fails to load into the dataframe (the df dataframe is empty).
Is there a way to read the data into a dataframe despite the extra comma (treating the missing value as NaN) so the df would look like:
09/07/2014,26268315,NaN
10/07/2014,6601181,16.3857
11/07/2014,916651,12.5879
14/07/2014,213357,NaN
15/07/2014,205019,10.8607

Probably best to fix the file upstream so that missing values aren't filled with a ,. But if necessary you can correct the file in python, by replacing ,, with just , (line-by-line). Taking your bad file as test.csv:
import re
import csv
patt = re.compile(r",,")
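with open('corrected.csv', 'w') as f2:
    with open('test.csv') as f:
        # Collapse each ",," into "," before the csv module parses the line
        for line in csv.reader(map(lambda s: patt.sub(',', s), f)):
            f2.write(','.join(str(x) for x in line))
            f2.write('\n')
# No explicit close() calls are needed: the with blocks close both files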
Output: corrected.csv
09/07/2014,26268315,
10/07/2014,6601181,16.3857
11/07/2014,916651,12.5879
14/07/2014,213357,
15/07/2014,205019,10.8607
Then you should be able to read in this file without issue:
import pandas as pd
df = pd.read_csv('corrected.csv', names=['DATE', 'PX', 'RAW'])
DATE PX RAW
0 09/07/2014 26268315 NaN
1 10/07/2014 6601181 16.3857
2 11/07/2014 916651 12.5879
3 14/07/2014 213357 NaN
4 15/07/2014 205019 10.8607
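An alternative that avoids writing a corrected file is to parse the lines in memory, keep only the first three fields, and turn an empty third field into NaN. The sketch below uses only the standard library; the function name read_three_columns is made up, and the sample is a shortened copy of the badly formatted file from the question.

```python
import csv
import io
import math

SAMPLE = """09/07/2014,26268315,,
10/07/2014,6601181,16.3857
14/07/2014,213357,,
"""


def read_three_columns(lines):
    """Return (DATE, PX, RAW) tuples, using NaN where RAW is empty."""
    rows = []
    for fields in csv.reader(lines):
        if not fields:  # ignore blank lines
            continue
        # Pad short rows and drop anything beyond the third field.
        date, px, raw = (fields + ['', ''])[:3]
        rows.append((date, int(px), float(raw) if raw else math.nan))
    return rows


rows = read_three_columns(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

The list of tuples can then be passed to the pandas DataFrame constructor with columns=['DATE', 'PX', 'RAW'].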

Had this problem yesterday.
Have you tried:
pd.read_csv(adj_directory+'\\'+filename,
error_bad_lines=False,names=['DATE', 'PX', 'RAW'],
keep_default_na=False,
na_values=[''])

Splitting this CSV file into a list

So I want to read in a csv file in Python 3 and split it into a list, where each index (0, 1, and so on) corresponds to one of the comma-separated values.
This is my csv file:
2017-04-1,14.9,30.1,0,8.2,10.8,NE,33,11:55,20.3,51,0,E,11,1023.9,29.5,25,0,ENE,7,1020.3
,2017-04-2,17.4,31.6,0,8.0,10.9,NE,35,08:56,23.5,34,0,NE,17,1021.4,30.7,20,0,SE,9,1018.6
2017-04-3,12.1,31.8,0,6.8,10.8,SSW,33,15:14,23.1,39,0,SSE,6,1022.7,29.3,34,0,SSW,19,1020.8
,2017-04-4,15.4,30.4,0,7.0,10.7,E,28,03:01,19.8,64,0,ESE,11,1024.9,29.7,29,0,S,9,1020.3
,2017-04-5,12.3,30.4,0,5.2,10.6,S,19,13:10,21.7,55,0,NE,6,1018.2,27.7,37,0,WSW,11,1013.5
,2017-04-6,13.7,24.4,0,5.2,8.1,SW,43,16:16,17.1,94,7,NE,2,1013.1,22.8,63,3,SSW,20,1011.7
,2017-04-7,14.8,22.3,0,5.4,8.9,SSE,35,06:26,16.4,56,5,SSE,17,1023.6,21.3,33,3,SSE,4,1021.6
,2017-04-8,8.7,23.6,0,5.0,10.5,SW,33,15:41,16.0,58,0,SE,7,1027.6,22.1,44,0,SW,17,1024.5
,2017-04-9,11.1,27.4,0,4.8,10.4,ESE,24,10:30,18.1,56,0,ENE,13,1027.4,26.8,26,0,NE,9,1023.1
,2017-04-10,10.0,30.1,0,5.6,10.4,SSW,24,16:38,19.3,51,4,E,9,1022.7,30.0,20,1,E,6,1018.4
,2017-04-11,13.1,28.1,0,6.6,10.5,SW,28,15:02,21.8,38,0,NE,9,1016.6,26.6,35,0,SW,13,1015.7
,2017-04-12,10.6,23.8,0,5.2,9.7,SW,35,16:19,17.4,69,6,ENE,9,1021.5,23.0,52,1,SW,15,1019.9
,2017-04-13,12.9,26.8,0,4.2,10.4,SSW,31,15:56,19.9,64,1,ESE,11,1024.3,25.5,49,1,SW,15,1020.1
,2017-04-14,12.8,29.0,0,5.8,6.2,SW,22,15:43,18.1,73,4,SSE,2,1019.3,27.6,42,5,SSW,11,1015.4
,2017-04-15,14.8,29.3,0,4.0,7.3,SSE,31,22:03,18.5,73,6,S,11,1015.7,28.2,38,7,SE,9,1011.7
,2017-04-16,17.2,25.1,0,5.4,7.0,SSE,35,00:43,18.8,66,4,ESE,11,1014.6,24.4,54,5,SW,11,1011.6
,2017-04-17,15.4,21.8,0,5.0,2.5,SSW,24,07:56,17.8,74,5,S,13,1015.3,21.4,59,8,SSW,11,1013.2
,2017-04-18,15.3,25.0,0,4.0,8.0,SSW,31,19:02,19.7,72,6,SSE,9,1013.0,22.8,63,1,SW,15,1010.4
Currently when I read it in, each element is being split at the end of the line. So if I accessed index[0] this would be the output.
2017-04-1,14.9,30.1,0,8.2,10.8,NE,33,11:55,20.3,51,0,E,11,1023.9,29.5,25,0,ENE,7,1020.3
What I need to understand is how to split the csv so that if I access index[0], I will be given 2017-04-1. And index[1] would give the next value after the comma.
Here is my code at the moment.
import matplotlib.pyplot as plt
# Opening and reading files
weatherdata = open('weather.csv', encoding='latin1')
data = weatherdata.readlines()
The encoding has to be latin1 because the file contains degree symbols.
Thanks for the guidance.

You have already read all lines into data:
weatherdata = open('weather.csv')
data = weatherdata.readlines()
data
data will be a string list:
['2017-04-1,14.9,30.1,0,8.2,10.8,NE,33,11:55,20.3,51,0,E,11,1023.9,29.5,25,0,ENE,7,1020.3\n',
'2017-04-2,17.4,31.6,0,8.0,10.9,NE,35,08:56,23.5,34,0,NE,17,1021.4,30.7,20,0,SE,9,1018.6 \n',
'2017-04-3,12.1,31.8,0,6.8,10.8,SSW,33,15:14,23.1,39,0,SSE,6,1022.7,29.3,34,0,SSW,19,1020.8\n',
'2017-04-4,15.4,30.4,0,7.0,10.7,E,28,03:01,19.8,64,0,ESE,11,1024.9,29.7,29,0,S,9,1020.3\n',
'2017-04-5,12.3,30.4,0,5.2,10.6,S,19,13:10,21.7,55,0,NE,6,1018.2,27.7,37,0,WSW,11,1013.5\n',
'2017-04-6,13.7,24.4,0,5.2,8.1,SW,43,16:16,17.1,94,7,NE,2,1013.1,22.8,63,3,SSW,20,1011.7\n',
'2017-04-7,14.8,22.3,0,5.4,8.9,SSE,35,06:26,16.4,56,5,SSE,17,1023.6,21.3,33,3,SSE,4,1021.6\n',
'2017-04-8,8.7,23.6,0,5.0,10.5,SW,33,15:41,16.0,58,0,SE,7,1027.6,22.1,44,0,SW,17,1024.5\n',
'2017-04-9,11.1,27.4,0,4.8,10.4,ESE,24,10:30,18.1,56,0,ENE,13,1027.4,26.8,26,0,NE,9,1023.1\n',
'2017-04-10,10.0,30.1,0,5.6,10.4,SSW,24,16:38,19.3,51,4,E,9,1022.7,30.0,20,1,E,6,1018.4\n',
'2017-04-11,13.1,28.1,0,6.6,10.5,SW,28,15:02,21.8,38,0,NE,9,1016.6,26.6,35,0,SW,13,1015.7\n',
'2017-04-12,10.6,23.8,0,5.2,9.7,SW,35,16:19,17.4,69,6,ENE,9,1021.5,23.0,52,1,SW,15,1019.9\n',
'2017-04-13,12.9,26.8,0,4.2,10.4,SSW,31,15:56,19.9,64,1,ESE,11,1024.3,25.5,49,1,SW,15,1020.1\n',
'2017-04-14,12.8,29.0,0,5.8,6.2,SW,22,15:43,18.1,73,4,SSE,2,1019.3,27.6,42,5,SSW,11,1015.4\n',
'2017-04-15,14.8,29.3,0,4.0,7.3,SSE,31,22:03,18.5,73,6,S,11,1015.7,28.2,38,7,SE,9,1011.7\n',
'2017-04-16,17.2,25.1,0,5.4,7.0,SSE,35,00:43,18.8,66,4,ESE,11,1014.6,24.4,54,5,SW,11,1011.6\n',
'2017-04-17,15.4,21.8,0,5.0,2.5,SSW,24,07:56,17.8,74,5,S,13,1015.3,21.4,59,8,SSW,11,1013.2\n',
'2017-04-18,15.3,25.0,0,4.0,8.0,SSW,31,19:02,19.7,72,6,SSE,9,1013.0,22.8,63,1,SW,15,1010.4']
Then use data[0].split(',')[0], it will give you:
'2017-04-1'
and data[0].split(',')[1], will be:
'14.9'
and so on.

Simply read and then split:
weatherdata = open('weather.csv')
data = [line.split(',') for line in weatherdata.read().splitlines()]
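The csv module from the standard library does the same splitting and also copes with quoted fields. A minimal sketch follows, using a shortened made-up sample in place of weather.csv; the function name read_rows is hypothetical.

```python
import csv
import io

# Shortened stand-in for the first two lines of weather.csv.
SAMPLE = (
    "2017-04-1,14.9,30.1,0,8.2,10.8,NE\n"
    "2017-04-2,17.4,31.6,0,8.0,10.9,NE\n"
)


def read_rows(lines):
    """Return one list of string fields per non-empty line."""
    return [row for row in csv.reader(lines) if row]


rows = read_rows(io.StringIO(SAMPLE))
print(rows[0][0])
print(rows[0][1])

# With the real file, the same function could be used as:
# with open('weather.csv', encoding='latin1', newline='') as f:
#     rows = read_rows(f)
```

Every field comes back as a string, so numeric columns still need an explicit float() or int() conversion before plotting.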

Or you can use pandas, which does this for you. Pandas is very useful for reading and manipulating datasets: it reads all the data at once, and you can then access the individual columns.
import pandas as pd
df = pd.read_csv('weather.csv', header=None, encoding='latin1')  # the file has no header row
df.iloc[:, 0]  # this will get the first column
