How to eliminate null-valued cells from a CSV dataset using Python?

Each row represents a person (315 total) and each column represents a choice scenario (16 total). Each person responded to 4 randomly assigned consecutive choice scenarios. I want four consecutive columns holding each person's responses, with no blank cells.
import pandas as pd
df = pd.read_csv(r"C:\Users\Admin\Desktop\Book2.csv")
for (r,c) in df.iterrows():
    if df.iat[r,c] is not None:
        for i in range(4):
            print(str(df.iat[r,c+i]))
UPDATE
I have managed to get the data row-wise into a list and grouped it into groups of 4 (as I need it). Now how do I keep only the elements with values other than ' '?
import csv
rowdata = []
with open(r'C:\Users\ARPLAB31\Desktop\SPdata.csv') as inputfile:
    reader = csv.reader(inputfile)
    rowdata = list(reader)
r = []
for i in range(1,718,1):
    for j in range(28):
        if len(rowdata[i][j])!=0:
            r.append(rowdata[i][j])
# cardref contains the partitioned data.
cardref = [r[x:x+4] for x in range(0, len(r),4)]
print(cardref)


Use df.isnull(), where df is a pandas DataFrame.
A good resource on finding null values in a pandas DataFrame:
https://dzone.com/articles/pandas-find-rows-where-columnfield-is-null

You can read each row, keep the non-null fields, and create a new CSV from there.
For example:
data = ",,2,2,2,2,,,"
arr = filter(None, data.split(",")) #removes null fields
",".join(arr) #"2,2,2,2"

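Applied to a whole file, the row-by-row idea above could be sketched as follows. This is only a minimal illustration using the standard csv module: the sample data, the header names s1..s6, and the output column names r1..r4 are made up, and it assumes every row has exactly four answered cells.

```python
import csv
import io

# Hypothetical sample: each row has exactly 4 answered cells and some blanks.
raw = "s1,s2,s3,s4,s5,s6\n1,2,3,4,,\n,,5,6,7,8\n,9,8,7,6,\n"

reader = csv.reader(io.StringIO(raw))
next(reader)  # skip the header row

# Keep only the non-empty fields of every row.
compacted = [[cell for cell in row if cell.strip()] for row in reader]

# Write the compacted rows out as a new CSV (to a string here, not a file).
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["r1", "r2", "r3", "r4"])  # hypothetical column names
writer.writerows(compacted)
print(out.getvalue())
```

For a real file, the two io.StringIO objects would be replaced by open(...) calls on the input and output paths.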
Thanks, all of you. I managed to solve my problem with the help of the comments above. Please suggest any improvements to the code in the comments.
import pandas
import csv
rowdata = []
# READING CSV INTO LIST
with open('FILE.csv') as inputfile:
    reader = csv.reader(inputfile)
    rowdata = list(reader)
# RECORDING THE POSITION OF NON-EMPTY ELEMENTS
r = []
for i in range(1,718,1):
    for j in range(28):
        if len(rowdata[i][j])!=0:
            r.append(j)
# RE-GROUPING LIST AS LIST IN LIST
resp_index = [r[x:x+4] for x in range(0, len(r),4)]
print(resp_index)
print(len(resp_index))
# ELIMINATING BLANK SPACES AND STORING INTO NEW LIST
s = []
for i in range(1,718,1):
    for j in range(28):
        if len(rowdata[i][j])!=0:
            s.append(rowdata[i][j])
# RE-GROUPING LIST AS LIST IN LIST
resp_main = [s[x:x+4] for x in range(0, len(s),4)]
print(resp_main)
print(len(resp_main))
index_df = pandas.DataFrame(resp_index)
resp_df = pandas.DataFrame(resp_main)
# SAVING TO CSV FILES
index_df.to_csv('INDEX.csv')
resp_df.to_csv('RESPONSE.csv')

Related

The axis argument to unique is not supported for dtype object

I am trying to get unique counts column-wise but my array has categorical variables (dtype object)
val, count = np.unique(x, axis=1, return_counts=True)
Though I am getting an error like this:
TypeError: The axis argument to unique is not supported for dtype object
How do I solve this problem?
Sample x:
array([[' Private', ' HS-grad', ' Divorced'],
[' Private', ' 11th', ' Married-civ-spouse'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Masters', ' Married-civ-spouse'],
[' Private', ' 9th', ' Married-spouse-absent'],
[' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
[' Private', ' Masters', ' Never-married'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)
Need the following counts:
for x_T in x.T:
    val, count = np.unique(x_T, return_counts=True)
    print (val,count)
[' Private' ' Self-emp-not-inc'] [8 1]
[' 11th' ' 9th' ' Bachelors' ' HS-grad' ' Masters' ' Some-college'] [1 1 2 2 2 1]
[' Divorced' ' Married-civ-spouse' ' Married-spouse-absent'
' Never-married'] [1 6 1 1]
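You could use itemfreq. Although the output does not look like yours, it delivers the desired counts (they are computed over the flattened array, which works here because the columns share no values). Note that scipy.stats.itemfreq was deprecated in SciPy 1.0 and removed in SciPy 1.3, so this only runs on older SciPy versions: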
import numpy as np
from scipy.stats import itemfreq
x = np.array([[' Private', ' HS-grad', ' Divorced'],
[' Private', ' 11th', ' Married-civ-spouse'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Masters', ' Married-civ-spouse'],
[' Private', ' 9th', ' Married-spouse-absent'],
[' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
[' Private', ' Masters', ' Never-married'],
[' Private', ' Bachelors', ' Married-civ-spouse'],
[' Private', ' Some-college', ' Married-civ-spouse']], dtype=object)
itemfreq(x)
Output:
array([[' 11th', 1],
[' 9th', 1],
[' Bachelors', 2],
[' Divorced', 1],
[' HS-grad', 2],
[' Married-civ-spouse', 6],
[' Married-spouse-absent', 1],
[' Masters', 2],
[' Never-married', 1],
[' Private', 8],
[' Self-emp-not-inc', 1],
[' Some-college', 1]], dtype=object)
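Otherwise you could try to specify another dtype, such as:
val, count = np.unique(x.astype("<U22"), axis=1, return_counts=True)
This runs without the TypeError, but with axis=1 np.unique treats each column as a single item and returns the unique columns, so it does not produce the per-column counts shown in the question.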
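If the goal is simply one set of counts per column, a plain-Python alternative avoids the axis argument altogether. The sketch below swaps np.unique for collections.Counter and uses a shortened, made-up version of the sample data held in a nested list rather than a NumPy array.

```python
from collections import Counter

# Shortened, hypothetical version of the sample data (a plain list of rows).
rows = [
    [' Private', ' HS-grad', ' Divorced'],
    [' Private', ' 11th', ' Married-civ-spouse'],
    [' Self-emp-not-inc', ' HS-grad', ' Married-civ-spouse'],
]

# zip(*rows) yields one tuple per column; Counter tallies the values in it.
column_counts = [Counter(column) for column in zip(*rows)]

for counts in column_counts:
    print(sorted(counts.items()))
```

For an object array x like the one in the question, iterating over x.T instead of zip(*rows) should give the same per-column tallies.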

How to get access to every element of the 2d array and then replace them?

arr = [['.' for i in range(4)] for j in range(4)]
for line, i in enumerate(arr):
    for column, j in enumerate(i):
        # here we know which position every element takes
        print(j, 'at column', column+1, 'line', line+1)
How can I check, outside the loop, whether one coordinate differs from another coordinate?
What I want to get in the end:
pseudocode:
#arr[x][y]
arr[1][0] = 'new'
if arr[1][4] - arr[1][0] == 4: # i.e. the coordinates differ by 4 positions in `y`
    arr[1][4] = 'new'
# Before || After
[[' ', 'new', ' ', ' '],|| [[' ', ' ', ' ', ' '],
[' ', ' ', ' ', ' '], || [' ', ' ', ' ', ' '],
[' ', ' ', ' ', ' '], || [' ', ' ', ' ', ' '],
[' ', ' ', ' ', ' '], || [' ', ' ', ' ', ' '],
[' ', ' ', ' ', ' ']] || [' ', 'new', ' ', ' ']]
OR
#arr[x][y]
arr[0][0] = 'new'
if arr[3][0] - arr[0][0] == 3: # i.e. the coordinates differ by 3 positions in `x`
    arr[3][0] = 'new'
# Before || After
[['new', ' ', ' ', ' '],|| [[' ', ' ', ' ', 'new'],
[' ', ' ', ' ', ' '], || [' ', ' ', ' ', ' '],
[' ', ' ', ' ', ' '], || [' ', ' ', ' ', ' '],
[' ', ' ', ' ', ' '], || [' ', ' ', ' ', ' '],
[' ', ' ', ' ', ' ']] || [' ', ' ', ' ', ' ']]
I need to know which position each inner list takes in the main list, but how can I do that outside of the loop, without numpy, using native Python?
Question: How can I check, outside the loop, whether one coordinate differs from another?
Define the coordinates as tuples, then compare the tuples:
RC = (1,4)
RC2 = (1,0)
if RC == RC2:
    print('Equal')
else:
    print('Different')
Python lists are 0-based.
You have a list of lists.
RC coordinate system, the same as in Excel:
R == index of list in list == y == Row
C == index inside a list == x == Column
        A    B    C    D
C =>    0    1    2    3
      --------------------
R:0 | [0.0, 0.1, 0.2, 0.3]
R:1 | [1.0, 1.1, 1.2, 1.3]
R:2 | [2.0, 2.1, 2.2, 2.3]
R:3 | [3.0, 3.1, 3.2, 3.3]
So use the opposite index order:
#arr[y][x]
The first index, y, is the index of a list in arr, i.e. the row.
The second index, x, is the index inside the list selected with y, i.e. the column.
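Putting the tuple comparison and the row-first indexing together, moving a marker from one cell to another might look like the sketch below. The 5 x 4 grid and the names grid, old and new are hypothetical, chosen only to mirror the Before/After pictures in the question.

```python
# Hypothetical 5-row by 4-column grid, indexed as grid[row][col].
grid = [[' ' for _ in range(4)] for _ in range(5)]

old = (0, 1)   # (row, col) of the current marker
new = (4, 1)   # (row, col) where the marker should go

grid[old[0]][old[1]] = 'new'

# Tuples compare element by element, so this tests "same cell or not".
if old != new:
    row_diff = new[0] - old[0]
    col_diff = new[1] - old[1]
    # Move the marker: clear the old cell, fill the new one.
    grid[old[0]][old[1]] = ' '
    grid[new[0]][new[1]] = 'new'
```

Keeping the coordinates in tuples means no loop is needed to find out where an element sits; the position is stored alongside the grid rather than derived from it.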

Handle Fortran Character Arrays from Python with F2PY

I have a legacy Fortran library I've wrapped with F2PY. However, I'm at a loss for how to properly read character arrays declared as module data from Python. The data comes through, but the array is transposed in such a way that it is unreadable. How can I get NumPy to handle my array correctly? I'd be satisfied with a 2-dimensional array of characters if they were in an intelligible order.
The character arrays are declared and populated in Fortran like so:
module plot_mod
implicit none
CHARACTER*4, JSP(39)
...
JSP = (/ &
'SF ', 'WF ', 'GF ', 'AF ', 'RF ', 'SS ', 'NF ', &
'YC ', 'IC ', 'ES ', 'LP ', 'JP ', 'SP ', 'WP ', &
'PP ', 'DF ', 'RW ', 'RC ', 'WH ', 'MH ', 'BM ', &
'RA ', 'WA ', 'PB ', 'GC ', 'AS ', 'CW ', 'WO ', &
'WJ ', 'LL ', 'WB ', 'KP ', 'PY ', 'DG ', 'HT ', &
'CH ', 'WI ', ' ', 'OT '/)
end module plot_mod
In Python 2.7 (with an earlier version of NumPy) I could do this:
x = numpy.frombuffer(fvslib.plot_mod.jsp.data, numpy.dtype('a4'))
But now Python (3.4.4) and NumPy (1.10.4) raise an error: BufferError: memoryview: underlying buffer is not C-contiguous.
I know I should be able to get Numpy to handle this for me by reshaping, or using stride tricks, but I can't seem to figure it out. The array is reported as F-contiguous, so at least that seems correct.
If I simply print the array it looks like this:
array([[b'S', b' ', b' ', b'L'],
[b'F', b'L', b' ', b' '],
[b' ', b'P', b'B', b' '],
[b' ', b' ', b'M', b'W'],
[b'W', b' ', b' ', b'B'],
[b'F', b'J', b' ', b' '],
[b' ', b'P', b'R', b' '],
[b' ', b' ', b'A', b'K'],
[b'G', b' ', b' ', b'P'],
[b'F', b'S', b' ', b' '],
[b' ', b'P', b'W', b' '],
[b' ', b' ', b'A', b'P'],
[b'A', b' ', b' ', b'Y'],
[b'F', b'W', b' ', b' '],
[b' ', b'P', b'P', b' '],
[b' ', b' ', b'B', b'D'],
[b'R', b' ', b' ', b'G'],
[b'F', b'P', b' ', b' '],
[b' ', b'P', b'G', b' '],
[b' ', b' ', b'C', b'H'],
[b'S', b' ', b' ', b'T'],
[b'S', b'D', b' ', b' '],
[b' ', b'F', b'A', b' '],
[b' ', b' ', b'S', b'C'],
[b'N', b' ', b' ', b'H'],
[b'F', b'R', b' ', b' '],
[b' ', b'W', b'C', b' '],
[b' ', b' ', b'W', b'W'],
[b'Y', b' ', b' ', b'I'],
[b'C', b'R', b' ', b' '],
[b' ', b'C', b'W', b' '],
[b' ', b' ', b'O', b' '],
[b'I', b' ', b' ', b' '],
[b'C', b'W', b' ', b' '],
[b' ', b'H', b'W', b' '],
[b' ', b' ', b'J', b'O'],
[b'E', b' ', b' ', b'T'],
[b'S', b'M', b' ', b' '],
[b' ', b'H', b'L', b' ']],
dtype='|S1')
What I would like is an array like this:
[['SF ']
, ['WF ']
, ['GF ']
, ['AF ']
, ['RF ']
, ['SS ']
, ['NF ']
, ['YC ']
, ['IC ']
, ['ES ']
, ['LP ']
, ['JP ']
, ['SP ']
, ['WP ']
, ['PP ']
, ['DF ']
, ['RW ']
, ['RC ']
, ['WH ']
, ['MH ']
, ['BM ']
, ['RA ']
, ['WA ']
, ['PB ']
, ['GC ']
, ['AS ']
, ['CW ']
, ['WO ']
, ['WJ ']
, ['LL ']
, ['WB ']
, ['KP ']
, ['PY ']
, ['DG ']
, ['HT ']
, ['CH ']
, ['WI ']
, [' ']
, ['OT ']]
I haven't tried running f2py on your module, but if I define the array you show as:
In [11]: s = array([[b'S', b' ', b' ', b'L'],
...: [b'F', b'L', b' ', b' '],
...: [b' ', b'P', b'B', b' '],
...: [b' ', b' ', b'M', b'W'],
...: [b'W', b' ', b' ', b'B'],
...: [b'F', b'J', b' ', b' '],
...: [b' ', b'P', b'R', b' '],
...: [b' ', b' ', b'A', b'K'],
...: [b'G', b' ', b' ', b'P'],
...: [b'F', b'S', b' ', b' '],
...: [b' ', b'P', b'W', b' '],
...: [b' ', b' ', b'A', b'P'],
...: [b'A', b' ', b' ', b'Y'],
...: [b'F', b'W', b' ', b' '],
...: [b' ', b'P', b'P', b' '],
...: [b' ', b' ', b'B', b'D'],
...: [b'R', b' ', b' ', b'G'],
...: [b'F', b'P', b' ', b' '],
...: [b' ', b'P', b'G', b' '],
...: [b' ', b' ', b'C', b'H'],
...: [b'S', b' ', b' ', b'T'],
...: [b'S', b'D', b' ', b' '],
...: [b' ', b'F', b'A', b' '],
...: [b' ', b' ', b'S', b'C'],
...: [b'N', b' ', b' ', b'H'],
...: [b'F', b'R', b' ', b' '],
...: [b' ', b'W', b'C', b' '],
...: [b' ', b' ', b'W', b'W'],
...: [b'Y', b' ', b' ', b'I'],
...: [b'C', b'R', b' ', b' '],
...: [b' ', b'C', b'W', b' '],
...: [b' ', b' ', b'O', b' '],
...: [b'I', b' ', b' ', b' '],
...: [b'C', b'W', b' ', b' '],
...: [b' ', b'H', b'W', b' '],
...: [b' ', b' ', b'J', b'O'],
...: [b'E', b' ', b' ', b'T'],
...: [b'S', b'M', b' ', b' '],
...: [b' ', b'H', b'L', b' ']],
...: dtype='|S1')
I can get an array that looks like what you want with:
In [12]: s.T.reshape(-1, 4).view('S4')
Out[12]:
array([[b'SF '],
[b'WF '],
[b'GF '],
[b'AF '],
[b'RF '],
[b'SS '],
[b'NF '],
[b'YC '],
[b'IC '],
[b'ES '],
[b'LP '],
[b'JP '],
[b'SP '],
[b'WP '],
[b'PP '],
[b'DF '],
[b'RW '],
[b'RC '],
[b'WH '],
[b'MH '],
[b'BM '],
[b'RA '],
[b'WA '],
[b'PB '],
[b'GC '],
[b'AS '],
[b'CW '],
[b'WO '],
[b'WJ '],
[b'LL '],
[b'WB '],
[b'KP '],
[b'PY '],
[b'DG '],
[b'HT '],
[b'CH '],
[b'WI '],
[b' '],
[b'OT ']],
dtype='|S4')
Note that the data type is 'S4', to match the declared size of the Fortran array.
That result leaves a trivial second dimension, so you might want to convert it to a one-dimensional array, e.g.
In [22]: s.T.reshape(-1, 4).view('S4')[:,0]
Out[22]:
array([b'SF ', b'WF ', b'GF ', b'AF ', b'RF ', b'SS ', b'NF ',
b'YC ', b'IC ', b'ES ', b'LP ', b'JP ', b'SP ', b'WP ',
b'PP ', b'DF ', b'RW ', b'RC ', b'WH ', b'MH ', b'BM ',
b'RA ', b'WA ', b'PB ', b'GC ', b'AS ', b'CW ', b'WO ',
b'WJ ', b'LL ', b'WB ', b'KP ', b'PY ', b'DG ', b'HT ',
b'CH ', b'WI ', b' ', b'OT '],
dtype='|S4')
For completeness I'll include this alternative solution. It gives the same results as @Warren Weckesser's answer, but requires an additional import.
import numpy as np
from numpy.lib import stride_tricks
# jsp is the F2PY-wrapped array (fvslib.plot_mod.jsp in the question)
spp = stride_tricks.as_strided(jsp, strides=(jsp.shape[1],1))
# View as S4 and strip whitespace
spp = np.char.strip(spp.view('S4'))

Pythonic way of creating list of lists from list

I would like to create a list of lists from a list.
The list looks like this:
level = [' WWWWWWWWWWWWWWWWW', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C E']
I need to create this:

I have done it this way:
listofLists = []
for row in level:
    liss = []
    for col in row:
        liss.append(col)
    listofLists.append(liss)
What is a more pythonic way or shorter way of doing this?
>>> listofLists = map(list,level)
(in python3, if you really need a list, do list(map(list, level)))
When you call list() on a string this will return the list of all its characters (including spaces).
level = [' WWWWWWWWWWWWWWWWW', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C W C W', 'C E']
transformed = [list(x) for x in level]
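As a quick check that the two spellings agree, here is a small sketch on a shortened, made-up level. It also shows the usual reason for wanting lists instead of strings: individual cells can then be changed in place.

```python
# Hypothetical, shortened level; spaces are ordinary characters too.
level = ['WWW', 'C W', 'C E']

grid = [list(row) for row in level]   # list comprehension form
same = list(map(list, level))         # map form, wrapped in list() for Python 3

# Unlike the original strings, the inner lists are mutable.
grid[1][1] = 'X'

print(same)
print(grid)
```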

Transposing a 3D list in Python

I have to transpose a 3d list the following way:
Input:
matrix7 = [[['A ', 'E ', 'C#'], ['B ', 'E ', 'C#'], ['C ', 'E ', 'C#']],
[[' ', 'F#', 'D '], [' ', 'F#', 'D '], [' ', 'F#', 'D ']],
[[' ', 'E ', 'B '], [' ', 'E ', 'B '], [' ', 'E ', 'B ']],
[[' ', 'E ', 'C#'], [' ', 'E ', 'C#'], [' ', 'E ', 'C#']],
[[' ', 'F#', 'D '], [' ', 'F#', 'D '], [' ', 'F#', 'D ']],
[[' ', 'E ', 'B '], [' ', 'E ', 'B '], [' ', 'E ', 'B ']],
[[' ', ' ', ' '], [' ', ' ', ' '], [' ', ' ', ' ']],
[[' ', 'E ', 'C#'], [' ', 'E ', 'C#'], [' ', 'E ', 'C#']]]
desired output:
[[['A ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']],
[['B ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'],
[' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']],
[['C ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'],
[' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']]]
I have the following program:
matrix8 = []
for index in matrix7:
    matrix8 = numpy.array(matrix7).transpose()
matrix9 = matrix8.tolist()
print matrix9
which is giving me the wrong output:
[[['A ', 'E ', 'C#'], ['B ', 'E ', 'C#'], ['C ', 'E ', 'C#']],
[[' ', 'F#', 'D '], [' ', 'F#', 'D '], [' ', 'F#', 'D ']],
[[' ', 'E ', 'B '], [' ', 'E ', 'B '], [' ', 'E ', 'B ']],
[[' ', 'E ', 'C#'], [' ', 'E ', 'C#'], [' ', 'E ', 'C#']],
[[' ', 'F#', 'D '], [' ', 'F#', 'D '], [' ', 'F#', 'D ']],
[[' ', 'E ', 'B '], [' ', 'E ', 'B '], [' ', 'E ', 'B ']],
[[' ', ' ', ' '], [' ', ' ', ' '], [' ', ' ', ' ']],
[[' ', 'E ', 'C#'], [' ', 'E ', 'C#'], [' ', 'E ', 'C#']]]
Can anyone help me with this?
I think this is what you want:
numpy.transpose(matrix7, axes=(1, 0, 2)).tolist() # The 'axes' argument tells transpose to swap axes 0 and 1, leaving the last one alone.
OUTPUT:
[[['A ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']],
[['B ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']],
[['C ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']]]
You don't actually need numpy for this:
>>> [list(x) for x in zip(*matrix7)]
[[['A ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']],
[['B ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']],
[['C ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']]]
Or, if you don't mind getting a list of tuples of lists, just:
>>> list(zip(*matrix7))
[(['A ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']),
(['B ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']),
(['C ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#'])]
Or, in Python 2.x, even less:
>>> zip(*matrix7)
[(['A ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']),
(['B ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#']),
(['C ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', 'E ', 'C#'], [' ', 'F#', 'D '], [' ', 'E ', 'B '], [' ', ' ', ' '], [' ', 'E ', 'C#'])]
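The zip-based transposition is easier to follow on a small input. The sketch below uses a made-up 2 x 3 x 2 nested list and checks that applying the same step twice gets back to the starting point.

```python
# Hypothetical 2 x 3 x 2 nested list: 2 outer items, each with 3 inner lists.
matrix = [
    [['A', 'E'], ['B', 'E'], ['C', 'E']],
    [['a', 'f'], ['b', 'f'], ['c', 'f']],
]

# zip(*matrix) pairs up the i-th inner list of every outer item,
# which swaps the first two axes and leaves the innermost lists intact.
swapped = [list(group) for group in zip(*matrix)]

# Swapping the first two axes again restores the original nesting.
restored = [list(group) for group in zip(*swapped)]

print(swapped)
```

Note that zip stops at the shortest outer item, so this only behaves like a transpose when every outer item holds the same number of inner lists.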
