Write multiple numpy arrays to file - python

I know how to use numpy.savetxt to write an array to a file. How can I write multiple arrays to the same file?
Essentially I want to do math to a column of numbers, and then replace the old column with the modified numbers. I read the easiest way to do this is to write a new file completely, put the modified numbers in, and just 'copy and paste' the other numbers in the file.
Any help is appreciated.
Thanks!

Answering a very old post for my own use. I've used the following to write out two 1D arrays of same size as CSV.
import numpy as np
x = [1, 2, 3]
y = [4, 5, 6]
# zip() returns an iterator in Python 3, so materialize it as a list
zipped = list(zip(x, y))
# >>> [(1, 4), (2, 5), (3, 6)]
# Write the pairs to the file, one "x,y" row per line
np.savetxt('z.csv', zipped, fmt='%i,%i')

If you want to write multiple arrays to a file for later use, look into numpy.savez.
However, from your description, it sounds like you want to do something with a particular column of a delimited text file.
In that case, just load the entire file and operate only on the column you need.
E.g.
import numpy as np
data = np.loadtxt('test.txt')
# Multiply the 4th column by 5
data[:,3] *= 5
# Do something more complicated to the 2nd column
data[:,1] = np.cos(data[:,1])
# Save the array back to the file
np.savetxt('test.txt', data)
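For the numpy.savez route mentioned at the start of this answer, a minimal sketch might look like the following. The array contents, the keyword names x and y, and the temporary file location are assumptions chosen for illustration only.

```python
import os
import tempfile

import numpy as np

# Hypothetical example arrays; any arrays can be stored together.
x = np.array([1, 2, 3])
y = np.array([0.45, 0.98, 0.89])

# Write both arrays into a single .npz archive under the keys "x" and "y".
path = os.path.join(tempfile.mkdtemp(), "arrays.npz")
np.savez(path, x=x, y=y)

# Load them back by key; the context manager closes the archive afterwards.
with np.load(path) as archive:
    x_loaded = archive["x"]
    y_loaded = archive["y"]
```

Unlike savetxt, the .npz archive is binary, so it suits later reuse from Python rather than editing by hand.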

import numpy
list1 = [1, 2, 3, 4]
list2 = [0.45, 0.98, 0.89, 0.21]
dat = numpy.array([list1, list2])
dat = dat.T
numpy.savetxt('data.txt', dat, delimiter = ',')
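A variation on the same idea, sketched under the assumption that you want an integer first column, a header row, and two decimal places in the second column. The names ids, scores, and the output path are illustrative, not from the original post.

```python
import os
import tempfile

import numpy as np

ids = np.array([1, 2, 3, 4])
scores = np.array([0.45, 0.98, 0.89, 0.21])

# column_stack places the 1-D arrays side by side as columns of a 2-D array.
table = np.column_stack((ids, scores))

path = os.path.join(tempfile.mkdtemp(), "data.txt")
# One format per column; comments="" keeps the header free of a leading "# ".
np.savetxt(path, table, delimiter=",", fmt=["%d", "%.2f"],
           header="id,score", comments="")

with open(path) as fh:
    lines = fh.read().splitlines()
```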

Related

Converting numpy ndarray into pandas dataframe with column names and types

Edit: As explained below in @floydian's comment, the problem was that calling a = np.array(a, dtype=d) creates a double array (each scalar is copied into both fields), which was causing the problem.
I am aware that this has already been asked multiple times, and in fact I am looking at the answer to "Creating a Pandas DataFrame with a numpy array containing multiple types" right now. But I still seem to have a problem with the conversion. It must be something very simple that I am missing. I hope someone can be so kind as to point it out. Sample code below:
import numpy as np
import pandas as pd
a = np.array([[1, 2], [3, 4]])
d = [('x','float'), ('y','int')]
a = np.array(a, dtype=d)
# Try 1
df= pd.DataFrame(a)
# Result - ValueError: If using all scalar values, you must pass an index
# Try 2
i = [1,2]
df= pd.DataFrame(a, index=i)
# Result - Exception: Data must be 1-dimensional
I would define the array like this:
a = np.array([(1, 2), (3, 4)], dtype=[('x','float'), ('y', 'int')])
pd.DataFrame(a)
gets what you want.
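As a small self-contained check of this approach, the sketch below builds the structured array from a list of tuples (one tuple per row) and inspects the resulting columns. The variable name records is just for illustration.

```python
import numpy as np
import pandas as pd

# One tuple per row; the dtype names become the DataFrame column names.
records = np.array([(1, 2), (3, 4)], dtype=[("x", "float"), ("y", "int")])

df = pd.DataFrame(records)
column_names = list(df.columns)
x_values = df["x"].tolist()
y_values = df["y"].tolist()
```

The key difference from the question's code is that the rows are tuples, so numpy treats each one as a single record instead of broadcasting every scalar into both fields.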
One option to separate it after the fact could be e.g.
pd.DataFrame(a.astype("float32").T, columns=a.dtype.names).astype({k: v[0] for k, v in a.dtype.fields.items()})
Out[296]:
     x  y
0  1.0  3
1  2.0  4

Convert String to Variable List

I have a string:
str='ABCDEFG'
I also have numpy arrays defined:
A=numpy.array([1,2,3])
B=numpy.array([2,3,4])
Now I want to be able to convert the string into a numpy array with the rows defined by these variables:
str=[[1,2,3],[2,3,4],...]
These are very long strings and I would rather not loop through them with a find and replace type of operation.
List comprehension for the win:
In[18]: str='ABCDEFG'
In[19]: A=[1,2,3]
B=[2,3,4]
In[20]: [locals().get(x) for x in str if x in locals().keys()]
Out[20]: [[1, 2, 3], [2, 3, 4]]
You should use locals or globals depending on your scope.
I wouldn't recommend the approach you're proposing. Managing variables and variable names is really the job of the programmer, not the program.
It seems like what you're trying to do would be accomplished easier with a DataFrame (i.e. table) object. Each row of the dataframe would have an identifier (a character within 'ABCDEFG' in your case).
I'd recommend checking out the pandas library: http://pandas.pydata.org/. It fits your use case well with minimal code:
import pandas as pd
rownames = list('AB')
dataframe = pd.DataFrame([[1,2,3],[2,3,4]], index=rownames)
dataframe.loc['B'] # Returns [2, 3, 4] as a Series
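In the same spirit of keeping names as data rather than as variables, a plain dictionary is another option. This is only a sketch: the dictionary name arrays and the decision to skip letters without an entry are assumptions, not part of the original question.

```python
import numpy as np

# Map each letter to its row explicitly instead of relying on variable names.
arrays = {
    "A": np.array([1, 2, 3]),
    "B": np.array([2, 3, 4]),
}

letters = "ABCDEFG"

# Letters with no entry in the dictionary are skipped (an assumption here).
rows = np.array([arrays[ch] for ch in letters if ch in arrays])
```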

Python structured numpy array multiple sort

Hello all, I have a list of delimiter-separated strings:
lists=['1|Abra|23|43|0','2|Cadabra|15|18|0','3|Grabra|4|421|0','4|Lol|1|15|0']
I need to convert it to a numpy array, then sort it the way Excel does: first by column 3, then by column 2, and finally by the last column.
I've tried this:
def man():
    a = np.array(lists[0].split('|'))
    for line in lists:
        temp = np.array(line.split('|'),)
        a=np.concatenate((a, temp))
    a.sort(order=[0, 1])
man()
Of course no luck, because it is wrong! Unfortunately I'm not strong with numpy arrays. Can somebody help me, please? :(
This works perfectly for me, but here numpy builds the array from a file, so to make it work I have to write my list of strings to a file, then read it back and convert it to an array:
import numpy as np
# let numpy guess the type with dtype=None
my_data = np.genfromtxt('Selector/tmp.txt',delimiter='|', dtype=None, names=["Num", "Date", "Desc", "Rgh" ,"Prc", "Color", "Smb", "MType"])
my_data.sort(order=["Color","Prc", "Rgh"])
# print the sorted array
print(my_data)
How can I keep everything else as is, but change the conversion so that it builds the same array from the list instead of from a file?
There may be better solutions, but for a start I would sort the array once by each column in reverse order.
I assume you want to sort by column 3 and ties are resolved by column 2. Finally, remaining ties are resolved by the last column. Thus, you'd actually sort by the last column first, then by 2, then by 3.
Furthermore, you can easily convert the list to an array using a list comprehension.
import numpy as np
lists=['1|Abra|23|43|0','2|Cadabra|15|18|0','3|Grabra|4|421|0','4|Lol|1|15|0']
# convert to numpy array by splitting each row
a = np.array([l.split('|') for l in lists])
# specify columns to sort by, in order
sort_cols = [3, 2, -1]
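# sort by columns in reverse order.
# This only works correctly if the sorting algorithm is stable,
# so request kind='stable' (the default quicksort is not stable).
# The columns hold numeric strings; cast to int so '4' sorts before '15'.
for sc in sort_cols[::-1]:
    order = np.argsort(a[:, sc].astype(int), kind='stable')
    a = a[order]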
print(a)
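The repeated stable sort can also be expressed in one call with np.lexsort, which takes all the keys at once. The sketch below is one possible reading of the question: it assumes "column 3" and "column 2" are counted from 1 (the third field, then the name), and the helper names col3, col2, and last are illustrative.

```python
import numpy as np

lists = ['1|Abra|23|43|0', '2|Cadabra|15|18|0', '3|Grabra|4|421|0', '4|Lol|1|15|0']
rows = [line.split('|') for line in lists]

# Build one key array per sort column, converting numeric fields to int.
col3 = np.array([int(r[2]) for r in rows])
col2 = np.array([r[1] for r in rows])
last = np.array([int(r[4]) for r in rows])

# np.lexsort treats the LAST key in the tuple as the primary sort key,
# so the keys are listed from least to most significant.
order = np.lexsort((last, col2, col3))
sorted_rows = [rows[i] for i in order]
```

Listing the keys from least to most significant mirrors the "sort by the last column first" reasoning described above.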
You can use a list comprehension to split your strings and convert the integers to int. Then create your numpy array with a proper dtype and call np.sort(), passing the expected order:
>>> delimit_strs = ['1|Abra|23|43|0','2|Cadabra|15|18|0','3|Grabra|4|421|0','4|Lol|1|15|0']
>>> dtype = [('1st', int), ('2nd', '|S7'), ('3rd', int), ('4th', int), ('5th', int)]
>>>
>>> a = np.array([tuple([int(i) if i.isdigit() else i for i in sub.split('|')]) for sub in delimit_strs], dtype=dtype)
>>> np.sort(a, axis=0, order=['3rd','2nd', '5th'])
array([(4, 'Lol', 1, 15, 0), (3, 'Grabra', 4, 421, 0),
(2, 'Cadabra', 15, 18, 0), (1, 'Abra', 23, 43, 0)],
dtype=[('1st', '<i8'), ('2nd', 'S7'), ('3rd', '<i8'), ('4th', '<i8'), ('5th', '<i8')])
You can also do this in pure Python, which is faster for shorter data sets. You can simply use the sorted() function with a suitable key function.
from operator import itemgetter
sorted([[int(i) if i.isdigit() else i for i in sub.split('|')] for sub in delimit_strs], key=itemgetter(3, 2, 4))

Pandas DataFrame to List of Lists

It's easy to turn a list of lists into a pandas dataframe:
import pandas as pd
df = pd.DataFrame([[1,2,3],[3,4,5]])
But how do I turn df back into a list of lists?
lol = df.what_to_do_now?
print lol
# [[1,2,3],[3,4,5]]
You could access the underlying array and call its tolist method:
>>> df = pd.DataFrame([[1,2,3],[3,4,5]])
>>> lol = df.values.tolist()
>>> lol
[[1, 2, 3], [3, 4, 5]]
If the data has column and index labels that you want to preserve, there are a few options.
Example data:
>>> df = pd.DataFrame([[1,2,3],[3,4,5]],
...                   columns=('first', 'second', 'third'),
...                   index=('alpha', 'beta'))
>>> df
       first  second  third
alpha      1       2      3
beta       3       4      5
The tolist() method described in other answers is useful but yields only the core data - which may not be enough, depending on your needs.
>>> df.values.tolist()
[[1, 2, 3], [3, 4, 5]]
One approach is to convert the DataFrame to json using df.to_json() and then parse it again. This is cumbersome but does have some advantages, because the to_json() method has some useful options.
>>> df.to_json()
{
"first":{"alpha":1,"beta":3},
"second":{"alpha":2,"beta":4},"third":{"alpha":3,"beta":5}
}
>>> df.to_json(orient='split')
{
"columns":["first","second","third"],
"index":["alpha","beta"],
"data":[[1,2,3],[3,4,5]]
}
Cumbersome but may be useful.
The good news is that it's pretty straightforward to build lists for the columns and rows:
>>> columns = [df.index.name] + [i for i in df.columns]
>>> rows = [[i for i in row] for row in df.itertuples()]
This yields:
>>> print(f"columns: {columns}\nrows: {rows}")
columns: [None, 'first', 'second', 'third']
rows: [['alpha', 1, 2, 3], ['beta', 3, 4, 5]]
If the None as the name of the index is bothersome, rename it:
df = df.rename_axis('stage')
Then:
>>> columns = [df.index.name] + [i for i in df.columns]
>>> print(f"columns: {columns}\nrows: {rows}")
columns: ['stage', 'first', 'second', 'third']
rows: [['alpha', 1, 2, 3], ['beta', 3, 4, 5]]
I wanted to preserve the index, so I adapted the original answer to this solution:
list_df = df.reset_index().values.tolist()
Now you can paste it somewhere else (e.g. into a Stack Overflow question) and later recreate it:
df = pd.DataFrame(list_df, columns=['name1', ...])
df.set_index(['name1'], inplace=True)
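A complete round trip of this idea could look like the sketch below. The example frame, the index name stage, and the column names are assumptions chosen to match the earlier example data; substitute your own.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]],
                  columns=["first", "second", "third"],
                  index=["alpha", "beta"]).rename_axis("stage")

# reset_index() turns the index into an ordinary first column,
# so it is kept when the values are converted to nested lists.
list_df = df.reset_index().values.tolist()

# Rebuild the frame later from the nested lists and restore the index.
restored = pd.DataFrame(list_df, columns=["stage", "first", "second", "third"])
restored = restored.set_index("stage")
```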
I don't know if it will fit your needs, but you can also do:
>>> lol = df.values
>>> lol
array([[1, 2, 3],
[3, 4, 5]])
This is just a NumPy array (a numpy.ndarray), which lets you do all the usual numpy array things.
I had this problem: how do I get the headers of the df to be in row 0 for writing them to row 1 in the excel (using xlsxwriter)? None of the proposed solutions worked, but they pointed me in the right direction. I just needed one line more of code
# get csv data
df = pd.read_csv(filename)
# combine column headers and list of lists of values
lol = [df.columns.tolist()] + df.values.tolist()
Maybe something changed but this gave back a list of ndarrays which did what I needed.
list(df.values)
Not quite related to the issue, but another flavor with the same goal:
converting a DataFrame column (Series) into a list of lists to plot a chart using create_distplot in Plotly
hist_data=[]
hist_data.append(map_data['Population'].to_numpy().tolist())
"df.values" returns a numpy array. This does not preserve the data types. An integer might be converted to a float.
df.iterrows() returns a series which also does not guarantee to preserve the data types. See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html
The code below converts to a list of list and preserves the data types:
rows = [list(row) for row in df.itertuples()]
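The difference can be seen on a small frame that mixes an integer column with a float column. This is a sketch with made-up column names (count, price); index=False is passed here only to leave the index out of each row.

```python
import pandas as pd

df = pd.DataFrame({"count": [1, 2], "price": [0.5, 1.5]})

# df.values needs one common dtype, so the integers are upcast to float.
via_values = df.values.tolist()

# itertuples() reads each column separately, so the integers stay integers.
via_tuples = [list(row) for row in df.itertuples(index=False)]
```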
If you wish to convert a Pandas DataFrame to a table (list of lists) and include the header column this should work:
import pandas as pd
def dfToTable(df: pd.DataFrame) -> list:
    return [list(df.columns)] + df.values.tolist()
Usage (in REPL):
>>> df = pd.DataFrame(
...     [["r1c1","r1c2","r1c3"],["r2c1","r2c2","r3c3"]]
...     , columns=["c1", "c2", "c3"])
>>> df
     c1    c2    c3
0  r1c1  r1c2  r1c3
1  r2c1  r2c2  r3c3
>>> dfToTable(df)
[['c1', 'c2', 'c3'], ['r1c1', 'r1c2', 'r1c3'], ['r2c1', 'r2c2', 'r3c3']]
The solutions presented so far suffer from a "reinventing the wheel" approach. Quoting @AMC:
If you're new to the library, consider double-checking whether the functionality you need is already offered by those Pandas objects.
If you convert a dataframe to a list of lists you will lose information - namely the index and columns names.
My solution: use to_dict()
dict_of_lists = df.to_dict(orient='split')
This will give you a dictionary with three lists: index, columns, data. If you decide you really don't need the columns and index names, you get the data with
dict_of_lists['data']
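To make the three parts concrete, here is a small sketch using an assumed example frame. It also shows that the parts are enough to rebuild the frame.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]],
                  columns=["first", "second", "third"],
                  index=["alpha", "beta"])

# orient='split' returns a dict with the keys 'index', 'columns' and 'data'.
parts = df.to_dict(orient="split")

# The three lists carry everything needed to reconstruct the frame.
rebuilt = pd.DataFrame(parts["data"],
                       index=parts["index"],
                       columns=parts["columns"])
```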
We can use the DataFrame.iterrows() function to iterate over each of the rows of the given Dataframe and construct a list out of the data of each row:
# Empty list
row_list = []
# Iterate over each row
for index, rows in df.iterrows():
    # Create list for the current row
    my_list = [rows.Date, rows.Event, rows.Cost]
    # append the list to the final list
    row_list.append(my_list)
# Print
print(row_list)
We can successfully extract each row of the given data frame into a list
This is very simple:
import numpy as np
# np.array(df) alone gives a 2-D ndarray; tolist() turns it into nested lists
list_of_lists = np.array(df).tolist()
Note: I have seen many cases on Stack Overflow where converting a Pandas Series or DataFrame to a NumPy array or plain Python lists is entirely unnecessary. If you're new to the library, consider double-checking whether the functionality you need is already offered by those Pandas objects.
To quote a comment by @jpp:
In practice, there's often no need to convert the NumPy array into a list of lists.
If a Pandas DataFrame/Series won't work, you can use the built-in DataFrame.to_numpy and Series.to_numpy methods.
A function I wrote that allows including the index column or the header row:
def df_to_list_of_lists(df, index=False, header=False):
    rows = []
    if header:
        rows.append(([df.index.name] if index else []) + [e for e in df.columns])
    for row in df.itertuples():
        rows.append([e for e in row] if index else [e for e in row][1:])
    return rows

Referencing columns by assigned name in numpy array

I am trying to create column names for easy reference, That way I can just call the name from the rest of the program instead of having to know which column is where in terms of placement. The from_ column array is coming up empty. I am new to numpy so I am just wondering how this is done. Changing of data type for columns 5 and 6 was successful though.
def array_setter():
    import os
    import glob
    import numpy as np
    os.chdir\
    ('C:\Users\U2970\Documents\Arcgis\Text_files\Data_exports\North_data_folder')
    for file in glob.glob('*.TXT'):
        reader = open(file)
        headerLine = reader.readlines()
        for col in headerLine:
            valueList = col.split(",")
            data = np.array([valueList])
            from_ = np.array(data[1:,[5]],dtype=np.float32)
            # trying to assign a name to columns for easy reference
            to = np.array(data[1:,[6]],dtype=np.float32)
            if data[:,[1]] == 'C005706N':
                if data[:,[from_] < 1.0]:
                    print data[:,[from_]]
array_setter()
If you want to index array columns by name, I would recommend turning the array into a pandas DataFrame. For example,
import pandas as pd
import numpy as np
arr = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(arr, columns=['f', 's'])
print(df['f'])
The nice part of this approach is that the arrays still maintain all their structure but you also get all the optimized indexing/slicing/etc. capabilities of pandas. For example, if you wanted to find elements of 'f' that corresponded to elements of 's' being equal to some value a, then you could use loc
a = 2
print(df.loc[df['s'] == a, 'f'])
Check out the pandas docs for different ways to use the DataFrame object. Or you could read the book by Wes McKinney (pandas creator), Python for Data Analysis. Even though it was written for an older version of pandas, it's a great starting point and will set you in the right direction.
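If you would rather stay within numpy, a structured array also allows columns to be referenced by name. The sketch below is only an illustration: the field names route, from_, and to, and the two sample rows, are assumptions loosely based on the question and not its real data.

```python
import numpy as np

# Hypothetical rows: an identifier plus two numeric columns.
data = np.array(
    [("C005706N", 0.5, 2.0), ("C005707N", 1.5, 3.0)],
    dtype=[("route", "U8"), ("from_", "f4"), ("to", "f4")],
)

# Fields are accessed by name, and boolean masks combine as usual.
mask = (data["route"] == "C005706N") & (data["from_"] < 1.0)
selected = data[mask]
```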
