Print DataFrame to reuse in the creation of a new one - python

I recently heard about the pandas read_clipboard() function, and it has been super useful for quickly using DataFrames from SO questions. The problem is that the DataFrame must be the last thing copied. Is there a way to print a DataFrame in a form that can be used to hardcode a new DataFrame? I'll try to make this a bit clearer:
Say I find this DataFrame somewhere:
a b
0 1 4
1 2 5
2 3 6
So I can copy this DataFrame and then import it in my code like this
df = pd.read_clipboard()
But when I run this script later I have to make sure the DataFrame is the last thing I copied. What I'm looking for is a function (print_to_reuse()) that does something like this:
df.print_to_reuse()
out: {'a': [1, 2, 3], 'b': [4, 5, 6]}
Now I could copy this output and hardcode the definition of df as
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
In this way, it doesn't matter when I rerun my code and what is the last thing I copied.
I can think of a method that does the same but it seems like there should be an easier approach. I could export the copied DataFrame as a csv and then later on import this csv like this:
df = pd.read_clipboard()
df.to_csv("path")
And then
df = pd.read_csv("path")
use_df()
So basically, is there a way to do this that doesn't require making a new csv?
Thank you.

I think you are looking for:
df.to_dict("list")
Which will give:
{'a': [1, 2, 3], 'b': [4, 5, 6]}
You can then paste that into your script in place of read_clipboard(), so later runs no longer depend on the clipboard.
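As a minimal sketch of the round trip, assuming the small frame from the question (the variable names data and rebuilt are only illustrative):

```python
import pandas as pd

# Sample frame matching the one in the question.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Column-oriented dict: {column name: list of values}.
data = df.to_dict("list")
print(data)  # {'a': [1, 2, 3], 'b': [4, 5, 6]}

# Pasting the printed dict into a script rebuilds the same frame.
rebuilt = pd.DataFrame(data)
roundtrip_ok = rebuilt.equals(df)
```

Note that the "list" orientation drops the index, so this works best when the frame uses the default integer index.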

Related

Dictionary from columns' elements and indexes

I have a table like this:
Column A  Column B
a         [1, 2, 3]
b         [4, 1, 2]
And I want to create dictionary like this using NumPy:
{1: [a, b],
2: [a, b],
3: [a],
4: [b]}
Is there a reasonably simple way to do this?
Let us try with explode
d = df.explode('col2').groupby('col2')['col1'].agg(list).to_dict()
Out[206]: {1: ['a', 'b'], 2: ['a', 'b'], 3: ['a'], 4: ['b']}
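The one-liner above assumes a frame with columns named col1 and col2. A self-contained sketch under that assumption, split into steps so the intermediate frame is visible:

```python
import pandas as pd

# Assumed layout: labels in col1, lists of numbers in col2.
df = pd.DataFrame({'col1': ['a', 'b'], 'col2': [[1, 2, 3], [4, 1, 2]]})

# explode turns each list element into its own row, repeating col1.
exploded = df.explode('col2')

# Group by the number and collect the labels that contained it.
d = exploded.groupby('col2')['col1'].agg(list).to_dict()
print(d)
```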
As far as I know, NumPy doesn't support dictionaries; it works with arrays (NumPy arrays), as you can see here.
But there are many ways to create a dict from a pandas DataFrame. Looping over DataFrames directly is not good practice, as you can see in this answer, so we can use DataFrame.to_numpy as follows:
import pandas as pd
import numpy as np
d = {'col1': ['a', 'b'], 'col2': [[1,2,3], [4,1,2]]}
df = pd.DataFrame(data=d)
my_dict = {}
np_array=df.to_numpy()
for row in np_array:
    my_dict.update({row[0]: row[1]})
Output:
>my_dict: {'a': [1, 2, 3], 'b': [4, 1, 2]}
Which is different from the output you wanted, but I didn't see the pattern in it. Could you clarify further?
UPDATED
To achieve the output you want, one possible way is to iterate over each row then over the values in the list, like this:
# Start from an empty dict so keys from the earlier loop are not mixed in
my_dict = {}
for row in np_array:
    for item in row[1]:
        if item in my_dict:
            my_dict[item].append(row[0])
        else:
            my_dict[item] = [row[0]]
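The same inversion can be sketched without pandas or NumPy. This is only an alternative illustration, assuming the rows are available as (label, list) pairs; the names rows and inverted are hypothetical:

```python
from collections import defaultdict

# Assumed input: one (label, values) pair per table row.
rows = [('a', [1, 2, 3]), ('b', [4, 1, 2])]

# defaultdict(list) creates the empty list on first access,
# which removes the explicit "key already present?" branch.
inverted = defaultdict(list)
for label, values in rows:
    for value in values:
        inverted[value].append(label)

result = dict(inverted)
print(result)  # {1: ['a', 'b'], 2: ['a', 'b'], 3: ['a'], 4: ['b']}
```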

How to sort rows based on column combinations - python

Is there a way to sort a dataframe by a combination of different columns? That is, if specific columns match across rows, those rows should be clustered together. An example is below. Any help is greatly appreciated!
Original DataFrame
Transformed DataFrame
One way to sort a pandas dataframe is to use .sort_values().
The code below replicates your sample dataframe:
df = pd.DataFrame({'v1': [1, 3, 2, 1, 4, 3],
                   'v2': [2, 2, 4, 2, 3, 2],
                   'v3': [3, 3, 2, 3, 2, 3],
                   'v4': [4, 5, 1, 4, 2, 5]})
The code below sorts the dataframe by both columns v1 and v2. In this case, v2 is only used to break ties in v1.
df.sort_values(by=['v1', 'v2'], ascending=True)
The "by" parameter is not limited to any number of columns, so you can extend the list to include more columns in the desired order of priority.
This best matches the sort pattern shown in your image:
import pandas as pd
df = pd.DataFrame(dict(
v1=[1,3,2,1,4,3],
v2=[2,2,4,2,3,2],
v3=[3,3,2,3,2,3],
v4=[4,5,1,4,2,5],
))
# Make a temp column to sort the df by
df['sort'] = df.astype(str).values.sum(axis=1)
# Sort the df by that column, drop it and reset the index
df = df.sort_values(by='sort').drop(columns='sort').reset_index(drop=True)
print(df)
Edit: Zolzaya Luvsandorj's recommendation is better:
import pandas as pd
df = pd.DataFrame(dict(
v1=[1,3,2,1,4,3],
v2=[2,2,4,2,3,2],
v3=[3,3,2,3,2,3],
v4=[4,5,1,4,2,5],
))
df = df.sort_values(by=list(df.columns)).reset_index(drop=True)
print(df)

How to get the 'create' script from a pandas dataframe?

I have a pandas dataframe df. Let's say I wanted to share df with you here so that you can easily recreate it in your own notebook.
Is there a command or function that will generate the pandas dataframe create statement? I realize that for a lot of data the statement would be quite large seeing that it must include the actual data, so a header would be ideal.
Essentially, a command that I can run on df and get something like this:
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
or
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['a', 'b', 'c'])
I'm not sure how to even phrase this question. Like taking a dataframe and deconstructing the top 5 rows or something?
We usually use read_clipboard:
pd.read_clipboard()
Out[328]:
col1 col2
0 1 3
1 2 4
Or, if you already have the df, convert it to a dict so that others can easily turn it back into the sample they need:
df.head(5).to_dict()
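A short sketch of how that dict could be turned into a pasteable statement. The two-column frame is taken from the question; the variable names sample and statement are only illustrative:

```python
import pandas as pd

# Small frame standing in for the real df.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# The default orientation nests by column, then by index label,
# so the index survives the round trip.
sample = df.head(5).to_dict()
print(sample)  # {'col1': {0: 1, 1: 2}, 'col2': {0: 3, 1: 4}}

# The dict's repr is valid Python, so it can be wrapped in a constructor call.
statement = f"df = pd.DataFrame({sample!r})"
print(statement)

# Rebuilding from the dict gives back the same data.
rebuilt = pd.DataFrame(sample)
```

This relies on every value having a repr that is valid Python, which holds for plain numbers and strings but not necessarily for timestamps or missing values.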

python original list changes when try to modify a set of such lists [duplicate]

This question already has answers here:
How do I clone a list so that it doesn't change unexpectedly after assignment?
(24 answers)
Closed 5 years ago.
I create two dictionaries: d1 and d2 and put them in a list c.
d1 = {'col1': [1, 2], 'col2': [3, 4]}
d2 = {'col1': [3, 6], 'col2': [5, 6]}
c=[d1,d2]
When I change a value in list c:
c[0]["col1"][0]=3
c
[{'col1': [3, 2], 'col2': [3, 4]}, {'col1': [3, 6], 'col2': [5, 6]}]
Surprisingly, I find that the corresponding value in the original dictionary d1 has also changed:
d1
{'col1': [3, 2], 'col2': [3, 4]}
Can anyone explain this to me? Why does d1 change as well when I only try to modify values in list c?
So is it correct to say that whenever I modify an element through such a list, the original element (which could be a dictionary, a list, or even a dataframe) changes at the same time?
You asked Python to put your dictionaries of lists into a list called c. This does not copy the dictionaries or the lists inside them (sometimes this behaviour is what you want). So c[0]['col1'] is exactly the same list object as d1['col1'], and changing an element through one name also changes it through the other. If you want c to hold independent copies of all the data in d1 and d2, do:
import copy
d1 = {'col1': [1, 2], 'col2': [3, 4]}
d2 = {'col1': [3, 6], 'col2': [5, 6]}
c = copy.deepcopy([d1,d2])
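To make the difference visible, here is a small sketch that compares a plain list of the dictionaries with a deep copy. The names shared and independent are only illustrative:

```python
import copy

d1 = {'col1': [1, 2], 'col2': [3, 4]}
d2 = {'col1': [3, 6], 'col2': [5, 6]}

shared = [d1, d2]                      # stores references to d1 and d2
independent = copy.deepcopy([d1, d2])  # recursively copies dicts and lists

shared_is_same = shared[0] is d1            # True: same object
independent_is_same = independent[0] is d1  # False: a separate copy

# Changing the deep copy leaves d1 untouched.
independent[0]['col1'][0] = 99
print(d1['col1'])  # [1, 2]

# Changing through the shared list modifies d1 itself.
shared[0]['col1'][0] = 3
print(d1['col1'])  # [3, 2]
```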
This comes down to the mutability of lists and the fact that Python passes references to objects rather than copies.
You are not the first to be caught out by this; nearly every beginner comes across it at some point and gets confused. The reason this happens is that lists (like all other objects) are not copied when you assign them to a new name, store them in a container, or pass them to a function.
What I mean by this is that when you call a function such as the following, you aren't passing in a copy of l; you are passing in what is essentially its location in memory, a reference to where it is.
def f(l):
    l[0] = 9
So that explains what may seem to be confusing behaviour of the following snippet:
>>> ls = [0, 1, 2]
>>> f(ls)
>>> ls
[9, 1, 2]
So to apply this to your scenario, when you create the c list, you are only storing references to the dictionaries (which are mutable in the same way lists are). Thus, when you modify either the reference in the list, or the dictionary itself, you are always modifying the same section in memory, so the change is reflected in the other variable.
To give one final simple example to demonstrate the mutable nature of dictionaries:
>>> d = {3 : 4}
>>> dd = d
>>> dd[3] = 5
>>> dd
{3: 5}
>>> d
{3: 5}
Finally, this post has a great explanation of Python variables if you are interested in further reading.

set column of pandas.DataFrame object

Ideally, I want to be able to do something like:
cols = ['A', 'B', 'C']
df = pandas.DataFrame(index=range(5), columns=cols)
df.get_column(cols[0]) = [1, 2, 3, 4, 5]
What is the pythonic/pandonic way to do this?
Edit: I know that I can access the column 'A' by df.A, but in general I do not know what the column names are.
You do not need to store what columns a DataFrame has separately.
You can find out what columns exist in a pandas DataFrame by accessing the DataFrame.columns attribute.
To access the Series for a particular column, you can index the DataFrame with [] (its __getitem__ method); assigning with [] sets the column.
Tiny example:
col = df.columns[0]
df[col] = [1, 2, 3, 4, 5]
Okay, this is particularly straightforward.
df[cols[0]] = [1, 2, 3, 4, 5]
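Putting both answers together as a runnable sketch, assuming the empty frame from the question (the variable name first is only illustrative):

```python
import pandas as pd

cols = ['A', 'B', 'C']
df = pd.DataFrame(index=range(5), columns=cols)

# The column name lives in a variable, so attribute access (df.A) is not needed.
first = df.columns[0]
df[first] = [1, 2, 3, 4, 5]

# The same works with any name taken from your own list of columns.
df[cols[1]] = [10, 20, 30, 40, 50]
print(df)
```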
