Hi all, I need to rotate a two-dimensional array as shown in the given picture. If we rotate one set of the array, the change should be reflected for all of them. If you find a solution, please help me solve the issue.
input:
output:
Thank you
I have tried slicing to rotate the values, but it doesn't give the correct values:
import pandas as pd
df = pd.read_csv("/content/pipe2.csv")
df1= df.iloc[6:10]+df.iloc[13:20]
df1
You can use numpy.roll and the DataFrame constructor:
import numpy as np
N = -2  # a negative shift rotates the columns to the left
out = pd.DataFrame(np.roll(df, N, axis=1),
                   columns=df.columns, index=df.index)
Example output:
0 1 2 3 4 5 6
0 3 4 5 6 7 1 2
Used input:
0 1 2 3 4 5 6
0 1 2 3 4 5 6 7
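The same call rotates every row at once, which seems to be what the question asks for. A minimal sketch on a made-up two-row frame (the values are illustrative only, not the asker's data):

```python
import numpy as np
import pandas as pd

# hypothetical two-row frame, only for illustration
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7],
                   [10, 20, 30, 40, 50, 60, 70]])

N = -2  # negative: rotate left by two columns; positive would rotate right
out = pd.DataFrame(np.roll(df, N, axis=1),
                   columns=df.columns, index=df.index)
print(out)
```

With axis=1 each row is rotated independently, so the column labels stay in place while the values move.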
Use this:
import pandas as pd
df = pd.read_csv("/content/pipe2.csv")
df1=pd.DataFrame(data=df)
df1_transposed = df1.transpose()
df1_transposed
Related
So I have a DataFrame with (amongst others) four columns with numerical values. I want to add a column to the DataFrame that holds the maximum of two sums, each obtained by adding two columns.
My solution so far is
from pandas import DataFrame
df = DataFrame(data={'text': ['a','b','c'], 'a':[1,2,3],'b':[2,3,4],'c':[5,4,2],'d':[-2,4,1]})
df['sum1'] = df['a'].add(df['b'])
df['sum2'] = df['c'].add(df['d'])
df['maxsum'] = df[['sum1','sum2']].max(axis=1)
which gives the desired result.
I am pretty sure there is a more concise way to do this...
There is nothing wrong with your approach. In fact, it is the approach I would take, if for no other reason than that it is easy to read and to see what you are doing. But if you are looking for another solution, here is one using numpy.ufunc.reduceat:
import pandas as pd
import numpy as np
# sample frame
df = pd.DataFrame(data={'text': ['a','b','c'], 'a':[1,2,3],'b':[2,3,4],'c':[5,4,2],'d':[-2,4,1]})
# we skip the first column and convert to an array - df[df.columns[1:]].values
# we specify the indices to slice - np.arange(len(df.columns[1:]))[::2]
# then find the max
df['max'] = np.max(np.add.reduceat(df[df.columns[1:]].values,
                                   np.arange(len(df.columns[1:]))[::2],
                                   axis=1),
                   axis=1)
text a b c d max
0 a 1 2 5 -2 3
1 b 2 3 4 4 8
2 c 3 4 2 1 7
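To see what reduceat does in that expression, here is a small sketch on the numeric part of the sample frame alone. The names values and pair_sums are only illustrative:

```python
import numpy as np

# the numeric columns a, b, c, d of the sample frame
values = np.array([[1, 2, 5, -2],
                   [2, 3, 4, 4],
                   [3, 4, 2, 1]])

# start indices 0 and 2 split each row into the slices [0:2] and [2:4],
# and np.add sums each slice
pair_sums = np.add.reduceat(values, [0, 2], axis=1)
print(pair_sums)              # one column per pair: a+b and c+d
print(pair_sums.max(axis=1))  # row-wise maximum of the two sums
```

The index list [0, 2] is what np.arange(4)[::2] produces for four columns, so the approach extends to more column pairs without changes.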
Not that it is much more concise, but instead of your current approach you can do everything in a single assign call:
df = df.assign(sum1=df[['a', 'b']].sum(1), sum2=df[['c', 'd']].sum(1),
               maxsum=lambda df: df[['sum1','sum2']].max(1))
text a b c d sum1 sum2 maxsum
0 a 1 2 5 -2 3 3 3
1 b 2 3 4 4 5 8 8
2 c 3 4 2 1 7 3 7
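If the intermediate sum columns are not needed, one possible shortcut is the element-wise np.maximum. This is a sketch of an alternative, not something proposed in the answers above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'text': ['a', 'b', 'c'], 'a': [1, 2, 3], 'b': [2, 3, 4],
                        'c': [5, 4, 2], 'd': [-2, 4, 1]})

# element-wise maximum of the two pairwise sums, without helper columns
df['maxsum'] = np.maximum(df['a'] + df['b'], df['c'] + df['d'])
print(df)
```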
I have a data frame that looks like below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'id':[1,2,3],'class':[7,5,3], 'grades':[[4,5,6],[8],[]]})
What I want to do is duplicate each row once for every element of the list in the "grades" column.
It is kind of hard to explain, so it will be better to show the desired output.
output = pd.DataFrame({'id':[1,1,1,2,3],'class':[7,7,7,5,3], 'grades':[4,5,6,8, np.nan]})
I have looked through some solutions, but could not figure out a way.
It would be great if someone could provide some guidance.
Use explode:
df.explode('grades')
Out[11]:
id class grades
0 1 7 4
0 1 7 5
0 1 7 6
1 2 5 8
2 3 3 NaN
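Note that explode repeats the original index labels (0, 0, 0, 1, 2). To get the fresh 0..4 index of the desired output, one option is to reset the index afterwards. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'class': [7, 5, 3], 'grades': [[4, 5, 6], [8], []]})

# explode the lists, then drop the repeated index labels
out = df.explode('grades').reset_index(drop=True)
print(out)
```

The empty list in the last row becomes a single NaN, so that row is kept rather than dropped.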
I would like some help "translating" a multi-dimensional list into a single column of a pandas DataFrame.
I found help here on translating a multi-dimensional list into a frame with multiple columns, but I need to translate the data into one column.
Suppose I have the following list of list
x=[[1,2,3],[4,5,6]]
If I create a frame I get
frame=pd.DataFrame(x)
0 1 2
1 2 3
4 5 6
But my desired outcome is
0
1
2
3
4
5
6
with the zero as column header.
I can of course get the result with a for loop, but from my point of view that takes too much time. Is there a pythonic/pandas way to get it?
Thanks for helping me
You can use np.concatenate
import numpy as np
import pandas as pd
x=[[1,2,3],[4,5,6]]
frame=pd.DataFrame(np.concatenate(x))
print(frame)
Output:
0
0 1
1 2
2 3
3 4
4 5
5 6
First flatten the nested lists, then pass the result to the DataFrame constructor:
df = pd.DataFrame([z for y in x for z in y])
Or:
from itertools import chain
df = pd.DataFrame(list(chain.from_iterable(x)))
print (df)
0
0 1
1 2
2 3
3 4
4 5
5 6
If you use numpy you can utilize the method ravel():
pd.DataFrame(np.array(x).ravel())
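One caveat, assuming the inner lists may differ in length: np.array(x).ravel() needs inner lists of equal length to build a regular 2-D array, while the chain-based flattening does not. A sketch with a hypothetical ragged input:

```python
from itertools import chain

import pandas as pd

x = [[1, 2, 3], [4, 5]]  # hypothetical ragged input, not from the question

# chain.from_iterable walks the inner lists one after another,
# so their lengths do not have to match
df = pd.DataFrame(list(chain.from_iterable(x)))
print(df)
```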
Like the title suggests, I've been working on a script that I can use to find duplicate lines in a CSV file based on multiple cells using pandas.
So far I have managed to write a script that will look at one cell, and output all the duplicates as a CSV file. However, I am now having trouble adding a second cell for the script to look at.
The code I currently have looks like this:
import pandas as pd
df = pd.read_csv('input.csv', sep=';', dtype=str)
names = df["FULLNAME"]
duplicates = df[names.isin(names[names.duplicated()])].sort_values("FULLNAME")
duplicates.to_csv('DUPLICATE_OUTPUT.csv')
Any help would be greatly appreciated!
Thanks!
Instead of using pd.Series's duplicated, you can use pd.DataFrame's duplicated, which takes the subset argument.
See the pandas documentation for DataFrame.duplicated.
Example:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'x': [1,2,3,4,1,2,3,4], 'y': [1,2,3,4,1,2,5,6], 'random': [1,3,143,15,1,3,2,1]})
In [3]: df
Out[3]:
random x y
0 1 1 1
1 3 2 2
2 143 3 3
3 15 4 4
4 1 1 1
5 3 2 2
6 2 3 5
7 1 4 6
In [4]: df[df.duplicated(['x', 'y'])]
Out[4]:
random x y
4 1 1 1
5 3 2 2
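By default duplicated marks only the later occurrences (keep='first'), which is why rows 0 and 1 are missing above. Since the question outputs every row that has a duplicate, passing keep=False may be closer to what is wanted. A sketch on the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 1, 2, 3, 4],
                   'y': [1, 2, 3, 4, 1, 2, 5, 6],
                   'random': [1, 3, 143, 15, 1, 3, 2, 1]})

# keep=False flags every member of a duplicate group, not just the repeats
all_dupes = df[df.duplicated(subset=['x', 'y'], keep=False)]
print(all_dupes)
```

The result can then be sorted and written out with to_csv, as in the original script.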
When I try to convert tuples to a pandas DataFrame, I get the following error:
DataFrame constructor not properly called!
I am using the following code
columnlist=["Timestamp","Price","Month","Day","DayofWeek","tDaysleftMonth","tDayinMonth","tDayinWeek"]
tickerData=pd.DataFrame(tickerDataRaw,columns=columnlist)
The data was loaded into tuples from a MySQL database.
Please find a screenshot of the data I am trying to convert.
I think you can use DataFrame.from_records after converting the tuple of tuples to a list:
import pandas as pd
tuples = ((1,2,3),(4,6,7),(7,3,6),(8,2,7),(4,6,3),(7,3,6))
columnlist = ['a','b','c']
df = pd.DataFrame.from_records(list(tuples), columns=columnlist)
print (df)
a b c
0 1 2 3
1 4 6 7
2 7 3 6
3 8 2 7
4 4 6 3
5 7 3 6
Another solution with DataFrame constructor only:
import pandas as pd
tuples = ((1,2,3),(4,6,7),(7,3,6),(8,2,7),(4,6,3),(7,3,6))
columnlist = ['a','b','c']
df = pd.DataFrame(list(tuples), columns=columnlist)
print (df)
a b c
0 1 2 3
1 4 6 7
2 7 3 6
3 8 2 7
4 4 6 3
5 7 3 6
EDIT:
If you check the DataFrame documentation for the data parameter:
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
Dict can contain Series, arrays, constants, or list-like objects
According to the DataFrame documentation page, data is required to be
numpy ndarray (structured or homogeneous), dict, or DataFrame
The easiest way to resolve your problem is simply to load your data into a numpy array, and it should work fine.
>>> tuples = ((1,2,3),(1,2,3),(1,2,3))
>>> columns = ["A", "B", "C"]
>>> pd.DataFrame(tuples, columns=columns)
PandasError: DataFrame constructor not properly called!
>>> pd.DataFrame(np.array(tuples), columns=columns)
A B C
0 1 2 3
1 1 2 3
2 1 2 3
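One thing to keep in mind with the numpy route, assuming the rows mix strings and numbers as the column names in the question suggest: np.array casts all values to a common type first, whereas a list of tuples lets pandas infer a type per column. A sketch with made-up rows:

```python
import numpy as np
import pandas as pd

# hypothetical rows mixing strings and numbers
rows = (("2020-01-01", 1.5, 1), ("2020-01-02", 2.5, 2))
columns = ["Timestamp", "Price", "Month"]

df_list = pd.DataFrame(list(rows), columns=columns)    # types inferred per column
df_np = pd.DataFrame(np.array(rows), columns=columns)  # numpy turns every value into a string

print(df_list.dtypes)
print(df_np.dtypes)
```

For homogeneous numeric tuples, as in the example above, both routes give the same frame.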