python - How can I transform a dataframe in pandas without losing my index?
I need to winsorize 2 columns in my dataframe of 12 columns.
Say I have columns 'a', 'b', 'c', and 'd', each a series of values. Given that I cleaned the NaN columns, the number of columns was reduced from 100 to 80, but they are still indexed up to 100 with gaps (e.g. row 5 is missing).
I want to transform only columns 'a' and 'b' via the winsorize method. To do this, I must convert my columns to an np.array.
    import pandas as pd
    import scipy.stats

    # df has columns 'a', 'b', 'c', 'd' (some values in each column)
    ab_df = df[['a', 'b']]
    x = scipy.stats.mstats.winsorize(ab_df.values, limits=0.01)
    new_ab_df = pd.DataFrame(x, columns=['a', 'b'])  # the index is reset here
    df = pd.concat([df[['c', 'd']], new_ab_df], axis=1, join='inner', join_axes=[df.index])
When I convert to an np.array and then back to a pd.DataFrame, its len() is correct at 80, but the indexes have been reset to 0->80. How can I ensure that my transformed 'a' and 'b' columns are indexed correctly? I don't think I can use apply(), which would preserve the index order and just swap out the values, instead of my approach, which creates a transformed copy of my df with 2 columns and then concats them to the rest of the non-transformed columns.
You can do this in place on the original dataframe.
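As an aside, the reset index in the question comes from building a new DataFrame out of the bare array, which carries no labels. A minimal sketch of keeping the labels while staying close to the question's approach follows. The data, the `limits` value and the name `result` are made up for illustration; the only point is the `index=` argument.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical data: the index has gaps, as it would after dropping rows.
df = pd.DataFrame(
    {"a": [5.0, 1.0, 9.0, 3.0, 7.0, 2.0],
     "b": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2],
     "c": [10, 20, 30, 40, 50, 60]},
    index=[0, 1, 3, 4, 6, 8],
)

ab_df = df[["a", "b"]]
x = winsorize(ab_df.values, limits=0.2)

# Passing index= keeps the original labels instead of a fresh 0..n-1 range.
new_ab_df = pd.DataFrame(np.asarray(x), columns=["a", "b"], index=ab_df.index)

# With matching labels, concat lines the rows up without extra join arguments.
result = pd.concat([new_ab_df, df[["c"]]], axis=1)
```

Because both pieces share the same labels, no rows are dropped or misaligned in the concat.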
From the description in your question, it sounds like you are confusing rows and columns (i.e. first your dataframe has 12 columns, and then the number of columns is reduced from 100 to 80).
It would be best to provide a minimal example of the data in your question. Lacking this, here is some data based on my assumptions:
    import numpy as np
    import scipy.stats
    import pandas as pd

    np.random.seed(0)
    df = pd.DataFrame(np.random.randn(7, 5), columns=list('abcde'))
    df.iat[1, 0] = np.nan
    df.iat[3, 1] = np.nan
    df.iat[5, 2] = np.nan

    >>> df
              a         b         c         d         e
    0  1.764052  0.400157  0.978738  2.240893  1.867558
    1       NaN  0.950088 -0.151357 -0.103219  0.410599
    2  0.144044  1.454274  0.761038  0.121675  0.443863
    3  0.333674       NaN -0.205158  0.313068 -0.854096
    4 -2.552990  0.653619  0.864436 -0.742165  2.269755
    5 -1.454366  0.045759       NaN  1.532779  1.469359
    6  0.154947  0.378163 -0.887786 -1.980796 -0.347912
My assumption is that you drop any row with a NaN, and then winsorize.
    # Rows with no NaN in any column, restricted to columns 'a' and 'b'
    mask = df.notnull().all(axis=1), ['a', 'b']
    df.loc[mask] = scipy.stats.mstats.winsorize(df.loc[mask].values, limits=0.4)
I applied a high limit to the winsorize function so that the results are more obvious on this small dataset.
    >>> df
              a         b         c         d         e
    0  0.400157  0.400157  0.978738  2.240893  1.867558
    1       NaN  0.950088 -0.151357 -0.103219  0.410599
    2  0.378163  0.400157  0.761038  0.121675  0.443863
    3  0.333674       NaN -0.205158  0.313068 -0.854096
    4  0.378163  0.400157  0.864436 -0.742165  2.269755
    5 -1.454366  0.045759       NaN  1.532779  1.469359
    6  0.378163  0.378163 -0.887786 -1.980796 -0.347912
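Note that winsorize is called above with its default `axis=None`, so the values of 'a' and 'b' are pooled and clipped as one sample. That is why both columns end up sharing the same limits (0.378163 and 0.400157). If each column should be winsorized on its own, one possible sketch is shown below. It assumes per-column clipping is what is wanted, which the question does not state, and the names `rows`, `original_index` and `clipped` are only illustrative.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# Same sample data as in the answer.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(7, 5), columns=list("abcde"))
df.iat[1, 0] = np.nan
df.iat[3, 1] = np.nan
df.iat[5, 2] = np.nan

# Rows without any NaN, as in the answer.
rows = df.notnull().all(axis=1)
original_index = df.index.copy()

# axis=0 winsorizes each column separately rather than pooling 'a' and 'b'.
clipped = winsorize(df.loc[rows, ["a", "b"]].to_numpy(), limits=0.4, axis=0)

# Assigning through .loc writes the values back in place, so the index is untouched.
df.loc[rows, ["a", "b"]] = np.asarray(clipped)
```

As in the answer, rows that contain a NaN are left exactly as they were, and the dataframe keeps its original index because nothing is rebuilt or concatenated.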