python - How can I transform a dataframe in pandas without losing my index?


I need to winsorize 2 columns in a dataframe of 12 columns.

Say I have columns 'a', 'b', 'c', and 'd', each with a series of values. Given that I cleaned out the NaNs, the number of columns was reduced from 100 to 80, but they are still indexed to 100 with gaps (e.g. row 5 is missing).

I want to transform only columns 'a' and 'b' via the winsorize method. To do this, I must convert my columns to an np.array.

    import pandas as pd
    import scipy.stats

    df[['a', 'b', 'c', 'd']] = ...  # some values per each column
    ab_df = df[['a', 'b']]
    x = scipy.stats.mstats.winsorize(ab_df.values, limits=0.01)
    new_ab_df = pd.DataFrame(x, columns=['a', 'b'])
    df = pd.concat([df[['c', 'd']], new_ab_df], axis=1, join='inner', join_axes=[df.index])

When I convert to an np.array and then back to a pd.DataFrame, its len() is correct at 80, but the index has been reset to 0->80. How can I ensure that the transformed 'a' and 'b' columns are indexed correctly? I don't think I can use apply(), which would preserve the index order and just swap out the values. My current approach instead creates a transformed copy of my df with just 2 columns, then concats it to the rest of my non-transformed columns.

You can do this in place on your original dataframe.

From the description in your question, it sounds like you are confusing rows and columns (i.e. first you say the dataframe has 12 columns, and then you say the number of columns was reduced from 100 to 80).

It is always best to provide a minimal example of your data in your question. Lacking this, here is some data based on my assumptions:
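As a minimal sketch of why the index was lost in the question and two ways to keep it: `.values` hands winsorize a bare array, so the result carries no index unless one is supplied again. The data, the `limits=0.25` value, and the `axis=0` choice below are assumptions made purely for illustration; they are not taken from the question.

```python
import numpy as np
import pandas as pd
from scipy.stats import mstats

# Hypothetical data with a gapped index (row 5 is missing), as described in the question.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 40.0], "b": [10.0, 20.0, 30.0, 400.0], "c": [0, 1, 2, 3]},
    index=[3, 4, 6, 7],
)

# winsorize returns a masked array with no index attached.
# axis=0 (an assumption here) clips each column on its own.
x = mstats.winsorize(df[["a", "b"]].values, limits=0.25, axis=0)

# Option 1: rebuild a frame, passing the original index explicitly.
new_ab_df = pd.DataFrame(np.asarray(x), columns=["a", "b"], index=df.index)

# Option 2: assign straight back into the original columns; the index is untouched.
df[["a", "b"]] = np.asarray(x)
```

Either way, no concat is needed, and column 'c' is never copied or realigned.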

    import numpy as np
    import scipy.stats
    import pandas as pd

    np.random.seed(0)
    df = pd.DataFrame(np.random.randn(7, 5), columns=list('abcde'))
    df.iat[1, 0] = np.nan
    df.iat[3, 1] = np.nan
    df.iat[5, 2] = np.nan

    >>> df
              a         b         c         d         e
    0  1.764052  0.400157  0.978738  2.240893  1.867558
    1       NaN  0.950088 -0.151357 -0.103219  0.410599
    2  0.144044  1.454274  0.761038  0.121675  0.443863
    3  0.333674       NaN -0.205158  0.313068 -0.854096
    4 -2.552990  0.653619  0.864436 -0.742165  2.269755
    5 -1.454366  0.045759       NaN  1.532779  1.469359
    6  0.154947  0.378163 -0.887786 -1.980796 -0.347912

My assumption is that you want to drop any row containing a NaN, and then winsorize.

    mask = df.notnull().all(axis=1), ['a', 'b']
    df.loc[mask] = scipy.stats.mstats.winsorize(df.loc[mask].values, limits=0.4)

I applied a high limit to the winsorize function so that the results are more obvious on this small dataset.

    >>> df
              a         b         c         d         e
    0  0.400157  0.400157  0.978738  2.240893  1.867558
    1       NaN  0.950088 -0.151357 -0.103219  0.410599
    2  0.378163  0.400157  0.761038  0.121675  0.443863
    3  0.333674       NaN -0.205158  0.313068 -0.854096
    4  0.378163  0.400157  0.864436 -0.742165  2.269755
    5 -1.454366  0.045759       NaN  1.532779  1.469359
    6  0.378163  0.378163 -0.887786 -1.980796 -0.347912
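Note that in the output above, column 'a' ends up holding 0.400157, a value that originally belonged to column 'b'. With no `axis` argument, winsorize trims the selected values as one pooled set rather than column by column. If each column should instead be clipped against its own values, something along these lines may be closer to the intent. This is only a sketch: it reuses the sample data from above, and the `axis=0` choice and the names `rows`, `cols`, and `original` are my own additions.

```python
import numpy as np
import pandas as pd
from scipy.stats import mstats

np.random.seed(0)
df = pd.DataFrame(np.random.randn(7, 5), columns=list("abcde"))
df.iat[1, 0] = np.nan
df.iat[3, 1] = np.nan
df.iat[5, 2] = np.nan
original = df.copy()  # kept only so the result can be compared afterwards

rows = df.notnull().all(axis=1)  # rows without any NaN
cols = ["a", "b"]                # columns to winsorize

# axis=0 winsorizes each column separately instead of pooling 'a' and 'b'.
clipped = mstats.winsorize(df.loc[rows, cols].values, limits=0.4, axis=0)

# Write the plain values back through .loc so the index and other columns stay as they were.
df.loc[rows, cols] = np.asarray(clipped)
```

Rows that contain a NaN and columns 'c' through 'e' are left exactly as they were, and the index is never rebuilt.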
