python 2.7 - Memory usage during and after drop_duplicates()




I am working with a data frame that takes up 2 GB of memory (according to htop) and has dimensions (6287475, 19). The data frame is heterogeneous in data type, if that matters. After loading the data frame, I drop duplicate rows using the command:

df.drop_duplicates(inplace=True)

During execution of the command, memory usage jumps to 7 GB. After the command has completed, memory usage drops to 5 GB, which is more than twice the memory required to store a single instance of the data frame. If I delete the data frame with del df, memory usage decreases to 3 GB.

The behavior is the same if I do the following:

df2 = df.drop_duplicates()
del df
del df2

Running gc.collect() does nothing, and memory usage only returns to the baseline level after terminating the Python session. Is this a memory leak? Has anyone seen similar behavior?
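For anyone who wants to reproduce the numbers from inside the session rather than reading them off htop, here is a minimal sketch. It assumes Linux (it reads VmRSS from /proc/self/status and treats ru_maxrss as kilobytes). The helper names rss_mb, peak_rss_mb, log_rss and trace_drop_duplicates are made up for this illustration, and make_frame is a hypothetical callable that builds the data frame, so that no outside reference keeps it alive after del.

```python
from __future__ import print_function

import gc
import resource


def rss_mb():
    """Current resident set size of this process in MB (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # the kernel reports kB
    raise RuntimeError("VmRSS not found in /proc/self/status")


def peak_rss_mb():
    """Highest RSS reached so far in MB (ru_maxrss is in kB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def log_rss(label, log):
    """Record and print the current RSS under the given label."""
    value = rss_mb()
    log.append((label, value))
    print("%-28s %8.1f MB" % (label, value))
    return value


def trace_drop_duplicates(make_frame):
    """Log RSS at each step of the sequence described in the question.

    make_frame is a hypothetical zero-argument callable returning the frame.
    """
    log = []
    log_rss("baseline", log)
    df = make_frame()
    log_rss("after load", log)
    df2 = df.drop_duplicates()
    log_rss("after drop_duplicates()", log)
    del df
    del df2
    log_rss("after del df, df2", log)
    gc.collect()
    log_rss("after gc.collect()", log)
    # The snapshots above miss the spike inside drop_duplicates();
    # the process-wide high-water mark is the closest stand-in for it.
    print("%-28s %8.1f MB" % ("peak so far", peak_rss_mb()))
    return log
```

It could be called as trace_drop_duplicates(lambda: pd.read_csv("data.csv")), where data.csv is a placeholder for however the frame is actually loaded. Note that the peak value covers the whole lifetime of the process, so it only isolates the drop_duplicates() spike if nothing earlier in the session used more memory.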

Environment:

64-bit Linux
Python 2.7.7 (64-bit)
pandas 0.14.1
numpy 1.8.2
IPython 2.2.0 (the behavior is the same in plain CPython)
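To compare against another setup, the version list above can be collected programmatically. This is only a sketch: environment_report is a hypothetical helper name, and it assumes each package exposes a __version__ attribute (it falls back to "unknown" if not).

```python
from __future__ import print_function

import importlib
import platform
import sys


def environment_report(packages=("pandas", "numpy", "IPython")):
    """Return one descriptive line per component of the environment."""
    bits = 64 if sys.maxsize > 2 ** 32 else 32
    lines = [
        "%s %s" % (platform.system(), platform.machine()),
        "%s %s (%d-bit)" % (
            platform.python_implementation(),
            platform.python_version(),
            bits,
        ),
    ]
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            lines.append("%s not installed" % name)
        else:
            version = getattr(module, "__version__", "unknown")
            lines.append("%s %s" % (name, version))
    return lines
```

Running print("\n".join(environment_report())) in the affected session prints one line per component.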

python-2.7 pandas
