python 2.7 - Memory usage during and after drop_duplicates()
I am working with a data frame that takes up 2 GB of memory (according to htop), with dimensions (6287475, 19). The data frame is heterogeneous in data type, but that should not matter. After loading the data frame, I drop duplicate rows using the command

    df.drop_duplicates(inplace=True)

During execution of this command, memory usage jumps to 7 GB. After the command has completed, memory usage drops to 5 GB, which is more than twice the memory required to store a single instance of the data frame. If I delete the data frame with del df, memory usage decreases to 3 GB.
The behavior is the same if I do the following:

    df2 = df.drop_duplicates()
    del df
    del df2

Running gc.collect() does nothing, and memory usage only returns to the baseline level after I terminate the Python session. Is this a memory leak? Has anyone seen similar behavior?
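In case it helps anyone reproduce this, the htop readings could be logged from inside the session with something like the sketch below. It assumes Linux, since it reads VmRSS from /proc/self/status. The names rss_mb, log_rss and run_experiment are hypothetical helpers, and the random integer frame is only a stand-in for the real data, so the absolute numbers will differ.

```python
import gc


def rss_mb():
    """Return the resident set size of this process in MB (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The kernel reports the value in kB, e.g. "VmRSS:  1234 kB".
                return int(line.split()[1]) / 1024.0
    raise RuntimeError("VmRSS not found in /proc/self/status")


def log_rss(label, log):
    """Force a garbage collection, then append (label, RSS in MB) to log."""
    gc.collect()
    log.append((label, rss_mb()))


def run_experiment(n_rows=1000000):
    """Log RSS around each step from the question (stand-in data, 19 columns)."""
    # Imported here so the helpers above work without pandas installed.
    import numpy as np
    import pandas as pd

    log = []
    log_rss("baseline", log)
    df = pd.DataFrame(np.random.randint(0, 1000, size=(n_rows, 19)))
    log_rss("after load", log)
    df.drop_duplicates(inplace=True)
    log_rss("after drop_duplicates", log)
    del df
    log_rss("after del df", log)
    return log
```

Calling run_experiment() returns a list of (label, MB) pairs that can be compared against what htop shows for the same steps.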
Environment:
64-bit Linux
Python 2.7.7 (64-bit)
pandas 0.14.1
numpy 1.8.2
IPython 2.2.0 (the behavior is the same with CPython)

python-2.7 pandas
 
  