Python 2.7 - Memory usage during and after drop_duplicates()
I am working with a data frame that takes 2 GB of memory (according to htop) and has dimensions (6287475, 19). The data frame is heterogeneous in data type, but that should not matter. After loading the data frame, I drop duplicate rows using the command
df.drop_duplicates(inplace=True)
During execution of this command, memory usage jumps to 7 GB. After the command has completed, memory usage drops to 5 GB, which is still more than twice the memory required to store a single instance of the data frame. If I delete the data frame with del df, memory usage decreases to 3 GB.
The behavior is the same if I do the following:
df2 = df.drop_duplicates()
del df
del df2
Running gc.collect() does nothing, and memory usage only returns to the baseline level after terminating the Python session. Is this a memory leak? Has anyone seen similar behavior?
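For anyone who wants to reproduce the numbers without watching htop, here is a minimal sketch that logs the resident set size of the process around each step. It is only an illustration: the helper names (rss_mb, report, demo) are made up for this post, the data is a synthetic stand-in rather than my real data frame, and reading /proc/self/status assumes Linux.

```python
import gc


def rss_mb():
    """Return this process's resident set size in MB (Linux only, via /proc)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # value is reported in kB
    raise RuntimeError("VmRSS not found in /proc/self/status")


def report(label, log):
    """Append (label, current RSS in MB) to log and print it."""
    log.append((label, rss_mb()))
    print("%-24s %10.1f MB" % log[-1])


def demo(n_rows=1000000):
    """Hypothetical reproduction: measure RSS around drop_duplicates()."""
    # Imported here so the helpers above work without pandas installed.
    import numpy as np
    import pandas as pd

    log = []
    report("baseline", log)
    # Synthetic stand-in for the real data: only 10**4 distinct rows are
    # possible, so a million rows contain many duplicates.
    df = pd.DataFrame(np.random.randint(0, 10, size=(n_rows, 4)),
                      columns=list("abcd"))
    report("after load", log)
    df.drop_duplicates(inplace=True)
    report("after drop_duplicates", log)
    del df
    report("after del df", log)
    gc.collect()
    report("after gc.collect()", log)
    return log
```

Calling demo() prints one line per step. These readings are taken after each step has finished, so they will not show the peak reached while drop_duplicates() is running.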
Environment:
64-bit Linux
Python 2.7.7 (64-bit)
pandas 0.14.1
numpy 1.8.2
IPython 2.2.0 (behavior is the same in CPython)
Tags: python-2.7, pandas