Memory Leak #386

@basejn

SFrame is stated to deal with large amounts of data without using much RAM, but it leaks memory on simple tasks.

This sample keeps increasing RAM usage indefinitely and is strangely slow.
The slowness can be explained by the disk storage that the library uses to deal with large datasets.

import sframe as sf
data = sf.SFrame({'a':['string']*1000,'b':[1]*1000,'c':[{'key1':1}]*1000})
for i in xrange(10000):
    a = data.to_numpy()

Another example is:

suma=0
for i in xrange(10000):
    for row in data:
        suma+=row['b']

The RAM usage steadily increases.
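
For anyone trying to reproduce this, here is a rough sketch of how the growth could be recorded rather than watched in a process monitor. It uses only the standard `resource` module (Unix only). The helper names `peak_rss` and `watch_growth` are made up for this illustration and are not part of SFrame. Note that `ru_maxrss` reports the peak resident size, so it can show that memory keeps growing but not whether it is ever released, and its unit differs by platform.

```python
import resource


def peak_rss():
    """Peak resident set size of this process.

    The unit is platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def watch_growth(step, rounds=10, calls_per_round=1000):
    """Call step() repeatedly and record the peak RSS after each round.

    `step` is any zero-argument callable (hypothetical helper for this issue).
    """
    readings = []
    for _ in range(rounds):
        for _ in range(calls_per_round):
            step()
        readings.append(peak_rss())
    return readings
```

With the first sample above, this would be called as `watch_growth(lambda: data.to_numpy())`; readings that keep climbing from round to round would match what I see.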

These are just samples, not real usage.

What I am trying to accomplish with the library is to read the data from the SFrame one row at a time or batch by batch and aggregate it without loading it all into RAM. Specifically, I want to construct a sparse matrix that I will use for training with gradient descent.

If I iterate over it batch by batch, SFrame ends up using a lot of RAM after the iteration and doesn't release it. It uses no less memory than the actual size of the data, so using it becomes pointless.
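
To make the intended usage concrete, here is a minimal sketch of the batch-by-batch aggregation described above. It is not my real code. It assumes the table supports `len()` and slicing and yields dict-like rows when iterated, which is how I understand SFrame to behave. The names `iter_batches` and `to_coo_triplets` are hypothetical, and the column `'c'` is the dict column from the first sample.

```python
def iter_batches(table, batch_size):
    """Yield consecutive slices of `table`, each at most batch_size rows."""
    for start in range(0, len(table), batch_size):
        yield table[start:start + batch_size]


def to_coo_triplets(table, batch_size=1000, column='c'):
    """Collect (row, col, value) triplets from a dict-valued column.

    Only the triplets and the feature index are kept in memory,
    not the rows themselves.
    """
    col_index = {}              # feature name -> column number
    rows, cols, vals = [], [], []
    row_no = 0
    for batch in iter_batches(table, batch_size):
        for record in batch:
            for key, value in record[column].items():
                # Assign the next free column number to unseen features.
                col = col_index.setdefault(key, len(col_index))
                rows.append(row_no)
                cols.append(col)
                vals.append(value)
            row_no += 1
    return rows, cols, vals, col_index
```

The three lists are in coordinate (COO) form, so they could be handed to something like `scipy.sparse.coo_matrix((vals, (rows, cols)))`. The point is that nothing here should need more memory than one batch plus the triplets.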
