52046 – Sparse CPU integration tests are leaking

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 52046 - Sparse CPU integration tests are leaking

Summary: Sparse CPU integration tests are leaking

Status:	RESOLVED FIXED

Alias:	None

Product:	MLIR
Classification:	Unclassified
Component:	default (show other bugs)
Version:	unspecified
Hardware:	PC All

Importance:	P enhancement
Assignee:	Aart Bik

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-10-02 20:32 PDT by Mehdi Amini
Modified:	2021-10-05 10:02 PDT (History)
CC List:	3 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Mehdi Amini 2021-10-02 20:32:40 PDT

Running with ASAN, tests like Integration/Dialect/SparseTensor/CPU/sparse_sum.mlir have multiple leaks with memory is allocated from `_mlir_ciface_newSparseTensor`.

I disabled the leak checker for now in mlir/test/Integration/Dialect/SparseTensor/CPU/lit.local.cfg

Comment 1 Mehdi Amini 2021-10-02 20:35:10 PDT

The Sparse tensor dialect has a "new" operation but nothing to "destroy" apparently?

https://mlir.llvm.org/docs/Dialects/SparseTensorOps/#sparse_tensornew-mlirsparse_tensornewop

Comment 2 Aart Bik 2021-10-02 22:03:02 PDT

Yes, the memory management was still a bit "in flight" waiting for a full bufferization story, but I agree that does not mean we should not have a proper delete available (the runtime support it, there is just no "delete" op yet).

I will add that Monday, so we can start using the leaker properly. Thanks for bringing this to higher priority.

Comment 3 Stephan Herhut 2021-10-04 07:24:49 PDT

With the recent changes to the buffer deallocation pass, it should now also be able to handle your use case. You will need to implement the AllocationOpInterface for you new operation so that the pass knows how to deallocate and clone a sparse tensor.

The clone operation is only needed if you have control flow.

Comment 4 Mehdi Amini 2021-10-04 10:57:03 PDT

Note that in this case these are .mlir tests that are written with the leaks, it does not come from a bufferization pass introducing leaking allocations.
I just couldn't update the test without a primitive available for the "dealloc".

Comment 5 Aart Bik 2021-10-04 11:01:04 PDT

Yes, ad Stephan's coomment, this is not the "inplaceable" example I usually present as something that needs to be addressed.

I was merely referring to the fact that having "tensors" as type and "sparse storage schemes" as underlying storage formats creates some tension on new/delete operations. But I found a good compromise for now which will fix the leaks.

I expect to refine the ops as our bufferization strategy at-large converges on a single solution for the dense and sparse world.

Comment 6 Stephan Herhut 2021-10-05 00:56:44 PDT

The buffer-deallocation pass can be run once the program already is on buffers. It does not reason about performing computations in place. It only manages lifetimes by inserting appropriate dealloc operations.

For hand-written tests this is not required but might be useful in an end to end flow.

Comment 7 Aart Bik 2021-10-05 10:00:04 PDT

Revision https://reviews.llvm.org/D111099 addresses the immediate concern of leaking memory. The sparse integration tests can now run with asan enabled again, since all memory is properly released.

In the long run, we need to unify dense and sparse bufferization and memory management somehow. Mehdi's efforts to have asan on by default have put this strongly on my immediate radar screen!