LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 23395 - raw_svector_ostream will heap allocate without need
Summary: raw_svector_ostream will heap allocate without need
Status: RESOLVED FIXED
Alias: None
Product: new-bugs
Classification: Unclassified
Component: new bugs (show other bugs)
Version: trunk
Hardware: PC Windows NT
: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-01 23:33 PDT by Yaron Keren
Modified: 2015-08-26 23:49 PDT (History)
5 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Yaron Keren 2015-05-01 23:33:31 PDT
The code line

 OS.reserve(OS.size() + 64);

called from raw_svector_ostream::write_impl called from raw_svector_ostream::~raw_svector_ostream will heap allocate without need. The issue is already documented:

raw_svector_ostream::~raw_svector_ostream() {
  // FIXME: Prevent resizing during this flush().
  flush();
}

And a solution is provided in raw_svector_ostream::init():

  // Set up the initial external buffer. We make sure that the buffer has at
  // least 128 bytes free; raw_ostream itself only requires 64, but we want to
  // make sure that we don't grow the buffer unnecessarily on destruction (when
  // the data is flushed). See the FIXME below.
  OS.reserve(OS.size() + 128);

This solution may be worse than the problem. In total:

* If the SmallString was less than 128 bytes init() will always(!) heap allocate. 
* If it's more or equal to 128 bytes but upon destruction less than 64 bytes are left unused it will heap allocate for no reason. 

A careful programmer who did size the SmallString to match its use + small reserve will find either of these heap allocations surprising, negating the SmallString advantage and then paying memcpy cost.
Comment 1 Duncan 2015-05-01 23:39:28 PDT
I haven't looked very closely, but I don't understand why there are any speculative calls to `reserve()`.  `push_back()` and `append()` take care of resizing efficiently, and only when necessary.
Comment 2 Yaron Keren 2015-05-02 00:03:04 PDT
I didn't understand either why it buffers a memory buffer... ultimately I hope a low-overhead class like raw_char_ostream will replace StringStream + raw_svector_ostream usage but given the amount of such existing code this issue should be solved regardless.
Comment 3 Duncan 2015-05-02 00:15:48 PDT
I think you should delete both/all the speculative calls to `reserve()` and see if the tests pass (maybe a caller is relying on something strange?).
Comment 4 Yaron Keren 2015-05-02 00:52:45 PDT
raw_ostream assumes that the buffer has at least one byte available. This is asserted in raw_ostream::SetBufferAndMode

  assert(((Mode == Unbuffered && !BufferStart && Size == 0) ||
          (Mode != Unbuffered && BufferStart && Size != 0)) &&
         "stream must be unbuffered or have at least one byte");

and required in raw_ostream::write

    if (LLVM_UNLIKELY(OutBufCur == OutBufStart)) {
      assert(NumBytes != 0 && "undefined behavior");
      size_t BytesToWrite = Size - (Size % NumBytes);
      write_impl(Ptr, BytesToWrite);

To make this work further changes are required.

raw_ostreaem::write logic is designed for independent buffer and stream so it is a bad, complex solution when buffer == stream. With raw_char_ostream I overrode write() completely instead of patching it yet more, resulting in a smaller, simpler, clearer code. 

It may be possible to do the same for raw_svector_ostream. Some overhead of coordinating between raw_svector_ostream and SmallString which is not really required will remain but this may be neglible. 

I'll try this approach too and see if it can achieve near-raw_char_ostream speed.
Comment 5 Yaron Keren 2015-08-26 23:49:14 PDT
r244870