LLVM 23.0.0git
ObjectStore.h
1//===- llvm/CAS/ObjectStore.h -----------------------------------*- C++ -*-===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8///
9/// \file
10/// This file contains the declaration of the ObjectStore class.
11///
12//===----------------------------------------------------------------------===//
13
14#ifndef LLVM_CAS_OBJECTSTORE_H
15#define LLVM_CAS_OBJECTSTORE_H
16
17#include "llvm/ADT/StringRef.h"
18#include "llvm/CAS/CASID.h"
20#include "llvm/Support/Error.h"
22#include <cstddef>
23
24namespace llvm {
25
26class MemoryBuffer;
27template <typename T> class unique_function;
28
29namespace cas {
30
31class ObjectStore;
32class ObjectProxy;
33
34/// Content-addressable storage for objects.
35///
36/// Conceptually, objects are stored in a "unique set".
37///
38/// - Objects are immutable ("value objects") that are defined by their
39/// content. They are implicitly deduplicated by content.
40/// - Each object has a unique identifier (UID) that's derived from its content,
41/// called a \a CASID.
42/// - This UID is a fixed-size (strong) hash of the transitive content of a
43/// CAS object.
44/// - It's comparable between any two CAS instances that have the same \a
45/// CASIDContext::getHashSchemaIdentifier().
46/// - The UID can be printed (e.g., \a CASID::toString()) and it can be parsed
47/// by the same or a different CAS instance with \a
48/// ObjectStore::parseID().
49/// - An object can be looked up by content or by UID.
50/// - \a store() is a "get-or-create" method, writing an object if it
51/// doesn't exist yet, and returning a ref to it in any case.
52/// - \a loadObject(const CASID&) looks up an object by its UID.
53/// - Objects can reference other objects, forming an arbitrary DAG.
54///
55/// The \a ObjectStore interface has a few ways of referencing objects:
56///
57/// - \a ObjectRef encapsulates a reference to something in the CAS. It is an
58/// opaque type that references an object inside a specific CAS. It is
59/// implementation defined if the underlying object exists or not for an
60/// ObjectRef, and it can be used to speed up CAS lookup as an implementation
61/// detail. However, you don't know anything about the underlying objects.
62/// "Loading" the object is a separate step that may not have happened
63/// yet, and which can fail (e.g. due to filesystem corruption) or introduce
64/// latency (if downloading from a remote store).
65/// - \a ObjectHandle encapsulates a *loaded* object in the CAS. You need one of
66/// these to inspect the content of an object: to look at its stored
67/// data and references. This is internal to the CAS implementation and not
68/// available from the public CAS APIs.
69/// - \a CASID: the UID for an object in the CAS, obtained through \a
70/// ObjectStore::getID() or \a ObjectStore::parseID(). This is a valid CAS
71/// identifier, but may reference an object that is unknown to this CAS
72/// instance.
73/// - \a ObjectProxy pairs an ObjectHandle (subclass) with an ObjectStore, and
74/// wraps access APIs to avoid having to pass extra parameters. It is the
75/// object used for accessing underlying data and refs by CAS users.
76///
77/// Both ObjectRef and ObjectHandle are lightweight, wrapping a `uint64_t` and
78/// are only valid with the associated ObjectStore instance.
79///
80/// There are a few options for accessing content of objects, with different
81/// lifetime tradeoffs:
82///
83/// - \a getData() accesses data without exposing lifetime at all.
84/// - \a getMemoryBuffer() returns a \a MemoryBuffer whose lifetime
85/// is independent of the CAS (it can live longer).
86/// - \a getDataString() returns a StringRef whose lifetime is guaranteed to
87/// last as long as the \a ObjectStore.
88/// - \a readRef() and \a forEachRef() iterate through the references in an
89/// object. There is no lifetime assumption.
91 friend class ObjectProxy;
92 void anchor();
93
94public:
95 /// Get a \p CASID from a \p ID, which should have been generated by \a
96 /// CASID::print(). This succeeds as long as \a validateID() would pass. The
97 /// object may be unknown to this CAS instance.
98 ///
99 /// TODO: Remove, and update callers to use \a validateID() or \a
100 /// extractHashFromID().
101 virtual Expected<CASID> parseID(StringRef ID) = 0;
102
103 /// Store object into ObjectStore.
104 virtual Expected<ObjectRef> store(ArrayRef<ObjectRef> Refs,
105 ArrayRef<char> Data) = 0;
106 /// Get an ID for \p Ref.
107 virtual CASID getID(ObjectRef Ref) const = 0;
108
109 /// Stores the data of a file into ObjectStore.
110 ///
111 /// An underlying implementation could perform optimizations that reduce I/O
112 /// and disk space consumption.
113 ///
114 /// If there are any concurrent modifications to the file, the contents in the
115 /// CAS may be corrupt.
116 ///
117 /// \param Path the path of the file data.
118 virtual Expected<ObjectRef> storeFromFile(StringRef Path);
119
120 /// Exports the data of an object to a file path. It does not include any
121 /// references of the object.
122 ///
123 /// An underlying implementation could perform optimizations that reduce I/O
124 /// and disk space consumption.
125 ///
126 /// \param Node the object to read data from.
127 /// \param Path the path of the file to write the data to.
128 virtual Error exportDataToFile(ObjectHandle Node, StringRef Path) const;
129
130 /// Get an existing reference to the object called \p ID.
131 ///
132 /// Returns \c std::nullopt if the object is not stored in this CAS.
133 virtual std::optional<ObjectRef> getReference(const CASID &ID) const = 0;
134
135 /// \returns true if the object is directly available from the local CAS, for
136 /// implementations that have this kind of distinction.
137 virtual Expected<bool> isMaterialized(ObjectRef Ref) const = 0;
138
139 /// Validate the underlying object referred to by \p ID.
140 virtual Error validateObject(const CASID &ID) = 0;
141
142 /// Validate the entire ObjectStore.
143 virtual Error validate(bool CheckHash) const = 0;
144
145protected:
146 /// Load the object referenced by \p Ref.
147 ///
148 /// Errors if the object cannot be loaded.
149 /// \returns \c std::nullopt if the object is missing from the CAS.
150 virtual Expected<std::optional<ObjectHandle>> loadIfExists(ObjectRef Ref) = 0;
151
152 /// Like \c loadIfExists but returns an error if the object is missing.
154
155 /// Get the size of some data.
156 virtual uint64_t getDataSize(ObjectHandle Node) const = 0;
157
158 /// Methods for handling objects. CAS implementations need to override to
159 /// provide functions to access stored CAS objects and references.
160 virtual Error forEachRef(ObjectHandle Node,
161 function_ref<Error(ObjectRef)> Callback) const = 0;
162 virtual ObjectRef readRef(ObjectHandle Node, size_t I) const = 0;
163 virtual size_t getNumRefs(ObjectHandle Node) const = 0;
164 virtual ArrayRef<char> getData(ObjectHandle Node,
165 bool RequiresNullTerminator = false) const = 0;
166
167 /// Get ObjectRef from open file.
168 virtual Expected<ObjectRef>
169 storeFromOpenFileImpl(sys::fs::file_t FD,
170 std::optional<sys::fs::file_status> Status);
171
172 /// Get a lifetime-extended StringRef pointing at \p Data.
173 ///
174 /// Depending on the CAS implementation, this may involve in-memory storage
175 /// overhead.
179
180 /// Get a lifetime-extended MemoryBuffer pointing at \p Data.
181 ///
182 /// Depending on the CAS implementation, this may involve in-memory storage
183 /// overhead.
184 std::unique_ptr<MemoryBuffer>
185 getMemoryBuffer(ObjectHandle Node, StringRef Name = "",
186 bool RequiresNullTerminator = true);
187
188 /// Read all the refs from object in a SmallVector.
189 virtual void readRefs(ObjectHandle Node,
190 SmallVectorImpl<ObjectRef> &Refs) const;
191
192 /// Allow ObjectStore implementations to create internal handles.
193#define MAKE_CAS_HANDLE_CONSTRUCTOR(HandleKind) \
194 HandleKind make##HandleKind(uint64_t InternalRef) const { \
195 return HandleKind(*this, InternalRef); \
196 }
197 MAKE_CAS_HANDLE_CONSTRUCTOR(ObjectHandle)
199#undef MAKE_CAS_HANDLE_CONSTRUCTOR
200
201public:
202 /// Helper function to store an object and return an ObjectProxy.
203 Expected<ObjectProxy> createProxy(ArrayRef<ObjectRef> Refs, StringRef Data);
204
205 /// Store object from StringRef.
210
211 /// Default implementation reads \p FD and calls \a store(). Does not
212 /// take ownership of \p FD; the caller is responsible for closing it.
213 ///
214 /// If \p Status is passed in, it is treated as a hint. Implementations
215 /// must protect against the file size potentially growing after the status
216 /// was taken (i.e., they cannot assume that an mmap will be null-terminated
217 /// where \p Status implies).
218 ///
219 /// Returns the \a ObjectRef of the stored file contents.
220 Expected<ObjectRef>
221 storeFromOpenFile(sys::fs::file_t FD,
222 std::optional<sys::fs::file_status> Status = std::nullopt) {
223 return storeFromOpenFileImpl(FD, Status);
224 }
225
226 static Error createUnknownObjectError(const CASID &ID);
227
228 /// Create ObjectProxy from CASID. If the object doesn't exist, get an error.
229 Expected<ObjectProxy> getProxy(const CASID &ID);
230 /// Create ObjectProxy from ObjectRef. If the object can't be loaded, get an
231 /// error.
233
234 /// \returns \c std::nullopt if the object is missing from the CAS.
236
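237 /// Read the data from \p Node into \p OS.
238 uint64_t readData(ObjectHandle Node, raw_ostream &OS, uint64_t Offset = 0,
239 uint64_t MaxBytes = -1ULL) const {
240 ArrayRef<char> Data = getData(Node);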
241 assert(Offset < Data.size() && "Expected valid offset");
242 Data = Data.drop_front(Offset).take_front(MaxBytes);
243 OS << toStringRef(Data);
244 return Data.size();
245 }
246
247 /// Set the size limit for constraining growth of on-disk storage. This takes
248 /// effect when the instance is closed.
249 ///
250 /// Implementations may leave this unimplemented.
251 virtual Error setSizeLimit(std::optional<uint64_t> SizeLimit) {
252 return Error::success();
253 }
254
255 /// \returns the storage size of the on-disk CAS data.
256 ///
257 /// Implementations that do not support this should return
258 /// \c std::nullopt.
259 virtual Expected<std::optional<uint64_t>> getStorageSize() const {
260 return std::nullopt;
261 }
262
263 /// Prune local storage to reduce its size according to the desired size
264 /// limit. Pruning can happen concurrently with other operations.
265 ///
266 /// Implementations may leave this unimplemented.
267 virtual Error pruneStorageData() { return Error::success(); }
268
269 /// Validate the whole node tree.
270 Error validateTree(ObjectRef Ref);
271
272 /// Import object from another CAS. This will import the full tree from the
273 /// other CAS.
274 Expected<ObjectRef> importObject(ObjectStore &Upstream, ObjectRef Other);
275
276 /// Print the ObjectStore internals for debugging purposes.
277 virtual void print(raw_ostream &) const {}
278 void dump() const;
279
280 /// Get CASContext
281 const CASContext &getContext() const { return Context; }
282
283 virtual ~ObjectStore() = default;
284
285protected:
286 ObjectStore(const CASContext &Context) : Context(Context) {}
287
288private:
289 const CASContext &Context;
290};
291
292/// Reference to an abstract hierarchical node, with data and references.
293/// Reference is passed by value and is expected to be valid as long as the \a
294/// ObjectStore is.
295class ObjectProxy {
296public:
297 ObjectStore &getCAS() const { return *CAS; }
298 CASID getID() const { return CAS->getID(Ref); }
299 ObjectRef getRef() const { return Ref; }
300 size_t getNumReferences() const { return CAS->getNumRefs(H); }
301 ObjectRef getReference(size_t I) const { return CAS->readRef(H, I); }
302
303 operator CASID() const { return getID(); }
304 CASID getReferenceID(size_t I) const {
305 std::optional<CASID> ID = getCAS().getID(getReference(I));
306 assert(ID && "Expected reference to be first-class object");
307 return *ID;
308 }
309
310 /// Visit each reference in order, returning an error from \p Callback to
311 /// stop early.
312 Error forEachReference(function_ref<Error(ObjectRef)> Callback) const {
313 return CAS->forEachRef(H, Callback);
314 }
315
316 LLVM_ABI std::unique_ptr<MemoryBuffer>
317 getMemoryBuffer(StringRef Name = "",
318 bool RequiresNullTerminator = true) const;
319
320 /// Get the content of the node. Valid as long as the CAS is valid.
321 StringRef getData() const { return CAS->getDataString(H); }
322
323 /// Exports the data of an object to a file path.
324 Error exportDataToFile(StringRef Path) const {
325 return CAS->exportDataToFile(H, Path);
326 }
327
328 friend bool operator==(const ObjectProxy &Proxy, ObjectRef Ref) {
329 return Proxy.getRef() == Ref;
330 }
331 friend bool operator==(ObjectRef Ref, const ObjectProxy &Proxy) {
332 return Proxy.getRef() == Ref;
333 }
334 friend bool operator!=(const ObjectProxy &Proxy, ObjectRef Ref) {
335 return !(Proxy.getRef() == Ref);
336 }
337 friend bool operator!=(ObjectRef Ref, const ObjectProxy &Proxy) {
338 return !(Proxy.getRef() == Ref);
339 }
340
341public:
342 ObjectProxy() = delete;
343
344 static ObjectProxy load(ObjectStore &CAS, ObjectRef Ref, ObjectHandle Node) {
345 return ObjectProxy(CAS, Ref, Node);
346 }
347
348private:
349 ObjectProxy(ObjectStore &CAS, ObjectRef Ref, ObjectHandle H)
350 : CAS(&CAS), Ref(Ref), H(H) {}
351
352 ObjectStore *CAS;
353 ObjectRef Ref;
354 ObjectHandle H;
355};
356
357/// Create an in-memory CAS.
358LLVM_ABI std::unique_ptr<ObjectStore> createInMemoryCAS();
359
360/// \returns true if \c LLVM_ENABLE_ONDISK_CAS configuration was enabled.
361LLVM_ABI bool isOnDiskCASEnabled();
362
363/// Create a persistent on-disk CAS at \p Path.
364LLVM_ABI Expected<std::unique_ptr<ObjectStore>>
365createOnDiskCAS(const Twine &Path);
366
367} // namespace cas
368} // namespace llvm
369
370#endif // LLVM_CAS_OBJECTSTORE_H
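
The class comment in this header describes a "unique set" of immutable objects: store() is get-or-create, IDs are derived from transitive content, and objects may reference other objects to form a DAG. The sketch below illustrates those three ideas with a standalone toy that uses only the C++ standard library. It is not the LLVM implementation and does not use the LLVM API: ToyStore, ToyRef, ToyID and ToyObject are hypothetical names, and the 64-bit FNV-1a hash is a placeholder for the strong hash a real CAS would use (a collision here would wrongly merge two objects).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-ins for illustration only; none of these are LLVM types.
using ToyID = std::uint64_t; // plays the role of CASID (FNV-1a, NOT a strong hash)
using ToyRef = std::size_t;  // plays the role of ObjectRef (index into one store)

struct ToyObject {
  std::vector<ToyRef> Refs; // references to other objects in the same store
  std::string Data;         // the object's own bytes
};

class ToyStore {
public:
  // Get-or-create: identical (Refs, Data) always yields the same ToyRef.
  ToyRef store(const std::vector<ToyRef> &Refs, const std::string &Data) {
    ToyID ID = hashObject(Refs, Data);
    auto It = Index.find(ID);
    if (It != Index.end())
      return It->second; // already stored, nothing new is written
    Objects.push_back(ToyObject{Refs, Data});
    IDs.push_back(ID);
    Index.emplace(ID, Objects.size() - 1);
    return Objects.size() - 1;
  }

  // The content-derived ID of a stored object.
  ToyID getID(ToyRef Ref) const { return IDs.at(Ref); }

  // Look up by ID; std::nullopt if the ID is unknown to this store.
  std::optional<ToyRef> getReference(ToyID ID) const {
    auto It = Index.find(ID);
    if (It == Index.end())
      return std::nullopt;
    return It->second;
  }

  const ToyObject &load(ToyRef Ref) const { return Objects.at(Ref); }
  std::size_t size() const { return Objects.size(); }

private:
  static void mixByte(ToyID &Hash, unsigned char Byte) {
    Hash ^= Byte;
    Hash *= 1099511628211ULL; // FNV-1a 64-bit prime
  }
  static void mixU64(ToyID &Hash, std::uint64_t Value) {
    for (int I = 0; I < 8; ++I)
      mixByte(Hash, static_cast<unsigned char>(Value >> (8 * I)));
  }

  // Hash the ref count, the IDs of the referenced objects, and the data, so
  // that an object's ID depends on its transitive content.
  ToyID hashObject(const std::vector<ToyRef> &Refs,
                   const std::string &Data) const {
    ToyID Hash = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
    mixU64(Hash, Refs.size());
    for (ToyRef R : Refs)
      mixU64(Hash, IDs.at(R)); // throws std::out_of_range for an unknown ref
    for (char C : Data)
      mixByte(Hash, static_cast<unsigned char>(C));
    return Hash;
  }

  std::vector<ToyObject> Objects;
  std::vector<ToyID> IDs;
  std::unordered_map<ToyID, ToyRef> Index;
};
```

In this toy, storing the string "leaf" twice returns the same ToyRef and leaves size() at 1, while storing a second object that lists the leaf in its Refs produces a new entry whose ID changes if the leaf's content changes. Because references can only point at objects that were already stored, the resulting graph is acyclic by construction, which mirrors the DAG property stated in the class comment.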