LLVM 22.0.0git
ObjectStore.h
1//===- llvm/CAS/ObjectStore.h -----------------------------------*- C++ -*-===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8///
9/// \file
10/// This file contains the declaration of the ObjectStore class.
11///
12//===----------------------------------------------------------------------===//
13
14#ifndef LLVM_CAS_OBJECTSTORE_H
15#define LLVM_CAS_OBJECTSTORE_H
16
17#include "llvm/ADT/StringRef.h"
18#include "llvm/CAS/CASID.h"
20#include "llvm/Support/Error.h"
22#include <cstddef>
23
24namespace llvm {
25
26class MemoryBuffer;
27template <typename T> class unique_function;
28
29namespace cas {
30
31class ObjectStore;
32class ObjectProxy;
33
34/// Content-addressable storage for objects.
35///
36/// Conceptually, objects are stored in a "unique set".
37///
38/// - Objects are immutable ("value objects") that are defined by their
39/// content. They are implicitly deduplicated by content.
40/// - Each object has a unique identifier (UID) that's derived from its content,
41/// called a \a CASID.
42/// - This UID is a fixed-size (strong) hash of the transitive content of a
43/// CAS object.
44/// - It's comparable between any two CAS instances that have the same \a
45/// CASIDContext::getHashSchemaIdentifier().
46/// - The UID can be printed (e.g., \a CASID::toString()) and it can be parsed
47/// by the same or a different CAS instance with \a
48/// ObjectStore::parseID().
49/// - An object can be looked up by content or by UID.
50/// - \a store() is a "get-or-create" method, writing an object if it
51/// doesn't exist yet, and returning a ref to it in any case.
52/// - \a getProxy(const CASID&) looks up an object by its UID.
53/// - Objects can reference other objects, forming an arbitrary DAG.
54///
55/// The \a ObjectStore interface has a few ways of referencing objects:
56///
57/// - \a ObjectRef encapsulates a reference to something in the CAS. It is an
58/// opaque type that references an object inside a specific CAS. It is
59/// implementation defined if the underlying object exists or not for an
60/// ObjectRef, and it can be used to speed up CAS lookup as an implementation
61/// detail. However, you don't know anything about the underlying objects.
62/// "Loading" the object is a separate step that may not have happened
63/// yet, and which can fail (e.g. due to filesystem corruption) or introduce
64/// latency (if downloading from a remote store).
65/// - \a ObjectHandle encapsulates a *loaded* object in the CAS. You need one of
66/// these to inspect the content of an object: to look at its stored
67/// data and references. This is internal to the CAS implementation and not
68/// available from the public CAS APIs.
69/// - \a CASID: the UID for an object in the CAS, obtained through \a
70/// ObjectStore::getID() or \a ObjectStore::parseID(). This is a valid CAS
71/// identifier, but may reference an object that is unknown to this CAS
72/// instance.
73/// - \a ObjectProxy pairs an ObjectHandle (subclass) with an ObjectStore, and
74/// wraps access APIs to avoid having to pass extra parameters. It is the
75/// object used for accessing underlying data and refs by CAS users.
76///
77/// Both ObjectRef and ObjectHandle are lightweight, wrapping a `uint64_t` and
78/// are only valid with the associated ObjectStore instance.
79///
80/// There are a few options for accessing content of objects, with different
81/// lifetime tradeoffs:
82///
83/// - \a getData() accesses data without exposing lifetime at all.
84/// - \a getMemoryBuffer() returns a \a MemoryBuffer whose lifetime
85/// is independent of the CAS (it can live longer).
86/// - \a getDataString() returns a StringRef whose lifetime is guaranteed to
87/// last as long as the \a ObjectStore.
88/// - \a readRef() and \a forEachRef() iterate through the references in an
89/// object. There is no lifetime assumption.
90class ObjectStore {
91 friend class ObjectProxy;
92 void anchor();
93
94public:
95 /// Get a \p CASID from a \p ID, which should have been generated by \a
96 /// CASID::print(). This succeeds as long as \a validateID() would pass. The
97 /// object may be unknown to this CAS instance.
98 ///
99 /// TODO: Remove, and update callers to use \a validateID() or \a
100 /// extractHashFromID().
101 virtual Expected<CASID> parseID(StringRef ID) = 0;
102
103 /// Store object into ObjectStore.
104 virtual Expected<ObjectRef> store(ArrayRef<ObjectRef> Refs,
105 ArrayRef<char> Data) = 0;
106 /// Get an ID for \p Ref.
107 virtual CASID getID(ObjectRef Ref) const = 0;
108
109 /// Get an existing reference to the object called \p ID.
110 ///
111 /// Returns \c std::nullopt if the object is not stored in this CAS.
112 virtual std::optional<ObjectRef> getReference(const CASID &ID) const = 0;
113
114 /// \returns true if the object is directly available from the local CAS, for
115 /// implementations that have this kind of distinction.
116 virtual Expected<bool> isMaterialized(ObjectRef Ref) const = 0;
117
118 /// Validate the underlying object referred to by \p ID.
119 virtual Error validateObject(const CASID &ID) = 0;
120
121 /// Validate the entire ObjectStore.
122 virtual Error validate(bool CheckHash) const = 0;
123
124protected:
125 /// Load the object referenced by \p Ref.
126 ///
127 /// Errors if the object cannot be loaded.
128 /// \returns \c std::nullopt if the object is missing from the CAS.
129 virtual Expected<std::optional<ObjectHandle>> loadIfExists(ObjectRef Ref) = 0;
130
131 /// Like \c loadIfExists but returns an error if the object is missing.
133
134 /// Get the size of the data stored in \p Node.
135 virtual uint64_t getDataSize(ObjectHandle Node) const = 0;
136
137 /// Methods for handling objects. CAS implementations need to override to
138 /// provide functions to access stored CAS objects and references.
139 virtual Error forEachRef(ObjectHandle Node,
140 function_ref<Error(ObjectRef)> Callback) const = 0;
141 virtual ObjectRef readRef(ObjectHandle Node, size_t I) const = 0;
142 virtual size_t getNumRefs(ObjectHandle Node) const = 0;
143 virtual ArrayRef<char> getData(ObjectHandle Node,
144 bool RequiresNullTerminator = false) const = 0;
145
146 /// Get ObjectRef from open file.
147 virtual Expected<ObjectRef>
148 storeFromOpenFileImpl(sys::fs::file_t FD,
149 std::optional<sys::fs::file_status> Status);
150
151 /// Get a lifetime-extended StringRef pointing at \p Data.
152 ///
153 /// Depending on the CAS implementation, this may involve in-memory storage
154 /// overhead.
158
159 /// Get a lifetime-extended MemoryBuffer pointing at \p Data.
160 ///
161 /// Depending on the CAS implementation, this may involve in-memory storage
162 /// overhead.
163 std::unique_ptr<MemoryBuffer>
164 getMemoryBuffer(ObjectHandle Node, StringRef Name = "",
165 bool RequiresNullTerminator = true);
166
167 /// Read all the refs from object in a SmallVector.
168 virtual void readRefs(ObjectHandle Node,
169 SmallVectorImpl<ObjectRef> &Refs) const;
170
171 /// Allow ObjectStore implementations to create internal handles.
172#define MAKE_CAS_HANDLE_CONSTRUCTOR(HandleKind) \
173 HandleKind make##HandleKind(uint64_t InternalRef) const { \
174 return HandleKind(*this, InternalRef); \
175 }
176 MAKE_CAS_HANDLE_CONSTRUCTOR(ObjectHandle)
178#undef MAKE_CAS_HANDLE_CONSTRUCTOR
179
180public:
181 /// Helper function that stores an object and returns an ObjectProxy.
182 LLVM_ABI_FOR_TEST Expected<ObjectProxy> createProxy(ArrayRef<ObjectRef> Refs,
183 StringRef Data);
184
185 /// Store object from StringRef.
190
191 /// Default implementation reads \p FD and calls \a store(). Does not
192 /// take ownership of \p FD; the caller is responsible for closing it.
193 ///
194 /// If \p Status is passed in, it is to be treated as a hint. Implementations
195 /// must protect against the file size potentially growing after the status
196 /// was taken (i.e., they cannot assume that an mmap will be null-terminated
197 /// where \p Status implies).
198 ///
199 /// Returns an \a ObjectRef for the stored file contents.
200 Expected<ObjectRef>
201 storeFromOpenFile(sys::fs::file_t FD,
202 std::optional<sys::fs::file_status> Status = std::nullopt) {
203 return storeFromOpenFileImpl(FD, Status);
204 }
205
206 static Error createUnknownObjectError(const CASID &ID);
207
208 /// Create ObjectProxy from CASID. If the object doesn't exist, get an error.
209 LLVM_ABI Expected<ObjectProxy> getProxy(const CASID &ID);
210 /// Create ObjectProxy from ObjectRef. If the object can't be loaded, get an
211 /// error.
213
214 /// \returns \c std::nullopt if the object is missing from the CAS.
215 Expected<std::optional<ObjectProxy>> getProxyIfExists(ObjectRef Ref);
216
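217 /// Read the data from \p Node into \p OS.
218 uint64_t readData(ObjectHandle Node, raw_ostream &OS, uint64_t Offset = 0,
219 uint64_t MaxBytes = -1ULL) const {
220 ArrayRef<char> Data = getData(Node);
221 assert(Offset < Data.size() && "Expected valid offset");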
222 Data = Data.drop_front(Offset).take_front(MaxBytes);
223 OS << toStringRef(Data);
224 return Data.size();
225 }
226
227 /// Set the size for limiting growth of on-disk storage. This takes effect
228 /// when the instance is closed.
229 ///
230 /// Implementations may leave this unimplemented.
231 virtual Error setSizeLimit(std::optional<uint64_t> SizeLimit) {
232 return Error::success();
233 }
234
235 /// \returns the storage size of the on-disk CAS data.
236 ///
237 /// Implementations that do not support this should return
238 /// \c std::nullopt.
239 virtual Expected<std::optional<uint64_t>> getStorageSize() const {
240 return std::nullopt;
241 }
242
243 /// Prune local storage to reduce its size according to the desired size
244 /// limit. Pruning can happen concurrently with other operations.
245 ///
246 /// Implementations may leave this unimplemented.
247 virtual Error pruneStorageData() { return Error::success(); }
248
249 /// Validate the whole node tree.
250 Error validateTree(ObjectRef Ref);
251
252 /// Import object from another CAS. This will import the full tree from the
253 /// other CAS.
254 Expected<ObjectRef> importObject(ObjectStore &Upstream, ObjectRef Other);
255
256 /// Print the ObjectStore internals for debugging purposes.
257 virtual void print(raw_ostream &) const {}
258 void dump() const;
259
260 /// Get CASContext
261 const CASContext &getContext() const { return Context; }
262
263 virtual ~ObjectStore() = default;
264
265protected:
266 ObjectStore(const CASContext &Context) : Context(Context) {}
267
268private:
269 const CASContext &Context;
270};
271
272/// Reference to an abstract hierarchical node, with data and references.
273/// Reference is passed by value and is expected to be valid as long as the \a
274/// ObjectStore is.
275class ObjectProxy {
276public:
277 ObjectStore &getCAS() const { return *CAS; }
278 CASID getID() const { return CAS->getID(Ref); }
279 ObjectRef getRef() const { return Ref; }
280 size_t getNumReferences() const { return CAS->getNumRefs(H); }
281 ObjectRef getReference(size_t I) const { return CAS->readRef(H, I); }
282
283 operator CASID() const { return getID(); }
284 CASID getReferenceID(size_t I) const {
285 std::optional<CASID> ID = getCAS().getID(getReference(I));
286 assert(ID && "Expected reference to be first-class object");
287 return *ID;
288 }
289
290 /// Visit each reference in order, returning an error from \p Callback to
291 /// stop early.
292 Error forEachReference(function_ref<Error(ObjectRef)> Callback) const {
293 return CAS->forEachRef(H, Callback);
294 }
295
296 std::unique_ptr<MemoryBuffer>
297 getMemoryBuffer(StringRef Name = "",
298 bool RequiresNullTerminator = true) const;
299
300 /// Get the content of the node. Valid as long as the CAS is valid.
301 StringRef getData() const { return CAS->getDataString(H); }
302
303 friend bool operator==(const ObjectProxy &Proxy, ObjectRef Ref) {
304 return Proxy.getRef() == Ref;
305 }
306 friend bool operator==(ObjectRef Ref, const ObjectProxy &Proxy) {
307 return Proxy.getRef() == Ref;
308 }
309 friend bool operator!=(const ObjectProxy &Proxy, ObjectRef Ref) {
310 return !(Proxy.getRef() == Ref);
311 }
312 friend bool operator!=(ObjectRef Ref, const ObjectProxy &Proxy) {
313 return !(Proxy.getRef() == Ref);
314 }
315
316public:
317 ObjectProxy() = delete;
318
319 static ObjectProxy load(ObjectStore &CAS, ObjectRef Ref, ObjectHandle Node) {
320 return ObjectProxy(CAS, Ref, Node);
321 }
322
323private:
324 ObjectProxy(ObjectStore &CAS, ObjectRef Ref, ObjectHandle H)
325 : CAS(&CAS), Ref(Ref), H(H) {}
326
327 ObjectStore *CAS;
328 ObjectRef Ref;
329 ObjectHandle H;
330};
331
332/// Create an in-memory CAS.
333LLVM_ABI std::unique_ptr<ObjectStore> createInMemoryCAS();
334
335/// \returns true if \c LLVM_ENABLE_ONDISK_CAS configuration was enabled.
336bool isOnDiskCASEnabled();
337
338/// Create a persistent on-disk CAS at \p Path.
339LLVM_ABI Expected<std::unique_ptr<ObjectStore>>
340createOnDiskCAS(const Twine &Path);
341
342} // namespace cas
343} // namespace llvm
344
345#endif // LLVM_CAS_OBJECTSTORE_H