For convenience, the initial TLV support in the ORC runtime (for both macho_platform and elfnix_platform) just saves all registers, then jumps into a function that does a table lookup for the requested TLV instance address.

I think the following scheme should provide better performance, while still allowing us to add new TLVs at runtime:

1. Instead of pointing directly at a TLV manager object, the thread data pointer for the current thread should point to a table structure containing N + 2 pointer-sized objects. The first two entries will contain a pointer to the TLV manager object and the table size, and all subsequent entries will contain the addresses of TLV instances for the current thread. As new TLVs are added to the JIT they are assigned indexes within this table (these indexes may extend past the end of the currently allocated table).

   ```
   tlv_table:
     <pointer to TLV manager>
     <number of elements N>
     <element 0>
     ...
     <element N - 1>
   ```

2. TLV descriptors (in the current MachO parlance) should become a triple of (get_addr_fn, key, index) (instead of the current (get_addr_fn, key, addr)), and...

3. The new tlv_get_addr function should look something like this:

   ```
   if (desc->idx >= table->size)
     return expandTableAndGet(desc);
   else if (table->elements[desc->idx] == 0)
     return allocateElementAndGet(desc);
   return table->elements[desc->idx];
   ```

   Or, in x86-64 assembly, something like:

   ```
   _tlv_get_addr:
     movq 8(%rdi), %rax             // get key from desc
     movq %gs:0x0(,%rax,8), %rsi    // get table for thread
     movq 16(%rdi), %rax            // get index from desc
     cmpq 8(%rsi), %rax             // compare index and table size
     jae  LexpandTableAndGet        // if index >= size then expand table
     movq 16(%rsi,%rax,8), %rax     // otherwise get table entry
     testq %rax, %rax               // check for a null entry
     je   LallocateElementAndGet    // if null then allocate entry
     ret                            // Non-null entry: we have our result.
   LexpandTableAndGet:
     jmp __resize_table
   LallocateElementAndGet:
     jmp __allocate_element
   ```
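
To make the table layout and the two slow paths concrete, here is a minimal C sketch of the scheme described above. It is not the ORC runtime implementation. Several things in it are assumptions made purely for illustration: a C11 `_Thread_local` pointer (`current_table`) stands in for the per-thread data pointer that the assembly reads through `%gs` and the key; every TLV instance is a zero-filled block of a fixed, hypothetical size (`HYPOTHETICAL_INSTANCE_SIZE`) rather than something produced by the TLV manager; and threads start out pointing at a shared empty table so that the fast path never has to check for a null table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Two header words followed by N per-thread instance slots. */
typedef struct tlv_table {
  void *manager;     /* pointer to the TLV manager (opaque in this sketch) */
  uintptr_t size;    /* number of elements N */
  void *elements[];  /* instance addresses; 0 means "not allocated yet" */
} tlv_table;

/* Descriptor triple: (get_addr_fn, key, index). */
typedef struct tlv_desc {
  void *(*get_addr_fn)(struct tlv_desc *);
  uintptr_t key;
  uintptr_t index;
} tlv_desc;

/* Assumption: fixed instance size; a real runtime would ask the manager. */
#define HYPOTHETICAL_INSTANCE_SIZE 16

/* Assumption: every thread starts with a shared, empty (N = 0) table. */
static tlv_table empty_table;
static _Thread_local tlv_table *current_table = &empty_table;

/* Slow path 2: the slot exists but holds no instance yet. */
static void *allocate_element_and_get(tlv_desc *desc) {
  void *inst = calloc(1, HYPOTHETICAL_INSTANCE_SIZE);
  assert(inst && "out of memory");
  current_table->elements[desc->index] = inst;
  return inst;
}

/* Slow path 1: the index lies past the end of this thread's table. */
static void *expand_table_and_get(tlv_desc *desc) {
  tlv_table *old = current_table;
  uintptr_t new_size = desc->index + 1;
  tlv_table *t = calloc(1, sizeof(tlv_table) + new_size * sizeof(void *));
  assert(t && "out of memory");
  t->manager = old->manager;
  t->size = new_size;
  if (old->size)
    memcpy(t->elements, old->elements, old->size * sizeof(void *));
  if (old != &empty_table)
    free(old);
  current_table = t;
  /* New slots are zeroed, so the requested one still needs an instance. */
  return allocate_element_and_get(desc);
}

/* Fast path: one bounds check and one null check. */
void *tlv_get_addr(tlv_desc *desc) {
  tlv_table *t = current_table;
  if (desc->index >= t->size)
    return expand_table_and_get(desc);
  if (t->elements[desc->index] == 0)
    return allocate_element_and_get(desc);
  return t->elements[desc->index];
}
```

In this sketch, growing the table only copies the slot pointers, so addresses handed out earlier stay valid, and slots between the old end and the new index stay empty until first use.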