1255 – Should enhance LLVM switch instruction to take case ranges

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 1255 - Should enhance LLVM switch instruction to take case ranges

Summary: Should enhance LLVM switch instruction to take case ranges

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Core LLVM classes (show other bugs)
Version:	trunk
Hardware:	All All

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:	quality-of-implementation, slow-compile

Depends on:
Blocks:

Reported:	2007-03-13 10:31 PDT by Chris Lattner
Modified:	2020-01-27 03:57 PST (History)
CC List:	11 users (show)

See Also:	18348 18349
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Chris Lattner 2007-03-13 10:31:47 PDT

The LLVM switch instruction should handle ranges of case values that go to the same destination without 
enumerating all of them.

This would be good to do before 2.0, and would be a great project for someone who wants to learn/touch 
more of LLVM.

-Chris

Comment 1 Anton Korobeynikov 2007-03-13 10:35:04 PDT

That's definitely that thing, I've thought about after discussing with Duncan
his switch-handling patch.

In fact, it will make switch instruction much more "natural" and will eliminate
the need of "clusterifying" during switch lowering/codegen.

Comment 2 Duncan Sands 2007-03-20 11:07:24 PDT

Maybe this can be added to the project page as something suitable for
the google SoC.

Comment 3 Anton Korobeynikov 2007-03-20 11:31:48 PDT

Maybe, however Open Projects page already mention "code-cleanup" bugs. Maybe
it's just worth to set keyword here? :)

Comment 4 Anton Korobeynikov 2007-06-30 16:09:24 PDT

I'm currently thinking about implementing this feature (otherwise we can simple
forget about it :)

Main question is: how should case ranges implemented. It seems, that these
conditions should be met:

1. Only first and last element of the range should be stored. This is pretty
natural. I've said before, that LLVM's ConstantRange class is not very good
suitable for storing case ranges, because it's in form [Lower, Upper), so in
many real-world cases Upper can overflow, which pretty inconvenient for user to
deal with.

2. CaseRange should be inherited from User in order not to break usual LLVM API
3. CaseRange should be some form of "Constant".

Currently I'm planning to play with some specialized version of ConstantStruct.
Any thoughts/objections?

Comment 5 Chris Lattner 2007-07-02 01:53:56 PDT

I don't think that CaseRange should be a value.  I think that Switch should just have triples of operands.  
Just store "lower, upper, destbb" as three operands.

-Chris

Comment 6 Duncan Sands 2007-07-02 04:35:34 PDT

> I've said before, that LLVM's ConstantRange class is not very good
> suitable for storing case ranges, because it's in form [Lower, Upper),
> so in many real-world cases Upper can overflow, which pretty inconvenient
> for user to deal with.

I don't understand your objection to ConstantRange.  First off let me
say that I think it useful to support wrapping case ranges: with Ada,
I'm sure you'll fairly often see case statements in which very positive
and very negative values branch to the same label.  This can be naturally
represented using a wrapped range.

Secondly, there's another reason to handle wrapping ranges.  The root cause
is that case statements can be signed or unsigned in gcc (Ada supports both)
unlike in C.  Of course internally LLVM is presumably going to represent
switches unsigned (or maybe signed, in any case one of the two).  That means
that signed case statements need to be mapped to unsigned.  Consider the
following range:
  case -1 .. 1 => do_something
If mapped directly to unsigned, this naturally becomes a wrapped range.  (The
same phenomenon occurs if switches are represented as signed in LLVM, when
converting unsigned switches).

If the LLVM switch stuff is going to have to handle wrapped ranges anyway, I
see no reason not to use ConstantRange.

Comment 7 Chris Lattner 2007-07-02 16:25:13 PDT

> I don't understand your objection to ConstantRange.

Ah, I don't object to constant range.  To be more clear, I object to making ConstantRange itself be a 
value*, which would be required to make it an operand of SwitchInst.

Actually the problem I think is that we currently represent the value as an explicit operand of switch.  
For example, saying "1 -> bb1, 2 -> bb2, ..." etc, currently has two operands for each source and 
destination.  Instead of doing this, we could just store the BB's as operands and have a separate list of 
the values that correspond to them.  Because they do not need to be Value*'s, we could now use 
ConstantRanges for these.

Does that make sense?

-Chris

Comment 8 Steven Bosscher 2007-09-16 02:47:44 PDT

FWIW case ranges are a language feature in standard Fortran. Does this mean that this bug needs to be resolved before llvm-gfortran can work, or does LLVM currently lower switch ranges produced by the gcc front ends to an enumeration of all case values?

Comment 9 Anton Korobeynikov 2007-09-16 03:08:21 PDT

(In reply to comment #8)
> FWIW case ranges are a language feature in standard Fortran. Does this mean
> that this bug needs to be resolved before llvm-gfortran can work, or does LLVM
> currently lower switch ranges produced by the gcc front ends to an enumeration
> of all case values?
> 
Yes. Exactly. With some threshold value: if case range is too big it emits just range comparison and 'if'.

Comment 10 Chris Lattner 2007-09-16 12:47:10 PDT

The SWITCH_EXPR lowering code in llvm-gcc currently converts case ranges to an enumerated set of cases in an llvm switch instruction if the case range is less than 64 elements.  If it is more than 64 elements, it emits an "if" before the switch to evaluate the case range.  This is not necessarily optimal, but is fully correct.

Comment 11 Stepan Dyatkovskiy 2011-09-07 13:59:42 PDT

Hi, guys. What the state of case ranges is by now?

Comment 12 Duncan Sands 2011-09-07 15:07:52 PDT

No one did anything.

Comment 13 Stepan Dyatkovskiy 2011-09-21 05:37:47 PDT

May be we can use ConstantArray instead of ConstantRanges? Something like this:

switch i32 %val, label %otherwise [ i32 0, label %onzero
                                    [2 x i32] [i32 1, i32 5], label %onzerofive
                                    i32 7, label %onsix ]

We also can simplify syntax in this case omitting [2 x i32]:

switch i32 %val, label %otherwise [ i32 0, label %onzero
                                    [i32 1, i32 5], label %onzerofive
                                    i32 7, label %onsix ]

Comment 14 Stepan Dyatkovskiy 2011-09-28 03:33:37 PDT

So what about moving case-values to another collection? I can try to do this if no-one against...

Comment 15 Chris Lattner 2011-09-28 14:55:49 PDT

FWIW, I don't think that using ConstantArray's here make sense.  Instead, SwitchInst should move to not using Value*'s to represent the arguments, they should be a non-Value* datastructure that just contains APInts or something.

Comment 16 Stepan Dyatkovskiy 2011-09-29 01:56:28 PDT

It seems that some optimizations becames impossible after case values was moved to non-operands collection. I mean optimizations that replaces Use objects with more optimal ones. "CodeGenPrepare::OptimizeExtUses" for example do it (CodeGenPrepare.cpp, string 1054).

Comment 17 Duncan Sands 2011-09-29 03:03:58 PDT

I don't understand what you are saying.  Currently switch cases are ConstantInt,
i.e. just numbers like 2 or 9.  As such they do not use other values, so can't
possibly be impacted in the way you seem to suggest.  Indeed they can't possibly
be simplified: how do you simplify the number 3?

Comment 18 Stepan Dyatkovskiy 2011-09-29 03:22:24 PDT

Switch condition is stored like a case value too, with idx=0. It is may be a non-constant value. So, it is may be optimized. 
But generally you right. It seems that "3" could not be simplified itself :-)

Comment 19 Duncan Sands 2011-09-29 03:30:32 PDT

I don't think anyone was suggesting changing the way the switch condition is
represented, only the case values.

Comment 20 Stepan Dyatkovskiy 2011-09-29 04:23:57 PDT

OK. What about next format of op-list:
[condition, default-dest, bb1, bb2, ...] ?

Comment 21 Anton Korobeynikov 2011-09-29 06:08:09 PDT

(In reply to comment #20)
> OK. What about next format of op-list:
> [condition, default-dest, bb1, bb2, ...] ?
This looks reasonable, might definitely be a "proper" Use

Comment 22 Chris Lattner 2011-09-29 10:39:21 PDT

Right.  Also, note that changing the storage of the case values in memory doesn't mean that the syntax in .ll files has to change.

Comment 23 Nuno Lopes 2012-05-21 13:52:23 PDT

I have two concerns about the current design:

First, the implementation of SwitchInst::CaseIteratorT::getCaseValue() seems to be buggy, since it only iterates over the Low value of each range. It should keep state and be able to iterate over all elements of each case range.
(http://llvm.org/docs/doxygen/html/Instructions_8h_source.html#l02506)

Second, I couldn't find a simple way to cast the output of SwitchInst::CaseIteratorT::getCaseValueEx()  (a ConstantRangesSet) to a ConstantRange, which seems to be the most natural object for users that can actually handle case range values. Can this be made available?

Comment 24 Stepan Dyatkovskiy 2012-05-22 01:41:42 PDT

SwitchInst::CaseIteratorT::getCaseValue() is deprecated old style method. It is kept for unchanged passes that are not adopted to new case ranges feature. 
Currently I'm in middle of case ranges implementation. And SwitchInst is already uses ConstantRangesSet but works in old style, currently each case is a still single number. I'm going to construct SwitchInst with *real* ranges after all dependend passes will adopted to case-ranges feature.

About ConstantRangesSet and ConstantRange. It is a slightly different classes. Don't try to convert CRS to CR.
Below little examples how to work with CRS:

// Example of enumeration ranges in case
  for (SwitchInst::CaseIt i = I.case_begin(), e = I.case_end(); i != e; ++i) {
    ConstantRangesSet Case = i.getCaseValueEx();
    for (unsigned n = 0, en = Case.getNumItems(); n != en; ++n) {
      ConstantRangesSet::Range r = Case.getItem(n);
      ConstantInt* Low = r.Low;
      ConstantInt* High = r.High;  
      // Do something with Low and High.
    }
  }

// Example of looking up for case that satisfies condition:
  for (SwitchInst::CaseIt i = I.case_begin(), e = I.case_end(); i != e; ++i) {
    ConstantRangesSet Case = i.getCaseValueEx();
    if (Case.isSatisfies(Condition)) {
      // Do something
      break;
    } 
  }

Also note that after all passes will updated and SwitchInst will work with real cases, the ConstantRangesSet will rebased from ConstantInt numbers to the APInt numbers.

Comment 25 Stepan Dyatkovskiy 2012-05-22 01:45:18 PDT

Of course the last example is a little bit redundant since you can use
"CaseIt SwitchInst::findCaseValue(const ConstantInt* C)" instead.

Comment 26 Nuno Lopes 2012-05-22 10:09:35 PDT

I guess a missed the discussion on the mailing list.
But for me it would make more sense to have ConstantRangesSet be, well, a set of ConstantRanges.  Why have a new Range class, if we already have ConstantRange?  Furthermore, many optimizations already know about ConstantRanges, so they wouldn't need any change to take advantage of this new case ranges.
Again, for current users, it would be nice to have some way to iterate over a switch's case ranges and get ConstantRanges instead.

Comment 27 Duncan Sands 2012-05-22 10:40:18 PDT

I think the name ConstantRangeSet is unfortunate.  In fact I think it should
just be a completely general set of integers that happens to store in an
efficient way sets that are unions of intervals (this list of intervals being
exposed for the convenience of users).

I think the implementation is also unfortunate, i.e. poor.  Why is it mucking
around with Constants at all, it should only be dealing with APInts?  They
should all have the same bitwidth, so basically the constructor should take the
bitwidth and that's it.  Under the hood it should probably be a SmallVector of
APInt pairs [Low, High) and that's it.  The whole "holder" concept is bad in my
opinion.  Breaking the abstraction with IsWide is bad.  Right at the start of
the file there is

template <bool IsReadonly> struct CRSConstantTypes {
  typedef ConstantInt ConstantIntTy;
  typedef ConstantRangesSet ConstantRangesSetTy;
};

template <>
struct CRSConstantTypes<true> {
  typedef const ConstantInt ConstantIntTy;
  typedef const ConstantRangesSet ConstantRangesSetTy;
};

WTF?  And this is only on the first screen, I didn't have the courage to look
further.

Comment 28 Duncan Sands 2012-05-22 14:42:02 PDT

Alternatively, if ConstantRangesSet is not a real abstract data type, but just a
helper for SwitchInst, then I reckon it should be moved to Instructions.cpp.  In
fact it would be great if lots of the SwitchInst stuff in Instructions.h could
be moved out of line.  The first thing you see at the start of the public
SwitchInst class is a ton of incomprehensible (to me) iterator stuff,

  template <class SwitchInstTy, class ConstantIntTy, class BasicBlockTy>
  class CaseIteratorT {
  protected:

    SwitchInstTy *SI;
    unsigned Index;

  public:

    typedef CaseIteratorT<SwitchInstTy, ConstantIntTy, BasicBlockTy> Self;

    /// Initializes case iterator for given SwitchInst and for given
    /// case number.    
    CaseIteratorT(SwitchInstTy *SI, unsigned CaseNum) {
      this->SI = SI;
      Index = CaseNum;
    }

    /// Initializes case iterator for given SwitchInst and for given
    /// TerminatorInst's successor index.

and so on.  I would love it if that could all be replaced with a forward
declaration, and have everything else hidden away out of line in
Instructions.cpp.

Comment 29 Stepan Dyatkovskiy 2012-05-23 01:39:30 PDT

Duncan, you right million times. Wrapping concept is temporary and looks terrible. So all this stuff like
template <bool IsReadonly> struct CRSConstantTypes {
  typedef ConstantInt ConstantIntTy;
  typedef ConstantRangesSet ConstantRangesSetTy;
};
will gone away. Since at the last stage I'll replace ConstantInt with APInt.

Relative to Instruction.h - OK. I'll make it with forward declared.

The strategy of case ranges implementation:

0. Implement new classes. Classes doesn't affect anything. They still work with ConstantInt base values at this stage.
1. Fictitious replacement of current ConstantInt case values with ConstantRangesSet. Case ranges set will still hold single value, and ConstantInt *getCaseValue() will return it. But additionally implement new method in SwitchInst that allows to work with case ranges. Currenly I think it should be some wrapper that returns either single value or ConstantRangesSet object.
2. Step-by-step replacement of old "ConstantInt* getCaseValue()" with new alternative. Modify algorithms for all passes that works with SwitchInst. But don't modify LLParser and BitcodeReader/Writer. Still hold single value in each ConstantRangesSet object. On this stage some parts of LLVM will use old-style methods, and some ones new-style. (I am here now)
3. After all getCaseValue() usages will removed and whole LLVM and its clients will work in new style - modify LLParser, Reader and Writer. Remove getCaseValue().
4. Replace ConstantInt*-based case ranges set items with APInt ones.

Comment 30 Duncan Sands 2012-05-23 01:44:50 PDT

OK, thanks for explaining!

Comment 31 Stepan Dyatkovskiy 2012-05-23 01:51:27 PDT

Hi Nuno. We considered the idea of ConstantRanges usage. But use cases are
differs. And we need implement additional methods for ConstantRanges that will
bulky. First of all I mean proper operations with sets like a difference (that
may produce two ranges) and union. So currently this idea is casted away.

Comment 32 Stepan Dyatkovskiy 2012-05-26 05:24:46 PDT

> I think the name ConstantRangeSet is unfortunate. 
What about "IntegerSubset" ?

Comment 33 Stepan Dyatkovskiy 2012-05-28 07:44:45 PDT

Hi Duncan,
Relative to Case iterators:
> I would love it if that could all be replaced with a forward
declaration
Done in r157575.

I also implemented IntItem (a wrapper around APInt). That allows to work with cases like with APInt right now where it is possible. More globally it allows to move functionality from ConstantInt to APInt slightly as a series of small commits.

Comment 34 Stepan Dyatkovskiy 2012-05-29 07:28:03 PDT

I renamed ConstantRangesSet to IntegersSubset, and CRSBuilder to IntegersSubsetMapping in r157612.

Comment 35 Bob Wilson 2013-08-30 14:48:32 PDT

What is the status of this work?  The current code for switch instructions seems to have been left in a confusing intermediate state.

Comment 36 Stepan Dyatkovskiy 2013-08-30 15:18:24 PDT

I tried to enhance current switch. But this way was too heavy for such large project. Then I had ask people for confirmation to reject latest switch changes I did, but nobody answers, though every time I'm looking at it I feel kind of shame :-(

It was bad (and first for me) llvm experience. But anyway - it is experience.

Now, I think, the best way to introduce features like this is *new* instruction. It is almost painless for project, since it would touch nothing if people will not use it. It could live like kind of experimental option on initial stage. I sent several patches with new 'switchr' instruction. I'm still waiting for response.

Comment 37 Bob Wilson 2013-08-31 14:18:15 PDT

I agree that changes like this are hard, and it's great that you gained experience from working on it.  But, we really shouldn't leave the code in the state that it is in now. If you don't think you will finish this soon, I think it would be better to revert the switch instruction changes back to what they were before.

Comment 38 Stepan Dyatkovskiy 2013-08-31 23:08:38 PDT

Bob,
I'm really glad to hear that.
I wanted to reject some changes many months ago. I asked about it in llvm-community, just nobody answers :-(
All I need is reviewer for my reject patches, since it is good to get LGTM before changes would be rejected. Could you (or somebody) review my patches then?

Comment 39 Bob Wilson 2013-08-31 23:29:44 PDT

Yes, I will review your patches.

Comment 40 Stepan Dyatkovskiy 2013-09-02 07:33:14 PDT

Bob,
Thanks. The first patch is in this llvm-commits post:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130902/186410.html

Comment 41 Bob Wilson 2013-09-09 14:17:12 PDT

I've reverted Stepan's patches in llvm svn 190328.

It would still be great to add this feature sometime, though!

Comment 42 Shreyansh Chouhan 2019-09-02 20:50:42 PDT

I'd like to work on implementing this, if it is still a desried feature, and if no one else is currently working on this. Might need some help along the way though :)

Comment 43 Stepan Dyatkovskiy 2019-09-03 00:19:26 PDT

(In reply to Shreyansh Chouhan from comment #42)
> I'd like to work on implementing this, if it is still a desried feature, and
> if no one else is currently working on this. Might need some help along the
> way though :)

Don't repeat my mistakes! Don't ever try to enhance existing SwitchInst. For this way supposes simultaneous enhancement of existing passes. A lot of them. You can't just add new feature to SwitchInst and keep the rest of llvm unaffected.

I would rather suggest you to introduce new instruction. And at first just to lower it to set of condition blocks. Then you could turn on its support in passes one by one.

Here is my 6 years old code with sketches for new 'switchr' instruction, which could be helpfull to you:

https://github.com/kaomoneus/llvm.stepan.dyatkovskiy/tree/switchr

Comment 44 Shreyansh Chouhan 2019-09-03 01:15:35 PDT

Hi Stepan,

Thanks a lot for letting me know, and also for sharing the old code, I know it'll be really helpful. The idea of introducing a new instruction seems good to me too, should I start working on it? Or should I wait for other people's review on this idea?

Comment 45 Shreyansh Chouhan 2019-09-03 01:20:43 PDT

(In reply to Stepan Dyatkovskiy from comment #43)
> (In reply to Shreyansh Chouhan from comment #42)
> > I'd like to work on implementing this, if it is still a desried feature, and
> > if no one else is currently working on this. Might need some help along the
> > way though :)
> 
> Don't repeat my mistakes! Don't ever try to enhance existing SwitchInst. For
> this way supposes simultaneous enhancement of existing passes. A lot of
> them. You can't just add new feature to SwitchInst and keep the rest of llvm
> unaffected.
> 
> I would rather suggest you to introduce new instruction. And at first just
> to lower it to set of condition blocks. Then you could turn on its support
> in passes one by one.
> 
> Here is my 6 years old code with sketches for new 'switchr' instruction,
> which could be helpfull to you:
> 
> https://github.com/kaomoneus/llvm.stepan.dyatkovskiy/tree/switchr

Also, does the old code contains your work on changing the switch instruction too? Or does it contain only the switchr sketches?

Missed the reply button last time ':)

Comment 46 Stepan Dyatkovskiy 2019-09-03 01:54:46 PDT

> Also, does the old code contains your work on changing the switch
> instruction too? Or does it contain only the switchr sketches?
> 
> Missed the reply button last time ':)

This code contains only new instruction and related modifications. I haven't checked it for years, but I believe it's compilable and runnable. I tried not to touch rest of llvm on purpose. Any line of new code in core should be done very carefully, for you have a lot of different 

The best way to check what was done there is to compare 'switchr' branch with those-days master branch.

Smth like this:
$ git clone https://github.com/kaomoneus/llvm.stepan.dyatkovskiy.git llvm-switchr
$ cd llvm-switchr
$ git diff master switchr

As for waiting for review. This is definitely should be reviewed and approved. But may be you could start prototyping already, so then you would had something at your hands during review discussion.

Comment 47 Stepan Dyatkovskiy 2019-09-03 01:57:04 PDT

*for you have a lot of different buildbots with different settings and targets. And each of them should be green (successful) all the time.

Also Think about some special trigger which allows to disable maximum of your code for legacy behaviour.

Comment 48 Shreyansh Chouhan 2019-09-03 03:14:45 PDT

(In reply to Stepan Dyatkovskiy from comment #47)
> *for you have a lot of different buildbots with different settings and
> targets. And each of them should be green (successful) all the time.
> 
> Also Think about some special trigger which allows to disable maximum of
> your code for legacy behaviour.

Thanks a lot for your help, I will be checking out the diff between the master and the switchr, and will ask you if I am confused about something.

Comment 49 Stepan Dyatkovskiy 2019-09-03 03:30:30 PDT

(In reply to Shreyansh Chouhan from comment #48)
Sure! You're welcome!

Comment 50 Shreyansh Chouhan 2019-09-04 09:36:41 PDT

Hi Stepan,
I have a few questions regarding the code that you gave me, should I start asking them over here? Or can I ping you on some other platform probably to avoid spamming a lot of simple questions here? Maybe on IRC if you're there?

Also the first question:
I am not able to understand the way you have implemented ranges, here are your comments:

/// If we have flat and ordered collection of numbers, then range
/// [Low, High) may be represented as Low=collection[i], High=collection[i+1].
/// Thus all ranges are covered, also including the wrapped range:
/// Low=collection[collection.size()-1], High=collection[0].

I couldn't really understand what does collection actually represent.

Comment 51 Shreyansh Chouhan 2019-09-04 09:38:03 PDT

(In reply to Shreyansh Chouhan from comment #50)
> Hi Stepan,
> I have a few questions regarding the code that you gave me, should I start
> asking them over here? Or can I ping you on some other platform probably to
> avoid spamming a lot of simple questions here? Maybe on IRC if you're there?
> 
> Also the first question:
> I am not able to understand the way you have implemented ranges, here are
> your comments:
> 
> /// If we have flat and ordered collection of numbers, then range
> /// [Low, High) may be represented as Low=collection[i],
> High=collection[i+1].
> /// Thus all ranges are covered, also including the wrapped range:
> /// Low=collection[collection.size()-1], High=collection[0].
> 
> I couldn't really understand what does collection actually represent.

If collections represents the collection of numbers used with the switchr instruction, what does it being wrapped mean?

Comment 52 Stepan Dyatkovskiy 2019-09-04 11:57:30 PDT

> > /// If we have flat and ordered collection of numbers, then range
> > /// [Low, High) may be represented as Low=collection[i],
> > High=collection[i+1].
> > /// Thus all ranges are covered, also including the wrapped range:
> > /// Low=collection[collection.size()-1], High=collection[0].
> > 
> > I couldn't really understand what does collection actually represent.
> 
> If collections represents the collection of numbers used with the switchr
> instruction, what does it being wrapped mean?

Honestly I should check code. But something I remember.

"Flat", well, I think that just means single-dimensional. So it's not flat, it's rather.. single dimensional :-) Basically I meant just an ordered sequence of numbers.

Wrapped range Low=collection[collection.size()-1], High=collection[0] perhaps means this:

[ [collection.size()-1], +inf ) U (-inf, collection[0] )

So it's universum with the rest of ranges excluded.

And the last statement better to check in code. I can do it only at monday though.

Comment 53 Chris Lattner 2019-09-04 21:41:20 PDT

> Don't repeat my mistakes! Don't ever try to enhance existing SwitchInst.

I don't speak for Stepan, but this advice isn't obviously correct to me.  I understand and agree with his point that "updating all the passes is a lot of work" but it isn't clear to me that the LLVM community with accept a new "switch with ranges" instruction.

It seems better to define a new global "eliminate switch ranges" operation that takes a SwitchInst, and update all passes to either a) work with ranges, or b) call this helper.

Then file a bug that says "phase out all calls to the 'eliminate switch ranges' helper".

Comment 54 Duncan Sands 2019-09-04 23:30:09 PDT

By an "eliminate switch ranges" operation, do you mean rewriting the IR, replacing a switch with ranges with (the equivalent of) a bunch of "if" statements and a switch without ranges?

Comment 55 Chris Lattner 2019-09-05 11:17:27 PDT

> By an "eliminate switch ranges" operation, do you mean rewriting the IR, replacing a switch with ranges with (the equivalent of) a bunch of "if" statements and a switch without ranges?

Yes, exactly.  This feature request doesn't add any new expressive capability to switch, it is merely a space/time optimization.  I agree that a "Big Bang" patch that updates all the things is impractical, so I'd suggest by enhancing the IR, but keeping all the (nontrivial) things that touch switch simple by lowering the rich representation to the old representation.

As a follow-on, each place where this lowering happens can be eliminated, over time, and by experts on the different parts of the compiler.

Comment 56 Shreyansh Chouhan 2019-09-06 02:59:05 PDT

The current switch instruction uses an operand list with
0 being the value to switch, 1 the default block, followed by 
the case values and the block to execute in an alternating fashion.

I am new here, so I might miss something extremely obvious, (and 
I apologize for it in advance) but,
I was thinking of what Chris Lattner earlier suggested, taking
both upper and lower values of the ranges along with the block
to execute as operands. Why was this idea dropped?

Comment 57 Shreyansh Chouhan 2019-09-07 07:46:22 PDT

(In reply to Shreyansh Chouhan from comment #56)
> The current switch instruction uses an operand list with
> 0 being the value to switch, 1 the default block, followed by 
> the case values and the block to execute in an alternating fashion.
> 
> I am new here, so I might miss something extremely obvious, (and 
> I apologize for it in advance) but,
> I was thinking of what Chris Lattner earlier suggested, taking
> both upper and lower values of the ranges along with the block
> to execute as operands. Why was this idea dropped?

As mentioned in the comments before, the reason why
modifying switch instruction is not advisable is because
it'd lead to a lot of changes in the code, which would be really difficult,
and would also result in a huge patch, which will be hard to review.

If that is the case and a new instruction isn't really an option,
then maybe I can clone the GitHub repo
like Stephan, and work on it and ask people to review my work
as and when I am done with specific parts? 
That way the the review process can be broken down.

I understand that it will be a lot of work on
my part too, but if 
possible I'd like to work on it, it'll help me gain a lot 
of experience and knowledge and get me familiar with LLVM (and compilers for that matter,) as mentioned in the issue description.

This might take a lot of time, but I don't think it is 
a feature that is required urgently, or something that
could potentially block a release.

I will apologise once again, in case I've got something wrong (please correct me), and also for bad english, it isn't my first language. :)

Comment 58 Chris Lattner 2019-09-07 20:57:25 PDT

I agree that megapatches are very problematic, but I don't see how this requires one.

There are straight-forward ways to phase this in as a series of incremental patches, using the existing SwitchInst.

1. Start by adding expanding the representation of SwitchInst so it can represent ranges.  Introduce asmprinter and asmparser support for it.  Keep all the existing methods (which should be eventually revised) and make them abort when a case has more than one value.  The entire compiler still works.

2. Add a new method "EliminateCaseRanges" to SwitchInst (or better yet, a utility method somewhere) that rewrites a switch that uses case ranges into one that doesn't.  Go change all the passes (one at at time if you'd prefer) to call it.  Now all the passes accept case ranges.

3. Change clang to start generating case ranges, e.g. for the GNU C case range extension "case 1...4:".

4. Incrementally remove all uses of EliminateCaseRanges, upgrading each pass to work with them directly instead of eliminating them.


I don't see how this would take a megapatch.

Comment 59 Shreyansh Chouhan 2019-09-08 00:08:29 PDT

(In reply to Chris Lattner from comment #58)
> I agree that megapatches are very problematic, but I don't see how this
> requires one.
> 
> There are straight-forward ways to phase this in as a series of incremental
> patches, using the existing SwitchInst.
> 
> 1. Start by adding expanding the representation of SwitchInst so it can
> represent ranges.  Introduce asmprinter and asmparser support for it.  Keep
> all the existing methods (which should be eventually revised) and make them
> abort when a case has more than one value.  The entire compiler still works.
> 
> 2. Add a new method "EliminateCaseRanges" to SwitchInst (or better yet, a
> utility method somewhere) that rewrites a switch that uses case ranges into
> one that doesn't.  Go change all the passes (one at at time if you'd prefer)
> to call it.  Now all the passes accept case ranges.
> 
> 3. Change clang to start generating case ranges, e.g. for the GNU C case
> range extension "case 1...4:".
> 
> 4. Incrementally remove all uses of EliminateCaseRanges, upgrading each pass
> to work with them directly instead of eliminating them.
> 
> 
> I don't see how this would take a megapatch.

I see, that will definitely help with the
megapatch. I will follow these steps then :)

Just to clarify things for myself EliminateCaseRanges for now will 
be replacing switch with ranges with if statements? 
And in step 4 while removing the calls to it everywhere, the actual
handling of ranges will be taken care of?

Comment 60 Stepan Dyatkovskiy 2019-09-08 13:08:17 PDT

> Just to clarify things for myself EliminateCaseRanges for now will 
> be replacing switch with ranges with if statements? 
> And in step 4 while removing the calls to it everywhere, the actual
> handling of ranges will be taken care of?
I think Chris meant that EliminateCaseRange should turn new version of switch into legacy one. But as long as it's impossible to have a fine conversion of that kind for big ranges, perhaps he also meant switch -> if conversion as well.
For EliminateCaseRanges I would propose to introduce some threshold N, if range size < N, that introduce one more switch case, then in switch default case introduce set of if's for bigger ranges.

Comment 61 Stepan Dyatkovskiy 2019-09-08 13:16:27 PDT

(In reply to Chris Lattner from comment #58)
> I agree that megapatches are very problematic, but I don't see how this
> requires one.
> 
> There are straight-forward ways to phase this in as a series of incremental
> patches, using the existing SwitchInst.

What if to modify your approach as follows:
0. Introduce LLVM_CASE_RANGES_ENABLED build option. And guard new pr1255 code by relevant #define scope.
1. Introduce ranges fo SwitchInst.
2. Make EliminateCaseRanges as special pass and make it to be the first in passes queue (let's say on the top).
3. Add support for case ranges for existing passes.
4. Whenever it's possible bring EliminateCaseRanges pass down, until it hits bottom.
5. Remove EliminateCaseRanges.

Comment 62 Shreyansh Chouhan 2019-09-08 13:24:48 PDT

(In reply to Stepan Dyatkovskiy from comment #60)
> > Just to clarify things for myself EliminateCaseRanges for now will 
> > be replacing switch with ranges with if statements? 
> > And in step 4 while removing the calls to it everywhere, the actual
> > handling of ranges will be taken care of?
> I think Chris meant that EliminateCaseRange should turn new version of
> switch into legacy one. But as long as it's impossible to have a fine
> conversion of that kind for big ranges, perhaps he also meant switch -> if
> conversion as well.
> For EliminateCaseRanges I would propose to introduce some threshold N, if
> range size < N, that introduce one more switch case, then in switch default
> case introduce set of if's for bigger ranges.

Thank you for explaining me what the EliminateCaseRanges
method is supposed to do.

What will it be replaced with once we start removing it
from places? Will we still use ifs?

Comment 63 Stepan Dyatkovskiy 2019-09-08 13:34:15 PDT

> What will it be replaced with once we start removing it
> from places? Will we still use ifs?
Not quite understand what you mean. May be rephrase?
If you mean that EliminateCaseRanges is a method of SwitchInst and you remove its calls, then I suppose there should be the only call, or rather the only loop where this call is applied to all SwitchInsts, but correct me if I'm wrong.

The same for pass (when EliminateCaseRanges is a pass).

In any option, I think that it should be removed only, after you added support to all existing passes. Then.. just remove this call from all places, and that's it.

Comment 64 Shreyansh Chouhan 2019-09-08 13:36:57 PDT

(In reply to Stepan Dyatkovskiy from comment #63)
> > What will it be replaced with once we start removing it
> > from places? Will we still use ifs?
> Not quite understand what you mean. May be rephrase?
> If you mean that EliminateCaseRanges is a method of SwitchInst and you
> remove its calls, then I suppose there should be the only call, or rather
> the only loop where this call is applied to all SwitchInsts, but correct me
> if I'm wrong.
> 
> The same for pass (when EliminateCaseRanges is a pass).
> 
> In any option, I think that it should be removed only, after you added
> support to all existing passes. Then.. just remove this call from all
> places, and that's it.

By adding support to all the passes, do you mean that all the passes will now be able to replace switch with ifs in case the range is greater than the threshold value that you earlier mentioned? Without using a call to EliminateCaseRanges? Or without that pass?

Comment 65 Shreyansh Chouhan 2019-09-08 18:44:43 PDT

(In reply to Shreyansh Chouhan from comment #64)
> (In reply to Stepan Dyatkovskiy from comment #63)
> > > What will it be replaced with once we start removing it
> > > from places? Will we still use ifs?
> > Not quite understand what you mean. May be rephrase?
> > If you mean that EliminateCaseRanges is a method of SwitchInst and you
> > remove its calls, then I suppose there should be the only call, or rather
> > the only loop where this call is applied to all SwitchInsts, but correct me
> > if I'm wrong.
> > 
> > The same for pass (when EliminateCaseRanges is a pass).
> > 
> > In any option, I think that it should be removed only, after you added
> > support to all existing passes. Then.. just remove this call from all
> > places, and that's it.
> 
> By adding support to all the passes, do you mean that all the passes will
> now be able to replace switch with ifs in case the range is greater than the
> threshold value that you earlier mentioned? Without using a call to
> EliminateCaseRanges? Or without that pass?


As in every pass will be able to do what EleminateCaseRanges was doing?

If this is the case then I guess keeping EliminateCaseRanges as a 
method would help in later modification of
the passes. :)

Comment 66 Chris Lattner 2019-09-08 21:42:24 PDT

Please do not introduce #ifdef's in the codebase, there should be no need for them.

>I think Chris meant that EliminateCaseRange should turn new version of switch
into legacy one.

Correct.

> But as long as it's impossible to have a fine conversion of
that kind for big ranges, perhaps he also meant switch -> if conversion as
well.

I don't see a reason to worry about this unless some existing producer (e.g. clang's IRGen) would produce an if range.  The simple way to phase that in is to just not have clang (or anything else) produce a switch range in a case that it would produce something smarter than the equivalent lowered switchinst.

We can "take advantage" of case ranges when the whole compiler is more or less upgraded to deal with them.

Comment 67 Stepan Dyatkovskiy 2019-09-09 00:02:31 PDT

> I don't see a reason to worry about this unless some existing producer (e.g.
> clang's IRGen) would produce an if range.  The simple way to phase that in
> is to just not have clang (or anything else) produce a switch range in a
> case that it would produce something smarter than the equivalent lowered
> switchinst.

I think that perhaps we should consider two cases of EliminateCaseRanges use.

1. When we still have some set of passes which don't know how to deal with case ranges. In this case we somehow should lower switch. And in this case we should turn ranges like [1..2) into "case 1:", and turn ranges like [1..100) into "if (c >= 1 && c < 100)". 

2. When all passes support case ranges, but backend doesn't. Then avoid generating case ranges by clang, and if after all we got some set of case ranges, wait until clang's IRGen turn.

Am I right?

Comment 68 Stepan Dyatkovskiy 2019-09-09 00:31:41 PDT

> As in every pass will be able to do what EleminateCaseRanges was doing?

Note quite. EliminateCaseRanges supposes that it produce code with NO case ranges.

Whilst, if pass support case ranges, then that pass knows how to interpret it, and perhaps can even get some benefits out of case-ranges representation form. E.g. sparse constant propogation or loop unswitch pass could have some benefits in theory (basically all data flow analyzing passes could).
So, if all passes knows what to do with case ranges, there's probably no need in EliminateCaseRanges at all, until perhaps instruction selection phase.

Some targets could also support case ranges directly though.

Comment 69 Shreyansh Chouhan 2019-09-12 07:15:47 PDT

I have started working on adding case range support to the SwitchInst class.

I am currently taking the approach taken by Stepan in the code he provided, 
i.e., moving case ranges out of the operand list, and using the operand list
for storing only the value to switch on and the basic blocks.

I hope the approach is still relevant?