Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linear congruential generator generates invalid values #28213

Closed
llvmbot opened this issue May 23, 2016 · 17 comments
Closed

Linear congruential generator generates invalid values #28213

llvmbot opened this issue May 23, 2016 · 17 comments
Labels
bugzilla Issues migrated from bugzilla libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented May 23, 2016

Bugzilla Link 27839
Resolution FIXED
Resolved on May 24, 2021 21:04
Version 3.7
OS All
Reporter LLVM Bugzilla Contributor
CC @hfinkel,@mclow,@utsumi-fj,@zoecarver

Extended Description

The linear congruential generator generates numbers outside the range of min() and max() for some values of a, c, and m. For example, std::linear_congruential_engine <unsigned long long, 25214903917ull, 11ull, 1ull<<48 > fails badly. These are the values used by drand48.

#include <random>
#include <iostream>
#include <cstdint>

using drand48_engine = std::linear_congruential_engine <
    std::uint64_t, 25214903917ull, 11ull, (1ull<<48) >;

int main ()
{
    drand48_engine rng;
    int pass_fail[2] = {0,0};

    for (int ii = 0; ii < 1000000; ++ii)
    {
        auto num = rng();
        ++pass_fail[num < (1ull<<48)];
    }
    std::cout << "#pass=" << pass_fail[1] << " #fail=" << pass_fail[0] << '\n';
}
@llvmbot
Copy link
Collaborator Author

llvmbot commented May 23, 2016

Also see http://stackoverflow.com/questions/37382139/os-x-libc-stduniform-real-distribution-bug . The title of that question is misleading. The bug is in std::linear_congruential_engine, not std::uniform_real_distribution.

@llvmbot
Copy link
Collaborator Author

llvmbot commented May 23, 2016

The problem lies in various specializations of class __lce_ta in the C++ library header 'random'. Four specializations use Schrage's algorithm and do not check whether the assumptions of that algorithm apply. Those assumptions do not apply for the values for a, c, and m outlined above. You can't blindly use Schrage's algorithm this way because the C++ standard does not place any constraints on a, c, and m other than they are of the same unsigned type and (m==0) || ((a<m) && (c<m)) .

@mclow
Copy link
Contributor

mclow commented May 23, 2016

This post http://numerical.recipes/forum/showthread.php?t=685 suggests that Schrage's algorithm requires m % a < m / a.

This is not true for the values in this bug; and we don't currently handle that case.

@mclow
Copy link
Contributor

mclow commented May 23, 2016

If you need a fix in a hurry, you can make a local change.

Change , line 1671 from:

      bool _MightOverflow = (__a != 0 && __m != 0 && __m-1 > (_Mp-__c)/__a)>

to:
bool _MightOverflow = (__a != 0 && __m != 0 && __m-1 > (_Mp-__c)/__a) && (__m % __a < __m / __a)>

This will fix your specific set of numbers; but I'm not yet convinced that this is a complete fix.

@llvmbot
Copy link
Collaborator Author

llvmbot commented May 23, 2016

Marshall, that does work in this case. I don't think that is the right fix. There are three cases, based on the following three conditions:

  1. Whether the calculation of a*x + c might overflow, and
  2. Whether overflow might create incorrect results on computing (a*x + c) % m, and
  3. Whether using Schrage's algorithm is applicable.

The existing code base checks for condition #​1. Condition #​2 checks whether even if overflow (actually, modulo 2^N arithmetic; I'll call it "overflow") does occur, it won't hurt. That's where the modulus is 2^M where M<=N.

So:

  • Use (a*x+c)%m if overflow can't occur or if it doesn't matter if if does occur.
  • Use Schrage's algorithm if you must and you can.
  • Do something that satisfies the concept of a Generator for goofy values of a, c, and m.

I'd argue using (ax+c)%m for now is okay for now (it's better than what you have), and petitioning the C++ committee to allow use of (ax+c)%m exclusion in the case of goofy values. Linear congruential is supposed to be fast, and not that good. At least using (a*x+c)%m does satisfy the concept of being a Generator.

@llvmbot
Copy link
Collaborator Author

llvmbot commented May 23, 2016

I don't need a fix. This was a result of me answering a question on stackoverflow on a reported bug in the LLVM uniform real distribution. I chased the bug down to linear congruential generator, and then felt compelled to write a bug report. (Otherwise a night's lost sleep would have gone by unjustified.)

I use Mersenne Twister when I need simulation-quality random numbers. The minimum standard generators are fine when I'm doing something quick and dirty.

@llvmbot
Copy link
Collaborator Author

llvmbot commented May 23, 2016

I don't think the condition is __m % __a < __m / __a . There's a hidden assumption of infinite precision arithmetic in the implementations Schrage's algorithm. The calculations of __t0 and __t1 must not overflow. Testing whether __a < __m / __a will do that (and it's a simpler test). Note that __m % __a < __m / __a is a consequence of this simpler test.

@zoecarver
Copy link
Contributor

I agree, (a * x + c) % m should be used at a minimum. It is much faster and does not have the same problem as Schrage's algorithm. David, what are some of the cases in which it fails ("goofy values")?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jul 6, 2019

@​ZoeCarver, one specific set of goofy values is in the problem description. I linked to question at StackOverflow in comment #​1 (the comment that immediate follows the description). Here's a repeat of that link: https://stackoverflow.com/questions/37382139/os-x-libc-stduniform-real-distribution-bug. Note well: The title of that StackOverflow question is incorrect. The problem lies with the LLVM linear congruential generator rather than the LLVM implementation of std::uniform_real_distribution.

It's been three years since I filed this report. I found at the time that the GNU C++ version does not exhibit this bug. Their linear congruential generator code is a bit more convoluted than is the LLVM code but in the process the GNU version explicitly avoids this problem and similar ones.

I did not copy the GNU code as a solution because I didn't know whether it was kosher to copy code from one free compiler to another. Moreover, I did not feel it was appropriate for me to implement a solution because having seen the GNU implementation gave an idea what a solution should look like.

@zoecarver
Copy link
Contributor

Thank you for the response! I think your qualms about implementing this yourself are fair. I read the SO post (very interesting by the way) but cannot seem to get my simple implementation to fail your example test. I think the clear next step is to generate some tests for std::linear_congruential_engine (ideally some tests with goofy values). Given how much you know about this, it would be very helpful if you could provide some more cases where either implementation might fail.

Here is my test implementation:

template<class T, T a, T c, T m>
class lcg
{
T seed;
public:
lcg() : seed(0) {}
T operator()() { seed = (a * seed + c) % m; return seed; }
};

using drand48_engine = lcg<std::uint64_t, 25214903917ull, 11ull, (1ull<<48)>;

@zoecarver
Copy link
Contributor

Also, I think it should be mentioned that the current Schrage's implementation could be fixed by adding mod m. I don't think this is the route that should be taken because LCG is supposed to be fast, but it is an option. Here is an example:

T q = m / a;
T r = m % a;
T k = seed / q;
seed = a * (seed - k * q) - r * k;

if (seed < 0) seed += m;
return seed % m;

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jul 6, 2019

Zoe, I've verified that the code in problem description does indeed report failures for all but 24 of the million generated numbers. Compiler details:

c++ --version
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.6.0

I've also verified that the gnu c++ compiler works just fine: zero failures.

The problem is that the LLVM implementation uses Schrage's algorithm where doing so is not valid. The LLVM implementation tests whether a*x + c might overflow and always uses Schrage's algorithm if this is the case. Schrage's algorithm uses q = m/a and r = m mod a and is valid only if r <= q. In the case of the values used by the rand48 family, where m=2^48, a=25214903917, and c=11, this results in q=11163 and r=1004285185, making Schrage's algorithm invalid.

There is a workaround for m=2^48 (or any other power of 2 less than or equal to 2^64), which is that overflow doesn't matter with unsigned long long when m is such a power of 2 less than 2^64. Just use x = (a*x+c) mod m in such cases. (A lot of linear congruential generators use m=2^n, so this is not an off the wall workaround).

Another possible workaround is to use 128 bit integers, if those are available, to handle large (but not too large) values of a and m.

Finally, what if overflow might happen and matters, and Schrage's algorithm is invalid? I would suggest croaking. (This is what gcc does.) While the standard seems to imply that LCG should work with all valid values of a, c, and m, that to me is a bug in the standard. When someone uses extremely oddball values of a, c, and m, and then complains, disposition the problem report as "Will not fix."

@zoecarver
Copy link
Contributor

I wasn't clear in my comment. I meant that I couldn't reproduce failure using (a * seed + c) % m.

I think the solution you proposed is a good one-- It allows for the speed of (a * x + c) % m and the safety of Schrage's algorithm. I will create a patch implementing that.

What are some oddball values which do not work in either case?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jul 8, 2019

Given how unsigned arithmetic works in C++, there's no way (ax+c)%m can result in a value >= to m when a, c, m, and x are all unsigned long long. Thus simply calculating (ax+c)%m coupled with the other member functions of std::linear_congruential_engine will always make that class template qualify as a RandomNumberEngine.

There are however many ways in which an extended precision calculation of (ax+c) mod m will differ from unsigned long long calculation of (ax+c)%m. Strictly speaking, such values would make the use of (a*x+c)%m fail as a linear congruential engine. This is a (mostly) benign failure as the purported linear congruential engine still works as a RandomNumberEngine. Probably a very bad one, but meh. Even "good" linear congruential engines are not very good.

The real problem occurs where an implementation of linear congruential engine fails to satisfy the requirements of a RandomNumberEngine such as g() returning a value outside of the range [g.min(), g.max()], which is exactly what can happen where Schrage's algorithm is used when using that algorithm is invalid.

None of the solutions are good for goofy values of a, c, and m. The GNU solution is a static assertion that the values are not "goofy". The standard does not allow this. Zoe Carter's proposed solution is to use (a*x+c)%m for such values. The standard does not allow that, either. The best solution, to me, is to request a retroactive change to the standard that

  • Clarifies the meaning of UintType in std::linear_congruential_engine. For example, the LLVM implementation silently falls apart when UintType is __int128. LCGs are supposed to be fast, which implicitly means using a smallish unsigned integer type.
  • Specifies that some values of UintType, a, c, and m may result in undefined behavior (the "Get Out Of Jail, Free" card for an implementation vendor).

@zoecarver
Copy link
Contributor

Here is my fix D65041 https://reviews.llvm.org/D65041

@zoecarver
Copy link
Contributor

Resolved by 31dfaff / D65041.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#34206

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

No branches or pull requests

3 participants