Expected Length of Longest Streak (Question by e-mail)

  • Ganchrow
    SBR Hall of Famer
    • 08-28-05
    • 5011

    #1
    Expected Length of Longest Streak (Question by e-mail)
    ... for this strategy to work I need to know how many losses I can expect to see in a row.

    So if I have a 33.333% chance of winning each of my games at +200 what's the most losses in a row I can expect on average over 1500 games?
    I think we all know where the above is likely heading and it's nowhere good. Nevertheless, the underlying question remains valid -- namely how would one calculate the expected length of the longest streak over a series of given length with given associated single-game win probability? For good measure let's also throw in the standard deviation of the max series length as well as the fair price on the central over/under.

    Before you pull out the pen and paper, however, I'll caution you that no closed-form solution to the problem exists, so it's probably best solved by writing a computer program to determine a solution for any given inputs (in this case a win probability of 33⅓% and a series length of 1,500 games).

    There's been some talk as of late on how best to get started with quantitative programming. Personally, I've found the best way to learn how to write a certain type of program is just to dive right in. So that said, if anyone wants to give this a whirl, they're certainly more than welcome. Otherwise, I'll post my Perl solution to this problem tomorrow. (If you want to download Perl, you can get the latest Windows version here and can find other versions here.)

    The mechanics of the solution, btw, are pretty straightforward. I detailed the solution to the problem of calculating the probability of seeing a streak of at least a given length in this post.
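
    For anyone wanting a starting point, here is a minimal sketch of one common way to write the "streak of at least a given length" recurrence, checked against brute-force enumeration for a small series. It is not necessarily the formulation used in the linked post, and the names streak_prob and brute_force are just illustrative.

```perl
use strict;
use warnings;

# Probability of at least one run of $k or more consecutive wins in $n games,
# each won independently with probability $p.
sub streak_prob {
    my ($n, $k, $p) = @_;
    return 1 if $k <= 0;
    return 0 if $k > $n;
    my $q = 1 - $p;
    my @P = (0) x ($n + 1);          # $P[$i] stays 0 for every $i < $k
    $P[$k] = $p ** $k;
    for my $i ($k + 1 .. $n) {
        # A first qualifying run ends exactly at game $i only if there is no
        # run in the first $i-$k-1 games, game $i-$k is a loss, and the
        # following $k games are all wins.
        $P[$i] = $P[$i - 1] + (1 - $P[$i - $k - 1]) * $q * $p ** $k;
    }
    return $P[$n];
}

# Brute-force check, only practical for small $n:
# enumerate all 2**$n win/loss sequences and add up the qualifying ones.
sub brute_force {
    my ($n, $k, $p) = @_;
    my $total = 0;
    for my $mask (0 .. 2 ** $n - 1) {
        my ($run, $longest, $prob) = (0, 0, 1);
        for my $game (0 .. $n - 1) {
            if (($mask >> $game) & 1) { $run++;   $prob *= $p;     }
            else                      { $run = 0; $prob *= 1 - $p; }
            $longest = $run if $run > $longest;
        }
        $total += $prob if $longest >= $k;
    }
    return $total;
}

printf "recurrence:  %.6f\n", streak_prob(4, 2, 0.5);    # 0.500000
printf "brute force: %.6f\n", brute_force(4, 2, 0.5);    # 0.500000
```

    With 4 games at 50%, 8 of the 16 possible sequences contain two straight wins, which is why both lines print 0.500000.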
  • Ganchrow
    SBR Hall of Famer
    • 08-28-05
    • 5011

    #2
    I can generally tell when I've hit on a topic of great interest to the Think Tank at-large when it receives no replies. Nevertheless, it's my hope that the incredibly exciting and sexy Perl script that follows will generate some intense interest.

    So without further ado:

    Code:
    #!perl
    
    # Author: Ganchrow
    # E-mail ganchrow AT sbrforum.com
    # website: http://forum.sbrforum.com/handicapper-think-tank/
    
    # save file as streak.pl
    # run from a command prompt in the directory
    # where this file is saved as:
    # perl streak.pl series_length single_game_prob
    # where "series_length" and "single_game_prob"
    # correspond to the figures of interest
    
    use warnings;    # tells the compiler to alert of all warnings
    use strict;        # tells the compiler to require the declaration of all variables
    
    my ($series_length, $single_prob) = &get_args(\@ARGV);
    
    main();
    exit 0;    # a zero exit status signals success
    
    sub main {
        # main subroutine that gets and displays output data
        print STDERR "Series Length: $series_length\n";
        printf STDERR "Single Probab: %f%%\n", $single_prob*100;
        print STDERR "-" x 25 . "\n\n";
        select((select(STDERR), $| = 1)[0]);    # flush standard error output buffer
    
        my ($exact_ra, $or_greater_ra) = &calc_streak_histogram($series_length, $single_prob);
        my $ev = &calc_ev($exact_ra);
        my $stddev = &calc_stddev($exact_ra, $ev);
        my ($over, $over_prob) = &calc_median_over($exact_ra, $or_greater_ra);
        printf "EV:\t%0.5f\n", $ev;
        printf "StdDev:\t%0.5f\n", $stddev;
        printf ("Ov%0.1f:\t%0.5f%%\n", $over, $over_prob*100);
        printf ("Un%0.1f:\t%0.5f%%\n", $over, (1-$over_prob)*100);
    }
    
    # quantitative subroutines
    sub calc_streak_prob($$$) {
        # calculates the probability of experiencing a streak of at
        # least length $streak_length over a series of $series_length
        # games, given a single game probability of $single_prob
        # this subroutine is meant to be called from calc_streak_histogram
        # and only verifies input data integrity for $streak_length
        my ($series_length, $streak_length, $single_prob,) = @_;
        my (@tot_probs, $fixed_prob,);
    
        if ($streak_length > $series_length) {
            return 0;
        } elsif ($streak_length <= 0) {
            return 1;
        }
    
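        $#tot_probs = $series_length;    # indices 0 .. $series_length
        $tot_probs[$streak_length] = $single_prob ** $streak_length;
        # probability of a loss followed by $streak_length straight wins
        $fixed_prob = $tot_probs[$streak_length] * (1-$single_prob);
        for (my $i = $streak_length+1; $i <= $series_length; $i++) {
            # a first qualifying streak ends exactly at game $i only if there is
            # none within the first $i-$streak_length-1 games, game $i-$streak_length
            # is a loss, and the next $streak_length games are all wins; entries below
            # index $streak_length are undefined and stand for a probability of 0
            $tot_probs[$i] = $tot_probs[$i-1] + $fixed_prob  * (
                       defined $tot_probs[$i-$streak_length-1] ?
                         1-$tot_probs[$i-$streak_length-1] : 
                         1
                     )
            ;
        }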
        return $tot_probs[$series_length];
    }
    
    sub calc_streak_histogram($$) {
        # calculates the probability of all streaks of length from
        # 0 to $series_length. returns two references, the first to an array of exact
        # probabilities, and the second to an array of "or greater" probs
        # this subroutine verifies that $series_length and $single_prob are both valid values
        my ($series_length, $single_prob,) = @_;
    
        my (@exact_probs, @or_greater_probs,);
        # @exact_probs      -- array holding probabilities of a streak of exactly given length
        # @or_greater_probs -- array holding probabilities of a streak of given length or longer
    
        {
            use integer;
            $series_length = int($series_length);
        }
    
        if ($single_prob < 0 || $single_prob > 1) {
            die "Single game probability must be between 0 and 1\n";
        }
    
        # sets the size of the given arrays (indices 0 .. $series_length)
        $#exact_probs = $series_length;
        $#or_greater_probs = $series_length;
    
        for (my $streak_length = $series_length; $streak_length >= 1; $streak_length--) {
            # This loop fills in the various probabilities for the @exact_probs
            # and @or_greater_probs arrays. We start at the maximum streak length,
            # which is equal to the series length, and then work our way backwards.
            # This is done so that we can first call the calc_streak_prob sub
            # for the largest streak length and then subtract that result out for the
            # next largest streak length so as to calculate the @exact_probs in a single pass.
            # If we were more concerned with speed we'd probably recode this in a faster
            # language such as C or C++ and then call it from a DLL (in Windows).
            # Another way to speed this up would be to calculate the EV, Std. Dev., and
            # over/under values in this loop so as to avoid duplicating this work later.
            # This, however, wouldn't be much of a time savings as most of the time is really
            # spent in the multiple calls to &calc_streak_prob().
    
            # display current count of calls to &calc_streak_prob() subroutine
            unless ($streak_length%50) {
                # only update display once every 50 trials
                print STDERR ( "\b" x length($streak_length+50) );
                print STDERR ( " "  x length($streak_length+50) );
                print STDERR ( "\b" x length($streak_length+50) );
                print STDERR ( "$streak_length" );
                select((select(STDERR), $| = 1)[0]);    # flush output buffer
            }
    
            $or_greater_probs[$streak_length] =
                &calc_streak_prob($series_length, $streak_length, $single_prob)
            ;
            $exact_probs[$streak_length] =
                $or_greater_probs[$streak_length] -
                  ($streak_length < $series_length ? $or_greater_probs[$streak_length+1] : 0 )
            ;
        }
        print STDERR "\b\b";
        select((select(STDERR), $| = 1)[0]);    # flush output buffer
    
        $or_greater_probs[0] = 1;            # we set the "or greater" prob for a streak of 0 to 100%
        $exact_probs[0] = 1-$or_greater_probs[1];
    
        # we return references (aka "pointers") to the two array data structures
        return (\@exact_probs,\@or_greater_probs);
    }
    
    # descriptive statistics subs
    # as noted previously these could be rolled
    # into calc_streak_histogram for slightly faster
    # execution, but at the cost of clarity and modularity
    sub calc_ev(\@) {
        # calculates the ev
        # input is an array reference (a "pointer") that maps exact
        # streak length to probabilities
        my ($exact_ra) = @_;
        my $ev = 0;
        foreach (1 .. $#$exact_ra) {
            # we start the above loop at a streak length of 1
            # because a streak length of 0 will naturally not impact
            # the EV
            $ev += $exact_ra->[$_] * $_;
        }
        return $ev;
    }
    
    sub calc_stddev(\@$) {
        # calculates the standard deviation
        # first input argument is an array reference that maps exact
        # streak length to probabilities, second (optional) is a scalar
        # representing the mean
        my ($exact_ra, $mean) = @_;
        $mean = &calc_ev($exact_ra) unless defined $mean;    # if the mean is not specified we recalculate it
        my $variance = 0;
        foreach (0 .. $#$exact_ra) {
            $variance += $exact_ra->[$_] * ($_ - $mean)**2;
        }
    
        # check for positive values of $variance to avoid weird rounding errors
        # that might result in taking the square root of a negative number
        return($variance > 0 ? sqrt($variance) : 0);    
    }
    
    sub calc_median_over(\@\@) {
        # calculates the streak length and associated over probability closest to 50%
        # locates the cumulative probability closest to 50% (some of which may
        # represent half points) and returns the streak length closest to a cumulative
        # probability of 50%, as well as the exact probability figure
        # first and second input arguments are both array references, the first
        # mapping exact streak length to probabilities, the second mapping a streak length
        # of at least the specified length to probabilities
    
        my ($exact_ra, $or_greater_ra) = @_;
        my $closest_over_prob = 0;
        my $closest_streak_length = 0;
        foreach (my $streak_length = 0; $streak_length <= $#$exact_ra; $streak_length += 0.5) {
            my $over_prob = &calc_over_prob($exact_ra, $or_greater_ra, $streak_length);
            if (abs($over_prob - 0.5) < abs($closest_over_prob - 0.5)) {
                $closest_over_prob = $over_prob;
                $closest_streak_length = $streak_length;
            }
            last if ($over_prob <= 0.5);
        }
        return($closest_streak_length, $closest_over_prob);
    }
    
    sub calc_over_prob(\@\@$) {
        # Calculates the over probability for the given streak
        # This properly handles half points, but does not
        # check that $streak_length is >= 0
        # first and second input arguments are both array references, the first
        # mapping exact streak length to probabilities, the second mapping a streak length
        # of at least the specified length to probabilities
        # third argument is a scalar containing the streak length of interest (must be
        # an integer or half integer)
        my ($exact_ra, $or_greater_ra, $streak_length) = @_;
        my $int_streak_length = int($streak_length);
        my $over_prob = ($or_greater_ra->[$int_streak_length+1] || 0);
        if ( $streak_length - $int_streak_length < 0.01 ) {
            $over_prob /= ( (1-$exact_ra->[$int_streak_length]) || 1 );    # integer streak
        }
        return $over_prob;
    }
    
    # startup subroutines
    sub get_args(\@) {
        # Gets arguments ($series_length and single game win prob) from command line
        # and verifies their integrity. Displays usage and exits if invalid
        my(@myARGV) = @{$_[0]};
    
        if ($#myARGV < 1 || grep {/\?|h/} @myARGV) {
            &display_usage_and_exit();
        }
    
        # removes commas and converts %ages and fractions to decimals
        @myARGV = map { s/,//g; s!([0-9]+)%$!$1/100!x; eval "$_"; } @myARGV;
    
        my ($series_length, $single_prob) = @myARGV;
    
        if ($single_prob > 1 || $series_length != int($series_length)) {
            my $temp = $series_length;
            $series_length = $single_prob;
            $single_prob = $temp;
        }
    
        if ($single_prob > 1 || $single_prob < 0 || $series_length != int($series_length)) {
            &display_usage_and_exit();
        }
        return ($series_length, $single_prob);
    }
    
    sub display_usage_and_exit {
        # displays proper script arguments
        # and then exits
        warn "Usage: perl $0 series_length single_game_prob\n";
        exit 0;
    } 
    


    A '#' at the beginning or near the end of a line denotes a comment (similar to an old school BASIC 'rem' statement). These comments should explain what's within the relevant piece of code.

    From a command prompt in the directory where the file is saved run the script as:
    perl streak.pl 1500 2/3

    This should yield output of:
    Code:
    Series Length: 1500
    Single Probab: 66.666667%
    -------------------------
    
    EV:     16.25075
    StdDev: 3.15546
    Ov15.5: 53.26133%
    Un15.5: 46.73867%
    This tells us that given a single-game win probability of 33⅓% (that is, a single-game loss probability of 2/3, which is the figure passed to the script), over 1,500 games the expected length of the longest consecutive losing streak would be about 16.25 games, with a standard deviation of about 3.16 games.
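
    A quick way to sanity-check figures like these is a Monte Carlo simulation. The sketch below is a different technique from the script above (random sampling rather than an exact calculation); the names longest_streak and simulate and the choice of 2,000 trials are illustrative assumptions.

```perl
use strict;
use warnings;

# Longest run of losses in one simulated series of $n games,
# where each game is lost independently with probability $p_loss.
sub longest_streak {
    my ($n, $p_loss) = @_;
    my ($run, $longest) = (0, 0);
    for (1 .. $n) {
        if (rand() < $p_loss) {
            $run++;
            $longest = $run if $run > $longest;
        } else {
            $run = 0;
        }
    }
    return $longest;
}

# Sample mean and standard deviation of the longest streak over $trials series.
sub simulate {
    my ($n, $p_loss, $trials) = @_;
    my ($sum, $sum_sq) = (0, 0);
    for (1 .. $trials) {
        my $len = longest_streak($n, $p_loss);
        $sum    += $len;
        $sum_sq += $len ** 2;
    }
    my $mean = $sum / $trials;
    my $var  = $sum_sq / $trials - $mean ** 2;
    return ($mean, $var > 0 ? sqrt($var) : 0);
}

my ($mean, $sd) = simulate(1500, 2/3, 2000);
printf "simulated mean: %.2f, std dev: %.2f\n", $mean, $sd;
```

    The output changes from run to run, but with a couple of thousand trials the mean should land within a few hundredths of a game or so of the EV reported above.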

    The central over/under would be a streak of 15.5 games, with fair prices corresponding to the over and under probabilities shown above (roughly 53.3% and 46.7%, respectively).
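
    Turning those probabilities into prices is simple arithmetic. Here is a minimal sketch that assumes American-style odds are wanted (the format of the +200 in the original question) and uses the two probabilities from the posted output; the function name fair_american_odds is made up for illustration.

```perl
use strict;
use warnings;

# Convert a win probability (strictly between 0 and 1) into a no-vig American price.
sub fair_american_odds {
    my ($prob) = @_;
    die "probability must be strictly between 0 and 1\n"
        if $prob <= 0 || $prob >= 1;
    return $prob >= 0.5
        ? -100 * $prob / (1 - $prob)     # favorite: amount risked to win 100
        :  100 * (1 - $prob) / $prob;    # underdog: amount won per 100 risked
}

printf "Over 15.5:  %+.0f\n", fair_american_odds(0.5326133);   # -114
printf "Under 15.5: %+.0f\n", fair_american_odds(0.4673867);   # +114
```

    As a check on the convention, a probability of 1/3 comes out at +200, matching the price quoted in the original e-mail.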

    My hope here is that this will serve to demonstrate how one might go about solving a fairly simple problem programmatically (even if the solution to said problem is of at best minor relevance to advantage sportsbetting).

    As always, if at all interested, please feel free to point out any bugs, gross inefficiencies, or suggested improvements.
    • Data
      SBR MVP
      • 11-27-07
      • 2236

      #3
      Originally posted by Ganchrow
      the incredibly exciting and sexy Perl script
      The script is useless without pictures.
      • Ganchrow
        SBR Hall of Famer
        • 08-28-05
        • 5011

        #4
        Originally posted by Data
        The script is useless without pictures.
        Here you go:

        [ATTACH]2662[/ATTACH]

        25 points to whomever is able to put a name to this exciting and sexy face.

        And no ... it's not my face (or my shirt). I should be so lucky.
        • Art Vandeleigh
          SBR MVP
          • 12-31-06
          • 1494

          #5
          Larry Wall.

          I want my 25 points in one lump sum, after taxes.
          • Ganchrow
            SBR Hall of Famer
            • 08-28-05
            • 5011

            #6
            Originally posted by Art Vandeleigh
            Larry Wall.

            I want my 25 points in one lump sum, after taxes.
            Well done. Points are en route.

            Larry Wall, Godfather of Perl.

            Larry Wall once penned the Three Virtues of a Programmer:
            1. Laziness - The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it. Hence, the first great virtue of a programmer. Also hence, this book. See also impatience and hubris.
            2. Impatience - The anger you feel when the computer is being lazy. This makes you write programs that don't just react to your needs, but actually anticipate them. Or at least pretend to. Hence, the second great virtue of a programmer. See also laziness and hubris.
            3. Hubris - Excessive pride, the sort of thing Zeus zaps you for. Also the quality that makes you write (and maintain) programs that other people won't want to say bad things about. Hence, the third great virtue of a programmer. See also laziness and impatience.