At work, several bugs were filed due to HTTP requests failing with internal server errors. They were all caused by the same exception being raised:

#<FiberError: can't set a guard page: Cannot allocate memory>

This looks like we’re running out of memory on the box. However, when I checked, the box was not using much memory: usage was never more than 50% at the time the exception was raised. Then, I came across this bug report: https://bugs.ruby-lang.org/issues/17263. The reported bug is unrelated, but one of the replies gave the cause of the error:

Regarding “can’t set a guard page” it’s because of your system is limiting the number of memory mapped segments. Each fiber stack requires a guard page and this is considered a separate memory map entry.

https://bugs.ruby-lang.org/issues/17263#note-1

Why can’t it allocate memory?

The specific code block in Ruby that raises FiberError is:

if (mprotect(page, RB_PAGE_SIZE, PROT_NONE) < 0) {
    munmap(allocation->base, count*stride);
    rb_raise(rb_eFiberError, "can't set a guard page: %s", ERRNOMSG);
}

https://github.com/ruby/ruby/blob/75ed086348da66e4cfe9488ae9ece5462dd2aef9/cont.c#L549-L552

The call to mprotect(2) is creating a guard page, and a return value less than zero means it failed to change the memory protection on the page. A guard page is a memory page with no access permissions that separates a Fiber’s stack from the rest of the Ruby process’s memory. It prevents Fibers from accidentally interacting due to a memory overrun, because any attempt to access memory in the guard page results in a segmentation fault.

ERRNOMSG, which Ruby uses to build the exception message, evaluates to “Cannot allocate memory” when mprotect(2) fails with the error value ENOMEM. There are three situations that cause this error value to be returned:

  1. The kernel didn’t have enough memory available to allocate its internal structures. This didn’t happen because we saw no memory pressure on the box.
  2. The address range [page, page + RB_PAGE_SIZE) isn’t valid for the process, or includes pages that have not been mapped (via mmap(2)). This would indicate a bug in Ruby, which, while possible, seemed exceptionally unlikely.
  3. Changing the memory protection on a region would result in too many memory mappings. This seemed like the most likely: we were allocating too many Fibers. It is also what the reply on the Ruby bug report stated.

This call to mprotect(2) tries to create new memory mapped segments because it is removing read and write access from a memory range to create the guard page. By default, Linux limits each process to 65530 memory mapped segments, but you can change the limit with:

sysctl -w vm.max_map_count=x

If we were creating lots of Fibers then it’s possible that we were running into this limit, because each guard page creates two new memory mapped segments. A memory mapped segment is a contiguous block of memory with the same protections. By changing the protection of a page in the middle of a segment from read/write to none, we end up with three segments: read/write, none, read/write. Thus, where there was once one memory mapped segment, there are now three.

FiberError is being raised because there are too many memory mapped segments in the Ruby process due to too many Fibers being created.
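One way to observe this from inside Ruby is to count the process’s memory mappings before and after parking a batch of Fibers. The following is a minimal sketch, not a diagnostic tool: it is Linux-specific (it reads /proc/self/maps and /proc/sys/vm/max_map_count), it assumes CRuby, and the helper names mapping_count and park_fibers are made up for illustration. The exact growth per Fiber depends on the Ruby version and on how its fiber pool lays out stacks.

```ruby
# Number of memory mappings this process currently has (one per line of
# /proc/self/maps), or nil where that file is not available.
def mapping_count(path = "/proc/self/maps")
  return nil unless File.readable?(path)
  File.foreach(path).count
end

# Create n Fibers and park each one at Fiber.yield so it holds on to its stack.
# (park_fibers is a hypothetical helper, not part of Ruby.)
def park_fibers(n)
  Array.new(n) do
    fiber = Fiber.new { Fiber.yield }
    fiber.resume # runs the Fiber until Fiber.yield, so a stack is in use
    fiber
  end
end

before = mapping_count
parked = park_fibers(200)
after = mapping_count

if before && after
  puts "mappings before: #{before}, after: #{after}"
  puts "extra mappings per Fiber: #{(after - before).fdiv(parked.size).round(2)}"

  limit_path = "/proc/sys/vm/max_map_count"
  puts "vm.max_map_count: #{File.read(limit_path).to_i}" if File.readable?(limit_path)
else
  puts "/proc/self/maps is not readable on this platform"
end
```

Comparing the reported growth with vm.max_map_count gives a rough feel for how many live Fibers a process can hold before mprotect(2) starts failing.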

Why are we creating too many Fibers?

The first clue was that all the exception stack traces came from the same method: Enumerable#first!. This is a method that we monkey patch onto the Enumerable module. It is like first except, rather than returning nil when the object is empty, it raises an ArgumentError. This is useful because we use Sorbet [2] extensively. first returns a nilable value, whilst first! returns a non-nilable value. This allows us to express the invariant that the enumerable should never be empty. Around the time this error started occurring in the logs, we had updated the method with the following patch:

    def first!
-       T.must(self.first)
+       self.to_enum.next
+   rescue StopIteration
+       raise ArgumentError.new("Enumerable must not be empty: #{self.inspect}")
    end

We’d gone from using #first on the receiver object to converting the receiver to an Enumerator and then calling Enumerator#next [3]. This meant we were now performing external iteration. External iteration is where the caller manually steps the iterator from one item to the next. Iteration is driven by the caller. In Ruby, external iteration uses a Fiber. As we use Enumerable#first! all over the code base, we were suddenly creating lots of Fibers and running into the limit on memory mapped segments.

We’re creating too many Fibers because the implementation of Enumerable#first! was changed to use external iteration.
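The difference can be made visible by counting live Fiber objects around each style of iteration. This is a rough sketch that assumes CRuby, where the Fiber backing Enumerator#next shows up in ObjectSpace; live_fiber_count is a hypothetical helper, and the GC is paused only so that the counts are comparable.

```ruby
# Number of Fiber objects currently known to the garbage collector.
# (live_fiber_count is a made-up helper name.)
def live_fiber_count
  ObjectSpace.each_object(Fiber).count
end

GC.start   # clear out existing garbage first
GC.disable # keep the GC from collecting Fibers while we measure

baseline = live_fiber_count

# Internal iteration: #first drives the iteration itself.
internal_results = Array.new(50) { [1, 2, 3].first }
after_internal = live_fiber_count

# External iteration: the caller steps each Enumerator with #next.
enumerators = Array.new(50) { [1, 2, 3].to_enum }
external_results = enumerators.map(&:next)
after_external = live_fiber_count

GC.enable

puts "Fibers created by #first:        #{after_internal - baseline}"
puts "Fibers created by #to_enum.next: #{after_external - after_internal}"
```

Both approaches return the same values; only the second one leaves Fibers behind, each with its own stack and guard page.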

Why is a Fiber used for external iteration?

Fibers are a type of userspace thread. The important difference between fibers and threads is that fibers use cooperative multitasking: different fibers have to work together, yielding to each other and being manually resumed. Threads are scheduled by the operating system and don’t require manual pausing and resumption for tasks to keep making progress. Ruby provides Fiber as a fiber implementation.

The reason a Fiber is used for external iteration of an Enumerator is that Enumerators are implemented using internal iteration. Internal iteration is where you pass a function to the iterator and the function is called with each item in the iterator. Iteration is driven by the iterator. Enumerator#each is the method on which all iteration of an Enumerator is built. It is a method to which you pass a block, and it calls the block with each item in the enumerator: internal iteration. By using the cooperative multitasking provided by fibers, Ruby is able to switch contexts between the caller iterating the Enumerator and the block given to #each. In Iteration Inside and Out, Part 2, Bob Nystrom gives a great simplified implementation exemplifying how Ruby converts internal iteration to external iteration:

class MyEnumerator
    include Enumerable
    def initialize(obj)
        @fiber = Fiber.new do
            obj.each do |value|
                Fiber.yield(value)
            end
            raise StopIteration
        end
    end

    def next
        @fiber.resume
    end
end

The key insight here is that we’re using the Fiber to switch contexts between the caller of #next and the block given to #each.
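To make the context switching concrete, here is a small trace of control bouncing between a caller and a block running inside a Fiber. It is an illustrative sketch using only Fiber.new, Fiber.yield and Fiber#resume; the log array and its messages are made up for the example.

```ruby
log = []

fiber = Fiber.new do
  [:a, :b].each do |value|
    log << "block: yielding #{value}"
    Fiber.yield(value) # pause here and hand value back to the caller
  end
  :finished # becomes the return value of the final resume
end

3.times do
  log << "caller: resuming"
  result = fiber.resume # run the Fiber until its next Fiber.yield (or its end)
  log << "caller: got #{result}"
end

puts log
```

Each resume runs the block only as far as the next Fiber.yield, which is what lets #next hand back one item at a time from an #each that knows nothing about being paused.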

The actual Ruby implementation is written in C and is a bit more difficult to follow, but the concept is the same. Interestingly, with the introduction of JITs to Ruby (MJIT, YJIT, RJIT, etc.) we could see more implementations moving out of C and into Ruby, and Enumerator could be a candidate. So, in the future, the real implementation may look similar to the Ruby one above.

External iteration creates a Fiber so that Enumerable, which is internally iterable, can provide an external iteration interface.

Can you not? External iteration without a fiber

What can we do to fix it? By using internal iteration we don’t need a fiber. Enumerable#each performs internal iteration. Knowing this we can re-write #first! as:

def first!
    self.each { |i| return i }
    raise ArgumentError.new("Enumerable must not be empty: #{self.inspect}")
end

The reason this works is that #each is being passed a block. When you return from inside a block, it returns from the enclosing method, in this case #first!. Therefore, if the enumerable is not empty, we return the first item. If it is empty, the block given to #each is never called and execution continues to the next line, which raises ArgumentError. This is the first time I’ve been grateful for Ruby’s differentiation between blocks and functions!
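The edge cases can be checked with a self-contained version of the method. This sketch repeats the definition so that it runs on its own, patching Enumerable directly as described earlier; the sample inputs are chosen for illustration.

```ruby
module Enumerable
  # The `return` inside the block exits first! itself, so iteration
  # stops as soon as the first element has been seen.
  def first!
    each { |item| return item }
    raise ArgumentError, "Enumerable must not be empty: #{inspect}"
  end
end

from_array    = [1, 2, 3].first!
from_nil_list = [nil].first!                 # nil is a legitimate first element
from_infinite = (1..Float::INFINITY).first!  # stops after one element
from_hash     = { a: 1 }.first!              # a Hash yields [key, value] pairs

empty_message =
  begin
    [].first!
  rescue ArgumentError => e
    e.message
  end

p from_array, from_nil_list, from_infinite, from_hash
puts empty_message
```

Because the block returns on the very first call, even an infinite range is safe, and [nil] is correctly distinguished from [].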

Interestingly, my solution to prevent #first! using a fiber has a similar shape to the usage of Fiber.yield to pass control back to the caller. The difference is that my solution only works for a single iteration, while using a Fiber is generalisable to iterating over the entire enumerator.

[2]

Sorbet is a gradual type system for Ruby. Amongst other things, it lets you define type signatures for your functions and methods then enforce them statically and at runtime. You can read more about it here: https://sorbet.org.

[3]

The reason for this change is that T.must will raise TypeError if it is called on nil. It’s totally valid for nil to be the first element of an enumerator, so calling T.must on the result will incorrectly raise. With #first we can’t tell the difference between, for example, [] and [nil], and T.must will raise on the result of #first for both.
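The ambiguity is easy to reproduce without Sorbet installed. In the sketch below, must is a hypothetical stand-in that mimics only the behaviour described above (raise TypeError on nil); it is not sorbet-runtime’s actual implementation, and old_first! is written as a plain method for illustration.

```ruby
# Hypothetical stand-in for T.must: raise TypeError when given nil.
def must(value)
  raise TypeError, "unexpected nil" if value.nil?
  value
end

# The pre-patch shape of first!, taking the enumerable as an argument.
def old_first!(enumerable)
  must(enumerable.first)
end

outcomes = [[], [nil]].map do |input|
  begin
    old_first!(input)
    :returned
  rescue TypeError
    :raised
  end
end

p outcomes
```

Both inputs end in the same TypeError, even though only the first one is actually empty.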