Simplifying Tggl to enforce good practices

31 Oct 2024•9 min read

Today, we are kicking off a transition to address a design decision we made at the very start of our journey with Tggl, one that, in hindsight, wasn’t the right solution to the problem we were trying to solve. In this post, we will share why we believe this design choice was a misstep, outline the steps we’re taking to fix it, and guide you on how to smoothly move your existing flags to the new paradigm.


The wrong solution to an important problem

The main problem we needed to solve was to offer a reliable way to handle unexpected events that can prevent feature flags from being correctly evaluated:

Network error on the client side
Downtimes on Tggl’s side
Missing API key environment variable
Accidentally deleting a flag
And so on…

Those scenarios will happen, and our job is to make sure our system stays resilient when they do, even when they are caused by a misstep on the user side. To understand our first approach to this problem, we first need to talk about variations.

Variations are all the values a flag can have, for instance true and false, or "A" and "B". With Tggl, users are not limited to two variations per flag and can easily have three, four, or even more.

We decided to introduce a special variation called the “fallback variation”. This variation could be returned based on conditions (like any other variation) and, as its name implies, would also be used if anything went wrong on the client side. This makes error handling very explicit in the app.

Imagine we are rolling out a feature to a few users. We would have one active variation called “On”, and our fallback variation would be the “Off” variation. In our code, that would look something like this:

if (client.isActive('my-feature')) {
  // Do something...
}

The isActive method would return true if the flag explicitly returns the active variation, and would return false if the flag explicitly returns the fallback variation or if anything unexpected happens. This way, if the feature flags API is not reachable for some reason, the new feature will be hidden for everyone, which is much better than being suddenly visible to everyone or, worse, making your app crash.
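To make those semantics concrete, here is a minimal TypeScript sketch of the behaviour described above. It is a simplified model rather than the SDK's actual implementation: the Variation type and the convention of passing undefined for a failed evaluation are assumptions made for illustration.

```typescript
// Hypothetical model of the legacy behaviour, not the real SDK internals.
type Variation = { name: string; active: boolean };

// `evaluated` is undefined when the flag could not be evaluated at all
// (network error, missing API key, deleted flag, and so on).
function isActive(evaluated: Variation | undefined): boolean {
  // Anything unexpected is treated exactly like the fallback variation.
  return evaluated !== undefined && evaluated.active;
}

const on: Variation = { name: "On", active: true };
const off: Variation = { name: "Off", active: false }; // the fallback variation

// Active variation, fallback variation, failed evaluation.
const results = [isActive(on), isActive(off), isActive(undefined)];
// results is [true, false, false]
```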

So far so good. What is the problem then?

Now let's take a second example: imagine that you run an e-commerce platform with a new advanced search algorithm that yields better results than the old one, but is slower to deliver those results. You might want to have a kill switch in place to use the old (but fast) algorithm when traffic is extremely high, to avoid downtime (e.g. on Black Friday).

if (client.isActive('use-new-algorithm')) {
  // Use better new algorithm 💪
} else {
  // Use faster old algorithm 🏎️
}

That looks good, but there is a catch. What happens when the flags API is not reachable? The flag will be considered inactive and the app will switch back to the old algorithm. In this situation we’d rather stay on the new version 99% of the time and only fall back to the old version when explicitly told to by the flag. The first way to achieve this is to flip the logic of the flag:

// Notice that we changed the name and "purpose" of the flag
if (client.isActive('use-old-algorithm')) {
  // Use faster old algorithm 🏎️
} else {
  // Use better new algorithm 💪
}

Or we could read the value of the flag with two active variations (On = true, Off = false) and ignore the fallback variation:

// We use true as a default value to fallback to the new algorithm
if (client.get('use-new-algorithm', true)) {
  // Use better new algorithm 💪
} else {
  // Use faster old algorithm 🏎️
}

Those two solutions are perfectly valid, but the fact that there are two was confusing. We could feel it during all our user research calls: we had to spend time explaining the difference between active and fallback, that fallback did not always mean “Off”, and that there were two distinct ways to solve the resilience issue. We had to fix it.

The new paradigm

We are ditching the concept of fallback and active variations altogether. Starting on Dec 16 2024, new flags only have active variations with explicit values. Out of the two possible patterns, we believe this is the right one because:

This leaves users with a single way to solve the problem at hand
This removes the need to learn the concept of “active” and “fallback” variations for new users
This lets users choose the way they want to set their flag up, without having to flip its meaning, put a negation in the name, or think about the default variation when they are creating a flag
This lets users change their mind on what the default variation should be

All SDKs will receive a major version bump with only two breaking changes: the removal of the isActive method and making the defaultValue argument mandatory. All documentation has been updated accordingly.

To be clear, the new paradigm is not actually new. Everything it does was already possible; it simply enforces good practices by removing the competing solution, which flattens the learning curve for beginners. If you were already using the get method with a default value, nothing changes for you. Otherwise, you may want to transition your flags, but that is completely optional for existing flags.

Note that all the safety mechanisms are still in place, and the SDK will still fall back to the default value in case of a network error or any other unexpected event.
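As a rough illustration of that safety net, the sketch below models a client whose get method requires a default value. SketchClient and its in-memory flags object are hypothetical and only stand in for the real SDK, which fetches flags over the network.

```typescript
// Hypothetical in-memory stand-in for an SDK client, for illustration only.
class SketchClient {
  // undefined models "flags could not be fetched" (network error, downtime...).
  private readonly flags: Record<string, unknown> | undefined;

  constructor(flags: Record<string, unknown> | undefined) {
    this.flags = flags;
  }

  // The default value is mandatory, so the failure behaviour is always explicit.
  get<T>(slug: string, defaultValue: T): T {
    const flags = this.flags;
    if (flags === undefined || !Object.prototype.hasOwnProperty.call(flags, slug)) {
      return defaultValue; // unreachable API or deleted flag
    }
    return flags[slug] as T;
  }
}

const healthy = new SketchClient({ "use-new-algorithm": false });
const unreachable = new SketchClient(undefined);

// Kill switch engaged: the flag explicitly says false, so the old algorithm runs.
const duringBlackFriday = healthy.get("use-new-algorithm", true);
// API unreachable: the default keeps users on the new algorithm.
const duringOutage = unreachable.get("use-new-algorithm", true);
// Deleted flag: the default is used here as well.
const deletedFlag = healthy.get("deleted-flag", "A");
```

The caller, not the flag configuration, decides what happens when something goes wrong, which is why the default can differ from one call site to another.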

How to transition at your own pace

First, know that you do not have to do anything. Switching to the new paradigm is completely optional for flags created before Dec 16 2024. Migration can be done progressively as new flags are created and old flags archived:

If you are a new user, you are not affected: your account is already using the new paradigm
If you have an existing account with existing flags, those flags will be left untouched unless you decide to migrate them, and new flags created after Dec 16 2024 will follow the new paradigm

What do I need to change right now?

Nothing. Everything will just keep working as it does today.

What changes for flags I create from now on?

New flags do not have a fallback variation anymore. Instead, they only have “active” variations with explicit values (e.g. On = true, Off = false). This means that the .isActive() method will always return true (unless there is a network error).

Do not use isActive for new flags since it will always return true
Use get instead with a default value. Note that the default value will be used only in case something goes wrong.

This is only true for new flags that you create from now on, and you do not need to update your SDK for this to work.
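The sketch below shows why isActive stops being useful for new flags while get keeps working. The Variation shape and both helper functions are assumptions used to model the behaviour described here, not the SDK's real code.

```typescript
// Hypothetical data model; the real SDK internals may differ.
type Variation = { name: string; active: boolean; value: unknown };

// A flag created under the new paradigm: every variation is active.
const on: Variation = { name: "On", active: true, value: true };
const off: Variation = { name: "Off", active: true, value: false };

// Legacy check: true whenever an active variation is returned.
function isActive(evaluated: Variation | undefined): boolean {
  return evaluated !== undefined && evaluated.active;
}

// New check: read the explicit value, use the default only on failure.
function get<T>(evaluated: Variation | undefined, defaultValue: T): T {
  return evaluated === undefined ? defaultValue : (evaluated.value as T);
}

// Both variations are active, so isActive cannot tell "On" from "Off".
const legacyCheck = [isActive(on), isActive(off)];
// get reads the value; the last call models a failed evaluation.
const valueCheck = [get(on, false), get(off, false), get(undefined, true)];
// legacyCheck is [true, true]; valueCheck is [true, false, true]
```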

Do I have to transition old flags?

No. You can leave them as they are and eventually archive them. You only have to transition your flags if you want to update the SDK to the latest major version, which drops support for the isActive method.

Before updating your SDK, either wait for all “old” flags to be archived, or manually transition them to the new paradigm.

How to transition old flags?

If you decide to transition old flags, you can follow this simple process to achieve zero downtime.

Let’s assume you have a simple setup for your flag: one active variation called “Live” and the fallback variation.

Make sure that the “Live” variation has a value of true. If it does not, check your code to make sure you are not about to break something and set its value to true. Apply those changes.

Your code probably looks like this:

if (client.isActive('my-feature')) {
  // Do something
}

Update your code to use the get method instead and make sure it has a default value:

if (client.get('my-feature', false)) {
  // Do something
}

If your code already uses the get method you are good to go, otherwise push those changes to production. Make sure that your changes are deployed and that everything is still working correctly before proceeding to the next step.

Now create a new active variation with the value false that you can call “Off”, and rename the fallback variation to something like “Off legacy”. Apply those changes.

Now go to the conditions tab and make sure that the “Off legacy” variation is never used, replacing it with the new “Off” variation you just created. Apply those last changes and make sure everything still works.

You can now go back to the variations tab and hit “Migrate flag” to complete the process.
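The checks from the steps above can be summarised as a small function to run through before hitting “Migrate flag”. This is only a sketch: the Flag and Variation shapes are hypothetical and do not reflect the Tggl API. They simply encode the checklist.

```typescript
// Hypothetical description of a flag, for illustration only.
type Variation = { name: string; kind: "active" | "fallback"; value: unknown };
type Flag = {
  variations: Variation[];
  // Names of the variations referenced in the conditions tab.
  usedInConditions: string[];
};

// Returns the reasons why a flag is not ready to be migrated yet.
function migrationBlockers(flag: Flag): string[] {
  const blockers: string[] = [];
  const active = flag.variations.filter((v) => v.kind === "active");
  if (!active.some((v) => v.value === true)) {
    blockers.push("no active variation with the value true");
  }
  if (!active.some((v) => v.value === false)) {
    blockers.push("no active variation with the value false");
  }
  flag.variations
    .filter((v) => v.kind === "fallback")
    .forEach((v) => {
      if (flag.usedInConditions.indexOf(v.name) !== -1) {
        blockers.push("conditions still use " + v.name);
      }
    });
  return blockers;
}

const before: Flag = {
  variations: [
    { name: "Live", kind: "active", value: true },
    { name: "Off", kind: "fallback", value: null },
  ],
  usedInConditions: ["Live", "Off"],
};
const after: Flag = {
  variations: [
    { name: "Live", kind: "active", value: true },
    { name: "Off", kind: "active", value: false },
    { name: "Off legacy", kind: "fallback", value: null },
  ],
  usedInConditions: ["Live", "Off"],
};

const blockersBefore = migrationBlockers(before); // two blockers
const blockersAfter = migrationBlockers(after); // no blockers left
```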

How to roll back a migration?

When you migrate a flag, it creates an event in the history tab, just like a normal change. To roll back, select the event right before it and click “Rollback to here”.

How to list the flags that need to be transitioned?

On your feature flags page, you can add a filter to only see the flags that need to be transitioned.

When to update the major version of the SDK?

Before anything else, update the proxy to its latest version. This can be done at any time since it has no breaking changes. Then, once you’ve transitioned all your flags, you can safely update the SDK. Type-checking should do its work, but you can search for isActive or is_active to make sure all flags are truly transitioned.
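If you would rather script that search, a minimal sketch could look like this. The findLegacyCalls helper and its input shape are made up for this example; in practice a plain text search in your editor does the same job.

```typescript
// Hypothetical helper: lists "file:line" for every line that still
// mentions isActive or is_active.
type SourceFile = { file: string; content: string };

function findLegacyCalls(sources: SourceFile[]): string[] {
  const pattern = /\b(isActive|is_active)\b/;
  const hits: string[] = [];
  sources.forEach((source) => {
    source.content.split("\n").forEach((line, index) => {
      if (pattern.test(line)) {
        hits.push(source.file + ":" + (index + 1)); // 1-based line numbers
      }
    });
  });
  return hits;
}

const hits = findLegacyCalls([
  { file: "search.ts", content: "if (client.isActive('use-new-algorithm')) {\n  run();\n}" },
  { file: "banner.py", content: "x = 1\nif client.is_active('banner'):\n    show()" },
  { file: "cart.ts", content: "client.get('my-feature', false);" },
]);
// hits is ["search.ts:1", "banner.py:2"]
```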

Conclusion

We acknowledge that we made a bad design decision early on and decided that it was time to make things right. We spent a lot of time making sure that the transition would be as smooth as possible for existing users, and we are ready to help you if needed.