Rollup maturity ranking proposal - call for final feedback

After numerous discussions within the whole L2BEAT team, we have decided to hold another round of community feedback before finally releasing our implementation of a “rollup maturity ranking”, which is our interpretation of what Vitalik proposed in his Ethereum Magicians post.

We understand that no proposal will be perfect. Different projects have different roadmaps and different ideas about the best way to achieve a common goal, i.e. a fully decentralized, permissionless scaling solution that is ultimately fully secured by Ethereum.

The main problem with any type of roadmap or ranking is that it is linear, not multi-faceted. For example, you may have a construction with a fully decentralized sequencer that is upgradable only after a significant time delay, but with no running fraud proof / validity proof system. Another construction may have a deployed fraud proof / validity proof system, but a centralized sequencer that can censor users’ transactions, and it may be instantly upgradable. Which of these two constructions is further down the road? They are clearly in different places and taking different paths towards what is hopefully the same goal.

Presenting complete and nuanced information makes it harder to see the big picture and quickly assess the state of specific projects or the whole ecosystem. At the same time a simple ranking with clear steps for moving forward will encourage projects to take the necessary steps towards better security and decentralisation, and reward (at least symbolically) those projects that actually make progress.

With our proposal we would like to make sure that the assessment of maturity is easy to understand for the end users but at the same time it does not miss important details / caveats.

We think that each Rollup may be progressing towards the ultimate state somewhat independently along three primary dimensions (a rough data-model sketch follows the list):

  • Running Fraud Proof System / Validity Proof System (no plans, under construction, deployed on testnet, deployed on mainnet but behind a whitelist, deployed on mainnet and available permissionlessly)
  • Upgradability (instant upgradability by EOA, instant upgradability by community MultiSig / Security Council, upgradability behind timelock with built-in override mechanism, etc…)
  • Decentralisation / censorship resistance - central sequencer vs. permissionless sequencer, existence of L1 force-queues, escape hatches, etc…
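
To make the three dimensions concrete, here is a minimal TypeScript sketch of how they might be modelled; every type and field name below is illustrative, not part of the proposal.

```typescript
// Illustrative data model for the three maturity dimensions described above.
// All names are hypothetical; the proposal does not prescribe any schema.

type ProofSystemStatus =
  | 'no-plans'
  | 'under-construction'
  | 'testnet'
  | 'mainnet-whitelisted'
  | 'mainnet-permissionless';

type Upgradability =
  | 'instant-eoa'
  | 'instant-multisig-or-council'
  | 'timelock-with-override';

type SequencerModel = 'centralized' | 'permissionless';

interface MaturityAssessment {
  proofSystem: ProofSystemStatus;
  upgradability: Upgradability;
  sequencer: SequencerModel;
  hasL1ForceQueue: boolean; // users can force transactions via L1
  hasEscapeHatch: boolean;  // users can exit without operator cooperation
}
```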

On each of these dimensions different constructions may be more or less mature.

The highest maturity ranking (“A”) would be reserved for a Rollup that is essentially mature in all three dimensions. Provided there are no bugs in the implementation, these constructions are essentially “secured by Ethereum”, potentially with some small caveats.

The next maturity ranking (“B”) would be reserved for a Rollup that has a running validity / fraud proof system but misses on one or both of the remaining areas. This is to acknowledge that a running fraud/validity proof is arguably the most difficult part of a Rollup, and having it in production is a major achievement for any team. At the same time, these constructions may still be instantly upgradable, or they may temporarily have no mechanism for censorship resistance.

Finally, the “C” ranking would be reserved for Rollups that do not have a validity / fraud proof system deployed on mainnet.

Apart from the main ranking (“A”, “B”, “C”), we might add one or two “-” signs if there are one or more areas of additional concern. Similarly, we might add a “+” sign if a Rollup has some additional noteworthy feature (for example, a “C”-ranked rollup running validity/fraud proofs on a testnet). A full, objective list of criteria that need to be met for the “+” or “-” modifiers will be published.
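
As a sketch of how the final label could be composed, here is one possible reading of those rules in TypeScript. It assumes at most two “-” signs, a single “+”, and that “+” is never combined with “-” - assumptions on our part, since the exact combination rules are still to be published.

```typescript
// Hypothetical composition of the final label from the base ranking and
// modifiers. The exact combination rules here are assumptions.

type BaseRanking = 'A' | 'B' | 'C';

function maturityLabel(
  base: BaseRanking,
  concerns: number,    // areas of additional concern, capped at two '-'
  noteworthy: boolean, // e.g. a 'C' rollup running proofs on a testnet
): string {
  const minuses = '-'.repeat(Math.min(concerns, 2));
  // Assume '+' is only shown when there are no open concerns.
  const plus = noteworthy && concerns === 0 ? '+' : '';
  return base + plus + minuses;
}

// maturityLabel('C', 0, true) === 'C+'
// maturityLabel('B', 2, false) === 'B--'
```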

This “maturity ranking” largely corresponds to the stages proposed by Vitalik. The basic criteria are the same, though it adds some more nuance. Additionally, extra information can be added summarising what needs to be done to reach a higher maturity ranking.

Graphically, instead of a simple Stage designation that could be presented to end users like this:

[image: Stage designation mockup]

We would present more detailed information, as depicted in the mockup below:

[image: detailed maturity ranking mockup]

We are gathering final feedback from the community: is this proposal going in the right direction, and if so, what should be the exact, specific criteria for assigning the “A”, “B”, “C” rankings and for adding the “+” and “-” modifiers? Or should we stick with the original Stages as proposed by Vitalik?

7 Likes

With the rollup narrative becoming hotter, I think this ranking model is very well timed.

The six ranks you proposed (A, B, and C along with +/-) would be very helpful for the end user who is not concerned with the nuanced indicators behind the scenes. Correct me if I am wrong, but a lot of end users of L2Beat are also very tech-savvy, and I believe these users would like a more quantifiable overview of the ZKRUs or ORUs. I would also suggest including a whole-number score for such a user.

In addition to the whole-number score, there should be a dashboard with all the indicators along with sliders that demo how the score is actually calculated, so that the user can really trust this ranking model. The user could play with the sliders to see what their “perfect” version of a rollup might look like. Basically, having this slider dashboard would increase transparency into how this ranking framework is engineered.
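
Something like the following could sit behind such a dashboard: a plain weighted average where the slider positions are the weights. The indicator names and the 0-10 ranges are invented for illustration, not an actual L2BEAT scoring model.

```typescript
// Minimal sketch of the slider idea: each indicator gets a 0-10 score and
// the user-controlled sliders provide the weights.

type Indicators = Record<string, number>;

function weightedScore(scores: Indicators, sliderWeights: Indicators): number {
  let total = 0;
  let weightSum = 0;
  for (const [name, value] of Object.entries(scores)) {
    const w = sliderWeights[name] ?? 0; // slider position for this indicator
    total += value * w;
    weightSum += w;
  }
  return weightSum === 0 ? 0 : total / weightSum; // normalized back to 0-10
}

// weightedScore({ proofs: 8, upgradability: 4 }, { proofs: 2, upgradability: 1 })
// === (8 * 2 + 4 * 1) / 3 ≈ 6.67
```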

Looks good! No quibbles; I am wondering how you’re thinking about some of the specifics, re what qualifies for “A” vs. “B” on the 2nd and 3rd dimensions (“upgradability” and “decentralization”).

A few examples:

For upgradability, would the following be “A” or “B”?

  • Instant upgradability by a large multisig of public, independent entities
  • Centralized EOA upgradability, but behind a huge timelock

And likewise for “decentralization,” A or B?

  • Centralized sequencer with force-inclusion path
  • Distributed (but fixed/permissioned) sequencer federation.

My primary concern with this proposal is that Vitalik’s original “stages” system is a mechanism for milestones while the proposed “grades” system is a mechanism for ranking. From a user standpoint, milestones tell users that one project is further along than another on a certain set of tasks, but ranking tells users that one project is better than another project along a given set of metrics. Because of the nature of L2Beat, it’s very likely that users will take this “betterness” to mean “more secure”.

I believe this is an important distinction to point out because it has an impact on the way that L2Beat needs to think about the factors that go into a ranking. The current proposal confuses these two concepts (milestones and rankings) in a way that ultimately degrades the value of the ranking system.

For example, take the following (emphasis my own):

The next maturity ranking (“B”) would be reserved for a Rollup that has a running validity / fraud proof system but misses on one or both of the remaining areas. This is to acknowledge that a running fraud/validity proof is arguably the most difficult part of a Rollup, and having it in production is a major achievement for any team.

This paragraph treats the existence of a fraud/validity proof as a milestone (“have something running”) but uses it as input to a security metric even if the existence of the fraud/validity proof does not significantly improve network security. This sort of metric creates a perverse incentive to build and ship a weak proof system to achieve the “B” ranking, likely at the expense of other factors that can improve security.

If this is meant to be a milestone tracker, I would use language that reflects this. “Stages” is better than “grades” for tracking milestones.

If it’s a ranking, I would expect to see a well-reasoned “point” system that provides a list of features or metrics and ascribes a point value to each, where the point value is determined by the actual impact that the feature has on the security and robustness of the network. The ranking of a project would then be derived from the point total. This has the positive side-effect of allowing projects to make progress towards security along different pathways, rather than defining one strict pathway for network security.
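
For illustration only, a point system of the kind described might look like the sketch below. The features, weights, and totals are invented; the real exercise would be justifying each weight by its actual security impact.

```typescript
// Sketch of a point-based ranking: each feature carries a weight meant to
// reflect its real security impact, and the rank derives from the total.

interface SecurityFeature {
  name: string;
  points: number;
}

const FEATURES: SecurityFeature[] = [
  { name: 'permissionless fraud/validity proofs on mainnet', points: 40 },
  { name: 'upgrade delay longer than the exit window', points: 25 },
  { name: 'L1 force-inclusion / escape hatch', points: 20 },
  { name: 'decentralized sequencer set', points: 15 },
];

function securityScore(present: Set<string>): number {
  return FEATURES.filter((f) => present.has(f.name))
    .reduce((sum, f) => sum + f.points, 0);
}

// Different pathways can reach the same total, so projects aren't forced
// down one linear route to a good score.
```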

Right now I feel that this proposal is confused in its objective. L2Beat should carefully consider the purpose of the RMR. Is it milestone-based or security-metric-based? Is it a ranking or is it a milestone tracker? What exactly is it trying to communicate to users?

5 Likes

I pick metis.up! :rofl:

Just some minor feedback:

Letter rating systems usually follow an established cultural norm - from A to F, or A to E. So “C” kind of has the connotation of being “average”, rather than the lowest category.

Instead, I’d suggest a more universal, visual system that’s easy to understand - the star rating system. The most common is the 5-star scale, but it can easily be adapted to 3 stars. Having 1 star filled and 2 outlined makes it pretty clear to everyone that it’s a 1 out of 3, so to speak. The plus/minus can be half-filled stars - also a very clear visual indicator.
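
A hypothetical mapping from the proposed letters-plus-modifiers onto a 3-star scale with half-star modifiers might look like this (the mapping itself is my invention):

```typescript
// Hypothetical conversion of 'A'/'B'/'C' labels with '+'/'-' modifiers
// into a 3-star value, where a modifier becomes a half-star.

function toStars(label: string): number {
  const base = { A: 3, B: 2, C: 1 }[label[0] as 'A' | 'B' | 'C'];
  const modifier = label.includes('+') ? 0.5 : label.includes('-') ? -0.5 : 0;
  return base + modifier; // 'B+' => 2.5 stars, 'C-' => 0.5 stars
}
```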

Regarding “Stages” - as long as you define them clearly, it’s fine - however, different L2s have different requirements. For example, some app-specific rollups may never need to get past Stage 1 - it would be unfair to penalize them as insufficient.

(By the way, for consistency, you could also consider using a 3-star system instead of Stages - as Kelvin points out, it may cause confusion with milestones etc. So the individual elements can have their own ratings, plus an overall rating.)

3 Likes

I think this is headed in the right direction. I like having a top-level stage that is easy to understand, but also providing the detail for folks who want to dig in.

One detailed suggestion regarding upgradability. In order for a rollup to get to the final stage of “upgradability behind timelock with built-in override mechanism”, I think it’s important that the upgrade be initiated via a decentralized, trustless process (similar to what you see from mature DeFi apps with on-chain governance e.g. Aave).

I think that directly using a score (A, +, -) will confuse the user into thinking that it refers to performance rather than security. Surely we will see some L2s that are not going to be fully decentralized in sequencing (say, enterprise solutions); how do we reconcile these criteria between the stage of development and the final goal?

To improve the visual part, I would consider bars like these, or some circular bars, to demonstrate how close the L2 is to its final form (stage).

[image: progress bar mockup]

Hey, first of all thank you very much for all the responses! They were very insightful and gave us a lot to think about. We’re still processing them, but in the meantime let me share some of the background thinking that led to the presented design; maybe that will spur some more thoughts.

First of all, while thinking about the ranking, we wanted to balance several different aspects:

  • first of all, we wanted it to incentivise a push towards better, safer and more decentralized rollups, in the spirit of Vitalik’s post
  • given that, we didn’t want to ‘punish’ any rollup for ‘not being there yet’, but rather to create a reward for going in a good direction - “with this new release, we will be able to go up in the L2BEAT ranking”
  • at the same time, we wanted to avoid valuing projects as ‘good’ or ‘bad’, but rather to show a comparison in terms of maturity level
  • we knew that any kind of ranking will trigger false assumptions, such as “project X is higher than project Y, thus X is more secure than Y”, as mentioned by @smartcontracts. Even though we probably can’t eliminate such misleading impressions, we would like to minimise them. This ranking is not supposed to be a ‘security grading’, but rather a ‘maturity grading’.
  • as @Bartek mentioned in his post, we believe that the road to a secure, decentralised future is not straight, linear and additive in a simple way, but rather multi-faceted and staged. That’s why we have 3 clear cut-off lines (stages) but also additional modifiers that highlight both positives (+) and things that need to be worked on (-). That allows us to differentiate projects, reward those that are progressing in the right direction even if they haven’t made it to the next stage yet, and at the same time gives us an opportunity to mention things that we believe should be fixed.
  • and last but not least, we want to be fair to all the projects. While, as an autonomous watchdog, our commitment is always to provide reliable and accurate information, we have great respect for every buidler.

1. Vitalik’s original idea

Here we tried applying the three stages, as described in Vitalik’s post. Our initial thoughts about this approach:

  • it’s too linear and not granular enough. Most projects would land in ‘Stage 0’, labelled ‘Under construction’. We felt it’s a bit too harsh, as it does not allow us to promote projects that, even if not yet ready for ‘Stage 1’, have put a lot of effort into areas that are worth mentioning.
  • it does, however, stress that it’s not a ‘rank’ but a ‘milestone’, as mentioned by @smartcontracts
  • when put together though, it gives quite a misleading view that everything is ‘under construction’ and not much has been achieved yet.

2. Stars

Our second approach went in line with @polynya’s thinking. Using stars we could differentiate a bit (1.5 stars is better than one star but worse than two stars). A few thoughts about this approach:

  • in this case it’s easy to ‘award’ a project for doing something good by giving it an additional half-star, but it’s harder to raise awareness of a project’s shortcomings, as ‘taking away’ half a star degrades it to the previous ‘full star’ ranking. So a two-star project with some shortcomings becomes almost the same as a one-star project with some additional merits, while we feel they should be differentiated more strictly. We could use some visual tricks to achieve this differentiation (different colours, a dotted edge instead of a solid edge, etc.), but it was too subtle and too complicated at the same time.
  • my personal belief (we had different opinions on our team) is that the star system is deeply ingrained in the internet, with an expectation that everything is almost full-star. You would never ride with an Uber driver with three stars out of five, and a hotel or restaurant on Tripadvisor with 1 star out of five is a clear no-go. At first our site would show most projects with half a star, one star, maybe one and a half. That would definitely send the wrong message right away as a portrayal of the ‘L2 ecosystem’.

3. Letter ranking

That brought us to our final design - a letter system with +’s and -’s.

Our reasoning behind this design:

  • it gives us a lot of flexibility, as we have three main ‘stages’, but each with five different variants.
  • it also allows us to extend this system with further levels if needed: since the ‘final/perfect’ state is A, we can always extend the scale downwards with D, E, etc.
  • as it is similar to the grading known from school, or to ratings from the financial world, it gives a good intuition of what each score means. At the same time (to be honest, we didn’t plan for this, but it fits well :slight_smile: ), as noted by @polynya, if we map it to the school-grade system, the C category lands somewhere in the average range - which is great for us, as it does not ‘punish’ projects for not being there yet, but instead promotes those that made the effort to go further.
  • even though it is quite complex in detail (theoretically there are 15 different states possible), it’s quite easy to get a quick overview of the current maturity of the ecosystem by simply looking through the ratings

Final thoughts

Some additional remarks regarding previous feedback:

  • I noticed that it might not be clear that our ‘grade’ is for the whole project, not for each aspect that we analyze (fraud proofs/validity proofs, upgradability, decentralization) separately.
  • regarding @Joxes’ proposal - we were thinking about such a ‘progress bar’ as well; however, what bothered us about it is that the different aspects we assess are not clearly additive and interchangeable. Thus showing this progress is not as easy as ‘counting points’ on a scale of 0 to 100. We could still use a progress bar just to represent the different ratings, using A/B/C as different ‘checkpoints’ in the progress bar, with +/- simply adding/deducting some points from them (see the sketch after this list). What do you think about it? We’ll check this approach as well, but I’m afraid it might be a bit cryptic and not as easy to compare when presented in a table.
  • even though we are aware that some may treat our maturity ranking as a security ranking, as @smartcontracts noted, as a general rule we would like to avoid such connotations. That’s why we clearly state that it’s a maturity ranking, and that’s why we present ‘what should be done to achieve a better grade’ right away in the grade’s on-hover tooltip. Any suggestions on how we can improve that?
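
To make the ‘checkpoint’ variant from the list above concrete, here’s a quick sketch under assumed numbers; the checkpoint positions and the per-modifier step are placeholders, not a worked-out scale.

```typescript
// Sketch of the checkpoint-style progress bar: the letter pins a position
// on a 0-100 scale, and each '+' or '-' nudges it. All numbers are
// placeholders for illustration only.

const CHECKPOINT: Record<'A' | 'B' | 'C', number> = { C: 33, B: 66, A: 100 };
const MODIFIER_STEP = 5;

function progressValue(base: 'A' | 'B' | 'C', modifiers: string): number {
  const plusCount = (modifiers.match(/\+/g) ?? []).length;
  const minusCount = (modifiers.match(/-/g) ?? []).length;
  const value = CHECKPOINT[base] + (plusCount - minusCount) * MODIFIER_STEP;
  return Math.max(0, Math.min(100, value));
}

// progressValue('B', '--') === 56
// progressValue('C', '+') === 38
```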

Once again, thanks a lot for the feedback so far and we’d like to hear more thoughts, so don’t hesitate to share any.

3 Likes

Let’s talk about goals

first of all, we wanted it to incentivise a push towards better, safer and more decentralized rollups

I think this is 100% the correct goal. Rollups should be safer and more decentralized. L2Beat is a powerful force for helping Rollups focus on the right things. However, I feel that Vitalik was actually wrong in his original proposal – the stage model is not the correct way to think about things because it enforces a linear path to decentralization. This is a key part of my problem with the current proposal.

I’ll use a specific comparison to demonstrate the point I’m trying to make. Imagine two rollups. Rollup A has fault proofs, but it has instant upgrade keys. Rollup B does not have fault proofs, but it has an upgrade security council, an upgrade delay, and a community multisig-based system as a replacement for the fault proof until the fault proof is ready.

In this case, Rollup B is actually likely more secure and decentralized than Rollup A, even though it does not have fault proofs. The proposed ranking system would still give Rollup A a better grade than Rollup B, even though Rollup B is likely more secure/decentralized.

This is the general problem with a linear stage model. It’s impossible to reflect the multi-faceted nature of security and decentralization while using specific features (like a fault proof) as the gating metric between stages. Security and decentralization simply don’t work that way. We should not punish Rollups for failing to follow a specific, linear approach to security and decentralization. Doing so is actively harmful.

Confused targets

I’ll again make the statement that a grading system is not appropriate for stages. People associate letter grades with quality. Even a star system or stages 0, 1, 2, etc. don’t really make sense to me. If you want to reflect milestones, then it’s better to just have a checklist. IMO this reflects a deeper issue with the proposal in its current state – it’s still confused about whether it’s trying to speak to the security/decentralization of a project or to demonstrate which parts of the project have been completed.

Generally, I don’t think the checklist is very interesting. Who cares whether you’ve built component X or component Y? Besides, different projects will have very different components. As a result, I don’t think a milestone-based system makes much sense at all.

Proposal: just focus on grading

I believe that what we really want is clear: we want some sort of metric that speaks to the real security of a system.

I think the correct way to do this is to focus on a real grading system. Really compare different systems based on their relative security. Figure out a ranking system and associate points with the real security model, not with milestones. A system that relies on a 6/9 multisig is better than one with a 3/5 multisig, and that should be reflected in the score. Make people compete on meaningful additions to security, not on things like fault proofs that don’t actually significantly improve security unless they’re paired with things like a security council and upgrade delays.
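
As one toy example of scoring the real security model rather than a milestone: the formula below is my own invention, and only the 6/9-beats-3/5 ordering comes from the post.

```typescript
// Toy scoring of a multisig by its actual configuration: reward both the
// absolute threshold (keys an attacker must compromise) and the
// threshold-to-signer ratio. The weights are arbitrary placeholders.

function multisigScore(threshold: number, signers: number): number {
  return threshold * 10 + (threshold / signers) * 10;
}

// multisigScore(6, 9) ≈ 66.7 > multisigScore(3, 5) === 36
```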

Conclusions

My stance at the moment is that “maturity ranking” is not a meaningful metric. People will confuse maturity with security. Users don’t care about maturity, they care about security and decentralization. Maturity simply isn’t well-defined enough and becomes a weak proxy for security that confuses users more than it enlightens them. I believe the community would be better served by a “security and decentralization ranking” that takes the security model and level of decentralization into account.

2 Likes

Just caught up on this thread after independently coming up with a similar solution: Feature: Weighted/ranked risk analysis · Issue #1277 · l2beat/l2beat · GitHub

My 2 cents on this discussion would be:

  • Along similar lines to @smartcontracts, I believe a maturity label would be useful, but not so much a maturity ranking. I believe a security ranking is much more important (see my linked post above). Maybe it’s just a semantics issue, but it seems like the concern here is the user’s risk (in addition to the points listed at the start of kaereste’s post).
  • An issue I see with letter ranking is that it’s difficult for the user to understand the grading. Let’s say you have projects rated B and C, but no B- or B--. It may appear that there’s a single step between B and C. Likewise if there’s no X-- or X++, or if there is one of these in-between letter grades but it’s just not visible to the user (not in view).
1 Like