Velocity Is Internal
Story points and burndown charts. Two staples of the agile planning process. They're what let you have confidence that what you're planning for any given sprint is close to what will be delivered. And they're pretty good at that. They let you compare projections and actuals over time for your team. But that's pretty much all they're good for.
One of the things they're very bad at is comparing across teams. You can make all the claims you want about common definitions for story points and capacity, but the reality is that every team goes through the four stages of team development (remember forming/storming/norming/performing?) and gets to a point where its members consistently agree on what a story point means to them. Once you have a consistent value for a story point and historical data on how many of those story points get done in a sprint, you can use it to predict, with some accuracy, how much work to put in a sprint and expect it to get done. But that only applies to that team. Other teams have different definitions of story points and different capacities, so comparing velocity across teams is like comparing the efficiency of a regional bus and a taxi by the number of trips they make a day. With enough context you might be able to compare them, but just looking at the two numbers doesn't tell you much. If you want to compare teams using some of that information, then maybe look at comparing the accuracy of their predictions. If the ratio of story points predicted to story points accomplished gets close to 1 and stays there, then the team is making reliable predictions.
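To make the predicted-to-accomplished ratio concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the team data is made up, the function names are hypothetical, and the window of 3 sprints and tolerance of 0.1 are arbitrary choices for what "close to 1 and stays there" might mean.

```python
def prediction_ratios(sprints):
    """Return predicted/accomplished story points for each sprint.

    `sprints` is a list of (predicted, accomplished) tuples.
    Assumes accomplished > 0 for every sprint.
    """
    return [predicted / accomplished for predicted, accomplished in sprints]


def is_reliable(ratios, window=3, tolerance=0.1):
    """True if the last `window` ratios all sit within `tolerance` of 1.

    The window and tolerance are illustrative thresholds, not a standard.
    """
    recent = ratios[-window:]
    return len(recent) == window and all(abs(r - 1) <= tolerance for r in recent)


# Hypothetical data: (predicted, accomplished) per sprint.
team_a = [(30, 24), (28, 26), (27, 27), (26, 26)]  # low velocity, settling near 1
team_b = [(60, 40), (55, 65), (50, 35)]            # high velocity, erratic

for name, sprints in (("A", team_a), ("B", team_b)):
    ratios = prediction_ratios(sprints)
    print(name, [round(r, 2) for r in ratios], is_reliable(ratios))
```

In this made-up data, team B finishes more points per sprint than team A, yet team A is the one whose plans you can count on. The raw velocity numbers alone would have suggested the opposite ranking.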
Another thing velocity is bad at is predicting the project completion date, and for similar reasons. The task list drawn up early in the cycle, let alone the story points estimated for it, just isn't that accurate. So until you get close to the end and really know how many points are left, as defined by the team that is going to be working on them, you can't just divide by velocity and trust the answer.
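A rough sketch of why that division is shaky early on, again in Python. The numbers are hypothetical: a 200-point backlog, a velocity of 25 points per sprint, and an assumed backlog estimate that could be off by 50% early in the project versus 10% near the end. The function name and the error percentages are illustrative assumptions, not measured values.

```python
import math


def sprints_remaining_range(remaining_points, velocity, backlog_error):
    """Return (low, high) sprints left if the backlog estimate may be
    off by +/- `backlog_error` (a fraction, e.g. 0.5 for 50%).

    Velocity is treated as stable here; only the backlog size is uncertain.
    """
    low = math.ceil(remaining_points * (1 - backlog_error) / velocity)
    high = math.ceil(remaining_points * (1 + backlog_error) / velocity)
    return low, high


# Early in the project: the backlog itself is a guess.
print(sprints_remaining_range(200, 25, 0.5))

# Near the end: the team has re-estimated what is actually left.
print(sprints_remaining_range(200, 25, 0.1))
```

With the wide early error the answer spans anywhere from 4 to 12 sprints, while the late estimate narrows to 8 or 9. The velocity is identical in both cases; what changed is how well the remaining work is known.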
Bottom line: historical velocity is a short-term leading predictor, not a long-term performance measurement. So don't treat it as one.
