
by Leon Rosenshein

Input Validation

Little Bobby Tables might not be real, but SQL injection certainly is. And there are lots of ways to prevent it. And we should do all of them. From sanitizing inputs as soon as possible to using least privilege when making changes (why does the student management interface even have drop table permission?).

But this isn't about what to do. This is about where to do it. And I'm going to assert that it's as soon as possible. Because the sooner you can tell the user, the more context you can provide. Which would you rather deal with, a website that lets you type in answers to all 15 fields, then comes back with "Invalid input, please re-submit" or one that, as soon as you leave the age field, tells you that the input can only be numbers if you hit a character by mistake? I know which one I'd choose.

The same thing applies to client libraries that you might provide for your service. Very often there are options that can't be chosen together. Save time, a server round-trip, and all of the error generation/parsing by checking for that on the client.
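Here's a minimal sketch of what that client-side check could look like. Everything in it is made up for illustration: `ExportOptions`, its fields, and the `export` call are hypothetical, not part of any real client library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExportOptions:
    """Hypothetical options for a hypothetical export call."""
    compress: bool = False
    stream: bool = False
    max_rows: Optional[int] = None


def validate(options: ExportOptions) -> List[str]:
    """Return human-readable problems; an empty list means the options are valid."""
    problems = []
    if options.compress and options.stream:
        problems.append("compress and stream can't be chosen together")
    if options.max_rows is not None and options.max_rows <= 0:
        problems.append("max_rows must be a positive number")
    return problems


def export(options: ExportOptions) -> str:
    """Client entry point: check the options before any request is sent."""
    problems = validate(options)
    if problems:
        # Fail here, with every problem listed, instead of after a round-trip.
        raise ValueError("; ".join(problems))
    return "request sent"  # stand-in for the real server call
```

The caller finds out about every bad combination at once, right where the mistake was made, and the server never sees the request.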

Yes, sometimes this means making a library that you can re-use on both the client and the server. Or it might mean implementing the validation in multiple languages because you've got multiple clients. But it still makes sense. You always want to delight your users.

Even (especially?) when they make mistakes.

by Leon Rosenshein

Abstractions vs. Interfaces

"Code against interfaces." Generally speaking that's a good idea, but there's a problem with that sentence. What does interface mean? There are a couple of options.

It could be the dictionary definition of an interface, which is "a point where two systems, subjects, organizations, etc. meet and interact" or, in computer terms, the public API. It could be a class, a service, a website, or something else. That means every C++/Java/<insert language> class is an interface. And that's certainly not what the intent is.

Or, it could mean the keyword interface, but C++ doesn't have an *interface* (unless you're talking about the MSVC extension). So that's probably not what interface means either.

What it really means is code against the abstraction of the model. The platonic ideal as it were. The things that boil a model down to its essence. The capabilities of a thing, or its verbs.

Consider a logger. It has one verb, write. It takes a LogLevel and a String, and puts them somewhere for later access. When you're using a logger that's all you care about. You might write some convenience functions to help hide the definition of level, or to build the string from a template, but that's about it. And that's what you code against. In C++ it's an abstract class with only pure virtual functions. In Java it's an interface. In Go it's an interface type.

Constructing one can be as specific and complicated as you need, or even provided via dependency injection, but when used you only know/care about the abstraction.
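To make the logger example concrete, here's a small sketch in Python, where the closest equivalent is an abstract base class. The names (`Logger`, `MemoryLogger`, `log_error`, `process_order`) are mine, chosen for illustration only.

```python
from abc import ABC, abstractmethod
from enum import Enum


class LogLevel(Enum):
    INFO = "INFO"
    ERROR = "ERROR"


class Logger(ABC):
    """The abstraction: one verb, write."""

    @abstractmethod
    def write(self, level: LogLevel, message: str) -> None:
        ...


class MemoryLogger(Logger):
    """One hypothetical concrete logger; keeps entries in a list for later access."""

    def __init__(self) -> None:
        self.entries = []

    def write(self, level: LogLevel, message: str) -> None:
        self.entries.append((level, message))


def log_error(logger: Logger, template: str, **values) -> None:
    """Convenience function: hides the level and builds the string from a template."""
    logger.write(LogLevel.ERROR, template.format(**values))


def process_order(logger: Logger, order_id: int) -> None:
    """Calling code knows only the Logger abstraction, not which logger it was given."""
    logger.write(LogLevel.INFO, f"processing order {order_id}")
```

`process_order` works the same whether it's handed a `MemoryLogger` or something that writes to disk or the network. How the logger got built is somebody else's problem.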

So, you're already coding against interfaces. In fact you can't help but do that. What you want to do is code against abstractions.

by Leon Rosenshein

Decisions, Decisions, Decisions

Decisions are important. Who makes them. Why they're being made. When they get made. What the intended consequences are. What the unintended consequences are. Those are all important things about decisions. And there's another important thing that I didn't mention. That's how they're made.

Sometimes making a decision is complicated. Like choosing a U.S. President. First you have to jump through the hoops to get on the ballot. Then the eligible people vote in their states. The states count the votes, and the results of the people's vote are announced. Next, each state appoints some other people (the number based on its number of Senators and Congresscritters) who then go off in a room somewhere and have another vote. The person who gets a majority in that count is then declared to be President. Lots of moving parts, and it takes months.

Other decisions are simple. Most of us have a dominant hand, and when we need to write something down we use that hand. Not a lot of thinking about consequences, involving others, or taking time. Just pick up a pen and write something down.

At work we're all decision makers. The hard part is knowing how those decisions should get made. In really broad strokes, there's a continuum, from autocratic to consensus to unanimity. A good way to approach it is to think about the scope of the decision. The name of a temporary variable in a loop in a function has small scope, and the developer should pick the name and use it. That would be the wrong time to call a meeting, collect ideas, and then discuss until everyone agrees that you've picked the perfect name.

On the other hand, if you're designing an API you need to ensure that your customers will actually want to use it. Again, you're not going to wait until everyone agrees that the API is perfect for all of their different use cases, but you should have consensus among the group that the API isn't unusable.

And on rare occasions, it can be necessary for everyone to be fully committed. If success requires everyone to actively participate then making sure everyone is in full agreement is critical. Those situations aren't common, but when they occur, everyone needs to agree. 

Of course, sometimes you can't get to consensus, let alone unanimity. It could be because of viewpoint, conflicting goals, or simply lack of time. In those cases, after trying to get consensus on the API, someone who understands the use cases, and can ensure that what's truly required can be done, needs to make an autocratic decision.

And regardless, once the decision is made, everyone needs to work with the decision.

by Leon Rosenshein

Take Time

One of the counterintuitive things about productivity is that you can be more productive overall by taking a break. And like everything else, this scales. Taking a short break during the day can clear out the cobwebs, let your mind process what’s floating around, and surface ideas that you were too busy to notice.

Similarly, holidays and vacations give you a slightly longer period to reset. There's more to life than work and, especially in these days of WFH, having the time to not work is critical. Whether it's spending time with family, working on a hobby, having new experiences, or actually doing nothing (which is very different from not doing anything), it takes time to shift from work to something else. Personally I like cruises because of how disconnected from the rest of the world they are. One thing I've noticed is that it takes me 2+ days to actually get disconnected and start to get the benefits.

Then there's the sabbatical. 4 weeks for every 5 years of service. You can really do something with 4 continuous weeks. Everything from nothing (again) to starting something that's important to you. You could do a Feynman sabbatical in another field. Help out that non-profit that needs your help. Whatever makes you happy.

I say this today because tomorrow is Thanksgiving Day in the US and Americans, particularly in the tech field, are notoriously bad at taking vacations and enjoying their holidays. Lots of reasons why and this isn't the place for them. But it is the place to remind folks that if tomorrow is a holiday for you, take it. And for the non-US folks, when your holidays come up take them.

I guarantee you the work will be there when you get back. And you'll be better equipped to do it.

by Leon Rosenshein

Error Types

Statistics is all about the null hypothesis. You assume it’s true and try to show that it’s false. Consider a fire alarm. If it’s not ringing you assume there’s no fire. If it is ringing then the assumption is that there is a fire. The state of the alarm is a simple binary. It’s either ringing or it’s not. And either there is a fire or there’s not. So you have the following truth table:

                 Alarm
           Ringing   Not Ringing
         |---------|-------------|
 No Fire | Type I  |   CORRECT   |
         |---------|-------------|
    Fire | CORRECT |   Type II   |
         |---------|-------------|

Simple and clean. Two correct states and two error cases. The Type I, or false positive, and the Type II, or false negative.
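If you prefer code to tables, here's the same thing as a small sketch. The function names are mine, and the labels are copied from the table above.

```python
from collections import Counter


def classify(fire: bool, ringing: bool) -> str:
    """Name the cell of the fire/alarm table for one observation."""
    if ringing and not fire:
        return "Type I"   # false positive: alarm without a fire
    if fire and not ringing:
        return "Type II"  # false negative: fire without an alarm
    return "CORRECT"


def tally(observations) -> Counter:
    """Count how often each outcome shows up in a list of (fire, ringing) pairs."""
    return Counter(classify(fire, ringing) for fire, ringing in observations)
```

Running `tally` over a history of observations tells you which kind of mistake your alarm tends to make.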

As developers we need to deal with this kind of problem all the time. One of the more common is alerting, or error detection. If you have a perfect signal for an error case then you can always do the right thing. If your service is not running that’s an error. Simple. But what if you’re not getting the signal that your service is running? What type of error is that? Does that mean your service isn’t running or that the signal is blocked? What do you do in that case?

Well, it depends. Mostly it depends on the various costs of being wrong and benefits of being right. For monitoring the datacenter most of our alerts will fire if we don’t get the signal. That risks a Type I error, and we accept that for a few reasons. First, even if the DC is ok, the fact that we’re not getting a signal is a problem, and the cost of DC outages is high. Even if it’s not something in our control (e.g. the fiber seeking backhoe strikes again), we still want to know so we can do something about it. Second, the actual cost of the alert is pretty low. Just a phone call, albeit potentially in the middle of the night. Actually, the cost for a single event is low.

The problem is that this is a distributed system. There are latencies. There are networks. There are many reasons why we might not get a datum on time, and if we fired the alert every time that happened we’d quickly succumb to alert fatigue and start ignoring them. That’s not an error, but it is a real problem.

So to avoid that we build some latency into the system. The signal needs to be bad for some time before we fire. The longer the time, the less likely we are to have a Type I error. Unfortunately, the longer the time, the more likely we are to have a Type II error, a false negative. And the cost of those is high. Hundreds of people and thousands of tasks failing. We really don’t want that, so it’s a balance.
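A rough sketch of that "bad for some time before we fire" idea is below. It's not how any particular monitoring system does it. The class name, the `hold_seconds` parameter, and the choice to treat a missing signal the same as a bad one are all assumptions for illustration.

```python
class DebouncedAlert:
    """Fire only after the signal has been bad for at least `hold_seconds`."""

    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds
        self.bad_since = None  # time the current bad streak started, if any

    def observe(self, now: float, signal_ok: bool) -> bool:
        """Record one check at time `now`; return True if the alert should fire."""
        if signal_ok:
            self.bad_since = None  # any good reading resets the streak
            return False
        if self.bad_since is None:
            self.bad_since = now  # first bad (or missing) reading of a new streak
        return now - self.bad_since >= self.hold_seconds
```

`hold_seconds` is the knob for the balance. Turn it up and a late datum is less likely to page anyone (fewer Type I errors), but a real outage goes unreported for at least that long (more Type II errors).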

Your situation might be different. For mission critical safety decisions you might choose to eliminate Type II errors in favor of more Type I errors. It’s not comfortable, but a spurious hard braking event is better than no brakes and hitting something. Other things might be OK to just ignore.

And that completely skips the Type III (the right answer to the wrong problem) and Type IV (the right answer for the wrong reason) errors, but those are topics for another time.

by Leon Rosenshein

Implicit vs. Explicit

"An implicit understanding is anything a skilled geek would have to be told before being set loose in your codebase to make a change. Implicit understandings take lots of different forms, but at their simplest, they often involve hidden correlations."

    -- @GeePawHill

Computers are very literal. They always do exactly what you tell them to do. Even (especially?) when that's not what you want them to do. And yet we often write code that lets us do that.

How many times have you come across a library that requires you to create an object, then use that object to set some properties before you can actually use it? It's a fairly common pattern. And it's an example of implicit requirements.

You need to know that before you can use the object you need to initialize it. That's not an unreasonable thing, but why write a library that makes the user have to remember that? Consider an HTTP request class. It probably has a member that determines if it's a PUT, GET, PATCH, DELETE, etc. And the typical use case looks something like

req = new HttpRequest()
req.Method = HTTP.GET
req.URL = "someurl"
.
.
.
resp = client.Do(req)

That works, but it's possible, and if the code is more complex, easy, to forget to set the Method or URL members. And then at runtime you get some kind of error. So why subject yourself to that kind of delayed error?

You can prevent that kind of error by being explicit. Instead of creating a bare Http request, explicitly create a GET request, and make the creator require a URL. Something like

req = new HttpGetRequest("someurl")
.
.
.
resp = client.Do(req)

With that pattern it's impossible to execute a GET request without setting a URL. 
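Here's one way that could look in runnable form, as a sketch in Python. `HttpGetRequest` and `FakeClient` are hypothetical stand-ins, not classes from any real HTTP library, and the client doesn't touch the network.

```python
class HttpGetRequest:
    """Hypothetical request type: the method is fixed and the URL is required."""

    METHOD = "GET"  # fixed by the type, so there's nothing to forget to set

    def __init__(self, url: str) -> None:
        if not url:
            raise ValueError("a GET request needs a URL")
        self.url = url


class FakeClient:
    """Stand-in for a real HTTP client; it only describes what it would send."""

    def do(self, request: HttpGetRequest) -> str:
        return f"{request.METHOD} {request.url}"
```

Python only checks this at runtime, so leaving out the URL fails when the request is created (`HttpGetRequest()` raises a `TypeError`) instead of later when it's sent. In a compiled language the same design turns the mistake into a compile error.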

So next time you're explaining a feature or bug fix and part of the explanation includes the line "And before you do X, don't forget to do Y", take a look at the code and see if you can turn an implicit requirement into explicit code.

For more examples of implicit requirements and explicit code to remove them, check out this thread.

by Leon Rosenshein

Just No

Intern/work-study is a hard gig. You're dropped into a company, given a few days intro, a mentor and then expected to produce something in a few weeks/months. It can be stressful, and done right the intern learns/grows a lot. I think it's the reason that graduates from Waterloo often do so well.

Years ago I had a discussion with the director of the engineering placement office at a school I was visiting for campus interviews. We were discussing the differences between an intern interview and an FTE interview. We talked about how an internship is in some ways a really long interview. There's no long-term commitment by the company so you can take more of a chance on the candidate. On the other hand, a good internship produces something useful and takes the mentor's time away from their day job. By the time the discussion was over I had just about convinced myself that intern interviews had a higher bar than FTE interviews.

That's not strictly true, but think about it. A campus FTE hire is expected to take a while to come up to speed. Months to be fully on-boarded and fully adding value wouldn't be out of the question, and any FTE hire is a multi-year investment. An intern, on the other hand, is expected to produce something meaningful in 6 to 12 weeks, and while we all want things to work out, there's no commitment beyond that. If nothing else, taking on a non-traditional candidate (someone with little/no coding experience) because you've identified that spark doesn't make much sense if the candidate is going to spend their internship learning to code, but it might be OK for an FTE. I've hired those candidates as FTEs and it's worked out well, but would have been a disaster as an intern.

I say all that as context and to say that I think internships are great for both the employer and the intern. I think everyone should do one. I said that to my kids and they did internships before they graduated and entered the workforce full time. But I also told them that they should do paid internships. If you're not getting paid it's not an internship, it's volunteer work. I also think volunteering is a good thing and my kids did that as well, but don't let anyone tell you the two are the same thing.

Which brings me to my rant. There are some industries, such as the creatives (Hollywood, fashion, music, marketing), law, healthcare, and non-profits, that have traditionally offered unpaid internships. As much as I think that's a bad idea, and potentially illegal, people should at least expect it going in. Engineering, on the other hand, traditionally doesn't do that. And I don't think we should start.

Apparently others think it's a good idea. Places like LambdaSchool, which as near as I can tell, not only gives out its students for a 4 week free trial, but does it as part of the program, so the students are not working for free, they're paying to be sent out as free labor.

And that's just wrong.


by Leon Rosenshein

Hofstadter’s Law

It always takes longer than you expect, even when you take into account Hofstadter’s Law.

     — Hofstadter’s Law

Even if you've never read Gödel, Escher, Bach: An Eternal Golden Braid (GEB) you've probably heard Hofstadter's Law. And like most of GEB, it's a recursive law. That doesn't make it any less relevant though. The real question is how we can acknowledge it and make plans while maintaining our internal honesty.

One (the?) way to do that is to be clear that we're not planning, we're forecasting. Forecasting the weather is far from an exact science, and no-one expects it to be. Long term we have climate. Colorado is sunnier than Seattle. While any given day might be different, over any significant period of time it's a true statement. The 2 week forecast is usually directionally correct, but expecting the daily high/low to be accurate more than 3 days out is asking to be surprised. Similarly, tomorrow's high/low is pretty accurate, but no-one is very surprised if it's 5 degrees off.

Estimating software completion dates shows the same kind of pattern. However, we call them plans or schedules, and that changes expectations. The fact is that for any given person's estimate, the larger in scope/longer the estimate the less accurate it's going to be. If we call them forecasts instead of schedules a couple of important things change.

First, and most important, is that the expectations of accuracy change. Forecasts aren't just off by a consistent percentage, the accuracy gets worse the farther out they are. So when you say something will be done today people expect it to be done today or early tomorrow. If you say it's going to take 6 weeks then people won't be surprised with anything between 3 and 12 weeks. They'll be unhappy, but if you keep them updated they won't be surprised.

Second, is the internal expectation. You expect a forecast to change as much as anyone else, so there's much less internal struggle to change it. If your schedule is wrong you've missed a deadline, but if a forecast is wrong you update it as soon as you have better information and don't feel bad about it. So you're more likely to update it and keep it as accurate as possible. Which, paradoxically, makes it more reliable and better for planning :)

So, next time you're thinking about what you're going to do over the next month/quarter/year, try forecasting it, not scheduling it. Get it directionally correct. Make sure the sequencing is correct (as far as you know now) so you don't end up blocking yourself. And update your forecast as new information becomes available. You'll end up being more accurate that way.

And if you haven't read GEB, take a few weeks and read it. It's worth it.


by Leon Rosenshein

Feedback Please

Back in September I mentioned that O'Reilly was doing an Architectural Kata competition and that I thought it would be fun to get together a team and enter the competition. Well, we did put together a team, and our team, selfdriventeam, was selected as a semi-finalist. You can see our submission on GitHub. The final step is a live presentation to the judges on December 3rd.

My ask is for volunteers to provide feedback on our final presentation. I'd like to do a run-through the week of November 30th/Dec 1st. If you're interested in providing feedback let me know.


by Leon Rosenshein

Thinking Like An Engineer

First, a link to something I posted about 7 months ago. It was relevant then, and it's still relevant now.

That said, there are lots of things that go into thinking like an engineer, but here are a few that I think are important.

  1. Balancing constraints: Everything we do has some sort of constraints. They could be memory, bandwidth, execution time, development time, short vs long term gains, user value, or something else entirely. Our job as engineers is to look at the set of constraints and figure out the best solution to the problem with the information currently available.
  2. Making it practical: The solutions we come up with need to be doable. Part of it is balancing constraints, but it's also about not limiting yourself to the perfect solution when there is a good enough solution that meets all the requirements. If the perfect solution we come up with needs unobtanium then it's not a practical solution and it doesn't count.
  3. Solving a problem: Theoretical physicists do important work and without their additions to the body of knowledge engineers wouldn't be able to build/design things, but theoretical physicists aren't engineers. Sheldon Cooper might get the Nobel prize for his work, but it took Howard Wolowitz to turn things into devices people could use. Also, one of my favorite answers when someone asks me how to use a system in a non-standard way is "What are you really trying to do?" This makes sure that I'm not just solving a problem, I'm solving the right one.
  4. Always learning/teaching: Speaking of "What are you really trying to do?", another reason I like that question is that at least one of us, and often both of us, learns something. I get to understand use cases better, so I can provide a better solution. The person with the original question either learns how to do what they asked or they learn a better way to approach the problem.
    Additionally, as an engineer you recognize that there are other engineers out there working on similar problems. It's great to learn from your own mistakes, but it's even better if you can learn from someone else's. Good engineers stay aware of what's going on in their fields _and_ related fields and figure out how to use that knowledge going forward.
  5. Laziness: Great engineers are lazy. They'll put a lot of effort into something up front so they never have to think about the problem again. They design automation and feedback loops so that proper function is maintained despite changing conditions. In the software world it's things like scripts, crontabs, triggers and redundancy that let us sleep soundly at night.


Of course there are lots of others. What do you think it means to think like an engineer?