
by Leon Rosenshein

On Language

Just a little humor today. 

ACHTUNG

ALLES TURISTEN UND NONTEKNISCHEN LOOKENSPEEPERS!

DAS KOMPUTERMASCHINE IST NICHT FÜR DER GEFINGERPOKEN UND MITTENGRABEN! ODERWISE IST EASY TO SCHNAPPEN DER SPRINGENWERK, BLOWENFUSEN UND POPPENCORKEN MIT SPITZENSPARKEN.

IST NICHT FÜR GEWERKEN BEI DUMMKOPFEN. DER RUBBERNECKEN SIGHTSEEREN KEEPEN DAS COTTONPICKEN HÄNDER IN DAS POCKETS MUSS.

ZO RELAXEN UND WATSCHEN DER BLINKENLICHTEN.


The inspiration for your favorite language:

  • Python:   What if everything was a dict?
  • Java:   What if everything was an object?
  • JavaScript:   What if everything was a dict *and* an object?
  • C:   What if everything was a pointer?
  • APL:   What if everything was an array?
  • Tcl:   What if everything was a string?
  • Prolog:   What if everything was a term?
  • LISP:   What if everything was a pair?
  • Scheme:   What if everything was a function?
  • Haskell:   What if everything was a monad?
  • Assembly:   What if everything was a register?
  • Coq:   What if everything was a type/proposition?
  • COBOL:   WHAT IF EVERYTHING WAS UPPERCASE?
  • C#:   What if everything was like Java, but different?
  • Ruby:   What if everything was monkey patched?
  • Pascal:   BEGIN What if everything was structured? END
  • C++:   What if we added everything to the language?
  • C++11:   What if we forgot to stop adding stuff?
  • Rust:   What if garbage collection didn't exist?
  • Go:   What if we tried designing C a second time?
  • Perl:   What if shell, sed, and awk were one language?
  • Perl6:   What if we took the joke too far?
  • PHP:   What if we wanted to make SQL injection easier?
  • VB:   What if we wanted to allow anyone to program?
  • VB.NET:   What if we wanted to stop them again?
  • Forth:   What if everything was a stack?
  • ColorForth:   What if the stack was green?
  • PostScript:   What if everything was printed at 600dpi?
  • XSLT:   What if everything was an XML element?
  • Make:   What if everything was a dependency?
  • m4:   What if everything was incomprehensibly quoted?
  • Scala:   What if Haskell ran on the JVM?
  • Clojure:   What if LISP ran on the JVM?
  • Lua:   What if game developers got tired of C++?
  • Mathematica:   What if Stephen Wolfram invented everything?
  • Malbolge:   What if there is no god?

Shooting yourself in the foot, language style

What's your favorite? Share in the thread.

by Leon Rosenshein

Priority -1

We all have lots of work to do. During the development phase there's often so much work that at any given moment there is more work to be done than there is time to do it. In what we used to call the test phase (now beta or pre-GA) there are defect reports. So how do you know what to work on next and/or when you're done? The traditional answer is priority. When you're doing feature development there are usually 3 or 4 priorities: P0, P1, P2, and sometimes P3. P0 is critical, P1 is really important, P2 is nice to have, and P3 is thanks for mentioning it, not gonna happen. Bugs are roughly the same. P0 means fix it immediately, P1 means we can't ship/go GA with this problem, P2 means people will be sad and we'll fix it in the first patch, and again P3 is thanks for the report, but …

Of course there are variations on this. There's "Unbreak Now" and "Recall class" bugs. Uber Core Business uses Level and Scope to define an outage, with Higher numbers being bigger problems. And as unusual as it is, in many ways that's better.

Because with any leveling scheme there's inflation. When I started at Microsoft tasks and bugs were P1 - P3. And we argued pretty loudly about it. My bug/feature is more important than yours, so I want to be higher priority. There was a lot of passion and energy, so the argument would continue and eventually end up with both at P1. Then the business would shift a little and suddenly there was a new "Most Important Thing". And instead of having the arguments again, the new thing became the most important. And we called it P0 to make sure everyone knew it was the most important thing.

Of course over the next few cycles, instead of everyone arguing for P1, they argued for P0. In general we held the line for a while, but, as with all things inflationary, we eventually lost the battle, and now P0 is the most important thing. Nothing changed except the labels in the tracking system.

I haven't seen a tool that supports P -1 yet, but I'll bet it's out there. We could do it with GitHub if we wanted to :)

But the real problem is that any leveling scheme breaks down once you have more than a handful of items in your list. Instead of having actual priorities you just have buckets. And when you have buckets of work you don't have a priority list. You can't determine the order of things getting done, because as long as you pull from the right bucket you can do whatever you want and by definition you're doing the right thing. But that's a topic for another day.
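The difference between buckets and a real priority list can be sketched in a few lines. This is a minimal illustration with made-up task names, not any real tracking system; it just shows that a bucket scheme leaves the order within a bucket undefined, while a true priority list forces a total order.

```python
# Hypothetical tasks as (name, priority-bucket) pairs.
tasks = [
    ("fix login crash", 0),
    ("update docs", 2),
    ("rotate certs", 0),
    ("add dark mode", 1),
]

# Bucket scheme: sort by bucket only. Python's sort is stable, so the
# two P0s keep insertion order -- but the scheme itself says nothing
# about which P0 to do first. Any P0 is "the right thing".
by_bucket = sorted(tasks, key=lambda t: t[1])

# A true priority list needs a total order: every task gets a unique rank.
ranked = [("fix login crash", 1), ("rotate certs", 2),
          ("add dark mode", 3), ("update docs", 4)]
by_rank = sorted(ranked, key=lambda t: t[1])

print([t[0] for t in by_bucket][:2])  # two P0s; their relative order is accidental
print([t[0] for t in by_rank][0])     # exactly one "next thing"
```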

by Leon Rosenshein

It Depends

Time for a car analogy. What's the right way to make your car faster? More reliable? More efficient? Have higher resale value?

There's really only one answer to all those questions. And that answer is "It depends." It depends on what your priorities are. It depends on where you're starting. It depends on what you mean by those questions. It depends on how much you can spend to meet your priorities. Does faster mean top speed, trap time in the ¼ mile, or 0-60 time? Is reliability about MTBF, cost to repair, or total downtime? Is efficiency about moving one person from home to office, 50 people from a suburb to an urban core, or moving 400T of stuff from one end of a strip mine to another?

The same is true in software development. Want your software to be faster? Want it to crash less? Use fewer resources? Reduce time to market? If someone comes in with a silver bullet and says they know the right answer to that question a priori, they're almost certainly wrong, and if they happen to be correct, in your exact case, they got lucky.

Sure, we have best practices, and we should probably follow them, but when you get down to it, those best practices are guidelines. If you really have no clue about what you're trying to do and why then best practices are a good place to start, until you know better. And that's the thing.

When you know better you should choose to do the right thing. Because the right thing depends on knowing why you're doing something. Engineering is about tradeoffs, but the only way to make informed decisions is to know what you're trading between, and why. Because *it depends*.

Once you know what you're minimizing and what you're maximizing and what the cost functions are between them, you can get something close to the right answer. For your specific situation. At that particular time. With those particular constraints.

by Leon Rosenshein

Real Engineering

Here’s a question for you. Are you a programmer, developer, computer scientist, software engineer, hardware engineer, or something entirely different? Maybe you’re an artist working in the medium of bits? A data wrangler? Some combination of all of these, depending on the day and the task at hand?

For the last 50 years or so people have been trying to figure out if software development was an art or a science. Or was it engineering? When I was in college there was no such thing as a degree in software engineering. There were specialized electrical engineers that built computers, there were computer scientists that tried to figure out what to do with them, and the rest of us engineers that used them. The math department in the School of Arts and Science had a lot to say too, particularly around formal logic and correctness. But for most of us who were writing programs the computers were tools to do a job. Sometimes we wrote programs to help other people do their jobs, but writing code was almost always in service of some other task. And we treated it that way. Just get it done. Small groups, late nights.

Then I got out into the real world and something changed. I became a “software” engineer instead of a Mechanical and Aerospace engineer. But really, nothing else changed. Then I went to work for a game company, and instead of building software to do something, we built software to sell. And we had deadlines. And we missed them. So we tried to engineer harder. And we still missed our dates. Then I went to work for Microsoft. And they really engineered hard. Waterfall development. Months of planning. Then start doing. Still missed our deadlines a lot, but at least we saw it coming. But it was engineering. Requirements. Design. Plan. Build.

Then came Scrum and Agile and Extreme. Throw all that planning out. Just do something. Figure out the goal along the way. Don’t worry about done, just move fast and adjust as you go. We did ship things more often, but big changes got hard and we never really knew where we were going. It sure didn’t feel like engineering.

So the debate continued. Is it art or science? Craftsmanship or Engineering? Lots of people have thought about it and talked about it. I say it’s engineering. Engineering is not about doing the “perfect” thing. There is no perfect thing. It’s about tradeoffs and dealing with uncertainty and doing the best you can to meet the goals and priorities with what you have available. And one of the best explanations of not only that journey, but where we are now and how we can get even better at the process of what we do, comes from Glenn Vanderburg in his Real Software Engineering talk. It’s about an hour long (45 minutes at 1.5x), but well worth the time.

by Leon Rosenshein

Outdoor Sports

Continuing the string of GlobalOrtho stories: image capture, both aerial and terrestrial, is, just like operating a robot car, an outdoor sport.

At the heart of the GlobalOrtho project was the UltraCam-G. Designed and built by our team in Graz, Austria. Something like 200 MP, taking simultaneous RGB, Monochrome, and NIR images at 30cm resolution for the RGB image. And this camera was tested. Countless flights over Graz and the surrounding areas. Calibrated for physical construction, lens distortion, thermal drift, chromatic aberration and anything else the designers could come up with. The pictures were stunning. The 3D modeling was amazing. Not just 2.5D shells, but full 3D models with undercuts and holes. So we sent it out into the field.

And the feedlots were purple. The edges of the images were red. As I mentioned the other day there were spikes and holes. How could this have happened? These cameras were tested. Over and over again. And all the tests came back great. We sent one back for recalibration, but the before and after results showed no change, and the test images were spot on.

So we kept digging. And we realized a few things. Color balance. It turns out that Graz and the surrounding areas are Austrian Alps (who would have guessed). Lots of alpine forests and orange tiled roofs. And the software did great in those areas. But there aren't a lot of feedlots. And color correction was done in a lab. Yes, we used sunlight equivalent lighting, but the room was a few meters deep. Outside there were cloudy days, dusty days, humid days, and in some places smoggy days. Plus, the camera flew at 5000m, and with a +/-40° FOV, the amount of air between the camera and the ground was very different between the center of the image and the edge.

Geometry. Lots of church steeples and building corners. But no mile-square corn fields with waving stalks. Or pastures with walking cows. Or large lakes. Or high-rise urban cores with deep canyons. Lots of environments that weren't part of the test set. And the software struggled.

Why? Because even though we captured hundreds of thousands of test images and ran hundreds of test jobs, they were all basically in the same operational domain. For all the hours we spent testing, we really only ran a few tests. Then we got out into the real world and the situations were different. So we had to evolve. Make things more dynamic and adaptive. Because that's the way the world is.

by Leon Rosenshein

Murder Mystery Theater - Acting All Roles

-- By Andrew Gemmel

There’s been a mild annoyance bothering developers on our team - and likely others - for a few months now. Occasionally the ssh-agent on development machines will die. Needing a ussh cert for most remote actions, the remedy for one terminal session is a quick eval $(ssh-agent) or more permanently, restarting the machine. We all chalked it up to a bad chef configuration or similar, at least until today.

Today, @mike.deats was debugging a separate IDE issue on his machine and noticed something odd. Without fail, he could reproduce this issue by running all tests in the atg-services repo. Ok, that’s disconcerting. A quick bisect effort isolated the problem to a single Golang package. One that I had written. Heavily unit tested, in fact notoriously so. This package is the taskhost program for the BatchAPI. If you’ve ever run a BatchAPI job, you can thank this code for its success. 

The taskhost is the thin wrapper between kubernetes and your user code that reports any issues back to the BatchAPI and ensures that your logs end up in the right place. The tests for this program basically mimic various job scenarios in kubernetes, kicking off a number of taskhost processes masquerading as docker containers and observing the state of the filesystem and output streams that result. 

In order to do this in a test environment, the taskhost always interacts with the outside world through a dependency injection context that provides things like a filesystem, log writers, AWS clients, and a shell process runner. 

Or at least, that was true until pull request 981 was landed. This was a late-night code change that I deployed while on-call to mitigate an outage. Long story short, an issue with the rNA log-reader was overwhelming the disks in our cluster and causing machines to hit 100% disk usage and get wedged. To mitigate this, that change deletes the log-reader cache in /tmp between each BatchAPI task run.

If you read through that PR carefully, you’ll notice that the RemoveContents() function I so carefully copied and pasted from StackOverflow does not use the dependency injection filesystem. That’s right, every single time the taskhost unit tests run on a machine, they delete everything in /tmp on the user’s machine, including the ssh agent’s ussh cert.

Wow. It’s a miracle that killing ssh agents was the worst thing that this mistake did. The corresponding fix was as simple as deleting that mitigation code, as the underlying log-reader problem has long since been remedied.

There are a few lessons here. First, hot fix code is a necessary evil, but checking it in without careful audit is A Bad Thing. Second, when that evil code is checked in, a ticket to ensure it’s removed as soon as possible would be A Good Thing. Third, debugging can often become a game of murder mystery theater where you are not only the detective, but the murderer and victim too.

by Leon Rosenshein

GIGO

Even the greatest algorithm can't correct for bad data. Ever hear of photogrammetry? Probably. It's using images to understand the physical world. We use it to map the world. Using stereoscopic techniques and two (or more) pictures of a scene from a known position, you can extract 3D information. Roughly speaking you find points in each image that are the same thing, then, correcting for all sorts of distortions, use the difference in camera locations and the directions to the point from each camera to calculate the position of that point relative to the cameras. Do that for enough points and you get a depth map. One way to find those points is with the SIFT algorithm. It's really nice because it handles differences in scale and orientation. And with our SDVs the images are taken at the same time, so the world hasn't changed between the images.
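The triangulation step above can be sketched with the textbook depth-from-disparity relation for an idealized, rectified stereo pair: Z = f · B / d, where f is the focal length in pixels, B is the baseline between the cameras, and d is the disparity of the matched point. This is a hedged simplification with made-up numbers, not the GlobalOrtho pipeline, which first corrects for all the distortions mentioned above.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a matched point for an idealized, rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("a matched point must have positive disparity")
    return focal_px * baseline_m / disparity_px

# A SIFT-matched point 50 px apart between two cameras 0.5 m apart,
# with a 1000 px focal length, sits 10 m away.
print(depth_from_disparity(1000.0, 0.5, 50.0))  # 10.0
```

Note what the formula implies: a small error in the match (the cow that walked a few pixels) shifts d, and the computed depth moves accordingly, which is exactly where the spikes and wells described below come from.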

For aerial photography that isn't the case. Typically there's one airplane, with one camera, flying over the area, taking one picture at a time, then looping around and flying a parallel track slightly offset. Repeat this pattern all day. To make the needed stereo pairs, images are taken with lots of overlap, typically 80+% in the direction of flight, and 20+% between image strips. Using differential GPS, some Kalman filters, and lots of math, you can get pretty good location info for where the camera was when the image was taken, so that part is covered.

What isn't covered is that the world changes. Trees blow in the wind. Cars move. Waves wash up on the shore. Cows walk.

As part of the Global Ortho project we mapped the continental US and Western Europe with 30 cm imagery and generated a 2.5D surface map with about 4 meter resolution. We did this by splitting the target areas into 1° cells and collecting and processing data in those chunks. Turns out that flying each track, then turning around and flying back takes a few minutes. That means that pictures taken at the beginning of one strip and the end of the next can be 3-5 minutes apart in time.

And lots can happen in that time. Fast things, like planes, trains, and automobiles have moved far enough that the SIFT algorithm doesn't try to match them across images. Things that don't move far, like treetops blowing in the wind get lost in the image resolution. But things that move slowly, but keep going have a wonderful effect. Remember that cow that was walking? It probably gets the same SIFT id since it's a 3x5 black spot against a green pasture. And it didn't move that far, so it gets matched with the one from 3 minutes ago. The same thing happens with whitecaps on open water. Then we triangulate. And depending on which way it moved, you either get a spike or a well in the surface model. All because the cows don't stand still.

And those spikes kept lots of folks employed. Their job was to look at the model, find anomalies, then go into a 3D modeling program, and pound them flat. Yes, we gave them tools to find the issues and we did automatic fixup where we could, but we still needed eyes on all of the data to make sure it was good. All because a cow thought that patch of grass over there looked better. Which meant our data was a little messy. And the automation didn't understand messy data.

So keep your data clean. The earlier you identify/fix/remove bad data the better your results, the less manual correction and explaining of what happened you need to do, and the more your results will be trusted.

by Leon Rosenshein

Test It Again Sam


Unit tests, integration tests, black box tests, end to end tests, user tests, test driven development, demo days. There are lots of kinds of tests. And they all can provide value. But only if you run the right tests at the right time. And as with so many things, it comes back to context, scope, and scale.

You want to have enough inputs to test that it works, that the different combinations of flags/features/datasets all work together the way you expect. But not just that the correct cases are handled correctly. You need to test that you detect and provide useful error information if the inputs don't make sense, you can't handle them, or something goes wrong during processing. That's the context part.

For scope, you want to run just enough code to test the system under test. There's lots of range to scope. From an individual algorithm to a class/package/executable to a service/distributed service/ecosystem. And your tests and framework need to reflect that.

If you're testing an algorithm then write the algorithm and enough code around it to test that it works per the above. Mock out everything but the algorithm. Provide the data in the expected format. Know what the answer is supposed to be. Remember what you're testing (the algorithm). These kinds of tests are generally called unit tests.
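A minimal sketch of that idea, with a hypothetical algorithm (a tiny moving average) standing in for whatever you're actually testing. The point is the shape of the test: known inputs, known answers, and error cases that must fail loudly, per the context part above.

```python
def moving_average(values, window):
    """Return the sliding-window mean of values; the algorithm under test."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Correct cases: data in the expected format, answer known in advance.
assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]
assert moving_average([2, 2, 2], 3) == [2.0]

# Error cases: inputs that don't make sense must be detected, not ignored.
try:
    moving_average([1, 2], 5)
    assert False, "expected ValueError"
except ValueError:
    pass
```

Everything else is mocked away by construction: there is no filesystem, no network, no clock, just the algorithm and its contract.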

Unit tests can also have a slightly bigger scope. If you're testing the external interface of a class/library/exe then you need to provide enough environment around it to run, but you need to control the environment. This isn't the time to run against the live dB in production. You don't want to upset the production system, and it's hard to make it respond consistently to a test. You want to provide enough constraints so that you're sure what you're testing and that when there's a failure you know where to look.

The next step in scope is the integration test. This is where you're making sure that two things that you know work "correctly" (however that's defined) by themselves work well together. In the Bing GlobalOrtho project we spent a lot of time using WGS84 coordinates. We threw around a lot of latitudes and longitudes. We did this in the image stitcher and the Poisson color blender. And all of the unit tests worked. Perfect. Let's hook these things together. And it worked. Mostly. But the further east/west we went the weirder it got. Then all of a sudden things started crashing. Turns out some things took in latitude, longitude; others took longitude, latitude. It was only during integration that we found the problem. And of course, you need a more complex system to do integration testing, but it's still not the full thing.
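One way to make that class of bug impossible is to pass a named coordinate type instead of two bare floats, so a swapped pair fails at construction time instead of far east/west in an integration run. This is a hedged sketch; the type and the stand-in stitcher function are illustrative, not from the GlobalOrtho codebase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LatLon:
    lat: float  # degrees, -90..90
    lon: float  # degrees, -180..180

    def __post_init__(self):
        if not -90 <= self.lat <= 90:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180 <= self.lon <= 180:
            raise ValueError(f"longitude out of range: {self.lon}")

def stitch_at(point: LatLon) -> str:
    # Stand-in for a stitcher API that can no longer swap its arguments.
    return f"stitching at lat={point.lat}, lon={point.lon}"

print(stitch_at(LatLon(lat=47.6, lon=-122.3)))

# Accidentally swapping the pair now blows up immediately:
# LatLon(lat=-122.3, lon=47.6)  -> ValueError: latitude out of range
```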

Then there are end-2-end tests. *That's* where you run the whole thing, in something not entirely unlike the production environment. With known inputs. Expecting known outputs. Really good for making sure nothing has broken, but not good at all for telling you what went wrong. In Global Ortho when the color of the output images changed by more than a certain amount we first had to figure out why. And that usually took longer than the actual fix. But again, without that kind of testing we never would have known.

So what kind of testing is there after end-2-end? You've run out of scope, but now you get to scale. There are a few kinds of scale. Maybe your system can handle blending 50 images of roughly the same place, but what if you have 1000? Or 10,000? Or your system behaves correctly at 100 Queries/Sec (QPS), but sometimes you get 10,000 QPS or more? What happens when your dataset grows by 10x? 1000x? More? What about parallelism? Breaking things into 10 pieces might cut your time by almost 90%, but at 100 pieces it fails or takes longer.
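That last point about parallelism can be sketched with Amdahl's law plus a fixed per-piece coordination cost. The 2% serial fraction and the overhead number here are made up for illustration; the shape of the curve is the point.

```python
def total_time(serial_frac, pieces, overhead_per_piece=0.005):
    """Normalized run time when the parallel part is split into `pieces`,
    with a fixed coordination cost per piece (Amdahl's law + overhead)."""
    parallel_frac = 1 - serial_frac
    return serial_frac + parallel_frac / pieces + overhead_per_piece * pieces

t1 = total_time(0.02, 1, 0.0)  # baseline: no split, no overhead
t10 = total_time(0.02, 10)     # big win: saves most of the time
t100 = total_time(0.02, 100)   # coordination overhead starts winning

print(t1, t10, t100)  # 10 pieces is much faster; 100 is slower than 10
```

With these (invented) numbers, 10 pieces cuts the time by over 80%, while 100 pieces spends so much on coordination that it's slower than 10, which is the behavior described above.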

Then there's the kind of scale that describes the test space. Your system does the right thing in a few cases, but there's a combinatorial explosion of possible cases and there are millions of tests to run. How do you scale to that?

Then there's black box testing. Go outside your system. Act like a user. Using an entirely different mechanism, test what you're doing with no knowledge of the system other than the external APIs. Even here there are two kinds of tests. Those that make sure things work right, and those that make sure things don't break. Because those are two very different things. And remember, as Bill Gates saw 20+ years ago, even with all the testing, sometimes things go worng

by Leon Rosenshein

Syntax Matters

But memorizing all of the possible syntaxes (syntaxi?) doesn't. In my career I've spent months/years with _at least_ Ada, Assembly (x86), Bash, Basic, Csh, C/C++, C#, Fortran (4/77), Golang, HTML, Java, Javascript, Pascal, Perl, Python, Scala, SQL, and VisualBasic (v6 and VBA). Then there are "config" file formats: css, ini, json, xml, and yaml. What about .bzl, .csv, .proto, and .thrift? What about your favorite DSL? Are they config files? Languages? Who knows? Who cares?

Can I sit down in front of a compiler and pound out syntactically correct code in all those languages today? Not even close. I could manage "Hello World" in most of them, with a little help from the compiler/interpreter, but with others (Ada) I don't even remember where to begin, other than there's a header that defines everything and a separate implementation.

And that's OK. The important thing is to be able to read what's there, understand what the impact is, and understand the structures and data flow well enough to make the change you want without having unintended impact on something else. And in any sufficiently large system the syntax can't tell you that. It can hint, it can guide, but it can't tell you what the class/package/method/library in the next file is actually doing.

Plus, there are lots of good resources available online to help with the syntax part. Between them and your IDE, memorizing where to put a ;, the order of method parameters, or whether it's int foo; or var foo int isn't the best use of your time.

So focus on the important things. Understanding the code in front of you. Writing code that the next person can understand. Thinking about WHY you're doing the thing you're doing and if there is a better, more systemic solution. And look up the syntax when you need it.

by Leon Rosenshein

Rubber Ducky, You're The One

On the silver lining front, one nice thing about WDP is that I get to spend more time with my kids. My daughter has taken to sitting with me on and off during the day, sometimes doing her schoolwork, sometimes watching videos, and sometimes being my debugging aid.

The other day she noticed I was arguing with my computer, doing some Google searches, then yelling (quietly) at my computer again. After she got over being surprised that I was using Google to figure things out I started explaining to her what I was trying to do. I was writing a bash script to get the members of an LDAP group and then see which members of that group weren't in a different group. Sounds simple, right? Conceptually, yes, but I wanted to be able to share the code, so I was making it a little more "production ready" than I might otherwise have. It also involved some relatively simple usage of jq to extract some fields and I wanted to pretty print the results in a way I could pipe into the next part of the chain. And things weren't going exactly how I wanted.
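The core of that script, which members of one group aren't in another, boils down to a set difference. Here's a hedged sketch in Python rather than bash/jq; the group names and JSON shape are hypothetical stand-ins for whatever the LDAP-backed services actually return.

```python
import json

# Pretend these are the JSON responses from two group-membership queries.
ldap_response_a = json.loads('{"members": ["ada", "grace", "linus"]}')
ldap_response_b = json.loads('{"members": ["grace", "ken"]}')

# Members of group A that aren't in group B.
missing = sorted(set(ldap_response_a["members"]) - set(ldap_response_b["members"]))

print("\n".join(missing))  # one name per line, ready to pipe into the next tool
```

The production-ready version adds error handling, auth, and pretty printing, which is exactly where all the quoting and /dev/null discussion below came from.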

So I explained to her the services I was calling, what I expected the results to be, and what I wanted to extract. I explained the weird symbology of bash variables and why there were single quotes, double quotes, pipes and what a /dev/null was. I told her what cerberus was and why I needed to use it. I even complained a little about yab and YARPC and why I wished I didn't have to use it. She asked me some questions and I explained the answers to her. And I got it figured out, got the results I needed, and was able to share the tool and the results I needed. Then I thanked her for being my rubber duck. Initially that confused her even more, but when I explained rubber duck debugging she got that immediately.

For those that don't know, rubber duck debugging is how you do pair programming when you're alone. You explain the problem, the invariants, the processes and the intermediate results to something, traditionally a rubber duck. And you go into as much detail as you need to make sure the duck understands it. What happens quite often is that you realize where your assumptions and understanding don't match reality. It could be a problem with your memory, the documentation, or something else entirely, but you find the disconnect, and you fix it. Or you find the disconnect and you go update your understanding and then you fix it. And even if that doesn't happen your understanding of the problem goes way up and you can then ask a much better question, which means you're much more likely to get an answer that helps. So next time you run into a problem and get stuck, ask a rubber duck.