
May 7, 2020 by Leon Rosenshein

How Much Config Is Enough?

it depends

Pop quiz. You're building an application. What parameters do you put in your config file? The answer, like the answer to any other architectural question, is "It depends." On one extreme you spend a lot of time with your customers and bake everything into the application. Everything, from the name on the splash screen to the fully qualified on-disk path to the splash screen image, is compiled right into the binary. On the other extreme, the only thing the application knows how to do is read a configuration file that tells it where to find the dynamic libraries and custom DSL files full of business logic that actually do anything.

Like any other tradeoff, the right answer lives somewhere between those two extremes, and is tightly coupled to your customer(s). As a rule of thumb, the bigger the spread of customers, whether it's scale (whatever that means), domain, expertise, commonality of use cases, etc., the more configurable you need to be. A simple single-user personal checkbook application doesn't need much configuration. Some user info, some financial institution info, some categorization, and maybe a couple of other things. A US tax preparation application, on the other hand, needs all that, plus a way to handle the different states and rules, including ones that change or get clarified after you ship.

So how do you approach the configuration challenge? First, think about your customer and what they need to do. What do they need to customize, and how often does it change? Can you come up with a sensible default for your majority case? Is there a convention you can follow for that default? Maybe the right answer is a wizard that runs the first time (and maybe on demand) and walks the user through the setup. One nice thing about a wizard is that you can validate that the configuration chosen makes sense.

Another thing to think about is whether the defaults live in the config file or in the application. If they're in the file, it's at least obvious what the knobs are, but then you get folks poking at them just to see what they do. And what if your customers don't make backups? They think they know what they want and change the default. How do they get it back? Maybe a better idea is a base configuration file they can't change, plus an override file. Safer, and probably easier to recover, but each level of override makes it that much harder to know what the actual configuration is.
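As a rough sketch of the base-plus-override idea, here is one way it could look in Python. Everything in it is an assumption made for illustration: the keys in `BASE_CONFIG`, the JSON format, and the `merge` and `load_config` helpers are all hypothetical. It also records which layer set each value, which is one way to take the edge off the "what is the actual configuration?" problem.

```python
import json

# Hypothetical defaults that ship with the application; users never edit these.
BASE_CONFIG = {
    "splash": {"title": "My App", "image": "splash.png"},
    "autosave_minutes": 10,
}


def merge(base, override, source, origins, prefix=""):
    """Return a copy of base updated with override.

    origins maps each dotted leaf key to the name of the layer that last set it.
    """
    merged = dict(base)
    for key, value in override.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            existing = merged.get(key)
            if not isinstance(existing, dict):
                existing = {}
            merged[key] = merge(existing, value, source, origins, path + ".")
        else:
            merged[key] = value
            origins[path] = source
    return merged


def load_config(override_text):
    """Apply the base layer, then a user override layer given as JSON text."""
    origins = {}
    config = merge({}, BASE_CONFIG, "base", origins)
    config = merge(config, json.loads(override_text), "user", origins)
    return config, origins


config, origins = load_config('{"splash": {"title": "Acme Checkbook"}}')
print(config)
print(origins)
```

With this layout, recovering the defaults is just a matter of deleting the override file (here, passing `"{}"`), and `origins` answers "who set this?" for every knob.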

But is a local configuration file the right place to store that information? In most cases, probably, but what about the enterprise? How do you centralize configuration? How do you keep things in sync? Central databases, Flagr, Puppet/Chef and pushed configuration? Semi-random user triggered pulls (`update-uber-home.sh` anyone)? All options, and unless you really understand the user problem you're trying to solve, you can't make the tradeoffs, let alone come up with the best answer.

When it comes time to figure out how to configure your configuration, it depends.

May 6, 2020 by Leon Rosenshein

Your Bug, Your Customer's Feature

It's not a bug, it's a feature. We've all heard it. We've probably all said it. But think about it. What exactly is a feature? It's the application doing something the customer wants, when the customer wants it done. It doesn't matter if that's what you expected. It doesn't matter if that's what the spec called for or the PM really wanted. It's fulfilling a customer need. And to the customer, that's a feature.

Back when I was working on Falcon 4.0 we kept a lot of internal display information in a couple of buffers. To make runtime debugging easier we turned those buffers into a shared memory segment and would run external tools to display it. It was so useful we created some additional structures and added other internal state. Really handy and let us do things like write to a memory mapped bitmap display (a Hercules graphics card) while DirectX had the main monitor.

The "bug" was leaving it on when we sent out beta versions to testers. Our testers were a creative bunch. Many of them had homemade cockpits. And multiple displays. One of them even had physical gauges that they used with someone else's product. And they found our shared memory. Then they decoded part of it. They used it to drive their own displays and gauges. Then, they thanked us for building such a useful feature and asked if the documentation of the memory layout was ready yet. And suddenly an internal debugging aid that slipped out to our beta testers became a feature. In this case it wasn't a big deal and didn't add a lot of support burden, but it could have.

When I worked on MS FlightSim, backward compatibility was a major feature of every release, and not just for the "supported" features. As a "platform" we chose to support all add-ons developed with the last 2 or 3 versions, and most of the rest, regardless of which bug, corner case, or side effect the add-on used. It made for great PR, but it was a lot of work.

So be careful what you allow to slip out. It will come back to haunt you.

NB: This is the kind of thing your users can do with your "bugs"

https://github.com/lightningviper/lightningstools/blob/master/src/F4SharedMem/FlightData.cs (Lines 177-229 are definitely from my original shared mem structure. I think the next 30 or so are. After that it's aftermarket stuff)

May 5, 2020 by Leon Rosenshein

Disk Errors

best practices

I've been working on datacenters and parallel computing for over 10 years now. Back in 2007 we had about 100PB of spinning disk and had our own home-grown distributed file system. Roughly equivalent to HDFS, but of course Windows (NTFS) based since we were part of Microsoft. We had filesets (roughly equivalent to an HDFS file) composed of individual NTFS files (blocks in HDFS terms), and instead of a namenode/journalnode we had a monstrous MS-SQL server that kept track of everything. And, like HDFS, we had redundancy (nominally 3 copies) and that usually worked. But sometimes it didn't. And we couldn't figure out why.

Luckily, as part of Microsoft we had access to the folks that actually wrote NTFS and the drivers. So we talked to them. We explained what we were doing and why. Number of hosts. Number of disk drives. Number of files and filesets. Average file size. What our data rates were (read and write). We talked about our pipelines and validation. We listened to best practices. We learned that no one else was pushing Windows Server that hard with that many disks per node. And all through the discussion there was one guy in the back of the room scribbling away furiously.

After about 30 minutes of this, he picks his head up and says, "You have at least 2 problems. First, at those data rates, at least 7 times a day you're going to write a zero or one to the disk, and it's not going to stick. Second, in your quest for speed in validation you're validating the disk's on-board write-through cache, not the actual data on disk."

We'd run into one of the wonders of statistics. All disk writes are probabilistic, but we were pushing so much data that we were pretty much guaranteed to have multiple write errors a day. And then we compounded the problem by doing a bad job of validating. So we fixed the validation problem (not much we could do about physics) and things got a lot better. After that, over 10 years and scaling up to 330 PB of data, we only lost one file to anything other than people deleting files by mistake. Human error is a story for another day.

But that didn't solve all of our problems. We solved the data loss problem, but not the availability problem. That was a whole different problem. That one was caused because we were too smart for ourselves. For safety we distributed our files randomly across the cluster. And many of our filesets contained hundreds of files, so we made sure we didn't put them together. That way when we lost a node we didn't take out all of the replicas. Sounds good so far. But that caused its own problems.

We had about 2000 nodes. How do you do a rolling reboot of 2000 nodes when you can only take down 2 at a time? You do it 2 at a time 1000 times. At 10 minutes a reboot that's 10000 minutes, 166+ hours if everything goes smoothly. 20+ days if you do it during working hours. That sucks. Also, disks popped like popcorn. Power supplies failed. Motherboards died. On any given day we'd have 10-20 nodes down. You wouldn't think that was a problem, but the way we did things, filesets were the unit of availability, not files, so if one file was broken then that instance of the fileset was down. And 3 different files on three different machines could, and often did, make all 3 versions of a fileset unavailable.
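To get a feel for why a handful of down nodes hurt so much, here is a back-of-the-envelope estimate in Python. It is only a sketch under loud assumptions: 2000 nodes comes from the text, but 15 nodes down (the middle of "10-20") and 200 files per fileset (one reading of "hundreds") are picked for illustration, each file of each copy is treated as landing on an independent, uniformly random node, and the three copies are treated as independent of each other. The function names are made up for this example.

```python
def copy_down_probability(nodes_total, nodes_down, files_per_fileset):
    """Chance that at least one file of a single fileset copy sits on a down node."""
    p_node_up = 1 - nodes_down / nodes_total
    return 1 - p_node_up ** files_per_fileset


def fileset_unavailable_probability(nodes_total, nodes_down, files_per_fileset, copies=3):
    """Chance that every copy of the fileset is down, treating copies as independent."""
    return copy_down_probability(nodes_total, nodes_down, files_per_fileset) ** copies


# 2000 nodes is from the text; 15 down and 200 files per fileset are assumed values.
one_copy = copy_down_probability(2000, 15, 200)
all_copies = fileset_unavailable_probability(2000, 15, 200)
print(f"one copy down: {one_copy:.2f}, all three copies down: {all_copies:.2f}")
```

Under these assumptions a single copy is down most of the time, and all three copies are down a bit less than half the time, even though fewer than 1% of the nodes are out. The real numbers depended on the actual placement, but the shape of the problem is the same.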

So how do you solve those problems? With intelligent placement. Since we had 3 copies of every file we didn't spread things out randomly, we made triplets. 3 nodes that looked exactly like each other. You could reboot ⅓ of the nodes at once. Do the whole lab in a morning. When a node went down we fixed it, and the other two carried the load. If we ever had 2 nodes in a triplet down, ops knew and jumped all over it. And that's how we solved the availability problem.
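The reboot arithmetic for both layouts can be checked in a few lines of Python. This is a sketch, not how the lab was actually scheduled: the node count and the 10 minutes per reboot come from the text, the 8-hour working day is an assumption, and `rolling_reboot_minutes` is a made-up helper. It counts pure reboot time only, so the triplet figure is a lower bound that ignores any checks between batches.

```python
import math


def rolling_reboot_minutes(nodes, batch_size, minutes_per_reboot=10):
    """Total minutes to reboot every node when batch_size nodes go down together."""
    batches = math.ceil(nodes / batch_size)
    return batches * minutes_per_reboot


NODES = 2000  # "about 2000" in the text
WORKING_HOURS_PER_DAY = 8  # assumption, not stated in the text

# Random placement: only 2 nodes may be down at once.
pairs = rolling_reboot_minutes(NODES, batch_size=2)
pair_hours = pairs / 60
print(pairs, "minutes,", round(pair_hours, 1), "hours,",
      round(pair_hours / WORKING_HOURS_PER_DAY, 1), "working days")

# Triplets: one member of every triplet at a time, so a third of the fleet per batch.
thirds = rolling_reboot_minutes(NODES, batch_size=math.ceil(NODES / 3))
print(thirds, "minutes")
```

Going from a thousand batches to three is the whole win: the per-node cost is unchanged, but the placement scheme decides how many nodes are safe to take down together.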

So what's the lesson here? We ran into issues no one had ever seen. Issues at all sorts of levels. From the firmware level on the hard drive to the macro level of how you manage fleets of that size. Could we have foreseen them? Possibly, but we didn't. We had to recognize them as they occurred and deal with them. Because scale is a problem all its own. The law of large numbers turns the possible or occasional problem into a daily event that you just have to deal with. It's no longer an exception. It's just another state in the graph.

May 4, 2020 by Leon Rosenshein

Star Wars Security

Besides turning datacenters on and off, making them more efficient, and helping users take advantage of them, the other thing that's been occupying my time is coming up with a way to figure out who's making that call in the DC and if they are really allowed to do it. Otherwise known as Identity and Access Management, or IAM.

I could go into a lot of boring detail about cryptographically secure messages, the basics of TLS, X.509 certificates, OAuth2, RSA, elliptic curves or some other buzzword, but not today. Today is Star Wars Day, so in honor of that, here's what happens to all 3 versions of your super-weapon when you don't take security seriously, Star Wars style.

If you trust me, drop this in your terminal:

telnet towel.blinkenlights.nl

If you don't, there's http://www.asciimation.co.nz/

May 1, 2020 by Leon Rosenshein

What's In A URI?

The other day I came across an article that asked if you could identify every part of a URL, then listed the 6 parts of a URL, scheme://domain:port/path?query=string#anchor. Now that's not wrong, in that those 6 parts, put together that way do make up a valid URL, but that's hardly all of the story. In the article's defense, it does say there are other parts and that those are the most common, but if you went into an interview and insisted that was the definition of a URL you wouldn't be "acing" the interview.

In reality, URIs (which include URLs) are made up of 5 parts, scheme:[//authority]path[?query][#fragment], with each of those having its own definition. Scheme and path are required (although the path can be empty), but authority, query, and fragment are optional.

Yes, domain:port is one example of an authority, but so is bob:Password@contoso.com:6543. You won't see that very often, but it is valid. Between the scheme and the path there are some number of /'s. Sometimes (usually) there's 2 of them, but sometimes there's 3, and occasionally 0. I'm pretty sure 1 / is also valid, but I'm not actually sure. And you can have pairs of them inside the path section. It only looks like an on-disk path. It's not.

According to the spec, query is pretty much open. It's a string. By convention, it's a set of key-value pairs, but that's not required. Fragment is even less clearly defined since it's defined as a sub-resource inside the main URI, so it's totally dependent on the scheme.

And then there's character set. A URI is basically limited to [a-zA-Z0-9._~-] with a bunch of caveats depending on which part of the URI you're talking about. Any other character needs to be encoded.

So, as noted, URIs are complicated. And hard to parse correctly. The solution? Don't. You'll only get it wrong and get tripped up later. Use the parser built into your language or find an appropriate library. For C++ that seems to be cpp_netlib_uri, and for Python, urllib. For Golang/Java/C# (anyone actually using C#?), there are great implementations in the standard library.
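For the Python case, a short sketch of what leaning on the standard library looks like. It uses `urlsplit`, `parse_qs`, and `quote` from `urllib.parse`; the example URIs themselves are made up for illustration, built from the pieces mentioned above.

```python
from urllib.parse import parse_qs, quote, urlsplit

# An authority with user info, host, and port, plus a query and a fragment.
parts = urlsplit("https://bob:Password@contoso.com:6543/some/path?query=string#anchor")
print(parts.scheme)    # the scheme, without the colon
print(parts.netloc)    # the whole authority component
print(parts.username, parts.password, parts.hostname, parts.port)
print(parts.path, parts.query, parts.fragment)

# By convention the query is key-value pairs, so there is a helper for that.
print(parse_qs(parts.query))

# No authority at all: zero slashes after the scheme.
mail = urlsplit("mailto:someone@example.com")
print(mail.scheme, repr(mail.netloc), mail.path)

# Three slashes: an empty authority followed by an absolute path.
local = urlsplit("file:///etc/hosts")
print(local.scheme, repr(local.netloc), local.path)

# Characters outside the allowed set get percent-encoded.
print(quote("a b&c"))
```

The point is less the specific calls than the division of labor: the library deals with the optional parts and the encoding rules, and the calling code only ever sees named components.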

April 30, 2020 by Leon Rosenshein

Else, Switch, And Map

cognitive load

Back when I was in 7th grade I entered some kind of scholastic programming competition. No idea what it was called, or what most of the tasks were. I do remember it was 3 hours on a Sunday afternoon, we were using Basic on a TRS-80, and one of the tasks was to come up with a frequency chart of characters in a block of text. The code I submitted looked a lot like

    DIM A as string
    DIM I as integer
    DIM C as string
    DIM N(26) as integer
    INPUT A
    FOR I = 1 to LEN (A)
      C = MID(A, I, 1)
      IF C <> "A" AND C <> "a" THEN GOTO LETTER_B
      N(0) = N(0) + 1
    LETTER_B:
      IF C <> "B" AND C <> "b" THEN GOTO LETTER_C
      N(1) = N(1) + 1
    LETTER_C:
      IF C <> "C" AND C <> "c" THEN GOTO LETTER_D
      N(2) = N(2) + 1

    ° ° °

    LETTER_Z:
      IF C <> "Z" AND C <> "z" THEN GOTO NEXT_I
      N(25) = N(25) + 1
    NEXT_I:
    NEXT I
    FOR I = 0 to 25
      PRINT "There were"; N(I); " "; CHR(65+I); "'s"
    NEXT I

Hey, it's ugly, but it worked, or at least it mostly did. Lots of copypasta and lots of copypasta errors. In the comparisons, the indices, and the GOTOs. But that was early in my career. Not long after that I realized I could have just used the ASC() function to get the ASCII value of the character and used that (suitably adjusted) as the index. So all those sequential IFs turn into a simple set of assignments. With a more modern language a map of character or string to integer would be even easier. The body of the loop turns into N[ToUpper(C)]++. Much easier to read, much less error-prone to type, and handles things that aren't alphabetic characters to boot. My original attempt didn't crash on numbers or punctuation, but it didn't count them either.
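Both of those ideas fit in a few lines of Python. This is a sketch for comparison, not a port of the original program: `letter_counts_array` mirrors the ASC()-as-index approach, `letter_frequencies` is the map version, and both function names are made up here.

```python
from collections import Counter


def letter_counts_array(text):
    """ASC()-style version: use the character code, suitably adjusted, as the index."""
    counts = [0] * 26
    for ch in text.upper():
        index = ord(ch) - ord("A")
        if 0 <= index < 26:  # anything that is not A-Z is skipped, as in the original
            counts[index] += 1
    return counts


def letter_frequencies(text):
    """Map version: every character gets counted, not only letters."""
    return Counter(ch.upper() for ch in text)


array_counts = letter_counts_array("Hello, World!")
map_counts = letter_frequencies("Hello, World!")

for ch, n in sorted(map_counts.items()):
    print(f"There were {n} {ch!r}'s")
```

The array version still needs the range check, while the map version has no special cases at all, which is most of the argument for letting the data structure do the work.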

The point of this isn't that my code wasn't very clean 40 years ago (it wasn't), but that while you can take a very procedural approach to coding, a better choice is almost always to let the data and data structures guide you. Cascading if's are rarely a good idea. For smaller numbers of choices consider a switch. That can make things a lot easier to read at least.

For multi-dimensional things, you could do nested switches or methods. One approach I like in such cases is a multi-dimensional map. Let's say you need to calculate something which is a function of color, width, shape, and language. You could have one function with all of the inputs and a bunch of internal logic, some of which might be very different (shape/language differences). You could come up with a class hierarchy that handles it, create the classes, make a factory, and use it. Or, write the functions that are different, use the inputs and a map, and decide which one to call. In pseudo-code something like map[shape][language](func (c color, w linewidth, s shape, l language)), and use

    calculator := map[shape][language] {
        {CIRCLE, ENGLISH, CalcCircleRomance},
        {TRIANGLE, ENGLISH, CalcTriangleRomance},
        {CIRCLE, FRENCH, CalcCircleRomance},
        {TRIANGLE, FRENCH, CalcTriangleRomance},
        {CIRCLE, ITALIAN, CalcCircleRomance},
        {TRIANGLE, ITALIAN, CalcTriangleRomance},
        {CIRCLE, RUSSIAN, CalcCircleCyrillic},
        {TRIANGLE, RUSSIAN, CalcTriangleCyrillic},
        {CIRCLE, BULGARIAN, CalcCircleCyrillic},
        {TRIANGLE, BULGARIAN, CalcTriangleCyrillic},
    }

    func := calculator[shape][language]

    if (func == null) {
        throw Unsupported
    }
    result := func(color, width, shape, language)
    ...

With this pattern the decision logic is collected in one place and it's clear what each pair of shape/language is going to do. The list of supported pairs is easy to see, and it's easy to extend. All of which tends to reduce cognitive load. Which, as I've mentioned before, is a good thing.
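For anyone who wants to run the idea, here is one possible rendering in Python. It is a sketch under assumptions: the map is keyed on a `(shape, language)` tuple instead of being nested, the enum members are a trimmed-down subset, `GREEK` is added only to show the unsupported case, and the calculator bodies are placeholders that return a label.

```python
from enum import Enum, auto


class Shape(Enum):
    CIRCLE = auto()
    TRIANGLE = auto()


class Language(Enum):
    ENGLISH = auto()
    FRENCH = auto()
    RUSSIAN = auto()
    GREEK = auto()  # deliberately left out of the table below


# Placeholder calculators; the real ones would do the shape/language specific work.
def calc_circle_romance(color, width, shape, language):
    return f"circle/romance({color}, {width})"


def calc_triangle_romance(color, width, shape, language):
    return f"triangle/romance({color}, {width})"


def calc_circle_cyrillic(color, width, shape, language):
    return f"circle/cyrillic({color}, {width})"


def calc_triangle_cyrillic(color, width, shape, language):
    return f"triangle/cyrillic({color}, {width})"


# All of the decision logic lives in this one table.
CALCULATORS = {
    (Shape.CIRCLE, Language.ENGLISH): calc_circle_romance,
    (Shape.TRIANGLE, Language.ENGLISH): calc_triangle_romance,
    (Shape.CIRCLE, Language.FRENCH): calc_circle_romance,
    (Shape.TRIANGLE, Language.FRENCH): calc_triangle_romance,
    (Shape.CIRCLE, Language.RUSSIAN): calc_circle_cyrillic,
    (Shape.TRIANGLE, Language.RUSSIAN): calc_triangle_cyrillic,
}


def calculate(color, width, shape, language):
    """Look up the calculator for this shape/language pair and call it."""
    func = CALCULATORS.get((shape, language))
    if func is None:
        raise NotImplementedError(f"unsupported pair: {shape.name}/{language.name}")
    return func(color, width, shape, language)


print(calculate("red", 2, Shape.CIRCLE, Language.RUSSIAN))
```

Supporting a new pair means adding one line to `CALCULATORS`, and an unsupported pair fails loudly in exactly one place.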

Obviously you can take this too far, and at some point a polymorphic hierarchy makes sense. But if you don't need that complexity this is a good compromise.

April 29, 2020 by Leon Rosenshein

Fear, Courage, and Professionalism

Rumors of layoffs suck. Let's be upfront about that. There are no platitudes that make it easy. I know. I've been there. Take all the downsides and issues from a layoff and then layer in lots of fear, uncertainty, and doubt. You don't know if or when something might happen, or even who might be impacted, but life goes on. Work goes on. So what do you do?

I'm not suggesting that you simply ignore rumors, social media, Blind, or news reports. That's just unrealistic, and probably impossible. Yes, the rumors will impact you. They will impact your productivity. They will impact your attitude towards yourself, your work, your co-workers, and your friends/family.

What I have found that works for me, for teams I've worked on, and teams that have worked for me, is professionalism and staying focused. Regardless of the rumors, we all still have our jobs. The tasks that were there yesterday are still there now, and ignoring them won't make them go away. In all cases ignoring the work makes things worse overall, and in most cases ignoring it makes it worse for you personally.

Consider the best case scenario. It was just a rumor and nothing happened. If you had stopped working then you're just that much behind and now you and your team need to make up that lost time.

What if there were some layoffs, but you still have your job? Again, the work hasn't gone away. There will be some kind of schedule adjustment, but the amount of work is the same, just with fewer people, so lost time is even harder to make up. How do you want to be thought of in 6 months or a year? How you respond now helps define how you are seen in the future.

And what if the rumors were true and you are impacted? I don't know about you, but I like my coworkers, and I want them to have a positive impression of me. It's highly likely we'll come in contact again, and I don't want to be "that person". Whether it's getting stuff done, sharing as much knowledge as possible, or just listening, it's preparing for my future. And who knows, you might want to work for the company again in the future. You don't want to burn any bridges.

But what about preparing for the future? Like I said, don't ignore the rumors. This is probably a good time to touch up your LinkedIn page and make sure your resume is up to date. Think about what's important to you in your job and your career. You should be doing that anyway, but this is a good reminder. Regardless of how this ends up you'll be having discussions about your career with your boss, so knowing what you want always helps.

Finally, a few quotes from a couple of Franks (Roosevelt 1, Roosevelt 2, Herbert). Always useful, but maybe more timely now.

April 28, 2020 by Leon Rosenshein

80/20 Rule

Ever notice how projects seem to go quickly at the beginning and then take longer and longer when you get close to the end? Or the opposite, that making a few small fixes in a codebase can make everything better? That's the Pareto Principle at work. 80% of the work takes 20% of the time, and the remaining 20% of the work takes 80% of the time. Fixing 20% of the bugs eliminates 80% of the issues customers are seeing.

The Pareto principle isn't a universal constant, and it's not about cause and effect. It's a post hoc observation. It's really just fitting reality to a power law distribution. You can't use it to predict which 80% takes 20% of the time, or even to say it's why things were distributed the way they were. That's not just post hoc, that's post hoc ergo propter hoc.

But even with that said, knowing that something is going to come up and take the lion's share of the resources (time, manpower, compute, etc) is something you can use along the way. Especially when you're doing something for the first time and don't have lots of experience. If you have 10 things to do in 10 days and when you start they're all unknown that's OK. However, if you get to day 5 and you've only completed 5 of the tasks, it might be time to start worrying. Sure, you've done half the tasks in half the time, but you have no idea if you've done half the work in half the time. And the Pareto principle tells you that there's a good chance that you haven't. So maybe it's time to tell someone the situation.

The other thing to be aware of, especially when you're trying to have a big impact on things, is to remember that it often applies the other way as well. 20% of your feature set is used 80% of the time. 80% of the processing time might be spent in 20% of your code. So nailing that 20% of your total feature set first or optimizing the right 20% of the code will have a much bigger impact than starting with a random work item and then randomly picking them until you run out of work.
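Finding "the right 20%" is a measurement problem, not a guess. As a minimal sketch, assuming some per-function timings are already in hand (the numbers in `PROFILE_MS` are invented for illustration, and `smallest_set_covering` is a made-up helper), this picks the fewest items that account for a target share of the total:

```python
def smallest_set_covering(costs, target_percent=80):
    """Return the largest-cost items that together reach target_percent of the total."""
    total = sum(costs.values())
    running = 0
    chosen = []
    for name, cost in sorted(costs.items(), key=lambda item: item[1], reverse=True):
        chosen.append(name)
        running += cost
        if running * 100 >= target_percent * total:
            break
    return chosen


# Invented timings in milliseconds for ten hypothetical functions.
PROFILE_MS = {
    "parse": 500, "render": 300, "load": 60, "save": 40, "validate": 30,
    "log": 25, "auth": 20, "cache": 15, "init": 7, "cleanup": 3,
}

hot = smallest_set_covering(PROFILE_MS)
print(hot, f"{len(hot)} of {len(PROFILE_MS)} functions")
```

With real measurements the split will rarely be a clean 80/20, which is fine. The useful output is the ordering: it says where to start.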

But remember what I said at the beginning. The Pareto principle is a post hoc explanation of what happened. It's not the cause and it isn't a real predictor. It's just something that happens a lot and you need to be aware of. And 80/20 isn't a magical pair. It could be 90/10, 70/30, or it might not even add up to 100. It's just one more thing to keep in mind. 80% of the time.

April 27, 2020 by Leon Rosenshein

Awareness

The world is always changing. And the rate of change appears to be changing as well. The Stone Age lasted a few million years, and ended about 5000 years ago. The Bronze Age took the next 1800 years, ending 3200 years ago. Iron ruled for about 1500 years, and technology stabilized for a while. Then things took off again. The industrial revolution, the atomic age, the space race, the silicon era, all in the last 250 years. Or, to put it in perspective, my grandparents grew up before phones, my parents had a party line, and the computing power you have in the phone in your pocket is probably more than the entire world had in 1965. And here we are on the cutting edge of technology. But technology isn't just deep, it's wide. So how do you keep track of things? Not only what's happening now, but what's just being talked about now and won't be a "thing" for 2 or 3 years?

One of the things you can do is find a person or group of people you trust who do think about things like that, and keep track of what they're thinking and saying. In the computer world one of those groups is ThoughtWorks. They're a cross between a consulting company and a think tank. On the think tank side they put out something called the Technology Radar. About twice a year they get together and think about the state of the industry, what's happening, what's hot, what's past its prime, and what to get ready for. I've been paying attention for the last 5 years or so, and they've done a pretty good job of prediction. I think that comes from the fact that they're consultants who need to understand how things really work if they want to get paid.

Regardless, they've been doing it for about 10 years now, and they put together a retrospective of the big changes. It's a pretty good synopsis of how things have changed, along with some predictions on the next 10 years. Something interesting to check out if you've got a few minutes.

April 24, 2020 by Leon Rosenshein

On Language

Just a little humor today.

ACHTUNG

ALLES TURISTEN UND NONTEKNISCHEN LOOKENSPEEPERS!

DAS KOMPUTERMASCHINE IST NICHT FÜR DER GEFINGERPOKEN UND MITTENGRABEN! ODERWISE IST EASY TO SCHNAPPEN DER SPRINGENWERK, BLOWENFUSEN UND POPPENCORKEN MIT SPITZENSPARKEN.

IST NICHT FÜR GEWERKEN BEI DUMMKOPFEN. DER RUBBERNECKEN SIGHTSEEREN KEEPEN DAS COTTONPICKEN HÄNDER IN DAS POCKETS MUSS.

ZO RELAXEN UND WATSCHEN DER BLINKENLICHTEN.

The inspiration for your favorite language:

Python: What if everything was a dict?
Java: What if everything was an object?
JavaScript: What if everything was a dict *and* an object?
C: What if everything was a pointer?
APL: What if everything was an array?
Tcl: What if everything was a string?
Prolog: What if everything was a term?
LISP: What if everything was a pair?
Scheme: What if everything was a function?
Haskell: What if everything was a monad?
Assembly: What if everything was a register?
Coq: What if everything was a type/proposition?
COBOL: WHAT IF EVERYTHING WAS UPPERCASE?
C#: What if everything was like Java, but different?
Ruby: What if everything was monkey patched?
Pascal: BEGIN What if everything was structured? END
C++: What if we added everything to the language?
C++11: What if we forgot to stop adding stuff?
Rust: What if garbage collection didn't exist?
Go: What if we tried designing C a second time?
Perl: What if shell, sed, and awk were one language?
Perl6: What if we took the joke too far?
PHP: What if we wanted to make SQL injection easier?
VB: What if we wanted to allow anyone to program?
VB.NET: What if we wanted to stop them again?
Forth: What if everything was a stack?
ColorForth: What if the stack was green?
PostScript: What if everything was printed at 600dpi?
XSLT: What if everything was an XML element?
Make: What if everything was a dependency?
m4: What if everything was incomprehensibly quoted?
Scala: What if Haskell ran on the JVM?
Clojure: What if LISP ran on the JVM?
Lua: What if game developers got tired of C++?
Mathematica: What if Stephen Wolfram invented everything?
Malbolge: What if there is no god?

Shooting yourself in the foot, language style

What's your favorite? Share in the thread.
