Performance Prescription

As I was looking to get out of the learning management company, I was headhunted into a contract-to-hire position at a comfortable rate; comfortable enough to make me forget about my “title” track.

The company was running a mail-order distribution pharmacy with a call-center for the sale and distribution of primarily a semi-generic, yet narrowly produced drug. There were more than a few “gray” business practices involving insurance mark-ups and inter-state pharmacy acquisitions (to get state licenses). Regulators and agencies with multiple letter initialisms began to take notice and the company folded.

But, that’s not my story – except how it’s dissolution forced me to find a new job at the end.

Performance Guru

My inaugural mission was to improve the overall performance of the web-based call-center software. For background, the company used a SQL Server based pharmacy management package, pushed way past it’s design tolerances for load. The software was probably meant for a small independent pharmacy, and had a late 1990s early 2000s desktop client-server Windows-based user interface (C/S-UI). Stored procedures, functions, and triggers did all the heavy processing.

While some of the staff used the Windows C/S-UI, by far the bulk of the call-center users accessed the system through a home-grown ASPNET MVC web-based interface. This interface presented a streamlined workflow for call-center order processing and insurance validation, calling into the SQL stored procedures to get things done.

Gathering performance metrics involved resetting a performance testing database, running standard sets of call-center user interactions through it at scale, and capturing durations for calls and pages.
That side of the process was managed by a very meticulous and competent fellow developer.
I would analyze the heavy hitters, and make anything that looked like a problem go away…that is, no longer be a problem.

Removing problems fell into three buckets:

  1. common-lookup caching
  2. user session footprint reduction
  3. database query tuning

Lookup Caching

Rather than running database queries to get invariant lists of data for presentation on a web page (such as in drop-down boxes), I developed an in-memory caching discipline based on generic dictionaries resolved and refreshed through dependency injection and occassionally polled against a single “freshness” table to determine whether any needed to be updated.

To get this data calls to the caching system were made instead of direct queries to the database. Also, some more complex dynamic data queries that joined to the lookup data were rewritten to only return the key values, and the result-sets were enriched by the lookups in the ASPNET layer…effectively performing out-of-band client-joins.

The overall approach was interesting enough that I carried over a re-implementation of the concepts to the next company I worked at after it.

User Session Footprint

The original web login relied heavily on the ASPNET session object to hold onto a snapshot of user permissions and similar stuff. IIRC, it was probably around 50KB. Since we had multiple web-servers, we used SQL server as the session store. The constant thrashing of 50KB back and forth caused some measurable pain, so first was to try using SQL in-memory DB (backed by disk-writes for recoverability), which did help a bit, but still caused sluggishness.

Realizing that most of the session state could be “enriched” by the caching system (storing permission group expansions), it was possible to reduce considerably the amount of information needed to get a working security principle, which I got down to about 700 bytes, so much so that 1000 users fit in under a 1MB rather than 50MB, meaning less stuff to throw around per web interaction.

Nowadays, I’d probably implement an OIDC token authentication system with roles embedded in it.

Database Tuning

Getting the development system to work faster also meant looking at SQL queries. Once everything else was eliminated, the longer running web requests often spun on SQL calls. I became well versed in Brent Ozar and spent a lot of time on index creation and maintenance (such as adjusting existing indexes to include other fields), or rewriting complex queries into CTEs and playing with breaking them into temporary tables and table variables, and sometimes hinting at the cardinality with judicious TOP statements to inform the estimator how many rows I expected to get back.

Operational DBA

Beyond the software development team itself, I also became involved in resolving performance and down-time problems with the live system. Mostly this amounted to monitoring SQL, alleviateing immediate query/process problems (i.e., identifying and killing blocking processes) and remediating performance problems that became operational blockages.

Between myself and one person on the infrastructure team, we covered all the daily DBA functions and firefighting for the production system. I often said that the two of us together were about 1.5 DBAs.

To get me out of the DBA role (which I had sort of fallen into again like at a previous job) I was largely involved in interviewing and hiring for a dedicated DBA. Most of the DBAs we interviewed had more of a focus on data-analytics and reporting systems, so even once we found someone, I still had to make sure we stayed on target for what we needed for DB administration goals. At that point, the three of us were about 2.25 DBAs, and we kept a tight mindset without ever having an official meeting.

SQL resource governor

Using resource governor to limit resources by querying system. The legacy C/S-UI had some screens and operations that could sink the server if left unchecked. Putting a cap on the memory and CPU for connections from the C/S-UI kept it from eating all the server resources and bringing the call-center to its knees.

Providing guidance on using C/S-UI

The unfiltered open-ended “monitoring” screens in the C/S-UI returned all active order rows for the pharmacy, which was probably fine if only a few hundred orders were being processed daily, but we were processing tens of thousands. The base application wasn’t built for call center, and had no safeguards to limit it’s use.

Since these queries could overwhelm the database with page and table locks, resource governor couldn’t help.

I kept pushing to find out what users were doing with the screens in the legacy C/S-UI so we could create safer alternatives in our system that didn’t slam the tables. Never got there before the company folded.

Developed query tools for monitoring locking and blocking

Using SQL Server’s management views, I made SQL procedures that would recursively query blocking calls to determine what queries were the culprits of blocking storms. I also had a few other routines for monitoring the number of active and waiting tasks (with run duration), locks counts by object and type, and a few other hotspot monitor queries.

In-Place System Upgrade Planning and Execution

Since I had started, the development team had been developing an “enhanced” version of the call-center application (this was the subject of the performance testing and tuning). Eventually we needed a plan to install it, without disrupting operations more than a weekend.

We had a very smart and industrious DevOps developer who had parted ways with us to go back to his consulting lifestyle, but left us a PowerShell and Octopus Continuous Integration (CI) system that was mostly what we needed to do the migration.

I began modifying a copy of the CI deployment system, making the migration system we needed, so that we could reliably run the migration from production backups to the new database servers setup to run the new version. The plan (including network changes, connection strings for the legacy C/S-UI apps on the desktop VM images) and implementation for the migration became mine to oversee, since by the time it became apparent to other people we needed a plan, I was already about 40% into it.

The actual upgrade/migration went relatively smoothly IIRC, except one developer hadn’t pulled the correct backup and we hadn’t discovered it until well into the DB restore on the new environment. We lost a few hours to that.
If I also remember, there were a few early morning issues with getting internal DNS records and connections strings correct – we had an availability group split across a WAN, so SQL connections and DNS entries needed some coaxing.

People and Management – in Summary

The development manager was top-notch, mostly because she kept the upper echelons of the company away from our daily cycles; and the few interactions we had with the upper management during the early days we were glad she was handling them and we weren’t. She understood what people told her, and asked the right questions.

Eventually other upper echelon people were hired above her – making our direction less clear. I believe the desire to take the company public was a driving factor in this. As great as she was for operational development, having her name near the top of a prospectus wasn’t going to push any investor over the edge to buy-in.

The team had about a dozen people. They were a broad cast of characters: some career-minded hoping to be managers, some workaday developers, some people who had become software developers because their tinkering had built something useful, some very young getting initial experience, some older who lived their technical life constantly in search of the new.
The hiring discipline was usually pretty spot-on, and I found myself interviewing people to bring on-board despite my aversion to interviewing people who didn’t work for me directly. Due to the loose nature of the organization and the support of direct management, I found this palatable.

We did have regular status meetings, but since we had no automated tracking board, we weren’t genuflecting to the board, but meeting to ensure everyone was progressing. Interpersonal conflicts? Sometimes. Generally, we respected each other’s space and roles; at least that was my take.

When the company started having legal problems and had to cease operations, I was actually sort of sad to see it go.

Back to Title Quest-1