The Team of Teams to End All Teams
This one left a bit of a mark on me, so it’s a bit longish.
2005-2007
Part A – Background
For my next adventure, I contracted into a “team” of software architects. A relatively large bank had acquired a smaller company with a “portal”-based portfolio management suite for financial advisors. The size disparity between the organizations (and their client bases) meant that integrating the teams and technologies posed scaling problems.
> For my next adventure, I contracted into a “team” of software architects.
The architects stemmed from the smaller SaaS company, while “development” was a larger amorphous function from the bank; I had been “adopted into” the lineage of the SaaS company, and was thus in an outgroup as far as “development” was concerned. There was also IT and several sub-specialties (like DB management), QA, and probably a few others on the non-coding periphery of software development.
Part B – The Performance Wall
Financial advisors often provide portfolio summaries and managed-account details to their clients as proof of their value. In 2005 this meant generating printable PDF documents snapshotting account changes and balances over time periods (monthly, quarterly, annually). Time periods and/or diffs also allowed cool trend-line charts. The SaaS company had a database schema and processing to give advisors portfolio-based reports for their clients (accessible from the SaaS company’s “portal” suite).
The reports were crafted in Crystal Reports (the crème de la crème of desktop dashboard reporting software in the 1990s and early 2000s). Crystal Reports was geared for desktop presentation (GDI, single-threaded apartments), so it was as resource intensive as MS Word on a server. Crystal Reports for .NET wasn’t much better, as it was still aimed at – and rooted in – its desktop dashboard legacy (again, GDI+ rendering with STA caller expectations).
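To give a flavor of that constraint: an STA-bound component can’t simply be called from thread-pool (MTA) worker threads, so every concurrent render needs its own dedicated STA thread. A minimal, hypothetical sketch of what hosting such a component on a server ends up looking like:

```csharp
using System;
using System.Threading;

// Hypothetical sketch: hosting an STA-bound (COM/GDI era) renderer on a
// server. Thread-pool threads are MTA, so each render has to be pushed
// onto a thread explicitly created in STA mode -- one dedicated thread
// (with its stack and GDI handles) per concurrent report.
static class StaRenderHost
{
    public static void RenderOnStaThread(Action renderReport)
    {
        var worker = new Thread(() => renderReport());
        worker.SetApartmentState(ApartmentState.STA); // must be set before Start()
        worker.Start();
        worker.Join(); // the caller blocks until the render completes
    }
}
```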
None of this was too much of a problem with the SaaS company’s original book of business, but the bank had a much bigger base of advisors to support, many more portfolios to manage, and larger account sizes per portfolio.
The technology footprint just couldn’t keep pace with scaled demands. The production report system, in its jacked-up hacked-up way, could top out at about 35 reports per minute.
> … the bank had a much bigger base of advisors to support, many more portfolios to manage, and larger account sizes per portfolio
When I started, I was told this bottleneck was the biggest problem (I would later discover it wasn’t), and that making it go faster was my goal. I broke the “speed” concerns into two major components:
- data query and results streaming speed to feed into report generation
- report file generation speed, itself decomposable into two sub-concerns:
  - templating data through the report system
  - raw file IO for output generation
I was able to prove the queries themselves were plenty fast enough and could stream all necessary result sets to a single networked client process at better than 10x what the production servers could output in report product – i.e., I could stream all the result-set rows needed for reports to my workstation from a dev DB and hit about 450 reports/minute.
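The probe itself was nothing fancy. A minimal sketch of the idea – the connection string, query, and table name here are hypothetical stand-ins, not the real schema:

```csharp
using System;
using System.Data.SqlClient;
using System.Diagnostics;

// Hypothetical throughput probe: stream every row a report batch would
// need and time it, isolating query/network speed from rendering speed.
class QueryThroughputProbe
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        long rows = 0;
        using (var conn = new SqlConnection(
            "Server=devdb;Database=Portfolios;Integrated Security=true"))
        using (var cmd = new SqlCommand(
            "SELECT * FROM dbo.ReportRows WHERE PeriodId = @p", conn))
        {
            cmd.Parameters.AddWithValue("@p", 200512);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    rows++; // consume the stream; no templating, no file IO
        }
        sw.Stop();
        Console.WriteLine("{0} rows in {1} ({2:F0} rows/sec)",
            rows, sw.Elapsed, rows / sw.Elapsed.TotalSeconds);
    }
}
```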
Part 1 taken care of. It wasn’t a query issue nor a result-set-size issue (nor a network bandwidth issue, but duh!).
By roughing out a replacement output generator (all the same data and numbers of pages, while still running the queries), and using a native .NET library (in this case Cete Software’s Dynamic PDF) to create PDFs directly to file, I ended up proving that the file IO on my workstation-class machine could keep pace with the throughput from the SQL server. Part 2(b) was taken care of: it wasn’t an IO issue; the templating process was the bottleneck, and re-crafting the reports on a native .NET package was the path to take.
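The rough-out looked something like this in shape – `RenderReportToPdf` is a placeholder for the real query-and-template work, and I’m deliberately not reproducing the Dynamic PDF API calls from memory:

```csharp
using System;
using System.Diagnostics;

// Hypothetical shape of the replacement-generator benchmark: same data,
// same page counts, but rendered through a native .NET PDF library
// straight to file instead of through the GDI/STA reporting engine.
class OutputGeneratorBench
{
    static void Main()
    {
        const int reportCount = 100;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < reportCount; i++)
            RenderReportToPdf(i, string.Format(@"C:\temp\report_{0}.pdf", i));
        sw.Stop();
        Console.WriteLine("{0:F1} reports/minute",
            reportCount / sw.Elapsed.TotalMinutes);
    }

    // Placeholder: run the report's queries, template the pages, and let
    // the PDF library write the bytes directly to disk.
    static void RenderReportToPdf(int reportId, string path) { }
}
```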
Overall, a 10-fold increase in processing speed by switching technology and approach. However, with the report output problem eliminated as a focus, we soon realized the major problem with scaling to the bigger book of business was the data processing into report-ready format. Also, getting “development” to realize they needed to own making templates in a home-grown report generation system was a bit challenging from my side of the organizational field.
Part C – Developing Report Generation Development
Getting “development” to productionalize the reporting framework (making it easier to use, building components, field-hardening the designer) and to set up projects to customize for specific customers (as an ongoing concern) was a bit of a challenge.
Development was from the bank side and had more than its share of bureaucratized functional managers – born to some extent from needing to meet financial regulatory demands – but also, larger organizations can breed more career-organizationally oriented people, bolstered by departmental responsibilities and guarded by departmental boundaries.
It was a challenge; but at least at that point it was a fairly straightforward one-on-one operation between me – a slightly empowered architect – trying to get a (bank) development manager – one mostly just wanting detailed plans that could be tracked after assignment to staff developers – to understand that I wasn’t going to detail a plan for resources outside my control.
I remember some meetings, and am hazy on how well the transition from prototype to productionalized processing went… mostly because the “real” processing problems were manifesting as the technical scaling problems of output were receding.
Part D – The Data Sausage Factory
Back to the organizational departments… Besides Architecture and “development”, two of the more obvious functions – database administration and general IT (servers and network) – had some clearly identifiable areas of responsibility; usually they hummed along well enough.
QA seemed to be a mandatory function for validating new software features, or ensuring no regressive problems were introduced with fixes or optimizations. This group seemed to me to be something folded in, or built out, within the context of the bank, and one that probably lacked inherent knowledge of the software (at least from the outset).
Running the services offering was what I call “system administration” – inside the context of the software’s data-processing space – which made sure the monthly data-processing steps were working and performed in a timely manner. They were largely responsible for data integrity.
Here’s where the bigger problems came in – the multitude of data-processing steps was quite involved: long-running, brittle, and nigh-on impossible to rejoin mid-step if technical problems happened or data problems were discovered. The entire “process” was not well documented from the outset. There was a lot of manual intervention, file-transfer channels, DTS/SSIS packages, stored procedures, spreadsheets, and manual tracking to make sure things were moving correctly.
Systems administration had been able to keep up with the smaller client base when it was a SaaS company and only had to produce quarterly reports for the smaller portfolio sizes it had. But with the bank’s book-of-business, the number of portfolios went up, the size of the portfolios went up, and the variety of sources went up.
> a lot of manual intervention, file transfer channels, DTS/SSIS packages, stored procedures, spreadsheets, and manual tracking
To make it even more interesting, they were hoping to generate monthly reports, meaning that the pre-processing had to finish in well under a month (I call this out because for quarterly reporting, that wasn’t close to a reality).
Part E – Problem Solving – Collective-Style
With data processing now the top priority, the idea of there being a single technical fix seemed less likely to me (and the rest of the architecture team), but management still thought solutions could be found. To this end, and to emphasize the importance to everyone, we all met on a regular basis in a conference room too small for all of us.
By all, I mean all: every member of every team. There were well over 22 people on average in a conference room with a table that sat maybe 12 people. There may have been more; my memory is fallible, and I left my notebooks from that job at that site – notebooks wherein I’m sure I counted the personnel in the room more than once.
I quickly gauged I could either sit on the windowsill and balance well enough, or perch on some of the crappy metal filing cabinets/shelves in the room. Meetings were at least a half-hour, and the corporal punishment of standing that long was not something I ever expected to encounter once I got out of primary education and away from parochial nuns.
We would go around the room, and each person reported publicly and verbally what they had noticed and had done to improve the situation and performance. It was as if each should be finding the silver bullet, or helping make a clip of silver bullets, that would slay the foul beast that kept us from our goal. Since no one even knew the nature of the beast, there was a lot of shooting, or gun-shyness, exacerbated by being able to take pot-shots in the direction of other groups, or worry about pissing off one’s boss. Some people, IIRC, could say “business as usual”, which some of us imitated to show the ludicrosity of the entire exercise.
If you were cynical like me, it could appear like some mix of Lord of the Flies meets a Soviet Politburo meeting meets the anarcho-syndicalist commune of Monty Python and the Holy Grail; but not as funny on the surface.
> By all, I mean all: every member of every team. There were well over 22 people on average in a conference room with a table that sat maybe 12 people.
The only real progress we made in taming the beast came when I decided we needed to at least document the steps the system administrators performed, and place the steps we could into a harness, so we could monitor and recover mid-stream where possible. In doing so, we could also monitor the performance characteristics of each step instead of getting ad-hoc information about “problems”. The meetings themselves didn’t stop; they just seemed inconsequential to those of us working so close to the beast.
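The harness itself was conceptually simple. A hypothetical sketch of the shape – named steps run in order, each one timed and checkpointed to disk so a failed run can resume where it died instead of starting over:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical sketch of the step harness: each documented processing
// step gets a name and an action; the harness times every step, logs
// it, and checkpoints the last completed step so a failed run can
// resume mid-stream instead of starting over.
class StepHarness
{
    private readonly List<KeyValuePair<string, Action>> _steps =
        new List<KeyValuePair<string, Action>>();
    private readonly string _checkpointFile;

    public StepHarness(string checkpointFile) { _checkpointFile = checkpointFile; }

    public void AddStep(string name, Action work)
    {
        _steps.Add(new KeyValuePair<string, Action>(name, work));
    }

    public void Run()
    {
        // Resume after the last checkpointed step, if a prior run failed.
        string lastDone = File.Exists(_checkpointFile)
            ? File.ReadAllText(_checkpointFile) : null;
        bool skipping = lastDone != null;

        foreach (var step in _steps)
        {
            if (skipping)
            {
                if (step.Key == lastDone) skipping = false;
                continue; // already completed in a previous run
            }
            var sw = Stopwatch.StartNew();
            step.Value(); // let failures escape; the checkpoint marks the resume point
            sw.Stop();
            Console.WriteLine("{0}: {1}", step.Key, sw.Elapsed);
            File.WriteAllText(_checkpointFile, step.Key);
        }
        File.Delete(_checkpointFile); // clean finish: next run starts fresh
    }
}
```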
We decided that was good enough, and between the architects, the system administrators, and the DBA arm of IT, we embarked on creating it. Eventually we documented somewhere around 34 big data-processing steps, with lots of data loading, validation, intermediate calculations, and reformulations. We started down the path to make it all work, and I remember specifically working late one night with the head of the architecture team, with him impressing me with just how technical he really was, and how he could roll up his sleeves and jam on the keyboard.
The solution paths were set, and I was getting tired of the overall cultural departmentalism. After finding another position, I gave two weeks’ notice – a manager in the development department (not the one I dealt with for the report generation) didn’t like me too much, and realized I had documented my stuff well enough already, so I was given one week.
Back to .NET Mercenary Years (1)
On to Something Borrowed