In My Experience

IT Support shenanigans from experiences past and present

The Helpdesk Illusion: What Studios Lose When IT Stays a Support Function

A Project Manager once told me, in a tone that suggested this was obvious, that IT existed to facilitate the needs of Production and resolve their tickets.

He was not being dismissive. He was being sincere. That is what made it worth thinking about.

What That View Assumes

The helpdesk model of IT has a coherent internal logic. Users have problems. IT fixes them. The faster the fix, the better IT is performing. Metrics are clean. Accountability is simple. Everyone understands the transaction.

It also assumes that the problems users can see are the only problems that matter.

They are not.

The problems users cannot see, the ones that accumulate quietly in the background of every studio that treats IT as a support function, are the ones that eventually surface as compliance violations, security incidents, unplanned outages, and the specific kind of expensive chaos that arrives when nobody was governing the infrastructure because everyone assumed someone else was.

What AI Just Made Worse

Generative AI tools have accelerated something that was already happening. Production teams move fast. They adopt tools that solve immediate problems. They do not, by default, think about where the data goes, who owns the outputs, what the licensing terms say about commercial use, or whether the API key being used belongs to a personal account or a company account with no audit trail.

That is not a criticism. That is the natural behavior of a team optimizing for throughput.

But every AI tool adopted without IT involvement is a governance gap. It is a potential data residency violation. It is an unlicensed asset waiting to become a legal dispute. It is credentials stored somewhere that will never appear in an offboarding checklist. It is a model trained on studio IP under terms nobody read.

None of these failures announce themselves at the ticket queue.

The Two Studios

Consider two studios at the same scale, with the same headcount, building the same kind of game.

In the first studio, IT is a helpdesk. It resolves access requests, replaces hardware, and maintains the network. When Production wants to integrate a new AI concepting tool, they sign up, share the credentials across the team in a group chat, and get to work. IT finds out when something breaks. Security finds out during an audit, if there is one. Legal finds out when there is a dispute about generated assets used in a shipped product.

In the second studio, IT is infrastructure. When Production wants to integrate the same AI concepting tool, IT is in the room before the contract is signed. The licensing terms are reviewed. The data processing agreement is assessed. The tool is integrated into the identity management system so access is provisioned and deprovisioned with the rest of the user lifecycle. The outputs are logged. The cost is tracked against a known budget line. When an artist leaves, their access to the tool leaves with them.

Both studios shipped a game. One of them has a defensible audit trail. The other has a group chat with a password in it.
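
To make the difference concrete, "provisioned and deprovisioned with the rest of the user lifecycle" usually means the tool is wired into the identity platform rather than hidden behind a shared login. A minimal sketch, assuming the vendor exposes a standard SCIM 2.0 endpoint; the URL, token handling, and function name here are illustrative, not any specific vendor's API:

```python
# Hypothetical offboarding step: deactivate a departing user's seat on an AI tool.
# Assumes a SCIM 2.0 provisioning endpoint (RFC 7644); values below are placeholders.
import requests

SCIM_BASE = "https://ai-vendor.example.com/scim/v2"  # illustrative vendor endpoint
TOKEN = "..."  # provisioning token held by IT, not a password in a group chat


def deprovision_user(scim_user_id: str) -> None:
    """Mark the user inactive on the vendor side as part of the leaver workflow."""
    response = requests.patch(
        f"{SCIM_BASE}/Users/{scim_user_id}",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/scim+json",
        },
        json={
            "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
            "Operations": [{"op": "replace", "path": "active", "value": False}],
        },
        timeout=30,
    )
    response.raise_for_status()
```

The specific call matters less than the shape of it: offboarding becomes one automated step in the leaver workflow instead of a password someone has to remember to rotate.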

What IT Actually Governs in a Modern Studio

When AI is part of the production pipeline, IT is no longer managing network drives and build servers as its primary function. The scope is different now.

It includes the infrastructure that makes AI workloads run: GPU provisioning, cloud architecture, storage at scale, and the latency between systems that need to communicate in real time. It includes the licensing and vendor governance for every AI service the studio touches, which in a mid-sized studio in 2026 is likely more services than anyone has formally counted. It includes the security posture of those services: how credentials are managed, how outputs are stored, and how access is controlled when someone moves teams or leaves the company. It includes the compliance and audit readiness that determines whether the studio can defend its use of AI-generated assets if that use is ever challenged.

None of that fits in a ticket queue.

The Contractor Math

There is an argument that proponents of the helpdesk model sometimes use to justify it: if IT is not doing infrastructure, then contractors can do the infrastructure work when it is needed.

This is worth examining directly.

A studio that outsources infrastructure governance to contractors does not save money. It defers cost and loses continuity. The contractor who builds the cloud environment does not maintain it. The contractor who handles the security audit does not monitor it between audits. The contractor who reviews the AI vendor contracts does not track the renewals or the usage. Each function, separated from the others and handed to an external party on demand, costs more and produces less than a single internal function that owns the full scope.

The helpdesk model does not eliminate the need for infrastructure governance. It just means the studio pays more for it, gets less of it, and has nobody internally who understands the whole picture when something goes wrong.

The Actual Question

The Project Manager's comment was not wrong about what IT does in a helpdesk model. He was describing the reality of studios that have made that choice, intentionally or by default.

The question worth asking is not whether IT should resolve support tickets. It should. The question is whether that is the ceiling or the floor.

In a studio where AI is now part of the production pipeline, treating IT as a helpdesk is not a lean organizational choice. It is a governance gap with a timer on it.

The timer runs until the first audit, the first breach, the first legal challenge to an AI-generated asset, or the first time a critical vendor integration fails and nobody inside the building knows how it was built.

Whichever comes first.

Directing the Move: Learning by Shifting a Game Production Studio

Moving is a challenging task on its own, but moving a production studio without major disruptions is another beast altogether. I have planned and executed a full studio migration a couple of times now.

This is the story of when I moved an active mobile production studio from one end of Abu Dhabi to the other. I designed the network, coordinated the contractors and kept production running through all of it.

The Setup

In late 2021, an active mobile production studio in Abu Dhabi began planning a move from Park Rotana Complex in Khalifa Park to the newly constructed Yas Creative Hub on Yas Island.

By the time IT was brought into the planning process, the broad decisions had already been made. The timeline existed. The building was signed. What remained was the execution, and the question of how to move a live production environment across the city with a weekend of maximum downtime.

I had one person on my team. That person was me.

The planning phase ran from November 2021 to March 2022. Physical execution began in March and concluded with the studio officially opening in October 2022. In between those two dates lived approximately eleven months of contractor negotiations, municipal certifications, regulatory checks, ISP transitions, and the specific kind of creative problem-solving that only emerges when something has gone genuinely wrong.

What follows are the lessons I took out of that process. Each one is the product of something not going according to plan.

Lesson 1: Contractors Will Interpret Your Plans Creatively

The Townhall area was one of the centrepiece spaces in the new studio. Staircase-style tiered seating, designed to hold the full studio for all-hands presentations and company-wide events. Significant square footage. Significant investment.

At some point during construction, the contractor expanded the footprint of the Townhall seating into the adjacent office space.

This was not in the drawings. This was not discussed. This was a unilateral creative decision made by someone who either misread the plans or chose not to read them at all.

The options at that point were: tear down the built staircase structure and rebuild within the correct boundaries, or reduce the Townhall seating to fit the original allocation and accept the smaller configuration.

Tearing down the staircase would have added weeks to the timeline and reopened a cost conversation nobody wanted to have. I made the call to reduce the seating and move on.

Key Lesson: Treat every contractor deliverable as a draft until you have physically walked it. Do not assume that architectural drawings translate into accurate builds without supervision. The gap between what is planned and what is built is the gap you do not check on.

Lesson 2: "That's Normal" Is Not a Technical Answer

The Yas Creative Hub building was new. The HVAC systems were new. The air conditioning units for the studio floor were new.

When the AC units were tested, they leaked water.

All of them.

The contractor's position was that this was normal behaviour during initial testing. Condensation. Expected. Nothing to worry about.

I did not accept this.

Water leaking from ceiling-mounted AC units directly above workstations, server infrastructure, and production hardware is not a commissioning quirk. It is a liability. I pushed back, escalated, and held the sign-off until the units were inspected, the drainage systems corrected, and a dry test was completed.

The contractor was not happy about this. The contractor was also wrong.

Key Lesson: When a contractor tells you a failure mode is normal, ask them to put it in writing. The willingness to document tends to clarify the situation quickly. You are not an expert in HVAC engineering. You are an expert in what happens to your infrastructure when water falls on it.

Lesson 3: Certification Will Slip. Plan for It to Fail Three More Times.

The datacenter required fireproofing certification from the Abu Dhabi municipality before it could be approved for occupation. This is not optional. This is not a formality. Without it, the datacenter does not open.

The first inspection failed. The contractor had fireproofed the upper walls but not the lower portion where the raised floor began. The inspector identified the gap immediately.

The contractor returned, made corrections, and scheduled the second inspection.

The second inspection failed. Same issue, different section of wall.

By the third scheduled inspection, I attended in person. I walked the inspector through the space before the formal review. I had already identified and flagged the remaining gaps to the contractor the day prior. We tore the wall down completely, sealed it from the ceiling all the way down to the gaps at the raised floor that had been flagged in the previous inspections, and rebuilt it in a day.

Eleven months of planning. Three fireproofing inspections later, the datacenter was finally approved.

Key Lesson: Regulatory certification does not run on your project timeline. Build buffer into every sign-off that involves a third-party authority. Assume one failure at minimum. Assume two if the contractor has already demonstrated they are reading the requirements selectively. The municipality inspector is not your adversary; they are the only person in the room who has no reason to cut corners.

Lesson 4: Someone Has to Be in the Room

Network cabling for floor boxes sounds like an unglamorous task. Route the cables, terminate the connections, test the links. Standard process.

The studio floor had clearly defined user desk areas. It also had collaboration zones, separate open spaces designed for informal meetings, breakout sessions, and team clusters away from the main desk rows.

The cabling team forgot about the collaboration zones entirely.

This was not discovered during the cabling phase. It was discovered when the floor was nearly complete and someone asked why the collaboration areas had no connectivity.

The answer was that nobody had been watching.

Separately, the meeting rooms, conference room, and Townhall all arrived with furniture below the specified quality. Table microphones and ceiling speakers were installed without acoustic testing. The first time a meeting was held with both active simultaneously, the room fed back on itself.

The soundproofing was inadequate. I had to require the contractor to return and correct the acoustic treatment before the rooms were signed off.

Neither of these failures was inevitable. Both would have been caught earlier with consistent on-site supervision.

Key Lesson: Delegation to a contractor is not the same as oversight. Also, expecting the contractor to read the plans carefully is a mistake on its own. For any build task that involves multiple phases or multiple teams, assign a human being, ideally yourself, to physically verify completion at each stage. The cabling team did not maliciously skip the collaboration zones. They were not reminded those zones existed.

Lesson 5: Build for Airflow Before You Build for Aesthetics

The build machines, the systems used for compiling and packaging the game builds, were racked in a dedicated enclosure. The rack was purpose-built for the space. Solid construction. Clean cable management. Doors on all sides.

The machines were fully sealed.

When I tested the systems with the doors closed, temperatures climbed immediately. The enclosure had no rear ventilation. Hot air had nowhere to go. Left unaddressed, the build machines would have begun throttling and eventually failing within weeks of production use.

The fix was not elegant. I instructed the team to cut the back panel off the enclosure to create airflow. A purpose-built rack, freshly installed in a new studio, was modified with a cutting tool before it ever went into use.

It worked. The temperatures normalized. Production was never impacted.

Key Lesson: Thermal management is not an afterthought. When specifying any enclosure, rack, or cabinet that will house active compute hardware, airflow path is a primary requirement, not a secondary consideration. A rack that looks correct but traps heat is worse than no rack at all. Ask where the hot air goes before you approve the build.

Lesson 6: WiFi is a People Problem, Not a Space Problem

The wireless network for the new studio was designed based on the floor plan. Access points were positioned to achieve even signal coverage across the studio area. The planning looked correct on paper.

When the studio opened, one team reported consistently poor wireless performance. The QA team.

QA teams in mobile game production operate with a high density of devices. Each tester runs multiple handsets simultaneously, sometimes four to six devices per person. A team of five QA testers represents potentially twenty-five to thirty active wireless devices in a single area.

The QA team had been seated in a corner of the studio. The AP coverage in that corner was designed for standard user density. It was not designed for the device density of a QA floor.

The fix required repositioning access points and adjusting the wireless design to reflect actual usage patterns. It was not a complicated fix. It was a fix that would not have been necessary if the seating plan had been completed earlier.

Key Lesson: Plan your wireless network after you have confirmed where every team is sitting and not before. Coverage maps measure signal strength across physical space. They do not account for the number of devices a given team will connect. Seat your highest-density teams first. Design the wireless network around them.
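
The arithmetic behind that lesson is simple enough to sketch. The headcounts, devices-per-person figures, and per-AP client budget below are illustrative numbers, not measurements from the studio:

```python
# Back-of-the-envelope density check that a coverage map never performs.
zones = {
    "general_desks": {"people": 60, "devices_per_person": 2},  # laptop + phone
    "qa_corner":     {"people": 5,  "devices_per_person": 6},  # laptop + test handsets
}
AP_CLIENT_BUDGET = 25  # conservative concurrent-client target per access point

for name, zone in zones.items():
    devices = zone["people"] * zone["devices_per_person"]
    aps_needed = -(-devices // AP_CLIENT_BUDGET)  # ceiling division
    print(f"{name}: {zone['people']} people -> ~{devices} clients -> plan for {aps_needed} AP(s)")
```

Five QA testers generate as many wireless clients as fifteen general-purpose desks. A design drawn from the floor plan alone will never show that.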

Lesson 7: When the Plan Fails, Build a Fallback

Building contract dates are difficult to manage when a large-scale migration like a studio move is underway. But these are exactly the kinds of scenarios you should build buffers and fallback plans for.

The contract with the Park Rotana building ended in August 2022. The original agreement with the contractor was that the new studio would be ready by then. It was not.

Electrical approvals had not completed. The site safety inspection had failed. The physical move approval was delayed until a full cleanup could be completed, estimated at end of September at the earliest. The old building contract was expiring. The new building could not legally be occupied.

There was a gap. A real one.

I had an internal discussion with the Yas Creative Hub facilities team and confirmed one thing: moving equipment in for installation and testing purposes was permitted. The building did not need an occupancy permit for machines. Only for people.

That was enough.

I formulated a plan around that single fact. The ISP line was installed and activated in the new building. I made the decision to break the firewall high-availability pair and move one unit to Yas Creative Hub, making it the primary. The network was built out and tested. Then on a single weekend, I shut down the datacenter at Park Rotana, moved the servers and build machines to the new site, and brought the full environment back up, accessible over VPN.

The HA pair was reconstructed. The build pipeline came online. On the last week of August, the studio went fully remote.

For six weeks, an active mobile game production studio operated entirely over VPN from a datacenter that sat inside a building nobody was allowed to enter yet. Production continued without disruption. No milestones missed. No pipeline failures.

When the second safety inspection passed and the occupancy permit was issued in mid-October, the studio opened. The physical move for employees was the final step, a natural transition from working remotely to coming to a brand new studio.

Key Lesson: When the original plan becomes impossible, the question is not how to restore it. The question is what new plan the new constraints allow. The building facilities conversation was not a workaround; it was a requirements discovery session. Understanding the exact boundary of what was permitted revealed a path that the original plan had never considered.

The initial plan failed. The migration and the timeline did not.

Commonalities and Uncommon Paths

Reading these back, I can see a common thread across the first six lessons, one I did not fully understand until I was well past the project.

Every one of these six challenges was a visibility failure.

The contractor expanded into the wrong space because nobody caught it early. The AC units leaked because I was expected to take someone's word for it. The fireproofing failed twice because I was not in the room. The collaboration zones had no cabling because nobody was watching. The rack had no airflow because aesthetics were evaluated and thermals were not. The WiFi was weak in the QA corner because people density was never part of the wireless planning conversation.

The lessons here were earned, not studied.

Handling situations like the seventh, however, requires understanding the situation beyond the defined boundaries of IT. If I had stayed within what IT infrastructure teams are supposed to own, I would never have considered a Hail Mary like moving a portion of the equipment in before there was an entry permit for the new building.

A studio migration is not an infrastructure project with a construction component. It is a coordination problem that happens to involve infrastructure. Contractors, inspectors, furniture vendors, ISPs, and municipality authorities are all operating on their own timelines, with their own definitions of done.

Your definition of done is the one that matters. The only way to enforce it is to be present, to verify directly, and to treat every sign-off as provisional until you have seen it yourself.

Checklists are great to have. Structured plans are amazing to create. But without expecting failures and planning buffers into the project, you invite chaos.

I had no predecessor to learn from. No internal playbook. No one who had done this before and left notes.

This is me leaving the notes.

Data Security Posture Management in Practice

Data Security Posture Management is often discussed in abstract terms: Discovery. Classification. Governance. Remediation.

In reality, posture failures surface during high-pressure events: Migrations. Audits. Incidents.

This story from my experience illustrates how incomplete visibility can translate into operational disruption.

The Scenario

During a large-scale Microsoft tenant-to-tenant cloud migration, the IT team executed a structured migration plan:

  • Exchange mailboxes migrated
  • SharePoint sites migrated
  • OneDrive data migrated
  • Teams environments migrated
  • Permissions mapped and validated

From an infrastructure perspective, the migration was comprehensive. What was missing was discovery. The production team had been using Microsoft Loop as their primary planning environment. Critical project-planning data lived entirely within Loop workspaces. IT had no inventory of this usage. No classification. No tracking. No migration mapping.

When the production team accessed the new tenant, their planning data was incomplete.

The migration had technically succeeded. Operationally, it had not.

What Went Wrong

This was not a tooling failure. It was a visibility failure.

There was:

  • No centralized inventory of SaaS workloads in use
  • No monitoring of newly adopted Microsoft 365 services
  • No sensitivity tagging tied to workload discovery
  • No structured data ownership validation before migration

Loop usage had never been formally onboarded into governance oversight. It existed within the tenant, but not in IT's operational awareness or the business-critical software inventory.

This is a classic posture management gap.

The Consequence

Once the gap was discovered, the organization faced a time-critical recovery scenario.

The only viable path was manual intervention:

  • Identifying affected Loop workspaces
  • Exporting data from the source tenant
  • Recreating workspaces in the destination tenant
  • Copying content manually
  • Validating completeness with production stakeholders

The remediation effort took six full days.

Six days of cross-team coordination, late hours, manual verification, and elevated stress. The migration timeline was disrupted. Trust was strained. Risk exposure increased. The damage to reputation and team trust was far harder to repair than the actual missing data.

All because discovery had not preceded execution.

Where Data Security Posture Management Would Have Helped

A mature posture management capability would have reduced or eliminated this disruption.

1. Continuous Discovery

Automated workload inventory would have revealed:

  • Active Microsoft Loop workspaces
  • Volume of content stored
  • User adoption patterns

Loop would have been visible as a production-critical workload rather than an unnoticed collaboration tool.
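
What that discovery loop could look like in practice: a scheduled pull from the Microsoft Graph usage reports, compared against the inventory IT already governs. The report endpoint is part of the Graph reports API, but the token handling, column names, and "known inventory" logic below are simplified placeholders, and whether Loop workspaces appear in this particular report depends on how the tenant stores them. The pattern of comparing telemetry against a governed inventory is the point, not the specific report.

```python
# Sketch of a scheduled discovery job: pull SharePoint site usage from Microsoft Graph
# and flag anything active that is not in the governed inventory. Illustrative only;
# verify the report columns against your own tenant before relying on them.
import csv
import io

import requests

GRAPH = "https://graph.microsoft.com/v1.0"
TOKEN = "..."  # app-only token with Reports.Read.All (acquisition omitted)

GOVERNED_SITES = {
    "https://contoso.sharepoint.com/sites/Production",  # placeholder inventory entry
}


def sharepoint_site_usage(period: str = "D30") -> list[dict]:
    """Download the per-site usage report for the chosen period as CSV rows."""
    resp = requests.get(
        f"{GRAPH}/reports/getSharePointSiteUsageDetail(period='{period}')",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=60,
    )
    resp.raise_for_status()
    text = resp.content.decode("utf-8-sig")  # report is returned as CSV
    return list(csv.DictReader(io.StringIO(text)))


def flag_ungoverned(rows: list[dict]) -> list[dict]:
    """Anything with real activity that IT has not formally onboarded gets flagged."""
    return [
        row for row in rows
        if row.get("Site URL") not in GOVERNED_SITES
        and int(row.get("Active File Count") or 0) > 0
    ]
```

Run on a schedule, a job like this turns "we did not know the team was using it" into a report line that appears long before a migration is on the calendar.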

2. Data Classification and Sensitivity Mapping

If planning artefacts had been labelled according to sensitivity or business criticality:

  • High-value workspaces would have been flagged
  • Migration planning could have prioritized them
  • Data validation checklists would have included them

Classification provides a signal. Without it, all data appears equal.

3. Pre-Migration Posture Assessment

A structured posture review before migration would have asked:

  • Which workloads are actively used
  • Which contain business-critical data
  • Which services fall outside standard migration tooling

That assessment would likely have surfaced Loop usage early, while remediation was still simple.

4. Ownership and Accountability Mapping

Posture management also clarifies data ownership. If each collaboration workspace had a defined business owner:

  • Owners would have been engaged during migration validation
  • Confirmation of completeness would have occurred before cutover

Instead, ownership discovery happened after the disruption.

The Operational Lesson

Data Security Posture Management is not only about compliance and regulatory alignment. It is about operational continuity. When IT lacks visibility into:

  • Emerging SaaS workloads
  • Shadow adoption of collaboration tools
  • Data criticality distribution

Strategic initiatives such as tenant migrations become risk multipliers. Infrastructure execution without data awareness creates blind spots.

From Discovery to Remediation

In this case, remediation was manual and reactive. It consumed six painful days because the discovery occurred after the impact. A mature posture management lifecycle would follow a different sequence:

  1. Discover workloads and data locations
  2. Assess sensitivity and criticality
  3. Validate ownership
  4. Incorporate findings into migration design
  5. Execute with verified scope

Remediation then becomes exception handling, not crisis response.
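
One way to make that sequence enforceable is to treat it as a gate in the migration plan itself: execution is blocked until every discovered workload has an assessed criticality, a validated owner, and a migration path. The structure below is a sketch with illustrative names, not a product feature:

```python
# Sketch of the posture lifecycle as a pre-execution gate. Field names are illustrative.
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str                          # e.g. "Microsoft Loop - production planning"
    criticality: str = "unknown"       # assessed: low / medium / high / unknown
    owner: str | None = None           # validated business owner
    migration_path: str | None = None  # tooling or manual plan for cutover


@dataclass
class MigrationScope:
    workloads: list[Workload] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Reasons this scope is not yet safe to execute."""
        problems = []
        for w in self.workloads:
            if w.criticality == "unknown":
                problems.append(f"{w.name}: criticality not assessed")
            if w.owner is None:
                problems.append(f"{w.name}: no validated owner")
            if w.migration_path is None:
                problems.append(f"{w.name}: no migration path")
        return problems

    def ready_to_execute(self) -> bool:
        return not self.gaps()
```

Had the Loop workspaces been sitting in a scope like this, they would have carried three open gaps, and the cutover date would have been a decision instead of a surprise.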

Conclusion

The tenant migration did not fail technically. It failed from a posture perspective. The absence of continuous discovery and workload awareness turned a standard cloud migration into a six-day-long recovery exercise.

There is an additional lesson that is often overlooked. In a fast-moving or rapidly growing environment, it is common for teams to adopt new collaboration tools outside formal governance workflows. Without structured discovery and verification mechanisms, these adoptions remain invisible to migration planning.

In this case, there was an implicit assumption that all production-critical planning data was known and accounted for. That assumption proved incorrect.

Verbal confirmation is not validation. IT leadership must independently verify workload usage, data locations, and service dependencies before executing high-impact changes. This means conducting technical discovery scans, usage analysis, access reviews, and controlled testing rather than relying solely on stakeholder declarations.

Data Security Posture Management formalizes that discipline. It replaces assumption with evidence. It ensures that the business teams' beliefs are technically validated before transformation begins.

Infrastructure planning without independent verification is highly risky. Continuous posture management closes that gap and converts uncertainty into measurable control.