State of IT Part 7: IT in Video Game Studios

Why helpdesk fundamentals are not enough in an industry where the entire technology stack can shift twice in a decade.

In the previous installments of this series, we have discussed efficiency during lean periods, understanding colleague workflows, security posture, operational resilience, AI guardrails, and the balance between innovation and stability. These topics apply broadly across industries. But there is one sector where every single one of those themes converges with an intensity that few other environments can match.

Video game development.

This is an industry that builds some of the most technically demanding products in the world, products that ship to millions of consumers simultaneously, yet often treats IT as an afterthought. The expectation in many studios is that IT exists to set up machines, reset passwords, and keep the Wi-Fi running. That expectation is not only outdated. It is actively harmful to production.

Not Just Another Tech Company

From the outside, video game studios look like any other technology company. Developers write code. Artists use workstations. Designers collaborate in shared tools. There are servers, networks, and cloud subscriptions.

The resemblance is surface-level.

Underneath, a game studio operates more like a film production crossed with a software engineering firm, running on timelines dictated by hardware manufacturers, platform holders, and a consumer market that has no patience for technical excuses. The production pipeline in a game studio is not a simple sequence of inputs and outputs. It is a dense, interconnected web of proprietary tools, middleware, engine builds, asset management systems, version control at massive scale, render farms, build distribution platforms, QA infrastructure, and live service backends. All of it moving in parallel. All of it interdependent.

An IT team that does not understand this pipeline is not supporting the studio. It is merely occupying space within it.

The Production Pipeline Is the Product

In most industries, IT supports the business process. In game development, IT is embedded within the product pipeline itself. Consider the chain of dependencies in a single day of production at a mid-to-large studio:

  1. Artists check in high-resolution assets through version control systems like Perforce, often pushing hundreds of gigabytes per day across distributed teams.
  2. Those assets are ingested by the game engine's build pipeline, which compiles, cooks, and packages them for target platforms.
  3. Build servers run continuous integration, producing testable builds for QA, design, and leadership review.
  4. QA teams deploy those builds to dev kits, physical test hardware, and cloud-streaming environments.
  5. Multiplayer engineers rely on backend services, databases, and matchmaking infrastructure that must mirror production environments.
  6. Live operations teams monitor telemetry, player data, and service health in real time once the game ships.

If any single link in this chain breaks, the downstream effect is not a minor inconvenience. It is a production stoppage. A failed build server can halt an entire studio's daily progress. A misconfigured Perforce proxy can turn a ten-second file sync into a twenty-minute ordeal, multiplied across hundreds of users. A network bottleneck during asset ingestion can delay milestone submissions by days.
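
To make the stakes concrete, here is a minimal sketch of the kind of pipeline health probe an embedded IT team might run before the morning rush. The hostnames and ports are hypothetical placeholders, not a prescription; Perforce's default port of 1666 is the only real convention used.

    #!/usr/bin/env python3
    """Minimal pipeline health probe. Hostnames and ports are hypothetical."""
    import socket

    SERVICES = {
        "perforce": ("p4.studio.internal", 1666),        # Perforce default port
        "build server": ("ci.studio.internal", 8080),
        "artifact storage": ("artifacts.studio.internal", 443),
    }

    def is_up(host: str, port: int, timeout: float = 3.0) -> bool:
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    for name, (host, port) in SERVICES.items():
        status = "OK" if is_up(host, port) else "DOWN -- production blocker"
        print(f"{name:18s} {status}")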

IT teams that view their role as separate from this pipeline will consistently be blindsided by the urgency and complexity of the problems they are asked to solve.

The Tectonic Shifts: How the Ground Moves Twice a Decade

Most industries experience technological change gradually. New software versions roll out. Cloud migrations happen over quarters or years. Hardware refreshes follow predictable depreciation cycles.

Video games do not operate on that cadence.

The games industry is tethered to hardware generations and platform evolution in a way that few other sectors are. When a new console generation launches, when a new graphics API becomes standard, when a major engine overhauls its rendering pipeline, the ripple effects are seismic. These shifts tend to arrive roughly every five to seven years, meaning that within a single decade, the foundational technology that a studio's entire workflow depends upon can change fundamentally. Twice.

This applies to mobile studios just as acutely, though the shifts take a different shape. Mobile game development is governed by the release cycles of Apple and Google. A single iOS update can deprecate rendering frameworks, change how push notifications behave, or alter memory management rules overnight. Android fragmentation introduces its own layer of complexity, where a game must perform acceptably across thousands of device configurations with wildly different chipsets, screen resolutions, and OS versions. When Apple transitioned from OpenGL ES to Metal, or when Google began enforcing 64-bit requirements and target API level mandates, studios that were not prepared lost weeks of production time scrambling to comply.

Consider what has shifted in the last decade alone:

  • Console generations transitioned from the PS4/Xbox One era to the PS5/Xbox Series generation, requiring entirely new dev kit infrastructure, updated SDKs, and new build configurations.
  • Mobile platforms moved through multiple seismic shifts: the deprecation of OpenGL ES in favour of Metal and Vulkan, mandatory 64-bit support, App Tracking Transparency upending analytics and monetisation pipelines, and increasingly aggressive background process restrictions that changed how live games maintain persistent connections.
  • Game engines have moved from largely offline, packaged-build models to live-service, always-connected architectures requiring persistent backend infrastructure. For mobile studios, this shift was not optional. The free-to-play model that dominates mobile demands live operations from day one.
  • Asset fidelity has increased exponentially, with photogrammetry, volumetric capture, and procedural generation placing massive new demands on storage, networking, and compute. Even mobile titles now ship with gigabytes of downloadable assets and require robust CDN strategies for over-the-air content delivery.
  • Remote and distributed development, accelerated by the pandemic, has become a permanent fixture, requiring studios to rethink VPN architecture, remote workstation access, and globally distributed build systems.
  • AI-assisted workflows for content generation, testing, and localisation have begun entering production pipelines, and studios are still determining what infrastructure, governance, and access controls these tools require.

Each of these shifts does not merely add to the existing workload. It restructures it. The IT team that was expertly managing on-premise Perforce servers in 2018 may now need to architect hybrid cloud-edge solutions for globally distributed teams. The mobile studio IT team that once maintained a handful of Mac Minis for iOS builds may now be managing a fleet of Apple Silicon build agents, Android signing infrastructure across multiple keystores, and automated submission pipelines to both app stores simultaneously.

This is not incremental change. It is periodic reinvention.

The Knowledge Gap: Helpdesk Fundamentals Are Not Enough

There is nothing wrong with strong helpdesk skills. Provisioning accounts, imaging machines, managing device inventories, and handling break-fix tickets are all necessary functions. They are the foundation. But in a game studio, they are only the foundation.

The challenge is that many studios, particularly smaller or mid-sized ones, hire IT staff with generalist backgrounds and expect them to operate in an environment that demands specialist knowledge. This is especially common in mobile studios, where the early-stage team is small enough that IT responsibilities are shared informally or handled by a single person wearing multiple hats. The result is a persistent knowledge gap that only becomes visible when it is already causing damage.

An IT administrator in a game studio needs to understand, at minimum:

  1. Version control at scale. Perforce is the industry standard for large binary assets in console and PC development. Mobile studios often start with Git or Git LFS, which works adequately for a single small project but begins to strain under the weight of multiple concurrent titles with large asset repositories. Understanding when and how to migrate, or how to manage branching strategies across several live projects sharing common frameworks, is critical knowledge that a generalist background does not provide.
  2. Build infrastructure. Whether it is Jenkins, TeamCity, Unreal's BuildGraph, Fastlane for mobile, or a custom system, IT must understand how builds are compiled, distributed, and validated. In mobile studios, build infrastructure carries additional complexity: iOS builds require macOS hardware, Android builds require managing SDK versions and NDK configurations, and both platforms demand code signing workflows that are fragile and poorly documented. A build engineer and an IT administrator in this industry share a significant overlap in responsibilities.
  3. Workstation specifications and GPU workflows. Artists, programmers, and technical artists have workstation requirements that are fundamentally different from a standard corporate environment. Mobile studios may underestimate this, assuming that because the target device is a phone, the development hardware can be modest. This is a misconception. Authoring content for mobile still demands capable workstations, and the testing matrix of physical devices that IT must procure, manage, charge, update, and distribute across QA teams is a logistical challenge unto itself.
  4. Network architecture for high-throughput environments. The volume of data moving through a studio's network, including asset syncs, build distribution, render output, and telemetry streams, dwarfs typical enterprise traffic. Network design must account for this or production suffers.
  5. Platform-specific compliance and security. Console development requires adherence to strict NDAs and security requirements from platform holders like Sony, Microsoft, and Nintendo. Mobile development carries its own compliance burden: App Store review guidelines that change without warning, Google Play policy updates that can pull a live game from the store, privacy regulations that affect SDK integration, and the constant management of provisioning profiles, certificates, and entitlements that silently expire and break builds at the worst possible moment (a sketch for catching those expiries appears after this list). IT must understand these requirements at a level that goes well beyond standard corporate policy.
  6. Confidentiality beyond the platform holders. Studios also manage NDA and confidentiality obligations with middleware providers, outsourcing partners, and service vendors. An IT team must understand which tools and environments are subject to these agreements, and ensure that access provisioning, data handling, and network segmentation reflect those contractual boundaries. A vendor NDA breach caused by misconfigured access is not a hypothetical. It is a career-ending event for the people responsible.
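
On the point about silently expiring signing assets, here is a minimal sketch of the kind of small check that turns a surprise build break into a calendar item. It assumes macOS with the built-in "security" command-line tool and provisioning profiles in the default per-user location; treat it as an illustration, not a finished tool.

    #!/usr/bin/env python3
    """Flag Apple provisioning profiles that expire soon (macOS only)."""
    import plistlib
    import subprocess
    from datetime import datetime, timedelta
    from pathlib import Path

    # Default per-user profile location on macOS; adjust for build agents.
    PROFILE_DIR = Path.home() / "Library/MobileDevice/Provisioning Profiles"
    WARN_WINDOW = timedelta(days=30)  # flag anything expiring within 30 days

    def decode_profile(path: Path) -> dict:
        # Profiles are CMS-signed plists; `security cms -D` strips the signature.
        raw = subprocess.run(
            ["security", "cms", "-D", "-i", str(path)],
            check=True, capture_output=True,
        ).stdout
        return plistlib.loads(raw)

    now = datetime.now()
    for path in sorted(PROFILE_DIR.glob("*.mobileprovision")):
        info = decode_profile(path)
        expires = info["ExpirationDate"]  # plistlib yields a naive datetime
        if expires - now < WARN_WINDOW:
            print(f"EXPIRING {expires:%Y-%m-%d}  {info.get('Name', path.name)}")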

None of this is exotic knowledge. But it is specialised, and it is rarely part of a traditional IT training path. The expectation that a generalist helpdesk background prepares someone for this environment is one of the most common and most costly misconceptions in the industry.

The Scaling Problem: From One Project to Many

Perhaps nowhere is the gap between generalist IT and production-aware IT more painfully exposed than in mobile studios that experience rapid growth.

The pattern is familiar. A studio launches with a single game. The team is small. Infrastructure is lean, often held together with a combination of cloud services, manual processes, and institutional knowledge stored in a few people's heads. IT, if it exists as a distinct function at all, is reactive and informal. Tickets are Slack messages. Documentation is sparse. It works because the scale is manageable.

Then the game succeeds.

Revenue comes in. The studio greenlights a second project. Then a third. Hiring accelerates. Suddenly there are multiple teams, each with different engine versions, different backend stacks, different build requirements, and different release cadences. The infrastructure that comfortably supported thirty people working on one game cannot support a hundred and fifty people working on four.

This is where the cracks appear.

  1. Identity and access management becomes tangled. What started as a flat permission structure with everyone having access to everything must now be segmented by project, by discipline, by seniority. Platform holder NDAs may require that only specific employees can access certain repositories or dev kits. Onboarding a new hire used to take an afternoon. Now it takes days because nobody has documented which groups, tools, licences, and environments each role requires.
  2. Access drift goes unwatched. Equally important is what happens after access is granted. Without continuous monitoring, permissions accumulate and drift. An artist who moved from Project A to Project B six months ago may still have write access to both repositories. A contractor whose engagement ended may still have active credentials. Access reviews in a fast-moving studio feel like overhead until the audit, or the breach, arrives. Automated access monitoring and periodic entitlement reviews are not bureaucratic exercises. They are the minimum standard for a studio handling multiple projects under separate NDAs and compliance requirements. A toy version of such a review is sketched after this list.
  3. Build infrastructure does not scale linearly. A single build pipeline for one project is straightforward. Four concurrent pipelines, each with their own platform targets, signing configurations, and release branches, competing for the same build agents and artefact storage, is an entirely different problem. Build queues back up. Developers wait. Production slows.
  4. Tooling sprawl accelerates. Each new project team brings preferences. One team uses Jira, another prefers Linear. One team deploys backends on AWS, another inherited a GCP setup. Without intentional governance, the tool landscape fragments, and IT is left supporting an ever-expanding matrix of platforms with no standardisation and no leverage.
  5. Live operations multiply the surface area. A single live game requires monitoring, incident response, content deployment, and player-facing service management. Multiple live games multiply all of this. Each game has its own release calendar, its own event schedule, its own critical revenue periods. An outage during a limited-time event in one game is a revenue loss measured in real currency. IT must ensure that the infrastructure supporting these services is resilient, observable, and independently manageable.
  6. Technical debt compounds invisibly. The shortcuts that were acceptable at a smaller scale, such as hardcoded configurations, manual deployment steps, and undocumented server setups, become liabilities. But there is rarely a mandate to address them because leadership is focused on shipping the next game. IT inherits this debt whether or not it was involved in creating it.
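
Here is the toy entitlement-drift review promised in point two. All names and groups are hypothetical; real inputs would come from your identity provider's group exports and an HR or project roster, and revocation would go through your normal change process rather than a print statement.

    #!/usr/bin/env python3
    """Toy entitlement-drift review. All users and groups are hypothetical."""

    # Expected access per person, derived from current project assignment.
    expected = {
        "artist_a": {"projectB-depot"},
        "contractor_x": set(),  # engagement ended: should hold nothing
    }

    # Actual access as exported from the identity provider.
    actual = {
        "artist_a": {"projectA-depot", "projectB-depot"},  # stale Project A grant
        "contractor_x": {"projectA-depot"},                # orphaned credential
    }

    for user, granted in actual.items():
        drift = granted - expected.get(user, set())
        if drift:
            print(f"{user}: revoke {', '.join(sorted(drift))}")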

The studios that navigate this transition successfully are the ones where IT is involved early in the scaling conversation. Not after the third project has been greenlit and the infrastructure is already straining, but at the point where growth is being planned. IT needs a seat in that room, not to slow things down, but to ensure that the foundation can support what is being built on top of it.

The Cultural Disconnect

There is often a cultural gap between IT departments and production teams in game studios. Developers, artists, and designers are accustomed to working with cutting-edge technology. They push hardware to its limits. They customise their tools extensively. They expect rapid iteration and minimal friction.

In mobile studios, this culture runs particularly hot. The pace of live operations means that production teams are accustomed to shipping updates weekly, sometimes more frequently. They expect environments to be available, builds to be green, and deployments to be seamless. When IT introduces process — change windows, approval gates, access reviews — it can feel like friction being imposed by people who do not understand the urgency.

IT teams that approach this environment with a rigid, policy-first mindset will encounter resistance. Not because production teams are undisciplined, but because the nature of creative production demands flexibility that traditional IT governance models do not always accommodate.

This does not mean security and process should be abandoned. Far from it. As we discussed in Part 3 and Part 5 of this series, security posture and AI governance are non-negotiable. But the approach must be adapted to the context. Lockdown policies that work in a financial services firm will strangle a game studio. Approval workflows designed for quarterly software deployments will be incompatible with a production environment that deploys internal builds multiple times per day and pushes live content updates to millions of players on a weekly cadence.

The most effective IT teams in game studios are those that earn their seat at the production table. They attend sprint reviews. They understand milestone deliverables. They know what "alpha", "beta", and "gold master" mean in console development and what "soft launch", "global launch", and "LiveOps calendar" mean in mobile. They understand that a store submission deadline is not a suggestion. They are not waiting for tickets to arrive. They are anticipating the needs before they become blockers.

Building for the Next Shift

Given that the technology landscape in games will continue to shift, the question is not whether the next disruption is coming. It is whether IT is prepared to absorb it without falling behind.

For mobile studios, the next shifts are already visible on the horizon. Platform holders are tightening privacy controls further. Cross-play and cross-progression between mobile and other platforms are becoming player expectations. Cloud gaming is blurring the line between mobile and console entirely. AI-driven content pipelines are promising to accelerate production but introducing new infrastructure requirements and governance questions that most studios have not yet answered.

As AI tools enter the production pipeline, studios need clear policy frameworks governing their use: what data can be fed into third-party models, how generated assets are reviewed for IP compliance, and who approves the integration of new AI services into production workflows. IT is uniquely positioned to enforce these frameworks at the infrastructure level, controlling which services are accessible, how data flows between internal systems and external APIs, and ensuring that usage is logged and auditable. Without this, AI adoption becomes another vector for shadow IT, as discussed in Part 5 of this series.

Preparation means investing in several areas:

  1. Modular infrastructure. Design systems that can be reconfigured without being rebuilt from scratch. Containerised build environments, infrastructure-as-code, and abstracted storage layers all contribute to adaptability. For studios running multiple live games, modular infrastructure also means shared services, such as centralised authentication, common monitoring stacks, and unified artefact repositories, that reduce duplication without creating dangerous single points of failure.
  2. Continuous learning. IT staff in game studios must be given time and resources to stay current with engine updates, platform SDK changes, and emerging tools. In mobile, this includes staying ahead of Apple's WWDC announcements, Google Play policy updates, and the evolving landscape of ad mediation, analytics, and attribution SDKs that live games depend upon. This is not a luxury. It is an operational necessity.
  3. Cross-functional relationships. IT should have direct lines of communication with technical directors, pipeline engineers, and production managers. When IT understands what production is building toward, it can provision proactively rather than reactively. In a multi-project studio, this means IT should have visibility into each project's roadmap, not just its current ticket queue.
  4. Documentation and knowledge transfer. The institutional knowledge of how a studio's pipeline works is often held by a handful of senior engineers. IT should actively participate in documenting these systems so that support continuity does not depend on individual availability. This is doubly important in fast-growing studios where the people who built the original infrastructure are increasingly consumed by the demands of the newest project and unavailable to support the systems they created.

A Note on Recognition

There is an uncomfortable truth worth stating plainly. IT in video game studios is frequently under-resourced, under-recognised, and under-represented in production decisions. Studios will spend millions on user acquisition campaigns and proprietary engine features while running their IT operations on minimal staff and constrained budgets. Mobile studios are especially prone to this because the perceived simplicity of the platform, captured in phrases such as "it is just a mobile phone game" or "these are just casual games", masks the genuine complexity of the infrastructure required to develop, deploy, and operate live games at scale.

This is a structural problem, not an individual one. And it will not change until IT teams demonstrate, consistently, that they understand the production pipeline deeply enough to be considered part of it. This is not about seeking validation. It is about earning the influence needed to make infrastructure decisions that serve the studio's long-term health rather than merely reacting to its short-term emergencies.

Understanding the pipeline is not a bonus qualification. It is the baseline.

The State of IT in Games

The video game industry is similar to other technology sectors in its reliance on infrastructure, security, and operational discipline. It is fundamentally different in its pace of change, the density of its production pipelines, and the degree to which its supporting technology can be reshaped by external forces outside the studio's control.

Mobile game development amplifies these characteristics. The release cycles are faster. The platform shifts are more frequent and less predictable. The scaling challenges are more abrupt. And the expectation that IT can simply "keep things running" without deeply understanding what "things" are and how they connect is more dangerous.

IT teams in this space cannot afford to be generalists who happen to work in games. They need to be technologists who understand game production. The distinction matters because when the next platform shift arrives, when the next engine overhaul lands, when the next wave of tooling transforms how content is created, when the studio's third or fourth live game goes into production and the infrastructure must absorb it without collapsing, it will be the IT teams that understood the pipeline who adapt. Everyone else will be scrambling.

That is the reality of IT in video game studios. The ground moves. The question is whether you are building on bedrock or sand.

Directing the Move: Learning by Shifting a Game Production Studio

Moving is a challenging task on its own, but moving a production studio without major disruptions is another beast altogether. I have planned and executed a full studio migration a couple of times now.

This is the story of when I moved an active mobile production studio from one end of Abu Dhabi to the other. I designed the network, coordinated the contractors, and kept production running through all of it.

The Setup

In late 2021, an active mobile production studio in Abu Dhabi began planning a move from Park Rotana Complex in Khalifa Park to the newly constructed Yas Creative Hub on Yas Island.

By the time IT was brought into the planning process, the broad decisions had already been made. The timeline existed. The building was signed. What remained was the execution, and the question of how to move a live production environment across the city with, at most, a weekend of downtime.

I had one person on my team. That person was me.

The planning phase ran from November 2021 to March 2022. Physical execution began in March and concluded with the studio officially opening in October 2022. Between the start of planning and opening day lived approximately eleven months of contractor negotiations, municipal certifications, regulatory checks, ISP transitions, and the specific kind of creative problem-solving that only emerges when something has gone genuinely wrong.

What follows are the lessons I took out of that process. Each one is the product of something not going according to plan.

Lesson 1: Contractors Will Interpret Your Plans Creatively

The Townhall area was one of the centrepiece spaces in the new studio. Staircase-style tiered seating, designed to hold the full studio for all-hands presentations and company-wide events. Significant square footage. Significant investment.

At some point during construction, the contractor expanded the footprint of the Townhall seating into the adjacent office space.

This was not in the drawings. This was not discussed. This was a unilateral creative decision made by someone who either misread the plans or chose not to read them at all.

The options at that point were: tear down the built staircase structure and rebuild within the correct boundaries, or reduce the Townhall seating to fit the original allocation and accept the smaller configuration.

Tearing down the staircase would have added weeks to the timeline and reopened a cost conversation nobody wanted to have. I made the call to reduce the seating and move on.

Key Lesson: Treat every contractor deliverable as a draft until you have physically walked it. Do not assume that architectural drawings translate into accurate builds without supervision. The gap between what is planned and what is built is the gap you do not check on.

Lesson 2: "That's Normal" Is Not a Technical Answer

The Yas Creative Hub building was new. The HVAC systems were new. The air conditioning units for the studio floor were new.

When the AC units were tested, they leaked water.

All of them.

The contractor's position was that this was normal behaviour during initial testing. Condensation. Expected. Nothing to worry about.

I did not accept this.

Water leaking from ceiling-mounted AC units directly above workstations, server infrastructure, and production hardware is not a commissioning quirk. It is a liability. I pushed back, escalated, and held the sign-off until the units were inspected, the drainage systems corrected, and a dry test was completed.

The contractor was not happy about this. The contractor was also wrong.

Key Lesson: When a contractor tells you a failure mode is normal, ask them to put it in writing. The willingness to document tends to clarify the situation quickly. You are not an expert in HVAC engineering. You are an expert in what happens to your infrastructure when water falls on it.

Lesson 3: Certification Will Slip. Plan for It to Fail Three More Times.

The datacenter required fireproofing certification from the Abu Dhabi municipality before it could be approved for occupation. This is not optional. This is not a formality. Without it, the datacenter does not open.

The first inspection failed. The contractor had fireproofed the upper walls but not the lower portion where the raised floor began. The inspector identified the gap immediately.

The contractor returned, made corrections, and scheduled the second inspection.

The second inspection failed. Same issue, different section of wall.

By the third scheduled inspection, I attended in person. I walked the inspector through the space before the formal review. I had already identified and flagged the remaining gaps to the contractor the day prior. We tore the wall down completely, sealed it from the ceiling all the way to the gaps in the raised floor that had been flagged the previous times, and rebuilt it in a day.

Eleven months of planning. Three fireproofing inspections later, the datacenter was finally approved.

Key Lesson: Regulatory certification does not run on your project timeline. Build buffer into every sign-off that involves a third-party authority. Assume one failure at minimum. Assume two if the contractor has already demonstrated they are reading the requirements selectively. The municipality inspector is not your adversary; they are the only person in the room who has no reason to cut corners.

Lesson 4: Someone Has to Be in the Room

Network cabling for floor boxes sounds like an unglamorous task. Route the cables, terminate the connections, test the links. Standard process.

The studio floor had clearly defined user desk areas. It also had collaboration zones, separate open spaces designed for informal meetings, breakout sessions, and team clusters away from the main desk rows.

The cabling team forgot about the collaboration zones entirely.

This was not discovered during the cabling phase. It was discovered when the floor was nearly complete and someone asked why the collaboration areas had no connectivity.

The answer was that nobody had been watching.

Separately, the meeting rooms, conference room, and Townhall all arrived with furniture below the specified quality. Table microphones and ceiling speakers were installed without acoustic testing. The first time a meeting was held with both active simultaneously, the room fed back on itself.

The soundproofing was inadequate. I had to require the contractor to return and correct the acoustic treatment before the rooms were signed off.

Neither of these failures was inevitable. Both would have been caught earlier with consistent on-site supervision.

Key Lesson: Delegation to a contractor is not the same as oversight. Also, expecting the contractor to read the plans carefully is a mistake on its own. For any build task that involves multiple phases or multiple teams, assign a human being, ideally yourself, to physically verify completion at each stage. The cabling team did not maliciously skip the collaboration zones. They were not reminded those zones existed.

Lesson 5: Build for Airflow Before You Build for Aesthetics

The build machines, the systems used for compiling and packaging the game builds, were racked in a dedicated enclosure. The rack was purpose-built for the space. Solid construction. Clean cable management. Doors on all sides.

The machines were fully sealed.

When I tested the systems with the doors closed, temperatures climbed immediately. The enclosure had no rear ventilation. Hot air had nowhere to go. Left unaddressed, the build machines would have begun throttling and eventually failing within weeks of production use.

The fix was not elegant. I instructed the team to cut the back panel off the enclosure to create airflow. A purpose-built rack, freshly installed in a new studio, was modified with a cutting tool before it ever went into use.

It worked. The temperatures normalized. Production was never impacted.

Key Lesson: Thermal management is not an afterthought. When specifying any enclosure, rack, or cabinet that will house active compute hardware, airflow path is a primary requirement, not a secondary consideration. A rack that looks correct but traps heat is worse than no rack at all. Ask where the hot air goes before you approve the build.
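
For anyone specifying an enclosure, a back-of-envelope airflow check takes seconds. The sketch below uses the standard sea-level air-cooling approximation (required CFM is roughly 3.16 times the heat load in watts, divided by the allowed temperature rise in degrees Fahrenheit); the machine count and wattage are hypothetical.

    #!/usr/bin/env python3
    """Back-of-envelope airflow check for an enclosure. Loads are hypothetical."""

    def required_cfm(watts: float, delta_t_f: float = 20.0) -> float:
        # Standard sea-level air-cooling approximation: CFM = 3.16 * W / dT(F).
        return 3.16 * watts / delta_t_f

    rack_load_w = 4 * 800  # four build machines at ~800 W each (assumed)
    print(f"~{required_cfm(rack_load_w):.0f} CFM to hold a 20 F rise")
    # A fully sealed cabinet moves ~0 CFM. That is the whole problem.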

Lesson 6: Wi-Fi Is a People Problem, Not a Space Problem

The wireless network for the new studio was designed based on the floor plan. Access points were positioned to achieve even signal coverage across the studio area. The planning looked correct on paper.

When the studio opened, one team reported consistently poor wireless performance. The QA team.

QA teams in mobile game production operate with a high density of devices. Each tester runs multiple handsets simultaneously, sometimes four to six devices per person. A team of five QA testers represents potentially twenty to thirty active wireless devices in a single area.

The QA team had been seated in a corner of the studio. The AP coverage in that corner was designed for standard user density. It was not designed for the device density of a QA floor.

The fix required repositioning access points and adjusting the wireless design to reflect actual usage patterns. It was not a complicated fix. It was a fix that would not have been necessary had the seating plan been completed earlier.

Key Lesson: Plan your wireless network after you have confirmed where every team is sitting, not before. Coverage maps measure signal strength across physical space. They do not account for the number of devices a given team will connect. Seat your highest-density teams first. Design the wireless network around them.

Lesson 7: When the Plan Fails, Build a Fallback

Building contract dates are hard enough to manage on their own, and harder still in the middle of a large-scale migration like a studio move. These are exactly the scenarios that demand buffers and fallback plans.

The contract with the Park Rotana building ended in August 2022. The original agreement with the contractor was that the new studio would be ready by then. It was not.

Electrical approvals had not been completed. The site safety inspection had failed. The physical move approval was delayed until a full cleanup could be completed, estimated for the end of September at the earliest. The old building contract was expiring. The new building could not legally be occupied.

There was a gap. A real one.

I had an internal discussion with the Yas Creative Hub facilities team and confirmed one thing: moving equipment in for installation and testing purposes was permitted. The building did not need an occupancy permit for machines. Only for people.

That was enough.

I formulated a plan around that single fact. The ISP line was installed and activated in the new building. I made the decision to break the firewall high-availability pair and move one unit to Yas Creative Hub, making it the primary. The network was built out and tested. Then on a single weekend, I shut down the datacenter at Park Rotana, moved the servers and build machines to the new site, and brought the full environment back up, accessible over VPN.

The HA pair was reconstructed. The build pipeline came online. In the last week of August, the studio went fully remote.

For six weeks, an active mobile game production studio operated entirely over VPN from a datacenter that sat inside a building nobody was allowed to enter yet. Production continued without disruption. No milestones missed. No pipeline failures.

When the second safety inspection passed and the occupancy permit was issued in mid-October, the studio opened. The physical move for employees was the final step, a natural transition from working remotely to coming to a brand new studio.

Key Lesson: When the original plan becomes impossible, the question is not how to restore it. It is what new plan the new constraints allow. The building facilities conversation was not a workaround; it was a requirements discovery session. Understanding the exact boundary of what was permitted revealed a path that the original plan had never considered.

The initial plan failed. The migration and the timeline did not.

Commonalities and Uncommon Paths

Reading these back, there is a common thread across the first six lessons that I did not fully understand until I was well past the project.

Every one of these six challenges was a visibility failure.

The contractor expanded into the wrong space because nobody caught it early. The AC units leaked because I was expected to take someone's word for it. The fireproofing failed twice because I was not in the room. The collaboration zones had no cabling because nobody was watching. The rack had no airflow because aesthetics were evaluated and thermals were not. The WiFi was weak in the QA corner because people density was never part of the wireless planning conversation.

The lessons here were earned, not studied.

Handling situations like the seventh, however, requires an understanding that goes beyond defined IT jurisdictions. If I had stayed strictly within what IT infrastructure teams are supposed to work on, I would never have considered a Hail Mary such as moving a portion of the equipment in before anyone had a permit to enter the new building.

A studio migration is not an infrastructure project with a construction component. It is a coordination problem that happens to involve infrastructure. Contractors, inspectors, furniture vendors, ISPs, and municipality authorities are all operating on their own timelines, with their own definitions of done.

Your definition of done is the one that matters. The only way to enforce it is to be present, to verify directly, and to treat every sign-off as provisional until you have seen it yourself.

Checklists are great to have. Structured plans are amazing to create. Without expecting failures and planning buffers into the project, however, you invite chaos.

I had no predecessor to learn from. No internal playbook. No one who had done this before and left notes.

This is me leaving the notes.

AI in IT: Raising the Bar

AI’s True Impact on IT: Raising the Bar, Not Replacing the Workforce

Artificial Intelligence is accelerating across industries. Creative roles are already being re-evaluated. Operational workflows are being automated. Entire teams are being restructured around automation and efficient, lean pipelines.

Within IT, the reaction often swings between two extremes.

One side believes IT is next.
The other dismisses the shift entirely, believing IT is immune.

Both responses miss the point.

AI is unlikely to replace IT departments in the near term. But it will redefine which IT teams remain strategic and which become overhead.

What Will Shrink?

The parts of IT built on repetition are vulnerable.

These are the first areas that AI will streamline.

  • Level 1 Support
  • Basic ticket triage
  • Routine systems provisioning
  • Template policy drafting
  • First-pass vulnerability reviews
  • Log filtering and alert classification

These tasks follow patterns. AI thrives on patterns.

If a portion of your daily work can be scripted, it must be scripted. Those are the parts of the role that are at risk of vanishing first. If an IT team defines its value by the volume of tickets closed or manual tasks performed, that team is operating on a layer that will vanish entirely.

This is not to say the team failed. This is structural evolution with the available technology.

What Must Grow?

The next generation of IT teams must shift toward control-plane thinking. Do not just execute tasks; think a level higher.

Take one step back and understand why something must be done.
Ask what management wants from the task.
Ask how you can make it more efficient.

Focus less on operating systems manually. Instead, focus on designing systems that operate themselves.

The durable layers of IT will be:

  • Architecture design
  • Automation strategy
  • Governance modeling
  • Risk orchestration
  • Vendor integration oversight
  • Identity and Access strategy
  • Resilience engineering
  • Multi-cloud decision framing

Notice how most of these layers are not at the execution level. AI can execute tasks, as long as there is a strong and capable team providing direction, correcting flow errors, monitoring at level 2, and controlling the boundaries. AI can assist with your tasks; it cannot own them. Accountability, trade-offs, and contextual judgement remain human responsibilities.

Practical Steps for L1 / L2 Teams

Start with the steps below and make them your own over time. These are the skills I look for when I hire for my team.

1 - Document and Script Repetition

If a task is performed more than three times, it should not remain manual.

  • Create PowerShell / Bash scripts for recurring fixes.
  • Build standard provisioning templates.
  • Maintain shared script repositories.

An L1 engineer who writes automation becomes harder to replace than one who executes tasks manually.
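
As a concrete illustration, here is what one recurring fix might look like once scripted. The cache path and retention window are hypothetical, and the same few lines translate directly to the PowerShell or Bash the list suggests.

    #!/usr/bin/env python3
    """Prune stale build-agent cache directories. Path and age are hypothetical."""
    import shutil
    import time
    from pathlib import Path

    CACHE_ROOT = Path("/var/cache/buildagent")  # hypothetical location
    MAX_AGE_DAYS = 14

    cutoff = time.time() - MAX_AGE_DAYS * 86400
    for entry in CACHE_ROOT.iterdir():
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            print(f"removing stale cache dir: {entry}")
            shutil.rmtree(entry, ignore_errors=True)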

2 - Convert Tickets Into Patterns

Instead of resolving tickets individually, do the following. It improves your team's efficiency by a greater margin than you expect. A sketch of the first step follows the list.

  • Identify the top 10 recurring issues.
  • Map root causes.
  • Propose structural fixes.
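
As promised, a sketch of the first step. It assumes a hypothetical tickets.csv export with a category column; adapt the field name to whatever your ticketing system calls it.

    #!/usr/bin/env python3
    """Surface the top recurring ticket categories from a helpdesk export."""
    import csv
    from collections import Counter

    with open("tickets.csv", newline="", encoding="utf-8") as f:
        counts = Counter(row["category"] for row in csv.DictReader(f))

    for category, count in counts.most_common(10):
        print(f"{count:5d}  {category}")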

Reducing ticket volume through systemic correction is higher leverage than resolving tickets faster. In my experience, this has always received the most pushback from IT teams, because of the inherent fear that team sizes are proportional to the volume of tickets received and resolved every quarter.

In organizations where management measures team efficiency this way, communication is key. Clearly explain the long-term cost difference between sustaining a high volume of L1 tickets and letting the team focus on automation.

Emphasize AI adoption by the IT team as a way to better support operational teams without additional 'subject-matter expert' hires.

3 - Build Self-Service Layers

The following will help reduce ticket noise immediately.

  • Introduce password self-service tools.
  • Automate onboarding templates. A toy example follows this section.
  • Create internal knowledge portals.

The goal is not to protect ticket volume. It is to eliminate avoidable load, freeing up the team to focus on better initiatives.
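
Here is the toy onboarding example referenced above. The roles and checklist items are hypothetical; in a real deployment the template would drive provisioning in your identity provider and MDM rather than printing a checklist.

    #!/usr/bin/env python3
    """Generate a role-based onboarding checklist. Role contents are hypothetical."""

    ROLE_TEMPLATES = {
        "qa_tester": ["AD group: QA", "Jira access", "device-lab Wi-Fi", "TestFlight invite"],
        "artist": ["AD group: Art", "Perforce depot access", "DCC tool licence"],
    }

    def checklist(role: str, name: str) -> str:
        lines = [f"Onboarding checklist for {name} ({role}):"]
        lines += [f"  [ ] {item}" for item in ROLE_TEMPLATES.get(role, [])]
        return "\n".join(lines)

    print(checklist("qa_tester", "new.hire"))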

4 - Learn AI-Assisted Operations

AI tools can already:

  • Draft scripts.
  • Summarize logs.
  • Suggest remediation paths.
  • Parse audit outputs.

Teams that learn to use AI to accelerate analysis will outperform those that resist it. A sketch of AI-assisted log summarization follows.
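
The snippet below assumes the OpenAI Python SDK and an API key in the environment, but any comparable model API works; the model name is only an example. Never send logs that may contain secrets or regulated data to an external service without an approved policy, which is exactly the guardrail theme of Part 5.

    #!/usr/bin/env python3
    """Use an LLM to pre-summarize a noisy log excerpt. SDK and model assumed."""
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def summarize_log(excerpt: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # example model name
            messages=[
                {"role": "system",
                 "content": "Summarize the errors in this log and suggest likely root causes."},
                {"role": "user", "content": excerpt},
            ],
        )
        return response.choices[0].message.content

    with open("app.log", encoding="utf-8", errors="replace") as f:
        print(summarize_log(f.read()[-8000:]))  # last ~8 KB keeps the prompt small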

5 - Shift from Task Completion to Risk Awareness

L1/L2 engineers should begin asking themselves:

  • What is the business impact of this failure?
  • Is this symptom masking a systemic issue?
  • Is there an automation opportunity here?

This mindset transition is the bridge toward architectural relevance and organizational importance.

The Complacency Risk

Becoming overly worried about AI is unproductive. Becoming complacent, however, is far more dangerous.

Complacency in IT sounds like this:

  • "I am doing my job, and that is enough."
  • "Infrastructure cannot be automated."
  • "Cloud is just someone else's server."
  • "AI is just another tool."
  • "AI usage makes the IT team relaxed."

The teams that shrink fast will be those that treat AI as noise rather than signal. The ones that evolve will be those that deliberately redesign their operating model to adapt with the industry.

The Required Mindset Shift

IT teams must stop equating being busy with value. You might feel that you do a lot for your organization and are therefore safe from restructuring. That is a dangerous assumption.

Closing more tickets is not strategic leverage. Firefighting faster is not operational maturity. Future-ready IT will work on the following today.

  • Reduce manual dependency.
  • Design automation intentionally.
  • Accept smaller, higher-leverage structures and systems.
  • Measure success by system stability and risk reduction.
  • Treat AI as an operational accelerator, not a threat.

The shift is from executor to architect.
From operator to orchestrator.
From cost center to continuity guarantor.

Closing Thoughts

AI will not eliminate IT departments overnight. That fear is overstated. But it will expose which teams are tactical operators and which are strategic designers.

The future of IT provides strong growth paths to teams that embrace automation at their core. This future belongs to the teams that design the systems automation runs on.

Those who design systems, not just operate them, will remain indispensable.

Data Security Posture Management in Practice

Data Security Posture Management is often discussed in abstract terms: Discovery. Classification. Governance. Remediation.

In reality, posture failures surface during high-pressure events: Migrations. Audits. Incidents.

This story from my experience illustrates how incomplete visibility can translate into operational disruption.

The Scenario

During a large-scale Microsoft tenant-to-tenant cloud migration, the IT team executed a structured migration plan:

  • Exchange mailboxes migrated
  • SharePoint sites migrated
  • OneDrive data migrated
  • Teams environments migrated
  • Permissions mapped and validated

From an infrastructure perspective, the migration was comprehensive. What was missing was discovery. The production team had been using Microsoft Loop as their primary planning environment. Critical project-planning data lived entirely within Loop workspaces. IT had no inventory of this usage. No classification. No tracking. No migration mapping.

When the production team accessed the new tenant, their planning data was incomplete.

The migration had technically succeeded. Operationally, it had not.

What Went Wrong

This was not a tooling failure. It was a visibility failure.

There was:

  • No centralized inventory of SaaS workloads in use
  • No monitoring of newly adopted Microsoft 365 services
  • No sensitivity tagging tied to workload discovery
  • No structured data ownership validation before migration

Loop usage had never been formally onboarded into governance oversight. It existed within the tenant, but not in IT's operational awareness or the business-critical software inventory.

This is a classic posture management gap.

The Consequence

Once the gap was discovered, the organization faced a time-critical recovery scenario.

The only viable path was manual intervention:

  • Identifying affected Loop workspaces
  • Exporting data from the source tenant
  • Recreating workspaces in the destination tenant
  • Copying content manually
  • Validating completeness with production stakeholders

The remediation effort took six full days.

Six days of cross-team coordination, late hours, manual verification, and elevated stress. The migration timeline was disrupted. Trust was strained. Risk exposure increased. The damage to reputation and team trust was far harder to repair than the actual missing data.

All because discovery had not preceded execution.

Where Data Security Posture Management Would Have Helped

A mature posture management capability would have reduced or eliminated this disruption.

1. Continuous Discovery

Automated workload inventory, sketched briefly after this subsection, would have revealed:

  • Active Microsoft Loop workspaces
  • Volume of content stored
  • User adoption patterns

Loop would have been visible as a production-critical workload rather than an unnoticed collaboration tool.
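
As a sketch of what that inventory pass can look like in practice: count activity per workload in a hypothetical audit-log CSV export (one row per record, with a Workload column) and flag anything the migration plan never mentioned. Field names will differ by tooling.

    #!/usr/bin/env python3
    """Count activity per workload in an audit-log export. CSV layout is assumed."""
    import csv
    from collections import Counter

    KNOWN = {"Exchange", "SharePoint", "OneDrive", "MicrosoftTeams"}

    with open("audit_export.csv", newline="", encoding="utf-8") as f:
        counts = Counter(row["Workload"] for row in csv.DictReader(f))

    for workload, count in counts.most_common():
        marker = "" if workload in KNOWN else "  <-- not in the migration plan"
        print(f"{count:7d}  {workload}{marker}")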

2. Data Classification and Sensitivity Mapping

If planning artefacts had been labelled according to sensitivity or business criticality:

  • High-value workspaces would have been flagged
  • Migration planning could have prioritized them
  • Data validation checklists would have included them

Classification provides a signal. Without it, all data appears equal.

3. Pre-Migration Posture Assessment

A structured posture review before migration would have asked:

  • Which workloads are actively used
  • Which contain business-critical data
  • Which services fall outside standard migration tooling

That assessment would likely have surfaced Loop usage early, while remediation was still simple.

4. Ownership and Accountability Mapping

Posture management also clarifies data ownership. If each collaboration workspace had a defined business owner:

  • Owners would have been engaged during migration validation
  • Confirmation of completeness would have occurred before cutover

Instead, ownership discovery happened after the disruption.

The Operational Lesson

Data Security Posture Management is not only about compliance and regulatory alignment. It is about operational continuity. When IT lacks visibility into:

  • Emerging SaaS workloads
  • Shadow adoption of collaboration tools
  • Data criticality distribution

Strategic initiatives such as tenant migrations become risk multipliers. Infrastructure execution without data awareness creates blind spots.

From Discovery to Remediation

In this case, remediation was manual and reactive. It consumed six painful days because the discovery occurred after the impact. A mature posture management lifecycle would follow a different sequence:

  1. Discover workloads and data locations
  2. Assess sensitivity and criticality
  3. Validate ownership
  4. Incorporate findings into migration design
  5. Execute with verified scope

Remediation then becomes exception handling, not crisis response.

Conclusion

The tenant migration did not fail technically. It failed from a posture perspective. The absence of continuous discovery and workload awareness turned a standard cloud migration into a six-day-long recovery exercise.

There is an additional lesson that is often overlooked. In a fast-moving or rapidly-growing environment, it is common for teams to adopt new collaboration tools outside formal governance workflows. Without structured discovery and verification mechanisms, these adoptions remain invisible to migration planning.

In this case, there was an implicit assumption that all production-critical planning data was known and accounted for. That assumption proved incorrect.

Verbal confirmation is not validation. IT leadership must independently verify workload usage, data locations, and service dependencies before executing high-impact changes. This means conducting technical discovery scans, usage analysis, access reviews, and controlled testing rather than relying solely on stakeholder declarations.

Data Security Posture Management formalizes that discipline. It replaces assumption with evidence. It ensures that the business teams' beliefs are technically validated before transformation begins.

Infrastructure planning without independent verification is highly risky. Continuous posture management closes that gap and converts uncertainty into measurable control.

State of IT Part 6: Balancing Innovation and Operational Stability

A quiet tension is building within most IT teams.

On one side, there is demand to innovate. Automate more. Integrate AI into workflows. Reduce headcount dependency. Move faster. Deliver more with less. On the other side, there is the unglamorous reality of keeping systems stable. Patch cycles. Identity hygiene. Backup validation. Endpoint drift. License audits. Incident response. The daily grind that nobody celebrates until it fails.

Innovation gets applause. Stability gets silence.

Yet stability is the foundation that makes innovation survivable.

The Illusion of Acceleration

We are in a time when leadership conversations are dominated by speed.

  1. How quickly can we deploy?
  2. How fast can we automate?
  3. How much AI can we embed?

The assumption is that acceleration equals progress. But acceleration without structural maturity creates fragility. If your identity architecture is inconsistent, automating access provisioning will compound those inconsistencies. If your asset inventory is incomplete, AI-driven analytics amplify blind spots. If your governance model is unclear, automation only accelerates chaos.

Innovation in this manner does not compensate for weak foundations. It exposes them.

Stability Is Not Resistance to Change

There is a misconception that teams focused on operational discipline are resistant to innovation.

This is rarely true.

The best operations teams understand a fundamental truth. Stability is not the opposite of innovation. It is the prerequisite for it.

Resilient systems allow experimentation. Documented processes allow safe iteration. Clear ownership allows confident delegation. When fundamentals are strong, innovation becomes additive. When fundamentals are weak, innovation becomes disruptive.

We need to focus on system maturity and stability before we can consider iterating on or innovating existing tools and structures.

The Cost of Ignoring the Base Layer

When innovation initiatives outpace functional stability, the symptoms appear gradually.

Small outages become recurring patterns. Security exceptions multiply. Access reviews become performative. Shadow IT grows quietly. Eventually, the organization does not suffer from a lack of innovation. It suffers from cumulative operational debt. IT then becomes reactive instead of strategic. Teams spend their time firefighting instead of designing. The irony is that the more an organization pushes for innovation without discipline, the less innovative it actually becomes.

A Practical Balance

Juggling innovation and stability does not require complex frameworks. It needs intentional sequencing.

First, define non-negotiables.

  1. Backup integrity.
  2. Identity hygiene.
  3. Patch compliance.
  4. Monitoring coverage.

These act as foundational controls.

Second, assess operational health before accelerating growth and experimentation. If your incident resolution time is unstable, automation should focus there first.
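
A quick way to make that assessment concrete, as a sketch: pull resolution times from a hypothetical incidents.csv export and look at the spread, not just the average. The threshold below is a crude rule of thumb, not a standard.

    #!/usr/bin/env python3
    """Check incident-resolution stability. File, column, and threshold assumed."""
    import csv
    import statistics

    with open("incidents.csv", newline="", encoding="utf-8") as f:
        hours = [float(row["resolution_hours"]) for row in csv.DictReader(f)]

    mean = statistics.mean(hours)
    stdev = statistics.stdev(hours) if len(hours) > 1 else 0.0
    print(f"mean resolution: {mean:.1f} h, stdev: {stdev:.1f} h")
    if stdev > mean:  # crude rule of thumb: spread larger than the mean
        print("Resolution time is unstable: aim automation at triage first.")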

Third, introduce innovation in limited domains.

  1. Pilot AI in reporting before applying it to access control.
  2. Test automation in non-critical workflows before applying it to production pipelines.

Fourth, preserve human monitoring. Automation decreases manual effort. It does not remove accountability. Innovation should feel like reinforcement, not replacement. This is where, in my humble opinion, most organizations fail.

Leadership Expectations and Reality

Many IT leaders are navigating expectations shaped by headlines rather than infrastructure realities. There is a belief that AI can replace inefficiency, that these tools can compensate for process gaps, and that digital transformation is primarily about platform adoption.

In practice, transformation is about discipline. It is about clarity in roles. It is about visibility in systems. It is about governance that scales. Technology accelerates what already exists.

Where structure exists, technology accelerates efficiency. Where disorder exists, it accelerates instability.

The Human Element

There is another dimension that is often overlooked. Operational dependability is not purely technical.

It is cultural. Teams that value documentation. Teams that respect change control. Teams that escalate early rather than conceal mistakes. These are the teams that innovate sustainably.

When people feel pressured to deliver visible innovation at the expense of quiet stability work, corners are cut. Over time, trust erodes. The strongest IT environments are not the most automated. They are the most accountable.

Accountability > Automation.

Redefining Success

Perhaps the biggest shift required is revising how success is measured.

Not only by how many AI initiatives were launched.
Not only by how many systems were modernized.
But by how many incidents were prevented.
How many risks were mitigated before they happened.
How stable the environment remained during the transformation.

Innovation that destabilizes is not progress. It is a deferred cost.

The State of IT Today

We are not short of tools. We are not short of ambition. What many organizations lack is calibrated pacing.

Balancing innovation and stability is not about slowing down. It is about strengthening the base before increasing velocity. In these uncertain times, the temptation to move fast is understandable. The discipline to move deliberately is what will separate resilient IT teams from reactive ones.

Innovation should expand capability. Stability assures that expansion does not collapse under its own weight.

State of IT Part 5: Guardrails for the Generative Era

Building a Safety Net Against Unchecked AI Tool Usage in the Workplace

AI-powered tools are becoming part of everyday work. Companies now face a challenging balance. Productivity gains and creative leaps are appealing. But risks are real. Data could leak. Compliance could become a problem. To thrive, organizations must build strong protections around AI tool use. This is not just wise. It is necessary for business trust and continuity.

The Growing Attack Surface

AI adoption does not always start with leaders. Employees want to work faster and solve new problems. They may try generative AI tools before IT teams know about them. These tools include chatbots, code helpers, and quick image creators. The number and speed of new AI tools can quickly overwhelm old security methods.

Discovery: Shedding Light on Shadow AI

The first step is to see what is happening. You cannot protect what you cannot see. Some ways to find AI use include:

  • Watch network activity for connections to well-known AI services such as OpenAI, Midjourney, or Anthropic. A minimal sketch of this check follows these lists.
  • Scan devices to list browser extensions and desktop apps that use AI.
  • Ask employees through surveys or interviews. Sometimes, a simple question reveals hidden use cases.

There are also less common but important options:

  • Study internal messages for language patterns that suggest AI-generated content. Be sure to respect privacy.
  • Audit API keys. Track which keys are created and used for outside AI services.
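
As promised, a minimal sketch of the network-activity approach: scan resolver logs for queries to a short, illustrative list of AI service domains. The log format (one queried domain per line) is hypothetical; adapt the parser to your DNS or firewall export, and expand the domain list.

    #!/usr/bin/env python3
    """Scan DNS logs for generative-AI domains. Log format and list are examples."""
    from collections import Counter

    AI_DOMAINS = ("openai.com", "anthropic.com", "midjourney.com")  # illustrative

    hits = Counter()
    with open("dns_queries.log", encoding="utf-8") as f:
        for line in f:
            domain = line.strip().lower()
            for known in AI_DOMAINS:
                if domain == known or domain.endswith("." + known):
                    hits[known] += 1

    for domain, count in hits.most_common():
        print(f"{count:6d}  {domain}")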

Monitoring and Control: Keeping AI Usage in Check

Discovery is just the beginning. The next step is to set up real oversight:

  • Use data loss prevention tools to flag or block uploads to AI services.
  • Limit who can use approved AI tools based on their job, project, or the type of data involved.
  • Create alerts for strange usage. For example, large data uploads or unusual access times.

More advanced controls include:

  • Make lists of allowed or blocked apps. Update these lists as new tools appear.
  • Use special firewalls or gateways that inspect AI traffic and enforce rules.

Blocking: When to Draw a Firm Line

Not all AI tools are safe. Some carry too much risk. To block these, try:

  • Blacklist specific websites or IP addresses to prevent devices from accessing risky AI services.
  • Blacklist certain domains and services entirely if you are unsure about the service provider’s business practices with regard to training their models.
  • Enforce browser rules that stop people from installing unapproved extensions.
  • Use mobile device management to limit AI access on both company and personal devices.

Compliance: Keeping Pace with Changing Rules

AI rules and laws change fast. Companies need to take several steps to ensure compliance and protection:

  • Map how data moves when AI tools are used. Make sure this meets privacy laws such as GDPR or CCPA.
  • Define clear request and approval procedures for the use of new AI tools.
  • Specify who can submit requests, how requests are submitted, and what information must be included in each request.
  • Check all AI vendors for strong security, privacy, and ethics.
  • Set approval criteria for new AI tools, including vendor security, cost, data handling, and ability to meet regulatory standards.
  • Identify automatic rejection criteria. For example, reject tools that cannot ensure data residency or that do not grant proper intellectual property ownership. This applies to tools that do not integrate with your Identity Provider as well.
  • Keep records of AI use and any exceptions to the rules.
  • Require enterprise-level review for significant decisions, such as tools that impact budgets, require integration with sensitive systems, or could create legal exposure.
  • Consider risks related to budget overruns, unclear ownership of created content, and the difference between code generation and art generation.
  • Ensure use cases align with the company’s strategy and legal requirements.

The Foundation: A Strong AI Usage Policy

Before encouraging AI-driven creativity, set clear rules. A good AI policy should cover practical steps for control and decision-making.

  • What types of AI tool use are allowed or banned?
  • How to handle data at every stage, especially if it is sensitive or regulated.
  • Training for employees on risks and safe habits.
  • Steps for reporting problems or responding to AI misuse.
  • Define who reviews and approves requests for new AI tools. Document criteria for approval, including compliance, costs, and potential risks.
  • List automatic rejection triggers, such as lack of data protection or IP ownership.
  • Require periodic policy and tool reviews at the enterprise level.
  • Address budget risks, ownership of generated intellectual property, and the distinction between code and creative content.

Creative Environments: Fostering Innovation with Boundaries

Creative teams need room to try new things. But they also need limits.

Consider:

  • Setting up sandboxes so AI experiments do not touch real business data.
  • Introduce a new pipeline: i.e., Sandbox, Build, Dev-Test, Pre-Production, Production, and Live. Keep the Sandbox stage similar to the Dev-Test stage, but fully contained.
  • Giving trusted users more access while keeping checks in place.
  • Regularly reviewing both AI tools and the policy as technology changes.

Conclusion

AI tools can change organizations for the better. Without solid safeguards, they can also cause harm. The winners in the generative era will be the IT teams that combine smart discovery, careful monitoring, strong controls, and a clear policy. This approach allows for creativity while keeping risks low.