Writings

State of IT Part 4: Operational Resilience in an Era of AI, Automation & Expectations

As we move into 2026, the digital landscape is rapidly evolving, driven by the widespread adoption of artificial intelligence. If recent years have taught us anything, it’s that adapting to transformative technology is no longer optional but vital. In Part 1, we discussed efficiency in lean times; in Part 2, we explored our colleagues’ workflows; and in Part 3, we examined how to elevate our security posture. Now, we must focus on how to evolve our operations by embracing AI and automation while ensuring business continuity amid constant disruption.

The New Operational Mandate

Today’s IT teams are more than support functions – they are the foundation upon which business outcomes depend. With the integration of AI and automation into daily operations, expectations have shifted: uptime is table stakes, efficiency is assumed, and innovation is demanded, often without increased resources. The ability to harness AI for predictive analytics, intelligent automation, and adaptive decision-making is now critical to resilience and business continuity.

Even small to mid-size organizations now face complexity that rivals that of large enterprises a decade ago. The question is no longer whether you are resilient – it’s how resilient you are, by design.

What Business Continuity Actually Means

Resilience goes beyond backups and firewalls. It’s about:

  1. System Continuity: Can critical services continue (or fail gracefully) during outages?
  2. Predictable Recovery: Do you know how long it will take to restore core functions when something goes wrong?
  3. Adaptive Capacity: Can your team learn from disruption and fortify weak points upstream?
  4. User Experience Stability: Are end users (internal and external) insulated from volatility as much as possible?

Resilience is architecture, process, and culture working together. In other words, it is our job to make sure the wheel keeps spinning.

Four Pillars of 2026 Business Continuity

1) AI-Assisted Observability and Response

AI and ML-based monitoring tools are no longer “nice to have”. They are essential for business continuity.

They enable us to:

  • Deliver results with a smaller, tighter team.
  • Spot abnormal patterns before they become outages.
  • Predict degradation based on trend data.
  • Automate initial diagnostics so human responders can concentrate on resolution.

However, automation without context can create noise. Ensure your AI tools are configured to minimize false positives and that human experts validate critical alerts. The synergy between AI-driven insights and human judgment strengthens operational reliability.
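
As a rough illustration of "spotting abnormal patterns", here is a minimal sketch that flags a sample when it deviates sharply from the readings just before it (a rolling z-score). It is a simple statistical stand-in, not how any particular AI monitoring product works. The function name, window size, threshold, and latency figures are all assumptions made up for this example.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples that deviate sharply from the preceding window."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as abnormal.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
latency_ms = [100, 102, 99, 101, 100, 103, 98, 250, 101, 100]
print(find_anomalies(latency_ms))  # [7]
```

Raising the threshold cuts false positives at the cost of missing smaller deviations, which is exactly the tuning trade-off described above. A human should still decide what a flagged sample actually means.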

2) Fail-Safe Automation

Many organizations have begun to discover that careless automation can lead to fragility. AI-powered automation should not just execute tasks; it should also detect and respond to its own failures.

Build checks such as:

  • End-to-end validation after each automated workflow
  • Rollback triggers when thresholds are crossed
  • Simulation/testing environments that mirror production

In 2026, automation without error containment is a recipe for compound outages (self-inflicted disasters, in effect).
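
The checks above can be sketched in a few lines. This is a minimal example, assuming each automated step comes paired with a function that undoes it and that a single end-to-end check runs after the workflow. The names `run_with_rollback`, `set_config`, `restart`, and the `state` dictionary are hypothetical and stand in for real systems.

```python
def run_with_rollback(steps, validate):
    """Run (apply, undo) pairs; undo completed steps in reverse order on failure."""
    completed = []
    try:
        for apply, undo in steps:
            apply()
            # Only steps that finished are registered for rollback.
            completed.append(undo)
        if not validate():
            raise RuntimeError("end-to-end validation failed")
    except Exception as exc:
        for undo in reversed(completed):
            undo()
        return f"rolled back: {exc}"
    return "ok"


# Hypothetical in-memory "environment" standing in for real infrastructure.
state = {"config": "v1", "service": "running"}

def set_config(): state["config"] = "v2"
def restore_config(): state["config"] = "v1"
def restart(): state["service"] = "restarted"
def undo_restart(): state["service"] = "running"

result = run_with_rollback(
    [(set_config, restore_config), (restart, undo_restart)],
    validate=lambda: False,  # simulate a failed health check after the change
)
print(result, state)
```

A step that fails halfway through is not undone here, because it never registered its undo function. Real workflows usually need each step to clean up after its own partial failure.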

3) Decentralized Redundancy

Gone are the days when a single cloud region, a single identity provider, or a single primary data store was sufficient.

Business continuity (BC) and resilience planning in 2026 includes:

  • Multi-region deployments and backups
  • Identity and access alternatives (gateways, multi-auth setups)
  • Cross-provider failover plans

These strategies don’t always require sophisticated AI. Sometimes, even basic, well-documented failover playbooks paired with intelligent automation can significantly increase resilience.
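
A basic failover playbook can be as small as an ordered list of targets and a health check. The sketch below walks the list in priority order and returns the first target that passes. The target names and the `status` lookup are invented for illustration; a real check would probe an endpoint or query a provider's API.

```python
def first_healthy(targets, is_healthy):
    """Return the first target, in priority order, that passes the health check."""
    for target in targets:
        try:
            if is_healthy(target):
                return target
        except Exception:
            # A check that errors out counts as unhealthy; move to the next target.
            continue
    return None


# Hypothetical playbook order: primary region, second region, then another provider.
playbook = ["primary-region", "secondary-region", "other-provider"]
status = {"primary-region": False, "secondary-region": True, "other-provider": True}

print(first_healthy(playbook, lambda name: status[name]))  # secondary-region
```

Returning `None` when nothing is healthy is deliberate: that is the point where the documented, human-run part of the playbook takes over.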

4) Human-Centric Resilience Training

While technology is crucial, people remain at the heart of resilience, even in an AI-driven environment. I wish more organizations would realize this sooner rather than later. Regular resilience drills, such as tabletop exercises where teams simulate incidents, help identify weaknesses that automated systems might miss.

Training should include:

  • Incident command roles
  • Communication rules
  • After-action reviews

A culture that normalizes incident simulation is far better prepared than one that treats outages as rare catastrophes. In other words, prepare to act on constant, smaller threats rather than waiting for a rare, large-scale attack to surface.

The Invisible Imperative: Expectations Management

Resilience isn’t simply technical; it’s communicative. Too often, IT teams build great systems but fail to meet stakeholder expectations. This leads to eleventh-hour crisis mode when SLAs aren’t met, even if a system is technically sound.

You don’t need to promise perfection, but you do need to promise clarity on:

  • What IT can guarantee
  • What IT can reasonably aim for
  • What happens when assumptions break

Clarity creates calm, for the most part.

A Simple Starting Point

If you’re unsure where to begin in your organization, start with a single checklist:

  1. What are the top three services we cannot afford to lose?
    Document dependencies and failure modes for each.
  2. Do we have automated monitoring with useful alerts?
    If not, enable it. Even basic uptime and threshold alerts are a start.
  3. Have we practiced a recovery scenario in the last 90 to 120 days?
    If not, schedule one.
  4. Can non-technical teammates explain how to get help during an outage?
    If not, craft and distribute a simple internal guide.

These steps don’t require budget authorizations, only intention.
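
If it helps to keep the checklist somewhere more durable than a document, the first three questions can be tracked as data. This is a minimal sketch: the `Service` fields, the 120-day limit, and the example service are assumptions for illustration. The fourth question, whether non-technical teammates know how to get help, is left out because it is not something a script can verify.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Service:
    name: str
    dependencies: List[str] = field(default_factory=list)
    has_alerting: bool = False
    last_recovery_drill: Optional[date] = None


def checklist_gaps(service, today, max_drill_age_days=120):
    """Return the checklist items this service does not yet satisfy."""
    gaps = []
    if not service.dependencies:
        gaps.append("document dependencies and failure modes")
    if not service.has_alerting:
        gaps.append("enable basic uptime and threshold alerts")
    drill = service.last_recovery_drill
    if drill is None or (today - drill).days > max_drill_age_days:
        gaps.append("schedule a recovery drill")
    return gaps


# Hypothetical entry for one of the "top three" services.
email = Service("email", dependencies=["dns", "identity provider"],
                has_alerting=True, last_recovery_drill=date(2025, 6, 1))
print(checklist_gaps(email, today=date(2026, 1, 15)))  # ['schedule a recovery drill']
```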

Conclusion

In 2026, the State of IT is about more than keeping systems running; it’s about building adaptive, AI-enabled systems that continue to deliver business value even during disruption. Resilience is not the absence of risk; it’s the presence of preparation, adaptability, and the ability to leverage emerging technologies.

As we continue this series, I’ll be exploring specific architectural patterns, real infrastructure setup projects, and stories from teams that built IT infrastructure the hard way: with tight timelines, stringent budgets, and a lean team.

Stay curious, stay adaptive, and use any tool, including AI, to build resilience and stability. Proper automation helps small IT teams be more efficient despite the challenges. Just remember to be diligent with your configurations and human oversight.

State of IT Part 3: Navigating the Threat Landscape

Practical Security Steps for the Modern IT Administrator

As our digital ecosystem evolves, so do the tactics of malicious actors. Cybersecurity is now a fundamental part of every IT administrator’s role, not just a specialized concern for security teams. In this third installment of the State of IT series, we delve into the growing threats targeting both large enterprises and smaller environments, providing effective steps that even novice IT administrators can implement to enhance their security posture.

The Expanding Threat Landscape

Today’s threats are increasingly sophisticated, ranging from ransomware-as-a-service to phishing kits, supply chain attacks, and deepfake-driven social engineering. High-profile breaches may make the news, but many attacks succeed due to a lack of basic security practices.

The misconception that only large organizations are at risk is fading. Small businesses, remote work configurations, and poorly managed environments are increasingly vulnerable to attacks. In this landscape, even the simplest IT practices can offer substantial protection.

Five Simple but Effective Steps Every IT Admin Should Take

  1. Establish a Baseline Security Policy
    A basic security policy, even if it’s just one page, can define acceptable practices for your organization. Include requirements such as:
    • Mandatory use of strong, unique passwords.
    • Locking the screen after a period of inactivity.
    • Prohibiting certain software or plug-ins.
    Tools like Microsoft Intune, Google Workspace Admin Console, or open-source alternatives like Wazuh can help enforce these policies.
  2. Use Multi-Factor Authentication (MFA) Everywhere
    Credentials remain the primary target for attackers. Enabling MFA on all critical accounts and systems, such as email and admin dashboards, adds a vital layer of security. For smaller teams, services like Authy, Microsoft Authenticator, or Google Authenticator are straightforward to implement and train for.
  3. Harden the Network Perimeter
    Even without a dedicated security appliance, you can:
    • Disable unused ports.
    • Change default router credentials.
    • Segregate guest Wi-Fi from internal networks.
    • Use DNS filtering (e.g., Quad9, NextDNS, Cloudflare for Teams) to block known malicious domains.
    If your network includes a firewall such as Fortigate, pfSense, or OPNsense, ensure logging is enabled and alerts configured for suspicious activities.
  4. Secure Endpoint Devices
    While EDR tools may not be feasible for all organizations, you still have options:
    • Uninstall unnecessary software.
    • Set devices to auto-lock after inactivity.
    • Disable USB autorun.
    • Use free or open-source tools like Malwarebytes, ClamAV, or OSQuery for regular endpoint scans and monitoring.
    Encourage regular updates for systems and software. Automating updates with tools such as Patch My PC can alleviate some of the burden.
  5. Prepare for the Worst – Backups and Incident Response
    Security isn’t solely about prevention; it’s also about recovery. Be sure that:
    • At least one automated, offline backup exists.
    • Admins know who to contact during an attack.
    • A simple “what to do if compromised” flowchart is available (even in print form).
    Open-source solutions like Duplicati or Restic, as well as platforms like Backblaze or Wasabi, can provide cost-effective and reliable backup options.
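
For the backup step, one cheap safeguard is a script that complains when the newest backup file is older than expected. The sketch below assumes backups land as files in a single directory; the function names and the 24-hour default are illustrative and not tied to Duplicati, Restic, or any other tool mentioned here.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Return the age in hours of the most recently modified file, or None if there are none."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600


def backup_is_fresh(backup_dir, max_age_hours=24, now=None):
    """True only if at least one backup file exists and it is recent enough."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is not None and age <= max_age_hours
```

A recent file only shows that the backup job ran. It does not prove the backup can be restored, and it cannot see a truly offline copy, so periodic restore tests are still needed.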

Bonus Tip: Create a Culture, Not Just Controls
Regardless of how advanced your tools are, human error remains a significant vulnerability. Foster a culture of security awareness by:

  • Sharing quick security tips in team communications.
  • Explaining the rationale behind specific security measures.
  • Recognizing and rewarding secure behavior, particularly among non-technical staff.

Make security awareness engaging and relevant. Gamify the learning experience with “phishing simulations” using tools like GoPhish, and discuss actual incidents during team meetings.

Conclusion
As an IT administrator, you are tasked not only with resolving issues as they arise but also with preventing them from occurring in the first place. While security can seem overwhelming, it doesn’t have to be. By taking small, consistent steps towards fortifying your environment, you lay the foundation for long-term resilience.

Although we may not control external threats, we can manage our preparedness from within. Whether you are beginning your IT journey or leading a small team with limited resources, consistency, awareness, and a proactive mindset are crucial.

Stay vigilant, stay resilient and continue to build systems that are worth protecting.

State of IT Part 2: Understanding Organizational Workflows

Alleviating common workflow blockers for colleagues and teams to increase overall process efficiency.

As we venture deeper into our series on enhancing operational efficiency within IT and system administration teams, it’s crucial to extend our focus beyond mere technology. To effectively support our colleagues in different departments, we must develop a thorough understanding of their day-to-day operations and the common challenges they encounter. This level of insight allows us to identify minor yet significant blockers, those small frustrations that can hinder productivity and morale. In addressing these challenges, we foster a collaborative environment that strengthens our organization as a whole.

Understanding the Workflow

Each department has its unique workflows and requirements, and by engaging with our peers, IT can gain valuable insight into their processes. This involves not only listening to their concerns but also observing how they interact with the tools and resources available to them. What often becomes clear is that it’s the minor issues, rather than major technical failures, that frequently impede progress.

Common Blockers and Solutions

  1. Formatting Requirements for Documents: Many teams regularly create reports or presentations that can be time-consuming to format. By creating and distributing standardized templates for common documents, IT can save teams valuable hours. For instance, designing a template for weekly status reports can help streamline the process and ensure consistency.
  2. Translation Needs for Recruitment Documents: In an increasingly diverse workforce, the need for translated materials can be a real challenge, especially in recruitment. IT can assist HR by facilitating the use of translation tools or integrating platforms that allow for quick and easy translations of key documents without disrupting workflow.
  3. Data Tabulation and Visualization in Excel/Word: Teams often find themselves spending excessive time organizing and visualizing data, which can detract from their core responsibilities. Providing training on Excel’s more advanced features or offering automated tools for data analysis can significantly enhance efficiency. Additionally, creating a library of pre-built macros for common tabulations can empower teams to handle data more effectively.
  4. Simplifying Approval Processes: Many departments encounter delays due to cumbersome approval workflows. IT can work collaboratively with these teams to streamline approval processes by leveraging digital signatures, automating notifications, or implementing workflow management tools that keep everyone informed and accountable.
  5. Improving Communication Channels: Oftentimes, miscommunication or lack of clarity around requests can lead to delays. By standardizing communication protocols or investing in project management platforms like Trello or Asana, IT can help ensure that tasks are clearly assigned and tracked, alleviating confusion and enhancing accountability.

Fostering Collaboration and Continuous Improvement

While addressing minor blockers may feel like a patchwork approach, these small improvements can have a substantial cumulative effect on overall productivity. When IT teams take the initiative to identify and resolve minor challenges, they demonstrate their commitment to facilitating smoother operations across the organization.

Moreover, by actively involving ourselves in the daily activities of other teams, we naturally cultivate a culture of collaboration. Regularly scheduled check-ins or workshops with different departments can open lines of communication and be instrumental in continuous improvement efforts.

Conclusion

In this second installment, we have explored the importance of fully understanding the daily workings of other teams within our organizations. IT can significantly enhance departmental productivity and collaboration by identifying common minor blockers and implementing targeted solutions. Our goal is not just to empower the IT department but to create a robust support system that augments the work of all teams. Small improvements can earn major appreciation from the teams we support, and setting up robust support structures is key to increasing the value of the IT team in any organization.

State of IT Part 1: Navigating the Leaning Phase

How IT teams can drive efficiency and increase their value at a time when companies are undergoing a thinning phase.

When I first sat down to write this piece, my intention was to highlight key areas that demand the attention of tech support teams and administrators. However, as I delved deeper into the topic, it became clear that the current landscape of the tech industry is complex and multifaceted. This post will explore these intricacies, and future installments in this series will provide practical strategies for IT to help our organizations adapt and thrive.

The Current Landscape

The years 2022-2025 have seen stagnation in major tech releases and innovations, with the exception of AI advancements, which have attracted a massive surge in investment. The industry is now grappling with alarming news of substantial layoffs at prestigious companies such as Microsoft, Google, and Amazon. Additionally, many multinational corporations are trimming their workforce, often eliminating entire departments they consider non-essential. Several ongoing projects are facing cuts, particularly those that are deemed over-budget.

In the face of these challenges, where do we, as infrastructure owners, architects, system administrators, failover experts, and infrastructure support specialists, fit in? It appears that companies are undergoing a ‘leaning’ phase—a period where efficiency and productivity are prioritized. We in IT are uniquely positioned to help in this transition. Many of us have already embraced the philosophy of ‘doing more with less,’ adept at finding creative solutions despite budget constraints and supporting ever-changing needs while maximizing the use of existing infrastructure.

How Can IT Help?

As we navigate this ‘leaning’ phase, it’s critical to consider how we can assist our organizations in enhancing their operational efficiency. Here are several steps IT teams can take to streamline processes for other departments:

  1. Evaluate Existing Tools and Services: Begin by taking stock of the various tools and services your organization utilizes across departments. Identify which are essential, which can be combined, and which may be redundant.
  2. Identify Automation Opportunities: Look for processes that could be improved or automated. Automation can significantly reduce manual, time-consuming tasks, thus allowing teams to focus on higher-value work.
  3. Reduce Time Spent on Tool Interactions: Consider how the IT team can optimize workflows to reduce the time other departments spend interacting with their tools. This might involve simplifying user interfaces, providing additional training, or implementing more intuitive systems.
  4. Leverage In-house Capabilities: Explore what IT can do in-house to prevent unnecessary outsourcing. By developing internal solutions, we can help keep costs down and maintain greater control over processes.

By focusing on these areas, IT can uncover inefficiencies and streamline operations in ways that might not have previously been considered. Some administrators might hesitate, believing that teaching other teams to resolve minor issues independently could diminish the perceived value of the IT department. However, this is a misconception. Empowering colleagues with knowledge can foster collaboration and enhance the overall efficiency of the organization.

A good starting point for this reevaluation is to consider key departments such as Finance, HR, and Administration. These areas often involve complex processes, and the teams running them may lack technical expertise. By working closely with these departments to streamline and automate their tools and workflows, we can save both time and resources. This proactive approach not only integrates IT more deeply into the company’s operations but also showcases our ability to create meaningful impact.

From Cost Center to Force Multiplier

In a thinning phase, perception matters as much as performance.

IT teams that remain positioned as reactive support functions will be evaluated as exactly that: a reactive team, an overhead to the organization. IT teams that position themselves as operational intelligence partners become indispensable.

Driving efficiency is not only about reducing tool sprawl or automating workflows. It is about translating infrastructure decisions into measurable business outcomes.

This means:

  1. Mapping infrastructure spend to business capability
  2. Quantifying the cost of downtime in revenue terms
  3. Highlighting risk exposure in financial language
  4. Providing visibility into consumption trends and the available mitigating factors (this is more important than many realize)
  5. Forecasting capacity needs before they become budget crises
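
To make "the cost of downtime in revenue terms" concrete, here is a deliberately simple back-of-the-envelope sketch. It assumes revenue is spread evenly across the year, which is rarely true, and it ignores recovery labor, SLA penalties, and reputational damage. The function name, parameters, and figures are hypothetical.

```python
def downtime_cost(annual_revenue, outage_minutes, revenue_share_affected=1.0):
    """Estimate revenue lost to an outage, assuming revenue accrues evenly all year."""
    minutes_per_year = 365 * 24 * 60  # 525,600
    revenue_per_minute = annual_revenue / minutes_per_year
    return revenue_per_minute * outage_minutes * revenue_share_affected


# Hypothetical: 5,256,000 in annual revenue, a 90-minute outage,
# and half of revenue depending on the affected service.
print(downtime_cost(5_256_000, 90, revenue_share_affected=0.5))  # 450.0
```

Even a rough number like this gives stakeholders something to weigh against the cost of redundancy, which is the point of speaking in financial language.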

Conclusion

This article is the beginning of a series that will delve deeper into the ways IT can support and enhance departmental functions within our organizations. In future installments, we will explore specific strategies, tools, and case studies that highlight successful implementations of these ideas. By collectively focusing on improvement and efficiency, we can elevate our teams and ultimately contribute to our organization’s success in these challenging times. Stay tuned for the next part, where we will discuss practical examples of automation and process improvement.

Welcome!


I have spent the last fifteen years working in the video game industry. Most of that time was focused on building and running infrastructure that had to work under real pressure. I have managed large-scale systems, handled more site migrations than I would like to remember, and made decisions around strategy, scale, and budgets when there was no perfect answer.

Alongside the technical work, I have always been writing. Over the years I have written dozens of short stories, usually in the margins between projects, outages, and late nights. Writing has been a constant, even when the work changed.

This space is where those worlds meet. I will be sharing what I have learned from the industry, my thoughts on technology, my take on the movies and shows that affect me, and the stories I am working on.

If any of that sounds interesting, I am glad you are here.