Mahmoud Khalifa — Infrastructure × Intelligence

/01

Signal

Storage and SAN is where I am most at home. I have run SRDF cutovers inside live change windows, designed dual-site PowerMax and VPLEX Metro, and swapped SAN directors while replication kept running underneath. I have moved whole datacenters for banks, utilities, and government bodies without taking the applications offline. The data and AI work I do now runs on that same foundation. The layer is new. The discipline is not.

Role: Infrastructure & Storage Solutions Architect · Data & AI Engineer

Focus: Enterprise storage, SAN, migrations, DR & business continuity

Delivered: 200+ projects · 100+ accounts · 20+ countries

Years: 22 (Fujitsu field engineering, Dell EMC SA for 7 years, platform engineering)

Certified: EMCTAe (full ladder, 7 KM cycles) · 3× AWS Associate · VCE Vblock · Red Hat

Reach: North America · 20+ named GCC & MENA accounts

Languages: English · Arabic · Python

Lab: 80+ vendor API simulators · OpenShift/OKD · AWS CloudFormation

01 — Read the error before you escalate. The answer is usually in the output, not the vendor's queue.

02 — "It runs" and "it's correct" are two different claims. I only ship the second one.

03 — A 30-second guard check beats an unrecoverable fabric at 2am. Every time.

/03

Trajectory

Phase 01 · Foundation

Fujitsu — Field Systems Engineer

Unix and Linux, Oracle RAC, Fibre Channel SAN, backup automation on SPARC and Solaris. The job that taught me the most was a root-disk crash on a live Symbios controller in the field. I fixed it by reading the hex the driver was throwing, not the manual. That is still how I work.

Phase 02 · The Long Build

EMC → Dell EMC — Implementation to Architecture

I started in field implementation and spent seven years as a Senior Solutions Architect after that, across more than 20 accounts in the GCC and MENA. VMAX and PowerMax, VPLEX Metro, SRDF, Brocade and Cisco fabrics. I hold the top EMC architect credential and kept it current through seven Enginuity releases, which is rarer than holding it once. One bank kept coming back for years; the latest job there moved 190-plus SRDF groups and over 1,000 zones off Cisco onto a new PowerMax 8500 build.

Phase 03 · The Crossing

Texas — Building the Watcher

VSI is a Python telemetry platform that took storage onboarding from weeks down to hours. I built the collection daemon that handles 100-plus device types, the 50-plus parser and reporter modules behind it, 80-plus vendor API simulators to test against, and a replication compliance framework spanning nine vendors.

Phase 04 · Now

AI, HPC & the Hybrid Cloud

Cloud observability across accounts, in production. Steady study of AI and ML pointed at HPC and GPU infrastructure. And real engineering work done with AI, kept honest by a verification step it does not get to skip.

Phase 05 · Next

Infrastructure That Reasons

Infrastructure that watches itself, says plainly what is wrong, and plans its own capacity and migrations before anyone files a ticket. That is the part I want to build next.

/04

Transmission

A fresh opinion, written live. You will not get the same one twice.

Press a button and a live model writes one contrarian take on infrastructure. Either a clean opinion, or one built on something real from today's news. Nothing cached, nothing pre-written.

/05

Selected Engagements

Real projects, with the names removed. The numbers are the actual ones from each job, not rounded up for effect.

GCC Bank · multi-year · recurring

A Bank's Storage, Across Many Projects

This was not one planned program. It was the same bank coming back, project after project, over several years. First an HP 3PAR to VMAX 250FX migration with no downtime, done over SRDF. Then a VMAX 40K re-platform across more than 30 application clusters. Then dual-site DR on VMAX3. The latest was the biggest: a PowerMax 8500 dual-site build and a Gen-7 Brocade director refresh that took the fabric off Cisco entirely, moving 190+ SRDF groups and over 1,000 zones across. A bash suite I wrote drove every cutover, throttling SRDF bandwidth so business hours stayed clean. They kept calling because the earlier work held.

190+ SRDF groups · 1,000+ zones MDS→DCX · 4 DCX7-8B directors
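The throttling idea above, holding replication bandwidth down during business hours and opening it up afterwards, can be sketched in a few lines. This is an illustration only, not the cutover suite itself: the original was bash, and the window times, cap values, and the function name `replication_cap_mbps` are all hypothetical. It deliberately issues no SRDF commands and only shows the scheduling decision.

```python
from datetime import time

# All four values below are hypothetical placeholders, not figures from the project.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)
BUSINESS_CAP_MBPS = 200    # tight cap while users are on the systems
OFF_HOURS_CAP_MBPS = 1000  # wider cap overnight


def replication_cap_mbps(now: time) -> int:
    """Return the bandwidth cap that applies at the given local time."""
    if BUSINESS_START <= now < BUSINESS_END:
        return BUSINESS_CAP_MBPS
    return OFF_HOURS_CAP_MBPS


if __name__ == "__main__":
    for sample in (time(2, 0), time(9, 30), time(18, 0)):
        print(sample.strftime("%H:%M"), replication_cap_mbps(sample))
```

A wrapper script would call something like this before each copy step and pass the result to whatever rate control the replication tooling exposes.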

Government Utility · 100+ servers

VMAX → All-Flash via Live VPLEX Metro

Every SAN-attached workload at a government utility (Oracle, VMware, and bare-metal servers alike) moved off a legacy VMAX onto all-flash with nothing taken offline. The path ran through a live VPLEX Metro stretched cluster. I wrote a dozen-plus per-host migration runbooks, re-zoned two Brocade DCX 8510 fabrics, and hit a VPLEX meta-volume problem mid-migration that needed a vendor escalation to clear. It cleared.

Dual VPLEX clusters · 5-version migration plan · zero data loss

Government Healthcare · ~2,700 zones

Datacenter Consolidation to VxBlock

A government healthcare authority wanted its floor simplified. IBM Storwize and NetApp N-series went into VxBlock running VMAX 250F and Unity 400. The zone count is what made it real: roughly 2,700 active zones across two Cisco MDS fabrics. I wrote 18-plus migration bundles, each with a rollback section, because in a hospital you do not get to wing the recovery step.

VxBlock target · dual MDS fabrics · 18+ runbooks

Financial Services · SAP · SRDF/e

Extended-Distance DR for SAP on AIX

A financial-services customer ran SAP on AIX and needed synchronous DR at a distance that normally rules synchronous replication out. SRDF/e on a fresh VMAX3 was the answer. I did not want to take the design on faith, so the test plan captured TimeFinder snapshots every four hours for six days, 80-plus in all, proving the replication state held while the workload ran hot.

34 masking views · 80+ TTP snapshots · ~4 TB largest SG

Field · Fujitsu era · live debug

Cluster Install Failure, Solved in the Field

Early in my career, a cluster install on SCO UnixWare hung on root-disk encapsulation and just scrolled abort messages until it timed out. The Symbios controller was throwing them faster than anyone wanted to read. I stopped reaching for the install guide and started reading the actual driver output. It was a driver conflict; the patch that fixed it broke the cluster software, so the real fix was a timeout value nobody had tuned. That habit, read what the machine is telling you, never left.

read the hex · not the manual · root cause

AI Engineering · $2–5 / ticket

Cerebro — An Agentic Delivery Pipeline

Cerebro takes a work ticket and carries it all the way to tested code, through a chain of specialized AI agents: research, design, implement, review, test. I built it to take the manual grind out of vendor API integrations across a fleet of 70-plus collectors. Every phase runs under its own tool permissions and a $15 spend cap, the whole pipeline stops at $100, and after three strikes it hands back to a human. Two human approval gates, by design. The quality gates are not optional either: compliance checks, a clean compile, and an automated code review all have to pass before anything ships. And the research phase has to audit the vendor's real GitHub before a line of code gets written, which is what stops the agents from inventing API paths that do not exist. A ticket runs about $2 to $5 and takes 15 to 25 minutes.

70+ collector fleet · $100 budget ceiling · audit-before-code
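The spending and strike limits described above can be pictured as a small guard object that every agent call has to pass through. What follows is a minimal sketch, not Cerebro's actual code: the class and method names are hypothetical, and it assumes the $15 cap applies to each run of a phase while the $100 ceiling and the three strikes are counted across the whole ticket. Only the three numbers come from the description.

```python
from dataclasses import dataclass

PHASE_CAP_USD = 15.0      # per-phase spend cap (from the text)
PIPELINE_CAP_USD = 100.0  # whole-pipeline ceiling (from the text)
MAX_STRIKES = 3           # failures before a human takes over (from the text)


class BudgetExceeded(Exception):
    """A spend cap would be crossed, so the pipeline stops."""


class HandOffToHuman(Exception):
    """Too many failures; a person takes the ticket back."""


@dataclass
class PipelineGuard:
    # Hypothetical bookkeeping; the real pipeline's internals are not described.
    total_spent: float = 0.0
    phase_spent: float = 0.0
    strikes: int = 0

    def start_phase(self) -> None:
        """Reset the per-phase meter (assumption: the cap is per phase run)."""
        self.phase_spent = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a cost only if both caps still hold afterwards."""
        if self.phase_spent + cost_usd > PHASE_CAP_USD:
            raise BudgetExceeded("phase cap reached")
        if self.total_spent + cost_usd > PIPELINE_CAP_USD:
            raise BudgetExceeded("pipeline ceiling reached")
        self.phase_spent += cost_usd
        self.total_spent += cost_usd

    def record_failure(self) -> None:
        """Count a strike and hand off on the third one."""
        self.strikes += 1
        if self.strikes >= MAX_STRIKES:
            raise HandOffToHuman("three strikes")


if __name__ == "__main__":
    guard = PipelineGuard()
    guard.start_phase()
    guard.charge(3.25)  # e.g. one research call
    print(f"spent so far: ${guard.total_spent:.2f}")
```

Checking the cap before recording the cost means a refused charge leaves the totals untouched, so the caller can stop cleanly and report exactly what was spent.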

/06

The Twin

Live AI twin · my career as its memory · live web access

A LIVE AI MODEL, BRIEFED ON MY CAREER · ANSWERS FROM A DEFINED KNOWLEDGE BASE

/07

Credentials

Education

Master of Business Administration (MBA) · University of the People, USA · 2019–2022
MicroMasters, Supply Chain Management · MIT · edX
Bachelor of Information Technology · Sadat Academy, Egypt · Very Good with Honours

Cloud & AI

AWS Certified Solutions Architect — Associate · AWS
AWS Certified SysOps Administrator — Associate · AWS
AWS Certified Developer — Associate · AWS
OCI AI Foundations Associate · Oracle Cloud
OCI Foundations Associate · Oracle Cloud
Dell GenAI Foundations · Dell

Storage, Infrastructure & Service

EMC Technology Architect Expert (Symmetrix) · EMCTAe · top tier
EMC Implementation Engineer Expert (Symmetrix) · EMCIE
VCE Certified Professional — Associate · converged infrastructure
Red Hat Certified System Administrator · RHCSA
Fujitsu Certified Expert Systems Engineer · Fujitsu
ITIL Foundation · service management

EMC ladder maintained across 7 Knowledge Maintenance cycles · Enginuity v5.1–v7.5