I have spent twenty-two years on the storage and SAN layer of large enterprises. Migrations that could not take the application down. Dual-site DR for banks. SAN director swaps with replication running live through the cutover. That is the work. Lately I have been pointing the same instinct at data and AI engineering. There is a live AI version of me on this page if you would rather ask than read.
Storage and SAN is where I am most at home. I have run SRDF cutovers inside live change windows, designed dual-site PowerMax and VPLEX Metro, and swapped SAN directors while replication kept running underneath. I have moved whole datacenters for banks, utilities, and government bodies without taking the applications offline. The data and AI work I do now runs on that same foundation. The layer is new. The discipline is not.
01 — Read the error before you escalate. The answer is usually in the output, not the vendor's queue.
02 — "It runs" and "it's correct" are two different claims. I only ship the second one.
03 — A 30-second guard check beats an unrecoverable fabric at 2am. Every time.
Unix and Linux, Oracle RAC, Fibre Channel SAN, backup automation on SPARC and Solaris. The job that taught me the most was a root-disk crash on a live Symbios controller in the field. I fixed it by reading the hex the driver was throwing, not the manual. That is still how I work.
I started in field implementation and spent seven years as a Senior Solutions Architect after that, across more than 20 accounts in the GCC and MENA. VMAX and PowerMax, VPLEX Metro, SRDF, Brocade and Cisco fabrics. I hold the top EMC architect credential and kept it current through seven Enginuity releases, which is rarer than holding it once. One bank kept coming back for years; the latest job there moved 190-plus SRDF groups onto a new PowerMax 8500 build and over 1,000 zones off Cisco onto Brocade directors.
VSI is a Python telemetry platform that took storage onboarding from weeks down to hours. I built the collection daemon that handles 100-plus device types, the 50-plus parser and reporter modules behind it, 80-plus vendor API simulators to test against, and a replication compliance framework spanning nine vendors.
Cloud observability across accounts, in production. Steady study of AI and ML pointed at HPC and GPU infrastructure. And real engineering work done with AI, kept honest by a verification step it does not get to skip.
Infrastructure that watches itself, says plainly what is wrong, and plans its own capacity and migrations before anyone files a ticket. That is the part I want to build next.
Real projects, with the names removed. The numbers are the actual ones from each job, not rounded up for effect.
This was not one planned program. It was the same bank coming back, project after project, over several years. First an HP 3PAR to VMAX 250FX migration with no downtime, done over SRDF. Then a VMAX 40K re-platform across more than 30 application clusters. Then dual-site DR on VMAX3. The latest was the biggest: a PowerMax 8500 dual-site build and a Gen-7 Brocade director refresh that took the fabric off Cisco entirely, moving 190-plus SRDF groups and over 1,000 zones across. A bash suite I wrote drove every cutover, throttling SRDF bandwidth so business hours stayed clean. They kept calling because the earlier work held.
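The throttling idea can be sketched in a few lines. This is a Python illustration, not the bash suite itself: the business-hours window, the two percentages, and the function name `copy_limit_pct` are all assumptions made for the example, and it deliberately calls no storage CLI.

```python
from datetime import datetime, time

# All four values below are hypothetical, chosen only for illustration.
BUSINESS_START = time(8, 0)    # assumed start of the protected window
BUSINESS_END = time(18, 0)     # assumed end of the protected window
BUSINESS_LIMIT_PCT = 20        # assumed replication ceiling during business hours
OFF_HOURS_LIMIT_PCT = 100      # assumed ceiling outside business hours


def copy_limit_pct(now: datetime) -> int:
    """Return the replication bandwidth ceiling (percent of link) for this moment."""
    in_business_hours = BUSINESS_START <= now.time() < BUSINESS_END
    return BUSINESS_LIMIT_PCT if in_business_hours else OFF_HOURS_LIMIT_PCT
```

A cutover script would call a check like this on each loop before applying a limit, so the decision lives in one place and can be tested without touching an array.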
Every SAN-attached workload at a government utility, Oracle, VMware, and bare-metal servers alike, moved off a legacy VMAX onto all-flash with nothing taken offline. The path ran through a live VPLEX Metro stretched cluster. I wrote a dozen-plus per-host migration runbooks, re-zoned two Brocade DCX 8510 fabrics, and hit a VPLEX meta-volume problem mid-migration that needed a vendor escalation to clear. It cleared.
A government healthcare authority wanted its floor simplified. IBM Storwize and NetApp N-series went into VxBlock running VMAX 250F and Unity 400. The zone count is what made it real: roughly 2,700 active zones across two Cisco MDS fabrics. I wrote 18-plus migration bundles, each with a rollback section, because in a hospital you do not get to wing the recovery step.
A financial-services customer ran SAP on AIX and needed synchronous DR at a distance that normally rules synchronous out. SRDF/e on a fresh VMAX3 was the answer. I did not want to take the design on faith, so the test plan captured 80-plus TimeFinder snapshots every four hours for six days, proving the replication state held while the workload ran hot.
Early in my career, a cluster install on SCO UnixWare hung on root-disk encapsulation and just scrolled abort messages until it timed out. The Symbios controller was throwing them faster than anyone wanted to read. I stopped reaching for the install guide and started reading the actual driver output. It was a driver conflict; the patch that fixed it broke the cluster software, so the real fix was a timeout value nobody had tuned. That habit, read what the machine is telling you, never left.
Cerebro takes a work ticket and carries it all the way to tested code, through a chain of specialized AI agents: research, design, implement, review, test. I built it to take the manual grind out of vendor API integrations across a fleet of 70-plus collectors. Every phase runs under its own tool permissions and a $15 spend cap. The whole pipeline stops at $100, and after three strikes it hands back to a human. Two human approval gates, by design. The quality gates are not optional either: compliance checks, a clean compile, and an automated code review all have to pass before anything ships. And the research phase has to audit the vendor's real GitHub before a line of code gets written, which is what stops the agents from inventing API paths that do not exist. A ticket runs about $2 to $5 and takes 15 to 25 minutes.
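The spend and strike limits above can be sketched as a small guard object. This is a minimal illustration, not Cerebro's code. The class and method names are hypothetical. It also assumes that the $15 cap applies to each run of a phase, that a cap trips when spend reaches it, and that strikes are counted across the whole pipeline.

```python
from dataclasses import dataclass

PHASE_CAP_USD = 15.0      # per-phase cap from the text (per run is an assumption)
PIPELINE_CAP_USD = 100.0  # whole-pipeline cap from the text
MAX_STRIKES = 3           # failures allowed before a human takes over


class HandBackToHuman(Exception):
    """Raised when a guardrail trips and a person has to take over."""


@dataclass
class Guardrails:
    total_usd: float = 0.0   # spend across the whole pipeline
    run_usd: float = 0.0     # spend in the current phase run
    phase: str = ""          # name of the phase currently running
    strikes: int = 0         # failures so far (pipeline-wide, an assumption)

    def start_phase(self, phase: str) -> None:
        """Begin a phase run; its own spend counter starts from zero."""
        self.phase = phase
        self.run_usd = 0.0

    def charge(self, usd: float) -> None:
        """Record spend, then stop if either cap has been reached."""
        self.run_usd += usd
        self.total_usd += usd
        if self.run_usd >= PHASE_CAP_USD:
            raise HandBackToHuman(f"{self.phase}: phase cap reached")
        if self.total_usd >= PIPELINE_CAP_USD:
            raise HandBackToHuman("pipeline cap reached")

    def record_failure(self) -> None:
        """Count a failed attempt; the third one hands the ticket back."""
        self.strikes += 1
        if self.strikes >= MAX_STRIKES:
            raise HandBackToHuman("three strikes")
```

Raising one exception type for every limit keeps the hand-back path single: whatever tripped, the orchestrator catches `HandBackToHuman` and stops spending.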
EMC ladder maintained across 7 Knowledge Maintenance cycles · Enginuity v5.1–v7.5