NA004 - Autocon3 hallway track

Posted on June 22, 2025 • 7 min read • 1,289 words

Tags: Podcast, Network-Automation, NSOT, Autocon, Conference, Terraform

The Autocon3 three-part hallway track

NA004 - Autocon3 hallway track — Photo by Steinzi

Network Automagic EP004 - Autocon3 hallway track

🎙️ LIVE FROM AUTOCON3! 🎙️

Get ready for an exclusive three-part journey through the bustling hallways of Autocon3! We’re bringing you the raw, unfiltered conversations that happen between sessions - the kind of deep dives that only happen when network automation experts get together.

🚀 PART 1: Catch up with Damien Garros from OpsMill for an impromptu hallway chat that dives deep into the evolution of network data management and automation strategy.

⚡ PART 2: Join the heated Terraform discussion with Eduardo Pozo & Christian Drefke as they share battle-tested insights from the trenches of enterprise network automation.

🔥 PART 3: Wrap up with John Howard exploring the cutting-edge world of network telemetry and observability - what’s hot in 2025!

This is conference content at its finest - authentic, technical, and packed with real-world wisdom you won’t find anywhere else.

Episode Guests:

Damien Garros - OpsMill - Hallway chat (LinkedIn)
Eduardo Pozo & Christian Drefke - Terraform discussion (Eduardo on LinkedIn, Christian on LinkedIn)
John Howard - Network telemetry and observability in 2025 (LinkedIn)

Listen to the show anywhere:

YouTube: @networkautomagic
Spotify: Network AutoMagic
Apple Podcasts: Network AutoMagic
RSS Feed: Anchor.fm

What we cover:

Key Topics Discussed:

Damien Garros

Network Data Management & Schema Evolution

Infrastructure data management platforms vs “network source of truth”
The evolution and controversy around “source of truth” terminology
YANG schema development and its 25-year history
Phil Shafer’s contributions to NETCONF and YANG standards
Open models vs closed vendor models debate

Schema Design Philosophy

XML Schema (XSD) influence on YANG development
Starting with vendor-agnostic, topology-level models
Transformation layers for device-specific implementations
The pitfalls of trying to build comprehensive schemas from the start
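
As a rough illustration of the "vendor-agnostic model plus transformation layer" idea, the Python sketch below keeps one VLAN definition and renders it per platform. The Vlan class, the renderer functions, and the platform keys are hypothetical names chosen for this example; none of them come from the episode or from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vlan:
    """Vendor-agnostic intent: no device syntax lives here."""
    vlan_id: int
    name: str


def render_ios(vlan: Vlan) -> list[str]:
    # Cisco IOS-style VLAN definition.
    return [f"vlan {vlan.vlan_id}", f" name {vlan.name}"]


def render_junos(vlan: Vlan) -> list[str]:
    # Junos-style "set" command.
    return [f"set vlans {vlan.name} vlan-id {vlan.vlan_id}"]


# Transformation layer: one renderer per platform (hypothetical keys).
RENDERERS = {"ios": render_ios, "junos": render_junos}


def render(vlan: Vlan, platform: str) -> list[str]:
    """Translate the shared model into device-specific config lines."""
    try:
        return RENDERERS[platform](vlan)
    except KeyError:
        raise ValueError(f"no renderer for platform {platform!r}") from None


if __name__ == "__main__":
    users = Vlan(vlan_id=10, name="users")
    for platform in RENDERERS:
        print(platform, render(users, platform))
```

Keeping the model this small mirrors the "start small" advice: a new platform only needs a new renderer, and the shared schema grows only when a real use case demands it.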

Automation Strategy & Implementation

Starting small and scaling automation projects incrementally
Workflow analysis and time mapping for teams
Identifying quick wins vs comprehensive overhauls
The importance of understanding current team processes before automating

Building Trust in Automation Tools

Why automation tools often go unused by operations teams
The “black box” problem in network automation
Making automation predictable and transparent
Providing visibility into automation processes and stages

Metrics & Validation

Establishing baseline metrics before implementing automation
Demonstrating value through before/after comparisons
The challenge of selling automation benefits to organizations
Management buy-in and adoption strategies

Practical Considerations

Balancing feature completeness with usability
Service-centric vs device-centric modeling approaches
The reality of budget constraints and gradual implementation
Learning from automation failures and building better tools

Eduardo & Christian

Topics Discussed: Network Automation & Terraform Expert Panel

Participants

steinzi (Host)
Eduardo Pozo (Terraform Expert, Healthcare Environment)
Christian Drefke (Network Automation Expert, Enterprise Integrator)

1. Terraform Provider Evolution & Challenges

Initial Provider Pain Points

3+ years ago: Providers were “half-baked” with ~50% of resources non-functional
Many resources could create but not modify or delete configurations
Frequent crashes and apparent lack of vendor testing
Vendors prioritized development based on client demand and revenue

Solution Strategy

Direct vendor engagement and collaboration
Opening issues and contributing fixes
Community-driven development approach
Beta testing partnerships with vendors

Current State Improvements

Palo Alto: New provider version with significant improvements
Cisco Catalyst Center: Updated provider with better functionality
Overall ecosystem maturity has dramatically improved

2. State File Management & Best Practices

Key Challenges Discussed

Understanding state file mechanics
Difference between terraform refresh, apply -refresh-only, and the implicit refresh that runs during a regular plan or apply
State file security and validation
Managing distributed state across multiple environments

Recommended Approaches

Split State Files: Essential for scaling (300-400 pipeline runs/day mentioned)
Single Repository Strategy: Avoid repository sprawl per state file
Tools: Terra* (Terramate) for scaling without vendor lock-in
Dev Containers: Consistent development environments across teams
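
One way to picture split state files living in a single repository: each state gets its own directory (a "stack"), and a pipeline only runs for the stacks whose files changed. The Python sketch below shows that selection step under assumed conditions. The stacks/site/domain layout and the affected_stacks function are hypothetical, and this is not how Terramate itself detects changes.

```python
from pathlib import PurePosixPath


def affected_stacks(changed_files, stack_dirs):
    """Return the stack directories that contain at least one changed file."""
    stacks = [PurePosixPath(s) for s in stack_dirs]
    hit = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        for stack in stacks:
            # A file belongs to a stack if the stack dir is one of its parents.
            if stack in path.parents:
                hit.add(str(stack))
    return sorted(hit)


if __name__ == "__main__":
    # Hypothetical repository layout: one state file per directory.
    stacks = ["stacks/site-a/vlans", "stacks/site-a/bgp", "stacks/site-b/vlans"]
    changed = ["stacks/site-a/vlans/main.tf", "README.md"]
    print(affected_stacks(changed, stacks))
```

With hundreds of pipeline runs per day, limiting each run to the touched stacks keeps plans short and reduces contention on any single state file.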

3. Data Validation & Security

Multi-Layer Validation Strategy

Eduardo’s Approach:

Syntax Validation: Data correctness checks
Semantic Validation: Business logic (e.g., BGP neighbor dependencies)
Critical Resource Protection: Preventing deletion of critical VLANs
Custom Python Validation: Instead of HashiCorp Sentinel
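
The layers above could look roughly like the following Python sketch, run in the pipeline before Terraform starts. Everything specific here is an assumption made for illustration: the input data shape, the CRITICAL_VLANS set, and the semantic rule (a BGP neighbor must sit on a configured subnet) are hypothetical and are not Eduardo's actual checks.

```python
import ipaddress

CRITICAL_VLANS = {1, 100}  # hypothetical IDs that must never be removed


class ValidationError(Exception):
    """Raised by any layer to stop the pipeline early."""


def check_syntax(data):
    """Layer 1: types and ranges."""
    for vlan in data["vlans"]:
        if not isinstance(vlan["id"], int) or not 1 <= vlan["id"] <= 4094:
            raise ValidationError(f"invalid VLAN id: {vlan['id']!r}")
    for nbr in data["bgp_neighbors"]:
        try:
            ipaddress.ip_address(nbr["address"])
        except ValueError:
            raise ValidationError(
                f"invalid neighbor address: {nbr['address']!r}") from None


def check_semantics(data):
    """Layer 2: cross-object rule (assumed): neighbors sit on a known subnet."""
    subnets = [ipaddress.ip_network(i["subnet"]) for i in data["interfaces"]]
    for nbr in data["bgp_neighbors"]:
        addr = ipaddress.ip_address(nbr["address"])
        if not any(addr in net for net in subnets):
            raise ValidationError(
                f"BGP neighbor {addr} is not on any configured subnet")


def check_critical(data, current_vlan_ids):
    """Layer 3: refuse any change that would delete a protected VLAN."""
    desired = {vlan["id"] for vlan in data["vlans"]}
    removed = (set(current_vlan_ids) - desired) & CRITICAL_VLANS
    if removed:
        raise ValidationError(
            f"change would delete critical VLANs: {sorted(removed)}")


def validate(data, current_vlan_ids):
    # Fail fast: stop at the first failing layer, before Terraform ever runs.
    check_syntax(data)
    check_semantics(data)
    check_critical(data, current_vlan_ids)
```

Ordering the layers from cheapest to most contextual means malformed input is rejected in milliseconds instead of surfacing partway through a long Terraform run.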

Christian’s Approach:

Strict Input Validation: Fail-fast methodology
External Data Files: Never embed data directly in HCL
Secured Data Repository: Restricted access with Git monitoring
GitLab Security Controls: Repository access management

Fail-Fast Philosophy

“If we let the wrong data come into Terraform, maybe Terraform will go through 1000 objects before seeing that the data is wrong… you already wasted like ten minutes of your time” - Eduardo

4. Source of Truth Evolution

Current Challenges with NetBox

Limited modeling capabilities for specific network values
Need for distributed source of truth solutions

Emerging Solutions

Eduardo’s Direction:

Moving to Nautobot-based distributed source of truth
Custom plugins and models
Northbound/Southbound API integration
Custom work runners for pipeline triggers

Christian’s Approach:

Evaluating Infrahub solution
Workflow-driven data feeding
Integration beyond networking (Kubernetes, OpenShift, VMware)
Reliable data sourcing with audit history

5. Development Best Practices

Module Design Philosophy

Start Small: Begin with simple modules (e.g., single VLAN creation)
Building Block Approach: Lego-style incremental development
Always Use Modules: Even for non-reusable code (organization benefits)
Versioning Strategy: Module registry with proper version control

Development Environment

Dev Containers: Consistent environments across team members
Same Tool Versions: Terraform, Python packages, plugins
Easy Handoffs: Seamless collaboration and vacation coverage

6. Emergency Change Management

Brownfield Challenges

Manual emergency changes in production
Maintaining state file accuracy
Audit trail requirements in regulated environments

Proposed Solutions

Audit Log Integration: Pulling manual changes from system logs
Workflow Automation: Converting manual changes to approved workflows
TACACS/RADIUS Integration: Triggering Terraform refreshes on manual logins
Slack Bot Integration: Real-time change notifications and approvals
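
A minimal sketch of the check such a trigger might run, assuming the goal is simply to learn whether live devices still match the committed configuration. It relies on the documented exit codes of terraform plan -detailed-exitcode (0 for an empty diff, 1 for an error, 2 for a non-empty diff). The function names, result labels, and working-directory argument are hypothetical, and the TACACS/RADIUS or Slack wiring is left out entirely.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_RESULTS = {0: "in_sync", 1: "error", 2: "diff"}


def plan_command(workdir):
    """Build a non-interactive plan command for one state directory."""
    return ["terraform", f"-chdir={workdir}", "plan",
            "-detailed-exitcode", "-input=false", "-no-color"]


def check_drift(workdir, run=subprocess.run):
    """Run a plan and classify the result; `run` is injectable for testing."""
    result = run(plan_command(workdir), capture_output=True, text=True)
    # Any unexpected exit code is treated as an error.
    return PLAN_RESULTS.get(result.returncode, "error")
```

When the committed code has not changed since the last apply, a "diff" result suggests an out-of-band change on the device, which could then be routed into an approval workflow or a chat notification.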

7. AI/ML Integration Considerations

Current Usage

GitHub Copilot: Security scanning in pull requests
Basic AI Tools: Limited use due to data sensitivity

Future Potential

Regulatory Environments: Government approval required for healthcare/sensitive data
Troubleshooting Applications: AI-assisted problem diagnosis
Security Analysis: Automated compliance checking
Configuration Review: AI as “second pair of eyes”

MCP (Model Context Protocol) Considerations

Limited input file integration
Chatbot-driven state file edits
Zero-conflict automated changes
Current limitations in sensitive environments

8. Vendor-Specific Insights

Cisco Ecosystem

Catalyst Center Provider:

Significant improvements over past 2 years
Better API foundation compared to Meraki
Community collaboration success story

ACI Provider:

Consistently reliable from day one
Minimal issues with resource functionality
Active migration to new Terraform SDK

Provider Quality Assessment

Need for star rating system for providers
Evaluation criteria: API foundation, issue resolution, community support
Importance of underlying technology (REST API vs gRPC)

9. Team Structure & Skills

The “Unicorn” Debate

Controversial Opinion: The DevOps unicorn (network engineer + developer) isn’t scalable for sophisticated projects.

Recommended Team Structure

Senior Network Engineers: Product knowledge and design expertise
Dedicated Developers: Software engineering principles and SOLID architecture
Collaborative Approach: Bridge-building between departments
Specialized Skills: High-level expertise in respective domains

Reality Check

“A network engineer can always begin to code, but I feel like we are never going to be as good as a software developer that studied for that” - Eduardo

10. Scaling Considerations

Production Metrics

Pipeline Frequency: 300-400 runs per day
Multi-Environment: One repository per location/hospital
Closed Environment: Local provider/module caching
Container Strategy: Version-controlled runner environments

Architecture Decisions

Tool Selection: Terraform vs Pulumi considerations
State Management: Local vs remote state strategies
CI/CD Integration: GitLab CI/CD and pipeline orchestration
Security: NDA-compliant implementations in healthcare

Key Takeaways

Provider Maturity: Significant improvements in the last 2-3 years through vendor collaboration
Start Small: Incremental module development approach
Team Composition: Balance of network expertise and software development skills
Validation is Critical: Multi-layer validation prevents costly pipeline failures
State Management: Proper splitting and security essential for scale
Community Engagement: Direct vendor collaboration accelerates provider improvement
Future-Ready: AI integration coming but with regulatory considerations
