Sean Byron
LINUX OPERATIONS SPECIALIST
Extensive experience in System Administration and Engineering
4145 Via Marina #221, Marina del Rey, CA 90292 - 818.325.7123 - sean@seanbyron.com

Objective
To significantly contribute to the direction of an Operations / Engineering team in a leading Internet-based organization, utilizing my extensive experience in Linux Systems Administration.

Skills
- Extensive Linux (RedHat/Fedora/CentOS/Debian/Slackware) history. 15+ years, 12 professionally.
- Professional experience with FreeBSD, OpenBSD, NetBSD, Sun Solaris, SGI IRIX, and Mac OS X.
- Knowledge and understanding of a wide range of Internet services and technologies.
- Ability to create custom System Administration tools and distribute them uniformly across a network.
- Excellent writing skills, including writing technical documents for internal and customer use.
- Strong belief in DevOps principles: automation, collaboration, peer review, and version control.

WORK EXPERIENCE

11/2010 - Present Senior Systems Engineer - The Rubicon Project (Los Angeles, CA)

In November 2010, The Rubicon Project acquired the technology and the engineering team of Fox Audience Network, bringing aboard me and the majority of the team I was working with at Fox. A major selling point for Rubicon, a fast-growing environment, was the automation-centric systems infrastructure that I helped to create with my team. In just a few short months, we had completely redesigned one of Rubicon's international datacenters with automation and virtualization in mind, allowing us to quantify the benefits of our DevOps-based approach and pave the way for future plans to redesign additional existing datacenters. In the past few months I have been personally involved in the complete redesign of the Amsterdam datacenter PoP, the build-out of a brand-new Las Vegas datacenter hub, and the build-out of a brand-new Virginia datacenter hub. I have also been solely responsible for implementing Graphite, a new system/application metrics suite, and have led our team's efforts in automation (via Puppet) and monitoring (via GroundWork/Nagios).

RESPONSIBILITIES
- System Administration of a large network, including over 2,000 hosts running over 33,000 services.
- Solely responsible for all system/application metrics, utilizing Graphite and Ganglia.
- Solely responsible for all system/application monitoring, utilizing GroundWork / Nagios.
- Authored over 35 custom Nagios plugins, created over 500 unique Nagios service checks.
- Primarily responsible for automation, including administration of Puppet Master and Dashboard.
- Authored over 200 Puppet classes for management of various applications and configurations.
- Solely responsible for numerous cornerstones of operations: Jira, Confluence, Postfix, and more.
- Triaged numerous incoming support issues from development, QA, and project management teams.

PUPPET AUTOMATION PROJECTS
- Audited dozens of classes from various sources to fix dependencies and conform to style guidelines.
- Authored/published internal docs on Puppet and our usage of it for non-Operations team members.
- Authored/published internal docs on Puppet Dashboard and our own `clashboard` interface to it.
- Audited and improved all of our Puppet classes to be datacenter agnostic.
- Installed/managed Puppet Dashboard, which we use as an External Node Classifier for Puppet.
- Migrated all node grouping definitions from nVentory to Puppet Dashboard.
- Authored/implemented new Puppet class to manage the configuration of new Puppetmaster servers.
- Devised/implemented method to test auto-generated sudoers files for errors before applying to hosts.
- Devised/implemented method to watch directories for changes and trigger actions if change occurs.
- Implemented PuppetRun in our environment to allow developers and others to trigger catalog runs.
- Created custom Facter fact to return unique numerical string per host (a padded and stripped version of the host's IP address) for use in various Puppet classes.
- Patched the Yum provider in Puppet to support yum downgrade, allowing us to specify lower package versions in a class and perform automated roll-backs.
- Devised/implemented a fully separate Puppet development environment, for creation and testing of new Puppet classes (or changes to existing ones) before rolling them into Production.
- Created/implemented class to intelligently manage internal yum repository definitions for all hosts. Based on the hostname of a server, the correct (Prod, Staging, Dev, or QA) repos will be configured.
- Installed/tested/implemented the custom VCSRepo type in Puppet. This custom type allowed me to author classes that refer directly to external Subversion repository objects - for instance, allowing us to check out configuration files for projects directly from the same svn repo the Developers use.
- Condensed 'user access' Puppet classes from 73 individual classes (2,118 total lines of configuration) into a single class (109 lines). Applied this class by default to all machines in all datacenters so that the correct SSH and sudoers access was defined for every host; every newly imaged host received the correct configurations with no manual intervention. The class parsed the hostname of a system to determine which of the 21 Platforms (Projects) the host belonged to and which of the 6 Statuses (Prod, Dev, QA, etc) the host was in; it would then apply the correct configurations out of a possible 126 variations - while still allowing per-host exceptions to be made for these configurations.
- Authored a comprehensive command-line interface to Puppet Dashboard that allowed for very easy manipulation of our Node Classification Database. This allowed us to very quickly apply Puppet classes to hosts, determine which classes were applied to a host (and which hosts had a given class applied), clone the classes applied on one host to another, etc. This tool allowed us to better leverage Puppet Dashboard for all Node Classification, and removed our dependence on nVentory, our previous node-classification database (which suffered from performance, stability, and usability issues). The availability of this tool dramatically increased productivity for our team when creating new hosts in our environments, or migrating hosts between environments.
- Created dozens of other Puppet classes. Took a leadership role on our implementation of Puppet.

GRAPHITE METRICS PROJECTS
- Designed and implemented our Graphite implementation for all system and application level metrics.
- Created highly-redundant data-replicated Graphite configuration in multiple datacenters.
- Architected Graphite for high performance, designed to handle 5,000+ metric updates per second.
- Solely responsible for all system/application metrics - currently over 150,000 updating every minute.
- Created web UI to Graphite for quick graph building and to save collections of graphs on single page.
- Authored dozens of custom metric plugins for Graphite, measuring items such as HTTP response time, partner latency, RTB bid win rate, house-ad serving rate, impressions per country code, network interface errors, profits per impression / per host / total, HTTP error/status code percentages, Nagios service/host check status counts, memcache statistics, DNS response time, and many, many more.

GROUNDWORK MONITORING PROJECTS
- Solely responsible for all internal host and service monitoring via GroundWork / Nagios.
- Authored dozens of custom Nagios plugins and services, called both remotely and locally via NRPE.
- Authored custom check_vz_beancounters plugin for monitoring of OpenVZ container resource limits.
- Authored multi_nrpe plugin, a wrapper / aggregator of NRPE service checks across multiple hosts.
- Created command-line scripts to automate addition of hosts/services to Nagios using MonarchAPI.
- Documented existing Nagios alerts and provided recommendations for new alerting.
- Created/implemented new Nagios monitoring plugin to alert for Puppet Catalog run failures on hosts.
- Authored Puppet class to manage installation of NRPE daemon on all hosts.
- Authored Puppet class to manage distribution of custom Nagios plugins to all GroundWork hosts.
- Created and managed on-call rotation database and corresponding scripts.

OTHER PROJECTS
- Created Puppet class to manage all Oracle pre-reqs to aid with migration from Oracle 10G to 11G.
- Created/executed plan for rack-level redundancy across application servers in prod environments.
- Created scripts to automate creation/destruction/recreation of OpenVZ containers.
- Installed/configured the Mailman software in our environment for tracking internal mailing lists.
- Deployed new instances / hosts in the Amazon EC2 Cloud using RightScale.
- Created scripts to continuously rsync log data between integration points in various datacenters.
- Trained SpamAssassin mail filters when needed to eliminate the occasional uncaught spam.
- Regularly audited/maintained a large collection of internal documentation in team's Confluence wiki.
- Created/destroyed/configured various Projects and Shares on Sun Amber Road appliances.
- Amsterdam: Authored Puppet classes for new cage to enforce consistent application configurations.
- Imaged, installed, and configured core infrastructure services for new cage in Amsterdam, including GroundWork Nagios, Graphite, Ganglia, rsnapshot, Postfix, SpamAssassin, ClamAV, Amavisd-new, Puppet, Puppet Dashboard, and Func.
- Las Vegas: Assisted in the physical build-out of new datacenter cage in Las Vegas, NV.
- Imaged, installed, and configured core infrastructure services for new cage in Las Vegas, including DNS, DHCP, Cobbler, GroundWork Nagios, Graphite, Ganglia, rsnapshot, Postfix, SpamAssassin, ClamAV, Amavisd-new, Squid, LDAP, Kerberos, Puppet, Puppet Dashboard, and Func.
- Authored func plugins for inventorying RAID controllers, hardware utilization, IP Addresses and more.

TECHNOLOGY UTILIZED
Operating Systems: CentOS, Red Hat Enterprise Linux, Solaris
System Automation: Puppet, Cobbler, func, Bash Scripting, SystemImager, Puppet Dashboard
Lights-out Datacenter: HP iLO, Avocent Cyclades, func-inventory
Monitoring: GroundWork, Nagios, NRPE, Ganglia, OSSEC, Cacti, RRDtool, Graphite
Applications: Apache Hadoop, Apache Tomcat, Apache httpd (mod_ssl, suEXEC, suPHP), nginx
Databases: Postgres, Oracle, MySQL (including replication), pgpool
Software Configuration Management: Subversion, Bamboo, Maven, FishEye, Crucible
RPM Packaging: Managing yum repositories, Writing spec files, Building/rebuilding RPMs
E-mail: Postfix, SpamAssassin, ClamAV, Amavisd-new
Logging: Syslog, Splunk, logrotate, Apache log4j
Ticketing / Documentation: Atlassian Jira, Atlassian Confluence
Other Software: OpenVZ, LDAP, Kerberos, Squid, ActiveMQ, ZFS, NFS, daemontools, rsnapshot, SugarCRM, Bonnie++, iptables, rsync, Paymentech SDK, PHP development, EC2 & RightScale
Hardware (HP): ProLiant 1/3/5 Series (DL140/160/180), (DL320/365/380/385), (DL580/585)
Hardware (Sun): Sun Fire X4500 (Thumper), Sun Fire X4540 (Thor)
Hardware (Other): Avocent Cyclades ACS48, HP BL2x220c Double-Density Blades, FusionIO

09/2008 - 11/2010 Senior Systems Engineer - Fox Audience Network (Santa Monica, CA)

Fox Audience Network supported monetization efforts across News Corp's online content portfolio, as well as third-party publisher sites, by leveraging proprietary advertising technology to create highly-targeted advertising campaigns and deliver cutting-edge tools and services. It also developed and managed a self-serve advertising platform, which was utilized by more than 30,000 advertisers. In all, FAN reached 155 million consumers each month and had built relationships with more than 600 online publishers before its successful acquisition by The Rubicon Project in late 2010. Below, I've included a partial list of the responsibilities, projects, and technology applicable to my position at this company.

RESPONSIBILITIES
- System Administration of a large network, including over 1,200 hosts running over 20,000 services.
- Managing all systems in Infrastructure, Production, Dev, QA, Research, and Stage environments.
- Working directly with the 12 members of the Systems Engineering / Operations team.
- Interacting with dozens of software developers, QA engineers, release engineers and DBAs.
- Resolving over 1,000 tickets in 12 months for the 16 internal platforms.
- Acting as the sole platform lead for two large, separate internal software platforms.
- Coordinating all incoming requests and future requirements for the two platforms I led.
- Interacting with HP and Sun support teams on a regular basis to follow up on hardware issues.
- Managing Subversion repositories for Systems Engineering / Operations team.
- Designing and coordinating all host/service monitoring for all software platforms.
- Interfacing directly with Puppet Labs team to constantly improve our Puppet implementation.

INVENTORY PROJECTS
- Created system documentation policies for our large network, which at the time wasn't inventoried.
- Audited entire network/datacenter to produce comprehensive inventory listing for 800+ servers.
- Developed simple PHP/MySQL inventory tool to store and manage results of inventory audit.
- Later, integrated nVentory, a Rails-based inventory tool, with automation tools Puppet and Cobbler.
- Streamlined server roll-out process; created Hardware Request form to be submitted by platforms.
- Implemented New Server Checklist process for engineers to track progress during deployments.

MIGRATION PROJECTS
- Assisted in the total migration of all company servers from L.A. to Ashburn, VA with zero downtime.
- Personally responsible for complete migration of two key internal software platforms.
- Migrated and upgraded Atlassian Jira (3.6.5 to 3.13.3) and Atlassian Confluence (2.2.10 to 2.10.3).
- Wrote extensive documentation on migration steps, including 41-page guide on Atlassian migrations.
- Additionally, migrated entire SCM suite, including Subversion, FishEye, Maven, Bamboo, etc.

AUTOMATION PROJECTS
- Distributed configs, managed systems automatically via Puppet, a configuration management tool.
- Created custom Puppet classes for wide variety of tasks: installing software, sysctl parameters, etc.
- Devised, implemented, and documented integrated testing environment for Puppet development.
- Deployed new systems accurately and consistently via Cobbler, a kickstart provisioning system.
- Maintained default distribution kickstart via Cobbler, and added new profiles for divergent systems.
- Built new RPMs for various unpackaged software and distributed the resulting packages via YUM.
- Leveraged GroundWork 'Automated Discovery' to automate monitoring; integrated with nVentory.
- Fully automated Ganglia configs via Puppet classes based on extlookup() and generate() functions.

MONITORING PROJECTS
- Massive GroundWork/Nagios installation: 1,200+ hosts, 20,000+ services, 950,000+ lines of config.
- Maintained complex on-call rotation system directly in GroundWork via escalations and time periods.
- Devised, implemented, and documented numerous monitoring standards for entire team.
- Performed seamless migration from a poorly realized instance of GroundWork 5.0 to 6.0.
- Created highly consistent instance of GroundWork 6.0, using standards and best practices I created.
- Authored/deployed many custom Nagios plugins for application-level and system monitoring.
- Among the plugins created: disk health, Hadoop health, application counters and health page check.
- Created high-volume Ganglia (host metrics) system, with RAM-disk RRD storage and snapshots.
- Produced custom Ganglia metrics for hosts, including IPMI stats: fan speed, temperatures, etc.
- Developed basic monitoring landing page to provide users with quick access to all monitoring tools.
- Devised/implemented automated OSSEC (security monitoring) roll-out procedure for critical systems.
- Integrated OSSEC alerts into Splunk indexing tool, and added OSSEC-related user dashboards.
- Documented all host and service checks for platforms I led, including appropriate responses.
- Implemented continuous feedback cycle for monitoring with development teams.

STANDARDS PROJECTS
- Created extremely detailed hardware and system inventory based on func automation framework.
- Authored plugins for func-inventory: cdpr, cyclades, disks, facter, puppet, rpms, sysctl and more.
- Developed user access control policies and standards for systems access.
- Created automated method for managing user access policies on servers via Puppet and nVentory.
- Aided creation/documentation of internal PCI standards, and implementations of standards.
- Wrote and presented an Information Security training session for Operations team members.
- Created/documented several internal standards for configuration of various classes of servers.
- Worked with management to create and document a standard for on-call responsibilities.
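
For illustration only: custom Nagios plugins such as the disk health check mentioned under Monitoring Projects follow the standard Nagios plugin convention of printing one status line and exiting 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below is not one of the original plugins. It is a minimal, hypothetical disk-usage check in Python, and the names `classify` and `check_disk_usage`, the default thresholds, and the default path are all assumptions chosen for the example.

```python
"""Hypothetical Nagios-style disk usage check (illustrative sketch only)."""
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct, warn_pct, crit_pct):
    """Map a usage percentage onto a Nagios exit code."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def check_disk_usage(path="/", warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`.

    The thresholds are assumed defaults, not values from any real deployment.
    """
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # A check that cannot run reports UNKNOWN rather than guessing.
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero total size"
    used_pct = 100.0 * usage.used / usage.total
    code = classify(used_pct, warn_pct, crit_pct)
    # Text after "|" is performance data: label=value;warn;crit
    line = (f"DISK {LABELS[code]} - {path} is {used_pct:.1f}% full"
            f" | used_pct={used_pct:.1f}%;{warn_pct};{crit_pct}")
    return code, line
```

A wrapper script would print the returned status line and pass the returned code to `sys.exit()`, which is how Nagios or NRPE reads the result of a check.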

OTHER PROJECTS
- Migrated dozens of non-Production hosts from physical servers and Xen VMs to OpenVZ containers.
- Created scripts to automate many aspects of creation of OpenVZ containers.
- Designed/implemented standard backups for critical systems, based on rsnapshot and rsyncd.
- Designed/implemented load-balanced vhost pool for critical sites, using httpd and replicated MySQL.
- Designed/implemented MX server pool, a load-balanced mail solution for inbound/outbound mail.
- Configured critical systems to produce core dumps by default, and intelligently alert on those dumps.
- Configured intelligent log rotation via logrotate and log4j (including DailyRollingFileAppender).
- Configured and rolled out NTP daemon across entire network for time synchronization.
- Interacted with SCM, Dev, QA teams to deploy major upgrades: Tomcat, JDK, Paymentech SDK, etc.
- Benchmarked various hardware; provided results to Systems Architect for purchasing evaluations.
- Created several custom RPM packages for required software such as ActiveMQ and Atlassian suite.
- Authored several custom init scripts for software that did not provide RedHat-style startup scripts.
- Identified single points of failure in our platforms and worked with teams to build redundancy.

TECHNOLOGY UTILIZED
The technology utilized at this Fox Audience Network position is roughly the same as the technology utilized at the above Rubicon Project position, due to the nature of the acquisition. There are only a few technologies that were exclusive to Rubicon Project (apachelight, Graphite, Amazon EC2 / RightScale, and PHP development) - and no technologies that were exclusive to this position at Fox Audience Network (that weren't also included in the position at The Rubicon Project).

08/2007 - 09/2008 Operations Manager - InfoStreet, Inc. (Tarzana, CA)

InfoStreet took me on as their primary Linux System Administrator, and within three months promoted me to Operations Manager. At this small but fast-growing Software as a Service provider, my role gave me an opportunity to develop a close relationship between the Operations and Development teams in order to set feature goals and meet deadlines. Below, I've included a partial list of the responsibilities, projects, and technology applicable to this position.

RESPONSIBILITIES
- Lead System Administrator for InfoStreet and its sister company, Topica, based in San Francisco.
- Leadership of Operations team, including one Jr. System Administrator in L.A. and one in S.F.
- Facilitating communication between Operations and Development teams.
- Training coworkers on technical issues not limited to System Administration.
- High-level customer support for escalated technical issues.
- 24x7 on-call emergency response to off-hours issues.
- On and off-site management of four datacenters across three cities.

PROJECTS
- Designed new datacenter deployment of 30 servers at One Wilshire Annex, including all aspects of hardware selection and purchasing, network design, physical racking and build out of the cage, load testing and configuration of all servers, switches, routers, and Alteon load balancers.
- Designed/implemented host, service, and bandwidth monitoring for all 4 cages via Munin/Nagios.
- Authored many custom Nagios and Munin plugins for service/metric monitoring.
- Implemented rsync-based backup solution for 184 servers located in InfoStreet's four datacenters.
- Created 7-node MySQL (Carrier Grade) cluster for application heavily used by 475,000 customers.
- Identified resource bottlenecks in internal applications and suggested improvements to Dev team.
- Created a custom spam filtering system based on Postfix, SpamAssassin, ClamAV, and amavis that reduced inbound spam volume by 90% with less than 0.02% false positives.
- Established multi-homed, fully redundant network topology based on OpenBSD, PF firewalling, PFsync rule synchronization, CARP IP Address sharing, OpenBGPd and OpenOSPFd routing daemons, and HP ProCurve switches. This design replaced the pre-existing single-homed, non-redundant, iptables solution that was hosted on an underpowered and outdated Linux server and greatly improved network response times and reliability.

TECHNOLOGY UTILIZED
Web: Apache, mod_python, PHP, mod_perl, custom SaaS web applications
Monitoring: Nagios, Munin, SNMP, Mon
Databases: MySQL, MySQL Cluster (Carrier Grade), Oracle
Languages: HTML, PHP, Perl, Python, Bash
E-mail: Postfix, Qmail, Sendmail, Exim, dovecot, Courier IMAP
Spam Filtering: SpamAssassin, ClamAV, amavisd-new, BitDefender, Various DNSBLs
Other: vsftpd, sshd, BIND, CVS, RCS, rsnapshot, Tuxedo, postal, bonnie++, iostat

04/2001 - 05/2007 System Administrator - ZeroLag Communications (Beverly Hills, CA)

I served as the sole Linux System Administrator for ZeroLag Communications, a fast-growing hosting company, for six years. Over this time I had many responsibilities and roles, as our company was relatively small and our team members needed to adapt quickly to any changes in the fast-paced world of Internet hosting. This included hundreds of migrations of live websites and servers across various datacenter locations, both internal and for our hosting customers, as well as a network-wide migration from Red Hat Linux to Debian Linux, after I created, documented, and deployed a custom standard Debian installation process for all of our servers. I take it as a point of pride that many of the systems I deployed and standards I created while at ZeroLag are still in use at the company today. Below, I've included a partial list of the responsibilities, projects, and technology applicable to this position.

RESPONSIBILITIES
- Development and deployment of system standards and best practices.
- Research, selection, installation, and configuration of all software on our Linux servers.
- System and network auditing and documentation.
- Security analyses and updates.
- Abuse handling, including creation and enforcement of an Acceptable Use Policy.
- Employee training on technical subjects and company policies.
- Technical support for our customer base.

PROJECTS
- Several scripts to automate common tasks on our internal servers.
- A central backup solution for a diverse network of 200+ servers, based on rsnapshot.
- A custom ticketing system based on RequestTracker.
- A custom spam filtering system based on Postfix, SpamAssassin, ClamAV, and amavisd-new.
- A custom monitoring / paging system based on MRTG, Nagios, Munin, and lm_sensors.
- Several Nagios plugins to monitor and restart specific services on our network (via event_handlers).
- Co-developed ZeroLag's corporate website.
- Co-developed ZeroLag's custom control panel, featuring web-based management of all aspects of the hosting environment, from customer information to resources such as DNS records, e-mail accounts, allocated servers, and monitoring configurations.

TECHNOLOGY UTILIZED
Web: Apache httpd, Apache Tomcat, IIS, PHP, mod_perl, Ruby on Rails
E-commerce: OS Commerce, Zencart, Interchange
Monitoring: MRTG, Nagios, Munin, lm_sensors
Databases: MySQL, PostgreSQL
Languages: HTML, PHP, Perl, Bash
E-mail: Postfix, Sendmail, Exim, Dovecot, cucipop, OpenWebmail, Horde/IMP
Spam Filtering: SpamAssassin, ClamAV, amavisd-new, Various DNSBLs
Other: RequestTracker, vsftpd, ProFTPd, sshd, BIND, CVS, Subversion, Xen, rsnapshot

05/2000 - 03/2001 System Administrator - WorldSite Networks (Beverly Hills, CA)

At WorldSite, I served as the primary systems and network administrator for a varied network of Linux, Solaris, IRIX, Windows NT, and Novell Netware servers, heterogeneous clients, and Cisco routers and switches. It was here that I refined my UNIX and networking skills by configuring and installing new servers, implementing a security model, maintaining the current infrastructure of over 75 servers, and troubleshooting system problems. This often involved working hands-on with networking equipment, rack-mounting and building servers in our Network Operations Facility, and communicating with customers to resolve issues. My position at WorldSite also required me to support the office LAN, frame-relay circuits, and in-house datacenter. I left WorldSite following their bankruptcy/acquisition, but they went out as the largest hosting and access provider in the Beverly Hills area, with clients such as Paramount Pictures and New Jersey Films.

02/1999 - 04/2000 System Administrator - Ninja Networks (North Hollywood, CA)

Ninja Networks, an Internet consulting firm, required a System Administrator for their network of about a dozen Linux servers and core Cisco network gear. The primary focus of this position was maintaining the reliability of the network and security of the servers. After training the current staff to maintain and troubleshoot the network, I left Ninja Networks to pursue a more challenging/rewarding work environment.

References available upon request.