Network Engineer
Trantor
Job Description
Job Title: SysEng - Network Engineer

Position Overview

Looking for a network engineer with experience in datacenter environments and at least light programming experience.

- SONiC programmatic iterative configuration (gNMI/YANG, swss) experience required
- SONiC base configuration (L2, MCLAG, LAG/port channel, BGP, BFD, etc.) experience preferred
- FRR experience preferred (OSPF)
- OpenGear experience nice to have
- Light systems programming language (C, C++, Go, Rust, etc.) experience preferred; stronger experience nice to have
- Scripting language (Python, Bash, etc.) experience required
- Linux administration (Bash, systemd units, general system navigation) experience preferred
- Virtual networking (VXLAN) experience preferred
- AWS networking (VPC, Direct Connect) nice to have

Task Expectations

- Programmatic iterative configuration of SONiC switches (YANG/gNMI, swss, etc.). Expected experience/abilities:
  - Has used the above previously to configure, or can trivially identify how to implement, CRUD operations (or at least CRD) against constructs such as, but not limited to: physical and sub-interfaces, VXLAN/VNI, VRF, and ACLs
  - At minimum, must provide correctly functioning examples (see the gNMI sketch after this item)
  - Functionality will ultimately be written in Go. The network engineer is only expected to identify, document, and demonstrate interface functionality for the SingleStore team to implement.
  - The network engineer being able to implement the CRUD/CRD functionality as a Go library/module would be a plus but is not expected.
  - Actual virtual networking control plane implementation is expected to be the responsibility of the SingleStore team. The network engineer contributing here would be high value, freeing the team to focus on storage implementation.
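For concreteness, below is a minimal sketch (not a reference implementation) of the create leg of CRUD over gNMI in Go, assuming the switch exposes the standard gNMI Set RPC and a sonic-vlan style YANG model. The management address, insecure transport, path, and JSON payload are placeholder assumptions; the exact model depends on the SONiC build in use.

```go
// Minimal gNMI "create" example against a SONiC switch. Address, transport,
// YANG origin/path, and payload below are illustrative placeholders.
package main

import (
	"context"
	"log"
	"time"

	gpb "github.com/openconfig/gnmi/proto/gnmi"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	conn, err := grpc.Dial("198.51.100.1:8080",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
	client := gpb.NewGNMIClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Path to a VLAN entry in a sonic-yang style tree (placeholder model).
	vlan := &gpb.Path{
		Origin: "sonic-yang",
		Elem: []*gpb.PathElem{
			{Name: "sonic-vlan:sonic-vlan"},
			{Name: "VLAN"},
			{Name: "VLAN_LIST", Key: map[string]string{"name": "Vlan100"}},
		},
	}

	// Create/update: push the VLAN entry as JSON-IETF.
	resp, err := client.Set(ctx, &gpb.SetRequest{
		Update: []*gpb.Update{{
			Path: vlan,
			Val: &gpb.TypedValue{Value: &gpb.TypedValue_JsonIetfVal{
				JsonIetfVal: []byte(`{"name": "Vlan100", "vlanid": 100}`),
			}},
		}},
	})
	if err != nil {
		log.Fatalf("set: %v", err)
	}
	log.Printf("set response: %v", resp)
}
```

The delete leg (the D in CRD) is the same RPC with the path listed under Delete instead of Update; sub-interfaces, VXLAN/VNI, VRF, and ACL constructs differ only in path and payload.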
- Base configuration of SONiC switches
  - Inband L2, MCLAG, LAG/port channel, BGP, BFD
    - Aside: need to set up anycast addresses for the metadata service IP, SAG, etc.
  - Bifurcated spine-leaf topology inband
    - Each side of the aisle has 2x spines, to be MCLAG'd
    - Each spine has 2x connections to each other spine, to be LAG'd
    - Each spine has 2x connections to each ToR/leaf on the same side, to be LAG'd
    - Each side of the aisle has a private AS
    - ToR-to-compute-node connections are breakouts; ToR-to-storage-node connections are standard
  - Spine-leaf topology out of band
    - Each side of the aisle has 1x spine
    - Each spine has a connection to each ToR/leaf on both sides
    - Currently each side of the aisle has a private AS
      - Can be argued it should be a single AS
    - Currently hardcoded L3
    - The spine model in use has insufficient resources for a unified DHCP stack; we had to settle on this model due to tariff season. ISC dhcpd is usable.
    - Server BMCs were previously static-IP'd and/or infinite-leased via DHCP by the vendor, and require crash carting/manual full reset in order to DHCP
    - Switch management ports are physically connected but not currently configured to be reachable via the OOB network
    - PDU management ports do not DHCP and require on-site troubleshooting to bring into the network
  - Coordinate w/ devops on switch integration with monitoring
- Transition from ad-hoc to code-driven base configuration (see the config-rendering sketch after this item)
  - Coordinate w/ devops on switch provisioning (ZTP or otherwise)
  - Coordinate w/ devops on SONiC build pipeline
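As a sketch of what code-driven base configuration could look like, the snippet below renders a per-switch FRR BGP stanza from a declarative topology value. The Switch schema, hostnames, addresses, and AS numbers are hypothetical, and the real pipeline (ZTP, build integration) would decide where the rendered output lands.

```go
// Sketch: render a per-switch FRR BGP stanza from a declarative topology
// definition, so base configuration lives in code rather than ad-hoc edits.
// The schema and values here are illustrative assumptions.
package main

import (
	"os"
	"text/template"
)

type Neighbor struct {
	Addr string
	AS   int
}

type Switch struct {
	Hostname  string
	AS        int // private AS per aisle side
	RouterID  string
	Neighbors []Neighbor
}

const frrTmpl = `hostname {{.Hostname}}
router bgp {{.AS}}
 bgp router-id {{.RouterID}}
{{- range .Neighbors}}
 neighbor {{.Addr}} remote-as {{.AS}}
 neighbor {{.Addr}} bfd
{{- end}}
`

func main() {
	sw := Switch{
		Hostname: "spine1-east", // hypothetical device
		AS:       64512,
		RouterID: "10.255.0.1",
		Neighbors: []Neighbor{
			{Addr: "10.1.0.2", AS: 64513},
			{Addr: "10.1.0.6", AS: 64513},
		},
	}
	t := template.Must(template.New("frr").Parse(frrTmpl))
	if err := t.Execute(os.Stdout, sw); err != nil {
		panic(err)
	}
}
```

The point is a single source of truth per switch; the same topology value can drive LAG/MCLAG tables, BFD settings, and monitoring inventory.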
- Transition from ad-hoc to code-driven configuration
- Ensure multipath is functioning correctly
  - The firewall rules engine appears to favor a single source interface for all src/dst, resulting in erroneous packet drops
- Ensure upstream egress/ingress A/P is functioning correctly
  - Will need to work with the network team of the colocation vendor providing IP transit to remedy the IP transit currently having only one functioning leg
  - May require on-site work/coordination
- Ensure Direct Connect multipath is working correctly
- Ensure no overly eager security features are negatively impacting legitimate traffic (session drops/throttling, unreasonable latency impacts, etc.; we currently see a ~200 ms hit on some traffic; see the probe sketch after this list)
- Ensure no unlicensed security features are enabled (DNSSEC is believed to be erroneously enabled at present)
- NAT public IPs for use
- Interzone traffic rules are currently permissive to avoid issues in early deployment; a more mature tiered scheme is necessary for the long term
- Coordinate w/ devops on firewall integration with monitoring
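One way to pin down impacts like the ~200 ms hit is a per-path probe. The sketch below times TCP connects to a single destination from one source address per uplink; the addresses are placeholders, and a production check would bind to specific interfaces or use per-path routing marks.

```go
// Sketch: compare connect latency per path by binding to a different local
// source address for each uplink. Addresses are placeholder assumptions.
package main

import (
	"fmt"
	"net"
	"time"
)

// probe times a TCP connect to dst sourced from the given local address.
func probe(src, dst string) (time.Duration, error) {
	laddr, err := net.ResolveTCPAddr("tcp", src+":0")
	if err != nil {
		return 0, err
	}
	d := net.Dialer{LocalAddr: laddr, Timeout: 3 * time.Second}
	start := time.Now()
	conn, err := d.Dial("tcp", dst)
	if err != nil {
		return 0, err
	}
	conn.Close()
	return time.Since(start), nil
}

func main() {
	dst := "192.0.2.10:443" // placeholder target
	for _, src := range []string{"10.1.1.2", "10.2.1.2"} { // one source per path
		rtt, err := probe(src, dst)
		if err != nil {
			fmt.Printf("%s -> %s: %v\n", src, dst, err)
			continue
		}
		fmt.Printf("%s -> %s: connect took %v\n", src, dst, rtt)
	}
}
```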
- Console network setup
  - Physical topology of 2x OpenGear OM2224 spines and 16x OpenGear IM7248 ToRs
  - Current state: the spines are routable, providing a loop for the firewall management ports; cellular is not active; the ToRs lack Ethernet routing; all end-device access is currently through nested console sessions
  - FRR experience required; OpenGear experience a bonus
  - OpenGear cellular fallback does not play nicely with multipath and destroys routing when triggered. Cellular fallback should be manually implemented; a simple systemd timer with heartbeats over various paths is all that's needed (see the heartbeat sketch after this list).
  - Set up direct-to-end-device serial console via SSH (an existing feature; it just needs to be configured after Ethernet routing is set up)
  - Set up standardized versions for the IM7248s and OM2224s
  - Set up standardized credentials for the IM7248s and OM2224s
  - Need a business cellular plan for the OM2224s. Previously used prepaid consumer plans, which are not an option: AT&T allows OpenGears but bars Palo Alto traffic on consumer plans. Verizon 4G coverage in the colo area is inconsistent. One of T-Mobile's 4G bands is unsupported by the OM2224's modem.
  - At least 10 GB/mo on each, 50+ preferred. This is for emergency access; there is a world where a catastrophic failure requires recovery involving pushing images over these connections, and hitting a plan limit in an emergency is the last thing we should ever have to deal with.
  - Will need to work with the colocation vendor to coordinate antenna extension installation on the roof to ensure a reliable cell signal
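A sketch of the manual cellular fallback described above: a small Go binary run from a systemd timer that heartbeats over the wired paths and only installs the cellular default route when all of them fail. The heartbeat targets, the wwan0 interface name, and the ip route invocations are assumptions about the OpenGear Linux environment.

```go
// Sketch: heartbeat over the wired paths; steer the default route at the
// cellular interface only when every wired path is down. Targets, interface
// name, and route commands are placeholder assumptions.
package main

import (
	"log"
	"net"
	"os/exec"
	"time"
)

// reachable reports whether a TCP connect to addr succeeds within the timeout.
func reachable(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	// One heartbeat target per wired path (placeholders).
	targets := []string{"10.0.0.1:22", "10.0.1.1:22"}
	for _, t := range targets {
		if reachable(t) {
			// A wired path is healthy: ensure the cellular default route is
			// absent. The error is ignored when the route is already gone.
			_ = exec.Command("ip", "route", "del", "default", "dev", "wwan0").Run()
			return
		}
	}
	// All wired paths down: fall back to cellular.
	if err := exec.Command("ip", "route", "replace", "default", "dev", "wwan0").Run(); err != nil {
		log.Fatalf("enabling cellular fallback: %v", err)
	}
	log.Println("all wired heartbeats failed; cellular default route installed")
}
```

Run from a systemd timer (e.g., OnUnitActiveSec=30s), this keeps the cellular route out of the table until it is actually needed, avoiding the multipath breakage the built-in fallback causes.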