Private Cloud Architecture – pt2 – Selecting Providers

This is the second post of my four-part series discussing private cloud architecture. In this post I will discuss selecting the various providers necessary to build your private cloud. The first post in this series can be found here: Private Cloud Architecture – pt 1

A lot goes into selecting each provider type: location, capabilities, required buildouts, and contracting details will all impact your selection process.

To be honest, this is the most daunting part of setting up a private cloud. There are so many pieces that have to move forward at the same time, and there’s simply no room for compromise; otherwise you may pay the price in the long run.

Note that we are choosing not to rely on our datacenter providers for any internet bandwidth or private access from one datacenter to another. Even if you choose to use the same datacenter provider for all 3 locations, you want to be in control of your network; don’t let them sell you on private line services or their own internet services. You are trying to control your own destiny here. If you’re willing to outsource control, you might as well use a public cloud.

Contracting Considerations

It is highly advisable that all contracts from providers have legal review. Most providers will actually have you execute two contracts: a Master Services Agreement (MSA) and a Sales Order. Often terms in the Sales Order can override terms in the MSA. Request the MSA up front when they provide you a quote so you can identify issues early in the process; it may impact your decision on which provider(s) you want to use.

In general, I’d recommend starting out with a 3 year initial term for all providers as it will usually give you a good amount of leverage for pricing and terms negotiation. All contracts are negotiable and most contracts will contain a few key terms you need to look out for:

1. Evergreen clauses. An evergreen clause is an auto-renew clause for a long duration, often with a very short window which can be used to terminate the contract (e.g. auto-renew for a subsequent 3 yr period, but you have to notify them 75-45 days before expiration of the current term. If you notify them outside that window, or via a different communication means than they state, they will auto-renew). Evergreen clauses are outlawed in many states as a deceptive practice for contracts with consumers, but still allowed in business-to-business contracts. You should never accept an evergreen clause; all providers I’ve dealt with have agreed to remove such a clause. Your contract should go month-to-month after the initial term. When removing an evergreen clause, it is standard for the provider to want a longer termination notice, such as 90 days.
2. Automatic price increases. Many providers will have a clause to auto-increase your service price. You should negotiate both the amount of the increase and how often they are allowed to make price adjustments. Most providers will agree to a price lock during the entire initial term; however, some may require a modest increase annually. Always ensure price increases are fair (either tied to CPI, or in the range of 3-5%). Some providers may charge a higher rate (10-15%) when a contract’s initial term is over and it transitions to month-to-month; don’t be fooled into accepting an evergreen clause because of this. Simply negotiate a new contract at the end of your initial term if you wish to keep the provider.
3. Service Level Agreements (SLAs). Every contract with a provider must have a service level agreement to protect your interests. The primary purpose should be to allow you to terminate the contract without penalty if they do not meet the terms of the SLA. Often the initial SLA in the contract will just give you the ability to request compensation for outages; this is only acceptable for infrequent incidents, and the SLA must also allow you to terminate for systemic issues.
4. Contract Termination Fees. Most (all?) providers will require you to pay out the remainder of your contract in the event you want to terminate before your contract term is complete (even for contracts auto-renewed due to an evergreen clause). It never hurts to ask for a fixed-rate fee to your advantage, but don’t expect it to be granted. Focus more on #1 (Evergreen clauses) and #3 (SLAs) above so you don’t need to exercise such a clause in the first place.

Datacenter Provider(s)

Choosing your datacenter providers is the most critical component of bringing up a private cloud, and it is all about location, location, location. Carriers are easy to switch out and replace, but having to move all your equipment and endpoints for carrier connections is infeasible. Spend the most time and due diligence on your datacenter locations and providers.

Identifying target cities

In the first part of this series, we discussed that you need 3 datacenters in a roughly equilateral triangle, each 100-500 miles apart. Realistically, you’ll want to shoot for somewhere in the middle of that range (let’s say 250ish miles). So first you want to figure out the major cities that might fit your needs. Optimally, you will want to locate one of the datacenters in the same city where your company resides (if it is a major city), simply for the convenience factor.

No matter what, you really want all 3 datacenters to be within driving distance of your IT staff that will be installing and servicing the equipment. When we perform a deployment, we usually have boxes of cables of different types and lengths and various spare parts packing the entire vehicle; after all, these aren’t components you can just pick up at the closest Walmart. It can also be hard to come up with a precise inventory of what you’ll need, and you can guarantee some cable or transceiver is going to be bad, or the wrong length. In short, don’t plan on flying, you’re not going to be able to carry it all with you and you’ll never ship everything you might need.

The next consideration when choosing the appropriate locations for your datacenters is that you should locate one in a city that is a major interconnection point for the region. In the US, that means you should really be targeting Northern Virginia/Washington DC, Atlanta, Chicago, New York City, Dallas, Silicon Valley, Los Angeles, or Seattle for one of your datacenter locations. Such a location will provide a vast number of carriers to choose from and provide the lowest latency to the rest of the world. This is also the city where you may want to pick up other peering relationships, such as through an Internet Exchange Point.
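To screen candidate city pairs against the 100-500 mile target, a great-circle distance check is a reasonable first pass. The sketch below is only that: the coordinates are approximate city-center values I am assuming for illustration, the function names are hypothetical, and straight-line distance will always be shorter than the real fiber path, so treat the result as a lower bound.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def in_target_range(miles, low=100, high=500):
    """True if a city pair falls inside the 100-500 mile window."""
    return low <= miles <= high

# Approximate city-center coordinates (assumed, for illustration only).
washington_dc = (38.9072, -77.0369)
new_york = (40.7128, -74.0060)
atlanta = (33.7490, -84.3880)

dc_nyc = great_circle_miles(*washington_dc, *new_york)
dc_atl = great_circle_miles(*washington_dc, *atlanta)

print(f"DC to NYC:     {dc_nyc:.0f} mi, in range: {in_target_range(dc_nyc)}")
print(f"DC to Atlanta: {dc_atl:.0f} mi, in range: {in_target_range(dc_atl)}")
```

Run the same check across all three pairs of a candidate triangle; any pair that is already near 500 miles in a straight line will likely blow the budget once the actual fiber route is known.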

Identifying the best datacenter in each city

Ok, now you have identified 3 target cities, so it’s time to choose a datacenter provider in each city. It is very unlikely you will use the same provider in every city; this will take some research. Typically each city has a single city block or building where pretty much all major carriers build out to first (often an old Ma Bell site), then serve the rest of the city with “tails”; if at all possible, that is where you want to be. I’ve seen tail circuits take unbelievable paths that can literally double the distance of a long haul link by weaving through the city. As you can imagine, the longer the path, the less reliable it will be and, of course, the higher its latency.

Actual fiber path provided by a carrier to non-optimal datacenter locations. Note how the fiber runs directly in front of each datacenter to the center of each city, then retraces its path back to the datacenter locations. The tails doubled the overall distance in this real-world example.

Major datacenter players like Digital Realty and Equinix house some of these major interconnection points, and there are often multiple providers on the same block or within the same high-rise building; such buildings are often referred to as Carrier Hotels. A good starting point is to search PeeringDB for exchanges and take note of the number of networks at each facility to determine where the likely demarcation point in the city may be. If the identified datacenters have no available capacity (I’m looking at you, 56 Marietta in Atlanta), often one of the same providers will have a sister location less than a mile away and offer MeetMe room services that mirror the main facility.

Finally, you need to make sure the provider you’re choosing has all the appropriate certifications you might require. This may include things like PCI-DSS and SOC 2, which your contract needs to include as an ongoing requirement.

What to order

Now, let’s talk about what you will need to order:

Physical space. Unless you know you have extremely high compute needs up front, often a single full cabinet is sufficient starting out. You’d be surprised what 5x 1U Servers can handle (plus the associated support network equipment). That said, if you get lucky at a major interconnection point and they have the capacity to let you get in there, it could be wise to get 2 side-by-side cabinets so you’re guaranteed future capacity (as it’s likely there will be insufficient capacity in the future).
Power. At a minimum, you need 2x 120v@20A circuits for redundancy (A/B power). More than likely, however, 20A will not be sufficient, as you really don’t want to use more than 80% total across all your equipment to be able to handle spikes and circuit failures. I’d recommend going with 120v@30A at a minimum, but really 208v@20A (usually single phase) is more efficient and provides more capacity (but check your equipment for compatibility; most will be able to handle 120v-240v natively).
Cross Connects. A cross-connect is basically just an unlit single-mode fiber pair connecting your cabinet to the MeetMe room of the datacenter so you can connect to the various carriers (either Point 2 Point or IP Transit). This is a major profit center for the datacenter providers, there is often an upfront connection fee plus a monthly fee. You can usually get the upfront fee waived when negotiating the contract, however expect to pay $250-350 per month per cross connect. You can argue all you want about the price being unreasonable (and it is for how little they actually do and what they provide), but it’s not going to change. For this example private cloud buildout we only need 3 cross connects, for larger buildouts you may be able to get bulk discounts.
Blended Internet for Emergency Remote Access. This is a single Cat5 (100Mb or 1Gb) Ethernet link to the datacenter’s own blended internet access, where they will give you a small subnet (usually an IPv4 /29) that you can use. This is strictly for backdoor emergency access, and will typically be connected to a dedicated machine running SSH (and of course some form of MFA), providing serial console access to all the networking gear (routers, switches, firewalls). This isn’t strictly required, but sometimes things can go sideways (often due to human error), and you really don’t want to have to drive out to a datacenter to correct things. When negotiating this component, let them know the use-case and that you’ll only ever be using a few kilobits of data; you really want the minimum bandwidth they will sell you (target is under $100/mo including all connection fees).
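To put rough numbers on the power options above, here is a minimal sketch of the 80% rule. It assumes my reading of A/B redundancy, namely that the total load of the cabinet must fit on a single circuit at 80% so that losing one feed does not overload the survivor. The function name and the example load are hypothetical.

```python
def usable_watts(volts, breaker_amps, derate_percent=80):
    """Continuous load one circuit can carry at the given derating."""
    return volts * breaker_amps * derate_percent / 100

# The three circuit options discussed in the power item above.
options = {
    "120V @ 20A": (120, 20),
    "120V @ 30A": (120, 30),
    "208V @ 20A": (208, 20),
}

# Hypothetical cabinet load in watts (servers plus network gear); measure your own.
cabinet_load_watts = 2500

for name, (volts, amps) in options.items():
    budget = usable_watts(volts, amps)
    verdict = "ok" if cabinet_load_watts <= budget else "too small"
    print(f"{name}: {budget:.0f} W usable per feed -> {verdict} for {cabinet_load_watts} W")
```

Under these assumptions the budgets work out to roughly 1920 W, 2880 W, and 3328 W per feed, which is why 20A at 120v tends to run out quickly.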

In all, you should expect to pay around $1500/mo on average for each datacenter provider for all the listed services above. You’ll also want to ensure that your contract doesn’t start until you need the space; datacenters can usually provision your environment quickly, but the carriers will often take much longer.

Finally, make sure you request a carrier list housed in each datacenter as these will be necessary to identify the various point 2 point link and ip transit providers that we will discuss in the next sections.

Point 2 Point Link Providers

Up until this point I’ve been somewhat vague on what I mean by a Point 2 Point link. I’m specifically talking about a Type 1 Unprotected Wavelength. Do not entertain any other offerings such as MPLS, EVPN, or “Ethernet”; they will have unpredictable latency and performance. In particular, I am recommending a 10Gbps wavelength as the sweet spot. Don’t go any lower, as it isn’t cost effective (especially considering cross-connect fees).

We will need 3 wavelength circuits for this private cloud, as we connect every datacenter to every other datacenter. This means you need to choose 3 different providers. Do not use the same provider for more than one wavelength, even if they tell you they can guarantee divergent paths, as there are other opportunities for simultaneous failure when using the same provider.

Wavelength Information

A wavelength is just what it sounds like: a wavelength (aka lambda) of light through fiber. Using DWDM, a single strand of fiber can support 96 simultaneous wavelengths/channels. Using a wavelength offering is optimal as the entire path from A to Z stays in the optical domain: passive filters separate and combine channels, and optical amplifiers (erbium-doped fiber amplifiers) boost the signal without converting it to electrical form, so the path adds essentially no latency beyond propagation delay. That said, providers will use electronics to convert back to 1310nm at either end so you can plug into customer equipment without specialized transceivers, and to ensure you can’t exceed the contracted speed.

We want to use wavelength services ONLY as they are considered Layer 1 services and you can use them however you want (e.g. use VLANs, change MTUs, etc), and are guaranteed to be the lowest latency point to point connection.

The “Type 1” simply means the provider owns the entire path and isn’t using another provider for “last mile” services. “Unprotected” means it is a single dedicated path; if there is a fiber cut, it will not reroute. This is fine because we are architecting redundancy ourselves and don’t want an unpredictable path.

Due to the labor cost of running fiber, most fiber is laid as a bundle containing 288 strands, which would be 144 pairs (fiber is typically used in pairs, one for Tx one for Rx). It’s not unusual for long haul paths to lay more than one cable bundle at a time, then of course you have multiple providers laying their own fiber with similar paths between major cities. Multiply this by 96 possible channels/wavelengths, and you can see there is a lot of available capacity out there making wavelengths a cost effective offering.

Path Verification

One of the most important parts of selecting your providers is determining which providers have optimal paths. You should request latency estimates, distances, as well as a KML/KMZ file so you can view the actual fiber paths in Google Earth. The speed of light through fiber is 31% slower than in a vacuum…distance matters. The target latency needs to be 10ms or less for each wavelength circuit; this gives you a 20ms max latency between any 2 datacenters even if there is a fiber cut. This is important if you plan on using synchronous replication between datacenters. Make sure the fiber distance is in the contract!
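As a sanity check on the latency estimates a carrier hands you, propagation delay can be derived from the quoted fiber distance. The sketch below uses the 31% figure from above and ignores equipment delay at the endpoints. The function names and the 250-mile route are hypothetical, and the detour calculation works in whatever unit (one-way or round-trip) your latency budget is expressed in.

```python
C_VACUUM_KM_PER_S = 299_792.458   # speed of light in a vacuum
FIBER_SLOWDOWN = 0.31             # light in fiber is about 31% slower
KM_PER_MILE = 1.609344

def one_way_latency_ms(fiber_miles):
    """Propagation delay over a fiber route of the given length."""
    speed_km_per_s = C_VACUUM_KM_PER_S * (1 - FIBER_SLOWDOWN)
    return fiber_miles * KM_PER_MILE / speed_km_per_s * 1000

def detour_latency_ms(leg_latency_ms):
    """For a 3-datacenter triangle, latency between the two ends of each leg
    if that leg is cut and traffic detours through the third datacenter."""
    total = sum(leg_latency_ms.values())
    return {leg: total - ms for leg, ms in leg_latency_ms.items()}

# Hypothetical 250-mile fiber route.
one_way = one_way_latency_ms(250)
print(f"250 fiber miles: {one_way:.2f} ms one way, {2 * one_way:.2f} ms round trip")

# Three legs that each sit exactly at the 10ms target.
print(detour_latency_ms({"A-B": 10, "B-C": 10, "A-C": 10}))
```

If a quoted latency is far above what the quoted distance implies, either the distance is wrong or the path is longer than the KML suggests; both are worth raising before you sign.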

In addition to having an optimal path, you need to look for path intersections across your potential providers. If you see 2 fiber paths traversing the same 20 mile stretch before diverging to the 2 other datacenters, that is something you want to avoid by selecting a different provider for one of the paths (or ask your providers if there are alternate paths). It is possible they are sharing a stretch of dark fiber (something they still consider Type 1 services, even though they don’t own the dark fiber) or the same conduit, so if a fiber cut were to occur in that region, it would completely isolate that datacenter.

Another thing to consider is actual ingress to the datacenter provider itself. Often large datacenter providers will have more than one entry point, so you should work with your providers to determine if the two you choose for any given datacenter can ingress through different entry points to avoid any “last mile” issues.

Cost and Considerations

Often the first quote you get from each provider will be very high; negotiate! From my experience, you can easily get a 10Gb wavelength for around $1000/month, but I’ve heard of people getting significantly better pricing (granted, they’re probably purchasing more than one per provider).

Another thing to be aware of: when a circuit crosses state boundaries, the USF tax will add another 30% or so to the final cost, and the rate fluctuates every quarter.

Finally, make sure you have your providers do full capacity verification checks and provide an SLA for installation. The SLA for installation should simply allow you to cancel without penalty if they can’t meet the timeline. It’s not unusual for a salesperson to tell you they can provision you in 3 weeks, only for it to turn out they need to do a buildout and it’s 9 months. 90-120 days is pretty typical for installations, unless you’re Telia Carrier (now Arelion), by far the most competent carrier I’ve worked with. Don’t forget to validate latency upon delivery. It’s not unheard of for carriers to mis-build the circuit, sometimes traversing multiple times the original distance estimate; point to your contract and have them rebuild.

IP Transit Providers

When choosing an IP Transit provider, you want to select Tier 1 providers, also known as backbone providers. These are providers that exchange traffic with other Tier 1 providers without paying any fees; they do this as they exchange roughly equal amounts of traffic. Different continents might have their own Tier 1 providers.

As with Point 2 Point link providers, you will want to use 3 different providers; never use the same provider twice for the same service type. It would, however, be considered acceptable to use the same provider for one Point 2 Point link and one IP Transit link.

A good starting place in determining which providers are considered Tier 1 would be to reference the CAIDA AS Rank and sort using the AS Cone size. You’d then compare this list to the provider list for each datacenter you’ve selected. You should also do additional due diligence with each provider to ensure they don’t have a history of peering disputes. For instance it would appear Hurricane Electric and Cogent would both be Tier 1 candidates, however, to my knowledge, they still have an ongoing peering dispute and do not exchange traffic resulting in an IPv6 black hole.

Also, in our architecture we’ll be using 2 edge routers at each datacenter location, so make sure you let the IP Transit provider know you’ll be needing to have 2 peers upfront as it will change the allocation they provide you from a /31 to a /29 for IPv4. For IPv6 a /126 can be used for this architecture (as unlike IPv4 there is no network base or broadcast reservation), but for some reason most providers want to allocate a /125 when you have more than 1 peer. In fact, even if you only had one peer, I’ve never seen anyone use an IPv6/127 for some reason; perhaps there is some sort of legacy bug that is known in some routing devices.
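The allocation sizes above follow from simple address counting: one provider-side address plus one per edge router. The sketch below uses Python's standard `ipaddress` module with documentation prefixes (192.0.2.0/24 and 2001:db8::/64) as stand-ins. It assumes the provider uses a single address on its side, treats an IPv4 /31 as a two-address point-to-point link, and follows the statement above that IPv6 reserves nothing. The function names are hypothetical.

```python
import ipaddress

def usable_addresses(net):
    """Addresses assignable to router interfaces on a link subnet."""
    if net.version == 4 and net.prefixlen < 31:
        return net.num_addresses - 2  # network base and broadcast are reserved
    return net.num_addresses          # IPv4 /31 or IPv6: nothing reserved (assumption)

def smallest_link_subnet(base_cidr, routers_needed):
    """Longest prefix inside base_cidr with room for routers_needed addresses."""
    base = ipaddress.ip_network(base_cidr)
    for prefixlen in range(base.max_prefixlen - 1, base.prefixlen - 1, -1):
        candidate = next(base.subnets(new_prefix=prefixlen))
        if usable_addresses(candidate) >= routers_needed:
            return candidate
    return None

# One provider router + one customer router, then + two customer routers.
print(smallest_link_subnet("192.0.2.0/24", 2))   # 192.0.2.0/31
print(smallest_link_subnet("192.0.2.0/24", 3))   # 192.0.2.0/29
print(smallest_link_subnet("2001:db8::/64", 3))  # 2001:db8::/126
```

A /30 has only two usable IPv4 addresses, which is why the allocation jumps straight from a /31 to a /29 once a third router is involved.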

Finally, don’t forget to apply the same contracting guidelines in reference to SLAs as with Point 2 Point providers. For a 10Gbps symmetric (unmetered) link delivered via Single Mode Fiber, expect to pay $800/mo on average per provider. If you don’t need that amount of bandwidth, you can consider 95th percentile contracts that allow bursting up to the full 10Gbps connection, often for much less. Most providers will offer add-on DDoS services; if you have such business needs, you should contract for DDoS services from every IP Transit provider. When subscribing for DDoS, only traffic destined for you from that provider should be scrubbed. This is a better approach than using a single DDoS provider that scrubs all traffic and then has to deliver it back to you via GRE tunnels or similar (which also creates a single point of failure, exactly what this architecture is trying to avoid).
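For anyone unfamiliar with 95th percentile billing, the usual idea is that the carrier samples your utilization at a fixed interval, throws away the top 5% of samples for the month, and bills at the highest remaining sample. The sketch below illustrates that convention only; the exact sampling interval and whether inbound and outbound are measured separately vary by provider and should be confirmed in the contract. The function name and traffic figures are hypothetical.

```python
def billable_mbps(samples_mbps):
    """95th percentile: sort the samples, drop the top 5%, bill the highest remaining."""
    if not samples_mbps:
        raise ValueError("no samples")
    ordered = sorted(samples_mbps)
    # Integer form of ceil(0.95 * n) - 1, avoiding floating point rounding.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]

# Hypothetical month: a steady 200 Mbps with bursts to 9000 Mbps in 4% of samples.
samples = [200] * 96 + [9000] * 4
print(f"Billed at {billable_mbps(samples)} Mbps")
```

In this example the bursts fall entirely within the discarded top 5%, so the bill reflects the steady 200 Mbps rather than the peaks. Bursting in more than 5% of samples would push the billed rate up to the burst level.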

Conclusion

This concludes our provider selection post. Hopefully you have found this information useful for understanding what goes into provider selection. For those following along for price estimates, the monthly costs for all service providers (before tax) for the entire cloud architecture come out to around $10k/month.
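For reference, that total is just the per-provider estimates from each section added up across the 3 datacenters, 3 wavelengths, and 3 IP Transit links. A quick tally, using the figures quoted earlier in this post (your negotiated prices will differ):

```python
# Monthly estimates quoted in this post, before tax.
monthly_costs = {
    "datacenter (space, power, cross connects, emergency access)": (1500, 3),
    "10Gbps wavelength": (1000, 3),
    "10Gbps IP transit": (800, 3),
}

total = 0
for item, (unit_price, quantity) in monthly_costs.items():
    subtotal = unit_price * quantity
    total += subtotal
    print(f"{quantity} x {item}: ${subtotal:,}/mo")

print(f"Total before tax: ${total:,}/mo")
```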

One thing I didn’t mention above … when choosing a P2P or IP Transit provider, I’d recommend avoiding any carrier that also happens to provide Cellular service (or at least the top couple providers)… just talking from experience here 🙂

In the next post in this series I will discuss the hardware architecture (and costs) when designing your private cloud.

The next post in this series is now available: Private Cloud Architecture – pt3 – Hardware