16/12/2015

13/12/2015

LibreOffice macro to fix common mistakes with Regex - Python.

However I'm a Linux system admin, but I have other activities too. As a technical writer and an author, I work a lot with text, and as usual I always try to automate what can be automated even outside system administration!
Continue Reading »

23/11/2015

14/11/2015

05/11/2015

25/10/2015

Opsview/Nagios as dynamic inventory for Ansible - Python.

Abstract.

I needed to a unified inventory for all servers I manage, so I made a Python script retrieves servers list and ssh ports from Opsview via APIs, and it supports 4 kinds of output formats: You can find it on Github: Opsview as Ansible inventory.
It looks like a hack more than anything else, and most importantly .. it works! :D
And for now it's enough for current needs.

More details.

Previously, I made a script with AWK and Bash to convert hosts in SSH config file (~/.ssh/config) to Ansible -static- inventory. It was good because I already have most of servers in my SSH config file, but recently I needed to make it more centralized, I need to one place has all servers information!

Puppet clan has many tools for that like Foreman, but I didn't find anything simple and modern for Ansible except Ansible Tower, which is great but too advanced and more than is required. Eventually I found we almost add all servers to monitoring server (mostly based on Nagios either Icinga or Opsview). I didn't like idea of getting data from configuration or database directly, so I decided to make it with APIs.

While Nagios has plugins to get APIs like Nagira, and Icinga team are still working on their APIs (it will be released next November and looks AWESOME! And I'm going to make another script for Icinga once they are finished), so I started with Opsview which is already has a mature APIs.

Approach.

In fact, you can easily get a just list of servers from Opsview with minimum work needed, but as mentioned before, I want one place has all servers with SSH port (thus most of extra work will be to get SSH port!).

So the information I need is:
  • Server Group.
  • Server Name.
  • IP/Hostname.
  • SSH port.

But I need a reliable way to get these data, I want the solution has the following:
  • Work remotely, that's why I preferred APIs in the first place.
  • Don't make a lot of load on server, and don't make many huge queries! Think in terms of lots of servers on your monitoring server.
  • Can be selective, so I have control over servers, for example some devices working through SNMP not SSH like most of routers and switches.

Gathering data.

Almost any monitoring system will have 3 of those information: Group, Server name, and IP/Hostname ... so we just need to SSH port. The cool thing with monitoring system as inventory is SSH port will not be just a written number! It is going to make sure it really working (by default Opsview has SSH port check).

I found just one problem with relying on active check in monitoring system, by default SSH output is something like that "TCP OK - 0.082 second response time on port 22", but if the server -for whatever reason- cannot check SSH port, will give Socket timeout e.g. "Socket timeout after 10 seconds"!

That means you cannot use it as inventory, because it's real check not just fixed number! But if you decided use fixed number will lose the advantage of active check (which check if the port already opened/listen).

So, to get best of the both worlds, we are going to get the data from active check and if it don't have the port for any reason, will get it from a fixed value.

Requirements.

I tried to make it simple and generalize the solution, reduced number of dependences, and use default settings as possible, also minimize modifications and pre-configuration.

So, what do you need to use this script with Opsview? (I will assume you are familiar with Opsview and its terminologies).
  • A "Host Template" has all servers you want them to be in your inventory. You can call this template anything, we just need its ID.
  • A new dummy check script on Opsview server (we will take a look on it after a bit).
  • A "Service Check" using the dummy script we added to main Opsview server.
  • An Opsview user with administrator privileges.

How it works?

Once you made the necessary dependences, here is simply how this script work:
  • Call Opsview API to get content of the "Host Template".
  • Store content of "Host Template" which are server name and server URL in Opsview.
  • Query for every URL in the list to get "Server Group" and "IP/Hostname".
  • Make a query for active SSH check, and if it return ssh port number it will stop here.
  • If active check didn't return ssh port number, then it will check passive check which always will return a number.

Now we have all the data we need which are again: Server Group, Server Name, IP/Hostname, and SSH port. We need just format it in one of 3 formats: OpenSSH config, Anisble JSON, or pure JSON.

Configuration.

After this long introduction, let's go ahead and make some configuration in Opsview for the script. And as mentioned before most of extra work here is just to get SSH port!
  • Create a "Host Template" in your Opsview contains all servers you need to be in your inventory, and get its "ID" (that appears in template URL). So, create a new Host Template and add whatever servers you want to it (or just clone existing one):
    Settings > Basic > Host Template > Create new Host Template


    Just hover the mouse over Template name and you will see the ID:
      https://YOUR_OPSVIEW_URL/admin/hosttemplate/edit/71
    
  • Second step, create a dummy ssh check, which is just a bash script prints any arguments passed to it. Unlike "SSH" check, "SSH-Non-Active" is not a default check, and you need to add it to your Opsview.
    Connect to Opsview server though SSH, and create "check_ssh_dummy" file with following commands:
    cat << EOS > /usr/local/nagios/libexec/check_ssh_dummy
    #! /bin/bash
    echo $*
    exit 0
    EOS
    
    chown nagios: /usr/local/nagios/libexec/check_ssh_dummy
    
    chmod ug+x /usr/local/nagios/libexec/check_ssh_dummy
    
  • Now create a new "Server Check" (or just clone existing one) and name it "SSH-Non-Active":
    Settings > Advanced > Service Checks > Create new Service Check


    As you can see here, you actually can add this check to multiple templates, so if you have many servers with different ssh ports, you actually can set the port to bunch of server with few clicks, or set the port individually for any server (That's why I did choose make a new check not using attribute, which is hard to edit it with massive servers).


  • Create a new Role (Opsview user) with Administrator privileges:
    Don't forget to reload Opsview to read new configuration. We almost done and ready to use the script now! But let's take a look on how it actually works?!
    Settings > Basic > Contacts > Create new Contact
  • Now, you have all required configuration, all what you need is just edit those configuration inside the script or inside the ini file. You have following values to edit:
    [Defaults]
    ; Opsview URL, Role (Opsview user), and its Password.
    ; Security level required for user is "Administrator".
    "Opsview URL": ""
    "Opsview User": ""
    "Opsview Password": ""
    
    ; SSH user that will be printed.
    "SSH User": "root"
    
    ; The ID of template that has all servers in Opsview.
    ; You need to find ID number in your Opsview.
    "Template ID": ""
    
    ; Name of real SSH check.
    "Active check name": "SSH"
    
    ; Name of dummy SSH check.
    "Passive check name": "SSH-Non-Active"
    

Syntax.

You can pass some arguments to this script or edit defaults values directly inside the script OR use ini file, but if you are going to use it with Ansible, you have to set the defaults values, because Ansible pass two arguments only to inventory script which are "--list" or "--host".

You need at least one of 3 arguments: "--json", "--ssh", or "--list" (or the human readable version of the argument "--ansible"), all other variables stored in the script.

    usage: opsview-ansible-inventory.py [-h]
                                        [--template-id TEMPLATE_ID]
                                        [--active-check-name ACTIVE_CHECK_NAME]
                                        [--passive-check-name PASSIVE_CHECK_NAME]
                                        [--user USER]
                                        [--json] [--ssh] [--list]
                                        [--host HOST]

    optional arguments:
      -h, --help                show this help message and exit
      --template-id TEMPLATE_ID
                                The ID of "Host Template" that has all servers in your Opsview.
      --active-check-name ACTIVE_CHECK_NAME
                                Name of active "Service Check" as in your Opsview. Default is "SSH".
      --passive-check-name PASSIVE_CHECK_NAME
                                Name of passive "Service Check" as in your Opsview. Default is "SSH-Non-Active".
      --user USER               Name of SSH user that will be printed as output. Default is "root".
      --json                    Print output as JSON format.
      --ssh                     Print output as OpenSSH config file format.
      --ansible, --list         Print output as Ansible dynamic inventory format.
      --host HOST               Ansible option to get information for specific host.

Output example.

I will assume you already edited the values inside the script, and just need to select output format.

Ansible dynamic inventory (--ansible or --list):
./opsview-ansible-inventory.py --ansible --user xuser
{
    "Group1": {
        "hosts": [
            "Server1", 
            "Server2"
        ]
    }, 
    "Group2": {
        "hosts": [
            "Server3", 
            "Server4",
        ]
    }, 
    "_meta": {
        "hostvars": {
            "Server1": {
                "Hostname": "10.0.0.1", 
                "Port": "22", 
                "User": "xuser"
            }, 
            "Server2": {
                "Hostname": "10.0.0.2", 
                "Port": "22", 
                "User": "xuser"
            }, 
            "Server3": {
                "Hostname": "10.0.0.3", 
                "Port": "22", 
                "User": "xuser"
            }, 
            "Server4": {
                "Hostname": "10.0.0.4", 
                "Port": "22", 
                "User": "xuser"
            }
        }
    }
}

SSH config file(--ssh):
./opsview-ansible-inventory.py --ssh --user xuser
##############################
# Groupname Group1

Host Server1
  Hostname 10.0.0.1
  Port 22
  User xuser

Host Server2
  Hostname 10.0.0.2
  Port 22
  User xuser

##############################
# Groupname Group2

Host Server3
  Hostname 10.0.0.3
  Port 22
  User xuser

Host Server4
  Hostname 10.0.0.4
  Port 22
  User xuser

Pure JSON (--json):
./opsview-ansible-inventory.py --json --user xuser
{
    "Group1": {
        "Server1": {
            "Hostname": "10.0.0.1", 
            "Port": "22", 
            "User": "xuser"
        }, 
        "Server2": {
            "Hostname": "10.0.0.2", 
            "Port": "22", 
            "User": "xuser"
        }
    }, 
    "Group2": {
        "Server3": {
            "Hostname": "10.0.0.3", 
            "Port": "22", 
            "User": "xuser"
        }, 
        "Server4": {
            "Hostname": "10.0.0.4", 
            "Port": "22", 
            "User": "xuser"
        }
    }
}

Conclusion.

Yeah yeah I know, this script isn't a perfect solution, and It looks like a hack more than anything else :D

Build in variables (it got ini file now), and external dependences, and a large number of queries, but in an Agile environment you cannot wait for the perfect solution, and you should make your own, so it's enough for current needs :-)
Continue Reading »

15/10/2015

Getting Started with Ansible - Presentation.

If you looked at the sidebar of this blog, you will find I introduce myself as linux system admin who is too lazy to do anything manually :D

I have no problem to digging and troubleshooting an issue for hours, or dealing with new technology from zero, but
Continue Reading »

09/10/2015

02/10/2015

27/09/2015

Efficient and quick search with find and grep - Bash

When you need to search inside files, as linux user you will think in "grep", but when you try to search in zillion of files (or just thousands), grep isn't the best option for that!

It's very slow especially when you use "-r" option (recursive), even if you are going to use "fgrep" or "grep -F" (--fixed-strings) which interprets patterns as a list of fixed strings), thus it's a little bit faster because it treats the strings literally and doesn't try to parse them with regex.
Continue Reading »

05/09/2015

06/08/2015

21/07/2015

CloudU Certificate? good but not for 2015!

Today I got my "CloudU Certificate", after finishing the curriculum. It contains 10 chapters with 10 quizzes and final exam, it's pretty cool, but also it's a little bit old for 2015! However it's still a good refreshment to anyone even for someone already working in the field or intersted to learn about the theory of Cloud Computing.
Continue Reading »

01/07/2015

Book Summary 01: High Performance Browser Networking - Part1 (Networking 101)

Recently I started reading "High Performance Browser Networking", and If you ask me, I will say it's a masterpiece! Not just for developers, but also for SysAdmins and DevOps! This summary for first part of this great book, the first 4 chapters cover TCP, UDP, TLS, and networking essentials.

Actually, as a system administrator, I've seen a lot of developers do the same mistakes (specially Front-End developers), small mistakes lead to big performance issues! And those mistakes can be fixed with minimum efforts! So I really love this book slogan "What every web developer should know about Networking and Browser Performance!"

This post summarize about 20,000 words in less than 2500 words! However, I'm highly recommend to read the book, it's available online for free!

About The Book:

  • Name: High Performance Browser Networking.
  • Author: Ilya Grigorik - Web performance engineer and developer advocate on the Make The Web Fast team at Google.
  • Description: How prepared are you to build fast and efficient web applications? This eloquent book provides what every web developer should know about the network, from fundamental limitations that affect performance to major innovations for building even more powerful browser applications - including HTTP 2.0 and XHR improvements, Server-Sent Events (SSE), WebSocket, and WebRTC.

Part1 - Networking 101 (CH 1-4)

Ch1. Primer on Latency and Bandwidth.

  • Speed is a feature! faster websites lead to better engagement, better retention, higher conversions! Latency and bandwidth are the bottleneck here!
  • Latency has a lot of components: Delay in propagation, transmission, processing, queuing. So the "Latency" == "all delays between any two sides".
  • Even though we can reach maximum transfer speed using fiber optics but at the end, speed of light has limitations; which are distance and medium.
  • Ironically, we get max speed across the oceans due to less latency, but in the "last mile" between you and your ISP there are a lot of deterioration in speed! That's due to routing and materials between you and your ISP.
  • Bandwidth can do nothing with latency! Think in terms of a bigger hose on a far tank, water still needs to go all the way through the hose to reach you.

Ch2. Building Blocks of TCP.

  • The Internet has two main protocols; IP (Internet Protocol) and TCP (Transmission Control Protocol). Their first appearance was in 1974 by Vint Cerf and Bob Kahn.
  • HTTP -in practice- uses TCP whereas nothing can be done with latency itself, optimization and enhancements in TCP means faster HTTP.
  • TCP Property number zero ... Three-way handshake! (SYN .. SYN ACK .. ACK).
  • The delay imposed by the three-way handshake makes new TCP connections expensive to create, and is one of the big reasons why connections reuse is a critical optimization for any application running over TCP.
  • For the previous reason, a new mechanism has been developed "TCP Fast Open", which is:
  • An extension to speed up the opening of successive Transmission Control Protocol (TCP) connections between two endpoints. It works by using a TFO cookie (a TCP option), which is a cryptographic cookie stored on the client side and set upon the initial connection with the server - Wikipedia.
  • When both ends have different bandwidth capacity, the end with bigger bandwidth may send too much data to other; which may not be able to process it. So, in early versions of TCP, there were multiple mechanisms to address this problem such as "Flow Control", "Congestion Control" and "Congestion Avoidance".
  • Flow control is a mechanism to prevent the sender from overwhelming the receiver with data it may not be able to process - the receiver may be busy, under heavy load ... etc.
  • To address this, every system has a default value for the size of the available buffer space to hold called "RWND" (Receive Window)
  • RWND value is a part of every TCP connection so both sides can dynamically adjust data flow rate that fits each side (and when "RWND=0" this means this side can't get any more data).
  • Flow control can prevent both sides from overwhelming each other, but has nothing to do with the network or any mediator between them.
  • Slow-Start was created to address this issue using CWND (Congestion Window Size), CWND limits amount of data the receiver can have, so the server starts the connection with small amount of data and if there is no packet loss, it will double packet size till packet loss occurs. Then, the server will decrease the size of packets (actually, the server decreases CWND which in turn limits amount of data that is sent to receiver) and the process/it starts again.
  • Slow-Start is very important factor in HTTP connections because we cannot use the full capacity of the link immediately no matter how large the available bandwidth is. And this is a big problem of HTTP connections due to the nature of many of HTTP connections being short and hasty unlike other large connections such as streaming download. So, a lot of web applications performance will be limited with slow-start!
  • If there is a "slow-start" then of course there is "slow-start restart"! SSR is a mechanism which reset the CWND value when the connection becomes idle -even for a blink- so the connection will start again from CWND defaults. this option has bad effect for long-lived connections like HTTP keepalive so disable it.
  • To check and disable it on Linux do the following:
      $> sysctl net.ipv4.tcp_slow_start_after_idle
      $> sysctl -w net.ipv4.tcp_slow_start_after_idle=0
  • Starting from Linux kernel 2.6.39 initial congestion window got a bigger value (Value of 10 segments - IW10), but there are more improvements in kernel 3.2+ as we shall see later.
  • When slow-start hits bandwidth limits and packet loss occurs as mentioned before, there is another mechanism takes place called "Congestion Avoidance". Congestion Avoidance is a mechanism to minimize further loss.
  • Congestion Avoidance is under heavy R&D, so you will find many variants and a lot of implementations.
  • Proportional Rate Reduction for TCP:
    • Determining the optimal way to recover from packet loss is a nontrivial exercise: if you are too aggressive, then an intermittent lost packet will have significant impact on throughput of the entire connection, and if you don’t adjust quickly enough, then you will induce more packet loss!
    • Proportional Rate Reduction (PRR) is a new algorithm specified by RFC 6937, whose goal is to improve the speed of recovery when a packet is lost. How much better is it? According to measurements done at Google, where the new algorithm was developed, it provides a 3–10% reduction in average latency for connections with packet loss.
    • PRR is now the default congestion avoidance algorithm in Linux 3.2+ kernels another good reason to upgrade your servers!
  • Congestion control and congestion avoidance are applied on local networks too! and can also be the bottleneck in those high speed LANs. so you need to take care about this point even in local networks.
  • TCP provides an easy and reliable way to transfer data, but the core feature of TCP is its same weakness point, If one of the packets is lost enroute to the receiver, then all subsequent packets must be held in the receiver’s TCP buffer until the lost packet is retransmitted and arrives at the receiver’s and this effect is known as TCP head-of-line (HOL) blocking.
  • TCP is a great protocol, but not every application needs its features, so don't use it unless you need its features because there’s no way to get some and leave out all the rest.
  • Packet loss is an important part of how TCP works as we have seen before, so it's ok.
  • TCP is a flexible protocol and appropriate for a lot of environments and scenarios, so for the same reason, the best way to optimize TCP is to tune it based on your environment and specific conditions (Web, wireless, LAN, WAN ... etc).
  • Regardless environment, TCP has core principles and their implications remain unchanged.
    • TCP three-way handshake introduces a full roundtrip of latency.
    • TCP slow-start is applied to every new connection.
    • TCP flow and congestion control regulate throughput of all connections.
    • TCP throughput is regulated by current congestion window size.
  • Before you do any optimization in TCP configuration, make sure to upgrade your server because, as mentioned before, there are a lot of enhancements already evolved and implemented.
  • Performance Checklist:
    Optimizing TCP performance pays high dividends, regardless of the type of application, for every new connection to your servers. A short list to put on the agenda:
    • Upgrade server kernel to latest version (Linux: 3.2+).
    • Ensure that cwnd size is set to 10.
    • Disable slow-start after idle.
    • Ensure that window scaling is enabled.
    • Eliminate redundant data transfers.
    • Compress transferred data.
    • Position servers closer to the user to reduce roundtrip times.
    • Reuse established TCP connections whenever possible.

Ch3. Building Blocks of UDP.

  • UDP (User Datagram Protocol) is one of popular protocols nowadays, and is always referred to as the opposite form of TCP, Not because of the features that it has, but the features that it doesn’t have It’s so popular that even its RFC (RFC 768) can be written on a napkin!
  • The UDP protocol encapsulates user messages into its own packet structure, with only four additional fields 1. Source port 2. Destination port 3. Length of packet 4. Checksum (Actually; source port and the checksum fields are optional!!), and lack of other fields as in TCP results the following in UDP:
    • No guarantee of message delivery.
    • No guarantee of order of delivery.
    • No connection state tracking.
    • No congestion control.
  • We can summaries the difference between TCP and UDP at the level of message/information as follows:
    • TCP: Messages can be separated across packets.
    • UDP: Every message encapsulated in one packet.
  • Due to IPv4 being 32 bits long (that means 4,294,967,296 IPs only) and as a result of the increasing in number of devices joining networks. In mid-1994 NAT (Network Address Translators) was introduced. And it was supposed to be a temporary solution, but "there is nothing more permanent than a temporary solution"!
  • If you want to have the service built on UDP only, you will face an issue regarding establishing the connection between the two sides, because the UDP package knows nothing about devices behind NAT!
  • So there are some techniques and methods appeared to solve this issue ... STUN, TURN, and ICE (you can read about them on internet or Wikipedia) every solution has their pros and cons, and to get a reliable connection, you’re going to use one of them and others as failovers.
  • If you’re going to use UDP, you should know that you’re doing a lot of things on your own, remember, UDP doesn’t have any kind of "Flow Control", "Congestion Control" or "Congestion Avoidance".

Ch4. Transport Layer Security (TLS).

  • SSL and TLS are security protocols (asymmetric key cryptography) working in "Session" layer (the layer above TCP), although SSL and TLS technically are different protocols, but a lot of people still use them as equivalent, but they aren't! (actually SSL v3.0 was released in 1996, and as of 2014 the SSL 3.0 is considered insecure because of the "POODLE" attack, that means you should use TLS 1.2).
  • TLS was designed to provide three essential things: 1. Encryption 2. Authentication 3. Integrity.
    • Encryption - A mechanism to obfuscate what is sent from one computer to another.
    • Authentication - A mechanism to verify the validity of provided identification material.
    • Integrity - A mechanism to detect message tampering and forgery.
  • In addition to traditional TCP 3-way handshake, TLS has also other handshakes. And this means more full roundtrip which means it will take more time! (HTTPS = more time! and more if not carefully managed!) The encrypted tunnel must be negotiated: the client and the server must agree on the version of the TLS protocol, choose the ciphersuite, and verify certificates if necessary.
  • Performance of RSA (public-key cryptosystems) and ASE (symmetric-key cryptosystems) varies based on hardware, so you will need to make tests on your hardware to see the performance.
  • Due to high cost of full TLS connection, new techniques have been invented to add ability to resume TLS sessions, so no need to full roundtrip.
  • There are mainly two mechanism of TLS Session Resumption: 1. Session Identifiers 2. Session Tickets.
    • Session Identifiers: as shown by its name it works as follows every client gets unique ID that has some information from first connection, and the server caches this ID, when the client makes a new connection it will do an abbreviated handshake instead of a full one. A lot of web browsers wait for the first TLS/HTTPS connection to take advantage of this feature if available. The downside of this mechanism is that all things done on the server side; The server does a lot of processing, caching and maintaining for every client, and this means a lot of CPU and RAM usage.
    • Session Tickets: It is another mechanism to resume TLS connection. Unlike Session Identifiers, Session Tickets is a "Stateless Resumption", the server makes an encrypted record (ticket) that contains all information of client TLS session, and this record is stored by client. Next time the client makes a new connection it will just send this record. Although this record is stored on the client side, It is still safe because it is encrypted with a key known only by the server. (If you are using multiple load-balancers, make sure that all LBs use the same session key).
  • The main point of TLS and encrypted connections is about "Trust", you must be confident you are connected to the right destination when you make a secure connection! if you aren't so, the encrypted connection is meaningless. So the web browser should have a way to make sure it's connected to the right destination, and it has a choice of three:
    • Manually specified certificates: Every browser and operating system provides a mechanism for you to manually import any certificate you trust. How you obtain the certificate and verify its integrity is completely up to you.
    • Certificate authorities: A certificate authority (CA) is a trusted third party that is trusted by both the subject (owner) of the certificate and the party relying upon the certificate.
    • The browser and the operating system: Every operating system and most browsers ship with a list of well-known certificate authorities. Thus, you also trust the vendors of this software to provide and maintain a list of trusted parties.
  • Practically, Certificate Authority is the most commonly used method on the web, and CA actually is based on something called "Chain of Trust", Chain of Trust simply is when you trust "A" and this one give a trust for "B", then you can trust "B" based on your trust in "A" and this is what happens when "Root CAs" make certificates for any website that wants to use trusted SSL, and every system and browser are preloaded with all CAs they trust by default.
  • The problem appears when you want to check: Does "A" still trust "B"? Is "B"'s certificate still valid? Hence there are many mechanisms that have been developed to check certificate revocation.
    • Certificate Revocation List: CRL it's a list with all revoked certificates, and any browser needs to check the status of a certain certificate has to download the list and find the status of the certificate. the downside of this method is a long list means a long time to download, and there is no notification mechanism when the list is updated.
    • Online Certificate Status Protocol: OCSP came to solve CRL problems, it allows browsers to query about a specific certificate, although this mechanism is faster and uses less bandwidth than CRL, it creates some other problems like that the CAs must create, maintain this service as well and make sure it's up and working as expected. Also this may slow down client navigation because it will be blocked till OCSP query is finished.
  • In reality, CRL and OCSP complement each other and both are used.
  • Due to TLS actually working over TCP, there is no need for additional modifications -or just small ones- in apps to use TLS. And for the same reason the same optimization for TCP will be suitable for TLS too; just small areas need to be checked with TLS like: size of TLS records, memory buffer, certificate size, and support for shortened handshake.
  • As mentioned before, at the end TLS is an additional layer, and this came with a cost in processing and time, but by time and with evolution of software or mechanisms, this cost became smaller and smaller (and of course with right optimization makes the cost even lower).
  • We already know that you can't make packets travel faster, but you can put your server closer to the user, hence CDNs (Content delivery network) appeared, its servers serve the user from the nearest place. CDN can serve the user directly (by caching the content), or working as proxy for origin servers.
  • Maybe CDNs decrease latency for your users, but it can't remove all of latency, so you still need other optimizations for TLS, and the main one is "Session Resumption". Don’t assume "Session Caching" and "Session Ticket" are enabled by default, you need to double-check your server configuration; the good thing that "Session Caching" and "Session Ticket" aren't exclusive. And for best practice you should enable both of them: Session Caching is really easy to configure, but Session Caching may need more things to be checked:
    • Servers with multiple processes or workers should use a shared session cache.
    • Size of the shared session cache should be tuned to your levels of traffic.
    • A session timeout period should be provided.
    • In a multiserver setup, routing the same client IP, or the same TLS session ID, to the same server is one way to provide good session cache utilization.
    • Where “sticky” load balancing is not an option, a shared cache should be used between different servers to provide good session cache utilization.
    • Check and monitor your SSL/TLS session cache statistics for best performance.
  • TLS record size plays an important role in HTTPS performance, but unfortunately there is no right answer for it. It depends on a lot of factors, but there’s an important rule that says: Small records incur overhead, large records incur latency. So make sure TLS record is not separated over multiple TCP packets.
  • TLS compression is not just useless, but also detrimental! you should disable TLS compression because it will expose you to the risk of "CRIME" exploit, and on the other hand, it's not content aware. So it will compress contents that are already compressed like images and videos, and repeating the compression is a waste of resources for both server and client. Just make sure your server is configured to Gzip all text-based assets.
  • Minimizing Certificate-chain length is another little known optimization, additional size of certificate-chain is one of the common mistakes; a lot of sites include unnecessary certificates specially "Certificate of Root CA" which is completely useless!
    • Minimize the number of intermediate CAs. Ideally, your sent certificate chain should contain exactly two certificates: your site and the CA’s intermediary certificate; use this as a criteria in the selection of your CA. The third certificate, which is the CA root, should already be in the browser’s trusted root and hence should not be sent.
    • It is not uncommon for many sites to include the root certificate of their CA in the chain, which is entirely unnecessary: if your browser does not already have the certificate in its trust store, then it won’t be trusted, and including the root certificate won’t change that.
    • A carefully managed certificate chain can be as low as 2 or 3 KB in size, while providing all the necessary information to the browser to avoid unnecessary roundtrips or out-of-band requests for the certificates themselves. Optimizing your TLS handshake mitigates a critical performance bottleneck, since every new TLS connection is subject to its overhead.
  • One of the steps of a secure connection is to check a certificate’s status; is it still valid or revoked? Actually this step is widely different from one browser to another. However, one of the mechanisms to shorten this process called "OCSP Stapling", with OCSP Stapling the server can include the OCSP response from the CA to its certificate chain, this allows the browser to skip extra requests of certificate verification thus decreasing the time of establishing a connection.
  • Another option with TLS is "HTTP Strict Transport Security", HSTS simply a HTTP header tells the browser to use HTTPS by default, this means the browser will automatically send HTTPS requests, so it eliminates unnecessary redirection from HTTP to HTTPS (This is useful when a website is using HTTPS for all connections).
  • Performance Checklist:
    • Get best performance from TCP.
    • Upgrade TLS libraries to latest release, and (re)build servers against them.
    • Enable and configure session caching and stateless resumption.
    • Monitor your session caching hit rates and adjust configuration accordingly.
    • Terminate TLS sessions closer to the user to minimize roundtrip latencies.
    • Configure your TLS record size to fit into a single TCP segment.
    • Ensure that your certificate chain does not overflow the initial congestion window.
    • Remove unnecessary certificates from your chain; minimize the depth.
    • Disable TLS compression on your server.
    • Configure SNI support on your server.
    • Configure OCSP stapling on your server.
    • Append HTTP Strict Transport Security header.
Continue Reading »

24/06/2015

20/06/2015

Ansible module to list groups in inventory - Python.

Update 1/11/2015: I updated the module and made a Github repository "List Ansible inventory groups".

--------------------

I'm really in love with Ansible, it's very simple, very handy ... it's Awesome! You can master it in no time at all! Get Started now!

As far as I remember, I've been started using Ansible since v1.7,
Continue Reading »

05/06/2015

Generate Ansible inventory from SSH config file.

Update 10/1/2016: I updated script and made a Github repository for it: "ansible-inventory-from-ssh-config".
----

For a long time I used "Gnome Connection Manager" to manage remote servers, however I moved to "Terminator". I really love Terminator, but If
Continue Reading »

20/05/2015

15/05/2015

05/05/2015

Powered by Blogger.

Hello, my name is Ahmed AbouZaid and this is my "lite" technical blog!

I'm a passionate DevOps, Linux system administrator, RedHat Certified Engineer (RHCE), AWS SysOps/Solutions Architect, Free/Open source geek, author, interested in environment, calligraphy, and I believe that “Details Matter”!

Automation, data, and metrics are my preferred areas. I have a built-in monitoring chip, and too lazy to do anything manually :D

Popular Posts