August 9, 2018

What tools / services are you using to monitor your servers?

Hey all

I'm trying to find a solution for monitoring server system status,

such as CPU, disk space, server health, and crashed processes.

Thanks


  1. 3

    I'm monitoring the cron jobs for my service https://Acct.Watch with https://cronhub.io. All CPU/memory stuff is handled by DigitalOcean.

    1. 1

      How much do you pay for them?

      What stack is the agent built with?

      1. 1

        Zabbix is written in C with a PHP frontend.

        The dashboard can be extended, for example with Grafana, and

        you can run it on Linux, Windows, Mac - monitor the machine as well as the network.

        It is open source and free to use.

        For a first introduction, please watch this (intro + installation on an Ubuntu machine with a MySQL DB):

        https://www.youtube.com/watch?v=jpj933xdaHc

  2. 3

    My company does exactly this, and we rolled our own solution using Influx (the TICK stack). We wrote a collection of blog posts about choosing it, building it, and the alerting that we set up, if that is of use.

    You don't mention your scale or experience, however, and there are some awesome open-source tools out there. One of my favorites is https://www.cacti.net/

    1. 1

      Looks great. What stack is the agent built with?

      1. 1

        The agent is Telegraf; it and the rest of the TICK stack are written in Go (although you don't need to know that, thanks to the huge number of plug-ins that already exist for it).

        1. 1

          I'm looking for a tiny agent that uses as little CPU as possible and is as native as possible.

          1. 1

            The holy grail :-) We found Telegraf very good in that respect. We have servers reporting every 10 seconds down to a fraction of a percent and Telegraf hardly registers as using any resources.

            We even have it running on the Raspberry Pis in the office that control the TV, music and 3D printer :-)
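
To make the "tiny agent" idea in this sub-thread concrete, here is a minimal sketch of collecting a couple of cheap host metrics and formatting them as InfluxDB line protocol. It is not how Telegraf itself is implemented. The measurement name `host_stats`, the field names, and the `collect` / `to_line_protocol` helpers are made up for illustration, and only float fields are handled.

```python
import os
import shutil
import socket
import time


def collect(path="/"):
    """Gather a few cheap host metrics using only the standard library."""
    usage = shutil.disk_usage(path)
    metrics = {"disk_used_percent": round(100.0 * usage.used / usage.total, 2)}
    try:
        # 1-minute load average; only available on Unix-like systems.
        metrics["load1"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass
    return metrics


def _escape(value):
    """Escape commas, spaces and equals signs in tag keys/values and field keys."""
    return str(value).replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp.

    Assumes the measurement name contains no commas or spaces and that
    every field value can be converted to a float.
    """
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # Print one point; a real agent would batch points and send them on a timer.
    print(to_line_protocol("host_stats", {"host": socket.gethostname()}, collect(), time.time_ns()))
```

Run from cron or a small loop, something like this costs almost nothing, but you lose the buffering, retries and plug-ins that a real agent gives you.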

  3. 2

    I'm using http://hyperping.com + DigitalOcean

  4. 2

    Currently I am using Prometheus in combination with Node Exporter (which collects the data) and Grafana to visualize the data. There is a pre-built dashboard in Grafana for Node Exporter with a lot of metrics.

    Earlier I tried netdata, which was nice, but I felt it was missing the time-based slicing that Grafana / Prometheus offers.

    Keep in mind though that you usually need to install a few bits of software to get this running. The docs are great for all of those projects, though.

  5. 1

    I have been using https://nodequery.com/ for years. It is free; hopefully the owner will someday charge users to keep the service alive.

  6. 1

    https://amplify.nginx.com

    Amplify from nginx is a great free service.

  7. 1

    We are using Atatus (https://atatus.com) for frontend and backend (Node.js and PHP) monitoring.

  8. 1

    There are a bunch of different answers in this thread. Looks like you are looking for passive monitoring of system metrics. There are a bunch of very good SaaS services or self-managed products for that:

    • Zabbix, Cacti, Datadog, Librato etc. There are A LOT.

    Then there is active monitoring: a thing that pings your service and checks if it's OK. Someone mentioned Hyperping for example. There are many players also in this space:

    • Pingdom, Apex Ping, HyperPing etc. Again, there are A LOT.

    I'm building Checkly (https://checklyhq.com) which does API & browser transaction monitoring. It's similar to Runscope, just that it adds browser click flow monitoring and at a different price level. Hope this helps.
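
For anyone unsure what "active monitoring" means in practice, here is a minimal sketch of a single check: request a URL, then judge the result by status code and response time. It is not how any of the services named above work internally. The `probe` and `evaluate` names and the 2-second latency threshold are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Fetch url once and return (status_code_or_None, elapsed_seconds)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        status = exc.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar errors.
        status = None
    return status, time.monotonic() - started


def evaluate(status, elapsed, max_latency=2.0):
    """Turn a probe result into (ok, human-readable detail)."""
    if status is None:
        return False, "unreachable"
    if not 200 <= status < 400:
        return False, f"HTTP {status}"
    if elapsed > max_latency:
        return False, f"slow ({elapsed:.2f}s)"
    return True, f"HTTP {status} in {elapsed:.2f}s"


# Example (placeholder URL): print(evaluate(*probe("https://example.com/")))
```

A hosted service adds what this sketch leaves out: checks from several regions, retries before alerting, and notification channels.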

  9. 1

    I assume you are talking about Linux-based servers.

    I've tried several: Zabbix, Icinga, Munin, Nagios, Centreon, Netdata and Prometheus. All are free and open-source.

    My advice is to go with Prometheus: active community, quality and diversity of metrics, easier to hack (by writing your own exporters with the help of the many SDKs), etc.

    Coupled with Grafana, you get a powerful monitoring solution, not only for servers but also for applications and networks. The best part? You can set up everything in less than an hour. Tutorials are super clear and the community helps you if you encounter any issues.

    My previous company also used Munin. Very ugly interface, but it does the job well: super lightweight, secure, and diverse metrics.

    Good luck
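
As a rough idea of what "writing your own exporter" involves, here is a minimal sketch that serves two gauges in the Prometheus text exposition format using only the Python standard library. In practice you would more likely use an official client library. The metric names (`demo_disk_free_bytes`, `demo_disk_total_bytes`), the port, and the helper names are assumptions for illustration.

```python
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples):
    """Render {name: (help_text, value)} as Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(samples.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


def collect():
    """Collect example gauges; the metric names here are made up."""
    usage = shutil.disk_usage("/")
    return {
        "demo_disk_free_bytes": ("Free bytes on the root filesystem.", usage.free),
        "demo_disk_total_bytes": ("Total bytes on the root filesystem.", usage.total),
    }


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on /metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and adding the host and port as a scrape target in the Prometheus configuration would be enough to see these values in Grafana.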

  10. 1

    I am gonna toot my own horn here. If your server serves APIs, check out Moesif (https://www.moesif.com). It gives you a lot more insight than just whether the server is up or down (which is the most that ping services can do), such as whether it is behaving abnormally, slowing down unexpectedly, or showing sudden spikes or dips in a variety of metrics.

  11. 1

    Kubernetes cluster UI

  12. 1

    +1 for Zabbix. Open source/free; we have it running on a $5/month VPS at DigitalOcean, monitoring 35 of our servers for disk usage, memory, swap, and application uptime. We configured custom templates and automatically set up monitoring when a server is created, using the Zabbix API! Does exactly what I needed.
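
The automatic registration described above might look roughly like the sketch below, which only builds the JSON-RPC request body for the Zabbix API's `host.create` method. This is not the poster's actual setup. The host name, IP address, group ID and template ID are placeholders, and authentication and the HTTP call are left out because they differ between Zabbix versions, so check the API documentation for the version you run.

```python
import json


def build_host_create_request(hostname, ip, group_id, template_id, request_id=1):
    """Build a JSON-RPC 2.0 body for Zabbix's host.create method.

    Authentication is deliberately omitted; how the token is passed
    depends on the Zabbix version.
    """
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": hostname,
            # One Zabbix agent interface (type 1) on the default agent port.
            "interfaces": [
                {"type": 1, "main": 1, "useip": 1, "ip": ip, "dns": "", "port": "10050"}
            ],
            "groups": [{"groupid": group_id}],
            "templates": [{"templateid": template_id}],
        },
        "id": request_id,
    }


if __name__ == "__main__":
    # Placeholder values; real IDs come from your own Zabbix installation.
    request = build_host_create_request("web-01", "192.0.2.10", "2", "10001")
    print(json.dumps(request, indent=2))
```

A provisioning script would POST this body to the server's `api_jsonrpc.php` endpoint once the new machine is up.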

  13. 1

    My day job is monitoring; I look after all aspects of monitoring for an ISP (Grafana, Graphite, Icinga2/Nagios, Prometheus, Collectd, StatsD...).

    I currently use a DO hosted Icinga2/Graphite/Grafana stack, but I'm developing my own SaaS as my first product, which does basic website monitoring. I was going to use Icinga2 as the backend but that's not as fun, so I've written a mock one in NodeJS, currently rewriting in Python as that's my comfort blanket.

    1. 1

      I hope you understand that Icinga2 is a commercial product,

      and its open-source version is under the GPL license, which is very problematic.

      1. 1

        One of many reasons I chose not to use it. I don't particularly like the direction that Icinga2 has been going lately either, and requests for bugfixes are usually met with a 'Pay/Sponsor us, or wait for years' approach, whereas the Prometheus team are ace guys/girls to speak to.

  14. 1

    ...also adding to this list: server access and intrusion detection.

  15. 1

    All my Node.js servers are monitored by Keymetrics.