this post was submitted on 20 Aug 2024

Sysadmin


A community dedicated to the profession of IT Systems Administration


 

What's the best way to monitor and log which processes are responsible for high system load throughout the day? Tools like top and htop only provide immediate values, but I'm looking for a solution that offers historical data to identify the main culprits over time.

@sysadmin

#sysadmin #linux #server

[–] mosiacmango@lemm.ee 12 points 11 months ago* (last edited 11 months ago) (2 children)

Netdata is excellent, simple, and I believe FOSS. Just install it locally and it should start logging pretty much everything.
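
Everything it logs is also queryable over a local REST API, so you can pull the history back out yourself. Here's a rough sketch, assuming the default port 19999 and the `apps.cpu` chart name (both may differ on your install, so treat it as a starting point, not gospel):

```python
# Sketch: query a local Netdata agent's REST API for the last hour of
# per-application CPU usage and print the heaviest consumers.
# Assumes the default port (19999) and the "apps.cpu" chart; both are
# assumptions that may not match your Netdata version or config.
import json
import urllib.request

URL = ("http://localhost:19999/api/v1/data"
       "?chart=apps.cpu&after=-3600&format=json")

with urllib.request.urlopen(URL) as resp:
    payload = json.load(resp)

# Depending on version, the labels/data may be nested under "result".
result = payload.get("result", payload)
labels = result["labels"][1:]                 # first column is the timestamp
rows = [row[1:] for row in result["data"]]

# Average each dimension over the hour and show the top 5 consumers.
averages = {
    name: sum(v or 0 for v in col) / len(col)
    for name, col in zip(labels, zip(*rows))
}
for name, avg in sorted(averages.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{name:20s} {avg:6.1f} % CPU (1 h average)")
```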

[–] vk6flab@lemmy.radio 8 points 11 months ago (2 children)

Clicked the link, started reading... closed the window when I read "Netdata also incorporates A.I. insights for all monitored data".

[–] gravitas_deficiency@sh.itjust.works 4 points 11 months ago* (last edited 11 months ago)

~~Eesh. Yeah, that’s a nope from me, dawg.~~

Actually, it’s all self-hosted. Granted, I haven’t looked at the code in detail, but building NNs to help efficiently detect and capture stuff is actually a very appropriate use of ML. This project looks kinda cool.

[–] jimmy90@lemmy.world 3 points 11 months ago* (last edited 11 months ago) (1 children)

this kind of limited-scope, ML-trained analysis is actually where "AI" excels, e.g. "computer vision" in specific medical scenarios

[–] vk6flab@lemmy.radio 1 points 11 months ago (1 children)

If the training data is available, yes; in this case, no chance.

[–] jimmy90@lemmy.world 1 points 10 months ago

you don't think they could get training data from friendly customers using their service?

[–] Brewchin@lemmy.world 2 points 11 months ago

I run this in a Docker container on my home network without connecting it to their cloud platform (despite what feel like increasingly strident "encouragements" to do so). It's very powerful, and the majority of low-level configuration is done via text files. But 99% of it is automatic.

The UI is unique. It's a single, long and scrollable page, which may be an issue for some.

There are other tools out there, too. I previously used one that integrates Grafana, Prometheus and Node Exporter, which is more complex to set up and configure.
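
If you go the Prometheus route instead, the historical queries are just HTTP calls. Here's a rough sketch of asking it which process groups burned the most CPU over the last hour; it assumes process-exporter is being scraped and that its counter is named `namedprocess_namegroup_cpu_seconds_total`, so adjust for whatever your exporters actually expose:

```python
# Sketch: ask a Prometheus server which process groups used the most CPU
# over the last hour. Assumes process-exporter is scraped and exposes a
# per-group counter named namedprocess_namegroup_cpu_seconds_total
# (an assumption -- adjust to the metrics your setup actually has).
import json
import urllib.parse
import urllib.request

PROM = "http://localhost:9090"   # assumed Prometheus address
QUERY = "topk(5, sum by (groupname) (rate(namedprocess_namegroup_cpu_seconds_total[1h])))"

url = f"{PROM}/api/v1/query?" + urllib.parse.urlencode({"query": QUERY})
with urllib.request.urlopen(url) as resp:
    result = json.load(resp)

for sample in result["data"]["result"]:
    group = sample["metric"].get("groupname", "?")
    cores = float(sample["value"][1])
    print(f"{group:25s} {cores:5.2f} CPU cores (1 h average)")
```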

[–] daqu@feddit.org 6 points 11 months ago

In my time we used sar. I feel old reading about all these new tools I've never heard of.
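
For what it's worth, sar still covers this: the sysstat service keeps daily history on disk, and sadf from the same package dumps it in machine-readable form. A rough sketch of reading it back (log paths differ between distros and the flags/field layout here are from memory, so double-check against your man pages):

```python
# Sketch: dump today's CPU history recorded by sysstat using sadf (the
# machine-readable front-end to sar) and flag the busiest intervals.
# The log path is the Debian-style default; RHEL-likes use /var/log/sa/.
# Flags and field positions are from memory -- verify locally.
import subprocess
from datetime import date

logfile = f"/var/log/sysstat/sa{date.today():%d}"   # assumed path

# "sadf -d" emits semicolon-separated values; everything after "--" is
# passed through to sar, here "-u" for CPU utilisation.
out = subprocess.run(
    ["sadf", "-d", logfile, "--", "-u"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    if line.startswith("#") or not line.strip():
        continue
    fields = line.split(";")
    timestamp, idle = fields[2], float(fields[-1])   # last column is %idle
    if idle < 20:                                    # under 20 % idle = busy
        print(f"{timestamp}: only {idle:.1f} % idle")
```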

[–] Kkmou@lemm.ee 5 points 11 months ago

I like to use atop as the first step during an investigation: https://www.atoptool.nl/

[–] raoul@lemmy.sdf.org 4 points 11 months ago

atop should be available in your package manager and can run as a daemon. It stores its history under /var/ (typically /var/log/atop/).
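
If you want to pull numbers back out of those raw logs programmatically, atop can replay them in parseable text. A rough sketch; the -r/-P flags and the PRC record layout are from memory, so check atop(1) before trusting it:

```python
# Sketch: replay atop's raw daily log in parseable form and total up
# per-process CPU time for the day. The log path, the -r/-P flags and the
# PRC field layout are from memory -- check atop(1) before relying on this.
import re
import subprocess
from collections import Counter
from datetime import date

logfile = f"/var/log/atop/atop_{date.today():%Y%m%d}"   # assumed default location

out = subprocess.run(
    ["atop", "-r", logfile, "-P", "PRC"],
    capture_output=True, text=True, check=True,
).stdout

# PRC lines look roughly like:
#   PRC host epoch date time interval pid (name) state hertz utime stime ...
# where utime/stime are clock ticks consumed during that interval.
pattern = re.compile(r"^PRC \S+ \d+ \S+ \S+ \d+ \d+ \((.*)\) \S+ \d+ (\d+) (\d+)")
cpu_ticks = Counter()
for line in out.splitlines():
    m = pattern.match(line)
    if m:
        name, utime, stime = m.group(1), int(m.group(2)), int(m.group(3))
        cpu_ticks[name] += utime + stime

for name, ticks in cpu_ticks.most_common(5):
    print(f"{name:20s} {ticks} clock ticks of CPU today")
```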

[–] j4k3@lemmy.world 4 points 11 months ago* (last edited 11 months ago)

Look through RHEL's tooling. I'm not sure if Tuna has exactly what you're looking for, but it's the tool for detailed analysis of processes on logical cores, CPU-set isolation, and monitoring. RHEL has tools for everything in this area, and most are available in any other distro.

[–] Feddinat0r@feddit.org 3 points 11 months ago (1 children)

https://www.paessler.com/prtg/download We're using this and loving it, but I think it only runs on Windows. It's free for the first 100 sensors, which should be enough at home.

[–] haywire7@lemmy.world 2 points 11 months ago

Love a bit of PRTG, it can monitor pretty much anything via SNMP and the like.

[–] rowinxavier@lemmy.world 3 points 11 months ago

I did a whole stack of servers using SNMP-based monitoring years ago and it was amazing. I could see load, memory stats, NIC utilisation, disk space, and all sorts of other things. I tried Cacti and Icinga and settled on the latter, but they are all fairly similar. Once you are generating the data you can do whatever you like with it, so monitoring which executable the load is attributable to is definitely manageable. It is also handy for getting notifications when something is down, losing stability, or just out of whack.
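
The per-process numbers come from the standard HOST-RESOURCES-MIB, so even a quick script can collect them, and polling it on a schedule gives you the history. A rough sketch using net-snmp's snmpwalk; the host, community string and OIDs below are the usual defaults/placeholders, so adjust for your environment:

```python
# Sketch: poll per-process CPU counters over SNMP (HOST-RESOURCES-MIB) and
# map them back to executable names via net-snmp's snmpwalk. The host,
# community string and OIDs are assumptions -- adjust for your own setup.
import subprocess

HOST, COMMUNITY = "server.example.com", "public"   # placeholder values
OID_NAME = ".1.3.6.1.2.1.25.4.2.1.2"   # hrSWRunName, indexed by PID
OID_CPU = ".1.3.6.1.2.1.25.5.1.1.1"    # hrSWRunPerfCPU (centi-seconds), indexed by PID

def walk(oid):
    """Walk one OID subtree and return {pid: value} from the numeric output."""
    out = subprocess.run(
        ["snmpwalk", "-v2c", "-c", COMMUNITY, "-On", HOST, oid],
        capture_output=True, text=True, check=True,
    ).stdout
    table = {}
    for line in out.splitlines():
        # e.g. ".1.3.6.1.2.1.25.5.1.1.1.1234 = INTEGER: 567"
        left, _, right = line.partition(" = ")
        pid = left.rsplit(".", 1)[-1]
        table[pid] = right.split(": ", 1)[-1].strip().strip('"')
    return table

names, cpu = walk(OID_NAME), walk(OID_CPU)
for pid, centisecs in sorted(cpu.items(), key=lambda kv: int(kv[1]), reverse=True)[:5]:
    print(f"pid {pid:>6} {names.get(pid, '?'):20s} {int(centisecs) / 100:.1f} s CPU")
```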

[–] freezy@discuss.tchncs.de 2 points 7 months ago

Cockpit will show you load spikes over time pretty much out of the box.

[–] Oisteink@lemmy.world 1 points 10 months ago* (last edited 10 months ago)

I like Zabbix. It can monitor whatever I like, using SNMP, IPMI, REST APIs, or its own agent.

I have a team member insisting on using Netdata, but outside of the nice dashboard it doesn’t provide anything. It is local only, and setting up alarms is a pain. And tbh it nags more than Canonical's stuff.
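
The history Zabbix collects can also be pulled back out over its JSON-RPC API. A rough sketch; the endpoint and method names are standard Zabbix API, but the server URL, token, host and item key below are placeholders for whatever your setup uses:

```python
# Sketch: pull an item's recorded history back out of Zabbix over its
# JSON-RPC API. Method names are standard, but the URL, token, host name
# and item key are placeholders -- substitute your own.
import json
import time
import urllib.request

API = "https://zabbix.example.com/api_jsonrpc.php"   # placeholder server URL
TOKEN = "..."                                        # placeholder API token

def call(method, params):
    body = json.dumps({"jsonrpc": "2.0", "method": method,
                       "params": params, "auth": TOKEN, "id": 1}).encode()
    req = urllib.request.Request(API, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]

# Find the item holding CPU load for one host, then fetch the last 24 h.
items = call("item.get", {"host": "myserver",
                          "search": {"key_": "system.cpu.load"}})
history = call("history.get", {"itemids": [items[0]["itemid"]],
                               "history": 0,               # 0 = floating point values
                               "time_from": int(time.time()) - 86400,
                               "sortfield": "clock"})
for point in history:
    print(point["clock"], point["value"])
```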