CPU, temperature, memory, and disk usage


Because the system must be able to shift resources between applications, CPU, memory, and disk usage should never be close to 100 percent.

 Here you can see how the CPU usage (shown in green) suddenly drops and then climbs again. If CPU usage fluctuates widely, identify the processes that trigger the swings and check whether they could combine into a perfect storm that demands more than 100% of processing capacity.

 SUGGESTIONS ON BEST PRACTICES:

• Total CPU <80% - Processes will spike to 100% for short periods of time, so keep your average CPU usage low enough to absorb those spikes.

• Total memory <70% - As with CPU usage, all programs need swap space and the ability to allocate additional chunks of memory.

• Total disk <75% - SSDs are so cheap now that you shouldn't be getting anywhere near this limit. If the disk reaches 95%, you should disable the robot's ability to move or interact, because processes can start dying seemingly at random due to file I/O failures.

• Process-level CPU <60% of an individual core - Unless you have a specific algorithm that cannot be split, most architectures should allow the work to be decoupled into multiple ROS nodes or separate processes that interact with each other.

• Process-level memory <25% - RAM is cheap these days, and you can redesign algorithms to work in very small spaces. Most algorithms should be able to run in less than 1 GB, leaving most of the RAM for other uses.

• CPU temperature typically <60 °C - It can reach a maximum of 80 °C, but should normally be well below that. If the CPU gets too hot, the processor speed is throttled, and processes that previously fit may need more than 100% of the available CPU time. If the temperature rises over time, improve the cooling. This matters most for outdoor robots, where the housing can become an oven: the internal temperature can reach 60 °C or higher even without the added heat of the CPU, so the robot can easily overheat. The same can happen to indoor robots that have no ventilation.
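The limits in the list above can be written down as a small lookup table and checked against a set of readings. The following is a minimal sketch, not a definitive implementation: the metric names and the `out_of_range` helper are illustrative assumptions, and the readings are assumed to come from whatever monitoring source you already have.

```python
# Limits taken from the best-practice list; the key names are hypothetical.
LIMITS = {
    "cpu_total_pct": 80.0,
    "memory_total_pct": 70.0,
    "disk_total_pct": 75.0,
    "cpu_process_pct_of_core": 60.0,
    "memory_process_pct": 25.0,
    "cpu_temp_c": 60.0,
}


def out_of_range(readings, limits=LIMITS):
    """Return {metric: (value, limit)} for every reading at or above its limit."""
    return {
        name: (value, limits[name])
        for name, value in readings.items()
        if name in limits and value >= limits[name]
    }


# Example readings (made-up values for illustration).
sample = {
    "cpu_total_pct": 91.5,
    "memory_total_pct": 42.0,
    "disk_total_pct": 75.0,
    "cpu_temp_c": 58.0,
}
violations = out_of_range(sample)
for name, (value, limit) in sorted(violations.items()):
    print(f"{name}: {value} (limit {limit})")
```

Because the list states each limit as "less than", a reading exactly at the limit is treated as a violation here.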

1. Create a resource "guardian" kill switch for the robot

If any of the best-practice limits above is exceeded, disable movement and other safety-critical interactions on the robot and raise an alert. When any of these basic resources runs out, overall performance can collapse and the number of system errors can skyrocket, to the point where you can no longer report a problem or maintain control of the system.
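One way such a guardian could be structured is as a latching check: the first time a reading crosses a limit, it disables motion and sends a single alert, then stays tripped until someone resets it. This is only a sketch under assumptions. The `ResourceGuardian` class, the metric names, and the two callbacks are hypothetical, and the limits shown (95% disk, 80 °C) are just the hard ceilings mentioned in the list above.

```python
class ResourceGuardian:
    """Latching kill switch: trips once and stays tripped until reset."""

    def __init__(self, limits, disable_motion, send_alert):
        self.limits = limits
        self.disable_motion = disable_motion  # hypothetical callback
        self.send_alert = send_alert          # hypothetical callback
        self.tripped = False

    def check(self, readings):
        """Compare readings to limits; trip on the first breach. Returns the latch state."""
        breaches = [
            f"{name}={value} (limit {self.limits[name]})"
            for name, value in readings.items()
            if name in self.limits and value >= self.limits[name]
        ]
        if breaches and not self.tripped:
            self.tripped = True
            self.disable_motion()
            self.send_alert("Resource guardian tripped: " + ", ".join(breaches))
        return self.tripped

    def reset(self):
        """Re-arm after an operator has dealt with the cause."""
        self.tripped = False


# Demo: record what the callbacks would have done.
events = []
guardian = ResourceGuardian(
    {"disk_pct": 95.0, "cpu_temp_c": 80.0},
    disable_motion=lambda: events.append("motion disabled"),
    send_alert=events.append,
)
guardian.check({"disk_pct": 60.0, "cpu_temp_c": 55.0})  # in range, nothing happens
guardian.check({"disk_pct": 96.0, "cpu_temp_c": 55.0})  # trips the guardian
guardian.check({"disk_pct": 97.0, "cpu_temp_c": 55.0})  # already tripped, no repeat alert
print(events)
```

Latching avoids flooding the alert channel while the resource stays out of range, and it keeps the robot stopped even if the reading briefly dips back under the limit.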

2. Find slow memory leaks with multi-day charts

As you zoom out in the resource monitor, look for memory usage that ramps up over time to identify leaks, then expand the details of individual processes to find the one responsible. `top` is a good place to start, but it doesn't clearly show how things change over time. In the Freedom Resource Monitor tab, you can expand any process that uses more than 1% of the CPU or RAM on your computer.
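If you have exported per-process memory samples, a slow ramp can also be estimated numerically rather than by eye. The sketch below fits a least-squares line through (hours, megabytes) samples and reports the slope. The `slope_per_hour` helper, the sample layout, and the synthetic data are assumptions for illustration, not part of any particular monitoring tool.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hours, megabytes) samples, in MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    numerator = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("all samples share the same timestamp")
    return numerator / denominator


# Synthetic data: three days of hourly samples.
# "leaky" drifts upward by 2 MB per hour on top of a small sawtooth;
# "steady" has the same sawtooth but no drift.
leaky = [(h, 500 + 2 * h + (h % 4) * 5) for h in range(72)]
steady = [(h, 300 + (h % 4) * 5) for h in range(72)]

leaky_rate = slope_per_hour(leaky)
steady_rate = slope_per_hour(steady)
print(f"leaky:  {leaky_rate:.2f} MB/hour")
print(f"steady: {steady_rate:.2f} MB/hour")
```

A slope that stays clearly positive over several days is a hint to look closer at that process. What counts as "clearly positive" depends on the process and is a judgment call.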

3. Check for PID changes to detect process restarts

If the PID of a process (for example, a ROS node) keeps changing over time, the process is restarting, and this is usually caused by a crash. Most of the time this goes unnoticed because the process restarts automatically, but the point at which it failed usually hides a resource failure or a code exception. In the resource monitor, you can see the exact start and stop time of each process.
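The same check can be applied to logged data. Assuming you have periodic (timestamp, PID) samples for one named process, every change of PID between consecutive samples marks a restart. The `find_restarts` helper and the sample values below are hypothetical.

```python
def find_restarts(samples):
    """Given time-ordered (timestamp, pid) samples for one named process,
    return (timestamp, old_pid, new_pid) for each observed PID change."""
    restarts = []
    for (_, previous_pid), (timestamp, pid) in zip(samples, samples[1:]):
        if pid != previous_pid:
            restarts.append((timestamp, previous_pid, pid))
    return restarts


# Made-up samples taken once a minute (timestamps in seconds).
samples = [(0, 4121), (60, 4121), (120, 5310), (180, 5310), (240, 6002)]
restarts = find_restarts(samples)
for timestamp, old_pid, new_pid in restarts:
    print(f"t={timestamp}s: PID {old_pid} -> {new_pid}")
```

The restart timestamps are the points to cross-check against resource charts and logs, since the failure that caused the restart happened just before each one.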

4. Zoom out ... a lot ... and squint at the data.

It might not sound scientific, but our brains are great pattern matchers. In Resource Monitor, you can download multiple days of data (although it might take a while!). This will allow you to start seeing patterns: Does RAM or CPU usage oscillate or increase over time, and does it correlate with particular nodes, processes, or connectivity changes?

We have found background packages we had installed that use the CPU heavily, but only once a day. Left unnoticed, this kind of load can lead to hidden crashes in the future.
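A once-a-day load like this is easy to miss by eye but shows up when multi-day samples are grouped by hour of day. The sketch below is one possible approach, with assumptions stated plainly: samples are (hours since start, CPU percent) pairs, the recording is assumed to start at midnight, and the `daily_hotspots` helper and its `factor` threshold are illustrative.

```python
from collections import defaultdict
from statistics import mean, median


def daily_hotspots(samples, factor=2.0):
    """Return the hours of day whose average CPU exceeds `factor` times
    the median of all hourly averages."""
    by_hour = defaultdict(list)
    for hours_since_start, cpu_pct in samples:
        by_hour[int(hours_since_start) % 24].append(cpu_pct)
    averages = {hour: mean(values) for hour, values in by_hour.items()}
    baseline = median(averages.values())
    return sorted(hour for hour, avg in averages.items() if avg > factor * baseline)


# Synthetic data: three days of hourly samples, 20% CPU most of the time,
# with a job that pushes CPU to 85% at 03:00 every day.
samples = [(h, 85.0 if h % 24 == 3 else 20.0) for h in range(72)]
hotspots = daily_hotspots(samples)
print(hotspots)
```

Any hour that is flagged repeatedly across days is a candidate for a scheduled background job worth tracking down.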

5. Upgrade your compute and offload processes

Consider upgrading your Raspberry Pi to an NVIDIA Jetson, or your NUC to a more powerful version. Many robots start out with the cheapest and weakest compute available. This may work fine for a while, but once your average resource usage starts to climb, there will be moments when the computer stops keeping up because of usage spikes that you can't see in the averages.
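To illustrate how an average can hide spikes, the sketch below summarizes a set of CPU samples by mean, 95th percentile, maximum, and the number of samples at saturation. The `headroom_report` helper and the sample data are made up for illustration, and the percentile uses a simple nearest-rank index rather than interpolation.

```python
from statistics import mean


def headroom_report(cpu_samples, limit=100.0):
    """Summarize CPU samples so that spikes are visible next to the average."""
    ordered = sorted(cpu_samples)
    # Nearest-rank style index for the 95th percentile, using integer math.
    p95_index = min(len(ordered) - 1, (95 * len(ordered)) // 100)
    return {
        "mean": mean(cpu_samples),
        "p95": ordered[p95_index],
        "max": ordered[-1],
        "saturated": sum(1 for value in cpu_samples if value >= limit),
    }


# Made-up data: the CPU sits at 50% most of the time but hits 100% in 5 of 100 samples.
cpu_samples = [50.0] * 95 + [100.0] * 5
report = headroom_report(cpu_samples)
print(report)
```

Here the mean looks comfortable while the maximum and the saturated count show that the machine ran out of headroom several times, which is the situation where stronger hardware or offloading work starts to pay off.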

 

 
