Performance patterns for RTOS tasks

Embedded.com – Using design patterns to identify and partition RTOS tasks: Part 1

… any operation whose deadline is measured in milliseconds, not microseconds, is a candidate for a high-priority task. Assuming that your system has some serious computing to do at least once in a while, anything that your system must do in milliseconds will miss its deadline if it has to wait for the serious computing to complete. (If your system never has any time-consuming CPU activity, then you're unlikely to have any deadline problems anyway and might well stick to a polling loop for your code.) Operations whose deadlines are measured in microseconds typically end up in ISRs in any case. Some examples of things that fall into the millisecond category are (1) responding after a complete message has been received from another system over the serial port; (2) constructing a suitable response to a network frame and sending it; and (3) turning off the valve when sensors tell us that we've added enough of some ingredient to a manufacturing mixture.
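As a rough illustration, here is a minimal sketch in C of example (1), assuming a FreeRTOS-style API; the message queue and the vSendResponse routine are hypothetical placeholders, not part of any real codebase. The responder runs as a dedicated high-priority task that blocks until the serial ISR hands it a complete message, so it never waits behind the system's long computations.

    /* Sketch only: assumes FreeRTOS; xMsgQueue and vSendResponse are
     * hypothetical placeholders. */
    #include "FreeRTOS.h"
    #include "task.h"
    #include "queue.h"

    #define MSG_LEN 64

    static QueueHandle_t xMsgQueue;          /* filled by the serial-port ISR */

    extern void vSendResponse(const char *); /* hypothetical reply routine */

    /* High-priority task: wakes within milliseconds of a complete message
     * arriving, instead of waiting behind the CPU-intensive work. */
    static void vSerialResponderTask(void *pvParameters)
    {
        char msg[MSG_LEN];
        (void)pvParameters;
        for (;;) {
            /* Block until the ISR posts a complete message. */
            if (xQueueReceive(xMsgQueue, msg, portMAX_DELAY) == pdTRUE)
                vSendResponse(msg);
        }
    }

    void vStartResponder(void)
    {
        xMsgQueue = xQueueCreate(8, MSG_LEN);
        /* Higher number = higher priority in FreeRTOS; sit well above
         * the CPU-bound work. */
        xTaskCreate(vSerialResponderTask, "serresp", 256, NULL,
                    tskIDLE_PRIORITY + 3, NULL);
    }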

CPU Hog Pattern

Any operation that takes up an amount of CPU time measured in seconds, or perhaps a large number of milliseconds, is a candidate to be moved into a low-priority task. The trouble with CPU-intensive operations is that they endanger every deadline in the system. Creating a separate, low-priority task for a CPU-intensive operation gets rid of the interference. Moving such an operation into a low-priority task is, of course, equivalent to moving everything else into a higher-priority task; however, "put the CPU-hogging operation into a low-priority task" is often the obvious pattern to apply.
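A complementary sketch, under the same FreeRTOS assumption (vRecomputeModel is a hypothetical stand-in for the seconds-long computation): the hog gets the lowest useful priority, so the scheduler preempts it the moment any deadline-driven task becomes ready.

    /* Sketch only: assumes FreeRTOS; vRecomputeModel is hypothetical. */
    #include "FreeRTOS.h"
    #include "task.h"

    extern void vRecomputeModel(void);  /* the CPU-intensive operation */

    /* Low-priority task: runs only when nothing deadline-driven is ready,
     * so the hog can no longer endanger millisecond deadlines. */
    static void vCpuHogTask(void *pvParameters)
    {
        (void)pvParameters;
        for (;;) {
            vRecomputeModel();
            vTaskDelay(pdMS_TO_TICKS(10));  /* let equal-priority housekeeping run */
        }
    }

    void vStartCpuHog(void)
    {
        xTaskCreate(vCpuHogTask, "hog", 512, NULL,
                    tskIDLE_PRIORITY + 1, NULL);
    }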

Continuous Performance Testing

ACM Queue – High-Performance Team – Does your development team have a high-performance mind-set?

If you want to keep producing high-performance software, you must be able to run reproducible, comparable performance tests. Ideally, you'll have dedicated, standard hardware on which to run these tests; this should be representative of, if not directly comparable with, what your customers run in production. You'll run a basic set of performance tests as part of your release cycle, plus more comprehensive benchmarks as required.

So what should you test? What is important? You need to find a balance between the time it takes to run the tests and the information they actually give you. A large set of complex tests can tell you a huge amount about your application and even help you track down areas that have caused performance degradation, but that might be too time-consuming to run for every release. Simpler tests that can run automatically in less than an hour would be better. Furthermore, your tests need to measure something using public interfaces that are stable between releases; otherwise, maintaining the tests will become an overhead.
Of course, the tests must exercise the operations and code paths that are important to your customers. They must measure the throughput of the common transactions or queries, based on the types of datasets and loadings seen on production systems. If practical, a captured production workload that can be rerun on demand would be ideal.
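A minimal sketch of such a test in C (process_request is a hypothetical stub standing in for a public, release-stable API; a real harness would drive a representative workload): time a fixed number of operations and report throughput in a stable format, so successive releases can be compared with a simple diff of the output.

    /* Minimal throughput-test sketch; process_request() is a hypothetical
     * stand-in for the public interface under test. */
    #include <stdio.h>
    #include <time.h>

    /* Stub standing in for the real, release-stable API call. */
    static int process_request(int id) { return id * 2; }

    int main(void)
    {
        const int iterations = 100000;
        struct timespec t0, t1;
        volatile int sink = 0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iterations; i++)
            sink += process_request(i);     /* the operation being measured */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) +
                      (t1.tv_nsec - t0.tv_nsec) / 1e9;
        (void)sink;
        /* Report throughput so releases can be compared run over run. */
        printf("%d requests in %.6f s (%.0f req/s)\n",
               iterations, secs, iterations / secs);
        return 0;
    }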

Designing for performance

ACM Queue – High-Performance Team – Does your development team have a high-performance mind-set?

Designing for performance is a controversial area; there are those who think you must always start by designing for performance, and others who think you should start with something that works and optimize it later. Both approaches have their merits; as always, it’s a case of finding the right balance between the two.

High-level design decisions are often hard to change, and thus are fundamentally tough to optimize. Therefore, at this level, you must consider performance—interfaces between major components, public APIs, and database schemas all fall into this category—particularly as modifications make upgrades difficult. Lower-level design points—for example, a private, nonpersistent data structure—are easier to change, so it’s best to start with something easy to understand and optimize it when it proves to be a problem.

At this point, you need to remember that you've already told everyone that performance is important, so they can be trusted to implement those low-level details with performance in mind. Your job now is to encourage experimentation: Rather than theorizing, ask developers to hack together 50-line test rigs to contrast different approaches to the same problem. If a particular algorithm or data structure has been chosen on the basis of performance or efficiency, ask to see the evidence, and explain why you're asking: you're not trying to prove anyone wrong or make them look silly; you just want to know they have thought about it and can justify their decisions. What's more, you want people to back up those thoughts with experimental evidence; you don't want decisions made based on experience or prejudices gained on an old platform or in a previous job.
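For instance, one of those throwaway rigs might look like this in C (a sketch, not anyone's production code): contrast a linear scan with a binary search over the same sorted array and let the measurements, rather than intuition, settle the argument.

    /* Throwaway test rig in the 50-line spirit: time two approaches to
     * the same lookup problem on identical data. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N       100000
    #define LOOKUPS 10000

    static int linear_find(const int *a, int n, int key)
    {
        for (int i = 0; i < n; i++)
            if (a[i] == key) return i;
        return -1;
    }

    static int binary_find(const int *a, int n, int key)
    {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] == key) return mid;
            if (a[mid] < key) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    static double time_lookups(int (*find)(const int *, int, int),
                               const int *a)
    {
        struct timespec t0, t1;
        volatile int sink = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < LOOKUPS; i++)
            sink += find(a, N, rand() % (2 * N));
        clock_gettime(CLOCK_MONOTONIC, &t1);
        (void)sink;
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        static int a[N];
        for (int i = 0; i < N; i++) a[i] = 2 * i;  /* sorted data */
        srand(42);                                 /* reproducible runs */
        printf("linear: %.6f s\n", time_lookups(linear_find, a));
        srand(42);                                 /* same key sequence */
        printf("binary: %.6f s\n", time_lookups(binary_find, a));
        return 0;
    }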

Tales from the trenches with a former Enron performance guru

ACM Queue – A Conversation with Jarod Jenson – Tales from the trenches with a former Enron performance guru

Chances are your algorithm is good, so you should do some hunting elsewhere. Use the tools you have today, such as DTrace or VTune. Ask where the application is spending its time. If it is in the code, you're back to where you started, but I would be willing to bet that if you're having massive performance problems, it's not directly in your code, or, more precisely, not in what you think is your code.

Top 10 things that you must monitor on any server to look for performance and/or scalability issues

Sun Dialogue Programs

(Q): If you have to pick top 10 things that you must monitor on any server to look for performance and/or scalability issues…what would they be?
Richard McDougall (A): Off the top of my head, in no particular order:

  1. CPU: Check idle time and run-queue length (a minimal programmatic check appears after this list).
  2. If there's a CPU bottleneck, check whether it's an application or kernel CPU-utilization issue with mpstat: a high user-time percentage (usr) indicates an application issue; a high system-time percentage (sys) may point to heavy network load or lock contention.
  3. Memory: Check MDB's ::memstat to ensure there is sufficient free memory.
  4. Network: Check that networks are not overloaded by comparing the bytes transferred against the available bandwidth per link.
  5. CPU for network: Check whether any CPUs are 100% busy servicing network interrupts; CPUs showing 100% in mpstat or intrstat output are possible candidates.
  6. File-system latency: Check the application-visible latency with DTrace at the system-call level (perhaps fsstat, iosnoop, or an aggregation around system calls).
  7. Storage latency: Check disk latency with iostat.
  8. Application-level lock contention: Check for hot user-level locks, now visible with plockstat.
  9. Kernel-level locks: Check for hot locks with lockstat.
  10. Check MMU activity on SPARC using trapstat. Sometimes an application may report as running 100% in user mode but may actually be spending a significant amount of time in kernel mode servicing TLB misses. trapstat will show the percentage of time spent servicing TLB misses; if a significant amount of time (>10%) is evident, then large MMU pages may help.
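Related to item 1, a tiny C sketch that flags a run queue backing up behind the available CPUs; it relies on the standard getloadavg() call (the load-average threshold used here is a rule-of-thumb assumption, not a hard rule).

    /* Sketch for item 1: sample run-queue pressure with getloadavg().
     * Declared in <stdlib.h> on Linux/BSD; Solaris puts it in
     * <sys/loadavg.h>. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        double load[3];
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

        if (getloadavg(load, 3) != 3) {
            fprintf(stderr, "getloadavg failed\n");
            return 1;
        }
        printf("cpus=%ld load: 1m=%.2f 5m=%.2f 15m=%.2f\n",
               ncpu, load[0], load[1], load[2]);
        /* A 1-minute average persistently above the CPU count suggests
         * runnable threads are queueing; follow up with mpstat (item 2). */
        if (load[0] > (double)ncpu)
            printf("run queue likely backing up\n");
        return 0;
    }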