Per-VM Visibility

January 25, 2018

Troubleshooting storage performance problems in a virtual environment can be dreadful. Complaints about a slow VM can often be attributed to storage, but how do you verify this when the VM shares a LUN with a dozen other VMs, and that LUN is a slice of a RAID array that contains many other LUNs? The problem could also have its roots in the ESXi host, the storage network, or even the user’s application. Legacy arrays provide no statistics on a per-VM basis.

Identifying performance bottlenecks is a time-consuming, frustrating, and often inconclusive process that requires gathering immense amounts of data, analyzing that data to form a hypothesis, and then testing it. In larger enterprises, this process often involves coordination between several people and departments, and can span days, weeks, or even months.

Tintri VMstore collects per-VM hypervisor latency stats and directly correlates them with per-VM storage stats. This provides a level of per-VM visibility that legacy arrays, which keep no per-VM statistics, cannot match. The hypervisor latencies are obtained using standard VMware vCenter APIs, while the network, file system, and disk latencies are provided by Tintri VMstore, which knows the identity of the corresponding VM for each IO request.
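To make the idea of correlating the two sources concrete, here is a minimal sketch in Python. It is not Tintri's implementation: the `VmLatency` class, the `correlate` function, and the dictionary layouts are all hypothetical, and it assumes the hypervisor-side and storage-side latencies have already been fetched and keyed by VM name.

```python
from dataclasses import dataclass


@dataclass
class VmLatency:
    """One VM's latency split by component, in milliseconds (hypothetical layout)."""
    vm: str
    host_ms: float
    network_ms: float
    file_system_ms: float
    disk_ms: float

    @property
    def total_ms(self) -> float:
        # End-to-end latency is modeled here as the plain sum of the components.
        return self.host_ms + self.network_ms + self.file_system_ms + self.disk_ms


def correlate(hypervisor_stats, storage_stats):
    """Join hypervisor-side and storage-side latencies on VM name.

    hypervisor_stats: {vm_name: host_latency_ms}  (assumed to come from vCenter)
    storage_stats:    {vm_name: {"network": ms, "file_system": ms, "disk": ms}}
    VMs that are missing from the storage side are skipped.
    """
    merged = []
    for vm, host_ms in hypervisor_stats.items():
        storage = storage_stats.get(vm)
        if storage is None:
            continue
        merged.append(
            VmLatency(
                vm=vm,
                host_ms=host_ms,
                network_ms=storage["network"],
                file_system_ms=storage["file_system"],
                disk_ms=storage["disk"],
            )
        )
    return merged
```

The join is only possible because both sides are keyed by VM. With LUN-level statistics alone there would be no common key to join on.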

Granular stats are collected at all of the following levels:

File level
- flat-vmdk
- swap
- config
- redo logs
- snapshots
Virtual disks
Virtual machine
Target IP
Target ethernet device
Target system
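As an illustration of what collecting stats at several levels makes possible, the following Python sketch rolls the same tagged samples up to whichever level is requested. The sample fields (`vm`, `vdisk`, `file`, `target_ip`, `io_count`) and the `roll_up` function are assumptions made for this example, not a documented Tintri data model.

```python
from collections import defaultdict


def roll_up(samples, level):
    """Sum IO counts per value of the chosen level (e.g. "vm" or "vdisk").

    Samples whose value for that level is None (for instance a swap file
    that belongs to no virtual disk) are left out of the result.
    """
    totals = defaultdict(int)
    for sample in samples:
        key = sample.get(level)
        if key is not None:
            totals[key] += sample["io_count"]
    return dict(totals)


# Hypothetical samples, each tagged with every level it belongs to.
samples = [
    {"vm": "vm-a", "vdisk": "vm-a/disk0", "file": "vm-a-flat.vmdk",
     "target_ip": "10.0.0.5", "io_count": 120},
    {"vm": "vm-a", "vdisk": None, "file": "vm-a.vswp",
     "target_ip": "10.0.0.5", "io_count": 30},
    {"vm": "vm-b", "vdisk": "vm-b/disk0", "file": "vm-b-flat.vmdk",
     "target_ip": "10.0.0.6", "io_count": 50},
]

per_vm = roll_up(samples, "vm")
per_vdisk = roll_up(samples, "vdisk")
```

Because each sample carries its VM identity, the same data can be viewed per file, per virtual disk, per VM, or per target IP without being collected again.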

Per-VM Performance Metrics

The troubleshooting process described above is fully automated using Tintri instant bottleneck visualization. For each VM and vDisk stored on the system, Tintri displays a breakdown of the end-to-end latency, from the guest OS down to the disks within the Tintri appliance.

For any VM or vDisk, you can see at a glance how much of the latency was spent in the ESXi host, the network, the Tintri file system, or accessing the disk. A history of this information is automatically stored and can be displayed as a graph, so you can see the bottleneck for each VM at any given point over the last seven days.
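The kind of question this breakdown answers can be sketched in a few lines of Python. This is a simplified illustration, not how the product computes its graphs: the component names, the sample layout, and both functions are hypothetical, and the bottleneck is taken to be whichever component contributes the most latency in a sample.

```python
COMPONENTS = ("host", "network", "file_system", "disk")


def bottleneck(sample):
    """Return the component with the largest latency in one sample."""
    return max(COMPONENTS, key=lambda component: sample[component])


def bottleneck_history(history):
    """Map a list of (timestamp, sample) pairs to (timestamp, bottleneck) pairs."""
    return [(timestamp, bottleneck(sample)) for timestamp, sample in history]


# Hypothetical latency history for one VM, in milliseconds per component.
history = [
    ("2018-01-24T09:00", {"host": 0.4, "network": 0.2, "file_system": 0.3, "disk": 1.8}),
    ("2018-01-24T09:10", {"host": 6.5, "network": 0.2, "file_system": 0.3, "disk": 0.9}),
]

timeline = bottleneck_history(history)
```

In this made-up history the dominant component shifts from the disk to the ESXi host between the two samples, which is the kind of change a per-VM latency graph makes visible.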

Tintri then provides these statistics in an intuitive format. In an instant you can see the bottleneck, rather than trying to deduce where it is based on indirect measurements and time-consuming detective work.