We ran into a problem during the build and delivery phase in a production environment, and we would like to share it with you.
Once the AWS infrastructure was provisioned, the application stack deployed, and testing under way, everything worked fine. A week later we noticed that the application had become very sluggish. The slowdown was most noticeable when copying files to one application directory that was mounted on Elastic File System (EFS). While investigating, we ran commands such as “ls” (inside the EFS-mounted application directory) and “df”, and the output of both was badly delayed. Other metrics, such as memory and CPU usage, remained low.
There are two throughput modes available with EFS, Bursting Throughput mode and Provisioned Throughput mode. With Bursting Throughput mode, throughput on Amazon EFS scales as your file system grows. With Provisioned Throughput mode, you can instantly provision the throughput of your file system (in MiB/s) independent of the amount of data stored.
All file systems, regardless of size, can burst to 100 MiB/s of throughput. Those larger than 1 TiB can burst to 100 MiB/s per TiB of data stored in the file system. For example, a 10-TiB file system can burst to 1,000 MiB/s of throughput (10 TiB x 100 MiB/s/TiB). The portion of time a file system can burst is determined by its size. The bursting model is designed so that typical file system workloads can burst virtually any time they need to. Whenever a file system is inactive or driving throughput below its baseline rate, it accumulates burst credits, which it earns at a baseline rate of 50 MiB/s per TiB of data stored.
For example, a 100-GiB file system, with a baseline rate of about 5 MiB/s, can burst (at 100 MiB/s) for 5 percent of the time if it’s inactive for the remaining 95 percent. Over a 24-hour period, the file system earns 432,000 MiB worth of credit, which can be used to burst at 100 MiB/s for 72 minutes.
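The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function name is ours, and the 5 MiB/s baseline is the rounded figure implied by the 432,000 MiB total (432,000 MiB / 86,400 s).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_burst_budget(baseline_mibps, burst_mibps):
    """Return (credits earned per day in MiB, minutes of bursting they pay for)."""
    # Credits accrue at the baseline rate for the whole day.
    credits_mib = baseline_mibps * SECONDS_PER_DAY
    # Bursting spends credits at the burst rate.
    burst_minutes = credits_mib / burst_mibps / 60
    return credits_mib, burst_minutes


credits, minutes = daily_burst_budget(baseline_mibps=5, burst_mibps=100)
print(f"{credits:,} MiB of credit per day, enough for {minutes:.0f} minutes at 100 MiB/s")
# prints: 432,000 MiB of credit per day, enough for 72 minutes at 100 MiB/s
```

72 minutes is 5 percent of a 1,440-minute day, which matches the 5 percent figure above.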
File systems larger than 1 TiB can always burst for up to 50 percent of the time if they are inactive for the remaining 50 percent. The minimum file system size used when calculating the baseline rate is 1 GiB, so all file systems have a baseline rate of at least 50 KiB/s. File systems can earn credits up to a maximum credit balance of 2.1 TiB for file systems smaller than 1 TiB, or 2.1 TiB per TiB stored for file systems larger than 1 TiB. This approach implies that file systems can accumulate enough credits to burst for up to 12 hours continuously.
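One way to see where the 12-hour figure comes from is to assume that a file system keeps earning credits at its baseline rate while it bursts, so a full balance drains at the difference between the burst and baseline rates. That reading is our assumption, as is the 50 MiB/s-per-TiB baseline (the per-TiB equivalent of the 50 KiB/s-per-GiB minimum). The sketch below works it through for a 1-TiB file system; the function name is ours.

```python
MIB_PER_TIB = 1024 * 1024


def hours_of_continuous_burst(max_credit_tib, burst_mibps, baseline_mibps):
    """Hours a full credit balance lasts while bursting without a pause."""
    # Assumption: credits are still earned at the baseline rate during the burst,
    # so the balance shrinks at (burst - baseline) MiB per second.
    net_drain_mibps = burst_mibps - baseline_mibps
    seconds = max_credit_tib * MIB_PER_TIB / net_drain_mibps
    return seconds / 3600


# 1-TiB file system: 2.1 TiB of credit, 100 MiB/s burst, 50 MiB/s baseline.
hours = hours_of_continuous_burst(max_credit_tib=2.1, burst_mibps=100, baseline_mibps=50)
print(f"about {hours:.1f} hours")  # about 12.2 hours
```

For larger file systems the credit cap, burst rate, and baseline all scale per TiB, so the result is the same under this assumption.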
The following table provides more detailed examples of bursting behavior for file systems of different sizes.
| File System Size (GiB) | Baseline Aggregate Throughput (MiB/s) | Burst Aggregate Throughput (MiB/s) | Maximum Burst Duration (Min/Day) | % of Time File System Can Burst (Per Day) |
(Thanks to Amazon Documentation)
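The columns of that table follow from the rules described above, so approximate values can be derived for any size. The sketch below is our own illustration, not the figures from the Amazon documentation: the function name and the sample sizes are arbitrary, and it assumes a baseline of 50 MiB/s per TiB stored and a burst rate of 100 MiB/s or 100 MiB/s per TiB, whichever is larger.

```python
def bursting_profile(size_gib):
    """Approximate bursting figures for a file system of the given size."""
    # Sizes below 1 GiB are treated as 1 GiB when computing the baseline.
    size_tib = max(size_gib, 1) / 1024
    baseline_mibps = 50 * size_tib
    burst_mibps = max(100.0, 100 * size_tib)
    # Share of the day the file system can burst if it is idle the rest of the time.
    burst_fraction = baseline_mibps / burst_mibps
    return {
        "size_gib": size_gib,
        "baseline_mibps": baseline_mibps,
        "burst_mibps": burst_mibps,
        "burst_min_per_day": burst_fraction * 24 * 60,
        "burst_pct_per_day": burst_fraction * 100,
    }


# Illustrative sizes only.
for size in (10, 256, 1024, 4096):
    row = bursting_profile(size)
    print(
        f"{row['size_gib']:>5} GiB  baseline {row['baseline_mibps']:.1f} MiB/s  "
        f"burst {row['burst_mibps']:.0f} MiB/s  "
        f"{row['burst_min_per_day']:.1f} min/day  {row['burst_pct_per_day']:.1f}%"
    )
```

Small file systems get only a few minutes of bursting per day, which is the situation we ended up in.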
So during the go-live period, the application responded well until it had used up its credit balance. Once the credits were exhausted, throughput dropped to the baseline rate. Our file system held only about 5 GB, so at 50 KiB/s per GiB stored that baseline was roughly 250 KiB/s. The root cause was that there was simply not enough throughput available to respond in time.
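A quick way to confirm this kind of exhaustion is to look at the BurstCreditBalance metric that EFS publishes to CloudWatch (reported in bytes). The following is a minimal sketch rather than the exact check we ran: it assumes the third-party boto3 SDK with credentials and a region configured, the helper names are ours, and the file system ID in the usage comment is a placeholder.

```python
from datetime import datetime, timedelta, timezone


def burst_credit_query(file_system_id, hours=24):
    """Build the CloudWatch request for the lowest hourly burst credit balance."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EFS",
        "MetricName": "BurstCreditBalance",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Minimum"],
    }


def lowest_burst_credits(file_system_id, hours=24):
    """Return the lowest credit balance (bytes) seen in the window, or None."""
    import boto3  # imported here so the query helper works without the SDK

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**burst_credit_query(file_system_id, hours))
    datapoints = response["Datapoints"]
    return min(dp["Minimum"] for dp in datapoints) if datapoints else None


# Example (placeholder ID): lowest_burst_credits("fs-12345678")
```

A balance that falls to zero or near zero around the time the slowdown starts points to exhausted burst credits rather than a CPU or memory problem.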
We switched the file system to Provisioned Throughput mode so that it would have dedicated throughput rather than throughput tied to the amount of data stored. We provisioned 5 MiB/s to start small, with the plan to tune it further once performance testing is complete.
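The same change can be scripted. The sketch below is one possible way to do it with the third-party boto3 SDK and its EFS `update_file_system` call, not a record of how we made the change; the helper names and the file system ID in the usage comment are placeholders.

```python
def provisioned_throughput_request(file_system_id, mibps):
    """Build the update request that switches a file system to provisioned mode."""
    if mibps <= 0:
        raise ValueError("provisioned throughput must be a positive number of MiB/s")
    return {
        "FileSystemId": file_system_id,
        "ThroughputMode": "provisioned",
        "ProvisionedThroughputInMibps": float(mibps),
    }


def switch_to_provisioned(file_system_id, mibps):
    """Apply the change; needs AWS credentials and a region configured."""
    import boto3  # imported here so the request helper works without the SDK

    efs = boto3.client("efs")
    return efs.update_file_system(**provisioned_throughput_request(file_system_id, mibps))


# Example (placeholder ID): switch_to_provisioned("fs-12345678", 5)
```

Keeping the request builder separate from the API call makes it easy to review the exact parameters before anything is changed in the account.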