Report Server DB in a SQL Server Failover Cluster

Failover clustering is supported only for the report server database; you cannot run the Report Server service as part of a failover cluster.

To host a report server database on a SQL Server failover cluster, the cluster must already be installed and configured. You can then select the failover cluster as the server name when you create the report server database in the Database Setup page of the Reporting Services Configuration tool.

Although the Report Server service cannot participate in a failover cluster, you can install Reporting Services on a computer that has a SQL Server failover cluster installed. The report server runs independently of the failover cluster. If you install a report server on a computer that is part of a SQL Server failover instance, you are not required to use the failover cluster for the report server database; you can use a different SQL Server instance to host the database.

Hope this helps!

Happy learning!

Aman Kharbanda


Parameter Groups with Amazon RDS

A parameter group, in AWS terminology, would be best defined as a container with all the configuration values that are applied on a DB instance.

A default DB parameter group is created when you create a DB instance without specifying a customer-created DB parameter group.
This default group contains database engine defaults and Amazon RDS system defaults based on the engine, compute class, and allocated storage of the instance.
You cannot modify the parameter settings of a default DB parameter group; you must create your own DB parameter group to change parameter settings from their default value. Note that not all DB engine parameters can be changed in a customer-created DB parameter group.

Changing the values in your customized parameter group might call for an reboot of database – for e.g. Auto-Vacuum in PostgreSQL RDS is something which is not enabled by default and it’s a good practice to keep the vacuum enabled. In order to achieve this, one must create a new parameter group (clone it from one of the defaults), and then change parameters as necessary.

When you change a dynamic parameter and save the DB parameter group, the change is applied immediately regardless of the Apply Immediately setting. When you change a static parameter and save the DB parameter group, the parameter change will take effect after you manually reboot the DB instance.

To update your database with this parameter group, from the default group, calls for a reboot to bring in the effect.

Happy learning!

Aman Kharbanda

High-Availability with Amazon RDS

With Amazon RDS, the high-availability feature is not a scaling solution for read-only scenarios; you cannot use a standby replica to serve read traffic. To service read-only traffic, you should use a Read Replica.

RDS provides high availability and failover support for DB instances using Multi-AZ deployments. Amazon RDS uses several different technologies to provide failover support. Multi-AZ deployments for Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon’s failover technology. SQL Server DB instances use SQL Server Mirroring. Amazon Aurora instances stores copies of the data in a DB cluster across multiple Availability Zones in a single AWS Region, regardless of whether the instances in the DB cluster span multiple Availability Zones.

If you have a DB instance in a Single-AZ deployment and you modify it to be a Multi-AZ deployment, Amazon RDS takes a few steps at the backend.
Amazon RDS takes a snapshot of the primary DB instance from your deployment and then restores the snapshot into another AZ. Amazon RDS then sets up synchronous replication between your primary DB instance and the new instance. This action avoids downtime when you convert from Single-AZ to Multi-AZ, but you can experience a significant performance impact when first converting to Multi-AZ. This impact is more noticeable for large and write-intensive DB instances.
Once the modification is complete, Amazon RDS triggers an event (RDS-EVENT-0025) that indicates the process is complete.

Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups.

Happy learning!

Aman Kharbanda

How do I prioritize RedShift queries or segregate queries into different queues?

Workload Management (WLM) is an ordered set of query queues that define how resources are allocated and how queries are routed for processing.
In Amazon Redshift workload management (WLM), query monitoring rules define metrics-based performance boundaries for WLM queues and specify what action to take when a query goes beyond those boundaries. For example, for a queue dedicated to short running queries, you might create a rule that aborts queries that run for more than 60 seconds. To track poorly designed queries, you might have another rule that logs queries that contain nested loops.
WLM evaluates metrics every 10 seconds. If more than one rule is triggered during the same period, WLM initiates the most severe action—abort, then hop, then log. If the action is hop or abort, the action is logged and the query is evicted from the queue. If the action is log, the query continues to run in the queue. WLM initiates only one log action per query per rule. If the queue contains other rules, those rules remain in effect. If the action is hop and the query is routed to another queue, the rules for the new queue apply.

By default, changes to concurrency, memory allocation, or timeout duration are applied dynamically when you save. All other changes require a cluster reboot. Optionally, you can choose to defer dynamic changes.

Defer dynamic changes until reboot

Hope this helps.

Happy learning!

Aman Kharbanda

Limitations for Microsoft SQL Server if used on AWS as their RDS Service

So before you plan on moving your workloads from On-Prem SQL Server to the cloud, if your preferred public cloud vendor is going to be AWS and you wish to use their DBaaS i.e RDS then here are a few limitations, as of this writing, that you may encounter with SQL Server.
As always and with every cloud vendor, there are product updates almost every day, refer to their website for more up to date changes.

Limits for Microsoft SQL Server DB Instances

The Amazon RDS implementation of Microsoft SQL Server on a DB instance have some limitations you should be aware of:

  • You can create up to 30 databases on each of your DB instances running Microsoft SQL Server. The Microsoft system databases, such as master and model, don’t count toward this limit.
  • Some ports are reserved for Amazon RDS use and you can’t use them when you create a DB instance.
  • Amazon RDS for SQL Server does not support importing data into the msdb database.
  • You can’t rename databases on a DB instance in a SQL Server Multi-AZ with Mirroring deployment.
  • The maximum storage size for SQL Server DB instances is the following:
    • General Purpose (SSD) storage: 16 TB for all editions
    • Provisioned IOPS storage: 16 TB for all editions
    • Magnetic storage: 1 TB for all editions

    If you have a scenario that requires a larger amount of storage, you can use sharding across multiple DB instances to get around the limit. This approach requires data-dependent routing logic in applications that connect to the sharded system. You can use an existing sharding framework, or you can write custom code to enable sharding.

  • The minimum storage size for SQL Server DB instances is the following:
    • General Purpose (SSD) storage: 200 GB for Enterprise and Standard editions, 20 GB for Web and Express editions
    • Provisioned IOPS storage: 200 GB for Enterprise and Standard editions, 100 GB for Web and Express editions
    • Magnetic storage: 200 GB for Enterprise and Standard editions, 20 GB for Web and Express editions
  • Because of limitations in Microsoft SQL Server, restoring to a point in time before successful execution of a DROP DATABASE might not reflect the state of that database at that point in time. For example, the dropped database is typically restored to its state up to 5 minutes before the DROP DATABASE command was issued, which means that you can’t restore the transactions made during those few minutes on your dropped database. To work around this, you can reissue the DROP DATABASE command after the restore operation is completed. Dropping a database removes the transaction logs for that database.

Happy learning!

Aman Kharbanda

Workaround to get SQL Server’s Developer Edition on Amazon RDS

Amazon RDS doesn’t support running SQL Server Analysis Services, SQL Server Integration Services, SQL Server Reporting Services, Data Quality Services, or Master Data Services on the same server as your Amazon RDS DB instance. To use these features, the recommended way forward is that you install SQL Server on an Amazon EC2 instance, or use an on-premise SQL Server instance, to act as the Reporting, Analysis, Integration, or Master Data Services server for your SQL Server DB instance on Amazon RDS. You can install SQL Server on an Amazon EC2 instance with Amazon EBS storage, pursuant to Microsoft licensing policies.

Because of licensing requirements, AWS can’t offer SQL Server Developer edition on Amazon RDS. You can use Express edition for many development, testing, and other nonproduction needs. However, if you need the full feature capabilities of an enterprise-level installation of SQL Server, you must use a dedicated host environment. You can download and install SQL Server Developer edition (and other MSDN products) on Amazon EC2. Dedicated infrastructure is not required for Developer edition. By using your own host, you also gain access to other programmability features that are not accessible on Amazon RDS.

Hope this helps while you plan to host your SQL Server in Amazon world!

Happy Learning!

Aman Kharbanda

AWS S3 Data Lake Architecture

A data lake is an increasingly popular way to store and analyze data that addresses the challenges of dealing with massive volumes of heterogeneous data. A data lake allows organizations to store all their data—structured and unstructured—in one centralized repository. Because data can be stored as-is, there is no need to convert it to a predefined schema.

Many organizations understand the benefits of using Amazon S3 as their data lake. For example, Amazon S3 is a highly durable, cost-effective object start that supports Open Data Formats while decoupling storage from compute, and it works with all the AWS analytic services. Although Amazon S3 provides the foundation of a data lake, you can add other services to tailor the data lake to your business needs. For more information about building data lakes on AWS, see What is a Data Lake?

Because one of the main challenges of using a data lake is finding the data and understanding the schema and data format, Amazon recently introduced AWS Glue. AWS Glue significantly reduces the time and effort that it takes to derive business insights quickly from an Amazon S3 data lake by discovering the structure and form of your data. AWS Glue automatically crawls your Amazon S3 data, identifies data formats, and then suggests schemas for use with other AWS analytic services.

AWS Glue is an essential component of an Amazon S3 data lake, providing the data catalog and transformation services for modern data analytics.

In the preceding figure, data is staged for different analytic use cases. Initially, the data is ingested in its raw format, which is the immutable copy of the data. The data is then transformed and enriched to make it more valuable for each use case. In this example, the raw CSV files are transformed into Apache Parquet for use by Amazon Athena to improve performance and reduce cost.

The data can also be enriched by blending it with other datasets to provide additional insights. An AWS Glue crawler creates a table for each stage of the data based on a job trigger or a predefined schedule. In this example, an AWS Lambda function is used to trigger the ETL process every time a new file is added to the Raw Data S3 bucket. The tables can be used by Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR to query the data at any stage using standard SQL or Apache Hive. This configuration is a popular design pattern that delivers Agile Business Intelligence to derive business value from a variety of data quickly and easily.

Hope this helps!

Happy learning.

Aman Kharbanda