Channel: BI Future Blog

DevOps : master.dacpac does not exist error message in VSTS (part V)

One of the problems I had with automated builds was the database reference in the .sqlproj file. I noticed that the master and msdb dacpacs were referenced at a location on the C: drive. That works fine when you're the only developer, but it is not a good practice when you work in a team, and even less so when you use VSTS as your build environment. If you try to build in VSTS while the dacpac still points to that standard location, you'll get a message like this:

C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\Common7\
IDE\Extensions\Microsoft\SQLDB\Extensions\SqlServer\110\SqlSchemas\master.dacpac"
does not exist.

The way to get rid of this error message is to include the master.dacpac (and msdb.dacpac) files themselves in your project (via Add Database Reference with the location option), and so I did...

But building the project in VSTS kept giving this error. Investigating the .sqlproj file showed me that the reference was still there (?!), even after I had deleted the database reference in Solution Explorer.

I finally removed the entries from the .sqlproj file manually and the problem went away. These are the entries I removed:


<ArtifactReference Include="C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\Common7\IDE\Extensions\Microsoft\SQLDB\Extensions\SqlServer\110\SqlSchemas\master.dacpac">
  <HintPath>$(DacPacRootPath)\Extensions\Microsoft\SQLDB\Extensions\SqlServer\110\SqlSchemas\master.dacpac</HintPath>
</ArtifactReference>
<ArtifactReference Include="C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\Common7\IDE\Extensions\Microsoft\SQLDB\Extensions\SqlServer\110\SqlSchemas\msdb.dacpac">
  <HintPath>$(DacPacRootPath)\Extensions\Microsoft\SQLDB\Extensions\SqlServer\110\SqlSchemas\msdb.dacpac</HintPath>
</ArtifactReference>
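If you run into the same thing, you can also strip these entries with a few lines of PowerShell instead of editing the file by hand. This is just a minimal sketch, assuming your project file lives at the example path below:

# Remove ArtifactReference entries that point to an absolute path under Program Files
$sqlprojPath = "C:\Source\MyDatabaseProject\MyDatabaseProject.sqlproj"   # example path, adjust to your project
[xml]$proj = Get-Content $sqlprojPath -Raw
$ns = New-Object System.Xml.XmlNamespaceManager($proj.NameTable)
$ns.AddNamespace("msb", "http://schemas.microsoft.com/developer/msbuild/2003")
$proj.SelectNodes("//msb:ArtifactReference[starts-with(@Include, 'C:\Program Files')]", $ns) |
    ForEach-Object { [void]$_.ParentNode.RemoveChild($_) }
$proj.Save($sqlprojPath)

After this, only the references that do not point to the Visual Studio installation folder remain in the project file.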



Hennie







WideWorldImporters Data Warehouse Data Models


Introduction

Just a small blogpost about the data models of the WideWorldImporters data warehouse (WideWorldImportersDW). I wanted to get an overview of the data warehouse models of the WideWorldImporters demo database of SQL Server 2016. There seem to be six fact tables:
  • Order
  • Sale 
  • Purchase
  • Stock Holding
  • Movement
  • Transaction
WideWorldImportersDW is the main database for data warehousing and analytics (OLAP – OnLine Analytics Processing). The data in this database is derived from the transactional database WideWorldImporters, but it uses a schema that is specifically optimized for analytics.

Order

This is the order fact of the WideWorldImporters database, and there are a couple of (role-playing) dimensions here:
  • Order date and Picked Date (Date)
  • SalesPerson and Picker (Employee)
  • City
  • Date
  • Customer
  • StockItem





Sale

The sales fact contains almost the same dimensions as the order fact. There are a couple of (role playing) dimensions here:
  • InvoiceDate and DeliveryDate (Date).
  • SalesPerson (Employee).
  • City.
  • Date.
  • Customer and BillToCustomer (Customer).
  • StockItem


Purchase

The purchase Fact has the following dimensions :
  • Date
  • Supplier
  • StockItem



Stock holding 

The StockHolding fact is a bit odd, in my opinion. I don't know why and how, but there seems to be only one dimension:
  • Stock holding

Movement

The movement fact has the following dimensions:
  • Date
  • StockItem
  • Customer
  • Supplier
  • TransactionType




Transaction

And the last one, Transaction has the following dimensions:

  • Date
  • Customer and BillToCustomer (customer)
  • Supplier
  • TransactionType
  • PaymentMethod



Conclusion

A simple overview of the WideWorldImportersDW demo database of SQL Server 2016.

Greetz,

Hennie

DevOps : Investigating different options for 'building' a Data warehouse / BI project in VSTS (part V)


Introduction

I'm gaining more and more knowledge of and experience with the Microsoft VSTS environment and 'building' a typical data warehouse and BI project. Now, there is 'building' and there is 'building'. By building I mean the 'Build' option as we know it in Visual Studio, resulting in deployable packages. In Visual Studio you can build a solution or a project, and this results in a deployable package, for instance a dacpac or an asdatabase file.

A typical Data warehouse/BI project may consist of the following types of projects :
  • a Database project.
  • a SSIS project.
  • a SSRS project.
  • a SSAS Tabular project.
  • a SSAS Multidimensional project.

In order to set up a build and release process, the first step is to build these kinds of projects, and there seems to be some confusion on how to build them with VSTS. These are the options:
  • Use MSBuild Task.
  • Use Visual Studio Build Task.
  • Use Devenv.com commandline Task.
  • Use a third-party solution Task.
For each of these types of projects, I'll explore and describe the different build options.

Database project build

First, let's start with a database project. This is the main project of every Microsoft data warehouse/BI project. A database project stores all kinds of database objects, and those can be built into a so-called dacpac file. Such a dacpac file is deployable to different environments. There are a couple of options and I'll explore and describe them here. There is a lot of information available on building such projects in VSTS.

MSBuild build
This is the most popular option for building a database project. MSBuild is the Microsoft Build Engine and is used for building software, not only database projects but also software like C#. Below is a screenshot of a typical build with an MSBuild task in VSTS.



So there are five steps to building a database project:
  1. Get the code from the repository (I use Git).
  2. Choose the right agent (I use VS2017).
  3. Build the project with MSBuild.
  4. Copy the files from the build folder to the artifacts staging folder.
  5. Publish the files to the artifacts folder, where you can download the file.

I've included the specific build step settings here. Just specifying the .sqlproj file is enough to build the SQL projects. The **/* is a wildcard for every subfolder where a .sqlproj exists.
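Under the hood, the task simply calls MSBuild for each matching project. A rough local equivalent would be something like this (the MSBuild path, project name and configuration are examples, not taken from my pipeline):

# Roughly what the MSBuild task executes for one matching .sqlproj (paths and names are examples)
& "C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\15.0\Bin\MSBuild.exe" `
    ".\DBWideWorldImporters-SSDT\DBWideWorldImporters-SSDT.sqlproj" /t:Build /p:Configuration=Release

The result is the dacpac in the project's bin\<configuration> folder, which is what the copy and publish steps pick up.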



DevEnv.com build
Yet another option for building a database project is using devenv.com. Devenv.com is the command line executable of the Visual Studio IDE, which acts as a container for creating applications in different languages. In the following build process in VSTS I've replaced the MSBuild task with a command line task and used devenv.com as the command line utility to build the database project.



I've used the following settings for the commandline:


I specified the solution, whether or not to rebuild and the project location.
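For reference, the settings follow the same Tool/Arguments pattern as the devenv.com builds shown later in this post. The database project name and the configuration variable below are just examples, not my actual values:

Tool :
C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\devenv.com

Arguments:
"$(Build.SourcesDirectory)\WideWorldImporters-SSDT.sln"
/rebuild $(BuildConfiguration)
/project
"$(Build.SourcesDirectory)\DBWideWorldImporters-SSDT\DBWideWorldImporters-SSDT.sqlproj"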

Visual Studio Build
Yet another task that is available for building a database project is the Visual Studio Build task. Its options cover almost the same ground as MSBuild and the command line utility devenv.com.



I've used the following settings :


And here I say: build every .sqlproj file in any folder.

SSIS project

ETL with SSIS is also an important part of a data warehouse project, and therefore it needs a deployment package for releasing it in a DevOps environment. This will be an .ispac file.

MSBuild build
I tried to build an SSIS project with MSBuild, but MSBuild doesn't recognize the .dtproj file, unfortunately. That is a pity. The following error occurred:


The element <DeploymentModel> beneath element <Project> is unrecognized.

Devenv.com
It is possible to build an SSIS project with the devenv.com tool. This is done in exactly the same way as building a database project with devenv.com.



Below I've included the specific settings for building an SSIS project with devenv.com.

Tool :
C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\devenv.com

Arguments:
"$(Build.SourcesDirectory)\WideWorldImporters-SSDT.sln"
/rebuild $(BuildConfigurationSSIS)
/project
"$(Build.SourcesDirectory)\SSISWideWorldImporters-SSDT\SSISWideWorldImporters-SSDT.dtproj"

Visual Studio Build
Building the project with the Visual Studio Build task is also a problem. An error is returned when executing the Visual Studio Build:


The target "Build" does not exist in the project.


SSIS Build by ToxicGlobe
SSIS Build & Deploy is a 3rd party task developed by ToxicGlobe and available on the Marketplace. It is free, has quite a few downloads and a good rating. I've tested the build task and it works great.




There are a couple more 3rd party custom tasks that build and/or deploy SSIS projects. I haven't tried them myself.



SSRS project

Another important project involved in a Microsoft DWH and BI project is an SSRS project, a reporting project. Building a report project results in the .rdl files being copied to the build folder.

MSBuild build
I tried to build an SSRS project with MSBuild, but MSBuild doesn't recognize the .rptproj file, unfortunately. The following error occurred:


The target "Build" does not exist in the project.

So .rptproj project files are not supported by MSBuild either.

Devenv.com
Fortunately, with the devenv.com command line utility it is possible to build your SSRS project. Once set up properly, it runs smoothly.



These are the settings I used for building this project file:

Tool :
C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\devenv.com

Arguments:
"$(Build.SourcesDirectory)\WideWorldImporters-SSDT.sln"
/rebuild $(BuildConfigurationSSRS)
/project
"$(Build.SourcesDirectory)\SSRSWideWorldImporters-SSDT\SSRSWideWorldImporters-SSDT.rptproj"

Visual Studio Build
Building an SSRS project with the Visual Studio Build task gives the following error:


The element <DeploymentModel> beneath element <Project> is unrecognized.


3rd party task
There is a third party task, but it has been downloaded only 80 times (at the moment of writing), so it is not heavily used. I tried it briefly but didn't manage to make it work in VSTS.

SSAS tabular project

It is possible to build an SSAS tabular project with MSBuild, just like a database project.

MSBuild
Building a Tabular project is done with the MSBuild task as well; now the extension .smproj is used to select the projects.
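The task settings are essentially the same as for the database project build; only the wildcard changes. Something along these lines (the configuration variable is an example):

Project : **\*.smproj
Configuration : $(BuildConfiguration)

The build output is the .asdatabase file (together with the .deploymenttargets and .deploymentoptions files) in the bin folder.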


Devenv.com
I don't have a working example available for building a tabular project with Devenv.com.

Visual studio build
This is an example of building a Tabular project with the Visual Studio Build task:


3rd party
I didn't try the 3rd party examples for building a tabular project.

SSAS Multidimensional project

Again, building an SSAS Multidimensional project is not possible with MSBuild, so use devenv.com again to build the SSAS Multidimensional project.

Devenv.com



Tool :
C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\devenv.com

Arguments:
"$(Build.SourcesDirectory)\WideWorldImporters-SSDT.sln"
/rebuild $(BuildConfigurationSSIS)
/project "$(Build.SourcesDirectory)\SSASMDWideWorldImporters-SSDT\SSASMDWideWorldImporters-SSDT.DWproj"


A typical build

Now that we have gathered the findings of the experiments with the different options for building a BI project, we can set up a best practice for a build in VSTS. I have created the following steps:

  • Build Projects.
  • Copy the build outputs to the artefacts folder.
  • Publish the artefacts in the artefacts folder.



Now this results in the following artefacts:

Some files are not needed and can be excluded, for instance in the copy task or by setting the build options. It is also possible to group the different types of project files into separate project artefacts; in that case the setup of the build pipeline is a bit different.
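As an illustration, a local PowerShell equivalent of such a filtered copy step could look like the sketch below; the file extensions and the use of the predefined build variables are assumptions, pick whatever your projects actually produce:

# Copy only the deployable build outputs to the artifact staging folder (extensions are examples)
Get-ChildItem -Path $env:BUILD_SOURCESDIRECTORY -Recurse -Include *.dacpac, *.ispac, *.asdatabase, *.rdl |
    Copy-Item -Destination $env:BUILD_ARTIFACTSTAGINGDIRECTORY

In the hosted build itself, the same effect is achieved with the filter patterns of the Copy Files task.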

Final thoughts

There are several options for building data warehouse projects: MSBuild, Visual Studio Build, devenv.com and 3rd party builds. At first it can be very confusing which task to use for which project. Here is an overview of my experiments with the different tasks in VSTS.



Hennie






DevOps : Testing of a Database project in Azure Devops (Part VI)


Introduction

In my previous blogpost, I mostly talked about the Microsoft-hosted agent, and that works great when you want to build and release your data warehouse and BI project. The code is checked in, an automatic build is started, and the code is released to a release environment. But what about testing? Testing is one of the key elements of Azure DevOps (formerly known as VSTS): build the code, test it and release the code to a new environment. I'm using the terms VSTS and Azure DevOps interchangeably throughout this blogpost, as the name change happened just while I was writing it.

If you want to test your database code with unit testing scripts, there has to be an environment to run the test scripts against. As far as I can see, there are a couple of options here:
  • Deploy the database code to the localDB instance on the Azure Devops server and run the database tests on the localDB instance.
  • Deploy the code to an on-premises box and run the tests locally. For this setup there must be a local agent installation.
  • Deploy the code to an Azure Database / VM box and run some tests on that database/machine.
I've noticed that, as a starter, it is a good idea to begin with the first option: deploy the code to the localDB on the Azure DevOps server. It is possible to deploy and run the tests against the localDB on the Azure DevOps server, although it took me a while to figure it out. So this blogpost is a report of my experiments with unit testing, and automating it against a localDB instance, in a DevOps environment.

I've used an article by Rahul Mittal on CodeProject for setting up the unit testing, and I've used that example to automate my builds and testing in Azure DevOps.

DBUnitTesting project

The database test project as described in the article on CodeProject covers 7 types of tests on a database. For this blogpost I followed the article in detail and executed the steps. I can recommend doing this too if you want to know more about this subject.

There are a couple of important steps that are necessary to implement a database test project:
  • Create a database project in VS2015 or VS2017 (I used VS2017).
  • Create a testproject in VS2015 or 2017 together with the database project in the solution.
  • Define the tests in the project.
  • Execute and test the testproject.
As said before, there are 7 types of tests possible:
  • Counting the number of rows in a table.
  • Data checksum in order to check whether some data has changed in the table.
  • Check if the schema has changed.
  • A performance check if a stored procedure is performing within some time boundaries.
  • Check if a query or a stored procedure returns an empty resultset.
  • Check if a query or a stored procedure returns a non-empty resultset.
  • The standard 'not tested' option: "inconclusive".

The tests show up in the Test Explorer, where it is possible to run them.


I made a couple of mistakes while setting up the environment in Azure DevOps. One of them was that I didn't use the right dll. A helpful tool was vstest.console.exe for testing the different dlls locally; that way I quickly found out that I had to use the other dll.

C:\Users\Administrator\source\repos\SQLUnitTestingDB\TestCases\bin\Debug>
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\
Common7\IDE\CommonExtensions\Microsoft\TestWindow\vstest.console" testcases.dll
Microsoft (R) Test Execution Command Line Tool Version 15.7.2
Copyright (c) Microsoft Corporation. All rights reserved.

Starting test execution, please wait...
Passed SqlTest1
Passed SqlTest1
Passed SqlTest1
Passed SqlTest1
Passed SqlTest1
Passed RowCountCompany
Passed SqlTest1

Total tests: 7. Passed: 7. Failed: 0. Skipped: 0.
Test Run Successful.
Test execution time: 0.8713 Seconds

If you use the wrong dll you will get the following messages :


C:\Users\Administrator\source\repos\SQLUnitTestingDB\TestCases\bin\Debug>
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\
Common7\IDE\CommonExtensions\Microsoft\TestWindow\vstest.console"
sqlunittestingdb.dll
Microsoft (R) Test Execution Command Line Tool Version 15.7.2
Copyright (c) Microsoft Corporation. All rights reserved.

Starting test execution, please wait...
No test is available in
C:\Users\Administrator\source\repos\SQLUnitTestingDB\TestCases\bin\Debug\sqlunittestingdb.dll.
Make sure that test discoverer & executors are registered and
platform & framework version settings are appropriate and try again.

Additionally, path to test adapters can be specified using
/TestAdapterPath command. Example /TestAdapterPath:<pathToCustomAdapters>.


Build the database project in VSTS (Azure Devops)

1. Let's start with setting up the build process as a CI process in VSTS. The first step is to create a build definition. Click on Build & Release in the toolbar and click on Builds. When no build definitions have been created yet, you should see an empty Build Definitions screen.



2. Click New Pipeline. There are a number of options to choose from: select a source control system, the team project, the repository and the branch.


3. Click on Continue. Now you should see a template screen where you can choose a pipeline template. Choose the Empty Pipeline template.



4. Click on Apply, and the empty pipeline should look like this. You choose a name, get the sources and choose which agent runs the build.


5. Choose the Default Pipeline in the Agent pool selection.



6. Choose the Team project, repository, branch and set some options if you want.


7. Choose the Visual Studio Build task for the agent phase and click on Add.



8. Click on the Visual Studio Build task and set the following properties:
  • Name : Build SQLUnitTestingDB solution
  • Solution : SQLUnitTestingDB
  • Visual Studio Version : Visual Studio 2017
  • MSBuild Arguments : /t:Build;Publish /p:SqlPublishProfilePath=SQLUnitTestingDB.publish.xml


For the MSBuild arguments, I've included a publish profile in my project. This will publish the database to the localDB on the VSTS server.



9. Add a Visual Studio Test task to the pipeline and use the following settings:
  • Display name
  • Test files : **\*Testcases.dll !**\*TestAdapter.dll !**\obj\**
  • Search folder : $(System.DefaultWorkingDirectory)

Run the Database Unit Testing Pipeline

When you are done setting up the pipeline in VSTS (Azure DevOps), it is time to test it. If nothing is changed, the pipeline will run into an error. We get some results from the Visual Studio Test task, but not what we want:

Failed   RowCountCompany
Error Message:
Initialization method TestCases.CompanyScalar.TestInitialize threw exception.
System.Data.SqlClient.SqlException: System.Data.SqlClient.SqlException: A network-related or
instance-specific error occurred while establishing a connection to SQL Server.
The server was not found or was not accessible. Verify that the instance name
is correct and that SQL Server is configured to allow remote connections.
(provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
---> System.ComponentModel.Win32Exception: The system cannot find the file specified.
Stack Trace:
........
at Microsoft.Data.Tools.Schema.Sql.UnitTesting.SqlDatabaseTestService.OpenExecutionContext()
at Microsoft.Data.Tools.Schema.Sql.UnitTesting.SqlDatabaseTestClass.InitializeTest()
at TestCases.CompanyScalar.TestInitialize() in D:\a\1\s\TestCases\CompanyScalar.cs:line 24

The problem is that the configuration settings in the test project are set to localhost. Therefore the connection setting in the app.config should be changed to localDB.
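For example, with the default LocalDB instance on the build agent, the connection string in app.config would point to something like this (the database name is the one from this example project):

Data Source=(localdb)\MSSQLLocalDB;Initial Catalog=SQLUnitTestingDB;Integrated Security=True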


Commit the code to Azure DevOps and let's see whether the Visual Studio Test task will execute the test project. As shown in the screen below, the test run has failed, but 2 of the 7 tests were successful.




The ones that went wrong are the ones that assume there is data in the table.


It is important to start your scripts from scratch or from an initial situation with some data. In the latter case you have to execute some scripts or restore a backup of the database on the localDB. I'm not sure if it is possible to restore a database to the localDB instance. That is one for a future blogpost.

Final thoughts

Setting up testing in your database project has some explorative aspects to it. There aren't many resources that clearly explain the possibilities, options and how-tos. In this blogpost I've explored the possibility of testing a database project on the localDB. This blogpost is a report of that experiment.


Hennie

DevOps : Deploying a Tabular cube with Powershell


Introduction

One step in a SQL data warehouse DevOps project is to deploy an SSAS tabular project to an instance. In this blogpost I'm going to show you a script that I'm using for deploying SSAS tabular cubes. As inspiration for the deployment script I used information from blogger Harbinger Singh. I had to make some changes to the script to make it work in my situation.

Steps

In the script, I've created a couple of blocks of code :
  1. An array of cubes I want to deploy to the server. This will help me control which cubes to deploy. Another option is to loop over the content of a folder and deploy the cubes.
  2. Create a loop and loop through the array.
  3. Check if the cube is present and print a warning if it can't find the cube.
  4. Adjust the database connection strings in the .asdatabase file. I have multiple connections to databases and they must be changed.
  5. Adjust the .deploymenttargets file database connectionstring.
  6. Generate a .configsettings file. This file is not generated by the build of an SSAS tabular model.
  7. Adjust .configsettings file database connectionstrings with the desired connectionstrings.
  8. Not every cube uses connection strings to two databases, so there is a check whether there is a DWH_Control connection string in the .configsettings file.
  9. Adjust .deploymentoptions file database connectionstrings.
  10. Create the xmla script with AnalysisServices.Deployment wizard.
  11. The last step is to deploy the xmla script to the server with Invoke-ASCmd.

The code

The complete script is written below.

#---------------------------------------------------------------------
# AllCubes.SSASTAB.Dev.Script
#
#---------------------------------------------------------------------
# General variables
$path = "C:\<directory>"
$SSASServer = "localhost"
$DwDBnameDM = "DWH_Datamart"
$DwDBnameCTRL = "DWH_Control"
$DwServerName = "localhost"

# Structure bimname, CubeDB, modelname
$CubeArray = @(
("<filename1>" , "<cubeDB1>" , "<modelname1>"),
("<filename2>" , "<cubeDB2>" , "<modelname2>")
)

cls
Write-Host "------------------------------------"
foreach ($element in $CubeArray) {

$bim = $element[0]
$CubeDB = $element[1]
$CubeModelName = $element[2]

$AsDBpath = "$path\$bim.asdatabase"
$DepTargetpath = "$path\$bim.deploymenttargets"
$ConfigPath = "$path\$bim.configsettings"
$DeployOption = "$path\$bim.deploymentoptions"
$SsasDBconnection = "DataSource=$SsasServer;Timeout=0"
$DwDbDMConnString = "Provider=SQLNCLI11.1;Data Source=$DwServerName;Integrated Security=SSPI;Initial Catalog=$DwDBnameDM"
$DwDbCTRLConnString = "Provider=SQLNCLI11.1;Data Source=$DwServerName;Integrated Security=SSPI;Initial Catalog=$DwDBnameCTRL"
$IsDMConnStrPresent = [bool]0
$IsCTRLConnStrPresent = [bool]0

if (!(Test-Path $AsDBpath)) {
Write-Warning "$AsDBpath absent from location"
Write-Host "------------------------------------"
continue
}

#Adjust .asdatabase file database connectionstring
$json = (Get-Content $AsDBpath -raw) | ConvertFrom-Json
$json.model.dataSources | % {if($_.name -eq 'DWH_DataMart'){$_.connectionString=$DwDbDMConnString ; $IsDMConnStrPresent=[bool]1 }}
$json.model.dataSources | % {if($_.name -eq 'DWH_Control'){$_.connectionString=$DwDbCTRLConnString ; $IsCTRLConnStrPresent=[bool]1 }}
$json | ConvertTo-Json -Depth 10 | set-content $AsDBpath

#Adjust .deploymenttargets file database connectionstring
$xml = [xml](Get-Content $DepTargetpath)
$node = $xml.DeploymentTarget
$node.Database = $CubeDB
$node.Server = $SsasServer
$node.ConnectionString = $SsasDBconnection
$xml.Save($DepTargetpath)

# generate .configsettings as this file is not generated with the build.
if (($IsDMConnStrPresent) -and ($IsCTRLConnStrPresent)) {
'<ConfigurationSettings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300" xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400" xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400" xmlns:ddl500="http://schemas.microsoft.com/analysisservices/2013/engine/500" xmlns:ddl500_500="http://schemas.microsoft.com/analysisservices/2013/engine/500/500" xmlns:dwd="http://schemas.microsoft.com/DataWarehouse/Designer/1.0">
<Database>
<DataSources>
<DataSource>
<ID>DWH_DataMart</ID>
<ConnectionString>Provider=SQLNCLI11.1;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=DWH_Datamart</ConnectionString>
<ManagedProvider>
</ManagedProvider>
<ImpersonationInfo>
<ImpersonationMode xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">ImpersonateServiceAccount</ImpersonationMode>
<Account xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
</Account>
<Password xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
</Password>
<ImpersonationInfoSecurity xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">Unchanged</ImpersonationInfoSecurity>
</ImpersonationInfo>
</DataSource>
<DataSource>
<ID>DWH_Control</ID>
<ConnectionString>Provider=SQLNCLI11.1;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=DWH_Control</ConnectionString>
<ManagedProvider>
</ManagedProvider>
<ImpersonationInfo>
<ImpersonationMode xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">ImpersonateServiceAccount</ImpersonationMode>
<Account xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
</Account>
<Password xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
</Password>
<ImpersonationInfoSecurity xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">Unchanged</ImpersonationInfoSecurity>
</ImpersonationInfo>
</DataSource>
</DataSources>
</Database>
</ConfigurationSettings>' | Out-File -FilePath $path\$bim.configsettings
}
else {
'<ConfigurationSettings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300" xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400" xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400" xmlns:ddl500="http://schemas.microsoft.com/analysisservices/2013/engine/500" xmlns:ddl500_500="http://schemas.microsoft.com/analysisservices/2013/engine/500/500" xmlns:dwd="http://schemas.microsoft.com/DataWarehouse/Designer/1.0">
<Database>
<DataSources>
<DataSource>
<ID>DWH_DataMart</ID>
<ConnectionString>Provider=SQLNCLI11.1;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=DWH_Datamart</ConnectionString>
<ManagedProvider>
</ManagedProvider>
<ImpersonationInfo>
<ImpersonationMode xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">ImpersonateServiceAccount</ImpersonationMode>
<Account xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
</Account>
<Password xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
</Password>
<ImpersonationInfoSecurity xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">Unchanged</ImpersonationInfoSecurity>
</ImpersonationInfo>
</DataSource>
</DataSources>
</Database>
</ConfigurationSettings>' | Out-File -FilePath $path\$bim.configsettings
}

#Adjust .configsettings file database connectionstring
$xml = [xml](Get-Content $ConfigPath)
$nodeDM = $xml.ConfigurationSettings.Database.DataSources.DataSource | ? { $_.ID -eq $DwDBnameDM }
$nodeDM.ConnectionString = $DwDbDMConnString
$nodeCTRL = $xml.ConfigurationSettings.Database.DataSources.DataSource | ? { $_.ID -eq $DwDBnameCTRL }

# In case here is not a DWH_Control Connectionstring in the .configsettings file
if (![string]::IsNullOrEmpty($nodeCTRL))
{
$nodeCTRL.ConnectionString = $DwDbCTRLConnString
$xml.Save($ConfigPath)
}

#Adjust .deploymentoptions file database connectionstring
$xml = [xml](Get-Content $DeployOption)
$node = $xml.DeploymentOptions
$node.ProcessingOption = "DoNotProcess"
$xml.Save($DeployOption)

# Create the xmla script with AnalysisServices.Deployment wizard
Write-Host "Deploying Cube : $CubeDB"
cd $path
$exe = "C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\Microsoft.AnalysisServices.Deployment.exe"
$param1 = $bim + ".asdatabase"
$param2 = "/s:" + $bim + ".txt"
$param3 = "/o:" + $bim + ".xmla"
$param4 = "/d"
&($exe)($param1)($param2)($param3)($param4)

Write-Host "Importing SQL modules..."
# import modules
if ((Get-Module -ListAvailable | where-object {($_.Name -eq 'SqlServer') -and ($_.Version.Major -gt 20) } |Measure).Count -eq 1){
# implementation of new sql modules migated into new location
Import-Module SqlServer -DisableNameChecking
}
else{
# fallback for SQLPS
Import-Module SQLPS -DisableNameChecking -Verbose
}

Write-Host "Invoking deployment script... This may take several minutes."
Invoke-ASCmd -Server:$SsasServer -InputFile $path\$bim.xmla | Out-File $path\$bim.xml
Write-Host "Please check $path\$bim.xml as this is output of this deployment"
Write-Host "Done."
Write-Host "------------------------------------"
}



Final thoughts

Although it is quite a script, it is fairly easy to set up and deploy a cube with a PowerShell script. In order to use it in Azure DevOps you have to replace some of the variables with Azure DevOps variables to make it work as you desire.
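A minimal sketch of that idea: define the server and path values as pipeline variables in the release definition and read them at the top of the script instead of hard-coding them (the variable names are my own, not a given):

# Pipeline variables defined in the release (e.g. SSASServer, DwServerName) arrive as environment variables
$SSASServer   = $env:SSASSERVER
$DwServerName = $env:DWSERVERNAME
$path         = "$env:SYSTEM_ARTIFACTSDIRECTORY\<build artefact folder>"   # example location of the .asdatabase files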

Hennie


70-473 : Designing and Implementing Cloud Data Platform Solutions


Introduction

One of my targets for the coming year is to learn more about Azure. Recent news shows that Microsoft's earnings on Azure are increasing quite a lot, and cloud seems to be the future. I've already invested in courses like the Microsoft Professional Program Data Science and the Big Data program. These courses are a great resource if you want to know more about big data solutions with Azure and data science with Azure ML and R. As said before, I feel more and more confident and familiar with the Azure environment, and it's time to pass some Microsoft exams. The first one is exam 70-473: Designing and Implementing Cloud Data Platform Solutions. This exam is about relational databases like SQL Server, Azure SQL Database, MySQL and PostgreSQL. The latter two were added in June 2017 and replaced the SQL Data Warehouse section. Why? I don't know. I'm not into MySQL and PostgreSQL, but I want to pass the exam! Therefore I need to study some things I don't really need for my day-to-day work.

Now, I got the idea from MSSQLTips to copy the content of the exam requirements and search for relevant content to add to each requirement. I've used most of the information that was already present, but reorganized it and added my own links for studying the items.


General sources

Now, for this exam I studied quite a few resources. These are sometimes generic pieces of information about SQL Server, SQL Database, Azure or other topics. They help you with some background information:

General Microsoft Azure information



1. Design  and implement database solutions for Microsoft SQL Server and Microsoft Azure SQL Database


1.1 Design a hybrid SQL Server solution


1.2 Implement SQL Server on Azure Virtual Machines (VMs)


1.3 Design a database solution on Azure SQL database and SQL Server in Azure


1.4 Implement Azure SQL Database


1.5 Design and implement MySQL and PostgreSQL database solutions in Azure


2 Design and Implement Security


2.1 Design and implement SQL Server Database security


2.2 Design and implement Azure SQL Database security


3 Design for high availability, disaster recovery and scalability


3.1 Design and implement high availability solutions


3.2 Design and implement scalable solutions


3.3 Design and implement Azure SQL Database data recovery


4 Monitor and manage database implementations in Azure


4.1 Monitor and troubleshoot SQL Server VMs on Azure

4.3 Monitor and troubleshoot SQL Database

4.4 Automate and manage database implementations on Azure



Azure SQL Database exam terminology


Introduction 

I'm currently studying for Microsoft exam 70-473, Designing and Implementing Cloud Data Platform Solutions, and I have gathered some of the buzzwords that appeared while studying for this exam. I've written them down for myself, and for you, for reference and study. Good luck!

In the coming months I'll update this list to reflect my findings during investigation of  new study material for this exam. Some terminology is not yet explained, but I'll do that later.

The buzzwords explained

  • Active Directory Federation Services (ADFS) : 
  • Always On Availability Groups : An availability group supports a replicated environment for a discrete set of user databases, known as availability databases. You can create an availability group for high availability (HA) or for read-scale. An HA availability group is a group of databases that fail over together. 
  • Always On Failure Cluster instances :
  • App Service Authentication / Authorization : App Service Authentication / Authorization is a feature that provides a way for your application to sign in users so that you don't have to change code on the app backend. It provides an easy way to protect your application and work with per-user data.
  • Application Insights : Application Insights is an extensible Application Performance Management (APM) service for web developer. With Application Insights, you can monitor your live web applications and automatically detect performance anomalies.
  • Asynchronous communication : Communication between loosely coupled systems by using storage queues or Service Bus queues for later processing.
  • Availability database : A database that belongs to an availability group. For each availability database, the availability group maintains a single read-write copy (the primary database) and one to eight read-only copies (secondary databases).
  • Availability group : A container for a set of databases, availability databases, that fail over together.
  • Availability group listener : A server name to which clients can connect in order to access a database in a primary or secondary replica of an Always On availability group. Availability group listeners direct incoming connections to the primary replica or to a read-only secondary replica.
  • Availability Modes : The availability mode is a property of each availability replica. The availability mode determines whether the primary replica waits to commit transactions on a database until a given secondary replica has written the transaction log records to disk (hardened the log). Always On availability groups supports two availability modes—asynchronous-commit mode and synchronous-commit mode.
  • Availability sets : Azure positions the virtual machines in a way that prevents localized hardware faults and maintenance activities from bringing down all of the machines in that group. Availability sets are required to achieve the Azure SLA for the availability of Virtual Machines.
  • Availability Replica : Each set of availability databases is hosted by an availability replica. Two types of availability replicas exist: a single primary replica, which hosts the primary databases, and one to eight secondary replicas, each of which hosts a set of secondary databases and serves as a potential failover target for the availability group.
  • Azure Advisor : Azure Advisor is a personalized cloud consultant that helps you to optimize your Azure deployments. It analyzes your resource configuration and usage telemetry.
  • Azure Compute Units (ACU) : The concept of the Azure Compute Unit (ACU) provides a way of comparing compute (CPU) performance across Azure SKUs. This will help you easily identify which SKU is most likely to satisfy your performance needs. 
  • Azure Command-Line Interface (CLI) : The Azure CLI is Microsoft's cross-platform command-line experience for managing Azure resources. You can use it in your browser with Azure Cloud Shell, or install it on macOS, Linux, or Windows and run it from the command line.
  • Azure Disk Encryption : Azure Disk Encryption is a new capability that helps you encrypt your Windows and Linux IaaS virtual machine disks. It applies the industry standard BitLocker feature of Windows and the DM-Crypt feature of Linux to provide volume encryption for the OS and the data disks. 
  • Azure DNS : The Domain Name System, or DNS, is responsible for translating (or resolving) a website or service name to its IP address. Azure DNS is a hosting service for DNS domains, providing name resolution using Microsoft Azure infrastructure.
  • Azure Key Vault :
  • Azure Monitor : Azure Monitor offers visualization, query, routing, alerting, auto scale, and automation on data both from the Azure infrastructure (Activity Log) and each individual Azure resource (Diagnostic Logs).
  • Azure Load Balancer : Azure Load Balancer delivers high availability and network performance to your applications. It is a Layer 4 (TCP, UDP) load balancer that distributes incoming traffic among healthy instances of services defined in a load-balanced set. 
  • Azure Region :
  • Azure Resource Manager (ARM) : Azure Resource Manager enables you to work with the resources in your solution as a group.
  • Azure Resource Manager template : You use an Azure Resource Manager template for deployment and that template can work for different environments such as testing, staging, and production. 
  • Azure Scheduling Service: Azure Logic Apps is replacing Azure Scheduler, which is being retired. To schedule jobs, try Azure Logic Apps instead.
  • Azure Security Center (ASC) : Azure Security Center helps you prevent, detect, and respond to threats with increased visibility into and control over the security of your Azure resources. It provides integrated security monitoring and policy management across your Azure subscriptions, helps detect threats that might otherwise go unnoticed, and works with a broad ecosystem of security solutions.
  • Azure Service Fabric (ASF)
  • Azure Site Recovery (ASR) : Azure Site Recovery helps orchestrate replication, failover, and recovery of workloads and apps so that they are available from a secondary location if your primary location goes down.
  • Azure SQL Database Query Performance Insight : Managing and tuning the performance of relational databases is a challenging task that requires significant expertise and time investment. Query Performance Insight allows you to spend less time troubleshooting database performance by providing the following:
    • Deeper insight into your databases resource (DTU) consumption.
    • The top queries by CPU/Duration/Execution count, which can potentially be tuned for improved performance.
    • The ability to drill down into the details of a query, view its text and history of resource utilization.
    • Performance tuning annotations that show actions performed by SQL Azure Database Advisor
  • Azure SQL logical server : A logical server acts as a central administrative point for multiple single or pooled databases, logins, firewall rules, auditing rules, threat detection policies, and failover groups. A logical server can be in a different region than its resource group. The logical server must exist before you can create the Azure SQL database. 
  • Azure Storage : blobs, tables, queues
  • Azure Storage Analytics : Azure Storage Analytics performs logging and provides metrics data for a storage account. You can use this data to trace requests, analyze usage trends, and diagnose issues with your storage account. 
  • Azure subnets : 
  • Azure Traffic Manager (WATM) : Once a datacenter-specific failure occurs, you must redirect traffic to services or deployments in another datacenter. This routing can be done manually, but it is more efficient to use an automated process. 
  • Azure Virtual Network (VNET) : An Azure virtual network (VNet) is a representation of your own network in the cloud. It is a logical isolation of the Azure network fabric dedicated to your subscription. You can fully control the IP address blocks, DNS settings, security policies, and route tables within this network
  • Always Encrypted : 
  • Blob :
  • Bring Your Own Key (BYOK) :
  • Cold data : Cold data is inactive data that is rarely used, but must be kept for compliance reasons.
  • Compute scale-units :
  • Compute optimized VM : High CPU-to-memory ratio. Good for medium traffic web servers, network appliances, batch processes, and application servers.
  • Content Delivery Network (CDN) :
  • Cross-Origin Resource Sharing (CORS) : Cross-Origin Resource Sharing (CORS) is a mechanism that allows domains to give each other permission for accessing each other’s resources. The User Agent sends extra headers to ensure that the JavaScript code loaded from a certain domain is allowed to access resources located at another domain. 
  • Data in motion :
  • Database Transaction Unit (DTU) : The Database Transaction Unit (DTU) represents a blended measure of CPU, memory, reads, and writes. 
  • DBaaS : Database as a Service
  • Direct Server Return : 
  • Disaster Recovery (DR) :
  • Disk Striping :
  • DTU-based purchasing model : This model is based on a bundled measure of compute, storage, and IO resources. Compute sizes are expressed in terms of Database Transaction Units (DTUs) for single databases and elastic Database Transaction Units (eDTUs) for elastic pools.
  • Dynamic Scalability : Dynamic scalability is different from autoscale. Autoscale is when a service scales automatically based on criteria, whereas dynamic scalability allows for manual scaling without downtime. Dynamic scalability enables your database to transparently respond to rapidly changing resource requirements and enables you to only pay for the resources that you need when you need them.
  • eDTU : eDTU measures the shared resources in an elastic pool. A pool is given a set number of eDTUs for a set price. Within the elastic pool, individual databases are given the flexibility to auto-scale within the configured boundaries. A database under heavier load will consume more eDTUs to meet demand. Databases under lighter loads will consume less eDTUs
  • Effective availability : Effective availability considers the Service Level Agreements (SLA) of each dependent service and their cumulative effect on the total system availability.
  • Elastic : 
  • Elastic Pool : 
  • Encryption at rest : There are three Azure storage security features that provide encryption of data that is “at rest”: Storage Service Encryption allows you to request that the storage service automatically encrypt data when writing it to Azure Storage. Client-side Encryption also provides the feature of encryption at rest. Azure Disk Encryption allows you to encrypt the OS disks and data disks used by an IaaS virtual machine.
  • Encryption in transit : Encryption in transit is a mechanism of protecting data when it is transmitted across networks. Transport-level encryption, such as HTTPS when you transfer data into or out of Azure Storage. Wire encryption, such as SMB 3.0 encryption for Azure File shares. Client-side encryption, to encrypt the data before it is transferred into storage and to decrypt the data after it is transferred out of storage.
  • Estimated Recovery Time (ERT) : 
  • Express route :  Microsoft Azure ExpressRoute is a dedicated WAN link that lets you extend your on-premises networks into the Microsoft cloud over a dedicated private connection facilitated by a connectivity provider.
  • Fail Over :
  • Fail Over groups :
  • Fault Detection : 
  • Fault Domain (FD):
  • Fault Tolerance : A fault tolerant solution detects and maneuvers around failed elements to continue and return the correct results within a specific timeframe.
  • Feature parity :
  • Forced tunneling: Forced tunneling is a mechanism you can use to ensure that your services are not allowed to initiate a connection to devices on the Internet.
  • Geo-redundant storage (GRS)
  • hardware Security modules (HSMs) : Key Vault provides the option to store your keys in hardware Security modules (HSMs) certified to FIPS 140-2 Level 2 standards. 
  • High Availablity/Disaster Recovery (HADR) :  
  • High Availability (HA) : A highly available cloud solution should implement strategies to absorb the outage of the dependencies of services that are provided by the cloud platform.
  • High performance compute VM : Our fastest and most powerful CPU virtual machines with optional high-throughput network interfaces (RDMA).
  • Hyperscale Databases :
  • Hyperscale service tier
  • General purpose VM : Balanced CPU-to-memory ratio. Ideal for testing and development, small to medium databases, and low to medium traffic web servers.
  • Geo-Replication : 
  • Geo-Replication Support (GRS) :
  • GPU optimized VM : Specialized virtual machines targeted for heavy graphic rendering and video editing, as well as model training and inferencing (ND) with deep learning. Available with single or multiple GPUs.
  • PaaS :
  • Premium Storage :
  • IaaS :
  • Internal Load Balancer (ILB) : Load balance traffic between virtual machines in a virtual network, between virtual machines in cloud services, or between on-premises computers and virtual machines in a cross-premises virtual network. This configuration is known as internal load balancing.
  • Log Analytics:  Provides an IT management solution for both on-premises and third-party cloud-based infrastructure (such as AWS) in addition to Azure resources. Data from Azure Monitor can be routed directly to Log Analytics so you can see metrics and logs for your entire environment in one place.
  • Load balancing : 
  • Locally redundant storage (LRS) :
  • Log shipping :
  • Managed Disks :  Managed Disks handles storage behind the scenes. In addition, when virtual machines with Managed Disks are in the same availability set, Azure distributes the storage resources to provide appropriate redundancy. Microsoft recommends Managed Disks for SQL Server.
  • Memory optimized VM : High memory-to-CPU ratio. Great for relational database servers, medium to large caches, and in-memory analytics.
  • Multi-master replication : Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group, and resolving any conflicts that might arise between concurrent changes made by different members.
  • Multi-Factor Authentication (MFA) :
  • Multi-Tenant :
  • Network access control : Network access control is the act of limiting connectivity to and from specific devices or subnets and represents the core of network security. 
  • Network Security groups (NSGs) :  Network Security groups (NSGs) can be used on Azure Virtual Network subnets containing App Service Environments to restrict public access to API applications.
  • Next Generation Firewall (NGFW) : Add a Next Generation Firewall Recommends that you add a Next Generation Firewall (NGFW) from a Microsoft partner to increase your security protections
  • Pricing Tier :
  • Priority routing method : 
  • Public Virtual IP (VIP) :
  • Resilience :
  • Read-access geo-redundant storage (RA-GRS) :
  • Read-scale balancing : A read scale availability group provides replicas for read-only workloads but not high availability. In a read-scale availability group there is no cluster manager.
  • Read scale out : Read scale-out is a feature where you get a read-only replica of your data on which you can execute demanding read-only queries such as reports. The read-only replica handles your read-only workload without affecting resource usage on your primary database.
  • Read Only (RO) :
  • Replication option :  Locally redundant storage (LRS), Zone-redundant storage (ZRS), Geo-redundant storage (GRS), Read-access geo-redundant storage (RA-GRS)
  • Role-Based Access Control (RBAC) :You can secure your storage account with Role-Based Access Control (RBAC). Restricting access based on the need to know and least privilege security principles is imperative for organizations that want to enforce Security policies for data access
  • RPO : Recovery Point Objective is the acceptable amount of data lost in case of a disaster.
  • RTO : Recovery Time Objective is the acceptable amount of time required to recover from a disaster.
  • Secure Sockets Layer (SSL) : Enforcing SSL connections between your database server and your client applications helps protect against "man in the middle" attacks by encrypting the data stream between the server and your application.
  • Service Tier :
  • Shared Access Signature (SAS) : A shared access signature (SAS) provides delegated access to resources in your storage account. The SAS means that you can grant a client limited permissions to objects in your storage account for a specified period and with a specified set of permissions. 
  • Standard Storage : Standard Storage has varying latencies and bandwidth and is only recommended for dev/test workloads. This includes the new Standard SSD storage.
  • Stretch database: Stretch Database in SQL Server 2017 migrates your historical data transparently to Azure. Using the Stretch Database approach only applies to SQL Server 2017 and does not apply to SQL Database.
  • Stretch Database Advisor
  • Scalable solutions : Scalable solutions are able to meet increased demand with consistent results in acceptable time windows
  • Scale units
  • Sharding
  • SKU :
  • Stateless :
  • Stateless compute nodes :
  • Storage scale-units :  Examples are Azure Table partition, Blob container, and SQL Database.
  • Storage optimized VM : High disk throughput and IO. Ideal for Big Data, SQL, and NoSQL databases.
  • System-versioned temporal tables :
  • Read and Write access (RW) :
  • Time-To-Live (TTL) : Time to live (TTL) or hop limit is a mechanism that limits the lifespan or lifetime of data in a computer or network. TTL may be implemented as a counter or timestamp attached to or embedded in the data. 
  • Transient : The word “transient” means a temporary condition lasting only for a relatively short time. 
  • Transparent Data Encryption (TDE) :
  • Tenant :
  • Update Domain (UD) :
  • vCore model : 
  • vCore-based purchasing model : This model allows you to independently choose compute and storage resources. It also allows you to use Azure Hybrid Benefit for SQL Server to gain cost savings. Managed Instances in Azure SQL Database only offer the vCore-based purchasing model.
  • VPN gateway : To send network traffic between your Azure Virtual Network and your on-premises site, you must create a VPN gateway for your Azure Virtual Network. 
  • web application firewall (WAF) : The web application firewall (WAF) in Azure Application Gateway helps protect web applications from common web-based attacks like SQL injection, cross-site scripting attacks, and session hijacking. 
  • Windows Server Failover Cluster (WSFC) : Deploying Always On availability groups for HA on Windows requires a Windows Server Failover Cluster (WSFC). Each availability replica of a given availability group must reside on a different node of the same WSFC.
  • Zone-redundant storage (ZRS)
  • Zone-redundant Databases

MySQL

  • MySQL Community Edition : MySQL Community Edition is the freely downloadable version of the world's most popular open source database. It is available under the GPL license and is supported by a huge and active community of open source developers.
  • MySQL Document Store including X Protocol, XDev API and MySQL Shell
  • Transactional Data Dictionary :
  • Pluggable Storage Engine Architecture : 
  • InnoDB :
  • NDB :
  • MyISAM :
  • MySQL Replication :
  • MySQL Group Replication for replicating data while providing fault tolerance, automated failover, and elasticity
  • MySQL InnoDB Cluster : Deliver an integrated, native, high availability solution for MySQL
  • MySQL Router : For transparent routing between your application and any backend MySQL Servers
  • MySQL Partitioning : to improve performance and management of large database applications
  • Performance Schema for user/application level monitoring of resource consumption
  • MySQL Workbench : for visual modeling, SQL development and administration

PostgreSQL


  • Azure_maintenance - This database is used to separate the processes that provide the managed service from user actions. You do not have access to this database.
  • Azure_sys - A database for the Query Store. This database does not accumulate data when Query Store is off; this is the default setting. For more information, see the Query Store overview.
  • Continuous sync capability : DMS performs an initial load of your on-premises to Azure Database for PostgreSQL, and then continuously syncs any new transactions to Azure while the application remains running. After the data catches up on the target Azure side, you stop the application for a brief moment (minimum downtime), wait for the last batch of data (from the time you stop the application until the application is effectively unavailable to take any new traffic) to catch up in the target, and then update your connection string to point to Azure. 
  • GIN operator
  • Supported PostgreSQL Database Versions (at the moment of writing: 10.4, 9.6.9 and 9.5.13).
  • pg_dump :
  • pg_restore :
  • Postgres - A default database you can connect to once your server is created.
  • PostgreSQL Extensions : PostgreSQL provides the ability to extend the functionality of your database using extensions. Extensions allow for bundling multiple related SQL objects together in a single package that can be loaded or removed from your database with a single command. After being loaded in the database, extensions can function as do built-in features
  • Pricing Tiers : Basic, General purpose and memory optimized.



Hennie

Azure Data Factory Series : Create Azure Data Factory with SSIS runtime


Introduction

This blogpost describes how to create an Azure Data Factory. The plan is to write some follow-up blogposts about running SSIS packages in Azure Data Factory. For now, this blogpost describes how to create and set up the Azure Data Factory. It is an introduction to Azure Data Factory.

A prerequisite for creating an Azure Data Factory is an Azure subscription. I'm using a Visual Studio Professional subscription.

For later use I've downloaded and installed the Storage Explorer. I expect to need that for uploading some data to the storage account.

Create a data factory

1. The first step is to log in to Azure.
2. The second step is to create the Azure Data Factory resource in the Azure Portal. In the resource blade select Create a Resource -> Analytics -> Data Factory.


3. On the New Factory blade, I put a descriptive name in the Name field and press Create.


And the Azure Data Factory is created in Azure.



4. When you open the Azure Data Factory, it's possible to browse the settings and view the monitoring.




5. Select Author & Monitor to start working in the Azure Data Factory.


6. Now let's configure the SSIS Integration Runtime in the Azure Data Factory. Click on Configure SSIS Integration Runtime.


7. Now enter some settings in the General settings tab.



8. Next, enter the settings in the SQL Settings tab and click on Test connection to make sure that the database server can be reached.




9. Enter some advanced settings. I'll not go into much detail here because this is an introductory blogpost.




10. Now it will take some time to create the SSIS runtime in Azure. In my case it took exactly 30 minutes to finish.



11. And we are done! There are two Integration Runtimes created: the default Azure IR and the SSIS runtime. We are up and running!


12. Let's see how SSMS can connect to the server. We now see that there is an SSISDB installed alongside the Azure SQL Database.


13. And if you want to access the SSIS catalog, some settings are needed when connecting to the catalog. You can change these in the Options tab.




14. This is the result of connecting to the IS catalog.
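As a quick check that the catalog is really there, you can also run a small query against the SSISDB catalog views. A minimal sketch (folders and projects will of course only appear after you deploy something):

USE SSISDB;
GO

-- List the folders and projects that have been deployed to the SSIS catalog.
SELECT f.name AS folder_name,
       p.name AS project_name,
       p.last_deployed_time
FROM catalog.folders AS f
LEFT JOIN catalog.projects AS p ON p.folder_id = f.folder_id;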



Well, this was a small introduction to setting up Azure Data Factory with an SSIS runtime. I hope you like it and see you in future blogposts.


Final thoughts

Setting up Azure Data Factory with an SSIS runtime is very easy. More ADF blogposts to come...


Hennie


Certifications and courses


Introduction

I have used some of my "in between jobs" time to study and learn more about the Microsoft stack, and I participated in a course on "Fact Oriented Modelling", aka FOM, aka FCO-IM, formerly known as NIAM. The Microsoft certifications are more technically oriented; the course is more about data- and information modelling based on facts.

Microsoft certifications

Now, in the last month, I've studied for the following certifications:
  • MCSA SQL Server 2016 Business intelligence development.
  • MCSA SQL Server 2016 Database development.
  • MCSA BI reporting.
  • MCSE Data management and Analytics.
  • Microsoft Certified - Azure fundamentals.
I've already earned the MCSA SQL server 2012/2014 a couple of years ago. The badges of the achieved certifications are shown in the picture, below. 



The MCSA certifications are based on on-premises knowledge, but Azure knowledge is required as well. The Azure Fundamentals certification is based on exam AZ-900 and is an exam of basic knowledge of Azure.

Fact oriented modelling

Fact Oriented Modelling (FOM) is another area of interest. FOM is based on NIAM and focuses on conceptual modelling, logical modelling and an algorithmic way of deriving an ER model from verbalized facts.



FOM is not a very broadly used modelling methodology, but has some important aspects I have not seen before:
  • It is based on the communication about things in the world around us and not on the objects themselves, in contrast with what traditional modelling techniques like those of Codd and Chen prescribe. If nobody talks about something (verbal communication, reports, Excel, etc.), it is not important enough to model in data structures. That makes sense.
  • It has a strong PDCA cycle across conceptual, logical and physical modelling. From what we learn from the communication, sentences are created (verbalization), and a logical and a physical model are designed. From the data that is present in the tables, we can recreate the sentences and show them to the users in order to verify that the model is properly created. That is PDCA in data modelling! This is much more intuitive than normal ER modelling with the normal forms.
These are just some aspects of FOM I've learned. I'll blog about this topic in the future to show more aspects of FOM. 



Hennie

Fact Oriented Modeling Introduction (Part I)


Introduction

Today, I want to write something about Fact Oriented Modelling (FOM). FOM is not about modeling the objects in the real world, but about modeling the communication about the objects in the world around us. This is a different focus than methods like those of Codd or Chen. During projects, I gathered information about the area of interest and one of the next steps was trying to imagine the objects and model the data. For instance, you have patient data and you define the Patient entity. This approach is different from FOM. With FOM you gather the information from communication and verbalize the information in so-called fact expressions.

Semantics

I've noticed that during the discussions I've had so far, semantics is a much-used keyword. Although I had a rough idea of the meaning of this word, I decided to google it, and here is what I found:

"The branch of linguistics and logic concerned with meaning. The two main areas are logical semantics, concerned with matters such as sense and reference and presupposition and implication, and lexical semantics, concerned with the analysis of word meanings and relations between them."

In my opinion, this is saying that semantics is the area of understanding the meaning of communication between (business) people. As a data modeler it is important to understand the wording, the meaning and the relations between the words.

Verbalization

Fact expressions are important in FOM. FOM expressions are based on predicate logic: they are true or not. For instance you can say something like: "There is a student called Peter Janssen" or "Order 12345 is ordered on February 15th, 2019". The first one is a so-called postulated existence expression (I hope I translated that from Dutch correctly) and the latter one is an elementary fact expression, meaning it contains the minimal information needed to identify the fact. In other words, there is no redundant information and no missing information either.

For this blogpost, I've used the following examples (translated from Dutch) from the book Fully Communication Oriented Information Modeling (FCO-IM) by G.Bakema, J. Zwart and H. van der Lek. This is a very readable book about FOM. English version here.

There is a student Peter Jansen.
There is a student Jan Hendriks.
Student Peter Jansen lives in Nijmegen.
Student Jan Hendriks lives in Nijmegen.
Internship S101 is available.
Internship S102 is available.
Student Peter Jansen prefers nr 1 internship S101.
Student Peter Jansen prefers nr 2 internship S203.
Internship S101 takes place in Nijmegen.
Internship S102 takes place in Eindhoven.
Internship S101 is developing a time registration program.
Internship S102 is researching CASE tooling.
Student Peter Jansen is assigned to internship S101.
Student Jan Hendriks is assigned to internship S203.


As you can see, these sentences are easier for business users to verify than a Bachman or a Chen diagram. Users can say: "No, that is not correct, it should be this or that". So this is the first, but very important, step in data modeling with FOM. I've not seen this kind of approach earlier. You can say that this is the conceptual level of modelling! It models the facts in the communication.

Qualification and Classification

When you are satisfied with the verbalization of the facts, the next step starts: grouping the fact expressions into categories and giving each group a name. This is called qualification and classification. For instance, sentences like "There is a student Peter Jansen" and "There is a student Jan Hendriks" are grouped together and named "Student".

[Student]
There is a student Peter Jansen.
There is a student Jan Hendriks.

[Residence]
Student Peter Jansen lives in Nijmegen.
Student Jan Hendriks lives in Nijmegen.

[Internship]
Internship S101 is available.
Internship S102 is available.

[Internshippreference]
Student Peter Jansen prefers nr 1 internship S101.
Student Peter Jansen prefers nr 2 internship S203.

[Internshiplocation]
Internship S101 takes place in Nijmegen.
Internship S102 takes place in Eindhoven.

[Internshipdescription]
Internship S101 is developing a time registration program.
Internship S102 is researching CASE tooling.

[internshipassignment]
Student Peter Jansen is assigned to internship S101.
Student Jan Hendriks is assigned to internship S203.


Creating an Information Grammatical Diagram (IGD)

When verbalization is done, the next step is executed: designing an IGD. This model is not used for communication with end users, because the diagram can be overwhelming and difficult for novice users to understand. IT people prefer abstract diagrams to understand the area of interest better.

Now, you can do this manually or you can use a tool like CaseTalk. Let's take the first fact expression and try to identify the labels and objects. Objects are things that we want to know more about; objects should have a unique identification. Labels are descriptive information.

Below, I have used CaseTalk to identify the labels firstname and lastname. On the right an impression of the diagram is shown. FactType Student has two roles with two labels firstname and lastname.


This results in the following part of an IGD: a fact expression with placeholders 1 and 2 that can be instantiated with "Peter Jansen" and "Jan Hendriks". I've entered the second sentence into CaseTalk too.
The next step is to enter all of the sentences aka fact expressions into CaseTalk. For instance, when the next fact expression is entered, the diagram changes into the following:


Now we can derive two fact expressions from this model: "There is a student Jan Hendriks" and "Student Peter Jansen lives in Nijmegen".

When all of the Fact expressions are entered in CaseTalk the diagram appears as follows:


In this diagram all of the fact types are added and the fact expressions can be derived from the model. For instance,

F2 : <3> lives in <4>.
Role 3 is played by the nominalized objecttype Student which is Student <1><2>.
Role 1 is played by labeltype first name and role 2 is played by labeltype last name.
Role 4 is played by Nominalized objecttype Place <5>.
Role 5 is played by the labeltype placename.


Now this results in the following substitution:

"Student Jan Hendriks lives in Nijmegen"

Final thoughts

This was a short description of Fact Oriented Modelling. I've explained verbalization, classification and qualification, and deriving an IGD from fact expressions. In the next blogpost, I'll focus on the constraints of a model. Although there is a structure in the model, more limitations/constraints are possible, for instance that there can be only one student with the same name. This will be the subject of the following blogpost.

Hennie

Fact Oriented Modeling constraints (part II)


Introduction

In my previous blogpost about FOM, I discussed the basics of FOM. In this blogpost I would like to elaborate on constraints in FOM. Constraints are important in data modelling because they limit the degrees of freedom (of the data) in the model. Below, I'll describe the different constraints that can be applied to FOM data models.

Constraints
As said before, there are different constraints possible in FOM :
  • Value Constraints.
  • Uniqueness Constraints.
  • Totality Constraints.
  • Subset Constraints.
  • Equality Constraints.
  • Mutually Exclusive Constraints.
  • Number constraints.

Value Constraints
Value constraints are limitations on the values of certain labeltypes; for instance, a sequencenumber can only be 1, 2 or 3. You can add this in CaseTalk. 




This results in the following change in the diagram. It is now visible that sequencenumber can only be 1, 2 or 3.
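When a physical model is generated from this, such a value constraint typically surfaces as a CHECK constraint. A minimal sketch, with assumed table and column names:

ALTER TABLE InternshipPreference
    ADD CONSTRAINT CK_InternshipPreference_sequencenumber
    CHECK (sequencenumber IN (1, 2, 3));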


Uniqueness Constraints
Yet another constraint is uniqueness, meaning that values are unique in a population. You can do this with the menu option Set Unique Constraint in CaseTalk.


This results in the following constraint on Student, saying that there can be only one student with the same firstname and lastname.


There are also other combinations possible, for instance :


Here you say that Peter Jansen is unique, which means that Peter cannot live in two places at the same time. The other way around: in Nijmegen it is possible that two students live there, so no uniqueness constraint is needed on the place.

Another example is presented below. Here you can see that there are two uniqueness constraints on separate roles, meaning that the student is unique and the internship is unique. This means that a student can only do one internship and an internship can only be done by one student.


Here is a more complicated uniqueness constraint. In this diagram there are 3 roles and two uniqueness constraints:
  • Student and internship.
  • Student and sequencenumber.

The combination of student and internship is unique. That makes sense: you cannot apply twice for the same internship. The other uniqueness constraint, on student and sequencenumber, is also logical. A student cannot apply for two internships with the same sequencenumber; the student should rank the internships.


After determining the uniqueness constraints a couple of tests are necessary and they are organized as follows:
  • Elementary test
    • n-1 rule test
    • n rule test
    • Projection/Join test
  • Nominalisation test


Totality Constraints.
Totality constraints say that every tuple of a (nominalised) facttype must be present in the involved role. For instance: "Every student should have a place to live", meaning that when a student is known, the place is also known. A student without a place is not possible in this semantics; the fact is that a student has a place. So you express this with a totality constraint. How to set this up in CaseTalk?

I had to figure this one out but I managed to make a totality constraint. First you have to press the <CTRL> button and then click on the role you want to put a Totality constraint on.



When you are done, the following diagram shows the totality constraints.


Setting totality constraints should be handled with care, because generating a data model from a totality constraint can result in a NOT NULL column.
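
To make that mapping concrete, here is a minimal sketch, with assumed table and column names, of how the uniqueness and totality constraints above could surface in SQL:

-- Both uniqueness constraints become UNIQUE constraints,
-- and the totality constraints on the roles become NOT NULL columns.
CREATE TABLE InternshipAssignment
(
    firstname  VARCHAR(30) NOT NULL,
    lastname   VARCHAR(30) NOT NULL,
    internship CHAR(4)     NOT NULL,
    CONSTRAINT UQ_Assignment_Student    UNIQUE (firstname, lastname),  -- a student does one internship
    CONSTRAINT UQ_Assignment_Internship UNIQUE (internship)            -- an internship is done by one student
);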

Subset Constraints.
Subset constraints say something about subsetting of particular roles: one set of roles is a subset of another set of roles. The values of one set of roles should be present in the other set of roles.

Let's experiment with this in CaseTalk. First press the subset constraint icon in the top left of the diagram pane.


This will show the following window and here you can enter the subset constraint with the From and To part.


And here is the result of the subset constraint.


The plotted size of the constraint is a bit large in contrast with the rest of the diagram. You can change that in the diagram with Style and Options.

And here you set the font size.



Equality Constraints.
An equality constraint is needed when two sets of roles are equal. It is entered as two subset constraints, with only the From and To swapped.

Mutually exclusive constraints
Mutually exclusive constraints are constraints that are exclusive between two roles: these roles cannot have a common population. For instance, you're married or divorced; you cannot be both at the same time.

Number constraints
Number constraints are constraints that limit the number of times a value may occur in a role. For instance, a student can enter at most three preferences for an internship.

Final thoughts

Setting the constraints on the FOM datamodel with CaseTalk limits the degrees of freedom in your information model and helps you structure the information and, ultimately, the physical model.

Regards,

Hennie de Nooijer

Powerdesigner series : How to organize your workspace in Powerdesigner


Introduction

In this blogpost I'll outline the concept of workspaces in Powerdesigner. A workspace is a container of models that you want to organize in a logical manner. Although a workspace in Powerdesigner is comparable with a Visual Studio solution, there are some differences between the two concepts.

Similarities between Powerdesigner and Visual Studio projects are that you can organize different files in a container, you can use folders and store files in a hierarchy of folders. It is possible to have multiple workspaces on a machine.

Both tools also have the concept of projects. In the Solution Explorer of Visual Studio you can create projects, as you can in Powerdesigner. Although I haven't investigated projects very thoroughly yet, projects seem quite different in Powerdesigner. In Visual Studio, files are organized in (one or more) projects in a solution, whereas projects in Powerdesigner are meant to show the relationships between models and what the dependencies are.

Start with a workspace

Let's start looking at the concept of workspaces. When Powerdesigner is started a workspace is already there (in contrast with Visual Studio).


Now you can save the workspace at a certain place on your computer and give the file a proper name.


When the workspace file is saved, a .sws file is created with the name that you gave.


Rename a workspace

Now saving the file with another name than the standard "workspace" does not change the name in Powerdesigner. You have to rename that too.


And when you have done this, the workspace in Powerdesigner is renamed accordingly.


So when starting Powerdesigner, it is advisable to save the workspace in a proper place and to rename the workspace to the same name as the file. This way there is less confusion.

Add a data model to a workspace

A next step is adding a data model to the workspace. That is possible with the menu option New > Physical Data Model (for instance).


Now a Physical Data Model is added to the Workspace. Note that two levels are added to the Workspace: a Physical Model and a Physical Diagram. I'll show some more examples later.


Add a folder to a workspace

Yet another option to organize models in a workspace is the usage of folders. You can use the option New > Folder.


Folders are organized logically in your workspace file, not physically. If you want to organize your models in a physical folder structure, you can simply create the folders in Windows and save your models there.

Organize models in folders

Models can be organized in folders: you can simply drag a model to the folder and drop it there. The model (and diagram) is then present in the folder.



And organize models in a physical folder on your file system.


Adding more models and diagrams

It's also possible to organize models and diagrams in different ways in a workspace. You can add diagrams to a model and you can add separate models and diagrams. Below is an example of adding another diagram to an existing model.


And of course you can add new models/diagrams to a workspace.


If you insert different types of workspace objects to the workspace you can have something like this.


Save your workspace

You can save the workspace to disk, but it can be confusing to save the models to disk while mixing logical and physical folders. In my opinion, use only logical folders in your workspace, because aligning physical and logical folders can be very difficult and confusing.


How is the sws file organized?

I've added a screenshot of the workspace file (.sws); it is organized as an XML file. Notice that not all elements of the workspace are mentioned in the file: only one model and one diagram are stored.


Final thoughts

This blogpost was about discovering the basic options for organizing a workspace in Powerdesigner. Workspaces have some similarities with and some differences from Visual Studio solutions. 

Hennie

DevOps series : tSQLt Framework


Introduction

In one of my former projects I gained a lot of experience with building, testing and releasing with Azure DevOps, and I noticed that unit testing in the database is a good thing to do. In my current project we are using the tSQLt framework for testing purposes. This blogpost is about the tSQLt framework: how to install it and how to use it. The first basic steps to get you (and me) going.

Why is unit testing important? Well, I have noticed during my work as a consultant that releasing code can be very cumbersome and tricky when you're not convinced that everything still works, even the code you have not touched. Maybe something has changed in the data that gives errors somewhere else. So running unit tests before you release is a very good thing to do!

tSQLt is free to download. There are a couple of steps you have to take to make it work, and you have to spend some time with the stored procedure calls and scripting to understand how the tSQLt framework works.
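
As a reference, here is a minimal sketch of the calls I use most (the class name is just an example):

-- Create a test class (a schema that groups test procedures).
EXEC tSQLt.NewTestClass 'CustomerTests';

-- Run all tests in one class, or all tests in the database.
EXEC tSQLt.Run 'CustomerTests';
EXEC tSQLt.RunAll;

-- Remove the class (and its test procedures) again.
EXEC tSQLt.DropClass 'CustomerTests';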

For this blogpost I've used some inspiration from sqlshack.

Installation of the tSQLt framework

First download the files from tsqlt.org and unzip them somewhere on your disk, like I've done below. There are a couple of SQL files.




The next step is the installation of an example of the framework into SQL Server. That is the example.sql file. Open SSMS and execute the example.sql file.


Executing the test scripts in the example file succeeds, except for one. The last unit test gives an error, unfortunately.


Below is the specific error of the test that is executed.


So the test failed and therefore we need to check what went wrong. Is the test not defined correctly, or has something else happened?


I changed the > into >= and executed the tests again, and now it runs properly.


My first tSQLt testscript

The next step is executing my own script (with a little help from sqlshack). I've created a database and a customer table, inserted a record, and added a stored procedure that I would like to test.


CREATE DATABASE TestThetSQLtFramework
GO

USE TestThetSQLtFramework;
GO

CREATE TABLE [dbo].[Customer] (
[CustomerId] INT IDENTITY (1, 1) NOT NULL,
[Name] VARCHAR (40) NOT NULL
);

SET IDENTITY_INSERT [dbo].[Customer] ON
INSERT INTO [dbo].[Customer] ([CustomerId], [Name]) VALUES (1, N'Hennie')
SET IDENTITY_INSERT [dbo].[Customer] OFF

SELECT * FROM [Customer]
GO

CREATE PROCEDURE AddCustomer(@Name VARCHAR(40))
AS
BEGIN

INSERT INTO dbo.[Customer] (Name)
VALUES (@Name)

END
GO



The next step is to run the test framework. It seems nothing is there and the test process executes properly. Of course, that is because there are no tests yet.


EXEC tSQLt.RunAll

(0 rows affected)

+----------------------+
|Test Execution Summary|
+----------------------+

|No|Test Case Name|Dur(ms)|Result|
+--+--------------+-------+------+
-----------------------------------------------------------------------------
Test Case Summary: 0 test case(s) executed, 0 succeeded, 0 failed, 0 errored.
-----------------------------------------------------------------------------

The following step is creating a test class (aka a schema) in the database.

EXEC tSQLt.NewTestClass 'CustomerTests';
GO


Then, a test stored procedure has to be created in the database with three steps : Assemble, Act and Assert.

CREATE PROCEDURE [CustomerTests].[TestAddCustomer]
AS
BEGIN
-- Assemble
EXEC tSQLt.FakeTable 'dbo.Customer', @Identity = 1

CREATE TABLE [CustomerTests].[Expected]
(
[CustomerId] INT NOT NULL,
[Name] VARCHAR(40) NOT NULL
)

INSERT INTO [CustomerTests].[Expected] (CustomerId,Name)
VALUES (1,'Hennie')

-- Act
EXEC dbo.AddCustomer 'Hennie'
SELECT * INTO CustomerTests.Actual FROM dbo.Customer

-- Assert (compare expected table with actual table results)
EXEC tSQLt.AssertEqualsTable @Expected='CustomerTests.Expected',
@Actual='CustomerTests.Actual'
END;
GO


Executing the stored procedure will result in the following output. All is fine!


EXEC tSQLt.RunAll


(1 row affected)

+----------------------+
|Test Execution Summary|
+----------------------+

|No|Test Case Name |Dur(ms)|Result |
+--+---------------------------------+-------+-------+
|1 |[CustomerTests].[TestAddCustomer]| 113|Success|
-----------------------------------------------------------------------------
Test Case Summary: 1 test case(s) executed, 1 succeeded, 0 failed, 0 errored.
-----------------------------------------------------------------------------


Now, see what happens when the actual output differs from the expected output. I changed 'Hennie' into 'Hennie2' in the call of the stored procedure.

-- Act
EXEC dbo.AddCustomer 'Hennie2'
SELECT * INTO CustomerTests.Actual FROM dbo.Customer

Executing the RunAll SP of the tSQLt framework will result in an error in the test framework.

EXEC tSQLt.RunAll


(1 row affected)
[CustomerTests].[TestAddCustomer] failed: (Failure) Unexpected/missing resultset rows!
|_m_|CustomerId|Name |
+---+----------+-------+
|< |1 |Hennie |
|> |1 |Hennie2|

+----------------------+
|Test Execution Summary|
+----------------------+

|No|Test Case Name |Dur(ms)|Result |
+--+---------------------------------+-------+-------+
|1 |[CustomerTests].[TestAddCustomer]| 127|Failure|
-----------------------------------------------------------------------------
Msg 50000, Level 16, State 10, Line 29
Test Case Summary: 1 test case(s) executed, 0 succeeded, 1 failed, 0 errored.

-----------------------------------------------------------------------------


Final thoughts

Building tests is very simple in the tSQLt test framework.

Hennie


Microsoft: DAT263x Introduction to Artificial Intelligence (AI)


Introduction

I'm participating in the Microsoft Professional Program for AI by Microsoft. I've already done the Data Science and Big Data programs, which are also part of the Microsoft Professional Programs. I've experienced them as easy-to-follow, instruction-based courses. There are more of these programs available that are interesting for data enthusiasts; think about Internet of Things and Data Analysis.

The great thing about these programs is that they consist of high-quality, instructor-led courses, broken into easy-to-digest videos, exercises, labs and quizzes on the edX site. So in every spare minute you have, you can follow a couple of videos.

The program is broken into the following courses:
  • Introduction to Artificial Intelligence (AI)
  • Introduction to Python for Data Science
  • Essential Mathematics for Artificial Intelligence
  • Ethics and Law in Data and Analytics
  • Data Science Essentials
  • Build Machine Learning Models
  • Build Deep Learning Models
  • Build Reinforcement Learning Models
  • Develop Applied AI Solutions
  • Microsoft Professional Capstone : Artificial Intelligence
This blogpost describes the experiences I had with the first course : Introduction to Artificial Intelligence (AI).

DAT263x Introduction to Artificial Intelligence (AI)

This course is an introduction to AI and consists of the following parts:
  • Machine learning
  • Language and communication
  • Computer vision
  • Conversation as a platform
Machine learning is a very lightweight introduction to machine learning and not a very comprehensive overview of the terminology around AI, machine learning and deep learning. Very quickly the course presents Azure ML Studio with regression, classification and clustering.

Language and communication is about text processing, an introduction to NLP, and using Azure LUIS (Language Understanding Intelligent Service) with intents, showing how to use language processing in an example.

Computer vision is an introduction to get you started with image processing and working with images and videos.

Conversation as a platform is about bots: an introduction and how to build intelligent bots.

Final thoughts

I haven't followed the complete program yet (disclaimer alert!). The course is mostly about the products of Microsoft and is very hands-on. For a more theoretical overview of AI, I would rather look into another course, like that of Andrew Ng on Coursera. Although I haven't participated in that specialization track yet, I think that kind of course is more about the theory of AI.

But if you want to know more about the products of Microsoft and how these are related to AI, I would recommend this program of Microsoft. I've learned about products like LUIS that I didn't know before.

I'll let you know my progress in the program!

Hennie

An introduction to Azure Data Studio


Introduction

Until recently I have been using SQL Server Management Studio for developing code with SQL Server. I really love this tool, but it has grown into a comprehensive, and sometimes awkward, tool to use. Azure Data Studio (ADS) is a lightweight tool that makes development and administration of SQL Server easier than SQL Server Management Studio. ADS can be used on multiple platforms like macOS, Linux and Windows and is integrated with Git. You can find more information here.

So in short, Azure Data Studio has the following interesting features:
  • Different kernels like SQL and python.
  • Code snippets.
  • Integration with Source control.
  • Powershell support.
  • Notebooks.

Installation of Azure Data Studio

First let's start with downloading Azure Data Studio from the download location. Here you can find the installation files for Linux, macOS and Windows. I chose the Windows user installer of the latest version (May 2019, version 1.7.0). The installation is fairly easy and it's a matter of Next, Next and Next.

The initial screen is a simple screen where you can set the database connection.


After you set the database connection you're ready to go with Azure Data Studio.

Executing a script

The first thing I tried is executing a script in ADS.


I checked the error messages in SSMS and they are exactly the same.

Searching objects

Finding objects in ADS is a bit different than in SSMS. You can find objects by using prefixes like t: for tables and sp: for stored procedures.



Browsing objects on a server is also possible.


Notebooks 

Notebooks are new in Azure Data Studio. I know about notebooks because I use them during jobs, R courses and AI courses (Jupyter). They are an easy way to share code. I like the way notebooks work: it's like telling a story with the code all together. Who hasn't joined a project with nothing else but code? Wouldn't it be great when thoughts and decisions are written down as a story together with the code? I'm not sure whether developers are the target audience of notebooks; I think that people who work with data, like data scientists and analysts, will appreciate this functionality very much.

There are a couple of options for creating a notebook. One is the menu option and another is the command palette; I chose the latter. Yet another surprise is that you can write Spark | R, Python and PySpark in Azure Data Studio. These are kernels.


Creating a notebook is easy. You can add code blocks and text blocks next to each other, and it is possible to have multiple lines of code in a code block.


The notebook is saved with a .ipynb extension and can also be used in Microsoft Azure Notebooks.
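
Just to give an idea, a first cell in a SQL-kernel notebook could be something as simple as this minimal sketch:

-- The result grid is stored inside the notebook together with the query.
SELECT TOP (10) name, create_date
FROM sys.databases
ORDER BY create_date DESC;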

Code snippets

Using code snippets is very easy in Azure Data Studio. Open a query and type sql; this will open a dropdown menu where it is possible to choose a template.



Source explorer

It is also possible to integrate a source explorer in Azure Data Studio. Initially, the source explorer shows the error message "No source control providers registered". In order to have a properly working source control integration, it is necessary to install a source control provider.


In this particular case I'll download git and install git in a standard manner.


The next step is creating a working folder. Click on Open Folder, locate the folder you want to work from and click OK. In my case, I'm using D:\Git. Now it is possible to use the Git integration in ADS.


After initializing git the following git options are available.


It seems there is no native support for Azure DevOps yet, but it's possible to download extensions that add support for Azure DevOps.

Final thoughts

I've scratched the surface on how to work with Azure Data Studio aka ADS. It is an interesting tool to use and I'll decide in the near future whether I'm going to leave SSMS and use ADS. Time will tell.

One thing I noticed is that ADS is quite CPU intensive. I'm using a fairly old laptop with a VM, and sometimes the CPU goes sky high to 100%, a problem that doesn't occur with SSMS. Probably it's my old laptop that causes this problem.

Hennie

Devops : Structuring a tSQLt test environment


Introduction

Imagine that you have worked hard on a database project in a data warehouse environment and you plan to move your code to a production environment. Some tables are 'touched' by data all the time, while other tables are rarely used in production. When tables are used very often, errors occur immediately and users start complaining that something is wrong. But if there are objects that aren't used very often, it may not be easily detected when something is wrong. Therefore it is good practice to implement (database) unit testing. The tSQLt test framework contains a procedure (AssertObjectExists) that simply checks whether an object exists (or not). This can be a good starting point for unit testing with tSQLt: when code is deployed to a test environment, you can run this procedure to check whether the objects exist.
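
As a first impression, such an existence check is nothing more than a one-line assert inside a test procedure. A minimal sketch, assuming a test class MyTests has already been created with tSQLt.NewTestClass (the PowerShell script later in this post generates exactly this kind of procedure):

CREATE PROCEDURE [MyTests].[test_exists_SalesLT_Customer]
AS
BEGIN
    -- Fails when the object is missing, for instance after an incomplete deployment.
    EXEC tSQLt.AssertObjectExists @ObjectName = N'SalesLT.Customer';
END;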

How to structure your unittesting 

As said in the introduction, one of the simplest tests is checking whether the installation of the code is correctly executed. You can do this by just checking whether an object exists in a database (or not). But simply generating a tSQLt test procedure is too easy (for me). You want to organize these test procedures, and being able to easily add unit testing procedures in the future is useful. Therefore, I've decided to organize tSQLt test procedures in the following manner: test procedures are organized by database, schema and database object, like tables and views. This is depicted in the diagram below.


In the example above, there are two test procedures for checking the existence of an object and one for testing whether the primary key is functioning properly.

A list of objects you want to script could be:
  • Tables
  • Views
  • Synonyms
  • Stored procedures
  • Triggers
  • Functions
  • Sequences
  • etc

How to structure a tSQLt Framework installation process

One of the disadvantages is that the tSQLt framework is installed in the database in which you are working. And although there is a de-installation script, I have still found tSQLt code left behind in the database. I know that there are administrators who are not very happy with this. 

Anyway, my current approach is as follows: 

Installation of the framework :
  • Installation of the tSQLt Framework.
  • Installation of extra helper code. 
  • (Check whether the installation works by just executing an empty framework)
  • Installation of the test procedures.

Execution of the test procedures :
  • Execute the testprocedures (these have their own schema).

And when I'm done I will execute the following steps:
  • De-installation of the test procedures
  • De-installation of the tSQLt Framework

Powershell script

I've created a PowerShell script that I can execute repeatedly (which in practice won't happen), because it creates a new folder every run (with $path = "D:\tmp\SQL\"+"$date_"). I've done this in order to test the script and check differences with previous versions. This is not feasible in a real-world environment, in my opinion. In a real-world scenario, new manually created test procedures are added to the test environment, and these are very difficult to create automatically. So for now, my advice is to use the script and try it a couple of times until you are satisfied. Then fix the structure and proceed with adding new manually created test procedures. Perhaps you can set up a compare-and-change script to add new test procedures for new objects, but for me it is enough to set up the environment once and proceed manually.

This script has the following characteristics :
  • It supports multiple databases.
  • It supports multiple schemas.
  • It supports all kind of database objects.
  • It is an initial setup script: don't execute it twice and save the scripts to the same location.

# Datetime used for creating the folder for generated scripts
$date_ = (Get-Date -f yyyyMMddHHmmss)

# Name of the SQL Server name
$ServerName = "."

# Location where the scripts are stored
$path = "D:\tSQLt\AdventureWorksLT\unittesting\"

# Used this for testing purposes
# $path = "D:\tmp\SQL\"+"$date_"

# Connect to a SQL Server instance (needed before the databases can be listed)
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO')
$serverInstance = New-Object ('Microsoft.SqlServer.Management.Smo.Server') $ServerName
$so = new-object ('Microsoft.SqlServer.Management.Smo.ScriptingOptions')

# The databases that you want to script (-or $_.Name -eq '<database>')
$dbs = $serverInstance.Databases | Where-Object {($_.Name -eq 'AdventureWorksLT2017') }

# The database objects you want to script
$IncludeTypes = @("Tables","StoredProcedures","Views","UserDefinedFunctions", "Triggers")

# The schemas that you want to script.
$IncludeSchemas = @("SalesLT")

# The name of the generated tSQLt test procedures
$TestSchema = "advtests"

# For every database in the variable $dbs
foreach ($db in $dbs)
{
$dbname = "$db".replace("[","").replace("]","")
$dbpath = "$path"+ "\"+"$dbname"+"\"

# Create a folder for every database
if (!(Test-Path $dbpath))
{$null=new-item -type directory -name "$dbname" -path "$path"}

# For every schema in the Database
foreach ($sch in $db.Schemas)
{
$schema = "$sch".replace("[","").replace("]","")

# Is the schema present in the list of desired schemas
If ($schema -in $IncludeSchemas)
{
$schemapath = "$dbpath"+ "$schema"+"\"

# Create a folder for every schema
if (!(Test-Path $schemapath))
{$null=new-item -type directory -name "$schema" -path "$dbpath"}

$SchemaInstallScript =
"SET ANSI_PADDING ON -- needed to prevent errors`r`n" +
"`r`n" +
"--:setvar scriptpath `"$path`"`r`n" +
"`r`n"

# For every type in the list of object types (eg. Stored procedures)
foreach ($Type in $IncludeTypes)
{
# Create a folder for every objecttype
$objpath = "$schemapath" + "$Type" + "\"
if (!(Test-Path $objpath))
{$null=new-item -type directory -name "$Type" -path "$schemapath"}

# This for installation SQL file (install.sql) for Object Types (Tables, SP, etc
$ObjTypeInstallScript =
"SET ANSI_PADDING ON -- needed to prevent errors`r`n" +
"`r`n" +
"--:setvar scriptpath `"$path`"`r`n" +
"`r`n"

# Adding items to the Schema install script.
$SchemaInstallScript +=
"print('$Type')`r`n" +
"GO`r`n" +
":r `$(scriptpath)`"\$dbname\$schema\$Type\install.sql`"`r`n" +
"GO`r`n"

# For every ObjectType in the list
foreach ($ObjType in $db.$Type)
{

# Only the included schemas are scripted
If ($IncludeSchemas -contains $ObjType.Schema )
{
$ObjName = $ObjType.Name.replace("[","").replace("]","")
$objectpath = "$objpath" + "$ObjName" + "\"

# Create a new folder for the object
if (!(Test-Path $objectpath))
{$null=new-item -type directory -name "$ObjName" -path "$objpath"}

$OutObjectFile = "$objectpath" + "test_exists_" + $schema + "_" + $ObjName + ".sql"

# Adding items to the ObjType install script.
$ObjTypeInstallScript +=
"print('$ObjName')`r`n" +
"GO`r`n" +
":r `$(scriptpath)`"\$dbname\$schema\$Type\$ObjName\install.sql`"`r`n" +
"GO`r`n"

# Generating the actual test exists procedure
$ContentObjectFile =
"USE $dbname`r`n" +
"GO`r`n" +
"`r`n" +
"IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'$TestSchema.test_exists_$schema`_$ObjName') AND type in (N'P', N'PC'))`r`n" +
"EXEC ('`r`n" +
" CREATE PROCEDURE $TestSchema.test_exists_$schema`_$ObjName AS`r`n" +
" BEGIN`r`n" +
" THROW 50001, ''tSQL generate_from_tpl error!'', 1;`r`n" +
" END`r`n" +
"')`r`n" +
"GO`r`n" +
"ALTER PROCEDURE $TestSchema.test_exists_$schema`_$ObjName AS`r`n" +
"/*`r`n" +
"Author : `r`n" +
"`r`n" +
"Description: `r`n" +
" This stored is automatically generated`r`n" +
"`r`n" +
"History `r`n" +
" $date_ : Generated`r`n" +
" `r`n" +
"*/`r`n" +
"BEGIN`r`n" +
"SET NOCOUNT ON;`r`n" +
"`r`n" +
"----- ASSERT -------------------------------------------------`r`n" +
"EXEC tSQLt.AssertObjectExists @ObjectName = N'$schema.$ObjName';`r`n" +
"`r`n" +
"END;" | out-File $OutObjectFile -Encoding ASCII

# Generating the local install file in the folder
$OutInstallFile = "$objectpath" + "install.sql"
$ContentInstallFile =
"SET ANSI_PADDING ON -- needed to prevent errors `r`n" +
"`r`n" +
"USE $dbname`r`n" +
"GO`r`n" +
"`r`n" +
"--:setvar scriptpath `"$path`"`r`n" +
"`r`n" +
"DECLARE @TestSchema as varchar(30) = '$TestSchema' `r`n" +
"IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = @TestSchema)`r`n" +
" EXEC tSQLt.NewTestClass @TestSchema`r`n" +
"`r`n" +
"print('test_exists_$ObjName')`r`n" +
"GO`r`n" +
":r `$(scriptpath)`"\$dbname\$schema\$Type\$ObjName\test_exists_$schema`_$ObjName.sql`"`r`n " +
"GO" | out-File $OutInstallFile -Encoding ASCII

# OutCMDFile
$OutCMDFile = "$objectpath" + "install.cmd"
$ContentCMDFile =
"REM Object CMD file`r`n" +
"SET curpath=`"$path\`"`r`n" +
"SQLCMD -S localhost -E -i `"install.sql`" -v scriptpath=%curpath%`r`n"+
"PAUSE" | out-File $OutCMDFile -Encoding ASCII
} # if
} #object
# Save the ObjType install.sql file
$OutObjTypeInstallFile = "$objpath" + "install.sql"
$ObjTypeInstallScript | out-File $OutObjTypeInstallFile -Encoding ASCII

# creating the ObjType cmd file
$OutObjTypeCMDFile = "$objpath" + "install.cmd"
$ContentObjTypeCMDFile =
"REM ObjectType CMD file`r`n" +
"SET curpath=$path\`r`n" +
"SQLCMD -S localhost -E -i `"install.sql`" -v scriptpath=`"%curpath%`"`r`n"+
"PAUSE" | out-File $OutObjTypeCMDFile -Encoding ASCII
} # object type

# Save the Schema install.sql file
$OutSchemaInstallScript = "$schemapath" + "install.sql"
$SchemaInstallScript | out-File $OutSchemaInstallScript -Encoding ASCII

# creating the schema cmd file
$OutschemaCMDFile = "$schemapath" + "install.cmd"
$ContentSchemaCMDFile =
"REM Schema CMD file`r`n" +
"SET curpath=$path\`r`n" +
"SQLCMD -S localhost -E -i `"install.sql`" -v scriptpath=`"%curpath%`"`r`n"+
"PAUSE" | out-File $OutschemaCMDFile -Encoding ASCII

} #if included in schema
} #schema
} #db


This results in the following folder structure:


On almost every level I've created install scripts that can execute certain areas of test procedures or even a single unit test procedure. Below is an example of executing all test procedures on a database.


Below is an example of the content of a test procedure file:

USE AdventureWorksLT2017
GO

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'advtests.test_exists_SalesLT_Customer') AND type in (N'P', N'PC'))
EXEC ('
CREATE PROCEDURE advtests.test_exists_SalesLT_Customer AS
BEGIN
THROW 50001, ''tSQL generate_from_tpl error!'', 1;
END
')
GO
ALTER PROCEDURE advtests.test_exists_SalesLT_Customer AS
/*
Author :

Description:
This stored procedure is automatically generated

History
20190605180830 : Generated

*/
BEGIN
SET NOCOUNT ON;

----- ASSERT -------------------------------------------------
EXEC tSQLt.AssertObjectExists @ObjectName = N'SalesLT.Customer';

END;

Final thoughts

I was building the folder structure and scripts manually, but I thought it would be better to use a PowerShell script to create a test environment with one type of test procedure: does the object exist. Now I can script every database/project and set up a test environment very fast.

Hennie

DAT208x : Introduction to Python for Data Science


Introduction

I'm participating in the Microsoft Professional Program for AI by Microsoft. I've already done the Data Science and Big Data programs, which are also part of the Microsoft Professional Programs. I've experienced them as easy-to-follow, instruction-based courses. There are more of these programs available that are interesting for data enthusiasts; think about Internet of Things and Data Analysis.

The great thing about these programs is that they consist of high-quality, instructor-led courses, broken into easy-to-digest videos, exercises, labs and quizzes on the edX site. So in every spare minute you have, you can follow a couple of videos.

The program is broken into the following courses:
  • Introduction to Artificial Intelligence (AI)
  • Introduction to Python for Data Science
  • Essential Mathematics for Artificial Intelligence
  • Ethics and Law in Data and Analytics
  • Data Science Essentials
  • Build Machine Learning Models
  • Build Deep Learning Models
  • Build Reinforcement Learning Models
  • Develop Applied AI Solutions
  • Microsoft Professional Capstone : Artificial Intelligence


DAT208x : Introduction to Python for Data Science

This course is an introduction to Python in combination with data science. There are other Python courses available, but they do not always focus on data science; this course does. The course is a collaboration between edX and DataCamp, and I have to say that the interaction between the two sites works great.

The course is divided into the following sections:
  • Python Basics
  • List - A Data Structure
  • Functions and Packages
  • Numpy
  • Plotting with Matplotlib
  • Control Flow and Pandas
  • Final Lab
  • Final Exam


Final thoughts

All scripting is executed in the controlled environment of DataCamp. They did a great job building an integrated learning environment. Every section has one or more labs and they are graded in edX.

The Final Lab is a lot of work and covers more than the material in the sections and the videos. It took me quite some time to find out how and what; Google is your friend here. The Final Exam contains 50 questions and must be finished within 4 hours, with limited time per question.

Hennie

DevOps series : tSQLt, Visual Studio and Azure DevOps


Introduction

I'm currently using and investigating tSQLt in a data warehouse project, in a complicated environment with multiple developers, data modellers and testers. I decided to investigate how to use Visual Studio together with the tSQLt framework and how to use that in Azure DevOps. This blogpost is just one step in the process of researching tSQLt together with Visual Studio and Azure DevOps; I'm not stating that this is the final solution for doing DevOps with tSQLt. Fine-tuning alert ahead ;-) I'll blog about that in the future. I am using AdventureWorksLT2017 as an example project for this experiment.

I'm not covering all of the details of tSQLt, Visual Studio and Azure DevOps, but I will show how to set up a project with AdventureWorksLT2017, the tSQLt framework and some tests in Visual Studio and Azure DevOps.

Inspiration for this blog comes from Nikolai Thomassen. I found more information in blogposts from Sandra Walters and on Medium.com. Many thanks to these contributors.

The local environment

I have a VM with a development environment: Visual Studio 2017 Professional with SSDT installed, SSMS with Git support, connected with Azure DevOps. On this machine I've created a set of Visual Studio projects and databases. I've depicted that in the following diagram.




    So there are a couple of parts in this diagram :
    • The visual studio project (green boxes).
    • Database references (green lines).
    • Deployment to the different database projects (blue lines and blue cylinders).

    The Visual Studio project

    I've created one solution with three projects:
    • AdventureWorksLT2017. This is the development project where all of the database code is stored. It contains the table definitions, views, stored procedures and all other database objects. 
    • AdventureWorksLT2017.framework.tsqlt. This is the project where the framework is installed. One of the reasons for doing so is that you can update the framework independently of the tests and the database project. 
    • AdventureWorksLT2017.tests. This is the project where all the test definitions aka the unittesting procedures are stored.


    Now, this setup will make sure that the different parts of the project are deployed to their targeted environment. One of the issues I had when I started was that tSQLt was installed in my development database. With this setup, the development code (AdventureWorksLT2017) is more separated from the tSQLt code and test procedures (AdventureWorksLT2017_CI).

    Importing tSQLt into your project can be a bit tricky, but I assume you can do it!

    Database references

    Now, in order to make this work it is necessary to set up the projects as so-called composite projects. Composite projects are projects that are part of the same database. Normally a Database Reference (as the name says) is a reference to a whole database: a project is a database. This can be very unhandy.

    For composite projects it is necessary to set up a Database Reference with the Database Location set to "Same Database".



    Deployment to databases

    I've created some publish files in my projects for deploying the projects to the databases. I've created them for my local deployment, but these can also be used in the test release or other release environments.


    The testproject

    Now, the test project is not much different from that of Nikolai. I've created a test procedure that tests whether a value for color is inserted into the column Color by the stored procedure InsertProduct.


    CREATE PROCEDURE [tsqlttests].[test_insert_Product_check_if_color_is_inserted ]
    AS
    BEGIN

    --ASSEMBLE
    DECLARE @Name NVARCHAR(50) = 'Car';              -- length added: NVARCHAR without a length defaults to 1 character
    DECLARE @ProductNumber NVARCHAR(25) = '12345';   -- length added to match the procedure parameter
    DECLARE @StandardCost MONEY = 1;
    DECLARE @ListPrice MONEY = 2;
    DECLARE @SellStartDate DATETIME = '2019-07-31';
    DECLARE @Color NVARCHAR (15) = 'Yellow';
    --ACT

    EXEC SalesLT.InsertProduct
    @Name = @Name
    ,@ProductNumber = @ProductNumber
    ,@Color = @Color
    ,@StandardCost = @StandardCost
    ,@ListPrice = @ListPrice
    ,@SellStartDate = @SellStartDate

    --ASSERT
    DECLARE @Actual NVARCHAR(15) = (SELECT TOP 1 Color FROM SalesLT.Product)  -- length added to avoid truncation to one character


    EXEC tSQLt.AssertEqualsString @Color, @Actual, 'Color name was not saved when @color was given'

    END;


    This stored procedure is shown below :


    CREATE PROCEDURE SalesLT.InsertProduct
    (
    @Name [dbo].[Name]
    ,@ProductNumber NVARCHAR (25)
    ,@Color NVARCHAR (15)
    ,@StandardCost MONEY
    ,@ListPrice MONEY
    ,@Size NVARCHAR (5)
    ,@Weight DECIMAL (8, 2)
    ,@ProductCategoryID INT
    ,@ProductModelID INT
    ,@SellStartDate DATETIME
    ,@SellEndDate DATETIME
    ,@DiscontinuedDate DATETIME
    ,@ThumbNailPhoto VARBINARY (MAX)
    ,@ThumbnailPhotoFileName NVARCHAR (50)
    )
    AS
    BEGIN
    SET NOCOUNT ON;
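    -- NOTE: the INSERT into SalesLT.Product is deliberately missing here,
    -- so the unit test above will fail (see the remark below this procedure).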


    SET NOCOUNT OFF;
    END



    The procedure will not insert a value and therefore the test will go wrong.

    The DevOps Environment

    The next step is setting up the Azure DevOps environment. I assume you have some basic knowledge of Azure DevOps. I'm not showing all of the basics here. In the following diagram, I'm showing a basic Build, Test and Release process that we are going to follow.


    We have a local environment and we have a DevOps environment. The local environment was already described in the previous section. So what happens when we are done is that the code is committed to the Git repository, where the build process is started to make sure that the code is correct. The next step in the process is that the code is deployed to a CI environment; this is done with the dacpac from the test project. When the tests are done, the code is deployed to the TST environment for user testing.

    The build process

    The build process is executed in the Build part of Azure DevOps. Now, I'm not sure where Microsoft is heading with this, but it seems that writing YAML code will be the preferred way, in contrast with half a year ago, when click-and-go was the way to do it. I've created the following YAML script:


    pool:
      name: Hosted Windows 2019 with VS2019
      demands: msbuild

    variables:
      BuildConfiguration: 'Debug'

    steps:
    - task: MSBuild@1
      inputs:
        solution: '**/*.sln'
        msbuildArguments: '/property:DSP="Microsoft.Data.Tools.Schema.Sql.SqlAzureV12DatabaseSchemaProvider"'

    - task: CopyFiles@2
      displayName: 'Copy Files from Build to Artifact folder'
      inputs:
        SourceFolder: ''
        Contents: '**\bin\$(BuildConfiguration)\**.dacpac'
        flattenFolders: true
        TargetFolder: '$(Build.ArtifactStagingDirectory)'

    - task: PublishBuildArtifacts@1
      displayName: 'Publish Artifact: AdventureWorksLT2017'
      inputs:
        ArtifactName: 'AdventureWorksLT2017'


    Notice the msbuild argument /property:DSP="Microsoft.Data.Tools.Schema.Sql.SqlAzureV12DatabaseSchemaProvider". This is necessary to build the code for Azure databases. Locally I'm using SQL Server 2016 and I want to keep it that way, but when I deploy the code to Azure it must be Azure compatible. More information at medium.com.

    The build process will deliver three (or four) dacpacs. These will be used for the release process.


    The release process

    I've set up the release process in two steps: one release and test on the CI database, and one on the TST database. These databases have different goals. The CI database uses the test dacpac and the TST database uses the AdventureWorksLT2017 dacpac. This makes sense because you don't want to deploy the test procedures to the TST database. The process is depicted below.



    The first step is to get the artefacts from the artefacts folder; these are passed to the CI and TST release processes.

    The steps in the CI release process are presented below :


    Most of the steps are quite straightforward. I borrowed the PowerShell script from Nikolai and it worked like a charm.


    $connectionString = "Server=tcp:devhennie.database.windows.net,1433;Initial Catalog=AdventureWorksLT2017_CI;Persist Security Info=False;User ID=xxxxxx;Password=xxxxxxxx;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"

    $sqlCommand = 'BEGIN TRY EXEC tSQLt.RunAll END TRY BEGIN CATCH END CATCH; EXEC tSQLt.XmlResultFormatter'

    $connection = new-object system.data.SqlClient.SQLConnection($connectionString)
    $command = new-object system.data.sqlclient.sqlcommand($sqlCommand,$connection)
    $connection.Open()

    $adapter = New-Object System.Data.sqlclient.sqlDataAdapter $command
    $dataset = New-Object System.Data.DataSet
    $adapter.Fill($dataSet) | Out-Null

    $connection.Close()
    $dataSet.Tables[0].Rows[0].ItemArray[0] | Out-File "$(System.DefaultWorkingDirectory)/_AdventureWorksLT2017/AdventureWorksLT2017/testresults.xml"



    I've executed the following script, in order to retrieve the output.


    BEGIN TRY EXEC tSQLt.RunAll END TRY BEGIN CATCH END CATCH; EXEC tSQLt.XmlResultFormatter

    This is the output. This is readable by the test process in Azure DevOps.


    <testsuites>
    <testsuite id="1" name="tsqlttests" tests="1" errors="1" failures="0" timestamp="2019-08-07T06:55:29" time="0.077" hostname="devhennie" package="tSQLt">
    <properties />
    <testcase classname="tsqlttests" name="test_insert_Product_check_if_color_is_inserted " time="0.077">
    <error message="Procedure or function 'InsertProduct' expects parameter '@Size', which was not supplied.[16,4]{SalesLT.InsertProduct,0}" type="SQL Error" />
    </testcase>
    <system-out />
    <system-err />
    </testsuite>
    </testsuites>



    When the testing is done, an overview is created in the Test tab in Azure DevOps. This is a nice integration with tSQLt and Azure DevOps.




    Final thoughts

    It was quite some work to figure it out but it was fruitful in the end. I had some great help with this blog from Nikolai Thomassen. He described some tweaks that I didn't know. I learned some interesting stuff again!

    Hennie

    DevOps series : Deploying with ARM templates


    Introduction

    While working on a mock-up for unit testing a database, it bugged me a bit that the CI database was only needed for the short period of testing; when the testing was done, the database was not needed anymore. Why not create the database when the test starts and drop it when it is not needed anymore? It saves money when the database is not used. So, I decided to make some adjustments to my release pipeline.

    The following steps are executed in my release pipeline.


    The SQL Database Deploy ARM template

    The first step of the release pipeline is creating a SQL database with an ARM template. Now, there is some discussion about using this technique: it is a declarative way of creating infrastructure (as a service), while some people advocate a procedural way of creating infrastructure. I decided to use an ARM template to create my SQL database. For this purpose, I'm using the template from GitHub. This is a standard template and can be customized to your needs.

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "sqlAdministratorLogin": {
      "type": "string",
      "metadata": {
        "description": "The administrator username of the SQL Server."
      }
    },
    "sqlAdministratorLoginPassword": {
      "type": "securestring",
      "metadata": {
        "description": "The administrator password of the SQL Server."
      }
    },
    "transparentDataEncryption": {
      "type": "string",
      "allowedValues": [
        "Enabled",
        "Disabled"
      ],
      "defaultValue": "Enabled",
      "metadata": {
        "description": "Enable or disable Transparent Data Encryption (TDE) for the database."
      }
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]",
      "metadata": {
        "description": "Location for all resources."
      }
    }
  },
  "variables": {
    "sqlServerName": "devhennie",
    "databaseName": "AdventureWorksLT2017_CI",
    "databaseEdition": "Basic",
    "databaseCollation": "SQL_Latin1_General_CP1_CI_AS",
    "databaseServiceObjectiveName": "Basic"
  },
  "resources": [
    {
      "name": "[variables('sqlServerName')]",
      "type": "Microsoft.Sql/servers",
      "apiVersion": "2014-04-01-preview",
      "location": "[parameters('location')]",
      "tags": {
        "displayName": "SqlServer"
      },
      "properties": {
        "administratorLogin": "[parameters('sqlAdministratorLogin')]",
        "administratorLoginPassword": "[parameters('sqlAdministratorLoginPassword')]",
        "version": "12.0"
      },
      "resources": [
        {
          "name": "[variables('databaseName')]",
          "type": "databases",
          "apiVersion": "2015-01-01",
          "location": "[parameters('location')]",
          "tags": {
            "displayName": "Database"
          },
          "properties": {
            "edition": "[variables('databaseEdition')]",
            "collation": "[variables('databaseCollation')]",
            "requestedServiceObjectiveName": "[variables('databaseServiceObjectiveName')]"
          },
          "dependsOn": [
            "[variables('sqlServerName')]"
          ],
          "resources": [
            {
              "comments": "Transparent Data Encryption",
              "name": "current",
              "type": "transparentDataEncryption",
              "apiVersion": "2014-04-01-preview",
              "properties": {
                "status": "[parameters('transparentDataEncryption')]"
              },
              "dependsOn": [
                "[variables('databaseName')]"
              ]
            }
          ]
        },
        {
          "name": "AllowAllMicrosoftAzureIps",
          "type": "firewallrules",
          "apiVersion": "2014-04-01",
          "location": "[parameters('location')]",
          "properties": {
            "endIpAddress": "0.0.0.0",
            "startIpAddress": "0.0.0.0"
          },
          "dependsOn": [
            "[variables('sqlServerName')]"
          ]
        }
      ]
    }
  ],
  "outputs": {
    "sqlServerFqdn": {
      "type": "string",
      "value": "[reference(concat('Microsoft.Sql/servers/', variables('sqlServerName'))).fullyQualifiedDomainName]"
    },
    "databaseName": {
      "type": "string",
      "value": "[variables('databaseName')]"
    }
  }
}
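For comparison, the procedural alternative mentioned above could look roughly like the sketch below, using Az PowerShell cmdlets instead of a template. This is a minimal sketch only, assuming the Az.Sql module is installed and you're already signed in with Connect-AzAccount; the resource group name is made up for the example, while the server and database names match the variables in the template.

# Minimal procedural sketch (assumptions: Az.Sql module installed, Connect-AzAccount already executed).
# The resource group name is hypothetical; server and database names match the ARM template variables above.
$resourceGroup = "rg-devhennie"     # hypothetical resource group
$credential    = Get-Credential     # SQL administrator login and password

# Create the logical SQL server (skip this step if the server already exists)
New-AzSqlServer -ResourceGroupName $resourceGroup `
    -ServerName "devhennie" `
    -Location "westeurope" `
    -SqlAdministratorCredentials $credential

# Create the CI database on the Basic tier, like the ARM template does
New-AzSqlDatabase -ResourceGroupName $resourceGroup `
    -ServerName "devhennie" `
    -DatabaseName "AdventureWorksLT2017_CI" `
    -Edition "Basic" `
    -RequestedServiceObjectiveName "Basic"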



    Experimenting with the script

The next step was using "Template deployment" in Azure to test the script. To do this, I searched for the resource (New Resource) "Template deployment".



Choose Build your own template in the editor. This opens a free editing window.



Copy and paste the ARM template into the editing window and press Save.


    The next step is to press Purchase in order to create the SQL Database with the ARM template.


    The result should be that the database is created. If you're satisfied with the template, the next step will be to implement this ARM template in the release pipeline.
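The same test can also be done from the command line instead of the portal. Below is a small sketch, assuming the Az module is installed, you're signed in, a resource group already exists and the template above is saved locally; the file and resource group names are just examples.

# Sketch: deploy the ARM template from PowerShell instead of the portal.
# Assumptions: Az module installed, Connect-AzAccount executed, resource group "rg-devhennie" exists,
# and the template above is saved as .\azuredeploy.json (both names are examples).
New-AzResourceGroupDeployment -ResourceGroupName "rg-devhennie" `
    -TemplateFile ".\azuredeploy.json" `
    -sqlAdministratorLogin "hennie" `
    -sqlAdministratorLoginPassword (Read-Host -AsSecureString "SQL admin password")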

    Implementing ARM Template in the DevOps pipeline

    I've extended the release pipeline with the creation of a SQL database with ARM templates and with a DROP database script. This is depicted below.


In the following window I've included the details of using the template. There are a couple of options for sourcing the ARM template; I decided to include it in my project. This has the advantage that all my deployment scripts are stored together with my project code.


For dropping the database I (re)used a PowerShell script from my previous blog about this subject, adjusted a bit to execute a DROP DATABASE statement.


    This is the script.


    $connectionString = "Server=tcp:devhennie.database.windows.net,1433;Initial Catalog=master;Persist Security Info=False;User ID=hennie;Password=xxxxxxx;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"

    $sqlCommand = 'DROP DATABASE AdventureWorksLT2017_CI;'

    $connection = new-object system.data.SqlClient.SQLConnection($connectionString)
    $command = new-object system.data.sqlclient.sqlcommand($sqlCommand,$connection)
    $connection.Open()
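As an alternative to opening an ADO.NET connection to master, the CI database could also be dropped with an Az cmdlet, which avoids hard-coding a SQL login in the pipeline. This is a sketch, assuming the step runs in an Azure PowerShell task with a service connection; the resource group name is hypothetical.

# Sketch: drop the CI database with Az PowerShell instead of a DROP DATABASE statement.
# Assumptions: runs in an Azure PowerShell task with a service connection; the resource group name is hypothetical.
Remove-AzSqlDatabase -ResourceGroupName "rg-devhennie" `
    -ServerName "devhennie" `
    -DatabaseName "AdventureWorksLT2017_CI" `
    -Force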


    Final Thoughts

In this blogpost I've sketched a scenario of creating and dropping a CI database as part of deploying the project code, the tSQLt framework and the test procedures.

The advantage is that the database exists for only a short period of time; the disadvantage is that creating the database and installing the scripts take considerable time (~10 minutes) for a small demo database.

    Hennie

    DevOps series : Composite projects in Visual Studio SSDT


    Introduction

In my current project, I'm working in a rather complicated environment with people working in teams on databases that are also shared (used) among the teams. All kinds of projects are deployed in a common database (in reality there are a couple more, but for the sake of simplicity in this blogpost I'll assume there is one database; the main point is that objects (views, stored procedures) are shared between teams). This introduces an extra complication when using Visual Studio in combination with SSDT, because normally a project is a database (and a database is a project).

One of the challenges I see here is that if code is not present in a Visual Studio project, a deployment will try to delete that code from the database. Another challenge I'm investigating is version management across the different environments and what happens when a dacpac is pushed to the next step in the software development street (DTAP).

    Composite projects

For the reasons described in the introduction, I was looking into composite projects. Bill Gibson mentioned something about working with different projects in one database in a blogpost (7 years ago! Time flies). In my opinion, the following situations are possible in a database project (not an SSDT project ;-)):


In this example there are four working projects (not Visual Studio projects) on different systems, and these are deployed into the same database. Working Project 1 uses code from Working Project 2 and Working Project 3. Working Project 3 uses code from Working Project 2. Working Project 4 uses code from Working Project 2. In this imaginary situation, I'm focusing on Working Project 1. As said before, Working Project 1 uses code from Working Projects 2 and 3, so the scope of Working Project 1 is 2 and 3 (and of course itself). Working Project 4 is out of sight and not used by Working Project 1.

    All the projects in a Visual Studio solution

There are a couple of options when designing your Visual Studio projects. First, you can include the code of all projects in one Visual Studio solution. In my current situation I'm talking about 20 to 30 projects in the database, and I don't need them (all of the time). The problem is that when one of the projects changes some code, I have to update my local development database every time (even for projects that I'm not using in my current development). The advantage is that when I execute a Schema Compare between the database and the project, it shows all of the changes easily.

    Using DacPacs for decoupling

Another option is using dacpacs for decoupling the projects, so that not every change is reflected in your Visual Studio project. You set a database reference with the option "Same Database", include the dacpac in your project and you're done. But what about projects you don't even reference in your project? In the example, that is Working Project 4.


In reality, it would look something like the situation below. We have THE project (e.g. Working Project 1) that we are working on, we have the referenced database projects whose code we use in THE project, and we have projects that we are not using in the development of our database project at all.



To see what actually happens, I've created a mock-up with the two Microsoft demo databases AdventureWorksLT2017 and WideWorldImporters. I imported both databases into a Visual Studio project, deployed them into one database and investigated what would happen if I execute a Schema Compare. If I don't add database references to the projects, the WideWorldImporters project wants to delete the AdventureWorksLT2017 code, and when I execute a Schema Compare in the AdventureWorksLT2017 project it wants to delete the WideWorldImporters code in the database.

The next thing I tried was adding a database reference with the option "Same Database", using the dacpac of the other project. I executed the Schema Compare again (don't forget to check "Include Composite Projects") and checked the results. The errors were gone. So even when the code is not used in a project, you can reference a dacpac to get a "no error" comparison between a composite project and the database.

    I can understand "following" the updates of the DacPacs from the referenced projects, but updating a local development database for every "not referenced" (other) projects can be time consuming. So, what will happen when you deploy your project to the Central Integration database and other steps in the DevOps pipeline.

    Composite projects in DTAP

How will this work in the real world, where projects have different releases in different environments? Perhaps they need to roll back a release in an environment, or they work in a different way. What if a project is using an old version of a dacpac, the person responsible for the other project updated one of the environments in the DTAP line with a newer version, and your project is planning an update to that same environment?

I experimented with my AdventureWorksLT2017 and WideWorldImporters projects in one database and deployed these to a new database (a scenario like a new environment in the DTAP line). I then added a new function to the WideWorldImporters project and deployed that to the new database. The result was that the AdventureWorksLT2017 project wants to delete the newly created function (because it is not in the WideWorldImporters dacpac it references). So, I need to update the WideWorldImporters dacpac in the AdventureWorksLT2017 project, for example by extracting a fresh one as sketched below.
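To refresh the referenced dacpac after such a change, one option is to extract it from a database that holds the other project's latest release (for example a database owned by that team). Again a sketch, assuming SqlPackage.exe is available; server, database and login names are examples.

# Sketch: extract a fresh WideWorldImporters dacpac to update the database reference in the AdventureWorksLT2017 project.
# Assumptions: SqlPackage.exe is on the PATH; server, database and login names are examples.
& SqlPackage.exe /Action:Extract `
    /SourceServerName:"devhennie.database.windows.net" `
    /SourceDatabaseName:"SharedDatabase" `
    /SourceUser:"hennie" /SourcePassword:"xxxxxxx" `
    /TargetFile:"WideWorldImporters.dacpac"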

    Final Thoughts

This blogpost describes some experiments I've run with multiple database projects in one database. At the moment I've not found a satisfying solution for my problem. The "Same database" option in the database reference seems handy in a small environment, but in a multi-team project environment it introduces all kinds of version issues and a high probability of errors and mistakes. It requires a certain skill set, and deployment should be done with great caution.

    Hennie