Using Lifecycle Management on Azure Blob Storage Account Containers

One of our clients wanted to set up a process to delete files from their blob storage containers once they aged beyond a certain time limit. Initially the conversation went down the route of creating some configuration information, stored in a JSON file or a database, which a periodic process would then use to apply the rules across all the storage. I then thought, hey, surely Microsoft have dealt with this…and of course they have: Lifecycle Management on the Storage Accounts, as documented here.

The Lifecycle Management Blade

Let’s take a rule we wanted to create and see how we can do that using Lifecycle Management. Imagine we want to remove files from a given folder in a given container on the Storage Account after thirty days.

On the Lifecycle Management blade click on “Add a rule” which brings up the first page of the wizard. Give the rule a sensible name (probably useful to define a naming standard that your organisation is comfortable with).

Set the rule scope, which is either all blobs in the storage account or limited with filters – in this case we want to limit it to a particular container/folder, and you’ll notice that selecting that radio button adds a third page to the wizard (to set the filter settings).

Set the Blob type and Subtype – in this case I just want the defaults of Block Blobs and Base Blobs.

The page looks like this:

Lifecycle Management Wizard – Page 1

Click Next to move to page 2 of the wizard where we need to set the rules of what to do. The screen shows only the option of “Last Modified” but requires you to fill out the value for “More than (days ago)” – 30 in my case. Leave the Then part of the rule as “Delete the blob” so page 2 looks like this:

Lifecycle Management Wizard – Page 2

Click Next and we move on to page 3 where we need to set the Prefix Match. In my case I want to apply this rule to files found in the “output” container and the “myFolder” folder so I set the values and page 3 looks like this:

Lifecycle Management Wizard – Page 3

Now click on Add and the rule is created:

Lifecycle Management Rule Added

You’ll see there is a List View showing the new Policy in place. There is also a Code View which shows the policy in JSON format like so:

Lifecycle Management Policy – Code View
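
As a rough guide, the rule built in the wizard corresponds to a policy document along the following lines. This is a minimal Python sketch that assembles the JSON; the helper name build_delete_rule and the rule name delete-myfolder-after-30-days are made up for illustration, and the field names follow the lifecycle management policy schema as I understand it, so compare the result with the Code View of your own policy.

```python
import json

def build_delete_rule(name, prefix, days):
    """Build one lifecycle rule that deletes base block blobs under `prefix`
    once they were last modified more than `days` days ago."""
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            "actions": {
                "baseBlob": {
                    "delete": {"daysAfterModificationGreaterThan": days}
                }
            },
            "filters": {
                "blobTypes": ["blockBlob"],
                # The prefix match takes the form "<container>/<folder>"
                "prefixMatch": [prefix],
            },
        },
    }

# Hypothetical rule name; the container and folder are the ones used in this post.
policy = {
    "rules": [
        build_delete_rule("delete-myfolder-after-30-days", "output/myFolder", 30)
    ]
}
print(json.dumps(policy, indent=2))
```

Keeping the policy as generated JSON like this can be handy if the same rule needs rolling out to several storage accounts.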

That’s it.

Other things to know…

  • The feature is free – well, the feature itself and deletes of block blobs are, but if you use it to move files to a different tier then that is a Set Blob Tier API call, which incurs the regular operation cost
  • Maintenance of the policies is via Portal, Powershell, CLI or REST APIs
  • You can also roll out policies using ARM Templates using the Microsoft.Storage/storageAccounts/managementPolicies type

Make sure you use base64 encoding on a Key Vault Secret for an sFTP Azure Data Factory Linked Service Connection

A quick post on setting up an sFTP Linked Service Connection in Azure Data Factory such that it uses a Key Vault for the SSH Key.

A friend of mine had tried setting this up but was getting the following error when testing the new Linked Service:

“Invalid Sftp credential provided for ‘SshPublicKey’ authentication type. The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.”

The Linked Service was using a Key Vault to obtain the SSH Key to be used in the connection. The SSH Key had been uploaded as a Secret to the Key Vault using code similar to the following:

az keyvault secret set --name sshkey --vault-name akv-dev --file test.ssh --description "Test SSH Key"

After reading through the documentation on the az keyvault secret set call I noticed this:

So, the default is not base64 but utf-8.

We modified the az call to something like this:

az keyvault secret set --name sshkey --vault-name akv-dev --file test.ssh --description "Test SSH Key" --encoding base64

i.e. with the addition of the --encoding base64 part, and then it worked fine.
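
To see why the utf-8 version of the secret trips the Base-64 check, here is a small Python sketch. It is only an illustration of the encoding issue, not of what Data Factory does internally; key_text is a dummy stand-in for the contents of test.ssh, and is_valid_base64 is a helper invented for this example.

```python
import base64
import binascii

def is_valid_base64(text):
    """Return True if `text` decodes as strict Base-64."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except binascii.Error:
        return False

# Dummy stand-in for the contents of test.ssh (not a real key).
key_text = "-----BEGIN RSA PRIVATE KEY-----\nAAAA\n-----END RSA PRIVATE KEY-----\n"

# Stored as plain text, the value still contains dashes, spaces and newlines,
# none of which belong to the Base-64 alphabet.
print(is_valid_base64(key_text))

# Encoding the whole file content as Base-64 gives a value that passes the
# check and decodes back to the original key text.
encoded = base64.b64encode(key_text.encode("utf-8")).decode("ascii")
print(is_valid_base64(encoded))
```

The first check fails and the second passes, which matches the behaviour seen before and after adding the encoding option.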

Encrypt Acquired FTP file Dynamically Before Storing In Storage Account

I encountered a requirement to acquire some unencrypted files from an on-premises FTP server and place them on a target Azure Storage Account after first encrypting them. The goal is to never have these files in Azure unless they are encrypted.

This can be achieved in a number of ways, but this post is about using Logic Apps to do it.


The following are assumed to be in place already:

  • An accessible FTP Server where the source file is hosted
  • An accessible Storage Account where the encrypted file is to be targeted to
  • An accessible Key Vault with an encryption Key

Create Logic App

I’m going to use Azure Portal to do this.

First type Logic App in the search bar on the dashboard/home screen. Click on Logic App to bring up the screen.

You can see I have already attempted the action but we’ll create a new one for the purposes of writing this post. Hit Add and fill out the details for a new Logic App:

I’ve used an existing resource group but you could create one if needed. I called the Logic App “encrypt-ftp-to-storageaccount”. Hit Review + Create and it will validate before offering the create screen:

Hit Create and the deployment will execute until completion:

Now hit Go to resource and it will bring up the initial screen for the new Logic App…

Scroll down and click on “Blank Logic App”:

Notice that because I’ve already been playing in this area my Recent selections include some of the components we’re going to use here.

We need some kind of simple trigger to kick this off so let’s just use a Schedule with a once a day recurrence. Click on Schedule and then Recurrence and set the Interval to 1 and the Frequency to Day:
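
For reference, a daily recurrence like this shows up in the Logic App code view as a trigger definition roughly like the one assembled below. This is a hedged Python sketch: the helper name recurrence_trigger is invented, and the property names reflect the workflow definition format as I understand it, so check them against the code view of your own Logic App.

```python
import json

def recurrence_trigger(frequency, interval):
    """Build the triggers section for a schedule-based Logic App."""
    return {
        "Recurrence": {
            "type": "Recurrence",
            "recurrence": {"frequency": frequency, "interval": interval},
        }
    }

# Once a day, matching the Interval and Frequency chosen above.
triggers = recurrence_trigger("Day", 1)
print(json.dumps({"triggers": triggers}, indent=2))
```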

Now add a new step – we want to acquire the file from the FTP server.

Click on New Step.

Click on FTP if it is in your recent list or search for it. Then choose Get File Content (or Get File Content With Path if you need to specify a path). You then get the action on the editor with a number of fields to fill out:

Give the Connection a name (e.g. “ftp source”). Set the ftp server address, username and password and port number. I’ve not tried changing any of the other details but some of them might be sensible to address in your own environment. Click Create to create this action.

Now you can select a file – this could be programmed but for this exercise I’m just going to point it at a fixed file (robots.txt):

Now add another step by clicking “+ New Step”:

Choose Azure Key Vault from the Recent (if available) or by searching and then select “Encrypt data with key”. This creates the step:

Depending on whether you are already connected you may see that it tries to use an existing connection or if not asks you to specify the connection to the required Key Vault. In my case above I have a connection but I’ll choose to change that to show what needs to be set. Hitting Change Connection brings up this:

Choose Add New and it shows:

Set the Vault name to the name of the Key Vault where the Key for encryption is held, in my case akv-dev and click Sign In:

Azure brings up the usual credentials access dialog to allow you to connect.

Once connected you get the dialog box to select the Key:

I choose my Key (akv-dev-testkey) and set the Raw Data to the Dynamic Content value of File Content.

Now click on “+ New Step” again to add the write out of the data to the Storage Account.

Choose Azure Blob Storage and Create Blob:

I set the Connection Name and choose the Storage Account (oramossadls2) and then hit Create.

This creates the Create Blob Step and we can specify the folder path on the target Storage Account where we want to create the file. We can specify the target file name (robots.txt) and then we should specify the Dynamic Content of the encrypted data as the Blob Content but notice that the Dynamic Content doesn’t show it. It does, however, show the message “We can’t find any outputs to match this input format. Select See more to see all outputs from previous actions”:

Click on the “See More” and it will show the “encryptedData” as an option:

Choose “encryptedData” so that the Create blob dialog looks like:

Save the Logic App and Run it.

The Logic App runs and the output looks like this:

If we look on the Storage Account we see the file robots.txt:

And if I look at the file in an editor it looks like:

Hope this helps.

Configure Linked Templates Use for Azure Data Factory with Azure DevOps Deployment

I’ve finally worked out how to do this so I thought I’d write a post on it since I can’t find any single resource that accurately covers it all – my apologies, in advance, if someone has already done this.

I’ve been using Azure Data Factory v2 for quite a while now and have it integrated with Azure DevOps for CI/CD between environments. I follow the standard approach which is documented here and I won’t repeat.

I’ll assume that you have a git enabled source Data Factory and a non git enabled target Data Factory and that your main code branch is “master” and the publish branch is “adf_publish”.

As the documentation there says:

“If you’ve configured Git, the linked templates are generated and saved alongside the full Resource Manager templates in the adf_publish branch in a new folder called linkedTemplates”

…that happens when you publish the master branch from Azure Data Factory.

We can see the non linked template files (ARMTemplateForFactory.json, ARMTemplateParametersForFactory.json) and linked template files (ArmTemplate_master.json, ArmTemplateParameters_master.json and ArmTemplate_0.json) in the picture below:

Note – this is a very small demonstration factory and there is only one linked template file (ArmTemplate_0.json) – as the factory grows in size additional, consecutively numbered, files will appear.

The question then is how do you get Azure DevOps to use those Linked Template files instead of the non linked ones sitting in the adf_publish branch root directory?

Supposedly you can just follow this link but unfortunately that document is a little out of date and no longer being updated. The document covers the deployment of a VNET with Network Security Group and makes no mention of Azure Data Factory but it still provides some useful pointers.

The document correctly points out that in order for the linked templates to be deployed they need to be accessible to Azure Resource Manager and the easiest way of doing that is by having the files in an Azure Storage Account – that article illustrates the use of “Storage (general purpose v1)” but I used a Gen 2 ADLS Storage Account instead and that worked fine.

My Gen 2 Storage Account “oramossadls2” looks like this:

I then created a Shared Access Signature for oramossadls2 Storage Account and copied the SAS token which I then put into a secret called StorageSASToken in an Azure Key Vault called akv-dev:

I created an Access Policy on this Key Vault to allow the Azure DevOps Service Principal to be able to read the Secret:

On my Gen 2 ADLS Storage Account I then created a Container called demo:

From the container properties the URL looks like: 

In Azure Storage Explorer, I grant access to the container to the Service Principal of my Azure DevOps site in order that it can access the files:

In Azure DevOps I then created a Variable Group called akv-dev which brings in the StorageSASToken Secret from the akv-dev Azure Key Vault:

I created a second Variable Group called “Production-Static” in which I created some more variables for use later on:

Following this article from Kamil Nowinski I have a Build Pipeline “ADF-CI” in Azure DevOps which stores the ARM template files as artifacts ready for use on a Release.

Now for the bit that took me a while to work out…the Release Pipeline.

My release pipeline has the artifacts from the latest Pipeline Build (_ADF-CI) and the code from the “master” branch as artifacts and a single stage with five tasks:

The Production-Static and akv-dev Variable Groups are attached to the Pipeline:

The five tasks are:

The first step copies the files on the latest Build, attached as an artifact (_ADF-CI) to this Pipeline, to the ADLS Gen2 Storage Account that I’ve created:

The value of StorageAccountName from the “Production-Static” Variable Group is “oramossadls2” and blobContainerName is “demo”.

The YAML for step 1 is:

- task: AzureFileCopy@3
  displayName: 'AzureBlob File Copy'
  inputs:
    SourcePath: '$(System.DefaultWorkingDirectory)/_ADF-CI'
    azureSubscription: 'Pay-As-You-Go (xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)'
    Destination: AzureBlob
    storage: '$(StorageAccountName)'
    ContainerName: '$(blobContainerName)'
    BlobPrefix: adf

When this step eventually runs the Storage Account will look like this:

The second step stops the ADF Triggers – if you don’t stop active triggers the deployment can fail. I use a Powershell script to do this:

The YAML for step 2 looks like this:

- task: AzurePowerShell@4
  displayName: 'Stop ADF Triggers'
  inputs:
    azureSubscription: 'Pay-As-You-Go (xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)'
    ScriptPath: '$(System.DefaultWorkingDirectory)/_ADF/powershell/SetADFTriggersState.ps1'
    ScriptArguments: '-DataFactoryName $(DataFactoryName) -DataFactoryResourceGroupName $(DataFactoryResourceGroupName) -State "Stop" -ReleaseIdentifier $(Release.Artifacts._ADF.BuildId)'
    azurePowerShellVersion: LatestVersion

The third step actually deploys the template to the target environment (in this case oramoss-prod Data Factory resource group “adf-prod-rg”):

It took a while to work this out. First I tried using Template Location of “Linked Artifact” which allows you just to select the template/parameter file from the attached artifacts but that doesn’t work because “nested templates ALWAYS have to be deployed from url” according to this. So, we have to set Template Location to “URL of the file”. We then have to specify the Template and Template Parameter file link using a URL which consists of the Primary Blob Service Endpoint, the container, Blob Prefix, folder hierarchy, filename and the SAS Storage Key, i.e.

Template File:$(StorageSASToken)

Parameters File:$(StorageSASToken)

Note – we are getting the Storage SAS Token from the attached Variable group akv-dev.
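
The way those URLs are put together can be sketched as a small helper. This is an illustration only: the function name build_blob_url, the example.blob.core.windows.net endpoint and the sv=PLACEHOLDER token are made up, while the container (demo), blob prefix (adf), folder (drop/linkedTemplates) and file name (ArmTemplate_master.json) are the ones mentioned in this post.

```python
def build_blob_url(endpoint, container, blob_prefix, folder, filename, sas_token):
    """Join the parts of a blob URL and append the SAS token as the query string."""
    path = "/".join(
        part.strip("/") for part in (container, blob_prefix, folder, filename)
    )
    # A SAS token is used as a query string, so make sure it starts with "?".
    token = sas_token if sas_token.startswith("?") else "?" + sas_token
    return endpoint.rstrip("/") + "/" + path + token

# Placeholder endpoint and token; never hard-code a real SAS token like this.
url = build_blob_url(
    endpoint="https://example.blob.core.windows.net",
    container="demo",
    blob_prefix="adf",
    folder="drop/linkedTemplates",
    filename="ArmTemplate_master.json",
    sas_token="?sv=PLACEHOLDER",
)
print(url)
```

In the release pipeline the same concatenation is done with pipeline variables, with the token coming from the Key Vault backed variable group rather than from code.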

We then have to override some parameters:

-factoryName "oramoss-prod" -containerUri $(AzureBlobStorageURL)/$(blobContainerName)/adf/drop/linkedTemplates -containerSasToken $(StorageSASToken)

The factoryName needs to be overridden because we are moving the code from one Data Factory to the next.

In this article, it suggests that the parameter for the URI of the template files is called “templateBaseUrl” but this appears to have since changed to “containerUri”, and we set it to the Primary Blob Service Endpoint, Container Name and directory where the template files are held.

The article also suggests the Storage SAS Token parameter is called “SASToken” but it appears to now be “containerSasToken”.

The YAML for step 3 looks like:

- task: AzureResourceManagerTemplateDeployment@3
  displayName: 'Deploy ADF ARM Template'
  inputs:
    azureResourceManagerConnection: 'Pay-As-You-Go (xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)'
    subscriptionId: 'xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
    resourceGroupName: 'adf-prod-rg'
    location: 'North Europe'
    templateLocation: 'URL of the file'
    csmFileLink: '$(StorageSASToken)'
    csmParametersFileLink: '$(StorageSASToken)'
    overrideParameters: '-factoryName "oramoss-prod" -containerUri $(AzureBlobStorageURL)/$(blobContainerName)/adf/drop/linkedTemplates -containerSasToken $(StorageSASToken)'
    deploymentName: 'oramoss-prod-deploy'

The fourth step removes orphaned resources. Because we use an incremental approach to pushing the ARM template to oramoss-prod it means that if we dropped an element from oramoss-dev Data Factory it would not automatically get removed from oramoss-prod Data Factory, i.e. the element would be orphaned in oramoss-prod. This step removes any such elements it finds using a Powershell script.

The YAML for step 4 looks like:

- task: AzurePowerShell@4
  displayName: 'Remove Orphans'
  inputs:
    azureSubscription: 'Pay-As-You-Go (xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)'
    ScriptPath: '$(System.DefaultWorkingDirectory)/_ADF/powershell/RemoveOrphanedADFResources.ps1'
    ScriptArguments: '-DataFactoryName $(DataFactoryName) -DataFactoryResourceGroupName $(DataFactoryResourceGroupName) -armTemplate $(System.DefaultWorkingDirectory)/_ADF-CI/drop/ARMTemplateForFactory.json'
    azurePowerShellVersion: LatestVersion
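
The idea behind the orphan check in the fourth step boils down to a set difference between what is deployed and what the ARM template defines. The Python sketch below only illustrates that idea; it is not the Powershell script itself, and the function and resource names are hypothetical.

```python
def find_orphans(deployed_names, template_names):
    """Return names that exist in the target factory but not in the ARM template."""
    return sorted(set(deployed_names) - set(template_names))

# Hypothetical resource names, purely for illustration.
in_prod = ["pipeline_load", "pipeline_old", "dataset_sales"]
in_template = ["pipeline_load", "dataset_sales"]

orphans = find_orphans(in_prod, in_template)
print(orphans)
```

Anything returned by a check like this is a candidate for removal from the target factory.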

The last step runs the same script as the second step but with State set to “StartPriorEnabled” instead of “Stop”.

The YAML for step 5 looks like:

- task: AzurePowerShell@4
  displayName: 'Start ADF Triggers'
  inputs:
    azureSubscription: 'Pay-As-You-Go (xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)'
    ScriptPath: '$(System.DefaultWorkingDirectory)/_ADF/powershell/SetADFTriggersState.ps1'
    ScriptArguments: '-DataFactoryName $(DataFactoryName) -DataFactoryResourceGroupName $(DataFactoryResourceGroupName) -State "StartPriorEnabled" -ReleaseIdentifier $(Release.Artifacts._ADF.BuildId)'
    azurePowerShellVersion: LatestVersion

That’s it. Save the Release Pipeline and run it and the output should look similar to:


Get Azure Networking Hierarchy Components With Powershell

I needed to see the VNETs, their subnets and the NIC/IPs attached to those subnets. That is all available in Azure Portal, but I wanted a nice hierarchical listing, so I created GetNetworkTopology.ps1.

The output looks like this (redacted):

VNET: Vnet01 / Resource Group: Vnet01-rg / Location: ukwest / Address Prefix:
 Subnet Count: 2
 -Subnet: Subnet01 / Address Prefix:
 --IP Configuration Id: /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/Vnet01-rg/providers/Microsoft.Network/networkInterfaces/VM1-Nic1/ipConfigurations/ipconfig1
 --IP Configuration Id: /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/Vnet01-rg/providers/Microsoft.Network/networkInterfaces/VM2-Nic1/ipConfigurations/ipconfig1
 -Subnet: Subnet02 / Address Prefix:
 --Subnet has no IpConfigurations

Azure Advisor And Fixing Errors

Azure can be configured to send you advisor reports detailing things that are not quite right in your environment. The advisor is not necessarily always right but it’s sensible to review the outputs periodically, even if they relate to non production environments.

A few issues popped up on an advisor report on my recent travels and although you can just use the entries on the report on the portal to target the offending resources, I thought it might be helpful to write some Powershell to identify the offending resources as an alternative.

Secure transfer to storage accounts should be enabled

This error shows up similar to this on the report:

Fairly obvious what this means really – the storage account has a setting which is currently set to allow insecure transfers (via http rather than https) – an example looks like this under the Configuration blade of the Storage Account:

The advisor highlights this and the solution is to just set the toggle to Enabled for “Secure transfer required” and press save.

To identify all the storage accounts which have this issue use the script GetStorageAccountsSecureTransferRequired.ps1

This gives output similar to the following (redacted):

StorageAccountName ResourceGroupName Location    SkuName      Kind    AccessTier CreationTime       ProvisioningState EnableHttpsTrafficOnly
------------------ ----------------- --------    -------      ----    ---------- ------------       ----------------- ----------------------
XXXXXXXXXXXXXXXXXX AAAAAAAAAAAAAAA   northeurope Standard_LRS Storage            9/6/19 9:51:53 PM  Succeeded         False
YYYYYYYYYYYYYYYYYY AAAAAAAAAAAAAAA   northeurope Standard_LRS Storage            6/26/19 3:29:38 PM Succeeded         False

An Azure Active Directory administrator should be provisioned for SQL servers

This one appears like the following in the advisor output:

As a long term Oracle guy I’m no SQL Server expert so I can’t quite see why this is an issue if you have a SQL Server authenticated administrative user active – no doubt a friendly SQL DBA will chime in and explain.

To fix this navigate to the SQL Server in question and the Active Directory admin blade and select “Set admin”, choose a user from the Active Directory and press Save.

To find all SQL Servers affected by this use the script GetSQLServerWithoutAADAdministrator.ps1

This returns output similar to the following (redacted):


Enable virtual machine backup to protect your data from corruption and accidental deletion

This one appears like the following in the advisor output:

To fix this, navigate to the Backup blade on the VM Resource in question and set the appropriate settings to enable the backup.

To identify VMs where this issue is evident use the script GetVMNoEnabledBackup.ps1

This gives results similar to the following, allowing you to see VMs where no backup is enabled:


Finding databases on each SQL Server using Powershell

This week a client needed to list the SQL Servers in their Azure Cloud environment along with the databases on each of them. The reason for the requirement was to find SQL Servers that no longer had any databases on them so they could be considered for removal.

I wrote a Powershell script called GetDatabasesByServer.ps1.

Essentially, it gathers a list of SQL Server resources, loops through them, then counts and itemises the databases on each one, not including the master database since that’s not relevant to the requirement.

It returns the following type of output (amended for privacy):

Database Count:1
Database Count:3
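
The counting logic can be illustrated with a few lines of Python. This is a sketch of the idea only, not the Powershell script: the server and database names are hypothetical and the inventory is a plain dictionary rather than anything queried from Azure.

```python
def user_databases(servers):
    """Map each server to its databases, leaving out master."""
    return {
        server: [db for db in dbs if db != "master"]
        for server, dbs in servers.items()
    }

def servers_without_databases(servers):
    """Return the servers that only hold the master database."""
    return sorted(
        server for server, dbs in user_databases(servers).items() if not dbs
    )

# Hypothetical inventory, purely for illustration.
inventory = {
    "sqlserver-a": ["master", "sales"],
    "sqlserver-b": ["master"],
    "sqlserver-c": ["master", "hr", "finance", "audit"],
}

for server, dbs in user_databases(inventory).items():
    print(f"{server} Database Count:{len(dbs)}")

candidates = servers_without_databases(inventory)
print(candidates)
```

Servers that come back from the second function are the ones worth considering for removal.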

Translating Chinese to English in SQL with Microsoft Translator

In Oracle, I had a table of data which had some Chinese words that I needed to translate into English on the fly, in SQL…this is how I did that…

Microsoft have a translator facility here with the Translator Text API v3.0 to allow you to call it programmatically. I’m using Microsoft as I’m currently working on Azure – of course, there are other translation facilities available.

The API has a translate method which one needs to construct a call to. The format of the call is:

…where xxxx is the from language, e.g. zh-Hans for Simplified Chinese (my case) and yyyy is the to language, e.g. en for English.

The body of the request needs to contain some JSON of the form:

[{"Text": "zzzz"}]

…where zzzz is the text that needs to be converted from Simplified Chinese to English.

Calling the API would result in a response which contains the translated text in JSON format.

So, what we need to do is create an Oracle Function which can be called from SQL passing in the text that needs translating from a selected column. The function will call the Microsoft Translator API via UTL_HTTP to translate the text and return the translated text which is then displayed in the SQL output.

Thanks to Tim Hall for this article and Lucas Jellema for this article which helped me with some of this – I just had to do a few tweaks to get things to work in my use case, namely:

  1. Set up the Oracle Wallet for using HTTPS
  2. Convert the publish_cinema_event procedure Lucas wrote to a function so I could call it in SQL
  3. Use LENGTHB instead of LENGTH to determine the length of the text to be translated due to the text being multi byte
  4. Use WRITE_RAW and UTL_RAW.CAST_TO_RAW rather than WRITE_TEXT otherwise the Chinese characters get mangled
  5. Set the body text of the request to be UTF-8 by calling UTL_HTTP.SET_BODY_CHARSET
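
The LENGTH versus LENGTHB point is easy to see outside the database. The Python sketch below is an analogy, not Oracle code: it compares the character count of the request body with its size in UTF-8 bytes, using the sample sentence from the test table later in this post.

```python
text = "敏捷的棕色狐狸跳过了懒狗"
content = '[{"Text": "' + text + '"}]'

# Number of characters, comparable to LENGTH in Oracle.
char_count = len(content)

# Number of bytes once encoded as UTF-8, comparable to LENGTHB
# in a UTF-8 database. This is the value Content-Length needs.
byte_count = len(content.encode("utf-8"))

print(char_count, byte_count)
```

Each of these Chinese characters takes three bytes in UTF-8, so a Content-Length based on the character count would be too small for the body actually sent.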

Firstly the calls to the Microsoft Translator are via HTTPS rather than HTTP so I needed to set up Oracle Wallet with keys to facilitate that. I tried to follow the instructions on Tim’s page about using Chrome to get the certificate but no matter which option I chose it wouldn’t include the keys/certificates in the output file. Instead, I chose to go onto our Linux server and do it this way (adjust to suit your paths):

mkdir -p /u01/app/oracle/admin/ORCL/wallet
openssl s_client -showcerts -connect </dev/null 2>/dev/null|openssl x509 -outform DER >/u01/app/oracle/admin/ORCL/wallet/ms_translate_key.der

This seemed to work fine – at least everything else after worked and the end result was that we could call the API so whatever the above did differently to Chrome I don’t know but it worked.

I then created a wallet on the Linux server:

orapki wallet create -wallet /u01/app/oracle/admin/ORCL/wallet -pwd MyPassword -auto_login
orapki wallet add -wallet /u01/app/oracle/admin/ORCL/wallet -trusted_cert -cert "/u01/app/oracle/admin/ORCL/wallet/ms_translate_key.der" -pwd MyPassword

Once the wallet was created I created the following function:

SET DEFINE OFF
CREATE OR REPLACE FUNCTION translate_text(p_text_to_translate in varchar2
                                         ,p_language_from in varchar2
                                         ,p_language_to in varchar2
                                         ) RETURN VARCHAR2 IS
  req utl_http.req;
  res utl_http.resp;
  -- url holds the translate call in the format described above,
  -- built from p_language_from and p_language_to
  url VARCHAR2(4000) := '...';
  buffer VARCHAR2(4000);
  content VARCHAR2(4000) := '[{"Text": "'||p_text_to_translate||'"}]';
BEGIN
  dbms_output.put_line('CONTENT LENGTH:'||TO_CHAR(LENGTH(content)));
  req := utl_http.begin_request(url, 'POST', 'HTTP/1.1');
  utl_http.set_header(req, 'user-agent', 'mozilla/4.0');
  utl_http.set_header(req, 'content-type', 'application/json');
  utl_http.set_header(req, 'Ocp-Apim-Subscription-Key', 'OCP_APIM_SUBSCRIPTION_KEY');
  -- LENGTHB gives the byte length, which Content-Length needs for multi byte text
  utl_http.set_header(req, 'Content-Length', LENGTHB(content));
  utl_http.set_body_charset(req, 'UTF-8');
  -- WRITE_RAW avoids the character mangling seen with WRITE_TEXT
  utl_http.write_raw(req, utl_raw.cast_to_raw(content));
  res := utl_http.get_response(req);
  utl_http.read_line(res, buffer);
  utl_http.end_response(res);
  RETURN buffer;
EXCEPTION
  WHEN utl_http.end_of_body
  THEN utl_http.end_response(res);
       RETURN buffer;
END translate_text;
/

NOTE – The SET DEFINE OFF is important given the embedded ampersand characters. The OCP_APIM_SUBSCRIPTION_KEY value needs to have whatever is relevant for your subscription as well. You may need to set up ACLs for the user running this code – Tim and Lucas cover that in their articles.

Now to run the code, login to the database and run this to engage the wallet:

EXEC UTL_HTTP.set_wallet('file:/u01/app/oracle/admin/ORCL/wallet', NULL);

Create a test table with some Simplified Chinese in it:

create table test_chinese(chinese_text varchar2(200));
insert into test_chinese values('敏捷的棕色狐狸跳过了懒狗');

Now select the data out using the translate_text function and see what we get:

select chinese_text,translate_text(chinese_text,'zh-Hans','en') from test_chinese;

The returned translation is in JSON format but of course if you wanted you could extract the text element from it easily.
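
As an illustration of pulling the text element out, the Python sketch below parses a response of the shape the translate method returns, as far as I know it: an array with one entry per input text, each holding a list of translations. The helper name extract_translation is invented and the sample response is made up for the example rather than captured from the API.

```python
import json

def extract_translation(response_json):
    """Return the first translated text from a translate response body."""
    payload = json.loads(response_json)
    return payload[0]["translations"][0]["text"]

# Illustrative response only; the English wording is invented for the example.
sample = (
    '[{"translations": '
    '[{"text": "The quick brown fox jumps over the lazy dog", "to": "en"}]}]'
)
print(extract_translation(sample))
```

The same extraction could be done in the database with Oracle's JSON functions if you would rather keep everything in SQL.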

That’s it.

Installing Hortonworks Data Platform 2.5 on Microsoft Azure

I presented this topic to the Big Data Meetup in Nottingham on Thursday but sometimes people prefer a blog to a presentation, so I’ve fashioned this article from the slides…

This article assumes the following:

Start by navigating to the Azure login page and enter your details. If you have never visited before your screen will look like this:

If you’ve logged in before the page will show your login and you can just click it:

After you login, you’ll arrive at the Dashboard:

Choose the “Marketplace” link at the bottom right, which leads to the following screen where you can type “HDP” and it will show you the options for Hortonworks Data Platform. There are currently two options 2.4 and 2.5 – I chose 2.5:

When you choose 2.5 it will bring up this screen which shows the details of the option you have chosen and offers you the “Create” button to go ahead and start the creation process – click on Create:

After clicking on Create, the process moves on to a five step wizard, the first step of which allows you to choose “Basic options” for the VM. I set the following options:

Name: oramosshdp25sandbox

VM Disk Type: SSD

User name: jeff

SSH Public key: my public SSH key

Subscription: Leave set to Free Trial (if that’s what you are using, as per screenshot, or your Corporate/Pay As You Go subscription if you have one)

Resource Group: Create New called hdp25sandbox_rg

Location: UK West

A screenshot of these options looks like this:

Click on OK and move on to the 2nd step in the wizard for choosing the size of the VM. I chose the DS3_V2 size which seemed to work OK – you might be able to get away with something smaller, perhaps.

Click on Select and move on to step 3 of the wizard which is about configuring optional features. For this step I set the following:

Use managed disks: Yes

Leaving all other options as defaults this looks like:

Click on OK and move on to step 4 which is just a summary of the configuration:

If you’re happy, click on OK and move on to step 5 where you accept the terms of use and “buy” the VM:

If you’re happy, click on Purchase and that’s the end of the wizard. Azure then goes off to deploy the VM, which can take a few minutes. You’ll be returned to the dashboard screen where you’ll see the VM at the top right with the word Deploying on it:

As I say, it takes a few minutes to complete, but when it does, you’ll see a popup notification in the top right of the screen and the VM tile will change to look as below:

So, you now have the Sandbox VM up and running.

The VM by default only has inbound SSH access enabled and can only be accessed by IP address so we’ll make some changes to these next. First we’ll give the VM a DNS name which allows you to access it on the internet via a name rather than an IP address. From the dashboard screen (above) click on the VM and it takes you to this screen:

You’ll notice the Public IP address which is a hyperlink…click on that link and it takes you to the following screen where you can specify the DNS Name which means the machine will have a Fully Qualified Domain Name that you can access via the internet. I set my DNS Name to oramosshdp25sandbox and given I’d previously chosen to use UK West as the location, the Fully Qualified Domain Name is thus as per the screenshot below:

Now, navigate to the Inbound Security Rules page which is under the Network Security Group page (access from the Resource List on the dashboard). Notice that the only rule existing is one to allow inbound SSH communication:

In order to facilitate additional capabilities you should open up a few more ports, as follows:

  • 8888 – HDP
  • 8080 – Ambari
  • 4200 – Web SSH access
  • 50070 – NameNode web UI
  • 21000 – Atlas
  • 9995 – Zeppelin
  • 15000 – Falcon
  • 6080 – Ranger

Click on Inbound Security Rule which takes you to the page for maintaining these rules and enter the details for the 8888 port. I specified the name as default-allow-8888 and the port as 8888 as shown below:

Click on OK to create the rule. Carry out the same process for the other ports.
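
If you would rather plan the rules up front than click through them one at a time, the set of rules can be laid out as below. This Python sketch only builds a list of rule descriptions following the default-allow-<port> naming used above; the priority values (starting at 1010 in steps of 10) are an assumption made for illustration, so pick values that fit your own Network Security Group.

```python
# Ports from the list above.
PORTS = [8888, 8080, 4200, 50070, 21000, 9995, 15000, 6080]

def inbound_rules(ports, first_priority=1010, step=10):
    """Describe one inbound allow rule per port, each with a unique priority."""
    rules = []
    for offset, port in enumerate(ports):
        rules.append(
            {
                "name": f"default-allow-{port}",
                "port": port,
                # Assumed numbering; priorities must be unique within the group.
                "priority": first_priority + offset * step,
            }
        )
    return rules

rules = inbound_rules(PORTS)
for rule in rules:
    print(rule["name"], rule["port"], rule["priority"])
```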

Now that we’ve undertaken these additional activities we can access the VM using an SSH terminal, logging on as the user you created (jeff in my case) with the private SSH key:

Whilst you are in the SSH terminal you can reset the Ambari password. This is not strictly necessary unless you want to login to Ambari as admin, but I’ll describe it anyway.

First become root with:

sudo su - root

Now SSH into the Docker Image as root:

ssh root@

You will be prompted to change the password for root on this first login – the current password is hadoop.

After changing the password run the Ambari password reset process:


Follow the instructions to reset the password and after that it will start the Ambari server process.

Once all that is done, exit out of the sessions and the original SSH terminal.

Now go into HDP via the web interface by logging on to the following URL:

The first time you access this URL you’ll be given a welcome (marketing) page which asks for your details:

Fill out the details and hit Submit which will take you to the main entry page for HDP:

Choose the Launch Dashboard option on the left, which brings up a pair of browser windows that use the entire desktop and show the Ambari login page on the left hand browser and the Tutorials website on the right hand browser like this:

You can use either the admin user that you just reset the password for or the predefined user raj_ops (password raj_ops) to access Ambari. Click on Sign In on the left hand browser once you have entered the credentials and it takes you into the main Ambari homepage:

This is the main systems management environment for Hortonworks – more documentation here.

If we close this pair of browsers now and go back to the main HDP entry page and choose the Quick Links option on the right we get this page:

From here you can choose to use any of these specific components.

NOTE – I couldn’t get Atlas and Falcon to work – they need more configuration/setup to get them functional. Ranger, Zeppelin and the Web SSH client work fine though.

Just a basic introduction but I hope you find it useful.