Microsoft Azure is a cloud computing platform and infrastructure for building, testing, deploying, and managing applications and services. For many years, servers and databases have lived on-premises, where companies are responsible for buying, installing, and managing the hardware and software. Azure was announced in 2008, and since then we have seen a huge shift toward cloud computing, because the provider takes over managing the hardware and infrastructure, securing the data (including backups), and many other services.
This is a new blog series about Azure Fundamentals. If you would like to be notified as soon as I post a new article in this series, subscribe in the WANT TO BE NOTIFIED section on the right side of the screen.
There are twenty-two service categories available in Azure and hundreds of services to choose from. If you are new to Azure, you can sign up for a free Azure account. Microsoft offers a free $200 credit for 30 days, so you can explore the services freely: test and deploy enterprise apps, try data analytics with your own data, and much more. Here is the link from Microsoft where you can create a free account:
Azure Data Services
On-premises, we are responsible for procuring the hardware (disk space, CPU, and everything else) and for maintaining it. We are also responsible for the networking, patching, virtualization, operating system, middleware, apps, and data. What if the virtualization, servers, storage, and networking were managed by Microsoft, so that we could focus mainly on application development and on managing our data? That is where IaaS (Infrastructure as a Service) comes in. With this service, we only need to take care of the operating system, patching, software installation, apps, and data.
PaaS (Platform as a Service) is a service where Microsoft takes care of a lot more, so you can focus solely on application development and data development. Microsoft manages everything it manages under IaaS, plus the operating system, middleware, and runtime. If your database ever needs to be restored, backups are available to restore from at any time.
SaaS (Software as a Service) covers products like Office 365 and Dynamics 365. The provider does everything for you, and you simply insert, write, and read your data on those platforms.
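The split between what we manage and what Microsoft manages can be sketched as a small lookup. This is only an illustration of the description above: the layer names and their ordering are my own simplification, and treating data as the one layer the customer still owns under SaaS is an assumption.

```python
# Layers of the stack, from the physical bottom up to the data on top.
# Assumption: a simplified layer list based on the description above.
LAYERS = [
    "networking", "storage", "servers", "virtualization",
    "operating system", "middleware", "runtime", "apps", "data",
]

# The lowest layer that the customer still manages under each model.
FIRST_CUSTOMER_LAYER = {
    "on-premises": "networking",    # we manage everything
    "IaaS": "operating system",     # Microsoft manages up to virtualization
    "PaaS": "apps",                 # Microsoft also manages OS, middleware, runtime
    "SaaS": "data",                 # assumption: only our own data is left to us
}


def customer_managed(model):
    """Return the layers the customer is responsible for under `model`."""
    start = LAYERS.index(FIRST_CUSTOMER_LAYER[model])
    return LAYERS[start:]


def provider_managed(model):
    """Return the layers Microsoft is responsible for under `model`."""
    start = LAYERS.index(FIRST_CUSTOMER_LAYER[model])
    return LAYERS[:start]


for model in FIRST_CUSTOMER_LAYER:
    print(model, "-> we manage:", ", ".join(customer_managed(model)))
```

Reading the output from top to bottom shows the list of our responsibilities shrinking as we move from on-premises to IaaS, PaaS, and SaaS.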
Data Services in Azure:
These services are Infrastructure as a Service and Platform as a Service solutions. In the diagram below, the left side shows all the sources we can connect to, which can be on-premises or in the cloud and can hold structured or unstructured data. We do have SSIS (SQL Server Integration Services), but it does not have many cloud ETL capabilities, as it cannot connect to all of the newer data sources. As we move to Azure, there is Azure Data Factory, which provides a graphical user interface where you can drag and drop to perform different activities. For extracting and transforming data, we also have another tool, Azure Databricks, which gives us a lot of flexibility: you can write R, Python, and Spark SQL, but using it requires coding. Data scientists and data engineers are the typical users of Databricks.
Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics.
Once we apply the transformations, we load the data into Azure Data Lake Storage or another storage service such as Azure Storage, Azure SQL DB, or a SQL pool, or we store it in a non-relational database system like Cosmos DB. Once the data is stored, we can report on it using Power BI.
Every resource that we create in Azure goes into a logical container called a resource group. Resource groups are a great way to collect all of the resources we create for a project, a department, or any other purpose in one place. A resource group is like a directory that holds all of those resources, and we can update, deploy, or delete them as a group. Every resource we create lives under a resource group. The resource group also stores the metadata about its resources, such as the type of each resource, what was allocated to it, and its name.
There are always ways to automate this, but for simplicity we are going to do it through the GUI in this example.
Open your Azure portal, go to the navigation pane, and click on “Create a resource”.
This will bring us to the Marketplace, where we can see the 22 different categories and the hundreds of different services that are available. Find “Resource Groups” by typing it in the search bar.
Select Resource Group and click Create.
This will bring us to the resource creation page. We can give the resource group a name and choose the location where it is going to be stored. For the sake of latency, productivity, and performance, choose the right region. There are other options on this page, but this is enough to get started with Azure. The Tags tab is for information such as which project the resource group is associated with, who owns it, and who purchased it.
It is better to keep all of our resources in the same region so we can reduce cost. Moving data from one region to another incurs an extra charge. When you are creating your databases, your storage accounts, and the data factory that moves the data, remember to keep the data in the same region.
Then, at the bottom of the screen, click Review and create.
It should show that validation passed. Press Create at the bottom of the screen. Once you do that, the resource group is created and you will get a notification at the top right corner confirming it.
You can go to the resource group either from the notification section or from the navigation pane by clicking on “Resource groups”.
When you click on the resource group “dbanuggets”, a vertical pane opens up across your screen. In Azure this is known as a blade. When you click on any of the tabs in it, further vertical panes open up. We are currently looking at the resource group we created, “dbanuggets”, and under that name it is labeled as a Resource group. All of its administrative properties are listed there, and we can set up things like access control, policies, resource costs, and deployment history. So, we have created a logical container: a resource group.
If you want to close this resource group pane, simply press the cross button at the top right corner of the pane to go back to the previous blade.
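For readers who prefer to automate the portal steps above, here is a minimal sketch that builds the equivalent Azure CLI command (`az group create`) as an argument list. It only prints the command and does not run it. The `eastus` region and the `project` tag are assumptions chosen for illustration, and the helper name is hypothetical.

```python
import shlex


def az_group_create_args(name, location, tags=None):
    """Build the argument list for `az group create` (nothing is executed)."""
    args = ["az", "group", "create", "--name", name, "--location", location]
    if tags:
        args.append("--tags")
        # Tags are passed as space-separated key=value pairs.
        args.extend(f"{key}={value}" for key, value in sorted(tags.items()))
    return args


# Assumption: region and tag values are placeholders for this example.
group_args = az_group_create_args(
    "dbanuggets", "eastus", tags={"project": "azure-fundamentals"}
)
print(shlex.join(group_args))
```

To actually create the group, the list could be handed to `subprocess.run(group_args, check=True)` on a machine where the Azure CLI is installed and `az login` has been completed.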
Azure storage comes in two forms that matter here: the general purpose v2 account and Azure Data Lake Storage Gen2. This is the cheapest storage option that we have available. If you are creating a storage account on Azure, Azure Data Lake Gen2 will usually be good enough, unless you need features that are available only in general purpose accounts and do not exist in Azure Data Lake.
Azure Data Lake uses Azure Active Directory, while a blob storage account uses a shared access key. If someone gets hold of the name of the storage account and its access key, that can be a dangerous situation, so the access key should be regenerated on a schedule, every couple of weeks or so, to keep it secure. With Azure Data Lake Gen2 accounts, we can take advantage of Azure Active Directory logins instead. There are some regulatory and compliance features that we get with general purpose accounts and do not get with Data Lake. One of them is WORM storage, which means Write Once Read Many: once the data has been written, it cannot be modified. At the time of writing, Azure blob storage offers WORM storage and Azure Data Lake does not. Azure blob storage also has soft delete, which lets us restore anything we delete accidentally. Azure Data Lake does not offer that.
Azure Data Lake is designed to store massive amounts of data for big data analytics. It provides unlimited scalability, Hadoop compatibility, the Azure Blob File System (ABFS) driver optimized for big data analytics, and zone-redundant and geo-redundant storage.
Let's create an Azure Data Lake:
Go to the navigation pane and search for “Storage account”. Both a general purpose storage account and an Azure Data Lake are created through this same storage account service. When you see the storage account service, click Create.
You will see a Create storage account pane, and at the top you will see several tabs such as Data protection, Advanced, and Tags. You can create the storage account first and configure most of these later.
Select the resource group (we will select the one we created, “dbanuggets”) and give the storage account a name. There are some requirements to remember when naming a storage account. The name must be 3 to 24 characters long, use only lowercase letters and numbers, and be globally unique across all of Azure, because the name becomes part of the address we use to connect to the account.
Enter the storage account name in all lowercase letters. Here I am creating the storage account with the name “dbanuggetsstorage”. Choose the location of the storage account to match the resource group. Here I am choosing East US. Then choose the kind of account you are going to create. Remember, this is not where we actually choose between general purpose v2 and Azure Data Lake Gen2.
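The naming rules are easy to check before you even open the portal. Below is a minimal sketch of such a check. The helper names are hypothetical, and the check covers only the format of the name: global uniqueness can be confirmed only by Azure itself.

```python
import re

# 3 to 24 characters, lowercase letters and digits only.
_STORAGE_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name):
    """Check the format of a storage account name (not its uniqueness)."""
    return _STORAGE_ACCOUNT_NAME.fullmatch(name) is not None


def blob_endpoint(name):
    """Return the blob endpoint that the account name becomes part of."""
    if not is_valid_storage_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return f"https://{name}.blob.core.windows.net"


for candidate in ["dbanuggetsstorage", "DbaNuggets", "dba-nuggets", "ab"]:
    print(candidate, is_valid_storage_account_name(candidate))

print(blob_endpoint("dbanuggetsstorage"))
```

Because the account name is embedded in a public endpoint like this one, two accounts anywhere in Azure cannot share a name.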
Then go to the Replication option and select the type of redundancy you need. The cheapest option, and the one that is built in, is “Locally-redundant storage (LRS)”. It means that whether you are working with a data lake or a storage account, Microsoft makes sure there are three copies of the data at any given time. This is a great thing: with three copies available, we do not have to worry about what happens if one rack goes down, because the other two are still readily available. This option provides redundancy within a single data center, so to be covered if the data center itself goes down, we need to look at the other options available in Azure, such as zone-redundant storage (ZRS), geo-redundant storage (GRS), and others. Each option has its own cost. For this example, we are going to choose the cheapest option, “Locally-redundant storage (LRS)”.
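As a rough summary of the redundancy options mentioned above, here is a small lookup keyed by the SKU names that Azure uses for standard storage accounts. The field names and the helper are hypothetical, and the table deliberately leaves out pricing and the other redundancy options.

```python
# Assumption: a simplified view of three redundancy options. Field names
# are made up for this sketch, and prices are intentionally left out.
REDUNDANCY_OPTIONS = {
    "Standard_LRS": {
        "name": "Locally-redundant storage",
        "copies": 3,  # three copies inside one data center
        "protects_against_datacenter_loss": False,
        "protects_against_region_loss": False,
    },
    "Standard_ZRS": {
        "name": "Zone-redundant storage",
        "copies": 3,  # three copies spread across availability zones
        "protects_against_datacenter_loss": True,
        "protects_against_region_loss": False,
    },
    "Standard_GRS": {
        "name": "Geo-redundant storage",
        "copies": 6,  # three local copies plus three in a paired region
        "protects_against_datacenter_loss": True,
        "protects_against_region_loss": True,
    },
}


def options_protecting_against(datacenter_loss=False, region_loss=False):
    """Return the SKUs that meet the requested protection level."""
    matches = []
    for sku, info in REDUNDANCY_OPTIONS.items():
        if datacenter_loss and not info["protects_against_datacenter_loss"]:
            continue
        if region_loss and not info["protects_against_region_loss"]:
            continue
        matches.append(sku)
    return matches


print(options_protecting_against())
print(options_protecting_against(datacenter_loss=True))
print(options_protecting_against(region_loss=True))
```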
Among the other tabs, Networking is where we can control access by IP address. The next tab is Data protection. Here we can enable soft delete for blobs, so if someone deletes a file accidentally, we can recover it. You can also turn on versioning for blobs and point-in-time restore for containers.
The next tab is the Advanced tab, where we choose what kind of storage account this is actually going to be. Look under Advanced > Data Lake Storage Gen2 > Disabled or Enabled. This is the single place where you enable or disable Data Lake Storage, which determines whether your storage account is going to be a data lake. At the time of writing, this must be decided when the resource is created and cannot be changed afterwards. Let’s go ahead and set Data Lake to Enabled.
Once you enable this option and go back to the Data protection tab, many of the options that were there are disabled. We lose point-in-time restore, soft delete, blob versioning, and WORM storage (meaning that once we write the data, we cannot modify it). These are the limitations that come with Azure Data Lake at the time of writing, so if you need these options, you have to go with a blob storage account. Remember that we are not restricted to a single storage account for all our projects. We can use blob storage for some projects and Azure Data Lake storage for others. If a project has data that requires regulatory compliance such as WORM storage, or you want soft delete so that nobody can permanently delete files by accident, choose a blob storage account for that specific project. We can always mix and match different services across our projects in Azure.
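The decision described above can be written down as a tiny helper. This is only a sketch of the reasoning in this post: the feature labels are made-up identifiers, and the list reflects the options I saw disabled in the portal, not an official compatibility matrix.

```python
# Features this post lists as unavailable once Data Lake Storage Gen2
# is enabled. Assumption: the labels are made-up identifiers.
BLOB_ONLY_FEATURES = frozenset({
    "point_in_time_restore",
    "soft_delete",
    "blob_versioning",
    "worm_storage",
})


def choose_account_type(required_features):
    """Pick an account type for one project from the features it needs."""
    blocking = sorted(set(required_features) & BLOB_ONLY_FEATURES)
    if blocking:
        return "blob storage (general purpose v2)", blocking
    return "Azure Data Lake Storage Gen2", blocking


print(choose_account_type({"worm_storage", "soft_delete"}))
print(choose_account_type({"hadoop_compatibility"}))
```

The second value in the result lists the features that forced the choice, which makes it easy to explain why a given project ended up on blob storage.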
On the Tags tab, we can give a tag a name and a value, so that when we look at the billing information this resource is tagged and we can see our costs clearly. It is good practice to tag all of our resources. Click Review and create at the bottom of the page.
You will see that validation passed at the top of the screen. Review the resources you are about to create and click Create at the bottom of the screen.
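The choices made across the tabs (name, resource group, region, LRS replication, Data Lake enabled, tags) map onto a single Azure CLI command, `az storage account create`. Here is a minimal sketch that builds that command as an argument list without running it. The helper name and the `project` tag are assumptions for illustration.

```python
import shlex


def az_storage_account_create_args(name, resource_group, location,
                                   sku="Standard_LRS", enable_data_lake=True,
                                   tags=None):
    """Build the argument list for `az storage account create`."""
    args = [
        "az", "storage", "account", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
        "--sku", sku,             # replication choice, LRS by default
        "--kind", "StorageV2",    # general purpose v2
        # Hierarchical namespace is the switch that makes it a data lake.
        "--enable-hierarchical-namespace",
        "true" if enable_data_lake else "false",
    ]
    if tags:
        args.append("--tags")
        args.extend(f"{key}={value}" for key, value in sorted(tags.items()))
    return args


# Assumption: the tag is a placeholder for this example.
account_args = az_storage_account_create_args(
    "dbanuggetsstorage", "dbanuggets", "eastus",
    tags={"project": "azure-fundamentals"},
)
print(shlex.join(account_args))
```

Passing `enable_data_lake=False` produces the command for a plain blob storage account instead.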
Once I press the Create button, I see that the deployment is in progress. The storage account we just created does not cost us anything yet. We get billed for a storage account when we write data to it and when we read data from it. If we create a storage account but put nothing in it, we are not billed for it. If we use it very little, for example by creating only a few files, we are charged only minimally. We can monitor the costs later on in Azure.
The notification button at the top right corner of the screen will tell you when the deployment is complete.
To see what the storage account we created looks like, go to the navigation pane on the left side of the screen, choose Resource groups, and select the resource group we created (in my example, dbanuggets). Here is our storage account, “dbanuggetsstorage”.
Once you click on that storage account, a new blade opens in the vertical pane and you can see all the details associated with that storage account.
When you scroll down, you will see “Containers”. Before we can upload files to an Azure storage account or remove files from it, we need to create a container.
Click on Containers and create a container by giving it a name.
Here I named the container “dbawork”. Remember, we are creating a directory here, and the name must be all lowercase. I am keeping the public access level set to Private in this example. Then click Create.
You will see a notification that the container has been created successfully.
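Container names have their own rules: 3 to 63 characters, lowercase letters, numbers, and hyphens only, starting with a letter or number, with every hyphen sitting between letters or numbers. The sketch below checks those rules and builds the matching Azure CLI command, `az storage container create`, without running it. The helper names are hypothetical, and using `--auth-mode login` (Azure Active Directory sign-in rather than an access key) is my choice for the example.

```python
import re

# Runs of lowercase letters/digits separated by single hyphens.
_CONTAINER_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_container_name(name):
    """Check a container name against the naming rules described above."""
    return 3 <= len(name) <= 63 and _CONTAINER_NAME.fullmatch(name) is not None


def az_container_create_args(account_name, container_name):
    """Build the argument list for `az storage container create`."""
    if not is_valid_container_name(container_name):
        raise ValueError(f"invalid container name: {container_name!r}")
    return [
        "az", "storage", "container", "create",
        "--account-name", account_name,
        "--name", container_name,
        "--public-access", "off",   # same as Private in the portal
        "--auth-mode", "login",     # assumption: sign in with Azure AD
    ]


for candidate in ["dbawork", "DbaWork", "dba--work", "-dbawork", "ab"]:
    print(candidate, is_valid_container_name(candidate))

container_args = az_container_create_args("dbanuggetsstorage", "dbawork")
print(" ".join(container_args))
```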
We can now upload and download files. Usually this is automated, and later we will be doing it through Azure Data Factory. For now, let's manually upload a file and see how easy it is to import data into Azure. Click on the container we just created.
Click on the Upload button at the top.
Browse to the folder that contains the file you want to upload.
Select the file and press Upload.
The file has been uploaded. It is now in Azure, stored in an Azure data center. Any file that we upload to a storage account has to go into a container. Containers are just like the directories that hold our files in folders in an on-premises environment. We can set up security at the container level.
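The manual upload above also has an Azure CLI equivalent, `az storage blob upload`. Here is a minimal sketch that builds the command and the URL the uploaded blob would get. Nothing is executed, and the local path `data/sample.txt` is a made-up placeholder, as are the helper names.

```python
from pathlib import Path
from urllib.parse import quote


def az_blob_upload_args(account_name, container_name, local_path):
    """Build the argument list for `az storage blob upload`."""
    blob_name = Path(local_path).name  # keep just the file name
    return [
        "az", "storage", "blob", "upload",
        "--account-name", account_name,
        "--container-name", container_name,
        "--name", blob_name,
        "--file", str(local_path),
        "--auth-mode", "login",  # assumption: sign in with Azure AD
    ]


def blob_url(account_name, container_name, blob_name):
    """Return the address of a blob inside a container."""
    return (f"https://{account_name}.blob.core.windows.net/"
            f"{container_name}/{quote(blob_name)}")


# Assumption: data/sample.txt is a hypothetical local file.
upload_args = az_blob_upload_args("dbanuggetsstorage", "dbawork", "data/sample.txt")
print(" ".join(upload_args))
print(blob_url("dbanuggetsstorage", "dbawork", "sample.txt"))
```

The URL shows the same hierarchy as the portal: storage account first, then container, then the file.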
You can click directly on the file that is now in the container and click Edit to edit its content.
A new blade pops up with the file open, and you can edit it there.
To make uploading and managing your files in storage easier, there is a beautiful application you can use: “Azure Storage Explorer”.
Azure Storage Explorer is an amazing tool that lets you connect to your Azure account and see all the different resources you have in Azure. It works with both general purpose accounts and Azure Data Lake Gen2. You can upload and download files and folders, and you can easily change permissions and security on your containers through this application. If you do not have it already, download it here.
Once you have downloaded Azure Storage Explorer, sign in with the Azure account you created. I signed in with mine, and I can see it in the account section on the left side of the application.
In the Explorer panel, expand the Azure subscription to see all the resources you have, such as the storage account and the containers, along with the file shares.
You do not have to log in to the Azure portal. You can upload and download right from here. We usually automate the upload process through Azure Data Factory or through business applications.
In this post, we learned about Azure Data Services: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS), and we saw the differences between them. We also learned about Azure Storage, both Blob storage and Data Lake Storage Gen2. We worked through examples so you can get started with Azure.
In the next post, we will learn the fundamentals of Azure SQL DB.
Thanks for reading!