Azure SQL Database Pipeline Project

Table of Contents

Overview
Resources
Setup
Usage

Overview

Microsoft Azure provides a wide range of services for managing and storing data. One such product is Azure SQL, which gives us the capability to create and manage instances of SQL Server hosted in the cloud. This project demonstrates how to use these services to manage data we collect from different sources.

Resources

To use this project you will need to install some dependencies to connect to the database. To download the drivers needed, go to Microsoft SQL Drivers for Python. Once you download them, run through the installation process.

Resources - pyodbc with Azure:

If you would like to read more on the topic of using pyodbc in conjunction with Microsoft Azure, I would refer you to the documentation provided by Microsoft.
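
As a point of reference, here is a minimal sketch of opening a connection to an Azure SQL database with pyodbc. The server, database, and credential values are placeholders, and the snippet assumes the Microsoft ODBC Driver 17 for SQL Server is installed.

import pyodbc

# Placeholder connection details; replace with your own server and credentials.
connection = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server-name.database.windows.net,1433;"
    "DATABASE=your-database-name;"
    "UID=your-username;"
    "PWD=your-password;"
    "Encrypt=yes;"
)

# Run a trivial query to confirm the connection works.
cursor = connection.cursor()
cursor.execute("SELECT @@VERSION;")
print(cursor.fetchone()[0])
connection.close()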

Setup

Right now, this project is not planned to be hosted on PyPI, so you will need to do a local install on your system if you plan to use it in other scripts. First, clone this repo to your local system, as shown below.
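
For example (the repository URL below is a placeholder; substitute this repo's actual URL):

git clone https://github.com/<your-account>/azure-data-pipeline.git
cd azure-data-pipeline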

Setup - Local Install:

If you plan to use this project in other projects on your system, I would recommend you either install it in editable mode or do a local install. For those of you who want to make modifications to this project, I would recommend you install the library in editable mode.

If you want to install the library in editable mode, use pip, which will run the setup.py file and install any dependencies you may need. To do so, run the following command in your terminal from the root of the repo.

pip install -e .

If you don't plan to make any modifications to the project but still want to use it across your different projects, then do a local install.

pip install .

This will install all the dependencies listed in the setup.py file. Once done, you can use the library wherever you want.

Setup - Requirement Install:

If you don't plan to use this project in any of your other projects, I would recommend you just install the dependencies using the requirements.txt file.

pip install --requirement requirements.txt

Usage

Here is a simple example of using the azure_data_pipeline library to initialize a client that can grab a specific database from our SQL Server.

from configparser import ConfigParser
from azure_data_pipeline.client import AzureSQLClient

# Initialize the Parser.
config = ConfigParser()

# Read the file.
config.read('config/config.ini')

# Grab the Azure Management Credentials.
subscription_id = config.get('azure_credentials', 'azure_subscription_id')
tenant_id = config.get('azure_credentials', 'azure_tenant_id')
client_id = config.get('azure_credentials', 'azure_client_id')
client_secret = config.get('azure_credentials', 'azure_client_secret')

# Grab the Azure SQL Server Credentials.
server_username = config.get('server_info', 'administrator_login')
server_password = config.get('server_info', 'administrator_login_password')

# Initialize the client.
azure_pipeline_client = AzureSQLClient(
    client_id=client_id,
    client_secret=client_secret,
    subscription_id=subscription_id,
    tenant_id=tenant_id,
    username=server_username,
    password=server_password
)
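
For reference, the config/config.ini file read above would look something like this. The section and key names come straight from the snippet; the values are placeholders to replace with your own credentials.

[azure_credentials]
azure_subscription_id = YOUR-SUBSCRIPTION-ID
azure_tenant_id = YOUR-TENANT-ID
azure_client_id = YOUR-CLIENT-ID
azure_client_secret = YOUR-CLIENT-SECRET

[server_info]
administrator_login = YOUR-ADMIN-LOGIN
administrator_login_password = YOUR-ADMIN-PASSWORD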