DataDragon - DBA & Data Analytics
Data Engineering & ETL Tools
- Apache Spark, Hadoop
- Airflow, dbt (data build tool)
- Talend, Informatica, AWS Glue
- Snowflake, Amazon Redshift, Google BigQuery
- Azure Synapse Analytics, Databricks
- Python (Pandas, NumPy, PySpark)
- R for statistical analysis
- Scala for big data processing
- Tableau, Power BI, Looker
- Jupyter Notebooks for exploration
- GDPR (General Data Protection Regulation)
- ISO 27001 (Information Security Management)
- Encryption protocols (TLS/SSL, AES)
- Role-based access control (RBAC)
- RESTful APIs, GraphQL for data integration
- Kafka for real-time data streaming
Relational Databases & SQL
- MySQL – Open-source, widely used for web applications (e.g., WordPress, e-commerce).
- PostgreSQL – Advanced, enterprise-grade, with support for JSON and GIS (spatial) data.
- SQL Server – Microsoft's solution, often found in corporate environments.
- Oracle Database – Robust, widely used in finance, banking, and enterprise systems.
- MariaDB – A MySQL fork, optimized for high performance and scalability.
- Writing queries (SELECT, INSERT, UPDATE, DELETE).
- Creating tables (CREATE TABLE).
- Indexing data for faster access (CREATE INDEX).
- Managing relationships (JOIN, FOREIGN KEY constraints).
- Running aggregate functions (SUM, AVG, COUNT).
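The operations listed above can be sketched end to end with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended schema: the `customers` and `orders` tables, their columns, and the sample rows are all hypothetical, and SQLite stands in for whichever relational database is actually in use.

```python
import sqlite3

# In-memory SQLite database; table names, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
cur = conn.cursor()

# CREATE TABLE, including a FOREIGN KEY relationship between the two tables.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    )
""")

# CREATE INDEX on the column used for joins and lookups.
cur.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# INSERT sample rows with parameterized queries.
cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])
cur.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                [(1, 20.0), (1, 30.0), (2, 15.0)])

# UPDATE one row, then DELETE the small orders.
cur.execute("UPDATE orders SET amount = 25.0 WHERE customer_id = 1 AND amount = 20.0")
cur.execute("DELETE FROM orders WHERE amount < 20.0")

# SELECT with a JOIN and the aggregate functions COUNT, SUM, and AVG.
summary = cur.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount), AVG(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.commit()
print(summary)
```

Because the inner JOIN only returns customers that still have orders, a customer whose orders were all deleted drops out of the summary; a LEFT JOIN would keep them with a count of zero.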
NoSQL databases store data in flexible, schema-less formats, suitable for handling large-scale applications and unstructured data. Examples include:
- MongoDB – Document-oriented; stores data in JSON-like format, great for applications with changing schemas.
- Cassandra – Distributed wide-column store built for high write throughput on very large datasets.
- Redis – An in-memory key-value store optimized for caching and fast data retrieval.
- DynamoDB – AWS’s fully managed key-value and document database, built to scale without manual server provisioning.
- MongoDB Query Language (MQL) – JSON-like syntax (db.collection.find({ "name": "Alice" })).
- Cassandra Query Language (CQL) – SQL-like syntax (SELECT * FROM users WHERE id = 1).
- Key-based lookups – Used in Redis and DynamoDB for fast data retrieval (e.g., GET user:1234 in Redis).
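The difference between a document query and a key-based lookup can be sketched with plain Python data structures. Nothing here talks to a real database: `find` and `get` are hypothetical helpers that only mimic the shape of a MongoDB-style field filter and a Redis-style GET, and the sample records are invented for illustration.

```python
# In-memory stand-ins only; no real database client is used here.
users = [
    {"_id": 1, "name": "Alice", "tags": ["admin"]},
    {"_id": 2, "name": "Bob"},  # documents in one collection need not share fields
]

def find(collection, query):
    """Return the documents whose fields equal every key/value pair in query."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in query.items())
    ]

# A key-value store maps one key directly to one value.
cache = {"user:1234": "Alice"}

def get(store, key):
    """Return the value stored under key, or None if the key is absent."""
    return store.get(key)

alice_docs = find(users, {"name": "Alice"})   # document-style filter
cached_name = get(cache, "user:1234")         # key-based lookup
missing = get(cache, "user:9999")             # absent key
print(alice_docs, cached_name, missing)
```

The document filter has to inspect fields, so real document stores rely on indexes to avoid scanning every record, while the key lookup goes straight to one entry, which is why key-value stores are a common choice for caching.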
Graph databases are optimized for storing complex relationships between entities.
- Neo4j – Uses Cypher Query Language (MATCH (a:User)-[:FRIEND]->(b) RETURN b).
- Amazon Neptune – Managed graph database supporting Gremlin queries.
- ArangoDB – Multi-model database combining key-value, document, and graph features.
- Social networks (friendship graphs).
- Fraud detection (linking financial transactions).
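A friendship graph of the kind mentioned above can be sketched as an adjacency list in plain Python. This is an illustration of the traversal idea only, not how any graph database works internally; the user names, the `friends` mapping, and both helper functions are hypothetical.

```python
from collections import deque

# Directed FRIEND edges stored as an adjacency list (hypothetical sample data).
friends = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}

def direct_friends(graph, user):
    """One-hop neighbours, similar in spirit to MATCH (a)-[:FRIEND]->(b) RETURN b."""
    return list(graph.get(user, []))

def within_hops(graph, start, max_hops):
    """Breadth-first search: users reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append(neighbour)
                queue.append((neighbour, depth + 1))
    return reached

one_hop = direct_friends(friends, "alice")
two_hops = within_hops(friends, "alice", 2)
print(one_hop, two_hops)
```

The `seen` set keeps a user from being reported twice when several paths lead to them, as with `dave` here, who is reachable through both `bob` and `carol`.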
Columnar & Time-Series Databases
- ClickHouse – Open-source columnar database built for fast analytical (OLAP) queries.
- Google Bigtable – Column-family database designed for massive scalability.
- InfluxDB – Optimized for storing time-series data (IoT, server monitoring).
- TimescaleDB – PostgreSQL-based solution for time-series data.
Columnar databases suit analytical workloads that scan a few columns across many rows, because only the columns a query needs have to be read.
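The row-versus-column distinction can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: the metric records are invented, `to_columns` is a hypothetical helper, every record is assumed to have the same fields, and real columnar engines add compression and on-disk layouts that are not shown.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"ts": 1, "host": "a", "cpu": 0.25},
    {"ts": 2, "host": "b", "cpu": 0.5},
    {"ts": 3, "host": "a", "cpu": 0.75},
]

def to_columns(records):
    """Pivot row records into one list per field (a column-oriented layout)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns

columns = to_columns(rows)

# Row layout: the aggregate has to visit every full record.
avg_cpu_rows = sum(record["cpu"] for record in rows) / len(rows)

# Column layout: the aggregate reads only the "cpu" column.
avg_cpu_cols = sum(columns["cpu"]) / len(columns["cpu"])
print(avg_cpu_rows, avg_cpu_cols)
```

Both layouts give the same answer; the difference is how much data the query touches, which matters once tables have many columns and billions of rows.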
- Elasticsearch – Full-text search and analytics engine, queried through its JSON-based Query DSL.
- Memcached – Simple in-memory caching system.
- Apache Solr – Powerful search platform supporting structured queries.