Senior Data Engineer

Aivar Innovations

Bengaluru, Karnataka, India

Full-time

Engineering

About Aivar Innovations

Aivar Innovations is an AI-native services and software company and an AWS Preferred Partner, backed by Bessemer Venture Partners and Sorin Investments. Founded by four former Amazon Web Services senior leaders, Aivar combines deep cloud engineering expertise with AI-first thinking to build and deploy production-grade solutions at enterprise scale. We raised $4.6 million in seed funding in January 2026 and serve 100+ customers globally across 7+ industries, including fintech, healthcare, and technology.

Aivar means "five people" in Tamil — four co-founders and you, the customer. That philosophy shapes everything: we hold the customer at the centre of every architecture decision, every delivery milestone, and every line of code we write.

Our Three AI Accelerator Platforms

CONVOGENT:- Enterprise-grade voice and agent AI automation platform — intelligent virtual agents, multi-turn conversational flows, and voice-enabled self-service across telephony and digital channels.

VELOGENT:- Governed agentic process automation for regulated industries — AI orchestration with audit-ready governance controls, enabling safe automation of complex workflows in compliance-sensitive environments.

KUBOGENT:- Kubernetes-native AIOps platform: intelligent infrastructure management, automated incident response, and ML-driven observability at scale, reducing time-to-production by 60% with a 99.9% uptime SLA.

WHAT YOU'LL BUILD:-

You will be the engineering backbone behind the data pipelines that power Aivar's three AI accelerators — Convogent, Velogent, and Kubogent. Everything our autonomous agents reason over, every compliance trace they produce, and every real-time insight they surface flows through infrastructure you design, build, and own. This is not pipeline maintenance — you will architect brand-new data systems from first principles, working directly with former AWS leaders and enterprise clients to solve problems that matter.

Design and operationalise large-scale ingestion pipelines that transform unstructured enterprise data — invoices, PDFs, transaction records, audio transcripts, RFQs — into clean, queryable datasets at terabyte scale.
Build the feature engineering layer that feeds agentic AI models in Velogent: entity extraction, document classification, semantic tagging, and compliance-aware embeddings.
Architect the real-time data backbone for Convogent — near-zero-latency event streams that drive voice agent decision-making across global telephony channels.
Instrument and harden the observability data layer for Kubogent AIOps — metric ingestion, anomaly detection feeds, and ML-driven incident correlation pipelines.
Establish data quality, lineage, and audit frameworks that meet the requirements of regulated industries (GDPR, HIPAA, SOC 2, RBI guidelines), because our clients' compliance is non-negotiable.

ROLE OVERVIEW:-

As a Senior Data Engineer at Aivar, you operate at the intersection of raw engineering craft and AI product impact. You own your pipelines end-to-end: from requirements gathering with the Solutions Architect, through design, build, and test, to production monitoring and SLA ownership. You bring the discipline of a platform engineer, the curiosity of an AI practitioner, and the rigour of someone who has shipped data systems that enterprise clients bet their operations on.

You will work on 2–4 simultaneous client delivery projects and may participate in presales technical scoping alongside the Data Architect. Strong written communication, design-document authoring, and code review leadership are as important as your hands-on coding ability.

KEY RESPONSIBILITIES:-

Pipeline Design & Engineering

Design end-to-end data pipelines — batch and streaming — processing large volumes of unstructured enterprise data (documents, PDFs, transaction records, email threads) with automated validation and quality checks.
Build production-grade data ingestion frameworks supporting multiple source systems (ERP, CRM, S3, Kafka topics, REST APIs) with schema evolution, dead-letter queuing, and retry logic.
Implement large-scale distributed processing with Apache Spark, Apache Flink, or AWS Glue that handles terabytes of data reliably and cost-efficiently.
Develop advanced feature engineering pipelines — document entity extraction, semantic tagging, vector embeddings — to feed Aivar's agentic AI reasoning systems.

Data Architecture & Warehousing

Design lakehouse architectures (medallion: Bronze → Silver → Gold) on AWS S3 using open table formats (Apache Iceberg, Delta Lake, Apache Hudi) with schema evolution and time-travel capabilities.
Architect data warehousing solutions supporting both near-real-time operational queries for agentic AI and historical analytical workloads for client reporting (Snowflake, Redshift, Athena).
Implement efficient data partitioning, Z-ordering, and compaction strategies to optimise query performance and cloud storage costs.

Data Quality, Governance & Compliance

Build data quality frameworks using Great Expectations, Deequ, or custom validators, ensuring the accuracy, completeness, and freshness that autonomous agent decision-making depends on.
Implement end-to-end data lineage tracking, metadata management, and audit trail generation for regulated environments (HIPAA, GDPR, SOC 2).
Lead data security implementation for sensitive information — PII tokenisation, field-level encryption, RBAC, and data masking — across all managed pipelines.

NLP & AI Data Enablement

Build document understanding pipelines (OCR, layout analysis, named entity recognition, relationship extraction) using spaCy, Hugging Face Transformers, and Amazon Textract.
Design and manage vector data stores (Pinecone, Weaviate, pgvector) for retrieval-augmented generation (RAG) and semantic search use cases.
Collaborate with ML Engineers to design and serve feature stores supporting low-latency model inference in production agentic flows.

Operational Excellence

Instrument all pipelines with observability: metrics, traces, and alerts via Prometheus, Grafana, and OpenTelemetry; own SLA adherence.
Maintain infrastructure-as-code discipline — provision and manage all data infrastructure through Terraform or AWS CDK.
Conduct code reviews, write technical design documents, and actively mentor junior Data Engineers on the team.

REQUIRED SKILLS & TECHNICAL COMPETENCIES:-

Unstructured data mastery — production ingestion, OCR, and processing of documents, PDFs, images, and logs at scale.
Distributed computing — Apache Spark, Apache Flink, or AWS Glue at production scale; performance tuning and cost optimisation experience.
Expert Python — data processing, ETL/ELT pipeline development, data science workflows; production-grade, not notebook-level.
NLP/text processing — document understanding, entity extraction, semantic processing (spaCy, Hugging Face Transformers, LangChain).
Lakehouse & data lake architecture: experience with medallion patterns, Iceberg/Delta Lake/Hudi, schema evolution, and time-travel queries.
AWS data services — S3, Athena, Glue, RDS (PostgreSQL/Aurora), DynamoDB, MSK (Managed Kafka), EMR.
Streaming data — Kafka (MSK), Redis Streams, or Kinesis for real-time event processing.
Data quality & governance — metadata management, lineage tracking, GDPR/HIPAA/SOC 2 compliance frameworks.
dbt — modelling transformations, testing, documentation, and CI/CD for analytics engineering.
Infrastructure as Code — Terraform or AWS CDK for provisioning and managing cloud data infrastructure.

PREFERRED EXPERIENCE:-

Prior experience in a data engineering role at an AI/ML product company, cloud consulting firm, or enterprise SaaS environment.
AWS Certified Data Engineer – Associate or AWS Certified Solutions Architect certification.
Experience integrating LLM APIs (OpenAI, Anthropic, Amazon Bedrock) into production data workflows.
Familiarity with data catalog and governance tools such as AWS Glue Data Catalog, Collibra, Atlan, or Apache Atlas.
Exposure to agentic AI patterns — tool-use, multi-step reasoning, retrieval-augmented generation (RAG).
Experience working in a regulated industry (banking, insurance, healthcare) with awareness of RBI, IRDAI, or HIPAA data handling requirements.
Open-source contribution or published technical writing on data engineering topics.

DIVERSITY & INCLUSION:-

Aivar Innovations is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to gender, gender identity, sexual orientation, religion, disability, age, marital status, caste, or any other protected characteristic. We are committed to building a diverse, inclusive, and respectful workplace for everyone.

About the company

Aivar Innovations

Technology, Information and Internet

Aivar is an AI-first technology partner where cutting-edge technology meets industry expertise to supercharge your projects. Our AI-augmented teams accelerate development, reduce time-to-market, and deliver exceptional code quality. We bring together the best minds in tech to craft scalable, repeatable solutions that drive real momentum for your business.