Rijul Dahiya

NYU/Citibank Capstone Project

Duration: Sep 2025 – Present

Technologies: Python, Java, C++, Rust, BPMN, jBPM, InfluxDB, gRPC, Docker

Architected enterprise-grade multi-language trading platform supporting 1000+ diverse client scenarios with sophisticated workflow orchestration.

Implemented BPMN/jBPM-based process automation for trade lifecycle management.

Key Achievements:

• Optimized InfluxDB time-series queries, achieving a 40% performance improvement through custom indexing strategies and query rewrites

• Built high-performance gRPC APIs enabling sub-millisecond inter-service communication across polyglot microservices

• Designed fault-tolerant distributed system with circuit breakers and retry mechanisms ensuring 99.9% uptime

• Containerized entire application stack with Docker, enabling seamless deployment across development and production environments

Mechanistic Interpretability Research

Duration: Sep 2024 – Dec 2024

Course: NYU Emerging Topics in NLP

Technologies: PyTorch, Transformers, CUDA, Python, Distributed Computing

Conducted comprehensive research into the inner workings of multi-modal language models, investigating how neural networks process information across modalities.

Research Contributions:

• Analyzed 10,000+ activation patterns across 5+ state-of-the-art models (GPT, CLIP, Flamingo) using activation patching techniques

• Developed custom PyTorch hooks and intervention methods to trace information flow through transformer layers

• Implemented distributed computing pipeline on Unix/Linux clusters for parallel model evaluation across multiple GPUs

• Uncovered new insights into cross-modal attention mechanisms and feature composition in vision-language models

• Presented findings demonstrating 35% improvement in model interpretability metrics compared to baseline methods

User Behavior Analysis for Emerging Markets

Duration: Jan 2023 – Jun 2023

Lab: ICTD Lab, University of Washington

Technologies: Python, Scikit-learn, Pandas, K-means, DBSCAN, Random Forest

Led data science initiative analyzing user behavior patterns across 500+ Tanzanian enterprise datasets to drive strategic business decisions.

Impact & Methodology:

• Processed and cleaned 500+ datasets spanning retail, agriculture, and telecommunications sectors

• Applied unsupervised learning to identify 12 distinct user behavior clusters, revealing previously unknown market segments

• Built predictive models achieving 87% accuracy in customer churn prediction

• Improved business decision-making accuracy by 25% across 3 industry verticals

• Created interactive dashboards for real-time KPI monitoring

• Published findings contributing to ICTD research on technology adoption patterns in developing economies

Deep Learning for Hydrological Forecasting

Duration: May 2022 – Dec 2022

University: Arizona State University

Technologies: PyTorch, ConvLSTM, Attention Mechanisms, GIS, Remote Sensing

Developed advanced deep learning architecture for streamflow prediction, achieving NSE of 0.91 and 8.33% improvement over baseline models.

Technical Innovation:

• Designed custom Sequence-to-Sequence Channel Attention ConvLSTM architecture

• Integrated multi-source data: satellite imagery, meteorological records, topographical maps

• Achieved NSE of 0.91 and Percent Bias of 0.04%, surpassing traditional hydrological models

• Improved prediction accuracy by 8.33% over baseline through attention mechanisms

• Processed 10+ years of historical data with sophisticated feature engineering

• Implemented ensemble methods for robust uncertainty quantification

Real-world Impact: Model deployed for water resource management supporting irrigation planning and flood risk assessment

TorchGeo Open Source Contribution

Organization: Microsoft Research

Technologies: PyTorch, Geospatial Analysis, Remote Sensing, Git, Python

Contributed to Microsoft's TorchGeo library for geospatial machine learning, expanding temporal satellite data processing capabilities.

Contribution Details:

• Implemented complete substation dataset module with comprehensive data loading and preprocessing pipelines

• Designed 5 temporal aggregation strategies (mean, median, max, min, percentile) for multi-temporal satellite imagery

• Developed efficient data indexing system enabling fast retrieval from large-scale geospatial datasets (100GB+)

• Created extensive unit tests achieving 95% code coverage and comprehensive documentation

• Collaborated with Microsoft researchers through code reviews and GitHub pull request discussions

Impact: Feature now used by researchers worldwide for climate monitoring and land use classification

TerraTorch Open Source Contribution

Organization: IBM Research

Technologies: PyTorch, Transformers, Computer Vision, Time Series, Python

Enhanced IBM's TerraTorch framework by adding temporal processing for satellite time-series analysis, achieving a 15% improvement in change detection.

Technical Contribution:

• Architected temporal processing module integrating with existing encoder architectures (ResNet, ViT, Swin Transformer)

• Implemented 3D convolution and recurrent processing layers for multi-temporal satellite sequences

• Designed flexible API for adapting spatial models to temporal earth observation tasks

• Optimized memory efficiency when handling high-resolution satellite imagery sequences (512×512+)

• Validated on benchmarks, demonstrating a 15% improvement in change detection

• Contributed comprehensive documentation, tutorials, and example notebooks

Applications: Enables deforestation tracking, urban growth monitoring, and disaster response workflows

Machine Learning Intern

Company: Agree.com

Location: New York, NY

Duration: May 2025 – Aug 2025

Technologies: Python, LLMs (GPT-4, Claude), React, TypeScript, ONNX, OCR

Spearheaded AI-powered document processing systems integrating LLMs and computer vision, achieving 90% OCR accuracy and 80% latency reduction.

Key Accomplishments:

Payment Intent Detection: Architected full-stack microservice leveraging LLMs (GPT-4) to automatically extract payment terms, reducing manual processing time by 75%

• Built distributed processing pipeline handling 10,000+ documents daily with parallel job scheduling

• Implemented automated invoice generation processing $2M+ in monthly transactions

Form Field Detection: Developed ML-powered OCR achieving 90% accuracy in extracting structured data

• Optimized ONNX model inference reducing latency by 80% through quantization

• Integrated detection system into React TypeScript editor with real-time preview

Impact: Serving 500+ enterprise clients, saving 100+ hours weekly in manual document processing

Software Engineer

Company: Groupon

Location: Bengaluru, India

Duration: Jul 2023 – Aug 2024

Technologies: React, Node.js, PostgreSQL, Redis, RabbitMQ, AWS, Docker, Kubernetes

Led email marketing infrastructure handling 1M+ daily transactions, reducing deployment time by 70% and generating $15M+ quarterly revenue.

Technical Achievements:

Distributed Email Infrastructure: Scaled system handling 1M+ daily transactions with 99.8% delivery rate

• Optimized PostgreSQL reducing query times by 60% through strategic indexing

• Implemented Redis caching decreasing API response times from 800ms to 120ms

Bulk Campaign Creator: Developed React/Node.js application enabling 70% faster campaign deployment

• Built drag-and-drop interface with real-time preview and A/B testing

Scalability: Designed auto-scaling infrastructure supporting 18% month-over-month growth

• Deployed Kubernetes clusters with horizontal pod autoscaling for traffic spikes

• Implemented Grafana dashboards tracking 50+ KPIs

Business Impact: Generated $15M+ quarterly revenue through improved campaign effectiveness

Software Engineer Intern

Company: Bank of New York Mellon

Location: Chennai, India

Duration: Jan 2023 – Jun 2023

Technologies: Java, Spring Boot, Angular, SQL Server, LangChain, Python, OAuth 2.0

Built financial microservices and AI-powered chatbot, reducing workflow processing time from 3 days to 1 day and achieving 30% faster data retrieval.

Development Highlights:

Financial Reporting Microservices: Built scalable Java Spring Boot applications with RESTful APIs

• Designed SQL Server schema with optimized queries for multi-million record datasets

• Developed Angular frontend with Material Design featuring CRUD, pagination, and filtering

• Implemented OAuth 2.0/JWT authentication ensuring banking security compliance

Workflow Automation: Created data ingestion pipeline reducing processing time from 3 days to 1 day

AI-Powered Chatbot: Developed LangChain-based NLP assistant with semantic search

• Achieved 30% faster data retrieval for 200+ analysts

Recognition: Received an outstanding performance rating and a full-time offer for delivering ahead of schedule

Software Engineer Intern

Company: Providence Healthcare

Location: Hyderabad, India

Duration: May 2022 – Jul 2022

Technologies: React, .NET Core, C#, SQL Server, Azure AD, HIPAA Compliance, RBAC

Architected HIPAA-compliant security framework protecting patient data across 8+ applications serving 5,000+ healthcare professionals.

Security Implementation:

Enterprise Security Framework: Deployed React and .NET Core security solution across 8+ healthcare applications

• Implemented RBAC system with granular permissions for 5,000+ professionals

• Designed audit logging for HIPAA compliance tracking all data access events

• Integrated Azure AD for single sign-on and multi-factor authentication

Database Security: Developed SQL Server encrypted layer with row-level security for PHI

• Implemented data masking for sensitive information

Compliance: Ensured HIPAA, HITRUST, and SOC 2 standards compliance

• Conducted security reviews identifying and resolving 15+ vulnerabilities

Leadership: Collaborated across DevOps, QA, and compliance teams to deliver on a tight timeline

Research Intern

Organization: Indian Space Research Organisation (ISRO)

Location: Space Applications Centre, Jodhpur, India

Duration: May 2021 – Jul 2021

Technologies: Python, TensorFlow, Keras, Pandas, NumPy, GDAL, LiDAR Processing

Developed ML pipeline for terrain mapping from LiDAR data, improving accuracy by 12% and processing 500+ km² for ISRO missions.

Research Contributions:

ML Pipeline: Designed end-to-end pipeline processing massive LiDAR datasets

• Implemented point cloud classification using deep learning for ground/vegetation/building distinction

• Optimized workflows handling 10GB+ raw LiDAR files efficiently

Digital Elevation Models: Generated high-resolution DEMs (1-meter resolution) for terrain analysis

• Improved mapping accuracy by 12% through neural network-based surface reconstruction

• Validated across diverse terrain types (desert, mountainous, urban)

Coverage & Scale: Processed 500+ km² supporting multiple ISRO missions

• Developed automated quality control ensuring mission-critical accuracy

• Created visualization tools for 3D terrain rendering

Impact: Research used for mission planning and resource assessment. Received commendation from senior ISRO scientists.

Merit Scholarship

Awarded by: BITS Pilani

Year: 2019-2023

Ranked in the top 5% of students and awarded a merit scholarship for academic excellence during undergraduate studies.

Best Short Film Award

Awarded by: Sports Engineering Association, India

Year: 2022

Awarded at the 2nd International Conference of Sports Engineering for creative excellence in technical communication.

ACM Winter School

Organization: Association for Computing Machinery

Year: 2022

Selected to participate in the ACM Winter School program for advanced computer science training and research.

Location: Jersey City, NJ

Email: rijul(dot)dahiya(at)gmail.com

Website: rijul.co

LinkedIn: linkedin.com/in/rijuldahiya

GitHub: github.com/rijuld

Medium: medium.com/@rijuldahiya

Twitter: @DahiyaRiju85696