      Research and development projects across machine learning, NLP, and distributed systems.
Duration: Sep 2025 – Present
Technologies: Python, Java, C++, Rust, BPMN, jBPM, InfluxDB, gRPC, Docker
Architected a multi-language trading platform supporting 1,000+ client scenarios with workflow orchestration across services.
Implemented BPMN/jBPM-based process automation for trade lifecycle management.
Key Achievements:
• Optimized InfluxDB time-series queries achieving 40% performance improvement through custom indexing strategies and query rewrites
• Built high-performance gRPC APIs enabling sub-millisecond inter-service communication across polyglot microservices
• Designed fault-tolerant distributed system with circuit breakers and retry mechanisms ensuring 99.9% uptime
• Containerized entire application stack with Docker, enabling seamless deployment across development and production environments
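To illustrate the circuit-breaker and retry pattern mentioned in the achievements above, here is a minimal Python sketch. It is not the platform's actual code: the class name, parameter names (`max_failures`, `reset_after`, `base_delay`), and thresholds are all hypothetical, chosen only to show how the two mechanisms can fit together.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then rejects calls
    until `reset_after` seconds have passed (hypothetical parameter names)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single failure during the trial reopens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result


def call_with_retry(breaker, func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry `func` through the breaker with exponential backoff."""
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # do not keep calling a dependency the breaker has isolated
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The retry loop handles short transient errors, while the breaker stops retries from piling onto a dependency that is failing persistently.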
Duration: Sep 2024 – Dec 2024
Course: NYU Emerging Topics in NLP
Technologies: PyTorch, Transformers, CUDA, Python, Distributed Computing
Researched the internals of multi-modal language models, investigating how neural networks process information across modalities.
Research Contributions:
• Analyzed 10,000+ activation patterns across 5+ state-of-the-art models (GPT, CLIP, Flamingo) using activation patching techniques
• Developed custom PyTorch hooks and intervention methods to trace information flow through transformer layers
• Implemented distributed computing pipeline on Unix/Linux clusters for parallel model evaluation across multiple GPUs
• Discovered novel insights into cross-modal attention mechanisms and feature composition in vision-language models
• Presented findings demonstrating 35% improvement in model interpretability metrics compared to baseline methods
Duration: Jan 2023 – Jun 2023
Lab: ICTD Lab, University of Washington
Technologies: Python, Scikit-learn, Pandas, K-means, DBSCAN, Random Forest
Led data science initiative analyzing user behavior patterns across 500+ Tanzanian enterprise datasets to drive strategic business decisions.
Impact & Methodology:
• Processed and cleaned 500+ datasets spanning retail, agriculture, and telecommunications sectors
• Applied unsupervised learning to identify 12 distinct user behavior clusters, revealing previously unidentified market segments
• Built predictive models achieving 87% accuracy in customer churn prediction
• Improved business decision-making accuracy by 25% across 3 industry verticals
• Created interactive dashboards for real-time KPI monitoring
• Published findings contributing to ICTD research on technology adoption patterns in developing economies
Duration: May 2022 – Dec 2022
University: Arizona State University
Technologies: PyTorch, ConvLSTM, Attention Mechanisms, GIS, Remote Sensing
Developed a deep learning architecture for streamflow prediction, achieving a Nash-Sutcliffe Efficiency (NSE) of 0.91 and an 8.33% improvement over baseline models.
Technical Innovation:
• Designed custom Sequence-to-Sequence Channel Attention ConvLSTM architecture
• Integrated multi-source data: satellite imagery, meteorological records, topographical maps
• Achieved NSE of 0.91 and Percent Bias of 0.04%, surpassing traditional hydrological models
• Improved prediction accuracy by 8.33% over baseline through attention mechanisms
• Processed 10+ years of historical data with sophisticated feature engineering
• Implemented ensemble methods for robust uncertainty quantification
Real-world Impact: Model deployed for water resource management supporting irrigation planning and flood risk assessment
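For readers unfamiliar with the two metrics quoted above, the sketch below shows how Nash-Sutcliffe Efficiency (NSE) and Percent Bias are commonly computed from observed and simulated flows. The function names are hypothetical, and the Percent Bias sign convention (positive means the model underestimates) is an assumption, since references differ on the sign; the project's own evaluation code may differ.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect fit; 0.0 means the model is no better than
    predicting the mean of the observations.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and the same length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - residual / variance


def percent_bias(observed, simulated):
    """PBIAS = 100 * sum(obs - sim) / sum(obs).

    Assumed convention: positive values mean the model underestimates.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and the same length")
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("PBIAS is undefined when observations sum to zero")
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / total_obs


# Toy values for illustration only, not data from the project.
observed_flow = [1.0, 2.0, 3.0]
simulated_flow = [2.0, 2.0, 3.0]
print(nash_sutcliffe_efficiency(observed_flow, simulated_flow))
print(percent_bias(observed_flow, simulated_flow))
```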
Organization: Microsoft Research
Technologies: PyTorch, Geospatial Analysis, Remote Sensing, Git, Python
Contributed to Microsoft's TorchGeo library for geospatial machine learning, expanding temporal satellite data processing capabilities.
Contribution Details:
• Implemented complete substation dataset module with comprehensive data loading and preprocessing pipelines
• Designed 5 temporal aggregation strategies (mean, median, max, min, percentile) for multi-temporal satellite imagery
• Developed efficient data indexing system enabling fast retrieval from large-scale geospatial datasets (100GB+)
• Created extensive unit tests achieving 95% code coverage and comprehensive documentation
• Collaborated with Microsoft researchers through code reviews and GitHub pull request discussions
Impact: Feature now used by researchers worldwide for climate monitoring and land use classification
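The five temporal aggregation strategies listed above (mean, median, max, min, percentile) all reduce a stack of co-registered images along the time axis, pixel by pixel. The sketch below shows that idea in dependency-free Python. It is not TorchGeo's API: `aggregate_over_time`, `percentile`, and the nested-list image layout are hypothetical stand-ins, and real code would normally operate on tensors.

```python
import statistics


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Reducers that take the time series of one pixel and return one value.
REDUCERS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "max": max,
    "min": min,
}


def aggregate_over_time(stack, strategy, q=90.0):
    """Collapse a time series of H x W grids (a list of 2D lists) into one grid."""
    if strategy == "percentile":
        def reduce_fn(values):
            return percentile(values, q)
    elif strategy in REDUCERS:
        reduce_fn = REDUCERS[strategy]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    height, width = len(stack[0]), len(stack[0][0])
    return [
        [reduce_fn([frame[r][c] for frame in stack]) for c in range(width)]
        for r in range(height)
    ]


# Three acquisitions of a 1 x 2 image (toy values for illustration only).
stack = [[[1.0, 2.0]], [[3.0, 8.0]], [[5.0, 5.0]]]
print(aggregate_over_time(stack, "median"))
print(aggregate_over_time(stack, "percentile", q=75.0))
```

Median and percentile reducers are often preferred for cloud-affected imagery because a single outlier acquisition has less influence on them than on the mean.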
Organization: IBM Research
Technologies: PyTorch, Transformers, Computer Vision, Time Series, Python
Enhanced IBM's TerraTorch framework adding temporal processing for satellite time-series analysis, achieving 15% improvement in change detection.
Technical Contribution:
• Architected temporal processing module integrating with existing encoder architectures (ResNet, ViT, Swin Transformer)
• Implemented 3D convolution and recurrent processing layers for multi-temporal satellite sequences
• Designed flexible API for adapting spatial models to temporal earth observation tasks
• Optimized memory efficiency when handling high-resolution satellite imagery sequences (512×512+)
• Validated on benchmarks demonstrating 15% improvement in change detection
• Contributed comprehensive documentation, tutorials, and example notebooks
Applications: Enables deforestation tracking, urban growth monitoring, and disaster response workflows
A journey through prestigious organizations and innovative projects.
Company: Agree.com
Location: New York, NY
Duration: May 2025 – Aug 2025
Technologies: Python, LLMs (GPT-4, Claude), React, TypeScript, ONNX, OCR
Spearheaded AI-powered document processing systems integrating LLMs and computer vision, achieving 90% OCR accuracy and 80% latency reduction.
Key Accomplishments:
• Payment Intent Detection: Architected full-stack microservice leveraging LLMs (GPT-4) to automatically extract payment terms, reducing manual processing time by 75%
• Built distributed processing pipeline handling 10,000+ documents daily with parallel job scheduling
• Implemented automated invoice generation processing $2M+ in monthly transactions
• Form Field Detection: Developed ML-powered OCR achieving 90% accuracy in extracting structured data
• Optimized ONNX model inference reducing latency by 80% through quantization
• Integrated detection system into React TypeScript editor with real-time preview
Impact: Serving 500+ enterprise clients, saving 100+ hours weekly in manual document processing
Company: Groupon
Location: Bengaluru, India
Duration: Jul 2023 – Aug 2024
Technologies: React, Node.js, PostgreSQL, Redis, RabbitMQ, AWS, Docker, Kubernetes
Led email marketing infrastructure handling 1M+ daily transactions, reducing deployment time by 70% and generating $15M+ quarterly revenue.
Technical Achievements:
• Distributed Email Infrastructure: Scaled system handling 1M+ daily transactions with 99.8% delivery rate
• Optimized PostgreSQL reducing query times by 60% through strategic indexing
• Implemented Redis caching decreasing API response times from 800ms to 120ms
• Bulk Campaign Creator: Developed React/Node.js application enabling 70% faster campaign deployment
• Built drag-and-drop interface with real-time preview and A/B testing
• Scalability: Designed auto-scaling infrastructure supporting 18% month-over-month growth
• Deployed Kubernetes clusters with horizontal pod autoscaling for traffic spikes
• Implemented Grafana dashboards tracking 50+ KPIs
Business Impact: Generated $15M+ quarterly revenue through improved campaign effectiveness
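The caching improvement described above is typical of a cache-aside read path: check the cache first, fall back to the database on a miss, then store the result with an expiry. The source does not say which caching pattern was used, so the sketch below is only one plausible shape. `TTLCache` is an in-memory stand-in for Redis, and `get_campaign`, `load_from_db`, and the key format are hypothetical names.

```python
import time


class TTLCache:
    """In-memory stand-in for a key-value cache with per-entry expiry."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop stale entries lazily on read
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)


def get_campaign(campaign_id, cache, load_from_db, ttl_seconds=60):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"campaign:{campaign_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    record = load_from_db(campaign_id)  # slow path, taken only on a miss
    cache.set(key, record, ttl_seconds)
    return record
```

The TTL bounds how stale a cached record can get; writes that must be visible immediately would also need to invalidate or overwrite the key.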
Company: Bank of New York Mellon
Location: Chennai, India
Duration: Jan 2023 – Jun 2023
Technologies: Java, Spring Boot, Angular, SQL Server, LangChain, Python, OAuth 2.0
Built financial microservices and AI-powered chatbot, reducing workflow processing time from 3 days to 1 day and achieving 30% faster data retrieval.
Development Highlights:
• Financial Reporting Microservices: Built scalable Java Spring Boot applications with RESTful APIs
• Designed SQL Server schema with optimized queries for multi-million record datasets
• Developed Angular frontend with Material Design featuring CRUD, pagination, and filtering
• Implemented OAuth 2.0/JWT authentication to meet banking security compliance requirements
• Workflow Automation: Created data ingestion pipeline reducing processing time from 3 days to 1 day
• AI-Powered Chatbot: Developed LangChain-based NLP assistant with semantic search
• Achieved 30% faster data retrieval for 200+ analysts
Recognition: Received an outstanding performance rating and a full-time offer for delivering ahead of schedule
Company: Providence Healthcare
Location: Hyderabad, India
Duration: May 2022 – Jul 2022
Technologies: React, .NET Core, C#, SQL Server, Azure AD, HIPAA Compliance, RBAC
Architected HIPAA-compliant security framework protecting patient data across 8+ applications serving 5,000+ healthcare professionals.
Security Implementation:
• Enterprise Security Framework: Deployed React and .NET Core security solution across 8+ healthcare applications
• Implemented RBAC system with granular permissions for 5,000+ professionals
• Designed audit logging for HIPAA compliance tracking all data access events
• Integrated Azure AD for single sign-on and multi-factor authentication
• Database Security: Developed SQL Server encrypted layer with row-level security for PHI
• Implemented data masking for sensitive information
• Compliance: Ensured HIPAA, HITRUST, and SOC 2 standards compliance
• Conducted security reviews identifying and resolving 15+ vulnerabilities
Leadership: Collaborated across DevOps, QA, and compliance teams to deliver on a tight timeline
Organization: Indian Space Research Organisation (ISRO)
Location: Space Applications Centre, Jodhpur, India
Duration: May 2021 – Jul 2021
Technologies: Python, TensorFlow, Keras, Pandas, NumPy, GDAL, LiDAR Processing
Developed ML pipeline for terrain mapping from LiDAR data, improving accuracy by 12% and processing 500+ km² for ISRO missions.
Research Contributions:
• ML Pipeline: Designed end-to-end pipeline processing massive LiDAR datasets
• Implemented point cloud classification using deep learning for ground/vegetation/building distinction
• Optimized workflows handling 10GB+ raw LiDAR files efficiently
• Digital Elevation Models: Generated high-resolution DEMs (1-meter resolution) for terrain analysis
• Improved mapping accuracy by 12% through neural network-based surface reconstruction
• Validated across diverse terrain types (desert, mountainous, urban)
• Coverage & Scale: Processed 500+ km² supporting multiple ISRO missions
• Developed automated quality control ensuring mission-critical accuracy
• Created visualization tools for 3D terrain rendering
Impact: Research used for mission planning and resource assessment. Received commendation from senior ISRO scientists.
Academic excellence and achievements throughout my journey
Awarded by: BITS Pilani
Year: 2019–2023
Recognized as being in the top 5% of students and awarded a merit scholarship for academic excellence during undergraduate studies.
Awarded by: Sports Engineering Association, India
Year: 2022
Awarded at the 2nd International Conference of Sports Engineering for creative excellence in technical communication.
Organization: Association for Computing Machinery
Year: 2022
Selected to participate in the ACM Winter School program for advanced computer science training and research.
Let's connect! Feel free to reach out for opportunities, collaborations, or just to say hello.
Location: Jersey City, NJ
Email: rijul(dot)dahiya(at)gmail.com
Website: rijul.co
LinkedIn: linkedin.com/in/rijuldahiya
GitHub: github.com/rijuld
Medium: medium.com/@rijuldahiya
Twitter: @DahiyaRiju85696