We’re hiring a Vision | Multimodal ML Engineer for a fast-growing AI safety infrastructure startup building reliability and control systems for large-scale AI deployments.

The Company Behind the Role:
- AI-native infrastructure product
- Focused on AI safety, reliability, and model optimization
- Backed by international investors
- Operating at large production scale
- Small, highly technical team with fast shipping cycles

The company builds systems that help organizations define, evaluate, and enforce how AI models behave in real-world environments.

Your Impact:
- Train and fine-tune vision-language models
- Extend multimodal systems to video and long-context tasks
- Design alignment pipelines (RL-based and preference optimization)
- Build evaluation benchmarks for multimodal reasoning
- Curate and optimize large-scale multimodal datasets
- Optimize inference (quantization, batching, latency improvements)
- Work with advanced model architectures for efficient scaling

This role sits at the intersection of research and production.

Tech Environment (High-Level):
- PyTorch-based distributed training
- Vision encoders + LLM integration
- Video and temporal modeling
- Multimodal alignment techniques
- Large-scale inference optimization
- Production model serving infrastructure

(Deep technical stack shared during interviews)

Your Superpower:
- 3+ years training and fine-tuning vision-language models
- Strong understanding of multimodal architecture design
- Experience with alignment techniques beyond text-only models
- Hands-on experience with video or long-context modeling
- Track record of shipping multimodal systems to production
- Strong distributed training experience
- Solid engineering fundamentals

Bonus Points If You Have:
- Experience with advanced routing or sparse model architectures
- Background in AI safety or model evaluation
- Experience with synthetic data generation pipelines

Why Join:
- Competitive compensation + equity
- Hybrid setup in Europe + relocation support
- Comprehensive health coverage
- Top-tier hardware and tools
- Team off-sites
- Budget for learning and AI tooling