The mission
We aim to reduce "Time to Market" in our semiconductor R&D by revolutionizing how we analyze test data. Currently, our engineers manually parse complex JSON files containing parametric extraction results and electrical curves.
Inspired by recent literature on AI-driven semiconductor test data analytics, we want to build a system that combines a Data Translator with an Interactive AI Agent. Your goal is to create a Python-based framework where an engineer can ask natural-language questions (e.g., "Show me the defect pattern on Wafer X" or "Compare the yield of Lot A vs. Lot B") and receive instant statistical insights and visual diagnostics.
Key responsibilities
- Literature review & strategy: Start by analyzing state-of-the-art papers (such as Wang et al. on IEA-Plot and recent TPOR frameworks). Weigh the pros and cons of Knowledge Graphs vs. Vector RAG vs. SQL Agents for our specific data topology.
- Universal data translator: Design a pipeline that ingests our JSON test-data files (curves, parameters, flags) and standardizes them into a unified format (e.g., Parquet) suitable for AI querying; a minimal sketch follows this list.
- Agentic AI development: Build a "Code Interpreter" agent using Python (LangChain/LlamaIndex) and local LLMs (Llama 3, Mistral) on our GPU servers. The agent must be capable of writing code to perform statistical aggregations (PCA, mean, sigma) without hallucinating results; see the agent sketch after this list.
- Advanced visualization: Go beyond basic charts. Implement an automated plotting module capable of generating distribution curves and correlation plots based on the chat context (the UI sketch after this list renders one such plot).
- Prototyping: Wrap this technology in a user-friendly web interface (Streamlit) for immediate feedback from R&D engineers. The student may consider an integration with Azure or Copilot Studio, but a standalone solution is preferred; the decision will be made after point 1 (literature review & strategy).
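To make the translator step concrete, here is a minimal Python sketch. The JSON schema is assumed for illustration (hypothetical "parameters" and "curves" keys with "points", "device_id", and "curve_type" fields); the real layout has to be mapped during the project.

```python
import json
from pathlib import Path

import pandas as pd  # writing Parquet additionally requires pyarrow


def translate_test_file(json_path: Path, out_dir: Path) -> None:
    """Flatten one raw test-data JSON file into columnar Parquet tables."""
    raw = json.loads(json_path.read_text())

    # Hypothetical layout: one record per device with scalar parametric values.
    params = pd.json_normalize(raw["parameters"])             # assumed key
    params["source_file"] = json_path.name                    # keep provenance
    params.to_parquet(out_dir / f"{json_path.stem}_params.parquet")

    # Hypothetical layout: electrical curves stored as lists of sample points.
    curves = pd.json_normalize(
        raw["curves"],                       # assumed key
        record_path="points",                # assumed per-curve sample list
        meta=["device_id", "curve_type"],    # assumed per-curve metadata
    )
    curves.to_parquet(out_dir / f"{json_path.stem}_curves.parquet")
```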
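For the agent itself, a minimal sketch assuming an Ollama-served Llama 3 model and LangChain's experimental pandas agent; the actual stack (LangChain vs. LlamaIndex, model choice) is an open design decision of the project, and the file and column names are placeholders.

```python
import pandas as pd
from langchain_community.llms import Ollama
from langchain_experimental.agents import create_pandas_dataframe_agent

# Output of the translator step (placeholder file name).
df = pd.read_parquet("wafer_params.parquet")

# Local model served by Ollama; temperature 0 for reproducible answers.
llm = Ollama(model="llama3", temperature=0)

# The agent answers by writing and executing pandas code against df, so the
# numbers come from actual computation rather than token prediction.
agent = create_pandas_dataframe_agent(
    llm, df, verbose=True, allow_dangerous_code=True
)

# "vth" and the lot labels are placeholder names for illustration.
print(agent.invoke("Compare the mean and sigma of vth for lot A vs lot B"))
```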
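Finally, a minimal Streamlit sketch of the prototype UI with the agent call stubbed out; it renders one of the planned plot types (a parameter distribution), and the file and column names are again placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd
import streamlit as st

st.title("Test-Data Analysis Assistant")

df = pd.read_parquet("wafer_params.parquet")   # placeholder file name

question = st.chat_input("Ask about the test data")
if question:
    st.write(f"Question: {question}")
    # In the real app the question would go to the agent
    # (e.g. agent.invoke(question)); here we only demonstrate one of the
    # planned plot types: a parameter distribution curve.
    fig, ax = plt.subplots()
    df["vth"].plot.hist(bins=50, ax=ax)        # placeholder column name
    ax.set_xlabel("vth")
    ax.set_ylabel("count")
    st.pyplot(fig)
```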
What you will learn
- Research-to-Production: How to take concepts from
academic papers and implement them in
a real industrial environment.
- Agentic workflows: Mastering the intersection of
LLMs and deterministic code execution.
- Semiconductor domain knowledge: Understanding parametric chip/device (MOSFET) testing.
- High-performance computing: Utilizing local GPU infrastructure for secure, private model inference.
Requirements
- Master's student in CS, Data Science, AI, or EE; exchange Ph.D. students are also considered.
- Understanding of model architectures (transformers, LSTMs) and AI fundamentals is a must. Familiarity with advanced LLM architectures is preferred, as is experience running pre-trained LLMs locally.
- Strong Python skills: Experience with deep learning frameworks (TensorFlow, PyTorch) and related libraries such as pandas and NumPy is a must.
- Academic mindset: You are comfortable reading IEEE/research papers and translating their methodology into code.
- GenAI knowledge: Understanding of RAG (Retrieval-Augmented Generation) and LLM limitations.
- Familiarity with Git and
Linux/Unix environments.
Nice to have
- Experience with knowledge graphs (Neo4j, NetworkX).
- Experience processing spatial
data (heatmaps/wafer maps).
How to apply
Please send your application including:
- CV/resume
- motivation letter
- current study
Bonus: in your email, briefly mention one challenge you foresee in applying LLMs to numerical scientific data.
Incomplete applications will not be considered.