Program 2025 - Day Three | FPGA Conference Europe

Program 2025 - DAY THREE

3 Days – 9 Thematic Focus Tracks – 100 Percent Knowhow

THU - 3 JULY 2025

from 08:00 am

Check-In and Welcome coffee

9:00 - 9:40 am

Industrial IoT Challenges and Solutions for Real Time Communication | Application
Helmut Demel
Lattice Semiconductor GmbH

Wolfgang Loewer
El Camino GmbH

Description:
This presentation discusses challenges in the Industrial IoT and solutions for real-time communication, not only for factory automation. Key topics include motor control, predictive maintenance, Industrial Ethernet, and cloud connectivity via OPC UA. Our partner El Camino will give a real-world example of a Profinet implementation. The Lattice Automate stack offers modular hardware platforms, IP building blocks, and software tools to accelerate the development of intelligent automation systems. These solutions enhance efficiency, safety, and real-time performance in industrial environments.

Level: Intermediate

Heterogeneous Design Flow with Adaptive SoCs | Embedded / Vision
Ernst Wehlage
plc2 GmbH

Description:
AMD Adaptive SoCs offer heterogeneous architectures that include not only processing systems and programmable hardware, but also DPU cores, AI Engine arrays, and a network-on-chip (NoC). Hardware, software, and application developers each have different needs, and the AMD tools address these target groups in different ways.

The session will give you an overview of the design flow for Adaptive SoC technologies such as Zynq MPSoC and Versal families with software and hardware programmability.

Attending this seminar will help you if you are planning new projects on AMD technologies and if you are coming from other technologies.

Level: Beginner/Intermediate

Introduction to Altera® FPGA AI Suite for Altera® FPGAs and Altera’s New DSP Block Architecture | AI / ML
Thomas Siebert
Altera, an Intel Company

Description:
This introduction presents the Altera FPGA AI Suite and groundbreaking AI Tensor Blocks newly integrated into Altera’s latest FPGA device families for deep learning inference. These innovative FPGA components bring real-time, low-latency, and energy-efficient processing to the forefront, supported by the inherent advantages of Intel FPGAs, including I/O flexibility, dynamic reconfiguration, and long-term support. We delve into the Altera FPGA AI Suite, demonstrating its flexibility in achieving scalable performance and seamless integration with industry-leading frameworks like TensorFlow and PyTorch, facilitated by Quartus Prime Software. Moreover, we highlight the game-changing role of AI Tensor Blocks in enhancing deep learning inference performance.

Level: Beginner/Intermediate/Expert

Bridging the Gap: Integrating Software Lifecycles with FPGA Development | Tools & Methodologies
Jan Aniol
plc2 Design GmbH

Description:
Topic: Collaboration Between Software and FPGA Engineers

Bridging the Divide Between Software and Hardware Teams:

Tools and methodologies that foster better collaboration.
How software architects and FPGA engineers can co-develop through shared frameworks or Agile practices.

Integrating Software and FPGA Development Lifecycles:

Adapting DevOps principles (e.g., CI/CD, containerization) to FPGA development environments.

Level: Beginner

GateMate FPGA: Qualification for Radiation-Tolerant Applications | Board Design & Connectivity
Dr. Michael Gude
Cologne Chip AG

Description:
Radiation effects are a critical concern for electronic systems operating in space, high-energy physics experiments, and other radiation-prone environments. Ensuring FPGA reliability in such conditions requires rigorous qualification methodologies to assess susceptibility to radiation-induced failures, including Single Event Latchup or Upset (SEL, SEU) and Total Ionizing Dose (TID) effects.
This talk presents the results of a comprehensive radiation qualification campaign conducted at CERN and a heavy-ion testing campaign at UCL to evaluate the radiation tolerance of GateMate FPGA devices. The session will provide an overview of the test methodologies used, including register tests, block RAM stability tests, and configuration memory robustness evaluations. These tests provide valuable insights into the FPGA’s ability to withstand radiation exposure while maintaining functional integrity.
In addition to qualification results, the session will discuss practical considerations for designing radiation-tolerant systems using GateMate FPGAs. Real-world example designs will be presented to demonstrate mitigation strategies, including error detection and correction techniques, redundancy approaches, and system-level design choices that enhance resilience.

Level: Intermediate

Agentic AI Design on AMD Ryzen AI | Lecture
Jens Stapelfeldt & Mario Ruiz
AMD

Description:

Agenda:

What is Agentic AI? - Autonomy, tool use, goals, planning, and memory
Ryzen AI Overview - NPU architecture, performance edge, energy efficiency
Intro to AMD Lemonade Server - What it is, why it matters (low-latency local inference); Open-source deployment flexibility
What is RAG? - RAG components: retriever, vector store, generator; How agents use RAG for context-aware memory
Demo / Tutorial and walk-through: Local RAG agent with AMD-optimized model using Lemonade Server

Building trustworthy and capable agentic AI systems requires a foundation of open innovation, scalable compute, and transparent design principles. In this talk, we explore how open-source initiatives, with AMD as a key driver, are accelerating the development of autonomous AI agents. We will highlight how AMD’s latest Ryzen™ AI processors, with integrated NPU, provide optimized on-device compute for real-time inferencing and agentic workloads. Additionally, we will dive into Lemonade Server, AMD’s open-source framework designed to simplify and democratize access to AI model deployment and orchestration. Through real-world use cases, we will demonstrate how open software and collaborative ecosystems are critical for advancing safe, efficient, and adaptable agentic AI architectures. Open source is no longer optional; it is essential for the future of agent autonomy.
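
The three RAG components named in the agenda (retriever, vector store, generator) can be illustrated with a minimal sketch. It is not taken from the session and does not call Lemonade Server or any AMD API. The bag-of-words `embed` function stands in for a real embedding model, and `generate` only assembles the prompt that a real agent would send to a local model. All names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Holds (embedding, document) pairs in memory."""

    def __init__(self):
        self.items = []

    def add(self, doc):
        self.items.append((embed(doc), doc))

    def retrieve(self, query, k=1):
        """Retriever: return the k documents most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


def generate(query, context):
    """Generator placeholder: a real agent would send this prompt to a local LLM."""
    return f"Context: {' '.join(context)}\nQuestion: {query}"


store = VectorStore()
store.add("The NPU handles low-latency local inference.")
store.add("Coffee break starts at half past ten.")

question = "what handles local inference"
answer = generate(question, store.retrieve(question))
print(answer)
```

In this sketch the retrieved passage plays the role of context-aware memory: the agent looks up relevant text first and hands it to the generator together with the question.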

Level: Beginner/Intermediate
Duration: 90 mins

9:40 - 9:50 am

Short break and option to change rooms

9:50 - 10:30 am

Leveraging AI Engine in Versal Devices for Optimized DSP Applications | Application
Stanislaw Klinke
EBV Elektronik GmbH & Co. KG

Description:
The growing demands for high-performance, energy-efficient systems have driven the evolution of FPGA architectures. Xilinx’s Versal devices, equipped with the AI Engine (AIE), present a groundbreaking solution for data-intensive tasks such as Digital Signal Processing (DSP). In this presentation, we will explore the architectural details of the AI Engine and how its unique features enable the acceleration of DSP workloads. We will dive into the core principles of the AI Engine’s architecture, discussing its SIMD (Single Instruction Multiple Data) processing elements, dataflow programming model, and high-bandwidth memory access. A key aspect of this talk will focus on the need for kernel optimization when working with the AIE, highlighting the impact of efficient resource utilization and throughput on the overall system performance. Examples will demonstrate the AI Engine’s application in DSP tasks, such as filtering, FFT (Fast Fourier Transform) computation, and real-time signal processing. By the end of this presentation, attendees will gain insights into the powerful capabilities of the AIE in Versal devices, and learn strategies to optimize DSP applications for maximum performance.

Level: Intermediate

System Device Tree: Improved Support for Multicore Systems with AMD Adaptive SoCs | Embedded / Vision
Alexander Flick
plc2 GmbH

Description:
With the new AMD Vitis Unified IDE, the methodology for passing hardware metadata from Vivado to the Vitis Unified IDE has changed. One particular item is the unification of the system address map in the System Device Tree (SDT). This talk introduces the basics of how the SDT works, somewhat behind the scenes, to generate a device tree for any specific processor target. Because such centralized concepts are required to properly support heterogeneous devices like the AMD Adaptive SoCs, the SDT belongs in the toolbox of the embedded designer, right after a first general embedded system design experience.

Level: Intermediate/Expert

Ultra-Fast, Customized CNN AI on Any Altera FPGA | AI / ML
Mustafa Celik
Arrow

Leo Wiegand
ONE WARE

Description:
How can you maximize AI performance on FPGAs? By using ONE AI and the Altera FPGA AI Suite to generate and optimize tailor-made AI models automatically. The result is ultra-fast AI on any Altera FPGA, from MAX 10 to Agilex 5.

Level: Beginner/Intermediate

Let Synthesis Do Its Job But Check the Results | Tools & Methodologies
Harald Flügel
Arrow Central Europe GmbH

Description:
Modern synthesis tools can do a lot. Besides generating Boolean logic from the HDL description, they can also make use of hard macros available on the FPGA silicon. Dual-port RAM, either single-clock or dual-clock, built on embedded block RAM can easily be inferred. However, the generated logic may behave differently from what the designer has modeled.
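
One way such a mismatch can show up is sketched below, assuming the difference lies in read-during-write behavior on a same-address collision. The abstract does not say which mismatch the talk covers, so this is only an illustration. The `DualPortRam` class is a hypothetical cycle-level model in Python, not vendor code.

```python
class DualPortRam:
    """Cycle-level model of a single-clock simple dual-port RAM with a registered read port."""

    def __init__(self, depth, read_first=True):
        self.mem = [0] * depth
        self.read_first = read_first  # True: a colliding read returns the old data
        self.q = 0                    # registered read output

    def clock(self, waddr, wdata, we, raddr):
        """Advance one clock edge and return the read data visible after the edge."""
        old = self.mem[raddr]
        if we:
            self.mem[waddr] = wdata
        # The two modes differ only when read and write hit the same address.
        self.q = old if self.read_first else self.mem[raddr]
        return self.q


old_data = DualPortRam(4, read_first=True)
new_data = DualPortRam(4, read_first=False)
for ram in (old_data, new_data):
    ram.clock(waddr=1, wdata=7, we=True, raddr=0)  # preload address 1 with 7

# Write 9 to address 1 while reading address 1 in the same cycle.
a = old_data.clock(waddr=1, wdata=9, we=True, raddr=1)
b = new_data.clock(waddr=1, wdata=9, we=True, raddr=1)
print(a, b)
```

If the HDL model assumes one mode and the inferred memory implements the other, simulation and hardware disagree only in the collision cycle, which is why checking the synthesis result matters.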

Level: Expert

GateMate FPGA: High-Speed Transceiver (SerDes) Hands-On | Board Design & Connectivity
Patrick Urban
Cologne Chip AG

Description:
High-speed serial communication is a critical component of modern FPGA-based systems, playing a key role in applications such as data centers, telecommunications, industrial automation, and high-performance computing. Serializers/Deserializers (SerDes) enable efficient, high-bandwidth data transfer over minimal pin counts, making them indispensable for connecting FPGAs to external memory, processors, high-speed sensors, and networking interfaces.

This hands-on session will provide an in-depth look at the 5G high-speed transceiver (SerDes) capabilities of the GateMate FPGA, showcasing its features, configuration process, and debugging techniques. Attendees will gain practical insights into setting up high-speed links using the SerDes Wizard, a dedicated tool designed to simplify the configuration and optimization of serial communication interfaces. The session will also cover essential debugging methodologies, including eye diagram analysis, which provides a visual representation of signal integrity and helps diagnose issues related to jitter, noise, and signal distortion.

A key highlight of this session will be a live demonstration of a high-speed link between two GateMate FPGA devices, illustrating real-world performance and best practices for achieving stable and efficient data transmission. Attendees will learn how to fine-tune SerDes parameters for optimal performance and ensure robust link reliability in their own FPGA designs. This session is ideal for FPGA developers, hardware engineers, and system designers looking to integrate high-speed serial interfaces into their projects.

Level: Intermediate

Agentic AI Design on AMD Ryzen AI | Lecture
Jens Stapelfeldt & Mario Ruiz
AMD

Level: Beginner/Intermediate
Duration: 90 mins

10:30 - 11:00 am

Coffee break

11:00 - 11:40 am

Low Latency Optimized TCP Stack for High Frequency Trading on FPGA | Application
Yakup Erdem Yıldız & Fatih Küçük
Bull Technologies

Description:
This talk presents the design, implementation, and validation of a custom-developed TCP stack optimized for ultra-low latency, specifically tailored for high-frequency trading (HFT) environments. Our FPGA-based solution, rigorously tested and validated in real-world financial systems, achieves sub-microsecond latencies by selectively disabling conventional TCP features and integrating critical trading logic directly on the FPGA. The work also details compliance testing using IXIA IxANVL and deployment at Borsa Istanbul colocation utilizing Nasdaq infrastructure. We hope to contribute to discussions on FPGA applications in latency-critical domains and look forward to discussing our findings with the conference community.

Level: Beginner

Implementing and Profiling Collaborative CPU-FPGA Projects with Real-Time Requirements | Embedded / Vision
Alexander Wirthmueller
MPSI Technologies GmbH

Description:
A wide range of FPGA-SoC projects uses the device's CPU complex merely for configuration and supervision of an underlying high-throughput FPGA algorithm. This contribution, on the other hand, focuses on FPGA-SoC projects where the time-constrained collaboration of CPU- and FPGA-based algorithm features is key. When partitioning an FPGA-SoC accordingly, one design choice to make is the CPU-FPGA interaction mechanism, which can be explicit information transfer across AXI, shared DDR memory sections, or both. Furthermore, typical FPGA-SoCs contain multiple CPU cores and types that rely on shared features such as cache and a common interconnect, which can become bottlenecks or sources of unexpected behavior in general. The CPU-side decision between Linux (with or without real-time patch) and bare metal is a tradeoff between convenience and determinism, although for many applications Linux offers sufficient real-time performance. Finally, the unique ability of FPGAs to perform clock-accurate probing of CPU-FPGA interaction signals can be used for profiling. These aspects will be demonstrated on mid-range platforms from at least three vendors, using the example of an FPGA image feature detection algorithm running on a live video stream. In this example, FPGA-based processing is complemented by multi-frame CPU-based high-level analysis. Results of this analysis are fed back into the delayed video stream while, at the same time, image meta-information is output via Ethernet and OPC UA, a standard protocol used in industrial automation.

Level: Intermediate

How to Use the Tensor Mode of the DSP Blocks in ALTERA Agilex 5 FPGAs | AI / ML
Armin Faems
Arrow Central Europe GmbH

Description:
The Altera Agilex 5 FPGAs and SoCs are the first midrange or edge-centric FPGAs with an AI tensor block, making them an ideal choice for edge AI applications. Learn the exact differences between fixed, floating, and Tensor mode, and how to determine when to use one mode over another. The talk compares the results of a design in fixed and floating Tensor mode, based on the Altera Agilex 5 FPGA.

Level: Beginner

Project-Based and Non-Project-Based Scripting in Vivado | Tools & Methodologies
Ernst Wehlage
plc2 GmbH

Description:
For AMD Adaptive SoCs and FPGAs, the hardware design flow based on RTL languages requires the AMD Vivado tool suite. A very important differentiation in the design tool flow is the distinction between project-based and non-project-based build flow and project management.

For team-based designs, the tools offer great flexibility, allowing each user to do individual unit development based on either of these two flows. You will learn why the project-based flow can provide more value to designers in the development phase, while the non-project-based flow might be better for a shorter runtime. We will also talk about scripting.

Learn the pros and cons of the two project management flows in the Vivado Tool Suite.

Level: Intermediate

How to Transfer the Shortest Packets at Sustained 400 Gbps over PCIe | Board Design & Connectivity
Lukáš Kekely
DynaNIC GmbH

Description:
With the increasing demands for high-performance networking, FPGA-based network cards have proven to be fast and flexible solutions in dynamic environments. Today's cards can sustain packet processing performance of 400 Gbps or even higher.
Many applications require such throughput to be sustained between the host PC and the card as well. However, an efficiency problem arises when handling smaller packets, such as Ethernet frames with sizes of 64–512 bytes. The achievable throughput for such packets is usually limited by the overhead in the host computer's transfer layer, resulting in significant performance losses of up to several tens of percent for the smallest packets.
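
To give a feel for why small packets are the hard case, the following back-of-the-envelope sketch computes the maximum Ethernet frame rate at 400 Gbps. It only uses the standard per-frame wire overhead of 20 bytes (preamble, start-of-frame delimiter, inter-frame gap). It says nothing about the PCIe or DMA overheads that the talk itself addresses. The function name is hypothetical.

```python
# Per-frame overhead on the wire:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps, frame_bytes):
    """Maximum Ethernet frame rate at a given line rate and frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


for size in (64, 128, 256, 512):
    mpps = packets_per_second(400e9, size) / 1e6
    print(f"{size:4d}-byte frames: {mpps:7.2f} Mpps")
```

At 64 bytes this comes to roughly 595 million frames per second, so any fixed per-packet cost in the host transfer layer is paid several times more often than for 512-byte frames.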

This presentation will explore the limitations of the transfer layer in PCIe and DMA for DPDK using a common approach. Then it will provide practical insights into addressing these challenges with our specific newly proposed solution. Furthermore, we will present our performance and resource/area evaluation results obtained on real hardware platforms from various FPGA and card vendors. We will then discuss the flexibility of the solution, including its adaptability to meet the specific requirements of various network applications and hardware platforms. Overall, this presentation will provide valuable insights into throughput optimization for developers, researchers, and industry professionals interested in any kind of FPGA-based SmartNIC or high-speed network traffic processing.

Level: Intermediate

Agentic AI Design on AMD Ryzen AI | Hands-on Tutorial
Jens Stapelfeldt & Mario Ruiz
AMD

Description:

Agenda:

What is Agentic AI? - Autonomy, tool use, goals, planning, and memory
Ryzen AI Overview - NPU architecture, performance edge, energy efficiency
Intro to AMD Lemonade Server - What it is, why it matters (low-latency local inference); Open-source deployment flexibility
What is RAG? - RAG components: retriever, vector store, generator; How agents use RAG for context-aware memory
Demo / Tutorial and walk-through: Local RAG agent with AMD-optimized model using Lemonade Server

Building trustworthy and capable agentic AI systems requires a foundation of open innovation, scalable compute, and transparent design principles. In this talk, we explore how open-source initiatives — with AMD as a key driver — are accelerating the development of autonomous AI agents. We will highlight how AMD’s latest Ryzen™ AI processors, with integrated NPU, provide optimized on-device compute for real-time inferencing and agentic workloads.

Additionally, we will dive into Lemonade Server, AMD’s open-source framework designed to simplify and democratize access to AI model deployment and orchestration. Through real-world use cases, we will demonstrate how open software and collaborative ecosystems are critical for advancing safe, efficient, and adaptable agentic AI architectures. Open source is no longer optional; it is essential for the future of agent autonomy.

Level: Beginner/Intermediate
Duration: 90 mins

11:40 - 11:50 am

Short break and option to change rooms

11:50 am - 12:30 pm

In-Vehicle Network - Automotive Zonebased Architecture with Time Sensitive Network - AutoTSN | Application
Maximilian Sokol & Andreas Braun
Missing Link Electronics GmbH

Description:
Automotive architectures are transforming: while more and more sensors become integrated in vehicles, the automotive industry is looking for ways to reduce wiring effort in production and to achieve more scalability, a higher level of integration, and faster development.
Auto/TSN stands for automotive data over Time-Sensitive Networks, an in-vehicle network infrastructure based on open standards such as IEEE Ethernet.
Auto/TSN virtualizes the in-vehicle network infrastructure. The key objective is to reduce costs, increase scalability, and enable upgradability for next-generation automotive architectures, including electric and/or autonomous vehicles.
The presentation will show what a zone-based architecture can look like in comparison to the "classic" wiring. It will explain the tasks of a zone gateway and why FPGAs/SoCs play a major role in sensor fusion. Furthermore, it will explain why it is important to use middleware that turns devices into a service for a central car server and other ECUs. For visualization, we will show examples from the government-funded CeCaS research project and the complete chain from camera sensors over zone gateways to the central car server.

Level: Beginner/Intermediate

Versatile Vision AI Framework Lets You Focus on AI Development | Embedded / Vision
Gordan Galic
Xylon d.o.o.

Description:
The Xylon logicBRICKS Vision AI Framework offers an efficient and streamlined approach to AI inference on AMD® Versal™ adaptive SoCs. Designed to optimize AI processing, this framework integrates hardware-accelerated pre- and post-processing, ensuring high-performance execution with a deterministic pipeline and guaranteed system latency. Attendees will learn how to leverage Xylon's configurable vision AI framework to accelerate AI development without needing to dive deeply into tools and SoC technical details or retrain on new datasets. The framework supports advanced video data pre- and post-processing, including complex single and composite video transformations, making it ideal for vision-based AI applications. Additionally, its compatibility with AMD Versal™ and Zynq™ UltraScale+ SoCs ensures flexibility across various embedded AI deployments. The presentation will feature a live demonstration of an AI Surround View Monitor running on the AMD Versal VEK280 Evaluation Kit, highlighting the real-world capabilities and benefits of the Xylon logicBRICKS Vision AI Framework.

Level: Intermediate

AI on the Edge: Insights into NN acceleration | AI / ML
Alexander Flick
plc2 GmbH

Description:
Neural networks typically require massive compute effort, and hardware acceleration, specifically with FPGAs, can be a natural fit. The AMD Adaptive SoC families even support the application level on top of this pure computation. The ecosystem offers different approaches with various tools and implementation solutions. Vitis AI easily spans the various subfamilies of these architectures with a scalable IP to map out pretrained convolutional neural networks. It also provides a generic runtime stack to inline inference into your application. For more specific operators or even customizable physical implementations, there are various accelerator libraries available, one of which is FINN.
In this talk, we will present the options and compare the solution space and concepts when approaching an acceleration task from the Vitis AI approach and from a FINN project.

Level: Intermediate/Expert

Multi-Run Management Using Vivado | Tools & Methodologies
Ernst Wehlage
plc2 GmbH

Description:
The AMD Vivado Tool Suite manages the synthesis and implementation build processes through its run management. These processes typically consist of many individual runs that are highly customizable through the use of compiler options, strategies, and constraint files. Good multi-run management means that the designer should not simply overwrite and lose the intermediate builds, but should retain settings, builds, and reports from well-established runs during the incremental development phase.

This session describes and discusses how to make incremental design changes, including constraints and compiler options, easier in the Vivado Tool Suite. It also shows how a better methodology for managing builds and their options leads to higher productivity.

You can improve the incremental steps of hardware development through multi-run management, maintain good intermediate results, and move on to the next step of a design change.

Level: Expert

Doubling RFSoC ADC Rate from 5 Gsps to 10 Gsps | Board Design & Connectivity
Dr. Harry Commin
Enclustra GmbH

Description:
In this presentation, we show how three ADCs can be digitally combined to produce a single data stream with double the ADC’s maximum sample rate. We demonstrate that this “frequency interleaving” technique requires no special analog circuitry, besides ordinary anti-aliasing filters. We share the challenges faced and practical results from our research project, implemented on AMD RFSoC. Developed in VHDL, verified with VUnit.

Level: Expert

Agentic AI Design on AMD Ryzen AI | Hands-on Tutorial
Jens Stapelfeldt & Mario Ruiz
AMD

Level: Beginner/Intermediate
Duration: 90 mins

12:30 - 1:30 pm

Lunch break

1:30 - 2:10 pm

Porting and Optimizing CVA6 to Altera FPGAs | Application
Angela Gonzalez
PlanV GmbH

Description:
CVA6 is a popular open-source RISC-V CPU from the OpenHW Group. It provides different capabilities such as single or dual issue, various RISC-V extensions, and Linux support. Initially, CVA6 was designed for ASIC targets. However, interest in an optimized version for FPGA deployment emerged among the community members. The OpenHW CVA6 repository initially provided a reference design to implement CVA6 on an FPGA with a set of minimal peripherals. This design was prepared for the Digilent Genesys II board, featuring a Kintex 7 FPGA from Xilinx. Previous work done by Thales optimized CVA6 for FPGAs targeting this Xilinx technology. In this talk, we would like to present the work done in collaboration with Thales to port the existing FPGA reference design of CVA6 to Altera technology, in particular the Agilex 7 platform. Additionally, we migrated the existing FPGA optimizations. Previous work had identified some technology-agnostic optimizations and some technology-specific ones. The technology-specific ones can be used in Xilinx or Microchip FPGAs, but they can’t be reused in Altera technology. This is because most of the optimizations move flip-flops (FFs) to RAM blocks, and the memory primitives in Altera FPGAs differ from those of the other vendors. Most importantly, Altera FPGAs do not provide asynchronous RAM primitives, which were extensively used in the above optimizations. In this context, we adapted the code of CVA6 to guide Quartus towards inferring synchronous RAM memories instead of flip-flops, in order to benefit from the optimizations in Altera technology too.
We would like to present the journey of this adaptation, sharing the key differences between technologies and the main insights identified. The results achieved show a similar reduction of resources in Xilinx and Altera FPGAs (-30% of FFs), showing that the optimizations were successfully migrated. Both the design for Agilex 7 and the Altera optimizations have been contributed to the OpenHW repository and are currently available to the community. The next step will be to add Linux support.
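
The difference between asynchronous and synchronous RAM reads that drives this adaptation can be sketched with a small cycle-level model. This is an illustrative Python model with hypothetical function names, not code from the CVA6 repository. It only shows that a synchronous read delivers data one cycle after the address is presented, which is why the surrounding logic has to be adapted.

```python
def async_read_trace(mem, addrs):
    """Asynchronous read: data follows the address within the same cycle."""
    return [mem[a] for a in addrs]


def sync_read_trace(mem, addrs):
    """Synchronous read: the output register updates on the clock edge,
    so data for an address is visible one cycle after it is presented."""
    trace = []
    q = None  # output register, undefined before the first clock edge
    for a in addrs:
        trace.append(q)  # value visible during this cycle
        q = mem[a]       # clock edge captures data for the presented address
    return trace


memory = [10, 20, 30, 40]
addresses = [0, 2, 1]
print(async_read_trace(memory, addresses))
print(sync_read_trace(memory, addresses))
```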

Level: Intermediate

FPGA-Based Highly Configurable and Low-Latency Multi GMSL Camera 10GbE RTP Streaming: Performance and Design Choices | Embedded / Vision
Alin-Tudor Sferle
Analog Devices

Ulrich Langenbach
Missing Link Electronics GmbH

Description:
In the decade of high-performance networking and computing, FPGAs have emerged as a promising and highly convenient solution, offering flexibility, reprogrammability, and parallelism. High-performance solutions that offer high throughput in network-related operations are extremely beneficial in real-time processing tasks executed on embedded systems, such as real-time video streaming. This presentation showcases the capabilities and measured performance of our FPGA-based high-speed multi GMSL camera to RTP streaming solution using a single 10GbE link. We will walk through the multi GMSL Ser/Des integration and give an overview of the FPGA-powered components: the CSI-2 to RTP stream translation and the multi-GMSL camera synchronization, accompanied by the highly configurable and low-latency UDP/IP network accelerator. The described high-performance data path is integrated with the on-chip CPU subsystem to provide time synchronization via PTPv2 and enable control and monitoring of the device via the network. At the end of the presentation, we emphasize the most important design choices for building such a multi-camera streaming system. Finally, we draw conclusions and share the lessons learned from this successful experience.

Level: Intermediate

Hyperparameter Tuning of FPGA-Accelerated Convolutional Neural Network Inference | AI / ML
Tobias Genzinger
Ingenics Digital GmbH

Description:
This lecture presents BOFIT (Bayesian Optimization for FPGA-accelerated Inference Tuning), a procedure to find optimal hyperparameter configurations for deep learning inference execution on FPGAs. BOFIT optimizes the hyperparameters of a convolutional neural network architecture and the hardware configuration of the FPGA simultaneously to obtain high-quality trade-offs between competing design goals such as high accuracy and low energy consumption. Using neural networks in embedded systems is challenging due to application constraints such as energy limitations, availability of computing resources, and low latency requirements. The overall performance and efficiency of the inference run depend on the application, the model architecture, and the hardware used. FPGAs enable customized hardware development to optimize the execution of the inference in embedded systems. Frameworks for inference execution on FPGAs, like Vitis AI, enable running neural networks on configurable hardware in an automated way. Using these frameworks and optimization algorithms, well-performing configurations of the model and the hardware can be found to meet the design goals without needing to simulate the whole system accurately. A suitable optimization algorithm for this problem is Bayesian optimization. The lecture will give an overview of how to find optimized hyperparameter configurations by utilizing frameworks for inference execution on FPGAs and Bayesian optimization.
BOFIT is evaluated using the example of Vitis AI. In the experiments, hyperparameter search using Bayesian optimization shows a better trade-off quality between the design goals compared to random search. Scenarios are defined to emulate application requirements; they show that BOFIT provides higher accuracy than random search when the inference speed or energy consumption is severely constrained. It is also shown that BOFIT can achieve energy efficiency comparable to that of a graphics processing unit.
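
As a rough illustration of what "trade-off quality" and a "severely constrained" scenario can mean, the following minimal Python sketch compares candidate configurations by accuracy and energy. It is not BOFIT and performs no Bayesian optimization; the function names and all numbers are hypothetical and only show how the results of any search (random or Bayesian) might be compared.

```python
from typing import List, Optional, Tuple

# Hypothetical candidates: (accuracy, energy per inference in mJ).
# The numbers are invented for illustration, not measured results.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as accurate and as frugal as b, and not identical."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates, sorted by energy."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c[1])


def best_under_energy_budget(cands: List[Candidate], budget_mj: float) -> Optional[Candidate]:
    """Most accurate candidate whose energy stays within the budget, if any."""
    feasible = [c for c in cands if c[1] <= budget_mj]
    return max(feasible, key=lambda c: c[0]) if feasible else None


candidates = [(0.91, 12.0), (0.88, 7.5), (0.85, 9.0), (0.93, 20.0), (0.80, 5.0)]
front = pareto_front(candidates)
constrained_pick = best_under_energy_budget(candidates, budget_mj=8.0)
```

Under these assumptions, a search method yields a better trade-off when its Pareto front lies closer to the high-accuracy, low-energy corner, and a constrained scenario simply asks for the best candidate inside a budget.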

Level: Intermediate

Vitis HLS: From Scratch up to New Features | Tools & Methodologies
Ernst Wehlage
plc2 GmbH

Description:
The AMD High-Level Synthesis tool is now a tightly integrated part of the AMD Unified Vitis tool suite. Both hardware and software designers can program hardware components in AMD technologies using C/C++ and get well-optimized code without the need for VHDL or Verilog programming. Legacy C/C++ code can be used at a higher level of abstraction, which requires tool directives to optimize a build for higher efficiency or better performance. Specific APIs based on HLS data types or C++ templates can be used to achieve high levels of optimization right away, without the user having to optimize such code.

This seminar describes the use of the tool from scratch with coding examples, describes the build flow, and shows ways of validation and the need to integrate the created modules into Vitis or Vivado projects.

Level: Beginner/Intermediate

Developing Ethernet Solutions with FPGAs | Board Design & Connectivity
Gianluca Mariani
Lattice Semiconductor

Description:
Developing Ethernet solutions with Lattice FPGAs involves leveraging their low power, high performance, and flexibility. Lattice provides a comprehensive suite of tools and IP cores for Ethernet implementation, including support for various Ethernet standards. These resources enable rapid prototyping and deployment of Ethernet interfaces in applications such as industrial automation, networking, and consumer electronics.

This presentation provides an overview of Ethernet technology and demonstrates the building blocks from Lattice that can be used to develop an Ethernet solution for your application. Examples of use cases, implementations, and hardware platforms are given for various standards.

Level: Intermediate

2:10 - 2:20 pm

Short break and option to change rooms

2:20 - 3:00 pm

How to Drive Parallel High-Speed Circuits from an AMD FPGA | Application
David Kirchner
World of FPGA

Description:
This presentation covers the topic of driving high-speed circuits. These circuits can include analog-to-digital converters, displays, or other circuits with a parallel bus interface. Often, the FPGA designer does not have the flexibility to choose the I/O standard, necessitating the design of an appropriate connection between the FPGA and the circuit. Another focus is on the internal structure of the FPGA. For high-speed operations, serial processing of parallel data is not feasible; instead, parallel processing is required. Additionally, it is necessary to reorder this data before sending it to the high-speed circuit. Clocking networks and resources are crucial for achieving proper resolution.
Following a brief introduction, this presentation will methodically explore these steps. It will begin from the exterior of an AMD FPGA and proceed inward through the I/O pins and resources to the processing component. The presentation will examine clocking networks and resources in detail. It will also discuss the advantages and disadvantages of various structures. Data reordering will be illustrated using an existing, anonymized project to provide insight into necessary changes to the processing structure.

Level: Intermediate

Seamless Integration of Image Processing and ML Acceleration: AMD UltraScale+ MPSoC and Hailo in Action | Embedded / Vision
Stanislaw Klinke
EBV Elektronik GmbH & Co. KG

Description:
The increasing need for real-time, high-throughput image processing and machine learning (ML) acceleration has driven the integration of specialized hardware in embedded systems. In this presentation, we will demonstrate how combining the power of AMD UltraScale+ MPSoC with the Hailo ML Accelerator creates an optimized solution for image pre- and post-processing along with ML model acceleration. We will delve into the synergy between the versatile MPSoC, which efficiently handles image data flow, and the Hailo ML accelerator, which is designed to accelerate the execution of deep learning models, achieving significant performance gains. Attendees will gain insights into the integration process, from image data acquisition to efficient pre- and post-processing, as well as how the Hailo accelerator is leveraged to offload the ML workloads. A live demo will showcase the step-by-step implementation of this solution, including key considerations for hardware and software integration, as well as the performance improvements achieved. This session is ideal for developers and engineers looking to enhance their embedded systems with powerful image processing and ML acceleration.

Level: Intermediate

SpikeSteg: Interconnecting Steganography with FPGA Technology Using Spiking Neural Networks | AI / ML
Dr. Pedro Machado
Nottingham Trent University

Description:
Over the years, the advancement of cognitive computing has sparked significant interest in spiking neural networks (SNNs) due to their biologically inspired design and computational efficiency. These networks offer a novel paradigm for addressing challenges in various fields. In parallel, the rise of digital technologies has placed immense importance on data security and data compression, necessitating advancements in cryptographic and steganographic techniques to mitigate potential threats.
Despite the vast body of research in both SNNs and steganography, few if any studies to date have investigated the integration of these two areas. This presentation introduces a pioneering algorithm, SpikeSteg, which leverages the power of SNNs for steganographic applications. By embedding discrete spikes within an image, the SpikeSteg algorithm generates a stego-image that is robust against conventional steganalysis techniques.
The implementation of SpikeSteg on the Sundance VCS3 platform, featuring the AMD Zynq UltraScale+ FPGA, demonstrates its potential for resource-constrained environments, including wearable medical devices. This integration ensures computational efficiency and robustness, making it an attractive solution for scenarios requiring secure and efficient data embedding. Key findings from the study highlight that SpikeSteg outperforms or matches traditional steganography techniques across metrics such as image capacity, robustness, and computational efficiency. These results underscore the promise of this approach for future applications and research directions.
The presentation will cover the architecture of SpikeSteg, its FPGA implementation on the Sundance VCS3, and its implications for cognitive computing, secure data embedding, and resource-efficient systems.

Level: Intermediate/Expert

Direct MATLAB® to AMD Vitis HLS Workflow for High-Performance Results | Tools & Methodologies
Derek Hagen
AMD

Description:
MathWorks and AMD have collaborated to introduce an innovative high-level synthesis (HLS) workflow that revolutionizes the transformation of high-level algorithms in MathWorks MATLAB® code into highly optimized synthesizable C++ code, tailored for Vitis™ HLS to target AMD FPGAs and adaptive SoCs. This conference paper delves into the workflow, highlighting the use of HDL Coder to convert MATLAB code into Vitis™ HLS-ready C++ code. The generated synthesizable C++ code becomes the primary input to the AMD Vitis HLS tool to generate optimized RTL. This robust workflow empowers system engineers with early and accurate insights into performance and area metrics, streamlining the journey from MATLAB code to high-performance hardware.

Introduction: The rapid evolution of AMD FPGA and AMD SoC devices like MPSoCs, RFSoCs, and Versal™ adaptive SoCs demands an efficient design methodology that can keep pace with the increasing complexity of these systems. The collaboration between MathWorks and AMD presents a cutting-edge Vitis™ HLS workflow that bridges the gap between high-level floating-point designs and RTL optimized for AMD boards and devices. By leveraging the strengths of MATLAB® floating-point to fixed-point workflows, the generated C++ code is made suitable for the AMD Vitis™ HLS tool. This workflow not only accelerates RTL development but also improves time to market, quality, and efficiency of the resulting hardware implementation.

Level: Intermediate

48V Power Solutions for Modern FPGAs: Featuring Integrated Power Modules | Board Design & Connectivity
Nicolay Garcia & Tomas Hudson
Monolithic Power Systems (MPS)

Description:
This presentation demonstrates power solutions for FPGAs powered by 48V buses, highlighting the advantages of MPS Integrated Power Modules. By utilizing module solutions for 48V buses, designers can enhance system efficiency, reduce board space, and improve thermal management while cutting distribution losses.

Level: Intermediate

3:00 - 3:30 pm

Coffee break

3:30 - 4:10 pm

OSAT – Outsourced Assembly and Test - Services Around the Semiconductor Production in EU | Application
Thomas Kuhn
HTV Halbleiter-Test & Vertriebs-GmbH

Description:
What’s this buzzword OSAT?
What’s the situation in Europe?
What’s HTV doing there (FPGA and ASIC testing)? OSAT+?
What’s to remember?

Level: Beginner

Implementation of MIPI-CSI2 in Altera Agilex 5 FPGAs | Embedded / Vision
Armin Faems
Arrow Central Europe GmbH

Description:
An explanation of how to implement a MIPI-CSI2 video path from a camera to a frame buffer located in LPDDR4 memory, based on a reference design.

Level: Beginner

Bridging the Gap: Get Any Sensor into Nvidia GPUs | AI / ML
Martin Kellermann & Brian Colgan
Microchip Technology GmbH

Description:
Data-intensive algorithms are an area where FPGAs and GPUs overlap. FPGAs excel in their flexibility in interfacing to a vast range of data sources and in low-latency processing. GPUs peak in software flexibility and raw processing horsepower but are limited to only a few narrowly defined interfaces such as PCIe® or high-end Ethernet. How can you get the best of both worlds, the flexibility of an FPGA coupled with the ease of use and large library support of GPUs? Microchip has partnered with NVIDIA on the Holoscan AI sensor platform to unite the benefits of both the FPGA and the GPU. Learn what this Holoscan platform is, how you can get access, and how you can easily bridge your data into NVIDIA GPUs.

Level: Intermediate

Partial Configuration: Is It Useful or Just an Academic Toy? | Tools & Methodologies
Prof. Dirk Koch
Universität Heidelberg

Description:
Partial reconfiguration (PR) has been around for basically as long as FPGAs have existed. While in the early days PR was used to implement complex systems on small devices, today the main use case is bootstrapping in order to hide the configuration latency of large devices. However, PR is more: it is the basis for implementing operating system services on FPGAs. This will become relevant with increasing device capacity and system complexity. In other words: partial reconfiguration is key for everybody who wants to use FPGAs as more than just an updatable ASIC.

This talk will show use cases for PR, including advanced debugging, resource management, load balancing, and composing accelerator pipelines for problems only known at runtime. The latter will be shown for an SQL database acceleration example where an FPGA is abstracted away in such a way that a user can just fire a query, which is then taken on by a runtime system that stitches partial modules together, initializes them, and orchestrates the execution of the query. For queries large enough to justify the configuration overhead, we can execute faster and/or with fewer resources than comparable static systems that are prone to overprovisioning. Another application domain where PR is particularly useful is security, as we will show.

Of course, using PR requires a robust flow, and we will cover the technology capabilities and limitations we must consider when implementing runtime reconfigurable systems. We will highlight restrictions in vendor flows, show what vendor tools reliably support, and reveal how open-source tools can help to overcome restrictions in vendor tool capabilities.

Level: Intermediate

Holistic Approach for Managing Transceiver Designs in Microchip FPGAs | Board Design & Connectivity
Dr. Aurang Zaib
Microchip Technology GmbH

Description:
Transceivers are critical components of FPGA architecture, essential for establishing high-speed external interfaces. Designing applications with transceivers necessitates careful consideration of factors such as signal integrity, speed and cross-interference, all of which demand substantial development effort and time. Fundamentally, transceivers comprise two primary hard blocks: an analog front end known as the Physical Medium Attachment (PMA) and a digital backend referred to as the Physical Coding Sublayer (PCS).
FPGA vendors provide various methods and tools to select appropriate transceiver modes and settings to control PMA and PCS blocks, tailored to specific application requirements. Additionally, vendors offer tools and techniques such as eye diagrams, loopback testing and test pattern monitoring to enhance design productivity.

This presentation will cover these topics, offering developers a holistic approach to transceiver design, ultimately leading to improved design productivity. The discussion will utilize Microchip FPGAs as a case study.

Level: Intermediate/Expert

4:10 - 4:20 pm

Short break and option to change rooms

4:20 - 5:00 pm

The Golden Cage of FPGA Vendor Lock-In | Application
Mihaly Nemeth-Csoka
Heitec AG

Description:
Vendor lock-in is a major challenge for many development projects. Once dependent on a single vendor, switching is difficult without significant effort. This talk aims to explore the implications of vendor lock-in, its impact on FPGA projects, and potential strategies to mitigate these issues by examining case studies and current industry practices. Vendor lock-in can be the result of proprietary design tools, unique IP cores, and vendor-specific hardware features. The lack of interoperability between different vendors' tools and products amplifies this issue, resulting in increased costs, reduced innovation, and limited flexibility.
The presentation covers vendor lock-in in four steps:

Identify the primary factors that contribute to vendor lock-in in FPGA design. The motivation of vendors will be examined, as well as the evolution of vendor lock-in in the recent past, when tools seem to have become more open.
Analyze the impact of vendor lock-in on project cost, schedule and innovation. Identify which types of projects suffer the most from vendor lock-in and where there is no need for concern.
Compare vendor lock-in in FPGA development to other fields, such as embedded software and cloud services.
Explore strategies and best practices for mitigating vendor lock-in in FPGA projects. Strategies for legacy projects and IP cores will be discussed, as well as how methods such as HLS or AI perform from a vendor lock-in perspective.

This talk provides inspiration for projects of all sizes and stages. Developers are shown how to increase their own market value by making their skillsets less tool and vendor dependent. Managers will have a better understanding of the impact of vendor lock-in through the identification of typical pitfalls.

Level: Beginner

MIPI CSI-2/DSI with Efinix FPGA Families | Embedded / Vision
Maximilian Werner
Efinix GmbH

Description:
In this session we will explain which Efinix FPGAs can be used for a MIPI CSI-2 or MIPI DSI project, and what the respective advantages of the hard and soft MIPI implementations from Efinix are. Furthermore, we will show how you can configure our different MIPI solutions with our FPGA families Trion, Topaz and Titanium using our free software Efinity. Another point in this presentation will be our example design for MIPI CSI-2 and MIPI DSI, which gives you a good starting point for your own MIPI application and shows how to use our MIPI core. There are also user guides and application notes that can help you check your MIPI settings such as data rate, resolution and pixel clock. At the end you will have an overview of the MIPI solutions from Efinix and know which evaluation board you can use for your MIPI application.

Level: Beginner

Accelerating Intelligent Edge Systems with Lattice Semiconductor and NVIDIA | AI / ML
Karl Wachswender
Lattice Semiconductor GmbH

Description:
The core of this solution is the Sensor Bridge Reference Design, implemented on the CertusPro-NX Sensor to Ethernet Bridge Board. This design supports low latency as well as flexible sensor configuration and interfacing, and includes an Ethernet Packetizer. It integrates seamlessly with the NVIDIA Holoscan Sensor Bridge, providing comprehensive and easily programmable system control.

Key Features:

Flexible Sensor Configuration: The design supports a wide range of sensor input interfaces and protocols, ensuring adaptability to various application needs.
Low Latency: Optimized for minimal delay, the solution ensures real-time data processing and transmission.
Ethernet Packetizer: Facilitates efficient data packetization for transmission over Ethernet networks.

Integration with NVIDIA Holoscan

The Sensor Bridge Reference Design pairs effortlessly with NVIDIA Holoscan, leveraging the NVIDIA IGX Orin and Jetson AGX Orin platforms. This integration offers:

Configurable FPGA IP: Ready-to-use IP blocks for rapid deployment.
Full Stack Solution: Comprehensive support for data acquisition and processing.
Programmable System Control: Easily programmable controls for system management.

API and Interface Support

The solution includes a standard API that supports:

Streaming DMA: Efficient data transfer mechanisms.
Control Interfaces: Simplified system control and management.
Transport Abstraction Layer: Flexibility in data transport methods.
ConnectX Smart NIC Acceleration: Enhanced performance with GPUDirect RDMA.
Linux Sockets: Compatibility with Linux-based systems.
Sensor and Driver Integration: Support for a variety of sensors and drivers.

Level: Intermediate

Wireguard FPGA Advanced Co-Simulation Verification Environment | Tools & Methodologies
Simon Southwell
Wyvern Semiconductors

Description:
A presentation on the advanced co-simulation features of the Wireguard FPGA project’s logic simulation verification using open-source VIP. Discussed is the use of the VProc virtual processor, which allows software to be co-simulated with the RTL in a Verilator logic simulation. The use of auto-generated CSR hardware abstraction layer software to compile application code for both native and RISC-V builds is explained, with a common API for both. A co-simulated sparse memory model in C is used to provide a large memory space, with an API for use by co-simulated code and an HDL component to give access to the same memory space from the logic. The use of a GbE UDP/IPv4 GMII model is explained, also based around VProc, for driving the DUT’s Ethernet ports, with the code running on these blocks also having access to the memory model’s API, allowing end-to-end closing of the verification loop. Finally, a PHY MDIO module is also provided, with register accesses targeting the memory model address space using a mem_model HDL block.
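
To make the idea of a sparse memory model concrete, here is a minimal Python sketch. It is not the project's C memory model or its API; the class and method names are hypothetical. It only illustrates the general technique of backing a very large address space with pages that are allocated on first write.

```python
class SparseMemory:
    """Hypothetical sparse memory: 4 KiB pages are created only when written."""

    PAGE_BITS = 12
    PAGE_SIZE = 1 << PAGE_BITS

    def __init__(self) -> None:
        self._pages = {}  # page number -> bytearray of PAGE_SIZE bytes

    def write_byte(self, addr: int, value: int) -> None:
        page_no = addr >> self.PAGE_BITS
        if page_no not in self._pages:
            self._pages[page_no] = bytearray(self.PAGE_SIZE)
        self._pages[page_no][addr & (self.PAGE_SIZE - 1)] = value & 0xFF

    def read_byte(self, addr: int) -> int:
        page = self._pages.get(addr >> self.PAGE_BITS)
        # Unwritten locations read as zero in this sketch.
        return page[addr & (self.PAGE_SIZE - 1)] if page is not None else 0

    def write_word(self, addr: int, value: int) -> None:
        """Store a 32-bit value in little-endian byte order."""
        for i in range(4):
            self.write_byte(addr + i, value >> (8 * i))

    def read_word(self, addr: int) -> int:
        """Load a 32-bit little-endian value."""
        return sum(self.read_byte(addr + i) << (8 * i) for i in range(4))

    def allocated_pages(self) -> int:
        return len(self._pages)


mem = SparseMemory()
# A write high in a 64-bit address space that straddles a page boundary.
mem.write_word(0xFFFF_FFF0_0000_0FFE, 0xDEADBEEF)
readback = mem.read_word(0xFFFF_FFF0_0000_0FFE)
```

In this sketch only the two touched pages consume host memory, which is what lets both software and logic-side accesses share one large address space cheaply.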

Level: Intermediate

Power Inductor for Point of Load Buck Converter Applications to Support FPGAs | Board Design & Connectivity
Michael Freitag
YAGEO Group - KEMET Electronics GmbH

Description:
Latest trends in power conversion, such as 12V to 1V DC at hundreds of amperes, use single-turn power inductors, TLVRs, or coupled inductors. To avoid large core losses, high-saturation ferrite materials are mainly used, but they have some disadvantages, such as physical air gaps (EMI emissions) and hard saturation (no inductance at overload conditions). New soft magnetic materials (Nanomet®) improve performance while keeping core losses at an equal level. The presentation will give an overview of applicable designs and their differences in performance, the benefits of fast load change responses with TLVR designs, and an outlook on what will become applicable.
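
For orientation, the 12V to 1V conversion mentioned above can be put into numbers with the textbook relations for an ideal buck converter in continuous conduction. The Python sketch below is an assumption-laden illustration, not material from the talk: the inductance and switching frequency are hypothetical values, and losses, multiphase operation, and TLVR effects are ignored.

```python
def buck_duty_cycle(v_in: float, v_out: float) -> float:
    """Ideal buck converter duty cycle: D = Vout / Vin."""
    return v_out / v_in


def inductor_ripple_current(v_in: float, v_out: float,
                            inductance_h: float, f_sw_hz: float) -> float:
    """Peak-to-peak inductor ripple: dI = (Vin - Vout) * D / (L * f_sw)."""
    duty = buck_duty_cycle(v_in, v_out)
    return (v_in - v_out) * duty / (inductance_h * f_sw_hz)


# Hypothetical single-phase values chosen only for illustration.
V_IN, V_OUT = 12.0, 1.0
L_H = 100e-9       # 100 nH inductor (assumed)
F_SW_HZ = 500e3    # 500 kHz switching frequency (assumed)

duty = buck_duty_cycle(V_IN, V_OUT)                          # about 0.083
ripple_a = inductor_ripple_current(V_IN, V_OUT, L_H, F_SW_HZ)  # about 18.3 A
```

With these assumed values the switch conducts for only about 8% of each period, and the peak current the inductor must carry without saturating is the load current plus half the ripple.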

Level: Intermediate

* subject to change
