
Why Fractile is tackling AI's "inference bottleneck"

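UK AI chip startup Fractile has raised a $220 million Series B round to tackle the AI inference bottleneck. Its approach is "in-memory computing": handling computation and memory on the same chip, which reduces data movement and aims to make inference faster and more efficient.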


[Figure: Concept illustration of AI inference hardware integrating memory and compute. Image: openai.com]

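For the past few years, the AI industry has concentrated on a race to train ever-larger models. Now a different problem is becoming more important: inference, the stage where a trained model is actually run in a live service, quickly and cheaply.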

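London-based startup **Fractile** has emerged with the goal of solving exactly this problem. The company recently raised a $220 million Series B round to develop dedicated chips that substantially improve inference performance for large AI models.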

Fractile's core argument is simple: the next limit on AI progress will not be the models themselves, but how quickly and cheaply those models can be run in production.

Why inference is becoming AI's real bottleneck

Most of today's AI infrastructure is optimized for model training. GPUs excel at massively parallel computation, which makes them highly efficient for training large language models.
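One way to see why GPUs suit training is arithmetic intensity: the number of floating-point operations performed per byte moved from memory. The sketch below is a simplified model, assuming a single dense layer with 16-bit (2-byte) values, each value moved exactly once, and no caches or attention. It compares a training-style pass over many tokens with a decode step over one token. The function name and the layer size are illustrative assumptions, not figures from Fractile or any specific GPU.

```python
def arithmetic_intensity(tokens: int, d_in: int, d_out: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte for a (tokens x d_in) activation matrix times a (d_in x d_out) weight matrix.

    Simplified model: every weight, input and output value crosses the memory bus exactly once.
    """
    flops = 2 * tokens * d_in * d_out  # one multiply and one add per weight per token
    bytes_moved = bytes_per_value * (d_in * d_out + tokens * d_in + tokens * d_out)
    return flops / bytes_moved


# Hypothetical square layer, roughly the scale of a large model's projection matrix.
D = 8192
print(f"training-style pass (4096 tokens): {arithmetic_intensity(4096, D, D):.0f} FLOPs/byte")
print(f"single-token decode step:          {arithmetic_intensity(1, D, D):.2f} FLOPs/byte")
```

Under these assumptions the batched pass performs about 2048 FLOPs per byte moved, while the single-token step performs about 1. Weights loaded once are reused thousands of times in the first case and only once in the second, which is why the same hardware can be compute-bound in training and memory-bound in decoding.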

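Once a model is finished and deployed, however, the picture changes. At this stage the model answers each user query by generating tokens one at a time, and this inference process runs over and over.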
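The token-by-token loop can be sketched as follows. `ToyModel`, `next_token` and `generate` are hypothetical names. The "model" is a toy bigram score table rather than a transformer, and the byte counter rests on a simplifying assumption: every weight is fetched once per generated token, roughly what a dense model does when serving a single request. The point is the shape of the loop. Each step needs the previous step's output, and each step pays the full weight-read cost again.

```python
from typing import List


class ToyModel:
    """Hypothetical stand-in for a language model: a toy bigram score table."""

    def __init__(self, vocab_size: int, bytes_per_weight: int = 2) -> None:
        self.vocab_size = vocab_size
        self.bytes_per_weight = bytes_per_weight
        # weights[i][j] is a made-up score for emitting token j right after token i.
        self.weights = [[(i * 31 + j * 17) % 97 for j in range(vocab_size)]
                        for i in range(vocab_size)]
        self.bytes_read = 0  # running total of weight bytes charged as "read from memory"

    def next_token(self, context: List[int]) -> int:
        # Simplifying assumption: charge a full read of the weight table per step,
        # mimicking a dense model that touches every weight once per generated token.
        self.bytes_read += self.vocab_size * self.vocab_size * self.bytes_per_weight
        row = self.weights[context[-1]]
        return max(range(self.vocab_size), key=lambda j: row[j])  # greedy choice


def generate(model: ToyModel, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Autoregressive decoding: one model step per new token, strictly in sequence."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Step k needs the token produced at step k-1, so steps cannot run in parallel.
        tokens.append(model.next_token(tokens))
    return tokens


model = ToyModel(vocab_size=50)
out = generate(model, prompt=[1, 2, 3], max_new_tokens=10)
print(len(out), model.bytes_read)  # prints 13 50000: 10 steps x 5,000 bytes each
```

Generating ten tokens costs ten full passes over the weights; doubling the answer length doubles the weight traffic.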

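Raw compute is not the only thing that matters here. The bigger problem is often **memory bandwidth and latency**, because for every token it generates, the model must read a large volume of weights and intermediate data. If the hardware cannot move this data fast enough, performance is capped no matter how fast the compute units are.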
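A back-of-the-envelope estimate makes the bandwidth limit concrete. The sketch below assumes a dense model serving one request, with every weight streamed from off-chip memory once per token, and it ignores KV-cache traffic, batching and other overheads. All hardware and model numbers are hypothetical round figures, not the specs of any particular chip, including Fractile's.

```python
def memory_time_s(params: float, bytes_per_param: float, bandwidth_bytes_per_s: float) -> float:
    """Time to stream every weight from memory once (one decode step, one request)."""
    return params * bytes_per_param / bandwidth_bytes_per_s


def compute_time_s(params: float, flops_per_s: float) -> float:
    """Time for the ~2 FLOPs per weight (multiply + add) that one decode step performs."""
    return 2 * params / flops_per_s


# Hypothetical round numbers, not the specs of any particular chip or model.
PARAMS = 70e9        # dense model with 70 billion parameters
BYTES_PER_PARAM = 2  # 16-bit weights
BANDWIDTH = 3e12     # off-chip memory bandwidth: 3 TB/s
PEAK_FLOPS = 1e15    # peak compute: 1 PFLOP/s

t_mem = memory_time_s(PARAMS, BYTES_PER_PARAM, BANDWIDTH)
t_compute = compute_time_s(PARAMS, PEAK_FLOPS)
# Assume data movement and arithmetic overlap, so the slower of the two sets the pace.
t_step = max(t_mem, t_compute)

print(f"weight streaming per token: {t_mem * 1e3:.1f} ms")
print(f"arithmetic per token:       {t_compute * 1e3:.2f} ms")
print(f"upper bound, one request:   {1 / t_step:.1f} tokens/s")
```

With these made-up figures, streaming the weights takes about 47 ms per token while the arithmetic needs about 0.14 ms, capping a single request at roughly 21 tokens per second. Faster compute units would not change that bound; reducing how far the weights have to travel, which is what in-memory computing targets, would.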
