Gemini 1.5 Professional Now Obtainable in 180+ Nations; With Native Audio Understanding, System Directions, JSON Mode and Extra

Software Development

Gemini 1.5 Professional Now Obtainable in 180+ Nations; With Native Audio Understanding, System Directions, JSON Mode and Extra

lohitnath.453

April 10, 2024

Gemini 1.5 Professional Now Obtainable in 180+ Nations; With Native Audio Understanding, System Directions, JSON Mode and Extra

[ad_1]

Posted by Jaclyn Konzelmann and Megan Li – Google Labs

Seize an API key in Google AI Studio, and get began with the Gemini API Cookbook

Lower than two months in the past, we made our next-generation Gemini 1.5 Professional mannequin obtainable in Google AI Studio for builders to check out. We’ve been amazed by what the group has been capable of debug, create and be taught utilizing our groundbreaking 1 million context window.

At this time, we’re making Gemini 1.5 Professional obtainable in 180+ nations through the Gemini API in public preview, with a first-ever native audio (speech) understanding functionality and a brand new File API to make it simple to deal with information. We’re additionally launching new options like system directions and JSON mode to offer builders extra management over the mannequin’s output. Lastly, we’re releasing our subsequent era textual content embedding mannequin that outperforms comparable fashions. Go to Google AI Studio to create or entry your API key, and begin constructing.

Unlock new use circumstances with audio and video modalities

We’re increasing the enter modalities for Gemini 1.5 Professional to incorporate audio (speech) understanding in each the Gemini API and Google AI Studio. Moreover, Gemini 1.5 Professional is now capable of cause throughout each picture (frames) and audio (speech) for movies uploaded in Google AI Studio, and we look ahead to including API help for this quickly.

screen grab of a clooege professor using Gemini 1.5 Pro to create a quiz based on their latest lecture video in Google AI Studio

You’ll be able to add a recording of a lecture, like this 117,000+ token lecture from Jeff Dean, and Gemini 1.5 Professional can flip it right into a quiz with a solution key. [Video sped up for demo purposes]

Gemini API Enhancements

At this time, we’re addressing quite a lot of high developer requests:

1. System directions: Information the mannequin’s responses with system directions, now obtainable in Google AI Studio and the Gemini API. Outline roles, codecs, objectives, and guidelines to steer the mannequin’s conduct in your particular use case.

2. JSON mode: Instruct the mannequin to solely output JSON objects. This mode allows structured knowledge extraction from textual content or pictures. You will get began with cURL, and Python SDK help is coming quickly.

3. Enhancements to perform calling: Now you can choose modes to restrict the mannequin’s outputs, enhancing reliability. Select textual content, perform name, or simply the perform itself.

A brand new embedding mannequin with improved efficiency

Beginning as we speak, builders will be capable to entry our subsequent era textual content embedding mannequin through the Gemini API. The brand new mannequin, text-embedding-004, (text-embedding-preview-0409 in Vertex AI), achieves a stronger retrieval efficiency and outperforms current fashions with comparable dimensions, on the MTEB benchmarks.

table showing Gecko: Versativel Text Embeddings Distilled from Large Language Models

‘Textual content-embedding-004’ (aka Gecko) utilizing 256 dims output outperforms all bigger 768 dim output fashions on MTEB benchmarks

These are simply the primary of many enhancements coming to the Gemini API and Google AI Studio within the subsequent few weeks. We’re persevering with to work on making Google AI Studio and the Gemini API the simplest solution to construct with Gemini. Get began as we speak in Google AI Studio with Gemini 1.5 Professional, discover code examples and quickstarts in our new Gemini API Cookbook, and be a part of our group channel on Discord.

[ad_2]

Unlock new use circumstances with audio and video modalities

Gemini API Enhancements

A brand new embedding mannequin with improved efficiency

LEAVE A REPLY Cancel reply