Gemini Pro API - Unable to read context documents

For two months after its release, I struggled to figure out how to get the Gemini Pro API to read the context documents sent in a RAG prompt.  I finally got it to work a week ago.  Then, just a few days ago, it stopped working again.  My format is basically this:

"text": "<documents>context document list in xml format</documents>
System Message: blah blah
User Message: blah blah
"
And Gemini follows the system and user messages, but simply refuses to read, comprehend, or remember what is in the document list.
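For reference, the format above can be sketched in code like this. This is a minimal illustration, assuming the v1beta generateContent REST endpoint; the build_payload helper and the sample document/message strings are my own, not from the API docs:

```python
import json

# v1beta REST endpoint for gemini-pro (illustrative; key passed separately)
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent"

def build_payload(documents_xml: str, system_msg: str, user_msg: str) -> dict:
    # Everything goes into a single user text part: the XML document list
    # first, then the system and user messages, matching the format above.
    text = (
        f"<documents>{documents_xml}</documents>\n"
        f"System Message: {system_msg}\n"
        f"User Message: {user_msg}\n"
    )
    return {"contents": [{"role": "user", "parts": [{"text": text}]}]}

payload = build_payload(
    "<document><source>Exodus 32</source><text>...</text></document>",
    "Answer only from the supplied documents.",
    "Who made the golden calf?",
)
print(json.dumps(payload, indent=2))
# To actually send it (requires the requests package and a valid key):
#   requests.post(f"{API_URL}?key=YOUR_API_KEY", json=payload, timeout=30)
```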

This is true in the API as well as the playground: 

https://makersuite.google.com/app/prompts?state=%7B%22ids%22%3A%5B%221CAkGmB7lvuRYqjFGr5tkVmFh0fUQXZ...

I have posted this on the Discord channel, but one other person and I appear to be the only people using Gemini Pro for this sort of text-intensive application.  No one ever has an answer, or even a suggestion, if we get a response at all.

When Pro was working, it was far superior to gpt-3.5-turbo-16k.  Now it's useless.  There has been a clear change somewhere.  What's going on?

1 ACCEPTED SOLUTION

I've done even more testing, comparing the same questions and documents in the same format (XML) between Gemini Pro 1.0 and OpenAI gpt-3.5-turbo-16k.  There is no comparison.  Hands down, the gemini-pro API is far less capable of comprehending large amounts of text than gpt-3.5-turbo.  Here are just a couple of examples:

[Attached screenshots: "did god threaten to take action against the idolators" (gemini-pro vs. gpt-3.5-turbo-16k); "what actions did moses take to address the issue of idolatry" (gemini-pro vs. gpt-3.5-turbo-16k)]

So, unless there is some technical issue I am missing, I reckon this is the answer to my question.



I have more information, which may or may not be related.  I have been testing primarily with Biblical quotes.  Not that I'm proselytizing or anything, but that is my application: a Bible/Talmud/Tanakh knowledge base.

Today is the first day that the model has responded to *any* prompt I've given it.  So, I asked a couple of questions that involved Moses, his wife, his son, and foreskin and circumcision.  This is what I got:

Array
(
    [promptFeedback] => Array
        (
            [blockReason] => OTHER
        )

    [_elapsed_time] => Array
        (
            [https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=[REDACTED]] => 4.431930065155
        )

)

 

Which, believe it or not, is a good thing, because it means that, at the very least, the darned thing is looking at the text and understanding its meaning.  But the censorship is a bit weird given that this is all Old Testament Holy Scripture.
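A blocked response like the one above can at least be detected in code.  Here is a minimal sketch (the block_reason helper name is mine): it checks promptFeedback.blockReason, and shows the per-request safetySettings list for the tunable harm categories.  Note that, as I understand it, a blockReason of OTHER is generally not overridable via safetySettings, but relaxing them rules out ordinary safety filtering:

```python
def block_reason(response: dict):
    """Return the blockReason string if the prompt was blocked, else None."""
    return response.get("promptFeedback", {}).get("blockReason")

# The tunable safety categories can be relaxed per-request by adding this
# list under the "safetySettings" key of the request payload.
SAFETY_SETTINGS = [
    {"category": c, "threshold": "BLOCK_ONLY_HIGH"}
    for c in (
        "HARM_CATEGORY_HARASSMENT",
        "HARM_CATEGORY_HATE_SPEECH",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "HARM_CATEGORY_DANGEROUS_CONTENT",
    )
]

# The response structure above, as a dict:
response = {"promptFeedback": {"blockReason": "OTHER"}}
reason = block_reason(response)
if reason:
    print(f"Prompt blocked: {reason}")  # prints "Prompt blocked: OTHER"
```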
