
Small-Scale Question Sunday for April 7, 2024

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

Since people keep talking about and recommending them: how do you actually use an LLM? Most everything I search online is paywalled, and the free "AI tools" I've tried weren't very impressive (and ended up either shut down or paywalled anyway).

Could somebody give some ELI5-level guidance and/or recommendations?

Options:

  • Google's mainstay is Gemini (previously Bard), free(ish) for now if you have a Google account. Open it and start writing. Not private.

  • Anthropic pushes Claude. You can try Haiku and Sonnet, the light- and mid-weight models, for free, but Opus was more restricted last I checked. Tends to be one of the stronger fiction writers, for better or worse.

  • ChatGPT's GPT-3.5 is available for free here; GPT-4 is a paid feature at the same site. The paid version is good for imagegen -- I think it's what a lot of Trace's current stuff is using. Flexible, if a bit prudish.

  • Llama is Facebook's big model, free. Llama 2 is also available to download and run directly, though it's a little outdated at this point.

  • LMSys Arena lets you pit models against each other, including a wide variety of the above. Again, not private. Very likely to shutter with little notice.

  • Run a model locally, generally through a toolkit like the OobaBooga web UI. This runs fastest with a decent-ish graphics card, in which case you want to download the .safetensors version, but you can also use a CPU implementation for (slow) generation by downloading GGUF versions of some models (there's a minimal code sketch right after this list). Mixtral 8x7B seems to be the best-recommended here for general purposes if you can manage the hefty 10+GB VRAM minimum, followed by SOLAR for 6GB+ cards and Goliath for 40+GB cards, but there's a lot of variety if you have specific goals. They aren't as good as the big corporate models, but you can get variants that aren't lobotomized, tune them for specific goals, and there's no risk of someone turning them off.
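
If you want to skip the webui entirely, here's roughly what local GGUF inference looks like in code. This is just a sketch, assuming the llama-cpp-python bindings are installed, with a placeholder model filename:

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example.Q4_K_M.gguf",  # placeholder -- use whatever GGUF you downloaded
    n_ctx=4096,       # context window in tokens
    n_gpu_layers=-1,  # offload everything to the GPU if you have the VRAM; 0 = pure CPU
)

out = llm(
    "Q: Explain what a GGUF file is in one sentence. A:",
    max_tokens=128,
    stop=["Q:"],      # stop before the model starts writing the next question itself
)
print(out["choices"][0]["text"])
```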

Most online models have a free or trial version, which will usually be a little dumber, limited to a shorter context (think memory), based on older data, or some combination of the above. Paid models may charge a monthly fee (eg, ChatGPT Plus gives access to DALL-E and GPT-4 for 20 USD / month), or they may charge by tokens (eg, the ChatGPT API has a price per 1 million input and output tokens, varying by model). Tokens are kinda like syllables for the LLM, anywhere from a single letter to a whole word or, rarely, a couple of words; they're how the LLM breaks sentences apart into numbers. See here for more technical details -- token pricing is usually cheaper unless you're a really heavy user, but it can be unintuitive. The snippet below shows how to count tokens yourself.
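
A small sketch of counting tokens to sanity-check API pricing, using OpenAI's tiktoken library; the price rate here is a made-up placeholder, so check the current pricing page:

```python
# Token counting with tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4-era models
text = "Tokens are kinda like syllables for the LLM."
tokens = enc.encode(text)
print(len(tokens), "tokens:", tokens)  # the actual numbers the model sees

price_per_million_input = 0.50  # placeholder USD rate, not a real quote
print(f"~${len(tokens) * price_per_million_input / 1_000_000:.8f} to send this once")
```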

For use:

  • Most models (excluding some local options) assume a conversational format: ask the program questions, and it will try to give (lengthy) answers. They will generally follow your tone to some extent, so if you want a dry technical explanation, use precise and dry technical terms; if you want colloquial English, be more casual. OobaBooga lets you switch models between different 'modes', with Instruct having that Q/A form and Default being more of a blank slate, but most online models can be set or talked into behaving either way. (There's a concrete example after this list.)

  • Be aware that many models, especially earlier models, struggle with numbers, especially numbers with many significant figures. They are all still prone to hallucination, though the extent varies with model.

  • Everything earlier in a conversation, up to the context length of the model, influences later text; creating a new chat breaks from that previous context, which can matter when changing topics.

  • They're really sensitive to how you ask a question, sometimes in unintuitive ways.
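
To make the conversational format concrete, here's what a single exchange looks like over the OpenAI API. A sketch only, assuming the openai Python client, an API key in your environment, and an example model name:

```python
# One conversational turn via the openai client (pip install openai).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model name
    messages=[
        # The system message sets the tone; swap in "Answer casually" and the
        # same question comes back in a very different register.
        {"role": "system", "content": "Answer in dry, precise technical prose."},
        {"role": "user", "content": "Why do language models struggle with arithmetic?"},
    ],
)
print(resp.choices[0].message.content)
```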

Thanks! Maybe you won't mind answering two questions. About local models: can they be tweaked so they don't forget context so easily, maybe by running training passes on previous conversations? And how does ChatGPT retain context? I understand it does multiple processing passes for each prompt and lossily compresses previous chat history. How do I simulate that in the API?

You can finetune models on your personal data or information, but that only does so much. If you're more technically inclined, you can try setting up retrieval-augmented generation (RAG), where the model queries an existing database and tries to answer based on the knowledge there, not just what it came baked in with.

Don't ask me how that can be done, but I know it's a thing. My PC isn't good enough to fuck around with the local models worth using, courtesy of Nvidia and their stingy amounts of VRAM.
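
The rough shape of it is simple enough to sketch, though: embed your documents, find the one most similar to the question, and stuff it into the prompt. This assumes the sentence-transformers library for embeddings, and ask_llm is a hypothetical stand-in for whichever model you actually run:

```python
# Bare-bones RAG sketch: retrieve the most relevant document, prepend it to the prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "GGUF files allow (slow) CPU inference for local models.",
    "Safetensors files are the usual format for GPU inference.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question: str) -> str:
    # On normalized vectors, cosine similarity is just a dot product.
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    return docs[int(np.argmax(doc_vecs @ q_vec))]

question = "Which format do I want for CPU-only inference?"
prompt = f"Using only this context:\n{retrieve(question)}\n\nAnswer: {question}"
# ask_llm(prompt)  # hypothetical call into whichever model you run
```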

How does ChatGPT retain context?

I presume you're not talking about the nitty-gritty algorithmic details (which would be the self-attention mechanism, IIRC) and instead mean how it continues a conversation or remembers details about a user?

Well, the official implementation has a "memory" feature where it gets to remember tidbits about your preferences as a user, as well as some relevant personal details like location.

The way it works is that the entire conversation is fed back to the model, with special markers that tell it when it or the user was speaking, and it resumes where the user left off. The API itself is stateless: it treats each request as a fresh prompt and only sees what you send it. That's why the many frontends that take your API key handle copying your entire conversation back each turn, which is tantamount to the model "remembering" it.
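
In other words, the "memory" lives client-side. Here's a sketch of the loop those frontends run, again assuming the openai Python client and an example model name:

```python
# Conversation "memory" over a stateless API: resend the whole transcript each turn.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

for user_text in ["My name is Alice.", "What's my name?"]:
    history.append({"role": "user", "content": user_text})
    resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=history)
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})  # this *is* the memory
    print(reply)  # the second answer knows "Alice" only because turn one was resent
```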