Feedback after some use

#1
by AlecFoster - opened

Hello, I'm quite torn on this model... I like its output a lot, but it definitely still has repetition issues. They're not as bad as with some models, and mostly become a problem when conversations run longer.

So far I've had the most luck against repetition with the LatitudeGames/Harbinger-24B model, but that model has some other issues as well... That's why I'm torn between the two: I like the output from Codex-24B-Small-3.2 a bit more, but the repetition does become a bit problematic since most of my sessions tend to run fairly long, ~16K tokens a lot of the time.

Another issue I've had with Codex-24B-Small-3.2 is misunderstanding of user messages; sometimes it misreads what I'm saying quite drastically, forcing me to add more "metadata" to guide it towards the intended meaning. On the other hand, I've found Codex-24B-Small-3.2 to be more flexible than Harbinger-24B if you suddenly introduce another character into the scenario. At least for me, Harbinger-24B seems to need a lot of extra guiding to consistently add dialogue for multiple characters if they aren't specifically mentioned in the system prompt (this might be more an issue with how I define the character and world, so I won't blame it too much for that).

In summary, I've really enjoyed messing around with Codex-24B-Small-3.2, and I'll keep experimenting with it to see whether I end up liking it more than Harbinger-24B. These two models have made me quite excited about the possibility of larger versions eventually, since I've mostly found the ~20B-30B parameter range to be okay but not quite enough for more complex spatial scenes. I really wish Mistral had released Mistral Medium openly; it would have been fun to see what you could have done with a slightly larger Mistral model.

I really wish I could iterate faster, but my limited compute makes it quite slow to test longer scenarios and a wider variety of them. So far I've tested the following temperatures: 0.5, 0.55, and 0.75.

Thank you again for a fun model, I look forward to seeing what you make next :)

@Gryphe I was wondering if you'd be up for a little chat. I have some questions about the training data you use for your models, specifically about repetition avoidance, and I'd love to talk with you through an appropriate medium.

Don't worry, I'm not asking you to hand over all your training data or your deepest, darkest secrets 😇. I just have some questions you're free to answer or not, but I'd love to at least ask them. Who knows, maybe it will help you with future training; I'm hoping so at least, since I don't have even a slight chance of being able to test anything myself with my near-zero resources...

If not, that's perfectly fine as well, hope you're well πŸ™‚

Yes, with a lot of fine-tuning piled on top of fine-tuning, the models need to be normalized!

So for continued fine-tuning, we do a merge. A linear merge is enough to reground the model, and it's a non-destructive merge. I used to use TIES and even DARE-TIES (when it works), but I found that after those merges the models still needed a final linear merge.

So I usually just do a linear merge if I find problems.
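For anyone who wants to try this, here's a minimal sketch of what a linear (weighted-average) merge boils down to, assuming two same-architecture Hugging Face checkpoints; the paths and the 0.5 weight are placeholders, not an actual recipe:

```python
# Minimal sketch of a linear (weighted-average) merge between two
# same-architecture checkpoints. Paths and ALPHA are placeholders.
import torch
from transformers import AutoModelForCausalLM

TUNED_PATH = "path/to/fine-tuned-model"    # hypothetical: the freshly tuned model
ANCHOR_PATH = "path/to/well-behaved-model" # hypothetical: the "regrounding" model
ALPHA = 0.5                                # weight given to the fine-tuned model

tuned = AutoModelForCausalLM.from_pretrained(TUNED_PATH, torch_dtype=torch.bfloat16)
anchor = AutoModelForCausalLM.from_pretrained(ANCHOR_PATH, torch_dtype=torch.bfloat16)

anchor_state = anchor.state_dict()
merged_state = {}
for name, tensor in tuned.state_dict().items():
    if torch.is_floating_point(tensor):
        # Element-wise weighted average of every shared float parameter.
        merged_state[name] = ALPHA * tensor + (1.0 - ALPHA) * anchor_state[name]
    else:
        merged_state[name] = tensor  # leave non-float buffers untouched

tuned.load_state_dict(merged_state)
tuned.save_pretrained("linear-merged-model")
```

Dedicated tools like mergekit offer the same linear method with more options, but the core operation is just this element-wise average.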

I merge with one of my best-responding models:

My latest merges have had amazing results (despite the models flattening to 32k context windows, where they previously claimed a sketchy 128k). I've found that with Mistral the sweet spot is actually 4k: at 4k context it works perfectly, but past 4k it deteriorates and repeats. Sometimes inside a response it repeats a whole phrase, reiterating before finding its feet and ending the response in a totally different place; in fact I don't think that's an error.

Earlier, chain-of-thought was actually working! When it started to emerge, I found the model talking to itself, i.e. running a few responses in a row, as if it had already predicted your next few questions and played out the full back-and-forth.

Or was it the EOS marker not being implemented?

But a faithful merge fixes this!
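On the EOS question, here's a minimal sketch (my own assumption about the setup, not a confirmed diagnosis) of checking that the tokenizer actually has an EOS token and passing it explicitly at generation time with transformers, so the model stops after one reply instead of playing out extra turns:

```python
# Sketch: verify the EOS token and pass it explicitly to generate().
# The model path is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/merged-model"  # hypothetical path

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

print("EOS token:", tokenizer.eos_token, tokenizer.eos_token_id)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,  # stop explicitly at EOS
    pad_token_id=tokenizer.eos_token_id,  # avoid warnings when no pad token is set
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```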

Also, note: all models can take images!

You only need to frame your training prompt with a section for tools or an image field, and when you pass images to this field you convert the image into base64 first!

Then it will accept it! It can also generate an image as base64 output.
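As an illustration of that prompt framing, here's a minimal sketch; the `user:`/`image:` field names and the file path are my own placeholders, not a fixed convention:

```python
# Sketch of framing a prompt with an optional base64 image field.
import base64
from pathlib import Path
from typing import Optional

def image_to_base64(path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")

def build_prompt(user_text: str, image_path: Optional[str] = None) -> str:
    """Frame the prompt with an optional image section, as described above."""
    sections = [f"user: {user_text}"]
    if image_path is not None:
        sections.append(f"image: {image_to_base64(image_path)}")
    return "\n".join(sections)

# "example.jpg" is a placeholder image path.
print(build_prompt("Describe this picture.", "example.jpg"))
```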

I began by training it on image-and-caption pairs (as base64), then trained the same dataset in reverse to generate the base64 from a caption, and then it was working, after 1000-5000 samples. This sample set was epoched until it was overfit, then merged into a model, and I found it was working. So you need to train on as many captions as possible for input and retrieval (it cannot generate images it has not seen; for that you need diffusion!).
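And here is a sketch of building the bidirectional caption <-> base64 pairs described above as a JSONL file; the paths, captions, and record layout are all placeholders for illustration:

```python
# Sketch: write caption <-> base64 training pairs in both directions to JSONL.
import base64
import json
from pathlib import Path

IMAGE_DIR = Path("images")  # hypothetical folder of training images
CAPTIONS = {                # hypothetical filename -> caption mapping
    "cat.jpg": "a cat sitting on a window sill",
}

with open("image_caption_pairs.jsonl", "w") as out:
    for filename, caption in CAPTIONS.items():
        b64 = base64.b64encode((IMAGE_DIR / filename).read_bytes()).decode("ascii")
        # Direction 1: base64 image in, caption out.
        out.write(json.dumps({"input": f"image: {b64}", "output": caption}) + "\n")
        # Direction 2: caption in, base64 image out.
        out.write(json.dumps({"input": f"caption: {caption}", "output": b64}) + "\n")
```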

(Pity this model is 24B!)
I use Mistral models! My models are highly trained (top of the leaderboard for Mistral)!
