A software developer and Linux nerd, living in Germany. I’m usually a chill dude but my online persona doesn’t always reflect my true personality. Take what I say with a grain of salt, I usually try to be nice and give good advice, though.

I’m into Free Software, selfhosting, microcontrollers and electronics, freedom, privacy and the usual stuff. And a few select other random things as well.

  • 5 Posts
  • 1.28K Comments
Joined 4 years ago
Cake day: August 21st, 2021

  • CPU-only. It’s an old Xeon workstation without any GPU, since I mostly do one-off AI tasks at home and I never felt any urge to buy one (yet). Model size would be something between 7B and 32B with that. Context length is something like 8128 tokens. I have a bit less than 30GB of RAM to spare, since I’m doing other stuff on that machine as well.

    And I’m picky with the models. I dislike the condescending tone of ChatGPT and newer open-weight models. I don’t want it to blabber or praise me for my “genius” ideas. It should be creative, have some storywriting abilities, be uncensored and not overly agreeable. The best model I found for that is Mistral-Nemo-Instruct. I currently run a Q4_K_M quant of it. That does about 2.5 t/s on my computer (which isn’t a lot, but somewhat acceptable for what I do). Mistral-Nemo isn’t the latest and greatest any more. But I really prefer its tone of speaking and it performs well on a wide variety of tasks. And I mostly do weird things with it. Let it give me creative advice, be a dungeon master or a late 80s text adventure. Or mimic a radio moderator and feed it into TTS for a radio show. Or write a book chapter or a bad rap song. I’m less concerned with the popular AI use-cases like answering factual questions or writing computer code. So I’d like to switch to a newer, more “intelligent” model. But that proves harder than I imagined.

    (Occasionally I do other stuff as well, but that’s few and far between. Then I’ll rent a datacenter GPU on runpod.io for a few bucks an hour. That’s the main reason why I haven’t bought a GPU of my own yet.)
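As a back-of-envelope check on why 7B–32B is the sweet spot for a ~30GB RAM budget (the bits-per-weight figure and overhead are my own rough assumptions, not from the post — Q4_K_M averages somewhere near 4.85 bits per weight in practice):

```python
def quant_ram_gb(params_billion, bits_per_weight=4.85, overhead_gb=2.0):
    """Rough memory footprint in GB for a quantized model:
    weights at ~4.85 bits each, plus a flat guess for KV cache/runtime."""
    return params_billion * bits_per_weight / 8 + overhead_gb

print(round(quant_ram_gb(12), 1))   # Mistral-Nemo (~12B): about 9.3 GB
print(round(quant_ram_gb(32), 1))   # 32B: about 21.4 GB, still fits in 30GB
print(round(quant_ram_gb(70), 1))   # 70B: about 44.4 GB, does not fit
```

Actual usage varies with context length and runtime, so treat this as a sanity check, not a sizing guide.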


  • hendrik@palaver.p3x.de to Linux@lemmy.world · Why is sleep so hard for laptops?

    Maybe that’s more an issue with modern standby? Or the hardware has some quirks. The last two laptops I had were a Thinkpad and now a Dell Latitude. And they both sleep very well. I close the lid and they’ll drain a few battery percent over the day, I open the lid, the display lights up and I can resume work… Rarely any issues with Linux.


  • I think there are some posts out there (on the internet / Reddit / …) with people building crazy rigs with old 3090s or something. I don’t have any experience with that. If I were to run such a large model, I’d use a quantized version and rent a cloud server for that.

    And I don’t think computers can fit infinitely many GPUs. I don’t know the number, let’s say it’s 4. So you need to buy 5 computers to fit your 18 cards. So add a few thousand dollars. And a fast network/interconnect between them.

    I can’t make any statement about performance. I’d imagine such a scenario might work for MoE models with an appropriate design, and for everything else performance would be abysmal. But that’s only my speculation. We’d need to find people who have done this.

    Edit: Alternatively, buy an Apple Mac Studio with 512GB of unified RAM. They’re fast as well (probably way faster than your idea?) and maybe cheaper. Seems an M3 Ultra Mac Studio with 512GB costs around $10,000. With half that amount of RAM, it’s only $7,100.
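The machine-count arithmetic above is just ceiling division (the 4-cards-per-box figure is the same guess as in the text):

```python
import math

def machines_needed(total_gpus, gpus_per_machine=4):
    """Ceiling division: how many boxes you need to host all the cards."""
    return math.ceil(total_gpus / gpus_per_machine)

print(machines_needed(18))  # 5 machines for 18 cards at 4 per box
```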


  • Well, I wouldn’t call them a “scam”. They’re meant for a different use-case. In a datacenter, you also have to pay for rack space and all the servers which accommodate all the GPUs. And you can either pay for 32 times as many servers with Radeon 9060XT cards, or you buy H200 cards. Sure, you’ll pay 3x as much for the cards themselves. But you’ll save on the number of servers and everything that comes with them: hardware cost, space, electricity, air-con, maintenance… Less interconnect makes everything way faster, too…

    Of course at home different rules apply. And it depends a bit on how many cards you want to run, what kind of workload you have… whether you’re fine with AMD or you need CUDA…
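A sketch of that trade-off in numbers. All prices and sizes here are made-up assumptions purely to show the structure of the argument (roughly $350 per consumer card, $30,000 per H200, $5,000 of per-server overhead), not real quotes:

```python
import math

def total_cost(num_gpus, gpus_per_server, gpu_price, server_overhead):
    """Card cost plus per-server overhead (chassis, rack space, power, cooling)."""
    servers = math.ceil(num_gpus / gpus_per_server)
    return num_gpus * gpu_price + servers * server_overhead

# 32 cheap cards vs. one big datacenter card, same hypothetical capacity:
cheap_cards = total_cost(32, gpus_per_server=4, gpu_price=350, server_overhead=5_000)
one_h200    = total_cost(1, gpus_per_server=4, gpu_price=30_000, server_overhead=5_000)

print(cheap_cards)  # 51200: the cards are cheap, but 8 servers of overhead add up
print(one_h200)     # 35000: ~3x the card cost, a fraction of the overhead
```

With these made-up numbers the single big card wins on total cost; at home, where the server overhead is a PC you already own, the math flips.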






  • From what I gather about current chatbots, they always sound very eloquent. They’re made that way with all the textbooks and Wikipedia articles that went in. But they’re not necessarily made to do therapy. ChatGPT etc. are more general purpose and meant for a wide variety of tasks. And the study talks about current LLMs. So it’d be like a layman with access to lots of medical books, who picks something that sounds very professional. But they wouldn’t do what an expert does, like follow an accepted and tedious procedure, do tests, examine, diagnose and whatever. An AI chatbot (most of the time) gives answers anyway. So it could be a dice roll and then the “solution”. But it’s not clear whether they have any understanding of anything. And what makes me a bit wary is that AI tends to be full of stereotypes and bias, it’s often overly agreeable and it praises me a lot for my math “genius” when I discuss computer programming questions with it. Things like that certainly feel nice if you’re looking for help, have a mental issue or are looking for reaffirmation. But I don’t think those are good “personality traits” for a therapist.






  • Correct. We currently have some sentiment against liberal spaces and DEI programs and so on. And some people think it’s a war against straight white men. But having men’s groups or women’s groups or safe spaces to talk freely about whatever topics isn’t authoritarian. The converse is equally true: you can’t discuss certain topics without the right space for them, and not allowing people to discuss things how they like is authoritarian as well!


  • Interesting study. It also matches my own observations. I’ve tried AI coding to some degree. Some people recommend it. And it definitely can do some nice things, like write boilerplate code fast and do some tech demos and exploring. And it’s kind of nice to bounce ideas off someone. I mean the process of saying things out loud and putting them into words helps me think them through. AI can do that (listen) and it’ll occasionally give back some ideas.

    The downside of coding with it in real life is that I end up refactoring a lot of stuff. And that’s just tedious and annoying work. I’ve had this happen to me several times now. And I think the net time balance is negative for me as well. I think I’m better off coming up with an idea of how to implement something and then just typing it down, rather than skipping that step and then moving stuff around, changing it, and making sure it handles the edge cases correctly and fits into the broader picture. Plus I still end up doing the thinking step, just in a different way.


  • Oh man, I’m a bit late to the party here.

    He really believes the far-right Trump propaganda and doesn’t understand what diversity programs do. It’s not a war between white men and all the other groups of people… It’s just that it has proven difficult to, for example, write a menstrual tracker with a 99.9% male developer base. It’s just super difficult for them to judge how that’s going to be used in real-world scenarios and what some specific challenges and nice features are. That’s why you listen to minority opinions: to deliver a product that caters to all people. And these minority opinions are notoriously difficult to attract. That’s why we run programs for that. They are task forces to address things aside from what’s mainstream and popular. It’ll also benefit straight white men. Literally everyone, because it makes Linux into a product that does more than just whatever is popular as of today. The same thing applies to putting effort into screen readers and whatever else disabled people and other minorities need.

    If he just wants whatever the majority wants, I’d recommend he install Windows. Because that’s where we’re headed with this. That’s the popular choice, at least on the desktop. That’s what you’re supposed to use if you dislike niche products.

    Also his hubris… He says Debian should be free from politics. And in the very next sentence he talks politics himself and wants to shove his Trump anti-DEI politics into Debian… Yeah, sure, dude.



  • Good question. I don’t know when Lemmy got the feature that mods can see all votes, but it looks to me like someone is agitated/frustrated and going through the logs. We had some discussion back then about people doing their thing in their communities, and then some random people who aren’t even subscribed doing drive-by downvotes… Which is a bit frustrating. And AI is one of the many polarizing topics here. People have tried discussing it in peace, but it’s not very easy. Maybe OP got caught in the turmoil of this. Or they pissed off that person and then the next downvote was one too many… I don’t really know. And the person calling out people by name sounds a bit agitated. I’d say someone in that state of mind is likely to react a bit more extremely. And they’re concerned with voting fraud and brigading in general.