• 0 Posts
  • 26 Comments
Joined 1 year ago
Cake day: June 12th, 2023






  • Pumpkin Escobar@lemmy.world to 196@lemmy.blahaj.zone · The Rule
    3 months ago

    There’s quantization, which basically compresses the model by storing each weight in a smaller data type (for example 8-bit or 4-bit instead of 16-bit). That reduces memory requirements by half or even more.
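
    To put rough numbers on that, here is a back-of-the-envelope sketch. It only counts the weights and assumes every weight uses exactly the stated number of bits; real quantized formats carry some extra overhead, and activations and the KV cache are ignored. The function name and the precision table are made up for illustration.

```python
# Rough weight-memory estimate for a model at different precisions.
# Assumption: only weights are counted, each stored at exactly `bits` bits.

BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    params = 405e9  # a 405B-parameter model
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB")
```

    Going from 16-bit to 8-bit halves the figure, and 4-bit halves it again, which is where the "half or even more" comes from.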

    There’s also airllm, which loads one part of the model into memory, runs those calculations, unloads that part, loads the next part, and so on. It’s a nice option, but with all that loading and unloading the performance is never going to be great, especially on a huge model like Llama 405B.
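
    The load/run/unload idea can be sketched with a toy example. This is not airllm's actual API, just a standard-library illustration: each "layer" is a tiny weight matrix saved to its own file, and only one layer is held in memory at a time. All function names and the file layout are hypothetical.

```python
import json
import os
import tempfile


def save_layers(layers, directory):
    """Write each layer's weights to its own file and return the paths."""
    paths = []
    for i, layer in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.json")
        with open(path, "w") as f:
            json.dump(layer, f)
        paths.append(path)
    return paths


def apply_layer(layer, x):
    """Toy layer: relu(W @ x + b) using plain Python lists."""
    out = []
    for row, bias in zip(layer["W"], layer["b"]):
        total = sum(w * v for w, v in zip(row, x)) + bias
        out.append(max(0.0, total))
    return out


def run_layer_by_layer(paths, x):
    """Run the model while holding only one layer in memory at a time."""
    for path in paths:
        with open(path) as f:
            layer = json.load(f)   # load this part from disk
        x = apply_layer(layer, x)  # run its calculations
        del layer                  # drop it before loading the next part
    return x


def demo():
    layers = [
        {"W": [[1.0, 0.0], [0.0, 1.0]], "b": [0.5, -2.0]},
        {"W": [[2.0, 1.0]], "b": [0.0]},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = save_layers(layers, tmp)
        return run_layer_by_layer(paths, [1.0, 1.0])


if __name__ == "__main__":
    print(demo())
```

    Every layer costs a disk read on every pass, which is why the approach gets slow on very large models.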

    Then there are some neat projects to distribute models across multiple computers like exo and petals. They’re more targeted at a p2p-style random collection of computers. I’ve run petals in a small cluster and it works reasonably well.


  • Taking ollama for instance, either the whole model runs in VRAM and compute is done on the GPU, or it runs in system RAM and compute is done on the CPU. Running models on CPU is horribly slow. You won’t want to do it for large models.

    LM Studio and others allow you to run part of the model on the GPU and part on the CPU, which splits the memory requirements but is still pretty slow.
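
    As a rough illustration of how such a split might be sized, here is a small sketch. It assumes all layers are the same size and that a fixed amount of VRAM is held back for everything else; the function name, the `reserve_gb` default, and the example numbers are all hypothetical and not taken from any particular tool.

```python
def split_layers(model_gb: float, num_layers: int, vram_gb: float,
                 reserve_gb: float = 1.0):
    """Return (gpu_layers, cpu_layers), assuming equally sized layers.

    reserve_gb is VRAM kept free for non-weight data (an assumed figure).
    """
    per_layer_gb = model_gb / num_layers
    usable_gb = max(0.0, vram_gb - reserve_gb)
    gpu_layers = min(num_layers, int(usable_gb // per_layer_gb))
    return gpu_layers, num_layers - gpu_layers


if __name__ == "__main__":
    # Example: a 40 GB model with 80 layers on a 24 GB card.
    gpu, cpu = split_layers(40.0, 80, 24.0)
    print(f"{gpu} layers on GPU, {cpu} layers on CPU")
```

    Whatever lands on the CPU side still runs at CPU speed, so the split helps with fitting the model rather than making it fast.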

    Even the smaller 7B parameter models run pretty slowly on CPU, and the huge models are orders of magnitude slower.

    So technically more system RAM will let you run some larger models, but you will quickly figure out you just don’t want to do it.







  • Just a note, the Orange Pi drivers are not in great shape. I have a cluster of Raspberry Pis for development, bought an Orange Pi without first checking out much about them, and it’s rough. Rockchip CPUs are great, and the driver / firmware situation is getting better, but it’s something I’d read up on before buying one.

    I’d still look at the N100. It’s about 2.5x the performance of a Raspberry Pi 5, and being x86 you have more options than on ARM.






  • TPM & secure boot. Look into sbctl for secure boot if you’re not on a distro that uses the signed shim, like Ubuntu. I know some hate secure boot, but storing the unlock key in the TPM is at least much more secure than having the key sitting on a USB drive.

    Tang: network-based unlock. If you have a separate Raspberry Pi or something, you can set it up as a Tang server. You’ll want that thing encrypted too; you can set it up to require a manual unlock, so if someone boosts your servers the Tang server never comes up and the storage server won’t either.

    Or just manually unlock the server with a password every boot?

    That’s roughly my prioritized/preferred list.