• 5 Posts
  • 51 Comments
Joined 1 year ago
Cake day: June 12th, 2023

  • inspxtr@lemmy.world to Selfhosted@lemmy.world · 2024 Self-Host User Survey Results
    14 days ago

    Wonder how the survey was sent out and whether that affected sampling.

    Regardless, with ~3-4k responses, that's disappointing, if not concerning.

    I only have a personal sense of Lemmy's demographics. Do you have a source on Lemmy's gender diversity?

    Anyway, what do you think are the underlying issues? And what suggestions would you give the community to address them?





  • inspxtr@lemmy.world to Selfhosted@lemmy.world · Forgejo v1.21 is available
    11 months ago

    Yeah, I guess the formatting and the verbosity seem a bit annoying? I wonder what alternative solutions could better engage people from Mastodon, which is what this bot is trying to address.

    edit: just to be clear, I'm not affiliated with the bot or its creator. This is just my observation from multiple posts I've seen this bot comment on.



  • Thanks for the suggestions! I'm actually also looking into LlamaIndex for more conceptual comparison, though I haven't gotten to building an app yet.

    Any general suggestions for a locally hosted LLM to use with LlamaIndex, by the way? I'm also running into some issues with hallucination. I'm using Ollama with llama2-13b and the bge-large-en-v1.5 embedding model.

    Anyway, aside from conceptual comparison, I'm also looking for more literal comparison. AFAIK, the choice of embedding model affects how similarity is defined. Most current LLM embedding models are fairly abstract, so the similarity will be conceptual: "I have 3 large dogs" and "There are three canines that I own" will probably score as very similar. Do you know which embedding model I should choose to get a more literal comparison?

    That aside, like you indicated, there are some issues. One of them involves length. I hope to find something that can iteratively build up from similar sentences to find similar paragraphs. I could take a stab at coding it up, but I was wondering if there are similar frameworks out there already that I could model after.
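To make concrete what I mean by "literal" comparison, here is a rough sketch (the function names and the best-match averaging scheme are just my illustration, not an existing framework): a purely lexical token-overlap score, plus a naive way to build paragraph similarity from the most similar sentence pairs.

```python
import re

def token_set(text: str) -> set[str]:
    """Lowercased word tokens; a purely lexical view of the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Literal similarity: overlap of surface tokens, no semantics."""
    ta, tb = token_set(a), token_set(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def paragraph_similarity(para_a: list[str], para_b: list[str]) -> float:
    """Build paragraph similarity from sentences: for each sentence
    in para_a, take its best match in para_b, then average."""
    if not para_a or not para_b:
        return 0.0
    best = [max(jaccard(s, t) for t in para_b) for s in para_a]
    return sum(best) / len(best)

# The two example sentences are conceptually close but lexically far,
# so a literal metric scores them low -- unlike an embedding model.
print(jaccard("I have 3 large dogs", "There are three canines that I own"))
```

On the example pair above this scores roughly 0.09 (they share almost no surface tokens), whereas a semantic embedding model would likely rate them highly similar — which is exactly the distinction I'm after.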








  • Let me see if I get your point. Are you saying most questions on Lemmy ask for opinions, which makes them look like they are being asked to gather training data for AI models?

    If so, I’m not entirely sure I agree. There’s tons of info online about any given topic, which can be very overwhelming. Maybe that causes people to prefer seeking out personal experience and opinions from others on such topics, rather than just cold hard facts.

    It may also depend on which communities the questions you’re sampling come from.





  • Here are some options:

    • crypt.ee: I tried this before. I don’t think it’s self-hostable, but it’s quite usable, with a nice UI. Encryption is available, plus ghost folders if you want them. Multimedia is supported; not sure about storage limits.
    • joplin: you can use Nextcloud (or many other options, like Dropbox) for sync, so storage depends on your cloud solution. E2EE, has plugins, and is simple enough to use.
    • anytype.io and logseq: I’ve seen these mentioned in many places, but I haven’t used either. They seem to have very rich features; not sure about self-hosting though.

  • I think the bias issues will always be there, but they are usually worsened, detected less (or later), and exacerbated when the people working on the original problem do not suffer from those issues themselves. E.g.: if most people working on facial recognition are white and male.

    While I do have my reservations about AI technologies, I think it is a worthwhile effort for the people encountering these issues to work to identify and address them, especially when, as in this case, they lead the effort rather than just consult on it.

    They can lead the effort on collecting new data, or adopt new ways of looking at data, metricizing objectives in a manner more appropriate for the targeted audience. Based on the article, I think they are doing this.


  • inspxtr@lemmy.world to Ask Lemmy@lemmy.world · When is it good to reinvent the wheel?
    1 year ago

    One case is when one is learning, experimenting, and innovating.

    To follow the analogy through a bit: trying to craft the wheel yourself can give insights and a more robust understanding of how the commercial wheels work, especially in tandem with the complex systems the wheels operate in, e.g. engines, motors, and different types of environments and their effects on the wheels. This may better prepare us for when things break, or need to be adapted to environments the commercial ones were not specifically designed for in the first place.

    However, this does not mean the wheels one reinvents can easily replace ones that have been stress-tested by many, especially in more critical situations.

    An example is encryption implementation. Playing around with it is fun, educational, and insightful. If you do research in crypto, by all means, play with the wheels; pull them apart, physically and mathematically.

    But unless you really, really know what you’re doing, cooking up your own implementation in an actual product offered to customers is almost always a promise of future data breaches. And this has happened in the real world many times already.
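As a toy illustration of that point (a deliberately broken scheme of my own, not taken from any real product): a homebrew repeating-key XOR "cipher" looks opaque at a glance, yet a single guessed plaintext fragment hands the attacker the entire key.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """A naive homebrew 'cipher': repeating-key XOR.
    Encryption and decryption are the same operation."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

secret_key = b"hunter2"  # hypothetical key the attacker never sees
ciphertext = xor_cipher(b"amount=100;user=alice", secret_key)

# Known-plaintext attack: if the attacker can guess any aligned
# fragment of the plaintext (here, the predictable field prefix
# "amount="), XORing it against the ciphertext yields the key stream.
guessed_prefix = b"amount="
recovered_key = xor_cipher(ciphertext[:7], guessed_prefix)
print(recovered_key)  # the full repeating key falls out
```

Vetted libraries (e.g. NaCl/libsodium, or a platform’s audited TLS stack) exist precisely because failure modes like this are hard to enumerate on your own.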