Search - "llm"
-
* Today you have to live within 150 miles of a few cities as we are working on creating "hubs" but it's still remote!
you know what?
fuck you
also, no, an LLM isn't going to solve climate change
jesus christ i am depressed beyond belief. i don't even want to apply, let alone work for any of these companies
next up: "USA only" yeah what the fuck does that mean? US citizen? US timezone? you want to hire a super technical engineer right? SO WHY NOT BE SUPER TECHNICAL IN YOUR JOB DESCRIPTION
just incredible, companies that offer 100-200K salaries and all they have is a website and a fucking chrome extension... what???
i feel like i've been doing it wrong my whole life
just end it all
-
Data Disinformation: the Next Big Problem
Code-generating LLMs like ChatGPT are capable of producing SQL snippets. Regardless of quality, those snippets can retrieve data (from prepared datasets) based on user prompts.
That data may, however, be garbage. This will lead to garbage decisions by barely data-literate stakeholders.
Like with network neutrality and PII/PSI ownership, we must act now to avoid yet another calamity.
Imagine a scenario where a middle-manager-level illiterate barks some prompts to the corporate AI and it writes and runs an SQL query against company databases.
The AI outputs some interactive charts that show that the average worker spends 92.4 minutes on lunch daily.
The middle manager gets furious and enacts an Orwellian facial-recognition punch clock policy in the office.
Two months and millions of dollars in contractors later, the middle manager checks the same prompt again... and the average lunch time is now 107.2 minutes!
Finally the middle manager gets a literate person to check the data... and the piece of shit SQL behind the number is sourcing from the "off-site scheduled meetings" database.
Why? because the dataset that does have the data for lunch breaks is labeled "labour board compliance 3", and the LLM thought that the metadata for the wrong dataset better matched the user's prompt.
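To make this concrete, here's a toy sketch (pure Python; the dataset labels are the made-up ones from the scenario above, and the scoring is deliberately naive - this is not any real retrieval stack) of how metadata matching can rank the wrong table first:

```
# Naive metadata retrieval: score tables by word overlap with the prompt.
def overlap_score(prompt: str, description: str) -> int:
    return len(set(prompt.lower().split()) & set(description.lower().split()))

datasets = {
    "off_site_meetings": "off-site scheduled meetings with start and end time",
    "labour_compliance_3": "labour board compliance 3",  # actually holds the lunch-break logs
}

prompt = "average time spent on scheduled lunch break start to end"
ranked = sorted(datasets, key=lambda k: overlap_score(prompt, datasets[k]), reverse=True)
print(ranked)  # ['off_site_meetings', 'labour_compliance_3'] -- the wrong table wins
```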
Given the very real-world scenario of mislabeled data, LLMs' inability to understand what they are saying or accessing, and the average manager's complete data illiteracy, we might have to wrangle some actions to prepare for this type of tomfoolery.
I don't think that access restriction will save our souls here; decision-flumberers usually have the authority to overrule RACI/ACL restrictions anyway.
Making "data analysis" an AI-GMO-free zone is laughable; that is simply not how the tech market works. Auto tools are coming to make our jobs harder and less productive, tech people!
I thought about detecting new automation-enhanced data access and visualization, and enacting awareness policies. But that would be of little help; after a shithead middle manager gets hooked on a surreal indicator value, it is nigh impossible to yank them out of it.
Gotta get this snowball rolling, we must have some idea of future AI housetraining best practices if we are to avoid a complete social-media style meltdown of data-driven processes.
Anyone care to pitch in?
-
New LLM releases have shown you can cut bit width and still gain relative efficiency by increasing model size. The people building them figured out it's actually worth it.
However, there's a caveat: below 4-bit quantization a model loses a *lot* of quality (high perplexity). Essentially, without new quantization techniques, they're out of runway. The only directions they can go from here are better LoRA implementations/architecture, better base models, and larger models themselves.
I do see one improvement though.
By taking the same underlying model and reducing it to 3, 2, or even 1 bit, assuming the distribution is bit-agnostic (even if the output isn't), the smaller network acts as an inverted supervisor.
In other words, the larger model is likely to be *more precise and accurate* than a bitsize-handicapped one of equivalent parameter count. Sufficient sampling would therefore allow the 4-bit quantization model to train against a lower-bit quantization of itself, on the theory that it's hard to generate a correct (low-perplexity, low-loss) answer or sample, but *easy* to generate one that's wrong.
And if you have a model of higher accuracy, and a version that has a much lower accuracy relative to the baseline, you should be able to effectively bootstrap the better model.
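A quick numpy sketch of that quality cliff (toy weights and naive per-tensor scaling - real schemes use per-channel scales and calibration data, so treat the numbers as purely illustrative):

```
import numpy as np

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization to `bits` bits, then dequantize."""
    levels = max(1, 2 ** (bits - 1) - 1)  # e.g. 7 levels each side for 4-bit signed
    scale = np.abs(w).max() / levels      # one scale per tensor (per-channel in practice)
    return np.clip(np.round(w / scale), -levels, levels) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(1024, 1024)).astype(np.float32)  # toy weight matrix

for bits in (8, 4, 3, 2, 1):
    err = float(np.abs(w - fake_quantize(w, bits)).mean())
    print(f"{bits}-bit mean abs round-trip error: {err:.6f}")
```

The error blows up as the level count collapses below 4 bits, which is the "runway" running out.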
This is similar to the approach of AlphaGo playing against itself, or how certain drones auto-hover: they calculate the wrong flight path first (looking for high loss) because it's simpler, and then calculate relative to that to get the "right" answer.
If crashing is flying with style, failing at crashing is *flying* with style.
-
Once again, I urge you all to read any LLM threads on Hacker News... it's funny seeing tech bros debate things they clearly don't understand.
it also wouldn't hurt for them to read perhaps just one philosophy book, since they are attempting to argue about what consciousness actually is (still an open question anyway). ultimately, what i am trying to say is that these stupid threads end up being a bunch of hot air that doesn't really accomplish anything
i will say it is funny though how close some of these tech bros think we are to AGI with these LLMs 😂
imagine thinking a text generator is nearly general intelligence = clueless
-
Someone figured out how to make LLMs obey context-free grammars, which opens up the possibility of really fine-grained control over generation and the structure of outputs.
And I was thinking, what if we did the same for something that consumed and validated tokens?
The thinking is that the option to backtrack already exists, so if an input is invalid, the system can backtrack and regenerate - mostly this is implemented through something called 'temperature', or 'top-k', where the system generates multiple candidate next tokens and then typically selects from a subsample of them, usually the highest-scoring one.
But it occurs to me that a process could be run in front of that, one that conditions the output based on a grammar, taking as input the output of the base process. The instruction prompt to it would be a simple binary filter:
"If the next token conforms to the provided grammar, output it to stream, otherwise trigger backtracking in the LLM that gave you the input."
This is very much a compliance thing, but it could be used for finer-grained control over how a machine examines its own output, rather than the current approach where you simply feed its own output back in as input, as we do now for systems able to continuously produce new output (such as the planners some people have built).
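Here's a minimal sketch of the loop I have in mind. Both callables are hypothetical stand-ins: `next_candidates` for the LLM's top-k sampler, `conforms` for an incremental grammar check - this isn't any existing library's API:

```
from typing import Callable, List

def constrained_decode(
    next_candidates: Callable[[List[str]], List[str]],  # top-k tokens for a prefix, best first
    conforms: Callable[[List[str], str], bool],         # does prefix + token fit the grammar?
    max_len: int = 50,
) -> List[str]:
    out: List[str] = []
    stack = [next_candidates(out)]  # remaining candidates per position, for backtracking
    while stack and len(out) < max_len:
        candidates = stack[-1]
        if not candidates:          # position exhausted: backtrack one token
            stack.pop()
            if out:
                out.pop()
            continue
        token = candidates.pop(0)   # best remaining candidate at this position
        if conforms(out, token):    # the "binary filter" described above
            out.append(token)
            stack.append(next_candidates(out))
        # else: loop again and try the next candidate at the same position
    return out
```

`conforms` could wrap an incremental parser for the grammar; when every candidate at a position fails, the pop-and-retry is exactly the backtracking-into-top-k behaviour described above.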
link here:
https://news.ycombinator.com/item/...
-
I got a job where I'm supposed to develop a product based on LLMs.
Expectation: oh right! I'll be working with state of the art technology! 😀
Reality: badly documented libraries that are always changing; new libraries becoming obsolete in less than a month; my product ideas were done by somebody else twice before I could finish a POC; getting dizzy trying to keep up with the latest news about LLMs 😵💫
I think I want to do basic old boring stuff again. 😐
-
Basic concepts, patterns, and pitfalls of software, code, and programming logic become MORE important, not LESS, with the rise of LLMs...
An LLM can more or less spit out what you need - if you are specific enough! "Specific enough" being the key phrase here. I always have to laugh at the term "prompt engineering"... it's literally called "communication skills". Also gotta laugh when I see so many haters raging about the "poor code" produced by AI: they're probably like "write me a for loop!", specifying absolutely no requirements or specifics, and then scratching their heads over why they don't get the exact output they expect... news flash, there are like a million ways to do anything you want to accomplish with code... sigh
Code is just a byproduct of thousands of architecture decisions, designs, and options...
but, well... rubes gon' rube
-
Holy smokes, an LLM that's a competent wit.
(it gets good toward the end)
https://pastebin.com/MpGzZRqK
courtesy of https://worldsim.nousresearch.com
edit: I was particularly fond of "Schrodinger's cat mocks causality, simultaneously alive and droll"
-
The next step for improving large language models (if not diffusion) is hot-encoding.
The idea is pretty straightforward:
Generate many prompts, or take many prompts as a training and validation set. Do partial inference, and find the intersection of best overall performance with least computation.
Then save the state of the network during partial inference, and use that for all subsequent inferences. Sort of like LoRA, but for inference instead of fine-tuning.
Inference, after all, is what matters. And there has to be some subset of prompt-based initializations of a network that perform, regardless of the prompt, (generally) as well as a full inference step.
Likewise with diffusion, there likely exists some priors (based on the training data) that speed up reconstruction or lower the network loss, allowing us to substitute a 'snapshot' that has the correct distribution, without necessarily performing a full generation.
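The closest existing mechanic for "save the state during partial inference and reuse it" is keeping the attention KV cache of a shared prefix. A sketch with the Hugging Face transformers API (gpt2 as a stand-in; the exact cache type varies between library versions, so treat this as illustrative rather than canonical):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Run the shared prefix once and keep the key/value cache as the "snapshot".
prefix_ids = tok("You are a terse assistant.\n", return_tensors="pt").input_ids
with torch.no_grad():
    past = model(prefix_ids, use_cache=True).past_key_values

# Resume from the snapshot for a new prompt instead of re-running the prefix.
# (Newer library versions mutate cache objects in place; copy per prompt there.)
ids = tok("Q: what is a KV cache? A:", return_tensors="pt").input_ids
generated = []
with torch.no_grad():
    for _ in range(20):
        out = model(ids, past_key_values=past, use_cache=True)
        past = out.past_key_values
        ids = out.logits[:, -1:].argmax(-1)  # greedy next-token pick
        generated.append(ids.item())

print(tok.decode(generated))
```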
Another idea I had was 'semantic centering' instead of regional image labelling. The idea is to find some patch of an object within an image, and ask, for all such patches that belong to an object, what best describes the object? if it were a dog, what patch of the image is "most dog-like" etc. I could see it as being much closer to how the human brain quickly identifies objects by short-cuts. The size of such patches could be adjusted to minimize the cross-entropy of classification relative to the tested size of each patch (pixel-sized patches for example might lead to too high a training loss). Of course it might allow us to do a scattershot 'at a glance' type lookup of potential image contents, even if you get multiple categories for a single pixel, it greatly narrows the total span of categories you need to do subsequent searches for.
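A toy sketch of that patch scan (the `score` callable is a stand-in for any classifier's class probability; the brightness scorer in the usage line is obviously fake):

```
from typing import Callable, Tuple
import numpy as np

def most_typical_patch(
    image: np.ndarray,
    patch: int,
    score: Callable[[np.ndarray], float],  # e.g. P("dog" | patch) from a classifier
) -> Tuple[int, int, float]:
    """Scan half-overlapping patches and return (row, col, score) of the best one."""
    best = (0, 0, float("-inf"))
    h, w = image.shape[:2]
    for r in range(0, h - patch + 1, patch // 2):
        for c in range(0, w - patch + 1, patch // 2):
            s = score(image[r:r + patch, c:c + patch])
            if s > best[2]:
                best = (r, c, s)
    return best

img = np.random.default_rng(0).random((128, 128, 3))
print(most_typical_patch(img, patch=32, score=lambda p: float(p.mean())))
```

Sweeping `patch` over a few sizes and keeping the one with the lowest classification loss would give the size adjustment described above.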
In other news I'm starting a new ML blackbook for various ideas. Old one is mostly outdated now, and I think I scanned it (and since buried it somewhere amongst my ten thousand other files like a digital hoarder) and lost it.
I have some other 'low-hanging fruit' type ideas for improving existing and emerging models but I'll save those for another time.
-
Soooo many vendor-sponsored frontend frameworks.
Soon text-to-logic tools will be useful enough that you'll only need a client, someone who is both rational *and* speaks clientese, and a dog.
The client barks some nonsense, the rational person translates it into business logic, some LLM makes it into some nice UI and the dog makes random noises so that the client will feel smart, valued and appreciated.
That nullifies the reason for so many frontend frameworks, because either the LLMs all converge on a single way of doing things or they do not care which one they choose.
-
Has anybody else gotten to the point where people who need to mansplain how language models aren't truly sentient/conscious/intelligent are now more annoying than people who think language models are sentient/conscious/intelligent?*
It has been a tight race, but I think I have just about hit the inflection point.
The amount of time I've wasted because of someone condescendingly barging into a conversation with an iamverysmart "actually, you see, they are just automata trying to predict the next text tokens". When in actuality everybody in the discussion is already aware, and that is not the point.
And to further exacerbate it, with a good number of them it is really difficult to get this through their thick little skulls. They just keep parroting the same thing over and over. Ironically, in their single-minded, ego-driven desire to be the Daniel Dennett of the chat, they actually come across as less sentient/conscious/intelligent than a language model.
(*this should not be taken as endorsement for or against that idea - it is actually mostly orthogonal to this rant)
-
I wonder if anyone has considered building a large language model, trained on consuming and generating token sequences that are themselves the actual weights or matrix values of other large language models?
Run LoRA to tune it to find and generate plausible subgraphs for specific tasks (an optimal search for weights that are most likely to be initialized by chance to ideal values, i.e. the winning lottery ticket hypothesis).
The entire thing could even be used to prune existing LLM weights, in a generative-adversarial model.
Shit, there's enough embedding and weight data to train a Meta-LLM from scratch at this point.
The sum total of trillions of parameters in models floating around the internet could be used as training data.
If the models and weights are designed to predict the next token, there shouldn't be anything to prevent another model, trained on this sort of distribution, from generating new plausible models.
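Pure speculation, but serializing weights into token sequences could be as dumb as quantile-binning them. A toy sketch (made-up 256-token vocabulary, nothing from any real pipeline):

```
import numpy as np

VOCAB_SIZE = 256  # one token per quantile bin -- an assumption, not a known recipe

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(64, 64))  # toy "layer" standing in for real weights

# Bin edges and representative values taken from the weight distribution itself.
edges = np.quantile(w, np.linspace(0, 1, VOCAB_SIZE + 1))[1:-1]
centers = np.quantile(w, (np.arange(VOCAB_SIZE) + 0.5) / VOCAB_SIZE)

tokens = np.searchsorted(edges, w.ravel(), side="right")  # training sequence for the meta-LLM
w_round_trip = centers[tokens].reshape(w.shape)           # decode tokens back to weights

print("first tokens:", tokens[:8])
print("round-trip error:", float(np.abs(w - w_round_trip).mean()))
```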
You could even do task-prompt-to-model-task embeddings by training on the weights of task-specific models, do vector searches to mix models, etc., and generate *new* models -
not new text, not new imagery, but new *models*.
It'd be a model for training/inferring/optimizing/generating other models.
-
chatgpt is too politically correct, and i hate that i'm paying for an API that refuses certain prompts because they were considered inappropriate, or because it thinks it should not be giving me its analysis on a certain subject.
has anyone dabbled with using an open-source LLM and made their own lite version of ChatGPT, minus all the restrictions?
i know it's not gonna be as good, but at the very least it'd be free from the constraints
-
That I learned Java.
Got lots of work but nothing to be proud of.
Always having to clean up after mediocre developers.
-
@Wisecrack
Dude, it seems someone has actually done 1-bit quantization for a transformer model:
https://arxiv.org/pdf/...
-
https://milkyeggs.com/?p=303
"I claim that the trend which AI/ML continues for lawyers is one that it starts for programmers. Just like how a partner at Cravath likely sketches an outline of how they want to approach a particular case and swarms of largely replaceable lawyers fill in the details, we are perhaps converging to a future where a FAANG L7 can just sketch out architectural details and the programmer equivalent of paralegals will simply query the latest LLM and clean up the output. Note that querying LLMs and making the outputted code conform to specifications is probably a lot easier than writing the code yourself ー and other LLMs can also help you fix up the code and integrate the different modules together!"1 -
People say using GPT-4 as an OCR is not a good idea. But damn, that formatting GPT-4 Vision does is outstanding... and I have realised proper formatting goes a long way when prompting to get precise output.
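For reference, the kind of call being tested looks roughly like this (OpenAI Python SDK; the model name, message shape, and image URL are assumptions that may have drifted since writing):

```
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4-turbo",  # a vision-capable model; swap in whatever is current
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Transcribe this document as markdown, preserving headings and tables."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/scanned-invoice.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```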
I gotta say: test for your use cases rather than relying on expert-opinion blogs!
-
Meta Platforms has launched Llama 3, their newest large language model (LLM), alongside a brand-new stand-alone AI chatbot. Llama 3 comprises two versions, one with 8 billion and the other with 70 billion parameters. Furthermore, Meta is currently developing an even more advanced 400 billion parameter model, though its release date remains unannounced.
Ragavan Srinivasan, Meta’s VP of Product, expressed enthusiasm about the model’s capabilities in a recent interview, stating, “From a performance perspective, it is really off the charts in terms of benchmarking capabilities.” He specifically referred to the ongoing development of the 400 billion parameter version.
https://freeaiall.com/ai-news/...