Executives Driving Efficiency & Governments Predicting Voting Results with AI . . . But Where Does All the Data Come From?
Jordan Kelly • 22 June 2025

AI Expert Delves Into the Ethics of Sourcing the Data that 'Feeds' LLMs

(By Columnist Jamie Munro, AI & Robotics Expert)


The last few years have seen Artificial Intelligence move steadily into mainstream use. Its usefulness isn’t in question. The questions, at the user level, revolve around how to employ it to greatest effect (including competitively) and, at the level of model training, how to build AI models that produce balanced, properly informed outputs from ethically sourced inputs.


Executives are looking to drive efficiency improvements. Democratic governments are starting to use AI to predict election outcomes, while their counterparts in authoritarian regimes are using it to monitor citizens' activities and crack down on dissent. Three in five doctors in the U.S. now report using Artificial Intelligence as part of their practice. Internal research at my firm (willowlearn.com) indicates that between 40 and 60 percent of teachers in the UK now use AI in some part of their role. The most eager early adopters of AI are university students: the latest figures from the UK's Higher Education Policy Institute show 88 percent of university students admitting to using AI in their assignments.


You’re An AI User Whether You Know It or Not


You've almost certainly started to hear the names of various AI products coming up in conversations - names like ChatGPT, Gemini, Claude and Perplexity.


Even if you haven't deliberately used one of these products, you've definitely interacted with a product or service that uses AI in some way. If you've used Google over the last few months, you will have noticed the "AI Overview" section at the top of the results (congratulations – you are now an AI user).


So now that you are an AI user, you might be asking: what exactly is AI? To give a simple answer, in layman’s terms the “AIs” people talk about are computer programs called Large Language Models (“LLMs”) that you can interact with using natural human language, much like texting a (very knowledgeable) friend. When you send a message to an LLM, it responds with a message of its own. What it's really doing is using some very complicated mathematics to predict, based on your input, what you want to see, and then generating it for you. It's a bit of a simplification, but you can think of it as a beefed-up version of the predictive text system on your phone.
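To make the "predictive text" comparison a little more concrete, here is a deliberately tiny sketch in Python. It is not how ChatGPT, Gemini or any real LLM is built (those use large neural networks trained on enormous datasets); it simply counts which word follows which in a handful of made-up sentences and suggests the most common follower. The example sentences and the function names `build_follow_counts` and `predict_next` are hypothetical, chosen purely for illustration.

```python
from collections import Counter, defaultdict


def build_follow_counts(sentences):
    """Count, for every word, how often each other word comes straight after it."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair each word with the word that immediately follows it.
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Suggest the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    # most_common(1) returns a one-item list such as [("tomorrow", 2)].
    return counts.most_common(1)[0][0]


# A hypothetical, very small "training set".
corpus = [
    "see you tomorrow",
    "see you soon",
    "see you tomorrow then",
    "thank you so much",
]

follows = build_follow_counts(corpus)
print(predict_next(follows, "see"))  # you
print(predict_next(follows, "you"))  # tomorrow
```

The "prediction" is nothing more than what the examples made most common: "tomorrow" wins after "you" because it followed "you" twice, while "soon" and "so" each did so only once. Real models learn far richer patterns, but they too can only draw on what they were shown.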


So How Are These Computer Programs Able to Predict What You Want to See?


This is where data comes in, and lots of it. AI models need to be shown millions (even trillions) of examples of human language in a process known as "training". This raises another question: where does the data come from?


The answer to that question is sensitive.


Firstly, companies developing LLMs for the marketplace need vast amounts of data to remain competitive. Many have been accused of employing unscrupulous methods to obtain it.


Copyright in AI training material is still an open question, and probably one that can only be resolved by new legislation. It emerged earlier this year that Meta (Facebook’s parent company) used a large number of pirated books as part of its training data. That is just one of many copyright cases against AI giants currently making their way through the U.S. court system.


Copyright Dilemmas: Is It Fair to Content Creators & Other Humans?


Proponents of greater copyright protections would argue that AI fundamentally breaks the current monetisation systems for content creators.


Under the current system, content creators make their content discoverable through search engines and platforms like Google and YouTube. Users search for topics they are interested in, the search engine shows them adverts (which is how it makes money), and the user then clicks on an item that interests them. The creator of that content then has the chance to monetise their readers or viewers.


As users move away from search engines and towards Artificial Intelligence, they increasingly get an authoritative answer from the AI service itself, often without being sent on to an external resource. Previously, a user searching for “how to change a tyre” might find the answer on a mechanic’s website and decide to buy a service from that mechanic’s business. In future, they will increasingly ask an AI how to change a tyre, and the AI will simply tell them, even if it originally learned how to do that from the mechanic's website.


Is that fair to the original content creator (the mechanic, in this case)?


Proponents of reduced copyright protections would argue that the same reasoning would never be applied to humans. If I spent a year reading books about a particular topic, became an expert, and then wrote my own book about it, nobody would argue that I was just regurgitating the books I'd read (unless I was found to have plagiarised large chunks without attribution). The other argument is economic: if we start enforcing copyright protections for the creators of AI training material, the cost of these already expensive AI models will rise even further – and countries that don't care about copyright protections (such as China) will gain a massive advantage.


Concentration of Control & Potential Misuse of Power


The second major data-related problem is one of control and power.


Everything an AI "knows" comes from its training data, so whoever decides what is included in the training data has immense power.


In a world where everyone gets their information from AI, the creators of AI will have massive influence over public opinion, much as broadcasters and newspaper owners did in the pre-social-media world. Currently, this power is concentrated in the hands of a few tech giants like Google, Meta and OpenAI.


Efforts to regulate these giants, and AI more broadly, could end up concentrating that power in even fewer hands.


AI Bias in Elections? And Why Do They All Sound the Same?


There have already been numerous accusations of bias against major AI models, for example during the 2024 U.S. elections.


Many users note that the major AI models, even ones from different developers, end up producing rather similar "facts". AI models are only as good as their training data, and any biases in that data will be reflected in the model. The similarities between models' outputs make sense once you consider that the AI companies are all training on broadly the same data: the contents of the internet.


The training data supplied to an AI model acts as a kind of voting mechanism. The more often a particular idea or concept appears in the training data, the more likely the model is to reproduce it. This means that AI models are much more likely to adopt mainstream viewpoints, while less popular, controversial or dissenting views are much less likely to be represented.
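The "voting" effect can be illustrated with simple arithmetic. In the hypothetical Python sketch below, a made-up training set answers the same question in two ways, nine times one way and once the other. A model that learns purely from those counts assigns 90% probability to the majority answer; if it always picks its single most likely answer (often called "greedy" decoding), the minority view never appears at all, and even when it samples in proportion to the probabilities, the minority view surfaces only about one time in ten. The data, labels and function names (`continuation_probabilities`, `greedy_pick`) are illustrative assumptions; real LLMs learn far subtler statistics, so treat this as a sketch of the intuition rather than a description of any particular product.

```python
import random
from collections import Counter


def continuation_probabilities(examples):
    """Turn raw counts of each continuation into probabilities that sum to 1."""
    counts = Counter(examples)
    total = sum(counts.values())
    return {text: n / total for text, n in counts.items()}


def greedy_pick(probabilities):
    """Always choose the single most likely continuation."""
    return max(probabilities, key=probabilities.get)


# Hypothetical training data: one question answered two ways, nine "votes" to one.
training_examples = ["mainstream view"] * 9 + ["dissenting view"]

probs = continuation_probabilities(training_examples)
print(probs)               # {'mainstream view': 0.9, 'dissenting view': 0.1}
print(greedy_pick(probs))  # mainstream view

# Sampling in proportion to the probabilities lets the minority view through,
# but only about 10% of the time on average (the exact split depends on the seed).
random.seed(0)
samples = random.choices(list(probs), weights=list(probs.values()), k=1000)
print(Counter(samples))
```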


Are we heading towards a future where ordinary people only have access to the “official line”? And do we want the likes of Google, Meta and OpenAI deciding what that official line should be?


It’s too late to go back to a world without AI – that ship has already sailed. Countries, companies and individuals who fail to adopt AI will be left behind and out-competed by those who are already on board.

 

Humanity must bravely face the future and embrace the massive opportunities presented by Artificial Intelligence. But as we move towards that future, we cannot shy away from the issues introduced by AI. They must be tackled thoughtfully and they must be tackled head-on.


Artificial Intelligence is ultimately a technology built by humans, for humans, to make life better for us all. But whether the reality turns out to be “for better” or “for worse” will depend wholly on whether we address these fundamental and critical issues.


See Jamie Munro's full bio here.
