
#ai

538 posts · 445 participants · 32 posts today

"Moving from the above, this study investigates whether and to what extent unlicensed AI training activities could be undertaken by relying, not on Article 4 DSMD as transposed into national law or a hypothetical reform of the UK system of exceptions, but rather on what appear to be so far potentially overlooked defences. Reference is made specifically to research and education exceptions, notably Article 3 DSMD and Article 5(3)(a) of Directive 2001/29 (InfoSoc Directive), also read in light of Article 5 DSMD. The discussion of other jurisdictions – including the US and countries, like South Korea and Singapore, which have adopted open-ended fair use-style defences – is also undertaken. This is done to determine whether unlicensed AI training, including training seemingly done for the purpose of research or education/learning, might be considered lawful.

In light of the context summarized above, the study tackles two key questions: (a) whether unlicensed AI training may be classified as “research” or even “learning” in the context of “teaching,” and (b) whether commercial AI developers may take advantage of the provisions above. Ultimately, both questions are answered in the negative, finding that no exception or open-ended defence fully covers unlicensed AI training activities. As a result, a licensing approach (and culture) appears to be the way for AI training to be undertaken lawfully, including when this is done for “research” and “learning.”"

cambridge.org/core/journals/eu

Cambridge Core · Copyright Exceptions and Fair Use Defences for AI Training Done for “Research” and “Learning,” or the Inescapable Licensing Horizon | European Journal of Risk Regulation
#EU #AI #GenerativeAI

"Twiddling represents the big cognitive hazard from enshittification during the AI bubble: the parts of your UI that matter most to you are the parts that you use as vital cognitive prostheses. A product team whose KPI is "get users to tap on an AI button" is going to use the fine-grained data they have on your technological activities to preferentially target these UI elements that you rely on with AI boobytraps. You are too happy, so they are leaving money on the table, and they're coming for it.

This is a form of "attention rent": the companies are taxing your muscle-memory, forcing you to produce deceptive usage statistics at the price of either diverting your cognition from completing a task to hunt around for the button that banishes the AI and lets you get back to what you were doing; or to simply abandon that cognitive prosthesis:
(...)
It's true "engagement-hacking": not performing acts of dopamine manipulation; but rather, spying on your habitual usage of a digital tool in order to swap buttons around in order to get you to make a number go up. It's exploiting the fact that you engage with something useful and good to make it less useful and worse, because if you're too happy, some enshittifier is leaving money on the table."

pluralistic.net/2025/07/28/twi

pluralistic.net · Pluralistic: How twiddling enshittifies your brain (28 Jul 2025) – Daily links from Cory Doctorow

Join us to see Meredith Whittaker, President of Signal, in conversation with Maria Exner and Matthias Spielkamp, to discuss why both perspectives - reclaiming 'digital sovereignty' without conversations about accountability and control, and the promise of AI as salvation - threaten to send us down the wrong track, and what we must do to bring technology in line with human needs.

🎟️: publix.de/en/events/meredith-w
📍 25.09.2025, 6:30 pm

@Mer__edith @spielkamp @algorithmwatch
#AI #politics #technology

It never fails to surprise me how lazy some people are. And when that is combined with a wish to cut costs by ‘automating’ processes best (at the moment) left to experts - in this case medical illustrations - it is downright dangerous.

theregister.com/2025/07/27/bio

edition.cnn.com/2025/07/23/pol

The Register · Seeing is believing in biomedicine, which isn't great when AI gets it wrong · By Thomas Claburn
#AI #Images #Lazy

Some encouraging news: more than half of the students in my online class did a graphing assignment by hand instead of by computer and took a picture of the resulting graphs. And of those that used a computer, more than half clearly used a spreadsheet program. Only a few seem to have used #AI.

I didn't tell them what to do one way or the other - they made the decision on their own.

It'd be nice if there were a #Demucs model that could separate laugh tracks from sitcom episodes. I know an #AI laugh track remover exists already, but to be honest, I wasn't impressed at all by the demo: it sounded like it just turned the episode all the way down whenever a laugh track came in. Unfortunately, I think the reason this can't happen easily yet is that there are few, if any, public-domain crowd sounds out there to train AI on, at least not to my knowledge. #ML

Why Every Biotech Research Group Needs a Data Lakehouse

start tiny and scale fast without vendor lock-in

All biotech labs have data, tons of it, and the problem is the same across scales: accessing data across experiments is hard. Often data simply gets lost on somebody’s laptop, with a pretty plot on a poster as the only clue it ever existed. The problem becomes almost insurmountable if you try to track multiple data types, and running any kind of data-management effort used to carry a large overhead. New technology like DuckDB and its new data lakehouse infrastructure, DuckLake, tries to make it easy to adopt a lakehouse and scale it with your data, all while avoiding vendor lock-in.

American Scoter Duck from Birds of America (1827) by John James Audubon (1785–1851), etched by Robert Havell (1793–1878).

The data dilemma in modern biotech

High-content microscopy, single-cell sequencing, ELISAs, flow-cytometry FCS files, lab-notebook PDFs—today’s wet-lab output is a torrent of heterogeneous, petabyte-scale assets. Traditional “raw-files-in-folders + SQL warehouse for analytics” architectures break down when you need to query an image-derived feature next to a CRISPR guide list under GMP audit. A lakehouse merges the cheap, schema-agnostic storage of a data lake with the ACID guarantees, time-travel, and governance of a warehouse, all on one platform. Research teams, whether at the discovery or clinical-trial stage, can enjoy faster insights, less duplication, and smoother compliance when they adopt a lakehouse model.
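To make that concrete, here is a minimal sketch of such a cross-modal query; the image_features.parquet and crispr_guides.csv files and their shared well_id column are hypothetical stand-ins. DuckDB joins them where they sit, with no load step into a warehouse first:

```python
import duckdb  # pip install duckdb

con = duckdb.connect()

# Join image-derived features (Parquet) against a CRISPR guide list (CSV),
# reading both files in place. File names and columns are illustrative.
hits = con.execute("""
    SELECT g.guide_id, g.target_gene, f.cell_count, f.mean_intensity
    FROM read_parquet('image_features.parquet') AS f
    JOIN read_csv_auto('crispr_guides.csv') AS g
      ON f.well_id = g.well_id
    WHERE f.mean_intensity > 0.8
""").fetchdf()

print(hits.head())
```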

Lakehouse super-powers for biotech

  • Native multimodal storage: Keep raw TIFF stacks, Parquet tables, FASTQ files, and instrument logs side-by-side while preserving original resolution.
  • Column-level lineage & time-travel: Reproduce an analysis exactly as of “assay-plate upload on 2025-07-14” for FDA, EMA, or GLP audits.
  • In-place analytics for AI/ML: Push DuckDB/Spark/Trino compute to the data; no ETL ping-pong before model training (see the sketch after this list).
  • Cost-elastic scaling: Store on low-cost S3/MinIO today; spin up GPU instances tomorrow without re-ingesting data.
  • Open formats: Iceberg/Delta/Hudi (and now DuckLake) keep your Parquet files portable and your exit costs near zero.
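As a sketch of the in-place analytics point: DuckDB’s httpfs extension can scan Parquet straight out of an object store, so training features come from a query rather than an ETL copy. The s3://lab-bucket path and columns here are hypothetical, and credentials are assumed to be in the environment:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")  # remote object-store support
con.execute("LOAD httpfs")     # assumes S3 credentials in the environment

# Aggregate image-derived features across all plates, straight from S3.
features = con.execute("""
    SELECT plate_id,
           avg(mean_intensity) AS plate_intensity,
           count(*)            AS wells
    FROM read_parquet('s3://lab-bucket/assays/*.parquet')
    GROUP BY plate_id
""").fetchdf()
```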

DuckLake: an open lakehouse format to prevent lock-in

DuckLake is still pretty new and isn’t quite production-ready, but the team behind it is the same as DuckDB’s, and I expect they will deliver high quality as 2025 progresses. Data lakes, and even lakehouses, are not new at all. Iceberg and Delta pioneered open table formats, but they still scatter JSON/Avro manifests across object storage and bolt on a separate catalog database. DuckLake flips the design: all metadata lives in a normal SQL database, while data stays in Parquet on blob storage. The result is simpler, faster, cross-table ACID transactions, and you can back the catalog with Postgres, MySQL, MotherDuck, or even DuckDB itself.
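A minimal sketch of that flipped design, as I understand the ducklake extension’s ATTACH syntax at the time of writing; the catalog file, data path, and table are stand-ins:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# Metadata goes in an ordinary SQL catalog (here a local file); the data
# itself is written as plain Parquet files under DATA_PATH.
con.execute("ATTACH 'ducklake:catalog.ducklake' AS lake (DATA_PATH 'lake_data/')")

con.execute("CREATE TABLE lake.assays (plate_id VARCHAR, intensity DOUBLE)")
con.execute("INSERT INTO lake.assays VALUES ('P001', 0.92)")

# Swapping the catalog for Postgres later should just be a different ATTACH
# string (e.g. 'ducklake:postgres:...'); the Parquet files don't move.
```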

Key take-aways:

  • No vendor lock-in: Because operations are defined as plain SQL, any SQL-compatible engine can read or write DuckLake—good-bye proprietary catalogs.
  • Start on a laptop, finish on a cluster: DuckDB + DuckLake runs fine on your MacBook; point the same tables at MinIO-on-prem or S3 later without refactoring code.
  • Cross-table transactions: Need to update an assay table and its QC log atomically? One transaction—something Iceberg and Delta still treat as an “advanced feature.”
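And a minimal sketch of that cross-table atomicity, continuing the hypothetical catalog and lake.assays table from the previous snippet; the qc_log table is likewise a stand-in:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")
con.execute("ATTACH 'ducklake:catalog.ducklake' AS lake (DATA_PATH 'lake_data/')")
con.execute("CREATE TABLE IF NOT EXISTS lake.qc_log "
            "(plate_id VARCHAR, note VARCHAR, logged_at TIMESTAMP)")

# One ACID transaction spanning two tables: both writes land, or neither does.
con.execute("BEGIN TRANSACTION")
con.execute("UPDATE lake.assays SET intensity = 0.95 WHERE plate_id = 'P001'")
con.execute("INSERT INTO lake.qc_log VALUES ('P001', 'intensity re-normalized', now())")
con.execute("COMMIT")
```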

Psst… if you don’t understand or don’t care what ACID, manifests, or object stores mean, assign a grad student; it’s not complicated.