Tumblr and WordPress are reportedly preparing deals to sell user data to artificial intelligence companies OpenAI and Midjourney. 404 Media reports that the platforms’ parent company, Automattic, is close to finalizing a deal to provide data to help train the AI companies’ models.
It’s unclear what data will be included, but the report suggests that Automattic may have overreached at first. An alleged internal post from Tumblr product manager Cyle Gage suggests that Automattic was prepared to send private or partner-related data that was not supposed to be part of the deal. The questionable content reportedly includes private posts on public blogs, posts on deleted or suspended blogs, unanswered (and therefore not publicly published) questions, private replies, publicly flagged posts, and content from premium partner blogs (such as Apple’s former music site).
An internal post indicates that Automattic engineers have compiled a list of post IDs that should have been excluded. It is unclear whether the data has already been sent to the AI companies.
Engadget emailed Automattic for comment on the report. The company responded with a published statement, claiming, “We will only share public content hosted on WordPress.com and Tumblr from sites that have not opted out.” The statement also said that legal regulations do not currently require AI companies’ web crawlers to honor users’ opt-out choices.
The bottom line of Automattic’s statement is consistent with the reported deals. “We also work directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-out, and control,” Automattic wrote. “Our partnership will respect all opt-out parameters. We also plan to take this a step further and regularly update any partners on new opt-outs and request that their content be removed from past sources and future training.”
The company reportedly plans to launch a new opt-out tool on Wednesday that it claims will allow users to block third parties, including AI companies, from training on their data. According to 404 Media, the company describes the tool this way: “If you opt out in advance, we will block crawlers from accessing your content by adding your site to a blacklist. If you later change your mind, we also plan to update any partners of the new opt-outs and request that their content be removed from past sources and future training.”
The language about “asking” AI companies to delete data may be telling.
Andrew Spittle, Automattic’s head of artificial intelligence, explained in response to an employee question about data deletion safeguards when using the tool: “We will regularly notify existing partners of anyone who has opted out since our last list. I want this to be an ongoing process where we regularly advocate for the removal of past content based on current preferences. We will request that the content be deleted and removed from future training. I believe partners will respect that based on our conversations with them up to this point. I don’t think they will gain much overall by keeping it.”
So, if a Tumblr or WordPress user opts out of AI training, Automattic will “request” and “advocate” for the removal of their content. And the company’s AI chief “believes” AI companies will find it in their best interest to comply “based on our conversations.” (How’s that for confidence!)
Deals for AI training data have become a lucrative opportunity for websites treading water in today’s shaky online publishing landscape. (Tumblr’s staff was reportedly reduced to a skeleton crew by the end of 2023.) Last week, Google struck a deal with Reddit (ahead of the latter’s IPO) to train on the platform’s extensive database of user-generated content. Meanwhile, OpenAI launched a partnership program last year to collect data sets from third parties to help train AI models.
Update February 27, 2024 at 3:56 PM ET: This story has been updated to include a published statement from Automattic, the parent company of WordPress and Tumblr.