Wikipedia talk:Copyright problems/Archive 21

From Wikipedia, the free encyclopedia

A "copyvio" bungle on the Backup article, and what we should learn from that about CopyPatrol problems

Early in the morning of 29 November 2018 I wrote an edit of the "Live data" subsection of the "Backup" article, as a combination of material that was currently in the subsection, material that had been in the subsection until it was deleted by JohnInDC the previous day [1], and material that I had written from scratch early that morning. AFAICT the subsection had been basically unchanged since around 2008, so I was amazed to discover that User:Username_Needed had reverted my edit and placed a copyvio template on my Talk page about 2 hours later. I immediately protested "I am mystified ..." on a new section of Username Needed's Talk page, ending that comment with "Please tell me, or have your bot tell me, where the copyright violation lies, so that I can fix it."

(To preserve a comprehensive narrative, I have copied comments I originally made in that section of Username Needed's Talk page into this section of my own Talk page.)

About 18 hours after his/her reversion of my edit, User:Username_Needed reverted that reversion with the Edit Summary "SelfRev- misidentification". Still no comment from him/her on where the copyvio lay. Another 7 days later User:Username_Needed put on my Talk page a first feeble apology, which said "I had misidentified an edit as a copyvio when the actual material was elsewhere. Sorry for the confusion." I responded to that with "What does 'elsewhere' mean? In the 'Backup' article, but not in the 'Live data' subsection? In some other article?" My response went on to describe tests I had made with Earwig's Copyvio Detector before and after his/her original reversion of my edit, which showed a 15.3% probability of copyvio before the original reversion in a (properly quotemarked and footnoted) quote in a completely different section of the WP article, and showed a 16.7% probability of copyvio in the "Live data" subsection after the original reversion. Much of the "Live data" subsection appears to have originally been written based on the notes of a 1997 University of Wisconsin lecture by a database administrator; if that 16.7% probability represents a genuine copyvio, it's likely to be related to that (not quotemarked but properly footnoted) lecture note material, which the database administrator may have copied from Oracle Corp.'s 1997-era documentation.

User:Username_Needed then replied on my Talk page "I was using User:Crow's copypartol tool, which has quite a complicated interface, and I misread it. I have no idea if there ever was any copyvio, whether it has been removed and whether it is still in the revs." I replied on User_talk:Username_Needed#Backup "I just used that CopyPatrol tool to discover a "copyright violation", but it's in the reverse direction! [2] is [explicitly at the top of the blog] dated 10 April 2013, which by internal evidence is almost 5 years after the existing material I edited in the "Live data" subsection was written. If you use View History for the WP article, and go back to a version before that date, it will become pretty obvious that [a consultant in Los Angeles County] copied at least that subsection of the WP article without AFAICT crediting Wikipedia."

(In apparent reaction to a message I—perhaps naively—left on his voicemail the other day, the Los Angeles County consultant has now removed the copyright-violating material from the current version of his blog—but it remains captured on the Wayback Machine on 18 May 2014; I have substituted the Wayback URL in the link in the preceding paragraph.)

What User:Username_Needed's bungle reminds me of is something that happened while my father was teaching me to drive around our suburban neighborhood in 1956. Disregarding his warning I went around a small traffic circle too fast, and ended up going across someone's lawn. Having a bit more intestinal fortitude than User:Username_Needed, I apologized to my father as soon as—not 7 days after—he drove our car off the lawn and around the corner and told me to get back behind the wheel. Maybe WP editors should be required to go through a Learner's Permit phase under the supervision of another CopyPatrol-experienced editor before being allowed to use CopyPatrol to identify copyvios on their own.

Earwig's Copyvio Detector didn't detect the reverse copyvio, but that may simply be because Earwig's Copyvio Detector doesn't look at non-institutional blogs such as that of the Los Angeles County consultant. OTOH Earwig's Copyvio Detector may have coding to detect reverse copyvios and ignore them (comparing in only one direction and seeing which file creation date is earlier are two simple-minded techniques that occur to me); maybe CopyPatrol should have similar coding added. DovidBenAvraham (talk) 10:44, 12 December 2018 (UTC)

  • I will not be using the Crow's copyvio tool in the future, but could you please stop trying to insult me. I get that you're angry at being accused of something that you didn't do. This is justified, but throwing insults isn't going to help. Again, I apologise for upsetting you with an incorrect accusation, but it was a mistake. No malicious intent or upset was intentional. Also, I was offline for a while, hence why I did not respond immediately. [Username Needed] 10:52, 12 December 2018 (UTC)
You forget that I saw the results of your CopyPatrol run on the "Backup" article before those results were invalidated by the Los Angeles County consultant's deleting his reverse copyvio material. They showed 6-7 pages of text that was highlighted as being a copyvio. That IMHO would have alerted any WP editor experienced with copyvios that the copyvio might be in the reverse direction. Since the Los Angeles County consultant had explicitly put a 10 April 2013 date at the top of his blog post, it would have taken that experienced WP editor no more than 5 minutes to use View History to access a version of the "Backup" article written before 10 April 2013. 2 minutes of manually comparing text would then have convinced the experienced WP editor that the Los Angeles County consultant had copied from the WP article, rather than in the other direction. I think the foregoing thoroughly establishes your need for some Learner's Permit time; sorry you consider that an insult.
Next we must consider your delay of over 7 days in giving me any statement of what my alleged copyvio consisted of. The only conceivable answer you could have given from the CopyPatrol run is that I had copied essentially the entire "Backup" article into WP, and 2 minutes using View History would have convinced anyone that that answer was nonsensical. Do you have evidence that you asked any other WP editor for advice? No, you apparently just sat there with your mouth shut until I pressed you for an answer. And after 7 days you blamed your problem on the CopyPatrol interface. I think your lack of intestinal fortitude in this matter is pretty well demonstrated; you should have grown out of that in your teenage years. Heck, I just admitted in this section that my phoning the Los Angeles County consultant was an error in judgement on my part. We all have room to grow morally and intellectually. DovidBenAvraham (talk) 12:11, 12 December 2018 (UTC)
NB. I have had several conversations with this editor about WP:NPA, WP:Civility and their variants about 3/5 of the way into the section here. JohnInDC (talk) 12:19, 12 December 2018 (UTC)
It was not the article as a whole that insulted me, it was the line "Having a bit more intestinal fortitude than User:Username_Needed". I was not waiting to respond, I was merely offline. I also consider bringing me personally into this conversation in the first place was continuing on with a mistake that already had been resolved. [Username Needed] 13:06, 12 December 2018 (UTC)
I agree with both points - that this is beating a dead horse (at needless length to boot), and that the complaint was gratuitously insulting. JohnInDC (talk) 17:11, 12 December 2018 (UTC)

Copied from my personal Talk page, with elisions of the Los Angeles County consultant's name. You'll see the reason for the copy below:

It doesn't violate copyright. He can use it freely. See WP:Copyrights. Probably all he needed to do was to add an attribution somewhere on his blog page - if that. I wonder - is this better, or worse, than making a mistake with an unfamiliar copyvio tool? JohnInDC (talk) 11:56, 12 December 2018 (UTC)
Except I didn't say in my voicemail message that [the Los Angeles County consultant] should delete the copied WP text. I just stated that there appeared to be a copyright violation, and gave my phone number. By no particular coincidence, yesterday I took a fast look at the appropriate WP article on copyright. I think you're correct that adding an attribution would have been sufficient, probably accompanied by adding back those few citations that were in the WP article as of 10 April 2013. But then that wouldn't have shown off [the Los Angeles County consultant]'s consultative brilliance to the same extent. Should I have just said nothing?
As far as "making a mistake with an unfamiliar copyvio tool", I've replied to Username Needed's complaint of having been insulted here. I didn't mention my temporary decision to quit editing Wikipedia for at least 3 months, made before Username Needed reverted his/her reversion. Thank the Lord I didn't hit anyone rolling across that lawn in 1956. DovidBenAvraham (talk) 12:55, 12 December 2018 (UTC)
Yes, you should have said nothing. Because you don’t know what you are doing, you left a misleading message on a private party’s voicemail claiming a “copyright violation” on material which, with only the most minor of adjustments, he is fully entitled to use, and caused him to take an action that he didn’t need to take. The effect of your dispute here is now rippling outside of Wikipedia. You are making things worse, not better. JohnInDC (talk) 13:12, 12 December 2018 (UTC)
I'll concede that Username Needed's intestinal fortitude may be the equal of that of Luke Skywalker or Lara Croft. But that concession means we would have to find another reason for the 18-hour delay in reverting his/her reversion, and the further 7-day delay in providing an explanation that proved to be inadequate. I hereby propose that, in searching for copyright violations in a great many WP articles, Username Needed is stretching himself/herself too thin. IMHO Username Needed has admitted above that he/she should do less WP editing, and do it more thoroughly.
If JohnInDC works his way through the links in the Wayback Machine version of [the Los Angeles County consultant]'s Web page, he will find this page that also includes an 18 April 2013 article entitled "Viruses, Trojans and Malware in general". A Google search on the phrase that consists of the first 6 words of that article led me to this page which—in an updated version—still exists on Cisco's website. Truly "The wicked flee when no man pursueth ...", but someone connected with Cisco—which may have insisted on his/her adding more than an attribution—seems to have pursued the Los Angeles County consultant about that part of the Web page. DovidBenAvraham (talk) 18:35, 13 December 2018 (UTC)

DovidBenAvraham (talk) 19:04, 13 December 2018 (UTC)

  • I will say again. This horse is dead. Also wikilinking my name pings me. I am getting 3-4 pings every time you make an edit. As explained above, I was offwiki for 18 hours, then briefly came back on, noticed that I had made an incorrect edit, and reverted it, lacking the insight to realise that you wanted an explanation, until you elaborated further. Sometimes people make mistakes, and then do not realise that they have made them until a day later. 09:01, 14 December 2018 (UTC)

The reason for the copy above is that CopyPatrol raises several policy questions for the "WP copyvio detection squad", which is my term for the editors who detect and deal with copyvios:

  • Should CopyPatrol now be used by the "WP copyvio detection squad"?
    Not at all, IMHO, until both its GUI and its user documentation are greatly improved. CopyPatrol, whose "seekrit" documentation is here, is AFAICT an apparently new combination of Turnitin and the database in User:EranBot. Trigger warning: The rest of this paragraph will almost certainly cause the reader to slap his/her forehead, and may cause him/her to roll on the floor laughing/crying. The "Backwards copies" paragraph in "Usage" starts out with "Note also the possibility of backwards copies, where the source appeared to copy content from Wikipedia. This is not necessarily a false positive ...", which does not really disclose that CopyPatrol shows reverse copyvios (at least more often than Earwig's Copyvio Detector); that IMHO is what really confused Username Needed on 29 November. Also, because CopyPatrol (to avoid repeats of lengthy compare processing) maintains a database containing the results for every WP article already compared, there appears to be no way in the GUI to initiate a rerun of an article already checked; that database already contains the results of compares made in October 2016 (probably with Earwig's Copyvio Detector rather than CopyPatrol) for the "Retrospect (software)" WP article. (This makes sense for the Turnitin commercial product, because high school and college students must at the least re-submit, presumably as a different document, any paper that has been previously flagged for plagiarism.)
  • After CopyPatrol has its GUI and user documentation fixed, should it be used by the "WP copyvio detection squad"?
    Not unless each squad member allowed to use it has gone through supervised "Learner's Permit" training that teaches how to distinguish between an outside-article-to-WP-article "normal" copyvio and a WP-article-to-outside-article "reverse" copyvio. Moreover, when Username Needed told me on 7 December that he/she had used CopyPatrol on 29 November, I was able to recognize the "reverse" copyvio because of three special circumstances: (1) The Los Angeles County consultant had (stupidly IMHO) inserted a 10 April 2013 date at the top of his copy. (2) I knew that the first 7 pages of the WP "Backup" article—highlighted by the CopyPatrol GUI—had not previously been substantially edited since 2011. (3) I was willing to spend some time on this, because it was allegedly my copyvio. Those circumstances combined to enable me to identify a "reverse" copyvio, but a member of the "WP copyvio detection squad" would be unlikely to encounter them again. So in addition at a minimum CopyPatrol needs a facility to report the File Creation Date of the outside document being compared to the WP article (whose Date Added can be approximated from View History), but I don't know enough about the details of Turnitin to know if that's feasible for a non-WP Web document.
  • Once CopyPatrol is ready to be used by a re-trained "WP copyvio detection squad", should ordinary "squad" members—or anyone—be allowed to communicate with perpetrators of "reverse" copyvios (which CopyPatrol will detect a lot of)?
    Not ordinary members, because the experience reported in my 12:55, 12 December 2018 (UTC) comment shows that that communication requires a more delicate touch than I—for one—possess. IMHO an ordinary "WP copyvio detection squad" member, upon discovering an apparent "reverse" copyvio, must communicate that fact to a supervisory "squad" member having further specialized training and HR talent. However the supervisory squad member should not hesitate to communicate with the perpetrator, because AFAIK (IANAL) a years-long history of failing to pursue "reverse" copyvios may eventually result in a judicial decision that Wikipedia has lost its right to insist upon attribution. DovidBenAvraham (talk) 04:27, 14 December 2018 (UTC)
  • I'm trying to make sense of the wall of text above. It looks like somebody misidentified something as a copyvio when it was in fact a reverse copyvio and fixed the mistake and apologised when this was pointed out to them. In which case there isn't much to discuss here. People are human and make mistakes from time to time, the only thing we can do in that situation is fix the mistake and apologise. Every automated copyvio detection tool is only an aid to human judgement, the results should never be used blindly (the way you're quoting Earwig confidence percentages suggests you're doing this as well). You've now graduated to making very elaborate proposals involving the creation of something called the "WP Copyvio Detection Squad", several ridiculously bureaucratic proposals which will basically ensure nobody does any copyvio detection work, and technical proposals which aren't very workable. There's no need to spend so long harping on over a simple mistake. Hut 8.5 07:57, 14 December 2018 (UTC)
I've watched this entire episode unfold in real time. I wasn't able to keep it from unspooling further, but I couldn't have summarized it better than that. Thanks and well done. Now perhaps we can be done with it. JohnInDC (talk) 11:46, 14 December 2018 (UTC)
I hate to drag this out any further, but I have to object to post-facto revisions of Talk page comments such as this which render later editors' comments cryptic or nonsensical. See Wikipedia:REDACT. JohnInDC (talk) 17:16, 14 December 2018 (UTC)
First, I apologize for the post-facto revisions of my 04:27, 14 December 2018 (UTC) comment. I took to heart Hut 8.5's criticism of the style of that comment, especially since the style had evidently led to his misunderstanding it. If you want drama you can read about it here, but my major revision was to change WP Copyvio Detection Squad—a joke that went over Hut 8.5's head—to "WP copyvio detection squad". I'm not advocating the formation of such a squad, but saying that in fact it already exists—with Hut 8.5 and Username Needed being prominent members. Other than that, my only other substantive edits were adding "(3) I was willing to spend some time on this, because it was allegedly my copyvio" to the second substantive paragraph, adding "(at least more often ...)" and "(which CopyPatrol will detect a lot of)" as parenthetical clarifications, and deleting "Having added ..., and having started edits ...,,"—which I realized were unnecessary clauses that contributed to the "wall of text" impression Hut 8.5 complained of. DovidBenAvraham (talk) 07:58, 15 December 2018 (UTC)

And now let me correct Hut 8.5's mis-perception of what I have been saying in this—retitled by me—section. Although I did initially throw in some criticism of Username Needed, I came to realize that he/she had been sandbagged by trying to use CopyPatrol. Read this section of his/her personal Talk page; it clearly shows that even 7 days later it required me to identify what CopyPatrol had found as a reverse copyvio. That was because I took the time to look at it; being a member of the "squad" requires one to take as little time as possible dealing with an individual possible copyvio, in order to keep up with the ceaseless flow of WP edits. IMHO it took considerable intestinal fortitude for Username Needed to say in his/her 10:52, 12 December 2018 (UTC) comment above in this section "I will not be using the Crow's copyvio tool in the future", considering how much emotional investment members of the "squad" such as CrowCaw and Hut 8.5 have in CopyPatrol.

Here's an executive summary of my 04:27, 14 December 2018 (UTC) "wall of text" comment:

  • CopyPatrol is IMHO currently a piece of c**p, which nobody should be using until its GUI and user documentation are fixed.
  • Because CopyPatrol evidently identifies a lot more reverse copyvios than Earwig's Copyvio Detector, members of the "squad" using CopyPatrol after the required fixes are IMHO going to need additional training and a further enhancement to CopyPatrol.
  • Once an ordinary "squad" member uses a fixed and enhanced CopyPatrol to identify a reverse copyvio, IME he/she will need to turn to a "supervisory" member of the "squad" to perform the delicate task of communicating with the violator (who will likely not be a WP editor). An alternative would be to simply ignore the reverse copyvio, but IMHO a history of such ignorings could eventually result in a court decision that anybody can freely copy a WP article without attribution.

OK, I can read signs, and the one at the top of this page translates as "This page is the baseball dugout for members of the 'squad'; if you're not a member, you need to go elsewhere." Just think of me as a retired applications programmer who wandered into the dugout looking for his lost baseball and stayed to express some maybe-appropriate opinions on why it got lost; I can find my own way out. DovidBenAvraham (talk) 10:13, 15 December 2018 (UTC)

  • Seriously, drop it. Imagine what you'd do if you found out someone had made a spelling mistake in an article. You'd fix it, or point out to that person that there was a spelling mistake, and that would be it. You certainly wouldn't write thousands of words criticising the person who made the spelling mistake, complaining that the person didn't spot their own spelling mistake, railing against people who change article contents for allowing spelling mistakes to be introduced, insisting that nobody be allowed to edit Wikipedia before the GUI is changed to include an automatic spell checker, and requiring that every Wikipedia editor go on a spelling course. I was going to write another paragraph explaining that the "WP Copyvio Detection Squad" doesn't exist and why your suggested technical changes are impossible or won't work, but honestly there's no point. Hut 8.5 11:18, 15 December 2018 (UTC)

What we should learn about CopyPatrol problems from the Backup "copyvio" incident

An invalid analogy and a mis-characterization of proposed solutions are not going to help solve the evident problems with using CopyPatrol as it presently exists. I've therefore started a new section on this page that is meant only to refer as far back as my 04:27, 14 December 2018 (UTC) comment in the immediately-preceding section.

  • Invalid analogy
    If I had made a spelling mistake in an article, I would not have been given a template warning on my personal Talk page with a next-to-last sentence reading "Wikipedia takes spelling mistakes very seriously and persistent violators of our spelling mistake policy will be blocked from editing."
  • Mis-characterization of proposed solutions
    I didn't insist that nobody be allowed to edit Wikipedia [my emphasis] before CopyPatrol is changed to include a facility to report the File Creation Date of the outside document being compared to the WP article, and requiring that every Wikipedia editor [my emphasis] go on a CopyPatrol course. I said "at a minimum CopyPatrol needs a facility to report the File Creation Date of the outside document being compared to the WP article (whose Date Added can be approximated from View History), but I don't know enough about the details of Turnitin to know if that's feasible for a non-WP Web document." I also said "Not unless each squad member allowed to use [CopyPatrol] [my emphasis] has gone through supervised 'Learner's Permit' training that teaches how to distinguish between an outside-article-to-WP-article 'normal' copyvio and a WP-article-to-outside-article 'reverse' copyvio."
  • What I'm actually proposing as a minimum software solution
    What if the Los Angeles County consultant, instead of deleting his 10 April 2013 copy of the Backup article, had phoned me back and said "I wrote that material first, and a Wikipedia editor copied it; delete Wikipedia's article as a copyright violation."? How would I prove that he was lying? CopyPatrol seems to be basically a new version of Earwig's Copyvio Detector that automatically turns on the Use Turnitin option. The Turnitin software "checks for potentially unoriginal content by comparing submitted papers to several databases using a proprietary algorithm. It scans its own databases, and also has licensing agreements with large academic proprietary databases." If Turnitin's own databases include the date a compared-to document was first entered into a database, having that date displayed would be a very convenient way for a member of the "squad" to distinguish between an outside-article-to-WP-article 'normal' copyvio and a WP-article-to-outside-article 'reverse' copyvio. If that's not possible, then the "squad" member examining the CopyPatrol result would have to do what I did, which is to use the Wayback Machine to approximate the date the outside article was first created. It takes time to do that, so it would be nice if CopyPatrol could automate that process. If that automation also turns out to be infeasible, would a member of the "squad" have the time to determine the direction of the copyvio—and if not why should he/she bother using CopyPatrol at all?
  • What I'm actually proposing as a minimum training solution
    In my 10:44, 12 December 2018 (UTC) comment, I quoted the "squad" member who reverted and then un-reverted the alleged copyvio as saying "I was using User:Crow's copypartol [sic] tool, which has quite a complicated interface, and I misread it. I have no idea if there ever was any copyvio, whether it has been removed and whether it is still in the revs." That makes it obvious that the CopyPatrol GUI needs to be improved and given proper user documentation. It also makes it obvious that every "squad" member using CopyPatrol needs to be given sufficient training to correctly identify a "normal" copyvio and a "reverse" copyvio. If the copyvio is identified as a "normal" one, then the "squad" member can presumably be trusted to follow the same procedure the "squad" member originally used for my 29 November edit. If OTOH the copyvio is identified as a "reverse" one, then some member of the "squad" has to be trained to properly communicate with the violator—who is likely not to be a Wikipedia editor. If that training can't be done, then why bother to use CopyPatrol—which because it automatically uses Turnitin has a greater likelihood of detecting "reverse" copyvios than Earwig's Copyvio Detector? DovidBenAvraham (talk) 05:54, 19 December 2018 (UTC)
I think it's sufficient if we remind editors to do their very best to try to avoid mistakes, to be gracious and understanding when their mistakes are pointed out to them - and, to be equally gracious and understanding on the occasions they are drawn in by the good faith mistakes of others. JohnInDC (talk) 12:29, 19 December 2018 (UTC)
Indeed. Hut 8.5 21:47, 19 December 2018 (UTC)

For simplicity, my comment beginning this subsection left out one item that was in the Should CopyPatrol now be used by the "WP copyvio detection squad"? paragraph of my 04:27, 14 December 2018 (UTC) comment: "Also, because CopyPatrol—to avoid repeats of lengthy compare processing—maintains a database containing the results for every WP article already compared, there appears to be no way in the GUI to initiate a rerun of an article already checked; that database already contains the results of compares made in October 2016—probably with Earwig's Copyvio Detector rather than CopyPatrol—for the 'Retrospect (software)' WP article. (This makes sense for the Turnitin commercial product, because high school and college students must at the least re-submit—presumably as a different document—any paper that has been previously flagged for plagiarism.)" Have I missed some way of initiating a CopyPatrol rerun for a WP article that has already been processed? DovidBenAvraham (talk) 21:55, 23 December 2018 (UTC)

DovidBenAvraham, would you please stop editing your posts after you've made them? – if you can't get your quotation marks right the first time, please either leave them out or leave them alone. For the rest of it, backwards copying is a common occurrence, and is adequately handled when it sends an alarm message. This is a volunteer project: there isn't a squad, there isn't any training, and I'm pretty sure there's no value in prolonging this discussion. Could you please just drop it now? Justlettersandnumbers (talk) 12:06, 24 December 2018 (UTC)

Justlettersandnumbers and others: I'm not ready to drop this matter yet, because you have collectively failed to acknowledge that the only way my alleged 29 November copyvio was "adequately handled" was that the volunteer who alleged it bravely realized he/she was in over his/her head and accepted my word that I hadn't made a copyvio. With additional information about the faulty diagnosis supplied 7 days later, I (not that volunteer) determined that the copyvio was in fact backwards copying. I've made a couple of suggestions, based on my 40 years of experience as an applications programmer, that would make such a determination easier for a volunteer, but nobody has been willing to soberly discuss the feasibility of those suggestions.

Moreover, nobody has been willing to discuss the fact that "When issues have been resolved in CopyPatrol, the indicator will not disappear from the feed. Once a page has been flagged for potential copyvio in the feed, it will stay that way." That means that I could phone the Los Angeles County consultant and tell him that it's now safe to copy any of the contents of the "Backup" article back into the pages on his promotional website without attribution, because no WP editor can use CopyPatrol to check that WP article a second time. Of course I'm not going to make that phonecall, because a lot of people—including me—have put effort into the article over many years. Let me point out that I have made 196 edits to that article since I started on 15 November 2017, and for none of them—except for the one on 29 November—was I notified of a backwards copyvio that had evidently existed since 10 April 2013. So Earwig's Copyvio Detector doesn't detect many backwards copyvios when it uses Google instead of Turnitin, and—like the legendary piano tuner Oppornockity—CopyPatrol "tunes but once". How about dealing with that problem, instead of berating me for a trivial grammatical edit to my own comment?

This collective behavior on your part is IMHO a prime illustration of "One motivation for the project is a significant decline in the number of people considered active contributors to the flagship English-language Wikipedia: it has fallen by 40 percent over the past eight years, to about 30,000. Research indicates that the problem is rooted in Wikipedians’ complex bureaucracy and their often hard-line responses to newcomers’ mistakes, enabled by semi-automated tools that make deleting new changes easy." The same three-year-old MIT Technology Review article goes on to quote Aaron Halfaker, a senior research scientist at Wikimedia Foundation, as saying "I suspect the aggressive behavior of Wikipedians doing quality control is because they’re making judgments really fast and they’re not encouraged to have a human interaction with the person".

You volunteers may not like the term "squad" but you are one; I came across a Leaderboard for you the other day, with Diannaa at the top. In the U.S. volunteer fire fighters "go through some or all of the same training as career personnel do". I wish all of you a Happy New Year with better tools, better training, more time, and less of a "drag the wagons into a circle" attitude. DovidBenAvraham (talk) 23:40, 25 December 2018 (UTC)

Hello, Diannaa here. I am going to try to answer a couple of your questions. The CopyPatrol interface uses a subscription to Turnitin, which produces an iThenticate link on each CopyPatrol report. Clicking on the iThenticate link takes you to the report generated by that external service, which donates their material to Wikipedia. Each item listed in the "match overview" box has a "crawled on" date which is the date the matching page was detected by the iThenticate service. If you've got a failproof way of determining the creation date of a webpage, I'd appreciate knowing what it is, because even with several tools at my disposal I am not always able to answer that question. Checking as to who had content first is often but not always possible. The Wikipedia history page can tell us when a particular piece of content was added, but it's not always obvious where it came from, particularly if it was copied or moved from elsewhere on Wikipedia without the required attribution. I do lots of CopyPatrol reports every day and can usually but not always spot when material has been copied from another article or from an old revision of the same article. Mistakes will happen from time to time.
There was a suggested restriction that we should prevent totally new editors from using the CopyPatrol interface, and is still an open ticket, considered low priority. phab:T178700. Since the user who made the error has 3300 edits in a year and a half on Wikipedia, they would not have been stopped from attempting to help at CopyPatrol by such a barrier anyway. I do check the leaderboard and see who's been using the interface to confirm that they are experienced editors, and may perform spot checks on their work, but (like every area of Wikipedia) there is no formal training required.
You seem to imply that Wikipedia editors should take responsibility to contact people who have re-used our content without providing the required attribution. As license holders of the content we have written, we are within our rights to do so, but it's not something I personally would do, as I don't have time, and virtually every bit of material on our wiki has been reproduced in multiple locations. I have no interest in trying to police the whole Internet, content to spend my online time cleaning and maintaining Wikipedia itself.
You offered a link to this discussion and have misunderstood what is being discussed there. What they are talking about is a link to CopyPatrol on the New Pages Feed. This is New Pages Feed. What they are saying is that there's a new feature at New Pages Feed: a little note on the bottom right that offers a link to the CopyPatrol report if one exists. The material you quoted ("When issues have been resolved in CopyPatrol, the indicator will not disappear from the feed. Once a page has been flagged for potential copyvio in the feed, it will stay that way") means that the link at New Page Patrol will not be removed if the CopyPatrol report has been resolved. It doesn't mean that it's not possible for CopyPatrol or iThenticate to check against that website a second time. Sorry you had a bad experience. — Diannaa 🍁 (talk) 16:14, 26 December 2018 (UTC)
Thanks for your helpful response, Diannaa. With regard to the Turnitin/iThenticate service, if that service's "crawled on" date is merely the date the matching page was detected in a WP copyvio search, then that date wouldn't be useful in determining the direction of a copyvio. I was hoping that iThenticate "pre-emptively crawled" the Web. There is a service that "pre-emptively crawls" the Web; it is known as the Wayback Machine. As I said in a comment some distance above, I was able to use the Wayback Machine to approximate the date the Los Angeles County consultant created his non-WP promotional Web pages—which was years after the WP article he had copied without attribution was written. That exercise took me a few minutes of time, which members of the "squad" don't seem to have; I am still hoping that CopyPatrol could be enhanced to do that Wayback Machine lookup automatically for non-WP Web pages preliminarily identified as being involved in a copyvio.
As I've said in a comment some distance above, I personally left a voicemail message for the Los Angeles County consultant. That was sufficient for him to preemptively remove the WP article material from his promotional Web pages, but—as JohnInDC has reminded me—I shouldn't have done that without including in the message the fact that adding an attribution to WP would make the copying legal. I understand that members of the "squad" don't have the time to go after "reverse" copyvios properly, but—as a WP editor who creates new content—I would hate to have the meme spread around that "you can copy any Wikipedia article without attribution; WP won't go after you".
Maybe it's possible to get CopyPatrol to check against a specific WP article a second time, but I can't figure out how to do it. For my own infrequent use, can you please tell me how? DovidBenAvraham (talk) 19:31, 26 December 2018 (UTC)
The CopyPatrol tool is a bot that checks all additions over a certain size using the donated Turnitin service. There is no way to ask that particular bot to check a specific Wikipedia article. The way to check a specific Wikipedia article is to use Earwig's copyvio detector. It does not use the Turnitin service, but rather checks using a Google search engine, for which Google donates to us 10,000 free daily queries. There are advantages and disadvantages to each. For example, Turnitin occasionally finds material that is no longer present on the Internet and was never archived by the Wayback Machine, and Earwig's tool will check material already present in the article, not just new additions. — Diannaa 🍁 (talk) 20:31, 26 December 2018 (UTC)
Thank you again, Diannaa. Forgive me for pursuing this incident, but 40 years in the profession from which I am retired makes me want to understand any system problems I encounter—including procedural problems as well as programming problems. I think what you're saying is that the Web page I linked to gives only the results of CopyPatrol runs on WP articles, and doesn't allow anyone to initiate a CopyPatrol run. That would explain why I couldn't figure out how to get CopyPatrol to check against a specific WP article a second time. I think what you're also implying is that there is some super-secret process that automatically runs CopyPatrol against every newly-written WP article, at a point in time when there is no possibility of a "reverse" copyvio. So there's in fact no need to enhance CopyPatrol to enable a member of the "squad" to diagnose a "reverse" copyvio, because he/she will never encounter one while using CopyPatrol results.
There's only one problem with this conclusion, which is that somehow on 29 November Username_Needed was required to analyze the results of a CopyPatrol run on the "Backup" article—an article which has been somewhat steadily expanded since it was created in 2004. I know Username_Needed hates to get a message that he/she has been mentioned in connection with what I have called this "bungle", but I've ensured he/she will get a message so that he/she will be motivated to explain here how and by whom that CopyPatrol run was made. DovidBenAvraham (talk) 03:07, 27 December 2018 (UTC)
(1) CopyPatrol assesses every edit over a certain size, not just page creations. Therefore reverse copyvios and Wikipedia mirrors are indeed encountered. (2) Username_Needed did not initiate a bot run; the bot runs endlessly (or until a problem occurs; it has currently been running flawlessly for about three months non-stop). It checks all edits over the size limit, and reports its findings at https://tools.wmflabs.org/copypatrol/en. Interested patrollers visit the interface daily and attempt to determine whether or not a violation has occurred for each reported item. — Diannaa 🍁 (talk) 03:45, 27 December 2018 (UTC)
OK, so everything I said in the first two paragraphs of my 19:31, 26 December 2018 (UTC) comment turns out to be still valid. Sorry to have misread Diannaa's 20:31, 26 December 2018 (UTC) comment, folks; Username_Needed need not give any explanation. So how about discussing those first two paragraphs as part of a rational systems analysis of CopyPatrol? DovidBenAvraham (talk) 04:45, 27 December 2018 (UTC)
Thank you for your suggestions and comments. — Diannaa 🍁 (talk) 12:12, 27 December 2018 (UTC)
Actually what I said in the third paragraph of my 19:31, 26 December 2018 (UTC) comment turns out also to be still valid, provided you interpret "I can't figure out how to do it" as meaning "it's impossible for an editor to do it (other than by making an edit that's over the size limit per Diannaa)". DovidBenAvraham (talk) 13:01, 27 December 2018 (UTC)
Now let me discuss the third paragraph of my 19:31, 26 December 2018 (UTC) comment, as part of a rational systems analysis of CopyPatrol. I'll take a real-life example: the copyvios I committed that were caught by Earwig's Copyvio Detector in Report 24898397 for text added to "Retrospect (software)" at 17:12, 06 October 2016 (UTC). What I had done, as Diannaa spotted, was to copy overly-long passages (even though I properly quote-marked and referenced them) from two Retrospect Inc. User's Guides. Within a day or two I had paraphrased the offending passages. Let's assume that those User's Guides could not have been found via a Google search, so that CopyPatrol would have been needed to find the copyvios.
If either Diannaa or I had then wanted to verify that my paraphrasing was satisfactory, how could we have done so? IIRC I did not make "additions over a certain size" (which I think from the "Backup" View History must be approximately 1KB), but simply reworded the offending passages. If so, CopyPatrol would not have re-analyzed the article. Does using Earwig's Copyvio Detector, specifying a Revision ID and check-marking Use Turnitin, actually now do the re-analysis we would have wanted? If not, there's a gaping hole in the "squad"'s software + procedures for being able to follow up on detected "normal" (AKA "forward") copyvios.
I'll let you folks follow up on that question. I've contributed what I can, so I'm leaving this discussion. DovidBenAvraham (talk) 04:36, 28 December 2018 (UTC)
I need to make a correction to the second paragraph of my 04:36, 28 December 2018 (UTC) comment. Belatedly looking at the Revision History for Retrospect (software)—rather than relying on my own memory—I see that I made 6 edits from 16 October 2016 to 18 October 2016 that each were over 1KB. Therefore I presume that CopyPatrol would have automatically run again daily during those 3 days. My edits added back rewritten sections that Diannaa had deleted several days previously because of excessive quotations.
I still think that the second paragraph of my 04:36, 28 December 2018 (UTC) comment discloses a possible hole in the "squad"'s software + procedures for being able to detect—or follow up on detected—"normal" (AKA "forward") copyvios. What if I now, knowing what I know about when CopyPatrol is run for an article, added back the sections as they were originally written—complete with excessive quotations—in small-enough edits that CopyPatrol would not be triggered?
BTW, I notice that certain edits in that Revision History have their size-in-bytes bolded. Even though some of those bolded sizes are somewhat less than 1KB, is that bolding a tipoff that the edits would cause CopyPatrol to be run on them? DovidBenAvraham (talk) 03:12, 29 December 2018 (UTC)
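
The concern raised above about small edits can be illustrated with a short sketch. It assumes, purely for illustration, that the bot applies a fixed per-edit size threshold; the 1,000-byte value, the `Edit` record, and the function name are hypothetical and are not taken from CopyPatrol's actual code.

```python
from dataclasses import dataclass

# Hypothetical threshold: the discussion only says "over a certain size".
SIZE_LIMIT_BYTES = 1000


@dataclass
class Edit:
    rev_id: int
    bytes_added: int  # net size change of the edit, as shown in the page history


def edits_to_check(edits, limit=SIZE_LIMIT_BYTES):
    """Return the edits that a simple per-edit size filter would send for checking."""
    return [e for e in edits if e.bytes_added > limit]


# The same 2,400 bytes added either as one edit or as three smaller ones.
one_large = [Edit(1, 2400)]
many_small = [Edit(2, 800), Edit(3, 800), Edit(4, 800)]

flagged_large = edits_to_check(one_large)
flagged_small = edits_to_check(many_small)
```

Under this assumption the single large edit is selected, while the three smaller edits are not, even though they add the same amount of text in total. Whether the real bot behaves this way is the open question.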

With regard to the first paragraph of my 19:31, 26 December 2018 (UTC) comment, iThenticate's own FAQ Web page, in the "Against what resources will my manuscript be compared?" paragraph, says "iThenticate also maintains its own web crawler, indexing over 10 million web pages daily and totalling over 50 billion web pages." Since iThenticate thus "pre-emptively crawls" the Web, it's likely that company could supply the date a particular non-WP Web page was first crawled. As I said in that paragraph, having that date would be a great time-saver for a "squad" member needing to determine whether a detected copyvio was in the "normal" or "backwards" direction. Diannaa, don't you think some "squad" leader should ask iThenticate if it can supply those dates? DovidBenAvraham (talk) 04:03, 2 January 2019 (UTC)
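
The date comparison described here could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the 14-digit timestamp format is the one the Wayback Machine uses in its capture URLs, and an earliest-capture date is only an upper bound on when a page was created, so the result is a hint for a human reviewer rather than a verdict.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_capture_timestamp(ts: str) -> datetime:
    """Parse a 14-digit YYYYMMDDhhmmss capture timestamp as UTC."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def likely_direction(wp_text_added: datetime,
                     earliest_capture: Optional[datetime]) -> str:
    """Give a rough hint about which side had the text first.

    wp_text_added:    when the matching text first appeared in the Wikipedia history.
    earliest_capture: earliest known crawl of the external page, or None if unknown.
    """
    if earliest_capture is None:
        return "undetermined"
    if earliest_capture < wp_text_added:
        # The external page existed before the Wikipedia text was added.
        return "forward"
    # The page was first seen only after the text was on Wikipedia; it may
    # still have existed earlier, so this needs human review.
    return "possibly backwards"


wp_added = datetime(2008, 3, 1, tzinfo=timezone.utc)
hint = likely_direction(wp_added, parse_capture_timestamp("20150601000000"))
```

A reviewer would still need to confirm that the early capture actually contained the matching text, since a page can change between crawls.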

With regard to the "What I'm actually proposing as a minimum training solution" paragraph in my comment beginning this sub-section, I forgot to mention one obvious training deficiency—which is also an obvious software deficiency in Template:uw-copyright. As I mentioned in the section to which this sub-section belongs, Username_Needed did not give the URL of the document I was supposed to have copied. There is no parameter in Template:uw-copyright for the URL of the document, and Username_Needed did not include the URL in the Edit Summary for his/her revert. Therefore for 7 days I was left totally in the dark about the nature of my alleged copyvio. And it was only because Username_Needed—after 7 days—gave me a link to the CopyPatrol listing that I was able to see the URL and determine that the copyvio was in fact in the backwards direction. Therefore I propose that (1) every new member of the "squad" should have one thing drilled into his/her head: "Give the URL in the Edit Summary!" and (2) Diannaa should have Template:uw-copyright modified to include a URL parameter. Based on my own experience, I'd say there is a 98% chance that any WP editor falsely accused of a copyvio will simply stop editing Wikipedia—unless that editor is determined to keep putting self-serving material into WP articles. So unless these two solutions are implemented, my prediction is that Wikipedia will continue its transition into a less-disciplined version of Facebook—which is IMHO a major reason the non-editing public has little respect for the content of WP articles. And AFAIK Facebook (which I don't use) has no position for volunteer "squaddies"—so you'll have to find something else to do with your spare time. DovidBenAvraham (talk) 11:56, 8 February 2019 (UTC)

What happens when an author wants to have his own article published

I hope to head this off at the pass before I start creating a draft.

I've been approached by a Michael M Wood, who is the author of this article on the Kentucky Historical Society website (http://kentuckyancestors.org/colonel-william-l-farrow-pioneer-soldier-statesman/); he would like to see this article published. The man is apparently a notable, but I do believe that he is indeed notable, and with his permission an article could be created with extensive copy editing, sans the genealogy stuff. If Michael M Wood grants permission, can I do a copy edit, quoting liberally? Wiki commons has a permission slip (OTRS) for images and the link; is there something similar for mainspace? Thanks. Oldperson (talk) 22:56, 6 February 2019 (UTC)

See WP:IOWN for the procedures for a copyright owner to grant us the rights to use content. Hut 8.5 07:38, 7 February 2019 (UTC)
Hut, thanks, but it is not I who is the author. The author has an article on the Kentucky Historical Society webpage and is interested in having that article published. Your link takes me to an explanation that would apply were I the author trying to publish an article. This is not the case. Oldperson (talk) 17:05, 8 February 2019 (UTC)
Sure, but the same principles apply. If the author wants the article to be published here as an article and not deleted as a copyright violation then either he can post a notice at the source releasing it under the appropriate licences or he can email OTRS as described in the link. Hut 8.5 18:25, 8 February 2019 (UTC)

Input requested in discussion regarding links to potential copyright violations

There is a discussion at WT:C#CiteSeerX copyrights and linking regarding external links to possible copyright violations (copies of academic papers on CiteSeerX, of which some fraction are not properly licensed / inappropriately uploaded) in the context of bot-assisted additions of such links to citation templates. The discussion could use both wider input in general, and input specifically from editors who are familiar with Wikipedia's policies on copyright and experience in the practical application of them. If you believe there are other talk pages which could profitably be solicited for input on this I would appreciate a pointer. --Xover (talk) 17:30, 19 March 2019 (UTC)

😴😴😴

RfC on copyright status of Company-Histories.com

There is an RfC on the copyright status of Company-Histories.com, which appears to republish text from Gale's International Directory of Company Histories. If you're interested, please participate at WP:RSN § Rfc: company-histories.com. — Newslinger talk 14:20, 1 April 2019 (UTC)

Discussion on copyright status of biography.jrank.org

There is a discussion on the copyright status of biography.jrank.org, which appears to republish text from various Gale publications. If you're interested, please participate at Wikipedia:Reliable sources/Noticeboard § https://biography.jrank.org. — Newslinger talk 17:30, 10 April 2019 (UTC)

Can I be promoted to a CCI clerk?

As some may know, I've been active in this area and have been cleaning up reports. I even finished two, Wikipedia:Contributor copyright investigations/BigButterfly and Wikipedia:Contributor copyright investigations/BeeCeePhoto. However, I can only mark them as completed and am unable to archive them due to only clerks being able to do so, and only one clerk (Lazygas) is currently active, and they seem to be busy with other stuff. Thus, I am asking the regulars here if they believe I meet the qualifications to become a clerk. I do not have any sort of past with copyright issues, and I believe I am knowledgeable enough in the field of copyvio to be qualified for the tools. Moved from the CCI talk page to here because it seems far more active.💵Money💵emoji💵💸 20:24, 4 May 2019 (UTC) Note: (Pinging @Sphilbrick:, @Diannaa:, and @Justlettersandnumbers:)💵Money💵emoji💵💸 13:39, 11 May 2019 (UTC)

Since there's been no response to your inquiry, I've gone ahead and archived the 4 completed cases for you. Wizardman used to look after the clerking, but he's now retired or semi-retired from Wikipedia. — Diannaa 🍁 (talk) 16:21, 25 May 2019 (UTC)
P.S. Thank you very much for your work at CCI. Appreciated — Diannaa 🍁 (talk) 16:23, 25 May 2019 (UTC)

large wave of presumptive deletions

As a heads up, I'm going to be nominating and tossing down articles from Wikipedia:Contributor copyright investigations/20110727 over the next few months, as many have fatal offline copyvios. 💵Money💵emoji💵💸 21:38, 30 June 2019 (UTC)

Circumstantial evidence and Copyvio

I have two questions: a general one about circumstantial evidence and CV, and a specific one regarding the application of such evidence in a particular case. I searched the archives here and at COPYVIO for circumstantial and nothing came up, so posting here.

The general question is: if you have circumstantial evidence that points to a possible copyvio, but hard evidence is difficult or impossible to come by, what should one do?

This question was prompted by a specific case, which appears to straddle the boundary between a possible CWW copy-paste (which I know how to deal with, and don't need a response about), a content dispute, and a possible plagiarism from an external site for which only circumstantial evidence exists. Unsure where to raise that case, I started an Rfd. I'd appreciate feedback about this specific case at this Rfd, or about the general question, below. Ping, please. Thanks, Mathglot (talk) 19:28, 18 July 2019 (UTC)

Suspected copyright problems at Her Secret

This text has way too many similarities with that text; entire sentences are almost the same. The article is some days old; a backwardscopy is extremely unlikely; the text asserts an older copyright term (2012). Lurking shadow (talk) 18:42, 11 August 2019 (UTC)

Article cleaned — JJMC89(T·C) 21:10, 11 August 2019 (UTC)

Suspected copyright problems at Baleleng

This edit is way too close to this paper, which is definitely older (2017) than the 2019 article. The text comes from section Analysis D. A copyright violations scan uncovered these similarities; they might be a bit too close to the source, and that source is no backwardscopy of Wikipedia either (because it is evidently older). Lurking shadow (talk) 23:12, 11 August 2019 (UTC)

Dealt with. Nthep (talk) 16:58, 13 August 2019 (UTC)

Noticeboard discussion on copyright status of documents hosted by Semantic Scholar

There is a noticeboard discussion on the copyright status of documents hosted by Semantic Scholar. If you're interested, please participate at Wikipedia:Reliable sources/Noticeboard § Semantic Scholar. — Newslinger talk 06:36, 25 August 2019 (UTC)

A user requested a RevDel at this article, for a copyvio from a US patent. Per Copyright on the content of patents and in the context of patent prosecution, I am not certain we need to delete the text and disabled the request for the moment. Could any experts comment please? —Kusma (t·c) 12:02, 24 September 2019 (UTC)

It appears that in this diff User:Peter grotzinger added copy-paste information from patent US8501186B2 (I tried using WikiBlame to find the diff but struggled so apologies). However, upon asking for a {{copyvio-revdel}} which Kusma responded to, Kusma pointed out: Copyright on the content of patents and in the context of patent prosecution, and that the text that was added predates the patent filing. I responded that "it appears at first glance that the addition predates the patent but actually the patent US8501186B2 was first filed on 2009-05-05 as a priority to IN2009MU01184A" (See here for full correspondence). I now wonder, is the diff above copyvio? If so I should rv this removal of it, if not the diff should be deleted from the article history. {{ping|waddie96}} {talk} 16:37, 24 September 2019 (UTC)

Updates to policy

Considering that as of 2019, a copyright holder can no longer exercise legal remedy to copyright without having first registered the copyright, (see the article here: https://www.natlawreview.com/article/us-supreme-court-holds-copyrights-must-be-registered-plaintiffs-can-file) it would appear that the copyright policy should be amended to require that a check for a copyright registration be conducted before any copyright action may be taken, in order to be consistent with the current limitations on copyright. This policy also needs to be amended to reflect fair use doctrine, which is not a violation of copyright under Section 107 of the U.S. copyright act. Notably, fair use is protected both under the Berne Convention as well as WIPO. 66.90.153.184 (talk) 02:21, 29 September 2019 (UTC)

Wikipedia page "later" published in scientific paper: copyvio?

{{cv-unsure|Steven Fruitsmaak (Reply) 19:56, 20 July 2019 (UTC)|https://en.wikipedia.org/w/index.php?title=Exome_sequencing&oldid=347638746}}

The article Exome sequencing was created by User:SarahKusala (contribs).

As can be seen on the article's talk page, User:Moonriddengirl has tagged the article as a possible copyvio of a scientific paper "Pussegoda KA. Exome sequencing: locating causative genes in rare disorders. Clin Genet. 2010 Jul;78(1):32-3."

Without going into further details about the real-life identity of this user, it is clear that both the Wikipedia page, as well as this scientific paper, are written by the same person/author.

The section Exome_sequencing#Case_studies still contains about 11 sentences that are literally identical between the Wikipedia page and the 2010 publication.

The Wikipedia page was started on March 4, 2010. The scientific paper was published online June 7, 2010. So basically, the same author released the content on Wikipedia first. However, we do not currently know exactly when the scientific paper was submitted.

I think this is a very unusual situation and so I am posting it here as well as on several other pages to let the copyright experts give their opinions on how to resolve this issue.

Most obviously I think a rewrite of the Exome_sequencing#Case_studies would, in any case, be warranted since it is not such a good section anyway. --Steven Fruitsmaak (Reply) 19:56, 20 July 2019 (UTC)

See https://en.wikipedia.org/wiki/User:SarahKusala/Exome_Sequencing That is a fake article. QuackGuru (talk) 20:05, 20 July 2019 (UTC)
@User:QuackGuru if you mean that it is like a review paper that the author wrote and turned into a Wikipedia page, yes then I think you are right. Proves how things went. But this was at the time original work not published yet elsewhere, prepared in a user Sandbox. Question is, how do we deal with this situation? --Steven Fruitsmaak (Reply) 20:08, 20 July 2019 (UTC)
See WP:FAKEARTICLE. The first step is to nominate this page for deletion or request an admin to delete it. QuackGuru (talk) 20:12, 20 July 2019 (UTC)
@User:QuackGuru I do not fully agree that this is your typical "fake article". The author is a genuine researcher who created an article on an important topic which was missing at that time. You are referring to a sandbox sub-page of a user page, which is allowed. The Exome sequencing article itself is clearly a long-standing, well developed article of which only a few sentences in a minor subsection, have been copied in a scientific paper published after they were first published on WP. --Steven Fruitsmaak (Reply) 20:18, 20 July 2019 (UTC)
Abandoned drafts that are similar to the original article are most likely a fake article. The copyright content in the article can be tagged like this. QuackGuru (talk) 20:46, 20 July 2019 (UTC)
Has anyone emailed Kusala Pussegoda to ask? They are contactable via the Ottawa Methods Centre, so I'm happy to contact them in case they can shed any light. If the text really was written in the sandbox before it was published in a journal, then my understanding is that the CC BY-SA license takes precedence (and technically the journal is in copyvio, though it'd probably not be worth their while to do anything about it). T.Shafee(Evo&Evo)talk 23:36, 20 July 2019 (UTC)
An author can publish their same work as many times as they wish – under a CC-BY-SA licence or not. None of them are copyright violations. It simply doesn't matter which was published first; as long as the author remains the copyright holder, they retain the right to copy the material and to grant others a licence to copy it. --RexxS (talk) 01:46, 21 July 2019 (UTC)
Publishing in a journal will normally involve some sharing of copyright with the journal, unless it is a fully open one. But if the WP publication preceded that publication it is not a copyvio -or possibly, depending what contract was agreed with the journal and when, the author violated her contract with the journal. Either way, WP is ok. Quack, please stop quacking nonsense. The userspace draft is not a fake article nor an abandoned draft. It is just a draft (that was soon moved to article space) & I know of no policy against keeping those. Johnbod (talk) 04:06, 21 July 2019 (UTC)
If they published it here first and in the journal second, then it is under a CC BY SA license as they still owned the content when they published it. Whatever deal they have with the journal is not our problem. Doc James (talk · contribs · email) 04:49, 21 July 2019 (UTC)
I agree with Doc James. It doesn't matter when the journal article was submitted, what matters is whether the publication in WP occurred before any copyright assignment to the journal. Most (all?) journals ask for copyright assignment at the time of page proofs, which is typically at most 2-3 months before publication. The online version of the paper doesn't state when it was accepted, which would have been a useful clue to the date of page proofs, but in any case it seems very unlikely that copyright assignment happened as early as March.Logophile59 (talk) 21:00, 21 July 2019 (UTC)
agree with Doc James as well--Ozzie10aaaa (talk) 10:34, 22 July 2019 (UTC)

Full ref

Pussegoda, KA (July 2010). "Exome sequencing: locating causative genes in rare disorders". Clinical genetics. 78 (1): 32–3. doi:10.1111/j.1399-0004.2010.01414_1.x. PMID 20597920.

Doc James (talk · contribs · email) 04:51, 21 July 2019 (UTC)

@T.Shafee(Evo&Evo) Yes I have... --Steven Fruitsmaak (Reply) 19:37, 21 July 2019 (UTC)

@RexxS and @Johnbod, okay seems that everything is okay then. --Steven Fruitsmaak (Reply) 19:42, 21 July 2019 (UTC)

  • The good doctor is correct. If they published it here first, then there is a copyright violation, but the violation is republishing CCBYSA content without providing attribution. Does anyone care? Maybe not, since it would seem unlikely that they would sue themselves for violating the terms of their own license. The only one who may legitimately care is the journal, in as much as publishing content from Wikipedia reflects on their editorial integrity (and perhaps the fact that this could prevent them from taking legal action if someone else plagiarized the content from their own article). GMGtalk 13:13, 22 July 2019 (UTC)
Sorry, but this is incorrect. An author may choose to license his copyrighted work to different people under different licenses. The author licensed the work to Wikipedia (and by extension to everybody) under CC-BY-SA. The author also licensed the work to the journal under whatever mechanism the author and the journal agreed to. There is no requirement for the journal to comply with CC-BY-SA. A third party cannot copy from the journal, with or without attribution, but a third party can copy the identical text from Wikipedia, with attribution that complies with the CC-BY-SA. The only possible legal or contractual issue is between the author and the journal, if the journal required a disclosure of the prior license to WP and the author did not so disclose. Such a problem does not affect WP. If the author had conveyed ownership of the copyright to the journal prior to releasing it under CC-BY-SA, then the author no longer had the right to release it under CC-BY-SA, as the right to provide a license would have conveyed to the new copyright owner. But we don't think this happened. -Arch dude (talk) 03:50, 9 August 2019 (UTC)
Hey, we're a preprint server! A lot of journals have explicit copyright-agreement exceptions for preprint licenses, to avoid having to turn down a paper just because it's already been released on arXiv. Such an exception might cover Wikipedia in this case, leaving the author in no legal trouble at all. I think we undersell our usefulness to researchers. HLHJ (talk) 02:29, 29 September 2019 (UTC)

Resurrected redirect history

May I ask for views at Talk:Electronic cigarette and e-cigarette liquid marketing#Article re-write and disconnection from history, please? HLHJ (talk) 03:46, 29 September 2019 (UTC)

Input wanted

I would like some editors experienced with copyright issues to comment here: Talk:List of Advanced Dungeons & Dragons 2nd edition monsters#Is this list a copyright violation?. Any input is welcome.4meter4 (talk) 00:57, 23 October 2019 (UTC)

IPCMS

There was a copyright infringement in the French fr:IPCMS article. See Talk:Strasbourg Institute of Material Physics and Chemistry. — Preceding unsigned comment added by 130.79.153.102 (talk) 16:34, 28 October 2019 (UTC)

List of English words of Persian origin

In the List of English words of Persian origin article, there are areas cited with ©. If these were normal references, there'd be no need for that and also judging from the article as a whole it seems someone has been copying material from copyrighted source material. — Preceding unsigned comment added by 84.82.12.118 (talk) 18:29, 5 November 2019 (UTC)

Potential Copyright Issues related to text on the Voynich manuscript page

Dear all- yesterday, I deleted a large section of material on the Voynich manuscript page which was comprised of three paragraphs and an image which were nothing but unsourced claims about alleged connections between Asiatic languages and the Voynich manuscript. The content had been on the website essentially unchallenged since 2004. I was told ([3]) that the material might actually be a copyright violation and was directed to inform you all of the problem.

I invite you to take a look at my triage work on that page ([4]). One user said, "I would guess that there are more copyright violations in the article."

Thanks for any input. Geographyinitiative (talk) 22:29, 23 November 2019 (UTC)

See also [5]. Geographyinitiative (talk) 22:31, 23 November 2019 (UTC)

On Talk:National Climate Change Secretariat#Indicated copyright issue I stated my reasons for rejecting a copyvio revdel request on National Climate Change Secretariat. I invite review of this rejection. DES (talk)DESiegel Contribs 19:34, 26 November 2019 (UTC)

Non-offending page tagged by EranBot; easy fix?

I had a draftspace page tagged by EranBot, but it's not a copyright violation (it's part of a slightly modified EB1911 page, just like thousands of other EN WP pages). But it didn't get added to today's list of detected problems, so I can't comment there.

Would it be acceptable for me to CopyPatrol the page and mark it No Action Needed, even though I'm the only author? David Brooks (talk) 21:13, 1 December 2019 (UTC)

Withdrawn: another CopyPatroller reviewed the report. I realize in general it's not a good idea for an editor to decide that their own edit is not a violation, even if clearly defensible. David Brooks (talk) 16:37, 3 December 2019 (UTC)

Circumstellar habitable zone

On Talk:Circumstellar habitable zone (diff), a user has expressed concern that the second paragraph of Circumstellar habitable zone#Determination might be a copyright violation, copied from here (page 185). He/she has also suggested "all of other editors to refrain from additional editing." I'm not sure if it's a copyright violation; I'm not sure if it needs to be removed and/or revdeled. Could someone take a look and reply there? Thanks, 153.214.164.48 (talk) 09:12, 19 December 2019 (UTC)

Snowycats has requested a RD1 rev del on this page, but RD1 states "If redacting a revision would remove any contributor's attribution, this criterion cannot be used." In this case, all but the current version of the text include the copyright violation, which would remove attributions to numerous other contributors. The article was previously deleted, and then restored; I ponder whether deleting it again and starting afresh might be a better strategy? (Although thinking about it, that would remove attributions no less!) Harrias talk 09:43, 26 December 2019 (UTC)

Attribution just requires that the names of the authors of the content be available, not that the text of each revision be available. As long as the revdel is done on the text and not the name of the editor who added the material there's no problem. Hut 8.5 16:03, 26 December 2019 (UTC)

3O appreciated on whether this revision deletion is justified

At Talk:Pinsk#CCI_review, I asked whether removing an old paragraph or two is sufficient grounds for revision deletion of ~15 years of the article's history, with ~250 edits by dozens of editors. I'd suggest commenting there, not here, to keep the discussion in one place. TIA. --Piotr Konieczny aka Prokonsul Piotrus| reply here 17:44, 14 February 2020 (UTC)

Technically speaking, yes, this is the ground for revision-deletion (the names of contributors remain of course visible).--Ymblanter (talk) 09:07, 15 February 2020 (UTC)

Google Play

I am not an expert in copyvios. So while RanPPing, I found an article with a possible 43.8% chance of copyvio. The source of concern? Google Play. Now this isn't a reliable source, but the lead section is exactly the same as the description. {{replyto}} Can I Log In's (talk) page 04:49, 5 April 2020 (UTC)

  • This isn't a copyvio, you can see from the link you provided that the text originates from Wikipedia: Description provided by Wikipedia under Creative Commons Attribution CC-BY-SA 4.0 (see Riverhead (soundtrack)). This is fairly common, there are lots of sites out there which use text from Wikipedia, often without explicitly saying so. Automated tools like the one you linked to are a long way from perfect and you should never rely on them by themselves. Hut 8.5 13:11, 5 April 2020 (UTC)

Pretty much the entire History section of Whiteman Air Force Base is directly copy-pasted from the official Air Force Whiteman website. I know from reading the Wikipedia policy that plagiarized stuff is supposed to be deleted, but I hesitated because I saw that some public domain material might be an exception? Although I don't know if this qualifies as public domain. Looking for guidance. Thank you. JimKaatFan (talk) 20:32, 27 March 2020 (UTC)

It is acceptable to copy PD material as long as it is attributed. The attribution is in the "Other sources" section of the article. — JJMC89(T·C) 01:18, 28 March 2020 (UTC)
JimKaatFan, the material is PD, but the page was deleted because of copyright violations previously; it had been implicated in the Bwmoll3 CCI, aka 20130819. The deleted content was restored in what seems to have been block evasion, meaning there are now likely copyvios in the article. I'll figure out what to do tomorrow, but it would probably be best to roll the article back to before the ip edit. I'll note that, unfortunately, most of our articles on air force bases and air squadrons have been thoroughly corrupted by that CCI. Moneytrees🌴Talk🌲Help out at CCI! 03:19, 28 March 2020 (UTC)
Okay, so, does that mean the material should be removed, or at least rewritten, because of copyright violations? I'm not clear on this. But I'm willing to re-write it if that's what is needed. JimKaatFan (talk) 14:22, 28 March 2020 (UTC)
JimKaatFan, Yeah (sorry for not getting back to you sooner), I removed it, as violations from this user can be particularly tricky to find. The material from the website is in the public domain and can be used in a rewrite. Moneytrees🌴Talk🌲Help out at CCI! 00:37, 30 March 2020 (UTC)
Hi Moneytrees🌴, I think I'm done reconstructing the history section. Would you mind taking a look at it and telling me what you think? Thanks. JimKaatFan (talk) 13:02, 5 April 2020 (UTC)
JimKaatFan, I don't know too much when it comes to these kinds of articles, but copyright-wise it has no issues. Thank you for your work on this issue! Moneytrees🌴Talk🌲Help out at CCI! 15:54, 5 April 2020 (UTC)

Need an expert of copyright of the UK

Good evening, I was wondering whether this photograph, which comes from the City of Edinburgh Council's website, is free because of UK-GOV. Can anyone advise? -- SERGIO aka the Black Cat 16:35, 16 April 2020 (UTC)

Blackcat, no, unfortunately, per the site's notice; non-commercial material isn’t compatible with our license, so a lower quality version would have to be used instead. Moneytrees🌴Talk🌲Help out at CCI! 16:49, 16 April 2020 (UTC)
Ok @Moneytrees:, thanks for your quick reply. -- SERGIO aka the Black Cat 16:57, 16 April 2020 (UTC)

Copyright in another language?

Could somebody please advise me as to whether an article that is clearly a word-for-word translation of a webpage in another language qualifies for speedy deletion under WP:G12? The article in question is Julián David Trujillo Moreno (Thánatos), which (as a Spanish speaker) I can tell is a direct translation of this web page, and I think it is very likely that both the web page and the Wikipedia article are autobiographical. Richard3120 (talk) 20:21, 27 April 2020 (UTC)

Richard3120, Yes, it definitely does. Thank you for bringing this here; I've deleted the page. Moneytrees🌴Talk🌲Help out at CCI! 20:33, 27 April 2020 (UTC)
@Moneytrees: many thanks – if it hadn't qualified I would have taken it straight to AfD as it was clearly self-promotional... the web page is a directory listing from the cultural center of the artist's home town, which I suspect was also a biography provided by the artist. Richard3120 (talk) 20:35, 27 April 2020 (UTC)

Irish Supreme Court cases articles

Hello WT:CP, I'm cross-posting this to New Pages Patrol as well as here at the suggestion of User:Hut 8.5.

Over the past academic year I have been working with students and law faculty to create articles on Irish Supreme Court cases. There is an explanation of the rationale and pedagogy behind this project on the front of my user page. The key point is that non-disruptive contributions were my priority.

When the project started I took feedback from WikiProject Law based on a few early articles. They pointed out the need to avoid WP:QUOTEFARM. I worked with the student editors to strike a balance between this guidance and the advice from my law faculty colleagues that it is often important to preserve and quote the exact language used in a decision.

In this respect, one thing I have noticed is that Earwig is giving a false positive on some of the articles due to the presence of direct quotes from the case decision. The case decisions are from the British and Irish Legal Information Institute (BAILII), and its usage rules for Irish decisions follow the Courts Service of Ireland regulations. Its rules allow for re-use and quoting as long as citations are provided. Earwig is also flagging common legal phrases and proper nouns. We are checking every article with Earwig to make sure that any flags it puts up relate to material that is properly quoted and cited or refer to common legal phrases and proper nouns. If anyone has concerns about content please let me know. Alternatively, I will have all the articles on my watchlist so I can respond to concerns on individual article talk pages.

I will be moving articles to mainspace starting later today and tomorrow, and over the course of May. The law faculty and I are working through them in chunks to do the Earwig checks, formatting, and any final tidying.

The first batch of articles is (I will add links as I move them):

  • Okunade v. Minister for Justice & Others
  • Comcast Int. Holdings v Minister for Public Enterprise & ors and Persona Digital Telephony Ltd v Minister for Public Enterprise & ors
  • Bederev v Ireland
  • The Health (amendment) (No. 2) Bill 2004
  • AAA & Anor v Minister for Justice & Ors
  • DPP v McLoughlin
  • Attorney General v Oldridge
  • O'Connell v The Turf Club
  • Roche v Roche
  • Minister for Justice Equality and Law Reform v Bailey
  • N.V.H v Minister for Justice & Equality
  • Dunne v Minister for the Environment, Heritage and Local Government
  • Minister for Justice, Equality and Law Reform v Murphy
  • C.K. v J.K
  • D.C v Director of Public Prosecutions
  • Delahunty v Player and Willis (Ireland) Ltd
  • Braddish v Director of Public Prosecutions (DPP)

AugusteBlanqui (talk) 10:56, 30 April 2020 (UTC)

Considering the court cases are free to be reused, it may be a good idea to add the website to User:EarwigBot/Copyvios/Exclusions to prevent false positives. That is, assuming there is no other material on the website that should not be copied. --MrClog (talk) 11:21, 30 April 2020 (UTC)
Thanks @MrClog:. I don't think there is any protected content. I've posted a request on Earwig. AugusteBlanqui (talk) 13:44, 30 April 2020 (UTC)
  • Since these are being used as cited quotations in the articles, they're really no different than any other copyrighted source being quoted. For purposes of Earwig ignoring the direct quotes, that's probably good. CrowCaw 14:07, 30 April 2020 (UTC)

Unable to list a problem here

I discovered text in a Wikipedia article that appears to have been copied verbatim from the website of the subject of the article. Unfortunately it seems that anonymous editors cannot file a report as it requires the creation of a page. So I will note here that the problem is at Lagos State Ferry Services Corporation. Please take any necessary action. 37.152.231.22 (talk) 11:08, 12 May 2020 (UTC)

I've listed it at Wikipedia:Copyright problems/2020 May 12 for you. — JJMC89(T·C) 02:20, 13 May 2020 (UTC)

Possible copyvio in TCEC season articles

Can I get some help with whether the material in the "notable games" sections of TCEC Season 14, TCEC Season 15, TCEC Season 16 and TCEC Season 17 (the last sections of all of those articles) is a copyvio issue? The sections don't directly quote from the cited source, but the content is immediately identifiable as coming from it. If they are copyvio, what is to be done? Banedon (talk) 23:01, 3 June 2020 (UTC)

Copypaste from fandom.com

Greeny Phatom: The Movie is copied in its entirety from https://dreamfiction.fandom.com/wiki/Greeny_Phatom:_The_Movie. dreamfiction.fandom.com has a CC-SA 3.0 license, so is this a copyvio or just a hoax? I mean, seriously, a 729 million dollar gross? That would be three-fourths of what The Lion King made! On that alone I'm going to nominate it under WP:G3. But is it a copyvio? - Sumanuil (talk) 04:15, 5 June 2020 (UTC)

Deletion of pages with default notability

@MER-C: Given the recent 3+ month spate of deletions (mostly Billy Hathorn's work, I believe) of pages with default notability under WP:POLITICIAN, is there a way to increase publicity (or some other method) so that these pages get rewritten, rather than all content deleted? I'd do it myself (and have for a few), but I don't have the time. In this latest batch, Odon Bacqué, Girod Jackson III, Linda Harper-Brown, Rebecca Petty and Trent Ashby have default notability (and look to me to be pretty well-sourced), probably along with a few of the others. While we're a very long slog away from having all state legislators on WP, going backwards doesn't help. And then there are cases like James Benjamin Aswell and Joe Waggonner (both previously poorly sourced), where members of congress get deleted because very few people monitor this page. Star Garnet (talk) 20:12, 29 June 2020 (UTC)

  • There have also been quite a few prominent politicians listed on Political party strength in Louisiana who have had their articles deleted due to copyright problems. — Preceding unsigned comment added by 2601:241:301:4360:b40c:873f:543d:d614 (talk)
    • A copyvio blanking places a notice that is much more prominent than being sent to AFD. Yet nobody is contesting the deletions via any means - not even on the respective talk pages, nor are they providing a short rewrite that will stop the links from going red when I clear the listings a week later. Wikipedia cannot host copyright violating articles regardless of the notability of the subject they cover. It is policy that major contributions of repeat copyright infringers may be removed indiscriminately. This is the proper venue for copyright discussions. MER-C 19:40, 30 June 2020 (UTC)
      • That would require an experienced and interested editor to view a page that probably goes weeks without a single view (WP:NTEMP, etc.), or to be monitoring this page on the off-chance something relevant gets posted. Leaving that aside, I have a few questions. What needs to be done to an apparently well-sourced article to remove copyvio concerns? What were User:Billy Hathorn's signature copyvios? Long excerpts, poor sourcing...? I see on their investigation there are multitudes of pages entangled in this. Roughly how many pages are facing potential deletion? There doesn't appear to be a hidden cat for these pages, but is it possible to get one? If there was, with the help of petscan, I could probably get a group at WP:POLITICS to work on this so long as there's a definable objective. Star Garnet (talk) 00:42, 1 July 2020 (UTC)
        Star Garnet, As someone who has worked in the trenches of this CCI, I can answer your questions:
        1. I would pare the article down to its most essential, non-copyrightable facts. Billy tended to write in an archaic, quickly identifiable way that reads more like a story or thesis; see this version of Bill Dodd for example, and note the massive walls of text and the "Notes" section.
        2. "Long excerpts, poor sourcing...?" Yep, and those are just the tip of the iceberg. Billy covered all bases when it came to bad sourcing: early on, it was improper formatting, bare URLs, self-citations, and blatant pastes; later on he moved towards close paraphrasing and extreme refbombing, as many of his articles were being sent to AfD (look at his talk page history if you want hundreds of examples). And since some of the sources he copied from can't be scanned by Earwig (offline books, newspapers, journals, unattributed unpublished interviews he did with article subjects), there are many presumptive removals.
        3. You're not going to like this, but the number that could be deleted is around 2500+, as there are 5499 articles listed at his CCI (according to the bot), and he created most of them. Worse, he's an LTA with hundreds of SPAs, high-edit socks, and IPs, whose edits also need to be accounted for. I'm pretty sure he's still socking to this day; I think I last saw an IP I'm pretty sure was him back in April? These deletions have been going on for about a year now, and the reason you're seeing more is that his socks and post-unban edits are now listed. The good news is that most of the articles set to be deleted are non-notable local politicians who held low-level offices.
        4. "There doesn't appear to be a hidden cat for these pages, but is it possible to get one?" That's a good idea; I think something was done previously for another massive CCI a while ago. I'll look into that. Help would be greatly appreciated; as you can tell, this has been quite the endeavor.
        Hopefully that clears some stuff up (and sorry for the novel!) Moneytrees🌴Talk🌲Help out at CCI! 03:06, 1 July 2020 (UTC)

subslikescript.com and other subtitle sites

Recently subslikescript.com has been added to several articles as a reference, e.g. on Craigslist and Larry David. They describe themselves as 'a simple search engine of subtitles available at a wide variety (of) websites', but they serve the files rather than embedding them or linking to them; they don't reveal the sources. Their page on copyright/DMCA does not fill me with confidence. I haven't found a clear statement on the copyright status of subtitles, only warnings that people should not make their own. I'd like others' views on the suitability of linking to subslikescript and other unofficial (user-generated or not) subtitle sites. BlackcurrantTea (talk) 03:05, 20 June 2020 (UTC)

I'm taking this page off my watchlist. Please ping me if replying. BlackcurrantTea (talk) 01:07, 10 July 2020 (UTC)

BlackcurrantTea, Sorry for the delay, I think those links should be removed. The copyright status isn't clear; it'd probably be safer and better to cite the actual dialogue instead of a site. Moneytrees🌴Talk🌲Help out at CCI! 17:38, 13 July 2020 (UTC)
Thanks, Moneytrees, I removed them. If there's some sort of blacklist or way to keep editors from adding it, that would be helpful. If not, perhaps there's a bot that removes links like this, and we could add subslikescript to its list. Otherwise, I expect it (and similar sites) will keep showing up in references. BlackcurrantTea (talk) 19:27, 13 July 2020 (UTC)
BlackcurrantTea, Blacklisting isn't my area of expertise, but MER-C should know what to do. Moneytrees🌴Talk🌲Help out at CCI! 20:38, 13 July 2020 (UTC)
I don't think this rises to the level of blacklisting yet. MER-C 19:07, 14 July 2020 (UTC)