Inside Higher Ed. on Plagiarism: Data?!
There is an interesting line between copyright law and policies regarding plagiarism. In my mind, plagiarism is clearly the worse offense, though it is often accompanied by a copyright violation. For example, unattributed sentences may not be copyright violations but are still incidents of plagiarism. Randy Picker had an interesting series of posts (here’s one) on that particular subject. But I wanted to report on this Inside Higher Ed article. This excerpt from its table struck me as particularly weird:
Practice: Use of privately collected data

Response      Economists    Editors
Not at All    7.7%          2.8%
Not Likely    16.8%         16.8%
Likely        31.4%         32.7%
Definitely    44.0%         47.7%
I cannot think of a case where the use of privately collected data would ever be a problem in an open access/open data scholarly society. All data used in a paper would have to be submitted to the journal editors and made available to readers. Further, data doesn’t come out of thin air. A description of how the data was created would need to accompany it, even if that description is not in the paper itself; datasets are rarely usable when they are just thrown into a worksheet and left for someone to deal with without context. To pass off someone else’s data as your own, you would have to blatantly lie about the data creation process; otherwise you would simply say you got your data from X, see Y for the data production process. If you did lie, it would be immediately apparent to anyone searching for those types of datasets (because they would all be available and relatively well organized) that two copies of the same dataset existed. Obviously, the ill-intentioned often have better ideas for skirting systems, but I just do not see the incentive, in an open-data community, to do anything other than say where you got the data and then use it for your analyses.
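To make the "two of the same datasets" point concrete, here is a rough sketch of how a duplicate could be spotted in an open repository. Everything in it is an assumption for illustration: the file names and contents are made up, the repository is just a Python dictionary, and the fingerprinting scheme (normalize cells, sort rows, hash with SHA-256) is one simple choice among many, not a description of how any real archive works.

```python
import csv
import hashlib
import io
from collections import defaultdict


def fingerprint(csv_text):
    """Return a SHA-256 digest that ignores row order, case, and stray whitespace."""
    rows = csv.reader(io.StringIO(csv_text))
    # Normalize each cell, skip empty rows, then sort so row order does not matter.
    normalized = sorted(
        tuple(cell.strip().lower() for cell in row) for row in rows if row
    )
    digest = hashlib.sha256()
    for row in normalized:
        # Unit/record separator characters keep cell and row boundaries unambiguous.
        digest.update("\x1f".join(row).encode("utf-8"))
        digest.update(b"\x1e")
    return digest.hexdigest()


def find_duplicates(repository):
    """Group dataset names that share a fingerprint; return groups of two or more."""
    groups = defaultdict(list)
    for name, text in repository.items():
        groups[fingerprint(text)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical repository contents, invented for this example.
repository = {
    "survey_2006.csv": "id,income\n1,40000\n2,52000\n",
    "my_own_data.csv": "id,income\n2,52000\n1,40000\n",  # same rows, reordered
    "other_study.csv": "id,income\n1,41000\n2,52000\n",  # genuinely different
}

duplicates = find_duplicates(repository)
print(duplicates)  # [['my_own_data.csv', 'survey_2006.csv']]
```

A check like this only catches exact copies, including reshuffled ones. Someone who perturbs a few values would slip past it, which is why the accompanying description of how the data was produced still matters.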


December 20th, 2007 at 23:00 -0600
[...] Or thereabouts. Another post in the theme of “Doctors Kill.” Click the link to read the full article. I really need to get down and dirty with some health care data one of these days. Too bad not all data is open, and use of it is considered plagiarism. [...]
April 17th, 2008 at 11:18 -0500
[...] mentioned before that Randy Picker has written about the shady lines between plagiarism and copyright [...]