ESWC 2017 Panel

Future of Proceedings Publication

On the 30th of May 2017, at the end of the first day of the Extended Semantic Web Conference, a panel was convened to discuss the Semantic Web Community's perspective on scholarly communication. The session opened with statements (ordered as pictured) from Aldo Gangemi (AG), John Domingue (JD), Sarven Capadisli (SC), Aliaksandr Birukou (AB), Ruben Verborgh (RV) and Pascal Hitzler (PH) and continued with statements, questions and discussion from the audience, for an hour and a half.

Some men on a stage with a slideshow.

Photo credit: Harald Sack; pilfered from Twitter.

Complete transcript

(Transcribed by rhiaro, corrections welcome via PR or direct request).

AG: Intro.. the sides of scientific literature (list on slide)

Currently there is complexity to scholarly research. Dave de Roure described the scholarly social machine two days ago. With new technologies we have new opportunities. How do we preserve goodness of current situation while we work for the next generation? Balance between research and society?

We imagine a dystopian future with everything broken... or are we going towards research 5.0?

We are talking here specifically about proceedings. What services are really there? Computer scientists maybe care more than other disciplines. They can also most easily do this without intermediation.

JD: I'm wearing several hats. President of STI who run this conference. We want to make sure that anything we do is in the best interests of the participants of the conference. I have also spent some time actually doing work in the area. First half of this is from the researcher, the second half is thinking more holistically.

Back in the last century we had a project, early days of the semantic web. We were producing ontologies and knowledge models to describe publications. Contents was too long, but meta structure of any publication. When you publish something scientific you have a methodology, a problem, their relationships to other problems. Some papers support a previous paper, etc. We had high hopes back in 1999 that within three or five years all researchers would be using some kind of semantic knowledge modelling you could query and papers would disappear. Hasn't happened, but progress has been made. In 2006 we started the semantic web dogfood site, which is still running today. We have lots of data on authors and institutions. Especially from ESWC and ISWC and others. I feel strongly the community should be eating its own dogfood.

Rexplore uses semantic technologies to detect emerging technologies and trends based on the data publishers hold. I have a PhD student who will look at the various elements of scholarly publishing; authors, institutions, citations; can use blockchains to describe each element, each nano contribution, having that recorded forever in a trusted fashion. Moving to a reputational currency. Many academics worry about their reputation. Maybe you could instantiate that reputation as a cryptocurrency. In my experience, for something to really take off you have to think about all the scholarly context and the stakeholders involved. The most obvious reason for publishing is to share knowledge. Individual career progression is another important context. The process for moving to an associate professor at the OU.. have to have 4 great publications in the last 6 years. Institutions worry a lot about reputation. The REF is a UK initiative running every 7 years, every academic in the UK is assessed for their research quality. The panel members who assess this take away around 1500 papers that they spend 3 months reading and then they're all evaluated and the national ranking is produced per subject. Quarter of a billion pounds.

Then there's funding and effort. Whatever systems we create somebody may have to do some work. That needs to be funded somehow. For me, we can have fabulous technologies, we have lots, but until we satisfy this context it won't take off.

SC: I will talk a little bit around this initiative called Call for Linked Research. One of the things in there is that hopefully one day we can have the freedom to express ourselves and our work in a way that is more appropriate to the current state of the Web (especially in this community). Using the technologies that are available today on the web, and the technologies that we are working on. Some of the shortcomings of the current state of articles, reviews, proceedings, is that we're still operating on this two dimensional view of what articles should be. Static, antisocial, so forth. As opposed to interactive or semantic. There are these technical aspects about what articles and reviews can be, and there's having these things be open and accessible. Accessible to anyone without having to require any special access like putting up money to get access to information. Anyone at anytime anywhere in the world should be able to access this body of human knowledge. Of course, the current state of how we do scholarly communication is that it prevents us from doing that. Of course there are some ways to make that possible. As a researcher, the position I'm put in right now is that if I want to participate, be part of this community, make a contribution to humanity, the options that are given to me are that I have to give complete exclusive rights to my work to a third party publisher, or I have to pay quite a bit of additional money to make it open access. That's in addition to the funding that's being given to libraries to already pay for access to proceedings/journals. We have an article in this conference, we want it open access, at least that was the compromise that we had. That was 38 EUR per page. The alternative is that I would have to give that right up, it would not be accessible to anyone unless they are part of the library system or they can pay. This is in the context of what the web enables me to do. 26 years ago the web allows anyone to be a publisher.
I can share my knowledge with the rest of the world. I can allow reviews, comments, have an open discussion, back and forth, and have some means to archive this stuff. I have my own profile, I don't have to create an account, I don't have to go through EasyChair, I can help other people learn from my articles, I can ping other potential reviewers. I'm also learning about making decisions on what my URIs should be, how do I integrate all this data that may be related. The publishing process I have to do. The things that I'm trying to encourage in the Call for Linked Research is that we take a bit more responsibility and consideration in the knowledge that we want the rest of the world to access. If we introduce this extra friction, which is completely unnecessary, it makes it less accessible, less reusable. We have the expertise and knowledge to do that. It's not a technical constraint in any way. If it is we can work that out. But I truly believe we need a social change or paradigm shift. Hypertext has been around for 50 years, the web 25 years.. the things are there, we are able to pass our knowledge using this medium. It's not that print is bad, we can still produce the print, but socially we haven't made that jump yet. We're expecting that someone else will make those decisions for us. When I have the opportunity to express myself on my website, add all the semantics I want, I can reuse the work that this community has done, ontologies, query engines.. I can reuse that to help others learn from my work. Also up to me to find the best ways to communicate, to educate my readers. We don't expect a single design to cover all possible content. There's variation. Some things can be expressed better.. we don't have a bar chart for every data point. The same applies to having articles. What is the best way you think .. you might want to have animation in your article (you can't do that with paper). I want the freedom to express my work in the best way I can.
I want to be able to cite other people's work. I want to cite things I've read, not things I can't access. This is not to stop anyone from using a trusted party to publish their work or reviews. But that I also will be able to participate in this ecosystem if I want to publish on my website and you want to publish through a third party publisher, that's totally fine. I would hope that semantic web conferences for instance would accommodate that. Find a middle ground where we can allow some sort of creativity to communicate, but at the same time if institutions insist that researchers should only publish through a publisher who will give you a DOI, then that's fine. But I think we can do better, and we should discuss whether we're willing to do that, if that's an agreeable goal. If we can agree on that, I think we should make the effort to do it and not just talk about it. I've tried to exemplify this, and tried to help my colleagues to do this, to express their work as best as they can. It's possible technically. Whether this community is socially ready and willing to make that jump.

AB: I work as part of Springer Nature, but these are my thoughts. I expect we have a lot of things.. we have reality, and sometimes it's sad we need to follow this and that. John showed you need to wait ten fifteen years to make changes.. I put together some things that might seem obvious but we're still not doing as semweb or CS community at large. Maybe you can take something for next year of this conference. Now we do ORCID for proceedings. You can have it for editors. (example: You can also have it for authors. When you publish your paper with Springer in LNCS. We have an online conference service like EasyChair but run by Springer, and we support ORCID there (OCS). We hope that it will become standard for conference systems to support ORCID. A lot of these ideas come from the research communities and when the time is right industry implements them. One is CrossMark. It's about whether this article was peer reviewed and what is the latest version. We extended it to proceedings, for the peer review process. We are working with CrossRef and other publishers to make it part of CrossMark. This was in the PEERE project for innovating peer review.

Most of you already heard about LOD pilot for conference proceedings. We have not only conference proceedings there but also books. Part of Springer Nature SciGraph, big collection of linked data. Maybe you can use it.

CrossRef and DataCite working group with different publishers. Unique conference IDs. We have this concept in the LOD portal, now we want to expand it to other publishers. We are also using it in connection with the review process. How is it changing for a conference from year to year? By putting all these things together we can make this kind of analysis. I hope it contributes to making science more accessible.

Unfortunately once we put LOD online people didn't ask about open access or something. The top question still remains why my paper is not indexed. We had to implement it in our LOD portal, you can see if your proceedings are indexed in Scopus, EI Compendex, CPCI..

I wanted to make a point that you guys are used to RDF, OWL and so on but it's not necessarily that other players want that kind of data. So when we were using structured data about conferences, Google Scholar and our internal library asked for different forms. I think it's important that the data is structured, but whether it's RDF is secondary.

Now we can publish embedded videos. It seems to be obvious but how many publishers can publish videos in their proceedings? We also do data and software publication pilots. Ad hoc last year. We launched new policies for Springer Nature journals which are now industry standard, we want to reuse those policies for proceedings. The idea is that each conference says how strict they are on data sharing (allow, encourage, control, mandate). We are running a pilot with a couple of conferences. We hope to have a proper solution in 2018.

RV: This panel says it's about proceedings, but everyone is talking about a lot more. From my side, it's about how we want to do science. This is a really important thing. Forget about the paperwork, this is about us. What is the best way to do science in 2017, in our community. I have a pretty strong opinion about certain things. I'll talk about science. If we only talk about the paperwork we miss the bigger picture. How do we need to do research? I don't think the semweb is going to survive if we don't focus on the web. Content-wise, but in the way we are doing this research. We need to look at what the web can offer to make us better at researching.

We need to dogfood, to practice what we preach. Can we be surprised that if we make things and put them out people don't use them, if we aren't using them ourselves. If we just make great things to do annotations or whatever and then wait, nothing will happen. If we publish great things about those annotations and we use them ourselves, then maybe it will happen, and we as a community we have the benefit of what we're doing. Instead of expecting other people to use, we should use ourselves.

Publication metadata: we shouldn't forget about metadata which helps discovery of research. I think it needs to be LOD on the Web. Who is actually doing it on the Web? Who is doing it themselves? (2 hands).

Valentina Presutti: When we publish this data at this conference it goes to LD

RV: Right when we publish we hope it goes to some system that publishes LD.

Others are publishing my data wrong. Mendeley, ResearchGate, SemWeb dogfood.. it's wrong. Spelling mistakes, incomplete.. I could go and correct them all. Or I could publish it myself the right way and hope they take it.

Valentina: What is right? Why do you think if each of us does it ourselves it will be more correct than if it's shared by the community? That's an effort? Maybe I'm not understanding. Why is it more correct if I do publish my metadata, what vocabulary do I use, what .. how do I spell my name.. today there was a session around this.. but if you look at how people use owl:sameAs.. it's not true or not obvious to me that if each of us does it our own way it's going to be more correct.

SC: It has to do with expressivity. Right now it doesn't matter what scholarlydata does, I will not be able to query for your hypothesis. If I want to find out articles on a certain topic, any granular aspect, I will not be able to find it.

RV: This is the discussion I want to have! But not yet... I'm sure they're doing it wrong, I can see that. Springer is okay but not everything is in there, only Springer papers, so I'm incorrectly represented and I'm not disambiguated.

Another thing is, not just about metadata. You can link citations between articles. Easy for authors to add but more difficult as you progress along the chain.

Publishers right now they are the authorities for certain metadata. I see a role for them as harvester, as aggregators, but not as the only authority. I'm the only one who has a complete view on my data and who cares enough to make sure everything is right, so I want to be the authority.

On my website I have all my data, you can query it, browse it, ask questions over the data. All annotated HTML and RDFa. Just a matter of dogfooding. I just did what every semantic web minded person should do. I challenge you to publish your data as well and then we can do federated queries across our sites.

Second is about openness and doing the web. What I see is that reviews in our community suffer from quality. I regularly see reviews that are hastily written. We all know we're doing this for free and so on. By having them at least visible, and even having names, we have visibility for the author who put in effort, and also we have accountability. The hidden process means we cannot learn from them. We cannot enter a discussion. Once the proceedings are out there and review process is done, it stops. It's too bad.. it should continue. All of this discussion does not end up in the proceedings but is the most important thing. It should start here not stop. Publications should foster scientific discussion. Not just because it's published and approved, and it feeds into your metrics, but you didn't necessarily do good research. The quality stamp is now irrevocable, but publications should be the start of the adventure.

I see low quality articles accepted and high quality ones rejected. For this conference at least I know one paper with a fundamental flaw that got accepted. I wanted to post a review on my website but that would make me look very arrogant because we don't have this culture.

My articles on my website are there, but I don't receive much feedback which is a pity.

My two points:

  1. Self publication of metadata
  2. Can we not see proceedings as something final, but as a starting point of discussion?

PH: I completely agree with the LD quality issue. I don't quite agree why everybody needs to do it themselves but it probably makes sense to think of the processes. The other thing I hope...

Ruben said this is not about proceedings, it's about research. I completely agree with that. Three things that bug me.

Publishers provide added value. It's fine that they get compensated for this. There are people who disagree. I'm not going to do a poll. But they do provide added value. Having worked with several publishing houses and proceedings etc. They do work I don't want to do myself. So if I don't pay the professionals myself somebody else has to do it. Either I have to become a professional publisher, or I have to employ someone.. but of course we need to think about prices staying reasonable and the conditions we're getting as a community. Journal prices.. sometimes there is a multiple earn per journal article. Something is wrong there. Some people are exploiting the community. While this is a good thing to say and to mention and work against, we still have to keep in mind publishing houses provide added value and they need adequate compensation for this.

20 years ago I was a young PhD student and got involved in the logic programming community, and then in 1999 the whole editorial board of the Journal of Logic Programming resigned and moved the journal to another publisher. The publishing company had prices too high, they wouldn't support it. So they left. You can guess what the publishing house is.

We need to push openness. Now I'm mirroring a lot of things Ruben said. Evaluation of data along with papers. I don't have a solution; I would hope publishing houses would take that job. It would be added value I'd be happy to pay for. If you want a prominent case of things that can go wrong if data is not provided, research the PACE trial in chronic fatigue syndrome. The community in some fields, not necessarily in CS, is moving in that direction.

We need to move towards open access for all publications. I do not mean 'open access', I mean the public has a right to access all research output including papers. Somewhere between funding agencies, researchers, publishing houses, we need to work out a way to do this.

Expressive and open metadata is necessary. Our job as SemWeb community to inform whoever provides that what is needed for added value. We need information about which papers are cited by which paper. This is very difficult to get, and I think it's a basic thing which is important for research, and for research ON research.

I believe we need to improve peer review, whatever that means. One of the things that bugs me is that there is very little research or funding to look into efficient ways of conducting peer review. Ruben mentioned some things.. I frequently see papers at conferences that missed something that was recently published.. if you push reviews to the open then of course things like that can be at least in more cases counteracted.

Takes courage to make changes. Good that we're discussing. Some of the discussion already probably indicates how early we are in this discussion, and we really need to find a way forward with a more narrow focus about what's really needed.

Valentina: I agree with both Pascal and Ruben for some things.. when I said why do you think that if we do it ourselves it's going to be more correct? I agree the data.. there is no possibility to query about specific things.. I think that's a separate problem. One thing is who is going to do this and what's the best way, the best metadata. We have to be as expressive as possible. We need to describe the outcomes, the results, as much as possible so they can be accessible or automatically processed. On the other hand, the content like the methodology, results, I agree we have to improve and increase the metadata. I disagree that being each of us doing it .. I'm not sure I want to do it myself, unless you talk about what's the description of the methodology that you use. I know what it is, there is the need to have tools or agreement to help us doing this. It's at the same level you need tools for publishing, for correcting, for maintaining all this data. Although I share the vision of having everything open, everything accessible, I think we have to be realistic and .. in this sense the publishers can make an effort to change their business model. One of the issues is about the copyright.. one of the key points that made this panel possible.

For peer review I think it doesn't work. It doesn't need to be improved, it just doesn't work. We should go back one century. We shouldn't have peer reviewers put in scores. Making our names transparent is one way of improving, but still I have my doubts that it works at all.

RV: My point is not that we need to do it perfectly.. we first need to DO IT. We need to stop finding excuses not to publish RDF in general. There's no such thing as perfect metadata. It's important that we start, and aim to be better, but that's something else. Whether we do it as individuals or per institution that's something else.. but it's not okay to say other people can publish our metadata and they don't let us add stuff.

Axel Polleres: I have some remarks.. based on what John said what we have now in the system is creating metrics. Whether we want it or not we are measured against this. What I'd like to know is how should we assess our research and publications in an ideal world? There is some need for that. There are different systems used, and there are systems we have in place that we can't just throw away easily. What are the alternatives?

About peer review.. none of us likes peer review.. but what better system do we have? I request everyone to stand up who has a better solution.. I don't know. I've discussed this topic amply with many colleagues and never found an agreed better solution for that. Some models to modify it or whatever but I don't have it. As for metrics... no-one in the room really likes these metrics. But at least we could also say anyone would admit they like these metrics. Other people who ego surf on their h-index should come up here.. We should also reflect a bit about this if we pitch about metrics. Is it all that bad?

The last thing.. if we want to throw over this publication reviewing metrics system we don't only need better solutions but we definitely need to guarantee ease of use. We need tools, we need those who enable that, not everybody wants to maintain their own webpage like Ruben, which is very honorable but I don't think everybody wants to do that.

AB from Springer mentioned ORCID.. whenever I have to register to a new pid I don't know how many I already have.. I have an ?? account, a national PID from Austria, my ..?? typical silo id providers.. Seems to me useless to have new ones. Why do I have to have a separate ID for my research to my other projects?

Valentina: I don't think peer review is working as it is implemented, there are at least two things we can do from tomorrow.. one is open the reviews. The second is to get rid of the scores. The programme chairs have to read the reviews and decide if a bunch of reviews from a panel are negative or positive because they have the overview. The reviewers only have a view of two or three. These are two measures we can take starting from tomorrow.

AB: About peer review.. there are people from medicine and other disciplines.. don't think about it like black and white. Just make it possible to make peer review open and see how many people go for it. Next step, put your name, see how many people go for it.. then see if your community converges. Same about ORCID.. seems to be really taken up. Otherwise we are happy to work with whatever identifier you think is good.

Claudia D'Amato: There are too many things on the table.. The discussion is articulated in two directions. What can we do within the community for improving the community and giving positive examples for others? This point, eg. Ruben saying please publish your data, good suggestion, why not. If someone would like to do that fine. If someone does not, fine. Raising the point is fair enough. This is for improving the community. The community is within a larger system which is the one that has been mentioned by John and Axel, universities are evaluated, researchers are evaluated. We could think about our ideal world. But we stay in a larger system. We can certainly do something for pushing some good attitudes in our mind, but in the meantime we stay in our system. I could not imagine what could happen if we remove publication or proceedings. What happens if we just put our paper on a website and that is it? We don't have a justification for attending the conference, we don't have a good evaluation as a researcher, we cannot contribute to our department.. this is a domino system.

I agree with Pascal, publishers do some job. They do quite a reasonable job.. everyone does some job, it's right to pay them. Maybe the amount of money we would like to pay would be under negotiation..

Anna Tordai: I work for Elsevier, but these are personal thoughts. We've been having these discussions at recent workshops. Somehow everybody has already said the things I want to say. I was struck by the peer review notion of the dialogue. When we're at the conference and we have an active dialogue around the papers, that is much more useful for the authors and for the community. I'm wondering whether it would be possible to have more dialogue style reviews as opposed to you read it, sit around, last minute you write half a page and submit. Instead to have more of a process where several reviewers are assigned, and you can initiate comments and have responses, and have that kind of mechanism. Pretty sure we have the tech available for that. If it's open then it also makes it really valuable for not just the authors but also the community. I think it's a really sad thing that most research is only read by the reviewers..

Self publishing metadata, I think that's a tricky thing. Even though we can have similar mechanisms, you tend to have a particular model of the world in your brain, and we will disagree with what someone else describes. That's what I end up thinking when I try to apply other people's examples. Somehow there's always this twist for your own situation.

Mendeley Data gives you the ability to upload your own data and associate it with articles. You can get your profile, you can associate it with your Scopus profile.. this also came up, you don't want to update five different profiles in different systems, so I think I'll take back to work, these people want to make their metadata available so use it!

RV: Me again sorry. Forget it if you think you can overthrow the system. It's not what we have or something new. It's more as parallel systems.. we don't need to forget and change what we have. It's perfectly possible to publish as HTML and be part of the proceedings, do it. If we can do it in parallel, we might have a chance of succeeding. It's complementary.

Alex Garcia: When I first started looking at the slides I thought this was going to be about how to improve the publication of proceedings. These discussions seem to be about solving the whole problem of scholarly comms as a whole. I used to work at SePublica.. did that for 4 years, semantics for publications. Everything that has been said used to be said 4 years ago. Exactly the same thing. Poor metadata, data availability, reproduce the results... I want to have open reviews, okay possible. There were so many things that have been repeated today. It really surprises me from this community because this is supposed to be the community at the cutting edge of metadata management. What I see is that it's not the case. The system is such that all the value in the research life cycle is being put in just one object.. it's the paper. That is the way it is. It is up to us to decide what other scholarly outcomes do we want to have as valuable outcomes. Truth of the matter right now is only the paper has value. That's the only thing that is accountable. The rest of the things, no matter how careful you are with your git repos, it's not accountable. Even if you deposit on figshare, not accountable. The only point of value is in the paper. If that is the only point of value then I do agree we need to have it much more discoverable. I'm afraid that it's not going to be the SciGraph from Springer that is going to make this more discoverable. Alternatives to make it open, for instance the Center for Open Science has this Open Science Framework which allows you to publish your preprints. They cover tech, $0. You can propose whatever additions you want to propose. You can enhance it for your preprints however you want. If you want to add new metadata you can do it, nanopublications you can do it. My point is that specifically going back to the beginning of this, it was about how to improve the publication of the proceedings of this conference.
My proposal would be to run a preprint server using only semantic web technologies. You could have it running tomorrow.

SC: Let's talk about value. My articles, all of them since I started my masters, are online. I describe them as best I can, as granular as I can get. They're social, multi-modal and so on. CC-BY. It's accessible and so forth. This is intertwined.. the question is also for the publishers as well as for the conference. What value can the publishers put on top of what I'm currently doing? The other part is, is the conference willing to accept the contribution as such? It took me 5 years of asking for it to be done more webby. Finally came to a point where they accept HTML+friends contributions, articles in those formats. Still it's a very tiny slow process. Small improvement. We still have to provide LaTeX for the camera ready and to hand it off.. never mind the whole giving up the rights or paying extra. What's the value that anyone can provide on top of what I'm currently doing online? And whether that is acceptable going forward for conferences submission to this body of knowledge. Pascal brought up the value; I don't mean to gang up on the publishers. The first thing I said was the freedom of expression. I don't want to prevent others from going with a publisher or third party to make their work available if that improves their work. That's their call. I understand, it's a service, you pay for it. That's fine. But is the current system preventing me from excelling in what I'm trying to do?

PH: Since it came up, on peer review, I have some thoughts. Half baked. You know about the Semantic Web journal and so on, we have this open review process. About 80% of the reviews carry the name of the reviewer. If I think of peer review, then I think there is something else wrong with peer review. As we currently practice it. I believe, I would like data.. my guess is that some papers get reviewed a lot until they get accepted. I don't think there's anything particularly wrong with this. You submit, get some feedback, improve, submit again, and so on. Regretfully what that means is that each paper produces more review load on the community than the three or four reviews that are required for one conference or one journal. Multiply this up. This is something which open signed reviewing can also counteract. I conjecture that the semweb journal gets far fewer crap papers which we have to reject pre-review than journals which run a closed review process. We throw them out immediately and it's public. That's a barrier. Sometimes we do reviews for things which were previously for conferences or workshops. We encourage people to reuse the same reviewers. How easy would that be if the reviews were public in the previous event? How would that help in terms of work force of the community? How much more capacity would we get? That capacity could go back into making better reviews.

JD: A fair thing for us to do as STI is to think about how we might tweak the conference next year. I don't want to do anything radical and closed. I know people can't get funding to come here without a publication.

There is the organising committee, which is the engine that powers the conference. We could tweak the organising committee.. we could have some chairs focused on taking on the diversity of formats, or supporting people who want to provide their own metadata.

We can have the discussion with our publishers about what to do with diverse requests that may come up. We'll think about that next year.

Not necessarily having a panel every year, but having a review every year about ever-improving our publication process.