April 8, 2016



What is Open Access?

While the concept of open access is not new, the Open Access Movement, and the term itself, are widely credited to the 2001 Budapest Open Access Initiative (BOAI). The BOAI defined “open access” as “free and unrestricted [access to] online” research. The Initiative was developed during a meeting held in Budapest by the Open Society Foundations (OSF) on December 1-2, 2001. The participants of the meeting represented a variety of academic disciplines and nations, and many also had experience with pre-existing open access initiatives. The purpose of the meeting was to “accelerate progress in the international effort to make research articles in all academic fields freely available on the internet.” The BOAI marked the first call for open access across all disciplines and countries. It was also the first initiative to propose strategies for achieving openness and to outline open access principles. Building on previous open access initiatives, the participants of the BOAI determined that open access was not only economically feasible, but also gave readers the power to make use of unrestricted peer-reviewed research. In return, this would grant authors and researchers “vast and measurable new visibility, readership, and impact.”

Since the BOAI’s launch, the global movement to promote and implement open access models has grown. The Open Access Movement has moved beyond academic disciplines and has been adopted by government bodies and organizations in many countries. To further promote open access, global events such as Open Access Week and International Open Data Day encourage individuals, scholars, institutions, and students to join the conversation and actively engage in open access activities. Specific to Canada, the Canadian Open Data Summit (CODS 2016), which will take place April 27-28, encourages a national conversation to explore how open access can be achieved. Another initiative, the Open Data Button, is significant for giving individuals the power to request and access data at the push of a button.

In addition to promoting open access to research data, there has been a push within several academic fields to create openness in research through producing “open notebooks”. Creating an open notebook is the process of releasing unrestricted research notes to the public. In particular, this process allows other researchers to learn from the methods, failures, and results which moved a research project forward. Moreover, with the rise of digital technology, blogging platforms and code hosting sites, such as GitHub, allow researchers to easily share their research notes with other researchers and document the progress of their projects.

Furthermore, the Social Sciences and Humanities Research Council (SSHRC) created a Research Data Archiving Policy which directly affects historical research. The SSHRC, one of Canada’s federal research funding agencies, supports post-secondary based research and research training in the social sciences and humanities. The stated purpose of the policy is to facilitate the advancement of knowledge in the social sciences and humanities by making research data openly available to the public. As a wide range of projects use public funding to collect research data, the SSHRC believes the research belongs in the public domain. Researchers who make use of SSHRC funding are expected to release all data collected within two years of completing the associated research project. This allows other researchers to analyze, replicate, and verify research findings. Open access to research data also limits the duplication of primary data collection, and gives other researchers the opportunity to expand upon the open research. Moreover, the SSHRC believes that scholarly openness would prove valuable in both graduate and undergraduate education.

In addition to this, in 2015 the SSHRC developed a harmonized policy with the Canadian Institutes of Health Research (CIHR) and the Natural Sciences and Engineering Research Council (NSERC). This policy, called the Tri-Agency Open Access Policy on Publications, builds on open access principles which were adopted by all three agencies, and includes suggestions which were provided in response to the open draft policy. Under this policy, journal articles funded by any one of these federal granting agencies are expected to be “freely available online within 12 months of publication.” Researchers are expected to comply with the open access policy by either “self-archiving” their peer-reviewed manuscript to an online repository or submitting their manuscript to a journal that offers open access publishing. Committing to open access as a means of strengthening the impact of research, the SSHRC aims to “mobilize knowledge and build understanding, thereby broadening Canada’s realm of influence and strengthening access to the best, most promising ideas worldwide.”

Canada and Open Government

In Canada, the federal government recently renewed its biennial Action Plan on Open Government, which outlines the government’s plans for fostering greater openness, transparency and innovation. The initiative encourages Canadians to get involved, engage in conversation, and share what they want to see included. As part of this commitment, Library and Archives Canada has opened “more than ten million pages of Canadian government records” to the public since 2010. As these records document all aspects of Canadian public life, they provide valuable information for researchers, students, and institutions looking to conduct and teach history. While open data initiatives in Canada remain small, the number of open data programs is increasing. Ongoing open government programs across the country can be seen in the map shown below. To explore the interactive map, click here.

open gov programs can 2016-04-08 at 4.45.44 PM

Open Access and Public Opinion

In a research report which focuses on open monographs in the UK, OAPEN-UK examined related attitudes and perceptions from funders, researchers, publishers, universities, and libraries. Released in January 2016, this report was funded by Jisc and the Arts and Humanities Research Council (AHRC). It represents a five-year study into open access monograph publishing in the humanities and social sciences. Among its findings, the report demonstrated that participants felt uncertain and hesitant about adhering to open access principles. Addressing this, the report finishes with a list of in-depth recommendations for following open access principles, and shares advice and experiences from publishers who have participated in open access monograph publishing.

With these developments in mind, I began this project as an exploration into public attitudes and perceptions towards the Open Access Movement. My intent was to look more specifically at how Canadians were discussing and promoting open access to historical data and research. However, as I sought to explore public opinion by looking at discussions on social media, I realized that the geospatial information would prove problematic. As the geospatial data can be turned off, it is difficult to accurately determine the frequency of Canadian locations in a dataset. Moreover, as “history” is both the name of an academic discipline and a common noun, searching for this specific word would not provide me with accurate results. With that said, I decided to use Twitter to search specifically for #OpenAccess and #SSHRC to try to get a sense of how Canadians are talking about SSHRC open access policies, and how society is talking about open access more generally.

Data Collection: Why Archive Twitter?

Sian milligan tweet SSHRC 2016-04-08 at 11.45.48 AM

While Twitter is not necessarily an accurate representation of broader society, every tweet, hashtag, favourite, and retweet generated represents the thoughts, values, and knowledge shared by “everyday” people. Working with this notion, researchers Nick Ruest and Ian Milligan recently released an article which uses Twitter to analyze public sentiment during the 42nd Canadian Federal Election. Valuing Twitter as a source for data, they argue that, “Tweets, as well as the much broader scope of archived webpages and born-digital data, are the primary sources of tomorrow. Websites and tweets present considerable advantages in that they represent the preservation of material representing the voices of everyday people that might not otherwise be saved, but also considerable challenges in the collection and use of data on such a large scale.” As such, Twitter can be an important source of information for social and cultural historians, as tweets can provide insight into the attitudes, values, and perceptions of individuals from all aspects of society.

To begin exploring public attitudes towards open access, the specified tweets were captured using search APIs in twarc. Twarc, or Twitter Archiving, is a tool which was developed by Ed Summers for searching and collecting tweets and their associated metadata. As Twitter’s search API limits how much raw data can be accessed at one time, the data collected for this project covers March 23 to April 1, 2016.
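As a rough illustration of working with data collected this way, the sketch below flattens line-delimited tweet JSON (one tweet per line, which is how twarc writes its output) into the handful of fields used later in this post. It is a minimal sketch, not the workflow used for this project: the sample tweets are made up, the helper names `flatten_tweets` and `to_csv` are hypothetical, and the field names assume the classic tweet JSON layout (`user.location`, `entities.hashtags`, `text`).

```python
import csv
import io
import json

# Columns kept for later text analysis (an illustrative choice).
FIELDS = ["created_at", "user_location", "hashtags", "text"]


def flatten_tweets(jsonl_lines):
    """Yield one flat dict per tweet from an iterable of JSON lines."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        entities = tweet.get("entities") or {}
        tags = [h["text"] for h in entities.get("hashtags", [])]
        yield {
            "created_at": tweet.get("created_at", ""),
            # The profile location is free text and may be empty or missing.
            "user_location": (tweet.get("user") or {}).get("location") or "",
            "hashtags": " ".join(tags),
            "text": tweet.get("text", ""),
        }


def to_csv(rows):
    """Return the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample standing in for a file of collected tweets.
sample_lines = [
    json.dumps({
        "created_at": "Fri Apr 01 12:00:00 +0000 2016",
        "text": "Reading about #OpenAccess and #OpenData",
        "user": {"location": "Ottawa, Ontario"},
        "entities": {"hashtags": [{"text": "OpenAccess"}, {"text": "OpenData"}]},
    }),
    json.dumps({"text": "No location set", "user": {"location": None}}),
]

rows = list(flatten_tweets(sample_lines))
csv_text = to_csv(rows)
print(csv_text)
```

With a real collection, the same function could be given an open file object instead of `sample_lines`, since a file iterates line by line.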

Data Analysis and Results

#OpenAccess Dataset

Saving my datasets as CSV files in Excel, I was able to easily extract specific fields of data for text analysis. Using the data listed under user_location, I created a cleaned list of all locations using the 3-class (Location, Person, Organization) classifier in Stanford NER. To clean my data further, I entered the results into RegExr to find desired locations. RegExr is an online tool designed to help others learn regular expressions (regex). As I had never used regex before, I found this website very helpful and easy to use. The site provides real-time results as you type, explanations for your expressions, and also gives you access to example expressions from other users. For my #OpenAccess dataset, I used an expression which specified which locations in Canada to look for within the data. Knowing that Twitter usage is more frequent within the provinces, I decided to specify all of the provinces and major cities within them:

/(\W|^)(British Columbia|Canada|BC|Ontario|Alberta|Manitoba|Saskatchewan|Nova 
Scotia|Newfoundland|Labrador|Quebec|PEI|Prince Edward Island|New 
o|Ottawa|Saskatoon|Halifax|Charlottetown|Quebec City)(\W|$)/g
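
The place-name match described above can be approximated in Python. This is a minimal sketch under stated assumptions, not the exact expression used: the list contains only names that appear in the expression above, the `find_places` helper and the sample locations are hypothetical, and lookarounds stand in for the `(\W|^)` and `(\W|$)` groups so that a delimiter shared by two neighbouring names is not consumed by the first match. Names are sorted longest first so that “Quebec City” is tried before “Quebec”.

```python
import re
from collections import Counter

# Illustrative subset of the place names from the expression above.
PLACES = [
    "British Columbia", "Canada", "BC", "Ontario", "Alberta", "Manitoba",
    "Saskatchewan", "Nova Scotia", "Newfoundland", "Labrador", "Quebec",
    "PEI", "Prince Edward Island", "Ottawa", "Saskatoon", "Halifax",
    "Charlottetown", "Quebec City",
]

# Longest names first, so "Quebec City" wins over "Quebec".
_alternatives = "|".join(
    re.escape(name) for name in sorted(PLACES, key=len, reverse=True)
)

# (?<!\w) and (?!\w) require a non-word character or the string edge on
# each side, without consuming it. Matching is case-sensitive.
PLACE_RE = re.compile(r"(?<!\w)(" + _alternatives + r")(?!\w)")


def find_places(location):
    """Return every listed place name found in one user_location value."""
    return PLACE_RE.findall(location)


# Made-up user_location values.
locations = [
    "Ottawa, Ontario",
    "Canada",
    "Quebec City",
    "London, UK",
    "Vancouver, BC",
    "BCN",
]

frequency = Counter()
for loc in locations:
    frequency.update(find_places(loc))

for place, count in frequency.most_common():
    print(place, count)
```

Note that “BCN” is not counted as “BC”, because the trailing lookahead rejects a name that is followed directly by another letter.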

frequency of CAN locations

To make sense of the data, I visualized my results in a format found on RAW. The visualization can be seen above, with the locations listed on the right and their respective frequencies on the left. As you can see, my data shows that the location with the largest frequency is “Canada.” While some of the geolocation data is formatted to include the country, looking through my data I believe the geolocation data was often set to display the country only. As the geolocation is not always specified, it is difficult to determine which provinces or cities in Canada were discussing open access over Twitter using this method.

In the same dataset, I extracted the hashtags to see what people were associating with open access. Using RegExr again to clean my data, I then entered it into Voyant Tools to visualize the frequency and relationship between hashtags. The results below show that #openscience was one of the most used hashtags in combination with #openaccess.

TRENDS voyant tool openaccess hashtags 2016-04-08 at 11.19.59 AM

To get a sense of how people are discussing open access outside of science, I decided to use the “Stop Word” tool in Voyant to remove terms associated with disciplines unrelated to the humanities. I removed hashtags such as #WomenInStem, #OpenScience, #WomenInScience, #Health, and #ZikaVirus. The results are below:

OPENACCESS hashtag relationship wordcloud

Here, hashtags such as #MuseumWeek, #ScholarlyPublishing, #OpenEducation, #Copyright, #Journals and #Academics become more visible. The selection of these hashtags shows how people are thinking about open access in terms of education, research, and publishing. The use of hashtags such as #Copyright and #ScholarlyPublishing suggests some people are concerned about the impact of open access on ownership and publishing. Other hashtags, such as #Free, #OACountdown, #OA, #OA2020, #Scholcomm, and #OpenData, suggest people are promoting open data initiatives, and thinking about future open access opportunities.
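
The hashtag counting and stop-word filtering described above were done in Voyant Tools; the following Python sketch only approximates the same idea. It assumes each tweet has already been reduced to a list of its hashtags, and the sample rows and the `hashtag_counts` helper are hypothetical. It counts how often each hashtag appears and how often pairs of hashtags appear in the same tweet, after dropping a stop list.

```python
from collections import Counter
from itertools import combinations


def hashtag_counts(rows, stop_tags=()):
    """Count hashtags and co-occurring hashtag pairs, ignoring stop_tags.

    rows is an iterable of hashtag lists, one list per tweet.
    Hashtags are compared case-insensitively.
    """
    stop = {t.lower() for t in stop_tags}
    freq = Counter()
    pairs = Counter()
    for tags in rows:
        # De-duplicate within a tweet, drop stop tags, sort for stable pairs.
        kept = sorted({t.lower() for t in tags} - stop)
        freq.update(kept)
        pairs.update(combinations(kept, 2))
    return freq, pairs


# Made-up hashtag lists, one per tweet.
rows = [
    ["OpenAccess", "OpenScience"],
    ["openaccess", "Copyright", "ScholarlyPublishing"],
    ["OpenAccess", "OpenScience", "WomenInStem"],
    ["OpenAccess", "OpenData"],
]

# Stop list taken from the hashtags named in the text above.
stop_tags = ["OpenScience", "WomenInStem", "WomenInScience", "Health", "ZikaVirus"]

all_freq, _ = hashtag_counts(rows)
freq, pairs = hashtag_counts(rows, stop_tags)

print(all_freq.most_common(3))
print(freq.most_common())
print(pairs.most_common())
```

Comparing `all_freq` with `freq` shows the effect of the stop list: the science-related hashtags disappear, and less frequent ones become easier to see.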

Using RegExr, I also extracted information from the text field in the dataset. As the text includes specific mentions of people, events and organizations, I used an expression to include hashtags and mentions with the ‘@’ symbol. My expression for this dataset is below:


OPENACCESS text frequency RAW

palladio OA relation 2016-04-08 at 4.29.24 PM

Mentions and hashtags are an important part of conversations on Twitter, as they help to provide context for the tweets. The first visualization shown above was created using RAW, and the second was created using the “Graph Tool” in Palladio. The first visualization shows the frequency of the terms found in the text field in the #openaccess dataset. Here it is evident that the open access hashtag was used to talk about research, journal articles, policies, and openness, to name a few. Additionally, the usage of terms like “new” and “free” demonstrates public interest in open access policies. The second image represents a visualized relationship between #openaccess and all other terms. Viewing the terms in this way, you can see a general relationship between all listed terms. This visualization makes it clearer that the terms were used in reference to research, publication, and knowledge. In addition, terms such as “public,” “impact,” “information,” “need,” and “great” become more visible. While the frequency of these terms does not compare to terms such as “open” and “research,” their usage still speaks to the thoughts and opinions of “everyday” people.
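
For readers who want to try a comparable term count themselves, here is a minimal Python sketch. It is an assumption-laden stand-in rather than the expression used for this dataset: the `TOKEN` pattern, the `term_frequencies` helper, the sample tweets, and the `@example_user` handle are all hypothetical. The pattern keeps a leading `#` or `@` attached to the word that follows it, so hashtags and mentions are counted as their own terms.

```python
import re
from collections import Counter

# A word, optionally prefixed by '#' (hashtag) or '@' (mention).
TOKEN = re.compile(r"[@#]?\w+")
# Links are removed first so their fragments are not counted as terms.
URL = re.compile(r"https?://\S+")


def term_frequencies(texts, stop_words=()):
    """Count lower-cased terms, hashtags, and mentions across tweet texts."""
    stop = {w.lower() for w in stop_words}
    counts = Counter()
    for text in texts:
        text = URL.sub(" ", text)
        for token in TOKEN.findall(text):
            token = token.lower()
            if token not in stop:
                counts[token] += 1
    return counts


# Made-up tweet texts; the handle is a placeholder, not a real account.
texts = [
    "New #OpenAccess policy for research data https://example.org/a",
    "Free research articles via @example_user #openaccess",
]

counts = term_frequencies(texts, stop_words=["for", "via"])
for term, n in counts.most_common():
    print(term, n)
```

The resulting counts could then be exported as a two-column table of term and frequency for a visualization tool.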

#SSHRC Dataset

Using the same methods outlined above, I extracted the text field of data and entered it into RAW to visualize frequencies. The image below shows the most frequent terms on the right and their respective frequencies on the left. The most frequent terms shown are “research,” “CIHR” and “IRSC,” which are the English and French acronyms for the Canadian Institutes of Health Research, and “NSERC,” which is the acronym for the Natural Sciences and Engineering Research Council of Canada. Interestingly, while I did not find frequent use of the terms “open” or “access,” the usage of these agency names shows that Canadians are aware of SSHRC, CIHR, and NSERC policies towards research grants. The word cloud depicted just below gives added context by making specific terms more noticeable. In this word cloud visualization, terms such as “grant,” “aware,” “discovery,” “supports,” “thinking,” “excited,” and “scholarship” provide a greater understanding of the attitudes and opinions of the individuals who tweeted with #SSHRC. Similarly, the terms “Budget2016,” “Indigenous,” “sshrcimpact,” and “winners” provide context for the data. The SSHRC Impact Awards recognize outstanding achievements in the social sciences and humanities, with researchers nominated through an open call. As the deadline to nominate was April 5, 2016, I believe this is the reason why this term and “winners” were found in this dataset. Furthermore, as the SSHRC awards grants for research involving Indigenous communities, developments in research or changes in funding may have prompted Canadians to speak about “Indigenous” in relation to the SSHRC. Moreover, as part of Budget2016 the Canadian government has made a commitment to open and transparent operations. The appearance of this term in this dataset suggests that Canadians are thinking about the SSHRC policies in terms of Canada’s move towards open access to information.

The data from the hashtag field, which can be seen in the last image, reiterates most of the results found in the text field data. Noticeable differences are the #SSHRC101, #CelebrationOfResearch, and #TalkGaming hashtags. The first hashtag, #SSHRC101, was used by the SSHRC to provide the public with resources and information on their grant opportunities. The #TalkGaming hashtag was also used by the SSHRC to promote the Game Changers Exhibit on the impact and evolution of video games. The frequency of these hashtags demonstrates the benefits of social media platforms for sparking discussions about open access.

SSHRC text field data Frequency of Words blue SSHRC-RAW

SSHRC text field data ver 2 corrected SSHRC RegExr wordcloud

SSHRC hashtags field data SSHRC-hashtags-wordcloud