Artificial Intelligence

This guide provides information on generative Artificial Intelligence (AI) tools and best practices for using them.

Accuracy

Generative AI tools like ChatGPT can produce many kinds of content, from quick answers to a question to cover letters, poems, short stories, outlines, essays, and reports. However, this content often contains errors, false claims, or plausible-sounding but completely incorrect or nonsensical answers, so be sure to take the time to verify the generated content and catch these problems.

Generative AI can also create fake images and videos so convincing that they are increasingly difficult to detect, so be careful which images and videos you trust; they may have been created to spread disinformation.

Fact checking is very important when using AI!

Bias

Generative AI relies on the information in its training data, much of it gathered from the internet, to create new output. Because that information is often biased, the newly generated content may carry a similar bias. Examples of potential bias include gender bias, racial bias, cultural bias, political bias, religious bias, and so on. Closely scrutinize AI-generated content to check for inherent biases.

Comprehensiveness

AI content may be selective, since it depends on the algorithm used to create the responses, and although an AI tool draws on a huge amount of information found on the internet, it may not be able to access subscription-based information behind paywalls. Content may also lack depth, be vague rather than specific, and be full of clichés, repetitions, and even contradictions.

Currency

AI tools may not always use the most current information in the content they create. In some disciplines, it is crucial to have the most recent and updated information available. Think, for example, about the recent pandemic: research was moving at a very fast pace, and it was important to have not only the most comprehensive and reliable data available, but also the most recent. Technology is another area that is constantly changing, and information that is valid one year may not be valid the next. There are many other examples, so always check the publication dates for any sources of information used in AI-generated texts.

Note: As of October 2024, ChatGPT's training data has a cutoff of January 2022, though it can also pull some current information from the Internet. Microsoft Copilot's training data has a cutoff of October 2023, and it can likewise pull some current information from the Internet.

Sources

Generative AI tools do not always include citations to their sources of information. Additionally, they are known to create citations that are incorrect, and to make up citations to non-existent sources (sometimes referred to as AI hallucination). For example, an AI tool may provide a citation for a source written by an author who is known to write about the topic you are researching, or even identify a relevant, well-known journal; however, upon investigation, the title, page numbers, dates, and sometimes author(s) of the source turn out to be completely fictional.

Note: Failing to credit sources of information and creating fake citations are both forms of plagiarism, and both are considered breaches of Academic Integrity.

Always investigate any output you receive from a Generative AI tool, no matter how well-written or factually correct it might seem at first glance. Take the following steps every time you plan to use an AI-generated output for your research:

  1. Investigate all sources cited by the Generative AI tool, to see if they actually exist. This may take some time. You might find the sources in any (or all) of the following places:
    • Columbia College's Main Library Search tool
    • Google Scholar
    • Author’s website
      • If the source was authored by an organization, check the organization’s website
    • Publisher’s website
    • Google search
  2. If you cannot locate the original source that a Generative AI tool has cited, the source likely does not exist.
    • Note: If the Generative AI tool has cited a source that does not exist, you cannot use the output for your assignment.
  3. If you are able to verify that all sources cited by the Generative AI tool exist, you must then verify that the tool summarized them correctly. You will need to examine each cited source in order to evaluate whether this is the case.
    • Note: The Generative AI tool may have paraphrased its sources in a way that misrepresents their content. If this is the case, the output is not an appropriate source for your assignment.
  4. Once you have verified that the Generative AI tool has used real sources for its output, and that those sources were paraphrased correctly, you may proceed with using the output for your assignment.
    • Be sure to follow the guidelines set by your instructor.
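For illustration only, the decision logic of the checklist above can be sketched in code (the function name and boolean inputs are hypothetical; the actual checks are manual research steps you perform yourself):

```python
def may_use_output(citation_checks):
    """Return True only if every cited source was located AND accurately summarized.

    citation_checks: a list of (source_found, summary_accurate) boolean pairs,
    one per citation in the AI-generated output. Both values come from your
    own manual verification (library search tool, Google Scholar, publisher
    and author websites, and so on).
    """
    return all(found and accurate for found, accurate in citation_checks)

# A single unlocatable or misrepresented source disqualifies the output.
print(may_use_output([(True, True), (True, True)]))   # every citation checks out → True
print(may_use_output([(True, True), (False, False)])) # one source not found → False
```

The point the sketch encodes: verification is all-or-nothing, so one bad citation means the output cannot be used for your assignment.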

Copyright

Generative AI tools rely on the vast repository of works they were trained on to create new output, and a new work may infringe copyright if it incorporates copyrighted material.

For example, there have been several lawsuits against tech companies that use images found on the internet to train their AI tools. One such lawsuit in the United States is by Getty Images, which accuses Stability AI of using millions of pictures from Getty's library to train its Stable Diffusion tool. Getty is claiming damages of US $1.8 trillion.

There is much debate about who owns the copyright to a work created by AI. Is it the person who wrote the code for the AI tool, the person who came up with the prompt, or the AI tool itself? Although AI-generated works are currently not copyright protected in Canada, this may change in the future. Also note that laws in other countries may differ from those in Canada.

Model Collapse

Recent research has raised the concern that as more AI-generated text is published, this content will enter the training datasets for new generations of AI, which may degrade data quality because errors made by early generations of AI compound over time. A study by Shumailov et al. (2023) found that the inclusion of AI-generated content in training datasets led to model collapse, which is "a degenerative process whereby, over time, models forget the true underlying data distribution, even in the absence of a shift in the distribution over time" (p. 2).
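The degradation can be illustrated with a toy simulation (an assumption of this sketch, not the method of the study cited above): if each "generation" of a model is fit only to samples produced by the previous generation's fit, the estimated spread of the data drifts away from the truth, with the tails of the distribution gradually forgotten.

```python
import random
import statistics

def fit_and_resample(samples, n):
    """Fit a normal distribution to the samples, then draw n new samples from the fit.

    This stands in for "training a new model on the previous model's output."
    """
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return [random.gauss(mu, sigma) for _ in range(n)]

def simulate_collapse(generations=20, n=50, seed=0):
    """Track the fitted standard deviation across generations of model-on-model training."""
    random.seed(seed)
    data = [random.gauss(0.0, 1.0) for _ in range(n)]  # generation 0: "real" data
    sigmas = []
    for _ in range(generations):
        sigmas.append(statistics.stdev(data))
        data = fit_and_resample(data, n)  # each generation sees only the previous one's output
    return sigmas

sigmas = simulate_collapse()
print(f"generation 0 spread: {sigmas[0]:.2f}, generation 19 spread: {sigmas[-1]:.2f}")
```

With a small sample size per generation, the estimated spread behaves like a compounding random walk rather than staying pinned at the true value of 1.0, which is the intuition behind errors accumulating across model generations.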