Comprehensive guide for scientists using ChatGPT3
A discussion of this technology’s promise, as well as pitfalls and perils
--
Science work, whether it takes place at the bench, the screen, the classroom, or the conference room, can benefit from artificial intelligence. It is already making an impact on how we cite papers,¹ with examples like Semantic Scholar. For writing there are AI-powered assistants like Grammarly, and tools like GitHub Copilot for coding. ChatGPT3 is entirely free (for now), so I spent a few weeks experimenting with it. Like other language models, ChatGPT3 uses statistical regularities in the vast amounts of text it has processed to both create a knowledge base and generate replies to user inputs.² I found myself amazed at first, then mortified, and later reassured that the computers won’t be taking our jobs anytime soon. As with any new (and imperfect) technology, learning about ChatGPT3 early will give you the best chance to realize its full potential as it rapidly evolves. Note: for the rest of the article, I’ll refer to it as ChatGPT for reading ease.
For background, I completed a PhD in neuroscience five years ago. Since then, I have primarily been working as a postdoc doing data analysis alongside experimentalists in a cell biology lab, where we focus on single-molecule imaging. Most of my work has involved coding in MATLAB or Python. My time as a scientist has been split across the lab bench, the teaching podium, and the computer screen.
Below I have outlined some of the ways I have used ChatGPT to assist me with basic lab tasks. For the too-long-didn’t-read (TL;DR) summary: ChatGPT’s knowledge is miles wide but only inches deep, and unlike Google, if it doesn’t have the information you’re looking for, it will make things up, potentially leading to a lot of wasted time. From my experience, I believe ChatGPT is a big advance over Wikipedia or Google (with the major disadvantage that it lies), but it is far from anything like intelligence. Read on to learn where I think ChatGPT excels over our current approaches, and where you’re better off sticking with the status quo for now.
I attempted to complete various science tasks with ChatGPT’s help, and I describe the results below.
Scientific Writing: Huge time saver
Need a way to phrase something that you feel like you’ve already written a million times? Or do you want someone to complete your thought for you? ChatGPT is there for you.
And if you don’t like that response:
And ChatGPT can generate infinitely many such responses. Just keep asking it to try again. Remember, ChatGPT is not a search engine; it is a generator of text, recombining words with similar meanings and completing sentences in ways that make sense given what it knows.² Thus, you can rest assured that the sentence it writes won’t already have been written somewhere (which is why ChatGPT evades plagiarism detectors quite well). Unlike a writing partner, ChatGPT will never lose its patience with you. This way, you can turn writing into something more like a multiple-choice problem, which creates less cognitive load (very useful for those late nights before a deadline).
“LaTeX? Oh, yeah, that’s something our collaborators use…”
ChatGPT is phenomenal at helping you write basic documents in LaTeX/Overleaf, too. Writing with ChatGPT open in a nearby tab lets you rapidly generate concise LaTeX code. If you find yourself needing to use LaTeX for writing, especially equations, ChatGPT will provide wonderful guidance:
ChatGPT knows many standard equations, so you can simply ask it to write them out in LaTeX. I asked for LaTeX code to write out the Hodgkin-Huxley equation, an important equation in neuroscience, and pasted the output into LaTeX:
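For reference, here is a minimal LaTeX sketch of the Hodgkin-Huxley membrane equation in its standard textbook form (this is my own rendering, not ChatGPT’s verbatim output):

```latex
\begin{equation}
C_m \frac{dV}{dt} = I_{\mathrm{ext}}
  - \bar{g}_{\mathrm{Na}}\, m^3 h \,(V - E_{\mathrm{Na}})
  - \bar{g}_{\mathrm{K}}\, n^4 \,(V - E_{\mathrm{K}})
  - \bar{g}_{L}\,(V - E_{L})
\end{equation}
```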
ChatGPT knows its way around Microsoft Word and Google Docs, too, and can help you navigate the menu items and complete word processing tasks. In short, if you use ChatGPT like an assistant for taking care of the mechanical aspects of writing, it works nicely.
Literature search: Don’t believe everything you read
ChatGPT was good at synthesizing background information about different scientific fields and compiling it in a readable way, but when I asked for details about the literature, it was confidently inaccurate. You can ask it about past collaborations between different types of labs and it will return a long list:
The problem is that most of these papers don’t actually exist, or they aren’t published by whom ChatGPT says they are. Remember, ChatGPT does not treat documents like research articles as individual objects;² it returns information based on statistical regularities in language, but it doesn’t know that a research article is a discrete thing that cannot simply be invented from the relationships between words and names in a corpus. I suspect this is something a major publisher like Elsevier will address when they license AI software.
But it gets worse!
I asked ChatGPT to summarize a specific scientific article published by my peers using chemogenetics to study thalamic activity in a mouse model.³ This is a publicly available article that was published in 2013. Here is where things get ugly:
The research in that paper was carried out in mice, not monkeys. That would make a huge difference to anyone in neuroscience, and would make you look pretty silly in a lab meeting or discussion with your peers. But it gets worse:
The paper didn’t use optogenetics. It used chemogenetics, a substantially different method. Now, a project that used chemogenetics in rodents has become a project that used optogenetics in monkeys, which wasn’t even possible at the time of the publication. Optogenetics as a technique has rarely been used successfully in primates, so I asked ChatGPT for example references where it believes it has:
The first two papers literally do not exist; ChatGPT made them up. Another of the papers it cited is about rodents, not primates.
You can type “links” and ChatGPT will give you links, but those links can be entirely made up.
Only the last two papers are actually about optogenetics in primates. Thus, I would not trust ChatGPT to summarize an individual paper, or even to summarize recent developments in a field of science.
But maybe ChatGPT has learned how to structure PubMed searches, and you could use it to make more efficient use of PubMed? I asked it to create a PubMed search query for all articles whose last author is named Gordon. It couldn’t even do this correctly. Here is what it returned:
The actual query should be “Gordon”[Author - Last].
While ChatGPT can summarize basic areas of science like “molecular biology” or even “protein biology”, it has no deep knowledge of science and no ability to reason about what it knows or doesn’t know. Thus, for literature searches, I would stick with getting good at PubMed and using citation tools. ChatGPT’s disregard for factuality and its comfort with making things up make it a highly unreliable source of information about the scientific literature.
Proprietary Software: Equally unreliable
I asked ChatGPT questions about lab software like Prism, Geneious, and various molecular dynamics packages. Each time, it would give me an answer. Sometimes a very long one! And each time I tried the answer out in the software, I realized I had received plausible-sounding but completely made-up instructions. For Prism, I asked how to format error bars, and it invented features of the graphical user interface that don’t exist:
Prism has neither a “Format” tab nor an “Error Bars” dialog box, nor does it have a “Line” tab. Instead, Prism has a dialog box called “Format Graph”, which ChatGPT did not seem to know about.
If you ask ChatGPT for instructions to do something it doesn’t know how to do, it will make them up.
Similarly, when I asked how to discover conserved motifs across sequences using the Geneious program, ChatGPT gave me an answer describing functionality that just doesn’t exist in the software. I thought I was going crazy. I even checked multiple versions of the software! Alas, just as ChatGPT does not seem to value the sanctity of a scientific article’s text as the bona fide source of information about what is in the article, it also places little value on a software’s online documentation, inventing its own instructions based on language regularities.
Open-Source Software: Scarily unreliable
As with proprietary lab software, ChatGPT will give you a basic idea of the process you need to carry out to get your results, but when it comes to giving you specific instructions, it will be brazenly incorrect — again. In the case below, ChatGPT invents a GUI and again conjures up commands that don’t exist.
I asked ChatGPT how to use DISC, an open-source software package, to fit a three-state model to FRET data, and it responded:
This sounds plausible; the problem is that there are no tabs in the DISC GUI. So I asked it about using DISC as a command-line function:
It seemed to think DISC was a command from a Linux-based operating system, which it’s not. It only runs from the MATLAB command line. When I informed ChatGPT that DISC is a MATLAB package, it returned:
Unfortunately, neither of these commands is part of the DISC package, and neither ever has been.
ChatGPT is willing to substitute plausible statements for actual instructions, but would it make up software that doesn’t even exist?
I asked it about “FRET jump state analysis”, a nonsensical idea that nobody has ever proposed and that I can confirm doesn’t exist. What happened next was quite alarming!
Of course, the link didn’t work. While there is at least one Köster laboratory that works on computational biology, this is not their GitHub. I asked ChatGPT if it was sure about FRETJumpStateAnalysis, and it corrected itself, saying this wasn’t in fact a real package, but then unfortunately suggested other GitHub packages with links that also don’t exist (this was a bad day for GPT). Confirming that a webpage exists seems like a simple check that could be implemented to prevent a lot of headaches and wasted time.
Data analysis and coding: Gets there eventually, but no replacement for expertise
When it comes to basic Python and MATLAB tasks, ChatGPT is good at giving you the code you need to carry out your projects. This can save you a ton of time. Asking for basic things like “scatter plot with points colored by value” returns the right answer, and it will even give a full and complete response when asked to write code for carrying out maximum likelihood estimation:
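To give a flavor of the kind of code it produces, here is a minimal sketch along those lines, fitting the mean and standard deviation of a normal distribution by maximum likelihood (the data, starting values, and variable names are my own hypothetical choices, not ChatGPT’s exact output):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical data: 500 draws from a normal distribution with unknown parameters
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf  # keep the optimizer out of the invalid region
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Minimize the negative log-likelihood to get the MLE of (mu, sigma)
result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
print(result.x)  # should land close to [2.0, 1.5]
```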
In this way, ChatGPT could be used as a very patient tutor for helping you learn to implement advanced statistics in a scientific computing environment — very good!
I tried it out with Python on some basic machine learning tasks, where it needed a bit of nudging to get things right. My query was “use Python to find a line that best separates two sets of points.” It told me to use PCA and gave me an example, which isn’t correct for this task. I specified:
Then it arrived at a reasonable answer: support vector machines. That’s fine, but SVMs require labeled training data, which I did not say I had. I specified that I needed an unsupervised method. I went back and forth with it for a little while as it tried a few different approaches, some with errors that could have been due to differences in sklearn versions. It eventually solved the problem using the sklearn module that implements spectral clustering:
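For reference, here is a minimal sketch of that kind of solution using sklearn’s SpectralClustering (the synthetic blob data is my own stand-in for the points I actually used):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs

# Two well-separated groups of points; no labels are given to the algorithm
X, _ = make_blobs(n_samples=200, centers=2, random_state=0)

# Unsupervised split into two clusters
labels = SpectralClustering(n_clusters=2, random_state=0).fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.show()
```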
This was a good enough solution for the simple case of linearly separable data. But what about a more difficult case, for example, concentric circles? Here was ChatGPT’s solution:
I kept telling it that it wasn’t right, and it just gave me the same thing, substituting a different example dataset each time (it cycled through moons, then blobs, then circles again) — never getting it right. Eventually, I asked whether initial parameters might matter for the result. It gave me a long discussion on the choice of parameters and then showed me two examples of spectral clustering with different settings for the “affinity” parameter in SpectralClustering (the default settings at left, and using nearest neighbors at right):
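Here is a minimal sketch of that comparison, assuming synthetic data from make_circles; the parameter values are illustrative rather than ChatGPT’s exact output:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

# Concentric circles: the clusters are not linearly separable
X, _ = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

# The default affinity (an RBF kernel) often fails on this geometry
default_labels = SpectralClustering(n_clusters=2, random_state=0).fit_predict(X)

# A nearest-neighbors affinity follows the ring structure instead
knn_labels = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", n_neighbors=10, random_state=0
).fit_predict(X)

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].scatter(X[:, 0], X[:, 1], c=default_labels)
axes[0].set_title("affinity='rbf' (default)")
axes[1].scatter(X[:, 0], X[:, 1], c=knn_labels)
axes[1].set_title("affinity='nearest_neighbors'")
plt.show()
```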
ChatGPT seemed to know a lot about the importance of tuning parameters for spectral clustering, but it didn’t get to it on the first try, or even second or third. ChatGPT needed significant nudging. It felt like I was teaching it what to do, rather than vice versa. To be clear, the instructions for carrying out this task are literally contained — verbatim — on the website documenting sklearn.
It would have been a lot faster and easier to get an example from the sklearn documentation, freely available online.
Next, I gave it a more open-ended query:
For this, it provided three reasonable approaches of varying complexity: thresholding, adaptive thresholding, and blob detection. They all relied on the cv2 package. When I said “without cv2”, it gave me a ton of other algorithms for carrying out the task. Whether they would run without some debugging isn’t as important as the fact that ChatGPT gave me a ton of ideas and ways to implement them. Here, I’d say ChatGPT did a great job. So while I’m sure that ChatGPT will save me a lot of finger calories when I’m carrying out basic tasks or starting a new project, it clearly doesn’t have the sophisticated knowledge of data analysis that you would need in most areas of scientific computing. I’m not worried about it taking coding work away from anyone in the lab.
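To make those three cv2 approaches concrete, here is a minimal sketch, assuming a task like segmenting cells in a grayscale microscopy image (the file name and parameter values are hypothetical; the cv2 functions themselves are real):

```python
import cv2

# Hypothetical input: a grayscale image of cells saved as "cells.png"
img = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE)

# 1. Global thresholding; Otsu's method picks the cutoff automatically
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 2. Adaptive thresholding, useful when illumination is uneven
adaptive = cv2.adaptiveThreshold(
    img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2
)

# 3. Blob detection (default parameters look for dark blobs on a light background)
detector = cv2.SimpleBlobDetector_create()
keypoints = detector.detect(img)
print(f"Detected {len(keypoints)} blobs")
```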
Practical lab tasks: Suggestions to be checked
For practical tasks like preparing solutions or carrying out an experimental method, ChatGPT will offer understandable instructions, but these should never be a substitute for the methods printed in a paper. I emphasize “understandable” because, while a journal’s methods sections are exhaustive and comprehensive, the results ChatGPT provides will be readable but not necessarily complete (and potentially inaccurate, as we saw above). Unlike a journal, though, ChatGPT lets you ask follow-up questions or request clarification. I imagine this could be especially useful for scientists who are not native English readers, who might benefit from asking ChatGPT to reword a confusing sentence or swap an unfamiliar word for a more familiar one.
I asked ChatGPT about the optimal concentrations for several chemical reagents, and its answers were accurate. It also told me to perform pilot experiments. It even provided a recipe for making artificial cerebrospinal fluid, a standard neuroscience lab concoction. The recipe it gave is roughly equivalent to one you’d find at this Cold Spring Harbor website,⁴ which raises the question of why you wouldn’t just use the website, since we know Cold Spring Harbor is a reputable source.
When asked about the best antibodies to use in an experiment, ChatGPT provided an accurate and comprehensive answer. I would say ChatGPT performed well with searching for lab materials. In general, ChatGPT can be effective as a personal shopper.
There are more obscure antibodies that it did not tell me about, and I am certain I could have found the same results using Google, although it might have required more clicking through websites, dismissing cookie requests, and wading through irrelevant information.
Setting up a laboratory: Comprehensive and correct
As you may have already realized, ChatGPT is incredibly chatty. It will give you a lot of details, which can be helpful when you are trying to learn about many aspects of a topic. This makes it useful for answering general questions about setting up a laboratory:
ChatGPT also gave me a reasonable estimate for a budget for my hypothetical rodent electrophysiology lab:
The numbers are all very realistic, and it even gives ideas for specific parts. Again, ChatGPT can be very useful if you have money to burn or you’re writing a hypothetical budget, but this is something your grants office could probably help you do more accurately. ChatGPT also knows about grants, both public and private, even for specific types of research. When asked about private grants to fund electrophysiology research, it returned the Brain & Behavior Research Foundation (NARSAD), McKnight, the Alzheimer’s Association, Whitehall, and Simons as potential funders — nothing made up!
Creating instructional materials: Excellent
Occasionally, researchers will have to design courses, and ChatGPT really shines here. It can easily generate a variety of different syllabi for even specific types of seminars. Here’s one:
When I asked it for readings, it gave me a list of readings for each week, all of which actually exist. I suspect it has learned from a large corpus of overlapping syllabi and is assembling fragments of them. I asked it for discussion questions, and it provided me this comprehensive list:
It also gave me correct answers to those questions. As long as you don’t get too specific about what you want to cover, this could save you a ton of time when you’re teaching a course.
Conclusion
One of the most important things to keep in mind when using ChatGPT is that it is not a search engine. It creates responses based on statistical regularities in its “knowledge base”. This is why ChatGPT will invent statements that you will never actually find on any website or in any document. You’ll find it helpful for rephrasing a sentence you’ve written, but not for finding out what a specific scientific paper contains. ChatGPT does not fact-check itself; it doesn’t even check its own links or run the code it gives you. Thus, ChatGPT should be seen as a time-saving assistant for lab tasks, compiling information for you to assess, and never as an authoritative source.
What comes next?
ChatGPT is nowhere near usurping the skills or knowledge of a scientist, or even of a coder. The productive attitude toward ChatGPT and future AI in the sciences is to learn the technology’s limitations and use it carefully, freeing up your time to hone the skills needed to fill the gaps the AI leaves.
Even with improvements in the technology, subjective measures like beauty, clarity, or utility remain squarely in the realm of human intelligence. In science, this means that ChatGPT cannot produce a highly memorable and understandable figure, poster, or conference presentation; a grant that answers exactly the right question at the right time for your field; or a software package that will be used by many. Additionally, for specific tasks, ChatGPT has minimal reasoning abilities.
If there are inconsistencies in the literature, ChatGPT will not ponder them, weigh the quality of the evidence, or make logical deductions about how to resolve inconsistencies based on the premises its knowledge contains.
You can ask it to hypothesize, but it will only recombine the previous literature in predictable (and strange, or even very incorrect) ways. It won’t figure out the solutions to open questions in science. For now, ChatGPT is less like a wise senior scientist and more of an autocompleting Wikipedia that can code a lot of things somewhat well. In other words, a combination of technologies that have recently been developed along with some slight improvements. These are the realities that must be remembered as we familiarize ourselves with this new technology.
Lastly, here’s a chart to show how you might think about which tasks are appropriate and safe for ChatGPT. You’ll want to use ChatGPT for tasks where there is high agreement among humans and where the cost of getting it wrong is low. There are grey areas, where you should exercise care and consult with peers or your supervisor. Then there is the “Danger Zone”: tasks where you should certainly consult and work with others.
If you made it down here, here is my summary of how ChatGPT fared on a variety of science uses:
Writing: 5/5
Literature search: 1/5 (avoid!)
Proprietary and open-source software: 0/5 (hard pass!)
Data analysis and coding: 4/5
Guiding practical lab tasks (including shopping): 3/5
Setting up a lab: 4/5
Creating course materials: 5/5
As there is definitely no substitute for crowdsourcing knowledge, please feel free to use the comments section to add your own uses of ChatGPT or other types of AI inside (and outside) the laboratory!
References:
[1] The Bibliomagician. “AI-based citation evaluation tools: good, bad or ugly?” https://thebibliomagician.wordpress.com/2020/07/23/ai-based-citation-evaluation-tools-good-bad-or-ugly/ (2020).
[2] Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).
[3] Parnaudeau, Sebastien and O’Neill, Pia-Kelsey, et al. “Inhibition of mediodorsal thalamus disrupts thalamofrontal connectivity and cognition.” Neuron 77.6 (2013): 1151–1162.
[4] “ACSF Recipe.” Cold Spring Harbor Protocols (2007). http://cshprotocols.cshlp.org/content/2007/2/pdb.rec10804.full?text_only=true