I am going to diverge from my normal approach, where I put the bottom line on top. This time, I am putting the bottom line on the bottom. You will have to carefully read a disjointed set of thoughts building to a conclusion to understand why I started writing this. NUMBER 7 WILL SHOCK YOU.
Attention is All You Need
The world changed back in 2017 with the publication of the paper “Attention is All You Need” by Vaswani et al., but the world didn’t know it yet. The paper introduced the Transformer model. The T in ChatGPT, well, the capital T. The paper describes a novel neural network architecture that has since become the foundation for many state-of-the-art natural language processing (NLP) models, including BERT, GPT, and their successors. Do you even remember what you were doing in 2017? Thanks to keeping a blog, I have a second brain to remind me. I built a naive image classification algorithm to detect hotdogs or not hotdogs from video, and I also went back to college to get a mathematics degree. I wanted to learn the math behind AI, why it worked, and how I could be a part of the tsunami.
“The next breakthrough in AI tech will not take six years to break through.”
Hans Scharler
Before the Transformer model, recurrent neural networks (RNNs) and long short-term memory (LSTM) networks were the dominant approaches for sequence-to-sequence tasks in NLP, such as machine translation. However, these models had limitations, particularly in their ability to handle long-range dependencies in sequences and their slower training times due to their sequential nature.
Here’s why Transformer models are different.
- Self-attention mechanism: Instead of relying on recurrence or convolutions, the Transformer uses a self-attention mechanism to capture the dependencies between words in an input sequence. This mechanism allows the model to weigh the importance of each word in the context of the entire sequence, making it more effective at handling long-range dependencies. (There’s a quick code sketch of these mechanisms right after this list.)
- Positional encoding: Since the Transformer model does not have an inherent sense of position or order of words in the sequence, the authors introduced positional encoding. This technique adds a unique vector to each input token’s embedding, representing its position in the sequence. This allows the model to learn positional relationships between words.
- Multi-head attention: The Transformer uses multiple self-attention “heads” to allow the model to focus on different aspects of the input simultaneously. This improves the model’s ability to capture complex relationships between words.
- Layer-wise parallelism: Unlike RNNs and LSTMs, which process input tokens sequentially, the Transformer processes all input tokens simultaneously. This enables greater parallelism during training, leading to faster training times and better scalability.
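To make those four ideas a little more concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, sinusoidal positional encoding, and multi-head attention. The function names, toy dimensions, and random matrices standing in for learned projection weights are my own illustrative choices, not anything from the paper’s code; a real Transformer also adds training, masking, residual connections, feed-forward layers, and layer normalization.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: gives each position a unique vector."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                   # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dims use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dims use cosine
    return pe

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)            # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Every token attends to every other token in one matrix multiply."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)     # (..., seq, seq)
    weights = softmax(scores, axis=-1)                 # attention weights
    return weights @ V, weights

def multi_head_attention(x, num_heads, rng):
    """Split the model dimension into heads so each head can focus on a
    different aspect of the sequence, then concatenate the results."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Random (untrained) matrices stand in for learned projection weights.
    W_q, W_k, W_v, W_o = (rng.normal(scale=0.02, size=(d_model, d_model))
                          for _ in range(4))
    Q = (x @ W_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    K = (x @ W_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    V = (x @ W_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    out, _ = scaled_dot_product_attention(Q, K, V)     # all heads at once
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)  # concat heads
    return out @ W_o

rng = np.random.default_rng(0)
seq_len, d_model = 6, 16                               # toy "sentence" of 6 tokens
embeddings = rng.normal(size=(seq_len, d_model))       # stand-in token embeddings
x = embeddings + positional_encoding(seq_len, d_model) # inject word order
print(multi_head_attention(x, num_heads=4, rng=rng).shape)  # (6, 16)
```

Notice that nothing in the sketch loops over the sequence one token at a time; the whole sentence is handled as a batch of matrix multiplications, which is exactly the parallelism that lets Transformers train so much faster than RNNs and LSTMs.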
Even though Transformer models were a good idea and could already produce coherent words and ideas, it took time for the tech to break through. That’s how innovation works. It takes time, gradual progress, adjacent tech catching up, and a collective of people keeping their attention on the space until there is a breakthrough or at least something useful emerges.
When you invent a new technology, you uncover a new class of responsibilities
All innovation comes with a set of ethical, social, and environmental obligations that must be considered and addressed. As new technologies are developed, they can potentially bring about significant changes and have far-reaching consequences, both positive and negative. It is the responsibility of the creators, developers, and users of these technologies to ensure that they are used in ways that promote the greater good and minimize harm.
Ethical considerations play a significant role in the development and use of new technologies. It’s essential to make sure that innovations align with moral principles and values, respecting human rights, privacy, and dignity. Simultaneously, it’s crucial to assess the social impact of new technologies, evaluating how they might affect society, economic systems, and cultural norms. Potential issues such as the digital divide, job displacement, and the reinforcement of social inequalities should be taken into account.
The environmental impact of new technologies is another important responsibility. Developers should consider the ecological consequences, such as energy consumption, waste generation, and overall sustainability, throughout the technology’s entire life cycle. Implementing measures to minimize harm to the environment is vital. Additionally, ensuring the safety and security of new technologies is critical, as is the need for appropriate measures to protect against misuse, hacking, or other threats.
Collaborating with policymakers and regulators is necessary to establish suitable laws and guidelines for the responsible development and use of new technologies. Educating the public about the technology and its potential risks and benefits is essential, along with promoting responsible use and digital literacy.
If the technology confers power, it starts a race
Remember when OpenAI was open? They started very open, but when they noticed how powerful GPT-3 was, they closed up and started building a business. It started a race. They had something, and they were going to monetize it. And, yes, they do have something and it is powerful. Now, there are hundreds of companies investing in this space.
When a new technology offers significant advantages or capabilities, it often triggers a competitive race among individuals, organizations, or even nations to acquire and exploit it. This race can lead to rapid advancements and widespread adoption of said technology, but it might also result in unintended consequences or exacerbate existing inequalities.
Consider the invention of the smartphone, a powerful technology that transformed the way we communicate, work, and access information. As smartphones gained popularity and their potential became apparent, a competitive race among tech companies ensued. These companies strived to create the most innovative, feature-rich, and user-friendly devices in an attempt to capture the largest market share.
While this race led to remarkable advancements in smartphone technology, it also had some downsides. The fierce competition drove companies to prioritize quick product releases, often at the expense of rigorous testing or addressing potential security vulnerabilities. This, in turn, left some devices prone to hacks or malfunctions, putting users’ data and privacy at risk. Additionally, the race for market share created an uneven playing field, as some companies established a dominant position, leaving smaller or newer competitors struggling to catch up.
I can see this happening with AI tech too. AI has the potential to confer significant power and influence in various aspects of our lives and business, from healthcare and finance to education and national security. This has prompted a competitive sprint among tech companies, researchers, and nations to develop and harness AI’s capabilities.
To ensure a more equitable and responsible development of AI technology, it is crucial to recognize the challenges posed by this competitive race and foster collaboration, open communication, and shared goals among stakeholders. By doing so, we can harness the transformative potential of AI and steer it towards a future that benefits all of humanity.
If you do not coordinate, the race ends in tragedy
The lack of cooperation, communication, or organization can lead to disastrous consequences. This can apply to various situations where teamwork and coordination are crucial to achieving a successful outcome.
Imagine a group of friends who decide to participate in a boat race. They have an impressive boat and each of them possesses a unique skill set that could help them win the race. However, they neglect to properly coordinate their roles, discuss their strategies, or practice together. They all assume that their individual talents will be enough to secure victory.
As the race begins, the friends quickly realize that their lack of coordination is hindering their progress. Each person tries to steer the boat in a different direction, causing confusion and chaos. The rowers are out of sync, and their oars keep getting tangled, slowing them down. Some team members attempt to take charge and shout conflicting instructions, while others become frustrated and disengage from the effort.
As the race continues, the boat drifts off course and starts taking on water. Despite their individual abilities, the lack of coordination among the team members ultimately leads to their downfall. Their boat capsizes, and they watch from the water as the well-coordinated teams pass them by, working together seamlessly to reach the finish line.
You knew my boat story had nothing to do with boats!? Maybe I was talking about AI tech. As AI systems become increasingly sophisticated and integrated into various aspects of society, it is essential for researchers, developers, policymakers, and users to work together to ensure responsible and ethical development and deployment. A lack of coordination could lead to unintended consequences, misuse, or exacerbation of existing inequalities. By fostering a spirit of cooperation, open communication, and shared goals, the AI community can harness the transformative potential of this technology and steer it towards a future that benefits all of humanity, rather than letting it become a tragedy of uncoordinated efforts.
Community is all you need.
If we thought of AI tech as a community effort, we would not only care about ourselves individually (or a small, select few), we would care about the community itself.
“Community is all you need.”
Hans Scharler
My seminal paper, “Community is All You Need” by Scharler et al., centers on the idea that collaborative efforts and a strong sense of community are crucial for the responsible development and deployment of AI solutions moving forward. Drawing a parallel to the AI paper “Attention is All You Need,” it emphasizes the importance of fostering a community-centric approach in the AI ecosystem.
Just as the attention mechanism in the Transformer model allows for capturing dependencies between words in a sequence, fostering a sense of community in the AI field can create stronger connections and dependencies among researchers, developers, policymakers, and users. By encouraging open communication, collaboration, and the sharing of ideas, the AI community can work together to tackle complex challenges, overcome potential biases, and ensure that AI technology benefits a diverse range of stakeholders.
In the same way that the Transformer model leverages multi-head attention to focus on different aspects of the input simultaneously, a strong AI community can benefit from the diverse perspectives and expertise of its members. By bringing together individuals from different backgrounds, disciplines, and sectors, the community can collectively address the multifaceted ethical, social, and technical challenges posed by AI technology.
The parallel processing nature of the Transformer model, where it processes all input tokens simultaneously, can also be seen as an analogy for the importance of inclusivity and collective decision-making in the AI community. By giving equal weight to the input and concerns of all stakeholders, the community can ensure that the development and deployment of AI technology are driven by a broad consensus that reflects the needs and values of society as a whole.
Attention is all you need, until community is all you need for AI to move forward. This groundbreaking idea from Hans Scharler emphasizes the critical role of collaboration, diversity, and shared goals in the responsible development of AI technology. By nurturing a strong sense of community and fostering open dialogue among all stakeholders, the AI ecosystem can harness its collective expertise and creativity to steer AI toward a future that is both equitable and beneficial for all of humanity. And, profitable for a lot of people.