32 minutes to read - Nov 22, 2023

How to build a Sustainable ChatGPT Server Architecture?

At last week’s press conference, OpenAI announced the release of the GPT-4 model. The biggest improvement of GPT-4 compared to its previous versions is its multimodal capability: it can not only read text but also recognize images.

It is worth noting that although there were previous reports stating that GPT-4 has one quadrillion parameters, OpenAI did not confirm this number. Instead, OpenAI emphasized GPT-4’s multimodal capability and performance in various tests.

According to OpenAI, GPT-4 performed better than most humans in various benchmark tests. For example, on the United States Uniform Bar Exam, the Law School Admission Test (LSAT), and the math and evidence-based reading and writing sections of the SAT, GPT-4 scored higher than 88% of test-takers.

In addition, OpenAI is collaborating with several companies to integrate GPT-4 into their products, including Duolingo, Stripe, and Khan Academy. At the same time, the GPT-4 model will be available to paying ChatGPT Plus subscribers and, in the form of an API, to developers who can use it to build various applications. Microsoft also announced that the new Bing search engine would run on the GPT-4 system.

During the press conference, the presenter drew a rough sketch on a notepad and asked GPT-4 to create a website based on the sketch, including generating the website code. Impressively, GPT-4 generated the complete website code in just 10 seconds, achieving a one-click website generation effect.

The high-performance operation of ChatGPT depends on a stable server-side architecture. Establishing a sustainable server-side architecture can not only ensure the stability and reliability of ChatGPT but also help reduce energy consumption, lower costs, and support the sustainable development strategy of the enterprise. Therefore, this article will explore how to build a sustainable high-performance server-side architecture for ChatGPT.

 Is ChatGPT a breakthrough or an extension of AI?

ChatGPT is a natural language processing technology that can generate more realistic and natural conversations based on existing data through model training. Its development naturally extends the previous progress in AI but also achieves breakthroughs in certain aspects.

ChatGPT has substantially stronger conversation-generation capabilities. In earlier AI technologies, conversation generation was often based on rules and patterns, which faced significant limitations. By training on extensive language data, however, the GPT series models can generate more realistic and natural conversations that are flexible and adaptable to different conversation scenarios.

ChatGPT’s training method has also changed. In the past, human participation was usually required in the data annotation process to enable the machine to understand the meaning of human language. However, the GPT series achieves more realistic and vivid conversation generation through unsupervised training, learning the rules and structures of human language expression from vast amounts of language data.

The GPT series models also have advantages in handling multilingual and multi-scene conversations. Traditional AI technologies mainly adapt to a single scene and lack language diversity. Because they are trained on multiple languages, however, the GPT series models can hold conversations in different languages and meet the needs of different conversation scenarios.

 ChatGPT’s Development and Prospects in Two Years
OpenAI developed GPT-4 for more than three years before its release. GPT-4 is likely to bring significant efficiency improvements, but the specific new capabilities it will deliver are still uncertain. It is known that, compared to GPT-3.5, GPT-4 will address important issues such as optimizing the ratio of parameters to data, improving the efficiency of information processing and pattern recognition, and enhancing the quality of information inputs. The inference cost may also be significantly reduced (possibly by a factor of 100). However, the size of the model and whether it will have multimodal capabilities are still unknown.
① ChatGPT currently faces many problems, but most of them have relatively simple engineering solutions. For example:
The “hallucination” problem (ChatGPT tends to produce inaccurate outputs) can be mitigated by optimizing for accuracy and introducing search data, and humans can participate in the judgment process to increase accuracy. In addition, when applying ChatGPT, assistance can be provided by making preliminary judgments about the usage scenario.
The issue of ChatGPT’s limited memory can be worked around using the open interfaces provided by OpenAI. A particularly impressive solution involves explaining to ChatGPT that the provided content is only part of all the information before prompting it to answer.
ChatGPT’s self-review capability is based not only on rules but also on understanding, which makes it more adjustable. OpenAI has also proposed a vision of allowing ChatGPT to adjust how it speaks according to need while respecting basic rules.
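As a rough illustration of the “feed the content in parts” workaround just described, here is a minimal Python sketch. The chunking logic and the send_to_chat_api helper are hypothetical placeholders standing in for whatever chat-completion client is actually used; they are not OpenAI’s official interface.

```python
# Minimal sketch of the "feed long content in parts" workaround for limited context.
# send_to_chat_api is a hypothetical placeholder for a chat-completion client;
# it is NOT OpenAI's real API.

def chunk_text(text: str, chunk_size: int = 2000) -> list[str]:
    """Split a long document into roughly chunk_size-character pieces."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def send_to_chat_api(messages: list[dict]) -> str:
    """Placeholder: call your chat model here and return its reply."""
    raise NotImplementedError

def ask_about_long_document(document: str, question: str) -> str:
    messages = [{"role": "system",
                 "content": "You will receive a document in several parts. "
                            "Do not answer until all parts have arrived."}]
    chunks = chunk_text(document)
    for i, chunk in enumerate(chunks, start=1):
        messages.append({"role": "user",
                         "content": f"Part {i}/{len(chunks)} of the document:\n{chunk}"})
    messages.append({"role": "user", "content": f"All parts sent. Now answer: {question}"})
    return send_to_chat_api(messages)
```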
② The cost of ChatGPT is expected to decrease significantly, especially in terms of inference, which is likely to decrease by more than two orders of magnitude.
Sam Altman has previously stated in public that the inference cost for ChatGPT is just a few cents per message. Furthermore, according to detailed research by Jefferies Research in “Key Takes from ChatGPT and Generative AI.pdf”, ChatGPT’s inference most likely runs on idle x86 CPUs rather than GPUs.
Based on our understanding of inference and the optimization potential of large language models, a significant decrease in inference cost is highly probable. Lower cost means a wider range of applications and a greater ability to collect data. Even if the number of ChatGPT users reaches the billion-DAU level (the current estimate of 100 million DAU is imprecise), it could be offered for free with some limitations on usage. For example, New Bing once limited search frequency to 60 times but has since removed this restriction. These real-life conversations will further strengthen ChatGPT’s competitive advantage.
③ Regarding the “capability” sub-model of ChatGPT, it may need to be retrained. 
However, the “knowledge” sub-model only requires the input of new knowledge through instructive prompting, without the need to modify the existing pre-trained model.
For many subtasks, as long as ChatGPT has the ability to understand and a sufficient amount of knowledge, its performance can be continuously adjusted through dialogue, guidance, and education, allowing it to exhibit new abilities in various subtasks. In contrast, previous AI technologies needed to retrain models when faced with new tasks, and could not simply input new knowledge like ChatGPT.
If we use Iron Man 3 as an analogy, ChatGPT is like a general-purpose suit of armor that can handle most tasks. Through “education” and “guidance,” ChatGPT can perform a variety of tasks in multiple domains, such as providing medical advice, legal references, code framework writing, marketing plan development, psychological counseling, and serving as an interviewer.
The importance of prompting deserves emphasis. Microsoft’s New Bing did not make significant modifications to ChatGPT but instead used prompting to guide ChatGPT in conducting reasonable searches. Beyond prompting, if one wants to prioritize certain aspects, such as sacrificing dialogue continuity to improve information accuracy, then retraining the model and making adjustments may be necessary. This may require integration with other capability modules, such as search and interfaces with other models, and the fusion of some tools, just like those specialized suits of armor. In short, by constantly refining ChatGPT’s capabilities and using tools, its application scope can be expanded, and more possibilities can be unlocked.
④ Over time, we predict that ChatGPT’s self-service prompting capabilities will be greatly improved, and more features will gradually be opened up.
This not only has obvious commercial advantages but also allows users to gradually train their own ChatGPT to adapt to their preferences and learn unique knowledge (rather than just being limited to skill prompts). Additionally, even while ChatGPT’s model remains closed source, competition at different application layers can still develop and improve, addressing the concern that third parties can only build UI layers on top of OpenAI. Imagine a scenario where your ChatGPT can record all your conversations with it and gradually learn from your feedback. If you are an excellent marketing manager, over time your ChatGPT will also acquire marketing skills superior to those of others.
⑤ GPT-4 is expected to significantly enhance the capabilities of ChatGPT, enabling it to perform at the level of a “good employee” in multiple domains.
The recent paradigm shift has demonstrated the significant difference between New Bing and ChatGPT. We have good reason to believe that GPT-4 will make huge strides in the following areas:
Larger models, more data, and optimized parameter and data ratios. The direction of these optimizations is clear: more parameters and data can lead to more powerful models, but the ratio between them must be appropriate to ensure the model can fully absorb the knowledge from the data.
More targeted training datasets. OpenAI’s ability to generate high-quality data is almost unparalleled, and after years of experimentation with GPT-3, they have a better understanding of which types of data are most useful for enhancing specific model capabilities (such as reading more code and adjusting the ratio of multiple languages).
Possible integration of “ability modules”. New Bing extends ChatGPT’s capabilities by incorporating search functionality. Is there a way to integrate search functionality directly into a pre-trained large-scale model? Similarly, consideration should be given to how other capabilities can be efficiently integrated into pre-trained large-scale models, and how they can be trained for a wider range of scenarios. Therefore, we predict that in the next two years, ChatGPT based on GPT-4 will be able to reach the level of a “level 9” employee in most scenarios, with stronger induction and “understanding” abilities.
Exploring the Capability Barrier between ChatGPT and GPT
ChatGPT’s barriers come from the following aspects:
① The first source of ChatGPT’s barriers is that GPT-3 is closed-source
OpenAI is very cautious and unlikely to open-source ChatGPT. Therefore, the path of replicating ChatGPT domestically by building on an open-source model seems unrealistic.
② Increasing the model parameters requires strong engineering capabilities
It also requires ensuring that large models can effectively learn knowledge from big data. OpenAI’s blog emphasizes the importance of addressing these issues in order to train models whose outputs meet human needs. Engineers with a “principled” thinking habit are needed to tackle these engineering bottlenecks, and OpenAI’s high talent density has reportedly allowed it to overcome many of them. The next step of engineering accumulation therefore has to be built on the foundation of previous engineering breakthroughs.
③ In specific business environments, practicality is emphasized
Take ByteDance’s recommendation algorithm model, for example, which is very large and technically demanding. However, continuous optimization of existing patterns cannot produce a paradigm shift, and in a real business environment, if the model cannot provide positive feedback to the business, its development will be greatly hindered.
④ Leadership’s technical judgment is a scarce resource
The successful combination of New Bing and ChatGPT is seen as a rare miracle, surpassing other players in the market. This is not a replicable model and such expertise is hard to come by.
⑤ The data flywheel has formed
ChatGPT is one of the most successful consumer products, and combined with Microsoft’s resources and channels it has gained a strong foothold. Its usage data can therefore continuously improve the model itself. OpenAI’s blog also emphasizes this unique mechanism, which enables a closed loop for the use, understanding, and production of data.
ChatGPT: A New Tool for the Future AI Era
The growth of ChatGPT’s DAU is phenomenal, and user feedback indicates its exceptional usefulness. While ChatGPT has significant entertainment value, its ability to improve productivity is even more prominent. Dialogue and reading are actually higher-level forms of entertainment, and in most cases, richness and depth are not the main determining factors of entertainment value. Therefore, we recommend focusing more on improving productivity when using ChatGPT.
Additionally, it is important to remember that ChatGPT is a disruptive product, not an incremental improvement. Early adopters of technology may already be unable to live without ChatGPT. For the general public, however, even opening a search engine to search is not a universal habit, and conversing with ChatGPT using clear, well-structured prompts is even less common. Therefore, in the next few years, ChatGPT will replace more SaaS, cloud, and efficiency tools such as search engines.
In practical application scenarios, we should follow two principles: targeted treatment and choosing the best course of action. ChatGPT is not equivalent to search engines and programs. We should let it play to its strengths rather than try to use it to replace other, more efficient tools or services. Additionally, given ChatGPT’s well-known hallucination problem, we should remain vigilant and not blindly trust its conclusions in all situations. Instead, wherever accuracy matters, a person should examine the truthfulness of its conclusions.
✅ Exploring the essential differences between ChatGPT and humans
Due to the insufficient maturity of neuroscience and cognitive science, we can only explore the similarities and differences between humans and ChatGPT from a philosophical perspective.
Firstly, in terms of judgment, ChatGPT can only extract digital signals from the virtual world and cannot interact with the real world. Building the foundation of judgment requires practical experience.
Secondly, if ChatGPT relies solely on digital signals to make inferences, it is likely to produce incorrect conclusions. For example, Newton discovered the law of universal gravitation by observing an apple falling and predicting the movement of the stars. At that time, however, many people believed that the sun revolved around the earth; if there had been a ChatGPT at that time, it would likely have produced the wrong conclusion. Therefore, distinctly human modes of thinking, such as moments of “inspiration” and “enlightenment,” remain significant in everyday life.
Thirdly, ChatGPT may be better than humans at summarizing and generalizing existing knowledge. However, creating new knowledge that does not exist on the internet is something ChatGPT cannot do.

Fourthly, in terms of understanding humans, humans can comprehend humanity without relying on research, questionnaires, or internet data. Furthermore, through real-world experience, humans can bring incremental understanding about humanity, which ChatGPT cannot achieve. This suggests that to truly understand humanity, one must engage in practical experience in the real world rather than blindly repeating what others say.
✅ Exploring ChatGPT’s Demands for Computing Power
The demand for computing power in AI models is mainly reflected in the training and inference stages. Currently, mainstream AI algorithms can usually be divided into two stages: “training” and “inference”. According to CCID Consulting, in 2022, China’s digital economy exhibited strong growth momentum, with a year-on-year increase of 20.7%, which was 2.9 percentage points higher than that of 2021. 
The scale of the digital economy reached 45.5 trillion yuan, more than twice the size of Germany, ranking second in the world in terms of digital economic development level. In recent years, China has also been actively promoting the acceleration of digital innovation capabilities, accelerating the upgrading of industry digitization, and narrowing the gap with the United States in digital economic competitiveness.
① Training Stage
The process of adjusting and optimizing the AI model to achieve the expected accuracy. To make the model more accurate, the training stage usually requires processing a large amount of data sets and using iterative calculations, which requires a lot of computing resources.
② Inference Stage
Compared with the training stage, the computing power requirement is not as high, but since the trained AI model needs to be used multiple times for inference tasks, the total amount of computation in inference is still considerable.
The computing requirements for ChatGPT can be further divided into three stages based on the actual application: pre-training, fine-tuning, and daily operation. In the pre-training stage, a large amount of unlabeled text data is used to train the model’s basic language ability, resulting in basic large models such as GPT-1, GPT-2, and GPT-3. In the fine-tuning stage, supervised learning, reinforcement learning, and transfer learning are used to optimize and adjust the model parameters based on the basic large model. In the daily operation stage, the model parameters are loaded based on the user input information, and inference calculations are performed to provide feedback and output the final results.
ChatGPT is a language model whose architecture is based on the Transformer. The Transformer architecture consists of encoder and decoder modules, but GPT uses only the decoder module. Each decoder block combines three components: a self-attention layer, a self-attention mask, and a feedforward neural network, which work together to make the model efficient.
Self-attention is one of the most important parts of the Transformer, and its main function is to calculate the weights (i.e., attention) of each word to all the other words in the input sequence. This allows the model to better understand the inherent relationships within the text and efficiently learn the relationships between inputs. The self-attention layer also allows the model to perform larger-scale parallel computations, greatly improving computational efficiency.
The feedforward neural network layer provides efficient data storage and retrieval. At this level, the model can effectively handle large-scale datasets and perform efficient computations.
The mask layer is used to filter out unseen words on the right-hand side of the self-attention mechanism. This masking allows the model to only attend to the text that has already been presented, ensuring computational accuracy.
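To make the attention and masking description above concrete, here is a minimal NumPy sketch of scaled dot-product self-attention with a causal mask. The shapes and weights are illustrative toy values, not GPT’s actual implementation.

```python
import numpy as np

def causal_self_attention(x: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention with a causal (look-back-only) mask.

    x: (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_head) projection matrices
    """
    Q, K, V = x @ Wq, x @ Wk, x @ Wv                  # project inputs to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise attention scores
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -1e9                               # hide "future" words on the right-hand side
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                # weighted sum of value vectors

# Toy usage: 4 tokens, 8-dimensional embeddings, 8-dimensional head.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = causal_self_attention(x, *(rng.normal(size=(8, 8)) for _ in range(3)))
print(out.shape)  # (4, 8)
```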
Compared to previous deep learning frameworks, the Transformer architecture has significant advantages. The parallel computing ability of the Transformer architecture is stronger, greatly improving computational efficiency. This allows GPT to train larger and more complex language models and better address language processing problems.
Based on previous data, it is estimated that daily operations require approximately 7034.7 PFlop/s-day of computing power. User interaction also requires computing power support, at a cost of approximately $0.01 per interaction. According to ChatGPT’s official website, the total number of visits in the past month (January 17 to February 17, 2023) was 889 million. Therefore, the operational computing cost paid by OpenAI for ChatGPT in January 2023 was approximately $8.9 million. Additionally, Lambda has stated that the computing cost to train a GPT-3 model with 175 billion parameters is over $4.6 million, while OpenAI has stated that the computing power required to train such a model is approximately 3640 PFlop/s-day. Assuming a constant unit computing cost, ChatGPT’s monthly computing power requirement for daily operation is approximately 7034.7 PFlop/s-day.
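These figures can be cross-checked with a few lines of arithmetic. The sketch below simply re-derives the article’s own numbers (889 million visits, about $0.01 per interaction, $4.6 million for 3640 PFlop/s-day of training); treat them as the article’s assumptions rather than verified data.

```python
# Back-of-envelope reproduction of the operating-cost estimate quoted above.
visits_per_month = 889e6          # visits from Jan 17 to Feb 17, 2023 (per the article)
cost_per_interaction = 0.01       # dollars per interaction (per the article)
monthly_cost = visits_per_month * cost_per_interaction
print(f"Monthly operating cost: ${monthly_cost/1e6:.1f}M")                # ~ $8.9M

# Unit cost of compute, derived from the quoted GPT-3 training figures.
train_cost = 4.6e6                # dollars for one pre-training run
train_compute = 3640              # PFlop/s-day for one pre-training run
dollars_per_pflops_day = train_cost / train_compute

# Implied monthly compute for daily operation, assuming the same unit cost.
monthly_compute = monthly_cost / dollars_per_pflops_day
print(f"Implied operating compute: {monthly_compute:.1f} PFlop/s-day")    # ~ 7034.7
```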
ChatGPT is a model that requires continuous fine-tuning to ensure it is in the best possible state for the application. This tuning process involves developers adjusting model parameters to ensure output is not harmful or distorted, and conducting large-scale or small-scale iterative training based on user feedback and PPO strategies. The computational power required for this process will bring costs to OpenAI, and the specific computational requirements and cost amount depend on the speed of the model’s iterations.
It is estimated that ChatGPT requires at least 1350.4 PFlop/s-day of computational power for fine-tuning per month. According to IDC’s estimates for China’s AI server load in 2022, the ratio of inference to training is 58.5% to 41.5%. Assuming ChatGPT’s computational requirements for inference and training follow this distribution, knowing that monthly operation requires 7034.7 PFlop/s-day and that one pre-training run requires 3640 PFlop/s-day, and further assuming that at most one pre-training run is done per month, we calculate that the computational power required for ChatGPT’s monthly fine-tuning is at least 1350.4 PFlop/s-day.
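The 1350.4 PFlop/s-day figure follows directly from the split just described; here is a short sketch of the arithmetic, again using the article’s own assumptions.

```python
# Reproduce the fine-tuning estimate from the inference/training split quoted above.
inference_share, training_share = 0.585, 0.415   # IDC's 2022 China AI-server load split
monthly_inference = 7034.7                       # PFlop/s-day for daily operation
pretraining = 3640.0                             # PFlop/s-day for one pre-training run per month

total_compute = monthly_inference / inference_share          # ~ 12025 PFlop/s-day
monthly_training = total_compute * training_share            # ~ 4990 PFlop/s-day
monthly_finetuning = monthly_training - pretraining          # training minus one pre-training run
print(f"Monthly fine-tuning compute: {monthly_finetuning:.1f} PFlop/s-day")  # ~ 1350.4
```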
With the parameter count growing more than tenfold, from GPT-1’s 117 million to GPT-2’s 1.5 billion, there was a leap in performance. This seemed to indicate that as capacity and parameter size increased, there was even greater potential for model performance. Thus, in 2020, the parameter count of GPT-3 increased a further hundredfold to 175 billion, with its pre-training data reaching 45 TB (GPT-2 used 40 GB and GPT-1 around 5 GB). The massive parameter count does give GPT-3 stronger performance, and it performs well on downstream tasks. Even in complex NLP tasks, GPT-3 is impressive: it can imitate human writing and produce SQL queries, React, or JavaScript code. Looking back at the development of GPT-1, GPT-2, and GPT-3, many people have high hopes for GPT-4, and there are even rumors that GPT-4 will have as many as 100 trillion parameters.
Given that the model adjustment process may need to be repeated many times due to human feedback mechanisms, the cost of computational power required may be even higher.
✅ What are the types of ChatGPT servers?
① Current situation of server development in China
Countries are accelerating the development of the digital economy, traditional industries are undergoing digital transformation, and enterprises have a strong demand for digital and intelligent solutions. The rapid development of new technologies such as 5G, big data, and artificial intelligence continuously empowers the server industry.
High computing power demand drives the server industry to embrace new development opportunities.
As a core productivity factor, computing power is applied in various fields such as the internet, government, finance, etc. With the emergence of new concepts such as metaverse and Web 3.0, more complex computing scenarios generate high computing power demands, driving server products to upgrade towards higher computing performance.
The accelerated construction of large-scale data centers boosts the growth of the server market. The construction of large-scale data centers is the main driving force for global server market growth, and the procurement of data center servers continues to increase in most regions around the world, such as North America, Asia, and Europe.
② ChatGPT requires two types of servers: AI training servers and AI inference servers.
Internet services now require a large number of machines to handle high-load requests, and the traditional client-server (CS) model can no longer meet this demand. The internet architecture has therefore been shifting toward a Cloud-Edge-Service (CES) model built on Content Delivery Network (CDN) services. However, the CES model has limitations in processing and storing unstructured data at the edge, which is why edge computing was introduced. In AI training scenarios, changes in computational intensity and data types make the CES model insufficient, so the computational architecture is returning to the CS model and evolving toward efficient parallel computing.
As the hardware core, servers face different computing scenarios, and changes in computing architecture are the key driver of server technology evolution. With the emergence of cloud computing, edge computing, and AI training, server demands are constantly changing. Standalone servers focus more on individual performance, while cloud data center servers focus more on aggregate performance. Edge computing demands more real-time data exchange and more server facilities. AI servers are mainly used for AI training, use vector/tensor data types, and improve efficiency through large-scale parallel computing.
In the same technology roadmap, servers are continuously iterated toward data processing requirements. Looking back at the development history of mainstream servers, different types of servers have different driving forces as data volume increases and data scenarios become more complex. Specifically:
Traditional general-purpose servers have developed relatively slowly, mainly optimizing their performance by improving hardware indicators such as processor clock frequency, instruction set parallelism and core count. In comparison, cloud computing servers have developed rapidly and matured. This process began in the 1980s and was accelerated by the launch of products such as VMware Workstation, Amazon AWS, and the emergence of the OpenStack open-source project. Currently, cloud computing is widely used worldwide, and many companies use popular cloud service providers (such as AWS, Azure, Google Cloud, etc.) to store and process data.
 The concept of edge computing servers was incubated in 2015 and has since seen the emergence of edge computing platforms such as AWS Greengrass and Google GMEC. As more and more devices (such as wearable devices and smart home devices) connect to the Internet, the demand for edge computing technology is increasing. Finally, AI servers are tailored for AI and machine learning workloads, and their hardware architecture is more suitable for training computing power requirements. As AI applications become more widespread, the demand for AI servers is also increasing.
③ Cloud computing servers have brought about a significant business model shift in large-scale data processing requirements.
The emergence of cloud computing servers was to meet the high-performance computing needs brought about by the explosive growth of data. Traditional general-purpose servers rely on improving hardware specifications to enhance performance, but as CPU technology and the number of cores per CPU approach their limits, they cannot meet the performance needs of the explosive growth of data. In contrast, cloud computing servers use virtualization technology to pool computing and storage resources, virtualize and centralize individual computing resources that were originally physically isolated, and achieve high-performance computing through clustering. In addition, the computing power of cloud computing servers can be expanded by increasing the number of virtualization servers, breaking through the hardware limitations of single servers, and meeting the performance needs brought about by the explosive growth of data.
Cloud computing servers actually save some hardware costs and reduce the threshold for purchasing computing power. In the past, the cost of large-scale data processing was extremely high, mainly due to the high purchase and operation costs of general-purpose servers. Traditional servers usually include a complete set of devices such as processor modules, storage modules, network modules, power supplies, and fans. The architecture of cloud computing servers is simplified, eliminating redundant modules and improving utilization. In addition, cloud computing servers are designed for energy-saving needs, virtualizing storage modules and removing unnecessary hardware from the motherboard to reduce overall computing costs. Furthermore, the traffic billing model has helped many vendors bear the cost of computing power and reduced the threshold for purchasing computing power.
④ Edge server: Low latency with high data density and bandwidth constraints
Edge computing is a computing model that introduces an edge layer on the basis of cloud computing. It is located at the network edge close to the physical or data source and provides resources such as computing, storage, and networking to assist applications. Edge computing is based on a new architecture that introduces an edge layer, allowing cloud services to be extended to the network edge. In this architecture, the terminal layer consists of IoT devices, which are located closest to the users, and responsible for collecting raw data and uploading it to the upper layer for processing. The edge layer consists of devices such as routers, gateways, and edge servers, which are closer to the users and can run latency-sensitive applications, meeting the requirement of low latency for users. The cloud layer consists of high-performance servers and other devices that can handle complex computing tasks.
Compared to cloud computing, edge computing has advantages in terms of real-time performance, low cost, and security. It moves part or all of the computing tasks from the central cloud to the network edge closer to the user for processing, thereby improving data transmission performance and real-time processing. At the same time, edge computing can avoid the cost problem of long-distance data transmission and reduce the computing load of the cloud computing center. In addition, edge computing processes most of the data on local and edge devices, reducing the amount of data uploaded to the cloud and lowering the risk of data leakage, thus having higher security.
⑤ AI server: More suitable for deep learning and other AI training scenarios
In the modern field of AI, the computational demands of large-scale models have surpassed the capabilities of traditional CPU servers. Compared to CPUs, GPUs (Graphics Processing Units) have architectures that are better suited for large-scale parallel computing, making them the primary choice for AI servers to improve computational performance.
AI servers are heterogeneous servers, unlike general-purpose servers. This means that they can use different combinations of hardware to improve computational performance, such as CPU+GPU, CPU+TPU, or CPU plus other accelerator cards, with the GPU as the primary means of providing computing power.
Taking the ChatGPT model as an example, it uses parallel computation to build context for every token in the input sequence. Compared with RNN models, this not only yields higher accuracy but also processes the whole input at once rather than one word at a time.
From the perspective of GPU computing, GPU architecture uses a large number of computing units and extremely long pipelines, which allows for high throughput parallel computing compared to CPUs. This computational power is especially suited for large-scale parallel computing in AI.
Deep learning primarily involves matrix and vector calculations, for which AI servers offer higher processing efficiency. The ChatGPT model architecture is based on the Transformer framework: it employs attention mechanisms to assign weights to words in the text and a feedforward neural network to output numerical results, a process that requires a large number of vector and tensor operations. AI servers usually integrate multiple AI GPUs that support matrix operations such as convolution, pooling, and activation functions to accelerate deep learning computations. Therefore, in the context of artificial intelligence, AI servers tend to offer higher computational efficiency and have certain application advantages over GPU servers.
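A small PyTorch sketch illustrates why such matrix-heavy workloads favor GPU-centric AI servers: it simply times one large matrix multiplication on the CPU and, if a GPU is available, on the GPU. The matrix size is an arbitrary illustrative value.

```python
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    """Time a single n x n matrix multiplication on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()          # wait for the asynchronous GPU kernel to finish
    return time.perf_counter() - start

print(f"CPU: {time_matmul('cpu'):.3f} s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.3f} s")   # typically orders of magnitude faster
```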
⑥ ChatGPT requires the following chips: CPU+GPU, FPGA, and ASIC
The training of GPT models requires significant computational power and may lead to increased demand for AI server infrastructure. As domestic manufacturers continue to invest in ChatGPT-like products, the pre-training, fine-tuning, and daily operation of large GPT models may require significant computational resources, driving growth in the domestic AI server market. For example, OpenAI reports that training a GPT-3 175B model requires approximately 3640 PFlop/s-day of computational power. Assuming the calculations are run on the most powerful AI server currently available from Inspur Information, the NF5688M6 (PFlop/s), a single vendor would need to purchase 243, 146, or 73 AI servers for pre-training periods of 3, 5, or 10 days, respectively.
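Since the per-server throughput figure is missing from the text above, the sketch below assumes roughly 5 PFlop/s per NF5688M6-class server, a value inferred from the quoted server counts rather than an official specification, and reproduces those counts.

```python
import math

pretraining_compute = 3640.0     # PFlop/s-day for one GPT-3 pre-training run (per the article)
per_server_pflops = 5.0          # ASSUMED sustained throughput per AI server (PFlop/s);
                                 # inferred from the quoted counts, not an official spec

for days in (3, 5, 10):
    servers = math.ceil(pretraining_compute / (per_server_pflops * days))
    print(f"{days:2d}-day pre-training: {servers} servers")   # 243, 146, 73
```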
The demand for training AI models with large computational requirements is driving growth in the intelligent computing market, which is expected to lead to an increase in AI server infrastructure. According to IDC data, China’s intelligent computing capacity in 2021 was approximately 155.2 EFLOPS, calculated using half-precision (FP16) computational capabilities. As AI models become more complex, computational data volume rapidly increases, and AI applications deepen, China’s intelligent computing capacity is expected to grow rapidly in the future. IDC predicts that China’s intelligent computing capacity will increase by 72.7% YoY to 268.0 EFLOPS in 2022 and reach 1271.4 EFLOPS by 2026, with a CAGR of 69.2% from 2022 to 2026. We believe that AI servers, as the main infrastructure for carrying out intelligent computing operations, are expected to benefit from downstream demand growth.
✅ Final Words
ChatGPT is a high-performance AI service that requires a sustainable server-side architecture to support its continued development. Here is a simple guide:
I. Understand customer requirements
Before building any server-side architecture, it is important to understand customer requirements (a back-of-envelope sizing sketch follows the list below). Some of the questions to consider include:
User count: How many users are expected to use the service?
Data volume: How much data will each user store? How much data is expected to be processed by the service?
Device types and platforms: What devices and platforms will users use to access the service?
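Answering these questions up front allows a rough capacity estimate before choosing infrastructure. The sketch below uses entirely made-up example numbers; substitute your own requirements.

```python
# Back-of-envelope sizing from the requirement questions above (all numbers are illustrative).
daily_active_users = 100_000
requests_per_user_per_day = 20
peak_to_average_ratio = 5          # traffic is bursty, so provision for peaks
requests_per_server_second = 50    # assumed sustained throughput of one app server
storage_per_user_gb = 0.05         # conversation history per user

avg_rps = daily_active_users * requests_per_user_per_day / 86_400
peak_rps = avg_rps * peak_to_average_ratio
app_servers = max(2, round(peak_rps / requests_per_server_second) + 1)  # +1 for headroom/failover
storage_tb = daily_active_users * storage_per_user_gb / 1024

print(f"Average load: {avg_rps:.0f} req/s, peak: {peak_rps:.0f} req/s")
print(f"App servers needed: {app_servers}, storage: {storage_tb:.1f} TB")
```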

II. Choose the right infrastructure
Choosing the right infrastructure is crucial to building a sustainable server-side architecture. Some common choices include:
Physical servers: This is the classic way of running servers locally. It requires purchasing server hardware and managing infrastructure.
Virtual Private Servers (VPS): VPS is a virtual server that runs on a shared physical server. Most cloud service providers offer VPS.
Cloud computing: Cloud computing allows you to gradually scale infrastructure up and down based on actual usage. Some providers include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
III. Design a scalable architecture
When designing your server-side architecture, you need to consider how to scale it to handle more traffic and users. Some key considerations include:
Horizontal scaling: This involves adding more servers to the system to handle more traffic and users.
Vertical scaling: This involves upgrading existing servers with more powerful hardware to handle more traffic and users.
Load balancing: This involves distributing requests to multiple servers to reduce load.
Caching: This involves storing the results of requests in memory to improve response time (a minimal sketch combining load balancing and caching follows this list).
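As promised above, here is a minimal sketch combining round-robin load balancing with an in-memory response cache. The backend addresses and the model-call placeholder are hypothetical.

```python
import itertools
from functools import lru_cache

# Hypothetical pool of identical backend servers (horizontal scaling).
BACKENDS = ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]
_backend_cycle = itertools.cycle(BACKENDS)

def pick_backend() -> str:
    """Round-robin load balancing: spread requests evenly over the pool."""
    return next(_backend_cycle)

@lru_cache(maxsize=10_000)
def cached_answer(prompt: str) -> str:
    """Cache identical prompts in memory so repeated requests skip the model call."""
    backend = pick_backend()
    # Placeholder for the real call to the model server; replace with your own client.
    return f"response from {backend} for: {prompt[:30]}"

print(cached_answer("What is a sustainable server architecture?"))
print(cached_answer("What is a sustainable server architecture?"))  # served from the cache
```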
IV. Ensure security and reliability
When building any server-side architecture, security and reliability are crucial. This means you need to consider the following:
Data backup and recovery: You need to regularly back up data to prevent data loss and be able to quickly recover data if necessary.
Security: You need to ensure that your server-side architecture is secure, including using secure transmission protocols and encrypting data.
Monitoring and alerts: You need to set up monitoring and alert systems to be notified promptly when servers have problems.
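As a minimal example of the monitoring point above, the sketch below polls hypothetical health-check endpoints using only the Python standard library; in production you would typically rely on dedicated monitoring and alerting tooling instead.

```python
import logging
import time
import urllib.request

# Hypothetical health-check endpoints for the servers in the pool.
HEALTH_URLS = ["http://10.0.0.1:8000/healthz", "http://10.0.0.2:8000/healthz"]
logging.basicConfig(level=logging.INFO)

def check_once() -> None:
    for url in HEALTH_URLS:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status != 200:
                    logging.warning("Unhealthy: %s returned %s", url, resp.status)
        except Exception as exc:          # connection refused, timeout, DNS failure, ...
            logging.error("Alert: %s unreachable (%s)", url, exc)

if __name__ == "__main__":
    while True:                           # poll every 30 seconds; feed the log into your alerting system
        check_once()
        time.sleep(30)
```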
A sustainable ChatGPT high-performance server-side architecture needs to consider multiple factors, including customer requirements, infrastructure choices, scalability design, and security and reliability guarantees. It can be achieved by assessing these factors comprehensively and taking appropriate measures.
Article source
Author: Leo Zhi