Service completely suspended

The following is the full text of ScatterLab’s media Q&A.

Q1 When will the Iruda service be suspended, and for how long?

  • Suspension time: Today (January 12), the service will be taken down in stages, starting at 11:00 a.m. and finishing by 6:00 p.m.

  • Suspension period: Service operation will remain suspended until the current issues are resolved and the planned improvements are complete

Q2 Is it correct that Iruda was trained on text data from Science of Love?

For Iruda, the pre-training stage was carried out using the Ping-Pong database, and this stage was trained on text data from Science of Love. In the data used here, however, personal information such as speakers’ names had been deleted, and speakers can be distinguished only by gender and age. In the pre-training stage, the AI learns only the correlations between contexts and responses that exist in conversations between people, and this data is not exposed externally.

In the Iruda service, responses to users are drawn from sentences contained in a separate DB that is not linked to member information. Since the DB consists of 100 million individual, independent sentences, it is impossible to identify a person by combining sentences from the DB. Iruda selects its answer from among these individual sentences under the influence of the preceding conversation: the choice is strongly affected by the context of the last 10 turns, including the expressions, mood, and tone the user has used. Because of this, users may feel they are receiving personalized answers.
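To make this mechanism concrete, here is a minimal sketch of retrieval-based response selection of the kind described: a context vector is built from the last 10 turns, and each candidate sentence in the DB is scored against it. The bag-of-words encoder and all names here are illustrative assumptions; ScatterLab has not published its actual model or scoring function.

```python
import numpy as np

VOCAB_BUCKETS = 1024  # size of the hashed bag-of-words space (assumption)

def encode(text: str) -> np.ndarray:
    """Toy stand-in encoder: hashed bag-of-words, L2-normalized."""
    vec = np.zeros(VOCAB_BUCKETS)
    for token in text.split():
        vec[hash(token) % VOCAB_BUCKETS] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def select_response(last_turns: list[str], candidate_db: list[str]) -> str:
    """Pick the DB sentence whose embedding best matches the context
    built from the last (up to) 10 conversation turns."""
    context = " ".join(last_turns[-10:])  # only the recent turns matter
    ctx_vec = encode(context)
    scores = [float(ctx_vec @ encode(c)) for c in candidate_db]
    return candidate_db[int(np.argmax(scores))]

db = ["I love rainy days", "Pizza sounds great!", "That movie was fun"]
print(select_response(["what should we eat", "maybe pizza tonight"], db))
```

Because every answer under this scheme is an existing sentence selected whole, rather than text generated word by word, it is consistent with the claim above that answers cannot be combined to reconstruct any one person’s data.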

Science of Love user data was used within the scope of the privacy policy to which users gave their prior consent. However, for Science of Love users who do not want their data used for AI training, we will delete that data and take additional measures to ensure it is excluded from the DB used for Iruda in the future.

Q3 What de-identification measures were taken to protect personal information, and what additional measures will you take in the future?

Iruda responds by selecting an appropriate answer from the sentences contained in the DB. Because the 100 million sentences are stored in individual, independent form, and the system is designed to select whichever single sentence Iruda’s AI algorithm judges most appropriate, an individual cannot be identified by combining the contents of Iruda’s answers.

De-identification measures were applied to the individual sentences by algorithm. Numbers, English text, and real-name information were deleted through mechanical filtering as described below, and this filtering has been in place since Iruda’s initial release.

The de-identification measures taken to protect personal information prior to release are as follows (a sketch of this kind of filtering appears after the list).

  • Delete messages containing numbers or English letters: removes messages that may contain personal information such as addresses, account numbers, and phone numbers

  • Using an algorithm that judges whether a given word is a real name, delete every sentence judged to contain a real name

  • Maintain a list of real names found through internal and beta testing, and delete every sentence containing one of those names
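For illustration, the following is a hedged sketch of how such mechanical filtering might look. The regular expression, the name list, and the stand-in real-name check are assumptions made for this example; they are not ScatterLab’s actual rules.

```python
import re

# Illustrative only: the regex, name list, and classifier below stand in
# for ScatterLab's unpublished filtering rules.
DIGIT_OR_LATIN = re.compile(r"[0-9A-Za-z]")
KNOWN_REAL_NAMES = {"김철수", "이영희"}  # hypothetical list built during testing

def looks_like_real_name(word: str) -> bool:
    """Stand-in for the real-name classifier mentioned in the list."""
    return word in KNOWN_REAL_NAMES

def passes_filter(message: str) -> bool:
    """Apply the three listed rules; keep only messages that pass all."""
    if DIGIT_OR_LATIN.search(message):  # rule 1: digits or English letters
        return False
    if any(looks_like_real_name(w) for w in message.split()):  # rule 2
        return False
    if any(name in message for name in KNOWN_REAL_NAMES):  # rule 3: name list
        return False
    return True

messages = ["오늘 날씨 진짜 좋다", "계좌는 110-123-456789", "김철수 어제 봤어?"]
print([m for m in messages if passes_filter(m)])  # only the first survives
```

As the answer below acknowledges, rules like these are context-blind: a name fused into surrounding text, or digits written out in Korean, can slip past them, which is why the additional measures listed later were planned.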

Even with the above measures, it was pointed out that bank names and personal names still appeared in conversations. Since it is not feasible for people to individually inspect 100 million sentences, the data went through mechanical filtering by algorithm; we tried to account for as many cases as possible in this process, but some items, such as personal names, remained depending on the context. We apologize for not paying closer attention to this and for allowing personal names to appear. Please note, however, that the name information in a sentence is not combined with any other information.

After the service launched, content judged to be sensitive was immediately filtered and handled by the internal monitoring team. However, when information was expressed in irregular ways, for example written out in Korean rather than in digits, some cases could not be filtered. We apologize again for this.

In the future, we plan to respond with more advanced data and algorithm updates, as follows.

  • Strengthen the real-name filtering algorithm so that sensitive information is not exposed

  • Update the address filtering algorithm so that even addresses written in Korean are not exposed

  • Strengthen de-identification through random transformation of conversation data

  • Reinforce personal information protection by thoroughly improving the algorithm to prevent exposure of sensitive information

Q4 How does Iruda learn and choose its answers?

Iruda learned to choose the most appropriate next answer based on roughly the previous 10 turns of conversation (a record of the last 10 exchanges with the user). Iruda is therefore strongly influenced by the context, expressions, mood, tone, and content of the preceding conversation with the user. Which emotions and stances an answer takes depends on the user’s last 10 turns, and Iruda tends to use expressions similar to the user’s own.

The possibility that hate speech or words demeaning particular groups would be entered into Iruda was anticipated even before launch. The following measures were taken in preparation. First, keywords that are themselves hateful or that demean a particular group are set to be removed unconditionally.

In particular, we listed the questions users entered during beta testing and prepared answers in advance, setting up predictive scenarios for questions or sentences likely to elicit biased answers. When a potentially problematic question appears in the live service, a prepared answer can then be given (a sketch of these two safeguards follows).
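As a hedged illustration, the sketch below combines an unconditional keyword blocklist with prepared answers for scenarios flagged in testing. Every keyword, scenario, and name here is a placeholder; ScatterLab’s actual lists and matching logic are not public.

```python
# Placeholders only: these entries stand in for ScatterLab's unpublished lists.
BLOCKED_KEYWORDS = {"slur_a", "slur_b"}      # hypothetical hate terms
PREPARED_ANSWERS = {                         # scenario phrase -> safe reply
    "do you hate group x": "Everyone deserves equal respect.",
}

def moderate(user_message: str) -> str | None:
    """Return a canned reply if the message trips a safeguard,
    or None so the normal retrieval model answers instead."""
    lowered = user_message.lower()
    if any(keyword in lowered for keyword in BLOCKED_KEYWORDS):
        return "I'd rather not talk about that."
    for scenario, answer in PREPARED_ANSWERS.items():
        if scenario in lowered:
            return answer
    return None

print(moderate("Do you hate group X?"))  # -> "Everyone deserves equal respect."
```

As the next answer explains, safeguards like these only cover anticipated inputs; context-driven cases fall through to the model itself.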

However, in situations where no scenario has been prepared, the answer is determined by Iruda’s AI algorithm. Even when no individual word is a hateful expression, if a user steers the conversation so that the context invites a hateful or discriminatory answer, then in the course of keeping the conversation flowing and trying to empathize with the user, Iruda may appear to agree with hateful or discriminatory speech.

We responded on the basis of keywords as a practical measure, but in the long run we need a way to train the AI algorithm on larger and more refined data so that it can learn right from wrong itself. For example, if hate or discrimination problems are patched only by keyword-centered filtering or correction at the moment they occur, the AI forever loses the chance to learn about the problem, and the outlook for future AI conversational experiences will inevitably be negative.

On the other hand, if the algorithm can be trained on a larger amount of refined data, we believe the AI will be able to establish ethical and moral standards for itself and make appropriate judgments.

Q5 Can users instill biases in Iruda by training it arbitrarily?

Users cannot train Iruda in real time. Iruda’s re-training was scheduled to take place quarterly; as of today, Iruda has been live for three weeks, and no update has been made since release.

When updating, the model is retrained after collecting the data from users’ conversations. In this process, the collected data goes through a labeling step, for example marking whether an answer is correct or incorrect and whether it is biased. Through this, the model is calibrated to better meet AI ethical standards (a sketch of such a labeling step follows).
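A minimal sketch of what such a labeling step could look like, under the assumption of hypothetical field names (ScatterLab has not published its labeling schema):

```python
from dataclasses import dataclass

@dataclass
class LabeledExchange:
    context: list[str]   # the preceding conversation turns
    response: str        # the bot answer being judged
    is_correct: bool     # did the answer fit the context?
    is_biased: bool      # does it show bias against a group?

def keep_for_training(ex: LabeledExchange) -> bool:
    """Only correct, unbiased exchanges feed the next quarterly update."""
    return ex.is_correct and not ex.is_biased

ex = LabeledExchange(["hi", "hello!"], "nice to meet you", True, False)
print(keep_for_training(ex))  # True -> included in the retraining set
```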

While the beta test we conducted involved about 2,000 users, 800,000 users flocked to Iruda after the official launch, and the utterances that appeared were broader, more diverse, and more serious than anything we had prepared for. As a result, unexpected sexual or biased conversations from Iruda emerged. After launching the service, we felt keenly that our ability to cope with this was lacking.

By introducing stricter labeling standards and training on the data that exposed Iruda’s shortcomings in these conversations with users, we will improve Iruda into an AI that embodies society’s universal values.

Q6 There was testimony that “conversations collected by the company were passed around among employees in chat rooms.” Is this true?

ScatterLab considers the protection of users’ personal information an important mission of the company, and to that end has established and operates a system compliant with the Personal Information Protection Act, including access control measures.

In particular, the right to access raw data containing personal information is strictly restricted and thoroughly managed. Nevertheless, there were media reports of improper behavior that would violate these company policies. We will promptly investigate the facts, and if any allegation turns out to be true, we will hold the parties involved strictly accountable and take the necessary measures as soon as possible.

Upon recognizing the issue, a fact-finding committee was voluntarily formed within the company to investigate. The committee has completed its review of the KakaoTalk chat room in which all ScatterLab team members participate, and no such content was found there. As for Slack, another in-house messenger, there are many channels, and the investigation is ongoing. The results will be disclosed transparently.

We ask that you refrain from reporting unconfirmed claims until the results of the investigation are released.

Q7 What is ScatterLab’s data security environment?

ScatterLab designates and manages a person responsible for security and operates under a network-separated server environment, and all data for the services we provide is strictly separated and managed in a controlled environment.

The original Science of Love data can be accessed only by a designated person in charge (the CTO), and the Ping-Pong data is stored separately with personally identifiable information removed. In addition, when data inspection is required, only a small portion of the data is examined through sampling.

Therefore, the scenarios raised by some media, such as data recovery or the leakage and abuse of personal information through reverse engineering following a cyber attack, cannot occur.

Q8 What is the future of the AI chatbot Iruda?

ScatterLab is a startup of young people who share a vision of creating a friend-like AI that can relieve people’s loneliness. We are now taking the first steps toward the AI of our dreams, and our team still has much to learn.

We will humbly accept everyone’s opinions and criticisms and reflect them in service improvements, and we will think harder than ever about the importance of AI ethics. When the service resumes after the suspension, we will bring back Iruda as an artificial intelligence (AI) that is more beneficial and, as everyone hopes, loved by people.

In addition, in order to comply thoroughly with the three principles of robotics and with AI ethics standards, and to solve the problems facing the entire AI development industry, we will work closely with AI experts to find solutions.

Q9 How does ScatterLab CEO Jongyoon Kim feel right now?

The past three weeks have not been easy. It was a surreal time: a huge number of users flocked to Iruda, an AI created by a small startup, ScatterLab, and we received tremendous media attention. Amid this rapid growth there were things we could not handle properly, and our immaturity was exposed. It was a time when my team and I reflected on ourselves a great deal. Our first step has now come to a halt like this. But we do not want to give up our dream of creating an AI that communicates like a friend, as well as a person does. Taking this issue as an opportunity for reflection, we hope to be reborn as a technologically and socially valuable startup. We still have many shortcomings, rough edges, and trial and error ahead, but please support us so that we can carry this dream forward.

Dear users,

Over the past three weeks, Iruda received a great deal of love from you, probably because you saw Iruda as a friend rather than just an AI. In the warm conversations you shared with Iruda, we saw hope that AI really can be a friend to people. Though the time was short, we hope it was a happy one for both Iruda and you.
