<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1683-0789</journal-id>
<journal-title><![CDATA[Acta Nova]]></journal-title>
<abbrev-journal-title><![CDATA[RevActaNova.]]></abbrev-journal-title>
<issn>1683-0789</issn>
<publisher>
<publisher-name><![CDATA[Universidad Católica Boliviana]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1683-07892021000100003</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Multilayer and convolutional neural networks for Bolivian Sign Language recognition: an empirical evaluation]]></article-title>
<article-title xml:lang="es"><![CDATA[Redes neuronales multicapa y convolucionales para el reconocimiento del lenguaje de señas boliviano: una evaluación empírica]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Rodríguez Villarroel]]></surname>
<given-names><![CDATA[Juan Pablo]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Ponce de León Espinoza]]></surname>
<given-names><![CDATA[Nicolás]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Arteaga Sabja]]></surname>
<given-names><![CDATA[Wendoline]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Universidad Católica Boliviana 'San Pablo', Departamento de Ciencias Exactas e Ingenierías]]></institution>
<addr-line><![CDATA[Cochabamba ]]></addr-line>
<country>Bolivia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>03</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>03</month>
<year>2021</year>
</pub-date>
<volume>10</volume>
<numero>1</numero>
<fpage>22</fpage>
<lpage>41</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.bo/scielo.php?script=sci_arttext&amp;pid=S1683-07892021000100003&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.bo/scielo.php?script=sci_abstract&amp;pid=S1683-07892021000100003&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.bo/scielo.php?script=sci_pdf&amp;pid=S1683-07892021000100003&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[The deaf community is a social stratum that faces many struggles in daily life, chiefly because of communication difficulties with the general public. Each country has its own sign language, as is the case of Bolivian Sign Language (BSL); however, only a few people know it. Different approaches have been proposed to perform gesture recognition and help people translate sign language into a particular language, including neural networks. However, little is known about the effectiveness of neural networks for detecting Bolivian Sign Language (BSL). This paper proposes and evaluates the use of two neural network techniques, multilayer (MLP) and convolutional (CNN), to recognize Bolivian Sign Language. Our approach takes as input the most significant frames from a video, selected using a motion-based algorithm, and applies a border detection algorithm to the selected frames. We present an experiment in which we evaluate these techniques using 60 videos of four basic BSL phrases. As a result, we found that MLP has an accuracy ranging between 65% and 88%, while CNN ranges from 95% to 99%, depending on the number of neurons and internal layers used.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[La comunidad de sordos es un estrato social con muchas luchas en la vida diaria, principalmente a causa de dificultades de comunicación con el público en general. Cada país tiene su propia lengua de signos, como es el caso de la Lengua de Signos Boliviana (BSL); sin embargo, pocas personas la conocen. Se han propuesto diferentes enfoques para realizar reconocimiento de gestos y ayudar a las personas a traducir el lenguaje de señas a un idioma en particular, incluidas las redes neuronales. Sin embargo, se sabe poco sobre la efectividad de las redes neuronales para detectar el lenguaje de señas boliviano (BSL). Este artículo propone y evalúa el uso de dos técnicas de redes neuronales, multicapa (MLP) y convolucional (CNN), para reconocer el lenguaje de señas boliviano. Nuestro enfoque toma como entrada los fotogramas más significativos de un video utilizando un algoritmo basado en movimiento y aplicando un algoritmo de detección de bordes en los fotogramas seleccionados. Presentamos un experimento en el que evaluamos estas técnicas utilizando 60 videos de cuatro frases BSL básicas. Como resultado, encontramos que MLP tiene una precisión que varía entre 65% y 88%, y CNN varía entre 95% y 99%, dependiendo del número de neuronas y capas internas utilizadas.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[multilayer neural network]]></kwd>
<kwd lng="en"><![CDATA[convolutional neural networks]]></kwd>
<kwd lng="en"><![CDATA[computer vision]]></kwd>
<kwd lng="en"><![CDATA[sign language recognition]]></kwd>
<kwd lng="en"><![CDATA[BSL]]></kwd>
<kwd lng="es"><![CDATA[red neuronal multicapa]]></kwd>
<kwd lng="es"><![CDATA[redes neuronales convolucionales]]></kwd>
<kwd lng="es"><![CDATA[visión por computadora]]></kwd>
<kwd lng="es"><![CDATA[reconocimiento de lenguaje de signos]]></kwd>
<kwd lng="es"><![CDATA[BSL]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[ <p align="right"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Art&iacute;culo Cient&iacute;fico</b></font></p>     <p align="right">&nbsp;</p>     <p align="center"><b><font face="Verdana, Arial, Helvetica, sans-serif" size="4">Multilayer and convolutional neural networks for Bolivian Sign Language recognition: an empirical</font> <font face="Verdana, Arial, Helvetica, sans-serif" size="4">evaluation</font></b></p>     <p align="center">&nbsp;</p>     <p align="center"><b><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><i>Redes neuronales multicapa y convolucionales para el reconocimiento del lenguaje de señas boliviano: una evaluación empírica</i></font></b></p>     <p align="center">&nbsp;</p>     <p align="center">&nbsp;</p>     <p align="center"><b><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Juan Pablo Rodríguez Villarroel, Nicolás Ponce de León Espinoza, Wendoline Arteaga Sabja</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"></font></b></p>     <p align="center"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Departamento de Ciencias Exactas e Ingenierías, Universidad Católica Boliviana &quot;San Pablo&quot;, Calle M. Márquez esquina Parque Jorge Trigo Andia</font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">Cochabamba, Bolivia</font></p>     <p align="center"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a href="mailto:warteaga@ucb.edu.bo">warteaga@ucb.edu.bo</a></font></p>     ]]></body>
<body><![CDATA[<p align="center"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Recibido: </b>17 de noviembre 2020    <br> <b>Aceptado: </b>23 de febrero 2021</font></p>     <p align="center">&nbsp;</p>     <p align="center">&nbsp;</p> <hr align="JUSTIFY" noshade>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Abstract: </b>The deaf community is a social stratum that faces many struggles in daily life, chiefly because of communication difficulties with the general public. Each country has its own sign language, as is the case of Bolivian Sign Language (BSL); however, only a few people know it. Different approaches have been proposed to perform gesture recognition and help people translate sign language into a particular language, including neural networks. However, little is known about the effectiveness of neural networks for detecting Bolivian Sign Language (BSL).</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">This paper proposes and evaluates the use of two neural network techniques, multilayer (MLP) and convolutional (CNN), to recognize Bolivian Sign Language. Our approach takes as input the most significant frames from a video, selected using a motion-based algorithm, and applies a border detection algorithm to the selected frames. We present an experiment in which we evaluate these techniques using 60 videos of four basic BSL phrases. 
As a result, we found that MLP has an accuracy ranging between 65% and 88%, while CNN ranges from 95% to 99%, depending on the number of neurons and internal layers used.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Keywords: </b>multilayer neural network, convolutional neural networks, computer vision, sign language recognition, BSL.</font></p> <hr align="JUSTIFY" noshade>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Resumen: </b>La comunidad de sordos es un estrato social con muchas luchas en la vida diaria, principalmente a causa de dificultades de comunicación con el público en general. Cada país tiene su propia lengua de signos, como es el caso de la Lengua de Signos Boliviana (BSL); sin embargo, pocas personas la conocen. Se han propuesto diferentes enfoques para realizar reconocimiento de gestos y ayudar a las personas a traducir el lenguaje de señas a un idioma en particular, incluidas las redes neuronales. Sin embargo, se sabe poco sobre la efectividad de las redes neuronales para detectar el lenguaje de señas boliviano (BSL).</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Este artículo propone y evalúa el uso de dos técnicas de redes neuronales, multicapa (MLP) y convolucional (CNN), para reconocer el lenguaje de señas boliviano. Nuestro enfoque toma como entrada los fotogramas más significativos de un video utilizando un algoritmo basado en movimiento y aplicando un algoritmo de detección de bordes en los fotogramas seleccionados. Presentamos un experimento en el que evaluamos estas técnicas utilizando 60 videos de cuatro frases BSL básicas. 
Como resultado, encontramos que MLP tiene una precisión que varía entre 65% y 88%, y CNN varía entre 95% y 99%, dependiendo del número de neuronas y capas internas utilizadas.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Palabras clave: </b>red neuronal multicapa, redes neuronales convolucionales, visión por computadora, reconocimiento de lenguaje de signos, BSL.</font></p> <hr align="JUSTIFY" noshade>     ]]></body>
<body><![CDATA[<p align="justify">&nbsp;</p>     <p align="justify">&nbsp;</p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>1</b></font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">&nbsp; &nbsp; &nbsp;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>Introduction</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Sign language is the main communication medium of deaf population. Sign language is a systematic language which includes fingerspelling, motions, lips reading, and another non-verbal expression. Sing language plays the important role in communication for deaf people, by tending to present the language visually with the use of signs (Ministerio de Educación Boliviano). Besides the existences of BSL sign language deaf people still have communication difficulties mainly because most of the people do not know sign language.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Diverse approaches have been proposed to improve this situation by providing people various mechanisms to help users: interpret sign language and/or translate a particular language to sign language. Such mechanisms may include particular hardware, such as gloves (Yang &amp; Chen, 2020), bracelet (Pascual, 2014), or Kinect camera (PASTOR, 2013). On the other hand, the computer vision research community proposed a number of approaches to perform gesture recognition, which may be used to translate sign language taking to a particular language, for instance, English (García &amp; Alarcon). 
However, little is known about the effectiveness of neural networks to detect <i>Bolivian</i> Sign Language.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">In this paper, we present an empirical investigation of the use of multilayer and convolutional neural networks to translate Bolivian Sign Language into Spanish, using a sample of image frames collected from a video as input. To select this sample, we use a movement-based selection technique: we capture each frame of the video in which the change in hand direction is noticeable. The algorithm compares consecutive frames and selects a frame only when the person's movement shows a sudden change in direction. To each of these selected frames we apply the border detection algorithm proposed by Canny (Canny, 1986). Finally, we analyze every representative gesture as a class for the neural network.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">In our experiment, we evaluate these two neural networks on 60 videos of four basic BSL phrases. Our experiment reveals that MLP has an accuracy ranging between 65% and 88%, while CNN ranges from 95% to 99%, for the signs we consider.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Structure. 
</b>The rest of this paper is structured as follows: Section 2 describes the two neural network techniques; Section 3 explains the video preprocessing technique used for the experiment; Section 4 describes the methodology; Section 5 presents the results obtained with both neural networks; Section 6 presents the discussion and future work; Section 7 reviews related work; Section 8 presents the conclusions drawn from the results.</font></p>     <p align="justify">&nbsp;</p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>2</b></font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">&nbsp; &nbsp; &nbsp;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>Multi-layer and Convolutional Neural Networks</b></font></p>     ]]></body>
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">This section briefly describes the two neural networks under analysis.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>2.1</b>&nbsp; &nbsp; &nbsp;<b>Multilayer perceptron neural network architecture (MLP)</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Multilayer perceptron neural networks are feedforward networks, which means they have one or more hidden layers in addition to the input and output layers (Haykin). Each of these layers has one or more processing units or neurons, and each neuron is fully connected to the ones in the previous layer and applies an activation function. Each link <i>j</i>, <i>i</i> has an associated weight <i>W<sub>j, i</sub> </i>that determines the strength and sign of the link and spreads the activation <i>a<sub>j</sub>. </i>First, each neuron <i>i </i>calculates a weighted sum of its inputs <i>in<sub>i</sub>:</i></font></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_ecuation_01.gif" width="135" height="71"></p>     <blockquote>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Equation 1: Weighted sum of inputs. </font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Source: (Russell)</font></p> </blockquote>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Next, the activation function <i>g </i>is applied to the sum, producing the output <i>a<sub>i</sub></i>, which will be propagated to the neurons in the next layer:</font></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_ecuation_02.gif" width="230" height="74"></p>     <blockquote>       ]]></body>
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Equation 2: activation function application </font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Source: (Russell)</font></p> </blockquote>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">At this point, it is important to clarify that during the weighted sum a bias weight <i>W<sub>0,j</sub> </i>is added; this bias plays the role of the real neuron's threshold, meaning the neuron will activate if the weighted sum of its inputs is positive (Russell).</font></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_01.gif" width="415" height="189"></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>2.2</b>&nbsp; &nbsp; &nbsp;<b>Convolutional neural network architecture</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Convolutional neural networks follow the same structure as a normal neural network, but they use a particular layer called the convolution layer (LeCun). Convolution is a mathematical operation used to analyze the features of an image by parts, and it is denoted by the equation below:</font></p>     <blockquote>       <p align="left"><img src="/img/revistas/ran/v10n1/a02_ecuation_03.gif" width="173" height="35"></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Equation 3: Convolution </font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Source: (Goodfellow)</font></p> </blockquote>     ]]></body>
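To make Equation 3 concrete, the following is a minimal sketch of a discrete 2-D convolution in Python with NumPy. This is our illustration, not the authors' code; as in most CNN libraries, it is written as a cross-correlation (no kernel flip), and the array sizes are illustrative:

```python
import numpy as np

def convolve2d(x, k):
    """'Valid' 2-D convolution of input x with kernel k (Equation 3),
    written as a cross-correlation, as CNN libraries usually do."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output cell is the sum of an element-wise product
            # between the kernel and one submatrix of the input.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out
```

Applying a 2x2 kernel to a 4x4 input this way yields a 3x3 feature map, which matches the shrinking effect the kernel-application example describes.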
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">In convolutional network terminology, the first argument <i>x </i>is often referred to as the input of the neural network and the second argument <i>w </i>as the kernel. The output is referred to as the feature map. The input is usually a multidimensional array of data and the kernel a multidimensional array of parameters; these arrays are known as tensors (Goodfellow). The kernel is applied to the data as in the image below:</font></p>     <p align="justify"><a name="f2"></a></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_02.gif" width="359" height="401"></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">In this example, the 2x2 kernel is applied to each submatrix of the input, and the result is a new matrix with fewer elements called the feature map. It is common to apply a further process after the convolution layer; this technique is called max-pooling and is represented in the image below:</font></p>     <p align="justify"><a name="f3"></a></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_03.gif" width="402" height="169"></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">This process consists of selecting the maximum values of the matrix resulting from the convolution layer; each value is selected from a submatrix of the feature map through a window of a defined size and stride over the feature map. 
All these max values represent the features of the initial input and form a multidimensional array; that is why, at the end of this process, a flattening step is used to obtain a one-dimensional array that will be used as the input to a normal neural network.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>2.3</b>&nbsp; &nbsp; &nbsp;<b>The Backpropagation algorithm</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">A neural network can be represented as a function <i>h(</i></font><font size="3"><i>x</i></font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><i>, </i></font><font face="Verdana, Arial, Helvetica, sans-serif">&oslash;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">), where <i>x </i>is an example represented as an integer array whose class we want to obtain, while </font><font face="Verdana, Arial, Helvetica, sans-serif">&oslash;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"> represents the parameter vector that the network will use: the weights. There are as many weights as links in the network, and trying to define the correct value of each one by hand is quite a challenging job, perhaps impossible; this is where backpropagation comes into play (Buduma). Taking the following MLP network as an example:</font></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_04.gif" width="453" height="262"></p>     ]]></body>
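The max-pooling and flattening steps of Section 2.2 can be sketched as follows. This is our simplified illustration, under the assumption of a square window with stride equal to its size; it is not the authors' implementation:

```python
import numpy as np

def max_pool(fm, size=2):
    """Keep the maximum of each non-overlapping size x size window of
    the feature map (stride assumed equal to the window size)."""
    oh, ow = fm.shape[0] // size, fm.shape[1] // size
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each pooled value is the maximum of one window of the map.
            out[i, j] = fm[i * size:(i + 1) * size,
                           j * size:(j + 1) * size].max()
    return out

# Flattening then turns the pooled multidimensional array into the
# one-dimensional vector fed to the dense layers:
#   features = max_pool(feature_map).ravel()
```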
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The algorithm has two phases. The first one takes place while the network is predicting the class of an example provided during training: the data flows through each layer until we get a vector of predictions <i>h</i></font><font size="2"><i><sub>w</sub></i></font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"> (</font><font size="3"><i>x</i></font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">) in the output layer; at this point a comparison is made between this vector and the real value of the provided example, calculating the error <i>E </i>as follows:</font></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_ecuation_04.gif" width="156" height="48"></p>     <blockquote>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Equation 4: Error calculation </font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Source: (Russell)</font></p> </blockquote>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The higher the value of <i>E</i>, the worse the network performs; conversely, the closer this value is to 0, the better the network performs. While the error value in this layer is representative and gives the information necessary for a correction, this does not happen in the hidden layers, because the error there is unknown. 
It is here that the second phase of the algorithm begins, with the weight update, starting with the weights in the output layer, as follows:</font></p>     <p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">With: <img src="/img/revistas/ran/v10n1/a02_ecuation_05.gif" width="149" height="28" align="absmiddle"></font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><img src="/img/revistas/ran/v10n1/a02_ecuation_06.gif" width="179" height="30" align="absmiddle"></font></p>     <blockquote>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Equation 5: Output's weights update </font></p>       ]]></body>
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Source: (Russell)</font></p> </blockquote>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Where </font><font size="2">&#945;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"> is the learning rate and <i>g' </i>is the derivative of the activation function. The update of the hidden neurons is based on the idea that hidden neuron <i>j</i> is responsible for a fraction of the error &#916;<sub><i>i</i></sub> of every output-layer neuron to which it is linked. In this way, the &#916;<sub><i>i</i></sub> values are divided according to the strength of each link and backpropagated as the &#916;<i><sub>j</sub></i> values, which are obtained as follows:</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><img src="/img/revistas/ran/v10n1/a02_ecuation_07.gif" width="187" height="55" align="absmiddle"></font></p>     <blockquote>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Equation 6: Delta j calculation</font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Source: (Russell)</font></p> </blockquote>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">And thus, the update of the remaining hidden weights begins:</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><img src="/img/revistas/ran/v10n1/a02_ecuation_08.gif" width="186" height="30" align="absmiddle"></font></p>     <blockquote>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Equation 7: Hidden weights update </font></p>       ]]></body>
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Source: (Russell) </font></p> </blockquote>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">This algorithm can be summarized as follows:</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">•&nbsp; &nbsp; Obtain the &#916; values for the output units, using the observed error.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">•&nbsp; &nbsp; Starting in the output layer, repeat for every layer in the network until the first hidden layer is reached (Buduma; Russell):</font></p>     <blockquote>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif"><strong>&middot;</strong></font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">Backpropagate the &#916; values to the previous layer. </font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif"><strong>&middot;</strong></font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">Update the weights between those two layers.</font></p> </blockquote>     <p align="justify">&nbsp;</p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>3</b></font><b><font face="Verdana, Arial, Helvetica, sans-serif" size="2">&nbsp; &nbsp; &nbsp;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="3">Video Preprocessing: Frame Sampling and Border Detection</font></b></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The presented approach takes a video as input, which we preprocess in two steps: first, we take representative frames of the video as a sample, and then we apply the border detection algorithm to these frames, finally sending the result as input to the neural network.</font></p>     ]]></body>
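Putting Equations 4-7 of Section 2.3 together, one backpropagation step for a small MLP can be sketched in Python. This is our simplified illustration, assuming sigmoid units, a single hidden layer, and no bias weights; it is not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, W2, alpha=0.5):
    """One training step for a one-hidden-layer MLP with sigmoid units.
    alpha is the learning rate; for the sigmoid, g'(in) = a * (1 - a)."""
    a1 = sigmoid(W1 @ x)                       # hidden activations (Eqs. 1-2)
    a2 = sigmoid(W2 @ a1)                      # output vector h_w(x)
    delta2 = a2 * (1 - a2) * (y - a2)          # output deltas from the error E
    delta1 = a1 * (1 - a1) * (W2.T @ delta2)   # backpropagated deltas (Eq. 6)
    W2 = W2 + alpha * np.outer(delta2, a1)     # output-layer update (Eq. 5)
    W1 = W1 + alpha * np.outer(delta1, x)      # hidden-layer update (Eq. 7)
    return W1, W2
```

Repeating this step over the training examples drives the error E of Equation 4 down, which is the behavior the summary above describes.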
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>3.1&nbsp; &nbsp; &nbsp;Movement-based Frame Sampling.</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The movement detection algorithm consists of comparing each frame of a video with its predecessors. If there is a large change in motion, that instant is</font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">captured and called a key point. When processing each sign, one or more key points will be obtained, and each of these will be analyzed as a class in the neural network. For example, preprocessing the sign 'Cochabamba' gives the result shown in <a href="#f5">Figure 5</a>.</font></p>     <p align="justify"><a name="f5"></a></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_05.jpg" width="539" height="272"></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Three frames were selected by the movement detection algorithm for this particular sign, which is composed of the letters 'c', 'b' and 'a' in BSL. Each of these selected frames will go through further preprocessing techniques to extract the image features.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>3.2</b>&nbsp; &nbsp; &nbsp;<b>Border Detection and Image Preprocessing.</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">For each selected frame from the video, we convert the image to grayscale, apply a blur filter, detect the borders using the Canny border detection algorithm, and rescale the image to a desired number of pixels in height and width, in this case 300 high and 300 wide. Converting an image to grayscale greatly reduces unnecessary data, since RGB matrices are no longer used to represent colors; only a single matrix of pixel values from 0 to 255, representing black-and-white levels, is needed. 
The blur filter is used to eliminate noise from the image, and then the Canny filter highlights only the borders it finds in the image; in this way, a matrix is obtained with only the values that interest us. Finally, a rescaling is performed so that all the images obtained have the same height and width, which also reduces the processing required in the neural network (Hninn, 2009). <a href="#f6">Figure 6</a> illustrates the descriptor extraction process.</font></p>     <p align="justify"><a name="f6"></a></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_06.jpg" width="485" height="387"></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The techniques used in this step are common in the field of computer vision; the use of this sequence of filters and processes ensures the feature extraction of an image (Hninn, 2009).</font></p>     ]]></body>
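The key-point selection idea of Section 3.1 can be sketched as plain frame differencing. This is a deliberately simplified stand-in of ours: the paper's algorithm detects sudden changes in movement direction, which this mean-difference test does not fully capture, and the threshold value is illustrative, not one reported by the authors:

```python
import numpy as np

def key_frames(frames, threshold=12.0):
    """Select 'key point' frames whose mean absolute pixel difference
    from the previous frame exceeds a threshold.
    NOTE: a simplified stand-in for the direction-change detection of
    Section 3.1; the threshold is an illustrative assumption."""
    keys = [0]  # keep the first frame as a reference
    for t in range(1, len(frames)):
        prev = frames[t - 1].astype(float)
        cur = frames[t].astype(float)
        if np.mean(np.abs(cur - prev)) > threshold:
            keys.append(t)
    return keys
```

Each selected frame would then go through the grayscale, blur, Canny, and 300x300 rescaling steps of Section 3.2, for example with OpenCV's cvtColor, GaussianBlur, Canny, and resize functions.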
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">We used some techniques to improve the quality of the dataset and avoid overfitting. We confirmed that the best results were obtained for the datasets generated with the data augmentation strategy (Núñez, 2017).</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">With this idea, the neural network now learns the components of a sign, for example, the three gestures that make up the sign 'Cochabamba', as shown in <a href="#f5">Figure 5</a>. These components are found with the motion detection process, which was modified to save the individual frames to a directory. After the filters are applied, each of these gestures, as a raw picture, constitutes a class for the input of the neural network.</font></p>     <p align="justify">&nbsp;</p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>4</b></font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">&nbsp; &nbsp; &nbsp;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>Experiment Setup</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">To validate the effectiveness of the proposed approaches, we designed a six-step methodology and structured it as a workflow (<a href="#f7">Figure 7</a>):</font></p>     <p align="justify"><a name="f7"></a></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_07.gif" width="550" height="154"></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">In the following, we describe each of the steps:</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Data Set. </b>Given the lack of an existing video dataset containing BSL sign examples for the networks' learning process, we proceeded to produce one. 
The first step of the process was therefore dataset acquisition, whose elements are videos containing BSL signs. For our experiment, we focus on ten signs: auto, coffee, Cochabamba, what?, thank you, hello, please, want and me. We chose these signs based on the first book-module of the Bolivian Sign Language, provided by the &quot;Ministerio de Educación Boliviano&quot;. For each sign, we produced a sample of 15 videos, making a total of 150. These videos correspond to different people performing each sign. Each video follows, as far as possible, the following guidelines:</font></p>     <blockquote>       ]]></body>
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Good lighting.</font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Only one person in the video.</font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• High contrast between the hands, face and body of the person in the video.</font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• The person should be filmed only from the waist up, since the legs are not used for any gesture.</font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• As far as possible, no objects in the background of the scene.</font></p> </blockquote>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Video Preprocessing. </b>We used the preprocessing techniques described in section 3: we apply the motion detection algorithm to the videos to select the most representative gestures of each sign, and then we use filters to highlight the image features (border detection).</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Classifier generation. </b>Machine learning techniques are used during this stage to search for a model that infers the rules for classifying future examples. In this case, the models are a multilayer and a convolutional neural network, and a few parameters are varied in order to find trends. The following parameters were used for the configurations:</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">MLP:</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Input classes. The number of gestures that represent the signs:</font></p>     <blockquote>       ]]></body>
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif"><strong>&middot;</strong></font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">Word: Cochabamba (one-hand represented). 3 gestures/classes.</font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif"><strong>&middot;</strong></font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">Words: Please, coffee and hello (two-hand represented). 3 gestures/classes.</font></p> </blockquote>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Number of neurons per hidden layer. We use the following range: [11 - 23] neurons in the hidden layer.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Activation function. The activation function used was sigmoid.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Number of epochs. This experiment only used 5 epochs.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">CNN:</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Input classes. The number of gestures that represent the signs:</font></p>     <blockquote>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif"><strong>&middot;</strong></font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">Word: Cochabamba (one-hand represented). 3 gestures/classes.</font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif"><strong>&middot;</strong></font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">Words: Please, coffee and hello (two-hand represented). 3 gestures/classes.</font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif"><strong>&middot;</strong></font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">Words: Please, coffee, hello and want (two-hand represented). 4 gestures/classes, where coffee and want use similar gestures.</font></p> </blockquote>     ]]></body>
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Number of neurons per hidden layer. We use the following sets of ranges:</font></p>     <blockquote>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif"><strong>&middot;</strong></font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">[11 - 23] neurons in the hidden layer. </font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif"><strong>&middot;</strong></font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">[15, 20, 25] neurons in the hidden layer.</font></p> </blockquote>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Activation function. The activation function used was sigmoid.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Number of epochs. We evaluated 1, 2, 3 and 5 epochs.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• Number of convolutional layers. The first experiments used only 1 convolutional layer; for the last one we used 1, 2 and 3 convolutional layers.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Accuracy and recall. </b>A performance analysis and comparison is made using the accuracy and recall metrics. The two metrics used in this experiment are (Buduma):</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">• <b>Accuracy: </b>The most common metric; in this case, it is the percentage of times that the neural network successfully classifies a sign. This metric is obtained as follows:</font></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_ecuation_09.gif" width="340" height="49"></p>     ]]></body>
<body><![CDATA[<blockquote>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Equation 8: Accuracy </font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Source: (Goodfellow) </font></p> </blockquote>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Recall: </b>The fraction of the elements of a class that were correctly classified:</font></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_ecuation_10.gif" width="353" height="49"></p>     <blockquote>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Equation 9: Recall </font></p>       <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Source: (Goodfellow)</font></p> </blockquote>     <p align="justify">&nbsp;</p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>5</b></font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">&nbsp; &nbsp; &nbsp;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>Results</b></font></p>     ]]></body>
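<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">For reference, the two metrics defined above can be sketched as plain label-comparison functions. This is a minimal illustration rather than the authors' code, and the sign labels and predictions used in the example are hypothetical.</font></p>

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Equation 8: fraction of signs the network classifies correctly."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def recall_per_class(y_true, y_pred):
    """Equation 9: per class, the correctly classified elements of the
    class divided by all elements belonging to that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}

# Hypothetical predictions over five test videos
y_true = ["please", "please", "coffee", "coffee", "hello"]
y_pred = ["please", "coffee", "coffee", "coffee", "hello"]
```

<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">With these hypothetical labels, accuracy is 4/5 while the recall for 'please' is only 1/2, which is why both metrics are reported in the results below.</font></p>]]></body>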
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>5.1</b>&nbsp; &nbsp; &nbsp;<b>MLP</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a href="#f8">Figure 8</a> shows the results obtained by the multilayer neural network for the 'Cochabamba' sign, which consists of 3 gestures performed with only one hand, and for the signs 'please', 'coffee' and 'hello', which likewise consist of 3 gestures in total but are represented with both hands. <a href="#f8">Figure 8</a> thus compares the results obtained for a sign represented with only one hand against three signs that use both hands.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Accuracy. </b>- In both cases, the MLP did not achieve an accuracy greater than 88%, and the similarity of the gestures appears to have made the 'Cochabamba' sign harder, since only the position of one hand changes. Given these results, it can be stated that the MLP does not satisfactorily classify either of the two sets of gestures, and increasing the number of neurons does not seem to improve accuracy. For this reason, we did not consider adding more signs to the gesture sets.</font></p>     <p align="justify"><a name="f8"></a></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_08.gif" width="568" height="377"></p> <table width="100%" border="1" cellpadding="3" cellspacing="0" bordercolor="#000000">   <tr>     <td>    <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Observation 1. Our results show that the MLP has accuracy values below 88% for the signs under analysis; therefore it is not an appropriate machine learning technique for this problem.</font></p></td>   </tr> </table>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>5.2</b>&nbsp; &nbsp; &nbsp;<b>CNN</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The convolutional neural network follows a topology similar to the MLP, but contains one or more convolutional layers. The same tests as for the MLP were performed on the gesture sets for 'Cochabamba' and for 'please', 'coffee' and 'hello'; in this case we added a convolutional layer to the topology. As in the previous test, <a href="#f9">Figure 9</a> compares both sets of signs.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Accuracy. </b>- As we can see in <a href="#f9">Figure 9</a>, all the results are above 95% accuracy. Compared to the results obtained by the MLP, we can confirm that this type of network successfully learned the two sets of gestures, so we can state that CNNs are well suited to this image classification problem.</font></p>     <p align="justify"><a name="f9"></a></p>     ]]></body>
<body><![CDATA[<p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_09.gif" width="562" height="377"></p> <table width="100%" border="1" cellpadding="3" cellspacing="0" bordercolor="#000000">   <tr>     <td>    <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Observation 2. Our results show that the CNN achieves accuracy values greater than 95% for the signs we consider; we can state that it satisfactorily classifies both sets of gestures and that the CNN outperforms the MLP in image classification.</font></p></td>   </tr> </table>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Recall. </b>- <a href="#f10">Figure 10</a> shows the results for the recall metric, using the same tests as before on the sign sets 'please', 'coffee' and 'hello'. The first results, with fewer neurons in the hidden layer, were low; as the number of neurons increased, they improved, reaching up to 99% recall. We can therefore state that the network is stable when making predictions using these sets of gestures as input.</font></p>     <p align="justify"><a name="f10"></a></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_10.gif" width="571" height="381"></p> <table width="100%" border="1" cellpadding="3" cellspacing="0" bordercolor="#000000">   <tr>     <td>    <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Observation 3. Our results show that increasing the number of neurons in the hidden layer improves the recall metric of the convolutional neural network for the signs under analysis.</font></p></td>   </tr> </table>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">A new set of signs was organized with two objectives: to verify whether the CNN can learn two similar signs, and to demonstrate that increasing the number of convolution layers improves the performance of the network. 
The signs used were 'please', 'coffee', 'hello' and 'want', where 'want' and 'coffee' are somewhat similar, since in both the hands rest on the torso. The results were the following:</font></p>     <p align="justify"><a name="f11"></a></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_11.gif" width="555" height="370"></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">As we can see in <a href="#f11">Figure 11</a>, of all the tests, the one that gave the best results was a CNN with three convolution layers and a hidden layer of 25 neurons, so it can be stated that increasing the number of convolution layers can increase the performance of the network. Adding more than three convolutional layers gave the same results, so in this scenario the recall rate is best with three convolutional layers. It is clear that the CNN could classify different signs even when two of them have similar gestures, as is the case for 'want' and 'coffee', so we can confirm that this type of neural network is effective for these problems.</font></p> <table width="100%" border="1" cellpadding="3" cellspacing="0" bordercolor="#000000">   <tr>     <td>    ]]></body>
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Observation 4. Our results show that increasing the number of convolution layers tends to improve the metrics, and that a convolutional neural network can learn different signs even when they are similar, for the signs under analysis.</font></p></td>   </tr> </table>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">For the last test, we tried to determine whether increasing the number of convolution layers increases the speed at which the network reaches its maximum accuracy.</font></p>     <p align="justify"><a name="f12"></a></p>     <p align="center"><img src="/img/revistas/ran/v10n1/a02_figure_12.gif" width="540" height="366"></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a href="#f12">Figure 12</a> shows that the greater the number of convolution layers, the faster the accuracy grows until reaching its maximum point. There is a possibility that the neural network becomes overtrained and the accuracy falls, although in this test few epochs were used to avoid that issue.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">A pattern we found was that the greater the number of convolution layers, the more neurons were needed in the hidden layer for the results to improve; conversely, if only one layer was used and more neurons than necessary were added, the results began to decrease. It is quite possible that, when this downward trend appears, increasing the number of convolution layers would reverse it.</font></p> <table width="100%" border="1" cellpadding="3" cellspacing="0" bordercolor="#000000">   <tr>     <td>    <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Observation 5. Our results show that increasing the number of convolution layers makes the neural network improve its performance and reach high accuracy quickly, with a smaller number of neurons in the hidden layer.</font></p></td>   </tr> </table>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The results of these last tests were entirely positive, demonstrating that the convolutional neural network manages to learn these signs using this type of video preprocessing with the proposed descriptor, and new patterns were discovered that can be used in real-life projects.</font></p>     <p align="justify">&nbsp;</p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>6</b></font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">&nbsp; &nbsp; &nbsp;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>Discussion and future work</b></font></p>     ]]></body>
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Dataset. </b>Although the videos we filmed were adequate for this research, we recommend taking the following aspects into account when filming: well-illuminated environments; the use of specialized tools such as professional cameras and tripods; and clothing and backgrounds without textures or colors, since these can affect the quality of the filters used to obtain borders.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Descriptor. </b>It is necessary to do more research on other possible descriptors better suited to real-life problems. A first option is to emphasize points of interest such as the arms; a classifier called Haar Cascade is commonly used for this type of task. A second option is the vectorization of the body, analyzing the key points of the movements; this would avoid the need for good conditions in the environment where the videos are filmed.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Machine learning. </b>It is clear that a CNN is a better option than an MLP for this type of problem, but it is fair to say that further research combining the two types of neural networks could improve the results, even though positive results were already obtained with the CNN.</font></p>     <p align="justify">&nbsp;</p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>7</b></font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">&nbsp; &nbsp; &nbsp;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>Related Work</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">American Sign Language recognition is not a new computer vision problem. 
Over the past two decades, researchers have used classifiers from a variety of categories that can be grouped roughly into linear classifiers, neural networks and Bayesian networks (Garcia &amp; Alarcon).</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">While linear classifiers are easy to work with because they are relatively simple models, they require sophisticated feature extraction and preprocessing methods to be successful. For instance, Singha and Das obtained an accuracy of 96% testing 10 different one-handed gestures using Karhunen-Loeve Transforms (Garcia &amp; Alarcon). These transformations translate and rotate the axes to establish a new coordinate system based on the variance of the data. The Karhunen-Loeve transformation is applied after using a skin filter, hand cropping and border detection on the images. This technique uses a linear classifier to distinguish between hand gestures including thumbs up, index finger pointing left and right, and numbers (not ASL). Sharma et al. use piecewise classifiers (Support Vector Machines and k-Nearest Neighbors) to characterize each color channel after background subtraction and noise removal. Their innovation comes from using a contour trace, an efficient representation of hand contours. They attain an accuracy of 62.3% using an SVM on the segmented color channel model (Sharma, Nemani, Kumar, &amp; Kane, 2013).</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Bayesian networks like Hidden Markov Models have also achieved high accuracies. These are particularly good at capturing temporal patterns, but they require clearly defined models that are specified prior to learning. Starner and Pentland used a Hidden Markov Model (HMM) and a 3-D glove that tracks hand movement. 
Since the glove is able to obtain 3-D information from the hand regardless of spatial orientation, they were able to achieve an impressive accuracy of 99.2% on the test set. Their HMM uses time series data to track hand movements and classify based on where the hand has been in recent frames (Starner &amp; Pentland).</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Suk et al. propose a method for recognizing hand gestures in a continuous video stream using a dynamic Bayesian network or DBN model (Suk, Sin, &amp; Lee, 2010). They attempt to classify moving hand gestures, such as making a circle around the body or waving. They achieve an accuracy of over 99%, but it is worth noting that all gestures are markedly different from each other and that they are not American Sign Language.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Some neural networks have been used to tackle ASL translation (Garcia &amp; Alarcon). Arguably, the most significant advantage of neural networks is that they learn the most important classification features. However, they require considerably more time and data to train. To date, most have been relatively shallow. Mekala et al. classified video of ASL letters into text using advanced feature extraction and a 3-layer Neural Network. They extracted features in two categories: hand position and movement. Prior to ASL classification, they identify the presence and location of 6 &quot;points of interest&quot; in the hand: each of the fingertips and the center of the palm. Mekala et al. also take Fourier Transforms of the images and identify what section of the frame the hand is located in. While they claim to be able to correctly classify 100% of images with this framework, there is no mention of whether this result was achieved in the training, validation or test set (Mekala, Gao, Fan, &amp; Davari, 2011).</font></p>     ]]></body>
<body><![CDATA[<p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">Admasu and Raimond classified Ethiopian Sign Language correctly in 98.5% of cases using a feed-forward Neural Network (Admasu &amp; Raimond, 2010). They use a significant amount of image preprocessing, including image size normalization, image background subtraction, contrast adjustment, and image segmentation. Admasu and Raimond extracted features with a Gabor Filter and Principal Component Analysis.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The most relevant work to date is L. Pigou et al.'s application of CNNs to classify 20 Italian gestures from the ChaLearn 2014 Looking at People gesture spotting competition (Garcia &amp; Alarcon). They use a Microsoft Kinect on full-body images of people performing the gestures and achieve a cross-validation accuracy of 91.7%. As with the aforementioned 3-D glove, the Kinect allows capture of depth features, which aids significantly in classifying ASL signs.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The difference between our experiment and the related work is that we analyzed the Bolivian Sign Language and we include signs that use one and both</font> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">hands to be represented. 
We also chose to compare the effectiveness of the multilayer neural network with that of the convolutional neural network as machine learning techniques applied to this problem.</font></p>     <p align="justify">&nbsp;</p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>8</b></font><font face="Verdana, Arial, Helvetica, sans-serif" size="2">&nbsp; &nbsp; &nbsp;</font><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>Conclusions</b></font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The obtained video set of people performing BSL was used to generate the different training and validation datasets for both neural networks, MLP and CNN. This video set contains four different signs and fifteen videos per sign, making a total of 60 samples. During the video production, a set of conditions was followed as much as possible: good scene illumination, a solid background without objects or texture, and just one person in the scene, well focused and centered in the video.</font></p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="2">The combination of border detection as the descriptor and the motion-detection-based image preprocessing techniques, in conjunction with machine learning techniques such as the CNN and the MLP, gave an accuracy ranging between 65% and 88% for the MLP and between 95% and 99% for the CNN, depending on the number of neurons and internal layers. This makes the combination a good option for image classification problems and, consequently, for BSL image-based recognition. 
As future work, we plan to evaluate these techniques using a bigger data set, including more complex signs.</font></p>     <p align="justify">&nbsp;</p>     <p align="justify"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><b>Bibliography</b></font></p>     <!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[1]  Admasu, &amp; Raimond. (2010). Ethiopian  sign language recognition using Artificial Neural Network. </font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817077&pid=S1683-0789202100010000300001&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[2]  Buduma, N. <i>Fundamentals of Deep  Learning, designing the next-generation machine intelligence algorithms. </i>Retrieved  from <a href="http://perso.ens-lyon.fr/jacques.jayez/Cours/Implicite/Fundamentals_of_Deep_Learning.pdf" target="_blank">http://perso.ens-lyon.fr/jacques.jayez/Cours/Implicite/Fundamentals_of_Deep_Learning.pdf</a></font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817078&pid=S1683-0789202100010000300002&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[3]  Canny, J., <i>A Computational Approach To  Edge Detection</i>, IEEE Trans. Pattern Analysis and Machine Intelligence,  8(6):679&ndash;698, 1986 </font></p>     <!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[4]  Garcia, &amp; Alarcon. (2016). Real-time  American Sign Language Recognition with Convolutional Neural. <i>Stanford  University</i>. 
</font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817080&pid=S1683-0789202100010000300004&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[5]  Goodfellow. (2016). <i>Deep Learning. </i>The  MIT Press. </font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817081&pid=S1683-0789202100010000300005&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[6]  Haykin, S. (2009.). Neural Networks and  Learning Machines </font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817082&pid=S1683-0789202100010000300006&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[7]  Hninn. (2009). Real-Time Hand Tracking and  Gesture Recognition System Using Neural Networks. </font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817083&pid=S1683-0789202100010000300007&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[8] LeCun. (2019.). Quand la  machine apprend: La r&eacute;volution des neurones artificiels et de l'apprentissage  profond. 
</font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817084&pid=S1683-0789202100010000300008&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[9] Mekala,  Gao, Fan, &amp; Davari. (2011). Real-time sign language recognition based on  neural network architecture. <i>IEEE 43rd Southeastern Symposium on System  Theory. </i>Auburn, AL, USA: IEEE. </font></p>     <!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[10] Ministerio de educacion boliviano. (2010).  Modulo I - Curso de ense&ntilde;anza de la lengua de se&ntilde;as boliviana. </font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817086&pid=S1683-0789202100010000300010&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[11] N&uacute;&ntilde;ez,  C. P. (2017). Convolutional Neural Networks and Long Short-Term Memory for  skeleton-based human activity and hand gesture recognition. Madrid, Spain: Universidad Rey  Juan Carlos. </font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817087&pid=S1683-0789202100010000300011&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[12] Russell, N. <i>Inteligencia Artificial, un  enfoque moderno. 
</i>Retrieved from <a href="https://luismejias21.files.wordpress.com/2017/09/inteligencia-artificial-un-enfoque-moderno-stuart-j-russell.pdf" target="_blank">https://luismejias21.files.wordpress.com/2017/09/inteligencia-artificial-un-enfoque-moderno-stuart-j-russell.pdf</a></font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817088&pid=S1683-0789202100010000300012&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[13] Sharma,  Nemani, Kumar, &amp; Kane. (2013). Recognition of Single Handed Sign Language  Gestures using Contour Tracing Descriptor. <i>Proceedings of the World Congress  on Engineering. </i>London, UK. </font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817089&pid=S1683-0789202100010000300013&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[14] Starner,  &amp; Pentland. (1996). Real-Time American Sign Language Recognition from Video  Using Hidden Markov Models. <i>Massachusetts Institute of Technology </i>. </font>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=817090&pid=S1683-0789202100010000300014&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[15] Suk,  Sin, &amp; Lee. (2010). Hand gesture recognition based on dynamic Bayesian  network framework. <i>ELSEVIER</i>. 
</font></p><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[16] Yang &amp; Chen. (2020, June 29). Sign-to-speech translation using machine-learning-assisted stretchable sensor arrays. <i>Nature Electronics</i>.</font></p><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[17] Pascual, J. A. (2014, June 21). Google Gesture, la voz de las personas sordomudas. <i>Computerhoy</i>.</font></p><p align="justify"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">[18] Pastor, J. (2013, October 31). Kinect traduce el lenguaje de signos a lenguaje hablado. <i>Xataka</i>.</font></p>]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Admasu]]></surname>
</name>
<name>
<surname><![CDATA[Raimond]]></surname>
</name>
</person-group>
<source><![CDATA[Ethiopian sign language recognition using Artificial Neural Network]]></source>
<year>2010</year>
</nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Buduma]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<source><![CDATA[Fundamentals of Deep Learning, designing the next-generation machine intelligence algorithms]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Canny]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A Computational Approach To Edge Detection]]></article-title>
<source><![CDATA[IEEE Trans. Pattern Analysis and Machine Intelligence]]></source>
<year>1986</year>
<volume>8</volume>
<numero>6</numero>
<issue>6</issue>
<page-range>679-698</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Garcia]]></surname>
</name>
<name>
<surname><![CDATA[Alarcon]]></surname>
</name>
</person-group>
<source><![CDATA[Real-time American Sign Language Recognition with Convolutional Neural Networks]]></source>
<year>2016</year>
<publisher-name><![CDATA[Stanford University]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Goodfellow]]></surname>
</name>
</person-group>
<source><![CDATA[Deep Learning]]></source>
<year>2016</year>
<publisher-name><![CDATA[The MIT Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Haykin]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Neural Networks and Learning Machines]]></source>
<year>2009</year>
</nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hninn]]></surname>
</name>
</person-group>
<source><![CDATA[Real-Time Hand Tracking and Gesture Recognition System Using Neural Networks]]></source>
<year>2009</year>
</nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[LeCun]]></surname>
</name>
</person-group>
<source><![CDATA[Quand la machine apprend: La révolution des neurones artificiels et de l'apprentissage profond]]></source>
<year>2019</year>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mekala]]></surname>
</name>
<name>
<surname><![CDATA[Gao]]></surname>
</name>
<name>
<surname><![CDATA[Fan]]></surname>
</name>
<name>
<surname><![CDATA[Davari]]></surname>
</name>
</person-group>
<source><![CDATA[Real-time sign language recognition based on neural network architecture]]></source>
<year>2011</year>
<conf-name><![CDATA[IEEE 43rd Southeastern Symposium on System Theory]]></conf-name>
<conf-loc>Auburn, AL, USA</conf-loc>
<publisher-name><![CDATA[IEEE]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="">
<collab>Ministerio de Educación boliviano</collab>
<source><![CDATA[Módulo I - Curso de enseñanza de la lengua de señas boliviana]]></source>
<year>2010</year>
</nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Núñez]]></surname>
<given-names><![CDATA[C. P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Convolutional Neural Networks and Long Short-Term Memory for skeleton-based human activity and hand gesture recognition]]></source>
<year>2017</year>
<publisher-loc><![CDATA[Madrid, Spain]]></publisher-loc>
<publisher-name><![CDATA[Universidad Rey Juan Carlos]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Russell]]></surname>
<given-names><![CDATA[S. J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Inteligencia Artificial, un enfoque moderno]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sharma]]></surname>
</name>
<name>
<surname><![CDATA[Nemani]]></surname>
</name>
<name>
<surname><![CDATA[Kumar]]></surname>
</name>
<name>
<surname><![CDATA[Kane]]></surname>
</name>
</person-group>
<source><![CDATA[Recognition of Single Handed Sign Language Gestures using Contour Tracing Descriptor]]></source>
<year>2013</year>
<conf-name><![CDATA[World Congress on Engineering]]></conf-name>
<conf-loc>London, UK</conf-loc>
</nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Starner]]></surname>
</name>
<name>
<surname><![CDATA[Pentland]]></surname>
</name>
</person-group>
<source><![CDATA[Real-Time American Sign Language Recognition from Video Using Hidden Markov Models]]></source>
<year>1996</year>
<publisher-name><![CDATA[Massachusetts Institute of Technology]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Suk]]></surname>
</name>
<name>
<surname><![CDATA[Sin]]></surname>
</name>
<name>
<surname><![CDATA[Lee]]></surname>
</name>
</person-group>
<source><![CDATA[Hand gesture recognition based on dynamic Bayesian network framework]]></source>
<year>2010</year>
<publisher-name><![CDATA[Elsevier]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Yang]]></surname>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Sign-to-speech translation using machine-learning-assisted stretchable sensor arrays]]></article-title>
<source><![CDATA[Nature Electronics]]></source>
<year>2020</year>
<month>06</month>
<day>29</day>
</nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pascual]]></surname>
<given-names><![CDATA[J. A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Google Gesture, la voz de las personas sordomudas]]></article-title>
<source><![CDATA[Computerhoy]]></source>
<year>2014</year>
<month>06</month>
<day>21</day>
</nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pastor]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Kinect traduce el lenguaje de signos a lenguaje hablado]]></article-title>
<source><![CDATA[Xataka]]></source>
<year>2013</year>
<month>10</month>
<day>31</day>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
