Translating Chinese to English in SQL with Microsoft Translator

Jul 26, 2018 Microsoft Azure, Oracle, SQL

In Oracle, I had a table of data which had some Chinese words that I needed to translate into English on the fly, in SQL…this is how I did that…

Microsoft have a translator facility here with the Translator Text API v3.0 to allow you to call it programmatically. I’m using Microsoft as I’m currently working on Azure – of course, there are other translation facilities available.

The API has a translate method which one needs to construct a call to. The format of the call is:

https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=xxxx&to=yyyy

…where xxxx is the from language, e.g. zh-Hans for Simplified Chinese (my case) and yyyy is the to language, e.g. en for English.

In the body of the request needs to be some JSON of the form:

[{"Text": "zzzz"}]

…where zzzz is the text that needs to be converted from Simplified Chinese to English.

Calling the API would result in a response which contains the translated text in JSON format.

So, what we need to do is create an Oracle Function which can be called from SQL passing in the text that needs translating from a selected column. The function will call the Microsoft Translator API via UTL_HTTP to translate the text and return the translated text which is then displayed in the SQL output.

Thanks to Tim Hall for this article and Lucas Jellema for this article which helped me with some of this – I just had to do a few tweaks to get things to work in my use case, namely:

  1. Set up the Oracle Wallet for using HTTPS
  2. Convert the publish_cinema_event procedure Lucas wrote to a function so I could call it in SQL
  3. Use LENGTHB instead of LENGTH to determine the length of the text to be translated due to the text being multi byte
  4. Use WRITE_RAW and UTL_RAW.CAST_TO_RAW rather than WRITE_TEXT otherwise the chinese characters get mangled
  5. Set the body text of the request to be UTF-8 by calling UTL_HTTP.SET_BODY_CHARSET

Firstly the calls to the Microsoft Translator are via HTTPS rather than HTTP so I needed to set up Oracle Wallet with keys to facilitate that. I tried to follow the instructions on Tim’s page about using Chrome to get the certificate but no matter which option I chose it wouldn’t include the keys/certificates in the output file. Instead, I chose to go onto our Linux server and do it this way (adjust to suit your paths):

mkdir -p /u01/app/oracle/admin/ORCL/wallet
openssl s_client -showcerts -connect api.cognitive.microsofttranslator.com:443 </dev/null 2>/dev/null|openssl x509 -outform DER >/u01/app/oracle/admin/ORCL/wallet/ms_translate_key.der

This seemed to work fine – at least everything else after worked and the end result was that we could call the API so whatever the above did differently to Chrome I don’t know but it worked.

I then created a wallet on the Linux server:

orapki wallet create -wallet /u01/app/oracle/admin/ORCL/wallet -pwd MyPassword -auto_login
orapki wallet add -wallet /u01/app/oracle/admin/ORCL/wallet -trusted_cert -cert "/u01/app/oracle/admin/ORCL/wallet/ms_translate_key.der" -pwd MyPassword

Now once the wallet is created I created the following function:

SET DEFINE OFF
CREATE OR REPLACE FUNCTION translate_text(p_text_to_translate in varchar2
                                         ,p_language_from in varchar2
                                         ,p_language_to in varchar2
                                         ) RETURN VARCHAR2 IS
  req utl_http.req;
  res utl_http.resp;
Going into the situation, shall eventually slow down and buy cialis from canada  take time for appreciating the moment. Fight against dry skin: it is the best solution that defends against  cheapest viagra canada the erotic disturbances without any failure. There are a lot of medicines also for repairing the erectile dysfunction.  viagra cheap no prescription Becoming withdrawn will only put an unnecessary strain on your joint partnership and possible your entire family.  tadalafil for sale cheap   url VARCHAR2(4000) := 'https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from='||
                          p_language_from||'&to='||p_language_to;
  buffer VARCHAR2(4000); 
  content VARCHAR2(4000) := '[{"Text": "'||p_text_to_translate||'"}]';
BEGIN
  dbms_output.put_line('URL:'||url);
  dbms_output.put_line('CONTENT:'||content);
  dbms_output.put_line('CONTENT LENGTH:'||TO_CHAR(LENGTH(content)));
  req := utl_http.begin_request(url, 'POST',' HTTP/1.1');
  utl_http.set_header(req, 'user-agent', 'mozilla/4.0'); 
  utl_http.set_header(req, 'content-type', 'application/json'); 
  utl_http.set_header(req, 'Ocp-Apim-Subscription-Key', 'OCP_APIM_SUBSCRIPTION_KEY'); 
  utl_http.set_header(req, 'Content-Length', LENGTHB(content));
  utl_http.set_body_charset(req, 'UTF-8');
  utl_http.write_raw(req,utl_raw.cast_to_raw(content));
  res := utl_http.get_response(req);
  utl_http.read_line(res, buffer);
  utl_http.end_response(res);
  RETURN buffer;
EXCEPTION
WHEN OTHERS
  THEN utl_http.end_response(res);
  RAISE;
END translate_text;
/

NOTE – The SET DEFINE OFF is important given the embedded ampersand characters. The OCP_APIM_SUBSCRIPTION_KEY value needs to have whatever is relevant for your subscription as well. You may need to set up ACLs for the user running this code – Tim and Lucas cover that in their articles.

Now to run the code, login to the database and run this to engage the wallet:

EXEC UTL_HTTP.set_wallet('file:/u01/app/oracle/admin/ORCL/wallet', NULL);

Create a test table with some Simplified Chinese in it:

create table test_chinese(chinese_text varchar2(200));
insert into test_chinese values('敏捷的棕色狐狸跳过了懒狗');
commit;

Now select the data out using the translate_text function and see what we get:

select chinese_text,translate_text(chinese_text,'zh-Hans','en') from test_chinese;

The returned translation is in JSON format but of course if you wanted you could extract the text element from it easily.

That’s it.

By Jeff

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.