Posted on Leave a comment

Top 5 Real-World Applications for Natural Language Processing

Emerging technologies have greatly facilitated our daily lives. For instance, when you are making yourself dinner but want to call your Mom for the secret recipe, you don’t have to stop what you are doing and dial the number to make the phone call. Instead, all you need to do is to simply speak out — “Hey Siri, call Mom.” And your iPhone automatically makes the call for you.

The application is simple enough, but the technology behind it could be sophisticated. The magic that makes the aforementioned scenario possible is natural language processing (NLP). NLP is far more than a pillar for building Siri. It can also empower many other AI-infused applications in the real world.

This article first explains what NLP is and later moves on to introduce five real-world applications of NLP.

What is NLP?

From chatbots to Siri, from virtual support agents to knowledge graphs, the application and usage of NLP are ubiquitous in our daily life. NLP stands for “Natural Language Processing”. Simply put, NLP is the ability of a machine to understand human language. It is the bridge that enables humans to directly interact and communicate with machines. NLP is a subfield of artificial intelligence (AI) and in Bill Gates's words, “NLP is the pearl in the crown of AI.”

With the ever-expanding market size of NLP, countless companies are investing heavily in this industry, and their product lines vary. Many different but specific systems for various tasks and needs can be built by leveraging the power of NLP.

The Five Real World NLP Applications

The most popular, exciting, and flourishing real-world applications of NLP include chatbots and conversational user interfaces, AI-powered call quality assessment, intelligent outbound calls, AI-powered call operators, and knowledge graphs, to name a few.

Chatbots in E-commerce

Over five years ago, Amazon already realized the potential benefit of applying NLP to its customer service channels. Back then, when customers had issues with their orders, their only resort was to call a customer service agent. However, most of the time, what they got from the other end of the line was “Your call is important to us. Please hold, we’re currently experiencing a high call load.” Thankfully, Amazon realized the damaging effect this could have on its brand image and set out to build chatbots.

Nowadays, when you want to quickly get, for example, a refund online, there’s a much more convenient way! All you need to do is activate the Amazon customer service chatbot, type in your order information, and make a refund request. The chatbot interacts and replies the same way a real human does. Apart from the chatbots that deal with post-sales customer experience, chatbots also offer pre-sales consulting. If you have any questions about the product you are going to buy, you can simply chat with a bot and get the answers.

E-commerce chatbots.

With the emergence of new concepts like the metaverse, NLP can do more than power AI chatbots. Avatars for customer support in the metaverse rely on NLP technology, giving customers more realistic chatting experiences.

Customer support avatar in the metaverse.

Conversational User Interface

Another trendy and promising application is interactive systems. Many well-recognized companies are betting big on the CUI (conversational user interface). CUI is the general term for user interfaces for computers that can simulate conversations with real human beings.

The most common CUIs in our everyday life are Apple’s Siri, Microsoft’s Cortana, Google’s Google Assistant, Amazon’s Alexa, etc.

Apple’s Siri is a common example of a conversational user interface.

In addition, CUIs can be embedded into cars, especially EVs (electric vehicles). NIO, an automobile manufacturer dedicated to designing and developing EVs, launched its own CUI named NOMI in 2018. Essentially, the CUIs in cars work in the same way as Siri. Drivers can focus on steering the car while asking the CUI to adjust the A/C temperature, play a song, lock windows and doors, navigate to the nearest gas station, etc.

The conversational user interface in cars.

The Algorithm Behind

Despite all the fancy algorithms the technical media have boasted about, one of the most fundamental ways to build a chatbot is to construct and organize FAQ pairs (or, more straightforwardly, question-answer pairs) and use NLP algorithms to figure out whether the user query matches any one of the entries in your FAQ knowledge base. A simple FAQ example would look like this:

Q: Can I have some coffee?

A: No, I’d rather have some ribs.

Now that this FAQ pair is stored in your NLP system, the user can ask a similar question, for example: “coffee, please!”. If your algorithm is smart enough, it will figure out that “coffee, please” closely resembles “Can I have some coffee?” and will output the corresponding answer “No, I’d rather have some ribs.” And that’s how things are done.

For a very long time, FAQ search algorithms were based solely on inverted indexing. In this case, you first tokenize the original sentence and put the tokens and documents into a system like ElasticSearch, which uses an inverted index for indexing and algorithms like TF-IDF or BM25 for scoring.

This algorithm worked just fine until the deep learning era arrived. One of the most substantial problems with the algorithm above is that neither tokenization nor inverted indexing takes into account the semantics of the sentences. For instance, in the example above, users could say “Can I have a cup of Cappuccino” instead. With tokenization and inverted indexing, there’s a very big chance that the system won’t recognize “coffee” and “a cup of Cappuccino” as the same thing and will thus fail to understand the sentence. AI engineers have to build a lot of workarounds for these kinds of issues.

But things got much better with deep learning. With pre-trained models like BERT and pipelines like Towhee, we can easily encode all sentences into vectors, store them in a vector database such as Milvus, and simply calculate the vector distance to figure out the semantic resemblance of sentences.
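As a rough illustration of the inverted-index approach, here is a minimal sketch in plain Python rather than ElasticSearch. The FAQ entries, the helper names (`tokenize`, `build_index`, `search`), and the simplified TF-IDF weighting are assumptions made for this example and are not how a production search engine scores documents.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical FAQ knowledge base: question -> answer.
FAQ = {
    "Can I have some coffee?": "No, I'd rather have some ribs.",
    "Where is my order?": "Your order is on its way.",
    "How do I get a refund?": "Open the order page and request a refund.",
}
QUESTIONS = list(FAQ)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(questions):
    """Inverted index: token -> set of ids of the questions containing it."""
    index = defaultdict(set)
    for doc_id, question in enumerate(questions):
        for token in tokenize(question):
            index[token].add(doc_id)
    return index


def search(query, questions, index):
    """Return the best-matching stored question, or None if no token matches."""
    scores = Counter()
    for token in tokenize(query):
        postings = index.get(token, set())
        if not postings:
            continue
        # Rare tokens get a higher weight (a simplified IDF).
        idf = math.log(1 + len(questions) / len(postings))
        for doc_id in postings:
            tf = tokenize(questions[doc_id]).count(token)
            scores[doc_id] += tf * idf
    if not scores:
        return None
    best_id, _ = scores.most_common(1)[0]
    return questions[best_id]


INDEX = build_index(QUESTIONS)
match = search("coffee, please!", QUESTIONS, INDEX)
print(match, "->", FAQ[match])
# No shared token with any stored question, so the lookup fails.
print(search("Cappuccino, please!", QUESTIONS, INDEX))
```

The second query shares no token with the stored questions, which is the semantic gap described above: the index only sees words, not meaning.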

The algorithm behind conversational user interfaces.
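The vector-based matching can be sketched in a few lines. The three-dimensional vectors below are hand-written stand-ins for real sentence embeddings, and the brute-force `nearest_question` function stands in for a vector database lookup; a real system would obtain the vectors from a model such as BERT and store them in a vector database such as Milvus.

```python
import math

# Toy "embeddings", invented for illustration only. Real sentence
# embeddings have hundreds of dimensions and come from a trained model.
EMBEDDINGS = {
    "Can I have some coffee?": [0.9, 0.1, 0.0],
    "Where is my order?": [0.0, 0.9, 0.2],
    "How do I get a refund?": [0.1, 0.3, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_question(query_vector, embeddings):
    """Brute-force nearest neighbour search over the stored vectors."""
    return max(
        embeddings,
        key=lambda question: cosine_similarity(query_vector, embeddings[question]),
    )


# Assume an encoder mapped "Can I have a cup of Cappuccino" to this vector.
query_vector = [0.8, 0.2, 0.1]
print(nearest_question(query_vector, EMBEDDINGS))
```

Because the comparison happens in vector space, the query can land next to the coffee question even though the two sentences share few words.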

AI-powered Call Quality Control

Call centers are indispensable for many large companies that care about customer experience. To better spot issues and improve call quality, assessment is necessary. However, the problem is that call centers of large multi-national companies receive tremendous amounts of inbound calls per day. Therefore, it is impractical to listen to each of the millions of calls and make the evaluation. Most of the time, when you hear “in order to improve our service, this call could be recorded.” from the other end of the phone, it doesn’t necessarily mean your call would be checked for quality of service. In fact, even in big organizations, only 2%-3% of the calls would be replayed and checked manually by quality control people.

A call center. Image source: Pexels by Tima Miroshnichenko.

This is where NLP can help. A call quality control engine powered by NLP can automatically spot issues in calls and can handle massive volumes of calls in a relatively short period of time. The engine helps detect whether the call operator uses the proper opening and ending sentences and avoids banned slang and taboo words during the call. This could increase the check rate from 2%-3% to 100%, with even less manpower and lower costs.

With a typical AI-powered call quality control service, users first upload the call recordings to the service. Then automatic speech recognition (ASR) is used to transcribe the audio files into text. All the texts are vectorized using deep learning models and then stored in a vector database. The service compares the similarity between the text vectors and vectors generated from a certain set of criteria, such as taboo word vectors and vectors of desired opening and closing sentences. With efficient vector similarity search, handling great volumes of call recordings can be much more accurate and less time-consuming.

Intelligent outbound calls

Believe it or not, some of the phone calls you receive are not from humans! Chances are that it is a robot talking from the other side of the call. To reduce operation costs, some companies might leverage AI phone calls for marketing purposes and much more. Google launched Google Duplex back in 2018, a system that can conduct human-computer conversations and accomplish real-world tasks over the phone. The mechanism behind AI phone calls is pretty much the same as that behind chatbots.

A user asks the Google Assistant for an appointment, which the Assistant then schedules by having Duplex call the business. Image source: Google AI blog.

In other cases, you might have also heard something like this on the phone:

“Thank you for calling. To set up a new account, press 1. To modify your password to an existing account, press 2. To speak to our customer service agent, press 0.”,

or in recent years, something like (with a strong robot accent):

“Please tell me what I can help you with. For example, you can ask me ‘check the balance of my account’.”

This is known as interactive voice response (IVR). It is an automated phone system that interacts with callers and acts based on their answers and actions. The callers are usually offered some choices via a menu, and their choice decides how the phone call system acts. If the user request is too complex, the system can route callers to a human agent. This can greatly reduce labor costs and save time for companies.

Intents are usually very helpful when dealing with calls like these. An intent is a group of sentences or utterances representing a certain user intention. For example, “weather forecast” can be an intent, and this intent can be triggered with different sentences. See the picture of a Google Dialogflow example below. Intents can be organized together to accomplish complicated interactive human-computer conversations, like booking a restaurant or ordering a flight ticket.

Google Dialogflow.
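To make the idea of an intent concrete, here is a small sketch that groups example sentences under intent names and matches a new utterance by word overlap. The intent names, the example phrases, the Jaccard-overlap matching, and the threshold are all assumptions for illustration; this is not how Dialogflow works internally.

```python
import re

# Hypothetical intents, each defined by a few example phrases.
INTENTS = {
    "weather_forecast": [
        "what is the weather today",
        "will it rain tomorrow",
        "weather forecast for this weekend",
    ],
    "check_balance": [
        "check the balance of my account",
        "how much money do I have",
    ],
}


def tokens(text):
    """Set of lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_intent(utterance, intents, threshold=0.2):
    """Return the best-matching intent name, or None if nothing is close."""
    best_intent, best_score = None, 0.0
    for name, phrases in intents.items():
        for phrase in phrases:
            score = jaccard(tokens(utterance), tokens(phrase))
            if score > best_score:
                best_intent, best_score = name, score
    return best_intent if best_score >= threshold else None


print(detect_intent("Is it going to rain tomorrow?", INTENTS))
print(detect_intent("check my account balance", INTENTS))
print(detect_intent("play some jazz", INTENTS))
```

A result of `None` corresponds to the case where the request is not understood and the caller would be routed to a human agent.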

AI-powered call operators

By adopting NLP, companies can take call operation services to the next level. Conventionally, call operators need to look things up in a professional manual hundreds of pages long to deal with each customer call and solve each user problem case by case. This process is extremely time-consuming and, most of the time, cannot give callers a satisfactory solution. However, with an AI-powered call center, dealing with customer calls can be both comfortable and efficient.

AI-aided call operators with greater efficiency. Image source: Pexels by MART PRODUCTION.

When a customer dials in, the system immediately searches for the customer and their order information in the database so that the call operator can have a general idea of the case, such as how old the customer is, their marital status, things they have purchased in the past, etc. During the conversation, the whole chat is recorded, with a live chat log shown on the screen (thanks to live automatic speech recognition). Moreover, when a customer asks a hard question or starts complaining, the machine will catch it automatically, look into the AI database, and tell the operator the best way to respond. With a decent deep learning model, your service could give customers correct answers to more than 99% of their questions and handle customers’ complaints with the most appropriate words.

Knowledge graph

A knowledge graph is an information-based graph that consists of nodes, edges, and labels. A node (or vertex) usually represents an entity, which could be a person, a place, an item, or an event. Edges are the lines connecting the nodes. Labels signify the connection or relationship between a pair of nodes. A typical knowledge graph example is shown below:

A sample knowledge graph. Source: A guide to Knowledge Graphs.
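The node, edge, and label structure can also be written down as a list of (entity, label, entity) triples. The sketch below uses a handful of facts taken from this article; the label names and the helper functions `build_graph` and `neighbors` are made up for the example.

```python
from collections import defaultdict

# Each triple is (head entity, relationship label, tail entity).
TRIPLES = [
    ("Siri", "developed_by", "Apple"),
    ("Alexa", "developed_by", "Amazon"),
    ("Siri", "is_a", "conversational user interface"),
    ("Alexa", "is_a", "conversational user interface"),
]


def build_graph(triples):
    """Adjacency list: node -> list of (label, neighbouring node)."""
    graph = defaultdict(list)
    for head, label, tail in triples:
        graph[head].append((label, tail))
    return graph


def neighbors(graph, node, label=None):
    """Nodes reachable from `node`, optionally filtered by edge label."""
    return [
        tail
        for edge_label, tail in graph.get(node, [])
        if label is None or edge_label == label
    ]


graph = build_graph(TRIPLES)
print(neighbors(graph, "Siri", "developed_by"))
print(neighbors(graph, "Alexa"))
```

Answering a question such as "who developed Siri?" then becomes a lookup along the edge with the matching label.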

The raw data for constructing a knowledge graph may come from various sources: unstructured docs, semi-structured data, and structured knowledge. Various algorithms must be applied to these data to extract entities (nodes) and the relationships between entities (edges). To name a few, one needs to do entity recognition, relation extraction, label mining, and entity linking. To build a knowledge graph with data in docs, for instance, we first need to use deep learning pipelines to generate embeddings and store them in a vector database.

Once the knowledge graph is constructed, you can see it as the underlying pillar for many more specific applications like smart search engines, question-answering systems, recommendation systems, advertisements, and more.

Endnote

This article introduces the top five real-world NLP applications. Leveraging NLP in your business can greatly reduce operational costs and improve user experience. Of course, apart from the applications introduced in this article, NLP can facilitate more business scenarios, including social media analytics, translation, sentiment analysis, meeting summarization, and more.

There are also a bunch of NLP+, or more generally, AI+ concepts that have become more and more popular in the past few years. For example, with AI + RPA (robotic process automation), you can easily build smart pipelines that complete workflows automatically for you, such as an expense reimbursement workflow where you just need to upload your receipt, and AI + RPA will do the rest. There’s also AI + OCR, where you just need to take a picture of, say, a contract, and AI will tell you if there’s a mistake in it, for example, that the telephone number of a company doesn’t match the number shown in Google search.

Researchers Find Way to Run Malware on iPhone Even When It’s OFF

A first-of-its-kind security analysis of the iOS Find My function has demonstrated a novel attack surface that makes it possible to tamper with the firmware and load malware onto a Bluetooth chip that's executed while an iPhone is "off."

The mechanism takes advantage of the fact that wireless chips related to Bluetooth, Near-field communication (NFC), and ultra-wideband (UWB) continue to operate while iOS is shut down when entering a "power reserve" Low Power Mode (LPM).

While this is done to enable features like Find My and facilitate Express Card transactions, all three wireless chips have direct access to the secure element, academics from the Secure Mobile Networking Lab (SEEMOO) at the Technical University of Darmstadt said in a paper.

"The Bluetooth and UWB chips are hardwired to the Secure Element (SE) in the NFC chip, storing secrets that should be available in LPM," the researchers said.

"Since LPM support is implemented in hardware, it cannot be removed by changing software components. As a result, on modern iPhones, wireless chips can no longer be trusted to be turned off after shutdown. This poses a new threat model."

The findings are set to be presented at the ACM Conference on Security and Privacy in Wireless and Mobile Networks (WiSec 2022) this week.

The LPM features, newly introduced last year with iOS 15, make it possible to track lost devices using the Find My network. Current devices with Ultra-wideband support include iPhone 11, iPhone 12, and iPhone 13.

A message displayed when turning off iPhones reads thus: "iPhone remains findable after power off. Find My helps you locate this iPhone when it is lost or stolen, even when it is in power reserve mode or when powered off."

Calling the current LPM implementation "opaque," the researchers said they sometimes observed failures when initializing Find My advertisements during power off, effectively contradicting the aforementioned message. They also found that the Bluetooth firmware is neither signed nor encrypted.

By taking advantage of this loophole, an adversary with privileged access can create malware that's capable of being executed on an iPhone Bluetooth chip even when it's powered off.

However, for such a firmware compromise to happen, the attacker must be able to communicate with the firmware via the operating system, modify the firmware image, or gain code execution on an LPM-enabled chip over-the-air by exploiting flaws such as BrakTooth.

Put differently, the idea is to alter the LPM application thread to embed malware, such as those that could alert the malicious actor of a victim's Find My Bluetooth broadcasts, enabling the threat actor to keep remote tabs on the target.

"Instead of changing existing functionality, they could also add completely new features," SEEMOO researchers pointed out, adding they responsibly disclosed all the issues to Apple, but that the tech giant "had no feedback."

With LPM-related features taking a stealthier approach to carrying out their intended use cases, SEEMOO called on Apple to include a hardware-based switch to disconnect the battery so as to alleviate any surveillance concerns that could arise out of firmware-level attacks.

"Since LPM support is based on the iPhone's hardware, it cannot be removed with system updates," the researchers said. "Thus, it has a long-lasting effect on the overall iOS security model."

"Design of LPM features seems to be mostly driven by functionality, without considering threats outside of the intended applications. Find My after power off turns shutdown iPhones into tracking devices by design, and the implementation within the Bluetooth firmware is not secured against manipulation."

What is differential privacy in machine learning (preview)?

How differential privacy works

Differential privacy is a set of systems and practices that help keep the data of individuals safe and private. In machine learning solutions, differential privacy may be required for regulatory compliance.

Differential privacy machine learning process.

In traditional scenarios, raw data is stored in files and databases. When users analyze data, they typically use the raw data. This is a concern because it might infringe on an individual's privacy. Differential privacy tries to deal with this problem by adding "noise" or randomness to the data so that users can't identify any individual data points. At the least, such a system provides plausible deniability. Therefore, the privacy of individuals is preserved with limited impact on the accuracy of the data.

In differentially private systems, data is shared through requests called queries. When a user submits a query for data, operations known as privacy mechanisms add noise to the requested data. Privacy mechanisms return an approximation of the data instead of the raw data. This privacy-preserving result appears in a report. Reports consist of two parts, the actual data computed and a description of how the data was created.
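One common privacy mechanism is the Laplace mechanism, sketched below for a simple counting query. This is a from-scratch illustration, not the SmartNoise API: the function names, the sample data, and the choice of the Laplace mechanism are assumptions for the example. It relies on the fact that a counting query has sensitivity 1 (adding or removing one person changes the count by at most 1), so the noise scale is 1 / epsilon.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Answer "how many values satisfy predicate?" with Laplace noise.

    A count has sensitivity 1, so the noise scale is 1 / epsilon:
    a smaller epsilon means more noise and more privacy.
    """
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical raw data: ages of individuals in a dataset.
ages = [23, 35, 41, 29, 52, 67, 38, 45, 31, 58]
rng = random.Random(42)  # seeded only to make the sketch reproducible

for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(ages, lambda age: age >= 40, epsilon, rng)
    print(f"epsilon={epsilon}: noisy count = {noisy:.2f}")
```

The query returns an approximation of the true count (5 in this data) instead of the raw value, and the approximation gets closer as epsilon grows.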

Differential privacy metrics

Differential privacy tries to protect against the possibility that a user can produce an indefinite number of reports to eventually reveal sensitive data. A value known as epsilon measures how noisy, or private, a report is. Epsilon has an inverse relationship to noise or privacy. The lower the epsilon, the more noisy (and private) the data is.

Epsilon values are non-negative. Values below 1 provide full plausible deniability. Anything above 1 comes with a higher risk of exposure of the actual data. As you implement machine learning solutions with differential privacy, you want data with epsilon values between 0 and 1.

Another value directly correlated to epsilon is delta. Delta is a measure of the probability that a report isn’t fully private. The higher the delta, the higher the epsilon. Because these values are correlated, epsilon is used more often.

Limit queries with a privacy budget

To ensure privacy in systems where multiple queries are allowed, differential privacy defines a rate limit. This limit is known as a privacy budget. Privacy budgets prevent data from being recreated through multiple queries. Privacy budgets are allocated an epsilon amount, typically between 1 and 3, to limit the risk of reidentification. As reports are generated, privacy budgets keep track of the epsilon value of individual reports as well as the aggregate for all reports. After a privacy budget is spent or depleted, users can no longer access data.
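The bookkeeping behind a privacy budget can be sketched as a small accounting class. The class name `PrivacyBudget` and its methods are hypothetical, and the sketch assumes basic sequential composition, where the epsilon values of individual reports simply add up.

```python
class PrivacyBudget:
    """Tracks epsilon spent across reports (basic sequential composition)."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.reports = []  # epsilon value of each individual report

    @property
    def spent(self):
        """Aggregate epsilon over all reports generated so far."""
        return sum(self.reports)

    @property
    def remaining(self):
        return self.total_epsilon - self.spent

    def spend(self, epsilon):
        """Record a report, or refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon:
            raise PermissionError("privacy budget depleted")
        self.reports.append(epsilon)


budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.5)  # first report
budget.spend(0.5)  # second report uses up the rest
try:
    budget.spend(0.25)  # third report is refused
except PermissionError as error:
    print("query rejected:", error)
```

Once the budget is depleted, every further query is rejected, which is what prevents the data from being recreated through repeated reports.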

Reliability of data

Although the preservation of privacy should be the goal, there’s a tradeoff when it comes to usability and reliability of the data. In data analytics, accuracy can be thought of as a measure of uncertainty introduced by sampling errors. This uncertainty tends to fall within certain bounds. Accuracy from a differential privacy perspective instead measures the reliability of the data, which is affected by the uncertainty introduced by the privacy mechanisms. In short, a higher level of noise or privacy translates to data that has a lower epsilon, accuracy, and reliability.

Open-source differential privacy libraries

SmartNoise is an open-source project that contains components for building machine learning solutions with differential privacy. SmartNoise is made up of the following top-level components:

  • SmartNoise Core library
  • SmartNoise SDK library

SmartNoise Core

The core library includes the following privacy mechanisms for implementing a differentially private system:

  • Analysis: A graph description of arbitrary computations.
  • Validator: A Rust library that contains a set of tools for checking and deriving the necessary conditions for an analysis to be differentially private.
  • Runtime: The medium to execute the analysis. The reference runtime is written in Rust but runtimes can be written using any computation framework such as SQL and Spark depending on your data needs.
  • Bindings: Language bindings and helper libraries to build analyses. Currently SmartNoise provides Python bindings.

SmartNoise SDK

The system library provides the following tools and services for working with tabular and relational data:

  • Data Access: Library that intercepts and processes SQL queries and produces reports. This library is implemented in Python and supports the following ODBC and DBAPI data sources:
      • PostgreSQL
      • SQL Server
      • Spark
      • Presto
      • Pandas
  • Service: Execution service that provides a REST endpoint to serve requests or queries against shared data sources. The service is designed to allow composition of differential privacy modules that operate on requests containing different delta and epsilon values, also known as heterogeneous requests. This reference implementation accounts for additional impact from queries on correlated data.
  • Evaluator: Stochastic evaluator that checks for privacy violations, accuracy, and bias. The evaluator supports the following tests:
      • Privacy Test - Determines whether a report adheres to the conditions of differential privacy.
      • Accuracy Test - Measures whether the reliability of reports falls within the upper and lower bounds given a 95% confidence level.
      • Utility Test - Determines whether the confidence bounds of a report are close enough to the data while still maximizing privacy.
      • Bias Test - Measures the distribution of reports for repeated queries to ensure they aren’t unbalanced.

Responsible AI – Privacy and Security Requirements

Training data and prediction requests can both contain sensitive information about people or businesses, which has to be protected. How do you safeguard the privacy of individuals? What steps are taken to ensure that individuals have control of their data? There are regulations in many countries to ensure privacy and security.

In Europe you have the GDPR (General Data Protection Regulation) and in California there is the CCPA (California Consumer Privacy Act). Fundamentally, both give individuals control over their data and require that companies protect the data being used in the model. When data processing is based on consent, an individual has the right to revoke that consent at any time.

 Defending ML Models against attacks – Ensuring privacy of consumer data:

I have briefly discussed the tools for adversarial training, the CleverHans and Foolbox Python libraries, here: Model Debugging: Sensitivity Analysis, Adversarial Training, Residual Analysis. Let us now look at more stringent means of protecting an ML model against attacks. It is important to protect the ML model against attacks, thus ensuring the privacy and security of data. An ML model may be attacked in different ways; some literature classifies the attacks into “Information Harms” and “Behavioural Harms”. Information Harm occurs when information is allowed to leak from the model. There are different forms of Information Harm: Membership Inference, Model Inversion, and Model Extraction. In Membership Inference, the attacker can determine whether some information is part of the training data. In Model Inversion, the attacker can reconstruct training data from the model, and in Model Extraction, the attacker is able to extract the entire model!

Behavioural Harm occurs when the attacker can change the behaviour of the ML model itself, for example by inserting malicious data. I have given an example involving an autonomous vehicle in this article: Model Debugging: Sensitivity Analysis, Adversarial Training, Residual Analysis

Cryptography | Differential privacy to protect data

You should consider privacy-enhancing technologies like Secure Multi-Party Computation (SMPC) and Fully Homomorphic Encryption (FHE). SMPC involves multiple systems to train or serve the model whilst the actual data is kept secure.

In FHE the data is encrypted. Prediction requests involve encrypted data, and training of the model is also carried out on encrypted data. This results in heavy computational cost because the data is never decrypted except by the user. Users send encrypted prediction requests and receive back an encrypted result. The goal is that, using cryptography, you can protect the consumers’ data.
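One building block often used in SMPC is additive secret sharing, sketched below for a joint sum. The article does not say which SMPC protocol is meant, so this choice, the function names, and the sample values are assumptions for illustration. The sketch also uses Python's `random` module for reproducibility; a real system would need a cryptographically secure source of randomness.

```python
import random

PRIME = 2_147_483_647  # all arithmetic is done modulo this prime (2**31 - 1)


def share(secret, n_parties, rng):
    """Split `secret` into n random-looking shares that sum to it mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares; any incomplete subset reveals nothing useful."""
    return sum(shares) % PRIME


def secure_sum(values, n_parties, rng):
    """Sum private values without any single party seeing an input."""
    all_shares = [share(value, n_parties, rng) for value in values]
    # Party j only ever sees the j-th share of each value and adds them up.
    partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
    return reconstruct(partial_sums)


rng = random.Random(7)
salaries = [52_000, 61_000, 48_000]  # hypothetical private inputs
print(secure_sum(salaries, n_parties=3, rng=rng))
```

Each party works only on shares that look random on their own, yet the combined partial sums give the true total, provided that total is smaller than `PRIME`.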

Differential Privacy in Machine Learning

Differential privacy involves protecting the data by adding noise to it so that attackers cannot identify the real content. SmartNoise is an open-source project that contains components for building machine learning solutions with differential privacy. SmartNoise is made up of the following top-level components:

✔️SmartNoise Core Library

✔️SmartNoise SDK Library

This is a good read to understand about Differential Privacy: https://docs.microsoft.com/en-us/azure/machine-learning/concept-differential-privacy

 Private Aggregation of Teacher Ensembles (PATE)

This follows the Knowledge Distillation concept that I discussed here: Post 1 - Knowledge Distillation, Post 2 - Knowledge Distillation. PATE begins by dividing the data into “k” partitions with no overlaps. It then trains k teacher models, one per partition, and aggregates their results into an aggregate teacher model. During the aggregation for the aggregate teacher, you add noise to the teachers' combined output.

To train the student model, you take unlabelled public data and feed it to the teacher model; the result is labelled data with which the student model is trained. For deployment, you use only the student model.
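The noisy aggregation step can be sketched as follows. The teachers here are toy threshold rules rather than trained models, and the function names, labels, and noise scale are assumptions for illustration. Each teacher votes for a label, Laplace noise is added to the vote counts, and the label with the highest noisy count is used to label the public example for the student.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_aggregate(votes, labels, noise_scale, rng):
    """Return the label with the highest noisy vote count."""
    counts = Counter(votes)
    noisy_counts = {
        label: counts.get(label, 0) + laplace_noise(noise_scale, rng)
        for label in labels
    }
    return max(noisy_counts, key=noisy_counts.get)


def make_teacher(threshold):
    """Toy stand-in for a model trained on one private data partition."""
    return lambda x: "high" if x > threshold else "low"


def label_public_data(examples, teachers, labels, noise_scale, rng):
    """Label unlabelled public examples to build the student's training set."""
    return [
        (x, noisy_aggregate([t(x) for t in teachers], labels, noise_scale, rng))
        for x in examples
    ]


rng = random.Random(3)
teachers = [make_teacher(t) for t in (4, 5, 6)]  # k = 3 teachers
student_training_set = label_public_data(
    [1, 10], teachers, ["low", "high"], noise_scale=0.01, rng=rng
)
print(student_training_set)
```

A larger `noise_scale` makes individual labels less reliable but reveals less about any single teacher's training partition.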

The process is illustrated in the figure below:

PATE (Private Aggregation of Teacher Ensembles)

Employee monitoring software became the new normal during COVID-19. It seems workers are stuck with it

Many employers say they'll keep the surveillance software switched on — even for office workers.


In early 2020, as offices emptied and employees set up laptops on kitchen tables to work from home, the way managers kept tabs on white-collar workers underwent an abrupt change as well.

Bosses used to counting the number of empty desks, or gauging the volume of keyboard clatter, now had to rely on video calls and tiny green "active" icons in workplace chat programs.

In response, many employers splashed out on sophisticated kinds of spyware to claw back some oversight.

"Employee monitoring software" became the new normal, logging keystrokes and mouse movement, capturing screenshots, tracking location, and even activating webcams and microphones.

At the same time, workers were dreaming up creative new ways to evade the software's all-seeing eye.

Now, as workers return to the office, demand for employee tracking "bossware" remains high, its makers say.

Surveys of employers in white-collar industries show that even returned office workers will be subject to these new tools.

What was introduced in the crisis of the pandemic, as a short-term remedy for lockdowns and working from home (WFH), has quietly become the "new normal" for many Australian workplaces.

A game of cat-and-mouse jiggler

For many workers, the surveillance software came out of nowhere.

The abrupt appearance of spyware in many workplaces can be seen in the sudden popularity of covert devices designed to evade this surveillance.

Before the pandemic, "mouse jigglers" were niche gadgets used by police and security agencies to keep seized computers from logging out and requiring a password to access.

An array of mouse jigglers for sale on eBay. (Supplied: eBay)

Plugged into a laptop's USB port, the jiggler randomly moves the mouse cursor, faking activity when there's no-one there.

When the pandemic hit, sales boomed among WFH employees.

In the last two years, James Franklin, a young Melbourne software engineer, has mailed 5,000 jigglers to customers all over the country — mostly to employees of "large enterprises", he says.

Often, he's had to upgrade the devices to evade employers' latest methods of detecting and blocking them.

It's been a game of cat-and-mouse jiggler.

"Unbelievable demand is the best way to describe it," he said.

And mouse jigglers aren't the only trick for evading the software.

In July last year, a Californian mum's video about a WFH hack went viral on TikTok.

Leah told how her computer set her status to "away" whenever she stopped moving her cursor for more than a few seconds, so she had placed a small vibrating device under the mouse.

"It's called a mouse mover … so you can go to the bathroom, free from paranoia."

Others picked up the story and shared their tips, from free downloads of mouse-mimicking software to YouTube videos that are intended to play on a phone screen, with an optical mouse resting on top. The movement of the lines in the video makes the cursor move.

"A lot of people have reached out on TikTok," Leah told the ABC.

"There were a lot of people going, 'Oh, my gosh, I can't believe I haven't heard of this before, send me the link.'"

Tracking software sales are up — and staying up

On the other side of the world, in New York, EfficientLab makes and sells an employee surveillance software called Controlio that's widely used in Australia.

It has "hundreds" of Australian clients, said sales manager Moath Galeb.

"At the beginning of the pandemic, there was already a lot of companies looking into monitoring software, but it wasn't such an important feature," he said.

"But the pandemic forced many people to work remotely and the companies started to look into employee monitoring software more seriously."

Managers can track employees' productivity scores on a real-time dashboard. (Supplied: Controlio)

In Australia, as in other countries, the number of Controlio clients has increased "two or three times" with the pandemic.

This increase was to be expected — but what surprised even Mr Galeb was that demand has remained strong in recent months.

"They're getting these insights into how people get their work done," he said.

The most popular features for employers, he said, track employee "active time" to generate a "productivity score".

Managers view these statistics through an online dashboard.

Advocates say this is a way of looking after employees, rather than spying on them.

Bosses can see who is "working too many hours", Mr Galeb said.

"Depending on the data, or the insights that you receive, you get to build this picture of who is doing more and doing less."

Nothing new for blue-collar workers

But those being monitored are likely to see things a little differently. 

Ultimately, how the software is used depends on what power bosses have over their workers.

For the increasing number of people in insecure, casualised work, these tools appear less than benign.

In an August 2020 submission to a NSW senate committee investigating the impact of technological change on the future of work, the United Workers Union featured the story of a call centre worker who had been working remotely during the pandemic. 

One day, the employer informed the man that monitoring software had detected his apparent absence for a 45-minute period two weeks earlier.

The submission reads:

Unable to remember exactly what he was doing that particular day, the matter was escalated to senior management who demanded to know exactly where he physically was during this time. This 45-minute break in surveillance caused considerable grief and anxiety for the company. A perceived productivity loss of $27 (the worker's hourly rate) resulted in several meetings involving members of upper management, formal letters of correspondence, and a written warning delivered to the worker.

There were many stories like this one, said Lauren Kelly, who wrote the submission.

"The software is sold as a tool of productivity and efficiency, but really it's about surveillance and control," she said.

"I find it very unlikely it would result in management asking somebody to slow down and do less work."

Ms Kelly, who is now a PhD candidate at RMIT with a focus on workplace technologies including surveillance, says tools for tracking an employee's location and activity are nothing new — what has changed in the past two years is the types of workplaces where they are used.

Before the pandemic, it was more for blue-collar workers. Now, it's for white-collar workers too.

"Once it's in, it's in. It doesn't often get uninstalled," she said.

"The tracking software becomes a ubiquitous part of the infrastructure of management."

The 'quid pro quo' of WFH?

More than half of Australian small-to-medium-sized businesses used software to monitor the activity and productivity of employees working remotely, according to a Capterra survey in November 2020.

That's about on par with the United States.

"There's a tendency in Australia to view these workplace trends as really bad in other places like the United States and China," Ms Kelly said.

"But actually, those trends are already here."

The latest software claims to monitor employee emotions like happiness and sadness. (Supplied: StaffCircle)

In fact, a 2021 survey suggested Australian employers had embraced location-tracking software more warmly than those of any other country.

Every two years, the international law firm Herbert Smith Freehills surveys thousands of its large corporate clients around the world for an ongoing series of reports on the future of work.

In 2021, it found 90 per cent of employers in Australia monitor the location of employees when they work remotely, significantly more than the global average of less than 80 per cent.

Many introduced these tools having found that during lockdown, some employees had relocated interstate or even overseas without asking permission or informing their manager, said Natalie Gaspar, an employment lawyer and partner at Herbert Smith Freehills.

"I had clients of mine saying that they didn't realise that their employees were working in India or Pakistan," she said.

"And that's relevant because there [are] different laws that apply in those different jurisdictions about workers compensation laws, safety laws, all those sorts of things."

She said that, anecdotally, many of her "large corporate" clients planned to keep the employee monitoring software tools — even for office workers.

"I think that's here to stay in large parts."

And she said employees, in general, accepted this elevated level of surveillance as "the cost of flexibility".

"It's the quid pro quo for working from home," she said.

Is it legal?

The short answer is yes, but there are complications.

There's no consistent set of laws operating across jurisdictions in Australia that regulate surveillance of the workplace.

In New South Wales and the ACT, an employer can only install monitoring software on a computer they supply for the purposes of work.

With some exceptions, they must also advise employees they're installing the software and explain what is being monitored 14 days prior to the software being installed or activated.

In NSW, the ACT and Victoria, it's an offence to install an optical or listening device in workplace toilets, bathrooms or change rooms.

South Australia, Tasmania, Western Australia, the Northern Territory and Queensland do not currently have specific workplace surveillance laws in place.

Smile, you're at your laptop

Location tracking software may be the cost of WFH, but what about tools that check whether you're smiling into the phone, or monitor the pace and tone of your voice for depression and fatigue?

These are some of the features being rolled out in the latest generation of monitoring software.

Zoom, for instance, recently introduced a tool that provides sales meeting hosts with a post-meeting transcription and "sentiment analysis".

Zoom IQ for Sales offers a breakdown of how the meeting went. (Supplied: Zoom)

Software already on the market trawls email and Slack messages to detect levels of emotion like happiness, anger, disgust, fear or sadness.

The Herbert Smith Freehills 2021 survey found 82 per cent of respondents planned to introduce digital tools to measure employee wellbeing.

A bit under half said they already had processes in place to detect and address wellbeing issues, and these were assisted by technology such as sentiment analysis software.

Often, these technologies are tested in call centres before they're rolled out to other industries, Ms Kelly said.

"Affect monitoring is very controversial and the technology is flawed.

"Some researchers would argue it's simply not possible for AI or any software to truly 'know' what a person is feeling.

"Regardless, there's a market for it and some employers are buying into it."

The movement of the second hand of an analogue wristwatch moves an optical mouse cursor a tiny amount. (Supplied: Reddit)

Back in Melbourne, Mr Franklin remains hopeful that plucky inventors can thwart the spread of bossware.

When companies switched to logging keyboard inputs, someone invented a random keyboard input device.

When managers went a step further and monitored what was happening on employees' screens, a tool appeared that cycled through a prepared list of webpages at regular intervals.

"The sky's the limit when it comes to defeating these systems," he said.

And sometimes the best solutions are low tech.

Recently, an employer found a way to block a worker's mouse jiggler, so he simply taped his mouse to the office fan.

"And it dragged the mouse back and forth.

"Then he went out to lunch."

 

A one-up on motion capture

A new neural network approach captures the characteristics of a physical system’s dynamic motion from video, regardless of rendering configuration or image differences.
 
 

MIT researchers used the RISP method to predict the action sequence, joint stiffness, or movement of an articulated hand, like this one, from a target image or video.

From “Star Wars” to “Happy Feet,” many beloved films contain scenes that were made possible by motion capture technology, which records movement of objects or people through video. Further, applications for this tracking, which involve complicated interactions between physics, geometry, and perception, extend beyond Hollywood to the military, sports training, medical fields, and computer vision and robotics, allowing engineers to understand and simulate action happening within real-world environments.

As this can be a complex and costly process — often requiring markers placed on objects or people and recording the action sequence — researchers are working to shift the burden to neural networks, which could acquire this data from a simple video and reproduce it in a model. Work in physics simulations and rendering shows promise to make this more widely used, since it can characterize realistic, continuous, dynamic motion from images and transform back and forth between a 2D render and 3D scene in the world. However, to do so, current techniques require precise knowledge of the environmental conditions where the action is taking place, and the choice of renderer, both of which are often unavailable.

Now, a team of researchers from MIT and IBM has developed a trained neural network pipeline that avoids this issue, with the ability to infer the state of the environment and the actions happening, the physical characteristics of the object or person of interest (system), and its control parameters. When tested, the technique can outperform other methods in simulations of four physical systems of rigid and deformable bodies, which illustrate different types of dynamics and interactions, under various environmental conditions. Further, the methodology allows for imitation learning — predicting and reproducing the trajectory of a real-world, flying quadrotor from a video.

“The high-level research problem this paper deals with is how to reconstruct a digital twin from a video of a dynamic system,” says Tao Du PhD '21, a postdoc in the Department of Electrical Engineering and Computer Science (EECS), a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and a member of the research team. In order to do this, Du says, “we need to ignore the rendering variances from the video clips and try to grasp the core information about the dynamic system or the dynamic motion.”

Du’s co-authors include lead author Pingchuan Ma, a graduate student in EECS and a member of CSAIL; Josh Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences and a member of CSAIL; Wojciech Matusik, professor of electrical engineering and computer science and CSAIL member; and MIT-IBM Watson AI Lab principal research staff member Chuang Gan. This work was presented this week at the International Conference on Learning Representations.

While capturing videos of characters, robots, or dynamic systems to infer dynamic movement makes this information more accessible, it also brings a new challenge. “The images or videos [and how they are rendered] depend largely on the lighting conditions, on the background info, on the texture information, on the material information of your environment, and these are not necessarily measurable in a real-world scenario,” says Du. Without this rendering configuration information or knowledge of which renderer is used, it’s presently difficult to glean dynamic information and predict behavior of the subject of the video. Even if the renderer is known, current neural network approaches still require large sets of training data. However, with their new approach, this can become a moot point. “If you take a video of a leopard running in the morning and in the evening, of course, you'll get visually different video clips because the lighting conditions are quite different. But what you really care about is the dynamic motion: the joint angles of the leopard, not if they look light or dark,” Du says.

In order to take rendering domains and image differences out of the issue, the team developed a pipeline system containing a neural network, dubbed “rendering invariant state-prediction (RISP)” network. RISP transforms differences in images (pixels) to differences in states of the system — i.e., the environment of action — making their method generalizable and agnostic to rendering configurations. RISP is trained using random rendering parameters and states, which are fed into a differentiable renderer, a type of renderer that measures the sensitivity of pixels with respect to rendering configurations, e.g., lighting or material colors. This generates a set of varied images and video from known ground-truth parameters, which will later allow RISP to reverse that process, predicting the environment state from the input video. The team additionally minimized RISP’s rendering gradients, so that its predictions were less sensitive to changes in rendering configurations, allowing it to learn to forget about visual appearances and focus on learning dynamical states. This is made possible by a differentiable renderer.
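The idea of penalizing rendering gradients can be illustrated with a minimal Python sketch. Everything here is a hypothetical toy and not the researchers' code: the two-pixel `render` function, the two candidate predictors, and the `risp_style_loss` name are assumptions made for illustration, and a finite difference stands in for the gradients that a real differentiable renderer would supply. The loss adds a state-prediction error to a penalty on how much the prediction changes when a rendering parameter (here, brightness) changes.

```python
import random

def render(state, brightness):
    """Toy 'renderer' (assumption): a two-pixel image scaled by brightness."""
    return (brightness * state, brightness * (1.0 - state))

def naive_predictor(image):
    # Reads the state off the first pixel; only correct when brightness == 1.
    return image[0]

def ratio_predictor(image):
    # Normalizes out overall brightness, so the output ignores that setting.
    return image[0] / (image[0] + image[1])

def rendering_gradient(predictor, state, brightness, eps=1e-4):
    # Central finite difference of the prediction with respect to brightness.
    # This stands in for the gradient a differentiable renderer would provide.
    hi = predictor(render(state, brightness + eps))
    lo = predictor(render(state, brightness - eps))
    return (hi - lo) / (2 * eps)

def risp_style_loss(predictor, samples, weight=1.0):
    """Mean of state-prediction error plus a rendering-gradient penalty."""
    total = 0.0
    for state, brightness in samples:
        error = predictor(render(state, brightness)) - state
        grad = rendering_gradient(predictor, state, brightness)
        total += error ** 2 + weight * grad ** 2
    return total / len(samples)

# Random ground-truth states and rendering parameters, as in the training setup.
rng = random.Random(0)
samples = [(rng.uniform(0.1, 0.9), rng.uniform(0.5, 2.0)) for _ in range(100)]

naive_loss = risp_style_loss(naive_predictor, samples)
ratio_loss = risp_style_loss(ratio_predictor, samples)
print(f"naive predictor loss: {naive_loss:.4f}")
print(f"ratio predictor loss: {ratio_loss:.2e}")
```

In this toy, the brightness-normalizing predictor scores near zero on both terms, while the naive one is penalized twice: once for wrong states and once for reacting to brightness. A trained network would be pushed toward the first kind of behavior by minimizing such a loss.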

The method then uses two similar pipelines, run in parallel. One is for the source domain, with known variables. Here, system parameters and actions are entered into a differentiable simulation. The generated simulation’s states are combined with different rendering configurations into a differentiable renderer to generate images, which are fed into RISP. RISP then outputs predictions about the environmental states. At the same time, a similar target domain pipeline is run with unknown variables. RISP in this pipeline is fed these output images, generating a predicted state. When the predicted states from the source and target domains are compared, a new loss is produced; this difference is used to adjust and optimize some of the parameters in the source domain pipeline. This process can then be iterated on, further reducing the loss between the pipelines.
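The two-pipeline loop described above can be sketched in a few lines of Python. This is a hedged toy under stated assumptions, not the published pipeline: `simulate`, `render`, `predict_state`, and `state_loss` are hypothetical names, the "simulation" is a one-parameter linear motion, the state predictor is a hand-written brightness-invariant function rather than a trained network, and finite differences replace the gradients of a differentiable simulator. The point is only that the loss is computed between predicted states rather than pixels, so the source pipeline can recover the unknown parameter even though its rendering settings differ from the target's.

```python
def simulate(speed, steps=5):
    """Toy 'simulation' (assumption): normalized position growing linearly in time."""
    return [speed * (k + 1) / steps for k in range(steps)]

def render(state, brightness):
    """Toy 'renderer' (assumption): a two-pixel image scaled by brightness."""
    return (brightness * state, brightness * (1.0 - state))

def predict_state(image):
    # Stand-in for a trained rendering-invariant state predictor.
    return image[0] / (image[0] + image[1])

def state_loss(speed_guess, target_states, source_brightness=1.0):
    """Source pipeline: simulate, render, predict states, compare with target."""
    source_images = [render(s, source_brightness) for s in simulate(speed_guess)]
    source_states = [predict_state(img) for img in source_images]
    diffs = [(a - b) ** 2 for a, b in zip(source_states, target_states)]
    return sum(diffs) / len(diffs)

# Target pipeline: a "video" made with a parameter and brightness that the
# optimizer never sees directly.
TRUE_SPEED = 0.6
target_images = [render(s, 1.7) for s in simulate(TRUE_SPEED)]
target_states = [predict_state(img) for img in target_images]

# Iteratively adjust the source parameter to shrink the state-space loss.
speed, learning_rate, eps = 0.2, 0.5, 1e-5
for _ in range(50):
    grad = (state_loss(speed + eps, target_states)
            - state_loss(speed - eps, target_states)) / (2 * eps)
    speed -= learning_rate * grad

print(f"recovered speed: {speed:.4f}")
```

Comparing raw pixels instead would fail here, because the source frames are rendered at brightness 1.0 and the target frames at 1.7; comparing predicted states removes that mismatch.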

To determine the success of their method, the team tested it in four simulated systems: a quadrotor (a flying rigid body that doesn’t have any physical contact), a cube (a rigid body that interacts with its environment, like a die), an articulated hand, and a rod (deformable body that can move like a snake). The tasks included estimating the state of a system from an image, identifying the system parameters and action control signals from a video, and discovering the control signals from a target image that direct the system to the desired state. Additionally, they created baselines and an oracle, comparing the novel RISP process in these systems to similar methods that, for example, lack the rendering gradient loss, don’t train a neural network with any loss, or lack the RISP neural network altogether. The team also looked at how the gradient loss impacted the state prediction model’s performance over time. Finally, the researchers deployed their RISP system to infer the motion of a real-world quadrotor, which has complex dynamics, from video. They compared the performance to other techniques that lacked a loss function and used pixel differences, or one that included manual tuning of a renderer’s configuration.

In nearly all of the experiments, the RISP procedure outperformed similar or state-of-the-art methods, imitating or reproducing the desired parameters or motion, and proving to be a data-efficient and generalizable competitor to current motion capture approaches.

For this work, the researchers made two important assumptions: that information about the camera, such as its position and settings, is known, and that the geometry and physics governing the object or person being tracked are known as well. Future work is planned to relax these assumptions.

“I think the biggest problem we're solving here is to reconstruct the information in one domain to another, without very expensive equipment,” says Ma. Such an approach should be “useful for [applications such as the] metaverse, which aims to reconstruct the physical world in a virtual environment,” adds Gan. “It is basically an everyday, available solution, that’s neat and simple, to cross-domain reconstruction or the inverse dynamics problem,” says Ma.

This research was supported, in part, by the MIT-IBM Watson AI Lab, Nexplore, DARPA Machine Common Sense program, Office of Naval Research (ONR), ONR MURI, and Mitsubishi Electric.
