Prescriptive Analytics: When data- and simulation-based models interact in a cooperative way
Abstract:
Prescriptive Analytics is an interdisciplinary topic in an interdisciplinary field; put another way, it is a synergistic hybridisation of various methods and algorithms from statistics, computer science, artificial intelligence, mathematics and operations research. Its aim is to provide optimized recommendations for action in various application areas. In this way, knowledge gained in the digital world is brought back to the real world, providing better and more efficient procedures, designs and processes.
Simulation and data-based models have several advantages and limitations. Simulation-based representations of reality can be very accurate; however, they are often too costly for the purpose of integration into powerful optimization methods. Data-based surrogate models on the other hand can be highly efficient in application – but may not be as accurate. Appropriate hybridizations of these approaches in terms of prescriptive analytics have the potential to support and complement each other and decisively advance research areas such as simulation-based optimization.
The presentation will include methodological research topics that are currently being pursued in the HEAL research group [https://heal.heuristiclab.com/] led by Affenzeller as well as concrete project results that have already found their way into economic and industrial applications.
Invited talk: SAT Solvers and Quantum Computing: Potential and Challenges
Abstract:
SAT and SMT solvers are well established in conventional computing, where they are used to solve a broad variety of problems, e.g., in the design of circuits and systems. With the emergence of quantum computers, it is natural to wonder whether the prospects of SAT-based solutions can also be materialized for this emerging technology. For one thing, many of the design tasks for quantum computing have been proven to be NP-hard—a class of problems for which SAT solvers are well suited. Then again, quantum computing deals with quantum-mechanical concepts such as superposition and entanglement—easily rendering SAT unfeasible at first glance. In this talk, we investigate whether SAT solvers and quantum computing harbor potential and, if so, what the challenges are in fully exploiting them.
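The search procedure at the heart of SAT solvers can be illustrated with a toy DPLL implementation (a minimal sketch for intuition only; industrial solvers add clause learning, watched literals, and sophisticated heuristics far beyond this):

```python
# Toy DPLL SAT solver: clauses are lists of nonzero ints (negative = negated).
# A minimal sketch of the search underlying modern SAT solvers.

def dpll(clauses, assignment=None):
    assignment = dict(assignment or {})
    # Unit propagation: repeatedly force values from single-literal clauses.
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = [l for l in clause if abs(l) not in assignment]
            satisfied = any(assignment.get(abs(l)) == (l > 0) for l in clause)
            if satisfied:
                continue
            if not unassigned:
                return None            # conflict: clause falsified
            if len(unassigned) == 1:   # unit clause forces a value
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    free = {abs(l) for c in clauses for l in c} - set(assignment)
    if not free:
        return assignment              # all variables assigned: model found
    var = min(free)
    for value in (True, False):        # branch on an unassigned variable
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
model = dpll([[1, 2], [-1, 2], [-2, 3]])
print(model)   # a satisfying assignment
```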
Model checking is a technique well established in computer science for verifying the correctness of finite-state systems (i.e., models of hardware or software systems with finite state spaces); the verification is performed with respect to specific safety properties, such as the absence of runtime errors, or with respect to general correctness properties that are typically specified in a variant of temporal logic. The principal advantage of model checking is that it is a fully automatic technique and demonstrates the violation of a property by a concrete counterexample run. In this tutorial, we demonstrate how the RISCAL software can be utilized to enjoy similar benefits in a more general context, namely the design and analysis of mathematical theories over discrete domains and the specification and verification of algorithms that solve problems in these theories. RISCAL is a “mathematical model checker” based on a statically typed variant of first-order logic and set theory where all domains have parameterized but finite sizes. This allows one to validate (respectively falsify) conjectures and problem solutions in specific small domains before attempting to verify their general correctness by proof for domains of arbitrary size. We demonstrate these ideas by modeling and analyzing, in live demonstrations, example problems from various areas of discrete mathematics, algebra, and logic. The RISCAL software is freely available and the demonstration examples can be downloaded from the RISCAL web site (https://www.risc.jku.at/research/formal/software/RISCAL).
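RISCAL's core idea, exhaustively checking a conjecture over small finite domains before attempting a general proof, can be mimicked in a few lines of ordinary Python (a toy stand-in, not RISCAL's language or logic):

```python
# RISCAL-style idea in plain Python: check a conjecture exhaustively over a
# small finite domain before attempting a general proof.
from itertools import product

def check(conjecture, domain):
    """Return a counterexample to `conjecture` over domain^2, or None."""
    for x, y in product(domain, repeat=2):
        if not conjecture(x, y):
            return (x, y)
    return None

# A true conjecture: parity of a sum equals parity of the sum of parities.
ok = check(lambda x, y: (x + y) % 2 == (x % 2 + y % 2) % 2, range(20))
# A false conjecture: subtraction is commutative.
bad = check(lambda x, y: x - y == y - x, range(20))
print(ok, bad)   # None, plus the first counterexample found
```

A falsified conjecture comes with a concrete counterexample, exactly the benefit the tutorial attributes to model checking.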
ABSTRACT. Due to the inherently wasteful nature of traditional proof-of-work mechanisms in blockchain networks, researchers are motivated to find alternatives that can make use of the computational power expended in the process. There are a number of such applications; however, most of them target very specific problems, and few are proven to work in practice. This paper presents a proof-of-work mechanism based on matrix computation, serving a much wider range of applications while also acting as the consensus mechanism of a blockchain network. I will cover the current research in the proof-of-useful-work area and focus on Proof of eXercise (PoX), another mechanism based on matrix computation. Proof of Computation (PoC), the consensus mechanism described in this paper, is a redesigned version of PoX. The mechanism is backed by a series of experiments that confirm the functionality of the protocol.
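As an illustrative sketch (not the paper's exact protocol), matrix computation makes good "useful work" because verifying a claimed result can be much cheaper than producing it; here Freivalds' randomized test checks a claimed product in roughly O(n^2) per round instead of recomputing it:

```python
# Illustrative sketch (not PoC's exact protocol): miners compute C = A @ B,
# and verifiers check the claimed C cheaply with Freivalds' randomized test.
import random

random.seed(0)   # deterministic for the demo

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def freivalds(A, B, C, rounds=16):
    """Probabilistic check that A @ B == C without recomputing the product."""
    n = len(A)
    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(n)]
        # A @ (B @ r) must equal C @ r; each round costs only O(n^2).
        if mat_vec(A, mat_vec(B, r)) != mat_vec(C, r):
            return False         # wrong C caught with prob >= 1 - 2^-rounds
    return True

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[19, 22], [43, 50]]          # correct product
print(freivalds(A, B, C))
print(freivalds(A, B, [[19, 22], [43, 51]]))   # tampered result
```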
Quantitative Programming and Markov Decision Processes
ABSTRACT. Quantitative programming (or performance evaluation programming) is a programming paradigm, which supports the formal verification of (bounded versions of) concurrent programs by using model checking techniques. By partitioning the state space of programs into bisimulation equivalence classes, this approach enables the formal verification of programs with large state spaces. The paradigm was introduced by us in previous works by developing an experimental concurrent language designed to facilitate the construction of probabilistic models that capture the behavior of programs and can be verified by using probabilistic model checking techniques. The experimental language introduced in previous works is extended in this paper with syntactic constructions which enable the specification of behavioral equivalence classes. Concurrent programs are translated into corresponding probabilistic models, which are analyzed by using the PRISM probabilistic model checker. The programmer identifies bisimulation equivalence classes, in order to enable the formal verification of programs with large state spaces. For the purpose of formal verification, in this paper we employ Markov Decision Processes.
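The kind of query a probabilistic model checker such as PRISM answers on an MDP, e.g. the maximum probability of eventually reaching a goal state, can be sketched with plain value iteration (the states and actions below are hypothetical, not from the paper):

```python
# Toy illustration of an MDP query of the kind PRISM answers
# (Pmax=? [ F "goal" ]): maximum reachability probability via value iteration.

def max_reach(states, actions, P, goal, iters=1000):
    """P[s][a] = list of (next_state, prob). Returns Pmax of reaching goal."""
    v = {s: (1.0 if s in goal else 0.0) for s in states}
    for _ in range(iters):
        new = {}
        for s in states:
            if s in goal:
                new[s] = 1.0
                continue
            acts = actions(s)
            # Nondeterminism is resolved in favor of the best action.
            new[s] = max(sum(p * v[t] for t, p in P[s][a]) for a in acts) \
                if acts else 0.0
        v = new
    return v

states = ["init", "try", "fail", "done"]
P = {
    "init": {"go": [("try", 1.0)]},
    "try":  {"safe":  [("done", 0.5), ("init", 0.5)],
             "risky": [("done", 0.9), ("fail", 0.1)]},
    "fail": {},
    "done": {},
}
v = max_reach(states, lambda s: list(P[s]), P, goal={"done"})
print(round(v["init"], 3))   # → 1.0 (retrying "safe" forever reaches "done")
```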
Empirical evaluation of LZW-Compressed Multiple Pattern Matching Algorithms
ABSTRACT. Data compression is used to reduce the cost of storing and transmitting increasingly large datasets
in various domains ranging from bioinformatics to particle physics and general
purpose computing. Most of these areas require the ability to scan the
datasets for patterns like DNA sequences, malicious executable code or various other
string queries, an operation that is hindered by the altered form of the
data. In order to speed up the matching process, several compressed pattern
matching algorithms have been previously proposed. This paper presents an overview of
the state of the art in multiple pattern matching of Lempel-Ziv-Welch encoded
archives, together with experimental results.
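As a naive baseline next to the specialized algorithms surveyed here, one can decode an LZW stream incrementally and scan a sliding window for the patterns, never materializing the whole text (a sketch of the problem setting, not one of the surveyed algorithms):

```python
# Sketch: search for multiple patterns while decoding an LZW stream,
# keeping only a window as long as the longest pattern.

def lzw_compress(text):
    table = {chr(i): i for i in range(256)}
    w, out = "", []
    for ch in text:
        if w + ch in table:
            w += ch
        else:
            out.append(table[w])
            table[w + ch] = len(table)
            w = ch
    if w:
        out.append(table[w])
    return out

def scan_lzw(codes, patterns):
    """Decode `codes` incrementally; report (pattern, end_offset) hits."""
    table = {i: chr(i) for i in range(256)}
    hits, buf, pos, prev = [], "", 0, None
    maxlen = max(map(len, patterns))
    for code in codes:
        # The cScSc special case: the code may refer to the entry being built.
        entry = table[code] if code in table else prev + prev[0]
        if prev is not None:
            table[len(table)] = prev + entry[0]
        for ch in entry:
            buf = (buf + ch)[-maxlen:]   # sliding window of recent text
            pos += 1
            for p in patterns:
                if buf.endswith(p):
                    hits.append((p, pos))
        prev = entry
    return hits

codes = lzw_compress("abracadabra abracadabra")
hits = scan_lzw(codes, ["abra", "cad"])
print(hits)   # abra ends at 4, 11, 16, 23; cad at 7, 19
```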
Deciding Whether two Codes Have the Same Ambiguities is in co-NP
ABSTRACT. We define a code to be a finite set of words C on
a finite alphabet, and an ambiguity to be an equality between
two words in the monoid C^∗. We recall that a code is uniquely
decipherable if its ambiguities are trivial. In this paper we
construct a finite-turn deterministic pushdown automaton that
recognizes the set of ambiguities of a code. This allows one to
show that deciding whether two codes of the same size have the
same ambiguities is in co-NP.
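A closely related classic result, mentioned here only for context, is the Sardinas-Patterson test, which decides unique decipherability (i.e., whether all ambiguities of a code are trivial) in polynomial time:

```python
# Sardinas-Patterson test for unique decipherability of a code.

def uniquely_decipherable(code):
    code = set(code)
    def residuals(A, B):
        # words w such that a . w = b for some a in A, b in B
        return {b[len(a):] for a in A for b in B
                if len(b) > len(a) and b.startswith(a)}
    seen, S = set(), residuals(code, code)
    while S and not (S & code):
        frozen = frozenset(S)
        if frozen in seen:          # cycle without hitting a codeword: UD
            return True
        seen.add(frozen)
        S = residuals(code, S) | residuals(S, code)
    return not (S & code)

print(uniquely_decipherable({"0", "10", "110"}))   # prefix code: True
print(uniquely_decipherable({"a", "ab", "ba"}))    # "aba" decodes two ways
```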
ABSTRACT. The Z^n lattice is the lattice generated by the set of all orthogonal unit integer vectors. Since it has an orthonormal basis, the shortest vector problem (SVP) and the closest vector problem are easy to solve in this particular lattice. However, these problems are hard to solve when we consider a rotation of the Z^n lattice. In fact, even though the Z^n-isomorphism problem is known to be in NP \cap co-NP, we still do not have an efficient algorithm to solve it. Motivated by the above, in this paper we investigate the properties of the bases of the Z^n lattice, which are the sets of column/row vectors of unimodular matrices. We show that an integer primitive vector of norm strictly greater than 1 can be extended to a unimodular matrix U such that the remaining vectors have norm strictly smaller than the initial primitive vector. We also show a reduction from SVP in any lattice isomorphic to Z^n to SVP in an (n-1)-dimensional sublattice of Z^n. We define two new classes of lattice bases and show certain results related to Z^n bases. Finally, we study the relation between any solution to the Successive Minima Problem and the set of Voronoi-relevant vectors, and present some bounds related to the compact bases of Z^n.
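Since the bases of Z^n are exactly the column (or row) sets of unimodular matrices, a candidate basis can be checked with a plain integer determinant (a small utility sketch, not from the paper):

```python
# A basis of Z^n is the column set of a unimodular matrix:
# integer entries and determinant +1 or -1.

def det(M):
    """Exact integer determinant by cofactor expansion (fine for small n)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def is_unimodular(M):
    return (all(isinstance(x, int) for row in M for x in row)
            and abs(det(M)) == 1)

print(is_unimodular([[2, 1], [1, 1]]))     # basis of Z^2: True
print(is_unimodular([[2, 0], [0, 1]]))     # det = 2, index-2 sublattice: False
```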
Fully-adaptive Model for Broadcasting with Universal Lists
ABSTRACT. In classical broadcasting, a piece of information must be transmitted to all entities of a network as quickly as possible, starting from a particular member. Since this problem has an enormous number of applications and is proven to be NP-hard, several models have been defined in the literature that try to simulate real-world situations and relax various constraints. A well-known branch of broadcasting utilizes a universal list throughout the process. That is, once a vertex is informed, it must follow its corresponding list, regardless of the originator and of the neighbor from which it received the message. The problem of broadcasting with universal lists can be categorized into two sub-models: non-adaptive and adaptive. In the latter model, a sender will skip the vertices on its list from which it has received the message, while those vertices will not be skipped in the former model.
In this study, we present another sub-model, called fully adaptive. Not only does this model benefit from a significantly better space complexity compared to the classical model, but, as will be proved, it is also faster than the two other sub-models. Since the suggested model fits real-world network architectures, we design optimal broadcast algorithms for well-known interconnection networks such as trees, grids, and cube-connected cycles under the fully-adaptive model. We also present a tight upper bound for tori under the same model.
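A toy simulation of the non-adaptive universal-list model (our own illustrative semantics: one call per round, each informed vertex following its fixed list regardless of who is already informed) shows the wasted calls that the adaptive variants avoid:

```python
# Toy simulation of broadcasting with universal lists, non-adaptive variant:
# each round, every informed vertex calls the next vertex on its fixed list,
# even if that vertex is already informed.

def broadcast_rounds(lists, originator):
    informed = {originator}
    pointer = {originator: 0}          # position in each vertex's list
    rounds = 0
    while len(informed) < len(lists):
        rounds += 1
        newly = []
        for v in list(informed):
            i = pointer.get(v, 0)
            if i < len(lists[v]):
                target = lists[v][i]
                pointer[v] = i + 1
                if target not in informed:
                    newly.append(target)
        if not newly and all(pointer.get(v, 0) >= len(lists[v])
                             for v in informed):
            return None                # lists exhausted before full coverage
        informed.update(newly)
        for v in newly:
            pointer.setdefault(v, 0)
    return rounds

# Path 0-1-2-3; each list is ordered "away from vertex 0" first.
lists = {0: [1], 1: [2, 0], 2: [3, 1], 3: [2]}
print(broadcast_rounds(lists, 0))   # → 3
```

In round 3 of this run, vertex 1 wastes a call on the already-informed vertex 0; an adaptive sender would skip it.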
ABSTRACT. The cyber-threat industry is ever-growing, and it is very likely that malware creators are using generative methods to create new malware, as these algorithms prove to be very potent. While the majority of researchers in this field focus on new methods to generate better adversarial examples (w.r.t. fidelity, variety or number), only a small portion of them are concerned with defense methods. This paper explores three methods of feature selection in the context of adversarial attacks. These methods aim to reduce the vulnerability of a Multi-Layer Perceptron to GAN-inflicted attacks by removing features based on rankings computed by type or by using LIME or F-score. Even if no strong conclusion can be drawn, this paper stands as a proof of concept that, given the good results obtained in some cases, adversarial feature selection is a worthy exploration path.
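Of the three ranking strategies, the F-score one can be sketched with scikit-learn's ANOVA F-test selector on synthetic data (the paper's PE-file features and GAN attack setup are not reproduced here):

```python
# Sketch of F-score-based feature selection before training an MLP.
# Synthetic data stands in for the paper's malware features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=40, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Rank features by ANOVA F-score and keep only the top 15.
selector = SelectKBest(f_classif, k=15).fit(X_tr, y_tr)
clf = MLPClassifier(max_iter=500, random_state=0)
clf.fit(selector.transform(X_tr), y_tr)
print("accuracy:", clf.score(selector.transform(X_te), y_te))
```

The intuition behind the defense is that features the attacker perturbs most easily are often weakly informative, so dropping low-ranked features shrinks the attack surface at little cost in accuracy.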
Obfuscation Techniques Based on Random Strings Used in Malicious VBA Scripts
ABSTRACT. Script-based attacks are still in trend when it comes to malicious files, the main attack vector being documents sent to victims as email attachments. Besides the tricks used to fool victims into downloading and opening the received documents, the attackers also try their best to avoid detection by security products. Obfuscation of the code is the main technique used in script-based attacks to evade detection.
This paper presents a study of VBA-malware obfuscation techniques that are based on the usage of generated random strings. As this technique is an obvious mark of obfuscated and, further, malicious code, the main concern in malicious code analysis is to identify as many randomly generated identifiers as possible.
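A simple heuristic in this spirit (our own illustration, not the paper's method) flags identifiers as likely random when their character entropy is high and their vowel ratio is low:

```python
# Illustrative heuristic for spotting randomly generated identifiers:
# high character entropy combined with few vowels. Thresholds are ad hoc.
import math
from collections import Counter

def looks_random(identifier, entropy_threshold=3.0, vowel_floor=0.2):
    s = identifier.lower()
    counts = Counter(s)
    # Shannon entropy of the character distribution (bits per character).
    entropy = -sum(c / len(s) * math.log2(c / len(s))
                   for c in counts.values())
    vowels = sum(s.count(v) for v in "aeiou") / len(s)
    return entropy >= entropy_threshold and vowels < vowel_floor

print(looks_random("qXzvTkWpRbNd"))   # → True  (high entropy, no vowels)
print(looks_random("openDocument"))   # → False (natural-looking name)
```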
On Technical and Legislative Challenges for Blockchain-Focused Fraud Detection
ABSTRACT. In the blockchain forensics research space, the scarcity of anti-money-laundering techniques and the difficulty of adapting the technology to current market demands are often discussed. In this paper we take a closer look at the existing problems and the available solutions by diving deeper into the current technical and legislative regulations. We conduct a review study based on existing approaches used for fraud detection, and we outline the importance of machine learning and artificial intelligence for developing scalable and reliable techniques. The main goal is to provide a stronger focus on achieving money-laundering detection, eventually using one or more approaches to support the collaboration between the financial and technical areas. The problem of financial crimes in blockchain-focused systems is divided between the regulations, policies and laws of the areas representing the two endpoints of a transaction, as well as the technical vulnerabilities or weak aspects of the blockchain which are exploited by cybercriminals.
One side class SVM training methods for malware detection
ABSTRACT. Even though machine learning methods are being used in practice for malware detection, there are still many hurdles to overcome. Several problems remain open regarding machine learning for malware detection: achieving a false positive rate as low as possible, fast classification, and low RAM and disk memory usage. Due to these constraints, security solutions often have to rely on simpler models rather than on more complex ones. The purpose of this paper is to reduce the false positive rate of SVM models in the context of malware detection.
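One natural reading of the one-class direction can be sketched with scikit-learn's OneClassSVM trained on benign samples only; its nu parameter directly bounds the fraction of training points flagged as outliers, which is the knob relevant to false positives (synthetic data, not the paper's models):

```python
# Sketch: OneClassSVM learns the benign region only; anything far from it
# is flagged. nu upper-bounds the fraction of benign training samples
# allowed to fall outside, i.e. it constrains the false positive rate.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
benign = rng.normal(loc=0.0, scale=1.0, size=(500, 8))   # "clean" files
malware = rng.normal(loc=5.0, scale=1.0, size=(50, 8))   # shifted cluster

clf = OneClassSVM(kernel="rbf", nu=0.05).fit(benign)
fp_rate = np.mean(clf.predict(benign) == -1)      # benign flagged as outlier
detection = np.mean(clf.predict(malware) == -1)   # malware flagged
print(f"false positives: {fp_rate:.2%}, detection: {detection:.2%}")
```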
Building a Machine Learning Model for Malware Detection from Ground Zero
ABSTRACT. I propose a method of identifying which kinds of features are the most valuable for creating a machine learning-based model used for malware detection, taking into consideration the development time needed to extract the information and the quality of the extracted data based on the F-score. This paper provides guidance on which types of features new researchers should focus on first to achieve good results in detecting malware, whether they must use a small number of features or have no limitations in this regard.
A new class of nonlinear operators and fixed point convergence using K iteration process
ABSTRACT. In this presentation, we will discuss a newly developed class of nonlinear operators whose concept is more general than that of nonexpansive operators. We support the idea of the new class of mappings by constructing two numerical examples. Using the K iterative method, we establish several weak and strong fixed point convergence theorems. As an application, we will solve some variational inequality problems.
Quasi, enriched, and admissible perturbations for some classes of operators in the iterative approximation of fixed points
ABSTRACT. The aim of this paper is threefold:
\begin{enumerate}
\item to present some remarks on the terminology of the operators in fixed point iterative methods;
\item to survey some of the most relevant M\u aru\c ster's results in fixed point theory;
\item to give new convergence results in terms of enriched classes of operators and admissible perturbations of a given operator.
\end{enumerate}
On a M\u aru\c ster problem in the iterative approximation of fixed points for demicontractive operators
ABSTRACT. In this paper, using the concept of demicontractive operators introduced by \c St. M\u aru\c ster, we present a survey of M\u aru\c ster's results and give new results for the following problem: in the context of a Hilbert space $H$, under which conditions is the operator $T_\lambda(x)=(1-\lambda) x+ \lambda T(x)$ a weakly Picard operator, where $T:C\subset H\to C$ is a demicontractive operator, $C$ is a nonempty closed and convex subset of $H$ and $\lambda\in (0,1)$.
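The averaged operator $T_\lambda$ above is exactly the map iterated by the Krasnoselskii scheme; a quick numerical illustration on the toy map T(x) = cos(x) (our own example, not from the paper):

```python
# Krasnoselskii iteration x_{n+1} = (1 - lam) * x_n + lam * T(x_n),
# i.e. iterating the averaged operator T_lambda, on the toy map T = cos,
# which has the unique fixed point x* ~ 0.739085 (the Dottie number).
import math

def krasnoselskii(T, x0, lam=0.5, tol=1e-12, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        x_next = (1 - lam) * x + lam * T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x_star = krasnoselskii(math.cos, x0=0.0)
print(x_star)   # ~ 0.739085
```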
Kannan and Bianchini fixed point theorems in partially ordered H-distance space
ABSTRACT. In this paper we establish new fixed point theorems for Kannan and Bianchini type mappings in the very general setting of a space with a distance. We work on a partially ordered set on which we assume there is an H-distance and a non-decreasing mapping. The fixed point is unique under some additional conditions.
Weak Convergence Theorems for Inertial Krasnoselskii-Mann Iterations in the class of enriched nonexpansive operators in Hilbert Spaces
ABSTRACT. In this paper, we present some results about the approximation of fixed points of nonexpansive and enriched nonexpansive operators. There are numerous works in this regard (for example [6], [7], [9], [10], [14], [16], [35] and references therein); the bibliographical references are extensive and are listed at the end of this paper. In order to approximate the fixed points of enriched nonexpansive mappings, we use the Krasnoselskii-Mann iteration, for which we prove a weak convergence theorem and a theorem that offers a convergence rate analysis. Our results extend some classical convergence theorems from the literature from the case of nonexpansive mappings to that of enriched nonexpansive mappings. One of our contributions is that the convergence analysis and rate-of-convergence results are obtained under conditions that are less complicated and restrictive than those assumed in other related results in the literature.
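The inertial Krasnoselskii-Mann scheme can be sketched numerically; the parameter values and the toy map T(x) = cos(x) below are our own illustration, not the paper's setting:

```python
# Inertial Krasnoselskii-Mann iteration:
#   w_n     = x_n + theta * (x_n - x_{n-1})      (inertial extrapolation)
#   x_{n+1} = (1 - lam) * w_n + lam * T(w_n)     (averaged step)
# demonstrated on the toy nonexpansive-type map T = cos.
import math

def inertial_km(T, x0, theta=0.1, lam=0.5, n_iter=200):
    x_prev, x = x0, x0
    for _ in range(n_iter):
        w = x + theta * (x - x_prev)
        x_prev, x = x, (1 - lam) * w + lam * T(w)
    return x

x_star = inertial_km(math.cos, x0=0.0)
print(x_star)   # ~ 0.739085, the fixed point of cos
```

The inertial term reuses the previous step as momentum; for theta = 0 the scheme reduces to the plain Krasnoselskii-Mann iteration.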
ABSTRACT. In this paper, we first present some applications of the local fixed point theorem for single-valued Ciric operators, based on the main result of M. Moga (2022). The second purpose is to give a constructive proof of a fixed point theorem for multi-valued Ciric type operators, as well as some applications of a local fixed point theorem for multi-valued Ciric type operators. Furthermore, we study some stability properties of the fixed point problem related to the above-mentioned operators. Our results complement and extend some recent works in the literature.
ABSTRACT. In the context of a project aiming to build human-behaving robots for process automation, named entity recognition becomes one of the first tasks we need to solve.
In this paper we present our experience in building NER models for recognizing specific entities of interest with the help of the state-of-the-art pre-trained BERT model. Noticing that the model built with the help of a general-knowledge dataset yields poor results in retrieving entities specific to our particular use cases, we constructed two datasets tailored to our working context and trained BERT-based models on them. We show that properly constructing the specific datasets is sufficient to obtain good entity classification performance, without further increasing the model learning time.
ABSTRACT. The task of document summarization became more pressing as the information volume increased exponentially, from news websites to scientific writings. As such, the necessity for tools that automatically summarize written text, while keeping its meaning and extracting relevant information, increased. Extractive summarization is an NLP task that targets the identification of relevant sentences from a document and the creation of a summary from those sentences. While extensive research and large datasets are available for English, Romanian and other low-resource languages lack such methods and corpora. In this work, we introduce a new approach for summarization using a Masked Language Model for assessing sentence importance, and we research several baselines for the Romanian language including K-Means with BERT embeddings, an MLP considering handcrafted features, and PacSum. In addition, we also present an evaluation corpus to be used for assessing current and future models. The unsupervised methods do not require large datasets for training and make use of low computational power. All of the proposed approaches consider BERT, a state-of-the-art Transformer used for generating contextualized embeddings. The obtained ROUGE score of 56.29 is comparable with state-of-the-art scores, and the METEOR average of 51.20 surpasses the most advanced current model.
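The K-Means baseline can be sketched as follows, with TF-IDF sentence vectors standing in for the BERT embeddings used in the paper so that the example stays self-contained:

```python
# Sketch of K-Means extractive summarization: cluster sentence vectors,
# then pick the sentence closest to each centroid as the summary.
# TF-IDF vectors stand in for the paper's BERT embeddings.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def kmeans_summary(sentences, k=2):
    X = TfidfVectorizer().fit_transform(sentences).toarray()
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    chosen = set()
    for center in km.cluster_centers_:
        # representative sentence = nearest to the cluster centroid
        chosen.add(int(np.argmin(np.linalg.norm(X - center, axis=1))))
    return [sentences[i] for i in sorted(chosen)]

doc = [
    "The storm damaged hundreds of homes along the coast.",
    "Rescue teams evacuated residents from flooded areas.",
    "Meanwhile, the city council debated next year's budget.",
    "The budget vote was postponed until the storm response ends.",
]
print(kmeans_summary(doc, k=2))
```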
ABSTRACT. In the past decade, social media platforms have gained a lot of popularity amongst people all around the globe, with some seizing this opportunity to proliferate offensive language and hate speech. In addition, platforms that choose not to employ text filtering techniques are being exploited by users who tend to use offensive and abusive language. This paper presents the creation and annotation of a novel Romanian-language corpus for offensive language detection, FB-RO-Offense, an offensive speech dataset containing 4,455 organically generated comments from Facebook live broadcasts, annotated not only for coarse-grained binary detection tasks but also for fine-grained ones based on the degree of the offense. We describe the data collection process and the annotation procedure and analyze the content of the corpus. Additionally, we present the results of automatic classification using state-of-the-art methods and establish a strong baseline for this new dataset, including SVM, BERT-based, and CNN architectures, with results that show an F1-score of 0.83 for four-way classification and an F1-score of 0.90 for binary classification.
Exploring the potential of prototype-based soft-labels data distillation for imbalanced data classification
ABSTRACT. Dataset distillation aims at synthesizing a dataset from a small number of artificially generated data items that are capable of preserving the information needed to reproduce or approximate a machine learning (ML) model trained on the entire original dataset. Consequently, data distillation methods are usually tied to a specific ML algorithm. Most of the work reported in the literature deals with the distillation of large collections of images in the context of neural network models. The work reported on tabular data distillation is sparse and mainly aims to address the distillation process from a theoretical perspective. The current paper explores the potential of a simple distillation technique previously proposed in the context of less-than-one-shot learning. The main goal is to push further the performance of prototype-based soft-label distillation in terms of classification accuracy by incorporating optimization steps in the distillation process. The analysis is performed on real-world datasets presenting various degrees of imbalance. Experimental studies trace the capability of the method to distill the data, but also the opportunity to act as an augmentation method, i.e. to generate new data that can increase model accuracy when used in conjunction with the original dataset.
Cerebral Metastases Segmentation using Transfer Gliomas Learning and GrabCut
ABSTRACT. Segmentation of medical images is an important area of research that can be used in prognosis prediction and patient treatment. Due to the high variability of the data, the task of developing an accurate segmentation method remains challenging. In this paper we address the problem of cerebral metastases segmentation and focus our analysis on the BrainMetShare dataset. In order to enhance metastases segmentation performance we propose a two-stage method. In the first stage we employ a transfer learning procedure where we train a U-Net model on the similar task of low- and high-grade glioma segmentation provided by the BraTS dataset and then fine-tune the model for solving our problem of cerebral metastases segmentation. In the second stage we use GrabCut to refine the metastases segmentation masks obtained from the first stage. In the experimental evaluation we show that our two-stage method based on transfer learning and GrabCut progressively outperforms the baseline model trained only on cerebral metastases data from BrainMetShare.
Astronomical Object Detection Using Artificial Intelligence
ABSTRACT. There are many objects found in astronomical surveys or even on the Internet. They lack scientific information such as location, ascension, declination, morphology, etc. This research provides a way to acquire more data from a basic formatted picture found in surveys or on the web, such as PNG or JPG. The astronomical objects handled by this application are galaxies, nebulae and solar systems. Using segmentation algorithms, the objects within the picture are obtained and searched throughout astronomical databases. An algorithm for image comparison is employed to obtain the results. These results are used to query the databases for all the information needed. The proposed algorithm starts with shallow methods such as the mean squared error. They will be improved along the development process toward more advanced machine learning techniques such as neural networks. The results obtained from the algorithm help by providing more information and context to the surveys and the data available on the web.
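The shallow starting point mentioned above, mean squared error between two equally sized images, is essentially a one-liner (toy arrays stand in for survey images):

```python
# Mean squared error between two images of equal shape: the shallow
# comparison baseline that more advanced methods would later replace.
import numpy as np

def mse(img_a, img_b):
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((a - b) ** 2))

galaxy = np.zeros((4, 4))
same = galaxy.copy()
brighter = galaxy + 10.0
print(mse(galaxy, same), mse(galaxy, brighter))   # → 0.0 100.0
```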
Game Digger: Creating an adaptive recommender system
ABSTRACT. With the rising popularity of indie game developers and the technological advancements of graphics cards, the video game industry is becoming more and more saturated. Dozens of new titles are released every month, and as the number of video games increases, players face another problem: "What should I play?" The solution proposed in this paper addresses the need on the video game market for a recommendation system that does not take into account limitations such as the console, hardware or gaming platform used. At the moment, there is a multitude of recommendation systems, one for each video game platform; unfortunately, each of them covers only a part of the multitude of existing video games. Even the most popular recommender systems, which do not take into account the limitations mentioned above, come with a number of disadvantages, the most important being: isolating the system into a web page, low accuracy, long waiting times and lack of a personalized experience. All this could be overcome if we integrated such a system with a tool used by video game players and managed to offer a personalized experience to everyone. The proposed system will be evaluated by a group of users who will give it ratings based on key indices such as accuracy, performance, impact and so on.
Fixed points of multivalued b-enriched multivalued nonexpansive mappings and *-b-enriched nonexpansive mappings
ABSTRACT. In a recent article, Vasile Berinde generalized nonexpansive mappings to b-enriched nonexpansive mappings.
In particular, a 0-enriched nonexpansive mapping is an ordinary nonexpansive mapping.
The aim of this paper is to turn b-enriched nonexpansive mappings into multivalued mappings in two ways.
In the first way, using the definition of b-enriched multivalued nonexpansive mappings provided in a recent article, we prove strong and weak convergence fixed point theorems based on the Krasnoselskii iterative process.
In the second way, using the definition of *-nonexpansive mappings introduced and studied by Hussain and Latif, b-enriched nonexpansive mappings are turned into *-b-enriched nonexpansive mappings, for which we prove some fixed point theorems.
Common fixed point theorems for enriched Jungck contractions in convex metric spaces
ABSTRACT. In this work, we introduce common fixed point theorems for two single-valued mappings under an enriched contraction condition in convex metric spaces in the sense of Takahashi. The unique common fixed point is approximated using the Krasnoselskij iteration. Our theorems extend results obtained via the technique of enriched contractions in Banach spaces for two mappings satisfying a weak commutativity condition.
ABSTRACT. Motivated by the existence of cyclic phenomena in which some characteristics are mapped into corresponding ones over more than one phase, we introduce and investigate the behavior of $r$-cyclic operators with respect to a covering of a metric space. We study the convergence of the Picard iteration to a fixed point of such an operator under different types of generalized contraction conditions. The obtained results may have interesting practical applications in various domains.
ABSTRACT. In the present paper we prove an integral-type metrical fixed point theorem for non-self mappings. The existence of the fixed point is ensured by hypotheses formulated in terms of equivalent metric spaces. Some illustrative examples are also furnished to support the main result.