- Machines Getting with the Program: Understanding Intent Arguments of Non-Canonical Directives Modern dialog managers face the challenge of having to fulfill human-level conversational skills as part of common user expectations, including but not limited to discourse with no clear objective. Along with these requirements, agents are expected to extrapolate intent from the user's dialogue even when subjected to non-canonical forms of speech. This depends on the agent's comprehension of paraphrased forms of such utterances. Especially in low-resource languages, the lack of data is a bottleneck that prevents advancements of the comprehension performance for these types of agents. In this regard, here we demonstrate the necessity of extracting the intent argument of non-canonical directives in a natural language format, which may yield more accurate parsing, and suggest guidelines for building a parallel corpus for this purpose. Following the guidelines, we construct a Korean corpus of 50K instances of question/command-intent pairs, including the labels for classification of the utterance type. We also propose a method for mitigating class imbalance, demonstrating the potential applications of the corpus generation method and its multilingual extensibility. 5 authors · Dec 1, 2019
1 Positive Geometries and Canonical Forms Recent years have seen a surprising connection between the physics of scattering amplitudes and a class of mathematical objects--the positive Grassmannian, positive loop Grassmannians, tree and loop Amplituhedra--which have been loosely referred to as "positive geometries". The connection between the geometry and physics is provided by a unique differential form canonically determined by the property of having logarithmic singularities (only) on all the boundaries of the space, with residues on each boundary given by the canonical form on that boundary. In this paper we initiate an exploration of "positive geometries" and "canonical forms" as objects of study in their own right in a more general mathematical setting. We give a precise definition of positive geometries and canonical forms, introduce general methods for finding forms for more complicated positive geometries from simpler ones, and present numerous examples of positive geometries in projective spaces, Grassmannians, and toric, cluster and flag varieties. We also illustrate a number of strategies for computing canonical forms which yield interesting representations for the forms associated with wide classes of positive geometries, ranging from the simplest Amplituhedra to new expressions for the volume of arbitrary convex polytopes. 3 authors · Mar 13, 2017
- Equivariance with Learned Canonicalization Functions Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce a canonical representation of the data. These canonicalization functions can readily be plugged into non-equivariant backbone architectures. We offer explicit ways to implement them for many groups of interest. We show that this approach enjoys universality while providing interpretable insights. Our main hypothesis is that learning a neural network to perform canonicalization is better than using predefined heuristics. Our results show that learning the canonicalization function indeed leads to better results and that the approach achieves excellent performance in practice. 5 authors · Nov 11, 2022
1 Manify: A Python Library for Learning Non-Euclidean Representations We present Manify, an open-source Python library for non-Euclidean representation learning. Leveraging manifold learning techniques, Manify provides tools for learning embeddings in (products of) non-Euclidean spaces, performing classification and regression with data that lives in such spaces, and estimating the curvature of a manifold. Manify aims to advance research and applications in machine learning by offering a comprehensive suite of tools for manifold-based data analysis. Our source code, examples, datasets, results, and documentation are available at https://github.com/pchlenski/manify 4 authors · Mar 12 1
- Generating logical magic states with the aid of non-Abelian topological order In fault-tolerant quantum computing with the surface code, non-Clifford gates are crucial for universal computation. However, implementing these gates using methods like magic state distillation and code switching requires significant resources. In this work, we propose a new protocol that combines magic state preparation and code switching to realize logical non-Clifford operations with the potential for fault tolerance. Our approach begins with a special logical state in the Z_4 surface code. By applying a sequence of transformations, the system goes through different topological codes, including the non-Abelian D_4 quantum double model. This process ultimately produces a magic state in a condensed Z_2 surface code, which enables the implementation of a logical T gate in the standard Z_2 surface code. In our analysis, we employ a framework where the topological codes are represented by their topological orders and all the transformations are considered as topological manipulations such as gauging symmetries and condensing anyons. This perspective is particularly useful for understanding code switching between topological codes. 2 authors · Feb 2
- Cusps and Commensurability Classes of Hyperbolic 4-Manifolds There are six orientable, compact, flat 3-manifolds that can occur as cusp cross-sections of hyperbolic 4-manifolds. This paper provides criteria for exactly when a given commensurability class of arithmetic hyperbolic 4-manifolds contains a representative with a given cusp type. In particular, for three of the six cusp types, we provide infinitely many examples of commensurability classes that contain no manifolds with cusps of the given type; no such examples were previously known for any cusp type. 1 authors · Sep 24, 2021
- Table2answer: Read the database and answer without SQL Semantic parsing is the task of mapping natural language to logic form. In question answering, semantic parsing can be used to map the question to logic form and execute the logic form to get the answer. One key problem for semantic parsing is the hard label work. We study this problem in another way: we do not use the logic form any more. Instead we only use the schema and answer info. We think that the logic form step can be injected into the deep model. The reason why we think removing the logic form step is possible is that human can do the task without explicit logic form. We use BERT-based model and do the experiment in the WikiSQL dataset, which is a large natural language to SQL dataset. Our experimental evaluations that show that our model can achieves the baseline results in WikiSQL dataset. 2 authors · Feb 12, 2019
- Adposition and Case Supersenses v2.6: Guidelines for English This document offers a detailed linguistic description of SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al., 2018), an inventory of 52 semantic labels ("supersenses") that characterize the use of adpositions and case markers at a somewhat coarse level of granularity, as demonstrated in the STREUSLE corpus (https://github.com/nert-nlp/streusle/ ; version 4.5 tracks guidelines version 2.6). Though the SNACS inventory aspires to be universal, this document is specific to English; documentation for other languages will be published separately. Version 2 is a revision of the supersense inventory proposed for English by Schneider et al. (2015, 2016) (henceforth "v1"), which in turn was based on previous schemes. The present inventory was developed after extensive review of the v1 corpus annotations for English, plus previously unanalyzed genitive case possessives (Blodgett and Schneider, 2018), as well as consideration of adposition and case phenomena in Hebrew, Hindi, Korean, and German. Hwang et al. (2017) present the theoretical underpinnings of the v2 scheme. Schneider et al. (2018) summarize the scheme, its application to English corpus data, and an automatic disambiguation task. Liu et al. (2021) offer an English Lexical Semantic Recognition tagger that includes SNACS labels in its output. This documentation can also be browsed alongside corpus data on the Xposition website (Gessler et al., 2022): http://www.xposition.org/ 11 authors · Apr 7, 2017
1 Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expression. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7\% improvement in reasoning efficiency for different LLMs, and up to a 72.7\% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution towards efficient, structured communication in agent communication. Our code is released at https://github.com/thunlp/AutoForm. 9 authors · Feb 28, 2024
3 ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata We introduce ParaNames, a massively multilingual parallel name resource consisting of 140 million names spanning over 400 languages. Names are provided for 16.8 million entities, and each entity is mapped from a complex type hierarchy to a standard type (PER/LOC/ORG). Using Wikidata as a source, we create the largest resource of this type to date. We describe our approach to filtering and standardizing the data to provide the best quality possible. ParaNames is useful for multilingual language processing, both in defining tasks for name translation/transliteration and as supplementary data for tasks such as named entity recognition and linking. We demonstrate the usefulness of ParaNames on two tasks. First, we perform canonical name translation between English and 17 other languages. Second, we use it as a gazetteer for multilingual named entity recognition, obtaining performance improvements on all 10 languages evaluated. 2 authors · May 15, 2024
1 Faster Algorithms for Structured Matrix Multiplication via Flip Graph Search We give explicit low-rank bilinear non-commutative schemes for multiplying structured n times n matrices with 2 leq n leq 5, which serve as building blocks for recursive algorithms with improved multiplicative factors in asymptotic complexity. Our schemes are discovered over F_2 or F_3 and lifted to Z or Q. Using a flip graph search over tensor decompositions, we derive schemes for general, upper-triangular, lower-triangular, symmetric, and skew-symmetric inputs, as well as products of a structured matrix with its transpose. In particular, we obtain 4 times 4 rank-34 schemes: (i) multiplying a general matrix by its transpose using 10 recursive calls, improving the factor from 26/41 (0.634) to 8/13 (0.615); and (ii) multiplying an upper-triangular matrix by a general matrix using 12 recursive calls, improving the factor from 8/13 (0.615) to 22/37 (0.595). Additionally, using F_3 flip graphs, we discover schemes over Q that fundamentally require the inverse of 2, including a 2 times 2 symmetric-symmetric multiplication of rank 5 and a 3 times 3 skew-symmetric-general multiplication of rank 14 (improving upon AlphaTensor's 15). 3 authors · Nov 13
- Linking Surface Facts to Large-Scale Knowledge Graphs Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples. These facts are, however, merely surface forms, the ambiguity of which impedes their downstream usage; e.g., the surface phrase "Michael Jordan" may refer to either the former basketball player or the university professor. Knowledge Graphs (KGs), on the other hand, contain facts in a canonical (i.e., unambiguous) form, but their coverage is limited by a static schema (i.e., a fixed set of entities and predicates). To bridge this gap, we need the best of both worlds: (i) high coverage of free-text OIEs, and (ii) semantic precision (i.e., monosemy) of KGs. In order to achieve this goal, we propose a new benchmark with novel evaluation protocols that can, for example, measure fact linking performance on a granular triple slot level, while also measuring if a system has the ability to recognize that a surface form has no match in the existing KG. Our extensive evaluation of several baselines show that detection of out-of-KG entities and predicates is more difficult than accurate linking to existing ones, thus calling for more research efforts on this difficult task. We publicly release all resources (data, benchmark and code) on https://github.com/nec-research/fact-linking. 5 authors · Oct 23, 2023
- ParaNames: A Massively Multilingual Entity Name Corpus We introduce ParaNames, a multilingual parallel name resource consisting of 118 million names spanning across 400 languages. Names are provided for 13.6 million entities which are mapped to standardized entity types (PER/LOC/ORG). Using Wikidata as a source, we create the largest resource of this type to-date. We describe our approach to filtering and standardizing the data to provide the best quality possible. ParaNames is useful for multilingual language processing, both in defining tasks for name translation/transliteration and as supplementary data for tasks such as named entity recognition and linking. We demonstrate an application of ParaNames by training a multilingual model for canonical name translation to and from English. Our resource is released under a Creative Commons license (CC BY 4.0) at https://github.com/bltlab/paranames. 2 authors · Feb 28, 2022
- Denotationally Correct, Purely Functional, Efficient Reverse-mode Automatic Differentiation Reverse-mode differentiation is used for optimization, but it introduces references, which break the purity of the underlying programs, making them notoriously harder to optimize. We present a reverse-mode differentiation on a purely functional language with array operations. It is the first one to deliver a provably efficient, purely functional, and denotationally correct reverse-mode differentiation. We show that our transformation is semantically correct and verifies the cheap gradient principle. Inspired by PROPs and compilation to categories, we introduce a novel intermediate representation that we call 'unary form'. Our reverse-mode transformation is factored as a compilation scheme through this intermediate representation. We obtain provably efficient gradients by performing general partial evaluation optimizations after our reverse-mode transformation, as opposed to manually derived ones. For simple first-order programs, the obtained output programs resemble static-single-assignment (SSA) code. We emphasize the modularity of our approach and show how our language can easily be enriched with more optimized primitives, as required for some speed-ups in practice. 2 authors · Dec 19, 2022
1 Learning Naturally Aggregated Appearance for Efficient 3D Editing Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to the implicitness. In view of such a deficiency, we propose to replace the color field with an explicit 2D appearance aggregation, also called canonical image, with which users can easily customize their 3D editing via 2D image processing. To avoid the distortion effect and facilitate convenient editing, we complement the canonical image with a projection field that maps 3D points onto 2D pixels for texture lookup. This field is carefully initialized with a pseudo canonical camera model and optimized with offset regularity to ensure naturalness of the aggregated appearance. Extensive experimental results on three datasets suggest that our representation, dubbed AGAP, well supports various ways of 3D editing (e.g., stylization, interactive drawing, and content extraction) with no need of re-optimization for each case, demonstrating its generalizability and efficiency. Project page is available at https://felixcheng97.github.io/AGAP/. 8 authors · Dec 11, 2023
13 Model Editing with Canonical Examples We introduce model editing with canonical examples, a setting in which (1) a single learning example is provided per desired behavior, (2) evaluation is performed exclusively out-of-distribution, and (3) deviation from an initial model is strictly limited. A canonical example is a simple instance of good behavior, e.g., The capital of Mauritius is Port Louis) or bad behavior, e.g., An aspect of researchers is coldhearted). The evaluation set contains more complex examples of each behavior (like a paragraph in which the capital of Mauritius is called for.) We create three datasets and modify three more for model editing with canonical examples, covering knowledge-intensive improvements, social bias mitigation, and syntactic edge cases. In our experiments on Pythia language models, we find that LoRA outperforms full finetuning and MEMIT. We then turn to the Backpack language model architecture because it is intended to enable targeted improvement. The Backpack defines a large bank of sense vectors--a decomposition of the different uses of each word--which are weighted and summed to form the output logits of the model. We propose sense finetuning, which selects and finetunes a few (approx 10) sense vectors for each canonical example, and find that it outperforms other finetuning methods, e.g., 4.8% improvement vs 0.3%. Finally, we improve GPT-J-6B by an inference-time ensemble with just the changes from sense finetuning of a 35x smaller Backpack, in one setting outperforming editing GPT-J itself (4.1% vs 1.0%). 6 authors · Feb 8, 2024 1
- CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images We present a novel framework for reconstructing animatable human avatars from multiple images, termed CanonicalFusion. Our central concept involves integrating individual reconstruction results into the canonical space. To be specific, we first predict Linear Blend Skinning (LBS) weight maps and depth maps using a shared-encoder-dual-decoder network, enabling direct canonicalization of the 3D mesh from the predicted depth maps. Here, instead of predicting high-dimensional skinning weights, we infer compressed skinning weights, i.e., 3-dimensional vector, with the aid of pre-trained MLP networks. We also introduce a forward skinning-based differentiable rendering scheme to merge the reconstructed results from multiple images. This scheme refines the initial mesh by reposing the canonical mesh via the forward skinning and by minimizing photometric and geometric errors between the rendered and the predicted results. Our optimization scheme considers the position and color of vertices as well as the joint angles for each image, thereby mitigating the negative effects of pose errors. We conduct extensive experiments to demonstrate the effectiveness of our method and compare our CanonicalFusion with state-of-the-art methods. Our source codes are available at https://github.com/jsshin98/CanonicalFusion. 7 authors · Jul 5, 2024
- Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation We release S\={a}mayik, a dataset of around 53,000 parallel English-Sanskrit sentences, written in contemporary prose. Sanskrit is a classical language still in sustenance and has a rich documented heritage. However, due to the limited availability of digitized content, it still remains a low-resource language. Existing Sanskrit corpora, whether monolingual or bilingual, have predominantly focused on poetry and offer limited coverage of contemporary written materials. S\={a}mayik is curated from a diverse range of domains, including language instruction material, textual teaching pedagogy, and online tutorials, among others. It stands out as a unique resource that specifically caters to the contemporary usage of Sanskrit, with a primary emphasis on prose writing. Translation models trained on our dataset demonstrate statistically significant improvements when translating out-of-domain contemporary corpora, outperforming models trained on older classical-era poetry datasets. Finally, we also release benchmark models by adapting four multilingual pre-trained models, three of them have not been previously exposed to Sanskrit for translating between English and Sanskrit while one of them is multi-lingual pre-trained translation model including English and Sanskrit. The dataset and source code is present at https://github.com/ayushbits/saamayik. 7 authors · May 23, 2023
- Flat matrix models for quantum permutation groups We study the matrix models pi:C(S_N^+)to M_N(C(X)) which are flat, in the sense that the standard generators of C(S_N^+) are mapped to rank 1 projections. Our first result is a generalization of the Pauli matrix construction at N=4, using finite groups and 2-cocycles. Our second result is the construction of a universal representation of C(S_N^+), inspired from the Sinkhorn algorithm, that we conjecture to be inner faithful. 2 authors · Feb 14, 2016
- Nuclear Structure with Discrete Non-Orthogonal Shell-Model : new frontiers We present developments and applications for the diagonalization of shell-model hamiltonians in a discrete non-orthogonal basis (DNO-SM). The method, and its actual numerical implementation CARINA, based on mean-field and beyond-mean field techniques has already been applied in previous studies and is focused on basis states selection optimization. The method is benchmarked against a full set of sd shell exact diagonalizations, and is applied for the first time to the heavy deformed ^{254}No nucleus. 2 authors · Mar 2, 2022
- Mass-Radius Relationships for Solid Exoplanets We use new interior models of cold planets to investigate the mass-radius relationships of solid exoplanets, considering planets made primarily of iron, silicates, water, and carbon compounds. We find that the mass-radius relationships for cold terrestrial-mass planets of all compositions we considered follow a generic functional form that is not a simple power law: log_{10} R_s = k_1 + 1/3 log_{10}(M_s) - k_2 M_s^{k_3} for up to M_p approx 20 M_{oplus}, where M_s and R_s are scaled mass and radius values. This functional form arises because the common building blocks of solid planets all have equations of state that are well approximated by a modified polytrope of the form rho = rho_0 + c P^n. We find that highly detailed planet interior models, including temperature structure and phase changes, are not necessary to derive solid exoplanet bulk composition from mass and radius measurements. For solid exoplanets with no substantial atmosphere we have also found that: with 5% fractional uncertainty in planet mass and radius it is possible to distinguish among planets composed predominantly of iron or silicates or water ice but not more detailed compositions; with sim~5% uncertainty water ice planets with gtrsim 25% water by mass may be identified; the minimum plausible planet size for a given mass is that of a pure iron planet; and carbon planet mass-radius relationships overlap with those of silicate and water planets due to similar zero-pressure densities and equations of state. We propose a definition of "super Earths'' based on the clear distinction in radii between planets with significant gas envelopes and those without. 4 authors · Jul 19, 2007
1 Higher-Order DisCoCat (Peirce-Lambek-Montague semantics) We propose a new definition of higher-order DisCoCat (categorical compositional distributional) models where the meaning of a word is not a diagram, but a diagram-valued higher-order function. Our models can be seen as a variant of Montague semantics based on a lambda calculus where the primitives act on string diagrams rather than logical formulae. As a special case, we show how to translate from the Lambek calculus into Peirce's system beta for first-order logic. This allows us to give a purely diagrammatic treatment of higher-order and non-linear processes in natural language semantics: adverbs, prepositions, negation and quantifiers. The theoretical definition presented in this article comes with a proof-of-concept implementation in DisCoPy, the Python library for string diagrams. 2 authors · Nov 29, 2023
- On weakly Einstein Kähler surfaces Riemannian four-manifolds in which the triple contraction of the curvature tensor against itself yields a functional multiple of the metric are called weakly Einstein. We focus on weakly Einstein K\"ahler surfaces. We provide several conditions characterizing those K\"ahler surfaces which are weakly Einstein, classify weakly Einstein K\"ahler surfaces having some specific additional properties, and construct new examples. 4 authors · Dec 31, 2024
- Non-Perturbative Hamiltonian and Higher Loop Corrections in USR Inflation Calculating the action and the interaction Hamiltonian at higher orders in cosmological perturbation theory is a cumbersome task. We employ the formalism of EFT of inflation in models of single field ultra slow-roll inflation and obtain a non-perturbative result for the Hamiltonian in terms of the Goldstone field pi. To complete the dictionary, a non-linear relation between the curvature perturbations and pi is presented. Equipped with these non-linear results, we calculate the higher order loop corrections in USR models which are employed for PBHs formation. It is shown that the loop corrections on long CMB scales increase rapidly with the number of loop L and the setup will go out of perturbative control at the four-loop level. 2 authors · Feb 13
- The Virtual Large Cardinal Hierarchy We continue the study of the virtual large cardinal hierarchy by analysing virtual versions of superstrong, Woodin, and Berkeley cardinals. Gitman and Schindler showed that virtualizations of strong and supercompact cardinals yield the same large cardinal notion. We provide various equivalent characterizations of virtually Woodin cardinals, including showing that On is virtually Woodin if and only if for every class A, there is a proper class of virtually A-extendible cardinals. We introduce the virtual Vopenka principle for finite languages and show that it is not equivalent to the virtual Vopenka principle (although the two principles are equiconsistent), but is equivalent to the assertion that On is virtually pre-Woodin, a weakening of virtually Woodin, which is equivalent to having for every class A, a weakly virtually A-extendible cardinal. We show that if there are no virtually Berkeley cardinals, then On is virtually Woodin if and only if On is virtually pre-Woodin (if and only if the virtual Vopenka principle for finite languages holds). In particular, if the virtual Vopenka principle holds and On is not Mahlo, then On is not virtually Woodin, and hence there is a virtually Berkeley cardinal. 3 authors · Sep 13, 2021
- An elementary and unified proof of Grothendieck's inequality We present an elementary, self-contained proof of Grothendieck's inequality that unifies the real and complex cases and yields both the Krivine and Haagerup bounds, the current best-known explicit bounds for the real and complex Grothendieck constants respectively. This article is intended to be pedagogical, combining and streamlining known ideas of Lindenstrauss--Pe{\l}czy\'nski, Krivine, and Haagerup into a proof that need only univariate calculus, basic complex variables, and a modicum of linear algebra as prerequisites. 3 authors · Nov 28, 2017
- Who Said Neural Networks Aren't Linear? Neural networks are famously nonlinear. However, linearity is defined relative to a pair of vector spaces, f:XtoY. Is it possible to identify a pair of non-standard vector spaces for which a conventionally nonlinear function is, in fact, linear? This paper introduces a method that makes such vector spaces explicit by construction. We find that if we sandwich a linear operator A between two invertible neural networks, f(x)=g_y^{-1}(A g_x(x)), then the corresponding vector spaces X and Y are induced by newly defined addition and scaling actions derived from g_x and g_y. We term this kind of architecture a Linearizer. This framework makes the entire arsenal of linear algebra, including SVD, pseudo-inverse, orthogonal projection and more, applicable to nonlinear mappings. Furthermore, we show that the composition of two Linearizers that share a neural network is also a Linearizer. We leverage this property and demonstrate that training diffusion models using our architecture makes the hundreds of sampling steps collapse into a single step. We further utilize our framework to enforce idempotency (i.e. f(f(x))=f(x)) on networks leading to a globally projective generative model and to demonstrate modular style transfer. 3 authors · Oct 9
11 TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese Large language models (LLMs) have significantly advanced natural language processing, but their progress has yet to be equal across languages. While most LLMs are trained in high-resource languages like English, multilingual models generally underperform monolingual ones. Additionally, aspects of their multilingual foundation sometimes restrict the byproducts they produce, like computational demands and licensing regimes. In this study, we document the development of open-foundation models tailored for use in low-resource settings, their limitations, and their benefits. This is the TeenyTinyLlama pair: two compact models for Brazilian Portuguese text generation. We release them under the permissive Apache 2.0 license on GitHub and Hugging Face for community use and further development. See https://github.com/Nkluge-correa/TeenyTinyLlama 5 authors · Jan 29, 2024 4
- Locally resolvable BIBDs and generalized quadrangles with ovoids In this note we establish a 1-to-1 correspondence between the class of generalized quadrangles with ovoids and the class of balanced incomplete block designs that posses a non-triangular local resolution system and have the appropriate parameters. We present a non-triangular local resolution system for a difference family BIBD construction of Sprott. 1 authors · Aug 1, 2024
1 Multiresolution Textual Inversion We extend Textual Inversion to learn pseudo-words that represent a concept at different resolutions. This allows us to generate images that use the concept with different levels of detail and also to manipulate different resolutions using language. Once learned, the user can generate images at different levels of agreement to the original concept; "A photo of S^*(0)" produces the exact object while the prompt "A photo of S^*(0.8)" only matches the rough outlines and colors. Our framework allows us to generate images that use different resolutions of an image (e.g. details, textures, styles) as separate pseudo-words that can be composed in various ways. We open-soure our code in the following URL: https://github.com/giannisdaras/multires_textual_inversion 2 authors · Nov 30, 2022
- Householder Projector for Unsupervised Latent Semantics Discovery Generative Adversarial Networks (GANs), especially the recent style-based generators (StyleGANs), have versatile semantics in the structured latent space. Latent semantics discovery methods emerge to move around the latent code such that only one factor varies during the traversal. Recently, an unsupervised method proposed a promising direction to directly use the eigenvectors of the projection matrix that maps latent codes to features as the interpretable directions. However, one overlooked fact is that the projection matrix is non-orthogonal and the number of eigenvectors is too large. The non-orthogonality would entangle semantic attributes in the top few eigenvectors, and the large dimensionality might result in meaningless variations among the directions even if the matrix is orthogonal. To avoid these issues, we propose Householder Projector, a flexible and general low-rank orthogonal matrix representation based on Householder transformations, to parameterize the projection matrix. The orthogonality guarantees that the eigenvectors correspond to disentangled interpretable semantics, while the low-rank property encourages that each identified direction has meaningful variations. We integrate our projector into pre-trained StyleGAN2/StyleGAN3 and evaluate the models on several benchmarks. Within only 1% of the original training steps for fine-tuning, our projector helps StyleGANs to discover more disentangled and precise semantic attributes without sacrificing image fidelity. 4 authors · Jul 16, 2023
- Correlation functions of degenerate fields in Super-Liouville field theory We study four-point correlation functions of degenerated fields in the NS sector in Super-Liouville field theory. We find integral expressions for these functions using the BPZ equation, and study some superconformal properties of these solutions. Finally, we present the general form for three-point correlation functions. 1 authors · Feb 17
4 Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer Scale variation is a fundamental challenge in computer vision. Objects of the same class can have different sizes, and their perceived size is further affected by the distance from the camera. These variations are local to the objects, i.e., different object sizes may change differently within the same image. To effectively handle scale variations, we present a deep equilibrium canonicalizer (DEC) to improve the local scale equivariance of a model. DEC can be easily incorporated into existing network architectures and can be adapted to a pre-trained model. Notably, we show that on the competitive ImageNet benchmark, DEC improves both model performance and local scale consistency across four popular pre-trained deep-nets, e.g., ViT, DeiT, Swin, and BEiT. Our code is available at https://github.com/ashiq24/local-scale-equivariance. 7 authors · Aug 19 4
- A Graph is Worth K Words: Euclideanizing Graph using Pure Transformer Can we model non-Euclidean graphs as pure language or even Euclidean vectors while retaining their inherent information? The non-Euclidean property have posed a long term challenge in graph modeling. Despite recent GNN and Graphformer efforts encoding graphs as Euclidean vectors, recovering original graph from the vectors remains a challenge. We introduce GraphsGPT, featuring a Graph2Seq encoder that transforms non-Euclidean graphs into learnable graph words in a Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from graph words to ensure information equivalence. We pretrain GraphsGPT on 100M molecules and yield some interesting findings: (1) Pretrained Graph2Seq excels in graph representation learning, achieving state-of-the-art results on 8/9 graph classification and regression tasks. (2) Pretrained GraphGPT serves as a strong graph generator, demonstrated by its ability to perform both unconditional and conditional graph generation. (3) Graph2Seq+GraphGPT enables effective graph mixup in the Euclidean space, overcoming previously known non-Euclidean challenge. (4) Our proposed novel edge-centric GPT pretraining task is effective in graph fields, underscoring its success in both representation and generation. 6 authors · Feb 4, 2024
- Orb-v3: atomistic simulation at scale We introduce Orb-v3, the next generation of the Orb family of universal interatomic potentials. Models in this family expand the performance-speed-memory Pareto frontier, offering near SoTA performance across a range of evaluations with a >10x reduction in latency and > 8x reduction in memory. Our experiments systematically traverse this frontier, charting the trade-off induced by roto-equivariance, conservatism and graph sparsity. Contrary to recent literature, we find that non-equivariant, non-conservative architectures can accurately model physical properties, including those which require higher-order derivatives of the potential energy surface. This model release is guided by the principle that the most valuable foundation models for atomic simulation will excel on all fronts: accuracy, latency and system size scalability. The reward for doing so is a new era of computational chemistry driven by high-throughput and mesoscale all-atom simulations. 7 authors · Apr 8
- An extended Kodaira Spencer functional This note is about an extension of the Kodaira-Spencer functional to Calabi-Yau manifolds of any dimension. 1 authors · Jun 16, 2021
- Superposition for Lambda-Free Higher-Order Logic We introduce refutationally complete superposition calculi for intentional and extensional clausal lambda-free higher-order logic, two formalisms that allow partial application and applied variables. The calculi are parameterized by a term order that need not be fully monotonic, making it possible to employ the lambda-free higher-order lexicographic path and Knuth-Bendix orders. We implemented the calculi in the Zipperposition prover and evaluated them on Isabelle/HOL and TPTP benchmarks. They appear promising as a stepping stone towards complete, highly efficient automatic theorem provers for full higher-order logic. 4 authors · May 5, 2020
- The complex evolution of supermassive black holes in cosmological simulations We present here self-consistent zoom-in simulations of massive galaxies forming in a full cosmological setting. The simulations are run with an updated version of the KETJU code, which is able to resolve the gravitational dynamics of their supermassive black holes, while simultaneously modelling the large-scale astrophysical processes in the surrounding galaxies, such as gas cooling, star formation and stellar and AGN feedback. The KETJU code is able to accurately model the complex behaviour of multiple SMBHs, including dynamical friction, stellar scattering and gravitational wave emission, and also to resolve Lidov-Kozai oscillations that naturally occur in hierarchical triplet SMBH systems. In general most of the SMBH binaries form at moderately high eccentricities, with typical values in the range of e =0.6-0.95, meaning that the circular binary models that are commonly used in the literature are insufficient for capturing the typical binary evolution. 7 authors · Mar 23, 2022
- On κ-solutions and canonical neighborhoods in 4d Ricci flow We introduce a classification conjecture for kappa-solutions in 4d Ricci flow. Our conjectured list includes known examples from the literature, but also a new 1-parameter family of Z_2^2times O_3-symmetric bubble-sheet ovals that we construct. We observe that some special cases of the conjecture follow from recent results in the literature. We also introduce a stronger variant of the classification conjecture for ancient asymptotically cylindrical 4d Ricci flows, which does not assume smoothness and nonnegative curvature operator a priori. Assuming this stronger variant holds true, we establish a canonical neighborhood theorem for 4d Ricci flow through cylindrical singularities, which shares some elements in common with Perelman's canonical neighborhood theorem for 3d Ricci flow as well as the mean-convex neighborhood theorem for mean curvature flow through neck-singularities. Finally, we argue that quotient-necks lead to new phenomena, and sketch an example of non-uniqueness for 4d Ricci flow through singularities. 1 authors · Aug 2, 2023
- A comparison between higher-order nonclassicalities of superposition engineered coherent and thermal states We consider an experimentally obtainable SUP operator, defined by using a generalized superposition of products of field annihilation (a) and creation (a^dagger) operators of the type, A = saa^dagger+t{a^dagger}a with s^2+t^2=1. We apply this SUP operator on coherent and thermal quantum states, the states thus produced are referred as SUP-operated coherent state (SOCS) and SUP-operated thermal state (SOTS), respectively. In the present work, we report a comparative study between the higher-order nonclassical properties of SOCS and SOTS. The comparison is performed by using a set of nonclassicality witnesses (e.g., higher-order antiubunching, higher-order sub-Poissonian photon statistics, higher-order squeezing, Agarwal-Tara parameter, Klyshko's condition). The existence of higher-order nonclassicalities in SOCS and SOTS have been investigated for the first time. In view of possible experimental verification of the proposed scheme, we present exact calculations to reveal the effect of non-unit quantum efficiency of quantum detector on higher-order nonclassicalities. 2 authors · Apr 13, 2022
- Non-local Neural Networks Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code is available at https://github.com/facebookresearch/video-nonlocal-net . 4 authors · Nov 21, 2017
- Pair State Transfer Let L denote the Laplacian matrix of a graph G. We study continuous quantum walks on G defined by the transition matrix U(t)=expleft(itLright). The initial state is of the pair state form, e_a-e_b with a,b being any two vertices of G. We provide two ways to construct infinite families of graphs that have perfect pair transfer. We study a "transitivity" phenomenon which cannot occur in vertex state transfer. We characterize perfect pair state transfer on paths and cycles. We also study the case when quantum walks are generated by the unsigned Laplacians of underlying graphs and the initial states are of the plus state form, e_a+e_b. When the underlying graphs are bipartite, plus state transfer is equivalent to pair state transfer. 2 authors · Jun 4, 2019
- Categorical Schrödinger Bridge Matching The Schr\"odinger Bridge (SB) is a powerful framework for solving generative modeling tasks such as unpaired domain translation. Most SB-related research focuses on continuous data space R^{D} and leaves open theoretical and algorithmic questions about applying SB methods to discrete data, e.g, on finite spaces S^{D}. Notable examples of such sets S are codebooks of vector-quantized (VQ) representations of modern autoencoders, tokens in texts, categories of atoms in molecules, etc. In this paper, we provide a theoretical and algorithmic foundation for solving SB in discrete spaces using the recently introduced Iterative Markovian Fitting (IMF) procedure. Specifically, we theoretically justify the convergence of discrete-time IMF (D-IMF) to SB in discrete spaces. This enables us to develop a practical computational algorithm for SB which we call Categorical Schr\"odinger Bridge Matching (CSBM). We show the performance of CSBM via a series of experiments with synthetic data and VQ representations of images. 2 authors · Feb 3
- Surface codes: Towards practical large-scale quantum computation This article provides an introduction to surface code quantum computing. We first estimate the size and speed of a surface code quantum computer. We then introduce the concept of the stabilizer, using two qubits, and extend this concept to stabilizers acting on a two-dimensional array of physical qubits, on which we implement the surface code. We next describe how logical qubits are formed in the surface code array and give numerical estimates of their fault-tolerance. We outline how logical qubits are physically moved on the array, how qubit braid transformations are constructed, and how a braid between two logical qubits is equivalent to a controlled-NOT. We then describe the single-qubit Hadamard, S and T operators, completing the set of required gates for a universal quantum computer. We conclude by briefly discussing physical implementations of the surface code. We include a number of appendices in which we provide supplementary information to the main text. 4 authors · Aug 4, 2012
- Eigenvalues restricted by Lyapunov exponent of eigenstates We point out that the Lyapunov exponent of the eigenstate places restrictions on the eigenvalue. Consequently, with regard to non-Hermitian systems, even without any symmetry, the non-conservative Hamiltonians can exhibit real spectra as long as Lyapunov exponents of eigenstates inhibit imaginary parts of eigenvalues. Our findings open up a new route to study non-Hermitian physics. 2 authors · Jun 20, 2022
- Punctual Hilbert Schemes and Certified Approximate Singularities In this paper we provide a new method to certify that a nearby polynomial system has a singular isolated root with a prescribed multiplicity structure. More precisely, given a polynomial system f =(f_1, ldots, f_N)in C[x_1, ldots, x_n]^N, we present a Newton iteration on an extended deflated system that locally converges, under regularity conditions, to a small deformation of f such that this deformed system has an exact singular root. The iteration simultaneously converges to the coordinates of the singular root and the coefficients of the so called inverse system that describes the multiplicity structure at the root. We use $alpha$-theory test to certify the quadratic convergence, and togive bounds on the size of the deformation and on the approximation error. The approach relies on an analysis of the punctual Hilbert scheme, for which we provide a new description. We show in particular that some of its strata can be rationally parametrized and exploit these parametrizations in the certification. We show in numerical experimentation how the approximate inverse system can be computed as a starting point of the Newton iterations and the fast numerical convergence to the singular root with its multiplicity structure, certified by our criteria. 3 authors · Feb 14, 2020
- Normalization of Lithuanian Text Using Regular Expressions Text Normalization is an integral part of any text-to-speech synthesis system. In a natural language text, there are elements such as numbers, dates, abbreviations, etc. that belong to other semiotic classes. They are called non-standard words (NSW) and need to be expanded into ordinary words. For this purpose, it is necessary to identify the semiotic class of each NSW. The taxonomy of semiotic classes adapted to the Lithuanian language is presented in the work. Sets of rules are created for detecting and expanding NSWs based on regular expressions. Experiments with three completely different data sets were performed and the accuracy was assessed. Causes of errors are explained and recommendations are given for the development of text normalization rules. 1 authors · Dec 29, 2023
- Bimonoidal Structure of Probability Monads We give a conceptual treatment of the notion of joints, marginals, and independence in the setting of categorical probability. This is achieved by endowing the usual probability monads (like the Giry monad) with a monoidal and an opmonoidal structure, mutually compatible (i.e. a bimonoidal structure). If the underlying monoidal category is cartesian monoidal, a bimonoidal structure is given uniquely by a commutative strength. However, if the underlying monoidal category is not cartesian monoidal, a strength is not enough to guarantee all the desired properties of joints and marginals. A bimonoidal structure is then the correct requirement for the more general case. We explain the theory and the operational interpretation, with the help of the graphical calculus for monoidal categories. We give a definition of stochastic independence based on the bimonoidal structure, compatible with the intuition and with other approaches in the literature for cartesian monoidal categories. We then show as an example that the Kantorovich monad on the category of complete metric spaces is a bimonoidal monad for a non-cartesian monoidal structure. 2 authors · Apr 10, 2018
- Neural Generation for Czech: Data and Baselines We present the first dataset targeted at end-to-end NLG in Czech in the restaurant domain, along with several strong baseline models using the sequence-to-sequence approach. While non-English NLG is under-explored in general, Czech, as a morphologically rich language, makes the task even harder: Since Czech requires inflecting named entities, delexicalization or copy mechanisms do not work out-of-the-box and lexicalizing the generated outputs is non-trivial. In our experiments, we present two different approaches to this this problem: (1) using a neural language model to select the correct inflected form while lexicalizing, (2) a two-step generation setup: our sequence-to-sequence model generates an interleaved sequence of lemmas and morphological tags, which are then inflected by a morphological generator. 2 authors · Oct 11, 2019
- Factorising Meaning and Form for Intent-Preserving Paraphrasing We propose a method for generating paraphrases of English questions that retain the original intent but use a different surface form. Our model combines a careful choice of training objective with a principled information bottleneck, to induce a latent encoding space that disentangles meaning and form. We train an encoder-decoder model to reconstruct a question from a paraphrase with the same meaning and an exemplar with the same surface form, leading to separated encoding spaces. We use a Vector-Quantized Variational Autoencoder to represent the surface form as a set of discrete latent variables, allowing us to use a classifier to select a different surface form at test time. Crucially, our method does not require access to an external source of target exemplars. Extensive experiments and a human evaluation show that we are able to generate paraphrases with a better tradeoff between semantic preservation and syntactic novelty compared to previous methods. 2 authors · May 31, 2021
- Logical Languages Accepted by Transformer Encoders with Hard Attention We contribute to the study of formal languages that can be recognized by transformer encoders. We focus on two self-attention mechanisms: (1) UHAT (Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention Transformers). UHAT encoders are known to recognize only languages inside the circuit complexity class {sf AC}^0, i.e., accepted by a family of poly-sized and depth-bounded boolean circuits with unbounded fan-ins. On the other hand, AHAT encoders can recognize languages outside {sf AC}^0), but their expressive power still lies within the bigger circuit complexity class {sf TC}^0, i.e., {sf AC}^0-circuits extended by majority gates. We first show a negative result that there is an {sf AC}^0-language that cannot be recognized by an UHAT encoder. On the positive side, we show that UHAT encoders can recognize a rich fragment of {sf AC}^0-languages, namely, all languages definable in first-order logic with arbitrary unary numerical predicates. This logic, includes, for example, all regular languages from {sf AC}^0. We then show that AHAT encoders can recognize all languages of our logic even when we enrich it with counting terms. We apply these results to derive new results on the expressive power of UHAT and AHAT up to permutation of letters (a.k.a. Parikh images). 4 authors · Oct 5, 2023
- Consistency of the Predicative Calculus of Cumulative Inductive Constructions (pCuIC) In order to avoid well-know paradoxes associated with self-referential definitions, higher-order dependent type theories stratify the theory using a countably infinite hierarchy of universes (also known as sorts), Type_0 : Type_1 : cdots . Such type systems are called cumulative if for any type A we have that A : Type_{i} implies A : Type_{i+1}. The predicative calculus of inductive constructions (pCIC) which forms the basis of the Coq proof assistant, is one such system. In this paper we present and establish the soundness of the predicative calculus of cumulative inductive constructions (pCuIC) which extends the cumulativity relation to inductive types. 2 authors · Oct 11, 2017
- On Signs of eigenvalues of Modular forms satisfying Ramanujan Conjecture Let F in S_{k_1}(Gamma^{(2)}(N_1)) and G in S_{k_2}(Gamma^{(2)}(N_2)) be two Siegel cusp forms over the congruence subgroups Gamma^{(2)}(N_1) and Gamma^{(2)}(N_2) respectively. Assume that they are Hecke eigenforms in different eigenspaces and satisfy the Generalized Ramanujan Conjecture. Let lambda_F(p) denote the eigenvalue of F with respect to the Hecke operator T(p). In this article, we compute a lower bound for the density of the set of primes, { p : lambda_F(p) lambda_G(p) < 0 }. 1 authors · Dec 12, 2024
- End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification Autoregressive decoding is the only part of sequence-to-sequence models that prevents them from massive parallelization at inference time. Non-autoregressive models enable the decoder to generate all output symbols independently in parallel. We present a novel non-autoregressive architecture based on connectionist temporal classification and evaluate it on the task of neural machine translation. Unlike other non-autoregressive methods which operate in several steps, our model can be trained end-to-end. We conduct experiments on the WMT English-Romanian and English-German datasets. Our models achieve a significant speedup over the autoregressive models, keeping the translation quality comparable to other non-autoregressive models. 2 authors · Nov 12, 2018
- Non-trivial saddles in microscopic description of black holes Non-trivial gravitational saddles have played a key role in the island proposal for the black hole information paradox. It is worth asking if non-trivial saddles exist in microscopic descriptions of black holes. We show this to be the case for 1/8 BPS black holes in N = 8 String Theory in a duality frame, where all charges are Ramond Ramond. The saddles are in the Coulomb branch, where they describe marginally stable bound states of the constituent branes, and correspond to vacua of the BFSS model. The non-perturbative suppression scale is determined by the binding energy. 2 authors · Dec 7, 2023
2 Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently nonEuclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-time, to topologically complex interactions between neurons in the brain, to the algebraic transformations describing symmetries of physical systems. Extracting knowledge from such non-Euclidean data necessitates a broader mathematical perspective. Echoing the 19th-century revolutions that gave rise to non-Euclidean geometry, an emerging line of research is redefining modern machine learning with non-Euclidean structures. Its goal: generalizing classical methods to unconventional data types with geometry, topology, and algebra. In this review, we provide an accessible gateway to this fast-growing field and propose a graphical taxonomy that integrates recent advances into an intuitive unified framework. We subsequently extract insights into current challenges and highlight exciting opportunities for future development in this field. 9 authors · Jul 12, 2024
- On the Parameterization and Initialization of Diagonal State Space Models State space models (SSM) have recently been shown to be very effective as a deep learning layer as a promising alternative to sequence models such as RNNs, CNNs, or Transformers. The first version to show this potential was the S4 model, which is particularly effective on tasks involving long-range dependencies by using a prescribed state matrix called the HiPPO matrix. While this has an interpretable mathematical mechanism for modeling long dependencies, it introduces a custom representation and algorithm that can be difficult to implement. On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix. This work seeks to systematically understand how to parameterize and initialize such diagonal state space models. While it follows from classical results that almost all SSMs have an equivalent diagonal form, we show that the initialization is critical for performance. We explain why DSS works mathematically, by showing that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension. We also systematically describe various design choices in parameterizing and computing diagonal SSMs, and perform a controlled empirical study ablating the effects of these choices. Our final model S4D is a simple diagonal version of S4 whose kernel computation requires just 2 lines of code and performs comparably to S4 in almost all settings, with state-of-the-art results for image, audio, and medical time-series domains, and averaging 85\% on the Long Range Arena benchmark. 4 authors · Jun 23, 2022
- Topological Quantum Compilation Using Mixed-Integer Programming We introduce the Mixed-Integer Quadratically Constrained Quadratic Programming framework for the quantum compilation problem and apply it in the context of topological quantum computing. In this setting, quantum gates are realized by sequences of elementary braids of quasiparticles with exotic fractional statistics in certain two-dimensional topological condensed matter systems, described by effective topological quantum field theories. We specifically focus on a non-semisimple version of topological field theory, which provides a foundation for an extended theory of Ising anyons and which has recently been shown by Iulianelli et al., Nature Communications {\bf 16}, 6408 (2025), to permit universal quantum computation. While the proofs of this pioneering result are existential in nature, the mixed integer programming provides an approach to explicitly construct quantum gates in topological systems. We demonstrate this by focusing specifically on the entangling controlled-NOT operation, and its local equivalence class, using braiding operations in the non-semisimple Ising system. This illustrates the utility of the Mixed-Integer Quadratically Constrained Quadratic Programming for topological quantum compilation. 5 authors · Nov 12
- GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies, via aggregating query-specific global context to each query position. However, through a rigorous empirical analysis, we have found that the global contexts modeled by non-local network are almost the same for different query positions within an image. In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation. We further observe that this simplified design shares similar structure with Squeeze-Excitation Network (SENet). Hence we unify them into a three-step general framework for global context modeling. Within the general framework, we design a better instantiation, called the global context (GC) block, which is lightweight and can effectively model the global context. The lightweight property allows us to apply it for multiple layers in a backbone network to construct a global context network (GCNet), which generally outperforms both simplified NLNet and SENet on major benchmarks for various recognition tasks. The code and configurations are released at https://github.com/xvjiarui/GCNet. 5 authors · Apr 25, 2019
- Magic State Injection on IBM Quantum Processors Above the Distillation Threshold The surface code family is a promising approach to implementing fault-tolerant quantum computations. Universal fault-tolerance requires error-corrected non-Clifford operations, in addition to Clifford gates, and for the former, it is imperative to experimentally demonstrate additional resources known as magic states. Another challenge is to efficiently embed surface codes into quantum hardware with connectivity constraints. This work simultaneously addresses both challenges by employing a qubit-efficient rotated heavy-hexagonal surface code for IBM quantum processors (ibm\_fez) and implementing the magic state injection protocol. Our work reports error thresholds for both logical bit- and phase-flip errors, of approx0.37% and approx0.31%, respectively, which are higher than the threshold values previously reported with traditional embedding. The post-selection-based preparation of logical magic states |H_Lrangle and |T_Lrangle achieve fidelities of 0.8806pm0.0002 and 0.8665pm0.0003, respectively, which are both above the magic state distillation threshold. Additionally, we report the minimum fidelity among injected arbitrary single logical qubit states as 0.8356pm0.0003. Our work demonstrates the potential for realising non-Clifford logical gates by producing high-fidelity logical magic states on IBM quantum devices. 3 authors · Dec 2, 2024
- Normalizable fermion modes in a holographic superconductor We consider fermions in a zero-temperature superconducting anti-de Sitter domain wall solution and find continuous bands of normal modes. These bands can be either partially filled or totally empty and gapped. We present a semi-classical argument which approximately captures the main features of the normal mode spectrum. 3 authors · Nov 18, 2009
- Distinct Minkowski Spaces from BMS Supertranslations This work provides a smooth and everywhere well-defined extension of Bondi-Metzner-Sachs (BMS) supertranslations into the bulk of Minkowski space. The supertranslations lead to physically distinct spacetimes, all isometric to Minkowski space. This construction is in contrast to the often used, non-smooth BMS transformations that appear in a gauge-fixed description of the theory. 1 authors · Nov 7, 2017
1 A theory of meta-factorization We introduce meta-factorization, a theory that describes matrix decompositions as solutions of linear matrix equations: the projector and the reconstruction equation. Meta-factorization reconstructs known factorizations, reveals their internal structures, and allows for introducing modifications, as illustrated with SVD, QR, and UTV factorizations. The prospect of meta-factorization also provides insights into computational aspects of generalized matrix inverses and randomized linear algebra algorithms. The relations between the Moore-Penrose pseudoinverse, generalized Nystr\"{o}m method, and the CUR decomposition are revealed here as an illustration. Finally, meta-factorization offers hints on the structure of new factorizations and provides the potential of creating them. 1 authors · Nov 29, 2021
- Non-deep Networks Depth is the hallmark of deep neural networks. But more depth means more sequential computation and higher latency. This begs the question -- is it possible to build high-performing "non-deep" neural networks? We show that it is. To do so, we use parallel subnetworks instead of stacking one layer after another. This helps effectively reduce depth while maintaining high performance. By utilizing parallel substructures, we show, for the first time, that a network with a depth of just 12 can achieve top-1 accuracy over 80% on ImageNet, 96% on CIFAR10, and 81% on CIFAR100. We also show that a network with a low-depth (12) backbone can achieve an AP of 48% on MS-COCO. We analyze the scaling rules for our design and show how to increase performance without changing the network's depth. Finally, we provide a proof of concept for how non-deep networks could be used to build low-latency recognition systems. Code is available at https://github.com/imankgoyal/NonDeepNetworks. 4 authors · Oct 14, 2021
- How Do We Answer Complex Questions: Discourse Structure of Long-form Answers Long-form answers, consisting of multiple sentences, can provide nuanced and comprehensive answers to a broader set of questions. To better understand this complex and understudied task, we study the functional structure of long-form answers collected from three datasets, ELI5, WebGPT and Natural Questions. Our main goal is to understand how humans organize information to craft complex answers. We develop an ontology of six sentence-level functional roles for long-form answers, and annotate 3.9k sentences in 640 answer paragraphs. Different answer collection methods manifest in different discourse structures. We further analyze model-generated answers -- finding that annotators agree less with each other when annotating model-generated answers compared to annotating human-written answers. Our annotated data enables training a strong classifier that can be used for automatic analysis. We hope our work can inspire future research on discourse-level modeling and evaluation of long-form QA systems. 3 authors · Mar 21, 2022
12 The Multimodal Universe: Enabling Large-Scale Machine Learning with 100TB of Astronomical Scientific Data We present the MULTIMODAL UNIVERSE, a large-scale multimodal dataset of scientific astronomical data, compiled specifically to facilitate machine learning research. Overall, the MULTIMODAL UNIVERSE contains hundreds of millions of astronomical observations, constituting 100\,TB of multi-channel and hyper-spectral images, spectra, multivariate time series, as well as a wide variety of associated scientific measurements and "metadata". In addition, we include a range of benchmark tasks representative of standard practices for machine learning methods in astrophysics. This massive dataset will enable the development of large multi-modal models specifically targeted towards scientific applications. All codes used to compile the MULTIMODAL UNIVERSE and a description of how to access the data is available at https://github.com/MultimodalUniverse/MultimodalUniverse 29 authors · Dec 3, 2024
- A Benchmark and Dataset for Post-OCR text correction in Sanskrit Sanskrit is a classical language with about 30 million extant manuscripts fit for digitisation, available in written, printed or scannedimage forms. However, it is still considered to be a low-resource language when it comes to available digital resources. In this work, we release a post-OCR text correction dataset containing around 218,000 sentences, with 1.5 million words, from 30 different books. Texts in Sanskrit are known to be diverse in terms of their linguistic and stylistic usage since Sanskrit was the 'lingua franca' for discourse in the Indian subcontinent for about 3 millennia. Keeping this in mind, we release a multi-domain dataset, from areas as diverse as astronomy, medicine and mathematics, with some of them as old as 18 centuries. Further, we release multiple strong baselines as benchmarks for the task, based on pre-trained Seq2Seq language models. We find that our best-performing model, consisting of byte level tokenization in conjunction with phonetic encoding (Byt5+SLP1), yields a 23% point increase over the OCR output in terms of word and character error rates. Moreover, we perform extensive experiments in evaluating these models on their performance and analyse common causes of mispredictions both at the graphemic and lexical levels. Our code and dataset is publicly available at https://github.com/ayushbits/pe-ocr-sanskrit. 4 authors · Nov 15, 2022
- NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness Existing evaluation benchmarks of language models of code (code LMs) focus almost exclusively on whether the LMs can generate functionally-correct code. In real-world software engineering, developers think beyond functional correctness. They have requirements on "how" a functionality should be implemented to meet overall system design objectives like efficiency, security, and maintainability. They would also trust the code LMs more if the LMs demonstrate robust understanding of requirements and code semantics. We propose a new benchmark NoFunEval to evaluate code LMs on non-functional requirements and simple classification instances for both functional and non-functional requirements. We propose a prompting method, Coding Concepts (CoCo), as a way for a developer to communicate the domain knowledge to the LMs. We conduct an extensive evaluation of twenty-two code LMs. Our finding is that they generally falter when tested on our benchmark, hinting at fundamental blindspots in their training setups. Surprisingly, even the classification accuracy on functional-correctness instances derived from the popular HumanEval benchmark is low, calling in question the depth of their comprehension and the source of their success in generating functionally-correct code in the first place. We will release our benchmark and evaluation scripts publicly at https://aka.ms/NoFunEval. 5 authors · Jan 29, 2024
- PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics We propose PROSE-FD, a zero-shot multimodal PDE foundational model for simultaneous prediction of heterogeneous two-dimensional physical systems related to distinct fluid dynamics settings. These systems include shallow water equations and the Navier-Stokes equations with incompressible and compressible flow, regular and complex geometries, and different buoyancy settings. This work presents a new transformer-based multi-operator learning approach that fuses symbolic information to perform operator-based data prediction, i.e. non-autoregressive. By incorporating multiple modalities in the inputs, the PDE foundation model builds in a pathway for including mathematical descriptions of the physical behavior. We pre-train our foundation model on 6 parametric families of equations collected from 13 datasets, including over 60K trajectories. Our model outperforms popular operator learning, computer vision, and multi-physics models, in benchmark forward prediction tasks. We test our architecture choices with ablation studies. 6 authors · Sep 15, 2024
- RConE: Rough Cone Embedding for Multi-Hop Logical Query Answering on Multi-Modal Knowledge Graphs Multi-hop query answering over a Knowledge Graph (KG) involves traversing one or more hops from the start node to answer a query. Path-based and logic-based methods are state-of-the-art for multi-hop question answering. The former is used in link prediction tasks. The latter is for answering complex logical queries. The logical multi-hop querying technique embeds the KG and queries in the same embedding space. The existing work incorporates First Order Logic (FOL) operators, such as conjunction (wedge), disjunction (vee), and negation (neg), in queries. Though current models have most of the building blocks to execute the FOL queries, they cannot use the dense information of multi-modal entities in the case of Multi-Modal Knowledge Graphs (MMKGs). We propose RConE, an embedding method to capture the multi-modal information needed to answer a query. The model first shortlists candidate (multi-modal) entities containing the answer. It then finds the solution (sub-entities) within those entities. Several existing works tackle path-based question-answering in MMKGs. However, to our knowledge, we are the first to introduce logical constructs in querying MMKGs and to answer queries that involve sub-entities of multi-modal entities as the answer. Extensive evaluation of four publicly available MMKGs indicates that RConE outperforms the current state-of-the-art. 3 authors · Aug 21, 2024
- Foundations for Near-Term Quantum Natural Language Processing We provide conceptual and mathematical foundations for near-term quantum natural language processing (QNLP), and do so in quantum computer scientist friendly terms. We opted for an expository presentation style, and provide references for supporting empirical evidence and formal statements concerning mathematical generality. We recall how the quantum model for natural language that we employ canonically combines linguistic meanings with rich linguistic structure, most notably grammar. In particular, the fact that it takes a quantum-like model to combine meaning and structure, establishes QNLP as quantum-native, on par with simulation of quantum systems. Moreover, the now leading Noisy Intermediate-Scale Quantum (NISQ) paradigm for encoding classical data on quantum hardware, variational quantum circuits, makes NISQ exceptionally QNLP-friendly: linguistic structure can be encoded as a free lunch, in contrast to the apparently exponentially expensive classical encoding of grammar. Quantum speed-up for QNLP tasks has already been established in previous work with Will Zeng. Here we provide a broader range of tasks which all enjoy the same advantage. Diagrammatic reasoning is at the heart of QNLP. Firstly, the quantum model interprets language as quantum processes via the diagrammatic formalism of categorical quantum mechanics. Secondly, these diagrams are via ZX-calculus translated into quantum circuits. Parameterisations of meanings then become the circuit variables to be learned. Our encoding of linguistic structure within quantum circuits also embodies a novel approach for establishing word-meanings that goes beyond the current standards in mainstream AI, by placing linguistic structure at the heart of Wittgenstein's meaning-is-context. 4 authors · Dec 7, 2020
- Annotation Guidelines for Corpus Novelties: Part 2 -- Alias Resolution Version 1.0 The Novelties corpus is a collection of novels (and parts of novels) annotated for Alias Resolution, among other tasks. This document describes the guidelines applied during the annotation process. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels, and illustrating how canonical names should be defined, and which names should be considered as referring to the same entity. 2 authors · Oct 1, 2024
- VALUE: Understanding Dialect Disparity in NLU English Natural Language Understanding (NLU) systems have achieved great performances and even outperformed humans on benchmarks like GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE). Other dialects have been largely overlooked in the NLP community. This leads to biased and inequitable NLU systems that serve only a sub-population of speakers. To understand disparities in current models and to facilitate more dialect-competent NLU systems, we introduce the VernAcular Language Understanding Evaluation (VALUE) benchmark, a challenging variant of GLUE that we created with a set of lexical and morphosyntactic transformation rules. In this initial release (V.1), we construct rules for 11 features of African American Vernacular English (AAVE), and we recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments in a participatory design manner. Experiments show that these new dialectal features can lead to a drop in model performance. To run the transformation code and download both synthetic and gold-standard dialectal GLUE benchmarks, see https://github.com/SALT-NLP/value 5 authors · Apr 6, 2022
- Breaking supersymmetry with pure spinors For several classes of BPS vacua, we find a procedure to modify the PDEs that imply preserved supersymmetry and the equations of motion so that they still imply the latter but not the former. In each case we trace back this supersymmetry-breaking deformation to a distinct modification of the pure spinor equations that provide a geometrical interpretation of supersymmetry. We give some concrete examples: first we generalize the Imamura class of Mink6 solutions by removing a symmetry requirement, and then derive some local and global solutions both before and after breaking supersymmetry. 2 authors · Nov 27, 2019
1 Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters Recent works have demonstrated reasonable success of representation learning in hypercomplex space. Specifically, "fully-connected layers with Quaternions" (4D hypercomplex numbers), which replace real-valued matrix multiplications in fully-connected layers with Hamilton products of Quaternions, both enjoy parameter savings with only 1/4 learnable parameters and achieve comparable performance in various applications. However, one key caveat is that hypercomplex space only exists at very few predefined dimensions (4D, 8D, and 16D). This restricts the flexibility of models that leverage hypercomplex multiplications. To this end, we propose parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined. As a result, our method not only subsumes the Hamilton product, but also learns to operate on any arbitrary nD hypercomplex space, providing more architectural flexibility using arbitrarily 1/n learnable parameters compared with the fully-connected layer counterpart. Experiments of applications to the LSTM and Transformer models on natural language inference, machine translation, text style transfer, and subject verb agreement demonstrate architectural flexibility and effectiveness of the proposed approach. 7 authors · Feb 17, 2021
- Logic2Text: High-Fidelity Natural Language Generation from Logical Forms Previous works on Natural Language Generation (NLG) from structured data have primarily focused on surface-level descriptions of record sequences. However, for complex structured data, e.g., multi-row tables, it is often desirable for an NLG system to describe interesting facts from logical inferences across records. If only provided with the table, it is hard for existing models to produce controllable and high-fidelity logical generations. In this work, we formulate logical level NLG as generation from logical forms in order to obtain controllable, high-fidelity, and faithful generations. We present a new large-scale dataset, Logic2Text, with 10,753 descriptions involving common logic types paired with the underlying logical forms. The logical forms show diversified graph structure of free schema, which poses great challenges on the model's ability to understand the semantics. We experiment on (1) Fully-supervised training with the full datasets, and (2) Few-shot setting, provided with hundreds of paired examples; We compare several popular generation models and analyze their performances. We hope our dataset can encourage research towards building an advanced NLG system capable of natural, faithful, and human-like generation. The dataset and code are available at https://github.com/czyssrs/Logic2Text. 7 authors · Apr 30, 2020
- Global Context Networks The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies within an image, via aggregating query-specific global context to each query position. However, through a rigorous empirical analysis, we have found that the global contexts modeled by the non-local network are almost the same for different query positions. In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation. We further replace the one-layer transformation function of the non-local block by a two-layer bottleneck, which further reduces the parameter number considerably. The resulting network element, called the global context (GC) block, effectively models global context in a lightweight manner, allowing it to be applied at multiple layers of a backbone network to form a global context network (GCNet). Experiments show that GCNet generally outperforms NLNet on major benchmarks for various recognition tasks. The code and network configurations are available at https://github.com/xvjiarui/GCNet. 5 authors · Dec 24, 2020 1
- Some Questions of Uniformity in Algorithmic Randomness The Omega numbers-the halting probabilities of universal prefix-free machines-are known to be exactly the Martin-L{\"o}f random left-c.e. reals. We show that one cannot uniformly produce, from a Martin-L{\"o}f random left-c.e. real alpha, a universal prefix-free machine U whose halting probability is alpha. We also answer a question of Barmpalias and Lewis-Pye by showing that given a left-c.e. real alpha, one cannot uniformly produce a left-c.e. real beta such that alpha -- beta is neither left-c.e. nor right-c.e. 3 authors · Nov 2, 2021
- A Convenient Category for Higher-Order Probability Theory Higher-order probabilistic programming languages allow programmers to write sophisticated models in machine learning and statistics in a succinct and structured way, but step outside the standard measure-theoretic formalization of probability theory. Programs may use both higher-order functions and continuous distributions, or even define a probability distribution on functions. But standard probability theory does not handle higher-order functions well: the category of measurable spaces is not cartesian closed. Here we introduce quasi-Borel spaces. We show that these spaces: form a new formalization of probability theory replacing measurable spaces; form a cartesian closed category and so support higher-order functions; form a well-pointed category and so support good proof principles for equational reasoning; and support continuous probability distributions. We demonstrate the use of quasi-Borel spaces for higher-order functions and probability by: showing that a well-known construction of probability theory involving random functions gains a cleaner expression; and generalizing de Finetti's theorem, that is a crucial theorem in probability theory, to quasi-Borel spaces. 4 authors · Jan 10, 2017
- Distinguishability and linear independence for H-chromatic symmetric functions We study the H-chromatic symmetric functions X_G^H (introduced in (arXiv:2011.06063) as a generalization of the chromatic symmetric function (CSF) X_G), which track homomorphisms from the graph G to the graph H. We focus first on the case of self-chromatic symmetric functions (self-CSFs) X_G^G, making some progress toward a conjecture from (arXiv:2011.06063) that the self-CSF, like the normal CSF, is always different for different trees. In particular, we show that the self-CSF distinguishes trees from non-trees with just one exception, we check using Sage that it distinguishes all trees on up to 12 vertices, and we show that it determines the number of legs of a spider and the degree sequence of a caterpillar given its spine length. We also show that the self-CSF detects the number of connected components of a forest, again with just one exception. Then we prove some results about the power sum expansions for H-CSFs when H is a complete bipartite graph, in particular proving that the conjecture from (arXiv:2011.06063) about p-monotonicity of ω(X_G^H) for H a star holds as long as H is sufficiently large compared to G. We also show that the self-CSFs of complete multipartite graphs form a basis for the ring Λ of symmetric functions, and we give some construction of bases for the vector space Λ^n of degree n symmetric functions using H-CSFs X_G^H where H is a fixed graph that is not a complete graph, answering a question from (arXiv:2011.06063) about whether such bases exist. However, we show that there generally do not exist such bases with G fixed, even with loops, answering another question from (arXiv:2011.06063). We also define the H-chromatic polynomial as an analogue of the chromatic polynomial, and ask when it is the same for different graphs. 2 authors · Nov 11
- Natural Vocabulary Emerges from Free-Form Annotations We propose an approach for annotating object classes using free-form text written by undirected and untrained annotators. Free-form labeling is natural for annotators, they intuitively provide very specific and exhaustive labels, and no training stage is necessary. We first collect 729 labels on 15k images using 124 different annotators. Then we automatically enrich the structure of these free-form annotations by discovering a natural vocabulary of 4020 classes within them. This vocabulary represents the natural distribution of objects well and is learned directly from data, instead of being an educated guess done before collecting any labels. Hence, the natural vocabulary emerges from a large mass of free-form annotations. To do so, we (i) map the raw input strings to entities in an ontology of physical objects (which gives them an unambiguous meaning); and (ii) leverage inter-annotator co-occurrences, as well as biases and knowledge specific to individual annotators. Finally, we also automatically extract natural vocabularies of reduced size that have high object coverage while remaining specific. These reduced vocabularies represent the natural distribution of objects much better than commonly used predefined vocabularies. Moreover, they feature more uniform sample distribution over classes. 3 authors · Jun 4, 2019
7 Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models Large Language Models (LLMs) have greatly propelled the progress in natural language(NL)-centric tasks based on NL interface. However, the NL form is not enough for world knowledge. Current works focus on this question by injecting specific symbolic knowledge into LLM, which ignore two critical challenges: the interrelations between various symbols and the balance between symbolic-centric and NL-centric capabilities. In this work, we tackle these challenges from both a data and framework perspective and introduce Symbol-LLM series models. First, we collect 34 symbolic tasks, covering ~20 different forms, which are unified to capture symbol interrelations. Then, a two-stage tuning framework succeeds in injecting symbolic knowledge without loss of the generality ability. Extensive experiments on both symbol- and NL-centric tasks demonstrate the balanced and superior performances of Symbol-LLM series models. 9 authors · Nov 15, 2023
1 QTRAJ 1.0: A Lindblad equation solver for heavy-quarkonium dynamics We introduce an open-source package called QTraj that solves the Lindblad equation for heavy-quarkonium dynamics using the quantum trajectories algorithm. The package allows users to simulate the suppression of heavy-quarkonium states using externally-supplied input from 3+1D hydrodynamics simulations. The code uses a split-step pseudo-spectral method for updating the wave-function between jumps, which is implemented using the open-source multi-threaded FFTW3 package. This allows one to have manifestly unitary evolution when using real-valued potentials. In this paper, we provide detailed documentation of QTraj 1.0, installation instructions, and present various tests and benchmarks of the code. 7 authors · Jul 13, 2021
- DepNeCTI: Dependency-based Nested Compound Type Identification for Sanskrit Multi-component compounding is a prevalent phenomenon in Sanskrit, and understanding the implicit structure of a compound's components is crucial for deciphering its meaning. Earlier approaches in Sanskrit have focused on binary compounds and neglected the multi-component compound setting. This work introduces the novel task of nested compound type identification (NeCTI), which aims to identify nested spans of a multi-component compound and decode the implicit semantic relations between them. To the best of our knowledge, this is the first attempt in the field of lexical semantics to propose this task. We present 2 newly annotated datasets including an out-of-domain dataset for this task. We also benchmark these datasets by exploring the efficacy of the standard problem formulations such as nested named entity recognition, constituency parsing and seq2seq, etc. We present a novel framework named DepNeCTI: Dependency-based Nested Compound Type Identifier that surpasses the performance of the best baseline with an average absolute improvement of 13.1 points F1-score in terms of Labeled Span Score (LSS) and a 5-fold enhancement in inference efficiency. In line with the previous findings in the binary Sanskrit compound identification task, context provides benefits for the NeCTI task. The codebase and datasets are publicly available at: https://github.com/yaswanth-iitkgp/DepNeCTI 7 authors · Oct 14, 2023
- Taxonomical hierarchy of canonicalized relations from multiple Knowledge Bases This work addresses two important questions pertinent to Relation Extraction (RE). First, what are all possible relations that could exist between any two given entity types? Second, how do we define an unambiguous taxonomical (is-a) hierarchy among the identified relations? To address the first question, we use three resources Wikipedia Infobox, Wikidata, and DBpedia. This study focuses on relations between person, organization and location entity types. We exploit Wikidata and DBpedia in a data-driven manner, and Wikipedia Infobox templates manually to generate lists of relations. Further, to address the second question, we canonicalize, filter, and combine the identified relations from the three resources to construct a taxonomical hierarchy. This hierarchy contains 623 canonical relations with highest contribution from Wikipedia Infobox followed by DBpedia and Wikidata. The generated relation list subsumes an average of 85% of relations from RE datasets when entity types are restricted. 3 authors · Sep 13, 2019
- The Norwegian Parliamentary Speech Corpus The Norwegian Parliamentary Speech Corpus (NPSC) is a speech dataset with recordings of meetings from Stortinget, the Norwegian parliament. It is the first, publicly available dataset containing unscripted, Norwegian speech designed for training of automatic speech recognition (ASR) systems. The recordings are manually transcribed and annotated with language codes and speakers, and there are detailed metadata about the speakers. The transcriptions exist in both normalized and non-normalized form, and non-standardized words are explicitly marked and annotated with standardized equivalents. To test the usefulness of this dataset, we have compared an ASR system trained on the NPSC with a baseline system trained on only manuscript-read speech. These systems were tested on an independent dataset containing spontaneous, dialectal speech. The NPSC-trained system performed significantly better, with a 22.9% relative improvement in word error rate (WER). Moreover, training on the NPSC is shown to have a "democratizing" effect in terms of dialects, as improvements are generally larger for dialects with higher WER from the baseline system. 2 authors · Jan 26, 2022
- Lie Group Decompositions for Equivariant Neural Networks Invariance and equivariance to geometrical transformations have proven to be very useful inductive biases when training (convolutional) neural network models, especially in the low-data regime. Much work has focused on the case where the symmetry group employed is compact or abelian, or both. Recent work has explored enlarging the class of transformations used to the case of Lie groups, principally through the use of their Lie algebra, as well as the group exponential and logarithm maps. The applicability of such methods to larger transformation groups is limited by the fact that depending on the group of interest G, the exponential map may not be surjective. Further limitations are encountered when G is neither compact nor abelian. Using the structure and geometry of Lie groups and their homogeneous spaces, we present a framework by which it is possible to work with such groups primarily focusing on the Lie groups G = GL^{+}(n, R) and G = SL(n, R), as well as their representation as affine transformations R^{n} rtimes G. Invariant integration as well as a global parametrization is realized by decomposing the `larger` groups into subgroups and submanifolds which can be handled individually. Under this framework, we show how convolution kernels can be parametrized to build models equivariant with respect to affine transformations. We evaluate the robustness and out-of-distribution generalisation capability of our model on the standard affine-invariant benchmark classification task, where we outperform all previous equivariant models as well as all Capsule Network proposals. 2 authors · Oct 17, 2023
- Abstract independence relations in neostability theory We develop a framework, in the style of Adler, for interpreting the notion of "witnessing" that has appeared (usually as a variant of Kim's Lemma) in different areas of neostability theory as a binary relation between abstract independence relations. This involves extending the relativisations of Kim-independence and Conant-independence due to Mutchnik to arbitrary independence relations. After developing this framework, we show that several results from simplicity, NTP_2, NSOP_1, and beyond follow as instances of general theorems for abstract independence relations. In particular, we prove the equivalence between witnessing and symmetry and the implications from this notion to chain local character and the weak independence theorem, and recover some partial converses. Finally, we use this framework to prove a dichotomy between NSOP_1 and Kruckman and Ramsey's BTP that applies to most known NSOP_4 examples in the literature. 1 authors · Nov 10
- Naturalizing a Programming Language via Interactive Learning Our goal is to create a convenient natural language interface for performing well-specified but complex actions such as analyzing data, manipulating text, and querying databases. However, existing natural language interfaces for such tasks are quite primitive compared to the power one wields with a programming language. To bridge this gap, we start with a core programming language and allow users to "naturalize" the core language incrementally by defining alternative, more natural syntax and increasingly complex concepts in terms of compositions of simpler ones. In a voxel world, we show that a community of users can simultaneously teach a common system a diverse language and use it to build hundreds of complex voxel structures. Over the course of three days, these users went from using only the core language to using the naturalized language in 85.9\% of the last 10K utterances. 4 authors · Apr 23, 2017
- Rethinking the Power of Graph Canonization in Graph Representation Learning with Stability The expressivity of Graph Neural Networks (GNNs) has been studied broadly in recent years to reveal the design principles for more powerful GNNs. Graph canonization is known as a typical approach to distinguish non-isomorphic graphs, yet rarely adopted when developing expressive GNNs. This paper proposes to maximize the expressivity of GNNs by graph canonization, then the power of such GNNs is studies from the perspective of model stability. A stable GNN will map similar graphs to close graph representations in the vectorial space, and the stability of GNNs is critical to generalize their performance to unseen graphs. We theoretically reveal the trade-off of expressivity and stability in graph-canonization-enhanced GNNs. Then we introduce a notion of universal graph canonization as the general solution to address the trade-off and characterize a widely applicable sufficient condition to solve the universal graph canonization. A comprehensive set of experiments demonstrates the effectiveness of the proposed method. In many popular graph benchmark datasets, graph canonization successfully enhances GNNs and provides highly competitive performance, indicating the capability and great potential of proposed method in general graph representation learning. In graph datasets where the sufficient condition holds, GNNs enhanced by universal graph canonization consistently outperform GNN baselines and successfully improve the SOTA performance up to 31%, providing the optimal solution to numerous challenging real-world graph analytical tasks like gene network representation learning in bioinformatics. 8 authors · Sep 1, 2023
3 Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases. 52 authors · Mar 22, 2021