SentenceTransformer based on Salesforce/codet5-small
This is a sentence-transformers model finetuned from Salesforce/codet5-small. It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Salesforce/codet5-small
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 512 dimensions
- Similarity Function: Cosine Similarity
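The similarity function listed above can be illustrated in a few lines. This is a minimal pure-Python sketch of cosine similarity, not the library's implementation; the helper name `cosine_similarity` and the toy 3-dimensional vectors are assumptions made for illustration (the model's real embeddings have 512 dimensions).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors stand in for the model's 512-dimensional embeddings.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.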
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: T5EncoderModel
  (1): Pooling({'word_embedding_dimension': 512, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
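The pooling module above has only `pooling_mode_mean_tokens` enabled, so each input is represented by the average of its token vectors. The sketch below shows that idea in plain Python, assuming padding positions are excluded through an attention mask. The function name `mean_pool` and the toy values are hypothetical, and the library's own tensor code may treat edge cases differently.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in summed]


# Two real tokens and one padding token (masked out), each of dimension 2.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```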
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("buelfhood/SOCO-Java-CodeT5Small-ST")
# Run inference
sentences = [
'\n\n\n\n\n\nimport java.io.*;\nimport java.net.*;\n\n\n\npublic class Dictionary\n{\n public static void main (String args[]) throws IOException,\n MalformedURLException\n {\n final String username = "";\n final String fullurl = "http://sec-crack.cs.rmit.edu./SEC/2/";\n final String dictfile = "/usr/share/lib/dict/words";\n String temppass;\n String password = "";\n URL url = new URL(fullurl);\n boolean cracked = false;\n\n startTime = System.currentTimeMillis();\n\n \n BufferedReader r = new BufferedReader(new FileReader(dictfile));\n\n while((temppass = r.readLine()) != null && !cracked)\n { \n \n if(temppass.length() <= 3)\n {\n \n if(isAlpha(temppass))\n {\n \n Authenticator.setDefault(new MyAuthenticator(username,temppass));\n try{\n BufferedReader x = new BufferedReader(new InputStreamReader(\n url.openStream()));\n cracked = true;\n password = temppass;\n } catch(Exception e){}\n }\n }\n }\n\n stopTime = System.currentTimeMillis();\n \n if(!cracked)\n System.out.println("Sorry, couldnt find the password");\n else\n System.out.println("Password found: "+password);\n System.out.println("Time taken: "+(stopTime-startTime));\n }\n\n public static boolean isAlpha(String s)\n {\n boolean v = true;\n for(int i=0; i<s.length(); i++)\n {\n if(!Character.isLetter(s.charAt(i)))\n v = false;\n }\n return ;\n }\n}\n\n',
'\n\nimport java.net.*;\nimport java.text.*; \nimport java.util.*; \nimport java.io.*;\n\npublic class WatchDog {\n\n public WatchDog() {\n\n StringBuffer stringBuffer1 = new StringBuffer();\n StringBuffer stringBuffer2 = new StringBuffer();\n int i,j = 0;\n\n try{\n\n URL yahoo = new URL("http://www.cs.rmit.edu./students/"); \n BufferedReader in = new BufferedReader(new InputStreamReader(yahoo.openStream()));\n\n String inputLine = "";\n String inputLine1 = "";\n String changedtext= "";\n String changedflag= "";\n\n\n Thread.sleep(180);\n\n BufferedReader in1 = new BufferedReader(new InputStreamReader(yahoo.openStream()));\n\n\n while ((inputLine = in.readLine()) != null) {\n inputLine1 = in1.readLine();\n if (inputLine.equals(inputLine1)) {\n System.out.println("equal");\n }\n else {\n System.out.println("Detected a Change");\n System.out.println("Line Before the change:" + inputLine);\n System.out.println("Line After the change:" + inputLine1);\n changedtext = changedtext + inputLine + inputLine1;\n changedflag = "Y";\n }\n \n }\n\n if (in1.readLine() != null ) {\n System.out.println("Detected a Change");\n System.out.println("New Lines Added ");\n changedtext = changedtext + "New Lines added";\n changedflag = "Y";\n }\n\n in.print();\n in1.print();\n\n if (changedflag.equals("Y")) {\n String smtphost ="smtp.mail.rmit.edu." ; \n String from = "@rmit.edu."; \n String = "janaka1@optusnet.." ; \n }\n\n\n }\n catch(Exception e){ System.out.println("exception:" + e);}\n\t \n}\n\t\t\n public static void main (String[] args) throws Exception {\n\t\tWatchDog u = new WatchDog();\n }\n}\n',
'\n\n\n\nimport java.util.*;\nimport java.net.*;\nimport java.io.*;\nimport javax.swing.*;\n\npublic class PasswordCombination\n{\n private int pwdCounter = 0;\n private int startTime;\n private String str1,str2,str3;\n private String url = "http://sec-crack.cs.rmit.edu./SEC/2/";\n private String loginPwd;\n private String[] password;\n private HoldSharedData data;\n private char[] chars = {\'A\',\'B\',\'C\',\'D\',\'E\',\'F\',\'G\',\'H\',\'I\',\'J\',\'K\',\'L\',\'M\',\n \'N\',\'O\',\'P\',\'Q\',\'R\',\'S\',\'T\',\'U\',\'V\',\'W\',\'X\',\'Y\',\'Z\',\n \'a\',\'b\',\'c\',\'d\',\'e\',\'f\',\'g\',\'h\',\'i\',\'j\',\'k\',\'l\',\'m\',\n \'n\',\'o\',\'p\',\'q\',\'r\',\'s\',\'t\',\'u\',\'v\',\'w\',\'x\',\'y\',\'z\'};\n\n public PasswordCombination()\n {\n System.out.println("Programmed by for INTE1070 Assignment 2");\n\n String input = JOptionPane.showInputDialog( "Enter number of threads" );\n if( input == null )\n System.exit(0);\n\n int numOfConnections = Integer.parseInt( input );\n startTime = System.currentTimeMillis();\n int pwdCounter = 52*52*52 + 52*52 + 52;\n password = new String[pwdCounter];\n\n\n loadPasswords();\n System.out.println( "Total Number of Passwords: " + pwdCounter );\n createConnectionThread( numOfConnections );\n }\n\n private void doPwdCombination()\n {\n for( int i = 0; i < 52; i ++ )\n {\n str1 = "" + chars[i];\n password[pwdCounter++] = "" + chars[i];\n System.err.print( str1 + " | " );\n\n for( int j = 0; j < 52; j ++ )\n {\n str2 = str1 + chars[j];\n password[pwdCounter++] = str1 + chars[j];\n\n for( int k = 0; k < 52; k ++ )\n {\n str3 = str2 + chars[k];\n password[pwdCounter++] = str2 + chars[k];\n }\n }\n }\n }\n\n private void loadPasswords( )\n {\n FileReader fRead;\n BufferedReader buf;\n String line = null;\n String fileName = "words";\n\n try\n {\n fRead = new FileReader( fileName );\n buf = new BufferedReader(fRead);\n\n while((line = buf.readLine( )) != null)\n {\n password[pwdCounter++] = line;\n }\n }\n catch(FileNotFoundException 
e)\n {\n System.err.println("File not found: " + fileName);\n }\n catch(IOException ioe)\n {\n System.err.println("IO Error " + ioe);\n }\n }\n\n private void createConnectionThread( int input )\n {\n data = new HoldSharedData( startTime, password, pwdCounter );\n\n int numOfThreads = input;\n int batch = pwdCounter/numOfThreads + 1;\n numOfThreads = pwdCounter/batch + 1;\n System.out.println("Number of Connection Threads Used=" + numOfThreads);\n ConnectionThread[] connThread = new ConnectionThread[numOfThreads];\n\n for( int index = 0; index < numOfThreads; index ++ )\n {\n connThread[index] = new ConnectionThread( url, index, batch, data );\n connThread[index].conn();\n }\n }\n} ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 512]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
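One way to use the similarity matrix is to list the most similar snippet pairs first. The helper below is a sketch under assumptions: `rank_pairs`, the threshold value, and the toy scores are hypothetical and not part of the library. With the model loaded, `similarities.tolist()` could be passed in place of the toy matrix.

```python
def rank_pairs(similarity_matrix, threshold=-1.0):
    """Return (score, i, j) for every pair i < j scoring at least `threshold`, best first."""
    pairs = []
    n = len(similarity_matrix)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: skip self-pairs and duplicates
            score = float(similarity_matrix[i][j])
            if score >= threshold:
                pairs.append((score, i, j))
    pairs.sort(key=lambda item: item[0], reverse=True)
    return pairs


# Hypothetical scores for three snippets; real values come from model.similarity().
toy_scores = [
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.4],
    [0.9, 0.4, 1.0],
]
for score, i, j in rank_pairs(toy_scores, threshold=0.3):
    print(f"snippets {i} and {j}: {score:.2f}")
```

A suitable threshold depends on the data and would need to be chosen on a labelled validation set.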
Training Details
Training Dataset
Unnamed Dataset
- Size: 33,411 training samples
- Columns: sentence_0, sentence_1, and label
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 52 tokens, mean: 444.58 tokens, max: 512 tokens | min: 52 tokens, mean: 470.35 tokens, max: 512 tokens | 0: ~99.80%, 1: ~0.20% |
- Samples:
sentence_0 sentence_1 label
import java.util.*;
import java.io.*;
public class MyTimer
{
public static void main(String args[])
{
Watchdog watch = new Watchdog();
Timer time = new Timer();
time.schedule(watch,864000000,864000000);
}
}
import java.io.*;
import java.;
import java.net.*;
import java.util.*;
public class Dictionary {
public static void main (String[] args) throws IOException {
BufferedReader stdin = new BufferedReader (new InputStreamReader(System.in));
d = new Date().getTime();
FileReader fr = new FileReader("/usr/share/lib/dict/words");
BufferedReader bufr = new BufferedReader(fr);
String word = bufr.readLine();
int total = 960;
String[] pws = new String[total];
int count = 0;
while (word!=null){
if (word.length()<=3) { pws[count] = word; count++;}
word = bufr.readLine();
}
int i=0;
int response = 0;
for (i=0;i String uname = "";
String userinfo = uname + ":" + pws[i];
try{
String encoding = new bf.misc.BASE64Encoder().encode (userinfo.getBytes());
URL url = new URL("http://sec-crack.cs.rmit.edu./SEC/2/");
HttpURLConn...
label: 0
import java.io.*;
import java.util.*;
class BruteForce{
public static void main(String args[]){
String pass,s;
char a,b,c;
int z=0;
int attempt=0;
Process p;
char password[]={'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q',
'R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h',
'i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};
z = System.currentTimeMillis();
int at=0;
for(int i=0;i for(int j=0;j for(int k=0;k pass=String.valueOf(password[i])+String.valueOf(password[j])+String.valueOf(password[k]);
try {
System.out.println("Trying crack using: "+pass);
at++;
p = Runtime.getRuntime().exec("wget --http-user= --http-passwd="+pass+" http://sec-crack.cs.rmit.edu./SEC/2/index.php");
try{
p.waitFor();
}
catch(Exception q){}
z = p.exitValue();
...
import java.io.*;
import java.util.Vector;
import java.util.Date;
interface UnaryPredicate {
boolean execute(Object obj);
}
public class DiffPrint {
static String outFile="";
public static abstract class Base {
protected Base(Object[] a,Object[] b) {
try
{
outfile = new PrintWriter(new FileWriter(outFile));
}
catch (Exception e)
{
e.printStackTrace();
}
file0 = a;
file1 = b;
}
protected UnaryPredicate ignore = null;
protected Object[] file0, file1;
public void print_script(Diff.change script) {
Diff.change next = script;
while (next != null)
{
Diff.change t, end;
t = next;
end = hunkfun(next);
next = end;
end = null;
print_hunk(t);
end = next;
}
outfile.flush();
}
protected Diff.change hunkfun(Diff.change hunk) {
...
label: 0
package java.httputils;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;
public class WatchDog
{
protected final int MILLIS_IN_HOUR = (60 * 60 * 1000);
protected int interval = 24;
protected String URL = "http://www.cs.rmit.edu./students/";
protected String fileName = "WatchDogContent.html";
protected String command = "./alert_mail.sh";
protected String savedContent;
protected String retrievedContent;
public WatchDog()
{
super();
}
public void run() throws Exception
{
HttpRequestClient client = null;
System.out.println(getClass().getName() +
"Retrieving baseline copy of: " + getURL());
client = new HttpRequestClie...
import java.;
import java.io.*;
import java.util.*;
public class Dictionary
{
public String[] passwds;
public int passwdNum;
public static void main(String[] args) throws IOException
{
Dictionary dic=new Dictionary();
dic.doDictionary();
System.exit(1);
}
void doDictionary() throws IOException
{
Runtime rt=Runtime.getRuntime();
passwds=new String[32768];
passwdNum=0;
time1=new Date().getTime();
try
{
File f = new File ("words");
FileReader fin = new FileReader (f);
BufferedReader buf = new BufferedReader(fin);
passwds[0]="00";
System.out.println(" loading words....");
{
passwds[passwdNum]=buf.readLine();
passwdNum++;
}while(passwds[passwdNum-1]!=null);
System.out.println("Finish loading words.");
} catch (FileNotFoundException exc) {
System.out.println ("File Not Found");
} catch (IOException exc) {
System.out.println ("IOException 1");
} catch (NullPointerException exc) {
System.out.println ("NullPointerEx...
label: 0
- Loss: BatchAllTripletLoss
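To make the loss name concrete, here is a plain-Python sketch of the batch-all triplet idea from Hermans et al. (cited below): every valid (anchor, positive, negative) triplet in a batch contributes a hinge term. It assumes Euclidean distance and averages over the triplets with positive loss, which is one common variant. The margin value is an assumption, since this card does not state the one used in training, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_all_triplet_loss(embeddings, labels, margin=5.0):
    """Mean hinge loss over all valid triplets whose loss is positive."""
    losses = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's label but is a different item.
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = euclidean(embeddings[a], embeddings[p])
            for neg in range(n):
                # A negative has a different label from the anchor.
                if labels[neg] == labels[a]:
                    continue
                d_an = euclidean(embeddings[a], embeddings[neg])
                loss = d_ap - d_an + margin
                if loss > 0.0:
                    losses.append(loss)
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two items with label 0 and one with label 1.
toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
toy_labels = [0, 0, 1]
print(batch_all_triplet_loss(toy_embeddings, toy_labels, margin=5.0))  # 2.5
print(batch_all_triplet_loss(toy_embeddings, toy_labels, margin=1.0))  # 0.0
```

In the toy batch only anchors 0 and 1 have a positive, giving two triplets. With a margin of 5 their hinge terms are 2 and 3. With a margin of 1 both are already satisfied, so the loss is zero.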
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
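The non-default values above could be passed to the library's training arguments roughly as follows. This is a sketch rather than the script used to train this model: the dictionary name, the helper `build_training_args`, and the `output_dir` argument are hypothetical, and every other setting is left at its library default.

```python
# Values copied from the non-default hyperparameters listed above.
NON_DEFAULT_HYPERPARAMETERS = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 1,
    "multi_dataset_batch_sampler": "round_robin",
}


def build_training_args(output_dir):
    """Create training arguments; `output_dir` is a path chosen by the caller."""
    # Imported lazily so the dictionary can be inspected without the library installed.
    from sentence_transformers import SentenceTransformerTrainingArguments

    return SentenceTransformerTrainingArguments(
        output_dir=output_dir,
        **NON_DEFAULT_HYPERPARAMETERS,
    )
```

Calling `build_training_args("some/output/dir")` requires sentence-transformers and its training dependencies to be installed.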
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.2393 | 500 | 0.2122 |
| 0.4787 | 1000 | 0.1686 |
| 0.7180 | 1500 | 0.2193 |
| 0.9574 | 2000 | 0.2084 |
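The Epoch column in the table above appears to follow from the dataset size and batch size reported earlier. The sketch below reproduces it under the assumption of a single training device, no gradient accumulation, and a final partial batch that is kept; the function name `fractional_epoch` is hypothetical.

```python
import math


def fractional_epoch(step, num_samples, batch_size):
    """Fraction of an epoch completed after `step` optimizer steps."""
    steps_per_epoch = math.ceil(num_samples / batch_size)  # last partial batch counts
    return step / steps_per_epoch


# 33,411 training samples with a per-device batch size of 16 gives 2,089 steps per epoch.
for step in (500, 1000, 1500, 2000):
    print(step, f"{fractional_epoch(step, 33411, 16):.4f}")
```

Under these assumptions the printed fractions match the logged values: 0.2393, 0.4787, 0.7180, and 0.9574.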
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
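To compare a local environment against the versions listed above, a small standard-library check can help. This is a sketch: the distribution names are the usual PyPI names and are assumed here, and `find_version_mismatches` is a hypothetical helper. A mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed PyPI distribution names.
EXPECTED_VERSIONS = {
    "sentence-transformers": "4.1.0",
    "transformers": "4.52.4",
    "torch": "2.6.0+cu124",
    "accelerate": "1.7.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def find_version_mismatches(expected):
    """Map each differing package to (expected, installed), with None if not installed."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_version_mismatches(EXPECTED_VERSIONS).items():
    print(f"{name}: card lists {wanted}, installed {installed}")
```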
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
BatchAllTripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}