Hi all, we have a self-hosted ClearML server. My colleague uploaded a dataset from his machine, and when I try to clone and enqueue the dataset into a different project from my machine, the task fails with "The system cannot find the file specified"

Answered

Hi All,
We have a self-hosted ClearML server. My colleague uploaded a dataset from his machine, and when I try to clone and enqueue the dataset into a different project from my machine, the task fails with "The system cannot find the file specified". It is also not visible in the Datasets tab. Is there any way to recreate the dataset, or to view the original dataset from the cloned one?

  				
Posted 2 years ago by ThoughtfulElephant4


Answers 26

  				
Posted 2 years ago by ThoughtfulElephant4

How did you create the dataset originally, can you share a snippet that reproduces this?

  				
Posted 2 years ago by CostlyOstrich36

This is where I cloned from.

  				
Posted 2 years ago by ThoughtfulElephant4

Execution log

from clearml import Dataset

ds = Dataset.create(dataset_project='Asteroid_Solution/.datasets/raw_asteroid_dataset', dataset_name='raw_asteroid_dataset', dataset_version='None')
ds.add_files(
    path='/tmp/nasa.csv', 
    wildcard=None, 
    local_base_folder=None, 
    dataset_path=None, 
    recursive=True
)
ds.upload(
    show_progress=True, 
    verbose=False, 
    output_url=None, 
    compression=None
)
ds.finalize()

  				
Posted 2 years ago by ThoughtfulElephant4

Can you attach the full task log?

  				
Posted 2 years ago by SuccessfulKoala55

Hi @<1528546301493383168:profile|ThoughtfulElephant4> , where did you upload the dataset? Can you add the full log? If your colleague clones and enqueues - the code assumes that the files are local, no?

  				
Posted 2 years ago by CostlyOstrich36

I mean the one that failed...

  				
Posted 2 years ago by SuccessfulKoala55

I see this is not running in Docker. Can you go to the venv directory C:/Users/guruprasad.j/.clearml/venvs-builds, open the last venv used, and see what files you have there?

  				
Posted 2 years ago by SuccessfulKoala55

@<1523701070390366208:profile|CostlyOstrich36> I want to create a new project and use an existing dataset that others created on the ClearML server.

  				
Posted 2 years ago by ThoughtfulElephant4

It either cannot create the local code file from the uncommitted changes, or it can't find python...

  				
Posted 2 years ago by SuccessfulKoala55

I have to clone a dataset that others have uploaded into a new project. What is the best way to do it?

  				
Posted 2 years ago by ThoughtfulElephant4
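
On the question of reusing a dataset that someone else uploaded: below is a minimal sketch, not a confirmed fix for this thread. It assumes the dataset files were uploaded to storage that your machine can reach (for example the server's files server rather than a path on your colleague's disk). Instead of cloning and enqueuing the dataset task, it looks the dataset up by project and name with `Dataset.get`, and optionally registers a child version under another project through `parent_datasets`. The helper names `fetch_shared_dataset` and `create_child_version` are made up for illustration, and the project and dataset names you pass in are whatever the Datasets tab shows on your server.

```python
def fetch_shared_dataset(dataset_project, dataset_name):
    """Fetch an existing dataset and return the path of a local read-only copy."""
    # Lazy import: needs the clearml package and a configured clearml.conf
    from clearml import Dataset

    ds = Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
    return ds.get_local_copy()


def create_child_version(parent_project, parent_name, new_project, new_name):
    """Register a dataset in another project that inherits the parent's files."""
    from clearml import Dataset

    parent = Dataset.get(dataset_project=parent_project, dataset_name=parent_name)
    child = Dataset.create(
        dataset_name=new_name,
        dataset_project=new_project,
        parent_datasets=[parent.id],
    )
    # Nothing is added here, so the child only references the parent's content
    child.finalize(auto_upload=True)
    return child.id
```

If `get_local_copy()` fails with a "file not found" style error, that would suggest the stored file URLs point at a location only the uploader's machine can see.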

but what does your clearml.conf define as the files host address?

  				
Posted 2 years ago by SuccessfulKoala55

Hi @<1523701070390366208:profile|CostlyOstrich36>, the ClearML server is on AWS. It created a dataset artifact when my colleague uploaded it, but when I try to clone and enqueue it, the task fails.

  				
Posted 2 years ago by ThoughtfulElephant4

@<1528546301493383168:profile|ThoughtfulElephant4> , why would you clone a dataset?

  				
Posted 2 years ago by CostlyOstrich36

Can you show the task's execution section in the UI?

  				
Posted 2 years ago by SuccessfulKoala55

@<1523701087100473344:profile|SuccessfulKoala55> Yes, there is no Docker involved, and I have nothing in the venvs-builds folder.

  				
Posted 2 years ago by ThoughtfulElephant4

Are you running the task from a git repo? (also, can you show the top of the execution section?)

  				
Posted 2 years ago by SuccessfulKoala55

This is the artifact URL.

  				
Posted 2 years ago by ThoughtfulElephant4

This is what I cloned.

  				
Posted 2 years ago by ThoughtfulElephant4

@<1528546301493383168:profile|ThoughtfulElephant4> how is the ClearML Files server configured on your machine? is it None ?

  				
Posted 2 years ago by SuccessfulKoala55

  				
Posted 2 years ago by ThoughtfulElephant4

What is the artifact URL from the task?

  				
Posted 2 years ago by CostlyOstrich36
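
Several replies in this thread ask for the artifact URL and the files server setting. One plausible reason, stated here as an assumption and not as the confirmed diagnosis, is that a URL pointing at the uploader's local disk cannot be opened from another machine. The sketch below classifies a URL by its scheme. The helper name `is_remote_url` and the set of schemes treated as remote are illustrative choices, and the example URLs in the comments are hypothetical.

```python
from urllib.parse import urlparse

# Schemes treated as "reachable from other machines" in this sketch (an assumption)
REMOTE_SCHEMES = {"http", "https", "s3", "gs", "azure"}


def is_remote_url(url: str) -> bool:
    """Return True when the URL uses a scheme listed in REMOTE_SCHEMES."""
    # A Windows path such as "C:/data/file.zip" parses with scheme "c",
    # and a plain POSIX path has an empty scheme, so both count as local.
    scheme = urlparse(url).scheme.lower()
    return scheme in REMOTE_SCHEMES


# Hypothetical usage:
#   is_remote_url("https://files.example.com/project/dataset.zip")  -> remote
#   is_remote_url("file:///tmp/nasa.csv")                           -> local
```

If the dataset's artifact URL comes back as local, re-uploading it with an `output_url` that points at shared storage is the usual direction to investigate.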

Hi @<1523701070390366208:profile|CostlyOstrich36> here is the snippet

from clearml import Task, Dataset
import global_config
from data import database

task = Task.init(
    project_name=global_config.PROJECT_NAME,
    task_name='get data',
    task_type='data_processing',
    reuse_last_task_id=False
)

config = {
    'query_date': '2022-01-01'
}
task.connect(config)

# Get the data and a path to the file
query = 'SELECT * FROM asteroids WHERE strftime("%Y-%m-%d", `date`) <= strftime("%Y-%m-%d", "{}")'.format(config['query_date'])
df, data_path = database.query_database_to_df(query=query)
print(f"Dataset downloaded to: {data_path}")
print(df.head())

# Create a ClearML dataset
dataset = Dataset.create(
    dataset_name='raw_asteroid_dataset',
    dataset_project=global_config.PROJECT_NAME
)

# Add the local files we downloaded earlier
dataset.add_files(data_path)
dataset.get_logger().report_table(title='Asteroid Data', series='head', table_plot=df.head())

# Finalize and upload the data and labels of the dataset
dataset.finalize(auto_upload=True)
print(f"Created dataset with ID: {dataset.id}")
print(f"Data size: {len(df)}")

  				
Posted 2 years ago by ThoughtfulElephant4

Task console log

  				
Posted 2 years ago by ThoughtfulElephant4

@<1523701087100473344:profile|SuccessfulKoala55> this is the execution section of the task.

  				
Posted 2 years ago by ThoughtfulElephant4

Yes @<1523701087100473344:profile|SuccessfulKoala55>, it is the same configuration you mentioned before.

  				
Posted 2 years ago by ThoughtfulElephant4
