How did you create the dataset originally, can you share a snippet that reproduces this?
@<1528546301493383168:profile|ThoughtfulElephant4> how is the ClearML Files server configured on your machine? is it None ?
@<1528546301493383168:profile|ThoughtfulElephant4> , why would you clone a dataset?
Execution log
from clearml import Dataset
ds = Dataset.create(dataset_project='Asteroid_Solution/.datasets/raw_asteroid_dataset', dataset_name='raw_asteroid_dataset', dataset_version=None)
ds.add_files(
    path='/tmp/nasa.csv', 
    wildcard=None, 
    local_base_folder=None, 
    dataset_path=None, 
    recursive=True
)
ds.upload(
    show_progress=True, 
    verbose=False, 
    output_url=None, 
    compression=None
)
ds.finalize()
Hi @<1528546301493383168:profile|ThoughtfulElephant4> , where did you upload the dataset? Can you add the full log? If your colleague clones and enqueues - the code assumes that the files are local, no?
I see this is not running using docker - can you just go to the venv directory  C:/Users/guruprasad.j/.clearml/venvs-builds , look under the last venv used, and see what files you have there?
@<1523701070390366208:profile|CostlyOstrich36> If I want to create a new project, how do I use an already existing dataset that others have created on the ClearML server?
It either cannot create the local code file from the uncommitted changes, or it can't find python...
Can you show the task's execution section in the UI?
I have to clone a dataset that others have uploaded into a new project... what is the best way to do it?
Hi @<1523701070390366208:profile|CostlyOstrich36> here is the snippet
from clearml import Task, Dataset
import global_config
from data import database

task = Task.init(project_name=global_config.PROJECT_NAME, task_name='get data', task_type='data_processing', reuse_last_task_id=False)
config = {'query_date': '2022-01-01'}
task.connect(config)

# Get the data and a path to the file
query = 'SELECT * FROM asteroids WHERE strftime("%Y-%m-%d", `date`) <= strftime("%Y-%m-%d", "{}")'.format(config['query_date'])
df, data_path = database.query_database_to_df(query=query)
print(f"Dataset downloaded to: {data_path}")
print(df.head())

# Create a ClearML dataset
dataset = Dataset.create(dataset_name='raw_asteroid_dataset', dataset_project=global_config.PROJECT_NAME)

# Add the local files we downloaded earlier
dataset.add_files(data_path)
dataset.get_logger().report_table(title='Asteroid Data', series='head', table_plot=df.head())

# Finalize and upload the data and labels of the dataset
dataset.finalize(auto_upload=True)
print(f"Created dataset with ID: {dataset.id}")
print(f"Data size: {len(df)}")
but what does your clearml.conf define as the files host address?
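For context, the files server is defined in the `api` section of `clearml.conf` - a typical fragment looks like this (hostnames here are placeholders for your own server addresses):

```
api {
    web_server: https://app.clearml.example.com
    api_server: https://api.clearml.example.com
    files_server: https://files.clearml.example.com
}
```

If `files_server` points at the wrong host, dataset uploads land somewhere your colleagues' machines cannot reach.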
@<1523701087100473344:profile|SuccessfulKoala55> Yes, there is no docker involved, and I have nothing in the venvs-builds folder.
Hi @<1523701070390366208:profile|CostlyOstrich36> The ClearML server is on AWS. It created a dataset artifact when my colleague uploaded it, but when I try to clone and enqueue, it fails.
@<1523701087100473344:profile|SuccessfulKoala55> this is the execution section of the task.
Are you running the task from a git repo? (also, can you show the top of the execution section?)
Yes @<1523701087100473344:profile|SuccessfulKoala55> same configuration as you mentioned before.