AgitatedDove14 , you are right. It was an invalid working directory. Everything works now. Thank you
SuccessfulKoala55 Without the commented line, files are uploaded to http://files.community.clear.ml instead of to the S3 bucket
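A minimal sketch of one way to point a task's uploads at S3 rather than the default files server, assuming the `output_uri` argument of `Task.init` is what the commented line controlled (the thread does not show that line). The bucket URI and the helper names `check_s3_uri` and `init_task_with_s3_output` are hypothetical, and S3 credentials are assumed to be configured separately in clearml.conf.

```python
from urllib.parse import urlparse


def check_s3_uri(uri: str) -> str:
    """Return the URI unchanged if it looks like s3://bucket[/prefix], else raise."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/... URI, got {uri!r}")
    return uri


def init_task_with_s3_output(project_name: str, task_name: str, s3_uri: str):
    """Create a ClearML task whose outputs default to the given S3 location."""
    # Imported lazily so check_s3_uri can be used without clearml installed.
    from clearml import Task

    return Task.init(
        project_name=project_name,
        task_name=task_name,
        output_uri=check_s3_uri(s3_uri),  # hypothetical bucket, e.g. "s3://my-bucket/models"
    )
```

Validating the URI before calling `Task.init` is only a convenience here: it catches a mistyped scheme early instead of letting uploads quietly go somewhere else.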
TimelyPenguin76 , the same behaviour with clearml-agent 1.0.0
TimelyPenguin76 OK, when no explicit artifact upload is done, it does upload the model when run locally, but not when run remotely
As I remember, I added it because it was not added automatically. But I'll recheck now...
TimelyPenguin76 , thank you for the explanation. 1) Great. 2) As you can see from my screenshot, the Data Processing task is created, but I don't see the Datasets tab that I see in https://clear.ml/blog/construction-feat-tf2-object-detection-api/ 3) I see. So I need to specify it with every CLI command/SDK method call
SuccessfulKoala55 thank you. It worked
Do I understand correctly that I can avoid task (including dataset) termination if I update it somehow at regular intervals (say, by sending a log line)?
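The "send a log line once a period" idea could be sketched as a small background heartbeat. This is only an illustration: `start_heartbeat` is a hypothetical helper, and whether periodic log lines actually keep the server from aborting an idle task is exactly the open question asked here.

```python
import threading
from typing import Callable


def start_heartbeat(
    report: Callable[[str], None],
    interval_sec: float,
    message: str = "heartbeat",
) -> threading.Event:
    """Call report(message) every interval_sec seconds until the returned event is set."""
    stop = threading.Event()

    def _loop() -> None:
        # Event.wait returns False on timeout and True once stop has been set,
        # so the loop reports once per interval and exits promptly on stop.
        while not stop.wait(interval_sec):
            report(message)

    threading.Thread(target=_loop, daemon=True).start()
    return stop
```

With ClearML this might be used as `stop = start_heartbeat(task.get_logger().report_text, 3600)` and ended with `stop.set()` once the sequence of tasks has finished.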
AgitatedDove14 I didn't know about Dataset.squash(). Thank you. I'll check this variant today
SuccessfulKoala55 To be more specific, I mean situations when training is long and its parts can be parallelized in some way, as in Spark or Dask. I suspect that such functionality is framework-specific, and it's hard to believe it is a focus of ClearML, which is more or less framework-agnostic. On the other hand, ClearML has many integrations with concrete frameworks. So I'd like to understand whether there is any kind of support at the general ClearML level or as a part of integrations with fra...
SuccessfulKoala55 the second option
Hi SuccessfulKoala55 Here is the code snippet
` task = Task.init(project_name=PROJECT_NAME, task_name=section)
task.connect(params)
print('params', params)
dataset = Dataset.create(dataset_name=params['dataset'], dataset_project=PROJECT_NAME)
dataset_local_dir = dataset.get_local_copy()
dataset._task.output_uri = task.output_uri
KeywordProcessor(params['es_host'], params['es_port'], True, DOCS_ROOT)
dataset.add_files(DOCS_ROOT, wildcard='*.csv')
dataset.upload() `
I add several files to a da...
TimelyPenguin76 , sorry, I didn't see this comment. No. I mean that when I run the task locally (from PyCharm and without task.execute_remotely()), the model is uploaded and registered. But when I do the same with task.execute_remotely() and it runs on an agent, the model cannot be found in the task afterwards. I am talking about the same script I sent in the second thread
TimelyPenguin76 , rechecking this situation with clearml-agent 1.0.0 now...
I see. I run it from the repository root
TimelyPenguin76 , any news regarding this?
TimelyPenguin76 , thank you. Trying...
TimelyPenguin76 , it worked. Thank you!
SuccessfulKoala55 , I have the following structure now (maybe it's not best practice and you can suggest a better one). There is a sequence of tasks that are run manually or from a pipeline. Every task updates some dataset at the end. The dataset should be closed only after the whole sequence has finished (and some tasks in the sequence can take more than two days). The issue I want to avoid is the dataset task that these regular tasks update being aborted.
VexedCat68 I would try to find the process on the machine with something like 'ps aux | grep clearml ' and kill it
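The `ps aux | grep clearml` suggestion can also be scripted. Below is a rough sketch, assuming the standard 11-column `ps aux` layout (USER, PID, ..., COMMAND); the names `find_pids` and `kill_matching` are hypothetical, and the matching is a plain substring test, so it should be reviewed before killing anything.

```python
import os
import signal
import subprocess
from typing import List


def find_pids(ps_output: str, needle: str) -> List[int]:
    """Return PIDs from `ps aux` output whose COMMAND column contains needle."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # 11 columns, COMMAND is the last one
        if len(fields) == 11 and needle in fields[10]:
            pids.append(int(fields[1]))
    return pids


def kill_matching(needle: str = "clearml") -> List[int]:
    """Send SIGTERM to every matching process except this script itself."""
    output = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    targets = [pid for pid in find_pids(output, needle) if pid != os.getpid()]
    for pid in targets:
        os.kill(pid, signal.SIGTERM)
    return targets
```

Printing the result of `find_pids` first, and calling `kill_matching` only after checking the list, avoids terminating an unrelated process whose command line happens to contain the same string.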
Hi AgitatedDove14 . Thank you. Yes, pipelines and a clearml-agent on an environment that runs some parallelization framework are both options. I'll look in this direction
I haven't tried it yet, but I thought about dataset.upload(output_url=)
Hi SuccessfulKoala55 Thank you for the response. So, it's not possible if we use the community server, right?
TimelyPenguin76 , thank you for being willing to help. Here is a small project attached. load_mnist.py generates a dataset, and model_train.py is the script in question (it uses the dataset generated by load_mnist.py)
` Current configuration (clearml_agent v0.17.2, location: /home/olga/clearml.conf):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_...