TimelyPenguin76 , thank you. I'll try now
Hi SuccessfulKoala55 Here is the code snippet:
` task = Task.init(project_name=PROJECT_NAME, task_name=section)
task.connect(params)
print('params', params)
dataset = Dataset.create(dataset_name=params['dataset'], dataset_project=PROJECT_NAME)
dataset_local_dir = dataset.get_local_copy()
dataset._task.output_uri = task.output_uri
KeywordProcessor(params['es_host'], params['es_port'], True, DOCS_ROOT)
dataset.add_files(DOCS_ROOT, wildcard='*.csv')
dataset.upload() `
I add several files to a da...
SuccessfulKoala55 , I have the following structure now (maybe it's not best practice and you can suggest a better one). There is a sequence of tasks that are run manually or from a pipeline. Every task updates some dataset at the end. The dataset should be closed only after the whole sequence is finished (and some tasks in the sequence can take more than two days). The issue I want to avoid is the dataset task that these regular tasks update being aborted.
Do I understand correctly that I can avoid task termination (including termination of the dataset task) if I update it periodically (say, by sending a log line)?
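To make the "sending a log line once a period" idea concrete, here is a minimal sketch of a background heartbeat. The helper name `start_heartbeat` and the message text are hypothetical, and whether such periodic reports actually stop the server from aborting the task is exactly the open question above, so treat this as an illustration only.

```python
import threading
import time
from typing import Callable


def start_heartbeat(report: Callable[[str], None], interval_sec: float) -> threading.Event:
    """Call `report` every `interval_sec` seconds on a daemon thread.

    Returns an Event; set it to stop the heartbeat.
    """
    stop = threading.Event()

    def _loop() -> None:
        # Event.wait returns False on timeout and True once `stop` is set.
        while not stop.wait(interval_sec):
            report("heartbeat: dataset is still being updated")

    threading.Thread(target=_loop, daemon=True).start()
    return stop


if __name__ == "__main__":
    # `print` is a stand-in reporter. With ClearML one might pass
    # task.get_logger().report_text instead (an assumption, not verified
    # to keep the task from being aborted).
    stop = start_heartbeat(print, interval_sec=1.0)
    time.sleep(3.5)
    stop.set()
```

The reporter is passed in as a plain callable so the same loop can be tried with `print` first and with a task logger later.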
Hi SuccessfulKoala55 Thank you for the response. So, it's not possible if we use the community server, right?
AgitatedDove14 I didn't know about Dataset.squash(). Thank you. I'll check this variant today
clearml 1.0.2. clearml-agent was still 0.17; I have now removed it and installed 1.0.0
TimelyPenguin76 , rechecking this situation with clearml-agent 1.0.0 now...
TimelyPenguin76 Ok, when no explicit artifact upload is done, the model is indeed uploaded when running locally, but not when running remotely
I haven't tried it yet, but I was thinking about dataset.upload(output_url=)
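As a rough sketch of that idea, the dataset calls from the earlier snippet could be wrapped with an explicit `output_url`. The function name `upload_csv_dataset` and its parameters are hypothetical, and the code assumes a configured ClearML server plus a storage URL you supply yourself.

```python
from typing import Optional


def upload_csv_dataset(dataset_name: str, project: str, docs_root: str,
                       output_url: Optional[str] = None) -> str:
    """Create a dataset, add the CSV files under `docs_root`, upload them,
    and return the dataset id. The dataset is left open (not finalized)."""
    # Imported lazily so the module loads even where clearml is not installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_name=dataset_name, dataset_project=project)
    dataset.add_files(docs_root, wildcard="*.csv")
    # output_url=None keeps the default storage target; otherwise pass the
    # destination you want the files uploaded to.
    dataset.upload(output_url=output_url)
    return dataset.id
```

The dataset is deliberately not finalized here, since later tasks in the sequence are expected to keep updating it.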
TimelyPenguin76 , it worked. Thank you!
VexedCat68 I would try to find the process on the machine with something like 'ps aux | grep clearml' and kill it
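The same lookup can be sketched in Python, which makes it easier to review the matches before killing anything. The helper name `find_processes` is hypothetical, and the parsing assumes the usual 11-column `ps aux` layout with the command in the last column.

```python
import subprocess
from typing import List, Tuple


def find_processes(ps_output: str, needle: str) -> List[Tuple[int, str]]:
    """Return (pid, command) for each `ps aux` row whose command contains `needle`."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, command = int(fields[1]), fields[10]
        if needle in command:
            matches.append((pid, command))
    return matches


if __name__ == "__main__":
    output = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    for pid, command in find_processes(output, "clearml"):
        # Inspect the list first; a chosen pid can then be stopped with
        # os.kill(pid, signal.SIGTERM).
        print(pid, command)
```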
As I remember, I added it because it was not added automatically. But I'll recheck now...
I see I run it from repository root
Hi AgitatedDove14 . Thank you. Yes. Pipelines and running clearml-agent in an environment that runs some parallelization framework are both options. I'll look in this direction
TimelyPenguin76 , sorry, I didn't see this comment. No. I mean that when I run the task locally (from PyCharm and without task.execute_remotely()), the model is uploaded and registered. But when I do the same with task.execute_remotely() and it runs on an agent, the model cannot be found in the task afterwards. I am talking about the same script I sent in the second thread
SuccessfulKoala55 thank you. It worked
AgitatedDove14 , you are right. It was an invalid working directory. Everything works now. Thank you
SuccessfulKoala55 To be more specific, I mean situations where training is long and its parts can be parallelized in some way, as in Spark or Dask. I suspect that such functionality is framework-specific, and it's hard to believe it is a focus of ClearML, which is more or less framework-agnostic. On the other hand, ClearML has many integrations with concrete frameworks. So I'd like to understand whether there is any kind of support on general ClearML level or as a part of integrations with fra...
SuccessfulKoala55 the second option
` Current configuration (clearml_agent v0.17.2, location: /home/olga/clearml.conf):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_...
TimelyPenguin76 , any news regarding this?
TimelyPenguin76 , thank you for being willing to help. A small project is attached. load_mnist.py generates a dataset, and model_train.py is the script in question (it uses the dataset generated by load_mnist.py)
TimelyPenguin76 , thank you. Trying...
TimelyPenguin76 , the same behaviour with clearml-agent 1.0.0