I hit F12 to check projects.get_all_ex, but nothing is fired; I guess the web UI is just frozen in some weird state
The main issue is that task_logger.report_scalar() is not reporting the scalars
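To be concrete, the call looks roughly like this (a minimal repro sketch; the project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar-report")  # placeholder names
task_logger = task.get_logger()

# this is the call that produces no scalars in the web UI
task_logger.report_scalar(title="loss", series="train", value=0.42, iteration=1)
```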
I should also rename the /opt/trains/data/elastic_migrated_2020-08-11_15-27-05 folder to /opt/trains/data/elastic before running the migration tool, right?
AppetizingMouse58 After some thought, we decided to install 0.16 from scratch, with no data migration, because we believe this was an edge case not worth spending effort on. Thank you very much for your help there, very appreciated. You guys rock! 🙂
AgitatedDove14 ok, but this happens on my local machine, not in the agent
Sure, I opened an issue https://github.com/allegroai/clearml/issues/288 unfortunately I don't have time to open a PR 🙏
ok, thanks SuccessfulKoala55 !
I can also access these files directly if I enter the URL in the browser
These images are actually stored there and I can access them via the URL shared above (the one written in the pop-up message saying that these files could not be deleted)
I just checked if something changed in https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_config.html#web-login-authentication
but I also make sure to write the trains.conf to the home directory in this bash script:
```bash
echo "
sdk.aws.s3.key = ***
sdk.aws.s3.secret = ***
" > ~/trains.conf
...
python3 -m trains_agent --config-file "~/trains.conf" ...
```
If the reporting is done in a subprocess, I can imagine that the task.set_initial_iteration(0) call is only effective in the main process, not in the subprocess used for reporting. Could that be the case?
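Something like this minimal sketch of what I mean (placeholder names):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="resume-test")  # placeholder names

# set in the main process...
task.set_initial_iteration(0)

# ...but scalars are written by ClearML's background reporting
# subprocess, which is where I suspect the offset gets ignored
task.get_logger().report_scalar("loss", "train", value=0.1, iteration=0)
```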
Probably something's wrong with the instance. Which AMI did you use? The default one?
The default one doesn't exist / isn't accessible anymore, so I replaced it with the one shown on the NVIDIA Deep Learning AMI marketplace page https://aws.amazon.com/marketplace/pp/B076K31M1S?qid=1610377938050&sr=0-1&ref_=srh_res_product_title that is: ami-04c0416d6bd8e4b1f
and just run the same code I run in production
my docker-compose for the master node of the ES cluster is the following:
```yaml
version: "3.6"
services:
  elasticsearch:
    container_name: clearml-elastic
    environment:
      ES_JAVA_OPTS: -Xms2g -Xmx2g
      bootstrap.memory_lock: "true"
      cluster.name: clearml-es
      cluster.initial_master_nodes: clearml-es-n1, clearml-es-n2, clearml-es-n3
      cluster.routing.allocation.node_initial_primaries_recoveries: "500"
      cluster.routing.allocation.disk.watermark.low: 500mb
      clust...
```
is it different from Task.set_offline(True)?
so that any error that could arise from communicating with the server can be tested
Actually I think I am approaching the problem from the wrong angle
you mean to run it on the CI machine?
yes
That should not happen, no? Maybe there is a bug that needs fixing in clearml-agent?
It's just to test that the logic executed inside if not Task.running_locally() is correct
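Concretely, a minimal sketch of what I want the CI to exercise (placeholder names):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="ci-check")  # placeholder names

if not Task.running_locally():
    # branch that only runs when the task is executed by an agent;
    # this is the logic I want to exercise from the CI machine
    task.get_logger().report_text("running remotely")
```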
Oh wow! Is it possible to not specify a remote task? (If I am working with Task.set_offline(True))
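For context, this is how I understand offline mode, as a sketch with placeholder names:
```python
from clearml import Task

# offline mode: no communication with any server; everything is
# recorded locally and can be imported into a server later via
# Task.import_offline_session("<path-to-session-zip>")  # placeholder path
Task.set_offline(offline_mode=True)

task = Task.init(project_name="examples", task_name="offline-run")  # placeholder names
task.get_logger().report_scalar("loss", "train", value=0.5, iteration=0)
task.close()
```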
I think the best-case scenario would be for ClearML to maintain a GitHub Action that sets up a dummy clearml-server, so that anyone can use it as a basis for their tests: they would just have to change the server URL to the local one started by the action and could seamlessly test all their code, wdyt?
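As a rough sketch of what the test-side code could look like under that setup (assuming the default clearml-server ports; the key/secret are placeholders for whatever the dummy server issues):
```python
from clearml import Task

# point the SDK at the locally spun-up clearml-server
# instead of the production one
Task.set_credentials(
    api_host="http://localhost:8008",
    web_host="http://localhost:8080",
    files_host="http://localhost:8081",
    key="dummy-key",        # placeholder credentials
    secret="dummy-secret",  # placeholder credentials
)

task = Task.init(project_name="ci", task_name="smoke-test")  # placeholder names
```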
I wouldn't do it; it's less code to maintain on your side, and honestly too much auto-magic makes it difficult for the user to control the environment (i.e. to understand what happens behind the scenes). I am not sure what switching back will solve; here the wheel should have been correct, it's just that the architecture of the card is incompatible
Thanks! Corrected both, now it's building
Yes, actually that's what I am doing, because I have a task C depending on tasks A and B. Since a Task cannot have two parents, I use one task ID (task A) as the parent ID and pass the other one (the ID of task B) as a hyper-parameter, as you described 👍
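For anyone reading along, a minimal sketch of that wiring (the task IDs and names are placeholders):
```python
from clearml import Task

task_a_id = "<task-A-id>"  # placeholder IDs
task_b_id = "<task-B-id>"

task_c = Task.init(project_name="examples", task_name="task C")  # placeholder names
task_c.set_parent(task_a_id)  # a task can have only one parent
task_c.connect({"task_b_id": task_b_id})  # second dependency stored as a hyper-parameter
```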