AgitatedDove14 Looks like that. First, I created a toy task running in the "services" queue (you didn't say so explicitly, but I guess you assumed it). I couldn't find how to specify the queue in code ( Task.equeue(task, queue_name='services') returned an error), so I ran toy.py first in the "default" queue, aborted toy.py, and started nntraining in the "default" queue. Then I reset toy.py and enqueued it to the "services" queue. Toy.py failed shortly after. I've also reset both toy.py and nntraining and enqueue...
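For reference, here is a minimal sketch of how the queue could be specified from code - assuming the standard clearml SDK, where the classmethod is spelled Task.enqueue (note the "n"); the project and task names are just placeholders:
```
from clearml import Task

# Hypothetical project/task names, for illustration only
task = Task.create(project_name='object_detection', task_name='toy')

# Classmethod form: push an existing draft/reset task to a queue
Task.enqueue(task, queue_name='services')

# Alternative, from inside a running script: stop local execution and
# re-launch the same task on an agent listening to the given queue
# task = Task.init(project_name='object_detection', task_name='toy')
# task.execute_remotely(queue_name='services', exit_process=True)
```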
AgitatedDove14 How can the first process corrupt the second, and why doesn't this occur if I run the pipeline from the command line? Just to be precise - I run all the processes as administrator. However, I've tested running the pipeline from the command line in non-administrator mode, and it works fine.
Ok, I ran it (I just used a period instead of a comma in the print statement - a note for anyone reading this who intends to run the code). The result is attached to this message.
AgitatedDove14 Yes, the difference in installed packages is large - the training stage, which runs OK, has all the following:
These libraries are absent in the failing option. The only libraries of that option (all of them are also present in the working one) are listed below; a sketch of pinning them explicitly follows the list:
absl_py==0.9.0
boto3==1.16.6
clearml==0.17.4
joblib==0.17.0
matplotlib==3.3.1
numpy==1.18.4
scikit_learn==0.23.2
tensorflow_gpu==2.2.0
watchdog==0.10.3
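In case it helps anyone hitting the same mismatch: one way to force a task to record specific package versions is to declare them before Task.init. This is just a sketch, assuming the clearml SDK's Task.add_requirements is available in your version; the versions are copied from the list above and the project/task names are hypothetical:
```
from clearml import Task

# Pin the packages the stage needs, before creating the task
Task.add_requirements('tensorflow_gpu', '2.2.0')
Task.add_requirements('scikit_learn', '0.23.2')
Task.add_requirements('numpy', '1.18.4')

task = Task.init(project_name='object_detection', task_name='convert_test')
```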
AgitatedDove14 git diff gives nothing - the current local repository is up-to-date with the gitlab origin.
Yes, that is the git repository cache, you are correct. I wonder what happened there?
So far my local and remote gitlab repositories are synchronized. I suspect that the "Failed applying git diff, see diff above" error is caused by the cached repository from which clearml tries to run the process. I've cleaned the cache, but it hasn't helped.
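For anyone else debugging this: if the cached clone itself is the suspect, the agent's vcs cache can be disabled entirely in clearml.conf. A sketch, assuming the standard clearml-agent configuration keys (worth verifying against your agent version's reference):
```
# ~/clearml.conf (agent side)
agent {
    vcs_cache {
        enabled: false   # clone fresh instead of reusing the vcs-cache folder
    }
}
```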
The installed packages is fully editab...
Here's also the log of the failed pipeline - maybe it will give a clue.
AgitatedDove14
No, I meant a different thing. It's not easy to explain, sorry. Let me try. Say I have a project in the folder "d:\object_detection". There I have a script which converts annotations from labelme format to coco format. The script's name is convert_test.py, and it runs as a process registered under the same name in clearml. This script, when run separately from the command prompt, creates a new file in the project folder - test.json. I delete this file, sync the local and remote repos, both...
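A sketch of one way to keep that generated file out of the git working tree - assuming convert_test.py can write to a temporary location and register the result as a clearml artifact; the names come from the description above, the rest is hypothetical:
```
from pathlib import Path
import tempfile
from clearml import Task

task = Task.init(project_name='object_detection', task_name='convert_test')

# Write the converted annotations outside the repository folder,
# so the repo never ends up with uncommitted changes
out_path = Path(tempfile.mkdtemp()) / 'test.json'
# ... labelme -> coco conversion writes out_path here ...

# Keep the result with the task instead of in the project folder
task.upload_artifact(name='coco_annotations', artifact_object=out_path)
```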
Exactly! To be more specific - the same base_task_id fails if the pipeline is cloned and started from the UI. I've checked the queues for the failed and completed tasks - they are the same (default, gpu-all).
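For context, this is roughly how the pipeline references those base tasks - a sketch only, assuming the clearml PipelineController API of that era (base_task_id and execution_queue on add_step); the IDs and step names are placeholders:
```
from clearml.automation.controller import PipelineController

pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=True)

pipe.add_step(
    name='convert_test',
    base_task_id='<id of the convert_test task>',  # the step that fails when cloned from the UI
    execution_queue='default',
)
pipe.add_step(
    name='nntraining',
    base_task_id='<id of the training task>',
    parents=['convert_test'],
    execution_queue='gpu-all',
)

pipe.start()
pipe.wait()
pipe.stop()
```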
AgitatedDove14
No, I do not use the --docker flag for the clearml agent. In Windows, setting system_site_packages to true allowed all stages in the pipeline to start - but it doesn't work in Linux. I've deleted the tfrecords from the master branch, committed the removal, and set the tfrecords folder to be ignored in .gitignore. I'm trying to find which changes are considered uncommitted. By cache files I mean the files in the folder C:\Users\Super\.clearml\vcs-cache - based on the error message, cle...
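For completeness, the flag mentioned above lives in the agent configuration - a sketch assuming the standard clearml-agent key agent.package_manager.system_site_packages (check your agent's clearml.conf reference before relying on it):
```
# ~/clearml.conf (agent side)
agent {
    package_manager {
        # let the task's virtualenv see the packages already installed system-wide
        system_site_packages: true
    }
}
```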
AgitatedDove14 Great, thanks! Wow, guys, your responses are not only helpful but also incredibly fast - I'm not used to this! 🙂
Okay, I see - I didn't clearly understand the structure and logic behind ClearML. I thought that an external git repository should be set up to keep logs, stats, etc. So all of these are kept on the ClearML host, correct? However, if I want to keep the logs in an external repo, is it possible to configure ClearML to keep all these files there?
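On that last question, here is a minimal sketch of pointing task outputs at external storage instead of the ClearML server's file store - assuming the output_uri argument of Task.init; the bucket name and task names are hypothetical:
```
from clearml import Task

# Per task: models/artifacts go to external storage, while metrics and logs
# are still recorded by the ClearML server
task = Task.init(
    project_name='object_detection',
    task_name='nntraining',
    output_uri='s3://my-bucket/clearml-outputs',  # hypothetical bucket
)
# Alternatively, sdk.development.default_output_uri in clearml.conf
# can make such a destination the default for all tasks.
```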
Thanks. Not yet, but I will watch it, by all means.
AgitatedDove14 It works!!! Thanks a lot!