AgitatedDove14 Yes, the difference in installed packages is large - the training stage, which runs OK, has all of the following:
Regarding the diff issue - I just found that the empty folder 'tfrecord', in which the tfrecords should be created, doesn't exist in the gitlab origin repository. I added it there, then pulled from origin. Still having the diff issue, but I'll run a few trials to be sure there's nothing else causing it.
As for "installed packages" list. To create a pipeline, I first run each stage (as a script) from cmd. After all the stages are created and can be seen in UI, I run the pipeline. So far I understand, clearml tra...
Okay, I see - I didn't clearly understand the structure and logic behind ClearML. I thought that an external git repository had to be set up to keep logs, stats, etc. So all of these are kept on the ClearML host, correct? However, if I want to keep logs in an external location, is it possible to configure ClearML to store all these files there?
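If I got the answer right, artifacts and models can be redirected to my own storage, while console logs and scalars stay on the ClearML server. A minimal sketch of what I mean (assuming an S3 bucket; the bucket name and path are placeholders):

```python
from clearml import Task

# output_uri redirects artifacts and model checkpoints to external storage;
# console logs and scalars are still kept by the ClearML server itself.
task = Task.init(
    project_name='clml_cl_toy',
    task_name='train_1st_nn',
    output_uri='s3://my-bucket/clearml',  # placeholder bucket/path
)
```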
https://clearml.slack.com/archives/CTK20V944/p1610481348165400?thread_ts=1610476184.162600&cid=CTK20V944
Indeed, that was a cookie issue. After deleting the cookies, everything works fine. Thanks. Interestingly enough, I had this issue both on Chrome and FF.
Thanks. Not yet, but will watch, by all means.
AgitatedDove14 It works!!! Thanks a lot!
AgitatedDove14 Great, thanks! Wow, guys, your responses are not only helpful but also really fast - I'm not used to this! 🙂
AgitatedDove14 Yes, it's running with an agent. I've updated the clearml from version 0.17.4 to 0.17.5. Sorry, didn't note the other libraries, which were automatically updated along with the new ClearML version.
However, is there any way to manipulate the packages that will be installed in the venv when running the pipeline? I've tried to run the pipeline on a Linux server (clearml v0.17.4) and got the following issue:
` Requirement already satisfied: numpy==1.19.5 in /root/.clearml/venvs-builds...
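To make the question concrete, here's a minimal sketch of the kind of control I'm after (assuming Task.add_requirements is the right hook; the pinned package/version is only an example, and the call has to happen before Task.init):

```python
from clearml import Task

# Called before Task.init() so it is recorded in the task's "Installed Packages"
# and is what the agent will install when the step runs remotely.
Task.add_requirements('numpy', '1.19.5')  # example pin only

task = Task.init(project_name='clml_cl_toy', task_name='train_1st_nn')
```

The other option I see is to reset the step task to draft and edit its "Installed Packages" section directly in the UI.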
AgitatedDove14 According to the logs (up to the traceback message), the only difference between those two tasks is the task id / name
Well, I'm pretty sure that nntraining is executed in the same queue for these two cases:
Will the record be available?
AgitatedDove14 Does it make any sense to change system_site_packages to true if I run ClearML using Docker?
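For reference, the key I'm talking about sits in the agent section of clearml.conf (a minimal snippet; whether it has any effect when the agent runs tasks inside a Docker image is exactly what I'm asking):

```
agent {
    package_manager {
        # let the venv created by the agent see packages already installed
        # in the system / docker-image python
        system_site_packages: true
    }
}
```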
Ok, ran it (I just used a period instead of a comma in the print statement - a note in case someone reading this runs the code). Attached to this message.
AgitatedDove14
For the classification example (clml_cl_toy): script A is image_augmentation.py, which creates augmented images; script B is train_1st_nn.py (or train_2nd_nn.py, which does the same), which trains an ANN based on the augmented images. For the object detection example: script A is represented by two scripts - annotation_conversion_test.py, which creates the file test.json, and annotation_conversion_train.py, which creates the file train.json. These files are use...
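Since the point is that script B consumes the files produced by script A, here's a minimal sketch of how I understand the hand-off could work via task artifacts (the project name is a placeholder, and the lookup by project/task name is an assumption - a task id works too):

```python
from clearml import Task

# --- in script A (e.g. annotation_conversion_train.py) ---
task_a = Task.init(project_name='toy_detection', task_name='annotation_conversion_train')
# register the file the script just wrote so downstream steps can fetch it
task_a.upload_artifact(name='train_annotations', artifact_object='train.json')

# --- in script B (the training script, run as a separate process) ---
task_b = Task.init(project_name='toy_detection', task_name='train_mask_rcnn')
prev = Task.get_task(project_name='toy_detection', task_name='annotation_conversion_train')
train_json_path = prev.artifacts['train_annotations'].get_local_copy()
```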
AgitatedDove14 git diff gives nothing - the current local repository is up-to-date with the gitlab origin.
Yes, that is the git repository cache, you are correct. I wonder what happened there?
So far my local and remote gitlab repositories are synchronized. I suspect that the "Failed applying git diff, see diff above" error is caused by the cached repository from which clearml tries to run the process. I've cleaned the cache, but it hasn't helped.
The installed packages is fully editab...
Here's also the log of the failed pipeline - maybe it will give a clue.
AgitatedDove14 I've set system_site_packages: true. Almost succeeded. The current pipeline has the following stages: 1) convert annotations from labelme into coco format; 2) convert annotations in coco format and the corresponding images to tfrecords; 3) run Mask R-CNN training. The process previously failed on the second stage. After setting system_site_packages: true, the pipeline starts the third stage, but fails with some git issue:
` diff --git a/work/tfrecord/test.record b/work/t...
AgitatedDove14 In "Results -> Console" tab of UI, I see that the issue with running object detection on Linux is the following:ERROR: Could not find a version that satisfies the requirement object_detection==0.1 (from -r /tmp/cached-reqsypv09bhw.txt (line 7)) (from versions: 0.0.3)
Is it possible to comment out the line object_detection==0.1? Actually, no such version of this or a similar library exists. I guess that this requirement is not necessary. Can I turn off the installati...
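If editing it out in the UI isn't enough, here's a hedged sketch of what I'd try from code: clone the failing task and rewrite its requirements without the bogus line (the task id is a placeholder, the package list is illustrative only, and Task.set_packages may not exist in 0.17.x - it's an assumption based on newer clearml SDKs):

```python
from clearml import Task

template = Task.get_task(task_id='<failing_task_id>')  # placeholder id
cloned = Task.clone(source_task=template, name='nntraining - no object_detection req')

# Overwrite the recorded "Installed Packages", leaving out object_detection==0.1.
# NOTE: assumes a clearml SDK version where Task.set_packages is available.
cloned.set_packages([
    'tensorflow_gpu==2.2.0',
    'numpy==1.19.5',
])

Task.enqueue(cloned, queue_name='default')
```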
AgitatedDove14 Looks like that. First, I created a toy task running in the "services" queue (you didn't say that, but I guess you assumed it). I haven't found how to specify the queue in code ( Task.equeue(task, queue_name='services') returned an error), so I ran toy.py first in the "default" queue, aborted toy.py, and started nntraining in the "default" queue. Then I reset toy.py and enqueued it to the "services" queue. Toy.py failed shortly after. I've also reset both toy.py and nntraining and enqueue...
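For the record, what I was trying to do in code (the task id is a placeholder; I now suspect the method is Task.enqueue rather than the Task.equeue I typed):

```python
from clearml import Task

toy = Task.get_task(task_id='<toy_task_id>')  # placeholder id
Task.enqueue(toy, queue_name='services')      # push the existing task into the "services" queue

# Alternative from inside toy.py itself, right after Task.init():
# task.execute_remotely(queue_name='services', exit_process=True)
```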
These libraries are absent in the option which fails. The only libraries of that option (all of which are also present in the correctly-working option) are:
absl_py==0.9.0
boto3==1.16.6
clearml==0.17.4
joblib==0.17.0
matplotlib==3.3.1
numpy==1.18.4
scikit_learn==0.23.2
tensorflow_gpu==2.2.0
watchdog==0.10.3
Exactly! To be more specific - the same base_task_id fails if the pipeline is cloned and started from the UI. I've checked the queues for the failed and completed tasks - they are the same (default, gpu-all).



