@<1571308003204796416:profile|HollowPeacock58> seems like an internal issue copying this object config.model
This is a complex object, and it seems that for some reason it cannot be copied.
As a workaround, just do not connect this object; it seems you cannot pickle it / copy it (see the GH issue)
PompousParrot44
you can always manually store/load models, example: https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/examples/reporting/model_config.py#L35
Sure, you can patch any framework with something similar to what we do in xgboost, any such PR will be greatly appreciated! https://github.com/allegroai/trains/blob/master/trains/binding/frameworks/xgboost_bind.py
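For illustration, a minimal sketch of manually storing and then loading a model checkpoint (the project/task names and the checkpoint file name here are just placeholders):
` from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model store")

# store: register a locally saved checkpoint file as the task's output model
output_model = OutputModel(task=task)
output_model.update_weights(weights_filename="checkpoint.pth")

# load: later, fetch the latest output model back as a local file
local_checkpoint = task.models['output'][-1].get_local_copy() `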
Lambdas are designed to be short-lived, I don't think it's a good idea to run it in a loop TBH.
Yeah, you are right, but maybe it would be fine to launch, have the lambda run for 30-60 sec (i.e. checking idle time for 1 min, stateless, only keeping track inside the execution context), then take it down
What I'm trying to solve here is (1) a quick way to understand if the agent is actually idling or just between Tasks, and (2) still avoid having the "idle watchdog" short-lived, so that it can...
Hi RipeGoose2
What exactly is being uploaded ? Are those the actual model weights or intermediate files ?
Yes, that seems to be the case. That said they should have different worker IDs agent-0 and agent-1 ...
What's your trains-agent version ?
It seems stuck somewhere in the python path... Can you check at runtime what os.environ['PYTHONPATH'] is?
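e.g. a quick snippet you can drop into the code to check:
` import os
# print what the python path actually is at runtime
print(os.environ.get('PYTHONPATH', '<not set>')) `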
As a hack you can try DEFAULT_VERSION
(it's just a flag and should basically do Store)
EDIT: sorry that won't work
So if everything works you should see "my_package" package in the "installed packages"
the assumption is that if you do: pip install "my_package"
It will set "pandas" as one of its dependencies, and pip will automatically pull pandas as well.
That way we do not list the entire venv you are running on, just the packages/versions you are using, and we let pip sort the dependencies when installing with the agent
Make sense ?
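For illustration, a minimal sketch of that assumption (this hypothetical setup.py for "my_package" declares pandas, so pip pulls it in automatically):
` # setup.py of the hypothetical "my_package"
from setuptools import setup, find_packages

setup(
    name="my_package",
    version="0.1.0",
    packages=find_packages(),
    # because pandas is declared here, `pip install my_package`
    # installs pandas as well, without listing the whole venv
    install_requires=["pandas>=1.0"],
) `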
Or you want to generate it from a previously executed run?
How can I ensure that additional tasks arenโt created for a notebook unless I really want to?
TrickySheep9 are you saying two Tasks are created in the same notebook without you closing one of them ?
(Also, is the git diff warning still there with the latest clearml? I think there was some fix related to that)
Hi, I changed it to 1.13.0, but it still threw the same error.
This is odd, just so we can make the agent better, any chance you can send the Task log ?
However, the pipeline experiment is not visible in the project experiment list.
I mean press on the "full details" in the pipeline page
` task = Task.init(...)
# assume a model checkpoint exists
if task.models['output']:
    # get the latest checkpoint
    model_file_or_path = task.models['output'][-1].get_local_copy()
    # load the model checkpoint
# run the training code `
RoughTiger69 Would the above work for you?
but this will be invoked before fil-profiler starts generating them
I thought it would flush in the background
You can however configure the profiler to a specific folder, then mount the folder to the host machine:
In the "base docker args" section add -v /host/folder/for/profiler:/inside/container/profile
YEYYYYYYyyyyyyyyyyyyyyyyyy
OH I see. I think you should use the environment variable to override it:
so add to the docker args something like
-e CLEARML_AGENT__AGENT__PACKAGE_MANAGER__POETRY_INSTALL_EXTRA_ARGS=
how can you pass Snyk and be lower than 0.96?
Yep Snyk
auto "patching" is great
as I mentioned wait for the GH sync tomorrow, a few more things are missing there
In the meantime you can just do ">= 0.109.1"
AverageBee39 I cannot reproduce it (at least on the latest from GitHub)
I'm assuming the pipeline is created with target_project
, anything else I need to add?
Hi SpotlessFish46 ,
Is the artifact already in S3 ?
Is the S3 configured as the default files_server in the trains.conf?
You can always use the StorageManager upload to wherever and register the url on the artifacts.
You can also programmatically change the artifact destination server to S3, then upload the artifact as usual.
What would be the best match for you?
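For illustration, a minimal sketch of changing the artifact destination to S3 (the bucket name, project/task names, and artifact here are placeholders):
` from clearml import Task

# send artifacts to S3 instead of the default files_server
task = Task.init(project_name="examples", task_name="s3 artifacts",
                 output_uri="s3://my-bucket/clearml-artifacts")

# artifacts uploaded from here on are stored under the S3 destination above
task.upload_artifact(name="results", artifact_object={"accuracy": 0.92}) `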
ReassuredTiger98 I'm trying to debug what's going on, because it should have worked.
Regarding prints ...
` from clearml import Task
from time import sleep

def main():
    task = Task.init(project_name="test", task_name="test")
    d = {"a": "1"}
    print('uploading artifact')
    task.upload_artifact("myArtifact", d)
    print('done uploading artifact')
    # not sure if this helps but it won't hurt to debug
    sleep(3.0)

if __name__ == "__main__":
    main() `
Are you referring to Poetry ?
` from clearml.automation.parameters import LogUniformParameterRange
sampler = LogUniformParameterRange(name='test', min_value=-3.0, max_value=1.0, step_size=0.5)
sampler.to_list()
Out[2]:
[{'test': 1.0},
{'test': 3.1622776601683795},
{'test': 10.0},
{'test': 31.622776601683793},
{'test': 100.0},
{'test': 316.22776601683796},
{'test': 1000.0},
{'test': 3162.2776601683795}] `
then when we triggered an inference deploy it failed
How would you control it? Is it based on a Task ? like a property "match python version" ?
Oh, and good job starting your reference with an author that goes early in the alphabetical ordering, lol:
LOL, worst case it would have been C ...
So if you set it, then all nodes will be provisioned with the same execution script.
This is okay in a way, since the actual "agent ID" is by default set based on the machine hostname, which I assume is unique ?
Hi CluelessElephant89
Hi guys, if I spot issues with the documentation, where should I post them?
The best way from our perspective is to PR the fix, this is why we put it on GitHub
Hi WickedGoat98 ,
I think you are correct
I would guess it is something with the ingress configuration (i.e. ConfigMap)
Wait, why aren't you just calling Popen (or os.system)? I'm not sure how it relates to the torch multiprocess example. What am I missing?
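For illustration, a minimal sketch of what I mean (the script name and arguments are placeholders):
` import subprocess

# launch the worker script as a plain child process instead of torch multiprocessing
proc = subprocess.Popen(["python", "worker.py", "--rank", "0"])
proc.wait()  # block until the child process exits `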